C Brief Evaluations per Layer
In this section we provide brief evaluations of various systems for each subcategory of the layers covered in Sections 3 – 10. In doing so, we identify various questions that require further research across two broad axes. First, from a measurement perspective, many systems and dimensions lack pertinent data or, to make matters more interesting, it is unclear how to even conduct robust measurements for the data under question. Second, from a design perspective, a relevant thread of research would focus on enabling or incentivizing protocol designers to implement accurate data collection mechanisms as a part of the systems themselves.
C.1 Hardware: Physical Hardware
Centralization around specialized hardware has been documented [62], although no academic research could be found on mining hardware usage in real-world systems. It is unclear how to measure hardware usage in PoW mining from public data, or how to design PoW algorithms that promote hardware diversity; future research could address both questions. Some reports do exist, though they often present conflicting assessments. In Bitcoin, between 2017 and 2019 a single mining hardware provider accounted for either 65−75% [18] or 46−58% [185] of the network’s hashrate, with 98% of the market controlled by 4 firms.
C.2 Hardware: Virtual Hardware
A comprehensive evaluation of centralization in hardware hosting, and of how to incentivize hosting diversity, is a promising thread of future research. Here, we consider two examples of highly-valued PoS systems, Solana and Avalanche.[31] Solana’s validators predominantly operate cloud-based nodes; of the 1873 nodes, more than half are hosted in two services, with more than 50% and more than 66% of participating stake hosted by 3 and 5 providers respectively.[32] Avalanche exhibits similar concentration; 731 out of 1254 validators, who control 71.84% of all stake, are hosted by a single company.[33]
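Statements of the form "more than 50% of stake is hosted by 3 providers" can be computed by sorting providers by hosted stake and counting how many are needed to cross a threshold. The following is a minimal sketch of that calculation; the function name and the stake figures are hypothetical and are not the Solana or Avalanche data cited above.

```python
def min_entities_for_share(shares, threshold):
    """Smallest number of entities whose combined share strictly
    exceeds `threshold` (a fraction in [0, 1)) of the total."""
    total = sum(shares)
    if total <= 0:
        raise ValueError("total share must be positive")
    running = 0
    # Greedily take the largest holders first.
    for count, share in enumerate(sorted(shares, reverse=True), start=1):
        running += share
        if running > threshold * total:
            return count
    return None  # only reachable if threshold >= 1


# Hypothetical stake (in arbitrary units) hosted per provider.
stake_by_provider = [25, 15, 12, 8, 8] + [4] * 8

majority = min_entities_for_share(stake_by_provider, 1 / 2)
supermajority = min_entities_for_share(stake_by_provider, 2 / 3)
print(majority, supermajority)
```

A smaller result means stake is more concentrated; the same function applies unchanged to hashrate per pool or nodes per hosting service.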
C.3 Software: Protocol Participation
The literature is lacking formal analyses on the usage of full node ledger software, so a rigorous evaluation of the dynamics in software development and usage could highlight various centralization tendencies. Various community and commercial projects do keep track of statistics though. In most systems, a single client software is predominantly used by the participating nodes in the network. In Bitcoin 99% use Bitcoin Core (aka Satoshi), in Ethereum 78% use geth, in Litecoin 95% use LitecoinCore, while systems like Zcash are completely centralized with all nodes using one software (MagicBean); a notable exception is Bitcoin Cash, where usage is split between BCH Unlimited (33%), Bitcoin Cash Node (51%), and Bitcoin ABC (12%).[34]
Some projects are actively managed by a wide network of developers, e.g., more than 200 contribute to Ethereum [173], while others are particularly centralized. As of 2018, 7% of all Bitcoin Core files were written by the same person, while 30% of all files had a single author. In Ethereum, these figures rise to 20% and 55% respectively [11]. Comments exhibit similar centralization patterns, with 8 (0.3%) and 18 (0.6%) people contributing half of all comments in Bitcoin and Ethereum respectively [11].
C.4 Software: Asset Management
As keys and addresses are wallet agnostic, it is impossible to identify if two addresses are generated by the same wallet implementation, unless it purposely reveals such information. Consequently, it is unclear how to evaluate the wallet market’s diversity and how widespread wallet usage is from public data. To our knowledge, no rigorous investigation has been conducted on this topic, either analyzing historical data patterns or conducting usability studies.
C.5 Network: Topology
Bitcoin is notoriously vigilant in hiding its network topology [55,74]. Various works analyze it by inferring a node’s neighborhood [20], timing analysis [133], or conflicting transaction propagation [55]. In 2014, it was found that more than half of Bitcoin nodes resided in 40 autonomous systems (ASs), with 30% in just 10 ASs [69]. In 2017, Bitcoin’s and Ethereum’s P2P networks had similar sizes (3390 nodes for Bitcoin, 4302 for Ethereum). Bitcoin offered lower latency and higher bandwidth, with nodes being closer geographically and 56% of them hosted on dedicated hosting services (vs. 28% for Ethereum) [82]. In addition, 68% of the mining power was hosted on 10 transit networks, while 3 transit networks saw more than 60% of all connections [4]. In 2019, Ethereum’s network presented a large degree of centralization around clusters, forming a “small world network” [76] with 10 cloud hosting providers accounting for 57% of all nodes and one hosting almost a quarter [111]. This was reaffirmed in 2020, as Ethereum messages could be sent to most nodes within 6 hops [176]. In 2020, Monero’s topology also exhibited a high level of centralization, as 13.2% of nodes maintained 82.86% of all connections [35]. No analysis of PoS systems’ networks could be found; given their non-reliance on specialized hardware and ease of relocation, a PoS-PoW comparison would be of interest.
Bitcoin, as the first blockchain system, has also seen multiple eclipse attacks and defenses [166,87,167]. Some works attempt to increase the number of connections without reaching prohibitive levels of bandwidth usage [132]. Ethereum was also found vulnerable to eclipse attacks that do not require monopolizing a node’s connections, but instead rely on message propagation [180].
C.6 Network: Node Bootstrapping and Peer Discovery
Bitcoin Core defines 8 outgoing connections, selected randomly from a known list of identities, and up to 125 incoming [59]. When (re)joining the network, a node attempts to connect to previously-known identities and, if unsuccessful, employs a (hardcoded) list of DNS seeds. Other systems, like Ethereum and Cardano, employ more complex, DHT-based mechanisms [125] that require further analysis. Cardano is also an interesting implementation, as it assumes two node types: (a) core nodes that participate in consensus, and (b) relays that intermediate between core and edge nodes (e.g., wallets); in the default configuration relays are operated by only a small committee [59].
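The Bitcoin-style bootstrapping flow described above (random selection from known identities, with hardcoded seeds as a fallback) can be sketched as follows. This is an illustrative simplification and not Bitcoin Core's actual code: `choose_outbound`, `try_connect`, and the peer lists are hypothetical names, and the sketch assumes the fallback list already holds addresses obtained from the DNS seeds.

```python
import random

MAX_OUTBOUND = 8  # outgoing connections, as stated in the text


def choose_outbound(known_peers, seed_peers, try_connect, rng=None):
    """Pick up to MAX_OUTBOUND peers, preferring previously-known ones.

    `try_connect(peer) -> bool` is a caller-supplied (hypothetical)
    function that reports whether a connection attempt succeeded.
    """
    rng = rng or random.Random()
    connected = []
    # First pass: previously-known identities; second pass: seed fallback.
    for source in (known_peers, seed_peers):
        candidates = [p for p in source if p not in connected]
        rng.shuffle(candidates)  # random selection order
        for peer in candidates:
            if len(connected) == MAX_OUTBOUND:
                return connected
            if try_connect(peer):
                connected.append(peer)
    return connected


# Usage: every known peer is offline, so the node falls back to seeds.
known = [f"known-{i}" for i in range(20)]
seeds = [f"seed-{i}" for i in range(10)]
peers = choose_outbound(known, seeds, lambda p: p.startswith("seed-"))
print(len(peers))
```

The sketch makes the centralization concern concrete: a node with no reachable known peers depends entirely on whoever operates the hardcoded seed list.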
C.7 Consensus
In (game) theory, Bitcoin’s resistance to centralization has been both supported [113,108] and refuted [68], depending on the economic model assumed for the participants’ utilities. In practice, mining pools have been observed as early as 2013 [83]. Between 2016 − 2020, pools created 98.6% of Bitcoin blocks [117], with 5 pools consistently contributing between 65 − 85% of the eventual blocks and 25 controlling more than 94% of all hashing power [175]. Centralization has also been observed within mining pools. Between 2017 − 2018, no entity controlled more than 21% [82] of hashing power, but three pools controlled a majority; within these pools, a few participants (≤ 20) received over 50% of rewards [152]. Miners often participate in multiple pools at the same time, a behavior also observed in Ethereum [182]. Although centralization around pools is high in (PoW-based) Ethereum (in 2019, 3 pools controlled a majority of mining power [119]), power within the pools is spread across hundreds of addresses [182], albeit some possibly owned by the same parties.
Some systems use a committee-based approach, as opposed to Bitcoin’s open participation model. Here, at any given time a known designated party proposes a block and a committee of participants votes on it. The following are examples of such systems, where each employs its own consensus protocol and defines a different number of participants per epoch using an on-chain process:[35] i) Cosmos: 175; ii) Polkadot: 297; iii) EOS: 21; iv) Harmony: 800; v) NEAR: 100. In all these systems, well-known exchanges are among the top elected validators.[36] Interestingly, the stake controlled by the elected validators is mostly delegated, instead of self-owned. Also, organizations often control multiple validators, so the number of real actors is often even smaller than the nominal number of participants (nevertheless some systems, e.g., Polkadot, go to greater lengths to ensure the representative participation satisfies desirable properties such as proportionality, cf. [38]). Consequently, identifying the participation distribution among real-world users and the refreshment rate of the elected committee across multiple epochs is an interesting research question. The same holds for investigating all the desiderata of representative participation from a social choice perspective.
C.8 Cryptocurrency Economics: (Initial) Token Distribution
PoS systems like Cardano, NEO, and Algorand tried to reduce early-stage risks via a two-phase launch. At first, the ledger was controlled by either the core development company or foundation, or by a small committee of entities. After token ownership was sufficiently distributed, participation opened widely to all stakeholders. Beyond the obvious issues in maintaining a permissioned database, the first phase typically takes years to conclude. Early users often tend to either not participate or transfer their tokens to the few exchanges that support these new tokens [174]. Therefore, an interesting question is how the delay between launch and full decentralization relates to the diversity of early investors.
C.9 Cryptocurrency Economics: Token Ownership
Bitcoin’s wealth ownership and transaction graph have been analyzed since at least 2012 [153]. Over time, it demonstrated a three-phase history of distinct (de)centralization patterns, with 100 addresses concentrating a large share of the assets and wealth flow in the network [42,157]. Similar analyses exist for Ethereum [41], Zcash [97], and other cryptocurrencies [130].
As of 2022, cryptocurrency wealth concentration is particularly extreme (Table 2). To establish some context, the income Gini coefficient of the 10 lowest-performing countries ranges between 0.512 and 0.63 [179]. Bitcoin has a Gini coefficient of 0.514, considering only the 10,000 richest addresses, and a staggering 0.955 w.r.t. all addresses. In the arguably deeply unequal global real-world economy, the richest 0.01% of individuals (520,000 people) hold 11% of all wealth [85]. Bitcoin manages to beat that figure, with 100 addresses holding 14.01% of all tokens.
A complexity in measuring wealth decentralization in cryptocurrencies arises due to their pseudonymous (or even anonymous) nature. Specifically, the number of addresses often does not correspond to individual people or entities, cf. [153,126]. A user may control multiple addresses, e.g., each with a small balance. When interpreting the Gini coefficient, this artificially enlarges the population and possibly biases the results towards decentralization. In addition, an address’s assets may be owned by many users (e.g., exchange addresses), which biases Gini towards centralization. Thus, developing tools to compute wealth inequality in blockchain systems, without sacrificing core features like anonymity and privacy, is a crucial problem for exploration.
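The two biases described above can be illustrated with a small calculation. The sketch below uses the standard rank-based formula for the Gini coefficient of a finite sample; the balances are invented for illustration and do not come from any real ledger.

```python
def gini(balances):
    """Gini coefficient of non-negative balances.

    Uses G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    with x sorted ascending and ranks i starting at 1.
    """
    xs = sorted(balances)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive balance")
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Bias towards decentralization: one wealthy owner splits 100 tokens
# across ten addresses, so the address-level view looks equal.
owners_a = [10, 10, 10, 100]
addresses_a = [10] * 13

# Bias towards centralization: eight equal owners keep their tokens
# in a single exchange address.
owners_b = [10] * 10
addresses_b = [10, 10, 80]

print(gini(owners_a), gini(addresses_a))
print(gini(owners_b), gini(addresses_b))
```

In the first scenario the address-level coefficient is lower than the owner-level one, and in the second it is higher, even though the underlying ownership did not change in either case.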
C.10 Cryptocurrency Economics: Secondary Markets
Table 3 summarizes secondary blockchain market data across 121 exchanges. Many systems (Bitcoin, Ethereum, Litecoin, XRP) are traded on all but a few small exchanges. Tether is by far the most available, in terms of market pairs, and used, in terms of volume. Interestingly, for all systems, except perhaps Bitcoin, the majority of volume is not of the highest transparency.[37] This is consistent with reports that show market manipulation is endemic in cryptocurrency markets, with multiple cases of wash trading, fake trading volumes, and other fraudulent behavior [46,29,21]. Market transactions are primarily conducted in a handful of exchanges. By far the most used is Binance (20% of the total daily volume),[38] although Coinbase is the most recognized in North America [137].
C.11 Client API
In Bitcoin, most wallets are either SPV or explorer-based [100]. In the first case, the wallet obtains the chain’s headers and, to verify that a transaction is published, requests a proof from full nodes. Although SPV does mitigate safety attacks, it also hurts the user’s privacy, as their transaction information is leaked to the full node operators. Explorer-based wallets instead rely entirely on a single explorer service and its full nodes, which are trusted completely. In 2018, 5−10% of all Ethereum nodes reportedly relied on a centralized blockchain API service, Infura [138]. This reliance continued throughout the years. In 2020, a service outage demonstrated in practice the hazards of such centralization [178]. In 2022, a misconfiguration on Infura’s part resulted in wallets (and, thus, user funds) being inaccessible [45]. In terms of applications, OpenSea is the leading hosting service for NFTs. As of 2021, it reportedly handled 98% of all NFT volume [181], charging a 2.5% commission on all sales. As expected, an OpenSea outage in 2022 also resulted in the NFT market being practically unusable [72].
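The SPV check mentioned at the start of this subsection amounts to verifying a Merkle inclusion proof against a root stored in a block header. Below is a simplified sketch using Bitcoin's double SHA-256 and its rule of duplicating the last node on odd-sized levels. It omits byte-order handling, header parsing, and proof-of-work validation, and the function names are illustrative rather than taken from any wallet library.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    """Double SHA-256, as used for Bitcoin's Merkle tree nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root_and_proof(leaves, index):
    """Full-node side: return (root, proof) for the leaf at `index`.

    The proof is a list of (sibling_hash, sibling_is_left) pairs,
    ordered from the leaf level up to the root.
    """
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate last node on odd levels
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = [dsha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify_inclusion(leaf, proof, root):
    """Wallet side: recompute the root from the leaf and the proof."""
    node = leaf
    for sibling, sibling_is_left in proof:
        node = dsha256(sibling + node) if sibling_is_left else dsha256(node + sibling)
    return node == root


# Usage with five placeholder transaction hashes.
txids = [dsha256(bytes([i])) for i in range(5)]
root, proof = merkle_root_and_proof(txids, 3)
print(verify_inclusion(txids[3], proof, root))
```

The privacy cost noted above is visible in the interface: to obtain `proof`, the wallet must tell the full node which transaction it is interested in.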
C.12 Governance: Improvements and Conflict Resolution
Most systems employ an Improvement Proposal mechanism, where proposals are raised as issues on GitHub, a (centralized) system that is extensively used for software development. If a change gathers enough support, it is incorporated in the codebase. To voice approval for proposals, miners often include encoded messages in blocks. From early on, proposals in Bitcoin and Ethereum have been made by a handful of developers [83,11]. In the discussion phase, many people participate but again only a few actors contribute most comments, while in cases like Bitcoin the groups of developers and commenters largely overlap.
C.13 Governance: Development Funding
Most existing blockchain systems follow the first approach, i.e., not making funding provisions. In many cases, funding is channeled through a few foundations and companies.[39] Treasuries are present in some ledgers, like Decred, Cardano, and Dash. Despite their potential though, widespread funding has yet to be demonstrated for most systems.[40]
C.14 Geography
In terms of legal jurisdiction, different aspects are centralized in different countries. In Bitcoin, the 4 companies that predominantly produce mining hardware[43] are all based in China. Regarding secondary markets, many exchanges operate in multiple countries (Table 3); 20 of 121 operate in the USA, thus falling under US jurisdiction, 17 in China, and 10 in Japan, with the rest spread across the world. However, only 8 are based in the US, with most registered in the Seychelles (13) and other “offshore” locations. Many ICOs also exclude US investors, following their US classification as securities [170]. Finally, an interesting case concerns the Bitcoin Core software, which is not available via bitcoin.org in the UK, following a related court ruling [9].
[31] Solana and Avalanche are #9 and #14 respectively w.r.t. market capitalization. [CoinMarketCap; August 2022]
[32] validators.app. [August 2022]
[33] Data obtained from avascan.info. [August 2022]
[34] Sources: blockchair, ethernodes [August 2022]
[35] Sources: hub.cosmos.network, wiki.polkadot.network, developers.eos.io, docs.harmony.one, near.org [August 2022]
[36] For example, Binance is a validator in all mentioned systems. [Cosmos, Polkadot, EOS, Harmony, NEAR; August 2022]
[37] For “transparency” see the methodology and data of Nomics: https://nomics.com/blog/essays/transparency-ratings.
[38] CoinMarketCap [August 2022]
[39] The first usually take the name of the token, e.g., the {Bitcoin, Ethereum, Cardano} Foundations. Examples of the second are the ASIC companies discussed in Section 3 or software companies like Blockstream (Bitcoin), Consensys (Ethereum), Input Output (Cardano), etc.
[40] Decred’s treasury holds $23.8M, and has allocated $250K over the past year. Cardano’s treasury holds approx. $500M and has distributed $17.2M across 939 projects. Dash, one of the first systems to set a treasury, allocated $500,000 over 2018, but it appears non-functional as of 2022. [dcrdata.decred.org, cardano.ideascale.com, dashvotetracker; August 2022]
[41] 46.5% of Bitcoin’s nodes operate over Tor. [bitnodes; August 2022]
[42] bitinfocharts.com
[43] Bitmain, MicroBT, Canaan, Ebang