Paradigm: A Detailed Explanation of Ethereum's Historical Growth Issues and Solutions
EIP-4444 can solve the historical growth issue of Ethereum and make room for increasing the Gas limit.
Original Title: How to Raise the Gas Limit, Part 2: History Growth
Original Authors: Storm Slivkoff, Georgios Konstantopoulos
Original Translation: Luffy, Foresight News
History growth is currently the biggest bottleneck for Ethereum scalability. Surprisingly, history growth has become a bigger issue than state growth. Within a few years, historical data will exceed the storage capacity of many Ethereum nodes.
The good news is:
- History growth is an easier problem to solve than state growth.
- Solutions are actively being developed.
- Solving history growth will alleviate the state growth problem.
In this article, we will continue to explore Ethereum scalability issues from Part 1, shifting the focus from state growth to history growth. Using detailed datasets, our goal is to 1) technically understand Ethereum's scalability bottlenecks, and 2) facilitate discussions on the optimal solutions surrounding Ethereum Gas limits.
What is History Growth?
History is the collection of all blocks and transactions executed by Ethereum throughout its entire lifecycle, encompassing all data from the genesis block to the current block. History growth involves the accumulation of new blocks and transactions over time.
Figure 1 illustrates the relationship between history growth and various protocol metrics and Ethereum node hardware constraints. Unlike state growth, history growth is subject to a different set of hardware constraints. History growth puts pressure on network IO as new blocks and transactions need to be transmitted across the entire network. It also stresses node storage space as each Ethereum node stores a complete copy of the historical records. If history growth accelerates beyond these hardware limits, nodes will no longer be able to achieve stable consensus with their peers. For an overview of state growth and other scalability bottlenecks, please refer to Part 1 of this series.
Figure 1: Ethereum Scalability Bottlenecks
Until recently, most of each node's network throughput was used to transmit history (such as new blocks and transactions). This changed with the introduction of blobs in the Dencun hard fork. Blobs now account for a significant portion of node network activity. However, blobs are not considered part of history because 1) nodes store them for only 2 weeks and then discard them, and 2) they are not needed to replay the chain from the genesis block. Because of (1), blobs do not significantly increase the storage burden of each Ethereum node. We will discuss blobs later in this article.
In this article, we will focus on history growth and discuss the relationship between history and state. Since state growth and history growth share some overlapping hardware constraints, they are related issues, and solving one can help address the other.
How Fast is History Growth?
Figure 2 shows Ethereum's history growth rate since genesis. Each vertical line represents one month of growth. The y-axis shows the amount of history added in that month, in gigabytes. Transactions are categorized by their "to address", and their size is measured in bytes of their RLP encoding (https://ethereum.org/en/developers/docs/data-structures-and-encoding/rlp/). Contracts that cannot be easily identified are classified as "unknown." The "other" category covers a long tail of small categories such as infrastructure and gaming.
Figure 2: Ethereum Historical Growth Rate Over Time
Several key points from the above chart:
- History growth rate is 6 to 8 times faster than state growth: History growth rate recently peaked at 36.0 GiB/month, currently at 19.3 GiB/month. State growth rate peaked at around 6.0 GiB/month, currently at 2.5 GiB/month. A comparison of historical and state growth in terms of growth and cumulative size will be discussed later in this article.
- Prior to Dencun, the history growth rate was accelerating: While state growth has been roughly linear over the years (see Part 1), history growth has been superlinear. Since a linearly increasing growth rate produces quadratic growth in total size, a superlinearly increasing growth rate produces faster-than-quadratic growth in total size. This acceleration stopped abruptly after Dencun, which marked the first significant decrease in Ethereum's history growth rate.
- Most recent history growth comes from Rollups: Each L2 publishes a copy of its transactions back to the mainnet. This generates a large amount of history and made Rollups the largest contributor to history growth over the past year. However, Dencun allows L2s to publish their transaction data in blobs instead of history, so Rollups no longer generate most of Ethereum's history. We will cover Rollups in more detail later in this article.
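The arithmetic behind the first two points can be checked with a short sketch. The four rates are the figures quoted above; the rising series of monthly rates is made up purely to illustrate why a linearly increasing rate gives quadratic total size, and the function names are illustrative rather than taken from the authors' analysis.

```python
# Figures quoted in the bullets above, in GiB per month.
HISTORY_PEAK, HISTORY_NOW = 36.0, 19.3
STATE_PEAK, STATE_NOW = 6.0, 2.5


def growth_ratio(history_rate: float, state_rate: float) -> float:
    """How many times faster history grows than state."""
    return history_rate / state_rate


def cumulative_size(monthly_rates):
    """Total size after each month, given the growth in each month."""
    sizes, total = [], 0.0
    for rate in monthly_rates:
        total += rate
        sizes.append(total)
    return sizes


# Illustrative only: a growth rate that rises by 1 GiB/month every month.
linear_rates = [float(month) for month in range(1, 13)]
sizes = cumulative_size(linear_rates)

print(f"peak ratio:    {growth_ratio(HISTORY_PEAK, STATE_PEAK):.1f}x")
print(f"current ratio: {growth_ratio(HISTORY_NOW, STATE_NOW):.1f}x")
# With rate(m) = m, the total after n months is n * (n + 1) / 2, i.e. quadratic in n.
print(f"size after 12 months: {sizes[-1]} GiB")
```

The two ratios bracket the "6 to 8 times" range, and the last line shows the total growing with the square of elapsed time when the rate itself grows linearly.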
Who are the Biggest Contributors to Ethereum Historical Growth?
The amount of history generated by different contract categories reveals how Ethereum's usage patterns have evolved over time. Figure 3 shows the relative contributions of various contract categories. This data is normalized from the same data as Figure 2.
Figure 3: Contributions of Different Contract Categories to Historical Growth
These data unveil four distinct periods of Ethereum usage patterns:
- Early Period (Purple): There was minimal on-chain activity in Ethereum's initial years. Many of these early contracts are now difficult to identify, marked as "unknown" in the chart.
- ERC-20 Era (Green): The ERC-20 standard was proposed in late 2015 but gained significant traction only in 2017 and 2018. ERC-20 contracts became the largest source of historical growth in 2019.
- DEX/DeFi Era (Brown): DEX and DeFi contracts appeared on-chain as early as 2016 and gained attention in 2017. However, it wasn't until the DeFi summer of 2020 that they became the largest category of historical growth. DeFi and DEX contracts occupied over 50% of historical growth in parts of 2021 and 2022.
- Rollup Era (Gray): In early 2023, L2 Rollups began executing more transactions than the mainnet. In the months leading up to Dencun, they generated about 2/3 of Ethereum's historical records.
Each era represents a more complex pattern of Ethereum usage than the one before it. This growing complexity can be seen as a form of Ethereum scaling that cannot be measured by simple metrics like transactions per second.
In the most recent month of data (April 2024), Rollups no longer generate the majority of historical records. It is currently unclear whether future history will come mainly from DEX and DeFi, or whether new usage patterns will emerge.
What About Blobs?
The Dencun hard fork introduced blobs, significantly altering the dynamics of historical growth by allowing Rollup to use inexpensive blobs instead of historical records to publish data. Figure 4 zooms in on the impact of Dencun on historical growth rates before and after the upgrade. The chart is similar to Figure 2, except each vertical line represents a day instead of a month.
Figure 4: Impact of Dencun on Historical Growth
Key conclusions drawn from this chart:
- Since Dencun, Rollup's historical growth has decreased by about 2/3: Most Rollups have transitioned from call data to blobs, significantly reducing the volume of historical records they generate. However, as of April 2024, some Rollups have yet to transition from call data to blobs.
- Since Dencun, total historical growth has decreased by about 1/3: Dencun only reduced the historical growth of Rollups. The historical growth of other contract categories has slightly increased. Even after Dencun, historical growth is still 8 times that of state growth (see next section for details).
Although blobs have reduced the rate of historical growth, they are still a new feature of Ethereum. It is currently unclear at what level the historical growth rate will stabilize with the existence of blobs.
How Much Historical Growth is Acceptable?
Increasing the Gas limit will increase the historical growth rate. Therefore, proposals to increase the Gas limit (such as Pump the Gas) must consider the relationship between historical growth and hardware bottlenecks for each node.
To determine an acceptable historical growth rate, it helps to first understand how long current node hardware can keep up in terms of network and storage. Networking hardware can probably sustain the status quo indefinitely, because historical growth is unlikely to return to its pre-Dencun peak unless the Gas limit is raised. The storage burden of history, however, will keep increasing over time. Under the current storage strategy, it is inevitable that each node's disk will eventually fill up with historical records.
Figure 5 shows the storage burden of Ethereum nodes over time and predicts the growth of the storage burden in the next 3 years. The prediction refers to the growth rate in April 2024. With changes in future usage patterns or Gas limits, this growth rate may increase or decrease.
Figure 5: Size of historical records, state, and full node storage burden
From this figure, we can draw several key conclusions:
- The storage space occupied by historical records is approximately 3 times that of the state. This difference will increase over time as the historical growth rate is about 8 times that of the state.
- 1.8 TiB is a critical threshold at which many nodes will be forced to upgrade their storage disks. 2 TB is a common disk size, and it provides only about 1.8 TiB of usable space. Note that TB (10^12 bytes) and TiB (1024^4 bytes) are different units. For many node operators, the "real" critical threshold is even lower because post-Merge validators must run a consensus client alongside their execution client.
- The critical threshold will be reached within 2 to 3 years. Any increase in the Gas limit will bring this date forward accordingly. Reaching the threshold will impose a significant maintenance burden on node operators and require them to buy additional hardware (e.g., a $300 NVMe drive).
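The "2 to 3 years" estimate can be reproduced with a back-of-the-envelope sketch. It uses figures quoted in this article: a current full-node burden of about 1.2 TiB, a history growth rate of 19.3 GiB/month, and a state growth rate of 2.5 GiB/month. Two things are assumptions made here for illustration and are not claims from the authors' data: that growth stays constant, and that growth scales in proportion to the Gas limit. The 1.5x multiplier is hypothetical.

```python
GIB_PER_TIB = 1024


def tb_to_tib(terabytes: float) -> float:
    """Convert decimal terabytes (10^12 bytes) to binary tebibytes (1024^4 bytes)."""
    return terabytes * 10**12 / 1024**4


def months_until_full(current_tib: float, threshold_tib: float,
                      growth_gib_per_month: float) -> float:
    """Months until storage reaches the threshold at a constant growth rate."""
    remaining_gib = (threshold_tib - current_tib) * GIB_PER_TIB
    return remaining_gib / growth_gib_per_month


CURRENT_TIB = 1.2      # full-node storage burden quoted in this article
THRESHOLD_TIB = 1.8    # usable space on a 2 TB disk
GROWTH = 19.3 + 2.5    # history + state growth, GiB per month (April 2024 rates)

baseline = months_until_full(CURRENT_TIB, THRESHOLD_TIB, GROWTH)

# Assumption: growth scales linearly with the Gas limit (hypothetical 1.5x increase).
GAS_LIMIT_MULTIPLIER = 1.5
raised = months_until_full(CURRENT_TIB, THRESHOLD_TIB, GROWTH * GAS_LIMIT_MULTIPLIER)

print(f"2 TB disk = {tb_to_tib(2):.2f} TiB")
print(f"current Gas limit: {baseline / 12:.1f} years until the threshold")
print(f"1.5x Gas limit:    {raised / 12:.1f} years until the threshold")
```

At the April 2024 rates, the baseline is a little over 28 months, which sits inside the 2 to 3 year window. The proportional-scaling case only shows the direction of the effect; real growth depends on how the additional gas is used.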
Unlike state data, historical data is append-only and accessed much less frequently. In theory, historical data can therefore be stored separately from state data on cheaper storage media. Some clients, such as Geth, already support this.
In addition to storage capacity, network IO is another major constraint on historical growth. Unlike storage capacity, network IO constraints will not pose problems for nodes in the short term, but these constraints will become important for future increases in Gas limits.
To understand how much historical growth the network capacity of a typical Ethereum node can support, it is necessary to know the relationship between historical growth and various network health metrics, such as reorg rate, slot misses, finality misses, attestation misses, sync committee misses, and block submission delays. The analysis of these metrics is beyond the scope of this article but can be found in previous surveys of consensus layer health. Additionally, the Ethereum Foundation's Xatu project has been building public datasets to expedite such analysis.
How to Address Historical Growth Issues?
Historical growth is an easier problem to solve than state growth. It can be almost entirely addressed by the proposed EIP-4444. Under this EIP, each node would store only one year of historical data instead of Ethereum's entire history. After EIP-4444 is implemented, data storage will no longer be a bottleneck for Ethereum scalability, and in the long run it will no longer constrain Gas limit increases. EIP-4444 is necessary for the long-term sustainability of the network; otherwise, history will grow fast enough to force regular hardware upgrades across the network's nodes.
Figure 6 shows the impact of EIP-4444 on the storage burden of each node over the next 3 years. It is the same as Figure 5, with lighter lines added to represent the storage burden after EIP-4444 is implemented.
Figure 6: Impact of EIP-4444 on Ethereum node storage burden
From this figure, several key conclusions can be drawn:
- EIP-4444 will halve the current storage burden. The storage burden will decrease from 1.2 TiB to 633 GiB.
- EIP-4444 will stabilize historical storage burden. Assuming a constant historical growth rate, historical data will be discarded at the rate it is generated.
- After EIP-4444, it will take many years for node storage burden to reach today's levels. This is because state growth will be the only factor increasing the storage burden, and the growth rate of the state is slower than that of historical growth.
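The last two conclusions can be sketched numerically. The rates and sizes below are the figures quoted in this article; the constant growth rates and the 12-month rolling window that starts from empty storage are simplifying assumptions for illustration, not a model of any client's pruning logic.

```python
from collections import deque


def stored_history(monthly_growth_gib, retention_months=12):
    """History kept on disk each month when only the most recent months are retained."""
    window = deque(maxlen=retention_months)
    stored = []
    for growth in monthly_growth_gib:
        window.append(growth)  # once the window is full, the oldest month drops out
        stored.append(sum(window))
    return stored


def years_to_regrow(start_gib, target_gib, growth_gib_per_month):
    """Years for storage to climb from start to target at a constant monthly rate."""
    return (target_gib - start_gib) / growth_gib_per_month / 12


# Assumption: history keeps growing at the April 2024 rate of 19.3 GiB/month.
history = stored_history([19.3] * 36)

# After EIP-4444 only state growth (2.5 GiB/month) adds to the burden.
years = years_to_regrow(start_gib=633, target_gib=1.2 * 1024, growth_gib_per_month=2.5)

print(f"history stored after 1 year:  {history[11]:.1f} GiB")
print(f"history stored after 3 years: {history[35]:.1f} GiB")
print(f"years to return to 1.2 TiB:   {years:.1f}")
```

With a constant rate, the stored history plateaus as soon as the window is full, because each new month replaces an expired one. Under these assumptions the return to today's burden is on the order of two decades, which is what "many years" amounts to if state growth does not accelerate.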
Even after implementing EIP-4444, historical growth will still bring a certain level of storage burden as nodes will store one year of historical records. However, even if Ethereum reaches a global scale, this burden is not difficult to address. Once the method of storing historical records is proven reliable, the one-year expiration time of EIP-4444 may be shortened to a few months, weeks, or even shorter.
How to Store Ethereum's Historical Records?
EIP-4444 raises a question: if Ethereum nodes no longer keep historical records themselves, how should those records be preserved? Historical records play a crucial role in Ethereum's verification, accounting, and analysis, so preserving them is essential. Fortunately, preserving history is a simple problem that requires only 1 of n data providers to be honest. This is in stark contrast to state consensus, which requires between 1/3 and 2/3 of participants to be honest. Node operators can verify the authenticity of any historical dataset by 1) replaying all transactions since the genesis block and 2) checking that these transactions reproduce the same state root as the current tip of the chain.
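The replay-and-compare check can be illustrated with a toy model. Everything here is a simplified stand-in: balances in a dictionary replace Ethereum's account state, a SHA-256 hash of the sorted balances replaces the real state root (Ethereum uses a Merkle Patricia trie), and the function names are hypothetical. The point is only that a single trusted root is enough to accept or reject a history obtained from any untrusted provider.

```python
import hashlib
import json


def state_root(balances: dict) -> str:
    """Toy stand-in for a state root: a hash over the sorted balances."""
    encoded = json.dumps(sorted(balances.items())).encode()
    return hashlib.sha256(encoded).hexdigest()


def replay(genesis: dict, transactions: list) -> dict:
    """Re-execute every transfer from genesis and return the final balances."""
    balances = dict(genesis)
    for sender, recipient, amount in transactions:
        if balances.get(sender, 0) < amount:
            raise ValueError("history contains an invalid transaction")
        balances[sender] -= amount
        balances[recipient] = balances.get(recipient, 0) + amount
    return balances


def history_is_authentic(genesis: dict, transactions: list, trusted_root: str) -> bool:
    """Accept a downloaded history only if replaying it reproduces the trusted root."""
    try:
        return state_root(replay(genesis, transactions)) == trusted_root
    except ValueError:
        return False


genesis = {"alice": 10, "bob": 0}
honest_history = [("alice", "bob", 4), ("bob", "carol", 1)]
tampered_history = [("alice", "bob", 5), ("bob", "carol", 1)]

# The root a node already trusts, e.g. from following the chain tip.
trusted_root = state_root(replay(genesis, honest_history))

print(history_is_authentic(genesis, honest_history, trusted_root))
print(history_is_authentic(genesis, tampered_history, trusted_root))
```

Because the check depends only on the data and the trusted root, it does not matter who served the history, which is why one honest provider out of n is sufficient.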
There are many methods for saving historical records.
- Torrents/P2P: Torrents are the simplest and most reliable method. Ethereum nodes can periodically package parts of the historical record and share them as public Torrent files. For example, a node might create a new history Torrent file every 100,000 blocks. Node clients like Erigon already do this to some extent, in a non-standardized way. To standardize the process, all node clients must use the same data format, parameters, and P2P network. Nodes will be able to choose whether to participate in this network based on their storage and bandwidth capabilities. The advantage of Torrents is that they are a highly Lindy open standard supported by a large ecosystem of data tools.
- Portal Network: The Portal Network is a new network designed for hosting Ethereum data. This is a method similar to Torrents but also provides some additional features to make data verification easier. The advantage of the Portal Network is that these additional verification layers provide utilities for light clients to effectively verify and query shared datasets.
- Cloud Hosting: Cloud storage services like AWS's S3 or Cloudflare's R2 provide a cheap and high-performance option for saving historical records. However, this method brings more legal and operational risks because these cloud services cannot guarantee that they will always be willing and able to host cryptocurrency data.
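Whichever distribution method is used, clients first need to agree on how history is cut into shareable pieces. The following is a minimal sketch of fixed-size chunking, using the 100,000-block interval from the Torrent example above. The archive naming scheme and function names are hypothetical and do not correspond to any existing client's format.

```python
CHUNK_SIZE = 100_000  # blocks per archive, from the example above


def chunk_for_block(block_number: int, chunk_size: int = CHUNK_SIZE):
    """Return the (first, last) block of the chunk that contains block_number."""
    start = (block_number // chunk_size) * chunk_size
    return start, start + chunk_size - 1


def completed_chunks(head_block: int, chunk_size: int = CHUNK_SIZE):
    """All chunks whose last block is at or below the current head."""
    count = (head_block + 1) // chunk_size
    return [(i * chunk_size, (i + 1) * chunk_size - 1) for i in range(count)]


def archive_name(start: int, end: int) -> str:
    """Hypothetical file name for one archive of history."""
    return f"history-{start:09d}-{end:09d}"


head = 250_000
for start, end in completed_chunks(head):
    print(archive_name(start, end))
print("head block lives in chunk", chunk_for_block(head))
```

Deterministic boundaries matter because every participant must produce byte-identical archives for the same block range; otherwise peers cannot seed the same files.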
The remaining implementation challenges are more social challenges than technical challenges. The Ethereum community needs to coordinate specific implementation details to integrate them directly into each node client. In particular, executing a full sync from the genesis block (rather than snapshot sync) will require retrieving historical records from historical record providers rather than Ethereum nodes. These changes do not require a hard fork technically, so they can be implemented earlier than Ethereum's next hard fork, Pectra.
All of these history storage methods can also be used by L2s to store the blob data they post to the mainnet. Compared to history storage, blob storage is 1) more challenging because the total data volume is much larger, and 2) less critical because blobs are not needed to replay mainnet history. However, blob storage is still necessary for each L2 to replay its own history, so some form of blob storage matters for the entire Ethereum ecosystem. Additionally, if L2s develop robust blob storage infrastructure, they may also be able to store L1 historical data easily.
It is helpful to compare directly the datasets stored by different node configurations before and after EIP-4444. Figure 7 shows the storage burden of different types of Ethereum nodes. State data consists of accounts and contracts, historical data consists of blocks and transactions, and archive data is a set of optional data indexes. The byte counts in this table are based on the most recent Reth snapshot, but the numbers for other node clients should be roughly equivalent.
Figure 7: Storage burden of different types of Ethereum nodes
In other words,
- Archive nodes store state data, historical data, and archive data. When someone wants to easily query the historical chain state, they can use an archive node.
- Full nodes only store historical data and state data. Most nodes today are full nodes. The storage burden of a full node is about half that of an archive node.
- After EIP-4444, full nodes only store state data and the most recent year of historical data. This reduces the node's storage burden from 1.2 TiB to 633 GiB and stabilizes the storage space for historical data.
- Stateless nodes, also known as "light nodes," do not store any of these datasets and can begin verifying at the tip of the chain immediately. This type of node becomes possible once Verkle tries or another state commitment scheme is added to Ethereum.
Finally, there are some additional EIPs that can limit historical growth rates, not just adapt to the current growth rate. This helps to stay within network IO constraints in the short term and within storage constraints in the long term. Although EIP-4444 is still necessary for the network's long-term sustainability, these other EIPs will help Ethereum scale more efficiently in the future:
- EIP-7623: Repricing call data to make transactions with excessive call data more expensive. Making these usage patterns more expensive will push some of them to move from call data to blobs. This will reduce the historical growth rate.
- EIP-4488: Imposing limits on the total amount of call data that can be included in each block. This will impose stricter limits on the rate of historical record growth.
These EIPs are easier to implement than EIP-4444, so they may serve as short-term measures before EIP-4444 is put into production.
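To show how a per-block call data limit bounds historical growth, here is a simplified sketch. It is not the EIP-4488 specification: the greedy selection rule is a stand-in, the transaction type is hypothetical, and the 1 MiB cap is used only as a round illustrative number. The 12-second slot time is Ethereum's actual slot duration.

```python
from dataclasses import dataclass


@dataclass
class Tx:
    tx_id: str
    calldata_bytes: int


def select_under_calldata_cap(pending, max_calldata_bytes):
    """Greedily include transactions while total call data stays under the cap."""
    included, used = [], 0
    for tx in pending:
        if used + tx.calldata_bytes <= max_calldata_bytes:
            included.append(tx)
            used += tx.calldata_bytes
    return included, used


def max_monthly_growth_gib(cap_bytes, slot_seconds=12, days=30):
    """Upper bound on call data added per month if every block hits the cap."""
    blocks = days * 24 * 3600 // slot_seconds
    return blocks * cap_bytes / 1024**3


pending = [Tx("a", 600), Tx("b", 500), Tx("c", 300)]
included, used = select_under_calldata_cap(pending, max_calldata_bytes=1000)
print([tx.tx_id for tx in included], used)

# Illustrative cap of 1 MiB per block (assumption, not a proposed parameter).
print(f"{max_monthly_growth_gib(1024**2):.1f} GiB/month worst case")
```

A hard cap turns the worst-case growth rate into a simple product of cap size and block count, which makes the storage and network planning discussed earlier much easier to reason about.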
Conclusion
The purpose of this article is to use data to understand 1) how historical growth works and 2) how the problem can be addressed. Much of the data in this article is difficult to obtain through traditional means, so we hope it provides some new insights into the historical growth issue.
Historical growth as a bottleneck for Ethereum scalability has not received enough attention. Even without increasing the Gas limit, the current practice of Ethereum storing historical records will force many nodes to upgrade their hardware within a few years. Fortunately, this is not an insurmountable problem. There is already a clear solution in EIP-4444. We believe that the implementation of this EIP should be accelerated to make room for future Gas limit increases.
Disclaimer: The content of this article solely reflects the author's opinion and does not represent the platform in any capacity. This article is not intended to serve as a reference for making investment decisions.