The storage demand on Ethereum is growing continuously, which brings significant challenges. This article examines the causes of the problem and discusses proposed solutions and the future outlook. It is based on an article by EthStorage, compiled and translated by Deep Reinsurance.
Table of Contents:
Background
Challenges of storage
Ethereum storage roadmap and its consequences
Solution 1: Ethereum Portal Network
Solution 2: EthStorage Network
Future outlook
On October 22, 2023, Péter Szilágyi, the lead developer of Go-Ethereum (Geth), voiced his deep concern on Twitter. He pointed out that while the Geth client retains all historical data, other Ethereum clients such as Nethermind and Besu can be configured to delete certain historical data (e.g., old blocks and headers). This leads to inconsistent behavior across clients and is unfair to Geth. The tweet sparked intense discussion and debate about how Ethereum storage is handled in the Ethereum roadmap.
Why did Nethermind and Besu choose to stop storing historical data? What are the issues behind this decision? From our perspective, there are two main reasons:
1. The storage requirements of Ethereum clients are continuously increasing.
2. There is no protocol-level incentive or punishment for storing Ethereum historical data.
The first reason stems from the ever-growing storage demands of Ethereum clients. To get a concrete picture of the requirements, the pie chart below shows the storage breakdown of a freshly synced Geth node as of block 18,779,761 on December 13, 2023.
[Image: storage breakdown of a fresh Geth node at block 18,779,761]
As shown in the chart:
Total storage size: 925.39 GB
Historical data (blocks / transaction receipts): approximately 628.69 GB
State data in Merkle Patricia Trie (MPT): approximately 269.74 GB
The second reason is the lack of protocol-level incentives or penalties for storing historical blocks. While the protocol requires nodes to store all historical data, it provides no mechanism to reward storage or punish non-compliance. Storing and sharing historical data is therefore purely altruistic, and client operators are free to delete or modify it without consequence. In contrast, validator nodes are required to maintain and update the full state locally to avoid being slashed for proposing or voting for invalid blocks.
Therefore, it is not surprising that some node operators choose to delete historical data when storage costs become a significant burden. Without historical data, node clients can significantly reduce storage costs from approximately 1TB to around 300GB.
With the upcoming Ethereum Data Availability (DA) upgrade, the storage challenge will intensify.
The roadmap to fully scale Ethereum DA began with EIP-4844 in the Dencun upgrade, which introduced a fixed-size binary large object (BLOB) and an independent fee model based on blobGasPrice. Each BLOB is 128KB, and EIP-4844 allows a maximum of 6 BLOBs per block. To scale data throughput further, Ethereum plans to adopt 1D Reed-Solomon erasure codes, initially allowing 32 BLOBs per block and reaching 256 BLOBs per block at full scale.
If Ethereum DA is fully deployed at maximum capacity (256 BLOBs per block), the Ethereum DA network is expected to receive approximately 80TB of DA data per year, far exceeding the storage capacity of most nodes.
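A quick back-of-the-envelope calculation reproduces the ~80TB figure, assuming 128KB per BLOB, 256 BLOBs per block, and 12-second blocks:

```python
# Rough estimate of yearly DA data at full scale.
BLOB_SIZE_KB = 128
BLOBS_PER_BLOCK = 256      # full-scale target
BLOCK_TIME_S = 12

blocks_per_year = 365 * 24 * 3600 // BLOCK_TIME_S             # 2,628,000 blocks
kb_per_year = blocks_per_year * BLOBS_PER_BLOCK * BLOB_SIZE_KB
print(f"DA data per year: ~{kb_per_year / 1024**3:.1f} TB")   # ~80.2 TB
```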
Vitalik’s tweet about the Ethereum roadmap includes “The Purge”, the part of the roadmap primarily concerned with storage: pruning historical data and reducing what clients must keep.
The increasing storage costs have caught the attention of Ethereum ecosystem researchers. To address the issue and keep all clients consistent, researchers are drafting proposals that explicitly allow deleting historical data. The two major proposals are:
1. EIP-4444: Bound historical data in execution clients: This proposal allows clients to delete historical blocks older than one year. Assuming an average block size of 100KB, the retained historical block data would be at most around 250GB (100KB * (3600 * 24 * 365) / 12, assuming a 12-second block time).
2. EIP-4844: Shard BLOB transactions: EIP-4844 lets clients discard BLOBs older than about 18 days. This is a more aggressive bound than EIP-4444’s, limiting retained BLOB data to around 100GB ((18 * 3600 * 24) * 128KB * 6 / 12, assuming a 12-second block time). Both estimates are reconstructed in the sketch after this list.
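For concreteness, the short Python sketch below redoes both calculations with the same assumptions as above (100KB average block size, up to 6 BLOBs of 128KB per block, 12-second block time); the exact output differs slightly from the quoted round numbers depending on GB vs. GiB conventions.

```python
SECONDS_PER_BLOCK = 12
KB_PER_GIB = 1024**2

# EIP-4444: one year of history at ~100KB per block.
blocks_per_year = 365 * 24 * 3600 // SECONDS_PER_BLOCK               # 2,628,000 blocks
history_kb = blocks_per_year * 100
print(f"EIP-4444 history bound: ~{history_kb / KB_PER_GIB:.0f} GB")  # ~251 GB

# EIP-4844: ~18 days of BLOBs at up to 6 BLOBs of 128KB per block.
blocks_18_days = 18 * 24 * 3600 // SECONDS_PER_BLOCK                 # 129,600 blocks
blob_kb = blocks_18_days * 6 * 128
print(f"EIP-4844 BLOB bound:   ~{blob_kb / KB_PER_GIB:.0f} GB")      # ~95 GB
```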
What are the consequences if all clients delete historical data? One major issue is that new nodes can no longer synchronize to the latest state via “full sync”, i.e., by executing every transaction from the genesis block up to the latest block. Instead, they have to resort to “snap sync” or “state sync”, which fetches the latest state directly from other Ethereum nodes. This approach is already implemented in Geth and is its default synchronization mode.
The same consequence applies to all Layer 2 (L2) solutions: new L2 nodes cannot fully synchronize the latest L2 state by replaying transactions from the L2 genesis to the latest L2 block. Furthermore, since L1 nodes do not maintain L2 state, an L2 “snap sync” cannot derive the latest L2 state from L1 either, which undermines the key assumption that L2s inherit Ethereum’s security guarantees. The proposed workaround is to rely on third-party services such as Infura, Etherscan, or the L2 teams themselves to keep copies of historical L2 data or state – a centralized solution sustained only by off-chain, indirect incentives.
The core questions we want to explore are:
1. Can we find better decentralized solutions for storage and access?
2. Is it possible to have a solution that is directly incentivized, protocol-consistent (e.g., on top of L1 contracts), and aligned with Ethereum?
3. Based on all of this, can we provide a fully decentralized, protocol-level incentivized solution for Ethereum storage in the roadmap?
The Ethereum Portal Network is a lightweight, decentralized access network for the Ethereum protocol. It exposes Ethereum JSON-RPC interfaces such as eth_call and eth_getBlockByNumber by translating JSON-RPC requests into P2P requests over a distributed hash table (DHT), similar to the IPFS network. Unlike IPFS, which stores arbitrary data and is therefore susceptible to junk data, the Portal P2P network hosts only Ethereum data such as historical block headers and block transaction data, enforced by the light-client verification built into the Portal network.
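For illustration, this is roughly what using such a network looks like from an application’s point of view: the request is an ordinary JSON-RPC call, and the client answers it from the DHT rather than from local full-node storage. The local endpoint and port below are assumptions for the sketch, not something the Portal specification prescribes.

```python
# A minimal sketch of querying a node over standard Ethereum JSON-RPC.
# Assumption: a Portal client (or any Ethereum client) is serving JSON-RPC
# locally at http://127.0.0.1:8545; the endpoint and port are illustrative.
import requests

def get_block_by_number(block_number: int, endpoint: str = "http://127.0.0.1:8545"):
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "eth_getBlockByNumber",
        # hex-encoded block number; False = return transaction hashes only
        "params": [hex(block_number), False],
    }
    response = requests.post(endpoint, json=payload, timeout=10)
    response.raise_for_status()
    return response.json()["result"]

if __name__ == "__main__":
    block = get_block_by_number(18_779_761)
    print(block["hash"], block["parentHash"])
```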
One important feature of the Portal network is its lightweight implementation design and compatibility with resource-constrained devices. It can run on nodes with only a few megabytes of storage space and low memory, thus promoting decentralization. Even mobile devices or Raspberry Pi devices can join the network and contribute to the availability of Ethereum data.
The development of the Portal network reflects Ethereum’s client-diversity philosophy, with clients written in Rust, JavaScript, and Nim. The beacon and history networks are already available, while the state network is under active development. It is worth noting that the Portal network provides no direct incentives for data storage – all nodes in the network operate altruistically.
EthStorage is a decentralized, incentivized storage network designed specifically for storing EIP-4844 BLOBs; its development is supported by a grant from the Ethereum Foundation’s Ecosystem Support Program (ESP).
Minimal trust: Unlike existing solutions that require a trusted, centralized data bridge, EthStorage relies on Ethereum consensus plus a 1-of-m trust model over permissionless EthStorage storage nodes. Storing a BLOB works as follows: the user signs a transaction carrying the BLOB and calls the put(key, blob_idx) method of the storage contract; the contract records the BLOB hash on chain; storage providers then download the BLOB directly from the Ethereum DA network and store it, so no data bridge is needed.
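The toy Python model below is only meant to illustrate this flow; the names (DANetwork, StorageContract, StorageProvider) are hypothetical stand-ins, and real EIP-4844 BLOBs are referenced via KZG versioned hashes rather than a plain SHA-256 hash.

```python
# Simplified, illustrative model of the BLOB storage flow described above.
import hashlib

class DANetwork:
    """Stand-in for the Ethereum DA layer that temporarily holds BLOBs."""
    def __init__(self):
        self.blobs = {}                         # blob hash -> blob bytes (pruned after ~18 days)

    def publish(self, blob: bytes) -> bytes:
        h = hashlib.sha256(blob).digest()
        self.blobs[h] = blob
        return h

class StorageContract:
    """Stand-in for the on-chain storage contract: it records only hashes."""
    def __init__(self):
        self.index = {}                         # key -> blob hash

    def put(self, key: str, blob_hash: bytes):
        self.index[key] = blob_hash

class StorageProvider:
    """Permissionless node that fetches BLOBs from the DA layer and keeps them."""
    def __init__(self):
        self.store = {}

    def sync(self, contract: StorageContract, da: DANetwork):
        for key, h in contract.index.items():
            if key not in self.store and h in da.blobs:
                self.store[key] = da.blobs[h]   # no bridge: read directly from DA

# Example: a user publishes a 128KB BLOB and registers its hash on chain.
da, contract, provider = DANetwork(), StorageContract(), StorageProvider()
blob = b"\x00" * (128 * 1024)
contract.put("my-file", da.publish(blob))
provider.sync(contract, da)
assert hashlib.sha256(provider.store["my-file"]).digest() == contract.index["my-file"]
```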
Storage cost aligned with incentives: When calling put(), the transaction must include a storage fee (via msg.value), which is deposited into the contract. As storage nodes submit storage proofs that are verified on chain, the fee is gradually paid out to them over time. Compared with Ethereum’s existing model, which pays a one-time storage fee to the proposer, paying the fee out over time follows a discounted cash flow model – it assumes that storage costs will keep falling relative to the ETH price. This is a key innovation of EthStorage: the cost paid matches the storage actually contributed by nodes over time.
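A minimal sketch of the payment-over-time idea, assuming (purely for illustration) a 10% annual discount rate and yearly payouts; EthStorage’s actual parameters and payout formula may differ.

```python
# Sketch of a discounted-cash-flow payout: each period the node earns a slice
# of the remaining deposit, on the assumption that storage keeps getting
# cheaper relative to ETH, so the deposit never needs topping up.
def payout_schedule(prepaid_fee_eth: float, annual_discount: float = 0.10, years: int = 5):
    remaining = prepaid_fee_eth
    schedule = []
    for year in range(1, years + 1):
        payout = remaining * annual_discount    # paid to nodes with valid proofs
        remaining -= payout
        schedule.append((year, payout, remaining))
    return schedule

for year, payout, remaining in payout_schedule(1.0):
    print(f"year {year}: payout {payout:.4f} ETH, remaining deposit {remaining:.4f} ETH")
```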
Storage proofs: The storage proofs are inspired by data availability sampling; in EthStorage, the stored BLOBs are sampled repeatedly over time. To verify the sampled data efficiently on chain, EthStorage makes full use of smart contracts and recent advances in SNARK technology.
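As a heavily simplified illustration of the sampling idea: the verifier keeps only a short commitment (here a Merkle root; the real design uses BLOB commitments plus SNARKs for cheap on-chain verification) and challenges the prover to open a randomly chosen chunk of the stored BLOB.

```python
# Toy sampling-based storage check: commit to chunks, challenge one at random.
import hashlib, os, random

CHUNK = 4096  # bytes per sample (illustrative)

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    layer = [h(x) for x in leaves]
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

def merkle_proof(leaves, index):
    layer, proof = [h(x) for x in leaves], []
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])
        proof.append(layer[index ^ 1])          # sibling hash at this level
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
        index //= 2
    return proof

def verify(root, leaf, index, proof):
    node = h(leaf)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root

# Prover stores the BLOB; verifier stores only the root and issues challenges.
blob = os.urandom(128 * 1024)
chunks = [blob[i:i + CHUNK] for i in range(0, len(blob), CHUNK)]
root = merkle_root(chunks)

challenge = random.randrange(len(chunks))       # random sample index
proof = merkle_proof(chunks, challenge)         # prover's response
assert verify(root, chunks[challenge], challenge, proof)
```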
Permissionless operation: Any storage node in EthStorage can earn rewards as long as they store data and regularly submit storage proofs on the chain.
From a modular-blockchain perspective, EthStorage acts as an Ethereum storage Layer 2, except that it charges storage fees rather than transaction fees. By indexing BLOB hashes on chain, EthStorage serves as a modular storage layer on Ethereum, improving storage scalability and reducing storage costs (targeting roughly a 1000x reduction).
In terms of development, EthStorage has been integrated with EIP-4844 on the Ethereum Sepolia testnet. We have conducted stress testing on EthStorage and the Ethereum Sepolia testnet, including writing hundreds of GBs of BLOBs to EthStorage. Over 100 community participants have joined the network and successfully demonstrated their local storage.
The main advantage of the EthStorage network is its decentralized, protocol-level direct incentivization on top of Ethereum – to our current knowledge, this is a groundbreaking feature. However, the limitation of this network is that it is designed specifically for fixed-size BLOBs.
Although Ethereum storage has not received much attention, it is of great importance to the Ethereum ecosystem. As the Ethereum network grows rapidly, storing and accessing Ethereum data become crucial challenges. Both the Portal network and the EthStorage network are still at an early stage, and several important long-term directions remain:
1. A decentralized, low-latency network for Ethereum state data: Accessing Ethereum state in a decentralized and verifiable manner is critical but challenging. With a traditional DHT model, looking up a single account often requires multiple queries for internal trie nodes held by different P2P nodes, which leads to considerable latency. Finding ways to exploit the structure of the state trie to speed up access is crucial. The upcoming state network in the Ethereum Portal network aims to address this problem.
2. Integration of the Portal network with the EthStorage network: The Portal network can seamlessly expand to support BLOB data. The EthStorage team has partially implemented this feature. The next step is to unify these networks, providing a decentralized JSON-RPC network that can programmatically access BLOBs through contracts. By combining the application logic in contracts with the scalable BLOB storage provided by EthStorage, we can enable new dApps on Ethereum, such as dynamic decentralized websites (e.g., decentralized Twitter/YouTube/Wikipedia).
3. Decentralized access from browsers: Just as the ipfs:// protocol gives browsers access to data in the IPFS network, web3 needs a native Ethereum access protocol that supports direct browser access, unlocking the enormous potential of Ethereum’s rich data. This data spans everything from token ownership and account balances to NFT images and dynamic decentralized websites, all powered by smart contracts and future Ethereum storage. Here, the web3:// protocol defined by ERC-4804/6860 is under active development and promotion to achieve this goal (a simplified addressing sketch follows this list).
4. Advanced storage proofs for dynamic-sized data: In addition to fixed BLOBs, exploring advanced storage proofs is crucial for addressing dynamic-sized data (e.g., historical blocks or even state objects). Developing sophisticated algorithms can enhance the adaptability of storage solutions.
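To make item 3 above concrete, the sketch below gives a deliberately simplified reading of web3:// addressing: the host names a contract (an address or ENS name, with an optional chain id), and the path names a method and its arguments. The actual ERC-4804/6860 specification additionally covers argument typing, return-type hints, ENS resolution, and a manual resolution mode, none of which are modeled here.

```python
# Simplified, illustrative parser for web3:// URLs (not the full ERC-4804/6860 grammar).
from urllib.parse import urlparse

def parse_web3_url(url: str):
    parsed = urlparse(url)
    assert parsed.scheme == "web3", "not a web3:// URL"
    host, _, chain_id = parsed.netloc.partition(":")
    segments = [s for s in parsed.path.split("/") if s]
    method, args = (segments[0], segments[1:]) if segments else (None, [])
    return {
        "contract": host,                        # address or ENS name
        "chain_id": int(chain_id) if chain_id else 1,
        "method": method,                        # contract method to call
        "args": args,                            # positional call arguments
    }

# Hypothetical example: read tokenURI(1) from an NFT contract named by ENS.
print(parse_web3_url("web3://example-nft.eth/tokenURI/1"))
```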
Through these efforts, we hope to contribute to the Ethereum roadmap and lay the groundwork for decentralized storage solutions in the future Ethereum ecosystem.