Ethereum's state is a data structure that needs to be implicitly constructed, stored and accessed in order to execute arbitrary transactions, because a transaction may theoretically access any item in the current state. The state size grew beyond the capacity of RAM (Random Access Memory) on average computers some time in 2017. After that point, RAM could only be used to cache certain portions of the state, whereas the entirety of the state resides on persistent storage devices. Any caching strategy apart from keeping a random portion of the accessible state would be vulnerable to attack, because an adversary who can predict what is cached could craft transactions that touch only uncached items. Therefore, assuming a random caching strategy, the cache hit ratio would be very close to the ratio of the size of the cache to the size of the entire state, and every cache miss would mean accessing devices with much higher latency. Due to the further growth of the state, it became impractical to use HDDs (Hard Disk Drives, storage devices with mechanically spinning disks) for storage of the state, due to their high access latency. Even SSDs (Solid State Drives) are on the edge of being appropriate. Devices such as NVM (Non-Volatile Memory) are now required to ensure good performance. However, such devices are still relatively expensive, and their price relative to capacity ($/GB) is highly non-linear beyond a certain point.
State size also places a significant burden on new participants in the Ethereum network. The most popular way of joining the network at this moment is so-called "snapshot synchronisation", the process in which the new joiner downloads the entire state from existing peers. The sheer size of the state puts a high demand on bandwidth quality. Dealing with network latencies requires sophisticated download algorithms. And the ever-changing nature of the state (it keeps changing during the download) either puts snapshot sync at odds with state history pruning, or requires even more sophisticated algorithms for downloading the state.
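To make the earlier caching argument concrete, here is a minimal sketch of the arithmetic: under a random caching strategy the hit ratio is roughly cache size divided by state size, and the average access latency is the hit/miss weighted mix of the two device latencies. The function names, the sizes and the latencies below are assumptions chosen only for illustration, not measurements of any Ethereum client.

```python
def cache_hit_ratio(cache_bytes: float, state_bytes: float) -> float:
    """Hit ratio of a cache holding a uniformly random portion of the state."""
    if state_bytes <= 0:
        raise ValueError("state_bytes must be positive")
    # A cache larger than the state cannot do better than hitting every time.
    return min(1.0, cache_bytes / state_bytes)


def expected_access_latency(hit_ratio: float, cache_latency_s: float,
                            storage_latency_s: float) -> float:
    """Average latency of one state access, weighting hits and misses."""
    return hit_ratio * cache_latency_s + (1.0 - hit_ratio) * storage_latency_s


GIB = 2**30
# Hypothetical sizes and latencies (assumptions, not measured values).
ratio = cache_hit_ratio(cache_bytes=8 * GIB, state_bytes=32 * GIB)
latency = expected_access_latency(ratio, cache_latency_s=100e-9,
                                  storage_latency_s=100e-6)
print(f"hit ratio: {ratio:.2f}, expected latency: {latency * 1e6:.1f} microseconds")
```

With numbers of this shape the average is dominated by the miss latency, which is why the choice of storage device matters so much once the state no longer fits in RAM.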
Stateless clients are one of the approaches to improving the performance of Ethereum client implementations while processing blocks of transactions. Specifically, the approach seeks to alleviate the increasing burden of the state size. It does so by removing the need to download, or implicitly construct and maintain, the state for the majority of the participants in the Ethereum network. The requirement of access to the state is removed by introducing another type of data packet (existing data packet types are, for example, blocks and transactions) to be gossiped around the p2p network. We call these data packets "block witnesses". For each block there is one corresponding block witness. Block witnesses have two main properties:
- It is possible to efficiently verify that the block witness is indeed constructed from the correct version of the Ethereum state.
- A block witness has all the information required to execute the corresponding block.
More details can be found here: https://medium.com/@akhounov/data-from-the-ethereum-stateless-prototype-8c69479c8abc
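The two properties of block witnesses can be illustrated with a toy example. The sketch below is a deliberate simplification and not the proposed witness format: it uses a binary Merkle tree with SHA-256, whereas Ethereum's state is a hexary Merkle Patricia trie hashed with Keccak-256. All names (`leaf`, `branch`, `pruned`, `verify_witness`, `witness_values`) are hypothetical. The witness is the part of the tree a block touches, with every untouched subtree replaced by its hash, so the verifier can recompute the root and compare it with the state root it already trusts.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf(key: bytes, value: bytes):
    return ("leaf", key, value)


def branch(left, right):
    return ("branch", left, right)


def node_hash(node) -> bytes:
    """Hash of a node; a pruned subtree contributes its stored hash."""
    kind = node[0]
    if kind == "hash":
        return node[1]
    if kind == "leaf":
        _, key, value = node
        return h(b"\x00" + len(key).to_bytes(4, "big") + key + value)
    if kind == "branch":
        _, left, right = node
        return h(b"\x01" + node_hash(left) + node_hash(right))
    raise ValueError(f"unknown node kind: {kind!r}")


def pruned(node):
    """Replace a subtree the block does not touch by its hash."""
    return ("hash", node_hash(node))


def verify_witness(witness, expected_state_root: bytes) -> bool:
    """Property 1: the witness must hash to the known state root."""
    return node_hash(witness) == expected_state_root


def witness_values(node) -> dict:
    """Property 2: the values available for executing the block."""
    kind = node[0]
    if kind == "leaf":
        return {node[1]: node[2]}
    if kind == "branch":
        values = witness_values(node[1])
        values.update(witness_values(node[2]))
        return values
    return {}  # pruned subtree: nothing readable, only a hash


# Full state, held only by the node that produces the witness.
right_half = branch(leaf(b"c", b"3"), leaf(b"d", b"4"))
full_state = branch(branch(leaf(b"a", b"1"), leaf(b"b", b"2")), right_half)
state_root = node_hash(full_state)

# Witness for a block that reads only key b"a".
witness = branch(branch(leaf(b"a", b"1"), pruned(leaf(b"b", b"2"))),
                 pruned(right_half))
# The same witness with a forged value for b"a".
tampered = branch(branch(leaf(b"a", b"9"), pruned(leaf(b"b", b"2"))),
                  pruned(right_half))

print(verify_witness(witness, state_root), verify_witness(tampered, state_root))
```

A stateless client in this toy model would first call `verify_witness` against the state root it already knows, and only then execute the block against `witness_values(witness)`.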
In order to make stateless clients a reality and get them implemented in all Ethereum 1 clients, the following large steps seem to be necessary, up to the testnet launch:
- Revisit the prototype (currently based on Turbo-Geth). The previous version of the prototype, which was used to produce the data in the Medium article linked above, had a different encoding for the structure than what is currently proposed (based on the "multiproofs" section here: https://github.com/ledgerwatch/turbo-geth/blob/master/docs/programmers_guide/guide.md#multiproofs).
- Refresh the data. Using the updated prototype, produce new data, and add non-smoothed charts (previously published charts were a moving average over 1024 blocks to make them smoother, but they potentially concealed some extremes).
- Extend the prototype to generate block witnesses for ranges of past blocks. Witnesses may be produced not just for a single block, but for a range of blocks. This may make more sense for past blocks, to speed up the sync process. If a block witness is 1 MB per block, then for a million blocks the cumulative witness size would be 1 TB. However, if we produced one witness per 1024 blocks, the ratio "cumulative witness size/cumulative block size" would go down. The goal of this step is to find the optimal value of this ratio and assess whether it can be used to speed up syncing by producing and making available witnesses for all past blocks.
- Visualisation in the prototype. Some initial steps have been taken (e.g. visualisations for the presentation at the STARKWare sessions). This needs to be extended to provide interactive visualisations of the state and block witnesses, to help extend the circle of researchers and developers dealing with stateless clients.
- Prototype to support semi-stateless. The semi-stateless approach is described here: https://medium.com/@akhounov/the-shades-of-statefulness-in-ethereum-nodes-697b0f88cd04, but gathering the data was still computationally hard. Data needs to be collected and published on how the block witness size reduces with the degree of statefulness, and how much memory various degrees of statefulness would require.
- Analysis of potential adversarial behaviour. Using the prototype developed, assess how much gas an adversary would need to pay to inflate the size of the block witness. This analysis will be the input to the gas cost adjustment that the introduction of stateless clients would require.
- Block witness specification. A formal (implementation-independent) description of how block witnesses can be produced, how to encode them into and decode them from wire packets, and how to execute transactions on a block witness.
- Partial block witness specification. This includes the case of semi-stateless block witnesses, tailored for a specified degree of statefulness.
- New subprotocol specification. Specify a new, optional subprotocol for gossiping block witnesses (full, partial, and for ranges of blocks) around the network.
- Gas cost specification. Based on the analysis of the adversarial behaviour, specify how the transaction senders will be charged for the production of block witnesses.
- Testnet. Construct a testnet supporting block witnesses. Having the specifications from the previous steps would allow any client implementation to catch up and join.
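The size arithmetic in the step on witnesses for block ranges can be sketched as follows. The first function reproduces the back-of-the-envelope figure (1 MB per block over a million blocks). The second is a toy model, under the assumption that a range witness needs to include each touched state item only once, so its size is driven by the union of the items touched within the range; counting items is only a rough proxy for witness bytes and ignores the hashes a real witness also carries. The function names and the sample data are hypothetical.

```python
MB = 10**6
TB = 10**12


def cumulative_witness_bytes(witness_bytes_per_block: int, num_blocks: int) -> int:
    """Total size when every block carries its own witness."""
    return witness_bytes_per_block * num_blocks


def range_witness_items(touched_per_block: list, range_size: int) -> list:
    """Number of distinct state items in each range witness (toy model)."""
    sizes = []
    for start in range(0, len(touched_per_block), range_size):
        blocks = touched_per_block[start:start + range_size]
        # Items touched by several blocks in the range are counted once.
        sizes.append(len(set().union(*blocks)))
    return sizes


total = cumulative_witness_bytes(1 * MB, 1_000_000)
print(total == 1 * TB)

# Hypothetical access pattern: the state items touched by four blocks.
touched = [{"a", "b"}, {"b", "c"}, {"a", "c"}, {"d"}]
for size in (1, 2, 4):
    print(size, sum(range_witness_items(touched, size)))
```

In this model the cumulative item count can only stay the same or shrink as the range grows, and how much it shrinks depends on how much the blocks in a range overlap, which is what the refreshed data would need to measure.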
Anyone supporting the research and development of stateless clients would want to see which steps have been completed. All steps starting from step 2 have a tangible deliverable, available for public scrutiny. It would be either a publication that contains data analysis, or a specification document, or a publicly available network (testnet). Therefore we propose to "check-point" financial support on these deliverables.