Arweave: Humanity’s permanent store of knowledge

Alex Beckett
Sep 22, 2021

Disclosures
The purpose of this report is purely informational and educational. It is not to be construed as investment advice.

Introduction

Information preservation across time has been essential to passing down and storing humanity’s collective knowledge. Historically, much of this knowledge has been kept in physical objects that are susceptible to decay and vulnerable to damage. Even when preserved, such objects remain at risk of loss or alteration, and alteration in particular distorts our knowledge of the past. While much of our collective knowledge has moved to digital media, we still face the same problems.

Arweave is a protocol purpose-built for sustainable, permanent data storage at scale: a network where knowledge and content can be stored forever without the potential to be altered, censored, or removed. It also acts as the layer 1 for the perma web, where applications can be published to the internet in an immutable and decentralized manner. In keeping with the original ethos of the internet, it provides a platform for individuals to communicate and interact freely.

Network design

Blockweave, proof of access

Blockweave is Arweave’s purpose-built “blockchain”. Combined with its proof of access consensus mechanism, it can store data permanently, at scale, and at low cost. The blockweave links each new block to two earlier blocks: the immediately preceding block and a block from the network’s previous history (the recall block). This makes the blockweave less like a traditional blockchain and more like a modified graph structure.
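
To make the two-link structure concrete, here is a minimal Python sketch of a weave in which each block commits to both its predecessor and a recall block. The field layout, the use of SHA-256, and the recall rule (hash of the previous block modulo the current height) are illustrative assumptions, not Arweave’s actual block format or selection algorithm.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Block:
    height: int
    prev_hash: bytes    # link 1: the immediately preceding block
    recall_hash: bytes  # link 2: a block drawn from earlier history
    data: bytes

    def digest(self) -> bytes:
        # Hash header fields and payload together so both links are committed to.
        h = hashlib.sha256()
        h.update(self.height.to_bytes(8, "big"))
        h.update(self.prev_hash)
        h.update(self.recall_hash)
        h.update(self.data)
        return h.digest()


def recall_index(prev_hash: bytes, height: int) -> int:
    # Assumed rule: a pseudo-random earlier block, seeded by the previous block's hash.
    return int.from_bytes(hashlib.sha256(prev_hash).digest(), "big") % height


def append_block(chain: list[Block], data: bytes) -> Block:
    prev = chain[-1]
    height = len(chain)
    recall = chain[recall_index(prev.digest(), height)]
    block = Block(height, prev.digest(), recall.digest(), data)
    chain.append(block)
    return block


chain = [Block(0, b"", b"", b"genesis")]
for i in range(5):
    append_block(chain, f"payload {i}".encode())
```

Because the recall block is derived from the previous block’s hash, nobody can know in advance which historical block the next one will point to.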

Miners are also not required to store the entire blockweave. A new miner can join the network by downloading only the current block from a group of trusted peers, or by “verifying some backward portion of proofs of work”. Because miners decide how much data they store, this lowers the barrier to entry, and each additional miner effectively increases the network’s storage capacity. Overall, this increases the potential for decentralisation, which, if realised, strengthens the security and censorship resistance of the data stored on the network.

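Proof of access is Arweave’s consensus mechanism. To mine a new block, a miner must prove it has access to the recall block, a randomly selected block from the network’s history, so miners that store more of the weave are eligible to mine more often. This drives miners to compete to store as many replications of data as possible. The incentive to store data is greater when there is excess capacity in the network, which results in faster data access times.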
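
The incentive can be illustrated with a small simulation, assuming (hypothetically) that each round picks a recall block uniformly at random and that a miner can only compete when it holds that block. It is a simplification of proof of access, not its real implementation, but it shows why storing more of the weave pays.

```python
import random


def eligible(stored: set[int], recall: int) -> bool:
    # A miner can only produce a proof for this round if it holds the recall block.
    return recall in stored


def eligibility_rate(stored: set[int], height: int, rounds: int, seed: int = 0) -> float:
    # Fraction of simulated rounds in which the miner could compete.
    rng = random.Random(seed)
    hits = sum(eligible(stored, rng.randrange(height)) for _ in range(rounds))
    return hits / rounds


height = 1_000
small_miner = set(range(100))  # stores 10% of the weave
large_miner = set(range(800))  # stores 80% of the weave
print(eligibility_rate(small_miner, height, 10_000))  # roughly 0.10
print(eligibility_rate(large_miner, height, 10_000))  # roughly 0.80
```

Under these assumptions a miner’s share of eligible rounds tracks the share of the weave it stores, which is the pressure that pushes miners to replicate more data.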

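To ensure the permanence and immutability of stored data, each piece of data is bound into the network’s history: blocks reference earlier blocks by their hashes, so changing any stored data changes its hash and breaks the links that later blocks depend on. Attempts to attack, remove, or alter data in the network are therefore detected by other nodes and rejected.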
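
A simplified hash chain shows why tampering is detectable. This sketch keeps only one link per block (Arweave’s blocks carry more fields and links) and uses SHA-256 as an assumed hash function.

```python
import hashlib
from typing import List, Optional, Tuple

Chain = List[Tuple[bytes, str]]  # (payload, recorded hash) pairs


def link_hash(prev_hash: str, data: bytes) -> str:
    # Each hash commits to the block's own data AND to the previous block's hash.
    return hashlib.sha256(prev_hash.encode() + data).hexdigest()


def build(payloads: List[bytes]) -> Chain:
    chain: Chain = []
    prev = ""
    for data in payloads:
        prev = link_hash(prev, data)
        chain.append((data, prev))
    return chain


def first_invalid(chain: Chain) -> Optional[int]:
    """Index of the first block whose recorded hash doesn't match, or None."""
    prev = ""
    for i, (data, recorded) in enumerate(chain):
        if link_hash(prev, data) != recorded:
            return i
        prev = recorded
    return None


chain = build([b"block 0", b"block 1", b"block 2"])
chain[1] = (b"altered", chain[1][1])  # tamper with the data, keep the old hash
print(first_invalid(chain))  # 1

# Recomputing block 1's hash just moves the break to block 2's link.
chain[1] = (b"altered", link_hash(chain[0][1], b"altered"))
print(first_invalid(chain))  # 2
```

An attacker would have to rewrite every later block, which honest nodes holding the original hashes would reject.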

Wildfire

As Arweave’s state grows, it is paramount that data can be accessed quickly and efficiently. Wildfire is a social ranking system in which each node locally ranks its peers on the quality of their behaviour, such as how quickly they respond to requests for blocks and transactions and how reliably they accept shared data. Because nodes serve their peers in order of rank, and nodes are financially incentivised to perform well, the system encourages positive behaviour: good performance earns better ratings, creating a positive feedback loop. Poorly performing nodes are deprioritised and can end up cut off from the network. The outcome is faster node performance and better data accessibility.
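
A node’s local peer ranking might look roughly like the following. This is a hypothetical sketch: it scores peers only on response time, using an exponential moving average, and the smoothing factor and drop threshold are made-up values rather than Wildfire’s actual parameters.

```python
from typing import Dict, List

ALPHA = 0.3            # hypothetical smoothing factor for the moving average
DROP_AFTER_MS = 2_000  # hypothetical cutoff: slower peers get dropped


class PeerRanking:
    def __init__(self) -> None:
        self.avg_ms: Dict[str, float] = {}

    def record(self, peer: str, response_ms: float) -> None:
        # Update this node's *local* view of the peer; no global coordination.
        old = self.avg_ms.get(peer)
        if old is None:
            self.avg_ms[peer] = response_ms
        else:
            self.avg_ms[peer] = ALPHA * response_ms + (1 - ALPHA) * old

    def ranked(self) -> List[str]:
        # Fastest peers first: they are served and queried first.
        return sorted(self.avg_ms, key=self.avg_ms.__getitem__)

    def prune(self) -> List[str]:
        # Remove peers whose average response time exceeds the cutoff.
        dropped = [p for p, ms in self.avg_ms.items() if ms > DROP_AFTER_MS]
        for p in dropped:
            del self.avg_ms[p]
        return dropped
```

Each node keeps its own `PeerRanking`, so rankings can differ between nodes; the network-wide effect comes from slow peers being deprioritised by many nodes at once.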

Network mechanics

AR is the native token of Arweave. Users spend it to pay for transactions and data storage on the network. Transactions fall into two categories: value transactions, which only transfer AR tokens, and data transactions, which carry data to be stored on the network. Both categories share the same transaction structure, and this flexibility benefits Arweave’s smart contract layer. On the receiving end, miners are rewarded with AR in the form of block rewards; in return for mining a block and accepting its AR, a miner is also required to store the relevant data.
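
The point that value and data transactions share one structure can be sketched with a single type whose fields are simply filled or left empty. The field names follow Arweave’s transaction format (target, quantity, data, reward) and amounts are in winston (1 AR = 10^12 winston), but the type is heavily simplified and omits fields such as the owner, tags, and signature.

```python
from dataclasses import dataclass
from typing import Set

WINSTON_PER_AR = 10**12


@dataclass
class Transaction:
    # Simplified subset of fields; real transactions carry more.
    target: str    # recipient wallet address, empty if none
    quantity: int  # AR transferred, in winston
    data: bytes    # payload to store permanently, empty if none
    reward: int    # fee paid to the network, in winston


def categories(tx: Transaction) -> Set[str]:
    # The same structure serves both categories; only field contents differ.
    kinds = set()
    if tx.quantity > 0:
        kinds.add("value")
    if tx.data:
        kinds.add("data")
    return kinds


payment = Transaction("recipient-address", 2 * WINSTON_PER_AR, b"", 1_000)
upload = Transaction("", 0, b"<html>...</html>", 50_000)
```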

Data storage

As demand for permanent and immutable data storage grows, the network needs a scalable, sustainable, and low-cost way of providing it. On Arweave, the price of storing data is derived from the estimated cost of storing it in perpetuity. That estimate starts from a price per gigabyte-hour, which takes into account factors such as the lowest market price of a hard drive, the drive’s capacity, and the mean time between drive failures. The cost of perpetual storage is then calculated as the infinite sum of these per-period costs, which decline over time.
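
The calculation can be sketched as follows, under simplifying assumptions: a single copy of the data, a drive’s MTBF standing in for its working life, and storage costs falling by a constant fraction d each year. With a first-year cost c, the perpetual cost is the geometric series c + c(1 - d) + c(1 - d)^2 + ..., which converges to c / d. The drive price, capacity, MTBF, and 10% decline rate below are hypothetical inputs, not Arweave’s parameters.

```python
HOURS_PER_YEAR = 24 * 365


def cost_per_gb_hour(drive_price: float, capacity_gb: float, mtbf_hours: float) -> float:
    # Spread the drive's price over every GB it holds for every hour it runs.
    return drive_price / (capacity_gb * mtbf_hours)


def perpetual_cost_per_gb(first_year_cost: float, annual_decline: float) -> float:
    # Sum of c * (1 - d)**t for t = 0, 1, 2, ... converges to c / d for 0 < d <= 1.
    if not 0 < annual_decline <= 1:
        raise ValueError("annual_decline must be in (0, 1]")
    return first_year_cost / annual_decline


# Hypothetical inputs: a $300, 16,000 GB drive with a 1,000,000-hour MTBF.
year_one = cost_per_gb_hour(300, 16_000, 1_000_000) * HOURS_PER_YEAR
print(perpetual_cost_per_gb(year_one, 0.10))  # 10x the first-year cost
```

The key property is that a finite upfront amount covers an infinite horizon only because costs keep falling; with d = 0 the series would diverge, which is why the function rejects it.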

The benefit of a technology with a declining cost curve is that it gets cheaper over time. In the 7 years from 2013 to 2020, hard drive (HDD) and solid-state drive (SSD) prices experienced CAGRs of -13% and -33% respectively, driven by continual growth in data density and production efficiency, in line with both Moore’s law and Wright’s law. Over the next 9 years, HDD and SSD prices are projected to keep falling, at CAGRs of -5.4% and -26.1% respectively. As such, storage on the network should become cheaper over time.
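
Compounding these projections shows how much they matter. The sketch below applies the projected CAGRs to today’s price over 9 years; it is plain compound-growth arithmetic, not an independent forecast.

```python
def project(price: float, cagr: float, years: int) -> float:
    # cagr is negative for a decline, e.g. -0.054 for -5.4% per year.
    return price * (1 + cagr) ** years


hdd = project(1.0, -0.054, 9)  # share of today's HDD price left after 9 years
ssd = project(1.0, -0.261, 9)  # same for SSDs
print(f"HDD: {hdd:.0%}, SSD: {ssd:.0%}")  # HDD: 61%, SSD: 7%
```

If the projections hold, HDD prices would fall to roughly 61% of today’s level and SSD prices to roughly 7%.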

Blockshadows

Typically, blockchains distribute each full block to every node in the network. This limits how much data a block can include and how quickly it can be transferred across the network. Arweave addresses this by distributing a blockshadow: a compact version of the block that includes only the data necessary to identify its contents. Once a node receives a blockshadow, it can reconstruct the full block from data it already holds, without receiving all of the block’s contents. Because data isn’t re-sent to nodes that already possess it, blockshadows reduce block propagation times and network congestion. This shortens the time the network needs to reach consensus, making it faster and more efficient.
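
The reconstruction step can be sketched as follows, assuming (for illustration) that a blockshadow carries the block header plus the IDs of its transactions, and that the receiving node already holds most of those transactions in its mempool. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class BlockShadow:
    header: str        # block metadata (simplified to a string here)
    tx_ids: List[str]  # transaction IDs only, not the transactions themselves


def reconstruct(shadow: BlockShadow, mempool: Dict[str, bytes]) -> Tuple[List[bytes], List[str]]:
    """Rebuild the block's transactions from the local mempool.

    Returns (transactions found locally, IDs that must still be fetched from peers).
    """
    found, missing = [], []
    for tx_id in shadow.tx_ids:
        if tx_id in mempool:
            found.append(mempool[tx_id])
        else:
            missing.append(tx_id)
    return found, missing


mempool = {"t1": b"tx one", "t2": b"tx two"}
found, missing = reconstruct(BlockShadow("header", ["t1", "t2", "t3"]), mempool)
```

Only the IDs in `missing` need to be fetched, so a node that has already seen the block’s transactions downloads almost nothing beyond the shadow itself.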

Storage endowment

To store data on Arweave, users pay upfront, in AR tokens, for 200 years of storage, and these fees go into an endowment. The principal needed is calculated based on the cost of storing data in perpetuity. Like a traditional endowment, the principal earns a yield over time, in this case in the form of accrued storage purchasing power. This is used to offset miner expenses when miners are operating at a loss, that is, when block rewards don’t cover their operational expenditure.
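
The upfront fee for a 200-year horizon is a finite geometric series. The sketch below assumes a constant annual decline in storage costs; the 0.5% rate in the example is a hypothetical conservative figure, not Arweave’s actual pricing parameter.

```python
def upfront_fee(annual_cost: float, annual_decline: float, years: int = 200) -> float:
    """Sum of annual_cost * (1 - annual_decline)**t for t in [0, years)."""
    r = 1 - annual_decline
    if r == 1:
        # No decline: simply pay today's cost for every year.
        return annual_cost * years
    # Closed form of a finite geometric series.
    return annual_cost * (1 - r ** years) / (1 - r)


print(upfront_fee(1.0, 0.005))  # roughly 126.6 years' worth of today's cost
```

The more conservative (smaller) the assumed decline, the larger the fee; if real costs fall faster than assumed, the surplus stays in the endowment.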

Endowment contributions are calculated with a conservative estimate to account for fluctuations in the AR price and other factors. The endowment will keep accruing value while miners remain operationally profitable, until the network needs to subsidise costs, which is not expected for years. Overall, the endowment structure, aided by the declining cost of storage over time, underpins the network’s long-term health and viability.

Decentralized content policy

Over time, large platforms emerged on the internet with rules governing the type of content appropriate for the platform. Arweave has created a content policy that decentralises the process that would typically be fulfilled by a sole arbitrator. This mitigates the problem of allowing a central party to dictate the type of content allowed. Participants are free to choose the type of content they want to store on the network, while nodes are free to reject content they are unwilling to store.

Content policies are created at the local level and managed with blacklists. To decide whether a block is suitable for it to store, a node checks the block’s transactions against its own blacklist and labels the block acceptable or not. While blacklists are created locally, they can also be enforced at the network level through social consensus: if certain content is rejected by more than half of the nodes in the network, it is rejected by the whole network. Through many varying local blacklists and this social consensus, the network converges on a common set of blacklisted content. Only a very small subset of content ends up on this network-wide blacklist; everything outside it receives the censorship resistance and immutability that Arweave provides.
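
The majority rule can be sketched in a few lines. This is an illustrative model only: it assumes each node’s blacklist is a set of transaction IDs and that all local blacklists are visible at once, whereas in practice consensus emerges from nodes individually accepting or rejecting blocks.

```python
from collections import Counter
from typing import Iterable, Set


def accept_block(tx_ids: Iterable[str], local_blacklist: Set[str]) -> bool:
    # A node labels a block unacceptable if any transaction is on its own blacklist.
    return not any(tx in local_blacklist for tx in tx_ids)


def network_blacklist(local_blacklists: Iterable[Set[str]], node_count: int) -> Set[str]:
    # Count how many nodes reject each transaction ID.
    votes = Counter(tx for bl in local_blacklists for tx in bl)
    # Content rejected by more than half of all nodes is rejected network-wide.
    return {tx for tx, n in votes.items() if n > node_count / 2}


blacklists = [{"x", "y"}, {"x"}, {"x", "z"}, set()]
print(network_blacklist(blacklists, node_count=4))  # {'x'}
```

Content such as `y` or `z`, rejected by only one node each, stays on the network: those nodes simply decline to store it.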

The network uses incentives to balance the forces of content rejection and acceptance. Nodes that reject too many transactions that the network is likely to accept earn declining mining rewards. Conversely, nodes that accept too many transactions that the network is likely to reject end up mining invalid blocks, which the rest of the network discards.

SmartWeave

Arweave is much more than a data storage system; it is also a protocol with smart contract functionality. SmartWeave is the smart contract layer built on top of Arweave that enables applications to be deployed to the perma web in an immutable, trustless, and decentralized way.

Unlike a typical smart contract platform, where transactions are validated by nodes, SmartWeave pushes the responsibility for validation onto users. An interaction with a contract is first recorded as a transaction in a block; the contract’s state is then evaluated and verified later by the user’s own client, which replays the contract’s recorded interactions in order.
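
This evaluate-on-read approach (often called lazy evaluation) can be sketched as a fold over recorded interactions. The token contract below is hypothetical and written in Python for consistency with the other examples; SmartWeave contracts are written in JavaScript around a similar `handle(state, action)` function, but this sketch is not SmartWeave’s actual API.

```python
from functools import reduce
from typing import Any, Dict, List

State = Dict[str, Any]


def handle(state: State, action: Dict[str, Any]) -> State:
    """Hypothetical token contract: the only action is a transfer."""
    balances = dict(state["balances"])
    caller, target, qty = action["caller"], action["target"], action["qty"]
    if qty <= 0 or balances.get(caller, 0) < qty:
        return state  # invalid interaction: ignored, state unchanged
    balances[caller] -= qty
    balances[target] = balances.get(target, 0) + qty
    return {**state, "balances": balances}


def evaluate(initial: State, interactions: List[Dict[str, Any]]) -> State:
    # Replay every recorded interaction, in block order, on the user's own machine.
    return reduce(handle, interactions, initial)


final = evaluate({"balances": {"alice": 10}}, [
    {"caller": "alice", "target": "bob", "qty": 4},
    {"caller": "bob", "target": "carol", "qty": 10},  # bob can't afford this
    {"caller": "bob", "target": "carol", "qty": 3},
])
```

The second interaction is recorded on-chain like any other, but since it is invalid the contract simply ignores it when the state is evaluated; no node had to reject it.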

Because nodes no longer need to validate smart contract interactions, users don’t pay gas for each interaction. To increase scalability, a layer two solution has been implemented that bundles transactions together and sends them to the network as a single transaction. Each sub-transaction within the bundle is given a unique ID so that smart contracts can still reference it.
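
A bundle can be sketched as one outer transaction that carries many items plus an index from each item’s ID to its position. In the ANS-104 bundle format each item’s ID is derived from its signature; this sketch substitutes a SHA-256 hash of the item’s JSON serialization, and the helper names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, List


def item_id(item: Dict[str, str]) -> str:
    # Stand-in ID: a hash of the item's canonical serialization.
    blob = json.dumps(item, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()


@dataclass
class Bundle:
    items: List[Dict[str, str]]  # all sub-transactions, sent as one transaction
    index: Dict[str, int]        # sub-transaction ID -> position in the bundle


def make_bundle(items: List[Dict[str, str]]) -> Bundle:
    return Bundle(list(items), {item_id(it): i for i, it in enumerate(items)})


def lookup(bundle: Bundle, sub_id: str) -> Dict[str, str]:
    # Each sub-transaction stays individually addressable by its own ID.
    return bundle.items[bundle.index[sub_id]]


nfts = [{"owner": "alice", "data": "art #1"}, {"owner": "bob", "data": "art #2"}]
bundle = make_bundle(nfts)
```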

In theory, bundles could store quadrillions of sub-transactions in a single transaction, but practically the limit is roughly 600k.

SmartWeave also features delegated transactions, which allow a third party to pay transaction fees on behalf of users. Bundling services are an example: users can pay the bundler in the digital currency of their choice, while the bundler covers the costs in AR.

Use cases

Data storage

Data storage is at the core of Arweave’s network, offering a number of use-cases that are unaddressed by the market.

Permanence

Files degrade and links break over time. Data stored on Arweave is meant to remain accessible forever. This is underwritten by the endowment structure, which is designed to pay for the cost of storing data in perpetuity. Businesses that provide data storage can’t give this guarantee, since they can change their terms or cease operations.

Censorship resistance

Information needs a method of storage that frees it from censorship and the whims of central arbitrators. Once stored, data cannot be removed from the network or censored. This can free users from centralizing web2 monopolies, or allow citizens of quasi-dictatorships to exercise freedom of information.

Immutability

Preserved information is only as useful as its resistance to alteration. Once data is stored on Arweave, it can’t be changed, even by the wallet that uploaded it. History is no longer at the mercy of alteration, and this immutability lends authenticity to the information.

Provenance

Information stored over time also needs to retain provable authenticity. Ownership and authenticity of data can be verified through the history of transactions on the network, so disputes about the origin, ownership, or authenticity of information can be resolved by checking the data’s linked records, which are preserved permanently on Arweave.

Scalability

As the sum of information grows, so does the demand for data storage, and meeting that demand requires a system that can scale. Arweave is designed to expand its capacity as demand for storage on the network increases. The only limit to capacity is the number of nodes willing to supply storage, and supplying storage is incentivised.

Sustainability

There is no use in creating a permanent network for data storage if it is not designed for long-term sustainability. Arweave is built for long-term sustainability from both a cost and network structure perspective.

Humanity has never had a system for information permanence that comes close to combining all of the above characteristics the way Arweave does.

NFTs

With the recent explosion of the NFT market, Arweave has seen significant usage growth from NFT storage. The main problem with NFTs, which will become more apparent over time, is storing the underlying artwork. An NFT that references artwork hosted by a centralized cloud provider has several problems. Most importantly, the provider can degrade or shut down, making it a single point of failure. Moreover, web2 links are subject to link rot and content drift. Link rot occurs when a link no longer points to its original object because the resource has become unavailable or moved to a new address. Content drift occurs when a link points to content that differs from the original, which can happen as a result of link rot. Decentralized storage is therefore the only option.

Short-term decentralized file storage won’t solve this either, because it relies on continual payments for the service: the individual storing the NFT’s artwork can only keep the link alive for as long as they are willing to pay. Once the link breaks, all future owners of the NFT lose access to the underlying artwork. The only solution for NFT storage is therefore a decentralized protocol focused on permanence, a role for which Arweave is perfectly suited.

The cost of storing NFTs on Arweave is also extremely low because of the transaction bundling capabilities. This allows a set of 10,000 NFTs to be bundled into a single transaction on the network, which can cost as little as $0.37.

Profit sharing communities and tokens

Just as DAOs have proliferated on layer 1 smart contract platforms, Arweave has its own similar community “organisations”, called profit sharing communities (PSCs). PSCs allow for a fairer model of ownership and governance, represented by smart contracts deployed on the network. Profit sharing tokens (PSTs) represent ownership in a PSC. PSCs that deploy applications to the network can receive revenue and distribute it proportionally to the holders of their PST.
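
Proportional distribution is simple arithmetic; the sketch below splits an integer amount (for example, in winston) pro rata by PST balance. The rule for handing out the rounding remainder (it goes to the largest holder) is an arbitrary choice made for illustration, not a PSC standard.

```python
from typing import Dict


def distribute(revenue: int, holdings: Dict[str, int]) -> Dict[str, int]:
    """Split integer revenue pro rata by PST balance."""
    total = sum(holdings.values())
    if total == 0:
        raise ValueError("no tokens held")
    shares = {holder: revenue * bal // total for holder, bal in holdings.items()}
    # Integer division leaves a small remainder; give it to the largest holder.
    remainder = revenue - sum(shares.values())
    if remainder:
        top = max(holdings, key=lambda h: holdings[h])
        shares[top] += remainder
    return shares


print(distribute(1_000, {"alice": 50, "bob": 30, "carol": 20}))
```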

Governance is also modified so that voting power is based not only on the number of tokens held but also on how long holders are willing to lock up their tokens. This gives extra voting power to those who commit to a long lock-up period, lessening the negative effects of coin-weighted voting.
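
The lock-up weighting can be sketched as follows, assuming for illustration that voting power is a holder’s balance multiplied by the length of their lock-up (here counted in blocks). Real PSC contracts may weight votes differently.

```python
from typing import Dict, Tuple


def voting_power(balance: int, lock_blocks: int) -> int:
    # Assumed rule: tokens x lock-up length, so commitment counts, not just size.
    return balance * lock_blocks


def tally(votes: Dict[str, Tuple[int, int, bool]]) -> bool:
    """votes maps holder -> (balance, lock_blocks, in_favour). True if the proposal passes."""
    yes = sum(voting_power(bal, lock) for bal, lock, favour in votes.values() if favour)
    no = sum(voting_power(bal, lock) for bal, lock, favour in votes.values() if not favour)
    return yes > no


result = tally({"whale": (1_000, 10, False), "long_term": (200, 100, True)})
```

Here a holder with 200 tokens locked for 100 blocks outvotes a holder with 1,000 tokens locked for 10, which is the effect described above.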

Applications and ecosystems have been developed to support PSC and PST infrastructure, such as Community.xyz and Verto. Community.xyz is a governance platform for building PSCs on Arweave. It provides PSCs with the infrastructure needed to begin operations, such as smart contracts, a community dashboard, governance, and a token vault system. Verto is a decentralized exchange for trading PSTs. It is permissionless, so anyone can create a market for a PST to begin trading.

Layer 1 state storage

As layer 1 state sizes grow at an increasingly fast rate, solutions are needed to keep that growth from degrading network performance. Solana has one of the fastest-growing state sizes because of the vast amount of data it produces, ~2TB per year, which led it to partner with Arweave as a storage solution. While transaction history must remain available and accessible in perpetuity, “It doesn’t make sense to build a dedicated storage network and force that burden on validators when a perfectly amazing solution already exists…”. Because the rest of the ledger is archived on Arweave, validators are only required to store 2 days’ worth of it, which makes them more efficient and in turn results in a faster chain.

Solana isn’t the only layer 1 with a state size problem. Current state sizes for prominent L1s include 365GB for Bitcoin with a 26% CAGR, and 972GB for Ethereum (GETH client) with an 80% CAGR. For L1s that aim to keep barriers to entry low for the sake of decentralization, using Arweave for state storage would reduce the hardware needed to run a full node, thus increasing potential decentralization.

Network growth

Total weave size

Total size of the network experienced MoM growth of 18% with a 6-month CAGR of 9%.

Transactions

Network transaction volume increased 50% MoM with a 6-month CAGR of 17%.

Node geographic distribution

48% of nodes are located in the US, and the top four countries together account for 72% of nodes.

Comparison against competitors

Filecoin

Filecoin is a decentralised storage network that is trying to compete with centralized web2 storage providers, like AWS and Dropbox. Filecoin utilises the InterPlanetary File System (IPFS) for its distributed storage system. Users pay for ongoing storage and for accessing data while miners are paid for storing and retrieving files for users. Storage costs are determined by supply and demand, where miners compete to provide the lowest cost of storage for users. Users can then choose factors that will suit their needs, like redundancy and speed, which will also affect storage costs. Redundancy is the number of replications miners store of users’ data.

In contrast, Arweave’s storage costs are based on the declining cost of storage over time, which doesn’t expose users to the potential volatility of supply and demand. Filecoin also relies on IPFS, while Arweave created a purpose-built system for its data storage. IPFS locates content through a distributed hash table (DHT): data is addressed by its hash, and the DHT maps that hash to the nodes storing the data. Because IPFS is free to use, shortages tend to occur when there aren’t enough nodes on the network to meet demand; nodes are needed both to store data and to retrieve it, but they aren’t incentivised to do either. Filecoin solves this by adding an incentive layer on top, but its long-term economics are uncertain.

While both Filecoin and Arweave have a cap on maximum token supply, Filecoin has no incentive structure once block rewards finish. After that, miners’ revenue will depend solely on demand for storage and retrieval, which may not be enough to cover operational costs. If it isn’t, miners have no incentive to keep operating, and the system becomes less secure and sustainable in the long term. Arweave mitigates this with the storage endowment and its transaction fee distribution: while miners are operating profitably, a portion of each transaction fee is held in the endowment instead of going entirely to the miner. As the endowment grows, it can subsidise miners when mining becomes unprofitable, creating long-term security and sustainability for the system.

IPFS also has scalability problems rooted in computational complexity, which describes how the resources an algorithm needs grow with the size of its input; lower is better. Lookups in IPFS’s DHT take a number of steps that grows logarithmically with the size of the network, so storing and finding files gets harder as the network grows, although far more slowly than the network itself. Filecoin users can specify how many replications they want miners to store of their data, but the system doesn’t optimise this for users by default. Arweave, by contrast, incentivises miners to store more replications based on the available storage capacity. This replication maximisation results in faster lookup and access times for data, increasing the network’s security and long-term sustainability as it aims to become the internet’s method of permanent data storage.
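
The difference between logarithmic and exponential growth is easy to see numerically. The sketch below assumes an idealised Kademlia-style DHT in which each hop halves the distance to the target, so a lookup takes about log2(n) hops for n nodes; real IPFS lookups differ in constants and parallelism.

```python
import math


def expected_hops(nodes: int) -> float:
    # Idealised Kademlia-style lookup: each hop roughly halves the remaining
    # distance, so a lookup needs about log2(n) hops.
    return math.log2(nodes)


for n in (1_000, 1_000_000, 1_000_000_000):
    print(f"{n:>13,} nodes -> ~{expected_hops(n):.0f} hops")
```

Growing the network a millionfold, from a thousand nodes to a billion, only roughly triples the hop count.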

Arweave tokenomics

AR is the native token of Arweave and is used for transaction fees, block rewards, and as the principal that funds the storage endowment. At genesis, 55 million AR were created, and an additional 11 million (20% of the initial supply) are being introduced gradually into circulation through block rewards, giving AR a maximum supply cap of 66 million.

Although AR is paid as transaction fees, most of it goes to the storage endowment rather than directly to miners. How much is held back depends on how profitable mining is. If profitability turns negative, the storage endowment uses its AR to subsidise miners, keeping the network secure and maintaining its long-term viability.
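
The fee split and subsidy can be sketched as a small ledger. Both the 10% immediate miner share and the rule that the endowment covers exactly the miner’s shortfall are hypothetical simplifications, not Arweave’s actual parameters.

```python
from dataclasses import dataclass

MINER_SHARE = 0.1  # hypothetical: fraction of each fee paid to the miner immediately


@dataclass
class Endowment:
    balance: float = 0.0

    def deposit_fee(self, fee: float) -> float:
        """Route most of a transaction fee into the endowment; return the miner's cut."""
        to_miner = fee * MINER_SHARE
        self.balance += fee - to_miner
        return to_miner

    def subsidise(self, block_reward: float, storage_cost: float) -> float:
        """Cover a miner's shortfall from the endowment when rewards don't cover costs."""
        shortfall = max(0.0, storage_cost - block_reward)
        paid = min(shortfall, self.balance)
        self.balance -= paid
        return paid


e = Endowment()
e.deposit_fee(100.0)                               # miner gets 10, endowment keeps 90
e.subsidise(block_reward=5.0, storage_cost=20.0)   # endowment pays the 15 shortfall
```

While rewards exceed costs, `subsidise` pays nothing and the balance only grows; it is drawn down only when mining turns unprofitable.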

Historical returns

Conclusion

Arweave is the first purpose-built network that aims to achieve permanent data storage in an immutable, sustainable, and low-cost environment. Once on the network, data can’t be altered, censored, or removed. This censorship resistance lets users exercise freedom of information without the restrictions imposed by centralized web2 entities. Through the incentive structures that govern the network, Arweave is set up for the long-term sustainability and security it needs to fulfil its role of providing permanent storage.

Arweave also acts as the base layer for the perma web, on which immutable and decentralized applications can be deployed. Use cases are emerging that fulfil vital roles within crypto, particularly NFTs and L1 state storage. Over time, more sophisticated and mature applications will emerge that take full advantage of the ecosystem’s benefits. The result is a network of critical infrastructure that is vital not only to the crypto economy but to the global market for data storage and information preservation.

“In the long term I think that people start to realise this is essentially humanity’s… decentralized store of knowledge. And so… if you have a piece of data you want to contribute to the collective memory of all people, then this is the system you go to.”

Link to a downloadable version of the report: https://drive.google.com/file/d/1ehr47rZ_aj1_A7dV7BXqZbMmGJVubM1w/view?usp=sharing
