How to Build a High-Performance Blockchain


BlockBeats, 2025/04/22 09:20

By: BlockBeats

Source: Aptos Labs

Since the earliest days of computing, engineers and researchers have explored how to push computing resources to their performance limits, maximizing efficiency while minimizing the latency of computing tasks. The twin goals of high performance and low latency have shaped the development of computer science, influencing fields from CPUs, FPGAs, and database systems to more recent artificial intelligence infrastructure and blockchain systems. In the pursuit of high performance, pipelining has become an indispensable tool. Since its introduction in the IBM System/360 in 1964 [1], it has been at the core of high-performance system design, driving key discussions and innovations in the field.

Pipelining is applied not only in hardware but also widely in databases. For example, David DeWitt and Jim Gray described pipelined parallelism in "Parallel Database Systems: The Future of High Performance Database Systems" [2]. This approach breaks a complex database query into multiple stages and runs them concurrently, improving efficiency and performance. Pipelining is equally vital in artificial intelligence, especially in widely used deep learning frameworks such as TensorFlow, which uses data pipeline parallelism to overlap data preprocessing and loading, ensuring a smooth flow of data for training and inference and making AI workflows faster and more efficient [3].

Blockchain is no exception. Its core function is similar to that of a database: processing transactions and updating state, but with the added challenge of Byzantine fault-tolerant consensus. The key to improving blockchain throughput (transactions per second) and reducing latency (time to finality) lies in optimizing how the different stages (ordering, execution, commit, and synchronization of transactions) interact under high load. This challenge is particularly acute in high-throughput scenarios, where traditional designs struggle to maintain low latency.

To explore these concepts, let's consider a familiar analogy: the automobile factory. Understanding how the assembly line has revolutionized manufacturing can help us grasp the evolution of the blockchain pipeline—and why next-generation designs like Zaptos [8] are pushing blockchain performance to new heights.

From Automobile Factory to Blockchain

Imagine you are the owner of an automobile factory with two main goals:

· Maximize throughput: Assemble as many cars as possible every day.

· Minimize latency: Reduce the build time of each car.

Now, consider three types of factories:

Simple Factory

In a simple factory, a group of versatile workers systematically assembles a car. One worker assembles the engine, the next worker installs the wheels, and so on—producing only one car at a time.

The issue? Some workers often wait idle, leading to an overall low production efficiency because no one is working on different parts of the same car simultaneously.

Ford Factory

Enter the Ford assembly line[4]! Here, each worker focuses on a single task. The car moves along a conveyor belt, and as each car passes through, a dedicated worker adds their part.

The result? Multiple cars are at different assembly stages simultaneously, and all workers are busy. Throughput increases significantly, but each car still has to pass through every worker in sequence, so the latency per car remains the same.

Magic Factory

Imagine a magic factory where all workers can work on a single car simultaneously! The car no longer needs to move from one station to the next, because every part of it is built at the same time.

The outcome? The car is assembled at a record speed, with every step happening in sync. This is the ideal scenario to address throughput and latency issues.

Alright, enough about car factories—what about blockchain? As it turns out, designing a high-performance blockchain is not so different from optimizing an assembly line.

Blockchain as a Car Factory

In blockchain, processing a block is akin to assembling a car. The analogy goes as follows:

· Worker = Validator Resource

· Car = One Block

· Assembly Task = Consensus, Execution, and Commit stages

Just as in a simple factory where only one car is processed at a time, if a blockchain were to handle only one block at a time, it would result in underutilization of resources. In contrast, modern blockchain designs aim to emulate the Ford assembly line—processing multiple blocks in different stages simultaneously. This is where pipeline technology shines.
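The three factories can be compared with a toy model. The sketch below is illustrative only: it assumes every stage takes the same time, and the function names and parameters are hypothetical rather than taken from any Aptos code.

```python
# Toy model of the three factories. Assumes n_stages equal-length stages,
# each taking stage_time; every name and number here is illustrative.

def simple_factory(n_items, n_stages, stage_time):
    """One item at a time: the next item starts only when the previous one is done."""
    latency = n_stages * stage_time
    total_time = n_items * latency
    return latency, n_items / total_time


def ford_factory(n_items, n_stages, stage_time):
    """Pipelined: a new item enters every stage_time, but each item still visits every stage."""
    latency = n_stages * stage_time
    total_time = latency + (n_items - 1) * stage_time
    return latency, n_items / total_time


def magic_factory(n_items, n_stages, stage_time):
    """Idealized: all stages of one item run at once, so latency is a single stage_time."""
    latency = stage_time
    total_time = n_items * stage_time
    return latency, n_items / total_time


for factory in (simple_factory, ford_factory, magic_factory):
    latency, throughput = factory(n_items=100, n_stages=4, stage_time=1.0)
    print(f"{factory.__name__}: latency={latency:.2f}, throughput={throughput:.2f}")
```

In this model the pipelined factory approaches one item per stage time while its per-item latency stays at four stage times; only the idealized "magic" variant improves both numbers.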

Evolution of Blockchain Pipelines

Traditional Architecture: Sequential Blockchain

Imagine a blockchain that processes blocks sequentially. Validators need to:

1. Receive block proposals.

2. Execute blocks to update the blockchain state.

3. Reach consensus on that state.

4. Persist the state to the database.

5. Initiate the consensus for the next block.

What is the problem?

· Execution and commit are on the critical path of the consensus process.

· Each consensus instance needs to wait for the previous one to complete before starting.

This setup is akin to factories of the pre-Ford era: workers (resources) often idle as they focus on only one block (car) at a time. Unfortunately, many existing blockchains still fall into this category, leading to low throughput and high latency.

Aptos: Parallelizing Performance

Diem introduced a pipeline architecture that decouples execution and commit from the consensus phase, with the consensus phase itself also pipelined.

· Asynchronous Execution and Commit [5]: Validators first agree on a block, then execute it on top of the parent block's state. Once a quorum of validators has certified the resulting state, it is persisted to storage.

· Pipeline Consensus (Jolteon[6]): New consensus instances can start before the previous one completes, akin to a moving assembly line.

This enhancement allows different blocks to be in different stages simultaneously, increasing throughput and significantly reducing block times to just 2 message delays. However, Jolteon's leader-based design may lead to bottlenecks as the leader can become overloaded during transaction dissemination.
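The idea of decoupled, pipelined stages can be illustrated with a small thread-and-queue sketch. This is a minimal illustration and not the Aptos implementation: the three stage names, the `stage` and `run_pipeline` helpers, and the use of integers as stand-ins for blocks are all assumptions made for the example.

```python
import queue
import threading


def stage(name, inbox, outbox, log):
    """Run one pipeline stage: take a block, record the work, pass the block on."""
    while True:
        block = inbox.get()
        if block is None:              # sentinel: shut down and notify the next stage
            if outbox is not None:
                outbox.put(None)
            return
        log.append((name, block))      # stand-in for the real work of this stage
        if outbox is not None:
            outbox.put(block)


def run_pipeline(num_blocks):
    """Push num_blocks through order -> execute -> commit, one thread per stage."""
    q_order, q_execute, q_commit = queue.Queue(), queue.Queue(), queue.Queue()
    log = []
    threads = [
        threading.Thread(target=stage, args=("order", q_order, q_execute, log)),
        threading.Thread(target=stage, args=("execute", q_execute, q_commit, log)),
        threading.Thread(target=stage, args=("commit", q_commit, None, log)),
    ]
    for t in threads:
        t.start()
    for block in range(num_blocks):    # ordering of block k+1 need not wait for commit of block k
        q_order.put(block)
    q_order.put(None)
    for t in threads:
        t.join()
    return log


log = run_pipeline(5)
print([block for name, block in log if name == "commit"])
```

Because each stage has its own queue, a later block can be ordered while an earlier one is still being executed or committed, yet blocks still leave the commit stage in their original order.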

Aptos further optimizes the pipeline through Quorum Store[7], a mechanism that decouples data distribution from consensus. Quorum Store no longer relies on a single leader to broadcast large data blocks in the consensus protocol but separates data distribution from metadata ordering, allowing validators to asynchronously and concurrently distribute data. This design leverages the total bandwidth of all validators, effectively eliminating leader bottlenecks in consensus.
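A back-of-the-envelope calculation shows why spreading dissemination across validators relieves the leader. The sketch below is a simplified model under stated assumptions: payloads are split evenly, protocol overhead is ignored, and the function names and example sizes are hypothetical.

```python
def leader_broadcast_load(block_bytes, n_validators):
    """Bytes the leader uploads when it alone sends the full block to every peer."""
    return block_bytes * (n_validators - 1)


def spread_dissemination_load(block_bytes, n_validators):
    """Bytes each validator uploads when the payload is split evenly into batches
    and every validator sends only its own batch to every peer."""
    batch_bytes = block_bytes / n_validators
    return batch_bytes * (n_validators - 1)


# Illustrative numbers: a 1 MB block payload and 100 validators.
block_bytes, n = 1_000_000, 100
print(leader_broadcast_load(block_bytes, n))      # upload required of the single leader
print(spread_dissemination_load(block_bytes, n))  # upload required of each validator
```

Under these assumptions the busiest node's upload drops by a factor equal to the number of validators, which is the sense in which the design uses the combined bandwidth of all validators.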

Visualization: How Quorum Store balances resource utilization in leader-based consensus protocols.

Thus far, the Aptos blockchain has built the "Ford factory" of blockchains. Just as Ford's assembly line revolutionized car manufacturing by keeping different cars at different stages simultaneously, Aptos processes different blocks at different stages concurrently. Each validator's resources are fully utilized, so no part of the process sits idle. This arrangement yields a high-throughput system, making Aptos a robust platform for handling blockchain transactions efficiently and at scale.

Illustration: Pipelined processing of consecutive blocks in the Aptos blockchain. Validators can pipeline the different stages of consecutive blocks to maximize resource utilization and increase throughput.

While throughput is crucial, end-to-end latency, the time from when a client submits a transaction to its final confirmation, is equally important. For applications such as payments, decentralized finance (DeFi), and gaming, every millisecond counts. Many users have experienced delays during high-traffic events because each transaction must pass sequentially through a series of stages: communication from client to full node to validator, consensus, execution, state certification, commit, and full node synchronization. Under high load, stages such as execution and full node synchronization introduce additional latency.

Illustration: Pipeline architecture of the Aptos blockchain. The diagram shows clients Ci, full nodes Fi, and validators Vi. Each box represents a stage that a block of transactions must go through, from left to right. The pipeline consists of five stages: consensus (including dissemination and ordering), execution, certification, commit, and full node synchronization.

It's like a Ford factory: while the assembly line maximizes overall throughput, each car still needs to pass through each worker sequentially, resulting in longer completion times. To truly push blockchain performance to the limit, we need to build a "magic factory" where these stages run in parallel.

Zaptos: Towards Optimal Blockchain Latency

Zaptos[8] further reduces latency through three key optimizations without sacrificing throughput.

· Optimistic Execution: Reducing pipeline latency by starting execution immediately upon receiving a block proposal. Validators promptly add the block to the pipeline and speculatively execute after the parent block completes. Full nodes, upon receiving the proposal from the validator, also perform optimistic execution to validate the state proof.

· Optimistic Commit: Writing state to storage immediately after block execution, even before the state is certified. When validators eventually certify the state, only minimal updates are needed to complete the commit. If a block ultimately fails to be ordered, its optimistically committed state is rolled back for consistency.

· Expedited Certification: Validators start certifying the executed block's state early, without waiting for consensus to complete, by sending certification messages in parallel with the final consensus round. In the common case, this shortens the pipeline by one round.
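The "write early, roll back if needed" idea from the list above can be sketched with an in-memory store that keeps an undo record per speculative block. This is a minimal sketch under simplifying assumptions, not how Zaptos or Aptos storage works: the `OptimisticStore` class and its methods are hypothetical, and it assumes speculative blocks are rolled back newest first.

```python
_MISSING = object()  # marks keys that did not exist before a speculative write


class OptimisticStore:
    """Toy key-value state with speculative writes that can be finalized or undone."""

    def __init__(self):
        self.state = {}
        self.undo_log = {}      # block_id -> {key: previous value or _MISSING}
        self.finalized = set()

    def optimistic_commit(self, block_id, writes):
        """Apply a block's writes right after execution, remembering the old values."""
        undo = {}
        for key, value in writes.items():
            undo[key] = self.state.get(key, _MISSING)
            self.state[key] = value
        self.undo_log[block_id] = undo

    def finalize(self, block_id):
        """The block's state is confirmed: only the bookkeeping needs updating."""
        self.undo_log.pop(block_id)
        self.finalized.add(block_id)

    def rollback(self, block_id):
        """The block was not ordered after all: restore the values it overwrote."""
        for key, old in self.undo_log.pop(block_id).items():
            if old is _MISSING:
                self.state.pop(key, None)
            else:
                self.state[key] = old


store = OptimisticStore()
store.optimistic_commit("b1", {"alice": 10})
store.finalize("b1")
store.optimistic_commit("b2", {"alice": 5, "bob": 5})
store.rollback("b2")            # b2 never got ordered, so its writes are undone
print(store.state)
```

In this sketch, finalizing touches only metadata because the data was already written, which mirrors the point that little work remains once the state is confirmed.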

Illustration: Parallel Pipeline Architecture of Zaptos. Stages other than consensus are effectively hidden within the consensus stage, reducing end-to-end latency.

Through these optimizations, Zaptos effectively hides the latency of the other pipeline stages within the consensus stage. Thus, if a blockchain adopts a latency-optimal consensus protocol, its overall latency can also reach the optimum!

Talk is Cheap, Show Me the Data

We evaluated Zaptos' end-to-end performance through geographically distributed experiments, with Aptos as the high-performance baseline. For more details, refer to the paper [8].

On Google Cloud, we simulated a globally decentralized network consisting of 100 validators and 30 full nodes distributed across 10 regions, using commercial-grade machines similar to those used in the Aptos deployment.

Throughput-Latency

Figure: Common performance characteristics of Zaptos and Aptos blockchains.

The figure above compares end-to-end latency against throughput for the two systems. In both, latency increases gradually as load increases and spikes sharply at maximum capacity. Zaptos, however, shows consistently more stable latency before reaching peak throughput, reducing latency by 160 milliseconds under low load and by more than 500 milliseconds under high load.

Impressively, Zaptos achieves sub-second latency at 20k TPS in a production-grade, mainnet-like environment. This breakthrough makes real-world applications that require both speed and scalability possible.

Latency Breakdown

Figure: Latency breakdown of the Aptos blockchain.

Figure: Latency breakdown of Zaptos.

The latency breakdown charts detail the duration of each stage for validators and full nodes in the pipeline. Key insights include:

· Up to 10k TPS: Zaptos' overall latency is nearly equal to its consensus latency, because the optimistic execution, certification, and optimistic commit stages are effectively "hidden" within the consensus stage.

· Above 10k TPS: Due to increased optimistic execution and full node synchronization time, non-consensus stages become more significant. Nevertheless, Zaptos significantly reduces overall latency by overlapping most stages. For example, at 20k TPS, the baseline total latency is 1.32 seconds (consensus 0.68 seconds, other stages 0.64 seconds), while Zaptos is 0.78 seconds (consensus 0.67 seconds, other stages 0.11 seconds).
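The 20k TPS figures quoted above can be checked with a few lines of arithmetic. The snippet uses only the numbers stated in the text; the variable names are illustrative.

```python
# Latency figures quoted in the text for 20k TPS, in seconds.
baseline = {"consensus": 0.68, "other_stages": 0.64}
zaptos = {"consensus": 0.67, "other_stages": 0.11}


def total(breakdown):
    """End-to-end latency as the sum of the quoted components."""
    return round(sum(breakdown.values()), 2)


saved = round(total(baseline) - total(zaptos), 2)
relative_saving = saved / total(baseline)
hidden_share = 1 - zaptos["other_stages"] / baseline["other_stages"]

print(f"baseline total: {total(baseline):.2f} s, Zaptos total: {total(zaptos):.2f} s")
print(f"latency saved: {saved:.2f} s ({relative_saving:.1%} of the baseline)")
print(f"non-consensus latency removed from the critical path: {hidden_share:.1%}")
```

The totals match the 1.32 and 0.78 seconds given in the text: a saving of 0.54 seconds, or roughly 41% of the baseline, with about 83% of the non-consensus latency no longer on the critical path.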

Conclusion

The evolution of blockchain architecture parallels the transformation in manufacturing—from simple sequential workflows to highly parallelized assembly lines. Aptos's assembly line approach has significantly increased throughput, while Zaptos goes further, reducing latency to sub-second levels, all while maintaining high TPS. Just as modern computing architectures leverage parallelism to maximize efficiency, blockchain must continuously optimize its design to eliminate unnecessary delays. By comprehensively optimizing the blockchain pipeline to achieve minimal latency, Zaptos paves the way for real-world blockchain applications that require speed and scalability.

References

[1] Gene M. Amdahl, Gerrit A. Blaauw, and Frederick P. Brooks. 1964. "Architecture of the IBM System/360." IBM Journal of Research and Development. https://doi.org/10.1147/rd.82.0087

[2] David DeWitt and Jim Gray. 1992. "Parallel Database Systems: The Future of High Performance Database Systems." Communications of the ACM. https://doi.org/10.1145/129888.129894

[3] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin et al. 2016. "TensorFlow: a System for Large-Scale Machine Learning." In 12th USENIX symposium on operating systems design and implementation (OSDI). https://arxiv.org/abs/1605.08695

[4] The Moving Assembly Line and the Five-Dollar Workday. https://corporate.ford.com/articles/history/moving-assembly-line.html

[5] Zekun Li and Yu Xia. 2021. DIP-213 - Decoupled Execution. https://github.com/diem/dip/blob/7dc44ee57bb7efe76559f05dcc6851d97e2d3149/dips/dip-213.md

[6] Rati Gelashvili, Lefteris Kokoris-Kogias, Alberto Sonnino, Alexander Spiegelman, and Zhuolun Xiang. 2022. "Jolteon and Ditto: Network-Adaptive Efficient Consensus with Asynchronous Fallback." In International conference on financial cryptography and data security (FC). https://arxiv.org/abs/2106.10362

[7] Quorum Store: How Consensus Horizontally Scales on the Aptos Blockchain. https://medium.com/aptoslabs/quorum-store-how-consensus-horizontally-scales-on-the-aptos-blockchain-988866f6d5b0

[8] Zhuolun Xiang, Zekun Li, Balaji Arun, Teng Zhang, and Alexander Spiegelman. 2025. "Zaptos: Towards Optimal Blockchain Latency." arXiv preprint arXiv:2501.10612. https://arxiv.org/abs/2501.10612

This article is from a submission and does not represent the views of BlockBeats.

Disclaimer: The content of this article solely reflects the author's opinion and does not represent the platform in any capacity. This article is not intended to serve as a reference for making investment decisions.
