RaptorCast: Designing a Messaging Layer

In Proof of Stake blockchains, a pre-determined leader typically proposes a block of transactions at each round. The propagation of this block to all validators is one of the most challenging and time-consuming steps in the consensus protocol. In this blog post, we examine RaptorCast, a solution designed to address the following considerations:

  • Performance - a block proposal needs to be sent quickly to the rest of the network
  • Security - each recipient of the block (or its pieces) needs to be able to verify that it originated from the correct leader and arrived untampered
  • Robustness - honest validators need guarantees that in the presence of packet loss and a Byzantine minority they can reconstruct the block proposal

Background

How does a leader get their block proposal to other validators?

There are many strategies to tackle this challenge. The leader can naively send the proposal block to each validator one by one. However, this method does not scale. For instance, with 1,000 validators and a block size of 2MB, the leader would need to upload approximately 2GB of data. Assuming zero network latency and a bandwidth of 1 Gbps, this would still take around 16 seconds, which is clearly impractical for a performant blockchain.
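
For concreteness, the back-of-the-envelope calculation is:

$$\frac{1000 \text{ validators} \times 2\,\text{MB} \times 8\,\text{bits/byte}}{1\,\text{Gbps}} = \frac{16\,\text{Gb}}{1\,\text{Gbps}} = 16\,\text{s}$$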

In our system, we need to make a few key decisions to ensure compatibility with blockchain constraints while optimizing for speed and reliability. Specifically, we need to choose:

  • A Data Transmission Protocol
  • An Encoding System
  • A Broadcast Strategy

Below, we describe each of these design parameters and explain our choices.

Data Transmission Protocol

When selecting a data transmission protocol for our system, we are primarily considering two options: TCP (Transmission Control Protocol) and UDP (User Datagram Protocol). Each protocol offers unique advantages and trade-offs, and our decision should align with our performance, reliability, and security requirements.

TCP is a connection-oriented protocol that ensures reliable data delivery. It provides features such as error checking, retransmission of lost packets, and data sequencing. This makes TCP highly suitable for environments where data integrity and reliability are critical—especially in scenarios involving potential packet loss or network instability. However, this reliability comes at a cost: increased latency when packets are reordered or lost, which can reduce performance in real-time or high-throughput applications.

On the other hand, UDP is a connectionless protocol that emphasizes speed and efficiency. It transmits data without the overhead of establishing a connection, making it ideal for applications that demand low latency and can tolerate occasional packet loss. However, we need to adjust our design to compensate for packet loss over UDP and ensure that the target number of encoding symbols is received by validators despite potential losses.

That said, UDP does not inherently provide mechanisms for authentication or packet delivery guarantees, which are two major concerns in our context. However, if we can address these limitations at other layers then UDP becomes a compelling choice.

So, in our block proposal propagation design, we choose UDP over TCP to take advantage of its speed, and we address the packet loss, data integrity, and message authentication issues through our packet design and encoding system.

QUIC is another transport-layer protocol that combines secure sessions and flow-control strategies (similar to TCP) with efficiency gains. Relative to raw UDP, however, QUIC introduces the additional overhead of handshakes and encryption, which in our setting outweighs the benefit of its stronger reliability guarantees.

Encoding System

Since we are choosing UDP instead of TCP, integrating an encoding system to handle packet loss becomes essential. By selecting UDP, we inherently accept the possibility of packet loss, so we must compensate for it efficiently. The most effective way to do this is by incorporating a robust forward error correction (FEC) scheme. For example, if we anticipate 10% packet loss, we can send roughly 10% extra encoded packets (more precisely, a factor of $\frac{1}{1-0.1} \approx 1.11$). This way, the receiver can successfully decode the source data as long as they receive a sufficient number of packets—regardless of which specific packets arrive.

There are several well-known encoding schemes, including LT codes, R10, RaptorQ, and Reed-Solomon. Each offers distinct advantages: some, like Reed-Solomon (Reed & Solomon, 1960), are robust against corrupted data, while others, such as R10 and RaptorQ, are designed to handle missing data effectively.

For our system, we have chosen R10, an implementation of Raptor codes (Shokrollahi, 2006), due to its performance and scalability. However, R10 does not inherently protect against symbol corruption (i.e., modified data). To address this, we will incorporate symbol-level authentication to ensure data integrity and detect any tampering during transmission.

Broadcast Strategy

This design layer defines how packets are distributed and forwarded from the leader to the other validators. Several strategies exist, each with different trade-offs in terms of speed, reliability, and scalability.

One widely known approach is the Gossip protocol, where each validator forwards a subset of received packets to a few randomly selected peers. This method is simple and robust but can be inefficient due to redundant transmissions.

The approach that we are adopting is structured broadcast, where each validator is assigned a specific portion of the data to re-broadcast to a predefined group of peers. This two-hop method is more bandwidth-efficient and predictable.

There is also a semi-structured approach, which blends randomness with structure. In this model, packet forwarding targets are selected randomly but within a defined structure, similar to Solana’s Turbine (Anza, n.d.) protocol.

1. Data Transmission Layer

When a leader broadcasts their proposal, what prevents a malicious node from modifying the packets during forwarding (man-in-the-middle attack)? How can the receiver verify the integrity of the received data and ensure that the packets were indeed generated by the correct leader?

In our design, each encoding symbol (each chunk) fits into a single UDP packet, and each packet is independently verifiable. To understand how this works, we first describe the structure of each packet. Every packet is composed of four key components:

  • Merkle Proofs
  • Header
  • Chunk Header
  • Data

Let’s step through each component.

Merkle Proofs (100 bytes)

To validate the authenticity of a message, the sender signs it with their private key, and the receiver verifies the signature with the sender’s public key. As mentioned earlier, we split a block into many chunks and transmit each chunk independently; thus, each chunk needs to be authenticated by the sender. In theory, the leader could sign each chunk, which would be sufficient protection against tampering or data corruption. But the signing operation is computationally expensive when run over hundreds or thousands of chunks.

With RaptorCast, the number of signatures is reduced by a factor of 32 as follows. The leader generates a Merkle tree over every group of 32 data chunks (including their respective chunk headers). The leader then signs only the header (defined below) concatenated with the hash of the Merkle root, rather than signing each chunk individually. Since each hash is 20 bytes and the Merkle tree depth is 5 (for 32 leaves), the Merkle proof requires 100 bytes.

With the Merkle tree approach, we reduce the number of signatures by a factor of 32, at the cost of 100 bytes of additional data per chunk.

This approach also benefits verification. Because every 32 chunks share the same signature, verifiers can cache signatures and perform fast Merkle proof verifications, rather than verifying thousands of individual signatures.
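
To make the mechanics concrete, here is a minimal Python sketch of the 32-leaf Merkle construction; truncated SHA-256 as the 20-byte hash and the leaf ordering are illustrative assumptions, not the actual wire format:

```python
import hashlib

def h(data: bytes) -> bytes:
    # 20-byte hashes as in the post; truncated SHA-256 is an assumption here
    return hashlib.sha256(data).digest()[:20]

def build_levels(chunks):
    """All levels of a Merkle tree over the chunk hashes, leaves first."""
    levels = [[h(c) for c in chunks]]
    while len(levels[-1]) > 1:
        lvl = levels[-1]
        levels.append([h(lvl[i] + lvl[i + 1]) for i in range(0, len(lvl), 2)])
    return levels

def merkle_proof(levels, index):
    """Sibling hashes from leaf to root: 5 x 20 = 100 bytes for 32 leaves."""
    proof = []
    for lvl in levels[:-1]:
        proof.append(lvl[index ^ 1])    # sibling at this level
        index //= 2
    return proof

def verify(chunk, index, proof, root):
    """Recompute the path from a chunk to the root using its 100-byte proof."""
    node = h(chunk)
    for sib in proof:
        node = h(node + sib) if index % 2 == 0 else h(sib + node)
        index //= 2
    return node == root

chunks = [bytes([i]) * 1268 for i in range(32)]   # 32 dummy data chunks
levels = build_levels(chunks)
root, proof = levels[-1][0], merkle_proof(levels, 5)
assert sum(len(p) for p in proof) == 100          # 5 hashes x 20 bytes each
assert verify(chunks[5], 5, proof, root)
```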

Header (108 bytes)

The signature mentioned above not only authenticates the Merkle root for the recipient but also binds it to all the metadata defined below. In fact, the signature takes up most of the header space (65 bytes).

  • 65 bytes => Signature of sender over hash(rest of the header, concatenated with Merkle root). This signature will be shared with 31 other UDP packets (same Merkle root).
  • 2 bytes => Version, bumped on protocol updates
  • 1 bit => Broadcast flag
  • 7 bits => Merkle tree depth
  • 8 bytes (u64) => Epoch #
  • 8 bytes (u64) => Unix timestamp in milliseconds
  • 20 bytes => First 20 bytes of the hash of the block proposal
  • 4 bytes (u32) => Serialized block proposal length in bytes

The total size of the header is 108 bytes.
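
As a rough illustration of the layout, the following Python sketch packs the fields listed above into 108 bytes; the field order and big-endian encoding are assumptions for illustration only, not the actual wire format:

```python
import struct

def pack_header(signature: bytes, version: int, broadcast: bool, depth: int,
                epoch: int, unix_ms: int, proposal_hash: bytes,
                proposal_len: int) -> bytes:
    """Pack the header fields described above into 108 bytes (illustrative)."""
    flags = (int(broadcast) << 7) | (depth & 0x7F)   # 1-bit flag + 7-bit depth
    header = struct.pack(">65sHBQQ20sI",
                         signature,        # 65-byte signature
                         version,          # 2-byte version
                         flags,            # broadcast flag + Merkle tree depth
                         epoch,            # 8-byte epoch number
                         unix_ms,          # 8-byte Unix timestamp (ms)
                         proposal_hash,    # first 20 bytes of proposal hash
                         proposal_len)     # 4-byte serialized proposal length
    assert len(header) == 108
    return header
```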

Chunk Header (24 bytes)

The chunk header includes metadata specific to each data chunk, such as the encoding symbol index used by the decoder.

Data

To compute the minimum number of UDP packets required to represent a given block proposal, we take the MTU (maximum transmission unit) size of ~1500 bytes and subtract the fixed 232 bytes used by the header, chunk header, and Merkle proof. This leaves 1268 bytes for actual data.

We then divide the total size of the message by this payload capacity to determine how many packets are needed.
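
A minimal sketch of this computation, assuming the ~1500-byte MTU and 232 bytes of overhead described above:

```python
import math

MTU = 1500                        # typical Ethernet MTU in bytes
OVERHEAD = 100 + 108 + 24         # Merkle proof + header + chunk header = 232
PAYLOAD = MTU - OVERHEAD          # 1268 bytes of block data per UDP packet

def min_packets(proposal_len: int) -> int:
    """Minimum number of UDP packets needed for a serialized block proposal."""
    return math.ceil(proposal_len / PAYLOAD)

print(min_packets(2_000_000))     # a 2 MB proposal -> 1578 source chunks
```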

With this in place, we've addressed the data integrity and message authentication concern. Each chunk includes the signature of the Merkle root, so if a malicious node modifies any packet, the hash of the altered version will not match the Merkle tree, and the integrity check will fail.

2. Encoding System

We have already addressed message integrity and authentication through Merkle proofs and digital signatures. We now turn to the second major challenge: ensuring message availability despite unreliable delivery. To tackle this, we use an encoding system that lets validators reconstruct the message even when some packets are lost.

Packet loss can occur for various reasons, including network instability, congestion, or even malicious behavior by nodes that deliberately drop packets or fail to forward packets. Our goal is to ensure that each honest node receives enough encoded packets to reconstruct the original proposal message, regardless of which specific packets were delivered.

This last point is crucial: we cannot rely on the packet indices, since we don’t control which packets are lost. Instead, we rely on a type of encoding system, erasure coding, which transforms the original message into a longer derived version. The original message can be recovered from any sufficiently large subset of the new message.

To achieve this, we must choose an appropriate number of encoding chunks (encoding symbols) to transmit. For example, if we divide the proposal into 1000 source chunks (source symbols), we may estimate that 1100 encoded chunks are sufficient for successful reconstruction. However, if we also anticipate 10% packet loss, and account for up to 33% of nodes being malicious (e.g., not forwarding data), we must introduce additional redundancy to ensure that honest nodes receive at least 1100 chunks.

Determining the exact redundancy depends on the broadcasting strategy, which we will explain in the next section. Once the strategy is defined, we will return to explain how we select the redundancy level with high confidence. Next, we will explain the encoding system that we are using.

We use R10 as the basis of our encoding-decoding system. R10 is a standardized fountain code specified in RFC 5053 (Luby et al., 2007). It is designed for efficient and robust forward error correction (FEC), particularly over unreliable or lossy networks like UDP.

For those unfamiliar, R10 is closely related to Luby Transform (LT) codes (Luby, 2002), which it builds upon and extends. To provide proper context, we begin with a brief explanation of LT encoding and decoding, before transitioning into how R10 enhances these methods.

LT Encoding (Luby Transform)

In LT codes, the core idea is to produce an arbitrary number of encoded symbols from a fixed set of source chunks (say, N chunks; these are also called source symbols). Each encoded symbol is generated by:

  1. Selecting a degree d from a degree distribution (this determines how many source chunks to combine).
  2. Randomly picking d distinct source chunks from the original set.
  3. XORing those source chunks together to form the encoded symbol.

Note that the encoder deterministically selects source chunks based on the inputs (N and encoded symbol index).

Example Degree Distribution

degree 1 : 10%
degree 2 : 45%
degree 3 : 5%
degree 4 : 5%
degree 5 : 10%
degree 8 : 10%
degree 40: 15%

This means:

  • 45% of the time, the encoder picks 2 source chunks to XOR.
  • 15% of the time, it picks 40 source chunks to XOR.
  • And so on...

The encoding process is stateless and simple, which makes LT codes particularly attractive for systems where symbols are generated on the fly or streamed.
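
A small Python sketch of this encoding procedure, using the example degree distribution above. Seeding the PRNG with the symbol index is a simplification standing in for the deterministic selection that a real implementation derives from (N, symbol index):

```python
import random

# Example degree distribution from above, as (degree, weight in %)
DIST = [(1, 10), (2, 45), (3, 5), (4, 5), (5, 10), (8, 10), (40, 15)]

def lt_encode_symbol(chunks, symbol_index, dist=DIST):
    """Produce one LT encoding symbol from integer-valued source chunks."""
    rng = random.Random(symbol_index)                # deterministic per index
    degrees, weights = zip(*dist)
    d = min(rng.choices(degrees, weights=weights)[0], len(chunks))  # step 1
    picked = rng.sample(range(len(chunks)), d)                      # step 2
    value = 0
    for i in picked:                                                # step 3
        value ^= chunks[i]
    return picked, value

chunks = [0x10, 0x22, 0x3C, 0x47, 0x5A]          # 5 toy source chunks
print(lt_encode_symbol(chunks, symbol_index=7))  # (picked indices, XOR value)
```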

LT Decoding

Decoding in LT systems is based on the idea of peeling, an incremental approach that reduces the degree of received encoding symbols until all N source chunks are recovered.

  1. Find a degree-1 encoded symbol.
  2. This symbol contains only a single unknown source chunk, so it can be immediately recovered.
  3. Substitute the known source chunk into all other encoding symbols that include it (i.e., XOR it out).
  4. Repeat the process — each substitution may create new degree-1 symbols.

If at any point no degree-1 symbols remain, and decoding is not yet complete, the process fails.

LT Decoding Example

Let’s assume the source chunks are c1 to c5, and we receive the following encoded symbols:

s1 = c1                                      # degree 1 DONE
s2 = c1 + c2 + c3                            # degree 3
s3 = c2 + c1                                 # degree 2
s4 = c5 + c3                                 # degree 2
s5 = c4 + c5 + c2 + c1                       # degree 4

Step-by-step decoding:

  • s1 has degree 1 → recover c1
  • Substitute c1 into all other symbols that include c1: s2, s3, and s5

s1 = c1                                      # degree 1 DONE
s2 = c1 + c2 + c3        =>  c2 + c3         # degree 2
s3 = c2 + c1             =>  c2              # degree 1 DONE
s4 = c5 + c3                                 # degree 2
s5 = c4 + c5 + c2 + c1   => c4 + c5 + c2     # degree 3

  • s3 is now degree 1 → recover c2
  • Substitute c2 into all other symbols that include c2: s2 and s5

s1 = c1                                      # degree 1 DONE
s2 = c2 + c3             =>  c3              # degree 1 DONE
s3 = c2                                      # degree 1 DONE
s4 = c5 + c3                                 # degree 2
s5 = c4 + c5 + c2        => c4 + c5          # degree 2

  • s2 is now degree 1 → recover c3
  • Substitute c3 into s4: s4 becomes c5

s1 = c1                                      # degree 1 DONE
s2 = c3                                      # degree 1 DONE
s3 = c2                                      # degree 1 DONE
s4 = c5 + c3             => c5               # degree 1 DONE
s5 = c4 + c5                                 # degree 2

  • Recover c5, then use it in s5: s5 becomes c4
  • Finally, recover c4

s1 = c1                                      # degree 1 DONE
s2 = c3                                      # degree 1 DONE
s3 = c2                                      # degree 1 DONE
s4 = c5                                      # degree 1 DONE
s5 = c4 + c5             => c4               # degree 1 DONE

✅ All source chunks are recovered successfully.
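
A compact peeling decoder sketch that reproduces the example above, using hypothetical byte values for c1 through c5:

```python
def peel_decode(n, symbols):
    """Peeling decoder sketch. symbols: list of (indices, value) pairs, where
    value is the XOR of the listed source chunks (1-based indices here)."""
    recovered = {}
    pending = [[set(ix), v] for ix, v in symbols]
    progress = True
    while progress and len(recovered) < n:
        progress = False
        for sym in pending:
            ix, v = sym
            for i in list(ix):               # XOR out already-recovered chunks
                if i in recovered:
                    ix.discard(i)
                    v ^= recovered[i]
            sym[1] = v
            if len(ix) == 1:                 # degree-1: recover a source chunk
                recovered[ix.pop()] = v
                progress = True
    return recovered if len(recovered) == n else None

# The worked example above, with hypothetical byte values for c1..c5
c = {1: 0xA1, 2: 0xB2, 3: 0xC3, 4: 0xD4, 5: 0xE5}
received = [({1}, c[1]),                               # s1 = c1
            ({1, 2, 3}, c[1] ^ c[2] ^ c[3]),           # s2
            ({2, 1}, c[2] ^ c[1]),                     # s3
            ({5, 3}, c[5] ^ c[3]),                     # s4
            ({4, 5, 2, 1}, c[4] ^ c[5] ^ c[2] ^ c[1])] # s5
assert peel_decode(5, received) == c
```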

Limitations of LT Codes

However, not all decoding sessions proceed this cleanly. Consider a set of encoded symbols where no degree-1 symbol is present:

s1 = c2 + c1                                 # degree 2
s2 = c3 + c5 + c2                            # degree 3
s3 = c1 + c5                                 # degree 2
s4 = c1 + c4                                 # degree 2
s5 = c4 + c3 + c2                            # degree 3
s6 = c1 + c3                                 # degree 2

Here, no symbol is degree 1, so the LT decoder gets stuck; it cannot make any progress via peeling. Despite this, the system contains 6 linear equations in 5 unknowns, which is algebraically solvable.

Here is another example showing that despite the presence of a degree-1 symbol, the LT decoder can get stuck and ultimately fail. This is almost identical to the first, successful example above, except that s1 = c3, instead of s1 = c1.

s1 = c3                                      # degree 1 DONE
s2 = c1 + c2 + c3                            # degree 3
s3 = c2 + c1                                 # degree 2
s4 = c5 + c3                                 # degree 2
s5 = c4 + c5 + c2 + c1                       # degree 4

Despite some progress, we get stuck: after recovering c3 and c5, the symbols s2 and s3 reduce to the same degree-2 equation (c1 + c2), so no new degree-1 symbols can emerge.

s1 = c3                                      # degree 1 DONE
s2 = c1 + c2                                 # degree 2
s3 = c2 + c1                                 # degree 2
s4 = c5                                      # degree 1 DONE
s5 = c4 + c2 + c1                            # degree 3

R10: Improvements over LT

R10 codes extend LT by adding a pre-coding stage and supporting matrix-based decoding when the peeling decoder fails.

  • The pre-coding stage increases the decoding success rate by adding structured redundancy before the LT encoding
  • If peeling fails, R10 falls back to solving the system of equations using Gaussian elimination.
  • R10 codes require fewer additional encoded symbols for successful decoding compared to raw LT codes, thanks to the pre-coding stage and Gaussian elimination fallback. However, decoding still needs slightly more than the original number of source symbols, because some encoded symbols may be linearly dependent. A small redundancy factor addresses this requirement.

So, in the first failure case above, R10 would solve the full system to recover the source chunks — even when the LT decoder would give up.
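
A sketch of that fallback: Gauss-Jordan elimination over GF(2), applied to the first stuck example with hypothetical byte values. (Real R10 decoders solve such systems over whole symbols, with the pre-code rows included; this only illustrates the linear-algebra step.)

```python
from functools import reduce

def gf2_solve(n, equations):
    """Recover n source chunks from XOR equations by Gauss-Jordan elimination
    over GF(2). equations: list of (indices, value); indices are 0-based."""
    rows = [[sum(1 << i for i in ix), val] for ix, val in equations]
    pivot_row = {}
    r = 0
    for col in range(n):
        # find a remaining row with a 1 in this column to use as the pivot
        p = next((i for i in range(r, len(rows)) if rows[i][0] >> col & 1), None)
        if p is None:
            return None                      # symbols are linearly dependent
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(len(rows)):           # eliminate column from other rows
            if i != r and rows[i][0] >> col & 1:
                rows[i][0] ^= rows[r][0]
                rows[i][1] ^= rows[r][1]
        pivot_row[col] = r
        r += 1
    return [rows[pivot_row[c]][1] for c in range(n)]

# The stuck example above, 0-based: index i stands for chunk c(i+1)
c = [0x11, 0x22, 0x33, 0x44, 0x55]           # hypothetical values for c1..c5
xor = lambda ix: reduce(lambda a, b: a ^ b, (c[i] for i in ix))
received = [({1, 0}, xor({1, 0})),           # s1 = c2 + c1
            ({2, 4, 1}, xor({2, 4, 1})),     # s2 = c3 + c5 + c2
            ({0, 4}, xor({0, 4})),           # s3 = c1 + c5
            ({0, 3}, xor({0, 3})),           # s4 = c1 + c4
            ({3, 2, 1}, xor({3, 2, 1})),     # s5 = c4 + c3 + c2
            ({0, 2}, xor({0, 2}))]           # s6 = c1 + c3
assert gf2_solve(5, received) == c
```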

3. Broadcast Strategy

Our design is based on a two-level structured broadcast model. Each validator receives a number of packets proportional to its stake and is then responsible for re-broadcasting those packets to the rest of the network.

Let $S$ represent the stake distribution across all validators:

$$S = [S_1,S_2,...,S_n]$$

where each $S_i$ is the stake of validator $i$, and let $S_T = \sum_{i=1}^{n} S_i$ be the total stake. Given $M$ total encoded symbols (chunks), each validator $i$ is assigned:

$$\lceil \frac{S_i}{S_T} M\rceil$$

chunks to re-broadcast. Rounding up via the ceiling function ensures that rounding never reduces the total number of chunks distributed; combined with the redundancy factor derived below, this guarantees that each honest validator can collect enough chunks to decode, even with one-third of the stake held by Byzantine nodes.

How Does the Leader Decide the Number of Encoded Symbols $M, M'$?

Let’s assume the original proposal message fits in K chunks. The leader needs to ensure that honest validators can still collectively receive enough chunks to successfully decode the original message, even in the presence of the following:

  • $L$: Packet loss from leader to validators (first hop)
  • $L'$: Packet loss among validators (second hop)
  • Byzantine behavior by validators (who may withhold or refuse to rebroadcast their assigned chunks) - expected to be less than 1/3 of the stake weight

We define the following additional variables:

  • $S_T$: total stake
  • $S_L$: leader's stake

Redundancy for each factor can be constructed as follows:

Packet Loss

$$r_{pl}=\frac{1}{(1-L)(1-L')}$$

The terms $1-L$ and $1-L'$ appear in the denominator because they represent the fraction of packets successfully delivered. For example, with a packet loss of 66.66%, only 1/3 of the packets are delivered, requiring a 3× redundancy factor to compensate ($\frac{1}{1 - 0.666} \approx 3$). Hence, placing them in the denominator scales the required redundancy appropriately.

Byzantine behavior (up to 1/3 of the network)

$$r_{b}=\frac{3S_T-3S_L}{2S_T - 3S_L}$$

The fraction $\frac{3S_T - 3S_L}{2S_T - 3S_L}$ ensures Byzantine fault tolerance under the assumption that 1/3 of validators may be malicious and refuse to rebroadcast. It adjusts the redundancy factor based on the leader’s stake.

Notably, if the leader holds zero stake ($S_L = 0$), the expression simplifies to $\frac{3}{2}$, which yields a base redundancy of 1.5, aligning with the requirement for tolerance against 1/3 Byzantine nodes. For instance, if 10 chunks are needed, the leader would send 15 chunks. Even if 5 of those are dropped (due to Byzantine behavior), the remaining 10 are sufficient.

Finally, as the leader’s stake approaches 2/3 of the total stake, the denominator $(2S_T - 3S_L)$ approaches zero, driving the redundancy factor toward infinity.

We can combine these factors multiplicatively, along with a constant factor of 1.1 to account for decoding overhead.

$$r=1.1\times r_{pl} \times r_b=\frac{1.1(3S_T-3S_L)}{(1-L)(1-L')(2S_T - 3S_L)}$$

This formula ensures that:

  • Packet loss is accounted for at both transmission stages
  • The leader’s own stake is excluded from the redundancy calculation
  • Byzantine nodes (up to 1/3) are assumed to not participate in rebroadcasting
  • Honest validators receive at least 1.1× the required number of chunks for successful decoding

In the current implementation, the redundancy value is hardcoded. This can lead to inefficiencies when lower redundancy would suffice, and may also introduce Byzantine tolerance issues in scenarios where higher redundancy is required. For this reason, adopting a dynamic redundancy scheme (the above equation) could significantly improve the system's robustness.

Some examples, assuming total stake $S_T = 100$:

| $S_L$ | $L$ | $L'$ | Redundancy with BFT Guarantee | Redundancy without BFT Guarantee |
|-------|-----|------|-------------------------------|----------------------------------|
| 1     | 1%  | 1%   | 1.69                          | 1.12                             |
| 5     | 1%  | 1%   | 1.72                          | 1.12                             |
| 10    | 1%  | 1%   | 1.78                          | 1.12                             |
| 20    | 1%  | 1%   | 1.92                          | 1.12                             |
| 1     | 5%  | 5%   | 1.83                          | 1.22                             |
| 5     | 5%  | 5%   | 1.87                          | 1.22                             |
| 10    | 5%  | 5%   | 1.93                          | 1.22                             |
| 20    | 5%  | 5%   | 2.08                          | 1.22                             |
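
A small sketch that reproduces the table values from the combined formula:

```python
def redundancy(s_total, s_leader, loss1, loss2, bft=True):
    """Combined redundancy r = 1.1 * r_pl * r_b from the formulas above."""
    r_pl = 1 / ((1 - loss1) * (1 - loss2))
    r_b = (3*s_total - 3*s_leader) / (2*s_total - 3*s_leader) if bft else 1
    return 1.1 * r_pl * r_b

print(f"{redundancy(100, 1, 0.01, 0.01):.2f}")             # 1.69
print(f"{redundancy(100, 20, 0.01, 0.01):.2f}")            # 1.92
print(f"{redundancy(100, 1, 0.01, 0.01, bft=False):.2f}")  # 1.12
```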

So the leader will generate $M' = K \times r + n$ chunks, where $n$ is the number of validators, and set:

$$M=K \times r$$

and distribute chunks to individual validators according to $\lceil \frac{S_i}{S_T} M\rceil$.

Example

Consider a stake vector:

$$S=[2,3,2,1] $$

Here, the total stake is $S_T=8$.

Suppose the proposal block fits into 4 data chunks, and the redundancy factor is set to 2. This gives a total of $M = 4\times2 = 8$. To account for rounding in the above formula, the leader preemptively generates 4 extra backup chunks, one per validator, so $M' = M+n = 12$ chunks are created.

Phase 1: Assignment

The leader assigns chunks to validators based on their stake, using the $\lceil \frac{S_i}{S_T} M\rceil$ formula:

  • Validator 1 has $\frac{2}{8}$ stake → receives 2 chunks
  • Validator 2 has $\frac{3}{8}$ stake → receives 3 chunks
  • Validator 3 has $\frac{2}{8}$ stake → receives 2 chunks
  • Validator 4 has $\frac{1}{8}$ stake → receives 1 chunk

Phase 2: Re-broadcasting

After the leader sends the assigned chunks to each validator in the first phase, the validators are responsible for re-broadcasting their assigned chunks to all other validators.

The following image illustrates the distribution and re-broadcasting of chunks:

In the previous example, the number of chunks assigned to each validator happened to be an integer, so no rounding was required. Hence, none of the backup chunks (c9 - c12) were used. However, in most real-world scenarios, fractional values arise and rounding up is necessary.

For instance, consider a new stake distribution with total stake 12:

$$S=[3,5,2,2]$$

Let the number of original chunks be $2$ with redundancy factor 4, which implies $M=8$. Now, we compute the number of chunks each validator should receive:

  • Validator 1: $\lceil \frac{8 \times 3}{12} \rceil$ → receives 2 chunks
  • Validator 2: $\lceil \frac{8 \times 5}{12} \rceil$ → receives 4 chunks
  • Validator 3: $\lceil \frac{8 \times 2}{12} \rceil$ → receives 2 chunks
  • Validator 4: $\lceil \frac{8 \times 2}{12} \rceil$ → receives 2 chunks

This results in 2+4+2+2 = 10 assigned chunks, even though $M = 8$. The 2 extra chunks are required to account for the rounding effect. This is precisely why the leader pre-generates $n$ extra chunks, where $n$ is the number of validators.
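
Both assignment examples can be checked with a few lines of Python:

```python
from math import ceil

def assign_chunks(stakes, m):
    """Chunks each validator re-broadcasts: ceil(S_i / S_T * M)."""
    s_total = sum(stakes)
    return [ceil(m * s / s_total) for s in stakes]

print(assign_chunks([2, 3, 2, 1], 8))  # [2, 3, 2, 1]: exactly M = 8 chunks
print(assign_chunks([3, 5, 2, 2], 8))  # [2, 4, 2, 2]: 10 > M, backups needed
```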

The following picture illustrates the chunk distribution from the leader to the validators in the second example.

Conclusion and Future Directions

A high-performance blockchain demands that every part of the technology stack be optimized for throughput and low latency. RaptorCast is our current solution to the problem of distributing block proposals to a global network of node operators in a scalable and efficient way. As a quick summary, RaptorCast:

  1. Uses UDP as its data transmission protocol, accepting the lack of delivery guarantees to favor performance. Through digital signatures, each individual chunk is directly attributable to a given leader and block proposal.
  2. Addresses inevitable packet loss with an encoding system. In particular, we selected R10 (an implementation of Raptor codes). R10 extends Luby Transform (LT) codes by adding a pre-coding stage and falls back to matrix-based decoding when the peeling decoder fails.
  3. Implements a two-hop broadcast tree in which chunks are distributed by stake weight initially, and the recipients re-broadcast their chunks to the rest of the network. Redundancy is a function of expected packet loss, leader’s stake value, potential Byzantine behavior, and a small decoding overhead.

We’ve described the current design but are actively researching areas of improvement. Future work could include the following high-level ideas:

Optimistic design with fallback mechanism

If a Byzantine minority refused to participate in block propagation, the network would lose a round. By relaxing the Byzantine fault tolerance guarantee, the redundancy requirements drop significantly. To achieve a balance of performance and liveness security, one could design an optimistic (low redundancy) proposal broadcast system equipped with a fallback mechanism. If consecutive or patterned failures are observed, higher redundancy to accommodate Byzantine fault tolerance can be activated. After a series of successful rounds, these parameters can then be automatically relaxed to enhance performance.

Faster encoder/decoder

Reed-Solomon codes could be a good alternative when the number of source chunks is small, since their encoding and decoding become significantly slower as the number of chunks increases. One way to manage this is by dividing the data into smaller segments and encoding each segment separately, which makes Reed-Solomon more practical. However, to maintain Byzantine tolerance, the redundancy required for each segment can result in substantial bandwidth overhead.

References

Anza. (n.d.). Turbine Block Propagation. Agave Validator Documentation. Retrieved April 29, 2025, from https://docs.anza.xyz/consensus/turbine-block-propagation

Luby, M. (2002). LT codes. Proceedings of the 43rd Annual IEEE Symposium on Foundations of Computer Science, 271–282. https://doi.org/10.1109/SFCS.2002.1181950

Luby, M., Shokrollahi, A., Watson, M., & Stockhammer, T. (2007). Raptor forward error correction scheme for object delivery (RFC 5053). IETF. https://datatracker.ietf.org/doc/html/rfc5053

Reed, I. S., & Solomon, G. (1960). Polynomial codes over certain finite fields. Journal of the Society for Industrial and Applied Mathematics, 8(2), 300–304. https://doi.org/10.1137/0108018

Shokrollahi, A. (2006). Raptor codes. IEEE Transactions on Information Theory, 52(6), 2551–2567. https://doi.org/10.1109/TIT.2006.874390
