The 8th KIR: NetS&P Lab (KAIST)_Securing and Improving BFT Consensus Protocols with Advanced Networking_Progress Report(5)

Summary

For much improved transaction volume, latency, and finality, many blockchain systems utilize BFT consensus protocols. In particular, permissioned blockchains enjoy the full benefits of BFT consensus among a small (e.g., a few tens or hundreds) group of distributed nodes. One remaining problem though is the limited scalability of BFT consensus protocols. In particular, the many existing protocols, such as PBFT, Istanbul BFT, have quadratic message complexity to the number of replicas in the system, and this makes the blockchains hard to scale beyond a few tens of nodes participating in consensus. Some newer proposals have attempted to reduce the message complexity via committee selections, threshold signatures, and trusted hardware.

In this proposal, we ask whether the current network infrastructure is properly used for the BFT consensus protocols in permissioned blockchains and investigate if emerging network primitives can further improve the throughput performance. First, we plan to study whether the current Internet infrastructure is properly used for scalable operation of the Istanbul BFT consensus protocol in real large-scale blockchain networks like Klaytn. In particular, we will perform an in-depth evaluation on the effect of the current underlying Internet architecture of the commercial blockchains on their throughput performance. Furthermore, we aim to utilize emerging hardware-based networking primitives, such as programmable switches, to further speed up the message propagation in the Istanbul BFT consensus protocol.

Key Deliverables - Technical report

For our KIR milestone #5, we submit this technical report on the progress of our evaluation of the network message complexity in the current Klaytn network. We have found that the current Klaytn implementation sends many redundant consensus messages in all rounds regardless of whether round changes occur. We believe that this behavior increases the message complexity in Klaytn unnecessarily from O(n^2) to O(n^3) and, thus, we provide our detailed analysis on this problem in this report. To clearly show this problem, we empirically confirm the redundant message transfers in a controlled environment, including decryption and decompression of all consensus messages between all consensus nodes. We make some observations and suggestions for remaining tasks.

1 Investigating Message Broadcasts in Klaytn

1.1 Review of IBFT message complexity

Let us first review the IBFT protocol and its intended message complexity [1].


Figure 1. Intended message complexity of IBFT

Figure 1 illustrates a possible scenario of message transfers between four consensus nodes (CNs). CN1 takes the role of a proposer in this example and initiates a round with a Preprepare message. Once the Preprepare message is broadcast to CN2-CN4, Prepare messages are sent from all CNs (except the proposer) to all other CNs. Commit and RoundChange messages are sent from all CNs to all other CNs. Overall, the Preprepare message type incurs O(n) message complexity while all other message types create O(n^2) message complexity in IBFT.

1.2 Empirical evidence of n^3 message complexity in Klaytn

To measure the message complexity in Klaytn, we set up seven CNs in a controlled environment and count the number of messages of the Prepare message type in each round.


Figure 2. Number of received Prepare messages per round

Figure 2 shows the number of Prepare messages the CN1 node receives at each round. As we discuss in Figure 1, we expect six Prepare messages at the CN1 node per round. However, we observe that about 36 Prepare messages are received at the CN1 node in most cases. This significant difference is hard to explain with the message transfer model of IBFT, as shown in Figure 1.


Figure 3. Distribution of redundant message events

To better understand the main cause of this huge gap between the theory and the practice, we take a look at the more detailed patterns of message receptions at the CN1 node. One conjecture we set up is that there seems to exist many redundant consensus messages exchanges between CNs. For example, in a single round, CN1 sends only one Prepare message to CN2. The original IBFT protocol [1] has no redundant messages in the protocol, as shown in Figure 1. In Figure 3, we show the number of cases for different numbers of redundant messages. Interestingly, out of all 755 unique messages we observe during one experiment with seven CN nodes, only 29 messages are exchanged without any redundant messages. In all other 726 cases, some redundant messages are made and received by the CN1 node. Surprisingly, in the vast majority (e.g., 528 out of 755) of cases, there exist five redundant messages for the already received consensus messages.


Figure 4. Actual message complexity in Klaytn’s IBFT

From the experiments above, we conclude that the Klaytn’s IBFT implementation somehow makes redundant consensus messages. Based on the number of redundant messages (i.e., 6 in most cases when 7 CN nodes exist), we further conjecture the message exchange patterns between CN nodes for the Klaytn’s IBFT, as shown in Figure 4. The best explanation of the behaviors shown in Figure 2 and Figure 3 is that each CN node receives a consensus message for the first time and then broadcasts consensus messages to all other nodes that it believes have not yet received it. Figure 4 visualizes this process of broadcasting redundant messages in red. Interestingly, this implementation model explains our observations in Figure 2 and Figure 3. Moreover, the redundant messages due to this broadcast implementation of the Klaytn’s IBFT easily outnumbers the originally intended IBFT consensus messages.

1.3 SDN-based test environment


Figure 5. Overview of our SDN-based Klaytn network analysis framework

Let us briefly describe our experiment setup for the simple controlled experiment with seven CNs. The seven CN nodes run on separate virtual machines (VMs) and all of their peer-to-peer sessions go over the common software Open vSwitch (OvS). The software switch is connected to a single POX controller.

The POX controller fetches all the session keys between CNs. Whenever CNs roll over their session keys, the key information at the POX controller is also updated. With the current session key information, the POX controller decrypts the consensus messages.

1.4 Decrypting RLPx messages


Figure 6. RLPx message structure

To accurately count the number of redundant messages of the same type, we decrypt and decode the received RLPx messages. Figure 6 shows the RLPx message structure. We use the counter-mode symmetric key decryption with AES. And the Snappy compression is used to decompress messages.

1.4 Code reviews

To better understand the broadcasting behavior in the Klaytn’s IBFT, we review the current Klaytn’s code. All messages handled in HandleMsg are forwarded to GossipSubPeer unless any error occurs. Each node receives the same consensus message at least 1 to 6 times, according to the following flowcharts in Figure 7 and Figure 8.

569x325
Figure 7. HandleMsg function in klaytn/consensus/istanbul/backend/handler.go (version 1.8.3)

(Some details are omitted for easier understanding)


Figure 8. GossipSubPeer function in klaytn/consensus/istanbul/backend/backend.go (version 1.8.3)

To investigate where this logic of sending redundant messages has originated, we crawled several open source projects and found that it originates from Quorum 2.0. There exist HandleMsg (quorum/consensus/istanbul/backend/handler.go) and Gossip (quorum/consensus/istanbul/backend/backend.go) functions in Quorum 2.0. We discover that the two functions are almost identical to the ones in the current Klaytn code.

1.5 Impact analysis

It is, in fact, unclear why this broadcasting behavior was first introduced to the IBFT codebase since none of the source codes (both in Klaytn and Quorum) leaves any explanations in the code. One possible reason we can guess is to better resist message drops with redundant messages at the network layer. Since every message is redundantly broadcast to all other nodes, this provides another layer of message propagation which makes the consensus protocol highly robust against message drops.

However, the added robustness to message drops does not fully explain the cost incurred by this broadcasting behavior. With this additional broadcast the message complexity of O(n^2) becomes O(n^3). We believe that more in-depth analysis of tradeoff between message complexity and robustness to message drops should be conducted to fully understand the pros and cons of the consensus message broadcast behavior in Klaytn.

1.6 Remaining tasks

  • Evaluate the benefit of having broadcasts
  • Evaluate network bandwidth increase due to the broadcast
  • Evaluate the effect of the broadcast for larger networks

References

[1] Istanbul Byzantine Fault Tolerance · Issue #650 · ethereum/EIPs · GitHub

1 Like

Thank you for such an interesting study!

It’s a guess.
I think broadcast is for short consensus time. IBFT is a consensus algorithm that can improve scalability with only a limited consensus node, so it will have a short consensus time.

In a short consensus time, all five steps of consensus (including proposal) must be completed, but without broadcast is considered to be unstable. The reason may be to optimize communication time through broadcast because the physical location or communication congestion status between consensus nodes is unknown.

Of course, the above is not verified through experiments or research, but it is a assumption that came up with the results of the KAIST team’s research.

1 Like

@MinGyu Thanks for your feedback. Yes, we also guess that it is for making the IBFT consensus more robust to network failures, which is particularly critical for fast consensus algorithms like IBFT.

The current broadcast can be optimal (since all important messages are heavily repeated) for reducing consensus time only if the network has infinite bandwidth. In other words, repetition is great for reliability when its cost is bearable. However, we don’t think that’s the case for Klaytn. As Klaytn aims to increase the number of consensus nodes concurrently participating for consensus, the network bandwidth required for consensus is critical to the reliable operation of Klaytn. We believe it is the time to carefully evaluate whether if this heavily repetition of consensus messages in Klaytn is really worth it and whether if there exists any sweet spot between full broadcast of consensus messages and zero redundant messages.

2 Likes

@Prop_minsukk

According to this link, maximum number of validators in IBFT is 30. Beyond number can cause performance loss. Now, number of klaytn’s CN is 32, and Klaytn are planning to increase it further. If this link is correct, Klaytn must find a solution for how to prevent performance loss.

I think, we can find solution in ‘committee’. How about only committee participate and broadcast messages. If so, as number of CN increases, consensus stability and communication congestion will not increase as much as O(n^3) if number of committee is maintained.