For significantly improved transaction volume, latency, and finality, many blockchain systems adopt BFT consensus protocols. In particular, permissioned blockchains enjoy the full benefits of BFT consensus among a small group (e.g., a few tens or hundreds) of distributed nodes. One remaining problem, though, is the limited scalability of BFT consensus protocols. In particular, many existing protocols, such as PBFT and Istanbul BFT, have message complexity quadratic in the number of replicas in the system, which makes such blockchains hard to scale beyond a few tens of nodes participating in consensus. Some newer proposals have attempted to reduce the message complexity via committee selection, threshold signatures, and trusted hardware.
In this proposal, we ask whether the current network infrastructure is properly used for BFT consensus protocols in permissioned blockchains and investigate whether emerging network primitives can further improve throughput. First, we plan to study whether the current Internet infrastructure is properly used for scalable operation of the Istanbul BFT consensus protocol in real large-scale blockchain networks like Klaytn. In particular, we will perform an in-depth evaluation of how the underlying Internet architecture of commercial blockchains affects their throughput. Furthermore, we aim to utilize emerging hardware-based networking primitives, such as programmable switches, to further speed up message propagation in the Istanbul BFT consensus protocol.
For our KIR milestone #6, which is the final technical report of this KIR, we summarize our contributions in three parts. (1) We have investigated possible network attacks against Klaytn’s current underlying peer-to-peer network. (2) We have investigated ways to reduce the message complexity of the current Istanbul BFT consensus algorithm. (3) We have designed, implemented, and open-sourced a general-purpose peer-to-peer message analysis tool for Klaytn.
In this KIR, we aim to study the robustness of Klaytn’s current peer-to-peer network layer against failures, disruptions, and even active attacks. The robustness of underlying blockchain networks has always been an interesting topic in blockchain research in general. Interestingly, this very research problem has attracted even more attention over the last year, while our team has been working toward more robust Klaytn operations. Particularly, as many real-world blockchain applications have surfaced in recent years (e.g., NFTs, DeFi), their application-level requirements are more important than ever before. Most of these new real-world applications require fast and reliable network operations, as they typically offer real-time transactions of values and non-fungible objects. We have all witnessed how important robust blockchain operation is for those emerging applications, and how badly NFT and DeFi applications are affected by any glitches, delays, or failures in underlying blockchain networks.
Therefore, in this aspect, we strongly believe that our work toward a more robust Klaytn network and our contribution through an open-source project have advanced the state-of-the-art operation strategy of Klaytn. In the next three sections, we summarize our contributions in this KIR. The first two are summaries of previous KIR reports, and the last describes a new contribution of an open-source project to the Klaytn ecosystem. We then close our final report by listing a number of interesting future project ideas that would strengthen the networking layer of blockchain systems.
Network attackers can mount any combination of attacks from a wide set of attack vectors, from bandwidth denial-of-service attacks to sophisticated network link-flooding attacks, to disrupt Klaytn’s underlying networks. As the reliability of the underlying networks becomes more critical to the business of killer apps on blockchains, network attacks have become a realistic threat we must consider and prepare for in advance.
In our earlier reports, we have investigated whether targeted attacks against the current Klaytn CNs would be possible. From our report (1), we summarize our findings below.
Let us first review Klaytn’s Core Cell (CC) architecture. The following figure shows an example of a CC with one CN and two PNs. By design, Klaytn CNs are supposed to connect directly to all other CNs. In addition, a CN is connected to the PNs in its CC. Note that a CN creates and maintains no connections other than these.
Figure 1. Klaytn Core Cell architecture.
This simple connectivity for CNs ensures that a CN never directly connects to others outside its own CC and the CN mesh network. This is indeed a critical design choice since this makes Klaytn CNs not directly observable from outside. Consider an adversary who wishes to make a direct connection to a Klaytn CN. She must find the network address (i.e., IP and port number) of the CN; however, there exists no protocol-level support that provides the CN’s network address to her EN. The adversary’s EN can only connect to one of the PNs and this PN assignment is done by a bootnode.
We begin with the information that can be easily obtained by any unauthorized party (e.g., any participant with an EN deployment). Thus, our first approach is to enumerate all existing PN network addresses in the current Klaytn cypress network.
In order to discover as many PNs in the network as possible, we create a single EN node and repeatedly reconnect it to the cypress network. Every time our EN node connects to the network, a different set of PNs is selected for it. By repeating this process many times, we are able to enumerate all existing PNs in the network. To minimize any unnecessary burden on the cypress network, we space out consecutive EN connections with an ample time gap (e.g., a few minutes). Moreover, we repeat this experiment in five different Amazon EC2 regions: US-east, US-west, East Asia, Southeast Asia, and Europe.
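The enumeration procedure above can be viewed as a simple convergence loop. The following toy model is purely illustrative (the PN names and the peer-sampling function are our stand-ins, not the real cypress protocol): each EN reconnection reveals a random subset of PNs, and we stop once many consecutive reconnections reveal nothing new.

```python
import random

# Stand-in for the network's PN set; 27 matches our measurement.
ALL_PNS = {f"pn-{i}" for i in range(27)}

def connect_and_list_peers(k=4):
    """Placeholder for restarting the EN and reading its assigned PNs."""
    return set(random.sample(sorted(ALL_PNS), k))

def enumerate_pns(max_idle_rounds=200):
    """Reconnect until many consecutive attempts reveal no new PN."""
    seen, idle = set(), 0
    while idle < max_idle_rounds:
        peers = connect_and_list_peers()
        idle = 0 if (peers - seen) else idle + 1
        seen |= peers
        # in the real experiment we waited a few minutes between connects
    return seen

print(len(enumerate_pns()))
```

In practice, the stopping threshold trades off experiment time against confidence that no PN was missed.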
From our experiment conducted in August 2021, we enumerated a total of 27 PNs with their IP addresses as shown in Table 1.
Table 1. Enumeration of all 27 PNs in Klaytn cypress (measured in August 2021). Last 16 bits of the IP addresses are redacted.
From the list of PNs, we gain a number of new insights. First, all the PN nodes are located in Asia, specifically in three countries: South Korea, Singapore, and Japan. This is due to the 120 ms latency restriction imposed by the Klaytn team. Second, among the three countries, South Korea currently hosts the majority of PN nodes. Third, the majority of PN nodes are hosted on Amazon AWS; in particular, 100% of the PNs outside of South Korea are in Amazon data centers.
One interesting finding is that more than half of these PNs are located within the same (or an adjacent) subnet as at least one other PN. For example, two PNs are deployed within the network subnet 220.127.116.11/24, which is owned by Korea Telecom. In fact, this close proximity of PN nodes is not a surprising design artifact, as it matches the Klaytn-suggested Core Cell architecture; see Figure 1. In that sense, it is natural to create two PN nodes in the same network subnet and let them be advertised by the bootnodes.
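Checking for such co-located PNs is straightforward once the addresses are enumerated. A minimal sketch using Python's standard ipaddress module, with made-up documentation-range addresses in place of the real (redacted) PN IPs:

```python
import ipaddress
from collections import defaultdict

def colocated(addresses, prefix=24):
    """Group addresses by /prefix subnet; return subnets with 2+ nodes."""
    groups = defaultdict(list)
    for addr in addresses:
        net = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
        groups[net].append(addr)
    return {str(net): sorted(a) for net, a in groups.items() if len(a) > 1}

# made-up example addresses, not real PN IPs
pns = ["203.0.113.10", "203.0.113.25", "198.51.100.7", "192.0.2.1"]
print(colocated(pns))  # {'203.0.113.0/24': ['203.0.113.10', '203.0.113.25']}
```

The same grouping at a coarser prefix (e.g., /16) would surface the "adjacent subnet" cases as well.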
The enumerated PN nodes and their network addresses may not pose a security threat by themselves. However, when this very information is used to infer the CNs’ network addresses, it can enable attacks targeting the CN nodes in the network; see Section 3 for a potential liveness attack against Klaytn.
Our finding of close proximity between pairs (or triples) of PNs does not necessarily mean that a CN node connecting to PNs with adjacent IP addresses must be located in the same subnet. The CN could very well be assigned an IP address that is unrelated to the IPs of its PNs.
What we suggest is a reasonable suspicion that some Klaytn CNs in the cypress network may be located within the same or an adjacent network subnet as their PNs. Considering typical network deployment practices in private corporations, it is reasonable to create three Amazon EC2 instances in the same subnet and use one as a CN and the other two as PNs of the Core Cell. Since there exists no explicit or implicit policy that suggests or enforces the use of different subnets for a CN and its PNs, it is likely that some Klaytn participants have created all nodes of a Core Cell in the same (or adjacent) subnet(s). The same-subnet practice is especially attractive to Klaytn participants because placing all the nodes in the same subnet can reduce the bandwidth charges for cross-subnet traffic in Amazon EC2.
Another important contribution of this KIR is identifying one specific feature in the current Klaytn Istanbul BFT implementation that increases Klaytn’s message complexity to O(n^3), where n is the number of CN nodes in the system. The original Istanbul BFT protocol ensures a message complexity of O(n^2) in the normal case (i.e., no view changes); however, the current Klaytn implementation constantly creates O(n^3) messages due to its redundant message broadcast feature.
In our earlier reports, we have clarified the effect of this feature and discussed whether it should be considered a bug. From our report (5), we summarize our findings below.
Let us first review the IBFT protocol and its intended message complexity.
Figure 2. Intended message complexity of IBFT
Figure 2 illustrates a possible scenario of message transfers between four consensus nodes (CNs). CN1 takes the role of the proposer in this example and initiates a round with a Preprepare message. Once the Preprepare message is broadcast to CN2-CN4, Prepare messages are sent from all CNs (except the proposer) to all other CNs. Commit and RoundChange messages are sent from all CNs to all other CNs. Overall, the Preprepare message type incurs O(n) message complexity, while all other message types incur O(n^2) message complexity in IBFT.
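The per-type counts described above can be written down directly. A small sketch (the function name is ours) computing the intended number of messages sent in one normal-case round with n CNs:

```python
def intended_message_counts(n):
    """Messages sent per round in the intended IBFT normal case.

    Preprepare: proposer -> all others              => n - 1       (O(n))
    Prepare:    each non-proposer -> all others     => (n - 1)^2   (O(n^2))
    Commit:     every CN -> all others              => n * (n - 1) (O(n^2))
    """
    return {
        "Preprepare": n - 1,
        "Prepare": (n - 1) ** 2,
        "Commit": n * (n - 1),
    }

print(intended_message_counts(7))
# {'Preprepare': 6, 'Prepare': 36, 'Commit': 42}
```

RoundChange, when it occurs, follows the same all-to-all pattern as Commit.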
To measure the message complexity in Klaytn, we set up seven CNs in a controlled environment and count the number of messages of the Prepare message type in each round.
Figure 3. Number of received Prepare messages per round
Figure 3 shows the number of Prepare messages the CN1 node receives in each round. As discussed with Figure 2, we expect six Prepare messages at the CN1 node per round. However, we observe that about 36 Prepare messages are received at the CN1 node in most cases. This significant difference is hard to explain with the message transfer model of IBFT shown in Figure 2.
Figure 4. Distribution of redundant message events
To better understand the main cause of this huge gap between theory and practice, we take a closer look at the detailed patterns of message reception at the CN1 node. One conjecture we set up is that there exist many redundant consensus message exchanges between CNs. The original IBFT protocol has no redundant messages; for example, in a single round, CN1 should send only one Prepare message to CN2, as shown in Figure 2. In Figure 4, we show the number of cases for different numbers of redundant messages. Interestingly, out of all 755 unique messages we observe during one experiment with seven CN nodes, only 29 messages are exchanged without any redundant copies. In the other 726 cases, redundant copies are made and received by the CN1 node. Surprisingly, in the vast majority of cases (528 out of 755), there exist five redundant copies of an already received consensus message.
Figure 5. Actual message complexity in Klaytn’s IBFT
From the experiments above, we conclude that Klaytn’s IBFT implementation somehow creates redundant consensus messages. Based on the number of copies of each message received (i.e., 6 in most cases when 7 CN nodes exist), we further conjecture the message exchange pattern between CN nodes in Klaytn’s IBFT, as shown in Figure 5. The best explanation of the behavior shown in Figures 3 and 4 is that each CN node, upon receiving a consensus message for the first time, broadcasts it to all other nodes that it believes have not yet received it. Figure 5 visualizes this process of broadcasting redundant messages in red. Interestingly, this implementation model explains our observations in Figures 3 and 4. Moreover, the redundant messages due to this broadcast implementation of Klaytn’s IBFT easily outnumber the originally intended IBFT consensus messages.
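This conjectured rebroadcast behavior matches the observed counts. The following sketch is our own model (not Klaytn code); it counts the Prepare messages one observer CN receives per round, with and without the redundant first-reception broadcast:

```python
def prepares_received(n, redundant, observer=0, proposer=0):
    """Count Prepare messages the observer CN receives in one round.

    In the intended protocol, each non-proposer sends its Prepare once
    to every other CN. In the conjectured Klaytn behavior, every CN
    that receives a Prepare for the first time also rebroadcasts it to
    all CNs except itself and the CN it received it from.
    """
    received = 0
    for origin in range(n):
        if origin == proposer:
            continue  # the proposer sends no Prepare
        if origin != observer:
            received += 1  # direct copy from the originator
        if redundant:
            # every other first-time recipient forwards one extra copy
            received += len([r for r in range(n) if r not in (origin, observer)])
    return received

print(prepares_received(7, redundant=False))  # 6: intended IBFT
print(prepares_received(7, redundant=True))   # 36: matches our measurements
```

With seven CNs, each of the six distinct Prepare messages arrives once directly and five more times via rebroadcast, giving the 36 receptions and the five-redundant-copy mode seen in Figures 3 and 4.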
As we investigated the Klaytn network, we faced a number of practical challenges in analyzing peer-to-peer messages in Klaytn. Therefore, out of necessity, we developed a peer-to-peer message analysis tool for Klaytn, which provides the following three capabilities:
- Software-defined networking (SDN) based visibility vantage point (Section 4.1)
- Peer-to-peer message assembly and decryption (Section 4.2)
- Visualization of real-time consensus message types between CN nodes (Section 4.3)
In a realistic deployment scenario for the Klaytn network with multiple consensus nodes (CN nodes), peer-to-peer connections may traverse different network components, leaving no single common vantage point in the network. This is not ideal for our peer-to-peer message analysis because we cannot observe all the messages exchanged between CNs. Therefore, we develop a software-based controller that observes all the peer-to-peer messages between the CNs and mediates them whenever necessary. The details can be found in our open-source project description below (Section 4.4).
All peer-to-peer messages exchanged between CNs are encrypted and integrity protected. Thus, a complete peer-to-peer message analysis requires the vantage point to correctly decrypt the messages with the proper symmetric keys. For this, all the TCP segments must first be correctly assembled and the payloads then decompressed. We have implemented all these steps in our message analysis tool. Our tool makes it convenient to analyze any peer-to-peer messages observed on the network, making further investigation of the Istanbul BFT protocol easier. The details can be found in our open-source project description below (Section 4.4).
Analysis of peer-to-peer messages and the consensus behaviors of multiple CN nodes often requires real-time observation of message types and counts. Such instant feedback is helpful for identifying subtle mistakes and even bugs in protocol designs. As we tracked down the protocol bug (i.e., the unnecessary redundant broadcast of consensus messages in the current Istanbul BFT implementation), we developed real-time visualization using open-source tools so that experimenters can easily trace the message complexity and thus infer the state of the CN nodes. The details can be found in our open-source project description below (Section 4.4).
We open-source our peer-to-peer message analysis tool at the following GitHub link: https://github.com/NetSP-KAIST/Klaytn_CN_monitoring . The details of the installation and use cases can be found at the link as well; we also summarize them below in this report.
This section describes how to reproduce the Klaytn consensus message monitoring environment. Let us briefly describe our experimental setup for the simple controlled experiment involving seven CNs. Each CN operates on a separate virtual machine (VM), and all of their peer-to-peer sessions traverse a common software switch, Open vSwitch (OvS). One POX controller is connected to the software switch. We then capture all packets on each network interface we define (tap1, tap2, and so on).
Figure 6. Overview of our SDN-based Klaytn network analysis framework
- Consensus message
A consensus node (CN) of Klaytn is built atop the Istanbul Byzantine Fault Tolerance (IBFT) protocol. IBFT is a consensus protocol for blockchains that guarantees immediate finality. There are four distinct types of consensus messages in IBFT: Preprepare, Prepare, Commit, and RoundChange.
In terms of implementation, several types of messages are used to operate the Klaytn system, including transaction messages, messages for the RLPx protocol, and consensus messages. A consensus message is the message type utilized by CNs to achieve consensus. Each consensus message contains the consensus message type, the sender’s address, the sender’s signature, and the message body including the round number, among other fields. All of these messages are encrypted with keys that are unique to each TCP session. To obtain the round number and message type, we extract the keys and decrypt the data.
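The fields we extract can be represented roughly as follows. This is a simplified illustration of the structure, not Klaytn's actual type definitions:

```python
from dataclasses import dataclass

# the four IBFT consensus message types
PREPREPARE, PREPARE, COMMIT, ROUNDCHANGE = range(4)

@dataclass
class ConsensusMessage:
    """Simplified view of the fields carried by a consensus message."""
    msg_type: int      # one of the four IBFT types above
    sender: bytes      # sender's address
    signature: bytes   # sender's signature
    round_number: int  # extracted from the RLP-encoded message body

msg = ConsensusMessage(PREPARE, b"\x01" * 20, b"\x00" * 65, 42)
print(msg.msg_type, msg.round_number)  # 1 42
```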
- Consensus message monitoring
The consensus message monitoring presents the list of messages received per round for each CN. If one of the CNs becomes the proposer at a particular round number, it will not receive a Preprepare message. We utilize a Grafana dashboard to present the number of messages received by each CN.
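The per-round statistics behind the dashboard amount to a simple aggregation. A sketch (the tuple layout and names are our own) of counting reception events per (CN, round, message type), which is the kind of series we plot in Grafana:

```python
from collections import Counter

def per_round_counts(events):
    """events: iterable of (cn, round_number, msg_type) reception events."""
    return Counter(events)

events = [
    ("CN1", 5, "Prepare"), ("CN1", 5, "Prepare"),
    ("CN1", 5, "Commit"), ("CN2", 5, "Prepare"),
]
counts = per_round_counts(events)
print(counts[("CN1", 5, "Prepare")])  # 2
```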
Figure 7. Grafana real-time dashboard example
This file does many things as described below:
- creating a txt file from a pcap file that contains every packet sent during a TCP session;
- identifying the secret key for every TCP session;
- decrypting each RLPx frame;
- recording the outcomes in a MySQL database.
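The recording step can be sketched as follows. We use Python's built-in sqlite3 here purely for illustration (the tool itself writes to MySQL), and the minimal schema is hypothetical:

```python
import sqlite3

# hypothetical minimal schema for decrypted consensus messages
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE consensus_msgs ("
    " cn TEXT, round_number INTEGER, msg_type TEXT)"
)

def record(cn, round_number, msg_type):
    """Store one decrypted consensus message."""
    conn.execute(
        "INSERT INTO consensus_msgs VALUES (?, ?, ?)",
        (cn, round_number, msg_type),
    )

record("CN1", 5, "Prepare")
record("CN1", 5, "Commit")
rows = conn.execute("SELECT COUNT(*) FROM consensus_msgs").fetchone()[0]
print(rows)  # 2
```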
This Python file uses the Crypto, sha3, snappy, and rlp libraries to decode each frame.
You need to modify this file based on your system. This file is used on your host machine.
This file defines some utility functions used by data_processing.py.
You need to modify this file based on your system. This file is used on your host machine.
The Klaytn kcn binary is built using this file. We slightly altered the code in order to obtain the secret key. Don’t forget to run “make” once the original Klaytn rlpx has been replaced with this file. This file is used on your virtual machines. Also, create a SecData directory, and modify some code based on your system.
- /klaytn_pox/VM_ script/script.sh
This file transmits the secret key extracted from rlpx.go to the host machine. You need to create a ‘secrets’ directory on your host machine and modify this code based on your system.
- Data Structure
We decrypt and decode the received RLPx messages. Figure 8 shows the RLPx message structure. We use counter-mode (CTR) symmetric-key decryption with AES, and Snappy decompression for the compressed messages.
Figure 8. RLPx message structure
- Data collection
We use the open-source PacketSorter library to gather data. It sorts packets by sequence number, and any duplicate or empty messages are eliminated from the output pcap file. The program attaches to each network interface; to capture all incoming and outgoing packets, we run PacketSorter on tap1. We use the Scapy package to read pcap files in Python.
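The core of this reassembly step can be expressed as a short sketch. This is our own simplification of what PacketSorter does, operating on (sequence number, payload) pairs from one TCP direction:

```python
def sort_and_dedup(packets):
    """packets: list of (seq, payload) captured from one TCP direction.

    Returns the payload stream ordered by sequence number, with
    duplicate retransmissions and empty segments (pure ACKs) dropped.
    """
    seen, ordered = set(), []
    for seq, payload in sorted(packets):
        if payload and seq not in seen:
            seen.add(seq)
            ordered.append(payload)
    return b"".join(ordered)

pkts = [(200, b"world"), (100, b"hello "), (200, b"world"), (300, b"")]
print(sort_and_dedup(pkts))  # b'hello world'
```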
The data_processing.py file handles data decryption. Each frame is decrypted using the Crypto and sha3 libraries and decompressed using Snappy.
Several components must be installed manually, so it will take some time to set everything up. The intended readers are Klaytn developers, so simple steps may be omitted. Familiarity with running Klaytn nodes in virtual machines (VMs) will be of great assistance. Details of the installation steps can be found on the GitHub README page.
Having completed the tasks in this KIR, we would like to close the report with a discussion of several interesting and important research questions that remain to be addressed. Considering the importance of and difficulties in these research problems, we suggest forming an interdisciplinary research team to tackle them, if possible. The following potential future research directions are only examples; more details can be discussed for subsequent research collaboration.
- Automated Detection of Network Vulnerabilities. In this KIR and much other recent work, researchers have investigated performance bugs and security vulnerabilities in the networking stacks of blockchain projects. One limitation is that all these efforts require manual investigation and are thus time consuming. We envision that automated detection of such network vulnerabilities would greatly benefit existing blockchain projects by finding performance bugs and security vulnerabilities so they can be addressed before they are triggered or exploited.
- Extracting Network Synchrony Requirements from Consensus Implementations. Most blockchain systems make certain assumptions about their underlying network synchrony. Yet consensus safety analysis under given synchrony assumptions is known to be difficult and error prone. Moreover, some blockchain implementations do not even state their network synchrony assumptions explicitly. We envision that extracting the required network assumptions directly from blockchain implementations would significantly benefit the blockchain ecosystem.
- Guaranteed Network Access for Critical Consensus Messages. Blockchain networks may have certain network synchrony assumptions and design their consensus logic to work under these assumptions. However, blockchain networks have never been able to enforce specific levels of network requirements on the underlying networks. We envision that exploring ways to design and build such network access guarantee mechanisms for blockchain applications would benefit existing blockchains and their latency-sensitive applications.