For much-improved transaction throughput, latency, and finality, many blockchain systems use BFT consensus protocols. In particular, permissioned blockchains enjoy the full benefits of BFT consensus among a small group (e.g., a few tens or hundreds) of distributed nodes. One remaining problem, though, is the limited scalability of BFT consensus protocols. Many existing protocols, such as PBFT and Istanbul BFT, have message complexity that is quadratic in the number of replicas in the system, which makes it hard for blockchains to scale beyond a few tens of nodes participating in consensus. Some newer proposals have attempted to reduce the message complexity via committee selection, threshold signatures, and trusted hardware.
In this proposal, we ask whether the current network infrastructure is properly used for the BFT consensus protocols in permissioned blockchains, and we investigate whether emerging network primitives can further improve throughput. First, we plan to study whether the current Internet infrastructure is properly used for scalable operation of the Istanbul BFT consensus protocol in real large-scale blockchain networks like Klaytn. In particular, we will perform an in-depth evaluation of how the underlying Internet architecture of commercial blockchains affects their throughput. Furthermore, we aim to utilize emerging hardware-based networking primitives, such as programmable switches, to further speed up message propagation in the Istanbul BFT consensus protocol.
Expected project completion time: 12 months
For our KIR milestone #1, we submit this technical report on our initial measurement study of Klaytn’s CN network. Our main focus is on the CN-to-CN network of Klaytn because it is the peer-to-peer network that runs the Istanbul BFT protocol for consensus.
This technical report first summarizes our measurement study of a Klaytn CN network in a small-scale lab test environment. Then, we present our initial observation study of the Klaytn cypress network and the consensus-related CN-to-CN messages. Our inference
This technical report not only provides our initial measurement results of Klaytn but also prepares our security analysis of the cypress CN network. Last, we introduce a potential liveness attack in the current Klaytn cypress network and overview the remaining tasks for an attack demonstration.
We first conduct a measurement study on Klaytn instances in our local network. This small-scale Klaytn network is a standalone network that is isolated from the main and test networks of Klaytn. It produces its own Klaytn blockchain, which is independent of the operation of the main Klaytn network.
The purpose of our experiments in this local setting is threefold. First, we aim to become more familiar with the current Klaytn codebase and its peer-to-peer network protocol by executing its binary in a safe local network. Second, we want to understand the structure of the messages exchanged between CNs in Klaytn. Third, we wish to measure the network message complexity between CNs when they run the Istanbul BFT consensus protocol.
We have created a small Klaytn network using the official Go implementation of Klaytn nodes from the main branch of Klaytn’s GitHub repository (GitHub - klaytn/klaytn: Official Go implementation of the Klaytn protocol).
Figure 1. Topology of our small-scale local Klaytn network.
Figure 1 shows our Klaytn network topology. In total, we created 8 CNs, 8 PNs, and 2 ENs. All the CNs are directly connected to each other. All PNs are also connected to each other with direct peering. Each PN is paired with a CN, forming a Core Cell. Each of the two ENs is connected to all the PNs.
All these nodes are connected in a carefully controlled manner. We, the experimenters, manually pre-configure each node connectivity. During the entire experiment, the node connectivity does not change.
As an initial experiment setup, we ignore any network propagation delay between nodes; that is, the network latency between all nodes is practically zero. In future experiments, we plan to build an appropriate network latency model based on the inferred CN locations and add artificial network delay between each node pair using network traffic control utilities, such as tc (traffic control). Moreover, we plan to expand the network size so that we can better measure the consensus message complexity of larger networks.
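As a rough illustration of the planned delay injection, the sketch below builds the argument list for a tc command that uses the netem queueing discipline. The interface name eth0 and the 50 ms value are placeholders of ours, not parameters of our testbed. This simple form delays all egress traffic on an interface rather than traffic to a specific peer; per-pair delays would additionally require tc filters, which are omitted here.

```python
def netem_delay_cmd(iface, delay_ms):
    """Build (but do not run) a tc command that adds a fixed egress delay.

    iface and delay_ms are illustrative inputs; running the resulting
    command requires root privileges on a Linux host.
    """
    if delay_ms < 0:
        raise ValueError("delay must be non-negative")
    return ["tc", "qdisc", "add", "dev", iface, "root",
            "netem", "delay", f"{delay_ms}ms"]


# Placeholder values: interface "eth0", 50 ms of added latency.
print(" ".join(netem_delay_cmd("eth0", 50)))
```

The function only constructs the command so that it can be reviewed or logged before being passed to a process runner.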
We follow the default Klaytn parameters for the IBFT consensus operation. This includes an average one-second period for each block finalization. For every consensus round, we use only six out of all eight CNs; that is, our Klaytn committee includes a selected subset of six CNs out of the eight CNs. The selection of the committee nodes follows the Klaytn protocol. The committee can be dynamically changed as done in the Klaytn cypress network. All these are configured in the configuration JSON file in Klaytn.
We use two ENs to submit transactions that are generated by the accounts under our control. As a first set of test experiments, we have tried two transactions-per-second (TPS) configurations: 0 (i.e., no transactions thus empty blocks) and 420. As the ENs generate transactions in the network, the PNs receive the transactions and forward them to the CNs. The leader CN of the round then creates a new block at the head of the round and initiates the IBFT consensus protocol. The CN nodes then finish the consensus protocol and finalize the block.
As we run the small-scale Klaytn network, we have the full advantage of monitoring (and controlling, if wished) any CN-to-CN messages as we control its entire networking component.
CN-to-CN message format. Transaction messages in Klaytn are exchanged via port 32324. The frame format is given in Figure 2.
Figure 2. Klaytn transaction message format.
A transaction message has the following fields:
- value: The amount of KLAY in peb to be transferred.
- to: The account address that will receive the transferred value.
- input: Data attached to the transaction, used for transaction execution.
- v,r,s: The cryptographic signature generated by the sender to let the receiver obtain the sender’s address.
- nonce: A value used to uniquely identify a sender’s transaction. If two transactions with the same nonce are generated by a sender, only one is executed.
- gas: The maximum amount of transaction fee the transaction is allowed to use.
- gasPrice: A multiplier to get how much the sender will pay in tokens. The amount of tokens the sender will pay is calculated via gas * gasPrice.
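The fields above can be sketched as a small data structure. This is a minimal illustration, not Klaytn's actual type: the class name TxFields, the helper first_per_nonce, the all-zero recipient address, and the example gas and gas price values are our own assumptions, and the v, r, s signature is omitted. The nonce helper simply keeps the first transaction seen per nonce for a single sender; which duplicate a real node executes depends on its transaction pool.

```python
from dataclasses import dataclass

PEB_PER_KLAY = 10**18  # 1 KLAY = 10^18 peb


@dataclass(frozen=True)
class TxFields:
    """Hypothetical container mirroring the listed fields (signature omitted)."""
    nonce: int
    gas_price: int   # peb paid per unit of gas
    gas: int         # upper bound on the gas the transaction may use
    to: str          # recipient account address
    value: int       # amount to transfer, in peb
    input: bytes = b""

    def max_fee_peb(self):
        # Maximum fee the sender can be charged: gas * gasPrice.
        return self.gas * self.gas_price


def first_per_nonce(txs):
    """Keep only the first transaction seen for each nonce (one sender assumed)."""
    kept = {}
    for tx in txs:
        kept.setdefault(tx.nonce, tx)
    return list(kept.values())


# Illustrative values only: a placeholder address and example gas settings.
placeholder_to = "0x" + "00" * 20
tx = TxFields(nonce=0, gas_price=25 * 10**9, gas=21000,
              to=placeholder_to, value=PEB_PER_KLAY)
duplicate = TxFields(nonce=0, gas_price=25 * 10**9, gas=21000,
                     to=placeholder_to, value=2 * PEB_PER_KLAY)

print(tx.max_fee_peb())                        # fee bound in peb
print(len(first_per_nonce([tx, duplicate])))   # duplicates by nonce collapse
```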
Block messages are exchanged via port 32323.
Figure 3. Klaytn block message format.
A block message contains all the information about the block it is carrying, including its proposer identity, reward address, block hash, block header, etc.
The last type of message is the consensus message, which is also exchanged via port 32323. Consensus messages come in several types, such as request, preprepare, prepare, commit, and round change.
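The port assignment described above suggests a simple way to label captured TCP segments. The following is a minimal sketch under that assumption; the function name classify_segment and the label strings are ours, and a real analysis would also need to parse the payload to separate block messages from consensus messages, since both share port 32323.

```python
# Port-to-label mapping taken from the port numbers stated in the text.
PORT_LABELS = {
    32323: "block/consensus",
    32324: "transaction",
}


def classify_segment(src_port, dst_port, labels=PORT_LABELS):
    """Label a TCP segment by whichever endpoint uses a known Klaytn port."""
    for port in (dst_port, src_port):
        if port in labels:
            return labels[port]
    return "other"


# Hypothetical segments: (source port, destination port).
for ports in [(50000, 32323), (32324, 50001), (50002, 443)]:
    print(ports, classify_segment(*ports))
```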
Point-to-point messages between CN1 and CN2. We first monitor the CN-to-CN messages between two CNs: CN1 and CN2. The following Figure 4 shows how the CN-to-CN messages are exchanged and how large they are.
Figure 4. TCP segments (on port = 32323) exchanged between CN1 and CN2.
We learn several interesting traffic characteristics of CN-to-CN messages in Klaytn. First, they are highly punctual. All consensus messages in our experiment are exchanged within a short time period (e.g., 100 msec) once a new round begins. This highly punctual message exchange is a result of the design choices in Klaytn. It is also due to the fact that our small-scale experiment setup has practically zero network propagation latency, which makes message transfer virtually instant. Second, we observe that the vast majority of consensus messages are small (e.g., < 100 Bytes), suggesting that most consensus messages serve the main IBFT consensus operations rather than sharing block information.
All CN-to-CN messages. To further study the message size distributions in CN-to-CN consensus messages, we take all CN-to-CN messages in our small-scale testbed with six committee nodes. The following Figure 5 shows the histogram of CN-to-CN consensus messages.
Figure 5. TCP segment size distribution in CN-to-CN messages in a small-scale experiment.
As Figure 5 shows, the vast majority of CN-to-CN messages are small, carrying only a few bytes of content. We make a number of interesting observations. First, even when the CN network handles many more transactions per second, the number of small consensus messages (e.g., < 100 Bytes) stays almost the same. This suggests that the main complexity of the consensus protocol, which is due to the IBFT messages, does not grow as TPS increases. Second, related to the first observation, the number of large consensus messages grows as TPS increases. See the increased frequency of large messages (e.g., > 100 Bytes) when TPS = 420. The largest message is 33,722 Bytes. Although large messages are several orders of magnitude less frequent than small messages, they occupy significant bandwidth between CN nodes. Based on their sizes, these large messages seem to be Klaytn block messages, which carry block information. Yet, as the sizes of these block messages vary widely, further investigation is necessary.
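The observation that rare large messages can still dominate bandwidth is easy to check numerically. The sketch below splits a list of segment sizes at the 100-byte threshold used above and reports the byte share of the large messages. The sample data is made up for illustration (only the 33,722-byte maximum comes from our measurement), and summarize_sizes is a name of our own choosing.

```python
SMALL_THRESHOLD = 100  # bytes; the small/large boundary used in the text


def summarize_sizes(sizes, threshold=SMALL_THRESHOLD):
    """Count small and large segments and compute the large-segment byte share."""
    small = [s for s in sizes if s < threshold]
    large = [s for s in sizes if s >= threshold]
    total_bytes = sum(sizes)
    return {
        "small_count": len(small),
        "large_count": len(large),
        "large_byte_share": sum(large) / total_bytes if total_bytes else 0.0,
    }


# Hypothetical trace: 1000 small 60-byte segments and two maximum-size segments.
sample = [60] * 1000 + [33722] * 2
summary = summarize_sizes(sample)
print(summary)
```

In this made-up trace the large segments are 500 times less frequent than the small ones, yet they account for more than half of the bytes.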
Non-consensus messages between CNs. Our last observation from the small-scale Klaytn experiment is that there exist large numbers of transaction messages between CNs.
Table 1. There exist large numbers of transaction messages directly exchanged between CNs.
To our surprise, we find that there are far more transaction messages in the CN-to-CN channels than consensus messages. This is evidenced by the two different port numbers that Klaytn uses for the different message types: block messages are exchanged via TCP port 32323, while transaction messages are sent via port 32324.
Figure 6. Distribution of CN-to-CN messages based on their port numbers.
In Figure 6, we also investigate the size distribution of these two types of CN-to-CN messages. As expected, transaction messages are much smaller than block messages because they generally carry much less data. However, transaction messages significantly outnumber block messages, which can create bottlenecks in the CN network.
We learn from our conversation in the Klaytn forum (Transaction data exchange between CNs - post #2 by colin.kim - Klaytn - Klaytn Developers Forum) that, by design, CNs often exchange transactions directly with other CNs for faster transaction propagation. Although we understand the design choice, we believe it is undesirable. CN-to-CN communication channels should be reserved for consensus messages only, because they are much more expensive than the channels via PNs. Consensus messages have O(N^2) complexity, where N is the number of CNs in the committee, so we cannot easily increase N to add CN-to-CN channel bandwidth. Increasing the number of PNs, in contrast, has no such complexity limit: the more PNs, the more bandwidth for non-consensus communication. The effect of transaction messages in the CN-to-CN channel in Klaytn needs more investigation and experiments in larger networks.
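To make the quadratic growth concrete, the sketch below counts messages in one consensus round under a simplified three-phase model of our own: the leader sends a preprepare to every other committee member, and every member then broadcasts a prepare and a commit to all others. This is an assumption for illustration, not a count taken from the Klaytn implementation, and the function names are hypothetical.

```python
def mesh_links(n):
    """Number of point-to-point links in a full mesh of n nodes."""
    return n * (n - 1) // 2


def consensus_messages_per_round(n):
    """Messages per round in a simplified preprepare/prepare/commit model."""
    preprepare = n - 1        # leader to every other member
    prepare = n * (n - 1)     # every member to every other member
    commit = n * (n - 1)      # every member to every other member
    return preprepare + prepare + commit


print(mesh_links(8))  # full mesh among the 8 CNs of our local testbed
for committee_size in (6, 12, 24):
    print(committee_size, consensus_messages_per_round(committee_size))
```

Under this model, doubling the committee from 6 to 12 members raises the per-round message count from 65 to 275, roughly a fourfold increase.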
Once we have gained the core knowledge of how Klaytn protocol works in local network experiments, we attempt to learn some high-level metadata about the Klaytn cypress network. In general, accurate topology information of blockchain peer-to-peer networks is considered highly sensitive. When such accurate topology information is leaked to an adversary, she can mount targeted attacks against some nodes with high connectivity (to degrade the connectivity of the network) or some nodes that process consensus information (to disrupt the consensus protocol of the blockchain). Klaytn cypress network topology is also considered sensitive. In particular, since Klaytn runs a leader-based consensus protocol, any availability attack towards a selected leader in a round can seriously degrade the liveness of Klaytn. We present our intermediate report on this very topic.
Let us first review Klaytn’s Core Cell (CC) architecture. The following figure shows an example of a CC with one CN and two PNs. By design, Klaytn CNs are supposed to connect directly to all other CNs. In addition, a CN is connected to the PNs in its CC. Note that a CN creates and maintains no other connections.
Figure 7. Klaytn Core Cell architecture.
This simple connectivity for CNs ensures that a CN never directly connects to others outside its own CC and the CN mesh network. This is indeed a critical design choice since this makes Klaytn CNs not directly observable from outside. Consider an adversary who wishes to make a direct connection to a Klaytn CN. She must find the network address (i.e., IP and port number) of the CN; however, there exists no protocol-level support that provides the CN’s network address to her EN. The adversary’s EN can only connect to one of the PNs and this PN assignment is done by a bootnode.
We begin with the information that can be easily obtained by any unauthorized parties (e.g., any participants with an EN deployment). Thus, our first approach is to enumerate all existing PN network addresses in the current Klaytn cypress network.
To discover as many PNs in the network as possible, we create a single EN and repeatedly connect it to the cypress network. Every time our EN connects to the network, a different set of PNs is selected for it. By repeating this process several times, we are able to enumerate all existing PNs in the network. To minimize any unnecessary burden on the cypress network, we space out consecutive EN connections with a sufficient time gap (e.g., a few minutes). Moreover, we repeat this experiment in five different Amazon EC2 data centers: US-east, US-west, East Asia, Southeast Asia, and Europe.
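The repeated-connection procedure can be sketched as a loop that accumulates the PNs seen so far and stops once several consecutive connections reveal nothing new. The code below runs against a fake bootnode that returns a random subset of a synthetic pool; the random selection, the pool of 27 labels, the subset size, and all function names are assumptions made for illustration, and the real experiment additionally waits a few minutes between connections.

```python
import random


def enumerate_peers(connect_once, patience=5, max_rounds=1000):
    """Collect peers until `patience` consecutive connections add nothing new."""
    seen = set()
    rounds_without_new = 0
    for _ in range(max_rounds):
        peers = set(connect_once())
        if peers - seen:
            seen |= peers
            rounds_without_new = 0
        else:
            rounds_without_new += 1
            if rounds_without_new >= patience:
                break
    return seen


def make_fake_bootnode(pool, subset_size, seed=0):
    """Return a stand-in for one EN connection: a random subset of the pool."""
    rng = random.Random(seed)
    ordered = sorted(pool)

    def connect_once():
        return rng.sample(ordered, subset_size)

    return connect_once


# Synthetic pool of 27 labels standing in for PN addresses.
pool = {f"pn-{i:02d}" for i in range(27)}
found = enumerate_peers(make_fake_bootnode(pool, subset_size=3), patience=50)
print(len(found), "of", len(pool), "synthetic PNs discovered")
```

The stopping rule is a heuristic: a larger patience value lowers the chance of missing a rarely advertised PN at the cost of more connections.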
From our experiment conducted in August 2021, we enumerated a total of 27 PNs with their IP addresses as shown in Table 2.
Table 2. Enumeration of all 27 PNs in Klaytn cypress (measured in August 2021). Last 16 bits of the IP addresses are redacted.
From the list of PNs, we gain a number of new insights. First, all the PN nodes are located in Asia, specifically in three countries: South Korea, Singapore, and Japan. This is due to the 120 ms latency restriction imposed by the Klaytn team. Second, among the three countries, South Korea currently has the majority of PN nodes. Third, the majority of PN nodes are hosted by Amazon AWS; in particular, 100% of PNs outside of South Korea are in Amazon data centers.
One interesting finding is that more than half of these PNs are located within the same (or an adjacent) subnet as at least one other PN. For example, two PNs are deployed within the network subnet 220.127.116.11/24, which is owned by Korea Telecom. In fact, this close proximity of PN nodes is not surprising, as it mirrors the Core Cell architecture suggested by Klaytn; see Figure 7. In a sense, it is natural to create two PN nodes in the same network subnet and let the bootnodes advertise them.
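Grouping enumerated addresses by subnet is straightforward with Python's standard ipaddress module. The sketch below is an illustration only: the addresses come from the reserved documentation ranges rather than from Table 2, the /24 prefix length is an assumed default, and the check for adjacent subnets is omitted for brevity.

```python
import ipaddress
from collections import defaultdict


def group_by_subnet(addresses, prefix_len=24):
    """Return subnets that contain more than one of the given IPv4 addresses."""
    groups = defaultdict(list)
    for addr in addresses:
        # strict=False masks the host bits, yielding the enclosing network.
        net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        groups[net].append(addr)
    return {net: ips for net, ips in groups.items() if len(ips) > 1}


# Documentation-range addresses used as stand-ins for real PN addresses.
sample_pns = ["192.0.2.10", "192.0.2.77", "198.51.100.5", "203.0.113.9"]
shared = group_by_subnet(sample_pns)
for net, ips in shared.items():
    print(net, ips)
```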
The enumerated PN nodes and their network addresses may not pose any security threat by themselves. However, when this very information is used to infer the CNs’ network addresses, it can lead to some attacks targeting the CN nodes in the network; see Section 3 for a potential liveness attack against Klaytn.
Our finding of close proximity between a pair (or triple) of PNs does not necessarily suggest that a CN node connecting to the PNs with adjacent IP addresses must be located in the same subnet. Their CN could very well be assigned an IP address that is unrelated to the IPs of the PNs.
What we raise is a reasonable suspicion that some Klaytn CNs in the cypress network may be located within the same subnet as their PNs or an adjacent one. Considering typical network deployment practices in private corporations, it is reasonable to create three Amazon EC2 instances in the same subnet and use one as the CN and the other two as the PNs of the Core Cell. Since there exists no explicit or implicit policy that suggests or enforces the use of different subnets for a CN and its PNs, it is likely that some Klaytn participants have created all nodes of a Core Cell in the same (or adjacent) subnet(s). Same-subnet deployment is especially attractive for Klaytn participants because placing all the nodes in the same subnet can reduce the bandwidth charge for cross-subnet traffic in Amazon EC2.
As unauthorized external experimenters, our team is not able to directly confirm our same-subnet conjecture. We will seek other ways to confirm this conjecture in the future.
Klaytn is based on the Istanbul BFT consensus protocol. It guarantees the safety property as long as the number of nodes n satisfies n ≥ 3f + 1, where f denotes the number of arbitrarily misbehaving (i.e., Byzantine) nodes in the system. In this project, we focus on a network attack that aims to violate the liveness of Klaytn; that is, the attack aims to add non-negligible, arbitrary delays to the consensus operation. An extreme form of liveness attack can be considered a denial-of-service attack, as it may make the Klaytn blockchain nearly unusable.
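As a small worked example of the bound n ≥ 3f + 1, the largest tolerable number of Byzantine nodes for a given n is the floor of (n - 1) / 3. The helper name below is our own; the committee sizes are the ones that appear in our local testbed.

```python
def max_byzantine_faults(n):
    """Largest f that satisfies n >= 3f + 1 for a system of n nodes."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return (n - 1) // 3


for n in (4, 6, 8):
    print(f"n = {n}: tolerates up to f = {max_byzantine_faults(n)} Byzantine node(s)")
```

Note that a committee of six tolerates the same single fault as a committee of four; the bound only improves at n = 7.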
We consider a network adversary as the main threat actor. Examples include malicious Internet Service Providers (ISPs), malicious cloud providers, and malicious network vendors. One may argue that a network adversary is too strong an adversary model for blockchains like Klaytn; yet, considering that blockchains have constantly been targeted by network operators, competing businesses, and even nation states, it is not a far-fetched model.
We also consider that a network adversary can conduct basic network message analysis, such as deep packet inspection and packet redirection for any message of particular interest. The network adversaries can also delay any message of interest with any arbitrary amount of added latency. Infinite added latency would mean a packet drop. We also consider an adaptive adversary, where an attack strategy may change in real time in response to the target systems’ behaviors and state changes.
We assume that the Klaytn nodes are connected via the public Internet, not an isolated private network. Also, no connections are made through a network obfuscation service (e.g., Tor).
The liveness attack requires identifying the Klaytn consensus-related messages in Internet traffic. This can be a daunting task even when the adversary is a network operator, as it is like finding a needle in a haystack. We believe, however, that it is a feasible attack operation for typical network operators due to the following two attack capabilities.
Identifying CN-to-CN messages. CN-to-CN messages have clear message formats. Moreover, the protocol dictates the specific port numbers (e.g., 32323 for consensus messages and 32324 for transaction messages), which makes CN-to-CN message identification even easier for network operators.
Inferring CN node network addresses. By design, the network addresses of Klaytn CN nodes are not exposed publicly. Instead, Klaytn PNs are exposed publicly, and any message to CNs must travel through Klaytn PNs. In practice, however, it is very likely that Klaytn CN nodes are co-located with their PNs, mainly for cost reduction, and this would make inferring CN node addresses much easier.
TODO item 1. We plan to demonstrate a Klaytn liveness attack in the next few months in a local Klaytn network with a reasonable committee size. The main task of the attack demonstration is to devise an optimized attack strategy that determines which CN-to-CN Klaytn messages should be delayed, exactly when the message disruption should start, and how long each message should be delayed. This will require a careful analysis of Klaytn’s current IBFT protocol.
TODO item 2. We will then simulate a more realistic global network topology that reflects the current Klaytn cypress node distribution. For example, we will consider the hosting operators (e.g., Amazon, Korea Telecom) and their geographical locations to evaluate the risk of liveness attacks.