The 8th KIR: NetS&P Lab (KAIST)_Securing and Improving BFT Consensus Protocols with Advanced Networking_Progress Report(3)


To achieve much higher transaction throughput, lower latency, and faster finality, many blockchain systems utilize BFT consensus protocols. In particular, permissioned blockchains enjoy the full benefits of BFT consensus among a small group (e.g., a few tens or hundreds) of distributed nodes. One remaining problem, though, is the limited scalability of BFT consensus protocols. Many existing protocols, such as PBFT and Istanbul BFT, have message complexity that is quadratic in the number of replicas in the system, which makes blockchains hard to scale beyond a few tens of nodes participating in consensus. Some newer proposals have attempted to reduce the message complexity via committee selection, threshold signatures, and trusted hardware.
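To make the quadratic growth concrete, the following sketch counts the messages of one PBFT-style round under a simplified model. The model is an assumption of this example, not a measurement of Klaytn: the primary sends a pre-prepare to every backup, every backup multicasts a prepare to all other replicas, and every replica multicasts a commit to all other replicas. The function names are illustrative.

```python
def pbft_round_messages(n: int) -> int:
    """Count point-to-point messages in one simplified PBFT round with n replicas."""
    if n < 4:
        raise ValueError("PBFT needs at least 4 replicas to tolerate one fault")
    pre_prepare = n - 1            # primary -> each backup
    prepare = (n - 1) * (n - 1)    # each backup -> every other replica
    commit = n * (n - 1)           # each replica -> every other replica
    return pre_prepare + prepare + commit


def max_faulty(n: int) -> int:
    """Largest f such that n >= 3f + 1."""
    return (n - 1) // 3


if __name__ == "__main__":
    for n in (4, 10, 30, 100):
        print(n, max_faulty(n), pbft_round_messages(n))
```

Under this model the total is 2n(n-1) messages per round, so growing from 10 to 100 replicas raises the count from 180 to 19,800.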

In this proposal, we ask whether the current network infrastructure is properly used for the BFT consensus protocols in permissioned blockchains and investigate if emerging network primitives can further improve the throughput performance. First, we plan to study whether the current Internet infrastructure is properly used for scalable operation of the Istanbul BFT consensus protocol in real large-scale blockchain networks like Klaytn. In particular, we will perform an in-depth evaluation on the effect of the current underlying Internet architecture of the commercial blockchains on their throughput performance. Furthermore, we aim to utilize emerging hardware-based networking primitives, such as programmable switches, to further speed up the message propagation in the Istanbul BFT consensus protocol.

Key Deliverables - Technical report

For our KIR milestone #3, we submit this technical report on the progress of our ongoing effort to build two evaluation frameworks for Klaytn or, more generally, BFT-based consensus algorithms. The first evaluation framework, which we call the precise network degradation emulation, enables experimenters to create various actual network disruptions between Klaytn CN nodes in order to disrupt Klaytn’s consensus operation. The second evaluation framework, which we call the consensus offloading emulation, allows us to delegate some functions of the Klaytn nodes to programmable switches in the network substrate in order to reduce message complexity. These two evaluation frameworks are still in development, so this report focuses on the design of the frameworks, their current status, and our next steps. The frameworks will be open sourced once they are complete, and we hope they will make it much easier to evaluate many attack and disruption scenarios for Klaytn.

1 Precise Network Degradation for Consensus Liveness Evaluation

We have built an evaluation framework, called precise network degradation emulation, to provide a versatile network evaluation framework for Klaytn security testing. In particular, we focus on a wide range of network degradation between CN nodes in Klaytn.

The network degradation we wish to test with this framework ranges from transient link failure and transient node failure [1], through large-scale network failure (e.g., the KT ISP network disruption that happened in October 2021 [2]) and DDoS attacks against network equipment [3] or CN nodes [4], to sophisticated packet-level disruption of CN communications by malicious ISPs [5].

In order to create such a wide range of network degradation while executing unmodified Klaytn CN nodes and their Istanbul BFT consensus algorithm, we aim to build a system with a general-purpose software-defined networking (SDN) capability [6] so that we (or any other experimenters and researchers) can easily express sophisticated network disruption/attack scenarios in a high-level language (e.g., Python) [7] and conduct Klaytn consensus evaluations. In particular, we use the OpenFlow protocol [8] to program and enforce our network degradation rules on OpenFlow-supporting software switches (e.g., Open vSwitch [9]). See Figure 1 for a simplified illustration of our SDN-based Klaytn evaluation framework.

Figure 1. Overview of the SDN-based network degradation evaluation framework

In the rest of this section, we explain our current status with some initial results and we summarize our remaining tasks.

1.1 Building a network emulation platform with software switches

We set up multiple virtual machines (VMs), each hosting one CN on its own operating system. This way, we isolate the CNs with their own networking stacks, ensuring that any issue in the networking stack of one CN does not directly affect the performance of the others. These CNs in separate VMs are connected to a single software switch.

We install Open vSwitch (OVS) as the switch that controls all CN-to-CN communication, as shown in Figure 1. Each CN node is connected to a separate interface of the OVS switch for isolated network degradation enforcement. That is, network degradation or an attack on one link does not affect the performance of other, unaffected links.

Service nodes can also reach the CN nodes through the software switch, and these connections are separated from the data plane of the CN-to-CN communication links. Moreover, our network of CNs is strictly disconnected from the Internet to provide a completely controlled experiment environment.

1.2 Creating a network degradation mechanism with SDN controllers

We aim to build a universal, expressive network degradation policy, which is written in a high-level language, that allows experimenters and researchers to easily write a wide range of network degradation scenarios for consensus networks.

In particular, our network degradation policy should be able to express the following:

  • Transient link failure: any CN-to-CN link can be temporarily disabled by network instability or short-term attacks.
  • Transient node failure: any CN node can be temporarily unavailable for communication due to node instability or short-term denial-of-service attacks. Such CN instability occurred in 2020 and caused disruption in the Klaytn network [1].
  • Large-scale network failure: multiple CN-to-CN links can become unavailable for a short-term or even long-term period. The KT ISP network disruption that happened in October 2021 is a good example [2].
  • DDoS attacks against network equipment or CN nodes: This attack is similar to large-scale network failure but with one big difference. Attackers can change the targeted network segments or nodes [3] to adapt to changes in the network state.
  • Sophisticated packet-level network disruption of CN communications by malicious ISPs: We consider advanced attacks that disrupt the networking layer between CN nodes in a fine-grained manner. To be specific, a network adversary, such as a malicious ISP, monitors every packet between CNs and selectively drops or delays any of them to achieve specific attack goals.
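One way to picture how the first four scenarios could share a single representation is a schedule of time-windowed outages over unordered CN pairs. The sketch below is illustrative only: the names `Outage`, `link_failure`, `node_failure`, and `is_blocked` are hypothetical and are not part of the framework. Under this representation, a large-scale failure is an outage covering many links, and an adaptive attack is a sequence of outages with changing targets.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Outage:
    """One degradation event, active during [start, end) in seconds."""
    start: float
    end: float
    links: FrozenSet[FrozenSet[str]]  # set of unordered CN pairs


def link_failure(a: str, b: str, start: float, end: float) -> Outage:
    """Disable the single link between CN a and CN b."""
    return Outage(start, end, frozenset({frozenset({a, b})}))


def node_failure(node: str, all_nodes: List[str], start: float, end: float) -> Outage:
    """A node failure disables every link incident to the node."""
    links = frozenset(frozenset({node, other}) for other in all_nodes if other != node)
    return Outage(start, end, links)


def is_blocked(schedule: List[Outage], src: str, dst: str, now: float) -> bool:
    """True if any active outage covers the src-dst link at time now."""
    pair = frozenset({src, dst})
    return any(o.start <= now < o.end and pair in o.links for o in schedule)


if __name__ == "__main__":
    nodes = ["CN1", "CN2", "CN3", "CN4"]
    schedule = [
        link_failure("CN1", "CN2", start=10, end=20),
        node_failure("CN3", nodes, start=30, end=45),
    ]
    print(is_blocked(schedule, "CN2", "CN1", now=15))
    print(is_blocked(schedule, "CN1", "CN4", now=35))
```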

We utilize existing software-defined networking (SDN) capabilities to realize the network degradation policy and its enforcement. To be specific, we place one logical network controller that directly interacts with the software switch by inserting, modifying, and removing flow rules at the switch. This way, flows that are differentiated by the TCP/IP five tuple (i.e., source IP, destination IP, source port number, destination port number, protocol) and some other packet characteristics (e.g., packet sizes) can be treated differently by the switch, which emulates the underlying network infrastructure between CN nodes.
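The matching logic described above can be modeled in a few lines. The following is a pure-Python sketch of a priority-ordered match-action table, not POX or OpenFlow code. The class names, the IP addresses, the port numbers, and the choice of 98 bytes as the matched length are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str
    length: int


@dataclass(frozen=True)
class Rule:
    """A match-action rule; a field left as None acts as a wildcard."""
    action: str                      # e.g., "drop" or "forward"
    priority: int = 0
    src_ip: Optional[str] = None
    dst_ip: Optional[str] = None
    src_port: Optional[int] = None
    dst_port: Optional[int] = None
    proto: Optional[str] = None
    length: Optional[int] = None

    def matches(self, pkt: Packet) -> bool:
        fields = ("src_ip", "dst_ip", "src_port", "dst_port", "proto", "length")
        return all(
            getattr(self, f) is None or getattr(self, f) == getattr(pkt, f)
            for f in fields
        )


def lookup(table: List[Rule], pkt: Packet, default: str = "forward") -> str:
    """Return the action of the highest-priority matching rule."""
    hits = [r for r in table if r.matches(pkt)]
    return max(hits, key=lambda r: r.priority).action if hits else default


if __name__ == "__main__":
    # Hypothetical rule: drop 98-byte TCP packets from one CN to another.
    table = [Rule(action="drop", priority=10, src_ip="10.0.0.1",
                  dst_ip="10.0.0.2", proto="tcp", length=98)]
    print(lookup(table, Packet("10.0.0.1", "10.0.0.2", 40000, 30000, "tcp", 98)))
    print(lookup(table, Packet("10.0.0.1", "10.0.0.2", 40000, 30000, "tcp", 120)))
```

Leaving the port fields as wildcards makes the rule apply to every TCP connection between the two hosts, while the length field narrows it to one packet size.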

For simple degradation actions, such as packet drops, we directly install flow rules via OpenFlow. The OVS switch then drops all matched CN-to-CN packets. For more sophisticated actions, such as packet delays with an arbitrary amount of latency, we use controller logic to hold the packets and forward them after predefined delays.
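The hold-and-forward logic for delays can be sketched independently of any controller API. The example below is a minimal model, assuming the controller receives each matched packet and decides when to release it. `DelayQueue` is a hypothetical name, and time is passed in explicitly so that the behavior is deterministic. A real controller would drive `release_due` from a timer.

```python
import heapq
import itertools
from typing import Any, List, Tuple


class DelayQueue:
    """Hold packets and release each one once its delay has elapsed."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, Any]] = []
        # Sequence number breaks ties so equal release times stay in FIFO order.
        self._seq = itertools.count()

    def hold(self, packet: Any, now: float, delay: float) -> None:
        """Store a packet that should be forwarded at time now + delay."""
        heapq.heappush(self._heap, (now + delay, next(self._seq), packet))

    def release_due(self, now: float) -> List[Any]:
        """Pop and return every packet whose release time has passed."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due


if __name__ == "__main__":
    q = DelayQueue()
    q.hold("pkt-a", now=0.0, delay=0.5)
    q.hold("pkt-b", now=0.0, delay=0.25)
    print(q.release_due(0.3))
    print(q.release_due(0.5))
```

A heap keyed on release time means packets with different delays can overtake each other, which is the point when emulating selective delay of specific messages.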

To make the precise network degradation system easier to use, we adopt a network controller based on a high-level language. In particular, we use POX [7], a Python-based open source SDN controller. POX makes it easy to write rules for various network degradation policies.

1.3 Simple test results

We have tested the following two scenarios to show that our SDN-based evaluation framework indeed allows and enforces highly flexible network degradation for Klaytn consensus liveness tests.

  • Test scenario 1: transient link failures between one or more pairs of CN nodes
    • In this test, we first disconnect CN1 and CN2 nodes and this drops all packets between the two nodes. Then, we observe the status of the block generation of the test Klaytn network for one minute. For this simple test, we use only four CN nodes. We repeat this process by increasing the number of disconnected CN pairs.

  • Test results:

  • Test scenario 2: targeted packet drops between some pairs of CN nodes
    • In this test, we first drop packets with a length of 98 bytes between CN1 and CN2. We then observe the status of the block generation for one minute. We repeat the above process by increasing the number of the affected CN pairs.

  • Test results:

1.4 Remaining tasks

  • Testing a wide range of network outages and attacks on the Klaytn network with more CN nodes with the precise network degradation evaluation framework
  • Identifying encrypted IBFT message types with machine learning
  • Obfuscating packets with padding, timing adjustment, and relaying to make IBFT message type inference harder

2 Consensus Offloading for Message Complexity Reduction

To investigate our vision of complexity reduction with emerging programmable switches, we have begun testing programmable switch capabilities with the P4 language. We choose the bmv2 platform to evaluate a wide range of functionalities that can potentially be offloaded to switches without assuming specific hardware configurations.

2.1 Overview of programmable switches

Traditional network equipment (such as routers and switches) has only a limited set of functionalities that are baked into the hardware when it is manufactured. The lack of flexibility of this traditional network gear has motivated the programmability of network equipment. The first generation of these more flexible routers/switches follows the software-defined networking (SDN) paradigm [6]. SDN platforms achieve flexible programming of switching capabilities in datacenter networks and ISP networks by separating the control and data planes. That is, a logical controller of an entire network programs all the switches in the network by pushing match-and-action rules to each switch. OpenFlow is the de facto standard protocol used for SDN networks.

While SDN with OpenFlow indeed achieves programmability in networks, it is insufficient for consensus offloading because it provides flexible programmability only at the control plane. New protocols and functionalities can be pushed into switches, but data-plane operations are limited to what the OpenFlow protocol provides (e.g., 40 predefined header fields). Thus, OpenFlow is inappropriate for highly flexible line-rate operation at the switches.

In 2016, Barefoot Networks introduced a new switching chip called Tofino [10], which showed the feasibility of building programmability directly into switching silicon. That is, a much wider set of instructions can be written directly into the data plane of switches, making line-rate packet programming possible. Moreover, the open source programming language P4 [11], proposed in 2014, enables developers to specify how switches built on such chips should process each incoming packet at the data plane. With the new programmable switch hardware and the P4 language, network engineers can now program their ideas directly into the switches without sacrificing switching performance. This makes programmable switches a strong candidate for our consensus offloading for Klaytn, since we can potentially reduce some of the networking burden of the CN nodes by executing some functions at the switches without sacrificing network forwarding performance.

2.2 Development with the reference P4 software switch

To quickly prototype our ideas of consensus offloading, we aim to build the first proof-of-concept system with a software switch that fully supports the P4 language. This way, our proof-of-concept development does not depend on any hardware-specific limitation. We envision that, once we show the potential of consensus offloading with P4 software switches, we can then port the design to the P4-supporting hardware switches on the market.

Our choice of P4 software switch is bmv2 (which stands for behavioral model version 2) [12]. It is the second version of the reference P4 software switch and it is written in C++11. It takes the output that a P4 compiler generates from a P4 program and implements the packet-processing behavior specified in that program. bmv2 faithfully interprets P4 programs, so a P4 program that compiles and behaves correctly on bmv2 confirms the functional correctness of its data-plane logic. This is why we can rely on bmv2 for our prototype implementation of consensus offloading. Note, however, that line-rate operation on a P4 hardware switch additionally requires the program to fit within the resource constraints of the specific hardware target, which bmv2 does not model.

Having said that, there is a clear disadvantage of using bmv2 for development compared to utilizing P4 hardware switches. bmv2 is not designed to be a production-grade P4 switch, and thus the performance of bmv2, in terms of throughput and latency, can be worse than that of production-grade software switches or hardware switches. Thus, in our proof-of-concept implementation, we do not push the bandwidth capacity limit of the bmv2 software switch but instead confirm the correctness of its operations for consensus offloading. For more details, we refer readers to the bmv2 project page [12].

2.3 Testing P4 software switch capabilities

Let us dive into the P4 programming language and explore what network programmability P4 switches can provide us with some examples we have tested in previous projects.

P4 allows a network programmer full control over the switch behavior. As for the P4 language itself, it can be described as C without loops, recursion, pointers, and dynamic memory allocation [13].

We have developed the following two simple P4 network applications and explored P4 switch capabilities.

  • Test application 1: Flow volume observation with P4. In this simple application, we aim to build a compact data structure within a P4 switch using hash tables and use it to store the k largest flows. For this test application, we have tested and used the following P4 capabilities:

    • Parsing and processing arbitrary packet header fields. We use the 5 tuple in the TCP/IP header and some other arbitrary application fields to construct flow IDs and then distinguish each flow.
    • Building and utilizing hash tables. The v1model architecture in bmv2 supports several hash operations as well as read, modify, and write operations on internal registers. With these, we can easily prototype operations for hash tables.
    • Storing and manipulating data pertaining to each packet, known as metadata. We use metadata to contain the hash table index and counter of other flows.
  • Test application 2: Packet manipulation with P4. In this simple application, we develop two separate switches that signal each other at line rate to share their real-time measurement results. To be specific, one switch checks, for any given response message, whether a corresponding request message exists, and alerts the other switch if it cannot find one. The receiving switch then blacklists the specific IP address. For this test application, we have tested and used the following P4 capabilities:

    • Building and utilizing a cuckoo hash table with 4 hashing algorithms. Since P4 switches do not allow loops, the algorithm has to be unrolled at compilation time (e.g., iteration on stack with maximum counts).
    • Extracting source IP addresses from IPv4 headers and blacklisting them. Such a real-time responsive action can be easily implemented with P4 switches.
    • Appending a customized header when a counter value exceeds a threshold. We can easily manipulate headers in an arbitrary manner when certain conditions are met.
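To illustrate the kind of data structure test application 1 relies on, the following Python sketch models per-stage register arrays indexed by a hash of the flow ID. It is a simplified multi-stage scheme chosen for illustration and is not necessarily the algorithm used in our application or in [15]. The class name, the table sizes, and the eviction rule are assumptions. The loop over stages has a fixed bound, which is what a P4 program would unroll.

```python
import zlib
from typing import List, Optional, Tuple

STAGES = 2   # fixed number of stages; a P4 program would unroll this loop
SLOTS = 4    # registers per stage (tiny, for illustration)

FlowId = Tuple[str, str, int, int, str]  # five tuple used as the flow ID


class HeavyHitterTable:
    def __init__(self) -> None:
        self.keys: List[List[Optional[FlowId]]] = [[None] * SLOTS for _ in range(STAGES)]
        self.counts: List[List[int]] = [[0] * SLOTS for _ in range(STAGES)]

    @staticmethod
    def _index(flow: FlowId, stage: int) -> int:
        # CRC32 with a per-stage start value stands in for per-stage hash units.
        data = "|".join(map(str, flow)).encode()
        return zlib.crc32(data, stage) % SLOTS

    def update(self, flow: FlowId) -> None:
        """Count one packet of the given flow."""
        min_stage, min_idx = 0, self._index(flow, 0)
        for stage in range(STAGES):
            idx = self._index(flow, stage)
            if self.keys[stage][idx] in (None, flow):
                self.keys[stage][idx] = flow
                self.counts[stage][idx] += 1
                return
            if self.counts[stage][idx] < self.counts[min_stage][min_idx]:
                min_stage, min_idx = stage, idx
        # Every candidate slot holds another flow: evict the smallest one.
        # Restarting the count at 1 is a simplification that underestimates.
        self.keys[min_stage][min_idx] = flow
        self.counts[min_stage][min_idx] = 1

    def top_k(self, k: int) -> List[Tuple[FlowId, int]]:
        """Return up to k stored flows with the largest counters."""
        entries = [
            (key, cnt)
            for stage_keys, stage_counts in zip(self.keys, self.counts)
            for key, cnt in zip(stage_keys, stage_counts)
            if key is not None
        ]
        return sorted(entries, key=lambda e: e[1], reverse=True)[:k]
```

In P4 terms, the two lists correspond to register arrays, `_index` to a hash call, and the local variables in `update` to per-packet metadata.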

Note that we refer to two previous projects we have conducted for other network security problems [14] [15] to develop the two test applications.
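The request/response check of test application 2 can likewise be sketched in Python. The example below is a minimal model under stated assumptions: the key format, the table sizes, and the choice to blacklist the source IP of an unsolicited response are illustrative, and the two switches are collapsed into one function for brevity. The cuckoo insertion uses a fixed kick bound because P4 has no loops and the displacement steps must be unrolled at compile time.

```python
import zlib
from typing import List, Optional, Set

NUM_HASHES = 4   # four hash functions, as in the test application
SLOTS = 8        # slots per table (tiny, for illustration)
MAX_KICKS = 4    # fixed bound on displacements; unrolled in a P4 program


class CuckooTable:
    def __init__(self) -> None:
        self.tables: List[List[Optional[str]]] = [[None] * SLOTS for _ in range(NUM_HASHES)]

    @staticmethod
    def _index(key: str, i: int) -> int:
        # CRC32 with a different start value per table stands in for 4 hashes.
        return zlib.crc32(key.encode(), i) % SLOTS

    def contains(self, key: str) -> bool:
        return any(self.tables[i][self._index(key, i)] == key for i in range(NUM_HASHES))

    def insert(self, key: str) -> bool:
        """Insert key; return False if an entry was dropped after MAX_KICKS."""
        if self.contains(key):
            return True
        for kick in range(MAX_KICKS):
            for i in range(NUM_HASHES):
                idx = self._index(key, i)
                if self.tables[i][idx] is None:
                    self.tables[i][idx] = key
                    return True
            # All candidate slots are full: displace one occupant and retry with it.
            i = kick % NUM_HASHES
            idx = self._index(key, i)
            key, self.tables[i][idx] = self.tables[i][idx], key
        return False


def process(table: CuckooTable, blacklist: Set[str], kind: str,
            src_ip: str, dst_ip: str, txid: int) -> str:
    """Record requests; flag responses that have no matching request."""
    if kind == "request":
        table.insert(f"{src_ip}>{dst_ip}#{txid}")
        return "forward"
    # A response should match a stored request in the reverse direction.
    if table.contains(f"{dst_ip}>{src_ip}#{txid}"):
        return "forward"
    blacklist.add(src_ip)  # assumption: blacklist the sender of the response
    return "alert"
```

Storing the full key in each slot avoids false positives at the cost of wider registers, which is a trade-off a hardware target may force in the other direction.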

2.4 Remaining tasks

  • Exploring more P4 capabilities with bmv2 software switch
  • Investigating encryption/decryption capability of P4 switches
  • Discussing what consensus functionalities can be potentially offloaded to programmable switches


[1] Analysis on Consensus Delay at Cypress Block 24,002,380 | by Tech at Klaytn | Klaytn | Medium

[2] KT belatedly admits “Internet outage was a network error, not DDoS” | Hankyoreh (in Korean)

[3] Kang, Min Suk, Soo Bum Lee, and Virgil D. Gligor. “The crossfire attack.” 2013 IEEE symposium on security and privacy. IEEE, 2013.

[4] Gencer, Adem Efe, et al. “Decentralization in bitcoin and ethereum networks.” International Conference on Financial Cryptography and Data Security. Springer, Berlin, Heidelberg, 2018.

[5] Hacker Redirects Traffic From 19 Internet Providers to Steal Bitcoins | WIRED

[6] Kirkpatrick, Keith. “Software-defined networking.” Communications of the ACM 56.9 (2013): 16-19.

[7] GitHub - noxrepo/pox: The POX network software platform

[8] McKeown, Nick, et al. “OpenFlow: enabling innovation in campus networks.” ACM SIGCOMM computer communication review 38.2 (2008): 69-74.

[9] Pfaff, Ben, et al. “The design and implementation of open vswitch.” 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15). 2015.

[10] Intel® Tofino™ Series Programmable Ethernet Switch ASIC

[11] Bosshart, Pat, et al. “P4: Programming protocol-independent packet processors.” ACM SIGCOMM Computer Communication Review 44.3 (2014): 87-95.

[12] GitHub - p4lang/behavioral-model: The reference P4 software switch

[13] GitHub - jafingerhut/p4-guide: Guide to p4lang repositories and some other public info about P4

[14] Khooi, Xin Zhe, Levente Csikor, Min Suk Kang, and Dinil Mon Divakaran. “In-Network Defense Against AR-DDoS Attacks.” SIGCOMM Demos and Posters, 2020

[15] Khooi, X. Z., Csikor, L., Li, J., Kang, M. S., & Divakaran, D. M. (2021, June). Revisiting Heavy-Hitter Detection on Commodity Programmable Switches. In 2021 IEEE 7th International Conference on Network Softwarization (NetSoft).