The 10th KIR: S2WLAB_”S2_EYEZ” for Klaytn Token Economies_Progress Report(1)

Summary

Currently, token economies are considered as the major components to monitor the financial behaviors of blockchain networks. To be specific, the combination of market cap of Ethereum tokens has surpassed the value of native Ethereum already. The Klaytn network is not an exception to this trend. For instance, the KLAYSwap protocol which is Klaytn’s iconic token has successfully made more than 20 millions of transactions for now and also, it is being increased continuously. Furthermore, the number of newly launched services on Klaytn token economies is rapidly increasing and thus, over one hundred tokens are serving at this moment. Consequently, Klaytn’s token economies are surely getting a lot of attention and growing as much larger as native Klaytn.

In this KIR project, we propose an analysis framework for tracking the money flows on the Klaytn token economies. Our goal is to provide public web services to anyone who is interested in analyzing money flows on the Klaytn network. To achieve this, we have built a graph analysis engine for Klaytn token economies and it coexists together with another on-chain monitoring system we have proposed in the last KIR project. The overall operations of our system has two main steps: Token Transfer Graph Construction(TTGC) and Graph Analysis. For the first milestone, we have implemented an infrastructure to construct and analyze token transfer graphs and then, these technical reports give the full explanation of how we construct token transfer graphs and its performance evaluation results with fundamental analysis applications.

KLAY funding

Project milestones and schedule

Key Deliverables - Technical report

System Design

We focus on giving the concrete visibility of all types of financial dealings on the Klaytn mainnet(i.e., cypress) including external and internal transactions. In this project, we present S2_EYEZ for Klaytn, a high-performance on-chain money flow analysis framework for Klaytn blockchain. Essentially, the proposed system addresses blockchain transactions as a format of graph data to increase trace performance and to make track money flow easy.

We have already presented the details and implementations of the basis of the S2_EYEZ system in our previous KIR project which was well performed. However, there is no visibility on token transfers because we have considered only external transactions as financial behaviors of the blockchain. Thus, we would be supposed to deal with not only external transactions but also token transfers for this project. As the first step toward it, we put an additional graph cache system for Klaytn token economies on the existing S2_EYEZ system.


Figure 1. System Design of S2_EYEZ for Klaytn

Figure 1 shows the concise overview of S2_EYEZ system design. There are two graph cache systems each for external transactions and token economies respectively. Users do not need to know how the graph cache systems work. Once the web application server takes the user request, it is passed down to the graph cache systems and then, the individual graph cache server prepares appropriate responses with graph analysis results for the request.

As we mentioned above, the graph-based analysis system for external transactions has been launched in the last KIR project. Also, we have built an additional analysis framework for Klaytn token economies for the first milestone in this project. In the following sections, every detail of the proposed system would be addressed such as how we build a graph for token economies and what’s different about external transactions and token transfers as an analysis system.

Token transfers from event logs

In the blockchain network, event logs could represent what smart contracts have done. It records various types of events that occurred from smart contracts such as a token locking and a transfer. For the aspect of money flow tracking, we mainly focus on the event type of token transfer instead of others. The event logs of token transfers depict all of the details about the information of transferred tokens including sender and receiver addresses and transferred amount of tokens. Thus, we sorted out specific types of events(i.e., token transfer) from entire event logs to build a token transfer graph for Klaytn token economies.


Figure 2. An example of token transfers from event logs.

Figure 2 shows the example of extracted token transfer data from event logs. The log could describe how much tokens are transferred and who sends the tokens to whom. To be specific, the ‘address’ data of the example indicates a contract address to launch this transfer and this tells us which type of tokens are transferred. Additionally, we could get the answer for how many tokens are transferred from the ‘data’ of the log. Also, there is the combination of three entities as ‘topics’ to represent sender’s and receiver’s addresses and the type of event log(e.g., ‘transfer’ for this example).

The proposed system constructs a token transfer graph from such key variables of event logs. For instance, each address and transaction hash value of logs could be represented as a vertex on the graph and an edge between two vertices be generated if there is a token transfer between two addresses.

Design considerations: micro vs monolithic graph architecture


Figure 3. Micro vs Monolithic graph architecture for token economies

One of the most important design concepts of our system architecture is about the type of graph base. To be specific, we need to make a careful decision between monolithic and micro graph architectures to support many types of tokens on the token economies unlike a network for external transactions. If we apply it to real world money markets, it seems like there are several types of notes such as the US dollar, the yen and the pound to put on graphs.

By taking the concept of monolithic architecture, we could put all the types of tokens together on a single graph. This kind of singleton manner makes it easy to track the multiple types of token transfers at once by maintaining a single generalized data structure. But in general, this tightly coupled single graph has disadvantages for maintanance because the modification on a token network could bring about side effects on the other token economies.

On the other hand, the micro graph architecture provides a dedicated graph base for each type of token. This is good for the case of traversing transactions of a specified token type only. In contrast, when we make an inquiry into a large number of types of tokens at once, it requires additional costs to refer to a string of graphs to trace token transfers over one token type. Also, it is hard to maintain because there are as many individual graphs as the number of token types to manage.

In particular, the monolithic graph architecture is suitable for tracking money flow on a graph because the tracking is generally done by referencing several types of token transfers. Also, the total volume of Klaytn token transfers up to now is about dozens of gigabytes of data and it is enough to address on a single in-memory system. As a result, we have designed our system on a monolithic graph basis and also easily expand such graphs for newly released token networks with no rebuild of the entire graph base.

Graph architecture: vertices and edges

Basically, the graph represents relational data through vertices and edges. In our system, each vertex represents an entity of the blockchain network such as a wallet address and a contract address, and there could be an edge between any two vertices if two addresses have made a token transfer. The sender and receiver of a transaction is indicated as the manner of directed graph architecture. Also, we compose an extra data storage for additional information about each token transaction such as a block hash and a number.

To represent blockchain transactions, we organize three main data structures for the graph base: v-map table, Dictionaries, Relationships.

Figure 4. The format of a data structure for Dictionaries.

Figure 5. The format of a data structure for Relationships.

V-map table. This provides the conversion of text-based addresses into numeric values. The numeric value has a unique property and every single vertex on a graph is mapped with a unique numeric value. We have applied a hash data structure for this v-map table and thus, the proposed system provides constant-time lookup on average regardless of the number of items in the table.

Dictionaries. Dictionaries give a direction toward the actual address of memory location of edges for each vertex. For instance, when we refer to withdrawals of an address, we could get transaction information through touching the data on a storage location indicated with a file identifier and offset in dictionaries.

Relationships. The relationships structure stores practical data about edges. It consists of a set of records which represents an edge with a type of edge and a destination vertex. Also, each relationship could have supplementary information(e.g., transferred amount) as variable length data on a payload.

Graph architecture: doubly indirect block file system

Originally, we have firstly deployed a single indirect block file system which is used in the linux file system for a while and our system allocates 64k bytes block space to each vertex for edge representation. If a vertex has edges more than 64k bytes then, the system reallocates an indirect block for the vertex and the indirect block consists of an array of pointers for child blocks which have edge data.

In this concept, the proposed system could support about 22 millions of edges as the maximum acceptable number of edges for each vertex. However, with the steady growth of Klaytn network, this 22 million edges are not enough for some tokens any more. For example, the KLAYSwap protocol has made more than 25 million token transfers already and it is growing continuously.


Figure 6. The description of a doubly indirect block file system for graph base.

To support token economies that continue to grow, we have expanded the indirect block file system to doubly indirect blocks. Doubly indirect block file system operates two depths of pointers. That is, each vertex has an indirect pointer for edge data that points to a set of pointers that point to other blocks of pointers that finally point to actual edge data. With this architecture, the proposed system is accommodatable for up to 183 billion edges for every single vertex.

Construction result of a token transfer graph

Table 1. The overview of constructed token transfer graph

To bootstrap our system, we have gathered token transfers from the event logs of 70,472,170 blocks(i.e., from 0 to 70,472,169 block numbers). The total number of transfers extracted for the logs was exactly 121,897,518 transfers and it was close to 45 gigabytes of data. We have built a token transfer graph with the logs and it takes about two hours(note that the proposed system requires this bootstrap process only once for all lifetime). As a result of bootstrap, the token transfer graph was generated as 126,629,982 number of vertices and 488,600,320 number of edges. Also, the supplementary information about token transfers was recorded on an extra database system and it was close to 46 gigabytes of data.

Contract address report

Table 2. The number of smart contracts which made a transfer at least once.

The contract addresses on event logs indicate the identification of transferred tokens. If there is a contract address as a vertex on a token transfer graph, it means that the contract has made transfers at least once. Also, we could know how many and when the contracts live in token economies by carefully analyzing contract information on the logs.

From the understanding, our system collects such contract addresses from event logs and maintains it for further analysis. Table 2 describes the overview of collected contract addresses. For now, our team is on the deeper analysis of the contract addresses to discover any abnormal behaviors of token contracts. For the next milestone, we would present the analysis result on it with noticeable findings.

Graph search performance

Table 3. The composition of a token transfer graph.

Table 4. The execution time to traverse every single vertex on a graph.

We have evaluated the performance of the proposed graph system with simple applications as shown in Table 3 and 4. To evaluate graph search performance, we have measured the time spent traversing every single vertex in sequential order and figured out how many transfers have occurred by each vertex. During the execution, the system has touched all 126,629,982 vertices including 122,150,080 for transactions and 4,479,902 for addresses and the average number of transfers made to an address was 54. We have repeated the execution to traverse the entire graph for three times and it tooks 87.66 seconds on average. That is, the system could get the information about first level token transfers of each vertex within a microsecond on average.

Budget

This looks interesting!
Are you planning to open-source your projects?

Hello, thanks for your interest in our project.

Sorry, we don’t have any plans to release the project as open source yet and it is about some business reasons.

Instead, we would suggest you check out the BlockSci project which is a high performance system for analyzing blockchain data(may you already get).