The 5th KIR: S2WLAB_”S2_EYEZ”_Progress Report(3)_Final

Klaytn Improvement Reserve: Final Report

Summary

S2WLAB designed and implemented S2_EYEZ, a high-performance on-chain analyzer for Klaytn. It provides fast on-chain traversal for extracting insights from transaction data, based on three key strategies: 1) exploiting the locality of on-chain analysis queries, 2) relaxing ACID properties for blockchain data access, and 3) optimizing disk/memory I/O. To demonstrate our design, during the second period we introduced an implementation of S2_EYEZ for Klaytn and deployed it with publicly accessible APIs for forum users.

In this final report, we walk through an example use-case scenario with the S2_EYEZ open APIs, which shows how to leverage them to solve real-world problems that involve cryptocurrency transactions. We describe the procedure for investigating the case and provide source code with the results. Then, we evaluate system performance with respect to usability and scalability when accessing Klaytn transaction data. Lastly, we discuss future plans and lessons learned from this project.

Project Milestones and Schedule

The table below describes the project milestones and schedule. We have completed all planned tasks and deliverables for each milestone.
Expected project completion time: 6 months

Key Deliverables

This section shows an example use-case scenario with the S2_EYEZ open APIs and the implementation details for accessing Klaytn transaction data. Then, we perform experiments to measure the performance of S2_EYEZ for Klaytn while traversing on-chain transaction data.

1. Example Scenario with S2_EYEZ’s open APIs

In this section, we introduce an example use-case scenario that shows how to leverage the S2_EYEZ open APIs. Because, to our knowledge, Klaytn has not yet been used for criminal purposes, we construct a plausible criminal scenario involving cryptocurrency. To keep the focus on API usage, the scenario is deliberately straightforward. We assume two main threat actors: 1) a dealer who earns black money from drug trades and tries to cash it out, and 2) a broker who has experience using exchanges for money laundering. At the beginning, we assume that analysts know only each address and its purpose, not the relationship between the two actors. The table below lists the address-owner pairs.

In the rest of this section, we detail the procedure for identifying hidden relationships between the two actors, which consists of three steps: (i) applying preliminary knowledge to S2_EYEZ, (ii) tracing black money, and (iii) data visualization.

1.1. Applying Preliminary Knowledge to S2_EYEZ

We first register tags to apply preliminary knowledge to S2_EYEZ, which lets the tool identify important addresses while following money transfers. To do this, analysts call the tagging API with an address and its owner information. The code below registers a new tag in S2_EYEZ.

import requests
 
tag_data={
    "token" : [USE_YOUR_OWN_TOKEN],
    "address" : "0x22f567e4a0b07f3e03160003f760095d098bc851",
    "tag" : "dealer" 
}
header={
    'Content-Type':'application/json',
    'X-api-key': [X_API_KEY] # see our API document
}
url= [S2_EYEZ_ENDPOINT] + '/tag/register' # see our API document
response=requests.post(url,json=tag_data,headers=header)
print(response.json())

After registering tags, analysts can check whether the tag information has been successfully applied to S2_EYEZ. The test code is below.

import requests
import json
 
query_data={
    "token" : [USE_YOUR_OWN_TOKEN],
    "page" : 1
}
header={
    'Content-Type':'application/json',
    'x-api-key': [X_API_KEY] # see our API document
}
url = [S2_EYEZ_ENDPOINT] + '/tag/get' # see our API document
response=requests.post(url,json=query_data,headers=header)
print(json.dumps(response.json(),indent=4))

If the request is valid, you will receive the following message from S2_EYEZ. Each document contains the address information and a status flag (has_node) indicating whether S2_EYEZ has a valid node for that address. ‘identifier’ is a unique ID used to manage tag data. Please check the API document for details.

{
  "code": 1000,
  "data": {
    "data": [
      {
        "address": "0x22f567e4a0b07f3e03160003f760095d098bc851", # registered address
        "has_node": true, # ‘true’ means that S2_EYEZ has a valid address node.
        "identifier": "ee62aa6d7bb3dffb4e5607272c3b115e074ebc7a",
        "tag": "dealer"  # registered tag
      }
    ],
    "query_info": {
      "total_document_num": 5,
      "total_page_num": 1
    }
  },
  "message": "success"
}
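
The response above can also be checked programmatically. The following is a minimal sketch, assuming the response has exactly the shape shown above and that code 1000 signals success (as in the sample messages in this report). The helper name tags_without_node is ours and is not part of the S2_EYEZ API.

```python
def tags_without_node(response_json):
    """Return (address, tag) pairs whose has_node flag is false.

    Assumes the /tag/get response layout shown in this report.
    """
    if response_json.get("code") != 1000:  # assumption: 1000 means success
        raise ValueError("unexpected response: %r" % response_json.get("message"))
    documents = response_json["data"]["data"]
    return [(d["address"], d["tag"]) for d in documents if not d.get("has_node")]


# Sample taken from the response shown above (comments removed).
sample = {
    "code": 1000,
    "data": {
        "data": [
            {
                "address": "0x22f567e4a0b07f3e03160003f760095d098bc851",
                "has_node": True,
                "identifier": "ee62aa6d7bb3dffb4e5607272c3b115e074ebc7a",
                "tag": "dealer",
            }
        ],
        "query_info": {"total_document_num": 5, "total_page_num": 1},
    },
    "message": "success",
}

missing = tags_without_node(sample)
print("tags without a graph node:", missing)
```

An empty list means every registered tag points at an address that S2_EYEZ already knows.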

1.2. Black Money Trace

In the previous section, we registered information about the two threat actors. We can now query S2_EYEZ to identify hidden illicit money transfers between them. If there are paths between the two, the actors have likely transacted with each other, and the addresses along those paths can serve as intermediate addresses in further investigation. Tracing black money is straightforward with the S2_EYEZ open API: analysts send a query with specific parameters, and S2_EYEZ computes the money paths that satisfy them. See the commented code sample below.

import requests
import json
 
query_data={
    "token" : [USE_YOUR_OWN_TOKEN],
    "address" : "0x9385b9833daf9116fc3c429492cb7f99cad6e3c2", # broker
    "amount_max" : "100000000000000000000000",
    "transfer_direction" : "transfer_deposit", # track deposits
    "enable_tag_filter" :0, # get all nodes in the middle of transfers
    "hop_count_max" :3, # maximum traversal depth
    "node_limit" : 10 # maximum number of nodes to traverse
}
 
header={
    'Content-Type':'application/json',
    'x-api-key': [X_API_KEY] # see our API document
}
url = [S2_EYEZ_ENDPOINT] + '/analysis/graph-search' # see our API document 
response=requests.post(url,json=query_data,headers=header)
print(json.dumps(response.json(),indent=4))
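
To make the roles of hop_count_max and node_limit concrete, here is a minimal sketch of a breadth-first traversal bounded by both limits. It is an illustration only, not the S2_EYEZ implementation: the report does not describe the internal algorithm beyond the authors noting in the discussion that BFS is preferred by default. The function bounded_bfs, the in-memory adjacency map, and the toy node names are all hypothetical, and whether the real engine counts the start node or transaction nodes toward the limits is an assumption here.

```python
from collections import deque


def bounded_bfs(neighbors, start, hop_count_max, node_limit):
    """Breadth-first traversal from start, bounded by depth and node count.

    neighbors: dict mapping a node id to an iterable of adjacent node ids
               (hypothetical in-memory adjacency map).
    Returns a dict of node id -> hop distance from start.
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] >= hop_count_max:
            continue  # do not expand beyond the depth limit
        for nxt in neighbors.get(node, ()):
            if nxt in distance:
                continue  # already visited
            if len(distance) >= node_limit:
                return distance  # node budget exhausted
            distance[nxt] = distance[node] + 1
            queue.append(nxt)
    return distance


# Toy graph: edges point from a recipient back to its depositors.
toy = {
    "broker": ["a", "b"],
    "a": ["dealer"],
    "b": ["c"],
    "c": ["d"],
    "d": ["e"],
}

print(bounded_bfs(toy, "broker", hop_count_max=3, node_limit=10))
print(bounded_bfs(toy, "broker", hop_count_max=3, node_limit=3))
```

In the first call the depth limit stops the walk before node "e"; in the second the node budget stops it before "dealer" is reached, which is why a small node_limit can hide a relationship that a larger one would reveal.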

Starting from the broker’s address, we found that the broker receives a specific amount of money from the dealer’s address. In the following section, we discuss how to visualize the results to make them more human-readable, and we interpret them. The raw results are shown first.

{
  "code": 1000,
  "data": {
    "query_result": {
      "edges": [
        {
          "amount": 1000000000000000000,
          "from": "0x3fc6546d2cc1710940360a21aa9d0e0e2fc987647b0a18e31fa1679e3e110a85",
          "to": "0x9385b9833daf9116fc3c429492cb7f99cad6e3c2",
          "type": "address-tx"
        },
--------------------------
        {
          "amount": 9000000000000000000,
          "from": "0x27f0b2b0f6ccfd9f67dac40234511cc100836b472e0ef85b81820f2dcdaa735e",
          "to": "0x9385b9833daf9116fc3c429492cb7f99cad6e3c2",
          "type": "address-tx"
        }
      ],
      "nodes": [
        {
          "address_currency": "6",
          "graph_id": "0x22f567e4a0b07f3e03160003f760095d098bc851",
          "type": "address"
        },
        {
          "address_currency": "6",
          "graph_id": "0x22f567e4a0b07f3e03160003f760095d098bc851",
          "tags": [
            "dealer"
          ],
          "type": "address"
        },
        {
          "address_currency": "6",
          "graph_id": "0xecdbb29130c81733756dee2efec99f1044da5506",
          "type": "address"
        },
--------------------------
        }
      ]
    },
    "remaining_quota": 9966
  },
  "message": "success"
}
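
The raw edge amounts are easier to read once they are summed per recipient and converted to Klay. The sketch below assumes the amounts are denominated in peb (1 Klay = 10^18 peb), which matches the amount_min value used for "1 Klay" later in this report. The helper deposits_by_recipient is a hypothetical name, and only the two edges visible in the elided output above are used.

```python
from decimal import Decimal

PEB_PER_KLAY = Decimal(10 ** 18)  # 1 Klay = 10^18 peb


def deposits_by_recipient(edges):
    """Sum edge amounts per 'to' node and convert peb to Klay."""
    totals = {}
    for edge in edges:
        totals[edge["to"]] = totals.get(edge["to"], 0) + int(edge["amount"])
    return {addr: Decimal(peb) / PEB_PER_KLAY for addr, peb in totals.items()}


# The two edges visible in the response above (the rest is elided there).
edges = [
    {
        "amount": 1000000000000000000,
        "from": "0x3fc6546d2cc1710940360a21aa9d0e0e2fc987647b0a18e31fa1679e3e110a85",
        "to": "0x9385b9833daf9116fc3c429492cb7f99cad6e3c2",
        "type": "address-tx",
    },
    {
        "amount": 9000000000000000000,
        "from": "0x27f0b2b0f6ccfd9f67dac40234511cc100836b472e0ef85b81820f2dcdaa735e",
        "to": "0x9385b9833daf9116fc3c429492cb7f99cad6e3c2",
        "type": "address-tx",
    },
]

for address, klay in deposits_by_recipient(edges).items():
    print(address, klay, "Klay")
```

Integer arithmetic is used for the sums and Decimal for the conversion so that large peb values are not rounded by floating point.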

1.3. Data Visualization

Since S2_EYEZ traverses transactions across the entire blockchain, the raw data can be difficult to interpret. Visualization therefore helps to understand the data and gain insight. For this, we use the graphviz Python library, a well-known graph visualization package. The code below visualizes the S2_EYEZ analysis data.

import requests
import json
import graphviz
import os
 
query_data={
    "token" : [USE_YOUR_OWN_TOKEN],
    "address" : "0x9385b9833daf9116fc3c429492cb7f99cad6e3c2", # broker
    "amount_max" : "100000000000000000000000",
    "transfer_direction" : "transfer_deposit", # track deposits
    "enable_tag_filter" :0, # get all nodes in the middle of transfers
    "hop_count_max" :3, # maximum traversal depth
    "node_limit" : 10 # maximum number of nodes to traverse
}
 
header={
    'Content-Type':'application/json',
    'x-api-key': [X_API_KEY] # see our API document
}
url = [S2_EYEZ_ENDPOINT] + '/analysis/graph-search' # see our API document 
response=requests.post(url,json=query_data,headers=header)
res_data=response.json()
 
# create node/edge from the result
edges=res_data['data']['query_result']['edges']
nodes=res_data['data']['query_result']['nodes']
 
g_digraph = graphviz.Digraph()
 
# create graph nodes; tagged addresses are highlighted with a red label
for n in nodes:
    if 'tags' in n:
        g_digraph.node(n['graph_id'],n['graph_id'][:8]+'...',shape="rectangle", style='filled',fontcolor="red")
    elif n['type']=='tx':
        g_digraph.node(n['graph_id'],n['graph_id'][:8]+'...',shape="circle")
    elif n['type']=='address':
        g_digraph.node(n['graph_id'],n['graph_id'][:8]+'...',shape="rectangle", style='filled')
 
# create graph edges
for e in edges:
    g_digraph.edge(e['from'], e['to'])
 
g_digraph.format='png'
g_digraph.graph_attr['rankdir']='LR' #draw Left -> Right Flowchart 
 
save_image_path=os.path.join('./save_sub_graph_flow')
g_digraph.render(save_image_path)

After executing the visualization code, you will see the following figures.

As you can see, the dealer (0x22f…) transfers money that is eventually delivered to the broker (0x938…). If the dealer had sent money directly to the broker, there would be no doubt that the dealer deposits to the broker. Here, however, intermediate addresses are involved in the transfer, which indicates either that other actors participated in the transactions or that the two actors own additional addresses. In either case, these addresses can be used for further investigation, such as tracking money flows to KYC-enforced exchanges.
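
The intermediate addresses mentioned above can be pulled out of the same response for follow-up queries. This is a minimal sketch that assumes the node layout from the raw results in Section 1.2 (a 'graph_id', a 'type', and an optional 'tags' list). The helper intermediate_addresses and its exclude parameter are hypothetical names introduced for illustration.

```python
def intermediate_addresses(nodes, exclude=()):
    """Return untagged address nodes, in order of appearance, without duplicates.

    exclude: addresses to skip, e.g. the queried (broker) address.
    """
    tagged = {n["graph_id"] for n in nodes if n.get("tags")}
    skipped = set(exclude) | tagged
    result = []
    for n in nodes:
        graph_id = n["graph_id"]
        if n.get("type") == "address" and graph_id not in skipped and graph_id not in result:
            result.append(graph_id)
    return result


# Nodes visible in the raw results of Section 1.2 (the rest is elided there).
nodes = [
    {"address_currency": "6", "graph_id": "0x22f567e4a0b07f3e03160003f760095d098bc851", "type": "address"},
    {"address_currency": "6", "graph_id": "0x22f567e4a0b07f3e03160003f760095d098bc851", "tags": ["dealer"], "type": "address"},
    {"address_currency": "6", "graph_id": "0xecdbb29130c81733756dee2efec99f1044da5506", "type": "address"},
]

broker = "0x9385b9833daf9116fc3c429492cb7f99cad6e3c2"
print(intermediate_addresses(nodes, exclude=[broker]))
```

Each returned address could then be used as the "address" parameter of a new graph-search query to continue the trace.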

2. Performance evaluation

We evaluate the performance of S2_EYEZ for Klaytn. We first consider node-visiting performance in two cases: 1) retrieving a transaction graph from an arbitrary address through the S2_EYEZ query API, and 2) a sequential search that traverses all address and transaction nodes starting from the genesis block. Then, we evaluate graph construction performance, which represents how fast S2_EYEZ imports newly mined blocks. We use a service environment with high-performance servers (2.9 GHz Intel Xeon Gold 6226R processor and 512 GB of memory) and conduct each experiment on Klaytn Mainnet. We allocated 64 GB of memory to our cache system.

2.1. Transaction graph traverse performance

To verify the runtime performance of our system, we measure the time to trace a transaction graph. We run each experiment scenario five times and report the average running time. To measure the performance gain from locality of reference, we run each experiment with and without the cache.
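
A measurement loop of this kind can be sketched as follows. This is a generic timing harness under our own assumptions, not the harness used for the reported numbers: average_runtime is a hypothetical helper, and dummy_query is a stand-in workload that would be replaced by a function sending the graph-search request.

```python
import time


def average_runtime(run_query, repeats=5):
    """Call run_query `repeats` times and return the mean wall-clock seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations)


def dummy_query():
    # Stand-in workload; replace with a real API call when measuring S2_EYEZ.
    sum(range(100_000))


print("average seconds:", average_runtime(dummy_query))
```

For a non-cached measurement the cache would have to be cold before every run; how the cache is reset is not described in this report, so that step is left out of the sketch.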


Table. S2_EYEZ running time in seconds to traverse Klaytn transaction graphs. Non-cached means that S2_EYEZ traverses the transaction data for the first time, without cache. Cached means that S2_EYEZ has already loaded the adjacent address and transaction nodes into memory before executing queries.

Average running time for query API calls.

In contrast to general block explorers, our tool stores the blockchain data in a relational graph that treats addresses and transactions as graph nodes and edges. This turns the financial asset tracking problem into a graph traversal problem, so we can follow asset flows on the blockchain more efficiently than with storage-oriented blockchain structures.
The above table shows the results of the asset tracking experiment. We first measure the response time of a graph query that traverses adjacent addresses and transactions within three hops. While traversing the graph, S2_EYEZ filters addresses and transactions according to query parameters such as timestamp (transaction creation time) and amount of Klay (deposit/withdrawal amount). Because the blockchain is too large to hold entirely in memory, block data must be loaded and unloaded, so optimizing I/O operations is crucial for traversal performance. To measure the benefit of the cache management mechanisms in S2_EYEZ, we also run queries that traverse many more addresses and transactions: we set the maximum number of traversed nodes to 10 million, without stop conditions. The small graph query takes 0.13 seconds with cached data and 6.44 seconds without. The big query takes 15.49 seconds with cached data and 19.81 seconds without. The results indicate that S2_EYEZ scales to heavy queries.

Average running time for sequential search on entire mainnet transactions.

In contrast to general block explorers, S2_EYEZ constructs a transaction graph that includes only the data required for asset tracking. Thus, S2_EYEZ traverses the entire transaction data efficiently, in addition to traversing arbitrary nodes from specific addresses. To measure how fast S2_EYEZ reads transaction data, we time a sequential search from the genesis block. At the time of writing, the graph has 140 million nodes and 516 million edges. The nodes consist of 10 million addresses and 130 million transactions. Retrieving all of this graph data sequentially takes 456 seconds for the non-cached graph and 124 seconds for the cached graph.
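
The figures quoted in this section can be turned into speedups and scan throughput with simple arithmetic. The sketch below uses only numbers stated in the text; counting nodes plus edges as "graph elements" is our own convention for the throughput figure.

```python
# Running times in seconds, as reported in this section.
small_cached, small_non_cached = 0.13, 6.44
big_cached, big_non_cached = 15.49, 19.81
scan_cached, scan_non_cached = 124, 456

# Graph size at the time of writing.
nodes, edges = 140_000_000, 516_000_000


def speedup(non_cached, cached):
    """How many times faster the cached run is than the non-cached run."""
    return non_cached / cached


def elements_per_second(seconds):
    """Graph elements (nodes + edges) read per second in a sequential scan."""
    return (nodes + edges) / seconds


print("small query speedup: %.1fx" % speedup(small_non_cached, small_cached))
print("big query speedup:   %.1fx" % speedup(big_non_cached, big_cached))
print("full scan speedup:   %.1fx" % speedup(scan_non_cached, scan_cached))
print("scan rate, non-cached: %.2f M elements/s" % (elements_per_second(scan_non_cached) / 1e6))
print("scan rate, cached:     %.2f M elements/s" % (elements_per_second(scan_cached) / 1e6))
```

The cache helps the small query by roughly 50x but the big query by only about 1.3x, which is consistent with a 10-million-node traversal touching far more data than a three-hop query.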

2.2. Graph construction performance

Klaytn optimizes its consensus algorithm to improve block production performance, which is one of its distinctive benefits for supporting real-world applications. The Klaytn network generates a new block every second, which is about 10 times faster than Ethereum. Thus, S2_EYEZ must import each new block within a second to stay in sync with the latest data.


Table. Running time in milliseconds to import a new transaction into the S2_EYEZ graph structure.

The above table shows the insertion time for appending new transaction data to the graph. Since S2_EYEZ stores the graph data persistently, disk I/O affects insertion when handling new block data. Insertion performance therefore varies depending on whether the adjacent nodes to be linked to the new transaction are already loaded in memory. In the best case, all adjacent nodes of the new transaction are already in memory and nothing needs to be read from disk. Otherwise, S2_EYEZ reads the necessary data into memory. In both cases, S2_EYEZ handles a new transaction within 33 ms at most (worst case), which is enough to import newly mined blocks in real time.
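
One rough way to relate the per-transaction latency to the one-second block interval is a simple budget calculation. The sketch below assumes that transactions are inserted strictly one after another and that every insertion hits the 33 ms worst case; the report does not state whether insertions are batched or parallelized, so this is a conservative bound rather than a measured capacity. The function name max_sequential_inserts is hypothetical.

```python
def max_sequential_inserts(per_tx_ms, block_interval_ms=1000):
    """Number of back-to-back insertions that fit into one block interval."""
    return block_interval_ms // per_tx_ms


# Worst-case insertion time reported above, with Klaytn's one-second block interval.
print(max_sequential_inserts(33))
```

Under these assumptions about 30 worst-case insertions fit into one block interval; blocks whose adjacent nodes are mostly cached, or an importer that inserts in parallel, would sustain more.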

Future plan and lessons

In this section, we discuss lessons learned from this project and future plans to improve the on-chain analyzer.

Expanding the coverage of asset monitoring requires an understanding of smart contracts. As the Klaytn ecosystem grows, many financial applications have been introduced to serve diverse assets on the blockchain. In particular, KIP-7 (Fungible Token) and KIP-17 (Non-Fungible Token) define protocol standards that make it easy for developers to create new tokens and integrate them into services (e.g., wallets). In addition, non-standard DeFi applications such as swaps, as well as game applications, manage their own assets separately from plain Klay transfers. To expand the coverage of asset monitoring with S2_EYEZ for Klaytn, it is necessary to understand the different types of smart contracts and to improve the graph schema to record such assets. Our team is working to expand asset coverage, starting from the standard tokens, while meeting the same performance requirements as Klay monitoring.

Monitoring results tend to be difficult to interpret. Transaction patterns in cryptocurrencies are more complicated than those in fiat currencies because there are no restrictions on making transactions. This makes it difficult to understand asset tracking data without sufficient knowledge of the cryptocurrency domain. To address this challenge, S2_EYEZ supports flexible query options that filter out less important data. Although this reduces the size of the data, asset tracking results are still raw data that are too large to yield analysis insights directly. To solve this problem, our team plans to provide abstractions of asset tracking data based on predefined patterns, in addition to the low-level query APIs presented in this project. For example, if a user wants to ‘get flows involving addresses tagged hack_incidents with an amount above 1 Klay’, the low-level query looks like the one below.

# request
curl --request POST \
  --url [ENDPOINT]/analysis/graph-search \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: [ACCESS_API_KEY]' \
  --data '{
    "token": "[ASSIGNED_TOKEN]",
    "address": "0x867593ffb80e3776464909e303ff33bf8a92a826",
    "amount_min": "1000000000000000000",
    "transfer_direction": "transfer_deposit",
    "tags": ["hack_incidents"]
  }'

This query API provides flexible and powerful query parameters. However, it is difficult for users without sufficient knowledge of cryptocurrency technologies to write appropriate queries. Thus, our team is working on providing more abstract APIs based on predefined patterns.
For the next KIR, we plan to submit a proposal that addresses these challenges and improves the asset monitoring functions.
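
To illustrate what a pattern-based abstraction over the low-level query might look like, here is a minimal sketch of a helper that builds the graph-search payload from a tag and a Klay amount. It is purely hypothetical: the function name flows_from_tagged is ours, no such API is announced in this report, and the payload fields are simply copied from the curl example above. The peb conversion assumes 1 Klay = 10^18 peb, matching the amount_min value in that example.

```python
from decimal import Decimal

PEB_PER_KLAY = 10 ** 18  # 1 Klay = 10^18 peb


def flows_from_tagged(token, address, tag, min_klay):
    """Build a graph-search payload for deposits from addresses carrying `tag`.

    Hypothetical convenience wrapper; field names follow the curl example.
    """
    amount_min = int(Decimal(str(min_klay)) * PEB_PER_KLAY)
    return {
        "token": token,
        "address": address,
        "amount_min": str(amount_min),
        "transfer_direction": "transfer_deposit",
        "tags": [tag],
    }


payload = flows_from_tagged(
    token="ASSIGNED_TOKEN",  # placeholder, as in the curl example
    address="0x867593ffb80e3776464909e303ff33bf8a92a826",
    tag="hack_incidents",
    min_klay=1,
)
print(payload)
```

The payload could then be sent with requests.post in the same way as the earlier examples; the point of the wrapper is that the user states the amount in Klay and the tag by name, without handling peb units or direction flags.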

Budget

The final status of the executed and remaining project budget is as follows.

1. Product development & maintenance fee


Table. The product development and maintenance fee. Engineers are assigned based on a schedule for each task.

The above table shows the budget executed in the third period. Including the $65,000 spent during this period, the total product development and maintenance fee was $220,000, and the planned budget was fully executed.

2. Development & deployment server fee

In this period, we spent $6,000 on Amazon EC2 and API Gateway services, which provide the front-end API interface to users. We executed the full planned budget of $66,000 for the development and deployment server fee.

3. Product licensing fee


The S2_EYEZ basic license fee covered March 2021 to April 2021. The total licensing fee of $20,000 was executed as planned.

We executed $125,000, $90,000, and $91,000 in the first, second, and third periods, respectively. In total, $306,000 has been fully executed.

Hi, thanks for the progress report

To my understanding, the proposed system is a graph traversal engine :smiley:

  • The proposed system constructs a transaction graph and maintains (updates) it in real time as new transactions arrive.
    • The accounts are modeled as vertices in a graph
    • The transfers of assets are modeled as edges between two accounts in the graph
  • The users can query graph traversals over the transaction graph

I have some questions and some feedback for possible improvements

  • Can you guide me to the API documents? thanks

  • What database engines are used in the backend? The relational database is not a good fit because it needs to perform repeated joins for graph traversals generating large intermediate results. Do you run any graph DB or custom-built traversal engines?

  • Given the APIs in the second use case ‘Black Money Trace’, the user needs to set the ‘hop_count_max’ and ‘node_limit’. The money launderer can easily create thousands or even millions of ‘intermediate addresses’. Do you still achieve high-performance graph traversals with large values of ‘hop_count_max’ and ‘node_limit’?

  • (Plz, correct me if wrong) It seems the engine constructs a new transaction graph for each query by scanning the entire blocks from the genesis block. I think the entire graph data need to be kept in a database and its subgraph needs to be retrieved when needed.

  • What is the algorithm used for graph traversals? What is your strategy? For example, in the ‘Black Money Trace’, you may use bi-directional BFS from both ending vertices.

  • What do you mean by ‘your cache system’? Do you have a dedicated caching layer on top of your back-end database system?

  • Regarding the performance,

    • In 2.1 (Transaction graph traverse performance), what do you mean by ‘sequential search’? Does it measure the elapsed time for constructing the transaction graph?
    • In 2.2 (Graph construction performance), you measure the update time for a single edge (a new transaction). As Klaytn supports 4000 TPS, the proposed system needs to support at least 4000 edge insertions per second. Considering that a single transaction can involve multiple asset transfers, the update rate can be even higher.
    • How many concurrent API calls can the suggested system sustain?

Sorry for my many questions :smiley:

Thank you for your interest.

  1. Please check this out (The 5th KIR: S2WLAB_”S2_EYEZ”_Progress Report(2)).

  2. A graph DB has advantages for exploring transaction data, but general-purpose graph DBs enforce strong ACID properties. Thus, we designed and implemented our own graph database rather than using a commercial or open-source one. Our first deliverable details the internal graph DB (The 5th KIR : S2WLAB_S2_EYEZ_Progress Report(1)).

  3. As you can see in the evaluation section, it depends on how many nodes are visited. S2_EYEZ is faster than other storage-oriented block explorers.

  4. After importing transaction data from the blockchain, S2_EYEZ manages the data in a relational DB. The import happens only once.

  5. Yes. By default, BFS is preferred.

  6. Yes. Please see the details in our first deliverable.

  7. a) Sequential search means reading the transaction data, including address/transaction nodes with their edges. b, c) I think questions b and c are similar. Two main things affect performance: i) how many transactions (from the newly mined block or from queries) cause cache misses and ii) how much memory S2_EYEZ can use. Both depend on the data in the blocks or queries. If users want to guarantee constant response time, one possible strategy is for S2_EYEZ to pin specific addresses/transactions in memory.

Please leave a comment if you have further questions :slight_smile:

@Prop_S2WLAB_SH
Dear S2WLAB,
The final disbursement was completed yesterday. Please confirm your receipt through a reply here.
Best,

We have received the KLAY reward. Thanks :slight_smile: