Data Providers

The essential quality of HAPI is database building. To build a robust database that reliably supplies smart contracts with a verifiable and constant stream of addresses, this data has to be aggregated. The difficulty is the time this build-up requires: acquiring data is a long-term process, which prevents us from establishing our own database from the outset. We are therefore resorting to a third-party data provider that will serve as a provisional substitute while our own data aggregation is under way. A data provider can thus be described as a centralized tracing solution that supplies up-to-date data on blockchain activity and relays this data via oracles to the HAPI SC (smart contracts). With this approach, we can largely circumvent the need for preliminary aggregation before the protocol is deployed and allow it to gather data in a, so to speak, real-life environment.

A data provider is a complex tool that enables multi-granular analysis of blockchain transactions in real time by utilizing a range of intelligence solutions. The core element of the software is automation. It uses automated tracking that traces the flow of digital funds across a wide field of intermediate wallets and pinpoints the end-point wallet. It is also able to parse the blockchain into datasets that can ultimately be used as a foothold for estimating the likelihood of illicit activity. The solution specifies a target: the in- and outflow of funds between a specific starting point, for instance a hacked or exploited DeFi project, and an endpoint, also known as an off-ramp, such as a CEX or any other entity that enables withdrawal to fiat.
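The tracing idea described above can be illustrated with a minimal sketch. It is not the data provider's actual algorithm, which the text does not specify; it assumes the transfers are already available as simple (sender, recipient, amount) tuples and that a set of known off-ramp addresses exists. The names `trace_endpoints`, `known_off_ramps`, and `max_hops` are hypothetical.

```python
from collections import deque


def trace_endpoints(transfers, start, known_off_ramps, max_hops=10):
    """Follow outgoing transfers from `start` and return the off-ramps reached.

    transfers: iterable of (sender, recipient, amount) tuples (assumed format).
    Returns a dict mapping each reached off-ramp address to the path taken.
    """
    # Build an adjacency list: sender -> list of recipients.
    graph = {}
    for sender, recipient, _amount in transfers:
        graph.setdefault(sender, []).append(recipient)

    found = {}
    visited = {start}
    queue = deque([(start, [start])])
    while queue:
        address, path = queue.popleft()
        if address in known_off_ramps:
            # Endpoint reached; record the path and stop following this branch.
            found[address] = path
            continue
        if len(path) - 1 >= max_hops:
            # Hop budget exhausted for this branch (assumed safety limit).
            continue
        for nxt in graph.get(address, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [nxt]))
    return found


# Hypothetical example: funds leave a hacked project and pass through wallets.
sample_transfers = [
    ("hacked", "w1", 100),
    ("w1", "w2", 60),
    ("w1", "w3", 40),
    ("w2", "cex", 60),
    ("w3", "w4", 40),
]
paths = trace_endpoints(sample_transfers, "hacked", {"cex"})
```

A breadth-first search is used here because it finds the path with the fewest intermediate wallets first; a production tracer would also need to weigh amounts and timing, which this sketch ignores.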

Our data provider is able to classify transactions based on a number of factors that ultimately determine a verdict on their "safety". It can also assess the likelihood that a transaction constitutes money laundering (ML). Nominally, every transaction is considered neutral, or non-risky. Depending on the movement of the funds and the transactional activity, a tentative verdict can be assigned. Since the openness of the ledger lets a data provider access each transaction path with relative ease, data on previous transactions has already accrued. The system can therefore use historical and chronological methods to estimate the present likelihood of fraud. If a given address has been flagged or marked as "risky", a corresponding alert is issued, notifying the HAPI SC about the potential insecurity or malfeasance of the address in question. Although a data provider uses a wide range of categories, HAPI will package all of them and re-sort them into only two: “risky” or “non-risky”. Categorization is covered in more detail in later paragraphs. To trace transactions, we require an identifier of the deposit transaction on the input blockchain, normally called curIn (deposit). To accurately identify an on-chain transaction, two main conditions must be fulfilled:

  • The timestamp of the transaction is close to the time at which it was called in via the API

  • The called-in value is the same as the value the transaction carries
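The two conditions above can be sketched as a simple matching function. This is an illustration under stated assumptions rather than the provider's implementation: the `ChainTx` record, the `match_deposit` function, and the 600-second tolerance `max_delay_s` are all hypothetical, since the text does not define how close the timestamps must be.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ChainTx:
    tx_hash: str    # transaction identifier (placeholder values in examples)
    timestamp: int  # Unix time in seconds
    value: int      # amount in the currency's smallest unit


def match_deposit(
    candidates: Iterable[ChainTx],
    api_called_at: int,
    api_value: int,
    max_delay_s: int = 600,  # assumed tolerance, not taken from the source
) -> Optional[ChainTx]:
    """Return the on-chain transaction that best matches an API call, if any."""
    matches = [
        tx
        for tx in candidates
        # Condition 2: the called-in value equals the value carried.
        if tx.value == api_value
        # Condition 1: the timestamp is close to the time of the API call.
        and abs(tx.timestamp - api_called_at) <= max_delay_s
    ]
    if not matches:
        return None
    # Among the matches, prefer the one closest in time to the API call.
    return min(matches, key=lambda tx: abs(tx.timestamp - api_called_at))
```

Values are compared as integers in the smallest unit to avoid floating-point rounding when testing for equality.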

An example request and response for a simple transaction tracer on Ethereum, for a single transaction, will therefore look similar to this:

Request:

    curl --data '{"method":"trace_call","params":[{ ... },["trace"]],"id":1,"jsonrpc":"2.0"}' -H "Content-Type: application/json" -X POST localhost:8545

Response:

    {
      "id": 1,
      "jsonrpc": "2.0",
      "result": {
        "output": "0x",
        "stateDiff": null,
        "trace": [{
          "action": { ... },
          "result": {
            "gasUsed": "0x0",
            "output": "0x"
          },
          "subtraces": 0,
          "traceAddress": [],
          "type": "call"
        }],
        "vmTrace": null
      }
    }
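For readers who prefer to work with this exchange programmatically, the following minimal sketch builds the same `trace_call` payload and summarizes a response of the shape shown above. It performs no network access. The helper names `build_trace_call_payload` and `summarize_traces` are hypothetical, and the call object (elided as `{ ... }` in the example) must be supplied by the caller.

```python
def build_trace_call_payload(call_object, request_id=1):
    """Build the JSON-RPC body for trace_call with the "trace" tracer only."""
    return {
        "method": "trace_call",
        "params": [call_object, ["trace"]],
        "id": request_id,
        "jsonrpc": "2.0",
    }


def summarize_traces(response):
    """Extract a few fields from each trace entry of a trace_call response."""
    if "error" in response:
        raise ValueError(f"JSON-RPC error: {response['error']}")
    result = response.get("result") or {}
    summary = []
    for entry in result.get("trace") or []:
        inner = entry.get("result") or {}
        summary.append({
            "type": entry.get("type"),
            # traceAddress is the position in the call tree; its length is the depth.
            "depth": len(entry.get("traceAddress", [])),
            "subtraces": entry.get("subtraces", 0),
            # gasUsed is a hex-encoded quantity such as "0x0".
            "gas_used": int(inner.get("gasUsed", "0x0"), 16),
        })
    return summary
```

The payload can be serialized with `json.dumps` and posted to the node endpoint used in the curl example; the summary function only reads fields that appear in the response above.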

To identify whether all inputs relate to the same entity, the multi-clustering method is often used to determine whether all inputs belong to the same end-point. The issue with this method, however, is that multiple clusters can be operated in each cryptocurrency. Matters are complicated further by the fact that each currency may have a different “behavior”, which compounds the complexity and makes it harder to capture overarching data on the activity.

Therefore, to accurately identify the relation between two or more addresses, we need to find similarities. The easiest approach is to establish a common social relationship between transactions or addresses. For example, if a recipient address receives coins from two or more addresses in the curOut (withdrawal), or if those addresses receive coins from an identical address in the curIn, we can reasonably assume that these addresses do in fact have a common social relationship, on the basis of identical interactions within a short time span. Although we note the “relatedness” of the addresses, this does not mean that we can clearly identify what each relationship means. There are a number of possible explanations, including two users simply sending coins to the same address without knowing about each other's activities. Even though the meaning and goal of the transactions remain unknown, we can safely conclude that they represent a kind of relatedness.
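The relatedness heuristic described above can be sketched as follows. This is an illustrative assumption about how such a rule might be coded, not the provider's method: the function name `related_pairs`, the (sender, recipient, timestamp) tuple format, and the one-hour default `window_s` are hypothetical, since the text only speaks of a "short" time span.

```python
from collections import defaultdict
from itertools import combinations


def related_pairs(transfers, window_s=3600):
    """Return pairs of senders that paid the same recipient within `window_s`.

    transfers: iterable of (sender, recipient, timestamp) tuples,
    with timestamps in Unix seconds (assumed format).
    """
    # Group every transfer by the address that received it.
    by_recipient = defaultdict(list)
    for sender, recipient, timestamp in transfers:
        by_recipient[recipient].append((sender, timestamp))

    pairs = set()
    for events in by_recipient.values():
        for (a, t_a), (b, t_b) in combinations(events, 2):
            # Two distinct senders, same recipient, close in time: mark as related.
            if a != b and abs(t_a - t_b) <= window_s:
                pairs.add(frozenset((a, b)))
    return pairs


# Hypothetical example: A and B both pay R within minutes; C pays R a day later.
sample = [("A", "R", 1000), ("B", "R", 1200), ("C", "R", 90000), ("D", "S", 1100)]
related = related_pairs(sample)
```

The mirrored case, where several addresses receive coins from one identical address, can be checked with the same function by swapping the sender and recipient in each tuple. As noted above, a pair found this way indicates relatedness only, not the purpose of the transactions.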

From this point on, the analysis relies on proprietary algorithms and machine learning to sift, parse, detect, identify and categorize each transaction before relaying the results to the HAPI SC.
