
API-Server

The API-Server is responsible for collecting data from public and private chains, generating statistics from it, and distributing the results to the clients.

Collect data

Public chain data

To gather public chain data, the API-Server accesses the public APIs of several websites that offer free statistics on the most important and relevant blockchains.
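
As a rough illustration, a polling loop against one such API could look like the sketch below. The endpoint URL and the response fields are placeholders, since this page does not name the actual providers:

```ts
// Minimal polling sketch (TypeScript, Node 18+ with global fetch).
// The endpoint URL and response shape are assumptions, not a real provider API.
interface PublicChainStats {
  chain: string;        // e.g. "bitcoin"
  blockHeight: number;  // latest block number reported by the provider
  hashRate: number;     // network hash rate
}

async function fetchPublicStats(endpoint: string): Promise<PublicChainStats> {
  const res = await fetch(endpoint);
  if (!res.ok) throw new Error(`Stats request failed: ${res.status}`);
  return (await res.json()) as PublicChainStats;
}

// Poll one hypothetical provider every 30 seconds.
setInterval(async () => {
  try {
    const stats = await fetchPublicStats("https://example.com/api/stats/bitcoin");
    console.log(`Block height: ${stats.blockHeight}`);
  } catch (err) {
    console.error(err);
  }
}, 30_000);
```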

Private chain data

To gather information about our private blockchain, the API-Server provides an interface for all Docker nodes that are involved in the current blockchain setting. The nodes can push their information to the server, and the server selects and stores the relevant data.
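
A minimal sketch of this push interface, assuming Express with JSON bodies and illustrative field names (`nodeId`, `peers`, `lastBlock`), could be:

```ts
import express from "express";

const app = express();
app.use(express.json());

// Hypothetical push endpoint: a node reports its current state.
app.post("/nodes/:nodeId/metrics", (req, res) => {
  const { nodeId } = req.params;
  const { peers, lastBlock } = req.body;
  // Select only the fields considered relevant before storing them.
  storeNodeMetrics({ nodeId, peers, lastBlock, receivedAt: Date.now() });
  res.sendStatus(204);
});

function storeNodeMetrics(metrics: object): void {
  // Placeholder for the DBMS write discussed below.
  console.log("storing", metrics);
}

app.listen(3000);
```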

Make data accessible

Our API-Server provides an interface via an Express HTTP server. It responds to data requests with a JSON string.
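
A hedged sketch of such a read endpoint, with a stand-in for the database lookup, might look like this:

```ts
import express from "express";

const app = express();

// Hypothetical read endpoint: returns the latest stored statistics as JSON.
app.get("/stats/:chain", (req, res) => {
  const stats = loadStats(req.params.chain);
  if (!stats) return res.status(404).json({ error: "unknown chain" });
  res.json(stats);
});

function loadStats(chain: string): object | undefined {
  // Stand-in for a database query.
  return { chain, blockHeight: 0 };
}

app.listen(3000);
```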

Store gathered information

Database Management System

Our storage technology has to handle several different analytical tasks and a huge amount of new data that is collected simultaneously, so we are considering a column-based database management system to minimize the time for analytical requests. Another important point is the format of the data we are gathering: we chose JSON objects as our main data format for all data transfers. By choosing a NoSQL DBMS we could easily adapt our database layout to our JSON objects, but an SQL database should work as well. We have not chosen a system yet, but we are focusing our research on column-based NoSQL DBMSs.
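
For illustration only, a stored measurement might be a JSON object like the one below; all field names are hypothetical, since the database layout is not fixed yet:

```json
{
  "nodeId": "node-07",
  "timestamp": 1511308800000,
  "lastBlock": 4812,
  "peers": 5,
  "pendingTransactions": 12
}
```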

Filter the amount of incoming data (Idea 1)

To prevent the DBMS from being overloaded by incoming data, we could simply place a buffer between the DBMS and the nodes (the information sources). The buffer would work like a cache with exactly one entry per node. After a time limit, the buffer generates a timestamp and flushes the whole block, including the timestamp, into the database. A problem arises if a node sends its data multiple times within the time limit: since there is only one entry per node, dropping the additional datasets could skew the statistics. To keep things simple and mostly correct, we can average the value stored in the buffer with each new incoming value.
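
A minimal sketch of this buffering scheme, with an assumed flush interval and a placeholder for the database write, could look like this:

```ts
// Sketch of Idea 1; field names and the interval are assumptions.
interface NodeSample { nodeId: string; value: number; }

const buffer = new Map<string, number>(); // one entry per node

// On each incoming sample: average with the buffered value instead of dropping it.
function onIncoming(sample: NodeSample): void {
  const current = buffer.get(sample.nodeId);
  buffer.set(
    sample.nodeId,
    current === undefined ? sample.value : (current + sample.value) / 2
  );
}

// After the time limit, flush the whole block with one timestamp and reset.
const FLUSH_INTERVAL_MS = 5_000; // assumed time limit
setInterval(() => {
  const block = { timestamp: Date.now(), entries: Object.fromEntries(buffer) };
  flushToDatabase(block);
  buffer.clear();
}, FLUSH_INTERVAL_MS);

function flushToDatabase(block: object): void {
  console.log("flushing", block); // stand-in for the DBMS write
}
```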

Filter the amount of incoming data (Idea 2)

The second idea is based on the first but extends it by allowing asynchronous updates to the buffer. The idea is to provide multiple buffer instances (scaled with the number of nodes) and one router. Each buffer instance is responsible for only a small subset of the nodes; the router routes incoming data to the appropriate buffer according to the node's id. After the time limit, the buffer instances are merged and the data is flushed into the database. This approach tackles the bottleneck of the first approach: by dividing the incoming data into smaller groups that can be processed asynchronously, the overall throughput increases.
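
A sketch of the router and the sharded buffers, assuming a simple hash of the node id and four instances, might be:

```ts
// Sketch of Idea 2: a router dispatches samples to one of N buffer instances
// by node id; on flush the instances are merged into a single block.
// The hash function and instance count are assumptions.
const INSTANCE_COUNT = 4;
const buffers: Map<string, number>[] = Array.from(
  { length: INSTANCE_COUNT },
  () => new Map()
);

function route(nodeId: string): Map<string, number> {
  // A simple hash of the node id selects the responsible buffer instance.
  let hash = 0;
  for (const ch of nodeId) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  return buffers[hash % INSTANCE_COUNT];
}

function onIncoming(nodeId: string, value: number): void {
  const buf = route(nodeId);
  const current = buf.get(nodeId);
  buf.set(nodeId, current === undefined ? value : (current + value) / 2);
}

// After the time limit, merge all instances and flush one block.
setInterval(() => {
  const merged = new Map<string, number>();
  for (const buf of buffers) {
    for (const [id, v] of buf) merged.set(id, v);
    buf.clear();
  }
  console.log("flushing", {
    timestamp: Date.now(),
    entries: Object.fromEntries(merged),
  });
}, 5_000);
```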
