API Server
The API-Server is responsible for collecting data from public and private chains, generating statistics from this data, and distributing it to the clients.
To gather public chain data, the API-Server accesses the public APIs of several websites that offer free statistics for the most important and relevant blockchains.
To gather information about our private blockchain, the API-Server provides an interface for all docker nodes that are involved in the current blockchain setting. The nodes can push their information to the server via a socket, and the server selects and stores the relevant data.
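As an illustration, here is a minimal sketch of this push interface, assuming the nodes connect over a plain TCP socket and send one JSON object per line. The port and the `NodeReport` fields are placeholders, not our final wire format:

```typescript
import * as net from "net";

// Hypothetical shape of a node's report; the actual fields depend on our setup.
interface NodeReport {
  nodeId: string;
  blockHeight: number;
  peerCount: number;
}

const server = net.createServer((socket) => {
  let buffered = "";
  socket.on("data", (chunk) => {
    buffered += chunk.toString("utf8");
    // Each line is assumed to carry one complete JSON object.
    let newline: number;
    while ((newline = buffered.indexOf("\n")) !== -1) {
      const line = buffered.slice(0, newline);
      buffered = buffered.slice(newline + 1);
      try {
        const report = JSON.parse(line) as NodeReport;
        // Here the server would select and store only the relevant fields.
        console.log(`report from ${report.nodeId}: height=${report.blockHeight}`);
      } catch {
        // Ignore malformed lines from misbehaving nodes.
      }
    }
  });
});

server.listen(9000); // Port chosen arbitrarily for this sketch.
```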
Our API-Server exposes its interface via an HTTP Express server. It responds to data requests with a JSON string.
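For example, a data request could be served like this (a minimal sketch; the route and the response fields are placeholders rather than our final API):

```typescript
import express from "express";

const app = express();

// Respond to data requests with a JSON string.
app.get("/stats/:nodeId", (req, res) => {
  // In the real server these values would be read from the database.
  res.json({
    nodeId: req.params.nodeId,
    blockHeight: 42,
    peerCount: 7,
  });
});

app.listen(3000); // Port chosen arbitrarily for this sketch.
```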
Our storage technology has to handle several different analytical tasks and a huge amount of new data that is collected simultaneously. We are therefore considering a column-based database management system to minimize the time needed for analytical requests. Another major reason is the format of the data we are gathering: we chose JSON objects as our main data format for all data transfers - the information retrieved from the docker nodes is pushed to the server as JSON objects. By choosing a NoSQL DBMS we could easily adapt our database layout to them, so there is little to no need to parse the nodes' information. A SQL database should work as well. We have not chosen a system yet, but we are focusing our research on column-based NoSQL DBMSs.
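To illustrate why this matters, the following sketch uses MongoDB purely as a stand-in document store (no DBMS has been chosen yet; database and collection names are placeholders). The point is only that a node's JSON object can be persisted as-is, without a parsing or mapping step:

```typescript
import { MongoClient } from "mongodb";

// Store a node's pushed JSON object without any schema mapping.
// MongoDB is a stand-in here - we have not settled on a DBMS yet.
async function storeReport(report: Record<string, unknown>): Promise<void> {
  const client = new MongoClient("mongodb://localhost:27017");
  await client.connect();
  try {
    await client.db("chainstats").collection("reports").insertOne(report);
  } finally {
    await client.close();
  }
}
```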
To prevent an overload of incoming data that has to be processed by the DBMS, we could place a buffer between the DBMS and the nodes (the information source). The buffer would work like a cache and hold only one entry per node. After a time limit the buffer generates a timestamp and flushes the whole block, including the timestamp, into the database - this allows us to accurately generate statistics and metrics over time periods. However, with only one entry per node, it could be a problem if a node sends its data multiple times to the buffer within the time limit. If we dropped all further incoming datasets, it could distort the statistics. To keep things simple and mostly correct in this situation, we can simply average the value stored in the buffer with the newly arrived set.
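A minimal sketch of such a buffer, assuming a single numeric metric per node and a hypothetical `flushToDatabase` function. The averaging keeps a running mean of all values seen during one interval instead of dropping repeats:

```typescript
// One buffered entry per node; `count` lets us average repeated updates.
interface BufferEntry {
  value: number;
  count: number;
}

const buffer = new Map<string, BufferEntry>();

// Called whenever a node pushes a new value within the current interval.
function update(nodeId: string, value: number): void {
  const entry = buffer.get(nodeId);
  if (entry === undefined) {
    buffer.set(nodeId, { value, count: 1 });
  } else {
    // Average the stored value with the new one instead of dropping it.
    entry.value = (entry.value * entry.count + value) / (entry.count + 1);
    entry.count += 1;
  }
}

// After each time limit, stamp the whole block and flush it to the DBMS.
const FLUSH_INTERVAL_MS = 10_000; // Interval length is an arbitrary choice.
setInterval(() => {
  const timestamp = Date.now();
  const block = [...buffer.entries()].map(([nodeId, entry]) => ({
    nodeId,
    value: entry.value,
    timestamp,
  }));
  buffer.clear();
  flushToDatabase(block);
}, FLUSH_INTERVAL_MS);

// Placeholder for the actual DBMS write.
function flushToDatabase(
  block: { nodeId: string; value: number; timestamp: number }[],
): void {
  console.log(`flushing ${block.length} entries`);
}
```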
The second idea builds on the first but extends it by allowing asynchronous updates to the buffer. The idea is to provide multiple buffer instances (according to the number of nodes) and one router. Each buffer instance is responsible for only a small subset of the nodes. The router routes incoming data to the appropriate buffer according to the id of the node. After the time limit, the buffer instances can be merged and the data flushed into the database. This approach could tackle the bottleneck issue of the first approach: by dividing the incoming data into smaller groups that can be processed asynchronously, the overall throughput will increase.
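A sketch of the routing step, assuming node ids can be hashed to pick a buffer instance. The shard count and the hash function are placeholders; in practice the count would follow the number of nodes:

```typescript
// Each buffer instance handles only a small subset of the nodes.
const SHARD_COUNT = 4; // Placeholder; would depend on the number of nodes.
const shards: Map<string, number>[] = Array.from(
  { length: SHARD_COUNT },
  () => new Map<string, number>(),
);

// Simple string hash to map a node id to a buffer instance.
function shardFor(nodeId: string): Map<string, number> {
  let hash = 0;
  for (const ch of nodeId) {
    hash = (hash * 31 + ch.charCodeAt(0)) | 0;
  }
  return shards[Math.abs(hash) % SHARD_COUNT];
}

// The router: incoming data goes to the shard owning this node id.
function route(nodeId: string, value: number): void {
  shardFor(nodeId).set(nodeId, value);
}

// After the time limit, merge all instances and flush once.
function mergeAndFlush(): void {
  const timestamp = Date.now();
  const merged = shards.flatMap((shard) =>
    [...shard.entries()].map(([nodeId, value]) => ({ nodeId, value, timestamp })),
  );
  shards.forEach((shard) => shard.clear());
  console.log(`flushing ${merged.length} merged entries`); // Stand-in for the DB write.
}
```

Because each shard is touched only by its own subset of nodes, the instances can accept updates independently, which is where the speedup over the single shared buffer comes from.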