IP Sibling Detection Toolset
A Python framework designed to measure clock skew and Bitcoin-specific properties over all nodes of the Bitcoin network.
The list of nodes is acquired using the Bitnodes.io API. Details about the high-level process may be found in the
Bachelor's thesis that produced this framework, which is provided in thesis-pnowak-github.pdf
.
The statistics produced by the evaluation chapter of the thesis are provided in the measurements/
directory.
usage: ipsiblings [-h] [-d BASE_DIR] [--run-id RUN_ID] [--export-plots] [--evaluator {TCPRAW_SCHEITLE,TCPRAW_STARKE,DOMAIN,SSH_KEYSCAN,TCP_OPTIONS,ML_STARKE,BITCOIN,BITCOIN_ADDR,BITCOIN_ADDR_SVC}]
[--skip-evaluator {TCPRAW_SCHEITLE,TCPRAW_STARKE,DOMAIN,SSH_KEYSCAN,TCP_OPTIONS,ML_STARKE,BITCOIN,BITCOIN_ADDR,BITCOIN_ADDR_SVC}] [--eval-batch-size EVAL_BATCH_SIZE]
[--eval-fail-fast] [--eval-ssh-timeout EVAL_SSH_TIMEOUT] [--eval-first-batch EVAL_FIRST_BATCH] [--eval-batch-count EVAL_BATCH_COUNT] [--eval-discard-results]
[--eval-totals-in-memory] [--eval-recalc-stats] [--skip-eval] [--only-init] [-v | -q] [--targets-from {BITCOIN,FILESYSTEM}] [--skip-v SKIP_V] [--from START_INDEX] [--to END_INDEX]
[--do-harvest] [--really-harvest] [--harvester {TCP_TS,BTC}] [-hd HARVEST_DURATION] [-ti TS_INTERVAL] [-bi BTC_INTERVAL] [-htf HARVEST_TIMEOUT_FINAL] [--skip-os]
[--skip-os-sysctls] [--skip-os-iptables] [--skip-os-ntp]
IP Siblings Toolset
optional arguments:
-h, --help show this help message and exit
PATHS:
-d BASE_DIR, --base-dir BASE_DIR
Base directory for application data (default ./target)
--run-id RUN_ID Identifier for the run to contribute to, appended to the base directory. (default current date-time)
EVALUATION:
--export-plots Export plots after evaluation
--evaluator {TCPRAW_SCHEITLE,TCPRAW_STARKE,DOMAIN,SSH_KEYSCAN,TCP_OPTIONS,ML_STARKE,BITCOIN,BITCOIN_ADDR,BITCOIN_ADDR_SVC}
Select a specific evaluator instead of running all of them. May be specified multiple times.
--skip-evaluator {TCPRAW_SCHEITLE,TCPRAW_STARKE,DOMAIN,SSH_KEYSCAN,TCP_OPTIONS,ML_STARKE,BITCOIN,BITCOIN_ADDR,BITCOIN_ADDR_SVC}
Skip a specific evaluator. May be specified multiple times.
--eval-batch-size EVAL_BATCH_SIZE
Candidates to evaluate per batch (default 10_000)
--eval-fail-fast Exit immediately upon the first evaluation exception.
--eval-ssh-timeout EVAL_SSH_TIMEOUT
Timeout in seconds per batch for SSH keyscan, default 60.
--eval-first-batch EVAL_FIRST_BATCH
Start counting eval batches at this number, default 0.
--eval-batch-count EVAL_BATCH_COUNT
How many batches to evaluate, default all.
--eval-discard-results
Discard evaluation results (to only invoke side effects, e.g. keyscan)
--eval-totals-in-memory
Compute total stats in memory
--eval-recalc-stats Recompute stats for each batch from candidates.tsv, no evaluation
SKIP STEPS:
--skip-eval Skip any interpretation of collected data
--only-init Exit after loading configuration
LOGGING:
-v, --verbose Increase verbosity once per call
-q, --quiet Decrease verbosity once per call
TARGET NODES:
--targets-from {BITCOIN,FILESYSTEM}
Where to get target nodes from (default TargetProviderChoice.BITCOIN)
--skip-v SKIP_V Skip IPvX addresses while acquiring targets (for testing, may be specified multiple times, ignored for filesystem provider)
--from START_INDEX Index of first target to consider (default 0)
--to END_INDEX Index of first target to skip (default none)
TIMESTAMP COLLECTION:
--do-harvest Collect (harvest) if no timestamps present
--really-harvest Harvest even if we already have timestamps
--harvester {TCP_TS,BTC}
Select a specific harvester instead of running all of them. May be specified multiple times.
-hd HARVEST_DURATION, --harvest-duration HARVEST_DURATION
Collection duration, seconds (default 36000)
-ti TS_INTERVAL, --ts-interval TS_INTERVAL
Collection interval for timestamps per target, seconds (default 60)
-bi BTC_INTERVAL, --btc-interval BTC_INTERVAL
Collection interval for Bitcoin protocol per target, seconds (default 1800 / 30min)
-htf HARVEST_TIMEOUT_FINAL, --harvest-timeout-final HARVEST_TIMEOUT_FINAL
Wait at least this long for replies after the last iteration (default 120)
OPERATING SYSTEM SETTINGS:
By default, we adapt some global (!!) OS settings. The previous values are saved to ./settings.bak and restored when the application exits.
--skip-os Skip all OS settings
--skip-os-sysctls Skip overwriting necessary sysctls
--skip-os-iptables Skip adding necessary iptables rules
--skip-os-ntp Skip disabling NTP client
Allowed outgoing dependencies:
- any → model, config, logsetup, libconstants (legacy)
- bootstrap, scripts → any
- evaluation → harvesting → preparation
For measurement, you need a dual-stack server with sufficient bandwidth and at least 1 GB of RAM. It is advised to enable at least 2 GB of swap (file-based is fine) to handle spikes due to other applications -- you don't want a ten-hour run to be OOM-killed at the ninth hour. If swap is not desired, an appropriately higher amount of RAM is necessary.
For evaluation, more resources are necessary. It is technically possible to evaluate on a low-end host such as the one
suggested for measurement above, but you need to specify a very low batch size (10k or less) and it will take a long
time, producing many batches that need to be merged. Note that, at the time of writing, the Bitcoin network consists of
around 6k v4 nodes and 1.5k v6 nodes, which results in around 6 million candidates for evaluation. For this work, a
Linux server with 16 GB of RAM was used. Note that evaluation is single-threaded and performance may be improved by
having multiple processes responsible for different batch numbers. Statistics collection for this execution mode may be
performed in a later run over all batches with the --eval-recalc-stats
flag.
Data is stored in tab-separated values (TSV) files. The reason for this
is that we have nested lists which we want to separate using a reasonable
separator. The separator hierarchy is TAB, ,
, :
.
Target information for the filesystem
target provider is stored using
tuples of the following format:
Field | Example | Description | ||
---|---|---|---|---|
* IP Version | 4 | IP Version of the address | ||
* IP Address | 127.0.0.1 | Target IP address | ||
* Port | 8333 | Target TCP port | ||
Target Domains | bitcoin.example.com,bitcoin2.example.com | Domains known to be associated with this target, comma-separated. | ||
TCP Options | Timestamp:1234:0,MSS:1024 | TCP Options captured with the first harvested timestamp, comma-separated. The constant string --None-- if absent. Multiple values for the same option are separated by colons, and the first field is always the name of the option. |
||
...TSval | 1234 | First captured TCP timestamp value for this target, integer as returned by the TCP stack. The granularity of this varies and is not defined by the standard. | ||
...Reception Timestamp | 34132453.567 | Reception timestamp populated by our TCP stack, in seconds since the Unix epoch, with fractional values. | ||
... | The last two fields (marked with ... ) are repeated for every received timestamp. |
Individual candidate evaluation results (candidates.tsv
)
are stored with column names corresponding to the
export
method of SiblingProperty
(for all properties that were requested
during evaluation).
The remaining file structures are implementation details and subject to change.
via pip/setuptools:
python3 -m pip install -e .
As per https://packaging.python.org/tutorials/packaging-projects/.
Increment the version number in setup.py
.
Then run ./local_install.sh
.
You can deploy this to a Linux server using the remote_install.sh
script.
Note that you need to have python3 python3-venv cmake
installed
on the remote server if it is running Debian.
Further, for Python-Bitcoinlib, we need the libssl-dev
package.
It is desirable for the measuring platform not to run as root user.
For raw socket access, we could use CAP_NET_RAW
, passed to the
Python process via ambient capabilities, as described
here.
We could provide a C/Rust/CPython binary wrapper that does
this, since we do not want the general Python executable
(even of a venv) to have that capability, which allows to
read and write ALL network traffic of the system.
However, we cannot modify sysctl values if we are not root, which means we'd need to move that logic into the wrapper as well, and also have the wrapper setuid to root, which is an improvement but still not the best. Further, the OS settings logic is complex enough that we wouldn't want it to be written in C if avoidable.
Hence, for the time being, we still require to be run as root.
- Reduce disk space and access times by working with a database instead of text files
- Set this up to be distributed with multiple measurement hosts
- More sophisticated Bitcoin application-layer check
- More ideas in the Future Work section of the thesis
The original base of this work, https://github.com/tumi8/siblings, is licensed under GPLv2. The immediate base, created
by Marco Starke as a result of his
Master's thesis, does not include a
license file but notes GPLv3 in the ipsiblings/__init__.py
file. This base does not seem to be available publicly, but
can be located in the shared Git history at the starke
tag. Due to being based on a GPLv2 work, a compatible license
is required, which GPLv3 is, so we assume that the license designation in that file is intentional and valid.