Arvind Narayanan edited this page Dec 4, 2018 · 17 revisions

Frequently Asked Questions

General

Where can I find BlockSci's documentation?

Documentation for the Python interface is available here. Most users will want to use this interface.

Does BlockSci support cryptocurrency XYZ?

BlockSci supports many cryptocurrencies that are similar to Bitcoin (e.g., they forked Bitcoin's codebase and made no modifications to the data model). BlockSci comes with a disk parser that is highly optimized for Bitcoin, and an RPC parser that should work with most forks of Bitcoin (but is much slower than the disk parser).

The disk parser can break when a cryptocurrency changes the data format, adds new consensus rules or otherwise changes the rules of how blocks and transactions are created.

Does BlockSci support Monero?

No. Monero's data model is different from Bitcoin's and thus doesn't currently work with BlockSci. It would be possible to extend BlockSci to support Monero, but this is currently not on our roadmap.

Does BlockSci support Ethereum?

No. Ethereum's design is fundamentally different from Bitcoin's and thus incompatible with BlockSci.

Does BlockSci support Omni Layer / Colored Coins / etc.?

BlockSci only handles parsing of the core blockchain layer (layer 1), but exposes any special data stored in the blockchain. Thus, for most protocols that build upon layer 1, you can write your own analysis code.
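As a rough illustration of such analysis code, the sketch below decodes an Omni Layer "simple send" from the payload of an OP_RETURN output. It assumes the commonly documented Class C layout (the ASCII marker "omni", then a 2-byte version, 2-byte transaction type, 4-byte property id and 8-byte amount, all big-endian). The function name and the sample payload are made up for illustration, and how you obtain the payload bytes from BlockSci is not shown here.

```python
import struct

OMNI_MARKER = b"omni"  # ASCII marker at the start of an Omni Class C payload


def parse_omni_simple_send(payload: bytes):
    """Decode a simple-send payload, or return None if it does not match.

    Hypothetical helper for illustration; not part of BlockSci.
    """
    if not payload.startswith(OMNI_MARKER) or len(payload) != 20:
        return None
    # Big-endian: version (2 bytes), type (2 bytes), property id (4), amount (8)
    version, tx_type, property_id, amount = struct.unpack(">HHIQ", payload[4:])
    if tx_type != 0:  # type 0 is assumed to be "simple send"
        return None
    return {"version": version, "property_id": property_id, "amount": amount}


# Constructed example payload, not taken from a real transaction.
sample = bytes.fromhex("6f6d6e69" "0000" "0000" "0000001f" "000000002faf0800")
decoded = parse_omni_simple_send(sample)
print(decoded)
```

Payloads that lack the marker or have a different length are rejected rather than guessed at, which keeps false positives low when scanning arbitrary OP_RETURN data.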

Related issues:

What software do you use to develop BlockSci?

We develop BlockSci on macOS (OS X) using Xcode. You can generate an Xcode project with CMake:

mkdir xcode && cd xcode
cmake -G Xcode ..

We don't have any IDE recommendations for other platforms, though we use gdb to debug BlockSci on Linux.

Does BlockSci run on CentOS / Windows / etc.?

We only provide support for Ubuntu and macOS (OS X). It may be possible to run BlockSci on other platforms by manually compiling the various dependencies.

Clustering

Does BlockSci provide state-of-the-art clustering?

BlockSci provides the fundamental building blocks of address clustering: multi-input clustering with CoinJoin detection and change address clustering with support for various different change address heuristics.

There are, however, many corner cases that require special treatment to prevent "superclusters" (e.g., MtGox allowed users to import their private keys, which breaks the multi-input heuristic). Superclusters are extremely large clusters that form when distinct clusters collapse into each other due to over-eager address linking. To some degree, address clustering today is more art than science, and building a highly accurate clustering module, while possible, is not on the current roadmap for BlockSci. You'll need to implement anything beyond the basic address clustering described above yourself.
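To make the supercluster problem concrete, here is a minimal, self-contained sketch of multi-input clustering with a union-find structure. It is a toy model on made-up address labels, not BlockSci's implementation, and it omits CoinJoin detection entirely. A single transaction that co-spends coins from two unrelated entities is enough to merge their clusters.

```python
class UnionFind:
    """Minimal union-find over hashable items (toy illustration)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def cluster_inputs(transactions):
    """Group addresses that are co-spent in the same transaction.

    `transactions` is a list of input-address lists (hypothetical format).
    """
    uf = UnionFind()
    for inputs in transactions:
        for addr in inputs:
            uf.find(addr)  # register every address, even single inputs
        for addr in inputs[1:]:
            uf.union(inputs[0], addr)
    clusters = {}
    for addr in list(uf.parent):
        clusters.setdefault(uf.find(addr), set()).add(addr)
    return sorted(clusters.values(), key=sorted)


txs = [["alice1", "alice2"], ["alice2", "alice3"], ["bob1", "bob2"]]
separate = cluster_inputs(txs)

# One transaction spending from both wallets (e.g., via an imported key)
# collapses the two clusters into one.
merged = cluster_inputs(txs + [["alice3", "bob1"]])
print(len(separate), len(merged))
```

Because union-find merges are irreversible, a false link cannot be undone after the fact. This is why corner cases have to be filtered out before the merge step.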

Here's some helpful literature on address clustering:

How do I use BlockSci's clustering module?

We recommend using the clustering module available through the Python interface.

If you haven't used the clusterer before, you'll need to first create a clustering:

import blocksci
chain = blocksci.Blockchain("/path/to/blocksci/data/") # in v0.6 this needs to point to the config file

cm = blocksci.cluster.ClusterManager.create_clustering("/directory/where/cluster/files/can/be/stored", chain)

If you already created such a clustering, you can simply load it:

cm = blocksci.cluster.ClusterManager("/directory/where/cluster/files/can/be/stored", chain)

Which heuristic is the clusterer using by default?

By default, the clusterer uses the following two heuristics:

  • Multi-Input: Inputs that are co-spent in the same transaction are clustered together, unless the transaction looks like a CoinJoin transaction.
  • Legacy Change: If there is an output that has less value than any of the inputs and was the first output to send coins to the associated address, it is clustered as the change address.
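The legacy change rule described above can be sketched in plain Python as follows. This is only an illustration on a made-up transaction format (lists of `(address, value)` pairs with values in satoshi), not BlockSci's code. What happens when more than one output qualifies is not specified here, so the sketch simply returns every candidate.

```python
def legacy_change_candidates(tx, seen_addresses):
    """Return output addresses that satisfy the legacy change rule.

    An output qualifies if its value is smaller than every input value and
    its address has not received coins before. `tx` and `seen_addresses`
    are hypothetical structures used only for this sketch.
    """
    smallest_input = min(value for _, value in tx["inputs"])
    return [
        addr
        for addr, value in tx["outputs"]
        if value < smallest_input and addr not in seen_addresses
    ]


tx = {
    "inputs": [("a", 50_000), ("b", 70_000)],
    "outputs": [("shop", 100_000), ("fresh", 19_000)],
}
# Addresses that already appeared in earlier outputs on the toy chain.
seen = {"a", "b", "shop"}

candidates = legacy_change_candidates(tx, seen)
print(candidates)
```

Here "shop" is ruled out because its value exceeds the smallest input, while "fresh" is both small enough and previously unseen.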

How do I use a different change address heuristic?

BlockSci provides a number of different change address heuristics.

You can use a different change address heuristic by passing it to the create_clustering function. For example:

reuse_change_heuristic = blocksci.heuristics.change.address_reuse()
cm = blocksci.cluster.ClusterManager.create_clustering("/directory/where/cluster/files/can/be/stored", chain, reuse_change_heuristic)

How do I disable change address clustering?

Currently, you need to use the following workaround to disable change address clustering:

no_change_heuristic = blocksci.heuristics.change.legacy() - blocksci.heuristics.change.legacy()
cm = blocksci.cluster.ClusterManager.create_clustering("/directory/where/cluster/files/can/be/stored", chain, no_change_heuristic)

In v0.6, you can use the none heuristic:

cm = blocksci.cluster.ClusterManager.create_clustering("/directory/where/cluster/files/can/be/stored", chain, blocksci.heuristics.change.none)
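The subtraction workaround suggests that change heuristics compose like sets of candidate outputs. Under that assumption, the toy model below shows why subtracting a heuristic from itself never selects a change output. The `ChangeHeuristic` class and the transaction format are hypothetical and only mimic the idea; they are not BlockSci's classes.

```python
class ChangeHeuristic:
    """Toy heuristic: wraps a function mapping a tx to a set of output indexes."""

    def __init__(self, select):
        self.select = select

    def __call__(self, tx):
        return self.select(tx)

    def __sub__(self, other):
        # Set difference: outputs chosen by self but not by other.
        return ChangeHeuristic(lambda tx: self(tx) - other(tx))

    def __and__(self, other):
        # Set intersection: outputs chosen by both heuristics.
        return ChangeHeuristic(lambda tx: self(tx) & other(tx))


def smaller_than_inputs(tx):
    """Select outputs whose value is below every input value."""
    limit = min(tx["inputs"])
    return {i for i, value in enumerate(tx["outputs"]) if value < limit}


toy_heuristic = ChangeHeuristic(smaller_than_inputs)
no_change = toy_heuristic - toy_heuristic

tx = {"inputs": [50, 70], "outputs": [100, 19]}
print(toy_heuristic(tx), no_change(tx))
```

In this model, any heuristic minus itself yields the empty set for every transaction, so no output is ever linked as change.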

Why is cluster.size() slow?

Clustering operates on equiv addresses. When you call cluster.size(), BlockSci first has to query a database to find out which address types each equiv address is actually used with on chain.

Instead, you can use cluster.type_equiv_size, which skips the database lookups and simply returns the number of equiv addresses in the cluster.
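The difference between the two counts can be pictured with a toy model. It assumes that one equiv address (for example, one public key) may appear on chain under several address types, and that the full size counts each of those separately. The dictionary stands in for the database lookup. All names and data are made up for illustration and do not reflect BlockSci's internals.

```python
# Stand-in for the database: which address types each equiv address
# is actually used with on chain (made-up data).
types_used_on_chain = {
    "key_A": {"pubkeyhash", "witness_pubkeyhash"},
    "key_B": {"pubkeyhash"},
    "key_C": {"pubkey", "pubkeyhash"},
}

cluster = ["key_A", "key_B", "key_C"]  # one entry per equiv address


def type_equiv_size(cluster):
    """Cheap: just count the equiv addresses."""
    return len(cluster)


def full_size(cluster, lookup):
    """Expensive in practice: needs one lookup per equiv address."""
    return sum(len(lookup[key]) for key in cluster)


print(type_equiv_size(cluster), full_size(cluster, types_used_on_chain))
```

In this model the cheap count is a lower bound on the full count, which is often good enough for ranking clusters by size.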

Analysis / How do I ...?

How can I map addresses to exchanges or pools?

BlockSci lets you tag address clusters with names, but we don't provide any such tags ourselves. There are a few public sources, such as WalletExplorer or Blockchain.info, but they may not be reliable or complete.

BlockSci can map blocks to pools by looking at the information contained in the coinbase transaction, but the data we use to identify pools does not cover all pools/coinbase transactions. Furthermore, there's no guarantee that miners report their identity correctly in the coinbase transaction.

blocksci.get_miner(chain[300005])
>>> 'SlushPool'
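Conceptually, this kind of pool identification can be sketched as a substring match of known marker strings against the coinbase script. The tag table, the function name and the sample script bytes below are illustrative assumptions, not the data or code BlockSci uses.

```python
# Illustrative tag table; a real table would be far larger and still incomplete.
KNOWN_TAGS = {
    b"/slush/": "SlushPool",
    b"/ExamplePool/": "ExamplePool",  # hypothetical entry
}


def guess_miner(coinbase_script: bytes):
    """Return the pool name whose tag occurs in the script, else None."""
    for tag, name in KNOWN_TAGS.items():
        if tag in coinbase_script:
            return name
    return None


# Constructed bytes for illustration, not a real coinbase script.
script = b"\x03\x25\x94\x04" + b"/slush/" + b"\x00\x01"
print(guess_miner(script))
print(guess_miner(b"\x03\x25\x94\x04 no tag here"))
```

A match only shows that the tag is present. Since miners choose these bytes freely, the result is a self-reported label rather than proof of identity.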

How do I extract the full scriptPubKey and scriptSig of an output/input?

For most standard scripts, BlockSci does not store the full scriptSig and scriptPubKey but instead extracts the important information and stores it as an Address. The Addresses page of the documentation (Docs » Reference » Address Classes » Addresses) describes what is stored for each address type.

The actual scriptSig and scriptPubKey are stored only for non-standard scripts. For example:

myout = chain.tx_with_hash("15c2b9bc3b93e0c0a037c5fa8402d0e34e13d3bb0ce7fca65888e5d24e597dcc").outputs[0]

myout.address_type == blocksci.address_type.nonstandard
>> True

myout.address.out_script
>> 'OP_DEPTH OP_1SUB OP_IF OP_RETURN 737069746861736820616e6420796d6f64652c2062726f6772616d6d657273346c796665 OP_ENDIF 0 OP_TOALTSTACK OP_DUP OP_HASH256 efb81cd930d56703304f63d7f94575c4cd17f0985ed2fd126aabf1d866471d2f OP_EQUAL OP_IF 1 OP_TOALTSTACK OP_ENDIF OP_DUP OP_HASH256 9ddd5c986827e8bc5848b4fdc1f8152f597b852ed2429ae7ee2baf7a14096a8f OP_EQUAL OP_IF 1 OP_TOALTSTACK OP_ENDIF OP_DUP OP_HASH256 fda5bd74925349ba07de25db126b9148a7a508e48475c33d2abe7c81a341a3ab OP_EQUAL OP_IF 1 OP_TOALTSTACK OP_ENDIF OP_FROMALTSTACK'

How can I plot the UTXO Age Distribution over time?

See issue #108, "Updating the UTXO set at each block".
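The general idea of tracking the UTXO set block by block can be sketched without BlockSci on a toy chain. The block and transaction format below is made up for illustration, and the approach is a generic one, not necessarily the one discussed in the issue.

```python
def utxo_ages_per_block(blocks):
    """Return, for each height, the sorted ages (in blocks) of all UTXOs.

    `blocks` is a list of blocks; each block is a list of toy transactions
    with an "id", the outpoints it "spends", and its "n_outputs".
    """
    utxos = {}  # (txid, output index) -> height at which it was created
    history = []
    for height, block in enumerate(blocks):
        for tx in block:
            for outpoint in tx["spends"]:
                del utxos[outpoint]  # spent outputs leave the set
            for index in range(tx["n_outputs"]):
                utxos[(tx["id"], index)] = height
        history.append(sorted(height - created for created in utxos.values()))
    return history


toy_chain = [
    [{"id": "a", "spends": [], "n_outputs": 2}],
    [{"id": "b", "spends": [("a", 0)], "n_outputs": 1}],
    [{"id": "c", "spends": [], "n_outputs": 1}],
]

ages = utxo_ages_per_block(toy_chain)
print(ages)
```

Each entry of the result can then be binned into a histogram and plotted against block height. On a real chain you would likely sample heights rather than store the ages for every block.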
