cluster.txes() is taking too long to execute #81

Closed
0xrajath opened this issue Apr 5, 2018 · 2 comments

0xrajath commented Apr 5, 2018

cluster.txes() takes too long to execute, and the AWS instance eventually runs out of memory.
Is there a more efficient way to get all the transactions that a cluster is involved in?

Right now I'm doing the following:

import blocksci
import collections
import pandas as pd
import numpy as np
import blocksci.cluster_python

chain = blocksci.Blockchain("/home/ubuntu/bitcoin")
cm = blocksci.cluster_python.ClusterManager("/home/ubuntu/bitcoin/clusters", chain)
address = chain.address_from_string("1BTCDiceLs79syendE1DM1XCaHcKkzBNnP")
cluster = cm.cluster_with_address(address)
# The line below takes too long to execute
cluster_txns_list = cluster.txes()

@hkalodner
Collaborator

The cluster that you're trying to list transactions from is a super cluster containing over 200 million addresses, which is why you are running into problems. You can get the number of addresses in a cluster by calling len(cluster). Improving the clustering module is on our todo list, but for now this remains a problem. At some point soon I'm planning on changing calls like cluster.txes() to return iterators rather than lists, which will at least make it possible to iterate over large groupings like this.
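To illustrate the len(cluster) check mentioned above, here is a minimal sketch of a guard that refuses to build the transaction list for very large clusters. The helper name `txes_if_small` and the `max_addresses` threshold are hypothetical and not part of BlockSci; the sketch only assumes that the cluster object supports len() and txes() as described in this thread.

```python
def txes_if_small(cluster, max_addresses=1_000_000):
    """Return cluster.txes() only if the cluster is small enough to list.

    `max_addresses` is an illustrative threshold, not a BlockSci setting.
    Pick a value that fits the memory available on your machine.
    """
    size = len(cluster)  # number of addresses in the cluster
    if size > max_addresses:
        raise ValueError(
            f"cluster has {size} addresses; refusing to list all transactions"
        )
    return cluster.txes()
```

With the setup from the original post, this would be called as `txes_if_small(cluster)` in place of `cluster.txes()`, so a super cluster fails fast instead of exhausting memory.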

@hkalodner
Collaborator

This situation has been largely resolved as of v0.5. cluster.txes() itself still has the same problem and has not been directly fixed. However, cluster.outs() returns an OutputIterator, which lets you iterate over all the outputs linked to a cluster without allocating a list of all of them at once. Having cluster.txes() return an iterator rather than a list is infeasible.
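As a rough sketch of how the iterator from cluster.outs() might be consumed, the helper below walks the outputs one at a time and keeps only running totals. The function name `summarize_outputs` and the `limit` parameter are hypothetical. It assumes the OutputIterator can be used in a Python for loop and that each output exposes a `value` attribute; check both against the BlockSci version you have installed.

```python
from itertools import islice


def summarize_outputs(cluster, limit=None):
    """Count outputs and sum their values without building a list.

    `limit` optionally caps how many outputs are read, which is handy
    for a quick test run on a very large cluster.
    """
    count = 0
    total_value = 0
    # islice with limit=None reads the whole iterator lazily.
    for out in islice(cluster.outs(), limit):
        count += 1
        total_value += out.value  # assumed attribute on each output
    return count, total_value
```

Trying `summarize_outputs(cluster, limit=1000)` first gives a cheap sanity check before committing to a full pass over a super cluster.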
