Zero-length clusters #219

Open
geofurb opened this issue Nov 30, 2018 · 2 comments

geofurb commented Nov 30, 2018

Calling c.size() returns 0 for every cluster, while c.type_equiv_size still reports nonzero values and does not appear to be affected. The problem seems tied to another behavior: iterating over a cluster takes an exceptionally long time, even for small clusters (e.g. c.type_equiv_size == 10). This may be related to #200.

cm = blocksci.cluster.ClusterManager(cluster_data_dir, chain)
for c in cm.clusters():
    print(c.size())

Reproduction Steps

import blocksci
import blocksci.cluster

# BITCOIN_DATA_DIR and BITCOIN_CLUSTER_DIR are placeholders for local directory paths
chain = blocksci.Blockchain(BITCOIN_DATA_DIR)
cm = blocksci.cluster.ClusterManager(BITCOIN_CLUSTER_DIR, chain)

# Prints 0 for every cluster
for c in cm.clusters():
    print(c.size())
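
To narrow the report down without walking every cluster, a small helper could flag clusters whose size() is 0 while type_equiv_size is positive. This is only a sketch: find_inconsistent_clusters and its limit parameter are hypothetical names, and the code assumes nothing beyond the size() method and type_equiv_size attribute already used in this issue.

```python
from itertools import islice


def find_inconsistent_clusters(clusters, limit=1000):
    """Return (index, size, type_equiv_size) for clusters whose size() is 0
    while type_equiv_size is positive.

    `clusters` is any iterable of objects exposing size() and
    type_equiv_size, the two members used in this issue.
    """
    mismatches = []
    # islice stops after `limit` clusters, so the full set is never walked.
    for index, cluster in enumerate(islice(clusters, limit)):
        size = cluster.size()
        equiv = cluster.type_equiv_size
        if size == 0 and equiv > 0:
            mismatches.append((index, size, equiv))
    return mismatches


# Hypothetical usage against the objects from the reproduction steps:
# bad = find_inconsistent_clusters(cm.clusters(), limit=100)
# print(len(bad), "of the first 100 clusters report size 0")
```

Capping the scan with limit keeps the check cheap on a large cluster set; whether cm.clusters() itself iterates lazily is not confirmed here.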

System Information

BlockSci version: 0.5
Using AMI: no
Compiled under Ubuntu 16.04
cmake version 3.12.4
gcc/g++ 7.3.0-21ubuntu1~16.04
Anaconda version 3.5.1 (Python 3.7.0)
Total memory: 64 GB DRAM, 188 GB swap

Dependencies installed:
blocksci==0.5.0

  • dateparser [required: >=0.6.0, installed: 0.7.0]
    • python-dateutil [required: Any, installed: 2.6.1]
      • six [required: >=1.5, installed: 1.10.0]
    • pytz [required: Any, installed: 2017.2]
    • regex [required: Any, installed: 2018.11.2]
    • tzlocal [required: Any, installed: 1.5.1]
      • pytz [required: Any, installed: 2017.2]
  • multiprocess [required: >=0.70.5, installed: 0.70.6.1]
    • dill [required: >=0.2.8.1, installed: 0.2.8.2]
  • pandas [required: >=0.22.0, installed: 0.23.4]
    • numpy [required: >=1.9.0, installed: 1.13.1]
    • python-dateutil [required: >=2.5.0, installed: 2.6.1]
      • six [required: >=1.5, installed: 1.10.0]
    • pytz [required: >=2011k, installed: 2017.2]
  • psutil [required: >=5.4.2, installed: 5.4.8]
  • pycrypto [required: >=2.6.1, installed: 2.6.1]

geofurb commented Nov 30, 2018

Accessing an individual cluster's addresses takes a very long time and returns an empty list:

IPython console

chain = blocksci.Blockchain(BITCOIN_DATA_DIR)
cx = blocksci.cluster.ClusterManager(CUSTOM_CLUSTER_DIR,chain)
len(cx.clusters())
Out[7]: 330464891
clist = list(cx.clusters())
a = clist[6]
a
Out[21]: <blocksci.cluster.Cluster at 0x7f17ebced688>
a.addresses
Out[22]: <blocksci.AddressIterator at 0x7efdf84d07a0>
[x for x in a.addresses]
Out[23]: []
a.type_equiv_size
Out[24]: 125
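
To put a number on "a very long time", the session above could be wrapped in two small helpers: one that times a full pass over a cluster's addresses, and one that fetches a single cluster without first building a list of all 330464891. Both names (time_address_iteration, nth_cluster) are hypothetical, and the sketch assumes only that cx.clusters() and a.addresses are iterable, as the session shows.

```python
import time
from itertools import islice


def time_address_iteration(cluster):
    """Count the addresses a cluster yields and time the iteration.

    Assumes only that `cluster.addresses` is iterable.
    Returns (address_count, elapsed_seconds).
    """
    start = time.perf_counter()
    count = sum(1 for _ in cluster.addresses)
    elapsed = time.perf_counter() - start
    return count, elapsed


def nth_cluster(clusters, n):
    """Return the n-th cluster (0-based) without building a full list.

    Returns None if the iterable has fewer than n + 1 items.
    """
    return next(islice(clusters, n, n + 1), None)


# Hypothetical usage mirroring the console session:
# a = nth_cluster(cx.clusters(), 6)
# count, seconds = time_address_iteration(a)
# print(count, "addresses vs type_equiv_size", a.type_equiv_size, "in", seconds, "s")
```

Reporting the address count next to type_equiv_size and the elapsed time in one line would make the mismatch easier to compare across clusters.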

geofurb commented Dec 1, 2018

I've uploaded my bitcoin-data and bitcoin-clusters directories here, in case it helps with reproducing the error. You might want to let the download run while you're at lunch: it's 102 GB, and unpacking the *.tar.bz2 (which will also likely take a long time) yields something like 170-180 GB.
