anvio #280

Closed
boegel opened this issue Jan 31, 2024 · 3 comments
Labels
container Container image available difficulty: easy software that should be easy to support easyconfig Easyconfig is available priority: medium Python site:ugent Software installation request for UGent Tier-2 update

Comments


boegel commented Jan 31, 2024

boegel added difficulty: easy software that should be easy to support priority: medium Python update site:ugent Software installation request for UGent Tier-2 container Container image available easyconfig Easyconfig is available labels Jan 31, 2024
boegel added a commit that referenced this issue Jan 31, 2024
boegel self-assigned this Jan 31, 2024
boegel added a commit that referenced this issue Jan 31, 2024

boegel commented Feb 1, 2024

I spent quite a bit of time trying to get anvio v8 working on top of foss/2023a, but ran into trouble because scikit-learn and pandas (in SciPy-bundle) were too new.

When using scikit-learn-1.3.1-gfbf-2023a.eb as a dependency for anvio-8-foss-2023a.eb, the "anvi-self-test --suite mini --no-interactive" sanity check command failed with "ValueError: node array from the pickle has an incompatible dtype":

Traceback (most recent call last):
  File "/user/gent/400/vsc40023/eb_arcaninescratch/RHEL8/skylake-ib/software/anvio/8-foss-2023a/bin/anvi-interactive", line 122, in <module>
    d = interactive.Interactive(args)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/user/gent/400/vsc40023/eb_arcaninescratch/RHEL8/skylake-ib/software/anvio/8-foss-2023a/lib/python3.11/site-packages/anvio/interactive.py", line 211, in __init__
    self.completeness = Completeness(self.contigs_db_path)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/user/gent/400/vsc40023/eb_arcaninescratch/RHEL8/skylake-ib/software/anvio/8-foss-2023a/lib/python3.11/site-packages/anvio/completeness.py", line 45, in __init__
    self.SCG_domain_predictor = scgdomainclassifier.Predict(argparse.Namespace(), run=terminal.Run(verbose=False), progress=self.progress)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/user/gent/400/vsc40023/eb_arcaninescratch/RHEL8/skylake-ib/software/anvio/8-foss-2023a/lib/python3.11/site-packages/anvio/scgdomainclassifier.py", line 234, in __init__
    SCGDomainClassifier.__init__(self, args, run, progress)
  File "/user/gent/400/vsc40023/eb_arcaninescratch/RHEL8/skylake-ib/software/anvio/8-foss-2023a/lib/python3.11/site-packages/anvio/scgdomainclassifier.py", line 73, in __init__
    self.rf.initialize_classifier()
  File "/user/gent/400/vsc40023/eb_arcaninescratch/RHEL8/skylake-ib/software/anvio/8-foss-2023a/lib/python3.11/site-packages/anvio/learning.py", line 103, in initialize_classifier
    classifier_obj = pickle.load(open(self.classifier_object_path, 'rb'))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "sklearn/tree/_tree.pyx", line 728, in sklearn.tree._tree.Tree.__setstate__
  File "sklearn/tree/_tree.pyx", line 1432, in sklearn.tree._tree._check_node_ndarray
ValueError: node array from the pickle has an incompatible dtype:
- expected: {'names': ['left_child', 'right_child', 'feature', 'threshold', 'impurity', 'n_node_samples', 'weighted_n_node_samples', 'missing_go_to_left'], 'formats': ['<i8', '<i8', '<i8', '<f8', '<f8', '<i8', '<f8', 'u1'], 'offsets': [0, 8, 16, 24, 32, 40, 48, 56], 'itemsize': 64}
- got     : [('left_child', '<i8'), ('right_child', '<i8'), ('feature', '<i8'), ('threshold', '<f8'), ('impurity', '<f8'), ('n_node_samples', '<i8'), ('weighted_n_node_samples', '<f8')]
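
The two field lists in the error differ by a single entry. A minimal sketch in plain Python, with the field names copied from the message above, makes the difference explicit. The helper name `missing_fields` is made up for illustration. The result is consistent with the classifier having been pickled by an older scikit-learn than the 1.3.1 loading it, but that is an inference, not something the traceback confirms.

```python
# Field names copied verbatim from the ValueError above.
expected = ['left_child', 'right_child', 'feature', 'threshold', 'impurity',
            'n_node_samples', 'weighted_n_node_samples', 'missing_go_to_left']
got = ['left_child', 'right_child', 'feature', 'threshold', 'impurity',
       'n_node_samples', 'weighted_n_node_samples']


def missing_fields(expected_names, got_names):
    """Return the fields the loader expects but the pickle lacks (order kept)."""
    present = set(got_names)
    return [name for name in expected_names if name not in present]


print(missing_fields(expected, got))  # ['missing_go_to_left']
```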

When using scikit-learn 1.2.2 as an extension in anvio-8-foss-2023a.eb, the error changed to "gzip.BadGzipFile: Incorrect length of data produced":

Traceback (most recent call last):
  File "/user/gent/400/vsc40023/eb_arcaninescratch/RHEL8/skylake-ib/software/anvio/8-foss-2023a/bin/anvi-summarize", line 123, in <module>
    main(args)
  File "/user/gent/400/vsc40023/eb_arcaninescratch/RHEL8/skylake-ib/software/anvio/8-foss-2023a/bin/anvi-summarize", line 64, in main
    summary = summarizer.ProfileSummarizer(args)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/user/gent/400/vsc40023/eb_arcaninescratch/RHEL8/skylake-ib/software/anvio/8-foss-2023a/lib/python3.11/site-packages/anvio/summarizer.py", line 707, in __init__
    DatabasesMetaclass.__init__(self, self.args, self.run, self.progress)
  File "/user/gent/400/vsc40023/eb_arcaninescratch/RHEL8/skylake-ib/software/anvio/8-foss-2023a/lib/python3.11/site-packages/anvio/dbops.py", line 3778, in __init__
    ProfileSuperclass.__init__(self, self.args, self.run, self.progress)
  File "/user/gent/400/vsc40023/eb_arcaninescratch/RHEL8/skylake-ib/software/anvio/8-foss-2023a/lib/python3.11/site-packages/anvio/dbops.py", line 3014, in __init__
    self.init_gene_level_coverage_stats_dicts(outliers_threshold=outliers_threshold,
  File "/user/gent/400/vsc40023/eb_arcaninescratch/RHEL8/skylake-ib/software/anvio/8-foss-2023a/lib/python3.11/site-packages/anvio/dbops.py", line 3191, in init_gene_level_coverage_stats_dicts
    self.init_split_coverage_values_per_nt_dict(split_names)
  File "/user/gent/400/vsc40023/eb_arcaninescratch/RHEL8/skylake-ib/software/anvio/8-foss-2023a/lib/python3.11/site-packages/anvio/dbops.py", line 3254, in init_split_coverage_values_per_nt_dict
    self.split_coverage_values_per_nt_dict[split_name] = self.split_coverage_values.get(split_name)
                                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/user/gent/400/vsc40023/eb_arcaninescratch/RHEL8/skylake-ib/software/anvio/8-foss-2023a/lib/python3.11/site-packages/anvio/auxiliarydataops.py", line 149, in get
    coverage_array = utils.convert_binary_blob_to_numpy_array(blob, dtype=self.coverage_dtype)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/user/gent/400/vsc40023/eb_arcaninescratch/RHEL8/skylake-ib/software/anvio/8-foss-2023a/lib/python3.11/site-packages/anvio/utils.py", line 782, in convert_binary_blob_to_numpy_array
    return np.frombuffer(gzip.decompress(blob), dtype=dtype)
                         ^^^^^^^^^^^^^^^^^^^^^
  File "/user/gent/400/vsc40023/eb_arcaninescratch/RHEL8/skylake-ib/software/Python/3.11.3-GCCcore-12.3.0/lib/python3.11/gzip.py", line 614, in decompress
    raise BadGzipFile("Incorrect length of data produced")
gzip.BadGzipFile: Incorrect length of data produced
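
For context on what this error means: Python's gzip module raises it when the decompressed data does not have the length recorded in the gzip trailer. The sketch below triggers the same exception type on a deliberately tampered blob, using only the standard library. It does not reproduce the anvio failure or explain why the stored blob and its trailer disagree; the payload and the helper `try_decompress` are invented for the illustration.

```python
import gzip
import struct

payload = bytes(range(16))
blob = gzip.compress(payload)

# A gzip member ends with an 8-byte trailer: CRC32, then ISIZE
# (uncompressed length modulo 2**32), both little-endian.
crc, isize = struct.unpack("<II", blob[-8:])

# Keep the CRC intact but record a wrong length in the trailer.
tampered = blob[:-4] + struct.pack("<I", isize + 1)


def try_decompress(data):
    """Return the decompressed bytes, or the BadGzipFile message on failure."""
    try:
        return gzip.decompress(data)
    except gzip.BadGzipFile as err:
        return str(err)


print(try_decompress(blob) == payload)
print(try_decompress(tampered))
```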

My best guess is that this is caused by using a version of pandas that is too recent (2.0.3 as included in SciPy-bundle v2023.07, instead of the expected pandas 1.4.4).

These problems do not occur when using foss/2022b and the standard scikit-learn 1.2.1 + pandas 1.4.2 (in SciPy-bundle 2022.05).
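
To compare a working and a failing environment, the installed versions can be listed from within each one. This is a small sketch using the standard library's `importlib.metadata`. The function name `installed_versions` is hypothetical, and the distribution names `scikit-learn` and `pandas` are assumptions about how the packages are registered in the environment.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Run this inside the environment under test (for example, after loading the modules).
print(installed_versions(["scikit-learn", "pandas"]))
```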


boegel commented Feb 1, 2024


boegel commented Feb 8, 2024

PR merged, software installed, so closing...
