Hi all, when I run the command below (after adjusting the conda environment yml files to umap-learn=0.5.3 and blast+=2.12.0), the demo data works.
nextflow run main.nf --reads "test_datasets/mock4_run3bc08_5000.fastq" --db "./db/16S_ribosomal_RNA" --tax "db/taxdb/" -profile conda
When I try the same approach on my own data (5k reads of full-length 16S) with the following command, the read_clustering process fails:
nextflow run main.nf --reads "/path/to/barcode01.fastq" --db "./db/16S_ribosomal_RNA" --tax "db/taxdb/" -profile conda
Error executing process > 'read_clustering (1)'
Caused by:
Process read_clustering (1) terminated with an error exit status (1)
Command executed [/home/minion/git/NanoCLUST/templates/umap_hdbscan.py]:
#!/usr/bin/env python
import numpy as np
import umap
import matplotlib.pyplot as plt
from sklearn import decomposition
import random
import pandas as pd
import hdbscan
df = pd.read_csv("freqs.txt", delimiter=" ")
#UMAP
motifs = [x for x in df.columns.values if x not in ["read", "length"]]
X = df.loc[:,motifs]
X_embedded = umap.UMAP(n_neighbors=15, min_dist=0.1, verbose=2).fit_transform(X)
df_umap = pd.DataFrame(X_embedded, columns=["D1", "D2"])
umap_out = pd.concat([df["read"], df["length"], df_umap], axis=1)
#HDBSCAN
X = umap_out.loc[:,["D1", "D2"]]
umap_out["bin_id"] = hdbscan.HDBSCAN(min_cluster_size=int(50), cluster_selection_epsilon=int(0.5)).fit_predict(X)
#PLOT
plt.figure(figsize=(20,20))
plt.scatter(X_embedded[:, 0], X_embedded[:, 1], c=umap_out["bin_id"], cmap='Spectral', s=1)
plt.xlabel("UMAP1", fontsize=18)
plt.ylabel("UMAP2", fontsize=18)
plt.gca().set_aspect('equal', 'datalim')
plt.title("Projecting " + str(len(umap_out['bin_id'])) + " reads. " + str(len(umap_out['bin_id'].unique())) + " clusters generated by HDBSCAN", fontsize=18)
for cluster in np.sort(umap_out['bin_id'].unique()):
    read = umap_out.loc[umap_out['bin_id'] == cluster].iloc[0]
    plt.annotate(str(cluster), (read['D1'], read['D2']), weight='bold', size=14)
plt.savefig('hdbscan.output.png')
umap_out.to_csv("hdbscan.output.tsv", sep=" ", index=False)
Command exit status:
1
Command output:
UMAP( verbose=2)
Tue Jun 7 23:36:49 2022 Construct fuzzy simplicial set
Tue Jun 7 23:36:50 2022 Finding Nearest Neighbors
Tue Jun 7 23:36:50 2022 Building RP forest with 21 trees
Tue Jun 7 23:36:55 2022 NN descent for 17 iterations
1 / 17
2 / 17
3 / 17
4 / 17
5 / 17
6 / 17
Stopping threshold met -- exiting after 6 iterations
Tue Jun 7 23:37:14 2022 Finished Nearest Neighbor Search
Tue Jun 7 23:37:17 2022 Construct embedding
Tue Jun 7 23:38:27 2022 Finished embedding
Command error:
Epochs completed: 100%| ██████████ 200/200 [00:56]
Traceback (most recent call last):
  File "/home/minion/git/NanoCLUST/work/conda/read_clustering-5ad1d823e66c1828058a33f36a6c51c6/lib/python3.8/site-packages/joblib/parallel.py", line 822, in dispatch_one_batch
    tasks = self._ready_batches.get(block=False)
  File "/home/minion/git/NanoCLUST/work/conda/read_clustering-5ad1d823e66c1828058a33f36a6c51c6/lib/python3.8/queue.py", line 167, in get
    raise Empty
_queue.Empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File ".command.sh", line 23, in <module>
    umap_out["bin_id"] = hdbscan.HDBSCAN(min_cluster_size=int(50), cluster_selection_epsilon=int(0.5)).fit_predict(X)
  File "/home/minion/git/NanoCLUST/work/conda/read_clustering-5ad1d823e66c1828058a33f36a6c51c6/lib/python3.8/site-packages/hdbscan/hdbscan_.py", line 941, in fit_predict
    self.fit(X)
  File "/home/minion/git/NanoCLUST/work/conda/read_clustering-5ad1d823e66c1828058a33f36a6c51c6/lib/python3.8/site-packages/hdbscan/hdbscan_.py", line 919, in fit
    self.min_spanning_tree) = hdbscan(X, **kwargs)
  File "/home/minion/git/NanoCLUST/work/conda/read_clustering-5ad1d823e66c1828058a33f36a6c51c6/lib/python3.8/site-packages/hdbscan/hdbscan_.py", line 610, in hdbscan
    (single_linkage_tree, result_min_span_tree) = memory.cache(
  File "/home/minion/git/NanoCLUST/work/conda/read_clustering-5ad1d823e66c1828058a33f36a6c51c6/lib/python3.8/site-packages/joblib/memory.py", line 349, in __call__
    return self.func(*args, **kwargs)
  File "/home/minion/git/NanoCLUST/work/conda/read_clustering-5ad1d823e66c1828058a33f36a6c51c6/lib/python3.8/site-packages/hdbscan/hdbscan_.py", line 275, in _hdbscan_boruvka_kdtree
    alg = KDTreeBoruvkaAlgorithm(tree, min_samples, metric=metric,
  File "hdbscan/_hdbscan_boruvka.pyx", line 375, in hdbscan._hdbscan_boruvka.KDTreeBoruvkaAlgorithm.__init__
  File "hdbscan/_hdbscan_boruvka.pyx", line 411, in hdbscan._hdbscan_boruvka.KDTreeBoruvkaAlgorithm._compute_bounds
  File "/home/minion/git/NanoCLUST/work/conda/read_clustering-5ad1d823e66c1828058a33f36a6c51c6/lib/python3.8/site-packages/joblib/parallel.py", line 1043, in __call__
    if self.dispatch_one_batch(iterator):
  File "/home/minion/git/NanoCLUST/work/conda/read_clustering-5ad1d823e66c1828058a33f36a6c51c6/lib/python3.8/site-packages/joblib/parallel.py", line 833, in dispatch_one_batch
    islice = list(itertools.islice(iterator, big_batch_size))
  File "hdbscan/_hdbscan_boruvka.pyx", line 412, in genexpr
TypeError: delayed() got an unexpected keyword argument 'check_pickle'
Work dir:
/home/minion/git/NanoCLUST/work/d5/c1956140ebe9ed2b034fdd72099a72
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named .command.sh
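The final TypeError suggests that the hdbscan build in the read_clustering conda environment calls joblib's `delayed()` with a `check_pickle` argument that the installed joblib does not accept. A minimal sketch for checking this from inside that environment is shown here. `accepts_keyword` and `report` are hypothetical helpers written for illustration, not part of NanoCLUST, hdbscan, or joblib, and the two `*_style_delayed` functions are stand-ins rather than joblib's real code.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if func can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params:
        return params[name].kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        )
    # A **kwargs catch-all accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Stand-ins for illustration only (assumed shapes, not joblib's real code).
def old_style_delayed(function, check_pickle=None):
    return function


def new_style_delayed(function):
    return function


def report():
    """Print whether the joblib installed here accepts check_pickle."""
    try:
        import joblib
    except ImportError:
        print("joblib is not installed in this environment")
        return
    ok = accepts_keyword(joblib.delayed, "check_pickle")
    print("joblib", joblib.__version__, "- delayed accepts check_pickle:", ok)


if __name__ == "__main__":
    report()
```

If `report()` prints `False`, the joblib and hdbscan versions in the environment are likely mismatched, and pinning a compatible pair in the conda yml file would be the thing to try next.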