I was getting the following error for both the test data and my own data:
executor > local (5)
[98/2d4484] process > QC (1) [100%] 1 of 1 ✔
[87/209bff] process > fastqc (1) [100%] 1 of 1 ✔
[9b/223d33] process > kmer_freqs (1) [100%] 1 of 1 ✔
[e6/82bfaa] process > read_clustering (1) [100%] 1 of 1, failed: 1 ✘
[- ] process > split_by_cluster -
[- ] process > read_correction -
[- ] process > draft_selection -
[- ] process > racon_pass -
[- ] process > medaka_pass -
[- ] process > consensus_classification -
[- ] process > join_results -
[- ] process > get_abundances -
[- ] process > plot_abundances -
[80/872ff6] process > output_documentation [100%] 1 of 1 ✔
Error executing process > 'read_clustering (1)'
Caused by:
Process `read_clustering (1)` terminated with an error exit status (1)
Command executed [/home/idun/1_Software/NanoCLUST/templates/umap_hdbscan.py]:
#!/usr/bin/env python
import numpy as np
import umap
import matplotlib.pyplot as plt
from sklearn import decomposition
import random
import pandas as pd
import hdbscan
df = pd.read_csv("freqs.txt", delimiter=" ")
#UMAP
motifs = [x for x in df.columns.values if x not in ["read", "length"]]
X = df.loc[:,motifs]
X_embedded = umap.UMAP(n_neighbors=15, min_dist=0.1, verbose=2).fit_transform(X)
df_umap = pd.DataFrame(X_embedded, columns=["D1", "D2"])
umap_out = pd.concat([df["read"], df["length"], df_umap], axis=1)
#HDBSCAN
X = umap_out.loc[:,["D1", "D2"]]
umap_out["bin_id"] = hdbscan.HDBSCAN(min_cluster_size=int(50), cluster_selection_epsilon=int(0.5)).fit_predict(X)
#PLOT
plt.figure(figsize=(20,20))
plt.scatter(X_embedded[:, 0], X_embedded[:, 1], c=umap_out["bin_id"], cmap='Spectral', s=1)
plt.xlabel("UMAP1", fontsize=18)
plt.ylabel("UMAP2", fontsize=18)
plt.gca().set_aspect('equal', 'datalim')
plt.title("Projecting " + str(len(umap_out['bin_id'])) + " reads. " + str(len(umap_out['bin_id'].unique())) + " clusters generated by HDBSCAN", fontsize=18)
for cluster in np.sort(umap_out['bin_id'].unique()):
    read = umap_out.loc[umap_out['bin_id'] == cluster].iloc[0]
    plt.annotate(str(cluster), (read['D1'], read['D2']), weight='bold', size=14)
plt.savefig('hdbscan.output.png')
umap_out.to_csv("hdbscan.output.tsv", sep=" ", index=False)
Command exit status:
1
Command output:
(empty)
Command error:
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-dyrbsl_v because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
sys:1: DtypeWarning: Columns (1,2,3,...,512,513) have mixed types.Specify dtype option on import or set low_memory=False.
Traceback (most recent call last):
  File ".command.sh", line 16, in <module>
    X_embedded = umap.UMAP(n_neighbors=15, min_dist=0.1, verbose=2).fit_transform(X)
  File "/opt/conda/envs/read_clustering/lib/python3.8/site-packages/umap/umap_.py", line 2014, in fit_transform
    self.fit(X, y)
  File "/opt/conda/envs/read_clustering/lib/python3.8/site-packages/umap/umap_.py", line 1613, in fit
    X = check_array(X, dtype=np.float32, accept_sparse="csr", order="C")
  File "/opt/conda/envs/read_clustering/lib/python3.8/site-packages/sklearn/utils/validation.py", line 72, in inner_f
    return f(**kwargs)
  File "/opt/conda/envs/read_clustering/lib/python3.8/site-packages/sklearn/utils/validation.py", line 598, in check_array
    array = np.asarray(array, order=order, dtype=dtype)
  File "/opt/conda/envs/read_clustering/lib/python3.8/site-packages/numpy/core/_asarray.py", line 83, in asarray
    return array(a, dtype, copy=False, order=order)
  File "/opt/conda/envs/read_clustering/lib/python3.8/site-packages/pandas/core/generic.py", line 1778, in __array__
    return np.asarray(self._values, dtype=dtype)
  File "/opt/conda/envs/read_clustering/lib/python3.8/site-packages/numpy/core/_asarray.py", line 83, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: could not convert string to float: 'TTTTG'
Work dir:
/home/idun/1_Software/NanoCLUST/work/e6/82bfaa94d00dc318b1037dc0f4851f
Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
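It seemed to be caused by an extra first line in freqs.txt that was not being skipped, so the DataFrame in the umap_hdbscan.py script was not loaded correctly. pandas used that line as the header, and the real header row (with k-mer names such as TTTTG) was read as data. As a result, the k-mer columns contained strings, which explains the DtypeWarning about mixed types in columns 1 to 513. UMAP then failed when converting them to float.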
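The following is a minimal sketch of this failure mode. It assumes a tab-separated freqs.txt with one extra line before the real `read`/`length`/k-mer header. The extra line, read names, lengths and 2-mer columns below are made up for illustration and are not taken from the real file. Because pandas uses the first line as the header by default, the real header row becomes the first data row and no column is numeric. Coercing the k-mer columns to float32 (roughly what the `check_array` call in the traceback does) then raises the same kind of `ValueError`.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for freqs.txt: an extra line precedes the real header.
FREQS = (
    "extra line\tx\ty\tz\n"
    "read\tlength\tAA\tTT\n"
    "read_1\t1500\t0.25\t0.75\n"
    "read_2\t1480\t0.50\t0.50\n"
)

# Same kind of call as the template, without skipping the first line.
df = pd.read_csv(io.StringIO(FREQS), delimiter="\t")
print(list(df.columns))     # the extra line became the header
print(df.iloc[0].tolist())  # the real header row is now the first data row

error_message = None
try:
    # Roughly what check_array does inside UMAP: coerce the features to float32.
    np.asarray(df.iloc[:, 2:], dtype=np.float32)
except ValueError as err:
    error_message = str(err)
print(error_message)
```

In the real run the columns are 5-mers such as TTTTG, but the mechanism is the same.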
I changed line 11 of umap_hdbscan.py to skip the first line.
From:
df = pd.read_csv("$kmer_freqs", delimiter="\t")
To:
df = pd.read_csv("$kmer_freqs", delimiter="\t", skiprows=[0])
And now it works fine for me.
I just wanted to note this issue in case anyone else runs into it!
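Here is a sketch of the `skiprows=[0]` fix, run on a small hypothetical stand-in for freqs.txt (made-up reads and 2-mer columns). With that option, pandas drops the extra line, uses the real header, and the k-mer columns parse as floats.

There is one design caveat: `skiprows=[0]` always drops the first line. On a freqs.txt that already starts with the real header, it would discard that header instead. The `load_freqs` helper below is a hypothetical, more defensive alternative and is not part of NanoCLUST. It skips only the lines before the first one whose fields include both `read` and `length`.

```python
import io

import pandas as pd

# Hypothetical stand-in for freqs.txt with an extra line before the real header.
FREQS = (
    "extra line\tx\ty\tz\n"
    "read\tlength\tAA\tTT\n"
    "read_1\t1500\t0.25\t0.75\n"
    "read_2\t1480\t0.50\t0.50\n"
)

# The fix from the issue: skip line 0 so the real header is used.
fixed = pd.read_csv(io.StringIO(FREQS), delimiter="\t", skiprows=[0])
motifs = [c for c in fixed.columns if c not in ["read", "length"]]
print(fixed.loc[:, motifs].dtypes.tolist())  # float64 columns, ready for UMAP


def load_freqs(text: str) -> pd.DataFrame:
    """Hypothetical helper: skip only the lines before the real header.

    The header is taken to be the first tab-separated line whose fields
    include both "read" and "length" (the columns the template uses).
    """
    lines = text.splitlines()
    header_idx = next(
        (i for i, line in enumerate(lines)
         if {"read", "length"} <= set(line.split("\t"))),
        None,
    )
    if header_idx is None:
        raise ValueError("no header line with 'read' and 'length' columns found")
    # An integer skiprows drops that many leading lines (0 drops none).
    return pd.read_csv(io.StringIO(text), delimiter="\t", skiprows=header_idx)


with_extra_line = load_freqs(FREQS)
without_extra_line = load_freqs(FREQS.split("\n", 1)[1])  # header already first
print(with_extra_line.equals(without_extra_line))
```

With the real file, the text would come from reading freqs.txt, for example `load_freqs(pathlib.Path("freqs.txt").read_text())`. This wiring is an assumption and has not been tested inside the NanoCLUST template.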