"WARNING: At least one BLAST run failed. ANIb may fail" and "The condensed distance matrix must contain only finite values" #267
Hi @genomesandMGEs - thank you for your interest in pyani. I expect the issue is as the warning says: a BLAST run failed. Failure here includes writing no output, which can happen when there is no identifiable homology between two genomes.

In your position, I would identify the pair(s) of genomes with no identifiable homology and modify the input dataset accordingly. There are several methods for achieving this, including running a fast k-mer-based comparison tool over the dataset first.

I hope this is helpful to you. L.
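For instance, here is a minimal sketch of one way to find such pairs after a run (the ANIb_percentage_identity.tab filename is an assumption based on pyani v0.2's usual output names; adjust it to whatever your output directory contains). Any NaN cell in the identity matrix marks a comparison for which BLAST wrote no usable output:

    import pandas as pd

    # Load the ANIb identity matrix written by average_nucleotide_identity.py
    # (file name assumed; check your -o output directory).
    dfr = pd.read_csv("ANIb_output/ANIb_percentage_identity.tab",
                      sep="\t", index_col=0)

    # NaN cells correspond to comparisons that produced no BLAST output.
    for row, col in zip(*dfr.isna().to_numpy().nonzero()):
        print(dfr.index[row], dfr.columns[col])

Removing (or re-running) the genomes printed here is one way to get past the "finite values" error in the clustering step.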
Hi @widdowquinn, thanks for the quick reply. I find it hard to believe that some pairs of genomes have no identifiable homology, since all these genomes belong to the same species, and I used a filter to only include genomes with a maximum distance to the reference of 0.05 (~95% ANI). Is it possible that ANIb can't handle such a large dataset? I'm trying to run ANIm now to see if it works.
I have run pyani on similarly-sized datasets, so I would not expect this to be the issue. When you determine genome distance, do you take the coverage/alignment fraction into account? (It's possible to get falsely high identities because a very small region of genome is being aligned, for instance.) It's worth keeping in mind also that existing species assignments can be inaccurate. If you can identify the specific BLAST output that gives the error, that may help the diagnosis.
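To illustrate the coverage point, here is a small sketch (a hypothetical helper, not part of pyani) that estimates how much of a query genome a BLAST tabular (-outfmt 6) result actually covers, alongside its mean identity:

    def aligned_stats(blast_tab, query_length):
        """Return (aligned_fraction, mean_identity) from a BLAST -outfmt 6
        file. Overlapping HSPs are not merged, so the fraction is an upper
        bound; good enough to flag near-empty alignments."""
        aligned, identities = 0, []
        with open(blast_tab) as fh:
            for line in fh:
                fields = line.rstrip("\n").split("\t")
                identities.append(float(fields[2]))  # column 3: % identity
                aligned += int(fields[3])            # column 4: alignment length
        if not identities:
            return 0.0, float("nan")                 # no hits at all
        return aligned / query_length, sum(identities) / len(identities)

A pair reporting ~99% identity but only a few percent aligned fraction is exactly the falsely-high-identity situation described above.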
That's a good point - I used PanACoTA to download the genomes of interest and filter by the 0.05 max distance, but there's no mention of a coverage/alignment fraction cut-off. I just ran bindash on my 2009 genomes; here's the lowest hit after sorting by column 5, which represents shared k-mers/total k-mers:

PSAE_0321_00764.fasta PSAE_0321_00952.fasta 2.8550e-02 0.0000e+00 0.378418

So, the genome pair with the smallest fraction of shared k-mers is at ~38%. Do you think this will be problematic for pyani's ANIb?
You'll have a better idea of that if you, for instance, put those genomes into a folder with your reference and run ANIb, then inspect the BLAST output that gets written.

If BLAST falls over for whatever reason and doesn't produce an output, this will be problematic for ANIb.
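As a quick check for comparisons that produced no BLAST output at all (the blastn_output directory name and .blast_tab suffix here are assumptions based on pyani v0.2's ANIb layout; adjust to what you actually see on disk):

    from pathlib import Path

    # Zero-length BLAST tables mean no hits were written for that pair.
    for tab in Path("ANIb_output/blastn_output").glob("*.blast_tab"):
        if tab.stat().st_size == 0:
            print(f"No BLAST output for: {tab.name}")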
FWIW, I try not to encourage ANIb - the ANIm algorithm is more robust and stable, in part because it doesn't require arbitrary fragmentation of genomes, or for properties of the alignments of those specific (yet arbitrary) fragments to be met.
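For context, ANIb-style analysis first chops each genome into fixed-size fragments (1020 nt by default in pyani, following the published ANIb method) and uses those as BLAST queries. A minimal sketch of that fragmentation step (a hypothetical helper for illustration, not pyani's actual code):

    def fragment_sequence(seq, size=1020):
        """Split a genome sequence into consecutive fixed-size fragments;
        the final fragment may be shorter than `size`."""
        return [seq[i:i + size] for i in range(0, len(seq), size)]

It is the properties of alignments of these arbitrary fragments that ANIb depends on, and that ANIm avoids.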
Just a heads up - I just ran ANIm and it also failed; many worker processes died with broken pipes. The traceback repeats itself many times; condensed, it comes down to:

Process ForkPoolWorker-20:
Traceback (most recent call last):
  File "/gxfs_home/cau/sunzm592/anaconda3/envs/pyani/lib/python3.9/multiprocessing/process.py", line 108, in run
  File "/gxfs_home/cau/sunzm592/anaconda3/envs/pyani/lib/python3.9/multiprocessing/pool.py", line 136, in worker
  File "/gxfs_home/cau/sunzm592/anaconda3/envs/pyani/lib/python3.9/multiprocessing/queues.py", line 378, in put
  File "/gxfs_home/cau/sunzm592/anaconda3/envs/pyani/lib/python3.9/multiprocessing/connection.py", line 205, in send_bytes
  File "/gxfs_home/cau/sunzm592/anaconda3/envs/pyani/lib/python3.9/multiprocessing/connection.py", line 416, in _send_bytes
  File "/gxfs_home/cau/sunzm592/anaconda3/envs/pyani/lib/python3.9/multiprocessing/connection.py", line 373, in _send
BrokenPipeError: [Errno 32] Broken pipe
Can you get ANIm to run on a smaller dataset? And do you have enough disk space where the output is being written?
Yes, I tried ANIm just now on a much smaller dataset (n=20), and it worked perfectly. I'm running this command as a batch process on a server, so I guess disk space won't be a problem. Or do you think I should contact IT support and check if there's some kind of limitation?
Looking at your output, it appears you're using SLURM. There may be some issues with how pyani's use of Python multiprocessing interacts with SLURM's resource limits.
Thank you for the tip. I contacted IT support and they told me that, from the Slurm accounting, it seems not enough RAM was available to run this job. So, I extended it to a max of 1400GB and ran ANIm for 5 days as a batch process, but the job didn't finish. You said you've run similar-sized datasets with this tool before; can you please let me know how long they took to complete?
For a set of 1680 genomes, the run came out at around 7d 3.5h.
Thanks for sharing this; I'll try to run again with a longer duration.
So, the job failed after 7 days running on the cluster (a bigmem node, ~1400GB, 32 CPUs), with an out-of-memory error. Do you have any suggestions for how I should proceed?
Thanks in advance for your time!
Which stage was pyani at when it failed?

If it's falling over during the BLAST runs, there's not much we can do to make BLAST more efficient. If it's falling over after the comparison runs are finished, then it may be that the internal data structures holding the compiled outputs are too large to be processed. This compilation step takes place on the node you run pyani from.

The error message you receive would be helpful for diagnosis, if you still have it.

(BTW, we're working to avoid that issue for larger datasets in v0.3 by moving towards asynchronous update of a local database as each comparison ends - we know the current design has this kind of problem, which is a bottleneck for time as well.)
Thanks (again) for the quick and detailed reply. Here's the error output:

Please let me know how I should proceed, and I can then discuss this with IT support.
That looks like it's falling over during the comparison runs, and it looks like a SLURM weirdness to me. I must confess I don't understand it properly yet - and my SysAdmin friends tell me that local configurations do, apparently, make a difference. I've had similar troubles on SLURM before.

If it helps your IT guys diagnose, then what happens is:

1. pyani uses Python's multiprocessing module to run all the pairwise comparisons (nucmer jobs, for ANIm) as subprocesses on the node it was started on.
2. once all the comparisons are complete, the outputs are compiled into matrices on that same node, and the plots are drawn.
I think your runs are failing somewhere in part 1, but I don't know why, I'm afraid. We're looking at a PR for v0.3 that specifically integrates with SLURM, but it's not on the main branch yet. Fingers crossed that might solve your issue when it's ready. I'm sorry you're having these troubles right now, though.
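As a schematic of part 1 (an illustration of the pattern only, not pyani's actual code): a Python multiprocessing pool runs every worker on the node where the parent script started, so SLURM has to grant all the CPUs and RAM on that one node.

    import multiprocessing

    def run_comparison(pair):
        # placeholder for one pairwise nucmer/BLAST comparison
        return pair

    if __name__ == "__main__":
        pairs = [(a, b) for a in range(100) for b in range(100) if a < b]
        # Pool workers are local processes; nothing is farmed out to
        # other nodes in the cluster.
        with multiprocessing.Pool() as pool:
            results = pool.map(run_comparison, pairs)
        print(f"{len(results)} comparisons, all run on this single node")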
I agree it seems like the issue is on the SLURM side of things, probably still to do with the resources being requested versus what is actually needed. Have you tried this?

$ sacct --jobs=your_job-id

I am more familiar with SGE schedulers, but this should output a bunch of information about the job, and if it's similar to SGE's qacct output, it should be useful.

If you post it here, maybe we'll spot something useful; but if you talk to IT, you should definitely share the output from that.
Good idea re: sacct. I'd suggest:

sacct -j <jobid> --units=G --format JobID,MaxVMSize,MaxRSS,NodeList,AllocCPUS,TotalCPU,State,Start,End

where <jobid> is your SLURM job ID.

One of my colleagues had a weird issue recently where they requested 16GB per task and were somehow allocated 500GB. There was a discussion that followed about how to assign memory with SLURM jobs, and some advice came from one of the sysadmins.
I don't know enough about SLURM to advise on good settings here, though.
Thank you both for sharing your thoughts. I had a talk with IT support, and they said this confirms that the calculation is using an extreme amount of memory and was killed by Slurm due to running out of memory. According to them, the real question is why it is using so much memory, and whether this is reasonable even with 1300GB+. They argue the problem is independent of Slurm. They asked me to share the output from the top command and the figure they produced in the PDF file: output_top.txt

Also, they warned me about one process in the list below:

3475335 szrzs212  20   0  510.6g 505.3g  23012 S  0.0 33.5  49:42.26 average_nucleot

This process is continuously accumulating main memory. Maybe that is some kind of bug, or this process is not correctly handled by pyani? Thanks again!
There are a couple of things to comment on here, I think.

The first is that we don't currently support parallelisation with SLURM directly in pyani. What really needs to happen is that each individual pairwise comparison is carried out as a single job in an array of jobs (this is how the SGE scheduler support currently works). I would expect that to keep memory requirements local to each comparison. Because of the lack of SLURM support in pyani v0.2, all the comparisons are instead being managed by Python's multiprocessing module on a single node.

It looks like that task of managing all the necessary comparisons with multiprocessing is what is continuously accumulating memory - that would be the average_nucleotide_identity.py process your sysadmins flagged.

However, we do have a PR (#236) for SLURM support that is not yet tested or integrated into the main codebase. This should enable array jobs using SLURM in the same way we currently do for SGE, and I would expect that to solve your problem. Using this would mean moving up to v0.3, but that version is not yet available through conda.

I appreciate this isn't the positive answer you're looking for to move your analysis on quickly, but I think it's where we are just now. I want to get SLURM support going in the main code quite soon, so hopefully you won't need to wait too long. Many apologies, L.
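To illustrate the job-array idea (a hypothetical sketch; the file layout and nucmer invocation are assumptions for illustration, not pyani's implementation): enumerate the pairwise comparisons into a job list, one command per line, so that scheduler array task N runs line N independently, with only that single comparison's memory footprint.

    from itertools import combinations
    from pathlib import Path

    # One nucmer command per pairwise comparison; an array job of size
    # len(pairs) then runs each line as its own scheduler task.
    genomes = sorted(Path("genomes").glob("*.fasta"))
    with open("joblist.txt", "w") as fh:
        for a, b in combinations(genomes, 2):
            fh.write(f"nucmer --mum -p {a.stem}_vs_{b.stem} {a} {b}\n")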
Hey,
I got an error while running ANIb on a large dataset (>2k bacterial genomes).
I installed pyani via conda (version 0.2.10) and ran this command:
average_nucleotide_identity.py -i ./ -o ANIb_output -g --gformat svg,png -m ANIb --gmethod seaborn
This is the error I got:
WARNING: At least one BLAST run failed. ANIb may fail.
/gxfs_home/cau/sunzm592/anaconda3/envs/pyani/lib/python3.9/site-packages/seaborn/matrix.py:649: UserWarning: Clustering large matrix with scipy. Installing fastcluster may give better performance.
  warnings.warn(msg)
Traceback (most recent call last):
File "/gxfs_home/cau/sunzm592/anaconda3/envs/pyani/bin/average_nucleotide_identity.py", line 977, in
draw(methods[args.method][1], gfmt)
File "/gxfs_home/cau/sunzm592/anaconda3/envs/pyani/bin/average_nucleotide_identity.py", line 809, in draw
pyani_graphics.heatmap_seaborn(
File "/gxfs_home/cau/sunzm592/anaconda3/envs/pyani/lib/python3.9/site-packages/pyani/pyani_graphics.py", line 200, in heatmap_seaborn
fig = get_seaborn_clustermap(dfr, params, title=title)
File "/gxfs_home/cau/sunzm592/anaconda3/envs/pyani/lib/python3.9/site-packages/pyani/pyani_graphics.py", line 144, in get_seaborn_clustermap
fig = sns.clustermap(
File "/gxfs_home/cau/sunzm592/anaconda3/envs/pyani/lib/python3.9/site-packages/seaborn/_decorators.py", line 46, in inner_f
return f(**kwargs)
File "/gxfs_home/cau/sunzm592/anaconda3/envs/pyani/lib/python3.9/site-packages/seaborn/matrix.py", line 1408, in clustermap
return plotter.plot(metric=metric, method=method,
File "/gxfs_home/cau/sunzm592/anaconda3/envs/pyani/lib/python3.9/site-packages/seaborn/matrix.py", line 1221, in plot
self.plot_dendrograms(row_cluster, col_cluster, metric, method,
File "/gxfs_home/cau/sunzm592/anaconda3/envs/pyani/lib/python3.9/site-packages/seaborn/matrix.py", line 1066, in plot_dendrograms
self.dendrogram_row = dendrogram(
File "/gxfs_home/cau/sunzm592/anaconda3/envs/pyani/lib/python3.9/site-packages/seaborn/_decorators.py", line 46, in inner_f
return f(**kwargs)
File "/gxfs_home/cau/sunzm592/anaconda3/envs/pyani/lib/python3.9/site-packages/seaborn/matrix.py", line 774, in dendrogram
plotter = _DendrogramPlotter(data, linkage=linkage, axis=axis,
File "/gxfs_home/cau/sunzm592/anaconda3/envs/pyani/lib/python3.9/site-packages/seaborn/matrix.py", line 584, in init
self.linkage = self.calculated_linkage
File "/gxfs_home/cau/sunzm592/anaconda3/envs/pyani/lib/python3.9/site-packages/seaborn/matrix.py", line 651, in calculated_linkage
return self._calculate_linkage_scipy()
File "/gxfs_home/cau/sunzm592/anaconda3/envs/pyani/lib/python3.9/site-packages/seaborn/matrix.py", line 619, in _calculate_linkage_scipy
linkage = hierarchy.linkage(self.array, method=self.method,
File "/gxfs_home/cau/sunzm592/anaconda3/envs/pyani/lib/python3.9/site-packages/scipy/cluster/hierarchy.py", line 1065, in linkage
raise ValueError("The condensed distance matrix must contain only "
ValueError: The condensed distance matrix must contain only finite values.
Thanks!