poppunk --threshold raises error #106

Closed
SilasK opened this issue Oct 2, 2020 · 17 comments

@SilasK

SilasK commented Oct 2, 2020

I got the following error with PopPUNK 2.1.1:


(poppunk) [kiesers@login2 Pangenome]$ poppunk --threshold 0.0205 --distances Am/Am.dists --output Am --ref-db Am --full-db --ignore-length --max-a-dis 1

Graph-tools OpenMP parallelisation enabled: with 1 threads
PopPUNK (POPulation Partitioning Using Nucleotide Kmers)
        (with backend: sketchlib v1.4.0
         sketchlib: /home/kiesers/scratch/miniconda3/envs/poppunk/lib/python3.8/site-packages/pp_sketchlib.cpython-38-x86_64-linux-gnu.so)
Mode: Applying a core distance threshold

Traceback (most recent call last):
  File "/home/kiesers/scratch/miniconda3/envs/poppunk/bin/poppunk", line 10, in <module>
    sys.exit(main())
  File "/home/kiesers/scratch/miniconda3/envs/poppunk/lib/python3.8/site-packages/PopPUNK/__main__.py", line 438, in main
    assignments = new_model.apply_threshold(distMat, args.threshold)
  File "/home/kiesers/scratch/miniconda3/envs/poppunk/lib/python3.8/site-packages/PopPUNK/models.py", line 653, in apply_threshold
    y = self.assign(X)
  File "/home/kiesers/scratch/miniconda3/envs/poppunk/lib/python3.8/site-packages/PopPUNK/models.py", line 745, in assign
    y = pp_sketchlib.assignThreshold(X/self.scale, 0, self.core_boundary, 0, cpus)
TypeError: assignThreshold(): incompatible function arguments. The following argument types are supported:
    1. (distMat: numpy.ndarray[float32[m, n], flags.writeable, flags.c_contiguous], slope: int, x_max: float, y_max: float, num_threads: int = 1) -> numpy.ndarray[float32[m, 1]]

Invoked with: array([[0.00373481, 0.37379527],
       [0.01729033, 0.49502736],
       [0.01865711, 0.50816035],
       ...,
       [0.02465274, 0.27738789],
       [0.02404468, 0.30117333],
       [0.01657796, 0.25241569]]), 0, 0.0205, 0, 1

The database was built with

poppunk --easy-run --r-files ref_genome_list.tsv --output Am --threads 8 --plot-fit 5 --min-k 13 --full-db --ignore-length --max-a-dis 1

@johnlees
Member

johnlees commented Oct 2, 2020

Could you first try upgrading to PopPUNK v2.2.0 (with sketchlib v1.5.1) and rerunning? If you still get the error, I can take a look as it's probably an interface bug that needs fixing

@SilasK
Author

SilasK commented Oct 2, 2020

Now with the updated version(s).

(poppunk) [kiesers@login2 Pangenome]$ poppunk --threshold 0.0205 --distances Am/Am.dists --output Am --ref-db Am --full-db  --max-a-dist 1

Graph-tools OpenMP parallelisation enabled: with 1 threads
PopPUNK (POPulation Partitioning Using Nucleotide Kmers)
        (with backend: sketchlib v1.5.1
         sketchlib: /home/kiesers/scratch/miniconda3/envs/poppunk/lib/python3.8/site-packages/pp_sketchlib.cpython-38-x86_64-linux-gnu.so)
Mode: Applying a core distance threshold

Illegal instruction

@johnlees
Member

johnlees commented Oct 2, 2020

Hmm, that's a little odd. I wonder if the build uses a vectorised instruction your CPU doesn't support. What is the result if you run cat /proc/cpuinfo? (assuming you're on Linux)

@SilasK
Author

SilasK commented Oct 2, 2020

I get this info for all processors


processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 45
model name      : Genuine Intel(R) CPU  @ 2.60GHz
stepping        : 5
microcode       : 0x513
cpu MHz         : 3298.876
cache size      : 20480 KB
physical id     : 0
siblings        : 16
core id         : 0
cpu cores       : 8
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts
bogomips        : 5187.04
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

@johnlees
Member

johnlees commented Oct 2, 2020

Thanks for that - I can't see any obvious missing instructions.

I will have a look into this and see if I can replicate/debug, and get back to you soon.
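
As a rough aid for this kind of check, the sketch below parses the flags line of /proc/cpuinfo text and reports which entries of a wanted list are absent. The wanted list (avx, avx2, fma) is purely illustrative and is an assumption: nothing in this thread establishes which instruction sets the sketchlib binary actually uses.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, wanted):
    """Return the wanted flags that the CPU does not advertise, sorted by name."""
    return sorted(set(wanted) - parse_cpu_flags(cpuinfo_text))


# Shortened sample in the same format as /proc/cpuinfo (not the full output above)
sample = "processor : 0\nflags : fpu sse sse2 ssse3 sse4_1 sse4_2 popcnt avx\n"
print(missing_flags(sample, ["avx", "avx2", "fma"]))  # ['avx2', 'fma']
```

On a Linux machine the same function can be fed `open("/proc/cpuinfo").read()`. Applied to the flags line posted above, it would report avx2 and fma as absent while avx is present; whether that is related to the crash is not established here.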

@johnlees
Member

johnlees commented Oct 2, 2020

I replicated your first error - it was due to mismatched data types in numpy (float vs double). I've got a fix on the master branch of PopPUNK which is now working for me (e95a720). You can clone that now and run with python poppunk-runner.py, or I will make a release through bioconda in the next week or two.

Let me know if you run into the illegal instruction issue again by reopening the issue.
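
To illustrate the float vs double mismatch, here is a minimal sketch using only NumPy. It is not the code from the fix commit; the helper name `as_float32_c_contiguous` is hypothetical. It only shows how a default float64 array can be converted to the writeable, C-contiguous float32 layout that the signature in the TypeError asks for.

```python
import numpy as np


def as_float32_c_contiguous(dist_mat):
    """Return dist_mat as a writeable, C-contiguous float32 array."""
    arr = np.ascontiguousarray(dist_mat, dtype=np.float32)
    if not arr.flags.writeable:
        # Read-only inputs need a copy before a writeable buffer can be handed on
        arr = arr.copy()
    return arr


# NumPy creates float64 ("double") arrays from Python floats by default
X = np.array([[0.00373481, 0.37379527],
              [0.01729033, 0.49502736]])
X32 = as_float32_c_contiguous(X)
print(X.dtype, "->", X32.dtype)
```

Checking `X.dtype` and `X.flags` before a call into a compiled extension is a quick way to see whether an "incompatible function arguments" error comes from the array layout rather than from the values.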

@johnlees johnlees closed this as completed Oct 2, 2020
@SilasK
Author

SilasK commented Oct 5, 2020

I cloned the repo, installed it with python setup.py install, and still get the same error (now with v2.2.1).

@johnlees
Member

johnlees commented Oct 5, 2020 via email

@SilasK
Author

SilasK commented Oct 5, 2020 via email

@johnlees
Member

johnlees commented Oct 5, 2020

Ah ok. Would you be able to attach your data so I can try and replicate? The contents of the Am/ directory should be sufficient. Could you also confirm the commit hash of PopPUNK you are using (git log), the versions of both pp-sketchlib and PopPUNK, and the return from conda list?
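
For gathering the version information requested above, a small sketch using the standard library's importlib.metadata may help. The distribution names passed in are assumptions (they may differ from what conda list or pip list shows), and this does not replace the git log or conda list output.

```python
from importlib import metadata


def collect_versions(dist_names):
    """Map each distribution name to its installed version, or 'not installed'."""
    versions = {}
    for name in dist_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


# Distribution names below are assumptions; adjust them to match your environment
for name, version in collect_versions(["poppunk", "pp-sketchlib", "numpy"]).items():
    print(f"{name}: {version}")
```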

@johnlees johnlees reopened this Oct 5, 2020
@SilasK
Author

SilasK commented Oct 6, 2020

OK, I probably didn't install PopPUNK correctly from the git repository. Having done so, I now get 'only' the illegal instruction error. You said this might be due to my processor?

I now have PopPUNK 2.2.1:

Graph-tools OpenMP parallelisation enabled: with 1 threads
PopPUNK (POPulation Partitioning Using Nucleotide Kmers)
(with backend: sketchlib v1.5.1

@SilasK
Author

SilasK commented Oct 6, 2020

Worse, I also get the illegal instruction error when just generating the database 🤔

@johnlees
Member

johnlees commented Oct 6, 2020

It could be a problem with the compiler generating code not supported by your CPU (but, from what you posted above, I can't see what that might be). It may also be another bug which is mimicking this behaviour, but it's hard to know without going in the debugger. Unfortunately this all works on my machine, and the testing server, so I'm not sure how to reproduce the error myself.

Can I suggest three things:

  1. Try installing pp-sketchlib from source. This will compile for your CPU target directly.
  2. Run again, but put valgrind before the command, and post the output back here.
  3. Generate a core dump and post it here. First run ulimit -c unlimited. On a crash, a core dump will be written; run /sbin/sysctl kernel.core_pattern to find where the file goes.

For 1, you will want to first install the necessary conda packages:

conda install cmake pybind11 hdf5 highfive Eigen armadillo

Then, as before, clone the github and run:

python setup.py install
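
Suggestion 3 can also be scripted. The sketch below is an assumption-laden illustration, not part of PopPUNK: the helper `run_with_core_dumps` is hypothetical. It raises the core-file size limit for a child process (similar in spirit to ulimit -c unlimited, but capped at the hard limit) and reports the name of the signal that killed the child, if any. It relies on POSIX-only standard library modules.

```python
import resource
import signal
import subprocess
import sys


def _raise_core_limit():
    # Runs in the child just before exec: lift the soft core limit to the hard limit
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))


def run_with_core_dumps(cmd):
    """Run cmd; return the name of the signal that killed it, or None otherwise."""
    proc = subprocess.run(cmd, preexec_fn=_raise_core_limit)
    if proc.returncode < 0:
        # On POSIX, a negative return code -N means the child died from signal N
        return signal.Signals(-proc.returncode).name
    return None
```

For example, `run_with_core_dumps(["poppunk", "--version"])` would return "SIGILL" if the process dies with an illegal instruction. Separately, running Python with `-X faulthandler` makes the interpreter print the Python-level traceback when it receives SIGILL, which can show which call triggered the crash.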

@johnlees
Member

johnlees commented Oct 6, 2020

(as a note to self: could this be caused by use of simd in the openmp pragma here: https://github.com/johnlees/pp-sketchlib/blob/master/src/dist/matrix_ops.cpp#L129)

@johnlees
Member

I am now getting the same issue on some azure builds, e.g. https://dev.azure.com/jlees/PopPUNK/_build/results?buildId=382&view=results

I thought it might be graph-tool, but the error above suggests that is working OK. To do: make a debug build on Azure and get a backtrace. This could be a segfault or misuse of BLAS. Also run cat /proc/cpuinfo in the job.

@johnlees
Member

This no longer seems to be an issue in the Azure builds. I got this error locally when building the docs, and the culprit was graph_tool.

@johnlees
Member

I think this is resolved, but please reopen if it is observed again.
