"index 0 is out of bounds" ERROR in newref "train_gender_model" #24

rgiannico · 2018-11-29T14:42:30Z

Hi leraman,
I'm getting this weird error while using 'newref' on my 88 training samples:
$ WisecondorX newref *.npz myref.npz --nipt --binsize 50000 --cpus 12

[INFO - 2018-11-29 14:50:00]: Creating new reference
[INFO - 2018-11-29 14:50:00]: Importing data ...
[INFO - 2018-11-29 14:50:00]: Loading: Sample_01.npz
[INFO - 2018-11-29 14:50:00]: Binsize: 5000
[...]
[INFO - 2018-11-29 14:50:18]: Loading: Sample_88.npz
[INFO - 2018-11-29 14:50:18]: Binsize: 5000
Traceback (most recent call last):
  File "/storage/conda/anaconda2/envs/wisecondorx_v1.0.1/bin/WisecondorX", line 11, in <module>
    load_entry_point('WisecondorX==1.0.1', 'console_scripts', 'WisecondorX')()
  File "/storage/conda/anaconda2/envs/wisecondorx_v1.0.1/lib/python2.7/site-packages/wisecondorX/main.py", line 361, in main
    args.func(args)
  File "/storage/conda/anaconda2/envs/wisecondorx_v1.0.1/lib/python2.7/site-packages/wisecondorX/main.py", line 55, in tool_newref
    genders, trained_cutoff = train_gender_model(samples)
  File "/storage/conda/anaconda2/envs/wisecondorx_v1.0.1/lib/python2.7/site-packages/wisecondorX/newref_tools.py", line 48, in train_gender_model
    cut_off = gmm_x[local_min_i][0]
IndexError: index 0 is out of bounds for axis 0 with size 0

For debugging purposes I also added the line logging.info('function train_gender_model sorted_gmm_y: {} local_min_i: {} gmm_x: {}'.format(sorted_gmm_y, local_min_i, gmm_x)) to newref_tools.py, and it printed this:

[INFO - 2018-11-29 14:50:20]: function train_gender_model sorted_gmm_y: [1.83156350e-16 2.15748202e-16 2.54060382e-16 ... 0.00000000e+00
 0.00000000e+00 0.00000000e+00] local_min_i: (array([], dtype=int64),) gmm_x: [0.00000000e+00 4.00080016e-06 8.00160032e-06 ... 1.99919984e-02
 1.99959992e-02 2.00000000e-02]

Do you have any idea what is going on?
If you need more debugging output, just tell me :)

leraman · 2018-11-29T16:33:07Z

Hi @rgiannico,

This function fits a two-component Gaussian mixture model to the Y-read fraction, which is used to separate male from female fetuses.

I'm not quite sure what's going on yet, but we could try two things: you could share your .npz files, and I'll push an update to make WisecondorX robust against this; or, if that's not possible, would you mind uncommenting the 'plotting code' and re-running the software locally? It should yield an image like this: (could you share this image?)

rgiannico · 2018-11-30T09:46:26Z

Thank you @leraman ,
I actually had to add 3 lines to the plotting script to fix a couple of errors (I report them here with comments, in case you are interested):

    import matplotlib           # I added this and the following line
    matplotlib.use('Agg')       # Needed to fix the matplotlib error discussed here: https://stackoverflow.com/questions/37604289/tkinter-tclerror-no-display-name-and-no-display-environment-variable
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.hist(y_fractions, bins=50, normed=True)
    ax.plot(gmm_x, gmm_y, 'r-', label='Gaussian mixture fit')
    ax.set_xlim([0.001, 0.01])
    ax.legend(loc='best')
    plt.savefig('gender_model_gaussian.png')    # Added because I'm on a server without X11
    plt.show()

And this is my Gaussian:

I suppose it could be related to the fact that only 12 of my 88 NIPT samples are male fetuses.
I can ask permission to send you my 88 npz files, but, just an idea: don't you think a better solution would be to let the user define the fetal sex of each training sample in a metadata file instead of guessing it?
Do you think this is possible, or do you need my npz files?

Thank you :)

leraman · 2018-11-30T09:57:10Z

Hi @rgiannico

Indeed, you probably have to add more male cases. The problem is that WisecondorX looks for a local minimum in the bimodal distribution, and there is none. I didn't realize this was possible. I'll look into it. In the meantime, maybe these parameters will work, at line 28:
gmm = GaussianMixture(n_components=2, covariance_type='full', reg_covar=1e-99, max_iter=10000, tol=1e-99)

Anyway, as the manual states, it's always a good idea to try to include more or less the same amount of males as females. Nevertheless, manual gender assignment during reference creation could indeed be a solution, I'll think about it.
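
To illustrate the missing local minimum, here is a minimal pure-Python sketch. It is not WisecondorX's code: the function names and the mixture parameters are made up for illustration, and only the grid (5000 points from 0 to 0.02) mirrors the gmm_x values in the debug output above. It evaluates a two-component Gaussian mixture density on that grid and looks for an interior point that is lower than both of its neighbours. When the two components overlap heavily, the density has a single peak and the search comes back empty, which is the situation behind the IndexError.

```python
import math


def mixture_pdf(x, weights, means, sds):
    """Density of a Gaussian mixture at x."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        for w, m, s in zip(weights, means, sds)
    )


def find_cutoff(weights, means, sds, lo=0.0, hi=0.02, n=5000):
    """Return the first interior local minimum of the density, or None."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    ys = [mixture_pdf(x, weights, means, sds) for x in xs]
    minima = [
        xs[i]
        for i in range(1, n - 1)
        if ys[i] < ys[i - 1] and ys[i] < ys[i + 1]
    ]
    # Guard against the empty case instead of indexing blindly.
    return minima[0] if minima else None


# Hypothetical parameters: two well-separated groups of equal size.
balanced = find_cutoff((0.5, 0.5), (0.002, 0.006), (0.0005, 0.0005))
# Hypothetical parameters: a small second group that merges into the first.
merged = find_cutoff((0.86, 0.14), (0.002, 0.003), (0.001, 0.001))

print(balanced)  # a value close to 0.004
print(merged)    # None
```

Returning None (or raising a clear error) instead of indexing an empty result makes this failure mode explicit to the caller.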

rgiannico · 2018-11-30T11:03:38Z

Ok thank you @leraman ,
Great! It's working now, there is a local minimum and the reference has been produced!

Two more questions, to better understand how important this gender bias in the training samples is:

1. Do you think I can use this "max_iter=10000, tol=1e-99" reference for the prediction step without further code modifications, or could it lead to wrong predictions on test samples (I suppose mostly for fetal sex prediction)?
2. I had also planned some more female fetus training samples; they are already extracted and ready for sequencing. Do you suggest proceeding with sequencing and just using the new parameters for reference creation to fix the bias? Or do you strongly suggest NOT adding more female fetuses to the training pool, to avoid feeding the gender bias?

Thank you

leraman · 2018-11-30T11:31:45Z

You can, but don't forget the reg_covar=1e-99. This won't lead to any 'wrong' predictions. The gender prediction is only used for the reference creation (for NIPT anyway): for the autosomal reference, all samples are used, however, only females are used for the gonosomal reference. For you, this only implies that fewer female samples will be used for the gonosomal reference than what's actually present in your set.
Well, the more reference samples the better, I guess, but I would opt for male fetuses: we noted that normalization performance generally (and slightly) improved when using both female and male fetuses compared to using only female or only male ones.
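
As a rough illustration of that rule (in NIPT mode all samples feed the autosomal reference, while only predicted females feed the gonosomal one), here is a small sketch. The function name and sample labels are hypothetical, and this is not how WisecondorX structures its code.

```python
def split_reference_samples(genders):
    """Turn a {sample name: 'F' or 'M'} mapping into the two sample lists."""
    autosomal = sorted(genders)  # every sample is used
    gonosomal = sorted(s for s, g in genders.items() if g == "F")  # females only
    return autosomal, gonosomal


# Hypothetical gender calls for three reference samples.
genders = {"Sample_01": "F", "Sample_02": "M", "Sample_03": "F"}
autosomal, gonosomal = split_reference_samples(genders)
print(len(autosomal), len(gonosomal))  # 3 2
```

Under this rule a female sample that is mislabelled as male simply drops out of the gonosomal list, which is why a shifted cutoff shrinks that reference rather than corrupting predictions.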

Good luck!

chantisakee · 2019-09-11T03:47:31Z

Hi, I have the same error as you.
Could you please tell me what you finally did to fix the code?

Thanks in advance :)

leraman · 2019-09-11T07:55:28Z

Hi @chantisakee

Which version are you using? How many samples are in your reference? Did you include both male and female fetuses?

rgiannico · 2019-09-11T08:13:29Z

Hi @chantisakee ,
In the current version you should not get this error, because leraman added the line gmm = GaussianMixture(n_components=2, covariance_type='full', reg_covar=1e-99, max_iter=10000, tol=1e-99) to the wisecondorX/newref_tools.py script to make it more 'stringent' in discerning male from female fetuses.

I had this error because my training samples had an unbalanced fetal sex ratio (too many female fetuses compared to male ones, or vice versa).
I suggest you use the latest WisecondorX version and make sure your training samples have a more balanced fetal sex distribution.

( p.s: nice dog though :P ^^ )

chantisakee · 2019-09-11T08:34:22Z

I'm quite sure that I'm using the latest version, and I already found that newref.py was modified as you describe. But I still get the same error as in this issue.
Unfortunately, I have only 10 healthy samples for reference set creation. Is that enough for using WisecondorX?
My purpose is finding copy number variation, not NIPT, so my input samples are human WGS, not maternal cfDNA (I mean from a pregnant woman), and I'm not quite sure about the gender of my input data.

leraman · 2019-09-11T08:40:13Z

Hi @chantisakee

I believe 10 samples might be too few for the Gaussian mixture model to work reliably. I'll implement a workaround so you can make a reference anyway.

chantisakee · 2019-09-11T08:43:27Z

Yeah, I found that the version I downloaded was already modified ;w; but I still got the error. I'm not quite sure whether it comes from my input data or not. Unfortunately, I have only 10 healthy samples for reference creation, and the gender of my input is also missing. My goal with this software is finding CNVs in human WGS.
Btw, thanks for your answer ^-^ and for the compliment on my dog lol :)

chantisakee · 2019-09-11T08:44:12Z

Thank you very much for your help :)

leraman · 2019-09-11T12:17:02Z

Hi @chantisakee

I've updated WisecondorX. You can download the latest version using

pip install -U git+https://github.com/CenterForMedicalGeneticsGhent/WisecondorX

During reference creation, you can now manually set the chromosome Y fraction cutoff using --yfrac, which overrides the Gaussian mixture modeling. I'm guessing (not sure) you only have female samples, so I would try --yfrac 1.
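
Here is a sketch of what a fixed cutoff amounts to, under the assumption (mine, not confirmed in this thread) that samples with a Y fraction below the cutoff are labelled female. The function name and the example fractions are hypothetical. With a cutoff of 1 every sample ends up female, since a read fraction cannot exceed 1, which fits the suggestion above for an all-female set.

```python
def assign_gender(y_fraction, cutoff):
    # Assumption: below the cutoff means female, otherwise male.
    return "F" if y_fraction < cutoff else "M"


# Hypothetical Y-read fractions for three samples.
y_fractions = [0.0004, 0.0006, 0.0031]

with_cutoff = [assign_gender(y, 0.002) for y in y_fractions]
all_female = [assign_gender(y, 1) for y in y_fractions]

print(with_cutoff)  # ['F', 'F', 'M']
print(all_female)   # ['F', 'F', 'F']
```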

chantisakee · 2019-09-11T15:04:53Z

Hi @leraman,
Thanks so much for your help :}. The reference creation step now works fine,
but there are still some problems in the prediction step, and I got this error message:

[INFO - 2019-09-11 21:48:50]: Starting CNA prediction
[INFO - 2019-09-11 21:48:50]: Importing data ...
[INFO - 2019-09-11 21:48:51]: Normalizing autosomes ...
[INFO - 2019-09-11 21:50:05]: Normalizing gonosomes ...
[WARNING - 2019-09-11 21:51:16]: Non-numeric values found in weights -- reference too small. Circular binary segmentation and z-scoring will be unweighted
[INFO - 2019-09-11 21:51:16]: Executing circular binary segmentation ...
Error in parse_con(txt, bigint_as_char) :
lexical error: malformed number, a digit is required after the minus sign.
gender": "F", "results_r": [[-Infinity, -Infinity, -Infinity
(right here) ------^
Calls: read_json ... parse_json -> parse_and_simplify -> parseJSON -> parse_con
Execution halted
[CRITICAL - 2019-09-11 21:51:17]: Rscript failed: Command '['Rscript', '/tarafs/biobank/data/modules/.local/easybuild/software/Miniconda3/4.4.10/envs/wisecondorX/lib/python2.7/site-packages/wisecondorX/include/CBS.R', '--infile', '/tarafs/biobank/bio0001-human/NIPT/WISECONDORX/Thalassemia/DownSampling/script/HS06006_CBS_tmp_01.json']' returned non-zero exit status 1

What should I do?

Thanks,
Chantisa

PS: Do I have to create a new topic?

leraman · 2019-09-12T07:35:12Z

Can you take a look at your .npz files? Are you sure they are not empty? Which reference genome did you use during mapping?
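
For context on the R error above: `-Infinity` is what Python's standard json module writes for float('-inf') by default, and strict JSON parsers reject it. The sketch below (Python 3, not taken from WisecondorX) reproduces that behaviour and shows one way to spot non-finite values before they are serialized. The helper name and the sample values are hypothetical; only the "results_r" key is copied from the error message.

```python
import json
import math


def find_non_finite(rows):
    """Return (row, column) positions of NaN or infinite values."""
    return [
        (r, c)
        for r, row in enumerate(rows)
        for c, value in enumerate(row)
        if not math.isfinite(value)
    ]


# Hypothetical values; a real run would hold per-bin ratios here.
results = [[float("-inf"), 0.12], [0.05, float("nan")]]

# The default settings emit the non-standard literals -Infinity and NaN.
payload = json.dumps({"results_r": results})

bad_positions = find_non_finite(results)

# allow_nan=False makes the problem surface on the Python side instead.
try:
    json.dumps({"results_r": results}, allow_nan=False)
    strict_ok = True
except ValueError:
    strict_ok = False

print(payload)
print(bad_positions)  # [(0, 0), (1, 1)]
print(strict_ok)      # False
```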

chantisakee · 2019-09-23T04:00:22Z

Hi leraman, sorry for the late answer.
Yes, you are right! I had prepared the test samples incorrectly, and now it works fine :).

Anyway, I was just wondering what the minimum bin size is for copy number variation prediction with WisecondorX. Can I go down to 2000 bp?

leraman · 2019-09-23T08:42:38Z

It depends on your sequencing depth. WisecondorX was developed for 15 kb and up, but if your coverage is >1x you might get good results at 2000 bp. Running time will increase, though.
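
For a feel of why depth matters, here is a back-of-the-envelope sketch: the expected number of reads starting in a bin is roughly coverage times bin size divided by read length. The 150 bp read length is an assumption for illustration; it is not stated in this thread.

```python
def expected_reads_per_bin(coverage, bin_size, read_length=150):
    """Rough expected read count per bin; read_length is an assumed value."""
    return coverage * bin_size / read_length


small_bins = expected_reads_per_bin(1.0, 2000)
large_bins = expected_reads_per_bin(1.0, 15000)

print(round(small_bins, 1))  # 13.3
print(round(large_bins, 1))  # 100.0
```

Fewer reads per bin means noisier per-bin counts, so smaller bins need proportionally deeper sequencing to reach the same signal quality.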

chantisakee · 2019-09-25T02:30:44Z

Thanks for your suggestion, @leraman.

So I've tried setting the bin size to 2000 bp during reference set creation. After that I ran the CNA prediction step, and this is what I got:

[INFO - 2019-09-23 10:46:59]: Starting CNA prediction
[INFO - 2019-09-23 10:46:59]: Importing data ...
Traceback (most recent call last):
  File "/tarafs/biobank/data/modules/.local/easybuild/software/Miniconda3/4.4.10/envs/wisecondorX/bin/WisecondorX", line 12, in <module>
    sys.exit(main())
  File "/tarafs/biobank/data/modules/.local/easybuild/software/Miniconda3/4.4.10/envs/wisecondorX/lib/python2.7/site-packages/wisecondorX/main.py", line 400, in main
    args.func(args)
  File "/tarafs/biobank/data/modules/.local/easybuild/software/Miniconda3/4.4.10/envs/wisecondorX/lib/python2.7/site-packages/wisecondorX/main.py", line 155, in tool_test
    if not ref_file['is_nipt']:
  File "/tarafs/biobank/data/modules/.local/easybuild/software/Miniconda3/4.4.10/envs/wisecondorX/lib/python2.7/site-packages/numpy/lib/npyio.py", line 262, in __getitem__
    raise KeyError("%s is not a file in the archive" % key)
KeyError: 'is_nipt is not a file in the archive'

rgiannico changed the title ~~train_gender_model index 0 is out of bounds error~~ "index 0 is out of bounds" ERROR in newref "train_gender_model" Nov 29, 2018

leraman mentioned this issue Dec 7, 2018

Updated params. Ignore warnings. #25

Merged

leraman closed this as completed Jan 16, 2019

leraman mentioned this issue Sep 11, 2019

Yfrac #48

Merged

leraman mentioned this issue Jan 5, 2021

Added --plotyfrac option and automatic CPA calculation #67

Merged
