Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix to ensure dataframe indices still match after adjustments #551

Merged
merged 1 commit into from
Dec 8, 2020

Conversation

johnegarza
Copy link
Contributor

This PR will hopefully solve, or at least further the conversation on how to solve, the bug reported in #436 and possibly #547. For reference, my command, run within the etal/cnvkit:0.9.7 docker container, was /usr/bin/python3 /code/cnvkit.py batch /inputs/1455918947/tumor.bam --normal /inputs/29426665/normal.bam --targets /inputs/876494580/hla_and_brca_genes_bait.interval_list --method hybrid. I am happy to provide these input files if you'd like. The resulting error trace:

Traceback (most recent call last):
  File "/code/cnvkit.py", line 9, in <module>
    args.func(args)
  File "/code/cnvlib/commands.py", line 138, in _cmd_batch
    pool.submit(batch.batch_run_sample,
  File "/code/cnvlib/parallel.py", line 19, in submit
    return SerialFuture(func(*args))
  File "/code/cnvlib/batch.py", line 186, in batch_run_sample
    segments = segmentation.do_segmentation(cnarr, segment_method,
  File "/code/cnvlib/segmentation/__init__.py", line 61, in do_segmentation
    rets = list(pool.map(_ds, ((ca, method, threshold, variants,
  File "/code/cnvlib/segmentation/__init__.py", line 89, in _ds
    return _do_segmentation(*args)
  File "/code/cnvlib/segmentation/__init__.py", line 174, in _do_segmentation
    seg_out = core.call_quiet(rscript_path,
  File "/code/cnvlib/core.py", line 31, in call_quiet
    raise RuntimeError("Subprocess command failed:\n$ %s\n\n%s"
RuntimeError: Subprocess command failed:
$ Rscript --no-restore --no-environ /tmp/tmp6alxd4pm

b'Loading probe coverages into a data frame\nWarning message:\nIn CNA(cbind(tbl$log2), tbl$chromosome, tbl$start, data.type = "logratio",  :\n  markers with missing chrom and/or maploc removed\n\nSegmenting the probe data\nError in segment(cna, weights = tbl$weight, alpha = 1e-04) : \n  length of weights should be the same as the number of probes\nExecution halted\n'

The issue begins when a mask is applied to drop poor quality bins in cnvlib/fix.py. In my case, examination of the dataframes after masking showed that their indices were no longer a continuous series from 0, 1, 2, ... x, but instead had gaps. While there are gaps, at this point in the code, both cnarr and ref_matched still have matching indices. However, a later section of the code may pass cnarr to the function center_by_window and set its value to the function's return value. Within center_by_window, the index of cnarr is reset. No such reset is applied to ref_matched. This means that when the log2 column of ref_matched is subtracted from the log2 column of cnarr, there are "gaps" in ref_matched relative to cnarr, and the values are set to NaN in cnarr. Modified code with output to demonstrate:

Code

    logging.info('cnarr log2:')
    logging.info(cnarr.data['log2'])
    logging.info('ref matched log2:')
    logging.info(ref_matched['log2'])
    
    cnarr.data['log2'] -= ref_matched[log2_key]

    logging.info('cnarr, but with gaps:')
    logging.info(cnarr.data['log2'])

Logs

cnarr log2:
0     0.838088
1    -1.612942
2     0.729092
3     0.092408
...
ref matched log2:
0     0.252140
2     0.000000
3     0.000000
...
cnarr, but with gaps:
0     0.585948
1          NaN
2     0.729092
3     0.092408

Thus, NaN values are introduced to the log2 column of cnarr, resulting in the Rscript error shown above. This PR adds a flag and conditional that resets the index of ref_matched if the index of cnarr has been reset by a call to center_by_window. This should be safe, since center_by_window preserves the original order of the rows, and I believe sufficient to solve the problem; however, there may be some things I missed, and I'd appreciate your thoughts.

@etal etal merged commit a8564c0 into etal:master Dec 8, 2020
@etal
Copy link
Owner

etal commented Dec 8, 2020

Thanks so much for triaging this issue and submitting a fix! Your description makes sense and the code looks good to me, so I've merged it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants