Error in `ggplot2::geom_bar()` #191

Open
pfeutry opened this issue Sep 13, 2024 · 8 comments

@pfeutry

pfeutry commented Sep 13, 2024

I was running `filter_rad()` on a new dataset and got the following error:

Filter genotyping threshold: 0.2
Number of individuals / strata / chrom / locus / SNP:
Before: 658 / 3 / 1 / 5478 / 5478
Blacklisted: 0 / 0 / 0 / 32 / 32
After: 658 / 3 / 1 / 5446 / 5446

Computation time, overall: 18 sec
######################### completed filter_genotyping ##########################
################################################################################
###################### radiator::filter_snp_position_read ######################
################################################################################
Execution date@time: 20240913@1606
Function call and arguments stored in: [email protected]
2 steps to visualize and filter the data based on the number of SNP on the read/locus:
Step 1. Visualization (boxplot, distribution
Step 2. Threshold selection
Generating SNP position on read stats
Saving 17.5 x 10 cm image
Error in ggplot2::geom_bar():
! Problem while computing aesthetics.
ℹ Error occurred in the 1st layer.
Caused by error in check_aesthetics():
! Aesthetics must be either length 1 or the same as the data
(5446).
✖ Fix the following mappings: x.
Run rlang::last_trace() to see where the error occurred.
Warning messages:
1: In ggplot2::scale_y_log10(labels = scales::number_format(), oob = scales::squish_infinite) :
log-10 transformation introduced infinite values.
2: There was 1 warning in dplyr::mutate().
ℹ In argument: WHITELISTED_MARKERS = purrr::map_int(...).
Caused by warning:
! Using one column matrices in filter() was deprecated in dplyr
1.1.0.
ℹ Please use one dimensional logical vectors instead.
ℹ The deprecated feature was likely used in the radiator package.
Please report the issue at
https://github.com/thierrygosselin/radiator/issues.
This warning is displayed once every 8 hours.
Call lifecycle::last_lifecycle_warnings() to see where this warning
was generated.

Command line used was:

data <- radiator::filter_rad(data="Report_DWs24-9586_Counts.csv",
strata = "Whale_Shark_Strata.txt" )

devtools::session_info()
─ Session info ─────────────────────────────────────────────────────
setting value
version R version 4.3.2 (2023-10-31)
os Ubuntu 22.04.4 LTS
system x86_64, linux-gnu
ui RStudio
language (EN)
collate en_AU.UTF-8
ctype en_AU.UTF-8
tz Australia/Canberra
date 2024-09-13
rstudio 2023.06.1+524.pro1 Mountain Hydrangea (server)
pandoc 2.9.2.1 @ /usr/bin/pandoc

─ Packages ─────────────────────────────────────────────────────────
package * version date (UTC) lib source
ade4 1.7-22 2023-02-06 [2] CRAN (R 4.3.2)
adegenet 2.1.10 2023-01-26 [2] CRAN (R 4.3.2)
ape 5.7-1 2023-03-13 [2] CRAN (R 4.3.2)
backports 1.5.0 2024-05-23 [1] CRAN (R 4.3.2)
BiocGenerics 0.48.1 2023-11-01 [2] Bioconductor
BiocManager 1.30.25 2024-08-28 [1] CRAN (R 4.3.2)
Biostrings 2.70.1 2023-10-25 [2] Bioconductor
bit 4.0.5 2022-11-15 [2] CRAN (R 4.3.2)
bit64 4.0.5 2020-08-30 [2] CRAN (R 4.3.2)
bitops 1.0-8 2024-07-29 [1] CRAN (R 4.3.2)
boot 1.3-28.1 2022-11-22 [2] CRAN (R 4.3.2)
broom 1.0.6 2024-05-17 [1] CRAN (R 4.3.2)
cachem 1.0.8 2023-05-01 [2] CRAN (R 4.3.2)
callr 3.7.3 2022-11-02 [2] CRAN (R 4.3.2)
cli 3.6.3 2024-06-21 [1] CRAN (R 4.3.2)
cluster 2.1.4 2022-08-22 [2] CRAN (R 4.3.2)
codetools 0.2-19 2023-02-01 [2] CRAN (R 4.3.2)
colorspace 2.1-1 2024-07-26 [1] CRAN (R 4.3.2)
crayon 1.5.3 2024-06-20 [1] CRAN (R 4.3.2)
data.table 1.16.0 2024-08-27 [1] CRAN (R 4.3.2)
devtools 2.4.5 2022-10-11 [2] CRAN (R 4.3.2)
digest 0.6.37 2024-08-19 [1] CRAN (R 4.3.2)
dplyr 1.1.4 2023-11-17 [1] CRAN (R 4.3.2)
ellipsis 0.3.2 2021-04-29 [2] CRAN (R 4.3.2)
fansi 1.0.6 2023-12-08 [1] CRAN (R 4.3.2)
farver 2.1.2 2024-05-13 [1] CRAN (R 4.3.2)
fastmap 1.1.1 2023-02-24 [2] CRAN (R 4.3.2)
foreach 1.5.2 2022-02-02 [2] CRAN (R 4.3.2)
fs 1.6.3 2023-07-20 [2] CRAN (R 4.3.2)
fst 0.9.8 2022-02-08 [1] CRAN (R 4.3.2)
fstcore * 0.9.18 2023-12-02 [1] CRAN (R 4.3.2)
gdsfmt 1.38.0 2023-10-24 [1] Bioconductor
generics 0.1.3 2022-07-05 [2] CRAN (R 4.3.2)
GenomeInfoDb 1.38.1 2023-11-08 [2] Bioconductor
GenomeInfoDbData 1.2.11 2023-11-10 [2] Bioconductor
GenomicRanges 1.54.1 2023-10-29 [2] Bioconductor
ggplot2 3.5.1 2024-04-23 [1] CRAN (R 4.3.2)
glmnet 4.1-8 2023-08-22 [2] CRAN (R 4.3.2)
glue 1.7.0 2024-01-09 [1] CRAN (R 4.3.2)
gridExtra 2.3 2017-09-09 [2] CRAN (R 4.3.2)
gtable 0.3.5 2024-04-22 [1] CRAN (R 4.3.2)
HardyWeinberg 1.7.8 2024-04-06 [1] CRAN (R 4.3.2)
hms 1.1.3 2023-03-21 [2] CRAN (R 4.3.2)
htmltools 0.5.7 2023-11-03 [2] CRAN (R 4.3.2)
htmlwidgets 1.6.2 2023-03-17 [2] CRAN (R 4.3.2)
httpuv 1.6.12 2023-10-23 [2] CRAN (R 4.3.2)
igraph 2.0.3 2024-03-13 [2] CRAN (R 4.3.2)
IRanges 2.36.0 2023-10-24 [2] Bioconductor
iterators 1.0.14 2022-02-05 [2] CRAN (R 4.3.2)
jomo 2.7-6 2023-04-15 [1] CRAN (R 4.3.2)
labeling 0.4.3 2023-08-29 [2] CRAN (R 4.3.2)
later 1.3.1 2023-05-02 [2] CRAN (R 4.3.2)
lattice 0.22-5 2023-10-24 [2] CRAN (R 4.3.2)
lifecycle 1.0.4 2023-11-07 [2] CRAN (R 4.3.2)
lme4 1.1-35.5 2024-07-03 [1] CRAN (R 4.3.2)
magrittr 2.0.3 2022-03-30 [2] CRAN (R 4.3.2)
MASS 7.3-60 2023-05-04 [2] CRAN (R 4.3.2)
Matrix 1.6-1.1 2023-09-18 [2] CRAN (R 4.3.2)
matrixStats 1.1.0 2023-11-07 [2] CRAN (R 4.3.2)
memoise 2.0.1 2021-11-26 [2] CRAN (R 4.3.2)
mgcv 1.9-0 2023-07-11 [2] CRAN (R 4.3.2)
mice 3.16.0 2023-06-05 [1] CRAN (R 4.3.2)
mime 0.12 2021-09-28 [2] CRAN (R 4.3.2)
miniUI 0.1.1.1 2018-05-18 [2] CRAN (R 4.3.2)
minqa 1.2.8 2024-08-17 [1] CRAN (R 4.3.2)
mitml 0.4-5 2023-03-08 [1] CRAN (R 4.3.2)
munsell 0.5.1 2024-04-01 [1] CRAN (R 4.3.2)
nlme 3.1-163 2023-08-09 [2] CRAN (R 4.3.2)
nloptr 2.1.1 2024-06-25 [1] CRAN (R 4.3.2)
nnet 7.3-19 2023-05-03 [2] CRAN (R 4.3.2)
pan 1.9 2023-12-07 [1] CRAN (R 4.3.2)
permute 0.9-7 2022-01-27 [2] CRAN (R 4.3.2)
pillar 1.9.0 2023-03-22 [2] CRAN (R 4.3.2)
pkgbuild 1.4.2 2023-06-26 [2] CRAN (R 4.3.2)
pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 4.3.2)
pkgload 1.3.3 2023-09-22 [2] CRAN (R 4.3.2)
plyr 1.8.9 2023-10-02 [2] CRAN (R 4.3.2)
prettyunits 1.2.0 2023-09-24 [2] CRAN (R 4.3.2)
processx 3.8.2 2023-06-30 [2] CRAN (R 4.3.2)
profvis 0.3.8 2023-05-02 [2] CRAN (R 4.3.2)
promises 1.2.1 2023-08-10 [2] CRAN (R 4.3.2)
ps 1.7.5 2023-04-18 [2] CRAN (R 4.3.2)
purrr 1.0.2 2023-08-10 [2] CRAN (R 4.3.2)
R6 2.5.1 2021-08-19 [2] CRAN (R 4.3.2)
radiator 1.3.4 2024-09-13 [1] Github (3a8cb4f)
ragg 1.2.6 2023-10-10 [2] CRAN (R 4.3.2)
Rcpp 1.0.13 2024-07-17 [1] CRAN (R 4.3.2)
RCurl 1.98-1.16 2024-07-11 [1] CRAN (R 4.3.2)
readr 2.1.5 2024-01-10 [1] CRAN (R 4.3.2)
remotes 2.5.0 2024-03-17 [1] CRAN (R 4.3.2)
reshape2 1.4.4 2020-04-09 [2] CRAN (R 4.3.2)
rlang 1.1.4 2024-06-04 [1] CRAN (R 4.3.2)
rpart 4.1.21 2023-10-09 [2] CRAN (R 4.3.2)
Rsolnp 1.16 2015-12-28 [1] CRAN (R 4.3.2)
rstudioapi 0.15.0 2023-07-07 [2] CRAN (R 4.3.2)
S4Vectors 0.40.1 2023-10-26 [2] Bioconductor
scales 1.3.0 2023-11-28 [1] CRAN (R 4.3.2)
SeqArray 1.42.4 2024-04-03 [1] Bioconductor 3.18 (R 4.3.2)
seqinr 4.2-30 2023-04-05 [2] CRAN (R 4.3.2)
sessioninfo 1.2.2 2021-12-06 [2] CRAN (R 4.3.2)
shape 1.4.6.1 2024-02-23 [1] CRAN (R 4.3.2)
shiny 1.7.5.1 2023-10-14 [2] CRAN (R 4.3.2)
SNPRelate 1.36.1 2024-02-26 [1] Bioconductor 3.18 (R 4.3.2)
stringi 1.8.4 2024-05-06 [1] CRAN (R 4.3.2)
stringr 1.5.1 2023-11-14 [1] CRAN (R 4.3.2)
survival 3.5-7 2023-08-14 [2] CRAN (R 4.3.2)
systemfonts 1.0.5 2023-10-09 [2] CRAN (R 4.3.2)
textshaping 0.3.7 2023-10-09 [2] CRAN (R 4.3.2)
tibble 3.2.1 2023-03-20 [2] CRAN (R 4.3.2)
tidyr 1.3.1 2024-01-24 [1] CRAN (R 4.3.2)
tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.3.2)
truncnorm 1.0-9 2023-03-20 [2] CRAN (R 4.3.2)
tzdb 0.4.0 2023-05-12 [2] CRAN (R 4.3.2)
UpSetR 1.4.0 2019-05-22 [1] CRAN (R 4.3.2)
urlchecker 1.0.1 2021-11-30 [2] CRAN (R 4.3.2)
usethis 2.2.2 2023-07-06 [2] CRAN (R 4.3.2)
utf8 1.2.4 2023-10-22 [2] CRAN (R 4.3.2)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.2)
vegan 2.6-4 2022-10-11 [2] CRAN (R 4.3.2)
vroom 1.6.5 2023-12-05 [1] CRAN (R 4.3.2)
withr 3.0.1 2024-07-31 [1] CRAN (R 4.3.2)
xtable 1.8-4 2019-04-21 [2] CRAN (R 4.3.2)
XVector 0.42.0 2023-10-24 [2] Bioconductor
zlibbioc 1.48.0 2023-10-24 [2] Bioconductor

[1] /home/feu003/R/x86_64-pc-linux-gnu-library/4.3
[2] /apps/R/4.3.2/lib/R/library

Happy to send the data if required

@thierrygosselin
Owner

Surprised you even got that far! With your dataset I got an error very early while reading the DArT file:

################################################################################
############################# radiator::read_dart ##############################
################################################################################
Execution date@time: 20241107@2039
Folder created: read_dart_20241107@2039
File written: [email protected]             
Reading DArT file...
    Number of individuals: 658                                      
Analyzing strata file                                               
    Number of strata: 3                                             
    Number of individuals: 658
Using individuals in strata file to filter individuals in DArT file
Number of blacklisted samples: 0
Error in `import_dart()` at radiator/R/dart.R:371:3:
! 
Problem tidying DArT dataset: contact author
Run `rlang::last_trace()` to see where the error occurred.

Computation time, overall: 1 sec
############################# completed read_dart ##############################

@thierrygosselin
Owner

Using the other file you sent:

> data <- radiator::read_dart(data = "Report_DWs24-9586_Counts_mod.csv", strata = "Whale_Shark_Strata.txt", verbose = TRUE)
################################################################################
############################# radiator::read_dart ##############################
################################################################################
Execution date@time: 20241107@2042
Folder created: read_dart_20241107@2042
File written: [email protected]             
Reading DArT file...
    Number of individuals: 658                                      
Analyzing strata file                                               
    Number of strata: 3                                             
    Number of individuals: 658
Using individuals in strata file to filter individuals in DArT file
Number of blacklisted samples: 0

DArT characteristics:
DArT SNP format: alleles coverage in 2 Rows counts
fstcore package v0.9.18
(OpenMP detected, using 56 threads)
File written: [email protected]
Generating genotypes and calibrating REF/ALT alleles...
Number of markers recalibrated based on counts of allele read depth: 1664
Generating GDS...
File written: [email protected]                        
done!

Genotypes formats generated with 8057 SNPs: 
    GT_BIN (the dosage of ALT allele: 0, 1, 2 NA): TRUE
    GT_VCF (the genotype coding VCFs: 0/0, 0/1, 1/1, ./.): FALSE
    GT_VCF_NUC (the genotype coding in VCFs, but with nucleotides: A/C, ./.): FALSE
    GT (the genotype coding 'a la genepop': 001002, 001001, 000000): FALSE
################################### SUMMARY ####################################

Number of chrom: 1
Number of locus: 8057
Number of SNPs: 8057
Number of strata: 3
Number of individuals: 658

Number of ind/strata:
Madagascar = 50
South_Africa = 2
Ningaloo = 606

Number of duplicate id: 0

Computation time, overall: 27 sec
############################# completed read_dart ##############################

@thierrygosselin
Owner

I assume the first file is the one you received from DArT and the one with _mod is the one you modified so that radiator could read it?

Report_DWs24-9586_Counts.csv
Report_DWs24-9586_Counts_mod.csv

If that's the case, do you think it's an error on DArT's side, or are we likely to see this modified format from them in the future?

@thierrygosselin
Owner

I was able to reproduce the error, which is very specific to your dataset...

The reproducibility was very strange for a DArT dataset.
The marker coverage is also very weird, nothing like what I've seen with DArT so far...

################################################################################
###################### radiator::filter_snp_position_read ######################
################################################################################
Execution date@time: 20241107@2058
Function call and arguments stored in: [email protected]
2 steps to visualize and filter the data based on the number of SNP on the read/locus:
Step 1. Visualization (boxplot, distribution
Step 2. Threshold selection
Generating SNP position on read stats
Saving 32.9 x 10 cm image
Error in `ggplot2::geom_bar()` at radiator/R/filter_snp_position_read.R:239:5:
! Problem while computing aesthetics.
ℹ Error occurred in the 1st layer.
Caused by error in `check_aesthetics()`:
! Aesthetics must be either length 1 or the same as the data
  (6162).
✖ Fix the following mappings: `x`.
Run `rlang::last_trace()` to see where the error occurred.
Warning message:
In ggplot2::scale_y_log10(labels = scales::number_format(), oob = scales::squish_infinite) :
  log-10 transformation introduced infinite values.

Computation time, overall: 1 sec
###################### completed filter_snp_position_read ######################

@thierrygosselin
Owner

thierrygosselin commented Nov 8, 2024

I've had a look at the problem above.

It's generated when using the file Report_DWs24-9586_Counts_mod.csv in `filter_rad`, more precisely in this part of the filtering pipeline: `radiator::filter_snp_position_read`.

What I see so far is that radiator is generating a lot of NA values for the position of the SNP on the read, so obviously something is not being read correctly...
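As a rough illustration of the check described above, the share of markers without a SNP position can be counted in a couple of lines. This is a minimal sketch on made-up data: the `markers` data frame and its `MARKERS` and `POS` columns are assumptions for illustration, not objects returned by radiator.

```r
# Hypothetical marker metadata; names and values are invented for illustration.
markers <- data.frame(
  MARKERS = c("m1", "m2", "m3", "m4"),
  POS = c(33L, NA, NA, 12L)
)

# Count and proportion of markers whose SNP position could not be parsed.
n_missing <- sum(is.na(markers$POS))
prop_missing <- mean(is.na(markers$POS))

cat("Markers without a SNP position:", n_missing, "of", nrow(markers), "\n")
```

A large proportion of missing positions would point to a parsing problem upstream rather than to the plotting code itself.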

It's been more than 4 years since radiator last read a DArT file incorrectly, so I'm going to wait until you've answered the questions above...

@thierrygosselin
Owner

I can easily adapt the script to read it correctly; I just want to make sure it's not a one-time format...

@thierrygosselin
Owner

IMPORTANT
Don't use the datasets, modified or not.
It's really not compatible.

The new DArT format in Report_DWs24-9586_Counts.csv:

  • 2 new column names: MarkerName and Variant
  • MarkerName is similar to the old AlleleID
  • AlleleID column is missing
  • SnpPosition column is missing; the position remains embedded in MarkerName as it was in AlleleID, but not always
  • MarkerName is not consistent. Up to row 116 of Report_DWs24-9586_Counts.csv it looks like this: 100013135|F|0--33:T>G, and for the remaining rows it looks more like this: 100258_198. This prevents the extraction of useful DArT information and generates parsing problems because not all rows are coded the same way.
  • Up until now DArT was making things complicated by not using the more useful and accepted VCF format, but was nonetheless consistent with its naming scheme...

Which raises the question: is this a legitimate, unmodified DArT file, or was it modified and/or combined by someone?
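The mixed MarkerName styles listed above can be spotted before running any filter. The following is a minimal sketch in base R, not radiator's actual parsing code: the helper names `classify_marker_name()` and `extract_snp_position()` are hypothetical, the two regular expressions are inferred only from the two example values quoted above, and it assumes the number just before the colon is the SNP position.

```r
# Hypothetical helper, not radiator code: label each MarkerName by its style.
classify_marker_name <- function(x) {
  # AlleleID-like style, e.g. "100013135|F|0--33:T>G"
  allele_id_like <- grepl("^[0-9]+\\|F\\|[0-9]+-+[0-9]+:[ACGT]>[ACGT]$", x)
  # Short style seen in the later rows, e.g. "100258_198"
  short_style <- grepl("^[0-9]+_[0-9]+$", x)
  ifelse(allele_id_like, "allele_id_like",
         ifelse(short_style, "short", "unknown"))
}

# Hypothetical helper: extract the SNP position from the AlleleID-like style only.
# Names in any other style get NA (assumption: position is the number before ":").
extract_snp_position <- function(x) {
  pos <- rep(NA_integer_, length(x))
  ok <- classify_marker_name(x) == "allele_id_like"
  pos[ok] <- as.integer(sub("^.*-+([0-9]+):.*$", "\\1", x[ok]))
  pos
}

# The two example values quoted in the list above.
marker_name <- c("100013135|F|0--33:T>G", "100258_198")
marker_styles <- classify_marker_name(marker_name)
snp_position <- extract_snp_position(marker_name)

print(table(marker_styles))
print(snp_position)
```

Applied to the full MarkerName column, more than one style in the table would confirm that the rows are not coded consistently, and the NA entries in `snp_position` would correspond to the markers for which no position can be recovered.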

@pfeutry
Author

pfeutry commented Nov 8, 2024 via email
