Reviewers section

Diego De Panis edited this page Feb 20, 2024 · 14 revisions

Important aspects to keep in mind:

  • Your task as a reviewer is to evaluate the assembly using mainly the information available in the EAR.

  • The reviewing process should not be time-consuming. We expect the assemblies reaching this point to be in good shape.

  • The bottom line is that if the assembly meets the EBP Assembly Standards (v5), considering the particularities of the case, it should be approved.

  • You can request all the clarifications, updates and corrections you deem necessary. Be kind and remember that, as in every other working space, we follow the ERGA Code of Conduct during the reviewing process.

  • The reviewer decides how to interpret the metrics and warnings shown in the report, always in the context of the particular assembly. The PDF report does not itself assign an approval status.

  • The report guides the reviewer through the process and presents metrics and warning flags in a standardised way. Be prepared for edge cases and for warnings that fail to display. The PDF report is generated by rule-based code, but the rules are subject to refinement and exceptions, and the code can contain errors, so you must stay vigilant.


Going through the EAR

In addition to what is explained in the example structure of the EAR, here you can find other details:

Tags: Check that a valid tag is being displayed. At the moment, the Tags field only displays the particular ERGA project to which the species belongs. Valid Tags are ERGA-BGE, ERGA-Pilot and ERGA-Satellite.

Species table: ToLID and Species name are manually provided. Class and Order are retrieved from GoaT (note that GoaT may hold only part of this data for some species).

Traits table: Summarises observed and expected data. Deviations should raise warnings in the section below.

Summary section: The EBP metric is calculated as floor(log10(Contig N50)).floor(log10(Scaffold N50)).QV(floor(QV)) on each haplotype of the curated assembly. The reviewer should check whether the Scaffold N50 value corresponds to C (chromosome-scale) when the case requires it. The warnings are designed to draw the reviewer's attention to specific points. For each warning, the reviewer should quickly double-check the section of the report from which it originates.
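
As a rough illustration of the formula above, the following Python sketch computes the metric for one haplotype. It is not the code that generates the EAR. It assumes that both N50 values are given in base pairs and that the last component is written as Q followed by the floored QV; the function name and the example values are hypothetical.

```python
import math


def ebp_metric(contig_n50_bp: int, scaffold_n50_bp: int, qv: float) -> str:
    """Build the x.y.Qz string for one haplotype.

    Assumption: N50 values are in base pairs and the QV component
    is rendered as 'Q' followed by floor(QV).
    """
    contig_exp = math.floor(math.log10(contig_n50_bp))
    scaffold_exp = math.floor(math.log10(scaffold_n50_bp))
    return f"{contig_exp}.{scaffold_exp}.Q{math.floor(qv)}"


# Hypothetical haplotype: contig N50 12.5 Mbp, scaffold N50 148 Mbp, QV 61.3
print(ebp_metric(12_500_000, 148_000_000, 61.3))  # 7.8.Q61
```

Whether the middle component can be reported as C still depends on the reviewer's judgement of the scaffolds and the HiC map, which this sketch does not attempt.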

The following warnings will be automatically flagged based on expected/observed values:

  • Final assembly size differs by more than 20% from the size estimated with GenomeScope
  • Observed haploid number differs from the one retrieved from GoaT
  • Ploidy obtained from Smudgeplot (or GenomeScope) differs from the one retrieved from GoaT
  • Observed sex differs from the recorded Sample sex
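
To double-check these flags by hand, the comparisons listed above can be sketched as follows. This is only an illustration under stated assumptions, not the report's actual code: all dictionary keys are hypothetical names, and the 20% difference is assumed to be relative to the GenomeScope estimate.

```python
def expected_vs_observed_warnings(obs: dict, exp: dict) -> list[str]:
    """Compare observed values with expected ones and return warning texts.

    All keys are hypothetical names chosen for this sketch.
    """
    flags = []
    # Assumption: the difference is measured relative to the GenomeScope estimate.
    size_diff = abs(obs["assembly_size"] - exp["genomescope_size"]) / exp["genomescope_size"]
    if size_diff > 0.20:
        flags.append("assembly size differs by more than 20% from GenomeScope")
    if obs["haploid_number"] != exp["goat_haploid_number"]:
        flags.append("haploid number differs from GoaT")
    if obs["ploidy"] != exp["goat_ploidy"]:
        flags.append("ploidy differs from GoaT")
    if obs["sex"] != exp["sample_sex"]:
        flags.append("observed sex differs from sample sex")
    return flags


# Hypothetical example: only the assembly size deviates (30% larger than estimated).
observed = {"assembly_size": 1_300_000_000, "haploid_number": 24, "ploidy": 2, "sex": "XY"}
expected = {"genomescope_size": 1_000_000_000, "goat_haploid_number": 24,
            "goat_ploidy": 2, "sample_sex": "XY"}
print(expected_vs_observed_warnings(observed, expected))
```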

The following warnings will be automatically flagged for each haplotype:

  • QV value is less than 40
  • Kmer completeness value is less than 90%
  • BUSCO single copy value is less than 90%
  • BUSCO duplicated value is more than 5%
  • The curated assembly is more than 3% smaller than the pre-curation assembly
  • More than 1000 gaps/Gbp
  • Less than 90% of the assembly is in chromosomes, as inferred by comparing Scaffold L90 with the observed haploid number
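
The per-haplotype thresholds above can be expressed as a short Python sketch, useful when verifying a flag against the values in the Quality metrics table. It rests on several assumptions that the report's code may not share: the key names are hypothetical, the size loss is measured relative to the pre-curation size, gaps are normalised by the curated assembly size, and the chromosome check is taken to mean Scaffold L90 greater than the observed haploid number.

```python
def haplotype_warnings(m: dict) -> list[str]:
    """Return warning texts for one haplotype; all keys are hypothetical names."""
    flags = []
    if m["qv"] < 40:
        flags.append("QV below 40")
    if m["kmer_completeness"] < 90:
        flags.append("Kmer completeness below 90%")
    if m["busco_single"] < 90:
        flags.append("BUSCO single copy below 90%")
    if m["busco_duplicated"] > 5:
        flags.append("BUSCO duplicated above 5%")
    # Assumption: loss is relative to the pre-curation assembly size.
    size_loss_pct = (m["pre_size"] - m["post_size"]) / m["pre_size"] * 100
    if size_loss_pct > 3:
        flags.append("more than 3% size loss after curation")
    # Assumption: gaps are normalised by the curated assembly size in Gbp.
    if m["gaps"] / (m["post_size"] / 1e9) > 1000:
        flags.append("more than 1000 gaps/Gbp")
    # Assumption: L90 above the haploid number means 90% is not in chromosomes.
    if m["scaffold_l90"] > m["haploid_number"]:
        flags.append("Scaffold L90 exceeds haploid number")
    return flags


# Hypothetical haplotype with a low QV and one scaffold too many in L90.
hap1 = {"qv": 38.5, "kmer_completeness": 95.0, "busco_single": 96.2,
        "busco_duplicated": 1.1, "pre_size": 1_000_000_000,
        "post_size": 990_000_000, "gaps": 150,
        "scaffold_l90": 20, "haploid_number": 19}
print(haplotype_warnings(hap1))
```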

All the curation notes are manually provided. They should provide insight to help understand the assembly process.

Quality metrics table: The values are obtained from gfastats, Merqury and BUSCO. The caption below the table shows the BUSCO version and lineage used for all the assemblies. A warning will be printed if there are inconsistencies across versions or lineages.
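
The consistency check mentioned above could look roughly like the sketch below. It is an assumption about how such a check might work, not the report's implementation; the function name, the keys and the example values are illustrative only.

```python
def busco_inconsistencies(runs: list[dict]) -> list[str]:
    """Flag differing BUSCO versions or lineages across assemblies.

    Each item in `runs` is a dict with hypothetical keys 'version' and 'lineage'.
    """
    versions = sorted({r["version"] for r in runs})
    lineages = sorted({r["lineage"] for r in runs})
    flags = []
    if len(versions) > 1:
        flags.append("BUSCO versions differ: " + ", ".join(versions))
    if len(lineages) > 1:
        flags.append("BUSCO lineages differ: " + ", ".join(lineages))
    return flags


# Illustrative example: two haplotypes evaluated with different BUSCO versions.
runs = [
    {"version": "5.4.7", "lineage": "vertebrata_odb10"},
    {"version": "5.5.0", "lineage": "vertebrata_odb10"},
]
print(busco_inconsistencies(runs))  # ['BUSCO versions differ: 5.4.7, 5.5.0']
```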

HiC contact maps section: Shows PNG snapshots of the post-curation assembly and provides a link to .pretext and/or .mcool files to properly load the map and swiftly check for issues. Important: HiC maps must be analysed by opening the .pretext/.mcool file through the link available in the PDF and walking the diagonal to spot issues. Please check the Rapid Curation guide if you need to refresh the interpretation of HiC contact maps.

Kmer spectra section: Merqury Kmer plots are not automatically analysed. The reviewer should check them for signs of issues.

Contamination screening: Blob plots are not automatically analysed to raise flags. The reviewer should check them for signs of issues, also taking into consideration the curation notes.

Data coverage table: No warning flags are currently raised for sequencing coverage. The BGE-recommended sequencing recipes are HiFi 25x, HiC 50x, and ONT plus Illumina at 60x each. Ultimately, it is the obtained quality of the assembly that determines whether the sequencing is deep enough.

Pipelines sections: The tools and versions used for assembly and curation are shown here to give context to the overall process.

Ending section: The EAR ends with information about the submitter and a timestamp of the creation of the document.


The Pull Request space during reviewing
