Modified resulting sequences in CSV #6

EfresBR · 2024-03-05T11:32:24Z

Hello

I am using the package to evaluate potential G4s.

Starting from a FASTA file with an entry such as: GGTGGGTAGTTTGACTGGGGCGG

I run the analysis with:
python3 G4Boost.py -f Sequences.fasta --maxloop 20 --minloop 0 --maxG 4 --minG 1 --loops 10 --noreverse --classifier G4Boost_classifier.json --regressor G4Boost_regressor.json

The output is a GFF file and a CSV file.
In the GFF, the result is:
Sequence_1 0 23 Sequence_1_0_23 23 + GGTGGGTAGTTTGACTGGGGCGG

In the CSV, however, the G4motif is modified and shortened: two Gs are missing from the GGGG run in the PQS, so the motif goes from a length of 23 to 21.
GGTGGGTAGTTTGACTGGGGCGG --> GGtgGGtagtttgactGGcGG
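
To make the difference easy to reproduce, here is a minimal sketch that compares the two strings with Python's standard `difflib` module. It only looks at the sequences quoted above and does not call G4Boost; the helper name `missing_segments` is my own and is not part of the package.

```python
from difflib import SequenceMatcher

gff_seq = "GGTGGGTAGTTTGACTGGGGCGG"   # sequence reported in the GFF
csv_motif = "GGtgGGtagtttgactGGcGG"   # G4motif reported in the CSV


def missing_segments(original, modified):
    """Return (start, text) for stretches of `original` that are absent
    from `modified`, ignoring letter case."""
    matcher = SequenceMatcher(None, original.upper(), modified.upper(),
                              autojunk=False)
    return [
        (i1, original[i1:i2])
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
        if tag == "delete"
    ]


# Lengths of the two reported sequences (23 and 21).
print(len(gff_seq), len(csv_motif))

# Stretches of the GFF sequence that do not appear in the CSV motif.
print(missing_segments(gff_seq, csv_motif))
```

The comparison is case-insensitive because the CSV motif mixes uppercase and lowercase letters, so only real deletions are reported, not case changes.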

Is this normal?
Why does this happen? It changes the sequence.
Is there a way to obtain the G4-pred, G4-prob and mfe-pred for the entire input motif (the 23-long motif, not the 21-long "modified" one, which is not what I want to evaluate)?

Also, regarding G4 topology prediction: the algorithm is designed to report which Gs are predicted to be part of G-runs (the Gs in uppercase), but not the actual predicted topology (parallel, antiparallel or hybrid), correct?

Thanks for your time.

EBR
