Modified resulting sequences in CSV #6

Open
EfresBR opened this issue Mar 5, 2024 · 0 comments


EfresBR commented Mar 5, 2024

Hello

I am using the package to evaluate potential G4s.

Starting from a FASTA file with an entry such as: GGTGGGTAGTTTGACTGGGGCGG

I run the analysis with:
python3 G4Boost.py -f Sequences.fasta --maxloop 20 --minloop 0 --maxG 4 --minG 1 --loops 10 --noreverse --classifier G4Boost_classifier.json --regressor G4Boost_regressor.json

The output is a GFF file and a CSV file.
In the GFF, the result is:
Sequence_1 0 23 Sequence_1_0_23 23 + GGTGGGTAGTTTGACTGGGGCGG

In the CSV, however, the G4motif is modified and shortened: two Gs are missing from the middle of the PQS (the motif goes from a length of 23 to 21).
GGTGGGTAGTTTGACTGGGGCGG --> GGtgGGtagtttgactGGcGG
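To pin down exactly which bases differ, the two strings can be aligned case-insensitively. Below is a minimal sketch using Python's standard `difflib`. The helper name `dropped_bases` is purely illustrative and is not part of G4Boost; the two sequences are copied from the GFF and CSV output above.

```python
from difflib import SequenceMatcher

gff_motif = "GGTGGGTAGTTTGACTGGGGCGG"  # sequence reported in the GFF
csv_motif = "GGtgGGtagtttgactGGcGG"    # G4motif reported in the CSV


def dropped_bases(original, reported):
    """Return (0-based position, bases) for each stretch of `original`
    that has no counterpart in `reported`, ignoring letter case."""
    matcher = SequenceMatcher(
        None, original.upper(), reported.upper(), autojunk=False
    )
    return [
        (i1, original[i1:i2])
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
        if tag in ("delete", "replace")
    ]


print(len(gff_motif), len(csv_motif))        # lengths of the two motifs
print(dropped_bases(gff_motif, csv_motif))   # where the GFF sequence lost bases
```

Everything up to and including GGTGGGTAGTTTGACT is identical in both strings, so the two missing bases come from the GGGG run, which the CSV reports as GG.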

Is this normal?
Why does this happen? It is changing the sequence.
Is there a way to obtain the G4-pred, G4-prob and mfe-pred values for the entire input motif (the 23-nt motif, not the 21-nt "modified" one, which is not what I want to evaluate)?

Also, regarding G4 topology prediction: the algorithm is designed to report which Gs are predicted to be part of G-runs (the Gs in uppercase), but not the actual predicted topology (parallel, antiparallel or hybrid), correct?

Thanks for your time.

EBR
