about the test data #3

Open
kailingli opened this issue Feb 25, 2023 · 3 comments
@kailingli

Hi,

In the "promoters_seq_example.bed" file, what are the 4th and 5th columns?
I assume the 1st-3rd columns are "chr", "start", "end" and the 6th is "strand".

I tried to run the prediction on my own BED file containing only "chr, start, end, strand", but it failed. Can you help me with this?
Thank you so much!!!

--KL

@rochevin
Collaborator

Hi,
The BED file was generated automatically by the export.bed function. The 4th column is the name (an ID) and the 5th is a score, which is set to zero here. See the BED format description: https://genome.ucsc.edu/FAQ/FAQformat.html#format1

Hope this helps!
Best,
Vincent
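
For readers with a 4-column table like the one described above, here is a minimal base-R sketch of filling in the two missing BED fields. The coordinates, the "region_" names, and the output file name are made up for illustration, and whether the missing columns were the cause of the failure is not established in this thread.

```r
# Hypothetical 4-column input: chr, start, end, strand (made-up coordinates).
regions <- data.frame(
  chr    = c("chr1", "chr2"),
  start  = c(1000L, 5000L),
  end    = c(1200L, 5200L),
  strand = c("+", "-"),
  stringsAsFactors = FALSE
)

# Arrange the six standard BED fields in order, adding a placeholder
# name (an ID) as column 4 and a score of zero as column 5.
bed6 <- data.frame(
  chr    = regions$chr,
  start  = regions$start,
  end    = regions$end,
  name   = paste0("region_", seq_len(nrow(regions))),
  score  = 0L,
  strand = regions$strand,
  stringsAsFactors = FALSE
)

# BED files are tab-separated and have no header line.
# The output path is an assumption; a temporary directory is used here.
out_file <- file.path(tempdir(), "my_regions.bed")
write.table(bed6, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
```

The resulting file has the same column layout as the one described for "promoters_seq_example.bed": chr, start, end, name, score, strand.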

@kailingli
Author

Thank you for your reply. It seems that res <- DeepG4Scan(X = sequences,k=20,treshold=0.5) only works for sequences longer than 200bp. When I use a BED file containing sequences shorter than 200bp, it shows this error message:

> res <- DeepG4Scan(X = sequences,k=20,treshold=0.5)
Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 1, 0

Is there a way to use a single piece of code to predict on sequences both shorter and longer than 200bp at the same time? Or do I need to separate them and run them separately?

@rochevin
Collaborator

It seems you are right: the code that splits a long sequence into smaller ones is probably responsible for this failure.

In my opinion, it is better to separate the two sets of sequences, because the results do not tell you the same thing. For sequences shorter than 200bp, you want to know whether you have an active G4 or not. For long sequences, you want to "locate" a potential active G4, or at least know whether one exists at some position, for instance when you scan a full promoter region.
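
The separation suggested above can be sketched in base R as follows. This is only an illustration under stated assumptions: the example sequences are made up, the input is taken to be a named character vector (for a Biostrings DNAStringSet, width() would replace nchar()), and placing sequences of exactly 200bp in the "long" set is a guess, since the thread only reports that sequences shorter than 200bp fail.

```r
# Hypothetical input: a named character vector of DNA sequences.
seqs <- c(
  short_a = strrep("GGGTTA", 10),  # 60 bp
  long_a  = strrep("GGGTTA", 50),  # 300 bp
  edge    = strrep("ACGT", 50)     # exactly 200 bp
)

# Length cutoff taken from the thread; treating exactly 200 bp as "long"
# is an assumption, not something confirmed by the maintainers.
cutoff <- 200L
widths <- nchar(seqs)

short_seqs <- seqs[widths <  cutoff]
long_seqs  <- seqs[widths >= cutoff]

# Only the long set would then be scanned, using the call quoted in the thread:
# res_long <- DeepG4Scan(X = long_seqs, k = 20, treshold = 0.5)
# The short set would go to the package's prediction function for single
# sequences, which is not named in this thread.
```

Keeping the two sets apart also keeps the two kinds of result apart: a single yes/no prediction per short sequence, and positions along each long sequence.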
