Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New biomol fields #400
base: develop
Are you sure you want to change the base?
New biomol fields #400
Changes from 2 commits
eca24df
572a29f
173172f
5911d53
05a8fed
76f690c
1f5650d
a8781ac
ac6ed65
55f71e2
534ec8d
f865a5a
af3c817
616019c
bd3e9e1
ce93c9d
a875aaf
a1ddd88
a0ee16a
c0ea1ac
af14d72
7610146
c38075c
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the same sequence occurs twice in a chain. Should the sequence be listed here twice or just once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add here that the order of the values should also be the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me the indentation levels were not consistent so I try to correct them here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you not duplicating data here? It seems that the sequences already occur in the biomol_chains field. It would perhaps be better to make a reference from the biomol_chains to these sequences.
Is the main point of this field not to enable querying on the sequences?
If that is the case, it may be better to set the query ability to SHOULD.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, data would be duplicated.
I was thinking that it could be better specifying which chains (and even which residues) are included on each sequence, by their indices, and then removing the sequences and sequence_types fields on biomol_chains.
Actually, instead of making new fields for this we could reshape sequences like a list of dictionaries, to make it coherent with the other biomol field formats. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could indeed turn the sequences into a dictionary.
I was still wondering whether there is a clear hierarchical structure? (That a chain can have one or more sequences, but a sequence can not contain more than one chain? In that case, we would not need to duplicate the residues.
I was also wondering how the situation is handled, where there are multiple chains with the same sequence.
Will there be multiple sequence entries with the same sequence ? My idea was to make a sequence unique, so it only occurs once in the biomol_sequence field. But if you include the residues, you would need to have a separate sequence entry for each chain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my previous suggestion the chains property is a list of integers so a sequence may contain more than one chain. This is important since a polymer may be splitted in several chains. In the other hand, a chain may contain the whole sequence of a polymer and more things at the same time, so not all residues in the chain would be part of the sequence. Then listing residues in the chain makes sense.
I did not think about this and you are right. Then we can forget about residues to make everything easier.