-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Arc based atom bonds #94
base: master
Are you sure you want to change the base?
Conversation
Okay upon further investigation some things stall the progress a bit.
|
Hey @douweschulte, Is work on this still active? I would like to utilise it for a project I am working on. Are you in need of any help? |
This PR has stalled a bit, mainly because I was not able to figure out how to make it work properly and because I did not have a use case for it myself. So if you could explain what you want to do with it more we could revise the code to support that use case. For the actual implementation then if you feel courageous you could try to tackle this yourself, but it is quite involved so be warned. Otherwise I could work on an implementation, but that would be after my holidays in a couple of weeks. |
Thanks for getting back to me, @douweschulte. I think that the non-par implementation will actually suffice for my use case - it would be cool to see this in the future tho. 😄 |
Do you need anything more than can be provided by the function that is already in place? With this one you can query all bonds and get the pairs of atoms connected by each bond. If needed this function could be reworked to give mutable references as well I suppose. But this means that you need to keep the pdb struct at hands and loop over all bonds to find the ones you are interested in. But if this is what you need generating a couple of user friendly functions which try to prevent doing useless work should be quite easy. The main difference with this stalled PR is that in the current version the PDB has a list of all bonds with all ids of each atom involved and when the bonds are queried it uses that to find references to all atoms involved, while this PR tried to change the behaviour to make it so that all atoms have a list of all other atoms they bind saved to their own struct. As this last options involves keeping pointers to other structs in different places in the PDB hierarchy this is fundamentally at odds with the rust philosophy and that is why it is stalled. Sorry it took a while for me to read back into what I was trying to accomplish here. Let me know what your intended use case is so that we can decide what the best path forward is. |
Actually, the change in behaviour for referenced Atoms within each Atom individually would make what I am doing much easier, but I understand the conceptual dilemma with implementing that. I am working on the functionality to build a graph from the PDB object, and while technically possible to do this now, the time complexity is far from optimal with so many lookups. |
Internally all bonds are stored as a single list of the IDs of both atoms and the type. Would having access to this list directly make your life easier? This makes it so that you do not have to spend time looking up the atoms beforehand, and would make it possible to get mutable references if you have mut access to the above layer(s) in your code. I might've able to get this in before holidays if neede. Alternatively given that you get the full list of all bonds if you would make two dictionaries one indexed by the first atom and one by the second solve your problem (or any kind of organised tree layout to make lookups fast)? I this way you will only pay for the atom reference lookup once, and I think I already made it log lookup time thanks to sorting. And the dictionaries together essentially for a full graph of all bonds. But using it like this maybe if structered very differently from how you want to use it. Also if I remember correctly only 'special' bonds are saved in the pdb files and those are the ones that are present in the output from the mentioned function. If all normal bonds are needed (think: amino acid backbone, internal amino acid bonds) then some way of figuring those out will have to be found first, possibly with the |
Direct access to the adjacency list might be useful, but please don't prioritise this if you have other things to finish before your time off because the current method would be the same time complexity I think. Creating two hash tables for target + source is an interesting idea tho, I cannot remove the lookups altogether but this would at least make it constant. Part of the problem with what I am trying to do comes from my adjacency matrix and the fact that currently each node is assigned a sequential usize index - if I was able to map the index to the atom serial number for example, then that would improve runtime, but this comes with it's own pitfalls. To get around this I can build a HashMap on top and map the node index to the serial number, but when I have a molecule with >100k atoms (plus, not every atom has a bond anyway) I don't know if this would scale very well xD. But this is a problem that has to be fixed on my end, not within pdbtbx. I am pretty new to this, but I believe that the graph DS I am working on could facilitate network analysis for bond inferencing. if this is something you envision as a feature for this crate, I would be more than happy to copy it across at some point and implement some of the algorithms. As a side point, I am conscious of polluting the PR comments with unnecessary verbiage. Do you (or plan to) have a Discord or something for further discussions around pdbtbx? Or is this ok? |
Good to hear that the hashmaps could help you solve some time complexity. We are indeed polluting this thread a bit, opening a separate issue at some point is warranted. I do not have an active discord but if you want to continue the discussion on a bit more informal/private note using my email (in the cargo.toml) is the best way. If you want we could also schedule a call at some point when I am back to talk about the possibilities for your code and inclusion into pdbtbx. For now I am quite interested in the idea of including this here. Have a nice summer, I will be back in a couple of weeks. |
Fixes #80.
The bonds are now implemented using
Arc
, meaning they behave in the way I envisioned them to be. The major problem before merging this into the main branch is that I do not like how it works withHierarchies
.Left to do:
Arc
in Hierarchies (preventing a double lookup)