PGFinder 2.0 upgrade #259
Hi @smesnage, do you have copies of the CSV files you would like users to be able to download, please?
Nope, but I can make these. I will!
Morning, there look to be two aspects to the Database builder...
Building Block components: This is the top box on the left-hand side, and on the last slide there are screenshots of a couple of CSVs with the title ".csv files to download for users to amend". If you're able to construct these I can start looking at getting them added.
Muropeptide list: This is the bottom box on the left-hand side and lists...
It's possible they might be some of the files in lib/pgfinder/masses, but if not, or if there are now more available, could you pass them on? That would be great. Cheers,
Hi @smesnage, I've had a look through the files under various data directories in the repository and can only find masses for...
But these look slightly different from the examples in the last slide, as they are the masses of the items in the orange box. I was wondering, though, can I take the
If I'm understanding correctly, the work here is to build these from their components, which are defined in the green CSV... and the items in the orange CSV get constructed automatically from the masses of the components in the green one? Aside from providing default files for people to customise, I think we'll need to add the functionality to construct the masses of the items in the orange CSV from their constituents in the green one. If you could let me know whether I've understood this correctly that would be great, and if the sample structures for E. coli and C. difficile are incorrect I'd need those as well as the sample masses for the green CSV. Cheers,
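The calculation described above (orange muropeptide masses built up from the green building-block table) can be sketched as a sum of component masses minus one water per bond. This is a minimal sketch under stated assumptions: each building block is listed with the monoisotopic mass of the free molecule, and neighbouring blocks in a linear chain are joined by condensation reactions (glycosidic or amide bonds), each releasing one H2O. The `ComponentMasses` type and `linearMass` function are hypothetical names, and branched or crosslinked structures need further rules.

```typescript
// Monoisotopic mass of water (Da); matches the smithereens output for H2O later in this thread.
const WATER_MONOISOTOPIC = 18.010565;

// Hypothetical shape of the "green" building-block table: symbol -> free-molecule mass.
type ComponentMasses = Map<string, number>;

// Mass of a linear chain of building blocks joined by condensation reactions.
// Assumption: the table stores masses of free molecules (not residue masses),
// so each bond between neighbouring blocks releases one water molecule.
function linearMass(blocks: string[], masses: ComponentMasses): number {
  if (blocks.length === 0) {
    throw new Error("a structure needs at least one building block");
  }
  let total = 0;
  for (const block of blocks) {
    const mass = masses.get(block);
    if (mass === undefined) {
      throw new Error(`no mass listed for building block "${block}"`);
    }
    total += mass;
  }
  return total - (blocks.length - 1) * WATER_MONOISOTOPIC;
}

// Usage with the free amino acids alanine and glycine (standard monoisotopic masses).
const aminoAcids: ComponentMasses = new Map([
  ["A", 89.047678],
  ["G", 75.032028],
]);
console.log(linearMass(["A", "G"], aminoAcids).toFixed(6));
```

This prints `146.069141`, which agrees with the monoisotopic mass of the dipeptide Ala-Gly to within the rounding of the input masses.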
You're correct. I hope it makes sense.
Thanks @smesnage 👍
Hi @smesnage, I've been working on this and have a decent idea of how to proceed, but I have some questions about the example you shared.
Muropeptides: These are the structures...
Reference Masses: These are the reference masses (with the empty column and description stripped, as they're not needed for what I'm trying to achieve).
Approach
Simple Example 1
I can calculate this OK.
Simple Example 2
I can calculate this OK.
Things get complicated!
I've not sussed this one out yet because decomposing the muropeptides into their components requires splitting the string. This could be done very simply by splitting on every character, but that is overkill and the reference masses have things like ... This means we need to leverage regular expressions to split the muropeptide strings into their building blocks. These look for patterns on which to split; some obvious ones are ... If not, that simplifies things and we can focus on splitting everything up to the first space in the muropeptide encoding and ignore everything after, although there is one instance where I noticed there is also ...
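A minimal sketch of the regular-expression approach, assuming a simplified reading of the notation: lowercase letters are sugars, uppercase letters are amino acids, parenthesised groups such as `(Anh)` or `(DeAc)` are modifications, `(4-3)`-style groups are crosslink types, and `-`, `=`, `~` and spaces are separators. This is not the smithereens grammar (which is far more complete); `tokenize` and the `Token` type are hypothetical.

```typescript
// Simplified token kinds (an assumption for illustration, not the smithereens grammar).
type Token =
  | { kind: "sugar"; symbol: string }
  | { kind: "aminoAcid"; symbol: string }
  | { kind: "modification"; name: string }
  | { kind: "crosslink"; donor: number; acceptor: number }
  | { kind: "separator"; symbol: string };

// Anchored at the start of the remaining text. Alternatives are tried left to right,
// so "(4-3)" is read as a crosslink before the generic "(Name)" modification rule.
const TOKEN_PATTERN = /^(?:\((\d+)-(\d+)\)|\(([A-Za-z]+)\)|([a-z])|([A-Z])|([-=~ ]))/;

function tokenize(structure: string): Token[] {
  const tokens: Token[] = [];
  let position = 0;
  while (position < structure.length) {
    const m = TOKEN_PATTERN.exec(structure.slice(position));
    if (m === null) {
      throw new Error(`unexpected "${structure[position]}" at position ${position} in "${structure}"`);
    }
    if (m[1] !== undefined) {
      tokens.push({ kind: "crosslink", donor: Number(m[1]), acceptor: Number(m[2]) });
    } else if (m[3] !== undefined) {
      tokens.push({ kind: "modification", name: m[3] });
    } else if (m[4] !== undefined) {
      tokens.push({ kind: "sugar", symbol: m[4] });
    } else if (m[5] !== undefined) {
      tokens.push({ kind: "aminoAcid", symbol: m[5] });
    } else {
      tokens.push({ kind: "separator", symbol: m[6] });
    }
    position += m[0].length; // consume exactly what was matched
  }
  return tokens;
}

console.log(tokenize("gm-AEJA (Anh)").map((t) => t.kind).join(" "));
```

Splitting by token kind rather than by character keeps multi-character units such as `(Anh)` intact, which is the problem described above.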
That's a bit more complicated than it seems. Explanations of how to calculate these structures are below. But the worrying thing is that you may not need to bother with this at all: hasn't Brooks already written the piece of code that does it?
Simple Example 2
gm-AEJA=gm-AEJA (Anh) (4-3)
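The dimer above adds further water losses on top of the monomer masses. A sketch under stated assumptions: a 4-3 or 3-3 crosslink is a single peptide bond between the two stem peptides (one H2O released), and each `(Anh)` modification is modelled as one further H2O lost, since 1,6-anhydro-MurNAc is MurNAc minus water. `dimerMass` is a hypothetical helper, and the monomer masses in the usage lines are placeholders rather than values from the reference tables.

```typescript
// Monoisotopic mass of water (Da).
const H2O_MONOISOTOPIC = 18.010565;

// Hypothetical helper: mass of a crosslinked dimer from its two monomer masses.
// Assumptions: the crosslink (4-3 or 3-3) is one peptide bond, releasing one water,
// and each (Anh) modification removes one more water (1,6-anhydro ring formation).
function dimerMass(donorMass: number, acceptorMass: number, anhydroCount = 0): number {
  if (!Number.isInteger(anhydroCount) || anhydroCount < 0) {
    throw new RangeError("anhydroCount must be a non-negative integer");
  }
  const crosslinkLoss = H2O_MONOISOTOPIC;
  const anhydroLoss = anhydroCount * H2O_MONOISOTOPIC;
  return donorMass + acceptorMass - crosslinkLoss - anhydroLoss;
}

// Placeholder monomer masses, purely to show the arithmetic for
// gm-AEJA=gm-AEJA (Anh) (4-3): two monomers, one crosslink, one anhydro group.
const placeholderMonomer = 1000;
console.log(dimerMass(placeholderMonomer, placeholderMonomer, 1).toFixed(6));
```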
Thanks for the quick response @smesnage. @TheLostLambda has indeed given me access to his work and I copied it to the repository Mesnage-Org/smithereens. It's written in a language I don't know (Rust), so I've not been able to follow what it's doing and get my head round it properly. I could see that it's using atomic masses of elements, accounting for different isotopes, and that there is a muropeptide module in there too. @TheLostLambda, do you have any documentation on how to use ...?
❱ smithereens
Molecule: gm
× expected a chemical formula (optionally followed by a '+' or '-' and a particle offset), or a standalone particle offset
├─▶ × expected a particle (like p or e), optionally preceded by a number
│
╰─▶ × the particle "g" could not be found in the supplied atomic database
help: double-check for typos, or add a new entry to the atomic database
╭────
1 │ gm
· ┬
· ╰── particle not found
╰────
Molecule: gm-gm
× expected a chemical formula (optionally followed by a '+' or '-' and a particle offset), or a standalone particle offset
├─▶ × expected a particle (like p or e), optionally preceded by a number
│
╰─▶ × the particle "g" could not be found in the supplied atomic database
help: double-check for typos, or add a new entry to the atomic database
╭────
1 │ gm-gm
· ┬
· ╰── particle not found
╰────
Molecule: gm-AEJ
× expected a chemical formula (optionally followed by a '+' or '-' and a particle offset), or a standalone particle offset
├─▶ × expected a particle (like p or e), optionally preceded by a number
│
╰─▶ × the particle "g" could not be found in the supplied atomic database
help: double-check for typos, or add a new entry to the atomic database
╭────
1 │ gm-AEJ
· ┬
· ╰── particle not found
╰────
Molecule: H2O
Monoisotopic Mass: 18.010565
Average Mass: 18.0153
Charge: 0
Molecule: CH4
Monoisotopic Mass: 16.031300
Average Mass: 16.0425
Charge: 0
Molecule: C6H12O6
Monoisotopic Mass: 180.063388
Average Mass: 180.1561
Charge: 0
Molecule:
I can get some basic output from it, but how do I access the muropeptide builder side of things? As well as there being no tests, documentation appears thin on the ground. As an aside... As neat and fast as Rust is, using multiple languages adds another layer of complexity to the development and long-term maintenance of software (particularly as people with limited programming experience may be involved in the future). I'm not averse to this where it's required, but I'm not sure the advantages of Rust (memory safety, and speed from being compiled) are of any benefit here. I'm happy to be convinced otherwise, though.
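The formula mode shown in the output above can be approximated in a few lines, which may help when checking what smithereens is doing. This sketch is not smithereens' implementation: it handles only element symbols with optional counts (no parentheses, charges or particle offsets) and uses the monoisotopic masses of 1H, 12C, 14N and 16O. Run as written, it prints the same monoisotopic masses shown above for H2O, CH4 and C6H12O6.

```typescript
// Monoisotopic masses (Da) of the most abundant isotopes: 1H, 12C, 14N, 16O.
const ISOTOPE_MASSES: Record<string, number> = {
  H: 1.00782503207,
  C: 12.0,
  N: 14.0030740048,
  O: 15.99491461956,
};

// Sum element masses for a simple formula such as "C6H12O6".
function monoisotopicMass(formula: string): number {
  if (formula.length === 0) {
    throw new Error("empty formula");
  }
  const pattern = /([A-Z][a-z]?)(\d*)/g; // element symbol, then an optional count
  let total = 0;
  let consumed = 0;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(formula)) !== null) {
    // A gap between matches means characters the pattern could not read.
    if (match.index !== consumed) {
      throw new Error(`unexpected text at position ${consumed} in "${formula}"`);
    }
    const elementMass = ISOTOPE_MASSES[match[1]];
    if (elementMass === undefined) {
      throw new Error(`no isotope mass listed for "${match[1]}"`);
    }
    const count = match[2] === "" ? 1 : Number(match[2]);
    total += elementMass * count;
    consumed += match[0].length;
  }
  if (consumed !== formula.length) {
    throw new Error(`unexpected text at position ${consumed} in "${formula}"`);
  }
  return total;
}

for (const formula of ["H2O", "CH4", "C6H12O6"]) {
  console.log(`${formula}: ${monoisotopicMass(formula).toFixed(6)}`);
}
```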
Hi @ns-rse! Sorry for the delayed response! First bit that might be confusing: the ...
With that being said, the most up-to-date grammar exists here: https://github.com/TheLostLambda/smithereens/tree/main/grammar (the HTML is very nice to look at). I have more information about the semantics of these structures elsewhere, but that's something I still need to implement in code. For the purposes of getting PGFinder 2.0 out and unblocking a paper, I think all you'll need is to copy this pre-compiled ...
Once this third iteration of the code is done, it should be a drop-in replacement for the ...
On Rust, I'll admit this is me selfishly trying to reduce the amount of duplicate work done in Python. For PGFinder (the Python application + web UI), it is annoying to introduce a third language into the mix, but the reasons I think it's currently the best path are:
In the long term, then, I'm planning to avoid having too many languages by replacing Python with Rust, and this current three-language state is a compromise to get PGFinder 2.0 and the corresponding paper out before that larger project is complete (which could still be some months away). I'm happy to chat more or have a meeting if I sound crazy or if people have more questions, but that's how I've been thinking of things, at least!
Asynchronicity is fine by me, and 7 hours isn't long. Thanks for the update on all these aspects; I'll check out the branch and start working out how to do things. How are the artefacts in that folder created? As I wrote, I'm not averse to using Rust going forward, and the fact that existing tools are being linked in and leveraged is a good reason. My perspective is very much about maintainability of the code: you won't always be around or available to work on issues, so regardless of the language used it will be important to have not just a working tool but also...
In this regard, the following guidelines used in reviewing software are useful...
One obvious omission I've noticed is the lack of a license applied to ...
Hi @ns-rse! The artefacts in that folder are created by ...
I agree with the importance of maintainability here! I'm making a large effort in that regard with this third iteration; you need to use the command in the ...
I've not yet put that workflow into CI form, since it's currently very much in single-developer prototype mode, and documentation hasn't been written yet because it's not clear what the public API will be! The code is split into several stand-alone libraries that will eventually be pulled together by a single application, but most of these parts will be reusable for other purposes. Thanks for pointing out the license! I actually didn't know that it defaulted to something closed!
Just a quick comment to follow up on Brooks' messages. We're in quite a special situation with the addition of the masscalc and fragment predictor in the WebUI. Given this preamble, I suggest taking a pragmatic approach:
Sometimes a dirty job is good enough!
Hi @TheLostLambda and @smesnage, apologies for the slight delay. I'm trying to time-box work to specific days as I juggle multiple projects, since I've a bad habit of getting sucked in and spending too much time on one thing when I should be working on others. Wednesday is my PGFinder day, so I will forge ahead tomorrow with reviewing the PR and understanding WebAssembly. I wanted to say, though, that it's good to hear you're working on the tests and documentation of everything, @TheLostLambda, and that I understand the need to get something working out, @smesnage.
I've made some progress with my understanding of JavaScript/TypeScript/HTML/CSS/Svelte/Rust/WebAssembly. Work-in-progress is on the ...
In checking what is required, though, I see that in the mock-up PowerPoint slides there are some items under "Built-In" for the "Muropeptide list" which I'm not sure about and wanted to check. For the target structures I have a file with...
Do these cover all six species listed, or is there meant to be a single file for each species, from which users can select one or upload their own custom file?
Hi,
I'm replying here because I have no clue where to find this on GitHub. Total
mystery.
The list below is just to provide the user with a sample of structures that
show the syntax that describes peptidoglycan fragments. There were some
mistakes, but I have checked with Brooks and the list below in bold is the
right one. Oops.
Structure
*gmgm* CHANGED: drop the `-` or use `~` instead
*gm(Anh)*
*gm(DeAc)g(DeAc)mgm (DeAc)*
*gm-AEJ*
*gm-AEJA*
*gm-AEJG*
*gm-AEJAG*
*gm-AEJKR*
*gm-AEJ (Anh)* CHANGED: needed a space!
*gm-AEJA (Anh)* CHANGED: needed a space!
*gmgm-AEJA* CHANGED: drop the `-` or use `~` instead
*gm-AEJA=gm-AEJA (4-3)*
*gm-AEJ=gm-AEJA (3-3)* CHANGED: wrong order, 3-3 implies the donor is the left / first structure
*gm-AEJ=gm-AEJ (3-3)*
*gm-AEJA=gm-AEJKR (4-3)* CHANGED: wrong order, 4-3 implies the donor is the left / first structure
*gm-AEJA=gm-AEJA (Anh) (4-3)*
*gm-AEJA=gm-AEJAG (4-3)* CHANGED: wrong order, 4-3 implies the donor is the left / first structure
*gm-AEJ=gm-AEJKR (Anh) (3-3)* CHANGED: wrong order, 3-3 implies the donor is the left / first structure
*gmgm-AEJA=gm-AEJA (4-3)* CHANGED: drop the `-` or use `~` instead
*gm(DeAc)-AEJA=gm-AEJA (4-3)* CHANGED: `-Ac` to `DeAc`
*gm-AEJ=gm(DeAc)-AEJA (Anh) (3-3)* CHANGED: `-Ac` to `DeAc` and wrong order, 3-3 implies the donor is the left / first structure
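The ordering rule above (the donor is the left-hand structure) can be partly checked mechanically. A sketch, assuming a dimer is written as `donor=acceptor`, optionally followed by space-separated modifications such as `(Anh)`, and ending in a `(d-a)` crosslink type; `parseDimer` and `stemLength` are hypothetical helpers, not part of smithereens. It rejects a dimer whose donor stem is too short to have the residue named by the crosslink, for example `gm-AEJ=gm-AEJA (4-3)`, where the tripeptide cannot be the 4-3 donor.

```typescript
interface Dimer {
  donor: string;
  acceptor: string;
  modifications: string[];
  donorPosition: number;
  acceptorPosition: number;
}

// donor=acceptor, then zero or more " (Mod)" groups, then " (d-a)".
const DIMER_PATTERN = /^(\S+)=(\S+)((?: \([A-Za-z]+\))*) \((\d+)-(\d+)\)$/;

// Number of amino acids in the stem peptide: uppercase letters after the first "-",
// ignoring any parenthesised modifications.
function stemLength(monomer: string): number {
  const dash = monomer.indexOf("-");
  if (dash === -1) {
    return 0;
  }
  const peptide = monomer.slice(dash + 1).replace(/\([^)]*\)/g, "");
  return (peptide.match(/[A-Z]/g) ?? []).length;
}

function parseDimer(text: string): Dimer {
  const m = DIMER_PATTERN.exec(text.trim());
  if (m === null) {
    throw new Error(`not written as "donor=acceptor (d-a)": "${text}"`);
  }
  const groups = m[3].trim();
  const modifications = groups === "" ? [] : groups.split(" ").map((group) => group.slice(1, -1));
  const dimer: Dimer = {
    donor: m[1],
    acceptor: m[2],
    modifications,
    donorPosition: Number(m[4]),
    acceptorPosition: Number(m[5]),
  };
  if (stemLength(dimer.donor) < dimer.donorPosition) {
    throw new Error(`donor "${dimer.donor}" has no residue ${dimer.donorPosition}; are donor and acceptor swapped?`);
  }
  if (stemLength(dimer.acceptor) < dimer.acceptorPosition) {
    throw new Error(`acceptor "${dimer.acceptor}" has no residue ${dimer.acceptorPosition}`);
  }
  return dimer;
}
```

This only catches some ordering mistakes: for a 3-3 link both stems usually have a third residue, so a swapped `gm-AEJA=gm-AEJ (3-3)` still passes, and that case would need chemistry-aware rules.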
The rules are as follows:
- the Anh and DeAc modifications apply to the residue that precedes them, OR can be on either residue if you add a space;
- g(DeAc)m means that g is deacetylated, m is not
- gm(DeAc) means m is deacetylated, g is not
- gm (DeAc) means either g or m is deacetylated
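The attachment rules above can be written as a small parser for the glycan part of a structure. A sketch, assuming lowercase letters are sugar residues, a `(Mod)` written directly after a residue belongs to that residue, and a `(Mod)` after a space is unlocalised (it may sit on any residue); `parseGlycan` and its output shape are hypothetical, and peptides are out of scope here.

```typescript
interface GlycanResidue {
  symbol: string;
  modifications: string[];
}

interface Glycan {
  residues: GlycanResidue[];
  unlocalised: string[]; // modifications written after a space: on some residue, position unknown
}

function parseGlycan(text: string): Glycan {
  const [chain, ...rest] = text.split(" ");
  const residues: GlycanResidue[] = [];
  // Either a single sugar letter or a parenthesised modification name.
  const pattern = /([a-z])|\(([A-Za-z]+)\)/g;
  let consumed = 0;
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(chain)) !== null) {
    if (m.index !== consumed) {
      throw new Error(`unexpected text at position ${consumed} in "${chain}"`);
    }
    consumed += m[0].length;
    if (m[1] !== undefined) {
      residues.push({ symbol: m[1], modifications: [] });
    } else {
      // A directly attached modification belongs to the residue just before it.
      const previous = residues[residues.length - 1];
      if (previous === undefined) {
        throw new Error(`modification (${m[2]}) has no preceding residue`);
      }
      previous.modifications.push(m[2]);
    }
  }
  if (consumed !== chain.length) {
    throw new Error(`unexpected text at position ${consumed} in "${chain}"`);
  }
  const unlocalised = rest
    .filter((part) => part !== "")
    .map((part) => {
      const match = /^\(([A-Za-z]+)\)$/.exec(part);
      if (match === null) {
        throw new Error(`expected "(Mod)" after a space, found "${part}"`);
      }
      return match[1];
    });
  return { residues, unlocalised };
}
```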
Brooks' script will be able to spot mistakes and fix them, so let's provide
this model syntax and let's not go over the top with documentation for now.
I can certainly do it and I would rather you spent your time on stuff that
only you can do ($$$$$...).
Let me know if you have any questions, I hope this is helping!
St
I don't understand the question below:
*Do these cover all of the six species listed or is there meant to be a
single file for each species and users are then able to select which or
upload their own custom file?*
I believe these examples are representative of the diversity of fragments
that users will include in their database.
Let me know if you have any questions!
Hi @smesnage
Thanks for that. I think the finer details of what is included can be wrangled at a later date; I was just curious whether there would be multiple files from which users could choose or not (as is the case in the existing functionality), as that influences how the dialogue would be created.
I feel like I'm slowly getting the hang of how the website framework functions and hope to have the layout in place after another day on it next week. Hooking it in, so that the "Build database" button does what it needs to by leveraging the smithereens programme, would be the next step after that (I'm trying to walk before running!).
Ah, I understand the question now. In fact you already asked and the answer
is yes, I need to provide these for model organisms:
- *Bacillus subtilis*
- *Staphylococcus aureus*
- *Enterococcus faecalis*
- *Enterococcus faecium*
St
Cool, thanks for the confirmation.
I can simply make dummy files for the time being and they can be replaced
once you've got them ready.
@TheLostLambda: I've hit an impasse with Svelte and am unsure how to proceed. Web development is not something I've done before (nor have I used JavaScript/TypeScript), so your advice and guidance would be very much appreciated.
Things I have understood
How to include data: The mass data is loaded by functions in the ...
How to add a card: Top-level page layout is in ...
Generalise some
@ns-rse Would you be available for a Google Meet or Zoom call soon? I can do early in my morning to fit within your working hours. If not, I can write up a proper reply! Either way, it's good to see some architecture diagramming come together!
It would definitely be good to go through all of the documentation you've been writing on a call! But in the meantime briefly:
For ...
Hmm, I wrote a reply late last night but it seems I didn't submit it. I've sent you an invitation via Google to chat today at 16:00 (BST) if you're free.
Getting JavaScript and Python to play ball is all new to me.
Ah, now this is something major that I'd not clocked. I'll check out that commit and see if I can test locally, having to use ...
I only get errors when the page fails to render. I don't even know what ...
@ns-rse Whoops, I saw an invite at some point; is it for now?
Progress of sorts!
I'm now starting to work on how, once these have been selected, to call a function that runs Smithereens. I attempted to adapt the logic in ... I'm relying heavily on the existing example you've written in ...
// Reactively compute if Smithereens is ready
$: SmithereensReady = !loading && !processing && pyio.fragmentsLibrary !== undefined && pyio.muropeptidesLibrary !== undefined;
Branch is ...
Hi @ns-rse! I'll jot down some notes here!
It's very possible that you're just working with things in a different directory, but for me I needed to move the ...
As for the ...
Probably not a core issue, but something flagged up by my type-checker: the ...
The ...
The issue with the ...
Custom ...
Same missing ...
I've pushed a commit with the working directory I ended up with, including some messy ...
Oooh, I was close! Thank you for all of that, really helpful.
Not sure about this; I've maintained the repository with ...
Thanks for the explanation/pointer; that is going to be really useful. I haven't found the Svelte examples that informative yet. Once I've got it working as it currently is, I'll perhaps look at replacing it with RadioGroups.
I had a suspicion I had missed something, again thank you for finding and correcting.
Aha, I didn't know about adding debugging that way, thank you for the pointer. I hadn't got round to trying to upload custom CSV files yet either, as I've been focusing on loading the default (dummy) files I've added. Good to know that is already working.
I thought I was missing something for
Spot on, that got me back on track and I'll start working on using the selected files with Smithereens. Thank you 🙏
Hi @TheLostLambda, I'm stumped again as I'm unsure which functions/classes have been compiled into the Smithereens WebAssembly. I'm using the Rust WASM "hello world" example as a basis, and looking through the files there are...
Hi @ns-rse !
Let me know if that's the info you were after! Definitely check out that example on ...
Again, really helpful. I'll have a look at this tomorrow and hope to make progress. Cheers 👍
It took a bit longer to look at this, and whilst the example was useful I'm struggling to adapt it to work with Svelte. Currently I have at least two problems that I'm aware of.
Python
Progress: I've added the example files to ...
I've put in place all the boilerplate code to add these to Svelte. I have a box which gets the JSON files, reads the metadata and allows selection of components, and there is a "Build Database" button to build the database of muropeptide masses based on the provided mass library.
Problem: I don't understand how to make this button reactive. For processing with PGFinder the ...
I've added a ...
Rust WASM
The example you pointed me to was useful, so I added
This throws an error...
...and so I duly installed
...but I still get the above complaint/error that I need to use ...
Thus, aside from not yet being able to load the two libraries because of the Python problem, I wouldn't be able to do anything with them because I can't import the Rust WASM compiled ...
It feels like I've gone round in circles tinkering with x, y and z, and I've read through the Rust WASM pages multiple times without gaining any insight. Any suggestions as to how to 1) run the ...
Current state of play is on ns-rse/259-muropeptides-fragment 😕
Hi @ns-rse! Working in reverse here: these web bundlers (like Vite, used here) have always been a pain point for me as well, but there is a Vite plugin (a bit different from the one you've added) that I've used successfully before: https://github.com/nshen/vite-plugin-wasm-pack. That one is meant to work with Rust's wasm-pack in particular! These bundler plugins are configured in their own file, and the bundler then automatically resolves imports when it comes across them! Following the manual install process, I end up with:
import { purgeCss } from 'vite-plugin-tailwind-purgecss';
import { sveltekit } from '@sveltejs/kit/vite';
import { defineConfig } from 'vite';
import wasmPack from 'vite-plugin-wasm-pack';
export default defineConfig({
plugins: [sveltekit(), purgeCss(), wasmPack('./smithereens')],
worker: {
format: 'es'
}
});
(I moved the ...)
Then in ...:
import { onMount } from 'svelte';
import init, { Peptidoglycan, pg_to_fragments } from 'smithereens';
// ...
onMount(() => {
init().then(() => {
console.log("smithereens wasm loaded!");
})
})
// ...
function runSmithereensAnalysis() {
let pg = new Peptidoglycan("gm-AEJA")
console.log(`Monoisotopic Mass : ${pg.monoisotopic_mass()}`);
console.log(`Fragments :\n ${pg_to_fragments(pg)}`);
}
To fix another bundler error, I needed to add ...
Here is the example I used to figure out that loading process! https://github.com/nshen/vite-plugin-wasm-pack/blob/main/example/src/index.ts
After that, and on the current version of the branch I just pushed, things are working! You can call the smithereens WASM functions!
As for the first half of your question, the PGFinder Pyodide code all runs in a separate "thread" from the main JavaScript (called a web worker: https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers), and its API is very minimalist. You can pass arbitrary messages back and forth, and it's up to you to write code that distinguishes between the different events being sent over the channel. So the most direct answer to your question might be adding a new message type, using the same ...
With that being said, I think there is probably a better way to do this! I think we should create a new web worker, since smithereens and pgfinder can (and should) work independently of each other! That would mean creating something like a ...
Using the ...
Sorry that's not super exhaustive, but hopefully it's enough to get you unstuck on the WASM loading front, and gives you a place to start with making a new web worker for smithereens to live in! Let me know if you have any more questions!
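A dedicated smithereens worker needs a message protocol so it can tell requests apart. A sketch of one, assuming a discriminated union keyed on `type` with a request `id` echoed in every reply; the message names (`"mass"`, `"massBatch"`, `"error"`) and `handleRequest` are hypothetical. The mass calculation is injected so the dispatch logic can be tested without loading the WASM; in the worker it could wrap `new Peptidoglycan(structure).monoisotopic_mass()` from the snippet above, assuming that returns a number.

```typescript
// Requests the page could post to a dedicated (hypothetical) smithereens worker.
type SmithereensRequest =
  | { type: "mass"; id: number; structure: string }
  | { type: "massBatch"; id: number; structures: string[] };

// Every reply echoes the request id so the page can match replies to requests.
type SmithereensResponse =
  | { type: "mass"; id: number; structure: string; mass: number }
  | { type: "massBatch"; id: number; masses: Record<string, number> }
  | { type: "error"; id: number; message: string };

// Dispatches on `type`. The mass function is injected so this logic can be tested
// without the WASM module; errors are returned as messages instead of being thrown.
function handleRequest(
  request: SmithereensRequest,
  computeMass: (structure: string) => number,
): SmithereensResponse {
  try {
    if (request.type === "mass") {
      return { type: "mass", id: request.id, structure: request.structure, mass: computeMass(request.structure) };
    }
    const masses: Record<string, number> = {};
    for (const structure of request.structures) {
      masses[structure] = computeMass(structure);
    }
    return { type: "massBatch", id: request.id, masses };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return { type: "error", id: request.id, message };
  }
}

// In a worker module this might be wired up roughly as (not run here):
// self.onmessage = (event) => self.postMessage(handleRequest(event.data, computeMass));
```

Keeping the handler a pure function means the page and the worker only share the two types, and errors come back as ordinary messages rather than silently killing the worker.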
Thanks for that @TheLostLambda, I'll have a go at working through all this, and will no doubt get stuck again and be back with more questions. 😄
Some progress, but I've still hit a rock and have little idea of what I'm doing.
Rust WASM
@ns-rse Just a quick bit, since I'm now in a time zone for semi-immediate help: for the first bit, I think you might be missing a call to the init function for the WASM?
pgfinder/web/src/routes/+page.svelte Lines 94 to 96 in c270df2
And that will need to be imported like this:
pgfinder/web/src/routes/+page.svelte Line 30 in c270df2
(I'll have a closer look at your second part soon!)
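One way to avoid the missing-`init` problem is to make the WASM initialisation idempotent and await it before any smithereens call. A sketch, assuming the `init` default export behaves like the one generated by wasm-pack (an async function that loads the module); the `once` helper is hypothetical, not part of wasm-pack or Svelte.

```typescript
// Wraps an async initialiser so it runs at most once; later callers share the same promise.
// Hypothetical helper: `initialise` stands in for something like the wasm-pack `init` export.
function once<T>(initialise: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      pending = initialise(); // first caller starts the load
    }
    return pending; // everyone awaits the same load
  };
}
```

In `+page.svelte` this might look like `const ensureSmithereens = once(init);` followed by `await ensureSmithereens();` before constructing a `Peptidoglycan`. A rejected load stays cached in this sketch, so retrying would require clearing `pending` on failure.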
Okay, I read the second bit and I'm back in the UK! So hopefully that makes time-zone stuff a non-issue! Just let me know whenever you'd like to meet!
Very much unfinished; I attempted to do as [suggested](#259 (comment)) but didn't get very far. Details of failures are in [comment](#259 (comment)).
I've tried adding that; it's on line 35 (it looks like your local branch, or at least the commit you've linked to, is about four commits behind the ...).
If you're free this afternoon, I'm around until 17:00 and will put something in the calendar.
From @ns-rse:
For this bit, it looks like you already have a ...
pgfinder/web/src/routes/+page.svelte Line 187 in e91e14a
One issue with that particular line is that the ...
If everything is plumbed up correctly in the component itself, then that should just work. In the component, you just need to make sure that "prop" is exported from the component:
value in bind:value is coming from!)
Then that has to be bound to the value of the radio boxes within the component's body:
bind:value is shorthand for bind:value={value}, I believe)
Then again in that component, you see the same pattern, and end up with: pgfinder/web/src/routes/BuiltinFragmentsSelector.svelte Lines 14 to 15 in e91e14a
Ultimately, though, I think that's all set up correctly, so all that really needs fixing is that typo in the name of the ...
And with all of the rest of that said, perhaps I've misinterpreted your question! So here is another answer! If you're talking about how selecting built-in mass databases for PGFinder returns an object with only the filename and not the content (the ...
If you upload a database from your computer, then you should see the ...
Since we don't have these fragment / muropeptide databases built into ...
If you want more information on doing this bit, just let me know and I'm happy to help more!
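A sketch of loading a built-in database's content on demand, assuming a selection carries only `name` for built-in files and `name` plus `content` for uploads, and that built-in files are served as static assets under a base URL. The `DatabaseSelection` type, the `/databases/` prefix and `resolveContent` are hypothetical; the fetcher is passed in (the browser's `fetch` fits its signature) so the logic can be tried without a server.

```typescript
// Hypothetical shape of a selection: uploaded files already carry their content,
// built-in ones only know their file name.
interface DatabaseSelection {
  name: string;
  content?: string;
}

// The small subset of the Fetch API used here, so a stub can stand in for `fetch`.
type Fetcher = (url: string) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

async function resolveContent(
  selection: DatabaseSelection,
  fetcher: Fetcher,
  baseUrl = "/databases/",
): Promise<string> {
  if (selection.content !== undefined) {
    return selection.content; // uploaded file: nothing to fetch
  }
  const response = await fetcher(baseUrl + encodeURIComponent(selection.name));
  if (!response.ok) {
    throw new Error(`could not load built-in database "${selection.name}" (HTTP ${response.status})`);
  }
  return response.text();
}
```

In the page this would be called as `await resolveContent(selection, fetch)` before handing the text to PGFinder or smithereens, so both built-in and uploaded files reach them in the same form.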
Important message!! B. subtilis.csv
Hi @smesnage, I've been a bit quiet as I've had a lot of learning to do to work with the Svelte web framework the site is implemented in, but I am very close to having something that works. It runs locally OK and a short demo video is shown below. I am incredibly indebted to @TheLostLambda for his patient guidance and assistance when I got stuck, though.
pgfinder_eg2.webm
I've made a pull request to merge this into the ...
Officially, my time on this work has ended, but I would like to get the above sorted so that the work is complete. I can spend a couple of hours a week working out how to adapt the ...
It's been a really steep learning curve for me (I'd never used JavaScript/TypeScript/Rust/WebAssembly/Playwright before), but I can see a path to having a working system.
Closed by #285
PGFinder 2.0.pptx