quantmsio and thermorawfileparser #80

Open
ypriverol opened this issue Oct 21, 2024 · 11 comments

@ypriverol
Member

We have a schema for mz data that is similar to @lazear's representation (issue #57). We are moving ThermoRawFileParser to dotnet, and it would be great to take the opportunity to include this parquet representation in it. I recommend the following roadmap:

  • Get approval from everyone on the parquet format: @timosachsenberg @mobiusklein @lazear @daichengxin @zprobot.
  • Update the PR in ThermoRawFileParser to export to the agreed version of the format.
  • Benchmark the format against raw and mzML, with different dataset sizes, for the following variables:
    • compression,
    • data retrieval,
    • data writing.
ypriverol self-assigned this Oct 21, 2024
@lazear
Collaborator

lazear commented Oct 21, 2024

I have a (private) fork of TRFP that already writes the mz_parquet format, which I am happy to contribute.

@ypriverol
Member Author

ypriverol commented Oct 21, 2024

That is fantastic. I think the best way is to contribute to the new dotnet branch we are creating: https://github.com/compomics/ThermoRawFileParser/tree/dotnetcore. Dotnet is now the platform for the new MSReader, so I think it is better to base everything on that. What do you think?

@lazear
Collaborator

lazear commented Oct 21, 2024

Should be easy enough, I have it on dotnet 7 right now.

@ypriverol
Member Author

ypriverol commented Oct 21, 2024

@lazear, if you can modify that branch with the parquet implementation, making sure it matches both your specification and the quantms.io specification, I can try to trigger the benchmark.

@timosachsenberg
Contributor

Cool. FYI, I started playing around with ways to store both spectra and chromatograms in, e.g., a single parquet file. This is mainly intended to explore some potential ways forward with the HUPO PSI mzNext initiative. Once the white paper with the requirements is published, I will give this a closer look.
https://github.com/timosachsenberg/mzNext-POC

@ypriverol
Member Author

Should be easy enough, I have it on dotnet 7 right now.

@lazear, it would be cool to merge your branch with this branch in TRFP: https://github.com/compomics/ThermoRawFileParser/tree/dotnetcore

@lazear
Collaborator

lazear commented Nov 15, 2024

So it turns out that what I have is not a fork of TRFP but a separate implementation in C#. Let me see if I can adapt it.

@ypriverol
Member Author

That would be great. We are working to release a new dotnet version of TRFP.

@lazear
Collaborator

lazear commented Nov 15, 2024

@ypriverol
Member Author

@lazear, thanks for the PR. Last time we discussed mz_parquet and the alignment between your representation and the quantms.io representation (I decided to adopt yours), you mentioned you have a private version. I will try to push an update to the representation. My question is: is the version in the TRFP PR the latest version of the mz_parquet definition?

@lazear
Collaborator

lazear commented Nov 16, 2024

Yes, this is the most recent version.
