Using DIA-NN results #11

Open
Maithy15 opened this issue Mar 24, 2023 · 4 comments
Comments

@Maithy15

Hi,

Thanks for the nice tool. It would be great if py_diaid could use DIA-NN output as well.

Thanks
Maithy

@Cajun-data

This appears to have been added in the latest version of pyDIAID; however, it isn't clear to me which DIA-NN output file pyDIAID expects. I'm also not certain pyDIAID is compatible with DIA-NN 1.9.2 output at the moment. It would be great if a developer could chime in on these questions about using DIA-NN output.

@Cajun-data

There are some clues in loader_proteomics_library.py. Specifically:

```
dataframe (pd.DataFrame): imported library file from the analysis software
    "DIANN". File format: .csv, required columns:
    'PrecursorMz',
    'IonMobility',
    'PrecursorCharge',
    'ProteinName',
    'ModifiedPeptide'.
```
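
As a quick pre-flight check before handing a data frame to pyDIAID, it can be tested against the five required columns listed above. This is only a sketch: `check_pydiaid_library` is a hypothetical helper name, and the numeric / no-NA checks on `PrecursorMz`, `IonMobility` and `PrecursorCharge` are my own assumption about what a usable library looks like, not something the loader's docstring states.

```r
# Hypothetical helper: checks a data frame against the columns that pyDIAID's
# "DIANN" loader lists as required. The numeric / no-NA checks are an assumption.
check_pydiaid_library <- function(df) {
  required <- c("PrecursorMz", "IonMobility", "PrecursorCharge",
                "ProteinName", "ModifiedPeptide")
  problems <- character(0)

  # Report any required column that is absent
  missing <- setdiff(required, names(df))
  if (length(missing) > 0) {
    problems <- c(problems, paste("missing column:", missing))
  }

  # Only inspect the numeric columns that are actually present
  numeric_cols <- intersect(c("PrecursorMz", "IonMobility", "PrecursorCharge"), names(df))
  for (col in numeric_cols) {
    if (!is.numeric(df[[col]])) {
      problems <- c(problems, paste("not numeric:", col))
    } else if (anyNA(df[[col]])) {
      problems <- c(problems, paste("contains NA:", col))
    }
  }
  problems  # character(0) means no problems were found
}

# Toy example (made-up values): IonMobility is absent, so one problem is reported
lib <- data.frame(PrecursorMz     = c(463.72, 502.25),
                  PrecursorCharge = c(2L, 2L),
                  ProteinName     = c("P1", "P2"),
                  ModifiedPeptide = c("PEPTIDEK", "ACDEFGHIK"),
                  stringsAsFactors = FALSE)
check_pydiaid_library(lib)
#> [1] "missing column: IonMobility"
```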

Given those required columns, in R (my preferred language) I can convert a .parquet DIA-NN spectral library to this format:

```r
# Convert a DIA-NN .parquet spectral library to the .csv format pyDIAID expects
library(arrow)      # read_parquet()
library(tidyverse)  # dplyr verbs and %>%

# Load the Parquet spectral library
df <- read_parquet("DIA_Library.parquet")

# pyDIAID's "DIANN" loader expects a .csv with these columns:
#   'PrecursorMz', 'IonMobility', 'PrecursorCharge', 'ProteinName', 'ModifiedPeptide'
df <- df %>%
  # rename(new_name = old_name): map DIA-NN parquet columns to pyDIAID names
  rename(IonMobility     = IM,
         PrecursorMz     = Precursor.Mz,
         PrecursorCharge = Precursor.Charge,
         ProteinName     = Protein.Names,
         ModifiedPeptide = Modified.Sequence) %>%
  select(PrecursorMz, IonMobility, PrecursorCharge,
         ProteinName, ModifiedPeptide) %>%
  # Keep one row per precursor (modified sequence + charge)
  distinct(ModifiedPeptide, PrecursorCharge, .keep_all = TRUE)

# Save as CSV
write.csv(df, "DIA_Library.csv", row.names = FALSE)
```
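
For anyone who would rather not depend on the tidyverse, here is a minimal base-R sketch of the same rename/select/deduplicate step. It assumes the parquet column names used in the code above (`IM`, `Precursor.Mz`, `Precursor.Charge`, `Protein.Names`, `Modified.Sequence`); `to_pydiaid_library`, `column_map` and the toy data are made up for illustration, and reading the parquet file itself would still need `arrow::read_parquet()`.

```r
# Mapping from pyDIAID column names (left) to the DIA-NN parquet column names
# used in the dplyr version (right). Assumed from that code, not verified.
column_map <- c(PrecursorMz     = "Precursor.Mz",
                IonMobility     = "IM",
                PrecursorCharge = "Precursor.Charge",
                ProteinName     = "Protein.Names",
                ModifiedPeptide = "Modified.Sequence")

# Hypothetical helper: select + rename, then keep the first row per precursor
# (modified sequence + charge), mirroring distinct(..., .keep_all = TRUE)
to_pydiaid_library <- function(lib, map = column_map) {
  out <- lib[, unname(map), drop = FALSE]  # errors if a source column is missing
  names(out) <- names(map)
  keep <- !duplicated(out[, c("ModifiedPeptide", "PrecursorCharge")])
  out <- out[keep, , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Toy input (made-up values) with one repeated precursor row
lib <- data.frame(Modified.Sequence = c("PEPTIDEK", "PEPTIDEK", "PEPTIDEK", "ACDEFGHIK"),
                  Precursor.Charge  = c(2L, 2L, 3L, 2L),
                  Precursor.Mz      = c(463.72, 463.72, 309.48, 502.25),
                  IM                = c(0.91, 0.91, 0.78, 0.95),
                  Protein.Names     = c("P1", "P1", "P1", "P2"),
                  stringsAsFactors  = FALSE)

res <- to_pydiaid_library(lib)
nrow(res)  # 3: the repeated PEPTIDEK/2+ row is dropped
write.csv(res, file.path(tempdir(), "DIA_Library.csv"), row.names = FALSE)
```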

@Maithy15

Maithy15 commented Nov 5, 2024

Did the converted format work for you?

@Cajun-data

> Did the converted format work for you?

Yep, the above code works for me. Just make sure to use the right parquet file, since DIA-NN typically writes several .parquet files to the output folder. I used the one that corresponds to an experimentally derived spectral library.
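
Since the output folder can contain several .parquet files, one way to narrow down the candidates is to check which of them actually contain the columns the conversion reads. This is a rough sketch under the assumption that the column names in the conversion code above are the right ones; `files_with_columns` and the file names below are hypothetical, and it only checks column names, not whether a library was experimentally derived.

```r
# Source columns the conversion code reads from the DIA-NN parquet file (assumed)
needed <- c("Precursor.Mz", "IM", "Precursor.Charge",
            "Protein.Names", "Modified.Sequence")

# Hypothetical helper: given a named list of column-name vectors (one per file),
# return the names of the files that contain every needed column
files_with_columns <- function(columns_by_file, needed_cols = needed) {
  ok <- vapply(columns_by_file,
               function(cols) all(needed_cols %in% cols),
               logical(1))
  names(columns_by_file)[ok]
}

# With arrow installed, the column names could be collected like this
# (the folder name is a placeholder; read_parquet() loads each whole file):
# paths <- list.files("diann_output", pattern = "\\.parquet$", full.names = TRUE)
# cols  <- setNames(lapply(paths, function(p) names(arrow::read_parquet(p))), paths)
# files_with_columns(cols)

# Toy example with made-up column lists
cols <- list(file_a.parquet = c("Run", "Precursor.Id", "Modified.Sequence"),
             file_b.parquet = c(needed, "Extra.Column"))
files_with_columns(cols)
#> [1] "file_b.parquet"
```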
