Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AG-1395: Bump harmonized targets source version to pick up delimiter fix #152

Merged
merged 3 commits into from
Oct 29, 2024

Conversation

JessterB
Copy link
Contributor

No description provided.

Copy link
Contributor

@jaclynbeck-sage jaclynbeck-sage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I take back my approval until CI has re-run. I see this may have caused an issue with gene_info but want to let CI run again to make sure.

Copy link
Contributor

@jaclynbeck-sage jaclynbeck-sage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah ok, it looks like something about the new file did something bad to gene_info processing. What is different about the new file?

@JessterB
Copy link
Contributor Author

Yeah ok, it looks like something about the new file did something bad to gene_info processing. What is different about the new file?

@jaclynbeck-sage I converted some semi-colon delimiters to commas, and forgot to escape them. I'll update the source and verify locally before I push up a new commit...sorry, I should have tested this locally first.

@jaclynbeck-sage
Copy link
Contributor

Thanks. It looks like something is... off with this file (as well as its previous versions 48 and 47 which is as far back as I looked). The raw file has 1362 lines in it, but if you read it in as a CSV either programmatically or with Excel it only has 1162 lines (versions 48-50). For v47 it as 1334 lines / 1134 lines, so a 200-line discrepancy in both versions.

Update: It looks like starting line 320, The Emory data has line breaks in the Data_used_to_support_target_selection column, which are escaped, so it does get read in successfully as one field. I guess that's ok? The raw file looks like:

320  ...,"Discovery quantitative proteomics of FrCx 
321    WPCNA of multiple and consensus cohorts
322    ANOVA",...

and the read-in string looks like:
"Discovery quantitative proteomics of FrCx \n WPCNA of multiple and consensus cohorts\n ANOVA"

Is this something we're aware of/handle on the front end for display?

@JessterB
Copy link
Contributor Author

JessterB commented Oct 28, 2024

Thanks. It looks like something is... off with this file (as well as its previous versions 48 and 47 which is as far back as I looked). The raw file has 1362 lines in it, but if you read it in as a CSV either programmatically or with Excel it only has 1162 lines (versions 48-50). For v47 it as 1334 lines / 1134 lines, so a 200-line discrepancy in both versions.

Update: It looks like starting line 320, The Emory data has line breaks in the Data_used_to_support_target_selection column, which are escaped, so it does get read in successfully as one field. I guess that's ok? The raw file looks like:

320  ...,"Discovery quantitative proteomics of FrCx 
321    WPCNA of multiple and consensus cohorts
322    ANOVA",...

and the read-in string looks like: "Discovery quantitative proteomics of FrCx \n WPCNA of multiple and consensus cohorts\n ANOVA"

Is this something we're aware of/handle on the front end for display?

@jaclynbeck-sage This seems to look ok in Agora in prod today (for the 2 genes I checked, RPH3A & STX1A): mv68 -> gene_info v56 -> harmonized targets v48

Screenshot 2024-10-28 at 1 11 24 PM

...but I could remove those /n since they are confusing and useless.

@jaclynbeck-sage
Copy link
Contributor

jaclynbeck-sage commented Oct 28, 2024

...but I could remove those /n since they are confusing and useless.

Maybe? I'm guessing the line breaks are to indicate these are 3 separate items, not one single discovery method, so the way it's displaying in Agora right now isn't right because it looks like it's all mashed together in one sentence. I think probably the \n should be replaced with commas but also there's extra spaces around the \n that need to be dealt with in order for commas to look normal.

Copy link

sonarcloud bot commented Oct 28, 2024

@JessterB
Copy link
Contributor Author

JessterB commented Oct 28, 2024

@jaclynbeck-sage Those newlines go back to v1 of the source file from 2018 and no one has complained yet, but I went ahead and fixed it (in v51) by replacing the newlines with commas, and cleaning up extra spaces.

Copy link
Contributor

@jaclynbeck-sage jaclynbeck-sage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thank you!!

@JessterB JessterB merged commit 3382947 into dev Oct 29, 2024
9 checks passed
@JessterB JessterB deleted the AG-1395 branch October 29, 2024 16:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants