Read cytokit csv #77

Merged
merged 12 commits into from
Feb 25, 2020


@mccalluc mccalluc commented Feb 25, 2020

This is preparatory to hubmapconsortium/portal-containers#14

As discussed, the document is large enough that it might be useful to use Arrow.

@mccalluc mccalluc requested a review from manzt February 25, 2020 21:38
@mccalluc mccalluc marked this pull request as ready for review February 25, 2020 21:39
@mccalluc mccalluc (Collaborator, Author) commented

(Sorry, need to update fixture...)

@mccalluc mccalluc closed this Feb 25, 2020
@mccalluc mccalluc reopened this Feb 25, 2020

@manzt manzt left a comment

This looks good to me. I don't think we've added anything Arrow-related to vitessce yet. If file size turns out to be a problem, I will explore generating Arrow output and comparing file sizes. Feel free to merge this, and I'll take a look at Arrow tomorrow.

If you could comment on how large the output currently is, that would be helpful for tomorrow.



def round_conv(s):
    # TODO: Truncating after decimal point might be slightly too aggressive?

I'll look into this with Arrow and compare file sizes.

Perhaps for the x and y coordinates, we should have some precision beyond the decimal point, but I'm not convinced for the gene "levels" this is necessary.
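To make the suggestion above concrete, here is a minimal sketch of a per-column converter that keeps a couple of decimal places for coordinates and whole numbers for everything else. It is not the code in this PR: the `ndigits` parameter, the `convert_row` helper, the `COORD_COLUMNS` set, and the sample column names are all assumptions for illustration, and it rounds rather than truncates.

```python
import csv
import io
import json

# Hypothetical column names; the real Cytokit CSV headers may differ.
COORD_COLUMNS = {"x", "y"}


def round_conv(s, ndigits=0):
    """Parse a numeric string, keeping `ndigits` places after the decimal point."""
    value = round(float(s), ndigits)
    # With ndigits=0, return an int so the JSON holds "4" rather than "4.0".
    return int(value) if ndigits == 0 else value


def convert_row(row, coord_digits=2):
    """Keep some precision for coordinates, whole numbers for other columns."""
    return {
        key: round_conv(val, coord_digits if key in COORD_COLUMNS else 0)
        for key, val in row.items()
    }


# Made-up sample data, only to exercise the converters.
sample_csv = "x,y,CD4\n10.123456,20.987654,3.75\n"
rows = [convert_row(r) for r in csv.DictReader(io.StringIO(sample_csv))]
print(rows)
print(len(json.dumps(rows)), "bytes as JSON")
```

Printing the serialized length for a few values of `coord_digits` gives a rough sense of how much the extra precision costs in output size, which is the same comparison that would be needed against Arrow.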


main() {
  # Download and process data which describes cell locations,
  # and gene expression levels. Multiple JSON output files are produced:

Will there be more output files than just cells.json?
