Add scATAC-seq UMAP DR + clustering -> Arrow container #40
Hmm, this succeeded for me locally through
I'm not familiar enough with the test setup -- what does
mean in the test output? I might have expected something like
I had thought these would be binary identical after verifying that the DataFrame read from the CSV was equal to what was constructed from the h5ad file in h5ad-to-arrow, but apparently "equal" isn't "identical" for these purposes.
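The distinction can be illustrated without pandas or Arrow: two values that compare equal may still serialize to different bytes. Below is a minimal sketch in plain Python, using JSON as a stand-in for the Arrow output. The records, field names, and the `digest` helper are hypothetical and not taken from this PR.

```python
import hashlib
import json


def digest(obj):
    """Hash the serialized bytes, as a byte-for-byte fixture comparison would."""
    return hashlib.sha256(json.dumps(obj).encode("utf-8")).hexdigest()


# Two hypothetical records holding the same data, built in a different key order.
from_csv = {"cell": "A", "umap_x": 1.5, "leiden": 0}
from_h5ad = {"leiden": 0, "umap_x": 1.5, "cell": "A"}

# Dict equality ignores insertion order...
values_equal = from_csv == from_h5ad
# ...but json.dumps preserves it, so the serialized bytes differ.
bytes_identical = digest(from_csv) == digest(from_h5ad)

print(values_equal, bytes_identical)  # True False
```

The same kind of gap may explain the test failure here: an equality check on the in-memory data can pass while ordering, metadata, or library-version details still change the bytes written to disk.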
Adding @ilan-gold as a reviewer. I'll take a look at it, but I really haven't kept up with this code base. (I have this browser tab open, so I'm assuming there was a request for me to look at this in Slack or something, but I've lost track now of how it came to me.) @mruffalo: Our convention has been that after review, and if the tests pass, the author of the PR merges. Fine with me if you'll do that once those conditions are met; if you think it should be in our court, that's fine, too.
@mruffalo I agree about the code duplication. That's unfortunate - it seems like simply changing the input file would have solved it (once it is converted to arrow), but we're past that point now. I'll take a closer look as well.
Not binary identical, and apparently some package version differences on my host machine cause this to differ.
Thanks for the feedback -- I fixed this by updating the
I don't like the code duplication here, but I would advocate for fixing this after the first data release. @mccalluc I'll be happy to merge and publish a tagged Docker container once @ilan-gold gives a 👍
I agree @mruffalo about the timing of de-duplicating code. I don't think this is the last we will hear of needing to generate this sort of information. @mccalluc It may be useful for us to begin discussing soon some sort of
Thanks!
@mruffalo, @ilan-gold: Sorry to be late to the party:
This duplicates a lot of the code in the h5ad-to-arrow code in a way that I'm not very happy about; I didn't realize how much more containers/h5ad-to-arrow/context/main.py does since the last time I looked at it. This works, though, and should allow visualization of scATAC-seq dimensionality reduction and clustering in the same way as scRNA-seq, more-or-less for "free" in the UI code. (At least, that's how I interpreted @mccalluc's comments in a standup today.)

At the moment I don't have many good ideas about how to de-duplicate this code, given how the repository and containers are structured. That seems worth doing ASAP, but not necessarily right now.