Update Click, add .gitignore, format code, and use Poetry #8

Open · sumanthratna wants to merge 47 commits into master

Conversation

@sumanthratna (Contributor) commented Mar 13, 2020

I'm not sure whether this is a PR you want to merge; if it is, I can open another PR with the same changes against the dev_object_detection branch.

This PR:

  • removes directories that usually aren't committed to version control
  • formats code according to black (https://github.com/psf/black)
  • switches to Poetry for package management (for easy updating of dependencies and easy publishing to PyPI)
  • updates Click to 7.1.1 (this is necessary so new libraries such as pymana can use methods from PathFlow without experiencing version conflicts)
    • we'll need to update the Wiki because underscores have been changed to hyphens in Click command names (see the sketch after this list)
  • updates pandas to 1.x
  • rebuilds docs with Sphinx
  • adds testing with pytest
  • adds continuous integration with Travis CI
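
For reference, here's a minimal, stand-alone sketch (not code from this repo) of the Click 7.x behavior behind the Wiki note above: when a command name isn't passed explicitly, Click derives it from the function name and now replaces underscores with hyphens.

import click


@click.group()
def cli():
    """Toy CLI used only to illustrate the renaming."""


@cli.command()  # under Click >= 7.0 the generated command name is "preprocess-pipeline"
def preprocess_pipeline():
    """With older Click this would have been invoked as preprocess_pipeline."""
    click.echo("running preprocessing")


if __name__ == "__main__":
    cli()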

@sumanthratna changed the title from "Add .gitignore" to "Update Click, add .gitignore, format code, and use Poetry" on Mar 14, 2020
@jlevy44 (Owner) commented Mar 14, 2020

Nice!!!! Thanks for linting the repo and updating the docs! Also, I collapsed and PR'd the dev_object_detector branch. We can PR to master, or feel free to make branches. Generally, I'm hoping that all features added to the public release are fully functional, so I may further deprecate code that isn't.

A couple of questions: Have you verified that the docs work? Has any of the linting caused any of the modules to fail? And referring to your previous PR, I saw that you had attempted to change the orientation of the slide; does that have any impact on downstream functionality?

We may want to consider starting a unit-testing framework with continuous integration. It's been in the plans for a while, but I haven't gotten around to it. We could both iterate on the framework and incorporate it in pieces if you are interested.

In any case, I will probably accept this PR; I just want to test the code in your commits before proceeding and make sure it doesn't significantly conflict with the previous PR.

@jlevy44 (Owner) commented Mar 14, 2020

We should also make sure Poetry is compatible with conda. I haven't used Poetry before, but it seems great. Eventually this will be packaged with Docker (we already have a preliminary Dockerfile, but it's unreleased), and we just need to make sure it can handle more complex installs such as NVIDIA Apex.

@sumanthratna (Contributor, Author) commented Mar 14, 2020

I had a lot of those questions and was planning on bringing them up to you.

The docs have a pretty significant error: the types of parameters get combined with the parameter names. This is because of numpydoc. I have a commit ready that'll fix this within the next 2 minutes (hopefully).

I also wanted to test all of the methods, but since PathFlow doesn't have a testing framework set up, I haven't had a chance to do that yet. Obviously we should do some testing before merging; if you'd like, I can set up some basic testing using pytest, along the lines of the sketch below.
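
A minimal smoke-test sketch, assuming pytest and Click's built-in CliRunner; the pathflowai.cli_preprocessing import path and the preprocessing group name are placeholders for whichever PathFlow CLI module we wire up first.

from click.testing import CliRunner

from pathflowai.cli_preprocessing import preprocessing  # hypothetical import path


def test_cli_help_runs():
    # invoking --help exercises Click's command wiring without touching any data
    runner = CliRunner()
    result = runner.invoke(preprocessing, ["--help"])
    assert result.exit_code == 0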

As for the orientation of the slide, I don't expect it'll cause any errors, but we should figure that out with the testing.

Also, unfortunately, Poetry doesn't easily publish to conda (I think). However, I think I could write a script to automatically publish to conda for us. The same goes for Docker.

Also, Poetry doesn't allow custom install instructions (this is probably the biggest downside of Poetry for PathFlow). We could set up a script called pathflow-setup that installs apex and other libraries; a rough sketch is below. Then, in the documentation, we could mention that users should run pathflow-setup after installing.
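
A rough sketch of that idea, assuming the script is registered through Poetry's [tool.poetry.scripts] table (e.g. pathflow-setup = "pathflowai.post_install:main"); the module path and the bare-bones apex install are placeholders, not a final recipe.

"""Post-install helper exposed as the (hypothetical) pathflow-setup entry point."""
import subprocess
import sys
import tempfile


def main():
    # clone NVIDIA apex and install it with pip; any extra build flags
    # (CUDA/C++ extensions, etc.) would be added here once we settle on them
    with tempfile.TemporaryDirectory() as tmp:
        subprocess.check_call(
            ["git", "clone", "https://github.com/NVIDIA/apex", f"{tmp}/apex"]
        )
        subprocess.check_call([sys.executable, "-m", "pip", "install", f"{tmp}/apex"])


if __name__ == "__main__":
    main()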

@jlevy44 (Owner) commented Mar 15, 2020

Were you building the docs using sphinx?

That sounds great. In the bin directory, I have a script called install_apex, but we can just merge this into pathflow-setup.

If you want to try the conda and Docker setups, that would be great; I can also help set those up. I think once we throw in a few pytest modules (we don't need to do the entire framework yet), we should be good to go for the merge.

@sumanthratna (Contributor, Author) commented:

Yup, the docs were built with sphinx. To build the docs:

cd docs
make html

I'm working on writing a development Makefile right now; I'll add it to this PR soon.

@jlevy44 (Owner) commented Mar 22, 2020

This is a really impressive PR that moves us closer to a more reproducible workflow.

I would say that before merging, we need the following minimal set of tests:

  1. Preprocess pipeline test (is a SQL file generated? Zarr? NPZ?); a pytest sketch follows below. In a future PR we can consider removing the dependence on Zarr and NPZ.
  2. Quick test of train_model (generate a pretrained model, no need to train, but if training, only store 100 patches).
  3. Preliminary tests of the visualization module (UMAP plot production with plotly, UMAP with an ImageNet-pretrained network with images overlaid; this can be done without training the model; we can discuss over Slack).
  4. The aforementioned CLIs work, with no Click errors or errors introduced by black.

To run the tests most efficiently, you can also subset the SQL database to its first 100 entries, which should make items 2 and 3 easier to accomplish. Some of the steps for performing these tasks are in the Wiki.
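
For item 1, something like this pytest sketch might be enough to start; the run_preprocess helper, the input directory, and the output filenames are placeholders for however the test ends up driving the pipeline, not the pipeline's actual artifact names.

from pathlib import Path

# run_preprocess is a hypothetical helper wrapping the PathFlow preprocessing
# CLI on the subsetted test slide; swap in the real invocation
from tests.utils import run_preprocess  # placeholder import


def test_preprocess_outputs(tmp_path: Path):
    basename = "TCGA-18-5592-01Z-00-DX1"  # test slide discussed below
    run_preprocess(input_dir="tests/inputs", output_dir=tmp_path, basename=basename)

    # assumed artifact names: a patch SQL database plus Zarr and NPY arrays
    assert (tmp_path / "patch_info.db").exists()
    assert (tmp_path / f"{basename}.zarr").exists()
    assert (tmp_path / f"{basename}_mask.npy").exists()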

# paths to the test slide's image and XML annotation, relative to input_dir
png_file = join(input_dir, basename + ".png")
xml_file = join(input_dir, basename + ".xml")

test_segmentation()
@jlevy44 (Owner):

Technically, you could test both classification and segmentation on the same dataset; I'm not sure if this is what you were going for here.

@jlevy44 (Owner):

Or even regression from patch-level labels featured in the SQL database. I'll try to get a new dataset over soon.

@sumanthratna (Contributor, Author):

I was planning on using TCGA-18-5592-01Z-00-DX1 for testing both segmentation and classification, like you suggested. The reason I'm splitting the test into two different methods is that some of the parameter names change (such as npy_mask vs. xml_file), roughly as in the sketch below.
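
A skeleton of that split, with a shared driver so the two tests differ only in how the annotation is passed; _run_preprocess and its keyword arguments are placeholders for the real preprocessing call.

# shared driver for both tests; the real version would invoke the PathFlow
# preprocessing CLI (placeholder signature and body)
def _run_preprocess(basename, **annotation_kwargs):
    ...


def test_segmentation():
    # segmentation passes a NumPy mask via npy_mask
    _run_preprocess("TCGA-18-5592-01Z-00-DX1", npy_mask="TCGA-18-5592-01Z-00-DX1_mask.npy")


def test_classification():
    # classification passes the XML annotation via xml_file
    _run_preprocess("TCGA-18-5592-01Z-00-DX1", xml_file="TCGA-18-5592-01Z-00-DX1.xml")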

I'm also planning on adding support for TCGA annotations in PathFlow, so a new dataset shouldn't be necessary.

"annotation",
"0",
"1",
"2",
@jlevy44 (Owner):

You may want to change these header names to reflect the final testing dataset.

from sqlite3 import connect as sql_connect

# open the patch-level SQL database and read every row from the "256" table
connection = sql_connect(odb)
cursor = connection.execute('SELECT * FROM "256";')
@jlevy44 (Owner):

One potential option here is to limit the number of patches before testing the classification and segmentation pipelines; for example, something like the sketch below.
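
A hedged sketch of that subsetting, reusing the "256" table name from the excerpt above; the database paths are placeholders.

from sqlite3 import connect as sql_connect

odb = "patch_info.db"  # placeholder: path to the full patch database

# copy only the first 100 patch rows into a small database used by the tests
connection = sql_connect(odb)
connection.execute("ATTACH DATABASE 'test_patch_info.db' AS subset;")  # placeholder path
connection.execute('CREATE TABLE subset."256" AS SELECT * FROM "256" LIMIT 100;')
connection.commit()
connection.close()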
