Update Click, add .gitignore, format code, and use Poetry #8

Status: Open. Wants to merge 47 commits into base: master.

Changes from 1 commit (47 commits total).

Commits:
50826a7  Fix svs2dask_array (sumanthratna, Mar 12, 2020)
e3e0de6  Merge pull request #7 from jlevy44/dev_object_detection (jlevy44, Mar 13, 2020)
4e7fa96  Merge pull request #6 from sumanthratna/master (jlevy44, Mar 13, 2020)
a18f078  Add .gitignore (sumanthratna, Mar 13, 2020)
582bf1b  Merge remote-tracking branch 'upstream/master' (sumanthratna, Mar 13, 2020)
798fccd  Lint and format code (sumanthratna, Mar 14, 2020)
a89cfa2  Rebuild and update docs (sumanthratna, Mar 14, 2020)
0b6e8ff  Fix parameter names and types in docs (sumanthratna, Mar 14, 2020)
0a34204  Add publish script (sumanthratna, Mar 15, 2020)
cab0498  Add svs2dask_array test (sumanthratna, Mar 15, 2020)
7e529cd  Add large files to Git LFS (sumanthratna, Mar 15, 2020)
f27becb  Adding optimization level (jlevy44, Mar 16, 2020)
5dee239  Programmatically download test file (sumanthratna, Mar 16, 2020)
b09ca93  Merge branch 'master' of github.com:jlevy44/PathFlowAI (sumanthratna, Mar 16, 2020)
871b6e3  First attempt at docker.. Need nvidia-docker and singularity (jlevy44, Mar 16, 2020)
4869050  Add Travis CI (sumanthratna, Mar 20, 2020)
9d20541  Fix setup.py entry_points (sumanthratna, Mar 21, 2020)
663db89  Add OpenSlide installation to Travis (sumanthratna, Mar 21, 2020)
b6edc15  Add Travis badge to README (sumanthratna, Mar 21, 2020)
9b4b78d  Rebuild docs, make scripts executable, and update dependencies (sumanthratna, Mar 21, 2020)
4fdde7f  Add Black to Travis (sumanthratna, Mar 22, 2020)
b3db849  Install Black in Travis (sumanthratna, Mar 22, 2020)
42d3f7e  Updates to dockerfile (jlevy44, Mar 22, 2020)
672c92d  changed capitalization of docker (jlevy44, Mar 22, 2020)
ebc7aee  Merge pull request #11 from jlevy44/dockerfile (jlevy44, Mar 22, 2020)
1b1f51b  Bump psutil from 5.6.3 to 5.6.6 in /docker (dependabot[bot], Mar 22, 2020)
ce7c4d4  Merge pull request #12 from jlevy44/dependabot/pip/docker/psutil-5.6.6 (jlevy44, Mar 22, 2020)
716574c  Merge branch 'master' of github.com:jlevy44/PathFlowAI (sumanthratna, Mar 22, 2020)
0606903  Add test_preprocessing_pipeline (sumanthratna, Mar 22, 2020)
bd1e7e7  Test preprocessing pipeline CLI (sumanthratna, Mar 22, 2020)
1e5d2c3  Verify file creation for preprocessing (sumanthratna, Mar 22, 2020)
f9620b0  Replace preprocessing test data (sumanthratna, Mar 23, 2020)
2e27248  Merge branch 'master' of github.com:jlevy44/PathFlowAI (sumanthratna, Mar 25, 2020)
e2f3836  Convert TCGA sample to npy for testing segmentation (sumanthratna, Mar 25, 2020)
e8fd702  Fix Travis build (sumanthratna, Mar 25, 2020)
d7da285  Fix Travis build v2 (sumanthratna, Mar 25, 2020)
0f44010  Improve targets in publish script (sumanthratna, Mar 25, 2020)
48d9fbd  Fix help command in publish script (sumanthratna, Mar 25, 2020)
ca2ef13  Fix Docker username in publish script (sumanthratna, Mar 25, 2020)
ffc5312  Merge branch 'master' of github.com:jlevy44/PathFlowAI (sumanthratna, Mar 26, 2020)
59185e1  Merge branch 'master' of github.com:jlevy44/PathFlowAI (sumanthratna, Mar 27, 2020)
762eef7  Update dependencies (sumanthratna, Apr 5, 2020)
76cf4f7  Use np.testing in unit tests (sumanthratna, May 4, 2020)
07f4e08  Add NVIDIA apex to pyproject.toml (sumanthratna, May 4, 2020)
ce32da7  Fix Travis build (sumanthratna, May 5, 2020)
2d7d4b4  Fix Travis build v2 (sumanthratna, May 5, 2020)
70324e7  Fix Travis build v3 (sumanthratna, May 5, 2020)

Changes shown below are from commit e2f38363cee6a2699986897cc90aaaf5dbb5ffae:
Convert TCGA sample to npy for testing segmentation (committed by sumanthratna, Mar 25, 2020)

Binary file added: tests/inputs/TCGA-18-5592-01Z-00-DX1.npy (binary file not shown)
Binary file added: tests/inputs/TCGA-18-5592-01Z-00-DX1_mask.npy (binary file not shown)
Binary file added: tests/patch_information.db (binary file not shown)
156 changes: 93 additions & 63 deletions tests/test_utils.py
@@ -2,58 +2,35 @@
from numpy import array_equal


# def test_svs2dask_array():
#     from .utils import download_svs
#     from PIL import Image
#     from numpy import array as to_npa
#
#     # from os import remove
#
#     id = "2e4f6316-588b-4629-adf0-7aeac358a0e2"
#     file = "TCGA-MR-A520-01Z-00-DX1.2F323BAC-56C9-4A0C-9C1B-2B4F776056B4.svs"
#     download_location = download_svs(id, file)
#
#     Image.MAX_IMAGE_PIXELS = None  # SECURITY RISK!
#     ground_truth = to_npa(Image.open(download_location))
#
#     test = utils.svs2dask_array(download_location).compute()
#     crop_height, crop_width, _ = test.shape
#
#     # remove(download_location)
#
#     assert array_equal(ground_truth[:crop_height, :crop_width, :], test)
def test_svs2dask_array():
    from .utils import download_svs
    from PIL import Image
    from numpy import array as to_npa

    # from os import remove

    id = "2e4f6316-588b-4629-adf0-7aeac358a0e2"
    file = "TCGA-MR-A520-01Z-00-DX1.2F323BAC-56C9-4A0C-9C1B-2B4F776056B4.svs"
    download_location = download_svs(id, file)

    Image.MAX_IMAGE_PIXELS = None  # SECURITY RISK!
    ground_truth = to_npa(Image.open(download_location))

    test = utils.svs2dask_array(download_location).compute()
    crop_height, crop_width, _ = test.shape

    # remove(download_location)

    assert array_equal(ground_truth[:crop_height, :crop_width, :], test)


def test_preprocessing_pipeline():
    from .utils import get_tests_dir, image_to_numpy
    from .utils import get_tests_dir
    from os.path import join, exists

    tests_dir = get_tests_dir()
    basename = "TCGA-18-5592-01Z-00-DX1"
    input_dir = join(tests_dir, "inputs")
    png_file = join(input_dir, basename + ".png")
    xml_file = join(input_dir, basename + ".xml")
    out_zarr = join(tests_dir, "output_zarr.zarr")
    out_pkl = join(tests_dir, "output.pkl")

    # convert a TCGA XML to a binary mask with the following:
    # Image.fromarray(
    #     viewmask.utils.xml_to_image(
    #         ET.parse('./tests/inputs/TCGA-18-5592-01Z-00-DX1.xml')
    #     )
    # ).save('/Users/suman/Downloads/bruh.png')

    utils.run_preprocessing_pipeline(
        png_file, xml_file=xml_file, out_zarr=out_zarr, out_pkl=out_pkl
    )
    assert exists(out_zarr)
    assert exists(out_pkl)

    from zarr import open as open_zarr
    from dask.array import from_zarr as zarr_to_da

    img = zarr_to_da(open_zarr(out_zarr)).compute()
    assert array_equal(img, image_to_numpy(png_file))

    def capture(command):
        from subprocess import Popen, PIPE
@@ -65,22 +42,75 @@ def capture(command):
        out, err = proc.communicate()
        return out, err, proc.returncode

    odb = join(tests_dir, "patch_information.db")
    command = [
        "poetry", "run", "pathflowai-preprocess",
        "preprocess-pipeline",
        "-odb", odb,
        "--preprocess",
        "--patches",
        "--basename", basename,
        "--input_dir", input_dir,
        "--patch_size", "256",
        "--intensity_threshold", "45.",
        "-tc", "7",
        "-t", "0.05"
    ]
    out, err, exitcode = capture(command)
    assert exists(out_zarr)
    assert exists(out_pkl)
    assert exists(odb)
    assert exitcode == 0
    def test_segmentation():
        npy_file = join(input_dir, basename + ".npy")
        npy_mask = join(input_dir, basename + "_mask.npy")
        out_zarr = join(tests_dir, "output_zarr.zarr")
        out_pkl = join(tests_dir, "output.pkl")

        # convert TCGA annotations (XML) to a binary mask (npy) with the following:
        #
        # import numpy as np
        # import viewmask
        # import xml.etree.ElementTree as ET
        # np.save(
        #     './tests/inputs/TCGA-18-5592-01Z-00-DX1_mask.npy',
        #     viewmask.utils.xml_to_image(
        #         ET.parse('./tests/inputs/TCGA-18-5592-01Z-00-DX1.xml')
        #     )
        # )
        #
        #
        # convert TCGA input (PNG) to a numpy array (npy) with the following:
        #
        # import numpy as np
        # from PIL import Image
        # np.save(
        #     './tests/inputs/TCGA-18-5592-01Z-00-DX1.npy',
        #     np.array(
        #         Image.open('./tests/inputs/TCGA-18-5592-01Z-00-DX1.png')
        #     )
        # )

        utils.run_preprocessing_pipeline(
            npy_file, npy_mask=npy_mask, out_zarr=out_zarr, out_pkl=out_pkl
        )
        assert exists(out_zarr)
        assert exists(out_pkl)

        from numpy import load as npy_to_npa
        from zarr import open as open_zarr
        from dask.array import from_zarr as zarr_to_da

        img = zarr_to_da(open_zarr(out_zarr)).compute()
        assert array_equal(img, npy_to_npa(npy_file))

        odb = join(tests_dir, "patch_information.db")
        command = [
            "poetry", "run", "pathflowai-preprocess",
            "preprocess-pipeline",
            "-odb", odb,
            "--preprocess",
            "--patches",
            "--basename", basename,
            "--input_dir", input_dir,
            "--patch_size", "256",
            "--intensity_threshold", "45.",
            "-tc", "7",
            "-t", "0.05"
        ]
        out, err, exitcode = capture(command)
        assert exists(out_zarr)
        assert exists(out_pkl)
        assert exists(odb)
        assert exitcode == 0

        from sqlite3 import connect as sql_connect
        connection = sql_connect(odb)
        cursor = connection.execute('SELECT * FROM "256";')

Owner (jlevy44) commented:

One potential option here is to limit the number of patches before testing the classification and segmentation pipelines
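
As an illustration of that suggestion, here is a minimal editor's sketch (not code from this PR) of one way to cap the patch count: read only a bounded number of rows from patch_information.db before exercising the downstream pipelines. The table name "256" mirrors the patch size used above; the cap value and variable names are assumptions.

from os.path import join
from sqlite3 import connect as sql_connect

MAX_TEST_PATCHES = 50  # arbitrary cap to keep CI runs fast (assumed value)

# Read at most MAX_TEST_PATCHES patch records from the patch database
# produced by the preprocessing step above; "256" is the patch-size table.
odb = join("tests", "patch_information.db")
connection = sql_connect(odb)
rows = connection.execute(
    'SELECT * FROM "256" LIMIT ?;', (MAX_TEST_PATCHES,)
).fetchall()
connection.close()

assert len(rows) <= MAX_TEST_PATCHES

The same cap could then be applied to whatever subset of patches is fed into the classification and segmentation pipelines, keeping test run time bounded regardless of slide size.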

        names = [description[0] for description in cursor.description]
        cursor.close()
        true_headers = ['index', 'ID', 'x', 'y', 'patch_size',
                        'annotation', '0', '1', '2', '3', '4', '5', '6']
        assert names == true_headers
    test_segmentation()

Owner (jlevy44) commented:

Technically, you could test both classification and segmentation on the same dataset; I'm not sure if this is what you were going for here.

Owner (jlevy44) commented:

Or even regression from patch-level labels featured in the SQL. I'll try to get a new dataset over soon.

Contributor (PR author) commented:

I was planning on using TCGA-18-5592-01Z-00-DX1 for testing both segmentation and classification, like you suggested. The reason I'm splitting the test into two different methods is that some of the parameter names change (such as npy_mask vs. xml_file).

I'm also planning on adding support for TCGA annotations in PathFlow, so a new dataset shouldn't be necessary.
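
For context on that split, a minimal sketch (hypothetical, not part of this PR) of how the two annotation formats could be exercised by a single parametrized test, assuming pytest and the run_preprocessing_pipeline keyword arguments shown in the diff (xml_file for the PNG/XML pair, npy_mask for the npy pair). The test name, parametrization, and imports are illustrative assumptions.

import pytest
from os.path import join, exists
from pathflowai import utils  # assuming the same utils import the test module already uses
from .utils import get_tests_dir  # test helper used elsewhere in this PR


@pytest.mark.parametrize(
    "image_ext, mask_kwarg, mask_suffix",
    [
        (".png", "xml_file", ".xml"),       # XML annotations
        (".npy", "npy_mask", "_mask.npy"),  # precomputed npy mask
    ],
)
def test_preprocessing_pipeline_formats(image_ext, mask_kwarg, mask_suffix):
    # Hypothetical consolidation: run the same TCGA-18-5592-01Z-00-DX1 sample
    # through the pipeline with either annotation format.
    tests_dir = get_tests_dir()
    basename = "TCGA-18-5592-01Z-00-DX1"
    input_dir = join(tests_dir, "inputs")
    image_file = join(input_dir, basename + image_ext)
    mask_file = join(input_dir, basename + mask_suffix)
    out_zarr = join(tests_dir, "output_zarr.zarr")
    out_pkl = join(tests_dir, "output.pkl")

    # Forward the annotation under whichever keyword this format requires.
    utils.run_preprocessing_pipeline(
        image_file,
        out_zarr=out_zarr,
        out_pkl=out_pkl,
        **{mask_kwarg: mask_file},
    )
    assert exists(out_zarr)
    assert exists(out_pkl)

Whether this is preferable to two separate methods is a style call: the split in the PR keeps each format's setup explicit, while a parametrized version avoids duplicating the assertions.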