[REF] use custom patching implementation and use Apache 2.0 license (#153)

* rm clam-based patch implementation

This implementation is licensed under GPLv3, which is incompatible with
the Apache 2.0 license. We would like to use Apache 2.0, so this commit
removes the GPLv3-licensed code.

* add histolab for patching impl

* replace histolab with custom impl

* add scikit-image >= 0.20.0

* add custom segment and patch impl

* replace cli with new patch impl

* use new patching impl

* use Apache 2.0 license

We have removed code from CLAM and are no longer bound by GPL. As of
this commit, wsinfer is licensed under Apache 2.0.

* when thresholding, use arr>=thresh

* use binary thresholding in lieu of otsu

* use new cli args in patching

* fix x,y coord creation + sort coords after query

Before this commit, we were missing the patches along the bottom and
right edges because the stop of the coordinate range was set slightly
too low. This commit fixes that, so all patches are now represented.

This commit also sorts the coordinate indexes after the shapely tree
query, because the order in which the query returns results appears to
be arbitrary. Sorting ensures that when we index the coordinates
contained in the polygon, those coordinates are in ascending order
(with y changing most rapidly).

* run isort

* mypy ignore skimage.morphology

* update expected number of patches in test-package

* add patching test of 100px @ 0.5 mpp

* add binary_threshold argument

* remove patches that extend out of the border of the slide

Our patching algorithm ignores patches that go outside of the slide
boundary.
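
The coordinate fix, the post-query sort, and the border rule described in the message above can be sketched together in plain Python. This is a minimal illustration and not the code in wsinfer: `patch_origins` and `select_in_order` are hypothetical helpers, the 4096 px slide edge is an assumed value, and a hand-written list of out-of-order indexes stands in for whatever order a shapely tree query might return.

```python
def patch_origins(slide_w, slide_h, patch_px, step_px=None):
    """Top-left corners of every patch that fits fully inside the slide."""
    step = step_px or patch_px
    # The last valid origin is (size - patch_px). range() excludes its stop,
    # so the stop must be one past that value; a stop of (size - patch_px)
    # would silently drop the last row and column of patches.
    xs = range(0, slide_w - patch_px + 1, step)
    ys = range(0, slide_h - patch_px + 1, step)
    # x is the outer loop, so y changes most rapidly.
    return [(x, y) for x in xs for y in ys]


def select_in_order(coords, hit_indexes):
    """Index coords with query hits after sorting the hit indexes."""
    return [coords[i] for i in sorted(hit_indexes)]


# Assumed slide of 4096 x 4096 px with 200 px patches.
origins = patch_origins(4096, 4096, patch_px=200)
print(len(origins), origins[:2], origins[-1])

# Hypothetical query result, deliberately out of order.
small = patch_origins(400, 400, patch_px=200)
print(select_in_order(small, [3, 0, 2]))
```

Under these assumed dimensions the grid has 20 origins per axis (0 through 3800), and no patch starting at 4000 is produced because it would extend past the border. That is consistent in shape with the rows at `minx = 4000` and `miny = 4000` that this commit removes from the reference CSV.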
kaczmarj authored Jul 19, 2023
1 parent 8629418 commit 3bb7665
Showing 21 changed files with 737 additions and 2,951 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/ci.yml
@@ -69,7 +69,7 @@ jobs:
 test -f results/run_metadata_*.json
 test -f results/patches/JP2K-33003-1.h5
 test -f results/model-outputs/JP2K-33003-1.csv
-test $(wc -l < results/model-outputs/JP2K-33003-1.csv) -eq 653
+test $(wc -l < results/model-outputs/JP2K-33003-1.csv) -eq 675
 # This is run on multiple operating systems.
 test-package:
@@ -105,7 +105,7 @@ jobs:
 test -f results/run_metadata_*.json
 test -f results/patches/JP2K-33003-1.h5
 test -f results/model-outputs/JP2K-33003-1.csv
-test $(wc -l < results/model-outputs/JP2K-33003-1.csv) -eq 653
+test $(wc -l < results/model-outputs/JP2K-33003-1.csv) -eq 675
 # FIXME: tissue segmentation has different outputs on Windows. The patch sizes
 # are the same but the coordinates found are different.
 - name: Run 'wsinfer run' on Windows
@@ -119,7 +119,7 @@ jobs:
 Test-Path -Path results/run_metadata_*.json -PathType Leaf
 Test-Path -Path results/patches/JP2K-33003-1.h5 -PathType Leaf
 Test-Path -Path results/model-outputs/JP2K-33003-1.csv -PathType Leaf
-# test $(python -c "print(sum(1 for _ in open('results/model-outputs/JP2K-33003-1.csv')))") -eq 653
+# test $(python -c "print(sum(1 for _ in open('results/model-outputs/JP2K-33003-1.csv')))") -eq 675
 style-and-types:
 runs-on: ubuntu-latest
875 changes: 201 additions & 674 deletions LICENSE

Large diffs are not rendered by default.

6 changes: 4 additions & 2 deletions setup.cfg
@@ -6,15 +6,14 @@ author_email = jakub.kaczmarzyk@stonybrookmedicine.edu
 description = Run patch-based classification on pathology whole slide images.
 long_description = file: README.md
 long_description_content_type = text/markdown
-license = GNU General Public License v3 (GPLv3)
 license_file = LICENSE
 classifiers =
     Development Status :: 4 - Beta
     Environment :: Console
     Intended Audience :: Developers
     Intended Audience :: Healthcare Industry
     Intended Audience :: Science/Research
-    License :: OSI Approved :: GNU General Public License v3 (GPLv3)
+    License :: OSI Approved :: Apache Software License
     Operating System :: OS Independent
     Programming Language :: Python :: 3
     Programming Language :: Python :: 3 :: Only
@@ -38,6 +37,7 @@ install_requires =
     pandas
     pillow
     pyyaml
+    scikit-image>=0.20.0
     shapely
     tifffile
     tiffslide
@@ -101,6 +101,8 @@ ignore_missing_imports = True
 ignore_missing_imports = True
 [mypy-shapely.*]
 ignore_missing_imports = True
+[mypy-skimage.morphology]
+ignore_missing_imports = True
 [mypy-tifffile]
 ignore_missing_imports = True
 [mypy-zarr.storage]
41 changes: 0 additions & 41 deletions tests/reference/pancancer-lymphocytes-inceptionv4.tcga/purple.csv
@@ -19,7 +19,6 @@ minx,miny,width,height,prob_notils,prob_tils
 0,3400,200,200,1.0,3.427372535086404e-12
 0,3600,200,200,1.0,3.427372535086404e-12
 0,3800,200,200,1.0,3.427372535086404e-12
-0,4000,200,200,0.9999136924743652,8.626562339486554e-05
 200,0,200,200,1.0,3.427372535086404e-12
 200,200,200,200,1.0,3.427372535086404e-12
 200,400,200,200,1.0,3.427372535086404e-12
@@ -40,7 +39,6 @@ minx,miny,width,height,prob_notils,prob_tils
 200,3400,200,200,1.0,3.427372535086404e-12
 200,3600,200,200,1.0,3.427372535086404e-12
 200,3800,200,200,1.0,3.427372535086404e-12
-200,4000,200,200,0.9999136924743652,8.626562339486554e-05
 400,0,200,200,1.0,3.427372535086404e-12
 400,200,200,200,1.0,3.427372535086404e-12
 400,400,200,200,1.0,3.427372535086404e-12
@@ -61,7 +59,6 @@ minx,miny,width,height,prob_notils,prob_tils
 400,3400,200,200,1.0,3.427372535086404e-12
 400,3600,200,200,1.0,3.427372535086404e-12
 400,3800,200,200,1.0,3.427372535086404e-12
-400,4000,200,200,0.9999136924743652,8.626562339486554e-05
 600,0,200,200,1.0,3.427372535086404e-12
 600,200,200,200,1.0,3.427372535086404e-12
 600,400,200,200,1.0,3.427372535086404e-12
@@ -82,7 +79,6 @@ minx,miny,width,height,prob_notils,prob_tils
 600,3400,200,200,1.0,3.427372535086404e-12
 600,3600,200,200,1.0,3.427372535086404e-12
 600,3800,200,200,1.0,3.427372535086404e-12
-600,4000,200,200,0.9999136924743652,8.626562339486554e-05
 800,0,200,200,1.0,3.427372535086404e-12
 800,200,200,200,1.0,3.427372535086404e-12
 800,400,200,200,1.0,3.427372535086404e-12
@@ -103,7 +99,6 @@ minx,miny,width,height,prob_notils,prob_tils
 800,3400,200,200,1.0,3.427372535086404e-12
 800,3600,200,200,1.0,3.427372535086404e-12
 800,3800,200,200,1.0,3.427372535086404e-12
-800,4000,200,200,0.9999136924743652,8.626562339486554e-05
 1000,0,200,200,1.0,3.427372535086404e-12
 1000,200,200,200,1.0,3.427372535086404e-12
 1000,400,200,200,1.0,3.427372535086404e-12
@@ -124,7 +119,6 @@ minx,miny,width,height,prob_notils,prob_tils
 1000,3400,200,200,1.0,3.427372535086404e-12
 1000,3600,200,200,1.0,3.427372535086404e-12
 1000,3800,200,200,1.0,3.427372535086404e-12
-1000,4000,200,200,0.9999136924743652,8.626562339486554e-05
 1200,0,200,200,1.0,3.427372535086404e-12
 1200,200,200,200,1.0,3.427372535086404e-12
 1200,400,200,200,1.0,3.427372535086404e-12
@@ -145,7 +139,6 @@ minx,miny,width,height,prob_notils,prob_tils
 1200,3400,200,200,1.0,3.427372535086404e-12
 1200,3600,200,200,1.0,3.427372535086404e-12
 1200,3800,200,200,1.0,3.427372535086404e-12
-1200,4000,200,200,0.9999136924743652,8.626562339486554e-05
 1400,0,200,200,1.0,3.427372535086404e-12
 1400,200,200,200,1.0,3.427372535086404e-12
 1400,400,200,200,1.0,3.427372535086404e-12
@@ -166,7 +159,6 @@ minx,miny,width,height,prob_notils,prob_tils
 1400,3400,200,200,1.0,3.427372535086404e-12
 1400,3600,200,200,1.0,3.427372535086404e-12
 1400,3800,200,200,1.0,3.427372535086404e-12
-1400,4000,200,200,0.9999136924743652,8.626562339486554e-05
 1600,0,200,200,1.0,3.427372535086404e-12
 1600,200,200,200,1.0,3.427372535086404e-12
 1600,400,200,200,1.0,3.427372535086404e-12
@@ -187,7 +179,6 @@ minx,miny,width,height,prob_notils,prob_tils
 1600,3400,200,200,1.0,3.427372535086404e-12
 1600,3600,200,200,1.0,3.427372535086404e-12
 1600,3800,200,200,1.0,3.427372535086404e-12
-1600,4000,200,200,0.9999136924743652,8.626562339486554e-05
 1800,0,200,200,1.0,3.427372535086404e-12
 1800,200,200,200,1.0,3.427372535086404e-12
 1800,400,200,200,1.0,3.427372535086404e-12
@@ -208,7 +199,6 @@ minx,miny,width,height,prob_notils,prob_tils
 1800,3400,200,200,1.0,3.427372535086404e-12
 1800,3600,200,200,1.0,3.427372535086404e-12
 1800,3800,200,200,1.0,3.427372535086404e-12
-1800,4000,200,200,0.9999136924743652,8.626562339486554e-05
 2000,0,200,200,1.0,3.427372535086404e-12
 2000,200,200,200,1.0,3.427372535086404e-12
 2000,400,200,200,1.0,3.427372535086404e-12
@@ -229,7 +219,6 @@ minx,miny,width,height,prob_notils,prob_tils
 2000,3400,200,200,1.0,3.427372535086404e-12
 2000,3600,200,200,1.0,3.427372535086404e-12
 2000,3800,200,200,1.0,3.427372535086404e-12
-2000,4000,200,200,0.9999136924743652,8.626562339486554e-05
 2200,0,200,200,1.0,3.427372535086404e-12
 2200,200,200,200,1.0,3.427372535086404e-12
 2200,400,200,200,1.0,3.427372535086404e-12
@@ -250,7 +239,6 @@ minx,miny,width,height,prob_notils,prob_tils
 2200,3400,200,200,1.0,3.427372535086404e-12
 2200,3600,200,200,1.0,3.427372535086404e-12
 2200,3800,200,200,1.0,3.427372535086404e-12
-2200,4000,200,200,0.9999136924743652,8.626562339486554e-05
 2400,0,200,200,1.0,3.427372535086404e-12
 2400,200,200,200,1.0,3.427372535086404e-12
 2400,400,200,200,1.0,3.427372535086404e-12
@@ -271,7 +259,6 @@ minx,miny,width,height,prob_notils,prob_tils
 2400,3400,200,200,1.0,3.427372535086404e-12
 2400,3600,200,200,1.0,3.427372535086404e-12
 2400,3800,200,200,1.0,3.427372535086404e-12
-2400,4000,200,200,0.9999136924743652,8.626562339486554e-05
 2600,0,200,200,1.0,3.427372535086404e-12
 2600,200,200,200,1.0,3.427372535086404e-12
 2600,400,200,200,1.0,3.427372535086404e-12
@@ -292,7 +279,6 @@ minx,miny,width,height,prob_notils,prob_tils
 2600,3400,200,200,1.0,3.427372535086404e-12
 2600,3600,200,200,1.0,3.427372535086404e-12
 2600,3800,200,200,1.0,3.427372535086404e-12
-2600,4000,200,200,0.9999136924743652,8.626562339486554e-05
 2800,0,200,200,1.0,3.427372535086404e-12
 2800,200,200,200,1.0,3.427372535086404e-12
 2800,400,200,200,1.0,3.427372535086404e-12
@@ -313,7 +299,6 @@ minx,miny,width,height,prob_notils,prob_tils
 2800,3400,200,200,1.0,3.427372535086404e-12
 2800,3600,200,200,1.0,3.427372535086404e-12
 2800,3800,200,200,1.0,3.427372535086404e-12
-2800,4000,200,200,0.9999136924743652,8.626562339486554e-05
 3000,0,200,200,1.0,3.427372535086404e-12
 3000,200,200,200,1.0,3.427372535086404e-12
 3000,400,200,200,1.0,3.427372535086404e-12
@@ -334,7 +319,6 @@ minx,miny,width,height,prob_notils,prob_tils
 3000,3400,200,200,1.0,3.427372535086404e-12
 3000,3600,200,200,1.0,3.427372535086404e-12
 3000,3800,200,200,1.0,3.427372535086404e-12
-3000,4000,200,200,0.9999136924743652,8.626562339486554e-05
 3200,0,200,200,1.0,3.427372535086404e-12
 3200,200,200,200,1.0,3.427372535086404e-12
 3200,400,200,200,1.0,3.427372535086404e-12
@@ -355,7 +339,6 @@ minx,miny,width,height,prob_notils,prob_tils
 3200,3400,200,200,1.0,3.427372535086404e-12
 3200,3600,200,200,1.0,3.427372535086404e-12
 3200,3800,200,200,1.0,3.427372535086404e-12
-3200,4000,200,200,0.9999136924743652,8.626562339486554e-05
 3400,0,200,200,1.0,3.427372535086404e-12
 3400,200,200,200,1.0,3.427372535086404e-12
 3400,400,200,200,1.0,3.427372535086404e-12
@@ -376,7 +359,6 @@ minx,miny,width,height,prob_notils,prob_tils
 3400,3400,200,200,1.0,3.427372535086404e-12
 3400,3600,200,200,1.0,3.427372535086404e-12
 3400,3800,200,200,1.0,3.427372535086404e-12
-3400,4000,200,200,0.9999136924743652,8.626562339486554e-05
 3600,0,200,200,1.0,3.427372535086404e-12
 3600,200,200,200,1.0,3.427372535086404e-12
 3600,400,200,200,1.0,3.427372535086404e-12
@@ -397,7 +379,6 @@ minx,miny,width,height,prob_notils,prob_tils
 3600,3400,200,200,1.0,3.427372535086404e-12
 3600,3600,200,200,1.0,3.427372535086404e-12
 3600,3800,200,200,1.0,3.427372535086404e-12
-3600,4000,200,200,0.9999136924743652,8.626562339486554e-05
 3800,0,200,200,1.0,3.427372535086404e-12
 3800,200,200,200,1.0,3.427372535086404e-12
 3800,400,200,200,1.0,3.427372535086404e-12
@@ -418,25 +399,3 @@ minx,miny,width,height,prob_notils,prob_tils
 3800,3400,200,200,1.0,3.427372535086404e-12
 3800,3600,200,200,1.0,3.427372535086404e-12
 3800,3800,200,200,1.0,3.427372535086404e-12
-3800,4000,200,200,0.9999136924743652,8.626562339486554e-05
-4000,0,200,200,0.9996557235717772,0.0003442850720603
-4000,200,200,200,0.9996557235717772,0.0003442850720603
-4000,400,200,200,0.9996557235717772,0.0003442850720603
-4000,600,200,200,0.9996557235717772,0.0003442850720603
-4000,800,200,200,0.9996557235717772,0.0003442850720603
-4000,1000,200,200,0.9996557235717772,0.0003442850720603
-4000,1200,200,200,0.9996557235717772,0.0003442850720603
-4000,1400,200,200,0.9996557235717772,0.0003442850720603
-4000,1600,200,200,0.9996557235717772,0.0003442850720603
-4000,1800,200,200,0.9996557235717772,0.0003442850720603
-4000,2000,200,200,0.9996557235717772,0.0003442850720603
-4000,2200,200,200,0.9996557235717772,0.0003442850720603
-4000,2400,200,200,0.9996557235717772,0.0003442850720603
-4000,2600,200,200,0.9996557235717772,0.0003442850720603
-4000,2800,200,200,0.9996557235717772,0.0003442850720603
-4000,3000,200,200,0.9996557235717772,0.0003442850720603
-4000,3200,200,200,0.9996557235717772,0.0003442850720603
-4000,3400,200,200,0.9996557235717772,0.0003442850720603
-4000,3600,200,200,0.9996557235717772,0.0003442850720603
-4000,3800,200,200,0.9996557235717772,0.0003442850720603
-4000,4000,200,200,0.9894591569900512,0.0105408066883683
12 changes: 5 additions & 7 deletions tests/test_all.py
@@ -291,7 +291,7 @@ def test_convert_to_sbu():

 @pytest.mark.parametrize(
     ["patch_size", "patch_spacing"],
-    [(256, 0.25), (256, 0.50), (350, 0.25), (100, 0.3)],
+    [(256, 0.25), (256, 0.50), (350, 0.25), (100, 0.3), (100, 0.5)],
 )
 def test_patch_cli(
     patch_size: int, patch_spacing: float, tmp_path: Path, tiff_image: Path
@@ -306,22 +306,20 @@ def test_patch_cli(
         cli,
         [
             "patch",
-            "--source",
+            "--wsi-dir",
             str(tiff_image.parent),
-            "--save-dir",
+            "--results-dir",
             str(savedir),
-            "--patch-size",
+            "--patch-size-px",
             str(patch_size),
-            "--patch-spacing",
+            "--patch-spacing-um-px",
             str(patch_spacing),
         ],
     )
     assert result.exit_code == 0
     stem = tiff_image.stem
     assert (savedir / "masks" / f"{stem}.jpg").exists()
     assert (savedir / "patches" / f"{stem}.h5").exists()
-    assert (savedir / "process_list_autogen.csv").exists()
-    assert (savedir / "stitches" / f"{stem}.jpg").exists()

     expected_patch_size = round(patch_size * patch_spacing / orig_slide_spacing)
     sqrt_expected_num_patches = round(orig_slide_size / expected_patch_size)
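
The two expressions at the end of this hunk rescale the requested patch to base-level pixels and then estimate how many patches fit along one slide edge. A small worked version is sketched below. `ORIG_SLIDE_SPACING` and `ORIG_SLIDE_SIZE` are assumed values (0.25 um/px and 4096 px), since the fixture's real values are not visible in this hunk, and `expected_grid` is a hypothetical wrapper around the same arithmetic.

```python
ORIG_SLIDE_SPACING = 0.25  # um/px at the base level (assumed value)
ORIG_SLIDE_SIZE = 4096     # px per slide edge (assumed value)


def expected_grid(patch_size, patch_spacing):
    """Return (patch edge in base-level px, patches per slide edge)."""
    # A patch of `patch_size` px at `patch_spacing` um/px covers
    # patch_size * patch_spacing microns, which is then expressed in
    # base-level pixels by dividing by the base spacing.
    expected_patch_size = round(patch_size * patch_spacing / ORIG_SLIDE_SPACING)
    sqrt_expected_num_patches = round(ORIG_SLIDE_SIZE / expected_patch_size)
    return expected_patch_size, sqrt_expected_num_patches


for params in [(256, 0.25), (256, 0.50), (100, 0.5)]:
    print(params, expected_grid(*params))
```

Under these assumptions the newly added `(100, 0.5)` case maps to 200 px patches at the base level and 20 patches per edge.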
28 changes: 15 additions & 13 deletions wsinfer/cli/infer.py
@@ -24,7 +24,7 @@

 from ..modellib import models
 from ..modellib.run_inference import run_inference
-from ..patchlib.create_patches_fp import create_patches
+from ..patchlib import segment_and_patch_directory_of_slides


 def _num_cpus() -> int:
@@ -350,18 +350,20 @@ def run(

     click.secho("\nFinding patch coordinates...\n", fg="green")

-    create_patches(
-        source=str(wsi_dir),
-        save_dir=str(results_dir),
-        patch_size=model_obj.config.patch_size_pixels,
-        patch_spacing=model_obj.config.spacing_um_px,
-        seg=True,
-        patch=True,
-        # Stitching is a bottleneck when using tiffslide.
-        # TODO: figure out why this is...
-        stitch=False,
-        # FIXME: allow customization of this preset
-        preset="tcga.csv",
+    # FIXME: add presets for different tissue types?
+
+    segment_and_patch_directory_of_slides(
+        wsi_dir=wsi_dir,
+        save_dir=results_dir,
+        patch_size_px=model_obj.config.patch_size_pixels,
+        patch_spacing_um_px=model_obj.config.spacing_um_px,
+        thumbsize=(2048, 2048),
+        # TODO: these can be made arguments to the CLI.
+        median_filter_size=7,
+        binary_threshold=7,
+        closing_kernel_size=6,
+        min_object_size_um2=200**2,
+        min_hole_size_um2=190**2,
     )

     click.secho("\nRunning model inference.\n", fg="green")
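
The `min_object_size_um2` and `min_hole_size_um2` arguments in the call above are areas in square microns, so they presumably have to be converted to pixel counts before they can be compared against regions of a thumbnail mask. The commit message also specifies the inclusive comparison `arr >= thresh` for binarization. The sketch below illustrates both ideas in plain Python. `area_um2_to_px` and `binarize` are hypothetical helpers, the thumbnail spacing of 8 um/px is an assumed value, and how wsinfer performs these steps internally is not shown in this diff.

```python
def area_um2_to_px(area_um2, spacing_um_px):
    """Convert an area in square microns to a pixel count.

    One pixel covers spacing_um_px ** 2 square microns.
    """
    return area_um2 / spacing_um_px**2


def binarize(values, thresh):
    """Inclusive threshold: values equal to thresh count as foreground."""
    return [v >= thresh for v in values]


# A 200 um x 200 um object at an assumed thumbnail spacing of 8 um/px.
print(area_um2_to_px(200**2, spacing_um_px=8.0))  # 625.0

# With thresh=7, the value 7 itself is kept because the test is >=.
print(binarize([6, 7, 8], thresh=7))  # [False, True, True]
```

Expressing the minimum sizes in physical units keeps the same defaults meaningful across slides scanned at different resolutions, which is one plausible reason for the `_um2` suffix.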