Tile extraction crops smaller, manageable regions (tiles) from large whole slide images. Histolab provides three extraction strategies, RandomTiler, GridTiler, and ScoreTiler, each suited to different analysis needs. All tilers share common parameters and expose locate_tiles() for previewing tile positions and extract() for saving tiles.
All tiler classes accept these parameters, except where a comment notes otherwise:
tile_size: tuple = (512, 512) # Tile dimensions in pixels (width, height)
level: int = 0 # Pyramid level for extraction (0=highest resolution)
check_tissue: bool = True # Filter tiles by tissue content
tissue_percent: float = 80.0 # Minimum tissue coverage (0-100), used when check_tissue=True
pixel_overlap: int = 0 # Overlap between adjacent tiles (grid-based tilers only, not RandomTiler)
prefix: str = "" # Prefix for saved tile filenames
suffix: str = ".png" # File extension for saved tiles
extraction_mask: BinaryMask = BiggestTissueBoxMask() # Mask defining extraction region (passed to extract() and locate_tiles())
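To make the `check_tissue` / `tissue_percent` pair concrete, the following is a minimal sketch of the thresholding idea on a tiny binary mask. The helpers `tissue_percentage` and `has_enough_tissue_sketch`, the list-of-lists mask, and the inclusive `>=` comparison are illustrative assumptions; histolab derives its tissue mask from image filters and may compare differently.

```python
def tissue_percentage(mask):
    """Return the percentage (0-100) of truthy pixels in a 2D binary mask."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    tissue = sum(1 for row in mask for px in row if px)
    return 100.0 * tissue / total


def has_enough_tissue_sketch(mask, tissue_percent=80.0):
    """Hypothetical helper: keep a tile only if coverage meets the threshold."""
    return tissue_percentage(mask) >= tissue_percent


# 4x4 tile mask with 13 of 16 pixels marked as tissue
tile_mask = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
]
print(tissue_percentage(tile_mask))                               # 81.25
print(has_enough_tissue_sketch(tile_mask, tissue_percent=80.0))   # True
print(has_enough_tissue_sketch(tile_mask, tissue_percent=90.0))   # False
```

Raising `tissue_percent` rejects more border tiles, which is usually the first knob to turn when tiles contain too much background.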
Purpose: Extract a fixed number of randomly positioned tiles from tissue regions.
from histolab.masks import TissueMask
from histolab.tiler import RandomTiler
random_tiler = RandomTiler(
    tile_size=(512, 512),
    n_tiles=100,  # Number of random tiles to extract
    level=0,
    seed=42,  # Random seed for reproducibility
    check_tissue=True,
    tissue_percent=80.0,
)
# Extract tiles (slide is a histolab.slide.Slide instance)
random_tiler.extract(slide, extraction_mask=TissueMask())
Key Parameters:
n_tiles: Number of random tiles to extract
seed: Random seed for reproducible tile selection
max_iter: Maximum attempts to find valid tiles (default 1000)
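The interplay of `n_tiles`, `seed`, and `max_iter` can be sketched with the standard library alone. This is a simplified model, not histolab's sampling code: the function `sample_random_tiles`, the `is_valid` callback, and the slide dimensions are all hypothetical.

```python
import random


def sample_random_tiles(slide_w, slide_h, tile_size, n_tiles, seed, is_valid, max_iter=1000):
    """Draw up to n_tiles upper-left corners, trying at most max_iter candidates."""
    rng = random.Random(seed)  # dedicated generator: the seed fully determines the draws
    tile_w, tile_h = tile_size
    coords = []
    for _ in range(max_iter):
        if len(coords) == n_tiles:
            break
        x = rng.randint(0, slide_w - tile_w)
        y = rng.randint(0, slide_h - tile_h)
        if is_valid(x, y):  # stands in for the tissue check
            coords.append((x, y))
    return coords


# Hypothetical validity check: pretend tissue occupies the left half of the slide
def in_left_half(x, y):
    return x + 512 <= 5000


first = sample_random_tiles(10000, 8000, (512, 512), 5, seed=42, is_valid=in_left_half)
second = sample_random_tiles(10000, 8000, (512, 512), 5, seed=42, is_valid=in_left_half)
print(first == second)  # True: same seed, same candidate sequence
```

If the validity check rejects most candidates, the loop can exhaust `max_iter` and return fewer than `n_tiles` coordinates, which is why a sparse slide may call for a larger `max_iter` or a lower `tissue_percent`.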
Purpose: Extract tiles systematically across tissue regions following a grid pattern.
from histolab.tiler import GridTiler
grid_tiler = GridTiler(
    tile_size=(512, 512),
    level=0,
    check_tissue=True,
    tissue_percent=80.0,
    pixel_overlap=0,  # Overlap in pixels between adjacent tiles
)
# Extract tiles
grid_tiler.extract(slide)
Key Parameters:
pixel_overlap: Number of overlapping pixels between adjacent tiles
pixel_overlap=0: Non-overlapping tiles
pixel_overlap=128: 128-pixel overlap on each side
Grid Pattern:
[Tile 1][Tile 2][Tile 3]
[Tile 4][Tile 5][Tile 6]
[Tile 7][Tile 8][Tile 9]
With pixel_overlap=64:
[Tile 1-overlap-Tile 2-overlap-Tile 3]
[ overlap overlap overlap]
[Tile 4-overlap-Tile 5-overlap-Tile 6]
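The grid layout above follows from a simple stride calculation: each step advances by the tile size minus the overlap. The sketch below assumes a rectangular region and keeps only tiles that fit entirely inside it; `grid_coordinates` is a hypothetical helper, and histolab's own handling of region edges may differ.

```python
def grid_coordinates(region_w, region_h, tile_size, pixel_overlap=0):
    """Return upper-left corners of tiles that fit fully inside the region."""
    tile_w, tile_h = tile_size
    stride_x = tile_w - pixel_overlap
    stride_y = tile_h - pixel_overlap
    if stride_x <= 0 or stride_y <= 0:
        raise ValueError("pixel_overlap must be smaller than the tile size")
    xs = range(0, region_w - tile_w + 1, stride_x)
    ys = range(0, region_h - tile_h + 1, stride_y)
    return [(x, y) for y in ys for x in xs]  # row-major order, like the diagram


no_overlap = grid_coordinates(1408, 1408, (512, 512), pixel_overlap=0)
with_overlap = grid_coordinates(1408, 1408, (512, 512), pixel_overlap=64)
print(len(no_overlap))     # 4
print(len(with_overlap))   # 9
print(with_overlap[:3])    # [(0, 0), (448, 0), (896, 0)]
```

In this example a 64-pixel overlap more than doubles the tile count for the same region, so overlap should be weighed against storage and processing cost.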
Purpose: Extract top-ranked tiles based on custom scoring functions.
from histolab.tiler import ScoreTiler
from histolab.scorer import NucleiScorer
score_tiler = ScoreTiler(
    tile_size=(512, 512),
    n_tiles=50,  # Number of top-scoring tiles to extract
    level=0,
    scorer=NucleiScorer(),  # Scoring function
    check_tissue=True,
)
# Extract top-scoring tiles
score_tiler.extract(slide)
Key Parameters:
n_tiles: Number of top-scoring tiles to extract
scorer: Scoring function (e.g., NucleiScorer, CellularityScorer, custom scorer)
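Conceptually, score-based extraction scores every candidate tile and keeps the `n_tiles` best. A minimal sketch of that selection step, using made-up tile names and scores rather than real scorer output:

```python
import heapq


def top_scoring_tiles(scores, n_tiles):
    """Return the n_tiles best (name, score) pairs, highest score first."""
    return heapq.nlargest(n_tiles, scores.items(), key=lambda item: item[1])


# Hypothetical scores for four candidate tiles
candidate_scores = {"tile_a": 0.12, "tile_b": 0.89, "tile_c": 0.47, "tile_d": 0.85}
best = top_scoring_tiles(candidate_scores, n_tiles=2)
print(best)  # [('tile_b', 0.89), ('tile_d', 0.85)]
```

Because every candidate must be scored before the ranking is known, the cost grows with the number of candidate tiles rather than with `n_tiles`.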
NucleiScorer scores tiles based on nuclei detection and density.
from histolab.scorer import NucleiScorer
nuclei_scorer = NucleiScorer()
CellularityScorer scores tiles based on overall cellular content.
from histolab.scorer import CellularityScorer
cellularity_scorer = CellularityScorer()
Create custom scoring functions for specific needs:
from histolab.scorer import Scorer
from histolab.tiler import ScoreTiler
import numpy as np
class ColorVarianceScorer(Scorer):
    def __call__(self, tile):
        """Score tiles based on color variance."""
        tile_array = np.array(tile.image)
        # Sum the per-channel variance over all pixels
        variance = np.var(tile_array, axis=(0, 1)).sum()
        return float(variance)
# Use custom scorer
variance_scorer = ColorVarianceScorer()
score_tiler = ScoreTiler(
    tile_size=(512, 512),
    n_tiles=30,
    scorer=variance_scorer,
)
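To see what a color-variance score rewards, here is a dependency-free sketch of the same computation on tiny hand-written tiles. The function `color_variance` and the pixel lists are illustrative assumptions; it uses population variance, which matches NumPy's default for `np.var`.

```python
from statistics import pvariance


def color_variance(pixels):
    """Sum of per-channel population variances for a list of (R, G, B) pixels."""
    channels = zip(*pixels)  # regroup pixel tuples into one sequence per channel
    return sum(pvariance(channel) for channel in channels)


flat_tile = [(200, 180, 210)] * 4                                      # uniform color
varied_tile = [(0, 0, 0), (255, 255, 255), (0, 0, 0), (255, 255, 255)]  # strong contrast
print(color_variance(flat_tile))
print(color_variance(varied_tile))
```

A uniform tile scores zero while a high-contrast tile scores high, so a scorer of this kind favors visually heterogeneous regions rather than any specific tissue type.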
Preview tile locations before extraction to validate tiler configuration:
# Preview random tile locations (the tile count comes from the tiler's own n_tiles)
preview = random_tiler.locate_tiles(
    slide=slide,
    extraction_mask=TissueMask(),
)
locate_tiles() returns the slide thumbnail as a PIL image with colored rectangles indicating tile positions. In a notebook the image renders inline; elsewhere, call preview.show() or preview.save(...).
from histolab.slide import Slide
from histolab.tiler import RandomTiler
# Load slide
slide = Slide("slide.svs", processed_path="output/tiles/")
# Configure tiler
tiler = RandomTiler(
    tile_size=(512, 512),
    n_tiles=100,
    level=0,
    seed=42,
)
# Extract tiles (saved to processed_path)
tiler.extract(slide)
import logging
# Enable logging
logging.basicConfig(level=logging.INFO)
# Extract tiles; progress is reported through log messages
tiler.extract(slide)
# Illustrative output (the exact message format depends on the histolab version):
# INFO: Tile 1/100 saved...
# INFO: Tile 2/100 saved...
# Generate CSV report with tile information
score_tiler = ScoreTiler(
    tile_size=(512, 512),
    n_tiles=50,
    scorer=NucleiScorer(),
)
# Extract and save report
score_tiler.extract(slide, report_path="tiles_report.csv")
# Report contains: tile name, coordinates, score, tissue percentage
Report format (illustrative; check the header row of your own report, since columns can vary by histolab version):
tile_name,x_coord,y_coord,level,score,tissue_percent
tile_001.png,10240,5120,0,0.89,95.2
tile_002.png,15360,7680,0,0.85,91.7
...
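Once a report exists, it can be filtered with the standard `csv` module. The sketch below assumes the column names shown in the sample report above (`tile_name`, `score`); adjust them to whatever header your report actually contains. The helper `tiles_above` is hypothetical.

```python
import csv
import io

# Sample rows copied from the report format shown above
report_text = """tile_name,x_coord,y_coord,level,score,tissue_percent
tile_001.png,10240,5120,0,0.89,95.2
tile_002.png,15360,7680,0,0.85,91.7
"""


def tiles_above(report_file, min_score):
    """Return tile names whose score column is at least min_score."""
    reader = csv.DictReader(report_file)
    return [row["tile_name"] for row in reader if float(row["score"]) >= min_score]


print(tiles_above(io.StringIO(report_text), min_score=0.87))  # ['tile_001.png']
```

For a report on disk, pass an open file handle (for example `open("tiles_report.csv", newline="")`) in place of the `io.StringIO` object.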
Extract tiles at different magnification levels:
# High resolution tiles (level 0)
high_res_tiler = RandomTiler(tile_size=(512, 512), n_tiles=50, level=0)
high_res_tiler.extract(slide)
# Medium resolution tiles (level 1)
med_res_tiler = RandomTiler(tile_size=(512, 512), n_tiles=50, level=1)
med_res_tiler.extract(slide)
# Low resolution tiles (level 2)
low_res_tiler = RandomTiler(tile_size=(512, 512), n_tiles=50, level=2)
low_res_tiler.extract(slide)
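Keeping `tile_size` fixed while raising `level` means each tile covers a larger physical area. A small arithmetic sketch, assuming downsample factors of 1, 4, and 16 for levels 0 to 2; these factors are an assumption, and the real values depend on the slide file (OpenSlide exposes them as `level_downsamples`).

```python
def level0_footprint(tile_size, downsample):
    """Size in level-0 pixels of the area covered by a tile read at a coarser level."""
    tile_w, tile_h = tile_size
    return (round(tile_w * downsample), round(tile_h * downsample))


# Assumed downsample factors per level; real values come from the slide file
assumed_downsamples = {0: 1.0, 1: 4.0, 2: 16.0}
for level, factor in assumed_downsamples.items():
    print(level, level0_footprint((512, 512), factor))
# 0 (512, 512)
# 1 (2048, 2048)
# 2 (8192, 8192)
```

Under these assumed factors, a 512x512 tile at level 2 spans as much tissue as 256 level-0 tiles, which matters when comparing tissue percentages or scores across levels.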
Extract tiles at multiple scales, reusing the same random seed:
# Extract random locations at level 0
random_tiler_l0 = RandomTiler(
    tile_size=(512, 512),
    n_tiles=30,
    level=0,
    seed=42,
    prefix="level0_",
)
random_tiler_l0.extract(slide)
# Reuse the seed at level 1. An identical seed is not guaranteed to reproduce
# the same locations at another level, because tile footprints and tissue
# checks differ; compare both tilers with locate_tiles() before relying on it.
random_tiler_l1 = RandomTiler(
    tile_size=(512, 512),
    n_tiles=30,
    level=1,
    seed=42,
    prefix="level1_",
)
random_tiler_l1.extract(slide)
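When tiles at several scales must share exactly the same centre, one option is to compute the level-0 bounding boxes explicitly. The sketch below is a geometric illustration only: `centered_box_level0`, the centre point, and the downsample factor of 4 are assumptions, and it does not call any histolab API.

```python
def centered_box_level0(center, tile_size, downsample):
    """Level-0 bounding box (x0, y0, x1, y1) of a tile centred on `center`."""
    cx, cy = center
    half_w = tile_size[0] * downsample / 2
    half_h = tile_size[1] * downsample / 2
    return (int(cx - half_w), int(cy - half_h), int(cx + half_w), int(cy + half_h))


center = (20000, 12000)  # hypothetical point of interest in level-0 coordinates
fine = centered_box_level0(center, (512, 512), downsample=1.0)
coarse = centered_box_level0(center, (512, 512), downsample=4.0)
print(fine)    # (19744, 11744, 20256, 12256)
print(coarse)  # (18976, 10976, 21024, 13024)
```

The coarse box fully contains the fine one, so the two tiles form a context/detail pair around the same point.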
Apply additional filtering after extraction:
import cv2  # OpenCV, needed for the Laplacian filter
from PIL import Image
import numpy as np
from pathlib import Path
def filter_blurry_tiles(tile_dir, threshold=100):
    """Remove blurry tiles using Laplacian variance."""
    for tile_path in Path(tile_dir).glob("*.png"):
        img = Image.open(tile_path)
        gray = np.array(img.convert('L'))
        # Low variance of the Laplacian means few sharp edges, i.e. a blurry tile
        laplacian_var = cv2.Laplacian(gray, cv2.CV_64F).var()
        if laplacian_var < threshold:
            tile_path.unlink()  # Remove blurry tile
            print(f"Removed blurry tile: {tile_path.name}")
# Use after extraction
tiler.extract(slide)
filter_blurry_tiles("output/tiles/")
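To show what the Laplacian-variance measure responds to, here is a pure-Python sketch using the 4-neighbour Laplacian on tiny grayscale grids. It evaluates interior pixels only, so its numbers will not match OpenCV exactly (OpenCV also processes border pixels); `laplacian_variance` is a hypothetical helper for illustration.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels of a 2D list."""
    responses = []
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    return pvariance(responses)


flat = [[128] * 5 for _ in range(5)]                                         # no edges at all
checker = [[255 if (x + y) % 2 else 0 for x in range(5)] for y in range(5)]  # sharp edges
print(laplacian_variance(flat))
print(laplacian_variance(checker))
```

The featureless grid yields zero while the checkerboard yields a very large value. A suitable threshold depends on stain, magnification, and scanner, so it is worth calibrating on a few known sharp and blurry tiles.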
Use locate_tiles() to verify tile placement.
Possible solutions to common extraction problems:
Adjust the tissue_percent threshold.
Set check_tissue=True and adjust the tissue_percent threshold.
Adjust n_tiles for RandomTiler/ScoreTiler.
Use pixel_overlap=0 for non-overlapping tiles, or change the pixel_overlap value.