Stain-Robust Mitotic Figure Detection for MIDOG 2022 Challenge

Mostafa Jahanifar Tissue Image Analytics Centre, Department of Computer Science, University of Warwick, UK. Adam Shephard Tissue Image Analytics Centre, Department of Computer Science, University of Warwick, UK. Neda Zamanitajeddin Tissue Image Analytics Centre, Department of Computer Science, University of Warwick, UK. Shan E Ahmed Raza Tissue Image Analytics Centre, Department of Computer Science, University of Warwick, UK. Nasir Rajpoot Tissue Image Analytics Centre, Department of Computer Science, University of Warwick, UK.

Abstract

The detection of mitotic figures from different scanners/sites remains an important topic of research, owing to its potential in assisting clinicians with tumour grading. The MItosis DOmain Generalization (MIDOG) 2022 challenge aims to test the robustness of detection models on unseen data from multiple scanners and tissue types for this task. We present a short summary of the approach employed by the TIA Centre team to address this challenge. Our approach is based on a hybrid detection model, where mitotic candidates are first segmented and then refined by a deep learning classifier. The model achieved an F1-score of 0.816 in cross-validation on the training images and 0.784 on the preliminary test set, demonstrating its generalizability to unseen data from new scanners.

Keywords: Mitosis | Segmentation | Classification | MIDOG Challenge

Correspondence: mostafa.jahanifar@warwick.ac.uk

Introduction

The detection of mitotic figures is an important task in the analysis of tumour regions (veta2015assessment). The abundance, or count, of mitotic figures has been shown to be strongly correlated with cell proliferation, which in turn is an important prognostic indicator of tumour behaviour, and thus is a key parameter in several tumour grading systems (veta2015assessment, Aubreville2020). However, other imposter/mimicker cells are often mistaken for mitotic figures due to their similar appearance/morphology, leading to large inter-rater variability. The introduction of deep learning methods for the automated detection/counting of mitotic figures in histology images offers a potential solution to this challenge.

An additional challenge is the translation of machine learning models into clinical practice (i.e., on whole-slide images or WSIs generated by digital slide scanners), which requires a high degree of robustness to staining and scanner variations. The WSIs can vary in their appearance as a result of differences in the way in which the sample was prepared (e.g. preparation/staining procedures), scanner acquisition method (scanner settings), and tumour type itself. The result of these variations is a domain shift between WSIs collected from different scanners/sites.

Last year, the MItosis DOmain Generalization (MIDOG) 2021 challenge (midog2021) provided a means of testing different algorithms on cohorts of expertly annotated histology images for mitotic figure detection in the presence of a domain shift, specifically in human breast cancer. To combat these challenges, we incorporated stain augmentation during the training of our proposed hybrid mitosis detection pipeline. This pipeline consisted of (a) a mitotic candidate segmentation model and (b) refinement by a deep learning (DL) classifier, and achieved joint-first place on final testing (jahanifar2021midog).

The mitotic count in histopathology images is an important metric not only for tumours, but for a wide range of other neoplastic tissue from differing cancer types and species. However, there is a corresponding domain shift in the appearance of this morphology when switching between tissue types, which can lead to a drop in performance when applying machine learning algorithms to new domains. This year, the MIDOG 2022 challenge (midog2022) aims to further the work of MIDOG21 by testing algorithms on hundreds of tumour cases, acquired from different laboratories, from different species (e.g. human, dog, cat), with different whole-slide scanners. There are two tracks to the MIDOG22 challenge. In the first track, participants may only use the released challenge data for training/evaluating their algorithm. Conversely, in the second track, participants may use any publicly available datasets and/or models to solve the problem.

Owing to the success of our algorithm in MIDOG21 (jahanifar2021midog), we have applied similar models to MIDOG22. We again incorporated stain augmentation during the training of our proposed hybrid mitosis detection pipeline, which consists of (a) a mitotic candidate segmentation model and (b) refinement by a DL classifier. For the first track, we used mitosis disks as pseudo ground-truth (GT) maps for the training of our segmentation model. For the second track, however, we generated GT segmentation masks of mitotic figures via a semi-automated DL model (jahanifar2019nuclick, koohbanani2020nuclick). Using a pre-trained DL method to generate the GT allows the DL models to exploit important contextual information and to treat this detection task as a segmentation task instead.

Figure 1: An example of generated mitosis mask and mitosis point GT to be used for training of the proposed segmentation model.

Methodology

0.1 Mitosis candidate segmentation

0.1.1 Mitosis mask generation

We approach the mitosis candidate detection problem as a segmentation task. However, in order to train a CNN for the segmentation task in a supervised manner, GT masks of the desired objects within the image are required. Since the organizers have only provided approximate bounding box annotations for each mitosis in the released MIDOG dataset, we obtained mitotic instance segmentation masks in two different ways for the two tracks of the challenge. In the first track, where the use of external data/models is not allowed, we generate a mitosis point map by drawing disks with a radius of 17 pixels centred at the GT mitosis points provided by the challenge organizers. For the second track, however, we use NuClick (https://github.com/mostafajahanifar/nuclick_torch) (jahanifar2019nuclick, koohbanani2020nuclick), a CNN-based interactive segmentation model capable of generating a precise segmentation mask for each mitotic figure from a point annotation within the mitotic figure. Therefore, for each annotation in the dataset, we fed the centre point of the bounding box alongside the patch from the original image into NuClick to generate the individual segmentation mask. Fig. 1(a) shows an example image for which the mitosis mask generated using NuClick is shown in Fig. 1(b). The right panel of Fig. 1 shows zoomed-in regions, where Fig. 1(d) and Fig. 1(e) show the mitosis point and mitosis mask GT maps, respectively.
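The disk-based pseudo GT used for the first track can be sketched in a few lines. This is only an illustrative sketch, not the authors' code: the function name `make_disk_map`, the `(x, y)` point convention, and the example coordinates are assumptions; only the 17-pixel radius comes from the text.

```python
import numpy as np

def make_disk_map(shape, points, radius=17):
    """Binary map with a filled disk of `radius` pixels at each (x, y) point."""
    height, width = shape
    # Open grids of row and column indices that broadcast to (height, width).
    yy, xx = np.ogrid[:height, :width]
    mask = np.zeros((height, width), dtype=np.uint8)
    for x, y in points:
        # A pixel belongs to the disk if it lies within `radius` of the centre.
        mask[(xx - x) ** 2 + (yy - y) ** 2 <= radius ** 2] = 1
    return mask

# Hypothetical mitosis centres inside a 512 x 512 patch.
disk_map = make_disk_map((512, 512), [(100, 120), (300, 40)])
```

Overlapping disks simply merge in this sketch; whether neighbouring mitoses were kept as separate instances is not stated in the text.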

0.1.2 Segmentation model

We employed a lightweight segmentation model, called Efficient-UNet (jahanifar2021semantic), for the segmentation task. The Efficient-UNet is a fully convolutional network based on an encoder-decoder design paradigm where the encoder branch is the B0 variant of Efficient-Net (tan2019efficientnet). Using this model with pre-trained weights from ImageNet as a backbone allows the overall model to benefit from transfer-learning, by extracting better feature representations and gaining higher domain generalizability. The Jaccard loss function (jahanifar2018segmentation) is robust against the imbalanced population of positive and negative pixels in the segmentation dataset, and thus has been utilised to train the model.
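To illustrate why a Jaccard-type loss copes with pixel imbalance, the following is a minimal sketch of a common soft Jaccard (intersection-over-union) formulation. It is written in NumPy for readability; the exact form used in (jahanifar2018segmentation) may differ, and the name `soft_jaccard_loss` and the smoothing constant `eps` are assumptions.

```python
import numpy as np

def soft_jaccard_loss(pred, target, eps=1e-7):
    """1 - soft IoU between predicted probabilities and a binary target."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    intersection = np.sum(pred * target)
    union = np.sum(pred) + np.sum(target) - intersection
    # `eps` keeps the ratio defined when both maps are empty.
    return 1.0 - (intersection + eps) / (union + eps)

# One positive pixel among 10,000: predicting an empty map is still
# penalised with a loss close to 1, however many negatives there are.
target = np.zeros((100, 100))
target[50, 50] = 1.0
loss_empty_prediction = soft_jaccard_loss(np.zeros((100, 100)), target)
loss_perfect_prediction = soft_jaccard_loss(target, target)
```

Because true negatives appear in neither the intersection nor the union, the large background area does not dilute the loss, unlike a pixel-wise average.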

0.1.3 Model training

In order to train and evaluate the model, we extracted $512 \times 512$ patches from the original images. There was a large class imbalance in the training dataset, as far fewer patches contained mitoses (positive patches) than did not (negative patches). Since we did not wish to introduce a bias towards predicting empty maps (which would increase the number of false negatives), we devised an on-the-fly under-sampling approach that guaranteed a similar number of positive and negative patches were sampled at the beginning of each epoch. Here, we used all positive patches in every epoch but randomly sampled an equal number of negative patches in each epoch. This way, we trained a segmentation model that maintains a high level of precision whilst having a high recall.
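The per-epoch under-sampling described above can be sketched as follows. This is a minimal sketch under stated assumptions: patches are represented by hypothetical string identifiers, and `sample_epoch` is an illustrative name rather than part of the authors' code.

```python
import random

def sample_epoch(positive_patches, negative_patches, rng):
    """Return all positives plus an equally sized random draw of negatives."""
    n_negatives = min(len(positive_patches), len(negative_patches))
    # A fresh draw each epoch exposes the model to different negatives over time.
    sampled_negatives = rng.sample(negative_patches, n_negatives)
    epoch = list(positive_patches) + sampled_negatives
    rng.shuffle(epoch)
    return epoch

positives = [f"pos_{i}" for i in range(5)]     # hypothetical patch identifiers
negatives = [f"neg_{i}" for i in range(100)]
rng = random.Random(0)
epoch_1 = sample_epoch(positives, negatives, rng)
epoch_2 = sample_epoch(positives, negatives, rng)
```

Calling the function at the start of every epoch keeps each epoch balanced while still cycling through the full pool of negative patches across training.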

0.1.4 Post-processing and candidate extraction

At inference, each image is tiled with overlap ($512 \times 512$ patches with an overlap of 75 pixels) and the results for all tiles are aggregated to generate the segmentation prediction map. We then apply a sequence of morphological operations and compute the centroids of the connected components to extract candidate mitotic cells from the segmentation map.
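A rough sketch of the tiling and candidate extraction steps is given below. Several details are assumptions because the text does not specify them: aligning the last tile to the image border, the 0.5 threshold, a single 3 x 3 binary opening standing in for the "sequence of morphological operations", and the minimum component area. The helper names are illustrative only.

```python
import numpy as np
from scipy import ndimage

def tile_starts(length, tile=512, overlap=75):
    """Start offsets of overlapping tiles that cover `length` pixels."""
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # assumed: last tile is aligned to the border
    return starts

def extract_candidates(prob_map, threshold=0.5, min_area=10):
    """Return (x, y) centroids of connected components in a probability map."""
    binary = prob_map > threshold
    # Opening removes isolated pixels and thin spurs before labelling.
    binary = ndimage.binary_opening(binary, structure=np.ones((3, 3)))
    labels, n_components = ndimage.label(binary)
    if n_components == 0:
        return []
    index = np.arange(1, n_components + 1)
    areas = np.bincount(labels.ravel())[1:]
    centroids = ndimage.center_of_mass(binary.astype(float), labels, index)
    return [(float(cx), float(cy))
            for (cy, cx), area in zip(centroids, areas) if area >= min_area]

# Toy prediction map: one 11 x 11 blob and one isolated noisy pixel.
prob_map = np.zeros((100, 100))
prob_map[20:31, 40:51] = 0.9
prob_map[80, 80] = 0.9
candidates = extract_candidates(prob_map)
```

With a tile size of 512 and an overlap of 75 pixels the stride is 437 pixels, so a 1000-pixel axis is covered by tiles starting at 0, 437, and 488 in this sketch.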

0.2 Mitosis candidates refinement

In the final step of our method, the mitosis candidates discovered in the previous step were verified using a classifier. Here, we trained an Efficient-Net-B7 (tan2019efficientnet) classifier to distinguish between mitoses and mimickers. To train the classifier we extracted mitosis and mimicker patches ( $128 \times 128$ pixels) based on the annotations provided by the challenge organizers. Again, to deal with the problem of class imbalance, we incorporated the on-the-fly under-sampling technique.

0.3 Data augmentation

To make both segmentation and classification networks more robust against the variation seen in histology images, we include the standard data augmentation techniques during the network training phase, including image flipping, rotation, shearing, zooming, elastic deformation, brightness and contrast adjustment, blurring, sharpening, color jittering, and saturation adjustment. The extent and combinations of these augmentation techniques are randomly selected on-the-fly and differ from epoch to epoch. Furthermore, stain augmentation has been incorporated using TIAToolbox (pocock2021tiatoolbox) where the stain components of the input patches to segmentation or classification network were adjusted randomly during the training.
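The idea behind stain augmentation can be illustrated with a simplified NumPy sketch. It is not the TIAToolbox implementation: here a fixed haematoxylin/eosin/residual stain matrix (the widely used Ruifrok-Johnston reference vectors) is assumed, whereas stain augmentation tools typically estimate the stain matrix from each image. The function name and the perturbation ranges `sigma_alpha` and `sigma_beta` are also assumptions.

```python
import numpy as np

# Assumed fixed stain matrix; rows are haematoxylin, eosin, and a residual channel.
RGB_FROM_HED = np.array([[0.65, 0.70, 0.29],
                         [0.07, 0.99, 0.11],
                         [0.27, 0.57, 0.78]])
HED_FROM_RGB = np.linalg.inv(RGB_FROM_HED)

def stain_augment(rgb, rng, sigma_alpha=0.2, sigma_beta=0.02):
    """Randomly perturb stain concentrations of a float RGB patch in [0, 1]."""
    # Beer-Lambert law: optical density is linear in stain concentration.
    optical_density = -np.log(np.clip(rgb, 1e-6, 1.0))
    concentrations = optical_density @ HED_FROM_RGB
    # Scale and shift each stain channel independently.
    alpha = rng.uniform(1 - sigma_alpha, 1 + sigma_alpha, size=3)
    beta = rng.uniform(-sigma_beta, sigma_beta, size=3)
    augmented_od = (concentrations * alpha + beta) @ RGB_FROM_HED
    return np.clip(np.exp(-augmented_od), 0.0, 1.0)

rng = np.random.default_rng(0)
patch = rng.uniform(0.05, 1.0, size=(8, 8, 3))   # stand-in for an H&E patch
augmented = stain_augment(patch, rng)
unchanged = stain_augment(patch, rng, sigma_alpha=0.0, sigma_beta=0.0)
```

Setting both ranges to zero returns the input patch, which is a convenient sanity check that the decomposition and reconstruction are consistent.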

Experiment	Cross-Validation Recall	Cross-Validation Precision	Cross-Validation F1 Score	Preliminary Testing Recall	Preliminary Testing Precision	Preliminary Testing F1 Score
Track 1: Mitosis Point Segmentation + Classification	0.8314	0.8012	0.8160	0.7833	0.7855	0.7844
Track 2: Mitosis Mask Segmentation + Classification	0.8364	0.7978	0.8167	0.7472	0.7982	0.7718

Table 1: Mitosis detection results for cross-validation experiments on MIDOG22 training set and external preliminary test set.

0.4 Inference

The same pipeline as used for training was applied to each input image for inference. However, in order to benefit from all the models and all the training data, we also included “model ensembling” and “test time augmentation” (TTA) techniques in the inference pipeline. Therefore, during segmentation and classification, predictions from all three models from the cross-validation experiments (in addition to predictions on input image variations by TTA techniques like image flipping and sharpening) are averaged to make more confident and robust final predictions on unseen data.
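The combination of model ensembling and TTA can be sketched as averaging over every (model, transform) pair. This is a minimal sketch: models are assumed to be callables mapping an image array to a prediction map, only flips are shown as TTA transforms, and `ensemble_tta_predict` is an illustrative name. For segmentation, each prediction must be mapped back to the original orientation before averaging; for classification scores no inverse transform is needed.

```python
import numpy as np

# Pairs of (forward transform on the image, inverse transform on the prediction).
TTA_TRANSFORMS = [
    (lambda image: image, lambda pred: pred),   # identity
    (np.fliplr, np.fliplr),                     # horizontal flip
    (np.flipud, np.flipud),                     # vertical flip
]

def ensemble_tta_predict(models, image):
    """Average predictions over all models and all TTA transforms."""
    predictions = []
    for model in models:
        for forward, inverse in TTA_TRANSFORMS:
            predictions.append(inverse(model(forward(image))))
    return np.mean(predictions, axis=0)

# Toy stand-ins for the three cross-validation models.
toy_models = [lambda img, s=scale: s * img.mean(axis=-1) for scale in (0.5, 1.0, 1.5)]
image = np.random.default_rng(0).uniform(size=(16, 16, 3))
averaged = ensemble_tta_predict(toy_models, image)
```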

Evaluation and Results

The training set released with the MIDOG 2022 challenge contains 354 images with GT point annotations. All segmentation and classification models were evaluated in a 3-fold cross-validation framework. For the first track of the challenge we only used the MIDOG22 dataset (midog2022), whereas for the second track we also incorporated the TUPAC mitosis dataset (veta2019predicting) as an auxiliary training dataset. We additionally generated ground truth mitosis point maps and mitosis masks for the TUPAC dataset. We optimised our segmentation and classification models in cross-validation based on the segmentation loss and the classification F1-score, respectively. The results of the cross-validation experiments of the proposed method trained for tracks 1 and 2 of the challenge are reported in Table 1. The cross-validation results suggest that using both mitosis masks and the auxiliary TUPAC dataset only marginally improves the overall mitosis detection, increasing the F1 score from 0.8160 to 0.8167. On the preliminary test set, performance drops for both models and their order reverses: the Track 1 model achieved an F1 score of 0.7844, whereas the Track 2 model achieved 0.7718. However, the preliminary test set contains only 20 images, some of which contain no mitoses (including plain white/black images). Thus, care should be taken in interpreting the discrepancy between the cross-validation and preliminary test performance of our two models. Readers are invited to refer to the challenge leaderboard (https://midog2022.grand-challenge.org/evaluation/final-test-phase-task-1-without-additional-data/leaderboard/) to see the latest results.
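As a quick consistency check, the F1 scores in Table 1 can be recomputed from the reported recall and precision, since F1 is their harmonic mean. The snippet below is a small sketch using only the values from Table 1; the recomputed scores agree with the reported ones to within about 1e-4, which is consistent with rounding of the tabulated recall and precision.

```python
def f1_score(recall, precision):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (recall, precision, reported F1) taken from Table 1.
table_1 = {
    ("Track 1", "cross-validation"): (0.8314, 0.8012, 0.8160),
    ("Track 1", "preliminary test"): (0.7833, 0.7855, 0.7844),
    ("Track 2", "cross-validation"): (0.8364, 0.7978, 0.8167),
    ("Track 2", "preliminary test"): (0.7472, 0.7982, 0.7718),
}

for (track, split), (recall, precision, reported) in table_1.items():
    recomputed = f1_score(recall, precision)
    print(f"{track} {split}: recomputed {recomputed:.4f}, reported {reported:.4f}")
```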

Discussion and Conclusion

In this work, we have presented a method for the challenge of mitotic figure detection in histology images in the presence of a domain shift. Our proposed method first segments mitotic figures, based on the Efficient-UNet architecture, before passing the results of segmentation on to a DL-based classifier to further differentiate between mitotic figures and hard negatives (mimickers). The proposed method achieved a high F1-score of 0.784 when tested on the preliminary test set for the MIDOG22 challenge.