Today we would like to share our thoughts and investigations into a very promising direction: human-in-the-loop AI for medical image analysis within a single environment. Our platform lets you manage and annotate data, train neural networks, apply them for automatic pre-annotation and then deploy them as an API.
Challenge 1: data privacy
Medical data is personal and not easy to access. Because of data privacy concerns, most public health centers are reluctant to share it.
Challenge 2: size of annotated data
The annotation process is hard to outsource, because only expert physicians can analyze medical images. This limitation leads to high costs and to a lack of annotated data.
Challenge 3: quality of annotation tools
Annotation tools that can be used to extract insights from medical images are still limited and, in most cases, not publicly available, so most of the analysis has to be done manually.
Challenge 4 (consequence of 1 and 2): segmentation challenge
Datasets for segmentation tasks are typically extremely small compared to large public datasets of common images (COCO, Pascal VOC and so on). With so little data, it is difficult to train very deep neural network architectures. Objects of interest can vary in size, shape and position, and their “soft” boundaries create additional problems.
Supervisely: user interfaces
We realize that there is still a lot of work ahead: increasing the number of convenient annotation tools and adding support for the DICOM format, three-dimensional images, sequences of images and so on. But these are only technical issues: the first steps are already done and promising results have been obtained. We are passionate about accelerating medicine and happy to be a part of the global research community that is bringing the deep learning revolution to healthcare.
There could be no more important application of this new capability [deep learning] than improving patient care
— Jensen Huang, NVIDIA CEO and co-founder
The dataset contains 28 annotated images with a resolution of 999 × 960. We consider the case where only 6 annotated images are available for training. The other images are used for the final quality evaluation. All training images are shown below:
This is the whole training dataset we use. The scenario is pretty close to the real world: a medical doctor annotates a few images, then a neural network is trained on this data and applied to other images for pre-segmentation. Then the doctor just corrects the NN predictions. This approach is called human-in-the-loop AI. It aims to significantly increase the efficiency of the human expert. PS. Thanks to Supervisely, the entire study took 2 hours without haste ☕.
What the DTL query interface looks like
In this use case we applied horizontal/vertical flips and relatively large random crops. We got 264 training examples from only 6 annotated images. Here is the visualization of the computational graph that we applied to our data:
Resulting crops after augmentation
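For readers who want to reproduce the idea outside the platform, here is a minimal NumPy sketch of flip-and-crop augmentation. It is not the DTL graph we used: the function names, the crop size range and `crops_per_variant=11` are assumptions, picked only so that 4 flip variants × 11 crops × 6 images gives the same 264 examples. The array shape below is also just a stand-in for one image and its vessel mask.

```python
import numpy as np

def flip_variants(image, mask):
    """Yield the original pair plus horizontal, vertical and combined flips."""
    yield image, mask
    yield image[:, ::-1], mask[:, ::-1]        # horizontal flip
    yield image[::-1, :], mask[::-1, :]        # vertical flip
    yield image[::-1, ::-1], mask[::-1, ::-1]  # both flips

def random_crop(image, mask, rng, min_frac=0.6, max_frac=0.9):
    """Cut the same random window out of the image and its mask.

    The window covers 60-90% of each side (hypothetical values).
    """
    h, w = mask.shape
    ch = int(h * rng.uniform(min_frac, max_frac))
    cw = int(w * rng.uniform(min_frac, max_frac))
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    return (image[top:top + ch, left:left + cw],
            mask[top:top + ch, left:left + cw])

def augment(image, mask, crops_per_variant, seed=0):
    """Return a list of (image_crop, mask_crop) pairs for one annotated image."""
    rng = np.random.default_rng(seed)
    samples = []
    for img_v, mask_v in flip_variants(image, mask):
        for _ in range(crops_per_variant):
            samples.append(random_crop(img_v, mask_v, rng))
    return samples

# Dummy stand-ins for one annotated image and its binary vessel mask.
image = np.zeros((960, 999, 3), dtype=np.uint8)
mask = np.zeros((960, 999), dtype=bool)
samples = augment(image, mask, crops_per_variant=11)
print(len(samples))      # 44 crops per image
print(len(samples) * 6)  # 264 crops for 6 images
```

The important detail is that the image and the mask always go through exactly the same flip and the same crop window, otherwise the annotation no longer matches the pixels.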
There are a few state-of-the-art neural networks for semantic segmentation in Supervisely. One of them is our custom UNet-like architecture. We chose it because our training dataset is small and the network is accurate and fast to train. We also use a combination of Binary Cross Entropy and Dice losses because of the class imbalance problem: vessel pixels cover only a few percent of the image area, in contrast to background pixels. We trained the NN for 50 epochs. It is interesting to visualize the neural network's predictions during training. We take an unseen image and apply the NN after each epoch. Here you can see how our NN becomes smarter over time.
Supervisely supports multi-GPU training. Each epoch takes around 20 seconds on four GPUs, so 50 epochs take around 17 minutes in total.
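To show why the Dice term helps with class imbalance, here is a small NumPy sketch of a combined BCE + Dice loss. It is an illustration rather than the exact loss from our training code: the equal weighting (`bce_weight=0.5`), the smoothing constant `eps` and the function name are assumptions.

```python
import numpy as np

def bce_dice_loss(probs, target, bce_weight=0.5, eps=1e-7):
    """Weighted sum of binary cross entropy and soft Dice loss.

    probs  : predicted vessel probabilities in [0, 1]
    target : binary ground-truth mask of the same shape
    """
    probs = np.clip(probs.astype(np.float64), eps, 1.0 - eps)
    target = target.astype(np.float64)

    # Per-pixel cross entropy, averaged over the whole image.
    bce = -np.mean(target * np.log(probs) + (1.0 - target) * np.log(1.0 - probs))

    # Soft Dice coefficient: overlap relative to the total foreground mass.
    intersection = np.sum(probs * target)
    dice = (2.0 * intersection + eps) / (np.sum(probs) + np.sum(target) + eps)

    return bce_weight * bce + (1.0 - bce_weight) * (1.0 - dice)

# Toy mask where vessels cover 3% of the pixels.
target = np.zeros((100, 100))
target[:, :3] = 1.0

perfect = bce_dice_loss(target, target)
all_background = bce_dice_loss(np.zeros_like(target), target)
print(perfect < 1e-3)        # True
print(all_background > 0.5)  # True
```

A network that predicts "background" everywhere is right on 97% of the pixels in this toy example, yet its Dice term is close to 1 because it overlaps with none of the vessels. That is the behaviour we want when the foreground is only a few percent of the image.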
Left: NN predictions, Right: ground truth
As you can see from this comparison, all relatively thick vessels are segmented, and there is no noise. This means that the human only has to draw a few hairline vessels with the “polyline” tool. Also, as far as we understand, real data has a much higher resolution than the public data we use in this experiment. We think this is crucial for the quality of hairline segmentation: the resolution of publicly available images is not enough. Look at this example: do you see the vessels that are annotated by doctors?
Left: meme, Middle: original image, Right: doctor’s annotation
We were not lazy and measured the time: how long manual annotation from scratch takes versus correcting NN predictions. Manual annotation from scratch: 36 minutes / image. Correction of NN predictions: 4 minutes / image.
The conclusion is obvious.
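To put the two measurements in perspective, here is a back-of-the-envelope calculation for the 22 images that were not used for training. It uses only the numbers quoted in this post (36 and 4 minutes per image, around 17 minutes of training). It ignores the time spent annotating the 6 training images, which is the same in both scenarios, and the constant and function names are ours.

```python
MANUAL_MIN_PER_IMAGE = 36      # manual annotation from scratch
CORRECTION_MIN_PER_IMAGE = 4   # correcting NN predictions
TRAINING_MIN = 17              # one-off training cost (approximate)

def manual_minutes(n_images):
    """Total time if every image is annotated from scratch."""
    return n_images * MANUAL_MIN_PER_IMAGE

def assisted_minutes(n_images):
    """Total time if the NN pre-segments and a human corrects the result."""
    return TRAINING_MIN + n_images * CORRECTION_MIN_PER_IMAGE

remaining = 28 - 6  # images left after the 6 training images
print(manual_minutes(remaining))    # 792
print(assisted_minutes(remaining))  # 105
print(MANUAL_MIN_PER_IMAGE / CORRECTION_MIN_PER_IMAGE)  # 9.0
```

The one-off training cost is paid back after the first corrected image, and the gap keeps growing with every additional image.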