Why Standardisation Of Pre-inferencing Image Processing Methods Is Crucial For Deep Learning Algorithms – Compelling Evidence Based On The Variations In Outputs For Different Inferencing Workflows


To evaluate whether the outputs of a deep learning algorithm differ to a statistically significant degree between two inferencing workflows that process images differently before inference.


The study was performed on DeepCOVIDXR, an open-source ensemble of convolutional neural networks developed to detect COVID-19 on frontal chest radiographs. The algorithm was evaluated on a dataset of 905 chest X-rays comprising 484 COVID-19-positive cases (as determined by RT-PCR testing) and 421 COVID-19-negative cases. The algorithm supports both batch image processing (workflow1) and single image processing (workflow2) for inferencing, and every X-ray was run through both workflows. In batch image processing, images were first resized (to 224x224 and 331x331) and the lung region was then cropped out; in single image processing, the lung region was cropped without resizing the images.
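The difference in ordering between the two workflows can be illustrated with a minimal sketch. It is not the DeepCOVIDXR code: the helper names `resize_nearest`, `crop_fraction`, `workflow1`, and `workflow2`, the fixed `LUNG_BOX`, and the synthetic input image are all assumptions made for illustration, and the real algorithm's resizing and lung-cropping steps will differ.

```python
import numpy as np

def resize_nearest(img, size):
    """Nearest-neighbour resize of a 2-D array to (size, size)."""
    rows = np.arange(size) * img.shape[0] // size
    cols = np.arange(size) * img.shape[1] // size
    return img[rows[:, None], cols]

def crop_fraction(img, box):
    """Crop with a box given as fractions (top, bottom, left, right)."""
    h, w = img.shape
    top, bottom, left, right = box
    return img[int(top * h):int(bottom * h), int(left * w):int(right * w)]

# Hypothetical fixed lung bounding box, used only to keep the sketch short.
LUNG_BOX = (0.2, 0.8, 0.1, 0.9)

def workflow1(img, size=224):
    """Batch-style path: resize first, then crop the lung region."""
    return crop_fraction(resize_nearest(img, size), LUNG_BOX)

def workflow2(img):
    """Single-image-style path: crop the lung region without resizing."""
    return crop_fraction(img, LUNG_BOX)

# Synthetic stand-in for a chest X-ray (not real data).
rng = np.random.default_rng(0)
xray = rng.integers(0, 256, size=(1000, 1200), dtype=np.uint8)

out1 = workflow1(xray)
out2 = workflow2(xray)
print(out1.shape, out2.shape)  # the two paths hand the model different inputs
```

Even in this simplified form, the same source image yields arrays of different size and pixel content depending on the order of operations, which is the kind of divergence the two workflows introduce before the networks see the image.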


We observed a significant difference in the results of the two inferencing workflows. The AUC for COVID-19 classification was 0.632 with batch image processing (workflow1) and 0.769 with single image processing (workflow2). Results were discordant in 334 studies: 164 X-rays were classified as positive in workflow1 but negative in workflow2, and 170 were classified as negative in workflow1 but positive in workflow2.
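The discordance counts above can be tallied from paired binary predictions as in the following sketch. The function name `discordance` and the toy prediction arrays are assumptions for illustration; only the counts 164, 170, and 905 come from the study.

```python
import numpy as np

def discordance(pred1, pred2):
    """Count cases where two workflows disagree on the same images.

    Returns (positive in 1 / negative in 2,
             negative in 1 / positive in 2,
             total discordant, discordant fraction).
    """
    pred1 = np.asarray(pred1, dtype=bool)
    pred2 = np.asarray(pred2, dtype=bool)
    pos_to_neg = int(np.sum(pred1 & ~pred2))
    neg_to_pos = int(np.sum(~pred1 & pred2))
    total = pos_to_neg + neg_to_pos
    return pos_to_neg, neg_to_pos, total, total / pred1.size

# Toy predictions for six images (illustrative only, not study data).
toy_w1 = [1, 1, 0, 0, 1, 0]
toy_w2 = [1, 0, 0, 1, 0, 0]
toy_result = discordance(toy_w1, toy_w2)
print(toy_result)

# Arithmetic check on the counts reported in the study.
reported_total = 164 + 170        # 334 discordant studies
reported_rate = reported_total / 905
print(f"{reported_total} discordant, {reported_rate:.1%} of 905 X-rays")
```

On the reported counts, the two workflows disagreed on roughly 37% of the 905 X-rays.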


We report statistically significant differences in the results of a deep learning algorithm when different inferencing workflows are used.


With the rising adoption of radiology AI, it is important to understand that seemingly innocuous changes in the processing pathway can lead to disastrous clinical results.