Number: Su1831
IDENTIFICATION AND RANKING OF GENE MODULE ASSOCIATION WITH HISTOLOGIC FEATURES IN ULCERATIVE COLITIS
Society: AGA
Track: Inflammatory Bowel Diseases
Histologic features in digitized whole slide images (WSIs) of inflammatory bowel disease (IBD) biopsies capture aspects of disease activity that are increasingly seen as valuable in characterizing mucosal healing and remission in clinical trials. Given the success of weakly supervised deep learning (DL) models in estimating histologic disease severity from WSIs, we investigated whether such models can identify visual histologic phenotypes associated with gene co-expression modules (sets of genes with highly coordinated expression across samples) related to IBD biology. Using transcriptomic profiling (microarray) data from biopsies adjacent to 1967 H&E-stained colonic biopsies collected from 1599 patients in three ulcerative colitis (UC) clinical trials (NCT01959282, NCT01988961, and NCT01988961), we applied correlation and clustering analysis to identify 16 gene co-expression modules. After removing trial-specific transcriptomic batch effects with ComBat, we used 1579 WSIs of the above biopsies to train a multi-instance learning (MIL) multitask DL model with self-attention to estimate the Gene Set Variation Analysis (GSVA) signature computed for each of those modules. To compensate for the limited training data, our signature estimation pipeline incorporated a foundation model: a very large neural network that uses self-supervised learning to learn histological features from additional unlabeled WSI datasets. To create fixed-length representations of input WSIs, we used a Self-Distillation with No Labels (DINO) v2 foundation model with 10^8 parameters, pretrained on over 56,000 WSIs of biopsies obtained from patients with IBD or several different types of cancer in multiple clinical trial and retrospective study datasets.
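The MIL step described above, in which many tile-level embeddings from one WSI are combined into a single slide-level representation that feeds one regression output per gene module, can be illustrated with a minimal sketch. This is not the authors' model: it substitutes a simple attention-pooling layer for the self-attention architecture mentioned in the abstract, and every name here (`attention_pool`, `multitask_heads`, `attn_vector`, the toy weights) is hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(tile_embeddings, attn_vector):
    """Pool tile embeddings into one slide embedding.

    Assumption: a simple attention pooling (score = attn_vector . tanh(h)),
    used here as a stand-in for the self-attention model in the abstract.
    """
    scores = [dot(attn_vector, [math.tanh(x) for x in h]) for h in tile_embeddings]
    weights = softmax(scores)
    dim = len(tile_embeddings[0])
    slide = [sum(w * h[d] for w, h in zip(weights, tile_embeddings)) for d in range(dim)]
    return slide, weights


def multitask_heads(slide_embedding, head_weights, head_biases):
    """One linear regression head per gene module (multitask output)."""
    return [dot(w, slide_embedding) + b for w, b in zip(head_weights, head_biases)]


# Toy example: a "slide" made of two tiles with 2-dimensional embeddings.
tiles = [[1.0, 0.0], [3.0, 2.0]]
slide, weights = attention_pool(tiles, attn_vector=[0.5, -0.5])
# Two hypothetical module heads; in the study there would be 16.
predictions = multitask_heads(slide, head_weights=[[1.0, 1.0], [0.5, -0.5]], head_biases=[0.0, 0.1])
print(weights, predictions)
```

In a trained model the attention vector and head weights would be learned jointly, and the attention weights indicate which tiles contribute most to each slide-level estimate.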
We measured model predictive performance by computing the root mean squared error (RMSE) of each module's estimated signature values, normalized by the signature range and averaged across 4 cross-validation folds; mean normalized RMSE values for the modules ranged from 0.22 to 0.30 on a [0, 1] scale. By ranking model performance on a test subset of 388 WSIs for each of the target tasks, we were able to identify the modules whose signatures are most strongly associated with histological image data. We observed that the modules with the best performance (lowest RMSE), which included those corresponding to immune signaling, granulocytes, stromal tissue, and plasma cells, are all associated with active inflammation and other aspects of immune response. These findings suggest that estimating gene expression associated with immune response and inflammation from adjacent histology images may provide a richer assessment of histologic disease activity than existing severity measures alone, without requiring additional and costly transcriptomic analysis of imaged biopsy tissue.
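The evaluation metric described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the signature range is assumed to be the maximum minus the minimum of the true signature values pooled over all folds, and the function names, module names, and toy values are hypothetical.

```python
import math


def normalized_rmse(y_true, y_pred, value_range):
    """RMSE divided by the range of the target signature."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / value_range


def rank_modules(folds):
    """Rank modules by mean normalized RMSE across folds (lowest first).

    `folds` is a list of dicts mapping module name -> (y_true, y_pred).
    Assumption: the range is taken over the true values of all folds.
    """
    scores = {}
    for module in folds[0]:
        all_true = [v for fold in folds for v in fold[module][0]]
        value_range = max(all_true) - min(all_true)
        per_fold = [
            normalized_rmse(fold[module][0], fold[module][1], value_range)
            for fold in folds
        ]
        scores[module] = sum(per_fold) / len(per_fold)
    return sorted(scores.items(), key=lambda item: item[1])


# Toy data: two hypothetical modules evaluated on two folds.
toy_folds = [
    {
        "module_a": ([-0.5, 0.0, 0.5], [-0.4, 0.1, 0.4]),
        "module_b": ([-0.5, 0.0, 0.5], [0.0, 0.0, 0.0]),
    },
    {
        "module_a": ([-0.2, 0.3], [-0.1, 0.2]),
        "module_b": ([-0.2, 0.3], [0.3, -0.2]),
    },
]
for name, score in rank_modules(toy_folds):
    print(f"{name}: mean normalized RMSE = {score:.3f}")
```

Dividing by the range puts every module on a comparable [0, 1] scale, so that modules with wide and narrow signature distributions can be ranked against each other.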

