
Exploring self-supervised learning biases for microscopy image representation

Published online by Cambridge University Press:  14 November 2024

Ihab Bendidi
Affiliation:
IBENS, Ecole Normale Supérieure PSL, Paris, 75005, France Minos Biosciences, Paris, 75005, France
Adrien Bardes
Affiliation:
INRIA, Ecole Normale Supérieure PSL, Paris, 75005, France FAIR, Meta, Paris, 75005, France
Ethan Cohen
Affiliation:
IBENS, Ecole Normale Supérieure PSL, Paris, 75005, France Synsight, Evry, 91000, France
Alexis Lamiable
Affiliation:
IBENS, Ecole Normale Supérieure PSL, Paris, 75005, France
Guillaume Bollot
Affiliation:
Synsight, Evry, 91000, France
Auguste Genovesio*
Affiliation:
IBENS, Ecole Normale Supérieure PSL, Paris, 75005, France
*
Corresponding author: Auguste Genovesio; Email: auguste.genovesio@ens.psl.eu

Abstract

Self-supervised representation learning (SSRL) in computer vision relies heavily on simple image transformations such as random rotation, crops, or illumination changes to learn meaningful and invariant features. Despite its acknowledged importance, the impact of transformation choice has received little comprehensive exploration in the literature. Our study delves into this relationship, specifically focusing on microscopy imaging with subtle cell phenotype differences. We reveal that transformation design acts as a form of either unwanted or beneficial supervision, impacting feature clustering and representation relevance. Importantly, these effects vary based on class labels in a supervised dataset. In microscopy images, transformation design significantly influences the representation, introducing imperceptible yet strong biases. We demonstrate that strategic transformation selection, based on desired feature invariance, drastically improves classification performance and representation quality, even with limited training samples.
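The transformations discussed in the abstract are the stochastic "views" that SSRL methods contrast or align. A minimal sketch of such a two-view augmentation pipeline, using only NumPy (this is a hypothetical illustration, not the authors' pipeline; the function name and parameters are assumptions):

```python
import numpy as np

def random_views(img, rng, crop=24, jitter=0.4):
    """Produce two stochastically transformed 'views' of one image,
    as contrastive SSRL methods do (hypothetical minimal pipeline)."""
    views = []
    for _ in range(2):
        h, w, _ = img.shape
        # random crop of size crop x crop
        top = rng.integers(0, h - crop + 1)
        left = rng.integers(0, w - crop + 1)
        v = img[top:top + crop, left:left + crop].astype(np.float32)
        if rng.random() < 0.5:                        # random horizontal flip
            v = v[:, ::-1]
        v = v * (1.0 + rng.uniform(-jitter, jitter))  # brightness jitter
        views.append(np.clip(v, 0.0, 255.0))
    return views

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3))
a, b = random_views(img, rng)
print(a.shape, b.shape)  # (24, 24, 3) (24, 24, 3)
```

Varying `crop` or `jitter` here corresponds to the transformation-parameter sweeps studied in the paper's figures.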

Information

Type
Research Article
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Figure 1. Different transformation parameter choices induce an inter-class bias. Inter-class linear probing accuracy versus variation of a transformation parameter, for ResNet18 architectures trained with various SSRL methods on the benchmark datasets Cifar10, Cifar100, and Imagenet100. Each dot and associated error bar reflects the mean and standard deviation of three runs for Imagenet100 and five runs for Cifar with different random seeds. While overall accuracy remains relatively consistent across a range of transformation parameters, these transformations can have a subtle but significant impact on individual class performance, either favoring or penalizing specific classes. Additional comparisons are available in Supplementary Materials.
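Linear probing, the evaluation used in Figure 1, fits a linear classifier on frozen encoder features and reports accuracy both overall and per class. A minimal sketch with scikit-learn on synthetic stand-in features (the function name and synthetic data are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def linear_probe(train_feats, train_y, test_feats, test_y):
    """Fit a linear classifier on frozen encoder features; return
    overall accuracy and the per-class view used for inter-class bias."""
    clf = LogisticRegression(max_iter=1000).fit(train_feats, train_y)
    pred = clf.predict(test_feats)
    overall = accuracy_score(test_y, pred)
    per_class = {int(c): accuracy_score(test_y[test_y == c], pred[test_y == c])
                 for c in np.unique(test_y)}
    return overall, per_class

# synthetic stand-in for encoder outputs (NOT real SSRL features)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = rng.integers(0, 4, 200)
X[np.arange(200), y] += 3.0  # make classes linearly separable
overall, per_class = linear_probe(X[:150], y[:150], X[150:], y[150:])
print(round(overall, 2))
```

Tracking `per_class` rather than `overall` is what exposes the subtle favoring or penalizing of specific classes.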


Figure 2. Analyzing negative correlations in class accuracies in Cifar10 and Cifar100 datasets. Using diverse backbones and SSRL methods, plus varying transformations, we assess the proportion of classes with negative correlations in these datasets. (a) In Cifar100 with ResNet18, more negatively correlated classes are seen, likely due to overlapping classes with increased color jitter. (b) We apply ResNet18/50 and ConvNeXt-Tiny backbones with SimCLR, BYOL, and VICReg, on Cifar100, adjusting hue intensity. The ratio of negatively correlated classes remains consistent across configurations, suggesting these patterns in (a) are independent of the SSL method and encoder architecture.


Table 1. Correlation values between class properties and the effect of transformations on classes. We focus on class properties such as Intrinsic Dimension, Texture Analysis, Fourier Transform, and Spectrum of Feature Covariance, and transformations such as Hue Intensity, Color Jitter Probability, and Crop Size, applied on Cifar100. Values significantly larger than 1 indicate a notable difference between behavior groups with respect to the varying transformation. Asterisks (*) denote p-values > 0.05, indicating non-significant correlations
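The significance flagging in Table 1 can be reproduced in miniature: compute a correlation between a per-class property and the transformation's per-class effect, and mark entries whose p-value exceeds 0.05. A sketch on synthetic data (the variables `prop` and `effect` are hypothetical stand-ins, not the paper's measurements):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
# hypothetical per-class property vs. accuracy change under a transformation
prop = rng.normal(size=100)
effect = 0.8 * prop + rng.normal(scale=0.5, size=100)

r, p = pearsonr(prop, effect)
flag = "*" if p > 0.05 else ""  # asterisk marks non-significant entries
print(round(r, 2), flag)
```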


Figure 3. Transformation choices influence superclass performance. We analyze mean superclass accuracy in Cifar100 using BYOL, SimCLR, and VICReg SSRL methods, varying crop size (a) or hue intensity (b). Our observations show consistent patterns across models, highlighting distinct effects of transformation parameters on different superclasses. Each superclass has unique optimal parameters, underlining the ability of transformation selection to modulate superclass performance.


Table 2. Metrics for clustering, linear evaluation, and LPIPS(58) in VGG11 models on MNIST(29) using MoCov2(11) and various transformations are shown. Specific transformations’ effects are examined across training configurations. The First Set, in bold, yields digit representations, while the Second Set focuses on handwriting style and thickness. Top1 Accuracy is from a Linear Evaluation, and LPIPS, using an AlexNet(27) backbone, reflects perceptual similarity. Silhouette scores(41) suggest good cluster quality in the second set, despite AMI scores indicating inaccurate digit cluster capture.
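Table 2 contrasts two clustering metrics: AMI, which measures agreement with ground-truth labels, and the Silhouette score, which measures geometric cluster quality independently of labels — which is how a representation can cluster well (high Silhouette) while not capturing digit identity (low AMI). A minimal sketch of computing both with scikit-learn, on synthetic blobs standing in for learned representations:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_mutual_info_score, silhouette_score

rng = np.random.default_rng(0)
# two well-separated blobs standing in for learned representations
reps = np.vstack([rng.normal(0, 0.3, (50, 8)), rng.normal(3, 0.3, (50, 8))])
labels = np.array([0] * 50 + [1] * 50)

pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reps)
ami = adjusted_mutual_info_score(labels, pred)  # agreement with labels
sil = silhouette_score(reps, pred)              # label-free cluster geometry
print(round(ami, 2), round(sil, 2))
```

When the labels of interest change (digit class vs. handwriting style), the same clusters can score very differently on AMI while the Silhouette score stays unchanged.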


Figure 4. The selection of transformations dictates the features learned during the training process, thus enabling the adaptation of a model for different tasks. A t-SNE projection of the ten-class clustering of the MNIST dataset(29) was performed on two representations obtained from two self-supervised training runs of the same model using MoCo V2(11), with the sole distinction being the selection of transformations employed. One representation (a) retains information pertaining to the digit classes, achieved through padding, color inversion, rotation, and random cropping, while the other representation (b) preserves information regarding the handwriting font weight and style, achieved through vertical flips, rotation, and random cropping.
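The t-SNE projection used for Figure 4 reduces high-dimensional representations to 2-D for visual cluster inspection. A minimal sketch with scikit-learn (the synthetic `reps` array is an assumption standing in for real MNIST representations):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# stand-in for learned representations from three classes
reps = np.vstack([rng.normal(i, 0.2, (30, 16)) for i in range(3)])

# project to 2-D for visual cluster inspection
emb = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(reps)
print(emb.shape)  # (90, 2)
```

Note that t-SNE only visualizes the structure already present in the representation; the structure itself is determined by the transformations chosen during training.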


Figure 5. Single cells’ genetic expression and environment (a - untreated) cause inherent dissimilarities within conditions, challenging perturbation detection and measurement. The figure (b - high concentration Nocodazole-treated cells) shows four morphological responses to the same treatment, one resembling untreated cells (b - far right). Most lower-concentration treatments produce phenotypes visually similar to untreated cells (data not shown). Images in this dataset are centered on a cell.


Table 3. Adjusted mutual information (AMI) scores(47) obtained for two sets of transformations, with different SSRL approaches and backbones, averaged over five training runs each. Scores are compared to each other and to the AMI achieved by the representations of models (ResNet101 and VGG16) pre-trained with supervision on ImageNet, applied to the dataset subsets containing Nocodazole, Cytochalasin B, and Taxol. The choice of pre-trained model is studied in Supplementary Materials. Both sets of transformations comprise random rotations, affine transformations, color jitter, and flips; the first set adds random cropping and yields a mediocre AMI score, while the second set instead applies center cropping and yields a significantly higher score.


Figure 6. K-Means (k = 4) clustering on Nocodazole data subset (see Supplementary Material), using VGG13 and MoCov2 with different augmentations, aims to categorize cells’ morphological responses. Images nearest to each cluster’s centroid are based on Euclidean distance in representations. Clusters in (a), formed with color jitter, flips, rotation, affine transformation, and random cropping, focus on cell quantity per image. Clusters in (b), from rotations, center cropping, color jitter, and flips, consider specific phenotypes. Transformation details are in Supplementary Material.
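The exemplar selection described in Figure 6 — picking, for each K-Means cluster, the image whose representation lies closest to the centroid in Euclidean distance — can be sketched as follows (the function name and synthetic representations are assumptions for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

def nearest_to_centroids(reps, k=4, seed=0):
    """Cluster representations with K-Means and return, for each cluster,
    the index of the sample closest to its centroid (Euclidean distance)."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(reps)
    idx = [int(np.argmin(np.linalg.norm(reps - c, axis=1)))
           for c in km.cluster_centers_]
    return km.labels_, idx

rng = np.random.default_rng(0)
# four well-separated synthetic groups standing in for cell representations
reps = rng.normal(size=(40, 8)) + np.repeat(np.arange(4), 10)[:, None] * 5
labels, exemplars = nearest_to_centroids(reps, k=4)
print(len(exemplars))  # 4
```

The returned indices would then be used to fetch the corresponding cell images for visual inspection of each cluster.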


Figure 7. The clustering results were obtained using two MoCo v2 losses(11) with a VGG13 backbone, each with a distinct set of transformations, on the Nocodazole (a), Cytochalasin B (b), and Taxol (c) image treatment subsets. One loss employs color jitter, flips, rotation, affine transformation, and random cropping, while the other uses rotations, center cropping, color jitter, and flips. The clustering results demonstrate that the phenotypes of each subset are clearly separated and represented in each cluster, as evidenced by the images closest to its centroid.


Table 4. AMI score comparison in K-Means (k = 2) clusterings of models trained with two SSRL losses versus an ImageNet pre-trained encoder on Nocodazole, Cytochalasin B, and Taxol treated cell subsets. One SSRL loss uses color jitter, flips, rotation, affine transformation, and random cropping; the other, rotations, center cropping, color jitter, and flips. Careful selection of transformation sets, tailored to desired features, enhances clustering performance in self-supervised training over supervised pre-trained models, even in small datasets.


Figure 8. Ablation study on Adjusted Mutual Information (AMI) scores with progressive transformation integration, using MoCo v2 SSRL and VGG13. The first set of transformations, especially random rotations, notably improves the score and representation. Including center cropping, focusing on cellular center, further enhances results.

Supplementary material: Bendidi et al. supplementary material (File, 3.6 MB)