
Implicit Generative Models Evaluation

Shahine edited this page May 18, 2020 · 12 revisions

Qualitative Evaluation

Nearest Neighbors

Real samples from the training set are displayed next to their nearest neighbors in the space of samples the model can generate.

Cons :

  • Typically computed with Euclidean distance, which is very sensitive to perceptually minor perturbations
  • A model that overfits the training set passes this test trivially
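The lookup above can be sketched in NumPy, assuming samples are arrays flattened to pixel vectors and using the plain Euclidean distance criticized above:

```python
import numpy as np

def nearest_neighbors(generated, train, k=1):
    """For each generated sample, return the indices of its k nearest
    training samples under Euclidean distance on flattened pixels."""
    g = generated.reshape(len(generated), -1)
    t = train.reshape(len(train), -1)
    # Pairwise squared distances via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2
    d2 = (g ** 2).sum(1)[:, None] - 2 * g @ t.T + (t ** 2).sum(1)[None, :]
    return np.argsort(d2, axis=1)[:, :k]
```

In practice the same lookup is often run with a perceptual distance (e.g. features of a pretrained network) to soften the Euclidean sensitivity issue.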

"Turing-like" tests

Measures the model's ability to fool human subjects with generated samples.

Cons :

  • Cumbersome and expensive; experimental hazards cause inconsistent evaluation settings across subjects
  • Fails to evaluate diversity --> overfitting models pass this test too

Visualizing Internals of the Model

Visualize representation disentanglement, latent space continuity, discriminator features and, more generally, any facet of the model's regularity.
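For instance, latent space continuity is commonly visualized by decoding linear interpolations between two latent codes; a minimal sketch of the interpolation step (the generator that would decode each code is left out):

```python
import numpy as np

def interpolate_latents(z_a, z_b, steps=8):
    """Return `steps` latent codes linearly interpolated between z_a
    and z_b; decoding each one with the model's generator visualizes
    how smoothly the generation space varies along the path."""
    alphas = np.linspace(0.0, 1.0, steps)
    return np.stack([(1 - a) * z_a + a * z_b for a in alphas])
```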

Image Quality Assessment Metrics

Image quality assessment measures the quality of an image either with reference to an original image (full-reference) or without one (no-reference). We review here some metrics that have been used in works on generative methods for remote sensing (Wang et al. 2019; Grohnfeldt et al. 2018).

PSNR (Peak Signal to Noise Ratio)

Compares the peak power of a clean image y to the power of the corrupting noise in its corrupted version x as:

\mathrm{PSNR}(x, y) = 10 \log_{10}\!\left( \frac{\mathrm{MAX}_y^2}{\mathrm{MSE}(x, y)} \right), \qquad \mathrm{MSE}(x, y) = \frac{1}{N} \sum_{i=1}^{N} (x_i - y_i)^2

where MAX_y is the maximum attainable pixel value.

Pros : Simple and cheap to compute; widely reported, which eases comparison across works

Cons : High sensitivity towards biases in brightness; correlates poorly with perceived visual quality
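A minimal NumPy sketch of this expression, assuming pixel values in [0, max_val]:

```python
import numpy as np

def psnr(x, y, max_val=255.0):
    """Peak Signal-to-Noise Ratio (dB) between a corrupted image x and
    its clean reference y; higher is better."""
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    if mse == 0:
        return np.inf  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Note the brightness sensitivity: a constant offset added to x raises the MSE uniformly and lowers PSNR, even though the image looks perceptually unchanged.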

SAM (Spectral Angle Mapper, Boardman et al. 1993)

Estimates spectral similarity by comparing, at each pixel, the angle between the spectral (band) vectors of the two images.

Given a pair of NxNxd images x and y, we have :

\mathrm{SAM}(x, y) = \frac{1}{N^2} \sum_{i,j=1}^{N} \arccos\!\left( \frac{\langle x_{ij}, y_{ij} \rangle}{\lVert x_{ij} \rVert \, \lVert y_{ij} \rVert} \right)

where x_{ij}, y_{ij} ∈ R^d are the spectral vectors at pixel (i, j).

Variations :

  • Kernel-SAM : applies the kernel trick to the base SAM expression

Pros : Invariant to scalar scaling of the spectra, hence robust to global brightness differences

Cons : Ignores spatial structure and is blind to differences in spectral magnitude
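A sketch of the per-pixel angle computation, assuming H x W x d arrays with the spectral dimension last:

```python
import numpy as np

def sam(x, y, eps=1e-12):
    """Mean Spectral Angle Mapper (radians) between two H x W x d
    images: the angle between the spectral vectors at each pixel,
    averaged over all pixels. 0 means identical spectral directions."""
    dot = (x * y).sum(axis=-1)
    norms = np.linalg.norm(x, axis=-1) * np.linalg.norm(y, axis=-1)
    cos = np.clip(dot / (norms + eps), -1.0, 1.0)  # guard rounding
    return np.arccos(cos).mean()
```

Because the angle is unchanged when either spectrum is rescaled, SAM is insensitive to exactly the global brightness biases that dominate PSNR.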

SSIM (Structural Similarity Index, Wang et al. 2004)

Estimates structural disparities based on luminance, contrast and structure for a pair of image windows x and y as:

\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}

This index is the product of luminance, contrast and structure comparison terms; see Wang et al. 2004 for the individual expressions.

Pros : Finds large-scale mode collapse reliably

Cons : Fails to diagnose smaller effects such as loss of variation in colors and textures; does not assess quality in terms of similarity to the dataset

Variations :

  • ESSIM: adds edge information
  • MS-SSIM: multi-scale comparison
  • FSIM: compares phase congruency and gradient magnitude
  • CW-SSIM: compares complex wavelet transform (deals with issues of image scaling, translation and rotation)
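As a hedged sketch of the base SSIM expression, the following computes it with global image statistics; the Wang et al. 2004 formulation instead evaluates it over sliding local windows (Gaussian-weighted, 11x11) and averages the local scores:

```python
import numpy as np

def ssim_global(x, y, max_val=255.0, k1=0.01, k2=0.03):
    """Simplified SSIM over whole images (global statistics) rather
    than local windows; k1, k2 are the commonly used defaults."""
    c1, c2 = (k1 * max_val) ** 2, (k2 * max_val) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```

The score is 1 for identical images and decreases as luminance, contrast or structure diverge; the global variant is cruder than the windowed one but keeps the same expression.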

Sharpness Difference (SD)

Measures the difference in sharpness between a generated image and its reference, typically by comparing their image gradients in a PSNR-like logarithmic formulation.

Pros : Sensitive to blurring, which PSNR and SSIM can under-penalize

Cons : Only assesses sharpness; says nothing about content fidelity or diversity
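Since the exact formulation is not given here, the following is only an illustrative sketch that compares mean absolute image gradients; it is an assumption about the intended metric, not a canonical definition:

```python
import numpy as np

def sharpness_difference(x, y):
    """Illustrative sharpness-difference sketch: compare the mean
    absolute finite-difference gradients of x and y. A blurry image
    has weaker gradients, so the gap grows with loss of sharpness."""
    def grad_energy(img):
        gx = np.abs(np.diff(img, axis=0)).mean()  # vertical gradients
        gy = np.abs(np.diff(img, axis=1)).mean()  # horizontal gradients
        return gx + gy
    return abs(grad_energy(x) - grad_energy(y))
```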

Probabilistic Measures

To be completed. As of now this is not a priority in the context of virtual remote sensing product generation, since we have access to the generation groundtruth and would rather focus on evaluation procedures based on comparison to the groundtruth.

References
