Implicit Generative Models Evaluation
Real samples from the training set are displayed next to their nearest neighbors in the space of achievable generations.
Cons :
- Typically computed with the Euclidean distance, which is very sensitive to minor perceptual perturbations
- Overfitting to the training set makes it trivial to pass this test
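As a rough illustration of the retrieval step, here is a minimal sketch assuming images are stored as NumPy arrays and compared with the pixel-wise Euclidean distance. The function name `nearest_neighbors` and the array layout are assumptions made for this example, not part of any referenced implementation.

```python
import numpy as np

def nearest_neighbors(queries, candidates):
    """For each query image, return the index of and distance to its closest candidate.

    queries:    array of shape (n, H, W, C), e.g. real training samples
    candidates: array of shape (m, H, W, C), e.g. generated samples
    """
    q = queries.reshape(len(queries), -1).astype(np.float64)
    c = candidates.reshape(len(candidates), -1).astype(np.float64)
    # Squared Euclidean distances via ||q||^2 - 2 q.c + ||c||^2
    d2 = (q ** 2).sum(axis=1)[:, None] - 2.0 * q @ c.T + (c ** 2).sum(axis=1)[None, :]
    d2 = np.maximum(d2, 0.0)  # clip tiny negative values caused by round-off
    idx = d2.argmin(axis=1)
    return idx, np.sqrt(d2[np.arange(len(q)), idx])
```

Because the distance is taken in raw pixel space, a one-pixel shift of an otherwise identical image can already change the retrieved neighbor, which is the sensitivity issue listed in the cons.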
Measures the ability of generated samples to fool human subjects.
Cons :
- Cumbersome and expensive; experimental hazards cause inconsistent evaluation settings between subjects
- Fails to evaluate diversity, so overfitted models pass this test too
Visualize representation disentanglement, space continuity, discriminator features and, more generally, any facet of the model's regularity.
Image quality assessment measures the quality of an image, either with reference to an original image (full-reference) or without one (no-reference). We review here some metrics that have been used in work on generative methods for remote sensing (Wang et al. 2019, Grohnfeldt et al. 2018).
Compares the power of a clean image y to the power of the noise corrupting its degraded version x as :
Pros :
Cons : High sensitivity towards biases in brightness
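Assuming the measure described above is the peak signal-to-noise ratio (the PSNR row of the table further down), a common definition is 10 · log10(MAX² / MSE(x, y)), where MAX is the largest possible pixel value. The sketch below follows that definition; the function name and the `max_value` parameter are assumptions made for this example.

```python
import numpy as np

def psnr(x, y, max_value=1.0):
    """PSNR in dB between a corrupted image x and its clean reference y.

    max_value is the largest possible pixel value (1.0 for images scaled
    to [0, 1], 255 for 8-bit images).
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    mse = np.mean((x - y) ** 2)
    if mse == 0:
        return float("inf")  # identical images: no noise power
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

A constant brightness offset leaves the image structure untouched but adds directly to the mean squared error, which is one way to see the brightness sensitivity noted in the cons.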
SAM (Spectral Angle Mapper, Boardman et al. 1993)
Estimates spectral similarity by measuring the angle between the spectra (band vectors) of corresponding pixels.
Given a pair of NxNxd images x and y, we have :
Variations :
- Kernel-SAM : use kernel trick on base SAM expression
Pros :
Cons :
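A minimal sketch of SAM, assuming the usual formulation in which each pixel is treated as a d-dimensional spectrum and the angle arccos(⟨x, y⟩ / (‖x‖ ‖y‖)) is averaged over all pixels. The function name, the `eps` guard and the choice to average are assumptions made for this example.

```python
import numpy as np

def spectral_angle_mapper(x, y, eps=1e-12):
    """Mean spectral angle (in radians) between two (N, N, d) images."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    dot = np.sum(x * y, axis=-1)  # per-pixel inner product over bands
    norms = np.linalg.norm(x, axis=-1) * np.linalg.norm(y, axis=-1)
    # Guard against zero spectra, then clip round-off outside [-1, 1]
    cos = np.clip(dot / np.maximum(norms, eps), -1.0, 1.0)
    return float(np.mean(np.arccos(cos)))
```

With this formulation, rescaling every pixel spectrum by a positive factor leaves the angle unchanged, so the score reflects spectral shape rather than overall intensity.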
SSIM (Structural Similarity Index, Wang et al. 2004)
Estimates structural disparities based on luminance, contrast and structure for a pair of image windows x and y as :
See the original paper (Wang et al. 2004) for the luminance, contrast and structure expressions.
Pros : Finds large-scale mode collapse reliably
Cons : Fails to diagnose smaller effects such as loss of variation in colors and textures, and does not assess quality in terms of similarity to the dataset
Variations :
- ESSIM: adds edge information
- MS-SSIM: multi-scale comparison
- FSIM: compares phase congruency and gradient magnitude
- CW-SSIM: compares complex wavelet transform (deals with issues of image scaling, translation and rotation)
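For reference, a minimal single-window sketch of SSIM using the commonly used simplified form (2 μx μy + C1)(2 σxy + C2) / ((μx² + μy² + C1)(σx² + σy² + C2)), with C1 = (k1 · L)² and C2 = (k2 · L)², where L is the data range. The function name and defaults are assumptions made for this example. The full index averages this quantity over sliding local windows, which this sketch does not do.

```python
import numpy as np

def ssim_window(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM between two image windows x and y of identical shape."""
    x = np.asarray(x, dtype=np.float64).ravel()
    y = np.asarray(y, dtype=np.float64).ravel()
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)
```

The score is 1 for identical windows and decreases as luminance, contrast or structure diverge. It can become negative when the two windows are anti-correlated.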
Pretty self-explanatory ? 😄
where
Pros :
Cons :
| Type | Category | Metric | Comment | Ref | Implementation |
|---|---|---|---|---|---|
| Full-Reference | Error/Distortion-based | Mean Absolute Error | - | - | `np.abs(x - y).mean()` |
| | | Mean Squared Error | - | - | `np.square(x - y).mean()` |
| | | PSNR | - | - | |
| | | SVD-distortion | averages stretcher deviation by block | None found | |
| | | Distortion Measure | didn't understand this one quite well | None found | |
| | Similarity-based | Structural Content | Ratio of sums of squares | - | `np.sum(y**2) / np.sum(x**2)` |
| | | Mutual Information | - | - | |
| | | Cross-Correlation | - | - | |
| | | Spectral Angle Mapper | easy to implement | None found | |
| | | Universal Index | precursor of SSIM (no stabilizing constants) | | |
| | | Structural Similarity Index (SSIM) | structure × luminance × contrast | | |
| | | Multiscale-SSIM (MS-SSIM) | same but over multiple image scales | | |
| | | Features-SSIM (FSIM) | phase congruency and gradient magnitude | None found | |
| | | Complex-Wavelet-SSIM (CW-SSIM) | handles scaling, translation and rotation | | |
| No-Reference | | BRISQUE | estimates asymmetric generalized Gaussian params on MSCN distribution - requires training | | |
| | | GMLOGQA | gradient magnitude and Laplacian of Gaussian response - requires training | | |
| | | ILNIQE | estimates Weibull params fitting gradient magnitude - requires training | | |
| | | SSEQ | spatial and spectral entropy features - requires training | | |
| | | ENIQA | improved SSEQ with multiple scales, Log-Gabor and bandwise approach - requires training | | |
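The table gives no implementation for the cross-correlation and mutual information rows. The sketch below shows one possible reading of each, under stated assumptions: cross-correlation is taken as the zero-normalized (Pearson) form, and mutual information is estimated from a joint histogram of pixel values. Function names and the `bins` parameter are choices made for this example. Other definitions exist in the literature.

```python
import numpy as np

def cross_correlation(x, y):
    """Zero-normalized cross-correlation of pixel values, in [-1, 1]."""
    x = np.asarray(x, dtype=np.float64).ravel()
    y = np.asarray(y, dtype=np.float64).ravel()
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))

def mutual_information(x, y, bins=32):
    """Histogram estimate of the mutual information (in nats) between pixel values."""
    joint, _, _ = np.histogram2d(np.ravel(x), np.ravel(y), bins=bins)
    p_xy = joint / joint.sum()             # joint distribution
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    nonzero = p_xy > 0                     # skip empty cells to avoid log(0)
    return float(np.sum(p_xy[nonzero] * np.log(p_xy[nonzero] / (p_x @ p_y)[nonzero])))
```

Note that the histogram estimate is biased upward when there are few pixels relative to `bins ** 2`, so values are best compared between images of the same size and with the same binning.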
To be completed. As of now, this is not a priority in the context of virtual remote sensing product generation: we have access to the generation ground truth and would rather focus on evaluation procedures based on comparison to it.
- Pros and Cons of GAN Evaluation Measures, Borji 2018 : comprehensive overview of GAN evaluation measures
- A note on the evaluation of generative models, Theis et al. 2015 : explains well why some measures are inconsistent with each other
- More to come