Random Images for TCAV for Diabetic Retinopathy application #115

Closed
soumbane opened this issue Jun 4, 2021 · 9 comments

@soumbane

soumbane commented Jun 4, 2021

Hi,

I have a question about applying TCAV to Diabetic Retinopathy (DR). For TCAV on DR, what do you choose as the random images? Are they completely random images, as shown in the paper, or are they related to DR or healthy eyes?

Also, how many random folders/experiments do you use for statistical significance testing in the DR application?

@jameswex
Collaborator

jameswex commented Jun 7, 2021

For random images, we used random DR images from the same larger dataset used to train/test the DR model. They are random in that they aren't cherry-picked to have any specific concepts in them (such as all diseased eyes, for example).
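As a rough illustration of the sampling described above, here is a minimal sketch that draws several random sets from one shared pool of images. The file names, set size, and the helper `make_random_sets` are hypothetical and are not part of the TCAV library; it only assumes the pool is a list of image paths from the same larger dataset.

```python
import random


def make_random_sets(pool, num_sets, set_size, seed=0):
    """Draw `num_sets` random image sets from `pool`.

    Images are sampled without replacement inside each set, but the
    same image may appear in more than one set.
    """
    if set_size > len(pool):
        raise ValueError("set_size cannot exceed the pool size")
    rng = random.Random(seed)  # fixed seed so the sets are reproducible
    return [rng.sample(pool, set_size) for _ in range(num_sets)]


# Hypothetical file names standing in for the larger DR dataset.
pool = [f"dr_image_{i:04d}.png" for i in range(500)]
random_sets = make_random_sets(pool, num_sets=20, set_size=50)
```

Nothing here filters by concept or by disease level, which matches the "not cherry-picked" description in the comment above.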

@BeenKim can answer about how many random sets to use to test significance.

@soumbane
Author

soumbane commented Jun 7, 2021

@jameswex Thank you for your response. So, just to summarize: you selected random images from the entire train/test set (i.e., any image, not chosen for any particular concept), and your concept images were related to the particular concept of interest. For example, for level 1 of DR, the concept folders had DR images with "MA" (concept folder 1) and "HMA" (concept folder 2), and the random folders were general DR images, with or without "MA" or "HMA" (any DR image). Is that correct?

@jameswex
Collaborator

jameswex commented Jun 7, 2021

That is correct from my understanding, but it would be good to have @BeenKim confirm as well.

@soumbane
Author

soumbane commented Jun 7, 2021

Thank you @jameswex for your help. Yes, any insights from @BeenKim regarding this issue would be very helpful.

@BeenKim
Contributor

BeenKim commented Jun 7, 2021

What James said is correct! You can also train a relative CAV instead (unlike what we did). To do that, train the MA concept using all the level 1 images that do not have MA as the negative set. Depending on your embedding geometry, this might work better (in hand-wavy theory, it gets you more "fine-grained" results).
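To make the relative CAV idea concrete, here is a minimal sketch under stated assumptions: activations are plain Python lists, the data is synthetic, and the linear classifier is a small hand-written logistic regression rather than whatever the TCAV library uses internally. The names `train_cav`, `with_ma`, and `without_ma` are made up for illustration. The CAV is taken as the unit-norm weight vector pointing from the negative set (level 1 images without MA) toward the concept set (images with MA).

```python
import math
import random


def train_cav(pos, neg, epochs=200, lr=0.1):
    """Fit logistic regression (pos=1, neg=0) with batch gradient descent
    and return the unit-norm weight vector as the CAV."""
    dim = len(pos[0])
    w = [0.0] * dim
    b = 0.0
    data = [(x, 1.0) for x in pos] + [(x, 0.0) for x in neg]
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted prob minus label
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]


# Synthetic stand-ins for layer activations: the two groups differ
# mainly along dimension 0. Real activations would come from the model.
rng = random.Random(0)
dim = 4
with_ma = [[rng.gauss(1.0, 0.5)] + [rng.gauss(0.0, 0.5) for _ in range(dim - 1)]
           for _ in range(40)]
without_ma = [[rng.gauss(-1.0, 0.5)] + [rng.gauss(0.0, 0.5) for _ in range(dim - 1)]
              for _ in range(40)]

cav = train_cav(with_ma, without_ma)
```

The only difference from the random-negative setup discussed earlier in the thread is which images go into the negative set; the training step itself stays the same.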

Re random sets: I think we tried somewhere in the range of 20-30 random sets to test significance.
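For readers wondering how scores from 20-30 random sets turn into a significance check, here is a hedged sketch. The TCAV paper uses a two-sided t-test over the scores; to stay within the standard library, this sketch swaps in a two-sided permutation test on the difference of means. The score values are synthetic placeholders, not results from any experiment, and `permutation_p_value` is a hypothetical helper.

```python
import random


def permutation_p_value(a, b, num_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of means of `a` and `b`."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(num_permutations):
        rng.shuffle(pooled)
        left, right = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(left) / len(left) - sum(right) / len(right))
        if diff >= observed:
            hits += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (hits + 1) / (num_permutations + 1)


# Synthetic placeholder scores: one TCAV score per random set (20 sets).
rng = random.Random(1)
concept_scores = [rng.gauss(0.8, 0.05) for _ in range(20)]  # concept-vs-random CAVs
random_scores = [rng.gauss(0.5, 0.05) for _ in range(20)]   # random-vs-random CAVs

p_value = permutation_p_value(concept_scores, random_scores)
is_significant = p_value < 0.05
```

With more random sets, each group has more scores, so the test has more power to separate a real concept from chance directions.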

@soumbane
Author

soumbane commented Jun 7, 2021

I see. Thank you @BeenKim for your advice and help.

BeenKim closed this as completed Jul 6, 2021
@soumbane
Author

Hi,
I have one more question about applying TCAV to DR. We are working on a different disease similar to DR, so my question is: how do you select the concept images? Do you center crop from the original images and use those crops as concepts? Or do you have a separate dataset for concepts?
If there is no separate concept dataset (such as Broden for patterns), then we need to extract the concept from the original images; otherwise it would be like using zebra images as both class images and concept images, which is wrong. We would need to crop part of the zebra to get the striped concept (if we don't have a pattern dataset like Broden).
@jameswex @BeenKim Any insights from you would be very helpful.
Thanks,
Soumyanil.

@BeenKim
Contributor

BeenKim commented Jul 13, 2021

We tried both cropping and not cropping at various times. The critical factor might be how much overlap there is between the class dataset and the concept dataset. For example, if you only have zebra images to represent "stripes" and you are testing the zebra class with that stripes concept, that's not good: as you said, it is testing the same thing. However, if there are many other striped things in the concept dataset (e.g., t-shirts), then you are better off. Similarly, in medical settings, if concepts don't overlap a lot with your class labels (e.g., there are many young and old folks with cancer, so the age concept is evenly distributed for the cancer label), then not cropping is forgivable (especially if you don't have pixel-wise labels, like the DR data we used). Hope this is helpful!
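One rough way to put a number on the overlap described above is to count how many concept images also carry the target class label. This is only an illustrative heuristic, not a measure defined by the TCAV paper or library, and the labels and the helper `class_overlap` are hypothetical.

```python
def class_overlap(concept_labels, target_class):
    """Fraction of concept images whose class label equals `target_class`."""
    if not concept_labels:
        raise ValueError("concept_labels is empty")
    matches = sum(1 for label in concept_labels if label == target_class)
    return matches / len(concept_labels)


# Hypothetical class labels attached to the images in a "stripes" concept set.
stripes_only_zebras = ["zebra"] * 50
stripes_mixed = ["zebra"] * 10 + ["t-shirt"] * 20 + ["fabric"] * 20

high_overlap = class_overlap(stripes_only_zebras, "zebra")  # every image is a zebra
low_overlap = class_overlap(stripes_mixed, "zebra")         # 10 of 50 are zebras
```

A value near 1 corresponds to the "testing the same thing" case; a low value corresponds to the mixed concept set where not cropping is more defensible.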

@soumbane
Author

I see. We do not have separate concept images, so we need to center crop from the class images to extract the concept; otherwise there will be overlap and TCAV will not work (I have tested this with overlapping concept and class images).
But thank you so much for the valuable information @BeenKim .
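For reference, a center crop of the kind mentioned above can be sketched as follows. It assumes the image is a 2D grid stored as a list of rows; real pipelines would normally use an image library, and `center_crop` here is a hypothetical helper, not something from the TCAV code.

```python
def center_crop(image, crop_h, crop_w):
    """Return the central `crop_h` x `crop_w` patch of a 2D image (list of rows)."""
    height, width = len(image), len(image[0])
    if crop_h > height or crop_w > width:
        raise ValueError("crop size cannot exceed the image size")
    top = (height - crop_h) // 2
    left = (width - crop_w) // 2
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


# Toy 6x6 "image" whose pixel value encodes its position (row * 6 + col).
image = [[r * 6 + c for c in range(6)] for r in range(6)]
patch = center_crop(image, 2, 2)
```

Whether the center of the image actually contains the concept depends on the data, so the crop location may need to be adjusted per dataset.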
