Discrepancies in Cortical Thickness Values from ADNI GO Compared to Published ANTsXNet Data #168

Open
GayanSamuditha opened this issue Feb 18, 2025 · 9 comments

@GayanSamuditha

GayanSamuditha commented Feb 18, 2025

Hello @ntustison @cookpa

I have extracted cortical thickness values from ADNI GO phase MRI data using the ANTsXNet cortical thickness pipeline.
The extracted dataset is MPRAGE-ADNIGOsheet.csv, which contains cortical thickness values for multiple brain regions.

To validate the correctness of these extracted values, I compared them against the published cortical thickness dataset (antsxnetThickness.csv), available in your repository under:

However, upon detailed analysis, I found significant discrepancies between my extracted data and the published data, suggesting a possible systematic bias or differences in measurement methods.

Key Observations from the Analysis

  1. Large Deviations in Bland-Altman Analysis (Systematic Bias)
    When performing a Bland-Altman analysis, the differences between my extracted cortical thickness values and the published values were systematically biased, indicating a consistent overestimation or underestimation in specific brain regions.

  2. High Variance in Boxplots (Unexpected Variability)
    Some brain regions exhibited unexpectedly high variance, suggesting that the measurement process in my extracted dataset does not fully align with the published dataset.
    I'm not sure whether this indicates differences in preprocessing steps, scanner resolution, or image normalization procedures between my pipeline and the published reference dataset.

  3. Weaker Correlation with Published Study
    A correlation analysis showed a weaker-than-expected agreement between my extracted cortical thickness values and the reference dataset.

Some regions show unexpectedly large differences (up to ±30%).
The systematic bias in specific brain regions raises concerns about the reproducibility of cortical thickness calculations.
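
The three checks listed above (Bland-Altman bias, per-region spread, and correlation) can be sketched roughly as follows. This is only a minimal illustration on a toy table: the column names `region`, `extracted`, and `published` are hypothetical and are not the actual headers of MPRAGE-ADNIGOsheet.csv or antsxnetThickness.csv, and the sketch assumes the two datasets have already been matched row by row on subject and region.

```python
import numpy as np
import pandas as pd


def bland_altman(extracted, published):
    """Summarize agreement between two paired sets of thickness values (mm)."""
    a = np.asarray(extracted, dtype=float)
    b = np.asarray(published, dtype=float)
    diff = a - b
    bias = diff.mean()              # mean difference (systematic bias)
    sd = diff.std(ddof=1)           # sample SD of the differences
    pct = 100.0 * diff / b          # percent difference relative to published
    return {
        "bias": bias,
        "loa_lower": bias - 1.96 * sd,   # 95% limits of agreement
        "loa_upper": bias + 1.96 * sd,
        "pearson_r": np.corrcoef(a, b)[0, 1],
        "max_abs_pct_diff": np.abs(pct).max(),
    }


# Hypothetical toy data: one row per (subject, region) pair.
toy = pd.DataFrame({
    "region": ["r1", "r1", "r1", "r2", "r2", "r2"],
    "extracted": [2.6, 2.8, 2.7, 3.1, 3.3, 3.2],
    "published": [2.5, 2.7, 2.6, 2.9, 3.1, 3.0],
})

# One summary per region, so region-specific bias is visible.
per_region = {
    region: bland_altman(group["extracted"], group["published"])
    for region, group in toy.groupby("region")
}

for region, summary in per_region.items():
    print(region, summary)
```

Computing the summary per region rather than over the pooled table is a deliberate choice here, since the concern raised is that the bias differs between regions.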

The datasheet is attached here:
MPRAGE-ADNIGOsheet.csv

I have these questions:

  1. Could there be a difference in preprocessing methods between my pipeline and the published dataset?
  2. Are the published cortical thickness values derived from a specific version of ANTsXNet with different settings?
  3. Is there an expected normalization step that should be applied before comparing extracted values to published data?
  4. Could these differences be caused by scanner variability or gradient distortion corrections applied differently in the published dataset?
  5. Do you recommend any specific quality control steps to ensure my extracted values align with the published dataset?

I would appreciate any guidance on how to resolve these discrepancies and ensure that my extracted cortical thickness values are consistent with the published dataset.
Could you provide any insights into possible causes, or recommend additional verification steps?

Thank you for your time and for maintaining such an excellent open-source project.

I can share the image dataset if it is needed for further investigation.

Thank you!

@ntustison
Member

The cortical thickness pipeline is relatively straightforward so I really don't know what you're doing to cause the discrepancy.

Why don't you send me a single subject with your results (both images and tabulated values) with the python script you used to reproduce those results and I'll take a look.

@GayanSamuditha
Author

@ntustison here is some of the data and the script I used:

import ants
import antspynet
import pandas as pd
import glob
import os
import numpy as np
import logging

data_folder_path = ""
csv_file_path = os.path.join(data_folder_path, "")
log_file_path = os.path.join(data_folder_path, "")

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    handlers=[
        logging.FileHandler(log_file_path),
        logging.StreamHandler()
    ]
)

def initialize_csv_file(csv_file_path):
    if not os.path.exists(csv_file_path):
        with open(csv_file_path, 'w') as csvfile:
            csvfile.write("LabelValue,Mean,Min,Max,Variance,Count,Volume,Mass,x,y,z,t,ImageFile\n")

def process_and_append_to_csv(t1_file, csv_file_path):
    try:
        logging.info(f"Processing image: {t1_file}")
        
        t1 = ants.image_read(t1_file)
        
        atropos = antspynet.deep_atropos(t1, do_preprocessing=True, verbose=True)
        
        kk_segmentation = atropos['segmentation_image']
        kk_segmentation[kk_segmentation == 4] = 3
        gray_matter = atropos['probability_images'][2]
        white_matter = atropos['probability_images'][3] + atropos['probability_images'][4]
        
        kk = ants.kelly_kapowski(s=kk_segmentation, g=gray_matter, w=white_matter, its=30, r=0.02, m=1.2, x=0, verbose=1)
        
        dkt = antspynet.desikan_killiany_tourville_labeling(t1, do_preprocessing=True, verbose=True)
        dkt_cortical_mask = ants.threshold_image(dkt, 1000, 3000, 1, 0)
        dkt = dkt_cortical_mask * dkt
        
        kk_mask = ants.threshold_image(kk, 0, 0, 0, 1)
        dkt_propagated = ants.iMath(kk_mask, "PropagateLabelsThroughMask", kk_mask * dkt)
        
        kk_regional_stats = ants.label_stats(kk, dkt_propagated)
        
        kk_regional_stats_df = pd.DataFrame(kk_regional_stats)
        kk_regional_stats_df.replace([np.inf, -np.inf], np.nan, inplace=True)  # Replace inf values with NaN
        kk_regional_stats_df.dropna(inplace=True)  # Drop any rows with NaNs
        
        if kk_regional_stats_df.empty:
            logging.warning(f"No valid data for image {t1_file}. Skipping appending to CSV.")
            return
        
        kk_regional_stats_df['ImageFile'] = os.path.basename(t1_file)
        
        # Append the DataFrame to the CSV file
        kk_regional_stats_df.to_csv(csv_file_path, mode='a', header=False, index=False)
        logging.info(f"Successfully processed and saved data for: {t1_file}")
    except Exception as e:
        logging.error(f"Failed to process {t1_file}: {e}")

def process_all_images_in_folder(data_folder_path, csv_file_path):
    initialize_csv_file(csv_file_path)
    
    image_files = glob.glob(os.path.join(data_folder_path, "*.nii"))
    
    if not image_files:
        logging.warning("No images found in the specified directory.")
        return
    
    logging.info(f"Found {len(image_files)} images. Beginning processing...")
    
    for t1_file in image_files:
        process_and_append_to_csv(t1_file, csv_file_path)

    logging.info("Processing complete.")

process_all_images_in_folder(data_folder_path, csv_file_path)

@stnava
Member

stnava commented Feb 18, 2025

@ntustison could be that old bug that was fixed .... that added random variability

@ntustison
Member

ntustison commented Feb 18, 2025

Thanks @stnava . Definitely. And who knows what ITK changes could have affected the results.

@GayanSamuditha --- I would also need the results for a single subject (all images and tabulated results). This includes the original DKT image and the propagated version and the Atropos segmentation image.
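
Since a past bug fix or upstream ITK changes could account for part of the difference, it may help to record the installed package versions next to each results file. The following is a minimal sketch, assuming the distributions are installed under the names `antspyx` and `antspynet` (an assumption about your environment, so adjust as needed). ITK is bundled inside ANTsPy, so the ANTsPy version is only a proxy for the ITK version.

```python
from importlib import metadata


def installed_versions(packages=("antspyx", "antspynet", "tensorflow", "numpy")):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Distribution not installed under this name.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version}")
```

Saving this output alongside the thickness CSV makes it easier to tell later whether two sets of results came from the same software stack.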

@ntustison
Member

Also, h/t to @stnava, you might want to look at a resource such as this traveling cohort so that you can avoid the various issues that you've raised above as well as other possible sources of variance.

@GayanSamuditha
Author

GayanSamuditha commented Feb 18, 2025

@ntustison @stnava Here is the data - data

@ntustison
Member

I'm looking at the list of files and it looks like it's simply a set of several subjects. Did you not see where I wrote previously "I would also need the results for a single subject (all images and tabulated results). This includes the original DKT image and the propagated version and the Atropos segmentation image."? I would also like the file names clearly labeled so I know which is which.

Please understand that if you want me to spend time helping you, you need to make it as easy for me as possible.
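
One way to package a single subject with clearly labeled files is sketched below. The helper names and the file naming scheme are hypothetical, not an ANTsPy convention; the only ANTsPy call used is `ants.image_write`, which is imported lazily so the path helper works without ANTsPy installed.

```python
import os

# Hypothetical labels for the intermediate images requested above.
OUTPUT_KEYS = (
    "t1",
    "atropos_segmentation",
    "dkt_original",
    "dkt_propagated",
    "thickness",
)


def single_subject_paths(subject_id, out_dir):
    """Build one clearly labeled file path per intermediate image."""
    return {
        key: os.path.join(out_dir, f"{subject_id}_{key}.nii.gz")
        for key in OUTPUT_KEYS
    }


def write_single_subject(images, subject_id, out_dir):
    """Write each ANTsImage in `images` (keyed by OUTPUT_KEYS) to disk."""
    import ants  # lazy import: only needed when actually writing images

    os.makedirs(out_dir, exist_ok=True)
    paths = single_subject_paths(subject_id, out_dir)
    for key, image in images.items():
        ants.image_write(image, paths[key])
    return paths
```

In the script posted earlier, this would be called with something like `{"atropos_segmentation": atropos['segmentation_image'], "dkt_propagated": dkt_propagated, "thickness": kk}`. Note that the script overwrites `dkt` with its masked version, so the original DKT labeling would need to be kept in a separate variable before masking.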

@GayanSamuditha
Author

@ntustison I'm sorry, it took me some time to understand what you were asking; I'm not very familiar with this. Here is what I found:

And I uploaded the subject image to drive:
Image: 002_S_1280.nii
002_S_1280

subject_dkt_propagated.nii.gz
subject_atropos.nii.gz
original_dkt_atlas.nii.gz
subject_thickness.nii.gz
subject_dkt_thickness.csv

@ntustison
Member

Just noticed this issue in your code:

kk = ants.kelly_kapowski(s=kk_segmentation, g=gray_matter, w=white_matter, its=30, r=0.02, m=1.2, x=0, verbose=1)

I don't know of anywhere that I've ever used KK with these parameters. In fact, the current default parameters have existed even prior to the existence of ANTsPyNet. You should go back and re-run your analysis based on the actual parameters used.
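
To see exactly which of the values passed in that call differ from the defaults in the installed version, the function signature can be inspected directly rather than relying on memory. This is a minimal sketch with hypothetical helper names; it reads the defaults from whatever version of `ants.kelly_kapowski` is installed, and arguments forwarded through `**kwargs` (such as `x`) have no signature default and are therefore not compared.

```python
import inspect


def signature_defaults(func):
    """Return {parameter name: default} for parameters that have a default."""
    return {
        name: param.default
        for name, param in inspect.signature(func).parameters.items()
        if param.default is not inspect.Parameter.empty
    }


def differing_params(used, defaults):
    """Return {name: (used value, default value)} where the two differ."""
    return {
        name: (value, defaults[name])
        for name, value in used.items()
        if name in defaults and defaults[name] != value
    }


# Values taken from the call quoted above.
used = {"its": 30, "r": 0.02, "m": 1.2, "x": 0}


def check_against_installed_ants(used_params):
    """Compare `used_params` with the defaults of the installed ANTsPy."""
    import ants  # lazy import: requires ANTsPy to be installed

    return differing_params(used_params, signature_defaults(ants.kelly_kapowski))
```

Calling `check_against_installed_ants(used)` in the same environment that produced the results lists each overridden parameter with its default, which gives a concrete starting point for the re-run.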
