-
Notifications
You must be signed in to change notification settings - Fork 83
Add X and Y chromosomes to CNV segment to gene symbol mapping #259
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, the only real question I have is about the logic of calling gain and loss in males. I also was curious about the logic of skipping XY processing for controlfreec only in the run script.
--filename_lead "cnvkit_annotated_cn" \ | ||
--chrom_filter $CHROMFILT \ | ||
--cnvkit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No xy option here? So the cnvkit files always do xy? Is there a particular reason for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default is to run the X and Y steps, so I do not include it here. The reason an option is included for ControlFreeC is to allow this to pass CI. My suspicion is that the ControlFreeC subset files don't represent the X and Y chromosome enough, causing an error with IRanges::mergeByOverlaps
and I still want the X and Y portion of the code to be tested with CNVkit.
germline_sex_estimate == "Male" & copy.num < (0.5 * tumor_ploidy) ~ "loss", | ||
germline_sex_estimate == "Male" & copy.num > (0.5 * tumor_ploidy) ~ "gain", | ||
germline_sex_estimate == "Male" & copy.num == (0.5 * tumor_ploidy) ~ "neutral", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if these are the right cutoffs... I'm thinking that if tumor_ploidy is 3, then the expected number of X or Y chromosomes is 1 or 2, not 1.5; so everything would be categorized as gain or loss. If we are within 0.5 copies of the 0.5 * tumor_ploidy
, perhaps we should have a different call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at ControlFreeC calls for X and Y chromosomes only in instances where germline_sex_estimate == "Male"
and grouping by chromsome, copy number, status, and ploidy:
chr copy.number status tumor_ploidy n
X 0 loss 2 176
X 2 gain 2 1447
X 3 gain 2 255
X 4 gain 2 148
X 5 gain 2 117
X 6 gain 2 74
X 7 gain 2 27
X 8 gain 2 17
X 9 gain 2 13
X 10 gain 2 5
X 11 gain 2 2
X 12 gain 2 3
X 13 gain 2 7
X 14 gain 2 2
X 16 gain 2 1
X 17 gain 2 1
X 19 gain 2 1
X 21 gain 2 1
X 24 gain 2 3
X 27 gain 2 1
X 28 gain 2 1
X 31 gain 2 2
X 34 gain 2 1
X 36 gain 2 1
X 53 gain 2 1
Y 0 loss 2 299
Y 2 gain 2 3338
Y 3 gain 2 1026
Y 4 gain 2 201
Y 5 gain 2 56
Y 6 gain 2 14
Y 7 gain 2 10
Y 8 gain 2 3
Y 10 gain 2 5
Y 11 gain 2 1
Y 17 gain 2 1
Y 19 gain 2 1
Y 30 gain 2 1
Y 41 gain 2 1
Y 54 gain 2 1
X 0 loss 3 17
X 1 loss 3 1445
X 2 gain 3 3008
X 3 gain 3 166
X 4 gain 3 45
X 5 gain 3 50
X 6 gain 3 35
X 7 gain 3 20
X 8 gain 3 17
X 9 gain 3 18
X 10 gain 3 8
X 11 gain 3 4
X 12 gain 3 1
X 14 gain 3 2
X 15 gain 3 1
X 46 gain 3 1
Y 0 loss 3 42
Y 1 loss 3 1094
Y 2 gain 3 1538
Y 3 gain 3 435
Y 4 gain 3 212
Y 5 gain 3 56
Y 6 gain 3 22
Y 7 gain 3 7
Y 8 gain 3 5
Y 9 gain 3 3
Y 10 gain 3 5
Y 12 gain 3 1
X 0 loss 4 9
X 1 loss 4 311
X 2 neutral 4 1412
X 3 gain 4 583
X 4 gain 4 65
X 5 gain 4 30
X 6 gain 4 21
X 7 gain 4 18
X 8 gain 4 11
X 9 gain 4 12
X 10 gain 4 6
X 11 gain 4 7
X 12 gain 4 7
X 13 gain 4 6
X 14 gain 4 6
X 15 gain 4 2
X 17 gain 4 3
X 18 gain 4 2
X 21 gain 4 1
X 24 gain 4 1
X 25 gain 4 1
X 27 gain 4 1
X 28 gain 4 1
X 39 gain 4 1
X 53 gain 4 1
X 108 gain 4 1
Y 0 loss 4 61
Y 1 loss 4 236
Y 2 neutral 4 534
Y 3 gain 4 366
Y 4 gain 4 181
Y 5 gain 4 134
Y 6 gain 4 87
Y 7 gain 4 24
Y 8 gain 4 23
Y 9 gain 4 10
Y 10 gain 4 4
Y 11 gain 4 3
Y 12 gain 4 2
Y 18 gain 4 1
Y 19 gain 4 1
I believe the logic I've implemented for CNVkit matches this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does appear so. Something to keep in mind downstream though. 🤨
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really missing that 💯 reaction option here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the caveat that I don't like how ControlFreeC handles sex chromosomes in males when ploidy is odd, this looks good to go.
Purpose/implementation Section
Here, I'm adding handling of X and Y chromosomes to the
focal-cn-file-preparation
module.status
column use similar logic to what is in the ControlFreeC file.01-prepare-cn-file.R
. The autosome and sex chromosome files get saved separately. The sex chromosome files include thegermline_sex_estimate
column frompbta-histologies.tsv
.What GitHub issue does your pull request address?
#186
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Which areas should receive a particularly close look?
The sex chromosome
loss
,gain
,neutral
calls in the CNVkit preparation steps (00-add-ploidy-cnvkit
) could use a good look. I am trying to mirror what logic is in the ControlFreeC file.Is there anything that you want to discuss further?
At this point, I've chosen to not mark
amplification
in X and Y chromosomes and I've left in the neutral calls because I feel that a downstream analyst might want to give these a careful look.Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?
N/A
Reproducibility Checklist
This has already been added to Docker and CI at an earlier point.