Skip to content
This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Add X and Y chromosomes to CNV segment to gene symbol mapping #259

Merged
merged 13 commits into from
Nov 13, 2019

Conversation

jaclyn-taroni
Copy link
Member

Purpose/implementation Section

Here, I'm adding handling of X and Y chromosomes to the focal-cn-file-preparation module.

  • I added steps to the CNVkit file prep that made the status column use similar logic to what is in the ControlFreeC file.
  • I added a step for merging overlaps specifically for the X & Y chromosomes into 01-prepare-cn-file.R. The autosome and sex chromosome files get saved separately. The sex chromosome files include the germline_sex_estimate column from pbta-histologies.tsv.
  • I've added steps to compress the X&Y TSV files.

What GitHub issue does your pull request address?

#186

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

The sex chromosome loss, gain, neutral calls in the CNVkit preparation steps (00-add-ploidy-cnvkit) could use a good look. I am trying to mirror what logic is in the ControlFreeC file.

Is there anything that you want to discuss further?

At this point, I've chosen to not mark amplification in X and Y chromosomes and I've left in the neutral calls because I feel that a downstream analyst might want to give these a careful look.

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

N/A

Reproducibility Checklist

This has already been added to Docker and CI at an earlier point.

Copy link
Member

@jashapiro jashapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, the only real question I have is about the logic of calling gain and loss in males. I also was curious about the logic of skipping XY processing for controlfreec only in the run script.

--filename_lead "cnvkit_annotated_cn" \
--chrom_filter $CHROMFILT \
--cnvkit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No xy option here? So the cnvkit files always do xy? Is there a particular reason for that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default is to run the X and Y steps, so I do not include it here. The reason an option is included for ControlFreeC is to allow this to pass CI. My suspicion is that the ControlFreeC subset files don't represent the X and Y chromosome enough, causing an error with IRanges::mergeByOverlaps and I still want the X and Y portion of the code to be tested with CNVkit.

Comment on lines +60 to +62
germline_sex_estimate == "Male" & copy.num < (0.5 * tumor_ploidy) ~ "loss",
germline_sex_estimate == "Male" & copy.num > (0.5 * tumor_ploidy) ~ "gain",
germline_sex_estimate == "Male" & copy.num == (0.5 * tumor_ploidy) ~ "neutral",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if these are the right cutoffs... I'm thinking that if tumor_ploidy is 3, then the expected number of X or Y chromosomes is 1 or 2, not 1.5; so everything would be categorized as gain or loss. If we are within 0.5 copies of the 0.5 * tumor_ploidy, perhaps we should have a different call?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at ControlFreeC calls for X and Y chromosomes only in instances where germline_sex_estimate == "Male" and grouping by chromsome, copy number, status, and ploidy:

chr	copy.number	status	tumor_ploidy	n
X	0	loss	2	176
X	2	gain	2	1447
X	3	gain	2	255
X	4	gain	2	148
X	5	gain	2	117
X	6	gain	2	74
X	7	gain	2	27
X	8	gain	2	17
X	9	gain	2	13
X	10	gain	2	5
X	11	gain	2	2
X	12	gain	2	3
X	13	gain	2	7
X	14	gain	2	2
X	16	gain	2	1
X	17	gain	2	1
X	19	gain	2	1
X	21	gain	2	1
X	24	gain	2	3
X	27	gain	2	1
X	28	gain	2	1
X	31	gain	2	2
X	34	gain	2	1
X	36	gain	2	1
X	53	gain	2	1
Y	0	loss	2	299
Y	2	gain	2	3338
Y	3	gain	2	1026
Y	4	gain	2	201
Y	5	gain	2	56
Y	6	gain	2	14
Y	7	gain	2	10
Y	8	gain	2	3
Y	10	gain	2	5
Y	11	gain	2	1
Y	17	gain	2	1
Y	19	gain	2	1
Y	30	gain	2	1
Y	41	gain	2	1
Y	54	gain	2	1
X	0	loss	3	17
X	1	loss	3	1445
X	2	gain	3	3008
X	3	gain	3	166
X	4	gain	3	45
X	5	gain	3	50
X	6	gain	3	35
X	7	gain	3	20
X	8	gain	3	17
X	9	gain	3	18
X	10	gain	3	8
X	11	gain	3	4
X	12	gain	3	1
X	14	gain	3	2
X	15	gain	3	1
X	46	gain	3	1
Y	0	loss	3	42
Y	1	loss	3	1094
Y	2	gain	3	1538
Y	3	gain	3	435
Y	4	gain	3	212
Y	5	gain	3	56
Y	6	gain	3	22
Y	7	gain	3	7
Y	8	gain	3	5
Y	9	gain	3	3
Y	10	gain	3	5
Y	12	gain	3	1
X	0	loss	4	9
X	1	loss	4	311
X	2	neutral	4	1412
X	3	gain	4	583
X	4	gain	4	65
X	5	gain	4	30
X	6	gain	4	21
X	7	gain	4	18
X	8	gain	4	11
X	9	gain	4	12
X	10	gain	4	6
X	11	gain	4	7
X	12	gain	4	7
X	13	gain	4	6
X	14	gain	4	6
X	15	gain	4	2
X	17	gain	4	3
X	18	gain	4	2
X	21	gain	4	1
X	24	gain	4	1
X	25	gain	4	1
X	27	gain	4	1
X	28	gain	4	1
X	39	gain	4	1
X	53	gain	4	1
X	108	gain	4	1
Y	0	loss	4	61
Y	1	loss	4	236
Y	2	neutral	4	534
Y	3	gain	4	366
Y	4	gain	4	181
Y	5	gain	4	134
Y	6	gain	4	87
Y	7	gain	4	24
Y	8	gain	4	23
Y	9	gain	4	10
Y	10	gain	4	4
Y	11	gain	4	3
Y	12	gain	4	2
Y	18	gain	4	1
Y	19	gain	4	1

I believe the logic I've implemented for CNVkit matches this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does appear so. Something to keep in mind downstream though. 🤨

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really missing that 💯 reaction option here

Copy link
Member

@jashapiro jashapiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the caveat that I don't like how ControlFreeC handles sex chromosomes in males when ploidy is odd, this looks good to go.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants