Updates on selection strategy, and v12 added "stranded" files. #374

jashapiro · 2019-12-24T00:28:45Z

Purpose/implementation Section

What scientific question is your analysis addressing?

A quick look at what is going on with the stranded samples added in v12, to try to determine why the new samples were clustering with the poly-A samples from the earlier data.

What was your approach?

I incorporated the additions and changes from d40097c (part of #366) into a new file, 02-selection-strategies-update.rmd. I then looked for genes whose expression was correlated with cluster assignments derived from the UMAP data.
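The notebook itself is R Markdown. As a rough illustration of the cluster-correlation step, here is a minimal Python sketch. It assumes a genes × samples matrix of log-scale expression and one cluster label per sample from the UMAP-based clustering. The function name `cluster_correlated_genes` is hypothetical. Ranking genes by Pearson correlation against a 0/1 cluster-membership indicator (a point-biserial correlation) is an assumed stand-in, not necessarily what the notebook does.

```python
import numpy as np

def cluster_correlated_genes(expr, gene_names, clusters, target, top_n=10):
    """Rank genes by |Pearson r| between expression and membership in one cluster.

    Hypothetical helper. expr: genes x samples (assumed log-transformed);
    clusters: one label per sample; target: the cluster label to test.
    """
    expr = np.asarray(expr, dtype=float)
    # 0/1 membership indicator; Pearson r against it is the point-biserial r
    indicator = (np.asarray(clusters) == target).astype(float)
    # Center each gene and the indicator before computing r
    expr_c = expr - expr.mean(axis=1, keepdims=True)
    ind_c = indicator - indicator.mean()
    denom = np.sqrt((expr_c ** 2).sum(axis=1) * (ind_c ** 2).sum())
    with np.errstate(invalid="ignore", divide="ignore"):
        r = (expr_c @ ind_c) / denom
    r = np.nan_to_num(r)  # constant genes (zero variance) get r = 0
    order = np.argsort(-np.abs(r))[:top_n]
    return [(gene_names[i], float(r[i])) for i in order]
```

For example, `cluster_correlated_genes(expr, genes, umap_clusters, target=1)` returns the genes most strongly associated with cluster 1. Here `expr`, `genes`, and `umap_clusters` are placeholders for the notebook's data. Negative values mark genes depleted in that cluster.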

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

This is an intermediate analysis that I do not expect to keep long term, so the plots should be of sufficient quality for that purpose. If people would like to see a more refined analysis, this can be extended.

Is there anything that you want to discuss further?

We should discuss the next steps, given that the data do not seem to be what we had hoped to obtain with the re-sequencing. We should definitely seek clarification on the precise methods that were employed in generating these data.

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

n/a

Results

What types of results are included (e.g., table, figure)?

A notebook and some figures.

What is your summary of the results?

The main result is a figure showing expression levels of individual genes, split by UMAP cluster and colored by library preparation method.

The main takeaway is that histones and noncoding RNAs have much lower expression in the cluster that contains the poly-A samples, as expected. However, the new stranded samples show the same biases, indicating that they may have been subjected to poly-A selection during preparation.
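The expected pattern has a biological basis. Replication-dependent histone mRNAs end in a stem-loop rather than a poly-A tail, and many noncoding RNAs are not polyadenylated, so poly-A selection depletes both. Below is a minimal sketch for summarizing that depletion per sample. It assumes a genes × samples matrix on a linear scale (e.g. TPM) with nonzero column totals. It also assumes that histone genes can be identified by the `HIST` symbol prefix used in annotations at the time, although HGNC has since renamed many of them. Both function names are hypothetical.

```python
import numpy as np
from collections import defaultdict

def histone_fraction(expr, gene_names, prefix="HIST"):
    """Per-sample fraction of total expression from genes whose symbol starts with prefix.

    expr: genes x samples on a linear scale (assumption: e.g. TPM).
    """
    expr = np.asarray(expr, dtype=float)
    mask = np.array([g.startswith(prefix) for g in gene_names])
    # Sum the histone rows per sample and divide by each sample's total
    return expr[mask].sum(axis=0) / expr.sum(axis=0)

def median_by_group(values, clusters, methods):
    """Median of per-sample values for each (cluster, library method) pair."""
    groups = defaultdict(list)
    for v, c, m in zip(values, clusters, methods):
        groups[(c, m)].append(v)
    return {key: float(np.median(vals)) for key, vals in groups.items()}
```

If the new stranded samples had been prepared without poly-A selection, their histone fraction should resemble that of the earlier stranded samples rather than the poly-A samples.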

Reproducibility Checklist

The dependencies required to run the code in this pull request have been added to the project Dockerfile.
This analysis has been added to continuous integration.
This analysis is recorded in the table in analyses/README.md.

cgreene · 2019-12-24T12:53:45Z

This does look bad. I agree with your assessment: the most likely explanation is that BGI sequenced the RNA after poly-A selection instead of using the requested protocol.

If we wanted to confirm, someone from the D3B side with access to the raw data could examine the coverage across transcripts for a few longer genes. I'm guessing coverage will be high at the 3' end and drop off from there.
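Here is a rough sketch of that coverage check. It assumes that per-base read depth for a transcript has already been extracted from the alignments and ordered 5' to 3', with strand handled upstream. Both function names are hypothetical. RSeQC's `geneBody_coverage.py` implements a more complete version of the same idea.

```python
import numpy as np

def three_prime_bias(coverage, frac=0.2):
    """Ratio of mean depth in the 3'-most fraction of a transcript to the 5'-most fraction.

    coverage: per-base depth ordered 5' -> 3' (assumed strand-corrected).
    """
    cov = np.asarray(coverage, dtype=float)
    n = max(1, int(len(cov) * frac))
    five_prime = cov[:n].mean()
    three_prime = cov[-n:].mean()
    return three_prime / five_prime if five_prime > 0 else float("inf")

def binned_profile(coverage, n_bins=10):
    """Mean depth in n_bins equal-ish bins along the transcript, scaled to a max of 1.

    Binning lets genes of different lengths be averaged together.
    Requires len(coverage) >= n_bins so that no bin is empty.
    """
    cov = np.asarray(coverage, dtype=float)
    profile = np.array([b.mean() for b in np.array_split(cov, n_bins)])
    return profile / profile.max()
```

Uniform coverage gives a bias ratio of 1. If poly-A selection introduced a 3' bias, long transcripts should give values well above 1 and a profile that rises toward the last bins.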

jaclyn-taroni and others added 5 commits December 23, 2019 08:00

WIP: look at selection strategy with v12 data

9d75930

Add selection strategy README

8d882a8

Use params to set neighbors

a72eac3

Add cluster correlated genes

fc0fde4

A bit more context

a7326eb

This was referenced Dec 30, 2019

Rerun transcriptomic-dimension-reduction and selection-strategy-comparison with v12 data #366

Merged

Planned data release: V13 #373

Closed

This was referenced Jan 14, 2020

Proposed Analysis: Comparative RNA-Seq analysis #229

Open

Proposed Analysis: Assess batch effects in RNA-Seq data #448

Closed

jaclyn-taroni closed this Feb 3, 2020

jashapiro deleted the jashapiro/selection-strat-update branch April 11, 2021 18:53

jharenza mentioned this pull request Oct 18, 2021

Add OpenPBTA v21 (GitHub release 1) publication study to PedcBio #1185

Closed
