This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Updates on selection strategy, and v12 added "stranded" files. #374

jashapiro

Purpose/implementation Section

What scientific question is your analysis addressing?

A quick look at what is going on with the stranded samples added in v12, to try to determine why the new samples were clustering with the poly-A samples from the earlier data.

What was your approach?

I incorporated the additions and changes from d40097c, which are part of #366, into a new file (02-selection-strategies-update.rmd). I then looked for genes whose expression was correlated with cluster assignments based on the UMAP data.
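
As a rough illustration of the gene-versus-cluster step, here is a minimal sketch in Python. It is not the notebook's code (the notebook is R Markdown); the function names, the toy expression values, and the use of a Pearson correlation against a 0/1 cluster indicator are all assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no signal
    return cov / sqrt(var_x * var_y)


def rank_genes_by_cluster(expression, clusters, target):
    """Rank genes by |correlation| with membership in the `target` cluster.

    expression: dict mapping gene name -> per-sample expression values
    clusters:   per-sample cluster labels, in the same sample order
    """
    membership = [1.0 if c == target else 0.0 for c in clusters]
    scores = {gene: pearson(values, membership)
              for gene, values in expression.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Toy data (hypothetical): three samples in cluster "A", three in cluster "B".
expression = {
    "histone_like": [8.0, 7.5, 8.2, 1.0, 0.8, 1.2],
    "housekeeping": [5.0, 5.1, 4.9, 5.0, 5.2, 4.8],
}
clusters = ["A", "A", "A", "B", "B", "B"]

ranked = rank_genes_by_cluster(expression, clusters, target="B")
for gene, r in ranked:
    print(f"{gene}\t{r:+.3f}")
```

With real data, the cluster labels would come from whatever clustering was applied to the UMAP coordinates, and a strongly negative score marks a gene that is depleted in the target cluster.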

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

This is an intermediate analysis that I do not expect to be maintained long term, so the current quality of the plots should be sufficient. If there is a more refined analysis that people would like to see, this can be extended.

Is there anything that you want to discuss further?

We should discuss the next steps, given that the data do not seem to be what we had hoped to obtain with the re-sequencing. We should definitely seek clarification on the precise methods that were employed in generating these data.

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

n/a

Results

What types of results are included (e.g., table, figure)?

A notebook and some figures.

What is your summary of the results?

The main result is this figure showing expression levels of individual genes, divided by UMAP cluster and colored by library preparation method.

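[Figure: expression levels of individual genes, divided by UMAP cluster and colored by library preparation method]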

The main takeaway is that histones and noncoding RNAs have much lower expression in the cluster that contains poly-A samples, which is as expected. However, the new stranded samples have the same biases, indicating that they may have been subjected to poly-A selection during preparation.

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.
  • This analysis is recorded in the table in analyses/README.md.


cgreene commented Dec 24, 2019

This does look bad. I agree with your assessment that the most likely explanation appears to be that BGI sequenced the RNA with poly-A selection instead of the requested protocol.

If we wanted to confirm, someone from the D3B side with access to the raw data could examine the coverage across transcripts for a few longer genes. I'm guessing it'll be high on the 3' side and drop from there.
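
One way such a check could be summarized is sketched below in Python. It assumes per-base coverage for a transcript has already been extracted from the alignments and oriented 5' to 3'; the function name `three_prime_bias`, the 20% window, and the toy coverage vectors are hypothetical choices for illustration, not part of any existing pipeline.

```python
def three_prime_bias(coverage, fraction=0.2):
    """Ratio of mean coverage in the 3' end window to the 5' end window.

    coverage: per-base read depth along one transcript, ordered 5' -> 3'
    fraction: share of the transcript length used for each end window
    """
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    if not coverage:
        raise ValueError("coverage must not be empty")
    window = max(1, int(len(coverage) * fraction))
    five_prime = sum(coverage[:window]) / window
    three_prime = sum(coverage[-window:]) / window
    if five_prime == 0:
        # No 5' coverage at all: report infinite bias if the 3' end is covered.
        return float("inf") if three_prime > 0 else float("nan")
    return three_prime / five_prime


# Toy coverage vectors (hypothetical), each 100 bases long.
uniform = [10] * 100               # even coverage along the transcript
ramp = list(range(1, 101))         # coverage rising toward the 3' end

print(three_prime_bias(uniform))
print(three_prime_bias(ramp))
```

A ratio near 1 suggests even coverage, while values well above 1 across several long genes would be consistent with the 3' enrichment expected from poly-A selection.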
