Ruitu Lyu edited this page Apr 19, 2023 · 11 revisions

Welcome to the KAS-Analyzer wiki!

Questions, troubleshooting reports, and feedback are welcome. Please post them to the GitHub issues or email the authors at lvruitu@gmail.com.

1. How do I analyze KAS-seq data from a species that is not supported by KAS-Analyzer?

If you're interested in analyzing KAS-seq data from rare species using KAS-Analyzer, simply provide the reference genome sequence and gene annotation to the authors. You can send these materials to lvruitu@gmail.com, or post the download link in the KAS-Analyzer GitHub issue or discussion section. The authors will promptly update KAS-Analyzer with the new species data.

2. Is it necessary to install or activate the KAS-Analyzer conda environment every time I use KAS-Analyzer?

KAS-Analyzer incorporates numerous top-performing tools from the NGS field, such as bwa, bowtie2, epic2, deeptools, bedtools, samtools, macs2, and homer. If you prefer not to activate the KAS-Analyzer conda environment each time you use KAS-Analyzer, ensure that all required tools are accessible in your system. You can verify this by running 'KAS-Analyzer install -check' in the terminal. Nonetheless, it is recommended to activate the KAS-Analyzer conda environment when generating plots or performing differential KAS-seq analysis, as these tasks typically require loading specific R packages.
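As a rough illustration of what "accessible in your system" means, the sketch below checks whether each tool can be found on the current PATH. It is not a substitute for 'KAS-Analyzer install -check'. The executable names are assumptions: deeptools and homer install several commands, so one representative command is checked for each (`bamCoverage` and `findMotifsGenome.pl`).

```python
import shutil

# Assumed executable names. deeptools and homer are represented by one of
# their commands each (bamCoverage, findMotifsGenome.pl); adjust as needed.
REQUIRED_TOOLS = [
    "bwa", "bowtie2", "epic2", "bamCoverage",
    "bedtools", "samtools", "macs2", "findMotifsGenome.pl",
]


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```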

3. How can I tell if my KAS-seq or spKAS-seq data are working well?

Several indicators show whether KAS-seq or spKAS-seq data are of good quality: 1) The number of peaks is >=50,000, or the fraction of reads in peaks (FRiP) is >=25%. 2) The typical 'valley' pattern of KAS-seq signal over genic regions is visible in heatmaps or metagene profiles: sharp peaks around the TSS, broad, low-level enrichment over the gene body, and broad, strong enrichment over terminator regions. 3) When the data are visualized in a genome browser (UCSC, IGV, or WashU Epigenome Browser), many significant KAS-seq peaks are visible, especially over genic regions.
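The first criterion can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the peak count, reads in peaks, and total mapped reads are assumed to have been obtained beforehand (for example from a peak file and alignment statistics).

```python
def frip(reads_in_peaks, total_mapped_reads):
    """Fraction of reads in peaks (FRiP)."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return reads_in_peaks / total_mapped_reads


def passes_basic_qc(peak_count, reads_in_peaks, total_mapped_reads,
                    min_peaks=50_000, min_frip=0.25):
    """Apply the rule of thumb above: >=50,000 peaks OR FRiP >=25%."""
    return (peak_count >= min_peaks
            or frip(reads_in_peaks, total_mapped_reads) >= min_frip)


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    print(passes_basic_qc(peak_count=30_000,
                          reads_in_peaks=12_000_000,
                          total_mapped_reads=40_000_000))
```

The visual checks in 2) and 3) still need to be done by eye; passing the numeric threshold alone does not confirm the expected signal pattern.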

4. How many mapped reads are required for regular KAS-seq or spKAS-seq data?

We recommend running a saturation analysis with 'KAS-Analyzer saturation'. As a rule of thumb, 40M deduplicated mapped reads should be enough for a regular KAS-seq assay. To estimate how many raw sequencing reads to request from the sequencing facility, account for the mapping ratio (~95%) and duplication ratio (20%-40%) that we observe for most KAS-seq data in our lab.
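The estimate can be worked through as follows. This is a minimal sketch that assumes the duplication ratio is the fraction of mapped reads removed as duplicates; the function name is hypothetical.

```python
import math


def raw_reads_needed(target_dedup_reads, mapping_ratio, duplication_ratio):
    """Estimate the raw reads needed to reach a deduplicated mapped read target.

    Assumes: dedup_mapped = raw * mapping_ratio * (1 - duplication_ratio).
    """
    if not 0 < mapping_ratio <= 1:
        raise ValueError("mapping_ratio must be in (0, 1]")
    if not 0 <= duplication_ratio < 1:
        raise ValueError("duplication_ratio must be in [0, 1)")
    usable_fraction = mapping_ratio * (1 - duplication_ratio)
    return math.ceil(target_dedup_reads / usable_fraction)


if __name__ == "__main__":
    for dup in (0.20, 0.40):
        needed = raw_reads_needed(40_000_000, 0.95, dup)
        print(f"duplication {dup:.0%}: about {needed / 1e6:.1f}M raw reads")
```

With the ratios quoted above, the 40M target corresponds to roughly 53M to 70M raw reads.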

5. How to determine the library complexity?

KAS-Analyzer provides a subcommand (KAS-Analyzer complexity) to calculate library complexity metrics for (sp)KAS-seq data, including the PCR Bottlenecking Coefficient (PBC) and the Non-Redundant Fraction (NRF).
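To show what these metrics measure, the sketch below computes them from a list of mapped read positions using the commonly used ENCODE-style definitions: NRF is distinct positions over total reads, PBC1 is positions with exactly one read over distinct positions, and PBC2 is positions with exactly one read over positions with exactly two reads. Whether 'KAS-Analyzer complexity' uses exactly these definitions is an assumption, and the function name is hypothetical.

```python
from collections import Counter


def complexity_metrics(read_positions):
    """Compute NRF, PBC1 and PBC2 from hashable read positions.

    Each position could be, for example, a (chrom, start, strand) tuple.
    """
    counts = Counter(read_positions)
    total_reads = sum(counts.values())
    if total_reads == 0:
        raise ValueError("no reads provided")
    distinct = len(counts)                                # positions with >=1 read
    m1 = sum(1 for c in counts.values() if c == 1)        # exactly one read
    m2 = sum(1 for c in counts.values() if c == 2)        # exactly two reads
    return {
        "NRF": distinct / total_reads,
        "PBC1": m1 / distinct,
        "PBC2": m1 / m2 if m2 else float("inf"),
    }


if __name__ == "__main__":
    # Toy example: 7 reads over 4 distinct positions.
    reads = [("chr1", 100, "+"),
             ("chr1", 200, "+"), ("chr1", 200, "+"),
             ("chr1", 300, "-"), ("chr1", 300, "-"), ("chr1", 300, "-"),
             ("chr2", 50, "+")]
    print(complexity_metrics(reads))
```

Values closer to 1 for NRF and PBC1 indicate a more complex library with fewer PCR duplicates.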

6. How to perform normalization for KAS-seq data?

If spike-in normalization is not available, we recommend normalizing KAS-seq data by the number of deduplicated mapped reads or by FPKM. Calculate normalization factors that bring the number of deduplicated reads in each KAS-seq sample to the same level, then use 'KAS-Analyzer normalize' to generate normalized KAS-seq density files (bedGraph) from those factors.
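One way to obtain such factors is to scale every sample down to the smallest library. The sketch below illustrates that choice; the sample names and read counts are made up, the function names are hypothetical, and the input format expected by 'KAS-Analyzer normalize' is not shown here.

```python
def normalization_factors(dedup_read_counts):
    """Scale each sample to the smallest deduplicated read count.

    dedup_read_counts maps sample name -> deduplicated mapped reads.
    """
    if not dedup_read_counts:
        raise ValueError("no samples provided")
    target = min(dedup_read_counts.values())
    if target <= 0:
        raise ValueError("read counts must be positive")
    return {sample: target / n for sample, n in dedup_read_counts.items()}


def scale_bedgraph_line(line, factor):
    """Multiply the value column of one bedGraph line (chrom, start, end, value)."""
    chrom, start, end, value = line.rstrip("\n").split("\t")[:4]
    return f"{chrom}\t{start}\t{end}\t{float(value) * factor:.4f}"


if __name__ == "__main__":
    counts = {"KAS_rep1": 40_000_000, "KAS_rep2": 50_000_000}  # made-up values
    factors = normalization_factors(counts)
    print(factors)
    print(scale_bedgraph_line("chr1\t0\t100\t5.0", factors["KAS_rep2"]))
```

Scaling to the smallest library avoids inflating signal in shallow samples; scaling to a fixed target such as 10M reads is an equally valid convention.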

7. What are single-stranded transcribing enhancers?

Single-stranded transcribing enhancers are enhancers that overlap KAS-seq peaks. These 'single-stranded' active enhancers show higher activity and are enriched for distinct TF motifs. We found that most single-stranded transcribing enhancers overlap ATAC-seq peaks and Pol II binding sites, and some also show stable eRNA transcription.

8. What is the pausing index, and how is it calculated using KAS-seq?

Pol II pauses near the promoter on a large fraction of transcribed genes, and transcription initiation and elongation are under distinct control. The pausing index is defined as the ratio of pause peak density to gene body density. It has usually been calculated from nascent RNA density measured by GRO-seq; KAS-Analyzer instead uses ssDNA density calculated from KAS-seq data.
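The ratio itself is simple to compute once the two densities are available. In the sketch below the densities are taken as inputs, because the exact promoter-proximal and gene body windows used by KAS-Analyzer are not specified here; the function name is hypothetical.

```python
import math


def pausing_index(pause_peak_density, gene_body_density):
    """Ratio of pause peak (promoter-proximal) density to gene body density.

    Returns NaN when the gene body density is zero or negative, since the
    ratio is undefined for genes without gene body signal.
    """
    if gene_body_density <= 0:
        return math.nan
    return pause_peak_density / gene_body_density


if __name__ == "__main__":
    # Made-up ssDNA densities for one gene.
    print(pausing_index(pause_peak_density=8.0, gene_body_density=2.0))
```

A higher value means the KAS-seq signal is concentrated near the promoter relative to the gene body.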

9. What is the termination index?

Transcription termination is the process by which a nascent RNA is released from its complex with RNA polymerase and the DNA template. The termination index is defined as the ratio of terminator peak density to gene body density, and can be interpreted as how difficult it is for Pol II to leave its DNA template. The termination index can only be calculated using KAS-seq data and KAS-Analyzer.
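Written as a formula, the definition above reads as follows. The symbols are introduced here for illustration only: D_terminator and D_gene body denote the KAS-seq (ssDNA) density over the terminator region and the gene body, and the exact windows used by KAS-Analyzer are not specified here.

```latex
\mathrm{TI} = \frac{D_{\text{terminator}}}{D_{\text{gene body}}}
```

A termination index well above 1 means that ssDNA signal accumulates downstream of the gene relative to the gene body.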

10. How many replicates are required to perform a differential KAS-seq analysis?

We recommend at least 3 replicates per condition for differential KAS-seq analysis. We mainly use DESeq2 for differential RNA Pols activity analysis in two-condition KAS-seq experiments, and ImpulseDE2 for differential RNA Pols activity analysis in time-course KAS-seq experiments. More replicates can further reduce false positives.

My question wasn't answered here

Please post your question in the GitHub discussion section.