Appendix: Parameters of external tools

Yaobo Xu edited this page Aug 30, 2019 · 4 revisions

This page lists the parameters (command-line options) that cgpRna (version 2.3.0) passes to each of the external tools below.

Mapping and gene counting

  • If the input is a BAM file, convert it to FASTQ files using bamtofastq (biobambam2 version 2.0.87)

     bamtofastq \
     exclude=QCFAIL,SECONDARY,SUPPLEMENTARY \
     gz=1 \
     level=1 \
     T=${TEMPORARY_FILE} \
     S=${SINGLE_END_FILE} \
     O=${UNMATCHED_PAIR_FIRST_MATE_FILE} \
     O2=${UNMATCHED_PAIR_SECOND_MATE_FILE} \
     F=${MATCHED_PAIR_FIRST_MATE_FILE} \
     F2=${MATCHED_PAIR_SECOND_MATE_FILE} \
     filename=${INPUT_BAM}
  • Map reads with STAR (version 2.5.0c)

     STAR --runMode alignReads \
     --sjdbOverhang 99 \
     --limitBAMsortRAM 64606632121 \
     --limitSjdbInsertNsj 1000000 \
     --outSAMtype BAM Unsorted \
     --outSAMstrandField intronMotif \
     --outSAMattributes NH HI NM MD AS XS \
     --outSAMunmapped Within \
     --outSAMheaderHD @HD VN:1.4 SO:unsorted \
     --outFilterMultimapNmax 20 \
     --outFilterScoreMinOverLread 0.33 \
     --outFilterIntronMotifs RemoveNoncanonicalUnannotated \
     --alignIntronMax 200000 \
     --alignMatesGapMax 200000 \
     --alignSJDBoverhangMin 1 \
     --quantMode TranscriptomeSAM \
     --readFilesCommand zcat \
     --outSAMheaderCommentFile ${OUT_COMMENT_FILE} \
     --outSAMattrRGline ${CUSTOM_RG_LINE} \
     --outFileNamePrefix ${OUTPUT_DIR} \
     --runThreadN ${THREADS} \
     --genomeDir ${REFERENCE_DIR} \
     --sjdbGTFfile ${TRANSCRIPTOME_GTF_FILE} \
     --readFilesIn ${MATCHED_PAIR_FIRST_MATE_FILE} ${MATCHED_PAIR_SECOND_MATE_FILE}
  • Sort BAM by coordinates using bamsort (biobambam2 version 2.0.87)

     bamsort \
     fixmate=1 \
     inputformat=bam \
     level=1 \
     inputthreads=${THREADS} \
     outputthreads=${THREADS} \
     I=${STAR_ALIGNED_OUT_BAM} \
     tmpfile=${TMP_FILE} \
     O=${OUT_SORTED_BAM}
  • Mark duplicates using bammarkduplicates2 (biobambam2 version 2.0.87)

     bammarkduplicates2 \
     md5=1 \
     index=1 \
     tmpfile=${TEMP_FILE} \
     markthreads=${THREADS} \
     md5filename=${OUT_BAM_MD5} \
     indexfilename=${OUT_BAM_INDEX} \
     M=${OUT_MET} \
     I=${SORTED_BAM} \
     O=${OUT_DUP_MARKED_BAM}
  • Sort BAM by read names using bamcollate2 (biobambam2 version 2.0.87) so that HTSeq-count can use it

     bamcollate2 \
     collate=1 \
     inputformat=bam \
     outputformat=bam \
     level=1 \
     exclude=SECONDARY,SUPPLEMENTARY \
     filename=${OUT_DUP_MARKED_BAM} \
     O=${NAME_SORTED_BAM}
  • Count reads per gene with HTSeq-count (version 0.7.2)

     htseq-count \
     --format=bam \
     --order=name \
     --stranded="no" \
     --type="exon" \
     --idattr="gene_id" \
     --mode="union" \
     --quiet \
     ${NAME_SORTED_BAM} ${HTSEQ_GTF}
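
Two of the numeric STAR options listed above are easier to interpret with a little arithmetic. The sketch below is illustrative only and is not part of cgpRna; the function names `sjdb_overhang` and `bytes_to_mib` are hypothetical. It assumes the guidance in the STAR manual that `--sjdbOverhang` is ideally the mate length minus 1, so the value 99 would suit 100-base reads, and that `--limitBAMsortRAM` is given in bytes.

```shell
#!/bin/sh
# Hypothetical helpers (not part of cgpRna) for reading two numeric STAR settings.

# Suggested --sjdbOverhang for a given mate length: mate length - 1.
# $1 = mate (read) length in bases.
sjdb_overhang() {
  echo $(( $1 - 1 ))
}

# Convert a byte count, such as the value of --limitBAMsortRAM,
# to whole mebibytes (integer division, so the result is rounded down).
# $1 = size in bytes.
bytes_to_mib() {
  echo $(( $1 / 1024 / 1024 ))
}

sjdb_overhang 100          # prints 99
bytes_to_mib 64606632121   # prints 61613
```

Under these assumptions, the `--limitBAMsortRAM` value used here corresponds to roughly 60 GiB, and a data set with a different read length may call for a different `--sjdbOverhang` than the 99 shown.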