RepeatMasker reader

This script reads RepeatMasker annotation data for downstream analysis. The data are downloaded from https://www.repeatmasker.org/species/hg.html, and the file used is hg38.fa.out.gz.

Given the size of the data, the file should be downloaded to a high-performance computing cluster with enough memory (ideally >16 GB) to open and read it.

Dependencies

R version 3.6.1 (or above)
R libraries, including rtracklayer and GenomicRanges
BASH
Linux-powered high-performance computing cluster with a SLURM workload manager

Usage

Submit the script to the SLURM scheduler as follows:

sbatch repeat.masker.running.sh

Output

The script produces two files:

A reformatted text file, "repeatmasker.reformat.txt", containing the RepeatMasker data
An RData file containing a GRanges object that records the repeat elements in the genome by chromosome location, repeat name, repeat class, and repeat family
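
The reformatting step can be illustrated with a small shell sketch. This is not the repository's own code, which is written in R; it only shows, under stated assumptions, how the standard RepeatMasker .out layout could be flattened into a tab-delimited table. It assumes three header lines followed by whitespace-separated columns, with the query sequence in column 5, begin and end in columns 6 and 7, strand in column 9, repeat name in column 10, and class/family in column 11. The function name reformat_rmsk, the two sample rows, and the output column order are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read RepeatMasker .out text on stdin and print
# tab-delimited rows: chromosome, start, end, strand, repeat name, class, family.
reformat_rmsk() {
  awk 'BEGIN { OFS = "\t" }
       NR > 3 && NF >= 15 {
         # Column 11 is "class/family"; the family part may be absent
         # (e.g. "Simple_repeat"), in which case the class is reused.
         n = split($11, cf, "/")
         family = (n > 1) ? cf[2] : cf[1]
         # The .out format marks the complement strand with "C".
         strand = ($9 == "C") ? "-" : "+"
         print $5, $6, $7, strand, $10, cf[1], family
       }'
}

# Two illustrative rows in the .out layout (values made up for this example).
sample='   SW   perc perc perc  query     position in query    matching  repeat       position in repeat
score   div. del. ins.  sequence  begin end   (left)   repeat    class/family begin  end    (left)  ID

  463    1.3  0.6  1.7  chr1      10001 10468 (248945954) +  (TAACCC)n  Simple_repeat  1  471  (0)  1
 3612   11.4 21.5  1.3  chr1      10469 11447 (248944975) C  TAR1       Satellite/telo  (399)  1712  483  2'

printf '%s\n' "$sample" | reformat_rmsk
```

On the real data, the same function could be fed from the compressed download, for example `gunzip -c hg38.fa.out.gz | reformat_rmsk > repeatmasker.reformat.txt`. Whether the resulting columns match those written by the R script is an assumption.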

benloklab/repeatmasker_annotation
