Running Transposome with long read data
It is possible to run Transposome with reads of any length, though the parameters that determine graph edges need to be adjusted because the defaults are tuned for Illumina data. A good first step is to lower the fraction coverage parameter in the configuration file, for example:
blast_input:
  - sequence_file: sunflower_500k_interleaved.fasta
  - sequence_num: 25_000
  - cpu: 2
  - thread: 12
  - output_directory: sunflower_500k_transposome_PID90_COV55
clustering_options:
  - in_memory: 1
  - percent_identity: 90
  - fraction_coverage: 0.15
  - merge_threshold: 100
annotation_input:
  - repeat_database: RepBase1801_sunflower_repeats.fasta
annotation_options:
  - cluster_size: 500
  - blast_evalue: 10
output:
  - run_log_file: sunflower_500k_transposome_run_log.txt
  - cluster_log_file: sunflower_500k_transposome_cluster_log.txt
The above settings would be a good starting point for 454 data with read lengths of 400 bp. Setting the fraction coverage too high will result in too few edges being added to the graph. Conversely, setting the fraction coverage too low will lead to longer run times and more non-specific clusters being generated.
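To make the trade-off concrete, the following minimal sketch converts a fraction coverage value into the shortest alignment that would still pass the filter for a given read length. It assumes that fraction coverage means aligned length divided by read length. The function name `min_aligned_length`, the 100 bp Illumina read length, and the 0.55 value (borrowed from the `COV55` in the example output directory name) are illustrative assumptions, not values taken from Transposome itself.

```python
import math
from fractions import Fraction


def min_aligned_length(read_length: int, fraction_coverage: float) -> int:
    """Return the shortest aligned length (bp) meeting the coverage cutoff.

    Assumes coverage = aligned length / read length. The cutoff is converted
    through its decimal string so that values such as 0.55 are treated as
    exact decimals rather than binary floating point approximations.
    """
    cutoff = Fraction(str(fraction_coverage))
    return math.ceil(cutoff * read_length)


# Hypothetical scenarios: (label, read length in bp, fraction coverage).
scenarios = [
    ("Illumina, 100 bp, coverage 0.55", 100, 0.55),
    ("454, 400 bp, coverage 0.55", 400, 0.55),
    ("454, 400 bp, coverage 0.15", 400, 0.15),
]

for label, read_length, coverage in scenarios:
    needed = min_aligned_length(read_length, coverage)
    print(f"{label}: alignment must span at least {needed} bp")
```

Under these assumptions, a 400 bp read at 0.55 would need a 220 bp alignment to contribute an edge, whereas at 0.15 it needs 60 bp, which is close to the 55 bp required of a 100 bp read at 0.55.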
Note that if you are writing your own scripts to do this analysis, it is easy to change these parameters and pass them to the Transposome::Cluster class during object construction. For a demonstration of how this would be done, see the "Test a range of parameters for filtering pairwise matches" example in the Tutorial.
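If you would rather stay with configuration files than script against Transposome::Cluster, a range of fraction coverage values can also be explored by generating one configuration per value. The sketch below does only that: it does not call Transposome, and the helper `render_configs`, the generated file names, and the `COVnn` naming scheme are hypothetical choices made for illustration. The template reuses the settings from the example configuration above.

```python
# Template based on the example configuration; only the fraction coverage
# and the names derived from it change between runs.
CONFIG_TEMPLATE = """\
blast_input:
  - sequence_file: sunflower_500k_interleaved.fasta
  - sequence_num: 25_000
  - cpu: 2
  - thread: 12
  - output_directory: sunflower_500k_transposome_PID90_COV{tag}
clustering_options:
  - in_memory: 1
  - percent_identity: 90
  - fraction_coverage: {cov}
  - merge_threshold: 100
annotation_input:
  - repeat_database: RepBase1801_sunflower_repeats.fasta
annotation_options:
  - cluster_size: 500
  - blast_evalue: 10
output:
  - run_log_file: sunflower_500k_transposome_COV{tag}_run_log.txt
  - cluster_log_file: sunflower_500k_transposome_COV{tag}_cluster_log.txt
"""


def render_configs(coverages):
    """Map a hypothetical config file name to its text for each coverage."""
    configs = {}
    for cov in coverages:
        tag = f"{round(cov * 100):02d}"  # e.g. 0.15 -> "15"
        name = f"transposome_config_COV{tag}.yml"
        configs[name] = CONFIG_TEMPLATE.format(cov=cov, tag=tag)
    return configs


configs = render_configs([0.15, 0.35, 0.55])
for name in sorted(configs):
    print(name)
```

Each generated text can be written to disk and supplied to Transposome as its configuration file. Giving every run its own output directory and log files keeps the results of different settings from overwriting one another.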