
Simulation

David Soldevila edited this page Aug 21, 2019 · 17 revisions

Summary

Given the template (output_positive.csv) generated by the MATCHING (or by any other program, as long as the file has the required format), the simulation performs a simulated PCR amplification (in silico PCR) of a random sample of genomes (or species). The efficiency of the PCR for each species in the mixture is mediated by its number of template-primer mismatches. The rationale for the model is taken from Piñol et al. (2019). In short:

  1. We generate a random sample of S species from the pool P of species.
  2. We generate the initial (before PCR) abundance of each of the S species in the sample using a geometric distribution of parameter k (equation 3 of Piñol et al., 2019).
  3. We estimate the relative amplification efficiency of each species in the sample using the parameter beta and the number of primer-template mismatches of each species, according to equation 4 of Piñol et al. (2019).
  4. We calculate the relative final (after PCR) DNA concentration of each species in the mixture using equation 2 of Piñol et al. (2019).
  5. Finally, we calculate the Pearson correlation coefficient between the initial and the final relative abundance of species (pre and post-PCR).
  6. The above steps are repeated N times with a fresh random set of species.

P is the total number of genomes for which there was a positive matching in the template file (output_positive.csv). S, k, beta, and N are given by the user.
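The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code QMPrimers runs: the function names, the `cycles` parameter, and the three formulas (abundance proportional to (1-k)^i, efficiency exp(-beta * mismatches), and exponential amplification over a fixed number of cycles) are stand-ins chosen for this sketch. The actual equations 2, 3 and 4 are in Piñol et al. (2019) and may differ.

```python
import math
import random


def geometric_abundances(s, k):
    """Relative initial abundances of s species.
    ASSUMED form: weight (1 - k)**i, normalised to sum to 1."""
    weights = [(1.0 - k) ** i for i in range(s)]
    total = sum(weights)
    return [w / total for w in weights]


def efficiency(mismatches, beta):
    """ASSUMED stand-in: efficiency decays exponentially with mismatches."""
    return math.exp(-beta * mismatches)


def final_abundances(initial, effs, cycles):
    """ASSUMED stand-in: exponential amplification, then normalisation."""
    amplified = [a * (1.0 + e) ** cycles for a, e in zip(initial, effs)]
    total = sum(amplified)
    return [x / total for x in amplified]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(pool_mismatches, s, k, beta, n, cycles=30, seed=None):
    """Run n steps; pool_mismatches holds one mismatch count per species in P."""
    rng = random.Random(seed)
    correlations = []
    for _ in range(n):
        sample = rng.sample(pool_mismatches, s)          # step 1
        initial = geometric_abundances(s, k)             # step 2
        effs = [efficiency(m, beta) for m in sample]     # step 3
        final = final_abundances(initial, effs, cycles)  # step 4
        correlations.append(pearson(initial, final))     # step 5
    return correlations                                  # step 6: n repeats


if __name__ == "__main__":
    pool = [0, 0, 1, 1, 2, 3, 0, 4, 2, 1]  # hypothetical mismatch counts
    print(simulate(pool, s=5, k=0.5, beta=1.0, n=3, seed=42))
```

A correlation close to 1 means the post-PCR proportions still track the pre-PCR ones. Note that k = 0 gives equal initial abundances in this sketch, for which the correlation is undefined.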

The results are written to two files: (1) the raw results, a table with the correlation coefficient for each sample and each primer pair; and (2) a summary with the basic statistics of the simulation for each primer pair.

HOWTO

HOWTO Open the program

To run the program in GUI mode, just double-click it.

If you are using the Python script, type in a terminal:

python QMPrimers.py
# or python3 / python3.x, where x depends on your Python version

Go to the next section for a quick review of the program options.

To execute the program in command line mode:

If using the python script, type in a terminal:

python QMPrimers.py --sim

If using the executable, type in a terminal:

./QMPrimers --sim

Go to the next section for a quick review of the program options. To display the help page of the simulation, type:

QMPrimers --sim --help

HOWTO Fast Guide

When the program opens, click on the Simulation tab. You should now see this on your screen:

[Screenshot]

  • The Template entry is used to load the template file containing the results of the matching. The template from the last matching is loaded automatically; in that case, "TEMPLATE FROM LAST MATCHING" is written in the template entry. See the Template File section for more info.
  • The Output entry (the second entry) is used to specify the path and name of the output files. The program outputs two files: simout.csv, containing the raw data, and a text file, simout.txt, containing basic statistics. For more info, see Output Data.
  • S sets the number of random sequences to select at each step of the simulation.
  • The Beta parameter: the greater this constant, the more each mismatch penalizes the amplification.
  • The k parameter is used to calculate the geometric proportion of each species. The greater it is, the more unbalanced the proportions will be. It ranges from 0 to 1.
  • The N parameter sets the number of steps of the simulation.
  • The Confidence Interval is used when calculating the statistics.
  • Verbose: if enabled, the program outputs warning messages, for example when a primer has been skipped because it has too few occurrences in the template. An additional log file is also generated in the program root directory. This log file always records the warnings; enabling the verbose option also logs messages with the info flag.
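To get a feel for the k parameter, the short sketch below prints the proportions of five species for three values of k. It assumes the proportions follow (1-k)^i, normalised to sum to 1; this is an illustrative form chosen here, and the exact expression used by the program is equation 3 of Piñol et al. (2019). The function name is hypothetical.

```python
def geometric_proportions(s, k):
    """Proportions of s species, ASSUMING weights (1 - k)**i normalised to 1."""
    weights = [(1.0 - k) ** i for i in range(s)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    for k in (0.1, 0.5, 0.9):
        rounded = [round(p, 3) for p in geometric_proportions(5, k)]
        print(f"k={k}: {rounded}")
```

Under this assumption, a larger k concentrates most of the mixture in the first few species, which is the "more unbalanced" behaviour described in the list above.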

Template File

The template file can be generated with the Matching tab of this program or with any other tool, but its header must include at least the following columns:

id,primerPair,fastaid,mismFT,mismRT

For more info, see Output data
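If the template comes from another tool, it can be checked before loading it. The sketch below reads the first line of a CSV file and reports which of the required columns are missing. It assumes a comma-separated file whose header is on the first line; the function name is hypothetical and not part of QMPrimers.

```python
import csv

# Column names required by the simulation, as listed above.
REQUIRED_COLUMNS = ("id", "primerPair", "fastaid", "mismFT", "mismRT")


def missing_template_columns(path):
    """Return the required column names absent from the CSV header at `path`."""
    with open(path, newline="") as handle:
        header = next(csv.reader(handle), [])  # empty file -> empty header
    present = {name.strip() for name in header}
    return [col for col in REQUIRED_COLUMNS if col not in present]
```

An empty list means the header contains every required column; extra columns are ignored by this check.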

Output

The program generates two output files, a .csv and a .txt file, called simout.csv and simout.txt by default. The former contains the correlation coefficient of every primer pair for every step; this is the raw data. The latter contains a summary of this data.

simout.csv

Each row is a simulation step and each column a primer pair. The last row is the total number of different samples that can be generated for each primer pair, that is, the number of unique combinations of species possible.
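As a worked example of that last row: if a sample is an unordered set of S species drawn from the species available for a primer pair, the count is the binomial coefficient. The sketch below assumes exactly that interpretation (the function name is hypothetical) and uses `math.comb`, available from Python 3.8.

```python
import math


def possible_samples(p, s):
    """Number of distinct unordered sets of s species chosen from p species."""
    return math.comb(p, s)


if __name__ == "__main__":
    print(possible_samples(20, 10))  # 184756
```

When this number is small compared with N, many simulation steps will necessarily reuse the same set of species.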

simout.txt

Apart from max_total and min_total, which contain the absolute maximum and minimum, the other values are within the confidence interval.
