Releases: ShashankRaoCoding/SNP_Manager
Release Version 0.1
First release of SNP_Manager and related applications
Guide for SNP Manager and Plot.py
SNP Manager
SNP Manager is a library with three classes (objects) and numerous functions. Each SNP is instantiated as an 'SNP' object. Files, which can be .TSV, .CSV, or .XLSX, are instantiated as 'File' objects. A group of files can be loaded together as a 'Filegroup' object. Filegroups are often more useful because the files in a group share the SNPs they have in common: the library matches SNPs between the files and adds each file's data to the relevant 'SNP' objects as required.
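To illustrate how matching SNPs between files can work, here is a minimal sketch that keeps one shared SNP object per rsid, so every file that mentions a SNP adds its columns to the same object. The SNP, File and Filegroup classes below are simplified, hypothetical stand-ins (they take in-memory rows rather than reading files), not the library's actual implementation.

```python
class SNP:
    """Hypothetical minimal SNP: only the mandatory rsid attribute."""

    def __init__(self, rsid):
        self.rsid = rsid


class File:
    """Hypothetical File: rows are dicts that already contain an 'rsid' key."""

    def __init__(self, name, rows, registry):
        self.name = name
        self.snps = []
        for row in rows:
            rsid = row["rsid"]
            # Reuse the SNP object if another file already created it
            snp = registry.get(rsid)
            if snp is None:
                snp = registry[rsid] = SNP(rsid)
            # Copy every other column onto the shared SNP object
            for key, value in row.items():
                if key != "rsid":
                    setattr(snp, key, value)
            self.snps.append(snp)


class Filegroup:
    """Hypothetical Filegroup: builds Files that share one rsid -> SNP registry."""

    def __init__(self, tables):
        self.registry = {}
        self.files = [File(name, rows, self.registry) for name, rows in tables.items()]
        self.snps = list(self.registry.values())


# Illustrative data; the file names are only dictionary keys here
group = Filegroup({
    "gwas_a.tsv": [{"rsid": "rs200480", "pval": 0.01}],
    "gwas_b.csv": [{"rsid": "rs200480", "beta": 0.3}, {"rsid": "rs123", "beta": -0.1}],
})
snp = group.files[0].snps[0]
print(snp.pval, snp.beta)  # 0.01 0.3
```

Because both files hold a reference to the same object, reading rs200480 through the first file also exposes the beta value that came from the second file.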
NOTE: All SNP objects must have an 'rsid' attribute. When reading files, if a SNP's identifier has a related name, such as 'variant_id', the library automatically renames it to 'rsid'. For example, given the data:
variant_id = "rs200480"
snp = SNP(variant_id)
print(snp.rsid)
> Output: rs200480
All other attributes are maintained from the source files.
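The renaming to 'rsid' can be pictured as a lookup against a list of alias column names. The sketch below assumes a hypothetical alias set and a hypothetical helper normalise_row; the library's real list of recognised names may differ. As described above, every other attribute is kept unchanged.

```python
# Hypothetical aliases; the library's actual list may differ
RSID_ALIASES = {"rsid", "variant_id", "variant id", "snp", "snp_id", "markername"}


def normalise_row(row):
    """Return a copy of row with any rsid-like key renamed to 'rsid'."""
    out = {}
    for key, value in row.items():
        if key.strip().lower() in RSID_ALIASES:
            out["rsid"] = value
        else:
            out[key] = value  # all other attributes are kept as-is
    return out


print(normalise_row({"variant_id": "rs200480", "pval": 0.01}))
# {'rsid': 'rs200480', 'pval': 0.01}
```

Comparing lower-cased, stripped names means headers such as 'Variant ID' are matched as well.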
The library comes with some built-in functionality for plotting data. Example:
paths = selectfiles()
data = filegroup(parsepaths(paths))
data.plot("pval", "beta", "FinnGen P Values", "FinnGen Beta Values", False, "FinnGen P values against FinnGen beta values", True, '.png')
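> Syntax: filegroup.plot(x, y, xlabel, ylabel, logarithmic, title, show, save_format), where logarithmic and show are True or False, and save_format is a file extension such as '.png' or False to skip saving.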
Note: SNPs with missing data are ignored. SNPs whose data cannot be plotted (e.g. a value of 0 on a logarithmic plot) may also be ignored.
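The filtering described in the note can be sketched as follows. The helper plottable_points is hypothetical, and it assumes that a logarithmic plot uses log scales on both axes, so any zero or negative value is dropped; the library may apply the log scale differently.

```python
import math
from types import SimpleNamespace


def plottable_points(snps, x, y, logarithmic=False):
    """Return (x, y) pairs, skipping SNPs with missing or unplottable values."""
    points = []
    for snp in snps:
        xv = getattr(snp, x, None)
        yv = getattr(snp, y, None)
        if xv is None or yv is None:
            continue  # missing data: ignored
        try:
            xv, yv = float(xv), float(yv)
        except (TypeError, ValueError):
            continue  # non-numeric values cannot be plotted
        if math.isnan(xv) or math.isnan(yv):
            continue
        # Assumption: log scale on both axes, so values <= 0 are dropped
        if logarithmic and (xv <= 0 or yv <= 0):
            continue
        points.append((xv, yv))
    return points


# SimpleNamespace stands in for SNP objects in this example
snps = [
    SimpleNamespace(rsid="rs1", pval=0.01, beta=0.3),
    SimpleNamespace(rsid="rs2", pval=0.0, beta=0.1),
    SimpleNamespace(rsid="rs3", beta=0.2),  # no pval: missing data
]
print(plottable_points(snps, "pval", "beta", logarithmic=True))  # [(0.01, 0.3)]
```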
Scatter plots are interactive: clicking on a SNP's point displays data about that SNP and the plotted attributes, and automatically copies the SNP's rsid to your clipboard.
filegroup.snps contains all SNPs in the file group.
filegroup.files is a list of File objects.
file.snps is a list of all SNPs within a file (note that accessing these SNP objects may still pull in data from other files automatically).
Note: if multiple files contain the same SNP with the same attribute, the value from the most recently read file takes priority. To avoid this, name the attributes uniquely in the source files, e.g. 'pval', 'pvalue', 'P value', 'P-value', to distinguish p values from different sources. These are then treated as separate, independent attributes.
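The priority rule behaves like successive dictionary updates, where a later file overwrites any key it shares with an earlier one. The helper merge_attributes below is a hypothetical illustration of that rule, not a library function.

```python
def merge_attributes(files):
    """Merge per-file attribute dicts for one SNP; later files win on clashes."""
    merged = {}
    for attrs in files:
        merged.update(attrs)  # a key that already exists is overwritten
    return merged


# Same column name in both files: the most recently read value wins
print(merge_attributes([{"pval": 0.01}, {"pval": 0.20}]))
# {'pval': 0.2}

# Uniquely named columns are kept as separate attributes
print(merge_attributes([{"pval": 0.01}, {"P value": 0.20}]))
# {'pval': 0.01, 'P value': 0.2}
```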
getcommonsnps(filegroup_object) is a useful function that returns a list of the SNPs common to all files in a filegroup. Because these SNPs appear in every file, they may carry attributes from all of the files.
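Finding SNPs common to every file amounts to a set intersection over rsids. The helper common_rsids is a hypothetical sketch that works on plain lists of rsids and returns them sorted; getcommonsnps itself returns SNP objects, and its implementation may differ.

```python
def common_rsids(files):
    """Return the sorted rsids present in every file (each file is an iterable of rsids)."""
    sets = [set(f) for f in files]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


print(common_rsids([["rs1", "rs2", "rs3"], ["rs2", "rs3"], ["rs3", "rs2", "rs9"]]))
# ['rs2', 'rs3']
```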
Plot.py
This application is a working example of the SNP Manager library and helps plot SNPs by their attributes. Note that attributes are derived from the column names of the provided files. When the application encounters the same SNP in different files under the same column name, it treats this as the same attribute and overwrites the SNP object's data with the relevant value(s) from the most recently read file.
Export to FUMA.py
This application is another working example of the SNP Manager library and helps convert files into formats accepted by FUMA. This avoids having to manually rename column headings to the names FUMA detects.
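The conversion mostly comes down to renaming header columns. Below is a minimal sketch using Python's csv module; the helper to_fuma and the FUMA_HEADERS mapping are hypothetical, and the target names are assumptions based on commonly used FUMA headers, so check FUMA's own documentation for the exact names it detects.

```python
import csv
import io

# Hypothetical mapping from source headers to FUMA-style headers (assumption)
FUMA_HEADERS = {
    "rsid": "rsID",
    "chrom": "CHR",
    "pos": "BP",
    "pval": "P",
    "beta": "Beta",
    "sebeta": "SE",
}


def to_fuma(src, dst, delimiter="\t"):
    """Copy a delimited table from src to dst as TSV, renaming recognised headers."""
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    header = next(reader)
    # Unrecognised column names are left unchanged
    writer.writerow([FUMA_HEADERS.get(col.strip().lower(), col) for col in header])
    for row in reader:
        writer.writerow(row)  # data rows are copied unchanged


src = io.StringIO("rsid\tchrom\tpos\tpval\nrs200480\t1\t12345\t0.01\n")
dst = io.StringIO()
to_fuma(src, dst)
print(dst.getvalue(), end="")
```

Passing delimiter="," lets the same sketch read a .CSV file while still writing tab-separated output.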