PIO is open-source software created to help researchers and biologists design their NGS panels.
Based on individual mutational profiles available in public cancer genomics databases or in private databases, PIO can either select the minimal set of genes or exons needed to achieve absolute informativity (100% of patients with at least one mutation) or assess the informativity of a given panel in a cancer of interest. PIO offers several other features, such as the exploration of mutation frequency and distribution. By combining metrics on informativity and panel length, PIO can help make accurate panel design choices, and might also be used to easily benchmark available commercial solutions.
PIO has a ready-to-use online version available at https://vincentalcazer.shinyapps.io/Panel_informativity_optimizer/.
You can install the local development version from GitHub either by cloning the repository or by installing the package directly from R. Note that if you clone the repository, the dependencies must be installed first.
install.packages("remotes")
remotes::install_github("VincentAlcazer/PIO")
shiny::runApp("path_to_PIO")
For a complete tutorial, please visit the vignette.
Parameters can be selected in the left panel.
The training dataset can be selected here.
- Preloaded datasets: Here you can select mutation data from 91 independent cohorts spanning 31 different cancer types. Data sources are available in the PIO paper.
- Custom datasets: Custom mutation data can be uploaded by selecting Browse and choosing the correct file format. Custom datasets should contain at least the following 10 columns: "patient_id", "gene_id", "exon_id", "Chromosome", "Start_Position", "End_Position", "Strand", "Variant_Classification", "HGVSp", "cohort". Column names can differ, but the columns must appear in exactly this order. Columns can be left empty if the variable is not available (e.g. for a mutation dataset where exons have not been annotated, exon_id will be an empty column).
An example dataset is provided (Custom_dataset_example.tsv). Other example datasets from cBioPortal can be found in the raw_mutation_data folder.
Note that preloaded datasets can be merged with custom datasets to enable a global analysis on a larger base (only available when running PIO locally).
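As an illustration of the expected layout, the sketch below builds a tiny custom dataset with the 10 columns in the required order and writes it as a tab-separated file. All values are made-up placeholders (not real variants), the tab separator is assumed from the .tsv extension of the provided example, and `expected_cols` is a name introduced here only for the check.

```r
# Column order described above; names may differ, positions may not.
expected_cols <- c("patient_id", "gene_id", "exon_id", "Chromosome",
                   "Start_Position", "End_Position", "Strand",
                   "Variant_Classification", "HGVSp", "cohort")

# Placeholder rows: coordinates and protein changes are NOT real annotations.
custom <- data.frame(
  patient_id             = c("P1", "P2"),
  gene_id                = c("GENE_A", "GENE_B"),
  exon_id                = c("", ""),          # left empty: exons not annotated
  Chromosome             = c("1", "2"),
  Start_Position         = c(1000L, 2000L),
  End_Position           = c(1000L, 2000L),
  Strand                 = c("+", "+"),
  Variant_Classification = c("Missense_Mutation", "Missense_Mutation"),
  HGVSp                  = c("p.X1", "p.X2"),
  cohort                 = c("my_cohort", "my_cohort"),
  stringsAsFactors = FALSE
)

# Write as tab-separated text (assumed format), then read it back to check
# that the column order survived the round trip.
out_file <- tempfile(fileext = ".tsv")
write.table(custom, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
back <- read.delim(out_file, stringsAsFactors = FALSE)
layout_ok <- identical(names(back), expected_cols)
```

Comparing the result against Custom_dataset_example.tsv before uploading remains the safest check.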
Three modes are available: optimal panel selection (optimal mode), custom panel interrogation (custom mode) or panel test. See the vignette for more information about the modes.
Should mutations (and their respective length) be grouped by gene or exon/intron? (default: gene)
The number of unique patients per kilobase (UPKB) or the number of unique patients (UP) can be used as the informativity metric. (default: UPKB)
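To make the difference between the two metrics concrete, here is a minimal sketch on toy data. It is not PIO's internal code: the gene names, the `gene_length_bp` lookup and the lengths themselves are hypothetical, and UPKB is computed simply as UP divided by the length in kilobases.

```r
# Toy mutation table: one row per mutation (values are made up).
mutations <- data.frame(
  patient_id = c("P1", "P2", "P3", "P1", "P1"),
  gene_id    = c("A",  "A",  "A",  "B",  "B"),
  stringsAsFactors = FALSE
)

# Hypothetical gene lengths in base pairs.
gene_length_bp <- c(A = 6000, B = 500)

# UP: number of distinct patients carrying at least one mutation in each gene.
up <- sapply(split(mutations$patient_id, mutations$gene_id),
             function(x) length(unique(x)))

# UPKB: unique patients divided by the gene length in kilobases.
upkb <- up / (gene_length_bp[names(up)] / 1000)
```

In this toy example gene A has the higher UP (3 patients against 1), but gene B has the higher UPKB because it is much shorter, which is why UPKB favours compact panels.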
A minimal number of patients per mutation, calculated on the overall cohort, can be set in order to avoid overfitting to private mutations. (default: 2)
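The effect of this threshold can be sketched as a simple filter. The example assumes that a mutation is identified by its gene and protein change (`gene_id` plus `HGVSp`); how PIO actually defines a mutation may differ, and the data are placeholders.

```r
# Toy mutation table (placeholder protein changes, not real variants).
mutations <- data.frame(
  patient_id = c("P1",     "P2",     "P3",     "P3"),
  gene_id    = c("GENE_A", "GENE_A", "GENE_A", "GENE_B"),
  HGVSp      = c("p.X1",   "p.X1",   "p.X2",   "p.X3"),
  stringsAsFactors = FALSE
)

# Assumption: gene + protein change identifies a mutation.
mutation_key <- paste(mutations$gene_id, mutations$HGVSp, sep = ":")

# Number of distinct patients carrying each mutation in the overall cohort.
n_patients <- sapply(split(mutations$patient_id, mutation_key),
                     function(x) length(unique(x)))

# Keep only mutations seen in at least `min_patients` patients.
min_patients <- 2
keep <- unname(n_patients[mutation_key] >= min_patients)
filtered <- mutations[keep, ]
```

Here only GENE_A:p.X1, shared by P1 and P2, survives; the two private mutations are dropped.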
The number of mutations per patient that PIO should aim for. (default: 2)
For custom analysis: a custom panel should be uploaded. The uploaded file should contain only one column containing the list of gene or exon names. An example is provided (Custom_panel_example.tsv).
- Dataset: select a preloaded or upload a custom dataset
- Parameters: PIO optimal
- Group mutation by: exon/intron or gene
- Informativity metric: UP or UPKB
- Min. patients/mutations: at least 2
- Min. mutations/patient: between 1 and 5
With these parameters, PIO will select an optimal set of mutations that maximizes informativity in a given disease, targeting a selected number of mutations per patient (informativity level).
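The selection algorithm PIO uses is not described here. As an intuition only, the sketch below uses a greedy coverage heuristic, a common approach for this kind of problem and not necessarily what PIO implements: it repeatedly adds the gene that reaches the most patients still below the target number of mutations. The function name `greedy_panel` and the data are hypothetical.

```r
# Greedy sketch: add genes until every patient has at least `target`
# mutations covered, or until no remaining gene helps.
greedy_panel <- function(mutations, target = 1) {
  # Patient x gene matrix of mutation counts.
  counts <- table(mutations$patient_id, mutations$gene_id)
  hits <- setNames(rep(0, nrow(counts)), rownames(counts))
  panel <- character(0)
  remaining <- colnames(counts)
  while (any(hits < target) && length(remaining) > 0) {
    needy <- hits < target
    # Gain: number of still-uncovered patients mutated in each remaining gene.
    gain <- colSums(counts[needy, remaining, drop = FALSE] > 0)
    if (max(gain) == 0) break
    best <- names(gain)[which.max(gain)]
    panel <- c(panel, best)
    hits <- hits + counts[, best]
    remaining <- setdiff(remaining, best)
  }
  panel
}

# Toy data: gene D is redundant because P1 is already covered by gene A.
mutations <- data.frame(
  patient_id = c("P1", "P2", "P2", "P3", "P4", "P1"),
  gene_id    = c("A",  "A",  "B",  "B",  "C",  "D"),
  stringsAsFactors = FALSE
)
selected <- greedy_panel(mutations, target = 1)
```

On this toy input the heuristic selects A, B and C and leaves D out. Ranking by UPKB instead of UP would amount to dividing the gain by each gene's length in kilobases.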
- Dataset: select a preloaded or upload a custom dataset
- Parameters: PIO custom (do not forget to upload your custom panel below)
- Group mutation by: exon/intron or gene
- Informativity metric: UP or UPKB
- Min. patients/mutations: at least 2
- Min. mutations/patient: between 1 and 5
Using these parameters, PIO will select, from the uploaded panel, an optimal set of mutations that maximizes informativity in a given disease. PIO will also automatically suggest the most informative mutations to add in order to optimize panel informativity.
- Dataset: select a preloaded or upload a custom dataset
- Parameters: PIO custom (do not forget to upload your custom panel below)
- Group mutation by: exon/intron is recommended for panel size optimization.
- Informativity metric: UPKB is the most suitable metric for panel size optimization.
- Min. patients/mutations: at least 2
- Min. mutations/patient: between 1 and 5
In this configuration, PIO will propose an optimized panel that maximizes informativity for the minimal size. Different panels can be compared by manually uploading individual panels.
- Dataset: select a preloaded or upload a custom dataset
- Parameters: Panel test (do not forget to upload your custom panel below)
- Group mutation by: according to the custom panel: gene or exon/intron
- Informativity metric: UP or UPKB
- Min. patients/mutations: 1
- Min. mutations/patient: between 1 and 5
In this configuration, PIO will show the performance of the uploaded panel in a given dataset without further mutation selection/panel optimization.
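What such a panel test measures can be sketched as follows: for a fixed panel, compute the percentage of patients reaching at least 1, 2 or 3 mutations within the panel. This is a minimal illustration on made-up data, not PIO's implementation, and it assumes every patient of the cohort appears at least once in the mutation table.

```r
# Toy mutation table and a fixed two-gene panel (values are made up).
mutations <- data.frame(
  patient_id = c("P1", "P1", "P2", "P3", "P4"),
  gene_id    = c("A",  "B",  "A",  "C",  "D"),
  stringsAsFactors = FALSE
)
panel <- c("A", "B")

# All patients of the cohort, including those with no mutation in the panel.
patients <- unique(mutations$patient_id)

# Number of in-panel mutations per patient (0 for uncovered patients).
in_panel <- mutations[mutations$gene_id %in% panel, ]
n_hits <- as.integer(table(factor(in_panel$patient_id, levels = patients)))

# Percentage of patients with at least k mutations in the panel, k = 1..3.
pct_at_least <- sapply(1:3, function(k) 100 * mean(n_hits >= k))
```

With this panel, P1 carries two in-panel mutations, P2 one, and P3 and P4 none, so the informativity drops as the required number of mutations per patient increases.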
- Dataset: select a preloaded or upload a custom dataset
- Parameters: PIO optimal
- Group mutation by: gene (or exon/intron)
- Informativity metric: UP or UPKB
- Min. patients/mutations: 1
- Min. mutations/patient: 1
Using these parameters will optimize mutation exploration for a given disease in the mutations tab.