BiocompR - Advanced visualizations for data comparison


BiocompR is an R package built upon ggplot2 and data.table. It is a visualization framework for data comparison and exploration. It improves some visualizations commonly used in biology and genomics, introduces new kinds of plots, provides a toolbox of functions to work with ggplot2 and grid objects, and ultimately allows users to customize the plots produced into publication-ready figures.

Author: PAGEAUD Y.1
How to cite: Pageaud Y. et al., BiocompR - Advanced visualizations for data comparison.


Acknowledgment

I would like to thank everyone who contributed to the development of this package with their code, their test datasets, their advice and their feedback. - Yoann.
Contributors: Dr. Schefzik R.2; Mr. Hruska D.1; Mrs. Bitto V.1; Dr. Kurilov R.1; Mr. Beumer N.1; Mrs. Wursthorn A.3; Mrs. Qadeer R.1; Dr. Feuerbach L.1.
1. DKFZ - Division of Applied Bioinformatics, Germany.
2. Klinik für Anästhesiologie und Operative Intensivmedizin, Medizinische Fakultät Mannheim, Universität Heidelberg, Germany.
3. DKFZ - Clinical Cooperation Unit Translational Radiation Oncology, Germany.

Funding: Development and maintenance funded by de.NBI.
Feel free to take the de.NBI survey about BiocompR.

Linux prerequisites (under Ubuntu & Debian)

From your terminal, install the following libraries (they are dependencies of the devtools and magick packages):

sudo apt install libfontconfig1-dev libxml2-dev libharfbuzz-dev libfribidi-dev libcurl4-openssl-dev libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev libmagick++-dev

Install BiocompR

In R execute the following command:

devtools::install_github("YoannPa/BiocompR")
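
The command above requires the devtools package to be installed already. A minimal sketch that installs it first when it is missing, assuming a CRAN mirror is reachable; the helper name `install_biocompr` is hypothetical and not part of BiocompR:

```r
# Hypothetical helper (not part of BiocompR): installs devtools from CRAN
# only when it is missing, then installs BiocompR from GitHub.
install_biocompr <- function() {
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  devtools::install_github("YoannPa/BiocompR")
}

# Call it once from an interactive session:
# install_biocompr()
```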

Content

Currently the package BiocompR contains 42 exported functions:

  • biopalette() - A color palette advisor for biology plots.
  • build_legends_layout() - Builds legends layout.
  • check_fun() - Checks if a function exists and identifies its package of origin.
  • ggasso.all_annot() - Draws association test results between all columns from a data.frame.
  • ggasso.annot_pc() - Plots association tests' results between some annotations and some PCs.
  • ggasycorr() - Draws an asymmetrical pairwise correlation plot.
  • ggbipca() - Computes and draws a custom PCA biplot.
  • ggbivar() - Draws boxplots or violins of one variable's values against ranges of a second one.
  • ggcirclart() - Circlizes ggplot2 objects.
  • ggcoverage() - Plots an annotated stacked barplot.
  • ggcraviola() - Draws a craviola plot (half-split and percentile-binned violin plot).
  • ggcross.biplot() - Computes and draws biplots for multiple principal components at once.
  • ggdend() - Creates a dendrogram in ggplot2.
  • ggdensity_map() - Plots a density color map from a matrix or a molten data.frame.
  • ggeigenvector() - Creates an eigenvector plot using ggplot2.
  • ggeva() - Computes eigenvectors, PC scores and correlations from a correlation test.
  • ggfusion.corr() - Draws 2 triangle matrices of computed pairwise correlations' results.
  • ggfusion.free() - Draws 2 triangle matrices fused together in a single plot.
  • ggheatmap() - Creates a custom heatmap with dendrograms and annotations.
  • gghist() - Plots a histogram using ggplot2 from a numeric or character vector.
  • ggpanel.corr() - Plots results of a correlation test between a single variable and multiple others as a jittered scatter plot divided into 4 different panels.
  • ggsidebar.basic() - Draws a ggplot2 of a basic sidebar.
  • ggsidebar.full() - Creates colored side annotation bars in ggplot2.
  • ggstackbar() - Draws stacked barplots from an annotation table.
  • ggsunset() - Draws a sunset plot showing the completeness of a dataset.
  • ggtriangle() - Draws a triangle plot from a basic molten triangle matrix.
  • ggvolcano.corr() - Plots results of a correlation test between a single variable and multiple others as a volcano plot.
  • ggvolcano.free() - Plots any kind of results with P-values that can be displayed as a volcano plot.
  • ggvolcano.test() - Plots results of statistical tests as a volcano plot.
  • ks.plot() - Computes pairwise Kolmogorov-Smirnov tests on a matrix and displays results in a fused plot.
  • manage.na() - Keeps, removes or imputes missing values in a matrix or a data.frame based on sample groups.
  • pairwise.ks() - Computes a Kolmogorov-Smirnov test between all columns of a data.frame.
  • prepare_annot_asso() - Prepares annotations to be tested for associations.
  • prepare_pca_data() - Collects and computes needed metrics for PCA biplot.
  • raster.gg2grob() - Rasterizes a ggplot into a raster grob.
  • raster.ggplot.to.grob() - Rasterizes a ggplot into a raster grob.
  • resize.grob.oneway() - Resizes heights or widths of a grob based on the dimensions of another grob.
  • resize.grobs() - Resizes heights or widths of multiple grobs based on a given grob dimensions.
  • test.annots() - Tests association of an annotation with another one or with a PC.
  • test_asso_all_annot() - Write function description here.
  • test_asso_annot_pc() - Tests associations between a set of annotations and PCs from a prcomp object.
  • warn.handle() - Filters irrelevant warnings matching a regular expression.
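
To check which functions your installed version actually exports, base R can query the package namespace. A minimal sketch; the helper name `list_exports` is hypothetical and not part of BiocompR:

```r
# Hypothetical helper (not part of BiocompR): returns the sorted names a
# package exports, or an empty character vector if it is not installed.
list_exports <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(character(0))
  }
  sort(getNamespaceExports(pkg))
}

# Names exported by the installed BiocompR (empty if it is not installed).
biocompr_exports <- list_exports("BiocompR")
length(biocompr_exports)
```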

Problems? / I need help!

For any question not related to bugs or development, please check the "Known Issues" section below first. If the issue you experience is not addressed there, you can write to me at y.pageaud@dkfz.de.

Known Issues

❎ Error in UseMethod("depth")

Error in UseMethod("depth") : 
  no applicable method for 'depth' applied to an object of class "NULL"

This error seems to happen randomly when executing code using the ggplot2 and/or grid packages. Usually, executing the chunk of code one more time solves the error. The current status of this issue can be tracked here.
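
Since re-running the code is the usual workaround, the retry can be automated. A minimal base R sketch; the helper name `retry_plot` and its arguments are hypothetical and not part of BiocompR:

```r
# Hypothetical helper (not part of BiocompR): evaluates `plot_fun()` and
# retries when it raises an error, up to `max_tries` attempts in total.
retry_plot <- function(plot_fun, max_tries = 3) {
  last_error <- NULL
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(
      list(ok = TRUE, value = plot_fun()),
      error = function(e) list(ok = FALSE, value = e)
    )
    if (result$ok) {
      return(result$value)
    }
    last_error <- result$value
    message("Attempt ", attempt, " failed: ", conditionMessage(last_error))
  }
  # All attempts failed: re-signal the last error.
  stop(last_error)
}

# Usage: wrap the failing chunk in a function
# (`my_ggplot_object` stands for your own plot).
# retry_plot(function() print(my_ggplot_object))
```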

❎ Error in grid.Call(C_convert, x, as.integer(whatfrom), as.integer(whatto), : Viewport has zero dimension(s)

Error in grid.Call(C_convert, x, as.integer(whatfrom), as.integer(whatto), :
  Viewport has zero dimension(s)

This error can arise when using the ggbipca() function: if you define a legend with too many values, the plotting area becomes too small to print the plot in the plotting panel of RStudio.
When this happens, try manually increasing the size of the plotting panel in your RStudio interface. If that does not solve the error, define a legend with fewer values for colors and/or shapes.
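
Another option, not described by the package author, is to draw into a file device with explicit dimensions, so the available space no longer depends on the RStudio panel. A minimal sketch, assuming your plot is stored in an object that can be printed; the helper name `draw_to_pdf` is hypothetical and not part of BiocompR:

```r
# Hypothetical helper (not part of BiocompR): runs `draw_fun()` inside a
# PDF device of fixed size, so the layout does not depend on the size of
# the RStudio plotting panel.
draw_to_pdf <- function(draw_fun, file, width = 12, height = 10) {
  grDevices::pdf(file = file, width = width, height = height)  # in inches
  on.exit(grDevices::dev.off(), add = TRUE)  # always close the device
  draw_fun()
  invisible(file)
}

# Usage with a ggplot2 object (`my_biplot` stands for your own plot):
# draw_to_pdf(function() print(my_biplot), file = "my_biplot.pdf")
```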

⚠️ Reached elapsed time limit.

Warning message:
In grid.Call(C_convert, x, as.integer(whatfrom), as.integer(whatto),  :
  reached elapsed time limit.

This warning seems to happen randomly when executing code using the ggplot2 and/or grid packages. Usually, after executing the chunk of code one more time, the warning is no longer displayed. The current status of this issue can be tracked here.

⚠️ Using alpha for a discrete variable is not advised.

Warning message:
Using alpha for a discrete variable is not advised. 

This warning can arise when using the function ggvolcano.corr() with additional ggplot2 components. It doesn't compromise the printing of the plot, although it can be distracting.
A quick fix to suppress this specific warning is to use the function warn.handle(), which filters out unwanted warnings using pattern matching, as follows:

# Load BiocompR and ggplot2 (scale_color_manual(), xlab(), ylab() and ggtitle() come from ggplot2)
library(BiocompR)
library(ggplot2)

# Create your correlation volcano plot (this will also print the 'default' volcano plot)
my_volcano <- ggvolcano.corr(
  data = dfrm_my_correlation_res, p.cutoff = 0.01, corr.cutoff = 0.1,
  title.corr.cutoff = "Samples default correlation",
  corr.label.cutoff = c(-0.35, 0.40)) +
  scale_color_manual(values = ggsci::pal_npg("nrc", alpha = 1)(10)) +
  xlab("Spearman correlation") + ylab("Spearman P-value") +
  ggtitle("Spearman correlation between multiple variables and my variable of interest")

# Print your volcano plot without displaying the unwanted warning
warn.handle(
  pattern = "Using alpha for a discrete variable is not advised.",
  print(my_volcano))

Using ggvolcano.corr() without additional ggplot2 components should not raise this warning.
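
For readers curious how pattern-based warning filtering can work in general, here is a minimal base R sketch. It is not the implementation of warn.handle(); the helper name `muffle_matching` is hypothetical and only illustrates the technique with `withCallingHandlers()`:

```r
# Hypothetical helper (not BiocompR's warn.handle()): evaluates `expr` and
# silences only the warnings whose message matches `pattern`.
muffle_matching <- function(pattern, expr) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w))) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

# The matching warning is silenced and the expression still returns its value.
result <- muffle_matching("alpha for a discrete variable", {
  warning("Using alpha for a discrete variable is not advised.")
  "plot printed"
})
```

Warnings that do not match the pattern are left untouched and reach the user as usual.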

⚠️ ggrepel: ## unlabeled data points (too many overlaps). Consider increasing max.overlaps.

Warning message:
ggrepel: ## unlabeled data points (too many overlaps). Consider increasing max.overlaps

This warning can arise when using the function ggbipca(). If the scale is too small and you want to display too many loading labels, the overlapping ones are not displayed and this warning is printed. In practice, this specific warning can persist and be printed randomly afterward when running other commands. It is unclear why this happens, but it can be fixed by executing the following command after running ggbipca():

assign("last.warning", NULL, envir = baseenv())

The current status of this issue can be tracked here.
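
The warning text itself suggests increasing max.overlaps. ggrepel reads its default for that argument from the global option `ggrepel.max.overlaps`, so one possible approach is to raise it before plotting. This is a sketch under the assumption that ggbipca() does not set max.overlaps itself, which has not been verified here:

```r
# Raise ggrepel's global default for max.overlaps. This only has an effect
# if the plotting code does not set max.overlaps itself (an assumption for
# ggbipca(), not verified here).
old_opts <- options(ggrepel.max.overlaps = Inf)

# ... build and print your biplot here ...

# Restore the previous setting afterwards.
options(old_opts)
```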

⚠️ In min(x) : no non-missing arguments to min; returning Inf / In max(x) : no non-missing arguments to max; returning -Inf

Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf

This warning can arise when using the function ggsunset() when there is only 1 label displayed on the right Y axis. This warning does not compromise the result and can be ignored.
The current status of this issue can be tracked here.
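
These messages come from base R: min() and max() emit them when called on a vector with no non-missing values. The sketch below only reproduces that behavior; how the plotting code reaches this state internally is not shown here.

```r
# Base R only: min() and max() on an empty vector return Inf and -Inf and
# emit the same pair of warnings (suppressed here to keep the output clean).
empty <- numeric(0)
range_of_empty <- suppressWarnings(c(min(empty), max(empty)))
range_of_empty  # Inf -Inf
```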

Technical questions / Development / Feature request

If you encounter issues, or if a feature you expect is not available in a BiocompR function, please check whether an existing issue addresses your point here. If not, create a new issue here.

References

⚠️ Work in progress !

  1. Share a legend between two ggplot2 graphs - Mara Averick
  2. Align two plots on a page - Mara Averick
  3. ggfortify: Data Visualization Tools for Statistical Analysis Results
  4. ggfortify: Plotting PCA (Principal Component Analysis)
  5. Loadings vs eigenvectors in PCA: when to use one or another?
  6. What is the proper association measure of a variable with a PCA component?

Licence

BiocompR is currently under the GPL-3.0 licence.