GSoC 2025 projects

Osvaldo A Martin edited this page Feb 15, 2025 · 12 revisions

ArviZ

ArviZ is a project dedicated to promoting and building tools for exploratory analysis of Bayesian models. It currently has a Python and a Julia interface. All projects listed below are for the Python interface.

ArviZ aims to seamlessly integrate with established probabilistic programming languages like PyStan, PyMC, Turing, Soss, emcee, and Pyro and to be easily integrated with novel or bespoke Bayesian analyses. Where the probabilistic programming languages aim to make it easy to build and solve Bayesian models, the ArviZ libraries aim to make it easy to process and analyze the results from those Bayesian models.

Timeline

The timeline of the GSoC internships is available on the GSoC website.

Projects

Below is a list of possible topics for your GSoC project. We are also open to other topics, but you must contact us on Gitter first: we won't accept proposals on topics outside this idea list from people who haven't contacted us beforehand. Keep in mind that these are only ideas and that some of them can't be solved entirely in a single GSoC project. When writing your proposal, choose some specific tasks and make sure the scope fits the GSoC time commitment. We expect all projects to be 350h projects. If you'd like to be considered for a 175h project, you must reach out to us on Gitter. We will not accept 175h applications from people who haven't discussed their time commitment with us before applying.

Students should be familiar with Python, numpy, and scipy. They should also be able to write unit tests for the added functionality using pytest and to follow the project's development conventions, using black, pylint, and pydocstyle for code style and linting.

Each project also lists the specific requirements needed to complete it. Note that some of these requirements can be learned while writing the proposal and during the community bonding period. You should feel confident applying to any project whose requirements interest you and that you would like to learn about; they are not skills you are expected to have mastered before writing your proposal. We aim for GSoC to provide a win-win scenario where you benefit from an inclusive and thriving environment in which to learn and the library benefits from your contributions.

Expected benefits of working on ArviZ

Students who work on ArviZ can expect their skills to grow in:

  • Bayesian Inference libraries
  • Bayesian modelling workflow and model criticism
  • Matplotlib, bokeh, plotly usage (depending on the project)
  • Xarray usage (depending on the project)
  • Numba or Dask optimization (depending on the project)

Feature Parity

We are in the process of refactoring ArviZ into three sub-packages: arviz-base, arviz-stats, and arviz-plots. You can see an example of this new structure in arviz-plots. This refactor introduces changes to both the API and the internal implementation, and most of the design decisions are already established. The main remaining task is bringing existing features from legacy ArviZ into the new structure.

Some key features still missing include:

  • Model comparison using PSIS-LOO-CV (both in arviz-stats and arviz-plots)
  • Model criticism tools, such as prior and posterior predictive checks (mostly arviz-plots, including plot_ppc and specialized plots for discrete data)
  • Sampling diagnostics, particularly visualization tools like rank plots, rank_ecdf plots, parallel coordinate plots, etc.

To ensure full feature parity with legacy ArviZ, we need to reintroduce these functionalities while also incorporating some new features, such as additional LOO utilities.

Since these tasks are extensive, we do not expect a single student to implement all of them. Instead, students should discuss with the project developers to select the features that best match their interests, skills, and goals.
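As a rough illustration of what sits behind one of the diagnostics listed above, the sketch below computes the quantity a rank plot displays: each draw is ranked against the draws pooled over all chains, and the ranks of each chain are then binned. The function names `pooled_ranks` and `rank_histogram` are hypothetical and are not part of the ArviZ API. The sketch uses only the standard library and breaks ties by order of appearance, which is a simplification.

```python
def pooled_ranks(chains):
    """Rank every draw against the draws pooled over all chains (1 = smallest).

    `chains` is a list of lists, one inner list of draws per chain.
    Ties are broken by order of appearance (a simplification).
    """
    pooled = [
        (value, chain_idx, draw_idx)
        for chain_idx, chain in enumerate(chains)
        for draw_idx, value in enumerate(chain)
    ]
    # Stable sort on the draw value only, so ties keep their original order.
    pooled.sort(key=lambda item: item[0])

    ranks = [[0] * len(chain) for chain in chains]
    for rank, (_, chain_idx, draw_idx) in enumerate(pooled, start=1):
        ranks[chain_idx][draw_idx] = rank
    return ranks


def rank_histogram(chain_ranks, n_total, n_bins):
    """Count how many of one chain's ranks fall in each of n_bins equal-width bins."""
    counts = [0] * n_bins
    for rank in chain_ranks:
        # Ranks run from 1 to n_total; map them onto bins 0 .. n_bins - 1.
        counts[(rank - 1) * n_bins // n_total] += 1
    return counts


# Toy example: two chains of three draws each.
chains = [[0.1, 0.4, 0.3], [0.2, 0.6, 0.5]]
ranks = pooled_ranks(chains)
n_total = sum(len(chain) for chain in chains)
histograms = [rank_histogram(chain_ranks, n_total, n_bins=3) for chain_ranks in ranks]
```

If the chains explore the same distribution, each chain's histogram should look roughly uniform. A chain whose ranks pile up in a few bins is a sign of poor mixing, and this is the pattern a rank plot makes visible.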

Expected output

The expected output is a collection of implemented and tested features that enhance feature parity with the legacy version, accompanied by relevant documentation.

Required skills

Participants focusing on plotting features should be familiar with plot faceting and the grammar of graphics, and be comfortable working with xarray. Basic knowledge of plotting libraries such as Matplotlib, Bokeh, and Plotly is also required.

Those primarily interested in adding statistical or diagnostic features should have a good understanding of Bayesian statistics and be comfortable with xarray. Depending on the specific features they plan to implement, they should also be familiar with at least one of the following topics: prior/posterior predictive checks, model comparison, or sampling diagnostics.

Info

  • Expected size: 350h
  • Difficulty rating: hard
  • Potential mentors: Osvaldo Martin, Andy Maloney

Prior elicitation

PreliZ currently supports elicitation on the parameter space, allowing users to specify probability distributions for model parameters based on their prior knowledge. Additionally, it includes a few experimental functions for predictive elicitation on the observed space, enabling users to directly elicit distributions over predicted outcomes.

The goal of this project is to enhance and broaden the predictive elicitation tools by refining their implementation, boosting flexibility, and improving usability. This includes enhancing interactive components, enabling resampling based on user inputs, integrating additional statistical steps, and improving interoperability with other elements in the PreliZ framework, like distributions and elicitation methods on the parameter space.
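To make "elicitation on the parameter space" concrete, here is a minimal sketch that turns a statement such as "I am 90% sure the parameter lies between -1 and 1" into a Normal prior. It is not PreliZ's API: the function `normal_from_interval` is hypothetical, it uses only the standard library, and it assumes the interval is central, so the leftover mass is split equally between the two tails.

```python
from statistics import NormalDist


def normal_from_interval(lower, upper, mass=0.9):
    """Return the Normal distribution whose central `mass` interval is [lower, upper].

    Hypothetical helper for illustration; assumes a symmetric (central) interval.
    """
    if not lower < upper:
        raise ValueError("lower must be smaller than upper")
    if not 0 < mass < 1:
        raise ValueError("mass must be strictly between 0 and 1")

    # Standard-normal quantile that leaves (1 - mass) / 2 in the upper tail.
    z = NormalDist().inv_cdf(0.5 + mass / 2)
    mu = (lower + upper) / 2
    sigma = (upper - lower) / (2 * z)
    return NormalDist(mu, sigma)


prior = normal_from_interval(-1.0, 1.0, mass=0.9)
# Mass the elicited prior places inside the stated interval.
mass_inside = prior.cdf(1.0) - prior.cdf(-1.0)
```

Predictive elicitation works in the other direction: the user describes plausible observed outcomes, and the priors on the parameters have to be inferred from those statements. This is harder than the closed-form calculation above.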

Required skills

People working on this project will need to be familiar with Bayesian statistics, prior elicitation, PreliZ, and possibly also ipywidgets.

Expected outcome

The expected outcome of this project will be features and accompanying documentation that demonstrate how they can be effectively integrated into a Bayesian workflow.

Info

  • Expected size: 350h
  • Difficulty rating: hard
  • Potential mentors: Osvaldo Martin, Rohan Babbar