This repository contains a module, tolerances.py, with two functions used in data science to calculate tolerance intervals. Like a confidence interval, a tolerance interval gives bounds, but the bounds are meant to contain a stated proportion of the population (the coverage) with a stated confidence.

The arguments to the functions are the dataset (a numpy array of values), the coverage (the percentage of the population that will lie within the tolerance interval), the confidence (the probability that the interval actually achieves that coverage), and the number of bootstrap iterations (higher values are more accurate but take longer to run).

The return value is the tolerance interval: two values for a two-sided interval, or one value for a one-sided interval (as the lower bound is zero by definition). An updated coverage value is also returned, because the algorithm needs to correct the requested coverage. The algorithm is based on bootstrapping (sampling with replacement) and is described here: https://www.math.kth.se/matstat/gru/sf2955/tolerans.pdf
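As a rough illustration of the bootstrapping idea, the sketch below resamples a dataset with replacement and takes percentiles of the resampled quantiles to form a two-sided interval. It is not the code in tolerances.py and does not apply the coverage correction mentioned above. The function name bootstrap_tolerance_sketch, its parameter names and defaults, the use of fractions rather than percentages, and the simple percentile rule are all assumptions made for this example.

```python
import numpy as np


def bootstrap_tolerance_sketch(data, coverage=0.90, confidence=0.95,
                               n_boot=2000, seed=0):
    """Hypothetical two-sided interval from a simple percentile bootstrap.

    coverage and confidence are fractions in (0, 1). This is an
    illustrative sketch only, not the algorithm used in tolerances.py.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    tail = (1.0 - coverage) / 2.0  # population fraction left in each tail

    lows = np.empty(n_boot)
    highs = np.empty(n_boot)
    for i in range(n_boot):
        # Bootstrapping: draw a sample of the same size, with replacement.
        resample = rng.choice(data, size=data.size, replace=True)
        lows[i], highs[i] = np.quantile(resample, [tail, 1.0 - tail])

    # Take outer percentiles of the bootstrap endpoints so the interval
    # is widened to reflect the requested confidence.
    alpha = 1.0 - confidence
    lower = np.quantile(lows, alpha / 2.0)
    upper = np.quantile(highs, 1.0 - alpha / 2.0)
    return lower, upper


if __name__ == "__main__":
    # Randomly generated data, for demonstration only.
    sample = np.random.default_rng(1).normal(loc=10.0, scale=2.0, size=500)
    print(bootstrap_tolerance_sketch(sample, coverage=0.90, confidence=0.95))
```

Raising n_boot makes the endpoints more stable from run to run, at the cost of a longer run time, which is the same trade-off described for the bootstrap iterations argument.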
An example program, Example.py, is included, along with two randomly generated datasets.
If you have any questions, please feel free to email me at jg854@cam.ac.uk.