Fully generalized normal distribution. Flexible in kurtosis and skewness at the same time. Generalized gaussian

quantsareus/s_dist__first_fully_generalized_normal_distribution__scipy_stats_extension_package

This is an implementation of the fully generalized S dist invented by H. Schlingmann and published in the PhD thesis 'Introducing a fully generalized normal distribution' in January 2024 on GitHub (github.com/quantsareus/introducing_a_fully_generalized_normal_distribution). The thesis proves that the new distribution is valid; it also explains many of the numerical methods applied in this stats lib.

In short, the new S dist is the first generalized normal dist that is flexible in both skewness and kurtosis (power) at the same time (= fully generalized). This flexibility makes it a very universal distribution that covers about 95% of the empirical dists (including the dists of metrics and parameters) that occur in daily data science practice.

Further, it is directly backward compatible with the normal distribution (without parameter conversion). The new S dist is closely related to the Gaussian; it can be described as a Gaussian (already flexible in kurtosis/power) that has additionally been enhanced for flexibility in skewness and normed to a total density of 1. For detailed information about the S dist, refer to the thesis (github.com/quantsareus/introducing_a_fully_generalized_normal_distribution/introducing_a_fully_generalized_normal_distribution_published.pdf).

In LaTeX notation, the S dist PDF formula is

f(x) = \begin{cases}
\dfrac{2}{\lambda_1 + \lambda_2} \, e^{-0.5 \left( \frac{|x-c|}{z} \right)^{a k}} & \text{for } x - c < 0 \\[1ex]
\dfrac{2}{\lambda_1 + \lambda_2} \, e^{-0.5 \left( \frac{|x-c|}{z} \right)^{k/a}} & \text{for } x - c > 0
\end{cases}

where

\lambda_1 = \frac{2 z \, \Gamma\!\left(\frac{1}{a k}\right)}{a k \cdot 0.5^{\frac{1}{a k}}}

\lambda_2 = \frac{2 z \, \Gamma\!\left(\frac{a}{k}\right)}{\frac{k}{a} \cdot 0.5^{\frac{a}{k}}}

and

x: independent input variable
k: power and kurtosis parameter ∈ [1, 3]
a: asymmetry parameter ∈ [0.5, 2]
z: deviation scale parameter
c: construction location parameter
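
As a rough illustration, the formula above can be transcribed directly into Python. This is a minimal sketch using only the standard library; the function name s_dist_pdf and its default arguments are hypothetical and are not the package's API. At x = c both branches give the same value, so the sketch assigns that point to the right-hand branch.

```python
import math


def s_dist_pdf(x, k=2.0, a=1.0, z=1.0, c=0.0):
    """Density of the S dist at x, transcribed from the formula above.

    Hypothetical helper for illustration only, not the package's API.
    """
    p_left = a * k    # exponent used for x - c < 0
    p_right = k / a   # exponent used for x - c > 0
    # lambda_1 and lambda_2 as defined above (note a/k == 1/p_right)
    lam1 = 2.0 * z * math.gamma(1.0 / p_left) / (p_left * 0.5 ** (1.0 / p_left))
    lam2 = 2.0 * z * math.gamma(1.0 / p_right) / (p_right * 0.5 ** (1.0 / p_right))
    p = p_left if x < c else p_right
    return 2.0 / (lam1 + lam2) * math.exp(-0.5 * (abs(x - c) / z) ** p)


# With k = 2, a = 1, z = 1, c = 0 the formula reduces to the standard
# normal density, so this prints 1 / sqrt(2 * pi).
print(s_dist_pdf(0.0))
```

With k = 2 and a = 1 both exponents equal 2 and both lambdas equal sqrt(2 * pi), which is the "no parameter conversion" compatibility with the normal distribution mentioned above.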


On the advantages side, the great flexibility of the new S dist in skewness and kurtosis opens a completely new horizon in data science. As a rule of thumb, the current central statistics framework of normal, Gaussian, chi-square, t and F dists by comparison covers less than 0.1% of the set of dists covered by the S dist. Specifically, the S dist can approximate the t dist from degree 5 upward and the F dists from degree (18, 18) upward in good quality. So it should be more or less possible to model all random errors and all parameters of any data with a sample size above 20 by just one S distribution. The tedious lookup of the individual test or distribution corresponding to the parameter of interest is no longer required. Thus, the new distribution allows for a much higher degree of automation in data science analysis.

Beyond inferential statistics, the new S dist also comes with a comparatively very precise random number generator that can create gradual scales of alternative skewness and alternative kurtosis, seamlessly starting from the normal distribution. The moment-precise random number data sets from the new generator can be used for systematically stress-testing other data science methods (e.g. OLS regression) or for simulating a mixed investment portfolio of symmetric, skewed (asymmetric) and heavier-tailed investment returns. Classical random number generators cannot create gradual scales of skewness and kurtosis, and their outputs are also highly volatile in kurtosis, skewness and standard deviation (in this order).
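
To make the idea of drawing S dist random numbers concrete, here is a plain sampler derived only from the PDF formula. It is explicitly not the package's moment-precise generator, whose method is described in the thesis; it is a simple sketch under stated assumptions. The function name s_dist_sample is hypothetical. The sketch picks the left or right side with probability proportional to lambda_1 or lambda_2, then uses the fact that u = 0.5 * (t / z) ** p follows a Gamma(1 / p, 1) distribution on each side.

```python
import math
import random


def s_dist_sample(n, k=2.0, a=1.0, z=1.0, c=0.0, seed=None):
    """Draw n plain (not moment-corrected) samples from the S dist.

    Hypothetical illustration, not the package's generator.
    """
    rng = random.Random(seed)
    p_left, p_right = a * k, k / a
    lam1 = 2.0 * z * math.gamma(1.0 / p_left) / (p_left * 0.5 ** (1.0 / p_left))
    lam2 = 2.0 * z * math.gamma(1.0 / p_right) / (p_right * 0.5 ** (1.0 / p_right))
    prob_left = lam1 / (lam1 + lam2)  # probability mass left of c
    samples = []
    for _ in range(n):
        left = rng.random() < prob_left
        p = p_left if left else p_right
        # u ~ Gamma(1/p, 1); invert u = 0.5 * (t / z) ** p for the distance t
        u = rng.gammavariate(1.0 / p, 1.0)
        t = z * (2.0 * u) ** (1.0 / p)
        samples.append(c - t if left else c + t)
    return samples


# Gradual scale of skewness: same k, increasing asymmetry parameter a.
for a in (1.0, 1.2, 1.4):
    data = s_dist_sample(10000, k=2.0, a=a, seed=1)
    print(a, sum(data) / len(data))
```

A sampler like this shows the usual sampling noise in its moments, which is the volatility the paragraph above attributes to classical generators.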

Despite its low version number, the package already provides the 16 most relevant standard methods of the scipy.stats.dist standard (19 in total). (Find a detailed list below.)

On the disadvantages side, some parts of the package had to be implemented numerically. The S dist is still quite new, and mathematicians have not yet worked out all the analytical solutions that are already known for the classical dists. Currently, the most limiting point is the missing analytical CDF. Thus, CDF-based computations, such as quantiles, confidence intervals, p-values and hypothesis tests, currently take longer than for other dists. Fortunately, a typical data science analysis usually requires only a small number of inferential procedures (probably fewer than 5). Most of the other low-level dist methods (including random number generation) have been implemented without using the CDF, so their performance is competitive.
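
The cost of a missing analytical CDF can be illustrated with a small sketch: the CDF is obtained by numerically integrating the PDF, and a quantile then needs many CDF evaluations. The code below is an assumption-laden illustration (trapezoid rule plus bisection, with hypothetical function names and a fixed integration span of 40 scale units), not the numerical method used by the package.

```python
import math


def s_dist_pdf(x, k=2.0, a=1.0, z=1.0, c=0.0):
    """PDF transcribed from the formula in this README (hypothetical helper)."""
    pl, pr = a * k, k / a
    lam1 = 2.0 * z * math.gamma(1.0 / pl) / (pl * 0.5 ** (1.0 / pl))
    lam2 = 2.0 * z * math.gamma(1.0 / pr) / (pr * 0.5 ** (1.0 / pr))
    p = pl if x < c else pr
    return 2.0 / (lam1 + lam2) * math.exp(-0.5 * (abs(x - c) / z) ** p)


def s_dist_cdf(x, k=2.0, a=1.0, z=1.0, c=0.0, steps=4000, span=40.0):
    """Trapezoid-rule CDF: integrate the PDF from c - span * z up to x."""
    lo = c - span * z  # assumed to lie far enough in the left tail
    if x <= lo:
        return 0.0
    h = (x - lo) / steps
    total = 0.5 * (s_dist_pdf(lo, k, a, z, c) + s_dist_pdf(x, k, a, z, c))
    for i in range(1, steps):
        total += s_dist_pdf(lo + i * h, k, a, z, c)
    return total * h


def s_dist_ppf(q, k=2.0, a=1.0, z=1.0, c=0.0, tol=1e-6, span=40.0):
    """Quantile by bisection on the numerical CDF (one CDF call per step)."""
    lo, hi = c - span * z, c + span * z
    while hi - lo > tol * z:
        mid = 0.5 * (lo + hi)
        if s_dist_cdf(mid, k, a, z, c) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Normal special case (k = 2, a = 1): the 97.5% quantile is close to 1.96.
print(s_dist_ppf(0.975))
```

Each quantile here costs roughly 27 bisection steps times 4000 PDF evaluations, which is why CDF-based methods are the slow part, while PDF-only methods stay cheap.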


The valid parameter space of the current implementation is

k: k ∈ [1, 3]
a: a ∈ [0.5, 2]
where k * a > 1 and k / a > 1
z: basically unlimited (within Python number and precision limits)
c: basically unlimited (within Python number and precision limits)
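
The constraints on k and a listed above can be checked before fitting or evaluating the distribution. The following is a minimal sketch; the function name s_dist_params_valid is hypothetical and the package may validate its inputs differently.

```python
def s_dist_params_valid(k, a):
    """Return True if (k, a) lies in the valid parameter space listed above.

    Hypothetical helper for illustration, not part of the package's API.
    """
    in_ranges = 1.0 <= k <= 3.0 and 0.5 <= a <= 2.0
    # Both exponents of the PDF, k * a and k / a, must exceed 1.
    return in_ranges and k * a > 1.0 and k / a > 1.0


print(s_dist_params_valid(2.0, 1.0))  # normal special case: True
print(s_dist_params_valid(1.5, 2.0))  # k / a = 0.75 is not > 1: False
```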


The current parameter space covers roughly 95% of the empirical dists (including the dists of metrics and parameters) that occur in daily data science practice, without using any distribution other than the S distribution. In future, the valid parameter space and the performance will be expanded further by additional analytical solutions for the distribution.




##############################################################################################################################################################################################
#
# Linux Installation
#


Download the source code to <path_to_s_dist>/s_dist/ . If you downloaded a zip archive, unzip it and rename the unzipped folder to s_dist.


[If you want to add to an existing anaconda installation:] 
cd ~/anaconda3  [or where your local anaconda installation is]

[If you want to add to the system's python installation:] 
cd /


Finally
bin/pip3 install <path_to_s_dist>/s_dist/

Done.


Take care which pip you actually call: the pip of the Python installation and the pip of the Anaconda installation install the S dist module to different locations and environments.


Afterwards check the installation by
bin/python3 -c "import s_dist"
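
If the import fails, it often helps to see which interpreter is running and whether it can find the module at all. The snippet below is a small sketch using only the standard library; the helper name find_module_location is hypothetical. Run it with the same bin/python3 as above.

```python
import importlib.util
import sys


def find_module_location(name="s_dist"):
    """Return the file the module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Shows which Python installation (system or Anaconda) is actually in use
# and where, if anywhere, that installation sees the s_dist module.
print("interpreter:", sys.executable)
print("s_dist found at:", find_module_location())
```

If the second line prints None, the package was most likely installed with the pip of a different Python installation.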



##############################################################################################################################################################################################
#
# OS-X Installation
#

First, install Python or the Anaconda distribution.

Then add the S dist to the Python installation or the Anaconda installation from the CLI just as in the Linux install, except that the command is 'pip', __not__ 'pip3'.

Take care which pip you actually call: the pip of the Python installation and the pip of the Anaconda installation install the S dist module to different locations and environments.



##############################################################################################################################################################################################
#
# Windows Installation
#

First, install Python or the Anaconda distribution.

Then add the S dist to the Python installation or the Anaconda installation from the CLI just as in the Linux install, except that the command is 'pip', __not__ 'pip3'. Windows provides a command line interface (CLI) through Windows PowerShell.

Take care which pip you actually call: the pip of the Python installation and the pip of the Anaconda installation install the S dist module to different locations and environments.
 


Windows installation option B) WSL Environment

Install WSL (Windows Subsystem for Linux).

Switch to the running WSL environment and install as described in the Linux section.