Unify the efforts for Regression/GLM #14

lindahua · 2014-07-20T04:33:09Z

Regression (e.g. linear regression, logistic regression, poisson regression, etc) is a very important in machine learning. Many problems can be formulated in the form of (regularized) regression.

Regression is closely related to generalized linear models. A major portion of regression problems can be considered as estimation of generalized linear models (GLM). In other words, estimating a GLM can be casted as a regression problem where the loss function can be considered as the negative log-likeliehood.

There have been a few Julia packages in this domain. Just to name a few, we already have:

and probably some others that I am missing.

Functionalities provided by these packages are substantially overlapped. Yet, they are not working with each other.

Unifying these efforts towards a great framework for regression/GLM would definitely make Julia a much more appealing option for machine learning. I am opening this thread to initiate the discussions.

Below is a proposal about how we may proceed:

Front-end and Back-end should be decoupled. To me, a regression framework consists of four basic aspects:
- Data: observed features/attributes and responses
- Model: an entity that can be used to make predictions given new observations. A model should contain the coefficients for producing linear predictors (when data are given) and information about how to link predictors to responses.
- Problem: given data, estimating model parameters can be casted as an optimization problem with a certain objective function (and optionally some constraints).
- Algorithm: the procedure to solve a given problem
The front-end modules should provide functions to help users turn their data and domain-specific knowledges into optimization problems; while the back-end should focus on solving the given problems. These two parts require different skills (the former is mainly concerned with user API design; while the latter is mainly about efficient optimization algorithms).
I propose the following way to reorganize packages:
- RegressionBase.jl: provide types to represent regression problems and models. This package should also provide other facilities to express a regression problems, e.g. loss functions, regularizers, etc. This package can also provide some classical/basic algorithms to solve a regression problem. (This may more or less adopt what RegERM is doing).
- GLMNet.jl (depend on RegressionBase.jl): wrap the external glmnet library to provide efficient solvers for certain regression problems. The part that depend on DataFrames should be separated out.
- Similarly, SGD.jl, LARS.jl, etc should also depend on RegressionBase.jl and provide different kind of solvers. Note that GLMNet, SGD, LARS should accept the same problem types, and have consistent interface. They just implement different algorithms.
- Regression.jl: a meta package that include RegressionBase.jl and a curated set of solver packages (e.g. GLMNet, SGD, etc)
- GLMBase.jl (depend on Distributions.jl and Regression.jl): provide types to represent generalized linear models, relevant machinery such as link functions, etc. This package can take advantage of Regression.jl for model estimation.
- GLM.jl (depend on GLMBase.jl and DataFrames.jl): provide high level UI for end users to perform data analysis. (The user interface can remain the same as it is).

Your suggestions and opinions are really appreciated.

The first question that we need to answer is whether we should introduce RegressionBase.jl (which would borrow part of the stuff from RegERM). If there's no objection, I can setup this package and then we can discuss interface designs from there.

We can then proceed with the adjustment of other packages.

cc: @dmbates @johnmyleswhite @simonster @BigCrunsh @Scidom @StefanKarpinski

The text was updated successfully, but these errors were encountered:

IainNZ · 2014-07-20T05:13:51Z

I like the design/choices of abstraction!

dmbates · 2014-07-20T11:42:04Z

I very much like this idea.

On Sat, Jul 19, 2014 at 11:33 PM, Dahua Lin notifications@github.com
wrote:

Regression (e.g. linear regression, logistic regression, poisson
regression, etc) is a very important in machine learning. Many problems can
be formulated in the form of (regularized) regression.

Regression is closely related to generalized linear models. A major
portion of regression problems can be considered as estimation of
generalized linear models (GLM). In other words, estimating a GLM can be
casted as a regression problem where the loss function can be considered as
the negative log-likeliehood.

There have been a few Julia packages in this domain. Just to name a few,
we already have:

GLM https://github.com/JuliaStats/GLM.jl

GLMNet https://github.com/simonster/GLMNet.jl

Regression https://github.com/lindahua/Regression.jl

NLReg https://github.com/dmbates/NLreg.jl

RegERM https://github.com/BigCrunsh/RegERMs.jl

SVM https://github.com/JuliaStats/SVM.jl

LIBSVM https://github.com/simonster/LIBSVM.jl

SGD https://github.com/johnmyleswhite/SGD.jl

Loss https://github.com/johnmyleswhite/Loss.jl

LARS https://github.com/simonster/LARS.jl

and probably some others that I am missing.

Functionalities provided by these packages are substantially overlapped.
Yet, they are not working with each other.

Unifying these efforts towards a great framework for regression/GLM would
definitely make Julia a much more appealing option for machine learning. I

am opening this thread to initiate the discussions.

Below is a proposal about how we may proceed:

Front-end and Back-end should be decoupled. To me, a regression
framework consists of four basic aspects:
- Data: observed features/attributes and responses
- Model: an entity that can be used to make predictions given new
observations. A model should contain the coefficients for producing linear
predictors (when data are given) and information about how to link
predictors to responses.
- Problem: given data, estimating model parameters can be casted
as an optimization problem with a certain objective function (and
optionally some constraints).
- Algorithm: the procedure to solve a given problem

The front-end modules should provide functions to help users turn
their data and domain-specific knowledges into optimization problems; while
the back-end should focus on solving the given problems. These two
parts require different skills (the former is mainly concerned with user
API design; while the latter is mainly about efficient optimization
algorithms).
-

I propose the following way to reorganize packages:
-
  RegressionBase.jl: provide types to represent regression problems
  and models. This package should also provide other facilities to express a
  regression problems, *e.g.* loss functions, regularizers, etc. This
  package can also provide some classical/basic algorithms to solve a
  regression problem. (This may more or less adopt what RegERM is
  doing).
  -

  GLMNet.jl (depend on RegressionBase.jl): wrap the external *glmnet*
  library to provide efficient solvers for certain regression problems. The
  part that depend on DataFrames should be separated out.
  -

  Similarly, SGD.jl, LARS.jl, etc should also depend on
  RegressionBase.jl and provide different kind of solvers. Note that
  GLMNet, SGD, LARS should accept the same problem types, and have
  consistent interface. They just implement different algorithms.
  -

  Regression.jl: a meta package that include RegressionBase.jl and a
  curated set of solver packages (*e.g.* GLMNet, SGD, etc)
  -

  GLMBase.jl (depend on Distributions.jl and Regression.jl): provide
  types to represent generalized linear models, relevant machinery such as
  link functions, etc. This package can take advantage of
  Regression.jl for model estimation.
  -

  GLM.jl (depend on GLMBase.jl and DataFrames.jl): provide high level
  UI for end users to perform data analysis. (The user interface can remain
  the same as it is).
Your suggestions and opinions are really appreciated.

The first question that we need to answer is whether we should introduce
RegressionBase.jl (which would borrow part of the stuff from RegERM). If
there's no objection, I can setup this package and then we can discuss
interface designs from there.

We can then proceed with the adjustment of other packages.

cc: @dmbates https://github.com/dmbates @johnmyleswhite
https://github.com/johnmyleswhite @simonster
https://github.com/simonster @BigCrunsh https://github.com/BigCrunsh
@Scidom https://github.com/scidom @StefanKarpinski
https://github.com/StefanKarpinski

—
Reply to this email directly or view it on GitHub
#14.

dmbates · 2014-07-20T11:45:03Z

Alright, now I see that it is a issue in the repository. I might suggest putting MixedModels within this framework too. Linear mixed models, generalized linear mixed models and nonlinear mixed models are all in the regression model family.

BigCrunsh · 2014-07-20T15:34:10Z

Great Initiative! I agree with this abstraction. But your argumentation holds in general for all regularized empirical risk minimization approaches. Is it necessary to restrict the base package RegressionBase.jl to regression and exclude classification methods like SVMs and logistic regression? So I would slightly extend / modify your suggestion:

instead of RegressionBase.jl I would propose to have a more general RegERMs.jl- like base package: provide types to represent regularised empirical risk problems and models. This package should also provide loss functions, regularizers, kernels, etc.
Classification.jl in addition to Regression.jl: Both could be umbrellas for empirical risk instances (SVMs, LogReg, MatrixFactorization, RidgeRegression,...) as well as other prediction approaches. Minor Concern: I am not sure yet how to handle methods which can be used for both like decision trees.

I am totally fine with using RegERMs.jl for that purpose and see how we can adapt it and the affected methods. This wiki might also be helpful for the interfaces.

cc: @gusl

lindahua · 2014-07-20T15:48:17Z

@BigCrunsh In my mind, the term Regression can be considered in a quite general sense, that is, to put it more intuitively, optimizing a sum of loss over given data plus a regularization term in some form.

If you don't mind, we can just use RegERMs as the basis and go from there. If you agree with this idea, what about moving the package RegERMs to StatsBase?

BigCrunsh · 2014-07-20T18:14:50Z

sounds good.

lindahua · 2014-07-20T18:24:36Z

@BigCrunsh: I have added you as one the owners of JuliaStats, so you have privilege to move packages here.

simonster · 2014-07-20T20:01:38Z

Generally, I support the idea of more standardized APIs and unification of our many regression packages. Some more specific comments below.

It would be great if our API supported fitting multiple dependent variables in some way, either explicitly or by offering a fit! method that reuses the factorization of the design matrix when applicable; see JuliaStats/StatsBase.jl#83.

L1 solvers are often used to fit many models spanning the entire regularization path because 1) fitting the entire regularization path is often not much more computationally expensive than fitting a single model (esp. for LARS, which has to fit the preceding part of the regularization path anyway) and 2) the regularization parameters are typically selected by cross validation, so knowledge of the entire regularization path is useful. We should thus have a standardized API for holding the regularization path and performing cross validation. Perhaps we should support the same for ridge, although the standard Cholesky algorithm doesn't benefit as much from fitting the entire regularization path and generalized cross validation is often used in place of ordinary cross validation.

As far as a high-level interface for fitting models goes, as of JuliaData/DataFrames.jl#571, you can fit any model that defines StatsBase.fit(Type{T<:RegressionModel}, X, y) as StatsBase.fit(Type, y ~ x1 + ... + xn, df) and it will wrap the resulting model so that coeftable prints the proper labels for the predictors and other methods that are applicable to RegressionModel are passed through to the underlying model object. There is still some work to be done here: it should be possible to call predict on a DataFrame, and I will investigate wrapping functions defined only for a specific model object. (Right now a DataFrameRegressionModel supports only the methods defined on RegressionModel in StatsBase.) I'm also not entirely sure what the low-level API should look like for MixedModels. But in general, I think this is a good way to split out the high-level API from the code that fits models and avoid making the low-level packages depend on DataFrames.

StefanKarpinski · 2014-07-20T20:07:56Z

I have nothing to add except that this is my favorite issue in a long time. (Besides the "Can" issue.)

gusl · 2014-07-21T03:57:25Z

I'm hoping to receive comments / edits on the wiki, and that this document
will evolve into the standard interface doc for Statistics / ML models.

On Sunday, July 20, 2014, Stefan Karpinski notifications@github.com wrote:

I have nothing to add except that this is my favorite issue in a long
time. (Besides the "Can" issue.)

—
Reply to this email directly or view it on GitHub
#14 (comment)
.

Gustavo Lacerda
http://www.optimizelife.com

papamarkou · 2014-07-21T13:34:34Z

Sorry for the late reply, I am on holidays for the next couple of weeks thus the delay. This is a great initiative as I favor the proposed abstraction and unification for regression models. More generally, I favor the unification of model specification across packages as discussed in PGM. I view the intended codebase for regression models as a first step towards this collaborative direction (if I am not mistaken regression can be expressed as a factor graph?)

lindahua · 2014-07-21T13:49:18Z

@gusl thanks for creating the wiki.

I am not completely sure that can be a common interface that work for all statistical models. Generally, generative Bayesian network, discriminative models, Markov random fields, time series, stochastic processes -- most of these can be called statistical models. I can't imagine one interface that can fit them all. For example, a Bayesian network may involve multiple variables, not just x and y, that are related to each other in a complicated way; while a time series model need to be updated over time.

I think it is more pragmatic to consider interface designs for individual family of models. Within this restricted context, many of your proposals do make a lot of sense.

This issue, in particular, focuses on a common family of problems -- regression analysis. Generally, regression analysis aims to find relations between dependent variables (also known as responses) and independent variables (e.g. features/attributes). A typical classification problem can be considered as a special case of regression problem that try to find relations between the features and the class labels.

From a mathematical standpoint, a regression problem can be formalized in two ways:

(Regularized) empirical risk minimization. This is an optimization problem that usually involves two parts, namely loss terms and regularization terms, as
```
minimize  \sum_i w_i f(x_i, y_i; \theta) + r(\theta)
```
Here, f is the loss function, r is the regularizer.
conditional distribution. This is to formulate the relations between x and y as a conditional distribution, as p(y | x). One can also impose a prior over the parameter \theta as p(\theta).
MAP estimation of the parameter \theta can be casted as a risk minimization problem as above, while the loss is f(x, y) = - log p(y | x), and the regularizer is r(\theta) = - \log p(\theta).

Generalized Linear Model is a special case of the regression analysis problem as outlined above, where the dependent variable y is connected to the independent variables x in a special form that involves a (possibly nonlinear) link function and a distribution over responses. This kind of formulations, while having a restricted form, are incredibly flexible. Many important regression problems, notably linear regression, logistic regression, and poisson regression, belong to this family.

A generalized linear model can be estimated in two ways: (1) cast to a regularized risk minimization problem; or (2) use algorithms dedicated to GLMs.

Conceptually, all these things can be divided into three levels:

Solver level: this level concerns about loss functions and regularization. The representation of loss functions and regularizers, as well as basic algorithms, can be mostly implemented in a base package (preferably RegERM). Advanced or specific algorithms may be implemented in other packages with a standardized interface, such as GLMnet, LARS, GLMNet, SVM, SGD, etc.
Model level: this level concerns about probabilistic formulation, e.g. evaluate conditional probabilities, likelihoods, and compute useful statistics. This level should go into GLMBase and MixedModels. Note that this level makes it possible to incorporate regression models into a bigger probabilistic framework, e.g. hierarchical mixture of experts, etc.
Semantics level: this level concerns about assigning semantic interpretation to the results. At this level, each variable may be associated with a semantic meaning (e.g. temperature, speed, duration, etc). The mathematical machinery at lower levels can be combined with a semantic context (e.g. DataFrames) to achieve this goal. Major tasks at this level: (1) turn given user inputs & data into a lower level form that math algorithms can operate on; (2) invoke a proper solver/algorithm; (3) deliver the results to user in a meaningful way.

A major principle in modern software engineering is separation of concerns. This principle also applies here. I can imagine that different groups of developers (of course these groups may overlap in reality) may focus on different levels:

Solver level: people interested in optimization or machine learning algorithms.
Model level: people interested in statistics or probabilistic modeling.
Semantic level: people interested in data mining. end users.

Particularly, people who implement solvers or machine learning algorithms should not be concerned about things like data frames etc. It is the responsibility of the higher level packages to convert data frames into a problem in standardized forms (that only involve numerical matrices and vectors).

I hope this further clarify my thoughts.

lindahua · 2014-07-21T14:01:33Z

to @Scidom: the model level of this formulation (as outlined above) can be seen as a factor in a probabilistic graphical model, and thus can be incorporated in a larger probabilistic framework.

lindahua · 2014-07-21T14:07:00Z

My experience with developing Distributions and Graphs is that interface may change a lot as opposed to what is planned originally. It would be useful to start building up a package and make changes as necessary as we move forward. We can update the wiki as the API matures.

As to how we may proceed, I think the next would be to start working on the regression codebase (starting from the solver level).

@BigCrunsh: would you please move RegERM over to JuliaStats when you are ready? We can work together on detailed design of the API over there.

IainNZ · 2014-07-21T15:46:33Z

@lindahua is right about getting started. Look to JuliaOpt for inspiration that it can work, although it was a smaller set of developers. We have a solver level (i.e. a package for each solver wrapper), a generic interface level to all solvers that defines a canonical form (MathProgBase.jl) and then currently one modeling interface (JuMP.jl, although CVX.jl will join this soon).

lindahua · 2014-07-21T16:42:03Z

I looked at the codes in RegERM, and believe it is going along the right direction. The package has already provided some basic infrastructure, such as Loss, Regularizer, and some types to represent regression problems.

We probably need to enrich that system with more discussions. However, I think it is already a good starting point.

StefanKarpinski · 2014-07-21T18:03:26Z

This breakdown into the solver, model and semantics levels is very good. It might be a bit premature, but I find that making the names of things line up with the concepts can be very helpful to get everyone on the same page conceptually. (This is why I'm so picky about naming.) Perhaps there should be packages named RegressionSolvers, RegressionModels and RegressionInterfaces? Perhaps not all of the code goes in there, but it seems like there will need to be common base types that can live there.

lindahua · 2014-07-21T19:14:16Z

@StefanKarpinski These names would be useful as abstract types. This whole thing involves close interaction between these types, hence it would make sense to put this type hierarchy in a foundational package, together with a clear document about how they work. Other packages can extend those or build on top of them.

Originally, I proposed to have a package named RegressionBase. However, after looking at @BigCrunsh's RegERM, I think that would be the right place to host these.

gusl · 2014-07-22T03:14:05Z

On Mon, Jul 21, 2014 at 6:49 AM, Dahua Lin notifications@github.com wrote:

@gusl https://github.com/gusl thanks for creating the wiki.

I am not completely sure that can be a common interface that work for all
statistical models. Generally, generative Bayesian network, discriminative
models, Markov random fields, time series, stochastic processes -- most of
these can be called statistical models. I can't imagine one interface
that can fit them all.

Thanks for bringing up separation of concerns.

For example, a Bayesian network may involve multiple variables, not just x
and y, that are related to each other in a complicated way;

The issue with graphical models is that 'fit' can mean many different
things.

StatisticalModel e.g. a specific graphical model such as an A -- B -- C
Ising model with a free parameter for each edge
InferenceGoal e.g. MLE, MAP estimate, posterior approximation by Monte
Carlo, etc
InferenceAlgorithm (Solver) e.g. Optimization with Interior-Point method,
Metropolis-Hastings, etc

(I'm introducing an extra level, between Model and Solver)

My idea is that 'fit' should still be used, with extra arguments that have
default values.

e.g. given an instance of the A--B--C Ising Model with a free parameter for
each edge, 'fit' would assume by default that you want a MAP estimate with
a diffuse prior, but you can also specify that you want to do a posterior
approximation, and then it assumes that you want Metropolis-Hastings and
will use a standard proposal distribution, but also allows you to pass your
own proposal.

while a time series model need to be updated over time.

I would say that fit_more! applies in this case.

I think it is more pragmatic to consider interface designs for individual

family of models. Within this restricted context, many of your proposals do
make a lot of sense.

This issue, in particular, focuses on a common family of problems -- regression
analysis http://en.wikipedia.org/wiki/Regression_analysis. Generally,
regression analysis aims to find relations between dependent variables
(also known as responses) and independent variables (e.g.
features/attributes). A typical classification problem can be considered as
a special case of regression problem that try to find relations between the
features and the class labels.

I agree.

From a mathematical standpoint, a regression problem can be formalized
in two ways:

(Regularized) empirical risk minimization. This is an optimization
problem that usually involves two parts, namely loss terms and
regularization terms, as

minimize \sum_i w_i f(x_i, y_i; \theta) + r(\theta)

Here, f is the loss function, r is the regularizer.

Please pardon my ignorance, but my understanding is that since "risk"
means expected loss, "empirical risk minimization" means minimizing risk on
unseen data (often estimated by using held-out data)... so it sounds
broader than the formulation above. (I guess I'm not convinced that a
penalty on \theta provides a universal solution to the problem)

conditional distribution. This is to formulate the relations
between x and y as a conditional distribution, as p(y | x). One can
also impose a prior over the parameter \theta as p(\theta). MAP
estimation of the parameter \theta can be casted as a risk
minimization problem as above, while the loss is f(x, y) = - log p(y |
x), and the regularizer is r(\theta) = - \log p(\theta).

This sounds like a special case of the above, namely the loss function is
the log-likelihood. If you're doing MLE, r(theta) is the zero function.

Generalized Linear Model
http://en.wikipedia.org/wiki/Generalized_linear_model is a special
case of the regression analysis problem as outlined above, where the
dependent variable y is connected to the independent variables x in a
special form that involves a (possibly nonlinear) link function and a
distribution over responses. This kind of formulations, while having a
restricted form, are incredibly flexible. Many important regression
problems, notably linear regression, logistic regression, and poisson
regression, belong to this family.

yes, GLMs are super important.

g is said to be the link function if E[Y_i] = g(X_i beta). If g :: Real ->
Real, then the GLM is called a "single-index model", because we are
summarizing X_i with a single Real number (X_i beta). If g :: Real^2 ->
Real, we have a "double-index model", which will be a richer model if g is
a truly 2D function.
Anyway, GLMs get really cool when we don't specify the functional form of
g, i.e. semi-parametric models.

A generalized linear model can be estimated in two ways: (1) cast to a
regularized risk minimization problem; or (2) use algorithms dedicated to
GLMs.

The "algorithms dedicated to GLMs" hopefully optimize the same objective
function as (1).

Conceptually, all these things can be divided into three levels:

Solver level: this level concerns about loss functions and
regularization. The representation of loss functions and regularizers, as
well as basic algorithms, can be mostly implemented in a base package
(preferably RegERM). Advanced or specific algorithms may be
implemented in other packages with a standardized interface, such as
GLMnet, LARS, GLMNet, SVM, SGD, etc.
2.

Model level: this level concerns about probabilistic formulation,
e.g. evaluate conditional probabilities, likelihoods, and compute
useful statistics. This level should go into GLMBase and MixedModels.
Note that this level makes it possible to incorporate regression models
into a bigger probabilistic framework, e.g. hierarchical mixture of
experts, etc.
3.

Semantics level: this level concerns about assigning semantic
interpretation to the results. At this level, each variable may be
associated with a semantic meaning (e.g. temperature, speed, duration,
etc). The mathematical machinery at lower levels can be combined with a
semantic context (e.g. DataFrames) to achieve this goal. Major tasks
at this level: (1) turn given user inputs & data into a lower level form
that math algorithms can operate on; (2) invoke a proper solver/algorithm;
(3) deliver the results to user in a meaningful way.

A major principle in modern software engineering is separation of
concerns. This principle also applies here. I can imagine that different
groups of developers (of course these groups may overlap in reality) may
focus on different levels:

Solver level: people interested in optimization or machine
learning algorithms.

Model level: people interested in statistics or probabilistic
modeling.

Semantic level: people interested in data mining. end users.

Particularly, people who implement solvers or machine learning algorithms
should not be concerned about things like data frames etc. It is the
responsibility of the higher level packages to convert data frames into a
problem in standardized forms (that only involve numerical matrices and
vectors).

I completely agree with the above.

BigCrunsh · 2014-07-22T06:19:13Z

@lindahua: I already moved RegERMs.jl to JuliaStats 😉

papamarkou · 2014-07-22T07:04:06Z

@gusl you touched various matters in your last message. As far as passing a user-defined proposal to the Metropolis-Hastings sampler is concerned, I have thought it last month and have a clear idea on how to do it. In fact I have pretty much completed coding it and once finished, I will merge this generalisation to the MCMC package; I will do this soon after I return from holidays.

papamarkou · 2014-07-22T07:06:07Z

P.S. in fact the structure of MCMC will undergo several already planned changes and refactoring though this goes beyond the scope of the present thread.

BigCrunsh · 2014-07-22T07:46:35Z

I like the idea of solver, model, and semantics level. I agree with @lindahua and @IainNZ, let's get started and perhaps with revising the interfaces in RegERMs.jl; I started a more detailed discussion over there: JuliaStats/RegERMs.jl#3.

Just one thing, which is probably to earlier, but sooner or later there is a large zoo of solvers and at some point it might be useful to have some benchmarking to derive default choices depending on the number of examples, dimensions, sparsity,...

lindahua · 2014-07-22T11:43:43Z

Thanks @BigCrunsh.

Let's keep the high level discussions (those that affect the reorganization of packages) here. Detailed API design of regression problems should go to JuliaStats/RegERMs.jl#3, as @BigCrunsh suggested.

lendle · 2014-07-22T16:48:56Z

A general ensemble package would be great to have under the RegressionBase.jl umbrella. A consistent interface makes developing a such a package very easy.

@svs14 has done a lot of work on the Orchestra.jl package which provides heterogeneous ensemble methods and has it's own API. I don't know the details but it might be a good starting place if the API can be made consistent with RegressionBase.jl.

lendle · 2014-07-22T17:34:39Z

I have OnlineLearning.jl which fits GLMs (linear regression, logistic, and quantile regression for now) (optionally with L1 and/or L2 regularization) with SGD. Standard SGD and some variants (ADAGRAD, ADADELTA, and averaged SGD) are implemented. I also started on linear SVMs but the implementation is not done.

I'll keep an eye on JuliaStats/RegERMs.jl#3 and can update the API when that's more fleshed out.

BigCrunsh · 2014-07-23T06:55:32Z

@lendle: Feel free to do that in that framework 😉

robertfeldt · 2014-07-30T22:54:29Z

I'd be happy to get a clean version of the newly proposed L0 EM algorithm into the proper format once the regularized regression design/interfaces has been set. For a spike on the L0 EM algorithm see:

https://github.com/robertfeldt/FeldtLib.jl/blob/master/spikes/L0EM_regularized_regression.jl

cc: @lindahua

Vgrunert · 2015-09-05T10:04:43Z

What happend to this project? Are there any new developments? The idea is really great.

datnamer · 2015-09-08T20:35:02Z

Check this out https://github.com/Evizero/SupervisedLearning.jl

IainNZ mentioned this issue Jul 20, 2014

Document relationship with other packages benhamner/MachineLearning.jl#4

Closed

BigCrunsh mentioned this issue Jul 22, 2014

Design of a general API for Regression problems JuliaStats/RegERMs.jl#3

Open

BigCrunsh mentioned this issue Oct 13, 2014

Use a more general framework for primal SVM JuliaStats/SVM.jl#8

Open

nalimilan mentioned this issue Jan 31, 2016

Three packages: ParametricModels.jl, DynamicDiscreteModels.jl, HiddenMarkovModels.jl JuliaLang/METADATA.jl#4491

Closed

trthatcher mentioned this issue Mar 12, 2016

Register DiscriminantAnalysis: v0.0.1 JuliaLang/METADATA.jl#4750

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unify the efforts for Regression/GLM #14

Unify the efforts for Regression/GLM #14

lindahua commented Jul 20, 2014

IainNZ commented Jul 20, 2014

dmbates commented Jul 20, 2014

am opening this thread to initiate the discussions.

dmbates commented Jul 20, 2014

BigCrunsh commented Jul 20, 2014

lindahua commented Jul 20, 2014

BigCrunsh commented Jul 20, 2014

lindahua commented Jul 20, 2014

simonster commented Jul 20, 2014

StefanKarpinski commented Jul 20, 2014

gusl commented Jul 21, 2014

papamarkou commented Jul 21, 2014

lindahua commented Jul 21, 2014

lindahua commented Jul 21, 2014

lindahua commented Jul 21, 2014

IainNZ commented Jul 21, 2014

lindahua commented Jul 21, 2014

StefanKarpinski commented Jul 21, 2014

lindahua commented Jul 21, 2014

gusl commented Jul 22, 2014

BigCrunsh commented Jul 22, 2014

papamarkou commented Jul 22, 2014

papamarkou commented Jul 22, 2014

BigCrunsh commented Jul 22, 2014

lindahua commented Jul 22, 2014

lendle commented Jul 22, 2014

lendle commented Jul 22, 2014

BigCrunsh commented Jul 23, 2014

robertfeldt commented Jul 30, 2014

Vgrunert commented Sep 5, 2015

datnamer commented Sep 8, 2015

Unify the efforts for Regression/GLM #14

Unify the efforts for Regression/GLM #14

Comments

lindahua commented Jul 20, 2014

IainNZ commented Jul 20, 2014

dmbates commented Jul 20, 2014

am opening this thread to initiate the discussions.

dmbates commented Jul 20, 2014

BigCrunsh commented Jul 20, 2014

lindahua commented Jul 20, 2014

BigCrunsh commented Jul 20, 2014

lindahua commented Jul 20, 2014

simonster commented Jul 20, 2014

StefanKarpinski commented Jul 20, 2014

gusl commented Jul 21, 2014

papamarkou commented Jul 21, 2014

lindahua commented Jul 21, 2014

lindahua commented Jul 21, 2014

lindahua commented Jul 21, 2014

IainNZ commented Jul 21, 2014

lindahua commented Jul 21, 2014

StefanKarpinski commented Jul 21, 2014

lindahua commented Jul 21, 2014

gusl commented Jul 22, 2014

BigCrunsh commented Jul 22, 2014

papamarkou commented Jul 22, 2014

papamarkou commented Jul 22, 2014

BigCrunsh commented Jul 22, 2014

lindahua commented Jul 22, 2014

lendle commented Jul 22, 2014

lendle commented Jul 22, 2014

BigCrunsh commented Jul 23, 2014

robertfeldt commented Jul 30, 2014

Vgrunert commented Sep 5, 2015

datnamer commented Sep 8, 2015