Fitting glms on matrices instead of dataframes? #54

Closed
lendle opened this issue Feb 19, 2014 · 5 comments · Fixed by #57

Comments

@lendle
Contributor

lendle commented Feb 19, 2014

I'm looking for something like R's glm.fit function, which is called by glm after the design matrix and response vector are constructed and can be called directly. This would be useful if you already have a design matrix and response, and you don't need extra features that are provided by DataFrames. This would also avoid the overhead of having a DataFrame in memory in addition to your design matrix.

It looks like this would not be too hard to provide. In glmfit.jl, fit takes a GlmMod. Apart from the ModelFrame, a GlmMod object can be constructed from a response vector and design matrix instead of a Formula and DataFrame, and it looks like the ModelFrame is only used for column names in the coef table. (I'm not exactly sure what the assign field in a ModelMatrix is for. Is it important here?)
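
To make the request concrete, here is a minimal sketch of what fitting directly from a design matrix and response vector involves. It assumes a binomial model with a logit link and uses plain iteratively reweighted least squares; `irls_logit` is a hypothetical helper written for illustration, not GLM.jl's implementation or API.

```julia
using LinearAlgebra

# Hypothetical helper, not part of GLM.jl: fit a logistic regression by
# iteratively reweighted least squares (IRLS) from a design matrix X
# and a 0/1 response vector y. No DataFrame or Formula is involved.
function irls_logit(X::AbstractMatrix{<:Real}, y::AbstractVector{<:Real};
                    maxiter::Integer = 25, tol::Real = 1e-10)
    size(X, 1) == length(y) ||
        throw(DimensionMismatch("X and y must have the same number of rows"))
    beta = zeros(size(X, 2))
    for _ in 1:maxiter
        eta = X * beta                                # linear predictor
        mu = 1 ./ (1 .+ exp.(-eta))                   # inverse logit link
        w = mu .* (1 .- mu)                           # working weights
        z = eta .+ (y .- mu) ./ w                     # working response
        beta_new = (X' * (w .* X)) \ (X' * (w .* z))  # weighted least squares step
        converged = norm(beta_new - beta) <= tol * (1 + norm(beta))
        beta = beta_new
        converged && break
    end
    return beta
end

# Usage: an intercept column plus one predictor (made-up data).
X = [ones(6) [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]]
y = [0.0, 0.0, 1.0, 0.0, 1.0, 1.0]
beta = irls_logit(X, y)
```

The only inputs are `X` and `y`, which is the point of the request: nothing in the fitting loop itself needs column names or formula terms.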

@dmbates
Contributor

dmbates commented Feb 20, 2014

Good idea. I patterned the division into glmfit and glm on the corresponding R code (but the implementation is considerably different from that in R, thankfully).

It is a matter of determining how much information needs to be retained, and at which types in a type hierarchy. You mentioned the assign field, which is important if you have used Formula/Data arguments and want to evaluate analysis of variance (lm) or analysis of deviance (glm) tables. (The assign field maps columns in the model matrix to terms in the formula.)
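
As an illustration of that mapping, consider a formula like y ~ x + group where group has three levels and so contributes two indicator columns. The column names and the numbering convention (0 for the intercept) are assumptions made for this example, and `columns_by_term` is a hypothetical helper.

```julia
# Illustrative column names and assign vector for y ~ x + group.
# The numbering convention is an assumption for this example.
colnames = ["(Intercept)", "x", "group: B", "group: C"]
assign = [0, 1, 2, 2]   # 0 = intercept, 1 = x, 2 = group

# Hypothetical helper: collect the model-matrix column indices
# that belong to each term.
function columns_by_term(assign::AbstractVector{<:Integer})
    groups = Dict{Int, Vector{Int}}()
    for (col, term) in enumerate(assign)
        push!(get!(groups, term, Int[]), col)
    end
    return groups
end

groups = columns_by_term(assign)
# groups[2] == [3, 4]: both indicator columns belong to the group term
```

An analysis of deviance table tests whole terms, so the two group columns have to be added or dropped together. That grouping is the information the assign field carries.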

Coming from an R background, I prefer the Formula/Data specification, but for others a pre-built model matrix is just fine.

I'll propose a series of types that could allow for more general approaches. I hope that @lindahua and @andreasnoackjensen will be able to comment on such types.

@simonster
Member

Perhaps if a model is fit with a design matrix, we can just create a ModelMatrix with the assign field populated as 1:size(X, 2)?
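
A minimal sketch of that default, assuming assign is stored as a plain integer vector; `default_assign` is a hypothetical name, not an existing function:

```julia
# Hypothetical helper: with no formula, treat every column of the
# design matrix as its own single-column term.
default_assign(X::AbstractMatrix) = collect(1:size(X, 2))

X = [1.0 0.5 2.0; 1.0 1.5 0.0]
default_assign(X)   # returns [1, 2, 3]
```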

@simonster
Member

cc @johnmyleswhite, who probably saw this anyway but might have some thoughts since he's currently working on the Formula code. Also cross-referencing JuliaStats/Roadmap.jl#11.

simonster added a commit to simonster/GLM.jl that referenced this issue Feb 24, 2014
This adds a method for fitting a GLM by explicitly specifying the
design matrix and response vectors. The resulting GlmMod object has
empty ModelFrame and formula fields, and I've changed the few
functions that reference these fields to first check if they are
defined.

Eventually it is probably a good idea to follow @lindahua's suggestion
from JuliaStats/Roadmap.jl#11 and split out functionality that depends
on DataFrames into a separate package, but most of these changes will
be necessary for that as well.

I have also added a method for fitting a GLM on a new response vector
using the same design matrix.

Closes JuliaStats#54

@simonster
Member

I've implemented fitting GLMs on matrices in a somewhat naive way in simonster/GLM.jl@e517ede, although that code is built on top of #56, so I'll wait for the outcome on that before I create a PR for it.

@lendle
Contributor Author

lendle commented Apr 4, 2014

Thank you!
