Fitting glms on matrices instead of dataframes? #54
Comments
Good idea. I patterned the division into It is a matter of determining how much information needs to be retained at what types in a type hierarchy. You mentioned the assign field, which is important if you have used Formula/Data arguments and want to evaluate analysis of variance (lm) or analysis of deviance (glm) tables. (The assign field maps columns in the model matrix to terms in the formula.) Coming from an I'll propose a series of types that could allow for more general approaches. I hope that @lindahua and @andreasnoackjensen will be able to comment on such types.
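To make the column-to-term mapping concrete, here is a small sketch with made-up names. It assumes a formula `y ~ x + g` in which `g` is a three-level factor expanded into two indicator columns. The column names, the term list, and the 1-based term numbering are illustrative assumptions, not the actual ModelMatrix layout.

```julia
# Hypothetical model matrix columns for y ~ x + g, where g has levels a, b, c
# and level a is the reference level (so g contributes two columns).
colnames = ["(Intercept)", "x", "g: b", "g: c"]
terms    = ["(Intercept)", "x", "g"]

# assign[j] is the index (into `terms`) of the term that produced column j.
# The 1-based numbering is an assumption made for this sketch.
assign = [1, 2, 3, 3]

# Group column indices by term. An analysis of variance or analysis of
# deviance table needs this grouping to test all columns of a term together.
cols_by_term = Dict(t => findall(==(i), assign) for (i, t) in enumerate(terms))

for t in terms
    println(t, " -> columns ", cols_by_term[t], " (", colnames[cols_by_term[t]], ")")
end
```

When a model is fit from a bare design matrix there is no formula, so no such mapping exists unless the caller supplies one.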
Perhaps if a model is fit with a design matrix, we can just create a ModelMatrix with the
cc @johnmyleswhite, who probably saw this anyway but might have some thoughts since he's currently working on the Formula code. Also cross-referencing JuliaStats/Roadmap.jl#11.
This adds a method for fitting a GLM by explicitly specifying the design matrix and response vectors. The resulting GlmMod object has empty ModelFrame and formula fields, and I've changed the few functions that reference these fields to first check if they are defined. Eventually it is probably a good idea to follow @lindahua's suggestion from JuliaStats/Roadmap.jl#11 and split out functionality that depends on DataFrames into a separate package, but most of these changes will be necessary for that as well. I have also added a method for fitting a GLM on a new response vector using the same design matrix. Closes JuliaStats#54
I've implemented fitting GLMs on matrices in a somewhat naive way in simonster/GLM.jl@e517ede, although that code is built on top of #56, so I'll wait for the outcome on that before I create a PR for it.
Thank you!
I'm looking for something like R's `glm.fit` function, which is called by `glm` after the design matrix and response vector are constructed and can be called directly. This would be useful if you already have a design matrix and response, and you don't need the extra features that are provided by DataFrames. This would also avoid the overhead of having a DataFrame in memory in addition to your design matrix.

It looks like it would not be too hard to provide this. In glmfit.jl, `fit` takes a GlmMod. Other than the ModelFrame, the GlmMod object can be constructed from a response vector and design matrix instead of a Formula and DataFrame, and it looks like the ModelFrame is only used for column names in the coef table. (I'm not exactly sure what the assign field is for in a ModelMatrix. Is it important here?)
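
For illustration, here is a minimal sketch of what a `glm.fit`-style entry point does with nothing but a design matrix and a response vector. It fits a Poisson regression with a log link by iteratively reweighted least squares. The function name `irls_poisson`, its keyword arguments, and the toy data are assumptions made for this example. This is not the GLM.jl implementation and it skips weights, offsets, and step-halving.

```julia
using LinearAlgebra

# Hypothetical helper (not part of GLM.jl): Poisson regression with a log link,
# fit by iteratively reweighted least squares (IRLS) from a design matrix X
# and a response vector y. No DataFrame or Formula is involved.
function irls_poisson(X::AbstractMatrix, y::AbstractVector; maxiter=25, tol=1e-8)
    mu = y .+ 0.1                 # starting means, kept away from zero
    eta = log.(mu)                # linear predictor under the log link
    beta = zeros(size(X, 2))
    for _ in 1:maxiter
        z = eta .+ (y .- mu) ./ mu          # working response
        W = Diagonal(mu)                    # working weights for Poisson/log
        beta_new = (X' * W * X) \ (X' * W * z)   # weighted least squares step
        delta = norm(beta_new - beta)
        beta = beta_new
        eta = X * beta
        mu = exp.(eta)
        delta <= tol && break               # stop once coefficients settle
    end
    return beta
end

# Toy data: an intercept column plus one covariate, and a vector of counts.
X = [ones(5) collect(0.0:4.0)]
y = [1.0, 2.0, 4.0, 7.0, 12.0]

beta = irls_poisson(X, y)
println("coefficients: ", beta)
```

Because the function only sees `X` and `y`, fitting a new response vector against the same design matrix is just another call with a different `y`. Coefficient names and term-level tables are what would be lost without a ModelFrame.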