Merge pull request #59 from kkondo1981/develop
Version 0.4.0
kkondo1981 authored Jun 9, 2021
2 parents ecab04e + 55e0be2 commit 98a28c1
Showing 56 changed files with 1,908 additions and 1,006 deletions.
5 changes: 5 additions & 0 deletions .Rbuildignore
@@ -3,3 +3,8 @@
^aglm\.Rcheck$
^aglm.*\.tar\.gz$
^aglm.*\.tgz$
^LICENSE\.md$
^cran-comments\.md
^\.github$
^examples/*
^CRAN-RELEASE$
2 changes: 2 additions & 0 deletions CRAN-RELEASE
@@ -0,0 +1,2 @@
This package was submitted to CRAN on 2021-06-09.
Once it is accepted, delete this file and tag the release (commit 97c24a7).
30 changes: 19 additions & 11 deletions DESCRIPTION
@@ -1,24 +1,32 @@
Package: aglm
Type: Package
Title: Accurate Generalized Linear Model (AGLM)
Version: 0.3.2
Author: Kenji Kondo, Kazuhisa Takahashi, others
Maintainer: Kenji Kondo <kkondo.odnokk@gmail.com>
Description: A handy tool for actuarial modeling, which is designed to achieve both accuracy and accountability.
AGLM is based on GLM but customized by expert actuaries for areas which require not only accuracy but also accountability.
Title: Accurate Generalized Linear Model
Version: 0.4.0
Authors@R: c(
person("Kenji", "Kondo", role=c("aut", "cre", "cph"), email="kkondo.odnokk@gmail.com"),
person("Kazuhisa", "Takahashi", role=c("ctb")),
person("Hikari", "Banno", role=c("ctb"))
)
Description: Provides functions to fit Accurate Generalized Linear Model (AGLM) models, visualize them, and predict for new data. AGLM is defined as a regularized GLM which applies a sort of feature transformations using a discretization of numerical features and specific coding methodologies of dummy variables. For more information on AGLM, see Suguru Fujita, Toyoto Tanaka, Kenji Kondo and Hirokazu Iwasawa (2020) <https://www.institutdesactuaires.com/global/gene/link.php?doc_id=16273&fg=1>.
URL: https://github.com/kkondo1981/aglm
BugReports: https://github.com/kkondo1981/aglm/issues
License: GPL-2
Encoding: UTF-8
LazyData: true
Language: en-US
RoxygenNote: 7.1.1
Roxygen: list(markdown = TRUE)
Depends:
R (>= 4.0.2),
R (>= 4.0.0),
Imports:
glmnet (>= 4.0.2),
assertthat
assertthat,
methods,
mathjaxr
Suggests:
testthat,
knitr,
rmarkdown,
MASS
VignetteBuilder: knitr
MASS,
faraway
RdMacros:
mathjaxr
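
The new Description field mentions a discretization of numerical features and dummy-variable coding, but this part of the diff shows neither. As a rough illustration only, the base-R sketch below builds equal-width breaks and an ordinal ("thermometer") dummy matrix. The helper names `equal_width_breaks` and `ordinal_dummies` are hypothetical and are not taken from the package; aglm's exported functions such as `createEqualWidthBins` and `getODummyMatForOneVec` may differ in their details.

```r
# Hypothetical helpers for illustration only; NOT aglm's implementation.

# Equal-width break points covering the observed range of x.
equal_width_breaks <- function(x, nbin) {
  stopifnot(is.numeric(x), nbin >= 2)
  seq(min(x), max(x), length.out = nbin + 1)
}

# Ordinal ("thermometer") dummies: column k is 1 when x is strictly greater
# than the k-th interior break, so each row is a run of 1s followed by 0s.
ordinal_dummies <- function(x, breaks) {
  interior <- breaks[-c(1, length(breaks))]  # drop the two end points
  d <- outer(x, interior, FUN = ">") * 1L    # logical matrix -> 0/1 integers
  colnames(d) <- paste0("gt_", format(interior))
  d
}

x  <- c(0, 2.5, 5, 7.5, 10)
br <- equal_width_breaks(x, nbin = 4)
od <- ordinal_dummies(x, br)
print(od)
```

A regularized linear model fitted on such columns yields a step function of the original variable, which is one way to read the "accurate but accountable" aim stated in the previous Description text.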
21 changes: 20 additions & 1 deletion NAMESPACE
@@ -12,7 +12,6 @@ export(createEqualWidthBins)
export(cv.aglm)
export(cva.aglm)
export(executeBinning)
export(getDesignMatrix)
export(getLVarMatForOneVec)
export(getODummyMatForOneVec)
export(getUDummyMatForOneVec)
@@ -22,3 +21,23 @@ importFrom(assertthat,assert_that)
importFrom(glmnet,cv.glmnet)
importFrom(glmnet,glmnet)
importFrom(glmnet,predict.glmnet)
importFrom(grDevices,devAskNewPage)
importFrom(graphics,barplot)
importFrom(graphics,boxplot)
importFrom(graphics,lines)
importFrom(graphics,mtext)
importFrom(graphics,par)
importFrom(graphics,points)
importFrom(graphics,rug)
importFrom(methods,new)
importFrom(stats,IQR)
importFrom(stats,coef)
importFrom(stats,deviance)
importFrom(stats,getCall)
importFrom(stats,ksmooth)
importFrom(stats,predict)
importFrom(stats,quantile)
importFrom(stats,residuals)
importFrom(stats,smooth.spline)
importFrom(utils,flush.console)
importFrom(utils,str)
5 changes: 5 additions & 0 deletions NEWS.md
@@ -0,0 +1,5 @@
# aglm 0.4.0
- Updated all documents and examples.

# aglm 0.3.2
- Fixed to use `R` 4.0 and `glmnet` 4.0.
44 changes: 28 additions & 16 deletions R/accurate-glm.R
@@ -1,13 +1,21 @@
# S4 class for fitted AGLM
# written by Kenji Kondo @ 2019/1/2


#' S4 class for fitted AGLM
#' Class for results of `aglm()` and `cv.aglm()`
#'
#' @slot backend_models The fitted backend `glmnet` model is stored.
#' @slot vars_info A list, each of whose element is information of one variable.
#' @slot lambda Same as in the result of \link{cv.glmnet}.
#' @slot cvm Same as in the result of \link{cv.glmnet}.
#' @slot cvsd Same as in the result of \link{cv.glmnet}.
#' @slot cvup Same as in the result of \link{cv.glmnet}.
#' @slot cvlo Same as in the result of \link{cv.glmnet}.
#' @slot nzero Same as in the result of \link{cv.glmnet}.
#' @slot name Same as in the result of \link{cv.glmnet}.
#' @slot lambda.min Same as in the result of \link{cv.glmnet}.
#' @slot lambda.1se Same as in the result of \link{cv.glmnet}.
#' @slot fit.preval Same as in the result of \link{cv.glmnet}.
#' @slot foldid Same as in the result of \link{cv.glmnet}.
#' @slot call An object of class `call`, corresponding to the function call when this `AccurateGLM` object is created.
#'
#' @slot backend_models Internally used model objects to be passed to backend functions.
#' Currently glmnet is used as a backend and this slot holding a glmnet object.
#' @slot vars_info A list of list. Each element of `vars_info` represents one predictor variable and contains various informations of it.
#' @slot others slots for holding cross-validation results
#' @author Kenji Kondo
#'
#' @export
setClass("AccurateGLM",
@@ -26,14 +34,18 @@ setClass("AccurateGLM",
foldid="integer",
call="ANY"))

#' S4 class for the result of cva.aglm function

#' Class for results of `cva.aglm()`
#'
#' @slot models_list A list consists of `cv.glmnet()`'s results for all \eqn{\alpha} values.
#' @slot alpha Same as in \link{cv.aglm}.
#' @slot nfolds Same as in \link{cv.aglm}.
#' @slot alpha.min.index The index of `alpha.min` in the vector `alpha`.
#' @slot alpha.min The \eqn{\alpha} value achieving the minimum loss among all the values of `alpha`.
#' @slot lambda.min The \eqn{\lambda} value achieving the minimum loss when \eqn{\alpha} is equal to `alpha.min`.
#' @slot call An object of class `call`, corresponding to the function call when this `CVA_AccurateGLM` object is created.
#'
#' @slot models_list Results of cv.glmnet() for all the values of alpha.
#' @slot alpha A numeric values specifying alpha values to be examined.
#' @slot nfolds An integer value specifying the number of folds.
#' @slot alpha.min The alpha value which achieves the minimum loss.
#' @slot alpha.min.index An integer value specifying the index of `alpha.min` in `alpha`.
#' @slot lambda.min The lambda value which achieves the minimum loss, when combined with `alpha.min`.
#' @author Kenji Kondo
#'
#' @export
setClass("CVA_AccurateGLM",
30 changes: 11 additions & 19 deletions R/aglm-input.R
@@ -1,17 +1,14 @@
# handling inputs of AGLM model
# written by Kenji Kondo @ 2019/1/2


#' S4 class for input
#'
#' @slot vars_info A list of list. Each element has some information of one feature.
#' @slot data A data.frame which contains original data itself.
#' @slot vars_info A list, each of whose element is information of one variable.
#' @slot data The original data.
setClass("AGLM_Input",
representation=representation(vars_info="list", data="data.frame"))


#' Create a new AGLM_Input object
# An inner-use function for creating a new AGLM_Input object
#' @importFrom assertthat assert_that
#' @importFrom methods new
newInput <- function(x,
qualitative_vars_UD_only=NULL,
qualitative_vars_both=NULL,
@@ -86,11 +83,11 @@ newInput <- function(x,
cl <- class(idxs_or_names)
idxs <- seq(length(var_names))
if (cl == "integer") {
is_hit <- function(idx) {return(idx %in% idxs_or_names)}
idxs <- idxs[sapply(idxs, is_hit)]
is_hit_i <- function(idx) {return(idx %in% idxs_or_names)}
idxs <- idxs[sapply(idxs, is_hit_i)]
} else if (cl == "character") {
is_hit <- function(var_name) {return(var_name %in% idxs_or_names)}
idxs <- idxs[sapply(var_names, is_hit)]
is_hit_c <- function(var_name) {return(var_name %in% idxs_or_names)}
idxs <- idxs[sapply(var_names, is_hit_c)]
} else {
assert_that(FALSE, msg="qualitative_vars_UD_only, qualitative_vars_both, qualitative_vars_both, quantitative_vars should be integer or character vectors.")
}
@@ -309,6 +306,8 @@ getMatrixRepresentationByVector <- function(raw_vec, var_info, drop_OD=FALSE) {
return(z)
}


#' @importFrom assertthat assert_that
getMatrixRepresentation <- function(x, idx, drop_OD=FALSE) {
var_info <- x@vars_info[[idx]]
z <- NULL
@@ -347,20 +346,13 @@ getMatrixRepresentation <- function(x, idx, drop_OD=FALSE) {
}
colnames(z) <- nm
} else {
assert_true(FALSE) # never expects to come here
assert_that(FALSE) # never expects to come here
}

return(z)
}


#' Get design-matrix representation of AGLM_Input objects
#'
#' @param x An AGLM_Input object
#'
#' @return A data.frame which represents the matrix representation of `x`.
#'
#' @export
#' @importFrom assertthat assert_that
getDesignMatrix <- function(x) {
# Check arguments
81 changes: 81 additions & 0 deletions R/aglm-package.R
@@ -0,0 +1,81 @@
#' aglm: Accurate Generalized Linear Model
#'
#' Provides functions to fit Accurate Generalized Linear Model (AGLM) models,
#' visualize them, and predict for new data. AGLM is defined as a regularized GLM
#' which applies a sort of feature transformations using a discretization of numerical
#' features and specific coding methodologies of dummy variables.
#' For more information on AGLM, see
#' \href{https://www.institutdesactuaires.com/global/gene/link.php?doc_id=16273&fg=1}{Suguru Fujita, Toyoto Tanaka, Kenji Kondo and Hirokazu Iwasawa (2020)}.
#'
#' The collection of functions provided by the `aglm` package has almost the same structure as the famous `glmnet` package,
#' so users familiar with the `glmnet` package will be able to handle it easily.
#' In fact, this structure is reasonable in implementation, because what the `aglm` package does is
#' applying appropriate transformations to the given data and passing it to the `glmnet` package as a backend.
#'
#' @section Fitting functions:
#' The `aglm` package provides three different fitting functions, depending on how users want to handle hyper-parameters of AGLM models.
#'
#' Because AGLM is based on regularized GLM, the regularization term of the loss function can be expressed as follows:
#' \loadmathjax
#' \mjsdeqn{
#' R(\lbrace \beta_{jk} \rbrace; \lambda, \alpha)
#' = \lambda \left\lbrace
#' (1 - \alpha)\sum_{j=1}^{p} \sum_{k=1}^{m_j}|\beta_{jk}|^2 + \alpha \sum_{j=1}^{p} \sum_{k=1}^{m_j} |\beta_{jk}|
#' \right\rbrace,
#' }
#' where \eqn{\beta_{jk}} is the k-th coefficient of auxiliary variables for the j-th column in data,
#' \eqn{\alpha} is a weight which controls how L1 and L2 regularization terms are mixed,
#' and \eqn{\lambda} determines the strength of the regularization.
#'
#' Searching hyper-parameters \eqn{\alpha} and \eqn{\lambda} is often useful to get better results, but usually time-consuming.
#' That's why the `aglm` package provides three fitting functions with different strategies for specifying hyper-parameters as follows:
#' * \link{aglm}: A basic fitting function with given \eqn{\alpha} and \eqn{\lambda} (s).
#' * \link{cv.aglm}: A fitting function with given \eqn{\alpha} and cross-validation for \eqn{\lambda}.
#' * \link{cva.aglm}: A fitting function with cross-validation for both \eqn{\alpha} and \eqn{\lambda}.
#'
#' Generally speaking, setting an appropriate \eqn{\lambda} is often important to get meaningful results,
#' and using `cv.aglm()` with default \eqn{\alpha=1} (LASSO) is usually enough.
#' Since `cva.aglm()` is much more time-consuming than `cv.aglm()`, it is better to use it only if particularly better results are needed.
#'
#' The following S4 classes are defined to store results of the fitting functions.
#' * \link{AccurateGLM-class}: A class for results of `aglm()` and `cv.aglm()`
#' * \link{CVA_AccurateGLM-class}: A class for results of `cva.aglm()`
#'
#' @section Using the fitted model:
#' Users can use models obtained from fitting functions in various ways, by passing them to following functions:
#' * \link[=predict.AccurateGLM]{predict}: Make predictions for new data
#' * \link[=plot.AccurateGLM]{plot}: Plot contribution of each variable and residuals
#' * \link[=print.AccurateGLM]{print}: Display textual information of the model
#' * \link[=coef.AccurateGLM]{coef}: Get coefficients
#' * \link[=deviance.AccurateGLM]{deviance}: Get deviance
#' * \link[=residuals.AccurateGLM]{residuals}: Get residuals of various types
#'
#' We emphasize that `plot()` is particularly useful to understand the fitted model,
#' because it presents a visual representation of how variables in the original data are used by the model.
#'
#' @section Other functions:
#' The following functions are basically for internal use, but exported as utility functions for convenience.
#'
#' * Functions for creating feature vectors
#' * \link{getUDummyMatForOneVec}
#' * \link{getODummyMatForOneVec}
#' * \link{getLVarMatForOneVec}
#' * Functions for binning
#' * \link{createEqualWidthBins}
#' * \link{createEqualFreqBins}
#' * \link{executeBinning}
#'
#'
#' @author
#' * Kenji Kondo,
#' * Kazuhisa Takahashi and Hikari Banno (worked on L-Variable related features)
#'
#'
#' @references Suguru Fujita, Toyoto Tanaka, Kenji Kondo and Hirokazu Iwasawa. (2020)
#' \emph{AGLM: A Hybrid Modeling Method of GLM and Data Science Techniques}, \cr
#' \url{https://www.institutdesactuaires.com/global/gene/link.php?doc_id=16273&fg=1} \cr
#' \emph{Actuarial Colloquium Paris 2020}
#'
#' @docType package
#' @name aglm-package
NULL
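
The regularization term documented in `R/aglm-package.R` can be checked numerically with a few lines of base R. The sketch below evaluates the formula as it is written in that file; the function name `aglm_penalty` and the list-of-vectors layout for the coefficients are assumptions made for illustration and are not part of the package. Note that glmnet's own documentation writes the L2 term with a factor of 1/2, so values of the formula as stated here will not match glmnet's penalty exactly when `alpha < 1`.

```r
# Hypothetical helper (not part of aglm): evaluates
#   lambda * ((1 - alpha) * sum |beta_jk|^2 + alpha * sum |beta_jk|)
# exactly as written in the package documentation.
# `beta` is a list holding one numeric coefficient vector per data column j.
aglm_penalty <- function(beta, lambda, alpha) {
  stopifnot(is.list(beta), lambda >= 0, alpha >= 0, alpha <= 1)
  b  <- unlist(beta)   # flatten over columns j and auxiliary variables k
  l2 <- sum(b^2)       # double sum of squared coefficients
  l1 <- sum(abs(b))    # double sum of absolute coefficients
  lambda * ((1 - alpha) * l2 + alpha * l1)
}

beta <- list(c(0.5, -1), c(2))   # two columns with 2 and 1 auxiliary variables

p_lasso <- aglm_penalty(beta, lambda = 0.1, alpha = 1)    # L1 term only
p_ridge <- aglm_penalty(beta, lambda = 0.1, alpha = 0)    # L2 term only
p_mixed <- aglm_penalty(beta, lambda = 0.1, alpha = 0.5)  # equal mix
print(c(lasso = p_lasso, ridge = p_ridge, mixed = p_mixed))
```

With `alpha = 1` only the absolute-value term remains, which is the LASSO default that the documentation recommends pairing with `cv.aglm()`.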