Merge pull request #59 from kkondo1981/develop

Version 0.4.0
kkondo1981 · Jun 9, 2021 · 98a28c1
2 parents ecab04e + 55e0be2

Showing 56 changed files with 1,908 additions and 1,006 deletions.
diff --git a/.Rbuildignore b/.Rbuildignore
@@ -3,3 +3,8 @@
 ^aglm\.Rcheck$
 ^aglm.*\.tar\.gz$
 ^aglm.*\.tgz$
+^LICENSE\.md$
+^cran-comments\.md
+^\.github$
+^examples/*
+^CRAN-RELEASE$
diff --git a/CRAN-RELEASE b/CRAN-RELEASE
@@ -0,0 +1,2 @@
+This package was submitted to CRAN on 2021-06-09.
+Once it is accepted, delete this file and tag the release (commit 97c24a7).
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,24 +1,32 @@
 Package: aglm
 Type: Package
-Title: Accurate Generalized Linear Model (AGLM)
-Version: 0.3.2
-Author: Kenji Kondo, Kazuhisa Takahashi, others
-Maintainer: Kenji Kondo <kkondo.odnokk@gmail.com>
-Description: A handy tool for actuarial modeling, which is designed to achieve both accuracy and accountability.
-    AGLM is based on GLM but customized by expert actuaries for areas which require not only accuracy but also accountability.
+Title: Accurate Generalized Linear Model
+Version: 0.4.0
+Authors@R: c(
+    person("Kenji", "Kondo", role=c("aut", "cre", "cph"), email="kkondo.odnokk@gmail.com"),
+    person("Kazuhisa", "Takahashi", role=c("ctb")),
+    person("Hikari", "Banno", role=c("ctb"))
+    )
+Description: Provides functions to fit Accurate Generalized Linear Model (AGLM) models, visualize them, and predict for new data. AGLM is defined as a regularized GLM that applies feature transformations based on a discretization of numerical features and specific coding methods for dummy variables. For more information on AGLM, see Suguru Fujita, Toyoto Tanaka, Kenji Kondo and Hirokazu Iwasawa (2020) <https://www.institutdesactuaires.com/global/gene/link.php?doc_id=16273&fg=1>.
+URL: https://github.com/kkondo1981/aglm
+BugReports: https://github.com/kkondo1981/aglm/issues
 License: GPL-2
 Encoding: UTF-8
-LazyData: true
+Language: en-US
 RoxygenNote: 7.1.1
 Roxygen: list(markdown = TRUE)
 Depends:
-    R (>= 4.0.2),
+    R (>= 4.0.0),
 Imports:
     glmnet (>= 4.0.2),
-    assertthat
+    assertthat,
+    methods,
+    mathjaxr
 Suggests: 
     testthat,
     knitr,
     rmarkdown,
-    MASS
-VignetteBuilder: knitr
+    MASS,
+    faraway
+RdMacros:
+    mathjaxr
diff --git a/NAMESPACE b/NAMESPACE
@@ -12,7 +12,6 @@ export(createEqualWidthBins)
 export(cv.aglm)
 export(cva.aglm)
 export(executeBinning)
-export(getDesignMatrix)
 export(getLVarMatForOneVec)
 export(getODummyMatForOneVec)
 export(getUDummyMatForOneVec)
@@ -22,3 +21,23 @@ importFrom(assertthat,assert_that)
 importFrom(glmnet,cv.glmnet)
 importFrom(glmnet,glmnet)
 importFrom(glmnet,predict.glmnet)
+importFrom(grDevices,devAskNewPage)
+importFrom(graphics,barplot)
+importFrom(graphics,boxplot)
+importFrom(graphics,lines)
+importFrom(graphics,mtext)
+importFrom(graphics,par)
+importFrom(graphics,points)
+importFrom(graphics,rug)
+importFrom(methods,new)
+importFrom(stats,IQR)
+importFrom(stats,coef)
+importFrom(stats,deviance)
+importFrom(stats,getCall)
+importFrom(stats,ksmooth)
+importFrom(stats,predict)
+importFrom(stats,quantile)
+importFrom(stats,residuals)
+importFrom(stats,smooth.spline)
+importFrom(utils,flush.console)
+importFrom(utils,str)
diff --git a/NEWS.md b/NEWS.md
@@ -0,0 +1,5 @@
+# aglm 0.4.0
+- Updated all documents and examples.
+
+# aglm 0.3.2
+- Fixed to use `R` 4.0 and `glmnet` 4.0.
diff --git a/R/accurate-glm.R b/R/accurate-glm.R
@@ -1,13 +1,21 @@
-# S4 class for fitted AGLM
-# written by Kenji Kondo @ 2019/1/2
-
-
-#' S4 class for fitted AGLM
+#' Class for results of `aglm()` and `cv.aglm()`
+#'
+#' @slot backend_models The fitted backend `glmnet` model.
+#' @slot vars_info A list, each element of which holds information about one variable.
+#' @slot lambda Same as in the result of \link{cv.glmnet}.
+#' @slot cvm Same as in the result of \link{cv.glmnet}.
+#' @slot cvsd Same as in the result of \link{cv.glmnet}.
+#' @slot cvup Same as in the result of \link{cv.glmnet}.
+#' @slot cvlo Same as in the result of \link{cv.glmnet}.
+#' @slot nzero Same as in the result of \link{cv.glmnet}.
+#' @slot name Same as in the result of \link{cv.glmnet}.
+#' @slot lambda.min Same as in the result of \link{cv.glmnet}.
+#' @slot lambda.1se Same as in the result of \link{cv.glmnet}.
+#' @slot fit.preval Same as in the result of \link{cv.glmnet}.
+#' @slot foldid Same as in the result of \link{cv.glmnet}.
+#' @slot call An object of class `call`, corresponding to the function call when this `AccurateGLM` object is created.
 #'
-#' @slot backend_models Internally used model objects to be passed to backend functions.
-#'   Currently glmnet is used as a backend and this slot holding a glmnet object.
-#' @slot vars_info A list of list. Each element of `vars_info` represents one predictor variable and contains various informations of it.
-#' @slot others slots for holding cross-validation results
+#' @author Kenji Kondo
 #'
 #' @export
 setClass("AccurateGLM",
@@ -26,14 +34,18 @@ setClass("AccurateGLM",
                                        foldid="integer",
                                        call="ANY"))
 
-#' S4 class for the result of cva.aglm function
+
+#' Class for results of `cva.aglm()`
+#'
+#' @slot models_list A list consisting of `cv.glmnet()`'s results for all \eqn{\alpha} values.
+#' @slot alpha Same as in \link{cv.aglm}.
+#' @slot nfolds Same as in \link{cv.aglm}.
+#' @slot alpha.min.index The index of `alpha.min` in the vector `alpha`.
+#' @slot alpha.min The \eqn{\alpha} value achieving the minimum loss among all the values of `alpha`.
+#' @slot lambda.min The \eqn{\lambda} value achieving the minimum loss when \eqn{\alpha} is equal to `alpha.min`.
+#' @slot call An object of class `call`, corresponding to the function call when this `CVA_AccurateGLM` object is created.
 #'
-#' @slot models_list Results of cv.glmnet() for all the values of alpha.
-#' @slot alpha A numeric values specifying alpha values to be examined.
-#' @slot nfolds An integer value specifying the number of folds.
-#' @slot alpha.min The alpha value which achieves the minimum loss.
-#' @slot alpha.min.index An integer value specifying the index of `alpha.min` in `alpha`.
-#' @slot lambda.min The lambda value which achieves the minimum loss, when combined with `alpha.min`.
+#' @author Kenji Kondo
 #'
 #' @export
 setClass("CVA_AccurateGLM",

diff --git a/R/aglm-input.R b/R/aglm-input.R
@@ -1,17 +1,14 @@
-# handling inputs of AGLM model
-# written by Kenji Kondo @ 2019/1/2
-
-
 #' S4 class for input
 #'
-#' @slot vars_info A list of list. Each element has some information of one feature.
-#' @slot data A data.frame which contains original data itself.
+#' @slot vars_info A list, each element of which holds information about one variable.
+#' @slot data The original data.
 setClass("AGLM_Input",
          representation=representation(vars_info="list", data="data.frame"))
 
 
-#' Create a new AGLM_Input object
+# An internal function for creating a new AGLM_Input object
 #' @importFrom assertthat assert_that
+#' @importFrom methods new
 newInput <- function(x,
                      qualitative_vars_UD_only=NULL,
                      qualitative_vars_both=NULL,
@@ -86,11 +83,11 @@ newInput <- function(x,
     cl <- class(idxs_or_names)
     idxs <- seq(length(var_names))
     if (cl == "integer") {
-      is_hit <- function(idx) {return(idx %in% idxs_or_names)}
-      idxs <- idxs[sapply(idxs, is_hit)]
+      is_hit_i <- function(idx) {return(idx %in% idxs_or_names)}
+      idxs <- idxs[sapply(idxs, is_hit_i)]
     } else if (cl == "character") {
-      is_hit <- function(var_name) {return(var_name %in% idxs_or_names)}
-      idxs <- idxs[sapply(var_names, is_hit)]
+      is_hit_c <- function(var_name) {return(var_name %in% idxs_or_names)}
+      idxs <- idxs[sapply(var_names, is_hit_c)]
     } else {
       assert_that(FALSE, msg="qualitative_vars_UD_only, qualitative_vars_both, qualitative_vars_both, quantitative_vars should be integer or character vectors.")
     }
@@ -309,6 +306,8 @@ getMatrixRepresentationByVector <- function(raw_vec, var_info, drop_OD=FALSE) {
   return(z)
 }
 
+
+#' @importFrom assertthat assert_that
 getMatrixRepresentation <- function(x, idx, drop_OD=FALSE) {
   var_info <- x@vars_info[[idx]]
   z <- NULL
@@ -347,20 +346,13 @@ getMatrixRepresentation <- function(x, idx, drop_OD=FALSE) {
     }
     colnames(z) <- nm
   } else {
-    assert_true(FALSE)  # never expects to come here
+    assert_that(FALSE)  # never expects to come here
   }
 
   return(z)
 }
 
 
-#' Get design-matrix representation of AGLM_Input objects
-#'
-#' @param x An AGLM_Input object
-#'
-#' @return A data.frame which represents the matrix representation of `x`.
-#'
-#' @export
 #' @importFrom assertthat assert_that
 getDesignMatrix <- function(x) {
   # Check arguments

diff --git a/R/aglm-package.R b/R/aglm-package.R
@@ -0,0 +1,81 @@
+#' aglm: Accurate Generalized Linear Model
+#'
+#' Provides functions to fit Accurate Generalized Linear Model (AGLM) models,
+#' visualize them, and predict for new data. AGLM is defined as a regularized GLM
+#' that applies feature transformations based on a discretization of numerical
+#' features and specific coding methods for dummy variables.
+#' For more information on AGLM, see
+#' \href{https://www.institutdesactuaires.com/global/gene/link.php?doc_id=16273&fg=1}{Suguru Fujita, Toyoto Tanaka, Kenji Kondo and Hirokazu Iwasawa (2020)}.
+#'
+#' The collection of functions provided by the `aglm` package has almost the same structure as the widely used `glmnet` package,
+#' so users familiar with `glmnet` will find it easy to use.
+#' This structure also simplifies the implementation, because the `aglm` package essentially
+#' applies appropriate transformations to the given data and passes the result to `glmnet` as a backend.
+#'
+#' @section Fitting functions:
+#' The `aglm` package provides three different fitting functions, depending on how users want to handle hyper-parameters of AGLM models.
+#'
+#' Because AGLM is based on regularized GLM, the regularization term of the loss function can be expressed as follows:
+#' \loadmathjax
+#' \mjsdeqn{
+#'   R(\lbrace \beta_{jk} \rbrace; \lambda, \alpha)
+#'    = \lambda \left\lbrace
+#'      (1 - \alpha)\sum_{j=1}^{p} \sum_{k=1}^{m_j}|\beta_{jk}|^2 + \alpha \sum_{j=1}^{p} \sum_{k=1}^{m_j} |\beta_{jk}|
+#'    \right\rbrace,
+#' }
+#' where \eqn{\beta_{jk}} is the coefficient of the k-th auxiliary variable for the j-th column of the data,
+#' \eqn{\alpha} is a weight that controls how the L1 and L2 regularization terms are mixed,
+#' and \eqn{\lambda} determines the strength of the regularization.
+#'
+#' Searching for good values of the hyper-parameters \eqn{\alpha} and \eqn{\lambda} often yields better results, but is usually time-consuming.
+#' For this reason, the `aglm` package provides three fitting functions with different strategies for specifying hyper-parameters:
+#'   * \link{aglm}: A basic fitting function with a given \eqn{\alpha} and given \eqn{\lambda} value(s).
+#'   * \link{cv.aglm}: A fitting function with given \eqn{\alpha} and cross-validation for \eqn{\lambda}.
+#'   * \link{cva.aglm}: A fitting function with cross-validation for both \eqn{\alpha} and \eqn{\lambda}.
+#'
+#' Generally speaking, choosing an appropriate \eqn{\lambda} is important for getting meaningful results,
+#' and using `cv.aglm()` with the default \eqn{\alpha=1} (LASSO) is usually sufficient.
+#' Since `cva.aglm()` is much more time-consuming than `cv.aglm()`, it is better to use it only when noticeably better results are needed.
+#'
+#' The following S4 classes are defined to store results of the fitting functions.
+#'   * \link{AccurateGLM-class}: A class for results of `aglm()` and `cv.aglm()`
+#'   * \link{CVA_AccurateGLM-class}: A class for results of `cva.aglm()`
+#'
+#' @section Using the fitted model:
+#' Users can use models obtained from fitting functions in various ways, by passing them to the following functions:
+#'   * \link[=predict.AccurateGLM]{predict}: Make predictions for new data
+#'   * \link[=plot.AccurateGLM]{plot}: Plot contribution of each variable and residuals
+#'   * \link[=print.AccurateGLM]{print}: Display textual information of the model
+#'   * \link[=coef.AccurateGLM]{coef}: Get coefficients
+#'   * \link[=deviance.AccurateGLM]{deviance}: Get deviance
+#'   * \link[=residuals.AccurateGLM]{residuals}: Get residuals of various types
+#'
+#' We emphasize that `plot()` is particularly useful to understand the fitted model,
+#' because it presents a visual representation of how variables in the original data are used by the model.
+#'
+#' @section Other functions:
+#' The following functions are basically for internal use, but exported as utility functions for convenience.
+#'
+#' * Functions for creating feature vectors
+#'   * \link{getUDummyMatForOneVec}
+#'   * \link{getODummyMatForOneVec}
+#'   * \link{getLVarMatForOneVec}
+#' * Functions for binning
+#'   * \link{createEqualWidthBins}
+#'   * \link{createEqualFreqBins}
+#'   * \link{executeBinning}
+#'
+#'
+#' @author
+#'   * Kenji Kondo,
+#'   * Kazuhisa Takahashi and Hikari Banno (worked on L-Variable related features)
+#'
+#'
+#' @references Suguru Fujita, Toyoto Tanaka, Kenji Kondo and Hirokazu Iwasawa. (2020)
+#' \emph{AGLM: A Hybrid Modeling Method of GLM and Data Science Techniques}, \cr
+#' \url{https://www.institutdesactuaires.com/global/gene/link.php?doc_id=16273&fg=1} \cr
+#' \emph{Actuarial Colloquium Paris 2020}
+#'
+#' @docType package
+#' @name aglm-package
+NULL
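The regularization term documented in `aglm-package.R` can be made concrete with a small base-R sketch. `aglm_penalty()` is a hypothetical helper, not part of the `aglm` or `glmnet` API; it only evaluates the documented formula, with the coefficients beta_jk grouped per original column j in a list. Note that glmnet's own documentation writes the L2 part with a factor of (1 - alpha)/2, so the value computed here is not guaranteed to equal glmnet's internal objective.

```r
# Hypothetical helper (not part of aglm or glmnet): evaluates the documented
# penalty R({beta_jk}; lambda, alpha)
#   = lambda * ((1 - alpha) * sum |beta_jk|^2 + alpha * sum |beta_jk|)
# beta_list[[j]] holds the coefficients of the auxiliary variables for column j.
aglm_penalty <- function(beta_list, lambda, alpha) {
  stopifnot(is.list(beta_list), lambda >= 0, alpha >= 0, alpha <= 1)
  # Both double sums run over every (j, k), so flattening is sufficient
  beta <- unlist(beta_list, use.names = FALSE)
  l2 <- sum(beta^2)      # sum_j sum_k |beta_jk|^2
  l1 <- sum(abs(beta))   # sum_j sum_k |beta_jk|
  lambda * ((1 - alpha) * l2 + alpha * l1)
}

# Two original columns: j = 1 has two auxiliary coefficients, j = 2 has one.
beta_list <- list(c(1, -2), c(0.5))
aglm_penalty(beta_list, lambda = 0.1, alpha = 0.5)  # mixed: 0.4375
aglm_penalty(beta_list, lambda = 0.1, alpha = 1)    # LASSO only: 0.35
aglm_penalty(beta_list, lambda = 0.1, alpha = 0)    # ridge only: 0.525
```

Setting `alpha = 1` keeps only the L1 term, which matches the LASSO default recommended for `cv.aglm()` in the documentation above.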