glmulti {glmulti}    R Documentation

Automated model selection and multimodel inference with (G)LMs

Description

glmulti finds the n best models (the confidence set of models) among all possible models (the candidate set, as specified by the user). Models are fitted with the specified fitting function (default is glm) and ranked with the specified Information Criterion (default is aicc). The best models are found either through an exhaustive screening of the candidates or with a genetic algorithm, which allows very large candidate sets to be addressed. The output can be used for model selection, variable selection, and multimodel inference.
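
As a rough illustration of what an exhaustive screening does, the following base-R sketch enumerates every main-effects model for a small set of predictors, ranks the fits by a small-sample corrected AIC, and keeps the n best. It is not glmulti's implementation: the helper aicc_sketch, the mtcars data set and the chosen variables are assumptions made for the example only.

```r
# Base-R sketch of an exhaustive screening over main effects only.
# Illustrative only; glmulti itself delegates the screening to compiled code.

# Small-sample corrected AIC for a fitted model (hypothetical helper).
aicc_sketch <- function(fit) {
  k <- attr(logLik(fit), "df")   # number of estimated parameters
  n <- nobs(fit)                 # number of observations
  AIC(fit) + 2 * k * (k + 1) / (n - k - 1)
}

response <- "mpg"
predictors <- c("wt", "hp", "qsec")

# Candidate set: every subset of the predictors, including the empty one.
subsets <- list(character(0))
for (m in seq_along(predictors)) {
  subsets <- c(subsets, combn(predictors, m, simplify = FALSE))
}

# Build one formula per candidate, fit it with glm, and compute its IC.
formulas <- vapply(subsets, function(s) {
  rhs <- if (length(s) == 0) "1" else paste(s, collapse = " + ")
  paste(response, "~", rhs)
}, character(1))
ic <- vapply(formulas, function(f) {
  aicc_sketch(glm(as.formula(f), data = mtcars))
}, numeric(1))

# Confidence set: the n best models, ranked by increasing IC.
n_best <- 3
confidence_set <- data.frame(model = formulas, aicc = ic, row.names = NULL)
confidence_set <- confidence_set[order(confidence_set$aicc), ][seq_len(n_best), ]
print(confidence_set)
```

The number of candidates doubles with every additional term, which is why a heuristic search such as the genetic algorithm becomes attractive for large candidate sets.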

Usage

# glmulti S4 generic
glmulti(y, xr, data, exclude = c(), name = "glmulti.analysis",
        intercept = TRUE, marginality = FALSE, chunk = 1, chunks = 1,
        level = 2, minsize = 0, maxsize = -1, minK = 0, maxK = -1,
        method = "h", crit = "aicc", confsetsize = 100, popsize = 100,
        mutrate = 10^-3, sexrate = 0.1, imm = 0.3, plotty = TRUE,
        report = TRUE, deltaM = 0.05, deltaB = 0.05, conseq = 5,
        fitfunction = "glm", resumefile = "id", ...)

Arguments

y A formula, character string, or fitted model (of class lm or glm) specifying the response variable and the terms (main effects and/or interactions) to be used in the candidate models (e.g. height~age*sex+mass). Alternatively, a character string naming the response variable (e.g. "height"), in which case the names of the predictors must be passed through the xr argument.
xr An optional character vector naming the variables (categorical or quantitative) to be used as predictors, e.g. c("age", "height", "mass")
exclude Optional character vector naming terms (main effects or interactions) to be excluded from the candidate models, e.g. c("mass:height")
intercept Whether to include an intercept in the candidate models or not.
level If 1, only main effects (terms of order 1) are used to build the candidate set. If 2, pairwise interactions are also used (higher-order interactions are currently ignored).
data A data.frame containing the data. If not specified, glmulti will try to find the data in the environment of the formula, from the fitted model passed as y argument, or from the global environment.
name The name of this glmulti analysis. Optional.
marginality Whether to apply the marginality rule or not. If TRUE, only marginal models will be considered.
minsize This sets a constraint on candidate models. Minimal number of TERMS (main effects or interactions) to be included in candidate models (negative = no constraint)
maxsize This sets a constraint on candidate models. Maximal number of TERMS to be included in candidate models (negative = no constraint)
minK This sets a constraint on candidate models. Minimal complexity of candidate models (negative = no constraint)
maxK This sets a constraint on candidate models. Maximal complexity of candidate models (negative = no constraint)
method The method to be used to explore the candidate set of models. If "h", an exhaustive screening is undertaken. If "g", the genetic algorithm is employed (recommended for large candidate sets). If "d", a simple summary of the candidate set is printed, including the number of candidate models. If "r", an interrupted analysis is resumed (see resumefile).
crit The Information Criterion to be used. This should be a function that accepts a fitted model as first argument. Default is the small-sample corrected Akaike IC (aicc). Other provided functions are the Bayes IC (bic), the original AIC (aic) and QAIC/QAICc (qaic and qaicc).
fitfunction The fitting function to be used. Any function similar to glm can be used.
confsetsize The number of models to be looked for, i.e. the size of the returned confidence set.
plotty Whether to plot the progress of the IC profile when running.
report Whether to report about the progress at run time.
chunk When using an exhaustive screening approach, the task can be split into several parts to take advantage of multiple CPUs. chunk is an integer specifying which part the current call should perform.
chunks When splitting an exhaustive screening, the total number of parts the task should be divided into. For example, with a quad-core processor, 4 may be useful. Use consensus to merge the pieces into a single object.
popsize The population size for the genetic algorithm
mutrate The per locus (i.e. per term) mutation rate for genetic algorithm, between 0 and 1
sexrate The rate of sexual reproduction for the genetic algorithm, between 0 and 1
imm The rate of immigration for the genetic algorithm, between 0 and 1
deltaM The target change in mean IC (defines the stop rules for the genetic algorithm)
deltaB The target change in best IC (defines the stop rules for the genetic algorithm)
conseq The target number of successive times with no improvement (i.e. the target changes have been attained) (defines the stop rule for the genetic algorithm). The greater it is, the more stringent the stop rule.
resumefile When resuming an analysis (method="r"), the name of the files from which to resume. Default uses the same as name.
... Further arguments to be passed to the fitting function. E.g. maxit=50 or family=binomial
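
The arguments above can be combined as in the following sketch of an exhaustive screening. The data set (mtcars) and the chosen variables are assumptions made for illustration. The call is only attempted when the glmulti package, and therefore rJava and a Java runtime, can be loaded. With three main effects, an intercept and no size constraint, the candidate set should contain 2^3 = 8 models, so a confidence set of 5 is a strict subset of it.

```r
# Sketch of an exhaustive screening call; data set and variables are illustrative.
predictors <- c("wt", "hp", "qsec")

# With main effects only (level = 1) and no size constraint, each term is
# either in or out of a model, so 3 predictors give 2^3 candidates.
n_candidates <- 2^length(predictors)

# glmulti needs rJava and a Java runtime, so run the call only if it loads.
if (requireNamespace("glmulti", quietly = TRUE)) {
  library(glmulti)
  screening <- glmulti(mpg ~ wt + hp + qsec, data = mtcars,
                       level = 1,            # main effects only
                       method = "h",         # exhaustive screening
                       crit = "aicc",        # default criterion
                       confsetsize = 5,      # keep the 5 best models
                       plotty = FALSE, report = FALSE,
                       fitfunction = "glm",
                       family = gaussian)    # passed on to glm through ...
  print(screening)
}
```

Calling the same function with method = "d" first is a cheap way to check the size of the candidate set before committing to a long run.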

Details

glmulti is defined as an S4 generic function. It acts as a front end that calls compiled background functions (contained in the archive glmulti.jar). Running the function therefore requires a Java Runtime Environment and the package rJava. A thorough description of this function and package can be found in the article by Calcagno and de Mazancourt. print.glmulti and summary.glmulti are S3 methods that provide a synthesis of glmulti analyses.

Value

An object of class glmulti is returned. It is an S4 object with several slots containing all the relevant data for model selection and beyond.
Several standard S3 functions are provided to help access the content of this object.
Several glmulti objects can be merged into one using the function consensus. This is useful to get the best of several replicates (of the genetic algorithm) or to bring together the different parts of a split exhaustive screening. When running a genetic algorithm, two small Java files (serialized objects) are also written to disk at regular intervals. They can be used to resume the calculation (method="r") if it was interrupted for any reason. This can also be used to continue a genetic algorithm with modified parameters (e.g. mutation rate).
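
A split exhaustive screening followed by a merge might look like the sketch below. It assumes that consensus accepts a list of glmulti objects together with a confsetsize argument. The data set, the variables and the analysis names ("part1", "part2") are illustrative. In practice each part would typically be run in a separate R session so that the parts execute in parallel.

```r
# Sketch: exhaustive screening split in two parts, then merged with consensus.
n_chunks <- 2
chunk_ids <- seq_len(n_chunks)   # values to pass as chunk, one per call

# glmulti needs rJava and a Java runtime, so run the calls only if it loads.
if (requireNamespace("glmulti", quietly = TRUE)) {
  library(glmulti)
  part1 <- glmulti(mpg ~ wt + hp + qsec, data = mtcars, level = 1,
                   method = "h", chunk = 1, chunks = n_chunks,
                   confsetsize = 5, plotty = FALSE, report = FALSE,
                   name = "part1")
  part2 <- glmulti(mpg ~ wt + hp + qsec, data = mtcars, level = 1,
                   method = "h", chunk = 2, chunks = n_chunks,
                   confsetsize = 5, plotty = FALSE, report = FALSE,
                   name = "part2")
  # Assumed usage: a list of glmulti objects and the size of the merged set.
  merged <- consensus(list(part1, part2), confsetsize = 5)
  print(merged)
}
```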

Author(s)

Vincent Calcagno, McGill University, Canada

References

Buckland (1997) Model Selection: an Integral Part of Inference. Biometrics 10:41
Burnham & Anderson (2002) Model Selection and Multimodel Inference: an Information Theoretic Approach
Calcagno & de Mazancourt, in revision

See Also

consensus, aic, summary.glmulti, coef.glmulti, step


[Package glmulti version 0.6-2 Index]