CoreModel {CORElearn}    R Documentation
Builds a classification or regression model from the data and formula, with the given parameters. The available classification models are "rf", "rfNear", "tree", "knn", "knnKernel", and "bayes"; the available regression model is "regTree".
CoreModel(formula, data,
          model=c("rf","rfNear","tree","knn","knnKernel","bayes","regTree"),
          ..., costMatrix=NULL, dataFromFiles=FALSE)
formula        Formula specifying the response and attribute variables.
data           Data frame with training data.
model          The type of model to be learned.
...            Options for building the model. See optionCore.
costMatrix     Optional cost matrix.
dataFromFiles  Logical value controlling whether the training data will be
               read directly from files.
Parameter formula is used as a mechanism to select features (attributes) and the prediction (response) variable (class). Only simple terms can be used. Interaction terms are not supported. The simplest way is to specify just the response variable using e.g. "class ~ .". See examples below.
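As a minimal sketch of the two formula styles described above (the data set iris and the model choice "bayes" are illustrative; CoreModel is only called when the CORElearn package is available):

```r
# Sketch only: "bayes" and iris are arbitrary choices for illustration.
if (requireNamespace("CORElearn", quietly = TRUE)) {
  library(CORElearn)
  # "." selects all remaining columns as attributes
  m1 <- CoreModel(Species ~ ., iris, model = "bayes")
  # simple terms can also be listed explicitly; interaction terms (a:b)
  # are not supported
  m2 <- CoreModel(Species ~ Sepal.Length + Petal.Length, iris, model = "bayes")
  destroyModels(m1)
  destroyModels(m2)
}
# base R alone can show which variables a formula names:
all.vars(Species ~ Sepal.Length + Petal.Length)
```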
Parameter model controls the type of the constructed model. There are several possibilities:
"rf"         random forest
"rfNear"     random forest with prediction weighted by proximity of instances
"tree"       decision tree
"knn"        k-nearest neighbors
"knnKernel"  kernel-weighted k-nearest neighbors
"bayes"      naive Bayes
"regTree"    regression tree (the only regression model; the others are classifiers)
Many additional parameters, passed via ..., are used by the different models. Their list and descriptions are available by calling optionCore. Evaluation of attributes is covered by the function attrEval.
The optional parameter costMatrix can provide a nonuniform cost matrix for classification problems. For regression problems this parameter is ignored. The format of the matrix is costMatrix(true class, predicted class). By default uniform costs are assumed, i.e., costMatrix(i, i) = 0 and costMatrix(i, j) = 1 for i not equal to j.
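A sketch of constructing such a cost matrix in base R for the three iris classes; the doubled cost for misclassifying a true "virginica" is an arbitrary illustration, and CoreModel is only called when CORElearn is available:

```r
lev <- levels(iris$Species)
# rows are the true class, columns the predicted class
cm <- matrix(1, nrow = length(lev), ncol = length(lev),
             dimnames = list(true = lev, predicted = lev))
cm["virginica", ] <- 2   # illustrative: errors on a true "virginica" cost double
diag(cm) <- 0            # correct predictions cost nothing
cm

# passing it in (sketch; runs only when the CORElearn package is installed):
if (requireNamespace("CORElearn", quietly = TRUE)) {
  m <- CORElearn::CoreModel(Species ~ ., iris, model = "rf", costMatrix = cm)
  CORElearn::destroyModels(m)
}
```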
The optional logical parameter dataFromFiles can be set to TRUE for very large data sets that do not fit into R's memory. In this case the training data are read directly from files and the data parameter is ignored. Supported data formats are C4.5 (M5) and the native format of CORElearn; their description is available from http://lkm.fri.uni-lj.si/rmarko/software/.
The created model is not returned as a structure; it is stored internally in the package's memory space and only its pointer (index) is returned. The maximum number of models that can be stored simultaneously is a parameter of the initialization function initCore and defaults to 100. Models that are no longer needed may be deleted with the function destroyModels in order to free the memory. By referencing the returned model, any of the stored models may be used for prediction with predict.CoreModel.
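The lifecycle described above can be sketched as follows (a hedged illustration that runs only when CORElearn is installed; the "tree" model and iris data are arbitrary choices):

```r
# Sketch of the build / predict / destroy lifecycle.
if (requireNamespace("CORElearn", quietly = TRUE)) {
  library(CORElearn)
  fit <- CoreModel(Species ~ ., iris, model = "tree")
  fit$modelID                 # index of the internally stored model
  p <- predict(fit, iris, type = "class")
  destroyModels(fit)          # free the slot in the package memory space
}
```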
The function returns a list with components:
modelID        index of the internally stored model,
terms          description of prediction variables and response,
class.lev      class values for a classification problem, NULL for a regression problem,
model          the type of model used, see parameter model,
dataFromFiles  whether data were read directly from files, see parameter dataFromFiles,
formula        the formula parameter passed.
Marko Robnik-Sikonja, Petr Savicky
Marko Robnik-Sikonja, Igor Kononenko: Theoretical and Empirical Analysis of ReliefF and RReliefF. Machine Learning Journal, 53:23-69, 2003
Leo Breiman: Random Forests. Machine Learning Journal, 45:5-32, 2001
Marko Robnik-Sikonja: Improving Random Forests. In J.-F. Boulicaut et al.(Eds): ECML 2004, LNAI 3210, Springer, Berlin, 2004, pp. 359-370
Marko Robnik-Sikonja: CORE - a system that predicts continuous variables. Proceedings of ERK'97, Portoroz, Slovenia, 1997
Marko Robnik-Sikonja, Igor Kononenko: Discretization of continuous attributes using ReliefF. Proceedings of ERK'95, B149-152, Ljubljana, 1995
The majority of these references are available from http://lkm.fri.uni-lj.si/rmarko/papers/
CORElearn, predict.CoreModel, modelEval, attrEval, optionCore, paramCoreIO.
# use iris data set

# build random forests model with certain parameters
modelRF <- CoreModel(Species ~ ., iris, model="rf",
                     selectionEstimator="MDL", minNodeWeight=5, rfNoTrees=100)
print(modelRF)

# build decision tree with naive Bayes in the leaves
modelDT <- CoreModel(Species ~ ., iris, model="tree", modelType=4)
print(modelDT)

# build regression tree similar to CART
instReg <- regDataGen(200)
modelRT <- CoreModel(response~., instReg, model="regTree", modelTypeReg=1)
print(modelRT)

# build kNN kernel regressor by preventing tree splitting
modelKernel <- CoreModel(response~., instReg, model="regTree",
                         modelTypeReg=7, minNodeWeight=Inf)
print(modelKernel)

# A more complex example demonstrating also destroyModels() follows.
# Test accuracy of random forest predictor with 20 trees on iris data
# using 10-fold cross-validation.
ncases <- nrow(iris)
ind <- ceiling(10*(1:ncases)/ncases)
ind <- sample(ind, length(ind))
pred <- rep(NA, ncases)
fit <- NULL
for (i in unique(ind)) {
    # Delete the previous model, if there is one.
    if (!is.null(fit)) destroyModels(fit)
    fit <- CoreModel(Species ~ ., iris[ind!=i,], model="rf", rfNoTrees=20)
    pred[ind==i] <- predict.CoreModel(fit, iris[ind==i,], type="class")
}
table(pred, iris$Species)