optionCore {CORElearn}    R Documentation

Description of parameters.

Description

For a given parameter name, the function prints its type, default value, and a short description. If no name is given, descriptions of all available options are printed.

Usage

optionCore(name=NULL) 

Arguments

name Optional parameter giving the name of the option.

Details

Many different parameters are available. Some are general and can be used in many learning or feature evaluation algorithms. All the values actually used by a classifier or regressor can be written to a file (or read from it) using paramCoreIO. The parameters are split into several groups, documented below.

Value

The function has no return value; it prints the type of the option, its default value, and a short description.

Attribute/feature evaluation

The parameters in this group may be used in model construction via CoreModel and in feature evaluation via attrEval. See attrEval for a description of the relevant evaluation methods.

Parameters attrEvaluationInstances, binaryAttributes, and binarySplitNumericAttributes are applicable to all attribute evaluation methods. The other parameters may be used only in context-sensitive measures, i.e., ReliefF in classification, RReliefF in regression, and their variants.

binaryAttributes
type: logical, default value: FALSE
shall we treat all attributes as binary and binarize them before evaluation if necessary.
binarySplitNumericAttributes
type: logical, default value: TRUE
shall the numerical attributes be treated as binary (to avoid biases against multi-valued attributes).
attrEvaluationInstances
type: integer, default value: 0, value range: 0, Inf
number of instances for attribute evaluation (0=all available).
ReliefIterations
type: integer, default value: 0, value range: -2, Inf
number of iterations for all variants of Relief (0=DataSize, -1=ln(DataSize), -2=sqrt(DataSize)).
numAttrProportionEqual
type: numeric, default value: 0.04, value range: 0, 1
used in ramp function, proportion of numerical attribute's range to consider two values equal.
numAttrProportionDifferent
type: numeric, default value: 0.1, value range: 0, 1
used in ramp function, proportion of numerical attribute's range to consider two values different.
kNearestEqual
type: integer, default value: 10, value range: 0, Inf
number of neighbors to consider in equal k-nearest attribute evaluation.
kNearestExpRank
type: integer, default value: 70, value range: 0, Inf
number of neighbors to consider in exponential rank distance attribute evaluation.
quotientExpRankDistance
type: numeric, default value: 20, value range: 0, Inf
quotient in exponential rank distance attribute evaluation.
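Two of the numeric conventions above can be made concrete in a few lines of base R. The helpers reliefIterations and rampDiff are hypothetical and not part of CORElearn: the first turns a ReliefIterations code into an iteration count (rounding down is an assumption), and the second computes the ramp-shaped difference of two numeric values from numAttrProportionEqual and numAttrProportionDifferent, assuming the usual ramp function from the ReliefF literature.

```r
# Hypothetical helper (not part of CORElearn): number of Relief iterations
# implied by a ReliefIterations code for a data set with n instances.
# 0 = n, -1 = ln(n), -2 = sqrt(n); positive values are used as given.
# Rounding down (to at least 1) is an assumption of this sketch.
reliefIterations <- function(ReliefIterations, n) {
  m <- ReliefIterations
  if (ReliefIterations == 0) m <- n
  if (ReliefIterations == -1) m <- log(n)
  if (ReliefIterations == -2) m <- sqrt(n)
  max(1L, as.integer(floor(m)))
}

# Hypothetical helper: ramp-shaped difference between two values of a
# numeric attribute. Distances up to propEqual of the attribute's range
# count as equal (0), distances above propDifferent as different (1),
# and distances in between are interpolated linearly.
rampDiff <- function(v1, v2, attrRange,
                     propEqual = 0.04, propDifferent = 0.1) {
  d <- abs(v1 - v2) / attrRange
  if (d <= propEqual) return(0)
  if (d > propDifferent) return(1)
  (d - propEqual) / (propDifferent - propEqual)
}

reliefIterations(-2, 100)        # 10 iterations for 100 instances
rampDiff(0, 7, attrRange = 100)  # 7% of the range: halfway up the ramp
```

With the defaults, values closer than 4% of the attribute's range are treated as equal and values further apart than 10% as fully different.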

Algorithm ordEval

Algorithm ordEval uses all the parameters for context-sensitive attribute evaluation (e.g., ReliefF), and some additional ones.

ordEvalNoRandomNormalizers
type: integer, default value: 0, value range: 0, Inf
number of randomly shuffled attributes for normalization of each attribute (0=no normalization).
ordEvalBootstrapRandomNormalize
type: logical, default value: FALSE
whether the features used for normalization are constructed with bootstrap sampling (TRUE) or with random permutation (FALSE).
ordEvalNormalizingPercentile
type: numeric, default value: 0.025, value range: 0, 0.5
percentile defining the width of the confidence interval obtained with random normalization. Percentile t forms the interval by taking the (nt)-th and (n(1-t))-th of the n sorted random evaluations as the interval boundaries, thereby forming a 100(1-2t)% confidence interval (t=0.025 gives a 95% confidence interval).
The value n is set by the ordEvalNoRandomNormalizers parameter.
attrWeights
type: character
a character vector representing a list of attribute weights in the ordEval distance measure.
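The interval described for ordEvalNormalizingPercentile can be sketched in base R. The helper randomNormalizationInterval is hypothetical and not part of CORElearn; it takes the n evaluations of randomly shuffled attributes (n being set by ordEvalNoRandomNormalizers), sorts them, and returns the (nt)-th and (n(1-t))-th values as boundaries. How CORElearn rounds nt to an index is an assumption here (ceiling, clamped to 1..n).

```r
# Hypothetical helper (not part of CORElearn): confidence interval formed
# from n random-normalization evaluations with percentile t.
# The (n*t)-th and (n*(1-t))-th smallest values are the boundaries, giving
# a 100*(1-2t)% interval. Index rounding (ceiling) is an assumption.
randomNormalizationInterval <- function(randomEvals, t = 0.025) {
  n <- length(randomEvals)
  s <- sort(randomEvals)
  lo <- min(n, max(1, ceiling(n * t)))
  hi <- min(n, max(1, ceiling(n * (1 - t))))
  c(lower = s[lo], upper = s[hi])
}

# Example: 200 simulated evaluations of shuffled attributes
set.seed(1)
randomEvals <- rnorm(200, mean = 0, sd = 0.05)
randomNormalizationInterval(randomEvals, t = 0.025)  # approx. 95% interval
```

An attribute whose actual evaluation lies above the upper boundary would then be considered better than what random shuffling alone produces.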

Decision/regression tree construction

Several parameters control the construction of the tree model. Some are described here, but the attribute evaluation, stop building, leaf model, constructive induction, discretization, and pruning options described in this document are also applicable.

selectionEstimator
type: character, default value: "MDL", possible values: all from infoCore(what="attrEval")
estimator for selection of attributes and binarization in classification.
selectionEstimatorReg
type: character, default value: "RReliefFexpRank", possible values: all from infoCore(what="attrEvalReg")
estimator for selection of attributes and binarization in regression.
minReliefEstimate
type: numeric, default value: 0, value range: -1, 1
for all variants of the Relief attribute estimator: the minimal evaluation of an attribute for it to be considered useful in further processing.
minInstanceWeight
type: numeric, default value: 0.05, value range: 0, 1
minimal weight of an instance to use it further in splitting.

Stop tree building

During tree construction, nodes are recursively split until a stopping condition is fulfilled.

minNodeWeight
type: numeric, default value: 2, value range: 0, Inf
minimal number of instances (weight) of a tree node to split it further.
relMinNodeWeight
type: numeric, default value: 0, value range: 0, 1
minimal proportion of training instances in a tree node to split it further.
majorClassProportion
type: numeric, default value: 1, value range: 0, 1
proportion of majority class in a classification tree node to stop splitting it.
rootStdDevProportion
type: numeric, default value: 0, value range: 0, 1
proportion of root's standard deviation in a regression tree node to stop splitting it.
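A minimal sketch of how the four stopping parameters could be checked for a single node is given below. The helper stopSplitting is hypothetical and not CORElearn's code; it assumes that a node is left unsplit as soon as any one condition holds, and that a node is too small when its weight is strictly below the threshold.

```r
# Hypothetical helper (not part of CORElearn): should a tree node be left
# unsplit? Assumes the node stops as soon as any one condition holds.
#   classCounts: class weights in the node (classification trees)
#   nodeSd, rootSd: standard deviations of the target (regression trees)
stopSplitting <- function(nodeWeight, trainWeight,
                          classCounts = NULL, nodeSd = NULL, rootSd = NULL,
                          minNodeWeight = 2, relMinNodeWeight = 0,
                          majorClassProportion = 1, rootStdDevProportion = 0) {
  # too few instances, in absolute terms or relative to the training set
  if (nodeWeight < minNodeWeight) return(TRUE)
  if (nodeWeight < relMinNodeWeight * trainWeight) return(TRUE)
  # classification: the majority class is dominant enough
  if (!is.null(classCounts) &&
      max(classCounts) / sum(classCounts) >= majorClassProportion) return(TRUE)
  # regression: the target varies little compared to the root
  if (!is.null(nodeSd) && !is.null(rootSd) &&
      nodeSd <= rootStdDevProportion * rootSd) return(TRUE)
  FALSE
}

# 39 of 40 instances in one class with a 95% threshold: stop splitting
stopSplitting(nodeWeight = 40, trainWeight = 200,
              classCounts = c(39, 1), majorClassProportion = 0.95)
```

With the defaults (majorClassProportion=1, rootStdDevProportion=0), only pure classification nodes and zero-variance regression nodes stop for reasons other than size.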

Models in the tree leaves

The leaves of a tree model can contain various prediction models. For example, instead of predicting the majority class, one can use naive Bayes in classification or a linear model in regression, thereby expanding the expressive power of the tree model.

modelType
type: integer, default value: 1, value range: 1, 4
type of models used in classification tree leaves (1=majority class, 2=k-nearest neighbors, 3=k-nearest neighbors with kernel, 4=naive Bayes).
modelTypeReg
type: integer, default value: 1, value range: 1, 8
type of models used in regression tree leaves (1=mean predicted value, 2=median predicted value, 3=linear by MSE, 4=linear by MDL, 5=linear as in M5, 6=kNN, 7=Gaussian kernel regression, 8=locally weighted linear regression).
kInNN
type: integer, default value: 10, value range: 0, Inf
number of neighbors in k-nearest neighbors models (0=all).
nnKernelWidth
type: numeric, default value: 2, value range: 0, Inf
kernel width in k-nearest neighbors models.
bayesDiscretization
type: integer, default value: 2, value range: 1, 2
type of discretization for naive Bayes models (1=greedy with selection estimator, 2=equal frequency).
bayesEqFreqIntervals
type: integer, default value: 4, value range: 1, Inf
number of intervals in equal frequency discretization for naive Bayesian models.
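Equal-frequency discretization, selected with bayesDiscretization=2, splits a numeric attribute into bayesEqFreqIntervals intervals holding roughly the same number of instances. The base R sketch below places boundaries at empirical quantiles; eqFreqDiscretize is a hypothetical helper, and CORElearn's own handling of ties and boundary placement may differ.

```r
# Hypothetical helper (not part of CORElearn): equal-frequency
# discretization of a numeric vector into the given number of intervals.
# Boundaries are empirical quantiles; duplicated boundaries (caused by
# ties) are dropped, so fewer intervals may result.
eqFreqDiscretize <- function(x, intervals = 4) {
  probs <- seq(0, 1, length.out = intervals + 1)
  breaks <- unique(quantile(x, probs = probs, names = FALSE))
  cut(x, breaks = breaks, include.lowest = TRUE)
}

table(eqFreqDiscretize(1:100, intervals = 4))  # four intervals of 25 values
```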

Constructive induction (feature construction)

The expressive power of tree models can be increased by incorporating additional types of splits. Operator-based constructive induction is implemented in both classification and regression. The best construct is found with beam search. At each step, new constructs are evaluated with the selected feature evaluation measure. By choosing the types of operators, one controls which expressions may appear in the interior tree nodes.

constructionMode
type: integer, default value: 15, value range: 1, 15
sum of the codes of the constructive operators to use (1=single attributes, 2=conjunction, 4=addition, 8=multiplication); all=1+2+4+8=15.
constructionDepth
type: integer, default value: 0, value range: 0, Inf
maximal depth of the tree for constructive induction (0=do not do construction, 1=only at root, ...).
noCachedInNode
type: integer, default value: 5, value range: 0, Inf
number of cached attributes in each node where construction was performed.
constructionEstimator
type: character, default value: "MDL", possible values: all from infoCore(what="attrEval")
estimator for constructive induction in classification.
constructionEstimatorReg
type: character, default value: "RReliefFexpRank", possible values: all from infoCore(what="attrEvalReg")
estimator for constructive induction in regression.
beamSize
type: integer, default value: 20, value range: 1, Inf
size of the beam in search for best feature in constructive induction.
maxConstructSize
type: integer, default value: 3, value range: 1, Inf
maximal size of constructs in constructive induction.
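Because constructionMode is a sum of power-of-two codes, each operator corresponds to one bit. The hypothetical helper decodeConstructionMode below (not a CORElearn function) decodes a value into the enabled operators using base R's bitwAnd.

```r
# Hypothetical helper (not part of CORElearn): list the constructive
# operators enabled by a constructionMode value, using the documented codes
# 1 = single attributes, 2 = conjunction, 4 = addition, 8 = multiplication.
decodeConstructionMode <- function(mode) {
  ops <- c(singleAttributes = 1L, conjunction = 2L,
           addition = 4L, multiplication = 8L)
  enabled <- vapply(ops, function(bit) bitwAnd(as.integer(mode), bit) > 0L,
                    logical(1))
  names(ops)[enabled]
}

decodeConstructionMode(15)  # all four operators (the default)
decodeConstructionMode(6)   # "conjunction" "addition"
```

Conversely, a mode allowing only conjunctions and multiplications is 2+8=10.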

Attribute discretization

Some algorithms cannot handle numeric attributes directly, so these attributes have to be discretized. The discretization algorithm evaluates split candidates greedily (exhaustively for a small number of candidates) and forms intervals of values.

discretizationLookahead
type: integer, default value: 3, value range: 0, Inf
number of times the current discretization may be worse than the best one found so far before the search stops (0=try all possibilities).
discretizationSample
type: integer, default value: 50, value range: 0, Inf
maximal number of candidate split points to try in discretization (0=all sensible points).
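The two discretization parameters can be illustrated with a small base R sketch. Both helpers are hypothetical, not CORElearn's code: candidateBoundaries takes midpoints between adjacent distinct values as the sensible split points and subsamples them to at most discretizationSample, while searchWithLookahead walks through the scores of successive discretizations (higher is better) and stops once the current one has been worse than the best so far discretizationLookahead times in a row. Treating these as consecutive failures is an assumption.

```r
# Hypothetical helper: candidate split points of a numeric attribute.
# Midpoints between adjacent distinct values are taken as the sensible
# points; if there are more than discretizationSample of them (and it is
# not 0), a random subset of that size is kept.
candidateBoundaries <- function(x, discretizationSample = 50) {
  v <- sort(unique(x))
  mids <- (head(v, -1) + tail(v, -1)) / 2
  if (discretizationSample > 0 && length(mids) > discretizationSample) {
    mids <- sort(sample(mids, discretizationSample))
  }
  mids
}

# Hypothetical helper: greedy search with lookahead over a sequence of
# discretization scores (higher is better). The search stops once the
# current discretization has been worse than the best so far `lookahead`
# times in a row; lookahead = 0 tries all possibilities.
searchWithLookahead <- function(scores, lookahead = 3) {
  best <- 1
  worse <- 0
  for (i in seq_along(scores)[-1]) {
    if (scores[i] > scores[best]) {
      best <- i
      worse <- 0
    } else {
      worse <- worse + 1
      if (lookahead > 0 && worse >= lookahead) {
        return(list(best = best, stoppedAt = i))
      }
    }
  }
  list(best = best, stoppedAt = length(scores))
}

scores <- c(0.1, 0.3, 0.25, 0.2, 0.15, 0.5)
searchWithLookahead(scores, lookahead = 3)  # stops early, misses 0.5
searchWithLookahead(scores, lookahead = 0)  # exhaustive, finds 0.5
```

The example shows the trade-off: a small lookahead saves evaluations but can miss a better discretization found later in the search.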

Tree pruning

After the tree is constructed, it is usually beneficial to prune it in order to reduce the effect of noise.

selectedPruner
type: integer, default value: 1, value range: 0, 1
decision tree pruning method used (0=none, 1=with m-estimate).
selectedPrunerReg
type: integer, default value: 2, value range: 0, 4
regression tree pruning method used (0=none, 1=MDL, 2=with m-estimate, 3=as in M5, 4=error complexity as in CART (fixed alpha)).
mdlModelPrecision
type: numeric, default value: 0.1, value range: 0, Inf
precision of model coefficients in MDL tree pruning.
mdlErrorPrecision
type: numeric, default value: 0.01, value range: 0, Inf
precision of errors in MDL tree pruning.
mEstPruning
type: numeric, default value: 2, value range: 0, Inf
value of m for pruning with m-estimate.
alphaErrorComplexity
type: numeric, default value: 0, value range: 0, Inf
alpha for error complexity pruning.
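For reference, error-complexity (cost-complexity) pruning as in CART scores a subtree T by its error plus a penalty proportional to its number of leaves, and alphaErrorComplexity supplies the fixed value of alpha. The formula below is the standard CART criterion; how CORElearn measures the error term R(T) in regression trees (e.g., as a sum of squared errors) is an assumption not stated in this document.

```latex
R_\alpha(T) = R(T) + \alpha \, \lvert \widetilde{T} \rvert
```

Here R(T) is the error of subtree T on the training data and |T~| is its number of leaves. With a fixed alpha, the subtree T_t rooted at node t is replaced by the single leaf t when R_alpha({t}) <= R_alpha(T_t); with the default alpha=0, this happens only if pruning does not increase the error.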

Prediction

For some models (trees and naive Bayes), one can control how predictions are made.

mEstPrediction
type: numeric, default value: 0, value range: 0, Inf
value of m for the m-estimate used in prediction.
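The m-estimate referred to by mEstPruning and mEstPrediction is, in its standard form, a class probability smoothed toward the prior. The base R sketch below shows that formula; mEstimate is a hypothetical helper, and whether CORElearn applies exactly this form (for example, which priors it uses) is an assumption.

```r
# Hypothetical helper (not part of CORElearn): m-estimate of class
# probabilities in a leaf,
#   p(c) = (n_c + m * prior(c)) / (n + m),
# where n_c counts instances of class c in the leaf and n is their total.
# With m = 0 (the mEstPrediction default) this is the relative frequency;
# larger m pulls the estimate toward the prior.
mEstimate <- function(classCounts, priors, m = 0) {
  (classCounts + m * priors) / (sum(classCounts) + m)
}

mEstimate(c(3, 1), priors = c(0.5, 0.5), m = 0)  # 0.75 0.25
mEstimate(c(3, 1), priors = c(0.5, 0.5), m = 2)  # 0.667 0.333 (rounded)
```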

Random forests

A random forest is a fairly complex model whose construction can be controlled with several parameters. Currently, only the classification version of the algorithm is implemented. Besides the parameters in this section, most of the parameters controlling decision trees can be applied (except those for constructive induction and tree pruning).

rfNoTrees
type: integer, default value: 100, value range: 1, Inf
number of trees in the random forest.
rfNoSelAttr
type: integer, default value: 0, value range: -2, Inf
number of randomly selected attributes in the node (0=sqrt(numOfAttr), -1=log2(numOfAttr)+1, -2=all).
rfMultipleEst
type: logical, default value: FALSE
use multiple attribute estimators in the forest? If TRUE the algorithm uses some preselected attribute evaluation measures on different trees.
rfkNearestEqual
type: integer, default value: 30, value range: 0, Inf
number of nearest instances for weighted random forest classification (0=no weighting).
rfPropWeightedTrees
type: numeric, default value: 0, value range: 0, 1
proportion of trees where attribute probabilities are weighted with their quality. As attribute weighting might reduce the variance between the models, the default value switches the weighting off.
rfPredictClass
type: logical, default value: FALSE
shall individual trees predict with the majority class (otherwise they predict with the class distribution).
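The codes accepted by rfNoSelAttr translate into a number of attributes per node as sketched below. rfSelectedAttributes is a hypothetical helper, not CORElearn's code; rounding to the nearest integer and clamping to the range 1..numAttr are assumptions.

```r
# Hypothetical helper (not part of CORElearn): number of attributes
# randomly selected in each node for a given rfNoSelAttr code and numAttr
# attributes (0 = sqrt(numAttr), -1 = log2(numAttr) + 1, -2 = all).
rfSelectedAttributes <- function(rfNoSelAttr, numAttr) {
  k <- rfNoSelAttr
  if (rfNoSelAttr == 0) k <- sqrt(numAttr)
  if (rfNoSelAttr == -1) k <- log2(numAttr) + 1
  if (rfNoSelAttr == -2) k <- numAttr
  as.integer(min(numAttr, max(1, round(k))))
}

# Draw the candidate attributes for one node of one tree
set.seed(1)
attrNames <- paste0("A", 1:16)
sample(attrNames, rfSelectedAttributes(0, length(attrNames)))  # 4 names
```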

General tree ensembles

More general tree ensembles can be constructed in the same manner as random forests. Additional options control sampling, tree size, and regularization.

rfSampleProp
type: numeric, default value: 0, value range: 0, 1
proportion of the training set to be used in learning (0=bootstrap replication).
rfNoTerminals
type: integer, default value: 0, value range: 0, Inf
maximal number of leaves in each tree (0=build the whole tree).
rfRegType
type: integer, default value: 2, value range: 0, 2
type of regularization (0=no regularization, 1=global regularization, 2=local regularization).
rfRegLambda
type: numeric, default value: 0, value range: 0, Inf
regularization parameter lambda (0=no regularization).
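Per-tree sampling controlled by rfSampleProp can be sketched in base R as follows. treeSample is a hypothetical helper, not CORElearn's code; drawing the subsample without replacement when rfSampleProp > 0 is an assumption, since the description above only states that 0 means a bootstrap replicate.

```r
# Hypothetical helper (not part of CORElearn): indices of the training
# instances used to grow one tree.
#   rfSampleProp = 0: bootstrap replicate (n draws with replacement)
#   rfSampleProp > 0: round(rfSampleProp * n) distinct instances
#                     (without replacement, an assumption)
treeSample <- function(n, rfSampleProp = 0) {
  if (rfSampleProp == 0) {
    sample.int(n, n, replace = TRUE)
  } else {
    sample.int(n, max(1L, round(rfSampleProp * n)), replace = FALSE)
  }
}

set.seed(42)
boot <- treeSample(10)       # 10 indices, duplicates likely
sub  <- treeSample(10, 0.5)  # 5 distinct indices
```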

Read data directly from files

For very large data sets, it is useful to bypass R and read data directly from files, as the standalone learning system CORElearn does. Supported file formats are C4.5, M5, and the native format of CORElearn. See the documentation at http://lkm.fri.uni-lj.si/rmarko/software/.

domainName
type: character
name of a problem to read from files with suffixes .dsc, .dat, .names, .data, .cm, and .costs
dataDirectory
type: character
folder where data files are stored.
NAstring
type: character, default value: "?"
character string which represents missing and NA values in the data files.

Author(s)

Marko Robnik-Sikonja, Petr Savicky

See Also

CORElearn, CoreModel, predict.CoreModel, attrEval, ordEval, paramCoreIO.

Examples

# single parameter
optionCore("modelTypeReg")

# description of all parameters
optionCore()

[Package CORElearn version 0.9.22 Index]