optionCore {CORElearn}    R Documentation
Description of parameters.
Description
For a given parameter name, the function prints its type, default value, and a short description.
If no name is given, descriptions of all available options are printed.
Usage
optionCore(name=NULL)
Arguments
name
Optional parameter giving the name of the option.
Details
Many different parameters are available. Some are general and can be used in many
learning or feature evaluation algorithms. All the values actually used by
the classifier or regressor can be written to a file (or read from it) using
paramCoreIO.
The parameters for the methods are split into several groups and documented below.
Value
There is no return value. The type of the option, its default value, and a short description are printed
to the output.
Attribute/feature evaluation
The parameters in this group may be used in model construction via CoreModel
and in feature evaluation via attrEval. See attrEval
for a description of the relevant evaluation methods.
The parameters attrEvaluationInstances, binaryAttributes, and binarySplitNumericAttributes
are applicable to all attribute evaluation methods.
The other parameters may be used only in context-sensitive measures, i.e., ReliefF in classification,
RReliefF in regression, and their variants.
- binaryAttributes
- type: logical, default value: FALSE
shall all attributes be treated as binary and binarized before evaluation if necessary.
- binarySplitNumericAttributes
- type: logical, default value: TRUE
shall the numerical attributes be treated as binary (to avoid biases against multi-valued attributes).
- attrEvaluationInstances
- type: integer, default value: 0, value range: 0, Inf
number of instances for attribute evaluation (0=all available).
- ReliefIterations
- type: integer, default value: 0, value range: -2, Inf
number of iterations for all variants of Relief (0=DataSize, -1=ln(DataSize), -2=sqrt(DataSize)).
- numAttrProportionEqual
- type: numeric, default value: 0.04, value range: 0, 1
used in the ramp function: proportion of a numerical attribute's range within which two values are considered equal.
- numAttrProportionDifferent
- type: numeric, default value: 0.1, value range: 0, 1
used in the ramp function: proportion of a numerical attribute's range beyond which two values are considered different.
- kNearestEqual
- type: integer, default value: 10, value range: 0, Inf
number of neighbors to consider in equal k-nearest attribute evaluation.
- kNearestExpRank
- type: integer, default value: 70, value range: 0, Inf
number of neighbors to consider in exponential rank distance attribute evaluation.
- quotientExpRankDistance
- type: numeric, default value: 20, value range: 0, Inf
quotient in exponential rank distance attribute evaluation.
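The ramp function and the special ReliefIterations values can be illustrated with a short base R sketch. It assumes the common form of the ramp (difference 0 up to the "equal" proportion of the range, 1 beyond the "different" proportion, linear in between) and rounds ln and sqrt to the nearest integer. rampDiff and reliefIterations are hypothetical helpers, not CORElearn functions, and the package's internal rounding may differ.

```r
# Hypothetical helper (not part of CORElearn): ramp-function difference of two
# numerical values, with thresholds given as proportions of the attribute range.
rampDiff <- function(v1, v2, attrRange,
                     propEqual = 0.04,       # mirrors numAttrProportionEqual
                     propDifferent = 0.1) {  # mirrors the "different" threshold
  d <- abs(v1 - v2) / attrRange              # distance as a share of the range
  ifelse(d <= propEqual, 0,                  # close enough: treated as equal
         ifelse(d > propDifferent, 1,        # far enough: fully different
                (d - propEqual) / (propDifferent - propEqual)))  # linear ramp
}

# Hypothetical helper: number of Relief iterations implied by ReliefIterations.
# Rounding to the nearest integer is an assumption.
reliefIterations <- function(setting, dataSize) {
  if (setting > 0) return(setting)
  switch(as.character(setting),
         "0"  = dataSize,
         "-1" = round(log(dataSize)),
         "-2" = round(sqrt(dataSize)),
         stop("unsupported ReliefIterations value"))
}

rampDiff(2, 2.7, attrRange = 10)       # 0.07 of the range lies on the ramp: about 0.5
reliefIterations(-2, dataSize = 150)   # round(sqrt(150)) = 12
```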
Algorithm ordEval
Algorithm ordEval
uses all the parameters for context-sensitive attribute evaluation (e.g., ReliefF)
as well as some additional ones.
- ordEvalNoRandomNormalizers
- type: integer, default value: 0, value range: 0, Inf
number of randomly shuffled attributes for normalization of each attribute (0=no normalization).
- ordEvalBootstrapRandomNormalize
- type: logical, default value: FALSE
whether the features used for normalization are constructed with bootstrap sampling (TRUE) or with random permutation (FALSE).
- ordEvalNormalizingPercentile
- type: numeric, default value: 0.025, value range: 0, 0.5
percentile that defines the length of the confidence interval obtained with random normalization. Percentile t
forms the interval by taking the nt-th and n(1-t)-th of the sorted random evaluations as the confidence interval boundaries, thereby forming
a 100(1-2t)% confidence interval (t=0.025 gives a 95% confidence interval). The value n is set by the
ordEvalNoRandomNormalizers parameter.
- attrWeights
- type: character
a character vector representing a list of attribute weights in the ordEval distance measure.
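To make the percentile rule concrete, the following base R sketch takes n random evaluations and returns the nt-th and n(1-t)-th smallest values as interval bounds. The helper randomNormInterval is hypothetical, the random evaluations are simulated rather than produced by ordEval, and rounding nt and n(1-t) to the nearest index is an assumption.

```r
# Hypothetical helper: confidence interval bounds from n random evaluations,
# following the nt / n(1-t) rule described for ordEvalNormalizingPercentile.
randomNormInterval <- function(randomEvals, t = 0.025) {
  n <- length(randomEvals)
  s <- sort(randomEvals)
  lo <- max(1, round(n * t))          # index of the lower boundary
  hi <- min(n, round(n * (1 - t)))    # index of the upper boundary
  c(lower = s[lo], upper = s[hi], coverage = 100 * (1 - 2 * t))
}

# 200 simulated random evaluations; with t = 0.025 the bounds are the
# 5th and 195th smallest values and the interval covers 95%.
set.seed(1)
randomNormInterval(sample(1:200))
```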
Decision/regression tree construction
Several parameters control the construction of the tree model. Some are described here,
but the attribute evaluation, stop building, model, constructive induction, discretization,
and pruning options described in this document also apply.
- selectionEstimator
- type: character, default value: "MDL", possible values: all from
infoCore(what="attrEval")
estimator for selection of attributes and binarization in classification.
- selectionEstimatorReg
- type: character, default value: "RReliefFexpRank", possible values: all from
infoCore(what="attrEvalReg")
estimator for selection of attributes and binarization in regression.
- minReliefEstimate
- type: numeric, default value: 0, value range: -1, 1
for all variants of the Relief attribute estimator: the minimal evaluation an attribute must have to be considered useful in further processing.
- minInstanceWeight
- type: numeric, default value: 0.05, value range: 0, 1
minimal weight of an instance to use it further in splitting.
Stop tree building
During tree construction, nodes are split recursively until a stopping condition is fulfilled.
- minNodeWeight
- type: numeric, default value: 2, value range: 0, Inf
minimal number of instances (weight) of a tree node to split it further.
- relMinNodeWeight
- type: numeric, default value: 0, value range: 0, 1
minimal proportion of training instances in a tree node to split it further.
- majorClassProportion
- type: numeric, default value: 1, value range: 0, 1
proportion of majority class in a classification tree node to stop splitting it.
- rootStdDevProportion
- type: numeric, default value: 0, value range: 0, 1
proportion of root's standard deviation in a regression tree node to stop splitting it.
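The classification stopping rules above can be sketched as a single predicate. This is a hypothetical base R helper written for illustration; whether CORElearn uses strict or non-strict comparisons at the thresholds is an assumption, and the regression criterion rootStdDevProportion is omitted.

```r
# Hypothetical helper: should a classification node stop being split?
# classWeights holds the (weighted) number of instances of each class in the node.
stopSplitting <- function(classWeights, totalTrainWeight,
                          minNodeWeight = 2,          # absolute size limit
                          relMinNodeWeight = 0,       # size relative to training set
                          majorClassProportion = 1) { # purity limit
  nodeWeight <- sum(classWeights)
  if (nodeWeight < minNodeWeight) return(TRUE)
  if (nodeWeight / totalTrainWeight < relMinNodeWeight) return(TRUE)
  max(classWeights) / nodeWeight >= majorClassProportion
}

stopSplitting(c(10, 0), totalTrainWeight = 100)  # pure node: TRUE
stopSplitting(c(6, 4), totalTrainWeight = 100)   # mixed node: FALSE
```

With the default majorClassProportion=1, only pure nodes stop on the purity rule; lowering it stops splitting earlier.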
Models in the tree leaves
The leaves of the tree model can contain various prediction models. For example, instead of predicting
the majority class, one can use naive Bayes in classification or a linear model in regression, thereby expanding
the expressive power of the tree model.
- modelType
- type: integer, default value: 1, value range: 1, 4
type of models used in classification tree leaves (1=majority class, 2=k-nearest neighbors, 3=k-nearest neighbors with kernel, 4=naive Bayes).
- modelTypeReg
- type: integer, default value: 1, value range: 1, 8
type of models used in regression tree leaves (1=mean predicted value, 2=median predicted value, 3=linear by MSE,
4=linear by MDL, 5=linear as in M5, 6=kNN, 7=Gaussian kernel regression, 8=locally weighted linear regression).
- kInNN
- type: integer, default value: 10, value range: 0, Inf
number of neighbors in k-nearest neighbors models (0=all).
- nnKernelWidth
- type: numeric, default value: 2, value range: 0, Inf
kernel width in k-nearest neighbors models.
- bayesDiscretization
- type: integer, default value: 2, value range: 1, 2
type of discretization for naive Bayes models (1=greedy with selection estimator, 2=equal frequency).
- bayesEqFreqIntervals
- type: integer, default value: 4, value range: 1, Inf
number of intervals in equal frequency discretization for naive Bayesian models.
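Equal-frequency discretization, as selected with bayesDiscretization=2, can be sketched in base R with sample quantiles. The helper eqFreqDiscretize is hypothetical, and CORElearn may place the interval boundaries differently (for example between distinct observed values).

```r
# Hypothetical helper: split a numeric vector into `intervals` bins that hold
# roughly the same number of values (cf. bayesEqFreqIntervals).
eqFreqDiscretize <- function(x, intervals = 4) {
  probs <- seq(0, 1, length.out = intervals + 1)
  breaks <- unique(quantile(x, probs = probs, names = FALSE))  # drop tied breaks
  cut(x, breaks = breaks, include.lowest = TRUE)
}

table(eqFreqDiscretize(1:100))  # four intervals of 25 values each
```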
Constructive induction (a.k.a. feature construction)
The expressive power of tree models can be increased by incorporating additional types of splits. Operator-based
constructive induction is implemented for both classification and regression. The best construct is searched for with beam search.
At each step, new constructs are evaluated with the selected feature evaluation measure.
With different types of operators one can control the expressions used in the interior tree nodes.
- constructionMode
- type: integer, default value: 15, value range: 1, 15
sum of constructive operators (1=single attributes, 2=conjunction, 4=addition, 8=multiplication); all=1+2+4+8=15
- constructionDepth
- type: integer, default value: 0, value range: 0, Inf
maximal depth of the tree for constructive induction (0=do not do construction, 1=only at root, ...).
- noCachedInNode
- type: integer, default value: 5, value range: 0, Inf
number of cached attributes in each node where construction was performed.
- constructionEstimator
- type: character, default value: "MDL", possible values: all from
infoCore(what="attrEval")
estimator for constructive induction in classification.
- constructionEstimatorReg
- type: character, default value: "RReliefFexpRank", possible values: all from
infoCore(what="attrEvalReg")
estimator for constructive induction in regression.
- beamSize
- type: integer, default value: 20, value range: 1, Inf
size of the beam in search for best feature in constructive induction.
- maxConstructSize
- type: integer, default value: 3, value range: 1, Inf
maximal size of constructs in constructive induction.
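Because constructionMode is a sum of powers of two, each operator corresponds to one bit. A small base R sketch (decodeConstructionMode is a hypothetical helper, not a CORElearn function) shows how a value decodes into operators; for example, 6=2+4 enables conjunction and addition only.

```r
# Hypothetical helper: list the constructive operators switched on by a
# constructionMode value (1=single attributes, 2=conjunction, 4=addition,
# 8=multiplication).
decodeConstructionMode <- function(mode) {
  ops <- c(singleAttributes = 1L, conjunction = 2L,
           addition = 4L, multiplication = 8L)
  on <- vapply(ops, function(bit) bitwAnd(as.integer(mode), bit) != 0L,
               logical(1))
  names(ops)[on]
}

decodeConstructionMode(15)  # all four operators
decodeConstructionMode(6)   # "conjunction" "addition"
```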
Attribute discretization
Some algorithms cannot deal with numeric attributes directly, so these have to be discretized. The discretization algorithm
greedily (exhaustively for a small number of candidates) evaluates split candidates and forms intervals of values.
- discretizationLookahead
- type: integer, default value: 3, value range: 0, Inf
number of times the current discretization may be worse than the best one found so far before the search stops (0=try all possibilities).
- discretizationSample
- type: integer, default value: 50, value range: 0, Inf
maximal number of points to try discretization (0=all sensible).
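The role of discretizationLookahead can be illustrated with a generic lookahead stopping rule over a sequence of candidate scores. This is a hypothetical sketch: it assumes the counter of worse candidates is reset whenever a new best is found, which may not match CORElearn exactly, and the scores stand in for evaluations of candidate discretizations.

```r
# Hypothetical sketch: scan candidate discretizations in order and stop once
# `lookahead` candidates since the last improvement scored worse than the best.
lookaheadSearch <- function(scores, lookahead = 3) {
  best <- -Inf; bestIndex <- NA_integer_; worse <- 0; evaluated <- 0L
  for (i in seq_along(scores)) {
    evaluated <- i
    if (scores[i] > best) {
      best <- scores[i]; bestIndex <- i; worse <- 0   # new best: reset counter
    } else {
      worse <- worse + 1
      if (lookahead > 0 && worse >= lookahead) break  # 0 = try everything
    }
  }
  c(bestIndex = bestIndex, evaluated = evaluated)
}

scores <- c(0.10, 0.30, 0.20, 0.25, 0.28, 0.50)
lookaheadSearch(scores, lookahead = 3)  # stops after 5 candidates, best is #2
lookaheadSearch(scores, lookahead = 0)  # evaluates all 6, best is #6
```

The example shows the trade-off: a small lookahead saves evaluations but can miss a better candidate later in the sequence.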
Tree pruning
After the tree is constructed, it is beneficial to prune it to reduce the effect of noise.
- selectedPruner
- type: integer, default value: 1, value range: 0, 1
decision tree pruning method used (0=none, 1=with m-estimate).
- selectedPrunerReg
- type: integer, default value: 2, value range: 0, 4
regression tree pruning method used (0=none, 1=MDL, 2=with m-estimate, 3=as in M5, 4=error complexity as in CART (fixed alpha)).
- mdlModelPrecision
- type: numeric, default value: 0.1, value range: 0, Inf
precision of model coefficients in MDL tree pruning.
- mdlErrorPrecision
- type: numeric, default value: 0.01, value range: 0, Inf
precision of errors in MDL tree pruning.
- mEstPruning
- type: numeric, default value: 2, value range: 0, Inf
value of m in m-estimate pruning.
- alphaErrorComplexity
- type: numeric, default value: 0, value range: 0, Inf
alpha for error complexity pruning.
Prediction
For some models (trees and naive Bayes), the prediction can be controlled with the following parameter.
- mEstPrediction
- type: numeric, default value: 0, value range: 0, Inf
value of m in the m-estimate used for prediction.
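The m-estimate smooths the relative class frequencies in a node towards prior probabilities: p_c = (n_c + m*p_a) / (n + m), where n_c is the count of class c, n the total count, and p_a the prior probability of c. A base R sketch follows; mEstimate is a hypothetical helper, and using the training-set class distribution as the prior is an assumption. With m=0 it reduces to plain relative frequencies.

```r
# Hypothetical helper: m-estimate of class probabilities in a node.
# classCounts: (weighted) class counts in the node; priorProbs: prior class
# probabilities, e.g. the class distribution of the whole training set.
mEstimate <- function(classCounts, priorProbs, m = 2) {
  (classCounts + m * priorProbs) / (sum(classCounts) + m)
}

mEstimate(c(3, 1), c(0.5, 0.5), m = 0)  # relative frequencies: 0.75 0.25
mEstimate(c(3, 1), c(0.5, 0.5), m = 2)  # pulled towards the prior: 0.667 0.333
```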
Random forests
Random forest is a fairly complex model whose construction can be controlled with several parameters.
Currently only the classification version of the algorithm is implemented.
Besides the parameters in this section, most of the parameters controlling decision trees can be applied (except constructive induction and tree pruning).
- rfNoTrees
- type: integer, default value: 100, value range: 1, Inf
number of trees in the random forest.
- rfNoSelAttr
- type: integer, default value: 0, value range: -2, Inf
number of randomly selected attributes in the node (0=sqrt(numOfAttr), -1=log2(numOfAttr)+1, -2=all).
- rfMultipleEst
- type: logical, default value: FALSE
use multiple attribute estimators in the forest? If TRUE the algorithm uses some preselected attribute evaluation measures on different trees.
- rfkNearestEqual
- type: integer, default value: 30, value range: 0, Inf
number of nearest instances for weighted random forest classification (0=no weighting).
- rfPropWeightedTrees
- type: numeric, default value: 0, value range: 0, 1
proportion of trees where attribute probabilities are weighted with their quality. As attribute weighting might reduce the variance between the models,
the default value switches the weighting off.
- rfPredictClass
- type: logical, default value: FALSE
shall individual trees predict with the majority class (otherwise they predict with the class distribution).
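The special rfNoSelAttr values translate into a number of attributes as in the following base R sketch, which also draws one random attribute subset for a node. rfSelectedAttributes is a hypothetical helper; rounding to the nearest integer, the lower bound of one attribute, and capping positive values at the number of attributes are assumptions.

```r
# Hypothetical helper: number of attributes sampled in each node.
rfSelectedAttributes <- function(setting, numOfAttr) {
  if (setting > 0) return(min(setting, numOfAttr))  # explicit count, capped
  switch(as.character(setting),
         "0"  = max(1, round(sqrt(numOfAttr))),      # default: sqrt rule
         "-1" = max(1, round(log2(numOfAttr) + 1)),  # log2 rule
         "-2" = numOfAttr,                           # all attributes
         stop("unsupported rfNoSelAttr value"))
}

rfSelectedAttributes(0, 16)   # sqrt(16) = 4
rfSelectedAttributes(-1, 16)  # log2(16) + 1 = 5
sample.int(16, rfSelectedAttributes(0, 16))  # indices of 4 candidate attributes
```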
General tree ensembles
In the same manner as random forests, more general tree ensembles can be constructed. Additional options control sampling,
tree size, and regularization.
- rfSampleProp
- type: numeric, default value: 0, value range: 0, 1
proportion of the training set to be used in learning (0=bootstrap replication).
- rfNoTerminals
- type: integer, default value: 0, value range: 0, Inf
maximal number of leaves in each tree (0=build the whole tree).
- rfRegType
- type: integer, default value: 2, value range: 0, 2
type of regularization (0=no regularization, 1=global regularization, 2=local regularization).
- rfRegLambda
- type: numeric, default value: 0, value range: 0, Inf
regularization parameter lambda (0=no regularization).
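How rfSampleProp might choose the training cases for one tree can be sketched in base R. drawTrainingSample is a hypothetical helper, and sampling without replacement when rfSampleProp > 0 is an assumption; the description above only states that 0 means a bootstrap replicate.

```r
# Hypothetical helper: indices of the training cases used to grow one tree.
drawTrainingSample <- function(n, sampleProp = 0) {
  if (sampleProp == 0) {
    sample.int(n, n, replace = TRUE)              # bootstrap replicate
  } else {
    sample.int(n, max(1, round(sampleProp * n)))  # subsample, no replacement
  }
}

set.seed(42)
length(drawTrainingSample(150))        # 150 indices, some repeated
length(drawTrainingSample(150, 0.5))   # 75 distinct indices
```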
Read data directly from files
For very large data sets it is useful to bypass R and read data directly from files, as the standalone learning system CORElearn
does. Supported file formats are C4.5, M5, and the native format of CORElearn. See the documentation at http://lkm.fri.uni-lj.si/rmarko/software/.
- domainName
- type: character
name of a problem to read from files with suffixes .dsc, .dat, .names, .data, .cm, and .costs.
- dataDirectory
- type: character
folder where data files are stored.
- NAstring
- type: character, default value: "?"
character string which represents missing and NA values in the data files.
Author(s)
Marko Robnik-Sikonja, Petr Savicky
See Also
CORElearn, CoreModel, predict.CoreModel, attrEval, ordEval, paramCoreIO.
Examples
# single parameter
optionCore("modelTypeReg")
# description of all parameters
optionCore()
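The options above are passed to model construction as additional named arguments. The following sketch, guarded so that it only runs when CORElearn is installed, builds a classification tree on the iris data with the MDL split estimator and naive Bayes models in the leaves. It is an illustration rather than one of the package's own examples, and option names in later CORElearn releases may differ from those listed on this page.

```r
if (requireNamespace("CORElearn", quietly = TRUE)) {
  library(CORElearn)
  # classification tree: MDL for split selection, naive Bayes in the leaves
  modelT <- CoreModel(Species ~ ., iris, model = "tree",
                      selectionEstimator = "MDL", modelType = 4)
  predClass <- predict(modelT, iris, type = "class")
  print(table(predicted = predClass, observed = iris$Species))
  destroyModels(modelT)  # release the model stored in the underlying C++ library
}
```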
[Package CORElearn version 0.9.22 Index]