prune {treethresh} | R Documentation |
Extracts an optimal subtree from a tree object of class treethresh or wtthresh. In contrast to subtree, the value of the complexity parameter C does not need to be given; it is determined by cross-validation.
prune(object, v=5, sd.mult=0.5, plot=TRUE)
prune.treethresh(object, v=5, sd.mult=0.5, plot=TRUE)
prune.wtthresh(object, v=5, sd.mult=0.5, plot=TRUE)
object |
An object of the class treethresh or
wtthresh according to which thresholding is to be
carried out. |
v |
The number of folds in the cross-validation used to determine the optimal subtree in the pruning step (see below for details). |
sd.mult |
The smallest subtree whose loglikelihood is no more than sd.mult standard errors worse than the best loglikelihood is chosen as the optimal tree in the pruning step (see below for details). |
plot |
If plot=TRUE a plot of the relative predicted
loglikelihood estimated in the cross-validation against the complexity
parameter C is produced. |
... |
additional arguments (see above for supported arguments). |
The tree grown by treethresh or wtthresh often has too many partitions and therefore overfits. The resulting tree has to be 'pruned', i.e. the branches corresponding to the least important regions have to be 'snipped off'.
As the TreeThresh model is a special case of a classification and regression tree, there exists a sequence of nested subtrees (i.e. a sequence of nested partitions) that maximises the regularised loglikelihood
l + alpha * #partitions.
The parameter alpha controls the complexity of the resulting partition. For alpha=0 no pruning is carried out. If a large enough alpha is chosen, only the root node of the tree is retained, i.e. no partitioning is done. Denote this value of alpha by alpha_0. The complexity parameter can thus be rescaled to
C = alpha / alpha_0
yielding a complexity parameter ranging from 0 (no pruning) to 1 (only retain the root node).
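The rescaling above can be illustrated with a minimal base R sketch. The values of alpha are hypothetical and are not produced by the package; the only assumption is that the largest value in the sequence is alpha_0, the value at which only the root node remains.

```r
# Hypothetical sequence of alpha values at which the nested subtrees change.
alpha <- c(0, 0.5, 2, 4)

# alpha_0: the value at which only the root node is retained (assumed here
# to be the largest value in the sequence).
alpha_0 <- max(alpha)

# Rescaled complexity parameter: 0 means no pruning, 1 means root node only.
C <- alpha / alpha_0
print(C)
```

Working on the C scale makes values comparable across trees, because the range is always 0 to 1 regardless of the size of alpha_0.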
The optimal value of the complexity parameter C (or, equivalently, alpha) depends on the problem at hand and thus has to be chosen carefully. prune estimates the optimal complexity parameter C by v-fold cross-validation. If sd.mult=0, the value of C that yields the highest predictive loglikelihood in the cross-validation is used to prune the tree object. If sd.mult is not 0, the largest C whose predictive loglikelihood is no more than sd.mult standard errors worse than that of the best C is used.
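The selection rule can be sketched in base R as follows. This is an illustration only, not the package's implementation: the data frame cv and the function choose_C are hypothetical, and the sketch assumes that the standard error used for the cut-off is the one observed at the best C.

```r
# Hypothetical cross-validation summary, one row per candidate value of C.
cv <- data.frame(
  C      = c(0, 0.1, 0.25, 0.5, 1),
  loglik = c(-105.0, -101.0, -100.0, -100.6, -104.0),  # mean predictive loglikelihood
  se     = c(1.5, 1.2, 1.4, 1.1, 1.4)                   # standard error across folds
)

# Hypothetical helper: pick the largest C whose loglikelihood is within
# sd.mult standard errors of the best one. With sd.mult = 0 this reduces
# to picking the C with the highest loglikelihood.
choose_C <- function(cv, sd.mult = 0.5) {
  best   <- which.max(cv$loglik)
  cutoff <- cv$loglik[best] - sd.mult * cv$se[best]
  max(cv$C[cv$loglik >= cutoff])
}

choose_C(cv, sd.mult = 0)    # 0.25, the best C
choose_C(cv, sd.mult = 0.5)  # 0.5, a smaller tree that is nearly as good
```

A positive sd.mult favours a larger C, and hence a smaller tree, whenever the loss in predictive loglikelihood is small relative to its estimation error.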
prune returns an object of the class treethresh or wtthresh that contains a tree pruned at value C (see above for details on the pruning process).
prune.treethresh and prune.wtthresh should rarely be called directly by the user. The more user-friendly S3 generic prune takes care of calling the right method.
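The dispatch mechanism can be illustrated with a toy generic in base R. The names prune_demo, prune_demo.treethresh and prune_demo.wtthresh are hypothetical stand-ins and are not part of the package; the sketch only shows how an S3 generic selects a method from the class of its first argument.

```r
# Toy generic: UseMethod() looks up a method matching the class of 'object'.
prune_demo <- function(object, ...) UseMethod("prune_demo")

# One method per class, mirroring the generic/method naming pattern.
prune_demo.treethresh <- function(object, ...) "treethresh method"
prune_demo.wtthresh   <- function(object, ...) "wtthresh method"

# Empty placeholder objects that carry only a class attribute.
a <- structure(list(), class = "treethresh")
b <- structure(list(), class = "wtthresh")

prune_demo(a)  # "treethresh method"
prune_demo(b)  # "wtthresh method"
```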
For an example of the use of prune, see coefficients.
treethresh, wtthresh, get.t, prune