knn.var {knnTree}                                            R Documentation
Description:

Construct or predict with k-nearest-neighbor classifiers, using cross-validation to select variables by forward or backward selection, to choose the best value of k, and to choose between scaling methods.
Usage:

knn.var(train, test, k.vec = seq(1, 31, by = 2), theyre.the.same = FALSE,
        return.all.rates = FALSE, scaling = 1, backward = FALSE,
        max.steps = -1, save.call = FALSE, verbose = 0, use.big = TRUE)
Arguments:

train: data frame or matrix of training data, with classifications in column 1.

test: data frame or matrix of test data, with classifications in column 1. Optional if theyre.the.same = TRUE.

k.vec: numeric vector of candidate values of k, the number of nearest neighbors.

theyre.the.same: logical describing whether train and test are the same data set. If TRUE, test is ignored and leave-one-out cross-validation is used. This will normally be TRUE when building the classifier and FALSE when predicting.

return.all.rates: logical; TRUE if all error rates (that is, one for every element of k.vec) should be returned. If FALSE, only the smallest is returned.

scaling: numeric describing the scaling technique: 0 means do no scaling; 1 means choose between no scaling and scaling each column by its SD; 2 means choose between no scaling and scaling each column by its MAD.

backward: logical describing the variable-selection technique. TRUE means start with all variables and delete them one at a time until there is no improvement; FALSE means start with no variables and add them one at a time.

max.steps: numeric giving the maximum number of selection steps to take. If negative, continue until there is no improvement. Default: -1.

save.call: logical; TRUE if a copy of the call should be saved in the resulting object.

verbose: numeric, for debugging. If verbose is 0, no diagnostic output is produced. If verbose > 0, diagnostic output (more as the value increases) is written to a file called "status.txt" in the HOME directory. When verbose is 2 or (especially) 3 this file may become very large.

use.big: logical; TRUE if the C code should try to use a technique that uses more memory but runs faster.
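As a point of reference for the scaling argument (this is an illustrative sketch in base R, not code from knnTree itself), the two candidate scaling factors correspond to dividing each predictor column by its standard deviation or by its median absolute deviation:

```r
# Illustrative only: the two scaling choices correspond to dividing each
# predictor column by its SD (scaling = 1) or its MAD (scaling = 2).
data(iris)
X <- iris[, 1:4]                      # predictors only

col.sds  <- apply(X, 2, sd)           # factors when columns are SD-scaled
col.mads <- apply(X, 2, mad)          # factors when columns are MAD-scaled

X.sd  <- sweep(X, 2, col.sds,  "/")   # SD-scaled predictors
X.mad <- sweep(X, 2, col.mads, "/")   # MAD-scaled predictors
```

MAD-based scaling is the more robust choice when the predictors contain outliers, since the MAD is unaffected by a small number of extreme values.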
Details:

knn.var constructs a k-nearest-neighbor classifier using the Euclidean metric. Leave-one-out cross-validation, together with stepwise (forward or backward, but not both) selection, is used to find the best set of variables to include, the best choice of k, and whether the data should be scaled.
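The selection strategy described above can be sketched in plain R. This is an illustrative re-implementation using knn.cv from the class package (not the package's own C code): greedy forward selection, where each candidate variable set is scored by its best leave-one-out error over a grid of k values.

```r
library(class)   # provides knn.cv(): leave-one-out k-nearest-neighbor

set.seed(1)      # knn.cv breaks voting ties at random
data(iris)
X <- as.matrix(iris[, 1:4])
y <- iris$Species
k.vec <- seq(1, 31, by = 2)

# Leave-one-out error rate for a given column set and a given k
loo.rate <- function(cols, k)
  mean(knn.cv(X[, cols, drop = FALSE], y, k = k) != y)

# Greedy forward selection: add one variable at a time while the best
# achievable LOO error (over all k in k.vec) keeps improving.
selected <- integer(0)
best.err <- Inf
repeat {
  candidates <- setdiff(seq_len(ncol(X)), selected)
  if (length(candidates) == 0) break
  step.err <- sapply(candidates, function(j)
    min(sapply(k.vec, function(k) loo.rate(c(selected, j), k))))
  if (min(step.err) >= best.err) break        # no improvement: stop
  best.err <- min(step.err)
  selected <- c(selected, candidates[which.min(step.err)])
}
```

Backward selection is the mirror image: start with all columns selected and drop the variable whose removal most reduces the LOO error, stopping when no removal helps.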
Value:

Object of class knn. This is a list with between six and eight of the following components:

which: logical vector, one per input variable; the i-th element of which is TRUE if the i-th input variable is included in the classifier.

rate: smallest misclassification rate achieved by the algorithm. If return.all.rates is TRUE, this is a vector of error rates, one for each element of k.vec.

best.k: number giving the optimal value of k, chosen from among the elements of k.vec.

scaled: numeric indicator of the best scaling: 0 means no scaling was used; 1 means each column was scaled by its SD; 2 means each column was scaled by its MAD.

n: the number of observations in the training set.

col.sds: numeric vector of scaling factors, present only if scaled != 0. If scaled = 1 these are column SDs; if scaled = 2 they are MADs.

pure: logical, TRUE if every item in the training set had the same class. If a training set is pure then all the elements of which are FALSE, best.k is taken to be the first element of k.vec, and scaled is set to 0.

call: a copy of the call used to create the object, present only if save.call was TRUE.
Author(s):

Sam Buttrey <buttrey@nps.navy.mil>
References:

Buttrey and Karo, 2002.
Examples:

data(iris)
set.seed(3)
samp <- sample(nrow(iris), size = 75, replace = FALSE)
knn.var(iris[samp, c(5, 1:4)])    # Build classifier
# Output produced by print.knn
## Not run:
This knn classifier is based on 75 observations.
It uses 1 out of 4 variables without scaling.
Training rate is 0.01333, achieved at k = 1
## End(Not run)

iris.knn <- knn.var(iris[samp, c(5, 1:4)])    # Build and save, then predict
predict(iris.knn, iris[-samp, c(5, 1:4)], iris[samp, c(5, 1:4)])
## Not run:
$rate
[1] 0.08
## End(Not run)