nncluster {nnclust} | R Documentation |
Uses Prim's algorithm to build a minimum spanning tree for each
cluster, stopping when the nearest-neighbour distance rises above a
specified threshold. Returns a set of clusters and a set of 'outliers'
not in any cluster. trimCluster tidies up the output by removing
small clusters; clusterMember returns cluster membership for the
original data points.
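The idea of growing a tree with Prim's algorithm and stopping at a distance threshold can be sketched in base R. This is a minimal illustration, not the package's implementation: the function name grow_cluster, its arguments, and its return value are hypothetical, and it uses a brute-force distance scan rather than a fast nearest-neighbour finder.

```r
# Grow ONE cluster from a seed row using Prim's algorithm, stopping when the
# closest remaining point is farther than `threshold` (squared Euclidean).
# Hypothetical helper for illustration; not part of nnclust.
grow_cluster <- function(x, seed, threshold) {
  n <- nrow(x)
  in_tree <- rep(FALSE, n)
  in_tree[seed] <- TRUE
  # best[i]: squared distance from point i to its closest point in the tree
  best <- rowSums(sweep(x, 2, x[seed, ])^2)
  parent <- rep(seed, n)
  edges <- matrix(integer(0), ncol = 2)
  repeat {
    cand <- which(!in_tree)
    if (length(cand) == 0) break
    j <- cand[which.min(best[cand])]
    if (best[j] > threshold) break        # nearest point is too far: stop
    edges <- rbind(edges, c(parent[j], j))
    in_tree[j] <- TRUE
    # Update the closest-tree-point distances now that j has joined the tree
    dj <- rowSums(sweep(x, 2, x[j, ])^2)
    closer <- !in_tree & dj < best
    best[closer] <- dj[closer]
    parent[closer] <- j
  }
  list(rows = which(in_tree), edges = edges)
}

# Three nearby points and two distant ones
x <- rbind(c(0, 0), c(0.1, 0), c(0.2, 0), c(5, 5), c(5.1, 5))
cl <- grow_cluster(x, seed = 1, threshold = 0.05)
cl$rows   # rows that joined the cluster grown from row 1
cl$edges  # tree edges as (from, to) row pairs
```

Restarting from an unclustered point after each stop would give further clusters; points that never join a tree of useful size play the role of the outlier set.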
nncluster(x, threshold, fill = 0.95, maxclust = 20, give.up = 500, verbose = FALSE)
trimCluster(nnclust, size = 10)
clusterMember(nnclust, outlier = TRUE)
x |
data matrix |
threshold |
Threshold for stopping the tree building within a cluster. Tree building stops when the squared Euclidean distance from the tree to its closest remaining point is greater than this. |
fill |
Stop when the clusters make up this fraction of the data. |
maxclust |
Stop at this many clusters |
give.up |
Stop when fewer than this many pairs have a nearest-neighbour distance less than threshold. |
verbose |
Print some cluster summaries before restarting? |
nnclust |
An object of class nncluster, returned by nncluster |
size |
Clusters smaller than this are added to the 'outlier' set |
outlier |
If FALSE, use NA for the cluster identifier for outliers |
Works best for well-separated clusters in up to 8 dimensions, and sample sizes up to hundreds of thousands.
If you want a complete minimum spanning tree, run mst on the outlier
set and then use nnfind to find the shortest links connecting the
clusters. When there are well-separated clusters this will be faster
than running mst once on the whole data set.
A list of class nncluster. Each element but the last describes a
cluster, with components mst containing the tree, x containing the
data, and rows containing row numbers in the initial data set.
The last element describes the unclustered outliers and has no mst
component.
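To make the layout concrete, the sketch below builds a small hand-made list with the documented component names and derives a membership vector from the rows components. The object mock is a stand-in, not real nncluster output; the mst components are left out because their format is not described here; membership_from_rows is a hypothetical helper; and labelling outliers with the index of the last element when outlier = TRUE is an assumption, since only the NA case is documented. For real objects, use clusterMember.

```r
# Hand-built stand-in for the documented structure (NOT real nncluster output).
# The mst components are omitted because their format is not specified here.
mock <- list(
  list(x = matrix(0, 3, 2), rows = c(1, 2, 5)),  # cluster 1
  list(x = matrix(0, 2, 2), rows = c(3, 6)),     # cluster 2
  list(x = matrix(0, 1, 2), rows = 4)            # last element: outliers
)

# Hypothetical helper: map each original row to the index of its list element.
# Assumption: with outlier = TRUE, outliers get the index of the last element.
membership_from_rows <- function(obj, n, outlier = TRUE) {
  id <- rep(NA_integer_, n)
  k <- length(obj)
  for (i in seq_len(k)) {
    if (i == k && !outlier) next  # leave outliers as NA
    id[obj[[i]]$rows] <- i
  }
  id
}

with_outliers <- membership_from_rows(mock, n = 6)
na_outliers   <- membership_from_rows(mock, n = 6, outlier = FALSE)
```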
The performance of this algorithm depends critically on the performance of the nearest-neighbour finder, and can decay catastrophically if too many uninformative variables are added.
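One commonly cited reason that uninformative variables hurt nearest-neighbour methods is that distances concentrate as dimensions are added: the nearest neighbour ends up almost as far away as a typical point, so a distance threshold separates clusters less sharply. The base R simulation below illustrates that general effect only; it says nothing about how the package's nearest-neighbour finder is implemented, and the helper names nn_ratio and add_noise are hypothetical.

```r
set.seed(1)
n <- 200

# Two well-separated clusters in 2 informative dimensions (100 rows each)
informative <- rbind(matrix(rnorm(n, mean = 0, sd = 0.3), ncol = 2),
                     matrix(rnorm(n, mean = 5, sd = 0.3), ncol = 2))

# Ratio of mean nearest-neighbour distance to mean pairwise distance.
# Values near 0 mean neighbours are much closer than typical points;
# values near 1 mean the nearest neighbour is barely closer than average.
nn_ratio <- function(x) {
  d <- as.matrix(dist(x))
  diag(d) <- Inf                 # ignore self-distances
  nn <- apply(d, 1, min)
  mean(nn) / mean(d[is.finite(d)])
}

# Append p columns of pure standard-normal noise
add_noise <- function(x, p) {
  cbind(x, matrix(rnorm(nrow(x) * p), nrow = nrow(x), ncol = p))
}

ratios <- c(
  p0  = nn_ratio(informative),
  p5  = nn_ratio(add_noise(informative, 5)),
  p50 = nn_ratio(add_noise(informative, 50))
)
ratios
```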
Thomas Lumley
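x <- scale(faithful)  # standardise both columns
a <- nncluster(x, threshold = 0.1, give.up = 0, fill = 1)
a  # print the resulting nncluster object
id <- clusterMember(a)  # cluster identifier for each row of x
plot(faithful, col = id, pch = 19)  # colour points by cluster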