ojaRank {OjaNP} | R Documentation |
The function computes the Oja rank of a point x
w.r.t. a data set X
or, if no
point x
is given, the Oja ranks of all points in X
.
ojaRank(X, x = NULL, p = NULL, silent = FALSE, na.action = na.fail)
X |
numeric data.frame or matrix containing the data points as rows. |
x |
NULL or a numeric vector, the point for which the Oja rank should be computed. |
p |
NULL or a number between 0 and 1 which specifies the fraction of hyperplanes to be used for subsampling. If p = 1 , no subsampling is done.
If p = NULL , the value of p is determined based on the size of the data set. See details. |
silent |
logical, if subsampling is done or the expected computation time is too long, a warning message will be printed unless silent is TRUE . The default is FALSE .
|
na.action |
a function which indicates what should happen when the data contain 'NA's. Default is to fail. |
The function computes the Oja rank of the point x
w.r.t. the data set X
or, if no x
is specified, the Oja ranks of all data points in
X
w.r.t. X
. For a definition of Oja rank see reference below.
The matrix X
needs to have at least as many rows as columns in order to give sensible results.
The vector x
has to be of length ncol(X)
.
If x
is specified, a vector of length ncol(X)
is returned. Otherwise the return value is a matrix of the same dimensions as X
where the i-th row contains the
Oja rank of the i-th row of X
.
The function will also work for matrices X
with only one column and also vectors. Then centered and normalized (univariate) ranks are returned.
For n = nrow(X) data points in R^k, where k = ncol(X), the computation of the Oja rank necessitates the evaluation of N = choose(n,k) hyperplanes in R^k.
Thus for large data sets the function offers a subsampling option in order to deliver (approximate) results within reasonable time. The subsampling fraction is controlled by the parameter
p
:
If p < 1 is passed to the function, the computation is based on a random sample of only pN of all possible N hyperplanes. If p
is not specified
(which defaults to p = NULL
), it is automatically determined based on n and k to yield a sensible trade-off between accuracy and computing time. If
choose(n,k)*k^3 < 6e+06, the sample fraction p
is set to 1 (no subsampling). Otherwise p
is choosen such that the computation
(of one rank) usually takes around 20 seconds (on a 1.66 GHz CPU and 1 GB RAM). If all Oja ranks of X
are requested, a hyperplane sample is drawn once,
all Oja ranks are then computed based on this sample.
Finally, subsampling is feasible. Even for very small p
useable results can be expected, see e.g. the examples for the function ojaRCM
.
either a numeric vector, the Oja rank of x
, or
a matrix of the same dimensions as X
containing the Oja ranks of X
as rows.
Daniel Vogel, Jyrki Möttönen and Claudia Köllmann
Oja, H. (1999), Affine invariant multivariate sign and rank tests and corresponding estimates: A review, Scand. J. Statist., 26, 319–343.
ojaSign
, ojaRCM
, hyperplane
, ojaSignedRank
### ----<< Example 1 >>---- : 30 points in R^2 set.seed(123) X <- rmvnorm(n = 30,mean = c(0,0)) # from package 'mvtnorm' ojaRank(X) ojaRank(X, x = c(100,100)) ojaRank(X, x = ojaMedian(X, alg="exact")) # close to zero # The following two return the same (only in different time) ojaRank(X) t(apply(X, 1, function(y){ojaRank(X,y)})) # but the following two do not (due to different subsampling). # 1) set.seed(123); ojaRank(X, p = 0.9, silent = TRUE) # 2) set.seed(123) t(apply(X, 1, function(y){ojaRank(X, y, p = 0.9, silent = TRUE)})) # In 1) one subsample for all ranks is drawn, whereas in 2) # a different sample for each rank is drawn. ### ----<< Example 2 >>---- : three points in R^3: only one hyperplane # The following commands return the same result. ojaRank(X = diag(rep(1, 3)), x = c(0,0,0)) ojaRank(X = diag(rep(1, 3)), x = c(-100,-110,-5550)) hyperplane(X = diag(rep(1,3)))[1:3] ### ----<< Example 3 >>---- : 300 points in R^7 # Subsampling is done. # The following example might take a bit longer: ## Not run: set.seed(123) X <- rmvnorm(n = 300, mean = rep(0, 7)) system.time(or <- ojaRank(x = 1:7, X = X)) # PLEASE NOTE: The computation of the Oja rank is based on a # random sub-sample of less than 1% of all possible hyperplanes. # # user system elapsed # 18.47 0.00 18.47 print(or,d=4) # [1] 7.733 6.613 6.839 7.383 18.237 21.851 23.700 ## End(Not run) ### ----<< Example 4 >>---- : univariate ranks ojaRank(1:10) ojaRank(X = 1:10, x = 5.5)