FRBhotellingS {FRB}    R Documentation

Robust Hotelling test using the S-estimator

Description

Robust one-sample and two-sample Hotelling test using the S-estimator and the Fast and Robust Bootstrap.

Usage

FRBhotellingS(Xdata, Ydata=NULL, mu0 = 0, R = 999, bdp = 0.5, conf = 0.95, 
                method = c("HeFung", "pool"), control=Scontrol(...), ...)

Arguments

Xdata a matrix or data-frame
Ydata an optional matrix or data-frame in case of a two-sample test
mu0 an optional vector (or a single number, which is repeated p times) giving the hypothesized value of the mean; it does not apply to the two-sample test. Default is the null vector, mu0=0
R number of bootstrap samples. Default is R=999
bdp required breakdown point. Must satisfy 0 < bdp <= 0.5; the default is 0.5
conf confidence level for the simultaneous confidence intervals. Default is conf=0.95
method for the two-sample Hotelling test, indicates the way the common covariance matrix is estimated: "pool"= pooled covariance matrix, "HeFung"= using the He and Fung method
control a list with control parameters for tuning the computing algorithm, see Scontrol().
... allows for specifying control parameters directly instead of via control

Details

The classical Hotelling test for testing if the mean equals a certain center or if two means are equal is modified into a robust one through substitution of the empirical estimates by the S-estimates of location and scatter. The S-estimator uses Tukey's biweight function where the constant is chosen to obtain the desired breakdown point as specified by bdp. The S-estimator is computed by a call to Sest_loccov() or Sest_twosample(), depending on the type of test. These functions implement a fast-S-type algorithm, the tuning parameters of which can be changed via control.
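
As a point of reference, the classical one-sample statistic that the robust test modifies can be sketched in base R. This is not the code used by FRBhotellingS(): the helper name classical_T2 is hypothetical, and the robust version substitutes the S-estimates of location and scatter for the sample mean and covariance (its exact scaling is not reproduced here).

```r
# Classical one-sample Hotelling T^2 (illustrative helper, not part of FRB).
classical_T2 <- function(X, mu0 = 0) {
  X <- as.matrix(X)
  n <- nrow(X)
  p <- ncol(X)
  mu0 <- rep(mu0, length.out = p)   # a single number is repeated p times
  xbar <- colMeans(X)               # empirical location estimate
  S <- cov(X)                       # empirical scatter estimate
  d <- xbar - mu0
  # T^2 = n * (xbar - mu0)' S^{-1} (xbar - mu0)
  as.numeric(n * crossprod(d, solve(S, d)))
}

# Small made-up data set, used only to exercise the helper.
X <- cbind(a = c(1, 2, 3, 4), b = c(2, 1, 4, 3))
classical_T2(X, mu0 = c(2.5, 2.5))  # mu0 equals the sample mean here
classical_T2(X)                     # default null hypothesis mu0 = 0
```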

The fast and robust bootstrap is used to mimic the distribution of the test statistic under the null hypothesis. For instance, the 5% critical value for the test is given by the 95% quantile of the recalculated statistics.
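
The way a critical value and p-value follow from the recalculated statistics can be sketched in base R. The numbers below are made up and merely stand in for the teststat and teststat.boot components; the p-value formula shown is one common bootstrap convention and is an assumption, since the exact variant used inside FRB is not stated here.

```r
# Hypothetical values standing in for the teststat and teststat.boot components.
teststat <- 9.2
teststat.boot <- c(1.1, 0.4, 3.7, 12.5, 2.2, 5.9, 0.8, 7.3, 10.1, 4.6)

# 5% critical value: the 95% quantile of the recalculated statistics.
crit <- quantile(teststat.boot, probs = 0.95, names = FALSE)

# Bootstrap p-value: share of recalculated statistics at least as large as
# the observed one (assumed convention; FRB may use a variant).
pval <- mean(teststat.boot >= teststat)

# Reject at the 5% level when the observed statistic exceeds the critical value.
reject <- teststat > crit
```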

Robust simultaneous confidence intervals for linear combinations of the mean (or difference in means) are developed similarly to the classical case (Johnson and Wichern, 1988, page 239). The value CI is a matrix with the confidence intervals for each element of the mean (or difference in means), with level conf. It consists of two rows, the first being the lower bound and the second the upper bound. Note that these intervals are rather conservative in the sense that the simultaneous confidence level holds for all linear combinations and here only p of these are considered (with p the dimension of the data).
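
A minimal sketch of how the CI layout described above can be read, using a hand-built matrix with hypothetical variable names and bounds rather than actual FRBhotellingS() output:

```r
# Hypothetical CI matrix with the documented layout:
# row 1 = lower bounds, row 2 = upper bounds, one column per variable.
CI <- rbind(lower = c(V1 = -0.3, V2 = 1.2, V3 = -2.0),
            upper = c(V1 =  0.5, V2 = 2.8, V3 = -0.4))
mu0 <- rep(0, ncol(CI))   # hypothesized mean, here the null vector

# TRUE where the hypothesized value lies outside the interval.
outside <- mu0 < CI[1, ] | mu0 > CI[2, ]
names(which(outside))
```

Components whose interval excludes the hypothesized value are the ones that contribute visibly to a rejection.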

For the two-sample Hotelling test we assume that the samples have an underlying distribution with the same covariance matrix. This covariance matrix can be estimated in two different ways: using the pooled covariance matrix or using the two-sample estimator of He and Fung (2000). Argument method defaults to "HeFung", the first value listed in the Usage signature. For more details see Roelant et al. (2008).

In the two-sample version, the null hypothesis always states that the two means are equal. For the one-sample version, the default null hypothesis is that the mean equals zero, but the hypothesized value can be changed and specified through argument mu0.

Bootstrap samples are discarded if the fast and robust covariance estimate is not positive definite, so the actual number of recalculations used can be lower than R. This number is returned as ROK.
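
The relation between R and ROK can be illustrated with a small base R sketch. The helper is_pos_def and the three matrices are hypothetical; the eigenvalue check is one standard way to test positive definiteness and is not necessarily the test FRB applies internally.

```r
# Illustrative positive-definiteness check via eigenvalues (not FRB's code).
is_pos_def <- function(S, tol = 1e-8) {
  ev <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
  all(ev > tol)
}

# Hypothetical bootstrap covariance estimates; the second one is singular.
boot_covs <- list(diag(2),
                  matrix(1, 2, 2),
                  matrix(c(2, 0.5, 0.5, 1), 2, 2))

keep <- vapply(boot_covs, is_pos_def, logical(1))
R   <- length(boot_covs)  # number of bootstrap samples requested
ROK <- sum(keep)          # number actually used after discarding
```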

See print.FRBhot for details on the output.

Value

An object of class FRBhot, which is a list containing the following components:

pvalue p-value of the robust one or two-sample Hotelling test, determined by the fast and robust bootstrap
teststat the value of the robust test statistic.
teststat.boot the bootstrap recalculated values of the robust test statistic.
Mu center of the sample in case of one-sample Hotelling test
Mu1 center of the first sample in case of the two-sample Hotelling test
Mu2 center of the second sample in case of the two-sample Hotelling test
Sigma covariance of one sample or common covariance matrix in the case of two samples
CI bootstrap simultaneous confidence intervals for each component of the center
conf a copy of the conf argument
data the names of the Xdata and possibly Ydata object
meth a character string giving the estimator that was used
X, Y copies of the Xdata and Ydata arguments as matrices
w implicit weights corresponding to the S-estimates (i.e. final weights in the RWLS procedure at the end of the fast-S algorithm)
outFlag outlier flags: 1 if the robust distance of the observation exceeds the .975 quantile of (the square root of) the chi-square distribution with degrees of freedom equal to the dimension of Xdata; 0 otherwise
ROK number of bootstrap samples actually used (i.e. not discarded due to a non-positive definite covariance estimate)
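
The cutoff rule behind outFlag can be sketched in base R. The data are simulated, and the classical mean and covariance are used here only as a stand-in for the S-estimates, so the distances are not the robust distances that FRBhotellingS() computes.

```r
set.seed(1)
X <- matrix(rnorm(100), ncol = 2)   # 50 simulated observations, 2 variables
X[1, ] <- c(10, -10)                # planted outlier
p <- ncol(X)

# Distances from classical estimates (stand-in for the S-estimates).
d <- sqrt(mahalanobis(X, colMeans(X), cov(X)))

# Cutoff: square root of the .975 quantile of the chi-square distribution
# with degrees of freedom equal to the dimension of the data.
cutoff <- sqrt(qchisq(0.975, df = p))

# 1 if the distance exceeds the cutoff, 0 otherwise.
outFlag <- as.integer(d > cutoff)
```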

Author(s)

Ella Roelant and Gert Willems

References

See Also

plot.FRBhot, print.FRBhot, FRBhotellingMM, Scontrol

Examples

## One sample robust Hotelling test
data(delivery)
delivery.x <- delivery[,1:2]
FRBhotellingS(delivery.x)

## One sample robust Hotelling test with a user-specified mu0
data(ForgedBankNotes)
samplemean <- apply(ForgedBankNotes, 2, mean)
res = FRBhotellingS(ForgedBankNotes, mu0=samplemean)
res
# Note that the test rejects the hypothesis that the true mean equals the
# sample mean; this is due to outliers in the data (i.e. the robustly estimated
# mean apparently differs significantly from the non-robust sample mean).

# Graphical display of the results:
plot(res)
# It is clear from the (scaled) simultaneous confidence limits that the rejection
# of the hypothesis is due to the differences in variables Bottom and Diagonal

# For comparison, the hypothesis would not be rejected if only the first three
# variables were considered:
res = FRBhotellingS(ForgedBankNotes[,1:3], mu0=samplemean[1:3])
plot(res)

## Two sample robust Hotelling test
data(hemophilia)
grp <- as.factor(hemophilia[,3])
x <- hemophilia[which(grp==levels(grp)[1]),1:2]
y <- hemophilia[which(grp==levels(grp)[2]),1:2]

# Using the pooled covariance matrix to estimate the common covariance matrix
res = FRBhotellingS(x, y, method="pool")

# Using the estimator of He and Fung to estimate the common covariance matrix
res = FRBhotellingS(x, y, method="HeFung")

# From the confidence limits it can be seen that the significant difference
# is mainly caused by the AHFactivity variable. The graphical display helps too:
plot(res)
# the red line on the histogram indicates the test statistic value in the original
# sample (it is omitted if the statistic exceeds 100)


[Package FRB version 1.6 Index]