integrativeME {integrativeME}R Documentation

Integrative mixture of experts (ME) to combine clinical variables and microarray data in a binary classification framework.

Description

We have implemented and further developed mixture of experts (ME) to combine clinical (categorical) data and microarray data. In order to create a hybrid signature, gene selection is performed in a first step with various approaches. The selected genes and the clinical data are then combined with mixture of experts methodology in a classification framework and the evaluation of the performance is performed via k-fold cross-validation.

Usage

integrativeME(
        data.cat,
        data.cont,
        type,
        select = c('RF', 'student', 'sPLS'),
        method = c('logreg', 'indep', 'loc'),
        loc.ind =  NULL,
        keepX = 5,
        ng = 2,
        mode.sPLS = NULL,
        fold = 10,
        kcv = 1                    # number of 10-fold cv    
)

Arguments

data.cat clinical categorical data. The number of clinical factors is usually very small. Data should be sorted according to the type class vector.
data.cont microarray data. Data should be sorted according to the type class vector.
type vector indicating the class of each observation or sample. The vector should be coded 0 and 1 and sorted.
select gene selection method to be applied in the first step. RF = random forests (wrapper method), student = t-test (filter method) and sPLS = sparse PLS, see also spls to select genes according to the clinical factors.
method variant of ME to use to combine both types of variables, see also MEfunctions.
loc.ind if method = 'loc', then the index of the location variable is needed.
keepX number of genes to select. Should be set to a small value.
ng number of experts to use in ME, set by default to 2 (i.e. the number of classes)
mode.sPLS sPLS mode, 'regression' to select genes according to type, 'canonical' to select genes according to the clinical variables.
fold number of folds in the cross-validation, by default set to 10
kcv numbers of runs of the cross-validation.

Details

Clinical variables should all be categorical and should not contain any 0 values.

The samples in data.cat and data.cont should be ordered according to class vector type.

If method = 'logreg', some warning messages may appear: 'fitted probabilities numerically 0 or 1 occurred' for binomial GLMs, see Venables & Ripley (2002, pp. 197-8).

Value

mean.error classification error rate for each cross-validation run.
mat.predcited predicted class of each observation and each cross-validation run.

Author(s)

Kim-Anh Le Cao

References

Le Cao et al. (2009), submitted.

Ng, S.K. and McLachlan, G.J. (2008). Expert Networks with Mixed Continuous and Categorical Feature Variables: a Location Modeling Approach. Machine Learning Research Progress, ed. Hanna Peters and Mia Vogel, 1–14

Hunt, L. and Jorgensen, M. (1999). Mixture model clustering using the MULTIMIX program. Australian & New Zealand Journal of Statistics, 41, 2, 154–171.

See Also

MEfunctions, kmeans.init

Examples

## Not run: 
data(prostate)
data.cont = prostate$cont
data.cat = prostate$indep
type = prostate$type
type #check that type is sorted
# gene selection with t-test and ME model with a logistic regression in the gatin network:
res = integrativeME(
        data.cat = data.cat,
        data.cont = data.cont,
        type = type,
        select = 'student',
        method = 'logreg',
        keepX = 5,
        ng = 2,
        fold = 10,
        kcv = 1                    
)
## End(Not run)


[Package integrativeME version 1.1 Index]