minForest {gRapHD} | R Documentation |
Returns the forest that minimises the -2*log-likelihood, AIC, or BIC.
minForest(dataset,homog=TRUE,forbEdges=NULL,stat="BIC")
dataset |
matrix or data frame (nrow(dataset) observations and
ncol(dataset) variables). |
homog |
TRUE for a homogeneous covariance structure, FALSE
for heterogeneous. This is only meaningful with mixed models.
Default is homogeneous (TRUE). |
forbEdges |
edges that should not be considered, given as a matrix with 2
columns, each row representing one edge, and each column one
of the vertices in the edge. Default is NULL. |
stat |
measure to be minimized: "LR" (-2*log-likelihood), "AIC", or "BIC".
Default is "BIC". It can also be a user-defined function of the form
FUN(newEdge,varType,numCat,dataset), where the parameters varType
and numCat are as defined in the Value section, newEdge is a vector
of length two, and dataset is a matrix (n by p). |
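As an illustration of the forbEdges format, the sketch below builds a two-column matrix that forbids every edge between two groups of variables. The group indices (1:3 and 4:5) are arbitrary assumptions chosen for the example, and placing the lower index first in each row is a convention borrowed from the edges component of the result, not a documented requirement.

```r
# Hypothetical setting: forbid every edge between variables 1-3 and 4-5.
groupA <- 1:3
groupB <- 4:5

# One row per forbidden edge, one column per vertex of the edge.
forb <- as.matrix(expand.grid(v1 = groupA, v2 = groupB))
dimnames(forb) <- NULL

# Lower vertex index first (a convention assumed here, mirroring 'edges').
forb <- t(apply(forb, 1, sort))

dim(forb)   # 6 rows (edges), 2 columns (vertices)

# The matrix would then be passed as, for example:
# m <- minForest(dataset, forbEdges = forb, stat = "BIC")
```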
Returns the tree or forest that minimizes the -2*log-likelihood, AIC, or
BIC. If the log-likelihood is used, the result is a tree; if AIC or BIC is used,
the result is a tree or a forest. The dataset contains variables (vertices) in
the columns and observations in the rows. The vertices in the result are
numbered according to the column indexes in dataset.
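To see why LR gives a tree while AIC and BIC may give a forest, it helps to look at the weight of a single edge. The sketch below assumes two continuous variables, reuses the edge weight computed by the user-defined function in the examples, and applies the standard definitions AIC = -2*log-likelihood + 2k and BIC = -2*log-likelihood + log(n)k with k = 1 parameter per edge. The simulated data and the helper name edgeWeight are illustrative assumptions; this is not the package's internal code.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)   # correlated with x
z <- rnorm(n)             # generated independently of x

# Reduction in -2*log-likelihood from joining two continuous variables
# (same expression as in userFun in the examples).
edgeWeight <- function(a, b) {
  sigma <- var(cbind(a, b))
  length(a) * log(prod(diag(sigma)) / det(sigma))
}

w_xy <- edgeWeight(x, y)
w_xz <- edgeWeight(x, z)

# Equivalent closed form for a pair of continuous variables: -n*log(1 - r^2)
all.equal(w_xy, -n * log(1 - cor(x, y)^2))

# The LR weight is never negative, so under LR adding an edge never hurts
# and the search only stops at a spanning tree. With a penalty, an edge
# helps only if its weight exceeds the penalty for its k extra parameters.
k <- 1
w <- c(xy = w_xy, xz = w_xz)
w                  # raw LR weights
w > 2 * k          # worth adding under AIC?
w > log(n) * k     # worth adding under BIC?
```

Edges whose weight does not exceed the penalty are left out, which is how the result can split into several trees.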
All discrete variables must be factors. All factor levels must be represented in
the data. Missing values are not allowed.
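These requirements can be checked before calling minForest. The following is a minimal sketch in base R, assuming a small made-up data frame d in which one discrete variable is stored as character and one factor has an unused level; none of these names come from the package.

```r
# Made-up data: one continuous column, one discrete column stored as
# character, and one factor with an unused level ("c").
d <- data.frame(
  height = c(1.2, 3.4, 2.2, 5.1),
  group  = c("a", "b", "a", "b"),
  type   = factor(c("x", "y", "x", "y"), levels = c("x", "y", "c")),
  stringsAsFactors = FALSE
)

# All discrete variables must be factors.
isChar <- vapply(d, is.character, logical(1))
d[isChar] <- lapply(d[isChar], factor)

# All factor levels must be represented in the data: drop unused levels.
d <- droplevels(d)

# Missing values are not allowed: stop early if any are present.
stopifnot(!any(is.na(d)))

sapply(d, nlevels)   # number of levels per column (0 for continuous)
```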
A list containing:
edges |
matrix with 2 columns, each row representing one edge, and each column one of the vertices in the edge. Column 1 contains the vertex with lower index. |
p |
number of variables (vertices) in the model. |
stat.minForest |
measure used (LR, AIC, or BIC). |
statSeq |
vector with value of stat.minForest for each edge. |
varType |
vector indicating the type of each variable: 0 if continuous, or 1 if discrete. |
numCat |
vector with number of levels for each variable (0 if continuous). |
homog |
TRUE if the covariance is homogeneous. |
numP |
vector with number of estimated parameters for each edge. |
minForest |
first and last edges found with minForest. |
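Because edges is a plain two-column matrix of vertex indices, it converts easily to other graph representations. The sketch below builds a symmetric adjacency matrix and the vertex degrees from a hand-written object that imitates only the edges and p components described above; the values are invented for illustration and are not output from minForest.

```r
# Hand-made stand-in for a minForest result (only 'edges' and 'p').
m <- list(
  edges = matrix(c(1, 2,
                   2, 3,
                   4, 5), ncol = 2, byrow = TRUE),
  p = 5
)

# Symmetric p x p adjacency matrix: mark each edge in both directions.
adj <- matrix(0L, nrow = m$p, ncol = m$p)
adj[m$edges] <- 1L
adj[m$edges[, 2:1, drop = FALSE]] <- 1L

degree <- rowSums(adj)   # number of neighbours of each vertex

# A forest with p vertices and e edges has p - e connected components.
nComponents <- m$p - nrow(m$edges)
```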
Gabriel Coelho Goncalves de Abreu (Gabriel.Abreu@agrsci.dk)
Rodrigo Labouriau (Rodrigo.Labouriau@agrsci.dk)
David Edwards (David.Edwards@agrsci.dk)
Chow, C.K. and Liu, C.N. (1968) Approximating discrete probability distributions
with dependence trees. IEEE Transactions on Information Theory,
Vol. IT-14, 3:462-7.
Edwards, D., de Abreu, G.C.G. and Labouriau, R. (2009). High-dimensional Mixed
Graphical Models Using Minimal AIC and BIC forests. BMC Bioinformatics.
(submitted).
set.seed(7,kind="Mersenne-Twister")
dataset <- matrix(rnorm(1000),nrow=100,ncol=10)
m <- minForest(dataset,stat="BIC")

##############################################################################
# Example with continuous variables
data(dsCont)
# general form: minForest(dataset,homog=TRUE,forbEdges=NULL,stat="LR")
#      1. in this case, there is no use for homog
#      2. no forbidden edges
#      3. the measure used is the LR (the result is a tree)
m1 <- minForest(dsCont,homog=TRUE,forbEdges=NULL,stat="LR")
plotG(model=m1,numIter=1000)

##############################################################################
# Example with discrete variables
data(dsDiscr)
# general form: minForest(dataset,homog=TRUE,forbEdges=NULL,stat="LR")
#      1. in this case, there is no use for homog
#      2. no forbidden edges
#      3. the measure used is the LR (the result is a tree)
m1 <- minForest(dsDiscr,homog=TRUE,forbEdges=NULL,stat="LR")
plotG(model=m1,numIter=1000)

##############################################################################
# Example with mixed variables
data(dsMixed)
# general form: minForest(dataset,homog=TRUE,forbEdges=NULL,stat="LR")
#      1. the covariance structure is taken as homogeneous
#      2. no forbidden edges
#      3. the measure used is the LR (the result is a tree)
m1 <- minForest(dsMixed,homog=TRUE,forbEdges=NULL,stat="LR")
plotG(model=m1,numIter=1000)

##############################################################################
# Example using a user defined function
#   The function userFun calculates the same edge weights as the option
#   stat="LR". It means that the final result, using either option, is the
#   same.
userFun <- function(newEdge,varType,numCat,dataset)
{
  # 2 x 2 sample covariance matrix of the two variables in the edge
  sigma <- var(dataset[,newEdge])
  # edge weight; for two continuous variables this equals -n*log(1-r^2)
  v <- nrow(dataset)*log(prod(diag(sigma))/det(sigma))
  return(c(v,1))
}

data(dsCont)
m <- minForest(dsCont,stat="LR")
m1 <- minForest(dsCont,stat=userFun)
identical(m$edges,m1$edges)