geneclust {Geneclust} | R Documentation |
Main function of the "geneclust" package. Performs inference of population structure using spatial coordinates and multilocus genotypes.
geneclust(project.name = "Data", data, npopmax = 3, psi = 0.5, nit = 1000, burnin = 10, thinning = 1, c = NULL, freq = NULL, tabcst = NULL, matngh = NULL, fis = rep(0, npopmax), varpsi = FALSE, varfis = FALSE, otherconfig=NULL, write=FALSE)
project.name |
Path to the directory where output files are written. |
data |
Object of class "geneclustdata". A data frame that contains the individuals' locations and their genotypes. |
npopmax |
An initial number of clusters. May be different from the final number of clusters. |
psi |
A (numeric) value for the spatial interaction parameter. The
psi = 0 option corresponds to an implementation of the
STRUCTURE algorithm.
Typical values are between 0 and 1. |
nit |
Number of MCMC cycles. One cycle visits each locus and each individual. |
burnin |
Number of cycles corresponding to the MCMC burnin period (Markov chain internal parameter). |
thinning |
Number of recorded cycles (Markov chain internal parameter). |
c |
A vector containing starting cluster labels for the sample. If c = NULL, the starting labels are drawn from the uniform distribution on (1, ..., npopmax). |
freq |
A three-dimensional array which contains the initial allele frequencies in each cluster, for each locus and each allele. If freq = NULL, the initial allele frequencies are randomly sampled according to the Dirichlet distribution D(1,1,..,1). |
tabcst |
If varpsi = TRUE and tabcst = NULL, the program computes a
partition function table (cf. function tablecst), which may take
time.
A vector with 11 components is returned and used afterwards for the inference of the Potts-Dirichlet spatial interaction parameter psi. |
matngh |
A binary matrix which defines the neighbourhoods to be
used in the Potts prior model.
If matngh[i,j] = 1 then i and j are neighbours. If matngh = NULL, the matrix is computed from the Delaunay graph via the package deldir. |
fis |
A vector with npopmax components containing the initial
values for inbreeding coefficients. If fis = NULL, the initialization is at random. |
varpsi |
Logical: if varpsi = TRUE, the spatial interaction
parameter psi is treated as an
unknown parameter and varies along the MCMC run. If varpsi = FALSE, then psi is kept fixed. |
varfis |
Logical: if varfis = TRUE, the inbreeding coefficients
are treated as unknown parameters and
vary along the MCMC run. If varfis = FALSE, they are kept fixed to the initial value fis. |
otherconfig |
A spatial configuration of individuals to compare with the posterior spatial configuration reached after the MCMC run. For example, it could be a configuration obtained with a non-Bayesian hierarchical clustering algorithm such as the Ward reconstruction method. |
write |
Logical: if TRUE, some outputs are written to ASCII files in the directory project.name. |
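The documented defaults for the starting values (c = NULL, freq = NULL, fis = rep(0, npopmax)) can be sketched in base R. This is only an illustration of what those defaults describe, not the package's internal code. The sizes n, nloc and nall, the helper rdirichlet1, and the array layout of freq0 (cluster x locus x allele) are assumptions made for the example.

```r
set.seed(1)
npopmax <- 3   # initial number of clusters
n <- 10        # hypothetical number of individuals
nloc <- 4      # hypothetical number of loci
nall <- 5      # hypothetical number of alleles per locus

# Starting labels drawn uniformly on 1..npopmax (the c = NULL case)
c0 <- sample.int(npopmax, n, replace = TRUE)

# One draw from Dirichlet(1, ..., 1): Gamma(1) variates scaled to sum to 1
rdirichlet1 <- function(k) {
  g <- rgamma(k, shape = 1)
  g / sum(g)
}

# Assumed layout: freq0[cluster, locus, allele] (the freq = NULL case)
freq0 <- array(NA_real_, dim = c(npopmax, nloc, nall))
for (k in seq_len(npopmax)) {
  for (l in seq_len(nloc)) {
    freq0[k, l, ] <- rdirichlet1(nall)
  }
}

# Inbreeding coefficients start at 0, as in the usage default
fis0 <- rep(0, npopmax)
```

Each freq0[k, l, ] is a probability vector over the alleles of locus l in cluster k. With real data the number of alleles usually differs between loci, so the third dimension would have to be padded.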
Bayesian clustering and computations of individual membership
probabilities are performed using an
MCMC algorithm similar to STRUCTURE (Pritchard et al., 2000) implemented in the main functions geneclust
and mcmcgeneclust.
In addition, the package includes the use of Hidden Markov Random Fields (HMRF) priors enabling the simultaneous analysis of spatial coordinates. So the input data include individual genotypes and spatial coordinates.
The HMRF is used as a model for the spatial continuity of genotypes within a population. It contains a spatial interaction parameter psi which represents the intensity with which individual genotypes depend on those of their neighbors. For psi = 0, the program corresponds to another implementation of STRUCTURE. The HMRF assumes an interaction graph for the individuals. In the default implementation the graph is computed as the Dirichlet-Delaunay structure via the package deldir, but the program also accepts modified or user-defined graphs.
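As a rough illustration of how psi acts on a cluster configuration, the sketch below evaluates the unnormalised log-prior of a standard Potts model: psi times the number of neighbouring pairs that share a label. The exact form used by geneclust is not given on this page, so this is an assumption; the toy neighbourhood matrix and the helper potts_logprior are hypothetical.

```r
# Toy binary neighbourhood matrix for 4 individuals on a path 1-2-3-4
matngh <- matrix(0, 4, 4)
matngh[cbind(1:3, 2:4)] <- 1
matngh <- matngh + t(matngh)   # make it symmetric

# Unnormalised Potts log-prior:
# psi * (number of neighbour pairs carrying the same label)
potts_logprior <- function(labels, matngh, psi) {
  same <- outer(labels, labels, "==")
  ut <- upper.tri(matngh)      # count each pair once
  psi * sum(matngh[ut] * same[ut])
}

potts_logprior(c(1, 1, 2, 2), matngh, psi = 0.5)  # two agreeing neighbour pairs
potts_logprior(c(1, 2, 1, 2), matngh, psi = 0.5)  # no agreeing neighbour pair
potts_logprior(c(1, 1, 2, 2), matngh, psi = 0)    # psi = 0: every configuration scores 0
```

Larger psi favours configurations in which neighbours share a cluster, while psi = 0 makes all configurations equally likely a priori, which matches the non-spatial case described above.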
The model also assumes linkage equilibrium, but tolerates departures from Hardy-Weinberg (HW) equilibrium using inbreeding coefficients. For psi > 0, the program includes an automatic selection procedure for the actual number of clusters in the population based on Bayesian regularization.
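The way an inbreeding coefficient relaxes Hardy-Weinberg proportions can be illustrated with the usual textbook parameterisation for a biallelic locus. Whether geneclust uses exactly this form is an assumption, and the function name genotype_probs is hypothetical.

```r
# Genotype probabilities at a biallelic locus with allele frequency p
# and inbreeding coefficient f (f = 0 gives Hardy-Weinberg proportions)
genotype_probs <- function(p, f = 0) {
  q <- 1 - p
  c(AA = p^2 + f * p * q,
    Aa = 2 * p * q * (1 - f),
    aa = q^2 + f * p * q)
}

genotype_probs(0.3)           # Hardy-Weinberg proportions
genotype_probs(0.3, f = 0.2)  # fewer heterozygotes than under HW
```

The three probabilities sum to 1 for any f, and a positive f moves probability mass from the heterozygote to the two homozygotes.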
This function does essentially the same as mcmcgeneclust,
but it offers a simplified interface and easier
access to data summaries. Users of the Geneland
package or expert users may prefer the function mcmcgeneclust,
which offers more parameters.
An object of class geneclust.
prob |
A matrix which gives the posterior distribution of the membership coefficients for each individual |
membership |
A vector containing the most likely cluster membership |
K |
Estimated number of clusters (less than the initial number) |
postmodepsi |
The posterior mode of psi |
postmeanfis |
A numerical vector which contains the posterior mean of inbreeding coefficient in each identified cluster |
postquantfis |
A matrix that stores the posterior distribution quantiles of each inbreeding coefficient. Each row corresponds to one cluster |
diffclassif |
A rate of misclassification computed if another spatial configuration is given as argument |
coord |
Individual spatial coordinates |
psi |
Spatial interaction parameter after nit MCMC cycles |
fis |
Inbreeding coefficients after nit cycles |
path |
Path to the MCMC program output data |
c |
Cluster configuration after nit MCMC cycles |
freq |
Allele frequencies after nit MCMC cycles |
matngh |
Neighbourhood matrix |
tabcst |
Partition function table |
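How some of the summary components relate to one another can be sketched with a toy prob matrix. The values are made up. Taking membership as the row-wise argmax of prob, K as the number of occupied clusters, and diffclassif as a simple proportion of disagreeing labels are assumptions for illustration, not a statement about the package internals.

```r
# Toy posterior membership probabilities: 4 individuals, 3 candidate clusters
prob <- matrix(c(0.8, 0.1, 0.1,
                 0.7, 0.2, 0.1,
                 0.1, 0.1, 0.8,
                 0.2, 0.1, 0.7),
               nrow = 4, byrow = TRUE)

# Most likely cluster for each individual (row-wise argmax)
membership <- max.col(prob, ties.method = "first")

# Number of clusters that actually receive at least one individual
K <- length(unique(membership))

# Proportion of individuals labelled differently from another configuration
# (assumes both configurations use the same label numbering)
otherconfig <- c(1, 1, 3, 1)
diffclassif <- mean(membership != otherconfig)
```

In this toy case the second candidate cluster stays empty, so K is smaller than the three clusters the matrix allows for.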
Sophie Ancelet
On mixture models in population genetics:
- J.K. Pritchard, M. Stephens and P. Donnelly, Inference of population structure using multilocus genotype data, Genetics, pp 945-959, vol. 155, 2000
- D. Falush, M. Stephens and J.K. Pritchard, Inference of population structure using multilocus genotype data: Linked loci and correlated allele frequencies, Genetics, pp 1567-1587, vol. 164, 2003
- G. Guillot, A. Estoup, F. Mortier and J.F. Cosson, A spatial statistical model for landscape genetics, Genetics, pp 1261-1280, vol. 170, 2005
- G. Guillot, F. Mortier and A. Estoup, Geneland: A program for landscape genetics, Molecular Ecology Notes, pp 712-715, vol. 5, 2005
On the implementation of MCMC inference on the spatial interaction parameter psi of a Potts-Dirichlet model:
- P. Green and S. Richardson, Hidden Markov models and disease mapping, Journal of the American Statistical Association, pp 1055-1070, vol. 97(460)
On the model (and sub-models) implemented in geneclust:
- O. Francois, S. Ancelet and G. Guillot (2006). Papers in preparation.
Functions mcmcgeneclust, postclassif, postpsi, postfis