geneclust {Geneclust} R Documentation

Bayesian inference of population structure using multilocus genotypes and spatial coordinates

Description

Main function of the "geneclust" package. Performs inference of population structure using spatial coordinates and multilocus genotypes.

Usage

geneclust(project.name = "Data", data, npopmax = 3, psi = 0.5, nit = 1000,
          burnin = 10, thinning = 1, c = NULL, freq = NULL, tabcst = NULL,
          matngh = NULL, fis = rep(0, npopmax), varpsi = FALSE, varfis = FALSE,
          otherconfig = NULL, write = FALSE)

Arguments

project.name Path to the directory where output files are written.
data Object of class "geneclustdata": a data frame that contains the individuals' locations and their genotypes.
npopmax An initial number of clusters. May be different from the final number of clusters.
psi A numeric value for the spatial interaction parameter. Setting psi = 0 corresponds to an implementation of the STRUCTURE algorithm. Typical values should be between 0 and 1.
nit Number of MCMC cycles. One cycle visits each locus and each individual.
burnin Number of cycles corresponding to the MCMC burnin period (Markov chain internal parameter).
thinning Number of recorded cycles (Markov chain internal parameter).
c A vector containing starting cluster labels for the sample. If c = NULL, the program starts with labels drawn from the uniform distribution on (1, ..., npopmax).
freq A three-dimensional array which contains the initial allele frequencies in each cluster, for each locus and each allele. If freq = NULL, the initial allele frequencies are randomly sampled according to the Dirichlet distribution D(1, 1, ..., 1).
tabcst A partition function table, used for the inference of the Potts-Dirichlet spatial interaction parameter psi. If varpsi = TRUE and tabcst = NULL, the program computes the table (cf. function tablecst), which may take time; a vector with 11 components is returned and used afterwards.
matngh A binary matrix which defines the neighbourhoods to be used in the Potts prior model. If matngh[i,j] = 1 then i and j are neighbours. If matngh = NULL, the matrix is computed from the Delaunay graph via the package deldir.
fis A vector with npopmax components containing the initial values for the inbreeding coefficients. If fis = NULL, the initialization is at random.
varpsi Logical: if varpsi = TRUE, the spatial interaction parameter psi is treated as an unknown parameter and varies along the MCMC run. If varpsi = FALSE, then psi is kept fixed.
varfis Logical: if varfis = TRUE, the inbreeding coefficients are treated as unknown parameters and vary along the MCMC run. If varfis = FALSE, they are kept fixed to the initial value fis.
otherconfig A spatial configuration of individuals to compare with the posterior spatial configuration reached after the MCMC run. For example, it could be a configuration obtained with a non-Bayesian hierarchical clustering algorithm such as the Ward reconstruction method.
write Logical: if TRUE, some outputs are written to ASCII files in the directory project.name.
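
As a minimal sketch of what the defaults described for c, freq and fis amount to, the base R code below draws uniform starting labels and Dirichlet(1, ..., 1) allele frequencies. The sizes n, nloc and nall, the helper name rdirichlet1, and the dimension order of the array (cluster, locus, allele) are assumptions made for illustration only; the layout expected by the package may differ.

```r
set.seed(1)                      # reproducible illustration

npopmax <- 3                     # initial number of clusters (default in Usage)
n       <- 20                    # hypothetical number of individuals
nloc    <- 4                     # hypothetical number of loci
nall    <- 5                     # hypothetical number of alleles per locus

# Starting labels: one draw per individual, uniform on 1, ..., npopmax.
c.init <- sample.int(npopmax, size = n, replace = TRUE)

# One draw from Dirichlet(1, ..., 1): normalised independent Gamma(1) variates.
# (rdirichlet1 is a hypothetical helper, not a function of the package.)
rdirichlet1 <- function(k) {
  g <- rgamma(k, shape = 1)
  g / sum(g)
}

# Assumed layout: freq.init[cluster, locus, allele].
freq.init <- array(0, dim = c(npopmax, nloc, nall))
for (k in seq_len(npopmax)) {
  for (l in seq_len(nloc)) {
    freq.init[k, l, ] <- rdirichlet1(nall)
  }
}

# Inbreeding coefficients at zero, matching the default fis = rep(0, npopmax).
fis.init <- rep(0, npopmax)
```

Within each cluster and locus the sampled frequencies are non-negative and sum to one, which is the only property the sketch relies on.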

Details

Bayesian clustering and computation of individual membership probabilities are performed using an MCMC algorithm similar to STRUCTURE (Pritchard et al., 2000), implemented in the main functions geneclust and mcmcgeneclust.

In addition, the package uses Hidden Markov Random Field (HMRF) priors, which enable the simultaneous analysis of spatial coordinates. The input data therefore include individual genotypes and spatial coordinates.

The HMRF is used as a model for the spatial continuity of genotypes within a population. It contains a spatial interaction parameter psi, which represents the intensity with which individual genotypes depend on those of their neighbours. For psi = 0, the program corresponds to another implementation of STRUCTURE. The HMRF assumes an interaction graph for the individuals. In the default implementation the graph is computed as the Dirichlet-Delaunay structure via the package deldir, but the program allows this graph to be modified or replaced by another one.
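
The role of psi can be illustrated with a small sketch of a standard Potts prior on a user-supplied graph, in which the unnormalised log-prior of a labelling is psi times the number of neighbouring pairs sharing a label. This is the textbook form of the Potts model and is assumed here for illustration; the exact prior used by the package may differ. The helper names edges_to_matngh, potts_logprior and potts_conditional and the toy graph are hypothetical.

```r
# Build a symmetric binary neighbourhood matrix from a two-column edge list.
edges_to_matngh <- function(n, edges) {
  m <- matrix(0L, nrow = n, ncol = n)
  m[edges] <- 1L                          # edge i -> j
  m[edges[, 2:1, drop = FALSE]] <- 1L     # edge j -> i
  m
}

# Hypothetical graph on 5 individuals: the chain 1-2-3-4-5 plus the edge 1-3.
edges  <- rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 5), c(1, 3))
matngh <- edges_to_matngh(5, edges)

# Unnormalised log-prior: psi times the number of neighbouring pairs
# that carry the same cluster label.
potts_logprior <- function(labels, matngh, psi) {
  same <- outer(labels, labels, "==")
  psi * sum(matngh[upper.tri(matngh)] * same[upper.tri(same)])
}

# Conditional distribution of the label of individual i given all other
# labels: proportional to exp(psi * number of neighbours in each cluster).
potts_conditional <- function(i, labels, matngh, psi, npopmax) {
  ngh    <- which(matngh[i, ] == 1)
  counts <- tabulate(labels[ngh], nbins = npopmax)
  w      <- exp(psi * counts)
  w / sum(w)
}

labels  <- c(1, 1, 1, 2, 2)
lp      <- potts_logprior(labels, matngh, psi = 0.5)
p3      <- potts_conditional(3, labels, matngh, psi = 0.5, npopmax = 3)
p3_flat <- potts_conditional(3, labels, matngh, psi = 0,   npopmax = 3)
```

With psi = 0 the conditional distribution is uniform over the npopmax labels, so the spatial information is ignored; with psi > 0 the labels carried by more neighbours receive more prior weight.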

The model also assumes linkage equilibrium, but tolerates departures from Hardy-Weinberg (HW) equilibrium through the use of inbreeding coefficients. For psi > 0, the program includes an automatic selection procedure for the actual number of clusters in the population, based on Bayesian regularization.
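
For illustration, a common way to let an inbreeding coefficient f relax HW equilibrium at a single locus is to give homozygotes a_a the probability f * p_a + (1 - f) * p_a^2 and heterozygotes a_b the probability 2 * (1 - f) * p_a * p_b. The sketch below assumes this standard parameterisation; it is not taken from the package source, and the function name genotype_prob and the allele frequencies are hypothetical.

```r
# Probability of the unordered genotype (a, b) at one locus, given allele
# frequencies p and inbreeding coefficient f (assumed standard form).
genotype_prob <- function(a, b, p, f) {
  if (a == b) {
    f * p[a] + (1 - f) * p[a]^2     # homozygote: excess driven by f
  } else {
    2 * (1 - f) * p[a] * p[b]       # heterozygote: deficit driven by f
  }
}

p <- c(0.5, 0.3, 0.2)               # hypothetical allele frequencies
f <- 0.1                            # hypothetical inbreeding coefficient

# Sum over all unordered genotypes (a <= b); this should equal 1.
total <- 0
for (a in seq_along(p)) {
  for (b in a:length(p)) {
    total <- total + genotype_prob(a, b, p, f)
  }
}

het_hw  <- genotype_prob(1, 2, p, f = 0)   # HW heterozygote probability
het_inb <- genotype_prob(1, 2, p, f = f)   # reduced when f > 0
```

Setting f = 0 recovers the HW proportions, which corresponds to the default fis = rep(0, npopmax) with varfis = FALSE.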

This function does the same as mcmcgeneclust, but offers a simplified interface and easier access to data summaries. Users of the Geneland package, or expert users, may prefer the function mcmcgeneclust, which offers more parameters.

Value

An object of class geneclust.

prob A matrix which gives the posterior distribution of the membership coefficients for each individual
membership A vector containing the most likely cluster membership
K Estimated number of clusters (less than the initial number)
postmodepsi The posterior mode of psi
postmeanfis A numerical vector which contains the posterior mean of inbreeding coefficient in each identified cluster
postquantfis A matrix that stores the posterior distribution quantiles of each inbreeding coefficient. Each row corresponds to one cluster
diffclassif A rate of misclassification computed if another spatial configuration is given as argument
coord Individual spatial coordinates
psi Spatial interaction parameter after nit MCMC cycles
fis Inbreeding coefficients after nit cycles
path Path to the MCMC program output data
c Cluster configuration after nit MCMC cycles
freq Allele frequencies after nit MCMC cycles
matngh Neighbourhood matrix
tabcst Partition function table
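
The relation between prob, membership, K and diffclassif can be sketched in base R as follows. The sketch assumes that prob has one row per individual and one column per cluster, that membership is the row-wise most probable cluster, and that the misclassification rate is a plain proportion of disagreeing labels; the toy matrix and the comparison vector are hypothetical, and the package may compute these quantities differently (for instance by first matching cluster labels between the two configurations).

```r
# Hypothetical posterior membership probabilities:
# 4 individuals (rows), npopmax = 3 clusters (columns).
prob <- rbind(c(0.90, 0.10, 0.00),
              c(0.70, 0.30, 0.00),
              c(0.20, 0.80, 0.00),
              c(0.05, 0.95, 0.00))

# Most likely cluster for each individual.
membership <- apply(prob, 1, which.max)

# Number of clusters actually occupied; here one of the three is empty.
K <- length(unique(membership))

# Proportion of individuals labelled differently in another configuration,
# assuming both configurations already use the same label numbering.
otherconfig <- c(1, 1, 1, 2)
diffclassif <- mean(membership != otherconfig)
```

In this toy case the third cluster is never the most probable one, so the number of occupied clusters is smaller than npopmax.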

Author(s)

Sophie Ancelet

References

On mixture models in population genetics:

- Pritchard, J.K., Stephens, M. and Donnelly, P. (2000). Inference of population structure using multilocus genotype data. Genetics, 155, 945-959.

- Falush, D., Stephens, M. and Pritchard, J.K. (2003). Inference of population structure using multilocus genotype data: Linked loci and correlated allele frequencies. Genetics, 164, 1567-1587.

- Guillot, G., Estoup, A., Mortier, F. and Cosson, J.F. (2005). A spatial statistical model for landscape genetics. Genetics, 170, 1261-1280.

- Guillot, G., Mortier, F. and Estoup, A. (2005). Geneland: A program for landscape genetics. Molecular Ecology Notes, 5, 712-715.

On the implementation of MCMC inference on the spatial interaction parameter psi of a Potts-Dirichlet model:

- Green, P. and Richardson, S. (2002). Hidden Markov models and disease mapping. Journal of the American Statistical Association, 97(460), 1055-1070.

On the model (and sub-models) implemented in geneclust:

- Francois, O., Ancelet, S. and Guillot, G. (2006). Papers in preparation.

See Also

Functions mcmcgeneclust, postclassif, postpsi, postfis


[Package Geneclust version 1.0.0 Index]