simupopD {forensim} | R Documentation |
Simulate multi-population allele frequencies for independent loci, from a given reference population, following a Dirichlet model. Allele frequencies in the populations are generated as random deviates from a Dirichlet distribution, the parameters of which control the deviation of allele frequencies from the values in the reference population.
simupopD(npop = 1, nloc = 1, na = 2, globalfreq = NULL, which.loc = NULL, alpha1, alpha2 = 1)
npop |
the number of populations |
nloc |
the number of loci |
na |
an integer vector giving the numbers of alleles per locus |
globalfreq |
matrix of allele frequencies in the reference population. Data must be given in the format of the Journal of Forensic
Sciences for genetic data. By default, allele frequencies are generated from a Dirichlet distribution
with parameter alpha2 for all alleles. |
which.loc |
which loci to simulate from the globalfreq matrix; by default, all loci are considered |
alpha1 |
a positive float vector of length npop giving the variance parameter of the Dirichlet
distribution used to generate allele frequencies in the npop independent populations |
alpha2 |
a positive float giving the parameter of the Dirichlet distribution used to generate allele frequencies for the reference population |
In the reference population, allele frequencies for independent loci are simulated using a Dirichlet distribution with
parameter alpha2.
At a given locus L with n alleles, the allele frequencies are modeled as a vector of random
variables p=(p1, ..., pn) following a Dirichlet distribution with a parameter vector of length n,
where each component is equal to alpha2, p1+...+pn=1 and alpha2 > 0.
Note that a more sophisticated generation of global allele frequencies is possible using the simufreqD
function.
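As a minimal sketch of the reference-population model described above, the following base R code draws allele frequencies at one locus from a symmetric Dirichlet distribution by normalizing independent gamma deviates. The helper name rdirichlet_sketch and the values of n_alleles and alpha2 are assumptions made for illustration. The helper is not part of forensim, and the internals of simupopD may differ.

```r
# Hypothetical helper (not part of forensim): draw one Dirichlet deviate
# by normalizing independent gamma deviates with unit rate.
rdirichlet_sketch <- function(alpha) {
  stopifnot(is.numeric(alpha), length(alpha) >= 2, all(alpha > 0))
  g <- rgamma(length(alpha), shape = alpha, rate = 1)
  g / sum(g)
}

set.seed(1)       # fixed seed, only to make this illustration reproducible
n_alleles <- 4    # assumed number of alleles at the locus
alpha2 <- 1       # the same value is used for every component
ref_freq <- rdirichlet_sketch(rep(alpha2, n_alleles))

# The simulated frequencies are positive and sum to one
print(ref_freq)
print(sum(ref_freq))
```

With alpha2 = 1 the distribution is uniform over all valid frequency vectors. Larger values of alpha2 concentrate the draws around equal frequencies.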
Similarly, allele frequencies in the independent populations are simulated using a Dirichlet distribution.
For example, for the first population to simulate, at a given locus L with n alleles,
the allele frequencies are modeled as a vector
of random variables p=(p1, ..., pn) following a Dirichlet distribution with a parameter vector of length n:
(p1(1-alpha1[1])/alpha1[1], ..., pn(1-alpha1[1])/alpha1[1]), where p1+...+pn=1 and alpha1[1] > 0.
alpha1[1] is the variance parameter for population 1 and is equivalent to Wright's Fst. The closer this parameter is to one,
the more the population allele frequencies differ from the values of the reference population.
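The effect of alpha1 can be illustrated with a small base R simulation, offered as a sketch under stated assumptions rather than as the forensim implementation. For a Dirichlet distribution with the parameter vector given above, each simulated frequency has mean p_i and variance alpha1 * p_i * (1 - p_i), which is why alpha1 acts as a variance parameter. The reference frequencies ref, the helper names rdirichlet_sketch and simulate_pop, and the restriction fst < 1 (needed to keep the Dirichlet parameters positive) are assumptions of this example.

```r
# Hypothetical helper (not part of forensim): one Dirichlet deviate
# obtained by normalizing independent gamma deviates.
rdirichlet_sketch <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha, rate = 1)
  g / sum(g)
}

# Draw nsim frequency vectors around the reference frequencies `ref`
# for an Fst-like parameter `fst`, assumed to satisfy 0 < fst < 1.
simulate_pop <- function(ref, fst, nsim) {
  stopifnot(abs(sum(ref) - 1) < 1e-8, fst > 0, fst < 1)
  # replicate() returns one draw per column, so transpose to one draw per row
  t(replicate(nsim, rdirichlet_sketch(ref * (1 - fst) / fst)))
}

set.seed(42)
ref  <- c(0.5, 0.3, 0.2)   # assumed reference allele frequencies
low  <- simulate_pop(ref, fst = 0.01, nsim = 5000)
high <- simulate_pop(ref, fst = 0.30, nsim = 5000)

# Empirical variance of the first allele versus fst * p * (1 - p)
var_low  <- var(low[, 1])
var_high <- var(high[, 1])
print(c(empirical = var_low,  expected = 0.01 * ref[1] * (1 - ref[1])))
print(c(empirical = var_high, expected = 0.30 * ref[1] * (1 - ref[1])))
```

Each row of low and high is one simulated population. The larger value of fst gives frequencies that scatter more widely around ref.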
The result is stored in a list with two elements:
globfreq |
a tabfreq object giving the allele frequencies of the chosen reference population,
with the chosen loci. |
popfreq |
a tabfreq object giving the allele frequencies of the simulated populations. |
The code used here for the generation of random Dirichlet deviates was previously implemented in the gtools library.
Hinda Haned haned@biomserv.univ-lyon1.fr
Nicholson G, Smith AV, Jonsson F, Gustafsson O, Stefansson K, Donnelly P.
Assessing population differentiation and isolation from single-nucleotide polymorphism data.
J Roy Stat Soc B 2002;64:695-715.
Marchini J, Cardon LR. Discussion on the meeting on "Statistical modelling and analysis of genetic data".
J Roy Stat Soc B 2002;64:740-741.
Wright S. The genetical structure of populations. Ann Eugen 1951;15:323-354
# simulate allele frequencies for two populations
library(forensim)
data(Tu)
simupopD(npop = 2, globalfreq = Tu, which.loc = c("FGA", "TH01", "TPOX"), alpha1 = c(0.2, 0.3), alpha2 = 1)