haplin {Haplin} | R Documentation |
Produces an object of class haplin, which is the result of fitting the log-linear models to the data.
haplin(filename, markers = "ALL", n.vars = 0, sep = " ", allele.sep = ";", na.strings = "NA", design = "triad", use.missing = FALSE, xchrom = FALSE, maternal = FALSE, scoretest = "no", ccvar = NULL, covar = NULL, sex = NULL, reference = "reciprocal", response = "free", threshold = 0.01, max.haplos = NULL, haplo.file = NULL, resampling = FALSE, max.EM.iter = 50, data.out = FALSE, verbose = TRUE, printout = TRUE)
Of the following arguments, only filename is required. Use of the remaining arguments will depend on the type of analysis.
filename |
A character string giving the name and path of the ASCII data file to be read. |
markers |
Default is "ALL", which means HAPLIN uses all available markers in the data set in the analysis. For the current version of HAPLIN the number of markers used in a single run should probably not exceed 4 or 5 due to the computational burden. The markers argument can be used to select appropriate markers from the file without creating a new file for the selected markers. For instance, if markers is set to c(2,4), HAPLIN will only use the second and fourth markers supplied in the data set. When running HAPLIN, it may be a good idea to start by exploring a few markers at a time, using this argument. |
n.vars |
Numeric. The number of variables (columns) in the data file before (to the left of) the genetic data. |
sep |
The character separator used in the data file to separate between "columns", where each column contains the two alleles of a single individual at a single marker. |
allele.sep |
The character separator used in the data file to separate the two alleles for a single individual in a single marker. The recommended (default) separator is ";", but for SNPs an empty "" is also common. |
na.strings |
The character string indicating missing data in the data file. Default is to use "NA" in place of, for instance, C;T for a SNP that hasn't been typed in that individual. |
design |
The value "triad" is used for the standard case triad design, without independent controls. The value "cc.triad" means a combination of case triads and control triads. This requires the argument ccvar to point to the data column containing the case-control variable. The value "cc" means a simple case-control design, where the parents have not been genotyped (there are no data columns for parental genes). |
use.missing |
A logical value used to determine whether triads with missing data should be included in the analysis. When set to TRUE, Haplin uses the EM algorithm to obtain risk estimates, also taking into account triads with missing data. The standard errors and p-values are adjusted to correct for this. The default, however, is FALSE. When FALSE, all triads having any sort of missing data are excluded before the analysis is run. Note that Haplin only looks at markers actually used in the analysis, so that if the markers argument (see above) is used to select a collection of markers for analysis, Haplin only excludes triads with missing data on the included markers. |
xchrom |
Not yet implemented |
maternal |
If TRUE, maternal effects are estimated as well as the standard fetal effects. |
scoretest |
Special interest only. If "no", no score test is computed. If "yes", an overall score p-value is included in the output, and the individual score values are returned in the haplin object. If "only", haplin is only run under the null hypothesis, and a simple score object is returned instead of the full haplin object. Useful if only score testing is needed. |
ccvar |
Numeric. Should give the column number for the column containing the case-control indicator in the data file. Needed for the "cc" and "cc.triad" designs. The column should contain two numeric values, of which the largest one is always used to denote cases. |
covar |
Not yet implemented |
sex |
Not yet implemented |
reference |
Decides how HAPLIN chooses its reference category for the effect estimates. Default value is "reciprocal". With the reciprocal reference the effect of a single or double dose of each haplotype is measured relative to the remaining haplotypes. This means that a new reference category is used for each single haplotype. Other possible values are "population" (which is similar to reciprocal, but where the reference category is always the total population), and "ref.cat", where a single haplotype is used as reference for all the rest. For ref.cat, the default is to choose the most frequent haplotype as the reference haplotype. The reference haplotype can be set explicitly by giving a numeric value for the reference argument. Note that the numeric value refers to the haplotype's position among the haplotypes selected for analysis by HAPLIN. This means that one should run HAPLIN once first to see what haplotypes are used before giving a numeric value to reference. |
response |
The default value "free" means that both single- and double-dose effects are estimated. Choosing "mult" instead specifies a multiplicative dose-response model. |
threshold |
Sets the (approximate) lower limit for the haplotype frequencies of those haplotypes that should be retained in the analysis. Haplotypes that are less frequent are removed, and information about this is given in the output. |
max.haplos |
Not yet implemented |
haplo.file |
Not yet implemented |
resampling |
Default is FALSE. When FALSE, the individual haplotypes reconstructed by the EM algorithm are assumed known when computing CIs and p-values. If set to "jackknife", a jackknife-based resampling procedure is used when computing confidence intervals and p-values for effect estimates. This takes more time, but corrects the CIs and p-values for the uncertainty contained in unphased data. Note: in this version of Haplin, the resampling is no longer needed since the confidence intervals and p-values are already corrected in the standard computation. |
max.EM.iter |
The maximum number of iterations used by the EM algorithm. This value can be increased if necessary, as is sometimes the case with, e.g., case-control data with a substantial amount of missing data. However, for triad data with little missing information there is usually no need for many iterations. |
data.out |
Not yet implemented |
verbose |
Default is T (=TRUE). During the EM algorithm, HAPLIN prints the estimated parameters and deviance for each step. To avoid the output, set this argument to F (=FALSE). |
printout |
Logical. If TRUE (default), haplin prints a full summary of the results after finishing the estimation. If FALSE, no such printout is given, but the summary function can later be applied to a saved result to get the same summary. |
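The file-format arguments above (n.vars, sep, allele.sep, na.strings, markers, ccvar, use.missing) can be illustrated with a small base R sketch that does not call haplin itself. It assumes, for illustration only, that each marker occupies three columns in the order mother, father, child. The toy file contents, the temporary path and all variable names are hypothetical, and this is not a description of how Haplin reads data internally.

```r
# Toy triad file (hypothetical): one covariate column, then two markers.
# Assumed layout per marker: mother, father, child.
toy <- c(
  "1 C;T C;C C;T A;G G;G A;G",
  "0 T;T C;T C;T NA A;G A;G",
  "1 C;C C;T C;C A;A A;G A;G"
)
path <- tempfile(fileext = ".dat")
writeLines(toy, path)

n.vars <- 1   # columns to the left of the genetic data
ccvar  <- 1   # column holding the case-control indicator

# sep = " " splits the columns; na.strings = "NA" marks untyped genotypes
raw <- read.table(path, sep = " ", na.strings = "NA",
                  colClasses = "character")

# Drop the n.vars leading columns to get the genetic data
geno <- raw[, seq_len(ncol(raw)) > n.vars, drop = FALSE]
n.markers <- ncol(geno) / 3

# allele.sep = ";" splits the two alleles within a single column
mother.m1 <- strsplit(geno[[1]], ";", fixed = TRUE)

# The largest value in the case-control column denotes cases
cc <- as.numeric(raw[[ccvar]])
is.case <- cc == max(cc)

# Column indices belonging to a set of markers (three columns per marker)
marker.cols <- function(m) {
  unlist(lapply(m, function(i) (3 * (i - 1) + 1):(3 * i)))
}

# Triads without missing data, judged only on the selected markers
keep.m1  <- complete.cases(geno[, marker.cols(1), drop = FALSE])
keep.all <- complete.cases(geno[, marker.cols(1:2), drop = FALSE])
```

In this toy file the second triad lacks the mother's genotype at marker 2, so it is flagged as incomplete when both markers are considered (keep.all) but not when only marker 1 is selected (keep.m1), mirroring the note under use.missing.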
The output can be examined by print, summary and plot.
An object of class haplin is returned.
Typically, some of the included haplotypes will be relatively rare, with frequencies of, say, 1% to 5%. For those haplotypes there may be too little data to estimate the double-dose effects properly, so the estimates may be unreliable. This is seen from the extremely wide confidence intervals. The rare double-dose estimates should be disregarded, but the remaining single- and double-dose estimates are valid. To avoid the problem one can also reduce the model to a purely multiplicative model by setting response = "mult".
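As a rough illustration of how the threshold argument relates to the rare haplotypes discussed above, the following base R sketch filters a set of made-up haplotype frequencies. The haplotype labels and frequencies are hypothetical, and the simple comparison used here is only an assumption. The text describes the threshold as an approximate lower limit, so Haplin's actual selection rule may differ.

```r
# Hypothetical haplotype frequencies (sum to 1); not output from Haplin
freq <- c("C-A" = 0.52, "C-G" = 0.30, "T-A" = 0.14,
          "T-G" = 0.035, "C-T" = 0.005)

# Haplotypes retained at the default threshold and at a stricter one
kept.default <- names(freq)[freq >= 0.01]
kept.strict  <- names(freq)[freq >= 0.05]

# Retained by default but rare: double-dose estimates for these
# are the ones most likely to have very wide confidence intervals
rare.but.kept <- setdiff(kept.default, kept.strict)
```

Raising threshold to 0.05 would drop such haplotypes altogether, whereas response = "mult" keeps them but avoids estimating a separate double-dose effect.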
Further information is found on the Haplin web page (see the web site address below).
Håkon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
hakon.gjessing@fhi.no
Gjessing HK and Lie RT. Case-parent triads: Estimating single- and double-dose effects of fetal and maternal disease gene haplotypes. Annals of Human Genetics (2006) 70, pp. 382-396.
Web Site: http://www.uib.no/smis/gjessing/genetics/software/haplin/
## Not run:

# Standard run:
haplin("data.dat")

# Specify path, estimate maternal effects:
haplin("C:/work/data.dat", maternal = T)

# Specify path, use haplotype no. 2 as reference:
haplin("C:/work/data.dat", reference = 2)

# Remove more haplotypes from estimation by increasing the threshold
# to 5%:
haplin("C:/work/data.dat", threshold = 0.05)

# Estimate maternal effects, using the most frequent haplotype as reference.
# Use all data, including triads with missing data. Select
# markers 3, 4 and 8 from the supplied data.
haplin("C:/work/data.dat", use.missing = T, maternal = T,
       reference = "ref.cat", markers = c(3,4,8))

# Note: in this version of Haplin, the jackknife is
# no longer necessary since the standard errors are already corrected.

# Some examples showing how to save the Haplin result and later
# recall plot and summary results:

# Same analysis as above, saving the result in the object "result.1":
result.1 <- haplin("C:/work/data.dat", use.missing = T, maternal = T,
                   reference = "ref.cat", markers = c(3,4,8))

# Replot the saved result (fetal effects):
plot(result.1)

# Replot the saved result (maternal effects):
plot(result.1, plot.maternal = T)

# Print a very short summary of saved result:
result.1

# A full summary of saved result, with confidence intervals and
# p-values (the same as haplin prints when running):
summary(result.1)

# Some examples when the data file contains two covariates,
# the second is the case-control variable:

# The following standard triad run is INCORRECT since it disregards
# case status:
haplin("data.dat", use.missing = T, n.vars = 2, design = "triad")

# Combined run on "hybrid" design, correctly using both case-parent
# triads and control-parent triads:
haplin("data.dat", use.missing = T, n.vars = 2, ccvar = 2,
       design = "cc.triad")

# If parent columns are not in the file, a plain case-control
# run can be used:
haplin("data.dat", use.missing = T, n.vars = 2, ccvar = 2,
       design = "cc", response = "mult", reference = "ref.cat")

## End(Not run)