Mort1Dsmooth {MortalitySmooth}    R Documentation

Fit One-dimensional Poisson P-splines

Description

Returns an object of class Mort1Dsmooth, a P-spline smooth of the input data whose B-spline degree and difference order are set by the user. The function is specifically tailored to mortality data.

Usage

Mort1Dsmooth(x, y, offset, w,
             overdispersion=FALSE,
             ndx = floor(length(x)/5), deg = 3, pord = 2,
             lambda = NULL, df = NULL, method = 1,
             control = list())

Arguments

x Values of the predictor variable. There must be at least 2 ndx + 1 of them.
y Vector of counts, the response variable. y must be a vector of the same length as x.
offset This can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length either one or equal to the number of cases.
w An optional vector of weights to be used in the fitting process. This should be NULL or a numeric vector of length equal to the number of cases.
overdispersion Logical. Whether to account for overdispersion in the smoothing parameter selection criterion. See Details. Default: FALSE.
ndx Number of internal knots -1. Default: floor(length(x)/5).
deg Degree of the B-splines. Default: 3.
pord Order of differences. Default: 2.
lambda Smoothing parameter (optional).
df A number which specifies the degrees of freedom (optional).
method The method for controlling the amount of smoothing. method = 1 (default) adjusts the smoothing parameter so that the BIC is minimized. method = 2 adjusts lambda so that the AIC is minimized. method = 3 uses the value supplied for lambda. method = 4 adjusts lambda so that the degrees of freedom match the supplied df.
control A list of control parameters. See Details.

Details

The method fits a P-spline model with equally-spaced B-splines along x. The response variable must consist of Poisson distributed counts, though overdispersion can be accounted for. An offset and weights can be provided; by default all weights are one.

The method produces results similar to those of the function smooth.spline, but the smoothing function is a B-spline smooth with a discrete penalty applied directly to the differences of the B-spline coefficients. The user can set the order of the differences, the degree of the B-splines and their number. In practice, however, the smoothing parameter lambda is the main tool for tuning the balance between smoothness and fidelity to the data.
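As an illustration of the idea described above, the following is a minimal sketch of a penalized Poisson fit written in base R. It is not the package's implementation: the function names bspline_basis and pspline_poisson, the 1% widening of the domain, the starting values and the simulated data are all assumptions made for this example.

```r
library(splines)  # part of the standard R distribution; provides splineDesign()

# Equally spaced B-spline basis with ndx intervals and degree deg.
# The domain is widened by 1% on each side (an assumption of this sketch)
# so that every x lies strictly inside the inner knots.
bspline_basis <- function(x, ndx, deg) {
  pad <- 0.01 * (max(x) - min(x))
  xl <- min(x) - pad
  xr <- max(x) + pad
  dx <- (xr - xl) / ndx
  knots <- xl + dx * seq(-deg, ndx + deg)
  splineDesign(knots, x, ord = deg + 1)
}

# Penalized iteratively reweighted least squares for Poisson counts.
pspline_poisson <- function(x, y, offset = rep(0, length(y)), lambda = 10,
                            ndx = floor(length(x) / 5), deg = 3, pord = 2,
                            tol = 1e-6, max_it = 50) {
  B <- bspline_basis(x, ndx, deg)
  D <- diff(diag(ncol(B)), differences = pord)  # difference matrix
  P <- lambda * crossprod(D)                    # penalty lambda * D'D
  # Starting coefficients: penalized least squares on the log counts.
  a <- solve(crossprod(B) + P, crossprod(B, log(y + 1) - offset))
  for (it in seq_len(max_it)) {
    eta <- as.vector(B %*% a) + offset
    mu <- exp(eta)
    z <- (eta - offset) + (y - mu) / mu         # working response
    G <- crossprod(B, mu * B)                   # B'WB with W = diag(mu)
    a_new <- solve(G + P, crossprod(B, mu * z))
    done <- max(abs(a_new - a)) < tol
    a <- a_new
    if (done) break
  }
  mu <- exp(as.vector(B %*% a) + offset)
  G <- crossprod(B, mu * B)
  list(coefficients = as.vector(a), fitted = mu,
       df = sum(diag(solve(G + P, G))))        # trace of the hat matrix
}

# Simulated death counts (hypothetical data, not from the HMD).
set.seed(1)
years <- 1950:2006
exposure <- rep(1e4, length(years))
deaths <- rpois(length(years), exposure * exp(-2 - 0.02 * (years - 1950)))
fit <- pspline_poisson(years, deaths, offset = log(exposure), lambda = 100)
fit$df
```

Increasing lambda in this sketch lowers fit$df and yields a smoother curve, while the basis itself (ndx, deg) stays unchanged.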

The methods print.Mort1Dsmooth, summary.Mort1Dsmooth, plot.Mort1Dsmooth, predict.Mort1Dsmooth and residuals.Mort1Dsmooth are available for objects returned by this function.

Four methods are available, and by default the smoothing parameter is chosen by minimizing the BIC. Minimization of the AIC is also possible. The BIC typically gives smoother outcomes than the AIC, especially for large sample sizes. Alternatively, the user can directly provide the smoothing parameter (method=3) or the degrees of freedom to be used in the model (method=4). Note that Mort1Dsmooth uses approximate degrees of freedom, so method=4 will produce fitted values whose degrees of freedom are only close to the value provided in df. The tolerance level can be set via the TOL2 component of control.
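To see why the BIC tends to select smoother fits, the sketch below assumes the criteria take their usual forms, AIC = deviance + 2 * df and BIC = deviance + log(n) * df. This form is an assumption of the example rather than something stated on this page, and the helper info_criteria and the numbers passed to it are hypothetical.

```r
# Usual definitions of the two criteria (assumed, see the text above).
info_criteria <- function(deviance, df, n) {
  c(aic = deviance + 2 * df,
    bic = deviance + log(n) * df)
}

# Hypothetical deviance and effective dimension for n = 57 observations.
ic <- info_criteria(deviance = 60, df = 5, n = 57)
ic

# The BIC charges log(n) per degree of freedom, which exceeds the AIC's
# charge of 2 as soon as n >= 8, so it favours fits with smaller df.
log(57) > 2
```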

Note that the 'ultimate' smoothing with a very large lambda approaches a polynomial of degree pord - 1 on the scale of the linear predictor (a straight line for the default pord = 2).
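The limit under heavy smoothing is determined by which coefficient sequences the difference penalty leaves unpenalized. The short check below, a sketch with a hypothetical helper pen and an arbitrary number of coefficients, shows that a penalty of order 2 is zero for a linear sequence (degree pord - 1) and positive for a quadratic one.

```r
pord <- 2
n_coef <- 10
D <- diff(diag(n_coef), differences = pord)  # (n_coef - pord) x n_coef

# Penalty value |D a|^2 for a coefficient vector a.
pen <- function(a) sum((D %*% a)^2)

idx <- seq_len(n_coef)
pen(3 + 0.5 * idx)  # linear sequence: 0
pen(idx^2)          # quadratic sequence: 32 (eight second differences of 2)
```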

The argument overdispersion can be set to TRUE when smoothing parameter selection has to account for the possible presence of over(under)dispersion. Mortality data often present overdispersion, also known in demography as heterogeneity. Duplicates in insurance data can lead to overdispersed data, too. Smoothing parameter selection may be affected by this phenomenon. When overdispersion=TRUE, the function uses a penalized quasi-likelihood method to include an overdispersion parameter (psi2) in the fitting procedure. With this approach the variance is assumed equal to the expected value multiplied by the parameter psi2. See References. Note that including the overdispersion parameter in the estimation might lead to the selection of a higher lambda, and hence to smoother outcomes. When overdispersion=FALSE (default value) or method=3 or method=4, psi2 is estimated after the smoothing parameter has been applied. An overdispersion parameter larger (smaller) than 1 may be a sign of overdispersion (underdispersion).
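A rough illustration of how a dispersion parameter can be read off a Poisson fit is given below. It assumes the common deviance-based estimate, deviance divided by residual degrees of freedom, which may differ from the estimator used inside Mort1Dsmooth. An unpenalized glm stands in for the smooth, and the data are simulated from a negative binomial distribution so that they are overdispersed by construction.

```r
# Simulated overdispersed counts (hypothetical data).
set.seed(42)
n <- 200
x <- seq(0, 1, length.out = n)
mu <- exp(3 + x)
y <- rnbinom(n, mu = mu, size = 2)  # variance = mu + mu^2 / 2 > mu

# Plain Poisson regression as a stand-in for the penalized fit.
fit <- glm(y ~ x, family = poisson())

# Deviance-based dispersion estimate (an assumption of this sketch).
psi2 <- deviance(fit) / df.residual(fit)
psi2  # values well above 1 point to overdispersion
```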

The control argument is a list that can supply any of the following components:

MON: Logical. If TRUE tracing information on the progress of the fitting is produced. Default: FALSE.

TOL1: The absolute convergence tolerance. Default: 1e-06.

TOL2: Difference between two adjacent smoothing parameters in the (pseudo) grid search, log-scale. Useful only when method is equal to 1, 2 or 4. Default: 0.1.

MAX.IT: The maximum number of iterations. Default: 50.

The components MON, TOL1 and MAX.IT are held fixed throughout the (pseudo) grid search when method is equal to 1, 2 or 4. The function cleversearch from package svcm is employed to speed up the grid search.
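A control list only needs to name the components that should differ from their defaults. The sketch below builds such a list and, purely for illustration, merges it with the documented defaults using modifyList; whether the package merges partial lists in exactly this way is an assumption. The commented call reuses the object names from the Examples section.

```r
# Defaults as documented above.
control_defaults <- list(MON = FALSE, TOL1 = 1e-06, TOL2 = 0.1, MAX.IT = 50)

# Request tracing and a finer (pseudo) grid on the log scale of lambda.
user_control <- list(MON = TRUE, TOL2 = 0.05)

# Illustrative merge: user values override defaults, the rest is kept.
ctrl <- modifyList(control_defaults, user_control)
str(ctrl)

# With MortalitySmooth attached and data as in the Examples section:
# fit <- Mort1Dsmooth(x = years, y = death, offset = log(exposure),
#                     control = user_control)
```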

The function is specifically tailored to smooth mortality data in a one-dimensional setting. In that case the argument x is either the ages or the years under study, and the death counts are the argument y. In a Poisson regression setting applied to actual death counts, the offset is the logarithm of the exposure population. See the example below.
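The role of the offset can be checked on a small simulated data set. In the sketch below an intercept-only Poisson glm stands in for the smoother, and the exposures and death counts are hypothetical. With log(exposure) as offset the model describes death rates, and for this simple model the estimated rate equals total deaths divided by total exposure.

```r
# Simulated exposures and deaths (hypothetical data).
set.seed(1)
years <- 1950:2006
exposure <- round(runif(length(years), 5e3, 2e4))
deaths <- rpois(length(years), exposure * 0.05)

# Poisson regression of counts with the log exposure as offset.
fit <- glm(deaths ~ 1, family = poisson(), offset = log(exposure))

# Estimated death rate on the natural scale.
rate_hat <- unname(exp(coef(fit)))

# Fitted counts divided by exposure give the same constant rate.
range(fitted(fit) / exposure)
all.equal(rate_hat, sum(deaths) / sum(exposure), tolerance = 1e-6)
```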

Value

An object of the class Mort1Dsmooth with components:

coefficients vector of fitted (penalized) B-splines coefficients.
residuals the deviance residuals.
fitted.values vector of fitted counts.
linear.predictor vector of fitted linear predictor.
leverage diagonal of the hat-matrix.
df effective dimension.
deviance Poisson Deviance.
aic Akaike's Information Criterion.
bic Bayesian Information Criterion.
psi2 Overdispersion parameter.
lambda the selected (given) smoothing parameter lambda.
call the matched call.
n number of observations.
tolerance the used tolerance level.
ndx the number of internal knots -1.
deg degree of the B-splines.
pord order of difference.
x values of the predictor variable.
y set of counts response variable values.
offset vector of the offset.
w vector of weights used in the model.

Author(s)

Carlo G Camarda

References

Eilers, P. H. C. and Marx, B. D. (1996). Flexible Smoothing with B-splines and Penalties. Statistical Science, 11(2), 89-121.

See Also

predict.Mort1Dsmooth, plot.Mort1Dsmooth.

Examples

# selected data
years <- 1950:2006
death <- selectHMDdata("Japan", "Deaths", "Females",
                       ages = 80, years = years)
exposure <- selectHMDdata("Japan", "Exposures", "Females",
                          ages = 80, years = years)
# various fits
# default using Bayesian Information Criterion
fitBIC <- Mort1Dsmooth(x=years, y=death, offset=log(exposure))
fitBIC
summary(fitBIC)
# subjective choice of the smoothing parameter lambda
fitLAM <- Mort1Dsmooth(x=years, y=death, offset=log(exposure),
                       lambda=10000, method=3)
# plot
plot(years, log(death/exposure),
     main="Mortality rates, log-scale. Japanese females, age 80, 1950:2006")
lines(years, log(fitted(fitBIC)/exposure), col=2, lwd=2)
lines(years, log(fitted(fitLAM)/exposure), col=3, lwd=2)
legend("topright", c("Actual", "BIC", "lambda=10000"),
       col=1:3, lwd=c(1,2,2), lty=c(-1,1,1),
       pch=c(1,-1,-1))

# about Extra-Poisson variation (overdispersion)
# checking the presence of overdispersion
fitBIC$psi2 # considerably larger than 1
# fitting accounting for overdispersion
fitBICover <- Mort1Dsmooth(x=years, y=death, offset=log(exposure),
                           overdispersion=TRUE)
# difference in the selected smoothing parameters
fitBIC$lambda;fitBICover$lambda
# plotting both situations
plot(fitBICover)
# add the BIC fit without overdispersion, on the log-rate scale
lines(years, log(fitted(fitBIC)) - fitBIC$offset, col=4, lwd=2, lty=2)

[Package MortalitySmooth version 1.0]