Mort2Dsmooth {MortalitySmooth}R Documentation

Fit Two-dimensional Poisson P-splines

Description

Returns an object of class Mort2Dsmooth which is a two-dimensional P-splines smooth of the input data of degree and order fixed by the user. Specifically tailored to mortality data.

Usage

Mort2Dsmooth(x, y, Z, offset, W, overdispersion=FALSE, 
             ndx = c(floor(length(x)/5), floor(length(y)/5)), 
             deg = c(3, 3), pord = c(2, 2), 
             lambdas = NULL, df = NULL, method = 1, 
             control = list())

Arguments

x Vector for the abscissa of data. These must be at least 2 ndx[1] + 1 of them.
y Vector for the ordinate of data. These must be at least 2 ndx[2] + 1 of them.
Z Matrix of counts response. Dimensions of Z should correspond to the length of x and y, respectively.
offset Matrix with an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric matrix of dimensions of Z or a single numeric value.
W An optional matrix of weights to be used in the fitting process. This should be NULL or a numeric matrix of dimensions of Z or a single numeric value.
overdispersion Logical on the accounting for overdisperion in the smoothing parameters selection criterion. See Details. Default: FALSE.
ndx Vector with the number of internal knots -1 for each axis. Default: [floor(length(x)/5), floor(length(y)/5)].
deg Vector with the degree of the B-splines for each axis. Default: [3, 3].
pord Vector with the order of differences for each axis. Default: [2, 2].
lambdas Vector with smoothing parameters, possibly one for axis (optional).
df A number which specifies the degrees of freedom (optional).
method The method for controlling the amount of smoothing. method = 1 (default) adjusts the two smoothing parameters so that the BIC is minimized. method = 2 adjusts lambdas so that the AIC is minimized. method = 3 uses the values supplied for lambdas. Isotropic smoothing is allowed in this method. method = 4 adjusts lambdas so that the degrees of freedom is equal to the supplied df.
control A list of control parameters. See Details.

Details

The method fits a two-dimensional P-spline model with equally-spaced B-splines along x and y. The response variables must be a matrix of Poisson distributed counts. Offset can be provided, otherwise the default is that all weights are one.

The method produces results from a smoothing function which is the Kronecker product of B-spline basis over the two axes and include a discrete penalization directly on the differences of the B-splines coefficients. The user can set the order of difference, the degree of the B-splines and number of them for each of the axis. Nevertheless, the smoothing parameters lambdas are mainly used to tune the smoothness/model fidelity of the fitted values.

There are print.Mort2Dsmooth, summary.Mort2Dsmooth, plot.Mort2Dsmooth predict.Mort2Dsmooth and residuals.Mort2Dsmooth methods available for this function.

Four methods are possible and the minimization of the BIC is set as default method for optimizing the two smoothing parameters. Minimization of the AIC is also possible. BIC will give always smoother outcomes with respect to AIC, especially for large sample size. Alternatively the user can directly provide the smoothing parameters (method=3) or the degree of freedom to be used in the model (method=4). If the user provides only a single value for the argument lambdas, isotropic smoothing is applied (with warning). Note that Mort2Dsmooth uses approximated degree of freedom, therefore method=4 will produce fitted values with degree of freedom only similar to the one provided in df. The tolerance level can be set via control - TOL2.

Note that the two-dimensional 'ultimate' smoothing with very large lambda will approach to a surface which is a product of two polynomial of degree pord[1] and pord[2], respectively. In particular, when pord=c(2,2) the 'ultimate' smoothing is a bi-linear surface over x and y.

The argument overdispersion can be set to TRUE when smoothing parameters selection has to account for possible presence of over(under)dispersion. Mortality data often present overdispersion also known, in demography, as heterogeneity. Duplicates in insurance data can lead to overdispersed data, too. Smoothing parameters selection may be affected by this phenomenon. When overdispersion=TRUE, the function uses a penalized quasi-likelihood method for including an overdisperion parameter (psi2) in the fitting procedure. With this approach expected values are assumed equal to the variance multiplied by the parameter psi2. See reference. Note that with overdispersed data both BIC and AIC might select higher lambdas, leading to smoother outcomes. When overdispersion=FALSE (default value) or method=3 or method=4, psi2 is estimated after the smoothing parameters have been employed. Overdispersion parameter larger (smaller) than 1 may be a sign of overdispersion (underdispersion).

The control argument is a list that can supply any of the following components:

MON: Logical. If TRUE tracing information on the progress of the fitting is produced. Default: FALSE.

TOL1: The absolute convergence tolerance. Default: 1e-06.

TOL2: Difference between two adjacent smoothing parameters in the (pseudo) grid search, log-scale. Useful only when method is equal to 1, 2 or 4. Default: 0.1.

MAX.IT: The maximum number of iterations. Default: 50.

The arguments MON, TOL1 and MAX.IT are kept during all the (pseudo) grid search when method is equal to 1, 2 or 4. Function cleversearch from package svcm is employed to speed the grid search.

The function is specifically tailored to smooth mortality data in one-dimensional setting. In such case the argument x would be either the ages and the argument y the years under study. The matrix of death counts will be the argument Z. In a Poisson regression setting applied to actual death counts the offset will be the logarithm of the matrix of exposure population. See example below.

The inner functions work using an arithmetic of arrays defined as Generalized Linear Array Model (GLAM) (see reference). In order to avoid construction of large Kronecker product basis from the large number of B-splines along the axes, the function profits of the special structure of both the data as rectangular array and the model matrix as tensor product. It uses sequence of nested matrix operations and this leads to low storage and high speed computation within the IWLS algorithm. Moreover, the function do not vectorize the whole system keeping the actual two-dimensional array structure within the scoring algorithm.

Value

An object of the class Mort2Dsmooth with components:

coefficients matrix of fitted (penalized) B-splines coefficients.
residuals matrix of deviance residuals.
fitted.values matrix of fitted counts.
linear.predictor matrix of fitted linear predictor.
df effective dimension.
dev Poisson Deviance.
aic Akaike's Information Criterion.
bic Bayesian Information Criterion.
psi2 Overdispersion parameter.
lambdas the selected (given) smoothing parameters.
call the matched call.
m length of argument x.
n length of argument y.
tolerance the used tolerance level.
lev diagonal of the hat-matrix.
ndx the number of internal knots -1 for each axis.
deg degree of the B-splines for each axis.
pord order of difference for each axis.
x vector for the abscissa of data.
y vector for the ordinate of data.
Z matrix of counts response.
offset matrix with an a priori known component.
w matrix of weights.

Author(s)

Carlo G Camarda

References

Eilers, P. H. C., I. D. Currie, and M. Durban (2006). Fast and compact smoothing on large multidimensional grids. Computational Statistics & Data Analysis 50, 61-76.

Currie, I. D., M. Durban, and P. H. C. Eilers (2006). Generalized linear array models with applications to multidimentional smoothing. Journal of the Royal Statistical Society. Series B. 68, 259-280.

See Also

predict.Mort2Dsmooth, plot.Mort2Dsmooth.

Examples

# selected data
ages <- 50:100
years <- 1950:2006
death <- selectHMDdata("Japan", "Deaths", "Females",
                       ages = ages, years = years) 
exposure <- selectHMDdata("Japan", "Exposures", "Females",
                          ages = ages, years = years)

# fit with BIC
fitBIC <- Mort2Dsmooth(x=ages, y=years, Z=death, offset=log(exposure)) 
fitBIC
summary(fitBIC)

# plot age 50 log death rates (1st row)
plot(years, log(death[1,]/exposure[1,]),
     main="Mortality rates, log-scale. Japanese females, age 50, 1950:2006")
lines(years, log(fitted(fitBIC)[1,]/exposure[1,]), col=2, lwd=2)

# plot over age and years fitted log death rates from fitBIC
grid. <- expand.grid(list(ages=ages, years=years))
grid.$lmx <- c(log(fitted(fitBIC)/exposure))
levelplot(lmx ~ years * ages , grid., 
          at=quantile(grid.$lmx, seq(0,1,length=10)),
          col.regions=rainbow(9))

# about Extra-Poisson variation (overdispersion)
# checking the presence of overdispersion
fitBIC$psi2
# fitting accounting for overdispersion
fitBICover <- Mort2Dsmooth(x=ages, y=years, Z=death,
                           offset=log(exposure),
                           overdispersion=TRUE)
# difference in the selected smoothing parameters
fitBIC$lambdas;fitBICover$lambdas

[Package MortalitySmooth version 1.0 Index]