Mort2Dsmooth {MortalitySmooth} | R Documentation |
Returns an object of class Mort2Dsmooth
which is a
two-dimensional P-splines smooth of the input data of degree and order
fixed by the user. Specifically tailored to mortality data.
Mort2Dsmooth(x, y, Z, offset, W, overdispersion=FALSE, ndx = c(floor(length(x)/5), floor(length(y)/5)), deg = c(3, 3), pord = c(2, 2), lambdas = NULL, df = NULL, method = 1, control = list())
x |
Vector for the abscissa of data. These must be at least
2 ndx[1] + 1 of them. |
y |
Vector for the ordinate of data. These must be at least
2 ndx[2] + 1 of them. |
Z |
Matrix of counts response. Dimensions of Z should
correspond to the length of x and y , respectively. |
offset |
Matrix with an a priori known component to be included
in the linear predictor during fitting. This should be NULL or
a numeric matrix of dimensions of Z or a single numeric
value. |
W |
An optional matrix of weights to be used in the fitting
process. This should be NULL or a numeric matrix of dimensions
of Z or a single numeric value. |
overdispersion |
Logical on the accounting for overdisperion in
the smoothing parameters selection criterion. See
Details . Default: FALSE. |
ndx |
Vector with the number of internal knots -1 for each
axis. Default: [floor(length(x)/5) ,
floor(length(y)/5) ]. |
deg |
Vector with the degree of the B-splines for each axis. Default: [3, 3]. |
pord |
Vector with the order of differences for each axis. Default: [2, 2]. |
lambdas |
Vector with smoothing parameters, possibly one for axis (optional). |
df |
A number which specifies the degrees of freedom (optional). |
method |
The method for controlling the amount of
smoothing. method = 1 (default) adjusts the two smoothing
parameters so that the BIC is minimized. method = 2 adjusts
lambdas so that the AIC is minimized. method = 3 uses
the values supplied for lambdas . Isotropic smoothing is allowed
in this method. method = 4 adjusts lambdas so that the
degrees of freedom is equal to the supplied df . |
control |
A list of control parameters. See Details . |
The method fits a two-dimensional P-spline model with equally-spaced
B-splines along x
and y
. The response variables must be
a matrix of Poisson distributed counts. Offset can be provided,
otherwise the default is that all weights are one.
The method produces results from a smoothing function which is the
Kronecker product of B-spline basis over the two axes and include a
discrete penalization directly on the differences of the B-splines
coefficients. The user can set the order of difference, the degree of
the B-splines and number of them for each of the axis. Nevertheless,
the smoothing parameters lambdas
are mainly used to tune the
smoothness/model fidelity of the fitted values.
There are print.Mort2Dsmooth
,
summary.Mort2Dsmooth
, plot.Mort2Dsmooth
predict.Mort2Dsmooth
and
residuals.Mort2Dsmooth
methods available for this
function.
Four methods are possible and the minimization of the BIC is set as
default method for optimizing the two smoothing
parameters. Minimization of the AIC is also possible. BIC will give
always smoother outcomes with respect to AIC, especially for large
sample size. Alternatively the user can directly provide the smoothing
parameters (method=3
) or the degree of freedom to be used in
the model (method=4
). If the user provides only a single value
for the argument lambdas
, isotropic smoothing is applied (with
warning). Note that Mort2Dsmooth
uses approximated degree of
freedom, therefore method=4
will produce fitted values with
degree of freedom only similar to the one provided in df
. The
tolerance level can be set via control
- TOL2
.
Note that the two-dimensional 'ultimate' smoothing with very large
lambda will approach to a surface which is a product of two polynomial
of degree pord[1]
and pord[2]
, respectively. In
particular, when pord=c(2,2)
the 'ultimate' smoothing is a
bi-linear surface over x
and y
.
The argument overdispersion
can be set to TRUE
when
smoothing parameters selection has to account for possible presence of
over(under)dispersion. Mortality data often present overdispersion
also known, in demography, as heterogeneity. Duplicates in insurance
data can lead to overdispersed data, too. Smoothing parameters
selection may be affected by this phenomenon. When
overdispersion=TRUE
, the function uses a penalized
quasi-likelihood method for including an overdisperion parameter
(psi2
) in the fitting procedure. With this approach expected
values are assumed equal to the variance multiplied by the parameter
psi2
. See reference. Note that with overdispersed data both BIC
and AIC might select higher lambdas, leading to smoother
outcomes. When overdispersion=FALSE
(default value) or
method=3
or method=4
, psi2
is estimated after the
smoothing parameters have been employed. Overdispersion parameter
larger (smaller) than 1 may be a sign of overdispersion
(underdispersion).
The control
argument is a list that can supply any of the
following components:
MON
: Logical. If TRUE
tracing information on the
progress of the fitting is produced. Default: FALSE
.
TOL1
: The absolute convergence tolerance. Default: 1e-06.
TOL2
: Difference between two adjacent smoothing parameters in
the (pseudo) grid search, log-scale. Useful only when method
is
equal to 1, 2 or 4. Default: 0.1.
MAX.IT
: The maximum number of iterations. Default: 50.
The arguments MON
, TOL1
and MAX.IT
are kept
during all the (pseudo) grid search when method
is equal to 1,
2 or 4. Function cleversearch
from package
svcm
is employed to speed the grid search.
The function is specifically tailored to smooth mortality data in
one-dimensional setting. In such case the argument x
would be
either the ages and the argument y
the years under study. The
matrix of death counts will be the argument Z
. In a Poisson
regression setting applied to actual death counts the offset
will be the logarithm of the matrix of exposure population. See
example below.
The inner functions work using an arithmetic of arrays defined as Generalized Linear Array Model (GLAM) (see reference). In order to avoid construction of large Kronecker product basis from the large number of B-splines along the axes, the function profits of the special structure of both the data as rectangular array and the model matrix as tensor product. It uses sequence of nested matrix operations and this leads to low storage and high speed computation within the IWLS algorithm. Moreover, the function do not vectorize the whole system keeping the actual two-dimensional array structure within the scoring algorithm.
An object of the class Mort2Dsmooth
with components:
coefficients |
matrix of fitted (penalized) B-splines coefficients. |
residuals |
matrix of deviance residuals. |
fitted.values |
matrix of fitted counts. |
linear.predictor |
matrix of fitted linear predictor. |
df |
effective dimension. |
dev |
Poisson Deviance. |
aic |
Akaike's Information Criterion. |
bic |
Bayesian Information Criterion. |
psi2 |
Overdispersion parameter. |
lambdas |
the selected (given) smoothing parameters. |
call |
the matched call. |
m |
length of argument x . |
n |
length of argument y . |
tolerance |
the used tolerance level. |
lev |
diagonal of the hat-matrix. |
ndx |
the number of internal knots -1 for each axis. |
deg |
degree of the B-splines for each axis. |
pord |
order of difference for each axis. |
x |
vector for the abscissa of data. |
y |
vector for the ordinate of data. |
Z |
matrix of counts response. |
offset |
matrix with an a priori known component. |
w |
matrix of weights. |
Carlo G Camarda
Eilers, P. H. C., I. D. Currie, and M. Durban (2006). Fast and compact smoothing on large multidimensional grids. Computational Statistics & Data Analysis 50, 61-76.
Currie, I. D., M. Durban, and P. H. C. Eilers (2006). Generalized linear array models with applications to multidimentional smoothing. Journal of the Royal Statistical Society. Series B. 68, 259-280.
predict.Mort2Dsmooth
,
plot.Mort2Dsmooth
.
# selected data ages <- 50:100 years <- 1950:2006 death <- selectHMDdata("Japan", "Deaths", "Females", ages = ages, years = years) exposure <- selectHMDdata("Japan", "Exposures", "Females", ages = ages, years = years) # fit with BIC fitBIC <- Mort2Dsmooth(x=ages, y=years, Z=death, offset=log(exposure)) fitBIC summary(fitBIC) # plot age 50 log death rates (1st row) plot(years, log(death[1,]/exposure[1,]), main="Mortality rates, log-scale. Japanese females, age 50, 1950:2006") lines(years, log(fitted(fitBIC)[1,]/exposure[1,]), col=2, lwd=2) # plot over age and years fitted log death rates from fitBIC grid. <- expand.grid(list(ages=ages, years=years)) grid.$lmx <- c(log(fitted(fitBIC)/exposure)) levelplot(lmx ~ years * ages , grid., at=quantile(grid.$lmx, seq(0,1,length=10)), col.regions=rainbow(9)) # about Extra-Poisson variation (overdispersion) # checking the presence of overdispersion fitBIC$psi2 # fitting accounting for overdispersion fitBICover <- Mort2Dsmooth(x=ages, y=years, Z=death, offset=log(exposure), overdispersion=TRUE) # difference in the selected smoothing parameters fitBIC$lambdas;fitBICover$lambdas