modCost {FME}R Documentation

Calculates the Discrepancy of a Model Solution with Observations

Description

Given a solution of a model and observed data, estimates the residuals, and the variable and model costs (sum of squared residuals).

Usage

modCost(model, obs, x="time", y=NULL, err=NULL,
  weight="none", scaleVar=FALSE, cost=NULL,  ...)

Arguments

model model output, as generated by the integration routine or the steady-state solver, a matrix or a data.frame, with one column per dependent and independent variable.
obs the observed data, either in long (database) format (name, x, y), a data.frame, or in wide (crosstable, or matrix) format - see details.
x the name of the independent variable; it should be a name occurring both in the obs and model data structures.
y either NULL, the name of the column with the dependent variable values,or an index to the dependent variable values; if NULL then the observations are assumed to be in crosstable (matrix) format, and the names of the independent variables are given by the column names of this matrix.
err either NULL, or the name of the column with the error estimates, used to weigh the residuals (see details); if NULL, then the residuals are not weighed.
cost if not NULL, the output of a previous call to modCost; in this case, the new output will combine both.
weight only if err=NULL: how to weigh the residuals, one of "none", "std", "mean", see details.
scaleVar if TRUE, then the residuals of one observed variable are scaled respectively to the number of observations (see details).
... additional arguments passed to R-function approx.

Details

This function compares model output with observed data.

It computes

  1. the weighted residuals, one for each data point.
  2. the variable costs, i.e. the sum of squared weight residuals per variable.
  3. the model cost, the scaled sum of variable costs .

There are three steps:

1. For any observed data point, i, the weighted residuals are estimated as:

res_i=(mod_i-obs_i)/err_i=(mod_i-obs_i)*weight_i

and where Mod_i and Obs_i are the modeled, respectively observed value of data point i.

The weights are equal to 1/error, where the latter can be inputted, one for each data point by specifying err as an extra column in the observed data.

This can only be done when the data input is in long (database) format.

When err is not inputted, then the weights are specified via argument weight which is either:

2. Then for each observed variable, j, a variable cost is estimated as the sum of squared weighted residuals for this variable:

Cost_varj=sum(for i=1,n_j) (res_i^2)

where n_j is the number of observations for observed variable j.

3. Finally, the model Cost is estimated as the scaled sum of variable costs:

sum(Cost_varj/scale_j)

and where scale_j allows to scale the variable costs relative to the number of observations. This is set by specifying argument scaleVar. If TRUE, then the variable costs are rescaled. The default is NOT to rescale (i.e. scale_j=1).

The models typically consist of (a system of) differential equations, which are either solved by

The data can be presented in two formats:

As an example of both formats consider the data, called Dat consisting of two observed variables, called "Obs1" and "Obs2", both containing two observations, at time 1 and 2:
name time val err
Obs1 1 50 5
Obs1 2 150 15
Obs2 1 1 0.1
Obs2 2 2 0.2

for the long format and

time Obs1 Obs2
1 50 1
2 150 2

for the crosstab format. Note, that in the latter case it is not possible to provide separate errors per data point.

By calling modCost several consecutive times (using the cost argument), it is possible to combine both types of data files.

Value

a list of type modCost containing:

model one value, the model cost, which equals the sum of scaled variable costs (see details).
minlogp one value, -log(model probablity), where it is assumed that the data are normally distributed, with standard deviation = error.
var the variable costs, a data.frame with, for each observed variable the following (see details):
  • name, the name of the observed variable.
  • scale, the scale-factor used to weigh the variable cost, either 1 or 1/(number observations), defaults to 1.
  • N, the number of data points per observed variable.
  • SSR.unweighted, the sum of squared residuals per observed variable, unweighted.
  • SSR, the sum of weighted squared residuals per observed variable(see details).
residuals the data residual, a data.frame with several columns:
  • name, the name of the observed variable.
  • x, the value of the independent variable (if present).
  • obs, the observed variable value.
  • mod, the corresponding modeled value.
  • weight, the factor used to weigh the residuals, 1/error, defaults to 1.
  • res, the weighted residuals between model and observations (mod-obs)*weight.
  • res.unweighted, the residuals between model and observations (mod-obs).

Note

In the future, it should be possible to have more than one independent variable present. This is not yet implemented, but it should allow e.g. to fit time series of spatially dependent variables.

Author(s)

Karline Soetaert <k.soetaert@nioo.knaw.nl>

Examples


## =======================================================================
## Type 1 input:  name, time, value
## =======================================================================

## Create new data: two observed variables, "a", "b"
Data  <-data.frame(name=c(rep("a",4),rep("b",4)),
                   time=c(1:4,2:5),val=c(runif(4),1:4))

## "a nonsense model"
Mod <- function (t,y,par) {
 da <- 0
 db <- 1
 return(list(c(da,db)))
}

out <- ode(y=c(a=0.5,b=0.5),times=0:6,func=Mod,parms=NULL)

Data   # Show
out

## The cost function
modCost(model=out,obs=Data,y="val")

## The cost function with a data error added
Dat2 <- cbind(Data, Err=Data$val*0.1)  # error=10
modCost(model=out,obs=Dat2,y="val",err="Err")

## =======================================================================
## Type 2 input:  Matrix format; column names = variable names
## =======================================================================

## logistic growth model
TT    <- seq(1,100,2.5)
N0    <- 0.1
r     <- 0.5
K     <- 100

## analytical solution
Ana <- cbind(time=TT,N=K/(1+(K/N0-1)*exp(-r*TT)))

plot(TT, Ana[,"N"], ylim=c(0, 120),  type="l", col="red", lwd=2,
     main = "logistic growth", xlab="time", ylab="N")

## numeric solution
logist <- function(t, x, parms) {
  with(as.list(parms), {
    dx <- r * x[1] * (1 - x[1]/K)
    list(dx)
  })
}

time  <- 0:100
parms <- c(r = r, K = K)
x     <- c(N = N0)

## Compare several numerical solutions
Euler <- ode(x, time, logist, parms, hini=2, method="euler")
Rk4   <- ode(x, time, logist, parms, hini=2, method="rk4")
Lsoda <- ode(x, time, logist, parms) # lsoda is default method
Ana2  <- cbind(time=time,N=K/(1+(K/N0-1)*exp(-r*time)))

## the SSR and residuals with respect to the "data"
cEuler <- modCost(Euler, Ana)$model
cRk4   <- modCost(Rk4  , Ana)$model
cLsoda <- modCost(Lsoda, Ana)$model
cAna   <- modCost(Ana2 , Ana)$model
compare <- data.frame(method=c("euler", "rk4", "lsoda", "Ana"),
                      cost=c(cEuler, cRk4, cLsoda, cAna))

points(Euler,col="red")
points(Rk4,col="blue")

legend("bottomright", c("exact","euler","rk4"), pch=c(NA,1,1),
      col=c("red","red","blue"),lty=c(1,NA,NA))
legend("right", ncol=2, title="SSR",
      legend=c(as.character(compare[,1]), format(compare[,2], digits=2)))

compare

## =======================================================================
## Now suppose we do not know K and r and they are to be fitted...
## The "observations" are the analytical solution
## =======================================================================

## Run the model with initial guess: K=10, r=2
parms["K"]<-10
parms["r"]<-2
init <-  ode(x, time, logist, parms)

## show results, compared with "observations"
plot(TT, Ana[,"N"], ylim=c(0, 120),  type="p", col="red", pch=16,cex=2,
     main = "logistic growth", xlab="time", ylab="N")

lines (init  ,lwd=2,col="green")

## FITTING algorithm uses modFit
## First define the objective function (model cost) to be minimised

## more general: using modFit
Cost <- function(P) {
 parms["K"]<-P[1]
 parms["r"]<-P[2]
 out <- ode(x, time, logist, parms)
 return(modCost(out, Ana))
}
(Fit<-modFit(p=c(K=10,r=2),f=Cost))

summary(Fit)

## run model with the optimized value:
parms[c("K","r")]<-Fit$par
fitted <-  ode(x, time, logist, parms)

lines (fitted,lwd=2,col="blue")

legend("right",c("initial", "fitted"), col=c("green", "blue"), lwd=2)

Cost(Fit$par)


[Package FME version 1.0 Index]