modCost {FME} | R Documentation |
Given a solution of a model and observed data, estimates the residuals, and the variable and model costs (sum of squared residuals).
modCost(model, obs, x="time", y=NULL, err=NULL, weight="none", scaleVar=FALSE, cost=NULL, ...)
model |
model output, as generated by the integration routine or the steady-state solver, a matrix or a data.frame, with one column per dependent and independent variable. |
obs |
the observed data, either in long (database) format (name, x, y), a data.frame, or in wide (crosstable, or matrix) format - see details. |
x |
the name of the independent variable; it should be a name
occurring both in the obs and model data structures.
|
y |
either NULL , the name of the column with the dependent
variable values,or an index to the dependent variable values; if NULL
then the observations are assumed to be in crosstable (matrix) format, and
the names of the independent variables are given by the column names of
this matrix.
|
err |
either NULL , or the name of the column with the
error estimates, used to weigh the residuals (see details);
if NULL , then the residuals are not weighed.
|
cost |
if not NULL , the output of a previous call to
modCost ; in this case, the new output will combine both.
|
weight |
only if err =NULL : how to weigh the residuals,
one of "none", "std", "mean", see details.
|
scaleVar |
if TRUE , then the residuals of one observed variable
are scaled respectively to the number of observations (see details).
|
... |
additional arguments passed to R-function approx .
|
This function compares model output with observed data.
It computes
There are three steps:
1. For any observed data point, i, the weighted residuals are estimated as:
res_i=(mod_i-obs_i)/err_i=(mod_i-obs_i)*weight_i
and where Mod_i and Obs_i are the modeled, respectively observed value of data point i.
The weights are equal to 1/error, where the latter can be inputted, one
for each data point by specifying err
as an extra column in the
observed data.
This can only be done when the data input is in long (database) format.
When err
is not inputted, then the weights are specified via argument
weight
which is either:
"none"
, which sets the weight equal to 1 (the default)
"std"
, which sets the weights equal to the standard deviation
of the observed data (can only be used if there is more than 1 data point)
"mean"
, which uses the mean of the absolute value of the
observed data (can only be used if not 0).
Cost_varj=sum(for i=1,n_j) (res_i^2)
where n_j is the number of observations for observed variable j.
3. Finally, the model Cost is estimated as the scaled sum of variable costs:
sum(Cost_varj/scale_j)
and where scale_j allows to scale the variable costs
relative to the number of observations.
This is set by specifying argument scaleVar
. If TRUE
, then the
variable costs are rescaled. The default is NOT to rescale
(i.e. scale_j=1).
The models typically consist of (a system of) differential equations, which are either solved by
deSolve
,
rootSolve
.
The data can be presented in two formats:
name
of the observed variable (always
the FIRST column), the (optional) value of the independent variable
(default column name = "time"), the value of the observation
and
the (optional) value of the error
.
For data presented in this format, the names of the column(s) with the
independent variable (x
) and the name of the column that has the
value of the dependent variable y
must be passed to function
costMod
.
x
).
As an example of both formats consider the data, called Dat
consisting
of two observed variables, called "Obs1" and "Obs2", both containing two
observations, at time 1 and 2:
name | time | val | err |
Obs1 | 1 | 50 | 5 |
Obs1 | 2 | 150 | 15 |
Obs2 | 1 | 1 | 0.1 |
Obs2 | 2 | 2 | 0.2 |
for the long format and
time | Obs1 | Obs2 |
1 | 50 | 1 |
2 | 150 | 2 |
for the crosstab format. Note, that in the latter case it is not possible to provide separate errors per data point.
By calling modCost several consecutive times (using the cost
argument),
it is possible to combine both types of data files.
a list of type modCost
containing:
model |
one value, the model cost, which equals the sum of scaled variable costs (see details). |
minlogp |
one value, -log(model probablity), where it is assumed that
the data are normally distributed, with standard deviation = error .
|
var |
the variable costs, a data.frame with, for each observed variable
the following (see details):
|
residuals |
the data residual, a data.frame with several columns:
|
In the future, it should be possible to have more than one independent variable present. This is not yet implemented, but it should allow e.g. to fit time series of spatially dependent variables.
Karline Soetaert <k.soetaert@nioo.knaw.nl>
## ======================================================================= ## Type 1 input: name, time, value ## ======================================================================= ## Create new data: two observed variables, "a", "b" Data <-data.frame(name=c(rep("a",4),rep("b",4)), time=c(1:4,2:5),val=c(runif(4),1:4)) ## "a nonsense model" Mod <- function (t,y,par) { da <- 0 db <- 1 return(list(c(da,db))) } out <- ode(y=c(a=0.5,b=0.5),times=0:6,func=Mod,parms=NULL) Data # Show out ## The cost function modCost(model=out,obs=Data,y="val") ## The cost function with a data error added Dat2 <- cbind(Data, Err=Data$val*0.1) # error=10 modCost(model=out,obs=Dat2,y="val",err="Err") ## ======================================================================= ## Type 2 input: Matrix format; column names = variable names ## ======================================================================= ## logistic growth model TT <- seq(1,100,2.5) N0 <- 0.1 r <- 0.5 K <- 100 ## analytical solution Ana <- cbind(time=TT,N=K/(1+(K/N0-1)*exp(-r*TT))) plot(TT, Ana[,"N"], ylim=c(0, 120), type="l", col="red", lwd=2, main = "logistic growth", xlab="time", ylab="N") ## numeric solution logist <- function(t, x, parms) { with(as.list(parms), { dx <- r * x[1] * (1 - x[1]/K) list(dx) }) } time <- 0:100 parms <- c(r = r, K = K) x <- c(N = N0) ## Compare several numerical solutions Euler <- ode(x, time, logist, parms, hini=2, method="euler") Rk4 <- ode(x, time, logist, parms, hini=2, method="rk4") Lsoda <- ode(x, time, logist, parms) # lsoda is default method Ana2 <- cbind(time=time,N=K/(1+(K/N0-1)*exp(-r*time))) ## the SSR and residuals with respect to the "data" cEuler <- modCost(Euler, Ana)$model cRk4 <- modCost(Rk4 , Ana)$model cLsoda <- modCost(Lsoda, Ana)$model cAna <- modCost(Ana2 , Ana)$model compare <- data.frame(method=c("euler", "rk4", "lsoda", "Ana"), cost=c(cEuler, cRk4, cLsoda, cAna)) points(Euler,col="red") points(Rk4,col="blue") legend("bottomright", c("exact","euler","rk4"), pch=c(NA,1,1), col=c("red","red","blue"),lty=c(1,NA,NA)) legend("right", ncol=2, title="SSR", legend=c(as.character(compare[,1]), format(compare[,2], digits=2))) compare ## ======================================================================= ## Now suppose we do not know K and r and they are to be fitted... ## The "observations" are the analytical solution ## ======================================================================= ## Run the model with initial guess: K=10, r=2 parms["K"]<-10 parms["r"]<-2 init <- ode(x, time, logist, parms) ## show results, compared with "observations" plot(TT, Ana[,"N"], ylim=c(0, 120), type="p", col="red", pch=16,cex=2, main = "logistic growth", xlab="time", ylab="N") lines (init ,lwd=2,col="green") ## FITTING algorithm uses modFit ## First define the objective function (model cost) to be minimised ## more general: using modFit Cost <- function(P) { parms["K"]<-P[1] parms["r"]<-P[2] out <- ode(x, time, logist, parms) return(modCost(out, Ana)) } (Fit<-modFit(p=c(K=10,r=2),f=Cost)) summary(Fit) ## run model with the optimized value: parms[c("K","r")]<-Fit$par fitted <- ode(x, time, logist, parms) lines (fitted,lwd=2,col="blue") legend("right",c("initial", "fitted"), col=c("green", "blue"), lwd=2) Cost(Fit$par)