Skip to content

Optimal interpolation

Cristian Lussana edited this page Nov 21, 2024 · 57 revisions

Optimal interpolation (OI) can be used to create gridded analysis by combining a gridded background field (e.g. from an NWP model) with observations at points. OI weights the background and the observations based on the uncertainty of each data source.

Gridpp's OI method does not perform any quality control (QC) on the observations. Check out Titanlib, which contains suitable QC functions for this purpose. For a complete step-by-step guide on creating gridded analysis with titanlib and gridpp, checkout https://tnipen.github.io/2020/06/15/titanlib-gridpp.html.

The OI function signature looks like this:

gridpp.optimal_interpolation(bgrid, background, points, pobs, pratios,
   pbackground, structure, max_points) 

where bgrid is the background grid, background is the 2D background field, points is the locations of the observations, pobs is a vector of observations, pratios is ratio of observation error variance to background error variance, pbackground is the background interpolated to the observation points, structure is a structure function object, and max_points specifies the maximum number of observations to use for a gridpoint.

Structure function

The structure function specifies how analysis increments are spread spatially. Gridpp supports several structure functions, but typically a gaussian function is used (Barnes 1973):

structure = gridpp.BarnesStructure(100000, 200)

The first argument is the horizontal decorrelation length scale (in meters) and the second is the vertical decorrelation length scale (in meters). Optionally, a third argument specifying the decorrelation length across land area fraction (units 1), and a fourth argument specifying the maximum length that an observations will have an effect (in meters, also called the localization radius). If the third is unspecified, land area fractions are ignored in the structure function., If the fourth is unspecified, 3.64 times the horizontal scale is used.

A Cressman structure function is also available, where correlations decrease linearly to 0 at the length scales:

structure = gridpp.CressmanStructure(100000, 200)

Spatially varying structure function

The length scales in the Barnes structure function defined above is static. However in some cases it can be useful to allow the decorrelation lengths to vary with the station density. The BarnesStructure class can also be initialized with a 2D field of decorrelation lengths. The grid you chose to compute the lengths for does not have to be the same grid you will use in the OI. The OI function will use the nearest neighbour in the decorrelation length grid when determining what decorrelation length to use.

h = gridpp.distance(points, grid, 5)
v = 200 * np.ones(h.shape)
w = np.zeros(h.shape)
structure = gridpp.BarnesStructure(grid, h, v, w)

where points are a Points object with the observations, and grid is the desired grid to compute the decorrelation lengths for. In this case, h is the distance to the 5th nearest observation point.

Using a spatially varying structure function is tricky, because it often creates artifacts in the final OI analysis. This is especially true if the decorrelation lengths vary quickly in space. Consider to smooth the field, for example using a neighbourhood filter:

h = gridpp.neighbourhood(h, 5, gridpp.Mean)

Observation operator

Applying the observation operator to the background field results in a vector of values at the observation points. There are several methods to do this, but a typical one is nearest neighbour.

pbackground = gridpp.nearest(bgrid, points, background)

Any of gridpp's downscalers can be used for this, or the users can compute the values using their own preferred method.

Error variances

OI only requires the ratio of the error variance of the observations and background. Since observations are more trustworthy than the background, values less than 1 are typically used. The error variance ratios are passed in as vectors, meaning that different observations can be assigned different ratios.

Maximum locations

The computation time for OI is dependent on how many observations are available within the localization radius. If the domain has areas with very high density of stations, the calculations in this region can dominate the total computation time. max_points limits the number of observations that can be used for any particular gridpoint. If more observations are available, only the max_points stations with the highest correlation are used.

Example

pbackground = gridpp.bilinear(igrid, points, temp_analysis[:, :, 0])
pratios = np.full(points.size(), 0.1)
h = 100000
v = 200
structure = gridpp.BarnesStructure(h, v)
max_points = 50
analysis = gridpp.optimal_interpolation(igrid, temp_analysis[:, :, 0], points,
        temp_obs, pratios, pbackground, structure, max_points)

Transformation

OI can also be applied to transformed variables, by using the gridpp.optimal_interpolation_transform function. This requires a transform object as well as the specification of the background and observation standard errors (i.e., instead of the error variance ratio as in the untransformed OI). Currently a Box-Cox and logarithmic transformations are available:

transform = gridpp.BoxCox(0.1)
transform = gridpp.Log()

The Box-Cox transformation takes an input argument that specifies the exponent in the transformation.

Here is an example:

threshold = 0.1
transform = gridpp.BoxCox(threshold)
bsigma = 2
psigma = np.full(points.size(), 0.5)
analysis = gridpp.optimal_interpolation_transform(igrid, temp_analysis[:, :, 0], bsigma, points,
        temp_obs, psigma, pbackground, structure, max_points, transform)

Ensemble mode

Gridpp also supports an Ensemble-based Statistical Interpolation (EnSI; Lussana et al. 2019) scheme that uses spatial structure information from an ensemble of NWP model runs. The gridpp.optimal_interpolation_ensi function can be used for this. It takes a 3D input field (with the ensemble dimension being last) and produces a 3D analysis field. Here, only the observation standard error is needed, since the error variance of the background is deduced from the spread of the ensemble.

pbackground = np.zeros(temp_analysis.shape)
for e in range(temp_analysis.shape[2]):
      pbackground[:, :, e] = gridpp.bilinear(igrid, points, temp_analysis[:, :, e])
psigma = np.full(points.size(), 0.5)
analysis = gridpp.optimal_interpolation_ensi(igrid, temp_analysis, points,
        temp_obs, psigma, pbackground, structure, max_points)

Cross validation experiments

In verification experiments, it is unnecessary to run the OI on a full high resolution grid. Instead one wishes to only output the analysis at points where the analysis will be evaluated at. Gridpp provides OI functions that perform OI only for a set of output points.

analysis = gridpp.optimal_interpolation(points, pbackground, points,
        temp_obs, pratios, pbackground, structure, max_points)

Such a setup will generally produce very accurate analyses at the observation points since the same observations have been used as input. Gridpp can compute the analysis in a "leave-one-out" cross-validation fashion. For each output point, the observation at that point is left out of the interpolation. This can be done by defining a cross-validation structure function as follows:

structure = gridpp.Barnes
structure_cv = gridpp.CrossValidation(structure, 2000)

The structure_cv object can then be passed into the OI as before. This will force the correlation of all observations within a 2000 m radius of an output point to be 0, effectively discarding the observation. This approach is highly efficient for finding the optimal parameters in the OI since the computational cost is very low.

For an in-depth discussion of using cross validation with gridpp, check out SMHI Gridded climatology, where the authors have done extensive work on optimized OI parameters.

Ensemble multi mode

Optimal_interpolation_ensi_multi_ebe(...) This function creates an analysis ensemble for a variable at a specific time, using inputs from the ensemble background and observations of the same or different variables and times. Users can define dynamic correlation functions through two additional background ensembles. The function supports iterative corrections to the analysis, incorporating multiple definitions of the correlation functions, multiple information sources, or both. Each ensemble member is adjusted iteratively or "ensemble member by ensemble member" (ebe).

If we define: L as the number of grid points; S as the number of observations; E as the number of ensemble members. The function returns:

  • 2D vector of analised values (L, E)

The list of inputs follows:

  • bpoints: Points of background field (L)
  • bratios: 1D vector (L) representing the ratio of background error standard deviation at grid points to that at observation points. The background at grid points is the value being updated, while the background at station points shares the units and time of the observations, which may differ from the background at grid points. This vector contains coefficients (0-1) that adjust the analysis at grid points, accounting for differences in units and variability between the innovations (observation minus background) and the background at grid points. For example, if trusting the ensemble spread, bratios can be set as the ratio between the ensemble spread of the background to be updated and that used to compute the innovations. If the ensemble spread is not trusted at specific times or grid points, the bratios can be based on a typical expected ratio of spreads from multiple ensemble background realizations.
  • background: 2D vector (L, E) representing the background values at grid points to be updated.
  • background_corr: 2D vector (L, E) representing the background values used to compute dynamic correlations.
  • obs_points: Observation points (S = num. observations)
  • pobs: 2D vector of perturbed observations (S, E)
  • pratios: 1D vector (S) representing the ratio of observation to background error variance. These coefficients (0-1) indicate the relative trust in observations versus the background. A value of 1 means equal trust in both, while values close to 0 indicate greater trust in the observations. For example, a value of 0.1 means the observations are trusted 10 times more than the background.
  • pbackground: 2D vector (S, E) representing the background values at observation points used to compute innovations.
  • pbackground_corr: 2D vector (S, E) representing the background values at observation points used to compute dynamic correlations.
  • structure: Structure function for the localization function
  • max_points: Maximum number of observations to use inside localization zone; Use 0 to disable
  • allow_extrapolation Allow EnSI to extrapolate increments outside increments at observations

Optimal_interpolation_ensi_multi_ebesc(...) This function is similar to Optimal_interpolation_ensi_multi_ebe(...), but uses analytical functions to define correlation functions instead of ensembles. Unlike the localization functions in the previous method, the structure functions here represent static correlations.

Optimal_interpolation_ensi_multi_utem(...) This function extends Optimal_interpolation_ensi() for iterative calls, similar to Optimal_interpolation_ensi_multi_ebe(...). The key difference is in generating the analysis ensemble: first, the ensemble mean is updated, followed by generating ensemble members based on background correlation functions and ensemble spread. The input parameters are the same as Optimal_interpolation_ensi_multi_ebe(...), but observations are not perturbed here.

Clone this wiki locally