P.J. Rayner, __A.M. Michalak__ and F. Chevallier

### SUMMARY

This paper provides an overview of computational data assimilation tools as they have been applied to further the understanding of biogeochemical cycles. Such tools have been used increasingly, both to allow multiple data streams to be considered together and to reduce the computational cost of analyses. Many of the widely used methods are adaptations of a general approach to data assimilation. In this paper we describe the general theory, introduce a consistent notation, and illustrate the general approach to solving data assimilation problems through a simple example. We then show how specific solution methods are special cases of the underlying Bayesian formalism and trace the historical development of the field through selected application examples.

**Figure:** Illustration of Bayesian inference for a system with one target variable and one observation. Panel (a) shows the joint probability distribution for the target variable (x-axis) and the measurement (y-axis). The light-blue rectangle represents the prior knowledge of the target variable (uniformly distributed between −0.2 and 0.2). The light-red rectangle represents the knowledge of the true value of the measurement (uniformly distributed between 0.8 and 1.2). The green rectangle represents the state of knowledge of the observation operator, which is a simple one-to-one mapping represented by the heavy green line. The black triangle (the intersection of the three PDFs) represents the solution as a joint PDF. Panel (b) shows the PDF of the target variable, obtained by projecting the black triangle from (a) onto the x-axis. Figure modified from Rayner (2010).
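The inference in the figure can be reproduced numerically by multiplying the three PDFs on an (x, y) grid and projecting the product onto the x-axis. This is a minimal sketch: the prior and measurement ranges come from the caption, but the observation operator details (the identity mapping y = x with a uniform uncertainty band |y − x| ≤ 0.8, named `OPERATOR_HALF_WIDTH` here) are hypothetical choices, not values given in the source. They are picked only so that the three PDFs overlap in a triangle.

```python
# Grid-based sketch of the Bayesian inference illustrated in the figure.
# Prior and measurement ranges come from the caption; the observation
# operator details are HYPOTHETICAL: identity mapping y = x with a uniform
# uncertainty band |y - x| <= 0.8 (not given in the source).

PRIOR_X = (-0.2, 0.2)       # prior knowledge of the target variable
OBS_Y = (0.8, 1.2)          # knowledge of the true value of the measurement
OPERATOR_HALF_WIDTH = 0.8   # hypothetical half-width of the operator band


def uniform(lo, hi, v):
    """Unnormalised uniform PDF: 1 inside [lo, hi], 0 outside."""
    return 1.0 if lo <= v <= hi else 0.0


def posterior_marginal_x(n=500, x_range=(-0.5, 0.5), y_range=(0.5, 1.5)):
    """Multiply the three PDFs on an (x, y) grid, then project onto the x-axis."""
    dx = (x_range[1] - x_range[0]) / n
    dy = (y_range[1] - y_range[0]) / n
    xs = [x_range[0] + (i + 0.5) * dx for i in range(n)]  # cell midpoints in x
    ys = [y_range[0] + (j + 0.5) * dy for j in range(n)]  # cell midpoints in y
    density = []
    for x in xs:
        total = 0.0
        for y in ys:
            joint = (
                uniform(*PRIOR_X, x)      # prior PDF of the target variable
                * uniform(*OBS_Y, y)      # PDF of the measurement
                * uniform(-OPERATOR_HALF_WIDTH, OPERATOR_HALF_WIDTH, y - x)  # operator band
            )
            total += joint * dy           # integrate the joint PDF over y
        density.append(total)
    norm = sum(density) * dx              # normalise the marginal to unit area
    return xs, [d / norm for d in density]


xs, pdf = posterior_marginal_x()
dx = xs[1] - xs[0]
mean = sum(x * p for x, p in zip(xs, pdf)) * dx
support = [x for x, p in zip(xs, pdf) if p > 0]
print(f"posterior mean ~ {mean:.3f}; support ~ [{min(support):.2f}, {max(support):.2f}]")
```

Under these assumed operator settings, the projected posterior rises linearly from zero at x = 0 to its maximum at x = 0.2, so its exact mean is 2/15 ≈ 0.133. The grid estimate should land close to that value.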

### ABSTRACT

This article lays out the fundamentals of data assimilation as used in biogeochemistry. It demonstrates that all of the methods in widespread use within the field are special cases of the underlying Bayesian formalism. Methods differ in the assumptions they make and the information they provide on the probability distributions used in Bayesian calculations. It thus provides a basis for comparison and choice among these methods. It also provides a standardised notation for the various quantities used in the field.
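As a concrete illustration of a method being a special case of the Bayesian formalism, consider one target variable with a Gaussian prior, Gaussian observation error and a linear observation operator. This is a minimal sketch under those assumptions, with hypothetical numbers throughout: the general calculation (posterior proportional to prior times likelihood, evaluated by brute force on a grid) should reproduce the standard closed-form linear-Gaussian update.

```python
import math

# Linear-Gaussian special case of Bayesian inference for one target variable.
# All numbers are HYPOTHETICAL, chosen only for illustration.
X_PRIOR, SIGMA_PRIOR = 0.0, 0.2   # prior mean and standard deviation
Y_OBS, SIGMA_OBS = 1.0, 0.2       # observed value and its error standard deviation
H = 1.0                           # linear observation operator: y = H * x


def gaussian(mean, sigma, v):
    """Normal probability density evaluated at v."""
    return math.exp(-0.5 * ((v - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def closed_form():
    """Analytic posterior mean and variance for a linear operator and Gaussian errors."""
    b, r = SIGMA_PRIOR ** 2, SIGMA_OBS ** 2
    gain = b * H / (H * b * H + r)                  # weight given to the observation
    mean = X_PRIOR + gain * (Y_OBS - H * X_PRIOR)   # prior corrected by the innovation
    var = (1.0 - gain * H) * b                      # posterior variance
    return mean, var


def brute_force(n=4001, lo=-1.0, hi=2.0):
    """General formalism: posterior proportional to prior * likelihood, on a 1-D grid."""
    dx = (hi - lo) / (n - 1)
    xs = [lo + i * dx for i in range(n)]
    post = [gaussian(X_PRIOR, SIGMA_PRIOR, x) * gaussian(H * x, SIGMA_OBS, Y_OBS) for x in xs]
    norm = sum(post) * dx
    mean = sum(x * p for x, p in zip(xs, post)) * dx / norm
    var = sum((x - mean) ** 2 * p for x, p in zip(xs, post)) * dx / norm
    return mean, var


print("closed form:", closed_form())
print("grid       :", brute_force())
```

With these numbers the prior and the observation are equally uncertain, so the gain is 0.5 and the posterior mean sits halfway between them (0.5) with variance 0.02; the grid calculation agrees to numerical precision.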