8/20/03
Climate Model Data Documentation Project
In the Climate Group we rely extensively on the data produced by general circulation models (GCMs) to answer fundamental science questions. Analysis of the model data is then used to support the scientific conclusions in our publications. This has raised many issues for us regarding the archiving, accessibility and documentation of our results, and in turn the fundamental issue of the integrity and reproducibility of our modeling experiments. Typically the published data of modelers consists of figures, processed data (such as correlations, EOF patterns and other statistical analyses) and tables of area-averaged, depth-integrated, time-smoothed data. Due primarily to size constraints, the `raw' computer output is not available to scientists who wish to explore the results further. Furthermore, the exact source code used for the models themselves is difficult to duplicate. Even local scientists (the same group at the same institution) who repeat a model run frequently report different results due to some unknown combination of code evolution, changes in personnel, migration of computer platforms and data storage, or lack of proper documentation of parameters. This disturbing state of affairs wastes computer and personnel resources and fundamentally compromises the scientific effort itself.
We feel that computer model integrity, particularly for large GCMs, is a fundamental challenge to be addressed in the evolving "Cyberinfrastructure" initiative. To this end, we have recently piloted a comprehensive procedure for ensuring the integrity of our local GCM runs. The procedure is web-based and is incorporated into Benno Blumenthal's Climate Data Library. We made this choice because it offers the flexibility of serving data over the internet to the whole community through the Distributed Oceanographic Data System (DODS), as well as providing its own graphical interface for data analysis. In addition, all necessary documentation, source code, and initialization and forcing data can be archived in one place. We are also developing a tool for the automatic submission and documentation of all GCM runs referred to in publications. The URL for this project is
http://ocp.ldeo.columbia.edu/climategroup/datadoc/
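As a concrete illustration of what serving the data through DODS makes possible, the short Python sketch below opens an archived run directly by its DODS URL and computes an area-averaged, time-smoothed series of the kind we typically publish. This is only a sketch under assumptions: the dataset URL is a hypothetical placeholder, and the variable and coordinate names (temp, depth, lat, lon, time) are assumed rather than taken from any actual run.

    # Illustrative sketch only; the URL and variable names below are assumptions.
    import numpy as np
    import xarray as xr

    # Any DODS-capable client can open a Data Library dataset by its URL
    # (placeholder URL shown here for illustration).
    url = "http://ocp.ldeo.columbia.edu/climategroup/datadoc/EXAMPLE/dods"
    ds = xr.open_dataset(url)  # values are transferred over DODS only as needed

    # Example of a published-style quantity: an area-weighted, 12-month
    # smoothed surface temperature series (assumed variable/coordinate names).
    sst = ds["temp"].isel(depth=0)
    weights = np.cos(np.deg2rad(ds["lat"]))
    series = sst.weighted(weights).mean(dim=("lat", "lon"))
    smoothed = series.rolling(time=12, center=True).mean()

The same dataset could equally be analyzed through the Data Library's own graphical interface or downloaded in full; the point is simply that the archived run is reachable by any DODS client.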
For an example of how an individual run is documented, click on the `tav1_spin10' sample run link under MODELS/LOAM. In particular, the `dataset documentation' link provides detailed instructions for downloading the source code, executable, and all necessary datasets and control files to reproduce the model output; the output can also be analyzed online or downloaded directly.
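To show what reproducing a documented run buys, here is a hedged sketch of how a rerun could be checked against the archived output once both are in hand; the file names and the numerical tolerance are illustrative assumptions, not part of the documented procedure.

    import numpy as np
    import xarray as xr

    # File names are illustrative: one file downloaded from the Data Library,
    # one produced by rerunning the archived code with the archived inputs.
    archived = xr.open_dataset("tav1_spin10_archived.nc")
    rerun = xr.open_dataset("tav1_spin10_rerun.nc")

    # Compare every archived variable field by field.
    for name, var in archived.data_vars.items():
        if name not in rerun:
            print(f"{name}: missing from the rerun")
            continue
        same = np.allclose(var.values, rerun[name].values,
                           rtol=1e-6, atol=0.0, equal_nan=True)
        print(f"{name}: {'matches archive' if same else 'DIFFERS from archive'}")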
Many of our prior model results have already been compromised over the years. Creating and maintaining this Climate Model Data Documentation Project is crucial to correcting this situation and to setting a new standard for this type of research. All new published modeling results from our group will be added to this project. As time allows, existing projects will be brought into compliance and added as well.