Beginner's Guide to CUF (Construction Under Forever):

( Complex Unstable Fragile )

Introduction

CUF is a C and Fortran library which provides an efficient scheme for the storage and retrieval of multidimensional data. It is an attempt to overcome the well-known limitations of the popular NetCDF and HDF data formats. In CUF the following is possible:

Basic Structure of a CUF Data File.

Variables and Dimensions: A CUF file is basically a collection of named dimensional variables (arrays). Each variable should be based on previously defined named dimensions.

Dimensions can be of fixed or expandable type (allowed to grow, as in a time series). CUF also handles the case when several variables share expandable dimension(s): all such variables are extended automatically.

The number of dimensions (even expandable ones) per variable is not limited. The number of variables per file is also unlimited, as is the number of simultaneously opened files (which may still be bounded by OS restrictions).
Dimensions, variables, and attributes can be added to an existing CUF data file.

Sparseness and Compression: A prominent feature of a CUF file is that a variable may be "compressed" in case it has natural sparseness (missing data). Such variables will be internally compressed which results in significant disk space saving and reduced network traffic. CUF supports multiple missing data flags for a variable.

Only a contiguous chunk of the fastest dimensions may be compressed (for example, for a variable V(X,Y,Z,T) the allowed compression chunks are: [X], [XY], [XYZ], and [XYZT]). For better overall performance, choose the compression chunk to match the typical access request.
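To make the chunk rule concrete, here is a minimal C sketch of the arithmetic behind it. It is not part of the CUF library: the function names chunk_elems and chunk_count are hypothetical, and the code only assumes that dimension sizes are listed fastest first.

```c
#include <stddef.h>

/* Illustration only, not a CUF call.
 * Number of elements in one compression chunk made of the first
 * ncdim (fastest-changing) dimensions.  dims[] lists the dimension
 * sizes, fastest first, e.g. {NX, NY, NZ, NT} for V(X,Y,Z,T). */
size_t chunk_elems(const size_t *dims, int ncdim)
{
    size_t n = 1;
    for (int i = 0; i < ncdim; i++)
        n *= dims[i];
    return n;
}

/* Number of such chunks in the whole variable: the product of the
 * remaining (slower) dimensions. */
size_t chunk_count(const size_t *dims, int ndim, int ncdim)
{
    size_t n = 1;
    for (int i = ncdim; i < ndim; i++)
        n *= dims[i];
    return n;
}
```

For a variable with dimensions {360, 180, 12, 137} and an [XY] chunk (ncdim = 2), chunk_elems gives 64800 elements per chunk and chunk_count gives 1644 chunks. A request that reads one [XY] map at a time then touches exactly one chunk.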

CUF gives a choice of constant or variable compression. For example, a variable V(X,Y,T) may have all [XY] chunks with the same sparseness ("constant" compression) or a different missing data pattern for each T point ("variable" compression).
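The difference between the two modes can be pictured with a simple mask-based packing scheme. The C sketch below is an assumption made for illustration: the names build_mask, pack_chunk and unpack_chunk are hypothetical, and this is not CUF's actual internal storage format, which this guide does not describe.

```c
#include <stddef.h>

/* Illustration only, not CUF's on-disk format.
 * Build a mask for one chunk: mask[i] = 1 where data are present,
 * 0 where chunk[i] equals the missing-data flag.  Returns the
 * number of present values. */
size_t build_mask(const float *chunk, size_t n, float miss,
                  unsigned char *mask)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        mask[i] = (chunk[i] != miss);
        kept += mask[i];
    }
    return kept;
}

/* Pack a chunk: copy only the values whose mask entry is 1.
 * Returns the number of values written to packed[]. */
size_t pack_chunk(const float *chunk, size_t n,
                  const unsigned char *mask, float *packed)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (mask[i])
            packed[k++] = chunk[i];
    return k;
}

/* Unpack: restore the full chunk, writing the missing-data flag
 * wherever the mask says no value was stored. */
void unpack_chunk(const float *packed, size_t n,
                  const unsigned char *mask, float miss, float *chunk)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        chunk[i] = mask[i] ? packed[k++] : miss;
}
```

In this picture, "constant" compression corresponds to calling build_mask once and reusing the same mask for every chunk, while "variable" compression corresponds to building and storing a separate mask for each chunk.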

The compression schemes may be shared between several variables in case their sparseness structures are the same. This results in even better total compression rate. Currently only the constant compression is supported for variables with expandable dimensions.

Attributes: Each variable, and the CUF file as a whole, may have an unlimited number of named attributes (see the attribute types below). These elements can be used to supply additional information about the CUF file and CUF variables. They are also a way to supply hypermedia documentation along with your data (inside your portable data file!). An attribute may be an array, i.e. consist of many elements of the same type.
An attribute's name and data can be changed, or an entire attribute can be deleted from a CUF file.

Data Types and Portability: Allowed data types for variables are: 1/2/4/8-byte INTEGERs, 4/8-byte REALs, and N-byte string segments (e.g. FORTRAN CHARACTER arrays).
For attributes you can also use C-type strings (null terminated), arrays of C-strings (char **), and files (binary data which can be read either directly from a file or from memory and, when retrieved, written into a file, into memory, or piped to <stdout>).

CUF uses a machine-independent data representation and therefore can be read/written on a variety of platforms supporting XDR library calls (SGI, SUN, HP, DEC, IBM, PC, MAC, CRAY...).

User Interfaces: The C and FORTRAN CUF libraries provide extensive control over CUF objects, either through the formal CUF system of IDs or explicitly through object names, for all major CUF elements: Dimensions, Variables and Attributes.


Top 10 CUF Functions

idf = cuf_open(file, key)                                Open a CUF file
idd = cuf_dfdim(idf, name, value)                        Define a dimension
idv = cuf_dfvar(idf, name, type)                         Start a var's definition
cuf_set_dim(name)                                        Set var's dimension
cuf_set_comp(type, ncdim)                                Set var's compression
cuf_enddf()                                              Close a var's definition
cuf_ptvar(idf, idv, vname, ista, icnt, data)             Write var's values
cuf_gtvar(idf, idv, vname, ista, icnt, data)             Read var's values
ida = cuf_set_attr(type, name, n, aval)                  Set an attribute
ida = cuf_get_attr(idf, idv, vname, ida, aname, n, val)  Get an attribute's data

CUF file dump utility

A dcuf utility is included with the CUF package. It provides an ASCII dump of the CUF file header and/or data. The rules for this simple program are:
 

Reading a CUF file

Let's assume that you need the temperature data from a CUF file GCMoutput.cuf which has a variable temp among others (like salt, vel, dens). We assume that you know the dimensions of temp (NX = 360 and NY = 180) obtained by using CUF-dump utility dcuf.

A Fortran code fragment that does this is the following:



      include '/usr/local/include/fcuf.h'
      dimension temp(360,180)
      dimension ista(2)/2*1/, icnt(2)/2*0/

c.....................open a CUF-file 
      idf = cuf_open ('GCMoutput.cuf', CUF_READ)
c.....................read the data for temperature variable into array temp
      call cuf_gtvar (idf, 0, 'temp', ista, icnt, temp)
c.....................close the CUF-file 
      call cuf_close (idf)

Here is a more advanced FORTRAN example of reading a CUF file:


c...Include the definitions of CUF constants:
      include '/usr/local/include/fcuf.h'
      dimension id_dim(10), ns_dim(10)
c...Name buffers (the length 64 is an assumption)
      character*64 vname, dname
      integer vtype, dtype, dval
c...Opens a file, returns a file ID
      idf = cuf_open('database.cuf', CUF_READ)
c...Inquire the number of dimensions, variables and global attributes
c...(argument lists follow the FORTRAN-interface reference section)
      call cuf_finfo(idf, ndim, nvar, natt)
c...For each variable:
      do idv = 1, nvar
c......Get the name, type, number of dimensions, number of attributes:
         call cuf_vinfo(idf, idv, vname, vtype, nvdim, nattr)
c......Obtain the dimensions' IDs and sizes:
         call cuf_gtvdim(idf, idv, vname, id_dim, ns_dim)
c......For each dimension:
         do i = 1, nvdim
c.........Get dimension's size, type and name:
            call cuf_dinfo(idf, id_dim(i), dval, dtype, dname)
         enddo
      enddo

Writing a CUF file

      include "/usr/local/include/fcuf.h"
      parameter (IYBA = 1856, NTMA = 1632)
      integer   nobs(360,180)
      integer*1 ipack(360,180)
      integer*1 imiss/0/
c.....icnt = 0 selects the whole NX and NY dimensions
      integer   ista(4)/4*1/, icnt(4)/0,0,1,1/
      real      x(360), y(180)
      integer   itm(12), ity(1992-1856+1)

      idf = cuf_open("coverage.cuf", CUF_OWRT)
      call cuf_dfdim(idf, "NX",  360)
      call cuf_dfdim(idf, "NY",  180)
      call cuf_dfdim(idf, "NTM", 12)
      call cuf_dfdim(idf, "NTY", 1992-1856+1)

      call cuf_dfvar(idf, "X", CUF_DF4)
        call cuf_set_dim("NX")

      call cuf_dfvar(idf, "Y", CUF_DF4)
        call cuf_set_dim("NY")

      call cuf_dfvar(idf, "TM", CUF_DI4)
        call cuf_set_dim("NTM")

      call cuf_dfvar(idf, "TY", CUF_DI4)
        call cuf_set_dim("NTY")

      call cuf_dfvar(idf, "nobs", CUF_DI1)
        call cuf_set_dim ("NX")
        call cuf_set_dim ("NY")
        call cuf_set_dim ("NTM")
        call cuf_set_dim ("NTY")
        call cuf_set_comp (CUF_VARY+CUF_BEST, 2)
        call cuf_set_miss (1, imiss)
      call cuf_enddf()

C.....Writing Grids:
      do i = 1, 360
         x(i) = -179.5 + float(i-1)
      enddo
      call cuf_ptvar(idf, 0, "X", 1, 0, x)

      do i = 1, 180
         y(i) = -89.5 + float(i-1)
      enddo
      call cuf_ptvar(idf, 0, "Y", 1, 0, y)

      do i = 1, 12
         itm(i) = i
      enddo
      call cuf_ptvar(idf, 0, "TM", 1, 0, itm)

      do i = 1856, 1992
         ity(i-1856+1) = i
      enddo
      call cuf_ptvar(idf, 0, "TY", 1, 0, ity)

C.....Writing DATA:
      do iy = 1, 1992-1856+1
         ista(4) = iy
         do im = 1, 12
            call getobs(150, 1856+iy-1, im, nobs)
            call pack2i1(nobs, 360, 180, ipack)
            ista(3) = im
            call cuf_ptvar(idf, 0, "nobs", ista, icnt, ipack)
         enddo
      enddo

      call cuf_close(idf)
      
      STOP
      END

Compilation and Linking

In your Fortran code use:
include '/usr/local/include/fcuf.h'

In your C code use:
#include "/usr/local/include/cuf.h"

Within the LDEO Climate Group (on rosie, lola, ariel, fox, breadbox etc.) you can just add the -lsenq option when linking:
f77 foo.f -o foo -lsenq
("-lsenq" library is available in mips1, mips1coff, mips2 and mips4 flavors).

On other systems use:
f77 foo.f -o foo -lcuf
or:
cc foo.c -o foo -lcuf
(You can download the include files and the CUF library.)


FORTRAN-interfaces to CUF-Library

(see also C-interface)

Some basic agreements in CUF are:

INCLUDE FILE: "/usr/local/include/fcuf.h"
Basic File Manipulation:
---------------------------------------------------------------------------
idf = cuf_open(file, key) - opens a file, 
         returns a file ID.

  file - /input/  character string: file name
  key  - /input/ CUF_READ - read only, 
		 CUF_UPDT - update or create
	         CUF_OWRT - always overwrite 
	         CUF_TEST - returns 1 if file is in CUF 
                                   format, 0 otherwise

call cuf_close(idf) - close a CUF file

call cuf_sync(idf)  - flush data to ensure CUF file integrity

integer*4 cuf_finfo(idf, ndim, nvar, natt) - returns
        the file info
  ndim /output/ - number of dimensions
  nvar /output/ - number of variables
  natt /output/ - number of GLOBAL attributes

Dimensions:
---------------------------------------------------------------------------
integer*4 function cuf_dfdim(idf, name, val) - defines a
                 dimension;returns a dimension id
   idf  /input/: file id from cuf_open(), 
                 or 0 for current file
   name /input/: dimension name /character*(*)/
   val  /input/: value (integer*4) (if val<=0, dimension will
                 be unlimited, with an initial size of abs(val) )

integer*4 function cuf_gtdim(idf, name, val) - returns an ID 
                 and the current value for a dimension.

bool cuf_dinfo(idf, id, val, type, name)
     for a given id, which is a number from 1 to ndim
     (*see cuf_finfo()), returns the value,
     type and name of the dimension.
     Type may be CUF_FIX or CUF_VARY

Variables:
---------------------------------------------------------------------------
integer*4 cuf_dfvar(idf, name, type) - opens a
   definition of a variable. Specifies the name and 
   a type (*see Data Types).
   May be followed by "cuf_set_..." calls and MUST be
   closed by a cuf_enddf() or another cuf_dfvar() call.

cuf_set_dim(name) - set a dimension for the variable
   name - name of a dimension. The order of the cuf_set_dim() calls is
   important: the first one sets the fastest-changing
   dimension, the second the next fastest, etc.
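
   The ordering rule is the same as Fortran array ordering. As a
   rough illustration (not a CUF call; the function name
   linear_offset is hypothetical), the position of an element in a
   flat buffer can be computed like this, assuming 1-based indices
   and dimension sizes listed fastest first:

```c
#include <stddef.h>

/* Illustration only, not a CUF call.
 * Linear (0-based) offset of an element given 1-based indices idx[],
 * where dims[0] is the fastest-changing dimension. */
size_t linear_offset(const size_t *dims, const size_t *idx, int ndim)
{
    size_t off = 0, stride = 1;
    for (int i = 0; i < ndim; i++) {
        off += (idx[i] - 1) * stride;  /* step along dimension i */
        stride *= dims[i];             /* one full step of the next dimension */
    }
    return off;
}
```

   For a variable defined with cuf_set_dim("NX") followed by
   cuf_set_dim("NY"), with NX = 360, moving one step in X changes
   the offset by 1 and moving one step in Y changes it by 360.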

cuf_copy_dims(name) - copies the dimensions from an already
   defined variable
   name - name of a variable which is a prototype for a current variable, 
          and all dimensions will be duplicated from a variable "name".
	
cuf_set_comp (type, ncdim) - set the compression type for
   a variable with possible missing data 

   type - CUF_FIX or CUF_VARY for constant and 
   varying compression. When type = CUF_FIX it is assumed 
   that data consist of chunks with the same pattern of missing 
   values throughout the whole range of higher dimensions.
   NOTE: You can enhance the compression (at some cost in performance)
   by adding CUF_BEST: type = CUF_FIX + CUF_BEST or CUF_VARY + CUF_BEST.

   ncdim - number of dimensions in a compression chunk (counted from 
      the lowest: changing first)



cuf_set_miss(nmiss, vmiss) - specify the missing data values.
   nmiss   - number of different possible missing data flags
   vmiss() - array of values for missing data flags
   NOTE: this call will automatically set a "missing_value"
   attribute for the given variable, equal to the first value of vmiss().
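
   With several missing data flags, a value counts as missing when
   it matches any one of them. The C sketch below shows that test on
   integer data. It is an illustration only: is_missing is a
   hypothetical helper, not a CUF function.

```c
#include <stddef.h>

/* Illustration only, not a CUF call.
 * Return 1 if value equals any of the nmiss flags in vmiss[],
 * 0 otherwise (including when nmiss is 0). */
int is_missing(int value, const int *vmiss, size_t nmiss)
{
    for (size_t i = 0; i < nmiss; i++)
        if (value == vmiss[i])
            return 1;
    return 0;
}
```

   For example, with the two flags {0, -1}, both 0 and -1 are
   treated as missing and any other value as data.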

cuf_copy_comp (name) - assumes that the variable has the
      same pattern of missing data as the referenced prototype.
   name - name of a variable which will be used as a compression 
      prototype for a current variable (compression parameters 
      will be duplicated from the variable "name", the 
      "missing_value" attribute will be set unless the variables 
      are matched, otherwise user should explicitly call cuf_set_miss()).

cuf_ptvar(idf, idv, vname, ista, icnt, data) - write variable's data
   idf - opened file ID (or 0, if current)
   idv - variable's ID or 0 if name is used (*see next) 
   vname - a string with the variable's name tag (ignored if idv != 0)
   ista(ndim) - an array of integers marking the beginning of 
        hypercorner of the data chunk. Value 1 corresponds to a first 
        element. Values for all dimensions should be present.
   icnt(ndim) - an array of integers which defines how many 
        rows/columns are in each dimension of the data hyperslice. 
        Value of 0 defaults to the whole dimension. (BUT: for an expandable
        dimension an explicit (!=0) value should be provided!)
   data() - data array
          
cuf_gtvar(idf, idv, vname, ista, icnt, data) - read variable's data
   (*see cuf_ptvar() )
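
   The data array passed to cuf_ptvar() or cuf_gtvar() must hold the
   whole selected hyperslice. The C sketch below counts its elements
   for fixed dimensions. It is an illustration only: the name
   hyperslab_elems is hypothetical, and it assumes that a count of 0
   means "from the start index to the end of the dimension", which
   equals the whole dimension when the start index is 1.

```c
#include <stddef.h>

/* Illustration only, not a CUF call.
 * Number of elements selected by 1-based start indices ista[] and
 * counts icnt[] over fixed dimensions dims[] (fastest first).
 * A count of 0 is taken to mean "up to the end of the dimension"
 * (an assumption of this sketch).  Returns 0 if the request does
 * not fit inside the variable. */
size_t hyperslab_elems(const size_t *dims, const size_t *ista,
                       const size_t *icnt, int ndim)
{
    size_t total = 1;
    for (int i = 0; i < ndim; i++) {
        if (ista[i] < 1 || ista[i] > dims[i])
            return 0;                           /* start out of range */
        size_t room = dims[i] - (ista[i] - 1);  /* elements from start to end */
        size_t cnt = icnt[i] ? icnt[i] : room;
        if (cnt > room)
            return 0;                           /* count runs past the end */
        total *= cnt;
    }
    return total;
}
```

   For dimensions {360, 180, 12, 137}, ista = {1, 1, 3, 5} and
   icnt = {0, 0, 1, 1} select one full 360 x 180 map, i.e. 64800
   elements.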

integer*4 cuf_vinfo (idf, idv, name, type, ndim, natt)
   idf  /input/  - opened file ID (or 0, if current)
   idv  /input/  - variable's ID. If =0 then "name" is used
   name /input/output/ - a string with the variable's name tag. 
        Will be filled with the variable name if (idv != 0) 
        unless you explicitly asked not to by using 
        CUF_NULL in place of a name.
   ndim /output/ - number of dimensions for a variable    
   natt /output/ - number of attributes for a variable    

integer*4 cuf_gtvdim (idf, idv, name, did, dsz)
   did(ndim) /output/ - an array ( see cuf_vinfo() about ndim) 
        filled with the dimensions' IDs for that variable.
   dsz(ndim) /output/ - an array filled with the 
        current dimensions' sizes.

Attributes:
---------------------------------------------------------------------------
integer*4 cuf_set_attr(type, name, n, aval)
   this function (which returns an attribute's ID), if used
   inside a cuf_dfvar() ... cuf_enddf() context, will
   define an attribute for the current variable. Outside of a variable
   definition this function sets a GLOBAL (file) attribute.

   type  - type (*see Data Types)
   name  - a character tag to identify the attribute
   n     - a number of elements in attribute (generally it is a
           one-dimensional array), but n CAN BE 0. 
   aval() - values of attribute (array)

   NOTE: CUF expects C-like strings for type=CUF_DSTR
   and a C-like array of string pointers if n > 1; therefore
   for Fortran-like CHARACTER STRING attributes use
   cuf_set_cattr() instead.   

   NOTE: A special type of attribute is the "file" type: 
   CUF_DFILE. You  can use it for storing documentation 
   accompanying the data (HTML, GIFs, AUX, .ps txt or any other binary
   information). While setting this kind of attribute the value
   of a parameter n has the following meanings: 
      n  = 0 : the file data will be read from the path=aval
      n != 0 : the n bytes of data will be read from the pointer aval 
   (See also a NOTE for cuf_get_attr() )   

integer*4 cuf_put_attr(idf, idv, vname, ida, aname, type, n, aval)
   Set or change variable and GLOBAL attributes values outside "defvar"
   context. Returns an attribute's ID.
   ida  /input/ - existing(!) attribute's ID; if =0 then aname is used.

   In order to set GLOBAL (file) attribute call with:
      idv=CUF_GLOBAL , vname=CUF_NULL 
        
   NOTE: For Fortran CHARACTER STRING attributes use
     cuf_put_cattr() instead.

integer*4 cuf_copy_attr(idf1,idv1,vname1, ida,aname, idf2,idv2,vname2)
   Copy the attribute's name and value(s) from one CUF file/variable to
   another CUF file/variable. Returns the attribute's ID.

cuf_del_attr(idf, idv, vname, ida, aname)
        Deletes variable and GLOBAL attributes. Use name or ID.
        Parameters as above.

cuf_ainfo(idf, idv, vname, ida, aname, n, type)
   idf, idv, vname see above
   ida   /input/ - attribute's ID, if =0 then use aname.
   aname /input/output/ - attribute's name, if ida != 0 will be returned
   n     /output/ - number of elements in the attribute
   type  /output/ - type of the attribute

cuf_get_attr(idf, idv, vname, ida, aname, n, val)
   (*see above, plus:)
   val   - value(s) of the attribute
   in case of array of characters (n > 1 and type = CUF_DSTR) 
   val is assumed to be an array of C-string pointers. 

   NOTE: For "file" type (CUF_DFILE) attributes the
      following rules apply:
      if n = CUF_PATHOUT : the data will be written to a file "val"
      if n = CUF_STDOUT  : the data will be dumped to <stdout>
      if n = CUF_FIDOUT  : the data will be written to a Fortran file descriptor "val"
      if n = CUF_CIDOUT  : the data will be written to a C (FILE *) descriptor "val"

Miscellaneous:
---------------------------------------------------------------------------
integer*4 cuf_len(idf, id1, id2, request)
   returns the length of a name (in bytes) for a dimension, variable or 
   an attribute for a given value of id1 and request:  
    CUF_DDIM
    CUF_DVAR
    CUF_DATTR, respectively.
   In case of an attribute user should specify a variable's ID 
   as id1 and an attribute's ID as id2.

   With a request=CUF_DDATA you may obtain information
   about data element size in bytes, (if id1 != 0 and id2 = 0),
   or attribute's data size if both id1 and id2 are !=0. 

Data Types:
---------------------------------------------------------------------------
For CUF variables the following data types are allowed:
  CUF_DI1  - an INTEGER*1 
  CUF_DU1  - a CHARACTER*1
  CUF_DI2  - an INTEGER*2 
  CUF_DU2  - an unsigned two-byte integer
  CUF_DI4  - an INTEGER*4
  CUF_DU4  - an unsigned four-byte integer
  CUF_DF4  - a REAL*4 or a C-float
  CUF_DF8  - a REAL*8, DOUBLE PRECISION or C-double
  CUF_DBN  - an N-bytes number, interpretation of number is up to
             a user (It may be viewed as fixed length character string). 
             The number of bytes per element should be specified as:
             CUF_DBN + N.

For CUF attributes you can use all the above types plus:
  CUF_DSTR - a character string "C"-style (ended with '\0').
             For this data type in case the attribute's dimension
             is greater than 1 (*see cuf_set_attr()),
             it will be treated as an array of C strings: **ps.
  CUF_DFILE  - used for a "file" type attributes only

C-interfaces to CUF-Library


INCLUDE FILE: "/usr/local/include/cuf.h"

Basic File Operations:
---------------------------------------------------------------------------
   int CUF_open(file, key) - opens a file, returns a file id 

	char *file - /input/  file name 
	int   key  - /input/  CUF_READ - read only, 
			      CUF_UPDT - update or create
	        	      CUF_OWRT - always create 

   void CUF_close(int idf) - close a file

   void CUF_sync(int idf)  - ensures file integrity
	
   int CUF_finfo(idf, ndim, nvar, natt) - provides file info 
	int *ndim /output/ - number of dimensions
	int *nvar /output/ - number of variables
	int *natt /output/ - number of GLOBAL attributes

Dimensions:
---------------------------------------------------------------------------
      int CUF_dfdim(idf, name, val) - defines a dimension,
        returns dimension ID
	int   idf  /input/: file id from cuf_open, or 0 for current file
	char *name /input/: dimension name /character*(*)/
	int   val  /input/: value (integer*4) (if 0, dimension is unlimited)

      int CUF_gtdim(idf, name, val) - returns an ID and
	current value for a dimension "name"

      int CUF_dinfo(idf, id, val, type, name)  
        for a given id, which is a number from 1 to ndim (*see CUF_finfo()),
        returns the value, type and name of the dimension.
        Type may be CUF_FIX/1/ or CUF_VARY/0/


Variables:
---------------------------------------------------------------------------
      integer*4 CUF_dfvar(int idf, char *name, int type) - opens a
	definition of a variable with a name and a type (*see TYPES)
        may be followed by "CUF_set_..." calls and SHOULD be closed
        by a "CUF_enddf" or another "CUF_dfvar" call.

      CUF_set_dim(char *name)
        char *name - name of a dimension. The order of the cuf_set_dim calls is
               important: the first one sets the fastest-changing dimension,
               the second the next fastest, etc.

      CUF_copy_dims(name)
        char *name - name of a variable which is a prototype for 
               a current variable, and all dimensions will 
               be duplicated from "name".
	
      cuf_set_comp (type, ncdim)
        type - CUF_FIX or CUF_VARY for constant and varying 
               compression. When type = CUF_FIX it is assumed that 
               data consist of chunks with the same pattern of missing values 
               throughout the range of higher dimensions.
               NOTE: You can enhance the compression (sacrificing some
               performance) by adding CUF_BEST: type = CUF_FIX + CUF_BEST
               or CUF_VARY + CUF_BEST.
        ncdim - number of dimensions in a compression chunk (counted from 
               the lowest: changing first)

      cuf_set_miss(nmiss, vmiss)
        nmiss - specify the number of various possible missing data flags
        vmiss - array of values for missing data flag

      cuf_copy_comp (name)        
        name - name of a variable which will be used as a compression prototype
               for a current variable (compression parameters will be 
               duplicated from the variable "name").

      cuf_ptvar(idf, idv, vname, ista, icnt, data)
        idf - opened file ID (or 0, if current)
        idv - variable's ID or 0 if name is used (*see next) 
        name - a string with the variable's name tag (ignored if idv != 0)
        ista - an array of integers marking the beginning of hypercorner of 
               data chunk. Value 1 corresponds to first index. One value
               for each dimension.
        icnt - an array of integers which defines how many "steps" in
               each dimension the hyperslice of data consist of. Value of 0 
               indicates that whole dimension is included.     
        data - data array
          
      cuf_gtvar(idf, idv, vname, ista, icnt, data)
        (*see cuf_ptvar() )

      integer*4 cuf_vinfo (idf, idv, name, type, ndim, natt)
        idf  /input/  - opened file ID (or 0, if current)
        idv  /input/  - variable's ID. If =0 then "name" is used
        name /input/output/ - a string with the variable's name tag (filled if idv != 0)
        ndim /output/ - number of dimensions for a variable
        natt /output/ - number of attributes for a variable

Attributes:
---------------------------------------------------------------------------
      integer*4 cuf_set_attr(type, name, n, aval)
        this function (which returns attribute's ID) should be used 
        after cuf_dfvar(), but before cuf_enddf or another cuf_dfvar 
        type  - type (*see Types)
        aname - a character tag to identify the attribute
        n     - a number of elements in attribute (generally it is a
                one-dimensional array), but n CAN BE 0. 
        aval  - values of attribute (array)
        NOTE: CUF expects C-like strings for type=CUF_DSTR
        and a C-like array of string pointers if n > 1; therefore
        for Fortran-like CHARACTER STRING attributes use
        cuf_set_cattr() instead.   
   
      integer*4 cuf_put_attr(idf, idv, vname, ida, aname, type, n, aval)
        Set or change variable and GLOBAL attribute values outside "defvar"
        context. Returns an attribute's ID.
        ida  /input/ - existing(!) attribute's ID; if =0 then aname is used.

        In order to set GLOBAL (file) attribute call with:
        idv=CUF_GLOBAL , vname=CUF_NULL 
        
        NOTE: For Fortran CHARACTER STRING attributes use
           cuf_put_cattr() instead.
     
      cuf_del_attr(idf, idv, vname, ida, aname)
        Delete variable and GLOBAL attributes. Use name or ID.
        Parameters as above.

      cuf_ainfo(idf, idv, vname, ida, aname, n, type)
        idf, idv, vname see above
        ida   /input/ - attribute's ID, if =0 then use aname.
        aname /input/output/ - attribute's name, if ida != 0 will be returned
        n     /output/ - number of elements in the attribute
        type  /output/ - type of the attribute

      cuf_get_attr(idf, idv, vname, ida, aname, n, val)
        (*see above, plus:)
        val   - value(s) of the attribute
        in case of array of characters (n > 1 and type = CUF_DSTR) 
        val is assumed to be an array of C-string pointers. 

Miscellaneous:
---------------------------------------------------------------------------
      integer*4 cuf_len(idf, id1, id2, request)
          returns the length of a name (in bytes) for a dimension, variable or 
          an attribute for a given value of id1 and request:  
           CUF_DDIM
           CUF_DVAR
           CUF_DATTR, respectively.
          In case of an attribute user should specify a variable's ID 
          as id1 and an attribute's ID as id2.

           With a request=CUF_DDATA you may obtain information
          about data element size in bytes, (if id1 != 0 and id2 = 0),
          or attribute's data size if both id1 and id2 are !=0. 

Data Types:
---------------------------------------------------------------------------
       CUF_DSTR - a character string "C"-style (ended with '\0')
       CUF_DI1  - an INTEGER*1 
       CUF_DU1  - a CHARACTER*1
       CUF_DI2  - an INTEGER*2 
       CUF_DU2  - an unsigned two-byte integer
       CUF_DI4  - an INTEGER*4   
       CUF_DU4  - an unsigned four-byte integer
       CUF_DF4  - a REAL*4 or a C-float
       CUF_DF8  - a REAL*8, DOUBLE PRECISION or C-double
       CUF_DBN  - an N-bytes number, interpretation of number is up to
                  a user (It may be viewed as fixed length character string). 
                  The number of bytes per element should be specified as:
                  CUF_DBN + N.

	                            Release 0.1         1995
