Sparseness and Compression: A prominent feature of
a CUF file is that a variable may be "compressed" if it is
naturally sparse (has missing data). Such variables are compressed
internally, which saves significant disk space and reduces
network traffic. CUF supports multiple missing data flags for a
variable.
Only a contiguous chunk of the fastest dimensions can be
compressed (for example, for a variable V(X,Y,Z,T) the allowed
compression chunks are [X], [XY], [XYZ], and [XYZT]). For better
overall performance, choose the compression chunk to match the
typical access request.
CUF offers a choice of constant or variable compression. For example, a
variable V(X,Y,T) may have all [XY] chunks with the same sparseness
("constant" compression) or a different missing data pattern
for each T point ("variable" compression).
A compression scheme may be shared between several variables if their
sparseness structures are the same. This results in an even better
total compression rate.
Currently only constant compression is supported for
variables with expandable dimensions.
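The idea of compressing a sparse chunk can be sketched in C as follows. This is only an illustration under stated assumptions, not CUF's internal format (which is not described here): a chunk is reduced to a presence mask plus the packed non-missing values. The names is_missing, build_mask, pack_chunk and unpack_chunk are hypothetical and are not part of the CUF library. The sketch restores every missing cell with one fill value, so it does not record which of several missing data flags was originally present.

```c
#include <stddef.h>

/* Hypothetical helper: 1 if v equals any of the nmiss missing-data flags. */
static int is_missing(int v, const int *miss, size_t nmiss)
{
    for (size_t k = 0; k < nmiss; k++)
        if (v == miss[k])
            return 1;
    return 0;
}

/* Build a presence mask for one chunk: mask[i] = 1 where data is present.
   Returns the number of present (non-missing) values. */
size_t build_mask(const int *chunk, size_t n,
                  const int *miss, size_t nmiss, unsigned char *mask)
{
    size_t present = 0;
    for (size_t i = 0; i < n; i++) {
        mask[i] = (unsigned char)!is_missing(chunk[i], miss, nmiss);
        present += mask[i];
    }
    return present;
}

/* Pack a chunk: keep only the values the mask marks as present.
   Returns the number of values written to packed[]. */
size_t pack_chunk(const int *chunk, size_t n,
                  const unsigned char *mask, int *packed)
{
    size_t m = 0;
    for (size_t i = 0; i < n; i++)
        if (mask[i])
            packed[m++] = chunk[i];
    return m;
}

/* Unpack: rebuild the full chunk, writing fill where the mask says missing. */
void unpack_chunk(const int *packed, size_t n,
                  const unsigned char *mask, int fill, int *chunk)
{
    size_t m = 0;
    for (size_t i = 0; i < n; i++)
        chunk[i] = mask[i] ? packed[m++] : fill;
}
```

In terms of this sketch, "constant" compression corresponds to calling build_mask once and reusing the same mask for every [XY] chunk, while "variable" compression corresponds to building and storing a separate mask for each T point. Sharing a scheme between variables amounts to reusing one mask for several variables.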
Attributes: Each variable, and the CUF file as a whole, may
have an unlimited number of named attributes (see the attribute types
below). Attributes can be used to supply additional information
about a CUF file and its variables. They are also a way to supply
hypermedia documentation along with your data (inside your portable
data file!). An attribute may be an array, i.e. consist of many
elements of the same type.
An attribute's name and data can be
changed, or an entire attribute can be deleted from a CUF file.
Data Types and Portability: Allowed data types for
variables are: 1/2/4/8-byte INTEGERs, 4/8-byte REALs,
and N-byte string segments (e.g. FORTRAN CHARACTER arrays).
For attributes you can also use C-type strings (null
terminated), arrays of C-strings (char **), and files
(binary data that can be read either directly from a file or from
memory and, when retrieved, written to a file, to memory, or piped to
<stdout>).
CUF uses a machine-independent data representation and therefore can be
read and written on a variety of platforms that support XDR library calls
(SGI,SUN,HP,DEC,IBM,PC,MAC,CRAY...).
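The notion of a machine-independent representation can be illustrated with a small C sketch. XDR stores a 32-bit integer as four bytes, most significant byte first, regardless of the host byte order. The functions below are hypothetical helpers written for this illustration; they are neither XDR library calls nor CUF calls, and CUF's actual on-disk layout is not described here.

```c
#include <stdint.h>

/* Encode a 32-bit value as 4 bytes, most significant byte first
   (the byte order XDR uses for 32-bit integers). */
void put_be32(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Decode 4 big-endian bytes back into a host 32-bit value. */
uint32_t get_be32(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

Because the encoding is defined by shifts rather than by copying memory, the same bytes are produced on big-endian and little-endian hosts, which is what makes a file written on one platform readable on another.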
User Interfaces: The C and FORTRAN CUF libraries provide extensive control of CUF objects, either through the formal CUF system of IDs or explicitly through object names, for all major CUF elements: Dimensions, Variables and Attributes.
idf = cuf_open(file, key)                                - Open a CUF file
idd = cuf_dfdim(idf, name, value)                        - Define a dimension
idv = cuf_dfvar(idf, name, type)                         - Start a variable's definition
cuf_set_dim(name)                                        - Set a variable's dimension
cuf_set_comp(type, ncdim)                                - Set a variable's compression
cuf_enddf()                                              - Close a variable's definition
cuf_ptvar(idf, idv, vname, ista, icnt, data)             - Write a variable's values
cuf_gtvar(idf, idv, vname, ista, icnt, data)             - Read a variable's values
ida = cuf_set_attr(type, name, n, aval)                  - Set an attribute
ida = cuf_get_attr(idf, idv, vname, ida, aname, n, val)  - Get an attribute's data
A Fortran fragment that reads the temperature variable from a CUF file looks like this:
      include '/usr/local/include/fcuf.h'
      dimension temp(360,180)
      dimension ista(2)/2*1/, icnt(2)/2*0/
c.....................open a CUF-file
      idf = cuf_open ('GCMoutout.cdf', CUF_READ)
c.....................read the data for temperature variable into array temp
      call cuf_gtvar (idf, 0, 'temp', ista, icnt, temp)
c.....................close the CUF-file
      call cuf_close (idf)

Here is a more advanced FORTRAN example of reading a CUF file:
c...Include the definitions of CUF constants:
      include '/usr/local/include/fcuf.h'
c...Name buffers (the sizes here are illustrative)
      character*32 vname(100), name
      integer type, val
      dimension id_dim(10), ns_dim(10)
c...Opens a file, returns a file ID
      idf = cuf_open('database.cuf', CUF_READ)
c...Inquire the number of dimensions, variables and global attributes
      call cuf_finfo(idf, ndim, nvar, natt)
c...For each variable:
      do idv = 1, nvar
c......Get the name, type, number of dimensions, number of attributes:
         call cuf_vinfo(idf, idv, vname(idv), type, ndim, nattr)
c......Obtain the dimensions' IDs and sizes:
         call cuf_gtvdim(idf, idv, vname(idv), id_dim, ns_dim)
c......For each dimension:
         do i = 1, ndim
c.........Get dimension's size, type and name:
            call cuf_dinfo(idf, id_dim(i), val, type, name)
         enddo
      enddo
A FORTRAN example of creating a CUF file and writing a compressed variable:

      include "/usr/local/include/fcuf.h"
      parameter (IYBA = 1856, NTMA = 1632)
      integer nobs(360,180)
      integer*1 ipack(360,180)
      integer*1 imiss/0/
c.....each write covers one full [X,Y] slice at a single (month, year)
      integer ista(4)/4*1/, icnt(4)/360,180,1,1/
      real x(360), y(180)
      integer itm(12), ity(1992-1856+1)

      idf = cuf_open("coverage.cuf", CUF_OWRT)
C.....Defining dimensions:
      call cuf_dfdim(idf, "NX", 360)
      call cuf_dfdim(idf, "NY", 180)
      call cuf_dfdim(idf, "NTM", 12)
      call cuf_dfdim(idf, "NTY", 1992-1856+1)
C.....Defining grid variables:
      call cuf_dfvar(idf, "X", CUF_DF4)
      call cuf_set_dim("NX")
      call cuf_dfvar(idf, "Y", CUF_DF4)
      call cuf_set_dim("NY")
      call cuf_dfvar(idf, "TM", CUF_DI4)
      call cuf_set_dim("NTM")
      call cuf_dfvar(idf, "TY", CUF_DI4)
      call cuf_set_dim("NTY")
C.....Defining the data variable, compressed over [X,Y] chunks:
      call cuf_dfvar(idf, "nobs", CUF_DI1)
      call cuf_set_dim ("NX")
      call cuf_set_dim ("NY")
      call cuf_set_dim ("NTM")
      call cuf_set_dim ("NTY")
      call cuf_set_comp (CUF_VARY+CUF_BEST, 2)
      call cuf_set_miss (1, imiss)
      call cuf_enddf()
C.....Writing Grids:
      do i = 1, 360
         x(i) = -179.5 + float(i-1)
      enddo
      call cuf_ptvar(idf, 0, "X", 1, 0, x)
      do i = 1, 180
         y(i) = -89.5 + float(i-1)
      enddo
      call cuf_ptvar(idf, 0, "Y", 1, 0, y)
      do i = 1, 12
         itm(i) = i
      enddo
      call cuf_ptvar(idf, 0, "TM", 1, 0, itm)
      do i = 1856, 1992
         ity(i-1856+1) = i
      enddo
      call cuf_ptvar(idf, 0, "TY", 1, 0, ity)
C.....Writing DATA:
      do iy = 1, 1992-1856+1
         ista(4) = iy
         do im = 1, 12
            call getobs(150, 1856+iy-1, im, nobs)
            call pack2i1(nobs, 360, 180, ipack)
            ista(3) = im
            call cuf_ptvar(idf, 0, "nobs", ista, icnt, ipack)
         enddo
      enddo
      call cuf_close(idf)
      STOP
      END
In your C code use:
#include "/usr/local/include/cuf.h"
Within the LDEO Climate Group (on rosie, lola, ariel, fox,
breadbox etc.) you can simply add the -lsenq option
when linking:
f77 foo.f -o foo -lsenq
("-lsenq" library is available in mips1, mips1coff, mips2 and mips4 flavors).
On other systems, use:
f77 foo.f -o foo -lcuf
or:
cc foo.c -o foo -lcuf
(The include files and the CUF library are available for download.)
Some basic agreements in CUF are:
Basic File Manipulation:
---------------------------------------------------------------------------
idf = cuf_open(file, key) - opens a file, returns a file ID.
    file - /input/ character string: file name
    key  - /input/ CUF_READ - read only
                   CUF_UPDT - update or create
                   CUF_OWRT - always overwrite
                   CUF_TEST - returns 1 if file is in CUF format,
                              0 otherwise

call cuf_close(idf) - close a CUF file

call cuf_sync(idf) - flush data to ensure CUF file integrity

integer*4 cuf_finfo(idf, ndim, nvar, natt) - provides file info
    ndim /output/ - number of dimensions
    nvar /output/ - number of variables
    natt /output/ - number of GLOBAL attributes

Dimensions:
---------------------------------------------------------------------------
integer*4 function cuf_dfdim(idf, name, val) - defines a dimension;
                                               returns a dimension ID
    idf  /input/: file ID from cuf_open(), or 0 for current file
    name /input/: dimension name /character*(*)/
    val  /input/: value (integer*4)
                  (if val<=0, the dimension will be unlimited, with an
                  initial size of abs(val))

integer*4 function cuf_gtdim(idf, name, val) - returns an ID and the
                                               current value for a dimension.

bool cuf_dinfo(idf, id, val, type, name)
    For a given id, which is a number from 1 to ndim (*see cuf_finfo()),
    returns the value, type and name of the dimension.
    Type may be CUF_FIX or CUF_VARY.

Variables:
---------------------------------------------------------------------------
integer*4 cuf_dfvar(idf, name, type) - opens a definition of a variable.
    Specifies the name and a type (*see Data Types).
    May be followed by "cuf_set_..." calls and MUST be closed by
    a cuf_enddf() or another cuf_dfvar() call.

cuf_set_dim(name) - set a dimension for the variable
    name - name of a dimension.
    The order of the cuf_set_dim() calls is important: the first call sets
    the fastest changing dimension, the second the next fastest, etc.

cuf_copy_dims(name) - copies the dimensions from an already defined variable
    name - name of a variable which is a prototype for the current
           variable; all dimensions will be duplicated from the
           variable "name".

cuf_set_comp(type, ncdim) - set the compression type for a variable with
                            possible missing data
    type  - CUF_FIX or CUF_VARY for constant and varying compression.
            When type = CUF_FIX it is assumed that the data consist of
            chunks with the same pattern of missing values throughout
            the whole range of higher dimensions.
            NOTE: You can enhance the compression (at some cost in
            performance) by adding CUF_BEST:
            type = CUF_FIX + CUF_BEST or type = CUF_VARY + CUF_BEST.
    ncdim - number of dimensions in a compression chunk
            (counted from the lowest, i.e. the fastest changing)

cuf_set_miss(nmiss, vmiss) - specify the missing data values.
    nmiss   - number of different possible missing data flags
    vmiss() - array of values for missing data flags
    NOTE: this call will automatically set a "missing_value" attribute
    for the given variable equal to the first value of vmiss().

cuf_copy_comp(name) - assumes that the variable has the same pattern of
                      missing data as the referenced prototype.
    name - name of a variable which will be used as a compression
           prototype for the current variable (compression parameters
           will be duplicated from the variable "name"; the
           "missing_value" attribute will be set if the variables
           match, otherwise the user should explicitly call
           cuf_set_miss()).

cuf_ptvar(idf, idv, vname, ista, icnt, data) - write variable's data
    idf        - opened file ID (or 0, if current)
    idv        - variable's ID or 0 if the name is used (*see next)
    vname      - a string with the variable's name tag
                 (ignored if idv != 0)
    ista(ndim) - an array of integers marking the beginning (hypercorner)
                 of the data chunk. Value 1 corresponds to the first
                 element. Values for all dimensions should be present.
    icnt(ndim) - an array of integers which defines how many rows/columns
                 are in each dimension of the data hyperslice.
                 A value of 0 defaults to the whole dimension.
                 (BUT: for an expandable dimension an explicit (!=0)
                 value should be provided!)
    data()     - data array

cuf_gtvar(idf, idv, vname, ista, icnt, data) - read variable's data
                                               (*see cuf_ptvar())

integer*4 cuf_vinfo(idf, idv, name, type, ndim, natt)
    idf  /input/        - opened file ID (or 0, if current)
    idv  /input/        - variable's ID. If =0 then "name" is used
    name /input/output/ - a string with the variable's name tag.
                          Will be filled with the variable name if
                          (idv != 0) unless you explicitly asked not to
                          by using CUF_NULL in place of a name.
    type /output/       - type of the variable (*see Data Types)
    ndim /output/       - number of dimensions for a variable
    natt /output/       - number of attributes for a variable

integer*4 cuf_gtvdim(idf, idv, name, did, dsz)
    did(ndim) /output/ - an array (see cuf_vinfo() about ndim) filled
                         with the dimensions' IDs for that variable.
    dsz(ndim) /output/ - an array filled with the current dimensions'
                         sizes.

Attributes:
---------------------------------------------------------------------------
integer*4 cuf_set_attr(type, name, n, aval)
    This function (which returns an attribute's ID), if used inside a
    cuf_dfvar() ... cuf_enddf() context, will define an attribute for the
    current variable. Outside of a variable definition this function
    sets a GLOBAL (file) attribute.
    type   - type (*see Data Types)
    name   - a character tag to identify the attribute
    n      - number of elements in the attribute (generally it is a
             one-dimensional array), but n CAN BE 0.
    aval() - values of the attribute (array)
    NOTE: CUF expects C-like strings for type=CUF_DSTR and a C-like
    array of string pointers if n > 1; therefore for Fortran-like
    CHARACTER STRING attributes use cuf_set_cattr() instead.
    NOTE: A special type of attribute is the "file" type: CUF_DFILE.
    You can use it for storing documentation accompanying the data
    (HTML, GIFs, AUX, .ps, txt or any other binary information).
    While setting this kind of attribute the value of the parameter n
    has the following meanings:
        n = 0  : the file data will be read from the path=aval
        n != 0 : n bytes of data will be read from the pointer aval
    (See also the NOTE for cuf_get_attr())

integer*4 cuf_put_attr(idf, idv, vname, ida, aname, type, n, aval)
    Set or change variable and GLOBAL attribute values outside the
    "defvar" context. Returns an attribute's ID.
    ida /input/ - existing(!) attribute's ID, if =0 then use aname.
    In order to set a GLOBAL (file) attribute call with:
    idv=CUF_GLOBAL, vname=CUF_NULL
    NOTE: For Fortran CHARACTER STRING attributes use cuf_put_cattr()
    instead.

integer*4 cuf_copy_attr(idf1,idv1,vname1, ida,aname, idf2,idv2,vname2)
    Copy the attribute's name and value(s) from one CUF file/variable
    to another CUF file/variable. Returns the attribute's ID.

cuf_del_attr(idf, idv, vname, ida, aname)
    Deletes variable and GLOBAL attributes. Use name or ID.
    Parameters as above.

cuf_ainfo(idf, idv, vname, ida, aname, n, type)
    idf, idv, vname      - see above
    ida   /input/        - attribute's ID, if =0 then use aname.
    aname /input/output/ - attribute's name, if ida != 0 will be returned
    n     /output/       - number of elements in the attribute
    type  /output/       - type of the attribute

cuf_get_attr(idf, idv, vname, ida, aname, n, val)
    (*see above, plus:)
    val - value(s) of the attribute. In case of an array of characters
          (n > 1 and type = CUF_DSTR) val is assumed to be an array
          of C-string pointers.
    NOTE: For "file" type (CUF_DFILE) attributes the following rules
    apply:
        if n = CUF_PATHOUT : the data will be written to a file "val"
        if n = CUF_STDOUT  : the data will be dumped to <stdout>
        if n = CUF_FIDOUT  : the data will be written to a Fortran
                             file descriptor "val"
        if n = CUF_CIDOUT  : the data will be written to a C (FILE *)
                             descriptor val

Miscellaneous:
---------------------------------------------------------------------------
integer*4 cuf_len(idf, id1, id2, request)
    Returns the length of a name (in bytes) for a dimension, a variable
    or an attribute for a given value of id1 and request = CUF_DDIM,
    CUF_DVAR or CUF_DATTR, respectively.
    In case of an attribute the user should specify a variable's ID as
    id1 and an attribute's ID as id2.
    With request=CUF_DDATA you may obtain the data element size in
    bytes (if id1 != 0 and id2 = 0), or the attribute's data size if
    both id1 and id2 are != 0.

Data Types:
---------------------------------------------------------------------------
For CUF variables the following data types are allowed:
    CUF_DI1 - an INTEGER*1
    CUF_DU1 - a CHARACTER*1
    CUF_DI2 - an INTEGER*2
    CUF_DU2 - an unsigned two-byte integer
    CUF_DI4 - an INTEGER*4
    CUF_DU4 - an unsigned four-byte integer
    CUF_DF4 - a REAL*4 or a C-float
    CUF_DF8 - a REAL*8, DOUBLE PRECISION or C-double
    CUF_DBN - an N-byte number; interpretation of the number is up to
              the user (it may be viewed as a fixed length character
              string). The number of bytes per element should be
              specified as: CUF_DBN + N.
For CUF attributes you can use all the above types plus:
    CUF_DSTR  - a character string, "C"-style (ended with '\0').
                For this data type, if the attribute's dimension is
                greater than 1 (*see cuf_set_attr()), it will be treated
                as an array of C strings: **ps.
    CUF_DFILE - used for "file" type attributes only
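The ista/icnt convention described for cuf_ptvar() and cuf_gtvar() (1-based start corner, first dimension changing fastest, a count of 0 meaning the whole dimension) can be sketched in C as follows. This is an illustration of the addressing arithmetic only; the function names resolve_counts, corner_offset and slice_size are hypothetical and are not part of the CUF library, and treating a count of 0 as the full dimension size is an assumption drawn from the description above.

```c
#include <stddef.h>

/* Replace icnt[d] == 0 with the full size of dimension d. */
void resolve_counts(int ndim, const int *dsz, const int *icnt, int *cnt)
{
    for (int d = 0; d < ndim; d++)
        cnt[d] = (icnt[d] == 0) ? dsz[d] : icnt[d];
}

/* 0-based linear offset of the 1-based corner ista[],
   with dimension 0 changing fastest. */
size_t corner_offset(int ndim, const int *dsz, const int *ista)
{
    size_t off = 0, stride = 1;
    for (int d = 0; d < ndim; d++) {
        off += (size_t)(ista[d] - 1) * stride;
        stride *= (size_t)dsz[d];
    }
    return off;
}

/* Number of elements in the hyperslice described by cnt[]. */
size_t slice_size(int ndim, const int *cnt)
{
    size_t n = 1;
    for (int d = 0; d < ndim; d++)
        n *= (size_t)cnt[d];
    return n;
}
```

For a variable with dimension sizes (360, 180, 12, 137), a request with ista = (1, 1, 3, 2) and icnt = (0, 0, 1, 1) resolves to one full 360 x 180 slice of 64800 elements starting at linear offset 907200.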
INCLUDE FILE: "/usr/local/include/cuf.h"

Basic File Operations:
---------------------------------------------------------------------------
int CUF_open(file, key) - opens a file, returns a file ID
    char *file - /input/ file name
    int key    - /input/ CUF_READ - read only
                         CUF_UPDT - update or create
                         CUF_OWRT - always create

void CUF_close(int idf) - close a file

void CUF_sync(int idf) - ensures file integrity

int CUF_finfo(idf, ndim, nvar, natt) - provides file info
    int *ndim /output/ - number of dimensions
    int *nvar /output/ - number of variables
    int *natt /output/ - number of GLOBAL attributes

Dimensions:
---------------------------------------------------------------------------
int CUF_dfdim(idf, name, val) - defines a dimension, returns a dimension ID
    int idf    /input/: file ID from CUF_open(), or 0 for current file
    char *name /input/: dimension name
    int val    /input/: value (if 0, the dimension is unlimited)

int CUF_gtdim(idf, name, val) - returns an ID and the current value
                                for a dimension "name"

int CUF_dinfo(idf, id, val, type, name)
    For a given id, which is a number from 1 to ndim (*see CUF_finfo()),
    returns the value, type and name of the dimension.
    Type may be CUF_FIX/1/ or CUF_VARY/0/

Variables:
---------------------------------------------------------------------------
int CUF_dfvar(int idf, char *name, int type) - opens a definition of a
    variable with a name and a type (*see Data Types). May be followed
    by "CUF_set_..." calls and SHOULD be closed by a "CUF_enddf" or
    another "CUF_dfvar" call.

CUF_set_dim(char *name)
    char *name - name of a dimension.
    The order of the CUF_set_dim calls is important: the first call sets
    the fastest changing dimension, the second the next fastest, etc.

CUF_copy_dims(name)
    char *name - name of a variable which is a prototype for the current
                 variable; all dimensions will be duplicated from "name".

cuf_set_comp(type, ncdim)
    type  - CUF_FIX or CUF_VARY for constant and varying compression.
            When type = CUF_FIX it is assumed that the data consist of
            chunks with the same pattern of missing values throughout
            the range of higher dimensions.
            NOTE: You can enhance the compression (sacrificing some
            performance) by adding CUF_BEST:
            type = CUF_FIX + CUF_BEST or type = CUF_VARY + CUF_BEST.
    ncdim - number of dimensions in a compression chunk
            (counted from the lowest, i.e. the fastest changing)

cuf_set_miss(nmiss, vmiss)
    nmiss - number of different possible missing data flags
    vmiss - array of values for missing data flags

cuf_copy_comp(name)
    name - name of a variable which will be used as a compression
           prototype for the current variable (compression parameters
           will be duplicated from the variable "name").

cuf_ptvar(idf, idv, vname, ista, icnt, data)
    idf   - opened file ID (or 0, if current)
    idv   - variable's ID or 0 if the name is used (*see next)
    vname - a string with the variable's name tag (ignored if idv != 0)
    ista  - an array of integers marking the beginning (hypercorner) of
            the data chunk. Value 1 corresponds to the first index.
            One value for each dimension.
    icnt  - an array of integers which defines how many "steps" in each
            dimension the hyperslice of data consists of. A value of 0
            indicates that the whole dimension is included.
    data  - data array

cuf_gtvar(idf, idv, vname, ista, icnt, data)
    (*see cuf_ptvar())

integer*4 cuf_vinfo(idf, idv, name, type, ndim, natt)
    idf  /input/        - opened file ID (or 0, if current)
    idv  /input/        - variable's ID. If =0 then "name" is used
    name /input/output/ - a string with the variable's name tag
                          (filled if idv != 0)
    ndim /output/       - number of dimensions for a variable
    natt /output/       - number of attributes for a variable

Attributes:
---------------------------------------------------------------------------
integer*4 cuf_set_attr(type, name, n, aval)
    This function (which returns an attribute's ID) should be used after
    cuf_dfvar(), but before cuf_enddf or another cuf_dfvar.
    type - type (*see Data Types)
    name - a character tag to identify the attribute
    n    - number of elements in the attribute (generally it is a
           one-dimensional array), but n CAN BE 0.
    aval - values of the attribute (array)
    NOTE: CUF expects C-like strings for type=CUF_DSTR and a C-like
    array of string pointers if n > 1; therefore for Fortran-like
    CHARACTER STRING attributes use cuf_set_cattr() instead.

integer*4 cuf_put_attr(idf, idv, vname, ida, aname, type, n, aval)
    Set or change variable and GLOBAL attribute values outside the
    "defvar" context. Returns an attribute's ID.
    ida /input/ - existing(!) attribute's ID, if =0 then use aname.
    In order to set a GLOBAL (file) attribute call with:
    idv=CUF_GLOBAL, vname=CUF_NULL
    NOTE: For Fortran CHARACTER STRING attributes use cuf_put_cattr()
    instead.

cuf_del_attr(idf, idv, vname, ida, aname)
    Delete variable and GLOBAL attributes. Use name or ID.
    Parameters as above.

cuf_ainfo(idf, idv, vname, ida, aname, n, type)
    idf, idv, vname      - see above
    ida   /input/        - attribute's ID, if =0 then use aname.
    aname /input/output/ - attribute's name, if ida != 0 will be returned
    n     /output/       - number of elements in the attribute
    type  /output/       - type of the attribute

cuf_get_attr(idf, idv, vname, ida, aname, n, val)
    (*see above, plus:)
    val - value(s) of the attribute. In case of an array of characters
          (n > 1 and type = CUF_DSTR) val is assumed to be an array
          of C-string pointers.

Miscellaneous:
---------------------------------------------------------------------------
integer*4 cuf_len(idf, id1, id2, request)
    Returns the length of a name (in bytes) for a dimension, a variable
    or an attribute for a given value of id1 and request = CUF_DDIM,
    CUF_DVAR or CUF_DATTR, respectively.
    In case of an attribute the user should specify a variable's ID as
    id1 and an attribute's ID as id2.
    With request=CUF_DDATA you may obtain the data element size in
    bytes (if id1 != 0 and id2 = 0), or the attribute's data size if
    both id1 and id2 are != 0.

Data Types:
---------------------------------------------------------------------------
    CUF_DSTR - a character string, "C"-style (ended with '\0')
    CUF_DI1  - an INTEGER*1
    CUF_DU1  - a CHARACTER*1
    CUF_DI2  - an INTEGER*2
    CUF_DU2  - an unsigned two-byte integer
    CUF_DI4  - an INTEGER*4
    CUF_DU4  - an unsigned four-byte integer
    CUF_DF4  - a REAL*4 or a C-float
    CUF_DF8  - a REAL*8, DOUBLE PRECISION or C-double
    CUF_DBN  - an N-byte number; interpretation of the number is up to
               the user (it may be viewed as a fixed length character
               string). The number of bytes per element should be
               specified as: CUF_DBN + N.

Release 0.1 1995