Sparseness and Compression: A prominent feature of
a CUF file is that a variable may be "compressed" if it is naturally
sparse (contains missing data). Such variables are compressed
internally, which can save significant disk space and reduce
network traffic. CUF supports multiple missing data flags for a
variable.
Only a contiguous chunk of the fastest-varying dimensions may be
compressed (for example, for a variable V(X,Y,Z,T) the allowed
compression chunks are [X], [XY], [XYZ] and [XYZT]). For better
overall performance, choose the compression chunk to match the
typical access request.
CUF offers a choice of constant or variable compression. For example, a
variable V(X,Y,T) may have all [XY] chunks with the same sparseness
("constant" compression) or with a different missing-data pattern
for each T point ("variable" compression).
A compression scheme may be shared by several variables if their
sparseness structures are the same, which further improves the
overall compression rate.
Currently only constant compression is supported for
variables with expandable dimensions.
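The idea behind constant compression can be sketched in C. This is a hypothetical illustration, not CUF's actual on-disk format (which this document does not describe), and the function names are invented for the example: a mask built from one chunk marks the valid points, a value equal to any of several missing-data flags counts as missing, and the same mask is then reused to pack every chunk, so only valid values are stored.

```c
#include <stddef.h>

/* Hypothetical sketch of "constant" compression; NOT the CUF file format. */

/* A value is missing if it equals any of the nmiss missing-data flags. */
static int is_missing(float v, int nmiss, const float *vmiss)
{
    for (int k = 0; k < nmiss; k++)
        if (v == vmiss[k])
            return 1;
    return 0;
}

/* Build the shared mask from one chunk of n values (1 = valid,
 * 0 = missing) and return the number of valid points. */
size_t build_mask(const float *chunk, size_t n, int nmiss,
                  const float *vmiss, unsigned char *mask)
{
    size_t nvalid = 0;
    for (size_t i = 0; i < n; i++) {
        mask[i] = (unsigned char)!is_missing(chunk[i], nmiss, vmiss);
        nvalid += mask[i];
    }
    return nvalid;
}

/* Pack a chunk: keep only the points the shared mask marks as valid. */
size_t pack_chunk(const float *chunk, size_t n, const unsigned char *mask,
                  float *packed)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (mask[i])
            packed[k++] = chunk[i];
    return k;
}

/* Unpack a chunk: scatter the stored values back; gaps get the fill value. */
void unpack_chunk(const float *packed, size_t n, const unsigned char *mask,
                  float fill, float *chunk)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        chunk[i] = mask[i] ? packed[k++] : fill;
}
```

With variable compression, a separate mask would be kept for each chunk instead of one shared mask. Note that after unpacking every gap carries the same fill value, so this sketch does not remember which of several flags a point originally held.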
Attributes: Each variable, and the CUF file as a whole, may
have an unlimited number of named attributes (see the attribute types
below). Attributes can be used to supply additional information
about the CUF file and its variables. They are also a way to supply
hypermedia documentation along with your data (inside your portable
data file!). An attribute may be an array, i.e. consist of many
elements of the same type.
An attribute's name and data can be
changed, or an entire attribute can be deleted from a CUF file.
Data Types and Portability: Allowed data types for
variables are 1/2/4/8-byte INTEGERs, 4/8-byte REALs,
and N-byte string segments (e.g. FORTRAN CHARACTER arrays).
For attributes you can also use C-type strings (null
terminated), arrays of C strings (char **), and files
(binary data that can be read either directly from a file or from
memory and, when retrieved, written to a file or memory or piped to
<stdout>).
CUF uses a machine-independent data representation and therefore can be
read and written on a variety of platforms that support the XDR library calls
(SGI, SUN, HP, DEC, IBM, PC, MAC, CRAY...).
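As an illustration of what a machine-independent representation means, the XDR standard (RFC 4506) stores a 32-bit integer as four bytes, most significant byte first, and a float as an IEEE 754 single-precision bit pattern in the same big-endian order. The sketch below encodes and decodes these by hand rather than through the XDR library itself; the function names are invented for the example, and the float routines assume the host uses IEEE 754 floats.

```c
#include <stdint.h>
#include <string.h>

/* Write a 32-bit value as 4 bytes, most significant byte first. */
static void put_be32(uint32_t u, unsigned char out[4])
{
    out[0] = (unsigned char)(u >> 24);
    out[1] = (unsigned char)(u >> 16);
    out[2] = (unsigned char)(u >> 8);
    out[3] = (unsigned char)u;
}

/* Read 4 big-endian bytes back into a 32-bit value. */
static uint32_t get_be32(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/* XDR "int": two's complement, big-endian, whatever the host byte order. */
void xdr_put_int32(int32_t v, unsigned char out[4])
{
    put_be32((uint32_t)v, out);
}

int32_t xdr_get_int32(const unsigned char in[4])
{
    uint32_t u = get_be32(in);
    /* map 0x80000000..0xFFFFFFFF back to negative values portably */
    return u <= (uint32_t)INT32_MAX ? (int32_t)u : -(int32_t)(~u) - 1;
}

/* XDR "float": IEEE 754 bit pattern, big-endian (assumes an IEEE host). */
void xdr_put_float(float f, unsigned char out[4])
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    put_be32(u, out);
}

float xdr_get_float(const unsigned char in[4])
{
    uint32_t u = get_be32(in);
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

Because the byte order is fixed by the format rather than by the machine, a file written on a little-endian PC decodes to the same values on a big-endian workstation.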
User Interfaces: The C and FORTRAN CUF libraries provide extensive control of CUF objects, either through the formal CUF system of IDs or explicitly through object names, for all major CUF elements: Dimensions, Variables and Attributes.
| Call | Purpose |
|---|---|
| idf = cuf_open(file, key) | Open a CUF file |
| idd = cuf_dfdim(idf, name, value) | Define a dimension |
| idv = cuf_dfvar(idf, name, type) | Start a variable's definition |
| cuf_set_dim(name) | Set a variable's dimension |
| cuf_set_comp(type, ncdim) | Set a variable's compression |
| cuf_enddf() | Close a variable's definition |
| cuf_ptvar(idf, idv, vname, ista, icnt, data) | Write a variable's values |
| cuf_gtvar(idf, idv, vname, ista, icnt, data) | Read a variable's values |
| ida = cuf_set_attr(type, name, n, aval) | Set an attribute |
| ida = cuf_get_attr(idf, idv, vname, ida, aname, n, val) | Get an attribute's data |
For example, the following FORTRAN fragment reads the whole temperature field from a CUF file:
      include '/usr/local/include/fcuf.h'
      real temp(360,180)
      integer ista(2), icnt(2)
c.....start at element 1 of each dimension; a count of 0 = whole dimension
      data ista /2*1/, icnt /2*0/
c.....................open a CUF-file
      idf = cuf_open('GCMoutout.cdf', CUF_READ)
c.....................read the data for temperature variable into array temp
      call cuf_gtvar(idf, 0, 'temp', ista, icnt, temp)
c.....................close the CUF-file
      call cuf_close(idf)
Here is a more advanced FORTRAN example that inquires the structure of a CUF file (its variables and their dimensions):
c...Include the definitions of CUF constants:
      include '/usr/local/include/fcuf.h'
      integer id_dim(10), ns_dim(10)
      integer itype, idtype, nvdim, nvatt, ival
      character*32 vname, dname
c...Open a file, return a file ID
      idf = cuf_open('database.cuf', CUF_READ)
c...Inquire the number of dimensions, variables and GLOBAL attributes
      call cuf_finfo(idf, ndim, nvar, natt)
c...For each variable:
      do idv = 1, nvar
c......Get the name, type, number of dimensions and number of attributes:
         call cuf_vinfo(idf, idv, vname, itype, nvdim, nvatt)
c......Obtain the dimensions' IDs and sizes:
         call cuf_gtvdim(idf, idv, vname, id_dim, ns_dim)
c......For each dimension of this variable:
         do i = 1, nvdim
c.........Get the dimension's value, type and name:
            call cuf_dinfo(idf, id_dim(i), ival, idtype, dname)
         enddo
      enddo
The following FORTRAN example writes a CUF file, "coverage.cuf": it defines the grid and time axes and a compressed INTEGER*1 variable "nobs"(NX,NY,NTM,NTY), then writes one 360x180 field for each month of the years 1856-1992 (getobs and pack2i1 are the user's own routines, not CUF calls):
      include "/usr/local/include/fcuf.h"
      parameter (IYBA = 1856, NTMA = 1632)
      integer nobs(360,180)
      integer*1 ipack(360,180)
      integer*1 imiss/0/
c.....start at (1,1,1,1); each call writes one whole 360x180 field
      integer ista(4)/4*1/, icnt(4)/360,180,1,1/
      real x(360), y(180)
      integer itm(12), ity(1992-1856+1)
      idf = cuf_open("coverage.cuf", CUF_OWRT)
      call cuf_dfdim(idf, "NX", 360)
      call cuf_dfdim(idf, "NY", 180)
      call cuf_dfdim(idf, "NTM", 12)
      call cuf_dfdim(idf, "NTY", 1992-1856+1)
      call cuf_dfvar(idf, "X", CUF_DF4)
      call cuf_set_dim("NX")
      call cuf_dfvar(idf, "Y", CUF_DF4)
      call cuf_set_dim("NY")
      call cuf_dfvar(idf, "TM", CUF_DI4)
      call cuf_set_dim("NTM")
      call cuf_dfvar(idf, "TY", CUF_DI4)
      call cuf_set_dim("NTY")
      call cuf_dfvar(idf, "nobs", CUF_DI1)
      call cuf_set_dim("NX")
      call cuf_set_dim("NY")
      call cuf_set_dim("NTM")
      call cuf_set_dim("NTY")
      call cuf_set_comp(CUF_VARY+CUF_BEST, 2)
      call cuf_set_miss(1, imiss)
      call cuf_enddf()
C.....Writing Grids:
      do i = 1, 360
         x(i) = -179.5 + float(i-1)
      enddo
      call cuf_ptvar(idf, 0, "X", 1, 0, x)
      do i = 1, 180
         y(i) = -89.5 + float(i-1)
      enddo
      call cuf_ptvar(idf, 0, "Y", 1, 0, y)
      do i = 1, 12
         itm(i) = i
      enddo
      call cuf_ptvar(idf, 0, "TM", 1, 0, itm)
      do i = 1856, 1992
         ity(i-1856+1) = i
      enddo
      call cuf_ptvar(idf, 0, "TY", 1, 0, ity)
C.....Writing DATA:
      do iy = 1, 1992-1856+1
         ista(4) = iy
         do im = 1, 12
            call getobs(150, 1856+iy-1, im, nobs)
            call pack2i1(nobs, 360, 180, ipack)
            ista(3) = im
            call cuf_ptvar(idf, 0, "nobs", ista, icnt, ipack)
         enddo
      enddo
      call cuf_close(idf)
      STOP
      END
In your C code use:
#include "/usr/local/include/cuf.h"
Within the LDEO Climate Group (on rosie, lola, ariel, fox,
breadbox, etc.) you can simply add the -lsenq option
when linking:
f77 foo.f -o foo -lsenq
(the "-lsenq" library is available in mips1, mips1coff, mips2 and mips4 flavors).
On other systems use:
f77 foo.f -o foo -lcuf
or, for C code:
cc foo.c -o foo -lcuf
(You can download the include files and the CUF library here.)
The basic calls of the CUF FORTRAN interface are:
Basic File Manipulation:
---------------------------------------------------------------------------
idf = cuf_open(file, key) - opens a file,
returns a file ID.
file - /input/ character string: file name
key - /input/ CUF_READ - read only,
CUF_UPDT - update or create
CUF_OWRT - always overwrite
CUF_TEST - returns 1 if file is in CUF
format, 0 otherwise
call cuf_close(idf) - close a CUF file
call cuf_sync(idf) - flush data to ensure CUF file integrity
integer*4 cuf_finfo(idf, ndim, nvar, natt) - returns
information about the file
ndim /output/ - number of dimensions
nvar /output/ - number of variables
natt /output/ - number of GLOBAL attributes
Dimensions:
---------------------------------------------------------------------------
integer*4 function cuf_dfdim(idf, name, val) - defines a
dimension; returns a dimension ID
idf /input/: file id from cuf_open(),
or 0 for current file
name /input/: dimension name /character*(*)/
val /input/: value (integer*4) (if val<=0, dimension will
be unlimited, with an initial size of abs(val) )
integer*4 function cuf_gtdim(idf, name, val) - returns an ID
and the current value for a dimension.
bool cuf_dinfo(idf, id, val, type, name)
for a given id, which is a number from 1 to ndim
(*see cuf_finfo()), returns the value,
type and name of the dimension.
Type may be CUF_FIX or CUF_VARY
Variables:
---------------------------------------------------------------------------
integer*4 cuf_dfvar(idf, name, type) - opens a
definition of a variable. Specifies the name and
a type (*see Data Types).
May be followed by "cuf_set_..." calls and MUST be
terminated by a cuf_enddf() call or by another cuf_dfvar() call.
cuf_set_dim(name) - set a dimension for the variable
name - name of a dimension. The order of the cuf_set_dim() calls is
important: the first call sets the fastest changing
dimension, the second the next fastest, etc.
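The ordering rule means that the first dimension set with cuf_set_dim() varies fastest in storage, as in a FORTRAN array. A small C sketch (a hypothetical helper, not part of CUF) computes the zero-based storage offset of an element from 1-based indices under that rule:

```c
#include <stddef.h>

/* Hypothetical helper, not a CUF call. Zero-based storage offset of the
 * element (idx[0], ..., idx[ndim-1]) given 1-based indices and dimension
 * sizes, with the FIRST dimension varying fastest (FORTRAN order).
 * For V(X,Y,T): offset = (x-1) + NX*((y-1) + NY*(t-1)). */
size_t fastest_first_offset(int ndim, const int *size, const int *idx)
{
    size_t off = 0;
    /* Horner-style evaluation, starting from the slowest dimension */
    for (int d = ndim - 1; d >= 0; d--)
        off = off * (size_t)size[d] + (size_t)(idx[d] - 1);
    return off;
}
```

For a variable defined with NX = 360, NY = 180, NTM = 12 in that order, stepping X by one moves one element, stepping Y moves 360 elements, and stepping the month moves 360 x 180 = 64800 elements.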
cuf_copy_dims(name) - copies the dimensions from an already
defined variable
name - name of a prototype variable; all of its dimensions
will be duplicated for the current variable.
cuf_set_comp (type, ncdim) - set the compression type for
a variable with possible missing data
type - CUF_FIX or CUF_VARY for constant and
varying compression. When type = CUF_FIX it is assumed
that the data consist of chunks with the same pattern of missing
values throughout the whole range of the higher dimensions.
NOTE: You can enhance the compression (at the cost of some performance)
by adding CUF_BEST: type = CUF_FIX + CUF_BEST or CUF_VARY + CUF_BEST.
ncdim - number of dimensions in a compression chunk (counted from
the first, fastest changing dimension)
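Whether CUF_FIX is appropriate depends on whether every chunk really has the same missing-data pattern. A hypothetical C helper (not a CUF call) that checks this for data held in memory, assuming chunks of n values stored one after another and a single missing-value flag:

```c
#include <stddef.h>

/* Hypothetical helper, not a CUF call: returns 1 if each of the nchunks
 * chunks of n values (stored one after another) has its missing values
 * at exactly the same positions as the first chunk, i.e. the single
 * shared pattern that CUF_FIX compression assumes; returns 0 otherwise. */
int constant_pattern(const float *data, size_t n, size_t nchunks,
                     float missing)
{
    for (size_t c = 1; c < nchunks; c++)
        for (size_t i = 0; i < n; i++)
            /* compare "is missing" flags, not the values themselves */
            if ((data[i] == missing) != (data[c * n + i] == missing))
                return 0;
    return 1;
}
```

If the check fails, the data call for varying compression (CUF_VARY) instead.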
cuf_set_miss(nmiss, vmiss) - specify the missing data values.
nmiss - number of different possible missing data flags
vmiss() - array of values for missing data flags
NOTE: this call will automatically set a "missing_value"
attribute for the given variable equal to the first value of vmiss().
cuf_copy_comp (name) - assumes that the variable has the
same pattern of missing data as the referenced prototype.
name - name of a variable which will be used as a compression
prototype for the current variable (the compression parameters
are duplicated from the variable "name"; the "missing_value"
attribute is also set if the two variables match, otherwise
the user should call cuf_set_miss() explicitly).
cuf_ptvar(idf, idv, vname, ista, icnt, data) - write variable's data
idf - opened file ID (or 0, if current)
idv - variable's ID or 0 if name is used (*see next)
vname - a string with the variable's name tag (ignored if idv != 0)
ista(ndim) - an array of integers marking the starting corner
of the data hyperslab. Value 1 corresponds to the first
element. A value must be given for every dimension.
icnt(ndim) - an array of integers giving the number of
elements along each dimension of the data hyperslab.
A value of 0 defaults to the whole dimension. (BUT: for an expandable
dimension an explicit (!= 0) value must be provided!)
data() - data array
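The ista/icnt convention can be illustrated by a C sketch that copies such a hyperslab out of a full array held in memory, first dimension varying fastest. The function is hypothetical and only mimics the semantics described above; treating a count of 0 as "from ista to the end of the dimension" is an assumption that coincides with "the whole dimension" when ista = 1.

```c
#include <stddef.h>

#define SLAB_MAXDIM 8 /* arbitrary limit for this sketch */

/* Hypothetical helper, not a CUF call. Copies the hyperslab (ista, icnt)
 * of a full array with dimension sizes dims[] into out[], first dimension
 * varying fastest. ista is 1-based; icnt[d] == 0 is ASSUMED to mean
 * "from ista[d] to the end of dimension d". Returns the number of
 * elements copied (0 if ndim is out of range). */
size_t extract_slab(const float *full, int ndim, const int *dims,
                    const int *ista, const int *icnt, float *out)
{
    int cnt[SLAB_MAXDIM], idx[SLAB_MAXDIM];
    size_t total = 1;

    if (ndim < 1 || ndim > SLAB_MAXDIM)
        return 0;
    for (int d = 0; d < ndim; d++) {
        cnt[d] = (icnt[d] == 0) ? dims[d] - ista[d] + 1 : icnt[d];
        idx[d] = 0;
        total *= (size_t)cnt[d];
    }
    for (size_t k = 0; k < total; k++) {
        /* storage offset of the current element in the full array */
        size_t off = 0;
        for (int d = ndim - 1; d >= 0; d--)
            off = off * (size_t)dims[d] + (size_t)(ista[d] - 1 + idx[d]);
        out[k] = full[off];
        /* advance the index "odometer", first dimension fastest */
        for (int d = 0; d < ndim; d++) {
            if (++idx[d] < cnt[d])
                break;
            idx[d] = 0;
        }
    }
    return total;
}
```

For a 3 x 2 array, ista = (2, 1) with icnt = (2, 0) selects the last two columns of both rows, i.e. 4 elements.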
cuf_gtvar(idf, idv, vname, ista, icnt, data) - read variable's data
(*see cuf_ptvar() )
integer*4 cuf_vinfo(idf, idv, name, type, ndim, natt)
idf /input/ - opened file ID (or 0, if current)
idv /input/ - variable's ID. If =0 then "name" is used
name /input/output/ - a string with the variable's name tag.
Will be filled with the variable name if idv != 0,
unless you pass CUF_NULL in place of a name.
type /output/ - data type of the variable (*see Data Types)
ndim /output/ - number of dimensions for a variable
natt /output/ - number of attributes for a variable
integer*4 cuf_gtvdim (idf, idv, name, did, dsz)
did(ndim) /output/ - an array ( see cuf_vinfo() about ndim)
filled with the dimensions' IDs for that variable.
dsz(ndim) /output/ - an array filled with the
current dimensions' sizes.
Attributes:
---------------------------------------------------------------------------
integer*4 cuf_set_attr(type, aname, n, aval)
This function returns the attribute's ID. When used
between cuf_dfvar() and cuf_enddf(), it defines an attribute
of the current variable. Outside of a variable definition
it sets a GLOBAL (file) attribute.
type - type (*see Types)
aname - a character tag to identify the attribute
n - a number of elements in attribute (generally it is a
one-dimensional array), but n CAN BE 0.
aval() - values of attribute (array)
NOTE: CUF expects C-like strings for type=CUF_DSTR
and a C-like array of string pointers if n > 1; therefore,
for FORTRAN CHARACTER string attributes use
cuf_set_cattr() instead.
NOTE: A special type of attribute is the "file" type:
CUF_DFILE. You can use it for storing documentation
accompanying the data (HTML, GIFs, AUX, PostScript, text or any other binary
information). When setting this kind of attribute, the value
of the parameter n has the following meanings:
n = 0 : the file data will be read from the file whose path is aval
n != 0 : n bytes of data will be read from memory starting at the pointer aval
(See also a NOTE for cuf_get_attr() )
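The two string conventions in the first NOTE above differ as follows: a C string ends with a '\0' byte, whereas a FORTRAN CHARACTER variable has a fixed length, is padded with trailing blanks and has no terminator. A C sketch of the conversion, using hypothetical helpers (this is not the internals of cuf_set_cattr()):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: FORTRAN CHARACTER*(flen) -> C string.
 * Drops trailing blanks and appends '\0'; at most csize-1 characters
 * are copied. Returns the length of the resulting C string. */
size_t fstr_to_cstr(const char *f, size_t flen, char *c, size_t csize)
{
    size_t len = flen;
    while (len > 0 && f[len - 1] == ' ')
        len--;
    if (csize == 0)
        return 0;
    if (len > csize - 1)
        len = csize - 1;
    memcpy(c, f, len);
    c[len] = '\0';
    return len;
}

/* Hypothetical helper: C string -> FORTRAN CHARACTER*(flen).
 * Copies the text and pads with blanks; no '\0' is written, and the
 * text is truncated if it is longer than flen. */
void cstr_to_fstr(const char *c, char *f, size_t flen)
{
    size_t len = strlen(c);
    if (len > flen)
        len = flen;
    memcpy(f, c, len);
    memset(f + len, ' ', flen - len);
}
```

This is why passing a FORTRAN CHARACTER variable where CUF_DSTR expects a C string can pick up trailing blanks or run past the end of the variable.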
integer*4 cuf_put_attr(idf, idv, vname, ida, aname, type, n, aval)
Set or change variable and GLOBAL attribute values outside the "defvar"
context. Returns the attribute's ID.
ida /input/ - an existing(!) attribute's ID; if =0 then aname is used.
In order to set a GLOBAL (file) attribute, call with:
idv=CUF_GLOBAL, vname=CUF_NULL
NOTE: For FORTRAN CHARACTER string attributes use
cuf_put_cattr() instead.
integer*4 cuf_copy_attr(idf1,idv1,vname1, ida,aname, idf2,idv2,vname2)
Copies the attribute's name and value(s) from one CUF file/variable to
another CUF file/variable. Returns the attribute's ID.
cuf_del_attr(idf, idv, vname, ida, aname)
Deletes variable and GLOBAL attributes. Use name or ID.
Parameters as above.
cuf_ainfo(idf, idv, vname, ida, aname, n, type)
idf, idv, vname see above
ida /input/ - attribute's ID, if =0 then use aname.
aname /input/output/ - attribute's name; returned if ida != 0
n /output/ - number of elements in the attribute
type /output/ - type of the attribute
cuf_get_attr(idf, idv, vname, ida, aname, n, val)
(*see above, plus:)
val - value(s) of the attribute
for an array of strings (n > 1 and type = CUF_DSTR)
val is assumed to be an array of C-string pointers.
NOTE: For "file" type (CUF_DFILE) attributes the
following rules apply:
if n = CUF_PATHOUT : the data will be written to the file named by val
if n = CUF_STDOUT : the data will be dumped to <stdout>
if n = CUF_FIDOUT : the data will be written to the Fortran file descriptor val
if n = CUF_CIDOUT : the data will be written to the C (FILE *) descriptor val
Miscellaneous:
---------------------------------------------------------------------------
integer*4 cuf_len(idf, id1, id2, request)
returns the length (in bytes) of the name of a dimension, variable or
attribute, selected by id1 and by request = CUF_DDIM, CUF_DVAR or
CUF_DATTR, respectively.
For an attribute, specify the variable's ID as id1
and the attribute's ID as id2.
With request=CUF_DDATA you obtain the size of a data
element in bytes (if id1 != 0 and id2 = 0),
or the size of the attribute's data if both id1 and id2 are != 0.
Data Types:
---------------------------------------------------------------------------
For CUF variables the following data types are allowed:
CUF_DI1 - an INTEGER*1
CUF_DU1 - a CHARACTER*1
CUF_DI2 - an INTEGER*2
CUF_DU2 - an unsigned two-byte integer
CUF_DI4 - an INTEGER*4
CUF_DU4 - an unsigned four-byte integer
CUF_DF4 - a REAL*4 or a C-float
CUF_DF8 - a REAL*8, DOUBLE PRECISION or C-double
CUF_DBN - an N-bytes number, interpretation of number is up to
a user (It may be viewed as fixed length character string).
The number of bytes per element should be specified as:
CUF_DBN + N.
For CUF attributes you can use all the above types plus:
CUF_DSTR - a "C"-style character string (terminated with '\0').
For this data type, if the attribute's dimension
is greater than 1 (*see cuf_set_attr()),
it will be treated as an array of C strings: **ps.
CUF_DFILE - used for "file" type attributes only
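The element size implied by each variable type, including the CUF_DBN + N convention, can be sketched in C. The numeric codes below are invented for the example (the real CUF_ constants are defined in the include files and their values are not given here); only the idea of adding N to a base code carries over.

```c
#include <stddef.h>

/* HYPOTHETICAL type codes for illustration only; the real CUF_ values
 * come from cuf.h / fcuf.h and are not reproduced here. */
enum ex_type {
    EX_DI1 = 1, EX_DU1, EX_DI2, EX_DU2, EX_DI4, EX_DU4, EX_DF4, EX_DF8,
    EX_DBN = 1000 /* N-byte elements are coded as EX_DBN + N */
};

/* Bytes per element for a type code, or 0 if the code is not recognized. */
size_t element_size(int type)
{
    if (type > EX_DBN)
        return (size_t)(type - EX_DBN); /* "DBN + N" style: N bytes */
    switch (type) {
    case EX_DI1: case EX_DU1:
        return 1;
    case EX_DI2: case EX_DU2:
        return 2;
    case EX_DI4: case EX_DU4: case EX_DF4:
        return 4;
    case EX_DF8:
        return 8;
    default:
        return 0;
    }
}
```

For instance, a FORTRAN CHARACTER*12 array would be stored with the type CUF_DBN + 12, i.e. 12 bytes per element.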
INCLUDE FILE: "/usr/local/include/cuf.h"
Basic File Operations:
---------------------------------------------------------------------------
int CUF_open(file, key) - opens a file, returns a file id
char *file - /input/ file name
int key - /input/ CUF_READ - read only,
CUF_UPDT - update or create
CUF_OWRT - always create
void CUF_close(int idf) - close a file
void CUF_sync(int idf) - flushes data to ensure file integrity
int CUF_finfo(idf, ndim, nvar, natt) - provides file info
int *ndim /output/ - number of dimensions
int *nvar /output/ - number of variables
int *natt /output/ - number of GLOBAL attributes
Dimensions:
---------------------------------------------------------------------------
int CUF_dfdim(idf, name, val) - defines a dimension,
returns dimension ID
int idf /input/: file id from cuf_open, or 0 for current file
char *name /input/: dimension name
int val /input/: value (if 0, dimension is unlimited)
int CUF_gtdim(idf, name, val) - returns an ID and
current value for a dimension "name"
int CUF_dinfo(idf, id, val, type, name)
for a given id, which is a number from 1 to ndim (*see CUF_finfo()),
returns the value, type and name of the dimension.
Type may be CUF_FIX/1/ or CUF_VARY/0/
Variables:
---------------------------------------------------------------------------
int CUF_dfvar(int idf, char *name, int type) - opens a
definition of a variable with a name and a type (*see TYPES);
may be followed by "CUF_set_..." calls and SHOULD be terminated
by a "CUF_enddf" or another "CUF_dfvar" call.
CUF_set_dim(char *name)
char *name - name of a dimension. The order of the cuf_set_dim calls is important:
the first call sets the fastest changing dimension,
the second the next fastest, etc.
CUF_copy_dims(name)
char *name - name of a variable which is a prototype for
a current variable, and all dimensions will
be duplicated from "name".
cuf_set_comp (type, ncdim)
type - CUF_FIX or CUF_VARY for constant and varying
compression. When type = CUF_FIX it is assumed that
data consist of chunks with the same pattern of missing values
throughout the range of higher dimensions.
NOTE: You can enhance the compression (at the cost of some
performance) by adding CUF_BEST: type = CUF_FIX + CUF_BEST or CUF_VARY + CUF_BEST.
ncdim - number of dimensions in a compression chunk (counted from
the first, fastest changing dimension)
cuf_set_miss(nmiss, vmiss)
nmiss - the number of different possible missing data flags
vmiss - array of values for the missing data flags
cuf_copy_comp (name)
name - name of a variable which will be used as a compression prototype
for a current variable (compression parameters will be
duplicated from the variable "name").
cuf_ptvar(idf, idv, vname, ista, icnt, data)
idf - opened file ID (or 0, if current)
idv - variable's ID or 0 if name is used (*see next)
vname - a string with the variable's name tag (ignored if idv != 0)
ista - an array of integers marking the starting corner of the
data hyperslab. Value 1 corresponds to the first index. One value
for each dimension.
icnt - an array of integers giving the number of elements
along each dimension of the data hyperslab. A value of 0
indicates that the whole dimension is included.
data - data array
cuf_gtvar(idf, idv, vname, ista, icnt, data)
(*see cuf_ptvar() )
int cuf_vinfo(idf, idv, name, type, ndim, natt)
idf /input/ - opened file ID (or 0, if current)
idv /input/ - variable's ID. If =0 then "name" is used
name /input/output/ - a string with the variable's name tag (filled if idv != 0)
type /output/ - data type of the variable (*see Data Types)
ndim /output/ - number of dimensions for a variable
natt /output/ - number of attributes for a variable
Attributes:
---------------------------------------------------------------------------
int cuf_set_attr(type, aname, n, aval)
this function (which returns the attribute's ID) should be used
after cuf_dfvar(), but before cuf_enddf() or another cuf_dfvar()
type - type (*see Types)
aname - a character tag to identify the attribute
n - a number of elements in attribute (generally it is a
one-dimensional array), but n CAN BE 0.
aval - values of attribute (array)
NOTE: CUF expects C-like strings for type=CUF_DSTR
and a C-like array of string pointers if n > 1; therefore,
for FORTRAN CHARACTER string attributes use
cuf_set_cattr() instead.
int cuf_put_attr(idf, idv, vname, ida, aname, type, n, aval)
Set or change variable and GLOBAL attribute values outside the "defvar"
context. Returns the attribute's ID.
ida /input/ - an existing(!) attribute's ID; if =0 then aname is used.
In order to set a GLOBAL (file) attribute, call with:
idv=CUF_GLOBAL, vname=CUF_NULL
NOTE: For FORTRAN CHARACTER string attributes use
cuf_put_cattr() instead.
cuf_del_attr(idf, idv, vname, ida, aname)
Deletes variable and GLOBAL attributes. Use name or ID.
Parameters as above.
cuf_ainfo(idf, idv, vname, ida, aname, n, type)
idf, idv, vname see above
ida /input/ - attribute's ID, if =0 then use aname.
aname /input/output/ - attribute's name; returned if ida != 0
n /output/ - number of elements in the attribute
type /output/ - type of the attribute
cuf_get_attr(idf, idv, vname, ida, aname, n, val)
(*see above, plus:)
val - value(s) of the attribute
for an array of strings (n > 1 and type = CUF_DSTR)
val is assumed to be an array of C-string pointers.
Miscellaneous:
---------------------------------------------------------------------------
int cuf_len(idf, id1, id2, request)
returns the length (in bytes) of the name of a dimension, variable or
attribute, selected by id1 and by request = CUF_DDIM, CUF_DVAR or
CUF_DATTR, respectively.
For an attribute, specify the variable's ID as id1
and the attribute's ID as id2.
With request=CUF_DDATA you obtain the size of a data
element in bytes (if id1 != 0 and id2 = 0),
or the size of the attribute's data if both id1 and id2 are != 0.
Data Types:
---------------------------------------------------------------------------
CUF_DSTR - a "C"-style character string (terminated with '\0')
CUF_DI1 - an INTEGER*1
CUF_DU1 - a CHARACTER*1
CUF_DI2 - an INTEGER*2
CUF_DU2 - an unsigned two-byte integer
CUF_DI4 - an INTEGER*4
CUF_DU4 - an unsigned four-byte integer
CUF_DF4 - a REAL*4 or a C-float
CUF_DF8 - a REAL*8, DOUBLE PRECISION or C-double
CUF_DBN - an N-bytes number, interpretation of number is up to
a user (It may be viewed as fixed length character string).
The number of bytes per element should be specified as:
CUF_DBN + N.
Release 0.1 1995