The gfz library

The gfz library is a free library for accessing forensic files. The first version of this library will support only the gfzip file format and dd, but future versions will also support AFF and possibly more (non-closed) formats.

Initialization

In order to use libgfz in a program you first need to include gfz.h.

    #include <gfz.h>
 
   
Besides including the header file, you need to initialize the library. For this, the gfz library defines the function gfz_init. This function initializes the library and creates an environment that you must use when opening a forensic file.

gfz_env *gfz_init(char *crl,char *ca);

The two arguments are file paths: ca is the path of the main CA certificate file, and crl is the path of an optional certificate revocation list. If gfz_init fails, NULL will be returned, and errno will be set. (TODO: define errno values)

int gfz_addca(char *ca);

It may be necessary to consult multiple CAs. For this reason, the function gfz_addca is provided so that more CA files can be added to the environment.

void gfz_end(gfz_env *env);

In order to free all memory allocated to the environment and close any images opened into this environment, the function gfz_end is provided.

Opening a forensic file

Now that you have created your environment, you can open the forensic disk image into this environment. For this the gfz library has the gfz_open function. On success, this function returns a gfz_image that serves as the handle for most further API calls.

gfz_image *gfz_open(gfz_env *env,char *path,int filetype,u_int32_t *flags);

The first argument is the environment as created with gfz_init. The second argument is the path of the file to load, and the third argument, filetype, defines the type of file. The following values are valid as filetype:

gfz_open will return a gfz_image on success or NULL on failure. On failure errno will be set to one of the following values:

The flags argument indicates which data flags must be considered sufficient reason for a read error. This mechanism makes it possible to have read actions that touch data marked as 'bad' either fail with an error or succeed. If a gfzip block device is opened, then the flags may be updated with the flags that were set prior to attaching the gfzip file to the device.

Compression type and block sizes

Once the file has been opened, the gfz API supplies some functions to query the most basic information about the file.

int gfz_getfiletype(gfz_image *img);

Returns the type of image file.

int gfz_getcompression(gfz_image *img);

Returns the compression used on the data in the image.

size_t gfz_getblocksize(gfz_image *img);

This function returns the uncompressed size in bytes of the compressed data chunks. libgfz expects compressed data to have equally sized uncompressed versions.
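Because every chunk has the same uncompressed size, a byte offset in the uncompressed image can be mapped to a chunk with plain integer arithmetic. The following is a minimal sketch of that mapping. The type and function names are hypothetical and not part of the gfz API, and block_size stands in for the value that gfz_getblocksize would return.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type, not part of the gfz API. */
struct block_pos {
    uint64_t index;   /* which chunk holds the byte */
    size_t   within;  /* offset of the byte inside the uncompressed chunk */
};

/* Map an offset in the uncompressed image to a chunk position.
 * block_size stands in for the result of gfz_getblocksize and must be > 0. */
static struct block_pos locate_block(uint64_t offset, size_t block_size)
{
    struct block_pos pos;

    pos.index  = offset / block_size;
    pos.within = (size_t)(offset % block_size);
    return pos;
}

/* Number of chunks needed to cover image_size bytes.
 * Rounds up so that a trailing partial chunk, if any, is counted. */
static uint64_t block_count(uint64_t image_size, size_t block_size)
{
    return image_size / block_size + (image_size % block_size != 0);
}
```

With a block size of 32768 bytes, for example, offset 70000 falls in chunk 2 at position 4464 within that chunk.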

int gfz_usesarchive(gfz_image *img);

This function returns a boolean value that tells whether the image is self-contained in the image file, or whether the data is actually held in a separate archive containing the data of multiple images.

u_int64_t gfz_getsize(gfz_image *img);

This function returns the uncompressed size of the image.

int gfz_ispacked(gfz_image *img);

This function returns a boolean value that tells whether the image is stored packed, that is: compressed, with each unique block of data stored only once, in a storage order based on the digest of the uncompressed data.

int gfz_usesreduction(gfz_image *img);

Having the library cache image data

After querying the file for its basics, you should have sufficient information to configure the caching features of the gfz library. The gfz library can cache on 4 separate levels, and it is extremely important that you fully understand what this means in order to choose the proper caching mechanism and configuration for your application.

After opening a file, you need to acquire a seekhandle in order to read from it. It is possible to acquire multiple seekhandles for each open file. If the gfzip file is handed to the gfzip kernel module, the kernel module will allocate seekhandles and corresponding caches to processes that open the block device. If nothing is set, the default caching is a single compression block for each seekhandle.

If it is likely that multiple seekhandles for the same file will regularly hit the same blocks of data of a given image, as is the case when handing the gfzip file to the kernel module, then it may be wise to also cache at the image level.

Version 2 of the gfzip file format defines the possibility of archives. With archives you may have not only different seekhandles to the same image that could share caching, but also seekhandles open to different images that access the same archive. It is very likely with these archives that multiple images hold references to the same block of data. If the block is in cache for one image, it is also usable for the next, so the gfz library also defines optional caching at the archive level.

One last location where gfzip can do caching is at the top environment level. A program will always have only one environment created for access to multiple image files or even multiple archives. Given that the SHA256 digests guarantee sufficiently that data with the same hash is the same data, it is also possible to cache at the environment level.

NOTE: use of environment level caching may improve performance, but may very well conflict with procedures. Please use it only when you are highly certain that such a conflict does not exist.
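Sharing a cache across images, archives, or the whole environment relies on looking a block up by the digest of its uncompressed data rather than by its position in one particular image. The sketch below illustrates only that lookup. All types and functions in it are hypothetical, the round-robin replacement is an arbitrary choice, and nothing here describes how libgfz implements its caches.

```c
#include <stddef.h>
#include <string.h>

#define DIGEST_LEN  32   /* a SHA-256 digest is 32 bytes long */
#define CACHE_SLOTS 4    /* tiny on purpose, for illustration only */

/* Hypothetical cache types; not part of the gfz API. */
struct cache_entry {
    unsigned char digest[DIGEST_LEN];
    const void   *block;   /* uncompressed block data, owned elsewhere */
    int           used;
};

struct block_cache {
    struct cache_entry slot[CACHE_SLOTS];
    size_t next;           /* next slot to overwrite (round-robin) */
};

static void cache_init(struct block_cache *c)
{
    memset(c, 0, sizeof *c);
}

/* Return the cached block stored under this digest, or NULL on a miss. */
static const void *cache_get(const struct block_cache *c,
                             const unsigned char *digest)
{
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (c->slot[i].used &&
            memcmp(c->slot[i].digest, digest, DIGEST_LEN) == 0)
            return c->slot[i].block;
    }
    return NULL;
}

/* Store a block under its digest, overwriting slots in round-robin order. */
static void cache_put(struct block_cache *c, const unsigned char *digest,
                      const void *block)
{
    struct cache_entry *e = &c->slot[c->next];

    memcpy(e->digest, digest, DIGEST_LEN);
    e->block = block;
    e->used = 1;
    c->next = (c->next + 1) % CACHE_SLOTS;
}
```

Two images that reference a block with the same digest would both hit the same slot, which is what makes a cache above the seekhandle level worthwhile.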

size_t gfz_setcaching(gfz_env *env,int level,size_t kbcache);

The first argument is the environment as created with gfz_init. The second argument specifies one of the following levels:

The last argument defines the number of kbytes to allocate for caching at the given level. The gfz_setcaching function will return the number of kbytes actually allocated for caching at the given level, or 0 on error. On error errno will be set.

When handing a gfzip file to the kernel module it is important to be more conservative in cache definition than when directly accessing the gfzip file from a user space program.

Validating file and data integrity

Successfully opening an image file does not yet tell you anything about the integrity of the image and its content. The gfzip library API provides a number of functions that you SHOULD use in order to validate the integrity of the opened file. The integrity checks of gfzip files are multi-layered. A number of checks are defined that can be run using a single function and a set of flags.

u_int32_t gfz_validate(gfz_image *img,u_int32_t *flags);

The flags argument of this function determines the checks that should be done. The value GFZ_CHECK_DEFAULT can be used to have the default checks done. gfz_validate will return 0 on success. If any of the checks fails, the value returned will be the flag belonging to the failed check. On failure, the flags argument will be updated so that it holds only the flags of the checks that were not performed. This way gfz_validate can return on the first failure, while the programmer can still call gfz_validate repeatedly in order to perform the remaining checks.
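The repeated-call pattern can be sketched as follows. Since the real check flags are not listed here, the sketch does not call gfz_validate itself. Instead it uses a stand-in, mock_validate, that follows the return convention described above; the struct, both function names, and the assumption that checks run in bit order are all hypothetical.

```c
#include <stdint.h>

/* Stand-in for an opened image: the set of checks that fail for it. */
struct mock_image {
    uint32_t failing;
};

/* Stand-in following the convention described for gfz_validate:
 * runs the requested checks (here simply in bit order), returns 0 if all
 * pass, otherwise returns the flag of the first failed check and leaves in
 * *flags only the checks that were not performed. */
static uint32_t mock_validate(const struct mock_image *img, uint32_t *flags)
{
    for (uint32_t bit = 1; bit != 0; bit <<= 1) {
        if ((*flags & bit) == 0)
            continue;           /* check not requested */
        *flags &= ~bit;         /* this check has now been performed */
        if (img->failing & bit)
            return bit;         /* stop at the first failure */
    }
    return 0;
}

/* Run every requested check and return the union of all failed flags. */
static uint32_t run_all_checks(const struct mock_image *img, uint32_t checks)
{
    uint32_t remaining = checks;
    uint32_t failed = 0;
    uint32_t result;

    while ((result = mock_validate(img, &remaining)) != 0)
        failed |= result;       /* remember the failure, then resume */
    return failed;
}
```

The loop terminates because each failing call removes at least the failed check from the remaining set.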

Attaching to a device

If the file opened is a gfzip file, then after calling gfz_validate at least once, and if the kernel of the operating system supports it, it becomes possible to attach the validated file to a gfzip block device. This device can then later be accessed by other processes either as a regular block device, or using the gfz library. It is important to note that all certificate related checks must have been done before attaching, for the checks to be possible on the device file later on. The reason for this is that no kernel level certificate checking is possible, and checks done on the device file will return the cached results of the pre-attach checks.

int gfz_attach(gfz_image *img,char *device,u_int32_t flags);

This function will attach the opened gfzip file to the indicated gfzip block device. The flags argument is used to indicate on what flags read actions on the image should fail. On failure this function will return NULL and errno will be set appropriately.

Querying signee information

With signed gfzip files, the information on who signed the data in the file becomes a very important subset of the available meta data. Given that multiple sections in the image and/or archive could be signed by different people, you need to identify the appropriate section(s) in the appropriate file before getting information on signing.

gfz_file *gfz_getfile(gfz_image *img,int type);

Given that a gfzip file could have its data available in an archive, and could have its index table segment in yet another file, this function makes it possible to address these individual files if needed by supplying a gfz_file for each of these entities. The value type can have the following values:

On failure NULL is returned and errno will be set.

void gfz_free_filehandle(gfz_file *fh);

This function un-allocates any memory allocated to a gfz_file structure.

size_t gfz_partitioncount(gfz_file *file);

This function will return the number of partitions in the file. On failure 0 is returned and errno will be set.

gfz_partition *gfz_getpartition(gfz_file *file,size_t parts);

This function will return a gfz_partition for the identified partition of the file. On failure NULL is returned and errno will be set.

gfz_freepartitionhandle(gfz_partition *part);

This function un-allocates any memory allocated to a gfz_partition structure.

int gfz_partitiontype(gfz_partition *part);

This function returns the type of a partition.

char *gfz_getmpartitionnamespace(gfz_partition *part);

A gfzip file can hold multiple meta data partitions. Each such partition will have a unique namespace defined. This namespace is defined as a free-form set of alphanumeric characters, but the gfzip format specifies a number of suggested namespaces:

You may consult the file format specification for more information on these defined namespaces.

size_t gfz_getcainfo(gfz_partition *part,char **entval,int type);

Depending on the value supplied for type, this function will return:

The gfz_getcainfo function expects entval to be the address of a character pointer that is initialized as NULL. The function will allocate a character string pointed to by entval, which becomes the responsibility of the caller of gfz_getcainfo. On success gfz_getcainfo will return the size of the allocated string. On failure 0 is returned and errno will be set.

u_int32_t gfz_checkcert(gfz_partition *part,u_int32_t *flags);

This function will perform the certificate checks as defined for gfz_validate, but instead of doing so for all certificates, it does so only for the certificate belonging to the specified partition. See gfz_validate for more details.

Accessing image digests

A gfzip file embeds a number of digests of the original uncompressed data as part of a special meta-data partition. The default digest used by gfzip is sha256, but for legacy purposes, this section also includes a sha1 and even an md5 digest.

char *gfz_gethexdigest(gfz_image *img,int digtype);

This function will return a lowercase hex representation of the digest of the indicated type. The following types are defined for gfzip files:

char *gfz_rawdigest(gfz_image *img,int digtype);

This function will return a pointer to the binary version of the indicated digest. Please note that this is NOT a NUL-terminated string, but an array of binary chars with a length given by GFZ_DIGLEN_SHA256, GFZ_DIGLEN_SHA1 or GFZ_DIGLEN_MD5.
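The relation between the raw and the hex form of a digest can be illustrated with a small conversion routine. This is a sketch only: digest_to_hex is a hypothetical helper, not part of the gfz API, and the caller is assumed to pass the digest length explicitly because the raw form carries no terminator.

```c
#include <stddef.h>

/* Write the lowercase hex form of raw[0..len-1] into out.
 * out must have room for 2 * len + 1 characters, including the final '\0'. */
static void digest_to_hex(const unsigned char *raw, size_t len, char *out)
{
    static const char hexdigits[] = "0123456789abcdef";

    for (size_t i = 0; i < len; i++) {
        out[2 * i]     = hexdigits[raw[i] >> 4];     /* high nibble */
        out[2 * i + 1] = hexdigits[raw[i] & 0x0f];   /* low nibble */
    }
    out[2 * len] = '\0';
}
```

A 32 byte SHA-256 digest therefore needs a 65 character buffer for its hex form, a 20 byte SHA-1 digest needs 41, and a 16 byte MD5 digest needs 33.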

Read access

Other than opening and validating the data in the image, we also need to access the actual data. As stated before in the section on caching, we need to acquire a seekhandle in order to gain access to the actual data.

gfz_seek *gfz_getseekhandle(gfz_image *img);

This function on success will return a gfz_seek handle. On failure NULL is returned and errno is set.

void gfz_free_seekhandle(gfz_seek *sh);

Un-allocate any memory allocated to a gfz_seek structure.

int gfz_seekset(gfz_seek *sh,u_int64_t offset);

Move the read pointer of the gfz_seek to the indicated position. This action will not result in reading data from disk, but it will be used by higher level caching algorithms to determine if in-memory blocks can be replaced.

u_int64_t gfz_tell(gfz_seek *sh);

Return the current read pointer of the gfz_seek.

u_int32_t gfz_read(gfz_seek *sh,void *buffer,u_int32_t size, u_int32_t nmemb,u_int32_t *andflags,u_int32_t *orflags);

This function tries to read nmemb items of size bytes each into buffer, starting at the position indicated by the read pointer of sh. The read pointer of sh will be updated, and andflags and orflags will be updated. The function gfz_read will return the number of items actually read.

The two arguments andflags and orflags need a bit more attention. A block of data read could span multiple flagged sections of data, and thus contain data sections where a certain flag is set and other data sections where this flag is not set. Both andflags and orflags should be initialized with the flags of interest to the caller. After the call, orflags will retain the flags that were set for any of the data read, while andflags will retain only the flags that were set for all of the data read. Currently the following flags are defined:
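The difference between the two results can be shown with a small computation over the flags of the sections that one read covers. This is only a sketch of the described semantics, not code from the library: combine_flags is a hypothetical function, and the flag bits in the usage note are placeholders.

```c
#include <stddef.h>
#include <stdint.h>

/* Combine the flags of all sections covered by one read.
 * On entry *andflags and *orflags hold the flags the caller is interested in.
 * On return *andflags holds the flags of interest set on every section, and
 * *orflags holds the flags of interest set on at least one section. */
static void combine_flags(const uint32_t *section_flags, size_t nsections,
                          uint32_t *andflags, uint32_t *orflags)
{
    uint32_t all = *andflags;   /* loses every bit that some section lacks */
    uint32_t any = 0;           /* gains every wanted bit that some section has */

    for (size_t i = 0; i < nsections; i++) {
        all &= section_flags[i];
        any |= section_flags[i] & *orflags;
    }
    *andflags = (nsections > 0) ? all : 0;
    *orflags  = any;
}
```

If a read covers three sections flagged 0x1, 0x3 and 0x1, and the caller is interested in 0x3, the and-result is 0x1 (set everywhere) and the or-result is 0x3 (each bit set somewhere).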

Access to meta data

Besides flags and certificate information, a more free-form kind of meta data is also important. The gfzip file format allows for the inclusion of multiple signed meta data partitions. On such a partition the following function can be called.

gfz_metastream *gfz_getmetaset(gfz_partition *part);

This function returns a gfz_metastream structure that can be used to retrieve a set of key/value pairs that together form the meta data section.

void gfz_free_metastreamhandle(gfz_metastream *ms);

This function will un-allocate any memory allocated for a gfz_metastream structure.

int gfz_getmeta(gfz_metastream *ms,char **key,char **val);

This function expects key and val to be the addresses of two NULL-initialized char pointers. gfz_getmeta will try to retrieve the next meta-data key/value pair from ms, and will allocate character strings for the pair that become the responsibility of the caller. On success the function will return 1, while 0 indicates the end of available meta-data or an error. If 0 is returned, errno is filled with an appropriate value. The value GFZ_ERRNO_ENDOFMETA indicates no real error but the end of the meta-data stream.
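The ownership and end-of-stream conventions lead to a loop of the following shape. The sketch does not use the library; mock_getmeta and struct mock_metastream are hypothetical stand-ins that follow the convention described for gfz_getmeta, and MOCK_ERRNO_ENDOFMETA is a made-up value standing in for GFZ_ERRNO_ENDOFMETA.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Made-up value standing in for GFZ_ERRNO_ENDOFMETA. */
#define MOCK_ERRNO_ENDOFMETA 10000

/* Stand-in for a gfz_metastream: fixed key/value lists and a cursor. */
struct mock_metastream {
    const char **keys;
    const char **vals;
    size_t count;
    size_t pos;
};

static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Returns 1 and hands two newly allocated strings to the caller,
 * or returns 0 with errno set (end of stream or a real error). */
static int mock_getmeta(struct mock_metastream *ms, char **key, char **val)
{
    if (ms->pos >= ms->count) {
        errno = MOCK_ERRNO_ENDOFMETA;
        return 0;
    }
    *key = copy_string(ms->keys[ms->pos]);
    *val = copy_string(ms->vals[ms->pos]);
    if (*key == NULL || *val == NULL) {
        free(*key);
        free(*val);
        *key = *val = NULL;
        errno = ENOMEM;
        return 0;
    }
    ms->pos++;
    return 1;
}

/* Read all pairs; return how many were read, or -1 on a real error. */
static long count_meta(struct mock_metastream *ms)
{
    long n = 0;
    char *key = NULL;
    char *val = NULL;

    while (mock_getmeta(ms, &key, &val) == 1) {
        n++;            /* a real program would use key and val here */
        free(key);      /* the caller owns both strings */
        free(val);
        key = NULL;     /* reset to NULL before the next call */
        val = NULL;
    }
    return (errno == MOCK_ERRNO_ENDOFMETA) ? n : -1;
}
```

Checking errno after the loop is what separates a normal end of the stream from a failure.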

Querying the flags

As already seen briefly in the section on gfz_read, the gfzip file format supplies a set of flags that provide special meta-data that is only applicable to small sections of the data of an image. A few additional functions are available to access the flags information.

u_int32_t gfz_getflags(gfz_seek *sh,gfz_partition *part);

This function will return the flags for the current read pointer of a gfz_seek. If part is NULL, then all flags sections in the image will be combined; otherwise only the specified meta partition will be consulted.

u_int64_t gfz_flagsoffset(gfz_seek *sh,gfz_partition *part);

This function returns the offset where the current flagged data section starts.

u_int64_t gfz_next_flagsoffset(gfz_seek *sh,gfz_partition *part);

This function returns the offset where the next flagged data section starts.

Image History

The gfzip file format makes it possible to add meta-data and/or flags to a gfzip file. The gfz creation library will not overwrite the old gfzip footer, but will just add an additional meta-data partition. In order to guard and record the chain of evidence, the new partition is cryptographically linked to existing partitions. The library provides a number of functions to access and traverse the chain of evidence thus recorded.

gfz_partition *gfz_getlastmetapartition(gfz_file *fil);

This function returns the last added meta-data partition of the image file.

gfz_partition *gfz_getdatapartition(gfz_file *fil);

This function returns the data partition of the image file. If none is defined, NULL is returned.

gfz_partition *gfz_getdigestpartition(gfz_partition *part);

This function returns the digest partition as defined in the chain of evidence signature portion of the supplied partition. If none is defined, NULL is returned.

gfz_partition *gfz_getparentpartition(gfz_partition *part);

This function returns the parent partition as defined in the chain of evidence signature portion of the supplied partition. If none is defined, NULL is returned.

gfz_partition *gfz_getcoparentpartition(gfz_partition *part);

This function returns the co-parent partition as defined in the chain of evidence signature portion of the supplied partition. If none is defined, NULL is returned.
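Traversing the chain of evidence amounts to following parent links until none is left. The sketch below shows that walk on a stand-in structure. struct mock_partition and mock_getparent are hypothetical substitutes for gfz_partition and gfz_getparentpartition, and the sketch ignores co-parent links as well as any need to free the partition handles the real library returns.

```c
#include <stddef.h>

/* Stand-in for a gfz_partition with only the link that matters here. */
struct mock_partition {
    const struct mock_partition *parent;   /* NULL when there is no parent */
};

/* Stand-in for gfz_getparentpartition. */
static const struct mock_partition *
mock_getparent(const struct mock_partition *part)
{
    return part->parent;
}

/* Count the partitions on the chain from `last` back to the oldest one. */
static size_t chain_length(const struct mock_partition *last)
{
    size_t n = 0;

    for (const struct mock_partition *p = last; p != NULL; p = mock_getparent(p))
        n++;
    return n;
}
```

A real traversal would typically start from the partition returned by gfz_getlastmetapartition and check each partition's certificate along the way.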


Verbatim copying and distribution of this entire article are permitted worldwide, without royalty, in any medium, provided this notice, and the copyright notice, are preserved.