Technical Solutions for COBOL



Redvers Compression Algorithm

The Redvers Compression Algorithm gives you the ability to reduce memory overheads, carbon emissions and data storage costs without compromising your data assets.

Redvers Consulting have engineered a compression and deduplication algorithm specifically designed to produce optimal compression rates for data held in COBOL format. The algorithm is also designed to use minimal computer resources, especially when decompressing, so that archived information can be retrieved on-line with minimal disruption.

Main features:

The Redvers Compression Algorithm consists of a pair of simple but efficient COBOL subroutines that compress and decompress data strings as required. These data strings can be single fields, parts of a record, complete records or even a file of concatenated records.

Field level compression can be used to exclude record keys from the compression process, leaving the structure of databases and indexed files unchanged. This approach also gives applications access to compressed data without the need to decompress the whole file, disk or tape first.

As the software is distributed in COBOL source code, it can be compiled and run on IBM mainframe, iSeries/AS400, UNIX, HP, Linux, Fujitsu BS2000, Micro Focus or any other COBOL platform.


How it Works

Compression can be performed in a one-off batch procedure that selects the data suitable for compression. This data is passed to the Redvers compression routine (RCCMPRES) which returns the string in its compressed form. The compressed string can then safely replace the original data.

Application programs that require the compressed information pass the compressed string to the decompression routine (RCUNPRES), which returns the data string in its standard form. No other files, keys or parameters are required.

If an application needs to update compressed information, the updated details are passed through RCCMPRES and the output rewritten to the compressed data store.

Disk space in databases and indexed files can be saved by leaving key information unchanged, rewriting only the data components in their compressed state.
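The flow described above (compress once, decompress on read, recompress on update, keys left untouched) can be sketched in a few lines. This is only an illustration under stated assumptions: Python's standard `zlib` module stands in for RCCMPRES and RCUNPRES, whose actual call interfaces are not shown here, and `KEY_LENGTH` and the function names are hypothetical.

```python
import zlib

# Hypothetical layout: the first 10 bytes of every record are the key.
KEY_LENGTH = 10


def compress_record(record: bytes) -> bytes:
    """Keep the key as-is and compress only the data portion.

    zlib is a stand-in for the compression routine; it is NOT the
    Redvers algorithm.
    """
    key, data = record[:KEY_LENGTH], record[KEY_LENGTH:]
    return key + zlib.compress(data)


def decompress_record(stored: bytes) -> bytes:
    """Return the record in its standard form; only the stored bytes are needed."""
    key, packed = stored[:KEY_LENGTH], stored[KEY_LENGTH:]
    return key + zlib.decompress(packed)


def update_record(stored: bytes, new_data: bytes) -> bytes:
    """Recompress updated data and keep the original, uncompressed key."""
    return stored[:KEY_LENGTH] + zlib.compress(new_data)


# Example record: 10-byte key followed by 200 bytes of padded fields.
record = b"CUST000042" + b"SMITH".ljust(40) + b"LONDON".ljust(40) + b" " * 120
stored = compress_record(record)
```

Because the key bytes are never altered, an indexed file or database can still locate the stored record by key, and a single record can be read without decompressing anything else.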

Compressed data can always be recovered because there are no keys to lose.

The diagram below shows how a compression / decompression procedure might be used in a typical application environment.

[Figure: Compression Flowchart]

Reduced Carbon Emissions

The following chart shows how 60% compression (case study rate) can produce real power and carbon dioxide savings for DS8000 storage devices.

[Chart: zSeries Disk Compression Chart]

Notes:

1. 640 additional watts per disk enclosure pair: (270W x 2) + (25W x 4). See the "Power consumption and environment" section of IBM System Storage DS8000: Architecture and Implementation.

2. 0.537 kg of carbon dioxide per kilowatt-hour. See Carbon Trust Conversion Factors.

3. Containing 32 x 15 KRPM, 300GB disks running a 7 + P RAID 5 configuration (7.8TB data capacity).
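The arithmetic behind figures of this kind can be reproduced from the three notes above. The sketch below is not the calculation used for the chart. It assumes that "60% compression" means the data occupies 60% less space, that note 3 describes one disk enclosure pair, and that the enclosures run continuously all year. The 100TB data volume in the example is hypothetical.

```python
import math

WATTS_PER_ENCLOSURE_PAIR = 640.0   # note 1: (270W x 2) + (25W x 4)
KG_CO2_PER_KWH = 0.537             # note 2
TB_PER_ENCLOSURE_PAIR = 7.8        # note 3 (assumed to describe one pair)
HOURS_PER_YEAR = 24 * 365          # assumption: continuous operation


def enclosure_pairs(data_tb: float) -> int:
    """Whole enclosure pairs needed to hold data_tb terabytes."""
    return math.ceil(data_tb / TB_PER_ENCLOSURE_PAIR)


def annual_savings(data_tb: float, compression_rate: float = 0.60):
    """Return (pairs saved, kWh saved per year, kg CO2 saved per year)."""
    before = enclosure_pairs(data_tb)
    after = enclosure_pairs(data_tb * (1.0 - compression_rate))
    pairs_saved = before - after
    kwh = pairs_saved * WATTS_PER_ENCLOSURE_PAIR / 1000.0 * HOURS_PER_YEAR
    return pairs_saved, kwh, kwh * KG_CO2_PER_KWH


# Hypothetical example: 100TB of uncompressed data.
pairs_saved, kwh_saved, co2_saved = annual_savings(100.0)
```

Under these assumptions, 100TB needs 13 enclosure pairs uncompressed and 6 at 60% compression, so 7 pairs (4.48 kW) are avoided, which is roughly 39,245 kWh and 21,074 kg of carbon dioxide per year.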

Technical Information

The Redvers Compression Algorithm (2.1) uses a "lossless" compression algorithm designed and developed by Redvers Consulting. This algorithm provides optimum compression rates for COBOL application type data, using minimal computer resources. The algorithm can also be used to compress data not in COBOL format.

The algorithm is not "Huffman" or "arithmetic" based and doesn't require the overhead of building a probability tree and adding it to the compressed string. However, it does use a standard "sliding window" technique for data deduplication.

The size of the "sliding window" can be adjusted by the application to respond to different system priorities: a large window will produce better deduplication ratios but require more CPU time; a small window will result in poorer deduplication ratios but require less CPU time.
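The window-size trade-off can be seen in a toy example. The sketch below is a generic LZ77-style sliding window written for illustration only. It is not the Redvers algorithm, whose internals are not described here, and the token format and function names are hypothetical.

```python
def lz_compress(data: bytes, window: int = 400, min_match: int = 3):
    """Greedy sliding-window deduplication.

    Returns a list of tokens: ("lit", byte) for a literal byte, or
    ("ref", distance, length) for a copy of bytes seen earlier.
    """
    tokens = []
    i, n = 0, len(data)
    while i < n:
        best_len, best_dist = 0, 0
        # Only the previous `window` bytes are searched: a larger window
        # finds more repeats but costs more comparisons (CPU time).
        for j in range(max(0, i - window), i):
            length = 0
            while i + length < n and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            tokens.append(("ref", best_dist, best_len))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens


def lz_decompress(tokens) -> bytes:
    """Rebuild the original bytes; needs nothing but the token list."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            for _ in range(length):
                out.append(out[-dist])  # byte-by-byte so overlapping copies work
    return bytes(out)


data = b"ABCDEFGHIJ" * 2
small = lz_compress(data, window=4)    # repeat is 10 bytes back: out of reach
large = lz_compress(data, window=16)   # repeat is found
```

With a 4-byte window the second copy of the 10-byte pattern is never found and every byte is emitted as a literal (20 tokens). With a 16-byte window the second copy collapses into a single reference (11 tokens), at the cost of searching four times as many positions per byte. Decompression is cheap in either case because it only copies bytes.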

As RCCMPRES and RCUNPRES are standard COBOL programs, compiled at the customer's site, any limit on the length of the data strings passed is defined by the limitations of the compiler used.

Compression techniques include:

Input data can be in the ASCII or EBCDIC character sets and it can be encoded using single or double byte characters.

Actual compression rates range from 35% to 75%, depending on the length of the input data string and the length of the "sliding window".

Compression throughput is 0.3 megabytes per CPU second (using a "sliding window" of 400 bytes) and decompression throughput is more than 14 megabytes per CPU second. All benchmark timings were performed on an IBM zSeries mainframe running z/OS 1.10.
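For rough capacity planning, the benchmark rates above convert directly into CPU-time estimates. The sketch below assumes the rates scale linearly with data volume and apply unchanged on other hardware, neither of which is guaranteed. The function name is hypothetical.

```python
COMPRESS_MB_PER_CPU_SEC = 0.3      # benchmark rate with a 400-byte window
DECOMPRESS_MB_PER_CPU_SEC = 14.0   # quoted as "more than", so a conservative rate


def cpu_seconds(megabytes: float, mb_per_cpu_sec: float) -> float:
    """Estimated CPU seconds needed to process `megabytes` at the given rate."""
    return megabytes / mb_per_cpu_sec


# Hypothetical example: a 1,000 MB archive.
compress_estimate = cpu_seconds(1000.0, COMPRESS_MB_PER_CPU_SEC)
decompress_estimate = cpu_seconds(1000.0, DECOMPRESS_MB_PER_CPU_SEC)
```

On these figures, compressing 1,000 MB takes roughly 3,333 CPU seconds as a one-off cost, while decompressing the same volume takes at most about 71 CPU seconds.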
