Upgrading

New file format

Starting with version 2021.048, CAPS uses a new file storage format. The files are still chunk-based and structurally compatible, but two new chunk types have been added. The upgrade itself should run smoothly and without interruption, but because of the new format every file must be converted before it can be read. CAPS does this on the fly whenever a file is opened for reading or writing.

This can reduce performance until all files have been converted, but it should not cause any outages.

Rationale

The time needed to store an out-of-order record in CAPS grew with the number of records already stored. The cause was a linear search for the insert position: the more records were stored, the more records had to be checked and the more file content had to be paged into system memory, which is a slow operation. In addition, a second index file had to be maintained, which required an additional open file descriptor per data file. As we were also looking for a way to reduce disk fragmentation and to allow file size pre-allocation on any operating system, we decided to redesign how individual records are stored within a data file. What we wanted was:

  • Fast insert operations

  • Fast data retrieval

  • Portable file size pre-allocations

  • Efficient OS memory paging

CAPS now implements a B+tree index per data file. No additional index file is required; the index is maintained as additional chunks in the data file itself. Furthermore, CAPS maintains a meta chunk at the end of the file with information about the logical and physical file size, the index chunks and so on. If that chunk is missing or invalid, the data file is re-scanned and converted. This is what happens after an upgrade.

As a consequence, time window requests are much faster in terms of CPU time. Files are also accessed less frequently, and less file content has to be read when extracting arbitrary time windows.

As the time range stored in the data file is now part of the metadata, a full re-scan is not necessary when restarting CAPS without its archive log. When dealing with many channels, this speeds up re-scanning an archive considerably.

Manual archive conversion

If a controlled conversion of the archive files is desired, the following procedure can be applied:

  1. Stop caps

    $ seiscomp stop caps
    
  2. Enter the configured archive directory

    $ cd seiscomp/var/lib/caps/archive
    
  3. Check all files and trigger a conversion

    $ find . -name '*.data' -exec rifftest {} check \;
    
  4. Start caps

    $ seiscomp start caps
    

Depending on the size of the archive, step 3 can take some time.
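
For large archives it can help to see how far the conversion has progressed. The following is a minimal sketch of step 3 as a shell function that processes the data files one by one and prints a running count. It is not part of CAPS: the function name convert_archive and its optional second argument are hypothetical, and the sketch assumes that rifftest returns a non-zero exit status when a file could not be checked or converted, which is not confirmed here.

```shell
#!/bin/sh
# Sketch: run the check/conversion file by file and report progress.
# Assumption (not confirmed by the documentation): the checker exits with
# a non-zero status when a file could not be checked or converted.

# Usage: convert_archive ARCHIVE_DIR [CHECKER]
# CHECKER defaults to rifftest; the optional override is a hypothetical
# convenience for dry runs with a stub command.
convert_archive() (
    archive_dir=$1
    checker=${2:-rifftest}
    processed=0
    failed=0

    # Collect the file list once so progress can be shown as n/total.
    # File names containing newlines are not supported by this sketch.
    list=$(mktemp) || exit 1
    find "$archive_dir" -type f -name '*.data' > "$list"
    total=$(($(wc -l < "$list")))

    while IFS= read -r file; do
        processed=$((processed + 1))
        # Same invocation as in step 3: rifftest <file> check
        # stdin is redirected so the checker cannot consume the file list.
        if ! "$checker" "$file" check < /dev/null; then
            failed=$((failed + 1))
            echo "FAILED: $file" >&2
        fi
        echo "$processed/$total files processed"
    done < "$list"

    rm -f "$list"
    echo "done: $processed processed, $failed failed"
    # Exit status is 0 only if every file was processed without error.
    [ "$failed" -eq 0 ]
)
```

After sourcing the function, and with CAPS stopped as in step 1, it could be called as `convert_archive seiscomp/var/lib/caps/archive`. Files reported as FAILED can then be inspected individually before CAPS is started again.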