capssds

Virtual overlay file system presenting a CAPS archive directory as a read-only SDS archive.

Description

capssds is a virtual overlay file system that presents a CAPS archive directory as a read-only SDS archive without requiring additional disk space.

CAPS directory and file names are mapped to their SDS counterparts. An application reading from a file only sees miniSEED records, ordered by record start time. You may connect to the virtual SDS archive using the SDS RecordStream or read individual miniSEED files directly. Other seismological software such as ObsPy or Seisan may read directly from the SDS archive or the files therein.

Usage

The virtual file system may be mounted by an unprivileged system user such as sysop, or configured by the root user to be mounted automatically at machine startup via an /etc/fstab entry or a systemd mount unit.

The following sections assume that the CAPS archive is located under /home/sysop/seiscomp/var/lib/caps/archive and the SDS archive should appear under /tmp/sds with all files and directories being owned by the sysop user.

Regardless of which of the following mount strategies is chosen, make sure to create the target directory first:

mkdir -p /tmp/sds

Unprivileged user

Mount the archive:

capssds ~/seiscomp/var/lib/caps/archive /tmp/sds

Unmount the archive:

fusermount -u /tmp/sds

System administrator - /etc/fstab

Create the /etc/fstab entry:

/home/sysop/seiscomp/var/lib/caps/archive  /tmp/sds fuse.capssds  defaults  0  0

Alternatively, you may define mount options, e.g., to disable automatic mounting at boot, to allow the user to mount the directory themselves, or to enable the sloppy_size feature:

/home/sysop/seiscomp/var/lib/caps/archive  /tmp/sds fuse.capssds  noauto,sloppy_size,user  0  0

Mount the archive:

mount /tmp/sds

Unmount the archive:

umount /tmp/sds

System administrator - systemd

Create the following file under /etc/systemd/system/tmp-sds.mount. Please note that the file name must match the path specified under Where, with the leading slash removed and all remaining slashes replaced by dashes:

[Unit]
Description=Mount CAPS archive as read-only miniSEED SDS
After=network.target

[Mount]
What=/home/sysop/seiscomp/var/lib/caps/archive
Where=/tmp/sds
Type=fuse.capssds
Options=defaults,allow_other

[Install]
WantedBy=multi-user.target

Mount the archive:

systemctl start tmp-sds.mount

Unmount the archive:

systemctl stop tmp-sds.mount

Automatic startup:

systemctl enable tmp-sds.mount
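The unit file naming rule follows systemd's path escaping. The following is a simplified Python sketch of that rule; the function name mount_unit_name is hypothetical. In practice, `systemd-escape --path --suffix=mount /tmp/sds` produces the authoritative name. Characters other than letters, digits, ":", "_" and "." (including a literal "-") are hex-escaped.

```python
import posixpath
import string

# Characters that systemd leaves unescaped in unit names.
_SAFE = set(string.ascii_letters + string.digits + ":_.")


def mount_unit_name(where: str) -> str:
    """Return the .mount unit name for mount point `where` (simplified sketch)."""
    path = posixpath.normpath(where).strip("/")
    if not path:
        return "-.mount"  # the root directory is represented by a single dash
    out = []
    for i, byte in enumerate(path.encode("utf-8")):
        ch = chr(byte)
        if ch == "/":
            out.append("-")  # remaining slashes become dashes
        elif ch in _SAFE and not (i == 0 and ch == "."):
            out.append(ch)
        else:
            # Everything else, including a literal "-" or a leading ".", is hex-escaped.
            out.append("\\x%02x" % byte)
    return "".join(out) + ".mount"


print(mount_unit_name("/tmp/sds"))  # tmp-sds.mount
```

A mount point such as /mnt/caps-sds therefore needs the unit file name mnt-caps\x2dsds.mount.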

Implementation Details

capssds makes use of FUSE [2], a userspace file system framework provided by the Linux kernel, as well as the libfuse [3] userspace library.

The file system provides read-only access to the data files and implements only the basic operations required to list and read them. It fulfills two main tasks: the path mapping between CAPS and SDS directory tree entries, and the data file conversion. Caches are used to improve performance.

Supported operations

  • init - initializes the file system

  • getattr - get file and directory attributes such as size and access rights

  • access - check for specific access rights

  • open - open a file

  • read - read data at a specific file position

  • readdir - list directory entries

  • release - release a file handle

  • destroy - shut down the file system

Please refer to fuse.h for a complete list of FUSE operations.

Path mapping

CAPS uses a directory structure similar to SDS, with three differences:

  • The channel directory and the file name do not use the .D suffix.

  • The day of year index is zero-based (0-365), whereas SDS uses an index starting at 1 (1-366).

  • CAPS data files use the extension .data.

The following example shows the translation from a CAPS data file path to an SDS file path for the stream AM.R0F05.00.SHZ for data on January 1st 2025:

2025/AM/R0F05/SHZ/AM.R0F05.00.SHZ.2025.000.data -> 2025/AM/R0F05/SHZ.D/AM.R0F05.00.SHZ.D.2025.001
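The three differences listed above can be illustrated with a short Python sketch of the mapping. The function caps_to_sds and the regular expression are hypothetical and only cover the naming scheme shown in the example; the actual name validation in capssds may be stricter.

```python
import re

# CAPS file name: NET.STA.LOC.CHA.YEAR.DOY.data (day of year counted from 0).
_CAPS_FILE = re.compile(
    r"^(?P<net>[^./]+)\.(?P<sta>[^./]+)\.(?P<loc>[^./]*)\.(?P<cha>[^./]+)"
    r"\.(?P<year>\d{4})\.(?P<doy>\d{3})\.data$"
)


def caps_to_sds(caps_path: str):
    """Map a relative CAPS data file path to its SDS path, or return None."""
    parts = caps_path.split("/")
    if len(parts) != 5:
        return None  # expected YEAR/NET/STA/CHA/FILE
    year, net, sta, cha, filename = parts
    m = _CAPS_FILE.match(filename)
    if m is None or (m["year"], m["net"], m["sta"], m["cha"]) != (year, net, sta, cha):
        return None  # not a CAPS data file, or directory and file name disagree
    doy = int(m["doy"])
    if doy > 365:
        return None  # outside the zero-based range 0-365
    doy += 1  # SDS counts days of the year from 1
    sds_file = f"{net}.{sta}.{m['loc']}.{cha}.D.{year}.{doy:03d}"
    return f"{year}/{net}/{sta}/{cha}.D/{sds_file}"


print(caps_to_sds("2025/AM/R0F05/SHZ/AM.R0F05.00.SHZ.2025.000.data"))
# 2025/AM/R0F05/SHZ.D/AM.R0F05.00.SHZ.D.2025.001
```

Returning None for non-matching entries mirrors the behavior described next: such entries are simply not listed.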

Directories and files whose names do not fulfill the miniSEED format specification are not listed.

Data file conversion

A CAPS data file contains records of different types in the order of their arrival, together with a record index used for record lookup and sorting. When a process reads data, only the miniSEED records contained in the CAPS data file are returned, ordered by record start time rather than by arrival. Likewise, only miniSEED records are counted in the reported file size unless the -o sloppy_size option is specified.
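The conversion rule can be summarized in a few lines of Python. This is a sketch under simplifying assumptions: the Record class, its fields and the type label "MSEED" are hypothetical stand-ins for entries of the CAPS record index, and the real implementation works on the index rather than on in-memory copies of every record.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    """Hypothetical stand-in for an entry of the CAPS record index."""
    kind: str          # record type label; only "MSEED" records are exposed
    start_time: float  # record start time, e.g. seconds since the epoch
    data: bytes        # raw record bytes


def sds_view(records: List[Record]) -> bytes:
    """Concatenate the miniSEED records ordered by start time instead of arrival."""
    mseed = [r for r in records if r.kind == "MSEED"]
    mseed.sort(key=lambda r: r.start_time)  # stable: ties keep arrival order
    return b"".join(r.data for r in mseed)


def sds_size(records: List[Record]) -> int:
    """Reported size without sloppy_size: the sum of the miniSEED record sizes."""
    return sum(len(r.data) for r in records if r.kind == "MSEED")
```

Because both functions apply the same filter, the reported size always equals the number of bytes a reader can actually obtain from the mapped file.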

Performance optimization

When a file is opened, all of its miniSEED records are copied to a memory buffer. This allows fast index-based data access at the cost of main memory consumption. The number of simultaneously opened data files can be configured through the -o cached_files option and must be chosen according to the available memory. If an application tries to open more files than configured, the open call fails.
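The open/read/release cycle with a bounded number of in-memory buffers might look like the following Python sketch. The class OpenFileTable is hypothetical, and the error code EMFILE is an assumption chosen for illustration; the text above only states that the open fails.

```python
import errno
from typing import Dict


class OpenFileTable:
    """Hypothetical sketch: at most `cached_files` converted files held in memory."""

    def __init__(self, cached_files: int = 100):
        self.cached_files = cached_files  # cf. -o cached_files
        self.buffers: Dict[int, bytes] = {}
        self._next_fh = 1

    def open(self, sds_bytes: bytes) -> int:
        """Store the converted miniSEED data and return a new file handle."""
        if len(self.buffers) >= self.cached_files:
            # No free slot: fail instead of evicting a file another reader uses.
            # (EMFILE is an assumed error code, used here for illustration.)
            raise OSError(errno.EMFILE, "too many open files")
        fh = self._next_fh
        self._next_fh += 1
        self.buffers[fh] = sds_bytes
        return fh

    def read(self, fh: int, size: int, offset: int) -> bytes:
        """Index-based access: a slice of the buffer, shorter near end of file."""
        return self.buffers[fh][offset:offset + size]

    def release(self, fh: int) -> None:
        """Free the buffer so that another file can be opened."""
        del self.buffers[fh]
```

Serving read requests as slices of a buffer keeps every read a constant-time lookup, which is the trade-off against memory consumption described above.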

To obtain the mapped SDS file size, the CAPS data file must be scanned for miniSEED records. Although only the record headers are read, this is still an expensive operation for hundreds of files. A file size cache therefore holds up to -o cached_file_sizes entries, each consuming 56 bytes of memory. Recently accessed file sizes are moved to the front of the cache. A cache entry is invalidated if the modification time of the CAPS data file is more recent than the entry's creation time.
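The described behavior corresponds to a least-recently-used (LRU) cache with modification-time invalidation. The following Python sketch assumes a hypothetical class FileSizeCache and a caller-supplied compute_size function standing in for the miniSEED header scan. With the default of 100000 entries, the cache needs about 100000 × 56 bytes = 5.6 MB.

```python
import os
import time
from collections import OrderedDict
from typing import Callable


class FileSizeCache:
    """Hypothetical sketch of an LRU cache for mapped SDS file sizes."""

    def __init__(self, max_entries: int, compute_size: Callable[[str], int]):
        self.max_entries = max_entries    # cf. -o cached_file_sizes
        self.compute_size = compute_size  # stands in for the header scan
        self.entries = OrderedDict()      # path -> (creation time, size)

    def size(self, path: str) -> int:
        mtime = os.stat(path).st_mtime
        entry = self.entries.get(path)
        if entry is not None and mtime <= entry[0]:
            # Valid entry: move it to the front as most recently used.
            self.entries.move_to_end(path, last=False)
            return entry[1]
        # Missing or outdated entry: rescan and store at the front.
        size = self.compute_size(path)
        self.entries[path] = (time.time(), size)
        self.entries.move_to_end(path, last=False)
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=True)  # evict the least recently used entry
        return size
```

Keeping the most recently used entries at the front means eviction only has to drop the last element, so lookups, updates and evictions all run in constant time.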

If your use case does not require the exact file size to be listed, you may use the -o sloppy_size option, which skips computing the miniSEED file size and reports the size of the CAPS data file instead.

Command-Line Options

capssds [options] [capsdir] mountpoint

File-system specific options

-o caps_dir=DIR

Default: Current working directory

Path to the CAPS archive directory.

-o sloppy_size

Return the size of the CAPS data file instead of the sum of the sizes of all miniSEED records. Although the miniSEED file size is cached, calculating the exact size is an expensive operation. If your use case does not depend on the exact size, you may activate this flag for a speedup.

-o cached_file_sizes=int

Default: 100000

Type: int

Number of file sizes to cache. Used when sloppy_size is off to avoid unnecessary recomputation of miniSEED sizes. A cache entry remains valid as long as neither the mtime nor the size of the CAPS data file has changed. Each entry consumes 56 bytes of memory.

-o cached_files=int

Default: 100

Type: int

Number of CAPS data files to cache. The file handle of each cached file is kept open to speed up data access.

FUSE Options

-h, --help

Print this help text.

-V, --version

Print version.

-d

Enable debug output (implies -f).

-o debug

Enable debug output (implies -f).

-f

Enable foreground operation.

-s

Disable multi-threaded operation.

-o clone_fd

Use separate fuse device fd for each thread (may improve performance).

-o max_idle_threads=int

Default: -1

Type: int

The maximum number of idle worker threads allowed.

-o max_threads=int

Default: 10

Type: int

The maximum number of worker threads allowed.

-o kernel_cache

Cache files in kernel.

-o [no]auto_cache

Enable caching based on modification times.

-o no_rofd_flush

Disable flushing of read-only fd on close.

-o umask=M

Type: octal

Set file permissions.

-o uid=N

Set file owner.

-o gid=N

Set file group.

-o entry_timeout=T

Default: 1

Unit: s

Type: float

Cache timeout for names.

-o negative_timeout=T

Default: 0

Unit: s

Type: float

Cache timeout for deleted names.

-o attr_timeout=T

Default: 1

Unit: s

Type: float

Cache timeout for attributes.

-o ac_attr_timeout=T

Default: attr_timeout

Unit: s

Type: float

Auto cache timeout for attributes.

-o noforget

Never forget cached inodes.

-o remember=T

Default: 0

Unit: s

Type: float

Remember cached inodes for T seconds.

-o modules=M1[:M2...]

Names of modules to push onto filesystem stack.

-o allow_other

Allow access by all users.

-o allow_root

Allow access by root.

-o auto_unmount

Auto unmount on process termination.

Options for subdir module

-o subdir=DIR

Prepend this directory to all paths (mandatory).

-o [no]rellinks

Transform absolute symlinks to relative.

Options for iconv module

-o from_code=CHARSET

Default: UTF-8

Original encoding of file names.

-o to_code=CHARSET

Default: UTF-8

New encoding of the file names.