capssds¶
Virtual overlay file system presenting a CAPS archive directory as a read-only SDS archive.
Description¶
capssds is a virtual overlay file system presenting a CAPS archive directory as a read-only SDS archive with no extra disk space requirement.
CAPS directory and file names are mapped to their SDS counterparts. An application reading from a file will only see miniSEED records ordered by record start time. You may connect to the virtual SDS archive using the RecordStream SDS or read a single miniSEED file directly. Other seismological software such as ObsPy or Seisan may read directly from the SDS archive or the files therein.
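As a minimal sketch of reading a single file directly, the following Python snippet builds the SDS path of one day file and hands it to ObsPy. The helper names `sds_day_file` and `read_day` are hypothetical, the mount point `/tmp/sds` is only an example, and ObsPy is a third-party package that must be installed separately.

```python
from datetime import date
from pathlib import PurePosixPath


def sds_day_file(root, net, sta, loc, cha, day):
    """Return the SDS path of the day file for one stream (data type D)."""
    doy = day.timetuple().tm_yday  # SDS counts days of the year from 001
    name = f"{net}.{sta}.{loc}.{cha}.D.{day.year}.{doy:03d}"
    return PurePosixPath(root, str(day.year), net, sta, f"{cha}.D", name)


def read_day(root, net, sta, loc, cha, day):
    """Read one day file with ObsPy (imported lazily, third-party)."""
    from obspy import read

    path = sds_day_file(root, net, sta, loc, cha, day)
    return read(str(path), format="MSEED")
```

With the archive mounted, `read_day("/tmp/sds", "AM", "R0F05", "00", "SHZ", date(2025, 1, 1))` would return an ObsPy `Stream` for that day, provided the file exists.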
Usage¶
The virtual file system may be mounted by an unprivileged system user like sysop, or configured by the root user to be mounted automatically on machine startup via an /etc/fstab entry or a systemd mount unit.
The following sections assume that the CAPS archive is located under /home/sysop/seiscomp/var/lib/caps/archive and the SDS archive should appear under /tmp/sds with all files and directories being owned by the sysop user.
Regardless of which of the following mount strategies is chosen, make sure to create the target directory first:
mkdir -p /tmp/sds
Unprivileged user¶
Mount the archive:
capssds ~/seiscomp/var/lib/caps/archive /tmp/sds
Unmount the archive:
fusermount -u /tmp/sds
System administrator - /etc/fstab¶
Create the /etc/fstab entry:
/home/sysop/seiscomp/var/lib/caps/archive /tmp/sds fuse.capssds defaults 0 0
Alternatively, you may define mount options, e.g., to deactivate the auto mount, allow users to mount the directory themselves, or use the sloppy_size feature:
/home/sysop/seiscomp/var/lib/caps/archive /tmp/sds fuse.capssds noauto,sloppy_size,user 0 0
Mount the archive:
mount /tmp/sds
Unmount the archive:
umount /tmp/sds
System administrator - systemd¶
Create the following file under /etc/systemd/system/tmp-sds.mount. Please note that the file name must match the path specified under Where, with the leading slash removed and all remaining slashes replaced by dashes:
[Unit]
Description=Mount CAPS archive as readonly miniSEED SDS
After=network.target
[Mount]
What=/home/sysop/seiscomp/var/lib/caps/archive
Where=/tmp/sds
Type=fuse.capssds
Options=defaults,allow_other
[Install]
WantedBy=multi-user.target
Mount the archive:
systemctl start tmp-sds.mount
Unmount the archive:
systemctl stop tmp-sds.mount
Automatic startup:
systemctl enable tmp-sds.mount
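The unit file naming rule can be sketched in Python as follows. This is an illustrative approximation of what `systemd-escape --path --suffix=mount` computes; the function name `mount_unit_name` is hypothetical, and for real deployments the systemd tool itself is the authoritative source.

```python
def _escape_component(text, is_first):
    """Escape one path component following the systemd unit name rules."""
    out = []
    for index, byte in enumerate(text.encode()):
        char = chr(byte)
        # ASCII letters, digits, ':', '_' and '.' are kept as they are.
        keep = char.isascii() and (char.isalnum() or char in ":_.")
        # A dot at the very beginning of the name must be escaped as well.
        if is_first and index == 0 and char == ".":
            keep = False
        out.append(char if keep else f"\\x{byte:02x}")
    return "".join(out)


def mount_unit_name(where):
    """Derive the .mount unit file name for an absolute mount point."""
    parts = [p for p in where.split("/") if p]  # drop empty components
    if not parts:
        return "-.mount"  # the root directory maps to a single dash
    escaped = [_escape_component(p, i == 0) for i, p in enumerate(parts)]
    return "-".join(escaped) + ".mount"


print(mount_unit_name("/tmp/sds"))
```

For the mount point used in this section the result is `tmp-sds.mount`. A dash that is part of a directory name is escaped as `\x2d`, so mount points containing dashes need extra care.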
Implementation Details¶
capssds makes use of FUSE [2], a userspace filesystem framework provided by the Linux kernel, as well as the libfuse [3] userspace library.
The file system provides read-only access to the data files and implements only the basic operations required to list and read them. It has to fulfill two main tasks: the Path mapping of CAPS and SDS directory tree entries and the Data file conversion. Caches are used to improve the performance.
Supported operations¶
init - initializes the file system
getattr - get file and directory attributes such as size and access rights
access - check for specific access rights
open - open a file
read - read data at a specific file position
readdir - list directory entries
release - release a file handle
destroy - shutdown the file system
Please refer to fuse.h for a complete list of fuse operations.
Path mapping¶
CAPS uses a directory structure comparable to SDS, with three differences:
The channel does not use the .D suffix.
The day of year index is zero-based (0-365), whereas SDS uses an index starting with 1 (1-366).
CAPS data files use the extension .data.
The following example shows the translation from a CAPS data file path to an SDS file path for the stream AM.R0F05.00.SHZ for data on January 1st 2025:
2025/AM/R0F05/SHZ/AM.R0F05.00.SHZ.2025.000.data -> 2025/AM/R0F05/SHZ.D/AM.R0F05.00.SHZ.D.2025.001
Directories and file names not fulfilling the miniSEED format specification are not listed.
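The three differences listed above can be expressed as a small Python function. This is a sketch of the mapping rules only, not the capssds implementation; the function name `caps_to_sds` and the exact validation checks are assumptions made for illustration.

```python
import re

# NET.STA.LOC.CHA.YEAR.DOY.data, with a possibly empty location code
CAPS_FILE = re.compile(
    r"^(?P<net>[^.]+)\.(?P<sta>[^.]+)\.(?P<loc>[^.]*)\.(?P<cha>[^.]+)"
    r"\.(?P<year>\d{4})\.(?P<doy>\d{3})\.data$"
)


def caps_to_sds(path):
    """Map a relative CAPS data file path to its SDS path, or None."""
    parts = path.split("/")
    if len(parts) != 5:
        return None
    year, net, sta, cha, filename = parts
    match = CAPS_FILE.match(filename)
    if match is None:
        return None
    # The file name must agree with the directories it is stored in.
    if (match["year"], match["net"], match["sta"], match["cha"]) != (
        year, net, sta, cha,
    ):
        return None
    doy = int(match["doy"]) + 1  # zero-based (CAPS) to one-based (SDS)
    if not 1 <= doy <= 366:
        return None
    name = f"{net}.{sta}.{match['loc']}.{cha}.D.{year}.{doy:03d}"
    return "/".join([year, net, sta, f"{cha}.D", name])
```

Called with the CAPS path from the example above, the function returns `2025/AM/R0F05/SHZ.D/AM.R0F05.00.SHZ.D.2025.001`. Paths that do not match the expected pattern yield `None`, mirroring the rule that such entries are not listed.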
Data file conversion¶
A CAPS data file contains records of various types in the order of their arrival, together with a record index for record lookup and sorting. If a process reads data, only the miniSEED records contained in the CAPS data file are returned, ordered by record start time rather than by arrival. Likewise, only miniSEED records are counted for the reported file size unless the -o sloppy_size option is specified.
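The conversion described above amounts to a filter followed by a sort. The following Python sketch illustrates the idea on in-memory tuples; the `Record` type, its fields and the type labels `"MSEED"` and `"RAW"` are hypothetical stand-ins and do not reflect the actual CAPS file layout.

```python
from typing import NamedTuple


class Record(NamedTuple):
    kind: str       # record type label (hypothetical values)
    start: float    # record start time in seconds
    payload: bytes  # raw record bytes


def sds_view(records):
    """Return the bytes a reader would see and the reported file size."""
    mseed = [r for r in records if r.kind == "MSEED"]  # drop other types
    mseed.sort(key=lambda r: r.start)  # start time order, not arrival order
    data = b"".join(r.payload for r in mseed)
    return data, len(data)


arrival_order = [
    Record("MSEED", 20.0, b"BB"),
    Record("RAW", 5.0, b"xxxx"),
    Record("MSEED", 10.0, b"AA"),
]
data, size = sds_view(arrival_order)
```

Here `data` holds the two miniSEED payloads in start time order and `size` counts only those bytes, which corresponds to the exact file size reported when sloppy_size is not set.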
Performance optimization¶
When a file is opened, all miniSEED records are copied to a memory buffer. This allows fast index-based data access at the cost of main memory consumption. The number of simultaneously opened data files can be configured through the -o cached_files option and must match the available memory size. If an application tries to open more files than available, the action will fail.
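The trade-off between fast reads and a bounded number of open files can be pictured with the following Python sketch. The class `BufferPool`, its methods and the choice of `EMFILE` as the error code are assumptions for illustration; the error actually returned by capssds is not specified here.

```python
import errno


class BufferPool:
    """Keep the miniSEED bytes of each open file in memory, up to a limit."""

    def __init__(self, max_files=100):
        self.max_files = max_files
        self._buffers = {}
        self._next_handle = 0

    def open(self, records):
        # Opening fails once the configured number of buffers is in use.
        if len(self._buffers) >= self.max_files:
            raise OSError(errno.EMFILE, "too many cached files")
        handle = self._next_handle
        self._next_handle += 1
        self._buffers[handle] = b"".join(records)  # copy records to memory
        return handle

    def read(self, handle, offset, size):
        # Reads are plain slices of the in-memory buffer.
        return self._buffers[handle][offset:offset + size]

    def release(self, handle):
        self._buffers.pop(handle, None)  # frees the slot for another file
```

Each open file costs roughly the size of its miniSEED content in memory, which is why the limit should be chosen with the available memory in mind.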
To obtain the mapped SDS file size, the CAPS data file must be scanned for miniSEED records. Although only the header data is read, this is still an expensive operation for hundreds of files. A file size cache is used containing up to -o cached_file_sizes entries, each consuming 56 bytes of memory. Recently accessed file sizes are moved to the front of the cache. A cache item is invalidated if the modification time of the CAPS data file is more recent than the entry creation time.
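A cache with these properties can be sketched in Python as shown below. The class `FileSizeCache` and the helper `cache_memory_bytes` are hypothetical, and evicting entries from the back of the cache once it is full is an assumption, since only the move-to-front behavior is described above.

```python
from collections import OrderedDict


class FileSizeCache:
    """Bounded cache of computed miniSEED file sizes."""

    def __init__(self, max_entries=100000):
        self.max_entries = max_entries
        self._entries = OrderedDict()  # path -> (size, creation time)

    def get(self, path, file_mtime):
        entry = self._entries.get(path)
        if entry is None:
            return None
        size, created = entry
        if file_mtime > created:
            del self._entries[path]  # file changed after entry was created
            return None
        self._entries.move_to_end(path, last=False)  # move to the front
        return size

    def put(self, path, size, now):
        self._entries[path] = (size, now)
        self._entries.move_to_end(path, last=False)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=True)  # assumed: evict from the back


def cache_memory_bytes(entries, bytes_per_entry=56):
    """Estimate the memory used by a full cache."""
    return entries * bytes_per_entry
```

With 56 bytes per entry, a full cache of 100000 entries occupies `cache_memory_bytes(100000)`, i.e. 5600000 bytes or about 5.6 MB.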
If your use case does not require the exact file size to be listed, you may use the -o sloppy_size option, which skips the computation of the miniSEED file size and returns the size of the CAPS file instead.
Command-Line Options¶
capssds [options] [capsdir] mountpoint
File-system specific options¶
- -o caps_dir=DIR¶
Default:
Current working directory
Path to the CAPS archive directory.
- -o sloppy_size¶
Return the size of the CAPS data file instead of summing up the size of all MSEED records. Although there is a cache for the MSEED file size, calculating the real size is an expensive operation. If your use case does not depend on the exact size, you may activate this flag for a speedup.
- -o cached_file_sizes=int¶
Default:
100000
Type: int
Number of file sizes to cache. Used when sloppy_size is off to avoid unnecessary recomputation of MSEED sizes. A cache entry is valid as long as neither the mtime nor size of the CAPS data file changed. Each entry consumes 56 bytes of memory.
- -o cached_files=int¶
Default:
100
Type: int
Number of CAPS data files to cache. The file handle for each cached file will be kept open to speed up data access.
FUSE Options¶
- -h, --help¶
Print this help text.
- -V, --version¶
Print version.
- -d¶
Enable debug output (implies -f).
- -o debug¶
Enable debug output (implies -f).
- -f¶
Enable foreground operation.
- -s¶
Disable multi-threaded operation.
- -o clone_fd¶
Use separate fuse device fd for each thread (may improve performance).
- -o max_idle_threads=int¶
Default:
-1
Type: int
The maximum number of idle worker threads allowed.
- -o max_threads=int¶
Default:
10
Type: int
The maximum number of worker threads allowed.
- -o kernel_cache¶
Cache files in kernel.
- -o [no]auto_cache¶
Enable caching based on modification times.
- -o no_rofd_flush¶
Disable flushing of read-only fd on close.
- -o umask=M¶
Type: octal
Set file permissions.
- -o uid=N¶
Set file owner.
- -o gid=N¶
Set file group.
- -o entry_timeout=T¶
Default:
1
Unit: s
Type: float
Cache timeout for names.
- -o negative_timeout=T¶
Default:
0
Unit: s
Type: float
Cache timeout for deleted names.
- -o attr_timeout=T¶
Default:
1
Unit: s
Type: float
Cache timeout for attributes.
- -o ac_attr_timeout=T¶
Default:
attr_timeout
Unit: s
Type: float
Auto cache timeout for attributes.
- -o noforget¶
Never forget cached inodes.
- -o remember=T¶
Default:
0
Unit: s
Type: float
Remember cached inodes for T seconds.
- -o modules=M1[:M2...]¶
Names of modules to push onto filesystem stack.
- -o allow_other¶
Allow access by all users.
- -o allow_root¶
Allow access by root.
- -o auto_unmount¶
Auto unmount on process termination.
Options for subdir module¶
- -o subdir=DIR¶
Prepend this directory to all paths (mandatory).
- -o [no]rellinks¶
Transform absolute symlinks to relative.
Options for iconv module¶
- -o from_code=CHARSET¶
Default:
UTF-8
Original encoding of file names.
- -o to_code=CHARSET¶
Default:
UTF-8
New encoding of the file names.