scardac

Waveform archive data availability collector.

Description

scardac scans an SDS waveform archive, e.g., one created by slarchive or scart, for available miniSEED data. It collects information about:

  • Data extents - the earliest and latest times for which data of a particular channel is available,

  • Data segments - continuous data segments sharing the same quality and sampling rate attributes.

scardac is intended to be executed periodically, e.g., as a cron job.

The data availability information is stored in the SeisComP database under the root element DataAvailability. Access to the data availability is provided by the fdsnws module via the services:

  • /fdsnws/station (extent information only, see matchtimeseries and includeavailability request parameters).

  • /fdsnws/ext/availability (extent and segment information provided in different formats)

Non-SDS archives

scardac can be extended by plugins to scan non-SDS archives. For example, the daccaps plugin provided by CAPS [3] allows scanning archives generated by a CAPS server. Plugins are added to the global module configuration, e.g.:

plugins = ${plugins}, daccaps

Workflow

  1. Read existing Extents from database

  2. Scan the SDS archive for new channel IDs and create new Extents

  3. Process the Extents using the number of parallel threads configured by threads. For each Extent:

    1. Find all available daily data files

    2. Sort the file list according to date

    3. For each data file

      • remove DataSegments that no longer exist

      • update or create DataSegments that changed or are new

      • a segment is split if

        • the jitter (difference between the previous record's end time and the current record's start time) is exceeded

        • the quality or sampling rate changed

      • merge segment information into DataAttributeExtents (Extents sharing the same quality and sampling rate information)

      • merge segment start and end time into overall Extent
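The segment handling in step 3 can be illustrated with a minimal Python sketch. It is not scardac's implementation: the names Record, Segment, build_segments and overall_extent are hypothetical, times are simplified to plain seconds, and comparing the absolute time difference against the tolerance is an assumption about how gaps and overlaps are treated.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in for one miniSEED record."""
    start: float          # start time in seconds
    end: float            # end time in seconds
    quality: str          # quality indicator, e.g. "D"
    sampling_rate: float  # samples per second


@dataclass
class Segment:
    """Hypothetical stand-in for a DataSegment."""
    start: float
    end: float
    quality: str
    sampling_rate: float


def build_segments(records, jitter=0.5):
    """Merge records into segments, splitting when the jitter is
    exceeded or when the quality or sampling rate changes."""
    segments = []
    for rec in sorted(records, key=lambda r: r.start):
        current = segments[-1] if segments else None
        if current is not None:
            # Tolerance in seconds: jitter is given in multiples of the sample time.
            tolerance = jitter / rec.sampling_rate
            same_attributes = (current.quality == rec.quality
                               and current.sampling_rate == rec.sampling_rate)
            # Assumption: both gaps and overlaps beyond the tolerance split a segment.
            contiguous = abs(rec.start - current.end) <= tolerance
            if same_attributes and contiguous:
                current.end = rec.end  # extend the running segment
                continue
        segments.append(Segment(rec.start, rec.end, rec.quality, rec.sampling_rate))
    return segments


def overall_extent(segments):
    """Merge segment start and end times into one overall extent."""
    return min(s.start for s in segments), max(s.end for s in segments)
```

With four records at 100 Hz covering 0-10 s, 10-20 s, 25-30 s (quality D) and 30-40 s (quality Q), this sketch yields three segments: the first two records merge, the 5 s gap starts a new segment, and the quality change starts another.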

Examples

  1. Get command line help or execute scardac with default parameters and informative debug output:

    scardac -h
    scardac --debug
    
  2. Scan the waveform data files in the standard SDS archive, update their availability in the SeisComP database and create an XML file using scxmldump:

    scardac -d mysql://sysop:sysop@localhost/seiscomp -a $SEISCOMP_ROOT/var/lib/archive --debug
    scxmldump -Yf -d mysql://sysop:sysop@localhost/seiscomp -o availability.xml
    
  3. Scan the waveform data files in the standard SDS archive and update their availability in the SeisComP database. Then use fdsnws to fetch a flat file listing the periods of available data, grouped by quality and sampling rate, for stations of the CX network:

    scardac -d mysql://sysop:sysop@localhost/seiscomp -a $SEISCOMP_ROOT/var/lib/archive
    wget -O availability.txt 'http://localhost:8080/fdsnws/ext/availability/1/query?network=CX'
    

    Note

    The SeisComP module fdsnws must be running to execute this example.
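The wget request from example 3 can also be issued from a script. The following is a minimal Python sketch using only the standard library; the helper names availability_query_url and fetch_availability are hypothetical, and the host, port and network parameter are taken from the example above. Fetching only works while fdsnws is running.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def availability_query_url(base_url, **params):
    """Build a query URL for the availability service shown in example 3."""
    return f"{base_url}/fdsnws/ext/availability/1/query?{urlencode(params)}"


def fetch_availability(url, timeout=30):
    """Fetch the response body as text (requires a running fdsnws)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


url = availability_query_url("http://localhost:8080", network="CX")
print(url)
```

Calling fetch_availability(url) would then return the same content that wget writes to availability.txt.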

Module Configuration

etc/defaults/global.cfg
etc/defaults/scardac.cfg
etc/global.cfg
etc/scardac.cfg
~/.seiscomp/global.cfg
~/.seiscomp/scardac.cfg

scardac inherits global options.

archive

Default: @SEISCOMP_ROOT@/var/lib/archive

Type: string

Path to the miniSEED waveform archive where all data is stored. The SDS archive structure is defined as YEAR/NET/STA/CHA/NET.STA.LOC.CHA.YEAR.DAYOFYEAR, e.g., 2018/GE/APE/BHZ.D/GE.APE..BHZ.D.2018.125
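As an illustration of this layout, the following Python sketch assembles the path of one daily file. The function name sds_path and its parameters are hypothetical. The ".D" data type suffix on the channel is taken from the example path above, and the three-digit zero-padded day of year is an assumption based on that example.

```python
from datetime import date
from pathlib import PurePosixPath


def sds_path(archive, net, sta, loc, cha, day, data_type="D"):
    """Return the path of the daily file for one stream in an SDS archive."""
    year = f"{day.year:04d}"
    day_of_year = f"{day.timetuple().tm_yday:03d}"  # assumed zero-padded to 3 digits
    channel_dir = f"{cha}.{data_type}"              # e.g. BHZ.D
    filename = f"{net}.{sta}.{loc}.{cha}.{data_type}.{year}.{day_of_year}"
    return PurePosixPath(archive) / year / net / sta / channel_dir / filename


# 5 May 2018 is day 125 of the year; the location code is empty.
print(sds_path("/archive", "GE", "APE", "", "BHZ", date(2018, 5, 5)))
```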

threads

Default: 1

Type: int

Number of threads scanning the archive in parallel.

batchSize

Default: 100

Type: int

Batch size of database transactions used when updating data availability segments. Allowed range: [1,1000].

jitter

Default: 0.5

Type: float

Acceptable deviation between the end time of a record and the start time of the following record, in multiples of the sample time.
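Because the value is expressed in multiples of the sample time, the tolerance in seconds depends on the sampling rate of the stream. A small worked example in Python (the function name jitter_tolerance_seconds is hypothetical):

```python
def jitter_tolerance_seconds(jitter, sampling_rate):
    """Convert a jitter given in sample times into seconds."""
    sample_time = 1.0 / sampling_rate  # duration of one sample in seconds
    return jitter * sample_time


# With the default jitter of 0.5: 0.005 s at 100 Hz and 0.025 s at 20 Hz.
for rate in (100.0, 20.0):
    print(rate, jitter_tolerance_seconds(0.5, rate))
```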

maxSegments

Default: 1000000

Type: int

Maximum number of segments per stream. If the limit is reached, no more segments are added to the database and the corresponding extent is flagged as too fragmented. Use a negative value to disable any limit.

Command-Line Options

scardac [OPTION]...

Generic

-h, --help

Show help message.

-V, --version

Show version information.

--config-file arg

Use alternative configuration file. When this option is used, the loading of all stages is disabled. Only the given configuration file is parsed and used. To use another name for the configuration, create a symbolic link to the application or copy it. Example: scautopick -> scautopick2.

--plugins arg

Load given plugins.

Verbosity

--verbosity arg

Verbosity level [0..4]. 0:quiet, 1:error, 2:warning, 3:info, 4:debug.

-v, --v

Increase verbosity level (may be repeated, e.g., -vv).

-q, --quiet

Quiet mode: no logging output.

--print-component arg

For each log entry print the component right after the log level. By default the component output is enabled for file output but disabled for console output.

--component arg

Limit the logging to a certain component. This option can be given more than once.

-s, --syslog

Use syslog logging backend. The output usually goes to /var/log/messages.

-l, --lockfile arg

Path to lock file.

--console arg

Send log output to stdout.

--debug

Execute in debug mode. Equivalent to --verbosity=4 --console=1.

--trace

Execute in trace mode. Equivalent to --verbosity=4 --console=1 --print-component=1 --print-context=1.

--log-file arg

Use alternative log file.

Collector

-a, --archive arg

Overrides configuration parameter archive.

--threads arg

Overrides configuration parameter threads.

-b, --batch-size arg

Overrides configuration parameter batchSize.

-j, --jitter arg

Overrides configuration parameter jitter.

--generate-test-data arg

Do not scan the archive but generate test data for each stream in the inventory. Format: days,gaps,gapslen,overlaps,overlaplen. For example, the following parameter list generates test data for 100 days (starting from now()-100) which includes 150 gaps with a length of 2.5s followed by 50 overlaps with an overlap of 5s: --generate-test-data=100,150,2.5,50,5
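To make the field order explicit, the following Python sketch splits such a parameter list into named values. It is an illustration only and not part of scardac: the names GeneratedDataSpec and parse_generated_data_spec are hypothetical, and treating days and the two counts as integers and the two lengths as seconds in floating point is an assumption.

```python
from typing import NamedTuple


class GeneratedDataSpec(NamedTuple):
    days: int              # number of days, counted back from now
    gaps: int              # number of gaps
    gap_length: float      # length of each gap in seconds
    overlaps: int          # number of overlaps
    overlap_length: float  # length of each overlap in seconds


def parse_generated_data_spec(value):
    """Parse 'days,gaps,gapslen,overlaps,overlaplen' into named fields."""
    parts = value.split(",")
    if len(parts) != 5:
        raise ValueError("expected 5 comma-separated values, got %d" % len(parts))
    days, gaps, gap_length, overlaps, overlap_length = parts
    return GeneratedDataSpec(int(days), int(gaps), float(gap_length),
                             int(overlaps), float(overlap_length))


print(parse_generated_data_spec("100,150,2.5,50,5"))
```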