.. highlight:: rst

.. _scardac:

#######
scardac
#######

**Waveform archive data availability collector**


Description
===========

scardac scans an :term:`SDS` waveform archive, e.g., created by
:ref:`slarchive` or :ref:`scart`, for available miniSEED data. It will
collect information about

* data extents - the absolute earliest and latest times data is available
  for a particular channel
* data segments - continuous data segments sharing the same quality and
  sampling rate attributes

scardac is intended to be executed periodically, e.g., as a cronjob (a
crontab sketch is given after the examples below).

The data availability information is stored in the SeisComP3 database under
the root element ``DataAvailability``. Access to the data availability is
provided by the :ref:`fdsnws` module via the services:

* ``/fdsnws/station`` (extent information only, see ``matchtimeseries`` and
  ``includeavailability`` request parameters)
* ``/fdsnws/ext/availability`` (extent and segment information provided in
  different formats)


Workflow
--------

1. Read existing ``Extents`` from the database.
#. Scan the SDS archive for new channel IDs and create new ``Extents``.
#. Subsequently process the ``Extents`` in parallel using the number of
   threads configured by ``threads``. For each ``Extent``:

   1. Find all available daily data files.
   #. Sort the file list by date.
   #. For each data file:

      * remove ``DataSegments`` that no longer exist
      * update or create ``DataSegments`` that have changed or are new
      * a segment is split if

        * the ``jitter`` (difference between the end time of the previous
          record and the start time of the current record) is exceeded
        * the quality or sampling rate changed

      * merge segment information into ``DataAttributeExtents`` (``Extents``
        sharing the same quality and sample rate information)
      * merge segment start and end times into the overall ``Extent``


Examples
--------

1. Get command line help or execute scardac with default parameters and
   informative debug output:

   .. code-block:: sh

      scardac -h
      scardac --debug

#. Update the availability of waveform data files existing in the standard
   :term:`SDS` archive to the SeisComP3 database and create an XML file using
   :ref:`scxmldump`:

   .. code-block:: sh

      scardac -d mysql://sysop:sysop@localhost/seiscomp3 -a $SEISCOMP_ROOT/var/lib/archive --debug
      scxmldump -Yf -d mysql://sysop:sysop@localhost/seiscomp3 -o availability.xml

#. Update the availability of waveform data files existing in the standard
   :term:`SDS` archive to the SeisComP3 database. Use :ref:`fdsnws` to fetch a
   flat file containing a list of periods of available data from stations of
   the CX network sharing the same quality and sampling rate attributes:

   .. code-block:: sh

      scardac -d mysql://sysop:sysop@localhost/seiscomp3 -a $SEISCOMP_ROOT/var/lib/archive
      wget -O availability.txt 'http://localhost:8080/fdsnws/ext/availability/1/query?network=CX'

   .. note::

      The SeisComP3 module :ref:`fdsnws` must be running to execute this
      example.

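Since scardac is intended to be executed periodically, it may be convenient
to schedule it as a cronjob of the user running SeisComP3. The following
crontab entry is only a sketch; the hourly schedule, the installation path
``/home/sysop/seiscomp3`` and the log file location are example values which
need to be adapted to the local setup:

.. code-block:: sh

   # Illustrative crontab entry (paths and schedule are assumptions):
   # update the data availability every hour at minute 30 and append the
   # output to a log file.
   30 * * * * /home/sysop/seiscomp3/bin/seiscomp exec scardac >> /home/sysop/.seiscomp3/log/scardac-cron.log 2>&1
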
.. _scardac_configuration:

Configuration
=============

| :file:`etc/defaults/global.cfg`
| :file:`etc/defaults/scardac.cfg`
| :file:`etc/global.cfg`
| :file:`etc/scardac.cfg`
| :file:`~/.seiscomp3/global.cfg`
| :file:`~/.seiscomp3/scardac.cfg`

scardac inherits :ref:`global options`.

.. confval:: archive

   Type: *string*

   Path to the miniSEED waveform archive where all data is stored. The SDS
   archive structure is defined as
   ``YEAR/NET/STA/CHA/NET.STA.LOC.CHA.YEAR.DATEOFYEAR``, e.g.
   ``2018/GE/APE/BHZ.D/GE.APE..BHZ.D.2018.125``.
   Default is ``@SEISCOMP_ROOT@/var/lib/archive``.

.. confval:: threads

   Type: *int*

   Number of threads scanning the archive in parallel.
   Default is ``1``.

.. confval:: batchSize

   Type: *int*

   Batch size of database transactions used when updating data availability
   segments. Allowed range: [1,1000].
   Default is ``100``.

.. confval:: jitter

   Type: *float*

   Acceptable deviation of the end time and start time of successive records
   in multiples of sample time.
   Default is ``0.5``.

.. confval:: maxSegments

   Type: *int*

   Maximum number of segments per stream. If the limit is reached, no more
   segments are added to the database and the corresponding extent is flagged
   as too fragmented. Use a negative value to disable the limit.
   Default is ``1000000``.


Command-line
============

.. program:: scardac

:program:`scardac [OPTION]...`


Generic
-------

.. option:: -h, --help

   Show the help message.

.. option:: -V, --version

   Show version information.

.. option:: --config-file arg

   Use an alternative configuration file. When this option is used the
   loading of all stages is disabled. Only the given configuration file is
   parsed and used. To use another name for the configuration create a
   symbolic link of the application or copy it, e.g. scautopick ->
   scautopick2.

.. option:: --plugins arg

   Load the given plugins.

.. option:: -D, --daemon

   Run as daemon. This means the application will fork itself and does not
   need to be started with ``&``.


Verbosity
---------

.. option:: --verbosity arg

   Verbosity level [0..4]: 0 quiet, 1 error, 2 warning, 3 info, 4 debug.

.. option:: -v, --v

   Increase the verbosity level (may be repeated, e.g. ``-vv``).

.. option:: -q, --quiet

   Quiet mode: no logging output.

.. option:: --print-component arg

   For each log entry print the component right after the log level. By
   default the component output is enabled for file output but disabled for
   console output.

.. option:: --component arg

   Limit the logging to a certain component. This option can be given more
   than once.

.. option:: -s, --syslog

   Use the syslog logging backend. The output usually goes to
   :file:`/var/log/messages`.

.. option:: -l, --lockfile arg

   Path to the lock file.

.. option:: --console arg

   Send log output to stdout.

.. option:: --debug

   Debug mode: ``--verbosity=4 --console=1``

.. option:: --trace

   Trace mode: ``--verbosity=4 --console=1 --print-component=1 --print-context=1``

.. option:: --log-file arg

   Use an alternative log file.


Collector
---------

.. option:: -a, --archive arg

   Overrides configuration parameter :confval:`archive`.

.. option:: --threads arg

   Overrides configuration parameter :confval:`threads`.

.. option:: -b, --batch-size arg

   Overrides configuration parameter :confval:`batchSize`.

.. option:: -j, --jitter arg

   Overrides configuration parameter :confval:`jitter`.

.. option:: --generate-test-data arg

   Do not scan the archive but generate test data for each stream in the
   inventory. Format: days,gaps,gapslen,overlaps,overlaplen. E.g. the
   following parameter list would generate test data for 100 days (starting
   from now()-100) which includes 150 gaps with a length of 2.5 s followed by
   50 overlaps with an overlap of 5 s:
   ``--generate-test-data=100,150,2.5,50,5``

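The collector options above can be combined to override the configured
defaults for a single run. The following sketch is illustrative only; the
archive path, thread count, jitter and batch size are example values which
need to be adapted to the local archive and database:

.. code-block:: sh

   # Illustrative one-off scan (all values are assumptions): read a
   # non-default SDS archive with 4 parallel threads, allow a jitter of
   # 1 sample time and use a transaction batch size of 500 segments.
   scardac -d mysql://sysop:sysop@localhost/seiscomp3 \
           -a /data/sds-archive \
           --threads 4 \
           -j 1.0 \
           -b 500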