Source code for mspasspy.algorithms.bundle

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
This is a set of thin wrappers for C++ code to do a process we call 
"bundling".   That is, a Seismogram object can be constructed from 
3 TimeSeries objects spanning a common time period and having the 
same sample rate but with orientation pointing in 3 linearly 
independent directions.   There are a lot of complexities to assembling
such data and manipulating the data objects, which is why the 
process was implemented in C++.   The algorithms here are not completely 
generic and will likely need additions at some future data for nonstandard
data.  "standard" in this context means passive array data archived in 
the SEED (Standard Earthquake Exchange Data) format (aka miniseed which is
a standard subset of seed used for archives).  The algorithms here 
depend upon seismic channels being uniquely defined by four metadata keys:
net, sta, chan, and loc.   Bundles are formed by sorting data in the 
following key order:  net, sta, loc, chan.   The algorithms here are 
further limited to applications to ensembles that are the generalization 
of a reflection seismology "shot gather".   That means an implict assumption
is the ensembles contain data assembled from one and only one event so 
there is a one-to-one relationship between each channel and an event tag.
if data from multiple events or something other than a shot gather 
are to be handled used the BundleGroup function station by station 
sorting the inputs by some other method to create triples of channels 
that can be merged into one Seismogram object.

Created on Mon Jan 11 05:34:10 2021

@author: pavlis
"""
from mspasspy.ccore.algorithms.basic import _bundle_seed_data, _BundleSEEDGroup
from mspasspy.ccore.seismic import TimeSeriesEnsemble
from mspasspy.ccore.utility import MsPASSError, ErrorSeverity


[docs]def bundle_seed_data(ensemble): """ This function can be used to take an (unordered) input ensemble of TimeSeries objects generated from miniseed data and produce an output ensemble of Seismograms produced by bundles linked to the seed name codes net, sta, chan, and loc. An implicit assumption of the algorithm used here is that the data are a variant of a shot gather and the input ensemble defines one net:sta:chan:loc:time_interval for each record that is to be bundled. It can only properly handle pure duplicates for a given net:sta:chan:loc combination. (i.e. if the input has the same TimeSeries defined by net:sta:chan:loc AND a common start and end time). Data with gaps broken into multiple net:sta:chan:loc TimeSeries with different start and end times will produce incomplete results. That is, Seismograms in the output associated with such inputs will either be killed with an associated error log entry or in the best case truncated to the overlap range of one of the segments with the gap(s) between. Irregular start times of any set of TimeSeries forming a single bundle are subject to the same truncation or discard rules described in the related function Bundle3C. Note there is not guarantee the Seismogram objects returned will be in standard coordinates. In fact, they will never be with standard channel names because of the internal sorting. It would normally be highly recommended the user call the rotate_to_standard method on each Seismogram before any use. :param ensemble: is the input ensemble of TimeSeries to be processed. :return: ensemble of Seismogram objects made by bundling input data :rtype: SeismogramEnsemble :exception: Can throw a MsPASSError for a number of conditions. Caller should be enclosed in a handler if run on a large data set. """ if not isinstance(ensemble, TimeSeriesEnsemble): raise MsPASSError( "bundle_seed_data: illegal input - must be a TimeSeriesEnsemble", ErrorSeverity.Invalid, ) try: d3c = _bundle_seed_data(ensemble) except Exception as err: raise MsPASSError( "_bundle_seed_data threw an exception - see more messages below", ErrorSeverity.Invalid, ) from err return d3c
[docs]def BundleSEEDGroup(d, i0=0, iend=2): """ Combine a grouped set of TimeSeries into one Seismogram. A Seismogram object is a bundle of TimeSeries objects that define a nonsingular tranformation matrix that can be used to reconstruct vector group motion. That requires three TimeSeries objects that have define directions that are linearly independent. This function does not directly test for linear independence but depends upon channel codes to assemble one or more bundles needed to build a Seismogram. The algorithm used here is simple and ONLY works if the inputs have been sorted so the channels define a group of three unique channel codes. For example, HHE, HHN, HHZ would form a typical seed channel grouping. The function will attempt to handle duplicates. By that I mean if the group has two of the same channel code like these sequences: HHE, HHE, HHN, HHZ or HHE, HHN, HHN, HHZ, HHZ If the duplicates are pure duplicates there is no complication and the result will be clean. If the time spans of the duplicate channels are different the decision of which to use keys on a simple idea that is most appropriate for data assembled by event with mistakes in associations. That is, it attempts to scans the group for the earliest start time. When duplicates are found it uses the one with a start time closest to the minimum as the one merged to make the output Seismogram. The output will be marked as dead data with no valid data in one of two conditions: (1) less than 3 unique channel names or (2) more than three inputs with an inconsistent set of SEED names. That "inconsistent" test is obscure and yet another example that SEED is a four letter word. Commentary aside, the rules are: 1. The net code must be defined and the same in all TimeSeries passed 2. The station (sta) code must also be the same for all inputs 3. Similarly the loc code must be the same in all inputs. 4. Finally, there is a more obscure test on channel names. They must all have the same first two characters. That is, BHE, BHN, BHN, BHZ is ok but BHE, BHN, BHZ, HHE will cause an immediate exit with no attempt to resolve the ambiguity - that is viewed a usage error in defining the range of the bundle. In all cases where the bundling is not possible the function does not throw an exception but does four things: 1. Merges the Metadata of all inputs (uses the += operator so only the last values of duplicate keys will be preserved in the return) 2. If ProcessingHistory is defined in the input they history records are posted to the returned Seismogram using as if the data were live but the number of input will always be a number different from 3. 3. The return is marked dead. 4. The function posts a (hopefully) informative message to elog of the returned Seismogram. ProcessingHistory is handled internally by this function. If all the components in a group have a nonempty ProcessingHistory the data to link the outputs to the inputs will be posted to ProcessingHistory. :param d: This is assumed to be an array like object of TimeSeries data that are to be used to build the Seismogram objects. They must be sorted as described above or the algorithm will fail. Two typical array like objects to use are the member attribute of a TimeSeriesEnsemble or a python array constructed from a (sorted) collection of TimeSeries objects. :param i0: starting array position for constructing output(s). The default is 0 which would be the normal request for an full ensemble or a single grouping assembled by some other mechanism. A nonzero is useful to work through a larger container one Seismogram at a time. :param iend: end array position. The function will attempt to assemble one or more Seismograms from TimeSeries in the range d[i0] to d[iend]. Default is 2 for a single Seismogram without duplicates. """ d3c = _BundleSEEDGroup(d.member, i0, iend) return d3c