mspasspy.algorithms
MCXcorStacking
Module implementing a similar algorithm to dbxcor in python.
Created on Tue Jul 9 05:37:40 2024
@author: pavlis
- mspasspy.algorithms.MCXcorStacking.MCXcorPrepP(ensemble, noise_window, noise_metric='mad', initial_beam_metric='bandwidth', snr_doc_key='Parrival', low_f_corner=None, high_f_corner=None, npoles=4, filter_parameter_keys=['MCXcor_f_low', 'MCXcor_f_high', 'MCXcor_npoles'], coda_level_factor=4.0, set_phases=True, model=None, Ptime_key='Ptime', pPtime_key='pPtime', PPtime_key='PPtime', station_collection='channel', search_window_fraction=0.9, minimum_coda_duration=5.0, correlation_window_start=-3.0) TimeSeriesEnsemble [source]
Function used to preprocess an ensemble to prepare input for running multichannel cross-correlation and stacking method (automated dbxcor algorithm) of P phase data. This function should not be used for anything but teleseismic P data.
The multichannel correlation and stacking function in this module called align_and_stack requires one to define a number of parameters that in the original dbxcor implementation were input through a graphical user interface. This function can be thought of a robot that will attempt to set all the required parameters automatically using signal processing. It will attempt to produce two inputs required by the align_and_stack: 1. What I call a “correlation window”. 2. The ensemble member that is to be used as the initial estimate of the
beam (stack of the data aligned by cross correlation).
A CRITICAL assumption of this function is that the input ensemble’s data have been processed with the snr module function broadband_snr_QC. That function computes and posts multiple snr metrics to a subdocument (dictionary) accessed with a single key. This function requires that data exist and be accessible with the key defined by the argument “snr_doc_key”. The data in that dictionary are used in the following algorithms run within this function: 1. The working bandwidth of the data is established by computing the median
of the attributes “low_f_band_edge” and “high_f_band_edge” extracted from each live member. Those frequencies are computed using the function in this module called estimate_ensemble_bandwidth. The working ensemble (which is what is returned on success) will be filtered in the band defined by the output of estimate_ensemble_bandwidth using a Butterworth, bandpass filter with the number of poles defined by the “npoles” argument.
What I call the correlation window is estimated by a complex recipe best understood from the user manual page and example jupyter notebooks related to this module. Briefly, the correlation window is defined by applying an envelope function to each signal and defining the coda end using a common recipe for coda magnitudes where the end of the coda is defined by where the coda level (envelope) first falls below a level specified as a multiple of measured background noise. This function REQUIRES the input for each member to have a section that it can treat as background noise. For most uses that means some section of data immediately before the P wave arrival time. The noise level is estimated by the metric defined by the “noise_metric” argument and the coda level cutoff is computed as noise_level*coda_level_factor where coda_level_factor is the valued passed via the function argument with that key. Because this function is designed strictly for P phases it has to handle the complexity of interference by secondary P phases. For that reason it compute arrival times for pP and PP and will always start the coda search before time of the smaller of pP or PP. Note, however, that pP is not used a constraint if the source depth is less than 100 km. The justification for that number can be found in jupyter notebooks that should be part of the MsPASS documentation.
It then extracts one member from the ensemble the algorithm judges to be the best choice for an initial beam estimate. That is, in align_and_stack the first step is to align the data with a reference signal using cross-correlation of the “beam” with each member. The aligned data are then summed with a robust stack to refine the beam. That is repeated until the stack does not change significantly. That algorithm requires a seed for the initial beam estimate. That initial signal is extracted as the ensemble member with the largest value of the snr metric defined with the initial_beam_metric argument. The correlation window parameters computed earlier are then posted to the Metadata of that TimeSeries where align_and_stack can parse them to set the correlation time window.
The function returns a copy of the input ensemble filtered to the average bandwidth and the initial beam estimate. They are return as a tuple with 0 the ensemble and 1 the initial beam estimate. Callers should verify the beam signal is marked live. A dead beam signal indicates the algorithm failed in one way or another. Errors at the level will result in errors being posted to the ErrorLogger container of the beam output. Note also this algorithm can kill some ensemble members in the output that were marked live in the input.
- Parameters:
ensemble – input ensemble of data to use. As noted above it
must have been processed with broadband_snr_QC or this function will return a null result. The function makes a tacit assumption that all the members of this ensemble are in relative time with 0 of each member being an estimate of the P wave arrival time. It does no tests to validate this assumption so the assumption is wrong you will, at best, get junk as output. :type ensemble: TimeSeriesEnsemble Note for most uses this is the output of ExtractComponent of a SeismogramEnsemble processed with broadband_snr_QC that is the longitudinal component for the P phase. :param noise_window: time window defining the section of each ensemble member that is to be treated as “noise” for estimating a relative noise level. :type noise_window: TimeWindow tacitly expected to be time relative to a measure of the P wave arrival time. :param noise_metric: vector norm metric to use for measuring the amplitude of the noise extracted from the data in the range defined by noise_window. :type noise_metric: string (Must be one of: “mad” (default), “rms”, or “peak”). Any other string will cause the function to throw a ValueError exception. :param initial_beam_metric: key of attribute to extract from the output of broadband_snr_QC used to select initial beam signal. This argument is passed to the extract_initial_beam_estimate function for as the “metric” argument of that function. :param initial_beam_metric: string (default “bandwidth”). For a list of allowed values see the docstring of extract_initial_beam_estimate. :param snr_doc_key: key to use to fetch the subdocument containing the output of broadband_snr_QC. :type snr_doc_key: string (default “Parrival” which is the default of broadband_snr_QC). :param low_f_corner: force the low frequency corner for the output data bandwidth to this value. When defined, the internal call to estimate_ensemble_bandwidth is bypassed and the output data are filtered between the value defined by low_f_corner and high_f_corner. :type low_f_corner: float (default None which is interpreted to mean scan ensemble to estimate the frequency range) :param high_f_corner: force the high frequency corner for the output data bandwidth to this value. When defined, the internal call to estimate_ensemble_bandwidth is bypassed and the output data are filtered between the value defined by low_f_corner and high_f_corner. :type high_f_corner: float (default None which is interpreted to mean scan ensemble to estimate the frequency range) :param npoles: number of poles to use for the Butterworth bandpass filter. :type npoles: integer (default 4) :param set_phases: boolean that if set True (default) the P phase time s P, pP (if defined), and PP (if defined) are computed and posted to the output of each ensemble member. If False the algorith assumes the same quantities were previously calculated and can be fetched from the member TimeSeries Metadata container using keys defined by Ptime_key, pPtimme_key, and PPtime_key. :param Ptime_key: :param pPtime_key: :param PPtime_key: These three arguments define alternative keys that will be used to fetch (if set_phases is false) or post (if set_phases is True) computed P, pP, and PP times to each member’s Metadata container. Changing any of these values not really advised when set_phases is True. These are most useful if the phase arrival times were previously computed or measured and posted with different keys. :param station_collection: MonogDB collection name used to fetch receiver coordinate data. Normal practice in MsPASS is to save receiver Metadata in two channels called “channel” and “site” and to load the coordinate data through normalization when the data are loaded. The MsPASS convention defines data loaded by normalization with a leading collection name. e.g. the “lat” value extracted from a “channel” document would be posted to the data as “channel_lat”. The same value, however, loaded from “site” would be tagged “site_lat”. The default for this argument is “channel” which means the algorithm will require the attributes “channel_lat” and “channel_lon” as the receiver coordinates (in degrees). The standard alternative is to define station_collection=”site”, in which case the function will use “site_lat” and “site_lon”. :type station_collection: string (default “channel”) :param search_window_fraction: The window for defining the correlation window is determined by running the internal _coda_duration function on each ensemble member and computing the range from the median of the ranges computed from all the ensemble members. Each coda search, however, is constrained by the time so f pP and/or PP. As noted above the function uses the pP time for events with depths greater than 100 km but PP for shallow sources. To allow for hypocenter errors the duration defined by P to pP or P to PP is multiplied by this factor to define the search start for the coda estimation. :type search_window_fraction: float (default 0.9) :param minimum_coda_duration: if the estimate of coda duration computed internally is less than this value the correlation window is set to this value - relative time from P. :type minimum_coda_duration: float (5.0 seconds) :param correlation_window_start: the time of the correlation window set in the output “beam” is fixed as this value. It is normally a negative number defining a time before P that no signal is likely to have an arrival before this relative time. :type correlation_window_start: float (default -3.0)
- mspasspy.algorithms.MCXcorStacking.align_and_stack(ensemble, beam, correlation_window=None, correlation_window_keys=['correlation_window_start', 'correlation_window_end'], window_beam=True, robust_stack_window=None, robust_stack_window_keys=['robust_window_start', 'robust_window_end'], robust_stack_method='dbxcor', use_median_initial_stack=True, output_stack_window=None, robust_weight_key='robust_stack_weight', time_shift_key='arrival_time_correction', time_shift_limit=2.0, abort_irregular_sampling=False, convergence=0.01, residual_norm_floor=0.1, demean_residuals=True) tuple [source]
This function uses an initial estimate of the array stack passed as the beam argument as a seed to a robust algorithm that will align all the data in the input ensemble by cross-correlation with the beam, apply a robust stack to the aligned signals, update the beam with the robust stack, and repeat until the changes to the beam signal are small. Returns a copy of the ensemble time shifted to be aligned with beam time base and an updated beam estimate crated by the robust stack. The shifts and the weights of each input signal are stored in the Metadata of each live ensemble member returned with keys defined by robust_weight_key and time_shift_key.
This function is a python implementation of the same basic algorithm used in the dbxcor program described by Pavlis and Vernon(2010) Array Processing of teleseismic body waves with the USArray, Computers and Geosciences,15, 910-920. It has additional options made possible by the python interface and integration into MsPASS. In particular, the original algorithm was designed to work as part of a GUI where the user had to pick a set of required parameters for the algorithm. In this function those are supplied through Metadata key-value pairs and/or arguments. This function uses a pythonic approach aimed to allow this function to be run in batch without user intervention. The original dbxcor algorithm required four interactive picks to set the input. The way we set them for this automated algorithm is described in the following four itemize paragraphs:
The “correlation window”, which is the waveform segment used to compute cross-correlations between the beam and all ensmebled members, is set one of three ways. The default uses the time window defined by the starttime and endtime of the beam signal as the cross-correlation window. Alternative, this window can be specified either by fetching values from the Metadata container of beam or via the correlation_window argument. The algorithm first tests if correlation_window is set and is an instance of a TimeWindow object. If the type of the argument is not a TimeWindow object an error is logged and the program reverts to using the span of the beam signal as the correlation window. If correlation_window is a None (default) the algorithm then checks for a valid input via the correlation_window_keys argument. If defined that argument is assumed to contain a pair of strings that can be used as keys to fetch start (component 0) and end times (component 1) from the Metadata container of the TimeSeries objct passed via beam. For example, ```
correlation_window_keys = [‘correlation_start’,’correlation_end’]
``` would cause the function to attempt to fetch the start time with “correlation_start” and end time with “correlation_end”. In the default both correlation_window and correlation_window_keys are None which cause the function to silently use the window defined as [beam.t0, beam.endtime()] as the correlation winow. If the optional boolean window_beam argument is set True the function will attempt to window the beam using a range input via either of the optional methods of setting the correlation window. An error is logged and nothing happens if window_beam is set True and the default use of the beam window is being used.
The “robust window” is a concept used in dbxcor to implement a special robust stacking algorithm that is a novel feature of the dbxcor algorithm. It uses a very aggresssive weighting scheme to downweight signals that do not match the beam. The Pavlis and Vernon paper shows examples of how this algorithm can cleanly handle ensembles with a mix of high signal-to-noise data with pure junk and produce a clean stack that is defined. Note recent experience has shown that with large, consistent ensembles the dbxcor robust estimate tends to converge to the focus on the signal closestbeam_correlation to the median stack. The reason is that the median stack is always used as the initial estimator. Hence, it can be thought of as a median stack that uses the full data set more completely.
dbxcor required the user to pick a seed signal to use as the initial beam estimate. That approach is NOT used here but idea is to have some estimate passed to the algorithm via the beam (arg1) argument. In MsPASS the working model is to apply the broadband_snr_QC function to the data before running this function and select the initial seed (beam) from one or more of the computed snr metrics. In addition, with this approach I envision a two-stage computation where the some initial seed is used for a first pass. The return is then used to revise the correlation window by examining stack coherence metrics and then rerunning the algorithm. The point is it is a research problem for different types of data to know how to best handle the align and stack problem.
dbxcor had a final stage that required picking the arrival time to use as the reference from the computed beam trace. That is actually necessary if absolute times are needed because the method used will be biased by the time shift of the beam relative to the reference time. See the Pavlis and Vernon paper for more on this topic. The idea here is that if absolute times are needed some secondary processing will be used to manually or automatically pick an arrival time from the beam output.
This function does some consistency checking of arguments to handle the different modes for handling the correlation and robust windows noted above. It also applies a series of validation tests on the input ensemble before attempting to run. Any of the following will cause the return to be a dead ensemble with an explanation in the elog container of the ensemble (in these situation the stack is an empty TimeSeries container):
Irregular sample intervals of live data.
Any live data with the time reference set to UTC
Inconsistency of the time range of the data and the time windows parsed for the correlation and robust windows. That is, it checks the time span of all member functions and if the time range of all members (min of start time and maximum end times). is not outside (inclusive) the range of the correlation and robust windows it is viewed as invalid.
What we called the “robust window” in the dbxcor paper isextract_input_beam_estimate required to be inside (inclusive of endpoints) the cross-correlation window time window. That could be relaxed but is a useful constraint because in my (glp) experience the most coherent part of phase arrivals is the first few cycles of the phase that is also the part cross-correlation needs to contain if it is to be stable. The “robust window” should be set to focus on the section of the signal that will have the most coherent stack.
There is a further complexity in the iteration sequence used by this algorithm for any robust stack method. That is, time shifts computed by cross-correlation can potentially move the correlation window outside the bounds of the data received as input. To reduce the impact of that potential problem the function has an optional argument called time_shift_limit that is validated against other inputs. In particular, the function computes the average start and end time (keep in mind the assumption is the time base is time relative to the arrival time a particular phase) of the input ensemble. If the difference between the average start time and the correlation window start time is less than time_shift_limit the input is viewed as problematic. How that is handled depends on how the correlation window is set. If it is received as constant (correlation_window argument) an exception will be thrown to abort the entire job. It that window is extracted from the beam TimeSeries Metadata container a complaint is logged to the outputs. An endtime inconsistency is treated the same way. i.e. it is treated as a problem if the average ensemble endtime - the correlation window endtime is less than the time_shift_limit.
A related issue is that arrival times estimated by this algorithm will be biased by the model mismatch with whatever signal was used as the initial beam estimae. In dbxcor that was handled by forcing the user to manually pick the first arrival of the computed stack. That could be done if desired but would require you to devise a scheme to do that picking. The default here is handled by the boolean parameter demean_residuals. When True (the default) the vector of computed time shifts is corrected by the mean value of the group. Note that is common practice in regional tomography inversion anyway. For reasonable sized ensembles it will tend to yield data that when aligned by arrival time are all close to 0 relative time.
Note the output stack normally spans a different time range than either the correlation or robust windows. That property is defined by the output_stack_window argument. See below for details.
- Parameters:
ensemble (TimeSeriesEnsemble with some fairly rigid requirements. (see above)) – ensemble of data to be aligned and stacked. This function requires all data to be on a relative time base. It will throw a MsPASSError exception if any live datum has a UTC time base. The assumption is all data have a time span that have the correlation and robust windows inside the data time range. Both windows are carved from the inputs using the WindowData function which will kill any members that do not satisfy this requirement.
beam (TimeSeries. Must have a length consistent with window parameters.) – Estimate of stack (may be just one representative member) used as the seed for initial alignment and stacking.
correlation_window (TimeWindow to define explicitly. If None (default) uses the recipe driven by correlation_window_keys (see above)) – Used to specify the time window for computing cross-correlations with the beam signal. Closely linked to correlation_window_keys as described above.
correlation_window_keys – optional pair of Metadata keys used to extract cross-correlation window attributes from beam Metadata container. If defined component 0 is taken as the key for the start time of the window and component 1 the key for the end time.
window_beam – if True (default) the parsed cross-correlation window attributes are applied to the beam signal as well as the data before starting processing. If False the beam signal is used directly in all cross-correlations. Set False only if you can be sure secondary phases are not present in the unwindowed input.
robust_stack_window (If defined must be a TimeWindow object. If a None type (default) use the logic defined above to set this time window.) – Provide an explicity TimeWindow used for extracting the robust window for this algorithm. Interacts with the robust_stack_window_keys argument as described above.
robust_stack_window_keys (iterable list of two strings) – specifies a pair of strings to be used as keys to extract the strart time (component 0) and end time (component 1) of the robust time window to use from the beam `TimeSeries.
output_stack_window (TimeWindow object. If None (default) the range is derived from the ensemble member time ranges.) – optional TimeWindow to apply to the computed robust stack output. Default returns a stack spanning the inner range of all live members of the ensemble.
robust_weight_key (string) – The robust weight used for each member to compute the robust stack output is posted to the Metadata container of each live member with this key.
robust_stack_method (string - must be one of options listed above.) – keyword defining the method textract_input_beam_estimateo use for computing the robust stack. Currently accepted value are: “dbxcor” (default) and “median”.
use_median_initial_stack – If True (default) use the median stack as the initial estimate of the stack in the first iterative pass. When False the signal in the TimeSeries passed via the “beam” argument is used for the initial estimate.
time_shift_key (string (default "arrival_time_correction")) – the time shift applied relative to the starting point is posted to each live member with this key. It is IMPORTANT to realize this is the time for this pass. If thisextract_input_beam_estimatefunctions is applied more than once and you reuse this key the shift from the previous run will be overwritten. If you need to accumulate shifts it needs to be handled outside this function.
convergence (real number (default 0.001)) – fractional change in robust stack estimates in iterative loop to define convergence. This should not be changed unless you deeply understand the algorithm.
time_shift_limit (float) – when time shifting data with the cross correlation algorithm any estimated time shift larger than this value will be truncated to this value with the sign of th shift preserved.
abort_irregular_sampling (boolean) – boolean that controls error handling of data with irregular sample rates. This function uses a generous test for sample rate mismatch. A mismatch is detected only if the computed time skew over the time span of the input beam signal is more than 1/2 of the beam sample interval (beam.dt). When set true the function will abort with a ValueError exception if any ensemble member fails the sample interval test. If False (the default) offending ensemble members are killed and a message is posted. Note the actual ensemble is modified so the return may have fewer data live than the input when this mode is enabled.
residual_norm_floor (float (default 0.01)) – floor on residuals used to compute dbxcor weight function. See docstring for dbxcor_weights for details.
demean_residuals – boolean controlling if the computed shifts are corrected with a demean operation. Default is True which means the set of all time shifts computed by this function will have zero mean.
- Returns:
tuple with 0 containing the original ensemble but time shifted by cross-correlation. Failed/discarded signals for the stack are not killed but should be detected by not having the time shift Metadata value set. component 1 is the computed stack windowed to the range defined by stack_time_window.
- mspasspy.algorithms.MCXcorStacking.amplitude_relative_to_beam(d, beam, normalize_beam=True, window=None)[source]
Compute and return amplitude relative to the stack (beam).
dbxcor computed a useful metric of amplitude relative to the beam (stack) as (d dot beam)/N where “dot” means vector dot product with between the sample vectbeam_coherence(d,beam,window=None,filter=True)ors of b and beam and N is vector size. The formula requires the beam to be normalized so its L2 norm is 1. By default it assumes that but that can be overriden with the normalize_beam argument. Default assumes d and beam cover the same time span. If they aren’t the minimum overlap of the two signals is used.
- Parameters:
d (TimeSeries assumed - will throw an exception if it isn’t) – datum for which the relative amplitude is to be computed.
beam (TimeSeries assumed - will throw an exception if it isn’t) – stack with which it is to be compared.
normalize_beam (boolean) – If True (default) the windowed beam vector will be normalized to be a unit vector (i.e. L2 norm of 1.0) for the calculation. If False that will not be done and the vector in beam will be used directly. Use False ONLY if windowing is off and beam is already normalized. A minor efficiency gain is possible if beam is already normalized.
- Returns:
float value of amplitude. A negative number indicates an error.
- mspasspy.algorithms.MCXcorStacking.beam_align(ensemble, beam, window=None, time_shift_limit=10.0)[source]
Aligns ensemble members using signal defined by beam (arg1) argument.
Computes cross correlation between each ensemble member and the beam. An optional window can be specified that is applied to each ensemble member before computing the cross correlation function. All live ensemble members are shifted to align with time base of the beam. Note that can be a huge shift if the beam is relative and the ensemble members are absolute time. It should work in that context but the original context was aligning common-source gathers for teleseismic phase alignment where the expectation is all the ensemble members and the beam are in relative time with 0 defined by some estimate of the phase arrival time. Will correctly handle irregular window sizes between ensemble members and beam signal.
It is important to recognize that if the window option is used it is applied only internally. In that situation the output will be time shifted but the number of samples of each member will be the same.
- Parameters:
ensemble (assumed to be a TimeSeriesEnsemble) – ensemble of data to be correlated with beam data.
beam (assumed to be a TimeSeries object) – common signal to correlate with ensemble members.
window (
mspasspy.ccore.algorithms.basic.TimeWindow
) – optional window to apply to ensemble members before computing cross correlation.time_shift_limit (float (may abort if you use an int because the value can to sent to a C++ method that it type sensitive)) – ceiling on allowed time shift for ensemble members. Any computed shift with absolute value larger than this value will be reset to this value with the sign of the shift preserved. (i.e. a negative lag will be set to the negative of this number). The default is 10.0 which is large for most data more or less making this an optional parameter.
- Returns:
copy of ensemble with the members time shifted to align with the time base of beam. Note if a window is defined it is not applied to the ensemble members.
- mspasspy.algorithms.MCXcorStacking.beam_coherence(d, beam, window=None) float [source]
Compute time-domain coherence of a datum relative to the stack.
Time domain coherence is a measure of misfit between a signal and a a reference signal (normally a stack). The formula is 1.0 - norm(residual)/norm(beam) where residual=d-beam.
This function has an optional window parameter that computes the coherence with a specified time window. By default the entire d and beam signals are used. The default is done cautiously by using windowing to the mininum overlap of the two signals (if they differ)
- mspasspy.algorithms.MCXcorStacking.beam_correlation(d, beam, window=None, aligned=True) float [source]
Computes normalized peak cross-correlation value a datum with an array stack.
Cross-correlation is a heavily used concept in seismology. This function is a specialized version designed to compute a peak cross correlation value between a datum and an array stack. The normal use is to call this function in a loop and post the results to each live member of the ensemble used to compute the beam. Note it is assumed beam and d are filtered in the same passband. Windowing is bombproof
- mspasspy.algorithms.MCXcorStacking.dbxcor_weights(ensemble, stack, residual_norm_floor=0.1)[source]
Computes the robust weights used originally in dbxcor for each member of ensemble. Result is returned in a parallel numpy array (i.e. return[i] is computed weight for ensemble.member[i])
This function is made for speed and has no safeties. It assumes all the members of ensemble are the same and it is the same length as stack. It will throw an exception if that is violated so callers should guarantee that does not happen.
This function adds a feature not found in the original Pavlis and Vernon(2011) paper via the residual_norm_floor argument. Experience with this algorithm showed it tends to converge to a state with very high weights on one or two signals and low weights on the rest. The high weights are given to the one or two signals most closely matching the stack at convergence. This has the undesirable effect of causing a strong initial value dependence and suboptimal noise reduction. This argument handles that by not allowing the L2 norm of the residual to fall below a floor computed from the ratio ||r||/||d||. i.e. if ||r|| < residual_norm_floor*||d|| it is set to the value passed as residual_norm_floor. Note that setting this to 1.0 effectively turns off the residual norm term in the weight equation which I now recognize is the cross-correlation of the beam and datum at zero lag. That is true, however, only if the ensemble members are all time aligned perfectly.
Returns a numpy vector of weights. Any dead data will have a weight of -1.0 (test for negative is sufficient). In addition the function has a safety to handle receiving a vector of all zeros. If the function detects all 0s for a datum marked live it will silently return a weight of 0 for that datum.
- Parameters:
ensemble – TimeSeriesEnsemble of data from which weights are to
be computed. :type ensemble: Assumed to be a TimeSeriesEnsemble. No type checking is done so if input is wrong an exception will occur but what is thown will depend on what ensemble actually is. :param stack: TimeSeries containing stack to be used to compute weights. The method returns robust weights relative to the vector of data in this object. :type stack: TimeSeries. :param residual_norm_floor: nondimensional floor in the ratio norm2(r)/norm2(d) used as described above. Default is 0.1 which is reasonable for high snr signals. Data with irregular quality can profit from smaller values of this parameter. :type residual_norm_floor: float :return: numpy vector of weights parallel with ensemble.member. Dead members will have a negative weight in this vector.
- mspasspy.algorithms.MCXcorStacking.demean_residuals(ensemble, measured_time_key='Ptime_xcor', model_time_key='Ptime', residual_time_key='Presidual', corrected_measured_time_key='Pmeasured', center_method='median', center_estimate_key='Presidual_bias', kill_on_failure=False)[source]
Residuals measured by any method are always subject to a bias from Earth structure and earthquake location errors. With the MsPASS multichannel xcor algorithm there is an additional bias inherited from the choice of the initial beam estimate that can produce an even larger bias. This algorithm id designed to process an ensemble to remove and estimate of center from the residuals stored in the Metadata containers of the ensemble members.
The arguments can be used to change the keys used to access attributes needed to compute the residual and store the result. The formula is trivial an with kwarg symbols is:
residual_time_key = measured_time_key - model_time_key - bias
where bias is the estimate of center computed from the vector of (uncorrected) residual extracted from the ensemble members. Each member of the ensemble where the measured and model times are defined will have a value set for the residual_time_key in the output. The actual estimate of bias will be posted to the output in the ensemble’s Mewtadata container with the key defined by the argument “center_estimate_key”.
Note the corrected (by estimate of center that is) measured phase arrival time will be stored in each member with the key defined by the “corrected_measured_time_key” argument. By default that is a different key than measured_time_key but some many want to make measured_time_key==corrected_measured_time_key to reduce complexity. That is allowed but is a choice not a requirement. Again, default will make a new entry for the corrected value.
- Parameters:
ensemble (
mspasspy.ccore.seismic.TimeSeriesEnsemble
ormspasspy.ccore.seismic.SeismogramEnsemble
. The function will throw a ValueError exception if this require arg is any other type.) – ensemble object to be processed. Assumes the members have defined values for the keys defined by the “measured_time_key” and “model_time_key”.measured_time_key (string (default "Ptime_xcor")) – key to use to fetch the measured arrival time of interest from member Metadata containers.
model_time_key (string (default "Ptime")) – key to use to fetch the arrival time computed from an earth model and location estimate. Each member is assumed to have this value defined.
residual_time_key (string (default "Presidual")) – key use to store the output demeaned residual. Be warned that this will overwrite a previously stored value if the key was previously defined for something else. That can, however, be a useful feature if the same data are reprocessed.
corrected_mesured_time_key – key to use to save the bias corrected measured time estimate. This value is just the input measured time value minus the computed bias.
center_method (string (default "median")) – method of center method to use to compute bias correction from vector of residuals. Valid values are: “median” to use the median and “mean” (“average” is also accepted an is treated as “mean”) to use the average/mean value operator.
center_estimate_key (string (default "Presidual_bias")) – key to use to post the estimated center of the vector of residuals. Note that value is posted to the ensemble’s Metadata container, not the members. Be warned, however, that currentlyh when ensemble data are saved this value will be posted to all members before saving. Be sure this name does not overwrite some other desired member Metdata when that happens.
kill_on_failure – boolean controlling what the function does to the output ensemble if the algorithm totally fails. “totally fails” in this case means it the number of residuals it could compute was less than or equal 1. If set True the output will be killed. When False it will be returned with no values set for the keys defined by “residual_time_key” and “center_estimate_key”. Default is False.
- Returns:
edited copy of input ensemble altering only Metadata containers
- mspasspy.algorithms.MCXcorStacking.estimate_ensemble_bandwidth(ensemble, snr_doc_key='Parrival')[source]
Estimate average bandwidth of an ensemble of data using the output of broadband_snr_QC.
The original dbxcor program used a GUI that allowed the user to select one of a set of predefined filters to be used to set the frequency band of the signal to be processed by the equivalent of the align_and_stack function of this module. The purpose of this function is to automate that process to defined an optimal bandwidth for processing of the input ensemble.
This function only works on ensembles processed previously with the mspass function broadband_snr_QC. That function computes a series of metrics for signal-to-noise ratio. The only one this one uses is the two values defined by “low_f_band_edge” and “high_f_band_edge”. The verbose names should make their definition obvious. This function returns the median of the values of all values of those two attributes extracted from all live members of the input ensemble. The result is returned as a tuple with the low frequency edge as 0 and the high frequency edge as 1. The expectation is that the data will be bandpass filtered between the low and high edges before running the align_and_stack function.
- Parameters:
ensemble (TimeSeriesEnsemble) – ensemble of data to be scanned
srn_doc_key – subdocument key of attributes computed by
broadband_snr_QC to fetch as estimates of low and high frequency edges. :type snr_doc_key: string (default “Parrival” = default of broadband_snr_QC)
- mspasspy.algorithms.MCXcorStacking.extract_initial_beam_estimate(ensemble, metric='bandwidth', subdoc_key='Parrival') TimeSeries [source]
The robust stacking method used in the align_and_stack function in this module requires an initial signal estimate for first-order alignment of signals in the input ensemble. In the original dbxcor implementation of that algorithm the user was required to select that initial signal interactively in a graphical user interface. This function provides one possible algorithm to accomplish that task automatically. It does so by scanning the ensemble to extract a member with the largest value of some quality control metric. This implementation is limited to metrics computed by the MsPASS function called broadband_snr_QC. The algorithm, however, would be easy to modify to produce a custom operator using some other metric posted to the Metadata container of each ensemble member. An oddity of the broadband_snr_QC is that it posts its output to a python dictionary (subdocument in pymongo jargon) fetched with a single key. That key can be changed from the default “Parrival” (default of broadband_snr_QC) to something else via the “subdoc_key” argument. This list of keys inside that subdocument that can be used to set what metric will be used are: bandwidth, filtered_envelope, filtered_L2, filtered_Linf, or filtered_perc. See the docstring for broadband_snr_QC for an explanation of each metric.
- Parameters:
ensemble (TimeSeriesEnsemble) – ensemble to be scanned
metric – quality metric to use for selecting member to use
as return. Always scans for the maximum of specified value as it assume the value is some norm measure. Accepted values at present are all those computed by broadband_snr_QC: bandwidth, filtered_envelope, filtered_L2, filtered_Linf, or filtered_perc. Default is “bandwidth”. A ValueError exception will be thrown if not one of those values. :type metric: string :param subdoc_key: broadband_snr_QC normally posts output to a subdocument (dictionary). This is the key that is used to extract that subdocument from each member. Default is “Parrival” which is the default of broadband_snr_QC. :type subdoc_key: string :return: TimeSeries that is a copy of the ensemble member with the largest value of the requested member. A default constructed, dead datum is returned if the algorithm failed.
- mspasspy.algorithms.MCXcorStacking.phase_time(d, phase_time_key='Ptime', time_shift_key='arrival_time_correction') float [source]
Small generic function to return a UTC arrival time combining an initial arrival time estimate (normally a model based time) defined by the “phase_time_key” metadata value extracted from d and the time shift computed by align_and_stack (or any other algorithm that computes relative time shifts) and set with the Metadata key defined by the “time_shift_key” argment. The defaults work for P phase times computed by MCXcorPrepP and shifts computed by align_and_stack. The computation here is trivial (just a difference) but the fluff is all the safeties in handling missing values. Returns -1.0 if any of the requried keys are missing. Returns -2.0 if d is not defined as UTC. That is basically a reminder this function only makes sense for data with a UTC time standard.
- mspasspy.algorithms.MCXcorStacking.post_MCXcor_metrics(d, beam, metrics={'amplitude': 'beam_relative_amplitude', 'arrival_time': 'Ptime_xcor', 'coherence': 'beam_coherence', 'cross_correlation': 'beam_correlation'}, window=None, phase_time_key='Ptime', time_shift_key='arrival_time_correction') TimeSeries [source]
Computes and posts a set of standard QC metrics for result of multichannel cross-correlation algorithm.
This function should be thought of as a post-processing function to standardize a set of useful calculations from the output of the multichannel cross-correlation algorithm. The default is set up for post processing teleseismic P wave data but with changes to the arguments it should be possible to use it for any teleseismic body wave phase.
What is computed is controlled by the input parameter “metrics”. “metrics” is expected to be a dictionary with the keys defining the concept of what is to be computed and posted and a “value” being the actual key to use for posting the computed value. Defaults for “metrics” are as follows the dictionary key in quotes at the start of each paragraph:
“arrival_time” - compute arrival time from posted initial time estimate and correlation time shift computed by align_and_stack. Uses the “phase_time_key” and “time_shift_key” to fetch required values. Default works for Pwave data processed with align_and_stack. Changes needed for other phases.
“cross_correlation” - compute cross correlation with respect to beam
“coherence” - compute time domain coherence with respect to beam
“amplitude” - compute amplitude of d relative to the beam.
- Parameters:
d (TimeSeries) – datum to be processed
beam (TimeSeries) – array stack (beam) output of align_and_stack
metrics (python dictionary with one or more of the keys defined above.) – defines what metrics should be computed ans posted (see above for options)
window (TimeWindow or None (default)) – optional time window to use for computing specified metric(s). Windowing is normally applied for cross-correlation, coherence, and amplitude calculations (it is ignored for the time computation). If it is not defined behavior depends on the algorithm as the object is passed directly to functions used to compute correlation, coherence, and amplitude metrics.
phase_time_key (string (default “Ptime” which is default expectation used in align_and_stack)) – key used to fetch the initial phase time used for initial alignment for align_and_stack input. The assumption is the content is defined in d and when fetch is an epoch time defining the initial time shift for the given phase. Note it could be either a measured or model-based arrival time.
time_shift_key (string (default “arrival_time_correction” is the key used to post the cross-correlation shifts in align_and_stack)) – key used to store the relative time shifts computed by align_and_stack.
- Returns:
copy of d TimeSeries with the requested metrics posted to the Metadata container of the output.
- mspasspy.algorithms.MCXcorStacking.regularize_ensemble(ensemble, starttime, endtime, pad_fraction_cutoff) TimeSeriesEnsemble [source]
Secondary function to regularize an ensemble for input to robust stacking algorithm. ASsumes all data have the same sample rate. Uses WindowData to assumre all data are inside the common range startime:endtime. Silently drops dead data.
- mspasspy.algorithms.MCXcorStacking.remove_incident_wavefield(d, beam)[source]
Remove incident wavefield for teleseismic P wave data using a beam estimate.
In imaging of S to P conversion with teleseimic P wave data a critical step is removing the incident wavefield. The theoretical reason is described in multiple publications on S to P imaging theory. This function implements a novel method using the output of align_and_stack to remove the incident wavefield. The approach make sense ONLY IF d is a member of the ensemble used to compute beam and is the longitudinal component estimate for a P phase. The best choice for that is the free surface transformation operator but in could be used for LQT or even the very crude RTZ transformation.
The actual operation is very simple: the overlaping section of beam and d is determined. The algorthm then computes a scaling factor for beam as the dot product of beam and d in the common time range. That number is used to scale the beam which is then subtracted (in the overlapping time range) from d. There are several key assumptions this algorithm makes about the input that should be kept in mind before using it:
d is implicitly expected to span a longer or a least equal span to the beam time range.
beam should always be tapered to prevent an offset at the front and end when it is subtracted from d.
beam should be normalized so it L2 norm is 1.0. Note that should be the taper beam not the beam cut with WindowData. That is not computed in this function for efficiency as normal use would apply the same beam estimate to all live ensemble members.
Because this is assumed to be used at deep in a processing chain it has no safties. d and beam and assumped to be TimeSeries objects. The only safety is that if d or beam are marked dead it does nothing but return d.
- Parameters:
d (TimeSeries) – datum from which beam is to be subtracted (Assumed to have a time span containing the time span of beam)
beam – windowed and tapered output stack from align_and_stack. Assumed to have L2 norm of 1.0.
- mspasspy.algorithms.MCXcorStacking.robust_stack(ensemble, method='dbxcor', stack0=None, stack_md=None, timespan_method='ensemble_inner', pad_fraction_cutoff=0.05, residual_norm_floor=0.01) tuple [source]
Generic function for robust stacking live members of a TimeSeriesEnsemble. An optional initial stack estimate can be used via tha stack0 argument. The function currently supports two methods: “median” for a median stack and “dbxcor” to implement the robust loss function used in the dbxcor program defined in Pavlis and Vernon (2010). Other algorithms could easily be implemented via this same api by adding an option for the “method” argument.
All robust estimators I am aware of that use some form of penalty function (e.g. m-estimators or the dbxcor penalty function) require an initial estimator for the stack. They do that because the penalty function is defined from a metric of residuals relative to the current estimate of center. The median, however, does not require an initial estimator which complicates the API for this function. For the current options understand that stack0 is required for the dbxcor algorithm but will be ignored if median is requested.
The other complication of this function is handling of potential irregular time ranges of the ensemble input and how to set the time range for the output. The problem is further complicated by use in an algorithm like align_and_stack in this module where the data can get shifted to have undefined data within the time range the data aims to utilize. The behavior of the algorithm for this issue is controlled by the kwarg values with the keys “timespan_method” and “pad_fraction_cutoff”. As the name imply “timespan_method” defines how the time span for the stack should be defined. The following options are supported:
“stack0” - sets the time span to that of the input TimeSeries passed as stack0. i.e. the range is set to stack0.dt to stack0.endtime().
“ensemble_inner” - (default) use the range defined by the “inner” method for computing the range with the function ensemble_time_range. (see ensemble_time_range docstring for the definition).
“ensemble_outer” - use the range defined by the “outer” method for computing the range with the function ensemble_time_range. (see ensemble_time_range docstring for the definition).
“ensemble_median” - use the range defined by the “median” method for computing the range with the function ensemble_time_range. (see ensemble_time_range docstring for the definition).
These interact with the value passed via “fractional_mismatch_level”. When the time range computed is larger than the range of a particular member of the input ensemble this parameter determines whether or not the member will be used in the stack. If the fraction of undefined samples (i.e. N_undefined/Nsamp) is greater than this cutoff that datum will be ignored. Otherwise if there are undefined values they will be zero padded.
- Parameters:
ensemble (TimeSeriesEnsemble) – input data to be stacked. Should all be in relative time with all members having the same relative time span.
method (string. Currently must be one of two values or the function will abort: "median" or "dbxcor". As the names imply "median" will cause the function to return the median of the sample vectors while "dbxcor" applies the dbxcor method.) – Defines a name string of the method to be used to compute the stack.
stack0 (TimeSeries. Note the time span of this optional input must be the same or wider than the ensemble member range defined by the (internal to this module) validate_ensemble function or the return will be return as a copy of this TimeSeries marked dead. Default for this argument is None which means the median will be used for the initial stack for dbxcor) – optional initial estimate for stack. Estimators other than median I know of use a loss function for downweighting members of the stack that do not match the stack as defined by some misfit metric. This argument can be used to input an optional starting estimate of the stack for the dbxcor method. By default it uses the median as the starting point, but this can be used to input something else. Note the function will silently ignore this argument if method == “median”.
stack_md (Metadata container or None. If stack0 is defined this argument is ignored. Otherwise it should be used to add whatever Metadata is required to provide a tag that can be used to identify the output. If not specified the stack Metadata will be only those produce from default construction of a TimeSeries. That is almost never what you want. Reiterate, however, that if stack0 is defined the output stack will be a clone of stack0 with possible modifications of time and data range attributes and anything the stack algorithm posts.) – optional Metadata container to define the content of the stack output. By default the output will have only Metadata that duplicate required internal attributes (e.g. t0 and npts). An exception is if stack0 is used the Metadata of that container will be copied and this argument will be ignored.
residual_norm_floor (float (default 0.01)) – floor on residuals used to compute dbxcor weight function. See docstring for dbxcor_weights for details. Ignored unless method is “dbxcor”
- Returns:
tuple containing the requested stack as component 0. The stack is returned as a TimeSeries with optional Metadata copied from the (optional) stack_md argument. Component 1 is defined only for the dbxcor method in which case it a numpy array containing th e robust weights returned by the dbxcor algorithm. If the method is set to “median” component 1 will be returned as a None type.
MTPowerSpectrumEngine
- class mspasspy.algorithms.MTPowerSpectrumEngine.MTPowerSpectrumEngine(winsize, tbp, number_tapers, nfft=0, iadapt=0)[source]
Bases:
object
Wrapper class to use German Prieto’s multitaper package as a plug in replacement for the MsPASS class of the same name. The MsPASS version is in C++ and there are some collisions in concept. It should be used only in a python script where it might be useful to use features of Prieto’s library not available in the cruder implementation in MsPASS. (e.g. adaptive weights in multitaper power spectrum estimation).
The biggest collision in concept is that the MsPASS version with this class name was designed to be created and moved around to avoid the overhead of recreating the Slepian tapers on each use. Prieto’s class creates these in the constructor but caches a number of things when first created that allows secondary calls to run faster. The implementation attempts to overcome this collision by caching an instance of Prieto’s mstspec class on the first call to the apply method. Secondary calls to apply test for consistency with the previous call and only recreate the mtspec instance if something that invalidates the previous call has changed.
- Parameters:
winsize – Number of samples expected for data window. Note this argument defines the size of the eigentapers so it defines the size of input time series the object expect to process.
tbp – multitaper time-bandwidth product
number_tapers – number of tapers to use for estimators (See Prieto’s) documentation for details)
nfft – optional size of the fft work arrays to use. nfft should be greater than or equal winsize. When greater zero padding is used in the usual way. Default is 0 which sets nfft to 2*winsize. If nfft<winsize a diagnostic message will be printed and nfft will be set equal to winsize.
iadapt – integer argument passed to Prieto’s MTspec that sets the eigentaper weighting scheme. See Prieto’s MTspect class documentation for options. Default is 0 which enables the adaptive weighting scheme.
RFdeconProcessor
This file contains a class definition for a wrapper for the suite of scalar deconvolution methods supported by mspass. It demonstrates the concept of a processing object created by wrapping C code. It also contains a top-level function that is a pythonic interface that meshes with MsPASS schedulers for parallel processing called RFdecon. RFdecon is a wrapper for all single-station methods. It cannot be used for array methods.
Created on Fri Jul 31 06:24:10 2020
@author: Gary Pavlis
- mspasspy.algorithms.RFdeconProcessor.RFdecon(d, engine=None, alg='LeastSquares', pf='RFdeconProcessor.pf', wavelet=None, noisedata=None, wcomp=2, ncomp=2, QCdocument_key='RFdecon_properties', object_history=False, alg_name='RFdecon', alg_id=None, dryrun=False)[source]
Use this function to compute conventional receiver functions from a single three component seismogram. In this function, an instance of wrapper class RFdeconProcessor will be built and initialized with alg and pf.
Default assumes d contains all data sections required to do the deconvolution with the wavelet in component 2 (3 for matlab and FORTRAN people). By default the data and noise (if required by the algorithm) sections will be extracted from the (assumed larger) data section using time windows defined internally in the processor pf definition. For variations (e.g. adding tapering to one or more of the time series inputs) use the d, wavelet, and (if required) noise arguments to load each component separately. Note d is dogmatically required to be three component data while optional wavelet and noisedata series are passed as plain numpy vectors (i.e. without the decoration of a TimeSeries).
To make use of the extended outputs from RFdeconProcessor algorithms (e.g. actual output of the computed operator) call those methods after this function returns successfully with a three-component seismogram output. That is possible because the processor object caches the most recent wavelet and inverse used for the deconvolution. An exception is that all algorithms call their QCmetrics method of processor and push them to the headers of the deconvolved output. QCmetric attributes are algorithm dependent.
The ProcessingHistory feature can optionally be enabled by setting the save_history argument to True. When enable one should normally set a unique id for the algid argument.
- Parameters:
d (Must be a Seismogram object or the function will throw a TypeError exceptionl) – Seismogram input data. See notes above about time span of these data.
engine (None or an instance of RFdeconProcessor. When None (default) an instance of an RFdeconProcessor is created on entry based on the keyword defined by the alg argument. The algorithm built into the instance of RFdeconProcessor is used if engine is not null.) – optional instance of a RFdeconProcessor object. By default the function instantiates an instance of a processor for each call to the function. For algorithms like the multitaper based algorithms with a high initialization cost performance will improve by sending an instance to the function via this argument.
alg – The algorithm to be applied, used for initializing a RFdeconProcessor object. Ignored if engine is used.
pf (string defining an absolute path for the file name or a path relative to a directory defined by PFPATH.) – The pf file to be parsed, used for inititalizing a RFdeconProcessor. Ignored if engine is used.
wavelet (None or an iterable vector container (in MsPASS that means a python array, a numpy array, or a DoubleVector)) – vector of doubles (numpy array or the std::vector container internal to TimeSeries object) defining the wavelet to use to compute deconvolution operator. Default is None which assumes processor was set up to use a component of d as the wavelet estimate.
noisedata (None or an iterable vector container (in MsPASS that means a python array, a numpy array, or a DoubleVector)) – vector of doubles (numpy array or the std::vector container internal to TimeSeries object) defining noise data to use for computing regularization. Not all RF estimation algorithms use noise estimators so this parameter is optional. It can also be extracted from d depending on parameter file options.
wcomp –
- When defined from Seismogram d the wavelet
estimate in conventional RFs is one of the components that are most P wave dominated. That is always one of three things: Z, L of LQT, or the L component from the output of Kennett’s free surface transformation operator. The default is 2, which for ccore.Seismogram is always one of the above. This parameter would be changed only if the data has undergone some novel transformation not yet invented and the best wavelet estimate was on in 2 (3 with FORTRAN and matlab numbering).
- type wcomp:
int (must 0, 1, or 2)
- param ncomp:
component number to use to compute noise. This is used only if the algorithm in processor requires a noise estimate. Normally it should be the same as wcomp and is by default (2).
- type ncomp:
int (must be 0, 1, or 2)
- param QCdocument_key:
A summary of the parameters defining the deconvolution operator (really a dump of the pf content used for creating the engine) and computed QC attributes are posted to a python dictionary. That content is posted to the outputs Metadata container with the key defined by this argument. In MongoDB lingo that means when saved to the database the dictionary content associated with this key becomes a “subdocument”.
- type QCdocument_key:
string (default is “RFdecon_properties”)
- param object_history:
boolean to enable or disable saving object level history. Default is False. Note this functionality is implemented via the mspass_func_wrapper decorator.
- param alg_name:
When history is enabled this is the algorithm name assigned to the stamp for applying this algorithm. Default (“WindowData”) should normally be just used. Note this functionality is implemented via the mspass_func_wrapper decorator.
- param ald_id:
algorithm id to assign to history record (used only if object_history is set True.) Note this functionality is implemented via the mspass_func_wrapper decorator.
- param dryrun:
When true only the arguments are checked for validity. When true nothing is calculated and the original data are returned. Note this functionality is implemented via the mspass_func_wrapper decorator.
- Returns:
Normally returns Seismogram object containing the RF estimates. The orientations are always the same as the input. If return-wavelets is set True returns a tuple with three components: 0 - Seismogram returned as with default, 1 - ideal output wavelet TimeSeries, 2 - actual output wavelet stored as a TimeSeries object.
- class mspasspy.algorithms.RFdeconProcessor.RFdeconProcessor(alg='LeastSquares', pf='RFdeconProcessor.pf')[source]
Bases:
object
This class is a wrapper for the suite of receiver function deconvolution methods we call scalar methods. That is, the operation is reducable to two time series: wavelet signal and the data (TimeSeries) signal. That is in contrast to three component methods that always treat the data as vector samples. The class should be created as a global processor object to be used in a spark job. The design assumes the processor object will be passed as an argument to the RFdecon function that should appear as a function in a spark map call.
- QCMetrics(prediction_error_key='prediction_error') dict [source]
All decon algorithms compute a set of algorithm dependent quality control metrics. The return is a Metadata container. All this wrapper really does is translate that return into a python dictionary that can be used as the base of a subdocument posting to outputs. This method MUST ONLY BE CALLED after calling the process method of the C++ engine.
- actual_output()[source]
The actual output of a decon operator is the inverse filter applied to the wavelet. By design it is an approximation of the shaping wavelet defined for this operator.
- Returns:
Actual output of the operator as a ccore.TimeSeries object.
The Metadata of the return is bare bones. The most important factor about this result is that because actual output waveforms are normally a zero phase wavelet of some kind the result is time shifted to be centered (i.e. t0 is rounded n/2 where n is the length of the vector
returned).
- apply()[source]
Compute the RF estimate using the algorithm defined internally.
- Returns:
vector of data that are the RF estimate computed from previously loaded data.
- change_parameters(md)[source]
Use this method to change the internal parameter setting of the processor. It can only change the parameters for a particular algorithm. A new instance of this class needs to be created if you need to switch to a different algorithm. It does little more than call the read_metadata of the already loaded processor. All the scalar decon methods implement that method.
- Parameters:
md – is a mspass.Metadata object containing required parameters
for the alternative algorithm.
- property dwin
- ideal_output()[source]
The ideal output of a decon operator is the same thing we call a shaping wavelet. This method returns the ideal output=shaping wavelet as a TimeSeries object. Like the actual output method the return function is circular shifted so the function peaks at 0 time located at n/2 samples from the start sample. Graphic displays will then show the wavelet centered and peaked at time 0. The prediction error can be computed as the difference between the actual_output and ideal_output TimeSeries objects. The norm of the prediction error is a helpful metric to display the stability and accuracy of the inverse.
- inverse_filter()[source]
This method returns the actual inverse filter that if convolved with he original data will produce the RF estimate. Note the filter is meaningful only if the source wavelet is minimum phase. A standard theorem from time series analysis shows that the inverse of mixed phase wavelet is usually unstable and a maximum phase wavelet is always unstable. Fourier-based methods can still compute a stable solution even with a mixed phase wavelet because of the implied circular convolution.
The result is returned as TimeSeries object.
- loaddata(d, dtype='Seismogram', component=0, window=False)[source]
Loads data for processing. When window is set true use the internal pf definition of data time window and window the data. The dtype parameter changes the behavior of this algorithm significantly depending on the setting. It can be one of the following: Seismogram, TimeSeries, or raw_vector. For the first two the data to process will be extracted in a pf specfied window if window is True. If window is False TimeSeries data will be passed directly and Seismogram data will have the data defined by the component parameter copied to the internal data vector workspace. If dtype is set to raw_vector d is assumed to be a raw numpy vector of doubles or an the aliased std::vector used in ccore, for example, in the TimeSeries object s vector. Setting dtype to raw_vector and window True will result in this method throwing a RuntimeError exception as the combination is not possible since raw_vector data have no time base.
- Parameters:
d – input data (contents expected depend upon
value of dtype parameter). :param dtype: string defining the form d is expected
to be (see details above)
- Parameters:
component – component of Seismogram data to load as data vector. Ignored if dtype is raw_vector or TimeSeries.
window – boolean controlling internally defined windowing. (see details above)
- Returns:
Nothing (not None nothing) is returned
- property nwin
- property uses_noise
basic
- mspasspy.algorithms.basic.ExtractComponent(data, component, *args, object_history=False, alg_name=None, alg_id=None, dryrun=False, inplace_return=False, function_return_key=None, **kwargs)[source]
Extract single component from three-component data.
The function creates a scalar TimeSeries object from a three component Seismogram object Or a TimeSeriesEnsemble object from a SeismogramEnsemble object
- Parameters:
data (either
Seismogram
orSeismogramEnsemble
) – data object to extract from.component (
int
) – the index of component that will be extracted, it can only be 0, 1, or 2object_history – True to preserve the processing history. For details, refer to
mspass_func_wrapper
.alg_name (
str
) – alg_name is the name the func we are gonna save while preserving the history.alg_id (
bson.objectid.ObjectId
) – alg_id is a unique id to record the usage of func while preserving the history.dryrun – True for dry-run, which return “OK”. Used in the mspass_func_wrapper.
inplace_return – True to return data in mspass_func_wrapper. This is set false to handle exception directly in the function, without passing it to mspass_func_wrapper.
function_return_key – Some functions one might want to wrap with this decorator return something that is appropriate to save as Metadata. If so, use this argument to define the key used to set that field in the data that is returned.
- mspasspy.algorithms.basic.ator(data, tshift, *args, object_history=False, alg_name=None, alg_id=None, dryrun=False, inplace_return=True, function_return_key=None, **kwargs)[source]
Absolute to relative time conversion.
Sometimes we want to convert data from absolute time (epoch times) to a relative time standard. Examples are conversions to travel time using an event origin time or shifting to an arrival time reference frame. This operation simply switches the tref variable and alters t0 by tshift.
- Parameters:
data (either
mspasspy.ccore.seismic.TimeSeries
ormspasspy.ccore.seismic.Seismogram
) – data object to be converted.tshift (
float
) – time shift applied to data before switching data to relative time mode.object_history – True to preserve the processing history. For details, refer to
mspass_func_wrapper
.alg_name (
str
) – alg_name is the name the func we are gonna save while preserving the history.alg_id (
bson.objectid.ObjectId
) – alg_id is a unique id to record the usage of func while preserving the history.dryrun – True for dry-run, which return “OK”. Used in the mspass_func_wrapper.
inplace_return – True to return data in mspass_func_wrapper. This is necessary to be used in mapreduce.
function_return_key – Some functions one might want to wrap with this decorator return something that is appropriate to save as Metadata. If so, use this argument to define the key used to set that field in the data that is returned.
- mspasspy.algorithms.basic.cosine_taper(data, t0head, t1head, t1tail, t0tail, *args, object_history=False, alg_name=None, alg_id=None, dryrun=False, inplace_return=True, function_return_key=None, **kwargs)[source]
Taper front and/or end of a data object with a half cosine function.
A cosine taper is a common, simple approach to taper data. When applied at the front it defnes a half cycle of a cosine curve +1.0 in range -pi to 0. On the right it defines the same function for the range 0 to pi. The period of the left and right operator can be different. Turn off left or right by giving illegal start and end points and the operator will silently be only one sided.
- Parameters:
data (either
TimeSeries
orSeismogram
) – data object to be processed.t0head (
float
) – t0 of the head tapert1head (
float
) – t1 of the head tapert1tail (
float
) – t1 of the tail tapert0tail (
float
) – t0 of the tail taperobject_history – True to preserve the processing history. For details, refer to
mspass_func_wrapper
.alg_name (
str
) – alg_name is the name the func we are gonna save while preserving the history.alg_id (
bson.objectid.ObjectId
) – alg_id is a unique id to record the usage of func while preserving the history.dryrun – True for dry-run, which return “OK”. Used in the mspass_func_wrapper.
inplace_return – True to return data in mspass_func_wrapper. This is necessary to be used in mapreduce.
function_return_key – Some functions one might want to wrap with this decorator return something that is appropriate to save as Metadata. If so, use this argument to define the key used to set that field in the data that is returned.
- mspasspy.algorithms.basic.free_surface_transformation(data, uvec=None, vp0=None, vs0=None, ux_key='ux', uy_key='uy', vp0_key='vp0', vs0_key='vs0', *args, object_history=False, alg_name=None, alg_id=None, dryrun=False, inplace_return=True, function_return_key=None, **kwargs)[source]
Computes and applies the Kennett [1991] free surface transformation matrix.
Kennett [1991] gives the form for a free surface transformation operator that defines a transformation matrix that provides optimal separation of P, SV, and SH for a receiver at the free surface of an isotropic medium with constant P and S velocity. The transformation is only valid if the incident wavefield is not evanescent (i.e. at or past the SV critical angle). It is important to realize the tranformation matrix does not define an orthogonal transformation. It is also, of course, only an approximation since it is strictly correct only for a constant velocity medium. The tranformation operator requires: (1) a slowness vector, (2) an estimate of surface P wave velocity, and (3) an estimate of surface shear wave velocity. In all cases the input for the slowness vector uses a local cartesian system with the vector specified as component ux and uy. ux is the component of the slowness of an incident wavefield in the geographic x direction, which means positive at local east. uy is the complementary component for the y direction defined by local north. The components of the slowness vector are assumed to be in units of s/km. (Warning: obspy’s and other implementations of the commonly used tau-p calculator return slowness (ray parameter) in spherical units of s/radian - r sin(theta)/v)
The required parameters (slowness vector and velocities) can be passed to the function one of two ways. Because it is much simpler to implement in a map operator the default expect those parameters to be set in the data object’s Metadata container. The default keys for fetching each attribute are defined by the four arguments “ux_key”, “uy_key”, “vp0_key”, and “vs0_key”. All four have standard defaults defined in the function signature and below. The Metadata fetching algorithm can be overridden by defining the same data through the three optional arguments with the key names “uvec”, “vp0”, and “vs0”. (see below for detailed descriptions)
The switching between Metadata fetching and a constant arguments is not all or none. If a slowness vector is defined through uvec it will always override metadata. Handing of vp0 and vs0 is independent. i.e. you can define uvec and not define vp0 and vs0 or conversely you can (more commonly) define vp0 and vs0 but not define uvec. In all cases not defining an arg is a signal to fetch it from Metadata.
When this function is applied to ensembles and the Metadata fetch approach is used (default) the function assumes all ensemble members have the required metadata keys defined. Any that do not are killed. The same happens for atomic data passed to this function if any of the required keys are missing. In fact, you should realize the ensemble algorithm simply applies this function in a recursion over all the members.
The output components are in the order defined in Kennett’s original paper. The order is 0=SH, 1=SV, 2=L.
- Parameters:
data – data object to be transformed. For ensembles the transformation
is applied to all members. Note the Metadata fetch mechanism is the only recommended way to handle ensembles. An elog message will be posted to the ensemble’s elog container if you try to use a constant slowness vector passed via uvec. It will not warn about constant vp0 and vs0 as that case is common. :type data:
Seismogram
orSeismogramEnsemble
:param ux_key: key to use to fetch EW component of slowness vector from Metadata container. Default is “ux”. :type ux_key: string :param uy_key: key to use to fetch NS component of slowness vector from Metadata container. Default is “uy”. :type uy_key: string :param vp0_key: key to use to fetch free surface P wave velocity from Metadata container. Default is “vp0”. :type vp0_key: string :param vs0_key: key to use to fetch free surface S wave velocity from Metadata container. Default is “vs0”. :type vs0_key: string :param uvec: slowness vector of the incident wavefield defined via custom C++ classSlownessVector
. Default is None which is taken as a signal to fetch the slowness vector components from Metadata using ux_key and uy_key. :type uvec:SlownessVector
:param vp0: Surface P wave velocity. Default is None which is taken as a signal to fetch this quantity from Metadata using the vp0_key. :type vp0:float
:param vs0: Surface S wave velocity. Default is None which is taken as a signal to fetch this quantity from Metadata using the vs0_key. :type vs0:float
:param object_history: True to preserve the processing history. For details, refer tomspass_func_wrapper
.- Parameters:
alg_name (
str
) – alg_name is the name the func we are gonna save while preserving the history.alg_id (
bson.objectid.ObjectId
) – alg_id is a unique id to record the usage of func while preserving the history.dryrun – True for dry-run, which return “OK”. Used in the mspass_func_wrapper.
inplace_return – True to return data in mspass_func_wrapper. This is necessary to be used in mapreduce.
function_return_key – Some functions one might want to wrap with this decorator return something that is appropriate to save as Metadata. If so, use this argument to define the key used to set that field in the data that is returned.
- mspasspy.algorithms.basic.linear_taper(data, t0head, t1head, t1tail, t0tail, *args, object_history=False, alg_name=None, alg_id=None, dryrun=False, inplace_return=True, function_return_key=None, **kwargs)[source]
Taper front and/or end of a data object with a linear taper.
Linear tapers are defined here as a time spanning a ramp running from 0 to 1. Data will be zeroed on each end of a 0 mark and a linear weight applied between 0 points and 1 points. Postive ramp slope on left and negative slope ramp on right. Setting t0 == t1 will disable the taper on the specified end (e.g., t0head == t1head).
- Parameters:
data (either
TimeSeries
orSeismogram
) – data object to be processed.t0head (
float
) – t0 of the head tapert1head (
float
) – t1 of the head tapert1tail (
float
) – t1 of the tail tapert0tail (
float
) – t0 of the tail taperobject_history – True to preserve the processing history. For details, refer to
mspass_func_wrapper
.alg_name (
str
) – alg_name is the name the func we are gonna save while preserving the history.alg_id (
bson.objectid.ObjectId
) – alg_id is a unique id to record the usage of func while preserving the history.dryrun – True for dry-run, which return “OK”. Used in the mspass_func_wrapper.
inplace_return – True to return data in mspass_func_wrapper. This is necessary to be used in mapreduce.
function_return_key – Some functions one might want to wrap with this decorator return something that is appropriate to save as Metadata. If so, use this argument to define the key used to set that field in the data that is returned.
- mspasspy.algorithms.basic.rotate(data, rotate_param, *args, object_history=False, alg_name=None, alg_id=None, dryrun=False, inplace_return=True, function_return_key=None, **kwargs)[source]
Rotate data using a P wave type coordinate definition.
This function can apply three different types of rotation depending on the type of parameter given. If a
SphericalCoordinate
is given, it will rotate the data into a coordinate system defined by the direction defined by the spherical coordinate. The data are rotated such that x1 becomes the transverse component, x2 becomes radial, and x3 becomes longitudinal.If an unite vector of three components that defines the direction of x3 direction (longitudinal) is give, it will turn the vector into a
SphericalCoordinate
object and calles the related rotate with it.If a
float
number is given, it will rotate the horizontal components by this much angle in radians.- Parameters:
data (
Seismogram
) – data object to be rotated.rotate_param (see above for details.) – the parameter that defines the rotation.
object_history – True to preserve the processing history. For details, refer to
mspass_func_wrapper
.alg_name (
str
) – alg_name is the name the func we are gonna save while preserving the history.alg_id (
bson.objectid.ObjectId
) – alg_id is a unique id to record the usage of func while preserving the history.dryrun – True for dry-run, which return “OK”. Used in the mspass_func_wrapper.
inplace_return – True to return data in mspass_func_wrapper. This is necessary to be used in mapreduce.
function_return_key – Some functions one might want to wrap with this decorator return something that is appropriate to save as Metadata. If so, use this argument to define the key used to set that field in the data that is returned.
- mspasspy.algorithms.basic.rotate_to_standard(data, *args, object_history=False, alg_name=None, alg_id=None, dryrun=False, inplace_return=True, function_return_key=None, **kwargs)[source]
Apply inverse transformation matrix to return data to cardinal direction components.
It is frequently necessary to make certain a set of three component data are oriented to the standard reference frame (EW, NS, Vertical). This function does this. For efficiency it checks the components_are_cardinal variable and does nothing if it is set true. Otherwise, it applies the inverse transformation and then sets this variable true. Note even if the current transformation matrix is not orthogonal it will be put back into cardinal coordinates.
If inversion of the transformation matrix is not possible (e.g. two components are colinear) an error thrown by the C++ function is caught, posted to elog, and the datum return will be marked dead.
- Parameters:
data (
Seismogram
) – data object to be rotated.object_history – True to preserve the processing history. For details, refer to
mspass_func_wrapper
.alg_name (
str
) – alg_name is the name the func we are gonna save while preserving the history.alg_id (
bson.objectid.ObjectId
) – alg_id is a unique id to record the usage of func while preserving the history.dryrun – True for dry-run, which return “OK”. Used in the mspass_func_wrapper.
inplace_return – True to return data in mspass_func_wrapper. This is necessary to be used in mapreduce.
function_return_key – Some functions one might want to wrap with this decorator return something that is appropriate to save as Metadata. If so, use this argument to define the key used to set that field in the data that is returned.
- Exception:
MsPASSError
thrown if the an inversion of the transformation matrix is required and that matrix is singular. This can happen if the transformation matrix is incorrectly defined or the actual data are coplanar.
- mspasspy.algorithms.basic.rtoa(data, *args, object_history=False, alg_name=None, alg_id=None, dryrun=False, inplace_return=True, function_return_key=None, **kwargs)[source]
Relative to absolute time conversion.
Sometimes we want to convert data from relative time to to an UTC time standard. An example would be converting segy shot data to something that could be processed like earthquake data in a css3.0 database. This function returns data previously converted to relative back to UTC using the internally stored time shift attribute.
- Parameters:
data (either
TimeSeries
orSeismogram
) – data object to be converted.object_history – True to preserve the processing history. For details, refer to
mspass_func_wrapper
.alg_name (
str
) – alg_name is the name the func we are gonna save while preserving the history.alg_id (
bson.objectid.ObjectId
) – alg_id is a unique id to record the usage of func while preserving the history.dryrun – True for dry-run, which return “OK”. Used in the mspass_func_wrapper.
inplace_return – True to return data in mspass_func_wrapper. This is necessary to be used in mapreduce.
function_return_key – Some functions one might want to wrap with this decorator return something that is appropriate to save as Metadata. If so, use this argument to define the key used to set that field in the data that is returned.
- mspasspy.algorithms.basic.transform(data, matrix, *args, object_history=False, alg_name=None, alg_id=None, dryrun=False, inplace_return=True, function_return_key=None, **kwargs)[source]
Applies an arbitrary transformation matrix to the data.
i.e. after calling this function the data will have been multiplied by the matrix and the transformation matrix will be updated. The later allows cascaded transformations to data.
- Parameters:
data (
Seismogram
) – data object to be transformed.matrix (
numpy.array
) – a 3x3 matrix that defines the transformation.object_history – True to preserve the processing history. For details, refer to
mspass_func_wrapper
.alg_name (
str
) – alg_name is the name the func we are gonna save while preserving the history.alg_id (
bson.objectid.ObjectId
) – alg_id is a unique id to record the usage of func while preserving the history.dryrun – True for dry-run, which return “OK”. Used in the mspass_func_wrapper.
inplace_return – True to return data in mspass_func_wrapper. This is necessary to be used in mapreduce.
function_return_key – Some functions one might want to wrap with this decorator return something that is appropriate to save as Metadata. If so, use this argument to define the key used to set that field in the data that is returned.
- mspasspy.algorithms.basic.transform_to_LQT(data, seaz_key='seaz', ema_key='ema', phi=None, theta=None, angle_units='degrees', object_history=False, alg_name=None, alg_id=None, dryrun=False, inplace_return=False, function_return_key=None)[source]
Applies coordinate transform to LQT version of ray coordinates.
LQT is an orthogonal coordinate transformation for Seismogram objects that cause the output data to have x1-Longitudinal (positive up normally set to the predicted emergence angle of P particle motion), x2 - Q a rotated radial direction (in propagation direction but tilted by theta). and x3 - transverse (T) in direction to define a right handed coordinate system. The function produces this transformation by the product f three transformation:
Uses rotate_to_standard to assure we start from cardinal directions.
Transformation to what might be called TQL using the Seismogram C++ method rotate using a SphericalCoordinate definition.
Transformation to LQT to rearrange the order of the Seismogram data matrix (also requires a sign change on T to keep the output right handed)
The form of the transformation can be specified in one of two completely different ways. By default the function attempts to extract the back azimuth from station to event with using the metadata key defined by ‘seaz_key’ (default ‘seaz’) and the P emergence angle with the metadata key defined by the ‘ema_key’ argument (default ‘ema’). If the arguments ‘phi’ and ‘theta’ are defined they are assumed angles to be used to defined the transformation and no attempt will be made to fetch the value from Metaata (Rarely a good idea with ensemble but can be useful for atomic data. Using Metadata is strongly preferred because it also preserves what the angles used where. They are not if the argument approach is used.) Metadata values are required to be in degree units. If the argument approach is used radian values can to used if you also set the argument “angle_units” to “radians”.
Note the operation handles the singular case of theta==0.0 where it simply uses the phi value and rotates the coordinates in a variant of RTZ (variant because order is different). :param seaz_key: key to use to fetch back azimuth value (assumed degrees always) :type key: string (default “seaz”) :param ema_key: key to use to fetch emergence angle defining L direction relative to local vertical. :type ema_key: string (defautl “ema”) :param phi: angle to rotate around vertical to define the transformation (positive anticlockwise convention NOT azimuth convention) Default is None which means ignore this parameter and use key. Setting this value to something other than None causes the Metadata fetch method to be overridden. WARNING: this is not backazimuth but angle relative to E direction. Note that is not at all what is expected when using a Metadata key :type phi: float :param theta: angle relative to local vertical for defining the L coordinate direction. It is the same as theta in spherical coordinates with emergence angle pointing upward. Default is None which causes the algorithm to automatically assume the Metadata key method should be used. :type theta: float :param angle_units: should be either ‘degrees’ (default) or ‘radians’. An invalid value will be treated as an attempt to switch to radians but will generate an elog warning message. This argument is ignored unless phi is not null (None type) :type angle_units: string :param alg_name: alg_name is the name the func we are gonna save while preserving the history. :type alg_name:
str
:param alg_id: alg_id is a unique id to record the usage of func while preserving the history. :type alg_id:bson.objectid.ObjectId
:param dryrun: True for dry-run, which return “OK”. Used in the mspass_func_wrapper. :param inplace_return: True to return data in mspass_func_wrapper. This is necessary to be used in mapreduce. :param function_return_key: Some functions one might want to wrap with this decoratorreturn something that is appropriate to save as Metadata. If so, use this argument to define the key used to set that field in the data that is returned.
- return:
transformed version of input. For ensembles the entire ensemble
is transformed.
- mspasspy.algorithms.basic.transform_to_RTZ(data, key='seaz', phi=None, angle_units='degrees', key_is_backazimuth=True, object_history=False, alg_name=None, alg_id=None, dryrun=False, inplace_return=False, function_return_key=None)[source]
Applies coordinate transform to RTZ version of ray coordinates.
RTZ is the simplest transformation for three component data to what is commonly called ray coordinates. R-radial (SV), T-transverse (SH). and Z=vertical (bad measure of longitudinal). The function first forces the data to cardinal direction using rotate_to_standard and then rotates the coordinates around the vertical axis. The rotation can be defined in one of two ways. The default behavior is to attempt to extract the back azimuth from the station to the source with the key defined by the “key” argument (defaults to ‘seaz’). That behavior will be overriden if the “phi” kwarg value is set. Phi is assumed to be the angle to rotate the coordinates using the math convention for the phi angle with positive anticlockwise. The units of the value retrieved with the key argument or the value passed via the phi argument are by default assumed to be in degrees. If using the phi argument you specify the angle in radians if you also set the “angle_units” argument to “radians”.
- Parameters:
data (
Seismogram
orSeismogramEnsemble
.) – data object to be transformedkey (string) – key to use to fetch back azimuth value (assumed degrees always)
phi – angle to rotate around vertical to define the transformation
(positive anticlockwise convention NOT azimuth convention) Default is None which means ignore this parameter and use key. Setting this value to something other than None causes the key method to be overridden. :type phi: float :param angle_units: should be either ‘degrees’ (default) or ‘radians’. An invalid value will be treated as an attempt to switch to radians but will generate an elog warning message. This argument is ignored unless phi is not null (None type) :type angle_units: string :param key_is_backazimuth: boolean that when True (default) assumes the angle extrated with the key argument value is a backazimuth in degrees. If set False, the angle will be assumed to be a rotation angle in with the anticlockwise positive convention. :param alg_name: alg_name is the name the func we are gonna save while preserving the history. :type alg_name:
str
:param alg_id: alg_id is a unique id to record the usage of func while preserving the history. :type alg_id:bson.objectid.ObjectId
:param dryrun: True for dry-run, which return “OK”. Used in the mspass_func_wrapper. :param inplace_return: True to return data in mspass_func_wrapper. This is necessary to be used in mapreduce. :param function_return_key: Some functions one might want to wrap with this decoratorreturn something that is appropriate to save as Metadata. If so, use this argument to define the key used to set that field in the data that is returned.
- Returns:
transformed version of input. For ensembles the entire ensemble
is transformed.
- mspasspy.algorithms.basic.vector_taper(data, taper_array, *args, object_history=False, alg_name=None, alg_id=None, dryrun=False, inplace_return=True, function_return_key=None, **kwargs)[source]
Apply a general taper defined by a vector to the data object.
This method provides a simple way to build a taper from a set of uniformly spaced points. The apply methods will dogmatically only accept input data of the same length as the taper defined in the operator.
- Parameters:
data (either
TimeSeries
orSeismogram
) – data object to be processed.taper_array (
numpy.array
) – the array that defines the taperobject_history – True to preserve the processing history. For details, refer to
mspass_func_wrapper
.alg_name (
str
) – alg_name is the name the func we are gonna save while preserving the history.alg_id (
bson.objectid.ObjectId
) – alg_id is a unique id to record the usage of func while preserving the history.dryrun – True for dry-run, which return “OK”. Used in the mspass_func_wrapper.
inplace_return – True to return data in mspass_func_wrapper. This is necessary to be used in mapreduce.
function_return_key – Some functions one might want to wrap with this decorator return something that is appropriate to save as Metadata. If so, use this argument to define the key used to set that field in the data that is returned.
bundle
This is a set of thin wrappers for C++ code to do a process we call “bundling”. That is, a Seismogram object can be constructed from 3 TimeSeries objects spanning a common time period and having the same sample rate but with orientation pointing in 3 linearly independent directions. There are a lot of complexities to assembling such data and manipulating the data objects, which is why the process was implemented in C++. The algorithms here are not completely generic and will likely need additions at some future data for nonstandard data. “standard” in this context means passive array data archived in the SEED (Standard Earthquake Exchange Data) format (aka miniseed which is a standard subset of seed used for archives). The algorithms here depend upon seismic channels being uniquely defined by four metadata keys: net, sta, chan, and loc. Bundles are formed by sorting data in the following key order: net, sta, loc, chan. The algorithms here are further limited to applications to ensembles that are the generalization of a reflection seismology “shot gather”. That means an implict assumption is the ensembles contain data assembled from one and only one event so there is a one-to-one relationship between each channel and an event tag. if data from multiple events or something other than a shot gather are to be handled used the BundleGroup function station by station sorting the inputs by some other method to create triples of channels that can be merged into one Seismogram object.
Created on Mon Jan 11 05:34:10 2021
@author: pavlis
- mspasspy.algorithms.bundle.BundleSEEDGroup(d, i0=0, iend=2)[source]
Combine a grouped set of TimeSeries into one Seismogram.
A Seismogram object is a bundle of TimeSeries objects that define a nonsingular tranformation matrix that can be used to reconstruct vector group motion. That requires three TimeSeries objects that have define directions that are linearly independent. This function does not directly test for linear independence but depends upon channel codes to assemble one or more bundles needed to build a Seismogram. The algorithm used here is simple and ONLY works if the inputs have been sorted so the channels define a group of three unique channel codes. For example,
HHE, HHN, HHZ
would form a typical seed channel grouping.
The function will attempt to handle duplicates. By that I mean if the group has two of the same channel code like these sequences:
HHE, HHE, HHN, HHZ or HHE, HHN, HHN, HHZ, HHZ
If the duplicates are pure duplicates there is no complication and the result will be clean. If the time spans of the duplicate channels are different the decision of which to use keys on a simple idea that is most appropriate for data assembled by event with mistakes in associations. That is, it attempts to scans the group for the earliest start time. When duplicates are found it uses the one with a start time closest to the minimum as the one merged to make the output Seismogram.
The output will be marked as dead data with no valid data in one of two conditions: (1) less than 3 unique channel names or (2) more than three inputs with an inconsistent set of SEED names. That “inconsistent” test is obscure and yet another example that SEED is a four letter word. Commentary aside, the rules are:
The net code must be defined and the same in all TimeSeries passed
The station (sta) code must also be the same for all inputs
Similarly the loc code must be the same in all inputs.
Finally, there is a more obscure test on channel names. They must
all have the same first two characters. That is, BHE, BHN, BHN, BHZ is ok but BHE, BHN, BHZ, HHE will cause an immediate exit with no attempt to resolve the ambiguity - that is viewed a usage error in defining the range of the bundle.
In all cases where the bundling is not possible the function does not throw an exception but does four things:
Merges the Metadata of all inputs (uses the += operator so only the last values of duplicate keys will be preserved in the return)
If ProcessingHistory is defined in the input they history records are posted to the returned Seismogram using as if the data were live but the number of input will always be a number different from 3.
The return is marked dead.
The function posts a (hopefully) informative message to elog of the returned Seismogram.
ProcessingHistory is handled internally by this function. If all the components in a group have a nonempty ProcessingHistory the data to link the outputs to the inputs will be posted to ProcessingHistory.
- Parameters:
d – This is assumed to be an array like object of TimeSeries data
that are to be used to build the Seismogram objects. They must be sorted as described above or the algorithm will fail. Two typical array like objects to use are the member attribute of a TimeSeriesEnsemble or a python array constructed from a (sorted) collection of TimeSeries objects. :param i0: starting array position for constructing output(s). The default is 0 which would be the normal request for an full ensemble or a single grouping assembled by some other mechanism. A nonzero is useful to work through a larger container one Seismogram at a time. :param iend: end array position. The function will attempt to assemble one or more Seismograms from TimeSeries in the range d[i0] to d[iend]. Default is 2 for a single Seismogram without duplicates.
- mspasspy.algorithms.bundle.bundle_seed_data(ensemble)[source]
This function can be used to take an (unordered) input ensemble of TimeSeries objects generated from miniseed data and produce an output ensemble of Seismograms produced by bundles linked to the seed name codes net, sta, chan, and loc. An implicit assumption of the algorithm used here is that the data are a variant of a shot gather and the input ensemble defines one net:sta:chan:loc:time_interval for each record that is to be bundled. It can only properly handle pure duplicates for a given net:sta:chan:loc combination. (i.e. if the input has the same TimeSeries defined by net:sta:chan:loc AND a common start and end time). Data with gaps broken into multiple net:sta:chan:loc TimeSeries with different start and end times will produce incomplete results. That is, Seismograms in the output associated with such inputs will either be killed with an associated error log entry or in the best case truncated to the overlap range of one of the segments with the gap(s) between.
Irregular start times of any set of TimeSeries forming a single bundle are subject to the same truncation or discard rules described in the related function Bundle3C.
Note there is not guarantee the Seismogram objects returned will be in standard coordinates. In fact, they will never be with standard channel names because of the internal sorting. It would normally be highly recommended the user call the rotate_to_standard method on each Seismogram before any use.
- Parameters:
ensemble – is the input ensemble of TimeSeries to be processed.
- Returns:
ensemble of Seismogram objects made by bundling input data
- Return type:
- Exception:
Can throw a MsPASSError for a number of conditions.
Caller should be enclosed in a handler if run on a large data set.
edit
- class mspasspy.algorithms.edit.Add(key, value_to_add=1)[source]
Bases:
MetadataOperator
Used to implement += operator on a specified Metadata key. Example: to add 2 to data, d, with key=’icdp’ could use this
op = Add(‘icdp’,2) d = op.apply(d)
- apply(d, apply_to_members=False)[source]
Implementations are required to implement this method. It must accept a MsPASS data object, d, (TimeSeries, Seismogram, TimeSeriesEnsemble, or SeimogramEnsemble) and apply the operator the implementation defines to the Metadata of d. It should throw a MsPASSError exception if d is not one of the expected data types. It should, on the other hand, handle d==None gracefully and do nothing if d is None. Ensemble handlers need to deal with an ambiguity in which metadata the operation refers to. Both the overall ensemble container and individual members have (normally independent) Metadata containers. Implementations of this method should have a apply_to_members argument as in the signature for this base class. When True the editing is done by a loop over all members while if False (which should be the default) the ensemble’s container should be used.
This method must return an (potentially but not guarnateed) edited version of d. That is essential to allow this method to be used as the function in a parallel map operator.
If a required metadata key is missing this method should do nothing unless a verbose flag is set. In that case it should log that as an error. Best practice for any use of operators based on this base class is to apply a kill operator for (MetadataDefined) on all data before running a calculator. That way you can guarantee the data needed for all operations is valid before trying to do calculations on on required metadata. There is some overhead in validating metadata so all implementations should include use of “fast_mode”. When set true the safeties will be bypassed for speed. That includes at least the two requried methods “check_keys” and “check_operation”. Implementations may add other safties.
- check_keys(d)[source]
checks that the key defined for the operator is defined for d. Returns true if is defined and false if not. the method assumes d is a valid child of Metadata so the is_defined method will not generate an exception. that means this method should ALWAYS be called after a test with _input_is_valid.
- class mspasspy.algorithms.edit.Add2(key0, key1, key2)[source]
Bases:
MetadataOperator
Used to implement + operator that adds two Metadata attributes together. The attributes are feteched with two keys set when the operator is constructed. Let a be the value in a datum associated with key2 (arg1 of constructor) and b be the value associated with key2 (arg2 of constructor). The apply method of this class computes a+b and sets the Metadata attribute defined by key0 (arg0 of constructor) to that value (a+b) i.e. d[key0] = d[key1] + d[key2]
Note key0 can be the same as either key1 or key2. The contents of the left hand side (key0) are always set by this operator unless that input was previously marked dead. Further key1 and key2 can be the same although it is hard to conceive how that could be useful.
Example: to compute ix as the d[‘sx’] + d[‘chan’] use
op = Add2(‘ix1’,’sx’,’chan’) d = op.apply(d)
- apply(d, apply_to_members=False)[source]
Implementations are required to implement this method. It must accept a MsPASS data object, d, (TimeSeries, Seismogram, TimeSeriesEnsemble, or SeimogramEnsemble) and apply the operator the implementation defines to the Metadata of d. It should throw a MsPASSError exception if d is not one of the expected data types. It should, on the other hand, handle d==None gracefully and do nothing if d is None. Ensemble handlers need to deal with an ambiguity in which metadata the operation refers to. Both the overall ensemble container and individual members have (normally independent) Metadata containers. Implementations of this method should have a apply_to_members argument as in the signature for this base class. When True the editing is done by a loop over all members while if False (which should be the default) the ensemble’s container should be used.
This method must return an (potentially but not guarnateed) edited version of d. That is essential to allow this method to be used as the function in a parallel map operator.
If a required metadata key is missing this method should do nothing unless a verbose flag is set. In that case it should log that as an error. Best practice for any use of operators based on this base class is to apply a kill operator for (MetadataDefined) on all data before running a calculator. That way you can guarantee the data needed for all operations is valid before trying to do calculations on on required metadata. There is some overhead in validating metadata so all implementations should include use of “fast_mode”. When set true the safeties will be bypassed for speed. That includes at least the two requried methods “check_keys” and “check_operation”. Implementations may add other safties.
- check_keys(d)[source]
checks that the keys required for the operator are defined for d. If either required key are missing from d return False. Return True if both are set in d. The method assumes d is a valid child of Metadata so that the is_defined method will not generate an exception. that means this method should ALWAYS be called after a test with _input_is_valid.
- class mspasspy.algorithms.edit.ChangeKey(oldkey, newkey, erase_old=True)[source]
Bases:
MetadataOperator
- apply(d, apply_to_members=False, fast_mode=False, verbose=False)[source]
Implementations are required to implement this method. It must accept a MsPASS data object, d, (TimeSeries, Seismogram, TimeSeriesEnsemble, or SeimogramEnsemble) and apply the operator the implementation defines to the Metadata of d. It should throw a MsPASSError exception if d is not one of the expected data types. It should, on the other hand, handle d==None gracefully and do nothing if d is None. Ensemble handlers need to deal with an ambiguity in which metadata the operation refers to. Both the overall ensemble container and individual members have (normally independent) Metadata containers. Implementations of this method should have a apply_to_members argument as in the signature for this base class. When True the editing is done by a loop over all members while if False (which should be the default) the ensemble’s container should be used.
This method must return an (potentially but not guarnateed) edited version of d. That is essential to allow this method to be used as the function in a parallel map operator.
If a required metadata key is missing this method should do nothing unless a verbose flag is set. In that case it should log that as an error. Best practice for any use of operators based on this base class is to apply a kill operator for (MetadataDefined) on all data before running a calculator. That way you can guarantee the data needed for all operations is valid before trying to do calculations on on required metadata. There is some overhead in validating metadata so all implementations should include use of “fast_mode”. When set true the safeties will be bypassed for speed. That includes at least the two requried methods “check_keys” and “check_operation”. Implementations may add other safties.
- class mspasspy.algorithms.edit.Divide(key, value_to_divide=1)[source]
Bases:
MetadataOperator
Used to implement /= operator on a specified Metadata key. Example: to divide metadata in, d, with key=’Pamp’ by 2.0 you could use this
op = Divide(‘Pamp’,2.-) d = op.apply(d)
- apply(d, apply_to_members=False)[source]
Implementations are required to implement this method. It must accept a MsPASS data object, d, (TimeSeries, Seismogram, TimeSeriesEnsemble, or SeimogramEnsemble) and apply the operator the implementation defines to the Metadata of d. It should throw a MsPASSError exception if d is not one of the expected data types. It should, on the other hand, handle d==None gracefully and do nothing if d is None. Ensemble handlers need to deal with an ambiguity in which metadata the operation refers to. Both the overall ensemble container and individual members have (normally independent) Metadata containers. Implementations of this method should have a apply_to_members argument as in the signature for this base class. When True the editing is done by a loop over all members while if False (which should be the default) the ensemble’s container should be used.
This method must return an (potentially but not guarnateed) edited version of d. That is essential to allow this method to be used as the function in a parallel map operator.
If a required metadata key is missing this method should do nothing unless a verbose flag is set. In that case it should log that as an error. Best practice for any use of operators based on this base class is to apply a kill operator for (MetadataDefined) on all data before running a calculator. That way you can guarantee the data needed for all operations is valid before trying to do calculations on on required metadata. There is some overhead in validating metadata so all implementations should include use of “fast_mode”. When set true the safeties will be bypassed for speed. That includes at least the two requried methods “check_keys” and “check_operation”. Implementations may add other safties.
- check_keys(d)[source]
checks that the key defined for the operator is defined for d. Returns true if is defined and false if not. the method assumes d is a valid child of Metadata so the is_defined method will not generate an exception. that means this method should ALWAYS be called after a test with _input_is_valid.
- class mspasspy.algorithms.edit.Divide2(key0, key1, key2)[source]
Bases:
MetadataOperator
Used to implement / operator that divides two Metadata attributes. The attributes are feteched with two keys set when the operator is constructed. Let a be the value in a datum associated with key2 (arg1 of constructor) and b be the value associated with key2 (arg2 of constructor). The apply method of this class computes a/b and sets the Metadata attribute defined by key0 (arg0 of constructor) to that value (a/b) i.e. d[key0] = d[key1] / d[key2]
Note key0 can be the same as either key1 or key2. The contents of the left hand side (key0) are always set by this operator unless that input was previously marked dead. Further key1 and key2 can be the same although it is hard to conceive how that could be useful.
Example: to compute ix as the d[‘sx’] / d[‘chan’] use
op = Divide2(‘ix1’,’sx’,’chan’) d = op.apply(d)
- apply(d, apply_to_members=False)[source]
Implementations are required to implement this method. It must accept a MsPASS data object, d, (TimeSeries, Seismogram, TimeSeriesEnsemble, or SeimogramEnsemble) and apply the operator the implementation defines to the Metadata of d. It should throw a MsPASSError exception if d is not one of the expected data types. It should, on the other hand, handle d==None gracefully and do nothing if d is None. Ensemble handlers need to deal with an ambiguity in which metadata the operation refers to. Both the overall ensemble container and individual members have (normally independent) Metadata containers. Implementations of this method should have a apply_to_members argument as in the signature for this base class. When True the editing is done by a loop over all members while if False (which should be the default) the ensemble’s container should be used.
This method must return an (potentially but not guarnateed) edited version of d. That is essential to allow this method to be used as the function in a parallel map operator.
If a required metadata key is missing this method should do nothing unless a verbose flag is set. In that case it should log that as an error. Best practice for any use of operators based on this base class is to apply a kill operator for (MetadataDefined) on all data before running a calculator. That way you can guarantee the data needed for all operations is valid before trying to do calculations on on required metadata. There is some overhead in validating metadata so all implementations should include use of “fast_mode”. When set true the safeties will be bypassed for speed. That includes at least the two requried methods “check_keys” and “check_operation”. Implementations may add other safties.
- check_keys(d)[source]
checks that the keys required for the operator are defined for d. If either required key are missing from d return False. Return True if both are set in d. The method assumes d is a valid child of Metadata so that the is_defined method will not generate an exception. that means this method should ALWAYS be called after a test with _input_is_valid.
- class mspasspy.algorithms.edit.Executioner[source]
Bases:
ABC
Abstract base class for family of python classes used for killing mspass data objects failing to pass a particular metric. It is made abstract to define required methods a particular instance must create. As in any good OOP that also means subclasses can add additional methods. This class should be used only as base class as it has no functionality by itself.
- edit_ensemble_members(ensemble)[source]
Subclasses should call this method if the input data are an ensemble. A trick of inheritance allows the algorithm of self to then be applied to whole ensemble. Putting this in the base class avoids the duplication of duplicate code in all subclasses.
- abstract kill_if_true(d)[source]
This method should run a test on d that will call the kill method on MsPASS atomic object d if the test defined by the implementation fails. This is the main working method of this class of function objects.
- log_kill(d, testname, message, severity=<ErrorSeverity.Informational: 5>)[source]
This base class method is used to standardize the error logging functionality of all Executioners. It writes a standardized message to simplify writing of subclasses - they need only define the testname (normally the name of the subclass) and format a specific message to be posted.
Note most subclasses will may want to include a verbose option (or the reciprocal silent) and only write log messages when verbose is set true.
- Parameters:
d – MsPASS data object to which elog message is to be written.
testname – is the string assigned to the “algorithm” field
of the message posted to d.elog. :param message: specialized message to post - this string is added to an internal generic message. :param severity: ErrorSeverity to assign to elog message (See ErrorLogger docstring). Default is Informational
- class mspasspy.algorithms.edit.FiringSquad(executioner_list)[source]
Bases:
Executioner
Used to apply multiple Executioners in a single pass - hence the name FiringSquare image; facing more than one thing that could kill you. The implementation in kill_if_true iterates through a list of Executioners. Once the datum is killed it is immediately returned. If the Executions are running in verbose mode that means some tests can shadow others. It is like a firing squad where the guns are fired in sequence and the body is removed as soon as there is a hit. The victim walks away only if all the guns miss.
Note the class has a += operator to allow appending additional tests to the chain.
- kill_if_true(d, apply_to_members=False)[source]
Implementation of base class method. In this case failure is defined as not passing one of the set of tests loaded when the object was created. As noted earlier the tests are performed in the same order they were passed to the constructor of added on with the += operator. :param d: is a mspass data object to be checked
- class mspasspy.algorithms.edit.IntegerDivide(key, value=1)[source]
Bases:
MetadataOperator
Used to implement // operator on a specified Metadata key. The // operator is a bit obscure but it implements the common need to truncate a division result to an integer. This will work on floats but the result will always be close to and integer value as if the operation were done with integers. Note also IntegerDivide is the complement to Mod which returns the remainder of such a division.
Example: to apply integer division to metadata in, d, with key=’icdp’ by 5 you could use this
op = IntegerDivide(‘icdp’,5) d = op.apply(d)
- apply(d, apply_to_members=False)[source]
Implementations are required to implement this method. It must accept a MsPASS data object, d, (TimeSeries, Seismogram, TimeSeriesEnsemble, or SeimogramEnsemble) and apply the operator the implementation defines to the Metadata of d. It should throw a MsPASSError exception if d is not one of the expected data types. It should, on the other hand, handle d==None gracefully and do nothing if d is None. Ensemble handlers need to deal with an ambiguity in which metadata the operation refers to. Both the overall ensemble container and individual members have (normally independent) Metadata containers. Implementations of this method should have a apply_to_members argument as in the signature for this base class. When True the editing is done by a loop over all members while if False (which should be the default) the ensemble’s container should be used.
This method must return an (potentially but not guarnateed) edited version of d. That is essential to allow this method to be used as the function in a parallel map operator.
If a required metadata key is missing this method should do nothing unless a verbose flag is set. In that case it should log that as an error. Best practice for any use of operators based on this base class is to apply a kill operator for (MetadataDefined) on all data before running a calculator. That way you can guarantee the data needed for all operations is valid before trying to do calculations on on required metadata. There is some overhead in validating metadata so all implementations should include use of “fast_mode”. When set true the safeties will be bypassed for speed. That includes at least the two requried methods “check_keys” and “check_operation”. Implementations may add other safties.
- check_keys(d)[source]
checks that the key defined for the operator is defined for d. Returns true if is defined and false if not. the method assumes d is a valid child of Metadata so the is_defined method will not generate an exception. that means this method should ALWAYS be called after a test with _input_is_valid.
- class mspasspy.algorithms.edit.IntegerDivide2(key0, key1, key2)[source]
Bases:
MetadataOperator
Used to implement // operator between two Metadata attributes. The attributes are feteched with two keys set when the operator is constructed. Let a be the value in a datum associated with key2 (arg1 of constructor) and b be the value associated with key2 (arg2 of constructor). The apply method of this class computes a//b and sets the Metadata attribute defined by key0 (arg0 of constructor) to that value (a+b) i.e. d[key0] = d[key1] // d[key2]
Note key0 can be the same as either key1 or key2. The contents of the left hand side (key0) are always set by this operator unless that input was previously marked dead. Further key1 and key2 can be the same although it is hard to conceive how that could be useful.
Example: to compute ix as the d[‘sx’] // d[‘chan’] use
op = IntegerDivide2(‘ix1’,’sx’,’chan’) d = op.apply(d)
- apply(d, apply_to_members=False)[source]
Implementations are required to implement this method. It must accept a MsPASS data object, d, (TimeSeries, Seismogram, TimeSeriesEnsemble, or SeimogramEnsemble) and apply the operator the implementation defines to the Metadata of d. It should throw a MsPASSError exception if d is not one of the expected data types. It should, on the other hand, handle d==None gracefully and do nothing if d is None. Ensemble handlers need to deal with an ambiguity in which metadata the operation refers to. Both the overall ensemble container and individual members have (normally independent) Metadata containers. Implementations of this method should have a apply_to_members argument as in the signature for this base class. When True the editing is done by a loop over all members while if False (which should be the default) the ensemble’s container should be used.
This method must return an (potentially but not guarnateed) edited version of d. That is essential to allow this method to be used as the function in a parallel map operator.
If a required metadata key is missing this method should do nothing unless a verbose flag is set. In that case it should log that as an error. Best practice for any use of operators based on this base class is to apply a kill operator for (MetadataDefined) on all data before running a calculator. That way you can guarantee the data needed for all operations is valid before trying to do calculations on on required metadata. There is some overhead in validating metadata so all implementations should include use of “fast_mode”. When set true the safeties will be bypassed for speed. That includes at least the two requried methods “check_keys” and “check_operation”. Implementations may add other safties.
- check_keys(d)[source]
checks that the keys required for the operator are defined for d. If either required key are missing from d return False. Return True if both are set in d. The method assumes d is a valid child of Metadata so that the is_defined method will not generate an exception. that means this method should ALWAYS be called after a test with _input_is_valid.
- class mspasspy.algorithms.edit.MetadataDefined(key, verbose=False)[source]
Bases:
Executioner
Implementation of Executioner using an existence test. The constructor loads only a key string. The test for the kill_if_true method is then simply for the existence of a value associated with the loaded key. Data will be killed if the defined key exists in the Metadata (header).
- class mspasspy.algorithms.edit.MetadataEQ(key, value, verbose=False)[source]
Bases:
Executioner
Implementation of Executioner using an equality test of a Metadata attribute. Both the metadata key and the threshold for the kill test are set on creation. This implementation should work on any value pairs for which the == operator in python works. That is true for pairs of numeric types, for example, but will fail if one of the pair is a string (an error anyway for any rational use of this). It should also work for any pair of pyobjects for which operator >= is defined, although anything nonstandard should be tested carefully before using this class for editing data with such nonstandard types. This was mostly intended for simple numeric attributes. It can be used for booleans even though few would write an if statement testing if two booleans were equal.
- class mspasspy.algorithms.edit.MetadataGE(key, value, verbose=False)[source]
Bases:
Executioner
Implementation of Executioner using a greater than or equal test of a Metadata attribute. Both the metadata key and the threshold for the kill test are set on creation. This implementation should work on any value pairs for which the >= operator in python works. That is true for pairs of numeric types, for example, but will fail if one of the pair is a string (an error anyway for any rational use of this). It should also work for any pair of pyobjects for which operator >= is defined, although anything nonstandard should be tested carefully before using this class for editing data with such nonstandard types. This was mostly intended for simple numeric attributes.
- class mspasspy.algorithms.edit.MetadataGT(key, value, verbose=False)[source]
Bases:
Executioner
Implementation of Executioner using a greater than test of a Metadata attribute. Both the metadata key and the threshold for the kill test are set on creation. This implementation should work on any value pairs for which the > operator in python works. That is true for pairs of numeric types, for example, but will fail if one of the pair is a string (an error anyway for any rational use of this). It should also work for any pair of pyobjects for which operator > is defined, although anything nonstandard should be tested carefully before using this class for editing data with such nonstandard types. This was mostly intended for simple numeric attributes.
- class mspasspy.algorithms.edit.MetadataInterval(key, lower_endpoint, upper_endpoint, use_lower_edge=True, use_upper_edge=True, kill_if_outside=True, verbose=False)[source]
Bases:
Executioner
Implementation of Executioner based on a numeric range of values. i.e. it tests if a metadata value is in a range defined by an upper and lower value. Ranges have a minor complication in handling the edge condition: should or should the test not include the edge? Rather than write different functions for the four possible combinations of <=, <, >=, and > that define a range we use constructor arguments (use_lower_edge and use_upper_edge) to turn inclusion of the boundary on and off.
In this context an interval test also has two logical alternatives. One may want to keep data inside an interval (most common and default) or delete data within a specified interval. That logic is controlled by the kill_if_outside boolean
Intervals mostly make sense only for numeric types (int and float), but can be used with strings. In reality this function should work with any object for which the operators >, <, >=, and >= are defined but that is adventure land if you try.
- class mspasspy.algorithms.edit.MetadataLE(key, value, verbose=False)[source]
Bases:
Executioner
Implementation of Executioner using a less than or equal test of a Metadata attribute. Both the metadata key and the threshold for the kill test are set on creation. This implementation should work on any value pairs for which the <= operator in python works. That is true for pairs of numeric types, for example, but will fail if one of the pair is a string (an error anyway for any rational use of this). It should also work for any pair of pyobjects for which operator >= is defined, although anything nonstandard should be tested carefully before using this class for editing data with such nonstandard types. This was mostly intended for simple numeric attributes.
- class mspasspy.algorithms.edit.MetadataLT(key, value, verbose=False)[source]
Bases:
Executioner
Implementation of Executioner using a lessthan test of a Metadata attribute. Both the metadata key and the threshold for the kill test are set on creation. This implementation should work on any value pairs for which the < operator in python works. That is true for pairs of numeric types, for example, but will fail if one of the pair is a string (an error anyway for any rational use of this). It should also work for any pair of pyobjects for which operator < is defined, although anything nonstandard should be tested carefully before using this class for editing data with such nonstandard types. This was mostly intended for simple numeric attributes.
- class mspasspy.algorithms.edit.MetadataNE(key, value, verbose=False)[source]
Bases:
Executioner
Implementation of Executioner using an not equal (NE) test of a Metadata attribute. Both the metadata key and the threshold for the kill test are set on creation. This implementation should work on any value pairs for which the != operator in python works. That is true for pairs of numeric types, for example, but will fail if one of the pair is a string (an error anyway for any rational use of this). It should also work for any pair of pyobjects for which operator >= is defined, although anything nonstandard should be tested carefully before using this class for editing data with such nonstandard types. This was mostly intended for simple numeric attributes. It can be used for booleans even though few would write an if statement testing if two booleans were equal.
- class mspasspy.algorithms.edit.MetadataOperator[source]
Bases:
ABC
Base class for a set of inline Metadata editors. That is, there are many instances where Metadata attributes need to be altered during a workflow where it is either unnecessary or inappropriate to access the database. e.g. we have found updates in MongoDB can be very slow if done one transaction at a time so it can streamline processing to edit metadata on the fly. This base class defines the API for a suite of tools for editing metadata on the fly.
- abstract apply(d, apply_to_members=False, fast_mode=False)[source]
Implementations are required to implement this method. It must accept a MsPASS data object, d, (TimeSeries, Seismogram, TimeSeriesEnsemble, or SeimogramEnsemble) and apply the operator the implementation defines to the Metadata of d. It should throw a MsPASSError exception if d is not one of the expected data types. It should, on the other hand, handle d==None gracefully and do nothing if d is None. Ensemble handlers need to deal with an ambiguity in which metadata the operation refers to. Both the overall ensemble container and individual members have (normally independent) Metadata containers. Implementations of this method should have a apply_to_members argument as in the signature for this base class. When True the editing is done by a loop over all members while if False (which should be the default) the ensemble’s container should be used.
This method must return an (potentially but not guarnateed) edited version of d. That is essential to allow this method to be used as the function in a parallel map operator.
If a required metadata key is missing this method should do nothing unless a verbose flag is set. In that case it should log that as an error. Best practice for any use of operators based on this base class is to apply a kill operator for (MetadataDefined) on all data before running a calculator. That way you can guarantee the data needed for all operations is valid before trying to do calculations on on required metadata. There is some overhead in validating metadata so all implementations should include use of “fast_mode”. When set true the safeties will be bypassed for speed. That includes at least the two requried methods “check_keys” and “check_operation”. Implementations may add other safties.
- abstract check_keys(d)[source]
All implementation should implement this checker even if all it contains is a pass. It should validate the keys are defined in data to be handled in the apply method. An early call in apply should always be to call this method.
- abstract check_operation(d)[source]
All implementations should implement this method even if they choose to ignore it. It should be used to guarantee the operator the class defines will succeed on the datum sent to the apply method. e.g. any standard arithmetic operations will throw a TypeError if one of the operands is a string. This method should be used to make the operator as bombproof as possible logging a message to the datum rather than aborting if there is an issue. Some classes may want to implement this as pass because it makes no sense - e.g. setting a constant value. Those are the exception, however, so the api dogmatically demands these be implemented even if they do nothing.
- edit_ensemble_members(ensemble)[source]
Subclasses should call this method if the input data are an ensemble. A trick of inheritance allows the algorithm of self to then be applied to whole ensemble. Putting this in the base class avoids the duplication of duplicate code in all subclasses.
- log_edit(d, testname, message, severity=<ErrorSeverity.Informational: 5>)[source]
This base class method is used to standardize the error logging functionality of all editors. It writes a standardized message to simplify writing of subclasses - they need only define the testname (normally the name of the subclass) and format a specific message to be posted.
Note most subclasses will may want to include a verbose option (or the reciprocal silent) and only write log messages when verbose is set true.
- Parameters:
d – MsPASS data object to which elog message is to be written.
testname – is the string assigned to the “algorithm” field
of the message posted to d.elog. :param message: specialized message to post - this string is added to an internal generic message. :param severity: ErrorSeverity to assign to elog message (See ErrorLogger docstring). Default is Informational
- class mspasspy.algorithms.edit.MetadataOperatorChain(operator_list)[source]
Bases:
MetadataOperator
Used to apply multiple a chain of arithmetic operators to derive computed metadata attributes. Very elaborate calculations can be done through this class by chaining appropriate atomic operators defined elsewhere in the module (i.e. Add, Subtract, etc.). The operation chain is defined by a python list of the atomic operators. When the apply method of this class is called the list of operators are applied sequentially in list order.
Note the class has a += operator to allow appending additional operators to the chain.
- apply(d, apply_to_members=False)[source]
Implementation of base class method. In this case failure is defined as not passing one of the set of tests loaded when the object was created. As noted earlier the tests are performed in the same order they were passed to the constructor of added on with the += operator. :param d: is a mspass data object to be checked
- check_keys(d)[source]
All implementation should implement this checker even if all it contains is a pass. It should validate the keys are defined in data to be handled in the apply method. An early call in apply should always be to call this method.
- check_operation(d)[source]
All implementations should implement this method even if they choose to ignore it. It should be used to guarantee the operator the class defines will succeed on the datum sent to the apply method. e.g. any standard arithmetic operations will throw a TypeError if one of the operands is a string. This method should be used to make the operator as bombproof as possible logging a message to the datum rather than aborting if there is an issue. Some classes may want to implement this as pass because it makes no sense - e.g. setting a constant value. Those are the exception, however, so the api dogmatically demands these be implemented even if they do nothing.
- class mspasspy.algorithms.edit.MetadataUndefined(key, verbose=False)[source]
Bases:
Executioner
Implementation of Executioner using an nonexistence test. The constructor loads only a key string. The test for the kill_if_true method is then simply for the existence of a value associated with the loaded key. Data will be killed if the defined key does not exists (Undefined) in the Metadata (header).
This class is a useful prefilter to apply to any algorithm that requires a particular metadata attribute. Use FiringSquad to define a chain of required metadata to prefilter data input to such an algorithm.
- class mspasspy.algorithms.edit.Mod(key, value=1)[source]
Bases:
MetadataOperator
Used to implement % operator on a specified Metadata key. The % operator is a bit obscure but it implements the common need return the remainder of a divide operation. It is commonly used, for example, in cmp processing where survey flag numbers can often be converted to channel numbers for simple multichannel cable geometries.
This operator will work any numeric type but it is most commonly used for integer attributes.
Example: to convert the metadata associated with the key ‘ichan’ that are currently counting by 1 to numbers that cycle from 0 to 23 us this:
op = Mod(‘ix’,24) d = op.apply(d)
- apply(d, apply_to_members=False)[source]
Implementations are required to implement this method. It must accept a MsPASS data object, d, (TimeSeries, Seismogram, TimeSeriesEnsemble, or SeimogramEnsemble) and apply the operator the implementation defines to the Metadata of d. It should throw a MsPASSError exception if d is not one of the expected data types. It should, on the other hand, handle d==None gracefully and do nothing if d is None. Ensemble handlers need to deal with an ambiguity in which metadata the operation refers to. Both the overall ensemble container and individual members have (normally independent) Metadata containers. Implementations of this method should have a apply_to_members argument as in the signature for this base class. When True the editing is done by a loop over all members while if False (which should be the default) the ensemble’s container should be used.
This method must return an (potentially but not guarnateed) edited version of d. That is essential to allow this method to be used as the function in a parallel map operator.
If a required metadata key is missing this method should do nothing unless a verbose flag is set. In that case it should log that as an error. Best practice for any use of operators based on this base class is to apply a kill operator for (MetadataDefined) on all data before running a calculator. That way you can guarantee the data needed for all operations is valid before trying to do calculations on on required metadata. There is some overhead in validating metadata so all implementations should include use of “fast_mode”. When set true the safeties will be bypassed for speed. That includes at least the two requried methods “check_keys” and “check_operation”. Implementations may add other safties.
- check_keys(d)[source]
checks that the key defined for the operator is defined for d. Returns true if is defined and false if not. the method assumes d is a valid child of Metadata so the is_defined method will not generate an exception. that means this method should ALWAYS be called after a test with _input_is_valid.
- class mspasspy.algorithms.edit.Mod2(key0, key1, key2)[source]
Bases:
MetadataOperator
Used to implement % operator between two Metadata attributes. The attributes are feteched with two keys set when the operator is constructed. Let a be the value in a datum associated with key2 (arg1 of constructor) and b be the value associated with key2 (arg2 of constructor). The apply method of this class computes a+b and sets the Metadata attribute defined by key0 (arg0 of constructor) to that value (a%b) i.e. d[key0] = d[key1] % d[key2]
Note key0 can be the same as either key1 or key2. The contents of the left hand side (key0) are always set by this operator unless that input was previously marked dead. Further key1 and key2 can be the same although it is hard to conceive how that could be useful.
Example: to compute ix as the d[‘sx’] % d[‘chan’] use
op = Mod2(‘ix1’,’sx’,’chan’) d = op.apply(d)
- apply(d, apply_to_members=False)[source]
Implementations are required to implement this method. It must accept a MsPASS data object, d, (TimeSeries, Seismogram, TimeSeriesEnsemble, or SeimogramEnsemble) and apply the operator the implementation defines to the Metadata of d. It should throw a MsPASSError exception if d is not one of the expected data types. It should, on the other hand, handle d==None gracefully and do nothing if d is None. Ensemble handlers need to deal with an ambiguity in which metadata the operation refers to. Both the overall ensemble container and individual members have (normally independent) Metadata containers. Implementations of this method should have a apply_to_members argument as in the signature for this base class. When True the editing is done by a loop over all members while if False (which should be the default) the ensemble’s container should be used.
This method must return an (potentially but not guarnateed) edited version of d. That is essential to allow this method to be used as the function in a parallel map operator.
If a required metadata key is missing this method should do nothing unless a verbose flag is set. In that case it should log that as an error. Best practice for any use of operators based on this base class is to apply a kill operator for (MetadataDefined) on all data before running a calculator. That way you can guarantee the data needed for all operations is valid before trying to do calculations on on required metadata. There is some overhead in validating metadata so all implementations should include use of “fast_mode”. When set true the safeties will be bypassed for speed. That includes at least the two requried methods “check_keys” and “check_operation”. Implementations may add other safties.
- check_keys(d)[source]
checks that the keys required for the operator are defined for d. If either required key are missing from d return False. Return True if both are set in d. The method assumes d is a valid child of Metadata so that the is_defined method will not generate an exception. that means this method should ALWAYS be called after a test with _input_is_valid.
- class mspasspy.algorithms.edit.Multiply(key, value_to_multiply=1)[source]
Bases:
MetadataOperator
Used to implement *= operator on a specified Metadata key. Example: to multiple metadata in, d, with key=’Pamp’ by 2.5 you could use this
op = Multiply(‘Pamp’,2.5) d = op.apply(d)
- apply(d, apply_to_members=False)[source]
Implementations are required to implement this method. It must accept a MsPASS data object, d, (TimeSeries, Seismogram, TimeSeriesEnsemble, or SeimogramEnsemble) and apply the operator the implementation defines to the Metadata of d. It should throw a MsPASSError exception if d is not one of the expected data types. It should, on the other hand, handle d==None gracefully and do nothing if d is None. Ensemble handlers need to deal with an ambiguity in which metadata the operation refers to. Both the overall ensemble container and individual members have (normally independent) Metadata containers. Implementations of this method should have a apply_to_members argument as in the signature for this base class. When True the editing is done by a loop over all members while if False (which should be the default) the ensemble’s container should be used.
This method must return an (potentially but not guarnateed) edited version of d. That is essential to allow this method to be used as the function in a parallel map operator.
If a required metadata key is missing this method should do nothing unless a verbose flag is set. In that case it should log that as an error. Best practice for any use of operators based on this base class is to apply a kill operator for (MetadataDefined) on all data before running a calculator. That way you can guarantee the data needed for all operations is valid before trying to do calculations on on required metadata. There is some overhead in validating metadata so all implementations should include use of “fast_mode”. When set true the safeties will be bypassed for speed. That includes at least the two requried methods “check_keys” and “check_operation”. Implementations may add other safties.
- check_keys(d)[source]
checks that the key defined for the operator is defined for d. Returns true if is defined and false if not. the method assumes d is a valid child of Metadata so the is_defined method will not generate an exception. that means this method should ALWAYS be called after a test with _input_is_valid.
- class mspasspy.algorithms.edit.Multiply2(key0, key1, key2)[source]
Bases:
MetadataOperator
Used to implement * operator that multiplies two Metadata attributes together. The attributes are feteched with two keys set when the operator is constructed. Let a be the value in a datum associated with key2 (arg1 of constructor) and b be the value associated with key2 (arg2 of constructor). The apply method of this class computes a*b and sets the Metadata attribute defined by key0 (arg0 of constructor) to that value (a*b) i.e. d[key0] = d[key1] * d[key2]
Note key0 can be the same as either key1 or key2. The contents of the left hand side (key0) are always set by this operator unless that input was previously marked dead. Further key1 and key2 can be the same although it is hard to conceive how that could be useful.
Example: to compute ix as the d[‘sx’] * d[‘chan’] use
op = Multiply2(‘ix1’,’sx’,’chan’) d = op.apply(d)
- apply(d, apply_to_members=False)[source]
Implementations are required to implement this method. It must accept a MsPASS data object, d, (TimeSeries, Seismogram, TimeSeriesEnsemble, or SeimogramEnsemble) and apply the operator the implementation defines to the Metadata of d. It should throw a MsPASSError exception if d is not one of the expected data types. It should, on the other hand, handle d==None gracefully and do nothing if d is None. Ensemble handlers need to deal with an ambiguity in which metadata the operation refers to. Both the overall ensemble container and individual members have (normally independent) Metadata containers. Implementations of this method should have a apply_to_members argument as in the signature for this base class. When True the editing is done by a loop over all members while if False (which should be the default) the ensemble’s container should be used.
This method must return an (potentially but not guarnateed) edited version of d. That is essential to allow this method to be used as the function in a parallel map operator.
If a required metadata key is missing this method should do nothing unless a verbose flag is set. In that case it should log that as an error. Best practice for any use of operators based on this base class is to apply a kill operator for (MetadataDefined) on all data before running a calculator. That way you can guarantee the data needed for all operations is valid before trying to do calculations on on required metadata. There is some overhead in validating metadata so all implementations should include use of “fast_mode”. When set true the safeties will be bypassed for speed. That includes at least the two requried methods “check_keys” and “check_operation”. Implementations may add other safties.
- check_keys(d)[source]
checks that the keys required for the operator are defined for d. If either required key are missing from d return False. Return True if both are set in d. The method assumes d is a valid child of Metadata so that the is_defined method will not generate an exception. that means this method should ALWAYS be called after a test with _input_is_valid.
- class mspasspy.algorithms.edit.SetValue(key, constant_value=0)[source]
Bases:
MetadataOperator
Used to set a specified metadata key to a constant value. Note any existing value of Metadata associated with the key defined in the operator always be overwritten.
Example: to set the value of key = ‘a’ to constant 2.0
op = SetValue(‘a’,2.0) d = op.apply(d)
- apply(d, apply_to_members=False)[source]
Used to apply this operator to Metadata of a MsPASS data object. Use of decorator adds common MsPASS arguments as call options.
- Parameters:
d – datum to which operator is to be applied. d must be a
valid MsPASS data object or this method will throw a fatal MsPASSError exception. If d is marked dead it will be silently ignored. :param apply_to_members: when true and d is an ensemble object the operator is applied to the members. When false the metadata for the ensemble will be altered. This parameter is ignored for atomic data types. Default is False.
- Returns:
always returns a (usually) edited copy of the input d.
When the input d is dead the copy will always be unaltered. Note the copy is a shallow copy which in python means we just return the equivalent of a pointer to the caller. Important for efficiency as d can be very large for some ensembles.
- check_keys(d)[source]
Useless implementation of required abstract base method. In this case all it does is test if the stored value of key is a string. Returns true if the key is a string and false otherwise.
- check_operation(d)[source]
All implementations should implement this method even if they choose to ignore it. It should be used to guarantee the operator the class defines will succeed on the datum sent to the apply method. e.g. any standard arithmetic operations will throw a TypeError if one of the operands is a string. This method should be used to make the operator as bombproof as possible logging a message to the datum rather than aborting if there is an issue. Some classes may want to implement this as pass because it makes no sense - e.g. setting a constant value. Those are the exception, however, so the api dogmatically demands these be implemented even if they do nothing.
- class mspasspy.algorithms.edit.Subtract(key, value_to_subtract)[source]
Bases:
MetadataOperator
Used to implement -= operator on a specified Metadata key. Example: to subtract 2 from metadata, d, with key=’icdp’ could use this
op = Subtract(‘icdp’,2) d = op.apply(d)
- apply(d, apply_to_members=False)[source]
Implementations are required to implement this method. It must accept a MsPASS data object, d, (TimeSeries, Seismogram, TimeSeriesEnsemble, or SeimogramEnsemble) and apply the operator the implementation defines to the Metadata of d. It should throw a MsPASSError exception if d is not one of the expected data types. It should, on the other hand, handle d==None gracefully and do nothing if d is None. Ensemble handlers need to deal with an ambiguity in which metadata the operation refers to. Both the overall ensemble container and individual members have (normally independent) Metadata containers. Implementations of this method should have a apply_to_members argument as in the signature for this base class. When True the editing is done by a loop over all members while if False (which should be the default) the ensemble’s container should be used.
This method must return an (potentially but not guarnateed) edited version of d. That is essential to allow this method to be used as the function in a parallel map operator.
If a required metadata key is missing this method should do nothing unless a verbose flag is set. In that case it should log that as an error. Best practice for any use of operators based on this base class is to apply a kill operator for (MetadataDefined) on all data before running a calculator. That way you can guarantee the data needed for all operations is valid before trying to do calculations on on required metadata. There is some overhead in validating metadata so all implementations should include use of “fast_mode”. When set true the safeties will be bypassed for speed. That includes at least the two requried methods “check_keys” and “check_operation”. Implementations may add other safties.
- check_keys(d)[source]
checks that the key defined for the operator is defined for d. Returns true if is defined and false if not. the method assumes d is a valid child of Metadata so the is_defined method will not generate an exception. that means this method should ALWAYS be called after a test with _input_is_valid.
- class mspasspy.algorithms.edit.Subtract2(key0, key1, key2)[source]
Bases:
MetadataOperator
Used to implement - operator that computes the difference of two Metadata attributes. The attributes are feteched with two keys set when the operator is constructed. Let a be the value in a datum associated with key2 (arg1 of constructor) and b be the value associated with key2 (arg2 of constructor). The apply method of this class computes a-b and sets the Metadata attribute defined by key0 (arg0 of constructor) to that value (a-b) i.e. d[key0] = d[key1] - d[key2]
Note key0 can be the same as either key1 or key2. The contents of the left hand side (key0) are always set by this operator unless that input was previously marked dead. Further key1 and key2 can be the same although it is hard to conceive how that could be useful.
Example: to compute ix as the d[‘sx’] - d[‘chan’] use
op = Subtract2(‘ix1’,’sx’,’chan’) d = op.apply(d)
- apply(d, apply_to_members=False)[source]
Implementations are required to implement this method. It must accept a MsPASS data object, d, (TimeSeries, Seismogram, TimeSeriesEnsemble, or SeimogramEnsemble) and apply the operator the implementation defines to the Metadata of d. It should throw a MsPASSError exception if d is not one of the expected data types. It should, on the other hand, handle d==None gracefully and do nothing if d is None. Ensemble handlers need to deal with an ambiguity in which metadata the operation refers to. Both the overall ensemble container and individual members have (normally independent) Metadata containers. Implementations of this method should have a apply_to_members argument as in the signature for this base class. When True the editing is done by a loop over all members while if False (which should be the default) the ensemble’s container should be used.
This method must return an (potentially but not guarnateed) edited version of d. That is essential to allow this method to be used as the function in a parallel map operator.
If a required metadata key is missing this method should do nothing unless a verbose flag is set. In that case it should log that as an error. Best practice for any use of operators based on this base class is to apply a kill operator for (MetadataDefined) on all data before running a calculator. That way you can guarantee the data needed for all operations is valid before trying to do calculations on on required metadata. There is some overhead in validating metadata so all implementations should include use of “fast_mode”. When set true the safeties will be bypassed for speed. That includes at least the two requried methods “check_keys” and “check_operation”. Implementations may add other safties.
- check_keys(d)[source]
checks that the keys required for the operator are defined for d. If either required key are missing from d return False. Return True if both are set in d. The method assumes d is a valid child of Metadata so that the is_defined method will not generate an exception. that means this method should ALWAYS be called after a test with _input_is_valid.
- mspasspy.algorithms.edit.erase_metadata(d, keylist, apply_to_members=False)[source]
This editor clears the contents of any data associated with a list of Metadata keys. If there is no data for that key it will silently do nothing. If the input is an ensemble and apply_to_members is True all the algorithm will run on the metadata of each member in a loop. If false only the ensemble (global) metadata are handled.
- Parameters:
d – must be a valid MsPASS data object. If not the function
will throw an exception. If the datum is marked dead it will silently just return the contents.
- Parameters:
keylist – python list of strings to use as keywords. Any matching
keys the metadata of d will be cleared.
- Parameters:
apply_to_members – is a boolean controlling how the function
should handle ensembles. Then set true the erase algorithm will be applied in a loop over the member vector of an input ensemble. When False (default) only the ensemble’s metadata is checked. This parameter is ignored if the input is an atomic data type.
- Returns:
edited copy of d
resample
- class mspasspy.algorithms.resample.BasicResampler(dt=None, sampling_rate=None)[source]
Bases:
ABC
Base class for family of resampling operators. All this class really does is define the interface and standardize the target of the operator. A key concept of this family of operator is they are intended to be used in a map operator to regularize the sample rate to a constant. Hence, the base class defines that constant output sample rate (alternatively the sample interval, dt).
- ALL implementations must recognize a couple fundamental concepts:
This is intended to ONLY be used on waveform segments. The problem of resampling continuous data requires different algorithms. The reason is boundary effects. All implementations are subject to edge transients. How the implementation does or does not handle that issue is viewed as an implementation detail,
Downsampling ALWAYS requires some method to avoid aliasing of the output. How that is done is considered an implementation detail.
This is a sketch of an algorithm is pseudopython code showing how a typical instance of this class (in the example ScipyResampler) would be used in parallel workflow sketch:
Example
resamp_op = ScipyResampler(10.0) # target sample rate of 10 sps cursor = db.TimeSeries.find({}) bag = read_distributed_data(cursor,collection=”wf_TimeSeries”) bag = bag.map(resamp_op.resample) bag.map(db.save_data()) bag.compute()
A point to emphasize is the model is to generate the operator through a constructor (ScipResampler in the example) that defines the target sample rate. All data passed through that operator through the map operator call will be returned to create a bag/rdd with a uniform sample rate. All implementations should also follow the MsPASS rule for parallel algorithms to kill data that cannot be handled and not throw exceptions unless the whole usage is wrong.
- dec_factor(d)[source]
All implementations of decimation algorithms should use this method to test if resampling by decimation is feasible. A decimation operator only works for downsampling with the restriction that the ouptut sample interval is an integer multiple of the input sample interval.
The function returns a decimation factor to use for atomic data d being tested. The algorithm uses the numpy “isclose” function to establish if the sample interval is feasible. If so it returns the decimation factor as an integer that should be used on d. Not implementations should handle a return of 1 specially for efficiency. A return of 1 means no resampling is needed. A return of 0 or -1 is used for two slightly different cases that indicate a decimation operation is not feasible. A return of 0 should be taken as a signal that the data requires upsampling to match the target sampling rate (interval). A return of -1 means the data require downsampling but the a decimator operator is not feasible. (e.g. needing to create 10 sps data from 25 sps input.)
- Parameters:
d – input mspass data object to be tested. All that is
actually required is d have a “dt” attribute (i.e. d.dt is defined) that is the sample interval for that datum. :type d: assumed to be a MsPASS atomic data object for which the dt attribute is defined. This method has no safeties to test input type. It will throw an exception if d.dt does not resolve.
- class mspasspy.algorithms.resample.ScipyDecimator(sampling_rate, n=None, ftype='iir', zero_phase=True)[source]
Bases:
BasicResampler
This class defines a generic operator where a decimator can be used to downsample any input data to a common sample rate. A decimator requires the ratio of the input to output sample rate to be an integer (equivalently the ratio of the output sample interval to the input sample interval). The operator will fail on any data it receives that are irregular in that sense. For example, 10 sps data can be created by downsampling 40 sps data by a factor of 4. In constract, 10 sps can not be created by decimation of 25 sps data because the ratio is 2.5 (not an integer).
This operator satisfies the concept defined in BasicResampler. That is, a particular concrete instance once constructed will define an operator that will resample any input data to a common sample rate/interval. Because of the requirement of this algorithm that the sample intervals/rates are related by integers the operator has to be able to handle irregular sample rate data. The algorithm is brutal and will kill any datum for which the integer test fails and post an elog message.
This operator is really little more than a wrapper around a scipy function with a similar same name (scipy.signal.decimate). The resample method handles any supported MsPASS data object type but will fail with a TypeError if it receives any other data type.
Be aware decimators all have unavoidable edge effects. The anitialias filter that has to be applied (you can get garbage otherwise) will always produce an edge transient. A key to success with any downsampling operator is to always have a pad zone if possible. That is, you start with a longer time window than you will need for final processing and discard the pad zone when you enter the final stage. Note that is actually true of ALL filtering.
The constructor has almost the same arguments defined in the documentation page for scipy.signal.decimate. The only exception is the axis argument. It needs to be defined internally. For scalar data we pass 0 while for three component data we sent it 1 which is means the decimator is applied per channel.
- resample(mspass_object)[source]
Implementation of required abstract method for this operator. The only argument is mspass_object. The operator will downsample the contents of the sample data container for any valid input. If the input is not a mspass data object (i.e. atomic TimeSeries or Seismogram) or one of the enemble objects it will throw a TypeError exception.
Note for ensembles the algorithm simply applies this method in a loop over all the members of the ensemble. Be aware that any members of the ensemble cannot be resampled to the target sampling frequency (interval) they will be killed. For example, if you are downsampling to 20 sps and you have 25 sps data in the ensemble the 25 sps data will be killed on output with an elog message posted.
Returns an edited clone of the input with revised sample data but no changes to any Metadata.
- class mspasspy.algorithms.resample.ScipyResampler(sampling_rate, window='hann')[source]
Bases:
BasicResampler
This class is a wrapper for the scipy resample algorithm. Obspy users should note that the Trace and Stream method called “resample” is only a light wrapper to apply the scipy resample function.
The algorithm and its limitations are described in the scipy documentation you can easily find with a web search. A key point about this algorithm is that unlike decimate it allows resampling to something not an integer multiple or division from the input OR if you need to upsample data to match the rest of the data set (Note that is not usually a good idea unless the upsampling is followed
by a decimator to get all data to a lower, uniform sample rate.)
A type example where that is essential is some old OBS data from Scripps instruments that had a sample rate that was a multiple of one of the more standard rates like 20 or 100. Such data can be downsampled immediately too something like 10 sps with this operator or upsampled to something like 50 and then downsampled to something like 10 with a factor of 5 decimator.
We emphasize a nice feature of the scipy implementation is that it automatically applies a rational antialiasing filter when downsampling, If, however, you need to do something like regularize a data set with irregular sample rates but preserve a common upper frequency response controlled at the high frequency end by digizer antialias filters (e.g. LH channels from Q330 data) you will need to crack the scipy documentation on setting up a custom antialias filter using FIR filters defined through the window argument. All common digitizer FIR filter coefficients can be found in appropriate response files. That should, in principle, be feasible but mspass developers have not tested that hypothesis.
The concept of the window argument in this constructor is idential to that described in the documentation for scipy.signal.resample. The value passed, in fact, is used as the argument whenever the scipy function is called. Note the other optional arguments to scipy resample are always defaulted because the current default apply to all cases we handle. Be careful if resample changes.
The primary method of this class is a concrete implementation of the resample method.
- resample(mspass_object)[source]
Applies the scipy.signal.resample function to all data held in a mspass container passed through arg0 (mspass_object). This method will accept all supported MsPASS datat objects: TimeSeries, Seismogram, TimeSeriesEnsemble, and SeismogramEnsemble. For Ensembles the method is called recursively on each of the members.
The method returns mspass_object with the sample data altered by the operator defined by a particular instance, which is defined exclusively by the target sample rate for the output. All metadata will be clone without checking. If the metadata have attributes linked to the sample interval the metadata of the result may not match the data.
If the input is marked dead it will be returned immediately with no change.
- mspasspy.algorithms.resample.resample(mspass_object, decimator, resampler, verify_operators=True, object_history=False, alg_name='resample', alg_id=None, dryrun=False, inplace_return=False, function_return_key=None)[source]
Resample any valid data object to a common sample rate (sample interval).
This function is a wrapper that automates handling of resampling. Its main use is in a dask/spark map operator where the input can be a set of irregularly sampled data and the output is required to be at a common sample rate (interval). The problem has some complexity because decimation is normally the preferred method of resampling when possible due to speed and more predictable behavior. The problem is that downsampling by decimation is only possible if the output sampling interval is an integer multiple of the input sample interval. With modern seismology data that is usually possible, but there are some common exceptions. For example, 10 sps cannot be created from 25 sps by decimation. The algorithm tests the data sample rate and if decimation is possible it applies a decimation operator passed as the argument “decimator”. If not, it calls the operator “resampler” that is assumed to be capable of handling any sample rate change. The two operators must have been constructed with the same output target sampling frequency (interval). Both must also be a subclass of BasicResampler to match the api requirements.
The parameters object_history, alg_name, alg_id, dryrun, inplace_return, and function_return_key are handled by the decorator called mspass_func_wrapper used by this function. See the docstring for mspass_func_wrapper for the generic use of those parameters.
- Parameters:
mspass_object (Must a TimeSeries, Seismogram, TimeSeriesEnsemble, or SeismogramEnsemble object.) – mspass datum to be resampled
decimator (Must be a subclass of BasicResampler) – decimation operator.
resampler (Must be a subclass of BasicResampler) – resampling operator
verify_operators – boolean controlling whether safety checks are applied to inputs. When True (default) the contents of decimator and resampler are verified as subclasses of BasicResampler and the function tests if the target output sampling frequency (interval) of both operators are the same. The function will throw an exception if any of the verify tests fail. Standard practice should be to verify the operators and valid before running a large workflow and running production with this arg set False for efficiency. That should acceptable in any case I can conceive as once the operators are defined in a parallel workflow they should be invariant for the entire application in a map operator.
signals
- mspasspy.algorithms.signals.correlate(a, b, shift, object_history=False, alg_name='correlate', alg_id=None, dryrun=False, demean=True, normalize='naive', method='auto')[source]
Cross-correlation of two signals up to a specified maximal shift. :param a: first signal :param b: second signal :param shift: Number of samples to shift for cross correlation. The cross-correlation will consist of 2*shift+1 or
2*shift samples. The sample with zero shift will be in the middle.
- Parameters:
object_history – True to preserve the processing history. For details, refer to
mspass_func_wrapper_multi
.alg_name (
str
) – alg_name is the name the func we are gonna save while preserving the history.alg_id (
bson.objectid.ObjectId
) – alg_id is a unique id to record the usage of func while preserving the history.dryrun – True for dry-run, which return “OK”. Used in the mspass_func_wrapper.
demean (bool) – Demean data beforehand.
normalize – Method for normalization of cross-correlation. One of ‘naive’ or None (True and False are supported for backwards compatibility). ‘naive’ normalizes by the overall standard deviation. None does not normalize.
method (str) – Method to use to calculate the correlation. ‘direct’: The correlation is determined directly from sums, the definition of correlation. ‘fft’ The Fast Fourier Transform is used to perform the correlation more quickly. ‘auto’ Automatically chooses direct or Fourier method based on an estimate of which is faster. (Only available for SciPy versions >= 0.19. For older Scipy version method defaults to ‘fft’.)
- Returns:
cross-correlation function.
- mspasspy.algorithms.signals.correlate_stream_template(stream, template, object_history=False, alg_name='correlate_stream_template', alg_id=None, dryrun=False, template_time=None, return_type='seismogram', **kwargs)[source]
- mspasspy.algorithms.signals.correlate_template(data, template, object_history=False, alg_name='correlate_template', alg_id=None, dryrun=False, mode='valid', normalize='full', demean=True, method='auto')[source]
- mspasspy.algorithms.signals.correlation_detector(stream, templates, heights, distance, object_history=False, alg_name='correlation_detector', alg_id=None, dryrun=False, template_times=None, template_magnitudes=None, template_names=None, similarity_func=None, details=None, plot=None, return_type='seismogram', **kwargs)[source]
- mspasspy.algorithms.signals.detrend(data, *args, object_history=False, alg_name='dtrend', alg_id=None, dryrun=False, inplace_return=True, type='simple', **options)[source]
This function removes a trend from the data, which is a mspasspy object. Note it is wrapped by mspass_func_wrapper, so the processing history and error logs can be preserved.
- Parameters:
data – input data, only mspasspy data objects are accepted, i.e. TimeSeries, Seismogram, Ensemble.
type (str) – type of filter, ‘simple’, ‘linear’, ‘constant’, ‘polynomial’, ‘spline’. You can refer to Obspy <https://docs.obspy.org/packages/autogen/obspy.core.trace.Trace.detrend.html> for details.
args – extra arguments
object_history – True to preserve the processing history. For details, refer to
mspass_func_wrapper
.alg_name (
str
) – alg_name is the name the func we are gonna save while preserving the history.alg_id (
bson.objectid.ObjectId
) – alg_id is a unique id to record the usage of func while preserving the history.dryrun – True for dry-run, which return “OK”. Used in the mspass_func_wrapper.
inplace_return – True to return data in mspass_func_wrapper.
options – extra kv options
- Returns:
None
- mspasspy.algorithms.signals.filter(data, type, *args, object_history=False, alg_name='filter', alg_id=None, dryrun=False, inplace_return=True, **options)[source]
This function filters the data of mspasspy objects. Note it is wrapped by mspass_func_wrapper, so the processing history and error logs can be preserved.
- Parameters:
data – input data, only mspasspy data objects are accepted, i.e. TimeSeries, Seismogram, Ensemble.
type (str) – type of filter, ‘bandpass’, ‘bandstop’, ‘lowpass’, ‘highpass’, ‘lowpass_cheby_2’, ‘lowpass_fir’, ‘remez_fir’. You can refer to Obspy <https://docs.obspy.org/packages/autogen/obspy.core.trace.Trace.filter.html> for details.
args – extra arguments
object_history – True to preserve the processing history. For details, refer to
mspass_func_wrapper
.alg_name (
str
) – alg_name is the name the func we are gonna save while preserving the history.alg_id (
bson.objectid.ObjectId
) – alg_id is a unique id to record the usage of func while preserving the history.dryrun – True for dry-run, which return “OK”. Used in the mspass_func_wrapper.
inplace_return – True to return data in mspass_func_wrapper.
options – extra kv options
- Returns:
None
- mspasspy.algorithms.signals.interpolate(data, sampling_rate, *args, object_history=False, alg_name='interpolate', alg_id=None, dryrun=False, inplace_return=True, method='weighted_average_slopes', starttime=None, npts=None, time_shift=0.0, **kwargs)[source]
This function interpolates data, which is a mspasspy object. Note it is wrapped by mspass_func_wrapper, so the processing history and error logs can be preserved.
- Parameters:
data – input data, only mspasspy data objects are accepted, i.e. TimeSeries, Seismogram, Ensemble.
sampling_rate – The new sampling rate in Hz.
args – extra arguments.
object_history – True to preserve the processing history. For details, refer to
mspass_func_wrapper
.alg_name (
str
) – alg_name is the name the func we are gonna save while preserving the history.alg_id (
bson.objectid.ObjectId
) – alg_id is a unique id to record the usage of func while preserving the history.dryrun – True for dry-run, which return “OK”. Used in the mspass_func_wrapper.
inplace_return – True to return data in mspass_func_wrapper.
method (str) – One of “linear”, “nearest”, “zero”, “slinear”, “quadratic”, “cubic”, “lanczos”, or “weighted_average_slopes”. You can refer to Obspy <https://docs.obspy.org/packages/autogen/obspy.core.trace.Trace.interpolate.html> for details.
starttime (
UTCDateTime
or int) – The start time (or timestamp) for the new interpolated stream. Will be set to current start time of the data if not given.npts (int) – The new number of samples. Will be set to the best fitting number to retain the current end time of the trace if not given.
time_shift – Shift the trace by adding time_shift to the starttime. The time shift is always given in seconds. A positive shift means the data is shifted towards the future, e.g. a positive time delta. Note that this parameter solely affects the metadata. The actual interpolation of the underlaying data is governed by the parameters sampling_rate, starttime and npts.
kwargs – extra kv arguments
- Returns:
None.
- mspasspy.algorithms.signals.templates_max_similarity(st, time, streams_templates, object_history=False, alg_name='templates_max_similarity', alg_id=None, dryrun=False)[source]
- mspasspy.algorithms.signals.xcorr_3c(st1, st2, shift_len, object_history=False, alg_name='xcor_3c', alg_id=None, dryrun=False, components=None, full_xcorr=False, abs_max=True)[source]
snr
- mspasspy.algorithms.snr.EstimateBandwidth(S, N, snr_threshold=1.5, df_smoother=None, f0=1.0, f_max=None) BandwidthData [source]
Estimates a set of signal bandwidth estimates returned in the the class BandwidthData. The algorithm is most appropriate for body waves recorded at teleseimic distances from earthquake sources. The reason is that the algorithm is most appropriate for signal and noise spectra typical of that type of data. In particular, broadband noise is very colored by the microseisms. Large enough signals can exceed the microseism peak but smaller events will not. The smallest events typically are only visible in the traditional short-period band. The most difficult are the ones that have signals in both the short and long period band but have high noise levels in the microseism band making a single bandwidth the wrong model. This function handles that by only returning the band data for the section near the search start defined by the “f0” argument.
The algorithm works by searching from a starting frequency defined by the “f0” argument. The basic idea is it hunts up and down the frequency axis until it detects the band edge. The “band edge” detection is defined as the point where the signal-to-noise ratio first falls below the value defined by the “snr_threshold” argument. The algorithm has two variants of note: 1. If no point in the snr curve exceeds the valued defined by
“snr_threshold” the function returns immediately with all attributes of the BandwidthData object set to 0.
If the snr value at f0 does not exceed the threshold it searches down until it finds a value exceeding the threshold. In that situation it marks the first point found as the high frequency band edge and continues hunt backward to attempt to define the low frequency band edge.
The “df_smoother” argument can be used to smooth the internally generated signal-to-noise ratio vector. It is particularly useful for data with noise containing spectral lines that can create incorrect bandwidth data. It is rarely necessary if the power spectra are computed with the multitaper method as that method produces spectra that are inherently smooth at a specified scale. In that case if smoothing is desired we recommend the smoothing width be of the form k*tbp*df where tbp is the time bandwidth product, df is the Rayleigh bin size, and k is some small multipler (note multitaper spectra a inherently smoothed by 2*tbp*df).
- Parameters:
S (
mspasspy.ccore.seismic.PowerSpectrum
) – power spectrum computed from signal time window.N (
mspasspy.ccore.seismic.PowerSpectrum
) – power spectrum computed from noise time window (Note S and N do not need to be on the same frequency grid but the signal grid is used to compute the signal to noise ratio curve)snr_threshold (float (default 1.5) grid but the signal grid is used to compute the signal to spectrum within the range defined by this parameter. i.e. the number of points in the smoother is round(df_smoother/S.df())) – value of snr used to define the band edges. As noted above the algorithm searches from the value f0 for the first points above and below that frequency where the snr curve has a value less than or equal to this value.
f0 (float (default 1.0)) – frequency to start searching up and down the frequency axis.
- Returns:
Returns an instance of the BandwidthData class. BandwidthData has the following attributes that are set by this function:
”low_edge_f” low frequency corner of estimated bandwidth
”low_edge_snr” snr at low corner
”high_edge_f” high frequency corner of estimated bandwidth
”high_edge_snr” snr at high corner
”f_range” total frequency range of estimate (range of S)
Note the low edge can be zero which must be handled carefully if the output is used for designing a filter. Further all values will be 0 if no points in the snr curve exceed the defined threshold.
- mspasspy.algorithms.snr.FD_snr_estimator(data_object, noise_window=<mspasspy.ccore.algorithms.basic.TimeWindow object>, noise_spectrum_engine=None, signal_window=<mspasspy.ccore.algorithms.basic.TimeWindow object>, signal_spectrum_engine=None, band_cutoff_snr=1.5, signal_detection_minimum_bandwidth=6.0, f_low_zero_test=None, tbp=4, ntapers=6, f0=1.0, poles=3, perc=95.0, optional_metrics=None, save_spectra=False) tuple [source]
Estimates one or more amplitude metrics of signal-to-noise from a TimeSeries object. Results are returned as a set of key-value pairs in a python dict.
FD_snr_estimator first estimates bandwidth with the function in this module called EstimateBandwidth. See the docstring of that function for how the bandwidth is estimated. The metrics this function computes all depend upon that bandwidth estimate. The default return of this function is the return of EstimateBandwidth translated to keyp-value pairs. Those are:
low_f_band_edge - lowest frequency exceeding threshold high_f_band_edge - highest frequency exeeding threshold high_f_band_edge_snr and low_f_band_edge_snr are the snr values
at the band edges
- spectrum_frequency_range - total frequency band for estimate
(really just 0 to Nyquist).
- bandwidth - bandwidth of estimate in dB.
i.e. 20*log10(high_f_band_edge/low_f_band_edge)
bandwidth_fraction - bandwidth/spectrum_frequency_range
A set of optional metrics can be computed. All optional metrics use the bandwidth estimates in one way or another. Optional metrics are defined by the following keywords passed through a list (actually any iterable container will work) of strings defining one or more of the keywords. The metrics and a brief description of each follow:
snr_stats computes what are commonly plotted in box plots for the snr estimates within the estimated bandwidth: minimum, maximum, 0.25 (1/4) point, 0.75 (3/4) point, and the median. These are set with following dict keys: ‘snr_band_maximum’,’snr_band_minimum’, ‘snr_band_1/4’, ‘srn_band_3/4’, and ‘snr_band_median’ respectively.
filtered_envelope, filtered_L2, filtered_Linf, filtered_perc, and filtered_MAD: All of these optional metrics first copy the data_object and then filter the copy with a Butterworth bandpass filter with the number of poles specified by the npoles argument and corners at the estimated band edge by the EstimateBandwidth function. The algorithm automatically handles the case of a zero low frequency edge. That is, with large events the low band edge can be computed as 0 frequency. More commonly the band edge is computed as one or two rayleigh bins above 0. A bandpass filtered applied with a corner too close to 0 can produced distorted (or null) results. To prevent that the default behavior is to revert to a low pass filter versus a bandpass filter when the estimated value of low_f_band_edge is small. By default “small” is defined as <= 2.0*tbp*df, where tbp is the time bandwidth product for the multitaper spectral estimates (in optional argument) and df is the frequency sampling interval of the spectrum computed from the data in signal_window. You can use a different recipe by passing a value for the optional parameter “f_low_zero_test” which will replace the computed value using the formula above for switching to a lowpass filter. The optional metrics are time domain estimates computed from the bandpass (lowpass) filtered data. They are actually computed from functions in this same module that can be used independently and have their own docstring description. The functions called have the following names in order of the keyword list above: snr_envelope, snr_filtered_rms, snr_Linv, and snr_filtered_mad. When the computed they are set in the output dictionary with the following (again in order) keys: ‘snr_envelope’,’snr_filtered_rms’, ‘srn_Linf’, and ‘snr_filtered_mad’.
It is important to note that all the metrics this function returns are measures of amplitude NOT power. You need to be particularly aware of this if you unpickle the spectra created if you set save_spectra true as those are power spectra.
- Parameters:
data_object – TimeSeries object to be processed. For Seismogram objects the assumption is algorithm would be used for a single component (e.g longitudinal or vertical for a P phase)
noise_window (
mspasspy.ccore.algorithms.basic.TimeWindow
default -130 to -5 s.) – defines the time window to use for computing the spectrum considered noise. The time span can be either relative or UTC (absolute) time but we do not check for consistency. This low level function assumes they are consistent. If not, the calculations are nearly guaranteed to fail. Type must be mspasspy.ccore.TimeWindow.signal_window (
mspasspy.ccore.algorithms.basic.TimeWindow
default -5 to 120 s) – defines the time window to use that defines what you consider “the signal”. The time span can be either relative or UTC (absolute) time but we do not check for consistency. This low level function assumes they are consistent. If not, the calculations are nearly guaranteed to fail. Type must be mspasspy.ccore.TimeWindow.signal_spectrum_engine (None (default) or an instance of
mspasspy.ccore.algorithms.deconvolution.MTPowerSpectrumEngine
) – is the comparable MTPowerSpectralEngine to use to compute the signal power spectrum. Default is None with the same caveat as above for the noise_spectrum_engine.band_cutoff_snr (float) – defines the signal-to-noise ratio floor used in the search for band edges. See description of the algorithm above and in the user’s manual. Default is 1.5
signal_spectrum_engine – is the comparable MTPowerSpectralEngine to use to compute the signal power spectrum. Default is None with the same caveat as above for the noise_spectrum_engine.
band_cutoff_snr – defines the signal-to-noise ratio floor used in the search for band edges. See description of the algorithm above and in the user’s manual. Default is 2.0
signal_detection_minimum_bandwidth (float (default 6.0 dB)) – As noted above this algorithm first tries to estimate the bandwidth of data where the signal level exceeds the noise level defined by the parameter band_cutoff_snr. It then computes the bandwidth of the data in dB computed as 20*log10(f_high/f_low). For almost any application if the working bandwidth falls below some threshold the data is junk to all intends and purpose. A factor more relevant to this algorithm is that the “optional parameters” will all be meaningless and a waste of computational effort if the bandwidth is too small. A particular extreme example is zero bandwidth that happens all the time if no frequency band exceeds the band_cutoff_snr for a range over that minimum defined by the time-bandwidth product. The default is 6.0. (One octave which is roughly the width of the traditional short-period band) which allows optional metrics to be computed but may be too small for some applications. If your application requires higher snr and wider bandwidth adjust this parameter and/or band_cutoff_snr.
f_low_zero_test (float (default is None which causes the test to revert to 2.0*tbp*df (see above).) – optional lower bound on frequency to use for test to disable the low frequency corner. (see above)
tbp (float (default 4.0)) – time-bandwidth product to use for computing the set of Slepian functions used for the multitaper estimator. This parameter is used only if the noise_spectrum_engine or signal_spectrum_engine arguments are set as None.
ntapers (integer (default 6)) – is the number of Slepian functions (tapers) to compute for the multitaper estimators. Like tbp it is referenced only if noise_spectrum_engine or signal_spectrum_engine are set to None. Note the function will throw an exception if the ntaper parameter is not consistent with the time-bandwidth product.
f0 (float (default 1.0)) – frequency to use to start search for bandwidth up and down the frequency axis (see above).
npoles (integer (default 3)) – defines number of poles to us for the Butterworth bandpass or lowpass applied for the “filtered” metrics (see above). Default is 3.
perc (float (default 95.0)) – used only if ‘filtered_perc’ is in the optional metrics list. Specifies the perc parameter as used in seismic unix. Uses the percentage point specified of the sorted abs of all amplitudes. (Not perc=50.0 is identical to MAD) Default is 95.0 which is 2 sigma for Gaussian noise.
optional_metrics (should be a list of strings matching the set of required keywords. Default is None which means none of the optional metrics will be computed.) – is an iterable container containing one or more of the optional snr metrics discussed above. Typos in names will create log messages but will not cause the function to abort.
save_spectra – If set True (default is False) the function will pickle the computed noise and signal spectra and save the strings created along with a set of related metadata defining the time range to the output python dict (these will be saved in MongoDB when db is defined - see below). This option should ONLY be used for spot checking, discovery of why an snr metric has unexpected results using graphics, or a research topic where the spectra would be of interest. It is a very bad idea to turn this option on if you are processing a large quantity of data and saving the results to MongoDB as it may bloat the database. Consider a different strategy if that essential for your work.
- Returns:
python tuple with two components. 0 is a python dict with the computed metrics associated with keys defined above. 1 is a mspass.ccore.ErrorLogger object. Any errors in computng any of the metrics will be posted to this logger. Users should then test this object using it’s size() method and if it the log is not empty (size >0) the caller should handle that condition. For normal use that means pushing any messages the log contains to the original data object’s error log. Component 0 will also be empty with no log entry if the estimated bandwidth falls below the threshold defined by the parameter signal_detection_minimum_bandwidth.
- mspasspy.algorithms.snr.arrival_snr(data_object, noise_window=<mspasspy.ccore.algorithms.basic.TimeWindow object>, noise_spectrum_engine=None, signal_window=<mspasspy.ccore.algorithms.basic.TimeWindow object>, signal_spectrum_engine=None, band_cutoff_snr=2.0, signal_detection_minimum_bandwidth=6.0, tbp=4.0, ntapers=6, f_low_zero_test=None, f0=1.0, poles=3, perc=95.0, save_spectra=False, phase_name='P', arrival_time_key='Ptime', metadata_output_key='Parrival', kill_null_signals=True, optional_metrics=['snr_stats', 'filtered_envelope', 'filtered_L2', 'filtered_Linf', 'filtered_MAD', 'filtered_perc'])[source]
Specialization of FD_snr_estimator. A common situation where snr data is a critical thing to estimate is data windowed around a given seismic phase. FD_snr_estimator is a bit more generic. This function removes some of the options from the more generic function and has a frozen structure appropriate for measuring snr of a particular phase. In particular it always stores the results as a subdocument (python dict) keyed by the name defined in the metadata_output_key argument. This function has a close sibling called “broadband_snr_QC” that has similar behavior but add some additional functionality. The most significant limitation of this function relative to broadband_snr_QC is that this function ONLY accepts TimeSeries data as input.
This function is most appropriate for QC done within a workflow where the model is to process a large data set and winnow it down to separate the wheat from the chaff, to use a cliche consistent with “winnow”. In that situation the normal use would be to run this function with a map operator on atomic data and follow it with a call to filter to remove dead data and/or filter with tests on the computed metrics. See User’s Manual for guidance on this topic. Because that is the expected normal use of this function the kill_null_signals boolean defaults to True.
To be more robust the function tries to handle a common error. That is, if the input data has a UTC time standard then the noise and signal windows would need to be shifted to some reference time to make any sense. Consequently, this algorithm silently handles that situation automatically with a simple test. If the data are relative time no test of the time window range is made. If the data are UTC, however, it tests if the signal time window is inside the data range. If not, it shifts the time windows by a time it tries to pull from the input with the key defined by “arrival_time_key”. If that attribute is not defined a message is posted to elog of the input datum and it is returned with no other change. (i.e. the attribute normally output with the tag defined by metadata_output_key will not exist in the output). Large data workflows need to handle this condition,.
Dead inputs are handled the standard way - returned immediately with no change.
Most parameters for this function are described in detail in the docstring for FD_snr_estimator. The user is referred there to see the usage. The following are added for this specialization:
- Parameters:
phase_name – Name tag for the seismic phase being analyzed. This string is saved to the output subdocument with the key “phase”. The default is “P”
arrival_time_key – key (string) used to fetch an arrival time if the data are in UTC and the time window received does not overlap the data range (see above)
kill_null_signals – boolean controlling how null snr estimator returns are handled. When True (default) if FD_snr_estimator returns a null result (no apparent signal) that input datum is killed before being returned. In that situation no snr metrics will be in the output because null means FD_snr_estimator couldn’t detect a signal and the algorithm failed. When False the datum is returned silently but will have no snr data defined in a dict stored with the key metadata_output_key (i.e. that attribute will be undefined in output)
metadata_output_key –
is a string used as a key under which the subdocument (python dict) created internally is stored. Default is “Parrival”. The idea is if multiple phases are being analyzed each phase should have a different key set by this argument (e.g. if PP were also being analyzed in the same workflow you
might use a key like “PParrival”).
- Returns:
a copy of data_object with the the results stored under the key defined by the metadata_output_key argument.
- mspasspy.algorithms.snr.broadband_snr_QC(data_object, component=2, noise_window=<mspasspy.ccore.algorithms.basic.TimeWindow object>, noise_spectrum_engine=None, signal_window=<mspasspy.ccore.algorithms.basic.TimeWindow object>, signal_spectrum_engine=None, band_cutoff_snr=1.5, signal_detection_minimum_bandwidth=6.0, f_low_zero_test=None, tbp=4.0, ntapers=6, f0=1.0, kill_null_signals=True, poles=3, perc=95.0, phase_name='P', metadata_output_key='Parrival', optional_metrics=['snr_stats', 'filtered_envelope', 'filtered_L2', 'filtered_Linf', 'filtered_MAD', 'filtered_perc'], save_spectra=False, use_measured_arrival_time=False, measured_arrival_time_key='Ptime', taup_model=None, source_collection='source', receiver_collection=None)[source]
Compute a series of metrics that can be used for quality control filtering of seismic phase data.
This function is intended as a workhorse to be used for low-level, automated QC of broadband data when the the data set is defined by signals linked to a timeable seismic phase. It can be thought of as a version of a related function called “arrival_snr” with some additional features. See the docstring for that function for what those base features are. Features this function adds not found in arrival_snr are:
This function allows Seismogram inputs. Only TimeSeries data are handled by arrival_snr.
This function provides an option to compute arrival times from source coordinates, receiver coordinates, and a handle to an obspy tau-p calculator.
Otherwise it behaves the same. Note both functions may or may not choose to interact with the function save_snr_arrival. If you want to save the computed metrics into a form more easily fetched your workflow should extract the contents of the python dictionary stored under the metadata_output_key tag and save the result to MongoDB with the save_snr_arrival function. That option is most useful for test runs on a more limited data set to sort out values of the computed metrics that are appropriate for a secondary winnowing of the your data. See User’s Manual for more on this concept.
The input of arg0 (data_object) can be either a TimeSeries or a Seismogram object. If a Seismogram object is passed the “component” argument is used to extract the specified single channel from the Seismogram object and that component is used for processing. That is necessary because all the algorithms used are single channel algorithms. To use this function on all components use a loop over components BUT make sure you use a unique value for the argument “metadata_output_key” for each component. Note this will also produce multiple documents per input datum.
The type of the data_object also has a more subtle implication the user must be aware of. That is, in the MsPASS schema we store receiver coordinates in one of two different collections: “channel” for TimeSeries data and “site” for Seismogram data. When such data are loaded the generic keys like lat are always converted to names like channel_lat or site_lat for TimeSeries and Seismogram data respectively. This function uses the data type to set that naming. i.e. if the input is TimeSeries it tries to fetch the latitude data as channel_lat while if it the input is a Seismogram it tries to fetch site_lat. That is true of all coordinate data loaded by normalization from a source and receiver collection.
Most of the arguments to this function are passed directly to FD_snr_estimator. See the docstring of that function for reference. The following are additional parameters specific to this function:
- Parameters:
data_object – An atomic MsPASS data object to which the algorithms requested should be applied. Currently that means a TimeSeries or Seismogram object. Any other input will result in a TypeError exception. As noted above for Seismogram input the component argument defines which data component is to be used for the snr computations.
component – integer (0, 1, or 2) defining which component of a Seismogram object to use to compute the requested snr metrics. This parameter is ignored if the input is a TimeSeries.
metadata_output_key – string defining the key where the results are to be posted to the returned data_object. The results are always posted to a python dictionary and then posted to the returned data_object with this key. Default is “Parrival”
use_measured_arrival_time – boolean defining the method used to define the time reference for windowing used for snr calculations. When True the function will attempt to fetch a phase arrival time with the key defined by the “measured_arrival_time_key” argument. In that mode if the fetch fails the data_object will be killed and an error posted to elog. That somewhat brutal choice was intentional as the expectation is if you want to use measured arrival times you don’t want data where there are no picks. The default is True to make the defaults consistent. The reason is that the tau-p calculator handle is passed to the function when using model-based travel times. There is no way to default that so it defaults to None.
measured_arrival_time_key – is the key used to fetch a measured arrival time. This parameter is ignored if use_measured_arrival_time is False.
taup_model – when use_measured_arrival_time is False this argument is required. It defaults as None because there is no way the author knows to initialize it to anything valid. If set it MUST be an instance of the obspy class TauPyModel (https://docs.obspy.org/packages/autogen/obspy.taup.tau.TauPyModel.html#obspy.taup.tau.TauPyModel) Mistakes in use of this argument can cause a MsPASSError exception to be thrown (not logged thrown as a fatal error) in one of two ways: (1) If use_measured_arrival_time is False this argument must be defined, and (2) if it is defined it MUST be an instance of TauPyModel.
source_collection – normalization collection for source data. The default is the MsPASS name “source” which means the function will try to load the source hypocenter coordinates (when required) as source_lat, source_lon, source_depth, and source_time from the input data_object. The id of that document is posted to the output dictionary stored under metadata_output_key.
receiver_collection – when set this name will override the automatic setting of the expected normalization collection naming for receiver functions (see above). The default is None which causes the automatic switching to be involked. If it is any other string the automatic naming will be overridden.
- Returns:
the data_object modified by insertion of the snr QC data in the object’s Metadata under the key defined by metadata_output_key.
- mspasspy.algorithms.snr.filter_by_snr_bandwidth(d, qcdoc=None, subdoc_key='Parrival', bandpass_low_f_limit=0.02, npoles=4)[source]
Filter function to automatically handle data processed with arrival_snr or broadband_snr_QC.
The two MsPASS functions arrival_snr and broadband_snr_QC are front ends to a lower level function called FD_broadband_snr. That function is designed to estimate the bandwidth of a signal and compute a suite of snr metrics that can be used to winnow data for further processing. One often wants to then filter the data so processed into the frequency band the algorithm determines. This function does that automaticallty for any atomic datum previously processed with arrival_snr or broadband_snr_QC.
An important complication this function handles is that for large earthquakes the low frequency band edge can be 0 or close (meaning within a few Rayleigh bins) of 0. In that situation a bandpass filter with the corner near 0 is unstable and the filter operator would fail if you tried to use that corner. This function handles that all automatically if you use the defaults for arrival_snr or broadband_snr_QC. If you change any of the defaults you may need to change one of the kwargs for this function. See descriptions below for guidance.
- Parameters:
d (
mspasspy.ccore.seismic.TimeSeries
ormspasspy.ccore.seismic.Seismogram
) – atomic MsPASS data object to be filtered.qcdoc – use attributes stored in a dictionary. Use this approach if you want to impose attributes externally. If defined the contents of this dictionary will be used. If it isn’t define the function will use subdoc_key to try to fetch the same data. Using a dictionary passed via this argument is a way to filter an ensemble in a common band for all members.
subdoc_key (string (default "Parrival")) – subdocument Metadata key to use to fetch data posted by arrival_snr or broadband_snr_QC. Both by default post the computed attributes to a subdocument (python dictionary) that can be fetched from the Metadata of d with a particular key. This argument is the key that should be used to fetch that data. The default is the default for arrival_snr and broadband_snr_QC. Note if no data is found for this key the function attempts to extract the required attributes from the top level of the Metadata container of d. If that fails the function will kill the datum and post an elog message.
bandpass_low_f_limit (float (default 0.02 = 1/50 s)) –
npoles (integer (default 4)) – number of poles to use for Butterworth filter function if not defined in the datum’s Metadata.
- mspasspy.algorithms.snr.save_snr_arrival(db, doc_to_save, wfid, wf_collection='wf_Seismogram', save_collection='arrival', subdocument_key=None, use_update=False, update_id=None, validate_wfid=False) ObjectId [source]
This function is a companion to broadband_snr_QC. It handles the situation where the workflow aims to post calculated snr metrics to an output database (normally in the “arrival” collection but optionally to a parent waveform collection. ). The alternative models as noted in the User’s Manual is to use the kill option to broadband_snr_QC followed by a call to the filter method of bag/rdd to remove the deadwood and reduce the size of data passed downstream in large parallel workflow. That case is better handled by using broadband_snr_QC directly.
How the data are saved is controlled by four parameters: save_collection, use_update, update_id and subdocument_key. They interact in a way that is best summarized as a set of cases that procuce behavior you may want:
If save_collection is not the parent waveform collection, behavior is driven by use_update combined with subdocument_key. When use_update is False (default) the contents of doc_to_save will be used to define a new document in save_collection with the MongoDB insert_one method.
When use_update is True the update_id will be assumed to be defined and point be the ObjectId of an existing document in save_collection. Specifically that id will be used as the query clause for a call the insert_one method. This combination is useful if a workflow is being driven by arrival data stored in save_collection created, for example, for a css3.0 a event->origin->assoc->arrival catalog of arrival picks. A variant of this mode will occur if the argument subdocument_key is defined (default is None). If you define subgdocument_key the contents of doc_to_save will be stored as a subdocument in save_collection accessible with the key defined by subdocument_key.
If save_collection is the same as the parent waveform collection (defined via the input parameter wf_collection) the value of use_update will be ignored and only an update will be attempted. The reason is that if one tried to save the contents of doc_to_save to a waveform collection would corrupt the database by have a bunch of documents that that could not be used to construct a valid data object (the normal use for one of the wf collections).
- Parameters:
db – MongoDB database handle to use for transactions that are the focus of this algorithm.
doc_to_save – python dictionary containing data to be saved. Where and now this is saved is controlled by save_collection, use_update, and subdocument_key as described above.
wfid – waveform document id of the parent datum. It is assumed to be an ObjectId of linking the data in doc_to_save to the parent. It is ALWAYS saved in the output with the key “wfid”.
wf_collection – string defining the collection from which the datum from which the data stored in doc_to_save are associated. wfid is assumed define a valid document in wf_collection. Default is “wf_Seismogram”.
save_collection – string defining the collection name to which doc_to_save should be pushed. See above for how this name interacts with other parameters.
subdocument_key – Optional key for saving doc_to_save as a a subdocument in the save_collection. Default is None which means the contents of doc_to_save will be saved (or update) as is. For saves to (default) arrival collection this parameter should normally be left None, but is allowed. If save_collection is the parent waveform collection setting this to some sensible key is recommended to avoid possible name collisions with waveform Metadata key-value pairs. Default is None which means no subdocuments are created.
use_update – boolean controlling whether or not to use updates or inserts for the contents of doc_to_save. See above for a description of how this interacts with other arguments to this function. Default is False.
update_id – ObjectId of target document when running in update mode. When save_collection is the same as wf_collection this parameter is ignored and the required id passed as wfid will be used for the update key matching. Also ignored with the default behavior if inserting doc_to_save as a new document. Required only if running with a different collection and updating is desired. The type example noted above would be updates to existing arrival informations created from a css3.0 database.
validate_wfid – When set True the id defined by the required argument wfid will be validated by querying wf_collection. In this mode if wfid is not found the function will silently return None. Callers using this mode should handle that condition.
- Returns:
ObjectId of saved record. None if something went wrong and nothing was saved.
- mspasspy.algorithms.snr.snr(data_object, noise_window=<mspasspy.ccore.algorithms.basic.TimeWindow object>, signal_window=<mspasspy.ccore.algorithms.basic.TimeWindow object>, noise_metric='mad', signal_metric='mad', perc=95.0)[source]
Compute time-domain based signal-to-noise ratio with a specified metric.
Signal-to-noise ratio is a fundamental measurement in all forms of seismic data processing. There is, however, not a single unified metric that ideal for all types of signals one may want to analyze. One class of metrics used time-domain metrics to use some measure of amplitude in a signal and noise window cut from a single waveform segment. A type example is snr of some particular “seismic phase” (P, S, PP, ScS, etc) relative to some measure of background noise. e.g. for P phases it is nearly universal to try to estimate snr from some window defined by the arrival time of P and a noise window before the time P arrives (pre-event noise).
This function provides a generic api to measure a large range of metrics using one of four choices for measuring the norm of the data in the signal and noise windows:
rms - L2 norm
mad - median absolute difference, which is essentially the median amplitude in this context
perc - percentage norm ala seismic unix. perc is defined at as the amplitude level were perc percentage of the data have an amplitude smaller than this value. It is computed by ranking (sorting) the data, computing the count of that perctage relative to the number of amplitude samples, and returning the amplitude of the nearest value to that position in the ranked data.
peak - is the peak value which in linear algebra is the L infinity norm
Note the user can specify a different norm for the signal and noise windows. The perc metric requires specifying what percentage level to use. It is important to recognize that ALL of these metrics are scaled to amplitude not power (amplitude squared).
This function will throw a MsPASSError exception if the window parameters do not define a time period inside the range of the data_object. You will need a custom function if the model of windows insider a larger waveform segment does not match your data.
There is one final detail about an snr calculation that we handle carefully. With simulation data it is very common to have error free simulations where the “noise” window one would use with real data is all zeros. An snr calculated with this function in that situation would either return inf or NaN depending on some picky details. Neither is good as either can cause downstream problems. For that reason we trap any condition where the noise amplitude measure is computed as zero. If the signal amplitude is also zero we return a -1.0. Otherwise we return a large, constant, positive number. Neither condition will cause an exception to be thrown as that condition is considered somewhat to be anticipated.
- Parameters:
data_object – MsPASS atomic data object (TimeSeries or Seismogram) to use for computing the snr. Note that for Seismogram objects the metrix always use L2 measures of amplitude of each sample (i.e. vector amplitudes) If snr for components of a Seismogram are desired use ExtractComponent and apply this function to each component separately.
noise_window – TimeWindow objects defining the time range to extract from data_object to define the part of the signal considered noise. Times can be absolute or relative. Default the range -5 to 120 which is makes sense only as time relative to some phase arrival time.
signal_window – TimeWindow object defining the time range to extract from data_object to define the part of the signal defines as signal to use for the required amplitude measure. Default of -130 to -5 is consistent with the default noise window (in terms of length) and is assumes a time relative to a phase arrival time. For absolute times each call to this function may need its own time window.
noise_metric – string defining one of the four metrics defined above (‘mad’,’peak’,’perc’ or ‘rms’) to use for noise window measurement.
signal_metric – string defining one of the four metrics defined above (‘mad’,’peak’,’perc’ or ‘rms’) to use for signal window measurement.
- Returns:
estimated signal-to-noise ratio as a single float. Note the special returns noted above for any situation where the noise window amplitude is 0
- mspasspy.algorithms.snr.visualize_qcdata(d, component=2, qc_subdoc_key='Parrival')[source]
Creates a set of plots to visualize qc results computed from spectral estimates. Requires the input datum d to have the spectra stored in metadata as pickled serialization of spectra used for the estimators. That requires the data to have been run with the option “save_spectra=True” in on of the QC function that use FD_snr_estimator. The function with throw a MsPASSError exception if that metadata is missing.
This function generates graphics and must not be used in a large data processing job. It is for exploratory work only for use on a small subset of data. Note “exploratory” also implies it is not as robust as a processing function and may fail in ways that require debugging.
- Parameters:
d (
mspasspy.ccore.seismic.TimeSeries
ormapsspy.ccore.seismic.Seismogram
.) – Datum to displaycomponent (integer) – component to display if input is a Seismogram object. Ignore if input is a TimeSeries.
qc_subdoc_key (string or None. Default “Parrival” is default of broadband_arrival_QC.) – key used to fetch the subdocument from d normally used to store the output from broadband_snr_QC. If set to None will attempt to fetch required attributes from Metdata container of d.
window
- class mspasspy.algorithms.window.TopMute(t0=0.0, t1=0.5, type='cosine')[source]
Bases:
object
A top mute is a form of taper applied to the “front” of a signal. Front in standard jargon means that with the time axis running from left to right (normal unless your native language is Arabic) the time period on the left side of a plot. Data tagged at times less than what we call the zero time here are always zeroed. The mute defines a ramp function running from the zero time to the “one” time. In the ramp zone the ramp function multiplies the sample data so it is a form of “taper” as used in Fourier analysis. This class should normally only be used on data with the time reference type set Relative. It can be applied to UTC time standard data but with such data one of these objects would often need to be created for each atomic data object, which would be horribly inefficient. In most cases conversion to relative time is an essential step before using this algorithm.
This implementation uses an “apply” method for processing data. That means for a parallel construct instead of the symbol for a function you use the apply method as a function call. (e.g. if tm is an is an instance of this class and “d” is a TimeSeries or Seismogram object to be muted the function call that applies the mute would be ts.apply(d))
- apply(d, object_history=False, instance=None)[source]
Use thie method to apply the defined top mute to one of the MsPASS atomic data objects. The method does a sanity check on the input data range. If the starttime of the data is greater than t0 for the mute the datum is killed and an error posted to elog. The reason is in that situation the data would be completely zeroed anyway and it is better to define it dead and leave an error message than to completely null data. :param d: input atomic MsPASS data object (TimeSeries or Seismogram) :object_history: It set true the function will add define this
step as an map operation to preserve object level history. (default is False)
- Parameters:
instance – string defining the “instance” of this algorithm. This parameter is needed only if object_history is set True. It is used to define which instance of this algrithm is being applied. (In the C++ api this is what is called the algorithm id). I can come from the global history manager or be set manually.
- mspasspy.algorithms.window.WindowData(mspass_object, win_start, win_end, t0shift=None, short_segment_handling='kill', log_recoverable_errors=True, overwrite_members=False, retain_dead_members=True, object_history=False, alg_name='WindowData', alg_id=None, dryrun=False)[source]
Apply a window operation to cut out data within a specified time range to any MsPASS seismic data object.
Cutting a smaller waveform segment from a larger waveform segment is a very common seismic data processing task. Handling that low level operation needs to be done efficiently but with a reasonable number of options. This function is very fast if the inputs match the expected model where the requested time segment is inside the range of the datum passed via arg0. In that mode the function is a thin wrapper on a C++ function that does most of the work. When the window is inconsistent with the data, which is defined here as a “short segment”, more overhead is involved if you want to recover something. To be specific a “short segment” means the requested time span for a the window operation form win_start to win_end is not completely inside (inclusive of the endpoints) the range of the data.
Handling of “short segments” has these elements: 1. If the window range is completely outside the range of
the data the result is always killed and returned as a dead datum with no sample data. (d.npts=0)
Behavior if there is the window range overlaps but has boundaries outside the data range depends on the setting of the argument short_segment_handling and the boolean argument log_recoverable_errors. If log_recoverable_errors is set True (default) and the result is returned live (i.e. the error was recoverable but the data are flawed) a complaint message describing what was done will be posted to the elog container of the return. If False recoveries will be done silently. That can be useful in some algorithms (e.g. cross-correlation) where partial segments can be handled. What defines possible recovery is set by the short_segment_handling string. It must be one of only three possible values or the function will abort with a ValueError exception: “kill” - (default) does not recovery attempt and will
kill any data with time inconsistencies.
- “truncate” - this truncates to the output to the
time range max(d.t0,win.starttime) to min(d.endtime(),win.endtime).
- “pad” - will cause the function to have data in
the span define dby win_start to win_end but the sections where the data are undefined will be set to zeros.
This function handles input that is any valid MsPASS data object. For atomic data it is a very thin wrapper for the related function WindowDataAtomic. For atomic data this is nothing more than a convenience function that allows you to omit the “Atomic” qualifier. For ensemble data this function is more-or-less a loop over all ensemble members running WindowDataAtomic on each member datum. In all cases the return is the same type as the input but either shortened to the specified range or killed. A special case is ensembles where only some members may be killed.
A special option for ensembles only can be triggered by setting the optional argument overwrite_members to True. Default behavior returns an independent ensemble created from the input cut to the requested window interval. When overwrite_members is set True the windowing of the members will be done in place ovewriting the original ensemble contents. i.e. in tha output is a reference to the same object as the input. The primary use of the overwrite_members == False option is for use in map operators on large ensembles as it can significantly reduce the memory footprint.
Note the description of subsample time handling in the related docstring for WindowDataAtomic. For ensembles each member output preserves subsample timing.
Finally, how the function handles data marked dead is important. For atomic data the is no complexity. dead is dead and the function just immediately returns a reference to the input. For ensembles some members can be dead or the entire ensemble can be marked dead. If the ensemble is marked dead the function immediately returns a reference to the input. If any members are dead the result will depend on the boolean argument “retain_data_members”. When True (the default) dead members will be copied verbatim to the output. If False the dead members will be SILENTLY deleted. The False option is only recommended if the windowing is internal to a function and the windowed output will be discarded during processing. Otherwise the error log of why data were killed will be lost. If you need to save memory by clearing dead bodies use the Undertaker class to bury the dead and retain the error log data.
- Parameters:
d – is the input data. d must be either a
mspasspy.ccore.seismic.TimeSeries
ormspasspy.ccore.seismic.Seismogram
object or the function will log an error to d and return a None.twin_start (
float
) – defines the start of timeWindow to be cuttwin_end (
float
) – defines the end of timeWindow to be cutt0shift (real number (float) or a string that defines a) – is an optional time shift to apply to the time window. This parameter is convenient to avoid conversions to relative time. A typical example would be to set t0shift to an arrival time and let the window define time relative to that arrival time. Default is None which cause the function to assume twin is to be used directly. It can be specified one of two ways: (1) as a number where it is assumed to be a time shift to apply in seconds, or (2) as a a string. In the later case the string is assumed to be a valid key for fetching a time from a datum’s Metadata container. Note the name t0shift is a bit inconsistent with this usage but was retained as the original api did not have the string option.
Metadata container key. :param short_segment_handling: Defines method for handling data where the requested interval is shorter than the requested window but does have some overlap. segment_handling_methods must be one of the following:
- kill - in this mode any issues cause the return to be marked dead
(this is the default)
- ‘pad’ - in this mode return will have short segments will be padded
with zeros to define data of the required length.
- ‘truncate’ - in this mode short segments will be truncated on left
and/or right to match actual data range.
Note that if the input time range does not overlap the requested interval “kiLl” is always the behavior as by definition the result is null.
Throws a ValueError exception of not one of the accepted values. :param log_recoverable_errors: When True (default) any recoverable windowing error (meaning at least a partial overlap in time range) will cause a log message to be posted to the elog container of the output. When False recovery will be done silently. Note when short_segment_handling is set to “kill” logging is not optional and kill will always create an error log entry. :param overwrite_members: controls handling of the member vector of ensembles as described above. When True the member atomic data will be overwritten by the windowed version and the ensmble returned will be a reference to the same container as the input. When False (the default) a new container is created and returned. Note in that mode dead data are copied to the same slot as the input unaltered. This argument will be silently ignored if the input is an atomic MsPASS seismic object. :type overwrite_members: boolean :param retain_dead_members: Controls how dead data are handled with ensembles. When True (default) dead ensemble members are copied verbatim to the output. When False they are silently deleted. (see above for a more complete description). This argument is ignored for Atomic data. :type retain_dead_members: boolean :param object_history: boolean to enable or disable saving object
level history. Default is False. Note this functionality is implemented via the mspass_func_wrapper decorator.
- Parameters:
alg_name – When history is enabled this is the algorithm name assigned to the stamp for applying this algorithm. Default (“WindowData”) should normally be just used. Note this functionality is implemented via the mspass_func_wrapper decorator.
ald_id – algorithm id to assign to history record (used only if object_history is set True.) Note this functionality is implemented via the mspass_func_wrapper decorator.
dryrun – Note this functionality is implemented via the mspass_func_wrapper decorator.
dryrun – Note this functionality is implemented via the mspass_func_wrapper decorator.
- Returns:
copy of d with sample range reduced to twin range. Returns an empty version of the parent data type (default constructor) if the input is marked dead
- mspasspy.algorithms.window.WindowDataAtomic(d, win_start, win_end, t0shift=None, short_segment_handling='kill', log_recoverable_errors=True, object_history=False, alg_name='WindowDataAtomic', alg_id=None, dryrun=False)[source]
Cut atomic data to a shorter time segment defined by a time range.
Cutting a smaller waveform segment from a larger waveform segment is a very common seismic data processing task. Handling that low level operation needs to be done efficiently but with a reasonable number of options. This function is very fast if the inputs match the expected model where the requested time segment is inside the range of the datum passed via arg0. In that mode the function is a thin wrapper on a C++ function that does most of the work. When the window is inconsistent with the data, which is defined here as a “short segment”, more overhead is involved if you want to recover something. To be specific a “short segment” means the requested time span for a the window operation form win_start to win_end is not completely inside (inclusive of the endpoints) the range of the data.
Handling of “short segments” has these elements: 1. If the window range is completely outside the range of
the data the result is always killed and returned as a dead datum with no sample data. (d.npts=0)
Behavior if there is the window range overlaps but has boundaries outside the data range depends on the setting of the argument short_segment_handling and the boolean argument log_recoverable_errors. If log_recoverable_errors is set True (default) and the result is returned live (i.e. the error was recoverable but the data are flawed) a complaint message describing what was done will be posted to the elog container of the return. If False recoveries will be done silently. That can be useful in some algorithms (e.g. cross-correlation) where partial segments can be handled. What defines possible recovery is set by the short_segment_handling string. It must be one of only three possible values or the function will abort with a ValueError exception: “kill” - (default) does not recovery attempt and will
kill any data with time inconsistencies.
- “truncate” - this truncates to the output to the
time range max(d.t0,win.starttime) to min(d.endtime(),win.endtime).
- “pad” - will cause the function to have data in
the span define dby win_start to win_end but the sections where the data are undefined will be set to zeros.
Users should also be aware that this function preserves subsample timing in modern earthquake data. All seismic reflection processing systems treat timing as synchronous on all channels. That assumption is not true for data acquired by independent digitizers with external timing systems (today always GPS timing). In MsPASS that is handled through the t0 attribute of atomic data objects and the (internal) time shift from GMT used when the time standard is shifted to “relative”. A detail of this function is it preserves t0 subsample timing so if you carefully examine the t0 value on the return of this function it will match the twin_start value only to the nearest sample. That way time computed with the time method (endtime is a special case for sample npts-1) will have subsample accuracy.
- Parameters:
d – is the input data. d must be either a
mspasspy.ccore.seismic.TimeSeries
ormspasspy.ccore.seismic.Seismogram
object or the function will log an error to d and return a None.twin_start (
float
) – defines the start of timeWindow to be cuttwin_end (
float
) – defines the end of timeWindow to be cutt0shift (real number (float) or a string that defines a) – is an optional time shift to apply to the time window. This parameter is convenient to avoid conversions to relative time. A typical example would be to set t0shift to an arrival time and let the window define time relative to that arrival time. Default is None which cause the function to assume twin is to be used directly. It can be specified one of two ways: (1) as a number where it is assumed to be a time shift to apply in seconds, or (2) as a a string. In the later case the string is assumed to be a valid key for fetching a time from a datum’s Metadata container. Note the name t0shift is a bit inconsistent with this usage but was retained as the original api did not have the string option.
Metadata container key. :param short_segment_handling: Defines method for handling data where the requested interval is shorter than the requested window but does have some overlap. segment_handling_methods must be one of the following:
- kill - in this mode any issues cause the return to be marked dead
(this is the default)
- ‘pad’ - in this mode return will have short segments will be padded
with zeros to define data of the required length.
- ‘truncate’ - in this mode short segments will be truncated on left
and/or right to match actual data range.
Note that if the input time range does not overlap the requested interval “kiLl” is always the behavior as by definition the result is null.
Throws a ValueError exception of not one of the accepted values. :param log_recoverable_errors: When True (default) any recoverable windowing error (meaning at least a partial overlap in time range) will cause a log message to be posted to the elog container of the output. When False recovery will be done silently. Note when short_segment_handling is set to “kill” logging is not optional and kill will always create an error log entry. :param object_history: boolean to enable or disable saving object
level history. Default is False. Note this functionality is implemented via the mspass_func_wrapper decorator.
- Parameters:
alg_name – When history is enabled this is the algorithm name assigned to the stamp for applying this algorithm. Default (“WindowData”) should normally be just used. Note this functionality is implemented via the mspass_func_wrapper decorator.
ald_id – algorithm id to assign to history record (used only if object_history is set True.) Note this functionality is implemented via the mspass_func_wrapper decorator.
dryrun – Note this functionality is implemented via the mspass_func_wrapper decorator.
dryrun – Note this functionality is implemented via the mspass_func_wrapper decorator.
- Returns:
copy of d with sample range reduced to twin range. Returns an empty version of the parent data type (default constructor) if the input is marked dead
- mspasspy.algorithms.window.WindowData_autopad(d, stime, etime, pad_fraction_cutoff=0.05, object_history=False, alg_name='WindowData_autopad', alg_id=None, dryrun=False)[source]
Windows an atomic data object with automatic padding if the undefined section is not too large.
When using numpy or MsPASS data arrays the : notation can be used to drastically speed performance over using a python loop. This function is most useful for contexts where the size of the output of a window must exactly match what is expected from the time range. A type example is a multichannel algorithm where you need to use an API that loads a set of signals into a matrix that is used as the workspace for processing. That is the univeral model, for example, in seismic reflection processing. This algorithm will silently zero pad any undefined samples at the end of the window IF the fraction of undefined data relative to the number of samples expected from the time range defined by etime-stime is less than the “pad_fraction_cutoff”. If the input time span shorter than the computed mismatch limit the result will be returned marked dead with an elog entry highlighting the problem.
- Parameters:
d (works with either TimeSeries or Seismogram) – atomic MsPASS seismic data object to be windowed.
objects. Will raise a TypeError exception if d is not one of the two atomic data types. :param stime: start time of window range :type stime: float :param etime: end time of window range :type etime: float :param pad_fraction_cutoff: limit for the fraction of data with undefined values before the datum is killed. (see above) If set to 0.0 this is an expensive way to behave the same as WindowData :return: object of the same type as d. Will be marked dead if the fraction of undefined data exceeds pad_fraction_cutoff. Otherwise will return a data vector of constant size that may be padded.
- mspasspy.algorithms.window.ensemble_error_post(d, alg, message, severity)[source]
This is a small helper function useful for error handlers in except blocks for ensemble objects. If a function is called on an ensemble object that throws an exception this function will post the message posted to all ensemble members. It silently does nothing if the ensemble is empty. :param d: is the ensemble data to be handled. It print and error message
and returns doing nothing if d is not one of the known ensemble objects.
- Parameters:
alg – is the algorithm name posted to elog on each member
message – is the string posted to all members
severity – is the error severity level
- mspasspy.algorithms.window.merge(tsvector, starttime=None, endtime=None, fix_overlaps=False, zero_gaps=False, object_history=False) TimeSeries [source]
Splices a vector of TimeSeries objects together and optionally carves out a specified time window. It acts a bit like an obspy function with the same name, but has completely different options and works with native MsPASS TimeSeries objects.
This function is a workhorse for handling continuous data that are universally stored today as a series of files. The input to this function is an array of TimeSeries objects that are assummed to be created from a set of data stored in such files. The data are assumed to be from a common stream of a single channel and sorted so the array index defines a time order. The algorithm attempts to glue the segments together into a single time series that is returned. The algorithm by default assumes the input is “clean” which means the endtime of each input TimeSeries is 1 sample ahead of the start time of the next segment (i.e. (segment[i+1].t0()-segment[i].endtime()) == dt). The actual test is that the time difference is less than dt/2.
This algorithm treats two conditions as a fatal error and will throw a MsPASSError when the condition occurs:
It checks that the input array of TimeSeries data are in time order.
It checks that the inputs all have the same sample rate.
If fix_overlaps is False if an overlap is found it is considered an exception.
Either of these conditions will cause the function to throw an exception. The assumption is that either is a user error created by failing to reading the directions that emphasize this requirement for the input.
Other conditions can cause the output to be marked dead with an error message posted to the output’s elog attribute. These are:
This algorithm aims to produce an output with data stored in a continuous vector with a length defined by the total time span of the input. Naive use can create enormously long vectors that would mostly be empty. (e.g. two random day files with a 5 year difference in start time) The algorithm refuses to try to merge data when the span exceeds an internal threshold of 10^8 samples.
If fix_overlaps is false any overlap of successive endtime to the next starttime of more than 0.5 samples will cause the output to be killed. The assumption is such data have a serious timing problem retained even after any cleaning.
The above illustrates that this function behaves differently and makes different assumption if the argument check_overlaps is True or False. False is faster because it bypasses the algorithm used to fix overlaps, but is safer if your data is not perfectly clean in the sense of lacking any timing issues or problem with duplicates. If the fix_overlaps boolean is set True, the mspass overlap handler is involked that is a C++ function with the name “repair_overlaps”. The function was designed only to handle the following common situations. How they are handled is different for each situation.
Duplicate waveform segments spanning a common time interval can exist in raw data and accidentally by indexing two copies the same data. If the samples in the overlapping section match the algorithm will attempt to remove the overlapping section. The algorithm is known to work only for a pair of pure duplicates and a smaller segment with a start time after a more complete segment. It may fail if there are more than three or more copies of the same waveform in the input or one of the waveforms spans a smaller time range than the other but has the same start time. The first is a gross user error. The second should be rare an is conceivable only with raw data where a packet or two was randomly saved twice - something that shouldn’t happen but could with flakey hardware.
If an overlap is detected AND the sample data in the overlap are different the algorithm assumes the data have a timing problem that created this situation. Our experience is this situation only happens when an instrument has a timing problem. All continuous data generating digitizers we know of use a timing system slaved to an external reference (usually GPS time). If the external signal is lost the clock drifts. When the signal is restored if the digitizer detects a large time jump it may reset (time jerk) to tag the next packet of data with the updated time based on the standard. If that time jump is forward it will leave an apparent gap in the data, which we discuss below, but if the jump is backward it will leave an apparent overlap with inconsistent samples. The repair_overlaps function assumes that is the cause of all overlaps it detects.
The final common situation this function needs to handle is gaps. A gap is defined in this algorithm as any section where the endtime of one segment is followed by a start time of the next segment that is more than 1 sample in duration. Specifically when
(segment[i+1].t0()-segment[i].endtime()) > 1.5*dt
The result depends on values of the (optional) windowing arguments starttime and endtime and the boolean “zero_gaps” argument. If windowing is enabled (done by changing default None values of starttime and endtime) any gap will be harmless unless it is present inside the specified time time range. In all cases what happens to the output depends upon the boolean zero_gaps. When zero_gaps is False any gaps detected within the output range of the result (windowed range if specified but the full time of the input otherwise) When set True gap sections will be zeroed in the output data vector. All outputs with gaps that have been zeroed will have the boolean Metadata attribute “has_gap” set True and undefined otherwise. When the has_data attribute is set true the tap windows will be stored as a list of “TimeWindow” objects with the Metadata key “gaps”.
- Parameters:
tsvector – array of TimeSeries data that are to be spliced together to create a single output. The contents must all have the same sample rate and be sorted by starttime.
starttime (double - assumed to be a UTC time expressed as a unix epoch time.) – (optional) start time to apply for windowing the output. Default is None which means the output will be created as the merge of all the inputs. When set WindowData is applied with this start time. Note if endtime is defined but starttime is None windowing is enabled with starttime = earliest segment start time.
endtime (double - assumed to be a UTC time expressed as a unix epoch time.) – (optional) end time to apply for windowing the output. Default is None which means the output will be created as the merge of all the inputs. When set WindowData is applied with this end time. Note if starttime is defined but endtime is None windowing is enabled with endtime = latest end time of all segments.
fix_overlaps (boolean) – when set True (default is False) if an overlap is detected the algorithm will attempt to repair the overlap to yield a continuous time series vectgor if it determined to have matching data. See description above for more details.
zero_gaps – When set False, which is the default, any gaps detected in the output window will cause the return to be marked dead. When set True, gaps will be zeroed and with a record of gap positions posted to the Metadata of the output. See above for details.
zero_gaps – boolean controlling how gaps are to be handled.
See above for details of the algorithm. :type zero_gaps: boolean (default False) :param object_history: boolean to enable or disable saving object
level history. Default is False. Note this functionality is implemented via the mspass_func_wrapper decorator.
- Returns:
TimeSeries in the range defined by the time span of the input vector of segments or if starttime or endtime are specified a reduced time range. The result may be marked dead for a variety of reasons with error messages explaining why in the return elog attribute.
- mspasspy.algorithms.window.scale(d, compute_from_window=False, window=None, method='peak', level=1.0, scale_by_section=False, use_mean=False, object_history=False, alg_name='scale', alg_id=None, dryrun=False, function_return_key=None)[source]
Top level function interface to data scaling methods.
This function can be used to scale seismic data contained in any of the four seismic data objects defined in mspass: TimeSeries, Seismogram, TimeSeriesEnsemble, and SeismogramEnsemble. An extensive set of amplitude estimation metrics are available by selecting one of the allowed values for the method parameter. Ensembles can be scaled at the individual seismogram level or as a whole (scale_by_section=True).
Note all methods preserve the original amplitude by updating the Metadata parameter calib to include the scaling. i.e. as always the amplitude of the data in physical units can be restored by multiplying the data samples by calib.
- Parameters:
d – is input data object. If not one of the four mspass seismic data types noted above the function will throw a RuntimeError exception.
compute_from_window – boolean used to compute amplitude and scale based on a windowed section of the input waveform. By default (this boolan False) the amplitude for scaling is computed from the full waveform. When True the window argument must contain a valid TimeWindow that spans a time range smaller than the data range. In that situation if the window is inconsistent with the data range the return will be marked dead and messages will be found posted in elog. For ensembles all or only portion of the group will be killed if this happens. Note this parameter is also ignored when scale_by_section is true.
window – mspass TimeWindow object defining the time range over which the amplitude for scaling is to be computed. (see the compute_from_window parameter description)
method – string defining the gain method to use. Currently supported method values are: peak, RMS (rms accepted), perc, and MAD (also accepts mad or Mad). Peak uses the largest amplitude for scaling. For 3C data that means vector amplitude while for scalar data it is the largest absolute value. rms is the standard rms measure, although for 3C data is is rms vector amplitudes so scaling is by the number of vectors not the total number of samples (3 times number of vectors). perc uses a sorted percentage clip level metric as used in seismic unix. mad is a variant where the value returned is the median absolute deviation (mad) that is actual the same as perc=1/2. Default is peak. WARNING: if an invalid value for method is passed the data will be returned unaltered with a complaint message issue for very datum (indivually or in ensembles) run that way.
level – For all but perc this defines the scale to which the data are scaled. For perc it is used to set the percent clip level. If the value passed is illegal (0 or negative for most methods while perc must also be positive but less or equal 1) a complain message will be posted to elog and the level adjusted to 1.0.
scale_by_section – is a boolean that controls the scaling behavior on ensembles only (It is silently ignored for atomic TimeSeries and Seismogram data). When true a single gain factor is applied to all members of an ensemble. When false each member is gained individually as if this function were applied in a loop to each member.
use_mean – boolean used only for ensembles and when scale_by_section is True. The algorithm used in that case has an option to use the mean log amplitude for scaling the section instead of the default median amplitude.
- Returns:
Data scaled to specified level. Note the scaling always preserves absolute amplitude by adjusting the value of the calib attribute of the return so calib*data is the same value before and after the scaling.
- Return type:
same as input