mspasspy.util

converter

Functions for converting to and from MsPASS data types.

mspasspy.util.converter.AntelopePf2dict(pf)[source]

Converts an AntelopePf object to a Python dict by recursively decoding the tbls.

Parameters:

pf (AntelopePf) – AntelopePf object to convert.

Returns:

Python dict equivalent to pf.

Return type:

dict

mspasspy.util.converter.Metadata2dict(md)[source]

Converts a Metadata object to a Python dict.

This is the inverse of dict2Metadata. It converts a Metadata object to a Python dict. Note that Metadata behaves like a dict, so this conversion is usually not necessary.

Parameters:

md (Metadata) – Metadata object to convert.

Returns:

Python dict equivalent to md.

Return type:

dict
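
Example (a minimal sketch of the round trip with dict2Metadata; the document keys and values here are hypothetical):

    from mspasspy.util.converter import dict2Metadata, Metadata2dict

    # A document as returned by a pymongo find query (keys are hypothetical).
    doc = {"net": "IU", "sta": "ANMO", "delta": 0.025}
    md = dict2Metadata(doc)   # Metadata usable as a header by the C++ components
    d = Metadata2dict(md)     # back to a plain dict (rarely needed; Metadata already behaves like a dict)
    assert d["sta"] == "ANMO"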

mspasspy.util.converter.Pf2AttributeNameTbl(pf, tag='attributes')[source]

This function will parse a pf file to extract a tbl with a specific key and return a data structure that defines the names and types of each column in the input file.

The structure returned is a tuple with three components:

  1. (index 0) python array of attribute names in the original tbl order. This is used to parse the text file so the order matters a lot.

  2. (index 1) parallel array of type names for each attribute. These are actual python type objects that can be used as the second arg of isinstance.

  3. (index 2) python dictionary keyed by the name field that defines what a null value is for each attribute.

Parameters:
  • pf – AntelopePf object to be parsed

  • tag – &Tbl tag for section of pf to be parsed.

mspasspy.util.converter.Seismogram2Stream(sg, chanmap=['E', 'N', 'Z'], hang=[90.0, 0.0, 0.0], vang=[90.0, 90.0, 0.0])[source]

Convert a mspass::Seismogram object to an obspy::Stream with 3 components split apart.

mspass and obspy have completely incompatible approaches to handling three component data. obspy uses a Stream object that is a wrapper around a list of Trace objects. mspass stores 3C data bundled into a matrix container. This function takes the matrix container apart and produces the three Trace objects obspy wants to define 3C data. The caller is responsible for how they handle bundling the output.

A very dark side of this function is that any error log entries in the parent mspass Seismogram object will be lost in this conversion as obspy does not implement that concept. If you need to save the error log you will need to save the input of this function to MongoDB to preserve the errorlog it may contain.

Parameters:
  • sg (Seismogram) – is the Seismogram object to be converted

  • chanmap (list) – 3 element list of channel names to be assigned components

  • hang (list) – 3 element list of horizontal angle attributes (azimuth in degrees) to be set in Stats array of output for each component. (default is for cardinal directions)

  • vang (list) – 3 element list of vertical angle (theta of spherical coordinates) to be set in Stats array of output for each component. (default is for cardinal directions)

Returns:

obspy Stream object containing a list of 3 Trace objects in mspass component order. Presently the data are ALWAYS returned in cardinal directions (see above). It will be empty if sg was marked dead.

Return type:

obspy.core.stream.Stream
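
Example (a hedged sketch; seis is assumed to be a live Seismogram obtained elsewhere, e.g. from a database read):

    from mspasspy.util.converter import Seismogram2Stream

    strm = Seismogram2Stream(seis)   # default chanmap=['E', 'N', 'Z']
    for tr in strm:
        print(tr.stats.channel, tr.stats.npts)
    # Any elog entries on seis are NOT carried into strm; save seis first if that matters.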

mspasspy.util.converter.SeismogramEnsemble2Stream(sge)[source]

Convert a seismogram ensemble to a stream.

Parameters:

sge – seismogram ensemble input

Returns:

converted stream

mspasspy.util.converter.Stream2Seismogram(st, master=0, cardinal=False, azimuth='azimuth', dip='dip')[source]

Convert obspy Stream to a Seismogram.

Convert an obspy Stream object with 3 components to a mspass::Seismogram (three-component data) object. This implementation actually converts each component first to a TimeSeries and then calls a C++ function to assemble the complete Seismogram. This has some inefficiencies, but the assumption is this function is called early on in a processing chain to build a raw data set.

Parameters:
  • st – input obspy Stream object. The object MUST have exactly 3 components or the function will throw an AssertionError exception. The program is less dogmatic about start times and number of samples as these are handled by the C++ function this python script calls. Be warned, however, that the C++ function can throw a MsPASSError exception that should be handled separately.

  • master – a Seismogram is an assembly of three channels created from three TimeSeries/Trace objects. Each component may have different metadata (e.g. orientation data) and common metadata (e.g. station coordinates). To assemble a Seismogram a decision has to be made on which component has the definitive common metadata. We use a simple algorithm and clone the data from one component defined by this index. Must be 0, 1, or 2 or the function will throw a RuntimeError. Default is 0.

  • cardinal – boolean used to define one of two algorithms used to assemble the bundle. When true the three input components are assumed to be in cardinal directions (x1=positive east, x2=positive north, and x3=positive up) AND in a fixed order of E,N,Z. Otherwise the Metadata fetched with the azimuth and dip keys are used for orientation.

  • azimuth – defines the Metadata key used to fetch the azimuth angle used to define the orientation of each component Trace object. Default is ‘azimuth’ used by obspy. Note azimuth=hang in css3.0. Cannot be aliased - must be present in obspy Stats unless cardinal is true

  • dip – defines the Metadata key used to fetch the vertical angle orientation of each data component. Vertical angle (vang in css3.0) is exactly the same as theta in spherical coordinates. Default is obspy ‘dip’ key. Cannot be aliased - must be defined in obspy Stats unless cardinal is true

Raise:

Can throw either an AssertionError or a MsPASSError (currently defaulted to pybind11's default RuntimeError; the error message can be obtained by calling the what method of RuntimeError).
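
Example (a sketch under the assumption that st3c is an obspy Stream holding exactly 3 Trace objects for one station):

    from mspasspy.util.converter import Stream2Seismogram

    # Traces already in E, N, Z order with cardinal orientations:
    seis = Stream2Seismogram(st3c, master=0, cardinal=True)

    # Otherwise orientation is fetched from each Trace's Stats with these keys:
    # seis = Stream2Seismogram(st3c, azimuth="azimuth", dip="dip")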

mspasspy.util.converter.Stream2SeismogramEnsemble(stream)[source]

Convert a stream to a seismogram ensemble.

Parameters:

stream – stream input

Returns:

converted seismogram ensemble

mspasspy.util.converter.Stream2TimeSeriesEnsemble(stream)[source]

Convert a stream to a timeseries ensemble.

Parameters:

stream – stream input

Returns:

converted timeseries ensemble

mspasspy.util.converter.Textfile2Dataframe(filename, separator='\\s+', type_dict=None, header_line=0, attribute_names=None, rename_attributes=None, attributes_to_use=None, one_to_one=True, parallel=False, insert_column=None)[source]

Import a text file representation of a table and store its representation as a pandas dataframe. Note that even in the parallel environment, a dask dataframe will be transferred back to a pandas dataframe for consistency.

Parameters:
  • filename – path to text file that is to be read to create the table object that is to be processed (internally we use pandas or dask dataframes)

  • separator

    The delimiter used for separating fields; the default is "\s+", which is the regular expression for "one or more whitespace characters".

    For a csv file, its value should be set to ','. This parameter will be passed to pandas.read_csv or dask.dataframe.read_csv. To learn more details about the usage, check the following links: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html https://docs.dask.org/en/latest/generated/dask.dataframe.read_csv.html

  • type_dict – pairs of each attribute and its type, used to validate the type of each input item

  • header_line – defines the line to be used as the attribute names for columns; if it is < 0, attribute_names is required. Please note that if attribute_names is provided, the attributes defined in header_line will always be overridden.

  • attribute_names – This argument must be a list of (unique) string names to define the attribute name tags for each column of the input table. The length of the array must match the number of columns in the input table or this function will throw a MsPASSError exception. This argument is None by default which means the function will assume the line specified by the “header_line” argument as column headers defining the attribute names. If header_line is less than 0 this argument is required. When header_line is >= 0 and this argument (attribute_names) is defined all the names in this list will override those stored in the file at the specified line number.

  • rename_attributes – This is expected to be a python dict keyed by names matching those defined in the file or attribute_names array (i.e. the pandas/dataframe column index names) and values defining strings to use to override the original names. That usage, of course, is most common to override names in a file. If you want to change all the names use a custom attribute_names list as noted above. This argument is mostly to rename a small number of anomalous names.

  • attributes_to_use – If used this argument must define a list of attribute names that define the subset of the dataframe attributes that are to be saved. For relational db users this is effectively a “select” list of attribute names. The default is None which is taken to mean no selection is to be done.

  • one_to_one – is an important boolean used to control whether the output is filtered by rows. The default is True which means every tuple in the input file will create a single row in the dataframe. (Useful, for example, to construct a wf_miniseed collection from css3.0 attributes.) If False the (normally reduced) set of attributes defined by attributes_to_use will be filtered with the pandas/dask dataframe drop_duplicates method. That approach is important, for example, to filter things like Antelope “site” or “sitechan” attributes created by a join to something like wfdisc and saved as a text file to be processed by this function.

  • parallel – When True we use the dask dataframe operation. The default is False meaning the simpler, identical api pandas operators are used.

  • insert_column – a dictionary of new columns to add, and their value(s). If the content is a single value, it can be passed to define a constant value for the entire column of data. The content can also be a list; in that case, the list should contain the values that are to be set, and it must be the same length as the number of tuples in the table.
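
Example (a sketch only; the file name and column names are hypothetical and assume an Antelope "site" table dumped to text):

    from mspasspy.util.converter import Textfile2Dataframe

    df = Textfile2Dataframe(
        "site.txt",
        header_line=-1,                                   # the file has no header row
        attribute_names=["sta", "lat", "lon", "elev"],    # one name per column, in file order
        attributes_to_use=["sta", "lat", "lon", "elev"],
        one_to_one=False,                                 # drop duplicate rows with drop_duplicates
    )
    print(df.head())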

mspasspy.util.converter.TimeSeries2Trace(ts)[source]

Converts a TimeSeries object to an obspy Trace object.

MsPASS can handle scalar data either as an obspy Trace object or with the mspass TimeSeries object. They capture nearly the same concepts. The main difference is that TimeSeries supports the error logging and history features of mspass while obspy, which is a separate package, does not. Obspy has a number of useful algorithms that operate on scalar data, however, so it is frequently useful to switch between the Trace and TimeSeries formats. The user is warned, however, that converting a TimeSeries to a Trace object with this function will result in the loss of any error log information. For production runs, unless the data set is huge, we recommend saving the intermediate result AFTER calling this function if there is any possibility there are errors posted on any data. We say after because some warning errors from this function may be posted in elog. Since python uses call by reference ts may thus be altered.

Parameters:

ts (TimeSeries) – is the TimeSeries object to be converted

Returns:

an obspy Trace object from conversion of ts. An empty Trace object will be returned if ts was marked dead.

Return type:

obspy.core.trace.Trace
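
Example (a hedged sketch of the Trace/TimeSeries round trip; ts is assumed to be a live TimeSeries and the filter parameters are arbitrary):

    from mspasspy.util.converter import TimeSeries2Trace, Trace2TimeSeries

    tr = TimeSeries2Trace(ts)                           # elog/history content is not carried over
    tr.filter("bandpass", freqmin=0.01, freqmax=1.0)    # standard obspy Trace method
    ts_filtered = Trace2TimeSeries(tr)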

mspasspy.util.converter.TimeSeriesEnsemble2Stream(tse)[source]

Convert a timeseries ensemble to stream. Always copies all ensemble Metadata to tse members before conversion. That is necessary to avoid loss of data in the case where the only copy is stored in the ensemble’s metadata.

Parameters:

tse – timeseries ensemble

Returns:

converted stream
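
Example (a sketch assuming tse is a live TimeSeriesEnsemble):

    from mspasspy.util.converter import (
        TimeSeriesEnsemble2Stream,
        Stream2TimeSeriesEnsemble,
    )

    strm = TimeSeriesEnsemble2Stream(tse)   # ensemble Metadata is first copied to the members
    strm.detrend("demean")                  # standard obspy Stream method
    tse2 = Stream2TimeSeriesEnsemble(strm)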

mspasspy.util.converter.Trace2TimeSeries(trace, history=None)[source]

Convert an obspy Trace object to a TimeSeries object.

An obspy Trace object mostly maps directly into the mspass TimeSeries object with the stats of Trace mapping (almost) directly to the TimeSeries Metadata object that is a base class to TimeSeries. A deep copy of the data vector in the original Trace is made to the result. That copy is done in C++ for speed (we found a 100+ fold speedup using that mechanism instead of a simple python loop). There is one important type collision in copying obspy starttime and endtime stats fields. obspy uses their UTCDateTime object to hold time but TimeSeries only supports an epoch time (UTCDateTime.timestamp) so the code here has to convert from the UTCDateTime to epoch time in the TimeSeries. Note in a TimeSeries starttime is the t0 attribute.

The biggest mismatch in Trace and TimeSeries is that Trace has no concept of object level history as used in mspass. That history must be maintained outside obspy. To maintain full history the user must pass the history maintained externally through the optional history parameter. The contents of history will be loaded directly into the result with no sanity checks.

Parameters:
  • trace (Trace) – obspy trace object to convert

  • history – mspass ProcessingHistory object to post to result.

Returns:

TimeSeries object derived from the obspy input Trace object

Return type:

TimeSeries
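
Example (a runnable sketch using obspy's built-in example waveform returned by obspy.read() with no arguments):

    import obspy
    from mspasspy.util.converter import Trace2TimeSeries

    tr = obspy.read()[0]        # first Trace of obspy's example Stream
    ts = Trace2TimeSeries(tr)
    # obspy's UTCDateTime starttime maps to the epoch-time t0 attribute:
    print(ts.t0, tr.stats.starttime.timestamp)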

mspasspy.util.converter.dict2Metadata(dic)[source]

Function to convert Python dict data to Metadata.

pymongo returns a Python dict container from find queries to any collection. Simple types in returned documents can be converted to Metadata that are used as headers in the C++ components of mspass.

Parameters:

dic (dict) – Python dict to convert

Returns:

Metadata object translated from dic

Return type:

Metadata

mspasspy.util.converter.list2Ensemble(l, keys=None)[source]

Convert a list of TimeSeries or Seismogram objects to the corresponding type of Ensemble. This function makes copies of all the data to create a new Ensemble. Note that the Ensemble's Metadata will always be copied from the first member. If the keys argument is specified, only the keys specified will be copied. If a key does not exist in the first member, it will be skipped and a complaint will be left in the error log of the ensemble.

Parameters:
  • l – a list of TimeSeries or Seismograms

  • keys – a list of keys to be copied from the first object to the Ensemble’s Metadata

Returns:

converted TimeSeriesEnsemble or SeismogramEnsemble
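
Example (a sketch assuming ts_list is a python list of TimeSeries objects that all carry a "source_id" entry; the key name is hypothetical):

    from mspasspy.util.converter import list2Ensemble

    ens = list2Ensemble(ts_list, keys=["source_id"])   # ensemble Metadata copied from ts_list[0]
    print(len(ens.member))                             # members are copies of the list entries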

mspasspy.util.converter.post_ensemble_metadata(ens, keys=[], check_all_members=False, clean_members=False)[source]

It may be necessary to call this function after conversion from an obspy Stream to one of the mspass Ensemble classes. This function is necessary because a mspass Ensemble has a concept not part of the obspy Stream object. That is, mspass ensembles have a global Metadata container. That container is expected to contain Metadata common to all members of the ensemble. For example, for data from a single earthquake it would be sensible to post the source location information in the ensemble metadata container rather than having duplicates in each member.

Two different approaches can be used to do this copy. The faster, but least reliable method is to simply copy the values from the first member of the ensemble. That approach is enabled by default. It is completely reliable when used after a conversion from an obspy Stream but ONLY if the data began life as a mspass ensemble with exactly the same keys set as global. The type example of that is after an obspy algorithm is applied to a mspass ensemble via the mspass decorators.

A more cautious algorithm can be enabled by setting check_all_members True. In that mode the list of keys received is tested with a not-equal test against each member. Note we do not do anything fancy with floating point data to allow for finite precision. The reason is Metadata float values are normally expected to be constant data. In that case an != test will yield false when the comparison is between two copies. The not-equal test may fail, however, if used with computed floating point numbers. An example where that is possible would be spatial gathers like PP data assembled by midpoint coordinates. If you need to build gathers in such a context we recommend you use an integer image point tied to a specialized document collection in MongoDB that defines the geometry of that point. There may be other examples, but the point is don't trust computed floating point values to work. It will also not work if the values of a key-value pair don't support an != comparison. That could be common if the value requested for copy is a python object.

Parameters:
  • ens – ensemble data to be processed. The function will throw a MsPASSError exception if ens is not either a TimeSeriesEnsemble or a SeismogramEnsemble.

  • keys – is expected to be a list of metadata keys (required to be strings) that are to be copied from member metadata to ensemble metadata.

  • check_all_members – switch controlling method used to extract metadata that is to be copied (see above for details). Default is False

  • clean_members – when true data copied to ensemble metadata will be removed from all members. This option is only allowed if check_all_members is set True. It will be silently ignored if check_all_members is False.
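
Example (a sketch assuming ens is a TimeSeriesEnsemble whose members all carry a "source_id" value; the key name is hypothetical):

    from mspasspy.util.converter import post_ensemble_metadata

    # Fast mode: copy the keys from the first member.
    post_ensemble_metadata(ens, keys=["source_id"])

    # Cautious mode: verify the value is the same in every member and strip the
    # copies from the members after posting to the ensemble Metadata.
    post_ensemble_metadata(
        ens, keys=["source_id"], check_all_members=True, clean_members=True
    )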

decorators

mspasspy.util.decorators.is_input_dead(*args, **kwargs)[source]

A helper method to see if any mspass objects in the input parameters are dead. If one is dead, we should keep silent, i.e. no longer perform any further operations on this dead mspass object. Note that an ensemble object is treated as dead only if all of its members are dead; otherwise it is still considered alive.

Parameters:
  • args – any parameters.

  • kwargs – any key-word parameters.

Returns:

True if there is a dead mspass object in the parameters, False if no mspass objects in the input parameters or all of them are still alive.

mspasspy.util.decorators.seismogram_copy_helper(seis1, seis2)[source]
mspasspy.util.decorators.seismogram_ensemble_copy_helper(es1, es2)[source]
mspasspy.util.decorators.timeseries_copy_helper(ts1, ts2)[source]
mspasspy.util.decorators.timeseries_ensemble_copy_helper(es1, es2)[source]

logging_helper

mspasspy.util.logging_helper.ensemble_error(d, alg, message, err_severity=<ErrorSeverity.Invalid: 1>)[source]

This is a small helper function useful for error handlers in except blocks for ensemble objects. If a function called on an ensemble object throws an exception, this function will post the message to all ensemble members. It silently does nothing if the ensemble is empty.

Parameters:
  • err_severity – severity of the error, default as ErrorSeverity.Invalid.

  • d – is the ensemble data to be handled. It prints an error message and returns without doing anything if d is not one of the known ensemble objects.

  • alg – is the algorithm name posted to elog on each member

  • message – is the string posted to all members

(Note that due to a current flaw in the api we don't have access to the severity attribute. For now this is always set to Invalid.)
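
Example (a sketch; some_processing_function is a hypothetical function applied to an ensemble ens):

    from mspasspy.util import logging_helper

    try:
        ens = some_processing_function(ens)   # hypothetical; may raise
    except Exception as err:
        # Post the failure message to the elog of every ensemble member.
        logging_helper.ensemble_error(ens, "some_processing_function", str(err))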

mspasspy.util.logging_helper.info(data, alg_id, alg_name, target=None)[source]

This helper function is used to log operations in the processing history of a mspass object. Per best practice, every operation applied to a mspass object should be logged.

Parameters:
  • data – the mspass data object

  • alg_id – an id designator to uniquely define an instance of the algorithm.

  • alg_name – the name of the algorithm applied to the mspass object.

  • target – if the mspass data object is an ensemble type, you may use target as index to log on one specific object in the ensemble. If target is not specified, all the objects in the ensemble will be logged using the same information.

Returns:

None
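
Example (a sketch; the alg_id and alg_name values are arbitrary placeholders):

    from mspasspy.util import logging_helper

    logging_helper.info(ts, "0", "detrend")               # log on an atomic object
    logging_helper.info(ens, "0", "detrend", target=2)    # log only on member 2 of an ensemble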

mspasspy.util.logging_helper.reduce(data1, data2, alg_id, alg_name)[source]

This function replicates the processing history of data2 onto data1, which is a common use case in a reduce stage. If data1 is dead, it will keep silent, i.e. no history will be replicated. If data2 is dead, the processing history will still be replicated.

Parameters:
  • data1 – Mspass object

  • data2 – Mspass object

  • alg_id – The unique id that the user gives to the algorithm.

  • alg_name – The name of the reduce algorithm that uses this helper function.

Returns:

None

seispp

mspasspy.util.seispp.index_data(filebase, db, ext='d3C', verbose=False)[source]

Import function for data from antelope export_to_mspass.

This function is an import function for Seismogram objects created by the antelope program export_to_mspass. That program writes header data as a yaml file and the sample data as a raw binary fwrite of the data matrix (stored in fortran order but written as a contiguous block of 3*npts (number of samples) double values). This function parses the yaml file and adds three critical metadata entries: dfile, dir, and foff. To get foff values the function reads the binary data file and gets foff values by calls to tell. It then writes these entries into MongoDB in the wf collection of a database. Readers that want to read this raw data will need to use dir, dfile, and foff to find the right file and read point.

Parameters:
  • filebase – is the base name of the dataset to be read and indexed. The function will look for filebase.yaml for the header data and filebase.ext (Arg 3 defaulting to d3C).

  • db – is the MongoDB database handler

  • ext – is the file extension for the sample data (default is ‘d3C’).

Undertaker

class mspasspy.util.Undertaker.Undertaker(dbin, regular_data_collection='cemetery', aborted_data_collection='abortions', data_tag=None)[source]

Bases: object

Class to handle dead data. Results are stored in two special collections defined by default as "cemetery", for regular dead bodies, and "abortions" for those defined as abortions.

Parameters:
  • dbin (mspasspy.db.Database) – Should be an instance of mspasspy.db.Database that is used to save the remains of any bodies. The constructor for this class only tests that the handle is an instance of pymongo's Database class. The MsPASS version of Database extends the pymongo version. This particular class references only two methods of Database: (1) the private method _save_elog and (2) the private method _save_history. Technically an alternative extension of pymongo's Database class that implements those two methods would be plug compatible. Users who might want to pull MsPASS apart and use this class separately could do so with an alternative Database extension than MsPASS.

  • regular_data_collection (string) – collection where we bury regular dead bodies. Default "cemetery".

  • aborted_data_collection (string) – collection where aborted data documents are buried. Default "abortions".

  • data_tag – tag to attach to each document. Normally would be the same as the data_tag used for a particular save operation for data not marked dead.

bring_out_your_dead(d, bury=False, save_history=True, mummify_atomic_data=True)[source]

Separate an ensemble into live and dead members. Result is returned as a pair (tuple) of two ensembles. First (0 component) is a copy of the input with the dead bodies removed. The second (component 1) has the same ensemble Metadata as the input but only contains dead members - like the name implies, stolen from a great line in the Monty Python movie "Search for the Holy Grail".

Parameters:
  • d – must be either a TimeSeriesEnsemble or SeismogramEnsemble of data to be processed.

  • bury – if true the bury method will be called on the ensemble of dead data before returning. Note a limitation of using this method is there is no way to save the optional history data via this method. If you need to save history run this with bury=False and then run bury with save_history true on the dead ensemble. There is also no way to specify an alternative to the default collection name of “cemetery”

Returns:

python list with two elements. 0 is ensemble with live data and 1 is ensemble with dead data.

Return type:

python list with two components
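
Example (a sketch assuming db is a mspasspy.db.Database handle and ens is an ensemble in which some members were killed):

    from mspasspy.util.Undertaker import Undertaker

    stedronsky = Undertaker(db)
    living, dead = stedronsky.bring_out_your_dead(ens)
    stedronsky.bury(dead)    # saves elog plus a "tombstone" document for each dead member
    # continue processing with the living ensemble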

bury(mspass_object, save_history=False, mummify_atomic_data=True)[source]

Handles dead data by saving a subset of content to database.

MsPASS makes extensive use of the idea of “killing” data as a way to say it is bad and should not be considered further in any analysis. There is a need to record what data was killed and it is preferable to do so without saving the entire data object. (That is the norm in seismic reflection processing where data marked dead are normally carried through until removed through a process like a stack.) This method standardizes the method of how to do that and what is saved as the shell of a dead datum. That “shell” is always a minimum of two things:

  1. All elog entries - essential to understand why datum was killed

  2. The content of the Metadata container saved under a subdocument called “tombstone”.

If save_history is set True and the datum has history records they will also be saved.

It is important to realize this method acts like an overloaded C++ method in that it accepts multiple data types, but handles them differently.

  1. Atomic data (TimeSeries or Seismogram) marked dead generate a document saved to the specified collection and an (optional) history document. If the mummify_atomic_data parameter is set True (the default) the returned copy of the data will be processed with the "mummify" method of this class. (That means the sample data are discarded and the array is set to zero length.)

  2. Ensembles have to handle two different situations. If the entire ensemble is marked dead, all members are treated as dead and then processed through this method by a recursive call on each member. In that situation an empty ensemble is returned with only ensemble metadata not empty. If the ensemble is marked live the code loops over members calling this method recursively only on dead data. In that situation the ensemble returned is edited with all dead data removed. (e.g. if we started with 20 members and two were marked dead, the return would have 18 members.)

Parameters:
  • mspass_object (Must be a MsPASS seismic data object (TimeSeries, Seismogram, TimeSeriesEnsemble, or SeismogramEnsemble) or the method will throw a TypeError.) – datum to be processed

  • save_history – If True and a datum has the optional history data stored with it, the history data will be stored in a MongoDB collection hard wired into the _save_history method of Database. Default is False

  • mummify_atomic_data – When True (default) atomic data marked dead will be passed through self.mummify to reduce memory use of the remains. This parameter is ignored for ensembles.

bury_the_dead(mspass_object, save_history=True, mummify_atomic_data=True)[source]

Deprecated method exactly equivalent to the new, shorter name of simply bury. With context as a member of Undertaker the long name was redundant. Note the call sequence is exactly the same as bury.

cremate(mspass_object)[source]

Like bury but nothing is preserved of the dead.

For atomic data it returns a default constructed (empty) copy of the container matching the original type. That avoids downstream type collisions if this method is called in a parallel workflow to release memory. This method is most appropriate for ensembles. In that case, it returns a copy of the ensemble with all dead data removed (i.e. they are omitted from the returned copy leaving no trace). If an ensemble is marked dead the return is an empty ensemble containing only ensemble Metadata.

Parameters:

mspass_object – Seismic data object. If not a MsPASS seismic data object a TypeError will be thrown.
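
Example (a sketch assuming db is a mspasspy.db.Database handle and ens is an ensemble that may contain dead members):

    from mspasspy.util.Undertaker import Undertaker

    stedronsky = Undertaker(db)
    ens = stedronsky.cremate(ens)   # dead members are simply dropped; nothing is saved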

handle_abortion(doc_or_datum, type=None)[source]

Standardized method to handle what we call abortions (see class overview).

This method standardizes handling of abortions. They are always saved as a document in a collection set by the constructor (self.aborted_data_collection) that defaults to "abortions". The documents saved have up to 3 key-value pairs:

"tombstone" - contents are a subdocument (dict) of the wf document that was aborted during construction.

"logdata" - any error log records left by the reader that failed.

"type" - string describing the expected type of data object that a reader was attempting to construct. In rare situations it could be set to "unknown" if Undertaker._handle_abortion is called on a raw document and type is not set (see parameters below).

Parameters:
  • doc_or_datum (TimeSeries, Seismogram, Metadata, or a python dict) – container defining the aborted fetus. For the seismic data objects any content in the ErrorLogger will be saved. For dict input an application should post a message to the dict with some appropriate (custom) key to preserve a cause for the abortion.

  • type – string description of the type of data object to associate with dict input. Default for this parameter is None and it is not referenced at all for normal input of TimeSeries and Seismogram objects. It is ONLY referenced if arg0 is a dict. If type is None and the input is a dict the value assigned to the "type" key in the abortions document is "unknown". The escape for "unknown" makes the method bombproof but may make the saved documents ambiguous.

Exception:

throws a TypeError if arg0 does not obey the type list described above.

mummify(mspass_object, post_elog=True, post_history=False)[source]

Reduce memory use associated with dead data.

For atomic data objects if they are marked dead the data vector/matrix is set to zero length releasing the dynamically allocated memory. For Ensembles if the entire ensemble is marked dead all members are killed and this method calls itself on each member. For normal ensembles with mixed live and dead data only the data marked dead are mummified.


Parameters:

mspass_object – datum to be processed.