sdaas - a Python tool computing an amplitude anomaly score of seismic data and 
metadata using simple machine‐Learning models

Zaccarelli, Riccardo

doi:10.5880/GFZ.2.6.2023.009

Released

Software

Authors

Zaccarelli, Riccardo
2.6 Seismic Hazard and Risk Dynamics, 2.0 Geophysics, Departments, GFZ Publication Database, Deutsches GeoForschungsZentrum;

Citation

Zaccarelli, R. (2022): sdaas - a Python tool computing an amplitude anomaly score of seismic data and metadata using simple machine‐Learning models, Potsdam: GFZ Data Services.
https://doi.org/10.5880/GFZ.2.6.2023.009

Cite as: https://gfzpublic.gfz-potsdam.de/pubman/item/item_5022744

Abstract

The rapidly growing number of big data applications in seismology has made quality control tools that filter, discard, or rank data extremely important. In this framework, machine learning algorithms, already established in several seismic applications, are good candidates to perform the task flexibly and efficiently. sdaas (seismic data/metadata amplitude anomaly score) is a Python library and command line tool for detecting a wide range of amplitude anomalies in any seismic waveform segment, such as recording artifacts (e.g., anomalous noise, peaks, gaps, spikes), sensor problems (e.g., digitizer noise), and metadata field errors (e.g., wrong stage gain in StationXML). The underlying machine learning model, based on the isolation forest algorithm, has been trained and tested on a broad variety of seismic waveforms of different lengths, ranging from local to teleseismic earthquakes to noise recordings, from both broadband sensors and accelerometers. For this reason, the software offers a high degree of flexibility and ease of use: from any given input (a waveform in miniSEED format and its metadata as StationXML, each given either as a file path or as an FDSN URL), the computed anomaly score is a probability-like numeric value in [0, 1] indicating the degree of belief that the analyzed waveform represents an anomaly (or outlier), where scores ≤0.5 indicate no distinct anomaly. sdaas can be employed to filter malformed data in a pre-processing routine, to assign robustness weights, or as a metadata checker by computing scores for randomly selected segments from a given station/channel: in this case, a persistent sequence of high scores clearly indicates problems in the metadata.
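
The three uses named in the abstract (filtering, robustness weighting, metadata checking) can be illustrated with a minimal sketch that works on a list of already computed scores. It does not call sdaas itself. All function names, the `1 - score` weighting, and the `min_run` parameter are hypothetical choices made for illustration; only the 0.5 threshold comes from the abstract. The score values are invented, not real sdaas output.

```python
from typing import List, Sequence

# From the abstract: scores <= 0.5 indicate no distinct anomaly.
ANOMALY_THRESHOLD = 0.5


def keep_mask(scores: Sequence[float],
              threshold: float = ANOMALY_THRESHOLD) -> List[bool]:
    """Return True for each segment whose score does not exceed the threshold."""
    return [s <= threshold for s in scores]


def robustness_weights(scores: Sequence[float]) -> List[float]:
    """Hypothetical weighting (an assumption, not part of sdaas):
    weight = 1 - score, so likely anomalies contribute less."""
    return [1.0 - s for s in scores]


def longest_high_run(scores: Sequence[float],
                     threshold: float = ANOMALY_THRESHOLD) -> int:
    """Length of the longest run of consecutive scores above the threshold."""
    longest = current = 0
    for s in scores:
        current = current + 1 if s > threshold else 0
        longest = max(longest, current)
    return longest


def metadata_suspect(scores: Sequence[float], min_run: int = 5,
                     threshold: float = ANOMALY_THRESHOLD) -> bool:
    """Flag a station/channel when high scores persist for at least
    `min_run` consecutive segments (`min_run` is an illustrative choice)."""
    return longest_high_run(scores, threshold) >= min_run


if __name__ == "__main__":
    # Invented scores for segments sampled from one channel.
    channel_scores = [0.81, 0.79, 0.88, 0.92, 0.85, 0.90]
    print(keep_mask(channel_scores))
    print(robustness_weights(channel_scores))
    print(metadata_suspect(channel_scores))
```

A single high score among otherwise low ones would only drop that segment from the mask, whereas a long unbroken run of high scores is what the sketch treats as a hint of a metadata problem.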