Addressing the need for interactive, efficient, and reproducible data processing in ecology with the datacleanr R package

Hurley, A., Peters, R. L., Pappas, C., Steger, D., Heinrich, I. (2022): Addressing the need for interactive, efficient, and reproducible data processing in ecology with the datacleanr R package. - Plos One, 17, 5, e0268426.
https://doi.org/10.1371/journal.pone.0268426

Files

5011388_.pdf (publisher version), 3MB
Name: 5011388_.pdf
Description: -
Visibility: Public
MIME type / checksum: application/pdf / [MD5]
Technical metadata:
Copyright date: -
Copyright info: -

Creators

Creators:
Hurley, Alexander (1), Author
Peters, Richard L. (2), Author
Pappas, Christoforos (2), Author
Steger, David (1), Author
Heinrich, Ingo (1), Author
Krug, Rainer M. (2), Editor
Affiliations:
(1) 4.3 Climate Dynamics and Landscape Evolution, 4.0 Geosystems, Departments, GFZ Publication Database, Deutsches GeoForschungsZentrum, ou_146046
(2) External Organizations, ou_persistent22

Content

Keywords: -
Abstract: Ecological research, just as all Earth System Sciences, is becoming increasingly data-rich. Tools for processing of “big data” are continuously developed to meet corresponding technical and logistical challenges. However, even at smaller scales, data sets may be challenging when best practices in data exploration, quality control and reproducibility are to be met. This can occur when conventional methods, such as generating and assessing diagnostic visualizations or tables, become unfeasible due to time and practicality constraints. Interactive processing can alleviate this issue, and is increasingly utilized to ensure that large data sets are diligently handled. However, recent interactive tools rarely enable data manipulation, may not generate reproducible outputs, or are typically data/domain-specific. We developed datacleanr, an interactive tool that facilitates best practices in data exploration, quality control (e.g., outlier assessment) and flexible processing for multiple tabular data types, including time series and georeferenced data. The package is open-source, and based on the R programming language. A key functionality of datacleanr is the “reproducible recipe”: a translation of all interactive actions into R code, which can be integrated into existing analysis pipelines. This enables researchers experienced with script-based workflows to utilize the strengths of interactive processing without sacrificing their usual work style or functionalities from other (R) packages. We demonstrate the package’s utility by addressing two common issues during data analyses, namely 1) identifying problematic structures and artefacts in hierarchically nested data, and 2) preventing excessive loss of data from ‘coarse,’ code-based filtering of time series. Ultimately, with datacleanr we aim to improve researchers’ workflows and increase confidence in and reproducibility of their results.

Details

Language(s):
Dates: 2022-05-12; 2022
Publication status: Finally published
Pages: -
Place, publisher, edition: -
Table of contents: -
Review type: -
Identifiers: DOI: 10.1371/journal.pone.0268426
GFZPOF: p4 T5 Future Landscapes
OATYPE: Gold Open Access
Degree type: -

Project information

Project name: Funded under the "Open Access Publikationskosten" (open-access publication costs) funding programme of the Deutsche Forschungsgemeinschaft (DFG), project number 491075472
Grant ID: -
Funding programme: Open-Access-Publikationskosten (491075472)
Funding organization: Deutsche Forschungsgemeinschaft (DFG)

Source 1

Title: Plos One
Source genre: Journal, SCI, Scopus, p3, OA
Creators:
Affiliations:
Place, publisher, edition: -
Pages: -
Volume / issue: 17 (5)
Article number: e0268426
Start / end page: -
Identifier: CoNE: https://gfzpublic.gfz-potsdam.de/cone/journals/resource/r1311121
Publisher: Public Library of Science (PLoS)