Promoting Global Sharing of Earth System Science Data Through Free and Open Access Data Publication

In less than one decade the open-access data journal Earth System Science Data (ESSD, a member of the Copernicus Open Access Publisher family) grew from a start-up venture into one of the highest-rated journals in global environmental science. Stimulated by data needs of the International Polar Year 2007-2008, ESSD now serves a very broad community of data providers and users, ensuring that users get free and easy access to quality data products and that providers gain full public credit for preparing, describing and sharing those products. Adopting technology and practices from research journals, ESSD moved data publication from an abstract concept to a working enterprise; several publishers now support similar data-sharing journals. As it confronts increasing challenges and barriers, ESSD serves as a prominent voice for and an example of emphatic fully-free, fully-open global data access. Data journals such as ESSD clearly meet a strong community need.


Brief History
Having stimulated vast interest and participation (Carlson 2010), the International Polar Year 2007-2008 (IPY) also exposed substantial deficiencies in international data services. Although IPY operated under an enlightened open-access data policy, Carlson (2011) reported "inadequate services, almost no international support, and few solutions". As if to confirm dismal initial assessments, Driemel and colleagues (2015) undertook a post-project inventory to extract and preserve IPY data that had emerged in various IPY-labeled or IPY-related publications.
Following public complaints from Carlson (2010), which elicited intervention by Hans Pfeiffenberger (then at Alfred Wegener Institute; personal communication), Copernicus offered to support a data journal venture under the title 'Earth System Science Data' (ESSD). Having successfully processed and published an initial description of ozonesonde data from Antarctica (König-Langlo & Gernandt 2009), followed some months later by two special issues proposed by the oceanographic community, ESSD began the process of building community interest and confidence. We recognized immediately that ESSD's remit would extend beyond polar data.
Figure 1 shows the gradual accumulation of data products described and promoted through successful ESSD publications. ESSD remained a specialty journal of Copernicus, publishing some 30 data descriptions per year during its first five years. Eventually Copernicus decided to promote ESSD through registration in the Thomson-Reuters (now Clarivate) journal indexing and citation system Web of Science. To buttress our application, we needed to show that ESSD did not depend unduly on special issues and that it served a broad community beyond polar science. By 2014 both of those issues seemed safely discharged. ESSD received a very high rating in its first Journal Citation Report: roughly 8.3 for 2015.
A positive feedback cycle ensued: more submissions seeking higher impact factors led to more data products serving more communities, with ESSD's attention to open access and data quality as a constant asset. Impact factors increased: 9.2 for 2019 with a five-year average of 9.6. Polar topics constitute only 9% of ESSD data descriptions. A broader categorization (Figure 2), necessary to encompass the wide range of ESSD publications, shows prominence of land, ocean and atmosphere topics, with ice a smaller but respectable fraction. 'Global' (e.g. global budgets of carbon, methane, sea level, energy, etc.) and 'Earth' (e.g. gravity) represent small fractions with often disproportionately high impact. ESSD constantly receives new submissions (population, air transport, historical records of the built environment, etc.) that further stretch disciplinary distinctions.

ESSD processes
ESSD fundamentally evaluates and certifies data quality and data accessibility, leveraging expertise of subject matter and data management experts (see Carlson & Oda 2018). Many datasets evolve with time via periodic (daily, monthly, annual, etc.) increment or via revision. For datasets that update semi-autonomously, e.g. one additional year of satellite data processed via consistent algorithms, ESSD recommends a DOI plus URL: the DOI addresses an initial static data product described in ESSD while users access evolving up-to-date products via the URL. Many ESSD-published data descriptions follow this model.
For a product where sources, calibrations, validations, or approaches may have changed, but where the desired outcome, e.g. a comprehensive global budget, remains identical, ESSD applies a 'Living Data' process. Authors archive an updated version of the data product under a fresh DOI and describe the new product using 'Living Data' options in ESSD. Specifically: authors submit a 'track-changes' version of the prior article that allows reviewers and users to see changes. ESSD endeavours to re-use at least one reviewer from a prior version; those reviewers can focus on specific changes in the most recent version. Descriptions and data under a 'Living Data' designation generally lighten workloads for providers, reviewers and users. After three or four 'Living Data' iterations, data products and data descriptions will usually have evolved so substantially that authors or journal editors or both will request a fresh thorough review.
A data description published in ESSD and linked to a data product held at a partner repository represents, in nearly every case, fully-free, fully-accessible, high-quality, well-described, ready-to-use data. By these outcomes, ESSD and similar data publication journals provide tangible benefit to data providers, useful products to data users, and, publication by publication, a growing library of open access data produced and used by a global community of Open Science advocates.

An ESSD-stimulated open access community
As shown in Figure 2, ESSD often publishes complex community-based data products, compilations of the efforts of dozens of researchers over decades: e.g., global streamflow analyses (Gudmundsson et al. 2018); multidecadal global surface ocean CO2 concentrations (Bakker et al. 2016); long-term reproducible climate-quality sea ice concentrations (Peng et al. 2013); global methane budgets (Saunois et al. 2020); and dozens of others. These projects, programmes and regular or ad hoc assemblages of researchers need a place to share outcomes of substantive data gathering and data quality control efforts. ESSD provides credit for those efforts and, equally important, an avenue for sharing these quality-assured open access products. We note that each ESSD paper in the list above generates views and downloads in the thousands to tens of thousands. We know of no other mechanism by which researchers achieve that level of interest in their data compilation efforts.
As our world of scientific data evolves, with expectations, standards, tools, repositories and products changing constantly, ESSD seeks to expand communities of providers and users without relaxing focus on quality and accessibility. Researchers overcome access and computing barriers through use of Google Earth Engine, apply advanced machine learning extraction or conversion tools, explore virtual reality visualizations, and push Open Science concepts to the earliest stages of project management through open access data notebooks (e.g. Atkins et al. 2021). Because searches that discover ESSD data descriptions also return article metrics, authors can easily monitor community interest in their product(s).

Challenges
As mentioned, ESSD confronts continuing, challenging changes in data sources, sizes and quality. As a small journal in the much wider world of scientific publishing, ESSD also finds itself buffeted by changes in publishing expectations, practices and standards. ESSD's success adds complexity to some of these challenges.
For ESSD, 'big' data means global emissions products interpolated to km-scale grids, long-term atmospheric reanalyses, satellite-generated time series, 4-D high-resolution matrices, etc., generally larger than 20 to 50 GB. An acute data challenge emerges when file sizes exceed what disciplinary data centres can manage. Even Zenodo, a generalist data archive increasingly popular with many providers, imposes a size limit of 50 GB for each DOI. Meanwhile, amidst ongoing daily distributions of many tens of TB, major forecast centres increasingly desire an ESSD-certified description of specific or new products; a peer-reviewed ESSD data description can prove more useful than web-based technical manuals. Data providers find that cloud-based services such as Google Earth Engine allow them to explore and function beyond local limits on storage or computing. ESSD must balance the growing size and availability of big data products on one hand against heightened interest by data users (most of whom do not sit on high-bandwidth networks) in useful descriptions and quality assessments on the other. ESSD insists on careful detailed listing of all data sources, whether obtained within Google Earth Engine or downloaded from institutional archives; many ESSD manuscripts therefore include an attribution table that allows users to track exact sources, exact versions, download dates, etc. In the interests of reviewers and users, ESSD requests teaser products: small (perhaps tens of MB) extracts in time or space of larger products that demonstrate the full range of author-described generation skills. For an ESSD Special Issue on regional emissions (https://essd.copernicus.org/articles/special_issue1100.html), the topical editors gained agreement on a mutually-defined 3-month teaser period; each submission should include a teaser covering DJF 2014-2015. As data moves in these larger directions, ESSD finds collaborative innovations to keep users abreast.
In early days ESSD could insist on barrier-free access to nearly every data product. With time, data commercialization, diminished funding for repository services, and increasingly restrictive national policies, more and more repositories impose a login or similar barrier. Even when, as they all claim, personal information gathered during registration remains highly secured, and even though most of us use names and passwords for basic internet functions, growth in the use of login barriers erodes free unhindered exchange of data. When necessary ESSD works with repositories to establish generic anonymous logins for reviewers; we would rather not need a custom back-door solution in every case. Thankfully, prominent data repositories remain barrier-free.
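The teaser request described above for large data products can be pictured as a simple time-window extract. The following minimal Python sketch assumes a product stored as (date, value) records; the function name, the record layout and the sample values are hypothetical and serve only to illustrate a DJF 2014-2015 window, not any procedure prescribed by ESSD.

```python
from datetime import date

# Hypothetical teaser window: DJF 2014-2015 (1 Dec 2014 through 28 Feb 2015).
TEASER_START = date(2014, 12, 1)
TEASER_END = date(2015, 2, 28)


def extract_teaser(records, start=TEASER_START, end=TEASER_END):
    """Return only the (day, value) records that fall inside the teaser window."""
    return [(day, value) for day, value in records if start <= day <= end]


# Illustrative records only; these are not real emissions data.
records = [
    (date(2014, 11, 30), 1.0),
    (date(2014, 12, 1), 2.0),
    (date(2015, 1, 15), 3.0),
    (date(2015, 2, 28), 4.0),
    (date(2015, 3, 1), 5.0),
]
teaser = extract_teaser(records)
```

A real gridded product would more likely be subset with a dedicated array library, but the principle of a small, agreed window that exercises the full processing chain stays the same.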
Although several ESSD-published data products (e.g. global budgets of carbon, CH4, energy, etc.) require regular updates, other products (refined gravity fields, global streamflow or volcanic aerosol compilations, guide to population data products) represent definitive durable products. ESSD Topical Editors need flexibility to handle once-per-year and once-per-decade submissions.
In 2019, Scimago rated ESSD second for Earth and Space Science (https://www.scimagojr.com/journalrank.php?type=j&category=1901; accessed June 12, 2019). Successful publication of open-access data and description of a global carbon budget (Le Quéré et al. 2013; that manuscript has received more than 40,000 views and downloads) raised ESSD's profile within the climate community. ESSD developed 'Living Data' processes for data undergoing periodic updates. In 2020 ESSD will handle nearly 400 descriptions of new data products. Since inception ESSD has rewarded more than 6000 data providers (authors and coauthors) for data-sharing efforts. For users, ESSD publications have described and certified 632 (through November 2020) high-quality open-access data products.
Carlson & Oda (2018) provide a detailed description of ESSD's mandate and expectations. ESSD's evaluation processes focus on data quality factors (formats and documentation, uncertainties, product validation, and accessibility) that assure users of the usefulness of data products. Review of a data description (that includes review of the data) often proves more rigorous and more time-consuming than review of a research paper. All submissions undergo standard processing within the Copernicus open review and discussion format; ESSD enjoins reviewers to 'test drive' data, acting as surrogates for subsequent data users. Submissions, reviews, revisions, community comments, editor comments, interim versions, etc., become part of a permanent open record for each ESSD manuscript.
ESSD holds no data products. The journal works with data repositories around the world, repositories that themselves foster barrier-free open access, version control, metadata standards, and, above all, minting of digital object identifiers (DOI). ESSD establishes archive and curation partnerships on a practical dataset-by-dataset basis. Often providers choose a topical, national or institutional data repository; ESSD tries to follow those preferences. When a data provider does not have or does not know a suitable repository, ESSD will recommend an open access repository. As ESSD receives more and more descriptions of large (e.g. global, multidecade, high spatial resolution) data files, ESSD attempts to partner directly with source institutions (e.g., forecast centres, space agencies, etc.). In many cases those institutions will have established, for valid reasons, data policies and data access services different to those espoused by ESSD. We note that extended conversation between ESSD and ECMWF recently resulted in new open anonymous access options for ECMWF products (https://www.ecmwf.int/en/forecasts/access-forecasts/registration-vs-anonymous-access).
Even with pandemic-induced delays, ESSD handles data descriptions from submission to publication, including negotiation steps with provider and repository where necessary, no slower than research submissions in other Copernicus journals (as of December 2020: ESSD 192 days, ACP 193 days, GMD 220 days, HESS 225 days, etc.). To support topical editors and smooth submission processes, an ESSD Managing Editor makes detailed initial checks of data formats, accessibility, licenses, login barriers, etc. Most rejections of ESSD submissions occur as a consequence of initial scrutiny; too many authors seek the high impact factors of ESSD while ignoring the journal's fundamental mandate of open sharing of useful data. Additional rejections based primarily on data quality occur as an outcome of the review process.
ESSD publications cover databases and datasets. Databases often set new metadata standards for user communities, including uniform formats, descriptive fields, terminologies, chronologies, etc., and combine data rescue with future contributions. Recognizing that data in databases nearly always undergo dynamic change, ESSD asks authors to deposit a snapshot of database fields and contents as of the time of publication. By including both a DOI-labeled snapshot and a database URL, authors demonstrate reliability and reproducibility while also soliciting additional contributions.
ESSD descriptions of open access data products cover all aspects of our planet; a portion of researchers across the full range of Earth system sciences have joined an open access data community. For individuals, ESSD offers a clear exchange: providers get credit for the work of sharing data while users get access to high quality products. Researchers observe our planet from unique perspectives then share data via ESSD: combined radar and camera tracking of volcanic aerosol plumes in Iceland (Petersen et al. 2012); nearly 50 years of first flowering dates for 'weeds' and trees in a Swiss canton (Rutishauser et al. 2019); aquaculture installations along the Chinese coast (Fu et al. in review); crowd-sourced air traffic data during pandemic-induced travel restrictions (Strohmeier et al. in review); and too many others to list. Whatever the intent of and benefit accruing to those researchers, sharing products through ESSD amplifies exposure with who-knows-what eventual impact. One can use DOI tracking to document numerous analysis or modeling papers based on ESSD products; to enhance cumulative impact, research papers often emerge in close coordination with ESSD data descriptions. For example, analysis papers in Nature Climate Change (Peters et al. 2019) and Environmental Research Letters (Jackson et al. 2019) appeared simultaneously with the global carbon budget described in ESSD (Friedlingstein et al. 2019). As these examples show, an ESSD published description combined with an accurate DOI-labeled data citation protects and promotes openly-shared data.
ESSD attempts to maintain flexibility to allow providers to gain credit for innovative products while assuring users of a quality outcome. In a practical sense this expanded journal purview depends crucially on constant recruitment of new reviewers and adventurous topical editors. In parallel with the evolution of ESSD, most publishers have enabled substantial access to research literature, particularly for Open Access journals. Curious researchers (and citizens) can apply favored search tools on almost any computer, without charge. Copernicus-enabled search functions allow text searches of ESSD author, title, abstract or full narrative through a quick easy interface; these functions allow one to find ESSD products regardless of size, source, impact, or prominence. This combination of data search with literature search, facilitated by Open Access standards, represents a fundamental component of Open Science. Not every literature search returns an open-access paper, but users soon learn that every successful ESSD search leads to an open-access data description which almost certainly leads in turn to an open-access data product.
Licenses, imposed according to diverse standards by groups, institutions or national policies, remain a fraught issue for data journals where data providers intend onward use. ESSD and sister data journals espouse free and open access to data products and therefore recommend a public domain waiver (CC0) or simple attribution license such as CC-BY. ESSD finds additional 'share alike' (-SA) requirements counter to its open access mission; in specific cases ESSD may accept 'non-commercial' (-NC) licences. In general ESSD recommends and practices open licenses.
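The licence preferences stated above can be summarized as a small triage rule. The following Python sketch is illustrative only: the function name and the status labels are hypothetical, the handling of 'no derivatives' (-ND) licences is an assumption because the text does not discuss them, and actual decisions rest with ESSD editors on a case-by-case basis.

```python
def licence_status(licence: str) -> str:
    """Classify a Creative Commons identifier against the stated preferences.

    The status labels are hypothetical shorthand, not ESSD terminology.
    """
    parts = licence.strip().upper().replace(" ", "-").split("-")
    if parts[:1] == ["CC0"]:
        return "recommended"  # public domain waiver
    if parts[:2] != ["CC", "BY"]:
        return "unrecognised"  # not a CC0 or CC-BY family identifier
    # Keep only condition codes; version strings such as "4.0" are ignored.
    modifiers = {p for p in parts[2:] if p in {"SA", "NC", "ND"}}
    if not modifiers:
        return "recommended"  # simple attribution
    if "SA" in modifiers:
        return "discouraged"  # share-alike runs counter to the open access mission
    if "ND" in modifiers:
        return "not addressed"  # assumption: -ND is not covered in the text
    return "case-by-case"  # non-commercial, accepted only in specific cases


statuses = {name: licence_status(name)
            for name in ["CC0", "CC-BY-4.0", "CC-BY-NC", "CC-BY-SA-4.0"]}
```

Checking share-alike before the other conditions means a combined licence such as CC-BY-NC-SA is treated as discouraged rather than as a case-by-case exception.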
ESSD submissions tend to arrive in waves, influenced by emergence of data products and compilation programmes (e.g. European Space Agency's Climate Change Initiative, many of whose products end up in ESSD), by project intentions (many projects reference ESSD in data management plans) and by word of mouth, as one successful publication of a soil moisture or oil palm distribution product induces similar or competing submissions. An ESSD Topical Editor will therefore encounter familiar within-speciality topics along with exotic submissions that push the boundaries of expertise. A good ESSD Topical Editor sustains general curiosity about Earth Systems, motivated by dedication to the goal of fostering open data sharing. A single chief editor supported by good Topical Editors could manage ESSD at 30 submissions per year, with substantial assistance from Copernicus. A more popular ESSD, one that now processes nearly 400 submissions per year, requires more editorial staff, better communication and coordination, and an even greater commitment by the data-dependent Open Science community; this more successful ESSD places greater reliance and burden on expert reviewers! How will ESSD and Copernicus handle these success-induced challenges? How does ESSD clarify and amplify its mission statement and submission guidelines to discourage increasing numbers of (no data) research papers focused solely on high impact factors? At what point must Copernicus re-evaluate its commitment to maintaining ESSD as a completely free journal? If motivation and enthusiasm for open access data sharing continue to grow, and as other communities of researchers discover and exploit benefits of data publication, will ESSD and sister data journals need to enlarge, multiply, or fragment into discipline-specific data journals?

Conclusion
ESSD proved what prior reports had only imagined: that providing credit to data providers and quality products to data users would facilitate and accelerate open exchange of data. The fundamental incentive of tangible citable credit to data providers, achieved through familiar peer-review processes, has clearly stimulated and accelerated open data and Open Science. Data journals have established data publication and, by extension, open data repositories as a welcome substantial enterprise, one that needs and deserves commensurate community support. Any sense of a vast reservoir of unexposed data awaiting only the opportunity of a new journal remains emphatically false! Preparing and curating a data product for description, evaluation and publication by a data journal represents very hard work by providers, reviewers, editors and publication specialists. That some ESSD publications have achieved remarkable impact should not hide the substantial effort behind every successful data publication. White papers, case studies, organizations and standards, while interesting and often relevant, have yet to have the positive open data impact of ESSD.