Commit

Update policies.md
kept updating policies.
mesteva authored Sep 9, 2024
1 parent 1b4c74e commit 7925f3e
Showing 1 changed file with 7 additions and 4 deletions.
11 changes: 7 additions & 4 deletions user-guide/docs/curating/policies.md
@@ -8,7 +8,7 @@ We accept engineering and social and behavioural sciences datasets derived from

#### Data Size

Given the nature of research in the natural hazards community, which involves large-scale experiments, simulations, and field research projects, we currently do not impose restrictions on the size of the datasets that can be published. The largest published datasets in DDR are ~4 TB. This approach recognizes the necessity of comprehensive data collection and the importance of making this data available for future research and analysis. We recommend that researchers be selective and publish data that is relevant to dataset completeness and research reproducibility and that is adequately described for reuse. The [Curation and Publication Best Practices](https://www.designsafe-ci.org/rw/user-guides/curating-publishing-projects/best-practices/data-curation/) include recommendations to achieve quality dataset publications. We remain open to revisiting this policy as we observe changes in data usage patterns and technological advancements. Any future modifications will be communicated clearly to the community.
Given the nature of research in natural hazards, which involves large-scale experiments, simulations, and field research projects, we currently do not impose restrictions on the size of the datasets that can be published. The largest published datasets in DDR are ~5 TB. This approach recognizes the necessity of comprehensive data collection and the importance of making this data available for future research and analysis. However, we recommend that researchers be selective and publish data that is relevant to research reproducibility. Importantly, the dataset should be adequately organized and described so that other researchers interested in reusing the data can find what they need. Thus, publishing large datasets implies significant curation work. The [Curation and Publication Best Practices](https://www.designsafe-ci.org/rw/user-guides/curating-publishing-projects/best-practices/data-curation/) include recommendations to achieve quality dataset publications.

#### File Formats

@@ -32,9 +32,9 @@ Reviews datasets pre and post-publication and suggests changes and improvements.

The DDR team worked with NHERI experts to develop data models to curate the datasets generated in the natural hazards field. Based on the [Core Scientific Metadata Model], the models represent the structure and provenance of experimental, simulation, field research/interdisciplinary, and hybrid simulation datasets. A data model type "other" was also developed for datasets that do not correspond to the research methods mentioned above and for other products such as posters, presentations, reports, check sheets, benchmarks, etc. In the DDR interface, users select one of these models as the project type at the beginning of their interactive curation process. Implemented as interactive curation pipelines in the DDR curation interface, the models allow users to organize their datasets in relation to research method and natural hazard type. This allows for a uniform curation experience and representation of published datasets.

(To facilitate data curation of the diverse and large datasets generated in the fields associated with natural hazards, we worked with experts in natural hazards research to develop <a href="https://www.designsafe-ci.org/rw/user-guides/data-curation-publication/">five data models</a> that encompass the following types of datasets: experimental, simulation, field research, hybrid simulation, and other data products (See: 10.3390/publications7030051; 10.2218/ijdc.v13i1.661), as well as lists of specialized vocabulary. Based on the <a href="http://icatproject-contrib.github.io/CSMD/">Core Scientific Metadata Model</a>, these data models were designed considering the <a href="https://www.youtube.com/watch?v=iYzvYi-SY8Q">community's research practices and workflows</a>, <a href="https://www.youtube.com/watch?v=xUyFJwZmyqM">the need for documenting these processes</a> (provenance), and using terms common to the field. The models highlight the structure and components of natural hazards research projects across time, tests, geographical locations, provenance, and instrumentation. Researchers in our community have presented broadly on the design, implementation, and use of these models.
To facilitate data curation of the diverse and large datasets generated in the fields associated with natural hazards, we worked with experts in natural hazards research to develop <a href="https://www.designsafe-ci.org/rw/user-guides/data-curation-publication/">five data models</a> that encompass the following types of datasets: experimental, simulation, field research, hybrid simulation, and other data products (See: 10.3390/publications7030051; 10.2218/ijdc.v13i1.661), as well as lists of specialized vocabulary. Based on the <a href="http://icatproject-contrib.github.io/CSMD/">Core Scientific Metadata Model</a>, these data models were designed considering the <a href="https://www.youtube.com/watch?v=iYzvYi-SY8Q">community's research practices and workflows</a>, <a href="https://www.youtube.com/watch?v=xUyFJwZmyqM">the need for documenting these processes</a> (provenance), and using terms common to the field. The models highlight the structure and components of natural hazards research projects across time, tests, geographical locations, provenance, and instrumentation. Researchers in our community have presented broadly on the design, implementation, and use of these models.

In the DDR web interface, the data models are implemented as interactive functions with instructions that guide researchers through the curation and publication tasks. As researchers move through the curation pipelines, the interactive features reinforce data and metadata completeness and thus the quality of the publication. The process will not move forward if requirements for metadata are not in place (See <a href="https://www.designsafe-ci.org/rw/user-guides/curating-publishing-projects/best-practices/data-curation/">Metadata in Best Practices</a>), or if key files are missing. )
In the DDR web interface, the data models are implemented as interactive functions with instructions that guide researchers through the curation and publication tasks. As researchers move through the curation pipelines, the interactive features reinforce data and metadata completeness and thus the quality of the publication. The process will not move forward if requirements for metadata are not in place (See <a href="https://www.designsafe-ci.org/rw/user-guides/curating-publishing-projects/best-practices/data-curation/">Metadata in Best Practices</a>), or if key files are missing.
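
The gating behavior described above can be pictured with a minimal Python sketch. It is not DDR's implementation: the `Project` class, the field names in `REQUIRED_METADATA`, and both functions are hypothetical and chosen only for illustration. The actual requirements are the ones listed in the Best Practices.

```python
from dataclasses import dataclass, field

# Hypothetical required fields, for illustration only.
REQUIRED_METADATA = ("title", "description", "authors", "keywords")


@dataclass
class Project:
    """A simplified stand-in for a project moving through a curation pipeline."""
    metadata: dict = field(default_factory=dict)
    files: list = field(default_factory=list)


def blocking_issues(project, required_files=()):
    """Return the reasons the project cannot advance; an empty list means it can."""
    issues = []
    for name in REQUIRED_METADATA:
        # A field counts as missing when it is absent or empty.
        if not project.metadata.get(name):
            issues.append(f"missing metadata: {name}")
    for path in required_files:
        if path not in project.files:
            issues.append(f"missing file: {path}")
    return issues


def can_advance(project, required_files=()):
    """The pipeline step proceeds only when nothing is blocking."""
    return not blocking_issues(project, required_files)
```

Returning the full list of issues, rather than stopping at the first one, mirrors the idea of guiding the researcher toward a complete record in a single pass.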

#### Metadata

@@ -230,7 +230,10 @@ More information about the reasons for amends and versioning are in<a href="http

Users can click a “Leave Feedback” button on the projects’ landing pages to provide comments on any publication. This feedback is forwarded to the curation team for any needed actions, including contacting the authors. In addition, users can message the authors directly, as their contact information is available via the authors field in the publication landing pages. We encourage users to provide constructive feedback and suggest themes they may want to discuss about the publication in our <a href="https://www.designsafe-ci.org/rw/user-guides/curating-publishing-projects/best-practices/data-publication/">Leave Data Feedback Best Practices</a>.

#### Data Impact { #impact }
### Tombstone
A tombstone is a landing page describing a dataset that has been removed from public access in a data repository. Creating and maintaining the tombstone is the responsibility of the repository. In DDR, curators review published datasets regularly. If data creators deposit data that does not match the accepted data types, or if their dataset does not comply with DDR's Policies and Best Practices, they will be alerted after publication to improve their dataset presentation via amends or versioning. Users have 30 days from notification before the dataset is tombstoned, so they have enough time to implement improvements, and the curator will work with them. In DDR, the data will still be available to the creator in My Project.
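
The 30-day window can be made concrete with a small date calculation. This is only a sketch: the function names are hypothetical, and it assumes the window is counted in calendar days starting on the notification date, which the policy text does not spell out.

```python
from datetime import date, timedelta

GRACE_PERIOD_DAYS = 30  # from the policy text above


def tombstone_date(notified_on: date) -> date:
    """Earliest date on which the dataset could be tombstoned (assumes calendar days)."""
    return notified_on + timedelta(days=GRACE_PERIOD_DAYS)


def days_remaining(notified_on: date, today: date) -> int:
    """Days left to amend or version the dataset; never negative."""
    return max((tombstone_date(notified_on) - today).days, 0)
```

For example, a notification sent on 2024-09-09 would give the creator until 2024-10-09 under these assumptions.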

#### Data Impact

We understand data impact as a strategy that includes complementary efforts at the crossroads of data discoverability, usage metrics, and scholarly communications.
