Data Curation


After reviewing the available literature on the topic (see [1], [2], and [3] for a sample), it became evident to us that we should provide a clear definition of what data curation is in the context of science and, more specifically, in the context of our course and scientific community.

Our definition of data curation is the following:

Data Curation

Data curation is the effort to organize multi-modal data (measurements and observations) collected during an experiment or a data acquisition into a structured set, called a dataset, with the purpose of making the data logically arranged and easy to find, understand, and reproduce.

Data curation refers to the complete process of collecting the files that hold all the experimental measurements, together with all additional descriptors and auxiliary information. This auxiliary information includes what is recorded in lab notebooks and what is held in the memory of the scientists and all other actors involved in collecting, reducing, and analyzing the data.

In many cases, data curation is part of data management and has to follow the policies adopted under data governance or set out in data management plans (DMPs).
In some cases, the term data curation is used interchangeably with data enrichment, although that is not completely accurate.
In our view, data enrichment is the act of adding descriptors to a dataset as the data it contains is further explored and analyzed.
According to this definition, data enrichment is only one step of what we see as data curation.
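
To make the distinction between curation and enrichment concrete, here is a minimal sketch in Python. The `Dataset` class, the `curate` and `enrich` functions, and all file names and descriptor values are hypothetical, chosen only for illustration; they do not come from any particular library or prescribed workflow.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Dataset:
    """A curated collection: measurement files plus their descriptors."""
    name: str
    files: List[str] = field(default_factory=list)
    descriptors: Dict[str, Any] = field(default_factory=dict)


def curate(name: str, files: List[str], notebook_notes: Dict[str, Any]) -> Dataset:
    """Gather the measurement files and the auxiliary information recorded
    alongside them (for example lab-notebook entries) into one dataset."""
    return Dataset(name=name, files=sorted(files), descriptors=dict(notebook_notes))


def enrich(dataset: Dataset, new_descriptors: Dict[str, Any]) -> Dataset:
    """Add descriptors found during later exploration and analysis.
    Descriptors already present are kept; only new keys are added."""
    for key, value in new_descriptors.items():
        dataset.descriptors.setdefault(key, value)
    return dataset


# Curation: assemble files and recorded context into a dataset.
dataset = curate(
    name="run_001",
    files=["scan_b.dat", "scan_a.dat"],
    notebook_notes={"operator": "A. Scientist", "temperature_K": 295},
)

# Enrichment: one later step that adds a descriptor derived from analysis.
enrich(dataset, {"peak_count": 3, "temperature_K": 300})
```

In this sketch, `curate` covers the initial gathering and organizing, while `enrich` is a single later step within the broader curation process. The choice to leave existing descriptors untouched during enrichment is an assumption of the example, not a rule stated in the text.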


[1]: https://www.techtarget.com/searchbusinessanalytics/definition/data-curation What is Data Curation?
[2]: https://en.wikipedia.org/wiki/Data_curation Data Curation
[3]: https://www.alation.com/blog/what-is-data-curation/ What is data curation?