Capturing Data Provenance in Research

How to capture Data Provenance?

Data provenance can be described as a combination of seven interconnected elements: “what”, “when”, “where”, “how”, “who”, “which”, and “why” [1]. This is known as the W7 model.
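As a rough illustration of how the seven elements could be stored alongside a data item, the sketch below defines a small Python record type. The class name `W7Record`, the method `missing_elements`, the short field descriptions, and all sample values are assumptions made for this example. They are not taken from [1], which should be consulted for the precise definition of each element.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class W7Record:
    """Hypothetical container for the seven W7 elements of one data item."""

    what: Optional[str] = None   # what happened to the data (e.g. "measured")
    when: Optional[str] = None   # when it happened (ISO 8601 string assumed)
    where: Optional[str] = None  # where the data was measured or processed
    how: Optional[str] = None    # procedure or method that produced the data
    who: Optional[str] = None    # person or organization responsible
    which: Optional[str] = None  # instruments or software used
    why: Optional[str] = None    # reason the data was created

    def missing_elements(self) -> List[str]:
        """Return the names of the W7 elements that were not recorded."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Made-up values, for illustration only.
record = W7Record(
    what="genetic structure test",
    when="2020-05-14T09:30:00",
    where="Lab A",
    how="documented test procedure",
    who="marine biologist",
    which="example sequencer",
)
print(record.missing_elements())  # ['why']
```

Making every field optional is a deliberate choice in this sketch: provenance is often captured incompletely, and listing the missing elements shows which questions a record cannot yet answer.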
Case study 1: In a biology company, a marine biologist performs tests that study the evolution of the genetic structure of a specific species of salmon. The biologist publishes the results of the research. Another biologist discovers the published research findings. Before reusing the results, this second biologist verifies whether they are valid by repeating the test procedure in the test environment that was described.
Reaction: A replication test to verify published data requires provenance that is capable of answering the following questions: 1) how was the data created? and 2) how was the study conducted in terms of procedure, environment, sample condition, temperature, sample size, sample type, and so on?

Case study 2: An ecologist conducts a lab study on aquatic samples obtained for a habitat assessment of a lake basin to show the degree of impact of human activity. Another ecologist in a different lab later performs a test on the same samples, provided by the producer. The second ecologist compares the two results and notices a significant difference. There is a need to assess whether the difference is due to different test methods or different instruments used in the test.
Reaction: To determine the quality or reliability of the test results, it is necessary to explore the provenance of the data to answer the following questions: 1) how were the results generated? 2) when were the results generated? and 3) which tools and methods were used to create the data?
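The comparison in case study 2 can be sketched as a diff over two provenance records. The example below is a minimal illustration, assuming each record is a plain dictionary keyed by W7 element name. The function name `diff_provenance` and the sample values are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple


def diff_provenance(
    first: Dict[str, str],
    second: Dict[str, str],
    elements: Sequence[str] = ("how", "when", "which"),
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return the selected elements whose values differ between two records.

    An element missing from a record is reported as None.
    """
    return {
        name: (first.get(name), second.get(name))
        for name in elements
        if first.get(name) != second.get(name)
    }


# Made-up provenance for the two lab tests, for illustration only.
lab_one = {"how": "method A", "when": "2021-03-02", "which": "instrument X"}
lab_two = {"how": "method A", "when": "2021-06-18", "which": "instrument Y"}

print(diff_provenance(lab_one, lab_two))
# {'when': ('2021-03-02', '2021-06-18'), 'which': ('instrument X', 'instrument Y')}
```

Here the "how" values agree, so the diff points to the instrument and the test date rather than the method as candidate explanations for the differing results.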

Case study 3: A scientist is interested in the population status of salmon in neighboring streams in the Pacific Northwest. The scientist is trying to acquire data on reproduction rates from traps. Traps capture small fish during migration, providing an estimate of the number of fish each stream is producing.
Reaction: This case illustrates the use of data provenance for data discovery. The question the scientist needs to answer is: 1) where was the data measured? Answering it allows the appropriate data to be located.
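Data discovery by the "where" element can be sketched as a simple filter over a catalog of provenance records. This is only an illustration: the function name `find_by_location`, the dictionary layout, and the catalog entries are assumptions, and a real catalog would more likely match on coordinates or controlled location identifiers than on free-text names.

```python
from typing import Dict, Iterable, List


def find_by_location(
    records: Iterable[Dict[str, str]], location: str
) -> List[Dict[str, str]]:
    """Return records whose 'where' element matches the location, ignoring case."""
    wanted = location.strip().lower()
    return [r for r in records if r.get("where", "").strip().lower() == wanted]


# Made-up catalog entries, for illustration only.
catalog = [
    {"what": "trap count", "where": "Stream A", "when": "2019-04"},
    {"what": "trap count", "where": "Stream B", "when": "2019-04"},
    {"what": "water quality", "where": "stream a", "when": "2019-05"},
]

matches = find_by_location(catalog, "Stream A")
print([m["what"] for m in matches])  # ['trap count', 'water quality']
```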

