What is Data Provenance?

The word provenance originates from the French term 'provenir' meaning 'to come from'.

Data provenance is the ability to trace the origin of data and identify how it has been altered or transformed throughout its lifecycle.
Other definitions include:

  • Data provenance is the documentation of where a piece of data comes from and the processes and methodology by which it was produced

  • Provenance is a description of how things came to be, and how they came to be in the state they are in today

  • On the web, provenance would include information about the creation and publication of web resources as well as information about access of those resources, and activities related to their discussion, linking, and reuse [1]

  • In art history, provenance may include information about an artifact's creation (who created it, when, where, why, and how) as well as descriptive metadata that can be correlated with time (e.g. chemical composition that could bound when a work could have been created) or with context (e.g. analysis of brush strokes that could link a painting with other works of the same artist) [1]

  • In business, provenance may include information about financial and legal processes (e.g. in contracts) as well as the electronic (e.g. online ordering) and physical (e.g. shipping) processes that have occurred [1]

  • In scientific research, provenance may include the set of physical and computational processes applied to a sample that would allow repetition of an experiment as well as descriptive information about a sample (e.g. its chemical composition) and the experimental protocol that would allow reproduction of the work [1]
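
To make the core definition concrete (tracing where data came from and how it was transformed), here is a minimal Python sketch. It is only an illustration, not a standard provenance model: the ProvenanceRecord class, the apply_step helper, the file name sensor_readings.csv, and the sample readings are all hypothetical names and values invented for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceRecord:
    """Hypothetical record of a dataset's origin and transformations."""
    source: str                                 # where the data came from
    steps: list = field(default_factory=list)   # ordered transformation log

    def log(self, description: str) -> None:
        # Append a timestamped entry for each transformation applied.
        timestamp = datetime.now(timezone.utc).isoformat()
        self.steps.append((timestamp, description))

    def lineage(self) -> list:
        # Origin first, then every transformation in the order it happened.
        return [self.source] + [desc for _, desc in self.steps]


def apply_step(data, record, description, func):
    """Apply func to data and note the step in the provenance record."""
    result = func(data)
    record.log(description)
    return result


# Illustrative data and file name (assumptions, not from any real system).
record = ProvenanceRecord(source="sensor_readings.csv")
readings = [21.5, None, 22.1, 19.8]

readings = apply_step(readings, record, "drop missing values",
                      lambda xs: [x for x in xs if x is not None])
readings = apply_step(readings, record, "convert Celsius to Fahrenheit",
                      lambda xs: [x * 9 / 5 + 32 for x in xs])

print(record.lineage())
```

The printed lineage lists the origin followed by each transformation in order, which is the kind of trail the definitions above describe. A real system would typically also capture who performed each step and with which software version.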

