Why Provenance is the Key to AI Success: Knowledge Graph Ontology Design

  • The technology is widely adopted in open data circles, meaning we can make use of publicly available linked data.
  • There is a strong emphasis on ontology design, meaning we can control the concepts that describe our domain and traverse the graph semantically.
  • The nature of graph databases makes it easy to add new data as knowledge, since a new fact is simply a new node or edge.
  • An open standard means we can remain database vendor agnostic.
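
The points above can be illustrated with a minimal sketch. It uses plain Python tuples as stand-ins for RDF triples rather than a real triple store, and every `ex:` name is hypothetical; only `rdf:type` and `rdfs:subClassOf` are borrowed from the RDF and RDFS vocabularies, and here they are just string labels. The sketch shows that adding knowledge amounts to adding a triple, and that an ontology lets a query follow the class hierarchy.

```python
# Toy stand-in for an RDF graph: a set of (subject, predicate, object) triples.
# All ex: identifiers are hypothetical and used for illustration only.
graph = {
    ("ex:Startup", "rdfs:subClassOf", "ex:Company"),
    ("ex:Company", "rdfs:subClassOf", "ex:Organisation"),
    ("ex:acme", "rdf:type", "ex:Startup"),
}

# Adding new data needs no schema migration: it is one more triple.
graph.add(("ex:acme", "ex:headquarteredIn", "ex:London"))


def superclasses(cls, triples):
    """Return cls plus every class reachable via rdfs:subClassOf."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(o for s, p, o in triples
                     if s == current and p == "rdfs:subClassOf")
    return seen


def instances_of(cls, triples):
    """Find instances of cls, including instances of its subclasses."""
    return {s for s, p, o in triples
            if p == "rdf:type" and cls in superclasses(o, triples)}


# Semantic traversal: ex:acme is typed as a Startup, yet it is found
# when asking for Organisations because of the subclass chain.
print(instances_of("ex:Organisation", graph))
```

A real deployment would use an RDF library or triple store and a query language such as SPARQL; the tuples here only make the idea concrete.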

What is provenance?

Now for some more definitions. Data provenance (often referred to as lineage) is metadata that describes the origin of data. My semantic web friends who authored the PROV ontology have provided a more concrete definition:

What does maintaining provenance in ontology design enable?

  1. A perfect playground for data science: RDF holds knowledge in a highly flexible manner, so data can be extracted at any granularity for machine learning tasks, including subgraphs for graph learning problems, all whilst maintaining the lineage of where the data came from.
  2. Ensures data quality: a common catchphrase in data science is "garbage in, garbage out". In a knowledge graph holding such a huge amount of data from disparate sources, knowing where each piece came from is crucial.
  3. Maintains context: even if the data is of high quality, it is important that we understand the context behind the metadata. For instance, the sectoral classifications used by two different company intelligence websites are not the same, even if some of the labels are identical.
  4. Entity reconciliation: one of the biggest problems in the digital world is recognizing when two separate pieces of information ultimately refer to the same instance of the same concept. This is known as entity reconciliation, and provenance modeling techniques help enable it.
  5. Compliance and security: understanding the origin of data that could end up in the hands of a customer is crucial to ensuring compliance.
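
The benefits listed above can be sketched in a few lines. This is a toy model under stated assumptions, not a production design: statements are plain Python tuples with a fourth field naming the source (in the spirit of RDF named graphs), and every `ex:` and `src:` identifier, the `TRUSTED` set, and the helper functions are hypothetical. `owl:sameAs` is a real OWL term, used here only as a string label.

```python
from collections import defaultdict

# Each statement is (subject, predicate, object, source).
# All identifiers and values are made up for illustration.
quads = [
    ("ex:acme", "ex:sector", "Technology", "src:siteA"),
    ("ex:acme_ltd", "ex:sector", "Technology", "src:siteB"),
    ("ex:acme", "ex:employees", 120, "src:siteA"),
    ("ex:acme_ltd", "ex:employees", 9000, "src:scrapedForum"),
    # A reconciliation decision is itself a statement with its own source.
    ("ex:acme", "owl:sameAs", "ex:acme_ltd", "src:reconciler"),
]

TRUSTED = {"src:siteA", "src:siteB", "src:reconciler"}  # assumed policy


def from_trusted(statements, trusted=TRUSTED):
    """Data quality and compliance: keep statements from trusted sources."""
    return [q for q in statements if q[3] in trusted]


def subgraph(statements, entity):
    """Extract every statement about one entity, keeping its source."""
    return [q for q in statements if q[0] == entity]


def canonical_ids(statements):
    """Map an entity to a canonical id via owl:sameAs (single hop only)."""
    canon = {}
    for s, p, o, _ in statements:
        if p == "owl:sameAs":
            canon[o] = canon.get(s, s)
    return canon


def merged_view(statements):
    """Merge reconciled entities; each value stays paired with its source."""
    canon = canonical_ids(statements)
    view = defaultdict(list)
    for s, p, o, src in statements:
        if p != "owl:sameAs":
            view[(canon.get(s, s), p)].append((o, src))
    return dict(view)


clean = from_trusted(quads)
for key, values in merged_view(clean).items():
    print(key, values)
```

Because each value keeps its source, the two identical "Technology" labels remain distinguishable after the entities are merged, which is the context point made in item 3. The untrusted employee count is dropped before merging, but it is still present in the raw `quads` list if someone needs to audit it.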



ODSC - Open Data Science


