Modelling a provenance domain with Chronicle

Here we present a reference domain that uses all the provenance features Chronicle provides, and work through the process of representing it using Chronicle's domain modelling syntax. This should help you both understand Chronicle's capabilities and translate your own problem domain's provenance.

Chronicle uses the W3C Provenance Ontology as the basis for provenance modelling.

Reference domain - Medical evidence

This is a toy model of some aspects of evidence-based medicine, from an initial Question - the area and scope that the organization wishes to research and make guidance on - to revisions of a published Guidance document. This workflow is currently handled by a content management system that has identities for documents and users, and we will use Chronicle to add provenance capabilities.

Question creation

The question for medical evidence can vary pretty widely, but for the purposes of this example imagine it as something along the lines of "How best to assess and refer patients who have required emergency treatment for Anaphylaxis".

Various actors and processes are involved in the production of the question, but for our purposes we can view it like this:

[Diagram: question creation process]

The Question is then used to inform the Research for the production of Guidance.

To model and record this process you will need the Chronicle domain model definition explained here, along with the following operations:

  • question - define an Entity of subtype Question
  • questionAsked - define an Activity of subtype QuestionAsked
  • person or organization - define an Agent of subtype Person or Organization to act as Stakeholders
  • person - define an Agent of subtype Person to act as Authors
  • wasGeneratedBy - specify that the QuestionAsked Activity produced the Question
  • wasAssociatedWith - specify the Person who authored and the Organizations that asked
  • endedAtTime - specify the question was asked at a point in time

This process represented as provenance will look like:

[Diagram: question creation represented as provenance]
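
As a rough illustration, the relations listed above can be written down as plain data. The sketch below is ordinary Python, not Chronicle's API; the `Statement` class, the identifiers and the timestamp are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Statement:
    """One PROV relation: subject --relation--> obj, with an optional role."""
    relation: str
    subject: str
    obj: str
    role: Optional[str] = None


# Hypothetical identifiers; in practice these would come from the CMS.
question = "question:anaphylaxis-referral"
asked = "activity:question-asked-1"
author = "agent:person-42"
organization = "agent:organization-7"

statements = [
    # The QuestionAsked activity produced the Question.
    Statement("wasGeneratedBy", question, asked),
    # The Person who authored and the Organization that asked.
    Statement("wasAssociatedWith", asked, author, role="Author"),
    Statement("wasAssociatedWith", asked, organization, role="Stakeholder"),
]

# endedAtTime: the question was asked at a point in time (made-up value).
ended_at_time = {asked: datetime(2023, 1, 1, tzinfo=timezone.utc)}


def agents_for(activity, statements):
    """Return (agent, role) pairs associated with an activity, sorted by agent."""
    return sorted(
        (s.obj, s.role)
        for s in statements
        if s.relation == "wasAssociatedWith" and s.subject == activity
    )
```

Calling `agents_for(asked, statements)` lists both the author and the stakeholder organization, each with the role it played in the activity.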

Research

The Question is used by a researcher to inform one or more searches on a search engine. The parameters passed to the search engine are recorded, and the results are used to create references to Evidence.

[Diagram: research process]

To model and record this process you will need the Chronicle domain model definition explained here, along with the following operations:

  • question - define an Entity of subtype Question
  • evidence - define an Entity of subtype Evidence
  • researched - define an Activity of subtype Researched
  • person - define an Agent of subtype Person
  • used - specify that the Researched Activity used the Question
  • wasGeneratedBy - specify that the Researched Activity produced the Evidence
  • wasAssociatedWith - specify that the research was done by a Person acting as a researcher
  • startedAtTime - specify the research began at a point in time
  • endedAtTime - specify the research ended at a point in time

This process represented as provenance will look like:

[Diagram: research represented as provenance]

Revision

Revision, like authorship, is triggered by research - in this case by changes or additions to the evidence base. Evidence is used to inform a new revision of the Guidance document.

[Diagram: revision process]

To model and record this process you will need the Chronicle domain model definition explained here, along with the following operations:

  • question - define an Entity of subtype Question
  • guidance - define an Entity of subtype Guidance
  • evidence - define an Entity of subtype Evidence
  • revised - define an Activity of subtype Revised
  • used - specify that the Revised Activity used the Question
  • used - specify that the Revised Activity used the Evidence
  • wasGeneratedBy - specify that the Revised Activity produced the Guidance
  • wasAssociatedWith - specify that the revision was done by a Person acting as an Author or Editor
  • wasRevisionOf - specify that the Guidance is possibly a Revision of previous Guidance
  • hadPrimarySource - specify that the Guidance possibly has a primary source of the Question (for the first version)
  • startedAtTime - specify the revision began at a point in time
  • endedAtTime - specify the revision ended at a point in time

This process represented as provenance will look like:

[Diagram: revision represented as provenance]
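
To make the wasRevisionOf and hadPrimarySource relations more concrete, here is a minimal sketch in plain Python (not Chronicle's API) that walks a chain of revisions back to the first version and then to the Question it derives from. The identifiers and the `lineage` helper are hypothetical, and the sketch assumes the revision chain contains no cycles.

```python
# Hypothetical identifiers for three revisions of one Guidance document.
# Each key wasRevisionOf its value.
was_revision_of = {
    "guidance:v3": "guidance:v2",
    "guidance:v2": "guidance:v1",
}

# Only the first version hadPrimarySource of the Question.
had_primary_source = {
    "guidance:v1": "question:anaphylaxis-referral",
}


def lineage(guidance_id):
    """Follow wasRevisionOf to the first version, then look up its primary source.

    Returns (chain, source), where chain runs from the given revision back to
    the first one, and source is None if no primary source was recorded.
    """
    chain = [guidance_id]
    while chain[-1] in was_revision_of:
        chain.append(was_revision_of[chain[-1]])
    source = had_primary_source.get(chain[-1])
    return chain, source
```

For example, `lineage("guidance:v3")` visits the second and first versions in turn and ends at the Question.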

Publication

A version of Guidance can be approved for Publication by one or more Editors or Stakeholders. Publication produces a digital artifact that can be signed.

[Diagram: publication process]

To model and record this process you will need the following operations:

  • guidance - define an Entity of subtype Guidance
  • publishedGuidance - define an Entity of subtype PublishedGuidance
  • evidence - define an Entity of subtype Evidence
  • published - define an Activity of subtype Published
  • used - specify that the Published Activity used the Guidance
  • wasGeneratedBy - specify that the Published Activity produced the PublishedGuidance
  • wasAssociatedWith - specify that the Publication was done by a Person acting as an Editor
  • actedOnBehalfOf - specify that the Publication was done on behalf of one or more Stakeholders
  • hadPrimarySource - specify that the PublishedGuidance has a primary source of the Guidance
  • endedAtTime - specify the publication happened at a point in time
  • hadEvidence - attach a signature of the published PDF document to the PublishedGuidance Entity

This process represented as provenance will look like:

[Diagram: publication represented as provenance]

Conceptual design

Provenance is immutable. Once you have recorded it, there is no way to contradict the provenance you have recorded. When translating your domain to provenance, your activities should be things that have either already taken place or are in progress - so choose the past tense. From the process descriptions above we can create the following provenance domain:

Required attributes

Content

Plaintext content of an external resource.

CmsId

An opaque identifier from the CMS being used to author and publish documents.

Title

A plaintext title.

SearchParameter

The input to a search engine.

Reference

A BibTeX reference to evidence.

Version

A simple incrementing integer representing a version number.

Entities

In PROV, things we want to describe the provenance of are called entities and have some fixed aspects. The term "things" encompasses a broad diversity of notions, including digital objects such as a file or web page, physical things such as a mountain, a building, a printed book, or a car as well as abstract concepts and ideas. An entity is a physical, digital, conceptual, or other kind of thing with some fixed aspects; entities may be real or imaginary.

When determining entities, a useful approach from process mapping is to look for nouns in your analysis. Provenance modelling is no different. We can identify the following Entities.

Question

The initial question that forms the basis of all research, informing guidance via research.

Has attributes:

  • CmsId
  • Content

Evidence

A reference to evidence gathered from a search engine.

Has attributes:

  • SearchParameter
  • Reference

Guidance

The source text of a document, either in the process of authoring or potentially published.

Has attributes:

  • Title
  • Version

PublishedGuidance

A published guidance document, containing a digital signature of the released PDF.

Has no attributes.

Activities

An activity is something that occurs over a period of time and acts upon or with entities; it may include consuming, processing, transforming, modifying, relocating, using, or generating entities. Just as entities cover a broad range of notions, activities can cover a broad range of notions: information processing activities may for example move, copy, or duplicate digital entities; physical activities can include driving a car between two locations or printing a book.

When determining activities, a useful approach from process mapping is to look for verbs in your analysis. Provenance modelling is similar, except we are modelling things that have taken place or are in progress. It is useful to use past tense for this reason. We can identify:

QuestionAsked

The first Activity we need to record; it will generate a Question.

Researched

This activity will model the use of a search engine by a Researcher to produce Evidence.

Revised

This activity will model authorship and refinement by an Editor of a single revision of guidance, informed by the Question and Evidence from research.

Published

This activity models the publication of a particular revision of Guidance, approved by an editor under the advice of stakeholders.

Agents

An agent is something that bears some form of responsibility for an activity taking place, for the existence of an entity, or for another agent's activity.

For our example domain, actors are best modelled as Roles rather than Agents - People and Organizations can participate in multiple ways. So we will specify the following agents:

Person

An individual person.

Organization

A named organization consisting of one or more persons. The details of the organizational model are not required to be recorded in provenance.

Roles

When participating in activities, whether directly responsible or via delegation, Agents can have a Role. Agents form the 'who', whereas Roles are the 'what'. Agents may have multiple roles in the same Activity. From our example domain we can identify the following roles:

Stakeholder

A stakeholder is an Organization or Person involved in the formulation of a Question and the approval of Publication.

Author

An Author is a Person who creates a Revision of Guidance, supervised by an Editor.

Researcher

A researcher is a Person who submits SearchParameter to a search engine and then creates References to Evidence.

Editor

An editor is a Person who approves Publication after consulting one or more Stakeholders and supervises Authors creating Revisions of Guidance.
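
The separation between Agents and Roles can be sketched as follows. This is plain Python for illustration only, not Chronicle's API; the identifiers and the `roles_by_agent` helper are hypothetical. It shows one Person holding two roles in the same Activity and a single role in another.

```python
from collections import defaultdict

# The four roles identified for the example domain.
ROLES = {"Stakeholder", "Author", "Researcher", "Editor"}

# (activity, agent, role) triples; identifiers are hypothetical.
associations = [
    ("activity:revised-1", "agent:person-42", "Author"),
    ("activity:revised-1", "agent:person-42", "Editor"),
    ("activity:published-1", "agent:person-42", "Editor"),
    ("activity:published-1", "agent:organization-7", "Stakeholder"),
]


def roles_by_agent(activity, associations):
    """Group the roles each agent held in one activity."""
    result = defaultdict(set)
    for act, agent, role in associations:
        if role not in ROLES:
            raise ValueError(f"unknown role: {role}")
        if act == activity:
            result[agent].add(role)
    return dict(result)
```

Because the role is attached to the association rather than to the agent, the same Person or Organization can be reused across activities without defining a separate agent type per role.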

Domain model format

We will now translate this conceptual design into Chronicle's domain-modelling syntax. Chronicle domain models are specified in YAML. A complete model for the conceptual design can be written like this:

name: 'evidence'
attributes:
  Content:
    type: String
  CmsId:
    type: String
  Title:
    type: String
  SearchParameter:
    type: String
  Reference:
    type: String
  Version:
    type: Int
entities:
  Question:
    attributes:
      - CmsId
      - Content
  Evidence:
    attributes:
      - SearchParameter
      - Reference
  Guidance:
    attributes:
      - Title
      - Version
  PublishedGuidance:
    attributes: []
activities:
  QuestionAsked:
    attributes:
      - Content
  Researched:
    attributes: []
  Published:
    attributes:
      - Version
  Revised:
    attributes:
      - CmsId
      - Version
agents:
  Person:
    attributes:
      - CmsId
  Organization:
    attributes:
      - Title
roles:
  - Stakeholder
  - Author
  - Researcher
  - Editor
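
Before handing a model like this to the build, it can be useful to check that every attribute referenced by an entity, activity or agent is actually declared. The sketch below is an independent check in plain Python and is not part of Chronicle; the source does not say how Chronicle itself reports such mistakes. The `DOMAIN` dictionary mirrors the YAML above by hand, and `undeclared_attributes` is a hypothetical helper.

```python
# Hand-written mirror of the YAML domain model above.
DOMAIN = {
    "name": "evidence",
    "attributes": {
        "Content": "String",
        "CmsId": "String",
        "Title": "String",
        "SearchParameter": "String",
        "Reference": "String",
        "Version": "Int",
    },
    "entities": {
        "Question": ["CmsId", "Content"],
        "Evidence": ["SearchParameter", "Reference"],
        "Guidance": ["Title", "Version"],
        "PublishedGuidance": [],
    },
    "activities": {
        "QuestionAsked": ["Content"],
        "Researched": [],
        "Published": ["Version"],
        "Revised": ["CmsId", "Version"],
    },
    "agents": {
        "Person": ["CmsId"],
        "Organization": ["Title"],
    },
    "roles": ["Stakeholder", "Author", "Researcher", "Editor"],
}


def undeclared_attributes(domain):
    """Return (section, type name, attribute) for every undeclared attribute."""
    declared = set(domain["attributes"])
    missing = []
    for section in ("entities", "activities", "agents"):
        for type_name, used in domain[section].items():
            for attribute in used:
                if attribute not in declared:
                    missing.append((section, type_name, attribute))
    return missing
```

For the model above the function returns an empty list, while a misspelling such as `CmdId` in place of `CmsId` would be reported together with the type that references it.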

Name

A string that names your domain, used to coordinate deployments that require multiple namespaces.

name: "evidence"

Attributes

Attributes are used to assign additional data to the prov terms - Agent, Activity and Entity. They are defined by their name and Primitive type, one of:

  • String
  • Int
  • Bool

Attribute names should be meaningful to your domain - choose things like 'Title' or 'Description'. They can be reused across any of the prov terms - Entity, Activity and Agent.

attributes:
  Content:
    type: String
  CmsId:
    type: String
  Title:
    type: String
  SearchParameter:
    type: String
  Reference:
    type: String
  Version:
    type: Int
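
To show what the primitive types imply for recorded values, here is a minimal sketch in plain Python that checks a value against an attribute's declared type. It is not Chronicle's validation logic; the mapping of String, Int and Bool to Python types and the `conforms` helper are assumptions made for this example.

```python
# Assumed mapping from the model's primitive types to Python types.
PRIMITIVES = {"String": str, "Int": int, "Bool": bool}

# Attribute declarations, mirroring the YAML above.
ATTRIBUTES = {
    "Content": "String",
    "CmsId": "String",
    "Title": "String",
    "SearchParameter": "String",
    "Reference": "String",
    "Version": "Int",
}


def conforms(name, value):
    """Return True if value matches the primitive type declared for name."""
    expected = PRIMITIVES[ATTRIBUTES[name]]
    # bool is a subclass of int in Python, so reject it explicitly for Int.
    if expected is int and isinstance(value, bool):
        return False
    return isinstance(value, expected)
```

With these declarations, `conforms("Version", 3)` holds, while `conforms("Version", "3")` does not, because Version is declared as Int.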

Agent

Using Chronicle's domain model definitions an Agent can be subtyped and associated with attributes like other provenance terms. In the following example we define the two Agent subtypes: Person has an id from the CMS, and Organization a text title.

agents:
  Person:
    attributes:
      - CmsId
  Organization:
    attributes:
      - Title

Entity

Using Chronicle's domain model definitions an Entity can be subtyped and associated with attributes like other provenance terms. In the following example we define the four Entity subtypes: Question has an id from the CMS and its content, Evidence the search parameters and reference, Guidance a title and version, and PublishedGuidance needs no attributes.

entities:
  Question:
    attributes:
      - CmsId
      - Content
  Evidence:
    attributes:
      - SearchParameter
      - Reference
  Guidance:
    attributes:
      - Title
      - Version
  PublishedGuidance:
    attributes: []

Activity

Using Chronicle's domain model definitions an Activity can be subtyped and associated with attributes like other provenance terms. In the following example we define the four Activity subtypes: QuestionAsked has the question content, Researched needs no attributes, Published has a version, and Revised has an id from the CMS and a version.

activities:
  QuestionAsked:
    attributes:
      - Content
  Researched:
    attributes: []
  Published:
    attributes:
      - Version
  Revised:
    attributes:
      - CmsId
      - Version

Role

Corresponding to actors in the example domain we specify the following roles:

roles:
  - Stakeholder
  - Author
  - Researcher
  - Editor

Supplying this as a YAML file to the Chronicle build image, as documented in "Building Chronicle", will produce a well-typed API for your domain. The next step is then recording provenance.

Evolution

Redefinition of a Chronicle domain with existing data is possible, with some caveats:

Type removal

You can remove a prov term (Entity, Agent or Activity), but as Chronicle data is immutable it will still exist on the back end. Such terms can still be returned via queries, but only as their untyped variants - ProvEntity, ProvAgent and ProvActivity - and their attributes will no longer be available via GraphQL.
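
The effect of type removal can be pictured with a small sketch. This is plain Python that only mimics the behaviour described above; it does not reproduce Chronicle's query layer, and the record shape, the `DOMAIN_TYPES` table and the `present` helper are hypothetical.

```python
# Subtypes currently declared in the domain model, per prov term.
# "Evidence" has been removed from the Entity subtypes in this example.
DOMAIN_TYPES = {
    "Entity": {"Question", "Guidance", "PublishedGuidance"},
    "Agent": {"Person", "Organization"},
    "Activity": {"QuestionAsked", "Researched", "Revised", "Published"},
}


def present(record, domain_types=DOMAIN_TYPES):
    """Describe how a stored record would appear to a query.

    Records whose subtype is still declared keep their type and attributes;
    others fall back to the untyped variant with no attributes exposed.
    """
    kind, subtype = record["kind"], record["type"]
    if subtype in domain_types[kind]:
        return {"typename": subtype, "attributes": record["attributes"]}
    return {"typename": "Prov" + kind, "attributes": None}


# A record written before "Evidence" was removed (made-up attribute value).
stored = {
    "kind": "Entity",
    "type": "Evidence",
    "attributes": {"Reference": "hypothetical-bibtex-key"},
}
```

Here `present(stored)` reports the record as a ProvEntity without attributes, even though the underlying data is unchanged.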

Attribute removal

You can remove an attribute, but again it will still exist in provenance you have already recorded.

Attribute addition

You can add new attributes, and add their values to both existing and new data.

This conforms to most reasonable models of interface and protocol evolution, where you should design for extension rather than modification.