cwlprov:relationship sketch #2

mr-c · 2018-10-02T14:27:38Z

Together with #1, this attempts to find a way to pre-define domain-specific provenance that would be generated at workflow run time. The idea is to define a set of relationships that will be added to the outputs a step produces, relating each output to other data values or concepts at creation time.

These can use domain-specific ontologies like the EDAM ontology or BioSchemas, or more generic ones like PROV or schema.org.

#!/usr/bin/env cwl-runner
cwlVersion: v1.0
class: Workflow

inputs:
  first_input: File
  second_input: long

steps: []

outputs:
  first_output:
    type: File
    outputSource: first_input
    cwlprov:relationships:
       prov:wasDerivedFrom: [ '#inputs.second_input' ]
       prov:wasInfluencedBy: [ '#inputs.second_output' ]

$namespaces:
  prov: http://www.w3.org/ns/prov#
  cwlprov: https://w3id.org/cwl/prov#

$schemas:
  - http://www.w3.org/ns/prov.owl

stain · 2018-10-02T14:32:59Z

As this is a relationship to be generated between the values of first_output and second_output, I think we need some kind of template or expression?

JSON-LD with $expansions

cwlprov:relationship:
  { "@id": "$second_output",
    "prov:wasDerivedFrom": "$first_output" }

Or if we assume the current port is the subject and you can't do arbitrary structures you can just have property-object references (no literals in this case):

cwlprov:relationship: {
    "prov:wasDerivedFrom": "$first_output",
    "example:foo": "edam:topic_0091",
  }

Namespaces like prov and edam here must be defined in CWL $namespaces. The template is expanded based on identifiers for the produced values (e.g. urn:uuid:8c97eb7a-94d8-40bf-a932-7e888445f2ec).

If we have:

{ "first_output": { 
    "@id": "urn:uuid:a1626deb-a5a8-4b84-803e-8dd51f80bf2d"
  },
  "second_output": {
    "@id": "urn:uuid:6e076c8b-d3fe-47f0-844b-b0e1561d3181"
  }
}

Then with expansion of namespaces and $variables we get:

{ "first_output": { 
    "@id": "urn:uuid:a1626deb-a5a8-4b84-803e-8dd51f80bf2d"
  },
  "second_output": {
    "@id": "urn:uuid:6e076c8b-d3fe-47f0-844b-b0e1561d3181",
    "http://www.w3.org/ns/prov#wasDerivedFrom":  {
      "@id": "urn:uuid:a1626deb-a5a8-4b84-803e-8dd51f80bf2d"
     },
    "http://example.com/foo":  {
      "@id": "http://edamontology.org/topic_0091"
    }
  }
}
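A minimal Python sketch of the expansion step described above, assuming the simpler form where the current port is the subject and the template holds only property-object references. The function names (expand_curie, expand_template) are hypothetical, and the edam and example namespace IRIs are inferred from the expanded output shown above rather than taken from any CWL document.

```python
import json

# Namespace prefixes as they would appear in CWL $namespaces.
# The edam and example IRIs are assumptions inferred from the expanded example.
NAMESPACES = {
    "prov": "http://www.w3.org/ns/prov#",
    "edam": "http://edamontology.org/",
    "example": "http://example.com/",
}

def expand_curie(value, namespaces):
    """Expand prefix:local to a full IRI when the prefix is declared."""
    prefix, sep, local = value.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix] + local
    return value  # already an IRI, or an undeclared prefix: leave untouched

def expand_template(template, subject, bindings, namespaces):
    """Attach the template's property-object pairs to bindings[subject]."""
    node = dict(bindings[subject])  # copy so the caller's data is not mutated
    for prop, obj in template.items():
        if obj.startswith("$"):
            target = bindings[obj[1:]]["@id"]  # $variable -> value identifier
        else:
            target = expand_curie(obj, namespaces)
        node[expand_curie(prop, namespaces)] = {"@id": target}
    result = dict(bindings)
    result[subject] = node
    return result

# Identifiers of the values produced in one run (taken from the example above).
bindings = {
    "first_output": {"@id": "urn:uuid:a1626deb-a5a8-4b84-803e-8dd51f80bf2d"},
    "second_output": {"@id": "urn:uuid:6e076c8b-d3fe-47f0-844b-b0e1561d3181"},
}
template = {
    "prov:wasDerivedFrom": "$first_output",
    "example:foo": "edam:topic_0091",
}

expanded = expand_template(template, "second_output", bindings, NAMESPACES)
print(json.dumps(expanded, indent=2))
```

Literals and nested structures are deliberately left out here; supporting them would need a convention for telling a literal apart from a reference.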

[ updated by @mr-c to add missing commas, make the UUIDs unique ]

mr-c · 2018-10-03T08:57:19Z

@stain Thank you for the JSON-LD example.

I've updated my sketch to show that we might want to set relationships between an output and another output, and also an input.

Issue common-workflow-language/cwlprov#2

stain · 2019-05-23T12:37:36Z

OK, in 036af7c78a3e1c5125009ae05dbdb853afca6790 I try to sketch out how this can be recorded as templates in the CWL, and then add these to the PROV. There are open questions about what to call these (here cwlprov:relationships) and how to reference the variables to fill in at execution time (here using a direct reference, #inputs.first_input).

But this leads to fairly misleading information in cwlprov --print-rdf, in that it would claim the output parameter definition has a "relationship" to an anonymous object, which then "is derived from" (or whatever property is used) an input parameter definition. This is acceptable only if we think of the input/output parameter as a "superobject" of every object that passes through it, as in every file object being a prov:specializationOf the parameters it is input to or output from.

(this is like saying Stian is a specialisation of CustomerOfTesco because I went shopping at Tesco once)
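To make the distinction concrete, here is a rough Python sketch of the alternative reading: resolve each parameter reference to the value that actually passed through it in a given run, state the relationship between those values, and link each value back to its parameter with prov:specializationOf. The function name, the "#outputs.first_output" reference form and the pairing of UUIDs with parameters are all assumptions made for illustration; none of this is what cwltool records.

```python
PROV = "http://www.w3.org/ns/prov#"

def value_level_triples(port, relationships, values):
    """Return (subject, predicate, object) triples about run-time values.

    port          -- parameter reference of the annotated port
    relationships -- mapping of property IRI -> list of parameter references
    values        -- mapping of parameter reference -> identifier of the value
                     that passed through that parameter in this run
    """
    triples = []
    subject = values[port]
    # The relationship is stated between values, not parameter definitions.
    for prop, refs in relationships.items():
        for ref in refs:
            triples.append((subject, prop, values[ref]))
    # Each value is recorded as a specialisation of the parameter it passed through.
    for ref in sorted(values):
        triples.append((values[ref], PROV + "specializationOf", ref))
    return triples

# Hypothetical bindings for one run; the UUIDs are reused from the earlier
# example purely as placeholders.
values = {
    "#outputs.first_output": "urn:uuid:a1626deb-a5a8-4b84-803e-8dd51f80bf2d",
    "#inputs.second_input": "urn:uuid:6e076c8b-d3fe-47f0-844b-b0e1561d3181",
}
relationships = {PROV + "wasDerivedFrom": ["#inputs.second_input"]}

triples = value_level_triples("#outputs.first_output", relationships, values)
for triple in triples:
    print(triple)
```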

See also PROV-Template, which would use a special var namespace for pre-existing variables. We could bind these directly to the input/output objects using existing CWL Expressions (e.g. $(inputs.message) -> var:inputs.message).
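The mapping from a CWL parameter reference to a var: name could look roughly like the Python sketch below. It handles only the simple $(inputs.name) form from the example; the function name is hypothetical, and this is not part of any PROV-Template or cwltool API.

```python
import re

# Matches only simple references such as $(inputs.message).
# Indexing, nested fields and JavaScript expressions are deliberately rejected,
# and how output references would be written is left open.
PARAM_REF = re.compile(r"^\$\(inputs\.([A-Za-z_][A-Za-z0-9_]*)\)$")

def to_template_variable(expression):
    """Map $(inputs.name) to the variable name var:inputs.name."""
    match = PARAM_REF.match(expression.strip())
    if match is None:
        raise ValueError("unsupported expression: " + expression)
    return "var:inputs." + match.group(1)

print(to_template_variable("$(inputs.message)"))
```

Rejecting anything beyond a plain reference keeps the binding unambiguous; richer expressions would need a rule for which value they denote.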

stain · 2019-05-23T12:45:57Z

Here are some of the mappings we should be able to do: https://gist.github.com/stain/f0b0d966a103b1533d684aa6d7197364

The data concepts are often more complex expressions than pure typing from the EDAM ontology or BioSchemas, so we might need to support expressions that span more than one triple, as explored here and in #1.

stain added a commit to common-workflow-language/cwltool that referenced this issue May 1, 2019

record cwlprov#relationships

036af7c

Issue common-workflow-language/cwlprov#2
