Dealing with changes in identifiers over time, adding provenance for local IDs #33

Open
tr325 opened this issue May 27, 2020 · 4 comments
Labels: decision (Decision to be taken that aligns the approach), duplicate

Comments

tr325 commented May 27, 2020

Identifier fields are (rightly) required throughout. However, at the start of a project a Dataset will not yet have a global identifier such as a DOI, so implementing systems will need to provide a local identifier with type `other`. Similarly, if researchers have not yet signed up for an ORCID iD, a local id will have to be supplied.

For these to be useful when systems integrate, it would be helpful if the provenance of those identifiers could be added, possibly as another entry in each of the something_id fields, with cardinality 0..1. Tracking the provenance of each identifier individually would let integrating systems re-use externally generated identifiers in the document rather than reassign them (for example, if a DMP is constructed from multiple cooperating systems).

As the DMP should be a living document, local ids should probably then be replaced with global ones when they become available (e.g. when a DOI is created for a dataset). If an integrating system needs to track these changes and match the ids, the record history of the DMP could be used to do so.
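
To illustrate the replacement step, here is a minimal Python sketch. It is not part of the standard. It assumes a dataset is a plain dict with a `dataset_id` entry and that `provenance` is the extra key proposed above. The function name `promote_identifier` and the DOI value are hypothetical placeholders.

```python
from copy import deepcopy


def promote_identifier(dataset, new_identifier, new_type):
    """Return a copy of `dataset` whose local id is replaced by a global one.

    Hypothetical helper: the input is left untouched, so a caller can still
    compare the old and new records when matching ids.
    """
    updated = deepcopy(dataset)
    # The global identifier needs no provenance entry, so it is dropped here.
    updated["dataset_id"] = {"identifier": new_identifier, "type": new_type}
    return updated


# A draft dataset that only has a local id so far.
draft = {
    "title": "Survey results",
    "dataset_id": {"identifier": "123", "type": "other", "provenance": "system_a"},
}

# Later, a DOI is minted (placeholder value, not a real DOI).
published = promote_identifier(draft, "https://doi.org/10.1234/example", "doi")
```

Note that the published record no longer carries the local id, so matching old and new ids would depend on the DMP's record history, as described above.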

briri commented May 27, 2020

➕ for this. We have been thinking about it as well. It would also be nice to let integrating systems supply these system-specific identifiers when they call the API, in case they do not have the time or resources to update their code to capture things like ORCIDs (perhaps helping lower the bar to adoption of this standard).

I was originally leaning towards something like:

  {
    "type": "other",
    "provenance": "system_a",
    "identifier": 123
  }

It would also be possible to supply the name of the system in the type attribute:

  {
    "type": "system_a",
    "identifier": 123
  }
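
A rough Python sketch of how a consuming system might read either shape. The set of global types and the function name `normalise` are assumptions made for illustration, not definitions taken from the standard.

```python
# Assumed set of global identifier types, for illustration only.
GLOBAL_TYPES = {"doi", "handle", "ark", "url"}


def normalise(entry):
    """Map either proposed shape onto (type, provenance, identifier)."""
    id_type = entry["type"]
    if "provenance" in entry:
        # First shape: provenance is given explicitly.
        return id_type, entry["provenance"], entry["identifier"]
    if id_type in GLOBAL_TYPES or id_type == "other":
        # A known type with no provenance supplied.
        return id_type, None, entry["identifier"]
    # Second shape: the system name is carried in "type".
    return "other", id_type, entry["identifier"]


explicit = normalise({"type": "other", "provenance": "system_a", "identifier": 123})
implicit = normalise({"type": "system_a", "identifier": 123})
```

Both calls yield the same tuple. The sketch also suggests a trade-off: with the second shape, a consumer has to know the full list of global types in order to tell a system name apart from a type, whereas the first shape is unambiguous.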

briri commented May 27, 2020

If we allow for this, we could do other interesting things as well, like providing a callback URL for updates, something along the lines of the HATEOAS pattern.

For example, a researcher creates a DMP in some tool and designates a specific repository. The DMP system sends the maDMP JSON to the repository system along with the callback URL. The repository system (at some point in the future) receives a dataset from the researcher. The repository system could then use that callback URL to send the DMP system the dataset's DOI.
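
The flow above could be sketched in Python roughly as follows. This is an in-memory simulation with no real HTTP. The class names, method names, URL layout, host `dmp.example.org`, and DOI value are all hypothetical, and `deliver` stands in for the HTTP request a repository would actually make.

```python
class DmpSystem:
    """Holds DMPs and accepts identifier updates via a callback."""

    def __init__(self):
        self.dmps = {}

    def register(self, dmp_id, dataset_local_id):
        # Store the dataset under a local id until a DOI exists.
        self.dmps[dmp_id] = {
            "dataset_id": {"type": "other", "identifier": dataset_local_id}
        }
        # Hypothetical callback URL handed to the repository.
        return f"https://dmp.example.org/api/dmps/{dmp_id}/dataset_id"

    def handle_callback(self, dmp_id, doi):
        # Replace the local id with the DOI reported by the repository.
        self.dmps[dmp_id]["dataset_id"] = {"type": "doi", "identifier": doi}


class Repository:
    """Remembers callback URLs and uses them once a dataset is deposited."""

    def __init__(self):
        self.pending = {}

    def receive_dmp(self, dmp_id, callback_url):
        self.pending[dmp_id] = callback_url

    def deposit(self, dmp_id, doi, send):
        # `send` would be an HTTP call in a real system.
        send(self.pending.pop(dmp_id), doi)


dmp_system = DmpSystem()
repository = Repository()


def deliver(callback_url, doi):
    # Stand-in for HTTP routing: read the DMP id out of the URL path.
    dmp_id = callback_url.rstrip("/").split("/")[-2]
    dmp_system.handle_callback(dmp_id, doi)


url = dmp_system.register("dmp-1", "local-42")
repository.receive_dmp("dmp-1", url)
repository.deposit("dmp-1", "10.1234/example", deliver)
```

After the deposit, the DMP system's record for `dmp-1` holds the DOI in place of the local id, and the repository no longer keeps the callback URL.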

cpina commented Jan 19, 2021

I just joined the call earlier on, so apologies if I am missing some context from the wider project on this ticket...

Question: should identifiers expire? But still be recorded for data provenance?

For example, there could be a primary identifier plus previous (unused?) identifiers, or each identifier could have a timespan (created_date, deleted_date... or created_date, replaced_date, replaced_by). I've just found a similar approach: #34 (comment)
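
One way to picture the timespan variant is the Python sketch below. The field names `created_date`, `replaced_date` and `replaced_by` come from the suggestion above. The class, the helper functions and the sample values are hypothetical and only illustrate the shape of the data.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class TrackedIdentifier:
    identifier: str
    type: str
    created_date: date
    replaced_date: Optional[date] = None
    replaced_by: Optional[str] = None


def replace_identifier(history: List[TrackedIdentifier], new_identifier: str,
                       new_type: str, on: date) -> None:
    """Close the current identifier and append its replacement."""
    current = history[-1]
    current.replaced_date = on
    current.replaced_by = new_identifier
    history.append(TrackedIdentifier(new_identifier, new_type, created_date=on))


def current_identifier(history: List[TrackedIdentifier]) -> TrackedIdentifier:
    """Return the one identifier that has not been replaced."""
    return next(i for i in history if i.replaced_date is None)


# Sample values only: a local id that is later replaced by a placeholder DOI.
history = [TrackedIdentifier("123", "other", created_date=date(2020, 5, 27))]
replace_identifier(history, "10.1234/example", "doi", on=date(2021, 1, 19))
primary = current_identifier(history)
```

Here the old local id stays in the list with its replacement recorded, while `current_identifier` picks out the primary one.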

paulwalk (Contributor) commented

I think that there is a fundamental issue to be discussed first:

Should the DMP standard attempt to convey "history" or, rather, simply be a mechanism for conveying the current known state of the DMP?

If we decide that we want to convey a historic record of DMP changes too, then we need to model this very carefully, and we need to anticipate a significant growth in complexity. Provenance might need to be recorded for anything that could change (note, not only IDs).

I strongly recommend that we do not assume a need to record revision history, without carefully considering the consequences.
