-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Describe versioned TRS URIs #202
Conversation
@denis-yuen: Please add additional reviewers. Thanks! |
For example, if a TRS client was asked to process | ||
`trs://trs.example.org/tool_ABC/v1.2.3`, it would know that it could, e.g., | ||
issue `GET` requests to | ||
`https://trs.example.org/api/ga4gh/trs/v2/tools/tool_ABC` and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, how this will work with versioning?
What happens when TRS v3 comes out and a new product only implements TRS v3 and doesn't implement v2?
That's assuming TRS v3 will use a new path segment v3
, which I suppose it doesn't have to. But I assume it would.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Essentially very primitive content negotiation where the user agent would need to try the API versions it would be familiar with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think versioning APIs is an important topic and I think there's most definitely a need for discussion and harmonization across GA4GH APIs (e.g., Stian mentioned HATEOAS in #160), but as this PR does not introduces the issue nor is designed to address it, I don't think it should be blocked by it.
Besides, I don't think that TRS URIs, as identifiers of resources, should be concerned with API versioning at all: IMO, when bumping API versions, care must be taken (both on the spec- and implementation-side) not to introduce ambiguities in the way that the different responses in, e.g., TRS v2
and v3
can be interpreted by a client. And if changes to the internal representation of resources are required that would break backward compatibility, then I think those resources should be cloned and given new identifiers, such that reproducibility is not negatively affected. As long as these considerations/constraints are honored, it shouldn't matter which API version is used to access a given resource.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case, I have clarified this/addressed these concerns now in lines 56/57 and 69-72
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My question was not about API/schema versioning, which is a very complex subject and I'm no expert in. It was about the example that if you have a URI trs://trs.example.org/tool_ABC/v1.2.3
, it says you can fetch the content https://trs.example.org/api/ga4gh/trs/v2/tools/tool_ABC
(note the v2
in the path).
If TRS v3 comes along, and a new TRS implementation only supports that v3, then the v2
in the path means they would return a 404. Maybe that's OK, or maybe the requirement is that all TRS implementations need to implement v2 and above. Otherwise a client will need to iterate over paths with all potential versions.
Maybe I'm overthinking this, and it is a real edge case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think this is an API/schema versioning topic because the API version is precisely what v2
refers to. In any case, there anyway needs to be some sort of negotiation between a service and a client, because not every client will implement every version of the API specs, and so there will be cases where clients and servers won't match, regardless of whether the client will iterate or whether we specify the version in the TRS URI. One possibility to facilitate this negotation would be to define the supported API versions in the service info, but of course, the Service Info API is versioned, too...
I agree this is an important issue and needs to be discussed on a larger scale within the Cloud WS in order to come up with a sensible strategy that covers DRS and TRS URIs as well as other artefacts/problems with API versioning. A related issue I faced is also that the path (e.g., ga4gh/trs/v2
) defined in the schema may not always be possible or desirable to implement. For example, as far as I know, Dockstore does not. The path info could perhaps also be defined in the service info for any given service (although the same problem applies: how do you get to the service info in the first place; perhaps via the service registry...).
Still, I don't think this PR makes this problem worse than it is already, given the current definitions of DRS and TRS URIs, so I don't think it should block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think this is an API/schema versioning topic
Fair. I was thinking more of different schemas in different versions of the API when I made my comment, but you're right.
Still, I don't think this PR makes this problem worse than it is already, given the current definitions of DRS and TRS URIs
I think the explicitness that you can take a URI and translate it to a specific path is a new wrinkle, at least for TRS. You're right about DRS -- I hadn't looked until now and they do the same thing, with v1
in the path.
so I don't think it should block.
I had already approved, so it is not blocked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @coverbeck
So @denis-yuen, others: shall we bring up this important discussion to the Cloud WS?
Also perhaps to see if people agree that it might be useful to specify TRS URIs as inputs to WES (to access workflows) and inside workflows (to access containers through WES/TES implementations)? Because this might require changes on WES and TES specs as well, without which TRS URIs might not be all that useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proposal at Cloud WS was to break this discussion into a separate ticket #221
For example, if a TRS client was asked to process | ||
`trs://trs.example.org/tool_ABC/v1.2.3`, it would know that it could, e.g., | ||
issue `GET` requests to | ||
`https://trs.example.org/api/ga4gh/trs/v2/tools/tool_ABC` and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Essentially very primitive content negotiation where the user agent would need to try the API versions it would be familiar with?
For example, if a TRS client was asked to process | ||
`trs://trs.example.org/tool_ABC/v1.2.3`, it would know that it could, e.g., | ||
issue `GET` requests to | ||
`https://trs.example.org/api/ga4gh/trs/v2/tools/tool_ABC` and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My question was not about API/schema versioning, which is a very complex subject and I'm no expert in. It was about the example that if you have a URI trs://trs.example.org/tool_ABC/v1.2.3
, it says you can fetch the content https://trs.example.org/api/ga4gh/trs/v2/tools/tool_ABC
(note the v2
in the path).
If TRS v3 comes along, and a new TRS implementation only supports that v3, then the v2
in the path means they would return a 404. Maybe that's OK, or maybe the requirement is that all TRS implementations need to implement v2 and above. Otherwise a client will need to iterate over paths with all potential versions.
Maybe I'm overthinking this, and it is a real edge case.
On today's ELIXIR Cloud & AAI call we discussed how this would interact with CWL's container specification (typically the If this proposal is adopted (for which I have nothing against) then I would like it to include specific guidance on implementation, especially in a user-facing (CWL) context. For CWL, I suggest putting versioned TRS URIs (if needed) under the "specs" field of SoftwarePackage. However, as |
Very good point @coverbeck . From my humble point of view, a future iteration of GA4GH TRS should consider either recommending/enforcing a fallback answer for selected v2 routes, or implementing both future and (then old) v2 paths. |
Thinking this through. In practice, the client would hit https://github.com/ga4gh/tool-registry-service-schemas/blob/2.0.1-beta.0/openapi/openapi.yaml#L24 once for each version of the API that the client supports from highest version to lowest version to see what is active. If it treats all requests as independent, it would indeed need to do this check each time. It kinda feels like we should lift a page out of something existing like https://developer.mozilla.org/en-US/docs/Web/HTTP/Content_negotiation but its not something I've worked with. |
I've heard of it, but I've actually played with it. I wonder how well that works with Swagger generated code? Also, you'd still need a url to negotiate with, so maybe there's an endpoint that doesn't include the version? |
Well, the swagger bit is a Dockstore problem, not a TRS problem 👿 The versionless endpoint is interesting, I do wonder in retrospect whether service-info is supposed to be nested inside the versioned API and implemented once per version or whether there should be one service-info for all versions. Going to ping them Edit to add: Another idea might be to ping the GA4GH registry to figure out which versions of what APIs are available |
I agree that guidance on implementation in CWL etc. would be nice. However, it seems to me that this would be something that should rather be documented on the implementers' side, because it is their user base that would need to be aware of these. In any case, I would probably wait a bit with this to see how this is actually being taken up and what problems might arise from using TRS URIs instead of specific container repo URIs or when specified, as you suggest, in the As for clients automatically constructing TRS URIs: the client has no way of knowing resource IDs (tool and version) just by knowing tool names, as TRS implementations are free to choose these identifiers. Mandating TRS implementations to use specific identifiers (e.g., semantic identifiers) would go too far, IMO, although implementations could of course choose to go that route (and deal with the ambiguities that this may bring along). But even then, I think clients guessing container images is potentially dangerous. But even if all of that could be somehow addressed satisfactorily, how would the client know/guess the domain/server address at which the API/resource is hosted? |
@uniqueg |
@denis-yuen: I guess one just came in from @patmagee in the corresponding issue: #164 (comment) |
For example, if a TRS client was asked to process | ||
`trs://trs.example.org/tool_ABC/v1.2.3`, it would know that it could, e.g., | ||
issue `GET` requests to | ||
`https://trs.example.org/api/ga4gh/trs/v2/tools/tool_ABC` and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@uniqueg This example is a bit awkward and likely points to a deficiency in the current approach for defining trs identifiers (and potentially drs identifiers as well). The TES specification defines the root path of the API as /ga4gh/trs/<version>
.
Putting aside specific trs versioning issues (from the conversation above), I am now left with another dilema: TRS identifiers as defined only provide the hostname
and a toolId
and possibly a version id
. This approach does not leave ANY room for path values that may come before the /ga4gh/...
prefix. Given this, how am I supposed to resolve the actual trs URL for the example?
By convention, any tooling I create would resolve trs://trs.example.org/tool_ABC/v1.2.3
to https://trs.example.org/ga4gh/trs/v2/tools/tool_ABC/versions/v1.2.3
, which is not what is presented in this example (note the missing API). I would not be so concerned over this if it was not already present in the wild in the Dockstore api (@denis-yuen). So my question then is how can we possibly resolve the path components without introducing ambiguity into the trs identifier.
One approach could be just adding the prefix paths prior to the toolId trs://trs.example.org/api/tool_ABC
. Unfortuantely this is ambiguous since this could be referring to either a tool with the path prefix api
and version tool_ABC
or a tool tool_ABC
with a path api
. If we do not allow the path prefix to be in the identifier, any tooling is limited to the trs implementations that It already knows how to resolve, effectively limiting the usefulness of the trs id in general.
We essentially need a reliable way to extract the toolId
and versionId
without any ambiguity, while still allowing preceding fragments in the path. Some examples are below:
trs://trs.example.org/<prefix-before-ga4gh-api>/<toolId>:<versionId>
trs://trs.example.org/<prefix-before-ga4gh-api>/<toolId>/v/<versionId>
trs://trs.example.org/<prefix-before-ga4gh-api>/<toolId>/version/<versionId>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One could probably calculate the prefixes on the fly from https://github.com/ga4gh/tool-registry-service-schemas/blob/develop/registry.json but it probably does make more sense to change the URI format since we're not using it yet(?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@denis-yuen I would advise against relying on prefixes in the registry.json
file, which automatically enforces that file is the authoratative source of all tool registries. It does not really allow for new registries to dynamically be created (ie in a multitenant system)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my personal preference for approaches would likely be: trs://trs.example.org/<prefix-before-ga4gh-api>/<toolId>:<versionId>
. The side effect of this is that :
is now a restricted character that cannot be used in the identifier itself, but I think that is a reasonable restriction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tl;dr
I agree with @patmagee's suggestion provided that DRS folks agree that it will work for them as well.
Good catch @patmagee - thanks a lot! I had actually realized this issue before when implementing a TRS client that understands TRS URIs and having to introduce an optional base_path
parameter so that it would work with Dockstore. It seems I had forgotten about it though. And I agree, the TRS URI loses its utility if this has to be known beforehand.
From what I can see, @patmagee's suggestion should do the trick.
But thinking this through a little more, maybe there are a couple of things that we should discuss outside of this issue before merging:
- To make sure that whatever is decided here has the perspective of aligning with whatever is decided for DRS URIs in the future, I think this discussion should include the DRS community. @patmagee's other worry of being able to include basically any character in TRS Tool/Version IDs could be brought up there as well.
- I am not quite sure whether API versioning could have any effect on a client's ability to properly resolve a TRS URI. I feel that it should not. But without explicit guidelines on API versioning, outlining expected behaviors and discussing their impact on reproducibility, I feel that it is hard to be sure. So perhaps it might be worth to discuss specifying such guidelines somewhere, possibly in a place that includes other work streams as well.
Besides that, it is possibly a good idea to aim for a consensus of how servers
are defined in the various Cloud WS APIs in the mid to long term. And whatever is decided here, should not collide with our definitions of TRS and DRS URIs. Currently, the situation is like this:
TRS
servers:
- url: /ga4gh/trs/v2
DRS
servers:
- url: https://{serverURL}/ga4gh/drs/v1
variables:
serverURL:
default: drs.example.org
description: >
DRS server endpoints MUST be prefixed by the '/ga4gh/drs/v1' endpoint
path
WES
servers:
- url: https://{defaultHost}/{basePath}/{apiVersion}
variables:
defaultHost:
default: www.example.com
basePath:
default: ga4gh/wes
apiVersion:
default: v1
TES
servers:
- url: /ga4gh/tes/v1
So, it looks like TRS and TES do not care where the API is deployed, as long as endpoint paths are immediately preceded by /ga4gh/trs/v2
and /ga4gh/tes/v1
, respectively. DRS is very similar but a little more prescriptive in that it requires the use of the https
schema. WES, on the other hand, requires https
as well, but otherwise only seems to require that there be at least two slashes included in the URL before the endpoint path (weird?). In particular, it does not require a specific base path suffix like /ga4gh/wes/v1
(though the default does include it).
Now, there is currently no consensus across those definitions. But perhaps it seems to be going in the direction of the DRS definition, i.e., start with https
and end with a specific base path suffix of the form /ga4gh/{api_short_name}/{api_version}
. @patmagee's definition would account for that (having the client fill in the appropriate base path suffix). But even with a more relaxed definition that does not prescribed a specific base path for GA4GH Cloud APIs, the suggested solution should still work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proposal at Cloud WS was to also break this into a separate ticket, but to prioritize it for discussion before 2.0.1 release and/or the GA4GH plenary
#220
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proposing wrapping this specific PR so we can get the language about versioned TRS URIs in (describing versions of tools described by TRS), but continuing the discussion for the three other issues the the following tickets since the discussion seems to be sprawling out:
- percent encoding IDs percent encoding issue with IDs #222
- URI interaction with API versioning URI interaction problem with API versioning #221
- URI interaction with base paths base path problem with TRS URIs #220 (to be addressed as a high priority)
For example, if a TRS client was asked to process | ||
`trs://trs.example.org/tool_ABC/v1.2.3`, it would know that it could, e.g., | ||
issue `GET` requests to | ||
`https://trs.example.org/api/ga4gh/trs/v2/tools/tool_ABC` and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proposal at Cloud WS was to break this discussion into a separate ticket #221
For example, if a TRS client was asked to process | ||
`trs://trs.example.org/tool_ABC/v1.2.3`, it would know that it could, e.g., | ||
issue `GET` requests to | ||
`https://trs.example.org/api/ga4gh/trs/v2/tools/tool_ABC` and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proposal at Cloud WS was to also break this into a separate ticket, but to prioritize it for discussion before 2.0.1 release and/or the GA4GH plenary
#220
Fine with that :) |
Rationale
TRS URIs allow passing references to TRS content to potential TRS client applications (e.g., WES or TES). This PR adds support for versioned TRS URIs, which are required to specify which version of a specified tool the client application is supposed to use/consume. This is essential to ensure reproducibility for at least some of the main use cases of TRS URIs (e.g., fetching workflows or containers).
Implementation details
trs://<server>/<id>
) in TRS Data Model documentation to include versioned TRS URI (trs://<server>/<id>/<version_id>
)Misc
Related issues
Resolves #164 (see issue for additional details and a community discussion of the implementation proposed here)