Workflow engines fail to run workflows referenced by certain TRS URLs #247
I agree that the way the TRS spec references the relative location of accessory workflow files with respect to the primary descriptor is anywhere from confusing to limiting (see below). However, the issue you describe is probably mostly a server implementation issue (at Dockstore) and should probably be raised there (see TRS URLs section below). On another note, I would suggest that clients (Cromwell,

TRS URLs
I agree with you that servers should properly resolve relative paths in TRS URLs of the form

Limitation
However, note that parent directories relative to the primary descriptor are not supported. From the relevant part of the specification:
So technically that means that only workflows are supported that follow a directory structure where the primary descriptor (main workflow file) is in the top level directory relative to all secondary descriptors (imported subworkflows/modules). Either way, I guess Dockstore should probably respond with a

Summary
Not being able to reference files in parent directories relative to the primary descriptor is indeed a limitation that I feel needs to be addressed in the TRS specs and could/should be further discussed here. I don't know or remember what speaks against supporting parent directories (perhaps a security "feature"?), but it seems to me that given the current specs and a fully compliant TRS implementation, you are out of luck when there are secondary descriptors in a parent or sibling directory relative to the primary descriptor (like in your use case, @svonworl).

TRS URIs
Usage
Current limitations
Summary
TRS URIs make it easier for users to share TRS resources and start tool/workflow runs in supporting clients. |
Ah interesting. We actually return a 400 or 404, so there's no security issue.
So far, we've been downloading the workflows in zip format from the files endpoint.
👍 |
Yes, you can do that. However, information about each file, including its description:

Get a list of objects that contain the relative path and file type. The descriptors are intended for use with the /tools/{id}/versions/{version_id}/{type}/descriptor/{relative_path} endpoint. Returns a zip file of all files when format=zip is specified.

this information is which, as described above, prohibits references to files that are not located in the primary descriptor's directory or a child directory thereof. For reference, the ToolFile:
type: object
properties:
  path:
    type: string
    description: Relative path of the file. A descriptor's path can be used with
      the GA4GH .../{type}/descriptor/{relative_path} endpoint.
  file_type:
    type: string
    enum:
      - TEST_FILE
      - PRIMARY_DESCRIPTOR
      - SECONDARY_DESCRIPTOR
      - CONTAINERFILE
      - OTHER
  checksum:
    $ref: "#/components/schemas/Checksum"

So while nothing is stopping an implementation from returning a ZIP archive containing whatever directory structure a workflow requires when clients call To maintain consistency with To address this issue without ever having to deal with parent directory references, perhaps we could require resources to use an implicit workflow top-level directory (e.g., a Git repository root directory) as an anchor when constructing relative paths for workflow files to be made available via This approach would have the added benefit of us only needing to change descriptions, and so perhaps we could get away with just sneaking such a change into a future minor version bump of the TRS specs. |
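To make the anchoring idea above a bit more concrete, here is a minimal Python sketch. It assumes a hypothetical repository layout, and the names `relative_to_primary` and `escapes_anchor` are illustrative helpers, not anything defined by TRS. It compares paths computed relative to the primary descriptor's directory with paths computed relative to the repository root.

```python
import posixpath

# Hypothetical repository layout; every path is relative to the repository root.
PRIMARY = "workflows/main.wdl"
SECONDARY = ["workflows/sub/align.wdl", "tasks/common.wdl"]


def relative_to_primary(primary: str, other: str) -> str:
    """Path of `other` relative to the directory that holds the primary descriptor."""
    # Prefix both sides with "/" so the result does not depend on the working directory.
    start = posixpath.join("/", posixpath.dirname(primary))
    return posixpath.relpath(posixpath.join("/", other), start=start)


def escapes_anchor(path: str) -> bool:
    """True if the normalized relative path climbs above the directory it is anchored to."""
    normalized = posixpath.normpath(path)
    return normalized == ".." or normalized.startswith("../") or normalized.startswith("/")


if __name__ == "__main__":
    for other in SECONDARY:
        via_primary = relative_to_primary(PRIMARY, other)
        # Anchored at the repository root, the relative path is just the repo path.
        via_root = other
        print(other, "| primary-relative:", via_primary, escapes_anchor(via_primary),
              "| root-relative:", via_root, escapes_anchor(via_root))
```

With the primary descriptor's directory as the anchor, the sibling `tasks/` file needs a parent directory reference; with the repository root as the anchor, none of the paths do, which is the property the proposal relies on.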
I'm running into this problem trying to add support for TRS lookups to the Toil workflow runner. Since Toil already knows how to read and run a CWL or WDL workflow from a plain HTTP-hosted directory tree, I want to query TRS and somehow get from it a URL to the main workflow descriptor file in its appropriate directory context.

Dockstore kind of supports this already; the TRS IDs on its workflow pages link to URLs like But this seems to be a Dockstore extension, and TRS doesn't itself provide a way to get that

If the API is up for redesign here, I would favor designs that look as much like plain HTTP as possible, over designs that officially support fetching files in higher-level directories but require a sort of TRS-specific virtual filesystem implementation in the client. |
Link for presentation dockstore/dockstore#6005 (dockstore workaround) |
As detailed in dockstore/dockstore#5594, some workflow engines, including miniwdl and Cromwell, fail to run a valid Dockstore workflow when:
..
).
For example, the following invocation fails:
Turns out the problem is bigger - the workflow engines also fail to run workflows that import files with relative paths.
The root cause is that, when the engines calculate the URL of an import, they interpret the specified TRS URL as a file path. However, a TRS URL doesn't represent a file path, so the engines miscalculate the import URLs and fail when they attempt to load them.
For example, given the TRS URL
and an import referenced by a relative path
The engines calculate the import URL, by applying typical file resolution semantics, as:
The above URL is a corrupt TRS URL, because parts of the original TRS URL have been deleted. During the import URL calculation, the engine drops the trailing descriptor portion of the TRS URL because it looks like a filename, and when the engine normalizes the URL prior to the request, it collapses the parent directory references and more of the original TRS URL is deleted.
Per the TRS spec, a relative path can be appended to the TRS primary descriptor URL, and it will resolve the file relative to the primary descriptor and return its contents. So, the correct URL is:
Note that when miniwdl is run with a URL that references the raw GitHub files, it works as expected:
Why does it work? The GitHub URL ends with the absolute path of the workflow file, allowing the import URLs to be resolved using typical file resolution semantics.
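The difference between the two resolution strategies can be sketched in Python with the standard library's `urljoin`. Everything concrete here is an assumption made for illustration: the host `trs.example.org`, the tool ID, the version, the import path, and the raw file URL are made up, and percent-encoding the appended relative path is one possible choice to keep HTTP clients from normalizing it, not something this issue prescribes.

```python
from urllib.parse import quote, urljoin

# Hypothetical TRS primary descriptor URL (host, tool ID, and version are made up).
TRS_URL = (
    "https://trs.example.org/ga4gh/trs/v2/tools/"
    "%23workflow%2Fgithub.com%2Forg%2Frepo/versions/main/WDL/descriptor"
)
# Hypothetical raw file URL that ends with the workflow file's path.
RAW_URL = "https://raw.example.org/org/repo/main/workflows/main.wdl"
# Hypothetical relative import found inside the primary descriptor.
IMPORT_PATH = "../tasks/common.wdl"


def resolve_as_file_path(base_url: str, relative_path: str) -> str:
    """Typical file-style resolution: drop the last segment, then collapse '..'."""
    return urljoin(base_url, relative_path)


def resolve_as_trs(base_url: str, relative_path: str) -> str:
    """Append the relative path after the 'descriptor' segment (encoded, by assumption)."""
    return base_url.rstrip("/") + "/" + quote(relative_path, safe="")


if __name__ == "__main__":
    print("file-style on TRS URL:", resolve_as_file_path(TRS_URL, IMPORT_PATH))
    print("TRS-style on TRS URL: ", resolve_as_trs(TRS_URL, IMPORT_PATH))
    print("file-style on raw URL:", resolve_as_file_path(RAW_URL, IMPORT_PATH))
```

File-style resolution against the TRS URL loses both the `descriptor` segment and the descriptor type segment, while the same resolution against a URL that ends in a real file path yields a usable sibling path.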
This issue should probably be addressed in the next major TRS revision.
In lieu of that, here are some possible solutions that would help the engines to correctly run a workflow referenced by a "bare" TRS primary descriptor URL:
┆Issue is synchronized with this Jira Story
┆Project Name: Zzz-ARCHIVE GA4GH tool-registry-service
┆Issue Number: TRS-70