Commit

Added shell-task tutorial
tclose committed Dec 24, 2024
1 parent e193bb8 commit 53f28bd
Showing 3 changed files with 394 additions and 49 deletions.
356 changes: 354 additions & 2 deletions docs/source/tutorial/shell.ipynb
@@ -4,7 +4,345 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Shell task design"
"# Shell-task design"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Command-line template\n",
"\n",
"Define a shell-task specification using a command template string. Input and output fields are both specified by placing the name of the field within enclosing `<` and `>`. Outputs are differentiated by the `out|` prefix."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pydra.design import shell\n",
"from pydra.engine.helpers import list_fields\n",
"\n",
"test_file = \"./in.txt\"\n",
"with open(test_file, \"w\") as f:\n",
" f.write(\"this is a test file\\n\")\n",
"\n",
"# Define the shell-command task specification\n",
"Cp = shell.define(\"cp <in_file> <out|out_file>\")\n",
"\n",
"# Parameterise the task spec\n",
"cp = Cp(in_file=test_file, out_file=\"./out.txt\")\n",
"\n",
"# Print the cmdline to be run to double check\n",
"print(cp.cmdline)\n",
"\n",
"# Run the shell-command task\n",
"cp()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the path to an output file is not provided when the task is parameterised, it defaults to the name of the field."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cp = Cp(in_file=test_file)\n",
"print(cp.cmdline)"
]
},
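{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following sketch illustrates the template syntax by splitting a template into its input and output fields and filling in default output paths. The `split_template` and `output_paths` helpers are hypothetical, written only for this illustration (they are not part of pydra), and handle only the `<name>` and `<out|name>` forms described above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"\n",
"# Hypothetical pattern (not part of pydra): matches `<name...>` or `<out|name...>`\n",
"FIELD_RE = re.compile(r\"<(out\\|)?([a-zA-Z_]\\w*)\")\n",
"\n",
"\n",
"def split_template(template: str) -> tuple[list[str], list[str]]:\n",
"    \"\"\"Return (input names, output names) found in a command template.\"\"\"\n",
"    inputs, outputs = [], []\n",
"    for out_prefix, name in FIELD_RE.findall(template):\n",
"        # Fields carrying the `out|` prefix are outputs, all others are inputs\n",
"        (outputs if out_prefix else inputs).append(name)\n",
"    return inputs, outputs\n",
"\n",
"\n",
"def output_paths(template: str, **provided: str) -> dict[str, str]:\n",
"    \"\"\"Map each output to its provided path, defaulting to the field name.\"\"\"\n",
"    _, outputs = split_template(template)\n",
"    return {name: provided.get(name, name) for name in outputs}\n",
"\n",
"\n",
"template = \"cp <in_file> <out|out_file>\"\n",
"print(split_template(template))\n",
"print(output_paths(template))\n",
"print(output_paths(template, out_file=\"./out.txt\"))"
]
},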
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By default, shell-command fields are assumed to be of type `fileformats.generic.FsObject`. More specific file formats or built-in Python types can be specified by appending the type to the field name after a `:`, e.g. `<in_image:image/png>`.\n",
"\n",
"File formats are specified by their MIME type or \"MIME-like\" strings (see the [FileFormats docs](https://arcanaframework.github.io/fileformats/mime.html) for details)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from fileformats.image import Png\n",
"\n",
"TrimPng = shell.define(\"trim-png <in_image:image/png> <out|out_image:image/png>\")\n",
"\n",
"trim_png = TrimPng(in_image=Png.mock())\n",
"\n",
"print(trim_png.cmdline)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Adding options\n",
"\n",
"Command-line flags can also be added to the shell template, in either single- or double-hyphen form. The field template immediately following a flag is associated with that flag.\n",
"\n",
"If there is no space between the flag and the field template, the field is assumed to be a boolean; otherwise it is assumed to be a string unless another type is specified.\n",
"\n",
"If a field is optional, its field template should end with a `?`. Tuple fields are specified by comma-separated types, e.g. `<tuple_arg:int,str>`.\n",
"\n",
"Varargs are specified by the type followed by an ellipsis, e.g. `<my_varargs:generic/file,...>`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"Cp = shell.define(\n",
" (\n",
" \"cp <in_fs_objects:fs-object,...> <out|out_dir:directory> \"\n",
" \"-R<recursive> \"\n",
" \"--text-arg <text_arg?> \"\n",
" \"--int-arg <int_arg:int?> \"\n",
" \"--tuple-arg <tuple_arg:int,str?> \"\n",
" ),\n",
" )"
]
},
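{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make these rules concrete, the sketch below interprets a single flag/field-template pair. The `describe` helper is hypothetical (it is not pydra's parser) and mirrors only the rules listed above; it reports types as plain strings rather than resolving them to Python or `fileformats` classes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"\n",
"# Hypothetical pattern (not pydra's parser): optional flag, optional space, then `<field>`\n",
"TOKEN_RE = re.compile(r\"(?P<flag>--?[\\w-]+)?(?P<space> ?)<(?P<field>[^>]+)>\")\n",
"\n",
"\n",
"def describe(token: str) -> dict:\n",
"    \"\"\"Describe one flag/field-template pair, e.g. '--int-arg <int_arg:int?>'.\"\"\"\n",
"    m = TOKEN_RE.fullmatch(token)\n",
"    if m is None:\n",
"        raise ValueError(f\"not a field template: {token!r}\")\n",
"    field = m[\"field\"]\n",
"    optional = field.endswith(\"?\")  # trailing `?` marks an optional field\n",
"    name, _, type_str = field.rstrip(\"?\").partition(\":\")\n",
"    if m[\"flag\"] and not m[\"space\"]:\n",
"        kind = \"bool\"  # no space between the flag and the field template\n",
"    elif type_str.endswith(\",...\"):\n",
"        kind = f\"varargs of {type_str[:-4]}\"\n",
"    elif \",\" in type_str:\n",
"        kind = f\"tuple of ({type_str})\"\n",
"    else:\n",
"        # Flagged fields default to str, bare fields to a generic file-system object\n",
"        kind = type_str or (\"str\" if m[\"flag\"] else \"fs-object\")\n",
"    return {\"name\": name, \"flag\": m[\"flag\"], \"type\": kind, \"optional\": optional}\n",
"\n",
"\n",
"for token in [\n",
"    \"-R<recursive>\",\n",
"    \"--text-arg <text_arg?>\",\n",
"    \"--int-arg <int_arg:int?>\",\n",
"    \"--tuple-arg <tuple_arg:int,str?>\",\n",
"    \"<in_fs_objects:fs-object,...>\",\n",
"]:\n",
"    print(describe(token))"
]
},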
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Specifying defaults\n",
"\n",
"Defaults can be specified by appending them to the field template after an `=`, e.g. `<int_arg:int=99>`."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"'--int-arg' default: 99\n"
]
}
],
"source": [
"Cp = shell.define(\n",
" (\n",
" \"cp <in_fs_objects:fs-object,...> <out|out_dir:directory> \"\n",
" \"-R<recursive=True> \"\n",
" \"--text-arg <text_arg='foo'> \"\n",
" \"--int-arg <int_arg:int=99> \"\n",
" \"--tuple-arg <tuple_arg:int,str=(1,'bar')> \"\n",
" ),\n",
" )\n",
"\n",
"fields = {f.name: f for f in list_fields(Cp)}\n",
"print(f\"'--int-arg' default: {fields['int_arg'].default}\")"
]
},
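{
"cell_type": "markdown",
"metadata": {},
"source": [
"The defaults above are written as Python literals. As a sketch of how such a field template could be split at `=` and its default evaluated safely, the hypothetical `parse_default` helper below uses `ast.literal_eval` from the standard library; it is an illustration only, not pydra's own parsing code."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import ast\n",
"\n",
"\n",
"def parse_default(field: str) -> tuple[str, bool, object]:\n",
"    \"\"\"Split '<spec=default>' into (spec, has_default, default) (hypothetical helper).\"\"\"\n",
"    spec, sep, default_str = field.strip(\"<>\").partition(\"=\")\n",
"    if not sep:\n",
"        return spec, False, None\n",
"    # literal_eval only accepts literals (numbers, strings, tuples, True/False/None, ...)\n",
"    return spec, True, ast.literal_eval(default_str)\n",
"\n",
"\n",
"for field in [\n",
"    \"<recursive=True>\",\n",
"    \"<text_arg='foo'>\",\n",
"    \"<int_arg:int=99>\",\n",
"    \"<tuple_arg:int,str=(1,'bar')>\",\n",
"    \"<in_fs_objects:fs-object,...>\",\n",
"]:\n",
"    print(parse_default(field))"
]
},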
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Specifying other field attributes\n",
"\n",
"Additional attributes of the fields in the template can be specified by passing `shell.arg` or `shell.outarg` objects, keyed by field name, to the `inputs` and `outputs` keyword arguments of `shell.define`."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"Cp = shell.define(\n",
" (\n",
" \"cp <in_fs_objects:fs-object,...> <out|out_dir:directory> <out|out_file:file?> \"\n",
" \"-R<recursive> \"\n",
" \"--text-arg <text_arg> \"\n",
" \"--int-arg <int_arg:int?> \"\n",
" \"--tuple-arg <tuple_arg:int,str> \"\n",
" ),\n",
" inputs={\"recursive\": shell.arg(\n",
" help_string=(\n",
" \"If source_file designates a directory, cp copies the directory and \"\n",
" \"the entire subtree connected at that point.\"\n",
" )\n",
" )},\n",
" outputs={\n",
" \"out_dir\": shell.outarg(position=-2),\n",
" \"out_file\": shell.outarg(position=-1),\n",
" },\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Callable outputs\n",
"\n",
"In addition to outputs that are passed to the tool on the command line, outputs can be derived from the tool's results by providing a Python function that takes the output directory and inputs as arguments and returns the output value."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from pathlib import Path\n",
"from fileformats.generic import File\n",
"\n",
"\n",
"def get_file_size(out_file: Path) -> int:\n",
" result = os.stat(out_file)\n",
" return result.st_size\n",
"\n",
"\n",
"ACommand = shell.define(\n",
" name=\"a-command <in_file:file> <out|out_file:file>\",\n",
" outputs=[\n",
" shell.out(\n",
" name=\"out_file_size\",\n",
" type=int,\n",
" help_string=\"size of the output file\",\n",
" callable=get_file_size,\n",
" )\n",
" ],\n",
")"
]
},
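{
"cell_type": "markdown",
"metadata": {},
"source": [
"The sketch below illustrates one plausible way a callable output could be evaluated once the tool has finished: by matching the function's parameter names against the values available at that point (here the output directory and the `out_file` output). The `resolve_callable` helper is hypothetical and is not necessarily how pydra implements this; `file_size` is a stand-in equivalent to `get_file_size`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import inspect\n",
"import tempfile\n",
"from pathlib import Path\n",
"\n",
"\n",
"def file_size(out_file: Path) -> int:\n",
"    # Stand-in equivalent to `get_file_size` above\n",
"    return out_file.stat().st_size\n",
"\n",
"\n",
"def resolve_callable(func, available: dict):\n",
"    \"\"\"Call `func`, passing each parameter by name from `available` (hypothetical).\"\"\"\n",
"    params = inspect.signature(func).parameters\n",
"    missing = [p for p in params if p not in available]\n",
"    if missing:\n",
"        raise KeyError(f\"no value available for parameter(s) {missing}\")\n",
"    return func(**{p: available[p] for p in params})\n",
"\n",
"\n",
"with tempfile.TemporaryDirectory() as tmp:\n",
"    output_dir = Path(tmp)\n",
"    out_file = output_dir / \"out_file\"\n",
"    out_file.write_text(\"hello\")  # stand-in for a file written by the tool\n",
"    available = {\"output_dir\": output_dir, \"out_file\": out_file}\n",
"    print(resolve_callable(file_size, available))"
]
},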
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dataclass form\n",
"\n",
"As with Python tasks, shell tasks can also be specified in dataclass form by using `shell.define` as a class decorator."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from fileformats.generic import FsObject, Directory\n",
"from pydra.utils.typing import MultiInputObj\n",
"\n",
"@shell.define\n",
"class Cp:\n",
"\n",
" executable = \"cp\"\n",
"\n",
" in_fs_objects: MultiInputObj[FsObject]\n",
" recursive: bool = False\n",
" text_arg: str\n",
" int_arg: int | None = None\n",
" tuple_arg: tuple[int, str] | None = None\n",
"\n",
" class Outputs:\n",
" out_dir: Directory "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, the task can be written in its canonical form, which is preferred when developing tool packages because it is type-checkable."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"@shell.define\n",
"class Cp(shell.Spec[\"Cp.Outputs\"]):\n",
"\n",
" executable = \"cp\"\n",
"\n",
" in_fs_objects: MultiInputObj[FsObject] = shell.arg()\n",
" recursive: bool = shell.arg(default=False)\n",
" text_arg: str = shell.arg()\n",
" int_arg: int | None = shell.arg(default=None)\n",
" tuple_arg: tuple[int, str] | None = shell.arg(default=None)\n",
"\n",
" @shell.outputs\n",
" class Outputs(shell.Outputs):\n",
" out_dir: Directory = shell.outarg(path_template=\"{out_dir}\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dynamic form\n",
"\n",
"In some cases the specification for a task needs to be generated dynamically. This can be done by providing only the executable to `shell.define` and specifying all inputs and outputs explicitly."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ACommand = shell.define(\n",
" name=\"a-command\",\n",
" inputs={\n",
" \"in_file\": shell.arg(type=File, help_string=\"input file\", argstr=\"\", position=-2)\n",
" },\n",
" outputs={\n",
" \"out_file\": shell.outarg(\n",
" type=File, help_string=\"output file\", argstr=\"\", position=-1\n",
" ),\n",
" \"out_file_size\": {\n",
" \"type\": int,\n",
" \"help_string\": \"size of the output file\",\n",
" \"callable\": get_file_size,\n",
" }\n",
" },\n",
" )"
]
},
{
@@ -14,8 +352,22 @@
}
],
"metadata": {
"kernelspec": {
"display_name": "wf12",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python"
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.5"
}
},
"nbformat": 4,