Releases: allegroai/clearml-agent
PyPI v1.0.0 - ClearML-Agent
Features
- Add conda and pip environment debug prints (using `--debug`)
- Add support for PyJWT v2
- Change the default conda channel order so that it pulls the correct `pytorch` package
- Improve k8s glue support
  - Support k8s glue container env vars merging
  - Add a limit on the number of pods to k8s glue using the `max_pods_limit` argument (use the `--max-pods` switch in the k8s glue example)
  - Add k8s glue default `restartPolicy=Never` to the template to prevent pods from restarting
- Add `--stop` switch support for dynamic GPUs
- Verify the `docker` command exists when running in docker mode
- Add support for terminating dockers on `sig_term` in dynamic mode
- Add a stopping message on Task process termination
- Add `agent.docker_install_opencv_libs` configuration option to enable automatic OpenCV libs install for faster docker spin-up (default: `true`, see here)
- Add support for the new container base setup script feature
- Bump virtualenv dependency version (support `v>=16,<21`)
- Add support for dynamic GPUs opportunistic scheduling (with min/max GPUs per queue)
- Deprecate `venv_update` in configuration (replaced by the more robust `venvs_cache`)
- Add Python 3.9 to the support table
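The k8s glue container env vars merging item can be pictured as a name-keyed merge of two pod-spec `env` lists. The following is a minimal sketch under assumptions, not the k8s glue code: `merge_container_env` is a hypothetical helper, and it assumes entries from the extra list override template entries that share the same `name`.

```python
from typing import Dict, List, Optional

# A k8s container "env" list: [{"name": "VAR", "value": "..."}, ...]
EnvList = List[Dict[str, str]]


def merge_container_env(template_env: Optional[EnvList],
                        extra_env: Optional[EnvList]) -> EnvList:
    """Hypothetical helper: merge two k8s env lists by variable name.

    Entries in extra_env replace template entries with the same name;
    new names are appended. A missing or empty template env is treated
    as an empty list, so it never blocks an override.
    """
    merged: Dict[str, Dict[str, str]] = {}
    for entry in (template_env or []) + (extra_env or []):
        # Later entries win; each name keeps the position of its first appearance
        merged[entry["name"]] = dict(entry)
    return list(merged.values())
```

Keying by `name` means a variable never appears twice in the resulting pod spec.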
Bug Fixes
- Fix agent returning a non-zero error code, which caused pods to restart forever #56
- Fix poetry support #57
- Fix CUDA version detected from the driver missing the minor version
- Fix requirements local path replace back when using cache
- Fix k8s glue
- Fix broken k8s glue docker args parsing
- Fix empty env preventing override when merging the template
- Fix venv cache crash on bad symbolic links
- Fix failure when no docker arguments are provided
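For context on the CUDA minor-version fix: NVML reports the driver's CUDA version as a single integer, `1000 * major + 10 * minor` (for example `11020` for CUDA 11.2), which `nvml.h` decodes with its `NVML_CUDA_DRIVER_VERSION_MAJOR`/`_MINOR` macros. A minimal sketch of that decoding follows; `cuda_version_from_driver` is a hypothetical name, and in practice the integer would come from pynvml's `nvmlSystemGetCudaDriverVersion()`.

```python
def cuda_version_from_driver(encoded: int) -> str:
    """Hypothetical helper: decode an NVML CUDA driver version into "major.minor".

    Same arithmetic as the NVML_CUDA_DRIVER_VERSION_MAJOR/_MINOR macros.
    """
    major = encoded // 1000
    minor = (encoded % 1000) // 10  # keep the minor part instead of dropping it
    return f"{major}.{minor}"


if __name__ == "__main__":
    print(cuda_version_from_driver(11020))  # 11.2
```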
PyPI v0.17.2 - ClearML-Agent
Features
- Add virtual environment caching
  - Supports venv caching both in standard and docker mode
  - Configurable using the `agent.venvs_cache` configuration section
  - Disabled by default, enable here
- Add support for `--services-mode` with venvs
- Add `agent.force_git_ssh_user` configuration value (default `git`, see here) #42
- Add `agent.ignore_requested_python_version` configuration option for multi python environments (default `false`)
- Add `agent.enable_task_env` configuration option to set the OS environment based on the Environment section of the Task (default `false`, see here)
- K8s glue
  - Add support for detecting and deleting k8s pods that fail to start
  - Allow providing a namespace in k8s glue and the k8s glue example
  - Add `base-pod-number` parameter to k8s glue and example
- Change `agent.default_docker.image` to `nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04` (see here)
- Use shared git cache for multiple agents on the same machine
- Upgrade pynvml and detect CUDA version from driver level
- Update agent and services docker files
- Update documentation
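`agent.force_git_ssh_user` sets the user name used when a repository link is converted to `ssh` (default `git`). As a rough sketch of that kind of conversion, assuming a plain `https://host/path` link and an optional non-standard port (the role of `agent.force_git_ssh_port`, added in v0.16.1), the hypothetical helper `https_to_ssh` builds an `ssh://` URL; it is not the agent's actual rewriting code.

```python
from typing import Optional
from urllib.parse import urlsplit


def https_to_ssh(url: str, ssh_user: str = "git", ssh_port: Optional[int] = None) -> str:
    """Hypothetical helper: rewrite an http(s) git URL as an ssh:// URL.

    ssh_user stands in for agent.force_git_ssh_user (default "git") and
    ssh_port for a non-standard ssh port; other URLs are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return url  # already ssh (or scp-like/local): leave it alone
    netloc = f"{ssh_user}@{parts.hostname}"  # any https credentials are dropped
    if ssh_port:
        netloc += f":{ssh_port}"
    return f"ssh://{netloc}{parts.path}"
```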
Bug Fixes
- Fix `docker --network` returning `None`
- Fix docker mode without venvs cache dir
- Fix applying git diff on a newly added file
- Fix environment variables `CLEARML_WEB_HOST`/`CLEARML_FILES_HOST` not passed to running tasks (or updated on the config object)
- Fix `--detached` command line option not supported on Windows (ignore and issue a warning)
- Fix file not found error (errno 2) interpreted as aborted (i.e. `Ctrl-C`)
- Fix `from clearml` runtime diff patching
- Fix cache to take CUDA version into account
- Fix CPU mode
- Fix multi instances on Windows
- Fix conda support for `git+http` links
- Fix k8s glue not passing docker environment variables; remove deprecated flags
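The fix that makes the venv cache take the CUDA version into account amounts to including the CUDA version in whatever identifies a cached environment. A minimal sketch, assuming a cache keyed by a hash of the Python version, the CUDA version and the requirement lines; `venv_cache_key` is hypothetical, and the agent's real key may use different inputs.

```python
import hashlib
from typing import Iterable, Optional


def venv_cache_key(requirements: Iterable[str], python_version: str,
                   cuda_version: Optional[str]) -> str:
    """Hypothetical cache key for a cached virtual environment.

    The CUDA version is hashed together with the Python version and the
    sorted, non-blank requirement lines, so two environments that differ
    only in CUDA version never share a cache entry.
    """
    digest = hashlib.sha256()
    for part in (python_version, cuda_version or "cpu"):
        digest.update(part.encode("utf-8") + b"\0")
    for line in sorted(r.strip() for r in requirements if r.strip()):
        digest.update(line.encode("utf-8") + b"\n")
    return digest.hexdigest()
```

Sorting the lines makes the key independent of requirement order, while any change in Python or CUDA version produces a new key.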
PyPI v0.17.1 - ClearML-Agent
ClearML-Agent (formerly allegro trains-agent)
Features and Bug Fixes
- Fix support for pip virtual-environment on Windows
- Fix support for conda using repository requirements.txt (empty "Installed Packages" section)
PyPI v0.17.0 - ClearML-Agent
ClearML-Agent (formerly allegro trains-agent)
Breaking Changes
Package renamed to clearml-agent
pip install clearml-agent
clearml-agent daemon --docker ...
PyPI v0.16.3
Features and Bug Fixes
- Update PyJWT requirement (v2.0.0 breaks interface)
- Update other requirements constraints
- Change k8s pod naming scheme in k8s glue to include the queue name, and conform the queue name to the k8s naming standard
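Kubernetes object names such as pod names must follow its DNS naming rules; the strictest common form, the RFC 1123 label, allows only lowercase alphanumerics and `-`, must start and end with an alphanumeric, and is at most 63 characters. A minimal sketch of conforming a queue name into a pod name, assuming a hypothetical `<prefix>-<queue>-<task id>` scheme that is not necessarily the k8s glue's actual format:

```python
import re

# RFC 1123 label: lowercase alphanumerics and '-', alphanumeric at both ends, <= 63 chars
_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def conform_queue_name(queue_name: str) -> str:
    """Lowercase the queue name and replace every run of illegal characters with '-'."""
    name = re.sub(r"[^a-z0-9-]+", "-", queue_name.lower())
    return re.sub(r"-{2,}", "-", name).strip("-")


def pod_name(prefix: str, queue_name: str, task_id: str, max_len: int = 63) -> str:
    """Hypothetical '<prefix>-<queue>-<task id>' pod name.

    The task id suffix is always kept whole (it makes the name unique);
    the prefix/queue part is truncated if the result would be too long.
    """
    suffix = "-" + task_id.lower()
    head = f"{prefix}-{conform_queue_name(queue_name)}"[: max_len - len(suffix)].rstrip("-")
    return head + suffix


def is_valid_label(name: str) -> bool:
    return len(name) <= 63 and _LABEL.fullmatch(name) is not None
```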
PyPI v0.16.2
Features
- conda
  - Add `agent.package_manager.conda_env_as_base_docker`, allowing "docker_cmd" to contain a link to a full pre-packaged conda environment (`tar.gz` created by `conda-pack`). Use the `TRAINS_CONDA_ENV_PACKAGE` environment variable to specify the conda `tar.gz` file.
  - Add conda support for a read-only pre-built environment (pass the conda folder as `docker_cmd` on the Task)
  - Improve locating the conda executable
- k8s glue
  - Add support for limited number of services exposing ports
  - Add support for k8s pod custom user properties
  - Allow selecting an external `trains.conf` file for the pod itself
  - Allow providing pod template, extra bash init script, alternate SSH server port, gateway address (k8s ingress/ELB)
- Allow specifying `cudatoolkit` version in the "installed packages" section when using conda as package manager allegroai/clearml#229
- Add `agent.package_manager.force_repo_requirements_txt`. If True, "Installed Packages" on Task are ignored, and only the repository `requirements.txt` is used
- Pass `TRAINS_DOCKER_IMAGE` into docker for interactive sessions
- Add `torchcsprng` and `torchtext` to PyTorch resolving
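The `force_repo_requirements_txt` switch decides which list of packages gets installed. A minimal sketch of that decision, assuming the Task's "Installed Packages" section is available as a dict with a `"pip"` text entry; `pick_requirements` is hypothetical, and it also falls back to the repository file when the `pip` entry is null, the behavior described in this release's bug fixes.

```python
from typing import Dict, List, Optional


def pick_requirements(installed_packages: Optional[Dict[str, Optional[str]]],
                      repo_requirements_txt: Optional[str],
                      force_repo_requirements_txt: bool = False) -> List[str]:
    """Hypothetical sketch: choose which requirement lines to install.

    installed_packages stands in for the Task's "Installed Packages"
    section, e.g. {"pip": "<requirements text>"}.
    """
    pip_section = (installed_packages or {}).get("pip")
    if force_repo_requirements_txt or not pip_section:
        # Forced, or nothing recorded on the Task (missing, empty or null "pip")
        text = repo_requirements_txt or ""
    else:
        text = pip_section
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]
```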
Bug Fixes
- When logging, suppress "\r" when reading a current chunk of a file/stream. Add `agent.suppress_carriage_return` (default True) to support previous behavior
- Make sure `TRAINS_AGENT_K8S_HOST_MOUNT` is used only once per mount
- Fix k8s glue script to trains-agent default docker script
- Fix apply git diff from submodule only
- conda
  - Fix conda pip freeze to be consistent with trains 0.16.3
  - Fix conda environment support for trains 0.16.3 full env. Add `agent.package_manager.conda_full_env_update` to allow conda to update back the requirements (default False, to preserve previous behavior)
  - Fix running from conda environment: `conda.sh` not found in first conda PATH match
- Fix docker mode ubuntu/debian support by making sure not to ask for input (fix `tzdata` install)
- Fix repository detection: ignore environment `SSH_AUTH_SOCK`, only check if git user/pass are configured
- git diff
  - Fix support for non-ascii diff
  - Fix diff with an empty line at the end causing a corrupt diff apply message
  - Allow zero context diffs (useful when blind patching repository)
- Fix `daemon --stop` when agent UID cannot be located
- Fix nvidia docker support on some linux distros (SUSE)
- Fix nvidia pytorch dockers support
- Fix torch CUDA 11.1 support
- Fix requirements dict with a null `pip` entry: it should be treated as None, installing from the repository's `requirements.txt`
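One way to read the carriage-return suppression item: progress bars redraw a terminal line by writing `\r`, so a captured log chunk can hold many redraws of the same line. The sketch keeps only the final redraw of each line; it is a simplification that assumes every redraw overwrites the whole line, and `suppress_carriage_returns` is a hypothetical name rather than the agent's function.

```python
def suppress_carriage_returns(chunk: str) -> str:
    """Hypothetical helper: keep only the last redraw of each line in a log chunk.

    Simplification: assumes each '\\r' redraw overwrites the whole line.
    """
    cleaned = []
    for line in chunk.split("\n"):
        line = line.rstrip("\r")                  # drop the '\r' of '\r\n' or a trailing redraw marker
        cleaned.append(line.rsplit("\r", 1)[-1])  # keep the text after the last '\r'
    return "\n".join(cleaned)
```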
PyPI v0.16.1
Features
- Add `sdk.metrics.plot_max_num_digits` configuration option to reduce plot storage size
- Add `agent.package_manager.post_packages` and `agent.package_manager.post_optional_packages` configuration options to control packages install order (e.g. horovod)
- Add `agent.git_host` configuration option for limiting git credential usage for a specific host (overridable using the `TRAINS_AGENT_GIT_HOST` environment variable)
- Add `agent.force_git_ssh_port` configuration option to control `https` to `ssh` link conversion for non-standard `ssh` ports
- Add requirements detection features
  - Improve support for detecting new pip version (20+) supporting `package @ scheme://link`
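pip 20 and later can write direct references as `name @ url` (the PEP 508 form), for example `mypkg @ git+https://github.com/org/mypkg.git`. A minimal regex-based sketch of telling such lines apart from `name==version` lines; `parse_requirement_line` is hypothetical, and a robust parser would rather use `packaging.requirements.Requirement`, which exposes the direct-reference URL as `.url`.

```python
import re
from typing import NamedTuple, Optional

# "<name>[extras] @ <url>", the PEP 508 direct-reference form
_DIRECT_REF = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(\[[^\]]*\])?\s*@\s*(\S+)\s*$"
)
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


class ParsedRequirement(NamedTuple):
    name: str
    url: Optional[str]  # None for ordinary "name==version" lines


def parse_requirement_line(line: str) -> ParsedRequirement:
    """Hypothetical helper: split a requirement line into name and optional URL.

    Bare URL lines (e.g. "git+https://...") are not handled by this sketch.
    """
    match = _DIRECT_REF.match(line)
    if match:
        return ParsedRequirement(match.group(1), match.group(3))
    name_match = _NAME.match(line)
    return ParsedRequirement(name_match.group(1) if name_match else "", None)
```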
Bug Fixes
- Fix pre-installed packages being ignored when installing a git package wheel. Reinstalling a `git+http` link is enough to make sure all requirements are met/installed allegroai/clearml#196
- Fix incorrect check for spaces in current execution folder
- Fix requirements detection
- Update torch version after using downloaded / system pre-installed version
- Do not install git packages twice when a new pip version is used (pip freeze will detect the correct git link version)
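The `post_packages` option added in this release changes install order so that packages such as horovod are installed after everything else. A minimal sketch of such ordering, assuming requirement lines as plain strings and matching on the package name only; `split_post_packages` is hypothetical, and the sketch does not model `post_optional_packages`.

```python
import re
from typing import List, Sequence, Tuple

_NAME = re.compile(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def split_post_packages(requirements: Sequence[str],
                        post_packages: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: split requirements into (install first, install last).

    A line goes to the second group when its package name (case-insensitive)
    is listed in post_packages, e.g. "horovod".
    """
    post = {name.lower() for name in post_packages}
    first: List[str] = []
    last: List[str] = []
    for req in requirements:
        match = _NAME.match(req)
        name = match.group(1).lower() if match else ""
        (last if name in post else first).append(req)
    return first, last
```

The second group would then be installed in a separate pip run after the first completes, which matters for packages like horovod that typically build against frameworks that must already be installed.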
PyPI v0.16.0
Features
- Add `agent.docker_init_bash_script` configuration section to allow finer control over the docker startup script
- Change the default docker image from `nvidia/cuda` to `nvidia/cuda:10.1-runtime-ubuntu18.04` to support `cudnn` frameworks (e.g. TF)
- Improve support for dockers with a preinstalled `conda` environment
- Improve trains-agent-docker spinning
- Add `daemon --order-fairness` for round-robin queue pulling
- Add `daemon --stop` to terminate a running agent (assuming other arguments are the same)
  - If no additional arguments are given, agents are terminated in lexicographical order
- Support cleanup of all log files on termination unless executed with `--debug`
- Add an error message when the Trains API Server is not accessible on startup
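`--order-fairness` switches queue pulling to round-robin. A minimal sketch of the difference, assuming the default is a strict priority scan of the queues in the order given; `QueuePoller` and its `pop` callback are hypothetical stand-ins for the agent's server calls.

```python
from typing import Callable, List, Optional


class QueuePoller:
    """Hypothetical sketch of the order in which a worker polls its queues.

    order_fairness=False: every poll scans the queues in priority order.
    order_fairness=True: after a task is pulled, the next scan starts at the
    following queue, so queues take turns (round-robin).
    """

    def __init__(self, queues: List[str], order_fairness: bool = False) -> None:
        self.queues = list(queues)
        self.order_fairness = order_fairness
        self._start = 0  # index of the queue the next scan begins with

    def next_task(self, pop: Callable[[str], Optional[str]]) -> Optional[str]:
        """Return the next task id, using pop(queue) to dequeue (None if empty)."""
        count = len(self.queues)
        for offset in range(count):
            index = (self._start + offset) % count
            task = pop(self.queues[index])
            if task is not None:
                if self.order_fairness:
                    self._start = (index + 1) % count
                return task
        return None
```

With two busy queues, the default drains `high` before touching `low`, while the fair poller alternates between them.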
Bug Fixes
- Fix GPU Windows monitoring support allegroai/clearml#177
- Fix `.git-credentials` and `.gitconfig` mapping into docker
- Fix non-root docker image usage
- Fix docker to use `UTF-8` encoding, so prints won't break it
- Fix `--debug` to set all loggers to `DEBUG`
- Fix task status changing to `queued` during Task runtime (this should never happen)
- Fix `requirement_parser` to support `package @ git+http` lines
- Fix GIT user/password in requirements and support for `-e git+http` lines
- Fix configuration wizard to generate a `trains.conf` matching the latest Trains definitions
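Supporting git user/password for `git+http` and `-e git+http` requirement lines essentially means embedding URL-encoded credentials (for example the values of `TRAINS_AGENT_GIT_USER`/`TRAINS_AGENT_GIT_PASS`) into the link before pip sees it. A minimal sketch; `add_git_credentials` is hypothetical and is not the agent's implementation.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def add_git_credentials(line: str, user: str, password: str) -> str:
    """Hypothetical helper: embed user/password into a git+http(s) requirement.

    Handles "git+https://host/repo.git@rev#egg=name" and "-e git+https://..."
    lines; URLs that already carry credentials, or non-http schemes, are
    returned unchanged.
    """
    prefix, url = "", line
    if url.startswith("-e "):
        prefix, url = "-e ", url[3:].lstrip()
    parts = urlsplit(url)
    if parts.scheme not in ("git+http", "git+https", "http", "https") or "@" in parts.netloc:
        return prefix + url
    # Percent-encode so characters like '@' or ':' in a password cannot break the URL
    creds = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return prefix + urlunsplit(parts._replace(netloc=f"{creds}@{parts.netloc}"))
```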
PyPI v0.15.1
Features
- Add Trains Agent Daemon and Services docker files
Bug Fixes
- Fix initialization wizard (allow at most two verification retries, then print error)
- Add warning on `--gpus` with no detected CUDA version #24
- Add `agent.force_git_ssh_protocol` configuration option to force all git links to `ssh://` #16
- Add git user/pass permission into pip package installation from Git repository #22
PyPI v0.15.0
Features
- Add daemon Services Mode (`daemon --services-mode`), where the daemon spins a task in its own docker and verifies start-up and shut-down. This allows multiple tasks to be launched simultaneously on the same machine (currently in CPU mode only), where each task service registers itself as a worker for the lifetime of the task
- Enhance `build --docker` mode
  - Add `--install-globally` option to install required packages in the docker's system python
  - Add `--entry-point` option to allow automatic task cloning when running the docker
- Support PyTorch Nightly builds using the `agent.torch_nightly` configuration flag. If `true`, the agent looks for a nightly build when a stable torch wheel is not found
- Add environment variables support for git user/password
  - Using `TRAINS_AGENT_GIT_USER`/`TRAINS_AGENT_GIT_PASS`
  - Pass git credentials to dockerized experiment execution
- Support running code from module (i.e. `-m` in execution entry point)
- Add daemon `--create-queue` to automatically create a queue and use it if the queue name doesn't exist in the server
- Move `--gpus` and `--cpu-only` to worker args (used by daemon, execute and build)
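Supporting `-m` in the execution entry point means building a `python -m <module>` command line instead of `python <script>`. A minimal sketch, assuming a hypothetical entry point string format where a leading `-m ` marks a module; `build_command` is not the agent's API.

```python
import sys
from typing import List, Sequence


def build_command(entry_point: str, args: Sequence[str] = (),
                  python: str = sys.executable) -> List[str]:
    """Hypothetical helper: turn an entry point string into an argv list.

    Assumes "-m package.module" means "run as a module"; anything else is
    treated as a script path.
    """
    entry = entry_point.strip()
    if entry.startswith("-m "):
        return [python, "-m", entry[3:].strip(), *args]
    return [python, entry, *args]
```

The resulting list can be passed straight to `subprocess.run`, which avoids shell quoting issues with task arguments.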
Bug Fixes
- Fix init wizard to correctly display the input servers #19
- Fix version control links in requirements when using `conda`
- Fix `build --docker` mode standalone docker execution
- Improve docker host-mount support, use `TRAINS_AGENT_DOCKER_HOST_MOUNT` environment variable
- Support `pip` v20.1 local/http package reference in `pip freeze`
- Fix detached mode to correctly use cache folder slots
- Fix `CUDA_VISIBLE_DEVICES` being set to "all", which should never happen (Trains Slack channel thread)
- Do not monitor GPU when running with `--cpu-only`
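The `CUDA_VISIBLE_DEVICES` fix follows from what CUDA accepts there: a list of device indices or UUIDs, whereas `all` is a value understood by nvidia-docker's `NVIDIA_VISIBLE_DEVICES`. A minimal sketch of deriving the variable from `--gpus`/`--cpu-only`, assuming that "all GPUs" is best expressed by leaving the variable unset and "no GPUs" by setting it to an empty string; `gpu_env` is hypothetical and not the agent's code.

```python
import os
from typing import Dict, Optional


def gpu_env(gpus: Optional[str], cpu_only: bool,
            base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Hypothetical helper: derive a task environment from --gpus / --cpu-only.

    - cpu_only: an empty CUDA_VISIBLE_DEVICES hides every GPU from CUDA.
    - gpus "all" (or not given): the variable is removed rather than set
      to the literal string "all".
    - otherwise: the explicit device list is passed through, e.g. "0,1".
    """
    env = dict(os.environ if base is None else base)
    if cpu_only:
        env["CUDA_VISIBLE_DEVICES"] = ""
    elif gpus is None or gpus.strip().lower() == "all":
        env.pop("CUDA_VISIBLE_DEVICES", None)
    else:
        env["CUDA_VISIBLE_DEVICES"] = gpus.strip()
    return env
```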