Releases: allegroai/clearml-agent

PyPI v1.0.0 - ClearML-Agent

03 May 18:00

Features

  • Add conda and pip environment debug prints (using --debug)
  • Add support for PyJWT v2
  • Change the default conda channel order, so it pulls the correct pytorch package
  • Improve k8s glue support
    • Support k8s glue container env vars merging
    • Add number of pods limit to k8s glue using the max_pods_limit argument (use --max-pods switch in the k8s glue example)
    • Add k8s glue default restartPolicy=Never to template to prevent pods from restarting
  • Add --stop switch support for dynamic gpus
  • Verify docker command exists when running in docker mode
  • Add support for terminating dockers on sig_term in dynamic mode
  • Add stopping message on Task process termination
  • Add agent.docker_install_opencv_libs configuration option to enable automatic opencv libs install for faster docker spin-up (default: true)
  • Add support for new container base setup script feature
  • Bump virtualenv dependency version (support v>=16,<21)
  • Add support for dynamic gpus opportunistic scheduling (with min/max gpus per queue)
  • Deprecate venv_update in configuration (replaced by the more robust venvs_cache)
  • Add Python 3.9 to the support table
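
The k8s glue env var merging listed above can be pictured with a small sketch. This is not the agent's actual code: the function name `merge_env`, the list-of-dicts shape (borrowed from the Kubernetes container `env` field), and the rule that task-level values win over template values are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

EnvVar = Dict[str, str]  # e.g. {"name": "FOO", "value": "1"}


def merge_env(template_env: Optional[List[EnvVar]], task_env: Optional[List[EnvVar]]) -> List[EnvVar]:
    """Merge two k8s-style env lists by variable name (hypothetical helper).

    Assumption: entries from task_env override template entries with the same name.
    An empty or missing list on either side is treated as "nothing to merge".
    """
    merged: Dict[str, EnvVar] = {}
    for entry in list(template_env or []) + list(task_env or []):
        merged[entry["name"]] = entry  # later entries replace earlier ones
    return list(merged.values())
```

Keying on the variable name keeps a single entry per variable, and an empty template list does not block the override, which is the situation the "empty env prevents override" fix below refers to.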

Bug Fixes

  • Fix agent can return non-zero error code and pods will end up restarting forever #56
  • Fix poetry support #57
  • Fix cuda version from driver does not return minor version
  • Fix requirements local path replace back when using cache
  • Fix k8s glue
    • Fix broken k8s glue docker args parsing
    • Fix empty env prevents override when merging template
  • Fix venv cache crash on bad symbolic links
  • Fix no docker arguments provided

PyPI v0.17.2 - ClearML-Agent

04 Mar 18:16

Features

  • Add virtual environment caching
    • Supports venv caching both in standard and docker mode
    • Configurable using the agent.venvs_cache configuration section
    • Disabled by default; enable it in the agent.venvs_cache configuration section
  • Add support for --services-mode with venvs
  • Add agent.force_git_ssh_user configuration value (default: git) #42
  • Add agent.ignore_requested_python_version configuration option for multi python environments (default false)
  • Add agent.enable_task_env configuration option to set the OS environment based on the Environment section of the Task (default: false)
  • K8s glue
    • Add support for detecting and deleting k8s pods that fail to start
    • Allow providing namespace in k8s glue and k8s glue example
    • Add base-pod-number parameter to k8s glue and example
  • Change agent.default_docker.image to nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
  • Use shared git cache for multiple agents on the same machine
  • Upgrade pynvml and add CUDA version detection at the driver level
  • Update agent and services docker files
  • Update documentation

Bug Fixes

  • Fix docker --network returns None
  • Fix docker mode without venvs cache dir
  • Fix applying git diff on a newly added file
  • Fix environment variables CLEARML_WEB_HOST/CLEARML_FILES_HOST not passed to running tasks (or updated on the config object)
  • Fix --detached command line option not supported on Windows (ignore and issue warning)
  • Fix file not found error (errno 2) interpreted as aborted (i.e. Ctrl-C)
  • Fix from clearml runtime diff patching
  • Fix cache to take cuda version into account
  • Fix CPU mode
  • Fix multi instances on Windows
  • Fix conda support for git+http links
  • Fix k8s glue does not pass docker environment variables, remove deprecated flags
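
To illustrate why a venv cache needs the CUDA version in its lookup key, here is a minimal sketch. It does not reproduce the agent's cache implementation: `venv_cache_key` and its inputs are hypothetical, and the choice of SHA-256 over a normalized requirements list is an assumption.

```python
import hashlib
from typing import Iterable, Optional


def venv_cache_key(requirements: Iterable[str], python_version: str, cuda_version: Optional[str]) -> str:
    """Hash the inputs that decide whether a cached venv can be reused (hypothetical helper)."""
    # Normalize so that ordering and letter case of requirement lines do not change the key
    normalized = sorted(line.strip().lower() for line in requirements if line.strip())
    # Assumption: a missing CUDA version means a CPU-only environment
    payload = "\n".join([python_version, cuda_version or "cpu", *normalized])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With the CUDA version in the payload, the same requirements resolved for CUDA 10.1 and for CUDA 11.0 map to different cache entries, so a wheel built for one is not reused for the other.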

PyPI v0.17.1 - ClearML-Agent

25 Dec 00:33

ClearML-Agent (formerly allegro trains-agent)

Features and Bug Fixes

  • Fix support for pip virtual-environment on Windows
  • Fix support for conda using repository requirements.txt (empty "Installed Packages" section)

PyPI v0.17.0 - ClearML-Agent

22 Dec 22:36

ClearML-Agent (formerly allegro trains-agent)

Breaking Changes
Package renamed to clearml-agent

pip install clearml-agent
clearml-agent daemon --docker ...

PyPI v0.16.3

22 Dec 18:34

Features and Bug Fixes

  • Update PyJWT requirement (v2.0.0 breaks interface)
  • Update other requirements constraints
  • Change k8s pod naming scheme in k8s glue to include queue name, conform queue name to k8s standard
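
Conforming a queue name to Kubernetes naming rules could look roughly like the sketch below. It is an illustration, not the glue's real code: `to_k8s_name` and the `"queue"` fallback are hypothetical, and the rules applied are the RFC 1123 DNS label rules (lowercase alphanumerics and `-`, alphanumeric at both ends, at most 63 characters).

```python
import re


def to_k8s_name(queue_name: str, max_len: int = 63) -> str:
    """Turn an arbitrary queue name into an RFC 1123 style label (hypothetical helper)."""
    # Replace every run of disallowed characters with a single dash
    name = re.sub(r"[^a-z0-9-]+", "-", queue_name.lower())
    # Truncate, then make sure the result starts and ends with an alphanumeric character
    name = name.strip("-")[:max_len].strip("-")
    # Assumption: fall back to a fixed placeholder when nothing usable is left
    return name or "queue"
```

For example, `to_k8s_name("My Queue_1")` gives `my-queue-1`, which can then be embedded in a pod name.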

PyPI v0.16.2

10 Dec 11:03

Features

  • conda
    • Add agent.package_manager.conda_env_as_base_docker allowing "docker_cmd" to contain link to a full pre-packaged conda environment (tar.gz created by conda-pack). Use TRAINS_CONDA_ENV_PACKAGE environment variable to specify conda tar.gz file.
    • Add conda support for read-only pre-built environment (pass conda folder as docker_cmd on Task)
    • Improve conda executable lookup
  • k8s glue
    • Add support for limited number of services exposing ports
    • Add support for k8s pod custom user properties
    • Allow selecting external trains.conf file for the pod itself
    • Allow providing pod template, extra bash init script, alternate SSH server port, gateway address (k8s ingress/ELB)
  • Allow specifying cudatoolkit version in the "installed packages" section when using conda as package manager allegroai/clearml#229
  • Add agent.package_manager.force_repo_requirements_txt. If True, "Installed Packages" on Task are ignored, and only repository requirements.txt is used
  • Pass TRAINS_DOCKER_IMAGE into docker for interactive sessions
  • Add torchcsprng and torchtext to PyTorch resolving

Bug Fixes

  • Suppress "\r" in logs when reading the current chunk of a file/stream. Add agent.suppress_carriage_return (default True) to support the previous behavior
  • Make sure TRAINS_AGENT_K8S_HOST_MOUNT is used only once per mount
  • Fix k8s glue script to trains-agent default docker script
  • Fix apply git diff from submodule only
  • conda
    • Fix conda pip freeze to be consistent with trains 0.16.3
    • Fix conda environment support for trains 0.16.3 full env. Add agent.package_manager.conda_full_env_update to allow conda to update back the requirements (default False, to preserve previous behavior)
    • Fix running from conda environment - conda.sh not found in first conda PATH match
  • Fix docker mode ubuntu/debian support by making sure not to ask for input (fix tzdata install)
  • Fix repository detection - ignore environment SSH_AUTH_SOCK, only check if git user/pass are configured
  • git diff
    • Fix support for non-ascii diff
    • Fix diff with empty line at the end will cause corrupt diff apply message
    • Allow zero context diffs (useful when blind patching repository)
  • Fix daemon --stop when agent UID cannot be located
  • Fix nvidia docker support on some linux distros (SUSE)
  • Fix nvidia pytorch dockers support
  • Fix torch CUDA 11.1 support
  • Fix requirements dict with a null pip entry: treat it as None and install from the repository's requirements.txt
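
The carriage-return suppression mentioned in the list above can be sketched as follows. This is one plausible interpretation, not the agent's implementation: the helper name is hypothetical, and the assumption is that for each line only the text after the last "\r" is kept, which is what a terminal would end up displaying for a progress bar.

```python
def suppress_carriage_return(chunk: str) -> str:
    """Keep only the final "\\r"-separated segment of every line in a log chunk (hypothetical helper)."""
    cleaned = []
    for line in chunk.split("\n"):
        # Drop a trailing "\r" (Windows line endings), then keep what follows the last "\r"
        cleaned.append(line.rstrip("\r").rsplit("\r", 1)[-1])
    return "\n".join(cleaned)
```

A chunk such as `"10%\r50%\r100%\ndone\n"` is reduced to `"100%\ndone\n"`, so progress-bar updates do not flood the stored log.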

PyPI v0.16.1

05 Oct 15:47

Features

  • Add sdk.metrics.plot_max_num_digits configuration option to reduce plot storage size
  • Add agent.package_manager.post_packages and agent.package_manager.post_optional_packages configuration options to control packages install order (e.g. horovod)
  • Add agent.git_host configuration option for limiting git credential usage for a specific host (overridable using TRAINS_AGENT_GIT_HOST environment variable)
  • Add agent.force_git_ssh_port configuration option to control https to ssh link conversion for non standard ssh ports
  • Add requirements detection features
    • Improve support for detecting new pip version (20+) supporting package @ scheme://link
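
As a rough picture of the https to ssh link conversion that agent.force_git_ssh_port controls, consider the sketch below. It is not the agent's code: `https_to_ssh`, its parameters, and the exact output format are assumptions chosen for illustration.

```python
from typing import Optional
from urllib.parse import urlsplit


def https_to_ssh(url: str, user: str = "git", port: Optional[int] = None) -> str:
    """Rewrite an http(s) git link as an ssh:// link (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return url  # leave ssh and scp-style links untouched
    # A non-standard ssh port has to be spelled out in the ssh:// form
    host = parts.hostname if port is None else f"{parts.hostname}:{port}"
    return f"ssh://{user}@{host}{parts.path}"
```

Calling `https_to_ssh("https://github.com/allegroai/clearml-agent.git", port=2222)` yields `ssh://git@github.com:2222/allegroai/clearml-agent.git`.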

Bug Fixes

  • Fix pre-installed packages are ignored when installing a git package wheel. Reinstalling a git+http link is enough to make sure all requirements are met/installed allegroai/clearml#196
  • Fix incorrect check for spaces in current execution folder
  • Fix requirements detection
    • Update torch version after using downloaded / system pre-installed version
    • Do not install git packages twice when a new pip version is used (pip freeze will detect the correct git link version)

PyPI v0.16.0

11 Aug 14:57

Features

  • Add agent.docker_init_bash_script configuration section to allow finer control over docker startup script
  • Change default docker image from nvidia/cuda to nvidia/cuda:10.1-runtime-ubuntu18.04 to support cudnn frameworks (e.g. TF)
  • Improve support for dockers with preinstalled conda environment
  • Improve trains-agent-docker spinning
  • Add daemon --order-fairness for round-robin queue pulling
  • Add daemon --stop to terminate a running agent (assuming other arguments are the same)
    • If no additional arguments are given, agents are terminated in lexicographical order
  • Support cleanup of all log files on termination unless executed with --debug
  • Add error message when Trains API Server is not accessible on startup
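
The idea behind round-robin queue pulling (daemon --order-fairness) can be sketched as below. This is an illustration only: `pull_round_robin` and the `fetch` callback are hypothetical, and the real daemon's polling logic may differ.

```python
from typing import Callable, List, Optional, Tuple


def pull_round_robin(
    queues: List[str],
    fetch: Callable[[str], Optional[str]],
    start: int = 0,
) -> Tuple[Optional[str], int]:
    """Try each queue once, beginning at `start`; return (task, next_start).

    `fetch` is a hypothetical callback returning a task id, or None if the queue is empty.
    """
    for offset in range(len(queues)):
        index = (start + offset) % len(queues)
        task = fetch(queues[index])
        if task is not None:
            # Next time, begin with the queue after the one that just served a task
            return task, (index + 1) % len(queues)
    return None, start
```

With two pending tasks in a "high" queue and one in a "low" queue, repeated calls serve high, low, high, whereas strict priority order would drain "high" before touching "low".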

Bug Fixes

  • Fix GPU Windows monitoring support allegroai/clearml#177
  • Fix .git-credentials and .gitconfig mapping into docker
  • Fix non-root docker image usage
  • Fix docker to use UTF-8 encoding, so prints won't break it
  • Fix --debug to set all loggers to DEBUG
  • Fix task status change to queued should never happen during Task runtime
  • Fix requirement_parser to support package @ git+http lines
  • Fix GIT user/password in requirements and support for -e git+http lines
  • Fix configuration wizard to generate trains.conf matching latest Trains definitions
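
The "package @ git+http" lines mentioned above follow the direct-reference form that newer pip versions emit in pip freeze. A minimal sketch of splitting such a line is shown here; `split_direct_reference` is a hypothetical helper, not the agent's requirement_parser, and it only handles the `name @ link` spelling with spaces around `@`.

```python
from typing import Optional, Tuple


def split_direct_reference(line: str) -> Tuple[str, Optional[str]]:
    """Split "name @ link" into (name, link); return (line, None) for ordinary requirements."""
    name, sep, link = line.partition(" @ ")
    if not sep:
        return line.strip(), None
    return name.strip(), link.strip()
```

Splitting on the first `" @ "` leaves any later `@` (such as a git tag suffix in the link) intact.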

PyPI v0.15.1

21 Jun 20:43

Features

  • Add Trains Agent Daemon and Services docker files

Bug Fixes

  • Fix initialization wizard (allow at most two verification retries, then print error)
  • Add warning on --gpus with no detected CUDA version #24
  • Add agent.force_git_ssh_protocol configuration option to force all git links to ssh:// #16
  • Add git user/pass permission into pip package installation from Git repository #22

PyPI v0.15.0

01 Jun 16:59

Features

  • Add daemon Services Mode (daemon --services-mode) where the daemon spins a task in its own docker and verifies start-up and shut-down. This allows multiple tasks to be launched simultaneously on the same machine (currently in CPU mode only), where each task service will register itself as a worker for the lifetime of the task
  • Enhance build --docker mode
    • Add --install-globally option to install required packages in the docker's system python
    • Add --entry-point option to allow automatic task cloning when running the docker
  • Support PyTorch Nightly builds using the agent.torch_nightly configuration flag. If true, the agent looks for a nightly build when a stable torch wheel is not found
  • Add environment variables support for git user/password
    • Using TRAINS_AGENT_GIT_USER/TRAINS_AGENT_GIT_PASS
    • Pass git credentials to dockerized experiment execution
  • Support running code from module (i.e. -m in execution entry point)
  • Add daemon --create-queue to automatically create a queue and use it if queue name doesn't exist in the server
  • Move --gpus and --cpu-only to worker args (used by daemon, execute and build)

Bug Fixes

  • Fix init wizard, correctly display the input servers #19
  • Fix version control links in requirements when using conda
  • Fix build --docker mode standalone docker execution
  • Improve docker host-mount support, use TRAINS_AGENT_DOCKER_HOST_MOUNT environment variable
  • Support pip v20.1 local/http package reference in pip freeze
  • Fix detached mode to correctly use cache folder slots
  • Fix CUDA_VISIBLE_DEVICES should never be set to "all" (Trains Slack channel thread)
  • Do not monitor GPU when running with --cpu-only
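
The CUDA_VISIBLE_DEVICES fix above reflects a general rule: the NVIDIA container runtime variable NVIDIA_VISIBLE_DEVICES accepts the value "all", while CUDA_VISIBLE_DEVICES expects explicit device ids. A small sketch of a mapping that respects this follows; `gpu_env` and the accepted input values are hypothetical and do not mirror the agent's argument handling.

```python
from typing import Dict


def gpu_env(gpus: str) -> Dict[str, str]:
    """Map a --gpus style value to environment variables (hypothetical helper)."""
    value = gpus.strip().lower()
    if value == "all":
        # "all" is only meaningful for NVIDIA_VISIBLE_DEVICES; leave CUDA_VISIBLE_DEVICES unset
        return {"NVIDIA_VISIBLE_DEVICES": "all"}
    # Explicit comma-separated ids are valid for both variables
    return {"NVIDIA_VISIBLE_DEVICES": value, "CUDA_VISIBLE_DEVICES": value}
```

Leaving CUDA_VISIBLE_DEVICES unset in the "all" case lets CUDA enumerate every device exposed to the process.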