
Releases: aws/aws-parallelcluster

AWS ParallelCluster v3.12.0

18 Dec 22:10

We're excited to announce the release of AWS ParallelCluster 3.12.0

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

ENHANCEMENTS

  • Add the new build-image configuration section Build/Installation to turn NVIDIA software and Lustre client installation on or off. By default, build-image does not install NVIDIA software, although it is included in the official ParallelCluster AMIs. By default, the Lustre client is installed.
  • The CLI commands export-cluster-logs and export-image-logs now export logs by default to the default ParallelCluster bucket, or to CustomS3Bucket if it is specified in the config.
  • Extend Amazon DCV support to Ubuntu2204 on ARM instances.
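The following is a minimal sketch of a build-image configuration that uses the new Build/Installation section. It is written as a shell heredoc. The nested keys NvidiaSoftware/Enabled and LustreClient/Enabled reflect our reading of the 3.12 configuration reference and should be verified there. The file name, instance type, and ParentImage AMI ID are placeholders, not real resources.

```shell
# Sketch: opt in to NVIDIA software (off by default for build-image) and
# keep the Lustre client (on by default). All values are placeholders.
cat > image-config.yaml <<'EOF'
Build:
  InstanceType: c5.xlarge
  ParentImage: ami-0123456789abcdef0
  Installation:
    NvidiaSoftware:
      Enabled: true
    LustreClient:
      Enabled: true
EOF
```

You could then pass the file to `pcluster build-image --image-configuration image-config.yaml --image-id my-custom-image`, where `my-custom-image` is an illustrative image ID.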

CHANGES

  • Upgrade NVIDIA driver to version 550.127.08 (from 550.90.07). This addresses a known issue reported by NVIDIA.
  • Upgrade Amazon DCV to version 2024.0-18131.
    • server: 2024.0-18131-1
    • xdcv: 2024.0.631-1
    • gl: 2024.0.1078-1
    • web_viewer: 2024.0-18131-1
  • Upgrade EFA installer to 1.36.0.
    • Efa-driver: efa-2.13.0-1
    • Efa-config: efa-config-1.17-1
    • Efa-profile: efa-profile-1.7-1
    • Libfabric-aws: libfabric-aws-1.22.0-1
    • Rdma-core: rdma-core-54.0-1
    • Open MPI: openmpi40-aws-4.1.7-1 and openmpi50-aws-5.0.5
  • Auto-restart slurmctld on failure.
  • Upgrade mysql-community-client to version 8.0.39.
  • Remove support for Python 3.7 and 3.8, which have reached end of life.
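Because Python 3.7 and 3.8 are no longer supported, it can help to check the local interpreter before upgrading. This sketch assumes that 3.9 is the oldest version still supported. That value is inferred from this note and is not a stated minimum. The `version_ge` helper is hypothetical and relies on `sort -V` from GNU coreutils.

```shell
# Hypothetical helper: succeed when version $1 >= version $2.
# `sort -V` orders version strings numerically (e.g. 3.9 < 3.10).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: 3.9 is the oldest interpreter supported after this release.
MIN_PYTHON=3.9

if command -v python3 >/dev/null 2>&1; then
  py_version="$(python3 -c 'import platform; print(platform.python_version())')"
  if version_ge "$py_version" "$MIN_PYTHON"; then
    echo "Python $py_version should be able to run the upgraded CLI"
  else
    echo "Python $py_version is too old; install Python $MIN_PYTHON or newer first" >&2
  fi
fi
```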

BUG FIXES

  • Fix an issue where changes in the sequence of custom action scripts were not detected during cluster updates.
  • Add missing permissions for the ParallelCluster API to create the service-linked roles for Elastic Load Balancing and Auto Scaling, which are required to deploy login nodes.
  • Fix how the region is retrieved when managing volumes so that Local Zones are handled correctly.
  • Fix an issue where adding EFS filesystems with AccessPointIds during an update would fail.
  • Fix an issue where, when using the ParallelCluster API (PCAPI), a cluster update could fail when updating a parameter that is not of type String (e.g. MaxCount).
  • When mounting an external OpenZFS, it is no longer required to set the outbound rules for ports 111, 2049, 20001, 20002, 20003.

AWS ParallelCluster v3.11.1

21 Oct 16:54
c877343

We're excited to announce the release of AWS ParallelCluster 3.11.1

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

CHANGES

  • Pyxis is now disabled by default, so it must be manually enabled as documented in the product documentation.
  • Upgrade Python runtime to version 3.12 in ParallelCluster Lambda Layer.
  • Remove the pinning of setuptools to versions prior to 70.0.0.
  • Upgrade libjwt to version 1.17.0.

BUG FIXES

  • Fix an issue in the way ParallelCluster configures the Pyxis Slurm plugin that can lead to job submission failures (#6459).
  • Add missing permissions required by login nodes to the public template of policies.

AWS ParallelCluster v3.11.0

26 Sep 18:26

We're excited to announce the release of AWS ParallelCluster 3.11.0

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

ENHANCEMENTS

  • Add support for custom actions on login nodes.
  • Allow DCV connection to login nodes.
  • Add support for ap-southeast-3 region.
  • Add security groups to login node network load balancer.
  • Add AllowedIps configuration for login nodes.
  • Add new configuration SharedStorage/EfsSettings/AccessPointId to specify an optional EFS access point for a mount.
  • Allow up to 10 login node pools.
  • Install enroot and Pyxis in official ParallelCluster AMIs.
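Here is a sketch of a SharedStorage entry that mounts an existing EFS file system through the new AccessPointId setting. EncryptionInTransit is enabled because the EFS mount helper only supports access-point mounts over TLS. The exact validation ParallelCluster applies should be confirmed in its documentation. The mount directory, name, file system ID, and access point ID are placeholders.

```shell
# Sketch: EFS shared storage mounted via an access point (placeholder IDs).
cat > efs-shared-storage.yaml <<'EOF'
SharedStorage:
  - MountDir: /shared
    Name: shared-efs
    StorageType: Efs
    EfsSettings:
      FileSystemId: fs-0123456789abcdef0
      AccessPointId: fsap-0123456789abcdef0
      EncryptionInTransit: true
EOF
```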

CHANGES

  • [BREAKING] The loginNodes field returned by the API DescribeCluster and the CLI command describe-cluster
    has been changed from a dictionary to an array to support multiple pools of login nodes.
    This change breaks backward compatibility, making these operations incompatible with clusters deployed with older versions.
  • Upgrade Slurm to 23.11.10 (from 23.11.7).
  • Upgrade Pmix to 5.0.3 (from 5.0.2).
  • Upgrade EFA installer to 1.34.0.
    • Efa-driver: efa-2.10.0-1
    • Efa-config: efa-config-1.17-1
    • Efa-profile: efa-profile-1.7-1
    • Libfabric-aws: libfabric-aws-1.22.0-1
    • Rdma-core: rdma-core-52.0-1
    • Open MPI: openmpi40-aws-4.1.6-3 and openmpi50-aws-5.0.3-11
  • Upgrade NVIDIA driver to version 550.90.07 (from 535.183.01).
  • Upgrade CUDA Toolkit to version 12.4.1 (from 12.2.2).
  • Upgrade Python to 3.9.20 (from 3.9.19).
  • Upgrade Intel MPI Library to 2021.13.1.769 (from 2021.12.1.8).
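Since 3.11.0, loginNodes is an array rather than a dictionary. Scripts that parse describe-cluster output from different CLI versions may therefore need to accept both shapes. The helper below is a hypothetical sketch that only counts pools and reads no per-pool fields. It assumes `python3` is available for JSON parsing.

```shell
# Hypothetical helper: count login node pools in describe-cluster JSON on stdin.
# 3.11.0+ returns loginNodes as an array (one entry per pool); older versions
# return a single dictionary, counted here as one pool.
login_node_pool_count() {
  python3 -c '
import json
import sys

nodes = json.load(sys.stdin).get("loginNodes")
if nodes is None:
    print(0)              # no login nodes in the response
elif isinstance(nodes, dict):
    print(1)              # pre-3.11.0 shape: one pool as a dictionary
else:
    print(len(nodes))     # 3.11.0+ shape: array of pools
'
}

# Example with sample JSON (not real describe-cluster output); prints 2
echo '{"loginNodes": [{}, {}]}' | login_node_pool_count
```

Against a live cluster, this could be used as `pcluster describe-cluster --cluster-name my-cluster | login_node_pool_count`, where `my-cluster` is an illustrative name.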

BUG FIXES

  • Fix validator EfaPlacementGroupValidator so that it does not suggest configuring a Placement Group when Capacity Blocks are used.
  • Fix occasional cluster creation failures by ensuring that FSx for Lustre file systems are created after security group rules.
  • Fix cluster deletion failure when a placement group is enabled.
  • Fix an issue with login nodes being marked unhealthy when restricting SSH access.
  • Fix retrieve_supported_regions so that it can get the correct S3 URL.
  • Fix describe_images to use pagination.
  • Fix the "No route tables found" error when specifying a default VPC subnet in LoginNodes/Networking/SubnetIds.

AWS ParallelCluster v3.10.1

08 Jul 20:05

We're excited to announce the release of AWS ParallelCluster 3.10.1

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

BUG FIXES

  • Fix image build failure in China regions.

AWS ParallelCluster v3.10.0

27 Jun 21:42

We're excited to announce the release of AWS ParallelCluster 3.10.0

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

ENHANCEMENTS

  • Add new configuration section Scheduling/SlurmSettings/ExternalSlurmdbd to connect the cluster to an external Slurmdbd.
  • Allow build-image to be run in an isolated network.
  • Add support for Amazon Linux 2023.
  • Add support for price-capacity-optimized as an AllocationStrategy.
  • Add validator to prevent the use of Placement Groups with Capacity Blocks.
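This is a sketch of a Scheduling fragment that combines the new ExternalSlurmdbd section with the price-capacity-optimized AllocationStrategy. We assume ExternalSlurmdbd takes Host and Port. Port 6819 is slurmdbd's conventional default. Connecting to an external slurmdbd typically also requires a MUNGE key shared with that host (see SlurmSettings/MungeKeySecretArn), which is omitted here. All hostnames, names, instance types, and subnet IDs are placeholders.

```shell
# Sketch: external slurmdbd plus a Spot queue using price-capacity-optimized.
cat > scheduling-fragment.yaml <<'EOF'
Scheduling:
  Scheduler: slurm
  SlurmSettings:
    ExternalSlurmdbd:
      Host: slurmdbd.example.internal
      Port: 6819
  SlurmQueues:
    - Name: spot-queue
      CapacityType: SPOT
      AllocationStrategy: price-capacity-optimized
      ComputeResources:
        - Name: mixed-c5
          Instances:
            - InstanceType: c5.xlarge
            - InstanceType: c5a.xlarge
          MinCount: 0
          MaxCount: 10
      Networking:
        SubnetIds:
          - subnet-0123456789abcdef0
EOF
```

The allocation strategy only matters when a compute resource lists several instance types, which is why the sketch uses an Instances list.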

CHANGES

  • CentOS 7 is no longer supported.
  • Upgrade Cinc Client to version 18.4.12 (from 18.2.7).
  • Upgrade munge to version 0.5.16 (from 0.5.15).
  • Upgrade Pmix to 5.0.2 (from 4.2.9).
  • Upgrade third-party cookbook dependencies:
    • apt-7.5.22 (from apt-7.5.14)
    • openssh-2.11.12 (from openssh-2.11.3)
  • Remove third-party cookbook: selinux-6.1.12.
  • Upgrade EFA installer to 1.32.0.
    • Efa-driver: efa-2.8.0-1
    • Efa-config: efa-config-1.16-1
    • Efa-profile: efa-profile-1.7-1
    • Libfabric-aws: libfabric-aws-1.21.0-1
    • Rdma-core: rdma-core-50.0-1
    • Open MPI: openmpi40-aws-4.1.6-3 and openmpi50-aws-5.0.2-12
  • Upgrade NVIDIA driver to version 535.183.01 (from 535.154.05).
  • Upgrade Python to 3.9.19 (from 3.9.17).
  • Upgrade Intel MPI Library to 2021.12.1.8 (from 2021.9.0.43482).

BUG FIXES

  • Fix Data Repository Associations configuration to make AutoExportPolicy and AutoImportPolicy optional.
  • Fix cluster deletion so that compute fleet cleanup completes when instances are in either the shutting-down or terminated state.
    This avoids cluster deletion failures for instance types with longer termination cycles.
  • Allow the CloudWatch dashboard to be enabled and alarms to be disabled in the Monitoring section of the cluster config.
  • Allow ParallelCluster Custom Resource to suppress validators using PclusterCluster/SuppressValidators.
  • Remove /etc/profile.d/pcluster.sh so that it is not executed at every user login and
    cfn_bootstrap_virtualenv is not added to the PATH environment variable.
  • Fix ParallelCluster API spec by replacing field failureReason with failures in DescribeCluster response.
  • Fix ParallelCluster API spec by adding the missing CloudFormation stack statuses:
    IMPORT_*, REVIEW_IN_PROGRESS, and UPDATE_FAILED.
  • Fix an issue that prevented cluster updates from including EFS filesystems with encryption in transit.
  • Fix an issue that prevented slurmctld and slurmdbd services from restarting on head node reboot when
    EFS is used for shared internal data.
  • On Ubuntu systems, remove the default logrotate configuration for cloud-init log files that clashed with the
    configuration coming from ParallelCluster.
  • Fix image build failure with RHEL 8.10 or newer.
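The dashboard and alarm fix above allows a Monitoring section like the sketch below, which keeps the CloudWatch dashboard while turning alarms off. The key paths are our reading of the cluster configuration reference, and the fragment would be merged into a full cluster config.

```shell
# Sketch: keep the CloudWatch dashboard enabled while disabling cluster alarms.
cat > monitoring-fragment.yaml <<'EOF'
Monitoring:
  Dashboards:
    CloudWatch:
      Enabled: true
  Alarms:
    Enabled: false
EOF
```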

AWS ParallelCluster v3.9.3

19 Jun 12:19

We're excited to announce the release of AWS ParallelCluster 3.9.3

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

ENHANCEMENTS

  • Add support for FSx Lustre as a shared storage type in us-iso-east-1.

BUG FIXES

  • Remove cloud_dns from the SlurmctldParameters in the Slurm config to avoid Slurm fanout issues.
    This is also not required since we set the IP addresses on instance launch.

AWS ParallelCluster v3.9.2

28 May 19:20

We're excited to announce the release of AWS ParallelCluster 3.9.2

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

CHANGES

  • Upgrade Slurm to 23.11.7 (from 23.11.4).

AWS ParallelCluster v3.9.1

11 Apr 10:42

We're excited to announce the release of AWS ParallelCluster 3.9.1

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

BUG FIXES

  • Remove recursive deletion of shared storage mountdir when unmounting filesystems as part of update-cluster operation.

AWS ParallelCluster v3.9.0

12 Mar 01:27
0303ec9

We're excited to announce the release of AWS ParallelCluster 3.9.0

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

ENHANCEMENTS

  • Allow updating external shared storage of type Efs, FsxLustre, FsxOntap, FsxOpenZfs, and FileCache
    without replacing the compute and login fleets.
  • Allow updating the MinCount, MaxCount, Queue, and ComputeResource configuration parameters without
    stopping the compute fleet, by setting Scheduling/SlurmSettings/QueueUpdateStrategy to TERMINATE.
    ParallelCluster terminates only the nodes removed when the cluster capacity is resized
    through a cluster update.
  • Add support for RHEL9.
  • Add support for Rocky Linux 9 as a CustomAmi created through the build-image process. No official public ParallelCluster Rocky Linux 9 AMI is available at this time.
  • Remove CommunicationParameters from the Custom Slurm Settings deny list.
  • Add the configuration parameter DeploymentSettings/DefaultUserHome to allow users to move the default user's home directory to /local/home instead of /home (default).
  • Add the configuration parameter DeploymentSettings/DisableSudoAccessForDefaultUser to disable sudo access for the default user on supported operating systems.
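This sketch shows a cluster config fragment with the 3.9.0 settings above. We assume DefaultUserHome accepts Local (for /local/home) in addition to the default Shared, which should be confirmed in the configuration reference. The fragment would be merged into a full cluster config and applied with `pcluster update-cluster --cluster-name my-cluster --cluster-configuration cluster-config.yaml`, where the cluster and file names are illustrative.

```shell
# Sketch: local default-user home, no sudo for the default user, and
# queue/compute-resource updates that terminate only the removed nodes.
cat > update-settings-fragment.yaml <<'EOF'
DeploymentSettings:
  DefaultUserHome: Local
  DisableSudoAccessForDefaultUser: true
Scheduling:
  Scheduler: slurm
  SlurmSettings:
    QueueUpdateStrategy: TERMINATE
EOF
```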

CHANGES

  • Upgrade Slurm to 23.11.4 (from 23.02.7).
    • Upgrade Pmix to 4.2.9 (from 4.2.6).
  • Add support for Python 3.11, 3.12 in pcluster CLI and aws-parallelcluster-batch-cli.
  • Build network interfaces using network card index from NetworkCardIndex list of EC2 DescribeInstances response,
    instead of looping over MaximumNetworkCards range.
  • Fail cluster creation when using instance types P3, G3, P2, and G2, because their GPU architecture is not compatible with the open-source NVIDIA drivers (OpenRM) introduced in the 3.8.0 release.
  • Upgrade the default FSx Lustre server version managed by ParallelCluster to 2.15.
  • Upgrade NVIDIA driver to version 535.154.05.
  • Upgrade EFA installer to 1.30.0.
    • Efa-driver: efa-2.6.0-1
    • Efa-config: efa-config-1.15-1
    • Efa-profile: efa-profile-1.6-1
    • Libfabric-aws: libfabric-aws-1.19.0
    • Rdma-core: rdma-core-46.0-1
    • Open MPI: openmpi40-aws-4.1.6-2 and openmpi50-aws-5.0.0-11
  • Upgrade NICE DCV to version 2023.1-16388.
    • server: 2023.1.16388-1
    • xdcv: 2023.1.565-1
    • gl: 2023.1.1047-1
    • web_viewer: 2023.1.16388-1
  • Upgrade ARM PL to version 23.10.
  • Upgrade third-party cookbook dependencies:
    • nfs-5.1.2 (from nfs-5.0.0)

BUG FIXES

  • Refactor IAM policies defined in CloudFormation template parallelcluster-policies.yaml to prevent ParallelCluster API deployment failures caused by policies exceeding IAM limits.
  • Fix an issue that caused jobs to fail when submitted as an Active Directory user from login nodes. The issue was caused by an incomplete configuration of the integration with the external Active Directory on the head node.
  • Fix an issue that caused login nodes to fail to bootstrap when the head node took longer than expected to write keys.

AWS ParallelCluster v3.8.0

19 Dec 17:40

We're excited to announce the release of AWS ParallelCluster 3.8.0

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

ENHANCEMENTS

  • Add support for EC2 Capacity Blocks for ML.
  • Add support for Rocky Linux 8 as a CustomAmi created through the build-image process. No official public ParallelCluster Rocky Linux 8 AMI is available at this time.
  • Add Scheduling/ScalingStrategy parameter to control the cluster scaling strategy to use when launching EC2 instances for Slurm compute nodes.
    Possible values are all-or-nothing, greedy-all-or-nothing, best-effort, with all-or-nothing being the default.
  • Add HeadNode/SharedStorageType parameter to use EFS storage instead of NFS exports from the head node root volume
    for intra-cluster shared file system resources: ParallelCluster, Intel, Slurm, and /home data. This enhancement reduces the load on the head node networking.
  • Allow mounting /home as EFS or FSx external shared storage through the SharedStorage section of the config file.
  • Add new parameter SlurmSettings/MungeKeySecretArn to allow the use of an external, user-defined MUNGE key from AWS Secrets Manager.
  • Add Monitoring/Alarms/Enabled parameter to toggle Amazon CloudWatch Alarms for the cluster.
  • Add head node alarms to monitor EC2 health checks, CPU utilization and the overall status of the head node, and add them to the CloudWatch Dashboard created with the cluster.
  • Add support for Data Repository Associations when using PERSISTENT_2 as DeploymentType for a managed FSx for Lustre.
  • Add Scheduling/SlurmSettings/Database/DatabaseName parameter to allow users to specify a custom name for the database on the database server to be used for Slurm accounting.
  • Make InstanceType an optional configuration parameter when configuring CapacityReservationTarget/CapacityReservationId in the compute resource.
  • Add the ability to specify a prefix for IAM roles and policies created by the ParallelCluster API.
  • Add the ability to specify a permissions boundary to be applied to IAM roles and policies created by the ParallelCluster API.
  • Add support for il-central-1 region.
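The sketch below combines several of the 3.8.0 settings in one cluster config fragment. These are the EFS-backed internal shared storage on the head node, a non-default ScalingStrategy (placed under Scheduling, as the note above states), an external MUNGE key, and /home mounted from an existing EFS file system. We assume HeadNode/SharedStorageType accepts Efs. The secret ARN, account ID, names, and file system ID are placeholders.

```shell
# Sketch: 3.8.0 options in one fragment (placeholder ARN and IDs).
cat > cluster-fragment.yaml <<'EOF'
HeadNode:
  SharedStorageType: Efs
Scheduling:
  Scheduler: slurm
  ScalingStrategy: greedy-all-or-nothing
  SlurmSettings:
    MungeKeySecretArn: arn:aws:secretsmanager:us-east-1:111122223333:secret:munge-key-EXAMPLE
SharedStorage:
  - MountDir: /home
    Name: home-efs
    StorageType: Efs
    EfsSettings:
      FileSystemId: fs-0123456789abcdef0
EOF
```

Leaving out ScalingStrategy keeps the default, all-or-nothing. The release note lists greedy-all-or-nothing and best-effort as the alternatives.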

CHANGES

  • Upgrade Slurm to 23.02.7 (from 23.02.6).
  • Upgrade NVIDIA driver to version 535.129.03.
  • Upgrade CUDA Toolkit to version 12.2.2.
  • Use Open Source NVIDIA GPU drivers (OpenRM) as NVIDIA kernel module for Linux instead of NVIDIA closed source module.
  • Remove support for the all_or_nothing_batch configuration parameter in the Slurm resume program, in favor of the new Scheduling/ScalingStrategy cluster configuration.
  • Change the cluster alarms naming convention to '[cluster-name]-[component-name]-[metric]'.
  • Change default EBS volume types in ADC regions from gp2 to gp3, for both the root and additional volumes.
  • The optional permissions boundary for the ParallelCluster API is now applied to every IAM role created by the API infrastructure.
  • Upgrade EFA installer to 1.29.1.
    • Efa-driver: efa-2.6.0-1
    • Efa-config: efa-config-1.15-1
    • Efa-profile: efa-profile-1.5-1
    • Libfabric-aws: libfabric-aws-1.19.0-1
    • Rdma-core: rdma-core-46.0-1
    • Open MPI: openmpi40-aws-4.1.6-1
  • Upgrade GDRCopy to version 2.4 in all supported OSes, except for CentOS 7, where version 2.3.1 is used.
  • Upgrade aws-cfn-bootstrap to version 2.0-28.
  • Add support for Python 3.10 in aws-parallelcluster-batch-cli.

BUG FIXES

  • Fix inconsistent scaling configuration after cluster update rollback when modifying the list of instance types declared in the Compute Resources.
  • Fix SSH key generation for users when switching users without root privileges in clusters integrated with an external LDAP server through cluster configuration files.
  • Fix disabling Slurm power save mode when setting ScaledownIdletime = -1.
  • Fix hard-coded path to Slurm installation dir in update_slurm_database_password.sh script for Slurm Accounting.