Releases: aws/aws-parallelcluster
AWS ParallelCluster v3.12.0
We're excited to announce the release of AWS ParallelCluster 3.12.0
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
ENHANCEMENTS
- Add new build image configuration section Build/Installation to turn on/off Nvidia software and Lustre client installations. By default, Nvidia software, although included in official ParallelCluster AMIs, is not installed by build-image. By default, Lustre client is installed.
- The CLI commands export-cluster-logs and export-image-logs can now by default export the logs to the default ParallelCluster bucket or to the CustomS3Bucket if specified in the config.
- Extend Amazon DCV support to Ubuntu2204 on ARM instances.
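The bucket fallback described for export-cluster-logs and export-image-logs can be pictured with a small Python sketch. It is an illustration only, not the CLI's implementation: the function name resolve_export_bucket and the default_bucket argument are hypothetical, and the config is assumed to be an already-parsed dictionary with CustomS3Bucket at the top level.

```python
def resolve_export_bucket(config, default_bucket):
    """Pick the bucket that log exports would go to when none is passed explicitly.

    `config` is a cluster or image config already parsed into a dict.
    `default_bucket` is the name of the default ParallelCluster bucket,
    supplied by the caller (this sketch does not look it up).
    """
    custom = config.get("CustomS3Bucket")
    # Fall back to the default bucket when CustomS3Bucket is missing or empty.
    return custom if custom else default_bucket


# Hypothetical bucket names, used only to exercise both branches.
print(resolve_export_bucket({"CustomS3Bucket": "my-logs"}, "default-bucket"))
print(resolve_export_bucket({}, "default-bucket"))
```

Passing the default bucket in as an argument keeps the sketch free of any guess about how that bucket is named or discovered.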
CHANGES
- Upgrade NVIDIA driver to version 550.127.08 (from 550.90.07). This addresses a known issue from NVIDIA.
- Upgrade Amazon DCV to version 2024.0-18131.
  - server: 2024.0-18131-1
  - xdcv: 2024.0.631-1
  - gl: 2024.0.1078-1
  - web_viewer: 2024.0-18131-1
- Upgrade EFA installer to 1.36.0.
  - Efa-driver: efa-2.13.0-1
  - Efa-config: efa-config-1.17-1
  - Efa-profile: efa-profile-1.7-1
  - Libfabric-aws: libfabric-aws-1.22.0-1
  - Rdma-core: rdma-core-54.0-1
  - Open MPI: openmpi40-aws-4.1.7-1 and openmpi50-aws-5.0.5
- Auto-restart slurmctld on failure.
- Upgrade mysql-community-client to version 8.0.39.
- Remove support for Python 3.7 and 3.8, which have reached end of life.
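Because Python 3.7 and 3.8 are no longer supported, it may help to check the interpreter before upgrading the CLI. The following is a minimal sketch: the name python_is_supported is hypothetical, and the 3.9 floor is an assumption inferred from the removal of 3.7 and 3.8, not a value stated in these notes.

```python
import sys

# Assumption: with 3.7 and 3.8 removed, 3.9 is the oldest interpreter still accepted.
MINIMUM_PYTHON = (3, 9)


def python_is_supported(version_info=None):
    """Return True when the interpreter is at least MINIMUM_PYTHON."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); the patch level does not matter here.
    return tuple(version_info[:2]) >= MINIMUM_PYTHON


if not python_is_supported():
    print("Python %d.%d is too old for this release" % tuple(sys.version_info[:2]))
```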
BUG FIXES
- Fix an issue where changes in sequence of custom actions scripts were not detected during cluster updates.
- Add missing permissions for ParallelCluster API to create the service-linked roles for Elastic Load Balancing and Auto Scaling, which are required to deploy login nodes.
- Fix an issue in the way we get the region when managing volumes so that Local Zones are handled correctly.
- Fix an issue where adding EFS filesystems with AccessPointIds during an update would fail.
- Fix an issue where, when using PCAPI, a cluster update could fail when updating a parameter that is not of type String (e.g. MaxCount).
- When mounting an external OpenZFS, it is no longer required to set the outbound rules for ports 111, 2049, 20001, 20002, 20003.
AWS ParallelCluster v3.11.1
We're excited to announce the release of AWS ParallelCluster 3.11.1
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
CHANGES
- Pyxis is now disabled by default, so it must be manually enabled as documented in the product documentation.
- Upgrade Python runtime to version 3.12 in ParallelCluster Lambda Layer.
- Remove the pinning of setuptools to versions prior to 70.0.0.
- Upgrade libjwt to version 1.17.0.
BUG FIXES
- Fix an issue in the way we configure the Pyxis Slurm plugin in ParallelCluster that can lead to job submission failures. #6459
- Add missing permissions required by login nodes to the public template of policies.
AWS ParallelCluster v3.11.0
We're excited to announce the release of AWS ParallelCluster 3.11.0
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
ENHANCEMENTS
- Add support for custom actions on login nodes.
- Allow DCV connection to login nodes.
- Add support for ap-southeast-3 region.
- Add security groups to login node network load balancer.
- Add AllowedIps configuration for login nodes.
- Add new configuration SharedStorage/EfsSettings/AccessPointId to specify an optional EFS access point for a mount.
- Allow up to 10 login node pools.
- Install enroot and pyxis in official pcluster AMIs
CHANGES
- [BREAKING] The loginNodes field returned by the API DescribeCluster and the CLI command describe-cluster has been changed from a dictionary to an array to support multiple pools of login nodes. This change breaks backward compatibility, making these operations incompatible with clusters deployed with older versions.
- Upgrade Slurm to 23.11.10 (from 23.11.7).
- Upgrade Pmix to 5.0.3 (from 5.0.2).
- Upgrade EFA installer to 1.34.0.
  - Efa-driver: efa-2.10.0-1
  - Efa-config: efa-config-1.17-1
  - Efa-profile: efa-profile-1.7-1
  - Libfabric-aws: libfabric-aws-1.22.0-1
  - Rdma-core: rdma-core-52.0-1
  - Open MPI: openmpi40-aws-4.1.6-3 and openmpi50-aws-5.0.3-11
- Upgrade NVIDIA driver to version 550.90.07 (from 535.183.01).
- Upgrade CUDA Toolkit to version 12.4.1 (from 12.2.2).
- Upgrade Python to 3.9.20 (from 3.9.19).
- Upgrade Intel MPI Library to 2021.13.1.769 (from 2021.12.1.8).
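Scripts that read describe-cluster output may need to tolerate both shapes of the loginNodes field covered by the [BREAKING] change above. The sketch below shows one way to normalize it in Python. The helper name login_node_pools is hypothetical, the response is assumed to be already parsed from JSON into a dict, and the status entries are illustrative sample data rather than a complete response.

```python
def login_node_pools(cluster):
    """Return login node pools as a list for both response shapes."""
    value = cluster.get("loginNodes")
    if value is None:
        return []          # cluster without login nodes
    if isinstance(value, dict):
        return [value]     # older shape: a single dictionary
    return list(value)     # 3.11.0 and later: an array of pools


# Illustrative sample data only.
old_style = {"loginNodes": {"status": "active"}}
new_style = {"loginNodes": [{"status": "active"}, {"status": "pending"}]}
print(len(login_node_pools(old_style)), len(login_node_pools(new_style)))
```

Normalizing to a list at the edge keeps the rest of a script unaware of which ParallelCluster version produced the response.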
BUG FIXES
- Fix validator EfaPlacementGroupValidator so that it does not suggest configuring a Placement Group when Capacity Blocks are used.
- Fix occasional cluster creation failures by ensuring that FSx for Lustre file systems are created after security group rules.
- Fix cluster deletion failure when placement group is enabled.
- Fix issue with login nodes being marked unhealthy when restricting SSH access.
- Fix retrieve_supported_regions so that it can get the correct S3 URL.
- Fix describe_images to use pagination.
- Fix "No route tables found" bug when specifying default VPC subnet to LoginNodes/Networking/SubnetIds.
AWS ParallelCluster v3.10.1
We're excited to announce the release of AWS ParallelCluster 3.10.1
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
BUG FIXES
- Fix image build failure in China regions.
AWS ParallelCluster v3.10.0
We're excited to announce the release of AWS ParallelCluster 3.10.0
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
ENHANCEMENTS
- Add new configuration section Scheduling/SlurmSettings/ExternalSlurmdbd to connect the cluster to an external Slurmdbd.
- Allow build-image to be run in an isolated network.
- Add support for Amazon Linux 2023.
- Add support for price-capacity-optimized as an AllocationStrategy.
- Add validator to prevent the use of Placement Groups with Capacity Blocks.
CHANGES
- CentOS 7 is no longer supported.
- Upgrade Cinc Client to version 18.4.12 (from 18.2.7).
- Upgrade munge to version 0.5.16 (from 0.5.15).
- Upgrade Pmix to 5.0.2 (from 4.2.9).
- Upgrade third-party cookbook dependencies:
  - apt-7.5.22 (from apt-7.5.14)
  - openssh-2.11.12 (from openssh-2.11.3)
- Remove third-party cookbook: selinux-6.1.12.
- Upgrade EFA installer to 1.32.0.
  - Efa-driver: efa-2.8.0-1
  - Efa-config: efa-config-1.16-1
  - Efa-profile: efa-profile-1.7-1
  - Libfabric-aws: libfabric-aws-1.21.0-1
  - Rdma-core: rdma-core-50.0-1
  - Open MPI: openmpi40-aws-4.1.6-3 and openmpi50-aws-5.0.2-12
- Upgrade NVIDIA driver to version 535.183.01 (from 535.154.05).
- Upgrade Python to 3.9.19 (from 3.9.17).
- Upgrade Intel MPI Library to 2021.12.1.8 (from 2021.9.0.43482).
BUG FIXES
- Fix Data Repository Associations configuration to make AutoExportPolicy and AutoImportPolicy optional.
- Fix cluster deletion so that compute fleet cleanup completes when instances are in either the shutting-down or terminated state. This avoids cluster deletion failures for instance types with longer termination cycles.
- Allow CloudWatch dashboard to be enabled and alarms to be disabled in the Monitoring section of the cluster config.
- Allow ParallelCluster Custom Resource to suppress validators using PclusterCluster/SuppressValidators.
- Remove /etc/profile.d/pcluster.sh so that it's not executed at every user login and cfn_bootstrap_virtualenv is not added to the PATH environment variable.
- Fix ParallelCluster API spec by replacing field failureReason with failures in DescribeCluster response.
- Fix ParallelCluster API spec by adding the CloudFormation stack statuses that were missing: IMPORT_*, REVIEW_IN_PROGRESS and UPDATE_FAILED.
- Fix an issue that prevented cluster updates from including EFS filesystems with encryption in transit.
- Fix an issue that prevented slurmctld and slurmdbd services from restarting on head node reboot when EFS is used for shared internal data.
- On Ubuntu systems, remove default logrotate configuration for cloud-init log files that clashed with the configuration coming from ParallelCluster.
- Fix image build failure with RHEL 8.10 or newer.
AWS ParallelCluster v3.9.3
We're excited to announce the release of AWS ParallelCluster 3.9.3
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
ENHANCEMENTS
- Add support for FSx Lustre as a shared storage type in us-iso-east-1.
BUG FIXES
- Remove cloud_dns from the SlurmctldParameters in the Slurm config to avoid Slurm fanout issues. This is also not required since we set the IP addresses on instance launch.
AWS ParallelCluster v3.9.2
We're excited to announce the release of AWS ParallelCluster 3.9.2
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
CHANGES
- Upgrade Slurm to 23.11.7 (from 23.11.4).
AWS ParallelCluster v3.9.1
We're excited to announce the release of AWS ParallelCluster 3.9.1
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
BUG FIXES
- Remove recursive deletion of shared storage mountdir when unmounting filesystems as part of update-cluster operation.
AWS ParallelCluster v3.9.0
We're excited to announce the release of AWS ParallelCluster 3.9.0
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
ENHANCEMENTS
- Allow updating the external shared storage of type Efs, FsxLustre, FsxOntap, FsxOpenZfs and FileCache without replacing the compute and login fleet.
- Allow updating the MinCount, MaxCount, Queue and ComputeResource configuration parameters without the need to stop the compute fleet. It's now possible to update them by setting Scheduling/SlurmSettings/QueueUpdateStrategy to TERMINATE. ParallelCluster will terminate only the nodes removed during a resize of the cluster capacity performed through a cluster update.
- Add support for RHEL9.
- Add support for Rocky Linux 9 as CustomAmi created through the build-image process. No public official ParallelCluster Rocky9 Linux AMI is made available at this time.
- Remove CommunicationParameters from the Custom Slurm Settings deny list.
- Add the configuration parameter DeploymentSettings/DefaultUserHome to allow users to move the default user's home directory to /local/home instead of /home (default).
- Add configuration parameter DeploymentSettings/DisableSudoAccessForDefaultUser to disable sudo access of the default user in supported OSes.
CHANGES
- Upgrade Slurm to 23.11.4 (from 23.02.7).
- Upgrade Pmix to 4.2.9 (from 4.2.6).
- Add support for Python 3.11, 3.12 in pcluster CLI and aws-parallelcluster-batch-cli.
- Build network interfaces using network card index from NetworkCardIndex list of EC2 DescribeInstances response, instead of looping over MaximumNetworkCards range.
- Fail cluster creation when using instance types P3, G3, P2 and G2 because their GPU architecture is not compatible with Open Source Nvidia Drivers (OpenRM) introduced as part of 3.8.0 release.
- Upgrade the default FSx Lustre server version managed by ParallelCluster to 2.15.
- Upgrade NVIDIA driver to version 535.154.05.
- Upgrade EFA installer to 1.30.0.
  - Efa-driver: efa-2.6.0-1
  - Efa-config: efa-config-1.15-1
  - Efa-profile: efa-profile-1.6-1
  - Libfabric-aws: libfabric-aws-1.19.0
  - Rdma-core: rdma-core-46.0-1
  - Open MPI: openmpi40-aws-4.1.6-2 and openmpi50-aws-5.0.0-11
- Upgrade NICE DCV to version 2023.1-16388.
  - server: 2023.1.16388-1
  - xdcv: 2023.1.565-1
  - gl: 2023.1.1047-1
  - web_viewer: 2023.1.16388-1
- Upgrade ARM PL to version 23.10.
- Upgrade third-party cookbook dependencies:
  - nfs-5.1.2 (from nfs-5.0.0)
BUG FIXES
- Refactor IAM policies defined in CloudFormation template parallelcluster-policies.yaml to prevent ParallelCluster API deployment failure caused by policies exceeding IAM limits.
- Fix issue making jobs fail when submitted as an Active Directory user from login nodes. The issue was caused by an incomplete configuration of the integration with the external Active Directory on the head node.
- Fix issue making login nodes fail to bootstrap when the head node takes more time than expected in writing keys.
AWS ParallelCluster v3.8.0
We're excited to announce the release of AWS ParallelCluster 3.8.0
Upgrade
How to upgrade?
sudo pip install --upgrade aws-parallelcluster
ENHANCEMENTS
- Add support for EC2 Capacity Blocks for ML.
- Add support for Rocky Linux 8 as CustomAmi created through the build-image process. No public official ParallelCluster Rocky8 Linux AMI is made available at this time.
- Add Scheduling/ScalingStrategy parameter to control the cluster scaling strategy to use when launching EC2 instances for Slurm compute nodes. Possible values are all-or-nothing, greedy-all-or-nothing, best-effort, with all-or-nothing being the default.
- Add HeadNode/SharedStorageType parameter to use EFS storage instead of NFS exports from the head node root volume for intra-cluster shared file system resources: ParallelCluster, Intel, Slurm, and /home data. This enhancement reduces the load on the head node networking.
- Allow for mounting home as an EFS or FSx external shared storage via the SharedStorage section of the config file.
- Add new parameter SlurmSettings/MungeKeySecretArn to allow the use of an external user-defined MUNGE key from AWS Secrets Manager.
- Add Monitoring/Alarms/Enabled parameter to toggle Amazon CloudWatch Alarms for the cluster.
- Add head node alarms to monitor EC2 health checks, CPU utilization and the overall status of the head node, and add them to the CloudWatch Dashboard created with the cluster.
- Add support for Data Repository Associations when using PERSISTENT_2 as DeploymentType for a managed FSx for Lustre.
- Add Scheduling/SlurmSettings/Database/DatabaseName parameter to allow users to specify a custom name for the database on the database server to be used for Slurm accounting.
- Make InstanceType an optional configuration parameter when configuring CapacityReservationTarget/CapacityReservationId in the compute resource.
- Add possibility to specify a prefix for IAM roles and policies created by ParallelCluster API.
- Add possibility to specify a permissions boundary to be applied for IAM roles and policies created by ParallelCluster API.
- Add support for il-central-1 region.
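The three ScalingStrategy values and the default listed above can be captured in a short validation sketch. This is an illustration under stated assumptions, not ParallelCluster's own validator: the function name scaling_strategy is hypothetical, the config is assumed to be already parsed into a dict, and the key path follows the release note as written, so check the configuration reference for the exact location of the parameter.

```python
# Values and default as listed in the release note above.
SCALING_STRATEGIES = ("all-or-nothing", "greedy-all-or-nothing", "best-effort")
DEFAULT_SCALING_STRATEGY = "all-or-nothing"


def scaling_strategy(config):
    """Return the effective scaling strategy, applying the documented default."""
    scheduling = config.get("Scheduling") or {}
    value = scheduling.get("ScalingStrategy", DEFAULT_SCALING_STRATEGY)
    if value not in SCALING_STRATEGIES:
        raise ValueError(
            "ScalingStrategy must be one of %s, got %r"
            % (", ".join(SCALING_STRATEGIES), value)
        )
    return value


print(scaling_strategy({"Scheduling": {"ScalingStrategy": "best-effort"}}))
print(scaling_strategy({}))
```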
CHANGES
- Upgrade Slurm to 23.02.7 (from 23.02.6).
- Upgrade NVIDIA driver to version 535.129.03.
- Upgrade CUDA Toolkit to version 12.2.2.
- Use Open Source NVIDIA GPU drivers (OpenRM) as NVIDIA kernel module for Linux instead of NVIDIA closed source module.
- Remove support of all_or_nothing_batch configuration parameter in the Slurm resume program, in favor of the new Scheduling/ScalingStrategy cluster configuration.
- Changed cluster alarms naming convention to '[cluster-name]-[component-name]-[metric]'.
- Change default EBS volume types in ADC regions from gp2 to gp3, for both the root and additional volumes.
- The optional permissions boundary for the ParallelCluster API is now applied to every IAM role created by the API infrastructure.
- Upgrade EFA installer to 1.29.1.
  - Efa-driver: efa-2.6.0-1
  - Efa-config: efa-config-1.15-1
  - Efa-profile: efa-profile-1.5-1
  - Libfabric-aws: libfabric-aws-1.19.0-1
  - Rdma-core: rdma-core-46.0-1
  - Open MPI: openmpi40-aws-4.1.6-1
- Upgrade GDRCopy to version 2.4 in all supported OSes, except for CentOS 7 where version 2.3.1 is used.
- Upgrade aws-cfn-bootstrap to version 2.0-28.
- Add support for Python 3.10 in aws-parallelcluster-batch-cli.
BUG FIXES
- Fix inconsistent scaling configuration after cluster update rollback when modifying the list of instance types declared in the Compute Resources.
- Fix users SSH keys generation when switching users without root privilege in clusters integrated with an external LDAP server through cluster configuration files.
- Fix disabling Slurm power save mode when setting ScaledownIdletime = -1.
- Fix hard-coded path to Slurm installation dir in update_slurm_database_password.sh script for Slurm Accounting.