diff --git a/compute/gpu/reference-content/assets/scaleway-gpu-comparisation.pdf b/compute/gpu/reference-content/assets/scaleway-gpu-comparisation.pdf new file mode 100644 index 0000000000..0a25b84ee2 Binary files /dev/null and b/compute/gpu/reference-content/assets/scaleway-gpu-comparisation.pdf differ diff --git a/compute/gpu/reference-content/assets/scaleway-gpu-comparisation.webp b/compute/gpu/reference-content/assets/scaleway-gpu-comparisation.webp new file mode 100644 index 0000000000..cc03d669b6 Binary files /dev/null and b/compute/gpu/reference-content/assets/scaleway-gpu-comparisation.webp differ diff --git a/compute/gpu/reference-content/choosing-gpu-instance-type.mdx b/compute/gpu/reference-content/choosing-gpu-instance-type.mdx index bfc9183273..a39091c701 100644 --- a/compute/gpu/reference-content/choosing-gpu-instance-type.mdx +++ b/compute/gpu/reference-content/choosing-gpu-instance-type.mdx @@ -7,7 +7,7 @@ content: paragraph: This section provides information about how to choose a GPU Instance type tags: NVIDIA GPU cloud instance dates: - validation: 2023-08-31 + validation: 2024-02-08 posted: 2022-08-31 categories: - compute @@ -22,7 +22,7 @@ It empowers European AI startups, giving them the tools (without the need for a ## How to choose the right GPU Instance type -Scaleway provides a range of GPU Instance offers. There are several factors to consider when choosing the right GPU Instance type, to ensure that it meets your performance, budget, and scalability requirements. +Scaleway provides a range of GPU Instance offers, from [GPU RENDER Instances](https://www.scaleway.com/en/gpu-render-instances/) and [H100 PCIe Instances](https://www.scaleway.com/en/h100-pcie-try-it-now/) to [Jeroboam and Nabuchodonosor AI supercomputers](https://www.scaleway.com/en/ai-supercomputers/). There are several factors to consider when choosing the right GPU Instance type to ensure that it meets your performance, budget, and scalability requirements. Below, you will find a guide to help you make an informed decision: * **Workload requirements:** Identify the nature of your workload. Are you running machine learning, deep learning, high-performance computing (HPC), data analytics, or graphics-intensive applications? Different Instance types are optimized for different types of workloads. For example, the H100 is not designed for graphics rendering. However, other models are. As [stated by Tim Dettmers](https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/), “Tensor Cores are most important, followed by memory bandwidth of a GPU, the cache hierarchy, and only then FLOPS of a GPU.” For more information, refer to the [NVIDIA GPU portfolio](https://www.nvidia.com/content/dam/en-zz/solutions/data-center/data-center-gpu-portfolio-line-card.pdf). @@ -33,27 +33,87 @@ Below, you will find a guide to help you make an informed decision: * **GPU driver and software compatibility:** Ensure that the GPU Instance type you choose supports the GPU drivers and software frameworks you need for your workload. This includes CUDA libraries, machine learning frameworks (TensorFlow, PyTorch, etc.), and other specific software tools. For all [Scaleway GPU OS images](/compute/gpu/reference-content/docker-images/), we offer a driver version that enables the use of all GPUs, from the oldest to the latest models. Like the NGC CLI, `nvidia-docker` is preinstalled, enabling containers to be used with CUDA, cuDNN, and the main deep learning frameworks. <br/>
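+
+  To illustrate, here is a minimal sanity check (a sketch assuming PyTorch, one of the preinstalled frameworks) that the driver and CUDA stack are visible from Python:
+
+  ```python
+  import torch
+
+  # The GPU OS images ship the NVIDIA driver, CUDA, and common frameworks;
+  # these calls confirm that the framework can actually see the GPU(s).
+  print(torch.cuda.is_available())   # True when the driver and CUDA runtime are usable
+  print(torch.cuda.device_count())   # e.g. 2 on an H100-2-80G Instance
+  if torch.cuda.is_available():
+      print(torch.cuda.get_device_name(0))  # reports the GPU model of the Instance
+  ```
+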
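+
+  The **Scaling** bullet below mentions splitting a single H100 into up to 7 MIG partitions. As a minimal sketch of what that looks like on an Instance (the flags come from NVIDIA's `nvidia-smi` MIG interface; the `1g.10gb` profile name is an assumption, so list the profiles your GPU actually supports first):
+
+  ```python
+  import subprocess
+
+  def run(cmd: str) -> None:
+      """Run a command (MIG changes need root privileges) and print its output."""
+      print(subprocess.run(cmd.split(), capture_output=True, text=True).stdout)
+
+  run("nvidia-smi -i 0 -mig 1")          # enable MIG mode on GPU 0
+  run("nvidia-smi mig -lgip")            # list the MIG profiles this GPU supports
+  run("nvidia-smi mig -cgi 1g.10gb -C")  # carve out one slice (repeat, comma-separated, for more)
+  ```
+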
* **Scaling:** Consider the scalability requirements of your workload. The most efficient way to scale up your workload is by using: * A bigger GPU - * Up to 2 PCIe GPU + * Up to 2 PCIe GPUs with [H100 Instances](https://www.scaleway.com/en/h100-pcie-try-it-now/) or 8 PCIe GPUs with L4 Instances. * An HGX-based server setup with 8x NVLink GPUs - * A SuperPod like architecture for a larger setup for workload-intensive tasks + * A [supercomputer architecture](https://www.scaleway.com/en/ai-supercomputers/) for larger setups and workload-intensive tasks * Another way to scale your workload is to use [Kubernetes and MIG](/compute/gpu/how-to/use-nvidia-mig-technology/): You can divide a single H100 GPU into as many as 7 MIG partitions, as sketched in the example above. This means that instead of employing seven P100 GPUs to set up seven K8S pods, you could opt for a single H100 GPU with MIG to effectively deploy all seven K8S pods. * **Online resources:** Check for online resources, forums, and community discussions related to the specific GPU type you are considering. This can provide insights into common issues, best practices, and optimizations. Remember that there is no one-size-fits-all answer, and the right GPU Instance type will depend on your workload’s unique requirements and budget. It is important that you regularly reassess your choice as your workload evolves. Depending on which type best fits your evolving tasks, you can easily migrate from one GPU Instance type to another. -## Scaleway GPU Instances types overview +## GPU Instance and AI Supercomputer comparison tables -| | RENDER-S | H100-1-80G | H100-2-80G | +### Scaleway GPU Instance types overview + +| | **[RENDER-S](https://www.scaleway.com/en/gpu-render-instances/)** | **[H100-1-80G](https://www.scaleway.com/en/h100-pcie-try-it-now/)** | **[H100-2-80G](https://www.scaleway.com/en/h100-pcie-try-it-now/)** | |---------------------------------------------------------------------|-------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------| -| GPU Type | 1x [P100](https://www.nvidia.com/en-us/data-center/tesla-p100/) | 1x [H100](https://resources.nvidia.com/en-us-tensor-core/nvidia-tensor-core-gpu-datasheet) | 2x [H100](https://resources.nvidia.com/en-us-tensor-core/nvidia-tensor-core-gpu-datasheet) | +| GPU Type | 1x [P100](https://www.nvidia.com/en-us/data-center/tesla-p100/) PCIe3 | 1x [H100](https://resources.nvidia.com/en-us-tensor-core/nvidia-tensor-core-gpu-datasheet) PCIe5 | 2x [H100](https://resources.nvidia.com/en-us-tensor-core/nvidia-tensor-core-gpu-datasheet) PCIe5 | +| NVIDIA architecture | Pascal 2016 | Hopper 2022 | Hopper 2022 | | Tensor Cores | N/A | Yes | Yes | -| Performance in TFLOPS (FP16 acc 32 Tensor Cores - without sparsity) | (No Tensor Cores : 9,3 TFLOPS FP32) | 1513 TFLOPS | 2x 1513 TFLOPS | -| VRAM | 16 GB HBM2 (Memory bandwidth: 732 GB/s) | 80 GB HBM3 (Memory bandwidth: 2TB/s) | 2x80 GB HBM3 (Memory bandwidth: 2TB/s) | +| Performance (training in FP16 Tensor Cores) | (No Tensor Cores: 9.3 TFLOPS FP32) | 1513 TFLOPS | 2x 1513 TFLOPS | +| VRAM | 16 GB CoWoS HBM2 (Memory bandwidth: 732 GB/s) | 80 GB HBM2E (Memory bandwidth: 2 TB/s) | 2x 80 GB HBM2E (Memory bandwidth: 2 TB/s) | | CPU Type | Intel Xeon Gold 6148 (2.4 GHz) | AMD <br/>
EPYC™ 9334 (2.7 GHz) | AMD EPYC™ 9334 (2.7 GHz) | | vCPUs | 10 | 24 | 48 | | RAM | 42 GB DDR3 | 240 GB DDR5 | 480 GB DDR5 | | Storage | Block/Local | Block | Block | -| Scratch Storage | No | Yes (3 TB NVMe) | Yes (6 TB NVMe) | +| [Scratch Storage](/compute/gpu/how-to/use-scratch-storage-h100-instances/) | No | Yes (3 TB NVMe) | Yes (6 TB NVMe) | +| [MIG compatibility](/compute/gpu/how-to/use-nvidia-mig-technology/) | No | Yes | Yes | | Bandwidth | 1 Gbps | 10 Gbps | 20 Gbps | | Better used for | - Graphics/Computer Vision<br/>
- General Deep Learning usage
- Video encoding/decoding (~4k) | - Large-size model training
- Fine-tune LLMs/transformer models<br/>
- Generative AI
- Optimize GPU workflows & deployments in Kubernetes thanks to MIG | - Large-size model training
- Fine-tune LLMs/transformer models<br/>
- Generative AI
- Optimize GPU workflows & deployments in Kubernetes thanks to MIG | -| Not made for | Large models (especially LLM) | Graphic or video encoding use cases | Graphic or video encoding use cases | +| What they are not made for | Large models (especially LLMs) | Graphics or video encoding use cases | Graphics or video encoding use cases | + +| | **[L4-1-24G](https://www.scaleway.com/en/contact-l4/)** | **[L4-2-24G](https://www.scaleway.com/en/contact-l4/)** | **[L4-4-24G](https://www.scaleway.com/en/contact-l4/)** | **[L4-8-24G](https://www.scaleway.com/en/contact-l4/)** | +|---------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------| +| GPU Type | 1x [L4](https://www.nvidia.com/en-us/data-center/l4/) PCIe4 | 2x [L4](https://www.nvidia.com/en-us/data-center/l4/) PCIe4 | 4x [L4](https://www.nvidia.com/en-us/data-center/l4/) PCIe4 | 8x [L4](https://www.nvidia.com/en-us/data-center/l4/) PCIe4 | +| NVIDIA architecture | Lovelace 2022 | Lovelace 2022 | Lovelace 2022 | Lovelace 2022 | +| Tensor Cores | Yes | Yes | Yes | Yes | +| Performance (training in FP16 Tensor Cores) | 242 TFLOPS | 2x 242 TFLOPS | 4x 242 TFLOPS | 8x 242 TFLOPS | +| VRAM | 24 GB GDDR6 (Memory bandwidth: 300 GB/s) | 2x 24 GB GDDR6 (Memory bandwidth: 300 GB/s) | 4x 24 GB GDDR6 (Memory bandwidth: 300 GB/s) | 8x 24 GB GDDR6 (Memory bandwidth: 300 GB/s) | +| CPU Type | AMD EPYC™ 7413 (2.65 GHz) | AMD EPYC™ 7413 (2.65 GHz) | AMD EPYC™ 7413 (2.65 GHz) | AMD EPYC™ 7413 (2.65 GHz) | +| vCPUs | 8 | 16 | 32 | 64 | +| RAM | 48 GB DDR4 | 96 GB DDR4 | 192 GB DDR4 | 384 GB DDR4 | +| Storage | Block | Block | Block | Block | +| [Scratch Storage](/compute/gpu/how-to/use-scratch-storage-h100-instances/) | No | No | No | No | +| [MIG compatibility](/compute/gpu/how-to/use-nvidia-mig-technology/) | No | No | No | No | +| Bandwidth | 2.5 Gbps | 5 Gbps | 10 Gbps | 20 Gbps | +| Better used for | - Building inference infrastructure<br/>
- Generative AI for visual communication
- Streaming video content
- 3D graphics | - Building inference infrastructure<br/>
- Generative AI for visual communication
- Streaming video content
- 3D graphics | - Building inference infrastructure<br/>
- Generative AI for visual communication
- Streaming video content
- 3D graphics | - Building inference infrastructure<br/>
- Generative AI for visual communication
- Streaming video content
- 3D graphics | +| What they are not made for | - Training of LLMs | - Training of LLMs | - Training of LLMs | - Training of LLMs | + +| | **[L40S GPU Instance](https://www.scaleway.com/en/contact-l40s/)** | +|---------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| GPU Type | [L40S 48 GB PCIe](https://www.nvidia.com/en-us/data-center/l40s/) | +| NVIDIA architecture | Lovelace 2022 | +| Performance (training in FP16 Tensor Cores) | Up to 362 TFLOPS | +| Specifications | Under construction | +| Format & Features | Multi-GPU (under construction) | +| Use cases | - Large DL model training<br/>
- Large DL model inference
- Medium LLM fine-tuning (PEFT) & inference<br/>
- 3D graphics<br/>
- Image/Video processing applications (encoding/decoding (8K)) | +| What they are not made for | | + +### Scaleway AI Supercomputers +| | **[Jeroboam](https://www.scaleway.com/en/ai-supercomputers/)** (2 DGX H100, 16 H100 GPUs) | **[Nabuchodonosor](https://www.scaleway.com/en/ai-supercomputers/)** (127 DGX H100, 1,016 H100 GPUs) | +|---------------------------------------------------------------------|----------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------| +| GPU Type | 16x [H100](https://resources.nvidia.com/en-us-tensor-core/nvidia-tensor-core-gpu-datasheet) (SXM5) | 1,016x [H100](https://resources.nvidia.com/en-us-tensor-core/nvidia-tensor-core-gpu-datasheet) (SXM5) | +| NVIDIA architecture | Hopper 2022 | Hopper 2022 | +| Tensor Cores | Yes | Yes | +| Performance in PFLOPS (FP8 Tensor Cores) | Up to 63.2 PFLOPS | Up to 4,021.3 PFLOPS | +| VRAM | 1,280 GB (total cluster) | 81,280 GB (total cluster) | +| CPU Type | Dual Intel® Xeon® Platinum 8480C Processors (3.8 GHz) | Dual Intel® Xeon® Platinum 8480C Processors (3.8 GHz) | +| Total CPU cores | 224 cores (total cluster) | 14,224 cores (total cluster) | +| RAM | 4 TB (total cluster) | 254 TB (total cluster) | +| Storage | 64 TB of DDN A3I low-latency storage | 1.8 PB of DDN A3I low-latency storage | +| [MIG compatibility](/compute/gpu/how-to/use-nvidia-mig-technology/) | Yes | Yes | +| Inter-GPU bandwidth | InfiniBand 400 Gb/s | InfiniBand 400 Gb/s | + +### NVIDIA GH200 Superchip + +| | **[GH200 Grace Hopper™](https://www.scaleway.com/en/contact-gh200/)** | +|---------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| GPU Type | NVIDIA [GH200 Grace Hopper™ Superchip](https://www.nvidia.com/en-us/data-center/grace-hopper-superchip/) | +| NVIDIA architecture | GH200 Grace Hopper™ Architecture | +| Performance | 990 TFLOPS (in FP16 Tensor Cores) | +| Specifications | - GH200 Superchip with 72 Arm Neoverse V2 cores<br/>
- 480 GB of LPDDR5X DRAM
- 96 GB of HBM3 GPU memory<br/>
(Memory is fully merged for up to 576 GB of global usable memory) | +| [MIG compatibility](/compute/gpu/how-to/use-nvidia-mig-technology/) | Yes | +| Inter-GPU bandwidth (for clusters up to 256 GH200) | NVLink Switch System 900 GB/s | +| Format & Features | From a single chip up to GH200 clusters. (For larger setups, [contact us](https://www.scaleway.com/en/contact-ai-supercomputers/)) | +| Use cases | - Extra-large LLM and DL model inference<br/>
- HPC | +| What they are not made for | - Graphics<br/>
- (Training) | \ No newline at end of file diff --git a/compute/gpu/reference-content/understanding-nvidia-fp8.mdx b/compute/gpu/reference-content/understanding-nvidia-fp8.mdx index fb75751aaa..77b688cbc8 100644 --- a/compute/gpu/reference-content/understanding-nvidia-fp8.mdx +++ b/compute/gpu/reference-content/understanding-nvidia-fp8.mdx @@ -13,7 +13,7 @@ categories: - compute --- -With the release of the H100 GPU, NVIDIA introduced support for a new datatype called FP8 (8-bit floating point), enabling higher throughput of matrix multipliers and convolutions. +Scaleway offers GPU Instances featuring [L4 and H100 GPUs](https://www.scaleway.com/en/h100-pcie-try-it-now/) that support FP8 (8-bit floating point), a datatype introduced by NVIDIA that enables higher throughput for matrix multiplications and convolutions. FP8 is an 8-bit floating point standard which was jointly developed by NVIDIA, ARM, and Intel to speed up AI development by improving memory efficiency during AI training and inference processes. diff --git a/console/account/reference-content/organization-quotas.mdx b/console/account/reference-content/organization-quotas.mdx index 4c69747835..966177b2d8 100644 --- a/console/account/reference-content/organization-quotas.mdx +++ b/console/account/reference-content/organization-quotas.mdx @@ -146,6 +146,11 @@ At Scaleway, quotas are applicable per [Organization](/identity-and-access-manag | GPU 3070-S | To use this product, you must [validate your identity](/console/account/how-to/verify-identity/). | 1 | | H100-1-80G | To use this product, you must [contact our support team](https://console.scaleway.com/support/create). | To use this product, you must [contact our support team](https://console.scaleway.com/support/create). | | H100-2-80G | To use this product, you must [contact our support team](https://console.scaleway.com/support/create). | To use this product, you must [contact our support team](https://console.scaleway.com/support/create). | +| L4-1-24G | To use this product, you must [contact our support team](https://console.scaleway.com/support/create). | 1 | +| L4-2-24G | To use this product, you must [contact our support team](https://console.scaleway.com/support/create). | 1 | +| L4-4-24G | To use this product, you must [contact our support team](https://console.scaleway.com/support/create). | 1 | +| L4-8-24G | To use this product, you must [contact our support team](https://console.scaleway.com/support/create). | 1 | + ## Apple silicon diff --git a/console/account/reference-content/products-availability.mdx b/console/account/reference-content/products-availability.mdx index 9f9abf033f..08b2bfc8ad 100644 --- a/console/account/reference-content/products-availability.mdx +++ b/console/account/reference-content/products-availability.mdx @@ -21,10 +21,13 @@ Scaleway products are available in multiple regions and locations worldwide. <br/>
Thi * Amsterdam: AMS1, AMS2, AMS3 * Warsaw: WAW1, WAW2, WAW3 -| Product Category | Product | Paris region | Amsterdam region | Warsaw region | +| Product Category | Product | Paris region | Amsterdam region | Warsaw region | |---------------------------|---------------------------------------|------------------------|-------------------------|------------------------| | **Compute** | Instances | PAR1, PAR2, PAR3 | AMS1, AMS2, AMS3 | WAW1, WAW2, WAW3 | -| | GPU | PAR1, PAR2 | Not available yet | Not available yet | +| | GPU RENDER-S | PAR1 | Not available yet | Not available yet | +| | GPU 3070-S | PAR2 | Not available yet | Not available yet | +| | GPU L4-X-24G | PAR2 | Not available yet | WAW2 | +| | GPU H100-X-80G | PAR2 | Not available yet | WAW2 | | **Bare Metal** | Elastic Metal | PAR1, PAR2 | AMS1 | Not available yet | | | Apple Silicon | PAR1, PAR3 | Not available yet | Not available yet | | | Dedibox | DC2, DC3, DC5 | AMS1 | Not available yet |