- Install terraform, jq, aws-iam-authenticator, aws-cli, and kubectl.
- Configure the AWS CLI if you have not already. Instructions can be found here.
- Modify `terraform.tfvars` as necessary.
- Run `terraform init`.
- Run `terraform apply --target=module.eks`.
- Run `terraform apply --target=module.kubernetes`.
After following these steps, your cluster should be up and running. You should see two outputs, `jupyter_url` and `dask_url`. Visit `jupyter_url` to access the notebook server attached to the cluster, and `dask_url` to view information on Dask jobs. Because containers in the cluster take time to set up, these links may not work for a minute or two. Make sure to include the port number when opening the URLs in the browser.
To scale the number of nodes in the cluster, change `spot_node_count` in `terraform.tfvars` and run `terraform apply --target=module.eks` again.
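
As a sketch, scaling out to ten worker nodes is a one-line change (the value 10 is only an illustration):

```hcl
# terraform.tfvars
spot_node_count = 10  # number of spot machines available to Dask workers
```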
To shut down non-essential components of the cluster and save costs when it is not in use, set `hibernate` in `terraform.tfvars` to `true` and run `terraform apply --target=module.eks`. To recreate these components when you would like to work with the cluster again, set `hibernate` to `false` and run `terraform apply --target=module.eks` again.
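
For example, hibernating and waking the cluster is a single toggle in `terraform.tfvars`:

```hcl
# terraform.tfvars
hibernate = true  # set back to false (and re-apply) to recreate the non-essential components
```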
- Run `terraform destroy --target=module.kubernetes`
- Run `terraform destroy --target=module.eks`
To run `kubectl` commands, simply append `--kubeconfig ./kubeconfig.yml` to any command. For example, if you would like to retrieve a list of pods, run `kubectl get pods --kubeconfig ./kubeconfig.yml`.
If you need to use a package not included in the default Docker image, you can build a custom image with Docker.

- Download and start Docker
- Log in to or create an account on Docker Cloud
- Navigate to the `docker` folder
- Modify `environment.yml` to include the packages you need
- Run `docker build -t <docker-cloud-username>/<image-name>:latest .` to build your image locally. This process may take several minutes.
- Run `docker push <docker-cloud-username>/<image-name>:latest` to upload your image
- Change `worker_image` in `terraform.tfvars` to `<docker-cloud-username>/<image-name>:latest` (see the sketch after this list)
- Deploy the cluster as above
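
The `worker_image` change from the list above would look roughly like this in `terraform.tfvars`, keeping the same placeholders used in the build and push steps:

```hcl
# terraform.tfvars
worker_image = "<docker-cloud-username>/<image-name>:latest"  # image pulled by the Dask workers
```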
To customize basic cluster details, such as the cluster name and region, modify the corresponding values in `terraform.tfvars` (`cluster_name`, `cluster_region`). Make sure that the value you select for `cluster_region` is a valid AWS region.
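
As a sketch, with placeholder values (any cluster name works, but the region must be a real AWS region):

```hcl
# terraform.tfvars
cluster_name   = "dask-cluster"  # example name
cluster_region = "us-east-1"     # example region
```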
To customize the machine type of your nodes, modify `stable_instance_type` and `spot_instance_type` in `terraform.tfvars`. `stable_instance_type` defines the type of machine that your Dask scheduler and notebook will run on, while `spot_instance_type` defines the type of machine that your Dask workers will run on. Ensure that these are valid AWS machine types. If you are modifying `stable_instance_type`, make sure a single machine of that type has enough CPU and RAM for your Dask scheduler and notebook combined (see Dask Configuration and Jupyter Configuration below). If you are modifying `spot_instance_type`, you should also modify `spot_price` correspondingly. All Dask workers run on spot instances to lower costs, and `spot_price` defines the maximum price that your instances will be scheduled at. If you are unsure how to proceed, simply set `spot_price` to the on-demand price of your `spot_instance_type`.
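
As an illustration only, a machine-type block in `terraform.tfvars` might look like the sketch below. The instance types and price are placeholders, not recommendations; check current AWS pricing for whatever instance type and region you pick, and match the value format (quoted or bare) to how the variable is declared in the module.

```hcl
# terraform.tfvars
stable_instance_type = "m5.xlarge"  # hosts the Dask scheduler and the notebook server
spot_instance_type   = "m5.xlarge"  # hosts the Dask workers
spot_price           = "0.20"       # maximum hourly bid for spot instances (USD); placeholder value
```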
To adjust the number of nodes in your cluster, modify `spot_node_count` in `terraform.tfvars`. This will determine the number of machines to allocate to Dask workers. A single machine of type `stable_instance_type` will always be allocated to the Dask scheduler and notebook server. If you set `spot_node_count` to 10, for instance, your cluster will consist of 11 machines, 10 of which contain Dask workers, and one of which contains the Dask scheduler and notebook server.
To customize the configuration of both your Dask scheduler and your pool of Dask workers, several variables are provided. `dask_worker_count` defines the number of Dask workers that your cluster will attempt to schedule. To ensure that the cluster is always making full use of its resources, it is helpful to set this to a large value (e.g. 100 or 1000) so that all capacity will be used. `dask_worker_milli_cpu`, `dask_worker_mb_ram`, `dask_worker_procs`, and `dask_worker_threads` allow you to fine-tune the resources provided to your Dask workers. Similarly, `dask_scheduler_milli_cpu` and `dask_scheduler_mb_ram` allow you to fine-tune the resources provided to your Dask scheduler. The default values should suffice in most cases. If you would like to modify these variables, take care to ensure that your Dask scheduler and notebook can still be scheduled on a single machine of type `stable_instance_type`.
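
A hedged sketch of these variables in `terraform.tfvars`; the numbers are placeholders chosen only to show the units (milli-CPU, MB of RAM), and the repository defaults are usually fine:

```hcl
# terraform.tfvars
dask_worker_count        = 100   # workers the cluster will try to schedule; a large value keeps capacity full
dask_worker_milli_cpu    = 1000  # 1000 milli-CPU = 1 vCPU per worker
dask_worker_mb_ram       = 4000  # RAM per worker, in MB
dask_worker_procs        = 1     # processes per worker
dask_worker_threads      = 2     # threads per worker process
dask_scheduler_milli_cpu = 1000  # CPU reserved for the scheduler
dask_scheduler_mb_ram    = 4000  # RAM reserved for the scheduler, in MB
```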
To customize the configuration of your Jupyter notebook, several variables are provided. `jupyter_milli_cpu` and `jupyter_mb_ram` function in the same way as `dask_scheduler_milli_cpu` and `dask_scheduler_mb_ram` described above. As above, if you modify these variables, take care to ensure that your Dask scheduler and notebook can still be scheduled on a single machine of type `stable_instance_type`. `jupyter_gb_storage` defines the amount of disk space given to your notebook.
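
A similar sketch for the notebook, again with placeholder numbers:

```hcl
# terraform.tfvars
jupyter_milli_cpu  = 1000  # CPU reserved for the notebook server (1000 = 1 vCPU)
jupyter_mb_ram     = 4000  # RAM reserved for the notebook server, in MB
jupyter_gb_storage = 20    # size of the notebook's persistent disk, in GB
```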
─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
│ │
╔═══════════════════════╗
│ ║ Stable Node ║ │
╔════════════╗ ║ ║
│ ║ ║ ║ ┌───────┐ ┌─────────┐ ║ │
║ EBS Volume ║─┐║ │ │ │ │ ║
│ ║ ║ └─▶│Jupyter│ │Scheduler│ ║ │
╚════════════╝ ║ │ │ │ │ ║
│ ║ └───────┘ └─────────┘ ║ │
╚═══════════════════════╝
│ ▲ │
┌───────────────────────────┼───────────────────────────┐
│ ┌ ─ ─ ─ ─ ─ ─ ▼ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ▼ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ▼ ─ ─ ─ ─ ─ ─ ┐ │
╔═══════════════════════╗ ╔═══════════════════════╗ ╔═══════════════════════╗
│ │ ║ Spot Node ║ ║ Spot Node ║ ║ Spot Node ║ │ │
║ ║ ║ ║ ║ ║
│ │ ║ ┌──────┐ ┌──────┐ ║ ║ ┌──────┐ ┌──────┐ ║ ║ ┌──────┐ ┌──────┐ ║ │ │
║ │ │ │ │ ║◀─▶║ │ │ │ │ ║◀─▶║ │ │ │ │ ║
│ │ ║ │Worker│ ... │Worker│ ║ ║ │Worker│ ... │Worker│ ║ ║ │Worker│ ... │Worker│ ║ │ │
║ │ │ │ │ ║ ║ │ │ │ │ ║ ║ │ │ │ │ ║
│ │ ║ └──────┘ └──────┘ ║ ║ └──────┘ └──────┘ ║ ║ └──────┘ └──────┘ ║ │ │
╚═══════════════════════╝ ╚═══════════════════════╝ ╚═══════════════════════╝
│ │ Spot Autoscaling Group │ │
─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
│ EKS │
─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─