March 21, 2025
When the SimpleRose engineering team went looking for a guide on how to deploy cuOpt for our specific needs, none existed because the technology is so groundbreaking. When we searched the internet for how to get started, we wished we had found an article exactly like this one. Now we want to share our findings so you too can benefit from the power of cuOpt.
NVIDIA has a guide for using a self-hosted server to run cuOpt, but many of its requirements conflicted with our desire to use a managed node group within AWS (Amazon Web Services) EKS (Elastic Kubernetes Service) along with an EKS optimized AMI (Amazon Machine Image) for our EC2 (Elastic Compute Cloud) instances. This left the SimpleRose engineering team with many questions, such as, “Is this even possible?” It turns out the answer is, “Yes! It is possible,” by following this guide.
Step 1: Select the AL2023_x86_64_NVIDIA AMI for your EC2 instance
Given our desire to run cuOpt as a managed node group within EKS, we first wanted to find a suitable EKS-optimized AMI to use. Rather than manually installing the required software components ourselves, we found that the EKS optimized AMI documentation references a set of AMIs called the Amazon EKS optimized accelerated Amazon Linux AMIs, which come pre-installed with, among other things, the following components that may be beneficial for our use case:
- NVIDIA drivers
- nvidia-container-toolkit
This AWS blog post from October 2024 gives additional detail on these AMIs, in particular that they are offered in several variants, including:
- AL2023_x86_64_NVIDIA – Amazon Linux 2023 x86 NVIDIA variant that includes the CUDA drivers
- AL2023_x86_64_NEURON – Amazon Linux 2023 x86 Neuron variant that includes the kernel and drivers for AWS Neuron devices
The NVIDIA variant seemed like a great AMI to start with, given that it included some of the software and drivers needed for cuOpt.
Note: These AMIs are not available on the AWS Marketplace. Metadata about these AMIs, including the AMI ID and release version, will likely be needed by your IaC (infrastructure as code) tools and can be retrieved via AWS Systems Manager (SSM) parameters.
To retrieve the AMI ID for AL2023_x86_64_NVIDIA, you can run the following command, replacing [YOUR EKS VERSION] and [YOUR REGION] with the relevant values:
aws ssm get-parameter --name /aws/service/eks/optimized-ami/[YOUR EKS VERSION]/amazon-linux-2023/x86_64/nvidia/recommended/image_id --region [YOUR REGION] --query "Parameter.Value" --output text
To retrieve the AMI version for AL2023_x86_64_NVIDIA, you can run the following command, replacing [YOUR EKS VERSION] and [YOUR REGION] with the relevant values:
aws ssm get-parameter --name /aws/service/eks/optimized-ami/[YOUR EKS VERSION]/amazon-linux-2023/x86_64/nvidia/recommended/release_version --region [YOUR REGION] --query "Parameter.Value" --output text
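For example, for a hypothetical cluster running EKS version 1.31 in us-east-1 (substitute your own version and region), the two lookups would look like this:
aws ssm get-parameter --name /aws/service/eks/optimized-ami/1.31/amazon-linux-2023/x86_64/nvidia/recommended/image_id --region us-east-1 --query "Parameter.Value" --output text
aws ssm get-parameter --name /aws/service/eks/optimized-ami/1.31/amazon-linux-2023/x86_64/nvidia/recommended/release_version --region us-east-1 --query "Parameter.Value" --output text
Each command prints a single value (the AMI ID or the release version) that can then be fed into your IaC tooling.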
We launched an EC2 instance as part of a managed node group using the AL2023_x86_64_NVIDIA AMI. After launching our instance, we logged into the machine using SSH and verified that the NVIDIA drivers and CUDA were installed via the following command:
$ nvidia-smi
Running this command should produce output like the following, where the NVIDIA driver version and CUDA version are displayed across the top:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.05               Driver Version: 560.35.05      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------|
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A10G                    On  |   00000000:00:1E.0 Off |                    0 |
|  0%   28C    P0             56W /  300W |    1280MiB /  23028MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
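Since the AMI also ships with the nvidia-container-toolkit, you can optionally confirm that it is present as well. A minimal check (the exact version string will vary with the AMI release):
$ nvidia-ctk --version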
Step 2: Install NVIDIA device plugin for Kubernetes
While researching which Amazon EKS optimized accelerated AMI to start with, we found this AWS guide for running GPU accelerated containers (Linux on EC2). This guide installs a component in your cluster called the NVIDIA device plugin for Kubernetes. This plugin runs as a DaemonSet in your cluster and allows you to:
- Expose the number of GPUs on each node of your cluster
- Keep track of the health of your GPUs
- Run GPU enabled containers in your Kubernetes cluster
The DaemonSet can be installed with this command, replacing vX.X.X with a release version listed on the NVIDIA k8s-device-plugin releases page:
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/vX.X.X/deployments/static/nvidia-device-plugin.yml
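Before checking the nodes, you can confirm that the plugin’s pods are up. The namespace and DaemonSet name below assume the defaults from the static manifest and may differ between releases:
kubectl get daemonset -n kube-system nvidia-device-plugin-daemonset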
After installing the DaemonSet, you can run the below command to ensure that your new managed node is exposing a GPU to Kubernetes:
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
This command should result in a table that looks like the following, where each row is a node/EC2 instance in your EKS cluster. The row that corresponds to the node created in Step 1 should have a GPU value that is not [none]:
| NAME | GPU |
|---|---|
| IP-10-1-1-1.[YOUR REGION].compute.internal | 1 |
| IP-10-1-1-2.[YOUR REGION].compute.internal | [none] |
| IP-10-1-1-3.[YOUR REGION].compute.internal | [none] |
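If you want more detail than the table above provides, describing the GPU node (node name taken from the table above; yours will differ) shows nvidia.com/gpu listed under both Capacity and Allocatable:
kubectl describe node IP-10-1-1-1.[YOUR REGION].compute.internal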
Step 3: Verify GPU access
After installing the NVIDIA device plugin for Kubernetes DaemonSet, we wanted to verify GPU access with a simple ‘hello world’-type workload. The manifest below attempts to place a pod running a container that performs some vector addition on the GPU and then exits:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 GPU
  tolerations:
    - key: nvidia.com/gpu
      operator: Exists
      effect: NoSchedule
Note that depending on your pod placement strategy and the taints applied to your nodes, you may need to add additional tolerations to the above manifest so that the pod can be scheduled on your accelerated AMI node.
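Assuming the manifest above is saved locally as gpu-pod.yaml (the filename is arbitrary), it can be applied and watched until the container runs to completion:
kubectl apply -f gpu-pod.yaml
kubectl get pod gpu-pod --watch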
After applying the above manifest, checking the Kubernetes logs for gpu-pod should show the following:
$ kubectl logs gpu-pod
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
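Since the pod runs to completion and exits, it can be cleaned up once the logs have been verified:
kubectl delete pod gpu-pod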
Step 4: Install and Run cuOpt
Now it was time for the grand finale – installing and running cuOpt!
The NVIDIA self-hosted server guide has several options for installing cuOpt, including:
- Using docker run commands to run the cuOpt container
- Deploying the cuOpt container via helm
Since the container runtime installed by our AMI is containerd (not Docker), method #1 of using docker run commands was not an option for us. To simplify things for this experiment, we opted to create our own deployment manifest that references a cuOpt container image we pushed to a private ECR (Elastic Container Registry) repository. The manifest looks something like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: cuopt
  name: cuopt-deployment
  namespace: rose
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cuopt
  template:
    metadata:
      labels:
        app: cuopt
    spec:
      containers:
        - name: cuopt
          image: [URI for cuopt image in private ECR repo]
          ports:
            - containerPort: 5000
          env:
            - name: CUOPT_SERVER_LOG_LEVEL
              value: "debug"
          resources:
            requests:
              cpu: 15
              memory: "48Gi"
            limits:
              nvidia.com/gpu: 1
We’d like to call special attention to the resources:limits:nvidia.com/gpu key in the above manifest. Initially we omitted this key. The cuOpt pod was still placed correctly and made it into a Running state, but cuOpt emitted a Maximum pool size exceeded error message to the logs every second or so:
2025-02-21 21:02:27.070 INFO cuopt received signal 17
2025-02-21 21:02:27.521 INFO Starting new process with pid 3526
Process Process-1165:
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.11/dist-packages/cuopt_server/utils/solver.py", line 331, in process_async_solve
    pool = rmm.mr.PoolMemoryResource(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "memory_resource.pyx", line 441, in rmm._lib.memory_resource.PoolMemoryResource.__cinit__
MemoryError: std::bad_alloc: out_of_memory: RMM failure at:/tmp/pip-build-env-261ye19l/normal/lib/python3.11/site-packages/librmm/include/rmm/mr/device/pool_memory_resource.hpp:277: Maximum pool size exceeded
After adding the missing resources:limits:nvidia.com/gpu key and redeploying the manifest, cuOpt was able to start without issues.
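As a quick smoke test, you could also port-forward to the Deployment and query the server on port 5000 from a second terminal. The /cuopt/health route below is our assumption; consult the cuOpt server documentation for the endpoints exposed by your container version:
kubectl -n rose port-forward deployment/cuopt-deployment 5000:5000
curl http://localhost:5000/cuopt/health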
Conclusion
Now that SimpleRose has cuOpt running, we are able to solve optimization problems an order of magnitude faster by leveraging the power of Rose+cuOpt together. See our joint blog article with NVIDIA for the data behind our performance increases. If you would like to try out Rose+cuOpt together, contact us today. It is offered as a fully managed solver-as-a-service solution, so you will not need to go through the trouble of installing cuOpt yourself.
About SimpleRose
SimpleRose transforms business-critical data into optimized, actionable strategies, driving better outcomes through advanced decision analysis. The first cloud native solver, Rose leverages massive parallelization and near limitless scalability to solve complex optimization problems anywhere at unprecedented speed, eliminating the need for installation and maintenance while achieving speedups across hundreds of machines.
About NVIDIA cuOpt
NVIDIA cuOpt is a high-performance GPU-accelerated solver designed for real-time route optimization and logistics planning. It leverages CUDA and NVIDIA’s parallel computing architecture to efficiently solve large-scale vehicle routing problems (VRP) and other combinatorial optimization challenges.