Running Optimization Workloads on Kubernetes

Published on February 2, 2026

It’s 7:15 on a Tuesday morning. Your technicians are rolling out with their trucks half-loaded, ready to start the day’s repair and maintenance jobs. But something is wrong with the scheduling system: it isn’t showing today’s schedule. Every extra minute of delay compounds into a messier day ahead, with frustrated customers and dispatchers manually assigning technicians to jobs.

Whether you’re planning manufacturing runs, customer deliveries, power production, or field-technician schedules, building an enterprise-grade, optimization-powered decision support system takes more than good data, models, and a polished UI. What’s often underestimated is the complexity of the execution environment these systems run on, and how much infrastructure design decisions affect the system’s reliability, performance, and cost.

Why Infrastructure Matters Particularly for Optimization

Many people are shocked when they first see the compute demands of even seemingly small planning or scheduling applications built on optimization. Imagine planning efficient routes for just 10 trucks that must visit 100 locations. The number of possible routes is larger than the number of atoms in the observable universe. Thankfully, commercial optimization solvers can find good, or even provably optimal, solutions without testing every possible route. Even so, many real-world problems of this kind still take hours to run, even with aggressive multi-threading on a single machine.
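For a rough sanity check on that claim: merely ordering 100 stops in a single sequence admits 100! possibilities, which already dwarfs the commonly cited estimate of ~10^80 atoms in the observable universe. A quick sketch in Python (dividing the stops among 10 trucks only increases the count):

```python
import math

# Number of ways to order 100 stops in one sequence. This is a lower bound:
# splitting the stops among 10 trucks adds further choices.
routes = math.factorial(100)

ATOMS_IN_UNIVERSE = 10 ** 80  # common order-of-magnitude estimate

print(f"100! ~= {routes:.2e}")     # on the order of 9.33e+157
print(routes > ATOMS_IN_UNIVERSE)  # True
```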

As more of these applications move to the cloud, combined with the growth in problem sizes and fidelity demands that we see in the market, there’s a big opportunity to rethink how this compute is used. When 99%+ of your total compute time is spent inside the solver itself, distributing only the surrounding application components doesn’t help much. Real gains come from distributing the solve itself: taking advantage of multiple CPUs, and in some cases GPUs, to parallelize the work beyond what a single machine can handle.

For these reasons, SimpleRose has spent the past several years building the world’s first massively parallel optimization solver. At this scale and level of parallelism, the demands placed on the underlying platform are significant. 

You can’t just launch optimization workloads and hope for the best. The platform has to actively manage what happens while a solve is in progress. Workers need to reliably find each other, failures need to be detected and recovered from automatically, and compute has to scale up and down as the solver’s needs change. Without that, solves become fragile, expensive to rerun, and unpredictable in both performance and cost.

Supporting this reliably requires an execution environment that can scale compute resources up and down dynamically, recover from partial failures, and adapt to the solver’s actual runtime behavior. Those requirements don’t go away just because of where the solver is deployed. They apply whether a solve is running in a managed cloud environment or inside a customer’s own data center.

In practice, not every organization can run optimization workloads using a pure SaaS offering. Some operate under strict regulatory or data residency constraints, while others require full control over their infrastructure for performance, predictability, or isolation. For these teams, the platform needs to deliver the same reliability, scalability, and ease of use on-prem as it does in the cloud.

This is what drove our approach to the underlying platform. We needed a foundation that could meet the demands of large-scale, distributed optimization while remaining portable across environments. Kubernetes gives us that foundation, and it’s what enables the “fire-and-forget” experience we’re aiming for: solves that just run, scale as needed, recover from failures, and do so without forcing OR teams to become infrastructure experts.

Design considerations at the platform layer

The requirements above directly informed how we designed the underlying platform. We needed something that was portable, cost-efficient, and flexible enough to adapt to changing hardware and deployment environments, but without adding operational burden for our customers or for us.

First: portability.
While we primarily offer the solver as a SaaS-based solution, portability is still a core requirement. Some customers need to deploy the platform in their own environments due to regulatory constraints or a desire to use existing hardware and contracts. At the same time, compute options are evolving quickly. New instance types, specialized accelerators, and region-specific pricing can have a meaningful impact on both performance and cost. Locking ourselves into specific providers would limit our customers’ ability to move quickly, optimize spend, or run solves wherever they get the best performance per dollar.

We considered building multiple, fully provider-native versions of the platform but decided against it. In practice, that approach tends to fragment behavior over time. Features become available in one environment before another, performance characteristics diverge, and subtle infrastructure differences start to leak into the solver itself: the solver and its coordination logic begin having to account for environment-specific behavior. Differences in networking, storage, node discovery, failure semantics, or scaling behavior may be small in isolation, but they add up. To compensate, logic that should be purely about optimization ends up containing special cases, conditional paths, or tuning that depends on where the system is running.

From a customer perspective, this all leads to inconsistent performance, uneven capabilities, and different operational models depending on where the platform is deployed. Instead, we wanted a single platform that behaves the same way everywhere, while still allowing customers to take advantage of the best compute and pricing options available in each environment.

Also on the topic of portability, we’re seeing growing interest in running parts of optimization workloads on newer hardware, such as NVIDIA GPUs. Portability makes it possible to place workloads on the hardware that makes the most sense, even when that hardware isn’t offered competitively by a customer’s primary provider.

Second: cost efficiency and operational leverage.
Managing our own compute fleet simply isn’t a good use of our engineering time. Operating autoscaling groups, tracking capacity health, patching hosts, and managing the machine lifecycle is expensive and distracting. We’re a focused team, and we want our effort going into solver capabilities, not maintaining commodity infrastructure. That led to another core design goal: avoid managing raw compute wherever possible and, instead, rely on a platform layer that can handle scaling and right-sizing automatically.

Third: containerization as the execution model.
Optimization platforms naturally consist of components with very different performance characteristics and implementation needs: high-performance compute kernels, distributed coordination layers, web APIs, data pipelines, and more. We wanted the freedom to use the right language, runtime, and framework for each of these pieces. Containerization gives us a clean, consistent deployment unit that isolates those choices and lets components evolve independently while still fitting together as a single system. It also ensures that whether we’re running on CPUs or GPUs, ARM or x86, on-prem or in the cloud, the platform behaves predictably.

How did we accomplish this?

There are several popular technologies that support container execution and orchestration, including Kubernetes, Docker Swarm, and HashiCorp Nomad, with Kubernetes being the de facto industry standard. Kubernetes also offers a mature packaging and deployment mechanism in Helm, giving us the ability to package and distribute our platform to any cloud or customer on-prem environment that supports Kubernetes. Based on industry adoption and our portability requirements, our team decided to move forward with Kubernetes for our initial design.

Kubernetes offers several complementary technologies to provide the automated compute elasticity required by our platform. Beyond elasticity and portability, we also have several additional requirements driven by the distributed nature of our solver:

  1. Solve compute nodes require a seamless way to register and discover the other nodes participating in a solve
  2. Different clients require different SLAs and performance characteristics for their solves
  3. Long-running solve tasks should be self-healing if a compute node for the solve task fails

Given these requirements, we implemented our solve tasks using the Kubernetes Indexed Job resource. An Indexed Job can be configured to launch multiple pods for an optimization task with stable, well-defined DNS entries for pod-to-pod communication. Kubernetes Job success and failure policies allow jobs to self-heal. Affinity rules, taints, and tolerations ensure jobs are placed appropriately for a given customer’s required performance characteristics, maximizing the compute available to a solve task on its nodes while avoiding noisy-neighbor effects on other solve tasks.
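As an illustration, a distributed solve along these lines might be expressed roughly as follows. This is a hedged sketch, not our actual configuration: the job name, image, worker count, and resource requests are all placeholders.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: solve-task-42            # hypothetical solve ID
spec:
  completionMode: Indexed        # gives each pod a stable index and hostname
  completions: 8                 # eight solver workers per solve (illustrative)
  parallelism: 8
  backoffLimit: 4                # failed workers are retried, enabling self-healing
  template:
    spec:
      subdomain: solve-workers   # a matching headless Service provides per-pod DNS
      restartPolicy: Never
      containers:
        - name: worker
          image: registry.example.com/solver-worker:latest  # placeholder image
          resources:
            requests:
              cpu: "8"
              memory: 32Gi
```

With `completionMode: Indexed`, each pod receives a `JOB_COMPLETION_INDEX` environment variable and a hostname of the form `solve-task-42-<index>`, so workers can discover their peers through predictable DNS names under the headless Service.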

To meet our automated elasticity and portability requirements while decoupling our platform from any specific cloud provider, we use the Kubernetes Cluster Autoscaler. Cluster Autoscaler monitors compute requests and attempts to satisfy them using configured scaling rules along with bindings to the underlying cloud provider’s scaling services. When an Indexed Job is submitted, Kubernetes attempts to place it; if it cannot be placed, infrastructure is dynamically scaled out via Cluster Autoscaler and scaled back in when the solve task completes.
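For instance, steering a high-priority customer’s solves onto a dedicated, autoscaled node group can be sketched with a node selector and toleration in the Job’s pod template. The label and taint names here are illustrative, not our actual configuration:

```yaml
# Pod-template fragment (hypothetical labels and taints)
spec:
  nodeSelector:
    workload-class: solver-highmem   # assumed label on a dedicated node group
  tolerations:
    - key: dedicated
      operator: Equal
      value: solver
      effect: NoSchedule             # only solver pods tolerate this taint
```

Because Cluster Autoscaler scales node groups based on pending pods, such a group can sit empty until a matching solve arrives, scale out to place it, and scale back in once the Job completes.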

Benefits For Our Customers

All of these design choices exist for one reason: they make optimization easier, faster, and more predictable for our customers.

Fast, isolated, and predictable optimization runs
Each solve runs in an isolated, well-defined execution environment. Compute resources are allocated specifically for that solve and are not shared unpredictably with other workloads. Distributed solve workers can reliably find each other and recover automatically from failures. This leads to consistent performance, even for large, long-running models.

No need to become infrastructure experts
Customers do not need to install solvers or configure clusters. They do not need to manage scaling rules or monitor machines. The platform handles orchestration, recovery, and placement automatically. Teams can focus on modeling and results instead of infrastructure.

Costs that scale with real usage
Compute capacity is added only when a solve needs it and released when the solve finishes. This avoids paying for idle machines during quiet periods, and it avoids over-provisioning for worst-case scenarios. Spiky and long-running jobs alike remain cost-efficient.

Freedom to use on-prem compute
The same platform runs in the cloud or inside a customer data center. Customers can deploy SimpleRose on their own Kubernetes environments and use existing idle hardware. This reduces cloud spend and helps meet regulatory or data residency requirements. It also provides full control without changing how solves are submitted or managed.

In short, Kubernetes allows us to deliver a true fire-and-forget optimization experience. Customers get speed, reliability, and flexibility without taking on operational complexity.