Which LLM Builds the Best Optimization Model?


Published on November 6, 2025 by Brian Schaefer

You do! When I asked seven different AI models to solve the same optimization problem, I got seven completely different answers. Every LLM produced something that looked like a model. But when I dug deeper, the differences were striking. Some models were mathematically infeasible. Others ran successfully but gave incorrect or meaningless results. A few produced code that was almost right, yet still required human intervention to fix small but critical errors.

As much as I tried to replace myself, I found that even with today’s advanced language models, a human is still required to be a part of the optimization model building process. In optimization modeling, AI can assist, but it can’t replace the expertise, intuition, and validation that humans bring. In my experiment, each LLM was given the same instructions, the same data, the same list of constraints, and the same goal: to build an optimization model using the Pyomo open-source Python library.

Every LLM could produce code. None could produce trust.

A Vehicle Routing Example

One of the clearest examples came from a vehicle routing problem. I gave the same prompt to seven different models and compared how each formulated the problem. I also formulated the problem myself.

The results? No two robots agreed. Each used different variables, constraints, and logic. Some even misinterpreted the structure of the problem entirely. Out of the eight models tested, three of the LLMs produced infeasible or incorrect solutions. The table below shows the drastic variation in constraint types across models: each number is how many constraints of that type appeared in the model the LLM generated.

Table showing results of constraint types by LLM

My original LLM prompt in this case was for a 9-node network. I always recommend starting with a small toy-sized dataset so that you can validate the solution results by hand. Once I had each LLM’s model structure, I swapped in the actual data I wanted for the vehicle routing problem, which had many more nodes.

When I ran the same models with a larger dataset, the runtime differences were dramatic: some solved in three seconds, while others took over an hour.

So even if an AI manages to build a small model that gives the right answer, there might be another, more efficient way to model it, one that a human could recognize and make 1,000x faster.

Optimization modeling isn’t just about writing code that runs. It’s about understanding the real-world meaning of the problem and ensuring that the math reflects it accurately. Good models require judgment. They balance tradeoffs, interpret constraints, and capture relationships that make sense in context. That blend of precision and creativity is what makes optimization both a science and an art.

I tested a vehicle routing problem, a network optimization problem, and a job-shop scheduling problem. Some LLMs were good at one type but bad at another. That inconsistency makes it difficult to trust that any single LLM “understands” the problem or that any single LLM is best at building optimization models.

If you’re using LLMs to build optimization models, you need a process to validate the methodology, not just the output.

A Simple Workflow That Works

Over time, I’ve found a process that keeps the best of both worlds — the speed of AI and the reliability of human validation:

  1. Start small. Build a toy-sized version of your model using a tiny dataset. Make sure you can verify the outputs by hand.
  2. Validate it. Check that the constraints behave logically and the objective makes sense.
  3. Scale up. Once it works, use the same code on your full dataset. Just swap the data source. Don’t rewrite the model.

This workflow lets you move fast without losing control.
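Step 1 is easiest to appreciate with a concrete toy. Here is a brute-force check I might run on a hypothetical 4-node round trip (this instance is my own illustration, not the 9-node network from the experiment): the problem is small enough to enumerate every tour, so you can confirm the optimum by hand before trusting any generated model on the same data.

```python
import itertools

# Hypothetical symmetric distances for a 4-node round trip starting at node 0.
dist = {
    (0, 1): 10, (0, 2): 15, (0, 3): 20,
    (1, 2): 35, (1, 3): 25, (2, 3): 30,
}

def d(i, j):
    """Symmetric lookup: d(i, j) == d(j, i)."""
    return dist[(min(i, j), max(i, j))]

def tour_length(order):
    """Length of the round trip 0 -> order -> 0."""
    stops = (0, *order, 0)
    return sum(d(a, b) for a, b in zip(stops, stops[1:]))

# Enumerate all tours from node 0 and keep the shortest one.
best = min(itertools.permutations([1, 2, 3]), key=tour_length)
print(best, tour_length(best))  # -> (1, 3, 2) 80
```

If an LLM-generated model returns anything other than a length-80 tour on this instance, you know the formulation is wrong before you ever scale it up.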

Not a Grand Experiment, Just an Honest Test

This wasn’t a comprehensive benchmark of every LLM or optimization-specific tool. It was a practical, curiosity-driven test to see what’s possible. There are other models out there I didn’t test. But the results were clear: AI can help, but it can’t yet replace human reasoning in optimization. And I encourage you to do your own testing and validation.

AI tools can generate ideas, code, and even partial formulations, but they still need humans to guide, check, and refine the math. So the next time someone asks, “Which LLM builds the best optimization model?”, the answer is simple: you do.

You build the best models. AI is just here to help.


This is a summary of my presentation at the 2025 INFORMS Annual Meeting. The presentation slides contain additional details, including all three prompts and additional findings and results.