AI Rivalry: Getting Better Results by Pitting LLMs Against Each Other

origo's blog


After two decades of building software systems, I've consistently found that competition tends to improve results. This same principle, surprisingly, works remarkably well with AI language models. Forget prompt engineering wizardry—the most effective technique I've found for extracting genuinely useful solutions is creating controlled competition between different AI assistants.

The Problem: When AI Output Is Just Mediocre #

We've all experienced it. You're tackling a tricky problem in your codebase, so you ask an AI assistant for help. It provides a technically correct but uninspired solution that lacks nuance and practical considerations. It works, but it's just... adequate.

Why settle for adequate when you can push for something better?

The Solution: Strategic AI Rivalry #

The approach is straightforward but powerful: create a competitive dynamic between different AI models to drive better results. Here's the method:

  1. Define a Clear Challenge - Start with a specific, well-defined problem. Precision matters.

  2. Collect Different Solutions - Submit the same prompt to different LLMs (ChatGPT, Claude, Grok). Each brings its own approach.

  3. Cross-Pollinate with Feedback - Show each AI the others' solutions along with specific critique points: "This approach has performance issues when scaling" or "The error handling here is incomplete."

  4. Iterate with Focus - As the models improve on each other's work, guide them toward refinement: "Combine the elegant error handling from solution A with the performance optimization from solution B."

  5. Synthesize the Best Parts - Your job is to identify and integrate the strongest elements from each iteration.
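The loop above is easy to script. Below is a minimal sketch of one cross-pollination round; the model callables are hypothetical stand-ins for whatever API clients you actually use (OpenAI, Anthropic, xAI), not a real SDK:

```python
def rivalry_round(problem, models, solutions, critiques):
    """Show each model its rivals' solutions plus your critique; collect revisions.

    models:    dict of name -> callable(prompt) -> str (your API wrappers)
    solutions: dict of name -> that model's current solution
    critiques: dict of name -> targeted feedback for that model
    """
    revised = {}
    for name, ask_model in models.items():
        rival_work = "\n\n".join(
            f"--- Solution from {other} ---\n{sol}"
            for other, sol in solutions.items() if other != name
        )
        prompt = (
            f"Problem: {problem}\n\n"
            f"Rival solutions:\n{rival_work}\n\n"
            f"Critique: {critiques.get(name, 'Improve on the rivals.')}\n"
            "Produce a better solution."
        )
        revised[name] = ask_model(prompt)
    return revised

# Stub models for illustration; real clients would call an LLM API.
models = {
    "chatgpt": lambda p: f"[chatgpt revision of {len(p)} chars]",
    "claude":  lambda p: f"[claude revision of {len(p)} chars]",
}
solutions = {"chatgpt": "regex search", "claude": "indexed search"}
out = rivalry_round("speed up text search", models, solutions,
                    {"chatgpt": "slow on large data"})
```

Step 5 stays manual on purpose: you, not a script, pick the strongest elements from each round's output.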

Real-World Example: Optimizing a Search Function #

Let's say you need to improve a text search function that has become a bottleneck. Here's a simplified view of one variant of the process:

First pass with ChatGPT: get a basic regex solution with decent functionality, but nothing special for performance.
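That first pass might look something like this naive full-scan search (a sketch, not any model's actual output):

```python
import re

def search(docs, query):
    """Naive first-pass search: scan every document with a regex.

    Correct, but it rescans the entire corpus on every query,
    so cost grows with total text size -- the bottleneck in question.
    """
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    return [i for i, doc in enumerate(docs) if pattern.search(doc)]

docs = ["the cat sat on the mat", "dogs bark loudly", "Cat videos online"]
search(docs, "cat")  # → [0, 2], matching case-insensitively
```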

Show this to Claude with the note "This works but might be slow for large datasets." Claude suggests a more efficient indexing approach with some clever optimization tricks.

Present both to Grok with the comment "These work, but could use better relevancy ranking." Grok might propose incorporating TF-IDF scoring and fuzzy matching for better results.

By bouncing these solutions between the models with targeted feedback, each iteration improves upon the last. The final solution combines Claude's indexing efficiency with Grok's relevancy improvements, creating something substantially better than any single model would produce on its own.
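A synthesized result might look like the sketch below, combining an inverted index (skip documents that can't match) with TF-IDF relevancy ranking; fuzzy matching is omitted for brevity, and the class name and tokenization are my own simplifications:

```python
import math
from collections import Counter, defaultdict

class TfIdfIndex:
    """Indexed, ranked search: the two improvements from the example combined."""

    def __init__(self, docs):
        tokenized = [doc.lower().split() for doc in docs]
        self.tf = [Counter(tokens) for tokens in tokenized]  # term frequency per doc
        self.index = defaultdict(set)                        # term -> doc ids
        for i, tokens in enumerate(tokenized):
            for term in set(tokens):
                self.index[term].add(i)
        n = len(docs)
        # Smoothed inverse document frequency: rarer terms score higher.
        self.idf = {t: math.log(n / len(ids)) + 1 for t, ids in self.index.items()}

    def search(self, query, k=5):
        terms = query.lower().split()
        # Only documents containing at least one query term are candidates.
        candidates = set().union(*(self.index.get(t, set()) for t in terms))
        scores = {
            i: sum(self.tf[i][t] * self.idf.get(t, 0) for t in terms)
            for i in candidates
        }
        return sorted(scores, key=scores.get, reverse=True)[:k]

idx = TfIdfIndex(["cat cat dog", "dog park", "cat videos online"])
idx.search("cat")  # → [0, 2]: doc 0 ranks first (higher term frequency)
```

The design point survives the simplification: the index makes queries cheap regardless of corpus size, while TF-IDF orders the candidates by relevance rather than returning them in arbitrary order.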

Why This Works #

The effectiveness of this approach comes down to three key factors:

  1. Different Training Strengths: Each AI has different capabilities and blind spots. Using them together compensates for individual weaknesses.

  2. Concrete Examples: Showing models alternative approaches gives them specific ideas to build upon, rather than generating solutions from scratch.

  3. Focused Iterations: Each round of feedback narrows the solution space toward better results, similar to how code reviews improve human-written code.

Taking the Approach Further #

For complex problems, try these enhancements:

Specific Failure Scenarios: Challenge each iteration with "But what if X happens?" questions to strengthen edge case handling.

Specialized Feedback Roles: Frame your feedback from different perspectives: "As a security expert, I'm concerned about..." or "From a performance standpoint..."

Structured Improvement Cycles: Focus each round on a single aspect, such as correctness first, then performance, then readability.
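One way to keep rounds from drifting is to script the per-round prompts. The round plan below is a hypothetical example, not a prescribed schedule:

```python
# Hypothetical round plan: one aspect per feedback cycle.
ROUNDS = [
    ("correctness", "Find and fix any bugs or incorrect edge-case behavior."),
    ("performance", "Reduce time and memory cost; note any big-O changes."),
    ("readability", "Simplify names and structure without changing behavior."),
]

def round_prompt(round_no, solution):
    """Build a prompt that restricts this round to a single aspect."""
    aspect, instruction = ROUNDS[round_no]
    return (f"Focus only on {aspect}. {instruction}\n\n"
            f"Current solution:\n{solution}")
```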

Is It Worth the Extra Effort? #

Absolutely. For trivial tasks, a single AI query is fine. But for anything complex or critical, the rivalry approach consistently produces more robust, efficient, and creative solutions. It's the difference between getting advice from one developer versus consulting a diverse team of specialists.

Try this method on your next challenging problem. The results speak for themselves.

Inspired by an observation from Greg Isenberg, refined through practical application.