PeerLM

About Us

We built the tool we couldn't find.

PeerLM started as an internal testing ground — and became the evaluation platform AI teams trust to make model decisions.

PeerLM was born out of frustration. We were building AI-powered products and needed to know which LLM could deliver the most realistic, high-quality output for our specific use cases. The existing approach — flipping between chat interfaces, running ad-hoc prompts, and making gut-feel decisions — wasn't cutting it.

So we built a testing ground. A system that could take our real prompts, run them across multiple models simultaneously, anonymize and shuffle the responses, and have evaluator models rank them blind — no name bias, no positional bias, just raw performance data.
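The anonymize-and-shuffle step can be sketched roughly like this (an illustrative Python sketch only, not PeerLM's actual implementation; the model names and label scheme are hypothetical):

```python
import random

def blind_shuffle(responses, seed=None):
    """Prepare model responses for blind evaluation.

    responses: dict mapping model name -> response text.
    Returns (blinded, key): blinded is a list of (label, text) pairs
    in shuffled order, so the evaluator sees only neutral labels;
    key maps each label back to the original model name for scoring.
    """
    rng = random.Random(seed)
    items = list(responses.items())
    rng.shuffle(items)  # random order counters positional bias
    blinded, key = [], {}
    for i, (model, text) in enumerate(items):
        label = f"Response {chr(ord('A') + i)}"  # neutral label counters name bias
        blinded.append((label, text))
        key[label] = model
    return blinded, key
```

An evaluator model ranks the `blinded` list without ever seeing a model name; only afterward is the `key` used to attribute scores back to each model.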

The results were eye-opening. Models we assumed were the best often weren't. Cheaper models frequently outperformed premium ones for specific tasks. And the data was reproducible — something we could present to leadership and actually defend.

We realized every team building with LLMs was facing the same problem. That internal tool became PeerLM: a blind evaluation platform that turns model selection from guesswork into science.

Today, PeerLM supports 200+ models across 15+ providers and has helped teams save over $800K on model costs by finding the right model for the right job.

Leadership

Anthony Simon

Co-Founder & CEO

Anthony founded PeerLM after experiencing firsthand how difficult it was to objectively evaluate LLMs for production use cases. With a background in software engineering and AI product development, he set out to build the rigorous, bias-free evaluation platform that the industry was missing.

Connect on LinkedIn

Ready to see which model wins?

Start evaluating models for free, or let us run a managed trial for your team.