PeerLM

Everything You Need to Know

How blind evaluation works, what credits cost, and why teams switch from manual testing.

What is blind comparative ranking?

PeerLM anonymizes all model responses, shuffles the order, and asks evaluator models to rank them best-to-worst. The evaluator never knows which model produced which response — eliminating name bias and positional bias. Rankings map back to source models as normalized scores.
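The anonymize-shuffle-rank-map flow can be sketched as follows. This is an illustration only: the linear rank-to-score normalization and the function shapes are assumptions, not PeerLM's actual implementation.

```typescript
interface ModelResponse { model: string; text: string }

// Sketch of blind comparative ranking: shuffle the responses, let the
// evaluator rank anonymized texts, then map ranks back to source models
// as normalized scores (best = 1.0, worst = 0.0 — an assumed mapping).
function blindRank(
  responses: ModelResponse[],
  rankTexts: (texts: string[]) => number[], // evaluator: returns indices, best-to-worst
): Map<string, number> {
  // Simplistic shuffle for illustration; order bias is what this removes.
  const shuffled = [...responses].sort(() => Math.random() - 0.5);
  // The evaluator only ever sees the texts, never the model names.
  const order = rankTexts(shuffled.map((r) => r.text));
  const n = shuffled.length;
  const scores = new Map<string, number>();
  order.forEach((idx, rank) => {
    scores.set(shuffled[idx].model, n > 1 ? (n - 1 - rank) / (n - 1) : 1);
  });
  return scores;
}
```

Because ranks are assigned to texts and only mapped back to models afterward, neither a model's name nor its position in the original list can influence the evaluator.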

How is PeerLM different from testing models manually?

Manual testing means one person, one prompt, one afternoon — and a decision nobody can reproduce. PeerLM runs blind evaluations across dozens of scenarios, personas, and criteria simultaneously. You get statistically meaningful data instead of anecdotes, and audit-ready reports instead of Slack threads.

What models can I evaluate?

200+ models across OpenAI, Anthropic, Google, Meta, Mistral, and more — via OpenRouter and Groq integrations. Evaluate any combination against each other. Model capabilities sync automatically so you always have access to the latest releases.

How do eval credits work?

Credits use a multiplier system based on model tier: Standard and Advanced = 1x, Premium = 2x, Frontier = 3x. Both generator and evaluator models consume credits. Formula: SUM(all model multipliers) × system prompts × test prompts × samples. Cache hits consume 0 credits, so re-running the same evaluation costs nothing. Free plans get 200 signup credits with pay-as-you-go at $0.20/credit. Pro plans include 1,000 credits/month with overage at $0.10/credit.
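The formula above can be sketched as a small calculator. The tier multipliers come from the text; the function and type names are illustrative, not part of PeerLM's API.

```typescript
type Tier = "standard" | "advanced" | "premium" | "frontier";

// Multipliers as stated: Standard/Advanced = 1x, Premium = 2x, Frontier = 3x.
const MULTIPLIER: Record<Tier, number> = {
  standard: 1,
  advanced: 1,
  premium: 2,
  frontier: 3,
};

// Both generator and evaluator models count toward the multiplier sum.
function evalCredits(
  models: Tier[],
  systemPrompts: number,
  testPrompts: number,
  samples: number,
): number {
  const multiplierSum = models.reduce((sum, t) => sum + MULTIPLIER[t], 0);
  return multiplierSum * systemPrompts * testPrompts * samples;
}
```

For example, two Standard generators plus one Frontier evaluator across 2 personas, 5 test prompts, and 3 samples costs (1 + 1 + 3) × 2 × 5 × 3 = 150 credits — and 0 on a full cache hit.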

What happens when I subscribe to a plan?

Your workspace is activated immediately. All evaluations and data are retained — nothing resets. Pro plans unlock 1,000 monthly credits, Premium models, API & MCP access, and custom evaluation criteria. Start running evals right away. Reach out to sales@peerlm.com if you have any questions.

What are system prompts?

System prompts (personas) let you test models in context — as a customer support agent, code reviewer, creative writer, etc. Each system prompt defines the role and instructions given to the model. PeerLM breaks down results by system prompt so you see which model wins for each use case.

What's the difference between JSON output and text output?

JSON mode returns structured arrays (e.g., '3 one-liner jokes') where each item is scored independently — giving granular data. Text mode evaluates free-form prose as a whole. JSON mode is recommended for comparative evaluations as it produces more reliable rankings.
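A toy comparison of the two shapes makes the granularity difference concrete. These result shapes are hypothetical, not PeerLM's actual output format.

```typescript
// Text mode: the whole free-form response gets a single score.
const textResult = { response: "three jokes in one blob...", score: 0.7 };

// JSON mode: each array item is scored independently, so a weak
// item is visible instead of being averaged away invisibly.
const jsonResult = {
  items: [
    { text: "joke 1", score: 0.9 },
    { text: "joke 2", score: 0.8 },
    { text: "joke 3", score: 0.4 }, // the weak item stands out
  ],
};

const itemMean =
  jsonResult.items.reduce((sum, item) => sum + item.score, 0) /
  jsonResult.items.length;
```

Both modes can land on the same aggregate score, but only JSON mode tells you which item to fix.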

Can I share results with my team?

Yes. All paid plans include shareable reports — a public link anyone can view without an account. Reports include rankings, per-persona breakdowns, and response samples. Export as CSV or JSON for further analysis.

How does response caching work?

PeerLM hashes the combination of model, persona, and topic content. Matching cached responses are reused at zero cost. Edit any part of the prompt and the cache auto-invalidates. Re-running evaluations is free — you only pay for new generations.
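The scheme above can be sketched as a content-addressed cache. The field order, separator, and hash choice here are assumptions for illustration, not PeerLM internals.

```typescript
import { createHash } from "node:crypto";

// Cache key = hash over model, persona (system prompt), and topic content.
// Editing any field changes the hash, which is what auto-invalidates the cache.
function cacheKey(model: string, persona: string, topic: string): string {
  return createHash("sha256")
    .update([model, persona, topic].join("\u0000")) // NUL separator avoids "ab"+"c" vs "a"+"bc" collisions
    .digest("hex");
}

const cache = new Map<string, string>();

function getOrGenerate(
  model: string,
  persona: string,
  topic: string,
  generate: () => string,
): string {
  const key = cacheKey(model, persona, topic);
  const hit = cache.get(key);
  if (hit !== undefined) return hit; // cache hit: 0 credits
  const response = generate();       // cache miss: new generation, credits consumed
  cache.set(key, response);
  return response;
}
```

Re-running an unchanged evaluation hits the cache on every response, which is why only new generations cost credits.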

What is deterministic mode?

PeerLM sets temperature to 0 and uses fixed seeds where supported. It checks each model's capability registry first to prevent silent failures. All parameters used are logged for audit, so results are fully reproducible.
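The behavior described above can be sketched as follows. The registry shape and field names are hypothetical; only the temperature-0 and seed-where-supported behavior comes from the text.

```typescript
interface Capabilities { supportsSeed: boolean }

interface RequestParams { temperature: number; seed?: number }

// Pin temperature to 0, and attach a fixed seed only when the model's
// capability registry says seeds are supported — sending a seed to a
// model that ignores it would fail silently.
function deterministicParams(caps: Capabilities, seed = 42): RequestParams {
  const params: RequestParams = { temperature: 0 };
  if (caps.supportsSeed) {
    params.seed = seed;
  }
  // PeerLM logs the final params for audit; here we simply return them.
  return params;
}
```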

Do you store my API keys?

Model calls use PeerLM-managed keys by default — no setup needed. For BYOK (Bring Your Own Key) on Enterprise plans, keys are encrypted with AES-256 at rest and never logged or exposed.

What is the PeerLM MCP server?

The PeerLM MCP (Model Context Protocol) server lets you run evaluations directly from Claude Desktop, Cursor, or any MCP-compatible client. Create prompts, configure suites, trigger runs, and check results — all without leaving your IDE. Available on Pro and Enterprise plans. Set it up with: npx -y @peerlm/mcp
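For Claude Desktop, that command typically goes into the standard `mcpServers` section of the client config. The `"peerlm"` key name is your choice; the package name comes from the command above.

```json
{
  "mcpServers": {
    "peerlm": {
      "command": "npx",
      "args": ["-y", "@peerlm/mcp"]
    }
  }
}
```

Other MCP-compatible clients (Cursor, etc.) use an equivalent server entry in their own config files.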

Does PeerLM have an API?

Yes. The REST API lets you manage suites, create system prompts and test prompts, trigger evaluation runs, and retrieve results programmatically. Available on Pro and Enterprise plans. Generate an API key from Settings > API Keys.
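As a sketch of what a call might look like: the endpoint path and payload field below are assumptions for illustration, not documented PeerLM routes; only the bearer-token auth with a key from Settings > API Keys follows the text.

```typescript
// Build a request to trigger an evaluation run. URL and body fields
// are hypothetical — consult the API reference for the real routes.
function buildRunRequest(apiKey: string, suiteId: string) {
  return {
    url: `https://api.peerlm.com/v1/suites/${encodeURIComponent(suiteId)}/runs`, // hypothetical path
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ samples: 3 }), // hypothetical field
    },
  };
}

// Usage: const { url, init } = buildRunRequest(key, "suite_123");
//        const res = await fetch(url, init);
```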

Can I cancel anytime?

Yes. Month-to-month, no long-term commitment. Cancel or downgrade anytime from billing settings. You keep access through the end of your billing period.

Still have questions?

Contact support