Which AI model can you actually trust?
ModelTrust is an AI model evaluation platform that runs the same questions across GPT-4, Claude, Gemini, and other models simultaneously. It uses structured benchmarks (Likert scales, binary choices, and forced comparisons) to measure reliability, detect disagreement, and track cost per query, so organizations can make evidence-based decisions about which AI to deploy.
The Problem with Trusting AI
Every organization using AI faces the same question: which model gives the right answer? The concepts below explain how ModelTrust helps you find out.
- AI Model Evaluation: The systematic process of testing language models against defined questions to measure accuracy, consistency, and reliability. Rather than relying on generic benchmarks, model evaluation tests AI against your specific use cases and criteria.
- Trust Scoring: A quantified reliability metric calculated from a model's performance across an evaluation. Trust scores factor in response consistency, output validity, calibration accuracy, and agreement with other models to produce a single number that represents how much you can rely on a model's outputs (a sketch of the arithmetic follows this list).
- AI Model Reliability: The degree to which an AI model produces consistent, well-formed, and accurate outputs across repeated queries. Reliable models give similar answers to similar questions, format their outputs correctly, and avoid hallucination. Unreliable models produce inconsistent or contradictory responses that require constant human verification.
- Model Agreement and Disagreement: When multiple AI models are asked the same question, agreement means they converge on the same answer; disagreement means their outputs diverge. High disagreement on a question is a signal that the answer is uncertain and may need human review. ModelTrust measures this automatically for every question in an evaluation.
- Benchmark Evaluation: A structured evaluation using standardized question types (Likert scales, binary choices, forced comparisons, numeric scales) that produce quantifiable, comparable results. Benchmark evaluations let you measure model performance with statistical rigor rather than subjective assessment.
- Human Review Signals: Automatic flags that indicate when AI outputs should not be trusted without human verification. ModelTrust generates these signals when models disagree, when confidence is low, or when responses contain patterns associated with unreliable outputs. The goal is to focus human attention where it matters most.
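ModelTrust's exact formulas aren't published here, but the mechanics are easy to picture. The minimal Python sketch below computes an agreement rate as the fraction of models that converge on the most common answer, then combines the four trust-score factors with equal weights. The function names and the equal weighting are illustrative assumptions, not ModelTrust's implementation.

```python
from collections import Counter

def agreement_rate(answers: list[str]) -> float:
    """Fraction of models that converge on the most common answer.

    1.0 means full agreement; lower values mean the models have
    scattered across different answers.
    """
    if not answers:
        return 0.0
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / len(answers)

def trust_score(consistency: float, validity: float,
                calibration: float, agreement: float) -> float:
    """Combine the four reliability factors into a single 0-100 score.

    Equal weights are an assumption for illustration only.
    """
    return 100 * (consistency + validity + calibration + agreement) / 4

# Four models answer the same binary question; one breaks ranks.
answers = ["yes", "yes", "yes", "no"]
print(agreement_rate(answers))               # 0.75 -> candidate for human review
print(trust_score(0.92, 0.98, 0.85, 0.75))   # 87.5
```

An agreement rate of 0.75 on a binary question means one model in four disagreed, which is exactly the kind of data point that earns a human review signal.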
Why Not Just Ask ChatGPT?
Ad hoc testing means asking a model a few questions and eyeballing the answers. It feels productive, but it tells you almost nothing. You have no baseline, no comparison, and no way to know if the answer you got today will be the answer you get tomorrow.
Structured evaluation is different. You define specific questions, run them across multiple models simultaneously, and measure the results quantitatively. When three models agree on an answer and one disagrees, that disagreement is a data point. When a model scores 92% reliability on one evaluation but only 64% on another, you know exactly where to trust it and where not to.
ModelTrust exists because choosing an AI model for production should be based on evidence, not gut feeling. Generic leaderboard benchmarks test general knowledge. Your business needs are specific. ModelTrust lets you test what actually matters to you.
Features
Multi-Model Evaluation
Run the same questions across GPT-4, Claude, Gemini, and others. See how each model handles your specific use case.
Benchmark Question Types
Structured evaluations with Likert scales, binary choices, forced comparisons, and more. Not just vibes: real data.
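To make the question types concrete, here is a minimal Python sketch of how they might be represented as data. The class and field names are assumptions for illustration, not ModelTrust's schema.

```python
from dataclasses import dataclass

@dataclass
class LikertQuestion:
    prompt: str
    scale_min: int = 1
    scale_max: int = 5          # e.g. 1 = strongly disagree, 5 = strongly agree

@dataclass
class BinaryQuestion:
    prompt: str
    choices: tuple[str, str] = ("yes", "no")

@dataclass
class ForcedComparison:
    prompt: str
    option_a: str
    option_b: str               # the model must pick exactly one

questions = [
    LikertQuestion("How enforceable is this contract clause?"),
    BinaryQuestion("Does this code sample contain a SQL injection vulnerability?"),
    ForcedComparison("Which summary is more faithful to the source?",
                     option_a="Summary A", option_b="Summary B"),
]
```

Because every answer is constrained to a scale, a choice, or a pick, responses from different models can be compared numerically rather than by reading prose side by side.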
Cost & Token Tracking
See exactly what each model costs per question. Compare quality against price to find the best value.
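Per-question cost is simple arithmetic on token counts. A minimal sketch, using illustrative per-million-token prices (these change often, so check each provider's current pricing page):

```python
# Illustrative prices in USD per million tokens; not authoritative.
PRICE_PER_M_TOKENS = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "claude": {"input": 3.00, "output": 15.00},
}

def cost_per_question(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one question, given its token counts."""
    p = PRICE_PER_M_TOKENS[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

print(f"${cost_per_question('gpt-4o', 1_200, 300):.4f}")  # $0.0060
```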
Side-by-Side Comparison
Put model outputs next to each other. Spot differences, measure divergence, and identify which models agree.
How It Works
Create an Evaluation
Define the questions you want to test. Pick from structured question types or write open-ended prompts.
Select Your Models
Choose which AI models to evaluate. Run them all against the same questions simultaneously.
Analyze the Results
Compare outputs, review reliability scores, and identify where models disagree. Know when to trust the answer.
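Putting the three steps together, here is a hypothetical client-side sketch of the workflow. The modeltrust package, Client class, and every method name below are assumptions for illustration; ModelTrust's actual SDK is not documented here.

```python
# Hypothetical usage; treat every name below as an assumption.
from modeltrust import Client  # assumed package and class names

client = Client(api_key="...")

# 1. Create an evaluation with structured questions.
evaluation = client.create_evaluation(
    name="Contract review assistant",
    questions=[
        {"type": "binary", "prompt": "Is this clause enforceable?"},
        {"type": "likert", "prompt": "Rate the clarity of this clause.", "scale": 5},
    ],
)

# 2. Select models and run them against the same questions simultaneously.
run = evaluation.run(models=["gpt-4o", "claude", "gemini"])

# 3. Analyze: reliability scores and disagreement flags per question.
for result in run.results:
    print(result.question, result.trust_score, result.needs_human_review)
```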
Frequently Asked Questions
What is ModelTrust?
ModelTrust is an AI model evaluation platform that lets you run structured questions across multiple language models, compare their outputs, and measure reliability. It helps teams decide which model to trust for specific use cases.
How does ModelTrust compare models?
You create an evaluation with questions, select the models you want to test, and run them all simultaneously. ModelTrust collects responses, calculates agreement scores, flags disagreements, and identifies when outputs need human review.
What models does ModelTrust support?
ModelTrust supports OpenAI (GPT-4, GPT-4o), Anthropic (Claude), Google (Gemini), and xAI (Grok). New providers can be added through the adapter system.
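The adapter system isn't documented beyond that sentence, so here is one plausible shape for it: a small interface that each provider implements. The ModelAdapter class and its methods are assumptions for illustration.

```python
from abc import ABC, abstractmethod

class ModelAdapter(ABC):
    """Wraps one provider's API behind a common interface (assumed design)."""

    @abstractmethod
    def complete(self, prompt: str) -> str:
        """Send a prompt and return the model's raw text response."""

    @abstractmethod
    def count_tokens(self, text: str) -> int:
        """Token count for cost tracking."""

class MyProviderAdapter(ModelAdapter):
    """Example of plugging in a new provider."""

    def complete(self, prompt: str) -> str:
        # Call your provider's completion API here.
        raise NotImplementedError

    def count_tokens(self, text: str) -> int:
        # Rough fallback: ~4 characters per token for English text.
        return max(1, len(text) // 4)
```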
What is AI model evaluation?
AI model evaluation is the process of systematically testing language models against defined questions to measure accuracy, consistency, and reliability. Instead of relying on general benchmarks, ModelTrust lets you test models against your own questions and criteria.
What is a trust score?
A trust score is a quantified reliability metric calculated from a model's performance across an evaluation. It factors in response consistency, JSON validity rates, calibration accuracy, and agreement with other models. Higher scores indicate more reliable outputs for your specific use case.
How much does ModelTrust cost?
ModelTrust is currently in private beta and free to use during the beta period. You only pay for the API costs of the models you evaluate (using your own API keys). Pricing for the hosted service will be announced when we launch publicly.
Who built ModelTrust?
ModelTrust is built by Idea Warehouse, a software company founded by Colin Smillie. Colin is a software engineer and entrepreneur focused on building tools that help teams make better decisions with AI.
Get Early Access
ModelTrust is in private beta. Sign up to be among the first to evaluate AI models with confidence.