
evaluate

Supply ground truth to compare against LLM output, and let a second LLM act as the "judge".

  • Verdicts are Pass or Fail
  • Eval history is stored in a SQLite database
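To make the judging idea concrete, here is a minimal Rust sketch of an LLM-as-judge step: the judge receives the question, the ground truth, and the model's answer, and replies with a single PASS or FAIL keyword. The prompt wording, the Verdict enum, and the build_judge_prompt / parse_verdict functions are hypothetical illustrations, not Evaluate's actual code, and the call to the judge model is simulated with a fixed reply.

```rust
// Hypothetical sketch of an LLM-as-judge step. The prompt wording and all
// names here are illustrative assumptions, not Evaluate's own implementation.

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Verdict {
    Pass,
    Fail,
}

/// Builds the prompt sent to the judge LLM (hypothetical wording).
fn build_judge_prompt(question: &str, expected: &str, actual: &str) -> String {
    format!(
        "You are grading an answer.\n\
         Question: {question}\n\
         Expected answer (ground truth): {expected}\n\
         Model answer: {actual}\n\
         Reply with exactly one word: PASS if the model answer agrees with \
         the expected answer, otherwise FAIL."
    )
}

/// Reads the judge's first word; returns None if it is neither PASS nor FAIL.
fn parse_verdict(reply: &str) -> Option<Verdict> {
    let first = reply
        .split_whitespace()
        .next()?
        // Ignore decoration such as "**PASS**" or "PASS:".
        .trim_matches(|c: char| !c.is_ascii_alphabetic())
        .to_ascii_uppercase();
    match first.as_str() {
        "PASS" => Some(Verdict::Pass),
        "FAIL" => Some(Verdict::Fail),
        _ => None,
    }
}

fn main() {
    let prompt = build_judge_prompt("What is 2 + 2?", "4", "The answer is 4.");
    println!("{prompt}");
    // In a real run the prompt would be sent to the judge model;
    // here its reply is simulated.
    let reply = "PASS - the answer matches.";
    println!("{:?}", parse_verdict(reply));
}
```

Asking the judge for a single keyword keeps parsing trivial; replies that match neither keyword come back as None, so they can be flagged rather than silently counted as a Pass or a Fail.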

Installation

Get started by setting up your development environment, or try Evaluate immediately by cloning it from GitHub.

What you'll need

  • Rust version 1.70 or above
  • Node.js version 20.0 or above (for the documentation site)
  • A Gemini API key from Google AI Studio
  • Git for version control

Clone and Set Up

Clone the Evaluate repository and set up your environment:

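# 1. Clone the repository (SSH; requires an SSH key registered with GitHub)
git clone git@github.com:RGGH/evaluate.git
# ...or clone over HTTPS instead:
# git clone https://github.com/RGGH/evaluate.git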

# 2. Navigate into the project directory
cd evaluate

Create a .env file in the root directory with your configuration:

DATABASE_URL=sqlite:data/evals.db
api_base = "https://generativelanguage.googleapis.com"
api_key = "YOUR_GEMINI_API_KEY"
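The .env example above mixes KEY=value and key = "value" lines. As an illustration of how such lines can be read, here is a minimal standard-library-only Rust parser. parse_dotenv is a hypothetical helper; this is not a claim about how Evaluate itself loads its configuration.

```rust
// Hypothetical std-only sketch: read .env-style lines such as
//   DATABASE_URL=sqlite:data/evals.db
//   api_key = "YOUR_GEMINI_API_KEY"
// into key/value pairs. Evaluate's own loader may behave differently.
use std::collections::HashMap;

fn parse_dotenv(contents: &str) -> HashMap<String, String> {
    let mut vars = HashMap::new();
    for line in contents.lines() {
        let line = line.trim();
        // Skip blank lines and comments.
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // Split on the first '=' only, so values may themselves contain '='.
        if let Some((key, value)) = line.split_once('=') {
            let value = value.trim();
            // Strip one pair of surrounding double quotes, if present.
            let value = value
                .strip_prefix('"')
                .and_then(|v| v.strip_suffix('"'))
                .unwrap_or(value);
            vars.insert(key.trim().to_string(), value.to_string());
        }
    }
    vars
}

fn main() {
    let example = r#"
DATABASE_URL=sqlite:data/evals.db
api_base = "https://generativelanguage.googleapis.com"
api_key = "YOUR_GEMINI_API_KEY"
"#;
    let vars = parse_dotenv(example);
    // Report each setting the guide asks for, or flag it as missing.
    for key in ["DATABASE_URL", "api_base", "api_key"] {
        match vars.get(key) {
            Some(v) => println!("{key} = {v}"),
            None => eprintln!("missing {key}"),
        }
    }
}
```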

Cargo automatically downloads and compiles all crate dependencies the first time you build the project.

Start your application

Run the development server:

cargo run

The cargo run command compiles your Rust application and starts the evaluation server. The server binds to 0.0.0.0:8080 (all network interfaces), so you can reach it locally at http://127.0.0.1:8080/.

You should see output similar to:

[ASCII-art "evaluate" banner]


LLM Evaluation & Testing Framework

✅ DATABASE_URL set to: sqlite:data/evals.db
✅ Created database directory: data
📦 Database file path: /home/pop/rust/evaluate/data/evals.db
📦 Connecting to: sqlite:///home/pop/rust/evaluate/data/evals.db?mode=rwc
✅ Database connected successfully
✅ Database migrations completed
🚀 Starting server...
📊 Frontend available at http://127.0.0.1:8080
[2025-10-14T15:40:17Z INFO actix_server::builder] starting 22 workers
[2025-10-14T15:40:17Z INFO actix_server::server] Actix runtime found; starting in Actix runtime
[2025-10-14T15:40:17Z INFO actix_server::server] starting service: "actix-web-service-0.0.0.0:8080", workers: 22, listening on: 0.0.0.0:8080
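The startup log above reports a relative DATABASE_URL being turned into an absolute file path, its directory being created, and a connection URL ending in ?mode=rwc (SQLite's read-write-create open mode). The following Rust sketch reproduces those steps under that reading of the log. resolve_sqlite_url is a hypothetical name, and Evaluate's real startup code may differ.

```rust
// Hypothetical sketch of the steps the startup log reports; not Evaluate's source.
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

fn resolve_sqlite_url(database_url: &str, base_dir: &Path) -> io::Result<(PathBuf, String)> {
    // Drop the scheme prefix ("sqlite://" or "sqlite:") to get the file path.
    let raw = database_url
        .strip_prefix("sqlite://")
        .or_else(|| database_url.strip_prefix("sqlite:"))
        .unwrap_or(database_url);
    let path = Path::new(raw);
    // Relative paths are resolved against base_dir (e.g. the project root).
    let abs = if path.is_absolute() {
        path.to_path_buf()
    } else {
        base_dir.join(path)
    };
    // Ensure the directory that will hold the database file exists.
    if let Some(parent) = abs.parent() {
        fs::create_dir_all(parent)?;
    }
    // mode=rwc: open read-write, creating the file if it does not exist.
    let url = format!("sqlite://{}?mode=rwc", abs.display());
    Ok((abs, url))
}

fn main() -> io::Result<()> {
    // Use a temporary directory so the demo does not create files in your project.
    let base = std::env::temp_dir().join("evaluate-demo");
    let (path, url) = resolve_sqlite_url("sqlite:data/evals.db", &base)?;
    println!("Database file path: {}", path.display());
    println!("Connecting to: {url}");
    Ok(())
}
```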

Open your browser and navigate to http://127.0.0.1:8080 to access the built-in GUI. You can now start running evaluations; each result is saved automatically to the SQLite database, so your evaluation history persists between runs.
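If the page does not load, one quick check is whether anything is accepting connections on port 8080. This is a minimal sketch using only the Rust standard library; is_listening is a hypothetical helper, and a successful connection shows only that the port is open, not that every route of the app works.

```rust
// Hypothetical helper: test whether a TCP port accepts connections.
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

fn is_listening(addr: SocketAddr, timeout: Duration) -> bool {
    // connect_timeout fails if nothing is listening or the timeout expires.
    TcpStream::connect_timeout(&addr, timeout).is_ok()
}

fn main() {
    let addr: SocketAddr = "127.0.0.1:8080".parse().expect("valid socket address");
    if is_listening(addr, Duration::from_secs(2)) {
        println!("Evaluate server is up at http://{addr}");
    } else {
        println!("Nothing is listening on {addr}; is cargo run still starting?");
    }
}
```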

Next Steps

Now that your server is running, you can:

  • Test the API with sample curl commands
  • Use the web interface to run single evaluations
  • Submit batch evaluations using JSON files
  • View your evaluation history in the GUI

Explore the documentation to learn more about configuring models, writing eval definitions, and using the AI-powered judging capabilities.