3. Your First Evaluation
Let’s run a minimal conversation test to validate your setup.
1. Check for simple_batch.json
Ensure you have the `simple_batch.json` file in your repo root. It should include something like this:
```json
{
  "test_batch": {
    "id": "0001-first-eval",
    "interactions": [
      {
        "id": "turn-1",
        "user_message": "Hello, what can you do?",
        "agent_reply": "",
        "reference_reply": "I can evaluate AI model outputs against expectations.",
        "interaction_type": "opening",
        "reference_metadata": { "intent": "greeting", "sentiment": "neutral" },
        "generated_metadata": {}
      }
    ],
    "description": "First basic test"
  },
  "endpoint": "http://localhost:8080",
  "model_id": "meta-llama/Llama-3.3-70B-Instruct",
  "attempts": 1,
  "test_name": "first_eval"
}
```
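Before posting the file, a quick pre-flight check can confirm it parses and carries the top-level fields shown above. This is a minimal sketch; the set of required keys is assumed from this example rather than from the service's own schema:

```python
import json
import pathlib

# Keys taken from the example above; adjust if your batch schema differs.
REQUIRED_KEYS = {"test_batch", "endpoint", "model_id", "attempts", "test_name"}

batch = json.loads(pathlib.Path("simple_batch.json").read_text())

missing = REQUIRED_KEYS - batch.keys()
if missing:
    raise SystemExit(f"simple_batch.json is missing keys: {sorted(missing)}")

print(f"OK: {len(batch['test_batch']['interactions'])} interaction(s) "
      f"in scenario {batch['test_batch']['id']!r}")
```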
2. Call the /evaluate endpoint
Send the request with curl, or use Postman:
```bash
curl -X POST http://localhost:8080/evaluate \
  -H "Content-Type: application/json" \
  -d @simple_batch.json | jq
```
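If you would rather call the endpoint from Python than curl, the equivalent request looks roughly like this (a sketch assuming the third-party `requests` package is installed and the service listens on the same `http://localhost:8080` address):

```python
import json

import requests

# POST the batch file to /evaluate, mirroring the curl call above.
with open("simple_batch.json") as fh:
    payload = json.load(fh)

resp = requests.post("http://localhost:8080/evaluate", json=payload, timeout=300)
resp.raise_for_status()  # fail loudly on HTTP errors

print(json.dumps(resp.json(), indent=2))
```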
You should see a JSON response similar to this:
```json
{
  "message": "Evaluation completed successfully",
  "results": {
    "scenarios": [
      {
        "scenario_id": "0001-first-eval",
        "attempts": [
          {
            "interactions": [ { /* ... */ } ],
            "average_scores": { /* ... */ }
          }
        ],
        "average_scores": { /* ... */ }
      }
    ]
  }
}
```
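To grab the headline numbers programmatically, you can walk the `scenarios` list. This is a small sketch based only on the fields shown above; it assumes you piped the response into a file named `response.json`:

```python
import json

# Print per-scenario average scores from a saved /evaluate response
# (response.json is an assumed filename, e.g. from `curl ... | jq > response.json`).
with open("response.json") as fh:
    results = json.load(fh)["results"]

for scenario in results["scenarios"]:
    print(scenario["scenario_id"], scenario["average_scores"])
```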
3. Quick sanity checks
- If you see "Request failed" in `agent_reply`, verify your LLM endpoint (`endpoint` plus `model_id`). A sketch that scans a saved response for these failures follows this list.
- No JSON response at all? Check your `.env` credentials and restart the server.
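For the first check, you can scan a saved response instead of eyeballing the JSON. This is a minimal sketch: it assumes you saved the output to `response.json` and that failed interactions carry the literal string "Request failed" in their `agent_reply` field.

```python
import json

# Flag interactions whose agent_reply indicates the LLM call itself failed
# (the filename and the error string are assumptions; adjust to your setup).
with open("response.json") as fh:
    results = json.load(fh)["results"]

failed = [
    (scenario["scenario_id"], interaction.get("id"))
    for scenario in results["scenarios"]
    for attempt in scenario["attempts"]
    for interaction in attempt["interactions"]
    if "Request failed" in interaction.get("agent_reply", "")
]

if failed:
    print("Check your endpoint and model_id; failed interactions:", failed)
else:
    print("All interactions returned a reply.")
```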