Architecture
LevelApp is built around a simple but powerful flow: simulate conversations → evaluate the replies → collect and return structured results.
Core Components
LevelApp is made up of a few key building blocks:
- Simulators: Run test conversations by sending prompts to AI models and collecting their responses.
- Evaluators: Score the AI’s responses against the expected answers, using an LLM judge or string comparison.
- Scoring Logic: Combines multiple scores (text and metadata) and aggregates repeated test attempts for more reliable results.
- Test Batches: Packages of test conversations that drive the whole evaluation; you define what to test, and LevelApp takes care of the rest (see the sketch after this list).
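The sketch below shows one way these building blocks could fit together as plain data shapes plus a scoring helper. It is a minimal illustration only: the class names, fields, and the `combine_scores` weighting are assumptions made for this page, not LevelApp’s actual internals.

```python
from dataclasses import dataclass, field
from statistics import mean

# Illustrative shapes only: names, fields, and weights are assumptions,
# not LevelApp's real classes.

@dataclass
class Interaction:
    """One test turn: the prompt to send and the reference to compare against."""
    prompt: str
    reference_answer: str
    expected_metadata: dict = field(default_factory=dict)

@dataclass
class TestBatch:
    """A package of test interactions that drives a single evaluation run."""
    name: str
    interactions: list[Interaction]
    attempts: int = 1  # repeated attempts are aggregated by the scoring logic

def combine_scores(text_scores: list[float],
                   metadata_scores: list[float],
                   text_weight: float = 0.7) -> float:
    """Average the repeated attempts, then weight text quality against metadata accuracy."""
    return text_weight * mean(text_scores) + (1 - text_weight) * mean(metadata_scores)
```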
High-Level Workflow
- Submit a batch of test interactions (via API or UI).
- Simulators send each prompt to the AI model and collect replies.
- Evaluators compare those replies with the reference answers.
- Scores and justifications are generated for each interaction.
- Results are returned in a structured format for review or reporting (see the sketch below).
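As a rough, self-contained illustration of that flow, the snippet below simulates, evaluates, and aggregates a small batch using plain callables. The function name `run_batch` and the result fields are assumptions for this page; a real run would go through LevelApp’s API or UI instead.

```python
from statistics import mean
from typing import Callable

def run_batch(interactions: list[dict],            # each item: {"prompt": ..., "reference": ...}
              call_model: Callable[[str], str],    # simulator role: send a prompt, return the reply
              score: Callable[[str, str], float],  # evaluator role: compare reply vs. reference
              attempts: int = 1) -> list[dict]:
    """Simulate each prompt, score the replies, and return structured results."""
    results = []
    for item in interactions:
        # Repeat the interaction and average the scores for more reliable results.
        scores = [score(call_model(item["prompt"]), item["reference"])
                  for _ in range(attempts)]
        results.append({"prompt": item["prompt"],
                        "score": mean(scores),
                        "attempts": attempts})
    return results

if __name__ == "__main__":
    report = run_batch(
        interactions=[{"prompt": "What is 2 + 2?", "reference": "4"}],
        call_model=lambda prompt: "4",   # stand-in for a real model call
        score=lambda reply, ref: 1.0 if reply.strip() == ref.strip() else 0.0,
        attempts=3,
    )
    print(report)  # [{'prompt': 'What is 2 + 2?', 'score': 1.0, 'attempts': 3}]
```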
Why This Structure?
This modular setup makes LevelApp flexible and easy to extend. You can:
- Plug in different models (OpenAI, IONOS, etc.), as sketched below
- Use custom evaluation strategies
- Run small tests or large-scale batch evaluations
- Track model behavior over time
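For example, swapping model providers can be as simple as satisfying a small client interface. The `ModelClient` protocol below is a hypothetical illustration of that idea, not LevelApp’s actual interface; the OpenAI adapter assumes the openai>=1.0 Python SDK and an OPENAI_API_KEY in the environment.

```python
from typing import Protocol

class ModelClient(Protocol):
    """Minimal interface a pluggable model provider would need to satisfy (hypothetical)."""
    def complete(self, prompt: str) -> str: ...

class EchoClient:
    """Offline stub, handy for smoke-testing the evaluation pipeline itself."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

class OpenAIClient:
    """Adapter for OpenAI models (assumes the openai>=1.0 SDK and OPENAI_API_KEY)."""
    def __init__(self, model: str = "gpt-4o-mini"):
        from openai import OpenAI
        self.client = OpenAI()  # reads OPENAI_API_KEY from the environment
        self.model = model

    def complete(self, prompt: str) -> str:
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content
```

A custom evaluation strategy can be plugged in the same way: anything that maps a reply and a reference answer to a score fits the evaluator role in the earlier sketches.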