/v1/eval
Launch an eval
Launch an evaluation. This is the API equivalent of the Eval function that is built into the Braintrust SDK. In the Eval API, you provide pointers to a dataset, task function, and scoring functions. The API will then run the evaluation, create an experiment, and return the results along with a link to the experiment. To learn more about evals, see the Evals guide.
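For reference, a minimal launch request might look like the TypeScript sketch below. The base URL, the placeholder IDs, and the exact field shapes inside data, task, and scores are assumptions rather than an authoritative example; substitute values from your own project.

```typescript
// Minimal sketch of launching an eval via POST /v1/eval (all IDs are placeholders).
const response = await fetch("https://api.braintrust.dev/v1/eval", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.BRAINTRUST_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    project_id: "<project-id>",                        // required
    data: { dataset_id: "<dataset-id>" },              // the dataset to use
    task: { function_id: "<task-function-id>" },       // the function to evaluate
    scores: [{ function_id: "<scorer-function-id>" }], // the scoring functions
    experiment_name: "api-launched-eval",              // optional
  }),
});
const result = await response.json();
console.log(result);
```

If stream is set to true, the endpoint returns two events (experiment started and completed) instead of a single JSON summary, so reading the body with response.json() as above would not apply.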
Authorization
Required · Bearer <token>
Most Braintrust endpoints are authenticated by providing your API key in an Authorization: Bearer [api_key] header on your HTTP request. You can create an API key in the Braintrust organization settings page.
In: header
Request Body
application/json
Required · Eval launch parameters (a TypeScript sketch of the full request body follows the parameter list below)
project_id
Required · string
Unique identifier for the project to run the eval in
data
Required · Any properties in dataset_id, project_dataset_name
The dataset to use
task
Required · Any properties in function_id, project_slug, global_function, prompt_session_id, inline_code, inline_prompt
The function to evaluate
scores
Required · array<Any properties in function_id, project_slug, global_function, prompt_session_id, inline_code, inline_prompt & unknown>
The functions to score the eval on
experiment_name
string
An optional name for the experiment created by this eval. If it conflicts with an existing experiment, it will be suffixed with a unique identifier.
metadata
object
Optional experiment-level metadata to store about the evaluation. You can later use this to slice & dice across experiments.
stream
boolean
Whether to stream the results of the eval. If true, the request will return two events: one to indicate the experiment has started, and another upon completion. If false, the request will return the evaluation's summary upon completion.
trial_count
number | null
The number of times to run the evaluator per input. This is useful for evaluating applications that have non-deterministic behavior and gives you both a stronger aggregate measure and a sense of the variance in the results.
is_public
boolean | null
Whether the experiment should be public. Defaults to false.
timeout
number | null
The maximum duration, in milliseconds, to run the evaluation. Defaults to undefined, in which case there is no timeout.
max_concurrency
number | null
The maximum number of tasks/scorers that will be run concurrently. Defaults to undefined, in which case there is no max concurrency.
base_experiment_name
string | null
An optional experiment name to use as a base. If specified, the new experiment will be summarized and compared to this experiment.
base_experiment_id
string | null
An optional experiment id to use as a base. If specified, the new experiment will be summarized and compared to this experiment.
git_metadata_settings
object | null
Optional settings for collecting git metadata. By default, will collect all git metadata fields allowed in org-level settings.
repo_info
object | null & unknown
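As referenced above, here is a hedged TypeScript sketch of the full request body, assembled from the parameters listed in this section. The shapes of the data, task, scores, git_metadata_settings, and repo_info objects are left as generic records because they depend on which variant (dataset_id, function_id, inline_prompt, etc.) you choose; treat any field names inside them as assumptions to verify against the schema.

```typescript
// Sketch of the /v1/eval request body, derived from the parameter list above.
interface EvalLaunchRequest {
  project_id: string;                      // project to run the eval in (required)
  data: Record<string, unknown>;           // dataset pointer (required)
  task: Record<string, unknown>;           // function to evaluate (required)
  scores: Record<string, unknown>[];       // scoring functions (required)
  experiment_name?: string;                // suffixed if it conflicts with an existing experiment
  metadata?: Record<string, unknown>;      // experiment-level metadata
  stream?: boolean;                        // stream events instead of a single summary
  trial_count?: number | null;             // runs per input
  is_public?: boolean | null;              // defaults to false
  timeout?: number | null;                 // max duration in milliseconds
  max_concurrency?: number | null;         // max concurrent tasks/scorers
  base_experiment_name?: string | null;    // compare against this experiment
  base_experiment_id?: string | null;
  git_metadata_settings?: Record<string, unknown> | null;
  repo_info?: Record<string, unknown> | null;
}
```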
Eval launch response