# Product updates

> New updates and product improvements

export const feature_1 = "Playground annotations"

export const verb_1 = "are"

export const feature_0 = "Snapshots"

export const verb_0 = "are"

export const version_40 = "v1.1.28+"

export const version_39 = "v1.1.28+"

export const version_38 = "v1.1.29+"

export const version_37 = "v1.1.29+"

export const version_36 = "v2.0+"

export const version_35 = "v1.1.31+"

export const version_34 = "v1.1.31+"

export const version_33 = "v2.0+"

export const version_32 = "v2.0+"

export const version_31 = "v2.0+"

export const version_30 = "v2.0+"

export const version_29 = "v2.0+"

export const version_28 = "v2.0+"

export const version_27 = "v2.1.0+"

export const version_26 = "v2.0+"

export const version_25 = "v2.1.0+"

export const version_24 = "v2.0+"

export const version_23 = "v2.1.0+"

export const version_22 = "v2.1.0+"

export const version_21 = "v2.2.0+"

export const version_20 = "v2.2.1+"

export const version_19 = "v2.2.1+"

export const version_18 = "v2.1.0+"

export const version_17 = "v2.2.1+"

export const version_16 = "v2.1.1+"

export const version_15 = "v2.2.1+"

export const version_14 = "v2.1.1+"

export const version_13 = "v2.1.0+"

export const version_12 = "v2.2.1+"

export const version_11 = "v2.2.1+"

export const version_10 = "v2.2.1+"

export const version_9 = "v2.2.1+"

export const version_8 = "v2.2.1+"

export const version_7 = "v2.2.1+"

export const version_6 = "v2.3.0+"

export const version_5 = "v2.2.1+"

export const version_4 = "v2.2.1+"

export const version_3 = "v2.2.1+"

export const version_2 = "v2.0+"

export const version_1 = "v2.3.0+"

export const version_0 = "v2.6.0+ (not yet released for self-hosting)"

<Update label="June 2026">
  ### Audit logging

  Braintrust now records administrative actions across your organization, such as creating projects, changing settings, granting permissions, managing members, and creating API keys. View recent activity in <Icon icon="settings-2" /> **Settings** > <Icon icon="clipboard-list" /> **Audit log** with time range, filter, and column controls, or query the full history with [SQL](/reference/sql) from the SQL sandbox, the [`bt sql`](/reference/cli/sql) CLI, or the API. Organizations with strict data access requirements can also enable auditing of data reads. See [Audit logging](/admin/audit-logs) for details.

  <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_0}</Badge></Tooltip>

  ### Gateway provider failover

  Gateway requests can now set `x-bt-fallback-providers` to retry the same request against fallback provider credentials when the primary provider returns a retryable provider error. Responses include headers showing which provider completed the request and, when failover succeeds, the original and final `model/provider` pairs. See [Enable provider failover](/deploy/gateway#enable-provider-failover) for details.

  ### Trace redesign

  Selecting a span now opens its detail panel to the **Messages** tab, focused on the highest-signal content first. When [topics](/observe/topics) are enabled, the matched topics and facet summaries appear at the top, followed by the span's input and output messages, tool calls, and annotations. Metadata, metrics, scores, and raw data remain available in the other tabs. See [Examine traces](/observe/examine-traces#inspect-a-trace) for details.

  ### Score visibility for human review

  Human review scores now support an optional visibility setting. Restrict a score to specific members or permission groups so only relevant reviewers see it during review, which keeps the review experience focused for large teams, or leave it unset to keep the score visible to everyone. See [Restrict score visibility](/annotate/human-review#restrict-score-visibility) for details.

  ### Conditional human review scores

  Human review scores now support filter conditions that control when a score appears in the review panel. Set a **Show when** condition on a score to surface it only when its filter expression evaluates to true for the span being reviewed. For example, show a detailed rubric only when a triage score is below a threshold, or show a correction score only when the expected output matches a specific category. Conditions use SQL syntax and can reference other scores, expected values, and metadata. See [Show scores conditionally](/annotate/human-review#show-scores-conditionally) for details.
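
As a rough illustration of how a **Show when** condition gates a score, the sketch below models the triage example in plain Python. It is not Braintrust's implementation: real conditions are written in SQL syntax, and the span layout, the `triage` score name, and the `0.5` threshold are all hypothetical.

```python
from typing import Any, Callable, Dict

Span = Dict[str, Any]

def should_show(span: Span, condition: Callable[[Span], bool]) -> bool:
    """Return True when the score should appear in the review panel for this span."""
    return bool(condition(span))

# Hypothetical condition: show the detailed rubric only when triage is below 0.5.
def triage_below_threshold(span: Span) -> bool:
    triage = span.get("scores", {}).get("triage")
    # A missing triage score means the condition cannot be true.
    return triage is not None and triage < 0.5

low = {"scores": {"triage": 0.2}}
high = {"scores": {"triage": 0.9}}
unscored = {"scores": {}}
```

Here `should_show(low, triage_below_threshold)` is true, while the `high` and `unscored` spans keep the rubric hidden.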

  ### Prompt and scorer versions in playground tasks

  When you add a saved prompt or scorer as a playground task, you can now build the task from a specific historical version instead of always getting the latest, making it easier to compare versions side by side and reproduce past results. See [Add tasks](/evaluate/playgrounds#add-tasks) for details.

  ### Rewind online scoring automations

  When you update a scorer or need to correct scoring results, you can now rewind an online scoring automation to re-process traces from a specific timestamp forward. This is useful for re-evaluating recent traces after scorer updates, fixing interrupted scoring runs, or addressing scoring errors. See [Rewind an automation](/evaluate/score-online#rewind-an-automation) for details.

  <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_1}</Badge></Tooltip>

  ### Classifiers

  Classifiers are scorer-like evaluators that return a categorical label instead of a numeric score, so you can categorize by a dimension such as sentiment, issue type, or policy category. They run wherever scorers do, across eval cases in experiments and on production traces via online scoring. Classifications surface alongside scores and appear as columns you can sort and filter in experiment and trace tables. In logs, you can also promote a category's traces into a dataset, turning production patterns into evaluation test cases. See [Classifiers](/evaluate/write-scorers#classifiers) for details.

  <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_2}</Badge></Tooltip>

  ### Multi-user human review

  Multiple reviewers can now score the same span independently without overwriting each other's work. Braintrust stores each reviewer's scores separately and automatically averages them on the parent span, with a breakdown of your own reviews, all reviews of the span, and all spans in the trace. For score types that can't be averaged, such as free-form text and categorical scores that write to expected output, you can choose which reviewer's value becomes the parent span's value. See [Add human feedback](/annotate/human-review) for details.
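
The averaging step can be pictured with a small sketch. It assumes a plain arithmetic mean over the numeric scores that reviewers have submitted, which the text above does not spell out in detail; the reviewer names and values are made up.

```python
from statistics import mean
from typing import Dict, Optional

def parent_score(reviewer_scores: Dict[str, Optional[float]]) -> Optional[float]:
    """Average the numeric scores submitted by individual reviewers.

    Reviewers who have not scored yet (None) are skipped.
    """
    submitted = [s for s in reviewer_scores.values() if s is not None]
    return mean(submitted) if submitted else None

reviews = {"alice": 1.0, "bob": 0.5, "carol": None}
print(parent_score(reviews))  # 0.75
```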

  ### Programmatic service token creation

  Organization owners can now create service tokens through the API. Call [`POST /v1/service_token`](/api-reference/servicetokens/create-service_token) and authenticate with a service token that has organization-owner permissions, or include `token_name` when creating a service account through [`PATCH /v1/organization/members`](/api-reference/organizations/modify-organization-membership). API key creation remains UI-only, and user API keys cannot be used to create service tokens. See [Create service tokens](/admin/organizations#create-service-tokens) for the supported flows.

  ### Run remote evals and sandboxes as experiments

  [Remote evals and sandboxes](/evaluate/remote-evals) can now run directly as experiments, not just from a playground. This lets you capture immutable, tracked results from custom agent and non-prompt code, comparable to any other experiment in your project, without going through the playground iteration loop first.

  ### Workload identity federation for Azure

  Connect Azure AI Foundry and Azure OpenAI to Braintrust using Microsoft Entra Workload Identity, which exchanges Braintrust-signed OIDC tokens for Microsoft Entra access tokens instead of storing long-lived Azure credentials in Braintrust. Available for Braintrust-hosted organizations through organization-level AI providers when the Braintrust gateway is enabled. Project-level Azure providers continue to use API key or Entra API authentication. See [Azure AI Foundry](/integrations/ai-providers/azure#configure-the-integration) for setup instructions.

  ### Workload identity federation for OpenAI

  Connect OpenAI to Braintrust using workload identity federation, which exchanges Braintrust-signed JWTs for OpenAI access tokens instead of storing long-lived OpenAI API keys in Braintrust. Available for Braintrust-hosted organizations through organization-level AI providers when the Braintrust gateway is enabled. Project-level OpenAI providers continue to use API key authentication. See [OpenAI](/integrations/ai-providers/openai#setup) for setup instructions.

  ### Shingled search optimization

  Brainstore can now index multi-word shingles in bloom filters, so phrase and multi-word `search()` queries eliminate more log segments before scanning. Enable **Shingled search optimization** alongside log search optimization in project settings to speed up full-text search on high-volume logs. See [Speed up log filtering](/admin/projects#speed-up-log-filtering) for details.
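
To see why shingles help, consider the sketch below. It is a conceptual illustration rather than Brainstore's implementation: a Python `set` stands in for the bloom filter (a real bloom filter can report false positives but never false negatives), and the tokenizer, the two-word shingle size, and the segment contents are assumptions.

```python
from typing import List, Set

def shingles(text: str, size: int = 2) -> Set[str]:
    """Return the set of consecutive `size`-word shingles in `text`."""
    words = text.lower().split()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}

def candidate_segments(segments: List[str], phrase: str, size: int = 2) -> List[int]:
    """Return indexes of segments that might contain `phrase`.

    A segment is skipped when any shingle of the phrase is missing from its
    index, so it never has to be scanned.
    """
    needed = shingles(phrase, size)
    kept = []
    for i, segment in enumerate(segments):
        index = shingles(segment, size)  # stand-in for a per-segment bloom filter
        if needed <= index:
            kept.append(i)
    return kept

segments = [
    "rate limit exceeded for tenant a",
    "limit reached and rate dropped",  # has both words, but not the phrase
    "request ok",
]
print(candidate_segments(segments, "rate limit"))  # [0]
```

With single-word indexing alone, the second segment would also be a candidate because it contains both `rate` and `limit`; the shingle index rules it out before any scan.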

  <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_3}</Badge></Tooltip>

  ### bt CLI releases

  * [v0.12.0](https://github.com/braintrustdata/bt/releases/tag/v0.12.0) - Added [`bt datasets snapshots`](/reference/cli/datasets#bt-datasets-snapshots) to create, list, restore, and delete dataset snapshots from the CLI, with a preview of the rows affected before restoring. Added [`bt topics btmap`](/reference/cli/topics#bt-topics-btmap) to download the raw topic map artifact for a facet's topic map. Improved JSON parsing error messages to include more context about what failed and where.
  * [v0.11.1](https://github.com/braintrustdata/bt/releases/tag/v0.11.1) - Fixed the npm package build so the Windows binary is extracted to the correct path, completing npm distribution support across all platforms.
  * [v0.11.0](https://github.com/braintrustdata/bt/releases/tag/v0.11.0) - `bt` is now available through npm as an optional dependency of the [`braintrust` JavaScript SDK](/reference/sdks/typescript) (v3.17.0 or later). After installing `braintrust`, run the CLI with `npx bt` or `pnpm exec bt`. See [Install with npm](/reference/cli/quickstart#install-with-npm).
  * [v0.10.1](https://github.com/braintrustdata/bt/releases/tag/v0.10.1) - Fixed the global `--json` flag, which previously had no effect, to format command output as JSON. Fixed serialization of typed parameter schemas for remote evals.
  * [v0.10.0](https://github.com/braintrustdata/bt/releases/tag/v0.10.0) - Added [`bt datasets pipeline`](/annotate/datasets/pipelines) for transforming project logs into dataset rows, with one-shot (`run`) or staged (`pull`, `transform`, `push`) execution. Supports TypeScript and Python pipelines declared with `DatasetPipeline(...)`. `bt functions push` (Python) now discovers and pushes parameter definitions alongside functions and prompts. Windows binaries are now code-signed, removing SmartScreen "unknown publisher" warnings. Fixed `ERR_CLOSED_SERVER` errors when running evals via `vite-node`.

  ### TypeScript SDK releases

  * [v3.20.0](https://github.com/braintrustdata/braintrust-sdk-javascript/releases/tag/braintrust@3.20.0) - Added `braintrustFlueInstrumentation()` for cleaner [Flue v1 manual instrumentation](/integrations/agent-frameworks/flue#manual-instrumentation-typescript), so Flue v1 apps can register Braintrust with Flue's `instrument()` API. Flue v0.8.x manual instrumentation continues to use `braintrustFlueObserver`.
  * [v3.19.0](https://github.com/braintrustdata/braintrust-sdk-javascript/releases/tag/braintrust@3.19.0) - Added support for tracing [`@anthropic-ai/bedrock-sdk`](/integrations/ai-providers/anthropic#anthropic-bedrock-typescript), [`@aws-sdk/client-bedrock-runtime`](/integrations/ai-providers/bedrock#tracing-typescript), and [`@strands-agents/sdk`](/integrations/agent-frameworks/strands-agent#tracing-typescript). The SDK now exposes a reporter for the [vitest-evals library](/integrations/sdk-integrations/vitest#report-vitest-evals-runs-to-braintrust). Added support for [Flue v1](/integrations/agent-frameworks/flue) via manual instrumentation, removing the previously erroring automatic instrumentation. Fixed Anthropic system-message ordering and dataset-backed eval origin tracking for copied dataset rows.
  * [v3.18.0](https://github.com/braintrustdata/braintrust-sdk-javascript/releases/tag/braintrust@3.18.0) - Added support for tracing [AI SDK v7](/integrations/sdk-integrations/vercel#manual-instrumentation-telemetry-registration-typescript) and the [Pi Coding Agent SDK](/integrations/agent-frameworks/pi-coding-agent). Expanded support for tracing [Google GenAI Interactions API](/integrations/ai-providers/gemini#trace-live-api-and-interactions). Explicit `origin` metadata can now be passed into evals. `bt functions push` can upload project-scoped TypeScript [classifiers](/evaluate/write-scorers#classifiers). Fixed `BraintrustSpanProcessor` support for OpenTelemetry SDK v1 spans, Flue and Claude Agent SDK span nesting, AI SDK v6 time to first token metrics, and eval summaries for explicit base experiments.
  * [v3.17.0](https://github.com/braintrustdata/braintrust-sdk-javascript/releases/tag/braintrust@3.17.0) - The [`bt` CLI](/reference/cli/quickstart) is now available as an optional dependency of the `braintrust` npm package, so JavaScript SDK users can pin `bt` to their SDK version and run it via `pnpm exec bt` without a separate install. Platform binaries are delivered through optional dependencies, with a postinstall download as a fallback when optional dependencies are skipped. Added `BT_BINARY_PATH` to override the resolved binary and `BT_SKIP_DOWNLOAD=1` to skip the postinstall download.
  * [v3.16.0](https://github.com/braintrustdata/braintrust-sdk-javascript/releases/tag/braintrust@3.16.0) - Exported `LocalTrace`, the `Trace` object used by [trace-level scorers](/evaluate/custom-code#score-traces), so you can construct one directly. `invoke()` now accepts an `overrides` parameter that deep-merges configuration into the resolved function data server-side for facet, code, global, and remote eval functions (no effect on prompt functions). Fixed `BraintrustStream` decoding of multi-byte UTF-8 characters split across chunk boundaries. The `DatasetPipelineRow` type no longer includes an `output` field.
  * [v3.15.0](https://github.com/braintrustdata/braintrust-sdk-javascript/releases/tag/braintrust@3.15.0) - Updated [Flue](/integrations/agent-frameworks/flue) instrumentation to use the `@flue/runtime` 0.8.0 observe hooks, with new span names and metadata keys. Requires `@flue/runtime` 0.8.0 or later. **Breaking change**: removed the `wrapFlueContext` and `wrapFlueSession` exports. Migrate to the observe hooks API. Auto-instrumentation continues to work without changes.

  ### Go SDK releases

  * [v0.8.0](https://github.com/braintrustdata/braintrust-sdk-go/releases/tag/v0.8.0) - Added [classifier support](/evaluate/write-scorers#classifiers) through the `Classifiers` option on the eval runner, with `eval.NewClassifier` and the `eval.Classification` type. The SDK now discovers `BRAINTRUST_API_KEY` from a `.braintrust.json` file, searching upward from the working directory when no key is set through the environment or the `WithAPIKey` option. `WithAPIKey` now ignores blank or whitespace-only values, so environment and file-based fallback remain in effect.
  * [v0.7.2](https://github.com/braintrustdata/braintrust-sdk-go/releases/tag/v0.7.2) - Base64-encoded AI attachments (images, PDFs, and other binary data) in span inputs and outputs are now automatically uploaded as separate Braintrust attachment objects, reducing span payload sizes and the chance of exceeding OpenTelemetry span size limits. Supported for OpenAI, Anthropic, Google Gemini, and AWS Bedrock. Enabled by default. Disable with `BRAINTRUST_AUTO_CONVERT_AI_ATTACHMENTS=false` or the `WithAutoConvertAIAttachments(false)` option.

  ### Java SDK releases

  * [v0.3.13](https://github.com/braintrustdata/braintrust-sdk-java/releases/tag/v0.3.13) - Fixed `BrainstoreTrace.getLLMConversationThread()` returning an empty conversation inside [trace-level scorers](/evaluate/custom-code#score-traces). The method now reconstructs the conversation thread even when the trace's root span has not yet been ingested, which is the case during eval scoring. Also fixed remote evals so eval metadata omits the `parameters` field entirely when an eval defines no parameters.
  * [v0.3.12](https://github.com/braintrustdata/braintrust-sdk-java/releases/tag/v0.3.12) - Added [classifier support](/evaluate/write-scorers#classifiers) in `Eval()` through the `classifiers` parameter. The SDK now also discovers `BRAINTRUST_API_KEY` from a `.env.braintrust` file, searching upward from the working directory.

  ### C# SDK releases

  * [v0.2.8](https://github.com/braintrustdata/braintrust-sdk-dotnet/releases/tag/v0.2.8) - Added [classifier support](/evaluate/write-scorers#classifiers) for categorical evaluation: implement `IClassifier<TInput, TOutput>`, use `FunctionClassifier<TInput, TOutput>` for inline single- or multi-label classifiers, or `ITracedClassifier<TInput, TOutput>` to inspect trace spans, and register them with `.Classifiers(...)` on the eval builder. The OpenAI and Anthropic integrations now tag LLM spans with their span type and capture request parameters such as model, temperature, max tokens, and tools as span metadata. The SDK now discovers `BRAINTRUST_API_KEY` from a `.env.braintrust` file, searching upward from the working directory, and defers API key validation until first use instead of at construction.

  ### Improvements

  * Conditional human review scores now support **Trace** and **Subspan** condition scopes in addition to span-level conditions. **Trace** conditions show the score when any span in the trace matches the expression. **Subspan** conditions show the score when any child span of the current span matches. See [Show scores conditionally](/annotate/human-review#show-scores-conditionally) for details.
  * Timeline view now properly renders post-hoc scoring spans (such as AutoEval results) inline. By default, scorer spans are hidden to focus on the root execution timeline. Select <Icon icon="ellipsis-vertical" /> and toggle **Include score spans in timeline** to view scoring duration alongside your trace execution. See [View as a timeline](/observe/examine-traces#view-as-a-timeline) for details.
  * Large text and object diffs can now be displayed in full by clicking **Show more** when comparing prompt versions, improving visibility for long prompts and complex data structures.
  * When Loop is enabled, the SQL filter editor for [log alerts](/admin/automations/alerts#create-a-log-alert) now supports <Icon icon="blend" /> **Generate**, so you can write the filter from a natural-language description instead of by hand.
  * Added shareable links to specific saved [custom views](/annotate/custom-views). Selecting a saved trace or dataset view now records it in the URL, so copying the link opens that exact view.
  * The [**<Icon icon="shield-check" /> Permission groups**](https://www.braintrust.dev/app/~/configuration/org/groups) dialog now lets you set organization-level, all-projects, and project-specific permissions inline while creating a group, and you can create a new permission group directly from the member assignment picker. See [Access control](/admin/access-control) for details.
  * Time series charts on the [Monitor](/observe/dashboards) page now support a **Target time interval** option (Auto, Week, Day, or Hour) for explicit control over time bucketing, with smoother automatic bucket sizes and query ranges aligned to full intervals.

    <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_4}</Badge></Tooltip>
  * Git diff content is now opt-in. Braintrust no longer logs the **Diff** field of git metadata unless your organization explicitly enables it at [**<Icon icon="settings-2" /> Settings** > **<Icon icon="scroll-text" /> Logging**](https://www.braintrust.dev/app/~/configuration/org/logging). Other git metadata fields are still collected by default. See [Set git metadata logging](/admin/organizations#set-git-metadata-logging) for details.

    <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_5}</Badge></Tooltip>
  * You can now tune topic clustering settings, including sample size, hierarchy threshold, reconciliation, clustering algorithm, and dimension reduction, directly from the Topics page in addition to the [`bt` CLI](/reference/cli/topics). See [Tune clustering settings](/observe/topics/manage#tune-clustering-settings) for details.
  * Trace-level scorers and prompts now support a [`{{thread_with_system}}`](/evaluate/llm-as-a-judge#score-traces) reserved variable that renders the full conversation including system messages. Scorer `{{thread}}` continues to omit system messages so judge rubrics aren't polluted by your application's system prompt.

    <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_6}</Badge></Tooltip>
  * When creating an organization, the data plane region is now pre-selected based on your location: EU for users in the EU, EEA, UK, and Switzerland; US otherwise. You can still change it before creating the organization. See [Create an organization](/admin/organizations#create-an-organization) for details.
  * Log alerts with a webhook action now require a successful test before saving, so a stale test result can't authorize a changed webhook target. See [Set up alerts](/admin/automations/alerts).
  * Automation webhook URLs are now validated when you save an alert. Malformed and non-http(s) URLs are rejected, and delivery to private or internal network addresses is blocked. See [Set up alerts](/admin/automations/alerts).
  * [Spend alerts](/admin/billing/monitor-usage#set-up-spend-alerts) now support Slack channel notifications in addition to email. Select **Billing email**, **Slack channel**, or both when configuring your alert thresholds.
  * Webhook alert URLs must now use `http` or `https`, and Braintrust blocks delivery to private, internal, or otherwise reserved network addresses to protect against server-side request forgery (SSRF). See [Webhook payloads](/admin/automations/alerts#webhook-payloads) for details.
  * MCP server URLs must now use `http` or `https` to protect against server-side request forgery (SSRF). See [Add MCP servers](/evaluate/write-prompts#add-mcp-servers) for details.

    <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_7}</Badge></Tooltip>
  * Bedrock custom models can now use the **OpenAI** format, routing through Bedrock's OpenAI-compatible Chat Completions endpoint, including streaming. See [AWS Bedrock](/integrations/ai-providers/bedrock#connect-bedrock-to-braintrust) for details.

    <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_8}</Badge></Tooltip>
  * Human review score cards now show a per-score review count with a breakdown of which reviewers match the base value and which diverge, and the review-span dropdown includes a **Base span** option to jump back to the parent span's scores. See [Add human feedback](/annotate/human-review#review-with-multiple-reviewers) for details.
  * The [gateway](/deploy/gateway) now supports Google Gemini's native `embedContent` and `batchEmbedContents` endpoints, so you can route Gemini embedding requests, including multimodal text and image content, through the gateway with logging and billing. See [Generate embeddings](/deploy/gateway#generate-embeddings) for details.
  * The [**<Icon icon="beaker" /> Experiments**](https://www.braintrust.dev/app/~/experiments) list now shows an **Estimated cost** column alongside scores and token metrics, making it easier to compare total cost across experiment runs at a glance.
  * Custom columns in [**<Icon icon="activity" /> Logs**](https://www.braintrust.dev/app/~/logs) now accept any valid non-aggregate SQL expression. In addition to simple field references, you can use functions (`concat`, `coalesce`, `lower`), `CASE` expressions, arithmetic operators, and string patterns (`LIKE`, `ILIKE`). See [Create custom columns](/observe/view-logs#create-custom-columns) for details.
  * The `len()` SQL function now returns `1` for scalar string, number, and boolean values instead of `null`. This preserves filter and sort behavior when row materialization collapses a single-element array into a scalar. See [SQL functions](/reference/sql#sql-functions) for details.
  * <Icon icon="blend" /> **Loop** and [**<Icon icon="shapes" /> Playgrounds**](https://www.braintrust.dev/app/~/playgrounds) now display actionable error messages when AI provider requests fail, including an **Edit** link to [**<Icon icon="sparkle" /> AI providers**](https://www.braintrust.dev/app/~/configuration/org/secrets) for invalid API key errors, a **New session** action for context limit errors in Loop, and a **Try again** button for network and provider-outage errors.
  * [**<Icon icon="pentagon" /> Topics**](/observe/topics) is now accessible from the command palette (⌘K / Ctrl+K) for projects with Topics enabled. Settings shortcuts such as API keys and AI providers now appear above the project list in the command palette switcher.
  * The **Thread** (<Icon icon="messages-square" />) layout now includes an <Icon icon="ellipsis" /> options menu with a **Refresh** action to reload thread data. See [View as a conversation](/observe/examine-traces#view-as-a-conversation) for details.
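
The webhook URL rules in the list above (http or https only, no delivery to private or internal addresses) can be sketched as follows. This is an illustrative check, not Braintrust's validator: the function name is hypothetical, and it only inspects literal IP addresses and `localhost`, whereas a production check would also need to resolve hostnames before delivery.

```python
import ipaddress
from urllib.parse import urlparse

def is_allowed_webhook_url(url: str) -> bool:
    """Return True if `url` uses http(s) and does not target a reserved address."""
    try:
        parsed = urlparse(url)
        host = parsed.hostname
    except ValueError:
        return False  # malformed URL
    if parsed.scheme not in ("http", "https") or not host:
        return False
    if host == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a DNS name; resolving it is out of scope for this sketch
    blocked = (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_reserved or ip.is_multicast or ip.is_unspecified
    )
    return not blocked

print(is_allowed_webhook_url("https://hooks.example.com/alert"))  # True
print(is_allowed_webhook_url("http://10.0.0.5/hook"))             # False
```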
</Update>

<Update label="May 2026">
  ### Topics pricing

  [Topics](/observe/topics) now uses monthly included credits across all plans: \$10/month on Starter and \$249/month on Pro as a launch promotion for the first 3 months, reverting to \$100/month after. Overage charges apply once credits are used up, at \$0.06/MTok input and \$0.40/MTok output, uniform across plans. Credits do not roll over month-to-month. [Custom facets](/observe/topics/custom-facets) are now available on all plans. See [Topics usage](/plans-and-limits#topics-usage) and [Billing FAQ](/admin/billing/faq) for details.
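
As a worked example of the overage rates, the sketch below estimates a monthly Topics overage. It assumes included credits are drawn down at the same per-token rates before overage applies, which the text above does not state explicitly; the usage figures are made up.

```python
from decimal import Decimal

INPUT_RATE = Decimal("0.06")   # USD per million input tokens
OUTPUT_RATE = Decimal("0.40")  # USD per million output tokens

def topics_overage(input_mtok: Decimal, output_mtok: Decimal, credits: Decimal) -> Decimal:
    """Return the overage in USD after included credits are used up."""
    usage = input_mtok * INPUT_RATE + output_mtok * OUTPUT_RATE
    return max(usage - credits, Decimal("0"))

# 100 MTok in and 50 MTok out cost 6.00 + 20.00 = 26.00 USD of usage.
# With 10 USD of included credits, the overage is 16.00 USD.
print(topics_overage(Decimal("100"), Decimal("50"), Decimal("10")))  # 16.00
```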

  ### Cached tokens view for traces

  The <Icon icon="square-chart-gantt" /> **Timeline** trace view adds a cached tokens view that scales LLM span bars by cached read tokens and breaks down input token usage by uncached input, cached read, and cache write, so you can assess prompt caching effectiveness across a trace at a glance. See [View as a timeline](/observe/examine-traces#view-as-a-timeline) for details.

  ### Topics for sessions and multi-turn conversations

  Topics automations can now classify groups of related traces, like sessions or multi-turn conversations, as a single unit instead of each trace independently. Pick a grouping key like `metadata.session_id`, set the interval and trace cap that defines each group, and choose whether the classification is written on the chronologically first trace in the group or on every trace with prior traces as context. See [Group traces into conversations](/observe/topics/manage#group-traces-into-conversations) for details.

  <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_9}</Badge></Tooltip>

  ### Workload identity federation for Anthropic

  Connect Anthropic to Braintrust using workload identity federation, which exchanges Braintrust-signed JWTs for Anthropic access tokens instead of storing long-lived Anthropic API keys in Braintrust. Available for Braintrust-hosted organizations through organization-level AI providers. Project-level Anthropic providers continue to use API key authentication. See [Add Anthropic as an AI provider](/integrations/ai-providers/anthropic#add-anthropic-as-an-ai-provider) for setup instructions.

  ### Workload identity federation for Vertex AI

  Connect Google Vertex AI to Braintrust using workload identity federation, which exchanges Braintrust-signed OIDC tokens for Google access tokens instead of storing long-lived Google credentials in Braintrust. Available for Braintrust-hosted organizations through organization-level AI providers when the Braintrust gateway is enabled. Project-level Vertex AI providers continue to use access token or service account key authentication. See [Google Vertex AI](/integrations/ai-providers/google#configure-the-integration) for setup instructions.

  ### Query patterns in the Infra dashboard

  For self-hosted deployments, the **<Icon icon="activity" /> Infra dashboard** now includes **UI queries** and **API queries** tables that group slow query shapes by object type, filter fields, source, and predicate types like `ILIKE`, `match()`, and inequalities. Each row shows count and p50/p95/p99 latency, helping you identify which query shapes are driving latency and where to focus indexing or query refactors. See [Infra dashboard](/admin/self-hosting#infra-dashboard) for details.

  ### Project-scoped default views

  Set a default view at the project level so everyone viewing a project lands on the same starting view, without affecting other projects or requiring organization-level permissions. Project admins (project-level **Update** permission) can now set defaults on Logs, Experiments, Datasets, Review, and Monitor pages. Project defaults sit between personal and organization defaults in the lookup order. See [Set default table views](/observe/view-logs#set-default-table-views) for details.
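
The lookup order can be pictured as a simple fallback chain. The sketch assumes the most specific default wins (personal, then project, then organization); the function and view names are hypothetical.

```python
from typing import Optional

def resolve_default_view(
    personal: Optional[str],
    project: Optional[str],
    organization: Optional[str],
) -> Optional[str]:
    """Return the first default that is set, from most to least specific."""
    for view in (personal, project, organization):
        if view is not None:
            return view
    return None  # no default configured at any level

print(resolve_default_view(None, "errors-last-24h", "all-logs"))  # errors-last-24h
```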

  ### Dataset filters for experiments

  Scope an [experiment run](/evaluate/run-evaluations#create-from-scratch) to a subset of a dataset instead of every record. Filters from a [filtered dataset view](/annotate/datasets/manage#filter-records) carry over when you create an experiment, so the run targets the same records.

  <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_10}</Badge></Tooltip>

  ### Dataset versioning in Loop

  [Loop](/loop#manage-dataset-versions) can now save, list, and restore dataset snapshots and tag versions with environments. Checkpoint a dataset before a batch of edits, roll back to a saved version, or promote a snapshot to `production` without leaving the chat. Available when Loop is opened from a dataset page.

  ### Comparison grade in experiment comparisons

  Experiment comparisons now label each experiment as improvement, regression, tradeoff, or tie relative to the base experiment, so you can tell at a glance whether a run is a clear win, a clear loss, or a mix of gains and losses. The grade appears as a row in **<Icon icon="table-2" /> Summary table** layout and in column headers in **<Icon icon="grid-2x2" /> Grid** layout, including playground task columns. See [Compare experiments](/evaluate/compare-experiments#assess-overall-impact) for details.

  ### Assume role authentication for Amazon Bedrock

  Connect Amazon Bedrock to Braintrust using AWS STS `AssumeRole` with an IAM role in your AWS account, instead of storing long-lived AWS access keys in Braintrust. Available for Braintrust-hosted organizations. See [Connect Bedrock to Braintrust](/integrations/ai-providers/bedrock#connect-bedrock-to-braintrust) for setup instructions.

  ### Self-hosted data plane v2.1.1

  Data plane v2.1.1 is now the recommended target for self-hosted v2.x upgrades. Use Terraform AWS module v5.2.1 for AWS deployments, or Helm chart 6.2.1 for GCP and Azure deployments. Helm chart 6.2.1 disables code function execution by default. See [Upgrade to data plane v2.x](/admin/self-hosting/upgrade/v2) and [Self-hosting releases](/data-plane-changelog) for details.

  ### Secret previews and rotation tracking

  AI provider and environment variable settings show a redacted preview of each saved value (for example, `abc...xyz`) and a **Last updated** timestamp that tracks when the secret value itself was last changed, along with the user who made the change. The same fields are also returned by the public API for `/v1/env_var` and `/v1/ai_secret`. See [Configure AI providers](/admin/ai-providers) and [Set environment variables](/admin/organizations#set-environment-variables) for details.

  <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_11}</Badge></Tooltip>

  ### API key and service token creation restricted to the UI

  <Warning>
    **Update (June 2026):** Service token creation through the API has since been re-enabled for organization owners. See [Programmatic service token creation](#programmatic-service-token-creation).

    **Breaking change**: API keys and service tokens can now be created only from the Braintrust UI. The REST API no longer supports creating either, and the `POST /v1/api_key`, `POST /v1/service_token`, and `PUT /v1/service_token` endpoints have been removed from the public API. The [`PATCH /v1/organization/members`](/api-reference/organizations/modify-organization-membership) endpoint can still create a service account, but its service token must be created by an organization owner via the Braintrust UI. See [Manage API keys](/admin/organizations#manage-api-keys) and [Create service tokens](/admin/organizations#create-service-tokens) for the supported flows.
  </Warning>

  ### Auto-instrumentation for Java

  Braintrust now supports [auto-instrumentation](/instrument/trace-llm-calls#java) for Java, enabling zero-code tracing via the `braintrust-java-agent` JAR. Attach it at JVM startup to automatically trace OpenAI, Anthropic, Spring AI, LangChain4j, and Google GenAI calls. The agent also supports running alongside the Datadog Java agent.

  ### SQL `IF` and `COUNT_IF` for conditional expressions

  SQL queries now support `IF(condition, then_value, else_value)` as a shorter alternative to `CASE WHEN ... THEN ... ELSE ... END` for two-branch conditions, and `COUNT_IF(condition)` as a shorter alternative to `count(CASE WHEN condition THEN 1 ELSE NULL END)` for counting rows that match a predicate. See [Conditional expressions](/reference/sql#conditional-expressions) for syntax and examples.

  <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_12}</Badge></Tooltip>
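
To make the equivalence above concrete, here is a small Python model of the semantics. It is a sketch only: it does not run any SQL, the helper names and sample rows are hypothetical, and it ignores SQL's handling of `NULL` conditions.

```python
from typing import Any, Callable, Iterable, Mapping

Row = Mapping[str, Any]


def sql_if(condition: bool, then_value: Any, else_value: Any) -> Any:
    """Model of IF(condition, then_value, else_value): a two-branch CASE."""
    return then_value if condition else else_value


def count_if(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> int:
    """Model of COUNT_IF(condition): count the rows where the predicate holds."""
    return sum(1 for row in rows if predicate(row))


def count_case_when(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> int:
    """Model of count(CASE WHEN condition THEN 1 ELSE NULL END)."""
    values = [1 if predicate(row) else None for row in rows]
    return sum(1 for value in values if value is not None)  # count() skips NULLs


# Hypothetical sample rows for illustration.
rows = [
    {"id": 1, "error": None},
    {"id": 2, "error": "timeout"},
    {"id": 3, "error": None},
]


def has_error(row: Row) -> bool:
    return row["error"] is not None


statuses = [sql_if(has_error(row), "failed", "ok") for row in rows]
print(statuses)  # ['ok', 'failed', 'ok']
print(count_if(rows, has_error) == count_case_when(rows, has_error))  # True
```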

  ### SQL `TABLESAMPLE` for random sampling

  SQL queries now support `TABLESAMPLE` (with the shorter `SAMPLE` alias) to randomly sample rows from a table without scanning every row. Use `TABLESAMPLE n PERCENT` to sample a percentage of rows or `TABLESAMPLE n ROWS` to sample a fixed count. An optional `SEED` clause makes results deterministic across runs. See [SAMPLE](/reference/sql#sample) for syntax and examples.

  <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_13}</Badge></Tooltip>
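
The effect of the `SEED` clause can be illustrated with a seeded random number generator in Python. This is a model of the behavior described above, not the engine's sampling algorithm: the function names are hypothetical, the percentage variant assumes an exact rounded count, and unlike the real feature this sketch reads the whole input.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_rows(rows: Sequence[T], n: int, seed: Optional[int] = None) -> List[T]:
    """Model of TABLESAMPLE n ROWS: pick up to n rows at random."""
    rng = random.Random(seed)  # a fixed seed gives a repeatable selection
    return rng.sample(list(rows), min(n, len(rows)))


def sample_percent(
    rows: Sequence[T], percent: float, seed: Optional[int] = None
) -> List[T]:
    """Model of TABLESAMPLE n PERCENT: pick about percent% of the rows."""
    n = round(len(rows) * percent / 100)
    return sample_rows(rows, n, seed)


rows = list(range(200))  # hypothetical row IDs

# The same seed returns the same sample on every run.
print(sample_rows(rows, 5, seed=42) == sample_rows(rows, 5, seed=42))  # True
print(len(sample_percent(rows, 10, seed=1)))  # 20
```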

  ### SQL `ROLLUP` and `GROUPING SETS` for multi-level aggregation

  SQL queries now support `ROLLUP` and `GROUPING SETS` for multi-level aggregation in a single query. `ROLLUP` produces subtotals and a grand total alongside individual groups, while `GROUPING SETS` lets you specify exactly which grouping combinations to compute. The `GROUPING()` function labels which dimensions are rolled up in each row, so you can distinguish detail rows from subtotals. See [ROLLUP and GROUPING SETS](/reference/sql#rollup-and-grouping-sets) for details.

  <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_14}</Badge></Tooltip>
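
The following Python sketch models what a two-column `ROLLUP` computes: detail rows, per-group subtotals, and a grand total, each tagged with grouping flags. It is illustrative only. The column names and sample rows are hypothetical, and the flag convention (1 means the dimension is rolled up) follows standard SQL rather than anything confirmed in this entry.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical rows: (model, env, cost)
rows: List[Tuple[str, str, float]] = [
    ("gpt", "prod", 3.0),
    ("gpt", "dev", 1.0),
    ("claude", "prod", 2.0),
]

# Key: (model, env, grouping_model, grouping_env)
Key = Tuple[Optional[str], Optional[str], int, int]


def rollup_sum(data: List[Tuple[str, str, float]]) -> Dict[Key, float]:
    """Model of sum(cost) with GROUP BY ROLLUP(model, env).

    ROLLUP(model, env) expands to the grouping sets
    (model, env), (model), and ().
    """
    totals: Dict[Key, float] = defaultdict(float)
    for model, env, cost in data:
        totals[(model, env, 0, 0)] += cost   # detail row
        totals[(model, None, 0, 1)] += cost  # subtotal per model, env rolled up
        totals[(None, None, 1, 1)] += cost   # grand total, both rolled up
    return dict(totals)


result = rollup_sum(rows)
print(result[("gpt", None, 0, 1)])  # 4.0
print(result[(None, None, 1, 1)])   # 6.0
```

The explicit flags matter because a rolled-up dimension and a genuinely missing value would otherwise look the same in the output.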

  ### SQL `SETTINGS` clause for per-query options

  SQL queries now support a trailing `SETTINGS` clause for passing per-query hints to the Brainstore execution engine. The initial options are `max_bloom_terms`, which controls when bloom-filter segment elimination is skipped for filters with many disjunctive terms; `disable_metric_columnstore`, which opts a single query out of the metric columnstore optimization for diagnostic comparisons; and `preview_length`, which truncates preview fields (`input`, `output`, `expected`, `error`, `metadata`) to a chosen character length. See [SETTINGS](/reference/sql#settings) for syntax and the full list of options.

  <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_15}</Badge></Tooltip>

  ### bt CLI releases

  * [v0.9.2](https://github.com/braintrustdata/bt/releases/tag/v0.9.2) - [`bt setup`](/reference/cli/setup) now opens a browser to sign in or sign up and then prompts in the terminal to pick an organization and project, replacing the previous API-key paste flow. When multiple [auth profiles](/reference/cli/overview#troubleshooting) are saved and no profile is selected by flag, env var, or org match, `bt` now prompts you to pick one interactively instead of failing with an ambiguous-profile error.
  * [v0.9.1](https://github.com/braintrustdata/bt/releases/tag/v0.9.1) - `bt` now warns once daily when any [AI provider](/admin/ai-providers) key is older than six months. `bt view trace` text output now suggests `--json` for fetching the full trace payload in one command. `bt sync push` reads input data more efficiently via byte-based buffering. Fixed `--verbose` defaulting to true for `bt sync pull`. Fixed `bt datasets view` row fetching.
  * [v0.9.0](https://github.com/braintrustdata/bt/releases/tag/v0.9.0) - Added `bt datasets` for creating, listing, viewing, updating (aliases: `add`, `refresh`), and deleting remote datasets from the CLI. Supports file, stdin, and inline JSON input. Fixed `--limit` being ignored when paginating `bt view` results. Fixed unicode handling in `bt sync pull`. `bt setup` now combines project creation and selection into a single interactive step.
  * [v0.8.0](https://github.com/braintrustdata/bt/releases/tag/v0.8.0) - Added `--matrix-param` to `bt eval` to specify multiple values for one or more parameters and run one experiment per combination. Added `bt topics config delete` to permanently remove a Topics automation. Added Copilot and Qwen support to `bt setup`, with skills installed at `.copilot/skills/braintrust/SKILL.md` and `.qwen/skills/braintrust/SKILL.md` respectively. Copilot MCP is configured via `copilot mcp add`. `bt setup` default mode is now ephemeral. Pass `--skills` to install agent skills or `--mcp` to configure MCP. Agent skills are no longer installed automatically by `bt setup`. Run `bt setup skills` to opt in. Added a post-success verification prompt to `bt setup`.

  ### Python SDK releases

  * [v0.23.0](https://github.com/braintrustdata/braintrust-sdk-python/releases/tag/py-sdk-v0.23.0) - New [Hugging Face](/integrations/ai-providers/huggingface) Hub integration that traces `InferenceClient` calls (chat completion, text generation, feature extraction, and sentence similarity). [Instructor](/integrations/sdk-integrations/instructor) structured-output calls are now traced as `task` spans. **Breaking change**: when your org hasn't configured a **<Icon icon="settings-2" /> Settings** > [**<Icon icon="scroll-text" /> Logging**](https://www.braintrust.dev/app/~/configuration/org/logging) policy, `Eval()` and `init()` no longer log a default set of [git metadata](/admin/organizations#set-git-metadata-logging) fields. Configure a policy in the UI, or pass a `git_metadata_settings` object specifying the fields to log.
  * [v0.22.1](https://github.com/braintrustdata/braintrust-sdk-python/releases/tag/py-sdk-v0.22.1) - You can now pass [Pydantic](https://docs.pydantic.dev/) models (or any object that serializes to a dict) directly as `metadata` when logging, instead of converting to a dict first. [LangChain](/integrations/sdk-integrations/langchain) spans now break out Anthropic prompt cache creation tokens by TTL (`prompt_cache_creation_5m_tokens`, `prompt_cache_creation_1h_tokens`), so you can attribute prompt-cache spend to each pricing tier. Failed [Claude Agent SDK](/integrations/agent-frameworks/claude-agent-sdk) runs now surface their error on the trace instead of being silently dropped, so you can see why a run failed without leaving Braintrust.
  * [v0.22.0](https://github.com/braintrustdata/braintrust-sdk-python/releases/tag/py-sdk-v0.22.0) - [LiteLLM](/integrations/sdk-integrations/litellm) async moderation calls (`amoderation()`) are now traced, like the existing sync `moderation()`. Fixed inflated token counts in [LangChain](/integrations/sdk-integrations/langchain) traces for OpenAI-style providers using prompt caching, where cached input tokens were double-counted. [Custom scorers](/evaluate/write-scorers) that return `list[Score]` are no longer rejected by static type checkers. [LiveKit Agents](/integrations/agent-frameworks/livekit-agents) traces now capture end-of-utterance detection delay, so you can measure voice-turn responsiveness.
  * [v0.21.0](https://github.com/braintrustdata/braintrust-sdk-python/releases/tag/py-sdk-v0.21.0) - Added [LiveKit Agents](/integrations/agent-frameworks/livekit-agents) integration for tracing real-time voice AI pipelines, including LLM turns, STT, TTS, audio output, and function tool calls.
  * [v0.20.0](https://github.com/braintrustdata/braintrust-sdk-python/releases/tag/py-sdk-v0.20.0) - [AutoGen](/integrations/agent-frameworks/autogen) agents running inside teams (such as `RoundRobinGroupChat`) are now traced individually as child spans of the team. [LiteLLM](/integrations/sdk-integrations/litellm) `text_completion()` and `atext_completion()` calls are now traced as `Completion` spans. Default [git metadata](/admin/organizations#set-git-metadata-logging) collection for `Eval()`, `EvalAsync()`, and `init()` now uses a fixed 8-field set (`commit`, `branch`, `tag`, `dirty`, `author_name`, `author_email`, `commit_message`, `commit_time`) and excludes `git_diff`. **Breaking change**: orgs that relied on automatic diff collection must enable `git_diff` under [**<Icon icon="settings-2" /> Settings** > **<Icon icon="scroll-text" /> Logging**](https://www.braintrust.dev/app/~/configuration/org/logging). Fixed a type error in [Autoevals](/evaluate/autoevals) scorers. [Cohere](/integrations/ai-providers/cohere) v2 streaming responses now include citations in span output, and `chat.stream()` closes its span correctly when used as a context manager. [Mistral](/integrations/ai-providers/mistral) `beta.conversations` tool calls are now traced. The background logger no longer retries on HTTP 413. The [Strands](/integrations/agent-frameworks/strands-agent) integration correctly closes spans when running alongside a no-op OpenTelemetry provider. Anthropic cache tokens captured via [LangChain](/integrations/sdk-integrations/langchain) are now folded into `prompt_tokens` and `total_tokens`. Span metadata for [OpenAI](/integrations/ai-providers/openai) calls no longer includes entries for parameters you didn't pass.
  * [v0.19.0](https://github.com/braintrustdata/braintrust-sdk-python/releases/tag/py-sdk-v0.19.0) - Auto-instrumentation now covers [Temporal](/integrations/sdk-integrations/temporal); call `braintrust.auto_instrument()` to trace workflows and activities. The integration moved to `braintrust.integrations.temporal`, and the old `braintrust.contrib.temporal` import is deprecated. [OpenAI](/integrations/ai-providers/openai) streaming audio transcriptions are now traced. [Cohere](/integrations/ai-providers/cohere) and [Mistral](/integrations/ai-providers/mistral) tool calls now generate child tool spans. Fixed a [remote evals](/evaluate/remote-evals) bug where the Python SDK's `braintrust.devserver` module routed experiments to the wrong project when an `/eval` request omitted `project_id`. **Breaking change**: [Anthropic](/integrations/ai-providers/anthropic#what-traced-python) per-TTL prompt cache write counts moved from span metadata to span metrics and were renamed: `cache_creation_ephemeral_5m_input_tokens` → `prompt_cache_creation_5m_tokens` and `cache_creation_ephemeral_1h_input_tokens` → `prompt_cache_creation_1h_tokens`. Update any references to the old names (for example, in SQL queries, saved filters, alerts, automations, or dashboards).

  ### TypeScript SDK releases

  * [v3.14.0](https://github.com/braintrustdata/braintrust-sdk-javascript/releases/tag/braintrust@3.14.0) - Added auto-instrumentation for [Mastra](/integrations/agent-frameworks/mastra), tracing agents, workflows, steps, tool calls, and LLM calls with no code changes. Added the `BRAINTRUST_CACHE_LOCATION` [environment variable](/instrument/advanced-tracing#tune-performance) to select the prompt and parameter cache mode (`mixed`, `memory`, `disk`, or `none`).
  * [v3.13.0](https://github.com/braintrustdata/braintrust-sdk-javascript/releases/tag/braintrust@3.13.0) - The Node.js SDK now searches parent directories for a `.env.braintrust` file containing `BRAINTRUST_API_KEY` at login. Added the experimental [`DatasetPipeline`](/annotate/datasets/pipelines) API for transforming project logs into dataset rows.
  * [v3.12.0](https://github.com/braintrustdata/braintrust-sdk-javascript/releases/tag/braintrust@3.12.0) - Added auto-instrumentation for the [OpenAI Agents SDK](/integrations/agent-frameworks/openai-agents-sdk) (`@openai/agents`), [LangChain](/integrations/sdk-integrations/langchain) and [LangGraph](/integrations/agent-frameworks/langgraph), and Flue (`@flue/runtime`). Added `wrapNextjsConfigWithBraintrust`, imported from `braintrust/next`, as the canonical Next.js setup helper, which selects the webpack plugin or Turbopack loader based on the active build. Renamed the bundler plugin exports to `braintrustVitePlugin`, `braintrustWebpackPlugin`, `braintrustEsbuildPlugin`, and `braintrustRollupPlugin` (the previous names are deprecated). Added the `braintrust/apply-auto-instrumentation` entrypoint so CommonJS and TypeScript-to-CommonJS projects can enable auto-instrumentation without the `--import` ESM hook. Traces can now span multiple projects. **Behavior change**: git metadata is no longer collected by default when your org has no [git metadata logging](/admin/organizations#set-git-metadata-logging) policy. Hardened `mergeDicts` against prototype pollution and added exponential backoff to request retries.
  * [v3.11.0](https://github.com/braintrustdata/braintrust-sdk-javascript/releases/tag/braintrust@3.11.0) - Added auto-instrumentation for `@github/copilot-sdk`, `@openai/codex-sdk`, and [Firebase Genkit](/integrations/sdk-integrations/firebase-genkit). Instrumented [Mistral](/integrations/ai-providers/mistral) classification and moderation APIs, and added `projectId` support to `wrapVitest`. Captured reasoning content for [OpenRouter](/integrations/ai-providers/openrouter) and [Groq](/integrations/ai-providers/groq) reasoning models. Fixed duplicate LLM spans when multiple SDK instances load in one process, and corrected Google ADK agent naming.
  * [v3.10.0](https://github.com/braintrustdata/braintrust-sdk-javascript/releases/tag/braintrust@3.10.0) - Added auto-instrumentation for [Groq](/integrations/ai-providers/groq) (`groq-sdk`) and Cursor (`@cursor/sdk`). Added dataset versioning support so experiments can pin a [dataset](/annotate/datasets) to a specific version, and experiment metadata now records the dataset filters used for a run. [Cohere](/integrations/ai-providers/cohere) now captures extended thinking content and a `reasoning_tokens` metric. Expanded [Google ADK](/integrations/agent-frameworks/google) auto-instrumentation to cover `@google/adk@1.0.0`. Fixed [Mistral](/integrations/ai-providers/mistral) reasoning capture, [Hugging Face](/integrations/ai-providers/huggingface) streamed tool calls, [Claude Agent SDK](/integrations/agent-frameworks/claude-agent-sdk) built-in tool nesting, and bundler plugin export maps.

  ### Go SDK releases

  * [v0.7.0](https://github.com/braintrustdata/braintrust-sdk-go/releases/tag/v0.7.0) - Upgraded the OpenTelemetry modules, gzipped OpenTelemetry span exports, and expanded OpenAI metadata capture to include request status, token usage, and details for incomplete or failed responses. **Breaking change**: requires Go 1.25.0 or later.

  ### Java SDK releases

  * [v0.3.10](https://github.com/braintrustdata/braintrust-sdk-java/releases/tag/v0.3.10) - Automatic base64 attachment conversion (introduced in v0.3.9) now recognizes the Anthropic, Google Gemini, and AWS Bedrock Converse message formats in addition to OpenAI-style data URIs, so multimodal traces from those providers render as previewable [attachments](/instrument/attachments).
  * [v0.3.9](https://github.com/braintrustdata/braintrust-sdk-java/releases/tag/v0.3.9) - OpenTelemetry span and log payloads are now gzip-compressed before export by default, reducing network bandwidth with no caller-side changes. Braintrust span size limits still apply to the uncompressed payload. Disable span compression with `BRAINTRUST_COMPRESS_OTEL_PAYLOAD=false` or `BraintrustConfig.builder().compressOtelPayload(false)`. Base64-encoded AI attachments (images, PDFs, and other binary data) in span inputs and outputs are now also automatically uploaded as separate Braintrust attachment objects, reducing span payload sizes and the chance of exceeding OpenTelemetry span size limits. Enabled by default. Disable with `BRAINTRUST_AUTO_CONVERT_AI_ATTACHMENTS=false` or `.autoConvertAIAttachments(false)`.
  * [v0.3.8](https://github.com/braintrustdata/braintrust-sdk-java/releases/tag/v0.3.8) - Added [trace-level scoring](/evaluate/custom-code#score-traces): implement `TracedScorer<INPUT, OUTPUT>` instead of `Scorer` to score on intermediate LLM calls, tool invocations, and other spans produced during a task. The `Eval` framework dispatches to `score(TaskResult, BrainstoreTrace)`, where `BrainstoreTrace` gives cached access to task spans via `getSpans()` and `getSpans(type)` and a reconstructed conversation via `getLLMConversationThread()`. Existing `Scorer` implementations are unaffected. Opening a cursor on an empty Braintrust-hosted dataset no longer throws.
  * [v0.3.7](https://github.com/braintrustdata/braintrust-sdk-java/releases/tag/v0.3.7) - [Auto-instrumentation](/instrument/trace-llm-calls#java) now requires `com.openai:openai-java` 2.15.0 or later. Streaming calls to the OpenAI Responses API (`client.responses().createStreaming(...)`) now capture `output`, token usage, and `time_to_first_token`.
  * [v0.3.6](https://github.com/braintrustdata/braintrust-sdk-java/releases/tag/v0.3.6) - Internal publishing fix. No user-facing changes.
  * [v0.3.5](https://github.com/braintrustdata/braintrust-sdk-java/releases/tag/v0.3.5) - [Anthropic instrumentation](/integrations/ai-providers/anthropic#what-traced-java) now captures per-TTL prompt cache metrics (`prompt_cache_creation_5m_tokens`, `prompt_cache_creation_1h_tokens`) in place of the aggregate `prompt_cache_creation_tokens`, and tracks cache reads as `prompt_cached_tokens`. `BraintrustConfig.sslContext()` is now applied to the API HTTP client, fixing API calls in custom-CA environments. `BraintrustApiClient` is deprecated in favor of `Braintrust.openApiClient()` and `BraintrustOpenApiClient` (generated from the OpenAPI spec).

  ### C# SDK releases

  * [v0.2.7](https://github.com/braintrustdata/braintrust-sdk-dotnet/releases/tag/v0.2.7) - Added [Azure OpenAI](/integrations/ai-providers/azure) instrumentation in a new `Braintrust.Sdk.AzureOpenAI` package, and upgraded the OpenTelemetry dependencies.

  ### Improvements

  * [AI provider](/admin/ai-providers) keys that have not been rotated in over six months now display a warning indicator on the secrets manager page, surfacing the recommendation to rotate secrets periodically.
  * New [SQL best practices](/reference/sql/best-practices) page covering query shapes, `ANY_SPAN()` filtering, subfield indexing, and performance warnings.
  * Score filters on `summary` shape queries now evaluate at the span level for better performance. This is correct for the common case where scorers run once per span. To filter on the averaged score across all spans in a trace, use `HAVING avg(scores.foo)`. See [Data shapes](/reference/sql#data-shapes) for details.
  * The [Baseten](/integrations/ai-providers/baseten) provider now accepts an optional **API base URL** so you can point Braintrust at a custom Baseten deployment endpoint, such as a dedicated deployment for a Gemma model.
  * New models available through the Braintrust gateway: GPT-5.5 Pro (`gpt-5.5-pro`) via OpenAI and Azure, and Claude 4.7 Opus (`claude-opus-4-7`) via Anthropic. See [Supported models](/deploy/supported-models) for the full list.
  * SQL queries now support `substring(text, start, length)` for extracting substrings. Positions are 1-based, and a start value of 0 or less is clipped to the first character. See [SQL functions](/reference/sql#sql-functions) for the full function reference.

    <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_16}</Badge></Tooltip>
  * `date_trunc` now accepts positive-integer multiples of `second(s)`, `minute(s)`, `hour(s)`, and `day(s)` (for example, `'15 minutes'` or `'2 hours'`) for finer-grained time bucketing in `SELECT` and `GROUP BY`. The existing single-unit intervals continue to work, including `'week'`, `'month'`, and `'year'`, which don't accept multiples because they have variable durations. See [SQL functions](/reference/sql#sql-functions) for details.

    <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_17}</Badge></Tooltip>
  * SQL queries now support `estimated_cost_breakdown()` and `estimated_cost_component(name)` for per-component cost analysis. Use these to separate prompt, cached prompt, cache creation, and completion costs. See [SQL functions](/reference/sql#sql-functions) for available component names.

    <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_18}</Badge></Tooltip>
  * Topics automations now let you select which Braintrust-served `brain-facet-*` model generates facet summaries, so you can pin a specific version or stay on `brain-facet-latest`. Configure under **Advanced > Facet model** in [automation settings](/observe/topics/manage#adjust-automation).
  * You can now [pause and resume the Topics automation](/observe/topics/manage#pause-resume-automation) to stop and restart log processing without losing your place. When resumed, the automation picks up from where it left off rather than reprocessing.

    <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_19}</Badge></Tooltip>
  * Tightened validation for [custom query export automations](/admin/automations/export-to-cloud-storage#custom-sql-query): the `project_logs(...)` source must take a single string-literal project ID matching the automation's own project. Cross-project queries are now rejected.
  * The [SQL sandbox](/reference/sql) editor now autocompletes column names inferred from the `FROM` source (including nested fields), suggests the most common observed values when typing a filter comparison like `WHERE metadata.env =`, and completes `date_trunc` intervals and the `SETTINGS` clause.
  * The trace viewer now honors an explicit `metrics.estimated_cost` logged on any span, including parent spans. A parent's displayed cost is the sum of child span costs plus the parent's own logged cost, bringing the trace UI into alignment with the [`estimated_cost()`](/reference/sql#sql-functions) SQL function. Previously, an explicit `estimated_cost` was only preserved on leaf spans.
  * When [testing a scorer](/evaluate/write-scorers#test-with-logs) with **Logs** as the source, the <Icon icon="play" /> **Run** section now shows the full trace (tree and span detail) for each matching root span instead of a single JSON row, so you can drill into spans before running the scorer.
  * The [**<Icon icon="credit-card" /> Billing**](https://www.braintrust.dev/app/~/configuration/org/billing) page now shows a projected upcoming invoice broken down by Logs and Scores, so you can see on-demand usage spend in context before the invoice cycle closes.
  * Inline base64 payloads in [AWS Bedrock](/integrations/ai-providers/bedrock) Converse `image`, `video`, `audio`, and `document` content blocks are now automatically extracted into [Braintrust attachments](/instrument/attachments) at ingest, so multimodal Converse traces render as previewable attachments without any SDK changes.

    <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_20}</Badge></Tooltip>
  * The [Braintrust gateway](/deploy/gateway) now returns an `x-bt-error-origin` response header on error responses, with a value of `braintrust` for gateway errors or the provider name (for example, `openai`) for upstream provider errors, so callers can distinguish gateway failures from provider failures.
  * Timestamp columns (`created`, `start`, `end`, and any timestamp field) in tables can now be switched between absolute and friendly (relative) formats. Open the column header menu and select **Show friendly dates** or **Show absolute dates**. The choice is saved per column and per page.
  * Organization owners can now turn off automatic direct ownership grants for newly created objects with the **Create direct ownership grants** toggle in **<Icon icon="settings-2" /> Settings** > [**<Icon icon="shield-check" /> Permission groups**](https://www.braintrust.dev/app/~/configuration/org/groups), and optionally clear existing direct owner grants. See [Access control](/admin/access-control#how-permissions-work) for details.

    <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_21}</Badge></Tooltip>
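
As an illustration of the `x-bt-error-origin` header mentioned in the gateway item above, a caller might branch on it as sketched below. The function name and return labels are hypothetical, and the sketch takes an already-received headers mapping instead of making a real request.

```python
from typing import Mapping


def classify_error_origin(headers: Mapping[str, str]) -> str:
    """Classify an error response by its x-bt-error-origin header."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    origin = normalized.get("x-bt-error-origin")
    if origin is None:
        return "unknown"  # header absent, for example on a non-gateway response
    if origin == "braintrust":
        return "gateway"  # the failure came from the gateway itself
    return f"provider:{origin}"  # the failure came from the upstream provider


print(classify_error_origin({"X-BT-Error-Origin": "openai"}))  # provider:openai
```
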
</Update>

<Update label="April 2026">
  ### Translate message content in traces

  You can now translate any message or string field in a trace without leaving the UI. This is useful when debugging multilingual agents or reviewing customer conversations in unfamiliar languages. Choose from English, Spanish, French, German, Japanese, and Chinese, or type any language name. See [Translate message content](/observe/examine-traces#translate-message-content) for details.

  ### Custom views for dataset rows

  Custom views, which let you build tailored interfaces for traces using Loop, are now also available for dataset rows. Use them to create annotation interfaces, side-by-side input and expected comparisons, or any visualization that helps your team review and label records more effectively. See [Create custom views](/annotate/custom-views) for details.

  ### Dataset snapshots

  Save named checkpoints of a dataset to mark stable states, compare before-and-after changes, and restore to a previous version if needed. See [Save snapshots](/annotate/datasets/manage#save-snapshots) for details.

  <Note>
    {feature_0} {verb_0} only available on [Pro and Enterprise plans](/plans-and-limits#plans).
  </Note>

  ### Dataset environments

  Datasets can now be assigned to [environments](/deploy/environments), alongside prompts and parameters. Pin a specific dataset version to production to ensure evals always run against a known-good baseline, and promote dataset versions through dev, staging, and production the same way you promote prompts. [Environment alerts](/admin/automations/alerts#create-an-environment-alert) fire for dataset environment changes as well, and webhook payloads now include an `object_type` field (`"prompt"` or `"dataset"`) so handlers can branch on the type of change.

  <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_22}</Badge></Tooltip>
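
A webhook handler can use the new field to route each event. The sketch below is a minimal example under stated assumptions: only the `object_type` field and its two values come from this entry, while the function name and the returned handler labels are hypothetical.

```python
from typing import Any, Mapping


def route_environment_change(payload: Mapping[str, Any]) -> str:
    """Pick a handler for an environment alert based on object_type."""
    object_type = payload.get("object_type")
    if object_type == "prompt":
        return "prompt-handler"
    if object_type == "dataset":
        return "dataset-handler"
    # A missing or unrecognized type is left for the caller to decide on.
    return "unhandled"


print(route_environment_change({"object_type": "dataset"}))  # dataset-handler
```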

  ### Monitor dashboard import and export

  Monitor dashboards can now be exported and imported across projects, making it easy to reuse dashboards without recreating them from scratch. Charts also support exporting their configuration or underlying data as CSV or JSON. See [Duplicate views across projects](/observe/dashboards#duplicate-views-across-projects) and [Manage custom charts](/observe/dashboards#manage-custom-charts) for details.

  ### Cloud storage export

  Cloud storage export now supports start-from and rewind controls and a Hive-partitioned file layout. See [Export to cloud storage](/admin/automations/export-to-cloud-storage) for details.

  ### Tag filtering for experiment runs

  Tags you apply to experiments are now available as a filter in the dataset **Runs** panel, making it easy to compare runs across a specific model version, prompt variant, or release candidate. See [Filter experiment runs](/annotate/datasets/track-performance#filter-experiment-runs) for details.

  ### Token distribution overview for traces

  The <Icon icon="square-chart-gantt" /> **Timeline** trace view now includes a token distribution overview above the timeline bars. It breaks down LLM span token usage by type (uncached input, cached read, cache write, and output) and shows cache hit rate per span, making it easy to spot where caching is and isn't working. The overview adjusts to the current **Scale by** selection. See [View as a timeline](/observe/examine-traces#view-as-a-timeline) for details.

  ### Summary table layout

  The experiments table now includes a <Icon icon="table-2" /> **Summary table** layout that shows scores and metrics as rows with experiments as columns, making it easy to compare multiple experiments at a glance. Export results as a PDF to share with stakeholders who don't have Braintrust access. See [Compare experiments](/evaluate/compare-experiments#assess-overall-impact) for details.

  ### Infra dashboard for self-hosted deployments

  Organization owners and admins can now view infrastructure metrics for their self-hosted deployment directly in the Braintrust UI. The infra dashboard shows processing throughput, CPU and memory usage, object storage latency, realtime lag, and status checks. See [Monitor your infrastructure](/admin/self-hosting#monitoring) for details.

  ### Log indexing and full-text search

  You can now enable log search optimization to accelerate full-text queries across all fields, or add subfield indexes for fields you filter on frequently — such as `metadata.user_id` or `input.query`. Use the new [`search()`](/reference/sql#full-text-search) SQL function to query all text fields in a single expression with automatic bloom filter acceleration when log search optimization is on. Braintrust suggests candidate fields based on your data and can backfill up to 3 days of history.

  See [Speed up log filtering](/admin/projects#speed-up-log-filtering) and [Full-text search](/reference/sql#full-text-search) for details.

  <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_23}</Badge></Tooltip>

  ### Data plane region selection

  When creating a new Braintrust organization, you can now choose to host its [data plane](/admin/organizations#data-plane-region) in the EU or the US. After creating an organization, you cannot change its region. For details, see [Create an organization](/admin/organizations#create-an-organization).

  ### `estimated_cost()` SQL function

  SQL now includes an `estimated_cost()` scalar function that returns the estimated cost of a span in US dollars. It resolves to the pre-computed `metrics.estimated_cost` value when available, and falls back to computing cost from token metrics and registered model pricing when it isn't. Unlike the raw `metrics.estimated_cost` field, `estimated_cost()` works across spans, traces, and summary shapes, and can be used inside aggregates like `sum(estimated_cost())`. See [SQL functions](/reference/sql#sql-functions) for details.

  <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_24}</Badge></Tooltip>
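
The fallback behavior described above can be modeled in a few lines of Python. This is an illustrative sketch, not Braintrust's implementation: the pricing table, the model name, and the `prompt_tokens` and `completion_tokens` metric names are assumptions made for the example.

```python
from typing import Optional

# Hypothetical prices in USD per one million tokens. Real pricing comes from
# the models registered in Braintrust, not from a table like this.
PRICE_PER_MILLION = {"example-model": {"prompt": 2.0, "completion": 8.0}}


def estimated_cost(metrics: dict, model: Optional[str]) -> Optional[float]:
    """Prefer the pre-computed value; otherwise derive cost from token counts."""
    precomputed = metrics.get("estimated_cost")
    if precomputed is not None:
        return precomputed
    price = PRICE_PER_MILLION.get(model or "")
    if price is None:
        return None  # No pricing known, so this sketch gives no estimate.
    prompt_cost = metrics.get("prompt_tokens", 0) * price["prompt"]
    completion_cost = metrics.get("completion_tokens", 0) * price["completion"]
    return (prompt_cost + completion_cost) / 1_000_000


spans = [
    {"metrics": {"estimated_cost": 0.5}, "model": "example-model"},
    {"metrics": {"prompt_tokens": 1000, "completion_tokens": 500}, "model": "example-model"},
]
# Mirrors the idea behind sum(estimated_cost()) over a set of spans.
total = sum(estimated_cost(s["metrics"], s["model"]) for s in spans)
```

The first span uses its pre-computed value, while the second falls back to token counts and the assumed prices.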

  ### SQL subqueries

  SQL queries now support subqueries in the `FROM` clause. Write `FROM (<inner query>) AS <alias>` to use the result of one query as the data source for another. The alias is required. The most common pattern is aggregating in the inner query and filtering on the aggregated result in the outer query. Subqueries can be nested to multiple levels. See [Subqueries](/reference/sql#subqueries) for syntax, constraints, and examples.

  <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_25}</Badge></Tooltip>
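
To illustrate the "aggregate inside, filter outside" pattern, the sketch below runs a query of that shape against Python's built-in `sqlite3` module, used here purely as a stand-in engine. The `spans` table and its columns are hypothetical, and Braintrust SQL may differ in syntax and available functions, so treat this as a demonstration of the query shape only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spans (trace_id TEXT, tokens INTEGER)")
conn.executemany(
    "INSERT INTO spans VALUES (?, ?)",
    [("t1", 100), ("t1", 250), ("t2", 40), ("t3", 500)],
)

# The inner query aggregates per trace; the outer query filters on the
# aggregated column. The alias (per_trace) names the subquery result.
query = """
SELECT trace_id, total_tokens
FROM (
    SELECT trace_id, SUM(tokens) AS total_tokens
    FROM spans
    GROUP BY trace_id
) AS per_trace
WHERE total_tokens > 300
ORDER BY trace_id
"""
rows = conn.execute(query).fetchall()
```

Filtering in the outer query is what makes the aggregated value available as an ordinary column.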

  ### Sandboxes for agent evals

  Use sandboxes when your eval runs custom agent code that can't be expressed as a playground prompt. Unlike remote evals, which require running a dev server, sandboxes let you push your eval once and then run it from the playground on demand. Sandboxes support AWS Lambda (Python and TypeScript) and Modal (TypeScript, custom container). See [Test complex agents](/evaluate/remote-evals) for details.


  <Warning>
    Sandboxes are in beta and the API, configuration, and behavior are likely to change in the near future. Requires a [Pro or Enterprise plan](/plans-and-limits).
  </Warning>

  <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_26}</Badge></Tooltip>

  ### pi integration

  Braintrust now integrates with [pi](https://pi.dev), a minimal terminal coding harness. Install the [`@braintrust/pi-extension`](https://github.com/braintrustdata/braintrust-pi-extension) package to automatically trace pi sessions, turns, LLM calls, and tool executions to Braintrust. The extension also supports attaching pi sessions under existing parent traces for end-to-end workflow observability. See the [pi integration guide](/integrations/developer-tools/pi) for setup instructions.

  ### OTel attribute deduplication

  OTel ingestion previously stored each recognized OTel attribute twice on a span: once as the raw attribute on `metadata` (such as `gen_ai.input.messages`, `ai.prompt.messages`, or `llm.input_messages`), and again in the structured field Braintrust mapped it into (such as `input`, `output`, or `metrics`). Braintrust now keeps only the structured copy. This reduces cost for OTel-heavy workloads.

  For Braintrust-hosted organizations, this change is enabled by default. Self-hosted deployments can opt in via the `STRIP_OTEL_ATTRIBUTES_FROM_METADATA` environment variable. In either case, you can preserve raw attributes on a specific span by setting the `braintrust.otel.preserve_attributes` attribute to `true` on it. For details, see [Strip OTel attributes from metadata](/kb/strip-otel-attributes-from-metadata).

  <Warning>
    **Breaking change**: this change can break SQL queries, automations, or dashboards that read raw OTel attributes (like `gen_ai.input.messages`) directly from `metadata`. Update them to read from the structured field instead (`input`, `output`, `metrics`, etc.). If you need raw attributes preserved on specific spans, set the `braintrust.otel.preserve_attributes` attribute to `true` on those spans in your OTel instrumentation.
  </Warning>
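
The effect of deduplication on a span's metadata can be modeled roughly as follows. This is a simplified sketch rather than the ingestion code: the set of recognized attributes contains only the three examples named above, and whether the preserve flag itself is stored in metadata is not specified, so this sketch simply keeps it.

```python
# Only the example attributes named in this update; the real list is longer.
RECOGNIZED_ATTRIBUTES = {
    "gen_ai.input.messages",
    "ai.prompt.messages",
    "llm.input_messages",
}
PRESERVE_KEY = "braintrust.otel.preserve_attributes"


def metadata_after_ingestion(attributes: dict) -> dict:
    """Drop recognized raw attributes unless the span opts out."""
    if attributes.get(PRESERVE_KEY) is True:
        return dict(attributes)  # Opt-out: raw attributes stay on metadata.
    return {k: v for k, v in attributes.items() if k not in RECOGNIZED_ATTRIBUTES}


stripped = metadata_after_ingestion({"gen_ai.input.messages": "[...]", "team": "search"})
kept = metadata_after_ingestion(
    {"gen_ai.input.messages": "[...]", "team": "search", PRESERVE_KEY: True}
)
```

Attributes that Braintrust does not map into a structured field, such as the hypothetical `team` key here, are unaffected in this model.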

  ### `bt` CLI releases

  * [v0.7.1](https://github.com/braintrustdata/bt/releases/tag/v0.7.1) - Added [`--ca-cert <PATH>`](/reference/cli/overview#global-flags) global flag (env: `BRAINTRUST_CA_CERT`) for custom PEM CA bundles, useful for self-hosted deployments with private TLS certificates. The Braintrust SDK (`braintrust`, `autoevals`, `@braintrust/*`) is now [bundled into function archives](/reference/cli/functions) by default for both JS and Python, so deployed functions are self-contained. Added `--disable-reconciliation` flag to [`bt topics config topic-map set`](/reference/cli/topics) to force fresh topic map generation without referencing the previously saved report.
  * [v0.7.0](https://github.com/braintrustdata/bt/releases/tag/v0.7.0) - Added [`--param key=value`](/reference/cli/eval) flag to `bt eval` for passing runtime parameter values into evaluators that declare a parameters schema. Multi-evaluator commands filter params per-evaluator so unrecognized keys are silently dropped. [`bt setup`](/reference/cli/setup) now supports `gemini` as a coding agent target, symlinking `.gemini/skills` to `.agents/skills/braintrust/SKILL.md`. `bt setup` now displays the created API key after setup completes. Minimum Node.js version for TypeScript evals updated to 18.19.0+ or 20.6.0+ (Bun 1.0+ and Deno with Node compat also supported).
  * [v0.6.0](https://github.com/braintrustdata/bt/releases/tag/v0.6.0) - Added [`--first N` and `--sample N [--sample-seed S]`](/reference/cli/eval) flags to `bt eval` for running a subset of eval data as a non-final smoke run. Summary output now includes `runMode`, `isFinal`, and `runLabel` fields. **Breaking change**: `bt` no longer automatically loads `.env` files. Set environment variables explicitly before running `bt` commands. [`bt setup`](/reference/cli/setup) now handles authentication and API key creation interactively, simplifying first-time onboarding. Improved the `bt setup mcp` experience for MCP server configuration.
  * [v0.5.0](https://github.com/braintrustdata/bt/releases/tag/v0.5.0) - Added [`bt topics`](/reference/cli/topics) command group for managing Topics automations, including `status`, `poke`, `rewind`, `open`, and `config` subcommands. Auth resolution order updated: explicit `--profile` now takes top priority, followed by `--api-key`/`BRAINTRUST_API_KEY`, then `BRAINTRUST_PROFILE` as a distinct step. `bt setup` now detects non-interactive environments (CI, agents) and falls back gracefully when no TTY is available. `bt functions push` now rejects Python bundle paths containing whitespace characters.
  * [v0.4.0](https://github.com/braintrustdata/bt/releases/tag/v0.4.0) - Added [`bt functions push`](/reference/cli/functions) for uploading TypeScript and Python function definitions to Braintrust, with esbuild bundling for TypeScript and source collection for Python. Added [`bt functions pull`](/reference/cli/functions) for downloading function definitions from Braintrust to local files. Enhanced [`bt setup instrument`](/reference/cli/setup) with a full agentic SDK installation workflow — language detection, exact version pinning, LLM client instrumentation, app verification, and Braintrust permalink output — supporting Python, TypeScript, Go, Java, Ruby, and C#. The eval runner's SSE `start` event now uses camelCase keys (`projectName`, `experimentName`, `projectId`, `experimentId`, `projectUrl`, `experimentUrl`); the previous snake\_case keys are no longer emitted. Added `parent` field support in the eval request payload for attaching eval results to an existing trace span.
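
The auth resolution order introduced in v0.5.0 can be sketched as a simple priority chain. This is a model of the documented order, not the CLI's source: the function name and return shape are hypothetical, and checking `--api-key` before `BRAINTRUST_API_KEY` within the second step is an assumption.

```python
from typing import Mapping, Optional, Tuple


def resolve_auth(
    profile_flag: Optional[str],
    api_key_flag: Optional[str],
    env: Mapping[str, str],
) -> Optional[Tuple[str, str]]:
    """Return (kind, value) for the first credential source that is set."""
    # 1. An explicit --profile takes top priority.
    if profile_flag:
        return ("profile", profile_flag)
    # 2. Then --api-key or BRAINTRUST_API_KEY (flag-first is an assumption).
    api_key = api_key_flag or env.get("BRAINTRUST_API_KEY")
    if api_key:
        return ("api_key", api_key)
    # 3. Then BRAINTRUST_PROFILE as a distinct step.
    env_profile = env.get("BRAINTRUST_PROFILE")
    if env_profile:
        return ("profile", env_profile)
    return None
```

For example, with both `BRAINTRUST_API_KEY` and `BRAINTRUST_PROFILE` set and no flags passed, this model selects the API key.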

  ### Python SDK releases

  * [v0.18.0](https://github.com/braintrustdata/braintrust-sdk-python/releases/tag/py-sdk-v0.18.0) - Added native [LlamaIndex integration](/integrations/sdk-integrations/llamaindex) (`llama-index-core>=0.13.0`); call `braintrust.auto_instrument()` to trace LLM calls, embeddings, and query engine runs automatically. Added `cached_tokens` to OpenAI Agents SDK span metrics. Fixed `wrap_openai()` compatibility with Datadog `ddtrace`. Broadened `EvalResult.scores` from `dict[str, float | None]` to `Mapping[str, float | None]`, allowing scorers to return `MappingProxyType` or custom `Mapping` subclasses without type errors. Code that relies on `scores` being a mutable `dict` may need updating. **Breaking change**: Requires Python 3.10 or later.
  * [v0.17.0](https://github.com/braintrustdata/braintrust-sdk-python/releases/tag/py-sdk-v0.17.0) - Added automatic tracing for [CrewAI](/integrations/agent-frameworks/crew-ai), [AutoGen](/integrations/agent-frameworks/autogen), and [Strands Agent SDK](/integrations/agent-frameworks/strands-agent). Expanded [Cohere](/integrations/ai-providers/cohere) tracing to cover audio transcription calls and [LiteLLM](/integrations/sdk-integrations/litellm) tracing to cover rerank and async rerank calls. [Mistral](/integrations/ai-providers/mistral) now traces beta conversations API calls. Fixed [Claude Agent SDK](/integrations/agent-frameworks/claude-agent-sdk) tool span output preservation across back-to-back `tool_use` messages and simplified [Pydantic AI](/integrations/agent-frameworks/pydantic-ai) wrapper span output.
  * [v0.16.0](https://github.com/braintrustdata/braintrust-sdk-python/releases/tag/py-sdk-v0.16.0) - Added [Cohere](/integrations/ai-providers/cohere) tracing, with `auto_instrument()` instrumenting v1/v2 chat, streaming, embed, and rerank calls. The [OpenAI Agents SDK](/integrations/agent-frameworks/openai-agents-sdk) integration now captures `TaskSpanData` and `TurnSpanData` span types (openai-agents ≥ v0.14.0). Added [classifier support](/evaluate/write-scorers#classifiers) in `Eval()` through the `classifiers` parameter. Fixed Python typing issues around exports and `Mapping` types.
  * [v0.15.0](https://github.com/braintrustdata/braintrust-sdk-python/releases/tag/py-sdk-v0.15.0) - Added per-input [`trial_count`](/evaluate/run-evaluations#override-trial-count-per-case) override on `EvalCase`. Expanded [Mistral](/integrations/ai-providers/mistral) tracing to cover audio speech, transcription, and OCR calls, and expanded [Anthropic](/integrations/ai-providers/anthropic) tracing to cover beta managed agents APIs and server-side tool content blocks (e.g., web search) as child tool spans. The [OpenAI](/integrations/ai-providers/openai) Responses API now emits child tool spans for function calls and web search, and [LiteLLM](/integrations/sdk-integrations/litellm) now traces audio speech, transcription, and image generation calls. Speech audio and streaming audio from OpenAI and LiteLLM are captured as attachments, streaming refusals from OpenAI are preserved, and a `CachedSpanFetcher` bug that permanently cached empty span fetches was fixed.
  * [v0.14.0](https://github.com/braintrustdata/braintrust-sdk-python/releases/tag/py-sdk-v0.14.0) - Added automatic tracing for the [OpenAI](/integrations/ai-providers/openai) Images and Audio APIs (sync and async). [Anthropic](/integrations/ai-providers/anthropic) `.text_stream` now captures structured output metadata and `time_to_first_token`, and the [Claude Agent SDK](/integrations/agent-frameworks/claude-agent-sdk) now logs `ResultMessage` metrics. Fixes landed for scorer and task callback default parameter handling and a `braintrust push` whitespace bug.
  * [v0.13.0](https://github.com/braintrustdata/braintrust-sdk-python/releases/tag/py-sdk-v0.13.0) - Added native [Mistral](/integrations/ai-providers/mistral) tracing, added [OpenAI Agents SDK](/integrations/agent-frameworks/openai-agents-sdk) tracing, and expanded Python Auto-Instrumentation to support `mistral` and `openai_agents`. [Google GenAI](/integrations/ai-providers/gemini) now traces `interactions` and Live API calls, [Anthropic](/integrations/ai-providers/anthropic) now logs base64 multimodal inputs as attachments, and [DSPy](/integrations/sdk-integrations/dspy) now traces adapter callbacks.
  * [v0.12.1](https://github.com/braintrustdata/braintrust-sdk-python/releases/tag/py-sdk-v0.12.1) - Fixed nested usage metrics and metadata capture for [Anthropic](/integrations/ai-providers/anthropic), fixed parameter handling for Python sandboxes, and encoded request bodies as UTF-8 bytes to prevent Latin-1 corruption.
  * [v0.12.0](https://github.com/braintrustdata/braintrust-sdk-python/releases/tag/py-sdk-v0.12.0) - Added [LangChain](/integrations/sdk-integrations/langchain) auto-instrumentation and moved the Python LangChain integration into the main `braintrust` package, added native [OpenRouter](/integrations/ai-providers/openrouter) tracing with `wrap_openrouter()`, and added [AgentScope](/integrations/agent-frameworks/agentscope) tracing with `setup_agentscope()`. `auto_instrument()` now supports `langchain`, `openrouter`, and `agentscope`. [Google GenAI](/integrations/ai-providers/gemini) now traces `generate_images`, and fixes landed for LiteLLM async embeddings, Anthropic, remote prompt parameter rehydration, and Temporal.
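
The `scores` typing change in v0.18.0 is easiest to see with a read-only mapping. The sketch below uses only the standard library: `mean_score` is a hypothetical helper, not an SDK function. It shows that code typed against `Mapping` accepts a `MappingProxyType`, and that code needing to mutate scores should copy them into a `dict` first.

```python
from types import MappingProxyType
from typing import Mapping, Optional


def mean_score(scores: Mapping[str, Optional[float]]) -> Optional[float]:
    """Average the non-null scores; works for any Mapping, mutable or not."""
    values = [v for v in scores.values() if v is not None]
    return sum(values) / len(values) if values else None


# A read-only view, as a scorer might now return.
frozen = MappingProxyType({"accuracy": 1.0, "relevance": 0.5, "toxicity": None})
average = mean_score(frozen)

# Assigning into `frozen` would raise TypeError, so copy before mutating.
editable = dict(frozen)
editable["accuracy"] = 0.0
```

Copying with `dict(...)` leaves the original mapping untouched, which is the safe pattern for code that previously mutated `scores` in place.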

  ### TypeScript SDK releases

  * [v3.9.0](https://github.com/braintrustdata/braintrust-sdk-javascript/releases/tag/braintrust@3.9.0) - Added tracing for [Google ADK](/integrations/agent-frameworks/google), [Cohere SDK](/integrations/ai-providers/cohere), and [Hugging Face Inference SDK](/integrations/ai-providers/huggingface). Expanded [Google GenAI](/integrations/ai-providers/gemini) coverage with `embedContent` and grounding metadata. Added reranking instrumentation for [AI SDK](/integrations/sdk-integrations/vercel) and [OpenRouter](/integrations/ai-providers/openrouter). Added [Anthropic](/integrations/ai-providers/anthropic) `beta.messages.toolRunner` support, with full tool-loop tracing and an `anthropic_tool_runner_iterations` metadata field for the iteration count. Anthropic server-side tool use metrics (e.g., `server_tool_use_web_search_requests`) are now captured. Per-input [`trialCount`](/evaluate/run-evaluations#override-trial-count-per-case) override now supported on `EvalCase` in `Eval()`. Improved [Claude Agent SDK](/integrations/agent-frameworks/claude-agent-sdk) lifecycle handling and subagent tool span nesting. Added [classifier support](/evaluate/write-scorers#classifiers) in `Eval()` through the `classifiers` parameter. Fixed AI SDK prompt cache metrics (`prompt_cached_tokens` and `prompt_cache_creation_tokens`) and OpenAI streaming `logprob` and `refusals` capture.
  * [v3.8.0](https://github.com/braintrustdata/braintrust-sdk-javascript/releases/tag/js-sdk-v3.8.0) - Added tracing for [Mistral](/integrations/ai-providers/mistral) and [OpenRouter Agent SDK](/integrations/agent-frameworks/openrouter-agent), expanded AI SDK instrumentation to cover `embed()` and `embedMany()`, and added OpenAI `responses.compact()` instrumentation. SDK tracing now works in Cloudflare Workers and Vercel Edge Runtime via an automatic polyfill. Fixed AI SDK streaming instrumentation (corrected sync/async handling for `streamText` / `streamObject` across v3–v6 and fixed `Agent.stream` tracing in v5), corrected Claude Agent SDK tool-call nesting, fixed span-context propagation in OTel compat mode, and allowed `undefined` as a source for remote eval params.
  * [v3.7.1](https://github.com/braintrustdata/braintrust-sdk-javascript/releases/tag/js-sdk-v3.7.1) - Stability updates and fixes.
  * [v3.7.0](https://github.com/braintrustdata/braintrust-sdk-javascript/releases/tag/js-sdk-v3.7.0) - Added the `braintrust/webpack-loader` API for Next.js Turbopack support. See [Trace LLM calls](/instrument/trace-llm-calls#auto-instrumentation) for usage.

  ### Go SDK releases

  * [v0.6.1](https://github.com/braintrustdata/braintrust-sdk-go/releases/tag/v0.6.1) - Fixed experiment permalink URLs generated by the eval runner: org names containing spaces or special characters are now correctly percent-encoded. Previously, such org names produced malformed URLs.
  * [v0.6.0](https://github.com/braintrustdata/braintrust-sdk-go/releases/tag/v0.6.0) - Added a new [AWS Bedrock](/integrations/ai-providers/bedrock#tracing-go) tracing integration (`trace/contrib/bedrockruntime`) with Orchestrion auto-instrumentation support. Added embeddings tracing across existing integrations: [OpenAI](/integrations/ai-providers/openai) (both `openai-go` and `sashabaranov/go-openai`), [Google GenAI](/integrations/ai-providers/gemini), [LangChainGo](/integrations/sdk-integrations/langchain#tracing-embeddings-go), [CloudWeGo Eino](/integrations/sdk-integrations/cloudwego-eino#tracing-embeddings-go), and [Firebase Genkit](/integrations/sdk-integrations/firebase-genkit#tracing-embeddings-go).
  * [v0.5.0](https://github.com/braintrustdata/braintrust-sdk-go/releases/tag/v0.5.0) - The [Gemini](/integrations/ai-providers/gemini) integration now instruments `streamGenerateContent` calls, with reconstructed output, token usage metrics, and `time_to_first_token` (previously only non-streaming `generateContent` produced spans). Gemini `ThinkingConfig` is captured in span metadata, and `thoughtsTokenCount` is mapped to `completion_reasoning_tokens` in span metrics. The [Anthropic](/integrations/ai-providers/anthropic) middleware now preserves document citations from streaming `citations_delta` events. Restored support for the `BRAINTRUST_ENABLE_TRACE_CONSOLE_LOG` environment variable (broken since v0.1) and added a `WithEnableTraceConsoleLog(bool)` option for programmatic configuration.
  * [v0.4.1](https://github.com/braintrustdata/braintrust-sdk-go/releases/tag/v0.4.1) - The [OpenAI](/integrations/ai-providers/openai) integration now captures `max_completion_tokens` and `reasoning_effort` in span metadata, giving visibility into reasoning model (e.g. `o4-mini`) configuration in traces. The OpenAI integration also now captures the `text` parameter for the responses API in span metadata.
  * [v0.4.0](https://github.com/braintrustdata/braintrust-sdk-go/releases/tag/v0.4.0) - Moved tracing integrations to separate Go modules and added LangChainGo to `trace/contrib/all`. **Breaking change**: users of `trace/contrib/*` integrations must now `go get` each module explicitly. See the [migration guide](/reference/sdks/go/migrations/v0-3-to-v0-4).
  * [v0.3.1](https://github.com/braintrustdata/braintrust-sdk-go/releases/tag/v0.3.1) - Anthropic tracing now records `time_to_first_token` for both streaming and non-streaming requests and correctly captures streaming extended thinking blocks. Eval trace structure also changed: each scorer now runs in its own named child span under the eval span, with scorer-specific `span_attributes` and full task context attached. Failed eval tasks now keep `input`, `expected`, `metadata`, and `origin` on the eval span, and omit `output_json` instead of writing `null`. This is a breaking change for teams that inspect raw eval span structure.
  * [v0.3.0](https://github.com/braintrustdata/braintrust-sdk-go/releases/tag/v0.3.0) - Added [CloudWeGo Eino](/integrations/sdk-integrations/cloudwego-eino) and [Firebase Genkit](/integrations/sdk-integrations/firebase-genkit) tracing integrations with Orchestrion auto-instrumentation support.
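
The percent-encoding fix in v0.6.1 addresses a general URL-building concern. The sketch below illustrates it in Python rather than Go, using the standard library's `urllib.parse.quote`. The URL layout, base URL, and function name are hypothetical and are not taken from the Go SDK.

```python
from urllib.parse import quote


def experiment_permalink(base_url: str, org: str, project: str, experiment: str) -> str:
    """Build a link, percent-encoding each path segment independently."""
    # safe="" also encodes "/" so that a segment can never split the path.
    org_seg, project_seg, experiment_seg = (
        quote(part, safe="") for part in (org, project, experiment)
    )
    return f"{base_url}/app/{org_seg}/p/{project_seg}/experiments/{experiment_seg}"


link = experiment_permalink("https://example.invalid", "Acme & Co", "demo", "run-1")
```

Without the encoding step, the space and `&` in the org name would end up in the path unescaped and produce a malformed URL.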

  ### Java SDK releases

  * [v0.3.4](https://github.com/braintrustdata/braintrust-sdk-java/releases/tag/v0.3.4) - The `opentelemetry-sdk-extension-autoconfigure` and `opentelemetry-sdk-extension-autoconfigure-spi` modules are no longer transitive dependencies of the main `braintrust-sdk-java` artifact. Applications that used the SDK without the Braintrust Java agent and relied on these being on the classpath should either add them explicitly, depend on the [`braintrust-otel-extension`](https://central.sonatype.com/artifact/dev.braintrust/braintrust-otel-extension) artifact, or run the [`braintrust-java-agent`](/instrument/trace-llm-calls#auto-instrumentation).
  * [v0.3.3](https://github.com/braintrustdata/braintrust-sdk-java/releases/tag/v0.3.3) - Fixed grandchild spans being dropped when `braintrust-java-agent` ran alongside the Datadog Java agent with `DD_TRACE_OTEL_ENABLED=true`. Datadog detection now uses JVM JAR manifest inspection, making it reliable on Linux. The [`braintrust-otel-extension`](https://central.sonatype.com/artifact/dev.braintrust/braintrust-otel-extension) artifact now ships its OTel autoconfigure implementation, enabling Braintrust tracing under the OpenTelemetry Java agent without the Braintrust agent.
  * [v0.3.2](https://github.com/braintrustdata/braintrust-sdk-java/releases/tag/v0.3.2) - Added [AWS Bedrock](/integrations/ai-providers/bedrock#tracing-java) tracing via `BraintrustAWSBedrock.wrap()` for `converse` and `converseStream`, with auto-interception under `braintrust-java-agent`. Requires `software.amazon.awssdk:bedrockruntime` 2.30.0 or later. The Braintrust agent now coexists with the OpenTelemetry Java agent. For full library auto-instrumentation in that setup, use the [`braintrust-otel-extension`](https://central.sonatype.com/artifact/dev.braintrust/braintrust-otel-extension) artifact.
  * [v0.3.1](https://github.com/braintrustdata/braintrust-sdk-java/releases/tag/v0.3.1) - Spring AI instrumentation was rewritten on top of HTTP-level interceptors (RestClient + WebClient) instead of Micrometer observation handlers. Streaming responses now capture `time_to_first_token`, and image and file attachments in Spring AI messages are captured as `braintrust_attachment` objects in span inputs. Fixed `time_to_first_token` being recorded as `0` for non-streaming LangChain calls.
  * [v0.3.0](https://github.com/braintrustdata/braintrust-sdk-java/releases/tag/v0.3.0) - Added `ParameterDef<T>` for typed parameter definitions configurable from the Playground UI: use `ParameterDef.data(name, defaultValue)` for string, number, boolean, array, and POJO types, and `ParameterDef.model(name, defaultValue)` for a model picker. Added `Parameters` runtime container with typed `get(key, Class<T>)` accessors and numeric coercion. `Task.apply(DatasetCase, Parameters)` is now the primary method to implement; the single-arg form becomes a backward-compatible default. `TaskResult` now carries a `Parameters` field — use `new TaskResult<>(output, datasetCase, parameters)` to forward parameters to remote scorers. `Eval.Builder` gains `.parameters()` and `.parameterValues()` for declaring and overriding parameter values in offline runs. **Breaking change**: `RemoteEval.Parameter` and `RemoteEval.ParameterType` are removed; replace with `ParameterDef.data(...)` and `ParameterDef.model(...)`, and update `RemoteEval.Builder.parameter()` calls to accept `ParameterDef<?>`.
  * [v0.2.11](https://github.com/braintrustdata/braintrust-sdk-java/releases/tag/v0.2.11) - The Java auto-instrumentation agent now supports running alongside the Datadog Java agent. List the Braintrust `-javaagent` after the Datadog `-javaagent`. Added `BRAINTRUST_FILTER_AI_SPANS` environment variable to export only AI-related spans and drop generic instrumentation spans. See [Trace LLM calls](/instrument/trace-llm-calls#auto-instrumentation) for setup details.

  ### Ruby SDK releases

  * [v0.3.2](https://github.com/braintrustdata/braintrust-sdk-ruby/releases/tag/v0.3.2) - Relaxed the `openssl` dependency constraint to allow `openssl` 4.x for compatibility with newer Ruby environments.
  * [v0.3.1](https://github.com/braintrustdata/braintrust-sdk-ruby/releases/tag/v0.3.1) - Added a `parameters:` keyword argument to `Eval.run` for passing runtime configuration to tasks and scorers, which powers [remote evals](/evaluate/remote-evals) triggered from the playground. Tasks and scorers that declare `parameters:` receive the values automatically, and those that don't are unaffected. `Functions#list` and `Prompt.load` now accept `project_id:` as an alternative to a project name. `Prompt` instances expose a new `#version` accessor returning the loaded version's transaction ID. Added support for Ruby 4.0.

  ### C# SDK releases

  * [v0.2.5](https://github.com/braintrustdata/braintrust-sdk-dotnet/releases/tag/v0.2.5) - Added a `Braintrust.Sdk.AgentFramework` package that traces apps built on the [Microsoft Agent Framework](/integrations/agent-frameworks/microsoft-agent-framework), with agent-level (`WithBraintrustAgentTracing`), LLM-level (`UseBraintrustLLMTracing`), and function-level (`UseBraintrustFunctionTracing`) tracing, plus a combined `UseBraintrustTracing` that instruments the whole pipeline in one call.

  ### Improvements

  * Playground table cells now render YAML and Markdown content with syntax highlighting, using colors consistent with the prompt editor.
  * Lambda sandbox evals are no longer limited to 15 minutes end-to-end. Braintrust manages dataset iteration outside the Lambda invocation, so only individual per-case invocations are subject to the Lambda execution limit.
  * Modal sandbox evals now have a 60-minute lifetime by default, up from the previous 5-minute Modal default, so longer-running agent evals complete without hitting the sandbox lifetime cap. Self-hosted deployments can override the default with the `MODAL_SANDBOX_TIMEOUT_S` environment variable (in seconds).

    <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_27}</Badge></Tooltip>
  * Project descriptions now render as Markdown. Use headings, bullet points, links, and code blocks to format your project description.
  * When you run `Eval()` with [`loadParameters()`](/evaluate/write-parameters#use-in-evaluations), the experiment's **Details** sidebar now shows a **Parameters** section with a clickable link back to the saved parameters object used. The experiments table also includes a **Parameters** column so you can see which parameter version each experiment used at a glance.
  * SQL now supports null-safe equality operators: `<=>` (null-safe equal) and `<!=>` (null-safe not-equal). Unlike `=` and `!=`, these treat `null` as a comparable value rather than propagating `null` in the result. `coalesce(field, 'x') != 'y'` patterns are also automatically rewritten to `field <!=> 'y'` for better index performance. See [SQL operators](/reference/sql#sql-operators) for details.
  * `ANY_SPAN()` now supports one level of nesting. See [`ANY_SPAN()`](/reference/sql#matching-spans-filters) for details.

    <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_28}</Badge></Tooltip>
  * [Experiments created in the Braintrust UI](/evaluate/run-evaluations#run-in-ui) now run without a time limit on cloud and on self-hosted deployments running data plane v2.0 or later. Previously, UI-originated experiments timed out after 15 minutes.
  * When the Braintrust Slack app is updated with new permissions, a **Refresh permissions** button now appears next to connected workspaces on the Integrations page, making it easy to re-authorize and enable the latest features. See [Enable Slack integration](/admin/organizations#enable-slack-integration) for details.
  * Monitor charts now support **count distinct** as an aggregator, making it easy to plot unique value counts (e.g. distinct users or models) as a time series. See [Create custom charts](/observe/dashboards#create-custom-charts) for details.
  * The [custom chart editor](/observe/dashboards#create-custom-charts) now accepts complex SQL aggregate expressions as measures, such as `sum(tokens) / count(id)` and `100 * sum(errors) / count(id)`. Invalid expressions display a validation error on hover.
  * The [Compare experiments](/evaluate/compare-experiments) guide has been significantly updated: restructured around common workflows, verified against the source code, and expanded with new sections on setting baselines, diff mode, layout options, sharing results, and comparing trials.
  * You can now create an [online scoring rule](/evaluate/score-online#create-scoring-rules) directly from the **Scorers** list page by selecting one or more scorers.
  * The scorer documentation has been reorganized into dedicated pages for each scorer type: [Autoevals](/evaluate/autoevals), [LLM-as-a-judge](/evaluate/llm-as-a-judge), and [Custom code](/evaluate/custom-code). The [Scorers overview](/evaluate/write-scorers) covers where to define scorers, how to test them, and best practices.
  * The custom facet panel now lets you test the preprocessor on a selected trace independently of the prompt, so you can verify your transformation before iterating on the prompt. See [Create a facet](/observe/topics/custom-facets#create-a-facet) for details.
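
The difference between the standard and null-safe SQL comparison operators mentioned above can be modeled in Python, with `None` standing in for SQL `null`. The function names are hypothetical; the sketch only mirrors the semantics described in this update, including why `coalesce(field, 'x') != 'y'` can be rewritten to `field <!=> 'y'`.

```python
def sql_eq(a, b):
    """Standard `=`: any null operand makes the result null."""
    if a is None or b is None:
        return None
    return a == b


def null_safe_eq(a, b):
    """`<=>`: null is an ordinary comparable value."""
    return a == b


def null_safe_ne(a, b):
    """`<!=>`: the negation of `<=>`, never null."""
    return not null_safe_eq(a, b)


def coalesce_ne(field, default, target):
    """The `coalesce(field, default) != target` pattern."""
    value = default if field is None else field
    return value != target


# For every sample value, both forms of the filter agree.
samples = [None, "y", "z"]
rewrite_is_equivalent = all(
    coalesce_ne(v, "x", "y") == null_safe_ne(v, "y") for v in samples
)
```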
</Update>

<Update label="March 2026">
  ### Tag and star datasets

  Tags can now be applied to datasets themselves, not just to individual records within them. Use dataset tags to organize datasets in the list and star datasets to pin them to the top of the table and dataset picker dropdowns. See [Tag and star datasets](/annotate/datasets/manage#tag-and-star-datasets) for details.

  ### Estimated cost in trace tree

  The trace tree now shows estimated LLM cost inline on each span by default, alongside duration and total tokens. Cost is propagated from child spans to parent spans, making it easy to see which parts of a multi-step workflow are consuming most of your cost budget. Use <Icon icon="settings-2" /> **Display** > **Display metric types** to customize which metrics appear. See [View as a hierarchy](/observe/examine-traces#view-as-a-hierarchy) for details.
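
Propagating cost from child spans to parents amounts to a recursive sum over the trace tree. The sketch below is a minimal model with a hypothetical `Span` class and made-up costs; it is not how the UI computes the values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    name: str
    own_cost: float = 0.0  # Cost incurred directly by this span, in USD.
    children: List["Span"] = field(default_factory=list)


def rolled_up_cost(span: Span) -> float:
    """A span's displayed cost: its own cost plus that of all descendants."""
    return span.own_cost + sum(rolled_up_cost(child) for child in span.children)


trace = Span(
    "agent",
    children=[
        Span("plan", own_cost=0.25),
        Span("answer", children=[Span("llm_call", own_cost=0.5)]),
    ],
)
total = rolled_up_cost(trace)
```

In this example the root `agent` span has no direct cost but rolls up the cost of both branches, which is what makes expensive subtrees easy to spot.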

  ### Timeline scaling by tokens and cost

  The <Icon icon="square-chart-gantt" /> **Timeline** trace view now lets you scale span bars by metrics other than duration — including total tokens, prompt tokens, completion tokens, or estimated cost. This makes it easy to diagnose context bloat by seeing at a glance which spans consume the most tokens. See [View as a timeline](/observe/examine-traces#view-as-a-timeline) for details.

  ### Assign prompts to environments via API

  The [`POST /v1/prompt`](/api-reference/prompts/create-prompt) and [`PUT /v1/prompt`](/api-reference/prompts/create-or-replace-prompt) endpoints now accept an `environment_slugs` parameter, letting you assign a prompt to one or more environments in a single atomic request. See [Assign to environments](/deploy/environments#assign-to-environments) for details.
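
  A minimal sketch of such a request follows. Only the endpoint path and the `environment_slugs` parameter come from this update. The base URL, the other body fields, the placeholder values, and the `buildCreatePromptRequest` helper are assumptions for illustration, so check the API reference for the exact schema. The helper builds the request without sending it.

  ```typescript
  // Assumed request body; only `environment_slugs` is taken from this update.
  interface CreatePromptBody {
    project_id: string;
    name: string;
    slug: string;
    environment_slugs: string[];
  }

  // Hypothetical helper: builds (but does not send) a create-prompt request.
  function buildCreatePromptRequest(apiKey: string, body: CreatePromptBody) {
    return {
      url: "https://api.braintrust.dev/v1/prompt", // assumed base URL
      init: {
        method: "POST",
        headers: {
          Authorization: `Bearer ${apiKey}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify(body),
      },
    };
  }

  const createPromptRequest = buildCreatePromptRequest("<BRAINTRUST_API_KEY>", {
    project_id: "<PROJECT_ID>",
    name: "Support triage",
    slug: "support-triage",
    // Placeholder environment slugs; use the slugs defined in your organization.
    environment_slugs: ["staging", "production"],
  });

  // To send it: await fetch(createPromptRequest.url, createPromptRequest.init);
  ```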

  ### Organization-wide default views

  Admins can now set organization-wide default views on any page that supports them, including Logs, Monitor, Review, Playgrounds, Experiments, Datasets, Prompts, Scorers, Parameters, and Tools. See [Set default table views](/observe/view-logs#set-default-table-views) and [Set default dashboards](/observe/dashboards#set-a-default-view) for details.

  ### Annotate playground outputs

  Annotate multiple outputs across many prompts using quick thumbs up/down reactions and free-text comments, then let Loop suggest prompt improvements based on your feedback. See [Annotate outputs](/evaluate/playgrounds#annotate-outputs) for details.

  <Note>
    {feature_1} {verb_1} only available on [Pro and Enterprise plans](/plans-and-limits#plans).
  </Note>

  ### Preprocessor-powered thread view

  The <Icon icon="messages-square" /> **Thread** tab now has a <Icon icon="settings-2" /> settings control that lets you apply a preprocessor to the thread view. Choose the built-in **Thread** preprocessor to format the trace as a readable conversation, or use a [custom preprocessor](/observe/topics/custom-facets) to control how messages are rendered. See [View as a conversation](/observe/examine-traces#view-as-a-conversation) for details.

  ### Trace-level filters in charts

  The <Icon icon="chart-no-axes-column" /> **Monitor** page now supports trace-level filters in charts and at the page level. Page-level filters, which previously matched at the span level, now match traces where any span satisfies the conditions. This is useful for filtering by metadata you may have set, like `metadata.email` or `metadata.org`. See [Create custom charts](/observe/dashboards#create-custom-charts) for details.

  <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_29}</Badge></Tooltip>
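
  The difference between span-level and trace-level matching can be sketched as follows. The `LoggedSpan` shape and the `matchingTraceIds` helper are hypothetical and simplified. They only illustrate the "any span satisfies the conditions" rule described above, not how Braintrust evaluates filters internally.

  ```typescript
  // Hypothetical, simplified span shape; real spans carry many more fields.
  interface LoggedSpan {
    traceId: string;
    metadata: { org?: string };
  }

  // Trace-level matching: a trace matches when ANY of its spans satisfies the condition.
  function matchingTraceIds(
    spans: LoggedSpan[],
    condition: (span: LoggedSpan) => boolean,
  ): string[] {
    const matched = new Set<string>();
    for (const span of spans) {
      if (condition(span)) matched.add(span.traceId);
    }
    return Array.from(matched);
  }

  const loggedSpans: LoggedSpan[] = [
    { traceId: "trace-1", metadata: { org: "acme" } }, // only this span has the metadata
    { traceId: "trace-1", metadata: {} },
    { traceId: "trace-2", metadata: {} },
    { traceId: "trace-2", metadata: { org: "globex" } },
  ];

  // Selects "trace-1" as a whole, including its span that has no `org` metadata.
  const acmeTraces = matchingTraceIds(
    loggedSpans,
    (span) => span.metadata.org === "acme",
  );
  ```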

  ### Search scope in trace views

  Trace search (`Cmd/Ctrl+F`) now includes a scope selector to search within just the currently selected span or across the full trace. See [Search within a trace](/observe/examine-traces#search-within-a-trace) for details.

  ### Monitor topic trends

  The <Icon icon="chart-no-axes-column" /> **Monitor** page now includes a built-in chart to help you track topic distribution changes over time. See [Track trends over time](/observe/topics/review-insights#track-trends-over-time) for details.

  <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_30}</Badge></Tooltip>

  ### Duplicate custom views across projects

  Custom trace views can now be duplicated to other projects and organizations, enabling you to reuse custom interfaces across different contexts. See [Duplicate views](/annotate/custom-views#duplicate-views) for details.

  ### Custom view version history

  Custom trace views now maintain version history, enabling you to track changes over time and revert to previous versions. See [Access version history](/annotate/custom-views#access-version-history) for details.

  ### Chrome Local Network Access for self-hosted deployments

  Self-hosted deployments on private networks can now enable the **Data plane is on a private network** checkbox in organization settings. When enabled, Braintrust will detect Chrome Local Network Access permission issues and display instructions to resolve them. See [Grant browser permissions](/admin/self-hosting/advanced#grant-browser-permissions) for details.

  ### Python SDK releases

  * [v0.11.0](https://github.com/braintrustdata/braintrust-sdk-python/releases/tag/py-sdk-v0.11.0) - [Claude Agent SDK](/integrations/agent-frameworks/claude-agent-sdk) integration now instruments hooks, creating dedicated spans for each hook invocation. [Anthropic](/integrations/ai-providers/anthropic) integration now instruments `messages.batches.create` and `beta.messages.batches.create` (sync and async), and captures `thinking` config in span metadata when extended thinking is enabled. [Google GenAI](/integrations/ai-providers/gemini) integration now instruments `embed_content` and `aembed_content` for embedding calls. [Google ADK](/integrations/agent-frameworks/google) integration now correctly traces nested subagent tool calls with proper span parenting. Fixed OpenAI traced stream wrappers to support the context manager protocol (`with`/`async with`). `Eval()` now emits a warning when it receives an empty dataset.
  * [v0.10.0](https://github.com/braintrustdata/braintrust-sdk-python/releases/tag/py-sdk-v0.10.0) - Added saved-parameter loading in Python via `load_parameters()`, added tracing for OpenAI `with_raw_response` calls, added `jsonschema` as a required dependency, and fixed noisy Pydantic serializer warnings.
  * [v0.9.0](https://github.com/braintrustdata/braintrust-sdk-python/releases/tag/py-sdk-v0.9.0) - Improved [Claude Agent SDK](/integrations/agent-frameworks/claude-agent-sdk) tracing, including CLI and parallel child-span handling, and added `tags` support to Python SDK builder classes.
  * [v0.8.0](https://github.com/braintrustdata/braintrust-sdk-python/releases/tag/py-sdk-v0.8.0) - Added [pytest integration](/integrations/sdk-integrations/pytest) for running tracked evals inside pytest suites. Expanded [Agno](/integrations/sdk-integrations/agno) tracing to include workflow execution paths, including workflow spans and nested agent traces. Fixed installation of the Python SDK `performance` extra.

  ### TypeScript SDK releases

  * [v3.6.0](https://github.com/braintrustdata/braintrust-sdk-javascript/releases/tag/js-sdk-v3.6.0) - Added OpenRouter SDK instrumentation support. See [OpenRouter SDK](/integrations/ai-providers/openrouter).
  * [v3.5.0](https://github.com/braintrustdata/braintrust-sdk-javascript/releases/tag/js-sdk-v3.5.0) - Agent tool call tracing for [AI SDK v5/v6](/integrations/sdk-integrations/vercel): individual tool calls in agentic workflows are now captured as separate `function`-type spans. `prompts.create()` now accepts an `environment` field. Tool definitions are now recorded in span metadata for AI SDK and [Google GenAI](/integrations/ai-providers/gemini) wrappers. Fixed `AsyncLocalStorage` resolution in Cloudflare Workers environments.
  * [v3.4.0](https://github.com/braintrustdata/braintrust-sdk-javascript/releases/tag/js-sdk-v3.4.0) - Added [Node.js test runner integration](/integrations/sdk-integrations/node-test-runner), support for `tags` in functions, model parameter support, and a `BRAINTRUST_DEBUG_LOG_LEVEL` [environment variable](/instrument/advanced-tracing#tune-performance) for internal SDK troubleshooting output.
  * [v3.3.0](https://github.com/braintrustdata/braintrust-sdk-javascript/releases/tag/js-sdk-v3.3.0) - Improved eval runtime performance by up to 8x through smarter flush backpressure. Added `BRAINTRUST_FLUSH_BACKPRESSURE_BYTES` [environment variable](/instrument/advanced-tracing#tune-performance) to configure the flush threshold. Deprecated `BRAINTRUST_LOG_FLUSH_CHUNK_SIZE`.

  ### Java SDK releases

  * [v0.2.10](https://github.com/braintrustdata/braintrust-sdk-java/releases/tag/v0.2.10) - Added the Braintrust Java auto-instrumentation agent (`braintrust-java-agent`): attach it at JVM startup with `-javaagent` to automatically trace OpenAI, Anthropic, Spring AI, LangChain4j, and Google GenAI calls with no code changes. The OpenAI instrumentation now also covers the `/v1/responses` endpoint. Spring AI instrumentation now supports the OpenAI and Anthropic model backends. Fixed: score spans in `Eval` runs now correctly include the `purpose` tag. See [Trace LLM calls](/instrument/trace-llm-calls#auto-instrumentation) for setup instructions.
  * [v0.2.9](https://github.com/braintrustdata/braintrust-sdk-java/releases/tag/v0.2.9) - Added Anthropic beta service instrumentation.
  * [v0.2.8](https://github.com/braintrustdata/braintrust-sdk-java/releases/tag/v0.2.8) - Added support for the gateway header in dev server CORS.

  ### Ruby SDK releases

  * [v0.3.0](https://github.com/braintrustdata/braintrust-sdk-ruby/releases/tag/v0.3.0) - Added a [Rails engine](/evaluate/remote-evals#run-as-a-rails-engine) for mounting the eval server into existing Rails 8.x applications. Scorers can now return [multiple named scores](/evaluate/custom-code#return-multiple-scores) and attach metadata to scores. Trace structure is now aligned with other SDKs.
  * [v0.2.1](https://github.com/braintrustdata/braintrust-sdk-ruby/releases/tag/v0.2.1) - Added `trace:` keyword for [scoring full eval traces](/evaluate/custom-code#score-traces). Scorer and Task blocks now use keyword arguments. `Braintrust::Scorer.new` and `Braintrust::Task.new` are now available as top-level classes.
  * [v0.2.0](https://github.com/braintrustdata/braintrust-sdk-ruby/releases/tag/v0.2.0) - Added a [dev server](/evaluate/remote-evals) for running evaluations from the Braintrust UI against code on your own infrastructure. Define evaluators using `Braintrust::Eval::Evaluator` (inline or by subclassing), mount them with `Braintrust::Server::Rack.app`, and start a Rack-compatible server to expose them to the playground.

  ### C# SDK releases

  * [v0.2.3](https://github.com/braintrustdata/braintrust-sdk-dotnet/releases/tag/v0.2.3) - Added trace-level scoring: implement `ITracedScorer<TInput, TOutput>` to score based on intermediate trace spans, such as individual LLM calls in a multi-step task. The eval runner automatically flushes spans before traced scorers run. The new `EvalTrace` provides cached access to task spans via `GetSpansAsync()` and a reconstructed conversation via `GetThreadAsync()`. Existing `IScorer<TInput, TOutput>` implementations are unaffected.
  * [v0.2.2](https://github.com/braintrustdata/braintrust-sdk-dotnet/releases/tag/v0.2.2) - Eval tasks and scorers now handle errors gracefully. When a task throws, each scorer's new `ScoreForTaskException` method is called as a fallback. When a scorer throws, its new `ScoreForScorerException` method is called as a fallback. Both default to a score of `0.0` and can be overridden on `IScorer`.
  * [v0.2.1](https://github.com/braintrustdata/braintrust-sdk-dotnet/releases/tag/v0.2.1) - `Score` now accepts an optional `Metadata` dictionary for attaching structured data to individual scores.
  * [v0.2.0](https://github.com/braintrustdata/braintrust-sdk-dotnet/releases/tag/v0.2.0) - Added Anthropic instrumentation (`Braintrust.Sdk.Anthropic`). Moved OpenAI instrumentation into a separate `Braintrust.Sdk.OpenAI` package. Added automatic git repo info on experiments, with opt-out and field-level control via `RepoInfo()` and `GitMetadataSettings()` on `Eval.Builder`.

  ### Improvements

  * Documented how to [add metadata and tags to spans](/instrument/trace-application-logic#add-metadata-and-tags) at creation time or from within a span.
  * Loop is now available on the projects overview page, enabling natural language analysis of project metrics, usage trends, and patterns across all your projects. See [Analyze projects](/loop#analyze-projects) for details.
  * Loop is now available on individual project overview pages, with access to the project's logs, experiments, datasets, prompts, and score progress chart. See [Analyze projects](/loop#analyze-projects) for details.
  * Added alerts management to the **<Icon icon="chart-no-axes-column" /> Monitor** page.
  * Topics clustering now supports configurable sample size (100-50,000 summaries) for faster processing of large datasets. See [Re-generate topics](/observe/topics/manage#re-generate-topics) for details.
  * Added `TS_API_KEEP_ALIVE_TIMEOUT_SECONDS` environment variable for self-hosted deployments to [configure HTTP keep-alive timeout](/admin/self-hosting/advanced#configure-http-keep-alive-timeout) when running behind load balancers.
  * [Preset monitor charts](/observe/dashboards#create-custom-charts) (Spans, Latency, Total LLM cost, Token count, and Time to first token) now automatically exclude internal scorer spans generated by online scoring automations such as [Topics](/observe/topics) and [online scorers](/evaluate/score-online), so metrics reflect actual production traffic.
</Update>

<Update label="February 2026">
  ### TypeScript auto-instrumentation

  Braintrust now supports [auto-instrumentation](/instrument/trace-llm-calls) for TypeScript, enabling automatic tracing of all LLM calls with minimal setup.

  ### Topics for automated log insights

  Topics automatically analyze and classify logs to surface patterns and insights without manual review. Create topic maps that combine preprocessors (which transform trace data) and AI prompts (which extract summaries) to analyze your logs. Summaries are clustered into meaningful topics using machine learning, then used to automatically classify new and existing traces. Built-in topic maps include Task (user intents), Sentiment (emotional tone), and Issues (agent problems). Custom topic maps enable domain-specific analysis. See [Discover insights with Topics](/observe/topics) for details.

  <Note>
    **Self-hosted deployments**: Topics requires data plane v2.0+ and has additional eligibility requirements. See [Enable Topics](/admin/self-hosting/upgrade/v2#enable-topics) in the v2.0 upgrade guide.
  </Note>

  ### Braintrust gateway

  The Braintrust gateway provides a production-grade, unified API to access models from OpenAI, Anthropic, Google, AWS, Mistral, and third-party providers with automatic caching, observability, and multi-provider support. Use any supported provider's SDK to call any provider's models, so you can standardize on one SDK while accessing all available models. See [Use the Braintrust gateway](/deploy/gateway) for details.

  <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_31}</Badge></Tooltip>

  ### Braintrust CLI

  The `bt` command-line interface provides terminal access to Braintrust features including running evaluations, querying logs with SQL, managing prompts and projects, syncing data to local NDJSON files, and configuring coding agents. The CLI supports interactive log browsing, multiple authentication profiles for switching between organizations, and integration with CI/CD pipelines via API key authentication. See [CLI reference](/reference/cli/quickstart) for installation and available commands.

  ### Evaluation parameters

  Parameters let you create reusable, versioned configurations for your evaluations. Create parameters once in Braintrust and load them across multiple evaluations, enabling centralized management and environment-specific values. In remote evals, parameters automatically become editable controls in the playground UI, letting you experiment with different configurations without changing code. See [Create parameters](/evaluate/write-parameters) for details.

  <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_32}</Badge></Tooltip>

  ### Render attachments in custom views

  Custom trace views now support rendering images, videos, audio, and other attachments directly in the interface. When span data is fetched, Braintrust automatically converts attachment references to signed URLs ready for display. This enables building rich annotation interfaces for multimodal evaluations, visual inspection workflows, and media-heavy applications. See [Render attachments](/annotate/custom-views#render-attachments) for examples and implementation details.

  ### Navigate to trace origins

  Navigate from traces in logs back to their originating prompt or dataset row, enabling rapid iteration between analyzing results and refining the prompts or test data that generated them. See [Navigate to trace origins](/observe/examine-traces#navigate-to-trace-origins) for details.

  ### Loop improvements

  Loop now supports `@` mention autocomplete for quickly adding data sources to your queries. Type `@` in the Loop chat input to bring up a searchable menu of available datasets, experiments, project logs, playgrounds, and SQL queries. Loop can also help you file support tickets directly from the chat when you need assistance from the Braintrust support team, automatically including relevant context from your conversation. See [Select data sources](/loop#select-data-sources) and [Request support](/loop#request-support) for details.

  ### Trace-level scorers

  LLM-as-a-judge and custom code scorers can now access the entire execution trace to evaluate multi-step workflows and agent behavior. This enables scoring based on tool usage patterns, workflow steps, operation counts, and multi-turn interactions. See [Trace-level scorers](/evaluate/custom-code#score-traces) for details.

  <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_33}</Badge></Tooltip>

  ### LangSmith integration

  Braintrust now provides an experimental LangSmith wrapper to integrate LangSmith with Braintrust. The wrapper can either send tracing and evaluation calls to both LangSmith and Braintrust in parallel, or route them solely to Braintrust, with minimal code changes. See [LangSmith integration](/integrations/sdk-integrations/langsmith) for details.

  ### Braintrust MCP server improvements

  The Braintrust MCP server now includes a resource to auto-install the Braintrust SDK, streamlining initial setup. Also, the `btql_query` tool has been replaced with `sql_query` for querying logs using standard SQL syntax, and long tool descriptions have been moved into MCP resources to reduce context window usage while keeping detailed documentation accessible when needed. See [Braintrust MCP](/integrations/developer-tools/mcp) for more details.

  ### OpenCode integration

  Braintrust now integrates with [OpenCode](https://opencode.ai/), an open-source AI coding assistant. Install the Braintrust plugin to automatically configure the Braintrust MCP server and enable tracing, giving OpenCode access to query logs, fetch experiment results, and log evaluation data using natural language. Sessions are automatically traced with detailed span tracking for workflows, turns, and tool executions. See [OpenCode integration](/integrations/developer-tools/opencode) for setup instructions.

  ### Cursor integration

  Braintrust now integrates with [Cursor](https://cursor.sh/), the AI-powered code editor. Install the Braintrust extension to automatically configure the Braintrust MCP server, enabling Cursor to query logs, fetch experiment results, and log data using natural language. This streamlines AI-assisted development workflows with direct access to your Braintrust data from within Cursor. See [Cursor integration](/integrations/developer-tools/cursor) for setup instructions.

  ### Claude Code integration updates

  The `braintrust` Claude Code plugin now automatically configures the Braintrust MCP server for seamless integration. Also, the `trace-claude-code` plugin now supports embedding Claude Code traces inside parent traces, enabling better observability when Claude Code is used as part of a larger workflow. See [Embed traces in parent traces](/integrations/developer-tools/claude-code#embed-traces-in-parent-traces) for details.

  ### Image rendering security controls

  Braintrust now provides configurable image rendering modes to prevent sensitive data leaks from malicious image URLs. Organizations can control when images load in logs with three modes: auto-load images (default), click-to-load (requires user approval), or block all external images. This protection applies organization-wide to all logs, experiments, and playgrounds. See [Image rendering controls](/admin/organizations#control-image-rendering) for configuration details.

  <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_34}</Badge></Tooltip>

  ### Single span filters with aggregations

  Single span filters (`any_span()`) can now be combined with `GROUP BY` to aggregate traces based on span-level conditions. This enables analyzing patterns across traces that contain specific types of spans, such as counting traces with both errors and production tags grouped by model, or calculating average costs for traces with specific span characteristics. This works with both `traces` and `summary` shapes. See [Analyze traces with span filters](/reference/sql#analyze-traces-with-span-filters) for examples.

  <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_35}</Badge></Tooltip>
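
  The "errors and production tags grouped by model" case above can be sketched in plain TypeScript. This is not Braintrust SQL. The `TraceSummary` shape, its field names, and the `countFlaggedTracesByModel` helper are hypothetical, and the sketch assumes each span-level condition may be satisfied by a different span in the same trace.

  ```typescript
  // Hypothetical trace summary for illustration; field names are assumptions.
  interface TraceSummary {
    model: string;
    spans: { hasError: boolean; tags: string[] }[];
  }

  // Counts traces per model, keeping only traces where some span has an error
  // and some span (possibly a different one) carries the "production" tag.
  function countFlaggedTracesByModel(traces: TraceSummary[]): Map<string, number> {
    const counts = new Map<string, number>();
    for (const trace of traces) {
      const anyError = trace.spans.some((span) => span.hasError);
      const anyProduction = trace.spans.some((span) =>
        span.tags.includes("production"),
      );
      if (anyError && anyProduction) {
        counts.set(trace.model, (counts.get(trace.model) ?? 0) + 1);
      }
    }
    return counts;
  }

  const flaggedCounts = countFlaggedTracesByModel([
    {
      model: "model-a",
      spans: [
        { hasError: true, tags: [] },
        { hasError: false, tags: ["production"] },
      ],
    },
    // Has an error but no production tag, so it is not counted.
    { model: "model-a", spans: [{ hasError: true, tags: ["staging"] }] },
    { model: "model-b", spans: [{ hasError: true, tags: ["production"] }] },
  ]);
  ```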

  ### LIMIT with aggregations

  SQL queries now support LIMIT with GROUP BY aggregations to restrict the number of grouped results returned. This enables top-N queries and paginating through aggregated data. When combined with ORDER BY, rows are sorted before limiting. See [LIMIT](/reference/sql#limit) for examples.

  <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_36}</Badge></Tooltip>
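
  The ordering rule is the important part of a top-N query: grouped rows are sorted first and truncated second. The sketch below mirrors that behavior in plain TypeScript rather than SQL. The `GroupRow` shape and the `topGroups` helper are hypothetical names used only for illustration.

  ```typescript
  // One row per group, as a GROUP BY query would produce (hypothetical shape).
  interface GroupRow {
    model: string;
    traceCount: number;
  }

  // Mirrors GROUP BY ... ORDER BY traceCount DESC LIMIT n:
  // sort the grouped rows first, then keep the first `limit` rows.
  function topGroups(rows: GroupRow[], limit: number): GroupRow[] {
    return [...rows]
      .sort((a, b) => b.traceCount - a.traceCount)
      .slice(0, limit);
  }

  const groupedRows: GroupRow[] = [
    { model: "model-a", traceCount: 12 },
    { model: "model-b", traceCount: 40 },
    { model: "model-c", traceCount: 7 },
  ];

  // Keeps model-b and model-a, in that order.
  const topTwo = topGroups(groupedRows, 2);
  ```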

  ### Python SDK releases

  * [v0.7.0](https://github.com/braintrustdata/braintrust-sdk-python/releases/tag/py-sdk-v0.7.0): Added support for passing tags at experiment and eval creation time. The [Google ADK](/integrations/agent-frameworks/google) integration is now enabled automatically when you call `braintrust.auto_instrument()`. Fixed a bug where custom `git_metadata_settings` in `braintrust.init()` crashed the SDK.

  * [v0.6.0](https://github.com/braintrustdata/braintrust-sdk-python/releases/tag/py-sdk-v0.6.0): Added automatic tool call span creation for [Pydantic AI](/integrations/agent-frameworks/pydantic-ai) agents with detailed timing, inputs, and outputs. Fixed compatibility with Agno v2.5.0 and resolved Pydantic AI patching issues when using logfire co-instrumentation.

  * [v0.5.6](https://github.com/braintrustdata/braintrust-sdk/releases/tag/py-sdk-v0.5.6): Added a public `name` property to the Span interface, support for retrieving threads in Python, and `with_raw_response` on `responses.parse()` for OpenAI clients. Fixed LiteLLM wrapper metric handling for booleans. `update_span()` now includes the root span ID.

  * [v0.5.5](https://github.com/braintrustdata/braintrust-sdk/releases/tag/py-sdk-v0.5.5): Improved OpenAI Agents integration to handle all span types and added the `review` span type.

  * [v0.5.4](https://github.com/braintrustdata/braintrust-sdk/releases/tag/py-sdk-v0.5.6): Fixed empty tool call arguments from string concatenation with None.

  * [v0.5.3](https://github.com/braintrustdata/braintrust-sdk/releases/tag/py-sdk-v0.5.6): Added classifications field, improved thread safety for Python context vars, added support for overflowing payloads to S3 at upload time, and removed duplicate retry logic in HTTPConnection.

  ### TypeScript SDK releases

  * [v3.2.0](https://github.com/braintrustdata/braintrust-sdk-javascript/releases/tag/js-sdk-v3.2.0) - Added [Vitest integration](/integrations/sdk-integrations/vitest) for running evaluations directly within Vitest test suites. Added JavaScript [auto-instrumentation](/instrument/trace-llm-calls) support.

  * [v3.1.0](https://github.com/braintrustdata/braintrust-sdk-javascript/releases/tag/js-sdk-v3.1.0) - Added support for passing tags at experiment creation time. Fixed token double counting between parent and child spans in [Vercel AI SDK](/integrations/sdk-integrations/vercel) integration.

  * [v3.0.0](https://github.com/braintrustdata/braintrust-sdk-javascript/releases/tag/js-sdk-v3.0.0) - Improved compatibility with Cloudflare Workers, Next.js Edge, and other restricted environments. Added cache write token metrics for OpenAI Agents. **Breaking changes**: Nunjucks templating is now a separate package. If you use `templateFormat: "nunjucks"` with prompts, you must install `@braintrust/templates-nunjucks-js` and register the plugin at startup. Mustache templating continues to work without any changes. See [v2.x to v3.x migration guide](/reference/sdks/typescript/migrations/v2-to-v3) for details.

  * [v2.2.2](https://github.com/braintrustdata/braintrust-sdk/releases/tag/js-sdk-v2.2.2) - Parameter values are now initialized to their defaults. Span updates now include the root span ID for better trace linking.

  * [v2.2.1](https://github.com/braintrustdata/braintrust-sdk/releases/tag/js-sdk-v2.2.1) - Added sub-agent nesting for [Claude Agent SDK](/integrations/agent-frameworks/claude-agent-sdk). Includes internal improvements for AsyncIterable input handling, trace scorer capabilities, and S3 overflow support for large payloads.

  ### Go SDK releases

  * [v0.2.1](https://github.com/braintrustdata/braintrust-sdk-go/releases/tag/v0.2.1) - Added langchaingo support to auto-instrumentation and a new tracing library for the ADK (Agent Development Kit). Pinned the orchestrion dependency to version 1.6.1 for stability.

  ### Java SDK releases

  * [v0.2.7](https://github.com/braintrustdata/braintrust-sdk-java/releases/tag/v0.2.7) - Upgraded OpenTelemetry dependency versions for improved observability and tracing capabilities.

  * [v0.2.6](https://github.com/braintrustdata/braintrust-sdk-java/releases/tag/v0.2.6) - Added task and scorer error handling for evaluations. Fixed experiment dataset linking to properly connect experiments with remote datasets.

  * [v0.2.5](https://github.com/braintrustdata/braintrust-sdk-java/releases/tag/v0.2.5) - Added support for custom JSON object mapper configuration, top-level tags and metadata for experiments, and tagging metadata for eval cases.

  * [v0.2.4](https://github.com/braintrustdata/braintrust-sdk-java/releases/tag/v0.2.4) - Added distributed tracing for remote scorers with per-scorer span creation. Fixed issue where score spans were incorrectly flagged as missing a score.

  ### Ruby SDK releases

  * [v0.1.4](https://github.com/braintrustdata/braintrust-sdk-ruby/releases/tag/v0.1.4) - Improved scorer API to handle objects in addition to strings, fixed HTTP redirect handling for Evals API calls, and added proper dataset linking for experiments using remote dataset sources.

  * [v0.1.3](https://github.com/braintrustdata/braintrust-sdk-ruby/releases/tag/v0.1.3) - Added origin tagging for eval datasets, elevated datasets to first-class API, refactored Eval API to use `api` objects instead of `state`, and fixed Anthropic instrumentation to properly record system prompts.

  * [v0.1.2](https://github.com/braintrustdata/braintrust-sdk-ruby/releases/tag/v0.1.2) - Added support for prompts.

  ### Improvements

  * Added [troubleshooting guidance](/instrument/trace-application-logic#span-names-must-be-strings) to help you avoid validation failures that silently hide logs from dashboards.
  * Added [Nunjucks templating](/evaluate/write-prompts#nunjucks) support when invoking prompts via SDK.
  * Updated permission groups interface to use a **Members** button for adding and removing users from groups, improving discoverability.
  * Added SQL cursor pagination support using `OFFSET '<CURSOR_TOKEN>'` (with `_pagination_key` sorting), alongside BTQL `cursor:` pagination.
  * Added `FILTER_SPANS()` function for SQL queries to return only matching spans from traces, complementing `ANY_SPAN()` which returns entire traces. See [Filter to matching spans only](/reference/sql#matching-spans-filters) for details.
  * Added support for unpivoting nested fields from arrays of objects in SQL queries. See [Array of objects unpivot](/reference/sql#unpivot) for examples.
  * Added Claude Opus 4.6 support.
  * Added search functionality to JSON attachment viewer.
  * Simplified trace views in diff mode by disabling Timeline, Thread, and custom views during experiment comparison for a more focused side-by-side comparison experience.
  * Added support for viewing spans instead of traces in the table on the experiments page.
  * Added support for changing a project's creator via the API.
  * Added support for setting a default limit when listing experiments via the API.
  * Added support for input cache read and write cost fields in [custom AI provider configuration](/integrations/ai-providers/custom#model-metadata), enabling accurate cost estimation for models that support prompt caching.
  * Added API key authentication option for [AWS Bedrock](/integrations/ai-providers/bedrock) as an alternative to IAM credentials.
  * Improved trace detail panel to display custom column values and the create custom column button on all spans, not just root spans.
  * Added invite members option to the projects page dropdown for quick access to organization member invitations.
</Update>

<Update label="January 2026">
  ### Auto-instrumentation for Python, Ruby, and Go

  Braintrust now supports [auto-instrumentation](/instrument/trace-llm-calls) for Python, Ruby, and Go, enabling zero-code tracing for most providers. For other providers, use [manual wrapping](/instrument/trace-llm-calls#manual-instrumentation).

  ### Temporal integration

  Braintrust now integrates with [Temporal](https://temporal.io/), a durable execution platform for building reliable distributed applications. The integration automatically traces Temporal workflows and activities, capturing execution spans, metadata, and distributed traces across workers. This provides full observability across workflow executions with parent-child relationships between workflows and activities. Available for TypeScript, Python, and Go. See [Temporal integration](/integrations/sdk-integrations/temporal) for setup instructions.

  ### TrueFoundry integration

  Braintrust now integrates with [TrueFoundry](https://www.truefoundry.com/), an AI Gateway that provides a unified interface for accessing multiple AI providers. TrueFoundry exports LLM traces to Braintrust using OpenTelemetry, automatically capturing all interactions including chat completions, agent responses, embeddings, token usage, costs, and performance metrics. Configure the integration through the TrueFoundry dashboard by setting up the Braintrust OpenTelemetry endpoint and authentication headers. See [TrueFoundry integration](/integrations/sdk-integrations/truefoundry) for setup instructions.

  ### Kanban layout for reviews

  The <Icon icon="list-checks" /> **Review** page now supports a kanban layout for managing flagged spans. Drag and drop cards to update status, and click to open the full trace. See [Kanban layout](/annotate/human-review#use-kanban-layout) for details.

  ### Streamlined online scoring setup

  Create online scoring rules directly from scorers or the logs browser with automatic prepopulation of scorers and filters, enabling rapid iteration from production logs to scoring rules. See [Create scoring rules](/evaluate/score-online#create-scoring-rules) for details.

  ### Loop on trace pages

  <Icon icon="blend" /> **Loop** is now available when viewing individual traces. Select a trace and open it in fullscreen or a separate page, then use Loop to summarize trace execution, identify errors, search project logs for similar patterns, and generate custom visualizations. Loop on trace pages provides a focused set of tools optimized for single-trace analysis. See [Analyze individual traces](/loop#analyze-individual-traces) for details.

  ### View raw trace and span data

  You can now view and search the complete JSON representation of individual spans or entire traces. This gives you access to all span fields including metadata and internal properties that aren't visible in other views, making it easier to debug issues, verify exact values, and export data for reproduction. See [View raw span data](/observe/examine-traces#view-raw-span-data) for details.

  ### HAVING clause for SQL queries

  SQL queries now support the `HAVING` clause for filtering aggregated results after `GROUP BY` operations. Use `HAVING` to filter based on aggregate values like counts, averages, or sums. This enables queries like finding models with average scores above a threshold or identifying patterns with a minimum number of occurrences. See [HAVING for filtering aggregations](/reference/sql#having-for-filtering-aggregations) for details.
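As a rough illustration of how `HAVING` differs from `WHERE`, the sketch below runs a standard SQL query against an in-memory SQLite database. The `logs` table and its `model` and `score` columns are hypothetical stand-ins, not Braintrust's schema, and SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical table standing in for logged rows; not Braintrust's schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (model TEXT, score REAL)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?)",
    [
        ("model-a", 0.9),
        ("model-a", 0.8),
        ("model-b", 0.4),
        ("model-b", 0.6),
        ("model-c", 0.95),
    ],
)

# WHERE filters individual rows before grouping.
# HAVING filters whole groups after the aggregates are computed.
rows = conn.execute(
    """
    SELECT model, AVG(score) AS avg_score, COUNT(*) AS n
    FROM logs
    GROUP BY model
    HAVING AVG(score) > 0.7 AND COUNT(*) >= 2
    ORDER BY model
    """
).fetchall()

for model, avg_score, n in rows:
    print(model, round(avg_score, 2), n)
```

Only `model-a` survives here: `model-b` fails the average threshold and `model-c` has too few rows, which is the "minimum number of occurrences" pattern described above.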

  ### Implicit aliasing in SQL queries

  SQL queries now support implicit aliasing for multi-part identifiers. When you reference nested fields like `metadata.category`, you can now use the short form `category` in all SQL operations when unambiguous. This makes queries more concise while maintaining clarity. For more details, see [Implicit aliasing](/reference/sql#implicit-aliasing).
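To make the "when unambiguous" condition concrete, here is a minimal sketch of one way short-name resolution could work. It is not Braintrust's resolver: the `resolve_alias` helper, the field list, and the rule that ambiguity means two paths sharing a final segment are all assumptions made for illustration.

```python
def resolve_alias(short_name: str, known_fields: list[str]) -> str:
    """Return the full dotted path for short_name if exactly one field matches."""
    # Assumption: a short name matches a field when it equals the last path segment.
    matches = [f for f in known_fields if f.split(".")[-1] == short_name]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise KeyError(f"unknown field: {short_name}")
    raise ValueError(f"ambiguous field {short_name!r}: {sorted(matches)}")


# Hypothetical field paths for illustration only.
fields = ["metadata.category", "metadata.user_id", "metrics.tokens", "output.user_id"]

print(resolve_alias("category", fields))  # resolves to the single matching path

try:
    resolve_alias("user_id", fields)
except ValueError as err:
    print(err)  # two paths end in user_id, so the full form is required
```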

  ### Project-level AI providers

  You can now configure AI provider API keys at the project level to override organization-level keys. This allows you to isolate API usage, manage separate billing, or use different credentials per project. Project-level keys work across playgrounds, experiments, and the AI Proxy. See [Configure AI providers](/admin/ai-providers#add-a-project-level-provider) for setup instructions.

  ### Flexible tags across spans

  You can now add tags to any span in a trace (not just root spans) through the SDK and UI. Tags from all spans are automatically aggregated at the trace level for filtering. When you log additional tags, they are merged (union) rather than replaced, allowing you to add contextual tags throughout your workflow. See [Add tags](/instrument/trace-application-logic#add-metadata-and-tags) for details.
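The merge behavior can be pictured with a small, SDK-independent sketch. The `merge_tags` and `trace_tags` helpers and the span dictionaries below are hypothetical; they only model the union semantics described above and do not call the Braintrust SDK.

```python
def merge_tags(existing: list[str], new: list[str]) -> list[str]:
    """Union-merge tags, keeping first-seen order and dropping duplicates."""
    return list(dict.fromkeys([*existing, *new]))


def trace_tags(spans: list[dict]) -> list[str]:
    """Aggregate the tags of every span in a trace into one trace-level list."""
    merged: list[str] = []
    for span in spans:
        merged = merge_tags(merged, span.get("tags", []))
    return merged


# Logging more tags on the same span adds to the set instead of replacing it.
span_tags = merge_tags(["checkout"], ["retry", "checkout"])

# Hypothetical spans: tags on child spans still surface at the trace level.
spans = [
    {"name": "root", "tags": ["checkout"]},
    {"name": "llm-call", "tags": ["retry"]},
    {"name": "tool-call"},
]
print(span_tags)
print(trace_tags(spans))
```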

  ### Thread view search

  You can now search within the <Icon icon="messages-square" /> **Thread** view of a trace to quickly find specific content in long conversations. This is especially helpful when debugging multi-turn LLM conversations, finding where specific tools were called, or locating particular scores in lengthy traces. See [thread view](/observe/examine-traces#view-as-a-conversation) for details.

  ### New docs structure

  Our docs now follow a systematic workflow for using Braintrust to measure, understand, and improve your AI applications. This goal-based structure makes it easier to find the right guidance at the right time:

  * **Instrument** - Capture traces from your application
  * **Observe** - Find patterns and issues in your data
  * **Annotate** - Review and improve with human feedback
  * **Evaluate** - Test and validate improvements
  * **Deploy** - Ship changes and monitor impact

  The new structure includes:

  * [Braintrust workflow overview](/workflow)
  * Improved [observability quickstart](/tracing-quickstart)
  * New [Integrations tab](/integrations) for all AI provider and framework integrations
  * New [Reference tab](/reference) for SDKs, API, and other reference content

  ### Python SDK releases

  * [v0.5.2](https://github.com/braintrustdata/braintrust-sdk/releases/tag/py-sdk-v0.5.2) - Added [`braintrust.auto_instrument()`](/instrument/trace-llm-calls) for one-line automatic instrumentation, cache control options for evals, and automatic cache cleanup after span export. Fixed hanging evals issue.

  * [v0.5.0](https://github.com/braintrustdata/braintrust-sdk/releases/tag/py-sdk-v0.5.0) - Added trace argument support in scorers, dataset initialization with ID parameter, and force tag replacement option. Improved Claude Agent SDK async iterable support and git metadata collection.

  * [v0.4.3](https://github.com/braintrustdata/braintrust-sdk/releases/tag/py-sdk-v0.4.3) - The "agent" function type has been renamed to "workflow" to better reflect its purpose.

  * [v0.4.2](https://github.com/braintrustdata/braintrust-sdk/releases/tag/py-sdk-v0.4.2) - OpenTelemetry enhancements, improved span tags, and `patch_litellm()` now patches `responses` and `aresponses`.

  * [v0.4.1](https://github.com/braintrustdata/braintrust-sdk/releases/tag/py-sdk-v0.4.1) - Fixes Pydantic AI wrapper context loss, improves serialization, and adds Claude Agent SDK Query property forwarding.

  * [v0.4.0](https://github.com/braintrustdata/braintrust-sdk/releases/tag/py-sdk-v0.4.0) - Bug fixes for global scoping issues. **Breaking change**: Requires Python 3.10+.

  * [v0.3.15](https://github.com/braintrustdata/braintrust-sdk/releases/tag/py-sdk-v0.3.15) - Smaller logs with `wrap_openai`, new [`setup_pydantic_ai()`](/integrations/agent-frameworks/pydantic-ai) integration.

  ### TypeScript SDK releases

  * [v2.2.0](https://github.com/braintrustdata/braintrust-sdk/releases/tag/js-sdk-v2.2.0) - Added cache control options for evals and automatic cache cleanup after span export.

  * [v2.1.0](https://github.com/braintrustdata/braintrust-sdk/releases/tag/js-sdk-v2.1.0) - Added [Temporal](/integrations/sdk-integrations/temporal) integration, `wrapAgentClass` for agent tracing, trace argument support in scorers, and bug fixes.

  * [v2.0.1](https://github.com/braintrustdata/braintrust-sdk/releases/tag/js-sdk-v2.0.1) - Bug fixes and improvements.

  * [v2.0.0](https://github.com/braintrustdata/braintrust-sdk/releases/tag/js-sdk-v2.0.0) - Span purpose tracking for scorers, bug fixes. **Breaking change**: Zod is now a peer dependency (`npm install zod`).

  * [v1.1.1](https://github.com/braintrustdata/braintrust-sdk/releases/tag/js-sdk-v1.1.1) - Fixes async generator handling.

  * [v1.1.0](https://github.com/braintrustdata/braintrust-sdk/releases/tag/js-sdk-v1.1.0) - Smaller logs with `wrapOpenAI`, Vercel AI SDK v6 support, Nunjucks template format for prompts.

  * [v1.0.3](https://github.com/braintrustdata/braintrust-sdk/releases/tag/js-sdk-v1.0.3) - Browser support for Eval and SDK integrations, plain text eval output, type safety improvements.

  ### Go SDK releases

  * [v0.2.0](https://github.com/braintrustdata/braintrust-sdk-go/releases/tag/v0.2.0) - Adds automatic instrumentation using [Orchestrion](https://github.com/DataDog/orchestrion) for zero-code compile-time tracing. **Breaking change**: Requires Go 1.24+.

  * [v0.1.2](https://github.com/braintrustdata/braintrust-sdk-go/releases/tag/v0.1.2) - Adds time to first token support for OpenAI responses, aligns Gemini span naming with other SDKs.

  ### Java SDK releases

  * [v0.2.3](https://github.com/braintrustdata/braintrust-sdk-java/releases/tag/v0.2.3) - Added support for remote scorers in evals and devserver.

  * [v0.2.2](https://github.com/braintrustdata/braintrust-sdk-java/releases/tag/v0.2.2) - Extended [Langchain4j](https://github.com/langchain4j/langchain4j) instrumentation to AI Services.

  * [v0.2.1](https://github.com/braintrustdata/braintrust-sdk-java/releases/tag/v0.2.1) - Adds remote evals devserver support and [Langchain4j](https://github.com/langchain4j/langchain4j) integration for OpenAI chat models.

  ### Ruby SDK releases

  * [v0.1.1](https://github.com/braintrustdata/braintrust-sdk-ruby/releases/tag/v0.1.1) - Added OpenAI moderations API support, Anthropic Messages beta API support, and automatic trace flushing at process end for ephemeral environments.

  * [v0.1.0](https://github.com/braintrustdata/braintrust-sdk-ruby/releases/tag/v0.1.0) - Introduces automatic instrumentation via `require: "braintrust/setup"` in Gemfile for Rails and Ruby applications. Adds new integration API for custom instrumentation. **Breaking change**: Deprecates `.wrap(client)` method in favor of new integration API.

  ### C# SDK releases

  * [v0.0.2](https://github.com/braintrustdata/braintrust-sdk-dotnet/releases/tag/v0.0.2) - Added support for enriching spans with [custom metadata and tags](/instrument/trace-application-logic#add-metadata-and-tags), enhanced OpenAI instrumentation with additional span attributes. **Breaking change**: APIs are now async-first. Update `eval.Run()` to `await eval.RunAsync()` and change `static void Main` to `static async Task Main`. Cursor types replaced with `IAsyncEnumerable`. Some interfaces renamed.

  ### Improvements

  * In-product Python and TypeScript editors now lint and autocomplete while you type.
  * Added default data view preferences in personal settings. You can now set your preferred format (Pretty, JSON, YAML, or Tree) for viewing span fields across all traces, and reset all manually configured view overrides with one click. See [Personal settings](/admin/personal-settings#default-data-display-format) for details.
  * Self-hosted deployments now automatically provision service tokens when configuring data plane URLs, eliminating a manual setup step and immediately enabling features like data retention. See [Configure API URLs](/admin/organizations#configure-api-urls-self-hosted) for details.
  * You can now create log alerts directly from filtered views on the <Icon icon="activity" /> **Logs** or <Icon icon="chart-no-axes-column" /> **Monitor** pages. See [Set up alerts](/admin/automations/alerts) for details.
  * Added inline dataset creation from experiment and playground workflows. You can now create datasets directly when running experiments or adding datasets to playgrounds without navigating to the datasets page.
  * Documented testing scorers with manual input, datasets, and production logs. Each method is optimized for different stages of scorer development, from prototyping to validation against real-world data. See [Test scorers](/evaluate/write-scorers#test-scorers-and-classifiers) for details.
  * Slack channel selection now uses a searchable typeahead interface instead of manual channel ID entry. The channel list automatically refreshes every 7 days and includes a manual refresh option for newly created channels.
  * Added client-side SQL filtering for experiment runs at the dataset level. You can now filter the list of experiments associated with a dataset using SQL queries based on scores, timestamps, experiment names, and other fields. Filter states persist in URLs for easy bookmarking and sharing. See [Filter experiment runs](/annotate/datasets/track-performance#filter-experiment-runs).
  * Improved MCP server authentication with refresh token support for longer-lasting sessions without re-authentication.
  * Added chart-based time filtering for dataset experiment runs. See [Filter experiment runs](/annotate/datasets/track-performance#filter-experiment-runs).
  * The "agent" function type has been renamed to "workflow" to better reflect its purpose.
  * The data plane's realtime service now contains only metadata, with no visibility into underlying data.
  * Dataset imports now display a live table preview that updates as you categorize columns. CSV columns now default to the `input` field for faster setup, and tag fields are automatically created in your project configuration. See [Upload CSV](/annotate/datasets/create#upload-csv-json) for details.
  * Native GCS authentication is now available for self-hosted deployments on Google Cloud Platform. You can use Workload Identity instead of HMAC keys for more secure authentication. See [GCS authentication options](/admin/self-hosting/deploy#gcs-authentication-options) for configuration details.
  * The version dialog on prompt and scorer activity views now allows diffing any version against each other.
  * Moved toolbar controls (columns, row height, layout toggles) into a consolidated <Icon icon="settings-2" /> **Display** menu.
  * Added explicit `function_type` parameter for global function invocation, replacing naming-based type inference.
  * Restricted file parts in playgrounds and prompts to formats supported by the selected provider.
  * The user feedback modal now supports adding up to 5 image attachments.
  * Added [limits documentation page](/plans-and-limits) covering usage quotas, rate limits, and system constraints.
  * Improved [evaluation quickstart](/evaluation-quickstart) with inline datasets, iteration examples showing prompt improvements, and troubleshooting.
</Update>

<Update label="December 2025">
  ### Claude Code integration

  You can now use Braintrust with [Claude Code](https://code.claude.com/docs/en/overview), Anthropic's agentic coding tool. The integration automatically traces Claude Code sessions, giving you insight into LLM calls, tool usage, and performance. It also lets Claude query logs, fetch experiment results, and log data using natural language, which is especially useful when writing and iterating on evals. For setup instructions and usage examples, see the [Claude Code integration guide](/integrations/developer-tools/claude-code).

  ### New SDKs: Java, Go, Ruby, and C\#

  Braintrust now offers native SDKs for Java, Go, Ruby, and C#/.NET that provide tools for evaluating and tracing AI applications in Braintrust.

  See the SDK documentation for setup instructions and examples: [Java](https://github.com/braintrustdata/braintrust-sdk-java), [Go](https://github.com/braintrustdata/braintrust-sdk-go), [Ruby](https://github.com/braintrustdata/braintrust-sdk-ruby), [C#](https://github.com/braintrustdata/braintrust-sdk-dotnet/tree/main).

  ### Nunjucks templating syntax for prompts

  You can now use [Nunjucks](https://mozilla.github.io/nunjucks/templating.html) as an advanced templating syntax for prompts in the UI. Nunjucks provides features like loops, conditionals, and filters for sophisticated prompt engineering workflows. For more details, see [Use templating](/evaluate/write-prompts#use-templating).

  ### Track dataset performance across experiments

  You can now see which experiments used your dataset and how each row performed. This helps you identify problematic test cases and understand your evaluation data quality. For more details, see [Track dataset performance](/annotate/datasets/track-performance).

  ### Slack integration for alerts

  You can now post alerts to Slack channels when conditions are met. For more details, see [Alerts](/admin/automations/alerts).

  <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_37}</Badge></Tooltip>

  ### SQL syntax support

  BTQL now supports standard SQL syntax as an alternative to the native clause-based syntax. The parser automatically detects whether your query is SQL or BTQL. For more details, see [SQL](/reference/sql).

  <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_38}</Badge></Tooltip>

  ### MCP servers in prompts

  You can now use public MCP (Model Context Protocol) servers to give your prompts access to external tools and data. This is useful for evaluating complex tool calling workflows, experimenting with external APIs and services, and tuning public MCP servers. For more details, see [Add MCP servers](/evaluate/write-prompts#add-mcp-servers).

  <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_39}</Badge></Tooltip>

  ### Custom trace views

  Using Loop, you can now use natural language to create custom views of traces. This helps you highlight specific parts of a trace or visualize the trace in a way that is specific to your use case. For more details, see [Create custom trace views](/annotate/custom-views).

  ### Pass/fail thresholds for scorers

  You can now define a minimum score (between 0 and 1) that a scorer must achieve for a result to be considered passing. This helps you quickly identify which evaluations meet your quality standards. For more details, see [Pass/fail thresholds](/evaluate/custom-code#set-pass-thresholds).

  <Tooltip tip="Self-hosted customers must upgrade to use this feature." cta="Upgrade" href="/admin/self-hosting/index#upgrades"><Badge size="md" className="text-xs px-2">Requires data plane {version_40}</Badge></Tooltip>
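The threshold check itself is simple to reason about. The sketch below is not the Braintrust implementation: the `passes` and `pass_rate` helpers are hypothetical, and it assumes that a score equal to the threshold counts as passing and that a missing score never passes.

```python
from typing import Optional


def passes(score: Optional[float], threshold: float) -> bool:
    """Return True when a score meets the minimum passing threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    # Assumption: a missing score (for example, a skipped scorer) never passes.
    return score is not None and score >= threshold


def pass_rate(scores: list, threshold: float) -> float:
    """Fraction of results that pass; 0.0 for an empty list."""
    if not scores:
        return 0.0
    return sum(passes(s, threshold) for s in scores) / len(scores)


results = [0.92, 0.40, 0.75, None]
print([passes(s, 0.75) for s in results])
print(pass_rate(results, 0.75))
```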

  ### TypeScript SDK releases

  * [v1.0.0](https://github.com/braintrustdata/braintrust-sdk/releases/tag/js-sdk-v1.0.0) - This release moves OpenTelemetry functionality to the separate `@braintrust/otel` package. This solves ESM build issues in Next.js (edge), Cloudflare Workers, Bun, and TanStack applications, and adds support for both OpenTelemetry v1 and v2.

    If you are using OpenTelemetry functionality, this is a **breaking change**. See [TypeScript SDK upgrade guide](/reference/sdks/typescript-upgrade-guide) for migration instructions. If you are not using OpenTelemetry, upgrade as usual with `npm install braintrust@latest`.

  * [v1.0.1](https://github.com/braintrustdata/braintrust-sdk/releases/tag/js-sdk-v1.0.1) - This release makes URLs clickable in supported terminals when running evaluations and enables project-level AI secrets to work correctly in remote evaluations.

  * [v1.0.2](https://github.com/braintrustdata/braintrust-sdk/releases/tag/js-sdk-v1.0.2) - This release improves tracing for Vercel AI SDK applications, adding complete visibility into [multi-step tool interactions](/integrations/sdk-integrations/vercel#multi-round-tool-interactions-typescript) and automatic tracing for [AI SDK Agents](/integrations/sdk-integrations/vercel#trace-agents-typescript).

  ### Improvements

  * Added `json_extract` function to BTQL for extracting values from JSON strings using path expressions with support for dot notation, array indexing, and nested paths.
  * Added progress bar for UI-triggered experiments.
  * Online scores now always show up in the list of scores in the summary view.
  * Online scores now always show up in the trace table, even if they haven't been run yet.
  * Added support for extracting last-turn prompts and scorer-format dataset inputs in the iterate-in-playground modals.
  * Updated BTQL filters to automatically extract filter statements from broader input.
  * Added docs on [attaching custom metadata to traces](/integrations/sdk-integrations/vercel#add-metadata-typescript) when using the Vercel AI SDK.
  * Expanded deep search docs with setup instructions, query examples, and filtering workflows.
  * Improved Loop chart type and unit selection for data visualizations.
  * Added ability to [filter logs and experiments by comments](/observe/filter#apply-a-filter).
  * Added project description field to projects. This can be used to provide additional context to teammates and when using AI features.
  * Consolidated project configuration and organization settings pages into a single page.
</Update>

<Update label="November 2025">
  * Custom metric columns on the experiments list page
  * Aggregate table column headers on the experiments list page
  * Resizable trace timeline sidebar
  * Tag from prompt/scorer pages
  * Add option to maintain hierarchy in trace tree view while filtering span types
  * Dataset schemas with visual schema builder and form-based editing with validation
  * Aggregate table column headers on the projects list page
  * Added support for Loop to make BTQL queries with arbitrary time ranges
  * Added logs and dataset browsers to scorer detail page
  * Added support for Loop to generate monitoring charts in the monitor page's edit chart dialog
  * BTQL now supports using `dimensions` and `measures` with the `summary` shape to group and aggregate traces. This enables analyzing patterns, monitoring performance trends, and comparing metrics across models or time periods. See [Aggregated trace analytics](/reference/sql#aggregated-trace-analytics).
  * BTQL queries issued through the API are now rate limited at 20 requests per object per minute.
  * Loop now automatically adds the currently viewed trace as context on the logs and experiments pages
  * Added Grok 4.1 support
  * Added Claude Opus 4.5 support
  * If you are self-hosted, requests to [https://api.braintrust.dev](https://api.braintrust.dev) will now fail
  * Fixed max tokens and reasoning token budget settings for Gemini models

  ### Python SDK version 0.3.8

  * Fixes logging of very deeply nested objects and objects with circular references

  ### Python SDK version 0.3.7

  * Added time to first token for Anthropic wrapper
  * Fixes nesting for OpenAI agents wrapper

  ### TypeScript SDK version 0.4.9

  * AI SDK integration rewrite. Based on customer feedback, we rewrote the integration to be simpler and more robust. It now officially supports v3 through v6 of the library. We recommend that all users switch to `wrapAISDK` instead of the now-deprecated `wrapAISDKModel` and `BraintrustMiddleware`. **Breaking change**: spans have different input/output and metadata, and the do\* spans are no longer needed.

  ### SDK Integrations: Google ADK 0.2.3

  * Support MCP agent tracing

  ### SDK Integrations: LangChain / LangGraph JS 0.2.1

  * Added time to first token metric

  ### SDK Integrations: LangChain / LangGraph Python 0.1.5

  * Added time to first token metric
</Update>

<Update label="October 2025">
  * Enabled editing and resending Loop chat messages
  * Documented how to integrate Apollo GraphQL and Braintrust for automatic tracing
  * Added support for Grok 4 Fast (Reasoning & Non-Reasoning)
  * Added support for Groq qwen/qwen3-32b & moonshotai/kimi-k2-instruct-0905
  * Deprecated Anthropic Claude 3.5 models as they are no longer supported by Anthropic
  * Made the Apply filter button in the BTQL tool more prominent
  * Added AI-assisted generation to run data box in scorer details page
  * Added message queuing to Loop
  * Added a button to extract the filter clause from a BTQL query in the filter BTQL editor
  * Start a Loop conversation from the CMD-K menu
  * Moved the Loop button to the bottom right of the screen
  * Use case examples when creating a playground
  * Java SDK
  * Scope collapse state for span fields by the span type
  * Collapse/expand all button for LLM data view
  * By default, collapse all messages in LLM data view besides the last turn
  * Generate scorer spans when applying scores to logs
  * Added support for scoring experiment rows
  * Added AI-assisted generation in the tools form, BTQL filter form, and online scoring form
  * Increased default maximum agentic tool use roundtrips from 5 to 100
  * Added support for Gemini tracing
  * Added support for Claude 4.5 Haiku
  * Added Loop to the prompt and scorer detail pages
  * **Refreshed OpenAI Realtime Audio proxy support** - Updated AI proxy to support the latest OpenAI SDK (v6.0+) for realtime audio interactions
    * Added support for both `OpenAIRealtimeWebSocket` (browser/Cloudflare Workers) and `OpenAIRealtimeWS` (Node.js with ws library)
    * Updated event types to match the current OpenAI Realtime API specification (`response.output_audio.delta`, `response.output_text.delta`, etc.)
    * Added header-based authentication and logging with `x-bt-parent` and `x-bt-compress-audio` headers
    * Improved audio logging with automatic format detection and optional MP3 compression for reduced storage costs
  * Added "Pretty" span field display option that optimizes for object value readability and renders object values in markdown
    * The Pretty display option replaces the Markdown option since Pretty renders markdown by default
  * Added support for viewing spans in the table on the logs page
  * Added GPT-5 Pro support
  * Added **Review** page to see spans marked for review in logs, experiments and datasets across a project
  * Fixed Loop prompt optimization of remote evals
  * Fixed an issue with thinking events coming from Mistral
  * Added Toplist and Big number monitor chart types
  * Support for [JSON attachments](/instrument/attachments)
  * Improved the "Raw span data" view and added buttons to download a span or an entire trace as JSON from the trace viewer

  ### SDK Integrations: LangChain (Python) v0.1.2

  * Bug fix to ignore async context changed error

  ### Python SDK version 0.3.6

  * Fixed remote evals bug where experiments were not properly marked as completed on the backend
  * Fixed dataset `_internal_btql` parameter to properly override default BTQL settings (e.g., custom limit values)

  ### TypeScript SDK version 0.4.8

  * Added OpenTelemetry distributed tracing helpers (`contextFromSpanExport()` and `spanContextFromSpanExport()`) for seamless trace propagation between Braintrust and OpenTelemetry across service boundaries

  ### Python SDK version 0.3.5

  * Added DSPy integration with `wrap_dspy` wrapper for automatic tracing of DSPy applications
  * Added OpenTelemetry distributed tracing helpers (`context_from_span_export()` and `span_context_from_span_export()`) for seamless trace propagation between Braintrust and OpenTelemetry across service boundaries

  ### Python SDK version 0.3.4

  * Added support for `GEMINI_API_KEY` environment variable

  ### TypeScript SDK version 0.4.6

  * Properly support querying versioned datasets

  ### TypeScript SDK version 0.4.3

  * Improved LangChain integrations with simplified parsing for both TypeScript and Python
  * Added JSON attachment SDK support

  ### TypeScript SDK version 0.4.2

  * Add OpenTelemetry compatibility mode for TypeScript. This allows OTel spans to work with Evals

  ### TypeScript SDK version 0.4.1

  * Added Google GenAI wrapper support
  * Updated Mastra wrapper methods from `generateVNext`/`streamVNext` to `generate`/`stream`
  * Moved langchain-js braintrust dependency to peer dependencies
  * Fixed handling of attachments for Anthropic to avoid large base64 strings in UI
  * Fixed preservation of result object when returning from `wrappedStreamObject` in AI SDK
  * Fixed `LanguageModelV1#supportsUrl` being a function, not a property

  ### Python SDK version 0.3.3

  * Properly support querying versioned datasets
</Update>

<Update label="September 2025">
  * Added Anthropic Claude 4.5 Sonnet support
  * Fixed Gemini schema support to enable proper function calling and structured outputs when using Google's Gemini models through Braintrust and the AI proxy
  * Added Claude Agent SDK Integration support
  * Added Gemini Flash and Lite Preview (Sept 2025) support
  * Improved prompt detail chat logging and added link to corresponding trace
  * Fixed bugs with parallel tool calling in Loop
  * Enabled Loop to write BTQL queries against arbitrary data sources on non-BTQL-sandbox pages
  * Added support for creating datasets and scorers with Loop from the experiment, dataset, and logs pages
  * Resolved excessive `localStorage` usage in Loop and BTQL sandbox
  * Improved Loop's `from` clause handling in the BTQL sandbox
  * Fixed cross-tab syncing and session restoration bugs in Loop
  * Prompt/scorer activity view UI updates
    * Before: selecting a version showed a diff vs. the current editor content, where the selected version is the base of the diff
    * After: prompt versions can be viewed without diffing vs. editor. When diff is enabled, version is shown as incoming, to indicate what would occur when reverting to that version
  * Added support for updating the email associated with billing data
  * Added support for iterating on logs in playgrounds
  * Added support for scoring existing logs
  * Trace tree is now visible in human review mode
  * BTQL sandbox improvements
    * Loop is now on the page and can write queries, debug errors and answer syntax questions
    * Tabs
    * Simple charts
    * Improved auto-complete
  * Updated UI color palette
  * Custom charts added to the monitor page (requires data plane 1.1.22)
  * View state changes for non-saved views
    * Before: We would attempt to restore any previous edited view state to the URL
    * After: With a few exceptions, edited view state for non-saved views is only represented in the URL
  * Loop can search through Braintrust's docs and blog posts to help you answer questions about how to use Braintrust, including generating sample code

  ### Python SDK version 0.3.1

  * Ensure experiments use SpanComponentsV3 by default

  ### Python SDK version 0.3.0

  * Added OpenTelemetry compatibility mode for seamless integration between Braintrust and OTEL tracing
  * Added `setup_claude_agent_sdk` for automatic tracing of Claude Agent SDK applications
  * Improved Anthropic wrapper to log consistent input/output format
  * Added `strict` parameter to `Prompt.build` for strict schema validation
  * Added SpanComponentsV4 support

  ### TypeScript SDK version 0.4.0

  * Added `wrapClaudeAgentSDK` for automatic tracing of Claude Agent SDK applications
  * Improved Anthropic wrapper to log consistent input/output format
  * Fixed AI SDK model detection in `wrapGenerate` callback
  * Added SpanComponentsV4 support
</Update>

<Update label="August 2025">
  * The trace viewer on the logs page can now show all traces associated with the current one through a shared metadata field or tag
  * Monitor page layout changed to be more responsive to screen size
  * Various UX improvements to prompt dialog
  * Improved onboarding experience
  * Trace timeline layout improvements
  * Pro plan organizations can now downgrade to the Starter plan via the settings page without contacting support
  * Prevent read-only users from downloading data from the UI
  * @mention team members in comments to notify them via email. To mention someone, type "@" and a team member's name or email in any comment input
  * You can now assign users to rows in experiments, logs, and datasets. Once assigned, you can filter rows by a specific user or a group of users
  * View configuration has been changed to no longer auto-save changes. It now shows a dirty state and you have the option of saving or resetting those changes back to the base view

  ### TypeScript SDK version 0.3.7

  * Support locking down remote evals via `--dev-org-name` to only accept users from your org
  * Fixed parent span precedence issues for better trace hierarchy
  * Improved propagation of parentSpanId into parentSpanContext for OpenTelemetry JS v2 compatibility
  * Fold the `@braintrust/core` package into `braintrust`. This package consists of a small set of utility functions that is more easily-managed as part of the main `braintrust` package. After version `0.3.7`, you should no longer need a dependency on `@braintrust/core`

  ### Python SDK version 0.2.6

  * Python SDK now correctly nests spans logged from inside tool calls in OpenAI Agents

  ### Python SDK version 0.2.5

  * Support data masking (see [docs](/instrument/advanced-tracing#mask-sensitive-data))
  * Remote evals in Python SDK
  * Support tags in Eval hooks
  * Validate attachment file readability at creation time

  ### Python SDK version 0.2.4

  * Allow non-batch span processors in `BraintrustSpanProcessor`

  ### Python SDK version 0.2.3

  * Fix openai-agents to inherit the right tracing context

  ### TypeScript SDK version 0.3.6

  * OpenAI responses wrapper no longer filters out span data fields when logging
  * Fixed `withResponse` and `wrapOpenAI` interaction to not hide response data

  ### TypeScript SDK version 0.2.5

  * Support data masking (see [docs](/instrument/advanced-tracing#mask-sensitive-data))
  * Support tags in Eval hooks
  * Validate attachment file readability at creation time

  ### TypeScript SDK version 0.2.4

  * Support OpenAI Agents SDK

  ### SDK Integrations: Google ADK (Python) (version 0.1.1)

  * Added integration with [Google Agent Development Kit (ADK)](/integrations/agent-frameworks/google)

  ### SDK Integrations: OpenAI Agents (TS) (version 0.0.2)

  * Fix openai-agents to inherit the right tracing context

  ### Python SDK version 0.2.2

  * Added `environment` parameter to `load_prompt`
  * The Otel SpanProcessor now keeps `traceloop.*` spans by default
  * Experiments can now be run without sending results to the server
  * Span creation is significantly faster in Python

  ### TypeScript SDK version 0.2.3

  * Added `environment` parameter to `load_prompt`
  * The Otel SpanProcessor now keeps `traceloop.*` spans by default
  * Experiments can now be run without sending results to the server
  * Fix `npx braintrust pull` for large prompts

  ### TypeScript SDK version 0.2.2

  * Fix ai-sdk tool call formatting in output
  * Log OpenAI Agents input and output to root span
  * Wrap OpenAI `responses.parse`
  * Add `wrapTraced` support for generator functions

  ### Python SDK version 0.2.1

  * Fix langchain-py integration tracing when users use a `@traced` method
  * Wrap OpenAI `responses.parse`
  * Add `@traced` support for generator functions

  ### Autoevals PY (version 0.0.130)

  * Fold the `braintrust_core` external package into the `autoevals` package, since it is the only user of `braintrust_core`. Future braintrust packages will not depend on the `braintrust_core` py package
</Update>

<Update label="July 2025">
  * New improved UI for trace tree
  * Token and cost metrics are computed per sub-tree in the trace viewer
  * Download BTQL sandbox results as JSON or CSV
  * Moved monitor chart legends to the bottom and increased chart heights
  * Fixed a monitor chart issue where the series toggle selector would filter the incorrect series
  * Improved monitor fullscreen experience: charts now open faster and retain their series filter state
  * Loop is now available in the experiments page and has a new ability to render interactive components inside the chat that will help you find the UI element that Loop is referencing
  * You can now use remote evals with the "+Experiment" button to create a new experiment. Previously, they were only available in the playground
  * Add monitor page UTC timezone toggle
  * Improved trace view loading performance for large traces
  * Loop can now create custom code scorers in playgrounds
  * Schema builder UI for structured outputs
  * Sort datasets when the `Faster tables` feature flag is enabled
  * Change LLM duration to be the sum, not average, of LLM duration across spans
  * Add support for Grok 4 and Mistral's Devstral Small Latest
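
To make the LLM duration change above concrete, here is a minimal sketch with made-up span durations (the values and variable names are hypothetical, not Braintrust data). Summing reports the total time a trace spent in LLM calls, whereas averaging reports the typical length of one call.

```python
# Hypothetical LLM span durations (in seconds) within a single trace.
llm_span_durations = [1.5, 0.5, 2.0]

# Previous behavior: average duration across LLM spans.
average_duration = sum(llm_span_durations) / len(llm_span_durations)

# Current behavior: total duration across LLM spans.
total_duration = sum(llm_span_durations)

print(f"average={average_duration:.2f}s total={total_duration:.2f}s")
```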

  ### TypeScript SDK version 0.2.1

  * Fix support for the `openai.chat.completions.parse` method when used with `wrapOpenAI`
  * Added support for ai-sdk\@beta with new `BraintrustMiddleware`
  * Support running remote evals as full experiments

  ### TypeScript SDK version 0.2.0

  * When running multiple trials per input (`trial_count > 1`), you can now access the current trial index (0-based) via `hooks.trialIndex` in your task function
  * Added `BraintrustExporter` in addition to `BraintrustSpanProcessor`
  * Bound max ancestors in git to 1,000

  ### Python SDK version 0.2.0

  * When running multiple trials per input (`trial_count > 1`), you can now access the current trial index (0-based) via `hooks.trial_index` in your task function
  * New LiteLLM `wrap_litellm` wrapper
  * Increase max ancestors in git to 1,000
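
A minimal sketch of a task function that reads the trial index described above. Only `hooks.trial_index` comes from the release note. The `SimpleNamespace` stand-in for the hooks object is a hypothetical substitute so the example runs locally without the Eval framework.

```python
from types import SimpleNamespace


def task(input, hooks):
    # hooks.trial_index is 0-based, per the release note above.
    return f"{input} (trial {hooks.trial_index})"


# Hypothetical stand-in for the hooks object that the Eval framework passes in.
trial_count = 3
outputs = [task("hello", SimpleNamespace(trial_index=i)) for i in range(trial_count)]
print(outputs)
```

One use for the index is to vary a random seed or sampling parameter per trial while keeping the input fixed.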

  ### Python SDK version 0.1.8

  * Added `BraintrustSpanProcessor` to simplify Braintrust's integration with OpenTelemetry

  ### Python SDK version 0.1.7

  * Added support for loading prompts by ID via the `load_prompt` function. You can now load prompts directly by their unique identifier

  ### TypeScript SDK version 0.1.1

  * Added `BraintrustSpanProcessor` to simplify integration with OpenTelemetry

  ### TypeScript SDK version 0.1.0

  * Fix a bug where large experiments would drop spans if they could not flush data fast enough
  * Fix bug in attachment uploading in evals executed with `npx braintrust eval`
  * Upgrade zod dependency from `^3.22.4` to `^3.25.3`
  * Added support for loading prompts by ID via the `loadPrompt` function
</Update>

<Update label="June 2025">
  * Time range filters on the logs page
  * Add support for multi-factor authentication
  * Fix a bug with Vertex AI calls when the request includes the anthropic-beta header
  * Add Zapier integration to trigger Zaps when there's a new automation event or a new project
  * Add OpenAI's [o3-pro](https://platform.openai.com/docs/models/o3-pro) model to the playground and AI proxy
  * View parameters are now present in the URL when viewing a default view
  * Experiments charting controls have been added into views
  * Experiment objects now support tags through the API and on the experiments view
  * Add support for Gemini 2.5 Pro, Gemini 2.5 Flash, and Gemini 2.5 Flash Lite
  * Correctly propagate `expected` and `metadata` values to function calls when running `invoke`
  * Chat-like thread layout that simplifies thread display to LLM and score data
  * Enable all agent nodes to access dataset variables with the mustache variable `{{dataset}}`
  * Improve reliability of online scoring when logging high volumes of data to a project
  * Tags can now be sorted in the project configuration page which will change their display order in other parts of the UI
  * System-only messages are now supported in Anthropic and Bedrock models
  * Logs page UI can now filter nested data fields in `metadata`, `input`, `output`, and `expected`
  * Support reasoning params and reasoning tokens in streaming and non-streaming responses in the [AI proxy](/deploy/ai-proxy) and across the product
  * New [braintrust-proxy](https://pypi.org/project/braintrust-proxy/) Python library to help developers integrate with their IDEs to support new reasoning input and output types
  * New `@braintrust/proxy/types` module to augment OpenAI libraries with reasoning input and output types
  * New streaming protocol between Brainstore and the API server speeds up queries
  * Time brushing interaction enabled on Monitor page charts
  * Create user-defined views on the monitoring page
  * Live updating time mode added to the monitoring page
  * The `anthropic` package is now included by default in Python functions
  * Audit log queries must now specify an `id` filter for the set of rows to fetch
  * (Beta) continuously export logs, experiments, and datasets to S3
  * Enable passing `metadata` and `expected` as arguments to the first agent prompt node

  ### Autoevals.js v0.0.130

  * Remove dependency on `@braintrust/core`

  ### TypeScript SDK version 0.0.209

  * Ensure SpanComponentsV3 encoding works in the browser

  ### TypeScript SDK version 0.0.208

  * Ensure running remote evals (i.e. `runDevServer`) works without the CLI wrapper
  * Add span + parent ids to `StartSpanArgs`

  ### TypeScript SDK version 0.0.207

  * The SDK's under-the-hood queue for sending logs now has a default size of 5000 logs
  * You can configure the max size by setting `BRAINTRUST_LOG_QUEUE_MAX_SIZE` in your environment
  * Improvements to the logging of parallel tool calls
  * Attachments are now converted to base64 data URLs, making it easier to work with image attachments in prompts

  ### TypeScript SDK version 0.0.206

  * Add support for `project.publish()` to directly `push` prompts to Braintrust (without running `braintrust push`)
  * The OpenAI and Anthropic wrappers set `provider` metadata

  ### Python SDK version 0.1.5

  * The SDK's under-the-hood log queue will not block when full and has a default size of 25000 logs
  * You can configure the max size by setting `BRAINTRUST_LOG_QUEUE_MAX_SIZE` in your environment
  * Improvements to the logging of parallel tool calls
  * Attachments are now converted to base64 data URLs, making it easier to work with image attachments in prompts
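
The sketch below illustrates the general idea of a bounded, non-blocking log queue configured through `BRAINTRUST_LOG_QUEUE_MAX_SIZE`. It is not the SDK's implementation: the `make_log_queue` helper is hypothetical, and discarding the oldest entry when the queue is full is an assumed overflow policy that the release note does not specify.

```python
import os
from collections import deque

DEFAULT_MAX_SIZE = 25000  # default stated in the release note above


def make_log_queue(env=os.environ):
    # Read the limit from the environment, falling back to the default.
    max_size = int(env.get("BRAINTRUST_LOG_QUEUE_MAX_SIZE", DEFAULT_MAX_SIZE))
    # A deque with maxlen never blocks: appending to a full deque
    # discards the oldest item (assumed policy, for illustration only).
    return deque(maxlen=max_size)


queue = make_log_queue({"BRAINTRUST_LOG_QUEUE_MAX_SIZE": "3"})
for i in range(5):
    queue.append({"id": i})
print([item["id"] for item in queue])
```

The trade-off with a queue like this is that producers are never stalled by a slow network, at the cost of losing some entries under sustained overload.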

  ### Python SDK version 0.1.4

  * Add `project.publish()` to directly `push` prompts to Braintrust (without running `braintrust push`)
  * `@traced` now works correctly with async generator functions
  * The OpenAI and Anthropic wrappers set `provider` metadata

  ### Python SDK version 0.1.3

  * Improve retry logic in the control plane connection (used to create new experiments and datasets)
</Update>

<Update label="May 2025">
  * The "Faster tables" flag is now the default. You should notice experiments, datasets, and the logs page load much faster
  * Add Claude 4 models in Bedrock and Vertex to the AI proxy and playground
  * Braintrust now incorporates cached tokens into the cost calculations for experiments and logs
  * The monitor page also now includes separate lines so you can track costs and counts for uncached, cached, and cache creation tokens
  * Native support for thinking parameters in the playground
  * Improved playground prompt editor stability and performance
  * Capture cached tokens from OpenAI and Anthropic models in a unified format and surface them in the UI
  * Create experiments from the experiments list page using saved prompts/agents
  * New BTQL sandbox page and editor with autocomplete
  * Fullscreen-able monitor charts
  * Added a 'Copy page' button to the top of every docs page
  * Brainstore now supports vacuuming data from object storage to reclaim space
  * Organization owners can manage API keys for all users in their organization in the UI
  * Add endpoint for admins to list all ACLs within an org
  * Collapsible sidebar navigation
  * Command bar (CMD/CTRL+K) to quickly navigate between pages and projects
  * View monitor page logs across all projects in an organization
  * Added Mistral Medium 3 and Gemini 2.5 Pro Preview to the AI proxy and playground
  * Self-hosted builds now log in a structured JSON format that is easier to parse

  ### Python SDK version 0.1.2

  * Added support for `metadata` and `tags` arguments to `invoke`
  * The SDK now gracefully handles OpenAI's `NotGiven` parameter
  * Added `span.link()` to synchronously generate permalinks

  ### Python SDK version 0.1.1

  * Update cached token accounting in `wrap_anthropic` to correctly capture cached tokens
  * Pull additional metadata in `braintrust pull` for prompts and functions to improve tracing

  ### SDK (version 0.1.0)

  * Allow custom model descriptions in Braintrust
  * Improve support for PDF attachments to multimodal OpenAI models
  * The Python library no longer has a dependency on `braintrust_core`

  ### TypeScript SDK version 0.0.206

  * Add support for `metadata` and `tags` arguments to `invoke`

  ### TypeScript SDK version 0.0.205

  * Make the `_xact_id` field in `origin` optional
  * Added `span.link()` as a synchronous means of generating permalinks

  ### TypeScript SDK version 0.0.204

  * Update cached token accounting in `wrapAnthropic` to correctly capture cached tokens

  ### SDK (version 0.0.203)

  * Add new reasoning to OpenAI messages

  ### SDK (version 0.0.202)

  * Gracefully handle experiment summarization failures in Eval()
  * Fix a bug where `wrap_openai` was breaking the `pydantic_ai` `run_stream` function
  * Add tracing to the `client.beta.messages` calls in the TypeScript Anthropic library
  * Fix some deprecation warnings in the Python SDK
</Update>

<Update label="April 2025">
  * Permission groups settings page now allows admins to set group-level permissions
  * Automations alpha: trigger webhooks based on log events
  * Preview attachments in playground input cells
  * Playground now supports list mode, which includes score and metric summaries
  * Handle structured outputs from OpenAI's responses API in the "Try prompt" experience
  * Allow users to remove themselves from any organization they are part of using the `/v1/organization/members` REST endpoint
  * Group monitor page charts by metadata path
  * Download playground contents as CSV
  * Add pending and streaming state indicators to playground cells
  * Distinguish per-row and global playground progress
  * Added GPT-4.1, o4-mini and o3 to the AI proxy and playground
  * On the monitor page, add aggregate values to chart legends
  * Add Gemini 2.5 Flash Preview model to the AI proxy and playground
  * Add support for audio and video inputs for Gemini models in the AI proxy and playground
  * Add support for PDF files for OpenAI models
  * Native tracing support in the proxy has finally arrived! Read more in [the docs](/deploy/ai-proxy#enable-logging)
  * Upload attachments directly in the UI in datasets, playgrounds, and prompts
  * Playground option to append messages from a dataset to the end of a prompt
  * A new toggle that lets you skip tracing scoring info for online scoring
  * GIF and image support in comments
  * Add embedded view and download action for inline attachments of supported file types

  ### SDK (version 0.0.201)

  * Support OpenAI `client.beta.chat.completions.parse` in the Python wrapper

  ### SDK (version 0.0.200)

  * Ensure the prompt cache properly handles any manner of prompt names
  * Ensure the output of `anthropic.messages.create` is properly traced when called with `stream=True` in an async program

  ### SDK (version 0.0.199)

  * Fix a bug that broke async calls to the Python version of `anthropic.messages.create`
  * Store detailed metrics from OpenAI's `chat.completion` TypeScript API

  ### SDK (version 0.0.198)

  * Trace the `openai.responses` endpoint in the TypeScript SDK
  * Store the `token_details` metrics returned by the `openai/responses` API

  ### SDK (version 0.0.197)

  * Fix a bug in `init_function` in the Python SDK which prevented the `input` argument from being passed to the function correctly when it was used as a scorer
  * Support setting `description` and `summarizeScores`/`summarize_scores` in `Eval(...)`
</Update>

<Update label="March 2025">
  * Many improvements to the playground experience:
    * Fixed many crashes and infinite loading spinner states
    * Improved performance across large datasets
    * Better support for running single rows for the first time
    * Fixed re-ordering prompts
    * Fixed adding and removing dataset rows
    * You can now re-run specific prompts for individual cells and columns
  * You can now do "does not contain" filters for tags in experiments and datasets
  * When you `invoke()` a function, inline base64 payloads will be automatically logged as attachments
  * Add a strict mode to evals and functions which allows you to fail test cases when a variable is not present in a prompt
  * Add Fireworks' DeepSeek V3 03-24 and DeepSeek R1 (Basic), along with Qwen QwQ 32B in Fireworks and Together.ai, to the playground and AI proxy
  * Fix bug that prevented Databricks custom provider form from being submitted without toggling authentication types
  * Unify Vertex AI, Azure, and Databricks custom provider authentication inputs
  * Add Llama 4 Maverick and Llama 4 Scout models to Together.ai, Fireworks, and Groq providers in the playground and AI proxy
  * Add Mistral Saba and Qwen QwQ 32B models to the Groq provider in the playground and AI proxy
  * Add Gemini 2.5 Pro Experimental and Gemini 2.0 Flash Thinking Mode models to the Vertex provider in the playground and AI proxy
  * Add OpenAI's [o1-pro](https://platform.openai.com/docs/models/o1-pro) model to the playground and AI proxy
  * Support OpenAI Responses API in the AI proxy
  * Add support for the Gemini 2.5 Pro Experimental model in the playground and AI proxy
  * Option to disable the experiment comparison auto-select behavior
  * Add support for Databricks custom provider as a default cloud provider in the playground and AI proxy
  * Allow supplying a base API URL for Mistral custom providers in the playground and AI proxy
  * Support pushed code bundles larger than 50MB
  * The OTEL endpoint now understands structured output calls from the Vercel AI SDK
  * Added support for `concat`, `lower`, and `upper` string functions in BTQL
  * Correctly propagate Bedrock streaming errors through the AI proxy and playground
  * Online scoring supports sampling rates with decimal precision
  * Added support for OpenAI GPT-4o Search Preview and GPT-4o mini Search Preview in the playground and AI proxy
  * Add support for making Anthropic and Google-format requests to corresponding models in the AI proxy
  * Fix bug in model provider key modal that prevents submitting a Vertex provider with an empty base URL
  * Add column menu in grid layout with sort and visibility options
  * Enable logging the `origin` field through the REST API
  * Add support for "image" pdfs in the AI proxy
  * Fix issue in which code function executions could hang indefinitely
  * Add support for custom base URLs for Vertex AI providers
  * Add dataset column to experiments table
  * Add Python 3.13 support to user-defined functions
  * Fix bug that prevented calling Python functions from the new unified playground

  ### SDK (version 0.0.196)

  * Add Anthropic tracing to the TypeScript SDK. See `braintrust.wrapAnthropic`
  * The SDK now paginates datasets and experiments, which should improve performance for large datasets and experiments
  * Add `strict` flag to `invoke` which implements the strict mode described above
  * Raise an error if a Python tool is pushed without defined parameters, instead of silently not showing the tool in the UI
  * Fix Python OpenAI wrapper to work for older versions of the OpenAI library without `responses`
  * Set `time_to_first_token` correctly from AI SDK wrapper
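
To show what a strict mode for prompt variables means in practice, here is a simplified sketch of mustache-style rendering. It assumes that "strict" means raising an error when a placeholder has no supplied value. The `render` helper and its regular expression are hypothetical and handle only flat `{{name}}` placeholders. This is not Braintrust's renderer.

```python
import re

# Matches flat placeholders such as {{name}} or {{ name }}.
VARIABLE = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render(template, variables, strict=False):
    def substitute(match):
        name = match.group(1)
        if name in variables:
            return str(variables[name])
        if strict:
            raise KeyError(f"missing template variable: {name}")
        return ""  # non-strict: render missing variables as empty text

    return VARIABLE.sub(substitute, template)


print(render("Hello {{name}}!", {"name": "Ada"}, strict=True))
print(repr(render("Hello {{name}}!", {})))
```

Failing loudly is useful in evals, where a silently empty variable can produce a plausible-looking but meaningless output.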

  ### SDK (version 0.0.195)

  * Improve the metadata collected by the Anthropic client
  * Anthropic client can now be run with `braintrust.wrap_anthropic`
  * Fix a bug when `messages.create` was called with `stream=True`

  ### SDK (version 0.0.194)

  * Add Anthropic tracing to the Python SDK with `wrap_anthropic_client`
  * Fix a bug calling `braintrust.permalink` with `NoopSpan`

  ### SDK (version 0.0.193)

  * Fix retry bug when downloading large datasets/experiments from the SDK
  * Background logger will load environment variables upon first use rather than when module is imported
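
The difference between reading environment variables at import time and on first use can be sketched as follows. The `LazyConfig` class, the `EXAMPLE_API_URL` variable, and the URLs are hypothetical and are not part of the SDK.

```python
import os


class LazyConfig:
    """Read environment variables on first access, not at construction time."""

    def __init__(self):
        self._api_url = None

    @property
    def api_url(self):
        if self._api_url is None:
            self._api_url = os.environ.get("EXAMPLE_API_URL", "https://example.invalid")
        return self._api_url


config = LazyConfig()  # nothing has been read from the environment yet
os.environ["EXAMPLE_API_URL"] = "https://api.example.test"  # set after construction
print(config.api_url)
```

With this pattern, variables set after the module is imported (for example, by a dotenv loader that runs later) are still picked up.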

  ### SDK (version 0.0.192)

  * Improve default retry handler in the Python SDK to cover more network-related exceptions

  ### SDK (version 0.0.190)

  * Fix `prompt pull` for long prompts
  * Fix a bug in the Python SDK which would not retry requests that were severed after a connection timeout

  ### SDK (version 0.0.189)

  * Added integration with [OpenAI Agents SDK](/integrations/agent-frameworks/openai-agents-sdk)

  ### SDK (version 0.0.188)

  * Deprecated `braintrust.wrapper.langchain` in favor of the new `braintrust-langchain` package

  ### SDK (version 0.0.187)

  * Always bundle default Python packages when pushing code with `braintrust push`
  * Fix bug in the TypeScript SDK where `asyncFlush` was not correctly defaulted to false
  * Fix a bug where `span_attributes` failed to propagate to child spans through propagated events
  * Added support for handling score values when an Eval has errored
  * Improve support for binary packages in `npx braintrust eval`
  * Support templated structured outputs
  * Fix dataset summary types in TypeScript

  ### Autoevals (version 0.0.124)

  * Added `init` to set a global default client for all evaluators (Python and Node.js)
  * Added `client` argument to all evaluators to specify the client to use
  * Improved the Autoevals docs with more examples

  ### Autoevals (version 0.0.123)

  * Swapped `polyleven` for `levenshtein` for faster string matching

  ### SDK Integrations: LangChain (Python) (version 0.0.2)

  * Add a new `braintrust-langchain` integration with an improved `BraintrustCallbackHandler` and `set_global_handler` to set the handler globally for all LangChain components

  ### SDK Integrations: LangChain.js (version 0.0.6)

  * Small improvement to avoid logging unhelpful LangGraph spans
  * Updated peer dependencies with LangChain core that fixes the global handler for LangGraph runs

  ### SDK Integrations: Val Town

  * New `val.town` integration with example vals to quickly get started with Braintrust
</Update>

<Update label="February 2025">
  * Add support for removing all permissions for a group/user on an object with a single click
  * Add support for Claude 3.7 Sonnet model
  * Add [llms.txt](https://www.braintrust.dev/llms.txt) for docs content
  * Enable spellcheck for prompt message editors
  * Add support for Anthropic Claude models in Vertex AI
  * Add support for Claude 3.7 Sonnet in Bedrock and Vertex AI
  * Add support for Perplexity R1 1776, Mistral Saba, Gemini LearnLM, and more Groq models
  * Support system instructions in Gemini models
  * Add support for Gemini 2.0 Flash-Lite
  * Add support for default Bedrock cross-region inference profiles in the playground and AI proxy
  * Move score distribution charts to the experiment sidebar
  * Add support for OpenAI GPT-4.5 model in the playground and AI proxy
  * Add deprecation warning for `_parent_id` field in the REST API
  * Add support for stop sequences in Anthropic, Bedrock, and Google models
  * Resolve JSON Schema references when translating structured outputs to Gemini format
  * Add button to copy table cell contents to clipboard
  * Add support for basic Cache-Control headers in the AI proxy
  * Add support for selecting all or none in the categories of permission dialogs
  * Respect Bedrock providers not supporting streaming in the AI proxy
  * Store table grouping, row height, and layout options in the view configuration
  * Add the ability to set a default table view
  * Add support for Google Cloud Vertex AI in the playground and proxy
  * Add default cloud providers section to the organization AI providers page
  * Support streaming responses from OpenAI o1 models in the playground and AI proxy
  * Add complete support for Bedrock models in the playground and AI proxy
  * Fix model provider configuration issues in which custom models could clobber default models
  * Fix bug in streaming JSON responses from non-OpenAI providers
  * Support templated structured outputs in experiments run from the playground
  * Support structured outputs in the playground and AI proxy for Anthropic models, Bedrock models, and any OpenAI-flavored models that support tool calls
  * Support templated custom headers for custom AI providers
  * Added and updated models across all providers in the playground and AI proxy
  * Support tool usage and structured outputs for Gemini models in the playground and AI proxy
  * Simplify playground model dropdown by showing model variations in a nested dropdown

  ### SDK (version 0.0.187)

  * Added support for handling score values when an Eval has errored
  * Improve support for binary packages in `npx braintrust eval`
  * Support templated structured outputs
  * Fix dataset summary types in TypeScript
</Update>

<Update label="January 2025">
  * Add support for duplicating prompts, scorers, and tools
  * Fix pagination for the `/v1/prompt` REST API endpoint
  * "Unreviewed" default view on experiment and logs tables to filter out rows that have been human reviewed
  * Add o3-mini to the AI proxy and playground
  * Scorer dropdown now supports using custom scoring functions across projects
  * Drag and drop to reorder span fields in experiment/log traces and dataset rows
  * Small convenience improvement to the BTQL Sandbox
  * Add an attachments browser to view all attachments for a span in a sidebar
  * Add support for setting a baseline experiment for experiment comparisons
  * UI updates to experiment and log tables
    * Trace audit log now displays granular changes to span data
    * Start/end columns shown as dates/times
    * Non-existent trace records display an error message instead of loading indefinitely
  * Creating an experiment from a playground now correctly renders prompts with `input`, `metadata`, `expected`, and `output` mapped fields
  * The [AI proxy](/deploy/ai-proxy) now includes `x-bt-used-endpoint` as a response header
  * Add support for deeplinking to comments within spans
  * In Human Review mode, display all scores in a form
  * Experiment table rows can now be sorted based on score changes and regressions for each group
  * The OTEL endpoint now converts attributes under the `braintrust` namespace directly to the corresponding Braintrust fields
  * New OTEL attributes that accept JSON-serialized values have been added for convenience
  * Experiment tables and individual traces now support comparing trial data between experiments

  ### SDK Integrations: LangChain.js (version 0.0.5)

  * Less noisy logging from the LangChain.js integration
  * You can now pass a `NOOP_SPAN` to the `BraintrustCallbackHandler` to disable logging
  * Fixes a bug where the LangChain.js integration could not handle null/undefined values in chain inputs/outputs

  ### SDK Integrations: LangChain.js (version 0.0.4)

  * Support logging spans from inside evals in the LangChain.js integration

  ### SDK (version 0.0.184)

  * `span.export()` will no longer throw if Braintrust is down
  * Improvement to the Python prompt rendering to correctly render formatted messages, LLM tool calls, and other structured outputs

  ### SDK (version 0.0.183)

  * Fix a bug related to `initDataset()` in the TypeScript SDK creating links in `Eval()` calls
  * Fix a few type checking issues in the Python SDK

  ### SDK (version 0.0.182)

  * Improved logging for moderation models from the SDK wrappers

  ### SDK (version 0.0.181)

  * Add `ReadonlyAttachment.metadata` helper method to fetch a signed URL for downloading the attachment metadata

  ### SDK (version 0.0.179)

  * New `hook.expected` for reading and updating expected values in the Eval framework
  * Small type improvements for `hook` objects
  * Fixed a bug to enable support for `init_function` with LLM scorers in Python
  * Support nested attachments in Python
  * Add support for imports in Python functions pushed to Braintrust via `braintrust push`

  ### SDK (version 0.0.178)

  * Cache prompts locally in a two-layered memory/disk cache
  * Support for using custom functions that are stored in Braintrust in evals
  * Add support for running traced functions in a `ThreadPoolExecutor` in the Python SDK
  * Improved formatting of spans logged from the Vercel AI SDK's `generateObject` method
  * Default to `asyncFlush: true` in the TypeScript SDK
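
As a rough picture of what a two-layered memory/disk cache looks like, the sketch below checks an in-memory dictionary first and falls back to JSON files on disk. The `TwoLayerCache` class is hypothetical and assumes filesystem-safe keys. It is not the SDK's cache.

```python
import json
import tempfile
from pathlib import Path


class TwoLayerCache:
    """Check memory first, then disk; fill memory on a disk hit."""

    def __init__(self, directory):
        self.memory = {}
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, key):
        return self.directory / f"{key}.json"

    def get(self, key):
        if key in self.memory:
            return self.memory[key]
        path = self._path(key)
        if path.exists():
            value = json.loads(path.read_text())
            self.memory[key] = value  # promote to the faster layer
            return value
        return None

    def set(self, key, value):
        # Write through to both layers.
        self.memory[key] = value
        self._path(key).write_text(json.dumps(value))


cache_dir = tempfile.mkdtemp()
TwoLayerCache(cache_dir).set("greeting", {"prompt": "Say hi"})
fresh = TwoLayerCache(cache_dir)  # simulates a new process with empty memory
print(fresh.get("greeting"))
```

The memory layer serves repeated lookups within a process, while the disk layer lets a value survive restarts.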

  ### SDK integrations: LangChain.js (version 0.0.2)

  * Add support for initializing global LangChain callback handler to avoid manually passing the handler to each LangChain object
</Update>

<Update label="December 2024">
  * Add support for free-form human review scores (written to the `metadata` field)
  * Add support for structured outputs in the playground
  * Sparkline charts added to the project home page
  * Better handling of missing data points in monitor charts
  * Clicking on monitor charts now opens a link to traces filtered to the selected time range
  * Add `Endpoint supports streaming` flag to custom provider configuration
  * Experiments chart can be resized vertically by dragging the bottom of the chart
  * BTQL sandbox to explore project data using [Braintrust Query Language](/reference/sql)
  * Add support for updating span data from custom span iframes
  * Significantly speed up loading performance for experiments and logs, especially with lots of spans
    * Searches inside experiments will only work over content in the tabular view, rather than over the full trace
    * While searching on the logs page, realtime updates are disabled
  * Starring rows in experiment and dataset tables now supported
  * "Order by regression" option in experiment column menu can now be toggled on and off without losing previous order
  * Add expanded timeline view for traces
  * Added a 'Request count' chart to the monitor page
  * Add headers to custom provider configuration which the [AI proxy](/deploy/ai-proxy) will include in the request to the custom endpoint
  * The logs viewer now supports exporting the currently loaded rows as a CSV or JSON file
  * Experiment columns can now be reordered from the column menu
  * You can now customize legends in monitor charts

  ### Autoevals (version 0.0.110)

  * Python Autoevals now support custom clients when calling evaluators

  ### SDK (version 0.0.179)

  * Add support for imports in Python functions pushed to Braintrust via `braintrust push`

  ### SDK (version 0.0.178)

  * Cache prompts locally in a two-layered memory/disk cache
  * Support for using custom functions that are stored in Braintrust in evals
  * Add support for running traced functions in a `ThreadPoolExecutor` in the Python SDK
  * Improved formatting of spans logged from the Vercel AI SDK's `generateObject` method
  * Default to `asyncFlush: true` in the TypeScript SDK

  ### SDK (version 0.0.177)

  * Support for creating and pushing custom scorers from your codebase with `braintrust push`

  ### SDK (version 0.0.176)

  * New `hook.metadata` for reading and updating Eval metadata when using the `Eval` framework

  ### SDK (version 0.0.175)

  * Fix bug with serializing ReadonlyAttachment in logs

  ### SDK (version 0.0.174)

  * AI SDK fixes: support for image URLs and properly formatted tool calls so "Try prompt" works in the UI

  ### SDK (version 0.0.173)

  * Attachments can now be loaded when iterating an experiment or dataset

  ### SDK (version 0.0.172)

  * Fix a bug where `braintrust eval` did not respect certain configuration options, like `base_experiment_id`
  * Fix a bug where `invoke` in the Python SDK did not properly stream responses

  ### SDK integrations: LangChain.js (version 0.0.1)

  * New LangChain.js integration to export traces from `langchainjs` runs
</Update>

<Update label="November 2024">
  * The Traceloop OTEL integration now uses the input and output attributes to populate the corresponding fields in Braintrust
  * The monitor page now supports querying experiment metrics
  * Removed the `filters` param from the REST API fetch endpoint
  * New experiment summary layout option, a URL-friendly view for experiment summaries that respects all filters
  * Add a default limit of 10 to all fetch and `/sql` requests for `project_logs`
  * You can now export your prompts from the playground as code snippets and run them through the [AI proxy](/deploy/ai-proxy)
  * Support for creating and pushing custom Python tools and prompts from your codebase with `braintrust push`
  * You can now view grouped summary data for all experiments by selecting **Include comparisons in group** from the **Group by** dropdown inside an experiment
  * The experiments page now supports downloading as CSV/JSON
  * Downloading or duplicating a dataset in the UI now properly copies all dataset rows
  * You can now view score data as a bar chart for your experiment data by selecting **Score comparison** from the X axis selector
  * Trials information is now shown as a separate column in diff mode in the experiment table
  * Cmd/Ctrl + S hotkey to save from prompts in the playground and function dialogs
  * The Braintrust [AI Proxy](/deploy/ai-proxy) now supports the [OpenAI Realtime API](https://platform.openai.com/docs/guides/realtime)
  * Add "Group by" functionality to the monitor page
  * The experiment table can now be visualized in a [grid layout](/evaluate/interpret-results#adjust-table-layout)
  * 'Select all' button in permission dialogs
  * Create custom columns on dataset, experiment and logs tables from `JSON` values in `input`, `output`, `expected`, or `metadata` fields
  * The Braintrust [AI Proxy](/deploy/ai-proxy) can now [issue temporary credentials](/deploy/ai-proxy#configure-api-keys) to access the proxy for a limited time
  * Move experiment score summaries to the table column headers
  * You now receive a clear error message if you run out of free-tier capacity while running an experiment from the playground
  * Filters on JSON fields now support array indexing, e.g. `metadata.foo[0] = 'bar'`
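
To illustrate what a filter such as `metadata.foo[0] = 'bar'` selects, here is a small sketch that resolves a dotted path with array indexes against plain dictionaries. The `resolve` helper and the sample rows are hypothetical. This only mirrors the filter's meaning and is not how Braintrust evaluates filters.

```python
import re

# Matches either a field name or a bracketed array index.
TOKEN = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def resolve(row, path):
    """Follow a path like 'metadata.foo[0]'; return None if any step is missing."""
    current = row
    for name, index in TOKEN.findall(path):
        try:
            current = current[name] if name else current[int(index)]
        except (KeyError, IndexError, TypeError):
            return None
    return current


rows = [
    {"id": 1, "metadata": {"foo": ["bar", "baz"]}},
    {"id": 2, "metadata": {"foo": ["qux"]}},
    {"id": 3, "metadata": {}},
]
matches = [row["id"] for row in rows if resolve(row, "metadata.foo[0]") == "bar"]
print(matches)
```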

  ### SDK (version 0.0.171)

  * Add a `.data` method to the `Attachment` class, which lets you inspect the loaded attachment data

  ### SDK (version 0.0.170)

  * Support uploading [file attachments in the Python SDK](https://www.braintrust.dev/docs/reference/libs/python#attachment-objects)
  * Log, feedback, and dataset inputs to the Python SDK are now synchronously deep-copied for more consistent logging
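
The sketch below illustrates why copying inputs at call time gives more consistent logs. It uses only the standard library; `log_event` and `logged_events` are hypothetical stand-ins, not SDK functions.

```python
import copy

logged_events = []


def log_event(event: dict) -> None:
    """Hypothetical logger: snapshot the event synchronously, at call time."""
    logged_events.append(copy.deepcopy(event))


payload = {"input": {"question": "2 + 2?"}, "metadata": {"attempt": 1}}
log_event(payload)

# Mutating the caller's object afterwards does not change what was logged.
payload["metadata"]["attempt"] = 2
payload["input"]["question"] = "3 + 3?"

print(logged_events[0])
```

Without the deep copy, a logger that flushes in the background could record the mutated values instead of the values present when the call was made.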

  ### SDK (version 0.0.169)

  * The Python SDK `Eval()` function has been split into `Eval()` and `EvalAsync()`
  * Improved type annotations in the Python SDK
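
One common way to offer both a blocking and an awaitable entry point, which may or may not match how the SDK implements the `Eval()`/`EvalAsync()` split, is to keep the logic in a coroutine and wrap it for synchronous callers. The names `run_eval`, `run_eval_async`, and `shout` are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_eval_async(
    inputs: List[str], task: Callable[[str], Awaitable[str]]
) -> List[str]:
    """Hypothetical async entry point: run the task on every input concurrently."""
    return list(await asyncio.gather(*(task(x) for x in inputs)))


def run_eval(inputs: List[str], task: Callable[[str], Awaitable[str]]) -> List[str]:
    """Hypothetical sync entry point: owns the event loop, for plain scripts."""
    return asyncio.run(run_eval_async(inputs, task))


async def shout(text: str) -> str:
    await asyncio.sleep(0)  # stand-in for an awaited model call
    return text.upper()


results = run_eval(["a", "b"], shout)
print(results)
```

Code that already runs inside an event loop, such as a notebook or an async web handler, would await the async variant directly instead of calling the blocking one.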

  ### SDK (version 0.0.168)

  * A new `Span.permalink()` method allows you to format a permalink for the current span
  * `braintrust push` support for Python tools and prompts
  * `initDataset()`/`init_dataset()` used in `Eval()` now tracks the dataset ID and links to each row in the dataset properly

  ### SDK (version 0.0.167)

  * Support uploading [file attachments in the TypeScript SDK](/instrument/attachments)
  * Log, feedback, and dataset inputs to the TypeScript SDK are now synchronously deep-copied for more consistent logging
  * Address an issue where the TypeScript SDK could not make connections when running in a Cloudflare Worker
</Update>

<Update label="October 2024">
  * The Monitor page now shows an aggregate view of log scores over time
  * Improvement/Regression filters between experiments are now saved to the URL
  * Add `max_concurrency` and `trial_count` to the playground when kicking off evals
  * Show a button to scroll to a single search result in a span field when using trace search
  * Indicate spans with errors in the trace span list
  * After using "Copy to Dataset" to create a new dataset row, the audit log of the new row now links back to the original experiment, log, or other dataset
  * Tools now stream their `stdout` and `stderr` to the UI
  * Fix prompt, scorer, and tool dropdowns to only show the correct function types
  * The [GitHub action](/evaluate/run-evaluations#github-actions) now supports Python runtimes
  * Add support for [Cerebras](https://cerebras.ai/) models in the proxy, playground, and saved prompts
  * You can now create [span iframe viewers](/instrument/advanced-tracing#customize-span-rendering) to visualize span data in a custom iframe
  * `NOT LIKE`, `NOT ILIKE`, `NOT INCLUDES`, and `NOT CONTAINS` supported in BTQL
  * Add "Upload Rows" button to insert rows into an existing dataset from CSV or JSON
  * Add "Maximum" aggregate score type
  * The experiment table now supports grouping by input (for trials) or by a metadata field
  * Gemini models now support multimodal inputs
  * Preview [file attachments](/instrument/attachments) in the trace view
  * View and filter by comments in the experiment table
  * Add table row numbers to experiments, logs, and datasets

  ### SDK (version 0.0.166)

  * Allow explicitly specifying git metadata info in the Eval framework

  ### SDK (version 0.0.165)

  * Support specifying dataset-level metadata in `initDataset/init_dataset`

  ### SDK (version 0.0.164)

  * Add `braintrust.permalink` function to create deep links pointing to particular spans in the Braintrust UI

  ### SDK (version 0.0.163)

  * Fix Python SDK compatibility with Python 3.8

  ### SDK (version 0.0.162)

  * Fix Python SDK compatibility with Python 3.9 and older

  ### SDK (version 0.0.161)

  * Add utility function `spanComponentsToObjectId` for resolving the object ID from an exported span slug
</Update>

<Update label="September 2024">
  * Basic monitor page that shows aggregate values for latency, token count, time to first token, and cost for logs
  * Create custom tools to use in your prompts and in the playground
  * Pull your prompts to your codebase using the `braintrust pull` command
  * Select and compare multiple experiments in the experiment view using the `compared with` dropdown
  * The playground now displays aggregate scores (avg/max/min) for each prompt and supports sorting rows by a score
  * Compare span field values side-by-side in the trace viewer when fullscreen and diff mode is enabled
  * The tag picker now includes tags that were added dynamically via API
  * You can now create server-side online evaluations for your logs
  * New member invitations now support being added to multiple permission groups
  * Move datasets and prompts to a new Library navigation tab, and include a list of custom scorers
  * Clean up tree view by truncating the root preview and showing a preview of a node only if collapsed
  * Automatically save changes to table views
  * You can now upload TypeScript evals from the command line as functions, and then use them in the playground
  * Click a span field line to highlight it and pin it to the URL
  * Copilot tab autocomplete for prompts and data in the Braintrust UI
  * Basic filter UI (no BTQL necessary)
  * Add to dataset dropdown now supports adding to datasets across projects
  * Add REST endpoint for batch-updating ACLs: `/v1/acl/batch_update`
  * Cmd/Ctrl click on a table row to open it in a new tab
  * Show the last 5 basic filters in the filter editor
  * You can now explicitly set and edit prompt slugs
  * Fixed comment deletion
  * You can now use `%` in BTQL queries to represent percent values

  ### Autoevals (version 0.0.86)

  * Add support for Azure OpenAI in node

  ### SDK (version 0.0.160)

  * Fix a bug with `setFetch()` in the TypeScript SDK

  ### SDK (version 0.0.159)

  * In Python, running the CLI with `--verbose` now uses the `INFO` log level
  * Create and push custom tools from your codebase with `braintrust push`
  * You can now pull prompts to your codebase using the `braintrust pull` command

  ### SDK (version 0.0.158)

  * A dedicated `update` method is now available for datasets
  * Fixed a Python-specific error causing experiments to fail initializing when git diff encounters invalid repositories
  * Token counts have the correct units when printing `ExperimentSummary` objects

  ### SDK (version 0.0.157)

  * Enable the `--bundle` flag for `braintrust eval` in the TypeScript SDK

  ### SDK (version 0.0.155)

  * The client wrappers `wrapOpenAI()`/`wrap_openai()` now support [Structured Outputs](https://platform.openai.com/docs/guides/structured-outputs)
</Update>

<Update label="August 2024">
  * You can now create custom LLM and code (TypeScript and Python) evaluators in the playground
  * Fullscreen trace toggle
  * Datasets now accept JSON file uploads
  * When uploading a CSV/JSON file to a dataset, columns/fields named `input`, `expected`, and `metadata` are now auto-assigned to the corresponding dataset fields
  * Full text search UI for all span contents in a trace
  * New metrics in the UI and summary API: prompt tokens, completion tokens, total tokens, and LLM duration
  * Switching organizations via the header navigates to the same-named project in the selected organization
  * Errors now show up in the trace viewer
  * New cookbook recipe on [benchmarking LLM providers](/cookbook/recipes/ProviderBenchmark)
  * Viewer mode selections will no longer automatically switch to a non-editable view if the field is editable
  * Show `%` in diffs instead of `pp`
  * Add rename, delete and copy current project id actions to the project dropdown
  * Playgrounds can now be shared publicly
  * Duration now reflects the "task" duration, not the overall test case duration
  * Duration is now also displayed in the experiment overview table
  * Add support for Fireworks and Lepton inference providers
  * "Jump to" menu to quickly navigate between span sections
  * Speed up queries involving metadata fields using the columnstore backend if it is available
  * Update to include the latest Mistral models in the proxy/playground
  * Categorical human review scores can now be reordered via drag and drop
  * Human review row selection is now a free text field, enabling a quick jump to a specific row

  ### Autoevals (version 0.0.85)

  * LLM calls used in autoevals are now marked with `span_attributes.purpose = "scorer"` so they can be excluded from metric and cost calculations
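
As a rough illustration of how the `purpose` marker can be used, the sketch below sums token counts while skipping scorer spans. The span dictionaries are hypothetical; in particular, the `metrics.tokens` field is an assumed shape, and only `span_attributes.purpose = "scorer"` comes from the note above.

```python
# Hypothetical spans; only span_attributes.purpose is taken from the changelog.
spans = [
    {"name": "chat", "span_attributes": {}, "metrics": {"tokens": 120}},
    {"name": "Factuality", "span_attributes": {"purpose": "scorer"}, "metrics": {"tokens": 80}},
    {"name": "retrieve", "span_attributes": {}, "metrics": {"tokens": 0}},
]


def is_scorer_span(span: dict) -> bool:
    """True when the span was produced by a scorer's own LLM call."""
    return span.get("span_attributes", {}).get("purpose") == "scorer"


# Count only the application's own usage, excluding scorer overhead.
app_tokens = sum(s["metrics"]["tokens"] for s in spans if not is_scorer_span(s))
print(app_tokens)
```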

  ### Autoevals (version 0.0.84)

  * Fix a bug where `rationale` was incorrectly formatted in Python
  * Update the `full` Docker deployment configuration to bundle the metadata DB (Supabase) inside the main Docker Compose file

  ### SDK (version 0.0.151)

  * `Eval()` can now take a base experiment. Provide either `baseExperimentName`/`base_experiment_name` or `baseExperimentId`/`base_experiment_id`

  ### SDK (version 0.0.148)

  * While tracing, if your code errors, the error will be logged to the span

  ### SDK (version 0.0.147)

  * `project_name` is now `projectName`, etc. in the `invoke(...)` function in TypeScript
  * `Eval()` return values are printed in a nicer format
  * [`updateSpan()`/`update_span()`](/instrument/advanced-tracing#update-spans) allows you to update a span's fields after it has been created

  ### SDK (version 0.0.146)

  * Add support for `max_concurrency` in the Python SDK
  * Hill climbing evals that use a `BaseExperiment` as data will use that as the default base experiment
</Update>

<Update label="July 2024">
  * In preparation for auth changes, we are making a series of updates that may affect self-deployed instances
  * Human review scores are now sortable from the project configuration page
  * Streaming support for tool calls in Anthropic models through the proxy and playground
  * The playground now supports different "parsing" modes: `auto`, `parallel`, `raw`, `raw_stream`
  * Table views [can now be saved](/reference/views), persisting the BTQL filters, sorts, and column state
  * Add support for the new `window.ai` model into the playground
  * Use push history when navigating table rows to allow for back button navigation
  * In the experiments list, grouping by a metadata field will group rows in the table as well
  * Allow the trace tree panel to be resized
  * Port the log summary query to BTQL for improved speed
  * Update the experiment progress and experiment score distribution chart layouts
  * Format table column headers with icons
  * Move active filters to the table toolbar
  * Enable RBAC for all users
  * Use BTQL to power the datasets list, making it significantly faster if you have multiple large datasets
  * Experiments list chart supports click interactions
  * Jump into comparison view between two experiments by selecting them in the table and clicking "Compare"
  * Add support for labeling [expected fields using human review](/annotate/labels#update-expected-values)
  * Create and edit descriptions for datasets
  * Create and edit metadata for prompts
  * Click scores and attributes (tree view only) in the trace view to filter by them
  * Highlight the experiments graph to filter down the set of experiments
  * Add support for new models including Claude 3.5 Sonnet
  * Improved empty state and instructions for custom evaluators in the playground
  * Show query examples when filtering/sorting
  * [Custom comparison keys](/evaluate/compare-experiments#set-a-comparison-key) for experiments
  * New model dropdown in the playground/prompt editor that is organized by provider and model type

  ### Autoevals (version 0.0.80)

  * New `ExactMatch` scorer for comparing two values for exact equality

  ### Autoevals (version 0.0.77)

  * Officially switch the default model to be `gpt-4o`
  * Support Claude models

  ### Autoevals (version 0.0.76)

  * New `.partial(...)` syntax to initialize a scorer with partial arguments like `criteria` in `ClosedQA`
  * Allow messages to be inserted in the middle of a prompt
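
The `.partial(...)` syntax mentioned above is a form of partial application: some arguments are bound up front and the rest are supplied when the scorer runs. The sketch below shows the same idea with the standard library's `functools.partial`, as an analogy rather than the Autoevals implementation; `criteria_scorer` is a hypothetical keyword check, not `ClosedQA`.

```python
from functools import partial


def criteria_scorer(output: str, criteria: str) -> float:
    """Hypothetical scorer: 1.0 if the output mentions the criteria keyword, else 0.0."""
    return 1.0 if criteria.lower() in output.lower() else 0.0


# Bind `criteria` once; the resulting callable only needs `output` at call time.
mentions_refund = partial(criteria_scorer, criteria="refund")

print(mentions_refund("We issued a Refund yesterday."))
print(mentions_refund("Your order has shipped."))
```

Binding the argument ahead of time lets the scorer be passed anywhere a scorer taking only the standard evaluation arguments is expected.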

  ### SDK (version 0.0.140)

  * New `wrapTraced` function allows you to trace JavaScript functions in a more ergonomic way

  ### SDK (version 0.0.138)

  * The TypeScript SDK's `Eval()` function now takes a `maxConcurrency` parameter
  * `braintrust install api` now sets up your API and Proxy URL in your environment
  * You can now specify a custom `fetch` implementation in the TypeScript SDK

  ### Deployment

  * The proxy service now supports more advanced functionality which requires setting the `PG_URL` and `REDIS_URL` parameters
</Update>

<Update label="June 2024">
  * You can now collapse the trace tree. It is collapsed automatically if the trace has a single span
  * Improvements to the experiment chart including greyed out lines for inactive scores and improved legend
  * Show diffs when you save a new prompt version
  * You can now see which users are viewing the same traces as you are in real-time
  * Improve whitespace and presentation of diffs in the trace view
  * Show markdown previews in score editor
  * Show cost in spans and display the average cost on experiment summaries and diff views
  * Published a new [Text2SQL eval recipe](/cookbook/recipes/Text2SQL-Data)
  * Add groups view for RBAC
  * Deprecate the legacy dataset format (`output` in place of `expected`) in a new version of the SDK
  * Improve the UX for saving and updating prompts from the playground
  * New hide/show column controls on all tables
  * New [model comparison](/cookbook/recipes/ModelComparison) cookbook recipe
  * Add support for model / metadata comparison on the experiments view
  * New experiment picker dropdown
  * Markdown support in the LLM message viewer
  * Support copying to clipboard from `input`, `output`, etc. views
  * Improve the empty-state experience for datasets
  * New multi-dimensional charts on the experiment page for comparing models and model parameters
  * Support `HTTPS_PROXY`, `HTTP_PROXY`, and `NO_PROXY` environment variables in the API containers
  * Support infinite scroll in the logs viewer and remove dataset size limitations
  * Denser trace view with span durations built in
  * Rework pagination and fix scrolling across multiple pages in the logs viewer
  * Make BTQL the default search method
  * Add support for Bedrock models in the playground and the proxy
  * Add "copy code" buttons throughout the docs
  * Automatically overflow large objects (e.g. experiments) to S3 for faster loading and better performance
  * Show images in LLM view
  * Send an invite email when you invite a new user to your organization
  * Support selecting/deselecting scores in the experiment view
  * Roll out [Braintrust Query Language](/reference/sql) (BTQL) for querying logs and traces
  * Smart relative time labels for dates (`1h ago`, `3d ago`, etc.)
  * Added support for double-quoted string literals
  * Jump to top button in trace details for easier navigation
  * Fix a race condition in distributed tracing
</Update>

<Update label="May 2024">
  * Incremental support for role-based access control (RBAC) logic within the API server backend
  * Changed the semantics of experiment initialization with `update=True`
  * Added support for new multimodal models
  * Introduced [REST API for RBAC](/api-reference)
  * Improved AI search and added positive/negative tag filtering in AI search
  * Added functionality for distributed tracing
  * Introduce multimodal support for OpenAI and Anthropic models in the prompt playground and proxy
  * The REST API now gzips responses
  * You can now return dynamic arrays of scores in `Eval()` functions
  * Launched Reporters
  * New coat of paint in the trace view
  * Added support for Clickhouse as an additional storage backend
  * Implemented realtime checks using a WebSocket connection
  * Introduced an API version checker tool
  * Faster optimistic updates for large writes in the UI
  * "Open in playground" now opens a lighter weight modal instead of the full playground
  * You can now create a new prompt playground from the prompt viewer
  * Shipped support for [prompt management](/deploy/prompts)
  * Moved playground sessions to be within projects
  * Allowed customizing proxy and real-time URLs through the web application
  * Improved documentation for Docker deployments
  * Improved folding behavior in data editors
  * Support custom models and endpoint configuration for all providers
  * New add team modal with support for multiple users
  * New information architecture to enable faster project navigation
  * Experiment metadata now visible in the experiments table
  * Improve UI write performance with batching
  * Log filters now apply to *any* span
  * Share button for traces
  * Images now supported in the tree view
  * Show auto scores before manual scores (matching trace) in the table
  * New logo is live!
  * Any span can now submit scores, which automatically average in the trace
  * Improve sidebar scrolling behavior
  * Add AI search for datasets and logs
  * Add tags to the SDK
  * Support viewing and updating metadata on the experiment page
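
For the note above that scores submitted from any span are averaged in the trace, here is a minimal sketch of one way such an average could be computed. Giving each submitting span equal weight is an assumption, and the span data and `average_scores` helper are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical spans within one trace, each optionally submitting scores.
span_scores = [
    {"span": "root", "scores": {"accuracy": 1.0}},
    {"span": "retrieval", "scores": {"accuracy": 0.5, "relevance": 0.8}},
    {"span": "generation", "scores": {}},
]


def average_scores(spans: List[dict]) -> Dict[str, float]:
    """Average each score name over the spans that submitted it (assumed rule)."""
    values = defaultdict(list)
    for span in spans:
        for name, value in span["scores"].items():
            values[name].append(value)
    return {name: sum(vals) / len(vals) for name, vals in values.items()}


trace_scores = average_scores(span_scores)
print(trace_scores)
```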
</Update>

<Update label="April 2024">
  * Add support for tags
  * Score fields are now sorted alphabetically
  * Add support for Groq models
  * Improve tree viewer and XML parser
  * New experiment page redesign
  * Support duplicate `Eval` names
  * Fallback to `BRAINTRUST_API_KEY` if `OPENAI_API_KEY` is not set
  * Throw an error if you use `experiment.log` and `experiment.start_span` together
  * Add keyboard shortcuts (j/k/p/n) for navigation
  * Increased tooltip size and delay for better usability
  * Support more viewing modes: HTML, Markdown, and Text
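
The API key fallback listed above can be pictured as a simple lookup order. This sketch is illustrative only: `resolve_api_key` is a hypothetical helper, the changelog does not say which component applies the fallback, and treating an empty value as unset is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Prefer OPENAI_API_KEY; otherwise fall back to BRAINTRUST_API_KEY.

    Assumption: an empty string is treated the same as an unset variable.
    """
    return env.get("OPENAI_API_KEY") or env.get("BRAINTRUST_API_KEY")


# Explicit mappings keep the example independent of the real environment.
print(resolve_api_key({"OPENAI_API_KEY": "sk-openai", "BRAINTRUST_API_KEY": "bt-key"}))
print(resolve_api_key({"BRAINTRUST_API_KEY": "bt-key"}))
print(resolve_api_key({}))
```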
</Update>

<Update label="March 2024">
  * Tons of improvements to the prompt playground
  * Cloudformation now supports more granular RDS configuration
  * Support optional slider params
  * Lots of style improvements for tables
  * Deleting a prompt takes you back to the prompts tab
</Update>

<Update label="February 2024">
  * New [REST API](/api-reference)
  * [Cookbook](/cookbook) of common use cases and examples
  * Support for [custom models](/evaluate/playgrounds) in the playground
  * Search now works across spans, not just top-level traces
  * Show creator avatars in the prompt playground
  * Improved UI breadcrumbs and sticky table headers
  * UI improvements to the playground
  * Added an example of closed QA / extra fields
  * New YAML parser and new syntax highlighting colors for data editor
  * Added support for enabling/disabling certain git fields from collection
  * Added new GPT-3.5 and 4 models to the playground
  * Fixed scrolling jitter issue in the playground
  * Made table fields in the prompt playground sticky
</Update>

<Update label="January 2024">
  * Added ability to download dataset as CSV
  * Added YAML support for logging and visualizing traces
  * Added JSON mode in the playground
  * Added span icons and improved readability
  * Enabled shift modifier for selecting multiple rows in Tables
  * Improved tables to allow editing expected fields and moved datasets to trace view
  * Added ability to manually score results in the experiment UI
  * Added comments and audit log in the experiment UI
  * Added ability to upload dataset CSV files in prompt playgrounds
  * Published new [guide for tracing and logging your code](/instrument/trace-application-logic)
  * Added support to download experiment results as CSVs
</Update>

<Update label="December 2023">
  * Dropped the official 2023 Year-in-Review dashboard
  * Improved ergonomics for the Python SDK
    * The `@traced` decorator will automatically log inputs/outputs
    * You no longer need to use context managers to scope experiments or loggers
  * Enable skew protection in frontend deploys
  * Added syntax highlighting in the sidepanel to improve readability
  * Add `jsonl` mode to the eval CLI to log experiment summaries in an easy-to-parse format
  * Released new trials feature to rerun each input multiple times
  * Added ability to run evals in the prompt playground
  * Added support for Gemini and Mistral Platform in AI proxy and playground
  * Enabled the prompt playground and datasets for free users
  * Added Together.ai models including Mixtral to AI Proxy
  * Turned prompts tab on organization view into a list
  * Removed data row limit for the prompt playground
  * Enabled configuration for dark mode and light mode in settings
  * Added automatic logging of a diff if an experiment is run on a repo with uncommitted changes
  * API keys are now scoped to organizations
  * You can now search for experiments by any metadata, including their name, author, or even git metadata
  * Filters are now saved in URL state so you can share a link to a filtered view
  * Improve performance of project page by optimizing API calls
</Update>

<Update label="November 2023">
  * Added experiment search on project view to filter by experiment name
  * Upgraded AI Proxy to support tracking Prometheus metrics
  * Modified Autoevals library to use the [AI proxy](/deploy/ai-proxy)
  * Upgraded Python braintrust library to parallelize evals
  * Optimized experiment diff view for performance improvements
  * Added support for new Perplexity models to playground
  * Released [AI proxy](/deploy/ai-proxy): access many LLMs using one API with caching
  * Added [load balancing endpoints](/deploy/ai-proxy#load-balance-across-providers) to AI proxy
  * Updated org-level view to show projects and prompt playground sessions
  * Added ability to batch delete experiments
  * Added support for Claude 2.1 in playground
  * Made experiment column resized widths persistent
  * Fixed our libraries including Autoevals to work with OpenAI's new libraries
  * Added support for function calling and tools in our prompt playground
  * Added tabs on a project page for datasets, experiments, etc
  * Improved selectors for diffing and comparison modes on experiment view
  * Added support for new OpenAI models (GPT4 preview, 3.5turbo-1106) in playground
  * Added support for open-source models (Mistral, Codellama, Llama2, etc.) in playground using Perplexity's APIs
</Update>

<Update label="October 2023">
  * Improved experiment sidebar to be fully responsive and resizable
  * Improved tooltips within the web UI
  * Multiple performance optimizations and bug fixes
  * Improved prompt playground variable handling and visualization
  * Added time duration statistics per row to experiment summaries
  * [Launched new tracing feature: log and visualize complex LLM chains and executions](/instrument/trace-application-logic)
  * Added a new "text-block" prompt type in the playground
  * Increased the default number of rows per page from 10 to 100 for experiments
  * UI fixes and improvements for the side panel and tooltips
  * The experiment dashboard can be customized to show the most relevant charts
  * Performance improvements related to user sessions
  * All experiment loading HTTP requests are 100-200ms faster
  * The prompt playground now supports autocomplete
  * Dataset versions are now displayed on the datasets page
  * Projects in the summary page are now sorted alphabetically
  * Long text fields in logged data can be expanded into scrollable blocks
</Update>

<Update label="September 2023">
  * The Eval framework is now supported in Python!
  * Onboarding and signup flow for new users
  * Switch product font to Inter
  * Big performance improvements for registering experiments (down from \~5s to \<1s)
  * New graph shows aggregate accuracy between experiments for each score
  * Throw errors in the prompt playground if you reference an invalid variable
  * A major backend database change that significantly improves performance while reducing costs
  * No more record size constraints (previously, strings could be at most 64kb long)
  * New autoevals for numeric diff and JSON diff
  * You can duplicate prompt sessions, prompts, and dataset rows in the prompt playground
  * You can download prompt sessions as JSON files
  * You can adjust model parameters (e.g. temperature) in the prompt playground
  * You can publicly share experiments
  * Datasets now support editing, deleting, adding, and copying rows in the UI
</Update>

<Update label="August 2023">
  * The prompt playground is now live!
  * A new chart shows experiment progress per score over time
  * The eval CLI now supports `--watch`, which will automatically re-run your evaluation
  * You can now edit datasets in the UI
  * Introducing datasets! You can now upload datasets to Braintrust and use them in your experiments
  * Fix several performance issues in the SDK and UI
  * Complex data is now substantially more performant in the UI
  * The UI updates in real-time as new records are logged to experiments
  * Ergonomic improvements to the SDK and CLI
    * The JS library is now isomorphic and supports both Node.js and the browser
    * The Evals CLI warns you when no files match the `.eval.[ts|js]` pattern
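
To show which filenames the `.eval.[ts|js]` pattern above is meant to cover, here is a small sketch that matches names ending in `.eval.ts` or `.eval.js` and warns when none are found. The regular expression is one reading of that pattern, and `find_eval_files` is a hypothetical helper, not the CLI's code.

```python
import re
from typing import List

# One interpretation of the `.eval.[ts|js]` pattern: names ending in .eval.ts or .eval.js.
EVAL_FILE = re.compile(r"\.eval\.(ts|js)$")


def find_eval_files(filenames: List[str]) -> List[str]:
    """Return the filenames that look like eval files."""
    return [name for name in filenames if EVAL_FILE.search(name)]


candidates = ["qa.eval.ts", "summarize.eval.js", "utils.ts", "eval.ts", "notes.eval.json"]
matches = find_eval_files(candidates)

if not matches:
    print("warning: no files match the .eval.[ts|js] pattern")
print(matches)
```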
</Update>

<Update label="July 2023">
  * You can now break down scores by metadata fields
  * Improve performance for experiment loading (especially complex experiments)
  * Support for renaming and deleting experiments
  * When you expand a cell in detail view, the row is now highlighted
  * A new [framework](/evaluate/run-evaluations) for expressing evaluations in a much simpler way
  * `inputs` is now `input` in the SDK (>= 0.0.23) and UI
  * Improved diffing behavior for nested arrays
  * SDK updates that allow you to update an existing experiment with `init(..., update=True)` and specify an id with `log(..., id='my-custom-id')`
  * Tables with lots and lots of columns are now visually more compact in the UI
  * A new Node.js SDK which mirrors the Python SDK
  * You can now swap the primary and comparison experiment with a single click
  * You can now compare `output` vs. `expected` within an experiment
  * Version 0.0.19 is out for the SDK
  * Support for real-time updates, using Redis
  * New settings page that consolidates team, installation, and API key settings
  * The experiment page now shows commit information for experiments run inside of a git repository
</Update>

<Update label="June 2023">
  * Experiments track their git metadata and automatically find a "base" experiment to compare against
  * The Python SDK's `summarize()` method now returns an `ExperimentSummary` object with score differences
  * Organizations can now be "multi-tenant"
  * New scatter plot and histogram insights to quickly analyze scores and filter down examples
  * API keys that can be set in the SDK and do not require user login
  * Improved performance for event logging in the SDK
  * Auto-merge experiment fields with different types
  * Tutorial guide + notebook
  * Automatically refresh Cognito tokens in the Python client
  * New filter and sort operators on the experiments table
  * SQL query explorer to run arbitrary queries against one or more experiments
</Update>