Roadmap

What we've shipped, what's in progress, and what's planned next.

Last Shipped

PDF Support in the Playground
12/17/2025
Playground, Evaluation, Observability
Attach PDF documents to chat messages in the playground. Upload files, provide URLs, or use file IDs from provider APIs. Works with OpenAI, Gemini, and Claude models. PDFs are supported in evaluations and observability traces.
Provider Built-in Tools in the Playground
12/11/2025
Playground
Use provider built-in tools like web search, code execution, and file search directly in the Playground. Supported providers include OpenAI, Anthropic, and Gemini. Tools are saved with prompts and automatically used via the LLM gateway.
Projects within Organizations
12/4/2025
Misc
Create projects within organizations to divide work between different AI products. Each project scopes its prompts, traces, and evaluations independently.
Jinja2 Template Support in the Playground
11/17/2025
Playground
Use Jinja2 templating in prompts to add conditional logic, filters, and template blocks. The template format is stored in the configuration schema, and the SDK handles rendering automatically.
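To illustrate the kind of logic this enables, here is a minimal sketch using the standard `jinja2` library. The variable names (`context`, `question`) are placeholders for this example, not part of Agenta's configuration schema.

```python
from jinja2 import Template

# A prompt template with a conditional block, a loop, and a filter.
# Variable names here are illustrative only.
tmpl = Template(
    "Answer the question below."
    "{% if context %}\nContext:\n"
    "{% for item in context %}- {{ item | trim }}\n{% endfor %}"
    "{% endif %}"
    "\nQuestion: {{ question }}"
)

rendered = tmpl.render(
    question="What is Agenta?",
    context=["  an open-source LLM platform  "],
)
print(rendered)
```

Because the template format is stored alongside the prompt, the SDK can render templates like this one without extra code on the caller's side.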
Programmatic Evaluation through the SDK
11/11/2025
Evaluation
Run evaluations programmatically from code with full control over test data and evaluation logic. Evaluate agents built with any framework and view results in the Agenta dashboard.
Online Evaluation
11/11/2025
Evaluation
Automatically evaluate every request to your LLM application in production. Catch hallucinations and off-brand responses as they happen instead of discovering them through user complaints.
Customize LLM-as-a-Judge Output Schemas
11/10/2025
Evaluation
Configure LLM-as-a-Judge evaluators with custom output schemas. Use binary, multiclass, or custom JSON formats. Enable reasoning for better evaluation quality.
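To make the schema options concrete, here is a hedged sketch of what binary and multiclass judge output schemas might look like, expressed as JSON Schema. The field names (`verdict`, `label`, `reasoning`) and class values are assumptions for illustration, not Agenta's exact format.

```python
# Hypothetical judge output schemas (JSON Schema); field names are assumptions.

# Binary: the judge returns a pass/fail verdict, optionally with reasoning.
binary_schema = {
    "type": "object",
    "properties": {
        "verdict": {"type": "boolean"},
        "reasoning": {"type": "string"},
    },
    "required": ["verdict"],
}

# Multiclass: the judge picks one label from a fixed set.
multiclass_schema = {
    "type": "object",
    "properties": {
        "label": {
            "type": "string",
            "enum": ["correct", "partially_correct", "incorrect"],
        },
        "reasoning": {"type": "string"},
    },
    "required": ["label"],
}
```

Including a `reasoning` field is what the entry above refers to as enabling reasoning: asking the judge to explain its choice tends to improve evaluation quality.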

In Progress

Planned

Feature Requests

Upvote or comment on the features you care about, or request a new one.