Installation

pip install zeroeval

Basic Setup

Replace hardcoded prompt strings with ze.prompt(). Your existing text becomes the fallback content that’s used until an optimized version is available.
import zeroeval as ze
from openai import OpenAI

ze.init()
client = OpenAI()

system_prompt = ze.prompt(
    name="support-bot",
    content="You are a helpful customer support agent for {{company}}.",
    variables={"company": "TechCorp"}
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "How do I reset my password?"}
    ]
)
That’s it. Every call to ze.prompt() is tracked, versioned, and linked to the completions it produces. You’ll see production traces at ZeroEval → Prompts.
When you provide content, ZeroEval automatically uses the latest optimized version from your dashboard if one exists. The content parameter serves as a fallback for when no optimized versions are available yet.

Version Control

Auto-optimization (default)

prompt = ze.prompt(
    name="customer-support",
    content="You are a helpful assistant."
)
Uses the latest optimized version if one exists, otherwise falls back to the provided content.

Explicit mode

prompt = ze.prompt(
    name="customer-support",
    from_="explicit",
    content="You are a helpful assistant."
)
Always uses the provided content. Useful for debugging or A/B testing a specific version.

Latest mode

prompt = ze.prompt(
    name="customer-support",
    from_="latest"
)
Requires an optimized version to exist. Fails with PromptRequestError if none is found.
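The three modes differ only in how content is resolved. As a rough mental model, here is a plain-Python sketch of the resolution order (an illustration, not the SDK's actual implementation; `LookupError` stands in for `PromptRequestError`):

```python
def resolve_content(mode, optimized, content):
    """Sketch of how the from_ modes might pick content (assumption, not SDK source)."""
    if mode == "explicit":
        # Always use the locally provided content.
        return content
    if mode == "latest":
        # Require an optimized version; the SDK raises PromptRequestError instead.
        if optimized is None:
            raise LookupError("no optimized version exists for this prompt")
        return optimized
    # Default auto mode: prefer the optimized version, fall back to content.
    return optimized if optimized is not None else content

resolve_content("auto", None, "You are a helpful assistant.")
# no optimized version yet, so this falls back to the provided content
```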

Pin to a specific version

prompt = ze.prompt(
    name="customer-support",
    from_="a1b2c3d4..."  # 64-char SHA-256 hash
)

Prompt Library

For more control, use ze.get_prompt() to fetch prompts from the Prompt Library with tag-based deployments and caching.
prompt = ze.get_prompt(
    "support-triage",
    tag="production",
    fallback="You are a helpful assistant.",
    variables={"product": "Acme"},
)

print(prompt.content)
print(prompt.version)
print(prompt.model)

Parameters

Parameter   Type    Default     Description
slug        str     (required)  Prompt slug (e.g. "support-triage")
version     int     None        Fetch a specific version number
tag         str     "latest"    Tag to fetch ("production", "latest", etc.)
fallback    str     None        Content to use if the prompt is not found
variables   dict    None        Template variables for {{var}} tokens
task_name   str     None        Override the task name for tracing
render      bool    True        Whether to render template variables
missing     str     "error"     What to do with missing variables: "error" or "ignore"
use_cache   bool    True        Use in-memory cache for repeated fetches
timeout     float   None        Request timeout in seconds
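The render and missing parameters control {{var}} substitution. A minimal sketch of the semantics described above (illustrative only; the SDK's renderer may differ in edge cases such as whitespace inside braces):

```python
import re

def render_template(template, variables, missing="error"):
    """Substitute {{var}} tokens; on a missing variable either raise
    (missing="error") or leave the token untouched (missing="ignore")."""
    def replace(match):
        key = match.group(1).strip()
        if key in variables:
            return str(variables[key])
        if missing == "error":
            raise KeyError(f"missing template variable: {key}")
        return match.group(0)  # "ignore": keep the raw token in place
    return re.sub(r"\{\{(.*?)\}\}", replace, template)

render_template("You are an agent for {{company}}.", {"company": "Acme"})
# -> "You are an agent for Acme."
```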

Return value

Returns a Prompt object with:
Field          Type   Description
content        str    The rendered prompt content
version        int    Version number
version_id     str    Version UUID
tag            str    Tag this version was fetched from
is_latest      bool   Whether this is the latest version
model          str    Model bound to this version (if any)
metadata       dict   Additional metadata
source         str    "api" or "fallback"
content_hash   str    SHA-256 hash of the content
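content_hash is a SHA-256 hex digest, the same 64-character format accepted by from_= when pinning a version. For illustration (assuming the content is hashed as UTF-8 bytes, which this page does not specify):

```python
import hashlib

def sha256_of(content):
    # A SHA-256 hex digest is always 64 hexadecimal characters.
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

digest = sha256_of("You are a helpful assistant.")
len(digest)  # 64
```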

Model Deployments

When you deploy a model to a prompt version in the dashboard, the SDK automatically patches the model parameter in your LLM calls:
system_prompt = ze.prompt(
    name="support-bot",
    content="You are a helpful customer support agent."
)

response = client.chat.completions.create(
    model="gpt-4",  # Gets replaced with the deployed model
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Hello"}
    ]
)

Multi-Artifact Runs

When a single prompt-linked run produces multiple judged outputs (e.g. a final decision and a visual card), use ze.artifact_span to mark each output as a named artifact. The primary artifact becomes the default completion preview; secondary artifacts are accessible in the detail view.
with ze.artifact_span(
    name="final-decision",
    artifact_type="final_decision",
    role="primary",
    label="Final Decision",
) as s:
    s.set_io(input_data=ticket_text, output_data=decision_json)
See the full API and examples in the Python Tracing Reference.

Manual Prompt-Linked Spans

When you want ze.prompt() to manage a specific LLM interaction but do not want global auto-instrumentation (e.g. your agent makes many LLM calls and only one should be tracked as a prompt generation), disable integrations and create the span yourself.

When to use this

  • Your codebase makes many LLM calls but only a subset should appear as prompt completions.
  • You use a provider that has no auto-integration (a custom HTTP endpoint, an internal model service, etc.).
  • You want full control over which codepath produces prompt-linked traces.

Setup

Disable all integrations, then initialize:
import zeroeval as ze

ze.init(disabled_integrations=["openai", "gemini", "langchain", "langgraph"])

Create a prompt-linked span

Call ze.prompt() inside an active span so the SDK writes task / zeroeval metadata onto the trace. Then open a child span with kind="llm" (or use ze.artifact_span()) around your provider call. The SDK automatically propagates prompt linkage to child spans in the same trace.
import zeroeval as ze

def handle_request(user_message: str):
    with ze.span(name="support-pipeline"):
        system_prompt = ze.prompt(
            name="customer-support",
            content="You are a helpful support agent for {{company}}.",
            variables={"company": "Acme"},
            from_="explicit",
        )

        # The <zeroeval> tag is embedded in the string returned by ze.prompt().
        # Strip it before sending to the provider if needed.
        clean_content = system_prompt.split("</zeroeval>", 1)[-1]

        with ze.span(name="llm.chat", kind="llm", attributes={
            "provider": "my-custom-provider",
            "model": "gpt-4o",
        }) as llm_span:
            response = my_provider.chat(
                messages=[
                    {"role": "system", "content": clean_content},
                    {"role": "user", "content": user_message},
                ]
            )
            llm_span.set_io(
                input_data=user_message,
                output_data=response.text,
            )
The SDK automatically stamps prompt metadata (task, zeroeval.prompt_version_id, etc.) onto every span in the trace, so the inner llm span is linked to the prompt version without any extra wiring. Judge evaluations, feedback, and the prompt completions page all work as if an auto-integration created the span.

You can also use ze.artifact_span() instead of ze.span(kind="llm") when you want the output to appear as a named completion artifact:
with ze.artifact_span(
    name="support-reply",
    artifact_type="reply",
    role="primary",
    label="Support Reply",
) as s:
    response = my_provider.chat(messages=[...])
    s.set_io(input_data=user_message, output_data=response.text)

Sending Feedback

Attach feedback to completions to power prompt optimization:
ze.send_feedback(
    prompt_slug="support-bot",
    completion_id=response.id,
    thumbs_up=True,
    reason="Clear and concise response"
)
Parameter        Type   Required   Description
prompt_slug      str    Yes        Prompt name (same as used in ze.prompt())
completion_id    str    Yes        UUID of the completion
thumbs_up        bool   Yes        Positive or negative feedback
reason           str    No         Explanation of the feedback
expected_output  str    No         What the output should have been
metadata         dict   No         Additional metadata