Installation

pip install zeroeval

Basic Setup

Replace hardcoded prompt strings with ze.prompt(). Your existing text becomes the fallback content that’s used until an optimized version is available.
import zeroeval as ze
from openai import OpenAI

ze.init()
client = OpenAI()

system_prompt = ze.prompt(
    name="support-bot",
    content="You are a helpful customer support agent for {{company}}.",
    variables={"company": "TechCorp"}
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "How do I reset my password?"}
    ]
)
That’s it. Every call to ze.prompt() is tracked, versioned, and linked to the completions it produces. You’ll see production traces at ZeroEval → Prompts.
When you provide content, ZeroEval automatically uses the latest optimized version from your dashboard if one exists. The content parameter serves as a fallback for when no optimized versions are available yet.

Version Control

Auto-optimization (default)

prompt = ze.prompt(
    name="customer-support",
    content="You are a helpful assistant."
)
Uses the latest optimized version if one exists, otherwise falls back to the provided content.

Explicit mode

prompt = ze.prompt(
    name="customer-support",
    from_="explicit",
    content="You are a helpful assistant."
)
Always uses the provided content. Useful for debugging or A/B testing a specific version.

Latest mode

prompt = ze.prompt(
    name="customer-support",
    from_="latest"
)
Requires an optimized version to exist. Fails with PromptRequestError if none is found.
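The three modes differ only in how content is resolved. As a rough mental model, here is a plain-Python sketch of the resolution order (an illustration, not the SDK's actual implementation; `LookupError` stands in for `PromptRequestError`):

```python
def resolve_content(mode, optimized, content):
    """Sketch of how the from_ modes might pick content (assumption, not SDK source)."""
    if mode == "explicit":
        # Always use the locally provided content.
        return content
    if mode == "latest":
        # Require an optimized version; the SDK raises PromptRequestError instead.
        if optimized is None:
            raise LookupError("no optimized version exists for this prompt")
        return optimized
    # Default auto mode: prefer the optimized version, fall back to content.
    return optimized if optimized is not None else content

resolve_content("auto", None, "You are a helpful assistant.")
# no optimized version yet, so this falls back to the provided content
```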

Pin to a specific version

prompt = ze.prompt(
    name="customer-support",
    from_="a1b2c3d4..."  # 64-char SHA-256 hash
)

Prompt Library

For more control, use ze.get_prompt() to fetch prompts from the Prompt Library with tag-based deployments and caching.
prompt = ze.get_prompt(
    "support-triage",
    tag="production",
    fallback="You are a helpful assistant.",
    variables={"product": "Acme"},
)

print(prompt.content)
print(prompt.version)
print(prompt.model)

Parameters

Parameter   Type    Default     Description
slug        str     (required)  Prompt slug (e.g. "support-triage")
version     int     None        Fetch a specific version number
tag         str     "latest"    Tag to fetch ("production", "latest", etc.)
fallback    str     None        Content to use if the prompt is not found
variables   dict    None        Template variables for {{var}} tokens
task_name   str     None        Override the task name for tracing
render      bool    True        Whether to render template variables
missing     str     "error"     What to do with missing variables: "error" or "ignore"
use_cache   bool    True        Use in-memory cache for repeated fetches
timeout     float   None        Request timeout in seconds
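The render and missing parameters control {{var}} substitution. A minimal sketch of the semantics described above (illustrative only; the SDK's renderer may differ in edge cases such as whitespace inside braces):

```python
import re

def render_template(template, variables, missing="error"):
    """Substitute {{var}} tokens; on a missing variable either raise
    (missing="error") or leave the token untouched (missing="ignore")."""
    def replace(match):
        key = match.group(1).strip()
        if key in variables:
            return str(variables[key])
        if missing == "error":
            raise KeyError(f"missing template variable: {key}")
        return match.group(0)  # "ignore": keep the raw token in place
    return re.sub(r"\{\{(.*?)\}\}", replace, template)

render_template("You are an agent for {{company}}.", {"company": "Acme"})
# -> "You are an agent for Acme."
```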

Return value

Returns a Prompt object with:
Field          Type   Description
content        str    The rendered prompt content
version        int    Version number
version_id     str    Version UUID
tag            str    Tag this version was fetched from
is_latest      bool   Whether this is the latest version
model          str    Model bound to this version (if any)
metadata       dict   Additional metadata
source         str    "api" or "fallback"
content_hash   str    SHA-256 hash of the content
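content_hash is a SHA-256 hex digest, the same 64-character format accepted by from_= when pinning a version. For illustration (assuming the content is hashed as UTF-8 bytes, which this page does not specify):

```python
import hashlib

def sha256_of(content):
    # A SHA-256 hex digest is always 64 hexadecimal characters.
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

digest = sha256_of("You are a helpful assistant.")
len(digest)  # 64
```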

Model Deployments

When you deploy a model to a prompt version in the dashboard, the SDK automatically patches the model parameter in your LLM calls:
system_prompt = ze.prompt(
    name="support-bot",
    content="You are a helpful customer support agent."
)

response = client.chat.completions.create(
    model="gpt-4",  # Gets replaced with the deployed model
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Hello"}
    ]
)

Multi-Artifact Runs

When a single prompt-linked run produces multiple judged outputs (e.g. a final decision and a visual card), use ze.artifact_span to mark each output as a named artifact. The primary artifact becomes the default completion preview; secondary artifacts are accessible in the detail view.
with ze.artifact_span(
    name="final-decision",
    artifact_type="final_decision",
    role="primary",
    label="Final Decision",
) as s:
    s.set_io(input_data=ticket_text, output_data=decision_json)
See the full API and examples in the Python Tracing Reference.

Manual Prompt-Linked Spans

When you want ze.prompt() to manage a specific LLM interaction but do not want global auto-instrumentation (e.g. your agent makes many LLM calls and only one should be tracked as a prompt generation), disable integrations and create the span yourself.

When to use this

  • Your codebase makes many LLM calls but only a subset should appear as prompt completions.
  • You use a provider that has no auto-integration (a custom HTTP endpoint, an internal model service, etc.).
  • You want full control over which codepath produces prompt-linked traces.

Setup

Disable all integrations, then initialize:
import zeroeval as ze

ze.init(disabled_integrations=["openai", "gemini", "langchain", "langgraph"])

Create a prompt-linked span

Call ze.prompt() inside an active span so the SDK writes task / zeroeval metadata onto the trace. Then open a child span with kind="llm" (or use ze.artifact_span()) around your provider call. The SDK automatically propagates prompt linkage to child spans in the same trace.
import zeroeval as ze

def handle_request(user_message: str):
    with ze.span(name="support-pipeline"):
        system_prompt = ze.prompt(
            name="customer-support",
            content="You are a helpful support agent for {{company}}.",
            variables={"company": "Acme"},
            from_="explicit",
        )

        # The <zeroeval> tag is embedded in the string returned by ze.prompt().
        # Strip it before sending to the provider if needed.
        clean_content = system_prompt.split("</zeroeval>", 1)[-1]

        with ze.span(name="llm.chat", kind="llm", attributes={
            "provider": "my-custom-provider",
            "model": "gpt-4o",
        }) as llm_span:
            response = my_provider.chat(
                messages=[
                    {"role": "system", "content": clean_content},
                    {"role": "user", "content": user_message},
                ]
            )
            llm_span.set_io(
                input_data=user_message,
                output_data=response.text,
            )
The SDK automatically stamps prompt metadata (task, zeroeval.prompt_version_id, etc.) onto every span in the trace, so the inner llm span is linked to the prompt version without any extra wiring. Judge evaluations, feedback, and the prompt completions page all work as if an auto-integration created the span.

You can also use ze.artifact_span() instead of ze.span(kind="llm") when you want the output to appear as a named completion artifact:
with ze.artifact_span(
    name="support-reply",
    artifact_type="reply",
    role="primary",
    label="Support Reply",
) as s:
    response = my_provider.chat(messages=[...])
    s.set_io(input_data=user_message, output_data=response.text)

Sending Feedback

Attach feedback to completions to power prompt optimization:
ze.send_feedback(
    prompt_slug="support-bot",
    completion_id=response.id,
    thumbs_up=True,
    reason="Clear and concise response"
)
Parameter        Type   Required   Description
prompt_slug      str    Yes        Prompt name (same as used in ze.prompt())
completion_id    str    Yes        UUID of the completion
thumbs_up        bool   Yes        Positive or negative feedback
reason           str    No         Explanation of the feedback
expected_output  str    No         What the output should have been
metadata         dict   No         Additional metadata