What is Seer?
Seer is a retrieval evaluation & monitoring platform that helps teams keep RAG/context pipelines accurate, auditable, and fast — without labeled data.
Why this matters: Wrong or missing context is the #1 source of bad AI answers. Seer continuously measures retrieval quality from unlabeled live traffic, so you can test changes faster, ship them confidently, and catch regressions early.
What Seer Does
- Log retrieval events from your app using a lightweight SDK
- Compute metrics automatically from unlabeled inputs (see the sketch after this list):
  - Recall: Our model enumerates the minimal requirements needed to answer a question and checks which are supported by the retrieved context. `recall = covered_requirements / total_requirements`
  - Precision: the proportion of retrieved documents in the context that are useful for answering the question/task. `precision = relevant_docs / total_docs`
  - F1, nDCG: Derived from recall and precision
- Compare variants using `feature_flag` in metadata for A/B testing
- Monitor production with configurable sampling and real-time dashboards
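To make the recall and precision formulas concrete, here is a small arithmetic sketch in plain Python. This is not the Seer SDK; in practice the requirement-coverage and relevance judgments come from Seer's evaluation model, and the counts below are made up for illustration:

```python
# Hypothetical judgments for one logged event (Seer's model produces these).
covered_requirements = 2   # requirements supported by the retrieved context
total_requirements = 3     # minimal requirements needed to answer the question
relevant_docs = 2          # retrieved documents useful for the answer
total_docs = 5             # documents in the retrieved context

recall = covered_requirements / total_requirements   # 0.67: context is incomplete
precision = relevant_docs / total_docs               # 0.40: some context bloat
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of the two

print(f"recall={recall:.2f} precision={precision:.2f} f1={f1:.2f}")
```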
How Seer Fits Your Stack
| Tool Type | Example | Seer's Role |
|---|---|---|
| Tracing | LangSmith, Arize | Seer focuses on retrieval quality signals, not full trace introspection. Use both together. Seer is OTEL-compatible and automatically picks up trace context (see the sketch below the table). |
| Core Search Infra | Pinecone, Weaviate, Exa | These handle search. Seer evaluates whether the results were sufficient and helps you tune your search infra for your product's specific needs. |
| Observability | Datadog, Grafana | Seer provides domain-specific retrieval metrics that complement general APM. |
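A minimal sketch of the tracing interplay, assuming Seer attaches the active OpenTelemetry trace context to logged events as described in the table; the span name and context contents here are illustrative:

```python
from opentelemetry import trace
from seer import SeerClient

tracer = trace.get_tracer(__name__)
client = SeerClient()  # reads SEER_API_KEY from env

# Log the retrieval inside an active OTEL span; Seer is described as picking up
# the trace context automatically, so related retrievals can be grouped by trace.
with tracer.start_as_current_span("answer_question"):
    context = ["Christopher Nolan directed Inception."]
    client.log(task="Who directed Inception?", context=context)
```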
Where Seer Fits In Your App
Seer integrates as a lightweight sidecar to your retrieval step:
- Your app retrieves context from vector DBs, search APIs, or agent tools
- After retrieval, call `client.log()` with the query and results
- Seer evaluates quality asynchronously — no impact on latency
Quick Example
from seer import SeerClient
client = SeerClient() # reads SEER_API_KEY from env
# Your retrieval step (vector DB, search API, agent tool, etc.)
def retrieve(query: str) -> list[dict]:
return [
{"text": "Christopher Nolan directed Inception.", "score": 0.95},
{"text": "Nolan is British-American.", "score": 0.89},
]
query = "Who directed Inception and what is their nationality?"
context = retrieve(query)
client.log(
task=query,
context=context,
metadata={
"env": "prod",
"feature_flag": "retrieval-v1", # for A/B testing
},
)
# Events are sent automatically in the background (fire-and-forget mode)
The SDK auto-flushes pending events when your process exits normally. You only need to call `client.flush()` explicitly if you use `os._exit()`, process pools, or short-lived scripts.
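For example, in a short-lived script you might flush explicitly before the process terminates (a minimal sketch using the same calls shown above):

```python
from seer import SeerClient

client = SeerClient()  # reads SEER_API_KEY from env

client.log(
    task="Who directed Inception?",
    context=["Christopher Nolan directed Inception."],
    metadata={"env": "dev"},
)

# In short-lived scripts, process pools, or before os._exit(), flush explicitly
# so pending background events are sent before the process terminates.
client.flush()
```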
Key Concepts
Task
The user query or question that triggered the retrieval.
Context
The list of passages/documents returned by your retriever. Can be simple strings or objects with metadata (see the sketch at the end of this section).
Metadata
Free-form key-value dict for filtering. Common fields:
- `env` — environment (prod, staging, dev)
- `feature_flag` — A/B test variant
Recall
The fraction of requirements needed to answer the query that are covered by the context. recall = 1.0 means the context is complete.
Precision
The fraction of context passages that actually contribute to the answer. Low precision = context bloat.
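Putting task, context, and metadata together, both context formats are logged the same way. A minimal sketch; the string form and the `text`/`score` object form follow the Quick Example above:

```python
from seer import SeerClient

client = SeerClient()

# Context as simple strings
client.log(
    task="Who directed Inception?",
    context=["Christopher Nolan directed Inception."],
    metadata={"env": "staging"},
)

# Context as objects with metadata, as in the Quick Example
client.log(
    task="Who directed Inception and what is their nationality?",
    context=[
        {"text": "Christopher Nolan directed Inception.", "score": 0.95},
        {"text": "Nolan is British-American.", "score": 0.89},
    ],
    metadata={"env": "prod", "feature_flag": "retrieval-v1"},
)
```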
Multi-Hop & Agentic Retrieval
For multi-step retrieval (decomposed queries, agent loops), Seer supports:
- `is_final_context` — Mark which retrieval step provides the final evidence for the answer
- `subquery` — Track decomposed sub-questions for per-hop evaluation
- Trace linking — Automatic OTEL trace context to group related retrievals
# Example: Multi-hop retrieval
client.log(
task="What awards did the director of Inception win?",
context=hop2_results,
subquery="What awards did Christopher Nolan win?", # this hop's goal
is_final_context=True, # final evidence for the answer
)
Learn more → Multi-Hop Retrieval Guide
Use Cases
| Use Case | How Seer Helps |
|---|---|
| Change Testing | Compare retrieval variants (top k, rerankers, hybrid) before shipping |
| Production Monitoring | Track recall/precision trends, catch drift early |
| Citation Auditing | Verify answers cite the right sources for compliance |
| Index Gap Analysis | Find information categories that need improvement in your knowledge bases |
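For change testing, a common pattern is to log the same traffic through each retrieval variant and tag the events with `feature_flag`, then compare recall/precision per variant before shipping. A minimal sketch; the `top_k` parameter, retriever stub, and flag names are illustrative:

```python
from seer import SeerClient

client = SeerClient()

# Hypothetical retrieval step with a tunable top_k (illustrative only)
def retrieve(query: str, top_k: int) -> list[dict]:
    results = [
        {"text": "Christopher Nolan directed Inception.", "score": 0.95},
        {"text": "Nolan is British-American.", "score": 0.89},
    ]
    return results[:top_k]

query = "Who directed Inception and what is their nationality?"

# Log the same query under two variants; filter by feature_flag in the dashboard
# to compare recall/precision between them.
for flag, top_k in [("retrieval-top1", 1), ("retrieval-top2", 2)]:
    client.log(
        task=query,
        context=retrieve(query, top_k=top_k),
        metadata={"env": "staging", "feature_flag": flag},
    )
```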
Getting Started
- Quickstart — Make your first log in 5 minutes
- Python SDK — Full SDK reference
- Metrics — Understand what Seer computes
- Change Testing — A/B test retrieval changes
- Production Monitoring — Set up ongoing monitoring