
Monitoring Dashboard

Make your retrieval performance measurable. Set SLOs, track trends over time, and catch regressions before they impact users—all with low-overhead sampling.

Prerequisites: You've completed the Quickstart and understand the Context & Event Schema.


What You Get

  • Evaluator-defined Recall on unlabeled traffic (flag queries with recall < 1.0)
  • Precision (proxy): Ratio of supporting passages to total context (detects context bloat)
  • F1 derived from recall and precision, plus nDCG for ranking quality when scores are provided (see the sketch below)
  • P95 Latency tracking for evaluation time
  • Trend charts to catch drift over time
  • Environment filtering to compare prod vs staging vs dev
  • Evaluator Accuracy (when ground truth is provided)
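
For reference on how the derived quality numbers combine, here is a minimal sketch of the arithmetic (not the evaluator's internal implementation):

def precision_proxy(supporting: int, total: int) -> float:
    # Fraction of retrieved passages judged as supporting the answer
    return supporting / total if total else 0.0

def f1(recall: float, precision: float) -> float:
    # Harmonic mean of recall and precision
    return 2 * recall * precision / (recall + precision) if (recall + precision) else 0.0

# Example: 3 of 5 passages support the answer and recall is 0.8
# precision_proxy(3, 5) == 0.6, f1(0.8, 0.6) ≈ 0.69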

Enable Monitoring

Log each retrieval from your request path and control volume with sample_rate to manage costs:

from seer import SeerClient
import os

client = SeerClient()

def retrieve(query: str) -> list[dict]:
    # your real retriever here
    return [{"text": "...", "score": 0.9}]

def handle_request(query: str):
    context = retrieve(query)

    client.log(
        task=query,
        context=context,
        metadata={
            "env": os.getenv("APP_ENV", "prod"),
            "feature_flag": "retrieval-v1",
        },
        sample_rate=0.10,  # 10% of calls are evaluated
    )

Sampling Guidance

| Use Case | Recommended sample_rate |
| --- | --- |
| Change testing | 1.0 (100%) for test queries |
| High-volume production | 0.05 - 0.10 (5-10%) |
| Low-volume or critical | 0.25 - 0.50 (25-50%) |
| Debugging | 1.0 temporarily |
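
One way to wire this guidance into code is to pick the rate from the deployment environment. This is a sketch with illustrative values: the environment names match the APP_ENV variable used above, and the rates are assumptions to tune to your traffic.

import os

# Map environment to a sampling rate, loosely following the table above
SAMPLE_RATES = {
    "prod": 0.05,     # high-volume production
    "staging": 0.25,  # lower volume, sample more heavily
    "dev": 1.0,       # evaluate everything while testing changes
}

def current_sample_rate() -> float:
    return SAMPLE_RATES.get(os.getenv("APP_ENV", "prod"), 0.05)

Pass the result as sample_rate to client.log or seer_trace.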

Smart Sampling with Decorator

Use the @seer_trace decorator to log retrieval calls automatically, with metadata and sampling configured in one place:

from seer import seer_trace
import os

@seer_trace(
    task_arg="query",
    context_from_return=True,
    metadata={"env": os.getenv("APP_ENV", "prod")},
    sample_rate=0.10,  # 10% sampling
)
def retrieve(query: str) -> list[dict]:
    return [{"text": "...", "score": 0.9}]

# Logs automatically when called
results = retrieve("user question...")
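
Because context_from_return=True, the decorated function's return value is logged as the context, so it should return the same list-of-dicts shape that client.log expects.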

The Monitoring Dashboard


KPI Cards

At the top, you'll see summary metrics for the selected period:

| Metric | Description |
| --- | --- |
| Recall | Average fraction of requirements covered |
| Precision | Average fraction of supporting documents |
| F1 | Harmonic mean of recall and precision |
| nDCG | Ranking quality (if scores provided) |
| P95 Latency | 95th percentile evaluation time |
| Evaluator Accuracy | F1 against your ground truth (shown when gold data exists) |

Each card shows delta vs. previous period (e.g., +2.1% week-over-week).

Trend Charts

The trend chart has three tabs:

| Tab | Metrics | Scale |
| --- | --- | --- |
| Quality | Recall, Precision, F1, nDCG, Subquery Effectiveness | 0-100% |
| Latency | P50, P95 evaluation latency | milliseconds |
| Structure | Trace Depth (multi-hop only) | count |

Toggle individual metrics on/off to focus your analysis.

Latency Tab


Track evaluation latency percentiles over time to detect slowdowns or backend issues.

Structure Tab


For multi-hop retrieval, track trace depth to understand query complexity trends.

Filtering

Filter your view using:

  • Environment: Select a specific env (prod, staging, dev)
  • Period: 24 hours, 7 days, or 30 days

Coming Soon

We're adding support for filtering by feature_flag and custom metadata fields directly in the Monitoring dashboard. For now, use the A/B Testing page to compare feature flags.


Cost Management

  • Sampling keeps evaluation cost predictable. Start at 0.05 and tune up if needed.
  • SDK batching minimizes request overhead (events are queued and sent in batches).
  • Async mode (default) — logging never blocks your request path.

Estimated Costs

| Monthly Evaluations | Seer Cost |
| --- | --- |
| 100k | ~$16-20 |
| 1M | ~$160-200 |
| 10M | ~$1,600-2,000 |
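
As a back-of-the-envelope check, the table implies roughly $0.16-0.20 per 1,000 evaluations, so you can estimate spend from your traffic and sampling rate (illustrative numbers, not a quote):

monthly_requests = 10_000_000
sample_rate = 0.05

evaluations = monthly_requests * sample_rate  # 500,000 evaluated events
low, high = evaluations / 1000 * 0.16, evaluations / 1000 * 0.20
print(f"~${low:,.0f}-${high:,.0f} per month")  # ~$80-$100 per month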

Privacy Considerations

  • You control what you send. If passages are sensitive, include only what's needed for evaluation.
  • Use metadata to tag records with access boundaries (e.g., collection, tenant_id) for future filtering.
  • Consider truncating or summarizing very long passages (see the sketch below).
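
For example, a minimal sketch that truncates passage text before logging (the character cap and metadata values are placeholders; keep enough text for the evaluator to judge support):

from seer import SeerClient

client = SeerClient()

MAX_CHARS = 1000  # placeholder cap on passage length

def redact_context(context: list[dict]) -> list[dict]:
    # Send only what's needed for evaluation: truncated text plus the retrieval score
    return [{"text": p["text"][:MAX_CHARS], "score": p.get("score")} for p in context]

def handle_request(query: str):
    context = retrieve(query)

    client.log(
        task=query,
        context=redact_context(context),
        metadata={"env": "prod", "collection": "kb-internal"},
        sample_rate=0.10,
    )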

Example: Tier-Based Sampling

from seer import SeerClient

client = SeerClient()

def handle_request(query: str, user_tier: str):
    context = retrieve(query)

    # Higher sampling for enterprise customers
    rate = 0.25 if user_tier == "enterprise" else 0.05

    client.log(
        task=query,
        context=context,
        metadata={
            "env": "prod",
            "user_tier": user_tier,
            "feature_flag": "retrieval-v1",
        },
        sample_rate=rate,
    )
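
Weighting sampling toward enterprise traffic gives tighter estimates where regressions matter most, while the lower base rate keeps overall evaluation spend predictable.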

See Also