Operations: observability
A loyalty engine that you cannot debug at 2am is not production-grade. This page covers what to log on your side, which Bricqs metrics matter, what to alert on, and how to trace an event from POST to webhook delivery.
Key takeaways
- Log every event POST and webhook delivery on your side. Bricqs logs its side; you need yours.
- Track three Bricqs metrics: ingestion lag, webhook delivery success, and rules-engine evaluation time.
- Alert on three things: 5xx error rate, webhook delivery failures, and rules-engine latency above SLA.
- Tag every log line with idempotency_key. It is the universal trace id across the stack.
- Sample dashboards ship in the Bricqs dashboard. Mirror them in Grafana or Datadog if you operate there.
Logging
What to log on your side
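export async function emitToBricqs(/* ... */) {
  // participant_id, event_type and idempotency_key come from the elided
  // arguments; `log` is your own structured logger.
  const start = Date.now();
  let status: number | undefined;
  let request_id: string | null = null;
  try {
    const res = await fetch(/* ... */);
    status = res.status;
    // Bricqs returns x-bricqs-request-id on every response; keep it next to
    // idempotency_key so a single grep finds both.
    request_id = res.headers.get("x-bricqs-request-id");
    // fetch does not reject on 4xx/5xx, so HTTP errors are logged here via `status`.
    const body = await res.json();
    log.info({
      event: "bricqs.event.posted",
      participant_id,
      event_type,
      idempotency_key,
      request_id,
      status,
      duplicate: body.duplicate,
      latency_ms: Date.now() - start,
    });
    return body;
  } catch (err) {
    // Network errors and unparseable response bodies land here.
    log.error({
      event: "bricqs.event.failed",
      participant_id,
      event_type,
      idempotency_key,
      request_id,
      status, // undefined if no response was received
      error: err instanceof Error ? err.message : String(err),
      latency_ms: Date.now() - start,
    });
    throw err;
  }
}

Tag every log line with idempotency_key. When something goes wrong for one user, you grep a single key across your logs, the Bricqs logs, and webhook delivery records.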
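Passing the key by hand to every log call works, but it is easy to forget on a new code path. One option is to bind the key once per request and log through the bound object. The sketch below is hypothetical (withTrace is not a Bricqs SDK function); it writes JSON lines to an injectable sink, defaulting to console.log, so you can route it into whatever logger you already run.

```typescript
// Hypothetical helper (not part of any Bricqs SDK): binds idempotency_key and any
// other per-request fields once, so every line logged through it carries the trace id.
type LogFields = Record<string, unknown>;
type Sink = (line: string) => void;

interface TracedLogger {
  info(fields: LogFields): void;
  error(fields: LogFields): void;
}

function withTrace(
  bound: { idempotency_key: string } & LogFields,
  sink: Sink = (line) => console.log(line),
): TracedLogger {
  const emit = (level: "info" | "error", fields: LogFields): void => {
    // Bound fields are spread after the caller's fields, so a call site cannot
    // overwrite the trace id; level and timestamp are always set by the helper.
    sink(JSON.stringify({ ...fields, ...bound, level, ts: new Date().toISOString() }));
  };
  return {
    info: (fields) => emit("info", fields),
    error: (fields) => emit("error", fields),
  };
}

// Usage: bind once per request, then log freely.
const traced = withTrace({ idempotency_key: "p_42:quiz_completed:2026-04-30" });
traced.info({ event: "bricqs.event.posted", latency_ms: 120 });
```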
Metrics
The three Bricqs metrics that matter
Ingestion lag
Time from event POST to fact persisted. Healthy under 500ms p95. Above 2s suggests a queue backlog; check the dashboard ingestion graph.
Rules-engine evaluation time
Time from fact persisted to all programs evaluated. Healthy under 1s p95. Spikes suggest a misconfigured challenge or contest with too many participants.
Webhook delivery success
Percentage of webhook deliveries returning 2xx on first attempt. Healthy above 95%. Below 90% means your endpoint is unreliable; review timeouts and retries.
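If you mirror these metrics in your own stack, each check reduces to a p95 over a window of samples (or a success ratio) compared against the cut-offs above. A minimal sketch, assuming you already collect raw latencies in milliseconds and one boolean per webhook delivery (true if the first attempt returned 2xx). The function names are illustrative, the 5s upper bound for rules-engine time is the alert threshold from the next section, and the intermediate "watch" band is an assumption, not a Bricqs definition.

```typescript
type Health = "healthy" | "watch" | "unhealthy";

// Nearest-rank percentile: sort ascending, take the value at rank ceil(p * n / 100).
// Returns NaN for an empty window, which falls through to "watch" below.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) return Number.NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Ingestion lag: healthy under 500 ms p95; above 2 s suggests a queue backlog.
function ingestionLagHealth(lagMs: number[]): Health {
  const p95 = percentile(lagMs, 95);
  if (p95 < 500) return "healthy";
  return p95 > 2000 ? "unhealthy" : "watch";
}

// Rules-engine evaluation time: healthy under 1 s p95; the alert threshold is p95 > 5 s.
function rulesEngineHealth(evalMs: number[]): Health {
  const p95 = percentile(evalMs, 95);
  if (p95 < 1000) return "healthy";
  return p95 > 5000 ? "unhealthy" : "watch";
}

// Webhook delivery success: share of deliveries that returned 2xx on the first attempt.
// Healthy above 95%; below 90% means the endpoint is unreliable.
function webhookHealth(firstAttemptOk: boolean[]): Health {
  const ok = firstAttemptOk.filter(Boolean).length;
  const rate = ok / firstAttemptOk.length; // NaN when there were no deliveries
  if (rate > 0.95) return "healthy";
  return rate < 0.9 ? "unhealthy" : "watch";
}
```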
Alerts
The three alerts every team needs
1. Bricqs 5xx error rate
Threshold: > 1% of POST /events for 5 consecutive minutes.
Action: page on-call. Check Bricqs status; pause non-critical event sources.
2. Webhook delivery failures
Threshold: 3+ failed deliveries to your endpoint in 10 minutes.
Action: page on-call for your webhook handler. The most common cause is a
timeout on your side; check downstream dependencies (CRM, ESP).
3. Rules-engine evaluation latency
Threshold: p95 > 5s for 10 minutes.
Action: review recently-launched challenges. A misconfigured count evaluator
on a high-volume event can saturate the engine. Pause and reconfigure.
Tracing
How to follow one event end-to-end
When a user reports "I did the thing but my points did not arrive":
1. Get the participant_id and the action (e.g. quiz_completed today).
2. Construct the likely idempotency_key:
p_<id>:quiz_completed:2026-04-30
3. grep your logs for that key.
4. Find the POST: status, duplicate flag, latency.
5. Open the Bricqs dashboard at /admin/events/search?idempotency_key=...
See the fact persisted, programs evaluated, points granted, webhooks fired.
6. If a webhook was meant to fire on your side, grep your inbound logs.
7. The break is usually obvious in step 4, 5, or 6.
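Steps 2, 3 and 5 are easy to script. The sketch below assumes the key follows the template in step 2 (p_<id>:<event_type>:<YYYY-MM-DD>) with the date taken as the UTC calendar day, and that your logs are JSON lines with an idempotency_key field, as in the logging example above. It builds only the relative dashboard path from step 5; prefix it with your own dashboard host. All three helpers are hypothetical.

```typescript
// Step 2: rebuild the likely key. Assumes the p_<id>:<event_type>:<YYYY-MM-DD>
// template and that the date part is the UTC calendar day; adjust if yours differs.
function likelyIdempotencyKey(participantId: string, eventType: string, when: Date): string {
  const day = when.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  return `p_${participantId}:${eventType}:${day}`;
}

// Step 3: the programmatic equivalent of grep over JSON-lines logs.
// Lines that are not JSON objects are skipped instead of aborting the search.
function linesForKey(logLines: string[], key: string): Record<string, unknown>[] {
  const hits: Record<string, unknown>[] = [];
  for (const line of logLines) {
    try {
      const record = JSON.parse(line) as Record<string, unknown>;
      if (record.idempotency_key === key) hits.push(record);
    } catch {
      // Not valid JSON, or not an object: ignore this line.
    }
  }
  return hits;
}

// Step 5: relative path for the dashboard event search, with the key URL-encoded.
function dashboardSearchPath(key: string): string {
  return `/admin/events/search?idempotency_key=${encodeURIComponent(key)}`;
}
```

The key is URL-encoded because the colons in the template are reserved characters in a query string.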
The idempotency_key is the universal trace id. Tag everything with it.
Dashboards
What ships in the Bricqs dashboard
Ingestion graph
Events received, accepted, deduped, failed. Per-minute resolution. Drill in by event_type or source.
Rules-engine load
Evaluations per second, p50/p95/p99 latency, top contributing programs. Helps spot a runaway challenge.
Webhook delivery
Per-subscription success rate, retry counts, failed deliveries with the response body. Click to redeliver.
Program health
Per-challenge enrolment, completion, drop-off. Per-contest scoring rate, top entries, fraud holds.
Common mistakes
What goes wrong
Logging the response body without redacting reward codes. Codes leak to the SIEM.
Mask code, secret, and authorization fields in your logger before shipping. The Bricqs SDK ships a redactor; reuse it.
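Where the SDK redactor is not available, for example in a service that does not depend on the SDK, a stand-in is short. This is not the Bricqs SDK's redactor; it is a minimal sketch that returns a deep copy of a log payload with any key named code, secret, or authorization (case-insensitive) replaced by a mask. Extend SENSITIVE_KEYS to match your own payloads.

```typescript
// Stand-in redactor (not the Bricqs SDK's): returns a deep copy with sensitive
// fields masked. Keys are matched by exact name, case-insensitively.
const SENSITIVE_KEYS = new Set(["code", "secret", "authorization"]);
const MASK = "[REDACTED]";

function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => redact(item));
  if (value !== null && typeof value === "object") {
    // Note: class instances such as Date are copied as plain objects (a Date
    // becomes {}); serialise them before redacting if that matters.
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
      out[k] = SENSITIVE_KEYS.has(k.toLowerCase()) ? MASK : redact(v);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}
```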
Alerting on individual webhook failures. Pager fatigue.
Alert on rate or burst, not on individual failures. Three failures in 10 minutes is worth a page; one failure in an hour is not.
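Both the burst rule and the rate rule fit in a few lines of state. A minimal in-memory sketch for a single process, assuming timestamps in epoch milliseconds; the class names are hypothetical, and the defaults are the thresholds from the alerts section: 3 failed webhook deliveries in 10 minutes, and more than 1% 5xx on POST /events in each of 5 consecutive minutes.

```typescript
// Burst detector: fires when at least `threshold` failures fall inside the trailing window.
class BurstAlert {
  private failures: number[] = [];
  private readonly threshold: number;
  private readonly windowMs: number;

  constructor(threshold = 3, windowMs = 10 * 60_000) {
    this.threshold = threshold;
    this.windowMs = windowMs;
  }

  // Record a failure at `now` (epoch ms); returns true when the alert should fire.
  recordFailure(now: number): boolean {
    this.failures.push(now);
    // Drop failures that have aged out of the trailing window.
    this.failures = this.failures.filter((t) => now - t < this.windowMs);
    return this.failures.length >= this.threshold;
  }
}

// Sustained-rate detector: fires when the 5xx share exceeds `maxRate` in every one
// of the last `minutes` complete one-minute buckets.
class SustainedRateAlert {
  private buckets = new Map<number, { total: number; errors: number }>();
  private readonly maxRate: number;
  private readonly minutes: number;

  constructor(maxRate = 0.01, minutes = 5) {
    this.maxRate = maxRate;
    this.minutes = minutes;
  }

  record(now: number, status: number): void {
    const minute = Math.floor(now / 60_000);
    const bucket = this.buckets.get(minute) ?? { total: 0, errors: 0 };
    bucket.total += 1;
    if (status >= 500 && status <= 599) bucket.errors += 1;
    this.buckets.set(minute, bucket);
    // Forget buckets that can no longer be part of the evaluated streak.
    this.buckets.forEach((_, key) => {
      if (key < minute - this.minutes) this.buckets.delete(key);
    });
  }

  // Checks the `minutes` complete buckets before the current minute.
  shouldFire(now: number): boolean {
    const current = Math.floor(now / 60_000);
    for (let m = current - this.minutes; m < current; m++) {
      const bucket = this.buckets.get(m);
      // A minute with no traffic, or at or under the limit, breaks the streak.
      if (!bucket || bucket.errors / bucket.total <= this.maxRate) return false;
    }
    return true;
  }
}
```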
No correlation between request id and idempotency key. Tracing requires multiple greps.
Log both. Bricqs returns x-bricqs-request-id on every response; persist it alongside idempotency_key.
Dashboards in Bricqs only. Your own SRE team has no visibility.
Mirror the three core metrics in Grafana or Datadog. The dashboard is for marketing ops; SRE needs its own view.
