feat(enterprise): add data drains for continuous export to S3 / webhook #4440
waleedlatif1 wants to merge 15 commits into staging
Conversation
PR Summary (High Risk Overview): Implements new server-side infrastructure: DB tables/migration for drains + run history, and org/plan/role gating (self-hosted opt-in via DATA_DRAINS_ENABLED). Adds destination implementations with security controls: S3 delivery with deterministic object keys + SSE and endpoint SSRF validation, and webhook delivery with DNS pinning, HMAC timestamp signatures, and retry/backoff behavior. Reviewed by Cursor Bugbot for commit 4fe0d0e.
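The HMAC timestamp signatures mentioned in the summary can be sketched as follows. This is a hedged illustration, not the PR's actual wire format: the header layout, the `timestamp.body` signing payload, and the 5-minute tolerance are assumptions.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto'

// Sign the payload together with a timestamp so receivers can reject replays.
function signWebhook(secret: string, timestamp: number, body: string): string {
  return createHmac('sha256', secret).update(`${timestamp}.${body}`).digest('hex')
}

// Verify: reject stale timestamps first, then compare digests in constant time.
function verifyWebhook(
  secret: string,
  timestamp: number,
  body: string,
  signature: string,
  nowMs: number,
  toleranceMs = 5 * 60_000
): boolean {
  if (Math.abs(nowMs - timestamp) > toleranceMs) return false // stale → replay risk
  const expected = Buffer.from(signWebhook(secret, timestamp, body), 'hex')
  const received = Buffer.from(signature, 'hex')
  return expected.length === received.length && timingSafeEqual(expected, received)
}
```

A receiver recomputes the signature from the raw body and the timestamp header; any change to either invalidates the digest.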
Data drains let enterprise organizations continuously export Sim data (workflow logs, job logs, audit logs, copilot chats, copilot runs) to customer-controlled S3 buckets or HTTPS webhooks on hourly or daily schedules. Pairs with data retention to satisfy long-term compliance archives.

- Built around two registries (DrainSource + DrainDestination), so adding a new source or destination is a single-file change.
- Cursor-based at-least-once delivery: the cursor advances only on full success, and rows carry stable ids so consumers can dedupe.
- Includes SSRF-validated webhooks with DNS pinning, HMAC-SHA256 timestamp signatures, S3 server-side encryption, audit logging on every config and run change, and self-hosted env var gating that mirrors data retention.
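The cursor-advance-on-full-success rule described above can be sketched in a few lines. This is a hypothetical simplification: the `DrainSource`/`DrainDestination` names come from the PR, but the interfaces and `runDrainOnce` helper are assumptions, not the real service code.

```typescript
interface Chunk { rows: { id: string }[]; nextCursor: string }
interface DrainSource { pages(cursor: string): AsyncIterable<Chunk> }
interface DrainDestination { deliver(body: string): Promise<void> }

// Returns the cursor to persist: advanced only if every chunk delivered.
async function runDrainOnce(
  source: DrainSource,
  dest: DrainDestination,
  cursorBefore: string
): Promise<string> {
  let candidate = cursorBefore
  try {
    for await (const chunk of source.pages(cursorBefore)) {
      await dest.deliver(JSON.stringify(chunk.rows)) // may throw mid-run
      candidate = chunk.nextCursor
    }
  } catch {
    // Any failure: keep cursorBefore, so the next run re-reads everything
    // since the last full success — at-least-once, deduped via stable row ids.
    return cursorBefore
  }
  return candidate
}
```

Because a partially delivered run never advances the cursor, consumers may see the same rows twice, which is why the stable ids matter.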
53850b2 to db5d298
Greptile Summary
Confidence Score: 5/5 — Safe to merge. All previously flagged P1/P0 issues have been addressed; only P2 style and minor responsiveness findings remain. No P0 or P1 findings in the current diff. The two remaining issues (backoff sleep not honouring the abort signal, contract-level endpoint schema looser than the destination schema) are P2 quality/style items that do not affect correctness or security. The core cursor logic, SSRF validation, access-gate checks, and abort propagation are all correctly implemented.
Important Files Changed: apps/sim/lib/data-drains/destinations/webhook.ts (backoff sleep), apps/sim/lib/api/contracts/data-drains.ts (endpoint schema consistency)
Sequence Diagram

```mermaid
sequenceDiagram
    participant Cron as Cron Route
    participant Dispatcher
    participant Queue as Job Queue
    participant Runner as run-data-drain Task
    participant Service as runDrain()
    participant Source as DrainSource
    participant Dest as DrainDestination
    Cron->>Dispatcher: dispatchDueDrains()
    Dispatcher->>Dispatcher: reapOrphanedRuns()
    Dispatcher->>DB: SELECT due drains
    loop per candidate
        Dispatcher->>Billing: isOrganizationOnEnterprisePlan()
        Dispatcher->>DB: UPDATE lastRunAt (claim)
        Dispatcher->>Queue: enqueue run-data-drain
    end
    Queue->>Runner: trigger task (signal)
    Runner->>Service: runDrain(drainId, trigger, {signal})
    Service->>DB: INSERT data_drain_runs (running)
    Service->>Dest: openSession()
    loop per chunk
        Service->>Source: pages() → chunk
        Service->>Dest: deliver(body, metadata, signal)
        Dest-->>Service: locator
    end
    alt success
        Service->>DB: UPDATE drain cursor + run status=success
    else failure / cancelled
        Service->>DB: UPDATE run status=failed, cursorAfter=cursorBefore
    end
    Service->>Dest: close()
```
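The run lifecycle in the diagram can be sketched as a small state machine. The stub names (`insertRun`, `updateRun`, `chunks`) are assumptions standing in for the real DB and registry calls; only the ordering — insert running row, open session, deliver per chunk, settle status, always close — follows the diagram.

```typescript
type RunStatus = 'running' | 'success' | 'failed'

async function runDrainLifecycle(deps: {
  insertRun(status: RunStatus): Promise<string>
  updateRun(id: string, status: RunStatus): Promise<void>
  openSession(): Promise<void>
  chunks(): AsyncIterable<string>
  deliver(body: string): Promise<void>
  close(): Promise<void>
}): Promise<RunStatus> {
  const runId = await deps.insertRun('running') // data_drain_runs row
  await deps.openSession()
  try {
    for await (const body of deps.chunks()) await deps.deliver(body)
    await deps.updateRun(runId, 'success')
    return 'success'
  } catch {
    await deps.updateRun(runId, 'failed') // cursorAfter = cursorBefore
    return 'failed'
  } finally {
    await deps.close() // close() runs on both branches, as in the diagram
  }
}
```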
Reviews (11): Last reviewed commit: "fix(data-drains): guard dispatcher rollb..."
…ument copilot_chats cursor
- Thread AbortSignal through webhook test() and secureFetchWithPinnedIP so the route's 10s timeout actually cancels the outbound request
- Re-validate destinationConfig against the typed schema in serializeDrain so unexpected JSONB shapes surface instead of leaking
- Note in docs that drains export rows once on creation cursor; mutable copilot_chats fields are a point-in-time snapshot
@greptile

@cursor review
…SRF, unused hook)
- webhook deliver: pass signal to secureFetchWithPinnedIP so aborts cancel the in-flight socket instead of waiting for the 30s timeout
- S3 config: SSRF-validate the optional endpoint via validateExternalUrl so an enterprise admin cannot point the AWS SDK at internal/metadata addresses
- hooks: remove unused useDataDrain (single-drain detail hook had no consumer)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
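The kind of internal/metadata-address check that validateExternalUrl performs might look like the sketch below. This is an assumption about the general technique, not the PR's implementation — in particular, the real code also does DNS pinning for non-IP hostnames, which is omitted here.

```typescript
// Reject URLs that point at loopback, private RFC 1918 ranges, link-local
// addresses, or the cloud metadata endpoint.
function isBlockedHost(raw: string): boolean {
  const { hostname, protocol } = new URL(raw)
  if (protocol !== 'https:' && protocol !== 'http:') return true
  if (hostname === 'localhost' || hostname === '169.254.169.254') return true // metadata
  const m = hostname.match(/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/)
  if (!m) return false // non-IP hostnames need DNS resolution + pinning (not shown)
  const [a, b] = [Number(m[1]), Number(m[2])]
  return (
    a === 10 || a === 127 || // 10.0.0.0/8, loopback
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) || // 192.168.0.0/16
    (a === 169 && b === 254) // link-local
  )
}
```

Without such a check, a hostile S3 "endpoint" could redirect the AWS SDK's signed requests to internal services.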
…, self-hosted gate)
- update body schema: drop the discriminated-union-with-.optional() that silently required destinationType for any non-undefined body. The route already validates destination payloads against the typed configSchema/credentialsSchema for the existing drain, so the contract is now a flat partial — clients can send {enabled:false} without supplying destinationType
- S3 buildKey: partition by run startedAt instead of new Date() per chunk so a single run that crosses midnight still lands under one YYYY/MM/DD prefix
- self-hosted gate: wire DATA_DRAINS_ENABLED into authorizeDrainAccess and the cron dispatcher route so the docs claim ("reserved for server-side feature gating") is actually enforced — mutating endpoints 404 and the dispatcher no-ops when unset
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
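The buildKey fix above can be sketched as follows. The exact key layout (drain id prefix, `chunk-NNNNN.ndjson` suffix) is an assumption; the point is that the date partition derives from the run's fixed `startedAt`, not from `new Date()` evaluated per chunk.

```typescript
// Partition object keys by the run's startedAt so a run that crosses
// midnight still lands under a single YYYY/MM/DD prefix.
function buildKey(drainId: string, startedAt: Date, chunkIndex: number): string {
  const y = startedAt.getUTCFullYear()
  const m = String(startedAt.getUTCMonth() + 1).padStart(2, '0')
  const d = String(startedAt.getUTCDate()).padStart(2, '0')
  return `${drainId}/${y}/${m}/${d}/chunk-${String(chunkIndex).padStart(5, '0')}.ndjson`
}
```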
…elf-hosted
isOrganizationOnEnterprisePlan returns false on deployments without billing infrastructure, so the dispatcher would silently skip every drain on self-hosted even with DATA_DRAINS_ENABLED=true. Mirror the access.ts pattern: when isBillingEnabled is false, treat all orgs as eligible — the cron route's DATA_DRAINS_ENABLED gate already controls global on/off.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
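The self-hosted fallback described in this commit reduces to one early return. A minimal sketch, assuming a helper shape that is not in the PR:

```typescript
// When billing is disabled (self-hosted without billing infra), every org is
// eligible; the DATA_DRAINS_ENABLED cron gate remains the global on/off switch.
async function isDrainEligible(
  orgId: string,
  isBillingEnabled: boolean,
  isOrganizationOnEnterprisePlan: (id: string) => Promise<boolean>
): Promise<boolean> {
  if (!isBillingEnabled) return true
  return isOrganizationOnEnterprisePlan(orgId)
}
```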
…cher
A throw from isOrganizationOnEnterprisePlan (Stripe outage, DB timeout) for one org used to propagate out of the for-loop and abort the whole dispatch batch. Wrap the check in try-catch so a single bad lookup just skips that drain — the next cron tick retries it.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
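The per-candidate isolation this commit describes can be sketched like so. The loop shape and return value are assumptions; only the "one bad lookup skips one drain" behavior follows the commit message.

```typescript
async function dispatchDueDrains(
  candidates: { id: string; orgId: string }[],
  isOrganizationOnEnterprisePlan: (orgId: string) => Promise<boolean>,
  enqueue: (drainId: string) => void
): Promise<string[]> {
  const skipped: string[] = []
  for (const drain of candidates) {
    let eligible = false
    try {
      eligible = await isOrganizationOnEnterprisePlan(drain.orgId)
    } catch {
      skipped.push(drain.id) // Stripe outage / DB timeout: retry next cron tick
      continue
    }
    if (eligible) enqueue(drain.id)
  }
  return skipped
}
```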
Docs promise 3 retries with 500ms/1s/2s backoff but MAX_ATTEMPTS=3 only delivered 2 retries (the 2s backoff was never reached). Bump to 4 so the initial attempt plus 3 retries match the published contract. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
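The fixed retry contract — one initial attempt plus three retries at 500ms/1s/2s — can be sketched as below. The doubling-delay loop is inferred from the commit message; the function and parameter names are assumptions.

```typescript
const MAX_ATTEMPTS = 4 // 1 initial attempt + 3 retries, per the published contract
const BASE_DELAY_MS = 500

async function deliverWithRetry<T>(
  attempt: () => Promise<T>,
  sleep: (ms: number) => Promise<void>
): Promise<T> {
  let lastErr: unknown
  for (let i = 0; i < MAX_ATTEMPTS; i++) {
    try {
      return await attempt()
    } catch (err) {
      lastErr = err
      // Back off 500ms, 1s, 2s; no sleep after the final attempt.
      if (i < MAX_ATTEMPTS - 1) await sleep(BASE_DELAY_MS * 2 ** i)
    }
  }
  throw lastErr
}
```

With the old MAX_ATTEMPTS=3, the loop slept only twice (500ms, 1s), so the documented 2s backoff was unreachable.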
Only role enforcement should relax for read-only callers — feature-flag and enterprise-plan checks must apply to reads too. Otherwise on self-hosted with DATA_DRAINS_ENABLED unset any org member can enumerate drain configs (bucket names, webhook URLs), and on Cloud an org that downgraded off Enterprise still exposes its old drain configs to every member. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two robustness fixes:
- PUT /data-drains/[drainId]: return 404 if returning() yields no row, i.e. a concurrent DELETE landed between loadDrain and the UPDATE.
- runDrain catch block: wrap the failed-status transaction so a DB outage during the status write doesn't mask the original delivery error. The reaper will eventually rewrite the row to failed.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Promise reject is idempotent so this wasn't a correctness bug, but routing the already-aborted branch through settledReject keeps all settling paths consistent and ensures cleanupAbort runs even if a listener somehow gets registered later. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
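A single-settling-path abortable sleep in the spirit of this comment might look like the sketch below. The settledReject/cleanup names follow the comment; everything else is an assumption about the backoff sleep's shape.

```typescript
function abortableSleep(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    let settled = false
    const timer = setTimeout(() => settledResolve(), ms)
    const onAbort = () => settledReject(new Error('aborted'))

    function cleanup() {
      clearTimeout(timer)
      signal?.removeEventListener('abort', onAbort)
    }
    function settledResolve() {
      if (settled) return
      settled = true; cleanup(); resolve()
    }
    function settledReject(err: Error) {
      if (settled) return
      settled = true; cleanup(); reject(err)
    }

    // Already-aborted branch goes through settledReject too, so every
    // settling path runs cleanup and the guard stays consistent.
    if (signal?.aborted) return settledReject(new Error('aborted'))
    signal?.addEventListener('abort', onAbort, { once: true })
  })
}
```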
- service: throw on cancellation after the pages loop so a run aborted mid-stream isn't recorded as success
- audit-logs: include org-scoped rows (workspace_id IS NULL with metadata->>organizationId match) alongside workspace rows
- access: require owner/admin for read routes too; drain configs leak bucket names and webhook URLs
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
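The access rule this commit settles on — feature flag and plan gating on every request, owner/admin role required for reads as well as writes — can be condensed into one function. authorizeDrainAccess is named in the PR; this flattened signature is an assumption.

```typescript
function authorizeDrainAccess(opts: {
  featureEnabled: boolean // self-hosted DATA_DRAINS_ENABLED gate
  onEnterprisePlan: boolean
  role: 'owner' | 'admin' | 'member'
}): boolean {
  if (!opts.featureEnabled) return false // routes 404 when the flag is unset
  if (!opts.onEnterprisePlan) return false // downgraded orgs lose access to stored configs
  // Reads included: drain configs leak bucket names and webhook URLs.
  return opts.role === 'owner' || opts.role === 'admin'
}
```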
…batch
If the rollback update threw (e.g. transient DB error), the exception bubbled out of the for loop and silently skipped the rest of the candidate drains for the cycle. Wrap it so the batch continues.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
✅ Bugbot reviewed your changes and found no new issues!
Summary
- DATA_DRAINS_ENABLED / NEXT_PUBLIC_DATA_DRAINS_ENABLED, mirroring data retention

Type of Change

Testing
- bun run check:api-validation passing

Checklist