feat(enterprise): add data drains for continuous export to S3 / webhook #4440

Open
waleedlatif1 wants to merge 15 commits into staging from waleedlatif1/data-drains

Conversation

@waleedlatif1
Collaborator

Summary

  • Continuously exports workflow logs, job logs, audit logs, copilot chats, and copilot runs to customer-owned S3 buckets or HTTPS webhooks on hourly or daily schedules
  • Pairs with data retention so customers can drain into long-term storage before Sim deletes
  • Built on two registries (DrainSource + DrainDestination) so future destinations are a single-file change
  • At-least-once delivery via opaque cursor that advances only on full success; consumers dedupe on stable row ids
  • SSRF-validated webhooks with DNS pinning, HMAC-SHA256 timestamp signatures, S3 server-side encryption, audit logging on every config and run change
  • Self-hosted gating via DATA_DRAINS_ENABLED / NEXT_PUBLIC_DATA_DRAINS_ENABLED, mirroring data retention
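
Because delivery is at-least-once, a run that fails partway and is replayed can resend rows the consumer has already stored. A minimal consumer-side sketch of deduping on row ids follows. It assumes each NDJSON line is a JSON object with a string `id` field; the field name, the `DrainRow` type, and the in-memory `Set` are illustrative assumptions, not taken from this PR.

```typescript
// Hypothetical consumer-side dedupe for at-least-once NDJSON delivery.
// Assumes every row carries a stable string `id`; the field name is illustrative.
type DrainRow = { id: string; [key: string]: unknown };

function dedupeNdjson(body: string, seen: Set<string>): DrainRow[] {
  const fresh: DrainRow[] = [];
  for (const line of body.split("\n")) {
    if (line.trim() === "") continue; // tolerate a trailing newline
    const row = JSON.parse(line) as DrainRow;
    if (seen.has(row.id)) continue; // replayed row from a retried run
    seen.add(row.id);
    fresh.push(row);
  }
  return fresh;
}
```

A production consumer would more likely rely on a unique key in its own store than on an in-memory set, but the idea is the same.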

Type of Change

  • New feature

Testing

  • 26 unit tests passing (service, dispatcher, sources, S3, webhook)
  • bun run check:api-validation passing
  • Manually tested S3 + webhook end-to-end including failure paths and cursor replay

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

@vercel

vercel Bot commented May 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs | Skipped | Skipped | May 5, 2026 5:22am |


@cursor

cursor Bot commented May 5, 2026

PR Summary

High Risk
Introduces new org-scoped API endpoints, background jobs, and database tables that handle credential encryption and external network delivery (S3/webhooks), making correctness and SSRF/authn/authz critical.

Overview
Adds Enterprise Data Drains: configurable exports of workflow/job/audit/copilot data as NDJSON to either S3 or HTTPS webhooks, with hourly/daily scheduling and a UI for creation, enable/disable, manual run, connection testing, and viewing recent runs.

Implements new server-side infrastructure: DB tables/migration for drains + run history, org/plan/role gating (self-hosted opt-in via DATA_DRAINS_ENABLED), audited CRUD/run/test API routes, a cron dispatcher (/api/cron/run-data-drains) that fans out Trigger.dev run-data-drain jobs, and a runner that advances an opaque cursor only on full success (at-least-once semantics).

Adds destination implementations with security controls: S3 delivery with deterministic object keys + SSE and endpoint SSRF validation, and webhook delivery with DNS pinning, HMAC timestamp signatures, retry/backoff behavior; also updates secureFetchWithPinnedIP to support AbortSignal. Docs and Helm values are updated to expose the new feature and env flags.

Reviewed by Cursor Bugbot for commit 4fe0d0e. Configure here.

Data drains let enterprise organizations continuously export Sim data
(workflow logs, job logs, audit logs, copilot chats, copilot runs) to
customer-controlled S3 buckets or HTTPS webhooks on hourly or daily
schedules. Pairs with data retention to satisfy long-term compliance
archives.

Built around two registries (DrainSource + DrainDestination) so adding
new sources or destinations is a single-file change. Cursor-based
at-least-once delivery; cursor advances only on full success and rows
carry stable ids so consumers can dedupe.

Includes SSRF-validated webhooks with DNS pinning, HMAC-SHA256 timestamp
signatures, S3 server-side encryption, audit logging on every config
and run change, and self-hosted env var gating that mirrors data
retention.
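
For readers implementing a webhook receiver, here is a rough sketch of verifying an HMAC-SHA256 timestamp signature. The exact scheme is not spelled out in this thread, so the signed string (`${timestamp}.${body}`), the hex encoding, the 300-second tolerance, and both function names are assumptions for illustration only.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed scheme (not confirmed by this PR):
// signature = hex(HMAC-SHA256(secret, `${timestamp}.${body}`)).
function signPayload(secret: string, timestamp: number, body: string): string {
  return createHmac("sha256", secret).update(`${timestamp}.${body}`).digest("hex");
}

function verifySignature(
  secret: string,
  timestamp: number,
  body: string,
  signature: string,
  nowSeconds: number,
  toleranceSeconds = 300, // illustrative replay window
): boolean {
  // Reject stale timestamps to limit replay of captured requests.
  if (Math.abs(nowSeconds - timestamp) > toleranceSeconds) return false;
  const expected = Buffer.from(signPayload(secret, timestamp, body), "hex");
  const received = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (expected.length !== received.length) return false;
  return timingSafeEqual(expected, received);
}
```

Binding the timestamp into the signed string is what lets the receiver reject an old, captured request even though its signature is otherwise valid.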
@greptile-apps
Contributor

greptile-apps Bot commented May 5, 2026

Greptile Summary

  • Adds a complete data-drain pipeline for continuous export of workflow logs, job logs, audit logs, copilot chats, and copilot runs to customer-owned S3 buckets or HTTPS webhooks, with hourly/daily scheduling, opaque cursor-based at-least-once delivery, and SSRF-validated destinations.
  • Implements a two-registry architecture (DrainSource + DrainDestination) enabling future destinations as a single-file change; all previously flagged issues (abort signal propagation, S3 SSRF endpoint validation, cursor commit on cancellation, orphaned run reaping, access-gate bypass on reads, and abort listener accumulation) have been addressed across prior fix commits.
  • Two remaining P2 items: the backoff sleep() in webhook.ts does not yield to the abort signal (up to 30 s delay on cancellation), and s3ConfigBodySchema in the public contract accepts HTTP endpoints at the contract boundary (HTTPS enforcement is deferred to the destination schema in the route handler).
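
One way the first P2 item could be addressed is a sleep that rejects as soon as the signal aborts. The sketch below is a generic pattern, not the code in webhook.ts; `abortableSleep` is a hypothetical name.

```typescript
// Hypothetical abort-aware replacement for a plain setTimeout-based sleep.
function abortableSleep(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    if (signal?.aborted) {
      reject(new Error("aborted"));
      return;
    }
    const onAbort = () => {
      clearTimeout(timer); // stop the pending timer so nothing fires later
      reject(new Error("aborted"));
    };
    const timer = setTimeout(() => {
      signal?.removeEventListener("abort", onAbort); // avoid listener buildup
      resolve();
    }, ms);
    signal?.addEventListener("abort", onAbort, { once: true });
  });
}
```

With this shape a cancelled run leaves the backoff immediately instead of waiting out the remaining delay.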

Confidence Score: 5/5

Safe to merge — all previously flagged P1/P0 issues have been addressed; only P2 style and minor responsiveness findings remain.

No P0 or P1 findings in the current diff. The two remaining issues (backoff sleep not honouring abort signal, contract-level endpoint schema looser than destination schema) are P2 quality/style items that do not affect correctness or security. The core cursor logic, SSRF validation, access-gate checks, and abort propagation are all correctly implemented.

apps/sim/lib/data-drains/destinations/webhook.ts (backoff sleep), apps/sim/lib/api/contracts/data-drains.ts (endpoint schema consistency)

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/sim/lib/data-drains/service.ts | Core drain orchestrator: cursor advances on success only, at-least-once delivery, aborted runs correctly marked failed, session always closed in finally block. |
| apps/sim/lib/data-drains/dispatcher.ts | Conditional-claim dispatch with orphan reaping, per-org enterprise check wrapped in try-catch, claim rollback on enqueue failure; all previously flagged issues addressed. |
| apps/sim/lib/data-drains/destinations/webhook.ts | SSRF-validated webhook with HMAC-SHA256 signatures, retry/backoff logic, and DNS pinning. Signal now threaded through; minor issue: backoff sleep (up to 30 s) does not honour the abort signal. |
| apps/sim/lib/data-drains/destinations/s3.ts | S3 destination with SSE-AES256, DNS-aware endpoint SSRF check, per-run date partitioning using runStartedAt, and clean client lifecycle via openSession/close. |
| apps/sim/lib/data-drains/access.ts | Feature-flag and enterprise-plan gates now apply to reads as well as writes; owner/admin role check remains gated on requireMutating. |
| apps/sim/lib/api/contracts/data-drains.ts | Well-structured API contracts; the s3ConfigBodySchema endpoint field accepts HTTP URLs without SSRF restriction at the contract boundary, enforced only at the destination schema layer in the route handler. |
| apps/sim/lib/data-drains/sources/cursor.ts | Composite (timestamp, id) cursor implemented correctly: no skips or duplicates at timestamp boundary; returns undefined for null cursor, which Drizzle's and() safely ignores. |
| packages/db/schema.ts | Adds data_drains and data_drain_runs tables with proper FK cascades and composite covering indexes for drain sources; new composite indexes on existing tables align with drain query patterns. |
| apps/sim/lib/core/security/input-validation.server.ts | Abort listener cleanup via settled resolve/reject wrappers correctly removes the listener on success, timeout, network error, and abort; prevents reference accumulation on long-lived signals. |
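
To illustrate why a composite (timestamp, id) cursor avoids skips and duplicates when several rows share a timestamp, here is an in-memory sketch. It does not use Drizzle and is not the code in cursor.ts; the `Row` and `Cursor` types and both helper names are hypothetical. In SQL the same predicate is commonly written as a row comparison such as `(created_at, id) > ($1, $2)`.

```typescript
// In-memory illustration of a composite (timestamp, id) keyset cursor.
type Row = { id: string; createdAt: number };
type Cursor = { createdAt: number; id: string } | null;

function isAfterCursor(row: Row, cursor: Cursor): boolean {
  if (cursor === null) return true; // no cursor yet: every row qualifies
  if (row.createdAt !== cursor.createdAt) return row.createdAt > cursor.createdAt;
  return row.id > cursor.id; // tie-break on id within the same timestamp
}

function nextPage(rows: Row[], cursor: Cursor, limit: number): { page: Row[]; next: Cursor } {
  // Order by (createdAt, id) so the cursor comparison matches the sort order.
  const sorted = [...rows].sort(
    (a, b) => a.createdAt - b.createdAt || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0),
  );
  const page = sorted.filter((r) => isAfterCursor(r, cursor)).slice(0, limit);
  const last = page[page.length - 1];
  return { page, next: last ? { createdAt: last.createdAt, id: last.id } : cursor };
}
```

A timestamp-only cursor would either re-send or drop rows when a page boundary falls inside a group of rows with the same timestamp; the id tie-break removes that ambiguity.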

Sequence Diagram

sequenceDiagram
    participant Cron as Cron Route
    participant Dispatcher
    participant Queue as Job Queue
    participant Runner as run-data-drain Task
    participant Service as runDrain()
    participant Source as DrainSource
    participant Dest as DrainDestination

    Cron->>Dispatcher: dispatchDueDrains()
    Dispatcher->>Dispatcher: reapOrphanedRuns()
    Dispatcher->>DB: SELECT due drains
    loop per candidate
        Dispatcher->>Billing: isOrganizationOnEnterprisePlan()
        Dispatcher->>DB: UPDATE lastRunAt (claim)
        Dispatcher->>Queue: enqueue run-data-drain
    end
    Queue->>Runner: trigger task (signal)
    Runner->>Service: runDrain(drainId, trigger, {signal})
    Service->>DB: INSERT data_drain_runs (running)
    Service->>Dest: openSession()
    loop per chunk
        Service->>Source: pages() → chunk
        Service->>Dest: deliver(body, metadata, signal)
        Dest-->>Service: locator
    end
    alt success
        Service->>DB: UPDATE drain cursor + run status=success
    else failure / cancelled
        Service->>DB: UPDATE run status=failed, cursorAfter=cursorBefore
    end
    Service->>Dest: close()
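
The success and failure branches of the diagram can be sketched as a small function. This is an illustration of the "advance the cursor only on full success" rule, not the PR's runDrain(); the `Page` and `Destination` shapes and the `runOnce` name are hypothetical.

```typescript
// Hypothetical shapes; the PR's real interfaces are not shown in this thread.
interface Page { body: string; cursor: string }
interface Destination {
  deliver(body: string, signal?: AbortSignal): Promise<void>;
  close(): Promise<void>;
}

async function runOnce(
  pages: AsyncIterable<Page>,
  dest: Destination,
  cursorBefore: string | null,
  signal?: AbortSignal,
): Promise<{ status: "success" | "failed"; cursorAfter: string | null }> {
  let pending = cursorBefore;
  try {
    for await (const page of pages) {
      if (signal?.aborted) throw new Error("cancelled");
      await dest.deliver(page.body, signal);
      pending = page.cursor; // tracked in memory only, not yet committed
    }
    // A run aborted after the last page must not be recorded as success.
    if (signal?.aborted) throw new Error("cancelled");
    return { status: "success", cursorAfter: pending };
  } catch {
    // Any failure or cancellation leaves the committed cursor untouched.
    return { status: "failed", cursorAfter: cursorBefore };
  } finally {
    await dest.close(); // always release the destination session
  }
}
```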

Reviews (11): Last reviewed commit: "fix(data-drains): guard dispatcher rollb..." | Re-trigger Greptile

Comment thread apps/sim/lib/data-drains/destinations/webhook.ts Outdated
Comment thread apps/sim/lib/data-drains/destinations/webhook.ts Outdated
Comment thread apps/sim/lib/data-drains/serializers.ts
Comment thread apps/sim/lib/data-drains/dispatcher.ts
Comment thread apps/sim/lib/data-drains/sources/copilot-chats.ts
…ument copilot_chats cursor

- Thread AbortSignal through webhook test() and secureFetchWithPinnedIP so the route's 10s timeout actually cancels the outbound request
- Re-validate destinationConfig against the typed schema in serializeDrain so unexpected JSONB shapes surface instead of leaking
- Note in docs that drains export rows once on creation cursor; mutable copilot_chats fields are a point-in-time snapshot
@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review

Comment thread apps/sim/lib/data-drains/destinations/webhook.ts
Comment thread apps/sim/ee/data-drains/hooks/data-drains.ts Outdated
Comment thread apps/sim/lib/data-drains/destinations/webhook.ts
Comment thread apps/sim/lib/data-drains/destinations/s3.ts
…SRF, unused hook)

- webhook deliver: pass signal to secureFetchWithPinnedIP so aborts cancel the in-flight socket instead of waiting for the 30s timeout
- S3 config: SSRF-validate the optional endpoint via validateExternalUrl so an enterprise admin cannot point the AWS SDK at internal/metadata addresses
- hooks: remove unused useDataDrain (single-drain detail hook had no consumer)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review

Comment thread apps/sim/lib/api/contracts/data-drains.ts Outdated
Comment thread apps/sim/lib/core/config/feature-flags.ts
Comment thread apps/sim/lib/data-drains/destinations/s3.ts Outdated
…, self-hosted gate)

- update body schema: drop the discriminated-union-with-.optional() that silently required destinationType for any non-undefined body. The route already validates destination payloads against the typed configSchema/credentialsSchema for the existing drain, so the contract is now a flat partial — clients can send {enabled:false} without supplying destinationType
- S3 buildKey: partition by run startedAt instead of new Date() per chunk so a single run that crosses midnight still lands under one YYYY/MM/DD prefix
- self-hosted gate: wire DATA_DRAINS_ENABLED into authorizeDrainAccess and the cron dispatcher route so the docs claim ("reserved for server-side feature gating") is actually enforced — mutating endpoints 404 and the dispatcher no-ops when unset

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
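
The buildKey change above can be pictured with a short sketch. Only the idea of partitioning by the run's start time rather than a per-chunk clock comes from the commit message; the key layout, the use of UTC, the zero-padded sequence, and the parameter names are assumptions.

```typescript
// Hypothetical key layout. The date prefix is derived once from the run's
// start time, so every chunk of a run shares the same YYYY/MM/DD partition.
function buildKey(prefix: string, runId: string, runStartedAt: Date, sequence: number): string {
  const yyyy = runStartedAt.getUTCFullYear();
  const mm = String(runStartedAt.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(runStartedAt.getUTCDate()).padStart(2, "0");
  const seq = String(sequence).padStart(6, "0");
  return `${prefix}/${yyyy}/${mm}/${dd}/${runId}-${seq}.ndjson`;
}
```

Because the key depends only on run-level inputs and the chunk sequence, a replayed chunk maps to the same object key instead of creating a second copy.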
@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review

Comment thread apps/sim/lib/data-drains/dispatcher.ts
…elf-hosted

isOrganizationOnEnterprisePlan returns false on deployments without billing
infrastructure, so the dispatcher would silently skip every drain on
self-hosted even with DATA_DRAINS_ENABLED=true. Mirror the access.ts pattern:
when isBillingEnabled is false, treat all orgs as eligible — the cron route's
DATA_DRAINS_ENABLED gate already controls global on/off.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Comment thread apps/sim/lib/data-drains/dispatcher.ts
…cher

A throw from isOrganizationOnEnterprisePlan (Stripe outage, DB timeout) for
one org used to propagate out of the for-loop and abort the whole dispatch
batch. Wrap the check in try-catch so a single bad lookup just skips that
drain — the next cron tick retries it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
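
A minimal sketch of the per-drain isolation described above, under the assumption that the eligibility lookup is an async function that may throw. `Drain`, `selectEligible`, and the `isEligible` callback are hypothetical stand-ins, not the dispatcher's real names.

```typescript
// Hypothetical dispatcher loop: one failing eligibility lookup skips that
// drain only, instead of aborting the whole batch.
type Drain = { id: string; organizationId: string };

async function selectEligible(
  drains: Drain[],
  isEligible: (organizationId: string) => Promise<boolean>,
): Promise<Drain[]> {
  const eligible: Drain[] = [];
  for (const drain of drains) {
    try {
      if (await isEligible(drain.organizationId)) eligible.push(drain);
    } catch (error) {
      // Skip this drain; the next cron tick will retry it.
      console.warn(`eligibility check failed for drain ${drain.id}`, error);
    }
  }
  return eligible;
}
```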
@waleedlatif1
Collaborator Author

@greptile

Comment thread apps/sim/lib/data-drains/destinations/webhook.ts Outdated
Docs promise 3 retries with 500ms/1s/2s backoff but MAX_ATTEMPTS=3 only
delivered 2 retries (the 2s backoff was never reached). Bump to 4 so the
initial attempt plus 3 retries match the published contract.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
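
The off-by-one is easy to see in a small retry loop. The sketch below uses the constants named in the commit message (4 attempts, 500 ms / 1 s / 2 s backoff) but is otherwise hypothetical: `deliverWithRetry` and `backoffMs` are illustrative names, and it retries on any thrown error, whereas the real webhook destination may only retry certain failures.

```typescript
const MAX_ATTEMPTS = 4; // initial attempt + 3 retries
const BASE_DELAY_MS = 500;

// Delay before retry n (1-based): 500 ms, 1 s, 2 s.
function backoffMs(retry: number): number {
  return BASE_DELAY_MS * 2 ** (retry - 1);
}

async function deliverWithRetry(
  attempt: () => Promise<void>,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<number> {
  let lastError: unknown;
  for (let n = 1; n <= MAX_ATTEMPTS; n++) {
    try {
      await attempt();
      return n; // number of attempts used
    } catch (error) {
      lastError = error;
      // No sleep after the final attempt; there is nothing left to retry.
      if (n < MAX_ATTEMPTS) await sleep(backoffMs(n));
    }
  }
  throw lastError;
}
```

With MAX_ATTEMPTS set to 3, the loop would sleep only twice (500 ms and 1 s), which is the mismatch with the documented contract that this commit fixes.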
@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review

Comment thread apps/sim/lib/data-drains/access.ts Outdated
Only role enforcement should relax for read-only callers — feature-flag
and enterprise-plan checks must apply to reads too. Otherwise on
self-hosted with DATA_DRAINS_ENABLED unset any org member can enumerate
drain configs (bucket names, webhook URLs), and on Cloud an org that
downgraded off Enterprise still exposes its old drain configs to every
member.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Comment thread apps/sim/app/api/organizations/[id]/data-drains/[drainId]/route.ts
Comment thread apps/sim/lib/data-drains/service.ts
Two robustness fixes:
- PUT /data-drains/[drainId]: return 404 if returning() yields no row,
  i.e. a concurrent DELETE landed between loadDrain and the UPDATE.
- runDrain catch block: wrap the failed-status transaction so a DB
  outage during the status write doesn't mask the original delivery
  error. The reaper will eventually rewrite the row to failed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
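
The second fix follows a common pattern: a best-effort status write inside the catch path, with the original error always rethrown. The sketch below is a generic illustration; `failRun` and the `markFailed` callback are hypothetical names rather than the service's actual code.

```typescript
// Hypothetical helper: record the failed status on a best-effort basis, then
// always rethrow the original delivery error.
async function failRun(original: unknown, markFailed: () => Promise<void>): Promise<never> {
  try {
    await markFailed();
  } catch (statusError) {
    // A DB outage here must not replace the error that caused the failure.
    console.error("could not record failed status", statusError);
  }
  throw original;
}
```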
@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review

Comment thread apps/sim/lib/core/security/input-validation.server.ts
Comment thread apps/sim/lib/data-drains/sources/cursor.ts
Promise reject is idempotent so this wasn't a correctness bug, but
routing the already-aborted branch through settledReject keeps all
settling paths consistent and ensures cleanupAbort runs even if a
listener somehow gets registered later.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
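
For context, the "settled wrapper" pattern mentioned here can be sketched as follows. The names settledResolve, settledReject, and cleanupAbort mirror the commit message, but the surrounding `withAbort` function is a hypothetical, simplified stand-in for the real fetch helper.

```typescript
// Simplified illustration: every settling path goes through a wrapper that
// removes the abort listener first, so long-lived signals do not accumulate
// listeners.
function withAbort<T>(start: () => Promise<T>, signal?: AbortSignal): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const onAbort = () => settledReject(new Error("aborted"));
    const cleanupAbort = () => signal?.removeEventListener("abort", onAbort);
    const settledResolve = (value: T) => { cleanupAbort(); resolve(value); };
    const settledReject = (error: unknown) => { cleanupAbort(); reject(error); };

    if (signal?.aborted) {
      // Already aborted: still route through the wrapper for consistency.
      settledReject(new Error("aborted"));
      return;
    }
    signal?.addEventListener("abort", onAbort, { once: true });
    start().then(settledResolve, settledReject);
  });
}
```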
@waleedlatif1
Collaborator Author

@cursor review

@waleedlatif1
Collaborator Author

@greptile

Comment thread apps/sim/lib/data-drains/service.ts
Comment thread apps/sim/lib/data-drains/dispatcher.ts Outdated
Comment thread apps/sim/lib/data-drains/sources/audit-logs.ts
Comment thread apps/sim/lib/data-drains/access.ts
- service: throw on cancellation after pages loop so a run aborted mid-stream
  isn't recorded as success
- audit-logs: include org-scoped rows (workspace_id IS NULL with
  metadata->>organizationId match) alongside workspace rows
- access: require owner/admin for read routes too; drain configs leak bucket
  names and webhook URLs

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@waleedlatif1
Collaborator Author

@cursor review

@waleedlatif1
Collaborator Author

@greptile

Comment thread apps/sim/lib/data-drains/dispatcher.ts Outdated
…batch

If the rollback update threw (e.g. transient DB error), the exception
bubbled out of the for loop and silently skipped the rest of the
candidate drains for the cycle. Wrap it so the batch continues.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
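
A rough sketch of the claim, enqueue, and guarded rollback sequence for a single drain. The three callbacks and the `Outcome` values are hypothetical; the point is only that neither an enqueue failure nor a rollback failure escapes to the caller's loop.

```typescript
type Outcome = "enqueued" | "not_claimed" | "enqueue_failed";

// Hypothetical per-drain step: never throws, so the batch loop keeps going.
async function claimAndEnqueue(ops: {
  claim: () => Promise<boolean>;
  enqueue: () => Promise<void>;
  rollback: () => Promise<void>;
}): Promise<Outcome> {
  if (!(await ops.claim())) return "not_claimed"; // another dispatcher won the race
  try {
    await ops.enqueue();
    return "enqueued";
  } catch (enqueueError) {
    console.error("enqueue failed, rolling back claim", enqueueError);
    try {
      await ops.rollback();
    } catch (rollbackError) {
      // Guarded: a transient DB error here must not abort the batch.
      console.error("claim rollback failed", rollbackError);
    }
    return "enqueue_failed";
  }
}
```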
@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review


@cursor cursor Bot left a comment


✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 4fe0d0e. Configure here.
