How We Build Security Integrations at Secure.com

The Same Problem, Twenty Times Over

A new vendor lands on the roadmap. You skim the docs. The auth is almost OAuth2 but with a quirk. Pagination is cursor-based, except for one endpoint that’s offset-based, and another that’s a time window. The schema looks superficially like the last vendor’s — same fields, different names, different nesting, different timestamp format, different idea of what counts as an “actor.” You write the extractor. You write the pagination loop. You write the schema mapping. You wire the checkpoints. You debug the rate-limit handling for a week because the vendor’s documented limits are not the real limits. Three weeks gone.

Then the next vendor arrives. And you do all of it again.

We built more than a dozen security tools into our platform before we admitted the obvious: we kept solving the same problem with different code. Different auth shims, different pagination wrappers, different per-vendor schemas, different observability glue. Each integration looked unique up close, but the silhouettes all rhymed. So we stopped treating every vendor as a snowflake and started treating integrations as composition — pick the pattern, plug in the bricks, fill in the vendor-specific blanks.

This article is the playbook we wish we’d had on day one. It’s our opinion, not a standard — your archetype list and your pack list will look different, and that’s fine. The discipline of having a taxonomy matters more than which one you pick. By the end you’ll have three things:

A mental model — security integrations as archetype + capability packs. The eight buckets we keep landing in. The seven packs we’ve found sufficient.
A worked example — GitHub Audit Log API, traced end-to-end through every concept, so you can see how the abstractions land in real code.
An 8-step checklist — what to do on Monday morning when the next vendor shows up. Copy it, paste it, work it.

Nothing revolutionary here. Just the patterns we wish someone had told us before we wrote the same code a dozen times.

The Archetype Mindset

The first claim of this playbook is the one that took us longest to accept: every integration we’ve shipped fell into one of eight buckets, and the buckets — not the vendors — are what we should be designing around.

An integration archetype is a category of vendor data that demands the same shape — schema, semantics, idempotency rules, replay model — regardless of which vendor produced it. The vendor varies only the source mapping; the target is fixed. Once we accepted this, “build a new integration” stopped meaning “design something new” and started meaning “pick one of eight buckets and fill in the mapping.” That single reframe is what makes the rest of the platform tractable.

After more than a dozen integrations we keep landing in the same eight buckets. Your taxonomy might land on six or eleven — the count matters less than the discipline of having one. Here are ours, with the OCSF target each one normalizes to and a few vendors that fall into each:

Eight Archetypes, Many Vendors

Every security integration we’ve shipped fits in one of these rows.

Archetype	Type	Vendors
audit-event-logs	api_activity · 6003	CloudTrail, Azure Activity, GitHub Audit, Okta System Log
findings	vulnerability_finding · 2002	Rapid7 InsightVM, Qualys, AWS Inspector, GuardDuty
alerts-incidents	detection_finding · 2004	CrowdStrike, SentinelOne, MS Defender, SIEM
identity-access	account_change · user_inventory	Okta, Entra ID, AWS IAM
OCSF still catching up:
inventory-snapshot	canonical asset register	Steampipe, ServiceNow CMDB, Kubernetes
ticketing-itsm	canonical work-item	Jira, ServiceNow, PagerDuty
posture-scores	canonical scorecard	BitSight, SecurityScorecard, Panorays
enrichment	canonical reference data	VirusTotal, Shodan, GreyNoise

A quick aside for anyone new to the schema column: OCSF is the Open Cybersecurity Schema Framework — an open standard originally proposed by Splunk and AWS and now developed in the open by a broad vendor community. It gives security data a shared vocabulary: every “API call” looks like an API Activity event (class_uid 6003), every “vulnerability” looks like a Vulnerability Finding (class_uid 2002), and so on. We don’t have to invent these contracts; we just have to map to them.

Coverage honest-up: OCSF covers four of these eight archetypes cleanly today — audit-event-logs (API Activity 6003), findings (Vulnerability Finding 2002), alerts-incidents (Detection Finding 2004), and identity-access (Account Change / User Inventory). The other four — inventory-snapshot, ticketing-itsm, posture-scores, and enrichment — either don’t have OCSF classes yet or have classes that only partially fit (OCSF’s Software Inventory Info 5020 covers SBOM-style package lists, not general cloud-asset or container inventory). For those four, we maintain small canonical schemas internally and watch the OCSF roadmap. The mapping gap is real but bounded.

The punchline. Archetypes pin the target schema. OCSF gives the shared vocabulary where it has coverage. Only the vendor mapping differs. Every audit log on Earth — CloudTrail, Azure Activity, GitHub, Okta — ends up in the same Silver table (api_activity_6003). New vendor, zero new dashboards.

This is also why classification is the first step of every integration. Before we touch auth or pagination, we run a four-question test against the source: does each record have an actor, an action, a resource or target, and a timestamp?

If yes, it’s an audit-event-log and the target is API Activity (6003). If instead each record describes a finding — a CVE, a CVSS score, an affected device — it’s the findings archetype (2002). If records describe current state of a resource rather than an event, it’s inventory-snapshot. The classification is rarely ambiguous in practice, and when it is, the ambiguity itself is a signal that the source is doing two things and should be split into two extractors.
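The four-question test is mechanical enough to sketch in code. The version below is a minimal illustration, not our production classifier: the field-name sets are hypothetical stand-ins (real vendors use many more names), and it looks only at top-level keys of one sample record.

```python
from typing import Mapping

# Hypothetical field groups; real vendors use many more names than these.
AUDIT_KEYS = {
    "actor": {"actor", "user", "userIdentity", "caller"},
    "action": {"action", "eventName", "operationName", "eventType"},
    "target": {"repo", "org", "resource", "resourceId", "target"},
    "timestamp": {"@timestamp", "eventTime", "published", "created_at"},
}
FINDING_KEYS = {"cve", "cve_id", "cvss", "cvss_score"}


def classify(record: Mapping[str, object]) -> str:
    """Return a best-guess archetype name for one sample record."""
    fields = set(record)
    # Four-question test: one hit in each group means audit-event-logs.
    if all(fields & names for names in AUDIT_KEYS.values()):
        return "audit-event-logs"
    if fields & FINDING_KEYS:
        return "findings"
    # No timestamp at all suggests a current-state record.
    if not fields & AUDIT_KEYS["timestamp"]:
        return "inventory-snapshot"
    return "unclassified"


github_event = {
    "@timestamp": 1700000000000,
    "actor": "octocat",
    "action": "repo.create",
    "repo": "example-org/example-repo",
}
print(classify(github_event))  # audit-event-logs
```

An "unclassified" result is the useful one: it is the prompt to look at the source by hand and decide whether it should be split into two extractors.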

Worked example — GitHub Audit Log. A GitHub audit event has an actor (the user or app that performed the action), an action (repo.create, team.add_member, org.update_member), a repo / org / team as the target resource, and a @timestamp. Four-for-four on the heuristic, so the archetype is audit-event-logs and the OCSF target is API Activity (class_uid 6003). Concretely: GitHub events land in the same Silver table as AWS CloudTrail and Azure Activity Log.

The “show me everything user X did across our stack in the last 24 hours” dashboard already exists — it just starts returning GitHub rows the moment our extractor turns on. The detection scaffolding our security team already wrote over api_activity_6003 (query surface, joins, time windows) fires against GitHub events on day one — only the per-vendor action vocabulary needs to be added, and that’s a small mapping table, not a new pipeline.

That’s the whole leverage of the archetype mindset. We don’t gain it by writing better code per vendor; we gain it by refusing to write a bespoke schema per vendor.

Capability Packs: The Lego Bricks

If archetypes pin the target, capability packs handle the journey to get there. A capability pack is a reusable solution to a cross-cutting concern that every integration has to solve — authenticating to the source, walking its pages, surviving its rate limits, remembering where it left off, tolerating its schema changes, watching it in production, and never leaking the credential it uses. We treat these concerns as pre-built Lego bricks: each pack has a stable interface, a small set of well-tested implementations, and a configuration surface narrow enough to fit on a page.

New integrations don’t reinvent them; they configure them. Each integration is archetype + a combo of packs, and the only thing that genuinely changes between vendors is which packs you pick and how you tune them.

There are seven we’ve found sufficient in our platform. When we thought we needed an eighth, the “new” concern usually turned out to be one of these seven wearing a different hat — though there are real exceptions we haven’t hit yet (data-quality validation, lineage, dead-letter queues for webhook receivers, schema-registry/contract testing all live just outside the seven below and become first-class when you scale).

Auth. API key, Bearer token, OAuth2 client credentials, IAM role, or workload identity. Prefer short-lived; cache in memory only; never write to disk or logs.
Pagination. Cursor, page-number, offset, time-window, or Link-header. Each has different idempotency and replay properties — choose with that in mind.
Rate limiting. Token-bucket throttle on the way out, exponential backoff with jitter on 429, and always respect Retry-After (or the vendor’s reset header) before you guess.
Checkpoints. A persisted watermark per tenant per integration, with a small overlap window so late-arriving events aren’t silently dropped on the next run.
Schema drift. Pin the silver-layer schema. Unknown fields route to an unmapped_properties JSON column. We do not ALTER TABLE on every vendor surprise.
Observability. Per-window event counts, p95 extraction latency, watermark lag, and dedup ratio — emitted with consistent tags so every integration shows up on the same dashboard.
Secrets hygiene. Credentials come from a managed secrets store, are scoped per tenant, never appear in git or logs, and are re-fetched at the start of each run so rotation is invisible.
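To make the rate-limiting pack concrete, here is a rough sketch of the "respect Retry-After before you guess" rule. The function name and defaults are assumptions for illustration. It handles only the delta-seconds form of Retry-After (the header may also carry an HTTP date) and does not read vendor-specific reset headers.

```python
import random
from typing import Mapping, Optional


def retry_delay(headers: Mapping[str, str], attempt: int,
                base: float = 1.0, cap: float = 60.0,
                rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before retrying a throttled (429) request."""
    retry_after = headers.get("Retry-After", "")
    # Only the delta-seconds form is handled here; an HTTP-date value
    # falls through to the backoff branch below.
    if retry_after.strip().isdigit():
        return float(retry_after)
    rng = rng or random.Random()
    # "Full jitter": pick uniformly between 0 and the capped exponential.
    return rng.uniform(0.0, min(cap, base * (2 ** attempt)))
```

The jitter matters in a multi-tenant platform: without it, every tenant that was throttled at the same moment retries at the same moment.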

A note on delivery mode

The article’s worked example pulls from a REST API on a schedule, which is one delivery mode among four common ones. Delivery mode is orthogonal to archetype — a findings source might arrive as a daily S3 export or as a webhook stream — and it changes which packs dominate.

Pull-batch (GitHub Audit, AWS Inspector, most SaaS REST APIs) — what most of this article describes. Watermarks and pagination dominate.
Push / webhook (Okta Event Hooks, GitHub webhooks, AWS EventBridge → HTTPS) — signature verification, idempotent receipt, and a dead-letter queue dominate; watermarks aren’t a thing.
Streaming (Kafka topics from a SIEM, syslog/CEF over TCP) — consumer-group offsets replace watermarks; backpressure dominates.
File-drop (vendor exports to S3 or SFTP) — manifest detection, partial-write protection, and schema-on-read dominate.

The seven packs above still apply across all four modes, but their configuration shifts. We mention this because the worked example below is pull-batch — don’t read its specifics as universal.
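For contrast with the pull-batch sketches, the two concerns that dominate push / webhook mode might look like the following. This is a sketch under assumptions: the `sha256=<hex>` header format matches what GitHub sends in `X-Hub-Signature-256`, other vendors sign differently, and the in-memory set stands in for whatever durable store a real receiver would use.

```python
import hashlib
import hmac


def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check an HMAC-SHA256 signature of the form 'sha256=<hex digest>'."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking where the two values first differ.
    return hmac.compare_digest(expected.encode(), signature_header.encode())


def receive(seen_ids: set, delivery_id: str) -> bool:
    """Idempotent receipt: return True only the first time an ID is seen."""
    if delivery_id in seen_ids:
        return False
    seen_ids.add(delivery_id)
    return True
```

The signature must be computed over the raw request bytes, before any JSON parsing, or re-serialization differences will make valid deliveries fail verification.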

GitHub Audit Log pack inventory

Put this against a real source and the abstraction stops being abstract. Here is the pack inventory we’d write for GitHub’s organization audit log (pull-batch over REST):

Pack	Choice
Auth	Bearer token — classic PAT with admin:org scope (or read:audit_log on GitHub Enterprise), or a GitHub App with Organization administration: read
Pagination	Cursor via Link: <…>; rel="next" response header
Rate limit	Subject to GitHub’s primary (5,000 req/hr for PATs) and secondary rate limits; installation tokens scale higher — see GitHub’s rate-limit docs
Checkpoints	Watermark on event timestamp, with overlap tuned to observed delivery latency (see Section 5 rule #3)
Schema drift	New audit action names appear constantly → route to unmapped_properties
Observability	Log events_received per window; alert on unexpectedly empty windows

That table is, almost in its entirety, the integration-specific configuration that GitHub demands of our platform. Everything the table does not mention inherits platform defaults: standard structured logging, the platform’s secrets store, the platform’s drift router, the platform’s retry policy. None of the rows above is GitHub-specific code; they are six dials on a pre-built chassis.
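One way to picture those dials is as a small, immutable configuration object. The sketch below is hypothetical: the class, its field names, and the 15-minute overlap are illustrative assumptions, not our platform's actual configuration surface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackConfig:
    """Per-integration dials; every field name here is illustrative."""
    provider: str
    archetype: str
    auth: str
    pagination: str
    rate_limit_per_hour: int
    checkpoint_field: str
    overlap_minutes: int
    drift_column: str = "unmapped_properties"
    empty_window_alert: bool = True


GITHUB_AUDIT = PackConfig(
    provider="github",
    archetype="audit-event-logs",
    auth="bearer",
    pagination="link_header",
    rate_limit_per_hour=5000,  # primary limit for PATs, per the table above
    checkpoint_field="@timestamp",
    overlap_minutes=15,  # assumed value; tune to observed delivery latency
)
```

Freezing the dataclass keeps the configuration declarative: an extractor can read the dials but cannot quietly change them mid-run.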

SDKs first

Before showing the code, one disclaimer: when the vendor provides a well-maintained SDK (boto3 for AWS, octokit for GitHub, okta-sdk-python, slack-sdk), use it. We only drop to raw HTTP when we need multi-tenant token rotation that the SDK doesn’t expose, fine-grained retry control beyond the SDK’s defaults, or consistent observability hooks across vendors. For a typical first integration, an SDK plus the seven packs is enough — the sketches below are illustrative of shape, not a recommendation to hand-roll HTTP for every source.

Auth — Bearer header with a token-provider abstraction

Security Pattern

Bearer Token Header Builder

def bearer_headers(token_provider):
    # token_provider is a callable returning a fresh, in-memory token.
    # It hides whether the source is a static PAT, OAuth2 refresh,
    # an IAM-issued token, or a per-tenant secrets lookup.
    token = token_provider()
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",
    }

Returns a standard Authorization header using a dynamically resolved bearer token source.

The token_provider is the seam. Swap a PAT for a GitHub App installation token, or for an OAuth2 client-credentials flow against another vendor, and nothing else in the extractor changes.
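As an illustration of what can sit behind that seam, here is a minimal sketch of a caching provider for short-lived tokens. The class name, the `fetch` contract (a callable returning a token and its lifetime in seconds), and the 60-second skew are all assumptions made for the example.

```python
import time
from typing import Callable, Tuple


class CachedTokenProvider:
    """Callable that returns a token, refetching shortly before expiry."""

    def __init__(self, fetch: Callable[[], Tuple[str, float]],
                 skew_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch          # returns (token, lifetime_in_seconds)
        self._skew = skew_seconds
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def __call__(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._skew:
            token, lifetime = self._fetch()
            self._token = token       # held in memory only, never logged
            self._expires_at = now + lifetime
        return self._token
```

Because the instance is itself a zero-argument callable, it can be passed to `bearer_headers` unchanged, and the injectable clock keeps expiry logic testable without sleeping.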

Pagination — Link-header cursor as a generator

Pagination Utility

Iterative API Pagination Handler

import re

def iter_pages(session, url, headers):
    while url:
        resp = session.get(url, headers=headers)
        resp.raise_for_status()
        yield resp.json()
        link = resp.headers.get("Link", "")
        # Illustrative only: a naive regex that does not handle the full
        # RFC 8288 Link-header grammar (quoted strings with commas,
        # multiple link-values, alternate parameter ordering, etc.).
        #
        # In production use a real parser:
        # requests.utils.parse_header_links
        # or httpwg structured-headers implementation
        m = re.search(r'<([^>]+)>;\s*rel="next"', link)
        url = m.group(1) if m else None

Handles paginated API responses using Link headers with safe next-page traversal logic.

A generator, not a list — so the caller can stream pages straight to bronze, checkpoint between them, and stop early on shutdown without buffering the whole extraction in memory.
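The checkpointing that happens between pages can be sketched as two small functions: one that computes the next extraction window with an overlap, and one that advances the watermark. The names, the `event_time` key, and the default durations are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def next_window(watermark: Optional[datetime], now: datetime,
                overlap: timedelta = timedelta(minutes=15),
                first_run_lookback: timedelta = timedelta(days=1),
                ) -> Tuple[datetime, datetime]:
    """Return the (start, end) event-time range for the next extraction."""
    if watermark is None:
        return now - first_run_lookback, now
    # Re-read a little of the previous window so late events are caught;
    # the downstream merge is what makes the re-read harmless.
    return watermark - overlap, now


def advance(watermark: Optional[datetime], page_events: list) -> Optional[datetime]:
    """Move the watermark to the newest event time seen on a committed page."""
    times = [e["event_time"] for e in page_events]
    if watermark is not None:
        times.append(watermark)
    return max(times) if times else None
```

The overlap deliberately produces duplicate reads, so this only works when the write path merges on a stable key rather than appending.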

The deeper point: when we onboard the next audit-log vendor — Okta, Azure Activity, CloudTrail, the next one after that — we don’t write new auth code, new pagination code, new retry code, or new secrets plumbing. We pick the packs, fill in the configuration, and spend our effort on the only part that’s genuinely new: the source-to-OCSF mapping. That’s where the leverage lives.

The Pipeline Skeleton: Bronze → Silver → Gold

Once we picked our archetype and our capability packs, we still needed a place to put the data. We adopted the medallion architecture — Bronze, Silver, Gold — and we keep the labels because they’re widely understood, even when “Bronze” and “Silver” are just nicknames for “raw layer” and “conformed layer.” What actually matters is that each layer answers a different question and has a different contract with the next one downstream.

We think of the three layers like this:

Layer	The question it answers	What lives here	Schema rule
Bronze	Do we have the data?	Raw vendor payload, append-only, partitioned by ingest date. No business logic, no renaming, no type coercion.	Vendor-shaped — whatever the API returned.
Silver	Do we trust the data?	Archetype-normalized to OCSF. One canonical table per archetype, shared across every vendor that fits it.	Fixed per archetype.
Gold	Can we query the data?	Joined to dimensions (users, assets, accounts), partitioned by event time, shaped for analysts and dashboards.	Business-shaped, vendor-agnostic.

Bronze — “do we have the data?”

Bronze is the cheapest layer to get right and the most expensive one to get wrong. Its only job is to faithfully record what the vendor sent us, so that if anything downstream is broken we can rebuild from scratch without going back to the API. That means no transformations, no cleverness, no opinions. The raw JSON lands in an Iceberg table, partitioned by the date we ingested it, and stays there. If we later discover we mapped a field incorrectly, we replay from Bronze. We do not re-extract.

Silver — “do we trust the data?”

Silver is where the archetype earns its keep. Every audit-log vendor on Earth — CloudTrail, Azure Activity, GitHub Audit, Okta System Log — lands in the same Silver table, with the same columns, in OCSF API Activity shape (class_uid 6003). The mapping code is per-vendor, but the target is not. This is the discipline that makes the whole architecture work:

Silver schema is fixed per archetype. Unknown fields go to a JSON column. This is what keeps dashboards vendor-agnostic.

A new audit-log vendor cannot add a column. It can only add a row. Vendor-specific fields that do not map to OCSF — Azure’s RBAC claims, CloudTrail’s request parameters, GitHub’s per-event metadata — land in an unmapped_properties JSON blob. They are not lost; they are just not promoted to first-class columns. Promoting them would force every downstream query to know which vendor it was looking at, which is the exact problem we are trying to avoid.
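A minimal sketch of that rule for GitHub follows. The column names are a simplified, flattened stand-in for OCSF API Activity (the real class is nested and much larger), and the sketch assumes `@timestamp` arrives as epoch milliseconds.

```python
import json
from datetime import datetime, timezone

# Vendor fields this sketch knows how to map; everything else is "unknown".
MAPPED_FIELDS = {"@timestamp", "actor", "action", "repo", "org"}


def to_silver(event: dict) -> dict:
    """Map one raw GitHub audit event to a fixed-shape Silver row."""
    millis = event["@timestamp"]  # assumed to be epoch milliseconds
    unmapped = {k: v for k, v in event.items() if k not in MAPPED_FIELDS}
    return {
        "class_uid": 6003,
        "time": datetime.fromtimestamp(millis / 1000, tz=timezone.utc).isoformat(),
        "actor_name": event.get("actor"),
        "api_operation": event.get("action"),
        "resource_name": event.get("repo") or event.get("org"),
        "metadata_provider": "github",
        # Unknown fields are preserved, not promoted to columns.
        "unmapped_properties": json.dumps(unmapped, sort_keys=True),
    }
```

Whatever GitHub adds tomorrow changes the contents of `unmapped_properties`, never the set of keys the function returns.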

Gold — “can we query the data?”

Gold is where the data meets the business. We join to the dimensions analysts actually ask about — user, asset, account, tenant — and we partition by event time, not ingest time, because compliance questions (“show me everything user X did between Y and Z”) only make sense against event time. Because Silver is vendor-agnostic, Gold is too. The “user activity” mart does not know or care whether a row originated in CloudTrail or GitHub.

GitHub Audit Log, end to end

See Figure 2 for the GitHub example end to end. The trace is short: the GitHub Audit Log API lands in a Bronze table called github_audit_log, raw payload, ingest-date partitioned. The Silver transform maps it into api_activity_6003, the same table that already holds CloudTrail and Azure Activity events. Gold is the existing “user activity” mart — the dashboard team writes zero new code. The integration is done when the new rows show up in the existing dashboard.

That is the payoff.

Figure 2. Three Layers, One Convergence Point.
Bronze, “do we have the data?”: github_audit_log, cloudtrail_log, azure_activity_log (raw API payload, partitioned by ingest_date).
Silver, “do we trust the data?”: api_activity_6003 (OCSF API Activity, class_uid 6003; schema fixed per archetype; partitioned by event_time; merge on stable event id; unmapped_properties → JSON; metadata_provider: github | cloudtrail | azure_activity).
Gold, “can we query the data?”: user_activity_mart (vendor-agnostic; joined to user / asset / account; analyst-shaped, dashboard-ready).
Every audit-log vendor lands in the same Silver table: new vendor, zero new dashboards.

The Disciplines That Separate Prototypes from Production

A prototype ingests a vendor’s data once. A production integration ingests it every fifteen minutes for the next five years, survives backfills, replays, vendor outages, audits, and privacy reviews — without quietly corrupting itself or anyone’s data. The gap between those two states is not framework choice or cleverness. It is seven disciplines we now apply to every integration without exception. Skip any of them and you will eventually pay, usually at the worst possible moment.

Idempotency by design: MERGE on a stable event ID. Re-running the same window must produce the same bytes. We use a deterministic merge key — for GitHub audit events ingested via GitHub’s Splunk integration that’s _document_id, and for the raw REST API a composite of (@timestamp, actor, action, created_at) works as a stable surrogate. We write with a MERGE-style overwrite of the affected partitions. The minute you append instead of merge, every retry, backfill, and replay quietly doubles your row counts. Detection rules that count events become wrong. Compliance reports that count events become wrong. And nobody notices until an auditor does.
Partition by event time, not processing time. A late event that occurred at 23:58 yesterday must land in yesterday’s partition, not today’s. Compliance and detection queries are phrased in event time: “show me every privileged action between 14:00 and 15:00 on the 12th.” If you partition by ingest time, late arrivals — and they always arrive late, because every vendor’s API is eventually consistent — silently disappear from those queries. You will pass tests for weeks before someone notices the gap.
Silver schema is fixed; unknown fields go to a JSON column. Vendors add fields whenever they feel like it. If your Silver schema mutates every time that happens, you cannot unify across vendors, your dashboards break on every drift, and every downstream consumer plays whack-a-mole with new columns. We pin the Silver schema to the archetype’s canonical shape and route everything unmapped into a single JSON column. Surprises become inspectable data, not migrations.
Pick a multi-tenancy isolation model and enforce it mechanically. We chose schema-per-tenant because we wanted isolation by construction — one schema per customer, every query scoped by the schema itself rather than by a WHERE clause we have to remember to write. The trade-offs are real: metadata bloat past a few thousand tenants, fan-out cost on every migration, more catalog objects to monitor. Many serious platforms instead enforce a tenant column at the catalog/policy layer (Snowflake row-access policies, BigQuery authorized views, Databricks Unity Catalog RLS, Postgres RLS) with equal safety and better scaling characteristics. Either model can be production-grade; what we’d refuse to ship is a bare tenant_id filter with the isolation living only in handwritten query conventions. Pick a model, encode it where humans can’t accidentally bypass it, and move on.
metadata_provider is globally unique per integration. Every event in the shared Silver table carries a provider tag: github, cloudtrail, azure_activity. We treat this string as a namespace and forbid reuse. Two integrations sharing a value silently corrupt the shared table — joins overcount, deduplication merges unrelated events, and the contamination spreads to every Gold mart downstream. Nobody notices for weeks, because the rows still look plausible.
PII minimization at the Bronze → Silver boundary. Bronze keeps the raw vendor payload for replay; Silver is what analysts query, dashboards visualize, and rules run against. For every PII-bearing field a vendor sends us — user emails, source IPs, request bodies, internal note text, document titles — we make one of four calls at the mapping boundary: drop (the field stays in Bronze only and never enters Silver), hash with a per-tenant salt (preserves join semantics within the tenant, prevents cross-tenant correlation), mask to a redacted form (last-4 of a card, domain-only of an email), or keep with explicit justification. The default is drop, not keep. This is also when we set Bronze retention — typically much shorter than Silver’s regulatory tail — because the minimization in Silver is what makes long retention defensible. Skip this discipline and your blog post about dashboards becomes a blog post about how your dashboards leaked customer IPs.
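Three of these disciplines (idempotent merge, event-time partitioning, and hashing PII with a per-tenant salt) fit in one small sketch. Everything here is illustrative: the event keys are hypothetical, the nested dict stands in for a real table with MERGE semantics, and the salted hash is implemented as an HMAC keyed with the tenant salt, which is one reasonable way to do it rather than the only one.

```python
import hashlib
import hmac
from datetime import datetime, timezone


def merge_key(event: dict) -> str:
    """Deterministic key: the same event always hashes to the same value."""
    parts = (event["provider"], event["time_ms"], event["actor"], event["action"])
    return hashlib.sha256("|".join(map(str, parts)).encode()).hexdigest()


def event_partition(event: dict) -> str:
    """Partition by when the event happened, not when it was ingested."""
    ts = datetime.fromtimestamp(event["time_ms"] / 1000, tz=timezone.utc)
    return ts.strftime("%Y-%m-%d")


def pseudonymize(value: str, tenant_salt: bytes) -> str:
    """Keyed hash: stable within a tenant, uncorrelatable across tenants."""
    return hmac.new(tenant_salt, value.encode(), hashlib.sha256).hexdigest()


def merge(table: dict, events: list, tenant_salt: bytes) -> None:
    """Upsert into {partition: {key: row}}; replays overwrite, never append."""
    for e in events:
        row = dict(e, actor=pseudonymize(e["actor"], tenant_salt))
        table.setdefault(event_partition(e), {})[merge_key(e)] = row
```

Calling `merge` twice with the same events leaves the table unchanged, which is exactly the property that makes retries, backfills, and overlap windows safe.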

A Build Checklist for Monday

Field Guide · OCSF Vendor Onboarding

The Monday-morning vendor playbook: any source, any API, in eight moves.

So what does this look like when the next vendor lands on your desk? Here’s the same playbook, condensed. A Monday-morning starting point you can apply to any vendor regardless of API shape, auth model, or data volume.

Read it top to bottom once. After that it becomes muscle memory, and every new connector starts looking like the last one.

Phase 1 · Classify

Decide what kind of source you’re actually looking at.

Step 01 / 08

Classify by archetype

Run the four-question test: actor + action + target + timestamp.

Use this matrix to pick the archetype before you write any code. If two answers fit, the source is doing two things and should be split into two extractors.

Actor + action + target + timestamp ⇒ Audit event log (OCSF 6003)
CVE / CVSS + an affected asset ⇒ Finding (OCSF 2002)
Current-state record, no event timeline ⇒ Inventory snapshot
Alert with severity + a status lifecycle ⇒ Alerts & incidents

Step 02 / 08

Determine delivery mode

The archetype tells you the target shape; the delivery mode tells you which packs dominate.

Pick exactly one. It changes which capability packs you need more than anything else.

Pull / batch · Push / webhook · Streaming · File-drop

Step 03 / 08

Inventory capability packs

List the packs this vendor will need before you write a line.

Most of the cost in any connector lives in these capabilities, not in the business logic. Pagination becomes signature verification on webhooks, or offset management on streams. The slot is the same, the contents change with delivery mode.

Auth · Pagination / signature / offset · Rate limit · Checkpoints · Schema drift · Observability · Secrets hygiene

Phase 2 · Map

Decide what fields survive, what gets dropped, what gets masked.

Step 04 / 08

Draft the schema mapping spec

OCSF target ⇔ vendor source, with explicit decisions at every boundary.

Three decisions, made upfront in a document, not in code review:

1. Which vendor fields map cleanly to OCSF columns.
2. Which fields stay unmapped and route to a JSON column for forensic preservation.
3. Which PII-bearing fields to drop, hash, mask, or keep at the Bronze → Silver boundary.

Phase 3 · Build

Three modules, three jobs. No logic crosses lines.

Step 05 / 08

Implement extraction

A single module with no business logic.

Authenticate, paginate (or subscribe, or list-and-download), and yield raw records downstream untouched. That’s the whole job.

Prefer a vendor SDK if one is well maintained. Drop to raw HTTP only when you need multi-tenant token rotation or fine-grained retry control. Those are the two places SDKs almost always disappoint.

Step 06 / 08

Implement Bronze ingest

Bronze: raw payloads land in Iceberg, append-only.

Partition by ingest date. Never mutate. Never apply business logic at this layer. Bronze is the replay tape. If Silver is wrong tomorrow, you re-derive it from here without re-pulling from the vendor.

Step 07 / 08

Implement Silver transform

Silver: Bronze rows normalize into archetype-shaped OCSF.

Unknown fields serialize to a JSON column rather than being dropped. The Silver schema is fixed per archetype, not per vendor. So a dashboard built on audit-event-log keeps working when you onboard the next IdP, the next EDR, the next CASB.

Phase 4 · Ship

Nothing ships until every gate is green.

Step 08 / 08

Wire orchestration & gates

Pipeline status, freshness, alerting, and a verification pass.

Stand up pipeline status, freshness metrics, alerting, and an empty-window alarm. Then run the verification pass against the archetype’s quality gate. The run must reproduce end-to-end before the connector counts as shipped.

Schema conformance
Partitioning
Idempotency
Multi-tenant isolation
PII minimization
End-to-end reproduction
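Two of these gates are simple enough to sketch as plain functions. This is a hypothetical shape, not our actual verification harness: the column set is an illustrative subset, and the remaining gates (partitioning, isolation, PII, reproduction) would be additional entries in the same dictionary.

```python
from typing import Callable, Iterable, List

SILVER_COLUMNS = {"class_uid", "time", "actor_name", "api_operation",
                  "metadata_provider", "unmapped_properties"}  # illustrative


def check_schema(rows: Iterable[dict]) -> bool:
    """Schema conformance: every row has exactly the pinned columns."""
    return all(set(r) == SILVER_COLUMNS for r in rows)


def check_idempotent(transform: Callable[[List[dict]], List[dict]],
                     bronze: List[dict]) -> bool:
    """Idempotency: transforming the same Bronze input twice must match."""
    return transform(bronze) == transform(bronze)


def run_gates(gates: dict) -> List[str]:
    """Return the names of failing gates; an empty list means ship."""
    return [name for name, passed in gates.items() if not passed]
```

Returning the list of failures, rather than a single boolean, makes the ship decision auditable: the connector's release record can state which gates ran and which ones blocked it.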

The playbook compresses to a single sentence: classify, deliver, inventory, map, extract, land, normalize, gate.

Every vendor looks bespoke on the first read. By step three, they all look the same. That’s the point.

Why This Pays Off

The first integration is expensive. The second is cheaper. The twentieth is almost free. That curve is the whole point.

When we started, every vendor was a from-scratch project. Now most of the work is done before we open the spec. The archetype pins the target schema. The capability packs cover the cross-cutting concerns. The Bronze/Silver/Gold skeleton and the dashboards already exist. What’s left is the vendor-specific mapping — fill-in-the-blank, not a research project. The platform gets cheaper to extend over time, not more expensive.

Three concrete ways this compounds:

Adding the Nth audit-log vendor takes hours, not weeks. Same archetype, same Silver table (OCSF API Activity, 6003), same Gold mart. We write the extraction module and the source-to-OCSF mapping; everything downstream already exists.
Detection scaffolding compounds across vendors — detection logic still needs a thin per-vendor layer. OCSF 6003 standardizes the envelope — actor, action, time, target — so the query surface, joins, time windows, and severity scoring of every detection rule are written once. The per-vendor action vocabulary (cloudtrail: ConsoleLogin vs. github: org.update_member vs. azure: Microsoft.Authorization/roleAssignments/write) still needs a mapping table. The expensive 80% is shared; the irreducible 20% stays small and lives in one place.
Dashboards built for one vendor’s data work for all vendors of that archetype. “User activity over time” doesn’t care whether the row came from Okta, GitHub, or Entra ID. The vendor column filters; the visualization stays the same.
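The thin per-vendor layer mentioned above can be as small as a lookup table. In this sketch the vendor action names come from the examples in this article, while the normalized labels on the right and the row keys are hypothetical; pick whatever taxonomy your detections need.

```python
# Normalized labels on the right are hypothetical; pick your own taxonomy.
ACTION_VOCABULARY = {
    ("cloudtrail", "ConsoleLogin"): "interactive_login",
    ("github", "org.update_member"): "privilege_change",
    ("azure_activity", "Microsoft.Authorization/roleAssignments/write"): "privilege_change",
}


def normalize_action(provider: str, action: str) -> str:
    """Translate a vendor action name into the shared detection vocabulary."""
    return ACTION_VOCABULARY.get((provider, action), "unmapped")


def is_privilege_change(row: dict) -> bool:
    """A vendor-agnostic rule body: it never inspects vendor action names."""
    label = normalize_action(row["metadata_provider"], row["api_operation"])
    return label == "privilege_change"
```

Onboarding a new audit-log vendor then means adding rows to the table; the rule body stays untouched.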

None of this works without open standards and open infrastructure: OCSF gives us the shared vocabulary, Apache Iceberg the storage format, Apache Spark the compute. We composed them, we didn’t invent them.

A note on portability: the patterns in this article (archetype + capability packs + medallion + the seven production disciplines) are stack-agnostic.

Iceberg + Spark is what we run; the same discipline works just as well on Snowflake (with row-access policies for tenancy), BigQuery + dbt, Databricks + Unity Catalog, Trino + Iceberg, or even DuckDB for a small deployment.

Pick the substrate that matches your team and budget — the architecture is portable.
