Data Warehouses for Marketing: BigQuery, Snowflake, Postgres
Your marketing data lives in 8-15 different systems. CRM has deal data. Google Analytics has web behavior. Your ad platforms have spend data. Your email tool has engagement data. Your billing system has revenue data.
Each system has its own dashboard. None of them agree. Your CAC calculation differs depending on which tool generated it. Your attribution report contradicts your revenue report. Your marketing team can easily spend 40% of its time pulling and reconciling data instead of acting on it.
A data warehouse solves this by putting all marketing data in one place with one schema, making unified reporting possible. Here's how to choose and build one.
Why Marketing Needs a Data Warehouse
The Data Silo Problem
Consider a simple question: "What's our customer acquisition cost by channel?" To answer it, you need:
- Ad spend from Google Ads, LinkedIn Ads, Meta Ads (3 platforms)
- Attribution data from your analytics tool (which channel gets credit for each customer)
- Revenue data from your billing system (which of those attributed customers actually paid)
- CRM data from Salesforce or HubSpot (deal lifecycle and timestamps)
Without a warehouse, answering this requires exporting CSVs from each platform, joining them in a spreadsheet, and hoping the date formats, customer identifiers, and currency conversions align. They won't.
With a warehouse, a single SQL query joins ad spend, attribution, and revenue data by a common customer identifier and returns CAC by channel in seconds.
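That single query can be sketched roughly as follows. This is illustrative, not a drop-in solution: the table names (`stg_ad_spend`, `stg_attribution`, `stg_revenue`) and columns are assumptions about a typical warehouse schema, and you'd adapt them to your own.

```sql
-- Sketch: CAC by first-touch channel from unified warehouse tables.
-- Table and column names are illustrative placeholders.
WITH spend AS (
    SELECT channel, SUM(spend_usd) AS total_spend
    FROM stg_ad_spend
    WHERE spend_date >= '2024-01-01'
    GROUP BY channel
),
customers AS (
    SELECT a.first_touch_channel AS channel,
           COUNT(DISTINCT r.customer_id) AS new_customers
    FROM stg_attribution a
    JOIN stg_revenue r ON r.customer_id = a.customer_id  -- paying customers only
    WHERE r.first_payment_date >= '2024-01-01'
    GROUP BY a.first_touch_channel
)
SELECT s.channel,
       s.total_spend,
       c.new_customers,
       s.total_spend / NULLIF(c.new_customers, 0) AS cac
FROM spend s
LEFT JOIN customers c ON c.channel = s.channel
ORDER BY cac;
```

The CTEs aggregate spend and customers separately before joining, which avoids the row-multiplication you'd get from joining daily spend rows directly against customer rows.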
Why Dashboards Don't Solve It
"But my CRM has a dashboard." CRM dashboards show CRM data. They don't show ad spend, web behavior, or billing data. Even tools like Looker Studio or Tableau can connect to multiple sources directly — but they're querying live production databases, competing with transactional workloads for resources, and joining data in memory rather than on a proper query engine.
Dashboards connected directly to source systems are fine for monitoring. They're insufficient for analysis that spans multiple data sources.
BigQuery vs Snowflake vs Postgres
Three platforms dominate marketing data warehousing in B2B. Each serves a different profile.
| Criteria | BigQuery | Snowflake | Postgres |
|---|---|---|---|
| Pricing model | Per-query (bytes scanned) | Per-compute-second (warehouse credits) | Self-hosted or managed (fixed) |
| Free tier | 1 TB queries/month, 10 GB storage | $400 trial credits | Self-hosted: free |
| Monthly cost ($10M ARR company) | $100-$500 | $500-$2,000 | $50-$200 (self-hosted) |
| Query speed (10M rows) | < 3 seconds | < 3 seconds | 5-30 seconds |
| Setup complexity | Low (GCP account + SQL) | Medium (account + warehouse config) | Medium (server + schema design) |
| Scaling | Automatic | Semi-automatic (warehouse sizing) | Manual (vertical/read replicas) |
| Ecosystem (ETL/BI tools) | Excellent | Excellent | Good |
| Team skills needed | SQL + basic GCP | SQL + Snowflake admin | SQL + PostgreSQL admin + DevOps |
| Best for | Cost-sensitive, variable workloads | Heavy analytics, concurrent users | Small teams, existing Postgres stack |
BigQuery: The Default Choice
For most B2B companies under $50M ARR, BigQuery is the right answer. The pay-per-query model means you only pay for what you use. If your marketing team runs 20 queries/day that each scan 50 GB, that's about 1 TB of scanning per day, roughly $5-6/day at on-demand rates, or $150-190/month.
BigQuery requires no infrastructure management, scales automatically, and has the broadest ecosystem of connectors and BI tools. Every major ETL platform (Fivetran, Airbyte, Stitch) has a native BigQuery connector.
The downside: costs can spike unpredictably if someone runs a poorly optimized query that scans terabytes of data. Use partitioning and query cost alerts to prevent bill shock.
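Partitioning is the main defense, and BigQuery can enforce it at the table level. A sketch, with illustrative dataset and column names:

```sql
-- Sketch (BigQuery): partition event data by day and cluster by channel
-- so date-filtered queries scan only the partitions they touch.
-- Dataset, table, and column names are illustrative.
CREATE TABLE marketing.raw_events
PARTITION BY DATE(event_timestamp)
CLUSTER BY channel
OPTIONS (require_partition_filter = TRUE)  -- reject queries without a date filter
AS SELECT * FROM marketing.raw_events_staging;

-- This query now scans ~30 days of partitions instead of the whole table.
SELECT channel, COUNT(*) AS events
FROM marketing.raw_events
WHERE DATE(event_timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY channel;
```

`require_partition_filter` makes BigQuery reject any query that would scan every partition, which is the most common cause of surprise bills. Pair it with billing budget alerts in the GCP console.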
Snowflake: The Analytics Powerhouse
Snowflake is better when you have multiple teams running concurrent analytical workloads. Its architecture separates storage from compute into independently sized virtual warehouses, so you can scale query capacity per team. Marketing runs a heavy attribution model while finance runs revenue reconciliation; neither impacts the other.
Snowflake's cost model (compute credits per second) is more predictable than BigQuery's per-query pricing, but the minimum compute cost is higher. Expect $500-$2,000/month for a marketing analytics workload.
Choose Snowflake when: you have 5+ analysts running concurrent queries, your data volume exceeds 1 TB, or you need features like data sharing (sharing datasets across organizations).
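Data sharing deserves a quick illustration, since it's Snowflake's most distinctive feature: you grant another account read access to live tables without copying anything. A sketch, with illustrative object and account names:

```sql
-- Sketch (Snowflake): share a curated mart with a partner account
-- without exporting or duplicating data. All names are illustrative.
CREATE SHARE marketing_share;
GRANT USAGE ON DATABASE analytics TO SHARE marketing_share;
GRANT USAGE ON SCHEMA analytics.marts TO SHARE marketing_share;
GRANT SELECT ON TABLE analytics.marts.mart_cac_by_channel TO SHARE marketing_share;
ALTER SHARE marketing_share ADD ACCOUNTS = partner_org_account;
```

The consuming account queries the shared table as if it were local, always seeing the current data, and the provider pays for storage while the consumer pays for their own compute.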
Postgres: The Scrappy Option
A well-configured PostgreSQL instance handles most B2B marketing analytics needs at a fraction of the cost. With proper indexing, materialized views, and pg_cron for scheduled transformations, Postgres handles 100M+ row analytical queries.
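The materialized-view-plus-pg_cron pattern mentioned above looks roughly like this. Table and job names are illustrative, and the pg_cron extension must be installed first:

```sql
-- Sketch (Postgres): precompute a heavy aggregate as a materialized view,
-- then refresh it nightly with pg_cron. Names are illustrative.
CREATE MATERIALIZED VIEW mart_daily_spend AS
SELECT spend_date, channel, SUM(spend_usd) AS spend
FROM raw_ad_spend
GROUP BY spend_date, channel;

-- A unique index is required for CONCURRENTLY refreshes below.
CREATE UNIQUE INDEX ON mart_daily_spend (spend_date, channel);

-- Refresh every night at 02:00 without blocking readers.
SELECT cron.schedule(
    'refresh-daily-spend',
    '0 2 * * *',
    'REFRESH MATERIALIZED VIEW CONCURRENTLY mart_daily_spend'
);
```

Dashboards then query the precomputed view in milliseconds instead of re-aggregating raw rows on every page load.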
Limitations: Postgres isn't designed for analytical workloads at scale. It lacks columnar storage, automatic query optimization for large scans, and the concurrency handling of purpose-built warehouses. Past 500M rows or 5+ concurrent analytical users, performance degrades.
Choose Postgres when: your team already runs Postgres, your data volume is under 50 GB, and you want to avoid additional vendor costs. It's the right choice for companies under $5M ARR who want warehouse capabilities without warehouse costs.
The Data Pipeline
Getting data from source systems into your warehouse requires an extraction, loading, and transformation pipeline.
ETL vs ELT
ETL (Extract, Transform, Load): Data is transformed before it enters the warehouse. Traditional approach, now mostly outdated for analytics.
ELT (Extract, Load, Transform): Raw data is loaded into the warehouse first, then transformed using SQL. This is the modern standard because warehouse compute is cheap and raw data preservation enables flexible analysis.
Use ELT. Load raw data from every source, then build transformation layers on top.
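In practice that means the raw table lands exactly as the source exported it, and a view (or dbt model) applies typing and naming afterwards. A sketch, with column names loosely modeled on Google Ads exports but not guaranteed to match your connector's output:

```sql
-- Sketch: the "T" in ELT lives inside the warehouse as a view.
-- Raw rows stay untouched; re-deriving the view later is free.
-- Column names are illustrative.
CREATE VIEW stg_ad_spend AS
SELECT
    CAST(segments_date AS DATE)  AS spend_date,
    LOWER(campaign_name)         AS campaign,
    'google_ads'                 AS channel,
    cost_micros / 1e6            AS spend_usd  -- Google Ads reports cost in micros
FROM raw_google_ads;
```

Because the raw data is preserved, a bug in the transformation is fixed by editing the view and re-running it, with no re-extraction from the source system.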
Data Ingestion Tools
| Tool | Type | Connectors | Monthly Cost | Best For |
|---|---|---|---|---|
| Fivetran | Managed ELT | 500+ | $1,000-$3,000 | Teams that want zero maintenance |
| Airbyte | Open-source ELT | 350+ | $0 (self-hosted) - $500 (cloud) | Budget-conscious, technical teams |
| Stitch (Talend) | Managed ELT | 130+ | $500-$1,500 | Simple pipelines, small data volumes |
| Custom scripts | Code | Unlimited | $0 + dev time | Unique sources, custom logic |
Fivetran is the gold standard for managed data ingestion. It handles schema changes automatically, monitors data freshness, and requires near-zero maintenance. The cost is worth it if your team isn't technical enough to maintain Airbyte.
Airbyte is the best value if you have someone comfortable with Docker and basic Python. Self-hosted Airbyte covers most of the same sources as Fivetran at zero platform cost, though you still pay in hosting and maintenance time.
The Transformation Layer: dbt
dbt (data build tool) has become the standard for data transformation in warehouses. It lets you write SQL transformations as version-controlled models, with testing, documentation, and dependency management.
A typical marketing data transformation pipeline:
```
Raw Sources        Staging           Marts
───────────        ─────────         ─────────
raw_google_ads  →  stg_ad_spend   →  mart_cac_by_channel
raw_salesforce  →  stg_deals      →  mart_pipeline_velocity
raw_stripe      →  stg_revenue    →  mart_ltv_cohorts
raw_ga4         →  stg_sessions   →  mart_attribution
raw_hubspot     →  stg_contacts   →  mart_funnel_metrics
```
Each stage cleans, normalizes, and joins data. The final "mart" models are what your BI tool queries. dbt is open-source; dbt Cloud adds scheduling and a web IDE for $100-$500/month.
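A dbt model is just a SQL file; the `source()` and `ref()` macros are what let dbt build and order the dependency graph. A minimal sketch of one staging model and one mart, with illustrative source and column names:

```sql
-- Sketch: models/staging/stg_ad_spend.sql
-- source() points at a raw table declared in dbt's sources config.
WITH src AS (
    SELECT * FROM {{ source('google_ads', 'campaign_stats') }}
)
SELECT
    CAST(segments_date AS DATE)  AS spend_date,
    campaign_name                AS campaign,
    cost_micros / 1e6            AS spend_usd
FROM src
```

```sql
-- Sketch: models/marts/mart_cac_by_channel.sql
-- ref() makes dbt build stg_ad_spend first and resolve its real table name.
SELECT campaign, SUM(spend_usd) AS total_spend
FROM {{ ref('stg_ad_spend') }}
GROUP BY campaign
```

Running `dbt run` materializes these in dependency order, and `dbt test` can assert things like uniqueness and non-null keys on each model.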
Building Marketing Reports from Warehouse Data
Connecting BI Tools
| BI Tool | Best For | Monthly Cost | Warehouse Compatibility |
|---|---|---|---|
| Looker Studio (Google) | Simple dashboards, free | $0 | BigQuery (native), others via connectors |
| Metabase | Self-hosted, open-source | $0 (self-hosted) - $85 | All major warehouses |
| Looker | Enterprise, governed analytics | $3,000-$5,000 | All major warehouses |
| Preset (Superset) | Open-source, cloud-hosted | $0 (self-hosted) - $500 | All major warehouses |
For most B2B companies, Metabase (self-hosted) or Looker Studio provides sufficient analytics capability at minimal cost. Looker and Tableau justify their premium when you need governed metrics (ensuring everyone uses the same definition of "MRR" or "active customer").
The Marketing Reports That Matter
With unified data in a warehouse, you can finally build reports that answer real business questions:
CAC by channel. Join ad spend, attribution, and revenue data. Show true cost per customer by first-touch channel, alongside a blended total across all channels.
Pipeline velocity. Join CRM stage timestamps. Show average time between first touch, MQL, SQL, opportunity, and closed-won — by segment, channel, and deal size.
Cohort analysis. Group customers by acquisition month. Track revenue, churn, and expansion per cohort over time. Identify which acquisition channels produce the highest-LTV customers.
Content ROI. Join web analytics (page views, time on page) with CRM data (did the visitor become a customer?). Identify which blog posts, case studies, and landing pages actually drive revenue.
Marketing efficiency. Total marketing spend / new revenue generated. Trend over time. The single number that tells you whether marketing is getting more or less efficient.
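As one example from the list above, pipeline velocity reduces to date arithmetic over CRM stage timestamps once they live in one table. A sketch using BigQuery-style `DATE_DIFF` (substitute your dialect's date-difference function); the mart and column names are illustrative:

```sql
-- Sketch: average days between funnel stages, by first-touch channel,
-- for closed-won deals. Names and DATE_DIFF syntax are illustrative.
SELECT
    first_touch_channel                              AS channel,
    AVG(DATE_DIFF(mql_date, first_touch_date, DAY))  AS days_to_mql,
    AVG(DATE_DIFF(sql_date, mql_date, DAY))          AS days_mql_to_sql,
    AVG(DATE_DIFF(closed_won_date, sql_date, DAY))   AS days_sql_to_close
FROM mart_pipeline_velocity
WHERE closed_won_date IS NOT NULL
GROUP BY first_touch_channel
ORDER BY days_to_mql;
```

Slicing the same query by deal size or segment is a matter of adding a column to the GROUP BY, which is exactly the flexibility per-tool dashboards lack.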
FAQ
How much does a marketing data warehouse cost in total? For a $10M ARR B2B company: BigQuery ($200/month) + Fivetran or Airbyte ($500-$2,000/month) + dbt Cloud ($100-$500/month) + Metabase ($0-$85/month) = $800-$2,785/month. That's $10,000-$33,000/year for complete marketing analytics infrastructure. Compare that to the $100,000+/year cost of a marketing analyst manually reconciling data from 8 tools.
Do we need real-time data in our warehouse? Almost certainly not. Marketing analysis is retrospective. A 15-minute lag between source system and warehouse is invisible for reporting. Real-time data replication costs 3-5x more than batch loading and adds complexity for no practical benefit. The exception: if you're building real-time personalization or triggering actions based on warehouse data, sub-minute latency matters.
How do we handle data governance? Define metric definitions in your dbt models and documentation. "CAC" means one thing, defined in one SQL model, queried by all dashboards. When stakeholders disagree on definitions, resolve it in the dbt model — not in competing spreadsheets. Restrict raw data access to the data team; give business users access to curated mart models.
What about reverse ETL — pushing warehouse data back to tools? Reverse ETL tools (Census, Hightouch) push computed segments and metrics from your warehouse back to operational tools — enriched contact scores to your CRM, audience segments to ad platforms, health scores to customer success tools. This closes the loop: data flows in for analysis and flows back out for action. Budget $500-$2,000/month for a reverse ETL tool, and implement it after your warehouse and dbt models are stable.
A marketing data warehouse is the infrastructure that turns data from an expense into an asset. Without one, every report is a manual reconciliation exercise. With one, your team spends time analyzing data and acting on insights instead of pulling CSVs. Empirium builds the full marketing data stack — from pipeline design to warehouse architecture to BI implementation. Let's build yours.