
Data Warehouses: BigQuery, Snowflake, Postgres

Empirium Team · 10 min read

Your marketing data lives in 8-15 different systems. CRM has deal data. Google Analytics has web behavior. Your ad platforms have spend data. Your email tool has engagement data. Your billing system has revenue data.

Each system has its own dashboard. None of them agree. Your CAC calculation differs depending on which tool generated it. Your attribution report contradicts your revenue report. Your marketing team spends 40% of their time pulling and reconciling data instead of acting on it.

A data warehouse solves this by putting all marketing data in one place with one schema, making unified reporting possible. Here's how to choose and build one.

Why Marketing Needs a Data Warehouse

The Data Silo Problem

Consider a simple question: "What's our customer acquisition cost by channel?" To answer it, you need:

  • Ad spend from Google Ads, LinkedIn Ads, Meta Ads (3 platforms)
  • Attribution data from your analytics tool (which channel gets credit for each customer)
  • Revenue data from your billing system (which of those attributed customers actually paid)
  • CRM data from Salesforce or HubSpot (deal lifecycle and timestamps)

Without a warehouse, answering this requires exporting CSVs from each platform, joining them in a spreadsheet, and hoping the date formats, customer identifiers, and currency conversions align. They won't.

With a warehouse, a single SQL query joins ad spend, attribution, and revenue data by a common customer identifier and returns CAC by channel in seconds.
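To make that concrete, here is a minimal, self-contained sketch of that join using SQLite as a stand-in for the warehouse. All table and column names (`ad_spend`, `attribution`, `revenue`, `customer_id`) are illustrative assumptions, not a fixed schema:

```python
import sqlite3

# Toy tables standing in for warehouse data from ad platforms,
# the analytics tool, and the billing system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ad_spend (channel TEXT, spend REAL);
CREATE TABLE attribution (customer_id TEXT, channel TEXT);
CREATE TABLE revenue (customer_id TEXT, amount REAL);

INSERT INTO ad_spend VALUES ('google_ads', 3000), ('linkedin_ads', 2000);
INSERT INTO attribution VALUES
  ('c1', 'google_ads'), ('c2', 'google_ads'), ('c3', 'linkedin_ads');
INSERT INTO revenue VALUES ('c1', 500), ('c2', 700), ('c3', 900);
""")

# CAC by channel: spend divided by the count of attributed
# customers who actually generated revenue.
rows = conn.execute("""
SELECT s.channel,
       s.spend / COUNT(DISTINCT r.customer_id) AS cac
FROM ad_spend s
JOIN attribution a ON a.channel = s.channel
JOIN revenue r ON r.customer_id = a.customer_id
GROUP BY s.channel
ORDER BY s.channel
""").fetchall()

for channel, cac in rows:
    print(channel, cac)
```

The same query shape works on BigQuery, Snowflake, or Postgres; only the dialect details change.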

Why Dashboards Don't Solve It

"But my CRM has a dashboard." CRM dashboards show CRM data. They don't show ad spend, web behavior, or billing data. Even tools like Looker Studio or Tableau can connect to multiple sources directly — but they're querying live production databases, competing with transactional workloads for resources, and joining data in memory rather than on a proper query engine.

Dashboards connected directly to source systems are fine for monitoring. They're insufficient for analysis that spans multiple data sources.

BigQuery vs Snowflake vs Postgres

Three platforms dominate marketing data warehousing in B2B. Each serves a different profile.

| Criteria | BigQuery | Snowflake | Postgres |
| --- | --- | --- | --- |
| Pricing model | Per-query (bytes scanned) | Per-compute-second (warehouse credits) | Self-hosted or managed (fixed) |
| Free tier | 1 TB queries/month, 10 GB storage | $400 trial credits | Self-hosted: free |
| Monthly cost ($10M ARR company) | $100-$500 | $500-$2,000 | $50-$200 (self-hosted) |
| Query speed (10M rows) | < 3 seconds | < 3 seconds | 5-30 seconds |
| Setup complexity | Low (GCP account + SQL) | Medium (account + warehouse config) | Medium (server + schema design) |
| Scaling | Automatic | Semi-automatic (warehouse sizing) | Manual (vertical/read replicas) |
| Ecosystem (ETL/BI tools) | Excellent | Excellent | Good |
| Team skills needed | SQL + basic GCP | SQL + Snowflake admin | SQL + PostgreSQL admin + DevOps |
| Best for | Cost-sensitive, variable workloads | Heavy analytics, concurrent users | Small teams, existing Postgres stack |

BigQuery: The Default Choice

For most B2B companies under $50M ARR, BigQuery is the right answer. The pay-per-query model means you only pay for what you use. If your marketing team runs 20 queries/day scanning 50 GB each, that's about 1 TB scanned per day — roughly $5/day, or $150/month, at on-demand rates.

BigQuery requires no infrastructure management, scales automatically, and has the broadest ecosystem of connectors and BI tools. Every major ETL platform (Fivetran, Airbyte, Stitch) has a native BigQuery connector.

The downside: costs can spike unpredictably if someone runs a poorly optimized query that scans terabytes of data. Use partitioning and query cost alerts to prevent bill shock.
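A back-of-envelope guard makes the risk tangible. The sketch below assumes on-demand pricing of roughly $5 per scanned TB (the approximate rate used above — check current BigQuery pricing before relying on it); the `over_budget` helper illustrates the idea behind BigQuery's maximum-bytes-billed guardrail, not its actual API:

```python
# Assumed on-demand rate, dollars per scanned TB.
PRICE_PER_TB = 5.0

def query_cost(bytes_scanned: int) -> float:
    """Estimated cost in dollars for a query scanning `bytes_scanned`."""
    return bytes_scanned / 1e12 * PRICE_PER_TB

def over_budget(bytes_scanned: int, max_dollars: float = 1.0) -> bool:
    """True if a query would exceed a per-query budget."""
    return query_cost(bytes_scanned) > max_dollars

# A 50 GB scan is about a quarter; a careless 5 TB
# full-table scan is about $25 and trips the guard.
print(query_cost(50e9), over_budget(5e12))
```

Partitioning tables by date keeps routine queries in the cheap regime by letting BigQuery skip every partition outside the queried window.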

Snowflake: The Analytics Powerhouse

Snowflake is better when you have multiple teams running concurrent analytical workloads. Its unique architecture separates storage from compute, so you can scale query capacity independently. Marketing runs a heavy attribution model while finance runs revenue reconciliation — neither impacts the other.

Snowflake's cost model (compute credits per second) is more predictable than BigQuery's per-query pricing, but the minimum compute cost is higher. Expect $500-$2,000/month for a marketing analytics workload.

Choose Snowflake when: you have 5+ analysts running concurrent queries, your data volume exceeds 1 TB, or you need features like data sharing (sharing datasets across organizations).

Postgres: The Scrappy Option

A well-configured PostgreSQL instance handles most B2B marketing analytics needs at a fraction of the cost. With proper indexing, materialized views, and pg_cron for scheduled transformations, Postgres handles 100M+ row analytical queries.

Limitations: Postgres isn't designed for analytical workloads at scale. It lacks columnar storage, automatic query optimization for large scans, and the concurrency handling of purpose-built warehouses. Past 500M rows or 5+ concurrent analytical users, performance degrades.

Choose Postgres when: your team already runs Postgres, your data volume is under 50 GB, and you want to avoid additional vendor costs. It's the right choice for companies under $5M ARR who want warehouse capabilities without warehouse costs.

The Data Pipeline

Getting data from source systems into your warehouse requires an extraction, loading, and transformation pipeline.

ETL vs ELT

ETL (Extract, Transform, Load): Data is transformed before it enters the warehouse. This is the traditional approach, now largely superseded for analytics.

ELT (Extract, Load, Transform): Raw data is loaded into the warehouse first, then transformed using SQL. This is the modern standard because warehouse compute is cheap and raw data preservation enables flexible analysis.

Use ELT. Load raw data from every source, then build transformation layers on top.

Data Ingestion Tools

| Tool | Type | Connectors | Monthly Cost | Best For |
| --- | --- | --- | --- | --- |
| Fivetran | Managed ELT | 500+ | $1,000-$3,000 | Teams that want zero maintenance |
| Airbyte | Open-source ELT | 350+ | $0 (self-hosted) - $500 (cloud) | Budget-conscious, technical teams |
| Stitch (Talend) | Managed ELT | 130+ | $500-$1,500 | Simple pipelines, small data volumes |
| Custom scripts | Code | Unlimited | $0 + dev time | Unique sources, custom logic |

Fivetran is the gold standard for managed data ingestion. It handles schema changes automatically, monitors data freshness, and requires near-zero maintenance. The cost is worth it if your team lacks the engineering capacity to run Airbyte itself.

Airbyte is the best value if you have someone comfortable with Docker and basic Python. Self-hosted Airbyte handles the same connectors as Fivetran at zero platform cost.

The Transformation Layer: dbt

dbt (data build tool) has become the standard for data transformation in warehouses. It lets you write SQL transformations as version-controlled models, with testing, documentation, and dependency management.

A typical marketing data transformation pipeline:

Raw Sources          Staging            Marts
────────────        ─────────         ─────────
raw_google_ads  →  stg_ad_spend   →  mart_cac_by_channel
raw_salesforce  →  stg_deals      →  mart_pipeline_velocity
raw_stripe      →  stg_revenue    →  mart_ltv_cohorts
raw_ga4         →  stg_sessions   →  mart_attribution
raw_hubspot     →  stg_contacts   →  mart_funnel_metrics

Each stage cleans, normalizes, and joins data. The final "mart" models are what your BI tool queries. dbt is open-source; dbt Cloud adds scheduling and a web IDE for $100-$500/month.
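The layering in the diagram can be sketched end to end with SQLite views standing in for dbt models. Table, view, and column names here are illustrative assumptions; in dbt, each view below would be a version-controlled `.sql` file referencing its upstream layer via `{{ source() }}` and `{{ ref() }}`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Raw layer: data landed as-is by the ingestion tool.
CREATE TABLE raw_google_ads (day TEXT, cost_micros INTEGER);
INSERT INTO raw_google_ads VALUES
  ('2026-01-01', 150000000),
  ('2026-01-02', 250000000);

-- Staging layer: clean and normalize units (micros -> dollars).
CREATE VIEW stg_ad_spend AS
SELECT day, cost_micros / 1e6 AS spend_usd FROM raw_google_ads;

-- Mart layer: the aggregate the BI tool actually queries.
CREATE VIEW mart_daily_spend AS
SELECT SUM(spend_usd) AS total_spend FROM stg_ad_spend;
""")

total = conn.execute(
    "SELECT total_spend FROM mart_daily_spend").fetchone()[0]
print(total)
```

The raw table is never edited; every cleanup lives in a staging view, so a definition change is one SQL edit that flows through to every downstream mart.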

Building Marketing Reports from Warehouse Data

Connecting BI Tools

| BI Tool | Best For | Monthly Cost | Warehouse Compatibility |
| --- | --- | --- | --- |
| Looker Studio (Google) | Simple dashboards, free | $0 | BigQuery (native), others via connectors |
| Metabase | Self-hosted, open-source | $0 (self-hosted) - $85 | All major warehouses |
| Looker | Enterprise, governed analytics | $3,000-$5,000 | All major warehouses |
| Preset (Superset) | Open-source, cloud-hosted | $0 (self-hosted) - $500 | All major warehouses |

For most B2B companies, Metabase (self-hosted) or Looker Studio provides sufficient analytics capability at minimal cost. Looker and Tableau justify their premium when you need governed metrics (ensuring everyone uses the same definition of "MRR" or "active customer").

The Marketing Reports That Matter

With unified data in a warehouse, you can finally build reports that answer real business questions:

  1. CAC by channel (blended). Join ad spend + attribution + revenue data. Show true cost-per-customer by first-touch channel.

  2. Pipeline velocity. Join CRM stage timestamps. Show average time between first touch, MQL, SQL, opportunity, and closed-won — by segment, channel, and deal size.

  3. Cohort analysis. Group customers by acquisition month. Track revenue, churn, and expansion per cohort over time. Identify which acquisition channels produce the highest-LTV customers.

  4. Content ROI. Join web analytics (page views, time on page) with CRM data (did the visitor become a customer?). Identify which blog posts, case studies, and landing pages actually drive revenue.

  5. Marketing efficiency. Total marketing spend / new revenue generated. Trend over time. The single number that tells you whether marketing is getting more or less efficient.
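Report 3 (cohort analysis) is a good illustration of why these reports need joined data. A minimal sketch, again with SQLite as the stand-in warehouse and assumed table and column names: group customers by signup month, then sum everything those customers ever paid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id TEXT, signup_date TEXT, channel TEXT);
CREATE TABLE payments  (customer_id TEXT, paid_at TEXT, amount REAL);

INSERT INTO customers VALUES
  ('c1', '2026-01-15', 'google_ads'),
  ('c2', '2026-01-20', 'organic'),
  ('c3', '2026-02-03', 'google_ads');
INSERT INTO payments VALUES
  ('c1', '2026-01-15', 100), ('c1', '2026-02-15', 100),
  ('c2', '2026-01-20', 100), ('c3', '2026-02-03', 100);
""")

# Lifetime revenue per acquisition-month cohort.
rows = conn.execute("""
SELECT strftime('%Y-%m', c.signup_date) AS cohort,
       SUM(p.amount) AS revenue
FROM customers c
JOIN payments p ON p.customer_id = c.customer_id
GROUP BY cohort
ORDER BY cohort
""").fetchall()
print(rows)
```

Adding `c.channel` to the grouping turns the same query into the channel-level LTV comparison the report calls for.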

FAQ

How much does a marketing data warehouse cost in total? For a $10M ARR B2B company: BigQuery ($200/month) + Fivetran or Airbyte ($500-$2,000/month) + dbt Cloud ($100-$500/month) + Metabase ($0-$85/month) = $800-$2,785/month. That's $10,000-$33,000/year for complete marketing analytics infrastructure. Compare that to the $100,000+/year cost of a marketing analyst manually reconciling data from 8 tools.

Do we need real-time data in our warehouse? Almost certainly not. Marketing analysis is retrospective. A 15-minute lag between source system and warehouse is invisible for reporting. Real-time data replication costs 3-5x more than batch loading and adds complexity for no practical benefit. The exception: if you're building real-time personalization or triggering actions based on warehouse data, sub-minute latency matters.

How do we handle data governance? Define metric definitions in your dbt models and documentation. "CAC" means one thing, defined in one SQL model, queried by all dashboards. When stakeholders disagree on definitions, resolve it in the dbt model — not in competing spreadsheets. Restrict raw data access to the data team; give business users access to curated mart models.

What about reverse ETL — pushing warehouse data back to tools? Reverse ETL tools (Census, Hightouch) push computed segments and metrics from your warehouse back to operational tools — enriched contact scores to your CRM, audience segments to ad platforms, health scores to customer success tools. This closes the loop: data flows in for analysis and flows back out for action. Budget $500-$2,000/month for a reverse ETL tool, and implement it after your warehouse and dbt models are stable.

A marketing data warehouse is the infrastructure that turns data from an expense into an asset. Without one, every report is a manual reconciliation exercise. With one, your team spends time analyzing data and acting on insights instead of pulling CSVs. Empirium builds the full marketing data stack — from pipeline design to warehouse architecture to BI implementation. Let's build yours.
