A mental map for beginners and interview warm-ups. Analogies first, cloud names second—then you can dig into each vendor’s docs. Pairs with architect interview lens and SQL Reference Guide once you know what “warehouse” means.
Interviewers and architects care that you can separate compute (CPU/RAM doing work right now) from storage (bytes kept durably) from network (moving bytes between places). Mixing them up leads to wrong cost guesses and weak designs.
Snowflake maps cleanly: your virtual warehouse is compute; long-lived tables lean on cloud object storage in the account’s region; the control plane coordinates auth, metadata, and query planning.
Four friends that work together:
Software is instructions + data. Layers stack from metal upward:
Libraries and frameworks sit inside the “app” box—they reuse someone else’s solved problems (draw a button, talk HTTPS, parse JSON).
So “the cloud” mostly means someone else’s computers running your workload, billed by use.
Same word “database,” very different jobs. Knowing which world you are in stops you from using the wrong tool for the pattern.
| OLTP (online transaction processing) | OLAP / analytics (warehouse workload) | |
|---|---|---|
| Typical question | “Insert this order,” “update this balance,” “show this customer’s last login.” | “Revenue by region for three years,” “funnel conversion last week,” “train features from billions of rows.” |
| Row pattern | Many small reads/writes; low latency per operation. | Large scans, aggregations, joins; throughput matters more than single-row speed. |
| Common homes | PostgreSQL, MySQL, SQL Server, Oracle—often backing a product or store. | Snowflake, BigQuery, Redshift, Databricks SQL—curated analytics and reporting. |
| Storage shape (conceptual) | Row-friendly layouts; indexes for point lookups. | Often columnar or hybrid—great for “sum/average these columns across huge history.” |
Rule of thumb: do not treat a warehouse like the primary database for thousands of single-row writes per second from a shopping app—that is OLTP territory. Land events fast, then batch or micro-batch into analytics stores.
Data warehouse: A governed place for analytics-ready data—dimensions, facts, conformed keys—so BI and SQL users get consistent answers. Workloads are mostly read-heavy and set-oriented.
Data mart: A smaller slice of the warehouse for one department (finance, sales). Same ideas, narrower scope—faster to build and permission.
Data lake: Cheap, flexible storage (often object storage + open file formats). Many teams can land raw data; many engines can read it. Flexibility goes up; governance discipline must go up too or it becomes a swamp.
Lakehouse (idea): Combine lake economics with warehouse-style tables—e.g. Apache Iceberg, Delta Lake—so you get ACID-ish table semantics and SQL over files. Products differ; the interview story is “one copy of truth, clearer contracts.”
ETL: Extract → Transform outside the warehouse (tooling on VMs/containers) → Load curated tables.
ELT: Extract → Load raw or light shape → Transform inside the warehouse with SQL (scale compute when needed).
Neither is universally “right”—compliance, skill mix, and cost of large transforms drive the choice.
Most platform answers sound like a pipeline. You do not need every buzzword—just the direction of travel.
Hyperscalers sell the object store, virtual machines / containers, managed databases, identity, and network paths that make this pipeline someone else’s day job to rack-and-stack—you still own data contracts, access rules, and the bill.
Traditional files live in folders on a disk you manage. Object storage is a giant, API-driven warehouse of objects (files + metadata + a key like a path). It scales huge, is built for the network, and is the bedrock under many databases and data lakes.
Analogy: Instead of a basement full of labeled boxes you walk to yourself, you get a barcode system: “bring me object reports/2025/jan.parquet” and the system retrieves it from a massive automated warehouse.
Each has regions (geography), durability (very hard to lose data), and access control (who can read/write). Names differ; idea is the same.
| Concept | AWS | Google Cloud | Microsoft Azure |
|---|---|---|---|
| Object store product | S3 (Simple Storage Service) | Cloud Storage (GCS) — “buckets” | Blob Storage — containers & blobs |
| Unit you create | Bucket (globally unique name) | Bucket | Storage account → container → blob |
| Typical use | Data lake files (Parquet, CSV), backups, static websites, ingest landing zone before loading a warehouse | ||
| “Cold / cheap” tiers | S3 Glacier tiers, Intelligent-Tiering | Archive / Nearline / Coldline | Cool / Archive access tiers |
Snowflake often reads your data from external stages pointing at these systems (with credentials and a URL). The warehouse compute is separate from where bytes sit—same interview story as “compute vs storage.”
Object storage is only one layer. Platforms are sold as building blocks you compose; interviews reward naming the category even if the product name slips.
| Building block | Why it matters | AWS (examples) | Azure (examples) | GCP (examples) |
|---|---|---|---|---|
| Identity & access | Who may read/write which bucket or table? Least privilege reduces breach blast radius. | IAM roles & policies | Microsoft Entra ID, RBAC | IAM, service accounts |
| Network | Private paths from your VPC to managed services; fewer public endpoints. | VPC, PrivateLink | Virtual Network, Private Link | VPC, Private Service Connect |
| Encryption | Data at rest (disk/object) and in flight (TLS). Keys tied to compliance stories. | KMS | Key Vault | Cloud KMS |
| Regions & zones | Latency, residency, and DR: data and compute placement are policy questions—not just performance. | Pick region first; then enable services inside it. Cross-region replication exists but adds cost and complexity. | ||
Snowflake runs in a cloud region; your external stages and egress patterns still follow the cloud provider’s rules. Verify private connectivity and encryption settings in current docs for each product.