How computers & cloud storage work

A mental map for beginners and interview warm-ups. Analogies first, cloud names second—then you can dig into each vendor’s docs. Pairs with architect interview lens and SQL Reference Guide once you know what “warehouse” means.

Why this matters (jobs, money, and Snowflake)

Interviewers and architects care that you can separate compute (CPU/RAM doing work right now) from storage (bytes kept durably) from network (moving bytes between places). Mixing them up leads to wrong cost guesses and weak designs.

Cost: You often pay for storage by GB-month and for compute by running time or by data scanned. Moving the same file repeatedly across the internet can mean egress charges, which are another line item.
Latency: RAM access takes roughly 100 nanoseconds; SSD reads take tens to hundreds of microseconds; the network adds milliseconds to seconds. “Fast query” usually means less data moved and smarter pruning, not a bigger laptop.
Compliance: Region choice (where data physically lives) and who can decrypt (keys, IAM) show up in every regulated project.
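The cost bullet above can be turned into back-of-the-envelope arithmetic. The sketch below is illustrative only: the `Prices` class, the `monthly_cost` function, and every unit price are assumptions made up for the example, not any vendor's actual rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    """Hypothetical unit prices; real rates vary by vendor, region, and tier."""
    storage_per_gb_month: float = 0.023  # assumed USD per GB-month
    compute_per_hour: float = 2.00       # assumed USD per compute-hour
    egress_per_gb: float = 0.09          # assumed USD per GB leaving the region


def monthly_cost(stored_gb: float, compute_hours: float, egress_gb: float,
                 prices: Prices = Prices()) -> dict:
    """Return each line item plus the total, rounded to cents."""
    items = {
        "storage": stored_gb * prices.storage_per_gb_month,
        "compute": compute_hours * prices.compute_per_hour,
        "egress": egress_gb * prices.egress_per_gb,
    }
    items["total"] = sum(items.values())
    return {name: round(value, 2) for name, value in items.items()}


bill = monthly_cost(stored_gb=1000, compute_hours=40, egress_gb=200)
print(bill)
```

With these made-up numbers, compute dominates the bill even though a terabyte is stored, which is the usual reason to look at compute hours before storage size.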

Snowflake maps cleanly: your virtual warehouse is compute; long-lived tables lean on cloud object storage in the account’s region; the cloud services layer (the control plane) coordinates auth, metadata, and query planning.

Interview phrase: “We optimized bytes scanned and warehouse concurrency, not just SQL syntax—because that’s where credits and SLAs live.”
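To make “bytes scanned” concrete, here is a toy model of pruning: each partition records the minimum and maximum date it holds, and a filtered query reads only partitions whose range overlaps the filter. The `Partition` class, the sizes, and the dates are hypothetical; real engines keep richer metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    name: str
    min_date: str    # ISO dates compare correctly as plain strings
    max_date: str
    size_bytes: int


# Hypothetical partitions of one table, split by month.
PARTITIONS = [
    Partition("p1", "2025-01-01", "2025-01-31", 400),
    Partition("p2", "2025-02-01", "2025-02-28", 350),
    Partition("p3", "2025-03-01", "2025-03-31", 500),
]


def bytes_scanned(partitions, start, end):
    """Sum the sizes of partitions whose date range overlaps [start, end]."""
    return sum(
        p.size_bytes
        for p in partitions
        if p.max_date >= start and p.min_date <= end
    )


full_scan = bytes_scanned(PARTITIONS, "0000-01-01", "9999-12-31")
pruned = bytes_scanned(PARTITIONS, "2025-02-10", "2025-02-20")
print(full_scan, pruned)
```

The same rows come back either way; the filtered query simply never touches two of the three partitions.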

1 — What is a computer, physically?

Four friends that work together:

CPU — the “thinker”: runs instructions one after another (billions per second). Hot and fast.
Memory (RAM) — the desk: data the CPU is working on right now. Fast but limited in size, and cleared when power goes.
Storage (SSD / disk) — the filing cabinet: keeps programs and files when the machine is off. Slower than RAM but persistent.
Network — the phone line: talks to other computers (including “the cloud”).

Fun analogy: Cooking — CPU is you chopping, RAM is the cutting board (limited space), disk is the fridge (everything stored until you need it), network is ordering groceries delivered.
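A rough sense of scale helps here. The figures below are order-of-magnitude assumptions chosen for illustration (actual hardware and networks vary widely); the point is the ratio between layers, not the exact values.

```python
# Rough, order-of-magnitude access times in nanoseconds (assumed for illustration).
LATENCY_NS = {
    "RAM access": 100,
    "SSD random read": 100_000,
    "same-region network round trip": 1_000_000,
    "cross-continent network round trip": 100_000_000,
}


def relative_to_ram(latencies):
    """Express every latency as a multiple of one RAM access."""
    base = latencies["RAM access"]
    return {name: ns / base for name, ns in latencies.items()}


for name, factor in relative_to_ram(LATENCY_NS).items():
    print(f"{name:36s} ~{factor:>12,.0f}x RAM")
```

In the cooking analogy, that is the difference between reaching across the cutting board and waiting for a delivery.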

2 — What is software?

Software is instructions + data. Layers stack from the hardware up; the diagram below reads top-down, from you to the metal:

YOU → browser or app (“I want to see my data”)
  ↓
APP → your application code (web app, game, Snowflake SQL worksheet…)
  ↓
OS → Windows / macOS / Linux (manages CPU, RAM, files, network)
  ↓
HARDWARE → CPU · RAM · disk · network card

Libraries and frameworks sit inside the “app” box—they reuse someone else’s solved problems (draw a button, talk HTTPS, parse JSON).

3 — What happens when you click “Run query”?

Your browser sends a request over the network (HTTPS).
A server in a data center receives it (a computer you rent, not the one on your desk).
That server runs software (Snowflake’s services + warehouses), reads/writes storage, and sends a result back.
Your screen paints rows—the “answer” was computed somewhere else.

So “the cloud” mostly means someone else’s computers running your workload, billed by use.
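The round trip described above can be mimicked in a few lines. This is a simulation in one process, not real networking: `server_handle`, `client_run_query`, and the `STORAGE` contents are hypothetical stand-ins, and the JSON strings play the role of the HTTPS request and response bodies.

```python
import json

# Stand-in for durable storage on the provider's side (hypothetical data).
STORAGE = {
    "orders": [
        {"region": "EU", "amount": 120.0},
        {"region": "US", "amount": 80.0},
        {"region": "EU", "amount": 30.0},
    ]
}


def server_handle(request_body: str) -> str:
    """Runs on the 'server': parse the request, read storage, compute, reply."""
    request = json.loads(request_body)
    totals = {}
    for row in STORAGE[request["table"]]:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return json.dumps({"rows": sorted(totals.items())})


def client_run_query(table: str) -> list:
    """Runs on the 'client': send a request and return what comes back."""
    request_body = json.dumps({"table": table})   # the request that leaves your machine
    response_body = server_handle(request_body)   # work done on someone else's computer
    return json.loads(response_body)["rows"]      # the rows your screen paints


print(client_run_query("orders"))
```

The client never sees the raw order rows, only the aggregated answer, which is why moving less data back over the network matters.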

OLTP vs OLAP (two kinds of “database work”)

Same word “database,” very different jobs. Knowing which world you are in stops you from using the wrong tool for the pattern.

Aspect	OLTP (online transaction processing)	OLAP / analytics (warehouse workload)
Typical question	“Insert this order,” “update this balance,” “show this customer’s last login.”	“Revenue by region for three years,” “funnel conversion last week,” “train features from billions of rows.”
Row pattern	Many small reads/writes; low latency per operation.	Large scans, aggregations, joins; throughput matters more than single-row speed.
Common homes	PostgreSQL, MySQL, SQL Server, Oracle—often backing a product or store.	Snowflake, BigQuery, Redshift, Databricks SQL—curated analytics and reporting.
Storage shape (conceptual)	Row-friendly layouts; indexes for point lookups.	Often columnar or hybrid—great for “sum/average these columns across huge history.”

Rule of thumb: do not treat a warehouse like the primary database for thousands of single-row writes per second from a shopping app—that is OLTP territory. Land events fast, then batch or micro-batch into analytics stores.
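The two access patterns look different even on a tiny table. The sketch below uses Python's built-in `sqlite3` purely as a convenient engine; the `orders` table and its rows are made up, and SQLite is a row-oriented store, so this shows the shape of the queries rather than warehouse performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, region, amount) VALUES (?, ?, ?)",
    [("ana", "EU", 20.0), ("bo", "US", 35.0), ("ana", "EU", 15.0), ("cy", "US", 50.0)],
)

# OLTP-style: touch one row by its key.
conn.execute("UPDATE orders SET amount = 25.0 WHERE id = 1")
one_order = conn.execute(
    "SELECT customer, amount FROM orders WHERE id = 1"
).fetchone()

# OLAP-style: scan every row and aggregate.
revenue_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(one_order)
print(revenue_by_region)
```

The first query benefits from an index on the key; the second reads only two columns of every row, which is the pattern columnar storage is designed for.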

Data warehouse, data mart, lake, lakehouse (one paragraph each)

Data warehouse: A governed place for analytics-ready data—dimensions, facts, conformed keys—so BI and SQL users get consistent answers. Workloads are mostly read-heavy and set-oriented.

Data mart: A smaller slice of the warehouse for one department (finance, sales). Same ideas, narrower scope—faster to build and permission.

Data lake: Cheap, flexible storage (often object storage + open file formats). Many teams can land raw data; many engines can read it. Flexibility goes up; governance discipline must go up too or it becomes a swamp.

Lakehouse (idea): Combine lake economics with warehouse-style tables—e.g. Apache Iceberg, Delta Lake—so you get ACID-ish table semantics and SQL over files. Products differ; the interview story is “one copy of truth, clearer contracts.”

ETL vs ELT

ETL: Extract → Transform outside the warehouse (tooling on VMs/containers) → Load curated tables.

ELT: Extract → Load raw or light shape → Transform inside the warehouse with SQL (scale compute when needed).

Neither is universally “right”—compliance, skill mix, and cost of large transforms drive the choice.
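The difference is mostly about where the transform runs. In this sketch, built on `sqlite3` as a stand-in for a warehouse, `run_etl` cleans rows in Python before loading, while `run_elt` loads the raw strings and cleans them with SQL afterwards. The table names and `RAW_EVENTS` rows are hypothetical.

```python
import sqlite3

# Hypothetical source rows: messy names, amounts as strings.
RAW_EVENTS = [("  Alice ", "12.50"), ("BOB", "7"), ("alice", "2.50")]


def run_etl(conn):
    # Transform outside the engine first, then load only the curated result.
    cleaned = [(name.strip().lower(), float(amount)) for name, amount in RAW_EVENTS]
    conn.execute("CREATE TABLE etl_sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO etl_sales VALUES (?, ?)", cleaned)


def run_elt(conn):
    # Load the raw strings untouched, then transform with SQL inside the engine.
    conn.execute("CREATE TABLE raw_sales (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", RAW_EVENTS)
    conn.execute(
        "CREATE TABLE elt_sales AS "
        "SELECT LOWER(TRIM(customer)) AS customer, CAST(amount AS REAL) AS amount "
        "FROM raw_sales"
    )


TOTALS = "SELECT customer, SUM(amount) FROM {} GROUP BY customer ORDER BY customer"
conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
etl_totals = conn.execute(TOTALS.format("etl_sales")).fetchall()
elt_totals = conn.execute(TOTALS.format("elt_sales")).fetchall()
print(etl_totals, elt_totals)
```

Both paths reach the same totals; ELT additionally keeps `raw_sales` around, which helps reprocessing but means raw data sits inside the warehouse and has to be governed there.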

Batch vs “near real time”

Daily/hourly batch: Simple ops; higher latency acceptable.
Near real time (minutes): Micro-batches or continuous ingest into the warehouse—common for dashboards that do not need millisecond freshness.
True low latency (milliseconds): Usually streaming systems + OLTP stores—not the same SLA as a warehouse scan.
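One way to picture the spectrum is that batch and micro-batch are the same grouping with different window sizes. The `micro_batches` function and the sample events below are illustrative assumptions, not a specific ingest tool's behavior.

```python
from collections import defaultdict


def micro_batches(events, window_seconds):
    """Group (timestamp_seconds, payload) events into fixed, non-overlapping windows."""
    batches = defaultdict(list)
    for timestamp, payload in events:
        window_start = (timestamp // window_seconds) * window_seconds
        batches[window_start].append(payload)
    return dict(sorted(batches.items()))


# Hypothetical events: (seconds since midnight, payload).
events = [(3, "a"), (61, "b"), (65, "c"), (119, "d"), (120, "e")]

print(micro_batches(events, window_seconds=60))      # one-minute micro-batches
print(micro_batches(events, window_seconds=86_400))  # one daily batch
```

Shrinking the window lowers freshness latency but increases the number of loads to run and monitor.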

How data flows (landing → curated → consumed)

Most platform answers sound like a pipeline. You do not need every buzzword—just the direction of travel.

SOURCES (apps, SaaS, IoT, partners)
   │
   ▼ ingest (API, CDC, files, events)
OBJECT STORAGE (often Parquet/CSV/JSON “landing”)
   │
   ▼ transform & model (SQL, Spark, dbt…)
WAREHOUSE / LAKE TABLES (curated, governed)
   │
   ▼ serve
BI · notebooks · ML features · reverse ETL
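The same direction of travel, shrunk to a toy: raw JSON lines stand in for the landing zone, `curate` stands in for the transform step, and `serve_revenue_by_region` stands in for what a BI tool would request. All names and payloads here are hypothetical.

```python
import json

# Landing zone: raw JSON lines exactly as they arrived (hypothetical payloads).
LANDING = [
    '{"order_id": 1, "region": "eu", "amount": "10.5"}',
    '{"order_id": 2, "region": "US", "amount": "4"}',
    '{"order_id": 2, "region": "US", "amount": "4"}',
    '{"order_id": 3, "region": "EU", "amount": "6.5"}',
]


def curate(raw_lines):
    """Parse, type, normalize, and de-duplicate on order_id."""
    curated = {}
    for line in raw_lines:
        record = json.loads(line)
        curated[record["order_id"]] = {
            "order_id": record["order_id"],
            "region": record["region"].upper(),
            "amount": float(record["amount"]),
        }
    return list(curated.values())


def serve_revenue_by_region(rows):
    """A consumer-facing aggregate over the curated rows."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


curated_rows = curate(LANDING)
print(serve_revenue_by_region(curated_rows))
```

Skipping the curate step would double-count order 2 and split “eu” from “EU”, which is the kind of inconsistency the curated layer exists to prevent.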

Hyperscalers sell the object store, virtual machines and containers, managed databases, identity, and network paths, so racking and running the hardware under this pipeline is someone else’s day job. You still own data contracts, access rules, and the bill.

Why “object storage” matters for data platforms

Traditional files live in folders on a disk you manage. Object storage is a giant, API-driven store of objects (each one is data + metadata + a key that looks like a path). It scales huge, is built for the network, and is the bedrock under many databases and data lakes.

Analogy: Instead of a basement full of labeled boxes you walk to yourself, you get a barcode system: “bring me object reports/2025/jan.parquet” and the system retrieves it from a massive automated warehouse.
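A minimal in-memory model shows the essentials: a flat map from key to bytes plus metadata, with “folders” existing only as key prefixes. `ToyObjectStore` and its method names are invented for this sketch and do not mirror any provider's SDK.

```python
class ToyObjectStore:
    """In-memory stand-in for a bucket: a flat map from key to (bytes, metadata)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, **metadata):
        self._objects[key] = (bytes(data), dict(metadata))

    def get(self, key):
        return self._objects[key][0]

    def head(self, key):
        """Return only the metadata, without the object body."""
        return self._objects[key][1]

    def list_keys(self, prefix=""):
        # "Folders" are only key prefixes; there is no real directory tree.
        return sorted(k for k in self._objects if k.startswith(prefix))


bucket = ToyObjectStore()
bucket.put("reports/2025/jan.parquet", b"jan-bytes", content_type="application/octet-stream")
bucket.put("reports/2025/feb.parquet", b"feb-bytes")
bucket.put("logs/app.json", b"{}")
print(bucket.list_keys("reports/2025/"))
```

Real object stores add durability, versioning, and access control on top, but the put/get/list-by-prefix shape is what data tools build on.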

Three big clouds: buckets at a glance

Each has regions (geography), durability (very hard to lose data), and access control (who can read/write). Names differ; the idea is the same.

Concept	AWS	Google Cloud	Microsoft Azure
Object store product	S3 (Simple Storage Service)	Cloud Storage (GCS) — “buckets”	Blob Storage — containers & blobs
Unit you create	Bucket (globally unique name)	Bucket	Storage account → container → blob
Typical use	Data lake files (Parquet, CSV), backups, static websites, ingest landing zone before loading a warehouse
“Cold / cheap” tiers	S3 Glacier tiers, Intelligent-Tiering	Archive / Nearline / Coldline	Cool / Archive access tiers

Snowflake often reads your data from external stages pointing at these systems (with credentials and a URL). The warehouse compute is separate from where bytes sit—same interview story as “compute vs storage.”
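The naming differences in the table show up in how one object is addressed natively on each cloud. The helper below is a sketch with hypothetical bucket, account, and container names; it builds provider-native addresses only, and a Snowflake stage definition has its own URL syntax that should be checked in the current docs.

```python
def object_uri(cloud: str, key: str, *, bucket: str = "",
               account: str = "", container: str = "") -> str:
    """Build the provider-native address for one object."""
    if cloud == "aws":
        return f"s3://{bucket}/{key}"
    if cloud == "gcp":
        return f"gs://{bucket}/{key}"
    if cloud == "azure":
        # Azure nests the blob under a storage account and a container.
        return f"https://{account}.blob.core.windows.net/{container}/{key}"
    raise ValueError(f"unknown cloud: {cloud}")


KEY = "reports/2025/jan.parquet"
print(object_uri("aws", KEY, bucket="example-lake"))
print(object_uri("gcp", KEY, bucket="example-lake"))
print(object_uri("azure", KEY, account="exampleacct", container="lake"))
```

The key stays the same across clouds; only the part that names the bucket or account changes.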

Remember: The cloud logo is not magic—it’s disciplined ops: encryption, access keys, network paths, and bills. You’re the architect of who can touch which bucket.

Beyond buckets: what AWS, Azure, and GCP actually provide

Object storage is only one layer. Platforms are sold as building blocks you compose; interviews reward naming the category even if the product name slips.

Building block	Why it matters	AWS (examples)	Azure (examples)	GCP (examples)
Identity & access	Who may read/write which bucket or table? Least privilege reduces breach blast radius.	IAM roles & policies	Microsoft Entra ID, RBAC	IAM, service accounts
Network	Private paths from your VPC to managed services; fewer public endpoints.	VPC, PrivateLink	Virtual Network, Private Link	VPC, Private Service Connect
Encryption	Data at rest (disk/object) and in flight (TLS). Keys tied to compliance stories.	KMS	Key Vault	Cloud KMS
Regions & zones	Latency, residency, and DR: data and compute placement are policy questions—not just performance.	Pick region first; then enable services inside it. Cross-region replication exists but adds cost and complexity.
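Least privilege from the identity row can be modeled as default deny plus explicit grants. This is a toy evaluator, not any cloud's policy language: the `Grant` class, the principals, and the prefixes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    principal: str
    action: str            # "read" or "write"
    resource_prefix: str   # grant applies to every key under this prefix


# Hypothetical grants: each principal gets only what its job needs.
GRANTS = [
    Grant("analyst", "read", "lake/curated/"),
    Grant("ingest-job", "write", "lake/landing/"),
]


def is_allowed(principal, action, resource, grants=GRANTS):
    """Default deny: access is granted only when some grant matches explicitly."""
    return any(
        g.principal == principal
        and g.action == action
        and resource.startswith(g.resource_prefix)
        for g in grants
    )


print(is_allowed("analyst", "read", "lake/curated/sales.parquet"))   # granted
print(is_allowed("analyst", "write", "lake/curated/sales.parquet"))  # no write grant
print(is_allowed("analyst", "read", "lake/landing/raw.json"))        # wrong prefix
```

If the analyst's credentials leak, the blast radius is read access to curated data only; landing data and all writes stay out of reach.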

Snowflake runs in a cloud region; your external stages and egress patterns still follow the cloud provider’s rules. Verify private connectivity and encryption settings in current docs for each product.