Foundations
SQL Command Categories
DDL — Data Definition
CREATE, ALTER, DROP, TRUNCATE. Defines and modifies schema structure. Auto-committed in most RDBMS.
DML — Data Manipulation
SELECT, INSERT, UPDATE, DELETE. Reads and modifies row data. Participates in transactions.
DCL — Data Control
GRANT, REVOKE. Manages user permissions and access control at the object level.
TCL — Transaction Control
BEGIN, COMMIT, ROLLBACK, SAVEPOINT. Controls the boundary and outcome of transactions.
DQL — Data Query
Some taxonomies separate SELECT into its own category. It is the most complex and performance-critical statement.
Execution Order
FROM → JOIN → WHERE → GROUP BY → HAVING → SELECT → DISTINCT → ORDER BY → LIMIT
Core Data Types
| Category | SQL Server | PostgreSQL | Use when |
|---|---|---|---|
| Integers | INT, BIGINT, SMALLINT | integer, bigint, smallint | Counts, IDs, flags. Prefer BIGINT for PKs at scale. |
| Exact decimal | DECIMAL(p,s), NUMERIC(p,s) | numeric(p,s) | Money, measurements. Never use FLOAT for currency. |
| Floating point | FLOAT, REAL | double precision, real | Scientific data. Expect rounding errors. |
| Strings | VARCHAR(n), NVARCHAR(n), CHAR(n) | varchar(n), text, char(n) | VARCHAR for variable, CHAR for fixed-width codes. |
| Dates/Times | DATETIME2, DATE, TIME, DATETIMEOFFSET | timestamp, timestamptz, date, time | Always store in UTC. Use DATETIMEOFFSET/timestamptz for global apps. |
| Boolean | BIT | boolean | SQL Server has no native BOOL — use BIT (0/1). |
| Semi-structured | JSON (stored as string), XML | json, jsonb | PostgreSQL's JSONB is binary and indexable — prefer it over json. |
| Identity / UUID | IDENTITY(1,1), UNIQUEIDENTIFIER | SERIAL, BIGSERIAL, uuid | Use UUIDs when rows are merged across databases; INT otherwise (smaller, faster index). |
NULL Semantics — The Three-Valued Logic
SQL uses three-valued logic: TRUE, FALSE, and UNKNOWN. Any comparison with NULL yields UNKNOWN, which is treated as FALSE in WHERE clauses. This causes common bugs.
-- These never return rows — comparisons with NULL are UNKNOWN
SELECT * FROM orders WHERE deleted_at = NULL; -- WRONG
SELECT * FROM orders WHERE deleted_at != NULL; -- WRONG
-- Correct NULL checks
SELECT * FROM orders WHERE deleted_at IS NULL;
SELECT * FROM orders WHERE deleted_at IS NOT NULL;
-- COALESCE returns first non-NULL value
SELECT COALESCE(phone, mobile, 'N/A') AS contact
FROM customers;
-- NULLIF returns NULL when both args are equal (avoids div-by-zero)
SELECT revenue / NULLIF(quantity, 0) AS unit_price
FROM sales;
SQL
Constraints
Declarative Integrity
- PRIMARY KEY — unique + NOT NULL. Every table should have one.
- FOREIGN KEY — referential integrity. Prevents orphaned rows.
- UNIQUE — one or more columns form a natural key.
- CHECK — arbitrary expression must be TRUE (e.g.
price > 0). - DEFAULT — supplies a value when INSERT omits the column.
- NOT NULL — prevents NULL storage for that column.
FK Cascade Strategies
ALTER TABLE order_items
ADD CONSTRAINT fk_order
FOREIGN KEY (order_id)
REFERENCES orders(id)
ON DELETE CASCADE -- auto-delete children
ON UPDATE NO ACTION; -- block PK changes
-- Other options: SET NULL, SET DEFAULT, RESTRICT
SQL
Joins, Subqueries & CTEs
Join Types
| Join | Returns | When to use |
|---|---|---|
INNER JOIN | Only rows with a match in both tables | The default. Use when both sides must exist. |
LEFT JOIN | All rows from left + matched rows from right (NULL if no match) | Optional relationship — e.g. customers with or without orders. |
RIGHT JOIN | All rows from right + matched rows from left | Rare. Swap table order and use LEFT JOIN instead for readability. |
FULL OUTER JOIN | All rows from both; NULL where no match | Reconciliation / diff queries between two datasets. |
CROSS JOIN | Cartesian product (every combination) | Generating test data, date/time grids. Dangerous on large tables. |
SELF JOIN | Table joined to itself with aliases | Hierarchies, org charts, adjacency lists. |
-- LEFT JOIN with NULL check to find missing relationships
SELECT c.id, c.name
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE o.id IS NULL; -- customers with NO orders
-- Self join: employee + their manager
SELECT
e.name AS employee,
m.name AS manager
FROM employees e
LEFT JOIN employees m ON m.id = e.manager_id;
SQL
Common Table Expressions (CTEs)
CTEs are named subqueries defined with WITH. They improve readability, allow self-reference (recursive CTEs), and in many databases are evaluated once and reused.
-- Non-recursive CTE: clean up a complex SELECT
WITH
monthly_revenue AS (
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(amount) AS revenue
FROM orders
GROUP BY 1
),
prev_month AS (
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_revenue
FROM monthly_revenue
)
SELECT
month,
revenue,
ROUND((revenue - prev_revenue) * 100.0 / prev_revenue, 2) AS pct_change
FROM prev_month
ORDER BY month;
SQL
-- Recursive CTE: walk an org-chart hierarchy
WITH RECURSIVE org_tree AS (
-- Anchor: start from the CEO
SELECT id, name, manager_id, 0 AS depth
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive step: each employee's reports
SELECT e.id, e.name, e.manager_id, t.depth + 1
FROM employees e
JOIN org_tree t ON t.id = e.manager_id
)
SELECT
REPEAT(' ', depth) || name AS hierarchy
FROM org_tree
ORDER BY depth, name;
SQL — PostgreSQL
#tmp / TEMP TABLE) when the intermediate result is large and queried multiple times — the engine will materialise it once and build statistics on it, which can dramatically improve plan quality.
Aggregation & Grouping
Aggregate Functions
COUNT
COUNT(*) counts rows. COUNT(col) ignores NULLs. COUNT(DISTINCT col) counts unique non-NULL values.
SUM / AVG
Both ignore NULLs. AVG divides by non-NULL count, not total rows — can be surprising with sparse data.
MIN / MAX
Work on any orderable type (numbers, strings, dates). Also ignore NULLs.
STRING_AGG
STRING_AGG(col, ', ') concatenates values into a comma-separated list. Supports WITHIN GROUP (ORDER BY ...) in SQL Server.
GROUPING SETS
Compute multiple groupings in one pass: GROUPING SETS ((a,b), (a), ()) gives subtotals and grand totals without UNION ALL.
ROLLUP / CUBE
ROLLUP(a,b,c) produces hierarchical totals. CUBE(a,b) produces all possible combinations of groupings.
-- HAVING filters on the result of aggregation
-- WHERE filters before aggregation — do not confuse them
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(amount) AS total_spent,
AVG(amount) AS avg_order,
MAX(order_date) AS last_order,
STRING_AGG(status, ', ' ORDER BY order_date) AS status_history
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY customer_id
HAVING SUM(amount) > 1000 -- only high-value customers
ORDER BY total_spent DESC;
SQL
-- ROLLUP: sales totals by region + country, with subtotals
SELECT
COALESCE(region, 'ALL REGIONS') AS region,
COALESCE(country, 'ALL') AS country,
SUM(revenue) AS total
FROM sales
GROUP BY ROLLUP(region, country)
ORDER BY region, country;
SQL
Window Functions
Window functions compute a value across a set of rows related to the current row without collapsing them. Unlike GROUP BY, the original rows are preserved.
function() OVER (PARTITION BY ... ORDER BY ... ROWS/RANGE BETWEEN ...). All three clauses are optional. No PARTITION BY = the whole result set is one window.
Ranking Functions
SELECT
name,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank,
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rank,
NTILE(4) OVER (PARTITION BY department ORDER BY salary DESC) AS quartile
FROM employees;
-- ROW_NUMBER: always unique (1,2,3,4,5)
-- RANK: ties share same rank, gaps after (1,1,3,4,5)
-- DENSE_RANK: ties share same rank, no gaps (1,1,2,3,4)
SQL
Offset & Frame Functions
SELECT
order_date,
revenue,
-- Prior and next row values
LAG(revenue, 1, 0) OVER (ORDER BY order_date) AS prev_revenue,
LEAD(revenue, 1, 0) OVER (ORDER BY order_date) AS next_revenue,
-- First and last value in partition
FIRST_VALUE(revenue) OVER (PARTITION BY customer_id ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS first_order_value,
-- 7-day rolling average
AVG(revenue) OVER (ORDER BY order_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rolling_7d_avg,
-- Running total
SUM(revenue) OVER (ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM daily_sales;
SQL
ROWS BETWEEN refers to physical row positions. RANGE BETWEEN groups all rows with the same ORDER BY value together. Use ROWS for rolling calculations — RANGE can include unexpected extra rows when there are ties.
Practical Pattern — Top-N Per Group
-- Top 3 products by revenue, per category
WITH ranked AS (
SELECT
category,
product_name,
revenue,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) AS rn
FROM products
)
SELECT category, product_name, revenue
FROM ranked
WHERE rn <= 3;
SQL
Indexes & Access Patterns
Index Internals
Most indexes are B-Tree structures. Each leaf page contains the indexed value(s) plus a pointer to the actual row. Lookups are O(log n); range scans walk leaf pages in order.
Clustered Index (SQL Server) / Heap Table with PK (Postgres)
- One per table — the data pages ARE the index.
- Row data is physically sorted by the clustered key.
- Range scans and ORDER BY on the key are fast — sequential I/O.
- Inserts on a monotone key (IDENTITY) avoid page splits. Random UUIDs cause page splits — fragmentation.
Non-Clustered Index
- Separate structure with key + row locator (RID or clustered key).
- Multiple per table (up to 999 in SQL Server).
- A Key Lookup back to the base table is needed for non-indexed columns.
- Add an
INCLUDEclause to make it a covering index — avoids the lookup entirely.
Creating Effective Indexes
-- Basic index
CREATE INDEX idx_orders_customer
ON orders (customer_id);
-- Composite index — column order matters!
-- Best for: WHERE customer_id = ? AND status = ?
-- or: WHERE customer_id = ? (prefix match)
CREATE INDEX idx_orders_cust_status
ON orders (customer_id, status);
-- Covering index: avoids going back to base table
CREATE INDEX idx_orders_covering
ON orders (customer_id, order_date)
INCLUDE (amount, status); -- SQL Server / PostgreSQL
-- Partial index (PostgreSQL): index only active orders
CREATE INDEX idx_active_orders
ON orders (customer_id)
WHERE status = 'active';
-- Filtered index (SQL Server equivalent)
CREATE INDEX idx_active_orders
ON orders (customer_id)
WHERE status = 'active'; -- SQL Server 2008+
SQL
The Selectivity & SARGability Rules
Index Selectivity
An index is useful when it filters out most rows. A column with 2 distinct values (boolean) is low-selectivity — a full scan may be faster. A column with millions of distinct values (email, UUID) is high-selectivity — the index shines.
Rule of thumb: if a query returns >5–15% of rows, the optimiser will likely ignore the index and scan.
SARGable Predicates
A predicate is SARGable (Search ARGument Able) if the index can be used directly. Wrapping a column in a function breaks SARGability.
WHERE YEAR(order_date) = 2024— function on column, non-SARGableWHERE LOWER(email) = 'a@b.com'— non-SARGableWHERE amount + 10 > 100— non-SARGable
WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01'WHERE email = LOWER('A@B.COM')— move the function to the constantWHERE amount > 90
sys.dm_db_index_usage_stats; in PostgreSQL use pg_stat_user_indexes.
Transactions, ACID & Isolation
ACID Properties
Atomicity
All statements in a transaction succeed together or all are rolled back. No partial updates.
Consistency
A transaction can only bring the database from one valid state to another. All constraints, cascades, and triggers fire.
Isolation
Concurrent transactions appear to execute serially. The degree of isolation is tunable — see isolation levels.
Durability
Committed data survives crashes. Achieved via write-ahead logging (WAL / transaction log) and checkpoints.
Write-Ahead Log
Every change is written to the WAL before the data page. On crash, replay the WAL from the last checkpoint to recover.
SAVEPOINT
Partial rollback within a transaction without aborting the whole thing. Useful for loop-based batch processing.
Isolation Levels & Read Phenomena
| Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read | Use when |
|---|---|---|---|---|
READ UNCOMMITTED | ✓ Possible | ✓ Possible | ✓ Possible | Approximate analytics only. Never for financial data. |
READ COMMITTED | ✗ Prevented | ✓ Possible | ✓ Possible | Default in SQL Server & PostgreSQL. Most OLTP workloads. |
REPEATABLE READ | ✗ | ✗ Prevented | ✓ Possible | When you re-read the same rows and need consistent results. |
SERIALIZABLE | ✗ | ✗ | ✗ Prevented | Full isolation. Highest correctness, lowest throughput. |
SNAPSHOT (RCSI) | ✗ | ✗ | ✗ | SQL Server's MVCC mode. Readers don't block writers. Best for mixed OLTP/reporting. |
-- Setting isolation level for a session
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 500
WHERE id = 1;
UPDATE accounts SET balance = balance + 500
WHERE id = 2;
-- Only commits if BOTH updates succeed
COMMIT;
SQL Server
Locking & Deadlocks
Lock Types
- Shared (S) — acquired by reads. Compatible with other S locks.
- Exclusive (X) — acquired by writes. Blocks all other locks.
- Update (U) — intent to update. Prevents U+U deadlocks during read-then-write.
- Intent (IS/IX) — table-level signal that a row/page is locked below.
- Key-Range — used by SERIALIZABLE to prevent phantom inserts.
Preventing Deadlocks
- Access tables in the same order — most deadlocks come from T1 locking A then B while T2 locks B then A.
- Keep transactions short — minimise the window where locks are held.
- Use NOLOCK only for dirty-read-safe reporting — never for financial writes.
- Enable RCSI / Snapshot Isolation — readers never block writers in MVCC mode.
- Retry on deadlock — SQL Server error 1205; catch it and retry the transaction.
Query Optimisation
Reading Execution Plans
The query optimiser generates a plan — a tree of physical operators. Read the plan right-to-left, bottom-up. Arrows between operators show row flow; width indicates estimated rows.
Table Scan
Reads every row. Expected on small tables or low-selectivity queries. Bad on large tables — look for a missing index.
Index Seek
Uses the B-Tree to jump directly to matching rows. The goal. Appears when a SARGable predicate matches an index.
Index Scan
Reads all rows in the index. Better than a table scan if the index is covering. Consider a narrower predicate.
Key Lookup
Follows a pointer from a non-clustered index back to the base table. Expensive when repeated thousands of times — add INCLUDE columns to eliminate it.
Hash Join
Builds an in-memory hash table from the smaller input. Good for large unordered sets. Spills to disk if memory is exceeded — look for memory grant warnings.
Nested Loops
For each row in the outer input, probe the inner. Excellent when outer is small and inner has an index seek. Terrible with large outer inputs.
-- SQL Server: show estimated plan
SET SHOWPLAN_XML ON;
-- SQL Server: capture actual plan + stats
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- PostgreSQL: full actual execution details
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT * FROM orders WHERE customer_id = 42;
SQL
Common Optimisation Patterns
-- 1. Avoid SELECT * in production — column projection reduces I/O
SELECT id, name, email -- not SELECT *
FROM customers
WHERE status = 'active';
-- 2. EXISTS is faster than IN for correlated subqueries
-- (stops as soon as first match is found)
SELECT * FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.id
AND o.amount > 1000
);
-- 3. Avoid row-by-row CURSOR operations — use set-based UPDATE
UPDATE orders
SET status = 'expired'
WHERE order_date < DATEADD(day, -90, GETDATE())
AND status = 'pending';
-- 4. Batch large deletes to avoid log bloat & long lock holds
WHILE 1 = 1
BEGIN
DELETE TOP (5000) FROM audit_log
WHERE created_at < '2023-01-01';
IF @@ROWCOUNT = 0 BREAK;
END;
SQL Server
UPDATE STATISTICS after bulk loads. In PostgreSQL, ANALYZE tablename.
Stored Procedures, Functions & Views
Stored Procedures vs Functions
Stored Procedure
- Can perform DML (INSERT/UPDATE/DELETE) and DDL.
- Can have multiple result sets and output parameters.
- Can call COMMIT / ROLLBACK.
- Cannot be called inside a SELECT statement.
- Use for complex business logic, ETL, batch operations.
User-Defined Function (UDF)
- Returns a single scalar value or a table.
- Can be called inside SELECT, WHERE, JOIN.
- Scalar UDFs are notoriously slow in SQL Server — they inhibit parallelism and execute row-by-row.
- Prefer inline table-valued functions (iTVFs) — they are inlined into the query plan like a macro.
-- Inline Table-Valued Function (fast — inlined by optimiser)
CREATE FUNCTION dbo.GetOrdersByCustomer
(@customer_id INT, @start_date DATE)
RETURNS TABLE
AS
RETURN (
SELECT id, order_date, amount, status
FROM orders
WHERE customer_id = @customer_id
AND order_date >= @start_date
);
GO
-- Usage: joins neatly with other tables
SELECT c.name, o.amount
FROM customers c
CROSS APPLY dbo.GetOrdersByCustomer(c.id, '2024-01-01') o;
SQL Server
Views & Materialised Views
Regular View
- A named, saved SELECT — no data is stored.
- Always reflects current data — no staleness.
- Can be slower: the underlying query runs every time.
- Use for abstraction and security (hide raw table structure).
Materialised View / Indexed View
- Results are physically stored and kept in sync.
- Dramatically faster reads on complex aggregations.
- Writes are slower — every base table INSERT/UPDATE/DELETE must update the materialised view.
- SQL Server: Indexed View. PostgreSQL: MATERIALIZED VIEW (refreshed manually with
REFRESH MATERIALIZED VIEW).
Triggers — Use Sparingly
-- AFTER INSERT trigger: maintain an audit log
CREATE TRIGGER trg_orders_audit
ON orders
AFTER INSERT
AS
BEGIN
SET NOCOUNT ON;
INSERT INTO orders_audit (order_id, action, actioned_at)
SELECT id, 'INSERT', GETUTCDATE()
FROM INSERTED; -- INSERTED is a virtual table of new rows
END;
SQL Server
Normalization & Schema Design
Normal Forms
| Normal Form | Rule | Violation example |
|---|---|---|
| 1NF | Each column holds one atomic value; no repeating groups. | Storing tag1,tag2,tag3 in a single tags column. |
| 2NF | In 1NF + every non-key column depends on the entire primary key (matters only with composite PKs). | order_items(order_id, product_id, product_name) — product_name depends only on product_id. |
| 3NF | In 2NF + no non-key column depends on another non-key column (no transitive dependencies). | employees(id, dept_id, dept_name) — dept_name depends on dept_id, not on id. |
| BCNF | Stricter 3NF — every determinant must be a candidate key. | Rare in practice. Mostly theoretical for multi-valued dependencies. |
When to Denormalise
Normalisation is a correctness tool, not an absolute law. Denormalise deliberately when:
Read-heavy reporting
Pre-aggregate or flatten data into a summary table or data warehouse fact table. Avoids expensive joins on every report query. Populate via nightly ETL or event-driven triggers.
Example: store order_count and lifetime_value directly on the customers row and update them on each order.
Audit / Snapshot columns
Duplicate the name of the product at the time of purchase onto the order line item rather than joining through a product table that could change. Preserves historical accuracy.
Pattern: order_items(product_id, product_name_snapshot, unit_price_snapshot).
Surrogate vs Natural Keys
Surrogate Key (INT IDENTITY / BIGSERIAL)
- System-generated, meaningless. Stable — never changes.
- Small (4–8 bytes). Ideal for clustered indexes — monotone inserts, no page splits.
- Opaque to users — doesn't leak business data.
Natural Key (email, order_number)
- Business-meaningful. Self-documenting.
- Wider — more bytes in every FK and join.
- Can change (email address changes break FK chains).
- Use as a UNIQUE constraint alongside a surrogate PK.
Enterprise Patterns
Table Partitioning
Partitioning splits a large table into smaller physical segments while keeping a single logical table. The most common strategy is range partitioning on date — each month or year lives on its own partition.
Benefits
- Partition elimination — queries with a date range predicate only scan relevant partitions, not the full table.
- Fast archival — switch an old partition to an archive table in milliseconds (metadata-only operation).
- Parallel scan — the engine can scan partitions in parallel.
Watch-outs
- Partition key must be in every index and every query predicate for elimination to work.
- Cross-partition queries are slower than a single-partition scan.
- Most small tables don't benefit — gains become real above ~100M rows.
Temporal Tables (System-Versioned)
SQL Server 2016+ and PostgreSQL (via extensions) support temporal tables — the database automatically tracks the full history of every row change. No manual audit log needed.
-- Create a system-versioned temporal table
CREATE TABLE products (
id INT PRIMARY KEY,
name NVARCHAR(200) NOT NULL,
price DECIMAL(10,2) NOT NULL,
valid_from DATETIME2 GENERATED ALWAYS AS ROW START,
valid_to DATETIME2 GENERATED ALWAYS AS ROW END,
PERIOD FOR SYSTEM_TIME (valid_from, valid_to)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.products_history));
-- Query the state of a row at a point in time
SELECT * FROM products
FOR SYSTEM_TIME AS OF '2024-06-01'
WHERE id = 42;
SQL Server
JSON in SQL Server & PostgreSQL
-- SQL Server: extract from JSON column
SELECT
id,
JSON_VALUE(metadata, '$.shipping.city') AS city,
JSON_VALUE(metadata, '$.shipping.country') AS country
FROM orders
WHERE JSON_VALUE(metadata, '$.priority') = 'high';
-- PostgreSQL JSONB: index a specific path
CREATE INDEX idx_metadata_priority
ON orders ((metadata ->> 'priority'));
SELECT *
FROM orders
WHERE metadata ->> 'priority' = 'high'; -- uses the index
SQL
High Availability & Replication Patterns
SQL Server Always On AG
Synchronous or asynchronous replicas. Automatic failover on sync replicas. Readable secondaries for reporting offload.
PostgreSQL Streaming Replication
WAL shipped to standbys continuously. Hot standbys are readable. Tools like Patroni automate failover and leader election.
Read Replicas
Route SELECT queries to replicas to scale reads horizontally. Introduces replication lag — applications must tolerate stale reads or route writes + immediate reads to primary.
Connection Pooling
PgBouncer (PostgreSQL) / SQL Server connection pool (built-in). Reduces connection overhead — databases handle hundreds of connections poorly; poolers handle thousands.
Sharding
Horizontally split data across multiple database servers by shard key (e.g. customer_id % N). Eliminates cross-shard joins — design your access patterns first.
CQRS at the DB Layer
Separate write model (normalised OLTP) from read model (denormalised, pre-aggregated). Events or CDC (Change Data Capture) sync them. Scales reads independently.
Upsert Patterns
-- SQL Server: MERGE statement (insert or update)
MERGE INTO products AS target
USING product_updates AS source
ON (target.sku = source.sku)
WHEN MATCHED THEN
UPDATE SET price = source.price, name = source.name
WHEN NOT MATCHED THEN
INSERT (sku, name, price) VALUES (source.sku, source.name, source.price)
WHEN NOT MATCHED BY SOURCE THEN
DELETE; -- remove products no longer in source
-- PostgreSQL: INSERT ... ON CONFLICT (cleaner for simple upserts)
INSERT INTO products (sku, name, price)
VALUES ('ABC-1', 'Widget', 9.99)
ON CONFLICT (sku)
DO UPDATE SET
name = EXCLUDED.name,
price = EXCLUDED.price;
SQL
Performance Checklist for Enterprise Schemas
- Every FK has a supporting index — unindexed FKs cause full scans on cascade operations.
- Use BIGINT not INT for high-volume PKs — INT overflows at ~2.1B rows.
- Store UTC everywhere — convert to local time in the application layer.
- Never use NTEXT/TEXT/IMAGE — replaced by NVARCHAR(MAX) / VARCHAR(MAX) / VARBINARY(MAX) since SQL Server 2005.
- Avoid implicit conversions —
WHERE varchar_col = 1causes a type cast on every row; the index is ignored. - Partition large tables by date — enables partition elimination and fast archival via partition switch.
- Enable Read Committed Snapshot Isolation (RCSI) on write-heavy OLTP databases — eliminates reader/writer blocking with a small tempdb overhead.
- Monitor wait statistics —
sys.dm_os_wait_stats(SQL Server) /pg_stat_activity(PostgreSQL) reveal bottlenecks faster than guessing. - Archive old data before it grows — partitioning + partition switch moves 100M rows in milliseconds. Deleting 100M rows takes hours.
- Rebuild / reorganise fragmented indexes — above 30% fragmentation rebuild, 5–30% reorganise. Schedule during off-peak hours.