SQL — Enterprise Reference

Foundations

SQL Command Categories

DDL — Data Definition

CREATE, ALTER, DROP, TRUNCATE. Defines and modifies schema structure. Auto-committed in most RDBMS.

DML — Data Manipulation

SELECT, INSERT, UPDATE, DELETE. Reads and modifies row data. Participates in transactions.

DCL — Data Control

GRANT, REVOKE. Manages user permissions and access control at the object level.

TCL — Transaction Control

BEGIN, COMMIT, ROLLBACK, SAVEPOINT. Controls the boundary and outcome of transactions.

DQL — Data Query

Some taxonomies separate SELECT into its own category. It is the most complex and performance-critical statement.

Execution Order

FROM → JOIN → WHERE → GROUP BY → HAVING → SELECT → DISTINCT → ORDER BY → LIMIT

Core Data Types

Category	SQL Server	PostgreSQL	Use when
Integers	`INT`, `BIGINT`, `SMALLINT`	`integer`, `bigint`, `smallint`	Counts, IDs, flags. Prefer BIGINT for PKs at scale.
Exact decimal	`DECIMAL(p,s)`, `NUMERIC(p,s)`	`numeric(p,s)`	Money, measurements. Never use FLOAT for currency.
Floating point	`FLOAT`, `REAL`	`double precision`, `real`	Scientific data. Expect rounding errors.
Strings	`VARCHAR(n)`, `NVARCHAR(n)`, `CHAR(n)`	`varchar(n)`, `text`, `char(n)`	VARCHAR for variable, CHAR for fixed-width codes.
Dates/Times	`DATETIME2`, `DATE`, `TIME`, `DATETIMEOFFSET`	`timestamp`, `timestamptz`, `date`, `time`	Always store in UTC. Use DATETIMEOFFSET/timestamptz for global apps.
Boolean	`BIT`	`boolean`	SQL Server has no native BOOL — use BIT (0/1).
Semi-structured	`JSON` (stored as string), `XML`	`json`, `jsonb`	PostgreSQL's JSONB is binary and indexable — prefer it over json.
Identity / UUID	`IDENTITY(1,1)`, `UNIQUEIDENTIFIER`	`SERIAL`, `BIGSERIAL`, `uuid`	Use UUIDs when rows are merged across databases; INT otherwise (smaller, faster index).

NULL Semantics — The Three-Valued Logic

SQL uses three-valued logic: TRUE, FALSE, and UNKNOWN. Any comparison with NULL yields UNKNOWN, which is treated as FALSE in WHERE clauses. This causes common bugs.

-- These never return rows — comparisons with NULL are UNKNOWN
SELECT * FROM orders WHERE deleted_at = NULL;   -- WRONG
SELECT * FROM orders WHERE deleted_at != NULL;  -- WRONG

-- Correct NULL checks
SELECT * FROM orders WHERE deleted_at IS NULL;
SELECT * FROM orders WHERE deleted_at IS NOT NULL;

-- COALESCE returns first non-NULL value
SELECT COALESCE(phone, mobile, 'N/A') AS contact
FROM customers;

-- NULLIF returns NULL when both args are equal (avoids div-by-zero)
SELECT revenue / NULLIF(quantity, 0) AS unit_price
FROM sales;
SQL

Enterprise rule: Declare columns NOT NULL wherever possible. NULLable columns complicate joins, aggregations, and indexes. A NULL in a composite index is treated differently across databases — SQL Server indexes NULLs, PostgreSQL includes them in partial indexes.

Constraints

Declarative Integrity

PRIMARY KEY — unique + NOT NULL. Every table should have one.
FOREIGN KEY — referential integrity. Prevents orphaned rows.
UNIQUE — one or more columns form a natural key.
CHECK — arbitrary expression must be TRUE (e.g. price > 0).
DEFAULT — supplies a value when INSERT omits the column.
NOT NULL — prevents NULL storage for that column.

FK Cascade Strategies

ALTER TABLE order_items
  ADD CONSTRAINT fk_order
  FOREIGN KEY (order_id)
  REFERENCES orders(id)
  ON DELETE CASCADE    -- auto-delete children
  ON UPDATE NO ACTION; -- block PK changes

-- Other options: SET NULL, SET DEFAULT, RESTRICT
SQL

Joins, Subqueries & CTEs

Join Types

Join	Returns	When to use
`INNER JOIN`	Only rows with a match in both tables	The default. Use when both sides must exist.
`LEFT JOIN`	All rows from left + matched rows from right (NULL if no match)	Optional relationship — e.g. customers with or without orders.
`RIGHT JOIN`	All rows from right + matched rows from left	Rare. Swap table order and use LEFT JOIN instead for readability.
`FULL OUTER JOIN`	All rows from both; NULL where no match	Reconciliation / diff queries between two datasets.
`CROSS JOIN`	Cartesian product (every combination)	Generating test data, date/time grids. Dangerous on large tables.
`SELF JOIN`	Table joined to itself with aliases	Hierarchies, org charts, adjacency lists.

-- LEFT JOIN with NULL check to find missing relationships
SELECT c.id, c.name
FROM       customers  c
LEFT JOIN  orders     o  ON o.customer_id = c.id
WHERE o.id IS NULL;  -- customers with NO orders

-- Self join: employee + their manager
SELECT
    e.name  AS employee,
    m.name  AS manager
FROM       employees e
LEFT JOIN  employees m  ON m.id = e.manager_id;
SQL

Common Table Expressions (CTEs)

CTEs are named subqueries defined with WITH. They improve readability, allow self-reference (recursive CTEs), and in many databases are evaluated once and reused.

-- Non-recursive CTE: clean up a complex SELECT
WITH
  monthly_revenue AS (
    SELECT
      DATE_TRUNC('month', order_date) AS month,
      SUM(amount)                       AS revenue
    FROM  orders
    GROUP BY 1
  ),
  prev_month AS (
    SELECT
      month,
      revenue,
      LAG(revenue) OVER (ORDER BY month) AS prev_revenue
    FROM monthly_revenue
  )
SELECT
  month,
  revenue,
  ROUND((revenue - prev_revenue) * 100.0 / prev_revenue, 2) AS pct_change
FROM prev_month
ORDER BY month;
SQL

-- Recursive CTE: walk an org-chart hierarchy
WITH RECURSIVE org_tree AS (
  -- Anchor: start from the CEO
  SELECT id, name, manager_id, 0 AS depth
  FROM   employees
  WHERE  manager_id IS NULL

  UNION ALL

  -- Recursive step: each employee's reports
  SELECT e.id, e.name, e.manager_id, t.depth + 1
  FROM   employees  e
  JOIN   org_tree   t  ON t.id = e.manager_id
)
SELECT
  REPEAT('  ', depth) || name AS hierarchy
FROM  org_tree
ORDER BY depth, name;
SQL — PostgreSQL

Subquery vs CTE vs Temp Table: Use CTEs for readability on one-off queries. Use temp tables (#tmp / TEMP TABLE) when the intermediate result is large and queried multiple times — the engine will materialise it once and build statistics on it, which can dramatically improve plan quality.

Aggregation & Grouping

Aggregate Functions

COUNT

COUNT(*) counts rows. COUNT(col) ignores NULLs. COUNT(DISTINCT col) counts unique non-NULL values.

SUM / AVG

Both ignore NULLs. AVG divides by non-NULL count, not total rows — can be surprising with sparse data.

MIN / MAX

Work on any orderable type (numbers, strings, dates). Also ignore NULLs.

STRING_AGG

STRING_AGG(col, ', ') concatenates values into a comma-separated list. Supports WITHIN GROUP (ORDER BY ...) in SQL Server.

GROUPING SETS

Compute multiple groupings in one pass: GROUPING SETS ((a,b), (a), ()) gives subtotals and grand totals without UNION ALL.

ROLLUP / CUBE

ROLLUP(a,b,c) produces hierarchical totals. CUBE(a,b) produces all possible combinations of groupings.

-- HAVING filters on the result of aggregation
-- WHERE filters before aggregation — do not confuse them
SELECT
    customer_id,
    COUNT(*)                                          AS order_count,
    SUM(amount)                                       AS total_spent,
    AVG(amount)                                       AS avg_order,
    MAX(order_date)                                   AS last_order,
    STRING_AGG(status, ', ' ORDER BY order_date)    AS status_history
FROM    orders
WHERE   order_date >= '2024-01-01'
GROUP BY customer_id
HAVING  SUM(amount) > 1000    -- only high-value customers
ORDER BY total_spent DESC;
SQL

-- ROLLUP: sales totals by region + country, with subtotals
SELECT
    COALESCE(region, 'ALL REGIONS') AS region,
    COALESCE(country, 'ALL')         AS country,
    SUM(revenue)                      AS total
FROM  sales
GROUP BY ROLLUP(region, country)
ORDER BY region, country;
SQL

Window Functions

Window functions compute a value across a set of rows related to the current row without collapsing them. Unlike GROUP BY, the original rows are preserved.

Anatomy: function() OVER (PARTITION BY ... ORDER BY ... ROWS/RANGE BETWEEN ...). All three clauses are optional. No PARTITION BY = the whole result set is one window.

Ranking Functions

SELECT
    name,
    department,
    salary,
    ROW_NUMBER()  OVER (PARTITION BY department ORDER BY salary DESC) AS row_num,
    RANK()        OVER (PARTITION BY department ORDER BY salary DESC) AS rank,
    DENSE_RANK()  OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rank,
    NTILE(4)      OVER (PARTITION BY department ORDER BY salary DESC) AS quartile
FROM  employees;

-- ROW_NUMBER: always unique (1,2,3,4,5)
-- RANK:       ties share same rank, gaps after (1,1,3,4,5)
-- DENSE_RANK: ties share same rank, no gaps   (1,1,2,3,4)
SQL

Offset & Frame Functions

SELECT
    order_date,
    revenue,
    -- Prior and next row values
    LAG(revenue, 1, 0)  OVER (ORDER BY order_date) AS prev_revenue,
    LEAD(revenue, 1, 0) OVER (ORDER BY order_date) AS next_revenue,
    -- First and last value in partition
    FIRST_VALUE(revenue) OVER (PARTITION BY customer_id ORDER BY order_date
                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS first_order_value,
    -- 7-day rolling average
    AVG(revenue) OVER (ORDER BY order_date
                   ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS rolling_7d_avg,
    -- Running total
    SUM(revenue)  OVER (ORDER BY order_date
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM  daily_sales;
SQL

ROWS vs RANGE: ROWS BETWEEN refers to physical row positions. RANGE BETWEEN groups all rows with the same ORDER BY value together. Use ROWS for rolling calculations — RANGE can include unexpected extra rows when there are ties.

Practical Pattern — Top-N Per Group

-- Top 3 products by revenue, per category
WITH ranked AS (
    SELECT
        category,
        product_name,
        revenue,
        ROW_NUMBER() OVER (PARTITION BY category ORDER BY revenue DESC) AS rn
    FROM  products
)
SELECT  category, product_name, revenue
FROM    ranked
WHERE   rn <= 3;
SQL

Indexes & Access Patterns

Index Internals

Most indexes are B-Tree structures. Each leaf page contains the indexed value(s) plus a pointer to the actual row. Lookups are O(log n); range scans walk leaf pages in order.

Clustered Index (SQL Server) / Heap Table with PK (Postgres)

One per table — the data pages ARE the index.
Row data is physically sorted by the clustered key.
Range scans and ORDER BY on the key are fast — sequential I/O.
Inserts on a monotone key (IDENTITY) avoid page splits. Random UUIDs cause page splits — fragmentation.

Non-Clustered Index

Separate structure with key + row locator (RID or clustered key).
Multiple per table (up to 999 in SQL Server).
A Key Lookup back to the base table is needed for non-indexed columns.
Add an INCLUDE clause to make it a covering index — avoids the lookup entirely.

Creating Effective Indexes

-- Basic index
CREATE INDEX idx_orders_customer
    ON orders (customer_id);

-- Composite index — column order matters!
-- Best for: WHERE customer_id = ? AND status = ?
--       or: WHERE customer_id = ? (prefix match)
CREATE INDEX idx_orders_cust_status
    ON orders (customer_id, status);

-- Covering index: avoids going back to base table
CREATE INDEX idx_orders_covering
    ON orders (customer_id, order_date)
    INCLUDE   (amount, status);  -- SQL Server / PostgreSQL

-- Partial index (PostgreSQL): index only active orders
CREATE INDEX idx_active_orders
    ON orders (customer_id)
    WHERE status = 'active';

-- Filtered index (SQL Server equivalent)
CREATE INDEX idx_active_orders
    ON orders (customer_id)
    WHERE status = 'active';  -- SQL Server 2008+
SQL

The Selectivity & SARGability Rules

Index Selectivity

An index is useful when it filters out most rows. A column with 2 distinct values (boolean) is low-selectivity — a full scan may be faster. A column with millions of distinct values (email, UUID) is high-selectivity — the index shines.

Rule of thumb: if a query returns >5–15% of rows, the optimiser will likely ignore the index and scan.

SARGable Predicates

A predicate is SARGable (Search ARGument Able) if the index can be used directly. Wrapping a column in a function breaks SARGability.

WHERE YEAR(order_date) = 2024 — function on column, non-SARGable
WHERE LOWER(email) = 'a@b.com' — non-SARGable
WHERE amount + 10 > 100 — non-SARGable

WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01'
WHERE email = LOWER('A@B.COM') — move the function to the constant
WHERE amount > 90

Index over-indexing: Every index slows down INSERT, UPDATE, and DELETE because all indexes on the table must be maintained. Audit unused indexes regularly — in SQL Server use sys.dm_db_index_usage_stats; in PostgreSQL use pg_stat_user_indexes.

Transactions, ACID & Isolation

ACID Properties

Atomicity

All statements in a transaction succeed together or all are rolled back. No partial updates.

Consistency

A transaction can only bring the database from one valid state to another. All constraints, cascades, and triggers fire.

Isolation

Concurrent transactions appear to execute serially. The degree of isolation is tunable — see isolation levels.

Durability

Committed data survives crashes. Achieved via write-ahead logging (WAL / transaction log) and checkpoints.

Write-Ahead Log

Every change is written to the WAL before the data page. On crash, replay the WAL from the last checkpoint to recover.

SAVEPOINT

Partial rollback within a transaction without aborting the whole thing. Useful for loop-based batch processing.

Isolation Levels & Read Phenomena

Isolation Level	Dirty Read	Non-Repeatable Read	Phantom Read	Use when
`READ UNCOMMITTED`	✓ Possible	✓ Possible	✓ Possible	Approximate analytics only. Never for financial data.
`READ COMMITTED`	✗ Prevented	✓ Possible	✓ Possible	Default in SQL Server & PostgreSQL. Most OLTP workloads.
`REPEATABLE READ`	✗	✗ Prevented	✓ Possible	When you re-read the same rows and need consistent results.
`SERIALIZABLE`	✗	✗	✗ Prevented	Full isolation. Highest correctness, lowest throughput.
`SNAPSHOT` (RCSI)	✗	✗	✗	SQL Server's MVCC mode. Readers don't block writers. Best for mixed OLTP/reporting.

-- Setting isolation level for a session
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

BEGIN TRANSACTION;
    UPDATE accounts SET balance = balance - 500
    WHERE  id = 1;

    UPDATE accounts SET balance = balance + 500
    WHERE  id = 2;

    -- Only commits if BOTH updates succeed
COMMIT;
SQL Server

Locking & Deadlocks

Lock Types

Shared (S) — acquired by reads. Compatible with other S locks.
Exclusive (X) — acquired by writes. Blocks all other locks.
Update (U) — intent to update. Prevents U+U deadlocks during read-then-write.
Intent (IS/IX) — table-level signal that a row/page is locked below.
Key-Range — used by SERIALIZABLE to prevent phantom inserts.

Preventing Deadlocks

Access tables in the same order — most deadlocks come from T1 locking A then B while T2 locks B then A.
Keep transactions short — minimise the window where locks are held.
Use NOLOCK only for dirty-read-safe reporting — never for financial writes.
Enable RCSI / Snapshot Isolation — readers never block writers in MVCC mode.
Retry on deadlock — SQL Server error 1205; catch it and retry the transaction.

Query Optimisation

Reading Execution Plans

The query optimiser generates a plan — a tree of physical operators. Read the plan right-to-left, bottom-up. Arrows between operators show row flow; width indicates estimated rows.

Table Scan

Reads every row. Expected on small tables or low-selectivity queries. Bad on large tables — look for a missing index.

Index Seek

Uses the B-Tree to jump directly to matching rows. The goal. Appears when a SARGable predicate matches an index.

Index Scan

Reads all rows in the index. Better than a table scan if the index is covering. Consider a narrower predicate.

Key Lookup

Follows a pointer from a non-clustered index back to the base table. Expensive when repeated thousands of times — add INCLUDE columns to eliminate it.

Hash Join

Builds an in-memory hash table from the smaller input. Good for large unordered sets. Spills to disk if memory is exceeded — look for memory grant warnings.

Nested Loops

For each row in the outer input, probe the inner. Excellent when outer is small and inner has an index seek. Terrible with large outer inputs.

-- SQL Server: show estimated plan
SET SHOWPLAN_XML ON;

-- SQL Server: capture actual plan + stats
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- PostgreSQL: full actual execution details
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT * FROM orders WHERE customer_id = 42;
SQL

Common Optimisation Patterns

-- 1. Avoid SELECT * in production — column projection reduces I/O
SELECT id, name, email  -- not SELECT *
FROM   customers
WHERE  status = 'active';

-- 2. EXISTS is faster than IN for correlated subqueries
--    (stops as soon as first match is found)
SELECT * FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.customer_id = c.id
      AND o.amount > 1000
);

-- 3. Avoid row-by-row CURSOR operations — use set-based UPDATE
UPDATE orders
SET    status = 'expired'
WHERE  order_date < DATEADD(day, -90, GETDATE())
  AND  status     = 'pending';

-- 4. Batch large deletes to avoid log bloat & long lock holds
WHILE 1 = 1
BEGIN
    DELETE TOP (5000) FROM audit_log
    WHERE created_at < '2023-01-01';
    IF @@ROWCOUNT = 0 BREAK;
END;
SQL Server

Statistics matter as much as indexes. The optimiser uses column statistics to estimate row counts. Stale statistics produce bad plans. Run UPDATE STATISTICS after bulk loads. In PostgreSQL, ANALYZE tablename.

Stored Procedures, Functions & Views

Stored Procedures vs Functions

Stored Procedure

Can perform DML (INSERT/UPDATE/DELETE) and DDL.
Can have multiple result sets and output parameters.
Can call COMMIT / ROLLBACK.
Cannot be called inside a SELECT statement.
Use for complex business logic, ETL, batch operations.

User-Defined Function (UDF)

Returns a single scalar value or a table.
Can be called inside SELECT, WHERE, JOIN.
Scalar UDFs are notoriously slow in SQL Server — they inhibit parallelism and execute row-by-row.
Prefer inline table-valued functions (iTVFs) — they are inlined into the query plan like a macro.

-- Inline Table-Valued Function (fast — inlined by optimiser)
CREATE FUNCTION dbo.GetOrdersByCustomer
    (@customer_id INT, @start_date DATE)
RETURNS TABLE
AS
RETURN (
    SELECT id, order_date, amount, status
    FROM   orders
    WHERE  customer_id = @customer_id
      AND  order_date  >= @start_date
);
GO

-- Usage: joins neatly with other tables
SELECT c.name, o.amount
FROM   customers c
CROSS APPLY dbo.GetOrdersByCustomer(c.id, '2024-01-01') o;
SQL Server

Views & Materialised Views

Regular View

A named, saved SELECT — no data is stored.
Always reflects current data — no staleness.
Can be slower: the underlying query runs every time.
Use for abstraction and security (hide raw table structure).

Materialised View / Indexed View

Results are physically stored and kept in sync.
Dramatically faster reads on complex aggregations.
Writes are slower — every base table INSERT/UPDATE/DELETE must update the materialised view.
SQL Server: Indexed View. PostgreSQL: MATERIALIZED VIEW (refreshed manually with REFRESH MATERIALIZED VIEW).

Triggers — Use Sparingly

Triggers fire invisibly — they are hard to debug, complicate migrations, and cause unexpected latency on bulk operations. Prefer application-layer logic or computed columns. If you must use triggers, keep them simple and always consider the INSERTED / DELETED pseudo-tables that SQL Server provides.

-- AFTER INSERT trigger: maintain an audit log
CREATE TRIGGER trg_orders_audit
ON orders
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO orders_audit (order_id, action, actioned_at)
    SELECT id, 'INSERT', GETUTCDATE()
    FROM   INSERTED;  -- INSERTED is a virtual table of new rows
END;
SQL Server

Normalization & Schema Design

Normal Forms

Normal Form	Rule	Violation example
1NF	Each column holds one atomic value; no repeating groups.	Storing `tag1,tag2,tag3` in a single `tags` column.
2NF	In 1NF + every non-key column depends on the entire primary key (matters only with composite PKs).	`order_items(order_id, product_id, product_name)` — product_name depends only on product_id.
3NF	In 2NF + no non-key column depends on another non-key column (no transitive dependencies).	`employees(id, dept_id, dept_name)` — dept_name depends on dept_id, not on id.
BCNF	Stricter 3NF — every determinant must be a candidate key.	Rare in practice. Mostly theoretical for multi-valued dependencies.

When to Denormalise

Normalisation is a correctness tool, not an absolute law. Denormalise deliberately when:

Read-heavy reporting

Pre-aggregate or flatten data into a summary table or data warehouse fact table. Avoids expensive joins on every report query. Populate via nightly ETL or event-driven triggers.

Example: store order_count and lifetime_value directly on the customers row and update them on each order.

Audit / Snapshot columns

Duplicate the name of the product at the time of purchase onto the order line item rather than joining through a product table that could change. Preserves historical accuracy.

Pattern: order_items(product_id, product_name_snapshot, unit_price_snapshot).

Surrogate vs Natural Keys

Surrogate Key (INT IDENTITY / BIGSERIAL)

System-generated, meaningless. Stable — never changes.
Small (4–8 bytes). Ideal for clustered indexes — monotone inserts, no page splits.
Opaque to users — doesn't leak business data.

Natural Key (email, order_number)

Business-meaningful. Self-documenting.
Wider — more bytes in every FK and join.
Can change (email address changes break FK chains).
Use as a UNIQUE constraint alongside a surrogate PK.

Enterprise Patterns

Table Partitioning

Partitioning splits a large table into smaller physical segments while keeping a single logical table. The most common strategy is range partitioning on date — each month or year lives on its own partition.

Benefits

Partition elimination — queries with a date range predicate only scan relevant partitions, not the full table.
Fast archival — switch an old partition to an archive table in milliseconds (metadata-only operation).
Parallel scan — the engine can scan partitions in parallel.

Watch-outs

Partition key must be in every index and every query predicate for elimination to work.
Cross-partition queries are slower than a single-partition scan.
Most small tables don't benefit — gains become real above ~100M rows.

Temporal Tables (System-Versioned)

SQL Server 2016+ and PostgreSQL (via extensions) support temporal tables — the database automatically tracks the full history of every row change. No manual audit log needed.

-- Create a system-versioned temporal table
CREATE TABLE products (
    id           INT              PRIMARY KEY,
    name         NVARCHAR(200)   NOT NULL,
    price        DECIMAL(10,2)   NOT NULL,
    valid_from   DATETIME2        GENERATED ALWAYS AS ROW START,
    valid_to     DATETIME2        GENERATED ALWAYS AS ROW END,
    PERIOD FOR SYSTEM_TIME (valid_from, valid_to)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.products_history));

-- Query the state of a row at a point in time
SELECT * FROM products
FOR SYSTEM_TIME AS OF '2024-06-01'
WHERE id = 42;
SQL Server

JSON in SQL Server & PostgreSQL

-- SQL Server: extract from JSON column
SELECT
    id,
    JSON_VALUE(metadata, '$.shipping.city')   AS city,
    JSON_VALUE(metadata, '$.shipping.country') AS country
FROM  orders
WHERE JSON_VALUE(metadata, '$.priority') = 'high';

-- PostgreSQL JSONB: index a specific path
CREATE INDEX idx_metadata_priority
    ON orders ((metadata ->> 'priority'));

SELECT *
FROM  orders
WHERE metadata ->> 'priority' = 'high';  -- uses the index
SQL

High Availability & Replication Patterns

SQL Server Always On AG

Synchronous or asynchronous replicas. Automatic failover on sync replicas. Readable secondaries for reporting offload.

PostgreSQL Streaming Replication

WAL shipped to standbys continuously. Hot standbys are readable. Tools like Patroni automate failover and leader election.

Read Replicas

Route SELECT queries to replicas to scale reads horizontally. Introduces replication lag — applications must tolerate stale reads or route writes + immediate reads to primary.

Connection Pooling

PgBouncer (PostgreSQL) / SQL Server connection pool (built-in). Reduces connection overhead — databases handle hundreds of connections poorly; poolers handle thousands.

Sharding

Horizontally split data across multiple database servers by shard key (e.g. customer_id % N). Eliminates cross-shard joins — design your access patterns first.

CQRS at the DB Layer

Separate write model (normalised OLTP) from read model (denormalised, pre-aggregated). Events or CDC (Change Data Capture) sync them. Scales reads independently.

Upsert Patterns

-- SQL Server: MERGE statement (insert or update)
MERGE INTO products         AS target
USING      product_updates  AS source
    ON (target.sku = source.sku)
WHEN MATCHED THEN
    UPDATE SET price = source.price, name = source.name
WHEN NOT MATCHED THEN
    INSERT (sku, name, price) VALUES (source.sku, source.name, source.price)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;  -- remove products no longer in source

-- PostgreSQL: INSERT ... ON CONFLICT (cleaner for simple upserts)
INSERT INTO products (sku, name, price)
VALUES ('ABC-1', 'Widget', 9.99)
ON CONFLICT (sku)
DO UPDATE SET
    name  = EXCLUDED.name,
    price = EXCLUDED.price;
SQL

Performance Checklist for Enterprise Schemas

Every FK has a supporting index — unindexed FKs cause full scans on cascade operations.
Use BIGINT not INT for high-volume PKs — INT overflows at ~2.1B rows.
Store UTC everywhere — convert to local time in the application layer.
Never use NTEXT/TEXT/IMAGE — replaced by NVARCHAR(MAX) / VARCHAR(MAX) / VARBINARY(MAX) since SQL Server 2005.
Avoid implicit conversions — WHERE varchar_col = 1 causes a type cast on every row; the index is ignored.
Partition large tables by date — enables partition elimination and fast archival via partition switch.
Enable Read Committed Snapshot Isolation (RCSI) on write-heavy OLTP databases — eliminates reader/writer blocking with a small tempdb overhead.
Monitor wait statistics — sys.dm_os_wait_stats (SQL Server) / pg_stat_activity (PostgreSQL) reveal bottlenecks faster than guessing.
Archive old data before it grows — partitioning + partition switch moves 100M rows in milliseconds. Deleting 100M rows takes hours.
Rebuild / reorganise fragmented indexes — above 30% fragmentation rebuild, 5–30% reorganise. Schedule during off-peak hours.