Advanced PostgreSQL Interview Questions and Answers

🐘 Advanced PostgreSQL Interview Questions

This lesson targets mid-to-senior roles. Topics include CTEs, window functions, JSONB, full-text search, advanced indexing, MVCC, transaction isolation levels, stored procedures, triggers, partitioning, and performance tuning. These questions separate SQL writers from database architects.

Questions & Answers

01 What are CTEs (Common Table Expressions)? What is a recursive CTE? ►

SQL A CTE is a named, temporary result set defined at the beginning of a query with the WITH keyword. It improves readability and can be referenced multiple times in the same query.

-- Basic CTE
WITH high_value_orders AS (
    SELECT customer_id, SUM(total) AS lifetime_value
    FROM orders
    WHERE status = 'delivered'
    GROUP BY customer_id
    HAVING SUM(total) > 5000
)
SELECT c.name, h.lifetime_value
FROM high_value_orders h
JOIN customers c ON c.id = h.customer_id
ORDER BY h.lifetime_value DESC;

-- Multiple CTEs
WITH
  monthly_sales AS (SELECT DATE_TRUNC('month', created_at) AS month, SUM(total) AS revenue FROM orders GROUP BY 1),
  avg_monthly   AS (SELECT AVG(revenue) AS avg_rev FROM monthly_sales)
SELECT ms.month, ms.revenue, am.avg_rev,
       ms.revenue - am.avg_rev AS diff_from_avg
FROM monthly_sales ms CROSS JOIN avg_monthly am
ORDER BY ms.month;

-- Recursive CTE — traverse hierarchies (org charts, categories, graphs)
WITH RECURSIVE org_tree AS (
    -- Base case: top-level employees (no manager)
    SELECT id, name, manager_id, 0 AS depth, name::TEXT AS path
    FROM employees WHERE manager_id IS NULL

    UNION ALL

    -- Recursive case: join each employee to their manager's row
    SELECT e.id, e.name, e.manager_id, ot.depth + 1,
           ot.path || ' > ' || e.name
    FROM employees e
    JOIN org_tree ot ON ot.id = e.manager_id
)
SELECT depth, path FROM org_tree ORDER BY path;

02 What are window functions? Give practical examples. ►

SQL Window functions perform calculations across a set of rows related to the current row — without collapsing them into one (unlike aggregate functions). They use the OVER() clause.

SELECT
    name,
    department,
    salary,

    -- Ranking functions
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num,
    RANK()       OVER (PARTITION BY department ORDER BY salary DESC) AS rank,
    DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rank,
    NTILE(4)     OVER (ORDER BY salary)                              AS quartile,
    PERCENT_RANK() OVER (ORDER BY salary)                           AS pct_rank,

    -- Aggregate as window
    SUM(salary)  OVER (PARTITION BY department)                     AS dept_total,
    AVG(salary)  OVER (PARTITION BY department)                     AS dept_avg,
    COUNT(*)     OVER (PARTITION BY department)                     AS dept_headcount,

    -- Running total (cumulative sum)
    SUM(salary)  OVER (ORDER BY hire_date
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total,

    -- 3-row moving average
    AVG(salary)  OVER (ORDER BY hire_date
                       ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)   AS moving_avg,

    -- Lead/Lag — access adjacent rows
    LAG(salary, 1)  OVER (PARTITION BY department ORDER BY hire_date) AS prev_salary,
    LEAD(salary, 1) OVER (PARTITION BY department ORDER BY hire_date) AS next_salary,

    -- First/Last value in partition
    FIRST_VALUE(name) OVER (PARTITION BY department ORDER BY salary DESC) AS top_earner

FROM employees;

03 What is JSONB in PostgreSQL? How does it differ from JSON? ►

Data Types

JSON — stores an exact text copy of the JSON input. Preserves whitespace, key order, and duplicate keys. Re-parses on every access. Faster to write.
JSONB — stores parsed, binary representation. Removes whitespace, eliminates duplicate keys (last value wins), normalises key order. Faster to read and supports indexing. Preferred in almost all cases.

CREATE TABLE products (
    id       BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name     TEXT NOT NULL,
    metadata JSONB NOT NULL DEFAULT '{}'
);

INSERT INTO products (name, metadata) VALUES
('Widget', '{"colour": "red", "sizes": [1,2,3], "specs": {"weight": 0.5}}');

-- Access operators
SELECT metadata->'colour'              FROM products; -- "red" (JSON text)
SELECT metadata->>'colour'             FROM products; -- red  (text, unquoted)
SELECT metadata->'specs'->>'weight'    FROM products; -- 0.5
SELECT metadata#>>'{specs,weight}'     FROM products; -- path operator: 0.5

-- Test existence
SELECT * FROM products WHERE metadata ? 'colour';          -- key exists
SELECT * FROM products WHERE metadata ?| ARRAY['colour','size']; -- any key
SELECT * FROM products WHERE metadata ?& ARRAY['colour','specs']; -- all keys

-- Containment
SELECT * FROM products WHERE metadata @> '{"colour": "red"}';

-- Modify JSONB
UPDATE products SET metadata = metadata || '{"on_sale": true}';        -- merge/overwrite keys
UPDATE products SET metadata = metadata - 'on_sale';                    -- remove key
UPDATE products SET metadata = jsonb_set(metadata, '{specs,weight}', '0.7'); -- update nested

-- GIN index for fast JSONB queries
CREATE INDEX idx_products_metadata ON products USING GIN (metadata);
CREATE INDEX idx_products_colour ON products USING GIN ((metadata -> 'colour'));

04 What is MVCC (Multi-Version Concurrency Control) in PostgreSQL? ►

Internals MVCC is PostgreSQL’s mechanism for handling concurrent access without locking readers. Instead of locking rows during a read, PostgreSQL maintains multiple versions of each row. Readers see a consistent snapshot of data as of when their transaction began — they never block writers and writers never block readers.

How it works:

Every row version has a xmin (transaction that created it) and xmax (transaction that deleted/updated it)
When you UPDATE a row, PostgreSQL writes a new row version — the old version is preserved for concurrent transactions
Each transaction sees only row versions created before it started (depending on isolation level)
Old row versions become “dead tuples” when no transaction needs them anymore
VACUUM reclaims space used by dead tuples and updates statistics

-- See MVCC internals
SELECT xmin, xmax, ctid, * FROM orders WHERE id = 1;
-- ctid = physical location (page, offset) of the row version

-- Check for bloat (too many dead tuples)
SELECT relname, n_dead_tup, n_live_tup,
       ROUND(n_dead_tup::NUMERIC / NULLIF(n_live_tup + n_dead_tup, 0) * 100, 2) AS dead_pct
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

05 What are transaction isolation levels in PostgreSQL? ►

Transactions Isolation levels control what concurrent transaction data a transaction can see. Higher isolation = more consistency, more potential for serialisation conflicts.

READ UNCOMMITTED — PostgreSQL treats this as READ COMMITTED. Dirty reads are never allowed.
READ COMMITTED (default) — each statement sees only committed data as of its own start time. A second SELECT in the same transaction may see different data if another transaction committed between the two SELECTs.
REPEATABLE READ — the entire transaction sees the snapshot taken at its first statement. Prevents non-repeatable reads. Phantom reads prevented in PostgreSQL (stronger than SQL standard requires).
SERIALIZABLE — strictest. Transactions appear to execute one at a time. Detects and aborts transactions that would produce a result inconsistent with serial execution. Uses Serializable Snapshot Isolation (SSI) — predicate locking.

-- Set isolation level for a transaction
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE id = 1;  -- snapshot taken here
-- ... some time passes, another TX updates the balance ...
SELECT balance FROM accounts WHERE id = 1;  -- still sees original value
COMMIT;

-- Serializable — prevents write skew anomalies
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- If your TX conflicts with another's READ predicate, one will be rolled back
-- with ERROR: could not serialize access due to concurrent update
COMMIT;

06 What are the PostgreSQL index types and when do you use each? ►

Indexes

B-tree (default) — balanced tree for equality (=), range (<, >, BETWEEN), sorting (ORDER BY). Works for most cases.
Hash — faster than B-tree for equality only (=). Smaller than B-tree. Use when only equality lookups are needed.
GIN (Generalised Inverted Index) — for types with multiple component values: arrays, JSONB, full-text search (tsvector). Handles containment (@>), existence (?) operators.
GiST (Generalised Search Tree) — flexible framework for geometric data, ranges, full-text search. Supports overlap, containment, distance operators.
SP-GiST — space-partitioned GiST; good for non-balanced data structures (quadtree, k-d tree).
BRIN (Block Range Index) — very small index for naturally ordered large tables (timestamps, sequential IDs). Stores min/max per block range — approximate but tiny.

-- B-tree (default, most common)
CREATE INDEX ON orders (created_at);

-- GIN for JSONB and arrays
CREATE INDEX ON products USING GIN (metadata);
CREATE INDEX ON articles USING GIN (tags);        -- TEXT[]

-- GIN for full-text search
CREATE INDEX ON articles USING GIN (to_tsvector('english', content));

-- BRIN for large append-only tables (e.g., logs table, ordered by time)
CREATE INDEX ON server_logs USING BRIN (logged_at) WITH (pages_per_range = 128);

-- Hash for equality-only lookups
CREATE INDEX ON sessions USING HASH (session_token);

07 What is full-text search in PostgreSQL? ►

Search PostgreSQL has built-in full-text search using tsvector (document representation) and tsquery (search query).

-- Convert text to tsvector (tokenises, removes stop words, stems)
SELECT to_tsvector('english', 'The Quick Brown Foxes Jumped Over');
-- 'brown':3 'fox':4 'jump':5 'quick':2

-- Convert query to tsquery
SELECT to_tsquery('english', 'quick & fox');    -- quick AND fox
SELECT plainto_tsquery('english', 'quick fox');  -- simpler: auto AND
SELECT websearch_to_tsquery('english', '"quick fox" OR lazy'); -- phrase + OR

-- Search
SELECT title FROM articles
WHERE to_tsvector('english', title || ' ' || body) @@ to_tsquery('english', 'postgresql & performance');

-- Store tsvector in a generated column (recommended for performance)
ALTER TABLE articles
ADD COLUMN fts tsvector GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(body, '')), 'B')
) STORED;

CREATE INDEX idx_articles_fts ON articles USING GIN (fts);

SELECT title,
       ts_rank(fts, query)                 AS rank,
       ts_headline('english', body, query) AS excerpt
FROM articles, to_tsquery('english', 'postgresql') AS query
WHERE fts @@ query
ORDER BY rank DESC
LIMIT 10;

08 What are stored procedures and functions in PostgreSQL? What is the difference? ►

PL/pgSQL

Function (CREATE FUNCTION) — returns a value. Can be called inside SQL expressions. Cannot use transaction control commands (COMMIT/ROLLBACK) inside the function body (it runs within the caller’s transaction).
Procedure (CREATE PROCEDURE, PostgreSQL 11+) — does not return a value directly. Called with CALL. Can contain COMMIT and ROLLBACK — manages its own transaction boundaries. Used for batch operations and ETL.

-- Function — returns a value, used in SQL expressions
CREATE OR REPLACE FUNCTION calculate_tax(price NUMERIC, rate NUMERIC DEFAULT 0.2)
RETURNS NUMERIC
LANGUAGE plpgsql
AS $$
BEGIN
    RETURN ROUND(price * rate, 2);
END;
$$;

SELECT name, price, calculate_tax(price) AS tax FROM products;

-- Procedure — no return value, can COMMIT mid-operation
CREATE OR REPLACE PROCEDURE archive_old_orders(cutoff_date DATE)
LANGUAGE plpgsql
AS $$
DECLARE
    batch_size INT := 1000;
    rows_moved INT;
BEGIN
    LOOP
        INSERT INTO orders_archive SELECT * FROM orders
        WHERE created_at < cutoff_date LIMIT batch_size;

        GET DIAGNOSTICS rows_moved = ROW_COUNT;
        DELETE FROM orders WHERE id IN (
            SELECT id FROM orders WHERE created_at < cutoff_date LIMIT batch_size
        );
        COMMIT;          -- commit each batch (not possible in a function)
        EXIT WHEN rows_moved < batch_size;
    END LOOP;
END;
$$;

CALL archive_old_orders('2024-01-01');

09 What are triggers in PostgreSQL? Give a practical example. ►

PL/pgSQL Triggers automatically execute a function before or after data modifications (INSERT, UPDATE, DELETE, TRUNCATE). They enforce business rules and maintain derived data at the database level.

-- Step 1: Create the trigger function
CREATE OR REPLACE FUNCTION update_modified_at()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $$
BEGIN
    NEW.updated_at = NOW();  -- NEW = the row being inserted/updated
    RETURN NEW;              -- must return NEW for BEFORE triggers
END;
$$;

-- Step 2: Attach the trigger to a table
CREATE TRIGGER trg_users_updated_at
BEFORE UPDATE ON users
FOR EACH ROW
EXECUTE FUNCTION update_modified_at();

-- Audit log trigger
CREATE OR REPLACE FUNCTION audit_changes()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO audit_log VALUES (TG_TABLE_NAME, 'DELETE', OLD.id, OLD, NULL, NOW());
        RETURN OLD;
    ELSE
        INSERT INTO audit_log VALUES (TG_TABLE_NAME, TG_OP, NEW.id, OLD, NEW, NOW());
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_orders_audit
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION audit_changes();

10 What is table partitioning in PostgreSQL? What are the partition types? ►

Partitioning Partitioning divides a large table into smaller physical pieces (partitions) while presenting a single logical table. Queries against specific partitions scan only the relevant partition(s) — dramatically improving performance on large tables.

-- RANGE partitioning — by date (most common for time-series data)
CREATE TABLE orders (
    id         BIGINT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL,
    total      NUMERIC
) PARTITION BY RANGE (created_at);

CREATE TABLE orders_2025 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
CREATE TABLE orders_2026 PARTITION OF orders
    FOR VALUES FROM ('2026-01-01') TO ('2027-01-01');
CREATE TABLE orders_default PARTITION OF orders DEFAULT;

-- LIST partitioning — by discrete values
CREATE TABLE transactions (
    id     BIGINT, status TEXT, amount NUMERIC
) PARTITION BY LIST (status);
CREATE TABLE transactions_pending   PARTITION OF transactions FOR VALUES IN ('pending');
CREATE TABLE transactions_completed PARTITION OF transactions FOR VALUES IN ('completed','settled');

-- HASH partitioning — distribute rows evenly by hash of a column
CREATE TABLE events (id BIGINT, user_id BIGINT, data JSONB)
PARTITION BY HASH (user_id);
CREATE TABLE events_0 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE events_1 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 1);

-- Indexes on parent are inherited by partitions
CREATE INDEX ON orders (created_at);

-- Partition pruning — planner skips irrelevant partitions automatically
EXPLAIN SELECT * FROM orders WHERE created_at >= '2026-01-01'::date;
-- Should show only orders_2026 in the plan, not all partitions

11 What are partial and functional indexes? ►

Indexes

Partial index — indexes only rows satisfying a WHERE condition. Smaller, faster, less write overhead.

-- Index only active users (if 90% are inactive, this index is tiny and fast)
CREATE INDEX idx_users_email_active ON users (email)
WHERE is_active = TRUE;

-- Only used when query predicate matches the index condition
SELECT * FROM users WHERE email = 'alice@example.com' AND is_active = TRUE;
-- ✅ uses the partial index

-- Index only unprocessed jobs (queue table — keeps index small)
CREATE UNIQUE INDEX idx_jobs_pending ON jobs (job_type)
WHERE status = 'pending';

Functional (expression) index — indexes the result of an expression or function rather than a column value.

-- Case-insensitive email search
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
SELECT * FROM users WHERE LOWER(email) = LOWER('Alice@Example.com');

-- Index on a JSONB path
CREATE INDEX idx_products_category ON products ((metadata ->> 'category'));
SELECT * FROM products WHERE metadata ->> 'category' = 'electronics';

-- Index on extracted year (avoid full scan with EXTRACT)
CREATE INDEX idx_orders_year ON orders (EXTRACT(YEAR FROM created_at));
SELECT * FROM orders WHERE EXTRACT(YEAR FROM created_at) = 2026;

12 What is VACUUM and ANALYZE in PostgreSQL? Why are they important? ►

Maintenance

VACUUM — reclaims disk space occupied by dead row versions (created by MVCC). Without VACUUM, tables grow indefinitely. Also prevents transaction ID wraparound (a critical issue that causes database shutdown if unchecked).
VACUUM FULL — rewrites the entire table, fully reclaiming space. Requires an exclusive lock — blocks all queries. Use only when there is severe bloat.
ANALYZE — collects statistics about column value distributions, used by the query planner to choose optimal execution plans. Stale statistics cause bad plans.
AUTOVACUUM — background daemon (enabled by default) that automatically runs VACUUM and ANALYZE on tables that exceed thresholds. Should generally be left enabled.

-- Manual vacuum
VACUUM orders;              -- reclaim dead tuples (non-blocking)
VACUUM FULL orders;         -- full rewrite, reclaim ALL space (exclusive lock!)
VACUUM ANALYZE orders;      -- vacuum + update statistics
VACUUM VERBOSE ANALYZE;     -- vacuum entire database with progress output

-- Update statistics only
ANALYZE orders;
ANALYZE orders (customer_id, status, created_at);  -- specific columns

-- Check autovacuum statistics
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze,
       n_live_tup, n_dead_tup, vacuum_count, autovacuum_count
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC LIMIT 20;

13 What is row-level security (RLS) in PostgreSQL? ►

Security Row-Level Security (RLS) restricts which rows a user can see or modify in a table. Policies are automatically appended to every query — enforced at the database level regardless of the application layer.

-- Enable RLS on a table
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

-- Policy: users can only see their own documents
CREATE POLICY documents_isolation ON documents
    USING (owner_id = current_setting('app.current_user_id')::BIGINT);

-- Separate SELECT and INSERT policies
CREATE POLICY select_own  ON documents FOR SELECT USING (owner_id = current_user_id());
CREATE POLICY insert_own  ON documents FOR INSERT WITH CHECK (owner_id = current_user_id());
CREATE POLICY update_own  ON documents FOR UPDATE USING (owner_id = current_user_id());

-- Admin bypass (superusers bypass RLS by default; explicit BYPASSRLS privilege)
CREATE POLICY admin_all ON documents USING (current_setting('app.role') = 'admin');

-- Set the current user context at session start (from application)
SET app.current_user_id = '42';

-- RLS in a multi-tenant SaaS app
CREATE POLICY tenant_isolation ON orders
    USING (tenant_id = current_setting('app.tenant_id')::BIGINT);
-- Every query is automatically filtered: SELECT * FROM orders → returns only tenant's orders

-- Force RLS for table owners too (by default, owners bypass RLS)
ALTER TABLE documents FORCE ROW LEVEL SECURITY;

14 What is pg_stat_statements and how do you use it for performance analysis? ►

Performance pg_stat_statements is a PostgreSQL extension that tracks execution statistics for all SQL queries — total calls, total time, mean time, rows returned. It is the most valuable tool for identifying slow queries in production.

-- Enable the extension (requires restart if adding to shared_preload_libraries)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- postgresql.conf: shared_preload_libraries = 'pg_stat_statements'

-- Top 10 slowest queries by mean execution time
SELECT
    LEFT(query, 80)       AS query_snippet,
    calls,
    ROUND(mean_exec_time::NUMERIC, 2)   AS mean_ms,
    ROUND(total_exec_time::NUMERIC, 2)  AS total_ms,
    ROUND(stddev_exec_time::NUMERIC, 2) AS stddev_ms,
    rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

-- Most called queries (high-frequency queries worth optimising)
SELECT LEFT(query, 80), calls, ROUND(mean_exec_time::NUMERIC, 2) AS mean_ms
FROM pg_stat_statements
ORDER BY calls DESC LIMIT 10;

-- Queries with highest total CPU time (biggest overall impact)
SELECT LEFT(query, 80), total_exec_time, calls
FROM pg_stat_statements
ORDER BY total_exec_time DESC LIMIT 10;

-- Reset statistics
SELECT pg_stat_statements_reset();

15 What is a materialized view? How does it differ from a regular view? ►

Objects

View — virtual table; query re-executes every time you query the view. Always shows current data. No storage cost.
Materialized view — stores the query result as a physical table. Very fast to query (pre-computed). Data is stale until explicitly refreshed. Supports indexes.

-- Create a materialized view
CREATE MATERIALIZED VIEW monthly_revenue AS
SELECT
    DATE_TRUNC('month', created_at) AS month,
    SUM(total)                       AS revenue,
    COUNT(*)                         AS order_count,
    COUNT(DISTINCT customer_id)      AS unique_customers
FROM orders
WHERE status = 'delivered'
GROUP BY 1
ORDER BY 1;

-- Index on the materialized view (unlike regular views, indexes are supported)
CREATE INDEX ON monthly_revenue (month);

-- Query it just like a table — very fast, pre-aggregated
SELECT * FROM monthly_revenue WHERE month >= '2026-01-01';

-- Refresh options
REFRESH MATERIALIZED VIEW monthly_revenue;                  -- blocks reads during refresh
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_revenue;     -- non-blocking (requires unique index)

-- Create the unique index required for CONCURRENTLY
CREATE UNIQUE INDEX ON monthly_revenue (month);

-- Automate refresh with a periodic trigger, pg_cron, or application scheduler

16 What are advisory locks in PostgreSQL? ►

Concurrency Advisory locks are application-level locks managed by the application, not tied to any database object. They are useful for coordinating distributed processes — ensuring only one process runs a job at a time.

-- Session-level advisory lock (held until released or session ends)
SELECT pg_try_advisory_lock(12345);  -- returns TRUE if acquired, FALSE if already held
SELECT pg_advisory_lock(12345);      -- waits until lock is available (blocking)
SELECT pg_advisory_unlock(12345);    -- release the lock

-- Transaction-level advisory lock (automatically released at end of transaction)
BEGIN;
SELECT pg_try_advisory_xact_lock(12345);  -- lock held until COMMIT/ROLLBACK
-- do work...
COMMIT; -- lock automatically released

-- Practical: distributed cron job — only one worker runs at a time
-- Worker 1 and Worker 2 both run this query:
SELECT pg_try_advisory_lock(42) AS got_lock;
-- Only ONE returns TRUE; the other returns FALSE and skips its work

-- Use with a name hash for readable lock IDs
SELECT pg_try_advisory_lock(hashtext('process_daily_reports'));

-- List active advisory locks
SELECT * FROM pg_locks WHERE locktype = 'advisory';

-- Pair with FOR UPDATE SKIP LOCKED for queue processing
SELECT * FROM jobs WHERE status = 'pending'
ORDER BY priority DESC
FOR UPDATE SKIP LOCKED
LIMIT 1;

17 What is UPSERT (INSERT ON CONFLICT) in PostgreSQL? ►

SQL UPSERT inserts a row if it does not exist, or updates it if a conflict (duplicate key) occurs — atomically, without a separate SELECT.

-- ON CONFLICT DO NOTHING — ignore duplicates silently
INSERT INTO tags (name)
VALUES ('postgresql'), ('database'), ('sql')
ON CONFLICT (name) DO NOTHING;

-- ON CONFLICT DO UPDATE — update on duplicate (classic upsert)
INSERT INTO user_stats (user_id, login_count, last_login)
VALUES (42, 1, NOW())
ON CONFLICT (user_id) DO UPDATE
    SET login_count = user_stats.login_count + 1,
        last_login  = EXCLUDED.last_login;  -- EXCLUDED = the row that failed to insert

-- EXCLUDED keyword — refers to the conflicting proposed row
INSERT INTO products (sku, name, price, updated_at)
VALUES ('SKU-001', 'Widget Pro', 29.99, NOW())
ON CONFLICT (sku) DO UPDATE
    SET name       = EXCLUDED.name,
        price      = EXCLUDED.price,
        updated_at = EXCLUDED.updated_at
WHERE products.price <> EXCLUDED.price;  -- only update if price changed

-- ON CONFLICT on a composite unique constraint
INSERT INTO order_items (order_id, product_id, quantity)
VALUES (1, 5, 3)
ON CONFLICT (order_id, product_id) DO UPDATE
    SET quantity = order_items.quantity + EXCLUDED.quantity;

18 What is the LATERAL JOIN? When is it useful? ►

SQL A LATERAL subquery (or LATERAL JOIN) can reference columns from the FROM clause items that appear before it in the same query — like a correlated subquery, but in the FROM clause. It evaluates once per row of the left side.

-- Get each customer's 3 most recent orders (TOP-N per group)
-- Without LATERAL: complex window functions or correlated subquery
-- With LATERAL: clean and often faster

SELECT c.name, recent.id AS order_id, recent.total, recent.created_at
FROM customers c
CROSS JOIN LATERAL (
    SELECT id, total, created_at
    FROM orders
    WHERE customer_id = c.id      -- references c from outer query
    ORDER BY created_at DESC
    LIMIT 3
) AS recent;

-- Unnest arrays with context
SELECT p.name, tag
FROM products p
CROSS JOIN LATERAL UNNEST(p.tags) AS tag
WHERE tag LIKE 'tech%';

-- Compute derived values per row
SELECT e.name,
       salary_analysis.pct_above_avg
FROM employees e
CROSS JOIN LATERAL (
    SELECT ROUND((e.salary / AVG(salary) - 1) * 100, 1) AS pct_above_avg
    FROM employees
    WHERE department = e.department
) AS salary_analysis;

19 What is table inheritance in PostgreSQL? ►

Schema PostgreSQL supports table inheritance — a child table inherits all columns of its parent. Queries on the parent automatically include rows from all child tables. This predates declarative partitioning and is now rarely used for partitioning but remains useful for polymorphic schemas.

-- Parent table (abstract base)
CREATE TABLE vehicles (
    id           BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    make         TEXT NOT NULL,
    model        TEXT NOT NULL,
    year         INT NOT NULL,
    registered_at TIMESTAMPTZ DEFAULT NOW()
);

-- Child tables inherit all parent columns plus add their own
CREATE TABLE cars (
    num_doors    INT DEFAULT 4,
    boot_litres  INT
) INHERITS (vehicles);

CREATE TABLE motorcycles (
    sidecar      BOOLEAN DEFAULT FALSE
) INHERITS (vehicles);

-- Insert into child tables
INSERT INTO cars (make, model, year, num_doors) VALUES ('Toyota', 'Camry', 2026, 4);
INSERT INTO motorcycles (make, model, year) VALUES ('Honda', 'CB500', 2026);

-- Query parent — returns rows from ALL child tables
SELECT make, model, year FROM vehicles;

-- Query only the parent (no children)
SELECT make, model FROM ONLY vehicles;

-- Caveats: UNIQUE constraints and FK references to parent do NOT cover children
-- For partitioning, use declarative PARTITION BY instead of INHERITS

20 What are common PostgreSQL performance tuning parameters? ►

Performance

-- postgresql.conf — key parameters (use pgTune.leopard.in for a good starting config)

-- Memory
shared_buffers        = 25% of RAM        -- main data cache (e.g., 4GB on 16GB RAM)
effective_cache_size  = 75% of RAM        -- estimate of OS page cache for planner
work_mem              = 64MB              -- per-sort/hash operation (beware: multiple per query)
maintenance_work_mem  = 512MB             -- for VACUUM, CREATE INDEX, ALTER TABLE

-- Write performance
wal_buffers           = 16MB              -- WAL write buffer
checkpoint_completion_target = 0.9       -- spread checkpoint writes over 90% of interval
max_wal_size          = 4GB

-- Connections
max_connections       = 200              -- use PgBouncer for pooling; don't just raise this
-- Each connection uses ~5-10MB RAM

-- Planner tuning
random_page_cost      = 1.1              -- for SSD (default 4 assumes HDD — reduces seq scans)
effective_io_concurrency = 200           -- for SSD: enables parallel heap fetches

-- Autovacuum tuning (tune per-table for high-write tables)
autovacuum_vacuum_scale_factor   = 0.05  -- vacuum after 5% of table changes (default 0.2)
autovacuum_analyze_scale_factor  = 0.02  -- analyze after 2% changes (default 0.1)

-- Logging slow queries
log_min_duration_statement = 1000  -- log queries taking > 1 second
log_checkpoints            = on
log_lock_waits             = on

21 What are generated columns in PostgreSQL? ►

Schema Generated columns (PostgreSQL 12+) are columns whose values are automatically computed from other columns — always up to date without triggers or application code.

CREATE TABLE products (
    id          BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    price_gbp   NUMERIC(10,2) NOT NULL,
    vat_rate    NUMERIC(4,3) NOT NULL DEFAULT 0.200,

    -- STORED: computed and physically stored (can be indexed)
    price_inc_vat NUMERIC(10,2) GENERATED ALWAYS AS (
        ROUND(price_gbp * (1 + vat_rate), 2)
    ) STORED,

    first_name  TEXT NOT NULL,
    last_name   TEXT NOT NULL,

    -- Full name — always derived, no sync issues
    full_name   TEXT GENERATED ALWAYS AS (
        first_name || ' ' || last_name
    ) STORED
);

-- Can be indexed, queried, used in WHERE
CREATE INDEX ON products (price_inc_vat);
SELECT * FROM products WHERE price_inc_vat < 100;

-- Full-text search generated column (avoid re-computing on every query)
ALTER TABLE articles ADD COLUMN fts tsvector
GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(body, '')), 'B')
) STORED;
CREATE INDEX ON articles USING GIN (fts);

Advanced PostgreSQL Interview Questions and Answers