Full-Text Search — tsvector, tsquery and GIN Indexes

📚 React + FastAPI 📂 Chapter 19: Advanced SQL — Window Functions, CTEs and Triggers 📄 Lesson 19040 Advanced 🕒 March 17, 2026

PostgreSQL’s built-in full-text search turns your database into a search engine — no Elasticsearch required for moderate search needs. The system works by converting text into a sorted list of lexemes (normalised word forms), then matching against a tsquery. A GIN index on the tsvector makes searches on millions of rows return in milliseconds. For the blog application, full-text search enables searching post titles and bodies, with relevance ranking so the most relevant posts appear first. This is the search endpoint you will build in the FastAPI part of the series.

tsvector and tsquery Basics

-- tsvector: a processed document (sorted lexemes with positions)
-- tsquery: a search query

-- Convert text to tsvector
SELECT to_tsvector('english', 'FastAPI is a modern Python web framework');
-- 'fastapi':1 'framework':7 'modern':4 'python':5 'web':6
-- Stopwords removed (is, a), lexemes normalised (framework not frameworks)

-- Convert search terms to tsquery
SELECT to_tsquery('english', 'python & fastapi');     -- python AND fastapi
SELECT to_tsquery('english', 'python | django');       -- python OR django
SELECT to_tsquery('english', '!javascript');           -- NOT javascript
SELECT plainto_tsquery('english', 'python web framework');  -- implicit AND, no operators
SELECT websearch_to_tsquery('english', 'python "web framework" -javascript');

-- Match a tsvector against a tsquery (@@ operator)
SELECT to_tsvector('english', 'FastAPI is a Python web framework')
    @@ to_tsquery('english', 'python & framework');   -- TRUE

-- The @@ operator returns true if the document matches the query

Note: PostgreSQL’s full-text search normalises text to lexemes — the base form of words. “running”, “runs”, and “ran” all become the lexeme “run”. “Python” and “python” both become “python”. This means searches are case-insensitive and handle common word variations automatically. The language parameter ('english') controls the stemming rules and stopword list. For multilingual content, either use a generic dictionary ('simple') or detect the language and store documents in multiple language-specific tsvector columns.

Tip: Use websearch_to_tsquery() for user-facing search inputs — it handles the most natural search syntax: quoted phrases, minus for exclusion, implicit AND between words, without requiring users to know Boolean operators. plainto_tsquery() is simpler (all words ANDed, no operators) but does not support phrases. Reserve to_tsquery() for programmatic queries where you control the syntax.

Warning: Do not call to_tsvector() in a WHERE clause without a GIN index — it recomputes the tsvector for every row on every query, resulting in a sequential scan. Always either: (1) store a computed tsvector column with a GIN index, (2) create an expression GIN index on to_tsvector('english', col), or (3) for multi-column search, create an index on the concatenated tsvector. Any of these allows the index to be used for fast lookups.

Creating a Full-Text Search Index

-- ── Option A: Expression GIN index (no extra column) ─────────────────────────
CREATE INDEX idx_posts_fts ON posts
    USING GIN (to_tsvector('english', title || ' ' || COALESCE(body, '')));

-- Queries that use this index:
SELECT id, title FROM posts
WHERE to_tsvector('english', title || ' ' || COALESCE(body, ''))
   @@ websearch_to_tsquery('english', 'fastapi tutorial');

-- ── Option B: Generated tsvector column (more efficient for updates) ──────────
ALTER TABLE posts
    ADD COLUMN search_vector TSVECTOR
    GENERATED ALWAYS AS (
        setweight(to_tsvector('english', COALESCE(title, '')), 'A') ||
        setweight(to_tsvector('english', COALESCE(body, '')), 'B')
    ) STORED;

-- setweight assigns weight to different fields: 'A' (highest) to 'D' (lowest)
-- This means title matches rank higher than body matches

CREATE INDEX idx_posts_search ON posts USING GIN (search_vector);

-- Query using the stored column (faster — no recomputation per query):
SELECT id, title
FROM posts
WHERE search_vector @@ websearch_to_tsquery('english', 'fastapi tutorial');

Relevance Ranking with ts_rank

-- ts_rank: score based on how often query terms appear and their weights
-- ts_rank_cd: score based on term density and cover (distance between terms)

-- Search with relevance ranking
SELECT
    p.id,
    p.title,
    p.status,
    ts_rank(
        p.search_vector,
        websearch_to_tsquery('english', :search_query)
    ) AS relevance_score,
    ts_headline(
        'english',
        p.body,
        websearch_to_tsquery('english', :search_query),
        'StartSel=<mark>, StopSel=</mark>, MaxWords=35, MinWords=15'
    ) AS excerpt_with_highlights
FROM posts p
WHERE p.status = 'published'
  AND p.search_vector @@ websearch_to_tsquery('english', :search_query)
ORDER BY relevance_score DESC, p.view_count DESC   -- rank by relevance, then popularity
LIMIT 20;

-- ts_headline: returns a fragment of the original text with query terms highlighted
-- The StartSel/StopSel options define the HTML tags wrapping matched terms

Full-Text Search FastAPI Endpoint

-- Complete search query for FastAPI GET /search?q=...&page=1&limit=10

-- Count total matches (for pagination)
SELECT COUNT(*) AS total
FROM posts
WHERE status = 'published'
  AND search_vector @@ websearch_to_tsquery('english', :query);

-- Paginated results with highlighting
SELECT
    p.id,
    p.title,
    p.slug,
    p.view_count,
    p.published_at,
    u.name                 AS author_name,
    ts_rank(p.search_vector, q.query) AS score,
    ts_headline('english', p.body, q.query,
        'MaxWords=40, MinWords=15, StartSel=<em>, StopSel=</em>'
    ) AS excerpt
FROM posts p
JOIN users u ON p.author_id = u.id
CROSS JOIN websearch_to_tsquery('english', :query) AS q(query)
WHERE p.status = 'published'
  AND p.search_vector @@ q.query
ORDER BY score DESC, p.view_count DESC
LIMIT :limit OFFSET :offset;

Common Mistakes

Mistake 1 — Missing GIN index on tsvector (sequential scan)

❌ Wrong — recomputes tsvector for every row:

WHERE to_tsvector('english', title) @@ to_tsquery('python');
-- Sequential scan on all posts!

✅ Correct — use a stored column or expression index with GIN:

CREATE INDEX idx_fts ON posts USING GIN (to_tsvector('english', title));
-- OR use a generated STORED column ✓

Mistake 2 — Using to_tsquery with user input (syntax errors)

❌ Wrong — user input like “C++” breaks to_tsquery syntax:

WHERE search_vector @@ to_tsquery('english', user_input);
-- "C++" → ERROR: syntax error in tsquery

✅ Correct — use websearch_to_tsquery for user input:

WHERE search_vector @@ websearch_to_tsquery('english', user_input);   -- ✓ handles special chars

Mistake 3 — Not using setweight to prioritise title over body

❌ Wrong — title and body have equal weight:

to_tsvector('english', title || ' ' || body)   -- title match = body match

✅ Correct — weight title higher:

setweight(to_tsvector('english', title), 'A') ||
setweight(to_tsvector('english', body),  'B')   -- title matches rank higher ✓

Quick Reference

Function	Use
`to_tsvector(lang, text)`	Convert text to lexeme document
`websearch_to_tsquery(lang, q)`	Convert user search string to query (safe)
`plainto_tsquery(lang, q)`	Simple AND query from plain text
`to_tsquery(lang, q)`	Explicit Boolean tsquery (requires valid syntax)
`ts_rank(vec, query)`	Relevance score
`ts_headline(lang, text, query)`	Highlighted excerpt
`setweight(vec, weight)`	Assign A/B/C/D weight to tsvector
GIN index	`CREATE INDEX ... USING GIN (tsvector_col)`

🧠 Test Yourself

A user searches for “python web framework”. How should you convert this to a tsquery for a FastAPI search endpoint?

websearch_to_tsquery('english', 'python web framework') — this function handles user input safely, implicitly ANDs the words, handles special characters without errors, and supports natural search syntax like quotes for phrases and minus for exclusion. It is the correct choice for user-facing search
to_tsquery('english', 'python web framework') — this requires explicit Boolean operators; plain words without & or | raise a syntax error
to_tsquery('english', 'python & web & framework') — works but requires your application to construct the query syntax manually
to_tsvector('english', 'python web framework') — this creates a tsvector document, not a tsquery for searching

tsvector and tsquery Basics #

Creating a Full-Text Search Index #

Relevance Ranking with ts_rank #

Full-Text Search FastAPI Endpoint #

Common Mistakes #

Mistake 1 — Missing GIN index on tsvector (sequential scan) #

Mistake 2 — Using to_tsquery with user input (syntax errors) #

Mistake 3 — Not using setweight to prioritise title over body #

Quick Reference #

🧠 Test Yourself #

📚 More in this Tutorial Series