6.5.6 Optimization of databases using indexes
Indexes in databases are special structures that speed up data retrieval in a table. They work similarly to a pointer in a book: instead of scanning the entire table, the DBMS uses an index to quickly find the records you need.
Without indexes, searching for the required rows requires a full scan (Full Table Scan), which is inefficient for large tables. Indexing can significantly reduce the number of read operations, improving performance.
Example without index:
SELECT * FROM users WHERE email = 'user@example.com';
If email
is not indexed, the DBMS scans the entire users
table.
Example with index:
CREATE INDEX idx_users_email ON users(email);
Now the search is much faster.
Please note: Although indexes speed up reading data, they also have disadvantages:
- Slowing down write operations – when inserting, updating or deleting a record, indexes need to be updated.
- Additional memory consumption – indexes take up disk space.
- Do not always speed up queries – if a query selects a significant portion of the data, the index may not provide any growth.
Therefore, it is important to approach indexing wisely, choosing only those columns that are frequently searched and sorted.
Additionally: We have instructions on creating an index in practice. You can read it at address.
Main types of indexes
Indexes in relational databases come in different types, each suitable for specific scenarios. Let’s look at the most common ones.
B-Tree indexes
B-Tree (balanced trees) is the most common type of index, used by default in PostgreSQL, MySQL (InnoDB) and other DBMS.
How they work:
- Data is stored as a balanced tree, which provides logarithmic search complexity (
O(log n)
). - Supports efficient searching, sorting and range queries.
Usage example:
CREATE INDEX idx_users_lastname ON users(last_name);
Now queries like WHERE last_name = 'Smith'
or ORDER BY last_name
are executed faster.
When to use:
- For
=
(equality) search. - For
BETWEEN
,>
,<
(range queries). - For
ORDER BY
andGROUP BY
.
When inefficient:
- On small tables (faster enumeration).
- With frequent inserts and deletes (tree maintenance costs).
Hash indexes
Hash indexes work on the principle of mapping values into a hash table. Present in MySQL (InnoDB) up to version 8.0, after that only available for PostgreSQL.
How it works:
- Very fast for exact search (
=
). - Does not support range queries (
>
,<
,BETWEEN
).
Example in PostgreSQL:
CREATE INDEX idx_users_email_hash ON users USING HASH(email);
Now WHERE email = 'user@example.com'
is faster.
When to use:
- Only for exact search (
=
). - When B-Tree is redundant.
When ineffective:
- When sorting or range searching is required.
GIN
GIN (generalized inverted index) — designed for fast searching through arrays, JSONB, full-text data and other structures with many values in a single field. Available for PostgreSQL only.
How it works:
- Optimized for searching through many values in a single column.
- Used for full-text search and JSONB.
- Suitable for
@>
,<@
,?
,?|
,?&
in PostgreSQL.
GIN example for JSONB:
CREATE INDEX idx_users_data ON users USING GIN(data);
WHERE data @> '{"age": 30}'
now runs faster.
GiST
GiST (generalized search tree) is a flexible index that is well suited for multidimensional data, geographic features, and fuzzy search. Available for PostgreSQL only.
How it works:
- Flexible, suitable for multidimensional data (geoindexes, string similarity).
- Used in PostgreSQL for pg_trgm (substring search).
GiST example for substring search:
CREATE INDEX idx_users_name_gist ON users USING gist(last_name gist_trgm_ops);
WHERE last_name LIKE '%mit%'
is now faster.
BRIN indexes
BRIN (Block Range INdex) is useful for huge tables where data is sorted by some feature. Available for PostgreSQL only.
How it works:
- The index stores the minimum and maximum values for blocks of rows.
- Very small in size.
Example for temporary data:
CREATE INDEX idx_logs_date ON logs USING BRIN(event_date);
Now WHERE event_date BETWEEN '2024-01-01' AND '2024-02-01'
works faster.
When to use:
- For large tables (millions of rows).
- When data is ordered (e.g. by date).
When inefficient:
- If data is randomly distributed.
When and what indexes to use
Indexes improve the performance of retrieval (SELECT
), but can slow down modification operations (INSERT
, UPDATE
, DELETE
). Therefore, it is important to understand when and what indexes to use.
Indexes on primary keys and unique fields
When creating a PRIMARY KEY
or UNIQUE
, a B-Tree index is automatically created.
Example:
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email TEXT UNIQUE
);
This means that:
PRIMARY KEY
(id) will create an index for fast searching on id.UNIQUE
(email) will create an index to prevent duplication.
When useful:
- id is often used in
JOIN
. - Searching on email is often done.
Foreign Key Indexing
Foreign keys (FOREIGN KEY
) do not automatically create indexes by themselves (e.g. in PostgreSQL). If JOINs are performed on them, indexing is mandatory.
Example:
CREATE INDEX idx_orders_user_id ON orders(user_id);
Now JOIN users ON orders.user_id = users.id
is faster.
When to use:
- If the foreign key is involved in
JOIN
. - If it is often used in
WHERE
.
Indexes for filtering and sorting
Indexing helps if the query contains WHERE
, ORDER BY
, GROUP BY
.
Example:
CREATE INDEX idx_users_lastname ON users(last_name);
Now WHERE last_name = 'Smith' and ORDER BY last_name
work faster.
When indexes are harmful
- Rarely used indexes. If a column is rarely used in queries, the index only takes up space.
- Small tables. If the table contains less than 1000 rows, a full enumeration (Seq Scan) is faster than an index.
- Frequent
INSERT
,UPDATE
,DELETE
. Indexes slow down inserts and updates. For example, if there are 5 indexes, each update must modify them all.
Example problem:
UPDATE users SET email = 'new@example.com' WHERE id = 100;
If email is indexed, the update requires changing the index.
Solution:
- Avoid indexes on frequently modified columns.
- Use partial indexes (WHERE indexing constraint).
Partial index example:
CREATE INDEX idx_active_users ON users(email) WHERE is_active = true;
Now the index is applied only to active users.
Creating and managing indexes
Additionally: we have instructions on creating an index in practice. You can read it at address. The most convenient way to work with indexes is with phpMyAdmin, which we also have.
In this section, we will discuss the main operations with indexes: creation, deletion, automatic management of indexes in different DBMS and special types of indexes.
Creating indexes (CREATE INDEX)
Indexes are created with CREATE INDEX
.
- Example of a regular B-Tree index:
CREATE INDEX idx_users_lastname ON users(last_name);
- Create a unique index:
CREATE UNIQUE INDEX idx_users_email ON users(email);
- Create a multi-column index:
CREATE INDEX idx_orders_date_status ON orders(order_date, status);
This index speeds up:
WHERE order_date = '2024-01-01' AND status = 'shipped'
ORDER BY order_date, status
- But does not speed up search only by
status
.
Dropping and renaming indexes (DROP INDEX, ALTER INDEX)
If an index is no longer needed, it can be dropped:
DROP INDEX idx_users_lastname;
Or renamed if necessary:
ALTER INDEX idx_users_lastname RENAME TO idx_users_lname;
Automatic indexes in different DBMS
Some DBMS create indexes automatically:
- PostgreSQL – creates indexes for PRIMARY KEY and UNIQUE, but not for FOREIGN KEY.
- MySQL (InnoDB) – automatically indexes PRIMARY KEY, UNIQUE and FOREIGN KEY.
- SQLite – indexes PRIMARY KEY and UNIQUE.
You can check indexes using (example for PostgreSQL):
SELECT * FROM pg_indexes WHERE tablename = 'users';
Covering and partial indexes
A regular index stores only key columns. A covering index allows you to add additional data, speeding up SELECT
.
CREATE INDEX idx_users_email_include ON users(email) INCLUDE (name, created_at);
Now SELECT name, created_at FROM users WHERE email = 'user@example.com'
does not require accessing the table.
Partial indexes
Indexing only part of the data is useful for large tables.
CREATE INDEX idx_active_users ON users(email) WHERE is_active = true;
Now the index is applied only to active users.
Analysis and optimization of queries with indexes
Indexes work effectively only if they are configured correctly. Let’s look at how to analyze their use and what errors can reduce their effectiveness.
Using EXPLAIN and EXPLAIN ANALYZE
Before optimizing, it is important to understand whether the query uses indexes. To do this, use EXPLAIN:
EXPLAIN SELECT * FROM users WHERE email = 'user@example.com';
The result will show whether Index Scan or Seq Scan (exhaustive scan) will be used.
For detailed analysis with execution time measurement, add ANALYZE
:
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'user@example.com';
This executes the query and shows the real costs.
How to understand if the index is working
If Seq Scan is used – the index is not applied, possible reasons:
- No index on the field in
WHERE
. - Filtering returns too many rows (a full enumeration is optimal).
- Query conditions do not match the index structure.
If Index Scan or Index Only Scan is used – the index is used:
- Index Scan – the index is used, but the data is requested from the table.
- Index Only Scan – the index completely covers the query (works faster).
Problems with indexes and optimization
-
Index selectivity. An index is effective if it discards most of the data. If a field with a small number of unique values is indexed (for example,
is_active = true/false
), the index may not provide any gain. -
Correct order of fields in composite indexes. When using an index on multiple columns (
CREATE INDEX idx_orders ON orders(status, created_at)
), the queryWHERE status = 'shipped' AND created_at > '2024-01-01'
will be effective, butWHERE created_at > '2024-01-01'
does not use the index.
Index Optimization Strategies
- Drop unused indexes – you can check index access statistics:
SELECT * FROM pg_stat_user_indexes WHERE schemaname = 'public';
- Use
VACUUM ANALYZE
(PostgreSQL) – updates query planner statistics:
VACUUM ANALYZE;
-
Do not index columns with low selectivity – an index on gender (m/f) is useless.
-
Use partial indexes – reduces the size of the index, speeding up work.
-
Rebuild indexes (
REINDEX
) – removes fragmentation, improving performance:
REINDEX TABLE users;
Conclusion
Optimizing database performance using indexes is a balance between speeding up retrieval (SELECT
) and minimizing write overhead (INSERT
, UPDATE
, DELETE
).
Conclusions and recommendations:
- Indexes speed up data retrieval, but require additional resources.
- Not all queries use indexes - it is important to analyze
EXPLAIN ANALYZE
. - The index type matters -
B-Tree
is universal, butGIN/GiST
are suitable for specific tasks. - Excessive indexing is harmful - too many indexes slow down writes and consume memory.
- Regular index maintenance (
REINDEX
,VACUUM
) helps to avoid performance degradation.
Following these principles, you can significantly speed up your database and improve the efficiency of the system.