Creating Index in GPDB

The CREATE INDEX command defines a new index on a table. By default, a B-tree index is created if you do not declare an index type. For example, to create a B-tree index on the column title in the table films:

CREATE INDEX title_idx ON films (title);

To create a bitmap index on the column gender in the table employee:

CREATE INDEX gender_bmp_idx ON employee USING bitmap (gender);

Although indexes in Greenplum Database do not need maintenance and tuning, it is still important to check which indexes are actually used by the real-life query workload. Examining index usage for an individual query is done with the EXPLAIN command.

The query plan shows the different steps or plan nodes that the database will take to answer a particular query, along with time estimates for each plan node. To examine the use of indexes, look for the following query plan node types in your EXPLAIN output:

•Index Scan - A scan of an index.

•Bitmap Heap Scan - Retrieves all rows from the bitmap generated by BitmapAnd, BitmapOr, or BitmapIndexScan and accesses the heap to retrieve the relevant rows.

•Bitmap Index Scan - Compute a bitmap by OR-ing all bitmaps that satisfy the query predicates from the underlying index.

•BitmapAnd or BitmapOr - Takes the bitmaps generated from multiple BitmapIndexScan nodes, ANDs or ORs them together, and generates a new bitmap as its output.

It is difficult to formulate a general procedure for determining which indexes to set up. A good deal of experimentation will be necessary in most cases.

•Always run ANALYZE after creating or updating an index. This command collects table statistics that are used by the query planner. This information is required to guess the number of rows returned by a query, which is needed by the planner to assign realistic costs to each possible query plan.

•Use real data for experimentation. Using test data for setting up indexes will tell you what indexes you need for the test data, but that is all.

•It is especially fatal to use very small test data sets. While selecting 1000 out of 1,000,000 rows could be a candidate for an index, selecting 1 out of 100 rows will hardly be, because the 100 rows will probably fit within a single disk page, and there is no plan that can beat sequentially fetching one disk page.

•Also be careful when making up test data, which is often unavoidable when the application is not in production use yet. Values that are very similar, completely random, or inserted in sorted order will skew the statistics away from the distribution that real data would have.

•When indexes are not used, it can be useful for testing to force their use. There are run-time parameters that can turn off various plan types. For instance, turning off sequential scans (enable_seqscan) and nested-loop joins (enable_nestloop), which are the most basic plans, will force the system to use a different plan. Time your query with and without indexes. The EXPLAIN ANALYZE command can be useful here.

Managing Indexes

In certain conditions, a poorly performing index may need to be rebuilt using the REINDEX command. This rebuilds an index using the data stored in the index’s table, replacing the old copy of the index.

Updates or delete operations do not update bitmap indexes. Therefore, if you have deleted rows or updated columns in a table that has bitmap indexes, you will need to rebuild the indexes using the REINDEX command.

To rebuild all indexes on a table
REINDEX my_table;

To rebuild a particular index
REINDEX my_index;

Dropping an Index
The DROP INDEX command removes an index. For example:
DROP INDEX title_idx;

When loading data, it is often faster to drop all indexes, load, then recreate the indexes afterwards.