The optimizer of Postgres/Greenplum is cost based and selects the execution plan with the lowest cost. In Greenplum, the cost for one scattered IO access is measured by the random_page_cost parameter, which is 100 by default. Comparing to 1 in postgres, it is quite large. It is being set in this way because we want to choose seq scan as frequently as possible in most cases. Example: gpadmin=# create table t as select generate_series x from generate_series(1,10000); NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'x' as the Greenplum Database data distribution key for this table. HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. SELECT 10000 gpadmin=# create index f_idx on t( lower(x)); CREATE INDEX gpadmin=# explain select * from t where lower(x) = '1'; QUERY PLAN ------------------------------------------------------------------------------ Gather Motion 8:1 (slice1; segments: 8) (cost=0.00..191.00 rows=2 width=4) -> Seq Scan on t (cost=0.00..191.00 rows=2 width=4) Filter: lower(x::text) = '1'::text (3 rows) gpadmin=# set random_page_cost to 1; SET gpadmin=# explain select * from t where lower(x) = '1'; QUERY PLAN ---------------------------------------------------------------------------- Gather Motion 8:1 (slice1; segments: 8) (cost=0.00..9.43 rows=2 width=4) -> Index Scan using f_idx on t (cost=0.00..9.43 rows=2 width=4) Index Cond: lower(x::text) = '1'::text Settings: random_page_cost=1 (4 rows) |