HISTOGRAM

posted Jan 28, 2012, 10:47 AM by Sachchida Ojha   [ updated Jan 28, 2012, 10:58 AM ]
"A HISTOGRAM RECORDS THE DISTRIBUTION OF VALUES WITHIN A COLUMN"
Histogram is very useful when column includes both frequently and infrequently encountered column values and where an index is, therefore, only sometime effective.

The Histogram enables Oracle to decide when to use index based on the value of the search criteria.

By default, Oracle collects geneeral information about column data, such as high value, low value, no of distinct values, but does not always collect information about the distribution of data within the columns. If the data in a column is fairly evenly distributed, the default stats will be adequate, but if the data is unevenly distributed (skewed) the optimizer might need histogram to determine the best plan.


The creation of a column histogram enables Oracle to recognize selective values within otherwise unselective columns and vice versa. The Optimizer can then chose to use an index for a selective value and full table scan for nonselective column.

A new feature of the dbms_stats package is the ability to look for columns that should have histograms, and then automatically create the histograms. Oracle introduced some new method_opt parameter options for the dbms_stats package. These new options are auto, repeat and skewonly and are coded as follows:

method_opt=>'for all columns size auto'
method_opt=>'for all columns size repeat'
method_opt=>'for all columns size skewonly'


Automatic histogram generation

The auto option is used only when monitoring has been invoked via the alter tablemonitoring command. Histograms are created based upon both the data distribution (see Figure A) and the workload on the column as determined by monitoring, like this.

execute dbms_stats.gather_schema_stats(
ownname => 'SCOTT',
estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
method_opt => 'for all columns size auto',
degree => DBMS_STATS.DEFAULT_DEGREE);

Evenly distributed data vs. skewed data distribution

Using the dbms_stats repeat option, histograms are collected only on the columns that already have histograms. Histograms are static, like any other CBO statistic, and need to be refreshed when column value distributions change. The repeat option would be used when refreshing statistics, as in this example:

execute dbms_stats.gather_schema_stats(
ownname => 'SCOTT',
estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
method_opt => 'for all columns size repeat',
degree => DBMS_STATS.DEFAULT_DEGREE);

The skewonly option introduces a very time-consuming build process because it examines the data distribution of values for every column within every index. When the dbms_stats package finds an index whose column values are distributed unevenly, it creates histograms to help the CBO make a table access decision (i.e., index versus a full-table scan). From the earlier vehicle_type example, if an index has one column value (e.g., CAR) that exists in 65 percent of the rows, a full-table scan will be faster than an index scan to access those rows, as in this example:

execute dbms_stats.gather_schema_stats(
ownname => 'SCOTT',
estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
method_opt => 'for all columns size skewonly',
degree => DBMS_STATS.DEFAULT_DEGREE);

Comments