DBMS_STATS (10G to 11G)

posted Sep 24, 2010, 8:59 AM by Sachchida Ojha
Starting with Oracle Database 11g DBMS_STATS uses a new hash based algorithm when using AUTO_SAMPLE_SIZE for ESTIMATE_PERCENT. Prior to Oracle Database 11g, dbms_stats uses a row sampling based algorithm. The new approximate number of distinct value method provide an efficient and accurate method for gathering number of distinct values and other statistics for columns using a hash based algorithm. The approximate NDV technique is used when you invoke a procedure from DBMS_STATS with ESTIMATE_PERCENT gathering option set to AUTO_SAMPLE_SIZE, which is the default value. The row sampling based algorithm will be used for collection of number of distinct values if you specify any value other than AUTO_SAMPLE_SIZE. This preserves the old behavior when you specify sampling percentage. The use of AUTO_SAMPLE_SIZE results in more accurate statistics comparable to that of estimate_percent set to 100, but taking less time as compared to used by dbms_stats when estimate_percent set to 100. There are new features in dbms_stats, using those could also change the statistics gathered by dbms_stats and hence the query plans of a sql may change as compared to earlier versions.

These features are:

    * Multicolumn Statistics
    * Expression Statistics
    * Statistics on views
    * Statistic Preferences

New options are also available in Oracle Database 11g for Statistic Preferences in dbms_stats which where not present in earlier versions.



For more details on these features, see the Oracle Database documentation set.

Comparing Statistics

When it comes to deploying a new application or application module it is standard practice to test and tune the application in a test environment before it is moved to production. However, even with testing it's possible that SQL statements in the application will have different execution plans in production then they did on the test system. One of the key reasons an execution plan can differ from one system to another (from test and production) is because the optimizer statistics on each system are different. In Oracle Database 11g, the DIFF_TABLE_STATS_* function can be used to compare statistics for a table from two different sources. The statistics can be from:

    * A user statistics table and current statistics in the dictionary
    * A single user statistics table containing two sets of statistics that can be identified using statids
    * Two different user statistics tables
    * Two points in history

The function also compares the statistics of the dependent objects (indexes, columns, partitions). The function displays statistics for the object(s) from both sources if the difference between the statistics exceeds a certain threshold (%). The threshold can be specified as an argument to the function; the default value is 10%. The statistics corresponding to the first source are used as the basis for computing the differential percentage.