Performance tuning focuses on writing efficient SQL, allocating
computer resources and analyzing wait events and contention in the
system. The design approach to a database is critical to ensuring the
best performance from a database, here are the steps when designing a
- Design the application correctly
- Tune the application SQL code
- Tune memory
- Tune I/O
- Tune contention and other issues
There are two approaches to performance tuning
- pro-active - try to see the problem before it happens i.e. warning alert on a tablespace threshold before it becomes a problem
- reactive - the problem has already occurred and action is required
pro-active means that it does not impact the end user where as
reactive is generally when the user tells you he/she has a problem. OEM provides many graphs that can shown you trends about a particular area of the database for more information see OEM.
One rule is first try and fix the SQL code, especially if it is old
code, no matter how you tune the database if the code is badly written
it won't make any difference, however oracle does provide some
parameters that can help with poorly written code.
Oracle Query process
Query processing requires the transformation of your SQL query into
an efficient execution plan, Query optimization requires that the best
execution plan is the one executed, the goal is to use the least amount
of resources possible (CPU and I/O), the more resources a query uses the
more impact it has on the general performance of the database.
A users query will go through 3 phases
||At this stage the syntax ( decomposed into a
relational algebra that's analyzed to see if its syntactically correct)
and semantics (make sure that tables and columns exist and you have
permissions to access the objects) are checked, at the end you have a
parse tree which represents the query's structure.
The statement is normalized so that it can be processed more
efficiently, once all the checks have completed it is considered a valid
parse tree and is sent to the logical query plan generation stage.
All this is done in the library cache of the SGA
The optimizer is used at this stage, which is a cost-based
optimizer (CBO - see below for more info), this chooses the best access
method to retrieve the requested data. It uses statistics and any hints
specified in the SQL query, the CBO produces an optimal execution plan
for the SQL statement.
The optimization process can be divided in two parts
Query rewrite phase
The parse tree is converted into an abstract logical query plan,
the various nodes and branches are replaced by operators of relational
Execution plan generation phase
Oracle transforms the logical query plan into a physical query
plan, the physical query or execution plan takes into account the
- The various operations (joins) to be performed during the query
- The order in which the operations are performed
- The algorithm to be used for performing each operation
- The best way to retrieve data from disk or memory
- The best way to pass data from operation to another during the query
The optimizer may well come up with multiple physical plans, all
of which are potential execution plans. The optimizer then chooses among
them by estimating the costs of each possible plan (based on table and
index statistics) and selecting the plan with the lowest cost. This is
called the cost-based query optimization (CBO).
||The final stage is to execute the physical query plan that was selected by the CBO
In previous versions of Oracle you had two choices RBO (rule-based
optimizer) or CBO (cost-based optimizer), RBO is available in Oracle 10g
but is a deprecated product the CBO is the preferred and default
|used a heuristic method to select among several
alternative access paths with the help of certain rules. All paths were
ranked and the lowest was chosen i.e. using the ROWID was a cost of 1, a
full table scan was a cost of 19.
The CBO uses statistics on tables and indexes, the order of
tables and columns in the SQL statements, available indexes, and any
user-supplied hints to pick the most efficient way to access the data
CBO almost always performs better than RBO.
Cost-Based Optimizer (CBO)
The optimizers job is to find out the optimal or best plan to execute
your DML statements (select, insert, update and delete). The CBO uses
statistics on tables and indexes, the order of tables and columns in the
SQL statements, available indexes, and any user-supplied hints to pick
the most efficient way to access the data requested, the CBO uses the
least costly method (cost being CPU and I/O which is the most expensive)
to get at the data.
Oracle 10g has a automatic job that collects statistics which is used the the CBO, the gather_stats_job
is run between 10pm-6am everyday. The job collects statistics on all
tables that either have no statistics or stale statistics (more than 10%
of data has changed in the table since the last collection).
|Check if statistics are being collected
select last_analyzed, table_name, owner, num_rows, sample_size from dba_tables order by last_analyzed;
Note: sample_size can vary from 1 to 100%, the greater the
sample_size the better statistics are obtained, however a general 5 to
20% should be enough in most cases especially in very large tables
Once the statistics have been gathered a number of columns will have been updated in a table or index
- Number of rows and number of rows per block
- Average row length
- Total number of database blocks in a table
- Number of levels in each index
- Number of leaf blocks in each index
- Number of distinct values in each column of a table
- Data distribution histograms
- Number if distinct index keys
- Cardinality (the number of columns with similar values for each column)
- Minimum and Maximum values for each column
- System statistics, which include I/O characteristics of your
system and CPU statistics which include CPU speed and other related
The CBO will use all of the above statistics and other statistics
(CPU, I/O and O/S statistics) to help with finding the optimal plan,
using the above statistics the CBO can estimate costs of individual
operations. The less number of statistics collected will result in less
physical plans the CBO can come up with, thus the less number of choices
the CBO can make.
||select column_name, num_distinct fro dba_col_statistics where table_name = 'PERSONNEL';
The mode level can be set to a number of levels, depending on what
your application requirements are you can start sending data to the user
quickly, if the user requires all the data to be seen altogether then
the ALL_ROWS option is best but if the user does not care about getting
all the data only getting something fast then FIRST_ROWS may be an
option, also by using hints you can force the optimizer to take a chosen path see hints below for more details
||The optimizer will process all rows before outputting data
||The optimizer will process n rows before outputting data
||Once data is available start outputting immediately.
|Mode Level (system wide)
||alter system set optimizer_mode = ALL_ROWS;
|Mode Level (session level)
||alter session set optimizer_goal = first_rows_10;
The CBO may not use the same plan for the same SQL statement every time, you need to watch out for the following
- Execution plans can change across oracle versions, to make sure that the same plan is used then you need to use stored outlines
- The application developer may know of a better plan than the CBO, in which case hints can be used to direct or help the CBO.
- The CBO depends enormously on current statistics, if the
statistics are absent or outdated the optimizer can make poor decisions
Manually Collect Statistics
To collect statistics manually you need to use the package dbms_stats, you can sample row or block and depending on the size of the table
estimate_percent => NULL,
method_opt => 'auto',
granularity => 'all,
cascade => 'true',
option => 'gather_auto'
estimate - refers to the percentage of rows that should be used to estimate the statistics, null means all rows
method_opt - can specify several things , like include histograms, AUTO means collect histograms
granularity - only applies to tables, ALL means collect statistics for subpartitions, partitions, etc
cascade - collect all statistics for both table and index
- gather = collect statistics for all objects even it the object has stale or fresh statistics
gather_auto = Oracle will decide on what to collect
gather_empty = collect only for objects that do not have any statistics
gather_stale = collect only for objects with stale statistics
||dbms_stats.gather_schema_stats( ownname => 'VALLEP');
||dbms_stats.gather_table_stats( 'hr', 'employees');
||dbms_stats.gather_index_stats( 'hr', 'employ_idx');
dbms_stats.create_stat_table(ownname => 'vallep', stat_tab => 'stats_table', tblspace => 'stat_tbs');
dbms_stats.gather_system.stats('start'); (wait a while then run stop)
dbms_stats.gather_system_stats(interval => 720, stat_tab => 'stats_table', stat_id => 'OLTP');
Note: you can use the create_stat_table to transfer system stats to other databases.
You can also collect statistics by using the analyze command
|Analyze a table
analyze table employee compute statistics;
Note: additional columns are filled in when this is run i.e. average_row_size, last_analyzed, size_of_table (in data blocks)
||look at the last_analyzed column to make sure statistics are being collected
||look at the last_analyzed column to make sure statistics are being collected
Dynamic sampling is controlled by the optimizer_dynamic_sampling parameter which accepts values from 0 (off) to 10 (aggressive sampling) with 2 as the default.
Dynamic sampling can be a benefit
- The sample time is small compared to the overall query execution time
- Results in better performing query
- Query may be executed multiple times
||show parameter optimizer_dynamic_sampling;
||alter system set optimizer_dynamic_sampling = 2;
||alter session set optimizer_dynamic_sampling = 5;
||select /*+ dynamic_sampling(employees 5) */
emp_id, fname, lname, job, sal
where dept_num = 50;
To elimate the number of rows that the optimizer has to retrieve we use where
clauses, however the optimizer may not end up writing the best
execution plan, you have better knowledge of the application than the
optimizer and with this knowledge you can use hints which force the optimizer to use that knowledge. Using where
clauses efficiently will reduce the I/O, thus increasing performance,
the optimizer will use statistics (row count) to determine how to create
the best plan, if no statistics are available then it has no option but
to perform a full table scan.
Sometimes the optimizer will not use a index, even if you know one exists, the possible reason for these could be any of the following
- Views in a query sometimes prevent the use of indexes
- If you think heavy data skew is in the table, using histograms
will help the optimizer to provide more accurate representations of the
data distribution in the table.
- If for some reason it still refuses to use the index you can force it by using a hint.
Try using the where clause instead of the having clause, as the having clause incurs the additional overhead of sorting and summing.
Sometimes the CBO does not know best and needs help to point it in the right direction, this is where hints help by forcing the optimizer to take a path that you have knowledge about, hints can alter the join methods, join order or even access paths. The are many hint options you would need to see the Oracle documentation for a full listing but i have listed some of the more common one
- all_rows - optimize throughput, not optimizer the response time of the statement
- first_rows - return the first rows quickly
- full - perform a full table scan
- ordered - force the join order of the tables in the query
- index - force the use of indexes
- index_ffs - force a full scan of a index, can use parallelization for this to improve performance.
select /*+ FULL (employees) */ ... from employees
select /*+ INDEX_FFS (employees) */ ... from employees
select /*+ USE_NL (employees, dept) */ .. from employees, dept
Note: see oracle documentation for full listing of all the hints
You should choose a join method based on how many rows you expect to be returned from the join
||cartesian joins are normally a result of not using a
where clause, it basically joins every row in all tables, so for an
example if one table as 50,000 rows and the other table has 100 rows
then a cartesian join of the tables would be 50,000 * 100 = 500,000 rows
If you are join tables with few than 10,000 rows then a Nested Loop would be the way to go, see hints section above for more details on how to use a Nested Loop
||Use if the join will produce large subsets of data or a substantial proportion of a table is going to be joined.
||If the tables in the join are being joined with an inequality condition (not an equi join), then use a merge join
||Basically used for data warehouses, do not use if running OLTP
system, they are used with low cardinality columns (columns having low
distinct values - gender, marital _status, relation, etc)
See join methods for more information regarding different types of joins
An index is a data structure that takes the value of one or more
columns of a table (the key) and returns all rows (or the requested
columns in that row) with that value of the column quickly. The
efficiency of the index is that it lets you find the necessary rows
without having to perform a full table scan, this leads to few I/O's.
As a general rule you so only use a index if you select about 10-15%
of a table, when using a index this prevents you from performing a full
table scan. When you want to retrieve a row, oracle has to perform a
lookup in the index to obtain the ROWID, using this ROWID it can then
retrieve the requested row. Using to many indexes can degrade
performance as when a table is updated the index has to be updated as
well, so only index when the trade-off is better, in other words if you
use a OLTP database then limit the amount of indexes that you use. Here
are some guidelines on when you should index
- Index columns with high selectivity (few rows with identical values)
- Index all important foreign keys
- Index all predicate columns
- Index columns used in table joins
- Try to avoid indexing columns that consist of long character strings
- Try to use index-only query plans, in other words try to retrieve the query using the index only.
- Use secondary indexes on columns frequently involved in order by and group by as well as sorting operations union and distinct
||this is the default index type, you will probably use it for almost all the indexes in a typical OLTP application.
||This is ideal for column data that has a low cardinality (few distinct values). Do not use if you have heavy DML going on the table, this is ideal in a data warehouse environment.
(index organized table)
|IOT's place all the table data in its primary key index, thus
eliminating the need for a separate index. The data is sorted and rows
are stored in primary key order. This type of index will save storage
space when compared to a normal B-Tree index.
||Concatenated or Composite indexes are indexes that include more
than one column and are excellent for improving selectivity of the where predicates. You are trying to elimate the optimizer performing full table scans thus reducing I/O.
||are efficient in frequently used statements that involve
functions or complex expression on columns, they can quickly return the
computed value of the function or expression directly from the index.
||These are ideal when you have a heavy insert application, the
reverse key index provide an efficient way to distribute the index
values more evenly and thus improve performance.
||each partition can be distributed across disks thus increasing
performance, also each partition can be maintained separately (backup,
remove) without affecting any other partition You can also use parallel
query options to improve performance.
It is worth rebuilding indexes regularly so queries can run faster, see indexes for more information.
For more information on indexes click here
In a ideal world you should parse just once and use the same parsed
version of the statement for repeated executions, this operation is much
more expensive than actually executing the statement. In order you use
the same execution plan for the same query the SQL statement must be
identical, you use bind variables to archive this.
The way oracle knows if the query is identical is that the statement
is hashed and a hash key stored, if this key does not match then the
statement is not the same, so even you add a space or replace a letter
with a upper case letter the hash key will be different.
There is a parameter you can set to force statements that fail to use bind variables to do so
|Force bind variable usage
alter system set cursor_sharing = force;
alter system set cursor_sharing = similar;
alter system set cursor_sharing = extact;
force - forces the use of bind variables, oracle will substitute the hard coded variable
similar - force the use of a bind variable only if oracle thinks it will not adversely affects optimization.
only use the already parsed query if it is identical (hash key matches)
If you are dealing with very large amounts of data, you should
consider using materialized views to improve response time. Materialized
views are objects with summary data from the underlying table.
Expensive table joins can be done beforehand and saved in the
materialized view. You can use the package dbms_olap package to get recommendations on ideal materialized views.
For more information about materialized views click here.
The CBO doesn't always use the same execution strategy, changes
within the database can force the CBO to change its plan. You can force
oracle to use the same plan by using the plan stability feature stored outlines
to preserve the current execution plans, even if the statistics and
optimizer mode changes. The only catch is that the SQL statement must be
identical if you wish to use the stored outline.
Stored outlines are use when you plan to migrate from one oracle
version to another, thus you can cut risks and preserve the applications
present performance via stored outlines. Outlines ensure that the
execution paths the queries used in a test instance successfully carry
over to the production instance. Also it can be used to override the
code that is imbedded in the application.
The more common name for this feature is called optimizer stability.
All the information on stored outlines is stored in the OUTLN schema in
two tables OL$ and OL$HINTS (these are created with you install
alter system set query_rewrite_enabled = true;
alter system set star_transformation_enabled = true;
alter system set optimizer_features_enabled = 10.2.0;
Note: the above values must be the same on all instances, when using this feature across different databases.
|Stored Outlines (database)
alter system set create_stored_outlines = true;
Note: this can use lots of disk space
|Stored Outlines (session)
alter session set create_stored_outlines = true;
||create outline test_outline
on select employee_id, last_name
||alter system set use_stored_outline = true;
alter session set use_stored_outline = true;
Use the dbms_outln_edit package
change_join_pos - changes the join position for the hint identified by outline name
create_edit_tables - creates outline editing tables in calling a users schema
dropedit_tables - drops outline editong tables
generate_signature - generates a signature for the specified SQL text
referesh_private_outline - refreshes the in-memory copy of the outline
||drop outline test_outline;
||A view of the below tables (name, owner, category, used, sql_text)
Contains the outlines (name, sql_text, signature, category, flags, etc)
Note: flags is useful as it can tell you if the outline is being used.
||Contains the outlines hints (name, hint_type, hint_text, table_name, etc)
||Contains the outlines nodes
|Manage stored outlines and their outline categories
|Manage stored outlines and their outline categories
The CBO normally assumes that the data is uniformly distributed in
the table, however there are times when the data is extremely skewed
which means you are better off using histograms to store column
statistics, histograms provide more efficient access methods. Histograms
use buckets to represent distribution of data in a column and Oracle
uses these buckets to how skewed the data distribution is.
You can use the following histograms
- height-based - divide column values into bands, with each band containing a roughly equal number of rows
- frequency-based - determine the number of buckets based
on the distinct values in the column, each bucket contains all the data
that has the same value
ownname => 'HR',
tabname => 'benefits',
method_opt => 'for column size 10 number_of_vists'
ownname => 'HR',
tabname => 'benefits',
method_opt => 'for column size skewonly'
ownname => 'HR',
tabname => 'BENEFITS',
method_opt => 'for all columns size auto'
for column - your own histogram creation
auto - let oracle decide what to do based on data distribution and workload
skewonly - based the decision only on the data distribution of the columns
ownname => 'HR',
tabname => 'benefits',
method_opt => 'for column size 20 department_id'
||describes histograms on columns of all tables in the database
||is a synonym for DBA_TAB_HISTOGRAMS
||provides the histogram data (end-points per histogram) for histograms on all table partitions in the database
||lists actual histogram data (end-points per histogram) for histograms on all table subpartitions in the database
When you are developing a database there are four system parameters that can help the developer
(disabled by default)
Used to indentify errors and poor performance
alter system set plsql_warnings = 'ENABLE:ALL' scope = both;
alter system set plsql_warnings = 'ENABLE:PERFORMANCE','ENABLE:SERVE' scope=both;
(false by default)
|Additional debuging information, compiled code will be stored as interpreted regardless of PLSQL_CODE_TYPE
alter system set plsql_debug = false scope = both;
(2 by default)
|Optimize when compiled
alter system set plsql_optimize_mode = 2 scope=both;
Compiled code into interpreted byte code (default) or native machine code
alter system plsql_code_type = native scope=both;
Note: When PL/SQL objects are compiled the resulting code is
stored depends on the PLSQL_CODE_TYPE, Native compiled code will be
stored in an O/S system file, while interpreted code is stored in the