Latches and enqueues

Tracing sessions: waiting on an enqueue

posted Jul 11, 2011, 6:44 AM by Sachchida Ojha

PURPOSE
-------

How to monitor and trace locking problems (enqueues).


SCOPE
-----

DBA's and people working with technical support.


APPLICATION
-----------

This bulletin will assist you when:

1. Requiring to figure out if an instance has had locking problems with 
   enqueues.  See section "Has locking been a problem".

2. Are there sessions just now in the instance that are waiting on
   enqueues and who are the blockers. See "Is locking a problem just now".


Abstract
--------

What are locks, enqueues and latches?
What are the common types of enqueues?
Has locking been a problem?
Is locking a problem just now?

All those questions will be answered in this document.

To get better performance (get a better flow), one approach is to
eliminate the time a session is waiting to get a resource (for example
an enqueue). You don’t want sessions that are waiting on other sessions,
because those waiting sessions may also hold enqueues and resources, which
will make other sessions wait for them - which can end up with a deadlock.

===============================================================================

What are locks, enqueues and latches
-----------------------------------

Often the word "lock" is synonymous to enqueue. In this document, "enqueue" 
will be used for the locking mechanism that you can find in V$LOCK. "locking" 
will be used when talking about requesting an enqueue, waiting on an enqueue 
and blocking others while holding the enqueue.

A lock is a resource that you need to hold to get access to the resource.
Oracle has two kind of locks: enqueues and latches. This bulletin will not
talk about latches. A latch is a fast, inexpensive and non-sophisticated
lock. A latch is used when you need serial access to operations/functions
in Oracle. There is no ordered queue for the waiters on a latch - first to
grab it gets it.

Enqueues are sophisticated locks for managing access to shared resources
(like tables, rows, jobs, redo threads). An enqueue can be requested in
different levels/mode: null, row share, row exclusive, share, share row
exclusive or exclusive. If a session holds an enqueue in share mode, other
sessions can then also take the enqueue in share mode (for the same resource).
If a session holds an enqueue in exclusive mode, other sessions that wants to
get it - independently in which level - they have to wait.

When access is required by a session, a lock structure is obtained and a
request is made to acquire access to the resource at a specific level (mode)
is made. The lock structure is placed on one of three linked lists which hang
off of the resource, called the OWNER (if the enqueue could be acquired),
WAITER (if the session is waiting to acquiring the enqueue) and CONVERTER
(the session is holding the enqueue in one level but wants to convert it to
another) lists.

An enqueue is uniquely defined by it's TYPE, ID1 and ID2 (which are columns
in the V$LOCK view). For example can there only be one enqueue for user
SCOTT's EMP table (identified by TYPE=TM, ID1=<object ID>, ID2=0).


Common types of enqueues
------------------------

· JQ - Job Queue. When a job (submitted by DBMS_JOB.SUBMIT) is running,
  it is protected by a JQ enqueue (which means that only one SNP-process
  can run the job).

· ST - Space management Transaction. The ST enqueue is need to be held
  every time the session is allocating/deallocating extents (which means
  wants to change the UET$ and FET$ dictionary tables), like coalescing,
  drop/truncate segments and disk -sorting. If the session gets a timeout
  when requesting the ST enqueue, "ORA-1575 timeout waiting for space
  management" is returned.

· TM - DML (Table) enqueue. Every time a session wants to lock a table,
  a TM enqueue is requested. If a session deletes a row in the parent-table
  (DEPT) and a referential constraint (foreign key) is created without
  an index on the child-table (EMP), or if the session is updating the
  column(s) that the foreign key references to then a share lock (level 4)
  is taken on the child table. If another session tries to do changes to
  the child-table they have to wait (because they want the enqueue in row
  exclusive mode, and that is not compatible with the share mode). If an
  index is created on the child-table’s foreign key-column, then no share-
  lock is required on the child-table.

· TX - Transaction. As soon as a transaction is started a TX enqueue is
  needed. A transaction is uniquely defined by the rollback segment number,
  the slot number in the rollback segment’s transaction table and the slot
  number’s sequence number. A session can be waiting on a TX enqueue for
  several reasons: 1) Another session is locking the requested row.
  2) When two sessions tries to insert the same unique key into a table
  (none of them has done a COMMIT), then the last session is waiting for
  the first one to COMMIT or ROLLBACK. 3) There are no free ITL (Interested
  Transaction List) in the block header (increase INI_TRANS och PCT_FREE
  for the segment).

· UL - User Lock. A session has taken a lock with the DBMS_LOCK.REQUEST
  function.

===============================================================================

Has locking been a problem
--------------------------

To see if the instance has had locking problems since startup, execute the
following SELECT:

  SELECT * 
    FROM v$sysstat 
   WHERE class=4;

STATISTIC# NAME                     CLASS     VALUE
        22 enqueue timeouts             4         0
        23 enqueue waits                4         2
        24 enqueue deadlocks            4         0
        25 enqueue requests             4       213
        26 enqueue conversions          4         0
        27 enqueue releases             4       204

· "enqueue timeouts" shows the total number of enqueue operations (get
  and convert) that timed out before they could complete.

· "enqueue waits" shows how many times a session had to wait for an
   enqueue. This column is updated first after the wait is finished.

   FYI:

      enqueue waits time out (usually every 3 seconds). Hence a single session 
      waiting for 1 minute could show up as 20 waits in the "wait event" section
      of a BSTAT/ESTAT report but only ONE enqueue wait in the 
      "statistics"(output from V$SYSSTAT) section of the BSTAT/ESTAT.      

· "enqueue deadlocks" is how many times a deadlock situation has
  occured (every time the client receives an ORA-60 and a trace file
  will be created). This value should be zero, else an investigation
  should be made and the trace files should be checked.

· "enqueue requests" minus "enqueue releases" shows how many locks that
  are held just now (which is equal to the number of rows in V$LOCK).

· "enqueue conversions" is how many times an session is holding a lock
  in one mode and then wants to change the mode (for example, first the
  session is doing an SELECT * FROM emp FOR UPDATE and then a UPDATE emp
  SET sal=9876 WHERE empno=7839).

· "enqueue releases" shows how many times an enqueue was released (freed).

Maybe the session(s) that created these waits is still connected. If
they are, you find them in V$SESSTAT (check first with V$STATNAME or
V$SYSSTAT that statistic# 23 is "enqueue waits"):

  SELECT * 
    FROM v$sesstat 
   WHERE statistic# = 23 AND value > 0;

--------------------------------------------------------------------------------

If a session has been waiting for an enqueue, then a row in V$SYSTEM_EVENT is
created. If the instance is running with TIMED_STATISTICS = TRUE, you can see
for how long time the session had to wait for the enqueue:

  SELECT *
    FROM v$system_event
   WHERE event = 'enqueue';

EVENT        TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT
enqueue               15             13        4193    279.53333

· TOTAL_WAITS and AVERAGE_WAIT are not showing what you are expecting.
  The real number of waits are "enqueue waits" in V$SYSSTAT, or by taking
  TOTAL_WAITS minus TOTAL_TIMEOUTS. To get the correct AVERAGE_WAIT you
  should take TIME_WAITED (4193) divided by the real number of waits (2).
  So the real average wait time is not 2.79 seconds - it is 20.96 seconds.

With the following SELECT you can see for how long time the connected
sessions had to wait for an enqueue and how long the maximum wait time was:

  SELECT COUNT( * ), SUM( time_waited ), MAX( max_wait )
    FROM v$session_event
   WHERE event = 'enqueue';

*) Remember that in V$SYSTEM_EVENT you see the total values for all sessions
   that has been connected to the instance and those that are still connected.
   In V$SESSION_EVENT you will only see values for those sessions that are
   connected to the instance just now.
-------------------------------------------------------------------------------

To see the gets and waits that had been on the different types of enqueues,
check in the fixed internal table X$KSQST:

  SELECT *
    FROM x$ksqst
   WHERE ksqstget > 0;

ADDR          INDX   INST_ID KS  KSQSTGET  KSQSTWAT
01ECBE30        69         1 CF         9         0
01ECBE48        72         1 CI        12         0
01ECBEA8        84         1 CU        57         0
01ECBF60       107         1 DL        28         0
01ECBF90       113         1 DR         1         0
01ECBFB0       117         1 DV        12         0
01ECC498       274         1 IS         5         0
01ECC890       401         1 MR        16         0
01ECCB30       485         1 PF         1         0
01ECCDA0       563         1 RT         1         0
01ECCE88       592         1 SQ         1         0
01ECCEA0       595         1 ST         1         0
01ECCF68       620         1 TM        14         0
01ECCF98       626         1 TS         1         0
01ECCFC0       631         1 TX        13         2
01ECD098       658         1 US        59         0

· The sum of KSQSTWAT is equal to "enqueue waits" in V$SYSSTAT, so here you
  can see what type of locks that the sessions and background processes
  had been waiting on.

===============================================================================

Is locking a problem currently
------------------------------

This is an example of how you can trace locking problems.

Background: User SCOTT has updated the salary for all employees in table EMP.
Then user SYSTEM tries to update the JOB-description for employee KING in the
same table.


The SELECTS
-----------

Which session is the blocker and which sessions are waiting to get the lock?
This SELECT will return no rows if there are no waiters and blockers. You can
never find situations when there is only blocker(s) or waiter(s). One session
can block many others.

    column Username format A15 column Sid format 9990 heading SID
    column Type format A4 column Lmode format 990 heading 'HELD'
    column Request format 990 heading 'REQ' column Id1 format 9999990
    column Id2 format 9999990 break on Id1 skip 1 dup

    SELECT SN.Username, M.Sid, M.Type,
        DECODE(M.Lmode, 0, 'None', 1, 'Null', 2, 'Row Share', 3, 'Row
        Excl.', 4, 'Share', 5, 'S/Row Excl.', 6, 'Exclusive',
        LTRIM(TO_CHAR(Lmode,'990'))) Lmode,
        DECODE(M.Request, 0, 'None', 1, 'Null', 2, 'Row Share', 3, 'Row
        Excl.', 4, 'Share', 5, 'S/Row Excl.', 6, 'Exclusive',
        LTRIM(TO_CHAR(M.Request, '990'))) Request,
        M.Id1, M.Id2 
    FROM V$SESSION SN, V$LOCK M
    WHERE (SN.Sid = M.Sid and M.Request ! = 0)
        or (SN.Sid = M.Sid and M.Request = 0 and Lmode != 4 and (id1, id2)
        in (select S.Id1, S.Id2 from V$LOCK S where Request != 0 and S.Id1
        = M.Id1 and S.Id2 = M.Id2) ) order by Id1, Id2, M.Request; 


USERNAME             SID TY LMODE         REQUEST             ID1        ID2
---------------- ------- -- ------------- -------------  ---------- --------
                               
SCOTT                 9  TX Exclusive     None             262154        117

SCOTT                11  TX None          Exclusive        262154        117

 Session 9 is blocking(LMODE=Exclusive)
 Session 11 is waiting(REQUEST=Exclusive)
 The meaning of ID1 and ID2 depend on the lock TYPE.

· We can see situations where a session is both a Blocker and a Waiter.
· If there are only two sessions and both are Blockers and Waiters then we
  got a deadlock situation (which Oracle will solve automatically).
· A fast way to check blocking/waiting situations is to run:

  SELECT * FROM v$lock WHERE block > 0 OR request > 0;


**NEW for Oracle 10g**
~~~~~~~~~~~~~~~~~~~~~~

 V$SESSION now tracks the same wait data as V$SESSION_WAIT

Please see Note:245981.1 10g New Functionality of V$SESSION and WAIT MODEL


===============================================================================

Get some common information about these sessions from V$SESSION:

  SELECT sid, taddr, lockwait, status, sql_address,
         row_wait_obj# RW_OBJ#, row_wait_file# RW_FILE#, row_wait_block#
         RW_BLOCK#, row_wait_row# RW_ROW#
    FROM v$session 
   WHERE sid IN( 7, 8 ) 
   ORDER BY sid;

SID TADDR    LOCKWAIT STATUS   SQL_ADDRESS RW_OBJ# RW_FILE# RW_BLOCK# RW_ROW#
  7 01F2D014 01EC9858 ACTIVE      0224C9A8    2494        1     15631       8
  8 01F2CCBC          INACTIVE    02258D80      -1        0         0       0

· The column TADDR could be joined with ADDR in V$TRANSACTION.

· The sessions that are waiting to get a lock has a value (IS NOT NULL)
  in LOCKWAIT.

· Status for a waiting session is ACTIVE. There are a lot of SQL:s around,
  that counts the number of active sessions by looking at this column.

· SQL_ADDR is the address for the last SQL statement that the session executed.
  Could be joined with V$SQL.ADDRESS and V$OPEN_CURSOR.ADDRESS. For the Waiter
  this is the SQL-command that made him wait. For the blocker this could be the
  SQL that makes the block but it could also be any later SQL.

· If ROW_WAIT_OBJ# has not the value -1, then it means that the session is
  waiting on a lock for this object. Could be joined with
  DBA_OBJECTS.OBJECT_ID or DBA_OBJECTS.DATA_OBJECT_ID.


     select DATA_OBJECT_ID from  dba_objects
     where OBJECT_ID = ROW_WAIT_OBJ#;

· The other ROW_WAIT* columns shows where the row is located that the
  waiting session wants to get a lock on.

· From DBA_OBJECTS you can see who is the owner to the table and the table
  name, then you can with the information from the ROW_WAIT* columns build
  the following SELECT (the first argument is 1 for the Oracle8 extended ROWID,
  the second is the object ID, and then the file#, block#, slot#):

  SELECT * FROM scott.emp
   WHERE rowid = DBMS_ROWID.ROWID_CREATE( 1, 2494, 1, 15631, 8 );

===============================================================================

Get information about the enqueues the sessions are holding and waiting on:

  SELECT * 
    FROM v$lock 
   WHERE sid IN( 7, 8 ) 
   ORDER BY sid, type;

ADDR     KADDR    SID  TY    ID1   ID2  LMODE  REQUEST  CTIME  BLOCK
01F27D0C 01F27D20   7  TM   2494     0      3        0   1008      0
01EC9848 01EC9858   7  TX  65546  5989      0        6   1008      0
01F27C98 01F27CAC   8  TM   2494     0      3        0   1032      0
01CD8820 01CD88EC   8  TX  65546  5989      6        0   1032      1

· If a session is waiting to get a lock, then column KADDR contains the
  same value as V$SESSION.LOCKWAIT.

· The type column shows what type of lock the session is holding or waiting on.

· For the lock type TM (table lock), column ID1 is the object_id (or the
  data_object_id) for that table (could be joined with DBA_OBJECTS). For a
  lock of type TX (transaction lock), TRUNC( ID1 / 65536 ) shows which
  rollback segment that is used for the transaction and ID1 - 
  ( rollbacksegment# * 65536 ) shows which slot number in the transaction
  table that this sessions is using (or want to used).

· LMODE shows in which mode the session is holding  the lock.
  0=None, 1=NULL, 2=Row share (SS), 3=Row exclusive (SX), 4=Share (S),
  5=Share row exclusive (SSX), 6=Exclusive (X)

· REQUEST is the mode for which the session wants the lock.

· If a session has values greater than 0 in both LMODE and REQUEST, then the
  session is a Converter.

· CTIME is how many seconds the session has either hold the lock or waited
  for the lock.

· BLOCK is greater than 0 if the lock is blocking other sessions (if there
  are other sessions waiting for this lock to be released).

===============================================================================

What are the sessions doing (which event are the sessions waiting on just
now or last waited on):

  SELECT sid, seq#, event, p1raw, p2raw, p3raw, state, wait_time WAIT,
         seconds_in_wait SECS
    FROM v$session_wait 
   WHERE sid IN( 7, 8 ) 
   ORDER BY sid;

SID SEQ# EVENT                      P1RAW    P2RAW    P3RAW    STATE  WAIT SECS
  7 488 enqueue                     54580006 0001000A 00001765 WAITING   0 1011
  8 147 SQL*Net message from client 28444553 00000001 00       WAITING   0 1035

· SEQ# is increasing by 1 for each wait.

· EVENT is the name of the event (or resource) that the session is waiting on.

· From this you can draw the conclusion that session 8 has locked a row that
  session 7 wants, but session 8 has not done a COMMIT or ROLLBACK (and
  that’s why session 7 is waiting). Session 8 has been idle in 1035 seconds
  (typically the user has gone for a cup of coffee).

===============================================================================

Get some information about the Blocker’s transaction:

  SELECT t.addr, t.xidusn USN, t.xidslot SLOT, t.xidsqn SQL, t.status,
         t.start_time, t.used_ublk UBLK, t.used_urec UREC, t.log_io LOG,
         t.phy_io PHY, t.cr_get, t.cr_change CR_CHA
    FROM v$transaction t, v$session s 
   WHERE s.sid IN( 7, 8 ) AND t.addr = s.taddr;

ADDR    USN SLOT  SQN STATUS START_TIME        UBLK UREC LOG PHY CR_GET CR_CHA
01F2CCBC  1   10 5989 ACTIVE 01/22/00 11:11:28    2   15  54  34     32      0

· ADDR can be joined with V$SESSION.TADDR.

· USN is the Undo Segment Number. Can be joined with V$ROLLNAME.USN and
  V$ROLLSTAT.USN.

· SLOT is the slot number in the rollback segment’s transaction table.

· SQN is the sequence number for the transaction.

· USN+SLOT+SQN are the three values that uniquely identifies a transaction.

· UBLK is how many undo blocks that the transaction has used.

  @updated or deleted. If you are doing an INSERT or DELETE, then you will
 @ see that UREC is set to <number of indexes for this table> + how many rows
 @you inserts/deletes. If you UPDATE a column then UREC will be set to
  @<number of indexes that this column belongs to> * 2 + number of updated
  @rows (so if the column belongs to no index, then UREC is set to the number
  @of rows that was updated).

· If USED_UBLK and USED_UREC is decreasing for a transaction every time you
  look, it means that the transaction is rolling back. When USED_UREC is down
  to zero the rollback is finished. So here you can see when the rollback
  phase is finished after an instance/crash recovery.

===============================================================================

Which SQL statements are the sessions currently executing (or last executed):

 SELECT s.sid, q.users_executing, q.sql_text 
   FROM v$session s, v$sql q
  WHERE s.sid IN( 7, 8 ) AND q.address = s.sql_address 
  ORDER BY s.sid;

SID USERS_EXECUTING SQL_TEXT
  7               1 update scott.emp set job=job where empno=7839
  8               0 SELECT * FROM DUAL

· The SQL statement for the Waiter is the SQL that the user is hanging on.
· The SQL statement for the Blocker may not be the SQL that is holding the
  lock.
· If USERS_EXECUTING greater than 0 then this SQL is executing just now.

===============================================================================

Get all open cursors for the two session:

  SELECT sid, user_name, address, sql_text
    FROM v$open_cursor 
   WHERE sid IN( 7, 8 ) 
   ORDER BY sid;

SID USER_NAME ADDRESS  SQL_TEXT
  7 SYSTEM    0224C9A8 update scott.emp set job=job where empno=7839
  8 SCOTT     02258D80 update emp set sal=sal
  8 SCOTT     02263154 SELECT * FROM DUAL

· Here you can see the SQL that is blocking others (if the session has not
  closed the cursor yet, tools like SQL*Plus closes the cursor automatically
  after execution).

===============================================================================

Get some information about the Blocker’s session so you can call him and tell
him to COMMIT/ROLLBACK his transaction.

  SELECT sid, serial#, username, osuser, program, machine
    FROM v$session 
   WHERE sid = 8;

SID  SERIAL#  USERNAME  OSUSER    PROGRAM                 MACHINE
  8      147  SCOTT     jtreugut  c:\orawin95\bin\plus80  JTREUGUT2

· If you need to kill the session, only do it with ALTER SYSTEM KILL
  SESSION '<sid>, <serial#>' command. The session will then have status
  KILLED or MARKED FOR KILLED (if PMON couldn’t terminate the session directly).
  But the locks on the resources can be held for a long time (until PMON has
  cleaned up everything).
Tracing sessions: waiting on an enqueue [ID 102925.1]

How to monitor and trace locking problems (enqueues)

posted Mar 9, 2011, 11:05 AM by Sachchida Ojha

What are locks, enqueues and latches?
What are the common types of enqueues?
Has locking been a problem?
Is locking a problem just now?


All those questions will be answered in this document.

To get better performance (get a better flow), one approach is to eliminate the time a session is waiting to get a resource (for example an enqueue). You don’t want sessions that are waiting on other sessions,because those waiting sessions may also hold enqueues and resources, which will make other sessions wait for them - which can end up with a deadlock.

===============================================================================

What are locks, enqueues and latches
-----------------------------------

Often the word "lock" is synonymous to enqueue. In this document, "enqueue"  will be used for the locking mechanism that you can find in V$LOCK. "locking"  will be used when talking about requesting an enqueue, waiting on an enqueue
and blocking others while holding the enqueue.

A lock is a resource that you need to hold to get access to the resource. Oracle has two kind of locks: enqueues and latches. This bulletin will not talk about latches. A latch is a fast, inexpensive and non-sophisticated lock. A latch is used when you need serial access to operations/functions in Oracle. There is no ordered queue for the waiters on a latch - first to grab it gets it.

Enqueues are sophisticated locks for managing access to shared resources (like tables, rows, jobs, redo threads). An enqueue can be requested in different levels/mode: null, row share, row exclusive, share, share row exclusive or exclusive. If a session holds an enqueue in share mode, other sessions can then also take the enqueue in share mode (for the same resource). If a session holds an enqueue in exclusive mode, other sessions that wants to get it - independently in which level - they have to wait.

When access is required by a session, a lock structure is obtained and a request is made to acquire access to the resource at a specific level (mode) is made. The lock structure is placed on one of three linked lists which hang off of the resource, called the OWNER (if the enqueue could be acquired),WAITER (if the session is waiting to acquiring the enqueue) and CONVERTER (the session is holding the enqueue in one level but wants to convert it to another) lists.

An enqueue is uniquely defined by it's TYPE, ID1 and ID2 (which are columns in the V$LOCK view). For example can there only be one enqueue for user SCOTT's EMP table (identified by TYPE=TM, ID1=<object ID>, ID2=0).


Common types of enqueues
------------------------

· JQ - Job Queue. When a job (submitted by DBMS_JOB.SUBMIT) is running,it is protected by a JQ enqueue (which means that only one SNP-process can run the job).

· ST - Space management Transaction. The ST enqueue is need to be held every time the session is allocating/deallocating extents (which means wants to change the UET$ and FET$ dictionary tables), like coalescing,drop/truncate segments and disk -sorting. If the session gets a timeout when requesting the ST enqueue, "ORA-1575 timeout waiting for space management" is returned.

· TM - DML (Table) enqueue. Every time a session wants to lock a table,a TM enqueue is requested. If a session deletes a row in the parent-table (DEPT) and a referential constraint (foreign key) is created without an index on the child-table (EMP), or if the session is updating the column(s) that the foreign key references to then a share lock (level 4) is taken on the child table. If another session tries to do changes to the child-table they have to wait (because they want the enqueue in row exclusive mode, and that is not compatible with the share mode). If an index is created on the child-table’s foreign key-column, then no share-lock is required on the child-table.

· TX - Transaction. As soon as a transaction is started a TX enqueue is needed. A transaction is uniquely defined by the rollback segment number, the slot number in the rollback segment’s transaction table and the slot number’s sequence number. A session can be waiting on a TX enqueue for several reasons:

1) Another session is locking the requested row.

2) When two sessions tries to insert the same unique key into a table (none of them has done a COMMIT), then the last session is waiting for the first one to COMMIT or ROLLBACK. 3) There are no free ITL (Interested Transaction List) in the block header (increase INI_TRANS och PCT_FREE for the segment).

· UL - User Lock. A session has taken a lock with the DBMS_LOCK.REQUEST function.

===============================================================================

Has locking been a problem
--------------------------

To see if the instance has had locking problems since startup, execute the
following SELECT:

SELECT *
FROM v$sysstat
WHERE class=4;

STATISTIC# NAME                     CLASS     VALUE
22 enqueue timeouts             4         0
23 enqueue waits                4         2
24 enqueue deadlocks            4         0
25 enqueue requests             4       213
26 enqueue conversions          4         0
27 enqueue releases             4       204

· "enqueue timeouts" shows the total number of enqueue operations (get and convert) that timed out before they could complete.

· "enqueue waits" shows how many times a session had to wait for an enqueue. This column is updated first after the wait is finished.

FYI:

enqueue waits time out (usually every 3 seconds). Hence a single session  waiting for 1 minute could show up as 20 waits in the "wait event" section of a BSTAT/ESTAT report but only ONE enqueue wait in the
"statistics"(output from V$SYSSTAT) section of the BSTAT/ESTAT.     

· "enqueue deadlocks" is how many times a deadlock situation has occured (every time the client receives an ORA-60 and a trace file will be created). This value should be zero, else an investigation should be made and the trace files should be checked.

· "enqueue requests" minus "enqueue releases" shows how many locks that are held just now (which is equal to the number of rows in V$LOCK).

· "enqueue conversions" is how many times an session is holding a lock in one mode and then wants to change the mode (for example, first the session is doing an SELECT * FROM emp FOR UPDATE and then a UPDATE emp
SET sal=9876 WHERE empno=7839).

· "enqueue releases" shows how many times an enqueue was released (freed).

Maybe the session(s) that created these waits is still connected. If they are, you find them in V$SESSTAT (check first with V$STATNAME or V$SYSSTAT that statistic# 23 is "enqueue waits"):

SELECT *
FROM v$sesstat
WHERE statistic# = 23 AND value > 0;

--------------------------------------------------------------------------------

If a session has been waiting for an enqueue, then a row in V$SYSTEM_EVENT is created. If the instance is running with TIMED_STATISTICS = TRUE, you can see for how long time the session had to wait for the enqueue:

SELECT *
FROM v$system_event
WHERE event = 'enqueue';

EVENT        TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT
enqueue               15             13        4193    279.53333

· TOTAL_WAITS and AVERAGE_WAIT are not showing what you are expecting.
The real number of waits are "enqueue waits" in V$SYSSTAT, or by taking TOTAL_WAITS minus TOTAL_TIMEOUTS. To get the correct AVERAGE_WAIT you should take TIME_WAITED (4193) divided by the real number of waits (2).
So the real average wait time is not 2.79 seconds - it is 20.96 seconds.

With the following SELECT you can see for how long time the connected sessions had to wait for an enqueue and how long the maximum wait time was:

SELECT COUNT( * ), SUM( time_waited ), MAX( max_wait )
FROM v$session_event
WHERE event = 'enqueue';

*) Remember that in V$SYSTEM_EVENT you see the total values for all sessions that has been connected to the instance and those that are still connected. In V$SESSION_EVENT you will only see values for those sessions that are
connected to the instance just now.
-------------------------------------------------------------------------------

To see the gets and waits that had been on the different types of enqueues, check in the fixed internal table X$KSQST:

SELECT *
FROM x$ksqst
WHERE ksqstget > 0;

ADDR          INDX   INST_ID KS  KSQSTGET  KSQSTWAT
01ECBE30        69         1 CF         9         0
01ECBE48        72         1 CI        12         0
01ECBEA8        84         1 CU        57         0
01ECBF60       107         1 DL        28         0
01ECBF90       113         1 DR         1         0
01ECBFB0       117         1 DV        12         0
01ECC498       274         1 IS         5         0
01ECC890       401         1 MR        16         0
01ECCB30       485         1 PF         1         0
01ECCDA0       563         1 RT         1         0
01ECCE88       592         1 SQ         1         0
01ECCEA0       595         1 ST         1         0
01ECCF68       620         1 TM        14         0
01ECCF98       626         1 TS         1         0
01ECCFC0       631         1 TX        13         2
01ECD098       658         1 US        59         0

· The sum of KSQSTWAT is equal to "enqueue waits" in V$SYSSTAT, so here you can see what type of locks that the sessions and background processes had been waiting on.

===============================================================================

Is locking a problem currently
------------------------------

This is an example of how you can trace locking problems.

Background: User SCOTT has updated the salary for all employees in table EMP. Then user SYSTEM tries to update the JOB-description for employee KING in the same table.


The SELECTS
-----------

Which session is the blocker and which sessions are waiting to get the lock?

This SELECT will return no rows if there are no waiters and blockers. You can never find situations when there is only blocker(s) or waiter(s). One session can block many others.

column Username format A15 column Sid format 9990 heading SID
column Type format A4 column Lmode format 990 heading 'HELD'
column Request format 990 heading 'REQ' column Id1 format 9999990
column Id2 format 9999990 break on Id1 skip 1 dup

SELECT SN.Username, M.Sid, M.Type,
DECODE(M.Lmode, 0, 'None', 1, 'Null', 2, 'Row Share', 3, 'Row
Excl.', 4, 'Share', 5, 'S/Row Excl.', 6, 'Exclusive',
LTRIM(TO_CHAR(Lmode,'990'))) Lmode,
DECODE(M.Request, 0, 'None', 1, 'Null', 2, 'Row Share', 3, 'Row
Excl.', 4, 'Share', 5, 'S/Row Excl.', 6, 'Exclusive',
LTRIM(TO_CHAR(M.Request, '990'))) Request,
M.Id1, M.Id2
FROM V$SESSION SN, V$LOCK M
WHERE (SN.Sid = M.Sid and M.Request ! = 0)
or (SN.Sid = M.Sid and M.Request = 0 and Lmode != 4 and (id1, id2)
in (select S.Id1, S.Id2 from V$LOCK S where Request != 0 and S.Id1
= M.Id1 and S.Id2 = M.Id2) ) order by Id1, Id2, M.Request;


USERNAME             SID TY LMODE         REQUEST             ID1        ID2
---------------- ------- -- ------------- -------------  ---------- --------

SCOTT                 9  TX Exclusive     None             262154        117

SCOTT                11  TX None          Exclusive        262154        117

Session 9 is blocking(LMODE=Exclusive)
Session 11 is waiting(REQUEST=Exclusive)
The meaning of ID1 and ID2 depend on the lock TYPE.

· We can see situations where a session is both a Blocker and a Waiter.
· If there are only two sessions and both are Blockers and Waiters then we got a deadlock situation (which Oracle will solve automatically).
· A fast way to check blocking/waiting situations is to run:

SELECT * FROM v$lock WHERE block > 0 OR request > 0;


**NEW for Oracle 10g**
~~~~~~~~~~~~~~~~~~~~~~

V$SESSION now tracks the same wait data as V$SESSION_WAIT

Please see Note:245981.1 10g New Functionality of V$SESSION and WAIT MODEL


===============================================================================

Get some common information about these sessions from V$SESSION:

SELECT sid, taddr, lockwait, status, sql_address,
row_wait_obj# RW_OBJ#, row_wait_file# RW_FILE#, row_wait_block#
RW_BLOCK#, row_wait_row# RW_ROW#
FROM v$session
WHERE sid IN( 7, 8 )
ORDER BY sid;

SID TADDR    LOCKWAIT STATUS   SQL_ADDRESS RW_OBJ# RW_FILE# RW_BLOCK# RW_ROW#
7 01F2D014 01EC9858 ACTIVE      0224C9A8    2494        1     15631       8
8 01F2CCBC          INACTIVE    02258D80      -1        0         0       0

· The column TADDR could be joined with ADDR in V$TRANSACTION.

· The sessions that are waiting to get a lock has a value (IS NOT NULL)
in LOCKWAIT.

· Status for a waiting session is ACTIVE. There are a lot of SQL:s around, that counts the number of active sessions by looking at this column.

· SQL_ADDR is the address for the last SQL statement that the session executed.
Could be joined with V$SQL.ADDRESS and V$OPEN_CURSOR.ADDRESS. For the Waiter this is the SQL-command that made him wait. For the blocker this could be the SQL that makes the block but it could also be any later SQL.

· If ROW_WAIT_OBJ# has not the value -1, then it means that the session is waiting on a lock for this object. Could be joined with DBA_OBJECTS.OBJECT_ID or DBA_OBJECTS.DATA_OBJECT_ID.


select DATA_OBJECT_ID from  dba_objects
where OBJECT_ID = ROW_WAIT_OBJ#;

· The other ROW_WAIT* columns shows where the row is located that the
waiting session wants to get a lock on.

· From DBA_OBJECTS you can see who is the owner to the table and the table name, then you can with the information from the ROW_WAIT* columns build the following SELECT (the first argument is 1 for the Oracle8 extended ROWID,
the second is the object ID, and then the file#, block#, slot#):

SELECT * FROM scott.emp
WHERE rowid = DBMS_ROWID.ROWID_CREATE( 1, 2494, 1, 15631, 8 );

===============================================================================

Get information about the enqueues the sessions are holding and waiting on:

SELECT *
FROM v$lock
WHERE sid IN( 7, 8 )
ORDER BY sid, type;

ADDR     KADDR    SID  TY    ID1   ID2  LMODE  REQUEST  CTIME  BLOCK
01F27D0C 01F27D20   7  TM   2494     0      3        0   1008      0
01EC9848 01EC9858   7  TX  65546  5989      0        6   1008      0
01F27C98 01F27CAC   8  TM   2494     0      3        0   1032      0
01CD8820 01CD88EC   8  TX  65546  5989      6        0   1032      1

· If a session is waiting to get a lock, then column KADDR contains the
same value as V$SESSION.LOCKWAIT.

· The type column shows what type of lock the session is holding or waiting on.

· For the lock type TM (table lock), column ID1 is the object_id (or the data_object_id) for that table (could be joined with DBA_OBJECTS). For a lock of type TX (transaction lock), TRUNC( ID1 / 65536 ) shows which rollback segment that is used for the transaction and ID1 - ( rollbacksegment# * 65536 ) shows which slot number in the transaction table that this sessions is using (or want to used).

· LMODE shows in which mode the session is holding  the lock.
0=None, 1=NULL, 2=Row share (SS), 3=Row exclusive (SX), 4=Share (S),
5=Share row exclusive (SSX), 6=Exclusive (X)

· REQUEST is the mode for which the session wants the lock.

· If a session has values greater than 0 in both LMODE and REQUEST, then the session is a Converter.

· CTIME is how many seconds the session has either hold the lock or waited for the lock.

· BLOCK is greater than 0 if the lock is blocking other sessions (if there are other sessions waiting for this lock to be released).

===============================================================================

What are the sessions doing (which event are the sessions waiting on just now or last waited on):

SELECT sid, seq#, event, p1raw, p2raw, p3raw, state, wait_time WAIT,
seconds_in_wait SECS
FROM v$session_wait
WHERE sid IN( 7, 8 )
ORDER BY sid;

SID SEQ# EVENT                      P1RAW    P2RAW    P3RAW    STATE  WAIT SECS
7 488 enqueue                     54580006 0001000A 00001765 WAITING   0 1011
8 147 SQL*Net message from client 28444553 00000001 00       WAITING   0 1035

· SEQ# is increasing by 1 for each wait.

· EVENT is the name of the event (or resource) that the session is waiting on.

· From this you can draw the conclusion that session 8 has locked a row that session 7 wants, but session 8 has not done a COMMIT or ROLLBACK (and that’s why session 7 is waiting). Session 8 has been idle in 1035 seconds
(typically the user has gone for a cup of coffee).

===============================================================================

Get some information about the Blocker’s transaction:

SELECT t.addr, t.xidusn USN, t.xidslot SLOT, t.xidsqn SQL, t.status,
t.start_time, t.used_ublk UBLK, t.used_urec UREC, t.log_io LOG,
t.phy_io PHY, t.cr_get, t.cr_change CR_CHA
FROM v$transaction t, v$session s
WHERE s.sid IN( 7, 8 ) AND t.addr = s.taddr;

ADDR    USN SLOT  SQN STATUS START_TIME        UBLK UREC LOG PHY CR_GET CR_CHA
01F2CCBC  1   10 5989 ACTIVE 01/22/00 11:11:28    2   15  54  34     32      0

· ADDR can be joined with V$SESSION.TADDR.

· USN is the Undo Segment Number. Can be joined with V$ROLLNAME.USN and
V$ROLLSTAT.USN.

· SLOT is the slot number in the rollback segment’s transaction table.

· SQN is the sequence number for the transaction.

· USN+SLOT+SQN are the three values that uniquely identifies a transaction.

· UBLK is how many undo blocks that the transaction has used.

@updated or deleted. If you are doing an INSERT or DELETE, then you will
@ see that UREC is set to <number of indexes for this table> + how many rows
@you inserts/deletes. If you UPDATE a column then UREC will be set to
@<number of indexes that this column belongs to> * 2 + number of updated
@rows (so if the column belongs to no index, then UREC is set to the number
@of rows that was updated).

· If USED_UBLK and USED_UREC is decreasing for a transaction every time you look, it means that the transaction is rolling back. When USED_UREC is down to zero the rollback is finished. So here you can see when the rollback phase is finished after an instance/crash recovery.

===============================================================================

Which SQL statements are the sessions currently executing (or last executed):

SELECT s.sid, q.users_executing, q.sql_text
FROM v$session s, v$sql q
WHERE s.sid IN( 7, 8 ) AND q.address = s.sql_address
ORDER BY s.sid;

SID USERS_EXECUTING SQL_TEXT
7               1 update scott.emp set job=job where empno=7839
8               0 SELECT * FROM DUAL

· The SQL statement for the Waiter is the SQL that the user is hanging on.
· The SQL statement for the Blocker may not be the SQL that is holding the
lock.
· If USERS_EXECUTING greater than 0 then this SQL is executing just now.

===============================================================================

Get all open cursors for the two session:

SELECT sid, user_name, address, sql_text
FROM v$open_cursor
WHERE sid IN( 7, 8 )
ORDER BY sid;

SID USER_NAME ADDRESS  SQL_TEXT
7 SYSTEM    0224C9A8 update scott.emp set job=job where empno=7839
8 SCOTT     02258D80 update emp set sal=sal
8 SCOTT     02263154 SELECT * FROM DUAL

· Here you can see the SQL that is blocking others (if the session has not closed the cursor yet, tools like SQL*Plus closes the cursor automatically after execution).

===============================================================================

Get some information about the Blocker’s session so you can call him and tell him to COMMIT/ROLLBACK his transaction.

SELECT sid, serial#, username, osuser, program, machine
FROM v$session
WHERE sid = 8;

SID  SERIAL#  USERNAME  OSUSER    PROGRAM                 MACHINE
8      147  SCOTT     jtreugut  c:\orawin95\bin\plus80  JTREUGUT2

· If you need to kill the session, only do it with ALTER SYSTEM KILL SESSION '<sid>, <serial#>' command. The session will then have status KILLED or MARKED FOR KILLED (if PMON couldn’t terminate the session directly).
But the locks on the resources can be held for a long time (until PMON has cleaned up everything).

How To Identify a Hot Block Within The Database Buffer Cache?

posted Mar 9, 2011, 10:56 AM by Sachchida Ojha

How to identify blocks which cause latch contention on the 'cache buffers chains' latch.
How to identify a hot block within the database buffer cache.



Solution
Possible hot blocks in the buffer cache normally can be identified by a high or
rapid increasing wait count on the CACHE BUFFERS CHAINS latch.

This latch is acquired when searching for data blocks cached in the buffer cache.
Since the Buffer cache is implemented as a sum of chains of blocks, each of those
chains is protected by a child of this latch when needs to be scanned. Contention
in this latch can be caused by very heavy access to a single block. This can
require the application to be reviewed.

To solve a hot block, the application maybe need to be reviewed.

By examining the waits on this latch, information about the segment and the
specific block can be obtained using the following queries.

First determine which latch id(ADDR) are interesting by examining the number of
sleeps for this latch. The higher the sleep count, the more interesting the
latch id(ADDR) is:

SQL> select CHILD#  "cCHILD"
,      ADDR    "sADDR"
,      GETS    "sGETS"
,      MISSES  "sMISSES"
,      SLEEPS  "sSLEEPS"
from v$latch_children
where name = 'cache buffers chains'
order by 5, 1, 2, 3;

Run the above query a few times to to establish the id(ADDR) that has the most
consistent amount of sleeps. Once the id(ADDR) with the highest sleep count is found
then this latch address can be used to get more details about the blocks
currently in the buffer cache protected by this latch.
The query below should be run just after determining the ADDR with
the highest sleep count.

SQL> column segment_name format a35
select /*+ RULE */
e.owner ||'.'|| e.segment_name  segment_name,
e.extent_id  extent#,
x.dbablk - e.block_id + 1  block#,
x.tch,
l.child#
from
sys.v$latch_children  l,
sys.x$bh  x,
sys.dba_extents  e
where
x.hladdr  = '&ADDR' and
e.file_id = x.file# and
x.hladdr = l.addr and
x.dbablk between e.block_id and e.block_id + e.blocks -1
order by x.tch desc ;

Example of the output :
SEGMENT_NAME                     EXTENT#      BLOCK#       TCH    CHILD#
-------------------------------- ------------ ------------ ------ ----------
SCOTT.EMP_PK                       5            474          17     7,668
SCOTT.EMP                          1            449           2     7,668

Depending on the TCH column (The number of times the block is hit by a SQL
statement), you can identify a hotblock. The higher the value of the TCH column,
the more frequent the block is accessed by SQL statements.

In order to reduce contention for this object the following mechanisms can be put in place:

1) Examine the application to see if the execution of certain DML and SELECT statements
can be reorganized to eliminate contention on the object.

2) Decrease the buffer cache -although this may only help in a small amount of cases.

3) DBWR throughput may have a factor in this as well.
If using multiple DBWR's then increase the number of DBWR's.

4) Increase the PCTFREE for the table storage parameters via ALTER TABLE
or rebuild. This will result in less rows per block.

5) Consider implementing reverse key indexes
(if range scans aren't commonly used against the segment)

What is a latch?

posted Mar 9, 2011, 6:50 AM by Sachchida Ojha   [ updated Jul 11, 2011, 6:34 AM ]

The Oracle RDBMS makes use of different types of locking mechanisms. They are mainly latches, enqueues, distributed locks and global locks (used in parallel instance implementations).

1. What is a latch?

Latches are low level serialization mechanisms used to protect shared data structures in the SGA. The implementation of latches is operating system dependent, particularly in regard to whether a process will wait for a latch and for how long.

A latch is a type  of a lock that can be very quickly acquired and freed. Latches are typically used to prevent more than one process from executing the same  piece of  code at  a given time. Associated with each latch is a cleanup procedure that will be called if a process  dies while holding  the latch.  Latches  have an  associated level  that  is used to prevent deadlocks.  Once a process acquires a latch at a certain level it cannot subsequently acquire a latch at a  level that is equal to  or less than that level (unless it acquires it nowait).

2. Latches vs Enqueues

 Enqueues are another type of locking mechanism used in Oracle.  An enqueue is a more sophisticated mechanism which permits several concurrent processes to have varying degree of sharing of "known" resources. Any object  which can be concurrently used, can be protected with enqueues. A good example  is of locks on tables. We allow varying levels of sharing on tables e.g.  two processes can lock a table in share mode or in share update mode etc. One difference is that the enqueue is obtained using an OS specific     locking mechanism. An enqueue allows the user to store a value in the lock,  i.e the mode in which we are requesting it. The OS lock manager keeps track of the resources locked. If a process cannot be granted the lock because it  is incompatible with the mode requested and the lock is requested with wait,  the OS puts the requesting process on a wait queue which is serviced in FIFO. Another difference between latches and enqueues is that  in latches there is no ordered queue of waiters like in enqueues. Latch  waiters may either use timers to wakeup and retry or spin (only in  multiprocessors). Since all waiters are concurrently retrying (depending on the scheduler), anyone might get the latch and conceivably the first one to try might be the last one to get.


3. When do we need to obtain a latch?

 A process acquires a latch when working with a structure in the SGA (System Global Area). It continues to hold the latch for the period of time it works with the structure. The latch is dropped when the process is finished with the structure. Each latch protects a different set of data, identified by the name of the latch.

Oracle uses atomic instructions like "test and set" for operating on latches.Processes waiting to execute a part of code for which a latch has already been obtained by some other process will wait until the latch is released. Examples are redo allocation latches, copy latches, archive control latch etc. The basic idea is to block concurrent access to shared data structures. Since the instructions to set and free latches are atomic, the OS guarantees that only one process gets it. Since it is only one instruction, it is quite fast. Latches are held for short periods of time and provide a mechanism for cleanup in case a holder dies abnormally while holding it. This cleaning is done using the services of PMON.


4. Latches request modes?

Latches request can be made in two modes: "willing-to-wait" or "no wait". Normally,latches will be requested in "willing-to-wait" mode. A request in "willing-to-wait"  mode will loop, wait, and request again until the latch is obtained.  In "no wait" mode the process request the latch. If one is not available, instead of waiting, another one is requested. Only when all fail does the server process have to wait.

Examples of "willing-to-wait" latches are: shared pool and library cache latches A example of "no wait" latches is the redo copy latch.
    

5. What causes latch contention?

If a required latch is busy, the process requesting it spins, tries again and if still not available, spins again. The loop is repeated up to a maximum number of times determined by the initialization parameter _SPIN_COUNT. If after this entire loop, the latch is still not available, the process must yield the CPU and go to sleep. Initially is sleeps for one centisecond. This time is doubled in every subsequent sleep.

This causes a slowdown to occur and results in additional CPU usage,until a latch is available. The CPU usage is a consequence of the "spinning" of the process. "Spinning" means that the process continues to look for the availability of the latch after certain intervals of time,during which it sleeps.


6. How to identify contention for internal latches?
 

Relevant data dictionary views to query
---------------------------------------------

V$LATCH
V$LATCHHOLDER
V$LATCHNAME

Each row in the V$LATCH table contains statistics for a different type of latch. The columns of the table reflect activity for different types of latch requests. The distinction between these types of requests is whether the requesting process continues to request a latch if it  is unavailable:

   willing-to-wait        If the latch requested with a willing-to-wait
                                    request is not available, the requesting process
                                    waits a short time and requests the latch again.
                                    The process continues waiting and requesting until
                                    the latch is available.

    no wait                    If the latch requested with an immediate request is
                                    not available, the requesting process does not
                                    wait, but continues processing.

    V$LATCHNAME key information:
    ---------------------------------------

    GETS                      Number of successful willing-to-wait requests for
                                    a latch.

    MISSES                  Number of times an initial willing-to-wait request
                                    was unsuccessful.

    SLEEPS                  Number of times a process waited a requested a latch
                                    after an initial wiling-to-wait request.
    

    IMMEDIATE_GETS              Number of successful immediate requests for each latch.

    IMMEDIATE_MISSES          Number of unsuccessful immediate requests for each latch.
    

    Calculating latch hit ratio
    ----------------------------

    To get the Hit ratio for  latches apply the following formula:

    "willing-to-wait" Hit Ratio=(GETS-MISSES)/GETS
    "no wait" Hit Ratio=(IMMEDIATE_GETS-IMMEDIATE_MISSES)/IMMEDIATE_GETS

    This number should be close to 1. If not, tune according to the latch name


7. Useful SQL scripts to get latch information
 

    /*
    ** Display System-wide latch statistics.
    */
    column name format A32 truncate heading "LATCH NAME"
    column pid heading "HOLDER PID"
    select c.name,a.addr,a.gets,a.misses,a.sleeps,
    a.immediate_gets,a.immediate_misses,b.pid
    from v$latch a, v$latchholder b, v$latchname c
    where a.addr = b.laddr(+)
    and a.latch# = c.latch#
    order by a.latch#;

    /*
    ** Given a latch address, find out the latch name.
    */
    column name format a64 heading 'Name'
    select a.name from v$latchname a, v$latch b
    where b.addr = '&addr'
    and b.latch#=a.latch#;
    

    /*
    ** Display latch statistics by latch name.
    */
    column name format a32 heading 'LATCH NAME'
    column pid heading 'HOLDER PID'
    select c.name,a.addr,a.gets,a.misses,a.sleeps,
    a.immediate_gets,a.immediate_misses,b.pid
    from v$latch a, v$latchholder b, v$latchname c
    where a.addr   = b.laddr(+) and a.latch# = c.latch#
    and c.name like '&latch_name%' order by a.latch#;
    


8. List of all the latches
 

    Oracle versions might differ in the latch# assigned to the existing latches.
    The following query will help you to identify all latches and the number assigned.

    column name format a40 heading 'LATCH NAME'
    select latch#, name from v$latchname;
    


9. List of latches that are of most concern to a DBA
 

    * BUFFER CACHE LATCHES: There are two main latches which protect data blocks in the buffer cache. Contention for these two latches is usually seen when a database has high I/O rates. We can reduce contention for these latches and tune them by adjusting certain init.ora parameters.

    Cache buffers chains latch:

This latch is acquired whenever a block in the buffer cache is accessed (pinned).

Reducing contention for the cache buffer chains latch will usually require reducing logical I/O rates by tuning and minimizing the I/O requirements of the SQL involved. High I/O rates could be a sign of a hot block (meaning a block highly accessed). 

    See NOTE:163424.1 How To Identify a Hot Block Within The Database to correctly identify this issue

    

    Cache buffers LRU chain latch:
    The cache buffer lru chain latch is acquired in order to introduce a new block into the buffer cache and when writing a buffer back to disk, specifically when trying  to scan
     the LRU (least recently used) chain containing all the dirty blocks in the buffer cache.

    Its possible to reduce contention for the cache buffer lru chain latch by increasing the size of the buffer cache and thereby reducing the rate at which new blocks are introduced into the buffer cache. Two parameters dictate the size of the buffer cache, DB_BLOCK_SIZE and DB_BLOCK_BUFFERS. In actuality, only the DB_BLOCK_BUFFERS can be changed without recreating the database. Caution, when tuning the buffer pool, avoid the use of additional buffers that contribute little or nothing to the cache hit ratio. A common mistake is to continue increasing the value of DB_BLOCK_BUFFERS. Such increases have no effect if you are doing full table scans or other operations that do not use the buffer cache. Multiple buffer pools can help reduce contention on this latch.You can create additional cache buffer lru chain latches by adjusting the configuration parameter DB_BLOCK_LRU_LATCHES. You may be able to reduce the load on the cache buffer chain latches by increasing the configuration parameter _DB_BLOCK_HASH_BUCKETS

    

    * REDOLOG BUFFER LATCHES: There are two Redo buffer latches, the redo allocation latch and the redo copy latch. The redo allocation latch must be acquired in order to allocate space within the buffer. If the redo log entry to be made is greater than the configuration parameter LOG_SMALL_ENTRY_MAX_SIZE, the session which acquires the redo allocation latch may copy the entry into the redo buffer immediately while holding the allocation latch. If the log entry is greater than LOG_SMALL_ENTRY_MAX_SIZE, then the session will release the redo allocation latch and will acquire the redo copy latch in order to copy the entry. There is only one redo allocation latch, but there may be up to LOG_SIMULTANEOUS_COPIES allocation latches.

    Redo allocation latch:

This latch controls the allocation of space for redo entries in the redo log buffer. There is one redo allocation latch per instance.

    Contention for this latch in Oracle7 can be reduced by decreasing the value of  LOG_SMALL_ENTRY_MAX_SIZE on multi-cpu systems to force the use of the
     redo copy latch. In Oracle8i this parameter is obsolete, so you need to consider   to increase the size of the LOG_BUFFER or reduce the load of the log buffer using
    NOLOGGING features when possible.  

    Redo copy latch:

This latch is used to write redo records into the redolog buffer. This latch is waited for on both single and multi-cpu systems.
     
On multi-cpu systems, contention can be reduced by increasing the  value of LOG_SIMULTANEOUS_COPIES (Hidden in Oracle8i)   and/or increasing LOG_ENTRY_PREBUILD_THRESHOLD (undocumented in Oracle7).


    * LIBRARY CACHE

    Library cache latch:

The library cache latches protect the cached SQL statements and objects definitions held in the library cache within the shared pool. The library cache latch must be acquired in order to
add a new statement to the library cache. During a parse, Oracle searches the library cache for a matching statement. If one is not found, then Oracle will parse the SQL statement, obtain
 the library cache latch and insert the new SQL.

The first resource to reduce contention on this latch is to ensure that the application is reusing as much as possible SQL statement representation. Use bind variables whenever possible in the application. Misses on this latch may also be a sign that the application is parsing SQL at a high rate and may be suffering from too much parse CPU overhead.If the application is already  tuned the SHARED_POOL_SIZE can be increased. Be aware that if the application is not  using the library cache appropriately, the contention might be worse with a larger structure to be handled.

The _KGL_LATCH_COUNT parameter controls the number of library cache latches. The default value should be adequate, but if contention for the library cache latch cant be resolved, it may be advisable to increase this value. The default value for _KGL_LATCH_COUNT is the next prime number after CPU_COUNT. This value cannot exceed 66 (See: <>).

    Library cache pin latch:

The library cache pin latch must be acquired when a statement in the library cache is reexecuted. Misses on this latch occur when there is very high rates SQL execution.

    There is little that can be done to reduce the load on the library cache pin latch, although using private rather than public synonyms or direct object references such as OWNER.TABLE may help.


    * SHARED POOL RELATED LATCHES

    Shared pool latch:

    While the library cache latch protects operations withing the library cache, the shared pool latchis used to protect critical operations when allocating and freeing memory in the shared pool.
    If an application makes use of literal (unshared) SQL then this can severely limit scalability and throughput. The cost of parsing a new SQL statement is expensive both in terms of
    CPU requirements and the number of times the library cache and shared pool latches may need to be acquired and released. Before Oracle9, there use to be just one such latch to the entire database to protects the allocation of memory in the library cache.  In Oracle9 multiple childs were introduced to relieve contention on this resource.

    Ways to reduce the shared pool latch are, avoid hard parses when possible, parse once, execute many. Eliminating literal SQL is also useful to avoid the shared pool latch. The size of the shared_pool and use of MTS (shared server option) also greatly influences the shared pool latch. Note 62143.1 explains how to identify and correct problems with the shared pool, and shared pool latch.

    Row cache objects latch:
    This latch comes into play when user processes are attempting to  access the cached data dictionary values.

    It is not common to have contention in this latch and the only way to reduce contention for this latch is by increasing the size of the shared pool (SHARED_POOL_SIZE).


10. Tuning _SPIN_COUNT (_LATCH_SPIN_COUNT in Oracle7)
 

SPIN_COUNT controls how many times the process will re-try to obtain   the latch before backing off and going to sleep. This basically means the process is in a tight CPU loop continually trying to get   the latch for SPIN_COUNT attempts. On a single CPU system if an Oracle process tries to acquire a latch but it is held by someone else the process will release the CPU and go to sleep for a short period before trying again. However, on a multi processor system (SMP) it is possible that the process holding the latch is running on one of the other CPUs and so will potentially release the latch in the next few instructions  (latches are usually held for only very short periods of time).

Performance can be adjusted by changing the value of SPIN_COUNT. If a high value is used, the latch will be attained sooner than if  you use a low value. However, you may use more CPU time spinning to get the latch if you use a high value for SPIN_COUNT. You can decrease this probability of session sleeps by increasing the value of the configuration parameters _LATCH_SPIN_COUNT or SPIN_COUNT.  This parameter controls the number of attempts the session will make to obtain the latch before sleeping. Spinning on the latch consumes CPU, so if you increase this parameter,  you may see an increase in your systems overall CPU utilization. If your computer is near 100% CPU and your application is throughput rather than response time driven, you could consider decreasing SPIN_COUNT in order to conserve CPU. Adjusting SPIN_COUNT is trial and error.  In general, only increase SPIN_COUNT if there are enough free CPU resources available on the system, and decrease it only if there is no spare CPU capacity.
To summarize latch sleeps and spin count, if you encounter latch contention and have spare CPU capacity, consider increasing the value of SPIN_COUNT. If CPU resources are at full capacity, consider decreasing the value of SPIN_COUNT.



Latches vs Enqueues

posted Mar 9, 2011, 6:47 AM by Sachchida Ojha

Enqueues are another type of locking mechanism used in Oracle. An enqueue is a more sophisticated mechanism which permits several concurrent processes to have varying degree of sharing of "known" resources. Any object which can be concurrently used, can be protected with enqueues. A good example is of locks on tables. We allow varying levels of sharing on tables e.g. two processes can lock a table in share mode or in share update mode
etc.

One difference is that the enqueue is obtained using an OS specific locking mechanism. An enqueue allows the user to store a value in the lock, i.e the mode in which we are requesting it. The OS lock manager keeps track of the resources locked. If a process cannot be granted the lock because it is incompatible with the mode requested and the lock is requested
with wait, the OS puts the requesting process on a wait queue which is serviced in FIFO.
Another difference between latches and enqueues is that in latches there is no ordered queue of waiters like in enqueues.

Latch waiters may either use timers to wakeup and retry or spin (only in multiprocessors). Since all waiters are concurrently retrying (depending on the scheduler), anyone might get the latch and conceivably the first one to try might be the last one to get.

Latches and enqueues

posted Mar 9, 2011, 6:45 AM by Sachchida Ojha

Latches and enqueues are both types of locks.

latches are lightweight serialization devices. we try to get a latch, spin for a bit and
try again. so when getting a latch, we try and try and try -- we are not told that the
latch is available, we keep trying to get it (eg: not necessary a first come, first serve
lock). we use latches to serialize access to in memory data structures typically (like
SGA data structures)

enqueues are heavyweight serialization devices. if we cannot get an enqueue, we "go to
sleep" and when the enqueue is available -- we are told about it in a first come, first
serve manner. We use enqueues to perform row level locking for example.


A semaphore is an operating system supplied serialization device that one might use to
implemented latching or enqueues.


So, latches and enqueues are types of locks, semaphores a programming device one might
use to implement latching and enqueuing.

1-6 of 6