How to Identify Which Latch is Associated with a "latch free" wait?

posted Jul 26, 2011, 8:15 AM by Sachchida Ojha

It is important to identify which latch is associated with latch free waits when tuning a database for latch waits. 

For versions prior to 10g, there is an umbrella wait event called latch free that covers all latch waits.  The specific latch or latches involved must be determined from either a 10046 trace (and TKProf) or from a statspack report.  This note will show you how to determine which latch or latches are associated with the latch free event.

In Oracle 10g or later, finding which latches are causing waits is easy because most wait events have been introduced for specific latch waits (e.g., latch: shared pool).  However, some latch waits are still rolled up in the old latch free wait event and you will need to follow the procedure here to obtain more information. 

Solution

TKProf

This technique will help you identify the latch as well as the top SQL statements associated with the latch free event.

  • In the "Overall Totals" section,look for wait events with high elapsed times for "latch:" or "latch free" events (Overall Totals, recursive and non-recursive)

For example, the listing below is from the overall summary for recursive statements (in this case, the application was PL/SQL and non-recursive statements were negligible):

 Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  latch: library cache                          623        0.45         29.48
  latch: shared pool                            494        0.25         10.31
  latch free                                     77        0.33          4.92
  latch: row cache objects                        1        0.01          0.01
  • In the "Overall Totals" section, determine which call type is associated with the highest elapsed time: parse, execute, or fetch

For example, here you see that parse calls have the highest elapsed time.

OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse     3337     86.97     371.50          0          0          0           0
Execute   3338      0.97       3.06          0          0          0           0
Fetch     3338      0.75       2.67          0       3338          0        3338
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    10013     88.69     377.25          0       3338          0        3338
  • Generate a new TKProf report sorted by the call type found for the highest elapsed times. For example:

    Parse calls:  
    tkprof trace_file_name output_file sort=prsela

Fetch calls:

tkprof trace_file_name output_file sort=fchela
  • Choose a few of the top cursors in this new TKProf report and find them in the original trace file.
  • Oracle 10g+: Examine the waits for the statement and see which "latch:" wait it is
  • When most waits are for latch free rather than a specific latch:
    • Look at the lines with "WAIT #" for the latch free event corresponding to the cursor and find the value of the "P2" field. This field corresponds to the latch number in V$LATCHNAME.
    • Query V$LATCHNAME to find the name of the latch
    • Now, you can get an idea of which latches are causing most of the waits.

For example, here is what we would look for in the trace file:

From the trace file:
 
=====================
PARSING IN CURSOR #1 len=98 dep=1 uid=54 oct=3 lid=54 tim=5351590246329 hv=2697127572 ad='88f5bf60'
SELECT COUNT(*) FROM (SELECT 1949,3898,5847,7796,9745,11694,13643,15592 FROM EMP WHERE rownum = 1)
END OF STMT
PARSE #1:c=10000,e=7527,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,tim=5351590246298
EXEC #1:c=0,e=329,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=5351590247140
FETCH #1:c=0,e=221,p=0,cr=1,cu=0,mis=0,r=1,dep=1,og=1,tim=5351590247568
STAT #1 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=1 pr=0 pw=0 time=263 us)'
STAT #1 id=2 cnt=1 pid=1 pos=1 obj=0 op='VIEW  (cr=1 pr=0 pw=0 time=163 us)'
STAT #1 id=3 cnt=1 pid=2 pos=1 obj=0 op='COUNT STOPKEY (cr=1 pr=0 pw=0 time=148 us)'
STAT #1 id=4 cnt=1 pid=3 pos=1 obj=51152 op='INDEX FULL SCAN PK_EMP (cr=1 pr=0 pw=0 time=111 us)'
WAIT #1: nam='latch free' ela= 176876 address=15232212216 number=202 tries=1 obj#=-1 tim=5351590477313
=====================
PARSING IN CURSOR #1 len=70 dep=1 uid=54 oct=3 lid=54 tim=5351590478048 hv=3616361148 ad='85b72468'
SELECT COUNT(*) FROM (SELECT 1950,3900,5850 FROM EMP WHERE rownum = 1)
END OF STMT
PARSE #1:c=40000,e=229111,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=1,tim=5351590478021
EXEC #1:c=10000,e=572,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=5351590479029

In SQLPlus, find out which latch corresponds to the latch free waits:

SQL> select latch#, name from v$latchname where latch# = 202;
 
    LATCH# NAME
---------- -----------
       202 kks stats


AWR or statspack report

  • 10g or higher; most latch waits will include which latch is causing the wait. E.g., latch: library cache
  • 9.0.x - 9.2.x, the wait will be latch free. You'll have to visit the Latch Activities section and find the latches with the highest wait times.

For example, here is what we would look for in the AWR or Statspack regarding the Top Timed Events:

Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
CPU time                                            571          23.8
latch: library cache                  3,894         195     50    8.1 Concurrenc
latch: shared pool                    2,439          45     18    1.9 Concurrenc
latch free                              448          31     69    1.3      Other
control file parallel write             706           7     10    0.3 System I/O
          -------------------------------------------------------------

Then, we would examine the Latch Activities section of the report:

Latch Activity                     DB/Inst: DB10GR2/DB10gR2  Snaps: 5162-5163
-> "Get Requests", "Pct Get Miss" and "Avg Slps/Miss" are statistics for
   willing-to-wait latch get requests
-> "NoWait Requests", "Pct NoWait Miss" are for no-wait latch get requests
-> "Pct Misses" for both should be very close to 0.0
 
                                           Pct    Avg   Wait                 Pct
                                    Get    Get   Slps   Time       NoWait NoWait
Latch Name                     Requests   Miss  /Miss    (s)     Requests   Miss
------------------------ -------------- ------ ------ ------ ------------ ------
...
job_queue_processes para             37    0.0    N/A      0            0    N/A
kks stats                       106,603    0.4    1.0     31            0    N/A
ksuosstats global area              144    0.0    N/A      0            0    N/A
...
library cache                   645,359    0.5    1.3    195        9,728    0.1
...
shared pool                     646,234    0.3    1.1     45            0    N/A
...

In the above report sample, one can see that the time spent waiting for the library cache and shared pool latches are easily accounted for (195 and 45 seconds) and match the wait events for those latches.  However, the kks stats latch is not accounted in its own wait event but instead its time is rolled into the latch free wait event (31 seconds).

  • Prior to 9.0.x, the wait will be latch free. You'll have to visit the Latch Sleep breakdown section and find the latches with the highest sleeps.

For example, assuming latch free waits are significant, we'll look at the Latch Sleep section of the report (this is from an 8.1.7 statspack):

Latch Sleep breakdown for DB: P013  Instance: P013  Snaps: 6 -11
-> ordered by misses desc
 
                                Get                              
Latch Name                    Requests         Misses      Sleeps
-------------------------- -------------- ----------- -----------
cache buffer handles           95,036,786  21,350,918     364,796
cache buffers chains          231,148,059  10,683,933   4,136,206
shared pool                     6,296,880     502,802     166,198
row cache objects               5,752,233      40,837       8,470
library cache                   6,138,041      40,031      30,015

In this example, the cache buffers chains latch has the highest sleeps and is responsible for most of the latch wait time.  Get Requests and Misses is not very reliable for finding the latch causing most of the wait time - use sleeps since it has a more direct correlation to the time spent waiting.

Comments