RAC - Oracle Real Application Clusters

What is Voting Disk?

posted Aug 6, 2011, 4:48 PM by Sachchida Ojha

The voting disk is a file that sits in the shared storage area and must be accessible by all nodes in the cluster. All nodes in the cluster register their heartbeat information in the voting disk to confirm that they are operational. If a node's heartbeat information is missing from the voting disk, that node is evicted from the cluster. The CSS (Cluster Synchronization Services) daemon in the clusterware maintains the heartbeat of all nodes to the voting disk. When a node is unable to send its heartbeat to the voting disk, it reboots itself, which helps avoid split-brain syndrome.

For high availability, Oracle recommends that you have a minimum of three voting disks and always an odd number (3, 5, and so on).

According to Oracle – “An absolute majority of voting disks configured (more than half) must be available and responsive at all times for Oracle Clusterware to operate.”

This means that to survive the loss of N voting disks, you must configure at least 2N+1 voting disks.

For example, if you have 5 voting disks configured for your 2-node environment, the cluster can survive the loss of 2 voting disks.

Keep in mind that having multiple voting disks makes sense only if you keep them on different disks/volumes/SAN arrays, so that your cluster can survive the loss of one disk/volume/array. There is no point in configuring multiple voting disks on a single disk/LUN/array.

There is a special scenario in which all the nodes in the cluster can see all of the voting disks, but the cluster interconnect between the nodes has failed. To avoid split-brain syndrome in this scenario, a node eviction must happen. But which node gets evicted?

According to Oracle – "The node with the lower node number will survive the eviction (the first node to join the cluster)." In other words, the very first node that joined the cluster survives.

Operations

1.) Obtaining voting disk information –

$ crsctl query css votedisk
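The output lists each configured voting disk and its location; the exact format differs between releases. A purely illustrative example with three voting disks (the paths are hypothetical):

 0.     0    /dev/raw/raw1
 1.     0    /dev/raw/raw2
 2.     0    /dev/raw/raw3
Located 3 voting disk(s).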

2.) Adding Voting Disks

First shut down Oracle Clusterware on all nodes, then use the following commands as the root user.
# crsctl add css votedisk [path of voting disk]

3.) Removing a voting disk:

First shut down Oracle Clusterware on all nodes, then use the following commands as the root user.
# crsctl delete css votedisk [path of voting disk]

Do not use the -force option to add or remove a voting disk while the Oracle Clusterware stack is active; it can corrupt the cluster configuration. You can use it only when the cluster is down, in which case either command can modify the voting disk configuration without interacting with active Oracle Clusterware daemons.
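As an end-to-end sketch of adding a voting disk (the path below is hypothetical, and exact crsctl syntax varies slightly by release):

# crsctl stop crs                           (run on every node as root)
# crsctl add css votedisk /u02/vote/vdsk3   (new voting disk on shared storage)
# crsctl start crs                          (run on every node as root)
$ crsctl query css votedisk                 (verify the new voting disk is listed)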

4.) Backing up Voting Disks

Perform a backup whenever the configuration changes, for example after adding/deleting nodes or adding/deleting voting disks.

$ dd if=current_voting_disk of=backup_file_name

If your voting disk is stored on a raw device, specify the device name -

$ dd if=/dev/sdd1 of=/tmp/vd1_.dmp

5.) Recovering Voting Disks

A bad voting disk can be recovered from a backup copy.

$ dd if=backup_file_name of=current_voting_disk
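For example, to restore the raw-device backup taken above (the device and file names follow the earlier example and are illustrative), with Oracle Clusterware shut down on all nodes:

$ dd if=/tmp/vd1_.dmp of=/dev/sdd1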


What does RAC do in case a node becomes inactive?

posted Aug 6, 2011, 4:45 PM by Sachchida Ojha

In RAC, if a node becomes inactive, or if the other nodes are unable to ping/connect to a node in the cluster, then the node that first detects that one of the nodes is not accessible will evict that node from the RAC group. For example, if there are 4 nodes in a RAC cluster and node 3 becomes unavailable, and node 1 tries to connect to node 3 and finds it not responding, then node 1 will evict node 3 from the RAC group, leaving only node 1, node 2, and node 4 in the group to continue functioning.

The split-brain scenario becomes more complicated in larger RAC setups. For example, suppose there are 10 RAC nodes in a cluster and 4 nodes are unable to communicate with the other 6, so two groups form in this 10-node cluster (one group of 4 nodes and another of 6 nodes). The nodes quickly try to affirm their membership by locking the control file; the node that locks the control file then checks the votes of the other nodes. The group with the larger number of active nodes gets preference and the others are evicted.

OCR and Voting Disks

posted Aug 6, 2011, 4:33 PM by Sachchida Ojha

OCR:

Oracle Cluster Registry (OCR)—Maintains cluster configuration information as well as configuration information about any cluster database within the cluster. The OCR also manages information about processes that Oracle Clusterware controls. The OCR stores configuration information in a series of key-value pairs within a directory tree structure. The OCR must reside on shared disk that is accessible by all of the nodes in your cluster. The Oracle Clusterware can multiplex the OCR and Oracle recommends that you use this feature to ensure cluster high availability. You can replace a failed OCR online, and you can update the OCR through supported APIs such as Enterprise Manager, the Server Control Utility (SRVCTL), or the Database Configuration Assistant (DBCA).

  1. The OCR is the cluster registry and holds information about the resources that are part of the cluster.
  2. An OCR backup is necessary after any node/service/resource is added/altered/deleted from the cluster.
  3. If you run "ocrdump -stdout" you will see its contents: permissions, service definitions, and so on. That is why you should back up the OCR every time something changes in your cluster configuration.

To get information about the OCR, use the following command from the CRS_HOME/bin path:
$ ocrdump /tmp/a

Then check the /tmp/a file.

Or simply check the OCR:
$ ocrcheck

If you need information about resources, use "crs_stat" from the CRS_HOME/bin path:

$ crs_stat
$ crs_stat -t
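To go with point 2 above (backing up the OCR after configuration changes), the following ocrconfig options are commonly used; treat this as a sketch, since the available options differ between releases, and run these commands as root:

# ocrconfig -showbackup                   (list the automatic OCR backups taken by Oracle Clusterware)
# ocrconfig -export /tmp/ocr_backup.dmp   (logical export of the OCR to a file)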


Voting disk:

Voting Disk—Manages cluster membership by way of a health check and arbitrates cluster ownership among the instances in case of network failures. Oracle RAC uses the voting disk to determine which instances are members of a cluster. The voting disk must reside on shared disk. For high availability, Oracle recommends that you have multiple voting disks. The Oracle Clusterware enables multiple voting disks but you must have an odd number of voting disks, such as three, five, and so on. If you define a single voting disk, then you should use external mirroring to provide redundancy.

If you had an even number of voting disks (say 2) and a 2-node cluster, what happens if one voting disk has a vote for node 1 and the other for node 2? Among other things, voting disks can lead to a RAC node getting evicted (thrown out) of the cluster.

The voting disk keeps track of the resources that are available and active, and is polled dynamically while Cluster Services are running. Voting disks contain cluster node information and are used by the clusterware to act as a tiebreaker during communication failures. In a split-brain situation, the voting disks are used to decide which part of the cluster should be evicted. That is why you only need to back up the voting disks when you add or remove nodes.

To check votes, use the command line in the CRS_HOME/bin path:

$ olsnodes -n -v
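A purely illustrative example of what olsnodes -n might print (the node names are hypothetical); the node number in the second column is the same number that decides which node survives the interconnect-failure scenario described earlier:

$ olsnodes -n
racnode1        1
racnode2        2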

To check the voting disk configuration:
$ crsctl query css votedisk


New features in Oracle Clusterware for Oracle Database 11g release 2 (11.2) and 11g release 2 (11.2.0.1)

posted Aug 6, 2011, 4:31 PM by Sachchida Ojha

Oracle Database 11g Release 2 (11.2) New Features in Oracle Clusterware

This section describes administration and deployment features for Oracle Clusterware starting with Oracle Database 11g release 2 (11.2).

See Also:

Oracle Database New Features Guide for a complete description of the features in Oracle Database 11g release 2 (11.2)
  • Oracle Real Application Clusters One Node (Oracle RAC One Node)

    Oracle Real Application Clusters One Node (Oracle RAC One Node) provides enhanced high availability for single-instance databases, protecting them from both planned and unplanned downtime. Oracle RAC One Node provides the following:

    • Always-on single-instance database services

    • Better consolidation for database servers

    • Enhanced server virtualization

    • Lower cost development and test platform for full Oracle RAC

    In addition, Oracle RAC One Node facilitates the consolidation of database storage, standardizes your database environment, and, when necessary, enables you to upgrade to a full, multinode Oracle RAC database without downtime or disruption.

    Use online database relocation to migrate an Oracle RAC One Node database from one node to another while maintaining service availability.

    This feature includes enhancements to the Server Control Utility (SRVCTL) for both Oracle RAC One Node and online database relocation.

    See Also:

    Oracle Real Application Clusters Administration and Deployment Guide for more information about Oracle RAC One Node
  • Configuration Wizard for the Oracle Grid Infrastructure Software

    This Configuration Wizard enables you to configure the Oracle Grid Infrastructure software after performing a software-only installation. You no longer have to manually edit the config_params configuration file as this wizard takes you through the process, step by step.

    See Also:

    "Configuring Oracle Grid Infrastructure" for more information
  • Cluster Health Monitor (CHM)

    The Cluster Health Monitor (CHM) gathers operating system metrics in real time and stores them in its repository for later analysis to determine the root cause of many Oracle Clusterware and Oracle RAC issues with the assistance of Oracle Support. It also works together with Oracle Database Quality of Service Management (Oracle Database QoS Management) by providing metrics to detect memory over-commitment on a node. With this information, Oracle Database QoS Management can take action to relieve the stress and preserve existing workloads.

  • Enhancements to SRVCTL for Grid Infrastructure Management

    Enhancements to the Server Control utility (SRVCTL) simplify the management of various new Oracle Grid Infrastructure and Oracle RAC resources.

  • Redundant Interconnect Usage

    In previous releases, to make use of redundant networks for the interconnect, bonding, trunking, teaming, or similar technology was required. Oracle Grid Infrastructure and Oracle RAC can now make use of redundant network interconnects, without the use of other network technology, to enhance optimal communication in the cluster.

    Redundant Interconnect Usage enables load-balancing and high availability across multiple (up to four) private networks (also known as interconnects).

    See Also:

    "Redundant Interconnect Usage" for more information
  • Oracle Database Quality of Service Management Server

    The Oracle Database Quality of Service Management server allows system administrators to manage application service levels hosted in Oracle Database clusters by correlating accurate run-time performance and resource metrics and analyzing with an expert system to produce recommended resource adjustments to meet policy-based performance objectives.

Oracle Database 11g Release 2 (11.2.0.1) New Features in Oracle Clusterware

This section describes administration and deployment features for Oracle Clusterware starting with Oracle Database 11g Release 2 (11.2.0.1).

  • Oracle Restart

    Oracle Restart provides automatic restart of Oracle Database and listeners.

    For standalone servers, Oracle Restart monitors and automatically restarts Oracle processes, such as Oracle Automatic Storage Management (Oracle ASM), Oracle Database, and listeners, on the server. Oracle Restart and Oracle ASM provide the Grid Infrastructure for a standalone server.

    See Also:

    Oracle Database Administrator's Guide for more information about Oracle Restart
  • Improved Oracle Clusterware resource modeling

    Oracle Clusterware can manage different types of applications and processes, including third-party applications. You can create dependencies among the applications and processes and manage them as one entity.

    Oracle Clusterware uses different entities to manage your applications and processes, including resources, resource types, servers, and server pools. In addition to revised application programming interfaces (APIs), Oracle has created a new set of APIs to manage these entities.

  • Policy-based cluster and capacity management

    Server capacity management is improved through logical separation of a cluster into server pools. You can determine where and how resources run in the cluster using a cardinality-based approach. Subsequently, nodes become anonymous, eliminating the need to identify the nodes when placing resources on them.

    Server pools are assigned various levels of importance. When a failure occurs, Oracle Clusterware efficiently reallocates and reassigns capacity for applications to another, less important server pool within the cluster based on user-defined policies. This feature enables faster resource failover and dynamic capacity assignment.

    Clusters can host resources (defined as applications and databases) in server pools, which are isolated with respect to their resource consumption by the user-defined policies. For example, you can choose to run all human resources applications, accounting applications, and email applications in separate server pools.

    See Also:

    "Policy-Based Cluster and Capacity Management" for more information
  • Role-separated management

    Role-separated management enables multiple applications and databases to share the same cluster and hardware resources, but ensures that different administration groups do not interfere with each other.

    See Also:

    "Role-Separated Management" for more information
  • Cluster time synchronization service

    Cluster time synchronization service synchronizes the system time on all nodes in a cluster when vendor time synchronization software (such as NTP on UNIX and the Windows Time Service) is not installed. Synchronized system time across the cluster is a prerequisite to successfully run an Oracle cluster, improving the reliability of the entire Oracle cluster environment.

    See Also:

    "Cluster Time Management" for more information
  • Oracle Cluster Registry and voting disks can be stored using Oracle Automatic Storage Management

    OCR and voting disks can be stored in Oracle Automatic Storage Management (Oracle ASM). The Oracle ASM partnership and status table (PST) is replicated on multiple disks and is extended to store OCR. Consequently, OCR can tolerate the loss of the same number of disks as are in the underlying disk group and be relocated in response to disk failures.

    Oracle ASM reserves several blocks at a fixed location on every Oracle ASM disk for storing the voting disk. Should the disk holding the voting disk fail, Oracle ASM selects another disk on which to store this data.

    Storing OCR and the voting disk on Oracle ASM eliminates the need for third-party cluster volume managers and eliminates the complexity of managing disk partitions for OCR and voting disks in Oracle Clusterware installations.

    Note:

    The dd commands used to back up and recover voting disks in previous versions of Oracle Clusterware are not supported in Oracle Clusterware 11g release 2 (11.2).

    See Also:

    Chapter 2, "Administering Oracle Clusterware" for more information about OCR and voting disks
  • Oracle Automatic Storage Management Cluster File System

    The Oracle Automatic Storage Management Cluster File System (Oracle ACFS) extends Oracle ASM by providing a robust, general purpose extent-based and journaling file system for files other than Oracle database files. Oracle ACFS provides support for files such as Oracle binaries, report files, trace files, alert logs, and other application data files. With the addition of Oracle ACFS, Oracle ASM becomes a complete storage management solution for both Oracle database and non-database files.

    Additionally, Oracle ACFS

    • Supports large files with 64-bit file and file system data structure sizes leading to exabyte-capable file and file system capacities.

    • Uses extent-based storage allocation for improved performance.

    • Uses a log-based metadata transaction engine for file system integrity and fast recovery.

    • Can be exported to remote clients through industry standard protocols such as Network File System and Common Internet File System.

    Oracle ACFS eliminates the need for third-party cluster file system solutions, while streamlining, automating, and simplifying all file type management in both a single node and Oracle Real Application Clusters (Oracle RAC) and Grid computing environments.

    Oracle ACFS supports dynamic file system expansion and contraction without downtime. It is also highly available, leveraging the Oracle ASM mirroring and striping features in addition to hardware RAID functionality.

    See Also:

    Oracle Automatic Storage Management Administrator's Guide for more information about Oracle ACFS
  • Oracle Clusterware out-of-place upgrade

    You can install a new version of Oracle Clusterware into a separate home. Installing Oracle Clusterware in a separate home before the upgrade reduces planned outage time required for cluster upgrades, which assists in meeting availability service level agreements. After the Oracle Clusterware software is installed, you can then upgrade the cluster by stopping the previous version of the Oracle Clusterware software and starting the new version node by node (known as a rolling upgrade).

    See Also:

    Oracle Grid Infrastructure Installation Guide for more information about out-of-place upgrades
  • Enhanced Cluster Verification Utility

    Enhancements to the Cluster Verification Utility (CVU) include the following checks on the cluster:

    • Before and after node addition

    • After node deletion

    • Before and after storage addition

    • Before and after storage deletion

    • After network modification

    • Oracle ASM integrity

    In addition to command-line commands, these checks are done through the Oracle Universal Installer, Database Configuration Assistant, and Oracle Enterprise Manager. These enhancements facilitate implementation and configuration of cluster environments and provide assistance in diagnosing problems in a cluster environment, improving configuration and installation.

  • Enhanced Integration of Cluster Verification Utility and Oracle Universal Installer

    This feature fully integrates the CVU with Oracle Universal Installer so that multi-node checks are done automatically. This ensures that any problems with cluster setup are detected and corrected before installing Oracle software.

    The CVU validates cluster components and verifies the cluster readiness at different stages of Oracle RAC deployment, such as installation of Oracle Clusterware and Oracle RAC databases, and configuration of Oracle RAC databases. It also helps validate the successful completion of a specific stage of Oracle RAC deployment.

    See Also:

    Oracle Grid Infrastructure Installation Guide for more information about CVU checks done during installation
  • Grid Plug and Play

    Grid Plug and Play enables you to move your data center toward a dynamic Grid Infrastructure. This enables you to consolidate applications and lower the costs of managing applications, while providing a highly available environment that can easily scale when the workload requires. There are many modifications in Oracle RAC 11g release 2 (11.2) to support the easy addition of servers in a cluster and therefore a more dynamic grid.

    In the past, adding or removing servers in a cluster required extensive manual preparation. With this release, Grid Plug and Play reduces the costs of installing, configuring, and managing server nodes by automating the following tasks:

    • Adding an Oracle RAC database instance

    • Negotiating appropriate network identities for itself

    • Acquiring additional information it needs to operate from a configuration profile

    • Configuring or reconfiguring itself using profile data, making host names and addresses resolvable on the network

    Additionally, the number of steps necessary to add and remove nodes is reduced.

    Oracle Enterprise Manager immediately reflects Grid Plug and Play-enabled changes.

  • Oracle Enterprise Manager support for Oracle ACFS

    This feature provides a comprehensive management solution that extends Oracle ASM technology to support general purpose files not directly supported by ASM, and in both single-instance Oracle Database and Oracle Clusterware configurations. It also enhances existing Oracle Enterprise Manager support for Oracle ASM, and adds new features to support the Oracle ASM Dynamic Volume Manager (ADVM) and Oracle ASM Cluster File System technology (ACFS).

    Oracle Automatic Storage Management Cluster File System (Oracle ACFS) is a scalable file system and storage management design that extends Oracle ASM technology. It supports all application data in both single host and cluster configurations and leverages existing Oracle ASM functionality to achieve the following:

    • Dynamic file system resizing

    • Maximized performance through Oracle ASM's automatic distribution

    • Balancing and striping of the file system across all available disks

    • Storage reliability through Oracle ASM's mirroring and parity protection

    Oracle ACFS provides a multiplatform storage management solution to access clusterwide, non-database customer files.

    See Also:

    Oracle Automatic Storage Management Administrator's Guide for more information about Oracle ACFS
  • Oracle Enterprise Manager-based Oracle Clusterware resource management

    You can use Oracle Enterprise Manager to manage Oracle Clusterware resources. You can create and configure resources in Oracle Clusterware and also monitor and manage resources after they are deployed in the cluster.

  • Zero downtime for patching Oracle Clusterware

    Patching Oracle Clusterware and Oracle RAC can be completed without taking the entire cluster down. This also allows for out-of-place upgrades to the cluster software and Oracle Database, reducing the planned maintenance downtime required in an Oracle RAC environment.

  • Improvements to provisioning of Oracle Clusterware and Oracle RAC

    This feature offers a simplified solution for provisioning Oracle RAC systems. Oracle Enterprise Manager Database Control enables you to extend Oracle RAC clusters by automating the provisioning tasks on the new nodes.

Why do we have a Virtual IP (VIP) in Oracle RAC?

posted Aug 6, 2011, 4:24 PM by Sachchida Ojha

Without using VIPs or FAN, clients connected to a node that died will often wait for a TCP timeout period (which can be up to 10 min) before getting an error. As a result, you don't really have a good HA solution without using VIPs.
When a node fails, the VIP associated with it automatically fails over to another node, and the new node re-ARPs the world, advertising a new MAC address for the IP. Subsequent packets sent to the VIP go to the new node, which sends error RST packets back to the clients. This results in the clients getting errors immediately.
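To see the node applications (including the VIP) for a node, you can use srvctl or crs_stat; the node name below is hypothetical:

$ srvctl status nodeapps -n racnode1
$ crs_stat -t | grep vip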

How does one stop and start RAC instances?

posted Aug 6, 2011, 4:14 PM by Sachchida Ojha   [ updated Aug 6, 2011, 4:15 PM ]

You can use the srvctl utility to start, stop, and check the status of instances and listeners across the cluster from a single node. Here are some examples:
$ srvctl status database -d RACDB
$ srvctl start database -d RACDB
$ srvctl start instance -d RACDB -i RACDB1
$ srvctl start instance -d RACDB -i RACDB2
$ srvctl stop database -d RACDB
$ srvctl start asm -n node2
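A few more illustrative srvctl calls for the same hypothetical RACDB database and nodes (option details vary a little between releases):

$ srvctl stop instance -d RACDB -i RACDB2
$ srvctl status instance -d RACDB -i RACDB1
$ srvctl status nodeapps -n node2
$ srvctl stop asm -n node2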

Procwatcher: Script to Monitor and Examine Oracle DB and Clusterware Processes [ID 459694.1]

posted Jul 26, 2011, 8:37 AM by Sachchida Ojha

Procwatcher is a tool to examine and monitor Oracle database and clusterware processes at an interval.  The tool will collect stack traces of these processes using Oracle tools like oradebug short_stack and/or OS debuggers like pstack, gdb, dbx, or ladebug and collect SQL data if specified.

If there are any problems with the prw.sh script or if you have suggestions, please post a comment on this document with details.

Scope and Application

This tool is for Oracle representatives and DBAs looking to troubleshoot a problem further by monitoring processes.  This tool should be used in conjunction with other tools or troubleshooting methods depending on the situation. 

Procwatcher: Script to Monitor and Examine Oracle DB and Clusterware Processes

# This script will find clusterware and/or Oracle Background processes and collect
# stack traces for debugging. It will write a file called procname_pid_date_hour.out
# for each process. If you are debugging clusterware then run this script as root.
# If you are only debugging Oracle background processes then you can run as
# root or oracle.

To install the script, simply download it, put it in its own directory, unzip it, and give it execute permissions.
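A minimal install sketch, assuming the download is named prw.zip and using a hypothetical directory of /home/oracle/procwatcher:

$ mkdir /home/oracle/procwatcher
$ cd /home/oracle/procwatcher
$ mv /tmp/prw.zip . && unzip prw.zip
$ chmod +x prw.sh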

Requirements

  • Must have /bin and /usr/bin in your $PATH
  • Have your instance_name or db_name set in the oratab and/or set the $ORACLE_HOME env variable (PRW searches the oratab for each SID it finds; if it can't find the SID in the oratab it will default to $ORACLE_HOME).  Procwatcher cannot function properly if it cannot find an $ORACLE_HOME to use.
  • Run Procwatcher as the oracle software owner if you are only troubleshooting homes/instances for that user.  If you are troubleshooting clusterware processes (EXAMINE_CLUSTER=true) or troubleshooting for multiple oracle users, run as root.
  • If you are monitoring the clusterware you must have the relevant OS debugger installed on your platform.  PRW looks for:

Linux - /usr/bin/gdb
HP-UX and HP Itanium - /opt/langtools/bin/gdb64 or /usr/ccs/bin/gdb64
Sun - /usr/bin/pstack
IBM AIX - /bin/procstack or /bin/dbx
HP Tru64 - /bin/ladebug

It will use pstack on any platform other than Linux where it is available (on Linux, pstack is just a wrapper script for gdb anyway).

Procwatcher Features

  • Procwatcher collects stack traces for all processes defined using either oradebug short_stack or an OS debugger at a predefined interval.
  • If USE_SQL is set to true, PRW will generate session wait, lock, and latch reports (look for pw_* reports in the PRW_DB_<SID> subdirectory).
  • If USE_SQL is set to true, PRW will look for wait events, lock, and latch contention and also dump stack traces of processes that are either waiting for non-idle wait events or waiting for or holding a lock or latch.
  • If USE_SQL is set to true, PRW will dump session wait, lock, latch, current SQL, process memory, and session history information into specific process files (look for prw_* files in the PRW_DB_<SID> subdirectory).
  • You can define how aggressive PRW is about getting information by setting parameters like THROTTLE, IDLECPU, and INTERVAL.  You can tune these parameters to either get the most information possible or to reduce PRW's cpu impact.  See below for more information about what each of these parameters does.
  • If CPU usage gets too high on the machine (as defined by IDLECPU), PRW will sleep and wait for CPU utilization to go down.
  • Procwatcher gets stack traces of ALL threads of a process (this is important for clusterware processes).
  • The housekeeper process runs on a 5 minute loop and cleans up files older than the specified number of days (default is 7).
  • If USE_SQL is set to true and any SQL times out after 90 seconds (by default), it will be disabled. At a later time the SQL can be re-tested; if it times out 3 times it will be disabled for the life of Procwatcher. Any GV$ view that times out automatically reverts to the corresponding V$ view. Note that the GV$ view timeout is much lower; the logic is that it's not worth using GV$ views if they aren't fast. If oradebug short_stack is enabled and it times out or fails, the housekeeper process will re-enable short_stack once the test passes again.

Procwatcher is Ideal for...

  • Session level hangs or severe contention in the database/instance.
  • Severe performance issues.
  • Instance evictions and/or DRM timeouts.
  • Clusterware or DB processes stuck or consuming high CPU (must set EXAMINE_CLUSTER=true and run as root for clusterware processes)
  • ORA-4031 and SGA memory management issues.  (Set USE_SQL=true and sgastat=y, which are the defaults; also set heapdetails=y, which is not the default.)
  • ORA-4030 and DB process memory issues.  (Set USE_SQL=true and process_memory=y).
  • RMAN slowness/contention during a backup.  (Set USE_SQL=true and rmanclient=y). 

Procwatcher is Not Ideal for...

  • Node evictions/reboots.  In order to troubleshoot these you would have to enable Procwatcher for a process (or processes) capable of rebooting the machine. If the OS debugger suspends the process for too long, *that* could itself cause a reboot of the machine. Only use Procwatcher for a node eviction/reboot if the problem reproduces on a test system and you don't care if the node gets rebooted. Even in that case the INTERVAL would need to be set low (30) and many options would have to be turned off to get the cycle time low enough (EXAMINE_BG=false, USE_SQL=false, and probably removing additional processes from the CLUSTERPROCS list).
  • Non-severe database performance issues.  AWR/ADDM/statspack are better options for this...
  • Most installation or upgrade issues.  We aren't getting data for this unless we are at a stage of the installation/upgrade where key processes are already started. 

Procwatcher User Commands

To start Procwatcher:

./prw.sh start

If Procwatcher is registered with the clusterware:
For instructions on registering Procwatcher with the clusterware, see the "Registering Procwatcher with the Oracle Clusterware (Optional)" section below.

cd <CLUSTER_HOME>/bin
11.2: ./crsctl start res procwatcher
10.x or 11.1: ./crs_start procwatcher 


To stop Procwatcher:

./prw.sh stop

If Procwatcher is registered with the clusterware:

cd <CLUSTER_HOME>/bin
11.2: ./crsctl stop res procwatcher
10.x or 11.1: ./crs_stop -f procwatcher  (may need to run twice)


To check the status of Procwatcher:

./prw.sh stat

If Procwatcher is registered with the clusterware:

cd <CLUSTER_HOME>/bin
11.2: ./crsctl stat res procwatcher
10.x or 11.1: ./crs_stat procwatcher


To package up Procwatcher files to upload to support:

./prw.sh pack
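Putting the user commands together, a typical local (non-clusterware-registered) session might look like this; prw.log is the runtime log shown in the sample directory structure below:

$ ./prw.sh start
$ ./prw.sh stat
$ tail -f prw.log      (watch runtime messages; Ctrl-C to stop watching)
$ ./prw.sh stop
$ ./prw.sh pack        (bundle the results for upload to support)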

Sample directory structure:

[root@racnode2 procwatcher]# ls
prw.log prwOLD1.log PRW_CLUSTER PRW_DB_rac2 prw.sh PRW_SYS

Note that all runtime data goes to prw.log and it creates a directory for the clusterware  (PRW_CLUSTER) and each DB instance that it finds (PRW_DB_$SID).  The PRW_SYS directory contains files that prw uses at runtime (don't touch). 

Sample log output:

################################################################################
Mon Mar 1 15:10:11 EST 2010: Procwatcher Version 030110 starting on Linux
################################################################################
Mon Mar 1 15:10:12 EST 2010: Procwatcher running as user oracle
Mon Mar 1 15:10:12 EST 2010: Debugging for SIDs: ASM1 RAC1
Mon Mar 1 15:10:12 EST 2010: ### Parameters ###
Mon Mar 1 15:10:12 EST 2010: EXAMINE_CLUSTER=false
Mon Mar 1 15:10:12 EST 2010: EXAMINE_BG=true
Mon Mar 1 15:10:12 EST 2010: USE_SQL=true
Mon Mar 1 15:10:12 EST 2010: INTERVAL=180
Mon Mar 1 15:10:12 EST 2010: THROTTLE=4
Mon Mar 1 15:10:12 EST 2010: IDLECPU=3
Mon Mar 1 15:10:12 EST 2010: SIDLIST=ASM1|RAC1
Mon Mar 1 15:10:12 EST 2010: BGPROCS=_dbw|_smon|_pmon|_lgwr|_lmd|_lms|_lck|_lmon|_ckpt|_arc|_rvwr|_gmon|_lmhb|_rms0
Mon Mar 1 15:10:12 EST 2010: ### End Parameters ###
Mon Mar 1 15:10:12 EST 2010: Using oradebug short_stack to speed up DB stack times...
Mon Mar 1 15:10:12 EST 2010: Going to use gdb for debugging if we can't use short_stack
Mon Mar 1 15:10:12 EST 2010: Collecting SQL Data for SID ASM1
Mon Mar 1 15:10:20 EST 2010: Finished Collecting SQL Data for SID ASM1
Mon Mar 1 15:10:20 EST 2010: Collecting SQL Data for SID RAC1
Mon Mar 1 15:10:28 EST 2010: Finished Collecting SQL Data for SID RAC1
Mon Mar 1 15:10:30 EST 2010: Saving SQL report data for SID ASM1
Mon Mar 1 15:10:30 EST 2010: Saving SQL report data for SID RAC1
Mon Mar 1 15:10:30 EST 2010: Collecting SQL Text Data for SID ASM1
Mon Mar 1 15:10:30 EST 2010: Finished Collecting SQL Text Data for SID ASM1
Mon Mar 1 15:10:31 EST 2010: Collecting SQL Text Data for SID RAC1
Mon Mar 1 15:10:31 EST 2010: Finished Collecting SQL Text Data for SID RAC1
Mon Mar 1 15:10:32 EST 2010: SQL collection complete after 21 seconds
Mon Mar 1 15:10:32 EST 2010: Getting stack for asm_pmon_+ASM1 3987 in PRW_DB_ASM1/prw_asm_pmon_+ASM1_3987_03-01-10.out
Mon Mar 1 15:10:34 EST 2010: Getting stack for asm_lmon_+ASM1 4003 in PRW_DB_ASM1/prw_asm_lmon_+ASM1_4003_03-01-10.out

Sample debug output:

################################################################################
Fri Sep 21 22:15:06 MDT 2007
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY
TIME CMD
0 S oracle 1754 1 0 40 20 ? 33778 ? Jul 18 ?
164:24 asm_lmd0_+ASM1


1754: asm_lmd0_+ASM1
ffffffff7a8ce49c pollsys (ffffffff7fffb9c0, 2, ffffffff7fffb900, 0)
ffffffff7a867f24 poll (ffffffff7fffb9c0, 2, 50, 4c4b400, 50, 1312d0) + 88
ffffffff7da19594 sskgxp_select (ffffffff7fffc9a0, 106744e10, ffffffff7fffc310,
2, 0, 50) + f4
ffffffff7da08fcc skgxpiwait (ffffffff7da1af78, 106744e10, 106745c10, 4f08ca92,
ffffffff7fffc310, fffe) + 82c
ffffffff7da086e4 skgxpwait (0, 106744e10, 4f08ca42, 400000, 50, 400000) + 364
0000000101185f6c ksxpwait (0, 101000, 0, 10652a698, 1000, 106530ac8) + 70c
0000000100ed50c8 ksliwat (0, 2, 8, 38793ad00, 38793ac88, 0) + b88
0000000100ed5690 kslwaitns_timed (8, 1, 33, 0, ffffffff7fffcec8, 0) + 30
0000000101172628 kskthbwt (8, 33, 0, 40, 0, 0) + e8
0000000100ed55d4 kslwait (1f5d7b0b, 0, a, a, 0, 0) + 74 
0000000101184894 ksxprcv (1056db, 106527c18, 8, 1056db618, 106527, 1056db000) +394
0000000101645894 kjctr_rksxp (40, 385fe5af8, 0, ffffffff7fffda18, 14, ffffffff7fffda14) + 1f4
0000000101647464 kjctrcv (ffffffff79c2c2c8, 385fe5af8, 10675bca0, ffffffff7fffe25c, 40, 33) + 164
0000000101633c80 kjcsrmg (ffffffff79c2c2b0, 0, 40, 33, 0, 106531) + 60
0000000101690634 kjmdm (8, 44, a, 8, 106531, 0) + 3274 

Sample SQL Report (if USE_SQL=true):

################################################################################
Procwatcher sessionwait report
################################################################################

Snapshot Taken At: Thu Sep 27 13:36:03 GMT 2007
SID             PROC              STATE      EVENT                                  P1         P2         P3 WAIT_CLASS     SEC
--------------- ----------------- ---------- ------------------------------ ---------- ---------- ---------- ------------ -----
SID H1021       PROC 233474       WAITING    enq: TX - row lock contention  1415053318     524330        611 Application    117
SID H1021       PROC 913492       WAITED SHO SQL*Net message to client      1650815232          1          0 Network          0
Elapsed: 00:00:00.02

Sample SQL Data Dumped to Process Specific Files (if USE_SQL=true):
################################################################################
SQL: Session Wait Report for Process 192546 ora_fg_H1021

Snapshot Taken At: Thu Sep 27 13:37:49 GMT 2007
SID             PROC              STATE      EVENT                                  P1         P2         P3 WAIT_CLASS  SEC
--------------- ----------------- ---------- ------------------------------ ---------- ---------- ---------- ------------ -----
SID H1021       PROC 192546       WAITING    SQL*Net message from client    1650815232          1          0 Idle        228

################################################################################
SQL: Lock Report for Process 192546 ora_fg_H1021

Snapshot Taken At: Thu Sep 27 13:37:58 GMT 2007
SID                  PROC              TY        ID1        ID2      LMODE    REQUEST      BLOCK
-------------------- ----------------- -- ---------- ---------- ---------- ---------- ----------
SID H1021            PROC 192546       TX     524330        611          6          0          1

Procwatcher Parameters

Procwatcher also has some configurable parameters that can be set within the script itself, and the script provides more information on how to set each one. Here is the section of the script where parameters can be set:

CONFIG SETTINGS:

# Set EXAMINE_CLUSTER variable if you want to examine clusterware processes (default is false - or set to true):
EXAMINE_CLUSTER=false 

# Set EXAMINE_BG variable if you want to examine all BG processes (default is true - or set to false):
EXAMINE_BG=true 

# Set USE_SQL variable if you want to use SQL to troubleshoot (default is true - or set to false):
USE_SQL=true

# Set RETENTION variable to the number of days you want to keep historical procwatcher data (default: 7)
RETENTION=7

PERFORMANCE SETTINGS:

# Set INTERVAL to the number of seconds between runs (default 180):
# Probably should not set below 60 if USE_SQL=true and/or EXAMINE_CLUSTER=true
INTERVAL=180 

# Set THROTTLE to the max # of stack trace sessions or SQLs to run at once (default 5 - minimum 2):
THROTTLE=5 

# Set IDLECPU to the percentage of idle cpu remaining before PRW sleeps (default 3 - which means PRW will sleep if the machine is more than 97% busy - check every 5 seconds)
IDLECPU=3 

PROCESS LIST SETTINGS:


# Set SIDLIST to the list of SIDs you want to examine (default is derived - format "SID1|SID2|SID3")
# Default: If root is starting prw, get all sids found running at the time prw was started.
# If another user is starting prw, get all sids found running owned by that user.
SIDLIST= 

# Cluster Process list for examination (separated by "|"):
# Default: "crsd.bin|evmd.bin|evmlogge|racgimon|racge|racgmain|racgons.b|ohasd.b|oraagent|oraroota|gipcd.b|mdnsd.b|gpnpd.b|gnsd.bi|diskmon|octssd.b|ons -d|tnslsnr"
# - The processes oprocd, cssdagent, and cssdmonitor are intentionally left off the list because of high reboot danger.
# - The ocssd.bin process is off the list due to moderate reboot danger. Only add this if your css misscount is the
# - default or higher, your machine is not highly loaded, and you are aware of the tradeoffs.
CLUSTERPROCS="crsd.bin|evmd.bin|evmlogge|racgimon|racge|racgmain|racgons.b|ohasd.b|oraagent|oraroota|gipcd.b|mdnsd.b|gpnpd.b|gnsd.bi|diskmon|octssd.b|ons -d|tnslsnr"

# DB Process list for examination (separated by "|"):
# Default: "_dbw|_smon|_pmon|_lgwr|_lmd|_lms|_lck|_lmon|_ckpt|_arc|_rvwr|_gmon|_lmhb|_rms0"
# - To examine ALL oracle DB and ASM processes on the machine, set BGPROCS="ora|asm" (not typically recommended)
BGPROCS="_dbw|_smon|_pmon|_lgwr|_lmd|_lms|_lck|_lmon|_ckpt|_arc|_rvwr|_gmon|_lmhb|_rms0"

For additional details, see the prw.sh script itself.   

If there are any problems with the prw.sh script or if you have suggestions, please post a comment on this document with details.

Advanced Options

Control the SQL that Procwatcher uses with:

## SQL Control
## Set to 'y' to enable SQL, 'n' to disable
sessionwait=y
lock=y
latchholder=y
sgastat=y
heapdetails=n
gesenqueue=y
waitchains=y
rmanclient=n
process_memory=n
sqltext=y
ash=y

# Set to 'n' to disable gv$ views
# (makes queries a little faster in RAC but can't see other instances in reports)
use_gv=y

Additional advanced options:

# DB Versions enabled, set to 'y' or 'n' (this will override the SIDLIST setting)
VERSION_10_1=y
VERSION_10_2=y
VERSION_11_1=y
VERSION_11_2=y

# Procinterval - only set this to 2 or higher if you want to slow Procwatcher down
# ...but THROTTLE is a better option to speed up/slow down
PROCINTERVAL=

# Should we fall back to an OS debugger if oradebug short_stack fails?
# OS debuggers are less safe per bug 6859515 so default is false (or set to true)
FALL_BACK_TO_OSDEBUGGER=false

# Number of oradebug shortstacks to get on each pass
# Will automatically lower if stacks are taking too long
STACKCOUNT=3

# Point this to a custom .sql file for Procwatcher to capture every cycle.
# Don't use big or long running SQL. The .sql file must be executable.
# Example: CUSTOMSQL1=/home/oracle/test.sql
CUSTOMSQL1=
CUSTOMSQL2=
CUSTOMSQL3=

Registering Procwatcher with the Oracle Clusterware (Optional)

If you want Procwatcher to start when the node/clusterware starts up and if you want it to restart if it is killed, you can register it with the clusterware.  If this isn't important to you, then you can skip this section.  To register with the clusterware there are 2 things to consider before running the commands:

  • Where does Procwatcher live (prw.sh)?
  • What is the most important DB/instance for Procwatcher to monitor?

Once you know this, run the following command if on 11.2+ (run this command as the user you want Procwatcher to run as):

./crsctl add resource procwatcher -type application -attr "ACTION_SCRIPT=<PATH TO prw.sh>,START_DEPENDENCIES=hard(<MOST IMPORTANT DB RESOURCE FOR PRW TO MONITOR>),AUTO_START=always,STOP_TIMEOUT=15"

Example:

./crsctl add resource procwatcher -type application -attr "ACTION_SCRIPT=/home/oracle/prw.sh,START_DEPENDENCIES=hard(ora.rac.db),AUTO_START=always,STOP_TIMEOUT=15"

 

 Note: Clusterware log info in:
<GRID_HOME>/log/<NODENAME>/agent/crsd/application_oracle

If on 10g or 11.1 run the following as root:

./crs_profile -create procwatcher -t application -a <PATH TO prw.sh> -r <MOST IMPORTANT INST RESOURCE FOR PRW TO MONITOR> -o as=always,pt=15

Example:

./crs_profile -create procwatcher -t application -a /home/oracle/prw.sh -r ora.RAC.RAC1.inst -o as=always,pt=15


Then register the resource:

./crs_register procwatcher

If you intend to run procwatcher as a user other than root, change the permissions:

./crs_setperm procwatcher -u user:oracle:r-x
./crs_setperm procwatcher -o oracle 

 

Note: Refer to the crsd.log to get information about procwatcher monitoring via the clusterware. 



Oracle RAC

posted Feb 24, 2011, 7:19 AM by Sachchida Ojha


Failover Cluster

• Detecting failure by monitoring the heartbeat and checking the status of resources
• Reorganizing cluster membership in the cluster manager
• Transferring disk ownership from the primary node to the secondary node
• Mounting the file system on the secondary node
• Starting the DB instance
• Recovering the database and rolling back uncommitted data
• Reestablishing the client connections to the failover node

FAILOVER CLUSTER OFFERINGS

• Veritas Cluster Server
• HP Serviceguard
• Microsoft Cluster Service with Oracle Fail Safe
• Red Hat Linux Advanced Server 2.1
• Sun Cluster Oracle Agent
• Compaq (now HP) Segregated Cluster
• HACMP

Scalable RAC - Real Application Clusters

• Many instances of Oracle running on many nodes
• Multiple instances share a single physical database
• All instances have common data, control, and initialization files
• Each instance has its own redo log files and rollback segments or undo tablespaces, which reside on shared storage
• All instances can simultaneously execute transactions against the single database
• Caches are synchronized using Oracle's global cache management technology (Cache Fusion)

RAC Building Blocks

• Instance and database files
• Shared storage with OCFS, CFS, or raw devices
• Redundant HBA cards per host
• Redundant NIC cards per host, one for the cluster interconnect and one for LAN connectivity
• Local RAID-protected drives for ORACLE_HOMEs (OCFS does not support an ORACLE_HOME install)

CLUSTER INTERCONNECT FUNCTIONS

• Monitoring health, status, and message synchronization
• Transporting Distributed Lock Manager messages
• Accessing remote file systems
• Moving application-specific traffic
• Providing cluster alias routing

INTERCONNECT REQUIREMENTS

• Low latency for short messages
• High speed and sustained data rates for large messages
• Low host CPU utilization
• Flow control, error control, and heartbeat continuity monitoring
• Switched networks that scale well

INTERCONNECT PRODUCTS

• Memory Channel
• SMP Bus
• Myrinet
• Sun SCI
• Gigabit Ethernet
• InfiniBand Interconnect

INTERCONNECT PROTOCOLS

• TCP/IP
• UDP
• VIA
• RDG
• HMP

Failover Cluster Architecture

posted Feb 23, 2011, 6:25 AM by Sachchida Ojha   [ updated Feb 23, 2011, 6:38 AM ]




Active/Passive Clusters – This type comprises two nearly identical infrastructures, logically sitting side by side. One node hosts the database service or application, while the other sits idle, waiting in case the primary system goes down. They share a storage component, and the primary server gracefully turns control of the storage over to the other server or node when it fails. On failure of the primary node, the inactive node becomes the primary and hosts the database or application.

Active/Active Clusters – In this type, one node acts as primary for a database instance and another acts as the secondary node for failover purposes. At the same time, the secondary node acts as primary for another instance, with the primary node acting as its backup/secondary node.


The Active/Passive architecture is the most widely used. Unfortunately, it is usually capital-intensive and expensive, but many administrators prefer it for simplicity and manageability reasons. Active/Active looks attractive and is more cost-effective because the backup server is put to use. However, it can result in performance problems when both database services (or applications) fail over to a single node: as the surviving node picks up the load from the failed node, performance issues may arise.


Oracle Database Service in HA Cluster

The Oracle database is a widely used database system. Large numbers of critical applications and business operations depend on the availability of the database. Most of the cluster products provide agents to support database fail over processes.

The implementation of Oracle Database service with failover in a HA cluster has the following general features.

* A single instance of Oracle runs on one of the nodes in the cluster. The Oracle instance and listener have dependencies on other resources such as file systems, mount points, and IP addresses.

* It has exclusive access to the set of database disk groups on a storage array that is shared among the nodes.

* Optionally, an Active/Active architecture of Oracle databases can be established. One node acts as the primary node to an Oracle instance and another node acts as a secondary node for failover purposes. At the same time, the secondary node acts as primary for another database instance and the primary node acts as the backup/secondary node.

* When the primary node suffers a failure, the Oracle instance is restarted on the surviving or backup node in the cluster.

* The failover process involves moving the IP address, volumes, and file systems containing the Oracle data files. In other words, on the backup node the IP address is configured, the disk group is imported, volumes are started, and file systems are mounted (see the sketch after this list).

* The restart of the database automatically performs crash recovery, returning the database to a transactionally consistent state.
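A rough sketch of the failover steps described above, using generic Linux/LVM commands purely for illustration (real failover clusters drive these steps through their own agents, and the IP address, volume group, mount point, and device names here are all hypothetical):

# ip addr add 192.168.10.50/24 dev eth0    (bring the service IP up on the backup node)
# vgchange -a y oradata_vg                 (import/activate the shared volume group)
# mount /dev/oradata_vg/u01 /u01/oradata   (mount the file system holding the data files)
$ lsnrctl start                            (as oracle: start the listener)
$ sqlplus / as sysdba                      (as oracle: STARTUP; crash recovery runs automatically)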

There are some issues connected with Oracle Database failover that one needs to be aware of:

* On restart of the database, a fresh database cache (SGA) is established, and all of the previous instance's SGA contents are lost, including the parsed images of frequently used packages and statements.

* Once the new instance is created and made available on the backup node, all the client connections seeking the database service attempt to connect at the same time. This can result in a lengthy waiting period.

* The impact of the outage may be felt for an extended duration during the failover process. When there is a failure at the primary node, all the relevant resources such as mount points, the disk group, the listener, and the database instance have to be logically taken offline or shut down. This process may take considerable time depending on the failure situation.

However, when the Oracle database cluster is implemented as a parallel, scalable cluster such as Oracle RAC, there are many advantages, and it provides transparent failover for the clients. The main high availability features include:

* Multiple Instances exist at the same time accessing a single database. Data files are common to the multiple instances.

* Multiple nodes have read/write access to the shared storage at the same time. Data blocks are read and updated by multiple nodes.

* Should a failure occur in a node and the Oracle instance is not usable or has crashed, the surviving node performs recovery for the crashed instance. There is no need to restart the instance on the surviving node since a parallel instance is already running there.

* All the client connections continue to access the database through the surviving node/instance. With the help of the Transparent Application Failover (TAF) facility, clients are able to move over to the surviving instance almost instantaneously (see the tnsnames.ora sketch after this list).

* There is no moving of volumes and file systems to the surviving node.
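A hedged example of a TAF-enabled client alias in tnsnames.ora (the alias, host names, and service name are hypothetical; TYPE/METHOD are the standard TAF settings):

RACDB_TAF =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = racnode1-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = racnode2-vip)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVICE_NAME = racdb)
      (FAILOVER_MODE =
        (TYPE = SELECT)
        (METHOD = BASIC)
        (RETRIES = 20)
        (DELAY = 5)
      )
    )
  )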

Oracle RAC 10g Overview

posted Sep 11, 2010, 6:17 AM by Sachchida Ojha

Oracle RAC, introduced with Oracle9i, is the successor to Oracle Parallel Server (OPS). RAC allows multiple instances to access the same database (storage) simultaneously. It provides fault tolerance, load balancing, and performance benefits by allowing the system to scale out, and at the same time—because all nodes access the same database—the failure of one instance will not cause the loss of access to the database.

At the heart of Oracle RAC is a shared disk subsystem. All nodes in the cluster must be able to access all of the data, redo log files, control files and parameter files for all nodes in the cluster. The data disks must be globally available to allow all nodes to access the database. Each node has its own redo log and control files but the other nodes must be able to access them in order to recover that node in the event of a system failure.

One of the bigger differences between Oracle RAC and OPS is the presence of Cache Fusion technology. In OPS, a request for data between nodes required the data to be written to disk first, and then the requesting node could read that data. With cache fusion, data is passed along a high-speed interconnect using a sophisticated locking algorithm.

Not all clustering solutions use shared storage. Some vendors use an approach known as a federated cluster, in which data is spread across several machines rather than shared by all. With Oracle RAC 10g, however, multiple nodes use the same set of disks for storing data. With Oracle RAC, the data files, redo log files, control files, and archived log files reside on shared storage on raw-disk devices, a NAS, a SAN, ASM, or on a clustered file system. Oracle's approach to clustering leverages the collective processing power of all the nodes in the cluster and at the same time provides failover security.
