11gR2 CRS doesn't startup after node reboot

posted May 25, 2011, 11:36 AM by Sachchida Ojha
11gR2 CRS doesn't startup after node reboot [ID 1050164.1]

Applies to:

Oracle Server - Enterprise Edition - Version: 11.2.0.1.0 to 11.2.0.1.0 - Release: 11.2 to 11.2
Generic Linux

Symptoms

  • Installation of the 11gR2 Grid Infrastructure on a Linux cluster completed successfully
  • OCR & Voting files located in ASM diskgroup
  • using ASMLIB driver
  • ASM disks are located on multipath devices (/dev/mapper/)
  • following a node reboot CRS does not startup
  • CSS daemon log shows the following message:

2010-01-13 09:04:15.075: [ CSSD][1150449984]clssnmvDDiscThread: using discovery string for initial discovery
2010-01-13 09:04:15.075: [ SKGFD][1150449984]Discovery with str::
2010-01-13 09:04:15.075: [ SKGFD][1150449984]UFS discovery with ::
2010-01-13 09:04:15.075: [ SKGFD][1150449984]OSS discovery with ::
2010-01-13 09:04:15.076: [ SKGFD][1150449984]Discovery with asmlib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: str ::
2010-01-13 09:04:15.076: [ SKGFD][1150449984]Fetching asmlib disk :ORCL:DATA1:
2010-01-13 09:04:15.076: [ SKGFD][1150449984]Fetching asmlib disk :ORCL:DATA2:
2010-01-13 09:04:15.076: [ SKGFD][1150449984]Fetching asmlib disk :ORCL:DATA3:
2010-01-13 09:04:15.076: [ SKGFD][1150449984]Fetching asmlib disk :ORCL:DATA4:
2010-01-13 09:04:15.077: [ SKGFD][1150449984]ERROR: -15(asmlib ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so op asm_open error Operation not permitted)
2010-01-13 09:04:15.077: [ SKGFD][1150449984]ERROR: -15(asmlib ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so op asm_open error Operation not permitted)
2010-01-13 09:04:15.077: [ SKGFD][1150449984]ERROR: -15(asmlib ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so op asm_open error Operation not permitted)
2010-01-13 09:04:15.077: [ SKGFD][1150449984]ERROR: -15(asmlib ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so op asm_open error Operation not permitted)
2010-01-13 09:04:15.077: [ CSSD][1150449984]clssnmvDiskVerify: Successful discovery of 0 disks
2010-01-13 09:04:15.077: [ CSSD][1150449984]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2010-01-13 09:04:15.077: [ CSSD][1150449984]clssnmvFindInitialConfigs: No voting files found
2010-01-13 09:04:15.077: [ CSSD][1150449984]###################################
2010-01-13 09:04:15.077: [ CSSD][1150449984]clssscExit: CSSD signal 11 in thread clssnmvDDiscThread
2010-01-13 09:04:15.077: [ CSSD][1150449984]###################################
2010-01-13 09:04:15.077: [ CSSD][1139960128]clssgmClientShutdown: total iocapables 0
2010-01-13 09:04:15.077: [ CSSD][1139960128]clssgmClientShutdown: graceful shutdown completed.
2010-01-13 09:04:15.077: [ CSSD][1150449984]

  • running the cluster verification utility returns the following messages:

/cluvfy stage -post crsinst -n racnode1

Performing post-checks for cluster services setup
Checking node reachability...
Node reachability check passed from node "racnode1"
Checking user equivalence...
User equivalence check passed for user "grid"
Checking time zone consistency...
Time zone consistency check passed.
ERROR:
Cluster manager integrity check failed
PRVF-5434 : Cannot identify the current CRS software version
UDev attributes check for OCR locations started...
UDev attributes check passed for OCR locations
UDev attributes check for Voting Disk locations started...
ERROR:
PRVF-5197 : Failed to retrieve voting disk locations
UDev attributes check failed for Voting Disk locations
Default user file creation mask check passed
Checking cluster integrity...
Cluster integrity check failed This check did not run on the following node(s):
racnode1
Checking OCR integrity...
Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations
ERROR:
PRVF-5300 : Failed to retrieve active version for CRS on this node
OCR integrity check failed
Checking CRS integrity...
ERROR:
PRVF-5300 : Failed to retrieve active version for CRS on this node
CRS integrity check failed
OCR detected on ASM. Running ACFS Integrity checks...
Starting check to see if ASM is running on all cluster nodes...
PRVF-5137 : Failure while checking ASM status on node "racnode1"
Starting Disk Groups check to see if at least one Disk Group configured...
PRVF-5112 : An Exception occurred while checking for Disk Groups
PRVF-5114 : Disk Group check failed. No Disk Groups configured
Task ACFS Integrity check failed
Checking Oracle Cluster Voting Disk configuration...
ERROR:
PRVF-5434 : Cannot identify the current CRS software version
PRVF-5431 : Oracle Cluster Voting Disk configuration check failed
User "grid" is not part of "root" group. Check passed
Post-check for cluster services setup was unsuccessful on all the nodes.

 

Changes

Node was rebooted after install.

Cause


The CSS daemon crashes because it cannot locate any Voting files in any of the discovered ASM disks, which is indicated by the following message in the CSS daemon log (<grid_home>/log/<node_name>/cssd/ocssd.log):

2010-01-13 09:04:15.077: [ CSSD][1150449984]clssnmvFindInitialConfigs: No voting files found



This error is preceded by the following ASMLIB error:

2010-01-13 09:04:15.077: [ SKGFD][1150449984]ERROR: -15(asmlib ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so op asm_open error Operation not permitted)

suggesting that ASMLIB has problem accessing the ASM disk.

Solution

1. either edit the file /etc/sysconfig/oracleasm-_dev_oracleasm    and change the lines:

ORACLEASM_SCANORDER=""
ORACLEASM_SCANEXCLUDE=""

to

ORACLEASM_SCANORDER="dm"
ORACLEASM_SCANEXCLUDE="sd"

or alternatively run the following command (as user root)

/usr/sbin/oracleasm configure -i -e -u user -g group -o "dm" -x "sd"


2. stop & restart ASMLIB as user root using:

/usr/sbin/oracleasm exit
/usr/sbin/oracleasm init


3. restart CRS or reboot node

The above steps need to be executed on all nodes




Comments