[vbox] Simulating OCR corruption and recovery on 11g RAC

In this 11gR2 setup the voting disk and the OCR live in the same disk group, so they are recovered together.

View the existing OCR backups and the voting disk

[grid@rac1 admin]$ ocrconfig -showbackup 

rac1     2019/05/08 12:27:51     /u01/app/11.2.0/grid/cdata/rac-cluster/backup00.ocr

rac1     2019/05/07 15:32:37     /u01/app/11.2.0/grid/cdata/rac-cluster/backup01.ocr

rac1     2019/05/07 15:32:37     /u01/app/11.2.0/grid/cdata/rac-cluster/day.ocr

rac1     2019/05/07 15:32:37     /u01/app/11.2.0/grid/cdata/rac-cluster/week.ocr
PROT-25: Manual backups for the Oracle Cluster Registry are not available
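
When several automatic backups are listed, it can be handy to pick the newest one with a script instead of by eye. The following is only a sketch: it assumes the four-column layout shown above (node, date, time, path), and the function name latest_ocr_backup is made up for this example, it is not part of any Oracle tool.

```shell
# Hypothetical helper: print the path of the newest backup found in
# `ocrconfig -showbackup` output read from stdin.
# Assumes lines of the form: <node> <YYYY/MM/DD> <HH:MM:SS> <path>
# Dates and times are zero-padded, so a plain text sort is chronological.
latest_ocr_backup() {
    awk 'NF == 4 && $2 ~ /^[0-9]+\/[0-9]+\/[0-9]+$/ { print $2, $3, $4 }' |
        sort -r |
        head -n 1 |
        awk '{ print $3 }'
}

# Usage on a cluster node (not run here):
#   ocrconfig -showbackup | latest_ocr_backup
```

Lines that do not have exactly four fields, such as the PROT-25 message, are ignored.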
[grid@rac1 admin]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   88e9adcf20db4f88bfba7ac8848ff68b (ORCL:VDKBACK) [OCR]
Located 1 voting disk(s).

Shut down the database
[grid@rac1 admin]$ srvctl stop database -d racdb -o immediate 
Shut down the cluster

[root@rac1 ~]# crsctl stop cluster -all -f

Check which physical device backs the ASM disk used by the OCR disk group

[grid@rac1 admin]$ oracleasm querydisk -d VDKBACK
Disk "VDKBACK" is a valid ASM disk on device /dev/sdf1[8,81]
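
Since the next step overwrites a raw device, it is worth extracting the device path from the querydisk output rather than retyping it. A small sketch, assuming the output format shown above; asm_disk_device is a hypothetical name used only for this example.

```shell
# Hypothetical helper: extract the device path from a line such as
#   Disk "VDKBACK" is a valid ASM disk on device /dev/sdf1[8,81]
# read from stdin. Prints nothing if no such line is present.
asm_disk_device() {
    sed -n 's/^Disk ".*" is a valid ASM disk on device \([^[]*\)\[.*$/\1/p'
}

# Usage on a cluster node (not run here):
#   oracleasm querydisk -d VDKBACK | asm_disk_device
```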

Simulate the failure by overwriting the first 1 MB of the device

[root@rac1 ~]#  dd if=/dev/zero of=/dev/sdf1  bs=1024K count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.006116 seconds, 171 MB/s
[root@rac1 ~]# 

Start the cluster

crsctl start cluster -all 

The command hangs and only returns after a long time.

[root@rac1 ~]# 
[root@rac1 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
[root@rac1 ~]# crsctl start cluster -all  
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac2'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac2'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac2'
CRS-2674: Start of 'ora.diskmon' on 'rac1' failed
CRS-2679: Attempting to clean 'ora.diskmon' on 'rac1'
CRS-2674: Start of 'ora.diskmon' on 'rac2' failed
CRS-2679: Attempting to clean 'ora.diskmon' on 'rac2'
CRS-2681: Clean of 'ora.diskmon' on 'rac1' succeeded
CRS-2681: Clean of 'ora.diskmon' on 'rac2' succeeded

CRS-4404: The following nodes did not reply within the allotted time:
rac1, rac2

 

Trying crsctl start crs does not help either:

[root@rac1 ~]# crsctl start crs
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.

Check the logs

more /var/log/messages shows nothing abnormal.

more $ORACLE_HOME/log/rac1/cssd/ocssd.log

The CSSD log shows the error: no voting files can be found.

[root@rac1 ~]# tail -30 $ORACLE_HOME/log/rac1/cssd/ocssd.log

2019-05-08 16:34:29.389: [    CLSF][1146157376]checksum failed for disk:ORCL:VDKOCR1:
2019-05-08 16:34:29.389: [    CLSF][1146157376]Read ASM header off dev:ORCL:VDKOCR1:0:0
2019-05-08 16:34:29.389: [   SKGFD][1146157376]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x72f7190 for disk :ORCL:VDKOCR1:

2019-05-08 16:34:29.389: [    CLSF][1146157376]Read ASM header off dev:ORCL:VDKOCR2:0:0
2019-05-08 16:34:29.390: [   SKGFD][1146157376]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x72f7b30 for disk :ORCL:VDKOCR2:

2019-05-08 16:34:29.401: [    CLSF][1146157376]Read ASM header off dev:ORCL:VDKVOTE:0:0
2019-05-08 16:34:29.401: [   SKGFD][1146157376]Lib :ASM:/opt/oracle/extapi/64/asm/orcl/1/libasm.so: closing handle 0x72f84d0 for disk :ORCL:VDKVOTE:

2019-05-08 16:34:29.401: [    CSSD][1146157376]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2019-05-08 16:34:29.401: [    CSSD][1146157376]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2019-05-08 16:34:29.401: [    CSSD][1146157376]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2019-05-08 16:34:29.401: [    CSSD][1146157376]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2019-05-08 16:34:29.401: [    CSSD][1146157376]clssnmvDiskVerify: file is not a voting file, cannot recognize on-disk signature for a voting
2019-05-08 16:34:29.401: [    CSSD][1146157376]clssnmvDiskVerify: Successful discovery of 0 disks
2019-05-08 16:34:29.401: [    CSSD][1146157376]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2019-05-08 16:34:29.401: [    CSSD][1146157376]clssnmvFindInitialConfigs: No voting files found
2019-05-08 16:34:29.401: [    CSSD][1146157376]###################################
2019-05-08 16:34:29.401: [    CSSD][1146157376]clssscExit: CSSD signal 11 in thread clssnmvDDiscThread
2019-05-08 16:34:29.401: [    CSSD][1146157376]###################################
2019-05-08 16:34:29.401: [    CSSD][1146157376]

----- Call Stack Trace -----
2019-05-08 16:34:29.401: [    CSSD][1135667520]clssgmClientShutdown: total iocapables 0
2019-05-08 16:34:29.401: [    CSSD][1135667520]clssgmClientShutdown: graceful shutdown completed.
2019-05-08 16:34:29.401: [    CSSD][1146157376]calling              call     entry                argument values in hex      
2019-05-08 16:34:29.402: [    CSSD][1146157376]location             type     point                (? means dubious value)     
2019-05-08 16:34:29.402: [    CSSD][1146157376]-------------------- -------- -------------------- ----------------------------
[root@rac1 ~]# 
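
The key line in that log is "No voting files found". A minimal sketch of a check for it, assuming the message text stays as shown above; voting_files_missing is a hypothetical name.

```shell
# Hypothetical helper: exit 0 if the ocssd.log text on stdin contains the
# "No voting files found" message seen in this post, non-zero otherwise.
voting_files_missing() {
    grep -q 'clssnmvFindInitialConfigs: No voting files found'
}

# Usage on a cluster node (not run here):
#   tail -200 "$ORACLE_HOME/log/rac1/cssd/ocssd.log" | voting_files_missing \
#       && echo "CSSD could not find any voting file"
```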

[root@rac1 ~]# /etc/init.d/oracleasm scandisks
Scanning the system for Oracle ASMLib disks:               [  OK  ]
[root@rac1 ~]# /etc/init.d/oracleasm listdisks
VDKDATA
VDKOCR1
VDKOCR2
VDKVOTE
The ASM disk VDKBACK, which the OCR disk group used, is missing from the list.
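
Spotting a missing label by eye works with five disks; with more it is easier to compare against a list of expected labels. A sketch under that assumption; missing_asm_disks is a hypothetical name and the expected labels have to be supplied by you.

```shell
# Hypothetical helper: print every expected ASMLib label (given as
# arguments) that is absent from the `oracleasm listdisks` output on stdin.
missing_asm_disks() {
    listed=$(cat)
    for label in "$@"; do
        # -x: whole-line match, -F: treat the label as a fixed string
        printf '%s\n' "$listed" | grep -qxF -- "$label" || printf '%s\n' "$label"
    done
}

# Usage on a cluster node (not run here):
#   /etc/init.d/oracleasm listdisks | missing_asm_disks VDKBACK VDKDATA VDKOCR1 VDKOCR2 VDKVOTE
```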

Recreate the ASM disk

[root@rac1 ~]# /usr/sbin/oracleasm createdisk VDKBACK /dev/sdf1
Writing disk header: done
Instantiating disk: done
[root@rac1 ~]# 

Stop the cluster

[root@rac1 ~]# crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'rac1'
CRS-2677: Stop of 'ora.mdnsd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'rac1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac1'
CRS-2677: Stop of 'ora.gipcd' on 'rac1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac1' has completed
CRS-4133: Oracle High Availability Services has been stopped.

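Start the stack in exclusive mode. The intended command is crsctl start crs -excl -nocrs, which starts the ASM instance but not CRS; in the session below plain -excl was used, and ora.crsd was started as well.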

[root@rac1 ~]# crsctl start crs -excl -nocrs 

[root@rac1 ~]# crsctl start crs -excl
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded
CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2679: Attempting to clean 'ora.diskmon' on 'rac1'
CRS-2681: Clean of 'ora.diskmon' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'rac1'
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'rac1'
CRS-2676: Start of 'ora.ctssd' on 'rac1' succeeded
CRS-2676: Start of 'ora.drivers.acfs' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'rac1'
CRS-2676: Start of 'ora.asm' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'rac1'
CRS-2676: Start of 'ora.crsd' on 'rac1' succeeded
[root@rac1 ~]# crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

[root@rac1 ~]# crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

[root@rac1 ~]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4534: Cannot communicate with Event Manager
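
At this point HAS and CSS are online while CRS and the Event Manager are not reachable. A small sketch that condenses the crsctl check crs output into one line; crs_summary is a hypothetical name, and the "Cluster Ready Services is online" and "Event Manager is online" message texts are assumed by analogy with the two "is online" lines shown in this post.

```shell
# Hypothetical helper: summarise `crsctl check crs` output from stdin.
# A service counts as online only if its "... is online" line is present;
# the CRS and EVM message texts are assumptions (they do not appear above).
crs_summary() {
    awk '
        /Oracle High Availability Services is online/ { has = "online" }
        /Cluster Ready Services is online/            { crs = "online" }
        /Cluster Synchronization Services is online/  { css = "online" }
        /Event Manager is online/                     { evm = "online" }
        END {
            printf "HAS=%s CRS=%s CSS=%s EVM=%s\n",
                (has ? has : "not-online"),
                (crs ? crs : "not-online"),
                (css ? css : "not-online"),
                (evm ? evm : "not-online")
        }'
}

# Usage on a cluster node (not run here):
#   crsctl check crs | crs_summary
```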

Recreate the disk group that originally held the OCR and the voting disk.
Note: run this as the grid user.

[grid@rac1 admin]$ sqlplus "/as sysasm"

SQL*Plus: Release 11.2.0.1.0 Production on Wed May 8 17:03:35 2019

Copyright (c) 1982, 2009, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> col path for a50
SQL> select path,header_status from v$asm_disk; 

PATH                                               HEADER_STATU
-------------------------------------------------- ------------
ORCL:VDKBACK                                       PROVISIONED
ORCL:VDKDATA                                       MEMBER
ORCL:VDKVOTE                                       MEMBER
ORCL:VDKOCR2                                       MEMBER
ORCL:VDKOCR1                                       MEMBER

SQL> create diskgroup OCR  EXTERNAL REDUNDANCY DISK  'ORCL:VDKBACK' ;

Diskgroup created.

With the OCR disk group recreated, try to restore the OCR from a backup. The restore fails with an error:

[root@rac1 ~]# ocrconfig -restore /u01/app/11.2.0/grid/cdata/rac-cluster/backup01.ocr
PROT-16: Internal Error
[root@rac1 ~]# ocrconfig -restore  /u01/app/11.2.0/grid/cdata/rac-cluster/backup00.ocr
PROT-16: Internal Error
[root@rac1 ~]# ocrconfig -restore  /u01/app/11.2.0/grid/cdata/rac-cluster/backup00.ocr
PROT-16: Internal Error

An online search turned up the following fix: drop the disk group and recreate it with explicit compatibility attributes.

[grid@rac1 admin]$ sqlplus "/as sysasm"

SQL*Plus: Release 11.2.0.1.0 Production on Wed May 8 17:38:49 2019

Copyright (c) 1982, 2009, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> drop diskgroup OCR;

Diskgroup dropped.

SQL>  create diskgroup OCR  EXTERNAL REDUNDANCY DISK  'ORCL:VDKBACK' attribute  'compatible.rdbms' = '11.1.0.0.0','compatible.asm' = '11.1.0.0.0';

Diskgroup created.

 

This time ocrconfig -restore /u01/app/11.2.0/grid/cdata/rac-cluster/backup00.ocr completes without an error:

[root@rac1 ~]# ocrconfig -restore /u01/app/11.2.0/grid/cdata/rac-cluster/backup00.ocr
[root@rac1 ~]# 

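Restore the voting disk into the OCR disk group:

crsctl replace votedisk  +OCR

Once the OCR and the voting disk have been recovered, CRS and the other services start automatically on the next restart. Check the voting disk, then restart CRS:

crsctl query css votedisk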

[root@rac1 ~]# crsctl stop crs
[root@rac1 ~]# crsctl start crs
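
For reference, the recovery steps from this post can be collected in one place. The sketch below only prints the commands in the order they were used here, it does not run anything; print_ocr_recovery_plan and its four arguments are hypothetical, and the compatibility values are simply the ones that worked in this session.

```shell
# Hypothetical helper: print (not run) the recovery sequence from this post.
# Arguments: <diskgroup> <asm_label> <device> <backup_file>
# The post also mentions `crsctl start crs -excl -nocrs`; the session itself
# used plain -excl, which is what is printed here.
print_ocr_recovery_plan() {
    dg=$1 label=$2 dev=$3 backup=$4
    cat <<EOF
/usr/sbin/oracleasm createdisk $label $dev
crsctl stop crs -f
crsctl start crs -excl
SQL> create diskgroup $dg EXTERNAL REDUNDANCY DISK 'ORCL:$label' attribute 'compatible.rdbms' = '11.1.0.0.0','compatible.asm' = '11.1.0.0.0';
ocrconfig -restore $backup
crsctl replace votedisk +$dg
crsctl query css votedisk
crsctl stop crs
crsctl start crs
EOF
}

# Example with the values from this post:
#   print_ocr_recovery_plan OCR VDKBACK /dev/sdf1 /u01/app/11.2.0/grid/cdata/rac-cluster/backup00.ocr
```

Printing the plan first makes it easier to review the device and backup path before touching a real cluster.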

Tags: Oracle SQL Database sqlplus

Posted on Fri, 15 Nov 2019 12:23:45 -0500 by egmax