The rac function of aodu covers the following: how to change the archive log mode, common terminology, commonly used troubleshooting scripts, setting debug mode, how to use CVU (Cluster Verify Utility), common crsctl/srvctl commands, the ocrconfig command, the olsnodes command, and more.
How do you use the rac function?
[oracle@db1 ~]$ ./aodu
AT Oracle Database Utility, Release 1.1.0 on Tue Jun 14 14:28:01 2016
Copyright (c) 2014, 2015, Robin.Han. All rights reserved.
http://ohsdba.cn
E-Mail:375349564@qq.com
AODU>
AODU> rac ohsdba
rac archivelog|general|abbr|diag|cvu|diaginstall|debug|eviction|diagrac|perf|srvctl|crsctl|ocrconfig|olsnodes|debug|note
AODU> rac oracle
Currently it's for internal use only
AODU>
Note: only "rac ohsdba" will display the commonly used commands built into the function; of course, only after reading this article will you know to use it that way.
AODU> rac archivelog
****Change archivelog mode****
The following steps need to be taken to enable archive logging in a RAC database environment:
$srvctl stop database -d <db_unique_name>
$srvctl start database -d <db_unique_name> -o mount
$sqlplus / as sysdba
sql> alter database archivelog;
sql> exit;
$ srvctl stop database -d <db_unique_name>
$ srvctl start database -d <db_unique_name>
sql> archive log list;
Note: from 10g, you do not need to change the parameter cluster_database
AODU>
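To confirm the change afterwards (a quick sanity check, not part of the aodu output; <db_unique_name> is a placeholder as above):
$ srvctl status database -d <db_unique_name>
$ sqlplus / as sysdba
SQL> select log_mode from v$database;
The query should return ARCHIVELOG once the database has been restarted in the new mode.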
AODU> rac general ---Key facts about 11gR2 Clusterware and the RAC startup sequence
****11gR2 Clusterware Key Facts****
11gR2 Clusterware is required to be up and running prior to installing a 11gR2 Real Application Clusters database.
The GRID home consists of the Oracle Clusterware and ASM. ASM should not be in a separate home.
The 11gR2 Clusterware can be installed in "Standalone" mode for ASM and/or "Oracle Restart" single node support.
This clusterware is a subset of the full clusterware described in this document.
The 11gR2 Clusterware can be run by itself or on top of vendor clusterware. See the certification matrix for certified
combinations. Ref: Note: 184875.1 "How To Check The Certification Matrix for Real Application Clusters"
The GRID Home and the RAC/DB Home must be installed in different locations.
The 11gR2 Clusterware requires shared OCR and voting files. These can be stored on ASM or a cluster filesystem.
The OCR is backed up automatically every 4 hours to <GRID_HOME>/cdata/<clustername>/ and can be restored via ocrconfig.
The voting file is backed up into the OCR at every configuration change and can be restored via crsctl.
The 11gR2 Clusterware requires at least one private network for inter-node communication and at least one
public network for external communication. Several virtual IPs need to be registered with DNS. This includes the node VIPs (one per node),
SCAN VIPs (three). This can be done manually via your network administrator, or optionally you could configure the "GNS" (Grid Naming Service)
in the Oracle clusterware to handle this for you (note that GNS requires its own VIP).
A SCAN (Single Client Access Name) is provided to clients to connect to. For more information on SCAN see Note: 887522.1
The root.sh script at the end of the clusterware installation starts the clusterware stack. For information on troubleshooting
root.sh issues see Note: 1053970.1
Only one set of clusterware daemons can be running per node.
On Unix, the clusterware stack is started via the init.ohasd script referenced in /etc/inittab with "respawn".
A node can be evicted (rebooted) if a node is deemed to be unhealthy. This is done so that the health of the entire cluster can be maintained.
For more information on this see: Note: 1050693.1 "Troubleshooting 11.2 Clusterware Node Evictions (Reboots)"
Either have vendor time synchronization software (like NTP) fully configured and running or have it not configured at all and
let CTSS handle time synchronization. See Note: 1054006.1 for more information.
If installing DB homes for a lower version, you will need to pin the nodes in the clusterware or you will see ORA-29702 errors.
See Note 946332.1 and Note:948456.1 for more information.
The clusterware stack can be started by either booting the machine, running "crsctl start crs" to start the clusterware stack,
or by running "crsctl start cluster" to start the clusterware on all nodes.Note that crsctl is in the <GRID_HOME>/bin directory.
Note that "crsctl start cluster" will only work if ohasd is running.
The clusterware stack can be stopped by either shutting down the machine, running "crsctl stop crs" to stop the clusterware stack,
or by running "crsctl stop cluster" to stop the clusterware on all nodes.Note that crsctl is in the <GRID_HOME>/bin directory.
Killing clusterware daemons is not supported.
Instance is now part of .db resources in "crsctl stat res -t" output, there is no separate .inst resource for 11gR2 instance.
****Cluster Start Sequence****
Level 1: OHASD spawns the following 4 processes:
cssdagent - Agent responsible for spawning CSSD.
orarootagent - Agent responsible for managing all root owned ohasd resources.
oraagent - Agent responsible for managing all oracle owned ohasd resources.
cssdmonitor - Monitors CSSD and node health (along with the cssdagent).
Level 2: OHASD rootagent spawns:
CRSD - Primary daemon responsible for managing cluster resources.
CTSSD - Cluster Time Synchronization Services Daemon
Diskmon
ACFS (ASM Cluster File System) Drivers
Level 2: OHASD oraagent spawns:
MDNSD - Used for DNS lookup
GIPCD - Used for inter-process and inter-node communication
GPNPD - Grid Plug & Play Profile Daemon
EVMD - Event Monitor Daemon
ASM - Resource for monitoring ASM instances
Level 3: CRSD spawns:
orarootagent - Agent responsible for managing all root owned crsd resources.
oraagent - Agent responsible for managing all oracle owned crsd resources.
Level 4: CRSD rootagent spawns:
Network resource - To monitor the public network
SCAN VIP(s) - Single Client Access Name Virtual IPs
Node VIPs - One per node
ACFS Registry - For mounting ASM Cluster File System
GNS VIP (optional) - VIP for GNS
Level 4: CRSD oraagent spawns:
ASM Resource - ASM Instance(s) resource
Diskgroup - Used for managing/monitoring ASM diskgroups.
DB Resource - Used for monitoring and managing the DB and instances
SCAN Listener - Listener for single client access name, listening on SCAN VIP
Listener - Node listener listening on the Node VIP
Services - Used for monitoring and managing services
ONS - Oracle Notification Service
eONS - Enhanced Oracle Notification Service
GSD - For 9i backward compatibility
GNS (optional) - Grid Naming Service - Performs name resolution
AODU>
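The facts above mention that the OCR is backed up automatically and that both the OCR and the voting files can be restored, and they outline the ohasd-driven start sequence. A minimal, read-only sketch for checking all of this on a running node (these commands are not part of the aodu output; run them from <GRID_HOME>/bin as root or the grid user):
ocrconfig -showbackup          (list automatic and manual OCR backups)
ocrcheck                       (verify OCR integrity and location)
crsctl query css votedisk      (list the voting files)
crsctl stat res -t -init       (show the ohasd-managed, lower-stack resources from the start sequence)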
AODU> rac abbr ---RAC terminology
****Abbreviations, Acronyms****
This note lists commonly used Oracle Clusterware (Cluster Ready Services or Grid Infrastructure) related abbreviations, acronyms, terms and procedures.
nodename: short hostname for local node. For example, racnode1 for node racnode1.us.oracle.com
CRS: Cluster Ready Services, name for pre-11gR2 Oracle clusterware
GI: Grid Infrastructure, name for 11gR2 Oracle clusterware
GI cluster: Grid Infrastructure in cluster mode
Oracle Restart: GI Standalone, Grid Infrastructure in standalone mode
ASM user: the OS user who installs/owns ASM. For 11gR2, ASM and grid user is the same as ASM and GI share the same ORACLE_HOME.
For pre-11gR2 CRS cluster, ASM and CRS user can be different as ASM and CRS will be in different ORACLE_HOME.
For pre-11gR2 single-instance ASM, ASM and local CRS user is the same as ASM and local CRS share the same home.
CRS user: the OS user who installs/owns pre-11gR2 Oracle clusterware
grid user: the OS user who installs/owns 11gR2 Oracle clusterware
clusterware user: CRS or grid user which must be the same in upgrade environment
Oracle Clusterware software owner: same as clusterware user
clusterware home: CRS or GI home
ORACLE_BASE:ORACLE_BASE for grid or CRS user.
root script checkpoint file: the file that records root script (root.sh or rootupgrade.sh) progress so root script
can be re-executed, it's located in $ORACLE_BASE/Clusterware/ckptGridHA_${nodename}.xml
OCR: Oracle Cluster Registry. To find out OCR location, execute: ocrcheck
VD: Voting Disk. To find out voting file location, execute: crsctl query css votedisk
Automatic OCR Backup: OCR is backed up automatically every four hours in cluster environment on OCR Master node,
the default location is <clusterware-home>/cdata/<clustername>. To find out backup location, execute: ocrconfig -showbackup
SCR Base: the directory where ocr.loc and olr.loc are located.
Linux: /etc/oracle
Solaris: /var/opt/oracle
hp-ux: /var/opt/oracle
AIX: /etc/oracle
INITD Location: the directory where ohasd and init.ohasd are located.
Linux: /etc/init.d
Solaris: /etc/init.d
hp-ux: /sbin/init.d
AIX: /etc
oratab Location: the directory where oratab is located.
Linux: /etc
Solaris: /var/opt/oracle
hp-ux: /etc
AIX: /etc
CIL: Central Inventory Location. The location is defined by parameter inventory_loc in /etc/oraInst.loc or
/var/opt/oracle/oraInst.loc, depending on the platform.
Example on Linux:
cat /etc/oraInst.loc | grep inventory_loc
inventory_loc=/home/ogrid/app/oraInventory
Disable CRS/GI: To disable the clusterware from auto startup when the node reboots, as root execute "crsctl disable crs";
for Oracle Restart, use "crsctl disable has" instead.
DG Compatible: ASM Disk Group's compatible.asm setting. To store OCR/VD on ASM, the compatible setting must be at least 11.2.0.0.0,
but on the other hand a lower GI version will not work with a higher compatible setting. For example, 11.2.0.1 GI will have issues
accessing a DG if compatible.asm is set to 11.2.0.2.0. When downgrading from a higher GI version to a lower GI version,
if DG for OCR/VD has higher compatible, OCR/VD relocation to lower compatible setting is necessary.
To find out compatible setting, log on to ASM and query:
SQL> select name||' => '||compatibility from v$asm_diskgroup where name='GI';
NAME||'=>'||COMPATIBILITY
--------------------------------------------------------------------------------
GI => 11.2.0.0.0
In the above example, GI is the name of the disk group of interest.
To relocate OCR from higher compatible DG to lower one:
ocrconfig -add <diskgroup>
ocrconfig -delete <disk group>
To relocate VD from higher compatible DG to lower one:
crsctl replace votedisk <diskgroup>
When upgrading Oracle Clusterware:
OLD_HOME: pre-upgrade Oracle clusterware home - the home where existing clusterware is running off. For Oracle Restart,
the OLD_HOME is pre-upgrade ASM home.
OLD_VERSION: pre-upgrade Oracle clusterware version.
NEW_HOME: new Oracle clusterware home.
NEW_VERSION: new Oracle clusterware version.
OCR Node: The node where rootupgrade.sh backs up the pre-upgrade OCR to $NEW_HOME/cdata/ocr$OLD_VERSION. In most cases
it's the first node where rootupgrade.sh was executed.
Example when upgrading from 11.2.0.1 to 11.2.0.2, after execution of rootupgrade.sh
ls -l $NEW_HOME/cdata/ocr*
-rw-r--r-- 1 root root 78220 Feb 16 10:21 /ocw/b202/cdata/ocr11.2.0.1.0
AODU>
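Since compatible.asm can only be raised and never lowered, the relocation commands above are the only way back to a lower setting; raising it, by contrast, is a single ASM statement. A hedged example, reusing the disk group name GI from the query above and a target of 11.2.0.2.0 (run in the ASM instance as sysasm, and only after confirming that no lower-version GI still needs to mount the disk group):
SQL> alter diskgroup GI set attribute 'compatible.asm' = '11.2.0.2.0';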
AODU> rac diag --How to collect RAC diagnostic information
****Data Gathering for All Oracle Clusterware Issues****
TFA Collector is installed in the GI HOME and comes with 11.2.0.4 GI and higher. For GI 11.2.0.3 or lower,
install the TFA Collector by referring to Document 1513912.1 for instructions on downloading and installing the TFA Collector.
$GI_HOME/tfa/bin/tfactl diagcollect -from "MMM/dd/yyyy hh:mm:ss" -to "MMM/dd/yyyy hh:mm:ss"
Format example: "Jul/1/2014 21:00:00"
Specify the "from time" to be 4 hours before and the "to time" to be 4 hours after the time of error.
****Linux/Unix Platform****
a.Linux/UNIX 11gR2/12cR1
1. Execute the following as root user:
# script /tmp/diag.log
# id
# env
# cd <temp-directory-with-plenty-free-space>
# $GRID_HOME/bin/diagcollection.sh
# exit
For more information about diagcollection, check out "diagcollection.sh -help"
The following .gz files will be generated in the current directory and need to be uploaded along with /tmp/diag.log:
Linux/UNIX 10gR2/11gR1
1. Execute the following as root user:
# script /tmp/diag.log
# id
# env
# cd <temp-directory-with-plenty-free-space>
# export OCH=<CRS_HOME>
# export ORACLE_HOME=<DB_HOME>
# export HOSTNAME=<host>
# $OCH/bin/diagcollection.pl -crshome=$OCH --collect
# exit
The following .gz files will be generated in the current directory and need to be uploaded along with /tmp/diag.log:
2. For 10gR2 and 11gR1, if getting an error while running root.sh, please collect /tmp/crsctl.*
Please ensure all of the above information is provided from all the nodes.
****Windows Platform****
b.Windows 11gR2/12cR1:
set ORACLE_HOME=<GRID_HOME> for example: set ORACLE_HOME=D:\app\11.2.0\grid
set PATH=%PATH%;%ORACLE_HOME%\perl\bin
perl %ORACLE_HOME%\bin\diagcollection.pl --collect --crshome %ORACLE_HOME%
The following .zip files will be generated in the current directory and need to be uploaded:
crsData_<timestamp>.zip,
ocrData_<timestamp>.zip,
oraData_<timestamp>.zip,
coreData_<timestamp>.zip (only --core option specified)
For chmosdata*:
perl %ORACLE_HOME%\bin\diagcollection.pl --collect --crshome %ORACLE_HOME%
Windows 10gR2/11gR1
set ORACLE_HOME=<DB_HOME>
set OCH=<CRS_HOME>
set ORACLE_BASE=<oracle-base>
%OCH%\perl\bin\perl %OCH%\bin\diagcollection.pl --collect
AODU>
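As a worked example of the TFA collection window described above: if the error occurred around 21:00:00 on Jul/1/2014 (the sample time from the format example), the suggested 4-hours-before / 4-hours-after window would translate into the following (path and times are illustrative only):
$GI_HOME/tfa/bin/tfactl diagcollect -from "Jul/1/2014 17:00:00" -to "Jul/2/2014 01:00:00"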
AODU> rac cvu --How to use CVU
****Cluster verify utility****
How to Debug CVU / Collect CVU Trace Generated by RUNCLUVFY.SH (Doc ID 986822.1)
(a) GI/CRS has been installed
$ script /tmp/cluvfy.log
$ $GRID_HOME/bin/cluvfy stage -pre crsinst -n <node1, node2...> -verbose
$ $GRID_HOME/bin/cluvfy stage -post crsinst -n all -verbose
$ exit
(b) GI/CRS has not been installed
Run runcluvfy.sh from the installation media, or download it from OTN: http://www.oracle.com/technetwork/database/options/clustering/downloads/index.html
Set the environment variable CV_HOME to point to the CVU home, CV_JDKHOME to point to the JDK home, and optionally
CV_DESTLOC to point to a writable area on all nodes (e.g. /tmp/cluvfy)
$ cd $CV_HOME/bin
$ script cluvfy.log
$ cluvfy stage -pre crsinst -n <node1, node2...>
$ exit
****Diagcollection options****
To collect only a subset of logs, --adr together with --beforetime and --aftertime can be used, i.e.:
# mkdir /tmp/collect
# $GRID_HOME/bin/diagcollection.sh --adr /tmp/collect --beforetime 20120218100000 --aftertime 20120218050000
This command will copy all logs containing timestamps between 2012-02-18 05:00 and 10:00 to the /tmp/collect directory.
Time is specified with YYYYMMDDHHMISS24 format. --adr points to a directory where the logs are copied to.
From 11.2.0.2 onwards, Cluster Health Monitor(CHM/OS) note 1328466.1 data can also be collected, i.e.:
# $GRID_HOME/bin/diagcollection.sh --chmos --incidenttime 02/18/201205:00:00 --incidentduration 05:00
This command will collect data from 2012-02-18 05:00 to 10:00 for 5 hours. incidenttime is specified as MM/DD/YYYY24HH:MM:SS,
incidentduration is specified as HH:MM.
****For 11gR2/12c:****
Set environment variable CV_TRACELOC to a directory that's writable by grid user; trace should be generated in there once
runcluvfy.sh starts. For example, as grid user:
rm -rf /tmp/cvutrace
mkdir /tmp/cvutrace
export CV_TRACELOC=/tmp/cvutrace
export SRVM_TRACE=true
export SRVM_TRACE_LEVEL=1
<STAGE_AREA>/runcluvfy.sh stage -pre crsinst -n <node1>,<node2> -verbose
Note:
A. STAGE_AREA refers to the location where Oracle Clusterware is unzipped.
B. Replace above "runcluvfy.sh stage -pre crsinst -n <node1>,<node2> -verbose"
if other stage/comp needs to be traced, i.e. "./runcluvfy.sh comp ocr -verbose"
C. For 12.1.0.2, the following can be set for additional tracing from exectask command:
export EXECTASK_TRACE=true
The trace will be in <TMP>/CVU_<version>_<user>/exectask.trc, i.e. /tmp/CVU_12.1.0.2.0_grid/exectask.trc
****For 10gR2, 11gR1 or 11gR2:****
1. As crs/grid user, backup runcluvfy.sh. For 10gR2, it's located in <STAGE_AREA>/cluvfy/runcluvfy.sh;
and for 11gR1 and 11gR2, <STAGE_AREA>/runcluvfy.sh
cd <STAGE_AREA>
cp runcluvfy.sh runcluvfy.debug
2. Locate the following lines in runcluvfy.debug:
# Cleanup the home for cluster verification software
$RM -rf $CV_HOME
Comment out the remove command so runtime files including trace won't be removed once CVU finishes.
# Cleanup the home for cluster verification software
# $RM -rf $CV_HOME
3. As crs/grid user,set environment variable CV_HOME to anywhere as long as the location is writable by crs/grid user and has 400MB of free space:
mkdir /tmp/cvdebug
CV_HOME=/tmp/cvdebug
export CV_HOME
This step is optional, if CV_HOME is unset, CVU files will be generated in /tmp.
4. As crs/grid user, execute runcluvfy.debug:
export SRVM_TRACE=true
export SRVM_TRACE_LEVEL=1
cd <STAGE_AREA>
./runcluvfy.debug stage -pre crsinst -n <node1>,<node2> -verbose
Note:
A. SRVM_TRACE_LEVEL is effective for 11gR2 only.
B. Replace above "runcluvfy.sh stage -pre crsinst -n <node1>,<node2> -verbose" if other stage/comp needs to be
traced, i.e. "./runcluvfy.sh comp ocr -verbose"
5. Regardless of whether the above command finishes or not, CVU trace should be generated in:
10gR2: $CV_HOME/<pid>/cv/log
11gR1: $CV_HOME/bootstrap/cv/log
11gR2: $CV_HOME/bootstrap/cv/log
If CV_HOME is unset, trace will be in /tmp/<pid>/cv/log or /tmp/bootstrap/cv/log, depending on the CVU version.
6. Clean up temporary files generated by above runcluvfy.debug:
rm -rf $CV_HOME/bootstrap
HOW TO UPGRADE CLUVFY IN CRS_HOME (Doc ID 969282.1)
On the other hand, GridControl for example uses the installed cluvfy from CRS_HOME, so if GridControl shows errors when checking the cluster,
you may want to update the cluvfy used by GridControl.
As cluvfy consists of many files it is best to install the newest version outside of CRS_HOME so there
will be no conflict between the jar files and libraries used by cluvfy and CRS.
To do so follow these steps:
1. download the newest version of cluvfy (for example from the OTN link above) and extract the files into the target directory you want to use.
The following assumptions are made here:
CRS_HOME=/u01/oracle/crs
new cluvfy-home CVU_HOME = /u01/oracle/cluvfy
2. go to CRS_HOME/bin and copy the existing file CRS_HOME/bin/cluvfy to CRS_HOME/bin/cluvfy.org
3. copy CVU_HOME/bin/cluvfy to CRS_HOME/bin/cluvfy
4. edit that file and search for the line
CRSHOME=
and make the following corrections:
ORACLE_HOME=$ORA_CRS_HOME
CRSHOME=$ORACLE_HOME
CV_HOME=/u01/oracle/cluvfy <---- check for your environment
JREDIR=$CV_HOME/jdk/jre
DESTLOC=/tmp
./runcluvfy.sh stage -pre dbinst -n <node1>,<node2> -verbose | tee /tmp/cluvfy_dbinst.log
runcluvfy.sh stage -pre hacfg -verbose
./cluvfy stage -pre dbinst -n <node1>,<node2> -verbose | tee /tmp/cluvfy_dbinst.log
./runcluvfy.sh stage -pre crsinst -n <node1>,<node2> -verbose | tee /tmp/cluvfy_preinst.log
./cluvfy stage -post crsinst -n all -verbose | tee /tmp/cluvfy_postinst.log
$ ./cluvfy comp -list
USAGE:
cluvfy comp <component-name> <component-specific options> [-verbose]
Valid components are:
nodereach : checks reachability between nodes
nodecon : checks node connectivity
cfs : checks CFS integrity
ssa : checks shared storage accessibility
space : checks space availability
sys : checks minimum system requirements
clu : checks cluster integrity
clumgr : checks cluster manager integrity
ocr : checks OCR integrity
olr : checks OLR integrity
ha : checks HA integrity
crs : checks CRS integrity
nodeapp : checks node applications existence
admprv : checks administrative privileges
peer : compares properties with peers
software : checks software distribution
asm : checks ASM integrity
acfs : checks ACFS integrity
gpnp : checks GPnP integrity
gns : checks GNS integrity
scan : checks SCAN configuration
ohasd : checks OHASD integrity
clocksync : checks Clock Synchronization
vdisk : check Voting Disk Udev settings
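For instance, a couple of the component checks listed above can be run across all nodes like this (an illustration, not aodu output; redirect the output to a file if it needs to be kept):
$ ./cluvfy comp clocksync -n all -verbose
$ ./cluvfy comp ocr -n all -verbose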
$ ./cluvfy stage -list
USAGE:
cluvfy stage {-pre|-post} <stage-name> <stage-specific options> [-verbose]
Valid stage options and stage names are:
-post hwos : post-check for hardware and operating system
-pre cfs : pre-check for CFS setup
-post cfs : post-check for CFS setup
-pre crsinst : pre-check for CRS installation
-post crsinst : post-check for CRS installation
-pre hacfg : pre-check for HA configuration
-post hacfg : post-check for HA configuration
-pre dbinst : pre-check for database installation
-pre acfscfg : pre-check for ACFS Configuration.
-post acfscfg : post-check for ACFS Configuration.
-pre dbcfg : pre-check for database configuration
-pre nodeadd : pre-check for node addition.
-post nodeadd : post-check for node addition.
-post nodedel : post-check for node deletion.
cluvfy comp ssa -n dbnode1,dbnode2 -s
Database logs & trace files:
cd $(orabase)/diag/rdbms
tar cf - $(find . -name '*.trc' -exec egrep -l '<date_time_search_string>' {} \; | grep -v bucket) | gzip > /tmp/database_trace_files.tar.gz
ASM logs & trace files:
cd $(orabase)/diag/asm/+asm/
tar cf - $(find . -name '*.trc' -exec egrep -l '<date_time_search_string>' {} \; | grep -v bucket) | gzip > /tmp/asm_trace_files.tar.gz
Clusterware logs:
<GI home>/bin/diagcollection.sh --collect --crs --crshome <GI home>
OS logs:
/var/adm/messages* or /var/log/messages* or 'errpt -a' or Windows System Event Viewer log (saved as .TXT file)
AODU>
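As an illustration of the trace-file collection shown above, assume the incident happened around 14:28 on 14 June 2016. The search string only has to match the timestamp format written inside the .trc files, so check one trace file first to confirm the exact format for your version:
cd $(orabase)/diag/rdbms
tar cf - $(find . -name '*.trc' -exec egrep -l '2016-06-14 14:2' {} \; | grep -v bucket) | gzip > /tmp/database_trace_files.tar.gz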
AODU> rac diaginstall --Collecting diagnostic information for RAC installation issues
****Data Gathering for Oracle Clusterware Installation Issues****
Failure before executing root script:
For 11gR2: note 1056322.1 - Troubleshoot 11gR2 Grid Infrastructure/RAC Database runInstaller Issues
For pre-11.2: note 406231.1 - Diagnosing RAC/RDBMS Installation Problems
Failure while or after executing root script
Provide files in Section "Data Gathering for All Oracle Clusterware Issues" and the following:
root script (root.sh or rootupgrade.sh) screen output
For 11gR2: provide zip of <$ORACLE_BASE>/cfgtoollogs and <$ORACLE_BASE>/diag for grid user.
For pre-11.2: Note 240001.1 - Troubleshooting 10g or 11.1 Oracle Clusterware Root.sh Problems
Before deconfiguring, collect the following as grid user if possible to generate a list of user resources to be
added back to the cluster after reconfigure finishes:
$GRID_HOME/bin/crsctl stat res -t
$GRID_HOME/bin/crsctl stat res -p
$GRID_HOME/bin/crsctl query css votedisk
$GRID_HOME/bin/crsctl query crs activeversion
$GRID_HOME/bin/crsctl check crs
cat /etc/oracle/ocr.loc /var/opt/oracle/ocr.loc
$GRID_HOME/bin/crsctl get css diagwait
$GRID_HOME/bin/ocrcheck
$GRID_HOME/bin/oifcfg iflist -p -n
$GRID_HOME/bin/oifcfg getif
$GRID_HOME/bin/crsctl query css votedisk
$GRID_HOME/bin/srvctl config nodeapps -a
$GRID_HOME/bin/srvctl config scan
$GRID_HOME/bin/srvctl config asm -a
$GRID_HOME/bin/srvctl config listener -l <listener-name> -a
$DB_HOME/bin/srvctl config database -d <dbname> -a
$DB_HOME/bin/srvctl config service -d <dbname> -s <service-name> -v
# <$GRID_HOME>/crs/install/roothas.pl -deconfig -force -verbose
OHASD Agents do not start
Troubleshoot Grid Infrastructure Startup Issues (Doc ID 1050908.1)
In a nutshell, the operating system starts ohasd, ohasd starts agents to start up daemons (gipcd, mdnsd, gpnpd, ctssd, ocssd, crsd, evmd, asm, etc.),
and crsd starts agents that start user resources (database, SCAN, listener etc).
OHASD.BIN will spawn four agents/monitors to start resources:
oraagent: responsible for ora.asm, ora.evmd, ora.gipcd, ora.gpnpd, ora.mdnsd etc
orarootagent: responsible for ora.crsd, ora.ctssd, ora.diskmon, ora.drivers.acfs etc
cssdagent / cssdmonitor: responsible for ora.cssd (for ocssd.bin) and ora.cssdmonitor (for cssdmonitor itself)
$GRID_HOME/bin/crsctl start res ora.crsd -init
$GRID_HOME/bin/crsctl start res ora.evmd -init
$GRID_HOME/bin/crsctl stop res ora.evmd -init
ps -ef | grep <keyword> | grep -v grep | awk '{print $2}' | xargs kill -9
If ohasd.bin cannot start any of the above agents properly, the clusterware will not come to a healthy state.
AODU>
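To see at a glance which of these agents are actually running on a node, and the state of the lower-stack resources they manage, something like the following can be used (a sketch, not aodu output):
ps -ef | egrep 'ohasd|oraagent|orarootagent|cssdagent|cssdmonitor' | grep -v grep
$GRID_HOME/bin/crsctl stat res -t -init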
AODU> rac debug --Setting RAC debug mode
****Troubleshooting Steps****
The following are the new features in 10.2/11gR1/11gR2:
Using crsctl, debugging can be turned on and off for CRS/EVM/CSS and their subcomponents. Debug levels can also be dynamically changed
using crsctl. The debug information is persisted in the OCR for use during the next startup. Debugging can be turned on for CRS-managed
resources such as VIPs and instances as well.
* Note: in the following examples, commands with a "#" prompt are executed as the root user; commands with a "$" prompt can be executed as the clusterware owner.
1. ****Component level logging****
10.2/11gR1:
# crsctl debug log css [module:level]{,module:level} ...
- Turns on debugging for CSS
# crsctl debug log crs [module:level]{,module:level} ...
- Turns on debugging for CRS
# crsctl debug log evm [module:level]{,module:level} ...
- Turns on debugging for EVM
For example:
# crsctl debug log crs "CRSRTI:1,CRSCOMM:2"
# crsctl debug log evm "EVMD:1"
11gR2:
# crsctl set {log|trace} {mdns|gpnp|css|crf|crs|ctss|evm|gipc} "<name1>=<lvl1>,..."
Set the log/trace levels for specific modules within daemons
For example:
# crsctl set log crs "CRSRTI=2,CRSCOMM=2"
To list all modules:
10.2/11gR1:
# crsctl lsmodules {css | crs | evm} - lists the modules of the specified daemon that can be used for debugging
11gR2:
# crsctl lsmodules {mdns|gpnp|css|crf|crs|ctss|evm|gipc}
where
mdns multicast Domain Name Server
gpnp Grid Plug-n-Play Service
css Cluster Synchronization Services
crf Cluster Health Monitor
crs Cluster Ready Services
ctss Cluster Time Synchronization Service
evm EventManager
gipc Grid Interprocess Communications
Logging level definition:
level 0 = turn off
level 2 = default
level 3 = verbose
level 4 = super verbose
To check current logging level:
10.2/11gR1:
For CSS:
$ grep clssscSetDebugLevel <ocssd logs>
For CRS / EVMD:
$ grep "ENV Logging level for Module" <crsd / evmd logs>
11gR2:
$ crsctl get log <modules> ALL
For example:
$ crsctl get log css ALL
Get CSSD Module: CLSF Log Level: 0
Get CSSD Module: CSSD Log Level: 2
Get CSSD Module: GIPCCM Log Level: 2
Get CSSD Module: GIPCGM Log Level: 2
Get CSSD Module: GIPCNM Log Level: 2
Get CSSD Module: GPNP Log Level: 1
Get CSSD Module: OLR Log Level: 0
Get CSSD Module: SKGFD Log Level: 0
2. ****Component level debugging****
Debugging can be turned on for CRS and EVM and their specific modules by setting environment variables or through crsctl.
To turn on tracing for all modules:
ORA_CRSDEBUG_ALL
To turn on tracing for a specific sub module:
ORA_CRSDEBUG_<modulename>
3. ****CRS stack startup and shutdown****
Using crsctl, the entire CRS stack and the resources can be started and stopped.
# crsctl start crs
# crsctl stop crs
4. ****Diagnostics collection script - diagcollection.pl****
This script is for collecting diagnostic information from a CRS installation. The diagnostics are necessary for development
to be able to help with SRs, bugs and other problems that may arise in the field.
10.2
# <$CRS_HOME>/bin/diagcollection.pl
11.1
# <$CRS_HOME>/bin/diagcollection.pl -crshome=$ORA_CRS_HOME --collect
11.2
# <$GRID_HOME>/bin/diagcollection.sh
For more details, please refer to Document 330358.1 CRS 10gR2/ 11gR1/ 11gR2 Diagnostic Collection Guide
5. ****Unified log directory structure****
The following log directory structure is in place for 10.2 onwards in an effort to consolidate the log files of different Clusterware
components for easy diag information retrieval and problem analysis.
Objectives
1. Need one place where all the CRS related log files can be located.
2. The directory structure needs to be intuitive for any user
3. Permission, ownership issues need to be addressed
4. Disk space issues need to be considered
10.2/11gR1
Alert file
CRS_HOME/log/<host>/alert<hostname>.log
CRS component directories
CRS_HOME/log/
CRS_HOME/log/<host>
CRS_HOME/log/<host>/crsd
CRS_HOME/log/<host>/cssd
CRS_HOME/log/<host>/evmd
CRS_HOME/log/<host>/client
CRS_HOME/log/<host>/racg
11gR2:
GI components directories:
<GRID_HOME>/log/<host>/crsd
<GRID_HOME>/log/<host>/cssd
<GRID_HOME>/log/<host>/admin
<GRID_HOME>/log/<host>/agent
<GRID_HOME>/log/<host>/evmd
<GRID_HOME>/log/<host>/client
<GRID_HOME>/log/<host>/ohasd
<GRID_HOME>/log/<host>/mdnsd
<GRID_HOME>/log/<host>/gipcd
<GRID_HOME>/log/<host>/gnsd
<GRID_HOME>/log/<host>/ctssd
<GRID_HOME>/log/<host>/racg
Log files of the respective components go to the above directories. Core files are dropped in the same directory as the logs.
Old core files will be backed up.
6. ****CRS and EVM Alerts****
CRS and EVM post alert messages on the occurrence of important events.
CRSD
[NORMAL] CLSD-1201: CRSD started on host %s.
[ERROR] CLSD-1202: CRSD aborted on host %s. Error [%s]. Details in %s.
[ERROR] CLSD-1203: Failover failed for the CRS resource %s. Details in %s.
[NORMAL] CLSD-1204: Recovering CRS resources for host %s
[ERROR] CLSD-1205: Auto-start failed for the CRS resource %s. Details in %s.
EVMD
[NORMAL] CLSD-1401: EVMD started on node %s
[ERROR] CLSD-1402: EVMD aborted on node %s. Error [%s]. Details in %s.
7. ****Resource debugging****
Using crsctl, debugging can be turned on for resources.
10.2/11gR1:
# crsctl debug log res <resname:level>
- Turns on debugging for resources
For example:
# crsctl debug log res ora.racnode.vip:2
11gR2:
# crsctl set log res <resname>=<lvl> [-init]
Set the log levels for resources including init resource, check crsctl stat res -t and crsctl stat res -t -init for resource name.
For example:
# crsctl set log res ora.racnode.vip=2
# crsctl set log res ora.cssdmonitor=2 -init
This sets debugging on for the resource in the form of an OCR key.
8. ****Health Check****
To determine the health of the CRS stack:
$ crsctl check crs
To determine health of individual daemons:
$ crsctl check css
$ crsctl check evm
or 11gR2:
$ crsctl stat res -t -init
9. ****OCR debugging****
In 10.1, to debug OCR, the $ORA_CRS_HOME/srvm/ocrlog.ini was updated to a level higher than 0. Starting with 10.2 it is possible to
debug the OCR at the component level using the following commands
a) Edit $ORA_CRS_HOME/srvm/admin/ocrlog.ini to the component level
Eg: comploglvl="OCRAPI:5;OCRCLI:5;OCRSRV:5;OCRMAS:5;OCRCAC:5"
b) Use the dynamic feature to update the logging into the OCR itself using the command crsctl
Same as item 1 above, the CRS module names associated with OCR are:
10.2/11gR1:
CRSOCR
11gR2:
CRSOCR
OCRAPI
OCRASM
OCRCAC
OCRCLI
OCRMAS
OCRMSG
OCROSD
OCRRAW
OCRSRV
OCRUTL
AODU>
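Combining the 11gR2 crsctl syntax from item 1 with the OCR module names listed in item 9, turning up OCR tracing dynamically might look like the following (level 3 is an arbitrary choice from the level table above, and the get command just reads the current setting back):
# crsctl set log crs "CRSOCR=3,OCRAPI=3,OCRCLI=3"
# crsctl get log crs CRSOCR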
AODU> rac eviction --Collecting diagnostic information for RAC reboots and node evictions
****Data Gathering for Node Reboot/Eviction****
Provide files in Section "Data Gathering for All Oracle Clusterware Issues" and the following:
Approximate date and time of the reboot, and the hostname of the rebooted node
OSWatcher archives which cover the reboot time at an interval of 20 seconds with private network monitoring configured.
Note 301137.1 - OS Watcher User Guide
Note.433472.1 - OS Watcher For Windows (OSWFW) User Guide
For pre-11.2, zip of /var/opt/oracle/oprocd/* or /etc/oracle/oprocd/*
For pre-11.2, OS logs - refer to Section Appendix B
For 11gR2+, zip of /etc/oracle/lastgasp/* or /var/opt/oracle/lastgasp/*
CHM/OS data that covers the reboot time for platforms where it is available,
refer to Note 1328466.1 for section "How do I collect the Cluster Health Monitor data"
The Cluster Health Monitor is integrated part of 11.2.0.2 Oracle Grid Infrastructure for Linux (not on Linux Itanium) and
Solaris (Sparc 64 and x86-64 only), so installing 11.2.0.2 Oracle Grid Infrastructure on those platforms will automatically
install the Cluster Health Monitor. AIX will have the Cluster Health Monitor starting from 11.2.0.3.
The Cluster Health Monitor is also enabled for Windows (except Windows Itanium) in 11.2.0.3.
ora.crf is the Cluster Health Monitor resource name that ohasd manages. Issue "crsctl stat res -t -init" to check the current status
of the Cluster Health Monitor.
For example, issue "<GI_HOME>/bin/diagcollection.pl --collect --crshome $ORA_CRS_HOME --chmos
--incidenttime <start time of interesting time period> --incidentduration 05:00"
What logs and data should I gather before logging a SR for the Cluster Health Monitor error?
1) provide 3-4 pstack outputs over a minute for osysmond.bin
2) output of strace -v for osysmond.bin about 2 minutes.
3) strace -cp <osysmond.bin pid> for about 2 min
4) oclumon dumpnodeview -v output for that node for 2 min.
5) output of "uname -a"
6) output of "ps -eLf|grep osysmond.bin"
7) The ologgerd and sysmond log files in the CRS_HOME/log/<host name> directory from all nodes
How to start and stop CHM that is installed as a part of GI in 11.2 and higher?
The ora.crf resource in 11.2 GI (and higher) is the resource for CHM, and the ora.crf resource is managed by ohasd.
Starting and stopping ora.crf resource starts and stops CHM.
To stop CHM (or ora.crf resource managed by ohasd)
$GRID_HOME/bin/crsctl stop res ora.crf -init
To start CHM (or ora.crf resource managed by ohasd)
$GRID_HOME/bin/crsctl start res ora.crf -init
If vendor clusterware is being used, upload the vendor clusterware logs
AODU>
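For instance, before opening an SR for a CHM problem, the ora.crf status and a couple of minutes of node-view data could be captured like this (node name, duration and output file are placeholders):
$GRID_HOME/bin/crsctl stat res ora.crf -init
$GRID_HOME/bin/oclumon dumpnodeview -n <node> -last "00:02:00" -v > /tmp/chm_nodeview.txt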
