Focus On Oracle

Installing, Backup & Recovery, Performance Tuning,
Troubleshooting, Upgrading, Patching, Zero-Downtime Upgrade, GoldenGate

Oracle Exadata, Oracle ODA, Oracle ZDLRA


Common Exadata Troubleshooting Scripts

Exadata Script Locations and Descriptions

Utility path: Usage/Comments

InfiniBand
Some of these tools may be found in /opt/oracle.SupportTools/ibdiagtools on cells or database servers. Also see the Infiniband Triage wiki page.
/opt/oracle.SupportTools/ibdiagtools/infinicheck
/opt/oracle.SupportTools/ibdiagtools/verify-topology
ibqueryerrors
/usr/bin/ibdiagnet: Detecting fabric issues
/usr/sbin/ibaddr: Examining HCA state and GUIDs
/usr/sbin/ibcheckerrors: Detecting fabric issues
/usr/sbin/ibcheckerrs: Detecting fabric issues
/usr/sbin/ibcheckstate: Detecting fabric issues
/usr/sbin/ibcheckwidth: Detecting fabric issues
/usr/sbin/ibclearcounters: Resetting counters when detecting fabric issues
/usr/sbin/ibclearerrors: Resetting counters when detecting fabric issues
/usr/sbin/ibdatacounters: Not used directly; perfquery is used instead
/usr/sbin/ibdatacounts: Not used directly; perfquery is used instead
/usr/sbin/ibhosts: Listing cells/DB nodes
/usr/sbin/iblinkinfo.pl: Obtaining the fabric topology
/usr/sbin/ibnetdiscover: Obtaining the fabric topology
/usr/sbin/ibnodes: Listing cells/DB nodes/switches
/usr/sbin/ibping: Checking IB-level connectivity
/usr/sbin/ibportstate: Testing port failure/disabling bad links
/usr/sbin/ibqueryerrors.pl: Detecting fabric issues
/usr/sbin/ibstat: Examining HCA state and GUIDs
/usr/sbin/ibstatus: Examining HCA state and GUIDs
/usr/sbin/ibswitches: Listing IB switch names
/usr/sbin/ibtracert: Examining IB routes
/usr/sbin/perfquery: Computing throughput, detecting fabric errors
/usr/sbin/saquery: Not used directly
/usr/sbin/set_nodedesc.sh: Setting the HCA node description based on node type
/usr/sbin/sminfo: Determining the location of the master SM
/usr/sbin/smpdump: Not used directly
/usr/sbin/smpquery: Not used directly
/usr/sbin/vendstat: Not used directly
/usr/bin/ibv_devices: Listing local HCAs
/usr/bin/ibv_devinfo: Listing details of local HCAs
/usr/bin/ibv_rc_pingpong: Determining the working status of the HCA
/usr/bin/ibv_srq_pingpong: Determining the working status of the HCA
/usr/bin/ibv_uc_pingpong: Determining the working status of the HCA
/usr/bin/ibv_ud_pingpong: Determining the working status of the HCA
/usr/bin/mstflint: Burning new HCA firmware/obtaining the current firmware version
/usr/bin/ib_rdma_bw: Computing IB-level stats for troubleshooting
/usr/bin/ib_rdma_lat: Computing IB-level stats for troubleshooting
/usr/bin/ib_read_bw: Computing IB-level stats for troubleshooting
/usr/bin/ib_read_lat: Computing IB-level stats for troubleshooting
/usr/bin/ib_send_bw: Computing IB-level stats for troubleshooting
/usr/bin/ib_send_lat: Computing IB-level stats for troubleshooting
/usr/bin/ib_write_bw: Computing IB-level stats for troubleshooting
/usr/bin/ib_write_lat: Computing IB-level stats for troubleshooting
/usr/bin/qperf: Computing throughput for the RDS/TCP/SDP protocols
/sbin/ifconfig: Determining the configuration/status of network interfaces
/usr/bin/ib-bond: Determining the active slave interface for bond0
/usr/bin/rds-gen: Not used directly
/usr/bin/rds-info: Examining RDS state
/usr/bin/rds-ping: Determining RDS connectivity
/usr/bin/rds-sink: Not used directly
/usr/bin/rds-stress: Profiling RDS performance
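
perfquery is listed above for computing throughput. Under the InfiniBand specification, the port data counters (PortXmitData and PortRcvData; the exact labels perfquery prints vary between versions) count octets divided by 4, so throughput over an interval is the counter increase times 4, divided by the elapsed time. The Python sketch below assumes you have already extracted one counter from two perfquery runs taken a known number of seconds apart; the function names and sample numbers are hypothetical.

```python
"""Estimate InfiniBand port throughput from two perfquery counter samples.

Assumptions (not from the note): the caller has already parsed the transmit or
receive data counter from two perfquery runs taken interval_s seconds apart,
and the counters were not reset (e.g. by ibclearcounters) in between.
"""

# Per the InfiniBand spec, the port data counters count octets divided by 4.
BYTES_PER_DATA_UNIT = 4


def counter_delta(before: int, after: int) -> int:
    """Return the counter increase between two samples."""
    if after < before:
        raise ValueError("counter went backwards; was it cleared between samples?")
    return after - before


def throughput_mb_per_s(before: int, after: int, interval_s: float) -> float:
    """Convert a data-counter delta into megabytes (10**6 bytes) per second."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    data_bytes = counter_delta(before, after) * BYTES_PER_DATA_UNIT
    return data_bytes / interval_s / 1_000_000


if __name__ == "__main__":
    # Hypothetical samples: 2,500,000 units in 10 s is 10,000,000 bytes, i.e. 1.0 MB/s.
    print(throughput_mb_per_s(1_000_000, 3_500_000, 10.0))
```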

Imaging and versions
These tools report imaging status and information, as well as the versions installed.
imagehistory
imageinfo: Only on database servers with version >= 11.2.1.3
/opt/oracle.cellos/CheckHWnFWProfile: Only applicable on cells. With the -d option, it displays the versions found; without options, it reports any mismatches against known correct values.
/opt/oracle.SupportTools/CheckSWProfile.sh: Only applicable on cells. Without options, it displays any mismatches against known good configurations.
collectlogs.sh: Collects logs from OneCommand deployments
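
The imageinfo note above depends on a version threshold. Dotted versions should be compared field by field as integers: compared as plain strings, "9.2" would sort above "11.2.1.3". The following is a minimal sketch; obtaining the version string from the server is not shown, and the helper names are hypothetical.

```python
"""Compare dotted Exadata image versions numerically (illustrative sketch)."""

# Threshold quoted in the note for imageinfo on database servers.
IMAGEINFO_MIN_DB_VERSION = (11, 2, 1, 3)


def version_tuple(version: str) -> tuple:
    """Turn '11.2.3.3.1' into (11, 2, 3, 3, 1) so fields compare as integers."""
    return tuple(int(part) for part in version.strip().split("."))


def imageinfo_expected_on_db_server(version: str) -> bool:
    """True if a database server image version is at or above 11.2.1.3."""
    return version_tuple(version) >= IMAGEINFO_MIN_DB_VERSION


if __name__ == "__main__":
    for v in ("11.2.1.2", "11.2.1.3", "11.2.3.3.1"):
        print(v, imageinfo_expected_on_db_server(v))
```

Tuple comparison also handles image versions with more than four fields, because a longer tuple with an equal prefix compares as greater.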

Networking
cat /proc/net/bonding/bond*
cat /sys/class/net/eth?/operstate
cat /sys/class/net/bond*/operstate
ifconfig
ethtool <interface_name>: Reports information about the interface, such as link mode capabilities
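
The networking checks above read plain files, so they are easy to script. This hypothetical Python helper collects the operstate of each matching interface under /sys/class/net and extracts the "Currently Active Slave" line from a /proc/net/bonding file (that line is present for active-backup bonds). It is a sketch, not an Oracle-supplied tool.

```python
"""Summarize link state from sysfs and the active slave of a bonding device.

Illustrative sketch: paths follow the Linux files named in the note; the
function names are hypothetical helpers, not Exadata tools.
"""
from pathlib import Path


def interface_operstates(sys_net: str = "/sys/class/net", pattern: str = "*") -> dict:
    """Map interface name -> contents of its operstate file (e.g. 'up', 'down')."""
    states = {}
    for state_file in sorted(Path(sys_net).glob(f"{pattern}/operstate")):
        states[state_file.parent.name] = state_file.read_text().strip()
    return states


def active_slave(bonding_text: str):
    """Return the 'Currently Active Slave' from /proc/net/bonding/<bond> text, or None."""
    for line in bonding_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Currently Active Slave":
            return value.strip()
    return None


if __name__ == "__main__":
    print(interface_operstates(pattern="eth?"))
    bond0 = Path("/proc/net/bonding/bond0")
    if bond0.exists():
        print("bond0 active slave:", active_slave(bond0.read_text()))
```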

Logfiles on both database servers and cells
/var/log/messages: Older versions of this file are automatically renamed messages.<number>, with 1 being the most recent history.
dmesg: Command that displays the kernel message buffer
/var/log/cellos/validations.log
/var/log/cellos/validations/*log
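
Because older copies of /var/log/messages are renamed with numeric suffixes, a plain alphabetical sort puts messages.10 before messages.2. A small sketch that orders the files newest first, assuming uncompressed numeric suffixes as described above (compressed or date-stamped rotations are deliberately not matched); the function name is hypothetical.

```python
"""List /var/log/messages and its numbered rotations, newest first.

Sketch assuming plain numeric suffixes (messages, messages.1, messages.2, ...);
compressed or date-stamped rotations are not matched.
"""
import re
from pathlib import Path

_MESSAGES_NAME = re.compile(r"^messages(?:\.(\d+))?$")


def messages_files_newest_first(log_dir: str = "/var/log") -> list:
    ranked = []
    for path in Path(log_dir).iterdir():
        match = _MESSAGES_NAME.match(path.name)
        if match and path.is_file():
            # The live file has no suffix and is the newest, so it ranks 0;
            # messages.1 is the most recent rotation, then .2, and so on.
            rank = int(match.group(1)) if match.group(1) else 0
            ranked.append((rank, path))
    # Sort numerically so messages.10 comes after messages.9, not after messages.1.
    return [path for _, path in sorted(ranked)]


if __name__ == "__main__":
    if Path("/var/log").is_dir():
        for path in messages_files_newest_first():
            print(path)
```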

Logfiles on cells
$ADR_BASE/diag/asm/cell/<hostname>/trace/alert.log: The cell's alert log. The cell's trace files are in the same directory as alert.log.

Logfiles on database servers
$ORACLE_BASE/diag/asm/+asm/<instname>/trace/alert_<instname>.log: ASM alert log
$ORACLE_BASE/diag/rdbms/<dbname>/<instname>/trace/alert_<instname>.log: DB alert log. There is one for each database running on the server, and there may be more than one database.
/u01/app/11.2.0/grid/log/<hostname>/alert<hostname>.log: Grid Infrastructure alert log. This log is relatively high-level and will often lead you to one of the logs in the next entry.
/u01/app/11.2.0/grid/log/<hostname>/[cssd,crsd,diskmon]/*.log: Logfiles for the CSSD, CRSD, and diskmon processes. These processes are the most likely to have problems, and their logs will expose most issues.
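
The alert log locations listed above for cells and database servers follow fixed patterns built from a few placeholders. The sketch below assembles them with pathlib; the helper names are hypothetical, the sample values are made up, and the Grid Infrastructure root is the 11.2 path shown in this note, so it is a parameter you may need to change.

```python
"""Build the alert log paths listed in this note from their placeholders.

Sketch only: the functions are hypothetical helpers, and GRID_LOG_ROOT is the
11.2 path shown in the note; adjust it for other releases or install locations.
"""
from pathlib import PurePosixPath

GRID_LOG_ROOT = "/u01/app/11.2.0/grid/log"


def cell_alert_log(adr_base: str, hostname: str) -> PurePosixPath:
    return PurePosixPath(adr_base, "diag", "asm", "cell", hostname, "trace", "alert.log")


def asm_alert_log(oracle_base: str, instname: str) -> PurePosixPath:
    return PurePosixPath(oracle_base, "diag", "asm", "+asm", instname, "trace",
                         f"alert_{instname}.log")


def db_alert_log(oracle_base: str, dbname: str, instname: str) -> PurePosixPath:
    return PurePosixPath(oracle_base, "diag", "rdbms", dbname, instname, "trace",
                         f"alert_{instname}.log")


def grid_alert_log(hostname: str, grid_log_root: str = GRID_LOG_ROOT) -> PurePosixPath:
    return PurePosixPath(grid_log_root, hostname, f"alert{hostname}.log")


def grid_daemon_logs_dir(hostname: str, daemon: str,
                         grid_log_root: str = GRID_LOG_ROOT) -> PurePosixPath:
    """Directory holding the *.log files for cssd, crsd or diskmon."""
    if daemon not in ("cssd", "crsd", "diskmon"):
        raise ValueError(f"unexpected daemon: {daemon}")
    return PurePosixPath(grid_log_root, hostname, daemon)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(db_alert_log("/u01/app/oracle", "dbm", "dbm1"))
    print(grid_alert_log("dbnode01"))
```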

InfiniBand switches
sminfo: Shows the current subnet master switch in the fabric. There should be exactly one, regardless of how many switches are in the fabric.
ibswitches: Lists all IB switches in the fabric
showunhealthy: Shows any unhealthy sensors
env_test: Lists all the data from the environmental sensors in the switch
nm2version: Shows the current versions; use this to determine which version the switch is running right now
getfanspeed: Shows the speed of the internal fans in the switch; useful if showunhealthy indicates a problem with one of the fans
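
The sminfo rule above (exactly one master SM in the fabric) can be checked mechanically once you have one sminfo result per subnet manager. This sketch assumes the common infiniband-diags output format, in which the line ends with a state token such as SMINFO_MASTER or SMINFO_STANDBY; confirm that against your own output. How you gather one line per subnet manager is not shown, and the helper names are hypothetical.

```python
"""Check that a set of sminfo results contains exactly one master SM.

Assumption (not from the note): each input string is one sminfo output line
ending in '... state <n> SMINFO_<STATE>', the usual infiniband-diags format.
"""
import re

_SM_STATE = re.compile(r"\bstate\s+\d+\s+(SMINFO_[A-Z]+)")


def sm_state(sminfo_line: str):
    """Return the SMINFO_* state token from one sminfo line, or None."""
    match = _SM_STATE.search(sminfo_line)
    return match.group(1) if match else None


def master_count(sminfo_lines) -> int:
    """Count the lines that report the master state."""
    return sum(1 for line in sminfo_lines if sm_state(line) == "SMINFO_MASTER")


def has_single_master(sminfo_lines) -> bool:
    """The note says there should be exactly one master, whatever the switch count."""
    return master_count(sminfo_lines) == 1
```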

Cell software commands (cellcli and friends)
These commands may be run from within cellcli:
list cell detail
list alerthistory
list celldisk detail
list griddisk detail
list lun detail
list physicaldisk detail
list flashcache detail
list griddisk attributes name,status,asmmodestatus,asmdeactivationoutcome
alter cell validate configuration
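
A common use of the griddisk listing above is to confirm, for example before taking a cell offline, that asmdeactivationoutcome reports Yes for every grid disk. A minimal sketch, assuming the output of cellcli -e list griddisk attributes name,status,asmmodestatus,asmdeactivationoutcome arrives with whitespace-separated columns in that order; when the outcome is not Yes it may be a multi-word reason, so each line is split at most three times. The function name is hypothetical.

```python
"""Flag grid disks whose asmdeactivationoutcome is not 'Yes'.

Sketch assuming the columns name, status, asmmodestatus, asmdeactivationoutcome
in that order, separated by whitespace. The last column may contain spaces,
so each line is split at most three times.
"""


def griddisks_not_ready(cellcli_output: str) -> list:
    not_ready = []
    for line in cellcli_output.splitlines():
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue  # blank or unexpected line
        name, _status, _asm_mode, outcome = fields
        if outcome.strip() != "Yes":
            not_ready.append(name)
    return not_ready
```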

These related commands are separate utilities rather than cellcli commands:
adrci show incident
mdadm --misc --detail /dev/md*: For an overview of the state of the RAID devices on the storage cell
cat /proc/mdstat: For a view of the status of the devices
/usr/local/bin/ipconf -verify
mdadm -Q --detail /dev/md?: State information on a particular meta device
<GRID_HOME>/bin/kfod disks=all: Lists disks available from the DB node for ASM use (run on a DB node)
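
When reading cat /proc/mdstat above, the main thing to look for is a missing member. This sketch assumes the usual mdstat layout, where each array's status line ends in [total/active] followed by a pattern such as [UU], with _ marking a failed or missing device; the function name is hypothetical.

```python
"""Report md RAID arrays that /proc/mdstat shows as degraded.

Sketch assuming the usual mdstat layout: an 'mdN : ...' header line followed
by a line ending in '[total/active] [UU...]', where '_' marks a missing member.
"""
import re
from pathlib import Path

_ARRAY_HEADER = re.compile(r"^(md\d+)\s*:")
_MEMBER_STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text: str) -> list:
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = _ARRAY_HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        status = _MEMBER_STATUS.search(line)
        if current and status:
            total, active = int(status.group(1)), int(status.group(2))
            if active < total or "_" in status.group(3):
                degraded.append(current)
            current = None  # only the first status line per array matters
    return degraded


if __name__ == "__main__":
    mdstat = Path("/proc/mdstat")
    if mdstat.exists():
        print(degraded_arrays(mdstat.read_text()) or "no degraded arrays found")
```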

Hardware
These commands may be run to query hardware status. Unless otherwise noted, they apply to cells and database servers.
ipmitool sel list: Lists the system event logs; these logs sometimes show hardware events that are not seen elsewhere.
ipmitool sunoem cli 'show /SYS': Shows the system serial number and fault_state (the overall fault state; it is not necessarily a rollup, so there may still be a fault at the component level)
/opt/MegaRAID/MegaCli/MegaCli64 -adpallinfo -a0: All adapter information
/opt/MegaRAID/MegaCli/MegaCli64 -FwTermLog -dsply -a0: Displays the controller's log
/opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -GetBbuStatus -a0: Gets battery status
/opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -GetBbuProperties -a0: Gets battery properties
/opt/MegaRAID/MegaCli/MegaCli64 -LDinfo -Lall -aALL: Look for WriteThrough in the Current Cache Policy; if write-back caching is disabled, performance may be affected. This information is easier to get with cellcli -e list lun attributes name,lunWriteCacheMode,status
/opt/MegaRAID/MegaCli/MegaCli64 -LDPdInfo -aAll: Helpful for investigating predictive failures if necessary
/opt/MegaRAID/MegaCli/MegaCli64 -PDList -a0: The Inquiry Data contains the drive firmware, but decoding the string to extract it requires special instructions beyond the scope of this list. Instead, run list physicaldisk attributes physicalFirmware in cellcli to get the drive firmware version.
lspci [-v [-v [-v]]]: Lists PCI devices. The more -v arguments you add, the more detail it provides.
lsscsi: Especially helpful on cells. Flash cards show up as MARVELL devices. There should be 16 flash devices listed; if not, a card is missing or not visible to the OS.
/opt/oracle.cellos/scripts_aura.sh: Lists the flash disks as the cell software will see them
/opt/oracle.SupportTools/sundiag.sh: Gathers many diagnostic command outputs and important logfiles for analysis of storage cell and disk issues
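
The lsscsi check above (16 MARVELL flash devices) reduces to counting matching lines. In this sketch the expected count is a parameter defaulting to the note's figure, since it may differ on other cell models; the helper names are hypothetical.

```python
"""Count MARVELL flash devices in lsscsi output and compare with an expected count.

The default of 16 is the figure given in the note; treat it as an assumption
for your cell model and pass a different value if needed.
"""
EXPECTED_FLASH_DEVICES = 16


def count_flash_devices(lsscsi_output: str, vendor: str = "MARVELL") -> int:
    """Count lsscsi lines that mention the flash vendor string."""
    return sum(1 for line in lsscsi_output.splitlines() if vendor in line)


def flash_devices_ok(lsscsi_output: str, expected: int = EXPECTED_FLASH_DEVICES) -> bool:
    """True if the number of flash devices seen matches the expected count."""
    return count_flash_devices(lsscsi_output) == expected
```

If the check fails, the note's guidance applies: a card is missing or not visible to the OS.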

Keywords: exadata
