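Exadata InfiniBand Switches
Sun Datacenter InfiniBand Switch 36 (36-port InfiniBand switch)
X2 and X3 half and full racks contain three switches (two leaf switches and one spine switch).
X4 and X5 racks each contain two switches (both leaf switches).
Every Exadata compute and storage node contains one PCIe-based Mellanox HCA (Host Channel Adapter, InfiniBand 4X QDR) with a one-way rate of 40 Gb/s per port. The two ports must be cabled to two different switches to avoid a single point of failure. The figure below shows how the IB switches are connected.
Taking X2 as an example, each InfiniBand switch has 22 cables running to compute and storage nodes, 11 of them active and 11 on standby, and the two leaf switches are interconnected by 7 cables.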
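The X2 cabling figures can be checked with a little arithmetic. The sketch below uses only the numbers quoted above (36 ports, 22 node cables, 7 inter-leaf cables); the variable names are illustrative, and the even active/standby split is taken from the text rather than derived.

```python
# Port budget for one leaf switch in the X2 example above.
SWITCH_PORTS = 36        # Sun Datacenter InfiniBand Switch 36
NODE_CABLES = 22         # cables to compute and storage nodes per switch
INTER_LEAF_CABLES = 7    # cables between the two leaf switches

# Each node has one HCA port on each leaf; one port is active and the
# other is standby, so each leaf carries half of the active ports.
active_per_leaf = NODE_CABLES // 2
standby_per_leaf = NODE_CABLES - active_per_leaf

used_ports = NODE_CABLES + INTER_LEAF_CABLES
# Ports not consumed by node or inter-leaf cabling.
remaining_ports = SWITCH_PORTS - used_ports

print(active_per_leaf, standby_per_leaf, used_ports, remaining_ports)
```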
IPoIB: Internet Protocol over InfiniBand. It carries TCP/IP traffic over the InfiniBand fabric and can provide more bandwidth. Applications need no code changes to use it.
When InfiniBand serves as the RAC private network on Exadata, the default protocol is IPoIB. The path is RAC --> IPC --> TCP/IP --> IPoIB --> HCA.
RDS: Reliable Datagram Sockets. It was developed by Oracle and sits directly under IPC (inter-process communication). Compared with IPoIB and other traditional protocols, RDS uses less CPU and delivers lower latency and higher bandwidth. The path is RAC --> IPC --> RDS --> HCA.
iDB: It is built on the Reliable Datagram Sockets (RDS v3) protocol and runs over InfiniBand. ZDP (Zero-loss Zero-copy Datagram Protocol), a zero-copy implementation of RDS, eliminates unnecessary copying of blocks. RDS is based on the socket API and offers low overhead, low latency, and high bandwidth. An Exadata cell node can send and receive large transfers using Remote Direct Memory Access (RDMA).
Oracle Exadata uses the Intelligent Database protocol (iDB) to transfer data between database nodes and storage cell nodes. It is implemented in the database kernel and works as a function-shipping architecture that transparently maps database operations to Exadata operations. iDB can ship SQL operations from a database node to a cell node and return either query results or full data blocks from the cell node.
RDMA is direct memory access from the memory of one computer into that of another without involving either's operating system. The transfer requires no work from CPUs, caches, or context switches, and it proceeds in parallel with other system operations. This is particularly useful in massively parallel processing environments.
To optimize communication between Oracle Engineered Systems such as Exadata, Big Data Appliance, and Exalytics, you can use the Sockets Direct Protocol (SDP). SDP handles stream sockets only.
SDP allows high-performance zero-copy data transfers via RDMA network fabrics and uses a standard wire protocol over an RDMA fabric to support stream sockets (SOCK_STREAM). The goal of SDP is to provide an RDMA-accelerated alternative to TCP over IP while remaining transparent to the application.
It bypasses the OS resident TCP stack for stream connections between any endpoints on the RDMA fabric. All other socket types (such as datagram, raw, packet, etc.) are supported by the IP stack and operate over standard IP interfaces (i.e., IPoIB on InfiniBand fabrics). The IP stack has no dependency on the SDP stack; however, the SDP stack depends on IP drivers for local IP assignments and for IP address resolution for endpoint identifications.
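Because SDP only replaces the transport underneath stream sockets, application code stays the same apart from the address family. The sketch below assumes the OFED convention that SDP sockets use address family 27 (AF_INET_SDP). That constant and the helper name open_stream_socket are assumptions for illustration and do not come from the text above. On a host without SDP support the call fails and the sketch falls back to ordinary TCP.

```python
import socket

# Assumption: the OFED SDP stack registers address family 27 (AF_INET_SDP).
AF_INET_SDP = 27


def open_stream_socket():
    """Return (socket, transport_name), preferring SDP over TCP."""
    try:
        # Same SOCK_STREAM semantics as TCP; only the address family differs.
        return socket.socket(AF_INET_SDP, socket.SOCK_STREAM), "sdp"
    except OSError:
        # No SDP support on this host: use the regular TCP/IP stack
        # (which runs over IPoIB on an InfiniBand fabric).
        return socket.socket(socket.AF_INET, socket.SOCK_STREAM), "tcp"


sock, transport = open_stream_socket()
print(transport)
sock.close()
```

Existing binaries are more commonly switched to SDP without code changes at all, which is what "transparent to the application" refers to; the explicit address family is shown here only to make the difference visible.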
IPoIB (Internet Protocol over InfiniBand): cluster software communication between Exadata nodes uses IPoIB.
To check whether RAC uses RDS, look at the instance startup messages in the RDBMS alert log:
cluster interconnect IPC version:Oracle RDS/IP (generic)
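RDS (Reliable Datagram Sockets): used for data transfer between Exadata compute and storage nodes and between compute nodes (RAC, Cache Fusion).
To check whether RDS is used between compute and storage nodes, look at the cell alert log: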
CELL communication is configured to use 1 interface(s):
192.168.10.13
IPC version: Oracle RDS/IP (generic)
IPC Vendor 1 Protocol 3
Version 4.1
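Both alert log checks above come down to finding the "IPC version" line and seeing whether it names RDS. A minimal sketch of that check follows; the function names are hypothetical, and the sample strings are copied from the two log excerpts above.

```python
import re

# Matches both forms shown above:
#   "cluster interconnect IPC version:Oracle RDS/IP (generic)"  (RDBMS)
#   "IPC version: Oracle RDS/IP (generic)"                      (cell)
IPC_VERSION = re.compile(r"IPC version:\s*(.+)")


def ipc_protocols(alert_log_text):
    """Return the IPC version strings found in alert log text, in order."""
    return [m.group(1).strip() for m in IPC_VERSION.finditer(alert_log_text)]


def uses_rds(alert_log_text):
    """True if the most recent IPC version line mentions RDS."""
    found = ipc_protocols(alert_log_text)
    return bool(found) and "RDS" in found[-1]


rdbms_sample = "cluster interconnect IPC version:Oracle RDS/IP (generic)\n"
cell_sample = (
    "CELL communication is configured to use 1 interface(s):\n"
    "192.168.10.13\n"
    "IPC version: Oracle RDS/IP (generic)\n"
    "IPC Vendor 1 Protocol 3\n"
    "Version 4.1\n"
)
print(uses_rds(rdbms_sample), uses_rds(cell_sample))
```

Looking at the last match rather than the first matters for a real alert log, which accumulates one such line per instance startup.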
The following ifconfig output shows that bondib0 is the bonded logical interface and that ib0 and ib1 are the two physical interfaces:
bondib0 Link encap:InfiniBand HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
inet addr:192.168.10.3 Bcast:192.168.11.255 Mask:255.255.252.0
inet6 addr: fe80::221:2800:1ef:f08d/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST MTU:31000 Metric:1
RX packets:30051124 errors:0 dropped:14978124 overruns:0 frame:0
TX packets:105715 errors:0 dropped:18 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1701216753 (1.5 GiB) TX bytes:24243537 (23.1 MiB)
ib0 Link encap:InfiniBand HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
UP BROADCAST RUNNING SLAVE MULTICAST MTU:31000 Metric:1
RX packets:15072433 errors:0 dropped:0 overruns:0 frame:0
TX packets:105715 errors:0 dropped:18 overruns:0 carrier:0
collisions:0 txqueuelen:1024
RX bytes:862410057 (822.4 MiB) TX bytes:24243537 (23.1 MiB)
ib1 Link encap:InfiniBand HWaddr 80:00:00:49:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
UP BROADCAST RUNNING SLAVE MULTICAST MTU:31000 Metric:1
RX packets:14977557 errors:0 dropped:14977557 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1024
RX bytes:838743192 (799.8 MiB) TX bytes:0 (0.0 b)
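The MASTER and SLAVE flags in this output are what identify the bond and its members. As a sketch, the roles can be pulled out of classic ifconfig text as shown below. The sample is abbreviated from the output above (hardware addresses shortened), and treating a slave with zero transmitted packets as the standby port is only a heuristic read off this particular sample.

```python
import re

HEADER = re.compile(r"^\s*(\S+)\s+Link encap:")


def bonding_roles(ifconfig_text):
    """Map each bonded interface to (role, tx_packets)."""
    blocks, current = {}, None
    for line in ifconfig_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            blocks[current] = []
        if current is not None:
            blocks[current].append(line)

    roles = {}
    for name, lines in blocks.items():
        body = " ".join(lines)
        tx = re.search(r"TX packets:(\d+)", body)
        if re.search(r"\bMASTER\b", body):
            role = "master"
        elif re.search(r"\bSLAVE\b", body):
            role = "slave"
        else:
            continue  # not part of a bond
        roles[name] = (role, int(tx.group(1)) if tx else 0)
    return roles


sample = """\
bondib0 Link encap:InfiniBand HWaddr 80:00:00:48:FE:80
UP BROADCAST RUNNING MASTER MULTICAST MTU:31000 Metric:1
TX packets:105715 errors:0 dropped:18 overruns:0 carrier:0
ib0 Link encap:InfiniBand HWaddr 80:00:00:48:FE:80
UP BROADCAST RUNNING SLAVE MULTICAST MTU:31000 Metric:1
TX packets:105715 errors:0 dropped:18 overruns:0 carrier:0
ib1 Link encap:InfiniBand HWaddr 80:00:00:49:FE:80
UP BROADCAST RUNNING SLAVE MULTICAST MTU:31000 Metric:1
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
"""
roles = bonding_roles(sample)
transmitting = [n for n, (role, tx) in roles.items() if role == "slave" and tx > 0]
print(roles)
print(transmitting)
```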
Starting with X4, the HCA is PCIe 3.0 based, and by default the two ports run in active/active mode (a theoretical one-way rate of 80 Gb/s). The bonding is implemented at the Linux kernel level.
ib0 Link encap:InfiniBand HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
inet addr:192.168.8.23 Bcast:192.168.11.255 Mask:255.255.252.0
inet6 addr: fe80::210:e000:174:f741/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1
RX packets:9065689 errors:0 dropped:0 overruns:0 frame:0
TX packets:5404221 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1024
RX bytes:1463023797 (1.3 GiB) TX bytes:982270189 (936.7 MiB)
ib1 Link encap:InfiniBand HWaddr 80:00:00:49:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
inet addr:192.168.8.24 Bcast:192.168.11.255 Mask:255.255.252.0
inet6 addr: fe80::210:e000:174:f742/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1
RX packets:4185633 errors:0 dropped:0 overruns:0 frame:0
TX packets:3897138 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1024
RX bytes:284527349 (271.3 MiB) TX bytes:601168462 (573.3 MiB)
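The two ifconfig listings differ in where the IP addresses live: on the bonded master in the first, on ib0 and ib1 directly in the second. A rough classifier built on that observation is sketched below. The function name is hypothetical, the samples are trimmed from the outputs above, and the rule is a heuristic rather than an official check.

```python
import re


def ib_port_mode(ifconfig_text):
    """Guess the InfiniBand port mode from classic ifconfig output."""
    has_ip, masters, current = set(), set(), None
    for line in ifconfig_text.splitlines():
        header = re.match(r"\s*(\S+)\s+Link encap:", line)
        if header:
            current = header.group(1)
        elif current and "inet addr:" in line:
            has_ip.add(current)
        elif current and re.search(r"\bMASTER\b", line):
            masters.add(current)
    if masters & has_ip:
        # The address sits on a bonded master, as with bondib0.
        return "active/passive"
    if {"ib0", "ib1"} <= has_ip:
        # Each physical port carries its own address.
        return "active/active"
    return "unknown"


passive_sample = """\
bondib0 Link encap:InfiniBand
inet addr:192.168.10.3 Bcast:192.168.11.255 Mask:255.255.252.0
UP BROADCAST RUNNING MASTER MULTICAST MTU:31000 Metric:1
ib0 Link encap:InfiniBand
UP BROADCAST RUNNING SLAVE MULTICAST MTU:31000 Metric:1
ib1 Link encap:InfiniBand
UP BROADCAST RUNNING SLAVE MULTICAST MTU:31000 Metric:1
"""
active_sample = """\
ib0 Link encap:InfiniBand
inet addr:192.168.8.23 Bcast:192.168.11.255 Mask:255.255.252.0
UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1
ib1 Link encap:InfiniBand
inet addr:192.168.8.24 Bcast:192.168.11.255 Mask:255.255.252.0
UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1
"""
print(ib_port_mode(passive_sample), ib_port_mode(active_sample))
```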
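Sometimes active/active mode has to be changed to active/passive. For example, when a new X4/X5 is interconnected with an existing X2/X3, the X4/X5 must be switched to active/passive mode. Note that only the compute nodes need to change; the storage nodes do not.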
How to change InfiniBand bonding on Exadata X4-2 from active/active to active/passive (Doc ID 1642955.1)
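InfiniBand can also connect external devices that support InfiniBand:
Exalogic - accesses Exadata over the SDP protocol
ZFS Storage Appliance - mounted on the Exadata compute nodes over NFS
If the Data Guard primary and standby databases both run on Exadata and the racks are physically interconnected, redo transport can also run over InfiniBand once a listener is configured on the InfiniBand network: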
Setup Listener on Infiniband Network using both SDP and TCP Protocol (Doc ID 1580584.1)
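Logs - /var/log/messages
Common commands
ibswitches - lists the InfiniBand switches in the fabric
ibhosts - lists all InfiniBand HCAs in the fabric
ibstat/ibstatus - shows the status of the local HCA and its ports
ibdiagnet - checks the state of devices in the fabric
iblinkinfo - shows the links in the fabric
ibnetdiscover - shows the configuration of devices in the fabric
ping/rds-ping - tests IPoIB/RDS connectivity
rds-info - shows RDS information and status counters
verify-topology - checks the fabric topology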
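Of these commands, rds-ping is the easiest to post-process, since it prints one line per probe in the form "1: 240 usec" (see the sample run later in this post). A small sketch for summarizing those timings follows; the function name is hypothetical and the sample values are the ones from that run.

```python
import re
import statistics

PROBE = re.compile(r"(\d+):\s+(\d+)\s+usec")


def rds_ping_latencies(output):
    """Extract the per-probe times, in microseconds, from rds-ping output."""
    return [int(usec) for _, usec in PROBE.findall(output)]


sample = "1: 240 usec 2: 214 usec 3: 201 usec 4: 199 usec 5: 269 usec"
latencies = rds_ping_latencies(sample)
print(min(latencies), max(latencies), statistics.mean(latencies))
```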
InfiniBand vocabulary
Before I talk more about InfiniBand, I thought it would be a good idea to have a blog space that collects all the terms and abbreviations we use: the InfiniBand jargon, so to speak. The first one is easy, the list is long, so let's keep it live and add more as we progress.
IB - InfiniBand
IBTA - InfiniBand Trade Association. They deserve the top position here as the founder and maintainer of the InfiniBand specifications since 1999.
OFED - OpenFabrics Enterprise Distribution. This is an open-source, community-driven software package for InfiniBand. The group is also active in interoperability, workshops, architectures, and protocol development.
OSI - Open Systems Interconnection. A standard for communication systems and modern day networking based on seven layers.
HCA - Host Channel Adapter. This is a piece of hardware installed in an end point that participates in an IB network. It is similar to the network interface card (NIC) you see in the Ethernet world. In the OSI model, this enables Layer 1.
MAC - Media Access Control. An entity is worthless without an identity. The MAC address provides the identity of an end point in the network at the hardware layer. In the OSI model, this enables Layer 2.
GUID - Globally Unique Identifier. To keep it simple, I would say this is the fixed hardware address of an end point participating in an IB network. Conceptually it is similar to a MAC address but is longer (64 bits).
LID - Local Identifier. This is a 16-bit address assigned dynamically to end points in an operational IB network. In the OSI model, this enables Layer 2 switching. You may wonder why, if MAC and GUID addresses are so similar, IB introduces another number called the LID. It is an IB design choice that gives sequential, simplified addressing within a network. Who would want to remember those long hex GUIDs :) On the flip side, the limitation is that a subnet cannot have more than 2^16 end points.
SLID - Source LID. In messaging sequence, this will be the originator of packet.
DLID - Destination LID. And this will be the destination of packet.
SM - Subnet Manager. This is a software component that takes care of IB network management. I will have more on this topic later, but in short, the SM is responsible for periodically monitoring the connected network for changes, assigning LIDs to end points, creating switching tables, managing quality of service (QoS), and so on.
IPoIB - Internet protocol over InfiniBand. We are moving up the OSI layered stack now. Remember, in my last blog post I mentioned that our socket based messaging still works in IB. IPoIB is the first step in enabling that. What it means is that we simply assign a Layer 3 IP address to an underlying IB device. The address can be IPv4 or IPv6.
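The 2^16 figure given for LIDs is an upper bound on the address space. As a sketch, and assuming the address ranges defined in the InfiniBand Architecture specification (unicast 0x0001 to 0xBFFF, multicast 0xC000 to 0xFFFE, 0x0000 reserved, 0xFFFF the permissive LID), none of which are stated in the list above, the usable unicast range is somewhat smaller:

```python
# LID address space, assuming the ranges from the InfiniBand
# Architecture specification (not stated in the glossary above).
LID_BITS = 16
UNICAST = range(0x0001, 0xBFFF + 1)
MULTICAST = range(0xC000, 0xFFFE + 1)


def lid_kind(lid):
    """Classify a 16-bit LID value."""
    if not 0 <= lid < 2 ** LID_BITS:
        raise ValueError("LID must fit in 16 bits")
    if lid in UNICAST:
        return "unicast"
    if lid in MULTICAST:
        return "multicast"
    return "reserved" if lid == 0 else "permissive"


print(2 ** LID_BITS, len(UNICAST), len(MULTICAST))
print(lid_kind(59), lid_kind(0xC000))
```

LID 59 is one of the switch LIDs that appears in the ibswitches output later in this post.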
When it comes to networks, we talk about numbers like 10/100/1000/10000 Mbps (Ethernet) and 11/54/150/300 Mbps (WiFi). These describe the signalling rate in bits. As a convention, I will use a lowercase 'b' for bits and a capital 'B' for bytes.
SDR - Single Data Rate. This is the baseline at 2.5 Gbps per lane.
DDR - Double Data Rate. The next level up from SDR, at 5.0 Gbps per lane.
QDR - Quad Data Rate. The next level up from DDR, at 10.0 Gbps per lane.
SFP - Small Form-Factor (Hot) Pluggable Transceiver. These are special connectors at the ends of the cables we use to connect network equipment. A '+' is added for the enhanced version, which is capable of 10 Gbps signalling rates.
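The rates above tie back to the 4X QDR HCA mentioned at the start: four lanes at 10 Gbps give the 40 Gb/s port rate. The sketch below works that out for each generation. The 8b/10b encoding factor is an assumption added here (it applies to SDR, DDR, and QDR links but is not mentioned in the list above), and the function name is illustrative.

```python
# Per-lane signalling rates from the list above, in Gb/s.
LANE_RATE_GBPS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0}

# Assumption: SDR, DDR and QDR links use 8b/10b encoding, so 8 of every
# 10 signalled bits carry data.
ENCODING_EFFICIENCY = 8 / 10


def link_rates(generation, lanes=4):
    """Return (signalling, data) rate in Gb/s for a link with `lanes` lanes."""
    signalling = LANE_RATE_GBPS[generation] * lanes
    return signalling, signalling * ENCODING_EFFICIENCY


for gen in LANE_RATE_GBPS:
    print(gen, link_rates(gen))
```

For a 4X QDR port this gives 40 Gb/s of signalling and roughly 32 Gb/s of data in each direction.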
[root@enkx3db01 ~]# ibhosts
Ca : 0x0010e00b4e20c000 ports 2 "SUN IB QDR GW switch enkbda1sw-ib2 192.168.8.150 Bridge 0"
Ca : 0x0010e00b4e20c040 ports 2 "SUN IB QDR GW switch enkbda1sw-ib2 192.168.8.150 Bridge 1"
Ca : 0x0021280001efdf70 ports 2 "enkbda1node10 BDA 192.168.12.110 HCA-1"
Ca : 0x0021280001efd5ee ports 2 "enkbda1node09 BDA 192.168.12.109 HCA-1"
Ca : 0x0021280001efd4ea ports 2 "enkbda1node12 BDA 192.168.12.112 HCA-1"
Ca : 0x0021280001efd4d6 ports 2 "enkbda1node11 BDA 192.168.12.111 HCA-1"
Ca : 0x0021280001efd5f6 ports 2 "enkbda1node14 BDA 192.168.12.114 HCA-1"
Ca : 0x0021280001efd4e6 ports 2 "enkbda1node13 BDA 192.168.12.113 HCA-1"
Ca : 0x0021280001ceda62 ports 2 "enkbda1node16 BDA 192.168.12.116 HCA-1"
Ca : 0x0021280001cf5abe ports 2 "enkbda1node15 BDA 192.168.12.115 HCA-1"
Ca : 0x0021280001efac6a ports 2 "enkbda1node18 BDA 192.168.12.118 HCA-1"
Ca : 0x0021280001efd4fa ports 2 "enkbda1node17 BDA 192.168.12.117 HCA-1"
Ca : 0x0021280001efdf68 ports 2 "enkbda1node08 BDA 192.168.12.108 HCA-1"
Ca : 0x0021280001efd5e6 ports 2 "enkbda1node07 BDA 192.168.12.107 HCA-1"
Ca : 0x0021280001efd606 ports 2 "enkbda1node05 BDA 192.168.12.105 HCA-1"
Ca : 0x0021280001efd4ee ports 2 "enkbda1node06 BDA 192.168.12.106 HCA-1"
Ca : 0x0021280001efd616 ports 2 "enkbda1node03 BDA 192.168.12.103 HCA-1"
Ca : 0x0021280001efdf98 ports 2 "enkbda1node04 BDA 192.168.12.104 HCA-1"
Ca : 0x0021280001efd84e ports 2 "enkbda1node01 BDA 192.168.12.101 HCA-1"
Ca : 0x0021280001efdf6c ports 2 "enkbda1node02 BDA 192.168.12.102 HCA-1"
Ca : 0x0010e00b88c0c000 ports 2 "SUN IB QDR GW switch enkbda1sw-ib3 192.168.8.151 Bridge 0"
Ca : 0x0010e00b88c0c040 ports 2 "SUN IB QDR GW switch enkbda1sw-ib3 192.168.8.151 Bridge 1"
Ca : 0x0021280001fcb9ec ports 2 "enkalytics EL-C 192.168.12.131 HCA-1"
Ca : 0x0021280001fc4a1e ports 2 "enkx3db02 S 192.168.12.2 HCA-1"
Ca : 0x0021280001fcbf5c ports 2 "enkx3cel03 C 192.168.12.5 HCA-1"
Ca : 0x0021280001fbe18e ports 2 "enkx3cel01 C 192.168.12.3 HCA-1"
Ca : 0x0021280001fc80c6 ports 2 "enkx3cel02 C 192.168.12.4 HCA-1"
Ca : 0x0010e0000128ce64 ports 2 "enkx3db01 S 192.168.12.1 HCA-1"

[root@enkx3db01 ~]# ibswitches
Switch : 0x002128f57326a0a0 ports 36 "SUN DCS 36P QDR enkbda1sw-ib1 192.168.8.149" enhanced port 0 lid 59 lmc 0
Switch : 0x0010e00b88c0c0a0 ports 36 "SUN IB QDR GW switch enkbda1sw-ib3 192.168.8.151" enhanced port 0 lid 61 lmc 0
Switch : 0x0010e00b4e20c0a0 ports 36 "SUN IB QDR GW switch enkbda1sw-ib2 192.168.8.150" enhanced port 0 lid 60 lmc 0
Switch : 0x002128f575bba0a0 ports 36 "SUN DCS 36P QDR enkx3sw-ib3.enkitec.com" enhanced port 0 lid 1 lmc 0
Switch : 0x002128f57469a0a0 ports 36 "SUN DCS 36P QDR enkx3sw-ib2.enkitec.com" enhanced port 0 lid 2 lmc 0

[root@enkalytics ~]# rds-ping -c 5 enkx3db01-ibvip.enkitec.com
1: 240 usec
2: 214 usec
3: 201 usec
4: 199 usec
5: 269 usec

[root@enkx3db01 ~]# rds-info
RDS IB Connections:
LocalAddr RemoteAddr LocalDev RemoteDev
192.168.12.31 192.168.12.131 fe80::10:e000:128:ce66 fe80::21:2800:1fc:b9ee
192.168.12.1 192.168.12.3 fe80::10:e000:128:ce66 fe80::21:2800:1fb:e18f
192.168.12.1 192.168.12.1 fe80::10:e000:128:ce66 fe80::10:e000:128:ce66
192.168.12.31 192.168.12.31 fe80::10:e000:128:ce66 fe80::10:e000:128:ce66
192.168.12.1 192.168.12.101 :: ::
169.254.87.194 169.254.87.194 fe80::10:e000:128:ce66 fe80::10:e000:128:ce66
192.168.12.1 192.168.12.118 :: ::
192.168.12.1 192.168.12.5 fe80::10:e000:128:ce66 fe80::21:2800:1fc:bf5e
192.168.12.1 192.168.12.4 fe80::10:e000:128:ce66 fe80::21:2800:1fc:80c8
192.168.12.1 192.168.12.2 fe80::10:e000:128:ce66 fe80::21:2800:1fc:4a20
192.168.12.31 192.168.12.2 fe80::10:e000:128:ce66 fe80::21:2800:1fc:4a20
169.254.87.194 169.254.97.245 fe80::10:e000:128:ce66 fe80::21:2800:1fc:4a20
rds-info: Unable get statistics: Protocol not available

Counters:
CounterName Value
conn_reset 2879033
recv_drop_bad_checksum 0
recv_drop_old_seq 17
recv_drop_no_sock 2985
recv_drop_dead_sock 0
recv_deliver_raced 0
recv_delivered 222260977
recv_queued 130604931
recv_immediate_retry 0
recv_delayed_retry 0
recv_ack_required 14752884
recv_rdma_bytes 136276672512
recv_ping 288786
send_queue_empty 85915668
send_queue_full 15
send_lock_contention 764917
send_lock_queue_raced 16222
send_immediate_retry 0
send_delayed_retry 1222
send_drop_acked 0
send_ack_required 12818936
send_queued 115370469
send_rdma 261202
send_rdma_bytes 136280842240
send_pong 288786
page_remainder_hit 102755106
page_remainder_miss 11378031
copy_to_user 165149391125
copy_from_user 126738139340
cong_update_queued 0
cong_update_received 49
cong_send_error 0
cong_send_blocked 0
ib_connect_raced 24
ib_listen_closed_stale 0
ib_evt_handler_call 278198961
ib_tasklet_call 278198961
ib_tx_cq_event 138358343
ib_tx_ring_full 1319
ib_tx_throttle 0
ib_tx_sg_mapping_failure 0
ib_tx_stalled 259
ib_tx_credit_updates 0
ib_rx_cq_event 172332234
ib_rx_ring_empty 83
ib_rx_refill_from_cq 0
ib_rx_refill_from_thread 0
ib_rx_alloc_limit 0
ib_rx_credit_updates 0
ib_ack_sent 14648252
ib_ack_send_failure 0
ib_ack_send_delayed 125532
ib_ack_send_piggybacked 73902
ib_ack_received 12960925
ib_rdma_mr_alloc 6355
ib_rdma_mr_free 5488
ib_rdma_mr_used 48938489
ib_rdma_mr_pool_flush 6438472
ib_rdma_mr_pool_wait 0
ib_rdma_mr_pool_depleted 0
ib_atomic_cswp 0
ib_atomic_fadd 0
iw_connect_raced 0
iw_listen_closed_stale 0
iw_tx_cq_call 0
iw_tx_cq_event 0
iw_tx_ring_full 0
iw_tx_throttle 0
iw_tx_sg_mapping_failure 0
iw_tx_stalled 0
iw_tx_credit_updates 0
iw_rx_cq_call 0
iw_rx_cq_event 0
iw_rx_ring_empty 0
iw_rx_refill_from_cq 0
iw_rx_refill_from_thread 0
iw_rx_alloc_limit 0
iw_rx_credit_updates 0
iw_ack_sent 0
iw_ack_send_failure 0
iw_ack_send_delayed 0
iw_ack_send_piggybacked 0
iw_ack_received 0
iw_rdma_mr_alloc 0
iw_rdma_mr_free 0
iw_rdma_mr_used 0
iw_rdma_mr_pool_flush 0
iw_rdma_mr_pool_wait 0
iw_rdma_mr_pool_depleted 0

RDS Sockets:
BoundAddr BPort ConnAddr CPort SndBuf RcvBuf Inode
192.168.12.1 7978 0.0.0.0 0 262144 2097152 1422468668
192.168.12.1 31215 0.0.0.0 0 262144 2097152 1422492506
192.168.12.1 7588 0.0.0.0 0 262144 2097152 1422492510
....
169.254.87.194 61081 0.0.0.0 0 131072 2097152 1531223507
192.168.12.1 16962 0.0.0.0 0 262144 2097152 1534922520
192.168.12.1 442 0.0.0.0 0 131072 2097152 1534922522
192.168.12.1 49167 0.0.0.0 0 262144 2097152 1539515072
192.168.12.1 48917 0.0.0.0 0 131072 2097152 1539515074
192.168.12.1 14675 0.0.0.0 0 262144 2097152 1539517764
192.168.12.1 12371 0.0.0.0 0 131072 2097152 1539517766
0.0.0.0 0 0.0.0.0 0 131072 2097152 1539617228

RDS Connections:
LocalAddr RemoteAddr NextTX NextRX Flg
192.168.12.31 192.168.12.131 13 13 --C
192.168.12.1 192.168.12.3 22473392 1304865 --C
192.168.12.1 192.168.12.1 2385972 139482 --C
192.168.12.31 192.168.12.31 9 9 --C
192.168.12.1 192.168.12.101 4 0 ---
169.254.87.194 169.254.87.194 492250 0 --C
192.168.12.1 192.168.12.118 119 0 ---
192.168.12.1 192.168.12.5 36615339 150138911 --C
192.168.12.1 192.168.12.4 17227536 60920516 --C
192.168.12.1 192.168.12.2 28714935 5551186 --C
192.168.12.31 192.168.12.2 287 287 --C
127.0.0.1 127.0.0.1 18895 18895 --C
169.254.87.194 169.254.97.245 7302242 1582910 --C

Receive Message Queue:
LocalAddr LPort RemoteAddr RPort Seq Bytes
192.168.12.1 22526 192.168.12.2 20819 3971282 168
192.168.12.1 22526 192.168.12.2 20819 4210130 168
192.168.12.1 22526 192.168.12.2 20819 5177334 168
192.168.12.1 22526 192.168.12.2 20819 5288457 168
192.168.12.1 44950 192.168.12.2 27716 4485037 168
192.168.12.1 44950 192.168.12.2 27716 4603330 168
192.168.12.1 44950 192.168.12.2 27716 4717860 168
....
169.254.87.194 61209 169.254.97.245 2286 1322929 168
169.254.87.194 61209 169.254.97.245 32997 1322939 168
192.168.12.1 62729 192.168.12.2 62458 5513836 168
192.168.12.1 62729 192.168.12.2 33848 5513844 168
192.168.12.1 62729 192.168.12.2 37522 5513850 168

Send Message Queue:
LocalAddr LPort RemoteAddr RPort Seq Bytes

Retransmit Message Queue:
LocalAddr LPort RemoteAddr RPort Seq Bytes
169.254.87.194 31175 169.254.87.194 42828 492248 156
169.254.87.194 31175 169.254.87.194 42828 492249 156
169.254.87.194 104 169.254.97.245 27039 7302241 252

[root@enkx3db01 ~]# rds-stress
waiting for incoming connection on 0.0.0.0:4000

[root@enkalytics ~]# rds-stress -s enkx3db01-ibvip.enkitec.com -p 4000 -t 1 -D 600000
connecting to 192.168.12.31:4000
negotiated options, tasks will start in 2 seconds
Starting up....
tsks tx/s rx/s tx+rx K/s mbi K/s mbo K/s tx us/c rtt us cpu %
1 1668 1668 3531.12 977247.98 977247.98 31.57 558.62 -1.00
1 1652 1653 3497.93 968354.24 967768.42 34.82 565.02 -1.00
1 1673 1673 3541.62 980153.86 980153.86 35.05 556.53 -1.00
1 1682 1682 3560.71 985437.49 985437.49 28.11 555.55 -1.00
1 1673 1673 3541.71 980179.34 980179.34 29.50 558.08 -1.00
1 1663 1663 3520.50 974308.84 974308.84 34.43 560.88 -1.00
1 1692 1692 3581.88 991294.23 991294.23 30.13 552.21 -1.00
1 1681 1681 3558.60 984852.60 984852.60 29.30 555.98 -1.00
1 1666 1666 3526.84 976063.53 976063.53 34.09 560.13 -1.00
1 1678 1678 3552.24 983093.02 983093.02 31.21 556.31 -1.00
1 1726 1726 3653.88 1011220.94 1011220.94 31.72 538.88 -1.00
1 1678 1678 3552.26 983097.93 983097.93 29.29 557.05 -1.00

[root@enkx3db01 ~]# rds-stress
waiting for incoming connection on 0.0.0.0:4000
accepted connection from 192.168.12.131:19942 on 192.168.12.31:4000
negotiated options, tasks will start in 2 seconds
Starting up....
tsks tx/s rx/s tx+rx K/s mbi K/s mbo K/s tx us/c rtt us cpu %
1 1670 1670 3531.99 977489.26 977489.26 15.12 573.84 -1.00
1 1654 1653 3496.99 967509.78 968095.08 15.71 578.94 -1.00
1 1675 1675 3542.75 981052.16 979881.45 14.75 570.90 -1.00
1 1683 1683 3559.70 984572.15 985742.86 15.76 568.26 -1.00
1 1674 1674 3540.45 979829.57 979829.57 15.93 570.79 -1.00
1 1666 1666 3523.46 975127.51 975127.51 15.21 575.50 -1.00
1 1692 1692 3581.02 991057.40 991057.40 16.02 566.28 -1.00
1 1681 1682 3557.17 984749.23 984163.76 15.50 569.27 -1.00
1 1667 1667 3526.36 975931.20 975931.20 15.12 574.15 -1.00
1 1681 1680 3554.11 983903.24 983317.93 15.85 569.99 -1.00
1 1728 1728 3654.66 1011436.98 1011436.98 16.37 556.93 -1.00
1 1678 1678 3551.18 982799.19 982799.19 15.53 571.33 -1.00
---------------------------------------------
1 1677 1677 3551.65 982954.38 982905.59 15.59 570.97 -1.00 (average)

[enkx3db01:oracle:dbm1] /home/oracle
> srvctl status vip -i enkx3db01-ibvip
VIP enkx3db01-ibvip is enabled
VIP enkx3db01-ibvip is running on node: enkx3db01

[enkx3db01:oracle:dbm1] /home/oracle
> srvctl config listener -l LISTENER_IB
Name: LISTENER_IB
Network: 2, Owner: oracle
Home:
End points: TCP:1522/SDP:1522

[enkx3db01:oracle:+ASM1] /home/oracle
> lsnrctl status LISTENER_IB
LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 09-AUG-2013 21:52:19
Copyright (c) 1991, 2011, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_IB)))
STATUS of the LISTENER
------------------------
Alias LISTENER_IB
Version TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date 24-JUL-2013 11:36:40
Uptime 16 days 10 hr. 15 min. 38 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/11.2.0.3/grid/network/admin/listener.ora
Listener Log File /u01/app/11.2.0.3/grid/log/diag/tnslsnr/enkx3db01/listener_ib/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_IB)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=sdp)(HOST=192.168.12.31)(PORT=1522)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.12.31)(PORT=1522)))
Services Summary...
Service "DBM_ETL" has 1 instance(s).
Instance "dbm1", status READY, has 1 handler(s) for this service...
Service "DBM_REPORTING" has 1 instance(s).
Instance "dbm1", status READY, has 1 handler(s) for this service...
Service "dbm" has 1 instance(s).
Instance "dbm1", status READY, has 1 handler(s) for this service...
The command completed successfully

[root@enkalytics ~]# sdpnetstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 localhost.localdom:6700 localhost.localdo:35791 ESTABLISHED
tcp 0 0 enkalytics.Enkitec:9710 enkalytics.Enkite:42991 ESTABLISHED
tcp 0 0 enkalytics.Enkitec:9706 enkalytics.Enkite:11093 ESTABLISHED
tcp 0 0 enkalytics.Enkitec:9701 enkalytics.Enkite:16776 ESTABLISHED
tcp 0 0 localhost.localdom:6700 localhost.localdo:35793 ESTABLISHED
tcp 0 0 enkalytics.Enkitec:9710 enkalytics.Enkite:10002 ESTABLISHED
tcp 0 0 localhost.localdom:6700 localhost.localdo:35790 ESTABLISHED
....
....
tcp 0 0 localhost.localdo:35791 localhost.localdom:6700 ESTABLISHED
tcp 0 0 localhost.localdom:6700 localhost.localdo:35798 ESTABLISHED
tcp 0 0 enkalytics.Enkite:11093 enkalytics.Enkitec:9706 ESTABLISHED
tcp 0 0 localhost.localdom:6700 localhost.localdo:35797 ESTABLISHED
tcp 0 0 localhost.localdo:35793 localhost.localdom:6700 ESTABLISHED
tcp 0 0 enkalytics.Enkite:60565 enkalytics.Enkitec:9710 TIME_WAIT
tcp 0 0 enkalytics.enkite:36136 enk03-vip.enki:ncube-lm ESTABLISHED
tcp 0 0 enkalytics.enkite:21666 enkalytic:afs3-callback ESTABLISHED
tcp 0 0 enkalytics.enkitec:9704 enkalytics.enkite:52854 TIME_WAIT
tcp 0 0 enkalytics.enkitec:9704 enkalytics.enkite:52849 TIME_WAIT
tcp 0 0 enkalytics.enkite:25478 enk04-vip.enki:ncube-lm ESTABLISHED
tcp 0 0 enkalytic:afs3-callback enkalytics.enkite:11226 ESTABLISHED
tcp 0 0 enkalytics.enkite:44861 enkalytics.enkitec:9704 ESTABLISHED
tcp 0 0 enkalytics.enkitec:9704 enkalytics.enkite:52850 TIME_WAIT
tcp 0 0 enkalytics.enkite:44867 enkalytics.enkitec:9704 ESTABLISHED
tcp 0 0 enkalytics.enkitec:9704 enkalytics.enkite:52846 TIME_WAIT
tcp 0 0 enkalytics.enkitec:9704 enkalytics.enkite:44867 ESTABLISHED
tcp 0 0 localhost.localdo:35797 localhost.localdom:6700 ESTABLISHED
tcp 0 0 enkalytics.enkite:11226 enkalytic:afs3-callback ESTABLISHED
tcp 0 0 enkalytics.enkitec:9704 enkalytics.enkite:44861 ESTABLISHED
tcp 0 0 enkalytic:afs3-callback enkalytics.enkite:11210 ESTABLISHED
sdp 0 0 192.168.12.131:43307 enkx3db01-ib:ricardo-lm ESTABLISHED
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags Type State I-Node Path
unix 20 [ ] DGRAM 24350 /dev/log
unix 2 [ ] DGRAM 7979 @/org/kernel/udev/udevd
unix 2 [ ] DGRAM 27732 @/org/freedesktop/hal/udev_event
unix 2 [ ] DGRAM 4592850
unix 2 [ ] STREAM CONNECTED 1138317
unix 2 [ ] STREAM CONNECTED 1104247
unix 2 [ ] STREAM CONNECTED 1099975
unix 2 [ ] STREAM CONNECTED 1099714
unix 3 [ ] STREAM CONNECTED 971435 /var/run/dbus/system_bus_socket
unix 3 [ ] STREAM CONNECTED 971434
unix 2 [ ] DGRAM 478947
unix 3 [ ] STREAM CONNECTED 29367 @/tmp/fam-root-
unix 3 [ ] STREAM CONNECTED 29366
unix 3 [ ] STREAM CONNECTED 29353 /var/run/dbus/system_bus_socket
unix 3 [ ] STREAM CONNECTED 29352
unix 3 [ ] STREAM CONNECTED 29170 /var/run/dbus/system_bus_socket
unix 3 [ ] STREAM CONNECTED 29169
unix 3 [ ] STREAM CONNECTED 29164
unix 3 [ ] STREAM CONNECTED 29163
unix 2 [ ] DGRAM 29161
unix 2 [ ] DGRAM 28889
....
....
unix 2 [ ] DGRAM 27890
unix 3 [ ] STREAM CONNECTED 27865 /var/run/dbus/system_bus_socket
unix 3 [ ] STREAM CONNECTED 27864
unix 3 [ ] STREAM CONNECTED 27813 @/var/run/hald/dbus-aeLDAYiwqS
unix 3 [ ] STREAM CONNECTED 27812
unix 3 [ ] STREAM CONNECTED 27798 @/var/run/hald/dbus-aeLDAYiwqS
unix 3 [ ] STREAM CONNECTED 27797
unix 3 [ ] STREAM CONNECTED 27783 @/var/run/hald/dbus-aeLDAYiwqS
unix 3 [ ] STREAM CONNECTED 27782
unix 3 [ ] STREAM CONNECTED 27766 /var/run/acpid.socket
unix 3 [ ] STREAM CONNECTED 27765
unix 3 [ ] STREAM CONNECTED 27760 @/var/run/hald/dbus-aeLDAYiwqS
unix 3 [ ] STREAM CONNECTED 27759
unix 3 [ ] STREAM CONNECTED 27727 @/var/run/hald/dbus-0e5V2Tfgxi
unix 3 [ ] STREAM CONNECTED 27726
unix 2 [ ] DGRAM 27562
unix 3 [ ] STREAM CONNECTED 27445 /var/run/dbus/system_bus_socket
unix 3 [ ] STREAM CONNECTED 27444
unix 2 [ ] DGRAM 27433
unix 2 [ ] DGRAM 27422
unix 3 [ ] STREAM CONNECTED 27381
unix 3 [ ] STREAM CONNECTED 27380
unix 3 [ ] STREAM CONNECTED 27339
unix 3 [ ] STREAM CONNECTED 27338
unix 2 [ ] DGRAM 26918
unix 2 [ ] DGRAM 24358
unix 3 [ ] STREAM CONNECTED 24299
unix 3 [ ] STREAM CONNECTED 24298

[root@enkalytics ~]# ifconfig
bond0 Link encap:InfiniBand HWaddr 80:00:00:4B:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
inet addr:192.168.12.131 Bcast:192.168.12.255 Mask:255.255.255.0
inet6 addr: fe80::221:2800:1fc:b9ed/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST MTU:65520 Metric:1
RX packets:102393 errors:0 dropped:0 overruns:0 frame:0
TX packets:133607 errors:0 dropped:16 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:6058691 (5.7 MiB) TX bytes:4830507 (4.6 MiB)
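The ibhosts listing in the session above has a regular shape, one channel adapter per entry with a GUID, a port count, and a quoted description, so it lends itself to a quick inventory. The sketch below parses that shape with a regular expression. The function name is hypothetical, and the three sample entries are copied from the listing.

```python
import re

# One entry per channel adapter, in the shape printed by ibhosts:
#   Ca : 0x0021280001fc4a1e ports 2 "enkx3db02 S 192.168.12.2 HCA-1"
CA_ENTRY = re.compile(r'Ca\s*:\s*(0x[0-9a-fA-F]{16})\s+ports\s+(\d+)\s+"([^"]*)"')


def parse_ibhosts(output):
    """Return a list of (guid, port_count, description) tuples."""
    return [(guid, int(ports), desc) for guid, ports, desc in CA_ENTRY.findall(output)]


sample = (
    'Ca : 0x0021280001fc4a1e ports 2 "enkx3db02 S 192.168.12.2 HCA-1" '
    'Ca : 0x0021280001fcbf5c ports 2 "enkx3cel03 C 192.168.12.5 HCA-1" '
    'Ca : 0x0010e00b4e20c000 ports 2 '
    '"SUN IB QDR GW switch enkbda1sw-ib2 192.168.8.150 Bridge 0"'
)
hosts = parse_ibhosts(sample)
# Index by the first word of the description, which is the host name
# for the compute and storage nodes in this listing.
by_name = {desc.split()[0]: guid for guid, _, desc in hosts}
print(len(hosts), by_name["enkx3db02"])
```

Because the pattern does not depend on line breaks, it works on the output whether it is laid out one entry per line or run together on a single line.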
Reference
https://blogs.oracle.com/ExadataCN/entry/exadata_%E4%B8%8A_infiniband_%E7%BD%91%E7%BB%9C%E7%AE%80%E4%BB%8B
https://blogs.oracle.com/networking/entry/infiniband_vocabulary
http://www.infinibandta.org/
http://downloads.openfabrics.org/downloads/
https://weidongzhou.wordpress.com/2013/08/09/