This note attempts to clarify the CLUSTER_INTERCONNECTS parameter and the platforms
on which it has been implemented. A brief explanation of how the parameter works is
also presented. This is one of the most frequently asked questions related to cluster
and RAC installations at most sites and forms part of the installation prerequisites.

ORACLE 9I RAC – Parameter CLUSTER_INTERCONNECTS
-----------------------------------------------

FREQUENTLY ASKED QUESTIONS
--------------------------
November 2002

CONTENTS
--------
1.  What is the parameter CLUSTER_INTERCONNECTS for ?
2.  Is the parameter CLUSTER_INTERCONNECTS available for all platforms ?
3.  How is the interconnect recognized on Linux ?
4.  Where can I find more information on this parameter ?
5.  How can I detect which interconnect is used ?
6.  CLUSTER_INTERCONNECTS is mentioned in the 9i RAC administration guide as a
    Solaris-specific parameter; is this the only platform where this parameter is available ?
7.  Are there any side effects of this parameter, i.e. does it affect normal operations ?
8.  Is the parameter OPS_INTERCONNECTS, which was available in 8i, similar to this parameter ?
9.  Does CLUSTER_INTERCONNECTS allow failover from one interconnect to another interconnect ?
10. Is the size of messages limited on the interconnect ?
11. How can you see which protocol is being used by the instances ?
12. Can the parameter CLUSTER_INTERCONNECTS be changed dynamically during runtime ?
QUESTIONS & ANSWERS
-------------------

1. What is the parameter CLUSTER_INTERCONNECTS for ?

Answer
------
This parameter is used to influence the selection of the network interface for
Global Cache Service (GCS) and Global Enqueue Service (GES) processing.

This note does not compare the other elements of 8i OPS with 9i RAC because of
substantial differences in the behaviour of the two architectures. Oracle 9i RAC
has certain optimizations which attempt to transfer most of the required
information via the interconnects so that the number of disk reads is minimized.
This behaviour, known as Cache Fusion phase 2, is summarised in Note 139436.1.
The interconnect is a private network used to transfer the cluster traffic,
the Oracle resource directory information and the blocks needed to satisfy
queries; the technical term for this is Cache Fusion.

CLUSTER_INTERCONNECTS should be used when
- you want to override the default network selection
- the bandwidth of a single interconnect does not meet the bandwidth requirements
  of a Real Application Clusters database

The syntax of the parameter is:

  CLUSTER_INTERCONNECTS = if1:if2:…:ifn

where each if is an IP address in standard dotted-decimal format, for example
144.25.16.214. Subsequent platform implementations may specify interconnects
with different syntaxes.
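As an illustration, a hypothetical pfile (init.ora) entry that directs GCS/GES
traffic over two private networks; the addresses are placeholders only, and each
instance would normally get its own node-specific value:

  # hypothetical init.ora entry; quotes are used here because of the colons
  cluster_interconnects = "192.168.10.1:192.168.20.1"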
2. Is the parameter CLUSTER_INTERCONNECTS available for all platforms ?

Answer
------
This parameter is configurable on most platforms. It cannot be used on Linux
(but see question 3 for the behaviour introduced in 9.2.0.8).

The following matrix shows when the parameter was introduced on which platform:

  Operating System    Available since
  ----------------    ---------------
  AIX                 9.2.0
  HP-UX               9.0.1
  HP Tru64            9.0.1
  HP OpenVMS          9.0.1
  Sun Solaris         9.0.1

References
----------
Bug <2119403> ORACLE9I RAC ADMINISTRATION SAYS CLUSTER_INTERCONNECTS IS SOLARIS ONLY
Bug <2359300> ENHANCE CLUSTER_INTERCONNECTS TO WORK WITH 9I RAC ON IBM
3. How is the interconnect recognized on Linux ?

Answer
------
Since Oracle9i 9.2.0.8, CLUSTER_INTERCONNECTS can be used to change the
interconnect. A patch is also available for 9.2.0.7 under Patch 4751660.
Before 9.2.0.8, the Oracle implementation for the interface selection reads the
'private hostname' in the cmcfg.ora file and uses the corresponding IP address
for the interconnect. If no private hostname is available, the public hostname
is used.
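For reference, a sketch of the relevant Cluster Manager configuration, assuming a
standard 9i oracm setup on Linux; the file location, host names and entries below
are illustrative placeholders, not taken from this note:

  # $ORACLE_HOME/oracm/admin/cmcfg.ora  (hypothetical excerpt)
  HostName=racnode1
  PublicNodeNames=racnode1 racnode2
  PrivateNodeNames=racnode1-priv racnode2-priv

The addresses that racnode1-priv and racnode2-priv resolve to (for example via
/etc/hosts) would then be the ones used for the interconnect.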
4. Where can I find more information on this parameter ?

Answer
------
The parameter is documented in the following books:

  Oracle9i Database Reference Release 2 (9.2)
  Oracle9i Release 1 (9.0.1) New Features in Oracle9i Database Reference
    - What's New in Oracle9i Database Reference?
  Oracle9i Real Application Clusters Administration Release 2 (9.2)
  Oracle9i Real Application Clusters Deployment and Performance Release 2 (9.2)

Port-specific documentation may also contain information about the usage of the
cluster_interconnects parameter. The documentation can be viewed on the Oracle
documentation web site.

References
----------
Note 162725.1: OPS/RAC VMS: Using alternate TCP Interconnects on 8i OPS and 9i RAC on OpenVMS
Note 151051.1: Init.ora Parameter "CLUSTER_INTERCONNECTS" Reference Note
5. How can I detect which interconnect is used ?

Answer
------
The following commands show which interconnect is used for UDP or TCP:

  sqlplus> connect / as sysdba
           oradebug setmypid
           oradebug ipc
           exit

The corresponding trace file can be found in the user_dump_dest directory and,
for example, contains the following information in its last couple of lines:

  SKGXPCTX: 0x32911a8 ctx
  admno 0x12f7150d admport:
  SSKGXPT 0x3291db8 flags SSKGXPT_READPENDING
          info for network 0
          socket no 9   IP 172.16.193.1   UDP 43307
          sflags SSKGXPT_WRITESSKGXPT_UP
          info for network 1
          socket no 0   IP 0.0.0.0        UDP 0
          sflags SSKGXPT_DOWN
  context timestamp 0x1ca5
          no ports

Please note that on some platforms and versions (for example Oracle9i 9.2.0.1 on
Windows) you might see an ORA-70 because the command oradebug ipc has not been
implemented there. When other protocols such as LLT, HMP or RDG are used, the
trace file will not reveal an IP address.
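One way to pull the relevant lines out of the newest trace file after running
oradebug ipc; the directory below is a placeholder for the value of
user_dump_dest in your environment:

  $ cd /u01/app/oracle/admin/RAC1/udump      # value shown by: show parameter user_dump_dest
  $ grep "IP " $(ls -t *.trc | head -1)      # newest trace file, lines containing the IP addresses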
6. CLUSTER_INTERCONNECTS is mentioned in the 9i RAC administration guide as a
   Solaris-specific parameter; is this the only platform where this parameter is available ?

Answer
------
The information that this parameter works on Solaris only is incorrect. Please
check the answer to question 2 for the complete list of platforms.

References
----------
Bug <2119403> ORACLE9I RAC ADMINISTRATION SAYS CLUSTER_INTERCONNECTS IS SOLARIS ONLY

7. Are there any side effects of this parameter, i.e. does it affect normal operations ?
Answer
------
When you set CLUSTER_INTERCONNECTS in cluster configurations, the interconnect
high-availability features are not available. In other words, an interconnect
failure that would normally go unnoticed will instead cause an Oracle cluster
failure, because Oracle keeps attempting to access the network interface that
has gone down. With this parameter you are explicitly specifying the interface
or list of interfaces to be used.

8. Is the parameter OPS_INTERCONNECTS, which was available in 8i, similar to this parameter ?

Answer
------
Yes, the parameter OPS_INTERCONNECTS was used to influence the network selection
for Oracle 8i Parallel Server.

Reference
---------
Note <120650.1> Init.ora Parameter "OPS_INTERCONNECTS" Reference Note
9. Does CLUSTER_INTERCONNECTS allow failover from one interconnect to another interconnect ?

Answer
------
Failover capability is not implemented at the Oracle level. In general this
functionality is delivered by the hardware and/or the operating system software.
For platform details, please see the Oracle platform-specific documentation and
the operating system documentation.
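For illustration, one common operating-system-level way to provide interconnect
redundancy on Linux is NIC bonding. The sketch below is an assumption for a Red Hat
style system; device names, addresses and the bonding mode are placeholders and not
part of this note, so check your platform documentation for the supported setup:

  # /etc/sysconfig/network-scripts/ifcfg-bond0   (hypothetical bonded private interface)
  DEVICE=bond0
  IPADDR=192.168.10.1
  NETMASK=255.255.255.0
  ONBOOT=yes
  BOOTPROTO=none
  BONDING_OPTS="mode=active-backup miimon=100"

  # /etc/sysconfig/network-scripts/ifcfg-eth2    (one physical slave; ifcfg-eth3 is analogous)
  DEVICE=eth2
  MASTER=bond0
  SLAVE=yes
  ONBOOT=yes
  BOOTPROTO=none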
10. Is the size of messages limited on the interconnect ?

Answer
------
The message size depends on the protocol and the platform.

UDP: In Oracle9i Release 2 (9.2.0.1) the message size for UDP was limited to 32K.
     Oracle9i 9.2.0.2 allows larger UDP message sizes, depending on the platform.
     To increase throughput on an interconnect you have to adjust the UDP kernel
     parameters (see the sketch after this answer).
TCP: There is no need to set the message size for TCP.
RDG: The recommendations for RDG are documented in the Oracle9i Administrator's
     Reference – Part No. A97297-01.

References
----------
Bug <2475236> RAC multiblock read performance issue using UDP IPC
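As an illustration only, on Linux the UDP socket buffer limits can be raised with
sysctl. The values below are placeholders; the exact parameters and the recommended
values are platform- and version-specific and should be taken from the port
documentation:

  # hypothetical values; consult the platform-specific documentation for your release
  sysctl -w net.core.rmem_default=262144
  sysctl -w net.core.rmem_max=262144
  sysctl -w net.core.wmem_default=262144
  sysctl -w net.core.wmem_max=262144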
11. How can you see which protocol is being used by the instances ?

Answer
------
Please check the alert file(s) of your RAC instances. During startup you will
find a message in the alert file that shows the protocol being used, for example:

  Wed Oct 30 05:28:55 2002
  cluster interconnect IPC version:Oracle UDP/IP with Sun RSM disabled
  IPC Vendor 1 proto 2 Version 1.0
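For example, a quick way to pull this line out of the alert file; the directory and
file name are placeholders for your environment:

  $ cd $ORACLE_BASE/admin/ORCL/bdump                        # background_dump_dest (placeholder path)
  $ grep -i "cluster interconnect IPC version" alert_ORCL1.log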
12. Can the parameter CLUSTER_INTERCONNECTS be changed dynamically during runtime ?

Answer
------
No. CLUSTER_INTERCONNECTS is a static parameter and can only be set in the spfile
or pfile (init.ora). A change takes effect at the next instance startup.
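For example, a sketch of setting a per-instance value in the spfile; the addresses
and instance names are placeholders, and the same alter system syntax appears in the
12c example later in this note:

  SQL> alter system set cluster_interconnects = "192.168.10.1" scope=spfile sid='RAC1';
  SQL> alter system set cluster_interconnects = "192.168.10.2" scope=spfile sid='RAC2';
  SQL> -- restart the instances for the change to take effect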
In an Oracle RAC environment, Cache Fusion between the RAC instances normally uses the
Clusterware private interconnect. Since 11.2.0.2 this is usually based on HAIP, which
increases bandwidth (up to 4 interconnect networks) while also providing fault tolerance
for the interconnect: with 4 interconnect networks on a RAC node, for example, 3 of them
can fail at the same time without bringing down Oracle RAC or the Clusterware.

However, when several databases are deployed in one RAC environment, the Cache Fusion
traffic of the different database instances interferes with each other; some databases
need more interconnect bandwidth, some less. To keep the interconnect traffic of multiple
databases in the same RAC environment apart, Oracle provides the cluster_interconnects
parameter at the database level. The parameter overrides the default interconnect network
and makes the instance use the specified network(s) for Cache Fusion, but it provides no
fault tolerance. The following test demonstrates this.

Oracle RAC environment: 12.1.0.2.0 standard cluster on Oracle Linux 5.9 x64.

1. Network configuration

Node 1:
[root@rhel1 ~]# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:50:56:A8:16:15    <<<< eth0: management network
          inet addr:172.168.4.20  Bcast:172.168.4.255  Mask:255.255.255.0  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13701 errors:0 dropped:522 overruns:0 frame:0  TX packets:3852 errors:0 dropped:0 overruns:0 carrier:0  collisions:0 txqueuelen:1000  RX bytes:1122408 (1.0 MiB)  TX bytes:468021 (457.0 KiB)
eth1      Link encap:Ethernet  HWaddr 00:50:56:A8:25:6B    <<<< eth1: public network
          inet addr:10.168.4.20  Bcast:10.168.4.255  Mask:255.255.255.0  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:23074 errors:0 dropped:520 overruns:0 frame:0  TX packets:7779 errors:0 dropped:0 overruns:0 carrier:0  collisions:0 txqueuelen:1000  RX bytes:15974971 (15.2 MiB)  TX bytes:2980403 (2.8 MiB)
eth1:1    Link encap:Ethernet  HWaddr 00:50:56:A8:25:6B
          inet addr:10.168.4.22  Bcast:10.168.4.255  Mask:255.255.255.0  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
eth1:2    Link encap:Ethernet  HWaddr 00:50:56:A8:25:6B
          inet addr:10.168.4.24  Bcast:10.168.4.255  Mask:255.255.255.0  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
eth2      Link encap:Ethernet  HWaddr 00:50:56:A8:21:0A    <<<< eth2: interconnect network, one of the Clusterware HAIP interfaces
          inet addr:10.0.1.20  Bcast:10.0.1.255  Mask:255.255.255.0  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:11322 errors:0 dropped:500 overruns:0 frame:0  TX packets:10279 errors:0 dropped:0 overruns:0 carrier:0  collisions:0 txqueuelen:1000  RX bytes:6765147 (6.4 MiB)  TX bytes:5384321 (5.1 MiB)
eth2:1    Link encap:Ethernet  HWaddr 00:50:56:A8:21:0A
          inet addr:169.254.10.239  Bcast:169.254.127.255  Mask:255.255.128.0  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
eth3      Link encap:Ethernet  HWaddr 00:50:56:A8:F7:F7    <<<< eth3: interconnect network, one of the Clusterware HAIP interfaces
          inet addr:10.0.2.20  Bcast:10.0.2.255  Mask:255.255.255.0  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:347096 errors:0 dropped:500 overruns:0 frame:0  TX packets:306170 errors:0 dropped:0 overruns:0 carrier:0  collisions:0 txqueuelen:1000  RX bytes:210885992 (201.1 MiB)  TX bytes:173504069 (165.4 MiB)
eth3:1    Link encap:Ethernet  HWaddr 00:50:56:A8:F7:F7
          inet addr:169.254.245.28  Bcast:169.254.255.255  Mask:255.255.128.0  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
eth4      Link encap:Ethernet  HWaddr 00:50:56:A8:DC:CC    <<<< eth4 to eth9: interconnect networks, but not part of Clusterware HAIP
          inet addr:10.0.3.20  Bcast:10.0.3.255  Mask:255.255.255.0  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7247 errors:0 dropped:478 overruns:0 frame:0  TX packets:6048 errors:0 dropped:0 overruns:0 carrier:0  collisions:0 txqueuelen:1000  RX bytes:3525191 (3.3 MiB)  TX bytes:2754275 (2.6 MiB)
eth5      Link encap:Ethernet  HWaddr 00:50:56:A8:A1:86
          inet addr:10.0.4.20  Bcast:10.0.4.255  Mask:255.255.255.0  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:40028 errors:0 dropped:480 overruns:0 frame:0  TX packets:23700 errors:0 dropped:0 overruns:0 carrier:0  collisions:0 txqueuelen:1000  RX bytes:15139172 (14.4 MiB)  TX bytes:9318750 (8.8 MiB)
eth6      Link encap:Ethernet  HWaddr 00:50:56:A8:F7:53
          inet addr:10.0.5.20  Bcast:10.0.5.255  Mask:255.255.255.0  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13324 errors:0 dropped:470 overruns:0 frame:0  TX packets:128 errors:0 dropped:0 overruns:0 carrier:0  collisions:0 txqueuelen:1000  RX bytes:1075873 (1.0 MiB)  TX bytes:16151 (15.7 KiB)
eth7      Link encap:Ethernet  HWaddr 00:50:56:A8:E4:78
          inet addr:10.0.6.20  Bcast:10.0.6.255  Mask:255.255.255.0  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13504 errors:0 dropped:457 overruns:0 frame:0  TX packets:120 errors:0 dropped:0 overruns:0 carrier:0  collisions:0 txqueuelen:1000  RX bytes:1158553 (1.1 MiB)  TX bytes:14643 (14.2 KiB)
eth8      Link encap:Ethernet  HWaddr 00:50:56:A8:C0:B0
          inet addr:10.0.7.20  Bcast:10.0.7.255  Mask:255.255.255.0  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13272 errors:0 dropped:442 overruns:0 frame:0  TX packets:126 errors:0 dropped:0 overruns:0 carrier:0  collisions:0 txqueuelen:1000  RX bytes:1072609 (1.0 MiB)  TX bytes:15999 (15.6 KiB)
eth9      Link encap:Ethernet  HWaddr 00:50:56:A8:5E:F6
          inet addr:10.0.8.20  Bcast:10.0.8.255  Mask:255.255.255.0  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:14316 errors:0 dropped:431 overruns:0 frame:0  TX packets:127 errors:0 dropped:0 overruns:0 carrier:0  collisions:0 txqueuelen:1000  RX bytes:1169023 (1.1 MiB)  TX bytes:15293 (14.9 KiB)

Node 2:
[root@rhel2 ~]# ifconfig -a    <<<< the network configuration matches node 1
eth0      Link encap:Ethernet  HWaddr 00:50:56:A8:C2:66
          inet addr:172.168.4.21  Bcast:172.168.4.255  Mask:255.255.255.0  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:19156 errors:0 dropped:530 overruns:0 frame:0  TX packets:278 errors:0 dropped:0 overruns:0 carrier:0  collisions:0 txqueuelen:1000  RX bytes:4628107 (4.4 MiB)  TX bytes:37558 (36.6 KiB)
eth1      Link encap:Ethernet  HWaddr 00:50:56:A8:18:1A
          inet addr:10.168.4.21  Bcast:10.168.4.255  Mask:255.255.255.0  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:21732 errors:0 dropped:531 overruns:0 frame:0  TX packets:7918 errors:0 dropped:0 overruns:0 carrier:0  collisions:0 txqueuelen:1000  RX bytes:4110335 (3.9 MiB)  TX bytes:14783715 (14.0 MiB)
eth1:2    Link encap:Ethernet  HWaddr 00:50:56:A8:18:1A
          inet addr:10.168.4.23  Bcast:10.168.4.255  Mask:255.255.255.0  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
eth2      Link encap:Ethernet  HWaddr 00:50:56:A8:1B:DD
          inet addr:10.0.1.21  Bcast:10.0.1.255  Mask:255.255.255.0  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:410244 errors:0 dropped:524 overruns:0 frame:0  TX packets:433865 errors:0 dropped:0 overruns:0 carrier:0  collisions:0 txqueuelen:1000  RX bytes:206461212 (196.8 MiB)  TX bytes:283858870 (270.7 MiB)
eth2:1    Link encap:Ethernet  HWaddr 00:50:56:A8:1B:DD
          inet addr:169.254.89.158  Bcast:169.254.127.255  Mask:255.255.128.0  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
eth3      Link encap:Ethernet  HWaddr 00:50:56:A8:2B:68
          inet addr:10.0.2.21  Bcast:10.0.2.255  Mask:255.255.255.0  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:323060 errors:0 dropped:512 overruns:0 frame:0  TX packets:337911 errors:0 dropped:0 overruns:0 carrier:0  collisions:0 txqueuelen:1000  RX bytes:176652414 (168.4 MiB)  TX bytes:212347379 (202.5 MiB)
eth3:1    Link encap:Ethernet  HWaddr 00:50:56:A8:2B:68
          inet addr:169.254.151.103  Bcast:169.254.255.255  Mask:255.255.128.0  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
eth4      Link encap:Ethernet  HWaddr 00:50:56:A8:81:DB
          inet addr:10.0.3.21  Bcast:10.0.3.255  Mask:255.255.255.0  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:37308 errors:0 dropped:507 overruns:0 frame:0  TX packets:27565 errors:0 dropped:0 overruns:0 carrier:0  collisions:0 txqueuelen:1000  RX bytes:10836885 (10.3 MiB)  TX bytes:14973305 (14.2 MiB)
eth5      Link encap:Ethernet  HWaddr 00:50:56:A8:43:EA
          inet addr:10.0.4.21  Bcast:10.0.4.255  Mask:255.255.255.0  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:38506 errors:0 dropped:496 overruns:0 frame:0  TX packets:27985 errors:0 dropped:0 overruns:0 carrier:0  collisions:0 txqueuelen:1000  RX bytes:10940661 (10.4 MiB)  TX bytes:14859794 (14.1 MiB)
eth6      Link encap:Ethernet  HWaddr 00:50:56:A8:84:76
          inet addr:10.0.5.21  Bcast:10.0.5.255  Mask:255.255.255.0  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13653 errors:0 dropped:484 overruns:0 frame:0  TX packets:114 errors:0 dropped:0 overruns:0 carrier:0  collisions:0 txqueuelen:1000  RX bytes:1102617 (1.0 MiB)  TX bytes:14161 (13.8 KiB)
eth7      Link encap:Ethernet  HWaddr 00:50:56:A8:B6:4F
          inet addr:10.0.6.21  Bcast:10.255.255.255  Mask:255.0.0.0  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13633 errors:0 dropped:474 overruns:0 frame:0  TX packets:115 errors:0 dropped:0 overruns:0 carrier:0  collisions:0 txqueuelen:1000  RX bytes:1101251 (1.0 MiB)  TX bytes:14343 (14.0 KiB)
eth8      Link encap:Ethernet  HWaddr 00:50:56:A8:97:62
          inet addr:10.0.7.21  Bcast:10.0.7.255  Mask:255.255.255.0  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13633 errors:0 dropped:459 overruns:0 frame:0  TX packets:115 errors:0 dropped:0 overruns:0 carrier:0  collisions:0 txqueuelen:1000  RX bytes:1102065 (1.0 MiB)  TX bytes:14343 (14.0 KiB)
eth9      Link encap:Ethernet  HWaddr 00:50:56:A8:28:10
          inet addr:10.0.8.21  Bcast:10.0.8.255  Mask:255.255.255.0  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:13764 errors:0 dropped:446 overruns:0 frame:0  TX packets:115 errors:0 dropped:0 overruns:0 carrier:0  collisions:0 txqueuelen:1000  RX bytes:1159479 (1.1 MiB)  TX bytes:14687 (14.3 KiB)

2. Current Clusterware interconnect configuration

[grid@rhel1 ~]$ oifcfg getif
eth1 10.168.4.0 global public
eth2 10.0.1.0 global cluster_interconnect
eth3 10.0.2.0 global cluster_interconnect

3. Before adjusting the cluster_interconnects parameter

SQL> show parameter cluster_interconnect

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cluster_interconnects                string

cluster_interconnects is empty by default.

SQL> select * from v$cluster_interconnects;

NAME            IP_ADDRESS       IS_ SOURCE                              CON_ID
--------------- ---------------- --- ------------------------------- ----------
eth2:1          169.254.10.239   NO                                           0
eth3:1          169.254.245.28   NO                                           0

V$CLUSTER_INTERCONNECTS displays one or more interconnects that are being used for
cluster communication. The query shows that the RAC environment currently uses HAIP.
Note that the addresses displayed here are the HAIP addresses, not the addresses
configured at the operating system level; this differs from the output shown later.

4. Adjusting the cluster_interconnects parameter

To get as much interconnect bandwidth as possible, nine interconnect networks were
configured on each server and cluster_interconnects was set accordingly:

SQL> alter system set cluster_interconnects="10.0.1.20:10.0.2.20:10.0.3.20:10.0.4.20:10.0.5.20:10.0.6.20:10.0.7.20:10.0.8.20:10.0.9.20" scope=spfile sid='orcl1';
<<<< Note that the IP addresses are separated by colons and enclosed in double quotes.
     Setting cluster_interconnects overrides the Clusterware interconnect network shown
     by oifcfg getif, which is otherwise the default network for RAC interconnect traffic.

System altered.

SQL> alter system set cluster_interconnects="10.0.1.21:10.0.2.21:10.0.3.21:10.0.4.21:10.0.5.21:10.0.6.21:10.0.7.21:10.0.8.21:10.0.9.21" scope=spfile sid='orcl2';

System altered.
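As a side note, one way to confirm later which addresses each instance actually picked
up is to query the cluster-wide counterpart of the view used above from any instance;
a minimal sketch:

SQL> -- run after the instances have been restarted with the new setting
SQL> select inst_id, name, ip_address, source from gv$cluster_interconnects order by inst_id, name;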
Restarting the database instances produced the following errors:

Advanced Analytics and Real Application Testing options
[oracle@rhel1 ~]$ srvctl stop database -d orcl
[oracle@rhel1 ~]$ srvctl start database -d orcl
PRCR-1079 : Failed to start resource ora.orcl.db
CRS-5017: The resource action "ora.orcl.db start" encountered the following error:
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:ip_list failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpcini
ORA-27303: additional information: Too many IPs specified to SKGXP. Max supported is 4, given 9.
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/rhel2/crs/trace/crsd_oraagent_oracle.trc".
CRS-2674: Start of 'ora.orcl.db' on 'rhel2' failed
CRS-5017: The resource action "ora.orcl.db start" encountered the following error:
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:ip_list failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpcini
ORA-27303: additional information: Too many IPs specified to SKGXP. Max supported is 4, given 9.
. For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/rhel1/crs/trace/crsd_oraagent_oracle.trc".
CRS-2674: Start of 'ora.orcl.db' on 'rhel1' failed
CRS-2632: There are no more servers to try to place resource 'ora.orcl.db' on that would satisfy its placement policy

So even with cluster_interconnects, no more than 4 network addresses can be specified,
which matches the HAIP limit. Therefore the last 5 IPs were removed and only the first 4
were kept for the interconnect (see the corrected statements sketched below):

Node 1: 10.0.1.20:10.0.2.20:10.0.3.20:10.0.4.20
Node 2: 10.0.1.21:10.0.2.21:10.0.3.21:10.0.4.21
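The original write-up does not show how the value was corrected; one possibility,
assuming an instance can be brought up again (for example with a temporary pfile created
from the spfile), is simply to re-issue the statements with only the first four addresses:

SQL> alter system set cluster_interconnects="10.0.1.20:10.0.2.20:10.0.3.20:10.0.4.20" scope=spfile sid='orcl1';
SQL> alter system set cluster_interconnects="10.0.1.21:10.0.2.21:10.0.3.21:10.0.4.21" scope=spfile sid='orcl2';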
5. Testing the fault tolerance of the cluster_interconnects parameter

Now let us test whether cluster_interconnects provides any fault tolerance:

SQL> set linesize 200
SQL> select * from v$cluster_interconnects;

NAME            IP_ADDRESS       IS_ SOURCE                              CON_ID
--------------- ---------------- --- ------------------------------- ----------
eth2            10.0.1.20        NO  cluster_interconnects parameter          0
eth3            10.0.2.20        NO  cluster_interconnects parameter          0
eth4            10.0.3.20        NO  cluster_interconnects parameter          0
eth5            10.0.4.20        NO  cluster_interconnects parameter          0

After restarting the instances, RAC now uses the 4 previously specified IPs for the
interconnect. Both RAC instances are running normally:

[oracle@rhel1 ~]$ srvctl status database -d orcl
Instance orcl1 is running on node rhel1
Instance orcl2 is running on node rhel2

Manually bring down one of the interconnect NICs on node 1:

[root@rhel1 ~]# ifdown eth4    <<<< this NIC is not one of the HAIP interfaces

[oracle@rhel1 ~]$ srvctl status database -d orcl
Instance orcl1 is running on node rhel1
Instance orcl2 is running on node rhel2

srvctl still reports both instances as running. Log in locally with sqlplus:

[oracle@rhel1 ~]$ sql
SQL*Plus: Release 12.1.0.2.0 Production on Tue Oct 20 18:11:35 2015
Copyright (c) 1982, 2014, Oracle. All rights reserved.
Connected.
SQL>

This state is clearly not right. Checking the alert log, the following errors were reported:

2015-10-20 18:10:22.996000 +08:00
SKGXP: ospid 32107: network interface query failed for IP address 10.0.3.20.
SKGXP: [error 32607]
2015-10-20 18:10:31.600000 +08:00
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_qm03_453.trc (incident=29265) (PDBNAME=CDB$ROOT):
ORA-00603: ORACLE server session terminated by fatal error
ORA-27501: IPC error creating a port
ORA-27300: OS system dependent operation:bind failed with status: 99
ORA-27301: OS failure message: Cannot assign requested address
ORA-27302: failure occurred at: sskgxpsock
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_29265/orcl1_qm03_453_i29265.trc
Errors in file /u01/app/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_cjq0_561.trc (incident=29297) (PDBNAME=CDB$ROOT):
ORA-00603: ORACLE server session terminated by fatal error
ORA-27544: Failed to map memory region for export
ORA-27300: OS system dependent operation:bind failed with status: 99
ORA-27301: OS failure message: Cannot assign requested address
ORA-27302: failure occurred at: sskgxpsock
Incident details in: /u01/app/oracle/diag/rdbms/orcl/orcl1/incident/incdir_29297/orcl1_cjq0_561_i29297.trc
2015-10-20 18:10:34.724000 +08:00
Dumping diagnostic data in directory=[cdmp_20151020181034], requested by (instance=1, osid=561 (CJQ0)), summary=[incident=29297].
2015-10-20 18:10:35.819000 +08:00
Dumping diagnostic data in directory=[cdmp_20151020181035], requested by (instance=1, osid=453 (QM03)), summary=[incident=29265].

According to the log, the instance did not go down; it simply hung. The alert log of the
database instance on the other node shows no errors, so the other RAC instance is not
affected.

Manually bring the NIC back up:

[root@rhel1 ~]# ifup eth4

The instance immediately returned to normal; it never actually went down during the test.

Does bringing down a port that belongs to HAIP affect the instance? To find out, eth2 was
brought down:

[root@rhel1 ~]# ifdown eth2

The test shows the instance hangs in exactly the same way as with the non-HAIP port, and
once the port is restored the instance recovers.

Summary: regardless of whether the specified ports belong to HAIP or not, setting the
cluster_interconnects parameter removes the fault tolerance of the interconnect. If any
one of the specified ports fails, the instance hangs until the port is restored, and the
parameter supports at most 4 IP addresses. So although, in a RAC environment hosting
multiple databases, the cluster_interconnects initialization parameter can be used to
override the default Clusterware interconnect and isolate the interconnect traffic of the
different database instances from each other, a failure of any of the specified NICs will
cause the instance to hang, and high availability is no longer guaranteed. (A sketch of
how the parameter can be removed again follows below.)
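If interconnect fault tolerance matters more than traffic isolation, the parameter can be
removed again so that the instances fall back to the Clusterware-managed (HAIP)
interconnect selection. A minimal sketch, assuming the database uses an spfile; the change
takes effect at the next restart of the instances:

SQL> alter system reset cluster_interconnects scope=spfile sid='*';

[oracle@rhel1 ~]$ srvctl stop database -d orcl
[oracle@rhel1 ~]$ srvctl start database -d orcl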