H165895: CLIENTS CAN NOT CONNECT TO AN ORACLE PARALLEL SERVER NODE

SYMPTOM:
Client sessions can no longer connect to one of the nodes in the database, and/or some rows or tables cannot be accessed from the other nodes. The node to which the client sessions can no longer connect cannot see the shared storage when the drives are viewed with the NT Disk Administrator. This node had been up and running normally, and the problem occurred without the node being rebooted.

PROBLEM ISOLATION AIDS:
- The system is a Netfinity 7000 M-10 server, Type 8680, any supported Model configured with Oracle Parallel Server.
  Note: Supported configurations are listed at the following URL (look for "Oracle Parallel Server"):
  http://www.pc.ibm.com/us/compat/clustering/matrix.shtml
- The system is configured with the following option: IBM Netfinity Cluster Enabler Software
- NOS affected: Windows NT 4.0 Enterprise Edition with Service Pack 4 applied.

FIX:
The action to take depends on the state of the OracleServiceOPSn service on the node that clients can no longer connect to. The "n" is the number of the database instance, so the actual service name will be something like OracleServiceOPS1.
- If the database instance has stopped and the node has been evicted from the cluster, then no action is required.
- If the database instance has not stopped completely (even after 15 minutes), then stop the OracleServiceOPSn service with the "net stop" command or from the Windows NT Services window.

You can determine whether the OracleServiceOPSn service has been stopped by using the Windows NT Services window. The "Status" field of the OracleServiceOPSn service will be blank when the service is successfully stopped.

If the OracleServiceOPSn service cannot be stopped by either of these methods, then terminate the ORACLE80.exe process from the Windows NT Task Manager.

DETAILS:
A hardware failure may have occurred that prevents access to the shared storage on this node.
If this has occurred, the database instance on this node may behave in one of two ways:
1) The database instance terminates. This node is evicted from the cluster. The Oracle reconfiguration process resulting from a node being evicted from the cluster may take as long as 15 minutes to complete.
2) The database instance terminates, but not completely.

In both cases, provided the client application is written properly, the clients will fail over from the node that can no longer communicate with the shared storage. In the first case, no problems will occur. In the second case, the ORACLE80.exe process may continue to consume CPU and memory on this node, but the clients should not see a problem.

The fix addresses the problem by terminating an Oracle service that may not have shut down correctly and is hung holding a database lock.

TRADEMARKS:
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States and/or other countries. Other company, product and service names may be the trademarks or service marks of others.

DATE: February 17, 1999
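
ILLUSTRATION (not part of the original tip):
The decision procedure described under FIX can be sketched as a small Python helper. This is a minimal sketch under stated assumptions, not a supported tool. The function names, the returned action strings, and the "wait" step before 15 minutes have elapsed are hypothetical. Only the OracleServiceOPSn naming pattern, the "net stop" command, the 15-minute figure, and the ORACLE80.exe process name come from the text above. The code only builds the command line and does not execute anything.

```python
# Assumption: the bulletin's "even after 15 minutes" is read as a wait threshold.
RECONFIG_WAIT_MINUTES = 15


def service_name(instance_number):
    """Build the service name, e.g. OracleServiceOPS1 for instance 1."""
    return f"OracleServiceOPS{instance_number}"


def net_stop_command(instance_number):
    """Return the 'net stop' command as an argument list (not executed here)."""
    return ["net", "stop", service_name(instance_number)]


def recommended_action(instance_stopped, node_evicted, minutes_waited,
                       net_stop_failed=False):
    """Map the observed node state to the action suggested under FIX.

    The returned strings are hypothetical labels chosen for this sketch.
    """
    if instance_stopped and node_evicted:
        # Instance stopped and node evicted: no action is required.
        return "none"
    if minutes_waited < RECONFIG_WAIT_MINUTES:
        # Reconfiguration may still be in progress; give it time (assumption).
        return "wait"
    if not net_stop_failed:
        # Stop the service with "net stop" or from the Services window.
        return "stop-service"
    # Last resort: end ORACLE80.exe from the Windows NT Task Manager.
    return "terminate-ORACLE80.exe"
```

For example, recommended_action(False, False, 20) selects the "stop-service" step, and net_stop_command(1) yields the arguments for stopping OracleServiceOPS1.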