informix-community

Open full view…

Informix HDR failover doesn't work when primary DB server and ICM are terminated

naguibenator
Sun, 04 Nov 2018 08:30:27 GMT

Automatic switchover is configured in a primary/secondary HDR server setup. The HDR cluster would be connected and replication is on. If the service is stopped gracefully on the primary server (NODE-A) or the VM is terminated, failover happens and the secondary server (NODE-B) becomes primary. The following works ok as long as the application server (where ICM is running) is up: service informix stop onmade -yuk or even simply shutdown/terminate the primary DB node VM However, if in the case of a DR event where both the primary DB server and application server (ICM) went down at the same time, then the ICM machine was brought back up, it does not failover to the secondary node **DB ONCONFIG file** # HA_FOC_ORDER - The cluster failover rules. HA_FOC_ORDER # DRAUTO - Controls automatic failover of primary DRAUTO 3 # DRINTERVAL - The maximum interval, in seconds, between HDR DRINTERVAL 0 # DRTIMEOUT - The time, in seconds, before a network DRTIMEOUT 30 **/etc/informix/cmconfig** NAME whics-t10 DEBUG 1 LOG 1 LOGFILE /logs/informix/cmlog CM_TIMEOUT 60 EVENT_TIMEOUT 60 SECONDARY_EVENT_TIMEOUT 60 SQLHOSTS local CLUSTER cluster_1 { INFORMIXSERVER cluster_1 SLA sla_cluster_1 DBSERVERS=primary FOC ORDER=ENABLED \ PRIORITY=1 #CMALARMPROGRAM $INFORMIXDIR/etc/cmalarmprogram.sh **/etc/informix/cm.sqlhosts** cluster_1 group - - i=10 awwdst10a_t10 onsoctcp awwdst10a informix_t10 g=cluster_1 awwdst10b_t10 onsoctcp awwdst10b informix_t10 g=cluster_1 sla_cluster_1 onsoctcp localhost informix_t10 **/logs/informix/cmlog - Before failure event** 17:09:02 IBM Informix Connection Manager 17:09:02 IBM Informix CSDK Version 4.10, IBM Informix-ESQL Version 4.10.FC10 ..... 17:09:02 set CM_TIMEOUT 60 17:09:02 set global EVENT_TIMEOUT 60 17:09:02 set global SECONDARY_EVENT_TIMEOUT 60 17:09:02 DEBUG[TID139992173627168]:add type primary to SLA sla_cluster_1 [cmsm_sla.c:parse_sla:2412] 17:09:02 SLA sla_cluster_1 listener mode REDIRECT 17:09:02 Connection Manager name is whics-t10 17:09:02 Starting Connection Manager... 17:09:02 DEBUG[TID139992173627168]:dbservername = cluster_1 [cmsm_server.ec:cmsm_primary:2819] 17:09:02 DEBUG[TID139992173627168]:nettype = [cmsm_server.ec:cmsm_primary:2820] 17:09:02 DEBUG[TID139992173627168]:hostname = - [cmsm_server.ec:cmsm_primary:2821] 17:09:02 DEBUG[TID139992173627168]:servicename = - [cmsm_server.ec:cmsm_primary:2822] 17:09:02 DEBUG[TID139992173627168]:options = i=10 [cmsm_server.ec:cmsm_primary:2823] 17:09:02 DEBUG[TID139992173627168]:connect to cluster_1 for CLUSTER cluster_1 [cmsm_server.ec:cmsm_primary:2839] 17:09:02 DEBUG[TID139992173627168]:connect to group cluster_1 server awwdst10b_t10 for CLUSTER cluster_1 [cmsm_server.ec:cmsm_primary:2860] 17:09:02 DEBUG[TID139992173627168]:create new thread 1986635520 for awwdst10b_t10 [cmsm_sla.c:cmsm_update_server_ex:996] 17:09:02 listener sla_cluster_1 initializing 17:09:02 DEBUG[TID139992152782592]:CONNECT to @awwdst10a_t10|onsoctcp|awwdst10a|informix_t10 AS awwdst10a_t101 SQLCODE = (0,0,) [cmsm_server.ec:cmsm_connect_byurl_opt:136] 17:09:02 DEBUG[TID139992152782592]:database sysmaster @awwdst10a_t10|onsoctcp|awwdst10a|informix_t10 AS awwdst10a_t101 SQLCODE = (0,0,) [cmsm_server.ec:cmsm_connect_byurl_opt:145] 17:09:02 DEBUG[TID139992152782592]:awwdst10a_t10 protocols = 1 [cmsm_server.ec:cmsm_add_dbaliases:2427] 17:09:02 DEBUG[TID139992152782592]:skip awwdst10a_t10 alias awwdst10a_t10_drda with protocol 1 [cmsm_server.ec:cmsm_add_dbaliases:2694] ....... setting primary name = awwdst10a_t10 [cmsm_arb.c:arb_set_primary:795] 17:09:02 Cluster cluster_1 Arbitrator FOC ORDER=ENABLED PRIORITY=1 17:09:02 Connection Manager successfully connected to awwdst10a_t10 17:09:02 The server type of cluster cluster_1 server awwdst10b_t10 is HDR. 17:09:02 DEBUG[TID139992150681344]:CONNECT to @awwdst10b_t10|onsoctcp|awwdst10b|informix_t10 AS awwdst10b_t101 SQLCODE = (0,0,) [cmsm_server.ec:cmsm_connect_byurl_opt:136] 17:09:02 DEBUG[TID139992152782592]:server awwdst10a_t10 version is 12.10 [cmsm_er.ec:cmsm_er_monitor:2971] 17:09:02 DEBUG[TID139992152782592]:register CM successfully whics-t10 with token 216854841 [cmsm_server.ec:cmsm_cm_register:2330] 17:09:02 DEBUG[TID139992152782592]:Register server awwdst10a_t10 for awwdst10a_t10 count = 1 [cmsm_er.ec:cmsm_er_event_process:1280] 17:09:03 DEBUG[TID139992150681344]:database sysmaster @awwdst10b_t10|onsoctcp|awwdst10b|informix_t10 AS awwdst10b_t101 SQLCODE = (0,0,) [cmsm_server.ec:cmsm_connect_byurl_opt:145] 17:09:03 Listener sla_cluster_1 DBSERVERS=primary is active with 4 worker threads 17:09:03 DEBUG[TID139992150681344]:awwdst10b_t10 protocols = 1 [cmsm_server.ec:cmsm_add_dbaliases:2427] 17:09:03 DEBUG[TID139992150681344]:skip awwdst10b_t10 alias awwdst10b_t10_drda with protocol 1 [cmsm_server.ec:cmsm_add_dbaliases:2694] 17:09:14 Connection Manager successfully connected to awwdst10b_t10 17:09:14 DEBUG[TID139992173627168]:fcntl(/opt/informix-12.10.fc10/tmp/cmsm.pid.whics-t10) success error:Success[0] [cmsm_main.c:cmsm_pidfile:2193] 17:09:14 Connection Manager started successfully 17:09:14 SLA sla_cluster_1 redirect SQLI client from ::1 to awwdst10a_t10 172.29.26.129.51001 17:09:14 DEBUG[TID139992150681344]:server awwdst10b_t10 version is 12.10 [cmsm_er.ec:cmsm_er_monitor:2971] .......... 17:11:01 SLA sla_cluster_1 redirect SQLI client from ::1 to awwdst10a_t10 172.29.26.129.51001 **Primary node DB - before failure** onstat -g cluster IBM Informix Dynamic Server Version 12.10.FC10 -- On-Line (Prim) -- Up 02:37:16 -- 1556768 Kbytes Primary Server:awwdst10a_t10 Current Log Page:54494,1135 Index page logging status: Enabled Index page logging was enabled at: 2018/06/06 09:47:35 Server ACKed Log Applied Log Supports Status (log, page) (log, page) Updates awwdst10b_t10 54494,1135 54494,1135 No ASYNC(HDR),Connected,On 16:47:21 IBM Informix Dynamic Server Started. 16:47:21 Requested shared memory segment size rounded from 275020KB to 280576K Mon Oct 22 16:47:23 2018 16:47:23 Requested shared memory segment size rounded from 405443KB to 405504KB 16:47:23 Shared memory segment will use huge pages. 16:47:23 Segment locked: addr=0x8a600000, size=415236096 16:47:23 Successfully added a bufferpool of page size 16K. ...... 16:47:28 Affinitied VP 1 to phys proc 1 16:47:28 DR: DRAUTO is 3 (CMSM) 16:47:28 DR: ENCRYPT_HDR is 0 (HDR encryption Disabled) 16:47:28 Event notification facility epoll enabled. 16:47:28 Trusted host cache successfully built:/etc/hosts.equiv. 16:47:28 IBM Informix Dynamic Server Version 12.10.FC10 Software 16:47:29 Performance Advisory: The current size of the physical log buffer is smaller than recommended. 16:47:29 Results: Transaction performance might not be optimal. 16:47:29 Action: For better performance, increase the physical log buffer size to 128. 16:47:30 IBM Informix Dynamic Server Initialized -- Shared Memory Initialized. 16:47:30 Started 1 B-tree scanners. 16:47:30 B-tree scanner threshold set at 5000. 16:47:30 B-tree scanner range scan size set to -1. 16:47:30 B-tree scanner ALICE mode set to 6. 16:47:30 B-tree scanner index compression level set to med. 16:47:30 DR: Reservation of the last logical log for log backup turned on 16:47:30 DR: Trying to connect to secondary server = awwdst10b_t10 16:47:30 DR: Cannot connect to secondary server 16:47:30 DR: Turned off on primary server 16:47:30 Physical Recovery Started at Page (3:152094). 16:47:30 Physical Recovery Complete: 0 Pages Examined, 0 Pages Restored. 16:47:30 Logical Recovery Started. 16:47:30 10 recovery worker threads will be started. 16:47:30 Logical Recovery has reached the transaction cleanup phase. 16:47:30 Logical Recovery Complete. 0 Committed, 0 Rolled Back, 0 Open, 0 Bad Locks 16:47:31 Dataskip is now OFF for all dbspaces 16:47:32 Checkpoint Completed: duration was 0 seconds. 16:47:32 Mon Oct 22 - loguniq 54478, logpos 0xe30c0, timestamp: 0xd41004bc Interval: 399667 16:47:39 Logical Log 54478 Complete, timestamp: 0xd41102e7. 16:47:41 Action: Increase the size of the individual logical log files so that it takes at least 30 seconds to fill each one. Look at the online log to determine how quickly the log files are filling, and then increase the size of the files proportionately. 16:47:41 Logical Log 54479 Complete, timestamp: 0xd411f36b. 16:47:44 DR: Primary server connected 16:47:44 DR: Secondary server needs failure recovery 16:47:46 DR_ERR set to -1 16:47:46 DR: Failure recovery error (14) 16:47:46 Logical Log 54480 Complete, timestamp: 0xd4131922. 16:47:47 DR: Turned off on primary server 16:47:47 DR: Cannot connect to secondary server 16:47:54 Logical Log 54483 Complete, timestamp: 0xd415bc16. 16:47:58 DR: Primary server connected 16:47:58 DR: Secondary server needs failure recovery 16:48:00 DR: Sending log 54478, size 2500 pages, 100.00 percent used 16:48:00 Logical Log 54484 Complete, timestamp: 0xd4172711. . 16:48:05 Logical Log 54485 Complete, timestamp: 0xd4180cc0. 16:48:07 DR: Sending log 54482, size 2500 pages, 100.00 percent used 16:48:20 Checkpoint Completed: duration was 1 seconds. 17:09:02 CM:Connection manager whics-t10 registered with the server 17:09:18 Logical Log 54478 - Backup Started 17:09:22 Logical Log 54488 - Backup Completed 17:09:22 Logical Log 54489 - Backup Started 17:33:32 Checkpoint Completed: duration was 0 seconds. 17:33:32 Mon Oct 22 - loguniq 54494, logpos 0x3aa018, timestamp: 0xd41e0c9d Interval: 399671 17:33:32 Maximum server connections 2 17:33:32 Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 24, Llog used 9 **Secondary node DB - before failure** onstat -g cluster IBM Informix Dynamic Server Version 12.10.FC10 -- Read-Only (Sec) -- Up 02:42:00 -- 1556768 Kbytes Primary Server:awwdst10a_t10 Index page logging status: Enabled Index page logging was enabled at: 2018/06/06 09:47:35 Server ACKed Log Supports Status (log, page) Updates awwdst10b_t10 54494,1135 No ASYNC(HDR),Connected,On 16:47:46 IBM Informix Dynamic Server Started. Mon Oct 22 16:47:47 2018 16:47:48 Requested shared memory segment size rounded from 405443KB to 405504KB 16:47:48 Shared memory segment will use huge pages. 16:47:48 Segment locked: addr=0x8a600000, size=415236096 16:47:48 Successfully added a bufferpool of page size 16K. 16:47:48 Warning: ONCONFIG dump directory (DUMPDIR) '/dump/informix/t10' has insecure permissions 16:47:48 Event alarms enabled. ALARMPROG = '/usr/informix/etc/no_log.sh' 16:47:48 Booting Language <c> from module <> 16:47:48 Loading Module <CNULL> 16:47:48 Booting Language <builtin> from module <> 16:47:48 Loading Module <BUILTINNULL> 16:47:53 Affinitied VP 1 to phys proc 1 16:47:53 DR: DRAUTO is 3 (CMSM) 16:47:53 DR: ENCRYPT_HDR is 0 (HDR encryption Disabled) 16:47:53 Event notification facility epoll enabled. 16:47:53 Trusted host cache successfully built:/etc/hosts.equiv. 16:47:53 CCFLAGS2 value set to 0x200 16:47:53 SQL_FEAT_CTRL value set to 0x8008 16:47:53 SQL_DEF_CTRL value set to 0x4b0 16:47:53 IBM Informix Dynamic Server Version 12.10.FC10 Software Serial Number AAA#B000000 16:47:54 Performance Advisory: The current size of the physical log buffer is smaller than recommended. 16:47:54 Results: Transaction performance might not be optimal. 16:47:54 Action: For better performance, increase the physical log buffer size to 128. 16:47:54 IBM Informix Dynamic Server Initialized -- Shared Memory Initialized. 16:47:54 Started 1 B-tree scanners. 16:47:54 B-tree scanner threshold set at 5000. 16:47:54 B-tree scanner range scan size set to -1. 16:47:54 B-tree scanner ALICE mode set to 6. 16:47:54 B-tree scanner index compression level set to med. 16:47:54 Physical Recovery Started at Page (3:151957). 16:47:54 Physical Recovery Complete: 0 Pages Examined, 0 Pages Restored. 16:47:54 DR: Trying to connect to primary server = awwdst10a_t10 16:47:54 smx creates 1 transports to server awwdst10a_t10 16:47:55 Dataskip is now OFF for all dbspaces 16:47:55 Restartable Restore has been ENABLED 16:47:55 Recovery Mode 16:47:57 DR: Secondary server connected 16:47:58 DR: Secondary server needs failure recovery 16:47:59 DR: Failure recovery from disk in progress ... 16:47:59 Logical Recovery Started. 16:47:59 10 recovery worker threads will be started. 16:47:59 Start Logical Recovery - Start Log 54478, End Log ? 16:47:59 Starting Log Position - 54478 0xdd018 16:48:00 Started processing open transactions on secondary during startup Server in fast recovery until these open transactions finish 16:48:00 Finished processing open transactions on secondary during startup. 16:48:00 Checkpoint Completed: duration was 0 seconds. 16:48:00 Mon Oct 22 - loguniq 54478, logpos 0xdf0c0, timestamp: 0xd410117e Interval: 399666 16:48:00 Checkpoint Completed: duration was 0 seconds. 16:48:00 Mon Oct 22 - loguniq 54478, logpos 0xe30c0, timestamp: 0xd41011cf Interval: 399668 16:48:00 Maximum server connections 0 16:48:00 Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 2, Llog used 0 16:48:03 Logical Log 54478 Complete, timestamp: 0xd410f549. 16:48:17 Logical Log 54486 Complete, timestamp: 0xd418e8e4. 16:48:19 B-tree scanners disabled. 16:48:20 DR: HDR secondary server operational 16:48:20 Checkpoint Completed: duration was 0 seconds. 16:48:20 Mon Oct 22 - loguniq 54487, logpos 0xd018, timestamp: 0xd418e973 Interval: 399669 16:48:20 Maximum server connections 0 16:48:20 Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 7761, Llog used 0 16:48:51 Logical Log 54492 Complete, timestamp: 0xd41d1f30. 16:56:04 Logical Log 54493 Complete, timestamp: 0xd41dafde. 17:03:31 Checkpoint Completed: duration was 0 seconds. 17:03:31 Mon Oct 22 - loguniq 54494, logpos 0x35f598, timestamp: 0xd41dff42 Interval: 399670 17:09:03 Booting Language <spl> from module <> 17:09:03 Loading Module <SPLNULL> 17:09:14 CM:Connection manager whics-t10 registered with the server 17:18:32 Checkpoint Completed: duration was 0 seconds. 17:18:32 Mon Oct 22 - loguniq 54494, logpos 0x3a1018, timestamp: 17:18:32 Maximum server connections 1 17:18:32 Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 0, Plog used 27, Llog used 0 17:33:32 Checkpoint Completed: duration was 0 seconds. 17:33:32 Mon Oct 22 - loguniq 54494, logpos 0x3aa018, timestamp: 0xd41e0c7e Interval: 399672 **ICM - post failure** Mon Oct 22 23:23:44 2018 23:23:44 IBM Informix Connection Manager 23:23:44 IBM Informix CSDK Version 4.10, IBM Informix-ESQL Version 4.10.FC10 23:23:44 Build Number: N046 23:23:44 Build Host: hans 23:23:44 Build OS: Linux 2.6.18-128.el5 23:23:44 Build Date: Mon Oct 23 07:06:25 CDT 2017 23:23:44 GLS Version: glslib-6.00.FC11 23:23:44 23:23:44 set CM_TIMEOUT 60 23:23:44 set global EVENT_TIMEOUT 60 23:23:44 set global SECONDARY_EVENT_TIMEOUT 60 23:23:44 DEBUG[TID139636001822496]:add type primary to SLA sla_cluster_1 [cmsm_sla.c:parse_sla:2412] 23:23:44 SLA sla_cluster_1 listener mode REDIRECT 23:23:44 Connection Manager name is whics-t10 23:23:44 Starting Connection Manager... 23:23:44 DEBUG[TID139636001822496]:Password File /opt/informix-12.10.fc10/etc/passwd_file failed error:No such file or directory[2] [cmsm_main.c:cmsm_pw_init:1782] 23:23:44 Warning: Password Manager failed; working in trusted node mode 23:23:44 the current maximum number of file descriptors is 32767 23:23:44 DEBUG[TID139636001822496]:new daemon pid is 3804 [cmsm_main.c:cmsm_daemonize:4653] 23:23:44 DEBUG[TID139636001822496]:dbservername = cluster_1 [cmsm_server.ec:cmsm_primary:2819] 23:23:44 DEBUG[TID139636001822496]:nettype = [cmsm_server.ec:cmsm_primary:2820] 23:23:44 DEBUG[TID139636001822496]:hostname = - [cmsm_server.ec:cmsm_primary:2821] 23:23:44 DEBUG[TID139636001822496]:servicename = - [cmsm_server.ec:cmsm_primary:2822] 23:23:44 DEBUG[TID139636001822496]:options = i=10 [cmsm_server.ec:cmsm_primary:2823] 23:23:44 DEBUG[TID139636001822496]:connect to cluster_1 for CLUSTER cluster_1 [cmsm_server.ec:cmsm_primary:2839] 23:23:44 DEBUG[TID139636001822496]:connect to group cluster_1 server awwdst10a_t10 for CLUSTER cluster_1 [cmsm_server.ec:cmsm_primary:2860] 23:23:44 DEBUG[TID139636001822496]:create new thread -1995749632 for awwdst10a_t10 [cmsm_sla.c:cmsm_update_server_ex:996] 23:23:44 DEBUG[TID139636001822496]:connect to group cluster_1 server awwdst10b_t10 for CLUSTER cluster_1 [cmsm_server.ec:cmsm_primary:2860] 23:23:44 DEBUG[TID139636001822496]:create new thread -1997850880 for awwdst10b_t10 [cmsm_sla.c:cmsm_update_server_ex:996] 23:23:44 listener sla_cluster_1 initializing 23:23:44 DEBUG[TID139635976775424]:listenter sla_cluster_1 host=localhost service=informix_t10 nettype=soctcp engtypeon [cmsm_main.c:cmsm_listener_thread:1166] 23:23:49 DEBUG[TID139635980977920]:CONNECT to @awwdst10a_t10|onsoctcp|awwdst10a|informix_t10 AS awwdst10a_t101 SQLCODE = (-908,107,awwdst10a_t10) [cmsm_server.ec:cmsm_connect_byurl_opt:136] 23:23:49 ALARM 4001 unable to connect to Informix server awwdst10a_t10 from awguht10 error -908 23:23:49 DEBUG[TID139635980977920]:Arbitrator return, server awwdst10a_t10 is not primary [cmsm_arb.c:arb_server_down:1399] 23:23:50 DEBUG[TID139635978876672]:CONNECT to @awwdst10b_t10|onsoctcp|awwdst10b|informix_t10 AS awwdst10b_t101 SQLCODE = (0,0,) [cmsm_server.ec:cmsm_connect_byurl_opt:136] 23:23:50 DEBUG[TID139635978876672]:database sysmaster @awwdst10b_t10|onsoctcp|awwdst10b|informix_t10 AS awwdst10b_t101 SQLCODE = (0,0,) [cmsm_server.ec:cmsm_connect_byurl_opt:145] 23:23:50 Listener sla_cluster_1 DBSERVERS=primary is active with 4 worker threads 23:23:50 DEBUG[TID139635978876672]:awwdst10b_t10 protocols = 1 [cmsm_server.ec:cmsm_add_dbaliases:2427] 23:23:50 DEBUG[TID139635978876672]:skip awwdst10b_t10 alias awwdst10b_t10_drda with protocol 1 [cmsm_server.ec:cmsm_add_dbaliases:2694] 23:23:50 DEBUG[TID139635978876672]:SQL fetch sysconfig_cursor sqlcode = (100,0,) [cmsm_server.ec:cmsm_add_dbaliases:2540] 23:23:52 DEBUG[TID139635980977920]:CONNECT to @awwdst10a_t10|onsoctcp|awwdst10a|informix_t10 AS awwdst10a_t102 SQLCODE = (-908,107,awwdst10a_t10) [cmsm_server.ec:cmsm_connect_byurl_opt:136] 23:23:52 DEBUG[TID139635980977920]:Arbitrator return, server awwdst10a_t10 is not primary [cmsm_arb.c:arb_server_down:1399] 23:24:12 DEBUG[TID139635980977920]:Monitor awwdst10a_t10 exit by arbitrator [cmsm_er.ec:cmsm_er_monitor:2766] 23:24:12 DEBUG[TID139636001822496]:fcntl(/opt/informix-12.10.fc10/tmp/cmsm.pid.whics-t10) success error:Success[0] [cmsm_main.c:cmsm_pidfile:2193] 23:24:12 DEBUG[TID139635968210688]:Warning: no server returned for SLA sla_cluster_1 [cmsm_sla.c:select_server_from_sla:3037] **23:24:12 Connection Manager successfully connected to awwdst10b_t10** **23:24:12 DEBUG[TID139635966109440]:Warning: no server returned for SLA sla_cluster_1 [cmsm_sla.c:select_server_from_sla:3037]** **23:24:12 DEBUG[TID139635966109440]:SLA sla_cluster_1 cannot find database server for SQLI client [cmsm_main.c:cmsm_process_sqli:1522]** 23:24:12 Connection Manager started successfully 23:24:12 DEBUG[TID139635980977920]:all monitors exited, server awwdst10a_t10 register count = 0 [cmsm_er.ec:cmsm_er_monitor:3133] 23:24:12 DEBUG[TID139635968210688]:SLA sla_cluster_1 cannot find database server for SQLI client [cmsm_main.c:cmsm_process_sqli:1522] 23:24:12 The server type of cluster cluster_1 server awwdst10a_t10 is Primary. 23:24:12 DEBUG[TID139635978876672]:create new thread -1995749632 for awwdst10a_t10 [cmsm_sla.c:cmsm_update_server_ex:996] 23:24:12 DEBUG[TID139635978876672]:cluster cluster_1 primary is not active [cmsm_arb.c:arb_set_primary:774] 23:24:12 DEBUG[TID139635978876672]:server awwdst10b_t10 version is 12.10 [cmsm_er.ec:cmsm_er_monitor:2971] 23:24:12 DEBUG[TID139635978876672]:ACTIVE MONITOR awwdst10b_t10 awwdst10b_t10 172.29.26.122 53089 [cmsm_er.ec:cmsm_er_monitor:3013] 23:24:12 DEBUG[TID139635978876672]:register CM successfully whics-t10 with token 1838996547 [cmsm_server.ec:cmsm_cm_register:2330] 23:24:12 DEBUG[TID139635978876672]:Register server awwdst10b_t10 for awwdst10b_t10 count = 1 [cmsm_er.ec:cmsm_er_event_process:1280] 23:24:12 DEBUG[TID139635978876672]:get awwdst10b_t10 CLUST_CHG event 1:4[DISCONNECT] awwdst10a_t10|Primary| [cmsm_er.ec:cmsm_er_event_process:1733] 23:24:17 DEBUG[TID139635980977920]:CONNECT to @awwdst10a_t10|onsoctcp|awwdst10a|informix_t10 AS awwdst10a_t103 SQLCODE = (-908,107,awwdst10a_t10) [cmsm_server.ec:cmsm_connect_byurl_opt:136] 23:24:17 DEBUG[TID139635980977920]:Arbitrator return, server awwdst10a_t10 is not a DBSERVERALIAS of primary [cmsm_arb.c:arb_server_down:1417] 23:24:17 DEBUG[TID139635980977920]:Monitor awwdst10a_t10 exit by arbitrator [cmsm_er.ec:cmsm_er_monitor:2766] 23:24:17 DEBUG[TID139635980977920]:all monitors exited, server awwdst10a_t10 register count = 0 [cmsm_er.ec:cmsm_er_monitor:3133] 23:24:46 The server type of cluster cluster_1 server awwdst10a_t10 is Primary. 23:24:46 DEBUG[TID139635978876672]:create new thread -1995749632 for awwdst10a_t10 [cmsm_sla.c:cmsm_update_server_ex:996] 23:24:52 DEBUG[TID139635980977920]:CONNECT to @awwdst10a_t10|onsoctcp|awwdst10a|informix_t10 AS awwdst10a_t104 SQLCODE = (-908,107,awwdst10a_t10) [cmsm_server.ec:cmsm_connect_byurl_opt:136] 23:24:52 DEBUG[TID139635980977920]:Arbitrator return, server awwdst10a_t10 is not a DBSERVERALIAS of primary [cmsm_arb.c:arb_server_down:1417] 23:24:52 DEBUG[TID139635980977920]:Monitor awwdst10a_t10 exit by arbitrator [cmsm_er.ec:cmsm_er_monitor:2766] 23:24:52 DEBUG[TID139635980977920]:all monitors exited, server awwdst10a_t10 register count = 0 [cmsm_er.ec:cmsm_er_monitor:3133] 23:25:02 DEBUG[TID139635896350464]:Warning: no server returned for SLA sla_cluster_1 [cmsm_sla.c:select_server_from_sla:3037] 23:25:02 DEBUG[TID139635896350464]:SLA sla_cluster_1 cannot find database server for SQLI client [cmsm_main.c:cmsm_process_sqli:1522] 23:25:20 The server type of cluster cluster_1 server awwdst10a_t10 is Primary. 23:25:23 DEBUG[TID139635980977920]:Arbitrator return, server awwdst10a_t10 is not a DBSERVERALIAS of primary [cmsm_arb.c:arb_server_down:1417] 23:25:23 DEBUG[TID139635980977920]:Monitor awwdst10a_t10 exit by arbitrator [cmsm_er.ec:cmsm_er_monitor:2766] 23:25:23 DEBUG[TID139635980977920]:all monitors exited, server awwdst10a_t10 register count = 0 [cmsm_er.ec:cmsm_er_monitor:3133] **DB secondary node - post failure** onstat -g cluster IBM Informix Dynamic Server Version 12.10.FC10 -- Read-Only (Sec) -- Up 16:17:47 -- 1550492 Kbytes Primary Server:awwdst10a_t10 Index page logging status: Enabled Index page logging was enabled at: 2018/06/06 09:47:35 Server ACKed Log Supports Status (log, page) Updates awwdst10b_t10 0,0 No ASYNC(HDR),Disconnected,Off 22:46:28 Parameter's user-configured value was adjusted. (ALARMPROGRAM) 22:46:28 IBM Informix Dynamic Server Started. 22:46:28 Insufficient free huge pages in /proc/meminfo for shared memory segment. Requested: 285405184 bytes. Available: 0 bytes. The default memory page size will be used. 22:46:28 Requested shared memory segment size rounded from 275020KB to 278716KB 22:46:28 Segment locked: addr=0x44000000, size=285405184 Mon Oct 22 22:46:30 2018 22:46:30 Insufficient free huge pages in /proc/meminfo for shared memory segment. Requested: 56774656 bytes. Available: 0 bytes. The default memory page size will be used. 22:46:30 Requested shared memory segment size rounded from 55443KB to 55444KB 22:46:30 Segment locked: addr=0x73877000, size=56774656 22:46:30 Successfully added a bufferpool of page size 2K. 22:46:30 Insufficient free huge pages in /proc/meminfo for shared memory segment. Requested: 107974656 bytes. Available: 0 bytes. The default memory page size will be used. 22:46:30 Requested shared memory segment size rounded from 105443KB to 105444KB 22:46:30 Segment locked: addr=0x76e9c000, size=107974656 22:46:30 Successfully added a bufferpool of page size 4K. 22:46:30 Insufficient free huge pages in /proc/meminfo for shared memory segment. Requested: 210374656 bytes. Available: 0 bytes. The default memory page size will be used. 22:46:30 Requested shared memory segment size rounded from 205443KB to 205444KB 22:46:30 Segment locked: addr=0x7d595000, size=210374656 22:46:30 Successfully added a bufferpool of page size 8K. 22:46:30 Insufficient free huge pages in /proc/meminfo for shared memory segment. Requested: 415174656 bytes. Available: 0 bytes. The default memory page size will be used. 22:46:30 Requested shared memory segment size rounded from 405443KB to 405444KB 22:46:30 Segment locked: addr=0x89e36000, size=415174656 22:46:30 Successfully added a bufferpool of page size 16K. 22:46:30 Warning: ONCONFIG dump directory (DUMPDIR) '/dump/informix/t10' has insecure permissions 22:46:30 Event alarms enabled. ALARMPROG = '/usr/informix/etc/no_log.sh' 22:46:30 Booting Language <c> from module <> 22:46:30 Loading Module <CNULL> 22:46:30 Booting Language <builtin> from module <> 22:46:30 Loading Module <BUILTINNULL> 22:46:36 Affinitied VP 1 to phys proc 1 22:46:36 DR: DRAUTO is 3 (CMSM) 22:46:36 DR: ENCRYPT_HDR is 0 (HDR encryption Disabled) 22:46:36 Event notification facility epoll enabled. 22:46:36 Trusted host cache successfully built:/etc/hosts.equiv. 22:46:36 CCFLAGS2 value set to 0x200 22:46:36 SQL_FEAT_CTRL value set to 0x8008 22:46:36 SQL_DEF_CTRL value set to 0x4b0 22:46:36 IBM Informix Dynamic Server Version 12.10.FC10 Software Serial Number AAA#B000000 22:46:37 Performance Advisory: The current size of the physical log buffer is smaller than recommended. 22:46:37 Results: Transaction performance might not be optimal. 22:46:37 Action: For better performance, increase the physical log buffer size to 128. 22:46:38 IBM Informix Dynamic Server Initialized -- Shared Memory Initialized. 22:46:38 Started 1 B-tree scanners. 22:46:38 B-tree scanner threshold set at 5000. 22:46:38 B-tree scanner range scan size set to -1. 22:46:38 B-tree scanner ALICE mode set to 6. 22:46:38 B-tree scanner index compression level set to med. 22:46:38 Physical Recovery Started at Page (3:164234). 22:46:38 Physical Recovery Complete: 0 Pages Examined, 0 Pages Restored. 22:46:38 DR: Trying to connect to primary server = awwdst10a_t10 22:48:39 DR: Cannot connect to primary server 22:48:39 DR: Turned off on secondary server 22:48:39 Dataskip is now OFF for all dbspaces 22:48:39 Restartable Restore has been ENABLED 22:48:39 Recovery Mode 23:23:50 Booting Language <spl> from module <> 23:23:50 Loading Module <SPLNULL> 23:24:12 CM:Connection manager whics-t10 registered with the server

I am Pradeep
Mon, 05 Nov 2018 19:34:43 GMT

Hi, Looks like this issue will require analysis of the logs and debugging to understand what is happening. Have you opened a support ticket for this? If not, please do so and our technical support experts will help you.

leoboygaurav
Fri, 30 Nov 2018 13:34:31 GMT

It should have work,weather CM is configured or not. Can you give me your onconfig file from both the servers, online.log frm both the servers and cm log file when thius event happens.

alexandremarini
Fri, 14 Dec 2018 09:50:54 GMT

Informix automatic failovering will completely depend on DRAUTO configuration. If you are using CM, I suppose you have it set to 3. With that, if your CM node/cluster is down, Informix will not be able to do it itself, and cannot be blamed for that. Hope that helps.