时间不同步导致ogg部署异常

一、错误检查

[oracle@srtcreen ~]$ ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.1.1.1.1 OGGCORE_11.1.1.1.1_PLATFORMS_110729.1700 
Linux, x64, 64bit (optimized), Oracle 10g on Jul 29 2011 19:43:29

Copyright (C) 1995, 2011, Oracle and/or its affiliates. All rights reserved.



GGSCI (srtcreen) 1> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           
REPLICAT    RUNNING     R_1         00:00:00      00:00:07    
REPLICAT    RUNNING     R_2         00:00:00      00:00:05    
REPLICAT    ABENDED     R_3         19:19:34      00:17:33    
REPLICAT    STOPPED     R_4         00:00:00      19:44:24    
REPLICAT    STOPPED     R_5         00:00:00      19:44:13    

GGSCI (srtcreen) 2> view report r_3


***********************************************************************
                 Oracle GoldenGate Delivery for Oracle
     Version 11.1.1.1.1 OGGCORE_11.1.1.1.1_PLATFORMS_110729.1700 
   Linux, x64, 64bit (optimized), Oracle 10g on Sep 13 2011 21:33:03
 
Copyright (C) 1995, 2011, Oracle and/or its affiliates. All rights reserved.


                    Starting at 2011-12-02 16:36:58
***********************************************************************

Operating System Version:
Linux
Version #1 SMP Fri Apr 2 14:58:14 EDT 2010, Release 2.6.18-194.el5
Node: srtcreen
Machine: x86_64
                         soft limit   hard limit
Address Space Size   :    unlimited    unlimited
Heap Size            :    unlimited    unlimited
File Size            :    unlimited    unlimited
CPU Time             :    unlimited    unlimited

Process id: 13398

Description: 

***********************************************************************
**            Running with the following parameters                  **
***********************************************************************
replicat r_3
ASSUMETARGETDEFS
HANDLECOLLISIONS
SETENV (NLS_LANG =AMERICAN_AMERICA.ZHS16GBK)
Set environment variable (NLS_LANG=AMERICAN_AMERICA.ZHS16GBK)
userid it1ogg, password ********
--file for dicarded transaction --
discardfile /opt/OGG/discard/R_3.txt, append, megabytes 100
DDL
MAP CSCNEW.TAB_CS_CALL_PICKUP, TARGET SRT_CREEN.TAB_CS_CALL_PICKUP, KEYCOLS (CALL_ID);


CACHEMGR virtual memory values (may have been adjusted)
CACHEBUFFERSIZE:                         64K
CACHESIZE:                              512M
CACHEBUFFERSIZE (soft max):               4M
CACHEPAGEOUTSIZE (normal):                4M
PROCESS VM AVAIL FROM OS (min):           1G
CACHESIZEMAX (strict force to disk):    881M

Database Version:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

Database Language and Character Set:
NLS_LANG = "AMERICAN_AMERICA.ZHS16GBK" 
NLS_LANGUAGE     = "AMERICAN" 
NLS_TERRITORY    = "AMERICA" 
NLS_CHARACTERSET = "ZHS16GBK" 

For further information on character set settings, please refer to user manual.

***********************************************************************
**                     Run Time Messages                             **
***********************************************************************

Opened trail file /opt/OGG/dirdat/U3000000 at 2011-12-02 16:36:58

MAP resolved (entry CSCNEW.TAB_CS_CALL_PICKUP):
  MAP CSCNEW.TAB_CS_CALL_PICKUP, TARGET SRT_CREEN.TAB_CS_CALL_PICKUP, KEYCOLS (CALL_ID);
Using following columns in default map by name:
  PICKUP_ID, CALL_ID, CALL_SERIAL, USER_ID, PICKUP_TIME, CALL_RESULT, 
  FAIL_REASON, CALL_TIME, CALL_DURA, END_TIME

Using the following key columns for target table SRT_CREEN.TAB_CS_CALL_PICKUP: CALL_ID.


2011-12-02 16:36:58  WARNING OGG-00869  OCI Error ORA-01407: cannot update ("SRT_CREEN"."TAB_CS_CALL_PICKUP"."PICKUP_ID") to NULL (status = 1407), SQL <UPDAT
E "SRT_CREEN"."TAB_CS_CALL_PICKUP" SET "PICKUP_ID" = :a0,"CALL_SERIAL" = :a2,"USER_ID" = :a3,"PICKUP_TIME" = :a4,"CALL_RESULT" = :a5,"FAIL_REASON" = :a6,"CAL
L_TIME" = :a7,"CALL_DURA" = :a8,"END_T>.

2011-12-02 16:36:58  WARNING OGG-01004  Aborted grouped transaction on 'SRT_CREEN.TAB_CS_CALL_PICKUP', Database error 1407 (OCI Error ORA-01407: cannot updat
e ("SRT_CREEN"."TAB_CS_CALL_PICKUP"."PICKUP_ID") to NULL (status = 1407), SQL <UPDATE "SRT_CREEN"."TAB_CS_CALL_PICKUP" SET "PICKUP_ID" = :a0,"CALL_SERIAL" = 
:a2,"USER_ID" = :a3,"PICKUP_TIME" = :a4,"CALL_RESULT" = :a5,"FAIL_REASON" = :a6,"CALL_TIME" = :a7,"CALL_DURA" = :a8,"END_T>).

2011-12-02 16:36:58  WARNING OGG-01003  Repositioning to rba 924 in seqno 0.

2011-12-02 16:36:59  WARNING OGG-01154  SQL error 1407 mapping CSCNEW.TAB_CS_CALL_PICKUP to SRT_CREEN.TAB_CS_CALL_PICKUP OCI Error ORA-01407: cannot update (
"SRT_CREEN"."TAB_CS_CALL_PICKUP"."PICKUP_ID") to NULL (status = 1407), SQL <UPDATE "SRT_CREEN"."TAB_CS_CALL_PICKUP" SET "PICKUP_ID" = :a0,"CALL_SERIAL" = :a2
,"USER_ID" = :a3,"PICKUP_TIME" = :a4,"CALL_RESULT" = :a5,"FAIL_REASON" = :a6,"CALL_TIME" = :a7,"CALL_DURA" = :a8,"END_T>.

2011-12-02 16:36:59  WARNING OGG-01003  Repositioning to rba 924 in seqno 0.

***********************************************************************
*                   ** Run Time Statistics **                         *
***********************************************************************

Last record for the last committed transaction is the following: 
___________________________________________________________________
Trail name :  /opt/OGG/dirdat/U3000000
Hdr-Ind    :     E  (x45)     Partition  :     .  (x04) 
UndoFlag   :     .  (x00)     BeforeAfter:     A  (x41) 
RecLength  :   157 (x009d)    IO Time    : 2011-12-01 21:17:24.084108  
IOType     :    15  (x0f)     OrigNode   :   255  (xff)
TransInd   :     .  (x03)     FormatType :     R  (x52)
SyskeyLen  :     0  (x00)     Incomplete :     .  (x00)
AuditRBA   :       3225       AuditPos   : 42227728
Continued  :     N  (x00)     RecCount   :     1  (x01)

2011-12-01 21:17:24.084108 FieldComp          Len   157 RBA 924
Name: CSCNEW.TAB_CS_CALL_PICKUP
___________________________________________________________________

Reading /opt/OGG/dirdat/U3000000, current RBA 924, 0 records

Report at 2011-12-02 16:36:59 (activity since 2011-12-02 16:36:58)

From Table CSCNEW.TAB_CS_CALL_PICKUP to SRT_CREEN.TAB_CS_CALL_PICKUP:
       #                   inserts:         0
       #                   updates:         0
       #                   deletes:         0
       #                  discards:         1


DDL replication statistics:

                    Operations:         0
             Mapped operations:         0
           Unmapped operations:         0
              Other operations:         0
           Excluded operations:         0
                        Errors:         0
                Retried errors:         0
              Discarded errors:         0
                Ignored errors:         0




Last log location read:
     FILE:      /opt/OGG/dirdat/U3000000
     SEQNO:     0
     RBA:       924
     TIMESTAMP: 2011-12-01 21:17:24.084108
     EOF:       NO
     READERR:   0

2011-12-02 16:36:59  ERROR   OGG-01668  PROCESS ABENDING.

--发现奇怪现象,我2011-12-02早上过来检查这个,发现时间竟然显示2011-12-02 16:36:59,第一反应系统时间错误,继续检查

GGSCI (srtcreen) 2> info r_2

REPLICAT   R_2       Last Started 2011-12-01 21:09   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:05 ago)
Log Read Checkpoint  File /opt/OGG/dirdat/U2000000
                     2011-12-02 17:01:29.927591  RBA 6234

GGSCI (srtcreen) 3> exit
[oracle@srtcreen OGG]$ cd dirdat
[oracle@srtcreen dirdat]$ ll
total 396
-rw-rw-rw- 1 oracle oinstall      0 Dec  1 21:10 U1000000
-rw-rw-rw- 1 oracle oinstall   5984 Dec  2 05:36 U2000000
-rw-rw-rw- 1 oracle oinstall 392258 Dec  2 16:52 U3000000
[oracle@srtcreen dirdat]$ stat U3000000
  File: `U3000000'
  Size: 392258          Blocks: 776        IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 213844220   Links: 1
Access: (0666/-rw-rw-rw-)  Uid: (  501/  oracle)   Gid: (  501/oinstall)
Access: 2011-12-02 16:36:59.000000000 +0800
Modify: 2011-12-02 16:52:55.000000000 +0800
Change: 2011-12-02 16:52:55.000000000 +0800

###############查看源端数据库服务器时间#####################
[oracle@tykf ~]$ date
Fri Dec  2 08:50:10 CST 2011
[oracle@tykf ~]$ ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.1.1.0.0 Build 078
Linux, x64, 64bit (optimized), Oracle 10 on Jul 28 2010 13:21:11

Copyright (C) 1995, 2010, Oracle and/or its affiliates. All rights reserved.



GGSCI (tykf) 1> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     EXT_1       00:00:00      00:00:05    
EXTRACT     RUNNING     EXT_2       00:00:00      00:00:01    
EXTRACT     RUNNING     EXT_3       00:00:00      00:00:05    
EXTRACT     STOPPED     EXT_4       00:00:00      21:59:56    
EXTRACT     STOPPED     EXT_5       00:00:00      21:59:28    
EXTRACT     RUNNING     P_1         00:00:00      00:00:04    
EXTRACT     RUNNING     P_2         00:00:00      00:00:04    
EXTRACT     RUNNING     P_3         00:00:00      00:00:05    
EXTRACT     STOPPED     P_4         00:00:00      21:56:42    
EXTRACT     STOPPED     P_5         00:00:00      21:56:11    


GGSCI (tykf) 2> info ext_3

EXTRACT    EXT_3     Last Started 2011-12-01 13:11   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:03 ago)
Log Read Checkpoint  Oracle Redo Logs
                     2011-12-02 08:50:06  Seqno 3233, RBA 32267264


GGSCI (tykf) 3> info p_3

EXTRACT    P_3       Last Started 2011-12-01 13:11   Status RUNNING
Checkpoint Lag       00:00:00 (updated 00:00:00 ago)
Log Read Checkpoint  File /opt/OGG/dirdat/extract/A3000000
                     2011-12-02 08:50:15.000000  RBA 393705

--ext_3和p_3是目标端r_3的对应进程
#######################################################################

果然是系统时间错误,源端和目标端相差了近八个小时

二、更正目标端时间

[oracle@srtcreen dirdat]$ date
Fri Dec  2 16:55:55 CST 2011
[oracle@srtcreen OGG]$ su - root
Password: 
[root@srtcreen ~]# date -s 08:58:20
Fri Dec  2  08:58:20 CST
[root@srtcreen ~]# clock -w
[root@srtcreen ~]# date
Fri Dec  2  08:58:28 CST

[root@srtcreen ~]# su - oracle
[oracle@srtcreen dirdat]$ ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.1.1.1.1 OGGCORE_11.1.1.1.1_PLATFORMS_110729.1700 
Linux, x64, 64bit (optimized), Oracle 10g on Jul 29 2011 19:43:29

Copyright (C) 1995, 2011, Oracle and/or its affiliates. All rights reserved.



GGSCI (srtcreen) 1> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           
REPLICAT    RUNNING     R_1         00:00:00      unknown     
REPLICAT    RUNNING     R_2         00:00:00      unknown     
REPLICAT    ABENDED     R_3         19:19:34      unknown     
REPLICAT    STOPPED     R_4         00:00:00      11:54:31    
REPLICAT    STOPPED     R_5         00:00:00      11:54:20    
[oracle@srtcreen OGG]$ cd dirdat
[oracle@srtcreen dirdat]$ ll
total 412
-rw-rw-rw- 1 oracle oinstall      0 Dec  1 21:10 U1000000
-rw-rw-rw- 1 oracle oinstall   6485 Dec  2  2011 U2000000
-rw-rw-rw- 1 oracle oinstall 407519 Dec  2 09:04 U3000000

最新写到目标端的数据已经是当前修改时间(这个是系统时间,肯定会修改过来)
出现Time Since Chkpt unknown,处理方法见ogg中Time Since Chkpt显示unknown解决

三、重设r_3时间点

GGSCI (srtcreen) 7> alter r_3,begin 2011-12-02 09:00:00
REPLICAT altered.


GGSCI (srtcreen) 9> start r_3

Sending START request to MANAGER ...
REPLICAT R_3 starting


GGSCI (srtcreen) 10> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           
REPLICAT    RUNNING     R_1         00:00:00      00:00:05      
REPLICAT    RUNNING     R_2         00:00:00      00:00:02     
REPLICAT    ABENDED     R_3         unknown       00:00:00    
REPLICAT    STOPPED     R_4         00:00:00      11:55:40    
REPLICAT    STOPPED     R_5         00:00:00      11:55:29    


GGSCI (srtcreen) 11> view report r_3


***********************************************************************
                 Oracle GoldenGate Delivery for Oracle
     Version 11.1.1.1.1 OGGCORE_11.1.1.1.1_PLATFORMS_110729.1700 
   Linux, x64, 64bit (optimized), Oracle 10g on Sep 13 2011 21:33:03
 
Copyright (C) 1995, 2011, Oracle and/or its affiliates. All rights reserved.


                    Starting at 2011-12-02 09:09:31
***********************************************************************

Operating System Version:
Linux
Version #1 SMP Fri Apr 2 14:58:14 EDT 2010, Release 2.6.18-194.el5
Node: srtcreen
Machine: x86_64
                         soft limit   hard limit
Address Space Size   :    unlimited    unlimited
Heap Size            :    unlimited    unlimited
File Size            :    unlimited    unlimited
CPU Time             :    unlimited    unlimited

Process id: 13629

Description: 

***********************************************************************
**            Running with the following parameters                  **
***********************************************************************
replicat r_3
ASSUMETARGETDEFS
HANDLECOLLISIONS
SETENV (NLS_LANG =AMERICAN_AMERICA.ZHS16GBK)
Set environment variable (NLS_LANG=AMERICAN_AMERICA.ZHS16GBK)
userid it1ogg, password ********
--file for dicarded transaction --
discardfile /opt/OGG/discard/R_3.txt, append, megabytes 100
DDL
MAP CSCNEW.TAB_CS_CALL_PICKUP, TARGET SRT_CREEN.TAB_CS_CALL_PICKUP, KEYCOLS (CALL_ID);


CACHEMGR virtual memory values (may have been adjusted)
CACHEBUFFERSIZE:                         64K
CACHESIZE:                              512M
CACHEBUFFERSIZE (soft max):               4M
CACHEPAGEOUTSIZE (normal):                4M
PROCESS VM AVAIL FROM OS (min):           1G
CACHESIZEMAX (strict force to disk):    881M

Database Version:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
PL/SQL Release 10.2.0.4.0 - Production
CORE    10.2.0.4.0      Production
TNS for Linux: Version 10.2.0.4.0 - Production
NLSRTL Version 10.2.0.4.0 - Production

Database Language and Character Set:
NLS_LANG = "AMERICAN_AMERICA.ZHS16GBK" 
NLS_LANGUAGE     = "AMERICAN" 
NLS_TERRITORY    = "AMERICA" 
NLS_CHARACTERSET = "ZHS16GBK" 

For further information on character set settings, please refer to user manual.

***********************************************************************
**                     Run Time Messages                             **
***********************************************************************

Opened trail file /opt/OGG/dirdat/U3000000 at 2011-12-02 09:09:32

MAP resolved (entry CSCNEW.TAB_CS_CALL_PICKUP):
  MAP CSCNEW.TAB_CS_CALL_PICKUP, TARGET SRT_CREEN.TAB_CS_CALL_PICKUP, KEYCOLS (CALL_ID);
Using following columns in default map by name:
  PICKUP_ID, CALL_ID, CALL_SERIAL, USER_ID, PICKUP_TIME, CALL_RESULT, 
  FAIL_REASON, CALL_TIME, CALL_DURA, END_TIME

Using the following key columns for target table SRT_CREEN.TAB_CS_CALL_PICKUP: CALL_ID.


2011-12-02 09:09:32  WARNING OGG-00869  OCI Error ORA-01407: cannot update ("SRT_CREEN"."TAB_CS_CALL_PICKUP"."PICKUP_ID") to NULL (status = 1407), SQL <UPDAT
E "SRT_CREEN"."TAB_CS_CALL_PICKUP" SET "PICKUP_ID" = :a0,"CALL_SERIAL" = :a2,"USER_ID" = :a3,"PICKUP_TIME" = :a4,"CALL_RESULT" = :a5,"FAIL_REASON" = :a6,"CAL
L_TIME" = :a7,"CALL_DURA" = :a8,"END_T>.

2011-12-02 09:09:32  WARNING OGG-01004  Aborted grouped transaction on 'SRT_CREEN.TAB_CS_CALL_PICKUP', Database error 1407 (OCI Error ORA-01407: cannot updat
e ("SRT_CREEN"."TAB_CS_CALL_PICKUP"."PICKUP_ID") to NULL (status = 1407), SQL <UPDATE "SRT_CREEN"."TAB_CS_CALL_PICKUP" SET "PICKUP_ID" = :a0,"CALL_SERIAL" = 
:a2,"USER_ID" = :a3,"PICKUP_TIME" = :a4,"CALL_RESULT" = :a5,"FAIL_REASON" = :a6,"CALL_TIME" = :a7,"CALL_DURA" = :a8,"END_T>).

2011-12-02 09:09:32  WARNING OGG-01003  Repositioning to rba 375833 in seqno 0.

2011-12-02 09:09:32  WARNING OGG-01154  SQL error 1407 mapping CSCNEW.TAB_CS_CALL_PICKUP to SRT_CREEN.TAB_CS_CALL_PICKUP OCI Error ORA-01407: cannot update (
"SRT_CREEN"."TAB_CS_CALL_PICKUP"."PICKUP_ID") to NULL (status = 1407), SQL <UPDATE "SRT_CREEN"."TAB_CS_CALL_PICKUP" SET "PICKUP_ID" = :a0,"CALL_SERIAL" = :a2
,"USER_ID" = :a3,"PICKUP_TIME" = :a4,"CALL_RESULT" = :a5,"FAIL_REASON" = :a6,"CALL_TIME" = :a7,"CALL_DURA" = :a8,"END_T>.

2011-12-02 09:09:32  WARNING OGG-01003  Repositioning to rba 375833 in seqno 0.

Source Context :
  SourceModule            : [er.main]
  SourceID                : [/scratch/angorant/view_storage/angorant_ogg_12978807_x64/oggcore/OpenSys/src/app/er/rep.c]
  SourceFunction          : [take_rep_err_action]
  SourceLine              : [16134]
  ThreadBacktrace         : [8] elements
                          : [/opt/OGG/replicat(CMessageContext::AddThreadContext()+0x26) [0x5ef8b6]]
                          : [/opt/OGG/replicat(CMessageFactory::CreateMessage(CSourceContext*, unsigned int, ...)+0x7b2) [0x5e6382]]
                          : [/opt/OGG/replicat(_MSG_ERR_MAP_TO_TANDEM_FAILED(CSourceContext*, DBString<777> const&, DBString<777> const&, CMessageFactory::Me
ssageDisposition)+0x9b) [0x5c4bcb]]
                          : [/opt/OGG/replicat [0x81ac2f]]
                          : [/opt/OGG/replicat [0x8f73e2]]
                          : [/opt/OGG/replicat(main+0x84b) [0x50764b]]
                          : [/lib64/libc.so.6(__libc_start_main+0xf4) [0x3e2f41d994]]
                          : [/opt/OGG/replicat(__gxx_personality_v0+0x1da) [0x4e3c2a]]

2011-12-02 09:09:32  ERROR   OGG-01296  Error mapping from CSCNEW.TAB_CS_CALL_PICKUP to SRT_CREEN.TAB_CS_CALL_PICKUP.

***********************************************************************
*                   ** Run Time Statistics **                         *
***********************************************************************

Last record for the last committed transaction is the following: 
___________________________________________________________________
Trail name :  /opt/OGG/dirdat/U3000000
Hdr-Ind    :     E  (x45)     Partition  :     .  (x04) 
UndoFlag   :     .  (x00)     BeforeAfter:     A  (x41) 
RecLength  :   157 (x009d)    IO Time    : 2011-12-02 11:52:49.559112  
IOType     :    15  (x0f)     OrigNode   :   255  (xff)
TransInd   :     .  (x03)     FormatType :     R  (x52)
SyskeyLen  :     0  (x00)     Incomplete :     .  (x00)
AuditRBA   :       3233       AuditPos   : 8194064
Continued  :     N  (x00)     RecCount   :     1  (x01)

2011-12-02 11:52:49.559112 FieldComp          Len   157 RBA 375833
Name: CSCNEW.TAB_CS_CALL_PICKUP
___________________________________________________________________

Reading /opt/OGG/dirdat/U3000000, current RBA 375833, 0 records

Report at 2011-12-02 09:09:32 (activity since 2011-12-02 09:09:32)

From Table CSCNEW.TAB_CS_CALL_PICKUP to SRT_CREEN.TAB_CS_CALL_PICKUP:
       #                   inserts:         0
       #                   updates:         0
       #                   deletes:         0
       #                  discards:         1


DDL replication statistics:

                    Operations:         0
             Mapped operations:         0
           Unmapped operations:         0
              Other operations:         0
           Excluded operations:         0
                        Errors:         0
                Retried errors:         0
              Discarded errors:         0
                Ignored errors:         0




Last log location read:
     FILE:      /opt/OGG/dirdat/U3000000
     SEQNO:     0
     RBA:       375833
     TIMESTAMP: 2011-12-02 11:52:49.559112
     EOF:       NO
     READERR:   0


2011-12-02 09:09:32  ERROR   OGG-01668  PROCESS ABENDING.
--这里可以看出我重设的当前的时间点对应的io time为2011-12-02 11:52:49.559112,而现在尚未到这个时间,进程启动失败

GGSCI (srtcreen) 2> alter r_3,begin 2011-12-02 18:00:00
REPLICAT altered.
--重设时间到今天早上我重设时间之前的错误时间之后

GGSCI (srtcreen) 3> start r_3

Sending START request to MANAGER ...
REPLICAT R_3 starting


GGSCI (srtcreen) 4> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           
REPLICAT    RUNNING     R_1         00:00:00      00:00:06    
REPLICAT    RUNNING     R_2         00:00:00      00:00:01    
REPLICAT    RUNNING     R_3         unknown       00:00:08    
REPLICAT    RUNNING     R_4         00:00:00      00:00:09    
REPLICAT    RUNNING     R_5         00:00:00      00:00:09    


GGSCI (srtcreen) 5> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           
REPLICAT    RUNNING     R_1         00:00:00      00:00:09    
REPLICAT    RUNNING     R_2         00:00:00      00:00:03    
REPLICAT    RUNNING     R_3         00:00:00      00:00:00    
REPLICAT    RUNNING     R_4         00:00:00      00:00:02    
REPLICAT    RUNNING     R_5         00:00:00      00:00:01    


GGSCI (srtcreen) 6> stats r_3

Sending STATS request to REPLICAT R_3 ...

No active replication maps
DDL replication statistics:

*** Total statistics since replicat started     ***
        Operations                                   0.00
        Mapped operations                            0.00
        Unmapped operations                          0.00
        Other operations                             0.00
        Excluded operations                          0.00
        Errors                                       0.00
        Retried errors                               0.00
        Discarded errors                             0.00
        Ignored errors                               0.00
--进程启动,工作正常

三、补充说明
1、部署ogg前需要核对两边时间是否相同,为了避免不必要的麻烦,建议配置ntp同步时间
2、这里因为我们这要求该同步上线无严格时间要求,所以通过重设r_3时间点实现工作正常,如果事情比较紧急,建议删除trail文件,重建pump和replicat进程
3、这里说明ogg同步也还不是和时间一点关系都没有

Mysql查询视图:ERROR 1449 (HY000)

1、问题重现
前几天因为有人删除了数据库中的记录,今天关闭了数据库的远程访问功能,今天接到开发报告,说出现 The user specified as a definer (‘air’@’%’) does not exist错误,他们定位是一张视图不能访问。利用实验重现了他们的情况

[root@ECP-UC-DB1 ~]# mysql -uxff -pxifenfei
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 8846
Server version: 5.5.14-log MySQL Community Server (GPL)

Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select user,host from mysql.user;
+------+---------------+
| user | host          |
+------+---------------+
| xff  | %             |
| root | 127.0.0.1     |
| repl | 192.168.11.10 |
| root | ::1           |
|      | ECP-UC-DB1    |
| root | ECP-UC-DB1    |
| root | localhost     |
+------+---------------+
7 rows in set (0.08 sec)

mysql> use xifenfei;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed

mysql> create view v_users as select * from wp_users;
Query OK, 0 rows affected (0.14 sec)

mysql> select count(*) from xifenfei.v_users;
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.03 sec)

mysql> update mysql.user set host='localhost' where user='xff' and host='%';
Query OK, 1 row affected (0.05 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> FLUSH PRIVILEGES;                   
Query OK, 0 rows affected (0.12 sec)

mysql> exit
Bye
[root@ECP-UC-DB1 ~]# mysql -uxff -pxifenfei
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 8847
Server version: 5.5.14-log MySQL Community Server (GPL)

Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use xff;
ERROR 1049 (42000): Unknown database 'xff'
mysql> use xifenfei;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select * from v_users ;
ERROR 1449 (HY000): The user specified as a definer ('xff'@'%') does not exist

2、解决方法

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| test               |
| xifenfei           |
+--------------------+
5 rows in set (0.00 sec)

mysql> use information_schema;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed

mysql> desc VIEWS;
+----------------------+--------------+------+-----+---------+-------+
| Field                | Type         | Null | Key | Default | Extra |
+----------------------+--------------+------+-----+---------+-------+
| TABLE_CATALOG        | varchar(512) | NO   |     |         |       |
| TABLE_SCHEMA         | varchar(64)  | NO   |     |         |       |
| TABLE_NAME           | varchar(64)  | NO   |     |         |       |
| VIEW_DEFINITION      | longtext     | NO   |     | NULL    |       |
| CHECK_OPTION         | varchar(8)   | NO   |     |         |       |
| IS_UPDATABLE         | varchar(3)   | NO   |     |         |       |
| DEFINER              | varchar(77)  | NO   |     |         |       |
| SECURITY_TYPE        | varchar(7)   | NO   |     |         |       |
| CHARACTER_SET_CLIENT | varchar(32)  | NO   |     |         |       |
| COLLATION_CONNECTION | varchar(32)  | NO   |     |         |       |
+----------------------+--------------+------+-----+---------+-------+
10 rows in set (0.02 sec)

mysql> select TABLE_SCHEMA,TABLE_NAME,DEFINER from views;
+--------------+------------+---------+
| TABLE_SCHEMA | TABLE_NAME | DEFINER |
+--------------+------------+---------+
| xifenfei     | v_users    | xff@%   |
+--------------+------------+---------+
1 row in set (0.16 sec)

mysql> create or replace view v_users as select * from wp_users;
ERROR 1044 (42000): Access denied for user 'xff'@'localhost' to database 'information_schema'
mysql> create or replace view xifenfei.v_users as select * from xifenfei.wp_users;
Query OK, 0 rows affected (0.02 sec)

mysql> select TABLE_SCHEMA,TABLE_NAME,DEFINER from views;
+--------------+------------+---------------+
| TABLE_SCHEMA | TABLE_NAME | DEFINER       |
+--------------+------------+---------------+
| xifenfei     | v_users    | xff@localhost |
+--------------+------------+---------------+
1 row in set (0.01 sec)

mysql> select count(*) from xifenfei.v_users;
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.03 sec)

3、原因分析
因为创建视图使用的是xff@%用户(目前已经不存在),然后登录用户使用的是xff@localhost用户,导致mysql认为现在的用户无权限访问该视图,解决方法就是在当前用户下重建该视图

Data pump 中network_link参数的使用续(登录用户和源端用户不一致处理)

1、准备工作

[oracle@node1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Wed Nov 30 09:08:04 2011

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options

SQL> create user test identified by xifenfei;

User created.

SQL> grant connect,resource to test;

Grant succeeded.

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options

2、远程导出数据

[oracle@node1 ~]$  expdp test/xifenfei directory=test_dir dumpfile=t1.dmp network_link=dblink_test tables=chf.T2_1

Export: Release 11.2.0.3.0 - Production on Wed Nov 30 09:09:35 2011

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options
ORA-39002: invalid operation
ORA-39070: Unable to open the log file.
ORA-39087: directory name TEST_DIR is invalid
--对目录需要相关权限

[oracle@node1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Wed Nov 30 09:09:59 2011

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options

SQL> grant read,write on directory test_dir to test;

Grant succeeded.

SQL>  expdp test/xifenfei directory=test_dir dumpfile=t1.dmp network_link=dblink_test tables=chf.T2_1
SP2-0734: unknown command beginning "expdp test..." - rest of line ignored.
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options
[oracle@node1 ~]$  expdp test/xifenfei directory=test_dir dumpfile=t1.dmp network_link=dblink_test tables=chf.T2_1

Export: Release 11.2.0.3.0 - Production on Wed Nov 30 09:10:23 2011

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options
ORA-31631: privileges are required
ORA-39109: Unprivileged users may not operate upon other users' schemas
--执行导出用户需要相关权限exp_full_database

[oracle@node1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Wed Nov 30 09:11:54 2011

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options

SQL> grant exp_full_database  to test;

Grant succeeded.

SQL> exit 
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options
[oracle@node1 ~]$  expdp test/xifenfei directory=test_dir dumpfile=t1.dmp network_link=dblink_test tables=chf.T2_1

Export: Release 11.2.0.3.0 - Production on Wed Nov 30 09:14:29 2011

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options
Starting "TEST"."SYS_EXPORT_TABLE_01":  test/******** directory=test_dir dumpfile=t1.dmp network_link=dblink_test tables=chf.T2_1 
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 22 MB
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
. . exported "CHF"."T2_1"                                9.326 MB  100000 rows
Master table "TEST"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for TEST.SYS_EXPORT_TABLE_01 is:
  /tmp/t1.dmp
Job "TEST"."SYS_EXPORT_TABLE_01" successfully completed at 09:14:35
--实现用test登录,导出源端chf用户下面的表

[oracle@node1 ~]$ rm /tmp/t1.dmp
[oracle@node1 ~]$  expdp test/xifenfei directory=test_dir dumpfile=t1.dmp network_link=dblink_test tables=T2_1

Export: Release 11.2.0.3.0 - Production on Wed Nov 30 09:14:57 2011

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options
ORA-39001: invalid argument value
ORA-39195: At least one schema in the TABLE_FILTER does not exist.
--如果当前登录用户和表所属用户不同,需要指定schema的名称

3、直接导入数据到目标端

[oracle@node1 ~]$ impdp test/xifenfei directory=test_dir network_link=dblink_test REMAP_SCHEMA=chf:xff REMAP_TABLESPACE=odu:users tables=chf.t2_1

Import: Release 11.2.0.3.0 - Production on Wed Nov 30 09:45:38 2011

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options
ORA-31631: privileges are required
ORA-39109: Unprivileged users may not operate upon other users' schemas
--登录用户需要imp_full_database权限

[oracle@node1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Wed Nov 30 09:46:41 2011

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options

SQL> grant imp_full_database to test;

Grant succeeded.

SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options
[oracle@node1 ~]$ impdp test/xifenfei directory=test_dir network_link=dblink_test REMAP_SCHEMA=chf:xff REMAP_TABLESPACE=odu:users tables=chf.t2_1

Import: Release 11.2.0.3.0 - Production on Wed Nov 30 09:46:59 2011

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options
Starting "TEST"."SYS_IMPORT_TABLE_01":  test/******** directory=test_dir network_link=dblink_test REMAP_SCHEMA=chf:xff REMAP_TABLESPACE=odu:users tables=chf.t2_1 
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 22 MB
Processing object type TABLE_EXPORT/TABLE/TABLE
. . imported "XFF"."T2_1"                                100000 rows
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Job "TEST"."SYS_IMPORT_TABLE_01" successfully completed at 09:47:07
--使用test用户登录,目标端用户为xff,源端用户为chf

[oracle@node1 ~]$ impdp test/xifenfei directory=test_dir network_link=dblink_test REMAP_SCHEMA=chf:xff REMAP_TABLESPACE=odu:users tables=t2_1

Import: Release 11.2.0.3.0 - Production on Wed Nov 30 09:52:54 2011

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options
ORA-39001: invalid argument value
ORA-39195: At least one schema in the TABLE_FILTER does not exist.
--不同用户时,需要指定源端表所属用户

Data pump 中network_link参数的使用

一、准备工作
1、源端

[oracle@ECP-UC-DB1 ~]$ sqlplus  chf/xifenfei

SQL*Plus: Release 10.2.0.4.0 - Production on Tue Nov 29 15:07:35 2011

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>  select count(*) from T2_1 ;

  COUNT(*)
----------
    100000

2、目标端

--tns配置
test =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 192.168.11.12)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = test)
    )
  )
[oracle@node1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Tue Nov 29 14:57:08 2011

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options

SQL> create public database link dblink_test  
  2  connect to chf identified by xifenfei using 'test';

Database link created.

SQL> grant read,write on directory test_dir to xff;

Grant succeeded.

二、导入方式
1、expdp导出,impdp导入

[oracle@node1 ~]$ expdp chf/xff directory=test_dir dumpfile=t1_2.dmp network_link=dblink_test tables=T2_1

Export: Release 11.2.0.3.0 - Production on Tue Nov 29 15:05:36 2011

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options
Starting "CHF"."SYS_EXPORT_TABLE_01":  chf/******** directory=test_dir dumpfile=t1_2.dmp network_link=dblink_test tables=T2_1 
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 22 MB
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
. . exported "CHF"."T2_1"                                9.326 MB  100000 rows
Master table "CHF"."SYS_EXPORT_TABLE_01" successfully loaded/unloaded
******************************************************************************
Dump file set for CHF.SYS_EXPORT_TABLE_01 is:
  /tmp/t1_2.dmp
Job "CHF"."SYS_EXPORT_TABLE_01" successfully completed at 15:06:52
--1、userid使用的是目标端(expdp端)登录
--2、tables对应的表所有者需要和userid相同

[oracle@node1 tmp]$ ll /tmp/t1_2.dmp
-rw-r----- 1 oracle oinstall 9859072 11-29 15:06 /tmp/t1_2.dmp
[oracle@node1 tmp]$ impdp xff/xifenfei directory=test_dir dumpfile=t1_2.dmp REMAP_SCHEMA=chf:xff REMAP_TABLESPACE=odu:users 

Import: Release 11.2.0.3.0 - Production on Tue Nov 29 15:30:15 2011

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options
Master table "XFF"."SYS_IMPORT_FULL_01" successfully loaded/unloaded
Starting "XFF"."SYS_IMPORT_FULL_01":  xff/******** directory=test_dir dumpfile=t1_2.dmp REMAP_SCHEMA=chf:xff REMAP_TABLESPACE=odu:users 
Processing object type TABLE_EXPORT/TABLE/TABLE
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
. . imported "XFF"."T2_1"                                9.326 MB  100000 rows
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Job "XFF"."SYS_IMPORT_FULL_01" successfully completed at 15:30:18
[oracle@node1 tmp]$ sqlplus xff/xifenfei

SQL*Plus: Release 11.2.0.3.0 Production on Tue Nov 29 15:30:43 2011

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options

SQL> select count(*) from t2_1;

  COUNT(*)
----------
    100000

2、使用impdp直接导入

[oracle@node1 tmp]$ impdp chf/xff directory=test_dir network_link=dblink_test REMAP_SCHEMA=chf:xff REMAP_TABLESPACE=odu:users tables=t2_1

Import: Release 11.2.0.3.0 - Production on Tue Nov 29 15:48:49 2011

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options
Starting "CHF"."SYS_IMPORT_TABLE_01":  chf/******** directory=test_dir network_link=dblink_test REMAP_SCHEMA=chf:xff REMAP_TABLESPACE=odu:users tables=t2_1 
Estimate in progress using BLOCKS method...
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 22 MB
Processing object type TABLE_EXPORT/TABLE/TABLE
. . imported "XFF"."T2_1"                                100000 rows
Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
Job "CHF"."SYS_IMPORT_TABLE_01" successfully completed at 15:49:00
--1、userid使用的是impdp端登录
--2、tables对应的表所有者需要和userid相同(tables的表所有者是源端,使用REMAP_SCHEMA映射owner)

[oracle@node1 tmp]$ sqlplus xff/xifenfei

SQL*Plus: Release 11.2.0.3.0 Production on Tue Nov 29 15:51:18 2011

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options

SQL> select count(*) from t2_1;

  COUNT(*)
----------
    100000

如果登录用户和需要导入的表用户名不同,暂不清楚怎么处理

Mysql误删除数据及其bug分析

一、现状描述
11月25日晚上8点40分接到现场电话,说我们公司所有员工的im不能正常登陆,im数据库服务器(mysql数据库)的公司表中对应的我们公司名称被删除,需要处理。接到这个异常后,第一想到的是下午下班前,收到一封ogg的警告邮件说ogg进程终止,然后我登陆数据库查看的时候,发现ogg已经工作正常,所以也就没有太多关注。既然已经出现了这个问题,那么先解决问题,再分析原因。因为这些都是ogg从oracle端同步过来的一些数据,所以直接从oracle那边初始化一份过来,然后重设同步程序就可以了。

二、错误分析
1、出现这个问题,第一想到的就是binlog,因为公司表的数据是从我们oracle那边同步过去的,而且oracle那边没有任何关于这个表的删除操作,所以我定位这个表的delete操作

[mysql@ezgclient mysqllog]$ mysqlbinlog mysqlbin.000150>/tmp/11_25.txt
[mysql@ezgclient mysqllog]$ vi /tmp/11_25.txt
……
# at 1396789
# at 1396875
#111125 16:15:31 server id 2  end_log_pos 1396875       Table_map: `a`.`abc` mapped to number 5304
#111125 16:15:31 server id 2  end_log_pos 1397000       Delete_rows: table id 5304 flags: STMT_END_F

BINLOG '
o07PThMCAAAAVgAAAItQFQAAALgUAAAAAAAABnNydGFpcgAPdGFiX3V1bV9jb21wYW55AA4DD/b2
CAwMCA8PDw8PCBAAARQAFgCAAIAAQABAAEAA+D8=
o07PThkCAAAAfQAAAAhRFQAQALgUAAAAAAEADv//AP7iAAAAFAC6vNbdysC1vL/GvLzT0M/euavL
voAAAAAAOLkeQYAAAAAAAAAAAAEpAAAAAAAAANkqumA8EgAAO8kaakoSAAABAAAAAAAAAA9CVTAw
MDAwMDAzNTgwNzg=
……

2、因为binlog_format采用的是row模式,所以需要进一步解析binglog

[mysql@ezgclient mysqllog]$ mysqlbinlog -v -v mysqlbin.000150>/tmp/11_25.txt
[mysql@ezgclient mysqllog]$ vi /tmp/11_25.txt
#111125 16:15:31 server id 2  end_log_pos 1396875   Table_map: `a`.`abc` mapped to number 5304
#111125 16:15:31 server id 2  end_log_pos 1397000   Delete_rows: table id 5304 flags: STMT_END_F

BINLOG '
o07PThMCAAAAVgAAAItQFQAAALgUAAAAAAAABnNydGFpcgAPdGFiX3V1bV9jb21wYW55AA4DD/b2
CAwMCA8PDw8PCBAAARQAFgCAAIAAQABAAEAA+D8=
o07PThkCAAAAfQAAAAhRFQAQALgUAAAAAAEADv//AP7iAAAAFAC6vNbdysC1vL/GvLzT0M/euavL
voAAAAAAOLkeQYAAAAAAAAAAAAEpAAAAAAAAANkqumA8EgAAO8kaakoSAAABAAAAAAAAAA9CVTAw
MDAwMDAzNTgwNzg=
'/*!*/;
### DELETE FROM a.abc
### WHERE
###   @1=226 /* INT meta=0 nullable=0 is_null=0 */
###   @2='*****有限公司' /* VARSTRING(256) meta=256 nullable=0 is_null=0 */
###   @3=951656001 /* DECIMAL(20,0) meta=5120 nullable=0 is_null=0 */
###   @4=000000001 /* DECIMAL(22,0) meta=5632 nullable=1 is_null=0 */
###   @5=41 /* LONGINT meta=0 nullable=1 is_null=0 */
###   @6=2005-05-30 15:11:29 /* DATETIME meta=0 nullable=1 is_null=0 */
###   @7=2011-08-17 02:02:19 /* DATETIME meta=0 nullable=1 is_null=0 */
###   @8=1 /* LONGINT meta=0 nullable=1 is_null=0 */
###   @9='BU0000000358078' /* VARSTRING(128) meta=128 nullable=1 is_null=0 */
###   @10=NULL /* VARSTRING(128) meta=128 nullable=1 is_null=1 */
###   @11=NULL /* VARSTRING(128) meta=64 nullable=1 is_null=1 */
###   @12=NULL /* VARSTRING(128) meta=64 nullable=1 is_null=1 */
###   @13=NULL /* VARSTRING(128) meta=64 nullable=1 is_null=1 */
###   @14=NULL /* VARSTRING(128) meta=0 nullable=1 is_null=1 */
# at 1397000
#111125 16:15:31 server id 2  end_log_pos 1397027   Xid = 79238866
COMMIT/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;

通过上面的分析和这里的日志情况显示,很明显有人误删除了这条记录,导致我们公司所有员工不能登录im(登录在线的,不会使用到这条记录,这个也就是导致了我们到晚上八点多才发现这个异常)

3、检查error日志
检查这个日志,发现一个很明显的bug,这个是导致数据库重启,以及那个时间因为数据库重启导致ogg进程异常收到邮件

111125 16:15:35 InnoDB: Assertion failure in thread 1095162176 in file row/row0mysql.c line 1534
InnoDB: Failing assertion: index->type & DICT_CLUSTERED
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html
InnoDB: about forcing recovery.
111125 16:15:35 - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

key_buffer_size=8388608
read_buffer_size=1048576
max_used_connections=30
max_threads=1000
threads_connected=14
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 9234379 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0xb8bf170
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x4146d100 thread_stack 0x30000
/opt/mysql/product/5.1/bin/mysqld(my_print_stacktrace+0x2e)[0x8a74ce]
/opt/mysql/product/5.1/bin/mysqld(handle_segfault+0x322)[0x5dc992]
/lib64/libpthread.so.0[0x357980eb10]
/lib64/libc.so.6(gsignal+0x35)[0x3578c30265]
/lib64/libc.so.6(abort+0x110)[0x3578c31d10]
/opt/mysql/product/5.1/bin/mysqld(row_unlock_for_mysql+0x2f2)[0x7f4a52]
/opt/mysql/product/5.1/bin/mysqld(row_search_for_mysql+0x22e1)[0x802591]
/opt/mysql/product/5.1/bin/mysqld(_ZN11ha_innobase10index_readEPhPKhj16ha_rkey_function+0x192)[0x7724d2]
/opt/mysql/product/5.1/bin/mysqld(_ZN7handler16read_range_firstEPK12st_key_rangeS2_bb+0xbe)[0x6caa9e]
/opt/mysql/product/5.1/bin/mysqld(_ZN7handler22read_multi_range_firstEPP18st_key_multi_rangeS1_jbP17st_handler_buffer+0xce)[0x6c85be]
/opt/mysql/product/5.1/bin/mysqld(_ZN18QUICK_RANGE_SELECT8get_nextEv+0x127)[0x6aa557]
/opt/mysql/product/5.1/bin/mysqld[0x6c415d]
/opt/mysql/product/5.1/bin/mysqld(_Z12mysql_deleteP3THDP10TABLE_LISTP4ItemP11st_sql_listyyb+0x86c)[0x66fc5c]
/opt/mysql/product/5.1/bin/mysqld(_Z21mysql_execute_commandP3THD+0x38bf)[0x5f03af]
/opt/mysql/product/5.1/bin/mysqld(_Z11mysql_parseP3THDPKcjPS2_+0x357)[0x5f25e7]
/opt/mysql/product/5.1/bin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0xe63)[0x5f3453]
/opt/mysql/product/5.1/bin/mysqld(_Z10do_commandP3THD+0xe6)[0x5f3d16]
/opt/mysql/product/5.1/bin/mysqld(handle_one_connection+0x236)[0x5e66d6]
/lib64/libpthread.so.0[0x357980673d]
/lib64/libc.so.6(clone+0x6d)[0x3578cd3d1d]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort...
thd->query at 0x2aab4890fcd0 is an invalid pointer
thd->thread_id=62259
thd->killed=NOT_KILLED
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
111125 16:15:35 mysqld_safe Number of processes running now: 0
111125 16:15:35 mysqld_safe mysqld restarted
InnoDB: Log scan progressed past the checkpoint lsn 0 694228728
111125 16:15:36 InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Doing recovery: scanned up to log sequence number 0 694229872
111125 16:15:36 InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percents: 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 5
6 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 
InnoDB: Apply batch completed
InnoDB: Last MySQL binlog file position 0 1397027, file name /opt/mysql/mysqldata/mysqllog/mysqlbin.000149
111125 16:15:37 InnoDB: Started; log sequence number 0 694229872
111125 16:15:37 [Note] Recovering after a crash using /opt/mysql/mysqldata/mysqllog/mysqlbin
111125 16:15:37 [Note] Starting crash recovery...
111125 16:15:37 [Note] Crash recovery finished.

而且还有个巧合就是查看binlog发现,DELETE FROM a.abc where ……之后,数据库就因为这个bug自动重启了。

4、网友解释

# /opt/mysql/product/5.1/bin/mysqld(_Z12mysql_deleteP3THDP10TABLE_LISTP4ItemP11st_sql_listyyb+0x86c)[0x66fc5c]
# /opt/mysql/product/5.1/bin/mysqld(_Z21mysql_execute_commandP3THD+0x38bf)[0x5f03af]
---从这信息看是整理簇索引,导致表空间出现损坏
---分析的情况,你的系统应该正在做一个DELETE操作,而且应该无索引可走,删除的数据量也比较大
---可能是大量数据被缓存在innodb_buffer_pool_size中,并且其内部有创建自适应的hash索引,因删除数据而不得不重新创建,
---以及你的服务器当时IO出现瓶颈,导致一时无法响应Innodb master thread,而出现问题,并且InnoDB引擎在此方面出现过BUG
---解决版本是5.1.37之后,所以建议使用:5.1.40版本,较稳定

感谢jinguanding前辈热情帮助
http://www.itpub.net/forum.php?mod=viewthread&tid=1515971&page=1#pid18593129