statement suspended, wait error to be cleared

I. Incident report
While importing data, impdp hung at "Processing object type SCHEMA_EXPORT/TABLE/INDEX/INDEX". Help!

[oracle@TestServer-RHAS-5 dmpdir]$ impdp system/ DIRECTORY=dmpdir DUMPFILE=cscnew.20111123.dmp LOGFILE=cscnew.20111123.log SCHEMAS=CSCNEW remap_schema=CSCNEW:TESTB remap_tablespace=CSC_TAB_1:TESTB table_exists_action=replace
…………
. . imported "TESTB"."TAB_CS_SELF_WORKTIME"                  0 KB       0 rows
. . imported "TESTB"."TAB_CS_SELF_WORKTIME_DETAIL"           0 KB       0 rows
. . imported "TESTB"."TAB_CS_USERMENU"                       0 KB       0 rows
. . imported "TESTB"."TAB_PUB_BANK"                          0 KB       0 rows
. . imported "TESTB"."TAB_PUB_BUSISRVINFO"                   0 KB       0 rows
. . imported "TESTB"."TAB_PUB_CONTACT"                       0 KB       0 rows
Processing object type SCHEMA_EXPORT/TABLE/GRANT/OWNER_GRANT/OBJECT_GRANT
Processing object type SCHEMA_EXPORT/TABLE/INDEX/INDEX

II. Troubleshooting
1. Check whether impdp was terminated, e.g. by a network interruption

[oracle@TestServer-RHAS-5 ~]$ ps -ef|grep impdp
oracle    2520  1837  0 09:59 pts/8    00:00:00 grep impdp
oracle   23819 20966  0 09:39 pts/6    00:00:00 impdp         DIRECTORY=dmpdir DUMPFILE=cscnew.20111123.dmp LOGFILE=cscnew.20111123.log SCHEMAS=CSCNEW remap_schema=CSCNEW:TESTB remap_tablespace=CSC_TAB_1:TESTB table_exists_action=replace
[oracle@TestServer-RHAS-5 ~]$ ps -ef|grep LOCAL=YES
oracle    2692  1837  0 10:00 pts/8    00:00:00 grep LOCAL=YES
oracle   10754 10694  0 09:15 ?        00:00:09 oraclemcrm (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
oracle   23835 23819  0 09:40 ?        00:00:00 oraclemcrm (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))

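The output above shows that the impdp client (PID 23819) is still running, together with its dedicated server process (PID 23835, whose parent PID is 23819). So impdp was not terminated.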
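The check above relies on the parent/child relationship between the impdp client and its bequeath (LOCAL=YES) server process. Here is a minimal Python sketch of the same check. It assumes the standard ps -ef column layout (UID PID PPID C STIME TTY TIME CMD), and the function names are hypothetical:

```python
from typing import Dict, List, NamedTuple


class Proc(NamedTuple):
    pid: int
    ppid: int
    cmd: str


def parse_ps_ef(text: str) -> List[Proc]:
    """Parse `ps -ef` output; columns are UID PID PPID C STIME TTY TIME CMD."""
    procs = []
    for line in text.splitlines():
        fields = line.split(None, 7)  # CMD (the 8th field) may contain spaces
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # skip the header and malformed lines
        procs.append(Proc(int(fields[1]), int(fields[2]), fields[7]))
    return procs


def impdp_and_servers(procs: List[Proc]) -> Dict[int, List[int]]:
    """Map each impdp client PID to the PIDs of its child (server) processes."""
    # Comparing the first word of CMD skips the "grep impdp" line itself.
    clients = [p for p in procs if p.cmd.split()[0] == "impdp"]
    return {c.pid: [p.pid for p in procs if p.ppid == c.pid] for c in clients}
```

Fed the four ps lines shown above, impdp_and_servers returns {23819: [23835]}. An impdp entry with a non-empty child list means both the client and its server process are alive.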

2. Check wait events

[oracle@TestServer-RHAS-5 ~]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Thu Nov 24 10:00:26 2011

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> select event from v$session_wait where wait_class#<>6; 

EVENT
----------------------------------------------------------------
SQL*Net message to client
statement suspended, wait error to be cleared

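The query excludes idle waits (wait_class# 6 is the Idle wait class), and it turned up an unusual wait event: statement suspended, wait error to be cleared.
Checking MOS confirmed that impdp had been suspended because the tablespace ran out of space: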
Statement Suspended, Wait Error To Be Cleared Wait Event [ID 761848.1]

Oracle Database provides a means for suspending, and later resuming, 
the execution of large database operations in the event of space allocation failures. 
This enables you to take corrective action instead of the Oracle Database server returning an error to the user. 
After the error condition is corrected, the suspended operation automatically resumes. 
This feature is called resumable space allocation. The statements that are affected are called resumable statements. 
The time between suspending the execution till correction of the error is reported as 
"statement suspended, wait error to be cleared" wait event.
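Here is a toy Python model that makes the suspend/resume behaviour concrete. It is not how Oracle implements resumable space allocation. The Tablespace class, the on_suspend hook (standing in for the DBA's corrective action) and the max_waits limit (standing in for the resumable timeout) are all hypothetical:

```python
from typing import Callable, List


class SpaceError(Exception):
    """Stands in for a space allocation failure such as ORA-01652."""


class Tablespace:
    def __init__(self, max_mb: int) -> None:
        self.max_mb = max_mb  # total size the datafiles may grow to
        self.used_mb = 0

    def extend(self, mb: int) -> None:
        if self.used_mb + mb > self.max_mb:
            raise SpaceError(f"unable to extend by {mb} MB")
        self.used_mb += mb


def run_resumable(ts: Tablespace, chunks_mb: List[int],
                  on_suspend: Callable[[Tablespace], None],
                  max_waits: int = 10) -> int:
    """Allocate each chunk in turn. On a space error the 'statement' is
    suspended (on_suspend runs, like the DBA fixing the tablespace) and the
    same chunk is retried instead of failing. After max_waits suspensions the
    error is raised, playing the role of the resumable timeout.
    Returns the number of suspensions."""
    suspensions = 0
    for mb in chunks_mb:
        while True:
            try:
                ts.extend(mb)
                break
            except SpaceError:
                if suspensions >= max_waits:
                    raise  # timeout reached: the error finally reaches the user
                suspensions += 1
                on_suspend(ts)  # "statement suspended, wait error to be cleared"
    return suspensions


if __name__ == "__main__":
    ts = Tablespace(max_mb=2000)
    add_datafile = lambda t: setattr(t, "max_mb", t.max_mb + 30720)
    print(run_resumable(ts, [1000, 1000, 500], add_datafile), ts.used_mb)  # 1 2500
```

In this model, the 500 MB allocation that no longer fits suspends once. It then completes as soon as the hook adds space, which mirrors what the alert log shows below.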

3. Confirm in the alert log

[oracle@TestServer-RHAS-5 ~]$ cd /opt/oracle/admin/mcrm/bdump/
[oracle@TestServer-RHAS-5 bdump]$ tail -30 alert_mcrm.log 
Thu Nov 24 09:29:20 2011
create tablespace testb
datafile '/opt/oradata/mcrm/testb.dbf'
size 1500M autoextend on next 50M maxsize 2000M
Thu Nov 24 09:29:51 2011
Completed: create tablespace testb
datafile '/opt/oradata/mcrm/testb.dbf'
size 1500M autoextend on next 50M maxsize 2000M
Thu Nov 24 09:40:00 2011
The value (30) of MAXTRANS parameter ignored.
kupprdp: master process DM00 started with pid=111, OS id=23858
         to execute - SYS.KUPM$MCP.MAIN('SYS_IMPORT_SCHEMA_01', 'SYSTEM', 'KUPC$C_1_20111124094000', 'KUPC$S_1_20111124094000', 0);
kupprdp: worker process DW01 started with worker id=1, pid=112, OS id=23870
         to execute - SYS.KUPW$WORKER.MAIN('SYS_IMPORT_SCHEMA_01', 'SYSTEM');
Thu Nov 24 09:43:11 2011
statement in resumable session 'SYSTEM.SYS_IMPORT_SCHEMA_01.1' was suspended due to
    ORA-01652: unable to extend temp segment by 128 in tablespace TESTB
Thu Nov 24 10:00:45 2011
Thread 1 advanced to log sequence 4761 (LGWR switch)
  Current log# 3 seq# 4761 mem# 0: /opt/oradata/mcrm/redo03.log
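The suspension message in the alert log has a regular shape, so it can be picked out automatically. Here is a minimal Python sketch. It assumes the message layout shown above (session name on one line, the ORA- error indented on the next), and the function name is hypothetical:

```python
import re
from typing import Dict, List, Optional

SUSPEND_RE = re.compile(
    r"statement in resumable session '(?P<session>[^']+)' was suspended due to\s+"
    r"(?P<error>ORA-\d+): (?P<message>[^\n]*)"
)
TABLESPACE_RE = re.compile(r"in tablespace (\S+)")


def find_suspensions(alert_text: str) -> List[Dict[str, Optional[str]]]:
    """Return the session, ORA- error and tablespace of each resumable suspension."""
    found = []
    for m in SUSPEND_RE.finditer(alert_text):
        ts = TABLESPACE_RE.search(m.group("message"))
        found.append({
            "session": m.group("session"),
            "error": m.group("error"),
            "tablespace": ts.group(1) if ts else None,
        })
    return found
```

On the log above, this yields a single entry: session SYSTEM.SYS_IMPORT_SCHEMA_01.1, error ORA-01652, tablespace TESTB.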

4. Check space usage of the TESTB tablespace

SQL> select bytes/1024/1024,maxbytes/1024/1024,user_bytes/1024/1024 
  2  from dba_data_files where tablespace_name='TESTB';

BYTES/1024/1024 MAXBYTES/1024/1024 USER_BYTES/1024/1024
--------------- ------------------ --------------------
           2000               2000            1998.9375
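DBA_DATA_FILES reports MAXBYTES = 0 for files that are not autoextensible. The remaining autoextend headroom per file is therefore max(MAXBYTES - BYTES, 0). Here is a small Python sketch of that arithmetic, in MB, using the figures above. It deliberately ignores free extents inside already-allocated space (DBA_FREE_SPACE), which this query does not show:

```python
from typing import Iterable, Tuple


def autoextend_headroom_mb(files: Iterable[Tuple[float, float]]) -> float:
    """Sum of max(MAXBYTES - BYTES, 0) over (BYTES, MAXBYTES) pairs, in MB.

    Files that are not autoextensible have MAXBYTES = 0 and contribute 0.
    Free space inside already-allocated extents is not counted.
    """
    return sum(max(max_mb - size_mb, 0) for size_mb, max_mb in files)
```

For TESTB as queried, autoextend_headroom_mb([(2000, 2000)]) is 0. The file can no longer grow, which matches the ORA-01652. After adding testb01.dbf (100 MB, maxsize 30g = 30720 MB), the result for [(2000, 2000), (100, 30720)] becomes 30620.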

5. Fix the problem

Thu Nov 24 10:04:21 2011
alter tablespace TESTB add datafile '/opt/oradata/mcrm/testb01.dbf' size 100m  autoextend on next 1m maxsize 30g
Thu Nov 24 10:04:25 2011
Completed: alter tablespace TESTB add datafile '/opt/oradata/mcrm/testb01.dbf' size 100m  autoextend on next 1m maxsize 30g
Thu Nov 24 10:04:26 2011
statement in resumable session 'SYSTEM.SYS_IMPORT_SCHEMA_01.1' was resumed

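The log shows that as soon as the space shortage was fixed (by adding a datafile, or alternatively by resizing a datafile), the impdp job resumed on its own. The query in step 4 explains the shortage: the only TESTB datafile had already autoextended to its MAXSIZE of 2000 MB.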
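The alert-log timestamps also show how long the job sat in the wait event: it was suspended at 09:43:11 and resumed at 10:04:26. Here is a Python sketch of that subtraction. It assumes the alert log's English timestamp format (e.g. Thu Nov 24 09:43:11 2011) and a C/English locale:

```python
from datetime import datetime, timedelta

ALERT_TS_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Thu Nov 24 09:43:11 2011"


def suspended_for(suspended_at: str, resumed_at: str) -> timedelta:
    """Time between the 'was suspended' and 'was resumed' alert-log timestamps."""
    start = datetime.strptime(suspended_at, ALERT_TS_FORMAT)
    end = datetime.strptime(resumed_at, ALERT_TS_FORMAT)
    return end - start


if __name__ == "__main__":
    print(suspended_for("Thu Nov 24 09:43:11 2011", "Thu Nov 24 10:04:26 2011"))  # 0:21:15
```

In this incident, the import was therefore stalled for 21 minutes 15 seconds (1275 s) before the datafile was added.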

ORA-09968, ORA-01102 When Starting a Database

I. The forum user's error
A user on the pub forum ran into the following problem:

Tue Nov 22 10:31:19 2011
ALTER DATABASE   MOUNT
Tue Nov 22 10:31:19 2011
sculkget: failed to lock /u01/app/oracle/product/10.2.01/db_1/dbs/lkORCL exclusive
sculkget: lock held by PID: 26308
Tue Nov 22 10:31:19 2011
ORA-09968: unable to lock file
Linux Error: 11: Resource temporarily unavailable
Additional information: 26308
Tue Nov 22 10:31:19 2011
ORA-1102 signalled during: ALTER DATABASE   MOUNT...

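My suggestion was to restart the database. What the restart actually did was shut down the instance that was already running (and holding the lock file), then start the instance that had been failing. That is why the restart succeeded.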
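Before restarting, you can check whether the PID named in the "sculkget: lock held by PID" line still exists. If it does, it is most likely a process of another instance that uses the same lock file. Here is a minimal Python sketch to run on the database host. The function names are hypothetical, and os.kill(pid, 0) only tests for existence and permission; it sends no signal:

```python
import os
import re
from typing import Optional

HOLDER_RE = re.compile(r"sculkget: lock held by PID: (\d+)")


def lock_holder_pid(alert_text: str) -> Optional[int]:
    """PID from the first 'sculkget: lock held by PID: N' line, if any."""
    m = HOLDER_RE.search(alert_text)
    return int(m.group(1)) if m else None


def pid_alive(pid: int) -> bool:
    """Signal 0 performs only the existence/permission check; nothing is sent."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

For the log above, lock_holder_pid returns 26308. If pid_alive(26308) is true, ps -fp 26308 shows which instance owns that process.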

II. Reproducing the error

[oracle@ECP-UC-DB1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Wed Nov 23 09:07:21 2011

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> select status from v$instance;

STATUS
------------
OPEN

SQL> show parameter name ;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_file_name_convert                 string
db_name                              string      test
db_unique_name                       string      test
global_names                         boolean     FALSE
instance_name                        string      test
lock_name_space                      string
log_file_name_convert                string
service_names                        string      test
SQL> show parameter control;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
control_file_record_keep_time        integer     7
control_files                        string      /opt/oracle/oradata/test/contr
                                                 ol01.ctl
SQL> create pfile='/tmp/t_pfile' from spfile;

File created.

SQL> exit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
[oracle@ECP-UC-DB1 ~]$ vi /tmp/t_pfile 

*.__db_cache_size=67108864
*.__java_pool_size=4194304
*.__large_pool_size=4194304
*.__shared_pool_size=117440512
*.__streams_pool_size=8388608
*.archive_lag_target=0
*.audit_file_dest='/opt/oracle/admin/test/adump'
*.background_dump_dest='/opt/oracle/admin/test/bdump'
*.compatible='10.2.0.3.0'
*.control_files='/opt/oracle/oradata/test/control01.ctl'
*.core_dump_dest='/opt/oracle/admin/test/cdump'
*.db_block_size=8192
*.db_domain=''
*.db_file_multiblock_read_count=16
*.db_name='test'
*.db_recovery_file_dest='/opt/oracle/flash_recovery_area'
*.db_recovery_file_dest_size=2147483648
*.dispatchers='(PROTOCOL=TCP) (SERVICE=testXDB)'
*.job_queue_processes=10
*.log_archive_dest_1='location=/opt/oracle/oradata/test/archivelog'
*.open_cursors=1000
*.pga_aggregate_target=66060288
*.processes=150
*.remote_login_passwordfile='EXCLUSIVE'
*.sga_target=209715200
*.undo_management='AUTO'
*.undo_tablespace='UNDOTBS1'
*.user_dump_dest='/opt/oracle/admin/test/udump'
~
~

"/tmp/t_pfile" 28L, 1043C written                                                                                                          
[oracle@ECP-UC-DB1 ~]$ export ORACLE_SID=tt1
[oracle@ECP-UC-DB1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Wed Nov 23 09:10:47 2011

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

Connected to an idle instance.

SQL> startup pfile='/tmp/t_pfile' mount;
ORACLE instance started.

Total System Global Area  209715200 bytes
Fixed Size                  2082784 bytes
Variable Size             134219808 bytes
Database Buffers           67108864 bytes
Redo Buffers                6303744 bytes
ORA-01102: cannot mount database in EXCLUSIVE mode


SQL> exit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
[oracle@ECP-UC-DB1 ~]$ more /opt/oracle/admin/test/bdump/alert_tt1.log 
Wed Nov 23 09:11:26 2011
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Shared memory segment for instance monitoring created
Picked latch-free SCN scheme 3
Autotune of undo retention is turned on. 
IMODE=BR
ILAT =18
LICENSE_MAX_USERS = 0
SYS auditing is disabled
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.4.0.
System parameters with non-default values:
  processes                = 150
  __shared_pool_size       = 117440512
  __large_pool_size        = 4194304
  __java_pool_size         = 4194304
  __streams_pool_size      = 8388608
  sga_target               = 209715200
  control_files            = /opt/oracle/oradata/test/control01.ctl
  db_block_size            = 8192
  __db_cache_size          = 67108864
  compatible               = 10.2.0.3.0
  log_archive_dest_1       = location=/opt/oracle/oradata/test/archivelog
  archive_lag_target       = 0
  db_file_multiblock_read_count= 16
  db_recovery_file_dest    = /opt/oracle/flash_recovery_area
  db_recovery_file_dest_size= 2147483648
  undo_management          = AUTO
  undo_tablespace          = UNDOTBS1
  remote_login_passwordfile= EXCLUSIVE
  db_domain                = 
  dispatchers              = (PROTOCOL=TCP) (SERVICE=testXDB)
  job_queue_processes      = 10
  background_dump_dest     = /opt/oracle/admin/test/bdump
  user_dump_dest           = /opt/oracle/admin/test/udump
  core_dump_dest           = /opt/oracle/admin/test/cdump
  audit_file_dest          = /opt/oracle/admin/test/adump
  db_name                  = test
  open_cursors             = 1000
  pga_aggregate_target     = 66060288
PMON started with pid=2, OS id=28086
PSP0 started with pid=3, OS id=28088
MMAN started with pid=4, OS id=28090
DBW0 started with pid=5, OS id=28092
LGWR started with pid=6, OS id=28094
CKPT started with pid=7, OS id=28096
SMON started with pid=8, OS id=28098
RECO started with pid=9, OS id=28100
CJQ0 started with pid=10, OS id=28102
MMON started with pid=11, OS id=28104
Wed Nov 23 09:11:28 2011
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
MMNL started with pid=12, OS id=28106
Wed Nov 23 09:11:28 2011
starting up 1 shared server(s) ...
Wed Nov 23 09:11:28 2011
ALTER DATABASE   MOUNT
Wed Nov 23 09:11:28 2011
sculkget: failed to lock /opt/oracle/product/10.2.0/db_1/dbs/lkTEST exclusive
sculkget: lock held by PID: 12339
Wed Nov 23 09:11:28 2011
ORA-09968: unable to lock file
Linux-x86_64 Error: 11: Resource temporarily unavailable
Additional information: 12339
Wed Nov 23 09:11:28 2011
ORA-1102 signalled during: ALTER DATABASE   MOUNT...

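This experiment reproduces the user's ORA-09968 and ORA-01102 errors. Instance test is already open. A second instance (ORACLE_SID=tt1), started from a pfile created from test's spfile, has the same db_name and control file. It cannot take the exclusive lock on /opt/oracle/product/10.2.0/db_1/dbs/lkTEST because PID 12339 still holds it.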
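The lock-file conflict can be mimicked outside Oracle. The alert log does not say which system call Oracle uses to lock lk<DB_NAME>, so this Python sketch uses flock as a stand-in. Two separate opens of the same file with a non-blocking exclusive lock fail the same way, with errno 11 (EAGAIN, "Resource temporarily unavailable") on Linux. The lk + upper-case db_name naming follows the paths in the logs above. The function names and the temporary directory used as ORACLE_HOME are hypothetical:

```python
import errno
import fcntl
import os
import tempfile


def lock_file_path(oracle_home: str, db_name: str) -> str:
    # Naming seen in the alert logs above: db_name 'test' -> $ORACLE_HOME/dbs/lkTEST
    return os.path.join(oracle_home, "dbs", "lk" + db_name.upper())


def try_exclusive_lock(path: str) -> int:
    """Open path and take a non-blocking exclusive flock; return the open fd."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o640)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        raise
    return fd


def reproduce_conflict(db_name: str = "test"):
    """Return the errno a second 'instance' gets when locking the same lk file."""
    with tempfile.TemporaryDirectory() as home:
        os.mkdir(os.path.join(home, "dbs"))
        path = lock_file_path(home, db_name)
        first = try_exclusive_lock(path)  # the instance that is already mounted
        try:
            second = try_exclusive_lock(path)  # e.g. ORACLE_SID=tt1, same db_name
        except BlockingIOError as exc:
            return exc.errno  # EAGAIN on Linux
        else:
            os.close(second)
            return None
        finally:
            os.close(first)


if __name__ == "__main__":
    code = reproduce_conflict()
    if code is not None:
        print(code == errno.EAGAIN, os.strerror(code))
```

Because the lock file name is derived from db_name rather than from ORACLE_SID, changing the SID alone (test vs tt1) does not avoid the conflict.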

III. MOS explanation
ORA-09968, ORA-01102 When Starting a Database

ERROR OGG-01224 TCP/IP error 110 (Connection timed out); retries exceeded.

Cause

This is generally due to basic issues such as:
* The remote manager process is not running, or is running on an incorrect port number.
* The RMTHOST parameter in the pump extract is not configured correctly.
In other cases, the issue could be due to firewalls that forbid the connection
by blocking certain ports or processes.
This is generally seen when there is a firewall between the source and target machines and
either the ports are not open or only the manager port is open.

Solution

The Extract, Replicat and GGSCI processes use ports that normally start at 7840 and ascend sequentially.
The GGSCI command 'send manager getportinfo detail' retrieves the current list of ports that have been
allocated by Manager, with their corresponding process IDs.
If your ports are restricted, you can set DYNAMICPORTLIST to a range so that the
collector process allocates its ports from that range.
In general, to overcome this issue, you can do something like the following:

1. Change the target manager parameter file to use something like the following:

port 7809 
dynamicportlist 7810-7820 

2. Stop and start the manager 
3. Open ports 7809 through 7820 in the firewall 
4. Restart the source pump
The port range in dynamicportlist (7810-7820) and the manager port 7809 are only examples.
You can define your own ports; just make sure they are open.
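After opening the firewall, you can check reachability from the source host before restarting the pump. Here is a minimal Python sketch. The function names are hypothetical and the default ports are the example values above. A timeout is what a firewall that silently drops packets typically produces (TCP error 110 is ETIMEDOUT on Linux). A refusal means the host answered but nothing was listening, which is normal for a dynamic port that no collector is using at the moment:

```python
import socket
from typing import Dict, List


def parse_port_range(spec: str) -> List[int]:
    """'7810-7820' -> [7810, ..., 7820]; a single port such as '7809' -> [7809]."""
    low, _, high = spec.partition("-")
    start, end = int(low), int(high or low)
    return list(range(start, end + 1))


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt as open, refused, timeout or error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        return "timeout"  # often a firewall dropping the SYN
    except ConnectionRefusedError:
        return "refused"  # reachable, but nothing listening on this port
    except OSError as exc:
        return f"error: {exc}"


def check_target(host: str, manager_port: int = 7809,
                 dynamic: str = "7810-7820") -> Dict[int, str]:
    ports = [manager_port] + parse_port_range(dynamic)
    return {p: probe(host, p) for p in ports}
```

For example, check_target("target-host") (a hypothetical host name) should report "open" for 7809 while the target manager runs, and no port should report "timeout".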

OGG Extract Pump abended with ERROR OGG-01224 TCP/IP error 110 (Connection timed out); retries exceeded

Fixing "unknown" in the Time Since Chkpt column in OGG

1. Symptom

[oracle@localhost ~]$ ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.1.1.0.0 Build 078
Linux, x64, 64bit (optimized), Oracle 10 on Jul 28 2010 13:21:11

Copyright (C) 1995, 2010, Oracle and/or its affiliates. All rights reserved.



GGSCI (localhost.localdomain) 1> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     EXT-ECP     00:00:00      unknown     
EXTRACT     RUNNING     EXT-EDS     00:00:00      unknown     
EXTRACT     RUNNING     EXT-XZ      00:00:00      unknown     
EXTRACT     RUNNING     P-EDS       00:00:00      unknown     
EXTRACT     RUNNING     P-XZ        00:00:00      unknown     
REPLICAT    RUNNING     REP-BOS     00:00:00      unknown     
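When many groups are configured, spotting the unknown entries by eye gets tedious. Here is a minimal Python sketch that parses info all output as printed above. It assumes group names contain no spaces and the column order Program, Status, Group, Lag, Time Since Chkpt. The function names are hypothetical:

```python
from typing import Dict, List


def parse_info_all(text: str) -> List[Dict[str, str]]:
    """Parse GGSCI `info all` rows for EXTRACT and REPLICAT groups."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # MANAGER rows have no group/lag/checkpoint columns, so only
        # EXTRACT and REPLICAT rows with all five columns are kept.
        if len(fields) >= 5 and fields[0] in ("EXTRACT", "REPLICAT"):
            program, status, group, lag, chkpt = fields[:5]
            rows.append({"program": program, "status": status, "group": group,
                         "lag": lag, "chkpt": chkpt})
    return rows


def unknown_checkpoints(rows: List[Dict[str, str]]) -> List[str]:
    """Groups whose Time Since Chkpt column reads 'unknown'."""
    return [r["group"] for r in rows if r["chkpt"] == "unknown"]
```

On the listing above, unknown_checkpoints returns all six groups, from EXT-ECP through REP-BOS.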

2. Try stopping and restarting the abnormal processes

GGSCI (localhost.localdomain) 2> stop *

Sending STOP request to EXTRACT EXT-ECP ...

ERROR: sending message to EXTRACT EXT-ECP (Timeout waiting for message).

Sending STOP request to EXTRACT EXT-EDS ...

ERROR: sending message to EXTRACT EXT-EDS (Timeout waiting for message).

Sending STOP request to EXTRACT EXT-XZ ...

ERROR: sending message to EXTRACT EXT-XZ (Timeout waiting for message).

Sending STOP request to EXTRACT P-EDS ...

ERROR: sending message to EXTRACT P-EDS (Timeout waiting for message).

Sending STOP request to EXTRACT P-XZ ...

ERROR: sending message to EXTRACT P-XZ (Timeout waiting for message).

Sending STOP request to REPLICAT REP-BOS ...

ERROR: sending message to REPLICAT REP-BOS (Timeout waiting for message).

GGSCI (localhost.localdomain) 3> stop mgr!

Sending STOP request to MANAGER ...
Request processed.
Manager stopped.

GGSCI (localhost.localdomain) 4> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     STOPPED                                           
EXTRACT     RUNNING     EXT-ECP     00:00:00      unknown     
EXTRACT     RUNNING     EXT-EDS     00:00:00      unknown     
EXTRACT     RUNNING     EXT-XZ      00:00:00      unknown     
EXTRACT     RUNNING     P-EDS       00:00:00      unknown     
EXTRACT     RUNNING     P-XZ        00:00:00      unknown     
REPLICAT    RUNNING     REP-BOS     00:00:00      unknown     

GGSCI (localhost.localdomain) 5> kill EXT-ECP 

ERROR: Manager not currently running.

GGSCI (localhost.localdomain) 6> kill EXT-EDS 

ERROR: Manager not currently running.


GGSCI (localhost.localdomain) 7> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     STOPPED                                           
EXTRACT     RUNNING     EXT-ECP     00:00:00      unknown     
EXTRACT     RUNNING     EXT-EDS     00:00:00      unknown     
EXTRACT     RUNNING     EXT-XZ      00:00:00      unknown     
EXTRACT     RUNNING     P-EDS       00:00:00      unknown     
EXTRACT     RUNNING     P-XZ        00:00:00      unknown     
REPLICAT    RUNNING     REP-BOS     00:00:00      unknown   

GGSCI (localhost.localdomain) 8> exit  
--Neither stop <group>, stop mgr, nor kill <group> could stop these processes

3. Kill the related OGG processes at the OS level

[oracle@localhost OGG]$ ps -ef|grep /opt/OGG
oracle    7479     1  0 Nov10 ?        00:03:31 /opt/OGG/extract PARAMFILE /opt/OGG/dirprm/ext-ecp.prm REPORTFILE /opt/OGG/dirrpt/EXT-ECP.rpt PROCESSID EXT-ECP USESUBDIRS
oracle    7480     1  0 Nov10 ?        00:02:30 /opt/OGG/extract PARAMFILE /opt/OGG/dirprm/ext-eds.prm REPORTFILE /opt/OGG/dirrpt/EXT-EDS.rpt PROCESSID EXT-EDS USESUBDIRS
oracle    7482     1  0 Nov10 ?        00:03:07 /opt/OGG/extract PARAMFILE /opt/OGG/dirprm/ext-xz.prm REPORTFILE /opt/OGG/dirrpt/EXT-XZ.rpt PROCESSID EXT-XZ USESUBDIRS
oracle    7483     1  0 Nov10 ?        00:00:01 /opt/OGG/extract PARAMFILE /opt/OGG/dirprm/p-eds.prm REPORTFILE /opt/OGG/dirrpt/P-EDS.rpt PROCESSID P-EDS USESUBDIRS
oracle    7485     1  0 Nov10 ?        00:00:03 /opt/OGG/replicat PARAMFILE /opt/OGG/dirprm/rep-bos.prm REPORTFILE /opt/OGG/dirrpt/REP-BOS.rpt PROCESSID REP-BOS USESUBDIRS
oracle    7518     1  0 Nov10 ?        00:00:01 ./server -p 7847 -k -l /opt/OGG/ggserr.log
oracle    7677     1  0 Nov10 ?        00:00:15 /opt/OGG/extract PARAMFILE /opt/OGG/dirprm/p-xz.prm REPORTFILE /opt/OGG/dirrpt/P-XZ.rpt PROCESSID P-XZ USESUBDIRS
oracle   25261 25112  0 12:48 pts/1    00:00:00 grep /opt/OGG
[oracle@localhost OGG]$ kill -9 7479 7480 7482 7483 7485  7518 7677
[oracle@localhost OGG]$ ps -ef|grep /opt/OGG
oracle   25264 25112  0 12:48 pts/1    00:00:00 grep /opt/OGG
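The PID list passed to kill -9 above was picked out of the ps listing by hand. Here is a minimal Python sketch that builds the same list. It assumes the OGG home is /opt/OGG as above and that every OGG process mentions it on its command line; the collector ./server does, through its -l /opt/OGG/ggserr.log argument. The function name is hypothetical, and the sketch only prints PIDs; it does not kill anything:

```python
from typing import List


def ogg_pids(ps_text: str, ogg_home: str = "/opt/OGG") -> List[int]:
    """PIDs from `ps -ef` output whose command line mentions the OGG home.

    Skips the header line and the grep that produced the listing.
    """
    pids = []
    for line in ps_text.splitlines():
        fields = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(fields) < 8 or not fields[1].isdigit():
            continue
        cmd = fields[7]
        if cmd.split()[0] == "grep":
            continue
        if ogg_home in cmd:
            pids.append(int(fields[1]))
    return pids
```

On the first listing above, this returns 7479, 7480, 7482, 7483, 7485, 7518 and 7677, the same PIDs the author killed.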

4. Restart all OGG processes

[oracle@localhost OGG]$ ggsci

Oracle GoldenGate Command Interpreter for Oracle
Version 11.1.1.0.0 Build 078
Linux, x64, 64bit (optimized), Oracle 10 on Jul 28 2010 13:21:11

Copyright (C) 1995, 2010, Oracle and/or its affiliates. All rights reserved.



GGSCI (localhost.localdomain) 1> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     STOPPED                                           
EXTRACT     ABENDED     EXT-ECP     00:00:00      unknown     
EXTRACT     ABENDED     EXT-EDS     00:00:00      unknown     
EXTRACT     ABENDED     EXT-XZ      00:00:00      unknown     
EXTRACT     ABENDED     P-EDS       00:00:00      unknown     
EXTRACT     ABENDED     P-XZ        00:00:00      unknown     
REPLICAT    ABENDED     REP-BOS     00:00:00      unknown     
--The process status is still abnormal

GGSCI (localhost.localdomain) 2> start mgr

Manager started.


GGSCI (localhost.localdomain) 3> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     EXT-ECP     00:00:00      unknown     
EXTRACT     RUNNING     EXT-EDS     00:00:00      unknown     
EXTRACT     RUNNING     EXT-XZ      00:00:00      unknown     
EXTRACT     RUNNING     P-EDS       00:00:00      unknown     
EXTRACT     RUNNING     P-XZ        00:00:00      unknown     
REPLICAT    RUNNING     REP-BOS     00:00:00      unknown     
--The processes are up, but Time Since Chkpt is still wrong

GGSCI (localhost.localdomain) 4> stop ext-ecp

Sending STOP request to EXTRACT EXT-ECP ...
Request processed.


GGSCI (localhost.localdomain) 5> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     STOPPED     EXT-ECP     unknown       00:00:02    
EXTRACT     RUNNING     EXT-EDS     00:00:00      unknown     
EXTRACT     RUNNING     EXT-XZ      00:00:00      unknown     
EXTRACT     RUNNING     P-EDS       00:00:00      unknown     
EXTRACT     RUNNING     P-XZ        00:00:00      unknown     
REPLICAT    RUNNING     REP-BOS     00:00:00      unknown     
--Stopped EXT-ECP as a test; its Time Since Chkpt is now reported normally

GGSCI (localhost.localdomain) 6> start ext-ecp

Sending START request to MANAGER ...
EXTRACT EXT-ECP starting


GGSCI (localhost.localdomain) 7> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     EXT-ECP     unknown       00:00:14    
EXTRACT     RUNNING     EXT-EDS     00:00:00      unknown     
EXTRACT     RUNNING     EXT-XZ      00:00:00      unknown     
EXTRACT     RUNNING     P-EDS       00:00:00      unknown     
EXTRACT     RUNNING     P-XZ        00:00:00      unknown     
REPLICAT    RUNNING     REP-BOS     00:00:00      unknown     
--Lag is abnormal (unknown); wait for it to recover

GGSCI (localhost.localdomain) 8> stop ext-eds

Sending STOP request to EXTRACT EXT-EDS ...

Recovery is not complete.  This normal stop will wait and checkpoint recovery's 
work when recovery has finished. To force Extract to stop now, 
use the SEND EXTRACT EXT-EDS, FORCESTOP command.
--This message appears only because recovery has not finished yet; it can be ignored, just wait

GGSCI (localhost.localdomain) 9> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     EXT-ECP     unknown       00:00:02    
EXTRACT     STOPPED     EXT-EDS     01:51:12      00:00:01    
EXTRACT     RUNNING     EXT-IM      00:00:00      1059:44:26  
EXTRACT     RUNNING     EXT-XZ      00:00:00      unknown     
EXTRACT     RUNNING     P-EDS       00:00:00      unknown     
EXTRACT     RUNNING     P-XZ        00:00:00      unknown     
REPLICAT    RUNNING     REP-BOS     00:00:00      unknown     


GGSCI (localhost.localdomain) 10> start ext-eds

Sending START request to MANAGER ...
EXTRACT EXT-EDS starting


GGSCI (localhost.localdomain) 11> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     EXT-ECP     99:53:02      00:00:01    
EXTRACT     RUNNING     EXT-EDS     01:51:12      00:00:10    
EXTRACT     RUNNING     EXT-XZ      00:00:00      unknown     
EXTRACT     RUNNING     P-EDS       00:00:00      unknown     
EXTRACT     RUNNING     P-XZ        00:00:00      unknown     
REPLICAT    RUNNING     REP-BOS     00:00:00      00:00:00    


GGSCI (localhost.localdomain) 12> stop ext-xz

Sending STOP request to EXTRACT EXT-XZ ...
Request processed.


GGSCI (localhost.localdomain) 13> start ext-xz

Sending START request to MANAGER ...
EXTRACT EXT-XZ starting

GGSCI (localhost.localdomain) 15> stop p-eds

Sending STOP request to EXTRACT P-EDS ...
Request processed.


GGSCI (localhost.localdomain) 16> start p-eds

Sending START request to MANAGER ...
EXTRACT P-EDS starting


GGSCI (localhost.localdomain) 17> stop p-xz

Sending STOP request to EXTRACT P-XZ ...
Request processed.


GGSCI (localhost.localdomain) 18> start p-xz

Sending START request to MANAGER ...
EXTRACT P-XZ starting


GGSCI (localhost.localdomain) 19> info all

Program     Status      Group       Lag           Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     EXT-ECP     00:00:00      00:00:01    
EXTRACT     RUNNING     EXT-EDS     00:00:00      00:00:10    
EXTRACT     RUNNING     EXT-IM      00:00:00      1059:45:28  
EXTRACT     RUNNING     EXT-XZ      00:00:00      00:00:07    
EXTRACT     RUNNING     P-EDS       00:00:00      00:00:04    
EXTRACT     RUNNING     P-XZ        00:00:00      00:00:05    
REPLICAT    RUNNING     REP-BOS     00:00:00      00:00:05    
--After restarting every abnormal process, OGG works normally again

GGSCI (localhost.localdomain) 20> 

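5. Summary of the steps
Force-stop the manager (stop mgr!), kill the related OGG processes at the OS level, start the manager again, then stop and restart each affected process.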
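The GGSCI part of the last two steps can be written out as a command list. Here is a minimal Python sketch; the function names are hypothetical. GGSCI's OBEY command can run such a file. However, in the session above the author ran each command interactively and waited while EXT-EDS finished recovery, so running the list unattended is an assumption to test first. The OS-level kill step is not included:

```python
from typing import List


def restart_commands(groups: List[str]) -> List[str]:
    """GGSCI commands for the final steps: start the manager, then stop and
    start each affected group one at a time."""
    commands = ["start mgr"]
    for group in groups:
        commands += [f"stop {group}", f"start {group}"]
    return commands


def obey_text(commands: List[str]) -> str:
    """One command per line, suitable for saving to a file for OBEY."""
    return "\n".join(commands) + "\n"
```

For example, restart_commands(["EXT-ECP", "P-XZ"]) gives start mgr, stop EXT-ECP, start EXT-ECP, stop P-XZ, start P-XZ, in that order.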