Blog dedicated to Oracle Applications (E-Business Suite) Technology; covers Apps Architecture, Administration and third party bolt-ons to Apps

Saturday, April 11, 2015

opatch hangs on /sbin/fuser oracle

Pipu pinged me today about opatch hanging. The opatch log showed this:

[Apr 11, 2015 5:24:13 PM]    Start fuser command /sbin/fuser $ORACLE_HOME/bin/oracle at Sat Apr 11 17:24:13 EDT 2015

I had faced this issue once before but could not recall the solution, so I started from scratch.

As oracle user, /sbin/fuser $ORACLE_HOME/bin/oracle hung.

As root user, /sbin/fuser $ORACLE_HOME/bin/oracle hung as well.

Also as root user, lsof hung.

Google searches brought up a lot of hits about NFS issues, so I ran df -h.

df -h also hung.

So I checked /var/log/messages and found many messages like these:

Apr 11 19:44:42 erpserver kernel: nfs: server share.justanexample.com not responding, still trying

That server provides the mount /R12.2stage, which holds the installation files for R12.2.
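A check like the following could spot a hung NFS mount without freezing the shell the way df -h did. It is only a sketch: the function names are made up for illustration, and it assumes GNU coreutils timeout and stat are available on the box. It also assumes the stuck stat can be killed, which may not hold on every kernel.

```shell
#!/bin/sh
# Print the mount points of NFS filesystems from a mounts table.
# Fields in /proc/mounts: device, mount point, fstype, options, ...
# (Mount points containing spaces are escaped there and are not handled.)
nfs_mount_points() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' "${1:-/proc/mounts}"
}

# Probe each NFS mount with a 5 second limit so that a dead server
# cannot hang the calling shell.
check_nfs_mounts() {
    nfs_mount_points "$@" | while read -r mnt; do
        if timeout -s KILL 5 stat -t "$mnt" >/dev/null 2>&1; then
            echo "ok    $mnt"
        else
            echo "STALE $mnt"
        fi
    done
}
```

Running check_nfs_mounts with no arguments reads /proc/mounts; any mount reported as STALE did not answer within the time limit.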

So I tried unmounting it:

umount /R12.2stage
Device Busy

umount -f /R12.2stage
Device Busy

umount -l /R12.2stage

The lazy unmount (-l) detached the filesystem right away, and df -h didn't hang any more.
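The escalation above (plain umount, then -f, then -l) can be written as a small helper. This is a sketch rather than a recommendation: try_umount and the UMOUNT_CMD override are hypothetical names, the override exists only so the logic can be dry-run, and against a dead NFS server the earlier attempts may themselves take a while to fail.

```shell
#!/bin/sh
# Try progressively more forceful unmounts and stop at the first success.
# UMOUNT_CMD is a hypothetical override for dry runs; the default is umount.
try_umount() {
    mnt=$1
    for opt in "" -f -l; do
        # $opt is left unquoted on purpose so the empty option vanishes
        if ${UMOUNT_CMD:-umount} $opt "$mnt" 2>/dev/null; then
            echo "unmounted $mnt using option: ${opt:-none}"
            return 0
        fi
    done
    echo "could not unmount $mnt" >&2
    return 1
}
```

Run as root, try_umount /R12.2stage would have ended at the -l step in this case, since the first two attempts failed.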

Next I ran strace /sbin/fuser $ORACLE_HOME/bin/oracle, and it stopped here:

open("/proc/12854/fdinfo/3", O_RDONLY)  = 7
fstat(7, {st_mode=S_IFREG|0400, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b99de014000
read(7, "pos:\t0\nflags:\t04002\n", 1024) = 20
close(7)                                = 0
munmap(0x2b99de014000, 4096)            = 0
getdents(4, /* 0 entries */, 32768)     = 0
close(4)                                = 0
stat("/proc/12857/", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
open("/proc/12857/stat", O_RDONLY)      = 4
read(4, "12857 (bash) S 12853 12857 12857"..., 4096) = 243
close(4)                                = 0
readlink("/proc/12857/cwd", "11.2.0.4/examples (deleted)"..., 4096) = 27
rt_sigaction(SIGALRM, {0x411020, [ALRM], SA_RESTORER|SA_RESTART, 0x327bc30030}, {SIG_DFL, [ALRM], SA_RESTORER|SA_RESTART, 0x327bc30030}, 8) = 0
alarm(15)                               = 0
write(5, "@\20A\0\0\0\0\0", 8)          = 8
write(5, "\20\0\0\0", 4)                = 4
write(5, "/proc/12857/cwd\0", 16)       = 16
write(5, "\220\0\0\0", 4)               = 4
read(6,  

It stopped here, so I pressed Ctrl+C and looked up the process it was stuck on:
# ps -ef |grep 12857
oracle   12857 12853  0 Apr10 pts/2    00:00:00 -bash
root     21688  2797  0 19:42 pts/8    00:00:00 grep 12857

I killed this process:

# kill -9 12857
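Reading the PID off the last /proc path in the trace can be scripted if the trace is saved to a file first. The sketch below assumes GNU grep (for -o); last_proc_pid and the trace file name in the usage note are made-up names.

```shell
#!/bin/sh
# Print the PID from the last /proc/<pid>/ path found in a saved strace log.
last_proc_pid() {
    grep -o '/proc/[0-9][0-9]*/' "$1" | tail -n 1 | tr -cd '0-9\n'
}
```

For example, run strace -o /tmp/fuser.trace /sbin/fuser $ORACLE_HOME/bin/oracle, press Ctrl+C once it hangs, and then run ps -fp "$(last_proc_pid /tmp/fuser.trace)" to see the process before deciding whether to kill it.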

I ran strace /sbin/fuser $ORACLE_HOME/bin/oracle again, and this time it stopped at a different process, another bash shell. I killed that process too.

I ran it a third time: strace /sbin/fuser $ORACLE_HOME/bin/oracle

This time it completed.

Then I ran it without strace:

/sbin/fuser $ORACLE_HOME/bin/oracle

It returned in 1 second.

Then I repeated the same procedure for lsof:

strace lsof

and killed the processes where it was getting stuck.  Eventually lsof also worked.
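Instead of finding the stuck shells one strace run at a time, the cwd links under /proc could be scanned directly. This is a sketch under assumptions: pids_with_cwd_under is a made-up name, and it relies on readlink only reading the link text without touching the NFS server, which is what the trace above suggests since the readlink call returned. In that trace the cwd came back as a relative path marked (deleted) after the mount was detached, so a prefix match like this is probably only useful before running umount -l.

```shell
#!/bin/sh
# List PIDs whose current working directory is at or below a directory.
# The optional second argument (default /proc) exists only so the function
# can be tried against a fake directory tree.
pids_with_cwd_under() {
    dir=${1%/}
    proc_root=${2:-/proc}
    for link in "$proc_root"/[0-9]*/cwd; do
        # readlink prints the symlink text; it does not stat the target
        target=$(readlink "$link" 2>/dev/null) || continue
        case $target in
            "$dir"|"$dir"/*)
                pid=${link%/cwd}
                echo "${pid##*/}"
                ;;
        esac
    done
}
```

As root, pids_with_cwd_under /R12.2stage would list candidate processes; each one should still be checked with ps before it is killed.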

Pipu retried opatch and it worked fine.

A stale NFS mount was the root cause of this issue.  It was stale because the source server was down for Unix security patching over the weekend.
 
