Or: the FND_USER table and blocking locks. I don’t mean recover as in backup, by the way.
Some disclaimers: This scenario happened in an 11.5.10 system, and seems to have been a general problem not specific to our site, but as I haven’t been able to research it further, I don’t know whether it could happen elsewhere as well. And even though I couldn’t think of a way around it, some of you might :) Furthermore, the following shouldn’t be a problem for those using the new User Management (UMX) HTML interface to manage EBS users, instead of the older, Forms-based “Administer Users” interface. (UMX became available as a patch to 11.5.10, if I remember correctly.)
Since the only resolution to this that I (and Oracle support) know of is to shutdown abort the running production EBS database, which is rather grave, I’m hereby noting down our preventive workaround, which banished this issue from our site.
The issue (which happened twice in a couple of weeks):
During normal operation in a non-peak period, typically in the late morning, the Oracle E-Business Suite system would suddenly not accept any more logins. Over the course of perhaps 15 minutes after this, existing users started having trouble accessing different HTML/JSP pages, and after that, the whole system effectively froze up.
The first time this happened, there was no time for troubleshooting, except to discover that there were lots of sessions waiting to get a lock on the FND_USER table. A call to Oracle support revealed that they did not know of any other way around our situation, so we had to shut down the services and abort the database.
The second time, I was a little more prepared, and though I couldn’t kill the blocking sessions fast enough to resolve the situation, I did have time to detect a pattern (I’ll get to this in a second).
So the issue, in a little more detail, was that each new user session wanted to get hold of its entry in the FND_USER table to update the TIMESTAMP field, as is normal. The already existing sessions also expect to be able to do this once in a while. This was not possible, since one session sat with a blocking lock on its own entry. I found this strange, and at least expected to be able to resolve everything by contacting the user to confirm I could kill the session, thereby letting the waiting sessions go on with their business.
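When many sessions are queued behind one lock holder, it helps to see quickly which session sits at the head of the chain and how many others are stuck behind it. The following is only a minimal sketch, not what we actually ran at the time: it assumes you have already fetched (sid, blocking_session) pairs, for example from V$SESSION on a database version that exposes the BLOCKING_SESSION column (10g or later). The function name `summarize_blockers`, the query string, and the sample SIDs are all made up for illustration.

```python
from collections import defaultdict

# Illustrative query only (an assumption, not taken from the incident):
# V$SESSION exposes BLOCKING_SESSION from Oracle 10g onwards.
BLOCKERS_SQL = "SELECT sid, blocking_session FROM v$session"


def summarize_blockers(rows):
    """Map each root blocker SID to the number of sessions waiting behind it.

    rows: iterable of (sid, blocking_sid) pairs, where blocking_sid is None
    for sessions that are not waiting on anyone.
    """
    blocked_by = {sid: blocker for sid, blocker in rows}

    def root_of(sid):
        # Follow the chain of blockers up to the session that waits on nobody.
        # The 'seen' set stops the walk if the data contains a cycle.
        seen = set()
        while blocked_by.get(sid) is not None and sid not in seen:
            seen.add(sid)
            sid = blocked_by[sid]
        return sid

    waiters = defaultdict(int)
    for sid, blocker in blocked_by.items():
        if blocker is not None:
            waiters[root_of(sid)] += 1
    return dict(waiters)


if __name__ == "__main__":
    # Hypothetical sample: 101 holds the lock, 202 and 303 wait on it,
    # 404 waits on 202, and 505 is unaffected.
    sample = [(101, None), (202, 101), (303, 101), (404, 202), (505, None)]
    print(summarize_blockers(sample))
```

A single root blocker with a rapidly growing waiter count is the shape of the situation described here.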
But every time the blocker was killed, the next session waiting in line took over and did not release the lock. And since every user who experienced some kind of hang in their session restarted their browser and tried to log back in, the waiting line grew much more quickly than sessions could be killed off.
Since there was only a limited amount of time available to experiment with this, I still have no clear understanding of this behaviour. (I had to resort to bouncing/aborting the system on this occasion as well, since that can be done fairly quickly, albeit with the possible necessity of some cleanup work afterwards.)
But the pattern was simply that someone with System Administrator rights had opened the FND_USER entry for the SYSADMIN user (in a Form, the correct way) to change its list of Responsibilities. They had then left the Form window open long enough for another user to try the same thing, which was all that was needed. So if you have a test EBS system you’re not afraid to crash, this could be a useful exercise: Open the SYSADMIN user, change for example an end-date of one of the Responsibilities (back and forth), leave it open, and have another user with System Administrator rights attempt to change some other aspect of the SYSADMIN user. If the second session hangs, waiting for the first, have some other users try to log in and out normally, possibly starting up Forms. If you see the behaviour described above, well, then you know it needs to be prevented.
It’s simple enough to avoid, of course. The only policy necessary at our site, introduced after the second instance of this issue, was: “Every change to EBS user properties must immediately be followed by closing the form in question”.
As I said, simple enough, and probably very obvious to most EBS sites. But users don’t necessarily think that way, figuring instead that the System is all-powerful and can protect itself against stuff like this. By the way, for those who are wondering, we did also take this opportunity to put in place similar policies for forms other than just user management. Anyway, the issue never arose again.