After a positive timezone change, a clustered Sakai instance suddenly starts removing its peer nodes from the cluster, seemingly at random. This shows up as almost empty nodes in the "On Line" view and as two types of messages in the log:
First, one node logs:
WARN: run(): ghost-busting server: server2-1300294315708 from : server1-1301067882737 (2011-03-28 04:07:25,786 SakaiClusterService.Maintenance_org.sakaiproject.cluster.impl.SakaiClusterService)
Second, the other nodes report that they were kicked offline:
WARN: run(): server has been closed in cluster table, reopened: server2-1300294315708 (2011-03-28 14:15:57,347 SakaiClusterService.Maintenance_org.sakaiproject.cluster.impl.SakaiClusterService)
The cause of the issue:
After a Sakai system moves its timezone forward by one hour or more (for example due to daylight saving time, DST), the connections already open to the Oracle database do not change their session timezone, or at least the CURRENT_TIMESTAMP function starts to behave unexpectedly.
This causes the ghost-busting service to kick in, because its limit is set to 10 minutes and a shift of an hour exceeds it. The suspected cause is the use of CURRENT_TIMESTAMP in combination with a DATE field. This hypothesis, based on the Metalink documentation, will be confirmed by a test that will be attached later.
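To illustrate why a one-hour discrepancy is enough to trigger ghost-busting, here is a minimal sketch in Java. It is a simplified model and not Sakai's actual code: the class GhostBustSketch, the methods sessionNow and isGhost, and the direction of the skew are all assumptions made for illustration. Only the 10-minute limit comes from the description above.

```java
import java.time.Duration;
import java.time.Instant;

public class GhostBustSketch {
    // Limit from the report: a peer silent for more than 10 minutes is treated as a ghost.
    static final Duration GHOST_LIMIT = Duration.ofMinutes(10);

    /** Hypothetical: what a connection reports as "now", given its session offset error. */
    static Instant sessionNow(Instant trueNow, Duration sessionSkew) {
        return trueNow.plus(sessionSkew);
    }

    /** True if the peer's last heartbeat is older than the limit, as seen by the checker. */
    static boolean isGhost(Instant peerHeartbeat, Instant checkerNow) {
        return Duration.between(peerHeartbeat, checkerNow).compareTo(GHOST_LIMIT) > 0;
    }

    public static void main(String[] args) {
        Instant trueNow = Instant.parse("2011-03-28T02:07:25Z");

        // The peer wrote its heartbeat 30 seconds ago over a connection with no skew.
        Instant heartbeat = sessionNow(trueNow.minusSeconds(30), Duration.ZERO);

        // Checker and peer agree on the time: the peer is alive.
        System.out.println(isGhost(heartbeat, sessionNow(trueNow, Duration.ZERO)));     // false

        // Assumed scenario: the checker's connection is one hour ahead of the peer's.
        System.out.println(isGhost(heartbeat, sessionNow(trueNow, Duration.ofHours(1)))); // true
    }
}
```

In this model a perfectly healthy peer looks 60 minutes stale as soon as the two connections disagree by one hour, which is well past the 10-minute limit.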
The connection pool determines the scope of this issue: if the pool refreshes its connections due to pool resizing, the issue slowly fades away, but if the pool keeps its connections forever, it will persist for longer periods of time.