[Mulgara-dev] random heap-space exhaustions

Tue Feb 6 05:20:33 UTC 2007

On Mon, Feb 05, 2007 at 10:07:14PM -0600, Paul Gearon wrote:
> I haven't seen this either, so I'm stabbing in the dark.
> 
> I have a couple of questions.  Since it's always the same queries which
> fail, what's their basic structure?

  select $p $o from <model1> where
  ( $s $p $o in <model2> and
    ( $s <mulgara:is> <url1> or $s <mulgara:is> <url2> or $s <pred1> <url1> or
      ( trans($s <pred1> $res) and $res <mulgara:is> <url1> )))
   or
  ( $s $impliedBy $o in <model2> and
    ( $impliedBy <pred2> $p or trans($impliedBy <pred2> $p)) and
    ( $s <mulgara:is> <url1> or $s <mulgara:is> <url2> or $s <pred1> <url1> or
      ( trans($s <pred1> $res) and $res <mulgara:is> <url1> )))

(the duplication the last two lines is because of bug MGR-36).

> Also, can we get a stack trace of the *first* of your failures please?  It
> probably won't help a lot, but you never know.

Nope, OOM stack traces are rarely useful, and this ones no exception.
But here it is anyway:

  org.mulgara.itql.ItqlInterpreterException: org.mulgara.query.QueryException: Error ending previous query
          at org.mulgara.itql.ItqlInterpreter.setLastError(ItqlInterpreter.java:2479)
          at org.mulgara.itql.ItqlInterpreter.outASelectCommand(ItqlInterpreter.java:1231)
          at org.mulgara.itql.analysis.DepthFirstAdapter.caseASelectCommand(DepthFirstAdapter.java:497)
          at org.mulgara.itql.node.ASelectCommand.apply(ASelectCommand.java:30)
          at org.mulgara.itql.analysis.DepthFirstAdapter.caseACommandStart(DepthFirstAdapter.java:51)
          at org.mulgara.itql.node.ACommandStart.apply(ACommandStart.java:30)
          at org.mulgara.itql.analysis.DepthFirstAdapter.caseStart(DepthFirstAdapter.java:31)
          at org.mulgara.itql.node.Start.apply(Start.java:33)
          at org.mulgara.itql.ItqlInterpreter.executeCommand(ItqlInterpreter.java:572)
          ....
  Caused by: org.mulgara.query.QueryException: Error ending previous query
          at org.mulgara.resolver.DatabaseSession.endTransactionalBlock(DatabaseSession.java:1039)
          at org.mulgara.resolver.DatabaseSession.endPreviousQueryTransaction(DatabaseSession.java:1091)
          at org.mulgara.resolver.DatabaseSession.executeQuery(DatabaseSession.java:1383)
          at org.mulgara.resolver.DatabaseSession.query(DatabaseSession.java:545)
          at org.mulgara.itql.ItqlInterpreter.outASelectCommand(ItqlInterpreter.java:1201)
          ... 39 more
  Caused by: javax.transaction.RollbackException
          at org.objectweb.jotm.TransactionImpl.commit(TransactionImpl.java:225)
          at org.objectweb.jotm.Current.commit(Current.java:442)
          at org.mulgara.resolver.DatabaseSession.endTransactionalBlock(DatabaseSession.java:1026)
          ... 43 more
  Caused by: java.lang.OutOfMemoryError: Java heap space

> One last thing, and this may be difficult...  When you see one of these
> queries taking >20 seconds, would someone be able to hit Ctrl-\ on the
> server?  If we can get an idea of what the program is spending time doing,
> then it may help.

I would've done so if I could. But the problem disappears too quickly
again. Though I might try and see if we can rig up something to
monitor the process's rss and trigger send a few sigquit's if it goes
above some threshhold.

  Cheers,

  Ronald

> On 2/5/07, Andrae Muys <andrae at netymon.com> wrote:
> >
> >
> >On 06/02/2007, at 10:39 AM, Life is hard, and then you die wrote:
> >
> >>
> >> We've been seeing random heap exhaustion errors, and were wondering if
> >> anybody else had seen this.
> >>
> >> Specifically, on average about every 3 days (with an average of around
> >> 17'500 queries a day) we suddenly get a bunch of queries (anywhere
> >> between 10 and 140) failing with an "OutOfMemoryError: Java heap
> >> space".
> >> The queries involved are always the same (modulo parameters), but
> >> these queries work just fine the rest of the time. Furthermore, I've
> >> extracted the exact queries from the log and run them against the
> >> database and they worked.
> >>
> >> Another observation is that in the normal case these queries take
> >> around .5 to .6 seconds to complete, but in the failure case they run
> >> for about 25 seconds before they run out of heap space - so it looks
> >> like a case of something running amok, rather than the query per se
> >> having a problem. And it therefore seems unlikely that additional heap
> >> space would help.
> >
> >I haven't seen this behaviour before personally, but from the
> >description the top suspect would be a resource leak that is being
> >reclaimed by the garbage collector.  At least this would be where I
> >would start looking for the bug unless there is something in the logs
> >to indicate what else might be happening.
> >
> >> Lastly, this occurs for bunch of queries, and then everything settles
> >> down again things run fine for another few days. In fact, there are
> >> queries that start after the failed queries started and complete
> >> successfully before the heap errors, and there are even queries that
> >> start before the heap errors are reported and complete after the first
> >> errors. So it's not affecting everything when it happens.
> >>
> >> Anyway, we have not been able to reproduce this in any test setup. So
> >> we were just wondering if anybody has ever seen anything like this
> >> with Mulgara (or any of its predecessors).
> >
> >Is the test setup also 64-bit?  And is it also running on the same
> >platform?  (btw. what platform are you currently using?)
> >
> >> Btw., this is running jdk 1.5.0-09 (64-bit).
> >
> >Is this running in 64 or 32 bit mode?  I'm assuming 64-bit.
> >
> >Andrae
> >
> >--
> >Andrae Muys
> >andrae at netymon.com
> >Principal Mulgara Consultant
> >Netymon Pty Ltd
> >