CLE: Search
SRCH-38

"node already merging shared segments" on lone instance

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Won't Fix
    • Affects Version/s: 1.2.7, 1.3.2, 1.3.5
    • Fix Version/s: 1.2.12, 1.3.8 [Tentative]
    • Component/s: search
    • Labels:
    • Previous Issue Keys:
      SAK-16947

      Description

      I have a lone instance which prints the following error into the log every 10 seconds:

      2009-09-09 11:02:27,502 ERROR Timer-4 org.sakaiproject.search.optimize.shared.impl.DbJournalOptimizationManager - This node already merging shared segments, index writer spellbreaker:1251938321559 This node is currently optimizing the shared segments, This is an error as only one copy of this node should be Active in the cluster

      In spite of the message, this instance has never been part of a cluster.

        Issue Links

          Activity

          David Horwitz added a comment -
          This is an effect of the node merging shared segments being shut down whilst doing so. To fix, run:

          update search_journal set status='committed';

          Alan Berg added a comment -
          Adding this Jira number as part of the logging will give system admins clues on how to deal with the situation without resorting to Google or searching Confluence.

          Anthony Whyte added a comment -
          1.3.x, r88270.
          Steve Swinsburg added a comment -
          Note that the actual SQL is
          update search_journal set status='commited';

          i.e. with "committed" misspelt as "commited".

          Steve Swinsburg added a comment -
          Reopening this because it is affecting a production instance. We shut down Tomcat normally, but on restart we get the log errors. Running an SQL command every time this happens is not feasible.

          The search system should shut itself down properly when Tomcat shuts down.
          David Horwitz added a comment -
          Steve, this is not fixable without re-engineering search.
          Steve Swinsburg added a comment -
          Couldn't the SQL just be run on destroy() ?
          Steve Swinsburg added a comment - edited
          This is becoming a real pain: frequent restarts, and we need to run some SQL manually afterwards each time. Surely there is a way for this to be cleaned up? I know of a lot of places running Sakai that bounce app servers nightly.
          Steve Swinsburg added a comment -
          Here's a patch to clean up on shutdown.

          It just runs the query that otherwise needs to be run manually after startup. I don't know much about search, but if the index is left in a bad state on shutdown and this needs to be run manually on startup, then it should be safe to run on shutdown, yes?

          Steve Swinsburg added a comment -
          I seem to be getting a few errors after running this patch locally.

          2011-07-28 17:28:03,033 ERROR IndexManager org.sakaiproject.search.journal.impl.SharedFilesystemJournalStorage - ======================================= Lost Shared Segment =================
          /Users/steve/dev/sakai/conf/trunk/search/searchjournal/51.zip
          The above file does not exist, this should not happen and should be investigated
          2011-07-28 17:28:03,202 INFO IndexManager org.sakaiproject.search.optimize.shared.impl.OptimizeSharedTransactionListenerImpl - Merged SavePoint /Users/steve/dev/sakai/conf/trunk/search/indexwork/journal-optimize-import/53 in 46 ms /Users/steve/dev/sakai/conf/trunk/search/indexwork/journal-optimize-import/53
          2011-07-28 17:28:03,438 INFO IndexManager org.sakaiproject.search.optimize.shared.impl.OptimizeSharedTransactionListenerImpl - Merged SavePoint /Users/steve/dev/sakai/conf/trunk/search/indexwork/journal-optimize-import/52 in 235 ms /Users/steve/dev/sakai/conf/trunk/search/indexwork/journal-optimize-import/52
          2011-07-28 17:28:03,444 WARN IndexManager org.sakaiproject.search.optimize.shared.impl.JournalOptimizationOperation - Failed to complete optimize
          org.sakaiproject.search.optimize.api.OptimizedFailedIndexTransactionException: Failed to Optimize indexes
          at org.sakaiproject.search.optimize.shared.impl.OptimizeSharedTransactionListenerImpl.prepare(OptimizeSharedTransactionListenerImpl.java:294)
          at org.sakaiproject.search.transaction.impl.IndexTransactionImpl.firePrepare(IndexTransactionImpl.java:313)
          at org.sakaiproject.search.transaction.impl.IndexTransactionImpl.prepare(IndexTransactionImpl.java:147)
          at org.sakaiproject.search.optimize.shared.impl.JournalOptimizationOperation.runOnce(JournalOptimizationOperation.java:90)
          at org.sakaiproject.search.journal.impl.IndexManagementTimerTask.run(IndexManagementTimerTask.java:137)
          at java.util.TimerThread.mainLoop(Timer.java:512)
          at java.util.TimerThread.run(Timer.java:462)
          Caused by: org.apache.lucene.store.NoSuchDirectoryException: directory '/Users/steve/dev/sakai/conf/trunk/search/indexwork/journal-optimize-import/51' does not exist
          at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:516)
          at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:543)
          at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:564)
          at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:73)
          at org.apache.lucene.index.IndexReader.open(IndexReader.java:476)
          at org.apache.lucene.index.IndexReader.open(IndexReader.java:314)
          at org.sakaiproject.search.optimize.shared.impl.OptimizeSharedTransactionListenerImpl.prepare(OptimizeSharedTransactionListenerImpl.java:194)
          ... 6 more


          So instead of marking them as committed, I removed the entries from the db. Search seems happier.

          2011-07-28 17:31:13,474 INFO IndexManager org.sakaiproject.search.journal.impl.MergeUpdateOperation - Local Merge Operation
          2011-07-28 17:32:53,454 INFO IndexManager org.sakaiproject.search.journal.impl.MergeUpdateOperation - Local Merge Operation


          Matthew Buckett added a comment -
          It's a long time since I've done any search work, but:

          If you have two nodes in a cluster (node A and node B) and the search service is running on both of them, and node A is currently merging shared segments at the point node B is shut down, node B would run the SQL, and that would incorrectly imply that node A was no longer merging segments. I think if you do tidy up on shutdown, it should only tidy up records for the node itself.
          Steve Swinsburg added a comment -
          Is there a way to determine which node the entries are from? This is happening on a single-node instance when it gets shut down, but the records still show merge-prepare (or similar). It would be easy to refine the SQL if we have an identifier. Cheers.
          Steve Swinsburg added a comment -
          The indexwriter is a combination of serverId:transaction (if any), so could we just update the rows whose indexwriter column starts with the serverId? Node A and B would have different serverIds, so they wouldn't affect one another in this case.
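          The node-scoped cleanup proposed above can be sketched as follows. This is a hypothetical illustration, not the actual Sakai patch: it uses an in-memory SQLite table to stand in for SEARCH_JOURNAL, and the column names (indexwriter = "serverId:transaction", status) are assumed from this thread; the real schema may differ.

```python
import sqlite3

# Stand-in for the SEARCH_JOURNAL table described in this thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE search_journal (indexwriter TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO search_journal VALUES (?, ?)",
    [
        ("nodeA:1251938321559", "merging-prepare"),  # stale row left by this node
        ("nodeB:1251938321600", "merging-prepare"),  # another node, merge in flight
        ("nodeA:1251938321700", "committed"),
    ],
)

def cleanup_on_shutdown(conn, server_id):
    # Reset only rows written by this node (indexwriter starts with its
    # serverId), so an in-flight merge on another node is left alone.
    conn.execute(
        "UPDATE search_journal SET status = 'committed' "
        "WHERE indexwriter LIKE ? AND status <> 'committed'",
        (server_id + ":%",),
    )

cleanup_on_shutdown(conn, "nodeA")
status = dict(conn.execute("SELECT indexwriter, status FROM search_journal"))
```

          With this scoping, node A's shutdown resets only its own rows, while node B's in-progress merge entry is untouched.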
          Seth Theriault added a comment -
          I noticed that your patch uses "committed" instead of "commited" in the SQL update command.
          Steve Swinsburg added a comment -
          Yes, SRCH-57 made it so that it looks at both spellings.
          David Horwitz added a comment -
          The patch looks good, except that it could roll back a merge in progress on another node:

          cleanup = connection.prepareStatement("update search_journal set status = 'committed'");

          Since the optimization could be happening on another node, it should only roll back a merge it is itself busy with. I think there may in fact be a rollback method that also deletes files ...

          (at a bootcamp this week so responses may be delayed)
          Anthony Whyte added a comment -
          1.2.x r96240.
          Steve Swinsburg added a comment -
          The cleanup still needs to be performed on shutdown.
          Seth Theriault added a comment -
          What's the status of Steve's clean-up-on-destroy patch?
          Steve Swinsburg added a comment -
          I think the SQL needs to be adjusted to bind to only the current node. Or see David's note above (02-Aug-2011).
          Sam Ottenhoff added a comment -
          Per CLE Conf call: can someone clarify how we can wrap up this ticket?
          Steve Swinsburg added a comment -
          Someone more familiar with search needs to have a look at this and figure out the safest way to terminate the indexer for a node when it is shut down.
          Matthew Jones added a comment -
          I'm not even sure you can reliably guarantee cleanup on shutdown, because someone could just kill the process and not run the Tomcat shutdown script. This seems like it would really need to be something that runs at startup, but how do you determine what to reset and what to leave? On a single-node environment, just reset or delete everything.

          On a clustered environment, you can assume that the first machine that came up reset (or deleted) everything already.

          So here's my suggestion: on search init, check to see how many cluster servers there are (getServers?); if there are none, run this cleanup SQL. Otherwise there will just be a lot of errors and manual cleanup involved.
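          The startup-time idea above can be sketched like this. Again a hypothetical illustration only: `live_servers` stands in for whatever the real cluster API (getServers?) returns, SQLite stands in for the real database, and the table/column names are assumed from this thread.

```python
import sqlite3

def startup_cleanup(conn, live_servers):
    # If this node is the only live server, no other node can be mid-merge,
    # so any leftover non-committed journal rows are stale and safe to reset.
    if len(live_servers) <= 1:
        conn.execute(
            "UPDATE search_journal SET status = 'committed' "
            "WHERE status <> 'committed'"
        )
        return True
    return False  # other nodes alive: leave the journal alone

# Stand-in table with one stale row from an unclean shutdown.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE search_journal (indexwriter TEXT, status TEXT)")
conn.execute("INSERT INTO search_journal VALUES ('nodeA:123', 'merging-prepare')")

ran = startup_cleanup(conn, ["nodeA"])                # single node: cleanup runs
skipped = startup_cleanup(conn, ["nodeA", "nodeB"])   # clustered: skipped
row_status = conn.execute("SELECT status FROM search_journal").fetchone()[0]
```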
          Swe Aung added a comment - edited
          Hi,
          I am facing the same error after restarting the nodes. I tried running the SQL command above, but the error did not stop. I am running 2.8.0 with a two-node setup on an Oracle 10g database.
          Please advise. Your prompt reply is appreciated.
          John Bush added a comment - edited
          This will not be an issue with the elasticsearch implementation; the whole way the storage of index data is managed is completely different. The recommended solution to these sorts of issues is to switch over to elasticsearch in Sakai 10. These types of issues are the whole reason search was re-implemented.
          Matthew Jones added a comment -
          Is elasticsearch the default in trunk? How is it being activated/switched from the Lucene search? It looks like the parallelIndexComponents.xml and the optimizer in it are still active, and I'm still seeing this error running a recent trunk build. Do you have any examples of avoiding this, how to use SRCH-119, or why this would still be happening?
          John Bush added a comment -
          I don't think elasticsearch is the default in trunk right now; we've gone back and forth on that. I think it should be the default, because while the Solr option might be just as good (I have no experience with it), it does require another server to be set up. To enable elasticsearch as the default, the following should be set:

          search.enable=true
          elasticsearch.http.enabled=true
          elasticsearch.http.port=9200
          search.service.impl=org.sakaiproject.search.elasticsearch.ElasticSearchService
          search.indexbuilder.impl=org.sakaiproject.search.elasticsearch.ElasticSearchIndexBuilder
          Matthew Jones added a comment -
          Elasticsearch is what's in there out of the box; certainly Lucene shouldn't be the default for 10, and Solr requires some configuration that doesn't exist. Thanks for the info. I can create a new Jira to make ElasticSearchService the default too.

            People

            • Assignee:
              John Bush
              Reporter:
              Joshua Swink
            • Votes:
              0
              Watchers:
              11

              Dates

              • Created:
                Updated:
                Resolved: