SAK-27700

Local node can get into inconsistent state

    Details

    • Type: Bug
    • Status: RESOLVED
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.0
    • Fix Version/s: 2.7.2
    • Component/s: Search
    • Labels: None
    • Previous Issue Keys: SAK-13127, SRCH-16

      Description

      This may be related to restarting an app server. Observed symptoms:

      The segments list does not match the local index contents:

      Reading Segment List /usr/local/searchindex/index/local-segments
      Contains 51 segments(s)
      /usr/local/searchindex/index-import/3600
      /usr/local/searchindex/index-import/3601
      ...
      /usr/local/searchindex/index-import/3649
      /usr/local/searchindex/index-import/3650
      Done

      However, only 2 segments are actually present on disk:

      srvslscle002:/usr/local/searchindex # du -ks index-import/*
      40 index-import/3650
      16 index-import/3651
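
The mismatch above could be detected programmatically. The following is a minimal sketch, not Sakai code: the class `SegmentListCheck` and its method are hypothetical, and it assumes the segment paths have already been read out of the local-segments file into a list of strings (the on-disk format of that file is not shown in this report).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Sakai: compares a segment list with the disk.
public class SegmentListCheck {

    /**
     * Returns the listed segment paths that do not exist as directories on disk.
     * An empty result means every listed segment is present.
     */
    public static List<String> missingSegments(List<String> listedSegments) {
        List<String> missing = new ArrayList<>();
        for (String segment : listedSegments) {
            Path dir = Paths.get(segment);
            if (!Files.isDirectory(dir)) {
                missing.add(segment);
            }
        }
        return missing;
    }
}
```

For the state reported here, the segments up to 3649 (such as index-import/3600) would be expected in the result, since only 3650 and 3651 exist on disk.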

      The database shows this node as current to 3650:

      mysql> select * from search_node_status;
      +------+---------------+----------+
      | jid  | jidts         | serverid |
      +------+---------------+----------+
      | 3650 | 1204361837688 | vula2a   |
      | 3946 | 1204450153001 | vula3a   |
      | 3946 | 1204450046089 | vula4a   |
      | 3946 | 1204450051758 | vula5a   |
      +------+---------------+----------+
      4 rows in set (0.00 sec)

      As a result, the local index cannot be opened:

      ERROR: Failed to get an index searcher (2008-03-02 11:45:03,115 http-8443-Processor19_org.sakaiproject.search.component.service.impl.BaseSearchServiceImpl)
      java.io.FileNotFoundException: /usr/local/searchindex/index-import/3600/segments (No such file or directory)
      at java.io.RandomAccessFile.open(Native Method)
      at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
      at org.apache.lucene.store.FSIndexInput$Descriptor.<init>(FSDirectory.java:425)
      at org.apache.lucene.store.FSIndexInput.<init>(FSDirectory.java:434)
      at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:324)
      at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:45)
      at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:148)
      at org.apache.lucene.store.Lock$With.run(Lock.java:109)
      at org.apache.lucene.index.IndexReader.open(IndexReader.java:143)
      at org.apache.lucene.index.IndexReader.open(IndexReader.java:133)
      at org.sakaiproject.search.journal.impl.JournaledFSIndexStorage.getIndexReaderInternal(JournaledFSIndexStorage.java:808)
      at org.sakaiproject.search.journal.impl.JournaledFSIndexStorage.getIndexReader(JournaledFSIndexStorage.java:712)
      at org.sakaiproject.search.journal.impl.JournaledFSIndexStorage.loadIndexSearcherInternal(JournaledFSIndexStorage.java:519)
      at org.sakaiproject.search.journal.impl.JournaledFSIndexStorage.getIndexSearcher(JournaledFSIndexStorage.java:503)
      at org.sakaiproject.search.journal.impl.ParallelIndexStorage.getIndexSearcher(ParallelIndexStorage.java:166)
      at org.sakaiproject.search.component.service.impl.BaseSearchServiceImpl.getIndexSearcher(BaseSearchServiceImpl.java:482)
      at org.sakaiproject.search.component.service.impl.BaseSearchServiceImpl.getNDocs(BaseSearchServiceImpl.java:496)
      at org.sakaiproject.search.component.service.impl.ConcurrentSearchServiceImpl.getSearchStatus(ConcurrentSearchServiceImpl.java:115)

      and local merge operations fail:

      WARN: Failed to compete merge of 3651 (2008-03-02 11:45:13,921 Timer-2_org.sakaiproject.search.journal.impl.MergeUpdateOperation)
      org.sakaiproject.search.transaction.api.IndexTransactionException: Failed to delete documents
      at org.sakaiproject.search.journal.impl.JournaledFSIndexStorageUpdateTransactionListener.prepare(JournaledFSIndexStorageUpdateTransactionListener.java:161)
      at org.sakaiproject.search.transaction.impl.IndexTransactionImpl.firePrepare(IndexTransactionImpl.java:312)
      at org.sakaiproject.search.transaction.impl.IndexTransactionImpl.prepare(IndexTransactionImpl.java:146)
      at org.sakaiproject.search.journal.impl.MergeUpdateOperation.runOnce(MergeUpdateOperation.java:94)
      at org.sakaiproject.search.journal.impl.IndexManagementTimerTask.run(IndexManagementTimerTask.java:135)
      at java.util.TimerThread.mainLoop(Timer.java:512)
      at java.util.TimerThread.run(Timer.java:462)
      Caused by: java.io.FileNotFoundException: /usr/local/searchindex/index-import/3600/segments (No such file or directory)
      at java.io.RandomAccessFile.open(Native Method)
      at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
      at org.apache.lucene.store.FSIndexInput$Descriptor.<init>(FSDirectory.java:425)
      at org.apache.lucene.store.FSIndexInput.<init>(FSDirectory.java:434)
      at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:324)
      at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:45)
      at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:148)
      at org.apache.lucene.store.Lock$With.run(Lock.java:109)
      at org.apache.lucene.index.IndexReader.open(IndexReader.java:143)
      at org.apache.lucene.index.IndexReader.open(IndexReader.java:133)
      at org.sakaiproject.search.journal.impl.JournaledFSIndexStorage.getIndexReaderInternal(JournaledFSIndexStorage.java:808)
      at org.sakaiproject.search.journal.impl.JournaledFSIndexStorage.getIndexReader(JournaledFSIndexStorage.java:712)
      at org.sakaiproject.search.journal.impl.JournaledFSIndexStorage.getDeletionIndexReader(JournaledFSIndexStorage.java:545)
      at org.sakaiproject.search.journal.impl.JournaledFSIndexStorageUpdateTransactionListener.prepare(JournaledFSIndexStorageUpdateTransactionListener.java:124)
      ... 6 more
      WARN: Failed to start merge operation (2008-03-02 11:45:13,923 Timer-2_org.sakaiproject.search.journal.impl.MergeUpdateOperation)
      org.sakaiproject.search.journal.api.JournalErrorException: Journal is stalled at ID 3651
      at org.sakaiproject.search.journal.impl.JournaledFSIndexStorageUpdateTransactionListener.open(JournaledFSIndexStorageUpdateTransactionListener.java:88)
      at org.sakaiproject.search.transaction.impl.IndexTransactionImpl.fireOpen(IndexTransactionImpl.java:359)
      at org.sakaiproject.search.transaction.impl.IndexTransactionImpl.open(IndexTransactionImpl.java:75)

      Search on the app server is stuck.

      This problem is not NFS-related because the local index is on local disk. It could be caused by writes being carried out in the wrong order (e.g. removing the local segments from disk before updating the local-segments file), so that an interrupted operation (e.g. an app server shutdown) leaves the index in an inconsistent and unrecoverable state.
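
One ordering that would avoid the suspected failure mode is to publish the updated segment list first and delete the segment directories only afterwards. The sketch below illustrates that idea under stated assumptions: `SafeSegmentRemoval` is a hypothetical class, not the Sakai implementation or the actual fix, and the segment list is assumed to be a plain text file with one path per line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration of crash-safe ordering; not Sakai code.
public class SafeSegmentRemoval {

    public static void removeSegments(Path segmentListFile,
                                      List<String> remainingSegments,
                                      List<Path> obsoleteSegmentDirs) throws IOException {
        // Step 1: write the new list to a temporary file in the same directory.
        Path tmp = segmentListFile.resolveSibling(segmentListFile.getFileName() + ".tmp");
        Files.write(tmp, remainingSegments, StandardCharsets.UTF_8);

        // Step 2: swap it into place in one step. Whether an existing target is
        // replaced is implementation specific; on POSIX filesystems the rename
        // replaces it.
        Files.move(tmp, segmentListFile, StandardCopyOption.ATOMIC_MOVE);

        // Step 3: only now delete the segments the list no longer references.
        for (Path dir : obsoleteSegmentDirs) {
            deleteRecursively(dir);
        }
    }

    private static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parent directory.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

With this ordering, an interruption before step 2 leaves the old list and all old segments intact, and an interruption after step 2 leaves at worst some orphaned directories that the list no longer references. In neither case does the list point at a segment that has been deleted.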

      The workaround is to shut down the app server, remove the local index, and let it rebuild from the shared index (or copy the local index from another app server).

    People

    • Assignee: Unassigned
    • Reporter: Stephen Marquard (smarquard)