Uploaded image for project: 'Sakai'
  1. Sakai
  2. SAK-42696

Performance issues with large concurrent assignment with honor pledge

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: OPEN
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 19.2
    • Fix Version/s: None
    • Component/s: Assignments
    • Labels:
      None
    • Test Plan:
      Hide

      Please add a Test Plan here.

      Show
      Please add a Test Plan here.

      Description

      We recently had major issues with a large concurrent assignment.  A professor scheduled their midterm exam in the Assignments tool for almost 1000 students.  The exam started at 9 am and ended at 12 pm, so the students all started at almost exactly the same moment.  They were meant to download a zip file that was in the assignment instructions, make some changes, and submit it.
       
      Students complained of the system being very slow and having trouble getting access to the document, waiting a long time while Sakai seemed to be loading, and never being able to retrieve the file at all.  Apparently over half the students experienced these issues so it was a bit of a disaster.  Many students later complained of having issues submitting, and seeing a stacktrace in the portal.
       
      On our 3 clustered servers we have activated the database leakDetectionThreshold at 30s, and across the three server logs starting at exactly 9am we start seeing hundreds of "connection leak detected" messages (findAssignment-conn-leak.txt) all concerning the same method in Assignments, AssignmentRepositoryImpl.findAssignment() which is eventually called by AssignmentAction.doAccept_assignment_honor_pledge().  To be sure, the assignment is configured to present the honor pledge to students before allowing them to access it.
       
      In talking to our DBAs, they confirmed that as of 9 am our active database session count exploded into the hundreds, as did the number of active sessions waiting for a lock to release.  They generated a report for that period indicating that the number one SQL query by elapsed time was the following :

      select ASSIGNMENT_ID from ASN_ASSIGNMENT where ASSIGNMENT_ID =:1 for update 

       
      By activating the hibernate show_sql property I was able to confirm that this query is executed when a student accepts the honor pledge, I believe because a submission is generated at that time.  If the honor pledge is disabled, this query is executed when the student submits.  I think since the students were all accepting the honor pledge at the same time, they all had to wait for the others' locks to resolve causing a massive slow down and server load.
       
      Another strange error was later seen many times in the logs: an UnresolvableObjectException.txt on AssignmentRepositoryImpl.deleteSubmission() in findSubmissionForUser().  It looks like the system finds >1 submission for the user and then tries to delete all but one, but the delete fails because the submission id is not found.  I figure this might be from students closing their tab and trying again, and I assume some kind of dirty read is making the tool try to delete a submission that is not yet committed.  Presumable this is caused by the previous issues.  This is seen from AssignmentAction.doAccept_assignment_honor_pledge() as well as /vm/assignment/chef_assignments_list_assignments.vm, and was never seen before in our logs archive.
       

        Gliffy Diagrams

          Attachments

            Activity

              People

              • Assignee:
                ern Earle R Nietzel
                Reporter:
                curtisvo Curtis van Osch
              • Votes:
                0 Vote for this issue
                Watchers:
                4 Start watching this issue

                Dates

                • Created:
                  Updated:

                  Git Source Code