Sakai
SAK-9718

Quota Calculations cause all resources in a site to be loaded into memory, killing any put performance

    Details


      Description

      When resourceCommitEdit() is called, the quota calculation loads all the resources in a site into memory to calculate the quota.

      This is OK for small sites with 5-10 resources, but on larger sites where hundreds of files have been uploaded it causes massive garbage collection and kills performance. It is particularly bad with WebDAV access, where every put, however big, causes 100 or more getMembers() calls against every collection in the site (once per collection, so not cacheable).

      The quota calculation should be maintained in one place only, so it doesn't have to be re-calculated every time.

      It might be worth a look at other filesystems with quotas to see how it's done there.
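
      For illustration, the scan pattern being described works roughly like this. This is a schematic sketch with hypothetical Member/Resource/Collection types, not the actual Sakai ContentHostingService API:

          import java.util.List;

          interface Member { boolean isCollection(); }
          interface Resource extends Member { long getContentLength(); }
          interface Collection extends Member {
              List<Member> getMembers(); // one storage round trip per collection
          }

          class QuotaScan {
              // Runs on every put: walks the entire site tree, loading every
              // member entity just to sum sizes -- O(site) work per upload.
              static long siteUsage(Collection c) {
                  long total = 0;
                  for (Member m : c.getMembers()) {
                      total += m.isCollection()
                              ? siteUsage((Collection) m)
                              : ((Resource) m).getContentLength();
                  }
                  return total;
              }
          }

      Because the walk issues getMembers() once per collection, a single put against a site with 100 collections fans out into 100 or more calls, none of them cacheable.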


          Activity

          Ian Boston added a comment -

          If I turn quota off on the site, DAV becomes perfectly usable on large sites.
          Peter A. Knoop added a comment -
          Bumped this to Blocker for discussion at today's Release Meeting. This seems like it could potentially be a widespread problem, depending on how common WebDAV use is for an implementation.
          Seth Theriault added a comment -
          XFS (Linux) stores quota information in non-visible files and "internally":
          http://www.die.net/doc/linux/man/man8/quotaon.8.html

          This might be inspirational.

          In addition, ProFTPD, an FTP server, has a module that implements quotas independently of the filesystem. Check out:

          http://www.castaglia.org/proftpd/modules/mod_quotatab.html#QuotaTables
          http://www.castaglia.org/proftpd/doc/devel-guide/advanced/Quotatab/

          It implements this via "limit" and "tally" tables, so entries are updated solely on FTP commands -- a drawback. In Sakai's case, since we totally control the "filesystem," this wouldn't be a limiting factor.
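
          Translated to Sakai, the tally idea would mean keeping a per-site running total that is updated on every put or delete instead of being recalculated by a scan. A minimal sketch, assuming a hypothetical CONTENT_QUOTA_TALLY table (the table, columns, and class names are invented for illustration):

              import java.sql.Connection;
              import java.sql.PreparedStatement;
              import java.sql.SQLException;

              class QuotaTally {
                  // Adjust the running usage total for a site by delta bytes
                  // (positive on put, negative on delete). The schema here is
                  // hypothetical -- no such table exists in Sakai today.
                  static void adjustUsage(Connection conn, String siteId, long delta)
                          throws SQLException {
                      String sql = "UPDATE CONTENT_QUOTA_TALLY "
                              + "SET USAGE_BYTES = USAGE_BYTES + ? WHERE SITE_ID = ?";
                      PreparedStatement ps = conn.prepareStatement(sql);
                      try {
                          ps.setLong(1, delta);
                          ps.setString(2, siteId);
                          ps.executeUpdate();
                      } finally {
                          ps.close();
                      }
                  }
              }

          Checking a quota then becomes a single-row read instead of a full-tree scan, and because the state lives in the database it would also work across cluster nodes.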
          Peter A. Knoop added a comment -
          Ian, when you say it kills performance, does this mean the app server overall is in trouble, causing problems for everyone else's sessions too? Do you know if this also affects 2.3.1, or if it is a regression resulting from more recent changes for 2.4?
          Jim Eng added a comment -
          Significant improvement in the efficiency of quota calculations (SAK-9718) depends on refactoring the database (SAK-3799).
          Ian Boston added a comment -

          I have a fix that I will commit.

          This is a temporary fix that only applies to the calculation of the total size of the collection, and only when items are uploaded. I have tested it with 2 concurrent DAV sessions on 2 separate sites; it works fine.

          The process uses a ConcurrentHashMap to store a small object (two longs) against the collection id of the site.

          If no object is found for the site in question, a full scan is performed as before; from that point on, until the object expires, the object is used rather than scanning the site.

          The object contains the current size of the site, and a timestamp at which the object expires. As content is added or removed, the size counter is updated.
          When the object expires, it is removed from the hashmap and a new scan is performed.

          The object lifetime is set to 10 minutes between creation and expiry, and expiry scans are only performed when a found object expires or a new object is added.
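
          In outline, the mechanism described above might look like the following. This is a sketch reconstructed from this comment, not the committed patch; the class and method names are invented:

              import java.util.concurrent.ConcurrentHashMap;
              import java.util.concurrent.ConcurrentMap;
              import java.util.concurrent.atomic.AtomicLong;

              class SiteQuotaCache {
                  // Small per-site entry: a size counter and an expiry time.
                  private static final class Entry {
                      final AtomicLong sizeBytes;
                      final long expiresAt; // creation time + TTL
                      Entry(long size, long expiresAt) {
                          this.sizeBytes = new AtomicLong(size);
                          this.expiresAt = expiresAt;
                      }
                  }

                  interface SiteScanner {
                      long fullScan(String collectionId); // the old full walk
                  }

                  private static final long TTL_MS = 10 * 60 * 1000; // 10 minutes
                  private final ConcurrentMap<String, Entry> cache =
                          new ConcurrentHashMap<String, Entry>();

                  // Returns the site size, scanning only when no live entry exists.
                  long getSize(String collectionId, SiteScanner scanner) {
                      long now = System.currentTimeMillis();
                      Entry e = cache.get(collectionId);
                      if (e == null || e.expiresAt <= now) {
                          cache.remove(collectionId); // drop any expired entry
                          e = new Entry(scanner.fullScan(collectionId), now + TTL_MS);
                          cache.put(collectionId, e);
                      }
                      return e.sizeBytes.get();
                  }

                  // Called as content is added (positive delta) or removed (negative).
                  void adjust(String collectionId, long delta) {
                      Entry e = cache.get(collectionId);
                      if (e != null) {
                          e.sizeBytes.addAndGet(delta);
                      }
                  }
              }

          The expiry check means a scan happens only when an entry is missing or stale, so steady upload traffic to a site costs one full scan per ten minutes rather than one per put.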

          ----

          I see the number of GC lines on a site with 3000 files in it drop from 15 GCs per upload to 10-15 uploads per GC.

          This is a temporary fix that resolves the issue for 2.4 (if included), but it is not perfect: it is in-memory, it does not synchronize between nodes in a cluster, and it requires one scan per site every 10 minutes while puts are being made to that site.

          Ian Boston added a comment -

          This is a temporary fix that works here and removes the GC problems and performance issues.

          It does not change the database or have a massive scope. I have tested on a single node with multiple clients accessing the same site at the same time from different machines.

          This fix does not address the wider issues; if it is acceptable to those concerned, it could be included in 2.4 and the wider issues delayed until 2.5.

          Please discuss with the others.
          Jim Eng added a comment -
          Here are questions raised by Glenn Golden in email:

          Does this work in a cluster? When the "threads" are in different app servers?

          This is also dangerous because it is an in-memory cache. It will have an entry for every site, and could grow large. Does it have a timeout value and a cleaning thread to keep the size down? Does it register with the memory service, so that when the admin sends the command to clear all caches, it correctly responds?

          If we thought that this was a good approach, a cache could be devised that worked in the cluster as well. But... I'm not sure this is worth it.

          I'm also not sure why we are considering this for 2.4 at this late date. Maybe we are not.
          Jim Eng added a comment -
          It would be great if we could include something that patches this for 2.4, but I think we need to address the questions Glenn raised before adopting a temporary fix that may be risky. I had asked about contention for the cache, and on review of various emails Ian answered that question to my satisfaction.

          Megan May added a comment -
          From Ian Boston:

          There is no issue with concurrent threads on the same app server; a ConcurrentHashMap is used.

          The cache is not communicated between app servers.

          Each item in the cache uses one object, two longs, and one 36-character string.

          The items expire 10 minutes from the creation of the entry in the hashmap.

          This is a temporary solution.

          Ian
          Megan May added a comment -
          2.4.0.014 bound
          Megan May added a comment -
          TESTING GUIDANCE
          =====================================
          For a single node there is a very simple test:
          get 4-5 WebDAV sessions uploading to the same site at the same time; you might multiply that across a few sites.
          Then do the same, but lower the quota to make it go over quota.
          -----------
          For a cluster, you need to repeat this, but with the sessions split between nodes.

          (Preliminary testing of the fix:) I have done this, both clustered and non-clustered, with a 400MB data set of files ranging from 10K to 5M. - Ian
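
          A rough single-node harness for that kind of test might look like this (the endpoint, credentials, file size, and file count are placeholders; ordinary WebDAV clients work just as well):

              import java.io.OutputStream;
              import java.net.HttpURLConnection;
              import java.net.URL;
              import java.util.concurrent.ExecutorService;
              import java.util.concurrent.Executors;

              class DavPutTest {
                  public static void main(String[] args) throws Exception {
                      // Placeholder endpoint and credentials -- adjust for a real site.
                      final String base = "http://localhost:8080/dav/SITE_ID/";
                      final String auth = java.util.Base64.getEncoder()
                              .encodeToString("user:password".getBytes("UTF-8"));
                      final byte[] payload = new byte[512 * 1024]; // ~512K dummy file

                      ExecutorService pool = Executors.newFixedThreadPool(5);
                      for (int i = 0; i < 5; i++) { // 4-5 concurrent DAV sessions
                          final int session = i;
                          pool.submit(new Runnable() {
                              public void run() {
                                  try {
                                      for (int f = 0; f < 20; f++) {
                                          URL url = new URL(base + "s" + session + "-" + f + ".bin");
                                          HttpURLConnection c =
                                                  (HttpURLConnection) url.openConnection();
                                          c.setRequestMethod("PUT");
                                          c.setRequestProperty("Authorization", "Basic " + auth);
                                          c.setDoOutput(true);
                                          OutputStream out = c.getOutputStream();
                                          out.write(payload);
                                          out.close();
                                          System.out.println("session " + session + " file " + f
                                                  + " -> " + c.getResponseCode());
                                          c.disconnect();
                                      }
                                  } catch (Exception e) {
                                      e.printStackTrace();
                                  }
                              }
                          });
                      }
                      pool.shutdown();
                  }
              }

          Repeating the run with the site quota lowered exercises the over-quota path; for the cluster case, the same sessions just need to be pointed at different nodes.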
          Andrew Poland added a comment -
          merged to 2-4-x r29918
          Megan May added a comment -
          Updating fix version to include 2.4.x.
          Peter A. Knoop added a comment -
          Trunk was missing as a fix version even though the fix was checked in, so adding it.

            People

            • Assignee: Unassigned
            • Reporter: Ian Boston
            • Votes: 2
            • Watchers: 3
