|
[
Permlink
| « Hide
]
David Haines added a comment - 18-Jan-2008 06:35
This is also a concern at Michigan. We have cleaned up our catastrophic sites that were causing us to restart servers, but wonder if we will be moving to sites where the forum tool is nearly unusable. That would be progress, but not enough.
with just 13 messages, 8 forums and 11 users I'm seeing quite a few count queries (420). See attached output from yourkit. So I think this code needs to be rewritten a little:
while (topicIter.hasNext()) { final Topic topic = (Topic) topicIter.next(); if (uiPermissionsManager.isRead((DiscussionTopic) topic, df)) { totalForum += messageManager.findMessageCountByTopicId(topic.getId()); authoredForum += messageManager.findAuhtoredMessageCountByTopicIdByUserId(topic.getId(), userId); unreadForum += messageManager.findUnreadMessageCountByTopicIdByUserId(topic.getId(), userId); readForum += messageManager.findReadMessageCountByTopicIdByUserId(topic.getId(), userId); } } Its doing this for each member, forum topic. It needs to aggregate the data a little better to limit the number of sql calls. Just a note that this has also been reported at Indiana as well.
Noting that UCT tentatively plans to address this issue for 2.7.
+1 from Marist on this issue.
SUGGESTION: Could we more easily implement an excel export? Our faculty members just want to check these stats once a week or even just once a month. If they could run a process, wait 5-10 minutes, and have it produce an excel spreadsheet with all of the data (maybe each student on a different worksheet?) that could help. I just had to wait 30 minutes to see the summary page of a course with 192 users and over 800 messages... Very frustrating... Then clicking on one user is taking forever again... I like Josh's suggestion of downloading a spreadsheet. +1
Assigning back to IU as we are not currently working on this.
On our server (2.5.3/Oracle) this index helped considerably:
create index MFR_MESSAGE_T_UDSAKAI140 on SAKAI.MFR_MESSAGE_T ( SURROGATEKEY , DELETED , DRAFT ) The wait time for the course Mathieu referred to wen from over 30 minutes to under 5 -- still a while, but much more tolerable. I think the big difference in John's index is the surrogatekey field. If an index on only surrogatekey (topic_id) was created, that should show a big improvement all over Forums, including statistics.
Also, when viewing statistics, the query 'findAuhtoredMessageCountByTopicId' is executed once for every user for every topic, so this needs to be as efficient as possible. Indexing (surrogatekey, created_by) should show further improvement in that area so that the database doesn't have to scan every topic's messages linearly for each user in the process. OK, so I sat down and tackled this over the past few days and I've managed to greatly increase the loading speed of the main statistics page, I've looked at but have not tackled loading an individual student's statistics quicker.
Our environment: Sakai 2.5.x on Oracle I first started by getting a hold of the actual SQL that hibernate was creating and executing and running those manually. They already seemed to be running very quickly so I did not anticipate that adding any new indexes would help us. Instead I looked at the MessageForumStatisticsBean.getAllUserStatistics function. I noticed that it was running a lot of separate queries, basically 4*<the number of topics>*<the number of site members>, which as sites started to get large, would explain the slowness we were seeing. I combined queries and refactored the function to get it to the point where it only runs three queries. We tested on three courses and found an almost exponential increase in speed (we loaded each Forum's statistics page 7+ times and timed it with a stopwatch). There were some statistical outliers which we dropped from the data, as we assumed that they were due to GC, network latency, or something else separate from the loading of the forums. Course 1 : 1 Forum : 8 Topics : 563 Messages : 36 Site Members Initial Average Load Time: 30.8s Current Average Load Time: 1.1s Course 2 : 2 Forums : 12 Topics : 830 Messages : 71 Site Members Initial Average Load Time: 75.2s Current Average Load Time: 2.6s Course 3 : 6 Forums : 38 Topics : 1,035 Messages : 52 Site Members Initial Average Load Time: 136.2s Current Average Load Time: 1.88s I've attached a patch for each of the four files that I had to change. Please review my patch and let me know what you think. I hope it helps, but if I'm doing something I shouldn't, please just let me know. A big thanks to Branden from U Windsor who pointed out that the first group of patches that I submitted were totally wrong. I created the diffs using the wrong base svn revision. I just recreated the patches and have submitted them again. I also took the step to create an all-in-one patch to hopefully make it a bit easier to install.
Thanks again, and let me know how this goes. I merged for our 2.5 instance. I don't think any of our forums are large enough to notice a huge performance improvement *yet*.
The list of user stats loaded quick with 140 messages, 4 topics over 2 forums. I verified the reported message stats were consistent with where they were reported elsewhere. A design question: Is there any particular reason you use List<Object[]> in the API, while using the Object[] as a 2-value array that represents { ID, Count } ? A cleaner alternative might be to use a Map<String, Integer> that maps ID -> Count . The patch is great. I saw the HUGE improvement on Forums statistics page load. Thanks Matt !
I also agree with Branden, I think it would be clearer to use Map<String, Integer> instead of List<Object[]> in the API. What do you think, Matt ? If you agree with this, would you please make the change and attach an update patch? After that, I will apply it to trunk. Thanks a lot. Returning a Map<String, Integer> was my first thought as well but I couldn't find a way to have a Hibernate named query automatically return a Map. And because this is part of an utility API that can be called from just about anywhere, I tried to keep my implementation as sparse as possible, touching the data as little as possible before passing it along so that in those situations where a Map is not called for, the overhead of creating a map is not forced on the developer.
If it's still something we would definitely want in the API, then I would suggest adding a new function call with the same parameters but a different return type. Commons Lang ArrayUtils allows you to create a Map from an object array:
ArrayUtils.toMap(Object[]); I'd suggest this for the conversion, over returning an Object[] directly. Or put in some docs on it. As an aside, its not possible to have the same method signature with the same parameters and different return types. Before closing this issue, can anyone tell on a large site if this addresses performance issues for user statistics? Matt mentioned that view wasn't directly addressed in his fix.
My patch does not affect user statistics at all on our instances. I haven't done anything with the functions that gather a user's information yet, but I think that is another area where re-thinking the implementation logic could yield large improvements in the speed. I can see two approaches that we could take:
1) We could combine the getUserAuthoredStatistics and getUserReadStatistics into one function. Right now, between those two functions we are checking every message in the Forums tool twice. Combining them would cut that in half. I also think we could get all of the messages in the Forum in one query instead of querying on for each topic individually. 2) We could keep the functions separate but instead of querying for all the messages in the forum, we could write queries that give us exactly the messages we want for each function. For example, write a query that gives us only the list of messages authored by a user and a second query that only gives us messages the user has read. I think both of those options are valid and haven't found a reason to lean one way or the other right now. Although I think my gut says to go with Option 2. This is something that's still on my ToDo queue, but it's closer to the tail than the head right now. After the patch,it is pretty fast if you go to Forums, click statistics, then click one user name. Before the patch, it needs 1 minute after I click the user name to load the page, now just need 2 seconds. This also improves the "Previous Participant" and "Next Participant" to load.
Is anyone running this patch on a production 2.6.x instance? This code has been in trunk for quite awhile and seems like a strong candidate for the 2.6.x branch. Can we close this issue and start new issues for further refinements?
After we applied this patch and tested it, I looked for more ways to improve performance. This is some work I did on our local instance, just putting this out here if you want to use it:
r67532 Effects loading a single user's statistics page: Limited the # of connections needed by calling SQL queries in bulk instead of one at a time: Large Test: 2200 -> 6 connections Average Test: 130 -> 3 connections Overall performance improvement of about 20% load time. This will reduce the number of connections needed to load the statistics page. We also had moved ahead and done some fixes for loading an individual student's statistics at our institution which I forgot to post to this ticket. Our changes are much more dramatic than Bryan's but we've been running them in production for a few months now with no problems. I've attached a diff (
Thanks Matt... I applied that patch locally and it helped a lot with the single user's statistics page. It overlapped most of my additional changes but there were a couple of functions that it didn't touch. So it might be worth looking into those changes.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||