
Key: SAK-13584
Type: Task
Status: Closed
Resolution: Fixed
Priority: Blocker
Assignee: Charles Severance
Reporter: Charles Severance
Votes: 0
Watchers: 0
Sakai

Further Improve the Performance of the Email Archive and Message API

Created: 22-May-2008 01:31   Updated: 15-Dec-2009 11:51
Component/s: Email Archive, Message
Affects Version/s: 2.5.0
Fix Version/s: 2.6.0

Time Tracking:
Not Specified

File Attachments: 1. Text File SAK-13584-kernel.txt (47 kB)
2. File SAK-13584-sash-test.py (2 kB)

2.6.x Status: None
2.5.x Status: Won't Fix
2.4.x Status: Won't Fix


Description
This will build on SAK-11544 and SAK-12837. The idea is to add an even better feature to the Message API and then modify the MailArchive tool to use this new API. The improvements will support flexible filtering, sorting, and paging all in a single query object. This will be done in a very similar manner to the GenericDAO search pattern that is currently in use by Aaron Z.
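
A minimal sketch of what "filtering, sorting, and paging all in a single query object" could look like. This is not the Sakai Message API: the class name `MessageQuery`, its fields, and the `MESSAGE_ID` and `MESSAGE_DATE` columns are assumptions made for illustration, and the `LIMIT ... OFFSET` paging syntax is the MySQL/SQLite form (Oracle would need a different paging clause).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical whitelist of sortable columns; interpolating only
# whitelisted names keeps the ORDER BY clause safe from injection.
SORTABLE = {"MESSAGE_DATE", "SUBJECT"}


@dataclass
class MessageQuery:
    """One object carrying filter, sort, and paging (illustrative only)."""
    channel_id: str
    search: Optional[str] = None   # substring filter, None means no filter
    sort_column: str = "MESSAGE_DATE"
    ascending: bool = True
    first: int = 1                 # 1-based position of first record wanted
    last: int = 20                 # 1-based position of last record wanted

    def to_sql(self) -> Tuple[str, List[object]]:
        if self.sort_column not in SORTABLE:
            raise ValueError(f"cannot sort by {self.sort_column}")
        sql = "SELECT MESSAGE_ID FROM MAILARCHIVE_MESSAGE WHERE CHANNEL_ID = ?"
        params: List[object] = [self.channel_id]
        if self.search:
            sql += " AND (SUBJECT LIKE ? OR BODY LIKE ?)"
            pattern = f"%{self.search}%"
            params += [pattern, pattern]
        direction = "ASC" if self.ascending else "DESC"
        sql += f" ORDER BY {self.sort_column} {direction} LIMIT ? OFFSET ?"
        params += [self.last - self.first + 1, self.first - 1]
        return sql, params
```

With this shape, a tool action such as "second page, sorted by subject descending, filtered by a search term" becomes one statement, so the database rather than the app server does the selecting and sorting.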

Once the API changes are made and the MailTool is modified to use the new APIs and the data is reorganized and the implementation makes all of the queries efficient, the MailArchive tool will use little or no Session state at all. This also means that the MailArchive tool will have REST-like URLs that capture navigation state instead of using Session state to handle navigation state.
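
To illustrate URLs that capture navigation state, here is a rough sketch in which the paging window, sort order, and search term travel in the query string instead of the session. The parameter names (`first`, `last`, `sort`, `asc`, `search`) and the base path are invented for this example and are not the tool's actual URL scheme.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def list_url(base, first, last, sort="date", ascending=True, search=""):
    """Build a list-view URL that carries all navigation state itself."""
    state = {
        "first": first,
        "last": last,
        "sort": sort,
        "asc": "1" if ascending else "0",
    }
    if search:
        state["search"] = search
    return f"{base}?{urlencode(state)}"


def parse_state(url):
    """Recover the navigation state from a URL, with defaults for gaps."""
    qs = parse_qs(urlsplit(url).query)
    return {
        "first": int(qs.get("first", ["1"])[0]),
        "last": int(qs.get("last", ["20"])[0]),
        "sort": qs.get("sort", ["date"])[0],
        "ascending": qs.get("asc", ["1"])[0] == "1",
        "search": qs.get("search", [""])[0],
    }
```

Because every request can be reconstructed from its URL alone, nothing about the current page or sort order has to live in session state.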

Once this pattern is in place, it can be used to improve the performance and reduce the session state load of the Message API family of Sakai tools. If that work is undertaken, it will be in separate JIRAs.

This is targeted for 2.6 and will require a database conversion to split out data from the XML into columns. Since there is a conversion involved - there is no plan to back-port this (even to the 2-5-x branch) - it is targeted for 2.6.


Comments
Charles Severance added a comment - 10-Jul-2008 01:36
One small fix that went straight into trunk because it was so simple.

Add integer versions of the top and bottom message numbers to the context.
This allows the option for the tool's markup to produce REST-like URLs
when moving from a list-view to an individual view. These values are simply
placed in context - it is up to the tool whether it uses them.

Sending tool/src/java/org/sakaiproject/cheftool/PagedResourceActionII.java
Transmitting file data .
Committed revision 48604.


Charles Severance added a comment - 10-Jul-2008 02:27

Charles Severance added a comment - 11-Jul-2008 07:56
The first version is up - primarily for testing - I only have a MySQL converter so far. It runs with MySQL and HSQLDB. Here are checkout, compile, and conversion instructions:

Testing SAK-13584

Grab the following branches:

https://source.sakaiproject.org/svn/db/branches/SAK-13584/
https://source.sakaiproject.org/svn/mailarchive/branches/SAK-13584/
https://source.sakaiproject.org/svn/util/branches/SAK-13584/

Compile and deploy

Run the conversion

cd mailarchive

Edit upgradeschema-mysql.config to get the database connection right.

Run the script:

sh mailarchive-runconversion.sh \
      -j "/Users/csev/dev/sakai-trunk/apache-tomcat-5.5.23/common/lib/mysql-connector-java-5.1.6-bin.jar" \
      -p "/Users/csev/dev/sakai-trunk/apache-tomcat-5.5.23/sakai/sakai.properties" \
      upgradeschema-mysql.config

Check to see if the BODY, SUBJECT, and HTMLBODY columns look reasonable. The conversion can be stopped and started. If you want to undo the conversion simply execute these commands:

ALTER TABLE MAILARCHIVE_MESSAGE DROP COLUMN SUBJECT;
ALTER TABLE MAILARCHIVE_MESSAGE DROP COLUMN BODY;
ALTER TABLE MAILARCHIVE_MESSAGE DROP COLUMN HTMLBODY;

--Chuck

Charles Severance added a comment - 11-Jul-2008 08:00
Test Scenario for this interim version - RUN this test and send me catalina.out

Here are the needed clicks - keep track of which ones are dog slow and which ones are fast. I expect that the non-search queries will be quite fast. And even the search queries should be tolerable.

Go to the mail tool in a site with lots of messages - do not enter a search value

Press next page 2 times
Press back page
Press last page
Press back page twice
Press first page
Enter a message
Press next message twice
Press back message once

Switch to Sort by Subject
Switch to Ascending and descending
Press back page
Press last page
Press back page twice
Press first page

Now put in a search string that is common and repeat the above steps.

Now put in a rare search string and repeat the above steps.

Send me back the catalina.out - thanks muchly

Charles Severance added a comment - 21-Jul-2008 17:04
Tested the Oracle code and created / tested the Oracle conversion. Thanks Dave H.

Sending mailarchive-runconversion.sh
Adding upgradeschema-2.6-mysql.config
Adding upgradeschema-2.6-oracle.config
Deleting upgradeschema-mysql.config
Transmitting file data ..
Committed revision 49103.


Charles Severance added a comment - 27-Jul-2008 12:39
Add feature to suppress the search option if there are too many messages
in the channel.

sakai.mailbox.search-threshold=2500

The default is 2500 messages - above this number searches get somewhat slow because they use LIKE clauses on the newly exploded columns. This is *different* from the Sakai 2.5 version of the Mail Tool, which actually tried to search large corpuses but suppressed features when the search returned too many results. The Sakai 2.5 approach actually retrieved and deserialized all of the data in the App Server.

Some testing can be done to determine if this is a good limit. Some initial testing of a 25000 message corpus suggested that a LIKE clause takes about 1 second per 1000 messages. That testing was done in MySQL on a pretty weak consumer desktop with a slow disk - so perhaps this threshold can be higher when the server is more performant. Running some basic LIKE queries against MAILARCHIVE_MESSAGE for a channel with a large corpus can help determine the right setting for this parameter for a particular database server.
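
The threshold check and the suggested timing experiment can be sketched as follows. This uses an in-memory SQLite table as a stand-in for the real database, so the timings it produces say nothing about MySQL or Oracle; the function names are hypothetical, the `CHANNEL_ID` column is an assumption, and whether the real tool treats a count exactly equal to the threshold as allowed is also an assumption.

```python
import sqlite3
import time

SEARCH_THRESHOLD = 2500  # default value of sakai.mailbox.search-threshold


def search_allowed(conn, channel_id, threshold=SEARCH_THRESHOLD):
    """Offer search only when the channel holds at most `threshold` messages."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM MAILARCHIVE_MESSAGE WHERE CHANNEL_ID = ?",
        (channel_id,),
    ).fetchone()
    return count <= threshold


def time_like_query(conn, channel_id, term):
    """Return the seconds taken by one LIKE search over a channel's bodies."""
    start = time.perf_counter()
    conn.execute(
        "SELECT COUNT(*) FROM MAILARCHIVE_MESSAGE "
        "WHERE CHANNEL_ID = ? AND BODY LIKE ?",
        (channel_id, f"%{term}%"),
    ).fetchone()
    return time.perf_counter() - start


# Stand-in data: one large channel and one small one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MAILARCHIVE_MESSAGE (CHANNEL_ID TEXT, SUBJECT TEXT, BODY TEXT)"
)
rows = [("big", f"subject {i}", f"body {i}") for i in range(3000)]
rows.append(("small", "hi", "hello"))
conn.executemany("INSERT INTO MAILARCHIVE_MESSAGE VALUES (?, ?, ?)", rows)

print(search_allowed(conn, "small"), search_allowed(conn, "big"))
print(f"LIKE over 'big' took {time_like_query(conn, 'big', 'body'):.4f}s")
```

Running the timing function against a real channel on the target database server gives a per-message cost that can be used to pick a threshold for that server.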

Charles Severance added a comment - 27-Jul-2008 12:39
svn commit
Sending mailarchive-tool/tool/src/java/org/sakaiproject/mailarchive/tool/MailboxAction.java
Sending mailarchive-tool/tool/src/webapp/vm/mailarchive/chef_mailbox-List.vm
Transmitting file data ..
Committed revision 49413.

Charles Severance added a comment - 30-Jul-2008 21:04
This got refactored because of K1. Half of this is in K1 and the other half is in the main svn. The refactor is complete and things are all merged up to trunk - once final testing in Oracle is completed, a patch for Kernel will be generated and then the mailarchive code will be checked into trunk.

Charles Severance added a comment - 31-Jul-2008 11:07
This is the kernel patch for this modification.

Ian Boston added a comment - 31-Jul-2008 15:00
There are some things that need fixing in the patch.

1. New files with no license need to be licensed under the ECL.
2. There are a significant number of methods in DoubleStorageSql that have no javadoc.
3. Some files are licensed ASFL2, (c) Aaron. Since you can't just change a license, and I believe Aaron has signed a CLA, these need to be licensed under the ECL.

I am awaiting confirmation from sakai-dev for 3, but other than that the patch is OK. If it's correct that ECL is required, can you liaise with Aaron to fix these items?

Thanks
Ian

Ian Boston added a comment - 31-Jul-2008 17:27

Patch applied r49932

Charles Severance added a comment - 04-Aug-2008 17:49
This is a test to load a bunch of messages into the E-Mail archive. It tests the loading part of the APIs and then allows you to test the tool with data.

You need SASH (HTML Terminal) in your Sakai. Place the script in Resources and then type the following command into SASH (replacing the site ID):

python /group/dae3cfae-bb45-42b5-b84d-66bb7c4eb66f/SAK-13584-sash-test.py

The output looks like this:

org.sakaiproject.time.impl.BasicTimeService@9e451f
20080805004635588
dae3cfae-bb45-42b5-b84d-66bb7c4eb66f
main
/mailarchive/channel/dae3cfae-bb45-42b5-b84d-66bb7c4eb66f/main
 0
added... 0
all done

You can add an optional parameter to add a bunch of messages:

python /group/dae3cfae-bb45-42b5-b84d-66bb7c4eb66f/SAK-13584-sash-test.py 10

Be careful - SASH is picky about spaces.

Charles Severance added a comment - 04-Aug-2008 17:53
This is now committed to trunk.

Merge of the branch into trunk. http://bugs.sakaiproject.org/jira/browse/SAK-13584

The major features of this are as follows:

- All features restored - search, sort, and paging forward/back when viewing an individual message
- All features are fast regardless of message corpus size - except for search - there is a property to shut off search (simply remove it from the UI) above a threshold - this is defaulted to 2500 messages.

- The session state for MailArchive is virtually zero - about 10 strings - no live objects at all ever end up in session - so MailArchive will never run you out of memory again.

- The pattern has been made more REST-like - pretty much every action turns into one SQL query which returns *exactly* the right records. There is no more searching, selecting, or sorting in the CPU. Each action only deserializes the exact number of records that will be displayed on each screen.

This is pretty nice - it means that MailArchive is safe on any sized message corpus - at the point where search would cause a query to take more than 3-4 seconds, it shuts off (you can play with this value).

This *does* require a DB conversion - the conversion is included. It uses the conversion pattern used by Content - it is a shell script that does the conversion (adding columns) off-line. The recent K1 jar reorganization broke the conversion - but it was heavily tested pre-K1 - and I will soon have it working again. The conversion can run repeatedly.

For developers who start with autoddl - it automatically makes the columns for you on fresh startup.

This does not touch incoming mail handling - nor does it affect outbound mail handling - it does touch the storage - I tested this mostly using SASH scripts. So when we do QA, we do need to check the mail coming into and out of storage carefully.

The conversion is broken because of K1 - and will be fixed. See the README in this commit.

svn commit
Sending mailarchive-impl/impl/pom.xml
Sending mailarchive-impl/impl/src/java/org/sakaiproject/mailarchive/impl/BaseMailArchiveService.java
Sending mailarchive-impl/impl/src/java/org/sakaiproject/mailarchive/impl/DbMailArchiveService.java
Adding mailarchive-impl/impl/src/java/org/sakaiproject/mailarchive/impl/conversion
Adding mailarchive-impl/impl/src/java/org/sakaiproject/mailarchive/impl/conversion/ExtractXMLToColumns.java
Adding mailarchive-impl/impl/src/sql/hsqldb/sakai_mailarchive_2_6_0.sql
Adding mailarchive-impl/impl/src/sql/mysql/sakai_mailarchive_2_6_0.sql
Adding mailarchive-impl/impl/src/sql/oracle/sakai_mailarchive_2_6_0.sql
Adding mailarchive-runconversion.sh
Sending mailarchive-tool/tool/src/java/org/sakaiproject/mailarchive/tool/MailboxAction.java
Sending mailarchive-tool/tool/src/webapp/vm/mailarchive/chef_mailbox-List.vm
Sending mailarchive-tool/tool/src/webapp/vm/mailarchive/chef_mailbox-view.vm
Adding readme-conversion.txt
Adding upgradeschema-2.6-mysql.config
Adding upgradeschema-2.6-oracle.config
Transmitting file data ........
Committed revision 50007.



Charles Severance added a comment - 25-Aug-2008 06:18
The only remaining task on this JIRA is to fix the conversion script after K1 broke it.

Charles Severance added a comment - 25-Aug-2008 07:00
I switched this to blocker because the release will not work without the conversion script working - the new code that depends on the conversion script is there but the conversion script needs fixing.

The fix is simple - it just needs to load the right post-K1 jars and be retested - the code and logic are already heavily tested.

Peter A. Knoop added a comment - 25-Aug-2008 09:59
Moved from Branch to Task, now that the code is in trunk.

Peter A. Knoop added a comment - 29-Sep-2008 07:22
[Bulk Comment] This Task (or Sub-Task) issue currently is Unresolved, but has a Fix Version of 2.6. The Code Freeze for Sakai 2.6 has now passed (29-Sep-2008, 8:00am Eastern US time).

If you are still working to resolve this issue for 2.6, then please post an email to sakai-dev to let everyone know that you need an exception for this Jira, explain what is left to do, and when you plan to have the work completed; please include that information here in the Jira as well.

Otherwise, if the resolution of this task has been postponed, please reset the Fix Version to 2.7 or Unknown, depending on what your new expectations for completion are. If the issue is no longer relevant, please close the issue as Won't Fix or Incomplete with an explanation of why.

Thanks!

Charles Severance added a comment - 03-Oct-2008 14:45
$ svn commit
Sending mailarchive-runconversion.sh
Deleting readme-conversion.txt
Sending upgradeschema-2.6-mysql.config
Sending upgradeschema-2.6-oracle.config
Transmitting file data ...
Committed revision 53121.

Update the MailArchive 2.6 conversion script.

As part of a 2.6 upgrade this script needs to be run on the database.
It adds a BODY and SUBJECT column and extracts these
out of the XML to improve searching, sorting, and
paging performance.

The conversion script is in the mailarchive directory and
it is called as follows:

./mailarchive-runconversion.sh upgradeschema-2.6-oracle.config

You must do this with CATALINA_HOME pointing to your Sakai
deployment - it looks through sakai.properties for connection
details and finds the DB drivers in your Tomcat as well.

The conversion can be run more than once - it notices when a
message has already been converted and skips it. You
can even drop the columns and then re-run the conversion.
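
The re-runnable behaviour described above can be sketched as follows. This is not the actual conversion code (that lives in ExtractXMLToColumns.java); it is a Python and SQLite stand-in in which the `XML` and `MESSAGE_ID` columns, the flat attribute-based XML shape, and the "SUBJECT IS NULL means not yet converted" test are all assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def ensure_columns(conn):
    """Add SUBJECT and BODY if they are missing (safe to call repeatedly)."""
    existing = {row[1] for row in conn.execute(
        "PRAGMA table_info(MAILARCHIVE_MESSAGE)")}
    for column in ("SUBJECT", "BODY"):
        if column not in existing:
            conn.execute(
                f"ALTER TABLE MAILARCHIVE_MESSAGE ADD COLUMN {column} TEXT")


def convert(conn):
    """Copy subject/body out of the XML; skip rows already converted."""
    ensure_columns(conn)
    pending = conn.execute(
        "SELECT MESSAGE_ID, XML FROM MAILARCHIVE_MESSAGE "
        "WHERE SUBJECT IS NULL").fetchall()
    for message_id, xml_text in pending:
        root = ET.fromstring(xml_text)
        conn.execute(
            "UPDATE MAILARCHIVE_MESSAGE SET SUBJECT = ?, BODY = ? "
            "WHERE MESSAGE_ID = ?",
            (root.get("subject", ""), root.get("body", ""), message_id))
    conn.commit()
    return len(pending)  # number of rows converted in this run


# Stand-in table holding one unconverted message (hypothetical XML shape).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MAILARCHIVE_MESSAGE (MESSAGE_ID TEXT PRIMARY KEY, XML TEXT)")
conn.execute(
    "INSERT INTO MAILARCHIVE_MESSAGE VALUES (?, ?)",
    ("m1", '<message subject="Hello" body="First post"/>'))

first_run = convert(conn)   # converts the one pending row
second_run = convert(conn)  # finds nothing left to do
print(first_run, second_run)
```

Selecting only unconverted rows is what lets the run be stopped and restarted; dropping the columns simply makes every row pending again on the next run.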

Please put this in 2.6 - it is a blocker.

Note - this is a required conversion as part of 2.6 - it needs to go
into the release notes somewhere.


Charles Severance added a comment - 03-Oct-2008 14:46
This is now fixed and ready for 2.6.

Anthony Whyte added a comment - 15-Dec-2009 11:51
Old issue. Closing.