|
[
Permlink
| « Hide
]
Aaron Zeckoski added a comment - 21-Apr-2008 03:52
The first part of the patch (address #1)
Determined #2 cannot be done without refactoring UDS into 2 parts, too much for this issue, instead this is addressed for JCR in
Investigating #3 indicates this so far:
On login, the stack looks like this on the first hit on the UDP: JLDAPDirectoryProvider.getUser(UserEdit) line: 516 CaretFederatingUDP(FilterUserDirectoryProvider).getUser(UserEdit) line: 163 DbUserService$$EnhancerByCGLIB$$9060adbf(BaseUserDirectoryService).getProvidedUserByEid(String, String) line: 603 DbUserService$$EnhancerByCGLIB$$9060adbf(BaseUserDirectoryService).getUserByEid(String) line: 708 DbUserService$$EnhancerByCGLIB$$9060adbf(BaseUserDirectoryService).getInternallyAuthenticatedUser(String, String) line: 1391 DbUserService$$EnhancerByCGLIB$$9060adbf(BaseUserDirectoryService).authenticate(String, String) line: 1376 UserAuthnComponent$$EnhancerByCGLIB$$a0f65666(UserAuthnComponent).authenticate(Evidence) line: 107 AuthenticationManager.authenticate(Evidence) line: 63 This seems wrong because first authenticate is called (ok) -> then getInternallyAuthenticatedUser (ok) -> then getUserByEid (ok) -> then that calls the UDP (wrong), this method should ONLY be fetching an internal user based on the name used. Perhaps the naming is bad but I doubt this is the intended behavior. This results in the first call to UDP.getUser This is the stack at the second hit on the UDP: JLDAPDirectoryProvider.authenticateUser(String, UserEdit, String) line: 359 CaretFederatingUDP(FilterUserDirectoryProvider).authenticateUser(String, UserEdit, String) line: 493 DbUserService$$EnhancerByCGLIB$$9060adbf(BaseUserDirectoryService).getProviderAuthenticatedUser(String, String) line: 1438 DbUserService$$EnhancerByCGLIB$$9060adbf(BaseUserDirectoryService).authenticate(String, String) line: 1381 UserAuthnComponent$$EnhancerByCGLIB$$a0f65666(UserAuthnComponent).authenticate(Evidence) line: 107 AuthenticationManager.authenticate(Evidence) line: 63 This should have been the first hit against the UDP rather than the second. Otherwise this seems fine. When fetching a user data: CaretFederatingUDP(FilterUserDirectoryProvider).getUser(UserEdit) line: 163 DbUserService$$EnhancerByCGLIB$$9060adbf(BaseUserDirectoryService).getProvidedUserByEid(String, String) line: 603 DbUserService$$EnhancerByCGLIB$$9060adbf(BaseUserDirectoryService).getUserByEid(String) line: 708 UserDirectoryService.getUserByEid(String) line: 90 SiteAction.prepareParticipants(String, List) line: 7935 There is only a single hit against the UDP. This only happens if the user was not already in the cache. No problems here. Other tests seem to indicate things are generally ok in other areas. Assigning to Ray for review and merge, ideally this should end up in 2.5.x and possibly 2.5.1 as the performance implications are possibly great
"This seems wrong because first authenticate is called (ok) -> then getInternallyAuthenticatedUser (ok) -> then getUserByEid (ok) -> then that calls the UDP (wrong), this method should ONLY be fetching an internal user based on the name used. Perhaps the naming is bad but I doubt this is the intended behavior."
It can sometimes be surprisingly difficult to work out "intended behavior" for the services. :) The legacy behavior does allow internal authentication of a provided user. The outcome would be the normal "internal"-style password check made against the hashed password returned in the provided user record. I don't know if anyone is depending on that case, but it has been possible for a long while now. It should still work that way even without the first call. The odd reality in my testing is that because that call happens first we cache a user who has not yet authenticated and then if the auth fails end up caching a user for no good reason. With the changes I put in the user will still be fetched on login and the case you mentioned should still work fine, however, the user will not be fetched or cached until after they authenticate.
That part of the patch can be ignored if you would prefer to only apply the cahcing changes. They are more important anyway. In order:
1) No question that this should go in! I'm sorry for the mess -- thanks for cleaning it up. 2) I wasn't sure about the priority of a cache on UDS "authenticate()", since the UserAuthnComponent is (or should be) the standard entry point for authentication. It turns out that some clients we didn't know about have ended up calling UDS authenticate directly, but they should really switch. (Speaking of which -- thanks for putting in the warning comment and the JCR JIRA!) Since you've tested the change and we all love caching, I'll merge it, but we probably shouldn't expect much of a payback. 3) There's a possible scenario in which the current behavior is desirable (for example, a "provider" that just serializes records and doesn't want to try to handle authentication logic itself). The question is whether any customers are counting on it. But as you mentioned on IM, it turns there's not a lot of inefficiency currently after all, since a UDP-supplied user record will get locally cached whether authentication worked or not. Since I'm wary of breaking long-standing behavior in a point release, I'd lean against merging in this particular change. Sorry, I missed your comment! It's not so bad to cache the user record even if authentication failed. Since a user record was found, that means that the username meant something, which means it was probably a password failure, which means we're likely to get another request for that user soon. :)
Quick note to myself: The integration test shows that the new MemoryAuthenticationCache class still advertises the maximumSize, timeoutMs, and failureThrottleTimeoutMs properties, but doesn't appear to honor them. Either the new implementation should handle the old properties correctly or they should be removed (or, to make upgrades easier, replaced to throw warning messages to the log).
It looks like the easiest way to preserve backward compatibility for the
maximumSize@org.sakaiproject.user.impl.AuthenticationCache and timeoutMs@org.sakaiproject.user.impl.AuthenticationCache properties would be to subclass EhCacheFactoryBean. Potential overkill, but it makes it safer to include this change in a point release. There's no ehCache equivalent for "failureThrottleTimeoutMs", so I'll just emit a warning for that one. Aaron's default configuration sets "timeToIdle" to be equal to "timeToLive". The original implementation minimized timeToIdle "because we want to reduce the risk of OK-ing an out-of-date password." Following that line of thought, unless someone objects, I'll set timeToIdle to the minimal 1 second by default. I don't think you want to do that:
timeToIdleSeconds: Sets the time to idle for an element before it expires. i.e. The maximum amount of time between accesses before an element expires Is only used if the element is not eternal. Optional attribute. A value of 0 means that an Element can idle for infinity. The default value is 0. timeToLiveSeconds: Sets the time to live for an element before it expires. i.e. The maximum time between creation time and when an element expires. Is only used if the element is not eternal. Optional attribute. A value of 0 means that and Element can live for infinity. The default value is 0. It would cause the cached authentications to expire immediately. Also, the failureThrottleTimeoutMs should still work fine with the patch I sent. It is not part of ehcache at all but something where it looks at the time stored in the custom object that was being used. That is still the case (since I am just extending the existing code that should continue to work as it did before). Thanks for the idle/live wake-up call. So basically the minimum value of those two will win out? In that case, I assume we might as well not bother setting "timeToIdle".
Regarding failureThrottleTimeoutMs, I think we might be misunderstanding each other. Failure caching works fine. What doesn't work is specifying different timeout values for success-caching and failure-caching: every cached record (whether a "success" record or a "failure" record) times out the same way. It was probably overkill to split the two up -- I was just following someone else's lead. After some email back-and-forth and after failing at my first two attempts to support backwards compatibility, I've decided to keep life simple and standard, and drop support for the three AuthenticationCache-specific properties introduced in 2.5.x. When they get set, the new code will log a warning and otherwise ignore them. If the 2.5.x release manager decides that's unacceptable, we can rethink things. At revision: 45743 of the trunk, I checked in the switch to the standard EhCache approach, along with the applicable integration test changes.
This should be considered for inclusion in 2.5.x and the next point release, but requires the following release note: * The authentication cache required to support WebDAV efficiently has been switched to a more standard implementation. This means that the following "sakai.properties" are no longer valid: maximumSize@org.sakaiproject.user.impl.AuthenticationCache timeoutMs@org.sakaiproject.user.impl.AuthenticationCache failureThrottleTimeoutMs@org.sakaiproject.user.impl.AuthenticationCache The new applicable properties are: maxElementsInMemory@org.sakaiproject.user.api.AuthenticationManager.cache timeToLive@org.sakaiproject.user.api.AuthenticationManager.cache However, we expect the default settings will be suitable for most environments. Is this in the 2.5.x branch now?
-AZ Not yet. It needs to be QA'd in trunk first.
How can we get that done? It would be a real shame to not have this in 2.5.1
Basically verify that the caching is still effective and that the config settings work appropriately. Perhaps test on a trunk build with a UDP that has logging/debug statements (i.e. that would show that the authentication methods are only called once within the TTL period no matter how many times the user is authenticated in Sakai via login, WebDAV, etc.) ? I'm assuming there aren't yet unit tests that exercise this code.
My understanding was that Ray has unit tests which he used to verify the code. I tested this on my machine and I believe Ray tested on his as well. My tracing showed that it was working so I did not need to look at logs but Ray may have looked at log output.
merged into 2-5-x with r46543
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||