[SAK-10159] Unable to interpret RSS from Google News Created: 23-May-2007  Updated: 22-Jun-2016  Resolved: 12-Mar-2010

Status: CLOSED
Project: Sakai
Component/s: News (RSS), Reference
Affects Version/s: 2.4.0, 2.4.1, 2.5.0, 2.5.2, 2.5.3, 2.5.4, 2.5.5, 2.6.0, 2.7.0
Fix Version/s: 2.7.0, 2.8.0

Type: Bug Priority: Major
Reporter: Sam Ottenhoff Assignee: Matthew Buckett
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: SAK-10159-rome1.patch, SAK-10159.patch, rome-129-pom.patch, rome-129.patch, rome-maven.tgz
Issue Links:
Relate
relates to SAK-13353 Unable to render news from Times High... CLOSED
is related to SAK-18044 Add -Dhttp.agent=Sakai to demo startu... CLOSED
is related to SAK-18359 On clicking on the News tool for the ... CLOSED
CLE Team Issue:
Yes

 Description   

Unable to interpret RSS feed from Google News.

To reproduce:

1) Go to news.google.com

2) Search for "Cleveland Cavaliers"

3) Select the RSS link in the left-hand nav

http://news.google.com/news?hl=en&ned=us&q=cleveland+cavaliers&ie=UTF-8&output=rss

4) Go to your Sakai site => Site Info => Edit Tools => Add News

5) Paste the Google News feed URL into the News tool

The error received:

[Unable to interpret news feed from http://news.google.com/news?hl=en&ned=us&q=cleveland+cavaliers&ie=UTF-8&output=rss]



 Comments   
Comment by Sam Ottenhoff [ 27-Jul-2007 ]

Google News returns 403 Forbidden when Sakai tries to download its RSS feeds. The reason is that Sakai retrieves the RSS feed with a user agent of "Java/1.5.0_11" (in my case).

Google has apparently banned user agents like wget, curl, or Java.

I thought the easy fix would be a one-line change in BasicNewsChannel.java:

feedFetcher.setUserAgent("Sakai News Tool");

But the user agent in the requests remained "Java/xxxxx".

I had to add a JAVA_OPT setting http.agent to "Sakai" to work around the issue.

Any thoughts on why the setUserAgent in rome-fetcher is not "sticking"?
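For comparison, here is a minimal sketch of what a per-request user agent looks like at the plain java.net level, independent of rome-fetcher. It assumes the feed is fetched with java.net.HttpURLConnection; the class and method names are hypothetical, and the "Sakai/2.5.x (sakai.news)" string is the format proposed later in this issue. It also reports the HTTP status, so a 403 shows up as a 403 rather than as an uninterpretable feed.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

public class FeedUserAgent {
    // Format proposed in this issue; the version part is illustrative only.
    static final String USER_AGENT = "Sakai/2.5.x (sakai.news)";

    // Opens, but does not yet connect, an HTTP connection carrying an explicit User-Agent.
    static HttpURLConnection prepare(URL feedUrl, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) feedUrl.openConnection();
        // An explicitly set header is sent instead of the JVM default "Java/<version>".
        conn.setRequestProperty("User-Agent", userAgent);
        return conn;
    }

    // Connects and returns the HTTP status so a 403 is reported as such.
    static int status(HttpURLConnection conn) throws IOException {
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String feed = args.length > 0 ? args[0]
                : "http://news.google.com/news?hl=en&ned=us&q=cleveland+cavaliers&ie=UTF-8&output=rss";
        int code = status(prepare(URI.create(feed).toURL(), USER_AGENT));
        if (code == HttpURLConnection.HTTP_FORBIDDEN) {
            System.out.println("403 Forbidden: the server refused this request");
        } else {
            System.out.println("HTTP " + code);
        }
    }
}
```

This only shows the header that setUserAgent is expected to produce; it is not the code in BasicNewsChannel.java or in rome-fetcher.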

Comment by Matthew Buckett [ 10-Apr-2008 ]

Attached patch sets the user agent to something like "Sakai/2.5.x (sakai.news)" and also updates rome-fetcher to 0.9, which seems to fix the bug where the setUserAgent call is ignored. I also updated rome to 0.9 at the same time.

For this to work, rome-fetcher-0.9.jar needs to be added to the Sakai Maven repo, as it doesn't exist anywhere else at the moment.

http://wiki.java.net/bin/view/Javawsxml/RomeFetcherRelease09
https://rome.dev.java.net/dist/rome-fetcher-0.9.zip

Comment by Matthew Buckett [ 10-Apr-2008 ]

Attached the correct patch (sorry).

Comment by Matthew Buckett [ 20-Apr-2009 ]

Upgrading to rome 1.0 doesn't work because rome doesn't work with the Sakai classloader setup without patching.
https://rome.dev.java.net/issues/show_bug.cgi?id=129

Comment by Matthew Buckett [ 20-Apr-2009 ]

Updated patch that uses a patched rome 1.0 for the News tool.
A copy of the patched rome 1.0 can be found at:

http://maven-repo.oucs.ox.ac.uk/content/repositories/releases/rome/

Comment by Matthew Buckett [ 20-Apr-2009 ]

Patch to rome 1.0

Comment by Matthew Buckett [ 20-Apr-2009 ]

Patch to rome 1.0 to create a sakai version.

Comment by Alan Berg [ 17-Feb-2010 ]

http://news.google.com/news?hl=en&ned=us&q=cleveland+cavaliers&ie=UTF-8&output=rss

This works in Firefox 3.5.7 as a live bookmark, but in the News tool you get:
Alert: http://news.google.com/news?hl=en&ned=us&q=cleveland+cavaliers&ie=UTF-8&output=rss is an invalid RSS feed.

Comment by Sam Ottenhoff [ 17-Feb-2010 ]

The only fix I have been able to find is the one pointed out in 2007: set the user agent in JAVA_OPTS to "Sakai" as Google seems to mistrust a user agent with "Java" in the string.

Comment by Seth Theriault [ 17-Feb-2010 ]

For the record, Feed Validator (http://www.feedvalidator.org) reports that the Google News feed is invalid.

Comment by Alan Berg [ 18-Feb-2010 ]

I can confirm that adding -Dhttp.agent=Sakai to JAVA_OPTS in the demo's ./start-sakai scripts for trunk resolved this issue.
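Where editing the start scripts is awkward, the same JVM-wide setting can be applied in code. A minimal sketch, assuming it runs before anything in the JVM makes its first HTTP request: in the OpenJDK implementation the HTTP handler reads http.agent once when it initialises, so later changes are ignored. The class and method names are hypothetical; only the property name and the value "Sakai" come from this issue.

```java
public class HttpAgentCheck {
    // Value used by the demo start scripts via -Dhttp.agent=Sakai.
    static final String AGENT = "Sakai";

    // Keeps any -Dhttp.agent value passed on the command line; otherwise sets "Sakai".
    // Assumption: called before the first HTTP connection in this JVM.
    static String ensureHttpAgent() {
        String current = System.getProperty("http.agent");
        if (current == null || current.isEmpty()) {
            System.setProperty("http.agent", AGENT);
            current = AGENT;
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println("http.agent = " + ensureHttpAgent());
    }
}
```

Running it without -Dhttp.agent prints "http.agent = Sakai"; running it with the JAVA_OPTS change in place echoes whatever value the script supplied, which is a quick way to check that the option actually reached the JVM.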

Comment by Anthony Whyte [ 18-Feb-2010 ]

Demo startup scripts updated, trunk r73654.

Comment by Alan Berg [ 22-Feb-2010 ]

The JAVA_OPTS update for 2.7 b3 allows the Google feed to work on qa1-nl.

Comment by Matthew Buckett [ 03-Mar-2010 ]

rome-maven.tgz is a file containing the extra maven artifacts that should go in the Sakai maven repo.

Comment by Matthew Buckett [ 12-Mar-2010 ]

Fixed in trunk by using a newer version of ROME.

Comment by Anthony Whyte [ 22-Mar-2010 ]

2.7.x, r74963.

Comment by Matthew Buckett [ 16-Sep-2015 ]

Rome has moved to GitHub and the issue is now at: https://github.com/rometools/rome/issues/130

Generated at Sun Sep 22 10:46:53 CDT 2019 using Jira 8.0.3#800011-sha1:073e8b433c2c0e389c609c14a045ffa7abaca10d.