The test driver snippet from the last entry hints at the core processing loop that we need for NewsHeap: iterate over all channels, retrieve their content, parse the result. Lather, rinse, repeat.
This post begins to layer some behavior on top of our current classes, pushing NewsHeap towards useful.
A feed reader should be able to
The OPML file serves as the list of subscriptions. We can parse it, we can update it, alas we can't insert or move outlines around yet, but that will come.
To keep track of when channels were last updated, etags if present, update frequency etc., we need to introduce a Subscription. We also need something to manage a list of Subscriptions: a SubscriptionList.
One design issue to wrestle with is the representation of Subscription organization. Foreshadowing our UI requirement to organize the subscription list hierarchically, do we replicate the hierarchy between the subscription list and the OPML? Do we eliminate the OPML altogether and make it available only as an import/export format?
Having gone back and forth on this, it strikes me that we can borrow the OPML parser and create a control file format by extending OPML and Outline.
A simple change to the OPML SAX handler lets us set the class used for new outline instances.
def initialize(handlerClass=Outline)
  ...
  @handlerClass = handlerClass
end

def start_element uri, localname, qname, attributes
  if (!@opml) # ensure that first element is 'OPML'
    ...
  elsif (localname == 'outline')
    o = @handlerClass.new() # was: Outline.new()
We make a corresponding change to OPML.readFrom so that we can control the type of Outline.
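To make the idea concrete, here is a minimal sketch of class injection into a handler. It does not use a real XML parser: the events are fired by hand, and the Outline stand-in, TaggedOutline and OutlineHandler are hypothetical names invented for illustration, not the actual NewsHeap classes.

```ruby
# Hypothetical stand-in for the post's Outline class: just an attribute hash.
class Outline
  attr_reader :attributes

  def initialize
    @attributes = {}
  end
end

# Hypothetical subclass, used only to show that the injected class is honoured.
class TaggedOutline < Outline; end

# Handler that instantiates whatever outline class it was constructed with.
class OutlineHandler
  attr_reader :outlines

  def initialize(handler_class = Outline)
    @handler_class = handler_class
    @outlines = []
  end

  # Same parameter list as the start_element callback in the post.
  def start_element(_uri, localname, _qname, attributes)
    return unless localname == 'outline'

    outline = @handler_class.new
    outline.attributes.update(attributes)
    @outlines << outline
  end
end

# Simulate a parser firing events: only <outline> elements create objects.
handler = OutlineHandler.new(TaggedOutline)
handler.start_element(nil, 'head', 'head', {})
handler.start_element(nil, 'outline', 'outline', { 'xmlUrl' => 'http://example.com/rss.xml' })
puts handler.outlines.first.class
```

Because the default argument is Outline, existing callers that pass nothing should keep their old behavior.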
We can create some subclasses now to hide these details from users of the classes. We need to create Subscription and SubscriptionList.
class Subscription < Outline
  def initialize()
    @etag = nil
    @lastModified = nil
    @xmlURL = nil
    @lastGet = nil
    super()
  end

  def etag=(anEtag)
    @attributes['etag'] = anEtag
  end

  def etag()
    @attributes.fetch('etag',nil)
  end
You will note that we harden the interface for the Subscription class, adding accessors for elements such as etag.
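Since each accessor just reads or writes a key in the attribute hash, the pattern can be generated rather than typed out. The following is a sketch only: the Outline stand-in is hypothetical, and the attribute names lastModified and lastGet are assumed from the way the driver program uses them later in this post.

```ruby
# Hypothetical stand-in for the post's Outline: an object wrapping an attribute hash.
class Outline
  attr_reader :attributes

  def initialize
    @attributes = {}
  end
end

class Subscription < Outline
  # Attribute names are assumptions taken from the driver's usage.
  %w[etag lastModified lastGet].each do |name|
    # Reader: returns nil when the attribute has never been set.
    define_method(name) { @attributes.fetch(name, nil) }
    # Writer: stores the value in the same hash the outline already carries.
    define_method("#{name}=") { |value| @attributes[name] = value }
  end
end

sub = Subscription.new
sub.etag = '"abc123"'
puts sub.etag
puts sub.lastModified.nil?
```

Keeping the values in the attribute hash, rather than in separate instance variables, presumably means they travel with the outline whenever it is written back out.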
SubscriptionList is even easier:
class SubscriptionList < OPML
  ...
  def readFrom(aSource)
    super(aSource,Subscription)
  end
This will ensure that when we do
subList = SubscriptionList.new
subList.readFrom(File.new("somecontrolfile.xml"))
all instances in the subscription list will be of class Subscription.
A few unit tests and the addition of an iterator in OPML, and we're done.
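The iterator itself is not shown here, so the following is only a guess at its shape: a depth-first walk that yields every outline, including nested ones. It assumes each outline keeps its children in a children array and that OPML holds the top-level outlines in outlines; both are hypothetical details.

```ruby
# Hypothetical minimal Outline with nested children.
class Outline
  attr_reader :attributes, :children

  def initialize(attributes = {})
    @attributes = attributes
    @children = []
  end
end

# Hypothetical minimal OPML container holding the top-level outlines.
class OPML
  attr_reader :outlines

  def initialize
    @outlines = []
  end

  # Yield every outline depth-first; return an Enumerator when no block is given.
  def each_outline(&block)
    return enum_for(:each_outline) unless block

    walk(@outlines, &block)
  end

  private

  # Visit a node first, then recurse into its children.
  def walk(nodes, &block)
    nodes.each do |node|
      yield node
      walk(node.children, &block)
    end
  end
end

opml = OPML.new
group = Outline.new('text' => 'News')
group.children << Outline.new('xmlUrl' => 'http://example.com/a.xml')
opml.outlines << group
opml.each_outline { |o| puts(o.attributes['text'] || o.attributes['xmlUrl']) }
```

Returning an Enumerator when no block is passed lets callers chain map or select onto each_outline, which is handy in unit tests.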
Let's bring this together with a driver program. First, some top matter and initialization:
opmlFile = ARGV[0] || "myChannels.opml"
# read the control file
subscriptionList = NewsHeap::SubscriptionList.new()
subscriptionList.readFrom(File.new("./control/control.xml",File::CREAT|File::RDONLY))
# read the opml
opml = OPML.new
opml.readFrom(File.new(opmlFile))
parser = RssParser::new()
fetcher = HttpGetter::new()
The preceding code opens the control file at ./control/control.xml, creating it if it is not found. Next, we loop over every entry in the OPML file, checking the SubscriptionList for a matching entry.
# for every outline
begin
  opml.each_outline { | outline |
    url = outline.attributes["xmlUrl"]
    # check control file
    subscription = getSubscription(subscriptionList,url)
    result = {}
    etag = subscription.etag
    lastModified = subscription.lastModified
    data = fetcher.readData(url,result,etag,lastModified,NewsHeap::Control::AGENT)
    if (newData?(result,subscription))
      result = parser.parse(data,result)
    end
    printIt(result)
    # update the control entry
    subscription.etag = result['etag']
    subscription.lastModified = result['modified']
    subscription.lastGet = Time.now()
  }
rescue => bang
  print_backtrace(bang)
end
# update control data
subscriptionList.persist("./control/control.xml")
We grab the etag and lastModified attributes from the subscription and pass them to our HTTP fetcher. The server may answer with a 304 Not Modified if we've already got current content. The fetcher always updates the etag and last-modified values in the result, so we can check whether a new entry is present. If so, we pass the data to the parser. We then update the control entries and, when we're all done, persist the control file for use in the next go-around.
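The fetcher class is not listed in this post, so here is a rough sketch of what a conditional GET could look like with Ruby's standard Net::HTTP. The method names build_conditional_get and fetch_if_changed, the result keys other than 'etag' and 'modified', and the default agent string are all assumptions made for illustration.

```ruby
require 'net/http'
require 'uri'

# Build a GET request carrying the validators from the previous fetch.
# Headers are only added when we actually have a value to send.
def build_conditional_get(url, etag, last_modified, agent)
  request = Net::HTTP::Get.new(URI(url))
  request['User-Agent'] = agent
  request['If-None-Match'] = etag if etag
  request['If-Modified-Since'] = last_modified if last_modified
  request
end

# Perform the request and return a result hash (hypothetical shape).
# On a 304 the body is nil and the old validators are carried over.
def fetch_if_changed(url, etag: nil, last_modified: nil, agent: 'ExampleReader/0.1')
  uri = URI(url)
  request = build_conditional_get(url, etag, last_modified, agent)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
  {
    'status' => response.code.to_i,
    'etag' => response['ETag'] || etag,
    'modified' => response['Last-Modified'] || last_modified,
    'body' => response.is_a?(Net::HTTPNotModified) ? nil : response.body
  }
end

# Offline demonstration: inspect the headers without touching the network.
req = build_conditional_get('http://example.com/rss.xml', '"abc123"', nil, 'ExampleReader/0.1')
puts req['If-None-Match']
puts req['If-Modified-Since'].inspect
```

Splitting request construction from the network call keeps the header logic testable without a live server.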
One final enhancement makes this actually usable:
...
etag = lastModified = nil if !cached?(url) # <=== Check the cache for this URL - don't modify headers if no cache hit
data = fetcher.readData(url,result,etag,lastModified,NewsHeap::Control::AGENT)
# update and cache results
cache(data,url) if data # <=== cache the data so if we get a 304, we still have content to display
if (newData?(result,subscription))
  result = parser.parse(data,result)
end
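The cached? and cache helpers are not shown, so the following is one possible shape for them: a small on-disk cache keyed by a hash of the feed URL. The FeedCache class, its read method and the directory layout are hypothetical, chosen only to illustrate the idea.

```ruby
require 'digest/sha1'
require 'fileutils'
require 'tmpdir'

# Hypothetical on-disk cache: one file per feed, named by the SHA1 of its URL.
class FeedCache
  def initialize(dir)
    @dir = dir
    FileUtils.mkdir_p(@dir)
  end

  # True when content for this URL has been stored before.
  def cached?(url)
    File.exist?(path_for(url))
  end

  # Store the raw feed data so a later 304 still has content to show.
  def cache(data, url)
    File.write(path_for(url), data)
  end

  # Return the stored data, or nil when there is no cache entry.
  def read(url)
    cached?(url) ? File.read(path_for(url)) : nil
  end

  private

  def path_for(url)
    File.join(@dir, Digest::SHA1.hexdigest(url))
  end
end

Dir.mktmpdir do |dir|
  feed_cache = FeedCache.new(dir)
  url = 'http://example.com/rss.xml'
  etag = '"abc123"'
  # No cache entry yet, so drop the validator and force a full fetch.
  etag = nil unless feed_cache.cached?(url)
  feed_cache.cache('<rss/>', url)
  puts etag.inspect
  puts feed_cache.cached?(url)
end
```

Hashing the URL avoids having to escape slashes and query strings in file names, at the cost of cache files that are not human-readable.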
We have a control file to store details of our subscriptions and support for bandwidth friendly behavior.
What we're missing is per-channel update frequency and some behavior to rearrange the control file to reflect hierarchies and groups.
In the next round, we'll examine some GUI options, make a decision and implement the basic 3-pane viewer.
The existing code still needs to be cleaned up - camelCase vs canonical_ruby, organization etc can all be better. I'm adding unit tests as we go, so we can layer in some good feelings.