manicwave Surf the wave

Take Control

The test driver snippet from the last entry hints at the core processing loop that we need for NewsHeap: iterate over all channels, retrieve their content, parse the result. Rinse, Lather, Repeat.

This post begins to layer some behavior on top of our current classes, pushing NewsHeap towards useful.

A feed reader should be able to

  • Maintain a list of subscriptions
  • Routinely query each subscription to see if updated content is present
  • Maintain a record of when the content was last checked
  • Provide a per-channel update frequency

The OPML file serves as the list of subscriptions. We can parse it and we can update it. Alas, we can't insert or move outlines around yet, but that will come.

To keep track of when channels were last updated, etags (if present), update frequency and so on, we need to introduce a Subscription. We also need something to manage a list of Subscriptions: a SubscriptionList.

One design issue to wrestle with is the representation of Subscription organization. Foreshadowing our UI requirement to organize the subscription list hierarchically, do we replicate the hierarchy between the subscription list and the OPML? Or do we eliminate the OPML altogether and make it available only as an import/export format?

Having gone back and forth on this, it strikes me that we can borrow the OPML parser and create a control file format by extending OPML and Outline.

A simple change to the OPML SAX handler lets us set the class used for new outline instances.

def initialize(handlerClass = Outline)
  ...
  @handlerClass = handlerClass
end

def start_element(uri, localname, qname, attributes)
  if (!@opml)                     # ensure that first element is 'OPML'
    ...
  elsif (localname == 'outline')
    o = @handlerClass.new()       # was: Outline.new()

We make a corresponding change to OPML.readFrom so that we can control the type of Outline.
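
To make the idea concrete, here is a minimal sketch of a handler that takes the outline class as a constructor argument. The Outline and Subscription classes are stripped-down stand-ins, OutlineHandler and outline_class are illustrative names rather than the actual NewsHeap code, and the callback is invoked by hand instead of by a real SAX parser.

```ruby
# Stand-in for the real Outline class; only the attribute hash is modelled.
class Outline
  attr_reader :attributes

  def initialize
    @attributes = {}
  end
end

# Stand-in subclass, so we can see which class the handler instantiates.
class Subscription < Outline; end

# Hypothetical handler: outline_class plays the role of handlerClass above.
class OutlineHandler
  attr_reader :outlines

  def initialize(outline_class = Outline)
    @outline_class = outline_class
    @outlines = []
  end

  # Same shape as a SAX start_element callback; only 'outline' is handled.
  def start_element(_uri, localname, _qname, attributes)
    return unless localname == 'outline'

    outline = @outline_class.new
    outline.attributes.update(attributes)
    @outlines << outline
  end
end

handler = OutlineHandler.new(Subscription)
handler.start_element(nil, 'outline', 'outline',
                      { 'xmlUrl' => 'http://example.com/rss.xml' })
puts handler.outlines.first.class   # prints "Subscription"
```

Passing the class itself, rather than a flag or a name, keeps the handler ignorant of which subclasses exist.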

We can now create some subclasses to hide these details from users of the classes. We need to create Subscription and SubscriptionList.

class Subscription < Outline
  def initialize()
    @etag = nil
    @lastModified = nil
    @xmlURL = nil
    @lastGet = nil
    super()
  end

  # etag is stored in the outline's attribute hash
  def etag=(anEtag)
    @attributes['etag'] = anEtag
  end

  def etag()
    @attributes.fetch('etag', nil)
  end
  ...
end

You will note that we harden the interface for the Subscription class, adding accessors for elements such as etag.
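
Since every accessor follows the same pattern, the pairs could be generated rather than typed out. The following is only a sketch: attribute_accessor is a hypothetical helper, Outline is a minimal stand-in, and the list of attribute names is inferred from what the driver further down uses.

```ruby
# Stand-in for the real Outline class; only the attribute hash is modelled.
class Outline
  attr_reader :attributes

  def initialize
    @attributes = {}
  end
end

class Subscription < Outline
  # Hypothetical helper: defines a reader/writer pair backed by @attributes.
  def self.attribute_accessor(*names)
    names.each do |name|
      key = name.to_s
      define_method(name) { @attributes.fetch(key, nil) }
      define_method("#{name}=") { |value| @attributes[key] = value }
    end
  end

  attribute_accessor :etag, :lastModified, :lastGet
end

sub = Subscription.new
sub.etag = '"abc123"'
sub.lastModified = 'Tue, 01 Jun 2004 12:00:00 GMT'
puts sub.etag            # prints "abc123" including the double quotes
puts sub.lastGet.inspect # prints nil, since nothing was stored yet
```

Because the values live in the attribute hash, they are written out with the rest of the outline when the control file is saved.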

SubscriptionList is even easier:

class SubscriptionList < OPML
  ...

  def readFrom(aSource)
    super(aSource, Subscription)
  end
end

This will ensure that when we do

subList = SubscriptionList.new
subList.readFrom(File.new("somecontrolfile.xml"))

all instances in the subscription list will be of class Subscription.
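
Here is a rough sketch of how the two readFrom methods might cooperate. To keep it free of XML parsing, the source is an array of attribute hashes rather than a file; that substitution, the outlineClass parameter name and the outlines reader are all assumptions made for this illustration.

```ruby
# Stand-ins for the real classes.
class Outline
  attr_reader :attributes

  def initialize
    @attributes = {}
  end
end

class Subscription < Outline; end

class OPML
  attr_reader :outlines

  def initialize
    @outlines = []
  end

  # aSource is an array of attribute hashes standing in for parsed XML.
  def readFrom(aSource, outlineClass = Outline)
    @outlines = aSource.map do |attrs|
      outline = outlineClass.new
      outline.attributes.update(attrs)
      outline
    end
    self
  end
end

class SubscriptionList < OPML
  # Callers never mention Subscription; the subclass supplies it.
  def readFrom(aSource)
    super(aSource, Subscription)
  end
end

subList = SubscriptionList.new
subList.readFrom([{ 'xmlUrl' => 'http://example.com/a.xml' },
                  { 'xmlUrl' => 'http://example.com/b.xml' }])
puts subList.outlines.map(&:class).uniq.inspect   # prints [Subscription]
```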

A few unit tests and the addition of an iterator in OPML, and we're done.
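
The iterator is not shown in this post. One way it might look, assuming each outline keeps its nested outlines in a children array (an assumption made for this sketch, along with the stand-in classes), is a depth-first walk:

```ruby
# Stand-in outline with an attribute hash and nested children.
class Outline
  attr_reader :attributes, :children

  def initialize(attributes = {})
    @attributes = attributes
    @children = []
  end
end

class OPML
  attr_reader :outlines

  def initialize
    @outlines = []
  end

  # Yield every outline depth-first; without a block, return an Enumerator.
  def each_outline(&block)
    return enum_for(:each_outline) unless block

    walk(@outlines, &block)
    self
  end

  private

  def walk(nodes, &block)
    nodes.each do |node|
      block.call(node)
      walk(node.children, &block)
    end
  end
end

opml = OPML.new
group = Outline.new('text' => 'News')
group.children << Outline.new('xmlUrl' => 'http://example.com/a.xml')
opml.outlines << group
opml.outlines << Outline.new('xmlUrl' => 'http://example.com/b.xml')

urls = opml.each_outline.map { |o| o.attributes['xmlUrl'] }.compact
puts urls.inspect   # prints ["http://example.com/a.xml", "http://example.com/b.xml"]
```

Note that grouping outlines have no xmlUrl, so a caller that visits every outline needs to skip the ones without a feed URL.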

Driver

Let's bring this together with a driver program. First, some top matter and initialization:

opmlFile = ARGV[0] || "myChannels.opml"

# read the control file
subscriptionList = NewsHeap::SubscriptionList.new()
subscriptionList.readFrom(File.new("./control/control.xml", File::CREAT|File::RDONLY))

# read the opml
opml = OPML.new
opml.readFrom(File.new(opmlFile))
parser = RssParser.new()
fetcher = HttpGetter.new()

The preceding code opens the control file at ./control/control.xml, creating it if it is not found. Next, we loop over every entry in the OPML file, checking the SubscriptionList for a matching entry.

# for every outline
begin
  opml.each_outline { |outline|
    url = outline.attributes["xmlUrl"]
    # check control file
    subscription = getSubscription(subscriptionList, url)

    result = {}
    etag = subscription.etag
    lastModified = subscription.lastModified
    data = fetcher.readData(url, result, etag, lastModified, NewsHeap::Control::AGENT)

    if (newData?(result, subscription))
      result = parser.parse(data, result)
    end

    printIt(result)
    # update the control entry
    subscription.etag = result['etag']
    subscription.lastModified = result['modified']
    subscription.lastGet = Time.now()
  }
rescue => bang
  print_backtrace(bang)
end
# update control data
subscriptionList.persist("./control/control.xml")
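
The driver calls getSubscription without showing it. Since the loop uses the returned subscription straight away, a plausible reading is "find the entry for this URL, or create an empty one". The sketch below assumes exactly that; the Struct-based Subscription, the find_or_create method and the hash index are stand-ins invented for the example, not the NewsHeap implementation.

```ruby
# Stand-in subscription holding only the fields the driver touches.
Subscription = Struct.new(:xmlUrl, :etag, :lastModified, :lastGet)

# Stand-in list that indexes subscriptions by feed URL.
class SubscriptionList
  def initialize
    @by_url = {}
  end

  # Return the existing entry for url, or register a fresh, empty one.
  def find_or_create(url)
    @by_url[url] ||= Subscription.new(url)
  end

  def size
    @by_url.size
  end
end

# Hypothetical version of the helper the driver calls.
def getSubscription(subscriptionList, url)
  subscriptionList.find_or_create(url)
end

list = SubscriptionList.new
first = getSubscription(list, 'http://example.com/rss.xml')
first.etag = '"abc123"'
again = getSubscription(list, 'http://example.com/rss.xml')
puts again.etag   # prints "abc123" including the double quotes
puts list.size    # prints 1
```

A brand new subscription has no etag or lastModified, so the first fetch for that channel goes out as an unconditional request.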

We grab the etag and lastModified attributes from the subscription and pass them to the fetcher (HttpGetter). The server may answer with a 304 Not Modified if we already have current content. The fetcher always records the etag and last-modified values in result, so we can check whether new content is present. If so, we pass the data to the parser. We then update the control entry and, when we're all done, persist the control file for use in the next go-around.
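
For readers who have not seen a conditional GET before, this is roughly what the fetcher has to do, sketched with Ruby's standard Net::HTTP. HttpGetter itself is not shown in this post, so build_conditional_get and read_data are hypothetical stand-ins, and the agent string is made up. Only the request-building half is exercised here, so the example runs without network access.

```ruby
require 'net/http'
require 'uri'

# Build a GET request carrying whatever validators we stored last time.
def build_conditional_get(uri, etag, last_modified, agent)
  request = Net::HTTP::Get.new(uri)
  request['User-Agent'] = agent
  request['If-None-Match'] = etag if etag
  request['If-Modified-Since'] = last_modified if last_modified
  request
end

# Hypothetical stand-in for readData: returns the body, or nil on a 304.
def read_data(url, result, etag, last_modified, agent)
  uri = URI.parse(url)
  request = build_conditional_get(uri, etag, last_modified, agent)
  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
  # Keep the old validators when the server does not send new ones.
  result['etag'] = response['ETag'] || etag
  result['modified'] = response['Last-Modified'] || last_modified
  response.is_a?(Net::HTTPNotModified) ? nil : response.body
end

# No network here: just show the headers a repeat visit would send.
request = build_conditional_get(URI.parse('http://example.com/rss.xml'),
                                '"abc123"',
                                'Tue, 01 Jun 2004 12:00:00 GMT',
                                'NewsHeap-sketch')
puts request['If-None-Match']
puts request['If-Modified-Since']
```

When neither validator is supplied, the request is an ordinary GET and the server sends the full feed.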

One final enhancement makes this actually usable:

...
# new: check the cache for this URL; don't send conditional headers if there is no cache hit
etag = lastModified = nil if !cached?(url)
data = fetcher.readData(url, result, etag, lastModified, NewsHeap::Control::AGENT)

# new: cache the data so that if we get a 304, we still have content to display
cache(data, url) if data

if (newData?(result, subscription))
  result = parser.parse(data, result)
end
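
The cached? and cache helpers are not shown. A minimal file-based sketch, assuming one file per feed named after a SHA-1 digest of the URL, could look like the following. The FeedCache class, its method names and the temporary directory are all choices made for this example.

```ruby
require 'digest/sha1'
require 'fileutils'
require 'tmpdir'

# Hypothetical on-disk cache: one file per feed URL.
class FeedCache
  def initialize(dir)
    @dir = dir
    FileUtils.mkdir_p(@dir)
  end

  # True if we have stored content for this URL before.
  def cached?(url)
    File.exist?(path_for(url))
  end

  # Save the raw feed data for later 304 responses.
  def store(data, url)
    File.binwrite(path_for(url), data)
  end

  # Return the stored data, or nil when nothing is cached.
  def fetch(url)
    cached?(url) ? File.binread(path_for(url)) : nil
  end

  private

  # Hash the URL so it is safe to use as a file name.
  def path_for(url)
    File.join(@dir, Digest::SHA1.hexdigest(url))
  end
end

cache = FeedCache.new(Dir.mktmpdir('newsheap-cache'))
url = 'http://example.com/rss.xml'
puts cache.cached?(url)   # prints false
cache.store('<rss/>', url)
puts cache.cached?(url)   # prints true
puts cache.fetch(url)     # prints <rss/>
```

On a 304 the driver would then read the stored copy back instead of parsing fresh data.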

Summary

We have a control file to store details of our subscriptions and support for bandwidth-friendly behavior.

What we're missing is per-channel update frequency and some behavior to rearrange the control file to reflect hierarchies and groups.

In the next round, we'll examine some GUI options, make a decision and implement the basic 3-pane viewer.

The existing code still needs to be cleaned up - camelCase vs canonical_ruby, organization etc can all be better. I'm adding unit tests as we go, so we can layer in some good feelings.
