manicwave

Surf the wave

Got OPML?

RSS is in the bag (more or less). Now we need to get a list of feeds, consume them, and keep track of what we've seen and when.

Regardless of what Amr E. Malik thinks of OPML, it seems to be the default format for storing feed lists in RSS aggregators. See OPML Loader for some interesting stuff. The following contrived example of OPML includes a list of sites organized as a hierarchy.

<?xml version="1.0"?>
<!-- OPML pooped by NewsHeap -->
<opml version="1.1">
  <head>
     <title>mySubscriptions</title>
  </head>
  <body>
    <outline text="Sites of interest">
      <outline text="manicwave" description="Surfing the Wave" title="manicwave" type="rss" version="RSS" 
                      htmlUrl="http://manicwave.com/blog" xmlUrl="http://manicwave.com/blog/rss.xml"/>
      <outline text="manicwave3" description="Surfing the Wave" title="manicwave" type="rss" version="RSS" 
                      htmlUrl="http://manicwave.com/blog" xmlUrl="http://manicwave.com/blog/rss.xml">
         <outline text="manicwave3.1" description="Surfing the Wave" title="manicwave" type="rss" version="RSS" 
                         htmlUrl="http://manicwave.com/blog" xmlUrl="http://manicwave.com/blog/rss.xml"/>
         <outline text="manicwave3.2" description="Surfing the Wave" title="manicwave" type="rss" version="RSS" 
                         htmlUrl="http://manicwave.com/blog" xmlUrl="http://manicwave.com/blog/rss.xml"/>
      </outline>
    </outline>
    <outline text="manicwave4" description="Surfing the Wave" title="manicwave" type="rss" version="RSS" 
                    htmlUrl="http://manicwave.com/blog" xmlUrl="http://manicwave.com/blog/rss.xml"/>    
  </body>
</opml> 

OPML is simply a format. We need a basic class that can consume and produce OPML. A simple internal representation would be an attributes hash holding the elements from the head section, plus the outline items as an array of Outline instances, each of which can contain a list of children.

Typical usage would be

  opml = OPML.new
  opml.readFrom(source)   # source is a String or an IO
  opml.roots.each { |outline|
      print "Has children\n" if outline.has_children?
      print "No children\n" unless outline.has_children?
  }

To access the attributes of each outline, use outline.attributes['attrName'].
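A class with the interface described above might be sketched roughly as follows. This is not the code from the download: the names SimpleOPML and SimpleOutline are hypothetical, the parsing leans on REXML (an assumption, since the post does not say which XML library is used), and only the consuming half is shown.

```ruby
require 'rexml/document'

# Hypothetical sketch of one outline node; not the code from the download.
class SimpleOutline
  attr_reader :attributes, :children

  def initialize(element)
    # Copy the XML attributes into a plain Hash (name => value).
    @attributes = {}
    element.attributes.each { |name, value| @attributes[name] = value }
    # Each nested <outline> becomes a child node, recursively.
    @children = element.get_elements('outline').map { |child| SimpleOutline.new(child) }
  end

  def has_children?
    !@children.empty?
  end
end

# Hypothetical sketch of the document wrapper (read-only; producing OPML is omitted).
class SimpleOPML
  attr_reader :attributes, :roots

  def initialize
    @attributes = {}
    @roots = []
  end

  # source may be a String or an IO, as in the usage shown above.
  def readFrom(source)
    root = REXML::Document.new(source).root
    @attributes = {}
    # Elements under <head> become entries in the attributes hash.
    root.get_elements('head/*').each { |el| @attributes[el.name] = el.text }
    # Top-level outlines under <body> become the roots.
    @roots = root.get_elements('body/outline').map { |el| SimpleOutline.new(el) }
    self
  end
end

sample = <<~XML
  <?xml version="1.0"?>
  <opml version="1.1">
    <head><title>mySubscriptions</title></head>
    <body>
      <outline text="Sites of interest">
        <outline text="manicwave" type="rss" xmlUrl="http://manicwave.com/blog/rss.xml"/>
      </outline>
      <outline text="manicwave4" type="rss" xmlUrl="http://manicwave.com/blog/rss.xml"/>
    </body>
  </opml>
XML

opml = SimpleOPML.new.readFrom(sample)
opml.roots.each do |outline|
  label = outline.has_children? ? 'has children' : 'no children'
  puts "#{outline.attributes['text']}: #{label}"
end
```

Keeping each outline's attributes in a plain Hash means unknown or aggregator-specific attributes survive a read without the class needing to know about them.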

OPML is done. We could imagine combining the OPML code and the RSS code like this:

require 'rss-parser'
require 'OPML'
require 'pp'
rssParser = RssParser.new
fetcher = HttpGetter.new
opml = OPML.new
opml.readFrom(File.new('test.opml'))  # <== change this path as appropriate
opml.roots.each { |outline|
  next unless url = outline.attributes['xmlUrl']
  data = fetcher.readData(url)  
  result = rssParser.parse(data)
  pp result
}
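The loop above, as written, appears to visit only the top-level outlines, so feeds nested inside a folder such as "Sites of interest" would be skipped because the folder itself has no xmlUrl. A minimal sketch of a depth-first walk that collects every xmlUrl is shown below. FeedNode is a hypothetical stand-in for the Outline class (anything with an attributes hash and a children list would do), feed_urls is an illustrative name, and the example.com URLs are placeholders.

```ruby
# Hypothetical stand-in for an outline: an attribute Hash plus child outlines.
FeedNode = Struct.new(:attributes, :children)

# Depth-first walk: returns every xmlUrl in the tree, parents before children.
def feed_urls(outlines)
  outlines.flat_map do |outline|
    own = outline.attributes['xmlUrl']
    # Folders have no xmlUrl, so drop the nil they contribute.
    ([own] + feed_urls(outline.children)).compact
  end
end

tree = [
  FeedNode.new({ 'text' => 'Sites of interest' }, [
    FeedNode.new({ 'xmlUrl' => 'http://example.com/a.xml' }, []),
    FeedNode.new({ 'xmlUrl' => 'http://example.com/b.xml' }, [
      FeedNode.new({ 'xmlUrl' => 'http://example.com/c.xml' }, [])
    ])
  ]),
  FeedNode.new({ 'xmlUrl' => 'http://example.com/d.xml' }, [])
]

urls = feed_urls(tree)
urls.each { |url| puts url }
```

Since the same feed can appear under more than one folder, calling uniq on the result before fetching would avoid downloading it twice.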

Download is here

In the next installment, we'll add a control file to keep track of which feeds we've read, when we read them, and any associated ETags. Then we can start caching content and doing net-friendly updates.