In the current Social Networking Site (SNS) boom, usability is becoming increasingly important. People have accounts on many different websites, and registering for yet another one gets more and more tiring. This is one of the main reasons why Confabio.com doesn't require you to sign up and log in. It's also one of the main reasons why websites like Wakoopa.com make their registration as painless as possible. A colleague of mine was even experimenting with the idea of omitting the username/email requirement altogether. Meanwhile, OpenID is still too young, and Sun's Liberty Alliance is just too corporate and slow.

But for most social networking sites it's pretty simple: they just need people to enter information. So let's make that as easy for the user as possible.

Entering Syndication Feeds

For one of my projects I have to let users enter information about themselves so they can build up their own profile. What I really like about some of the new sites is that they aggregate your blog's contents and your Flickr pictures.

One such website is the Tokyo-based social networking site Asooboo.com. After signing up you can enter your blog feed and Flickr username, and it will keep track of all your stories and pictures. I think that's really cool, and it's one of the first steps in making the web more ubiquitous. You can later change your feed URL under 'edit my profile'.

Entering Links instead of Feeds

Entering feeds is nice, but for users who are not tech-savvy, terms like 'Feed', 'RSS' and 'Atom' might raise question marks. Therefore I think it would be nice if users didn't have to worry about feeds at all and could instead just enter their links, like:

My Websites and Profiles:

  • http://blog.dominiek.com/
  • http://www.flickr.com/photos/dominiekterheide/
  • http://del.icio.us/dominiekth

It would then show a fancy spinner and convert them to 'My Blog', 'My Pictures' and 'My Links'. All content would be automatically aggregated if any RSS or Atom feeds can be detected on those pages.
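The labelling step could be sketched roughly like this. The host table and the label_for helper are assumptions made up for illustration; only the three labels and the example links come from the text above, and anything unrecognized simply falls back to 'My Blog'.

```ruby
require 'uri'

# Hypothetical host-to-label table; extend it with whatever sites you support.
LABELS_BY_HOST = {
  'www.flickr.com' => 'My Pictures',
  'del.icio.us'    => 'My Links'
}.freeze

# Pick a profile label for a link the user entered.
# Unknown hosts are assumed to be the user's blog.
def label_for(link)
  host = URI.parse(link).host
  LABELS_BY_HOST.fetch(host, 'My Blog')
end

[
  'http://blog.dominiek.com/',
  'http://www.flickr.com/photos/dominiekterheide/',
  'http://del.icio.us/dominiekth'
].each { |link| puts "#{label_for(link)}: #{link}" }
```

A real version would probably decide based on what the detected feed contains rather than on the host name alone.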

Detecting RSS feeds

When you use a proper browser like Mozilla Firefox, you will see a syndication icon every time you visit a website that has RSS feeds.

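It does this by reading the <link> tags in the page's HTML head that advertise a feed through their type attribute, such as application/rss+xml or application/atom+xml.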
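To make that concrete, here is a minimal sketch of what such tags look like and how they can be picked out of a page. The sample markup and the advertised_feeds helper are made up for illustration, and the regular expressions are a rough approximation, not a real HTML parser.

```ruby
# Hypothetical page head; the markup is made up for illustration.
SAMPLE_HTML = <<~HTML
  <html>
   <head>
    <link href="/style.css" rel="stylesheet" type="text/css" />
    <link href="/feed/atom.xml" rel="alternate" type="application/atom+xml" />
    <link rel="alternate" type="application/rss+xml" href="/feed/rss.xml" />
   </head>
  </html>
HTML

FEED_TYPES = ['application/atom+xml', 'application/rss+xml'].freeze

# Return [type, href] pairs for every <link> tag that advertises a feed.
def advertised_feeds(html)
  html.scan(/<link\b[^>]*>/i).map { |tag|
    # collect the tag's quoted attributes into a hash, e.g. {"href" => "/feed/atom.xml"}
    attrs = Hash[tag.scan(/([\w-]+)\s*=\s*["']([^"']*)["']/).map { |k, v| [k.downcase, v] }]
    [attrs['type'], attrs['href']] if FEED_TYPES.include?(attrs['type'])
  }.compact
end

advertised_feeds(SAMPLE_HTML).each { |type, href| puts "#{type} -> #{href}" }
```

The stylesheet link is skipped because its type is not a feed type; the order of the href and type attributes inside a tag does not matter.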

After a quick search I couldn't find any code to do this in my own project, so I wrote a little piece of code for it, along with a Ruby on Rails integration test.

You can use it like this:

 FeedDetector.fetch_feed_url('http://blog.dominiek.com/')
 => "http://blog.dominiek.com/feed/atom.xml"
 FeedDetector.fetch_feed_url('http://blog.dominiek.com/feed/atom.xml')
 => "http://blog.dominiek.com/feed/atom.xml"
 FeedDetector.fetch_feed_url('http://www.flickr.com/photos/dominiekterheide/', :rss)
 => "http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&lang=en-us&format=rss_200"
 # alternatively you can parse HTML with FeedDetector.get_feed_path(html_data)
 # see integration test for more examples

FeedDetector + Test

Excuse my quick-and-dirty code. The FeedDetector (lib/feed_detector.rb):


require 'net/http'

class FeedDetector

  ##
  # return the feed url for a url
  # for example: http://blog.dominiek.com/ => http://blog.dominiek.com/feed/atom.xml
  # only_detect can force detection of :rss or :atom
  def self.fetch_feed_url(page_url, only_detect=nil)
    url = URI.parse(page_url)
    # build a new string; appending to url.host with << would modify the URI itself
    host_with_port = url.port == 80 ? url.host : "#{url.host}:#{url.port}"
    # request_uri is never empty (it falls back to "/") and keeps the query string
    req = Net::HTTP::Get.new(url.request_uri)
    res = Net::HTTP.start(url.host, url.port) {|http|
      http.request(req)
    }
    feed_url = self.get_feed_path(res.body, only_detect)
    # turn a path like /feed/atom.xml into a full URL; leave absolute URLs alone
    feed_url = "http://#{host_with_port}/#{feed_url.gsub(/^\//, '')}" unless !feed_url || feed_url =~ /^https?:\/\//
    # no feed link found: assume the page itself is the feed
    feed_url || page_url
  end

  ##
  # get the feed href from an HTML document
  # for example:
  # ...
  # <link href="/feed/atom.xml" rel="alternate" type="application/atom+xml" />
  # ...
  # => /feed/atom.xml
  # only_detect can force detection of :rss or :atom
  def self.get_feed_path(html, only_detect=nil)
    unless only_detect && only_detect != :atom
      md ||= /<link.*href=['"]*([^\s'"]+)['"]*.*application\/atom\+xml.*>/.match(html) 
      md ||= /<link.*application\/atom\+xml.*href=['"]*([^\s'"]+)['"]*.*>/.match(html) 
    end
    unless only_detect && only_detect != :rss
      md ||= /<link.*href=['"]*([^\s'"]+)['"]*.*application\/rss\+xml.*>/.match(html) 
      md ||= /<link.*application\/rss\+xml.*href=['"]*([^\s'"]+)['"]*.*>/.match(html) 
    end
    md && md[1]
  end

end
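The string concatenation in fetch_feed_url covers hrefs that start with a slash. As a possible alternative, Ruby's standard URI.join resolves any relative or absolute href against the page URL and keeps non-default ports. A minimal sketch; resolve_feed_href is a hypothetical helper name and not part of FeedDetector:

```ruby
require 'uri'

# Resolve a feed href found in a page against that page's URL.
# Hypothetical helper, shown only to illustrate URI.join.
def resolve_feed_href(page_url, href)
  URI.join(page_url, href).to_s
end

# root-relative href
puts resolve_feed_href('http://blog.dominiek.com/', '/feed/atom.xml')
# non-default port is preserved
puts resolve_feed_href('http://blog.dominiek.com:8000/', '/feed/atom.xml')
# an absolute href is returned unchanged
puts resolve_feed_href('http://blog.dominiek.com/', 'http://digigen.nl/feed/')
```

The upside is that hrefs relative to the current directory also resolve correctly; the downside is that URI.join raises an error on malformed hrefs, so a caller would need to rescue that.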

The integration test (test/integration/feed_detector_test.rb):


require "#{File.dirname(__FILE__)}/../test_helper"


class FeedDetectorTest < ActionController::IntegrationTest

  def test_fetch_feed_url
    return # comment out this line to test HTTP fetching

    # test mephisto
    feed_url = FeedDetector.fetch_feed_url('http://blog.dominiek.com/')
    assert_equal('http://blog.dominiek.com/feed/atom.xml', feed_url)
    # test wordpress
    feed_url = FeedDetector.fetch_feed_url('http://digigen.nl/')
    assert_equal('http://digigen.nl/feed/', feed_url)

    # test non conventional port
    feed_url = FeedDetector.fetch_feed_url('http://blog.dominiek.com:8000/')
    assert_equal('http://blog.dominiek.com:8000/feed/atom.xml', feed_url)

    # test only_detect rss/atom on flickr
    feed_url = FeedDetector.fetch_feed_url('http://www.flickr.com/photos/dominiekterheide/', :atom)
    assert_equal('http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&amp;lang=en-us&format=atom', feed_url)
    feed_url = FeedDetector.fetch_feed_url('http://www.flickr.com/photos/dominiekterheide/', :rss)
    assert_equal('http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&amp;lang=en-us&format=rss_200', feed_url)

    # make sure that feeds return themselves
    feed_url = FeedDetector.fetch_feed_url('http://blog.dominiek.com/feed/atom.xml')
    assert_equal('http://blog.dominiek.com/feed/atom.xml', feed_url)
    feed_url = FeedDetector.fetch_feed_url('http://digigen.nl/feed/')
    assert_equal('http://digigen.nl/feed/', feed_url)
  end

  def test_get_feed_path
    body = []
    body << ' <html>'
    body << '  <head>'
    body << '   <link href="/super.css" rel="alternate" type="text/css"/>'
    body << '   <link href="/feed/atom.xml" rel="alternate" type="application/atom+xml" />'
    body << '  </head>'
    body << ' </html>'

    # Mephisto
    feed_path = FeedDetector.get_feed_path(body.join("\n"))
    assert_equal('/feed/atom.xml', feed_path)
    body[3] = '   <link href=\'/feed/atom.xml\' rel="alternate" type="application/atom+xml" />'
    feed_path = FeedDetector.get_feed_path(body.join("\n"))
    assert_equal('/feed/atom.xml', feed_path)

    # FlickR
    body[3] = '<link rel="alternate" type="application/atom+xml" title="Flickr: Photos from dominiekth Atom feed" href="http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&amp;lang=en-us&format=atom">'
    feed_path = FeedDetector.get_feed_path(body.join("\n"))
    assert_equal('http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&amp;lang=en-us&format=atom', feed_path)
    body[4] = '<link rel="alternate"   type="application/rss+xml" title="Flickr: Photos from dominiekth RSS feed" href="http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&amp;lang=en-us&format=rss_200">'
    feed_path = FeedDetector.get_feed_path(body.join("\n"))
    assert_equal('http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&amp;lang=en-us&format=atom', feed_path)
    feed_path = FeedDetector.get_feed_path(body.join("\n"), :rss)
    assert_equal('http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&amp;lang=en-us&format=rss_200', feed_path)

    # Wordpress
    body[3] = '<link rel="alternate" type="application/rss+xml" title="Digigen RSS Feed" href="http://digigen.nl/feed/" />'
    body[4] = ' </head>'
    feed_path = FeedDetector.get_feed_path(body.join("\n"), :atom)
    assert_nil(feed_path)
    feed_path = FeedDetector.get_feed_path(body.join("\n"), :rss)
    assert_equal('http://digigen.nl/feed/', feed_path)
  end

end
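Note that the Flickr values expected in the test still contain &amp;, because get_feed_path returns the href exactly as it appears in the HTML. Before actually fetching such a feed you would probably want to decode the entities first. A small sketch using CGI.unescapeHTML from Ruby's standard library; this step is my suggestion and is not something FeedDetector does:

```ruby
require 'cgi'

# The href exactly as get_feed_path returns it (taken from the test above).
raw_href = 'http://api.flickr.com/services/feeds/photos_public.gne?id=71386598@N00&amp;lang=en-us&format=atom'

# Decode HTML entities such as &amp; before using the value as a URL.
feed_url = CGI.unescapeHTML(raw_href)
puts feed_url
```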

I hope this is useful to some people. Enjoy!