Making Content Smarter Using the Web's Collective Knowledge

on December 12, 2006

English is not my native language, so my vocabulary is relatively small. I often read material on the Internet and sometimes have to look up terms. If I didn’t look up these new words, I would never learn them.

Secondly, I often know the meaning of a concept but would like to know more about it. Looking everything up takes a while, which interrupts your train of thought while reading.

This is why I started working on a little project to embed the knowledge of the Web in the reading process. I’m developing it in a pragmatic way using RubyOnRails. The main objectives are:

  • Providing a simple piece of code that can make any piece of HTML content smarter.
  • Using Natural Language Processing (NLP) to pick out the important words.
  • Using Princeton’s WordNet to explain basic concepts.

I’ve just finished these objectives, which required writing a little HTML parser and an interface to the Python NLP toolkit (which is far superior to Ruby’s).

[Screenshots: NLP, WordNet]
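
To give a rough idea of what "making a piece of HTML smarter" could look like, here is a minimal sketch in plain Python. It is not the project's actual code. The GLOSSARY dictionary is a hand-written stand-in for a WordNet lookup, the STOPWORDS set is a crude stand-in for real NLP-based word selection, and the names Annotator and annotate are made up for this illustration.

```python
from html import escape
from html.parser import HTMLParser
import re

# Assumption: a tiny hand-written glossary standing in for WordNet.
GLOSSARY = {
    "parser": "a program that analyses the structure of its input",
    "vocabulary": "the set of words a person knows",
}
# Assumption: a stopword list standing in for NLP-based word selection.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "is"}

WORD = re.compile(r"[A-Za-z]+")


class Annotator(HTMLParser):
    """Re-emit HTML, wrapping known words in <abbr title="...">."""

    def __init__(self, glossary):
        super().__init__(convert_charrefs=True)
        self.glossary = glossary
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Copy the tag through exactly as it appeared in the input.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        pos = 0
        for match in WORD.finditer(data):
            # Text between words is re-escaped and passed through.
            self.out.append(escape(data[pos:match.start()], quote=False))
            word = match.group()
            key = word.lower()
            definition = None if key in STOPWORDS else self.glossary.get(key)
            if definition:
                self.out.append(
                    f'<abbr title="{escape(definition)}">{word}</abbr>'
                )
            else:
                self.out.append(word)
            pos = match.end()
        self.out.append(escape(data[pos:], quote=False))


def annotate(html, glossary=GLOSSARY):
    """Return the HTML with glossary words annotated."""
    parser = Annotator(glossary)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(annotate("<p>A little <b>parser</b> &amp; more</p>"))
```

Only text nodes are touched, so tags and attributes pass through unchanged. A real version would also need to skip the contents of script and style elements, and would swap the dictionary for an actual WordNet query.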

At the moment I’m refactoring the code so it can be distributed as a RubyOnRails plugin. For the near future the following features are on my TODO list:

  • Interfacing with the Wikipedia encyclopaedia.
  • Detecting concepts like ‘Software Engineer’ rather than detecting ‘Software’ and ‘Engineer’.
  • Looking for alternative forms of user/reading interaction.
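
One simple way the multi-word concept detection on this list might be approached is a greedy longest match against a list of known phrases. The sketch below assumes such a phrase list already exists (for example, taken from WordNet entries or Wikipedia titles). The function name detect_concepts and the max_len parameter are invented for this illustration.

```python
def detect_concepts(tokens, phrases, max_len=3):
    """Greedy longest match: prefer 'software engineer' over its parts.

    tokens  -- list of words in reading order
    phrases -- set of known multi-word concepts, lowercased (assumed given)
    max_len -- longest phrase length, in words, to try
    """
    concepts = []
    i = 0
    while i < len(tokens):
        # Try the longest window first, then shrink down to a single word.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n]).lower()
            if n == 1 or candidate in phrases:
                concepts.append(candidate)
                i += n
                break
    return concepts


known = {"software engineer"}
print(detect_concepts("She is a Software Engineer".split(), known))
```

Single words always fall through as their own concept, so the output can still be fed to a per-word lookup afterwards.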

Current code: laboratoire_nuage-101206.tar.gz (or browse)

Requires: Python-NLTK, Ruby-Linguistics, Ruby-WordNet and RubyOnRails
