Making Content Smarter, using the Web's Collective Knowledge.
My English vocabulary is relatively small since it’s not my native language. I often read material on the Internet, and sometimes I have to look up a term. If I don’t look up these new words, I will never learn them.
Secondly, it often happens that I know the meaning of a concept but would like to know more about it. Looking everything up takes a while, and it disrupts the mental model you build while reading.
This is why I started working on a little project to embed the knowledge of the Web in the reading process. I’m developing it in a pragmatic way using RubyOnRails. The main objectives are:
- Providing the content with a simple piece of code that can make any piece of HTML smarter.
- Using Natural Language Processing (NLP) to pick out the important words.
- Using Princeton’s WordNet to explain basic concepts.
I’ve just finished these objectives, which required writing a little HTML parser and an interface to the Python NLP toolkit (which is far superior to Ruby’s).
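To make the first two objectives concrete, here is a rough Python sketch, not the project's actual code. It walks an HTML fragment with the standard library's `html.parser` and wraps "important" words in a span that a front end could attach explanations to. The `is_important` heuristic, the `STOP_WORDS` list, and the `smart-term` class name are all assumptions made for illustration; the real project delegates word selection to the Python NLP toolkit.

```python
import re
from html.parser import HTMLParser

# Hypothetical stand-in for real NLP: a tiny stop-word list.
STOP_WORDS = {"the", "and", "with", "this", "that", "from", "have", "will"}
WORD_RE = re.compile(r"[A-Za-z]+")
RAW_TEXT_TAGS = {"script", "style"}  # never annotate inside these


def is_important(word):
    """Crude importance test (an assumption, not the project's NLP step)."""
    return len(word) > 3 and word.lower() not in STOP_WORDS


class SmartAnnotator(HTMLParser):
    """Re-emit HTML unchanged, except that important words get a span."""

    def __init__(self):
        # Keep entities such as &amp; as they are instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.raw_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in RAW_TEXT_TAGS:
            self.raw_depth += 1
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in RAW_TEXT_TAGS and self.raw_depth:
            self.raw_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_data(self, data):
        if self.raw_depth:
            self.out.append(data)  # script/style content passes through
            return

        def wrap(match):
            word = match.group(0)
            if is_important(word):
                return f'<span class="smart-term">{word}</span>'
            return word

        self.out.append(WORD_RE.sub(wrap, data))


def annotate(html):
    """Return the HTML fragment with important words wrapped in spans."""
    parser = SmartAnnotator()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(annotate("<p>The parser is <b>clever</b> &amp; small.</p>"))
```

Only text nodes are touched, so tags, attributes and entities come out as they went in. Swapping `is_important` for a part-of-speech based filter would bring this closer to the NLP step described above.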
| NLP screenshot | Wordnet screenshot |
At the moment I’m refactoring the code so it can be distributed as a RubyOnRails plugin. For the near future the following features are on my TODO list:
- Interfacing with the Wikipedia encyclopaedia.
- Detecting concepts like ‘Software Engineer’ rather than detecting ‘Software’ and ‘Engineer’.
- Looking for alternative forms of user/reading interaction.
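One possible way to approach the multi-word concept item is a greedy longest-match against a list of known phrases. The sketch below is only an illustration of that idea, not something the project implements yet; `detect_concepts`, `known_concepts` and `max_len` are hypothetical names, and in practice the phrase list might come from a dictionary or encyclopaedia index.

```python
def detect_concepts(tokens, known_concepts, max_len=3):
    """Group tokens into the longest known multi-word concepts.

    known_concepts is a set of lower-case phrases (hypothetical data source).
    """
    result = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, down to two words.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase.lower() in known_concepts:
                result.append(phrase)
                i += n
                break
        else:
            # No multi-word match starts here: keep the single word.
            result.append(tokens[i])
            i += 1
    return result


if __name__ == "__main__":
    words = "She is a Software Engineer at work".split()
    print(detect_concepts(words, {"software engineer"}))
```

Trying longer phrases first is what keeps ‘Software Engineer’ together instead of splitting it into two separate terms.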
Current code: laboratoire_nuage-101206.tar.gz (or browse)
Requires: Python-NLTK, Ruby-Linguistics, Ruby-WordNet and RubyOnRails