Dominiek.com
Dominiek ter Heide
about 1 year ago

Making Content Smarter, using the Web's Collective Knowledge.

My English vocabulary is relatively small, since English is not my native language. I often read material on the Internet and sometimes have to look up terms. If I didn’t look up these new words, I would never learn them.

Secondly, it often happens that I know the meaning of a concept but would like to know more about it. Looking everything up takes a while, which disrupts your mental model while reading.

This is why I started working on a little project to embed the knowledge of the Web in the reading process. I’m developing it in a pragmatic way using RubyOnRails. The main objectives are:

  • Providing content with a simple piece of code that can make any piece of HTML smarter.
  • Using Natural Language Processing (NLP) to pick out the important words.
  • Using Princeton’s WordNet to explain basic concepts.

I’ve just finished these objectives, which required writing a little HTML parser and an interface to the Python NLP toolkit (which is far superior to Ruby’s).

[Screenshots: NLP, WordNet]
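To make the objectives above a bit more concrete, here is a minimal sketch in Python of what "making a piece of HTML smarter" could look like. It is not the project's actual code: `STOPWORDS`, `GLOSSARY`, `is_candidate` and `make_smarter` are hypothetical names, the stopword filter is only a crude stand-in for real NLP, and the tiny glossary stands in for WordNet. It also assumes the input is a simple HTML fragment without scripts, styles or comments.

```python
from html import escape
from html.parser import HTMLParser
import re

# Hypothetical stand-ins: a real version would ask an NLP tagger which words
# matter and look the definitions up in WordNet.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "to", "in", "that", "it"}
GLOSSARY = {
    "parser": "a program that analyses the structure of its input",
    "ontology": "a formal description of concepts and their relations",
}
WORD_RE = re.compile(r"[A-Za-z]+")


def is_candidate(word):
    """Crude importance test: not a stopword and known to the glossary."""
    key = word.lower()
    return key not in STOPWORDS and key in GLOSSARY


class SmartContent(HTMLParser):
    """Re-emit HTML, wrapping explainable words in <span title="...">."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Copy the tag exactly as written, so attributes stay untouched.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Only text between tags is annotated; everything else is re-escaped.
        pos = 0
        for match in WORD_RE.finditer(data):
            self.out.append(escape(data[pos:match.start()], quote=False))
            word = match.group()
            if is_candidate(word):
                title = escape(GLOSSARY[word.lower()])
                word = f'<span class="smart" title="{title}">{word}</span>'
            self.out.append(word)
            pos = match.end()
        self.out.append(escape(data[pos:], quote=False))


def make_smarter(html):
    """Return the HTML fragment with tooltips added to known words."""
    parser = SmartContent()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(make_smarter("<p>The parser is small.</p>"))
```

Walking the document with a parser instead of running a regular expression over the raw markup means that words inside tag names and attributes (such as `href="parser.html"`) are never touched.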

At the moment I’m refactoring the code so it can be distributed as a RubyOnRails plugin. The following features are on my TODO list for the near future:

  • Interfacing with the Wikipedia encyclopaedia.
  • Detecting concepts like ‘Software Engineer’ rather than detecting ‘Software’ and ‘Engineer’ separately.
  • Looking for alternative forms of user/reading interaction.
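One possible way to approach the multi-word concept item on this list is a greedy longest-match against a list of known phrases. The sketch below (in Python) only illustrates that idea and is not something already in the project: `KNOWN_CONCEPTS` and `detect_concepts` are hypothetical, and a real phrase list would have to come from somewhere like WordNet collocations or Wikipedia article titles.

```python
import re

# Hypothetical phrase list, stored as tuples of lowercase tokens.
KNOWN_CONCEPTS = {
    ("software", "engineer"),
    ("natural", "language", "processing"),
    ("software",),
    ("engineer",),
}
MAX_LEN = max(len(concept) for concept in KNOWN_CONCEPTS)


def detect_concepts(text):
    """Return known concepts found in text, preferring the longest match."""
    tokens = re.findall(r"[A-Za-z]+", text)
    lowered = [token.lower() for token in tokens]
    found = []
    i = 0
    while i < len(tokens):
        # Try the widest window first so a phrase wins over its single words.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in KNOWN_CONCEPTS:
                found.append(" ".join(tokens[i:i + n]))
                i += n  # skip past the whole phrase
                break
        else:
            i += 1  # no known concept starts at this token
    return found


print(detect_concepts("She is a Software Engineer who likes software."))
```

Because the widest window is tried first, ‘Software Engineer’ is reported as one concept, while a lone ‘software’ later in the sentence is still picked up on its own.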

Current code: laboratoire_nuage-101206.tar.gz (or browse)

Requires: Python-NLTK, Ruby-Linguistics, Ruby-WordNet and RubyOnRails


About

I'm a 22-year-old Web Developer, currently engaged in RubyOnRails consulting. Apart from obsessing about many technology-related topics, I enjoy traveling and international life. In the coming months I intend to step up my entrepreneurial activities, so stay tuned!

Creative Commons License

All content on this blog is available under the Creative Commons Attribution 3.0 License. Dominiek.com is running Kakuteru, a new Semantic-Web enabled lifestreamer. Design and interaction inspired by Yonfook's Sweetcron. Most icons used are by Joseph North.