tagging

Calais Module for Drupal Released

One of the hottest subjects in Web technologies these days is finding effective ways to exploit the collective intelligence of the masses. Almost everybody has heard of the so-called Web 2.0 and of the hundreds of different definitions of what it is or is not. In simple terms, Web 2.0 is a phenomenon characterized by vastly increased direct participation of the user community in content authoring, mostly through blogs and the discussions around them. Web 2.0 has brought us to a state where more and better content is freely available online than ever before.

There is a major problem with collective intelligence, though: pieces of information are often dispersed. The further we move from the early days of the Internet as a static data-publishing platform towards the Internet as an aggregator of intelligence, the more modern search engines fall short of providing adequate results. Current technologies are often unable to put information in context and help us connect the dots. That is why there is increasing demand for tools that can extract context from content and aggregate different data sources in a meaningful way.

One such tool that has been in the spotlight lately is the Calais Web Service, released by the news giant Reuters.

"The Calais web service allows you to automatically annotate your content with rich semantic metadata, including Entities like People and Companies and Events & Facts like Acquisitions and Management Changes." -- opencalais.com

What makes the Calais web service exceptional, and puts it above other free term-extraction services (like the one from Yahoo!), is that Calais provides context for the terms it extracts. For instance, when the Calais web service analyzes a piece of content and finds "George Bush", it not only extracts and returns it as a term (keyword) relevant to the text, but also tells you that George Bush is a Person. Likewise, it will tell you that United States is a Country. This may seem trivial, but if you put the added information (the entity type) to good use, you can build much more intelligent systems than you could with other, flat term-extraction tools.
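To illustrate why the entity type matters, here is a small sketch. The array shape and the function name `group_terms_by_type` are hypothetical and chosen only for illustration; this is not the actual Calais response format.

```php
<?php
// Hypothetical, simplified data. This is NOT the real Calais response format.
// A flat term extractor would only give us a list of keywords:
$flat_terms = ['George Bush', 'United States'];

// A typed extractor also tells us what kind of entity each term is:
$typed_terms = [
    ['name' => 'George Bush',   'type' => 'Person'],
    ['name' => 'United States', 'type' => 'Country'],
];

/**
 * Group term names by their entity type.
 * With flat terms this grouping is impossible, because the type is unknown.
 */
function group_terms_by_type(array $terms): array
{
    $grouped = [];
    foreach ($terms as $term) {
        $grouped[$term['type']][] = $term['name'];
    }
    return $grouped;
}

$by_type = group_terms_by_type($typed_terms);
// $by_type['Person'] now holds only people, $by_type['Country'] only countries.
```

With the types in hand, a site could, for example, list all people mentioned in a post separately from all countries, which a flat keyword list cannot support.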

Calais is a free Web Service. You can plug it into your applications and/or content management systems and use it without any charge. Frank and I spent a lot of our time last month integrating it into the Drupal CMS, and we are glad to announce that it is now available for both Drupal 5 and Drupal 6. It is also the first integration of the Calais API with a major content management system.

You can download the Calais integration module from:
http://drupal.org/project/opencalais
You can also watch a short screencast that Frank recorded to demo the main features of the module: http://calais.phase2technology.com/content/calais-demo-screencast

The screencast was recorded before the code was finalized, so the module can actually do more than you see in the screencast. You are encouraged to download and test-drive it.

And last, but not least, we would like to express our gratitude to our friends at the Calais team, for their invaluable help and support.

Del.icio.us Tag Filtering in PHP/Drupal

Filtering content through URI tagging was initially popularized by Del.icio.us and is now a common way to quickly navigate content. For instance, in a system that supports this type of navigation, you can construct a URL:

http://example.com/tag/drupal+news,rss

and get a vertical view of the domain data. In this markup, "+" stands for logical AND, whereas "," stands for logical OR.

Following is a PHP code snippet that processes a "delicioused" query string into a logical expression (you can modify the code slightly to get an SQL WHERE clause instead). In addition to the original del.icio.us syntax, it allows tags with spaces in them. You just need to enclose those in single or double quotation marks. For instance, the following expression is valid: http://example.com/tag/drupal+news,rss,'Steve Jobs'. Note: using quotation marks inside the tag names is still invalid syntax!

And following is the code sample:
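A minimal sketch of such a parser is shown here as an illustrative reconstruction, not the original snippet. The function names `delicious_parse_tags` and `delicious_to_expression` are hypothetical, and the sketch assumes that "+" (AND) binds tighter than "," (OR), which the description above does not spell out.

```php
<?php
/**
 * Hypothetical helper: split a del.icio.us-style tag string into an
 * OR-list of AND-groups. Assumption: "+" (AND) binds tighter than "," (OR).
 * Tags wrapped in single or double quotes may contain spaces.
 */
function delicious_parse_tags(string $input): array
{
    $groups = [[]];   // each inner array is one AND-group
    $tag = '';        // tag currently being read
    $quote = null;    // the open quote character, if any
    $length = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];
        if ($quote !== null) {
            // Inside quotes: everything is literal until the closing quote.
            if ($char === $quote) {
                $quote = null;
            } else {
                $tag .= $char;
            }
        } elseif ($char === "'" || $char === '"') {
            $quote = $char;
        } elseif ($char === '+' || $char === ',') {
            // An operator ends the current tag.
            if (trim($tag) !== '') {
                $groups[count($groups) - 1][] = trim($tag);
            }
            $tag = '';
            if ($char === ',') {
                $groups[] = [];   // OR starts a new AND-group
            }
        } else {
            $tag .= $char;
        }
    }
    if (trim($tag) !== '') {
        $groups[count($groups) - 1][] = trim($tag);
    }
    // Drop empty groups (for example, from a trailing comma).
    return array_values(array_filter($groups));
}

/** Hypothetical helper: render the parsed groups as a readable expression. */
function delicious_to_expression(array $groups): string
{
    $parts = [];
    foreach ($groups as $group) {
        $quoted = array_map(function ($tag) {
            return '"' . $tag . '"';
        }, $group);
        $parts[] = '(' . implode(' AND ', $quoted) . ')';
    }
    return implode(' OR ', $parts);
}

echo delicious_to_expression(delicious_parse_tags("drupal+news,rss,'Steve Jobs'")), "\n";
// prints: ("drupal" AND "news") OR ("rss") OR ("Steve Jobs")
```

To build an SQL WHERE clause instead, the same groups could be mapped to placeholders in a prepared statement rather than concatenated into the query text, since tag names come from the URL and are untrusted input.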

reCaptcha and Save a Book

The Web these days is full of spam-bots, malicious crawlers and other "e-pests" that post junk advertising all over the Net. These parasites cause the biggest headaches for blogs and social sites, where moderators want to open up comments to a general audience ("Two-Way Communication", eh?), but have no desire to promote illegal Viagra sales.

Captcha, or Completely Automated Public Turing test to tell Computers and Humans Apart, is one of the most effective tools for fighting evil machines. Unfortunately, not all Captchas are equal. Many Captcha systems are vulnerable and have been broken, which renders them useless against all but the most primitive spamming.

reCAPTCHA is a service from Carnegie Mellon University. This service is a prime example of blending the pleasant with the useful in a very Web 2.0-ish way. reCAPTCHA provides free, high-quality protection and at the same time helps digitize old books. Every time you use reCAPTCHA, you help digitize one word of a book that was written before the digital age. How much cooler can it get? Well, reCAPTCHA also provides enhanced accessibility through audio Captchas. That one is not easy to "code" yourself, and I don't know of any other free service that offers it.

Great job, guys!
