Delicious API Integration

Tuesday, July 1, 2008

Posted by James Ellis / Filed under Code, Web

Last week we announced the relaunch of the Thinking for a Living website. Today a bit about how it works.

Earlier this year Duane King (of BBDK) and I started discussing possibilities for a new TFAL website. What had started out as a brief booklet was evolving into a larger concept: an open-source design education resource. Duane wanted to grow a database of print and online resources, and find a way to publish this information effectively online. I recommended using Delicious, the Yahoo-owned online bookmarking service, as a back-end.

Delicious allows you to maintain a single repository of bookmarks, with all data hosted on Delicious servers instead of your local machine. The obvious benefit of this system is that your bookmarks are accessible from any computer on the web.

It’s easy to push new bookmarks to the system: You can use the Delicious Firefox extension or browser bookmarklets to quickly bookmark items as you browse. Each bookmark may include information such as title and description. And rather than organize bookmarks by placing them in specific folders, each Delicious bookmark can be associated with any number of tags.

Most important for our purposes, and where things get interesting: Delicious offers an API that allows external applications to communicate with the service. By tapping into the API, we can access the TFAL Delicious account bookmarks, and (re)publish this data/content on the TFAL website in whatever fashion we like.

This is interesting as most people view Delicious as a personal internet utility, not a content management system. It’s also interesting that we can take our bookmark data/content out of the highly normalized Delicious environment and integrate it within something of our own design.

It could be said that we’re simply reformatting information; however, I think there is more to it than that. The important point is that we’re able to involve our content in a larger context and give it greater meaning. And by presenting it in a format of our own design, we can make the content that much more uniquely ours. Given the standardization of the modern web, we think a little specialness goes a long way.

Working with the API

Athletics has been working with the Zend Framework for some time now. Given the investment we’ve made in our existing systems/infrastructure, ZF’s plug-n-play approach to framework integration suits us well. ZF should be accessible to anyone with a decent grasp of PHP.

ZF provides dozens of modules designed to address a variety of different tasks. For this project we made use of Zend_Service_Delicious, a module designed specifically for interfacing with the Delicious API.

An example of the code required to connect to the Delicious API and query for all posts:

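  <?php
  // Load the Delicious service class (assumes the Zend Framework library is on the include path)
  require_once 'Zend/Service/Delicious.php';

  // Connect to the Delicious service
  $delicious = new Zend_Service_Delicious('username', 'password');

  // Query for all posts
  $all_posts = $delicious->getAllPosts();

  // Loop through all posts
  foreach ($all_posts as $post) {
      // Method calls must be wrapped in curly braces to be interpolated inside double quotes
      echo "Title: {$post->getTitle()}\n";
      echo "Url: {$post->getUrl()}\n";
  }
  ?>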

It doesn’t get much easier than that. ZF abstracts all the complicated stuff; you never have to bother about XML/JSON/etc.

Caching

Delicious is strict about how often you hit the API, as they can’t have external apps slamming the service and draining bandwidth. From their API help page:

  • Please wait AT LEAST ONE SECOND between queries, or you are likely to get automatically throttled.
  • Please watch for 503 errors and back-off appropriately.

This means you must implement some sort of server-side caching to prevent your site from making too many calls to the Delicious API. With TFAL, we store the result of $delicious->getAllPosts() for about two hours. We do this using another handy ZF module: Zend_Cache.

In the following example we combine Zend_Service_Delicious with Zend_Cache to store the Delicious data for two hours:

  1. <?php
  2. $frontendOptions = array(
  3.     'lifetime' => 7200, // cache lifetime (60 * 60 * 2 = 2 hrs)
  4.     'automatic_serialization' => true
  5. );
  6.  
  7. $backendOptions = array(
  8.     'cache_dir' => 'path_to_cache_dir' // Directory where we put the cache files
  9. );
  10.  
  11. // Instantiate a Zend_Cache_Core object
  12. $cache = Zend_Cache::factory('Core', 'File', $frontendOptions, $backendOptions);
  13.  
  14. // Can we load the cache?
  15. if (!$delicious_data = $cache->load('AllPosts')) {
  16.  
  17.     try {
  18.         // Connect to the Delicious service
  19.         $delicious = new Zend_Service_Delicious('username', 'password');
  20.  
  21.         $delicious_data = $delicious->getAllPosts($tag = null);
  22.  
  23.         // Save the results to the cache
  24.         $cache->save($delicious_data, 'AllPosts');
  25.  
  26.     } catch (Zend_Service_Delicious_Exception $e) {
  27.         // If connection to Delicious failed, load old cache regardless of expiration
  28.         $delicious_data = $cache->load('AllPosts', $doNotTestCacheValidity = true);
  29.     }
  30. }
  31. ?>

In the code above we initialize a Zend_Cache object on lines 1-12. On line 15 we attempt to load the cached version of the Delicious data. If the cache has expired we continue to lines 17-29 where we first try to connect to the Delicious API and request all posts (lines 19-24). If this fails (the server couldn’t connect to the Delicious service for any reason), we catch the exception (line 26) and force load the last cached version of the Delicious data, regardless of whether it has passed its 2-hour expiration date.

We found this sort of caching essential as Delicious doesn’t hesitate to block your requests if you get too greedy. (We were blocked a couple times during development.)
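
The throttling rules quoted earlier (wait at least one second between queries, back off on 503 errors) could also be handled with a small retry helper around whatever code makes the request. What follows is only a minimal sketch in plain PHP, not code from the TFAL site: callWithBackoff(), its parameters, and the use of a RuntimeException with code 503 to signal throttling are all assumptions made for illustration.

```php
<?php
/**
 * Call $request, retrying with a doubling delay while it signals a 503.
 * Hypothetical helper: a RuntimeException with code 503 stands in for
 * whatever error your HTTP client raises when the service throttles you.
 */
function callWithBackoff(callable $request, $maxAttempts = 4, $baseDelay = 1, $sleeper = 'sleep')
{
    if ($maxAttempts < 1) {
        throw new InvalidArgumentException('maxAttempts must be at least 1');
    }

    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        try {
            return $request();
        } catch (RuntimeException $e) {
            // Give up on anything other than a 503, or when out of attempts
            if ($e->getCode() !== 503 || $attempt === $maxAttempts) {
                throw $e;
            }
            // Wait 1s, 2s, 4s, ... (whole seconds) before the next attempt
            $delaySeconds = $baseDelay * (1 << ($attempt - 1));
            call_user_func($sleeper, $delaySeconds);
        }
    }
}

// Usage with a stand-in request that is throttled twice before succeeding
$calls = 0;
$waits = array();
$result = callWithBackoff(
    function () use (&$calls) {
        $calls++;
        if ($calls < 3) {
            throw new RuntimeException('Service Unavailable', 503);
        }
        return 'posts';
    },
    4,
    1,
    function ($seconds) use (&$waits) {
        $waits[] = $seconds; // record the delay instead of actually sleeping
    }
);
```

The sleeper is passed in as an argument so the demo (and any tests) can record delays rather than actually wait; in real use the default sleep() applies.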

Additional caching

Storing the result of $delicious->getAllPosts() is great and will definitely keep you from being booted off the Delicious API, but once you start dealing with large data sets (with 1200+ items, TFAL certainly qualifies), additional page-level caching can speed things up significantly. For a high-traffic page like the Home page, we store the system-rendered HTML output for 10 minutes instead of repeatedly asking the system to walk all 1200+ items just to determine the latest 3 resources. This makes a lot of sense given that the $delicious->getAllPosts() data set only updates every 2 hours.

We have page-level caching functionality built into our general purpose web framework. The implementation is very similar to that of WP-Cache. However, I should mention that Zend_Cache can also be used for page-level output caching.
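
To give a rough idea of what page-level caching involves, here is a minimal sketch in plain PHP using output buffering. It is a simplified illustration, not the code from our framework or from WP-Cache: renderWithPageCache(), the cache file location, and the stand-in render callback are all hypothetical.

```php
<?php
/**
 * Return the cached HTML for a page if it is fresh enough; otherwise
 * render the page, store the output, and return it.
 * Hypothetical helper: the name, arguments, and file layout are illustrative.
 */
function renderWithPageCache($cacheFile, $lifetime, callable $render)
{
    // Serve the stored copy while it is younger than $lifetime seconds
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $lifetime) {
        return file_get_contents($cacheFile);
    }

    // Capture everything the page echoes while rendering
    ob_start();
    $render();
    $html = ob_get_clean();

    // Write to a temp file first so readers never see a half-written page
    $tmpFile = $cacheFile . '.' . uniqid('', true) . '.tmp';
    file_put_contents($tmpFile, $html);
    rename($tmpFile, $cacheFile);

    return $html;
}

// Usage: cache a stand-in Home page for 10 minutes (600 seconds)
$cacheFile = sys_get_temp_dir() . '/home-page-cache-demo.html';
if (is_file($cacheFile)) {
    unlink($cacheFile); // start the demo from an empty cache
}

$renders = 0;
$renderHome = function () use (&$renders) {
    $renders++; // count how often the expensive render actually runs
    echo '<h1>Latest resources</h1>';
};

$first  = renderWithPageCache($cacheFile, 600, $renderHome); // renders and stores
$second = renderWithPageCache($cacheFile, 600, $renderHome); // served from the cache file
```

The second call never invokes the render callback, which is the whole point: the expensive work happens at most once per cache lifetime.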

The numbers:

Rendering the Home page without page caching requires 0.2 seconds and 6.24MB of memory. (This is, of course, using the cached result of $delicious->getAllPosts(). Making the full trip to the Delicious API takes much longer, usually a second or two.)

0.2 seconds might seem fast, but it’s actually a bit excessive. By using a full page cache we can get this down to 0.01 seconds and 1.83MB of memory. Big improvement. This additional caching is particularly important on the day your site gets linked from a heavily trafficked website and an avalanche of traffic rolls in.
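
Figures like these can be gathered with PHP’s built-in microtime() and memory_get_peak_usage(). The sketch below shows one way to do it; measureRender() and the stand-in render callback are hypothetical, and this is not necessarily how the numbers above were produced.

```php
<?php
/**
 * Run $render and report how long it took, the peak memory used by the
 * script so far, and the size of the generated output.
 * Hypothetical helper for illustration only.
 */
function measureRender(callable $render)
{
    $start = microtime(true);

    // Buffer the output so the page is captured rather than sent
    ob_start();
    $render();
    $html = ob_get_clean();

    return array(
        'seconds' => microtime(true) - $start,
        'peak_mb' => memory_get_peak_usage() / 1048576, // bytes to MB
        'bytes'   => strlen($html),
    );
}

// Usage with a stand-in page that builds a list of 1200 items
$stats = measureRender(function () {
    for ($i = 1; $i <= 1200; $i++) {
        echo "<li>Resource $i</li>";
    }
});

printf(
    "%.4f seconds, %.2fMB peak, %d bytes of HTML\n",
    $stats['seconds'],
    $stats['peak_mb'],
    $stats['bytes']
);
```

Running the same measurement once with the page cache enabled and once without gives a like-for-like comparison.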

Additional notes:

The odd del.icio.us name/web address has always bothered me a bit. I’m all for waving the nerd flag, but I’ve never found domain hacks to be all that clever. Inevitably in conversation someone ends up saying “dell dot ick-eo dot you-ess”, which is ridiculous. After being acquired by Yahoo they picked up delicious.com, which bounces you to del.icio.us. I understand Yahoo intends to rebrand the service simply “Delicious” at some point. I’m looking forward to that.

Also, this post was an excuse to play with GeSHi, the PHP-based generic syntax highlighter class. It’s really quite handy and supports fifty-eleven languages. We’ll be using this for code highlighting from now on.


Questions/Comments: Contact James via email.