Memcached decorator for Python

Bad English, yet findable by search engines…

After returning from Europython 2011 I was specially excited about two things: Memcached and Python’s decorator syntax. Don’t know why this took so long but finally it was  time to try things out.

Memcached is a memory-based caching system that can be easily installed on various operating systems. Online documentation is abundant (try this) – so let’s focus on it’s purpose. Imagine some expensive function, e.g. a CPU-costly database calculation or some network-bound function. For the sake of simplicity my example just sleeps for 5 secs:

import time

def expensive_function(x):
    time.sleep(5)
    return x

expensive_function('some_input')

The function in my example always returns the same output given an input, thus it is said to be referentially transparent. Hence the expensive function can be replaced with it’s value. (This technique is either called caching or memoization but I don’t really care about the disambiguation.)

The first approach looks like this:

import memcache
import time

def expensive_function(x):
    mc = memcache.Client(['127.0.0.1:11211'], debug=0)
    # calculate a unique key
    key = 'expensive_function_%s' % (x,)    
    # check if key/value pair exists
    value = mc.get(key)
    if value:
        return value
    else:
        time.sleep(5)
        value = x
        # cache result for 5 mins
        mc.set(key, value, 60 * 5)
        return value

expensive_function('some_input')
expensive_function('some_input')

On the first invocation the function now runs for five seconds whereas the second will return immediately. A unique key is needed to store and retrieve the values. Since memcached is not bound to the Python runtime the speedup also works in a second process.

The return value depends on the functions’s input value. So the key must contain (some form of) the input. I will show a better solution below.

By now, it’s possible to cache results of a single function. In a bigger application context we run into two problems. First, we violate the DRY principle because we have to write the code getting and setting key-value pairs over and over again. Second, and more severe, we risk setting the same key for different combinations of functions and input parameters. Since this could lead to harmful errors hard to debug it seems advisable to place the caching logic in a single point of your code.

Thanks to Python’s decorators the solution is easy. (A decorator simply is a function that intercepts execution of another function or method. So every time we call expensive_function a wrapper function will look for cached results and decide whether to invoke as intended or return from the cache immediately.) The function can now be decorated. The code readability is preserved.

#!/usr/bin/env python

import time
import memcache
import hashlib

def memoize(f):

    def newfn(*args, **kwargs):
        mc = memcache.Client(['127.0.0.1:11211'], debug=0)
        # generate md5 out of args and function
        m = hashlib.md5()
        margs = [x.__repr__() for x in args]
        mkwargs = [x.__repr__() for x in kwargs.values()]
        map(m.update, margs + mkwargs)
        m.update(f.__name__)
        m.update(f.__class__.__name__)
        key = m.hexdigest()

        value = mc.get(key)
        if value:
            return value
        else:
            value = f(*args, **kwargs)
            mc.set(key, value, 60)
            return value
        return f(*args)

    return newfn

@memoize
def expensive_function(x):
    time.sleep(5)
    return x

if __name__ == '__main__':
    print expensive_function('abc')
    print expensive_function('abc')

The function name, class name and all parameters passed produce a hash key which will be used as key by memcached. I’m still not happy with the solution because uniqueness of the key is not guaranteed. (And all solutions I can think of revolve around some string concatenation magic…) A glimpse at more sophisticated implementations, e.g. Django’s cache decorator or memorised, work pretty much the same way though.

3 Comments

  • Pingback: Python Unique Method Identifier | PHP Developer Resource

  • September 4, 2012 - 2012-09-04 11:29:37 | Permalink

    Thanks for your decorator! If the value returned from memcache evaluates to false (for example, integer value 0, empty list, boolean False), the function is executed. It would be better to compare value to None. However, if a return value of None would still leave the function re-executed.

  • Pingback: Mathias' Blog » Designing APIs that work

  • Leave a Reply

    Your email address will not be published. Required fields are marked *

    You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>