Bad English, yet findable by search engines…
Memcached is a memory-based caching system that can be easily installed on various operating systems. Online documentation is abundant, so let’s focus on its purpose. Imagine some expensive function, e.g. a CPU-costly database calculation or a network-bound call. For the sake of simplicity my example just sleeps for five seconds:
```python
import time

def expensive_function(x):
    time.sleep(5)
    return x

expensive_function('some_input')
```
The function in my example always returns the same output given an input, thus it is said to be referentially transparent. Hence the expensive function call can be replaced with its value. (This technique is called either caching or memoization, but I don’t really care about the disambiguation.)
The first approach looks like this:
```python
import memcache
import time

def expensive_function(x):
    mc = memcache.Client(['127.0.0.1:11211'], debug=0)
    # calculate a unique key
    key = 'expensive_function_%s' % (x,)
    # check if a key/value pair exists; get() returns None on a miss
    value = mc.get(key)
    if value is not None:
        return value
    else:
        time.sleep(5)
        value = x
        # cache the result for 5 mins
        mc.set(key, value, 60 * 5)
        return value

expensive_function('some_input')
expensive_function('some_input')
```
On the first invocation the function now runs for five seconds whereas the second will return immediately. A unique key is needed to store and retrieve the values. Since memcached is not bound to the Python runtime the speedup also works in a second process.
The return value depends on the function’s input value, so the key must contain (some form of) the input. I will show a better solution below.
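For instance, such a key can be built by hashing the function name together with the repr of every argument (a minimal sketch using only the standard library; `make_key` is a hypothetical helper, and repr is assumed to be stable for the argument types used):

```python
import hashlib

def make_key(func_name, *args):
    m = hashlib.md5()
    m.update(func_name.encode('utf-8'))
    for arg in args:
        # repr() gives a printable, fairly stable form of the argument
        m.update(repr(arg).encode('utf-8'))
    # hexdigest() has fixed length and none of the characters
    # (spaces, control characters) that memcached keys forbid
    return m.hexdigest()

make_key('expensive_function', 'some_input')
```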
By now, it’s possible to cache the results of a single function. In a bigger application context we run into two problems. First, we violate the DRY principle because we have to write the code getting and setting key-value pairs over and over again. Second, and more severe, we risk setting the same key for different combinations of functions and input parameters. Since this could lead to harmful errors that are hard to debug, it seems advisable to place the caching logic in a single point of your code.
Thanks to Python’s decorators the solution is easy. (A decorator is simply a function that intercepts the execution of another function or method. So every time we call expensive_function, a wrapper function will look for cached results and decide whether to invoke the function as intended or to return from the cache immediately.) The function can now be decorated, and its readability is preserved.
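Stripped of memcached, the interception looks like this (a sketch that caches into a local dict instead of a memcached server):

```python
def memoize(f):
    cache = {}
    def newfn(*args):
        if args not in cache:
            cache[args] = f(*args)   # invoke as intended, remember the result
        return cache[args]           # otherwise serve straight from the cache
    return newfn

@memoize
def square(x):
    return x * x

square(4)  # computed
square(4)  # served from the dict
```

Python’s standard library ships this exact pattern as functools.lru_cache.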
```python
#!/usr/bin/env python
import hashlib
import time

import memcache

def memoize(f):
    def newfn(*args, **kwargs):
        mc = memcache.Client(['127.0.0.1:11211'], debug=0)
        # generate an MD5 key out of the function and its arguments
        m = hashlib.md5()
        for arg in args:
            m.update(repr(arg).encode('utf-8'))
        for kwarg in kwargs.values():
            m.update(repr(kwarg).encode('utf-8'))
        m.update(f.__name__.encode('utf-8'))
        m.update(f.__class__.__name__.encode('utf-8'))
        key = m.hexdigest()
        value = mc.get(key)
        if value is not None:
            return value
        else:
            value = f(*args, **kwargs)
            mc.set(key, value, 60)
            return value
    return newfn

@memoize
def expensive_function(x):
    time.sleep(5)
    return x

if __name__ == '__main__':
    print(expensive_function('abc'))
    print(expensive_function('abc'))
```
The function name, the class name and all parameters passed produce a hash which memcached then uses as the key. I’m still not happy with this solution because uniqueness of the key is not guaranteed. (And all solutions I can think of revolve around some string concatenation magic…) A glimpse at more sophisticated implementations, e.g. Django’s cache decorator or memorised, shows they work pretty much the same way though.
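One way to reduce the concatenation magic is to hash a serialised tuple of the call’s identity instead of joined strings, so that argument boundaries survive (a sketch of my own, not taken from Django or memorised; it assumes all arguments are picklable, and pickle output can differ between Python versions, so keys are only stable within one version):

```python
import hashlib
import pickle

def cache_key(f, args, kwargs):
    # the tuple structure keeps ('ab', 'c') and ('a', 'bc') distinct,
    # which plain string concatenation would not
    payload = (f.__module__, f.__qualname__, args, tuple(sorted(kwargs.items())))
    return hashlib.md5(pickle.dumps(payload)).hexdigest()

def f(x, y=None):
    return x

cache_key(f, ('ab', 'c'), {})
cache_key(f, ('a', 'bc'), {})  # a different key, unlike naive concatenation
```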