Unix’ beauty lies in the fact that you can easily build powerful systems out of simple components. This post describes an email-to-database interface for one of our web applications using Python and some Unix tools.
Our users urgently needed a way to annotate database records with arbitrary content. Rather than bending our application into a document management system (which it clearly is not), I came up with the idea of offering an email-to-website service.
Or as my friend Florian put it: “With a few billion users, email is quite a successful social network.”
Mailing to a script
To deliver mail to a program rather than a conventional mailbox, you need to edit the file /etc/aliases:
booking: |/opt/wu-wien/roomsmta/bin/booking.sh
Here I’m sending the mail to an address called booking. Every incoming mail invokes the shell script booking.sh.
If you also want to forward the mail to other locations, add further mailboxes or addresses separated by commas. Don’t forget to run newaliases after saving the file.
Shell wrapper
#!/bin/sh
PYTHON_EGG_CACHE=/opt/wu-wien/roomsmta/cache
export PYTHON_EGG_CACHE
exec /opt/wu-wien/roomsmta/bin/booking.py
The incoming mail is passed to a shell script which sets an environment variable needed by the virtualenv. The Unix command exec then replaces the shell with the Python script, which inherits standard input and therefore receives the mail. The script runs under the virtualenv’s own Python interpreter, set up with its very own environment.
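To illustrate what the wrapper does, here is a rough Python sketch of the same hand-over. It assumes the two paths from the shell script; the helper names wrapper_env and hand_over are made up for this example and are not part of the original setup.

```python
import os

EGG_CACHE = "/opt/wu-wien/roomsmta/cache"        # path taken from the shell wrapper
TARGET = "/opt/wu-wien/roomsmta/bin/booking.py"  # script the wrapper hands over to


def wrapper_env(base_env, egg_cache=EGG_CACHE):
    """Return a copy of base_env with PYTHON_EGG_CACHE set (hypothetical helper)."""
    env = dict(base_env)
    env["PYTHON_EGG_CACHE"] = egg_cache
    return env


def hand_over(target=TARGET):
    """Replace the current process with target, like the shell's exec.

    The new program keeps the same file descriptors, so whatever the
    MTA piped into standard input is still there for it to read.
    """
    os.execve(target, [target], wrapper_env(os.environ))
```

Calling hand_over() never returns on success, which is why the shell version needs no further lines after exec.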
The Python script
My script reads the email from STDIN, does some parsing magic and eventually puts the content into a database. (For reasons of clarity I have omitted the latter part in my example.) Since my mail configuration saves the mail in a proper mailbox as well I do not store attachments in the database. If you’re interested in parsing attachments I recommend Ian Lewis’ post.
#!/opt/wu-wien/roomsmta/v_mtapy/bin/python
import email.FeedParser
from email.utils import parseaddr
import sys, os
import base64
import re
import cx_Oracle
import nltk  # only needed for clean_html(), which exists in NLTK 2.x


def parse_email(email_input):
    """Return message object"""
    parser = email.FeedParser.FeedParser()
    for msg_line in email_input:
        # feed() returns nothing; the message object comes from close()
        parser.feed(msg_line)
    return parser.close()


class Email(object):
    def __init__(self, msg):
        self.msg = msg
        self.subject = msg['Subject']
        self.date = msg['Date']
        (self.from_name, self.from_email) = parseaddr(msg['From'])
        # parseaddr() returns (name, address); keep only the address
        self.to_email = parseaddr(msg['To'])[1]
        self.has_attachments = 0
        msgbody = []
        for part in msg.walk():
            if part.get_content_type() == 'text/plain':
                if part['Content-Transfer-Encoding'] == 'base64':
                    body = unicode(base64.b64decode(part.get_payload()), part.get_content_charset(), 'replace')
                else:
                    try:
                        body = unicode(part.get_payload(), part.get_content_charset(), 'replace')
                    except TypeError:
                        # if no charset is given
                        body = unicode(part.get_payload())
                msgbody.append(body)
            elif part.get_content_type() == 'text/html':
                html = unicode(part.get_payload(), part.get_content_charset(), 'replace')
                # strip the markup and keep the plain text
                raw = nltk.clean_html(html)
                msgbody.append(raw)
                #print raw.encode('utf8', 'replace')
            elif part.get("Content-Disposition"):
                #print part.get("Content-Disposition")
                # only note the attachment's name; the file itself is not stored
                self.has_attachments = 1
                msgbody.append(u'ATTACHMENT: %s' % (part.get_filename(),))
        self.body = u'\n'.join(msgbody)

    def save(self):
        # database code omitted in this example
        print 'GOING on to save'


if __name__ == '__main__':
    email_input = sys.stdin.readlines()
    msg = parse_email(email_input)
    Email(msg).save()
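The script above is written for Python 2. For readers on Python 3, the same idea might look roughly like the sketch below. It is not the code we run in production: it uses the standard library's newer email API (policy.default), and the function names read_message and summarize are made up for this example.

```python
from email import message_from_binary_file, policy


def read_message(stream):
    """Parse a raw message from a binary stream such as sys.stdin.buffer."""
    return message_from_binary_file(stream, policy=policy.default)


def summarize(msg):
    """Collect sender, subject, plain-text body and attachment names."""
    # get_body() picks the text/plain part and get_content() decodes
    # the transfer encoding and charset for us.
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    # As in the original script, only the names of attachments are kept.
    attachments = [part.get_filename() for part in msg.iter_attachments()]
    return {
        "from": str(msg.get("From", "")),
        "subject": str(msg.get("Subject", "")),
        "body": body,
        "attachments": attachments,
    }
```

Inside the mail-handling script this would be called as summarize(read_message(sys.stdin.buffer)). Reading the binary stream leaves the charset handling to the email package instead of guessing it by hand.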
Testing the setup
Testing the code would be a nightmare if you needed a mail client to invoke the script every time. Fortunately, Unix pipes can mock the mail delivery:
less my_test_mail.eml | ./booking.py
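If you have no saved mail at hand, a test message can also be generated. The following Python 3 sketch builds a small message and pipes it into a command, much like the alias does. The sender, subject and the helper names make_test_mail and pipe_to are invented for illustration.

```python
import subprocess
from email.message import EmailMessage


def make_test_mail(to_address="booking@localhost"):
    """Build a minimal raw message (bytes) for feeding into the script."""
    msg = EmailMessage()
    msg["From"] = "Test User <tester@example.org>"  # made-up sender
    msg["To"] = to_address
    msg["Subject"] = "Test annotation"
    msg.set_content("This note should end up on the record.\n")
    return msg.as_bytes()


def pipe_to(command, raw_mail):
    """Run command (a list of arguments) with raw_mail on its standard input."""
    return subprocess.run(command, input=raw_mail, capture_output=True, check=False)
```

A call such as pipe_to(["./booking.py"], make_test_mail()) returns an object whose returncode, stdout and stderr show how the script reacted.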
Addressing records in the database
Until now we have written all our email to booking@servername. What you need in an application context, however, is a way to clearly attribute incoming email to specific database records. I have seen some solutions carrying the magic in the subject field, but I prefer to put it into the address itself: RFC 2822 permits a plus sign in the local part of an address, and many mail servers treat whatever follows it as a detail of the same mailbox. In my case this detail is the record’s id followed by the database instance in use (i.e. test or production).
tocc = '%s %s' % (self.msg['To'], self.msg['Cc'])
# has formats:
# booking+1234T@localhost
# booking+1234P@localhost
#
m = re.search(r'booking\+\d+[PT]', tocc)
try:
    tocc = m.group(0)
    instance = tocc[-1:]
    tid = tocc[:-1].replace('booking+','')
    return (instance, tid)
except AttributeError:
    raise NoEventFound
Voilà, I can write mails to booking+1234T@servername.
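The address lookup can also be tried on its own. Here is a minimal standalone sketch in Python 3, assuming the same booking+<id><P|T> format. The function name find_record is made up, the NoEventFound exception is defined here only so that the example runs, and unlike the fragment above it parses the recipients with email.utils.getaddresses before matching.

```python
import re
from email.utils import getaddresses

# Local part "booking", a plus sign, the record id, then P (production) or T (test).
ADDRESS_RE = re.compile(r"^booking\+(\d+)([PT])@")


class NoEventFound(Exception):
    """Raised when no recipient carries a record id."""


def find_record(to_header, cc_header=None):
    """Return (instance, record_id) from the first matching To/Cc recipient."""
    headers = [h for h in (to_header, cc_header) if h]
    # getaddresses() splits header values into (display name, address) pairs.
    for _name, address in getaddresses(headers):
        match = ADDRESS_RE.match(address)
        if match:
            return match.group(2), match.group(1)
    raise NoEventFound("no booking+<id><P|T> address in To/Cc")
```

For example, find_record("booking+1234T@localhost") returns ("T", "1234"), while a plain booking@localhost raises NoEventFound.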
Thanks to Roland and Willi who did the tricky work;-)