Motivation
We’ve been using AWS load balancers with autoscaling instances for years now. They’re great at handling load, but they’re quite a bit of infrastructure to manage (even with Troposphere + CloudFormation). We also have to manage all the data flow, queue processing and such ourselves: multiple SQS queues, EC2 polling, recording state in databases… More of our code is dedicated to that “plumbing and wiring” than to the actual focus of our application.
We’ve been looking at “serverless” options since the announcement of AWS Lambda, and have been following the rapid development of the Serverless framework. Recently I started working on a proof of concept, re-grooving our cloudy application for a serverless world. So far, I’m really lovin’ it.
Instead of implementing data flow, finite state machines, queuing and such in our software, we describe that wiring in terms of AWS “events” that trigger “functions” running on Lambda infrastructure. Instead of “data flow as code” we now have “data flow as configuration”.
We’re comfortable with AWS services (and for this exercise want to avoid running any 24×7 EC2 servers), so our Lambda functions interact with S3 object storage, DynamoDB databases, and Amazon Elasticsearch Service search engines. Here are some design patterns that have proven helpful as we’ve come up to speed in this brave new world; they’re pretty generic problem-solving approaches, so they should be applicable to your applications as well.
Sample Application
S3 Events
In serverless.yml we can declare the S3 events in each Lambda function definition separately for each event type, so we can have a different function module for each event, like:

```yaml
functions:
  extract:
    handler: extract.handler
    events:
      - s3:
          bucket: images-in
          event: s3:ObjectCreated:*
  nuke:
    handler: nuke.handler
    events:
      - s3:
          bucket: images-in
          event: s3:ObjectRemoved:*
```
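Each of those handlers lives in its own module. As a rough sketch (the fields read here are the standard S3 notification structure; the actual extraction work is elided), extract.py might look something like:

```python
# extract.py -- illustrative sketch only, not the post's actual code.
def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        # Fetch s3://bucket/key and extract whatever metadata we need here.
        print('extract: s3://{}/{}'.format(bucket, key))
```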
That gives us two modules, extract.py and nuke.py, each of which has its own handler function. Easy enough, but it could be a bit too fine-grained if there’s a lot of redundant code in the two files. Alternatively, both events can point at handler functions in a single module:

```yaml
functions:
  create:
    handler: s3in.handle_create
    events:
      - s3:
          bucket: images-in
          event: s3:ObjectCreated:*
  delete:
    handler: s3in.handle_delete
    events:
      - s3:
          bucket: images-in
          event: s3:ObjectRemoved:*
```
This uses a single s3in.py module with two handler functions, handle_create() and handle_delete(). If both share some code, this reduces repetition. We can also send every S3 event for the bucket to one handler and discriminate on the event type inside the code, as sketched below:

```yaml
functions:
  extract:
    handler: s3event.handler
    events:
      - s3:
          bucket: images-in
          event: s3:*
```
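The combined handler can then branch on each record’s eventName. Here’s a minimal sketch, assuming the standard S3 notification structure; extract_object() and nuke_object() are made-up stand-ins for the real per-event logic:

```python
# s3event.py -- illustrative sketch only, not the post's actual code.

def extract_object(key):
    # hypothetical: process the newly created object
    print('extract', key)

def nuke_object(key):
    # hypothetical: clean up after the removed object
    print('nuke', key)

def handler(event, context):
    for record in event['Records']:
        key = record['s3']['object']['key']
        eventname = record['eventName']  # e.g. 'ObjectCreated:Put', 'ObjectRemoved:Delete'
        if eventname.startswith('ObjectCreated'):
            extract_object(key)
        elif eventname.startswith('ObjectRemoved'):
            nuke_object(key)
        else:
            raise Exception('Unhandled S3 event {} for key {}'.format(eventname, key))
```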
DynamoDB Streams
DynamoDB stream records arrive with an eventName like “INSERT”, “MODIFY” or “REMOVE”. We don’t get separate events we can discriminate on in the serverless.yml file, so the code has to branch on the event name itself:

```python
eventname = record['eventName']
if eventname == 'REMOVE':
    self.delete()
...
raise Exception('Unimplemented: id={} ignoring eventname={}'.format(self.id, eventname))
```
All of the stream’s records are delivered to a single handler() function.

Handler structure
The event may batch up several Records, so we loop over all of them rather than just looking at records[0].

```python
def handler(event, context):
    try:
        for record in event['Records']:
            AssetDDBRecordHandler(record)
    except Exception as e:
        msg = 'ERROR assetddb.handler: {}'.format(e)
        log.error(msg)
        return {'event': event, 'message': msg}
    return {'event': event, 'message': 'Function executed successfully: asset.handler'}
```
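For reference, each element of event['Records'] from a DynamoDB stream looks roughly like this. The example below is trimmed and illustrative rather than captured output; the exact contents depend on the table and the stream’s view type:

```python
# Illustrative, trimmed DynamoDB stream record (not real captured data).
record = {
    'eventName': 'INSERT',            # or 'MODIFY', 'REMOVE'
    'eventSource': 'aws:dynamodb',
    'dynamodb': {
        'Keys': {'id': {'S': 'ALEX18'}},
        'NewImage': {'id': {'S': 'ALEX18'},
                     '_dt': {'S': '2017-02-08T13:30:38.915580'}},
        'StreamViewType': 'NEW_AND_OLD_IMAGES',
    },
}
```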
```python
class AssetDDBRecordHandler:
    def __init__(self, record):
        self.id = record['dynamodb']['Keys']['id']['S']
        eventname = record['eventName']  # INSERT, MODIFY, REMOVE
        if eventname == 'REMOVE':
            self.delete()
        elif eventname == 'INSERT':
            self.insert()
        else:
            raise Exception('Unimplemented: id={} ignoring eventname={}'.format(self.id, eventname))

    def delete(self):
        try:
            res = es.delete(index='images', doc_type='image', id=self.id)
        except Exception as e:
            raise Exception('id={} deleting Elasticsearch index: {}'.format(self.id, e))
```
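The es object used in delete() is an Elasticsearch client. The setup isn’t shown here, but one way to construct it with the elasticsearch-py client and signed requests to an Amazon Elasticsearch Service endpoint looks roughly like this (the host name is a placeholder, and the credentials come from the Lambda execution role’s environment):

```python
import os

from elasticsearch import Elasticsearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth

# Sign requests with the Lambda execution role's temporary credentials.
awsauth = AWS4Auth(os.environ['AWS_ACCESS_KEY_ID'],
                   os.environ['AWS_SECRET_ACCESS_KEY'],
                   os.environ.get('AWS_REGION', 'us-east-1'),
                   'es',
                   session_token=os.environ.get('AWS_SESSION_TOKEN'))

# Placeholder endpoint -- substitute the real Elasticsearch Service domain.
es = Elasticsearch(
    hosts=[{'host': 'search-images-xxxx.us-east-1.es.amazonaws.com', 'port': 443}],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
)
```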
DynamoDB Streams Native Protocol Deserialization
The stream hands us item images in DynamoDB’s native wire format, with every value wrapped in a type descriptor (S for string, M for map, and so on):

```python
{u'_dt': {u'S': u'2017-02-08T13:30:38.915580'},
 u'id': {u'S': u'ALEX18'},
 u'metadata': {u'M': {u'description': {u'S': u'12-year-old...'}}}}
```
boto3 includes a TypeDeserializer that unwraps these into native Python values:

```python
from boto3.dynamodb.types import TypeDeserializer

deserialize = TypeDeserializer().deserialize

for record in event['Records']:
    data = {}
    new = record['dynamodb'].get('NewImage')
    if new:
        for key in new:
            data[key] = deserialize(new[key])
        id = data['id']
```
This gives us data as native Python objects. The inner loop can also be written more compactly as a dict comprehension:

```python
data = {k: deserialize(v) for k, v in new.items()}
```
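Putting the pieces together, the insert() path for AssetDDBRecordHandler could deserialize the record’s NewImage and index it into the same Elasticsearch index that delete() uses. This is only a sketch under those assumptions; index_new_image() is a made-up name, not code from the actual application:

```python
from boto3.dynamodb.types import TypeDeserializer

deserialize = TypeDeserializer().deserialize

def index_new_image(record, es):
    """Deserialize a stream record's NewImage and index it into Elasticsearch."""
    new = record['dynamodb'].get('NewImage')
    if not new:
        return None
    data = {k: deserialize(v) for k, v in new.items()}
    # Note: numeric attributes come back as Decimal and may need converting
    # before they can be serialized to JSON for Elasticsearch.
    return es.index(index='images', doc_type='image', id=data['id'], body=data)
```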