COVID-19 Stats Collection with AWS — A serverless recipe that costs $0.00

Rajesh Rajamani · Published in Dev Genius · 6 min read · Jun 26, 2020


Photo by Christian Wiediger on Unsplash

The COVID-19 pandemic has certainly wrecked economies, normal life, and more. But some things are thriving, especially those centered around the pandemic itself.

Tracking pandemic numbers is not an easy job. Government bodies, healthcare departments, academia, and grass-roots workers are all on it. Most of this data is open source, and it has to stay that way in these tough times, for “the more we know, the better we can fight back”.

One of the most critical requirements during all this is a sensible and timely archival of this data. By doing so, you keep track of the numbers as a time series, which can be used for forecasting future trends. With the abundant free tier that AWS provides (hats off to you, AWS team), I have built a daily archival system for the COVID-19 data hosted by Bing.

Note: The data I refer to here is in the public domain and can be accessed by anyone.

An initiation for the uninitiated on “serverless”.

Serverless computing is a cloud service offering whereby you can develop and deploy code without having to worry about infrastructure requirements.

Things you need to build this recipe:

  1. An AWS account with free-tier built-in. That’s it.

Before going further, I assume that you have some basic idea of AWS. In case you don’t, I suggest the following playlist that I keep. It is a 10-hour course by freecodecamp.org, but you can jump straight to the components you like by selecting the timestamps (which the freeCodeCamp folks have graciously put together in the comments section. Hats off to you, freeCodeCamp. You are awesome).

My serverless application consists of three critical components: an S3 bucket to store the files, a Lambda function to fetch and process the data, and an IAM role. The IAM role is the enabler that grants the Lambda function the security permissions to access S3 and store files.

AWS Components

Let’s see the solution that I built.

  1. Create an S3 bucket and name it covid19statsdump-<your aws accountid>. S3 bucket names need to be globally unique, and hence adding your AWS account ID is a great idea. Note that I have included a hyphen, as no other special characters are allowed in a bucket name.
  2. Create an IAM role with the AWSLambdaBasicExecutionRole managed policy. Name this role “covid19statsdump_lambda_execution_role”. Attach an inline policy to allow access to the S3 bucket that you created in step 1.

The inline policy should be as follows. Ensure that you change the bucket name appropriately.
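A minimal sketch of such a policy, assuming the bucket name from step 1 (s3:PutObject is the one action the function strictly needs in order to store files; trim or extend to taste):

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["s3:PutObject"],
          "Resource": "arn:aws:s3:::covid19statsdump-<your aws accountid>/*"
        }
      ]
    }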

3. The Lambda function code is simple and achieves three objectives (a minimal sketch follows the list).

a. Make an HTTP request to the Bing URL and receive the response.

b. Use BeautifulSoup to read through the HTTP response.

c. Use pandas to create a simple data frame.

I used pandas for simplicity, but if you feel creative, please feel free to use something else.
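Here is a minimal sketch of such a handler. The bucket name, the Bing URL, the CSS selector, and the column names are illustrative assumptions; inspect the actual page markup before relying on them.

    import io
    from datetime import date

    import boto3
    import pandas as pd
    import requests
    from bs4 import BeautifulSoup

    # Hypothetical values: replace with your own bucket and target page.
    BUCKET = "covid19statsdump-<your aws accountid>"
    URL = "https://www.bing.com/covid"

    def handler(event, context):
        # a. Make an HTTP request to the Bing URL and receive the response.
        response = requests.get(URL, timeout=30)
        response.raise_for_status()

        # b. Use BeautifulSoup to read through the HTTP response.
        # The selector below is illustrative; check the real markup.
        soup = BeautifulSoup(response.text, "html.parser")
        rows = [
            [cell.get_text(strip=True) for cell in item.find_all("span")]
            for item in soup.select("div.data-row")
        ]

        # c. Use pandas to create a simple data frame, then archive it to S3.
        df = pd.DataFrame(rows, columns=["region", "confirmed", "recovered", "deaths"])
        buffer = io.StringIO()
        df.to_csv(buffer, index=False)
        boto3.client("s3").put_object(
            Bucket=BUCKET,
            Key=f"covid19stats_{date.today().isoformat()}.csv",
            Body=buffer.getvalue(),
        )
        return {"rows_archived": len(df)}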

4. Now this code needs to be hosted in Lambda. Things get trickier here. There are two ways of doing the deployment.

The hard, manual way, where you set up the environment yourself, create the zip package, and upload it to Lambda.

This works if the package you create is manageable in size and the dependencies are all fine. But it can get really cumbersome in some cases due to platform-specific dependencies. Remember that AWS Lambda runs Linux, and if you are on Windows using a package that ships a Linux-specific build, chances are you will run into trouble.
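For reference, the manual route looks roughly like this. It is only a sketch: the handler file name and function name are hypothetical, and the packages are the ones this recipe uses.

    pip install requests beautifulsoup4 pandas -t package/
    cd package && zip -r ../function.zip . && cd ..
    zip -g function.zip lambda_function.py
    aws lambda update-function-code --function-name covid19statsdump \
        --zip-file fileb://function.zip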

The easy way: the Zappa way. Let’s go the easy way.

Here is a great article about deploying a simple app with Zappa.

A quick view of what to do with Zappa:

  1. Create a virtual environment (Zappa insists on running inside one) and install Zappa along with all the packages required by the Lambda function.
  2. Write your code.
  3. Use zappa init to configure your deployment settings (AWS profile, region, and so on).
  4. Use zappa deploy to deploy the app.

Really it’s that simple.
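In command form, the whole flow is just the following (assuming a stage named dev, Zappa’s conventional default):

    python -m venv venv
    source venv/bin/activate
    pip install zappa requests beautifulsoup4 pandas
    zappa init        # generates zappa_settings.json
    zappa deploy dev  # packages the environment and deploys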

Zappa is a great Python package with a command-line interface that helps you deploy Lambda functions with everything taken care of.

Note: When using Zappa, your Lambda function will have an HTTP trigger, as Zappa by default creates an API Gateway URL. But you can change this behaviour by opening the Lambda function and editing the trigger.

In my case, I overrode this functionality by setting up a CloudWatch rule with a cron expression that runs at 7:45 AM IST every day, as I’m interested in data for India.
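For reference, 7:45 AM IST is 2:15 AM UTC (IST is UTC+5:30), so the CloudWatch schedule expression is:

    cron(15 2 * * ? *)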

This is how my Lambda function looks after I set it up.

A snapshot of the S3 bucket with the files stored every day at 7:45 AM

Now a look at the cost factor:

As promised, I’ll explain how this is a $0.00 setup that costs nothing: one short Lambda invocation per day and a few kilobytes of CSV in S3 sit comfortably within the AWS free-tier allowances.

A quick snapshot of the free-tier services that I use.

A detailed overview of AWS Free-tier usage

So that’s it.

Parting thoughts:

  1. Serverless applications provide an economical, scalable way to do some fascinating things on the cloud.
  2. All major cloud providers (AWS, Azure, and GCP) offer a free tier. Better still, if you are from academia, you may get more based on your use case.
  3. If this free tier can be put to good use to support decision-making in this pandemic, then why not?

I also write about Azure and GCP and have just started a series on Azure Functions. If you are interested, please check out the series below.
