Connect a PostgreSQL RDS database to Heroku

This tutorial will show you how to create and connect a postgres database hosted with Amazon Web Service’s RDS to a Heroku application.

  1. Create and launch a database instance on RDS with the default settings. Make sure ‘Public accessibility’ is set to Yes.
  2. Once your instance is created, click on it to view the details and settings.
  3. Scroll half way down the page to see your security group. Click on the security group so you can make edits.
  4. On the security group page, click the Inbound tab at the bottom of the page. Click edit, then change the Source to 0.0.0.0/0. Your security group should look like this:
  5. Go back to RDS and click ‘Parameter groups’ on the left hand menu
  6. Click Create parameter group
  7. Give it a name like ‘forcessl’ and click Create. Then click on the newly created parameter group, filter to rds.force_ssl. Click edit parameter and change the value to 1. This setting will force all connections to your database to use SSL.
  8.  Click on your database instance again. Click the Instance Actions at the top left, then click Modify. Scroll towards the bottom and change the parameter group to your newly created group called forcessl. Click next, then change the radio option to Apply immediately, and click Modify DB instance.
  9. Download the SSL certificate from AWS using this link: https://s3.amazonaws.com/rds-downloads/rds-combined-ca-bundle.pem
  10. Place it in the root of your application directory
  11. Set your DATABASE_URL to the following format:
    postgres://user:[email protected]_endpoint/dbname?sslrootcert=rds-combined-ca-bundle.pem

Note: when I say instance_endpoint, I mean the RDS endpoint that can be found under your instance details, ending with us-east-1.rds.amazonaws.com.

You can now connect to your database securely using SSL. Test it out with psql by running:

psql -h instance_endpoint -p 5432 “dbname=mydbname user=dbuser sslrootcert=\Users\casey\SSH\rds-ca-2015-root.pem sslmode=verify-full”

You will see at the top that are you connected using SSL.

You can also connect using DataGrip by selecting the SSL tab on your connection and providing the path to the SSL certificate like so:

Switching to Pycharm

Over the past few weeks I’ve been transitioning from my beloved Sublime Text and terminal combo to Pycharm Professional.

Why Change?

I’m fortunate to have more freelance work then I can handle. That’s driven me to find productivity improvements so I can complete projects faster. I also needed new database software for Postgres since PGAdmin 4 is basically unusable. After using Pycharm for a few weeks, I realize I made the right choice. These are the core reasons I’m happy with the change:

  • Debugging:  While in draft my code is often littered with print statements, which I use to view data structures and values during runtime. Pycharm’s integrated debugger now makes this a breeze… with no print statements!
  • Project setup:  I can configure my project’s virtual environments, environment variables, etc to save time when switching between projects.
  • Integrated database views:  I’m used to switching back and forth between PGAdmin to view changes in a database. Now I can put this right next to my code in Pycharm.
  • Integrated terminal:  The terminal window is right there, neatly tucked beneath my code and matched to that project.
  • Realtime git differences:  I love that I can see what has changed from the last time I committed.

Basically I love that I can do my work within one big window.

What I Don’t Like

I prefer to keep my virtual environments within my project folder. Pycharm is hard wired to keep virtual environments outside the project folder.

Why is this important to me? Right now I can jump into any of my project folders and type source venv/bin/activate and see an activated virtual environment. With the virtual environment outside I need to list the virtual environments to find the right name, then activate the environment. With a lot of projects this can get very tedious. I know Pycharm magically opens up the right virtual environment for you once configured, but I don’t want to be tied to a program to open my environments and need the option to activate them easily in a regular terminal window.

I also did not like that Pycharm creates additional files that I have to ignore, such as whatever is in ‘.idea’. Not sure what that is for?

Am I Glad I Started with Sublime Text?

Short answer:  yes! I feel like I know how to use all of my tools the ‘regular way’ without the help of additional software. For instance, I sometimes skype with clients that do not have Pycharm installed. I would hate to tell them ‘umm I’m not sure how to push to github from the terminal, I use Pycharm for that.’

Whether you are just getting started or experienced, I recommend taking a look at Pycharm!

Save API Results to PostgreSQL for Free with AWS Lambda

Overview

In this tutorial I will show you how to use Amazon Web Services (AWS) Lambda service to save the results of an API response to a PostgreSQL database on a recurring schedule.

The full repo is posted on GitHub here.

AWS Lambda is a serverless architecture, meaning you do not need to spin up and maintain resources before calling a program or function. You are only charged for the time needed to execute the function. It is also incredibly scalable since each event is an independent, one-off execution.

This use case was for a client of mine that needed to save the API data along side his other data in PostgreSQL. I know there are other ways to cache API results and that is not addressed here.

Resources

The application will be written in Python 3.6.

For an example API response I will use JSON Placeholder which produces fake API responses for posts, comments, etc. For PostgreSQL I created a free database on the ‘Tiny Turtle’ plan at ElephantSQL. For deployment I used the awesome AWS Lambda packaging tool Zappa.

API Response

Let’s assume we want to save the results from the API endpoint https://jsonplaceholder.typicode.com/posts which produces an array of items that look like this:

Let’s create an app.py file and use requests to retrieve our API response with a python function:

$ pip install requests

Database Model

I prefer SqlAlchemy when working with databases in python. So install that with:

$ pip install sqlalchemy

Let’s build our database connection and model in a separate models.py file. Our model will have four fields: id, userId, title, and body.

Saving to Database

Turning back to our app.py file, we create a main() function that imports a database session object, deletes any records that already exist, then saves a fresh batch after running our get_items() function.

Of note, if you have a lot of API fields and are naming your database fields exactly the same as the API fields, you can refactor the save_items() function using item.keys() and setattr:

Deploying to AWS Lambda

We now have an app that works on our local machine. Let’s deploy it to AWS using Zappa. Your first step is to install Zappa:

$ pip install zappa

Ensure you have an AWS user configured with programmatic access (API ID and secret key). Install the AWS CLI, then run:

$ aws configure

Follow the set up prompts to enter your user credentials. Unfortunately, Zappa requires borderline admin access to deploy an application. If you want to deploy with more minimal permissions, you can try the policy that is proposed as a pull request to Zappa here.

Zappa will create a JSON configuration file for your app by running:

$ zappa init

Go ahead and set the defaults, and use app.main for the modular path to your main function. After our zappa_settings.json file is created, we need to make some edits. Leave aws_region and s3_bucket alone as those are unique to your setup. Edit the rest to:

What we’ve done is add an events array, which will in turn create a CloudWatch event that triggers our app function every 10 minutes. I set keep_warm to false because this is not a web app. I do not need a fast response to a http request so the keep_warm setting is unnecessary. API Gateway resources are enabled by default, so I set that to false because we do not need it for this application.

Go ahead and deploy your app with:

$ zappa deploy dev

This command will:

  • Package your application and all dependencies into a zip file
  • Upload the zip file to a temporary S3 bucket
  • Create the necessary lambda function, permissions, and CloudWatch events
  • Install the package
  • Clean up and delete the local zip file and remote zip from S3

The last step is to go into our AWS Lambda function and set our database environment variable, with the Key as DATABASE_URL and the Value as a postgres URL format postgresql://user:[email protected]_path:port:/database_name.

You can see that our CloudWatch event is enabled by clicking the Triggers tab.

Monitoring Your Function

You can see if there are are any errors with your function by looking at your lambda function in AWS, then clicking Monitoring > View Logs in CloudWatch.  

One thing you can see in the logs is how much memory your function is using compared to what you allocated. The zappa default is 512. My log file initially looked like this:

REPORT RequestId:  Duration: 638.74 ms Billed Duration: 700 ms Memory Size: 512 MB Max Memory Used: 54 MB

So I am using only 54 MB of 512 MB. How do we correct this? You can manually change the setting in AWS Lambda, but the better way for long term maintenance is to edit your zappa_settings.json file and add this line, reducing it to the minimum of 128:

To update your function run:

$ zappa update dev

Remember to re-add your DATABASE_URL environment variable when this is finished as it will be removed after updating with Zappa.

You will notice I set the initial recurrence to 10 minutes. I do that to monitor the function for a bit, but would change that to something more reasonable later on.

Again, unless Amazon changes something with their lambda billing, this setup will run for free forever:

The Lambda free tier includes 1M free requests per month and 400,000 GB-seconds of compute time per month. The memory size you choose for your Lambda functions determines how long they can run in the free tier. The Lambda free tier does not automatically expire at the end of your 12 month AWS Free Tier term, but is available to both existing and new AWS customers indefinitely.

Finishing Up

There is one final thing I want to address. Some of you are likely screaming at the screen, saying ‘you can save JSON directly into PostgreSQL!’ Yes, that’s right you can! How awesome is that? My client had a need to keep the data relational with foreign key relationships, etc. But if you want to save the JSON response directly into PostgreSQL you can change the model to this:

Then we can save the JSON in our save_items() function within app.py with this:

You can now query the JSON stored in the database directly.

I hope this tutorial was useful! Please provide feedback on anything I can improve in the comments below.

Implement Short URL Tags in Flask Using Hashids

One issue you need to watch for in your SaaS application is using auto-incrementing IDs in your application’s URLs.

Good: http://myapp.com/invoice/bXW0y
Bad: http://myapp.com/invoice/9

In this article I will show you how to implement the first style of URL, matching the text string to an integer ID in your database.

Why is this an issue?

Your users likely share the same table for records, but can only see what they have access to. When they navigate to a URL like http://myapp.com/invoice/9, and later create an invoice and see the ID is 10, they know that the next invoice in your system is 11.

However, let’s say the invoice with ID 11 belongs to another user. To look up that invoice you need the invoice ID number and the user ID that is stored in a session variable. By knowing the invoice number is 11, a potential attacker already has one half of the equation without doing ANY work.

Auto-incrementing IDs in URLs also tell users information about your underlying system, like how many invoices have been created and how many users have signed up. In the scenario above, I would know that only 9 invoices have been created in the SaaS app and possibly question how mature the software is.

IDs in URLs can sometimes be ok in very public web sites where the next ID is an article or readily available piece of content. But I contend that in every SaaS application they are a bad idea.

Ways to Fix It

I researched several ways to work around this problem and narrowed it down to three possible solutions:

  • Stop passing IDs in URLs and use sessions or some other method
  • Generate a UUID in place of your primary key
  • Use hashids, a library for converting integers to short string IDs

The first option is not reasonable because it creates usability issues. Imagine your user Bob wants to email Gary a link to an invoice. Without an ID in the URL this is not possible. Bob would need to tell Gary how to browse to the right record in your system. You would also lose the ability to bookmark records and come back to them later.

The second option is very promising. UUIDs are unique and easy to generate. However, they are very long. Example: 123e4567-e89b-12d3-a456-426655440000. To implement UUIDs I would need to store them as the primary key or in a surrogate field called ‘slug’. I would have to shorten them to a reasonable length using base64 or some other method.

The last option, hashids, was my preferred option and fit the bill perfectly.

Implementing Hashids

Hashids work by encoding and decoding your ID, like so:

The result is a short string like ‘yds’ that is suitable for a URL.

One decision you need to make is whether to save your hashids in your database, or encode and decode them on the fly. I considered saving my hashids in a field called ‘slug’. However, after researching and reviewing a response from the creator of hashids, I opted to encode on the fly. The primary reason for this is to ensure speed when looking up records via simple integers.

Here is my final implementation in python Flask:

First, install the package with $ pip install hashids

I created two utility functions within my utils.py file. My app’s secret_key acts as the salt. Note: I’m using an application factory which is why I’m calling Hashid() in both functions.

We need to call the create_hashid function in our jinja2 templates. To do this, set a global environment variable like this:

In my template I have a row of items, with links generated like this:

Finally, our views.py file takes the link and displays the provided invoice:

The end result is our URLs no longer look like /invoice/32, but rather invoice/bXW0y. Nothing can be gained by looking at the letters. Everything is working perfectly.

Hope this was helpful!

Why I Like Flask

I’m a huge fan of the python micro framework Flask. Here are the reasons why.

Flask is built with python

The python programming language is a joy to use. It strives to be practical, intuitive, and concise. It is not clever. What do I mean by that? Some programming languages encourage what it is known as ‘code golf’ where multiple functions can be summed up in a single cryptic line of code. Python discourages this in favor of clean code and readability.

Python is popular across many communities, so whatever problem you have there is typically a python package available to help solve your problem.

Flask makes core web functions simple

At a high level, most web frameworks are doing the following:

  1. Map a route to a function
  2. Gather data and send it to a template
  3. Convert the template into regular HTML

With Flask, this entire process is easy:

Flask starts small

The smallest flask application is literally one file and seven lines of code:

This makes it ideal for writing small applications or simple micro services.

Flask can grow big

Through the use of extensions and blueprints, a flask application can grow just as big as a large Ruby on Rails or Django app. For example, the cookiecutter-flask template features user registration/login/logout, database migrations, caching, and tests.

Some may say ‘with all those features why not just use Django’? Django is awesome and it does a great job integrating a lot of components right out of the box. But I’m still more productive using Flask. That gets me to my last point.

Flask helps me get things done!

The main reason I started using Flask and never looked back, is because it does not get in my way. It’s like starting with a pile of legos, and each lego fits together well and you just keep stacking. Why try something else when it works so well?

I can clearly see where data is flowing in my application. This helps me track down bugs, and know what is needed to implement a new feature.