Skip to content
Search
Generic filters
Exact matches only

A guide on how to interact between Python and databases using SQLAlchemy and PostgreSQL

source: Toa Heftiba, unsplash

Explanation of the connection between Python and PostgreSQL using the SQLAlchemy Python library as well as some tips on how to use it.

Jonathan Leban

When working on a data science project, you may want to connect Python scripts with databases. A library known as SQLAlchemy bridges the gap between SQL and Python.

One way to store databases is to use a relational database management system, which is a common type of database where data is stored in tables. However, software is required to host these databases. PostgreSQL is open source software that allows this. Although it is not necessary, a visualization tool can be very useful when dealing with a large number of different and complex databases. A visualization tool gives you a clear overview of the database you are working on. Among these tools, I chose TablePlus (for Mac) and phpMyAdmin (for Windows).

In this article, I will show you the different steps to follow to configure your system, as well as some commands you need to know when using Postgres and SQLAlchemy.

pip install SQLAlchemy

To install PostgreSQL, you need to go onto that website. I also give you a tutorial for mac and for windows. It is extremely important that you note the login and the password you use when you configure PostgreSQL because you will need it in the connection with TablePlus and with Python.

Now that the software is downloaded, a connection needs to be created between tablePlus and PostgreSQL. Here is a tutorial on how to do it on TablePlus.

First, you need to connect to the PostgreSQL server by typing in the terminal:

psql -U postgres

Then, you can create your database which we will call “flight” (avoid to use upper case letters).

CREATE DATABASE flight;

Now that the database is created you can create tables, insert values either by using PostgresSQL or SQLAlchemy. I will show you the two methods.

connect flight;

Now we want to create a table with some attributes. I will call the table “flights” and the attributes are the id of the flight, the origin, the destination and the duration.

CREATE TABLE flights (
id SERIAL PRIMARY KEY,
origin VARCHAR NOT NULL,
destination VARCHAR NOT NULL,
duration INTEGER NOT NULL
);

If you want to create a table without any attribute you need to enter the following command:

CREATE TABLE flights();

Now that we have create the attributes, we need to insert some values. However, to insert values, you need to have created attributes before.

INSERT INTO flights (origin, destination, duration) VALUES ('New York', 'London', 415);
INSERT INTO flights (origin, destination, duration) VALUES ('Shanghai', 'Paris', 760);
INSERT INTO flights (origin, destination, duration) VALUES ('Istanbul', 'Tokyo', 700);
INSERT INTO flights (origin, destination, duration) VALUES ('New York', 'Paris', 435);

Then, you can visualize your table in TablePlus and you should see that:

Now I will show you the commands in Python and how to use the SQLAlchemy library.

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker
engine = create_engine("postgresql://login:[email protected]:5432/flight")

The general format to create an engine is:
create_engine(“postgresql://login:[email protected]:5432/name_database”)

Here as we want to work on the database flight, instead of name_database we put flight. The login and the password correspond to the one when you set up PostgreSQL. And 5432 is the port but as it’s a local server, it’s 5432.

Then we need to create a table and attributes but also insert values into that table. First, we need to create a scoped session by tapping the following line:

db = scoped_session(sessionmaker(bind=engine))

It is not very relevant for our application but it is extremely important in a lot of examples. Image a situation where once we take our web application to the internet and we have multiple people that simultaneously trying to use our website, we want to make sure that the stuff that person A is doing with the database is kept separate from the stuff that person B is doing on the database.

Next, we need to run the following line to create a table flights with attributes id, origin, destination and duration. Be careful, the syntax is a bit different than in PostgreSQL: we need to declare the key at the end.

# to create the table flights with all the attributes
db.execute("CREATE TABLE test (id INTEGER NOT NULL, origin VARCHAR NOT NULL, destination VARCHAR NOT NULL, duration INTEGER NOT NULL, PRIMARY KEY (id));")
# to insert values into the table flights
db.execute("INSERT INTO flights (origin, destination, duration) VALUES ('New York', 'London', 415);")
db.execute("INSERT INTO flights (origin, destination, duration) VALUES ('Shanghai', 'Paris', 760);")db.execute("INSERT INTO flights (origin, destination, duration) VALUES ('Istanbul', 'Tokyo', 700);")db.execute("INSERT INTO flights (origin, destination, duration) VALUES ('New York', 'Paris', 435);")

Then, to be executed on the PostgreSQL server, we need to commit the changes:

db.commit()

In a final step we close the session:

db.close()

Thus the whole code is summarized below:

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker
# creation of an engine
engine = create_engine("postgresql://login:[email protected]:5432/flight")
# creation of a session
db = scoped_session(sessionmaker(bind=engine))
# creation of the table flights with all the attributes
db.execute("CREATE TABLE test (id INTEGER NOT NULL, origin VARCHAR NOT NULL, destination VARCHAR NOT NULL, duration INTEGER NOT NULL, PRIMARY KEY (id));")
# insertion of values into the table flights
db.execute("INSERT INTO flights (origin, destination, duration) VALUES ('New York', 'London', 415);")
db.execute("INSERT INTO flights (origin, destination, duration) VALUES ('Shanghai', 'Paris', 760);")db.execute("INSERT INTO flights (origin, destination, duration) VALUES ('Istanbul', 'Tokyo', 700);")db.execute("INSERT INTO flights (origin, destination, duration) VALUES ('New York', 'Paris', 435);")# commit the changes
db.commit()
# close the session
db.close()

To show you how it works, I used this dataset. The dataset is all about flight in the US for the year 2015. You can find the dataset here.

We will work on the same database than before which is flight. And we will create a table called “airlines_2015”.
import pandas as pd

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker
engine = create_engine("postgresql://login:[email protected]:5432/flight")db = scoped_session(sessionmaker(bind=engine))db.execute("CREATE TABLE airlines_2015();")dataframe.to_sql('airlines_2015', engine, if_exists='replace')db.commit()
db.close()

Connect to the PostgreSQL server:

psql -U postgres

Create a database:

CREATE DATABASE name_database ;

Connect to the database:

connect name_database ;

List of all the databases:

l

Create a table:

CREATE TABLE name_table();

In the next command, you need to connect to the database you want to see the table of:

dt

If you want to have a description of a table:

d table_name

Finally, if you want to close some connections which are still opened you need to open a new terminal and run the following (after each line press enter, so you need to press 3 times enter in total):

# first command
psql -U postgres
# second command
connect nom_database;
# third command
select pg_terminate_backend (pg_stat_activity.pid) from pg_stat_activity where datname = current_database() and pid <> pg_backend_pid();

PS: I am currently a Master of Engineering Student at Berkeley, and if you want to discuss the topic, feel free to reach me. Here is my email.