PostgreSQL isn’t enough. Learn These Pandas to SQL Translations To Upgrade Your Data Analysis Game!

40 PostgreSQL Queries and their analysis using Python

Saikiran Dasari
8 min readJan 4, 2023

Here I'm going to give you an Overall Understanding and Functioning of PostgreSQL using PgAdmin4 Interface as well as a good healthy exercise to get started with the pysocopg2 python library. This will be highly beneficial to the data scientists and analysts out there.

Photo by Julia Zolotova on Unsplash

PostgreSQL is a highly efficient open-source object-relational database system that uses and extends the SQL language combined with many features that safely store and scale the most complicated data workloads. Some of the top companies that are using PostgreSQL currently include Apple, Uber, Netflix, Spotify, Instagram, and Reddit.

In a nutshell, learning this tool can be a real boost to your analytics career as the ‘Data Science’ domain requires SQL skills to extract the most useful information from the plethora of incoming data.

Motivation

SQL and Pandas are both powerful tools for data scientists to work with data.

SQL, as we all know, is a language used to manage and manipulate data in databases. On the other hand, Pandas is Python's data manipulation and analysis library.

Today we will learn how to import a sample database into a PostgreSQL server and run a query to check if everything worked as expected.

Therefore, in this blog, I will provide a quick guide to translating the most common Pandas operations to their equivalent SQL queries.

Let’s begin 🚀

Before going forward with this tutorial, you need to have:

  • A PostgreSQL database server is installed on your system.
  • A PostgreSQL sample database called DVD rental.

Installing PostgreSQL and pgAdmin4 on desktop.

  1. Download PostgreSQL and pgAdmin4: https://www.postgresql.org/download/ https://www.pgadmin.org/download/
  2. Or You can Simply follow my Word Doc Installation Screen Shot from my GitHub Repository

Dataset: For Data Analysis I have my DVD rental Dataset:

First Download the dvdrental.zip file from my GitHub Repository

Summary: In this section, I will show you how to load the PostgreSQL sample database into the PostgreSQL database server using “psql Shell” and “pgAdmin4” Interface.

A. Load the sample database using psql tool

First, launch the psql tool from Start Menu

Second, enter the account’s information to log in to the PostgreSQL database server. You can use the default value provided by psql by pressing the Enter keyboard. However, for the password, you need to enter the one that you provided during the PostgreSQL installation.

Third, enter the following CREATE DATABASE statement to create a new dvdrental database

PostgreSQL will create a new database named dvdrental.

Then, navigate to the bin folder of the PostgreSQL installation folder:

After that, use the pg_restore tool to load data into the dvdrental database:

In this command:

  • The -U postgres specifies the postgres user to login to the PostgreSQL database server.
  • The -d dvdrental specifies the target database to load.

Finally, enter the password for the postgres user and press enter

It takes about seconds to load data stored in the dvdrental.tar file into the dvdrental database.

Finally, enter the exit command to quit psql:

B. Load the DVD Rental database using the pgAdmin

After you have downloaded and configured pgAdmin, it's time to load the dvdrental.tar file into pgAdmin.

  • Right-click on Databases and click ‘Create’.
  • Type in ‘dvdrental2’ as the database name and click save
  • The above step will create an empty database.
  • Right-click on the empty ‘dvdrental2’ database and click ‘Restore’
  • Import the .tar file and click restore.
  • Note: If you want to see existing tables in the database, this is the path: dvdrental2 -> Schemas -> Tables
  • Open up the query tool and run the following command:

SELECT * FROM actor;

  • If you followed all the above steps you should be seeing the below The output of Actor Table:

Now, here Comes the Main part i.e How to integrate PostgreSQL in a python environment using the psycopg2 module && Analyze the loaded “dvdrental” database using Python

STEPS:

1. Install psycopg2 using the pip command

2. Import the required libraries.

3. Connect to the PostgreSQL server using connect() command.

The connect() function has the following parameters:

  • The username you use to work with PostgreSQL, The default username for the PostgreSQL database is postgres.
  • Password — The password is given by the user at the time of installing PostgreSQL.
  • Host Name(Optional) — is the server name or Ip address on which PostgreSQL is running. if you are running on localhost, then you can use localhost, or its IP i.e., 127.0.0.0
  • Database Name — The database name to which you want to connect. Here we are using Database named “dvdrental”.

4. To check if we are connected to the database server, let us run a quick query against it

Here we use the pandas library to transform the SQL query into a data frame. We could have alternatively used the cursor.execute() function to execute the query but we prefer pandas as it gives us the flexibility to work on a structured data frame for EDA (Exploratory Data Analysis) purposes.

5. Let us see what the DVD rental database is all about.

DVD rental ER Diagram

Contents of Each Table:

Data Types of all the Tables of dvdrental database

# A. Basic Select ( 15 QnAs ):

+ 9 More…

# B. Advanced Select (4 QnAs):

+ 2 More…

# C. Aggregation (18 QnAs)

+ 13 More…

# D. Joins (4 QnAs):

+ 2 More…

At the END of the Analysis make sure to CLOSE your Connection

If you are looking to practice or refine All the remaining PostgreSQL solving, you can download my free Jupyter Notebook 40+ practice questions on Basic Select, Advanced Select, Aggregation, and Join queries from my GitHub page. Posted a notebook with the answers as a reference. Make sure you download the whole zip file with the images.

Things Learned:

  1. Installing PostgreSQL and pgAdmin on desktop.
  2. How to load the dvdrental sample database into the PostgreSQL database server using “psql shell” and “pgAdmin4” Interface for practicing PostgreSQL.
  3. Connection Between PostgreSQL and Python
  4. Doing the Data Analysis using: Basic Select, Advanced Select, Aggregation, and Join queries.

Conclusion

Congratulations! You now know the PostgreSQL translation of the most common methods in Pandas.

I have tried to cover translations for most of the data scientists use on a regular basis in Pandas. However, I understand I might have missed a few.

Do let me know in the responses.

As always, thanks for reading!

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

Below are the ways where you could contact me or take a look at my work.

If you liked this blog, consider following me on @MEDIUM, LinkedIn, Twitter, and GitHub for more content.

@Saikiran Dasari | Portfolio or Saikiran Dasari | Email to contact me.

Buy me a Beer if you like this article! 😊

💬 Leave a response to this article by providing your insights, comments, or requests for future articles. 📢 Share the articles with your friends and colleagues on social media.

--

--

Saikiran Dasari

Hi there, I’m a Data Scientist& CompScienceEngg, I like working on Data: Extraction, Pre-Processing & EDA, Feature-Engg, Modelling, NLP, Time-Series, Deployment