Creating and filling a Postgres DB with Docker compose

José David Arévalo
Published in Level Up Coding
4 min read · Jan 30, 2021


Many times we need to populate, and eventually share, a database with dummy data: to test our pipelines, try out queries, demo a new service, or perhaps build an exercise for evaluating future members of the company.

Github repo: https://github.com/jdaarevalo/docker_postgres_with_data

Introduction

Among the various options we have to solve this requirement is sharing flat files such as CSV or Parquet (locally or via S3), but each of those has limitations for the challenges we want to face.

This is a problem that can be solved easily by making use of the features that Docker offers to us.

Our goal here is to create the following tables with their respective foreign keys, fill them with data, and make the result easy to share with other users.

ER Diagram

Initialize the Postgres Service

We will use Docker Compose to manage the instance of Postgres with this docker-compose.yml file:
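The embedded file does not render here, so the following is a minimal sketch of what such a docker-compose.yml can look like. The image tag and the POSTGRES_USER/POSTGRES_PASSWORD/POSTGRES_DB values are assumptions; the 5438:5432 port mapping and the postgres-data folder match the setup described in this article.

```yaml
version: "3.8"
services:
  postgres:
    image: postgres:13              # assumed tag; any recent Postgres works
    restart: always
    environment:
      POSTGRES_USER: postgres       # assumed credentials
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: postgres
    ports:
      - "5438:5432"                 # local port 5438 -> container port 5432
    volumes:
      - ./postgres-data:/var/lib/postgresql/data   # persist data locally
      - ./sql:/docker-entrypoint-initdb.d          # init scripts run on first start
```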

Now, we can start Postgres with just one simple command:

$ docker-compose up

or run Postgres in the background by adding -d (detached mode):

$ docker-compose up -d

And stop the Postgres service by running:

$ docker-compose down

In this case, we mapped our local port 5438 to port 5432 (the Postgres default) inside the container. We also persist the data on our machine, preventing data loss when the containers are deleted: for this, we add a volume that uses the local folder postgres-data as the place where Postgres stores its data.

Creating tables

To create the tables in our database we need to copy our create_tables.sql script to /docker-entrypoint-initdb.d/.

The script contains the commands to create each required table, along with the primary keys and the foreign keys.
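As an abridged sketch of what create_tables.sql contains (column names here are illustrative; the full script is in the repository):

```sql
-- Abridged sketch; column names are illustrative.
CREATE TABLE country (
    id   SERIAL PRIMARY KEY,
    name VARCHAR NOT NULL
);

CREATE TABLE city (
    id         SERIAL PRIMARY KEY,
    name       VARCHAR NOT NULL,
    country_id INT REFERENCES country (id)  -- foreign key to country
);

-- store, users, product, status_name, order_status and sale
-- are created the same way, each with its own foreign keys.
```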

All the source code is available here https://github.com/jdaarevalo/docker_postgres_with_data

Why in docker-entrypoint-initdb.d/ ?

The official PostgreSQL Docker image (https://hub.docker.com/_/postgres/) allows us to place SQL files in the /docker-entrypoint-initdb.d folder, and the first time the service starts, it will import and execute those SQL files.

In our Postgres container, we will find the bash script /usr/local/bin/docker-entrypoint.sh, where every *.sh, *.sql and *.sql.gz file in that folder is executed.

Filling tables with data

Now we want to fill the tables using the script sql/fill_tables.sql.

First, we will define variables such as number_of_sales, number_of_countries, and start_date and end_date for the sales. We will do this using session variables.

The SET <session variable> statement accepts two parameters: the variable name and the value to use to modify the variable. The variable name is case insensitive.

More details here https://www.postgresql.org/docs/10/sql-set.html
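For example, the variables could be defined like this. The names and values are illustrative; note that Postgres requires a dotted, namespaced name (here my.*) for custom settings:

```sql
-- Custom settings need a two-part name (namespace.variable).
SET SESSION my.number_of_sales = '100';
SET SESSION my.number_of_countries = '10';
SET SESSION my.start_date = '2020-01-01 00:00:00';
SET SESSION my.end_date = '2021-01-01 00:00:00';
```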

With the variables defined, we continue creating the SQL scripts to fill the data.

The function current_setting() reads a variable defined in the first part. GENERATE_SERIES is a Postgres function that generates a series of values from start to stop, with a step size of one when no step is given.
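Combining the two, filling a table like country might look like this (the column list is an assumption):

```sql
-- One row per value of the series; the session variable is read
-- back as text and cast to integer.
INSERT INTO country (id, name)
SELECT id, 'country_' || id
FROM GENERATE_SERIES(1, current_setting('my.number_of_countries')::int) AS id;
```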

Thus, we fill the tables country, city, store, users, product, and status_name. For the tables order_status and sale, we will require a couple of Postgres functions or additional logical steps.

Sale ID: here we want a UUID for the column sale.id. Fortunately, Postgres supports universally unique identifiers (UUIDs) as a column data type via UUID, and we will use this to create the sale_id. For this, we load the pgcrypto extension in the current session with:
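Loading the extension and generating a UUID can be sketched as:

```sql
-- gen_random_uuid() becomes available once pgcrypto is loaded.
CREATE EXTENSION IF NOT EXISTS pgcrypto;

SELECT gen_random_uuid() AS sale_id;
```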

Dates: here we will use the PostgreSQL TO_TIMESTAMP() function to convert a string to a timestamp according to the given format. Additionally, we will use the random() function to generate random dates within the range created.
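A random timestamp within the configured range can be produced like this, assuming the my.* session-variable naming used earlier:

```sql
-- random() returns a double in [0, 1); multiplying it by the interval
-- between the two bounds yields a random offset from the start date.
SELECT TO_TIMESTAMP(current_setting('my.start_date'), 'YYYY-MM-DD HH24:MI:SS')
       + random() *
         (TO_TIMESTAMP(current_setting('my.end_date'), 'YYYY-MM-DD HH24:MI:SS')
          - TO_TIMESTAMP(current_setting('my.start_date'), 'YYYY-MM-DD HH24:MI:SS'))
       AS sale_date;
```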

Random ids: note that each sale requires a product_id, user_id, and store_id. Here we use the function floor, which returns the value after rounding down the number generated with the random function.
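A random foreign-key id in the range 1..N can be sketched as follows, assuming a my.number_of_users variable defined alongside the others:

```sql
-- floor(random() * N) gives 0..N-1; the +1 shifts it to 1..N,
-- matching the SERIAL ids generated for the referenced table.
SELECT floor(random() * current_setting('my.number_of_users')::int + 1)::int
       AS user_id;
```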

To fill the table order_status, we will use the table sale.

The update date in order_status is generated between the sale date and the sale date plus 5 days, to prevent having orders updated before the sale.
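Filling order_status from sale might then be sketched as follows (the column names and the number of status rows are assumptions):

```sql
-- One status row per sale; the update date lands between the
-- sale date and the sale date plus 5 days, never before it.
INSERT INTO order_status (sale_id, status_name_id, update_at)
SELECT id,
       floor(random() * 5 + 1)::int,     -- assumes 5 rows in status_name
       date + random() * INTERVAL '5 days'
FROM sale;
```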

Finally, we add all the scripts to fill the tables into sql/fill_tables.sql.

After running docker-compose up, we will see something like this:

Now you can generate and use a database with tables and values to test models and pipelines, or to evaluate the SQL, git, and Docker skills of your candidates.

To connect to Postgres you could use a database tool like DBeaver or from the command line, running this command:

$ docker ps -f "name=postgres"

-f: Filter output based on conditions provided

to get the CONTAINER_ID of the running container. Then execute:

$ docker exec -it <CONTAINER_ID> /bin/bash

And run queries like this:
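Once inside the container, you can connect with psql and run a sanity check. The user and database names here assume the Compose defaults:

```shell
# Connect to the database and count the generated sales
psql -U postgres -d postgres -c "SELECT count(*) FROM sale;"
```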

Github repo: https://github.com/jdaarevalo/docker_postgres_with_data

Please feel free to share it, use it, or contribute to the repository. And if you have any comments or suggestions, I'd appreciate them.

Regards



Data Engineer. My passion is data; my hobby is working with data. I love to learn, I enjoy teaching, and I'm excited about data-related challenges. bit.ly/in_jdaa