Creating and filling a Postgres DB with Docker Compose
We often need to populate and eventually share a database with dummy data: to test our pipelines, test queries, demo a new service, or perhaps to evaluate future members of the company during hiring.
Github repo: https://github.com/jdaarevalo/docker_postgres_with_data
Introduction
Among the options for sharing data are files such as CSV or Parquet, or an object store such as S3, but each of those has limitations for the challenges we want to address.
This is a problem that can be solved easily using the features Docker offers us.
Our goal here is to create the following tables with their respective foreign keys, fill them with data, and make the result easy to share with other users.
Initialize the Postgres Service
We will use Docker Compose to manage the instance of Postgres with this docker-compose.yml file:
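A compose file along these lines would do it; the image tag, credentials and service name here are illustrative and may differ from the repository's file:

```yaml
version: "3.8"

services:
  postgres:
    image: postgres:12  # illustrative tag; pin whichever version you need
    environment:
      POSTGRES_USER: postgres       # default credentials, change as needed
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: postgres
    ports:
      - "5438:5432"                 # host port 5438 -> container port 5432
    volumes:
      - ./postgres-data:/var/lib/postgresql/data   # persist data on the host
      - ./sql:/docker-entrypoint-initdb.d          # SQL scripts run on first start
```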
Now, we can start Postgres with just one simple command:
$ docker-compose up
or run Postgres in the background by adding -d (detached mode):
$ docker-compose up -d
And stop the Postgres service with:
$ docker-compose down
In this case, we map our local port 5438 to port 5432 (the Postgres default) inside the container. We also persist the data on our machine, preventing data loss when the containers are deleted; for this, we add a volume that uses the folder postgres-data as the place where Postgres stores its data.
Creating tables
To create the tables in our database we need to copy our create_tables.sql
script to /docker-entrypoint-initdb.d/
The script contains the commands to create each required table, along with its primary keys and foreign keys.
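As a sketch of what create_tables.sql might contain, two of the tables could be created like this; the exact column lists live in the repository, so everything beyond the ids and the foreign key is an assumption:

```sql
-- Hypothetical excerpt of create_tables.sql
CREATE TABLE country (
    country_id   INT PRIMARY KEY,
    country_name VARCHAR(128) NOT NULL
);

CREATE TABLE city (
    city_id    INT PRIMARY KEY,
    city_name  VARCHAR(128) NOT NULL,
    country_id INT NOT NULL,
    -- foreign key back to the parent table
    FOREIGN KEY (country_id) REFERENCES country (country_id)
);
```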
All the source code is available here https://github.com/jdaarevalo/docker_postgres_with_data
Why in docker-entrypoint-initdb.d/ ?
The official PostgreSQL Docker image https://hub.docker.com/_/postgres/ allows us to place SQL files in the /docker-entrypoint-initdb.d
folder, and the first time the service starts, it will import and execute those SQL files.
In our Postgres container, we will find the bash script /usr/local/bin/docker-entrypoint.sh,
which executes every *.sh, *.sql and *.sql.gz file in that folder.
Filling tables with data
Now we want to fill the tables using the script sql/filling_tables.sql
First, we will define variables such as number_of_sales and number_of_countries, plus a start_date and end_date for sales; we will do this using session variables.
The SET statement accepts two parameters: the variable name and the value to assign to it. The variable name is case-insensitive.
More details here https://www.postgresql.org/docs/10/sql-set.html
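A sketch of the variable definitions; note that custom session variables in Postgres require a two-part name (a prefix plus the name), so the vars. prefix and the values below are assumptions:

```sql
-- Custom session variables need a dotted, two-part name.
-- Values are stored as text and cast when read back.
SET vars.number_of_sales = '10000';
SET vars.number_of_countries = '5';
SET vars.start_date = '2019-01-01';
SET vars.end_date = '2020-01-01';
```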
With the variables defined, we continue creating the SQL scripts to fill the data.
The function current_setting reads a variable defined in the first part, and GENERATE_SERIES is a Postgres function that generates a series of values from start
to stop, with a step size of one if none is specified.
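Put together, filling a table like country might look like this; the vars. prefix and the column names are assumptions:

```sql
-- current_setting() returns the session variable as text; cast it to int.
-- generate_series(start, stop) yields one row per value, step 1 by default.
INSERT INTO country (country_id, country_name)
SELECT id, 'country_' || id
FROM generate_series(1, current_setting('vars.number_of_countries')::int) AS id;
```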
Thus, we fill the tables country, city, store, users, product and status_name. For the tables order_status and sale we will require a couple of additional Postgres functions and logical steps.
Sale ID: Here we want a UUID for the column sale.id. Fortunately, Postgres supports universally unique identifiers (UUIDs) as a column data type via UUID, and we will use this to create the sale_id. For this we load the pgcrypto extension in the current session's database/schema.
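A minimal sketch of loading the extension and generating a UUID:

```sql
-- Enable pgcrypto in the current database (safe to re-run).
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- gen_random_uuid() returns a random (version 4) UUID on every call.
SELECT gen_random_uuid() AS sale_id;
```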
Dates: Here we will use the PostgreSQL TO_TIMESTAMP() function to convert a string to a timestamp according to the given format; additionally, we will use the random() function to generate random dates within the configured range.
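For example, assuming session variables vars.start_date and vars.end_date hold 'YYYY-MM-DD' strings as defined earlier, a random timestamp in that range can be produced like this:

```sql
-- Multiplying an interval by random() (a double in [0, 1)) scales it,
-- so the result falls between start_date and end_date.
SELECT TO_TIMESTAMP(current_setting('vars.start_date'), 'YYYY-MM-DD')
       + random() *
         (TO_TIMESTAMP(current_setting('vars.end_date'), 'YYYY-MM-DD')
          - TO_TIMESTAMP(current_setting('vars.start_date'), 'YYYY-MM-DD'))
       AS sale_date;
```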
Random ids: Note that each sale requires a product_id, user_id and store_id. Here we use the floor function to round the value generated by random() down to the nearest integer.
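For instance, a random integer id between 1 and the number of products (here read from a hypothetical vars.number_of_products session variable):

```sql
-- random() returns a double in [0, 1); floor() rounds it down,
-- so the result is an integer in 1..number_of_products.
SELECT floor(random() * current_setting('vars.number_of_products')::int)::int + 1
       AS product_id;
```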
To fill the table order_status we will use the table sale.
The update date in order_status falls between the sale date and five days after it, to prevent having orders updated before the sale.
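A possible shape for that insert; the column names and the status range are assumptions, not necessarily the repository's exact script:

```sql
-- One status row per sale, updated at a random moment within
-- the five days after the sale date (columns are illustrative).
INSERT INTO order_status (sale_id, status_name_id, update_at)
SELECT s.id,
       floor(random() * 5)::int + 1,                -- random status in 1..5
       s.sale_date + random() * INTERVAL '5 days'   -- sale date .. +5 days
FROM sale AS s;
```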
Finally, we add all the statements to fill the tables to the script sql/fill_tables.sql
After running docker-compose up
we will see something like this
Now you can generate and use a database with tables and values to test models and pipelines, or to evaluate the SQL, git and Docker skills of your candidates.
To connect to Postgres you can use a database tool like DBeaver, or work from the command line by running this command:
$ docker ps -f "name=postgres"
-f: Filter output based on conditions provided
to get the CONTAINER_ID of the running container, and then execute
$ docker exec -it <CONTAINER_ID> /bin/bash
And run queries like this
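Inside the container, psql -U postgres opens a prompt (the user matches the image default); a query such as this checks the generated rows, with column names assumed from the illustrative schema:

```sql
-- How many cities were generated per country? (column names assumed)
SELECT c.country_name, COUNT(*) AS cities
FROM city ci
JOIN country c USING (country_id)
GROUP BY c.country_name
ORDER BY cities DESC;
```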
Github repo: https://github.com/jdaarevalo/docker_postgres_with_data
Please feel free to share it, use it, or contribute to the repository. If you have any comments or suggestions, I'd appreciate them.
Regards