Hello my fellows Pythoneers in this new entry I’m going to talk about a cool topic, and that is handling SQL database systems with Python, but first let’s talks as usual about some background theory so we can put you in context.
A really quick definition about this concept is that al digital data is organized and stored as separated groups of data that a have relations on somehow between these groups, in the same storage, normally called database, we could have several data groups, these data groups in terms of SQL are called tables, and each table have unique records of information, these records are called rows.
So structurally speaking it would be something like this.
Database -> Table -> Row
Tables are organized as concerns in your applications, each table have a fixed number of columns in which each column represent a feature of attribute of those elements that we want to save in each table. And finally every row is a unique entry. We can understand these concepts better by approaching with an example.
This language gives you syntax and constructs so you can cleverly consult, insert, update and delete information in a logical and meaningful way, you can use a large set of statements to build a smart query for achieving a desired result, and we can see some examples later in this entry.
Database systems and Database management systems
Even that this topics are not so much of programming concerns as a programmer you need to be in somehow proficient in every aspect of computer science or informatics, and have some knowledge and practice with these concepts.
A database system is a software that you could install in your local computer, this software will automatically handle the way in which data is stored in the hard drive, this is not like saving information in a CSV file or in a Excel Workbook, one of the advantages of using a database system is that a large amount of data is to be stored and we want to retrieve information out of it very quickly, so the database system not only handles this optimized storage but also gives you a drive, engine or service so you can query the data to obtain valuable information, pretty cool right? There several other advantages by using a database system like users and permissions but here we will focus only in the basics.
Also we have database management systems; these are software that usually comes with the database system installation that you could use for managing the data base with the use of a graphical user interface.
In this entry we are going to build a simple database populating it with three main CSV files.
You can click on any of these links to download the respective file.
One thing to point out, the information in all of these CSV files are music tracks, I took them from a college assignment of programming, from a friend that I helped with.
These tracks they could be gathered into a single CSV file, but they are separated to construct a schema that could be implemented in a database, so what is a schema? Is the way we can split data based on relations so we can store in separated tables. Why? Well for instance several tracks could have the same album name, a very long string, some bytes that you will store in each record or row in the tracks table right? for small databases this could be good enough but when our database tends to be really large, millions of records those bytes could be a size problem in terms of storage, so what can we do about it to optimize this, well can replace the album name by a unique identifier, that could take less bytes that the entire album name, and then hold in a separated table the full album name with the same identifier, this identifier also called ID, takes one column for each table, en these columns are the relations that bound one table to another, that is why it is called relational database, in the albums table you only take one record for the album name instead of several times, as much tracks as the album perhaps has.
The same approach could be applied for genres, so if you open these CSV files you can see that this schema is already in place, but if you take a closer look you can see that you could even better optimize this data by replacing the names of artists by unique ID and build a new table with artists names only.
But let's keep it simple for now, we want just to build these three tables and make some queries on it.
First let's talk about the database system that we are going to use, for simplicity we are going to use sqlite, this instead of a database system is just a way of saving data into a single file, but query it with programming languages, for each programming language there must be a library that supports sqlite, in Python this library can be imported with sqlite3 module.
First we are going to build 3 functions for populating the database with each CSV file.
Each one of these functions takes two arguments, the first one db, which is a database filename, and the second one data_file, which is the file handler for the respective CSV file, remember that a file handler is an object returned by the open function. Let's take a look at the setup_tracks function.
As you can see first we import our two main modules, csv for CSV file handling and sqlite3. Then in our setup_tracks function, we start by creating a connection object, this bound a sqlite database file to a Python local variable, and with the cursor method we create a cursor instance that can be used for sql query executions.
The first try and except statement is used for deleting the table Tracks in case it exists, so we can have a clean and fresh database setup each time we execute this function, as you can see we use the execute method from the cursor created, this method takes a string which is a SQL query syntax, they are pretty much readable for this example. For the second execution we create a new table with the next attributes
· Title (Text)
· ID (Integer)
· Time (Integer)
Each one with its respective data type, this is very important in Python variables type are defined at runtime dynamically, in SQL each attribute type requires to be specified at the moment the table is created.
The commit method applies the executions to the database, every time we use the execute method this queries are saved to a queue, but they are not entirely applied to the database we need to commit them.
The using the DictReader function from csv module we can create a list of dictionaries based on the CSV file header, so we use a for loop and dictionary keys access to take each value of each attribute for every track in the CSV file.
It could be kind of strange the third execution statement, as you can see we want to insert a new record to the Tracks table but what about those "?" questions marks, well the query must be a string, so you can convert and concatenate each attribute value and build a query, but this is not safe, an injection attack could be made with a dangerous attribute value, let me explain this with a cool and funny comic.
Taken from xkcd
As you can see there must be a safer way to execute sql statements from a programming language, well for Python this is one way, we use "?" in the respective positions, and in the second argument we provide a tuple with the values ordered in their respective value, in this way the execute method will take a closer look if the query is safe or not.
After we commit and we are done with the updates we close both the cursor and the connection objects.
In this function we used a local function for converting time formats, from h:mm:ss to seconds, or mm:ss to seconds.
For the others two functions we take the same approach let's take a look at them.
With these three functions we can populate our database file from the CSV files, we can take a main script for this
After executing this string you will have a music.db file in the same directory as the Python script file, also you must save all three CSV files in this same directory. Alright so for now we have our database filled with our tracks, what can we do about? Anything, remember that databases are the core store for almost every application out there in the world and in the Internet, but for now we will make some basic and simple queries, in the second part of this entry we will build a web application so we can interact with this database, so keep in touch if you want to learn more.
For a basic query we are going to create a simple function that will return the results from our database SQL execution.
With this function we can play a little with our database, the main difference with the other functions is that after we make our execution we can fetch data from the cursor with the fetchall method, this will return a list of tuples, and each tuple is a record with the selected attributes from the table that we used in our query, cool right? Let's do some queries with Python
As you can see in each run_query call a list of tuples is returned, so accessing a string value we might need to use double square brackets.
So this is it my friends I hope you have found this entry useful, so keep in touch if you want to see the second part of this entry, we will cover web application with this database, so comment, subscribe, follow my facebook page and share with your friends, best regards.