Creating SQLite3 Databases from a .csv File

When you’re working with a spreadsheet that requires more complex filtering/sorting (i.e. based on conditions that are satisified (or not) across multiple columns/rows) than what’s generally available in Microsoft Excel/Libre Office/Google Sheets, it turns out that SQLite3 (which comes installed with Ubuntu) is a pretty easy/quick way to approach complicated filtering. Additionally, this can all be done from the shell (command line).

The spreadsheet I’m working with in this example contains data for abalone samples that have been evaluated for infection levels of the withering syndrome bacterium in various tissues.

Before using SQLite:

  • Change the names of any columns that are duplicate names.
  • Export the spreadsheet as a comma-separated file (.csv)

The remaining steps are all performed using the shell. Additionally, none of the commands below actually alter your file in any way. So, you can play around all you want without worrying about modifying/destroying the source data set.

Check the header (column names) of the spreadsheet .csv file:

$head -1 RedPinkPinto.csv

The output:

Accession #,Type,Date,Species,Control/Exp,Shell Length (mm),Total Weight (g),Shell weight (g),Body weight (g),Sex,Foot Condition,PE RLO,PE St,PE New,PE total RLOs,DG RLO,DG St,DG New,DG Tot RLOs,Metaplasia,Coccidia (Y/N),Comments

Launch SQLite3:

$sqlite3

The shell prompt should change to:

sqlite>

Specify the delimiter of the file to be imported (comma in this instance):

sqlite>.separator ","

Import the .csv file (which creates a SQLite table) and name the table:

sqlite> .import RedPinkPinto.csv RedPinkPinto

Check the column names of our new table:

sqlite>.schema

Screenshot of the output:

Selection_026

Change the output view of the new table from comma-separated to a columnar view:

sqlite>.mode column

Run a query (filter) to view only those records (rows) where the Species column contains “Red” and where the value in the “PE St” column is greater than the value in the “PE New” column.

Additionally, only show the following columns in the output:

  • Accession #
  • Species
  • PE RLO
  • PE St
  • PE New
  • DG RLO
  • DG St
  • DG New
sqlite> SELECT "Accession #", Species, "PE RLO", "PE St", "PE New","DG RLO", "DG St", "DG New" FROM RedPinkPinto WHERE "Species"="Red" AND "PE St">"PE New";

Here’s a screenshot of the output:

Selection_029

This is cool, but it’s a pain to have to try to remember the order of the columns.

Enable column headers in the output:

sqlite>.header ON

Re-run the query to view the output difference (NOTE: Quotes are needed around column names that contain spaces):

sqlite> SELECT "Accession #", Species, "PE RLO", "PE St", "PE New","DG RLO", "DG St", "DG New" FROM RedPinkPinto WHERE "Species"="Red" AND "PE St">"PE New";

Screenshot of the output with column headers enabled:

Selection_030

Much nicer and easier to read!

Overall, using SQLite is a much faster, easier, and more flexible process to filter datasets than trying to do so in a traditional spreadsheet.