Convert a SQL Dump to Excel

The attached script can be used to migrate data out of a sqldump and into csv and xls for further analysis or ETL’ing. The name of each table is used to create a separate csv file (e.g. table1.csv) and an xls representation of each table is put into a workbook in a spreadsheet.

The script requires Python 3.7. So if you’re on a Mac you might need to install that. Here, we use Homebrew to do so:

brew install python3

Then you’d need to install the specific mods:

python3 -m pip install pandas

python3 -m pip install openpyxl

The command is then run as follows:

python3 sqlcsvxlsexport.py <sql dump file> <target directory>

The script is accessible here:

Python Script To Move A SQL Database To .csv Files

You have a database, such as a mysql dump of a Jamf Pro server, or a sql dump of a WordPress site. You want to bring it into another tool or clean the data using a csv as an intermediary. Or you’re using an Amazon Glue job to ETL the data. The following script will take that sql dump and convert it into a bunch of csv files.

To use the script, simply run it with $1 as the path to the sql file and $2 as the path to the export directory, as follows:

python sqlcsvexport.py /sql_file_path /target_dir

To download the script:

Show MySQL Settings

MySQL usually pulls settings from a my.cnf file. However, you can end up with settings in include files, which can be defined in the my.cnf using the following directives: include /home/mydir/myopt.cnf includedir /home/mydir Because of this, and the fact that you might not have access to all locations of .cnf files on a filesystem, you can also grab them using the SHOW VARIABLES option within SQL, obtained by /usr/local/mysql/bin/mysql -uroot -p mypassword -e "SHOW VARIABLES;" > /tmp/SQLSettings.txt In the above command, -uroot defines we’ll be accessing with the root user, -p defines the password (listed as mypassword) and the -e defines that we want to execute a command and then quit. We then use > to dump the output into the defined file.

Mapping New SQLite Databases

I’ve written about SQLite databases here and there over the years. A number of Apple tools and third party tools for the platform run on SQLite and it’s usually a pretty straight forward process to get into a database and inspect what’s there and how you might programmatically interact with tools that store data in SQLite. And I’ll frequently use a tool like Navicat to quickly and visually hop in and look at what happens when I edit data that then gets committed to the database. But I don’t always have tools like that around. So when I want to inspect new databases, or at least those new to me, I need to use the sqlite3 command. First, I need to find the databases, which are .db files, usually stored somewhere that a user has rights to alter the file. For example,  /Library/Application Support/My Product. In that folder, you’ll usually find a db file, which for this process, we’ll use the example of Data.db. To access that file, you’d simply run sqlite3 with the path of the database, as follows: sqlite3 /Library/Application\ Support/My\ Product/Data.db To see a list of tables in the database, use .tables (note that a tool like Postgress would use commands like /tr but in SQLite we can run commands with a . in front and statements like select do not use those): .tables To then see a list of columns, use .schema followed by the name of a table. In this case, we’ll look at iOS_devices, which tracks the basic devices stored on the server: .schema iOS_devices The output shows us a limited set of fields, meaning that the UDID is used to link information from other tables to the device. I like to enable column headers, unless actually doing an export (and then I usually do it as well): .headers ON Then, you can run a standard select to see what is in each field, which in the below example would be listing all information from all rows in the myapptable table: select * from myapptable; The output might be as follows: GUID|last_modified|Field3|Field4 abcdefg|2017-01-26T17:02:39Z|Contents of field 3|Contents of field four Another thing to consider is that a number of apps will use multiple .db files. For example, one might contain tables about users, another for groups, and another for devices in a simple asset tracking system. This doesn’t seem great at first, but I’ve never really judged it, as I don’t know what kind of design considerations they were planning for that I don’t know. If so, finding that key (likely GUID in the above example) will likely be required if you’re doing this type of reverse engineer to find a way to programmatically inject information into or extract information out of a tool that doesn’t otherwise allow you to do so.

Use the Slow Query Log in MySQL

Slow queries can cause poor database performance. MySQL has a slow query log, which lets you log queries that take too long (with too long a user-configurable setting). This allows you to quickly find craptastic queries in your logs. To enable the slow query log, use SET GLOBAL and then set the slow_query_log option to ON: SET GLOBAL slow_query_log = 'ON'; By default, the slow query log identifies any query that takes 10 seconds or longer. To change long_query_time using SET GLOBAL. Here, we’ll set it to 7 seconds: SET GLOBAL long_query_time = 7; You can also set the log file to something other than the default location of /var/lib/mysql/hostname-slow.log. Again, use SET GLOBAL and then provide a path for slow_query_log_file: SET GLOBAL slow_query_log_file = '/var/logs/slowquery.log'; To disable the slow query log: SET GLOBAL slow_query_log = 'OFF';

Use SQL Views to Grant Access and Constrain Output

You can grant access to certain columns to view in SQL without providing access to specific users to see the whole database. This is pretty useful when delegating reporting to users, without giving them access to all of the data in your database. For example, a user might be able to see a column with an address, but not a column with a credit card number, increasing database security while allowing you to delegate certain tasks when appropriate. In this article, we’ll use the same “Customers” table from our first articles, signupdate: ID Site Contact Address City Zip Country SignupDate 1 Krypted Charles Edge my house Minneapolis 55418 US 2005-01-01 2 Apple Tim Cook spaceship Cupertino 95014 US 2015-12-05 3 Microsoft Satya Nadella campus Redmond 98053 US 2014-11-01 4 Facebook Mark Zuckerberg foodhall Menlo Park 94025 US 2010-03-10 5 JAMF Dean Hager Grain Exchange Minneapolis 55418 US 2016-01-01 Next, we’ll create a view called SignupDate that only has customers that signed up on January 1st of 2005. This view returns the data set of contacts and signup dates: CREATE VIEW signupdate AS SELECT * FROM Customers WHERE OrderDate='2005-01-01'; The syntax is similar to a SELECT, but with CREATE VIEW followed by the name of the view and then AS followed by the SELECT statement. The view is a virtual table containing the output of the query rather than data. Once created, use use the signupdate view in a query: SELECT * FROM signupdate; This SQL statement returns the following results: 1 Krypted Charles Edge my house Minneapolis 55418 US 2005-01-01 You can also use the view to contain a query with just the columns you want, according to how you structure your query, thus granting access to specific columns, without granting access to all of the columns in a table.

Index Those SQL Tables

If you have growing sets of data, one of the best ways to speed up database performance is to make sure each column in larger tables is indexed. You can easily index a column following this syntax (using the name of your table in the place of tablename and the name of your column in the place of columnname): ALTER TABLE tablename ADD INDEX (columnname); So if you have a table called Customers and the following columns (as in the case of my Customers database from the previous exercises): ID Site Contact Address City Zip Country You would index them all as follows: ALTER TABLE Customers ADD INDEX (ID); ALTER TABLE Customers ADD INDEX (Site); ALTER TABLE Customers ADD INDEX (Contact); ALTER TABLE Customers ADD INDEX (Address); ALTER TABLE Customers ADD INDEX (City); ALTER TABLE Customers ADD INDEX (Zip); ALTER TABLE Customers ADD INDEX (Country);

SQL and Date Functions

SQL has some really great features for managing dates. Given that dates are a very common thing to store in databases, that’s a pretty logical thing to get to a good, mature point. But the most challenging aspect of working with date and time in SQL is formatting. There are lots of different ways to format dates around the world, and even different structures (e.g. epoch) that are often complicated by adding time to records. But, you need to do things in a consistent fashion with how SQL expects them, if you’ll be using built in functions to manage dates. Dates can be stored in a variety of formats, which include the following:
  • DATE: Appears as YYYY-MM-DD
  • DATETIME: Appears as YYYY-MM-DD HH:MI:SS
  • TIMESTAMP: Appears as YYYY-MM-DD HH:MI:SS
  • YEAR: Appears as YY or YYYY
What are those functions?
  • CONVERT: Shows the date/time data in the specified format
  • CURATE: Shows the current date
  • CURTIME: Shows the current time
  • DATE: Extracts the date from a date/time field
  • DATE_ADD: Adds the time to a date
  • DATE_SUB: Removes the time from a date
  • DATE_FORMAT: Shows the date and time data in different formats
  • DATEDIFF: Shows the days between two specified dates
  • EXTRACT: Shows part of a date and time formatted value
  • NOW: Shows the current date and time, frequently used to capture the date during signups, etc
In this article, we’ll use the same “Customers” table from our first articles, but we’ll add a column for signupdate: ID Site Contact Address City Zip Country SignupDate 1 Krypted Charles Edge my house Minneapolis 55418 US 2005-01-01 2 Apple Tim Cook spaceship Cupertino 95014 US 2015-12-05 3 Microsoft Satya Nadella campus Redmond 98053 US 2014-11-01 4 Facebook Mark Zuckerberg foodhall Menlo Park 94025 US 2010-03-10 5 JAMF Dean Hager Grain Exchange Minneapolis 55418 US 2016-01-01 Let’s show (SELECT) records with an SignupDate of 2005-01-01 from the table above. SELECT * FROM Customers WHERE OrderDate='2005-01-01' Overall, dates are pretty easy. Times are a little more challenging, and take up much more space in a database. But overall, provided you build the column with the right format you want to use, the functions make managing date tasks all the easier. I’m not going to cover all of the options, but let’s look at EXTRACT real quick. Here, we’ll serialize the year, month, and day from the previous select. To do so, we’ll use the SELECT statement, call the EXTRACT function, and then split the three parts of the date into variables we’re making up (so you can use your own strings for the names), as SignupYear, SignupMonth, and SignupDay: SELECT EXTRACT(YEAR FROM SignupDate) AS SignupYear, EXTRACT(MONTH FROM SignupDate) AS SignupMonth, EXTRACT(DAY FROM SignupDate) AS SignupDay, FROM Customers WHERE OrderDate='2005-01-01' Obviously, we knew the OrderDate, but we could have used any other column and pattern to match for that column in the WHERE to serialize date information based on other parameters. Fun stuff.

Manage SQL Schemas with Flyway

Flyway is a tool that allows you to perform version-controlled management of SQL schemas. You can download flyway at http://flywaydb.org. It doesn’t require a build, so once downloaded, drop it where you want it to live and you can just summon the binary in scripts. Once installed, the following is the basic structure of commands for Flyway: flyway [options] command By default, the configuration will be read from conf/flyway.conf. Options passed from the command-line override the configuration. Commands ======== migrate : Migrates the database clean : Drops all objects in the configured schemas info : Prints the information about applied, current and pending migrations validate : Validates the applied migrations against the ones on the classpath baseline : Baselines an existing database at the baselineVersion repair : Repairs the metadata table Options (Format: -key=value) ======= driver : Fully qualified classname of the jdbc driver url : Jdbc url to use to connect to the database user : User to use to connect to the database password : Password to use to connect to the database schemas : Comma-separated list of the schemas managed by Flyway table : Name of Flyway’s metadata table locations : Classpath locations to scan recursively for migrations resolvers : Comma-separated list of custom MigrationResolvers sqlMigrationPrefix : File name prefix for Sql migrations sqlMigrationSeparator : File name separator for Sql migrations sqlMigrationSuffix : File name suffix for Sql migrations encoding : Encoding of Sql migrations placeholderReplacement : Whether placeholders should be replaced placeholders : Placeholders to replace in Sql migrations placeholderPrefix : Prefix of every placeholder placeholderSuffix : Suffix of every placeholder target : Target version up to which Flyway should use migrations outOfOrder : Allows migrations to be run “out of order” callbacks : Comma-separated list of FlywayCallback classes validateOnMigrate : Validate when running migrate cleanOnValidationError : Automatically clean on a validation error baselineVersion : Version to tag schema with when executing baseline baselineDescription : Description to tag schema with when executing baseline baselineOnMigrate : Baseline on migrate against uninitialized non-empty schema configFile : Config file to use (default: conf/flyway.properties) configFileEncoding : Encoding of the config file (default: UTF-8) jarDirs : Dirs for Jdbc drivers & Java migrations (default: jars) Add -X to print debug output Add -q to suppress all output, except for errors and warnings Example ======= flyway -target=1.5 -placeholders.user=my_user info More info at http://flywaydb.org/documentation/commandline