Linear Algebra in Python Scripts

Preparing data for TensorFlow is often easiest when done as a two-step process. In machine learning, you often end up plotting points, calculating tangents, and doing a lot of basic algebra. Working out equations kinda reminds me of being in in-school suspension in high school. Except now we’re writing code to solve the problems rather than solving them ourselves.

I never liked solving for a matrix… But NumPy is a great little library to import that does a lot of N-dimensional array work. The following script runs through a few basic tasks, using a number of functions across norms, matrix products, vector products, decompositions, and eigenvalues. Remove or comment out what you don’t need:

from __future__ import print_function  # lets the print() calls below also run on Python 2

import numpy as np
from numpy import linalg as LA

array = [[-1, 4, 2], [1, -1, 3]]
array2 = [[6, 2, 1], [7, 1, -3]]

array = np.asarray(array)
converted = np.fliplr(array)  # flip the columns left to right

# These need a square matrix, so they stay commented out for the 2x3 array above:
# cholesky = LA.cholesky(array)
# inv = LA.inv(array)
# determinant = LA.det(array)
# signlog = LA.slogdet(array)

print()
print('ONE ARRAY')
print('Sum of diagonal: ', np.trace(converted))
print('Diagonal elements: ', np.diagonal(converted))
print('Norm: ', LA.norm(array))
print('qr factorization: ')
print(LA.qr(array))
print('Singular value decomposition: ')
print(LA.svd(array))
print()
print('TWO ARRAYS')
print('vdot: ', np.vdot(array, array2))
print('Inner: ')
print(np.inner(array, array2))
print('Outer: ')
print(np.outer(array, array2))
print('Tensor dot product: ', np.tensordot(array, array2))
print('Kronecker product: ')
print(np.kron(array, array2))
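The square-only functions that are commented out in the script need a different input to try them. Here is a minimal sketch, assuming a small symmetric, positive-definite matrix that I made up for illustration (the name square and its values are not from the script above). It also adds LA.eigvalsh, which returns the eigenvalues of a symmetric matrix:

```python
import numpy as np
from numpy import linalg as LA

# Hypothetical 2x2 symmetric, positive-definite matrix, chosen for illustration.
square = np.array([[4.0, 2.0],
                   [2.0, 3.0]])

inverse = LA.inv(square)            # matrix inverse
determinant = LA.det(square)        # 4*3 - 2*2 = 8
sign, logdet = LA.slogdet(square)   # determinant == sign * exp(logdet)
eigenvalues = LA.eigvalsh(square)   # eigenvalues of a symmetric matrix
lower = LA.cholesky(square)         # lower-triangular L with L @ L.T == square

print('Inverse:')
print(inverse)
print('Determinant:', determinant)
print('Sign and log of determinant:', sign, logdet)
print('Eigenvalues:', eigenvalues)
print('Cholesky factor:')
print(lower)
```

Cholesky is the picky one here: it wants a matrix that is not just square but also symmetric and positive-definite, which is why the example matrix was chosen that way.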

Create Quick And Dirty PDFs from Web Pages in Python

Let’s say you want to make a script that creates a PDF of a web page. pdfkit makes that pretty easy. Simply import pdfkit and then call pdfkit.from_url, passing the source location as the first parameter and the resulting file as the second. Here, we’ll use http://docs.jamf.com/10.10.1/jamf-pro/release-notes/What’s_New_in_This_Release.html as our source and call the PDF we create Release_Notes.pdf:

import pdfkit
pdfkit.from_url("http://docs.jamf.com/10.10.1/jamf-pro/release-notes/What's_New_in_This_Release.html", 'Release_Notes.pdf')

Your source could also be a standard HTML file (e.g. if you’re running from your site location), and for those you’d use pdfkit.from_file instead of pdfkit.from_url. If you don’t have pdfkit installed, you’ll need to pip install it first (pdfkit is a wrapper around wkhtmltopdf, so that needs to be installed as well):

pip install pdfkit

One last note: there are a few options you can pass to pdfkit for job processing, such as page-size, margin-top, margin-left, margin-right, and margin-bottom:

import pdfkit
myoptions = {
    'page-size': 'A4',
    'margin-top': '1in',
    'margin-left': '1in',
    'margin-right': '1in',
    'margin-bottom': '1in',
}
pdfkit.from_url("http://docs.jamf.com/10.10.1/jamf-pro/release-notes/What's_New_in_This_Release.html", 'Release_Notes.pdf', options=myoptions)

When I’ve used options, things always seem to take a long time and a lot more resources to run, so I don’t any more. But that’s just me…
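To pull the from_url and from_file cases together, here is a minimal sketch of a helper that picks between them. The function names looks_like_url and make_pdf are hypothetical (they are not part of pdfkit), and the check only treats http and https sources as URLs:

```python
from urllib.parse import urlparse


def looks_like_url(source):
    """Return True when source has an http or https scheme."""
    return urlparse(source).scheme in ("http", "https")


def make_pdf(source, output_path, options=None):
    """Render a URL or a local HTML file to output_path using pdfkit."""
    # Imported here so looks_like_url can be used without pdfkit installed.
    # pdfkit itself also needs the wkhtmltopdf binary to be available.
    import pdfkit

    if looks_like_url(source):
        return pdfkit.from_url(source, output_path, options=options)
    return pdfkit.from_file(source, output_path, options=options)
```

You’d call it with something like make_pdf('index.html', 'Release_Notes.pdf'), leaving options out unless you actually need them.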

Python Script To Move A SQL Database To .csv Files

You have a database, such as a MySQL dump of a Jamf Pro server, or a SQL dump of a WordPress site. You want to bring it into another tool or clean the data using CSV as an intermediary. Or you’re using an AWS Glue job to ETL the data. The following script will take that SQL dump and convert it into a bunch of CSV files.

To use the script, simply run it with $1 as the path to the sql file and $2 as the path to the export directory, as follows:

python sqlcsvexport.py /sql_file_path /target_dir
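To give a rough idea of what this kind of conversion involves, here is a minimal sketch. It is not the sqlcsvexport.py script itself. It assumes the dump only contains statements that SQLite can execute, so a real MySQL dump would likely need MySQL-specific syntax cleaned out first. The function name export_dump_to_csv is made up for this example:

```python
import csv
import os
import sqlite3


def export_dump_to_csv(sql_path, target_dir):
    """Load a SQL dump into in-memory SQLite and write one CSV per table."""
    with open(sql_path) as handle:
        dump = handle.read()

    conn = sqlite3.connect(":memory:")
    conn.executescript(dump)  # fails if the dump uses syntax SQLite lacks

    os.makedirs(target_dir, exist_ok=True)
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]

    written = []
    for table in tables:
        cursor = conn.execute('SELECT * FROM "{}"'.format(table))
        out_path = os.path.join(target_dir, table + ".csv")
        with open(out_path, "w", newline="") as out:
            writer = csv.writer(out)
            # First row is the column names, then every row of the table.
            writer.writerow([column[0] for column in cursor.description])
            writer.writerows(cursor)
        written.append(out_path)

    conn.close()
    return written
```

The idea is to let a database engine do the parsing rather than picking INSERT statements apart by hand, at the cost of only handling dumps that engine understands.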

To download the script:

A Bit About Python Module Imports

The first lines of practically every Python script ever begin with the word import. Per the Python docs at https://docs.python.org/2/tutorial/modules.html, a module is “a file containing Python definitions and statements.” To open a Python environment, simply open Terminal and type python:

python

To import a module just type import followed by the name of the module. A common one is os, so let’s import that:

import os

Now let’s use str to see where it is:

str(os)

You’ll then see which file actually is the os module, and so where it can be found, viewed, or edited. Now let’s do a couple of basic tasks with os. First, let’s grab the id of the user who’s running the script (still from a standard Python interpreter):

os.getuid()

Here, we’re using a function that’s built into that os module, and the () holds the parameters we’re passing to the function (none, in this case). When running in interactive mode, you can use os.environ to see what environment variables your Python script has access to (e.g. if you’re shelling out a command):

os.environ
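Since os.environ behaves like a dictionary, you can also pull out individual variables rather than dumping the whole thing. A minimal sketch, assuming HOME and PATH as the variables of interest (the "(not set)" default is just a placeholder):

```python
import os

# os.environ acts like a dictionary of strings, so .get() avoids a KeyError
# when a variable is missing. "(not set)" is just a placeholder default.
home = os.environ.get("HOME", "(not set)")

# PATH is one long string; os.pathsep is ":" on macOS/Linux and ";" on Windows.
path_entries = os.environ.get("PATH", "").split(os.pathsep)

print("HOME:", home)
print("First PATH entry:", path_entries[0])
```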

Now let’s use that os to actually shell out a bash command. Here, we’ll set a variable in the python interpreter:

bashCommand="cat /test.log"

Basically, we just set a variable called bashCommand to contain a simple cat of a log file (quoting it since it has special characters and all that). Next, we’ll call os.system, passing that variable in as the command to run:

os.system(bashCommand)
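One thing to know about os.system is that it only hands back the exit status; the output of the command goes straight to the terminal. If you want the output in a variable, the subprocess module is the usual route. A minimal sketch, assuming Python 3.7 or later and using a harmless stand-in command (the current interpreter printing a line) instead of the cat above:

```python
import subprocess
import sys

# Stand-in command for illustration: ask the current interpreter to print a line.
command = [sys.executable, "-c", "print('hello from a child process')"]

# capture_output collects stdout and stderr; text=True decodes them to str.
result = subprocess.run(command, capture_output=True, text=True)

print("Exit status:", result.returncode)
print("Output:", result.stdout.strip())
```

Passing the command as a list also sidesteps the shell, so there’s no quoting to worry about.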

Now I can delete that bashCommand variable using the del statement, still from within that Python console:

del bashCommand
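Imports themselves can be poked at from the same console. Here is a minimal sketch, assuming Python 3, that reports whether a module name is compiled into the interpreter or where it would be loaded from. The helper name describe_module is made up for this example:

```python
import importlib.util
import sys


def describe_module(name):
    """Say whether a top-level module is built in, found elsewhere, or missing."""
    if name in sys.builtin_module_names:
        return "built-in"
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "not found"
    # origin is usually the path of the file the module is loaded from.
    return spec.origin


print(describe_module("sys"))
print(describe_module("json"))

# The directories searched for modules that are not built in, in order.
for entry in sys.path:
    print(entry)
```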

When a module is imported, the interpreter first searches for a built-in module with the name you supply. If it can’t find one, it then searches for a file with that name (e.g. spam.py for a module named spam) in the list of directories given by the variable sys.path. That list starts with the directory containing the script (or the current working directory when you’re in the interpreter), followed by anything set in PYTHONPATH, and then the installation defaults. Now, let’s import urllib and check out what functions it has:

import urllib
dir(urllib)

Then let’s ask for help for one of them:

help(urllib.basejoin)

The response will look something like this (ymmv):

urljoin(base, url, allow_fragments=True)
    Join a base URL and a possibly relative URL to form an absolute
    interpretation of the latter.

Now you know what goes in the parentheses when you actually call the function from within your scripts.
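As the help text hints, urllib.basejoin is just another name for urljoin in Python 2. In Python 3 the same function lives at urllib.parse.urljoin. Here is a minimal sketch of calling it, assuming Python 3; the relative paths are hypothetical and only there for illustration:

```python
from urllib.parse import urljoin

base = "http://docs.jamf.com/10.10.1/jamf-pro/release-notes/"

# A plain relative name is resolved inside the base directory.
same_dir = urljoin(base, "index.html")

# ".." climbs one level up before adding the rest of the path.
sibling = urljoin(base, "../administrator-guide/")

print(same_dir)
print(sibling)
```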

Define docstrings in Python

Bryson mentioned docstrings in the latest episode of the MacAdmins Podcast. But how do you use them? Documentation strings (or docstrings for short) are an easy way to document Python objects (classes, functions, methods, and modules) in-line with your code. Docstrings are created using three double-quotes as the first statement of the definition and are meant to describe, in a human-readable way, what the object does.

Let’s look at an example for hello_krypted:

def hello_krypted():
    """This simply echoes Hello Krypted.

    But there's so much potential to do more!
    """
    print('Hello Krypted')

Docstrings can then be accessed using the __doc__ attribute on objects (e.g. via print):

>>> print(hello_krypted.__doc__)
This simply echoes Hello Krypted.

But there's so much potential to do more!
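Depending on the Python version, __doc__ may keep the indentation of the continuation lines exactly as typed. The inspect module has a getdoc function that returns the docstring with that indentation cleaned up, which is handy if you want to reuse the text elsewhere. A minimal sketch, assuming Python 3:

```python
import inspect


def hello_krypted():
    """This simply echoes Hello Krypted.

    But there's so much potential to do more!
    """
    print("Hello Krypted")


# getdoc strips the common leading whitespace and surrounding blank lines.
doc = inspect.getdoc(hello_krypted)

# By convention the first line is a one-line summary of the object.
summary = doc.splitlines()[0]

print(summary)
print(doc)
```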

For more on docstrings, check out the official docs at https://www.python.org/dev/peps/pep-0257/