
7 Python Machine Learning Modules To Get Acquainted With

A lot of machine learning work is done in Python, so there are a number of Python modules that make normalizing data, analyzing data, and training algorithms much easier. Let’s look at a few of these, and if you have suggestions, please feel free to comment!

TensorFlow

TensorFlow is more of an ecosystem than a single module, but you can import it into a Python project and build some pretty amazing stuff relatively quickly. You usually see it imported as just tf:

import tensorflow as tf

While it helps to have a basis in linear algebra and vector calculus, that’s not entirely necessary. A tensor is similar to a vector but more general: in practice it’s a multidimensional array of numbers, where a single number, a vector, and a matrix are tensors with zero, one, and two dimensions. So think of TensorFlow as building these large arrays, running math across them, and training models on the results. A common example is to build an array of the attributes of an object, like all the metadata captured about a movie in Netflix, and then look for other movies whose array is similar, or the “nearest neighbor” of that object.
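To make the nearest-neighbor idea concrete, here is a minimal sketch in plain Python rather than TensorFlow itself. The movie titles and feature numbers are made up for illustration, and cosine similarity is just one reasonable way to measure how alike two arrays are.

```python
import math

# Hypothetical feature vectors: [action, comedy, romance] scores per movie.
movies = {
    "Movie A": [0.9, 0.1, 0.0],
    "Movie B": [0.8, 0.2, 0.1],
    "Movie C": [0.1, 0.9, 0.6],
}

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbor(title, catalog):
    """Find the other title whose vector is most similar to the given one."""
    target = catalog[title]
    others = (name for name in catalog if name != title)
    return max(others, key=lambda name: cosine_similarity(target, catalog[name]))

print(nearest_neighbor("Movie A", movies))
```

A real recommender would work with far more attributes and far more movies, which is where a library built around large arrays starts to earn its keep.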

TensorFlow can do much, much more. It’s one of the more powerful machine learning libraries, although all that flexibility can make it feel a bit bloated.

NumPy

Take just a little number nerderation rather than full-on machine learning and that’s NumPy. The smaller the module you need to do something in, the better off you are. To see it in action, here’s a krypted article from a while back.

NumPy can handle much more complicated statistical analysis than that, and I really like the visual of the 8th dwarf from Snow White being a machine learning nerd called numpy that they never invite to hang out. But that’s beside the point. Install it and import it as np to get started:

pip install numpy

import numpy as np
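As a small taste of the statistics side, here is a sketch that flags an unusually low reading using z-scores. The disk space numbers and the 1.5 cutoff are assumptions picked for the example, not recommendations.

```python
import numpy as np

# Hypothetical sample: free disk space (in GB) reported by six devices.
free_gb = np.array([120.0, 95.5, 20.0, 110.0, 101.5, 99.0])

mean = free_gb.mean()
std = free_gb.std()  # population standard deviation

# Vectorized z-scores: how many standard deviations each value is from the mean.
z_scores = (free_gb - mean) / std

# A boolean mask selects values more than 1.5 standard deviations below the mean.
low_outliers = free_gb[z_scores < -1.5]

print(f"mean={mean:.1f} GB, std={std:.1f} GB")
print("unusually low:", low_outliers)
```

Notice there isn’t a single loop in there. NumPy applies the arithmetic to the whole array at once, which is most of its appeal.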

pandas

If you need to store different types of data together, you will need to move beyond NumPy and into something more sophisticated like pandas (or Python Data Analysis Library). Multidimensional structured data sets need a lot of data wrangling, and pandas is probably the best tool for doing that. It’s often imported as pd:

import pandas as pd

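Once imported, you can easily bring data in with a reader that matches the file type, such as pd.read_csv(), pd.read_json(), or pd.read_excel(). Each of those returns a DataFrame, and you can also build one from data already in memory with pd.DataFrame(). From there, there are lots of great options for filtering, sorting, joining, combining, grouping, selecting, and cleaning the data.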
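Here’s a rough sketch of a few of those operations on a tiny inventory. The column names and values are invented for the example, and the CSV is read from a string only so the snippet runs without a file on disk.

```python
import io
import pandas as pd

# Hypothetical inventory export; in practice this would come from a file on disk.
csv_text = """serial,model,os_version,free_gb
C01,MacBook Pro,14.4,120.0
C02,MacBook Air,13.6,20.0
C03,MacBook Pro,14.4,95.5
C04,MacBook Air,14.4,
"""

# read_csv returns a DataFrame directly; StringIO stands in for a file path.
# dtype keeps os_version as text so "14.4" is not parsed as a float.
devices = pd.read_csv(io.StringIO(csv_text), dtype={"os_version": str})

# Cleaning: drop rows with no free-space reading.
devices = devices.dropna(subset=["free_gb"])

# Filtering and sorting: devices with under 100 GB free, lowest first.
low_space = devices[devices["free_gb"] < 100].sort_values("free_gb")

# Grouping: average free space per model.
avg_by_model = devices.groupby("model")["free_gb"].mean()

print(low_space[["serial", "free_gb"]])
print(avg_by_model)
```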

PyMongo

MongoDB is a database that stores data as JSON-like documents rather than as rows in a table. This is great for unstructured data where you maybe don’t have a common schema for the objects being imported. PyMongo is the Python driver for it, and it’s easy to install:

pip install pymongo

Once installed, import PyMongo to interact with a MongoDB server:

import pymongo

You can then open a connection:

connection = pymongo.MongoClient('mongodb://localhost:27017')

Then list some database names:

connection.list_database_names()

Matplotlib

Often seen in scripts as plt for short (the usual alias for its matplotlib.pyplot module), Matplotlib is, as the name implies, great for plotting information in 2D for easy visualizations. Never underestimate plotting data on an x and y axis and just eyeballing it to point you in the direction of how you want to train a model.
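A minimal sketch of that kind of eyeball plot might look like the following. The data points and the output filename are made up, and the Agg backend is only selected so the script runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical data: device age in years vs. support tickets filed.
age_years = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0]
tickets = [1, 1, 2, 2, 4, 6, 9]

fig, ax = plt.subplots()
ax.scatter(age_years, tickets)
ax.set_xlabel("Device age (years)")
ax.set_ylabel("Support tickets")
ax.set_title("Tickets vs. device age")

# Save to a file; call plt.show() instead when working interactively.
fig.savefig("tickets_vs_age.png")
```

Even with seven points you can see the curve bending upward, which is exactly the kind of hint that tells you a straight line might not be the right model.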

Scikit-learn

Usually just imported as sklearn, this is a library that has algorithms for classification, clustering, and regression. It builds on, and so requires, SciPy and NumPy. There are a lot of science data sets you can import to start training models quickly. But a great use is around stuff like malware detection, like this: https://www.randhome.io/blog/2016/07/16/machine-learning-for-malware-detection/. To install and import, simply:

pip install scikit-learn

import sklearn
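To show how quickly one of those bundled data sets gets you to a trained model, here is a short sketch using the iris flower data. The choice of a nearest-neighbors classifier, the 25% test split, and the random seed are all arbitrary picks for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small bundled data set: 150 flowers, 4 measurements each, 3 species.
X, y = load_iris(return_X_y=True)

# Hold back a quarter of the rows to check the model on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Classify each flower by a vote among its 5 nearest neighbors.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"accuracy on held-out data: {accuracy:.2f}")
```

Most scikit-learn models follow this same fit-then-score pattern, so swapping in a different algorithm is usually a one-line change.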

SciPy

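SciPy is also built on top of NumPy and gives you more functions, for things like optimization, signal processing, and statistics. You’ll sometimes see it imported as just sp, though it’s more common to import only the submodule you need, such as scipy.stats.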
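As one example of those extra functions, this sketch fits a straight line to a handful of points with scipy.stats. The numbers are invented, and a linear fit is only an assumption about how the two values relate.

```python
from scipy import stats

# Hypothetical data: device age in years vs. support tickets filed.
age_years = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0]
tickets = [1, 1, 2, 2, 4, 6, 9]

# Ordinary least-squares fit of tickets against age.
result = stats.linregress(age_years, tickets)

print(f"slope={result.slope:.2f} tickets per year")
print(f"r={result.rvalue:.2f}, p={result.pvalue:.4f}")
```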

Those of us responsible for the management of devices have a lot of data at our fingertips. This might just be inventory data, logs, or even information joined (aka enriched) from different systems. As such, we have the ability to infer much more about devices, behaviors, malware, and what our users might need than a lot of other people. The term machine learning might make a lot of systems administrators or engineers think they need a data scientist around to do that kind of work, but if you can do some basic Python scripting and get all that data into a standard format, you can get a lot further than you think, and pretty quickly. Happy number nerding!