Batch Process Sentiment Analysis for UX Research Studies

I added an option to the lightweight sentiment analysis tool I worked on recently: it can now run sentiment analysis on CSV files exported from a UX research repository like Handrail. The tool is available at https://github.com/krypted/lightweightsentiment.

This makes it pretty simple to pull in a CSV, add a column, and supply a sentiment score for each row. It's written to be generic and relies heavily on Python's NLTK. Make sure to install Python 3 before running it (e.g., brew install python3 with Homebrew). You'll also need the nltk package (pip3 install nltk) if it isn't already installed.
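The repository does its actual scoring with NLTK, and its code isn't reproduced here. As a rough sketch of the pattern described above (read a CSV, append one column, score each row), here's a standard-library version with a pluggable scorer. add_score_column, toy_score, its word lists, and the sentiment_score column name are all hypothetical stand-ins, not names from the tool.

```python
import csv
import io
from typing import Callable

# Hypothetical stand-in lexicon; the real tool scores text with NLTK.
POSITIVE = {"love", "easy", "great", "clear"}
NEGATIVE = {"confusing", "slow", "hate", "broken"}


def toy_score(text: str) -> float:
    """Positive-minus-negative word count, divided by the number of words."""
    words = text.lower().split()
    if not words:
        return 0.0
    net = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return net / len(words)


def add_score_column(csv_text: str, column: str, score: Callable[[str], float],
                     score_column: str = "sentiment_score") -> str:
    """Return csv_text with one extra column holding score(row[column]) for every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in CSV header")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=[*reader.fieldnames, score_column])
    writer.writeheader()
    for row in reader:
        # Missing or empty cells are scored as an empty string.
        row[score_column] = score(row[column] or "")
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    sample = "id,Answer\n1,I love how easy it is\n2,Search is slow and confusing\n"
    print(add_score_column(sample, "Answer", toy_score))
```

Any function that takes a string and returns a number can be passed as score. If you want an NLTK-based one, NLTK's VADER analyzer is one option (SentimentIntensityAnalyzer().polarity_scores(text)["compound"], after nltk.download("vader_lexicon")), though that isn't necessarily what the tool uses internally.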

Once Python 3 is installed, download the files from https://github.com/krypted/lightweightsentiment into a directory (e.g., on your desktop or wherever you like to keep such things). Next, download your CSV from Handrail: open Handrail, click the study you want to export, click Results, and then click the Export Data button.

The data export will show up in your downloads directory (~/Downloads on a Mac). Drag the files into the folder with the sentiment scripts so the CSV sits alongside them.

Now let's fire up Terminal and run the script. First, change your working directory to the script folder. If it's in ~/Downloads/sentiment, you'd run:

cd ~/Downloads/sentiment

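Then run it. I'm going to run it on the answers and questions file, but it could just as easily be run on a column in the notes export, etc. Pass the CSV with --file and the header of the column to score with --column: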

python3 parse_csv.py --file="Episodes_analysis_page.csv" --column="Group Description"
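The quotes in that command matter: "Group Description" contains a space, so without them the shell would split it into two separate arguments. For anyone curious how flags like --file and --column are typically read, here's a minimal argparse sketch. I haven't confirmed that parse_csv.py parses its arguments this way, and parse_cli, output_path, and the _scored naming rule are hypothetical.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def parse_cli(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Read --file and --column, mirroring the flags in the parse_csv.py command."""
    parser = argparse.ArgumentParser(description="Add a sentiment score column to a CSV.")
    parser.add_argument("--file", required=True, help="CSV exported from the research repository")
    parser.add_argument("--column", required=True, help="header of the column to score")
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv)


def output_path(input_file: str) -> Path:
    """Hypothetical naming rule: write next to the input as <stem>_scored<suffix>."""
    src = Path(input_file)
    return src.with_name(f"{src.stem}_scored{src.suffix}")
```

Called as parse_cli(["--file=Episodes_analysis_page.csv", "--column=Group Description"]), this yields file and column values matching the command above, and output_path would point at Episodes_analysis_page_scored.csv in the same folder.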

A new file will be created containing the original rows plus a sentiment score for each one.

This type of quantifiable overlay on qualitative research is just another arrow in the UX researcher's quiver. The resulting CSV can then be dragged into analytical tools like Domo or Splunk, visualization tools like Tableau, or even hooked into a workflow that automates against the Jira command line, as described at https://krypted.com/programming-2/create-jira-issues-from-the-command-line/. In fact, the script was written to be modular (which makes it a little harder to use manually than it would otherwise be) for the express purpose of being implemented as a series of lambdas that run as a serverless component of a continual research workflow. So, hope it helps someone else out there.
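On the serverless point: the post doesn't spell out how the modules are split across functions, so here's a hypothetical two-step layout using the standard AWS Lambda Python handler signature, handler(event, context). The event keys (csv, column, texts, scores) and the placeholder word-list scorer are assumptions for illustration. In a real pipeline the second step would call the NLTK-based model, and something like AWS Step Functions would pass each step's return value to the next.

```python
import csv
import io
from typing import Any, Dict


def extract_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Step 1 (hypothetical): pull one column out of CSV text passed in the event."""
    reader = csv.DictReader(io.StringIO(event["csv"]))
    # Empty or missing cells become "" so the next step always receives strings.
    texts = [row.get(event["column"]) or "" for row in reader]
    return {"texts": texts}


def score_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Step 2 (hypothetical): score each text with placeholder word lists."""
    positive = {"love", "easy", "great", "clear"}
    negative = {"confusing", "slow", "hate", "broken"}
    scores = []
    for text in event["texts"]:
        words = text.lower().split()
        net = sum(w in positive for w in words) - sum(w in negative for w in words)
        scores.append(net / len(words) if words else 0.0)
    return {"scores": scores}
```

Because each step only takes and returns plain dictionaries, you can chain them locally for testing, e.g. score_handler(extract_handler({"csv": text, "column": "Answer"}, None), None), before deploying them as separate functions.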

Finally, there's a data.json file. The more training data you add to that file, the more accurate the analysis will get. That training data will look different for every industry and geography, since each uses different words to describe things. The default training data provides a decent bell curve, but YMMV.
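The actual schema of data.json isn't covered here, so the sketch below assumes a hypothetical list of {"text": ..., "label": "pos" or "neg"} records. It learns a per-word log-odds weight with add-one smoothing (the word-level core of a multinomial naive Bayes classifier, ignoring class priors). That illustrates why domain-specific examples help: a word that never appears in the training data gets a weight of zero and simply doesn't move the score. train_weights and score are hypothetical names, not the tool's training code.

```python
import json
import math
from collections import Counter
from typing import Dict


def train_weights(data_json: str) -> Dict[str, float]:
    """Learn per-word log-odds (positive vs. negative) from labeled examples."""
    examples = json.loads(data_json)  # assumed: [{"text": ..., "label": "pos"|"neg"}, ...]
    counts = {"pos": Counter(), "neg": Counter()}
    for ex in examples:
        counts[ex["label"]].update(ex["text"].lower().split())
    vocab = set(counts["pos"]) | set(counts["neg"])
    # Add-one smoothing: every vocabulary word gets one pseudo-count per class.
    pos_total = sum(counts["pos"].values()) + len(vocab)
    neg_total = sum(counts["neg"].values()) + len(vocab)
    return {
        w: math.log((counts["pos"][w] + 1) / pos_total)
        - math.log((counts["neg"][w] + 1) / neg_total)
        for w in vocab
    }


def score(text: str, weights: Dict[str, float]) -> float:
    """Sum of word weights; above zero leans positive. Unseen words count as 0."""
    return sum(weights.get(w, 0.0) for w in text.lower().split())
```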