Using Natural Language Processing (AWS Comprehend) to Analyze Text (Cardi B Lyrics)

Humans spend a lot of time reading, analyzing, and responding through text (emails, chats, etc.). A lot of this is inefficient or not for pleasure (such as the payroll hours companies spend reading through feedback emails or the amount of time I spend sifting through Outlook each day). Using Natural Language Processing (NLP), we can reduce the inefficient and not-for-pleasure reading we do so that time can be reinvested into something more productive or fulfilling.

For fun, I scrappily ran the lyrics to Cardi B’s “I Like It” through AWS Comprehend to see what its response would be. I also ran a review of Mission Impossible: Fallout through the same service. The full output for Cardi B can be viewed here. The full output for Mission Impossible: Fallout can be viewed here.

While these are low-value examples, a more real-world use case could be using AWS Comprehend to detect the language of emails sent to your company and adjust the routing destination in real time (should they go to your English team or your Spanish team?). Another example would be using Comprehend to collect feedback on your new product launch or ad campaign. For example, we could capture Twitter mentions for a brand, funnel those into an S3 bucket, and run the contents of that bucket through Comprehend to split out negative vs positive vs neutral vs mixed sentiment mentions. From there, we could surface the most frequent adjectives and entities mentioned within each sentiment bucket. It’s a cheap, quick way to capture and analyze customer feedback that would otherwise be ignored.
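As a rough sketch of the sentiment-bucketing idea, here is what the grouping step could look like once each mention has already been run through DetectSentiment. The `mentions` array and its `text`/`sentiment` field names are assumptions for illustration; only the four sentiment labels come from Comprehend.

```javascript
// Group already-analyzed mentions by their Comprehend sentiment label.
// The { text, sentiment } record shape is an assumption for illustration.
function bucketBySentiment(mentions) {
  const buckets = { POSITIVE: [], NEGATIVE: [], NEUTRAL: [], MIXED: [] };
  for (const mention of mentions) {
    // Skip anything carrying a label we do not recognize.
    if (Object.prototype.hasOwnProperty.call(buckets, mention.sentiment)) {
      buckets[mention.sentiment].push(mention.text);
    }
  }
  return buckets;
}

// Made-up sample mentions.
const sample = [
  { text: 'Love the new launch!', sentiment: 'POSITIVE' },
  { text: 'Shipping took forever.', sentiment: 'NEGATIVE' },
  { text: 'Ordered one yesterday.', sentiment: 'NEUTRAL' },
];
console.log(bucketBySentiment(sample));
```

From buckets like these, the per-sentiment entity and adjective counts would be a second pass over each list.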

Initial Setup
In each of the last couple of posts, I’ve outlined how to create an IAM user for your project, so I won’t repeat that here. After we have our IAM user created and the credentials added to our ~/.aws/credentials file, we’ll import the AWS PHP SDK and ComprehendClient. Next, we’ll create a Comprehend client and define the API version, region, and credentials profile to use:

<?php

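// Load the Composer autoloader so the AWS SDK classes are available.
require '/home/vendor/autoload.php';
use Aws\Comprehend\ComprehendClient;

// 'profile' names the entry in the shared credentials file.
$client = new ComprehendClient([
    'version'     => '2017-11-27',
    'region'      => 'us-west-2',
    'profile'     => 'fk-comprehend',
]);

// The text to analyze.
$review = "Your Comprehend text";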

Exploring the Options
In this example, we’ll detect the language(s), entities (objects, businesses, etc), key phrases, sentiment, and syntax (parts of speech) of our sample texts (Cardi B lyrics and a movie review).  For all of these except DetectDominantLanguage, the language is a required input.  If we use Comprehend to identify that first, then we can simply repeat its output in later functions.  For each output, Comprehend also spits out a confidence score which basically  tells you how confident it is in the output.  This could be used to ignore low-confidence suggestions, thus increasing the accuracy of the models you build using Comprehend.
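To make the confidence-score idea concrete, here is a minimal sketch of dropping low-confidence items before using them. It is written in plain JavaScript so it runs standalone, it works on a hand-written sample shaped like the KeyPhrases list shown later in this post, and the 0.8 cutoff is an arbitrary assumption rather than a recommended value.

```javascript
// Keep only items whose Score meets a minimum confidence.
// Works on any list where each item carries a numeric Score between 0 and 1.
function filterByConfidence(items, minScore) {
  return items.filter(
    (item) => typeof item.Score === 'number' && item.Score >= minScore
  );
}

// Hand-written sample shaped like the KeyPhrases list in a DetectKeyPhrases response.
const keyPhrases = [
  { Text: 'the lyrics', Score: 0.98 },
  { Text: 'a review', Score: 0.42 },
];
console.log(filterByConfidence(keyPhrases, 0.8));
```

The right threshold depends on how costly a wrong suggestion is for your use case, so it is worth tuning against real output.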

DetectDominantLanguage Example
This will detect the language and spit out the ISO abbreviation.

//Detecting Dominant Language
$result = $client->detectDominantLanguage([
    "Text" => "$review",
]);

echo "<h1>DetectDominantLanguage</h1><pre>";
print_r($result);
echo "</pre>";

foreach ($result['Languages'] as $phrase) {
    echo "Language ".$phrase['LanguageCode']." has a confidence score of ".round($phrase['Score']*100)."%.<br />";
}
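The later examples hard-code "en" as the LanguageCode. As a sketch of reusing the detected language instead, the helper below picks the highest-scoring entry from a result shaped like the Languages list above. The function name and the fallback behavior are my own assumptions, and the sample result is hand-written rather than real Comprehend output.

```javascript
// Pick the highest-scoring language from a DetectDominantLanguage-style result.
// Returns the fallback when the list is empty or missing.
function pickLanguageCode(result, fallback) {
  const languages = (result && result.Languages) || [];
  if (languages.length === 0) return fallback;
  const best = languages.reduce((a, b) => (b.Score > a.Score ? b : a));
  return best.LanguageCode;
}

// Hand-written sample result with two candidate languages.
const sampleResult = {
  Languages: [
    { LanguageCode: 'es', Score: 0.21 },
    { LanguageCode: 'en', Score: 0.78 },
  ],
};
console.log(pickLanguageCode(sampleResult, 'en'));
```

A fallback is useful because the other operations only accept the languages Comprehend supports for them.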

DetectSentiment Example

//Detecting Sentiment
$result = $client->detectSentiment([
    "LanguageCode" => "en",
    "Text" => "$review",
]);

echo "<h1>DetectSentiment</h1><pre>";
print_r($result);
echo "</pre>";

echo "Sentiment: ".$result['Sentiment']."<br />";
echo "Positive: ".round($result['SentimentScore']['Positive']*100)."%<br />";
echo "Negative: ".round($result['SentimentScore']['Negative']*100)."%<br />";
echo "Neutral: ".round($result['SentimentScore']['Neutral']*100)."%<br />";
echo "Mixed: ".round($result['SentimentScore']['Mixed']*100)."%<br />";

DetectKeyPhrases Example

//Detecting KeyPhrases
$result = $client->detectKeyPhrases([
    "LanguageCode" => "en",
    "Text" => "$review",
]);

echo "<h1>DetectKeyPhrases</h1><pre>";
print_r($result);
echo "</pre>";

foreach ($result['KeyPhrases'] as $phrase) {
    echo "Phrase ".$phrase['Text']." has a score of ".round($phrase['Score']*100)."%.<br />";
}

DetectSyntax Example

//Detecting Syntax
$result = $client->detectSyntax([
    "LanguageCode" => "en",
    "Text" => "$review",
]);

echo "<h1>DetectSyntax</h1><pre>";
print_r($result);
echo "</pre>";

foreach ($result['SyntaxTokens'] as $syntax) {
    echo "Token ".$syntax['Text']." is tagged as ".$syntax['PartOfSpeech']['Tag']." (with ".round($syntax['PartOfSpeech']['Score']*100)."% confidence).<br />";
}

DetectEntities Example

//Detecting Entities
$result = $client->detectEntities([
    "LanguageCode" => "en",
    "Text" => "$review",
]);

echo "<h1>DetectEntities</h1><pre>";
print_r($result);
echo "</pre>";

foreach ($result['Entities'] as $entity) {
    echo "Entity ".$entity['Text']." is identified as ".$entity['Type']." (".round($entity['Score']*100)."% confidence).<br />";
}

The Results for Cardi B and Tom Cruise

The full output for Cardi B can be viewed here. This one is the more interesting of the two, as “I Like It” has a Spanish verse. You can see how Comprehend dealt with it when it was passed as English. It also does a good job of determining when “bitch” is a noun vs an adjective, except in the line “Where’s my pen? Bitch I’m signin'”, though I’m unsure as to why.

The full output for Mission Impossible: Fallout can be viewed here. The interesting piece here is the sentiment analysis: NEUTRAL (8% positive, 36% negative, 39% neutral, and 17% mixed). After reading the review, I would say this is pretty in line with the reviewer’s tone, and Comprehend did a good job of identifying the overall sentiment of the article.

Using AWS Lambda to Send SNS Topics in CloudWatch

AWS Lambda enables you to run code without managing a server. You simply plop in your code and it does the rest (no maintenance, scaling concerns, etc.). Requests cost only $0.20 per million, and the first million requests each month are free (compute time is billed separately).

In the previous post, I set up an SNS topic. I’m extending this further so that a Node.js function will be triggered in AWS Lambda each time a message is published to my SNS topic. This Lambda function will feed metrics into AWS CloudWatch, which will allow me to chart, monitor, and set alarms against events or patterns with my SNS topic. A practical use case for this could be understanding event patterns or logging SNS messages (and their contents) sent to your customers.

Creating your Lambda Function

From the Lambda page of the AWS console, select “Create Function”.  From here, we’ll author from scratch.  Below are the inputs I’ve used for this example:
Name: SNSPingerToCloudWatch
Runtime: Node.js 8.10
Role: Choose an existing role
Existing role: lambda_basic_execution

On the page after selecting “Create Function”, we’ll click “SNS” from the “Add Triggers” section and then select our SNS topic in the “Configure Triggers” section.  Then click “Add” and “Save”.  Here’s a screenshot of the final state.

Next, click on your function name (SNSPingerToCloudWatch) in the flow chart and scroll to edit the function code.
The JS we’ll use:

exports.handler = async (event, context) => {
    const message = event.Records[0].Sns.Message;
    console.log('Pinger says:', message);
    return message;
};
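The handler above reads only the first record. As a sketch of logging more of what SNS delivers, the helper below walks every record and pulls out a few of the fields an SNS-triggered event carries. The helper name and the sample values are placeholders of mine, not real identifiers.

```javascript
// Pull a few fields out of every record in an SNS-triggered Lambda event.
function summarizeSnsEvent(event) {
  const records = (event && event.Records) || [];
  return records.map((record) => ({
    messageId: record.Sns.MessageId,
    topicArn: record.Sns.TopicArn,
    subject: record.Sns.Subject,
    message: record.Sns.Message,
    timestamp: record.Sns.Timestamp,
  }));
}

// Hand-built sample event; the values are placeholders, not real identifiers.
const sampleEvent = {
  Records: [
    {
      Sns: {
        MessageId: 'example-id',
        TopicArn: 'arn:aws:sns:us-west-2:123456789012:example-topic',
        Subject: 'Ping',
        Message: 'pong',
        Timestamp: '2018-08-01T00:00:00.000Z',
      },
    },
  ],
};
console.log(summarizeSnsEvent(sampleEvent));
```

Inside the handler, logging `summarizeSnsEvent(event)` with console.log would put these fields into the function's CloudWatch log stream.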

Under Basic Settings, I’ve set the timeout duration to 5 seconds (because that’s the timeout duration I have set in my SNS topic PHP script). You can add descriptions, throttles, etc., but I’m leaving those at the defaults. Here’s a screenshot of my final config for this Lambda function.

Once complete, click “Save” again and then we’re ready to test. I manually fired my SNS topic and jumped over to the “Monitoring” tab of the console. It took a minute or so but I saw my event appear. From here, you can view the log details in CloudWatch, as well.
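The handler in this post only writes to the log, and the Monitoring tab shows Lambda's built-in invocation metrics. If you wanted a custom metric to chart or alarm on, one option is CloudWatch's PutMetricData call. The sketch below only builds the parameter object so it runs without AWS access; the namespace, metric name, and function name are made-up examples.

```javascript
// Build the parameter object for a CloudWatch PutMetricData call that records
// how many SNS messages arrived in one invocation.
// Namespace and MetricName are made-up examples.
function buildMetricParams(event, now) {
  const records = (event && event.Records) || [];
  return {
    Namespace: 'Example/SNSPinger',
    MetricData: [
      {
        MetricName: 'MessagesReceived',
        Timestamp: now,
        Unit: 'Count',
        Value: records.length,
      },
    ],
  };
}

const params = buildMetricParams({ Records: [{}, {}] }, new Date());
console.log(JSON.stringify(params, null, 2));
// Inside a Lambda handler these params could be passed to the AWS SDK's
// CloudWatch client (putMetricData); the execution role would also need
// the cloudwatch:PutMetricData permission.
```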

Indexing my movie collection

#FirstWorldProblems – having so many DVDs that you forget what you already own and end up buying multiple copies of the same movie.  While 126 movies isn’t a massive collection, it’s enough for me to sometimes forget what I have when I’m pillaging the $5 bins at Best Buy and Target.
To solve for this, I created a Google Sheets list of my collection so I could check what I have from my phone.  After typing all the titles into the list, I realized it’d be very easy for me to use the code I wrote for my DirecTV project to scrape additional details for the movies and create a nice, simple UI….so I did:
[Screenshot: v1]

What it does

  1. Using The Movie DB API, I pull several pieces of information about the film and store it locally: title, image, release date, rating, budget, revenue, runtime, synopsis, genres, cast, etc.
  2. Storing it locally reduces repetitive, slow API calls and allows me to cleanly add additional attributes like whether it’s DVD, Blu-Ray, Google Movies, Amazon Video, etc.
  3. Adding new titles is easy – I just type in the name and the rest of the details populate immediately.
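The store-locally-first approach in the list above can be sketched as a small cache-first lookup. This is an illustration rather than the project's actual code: `fetchDetails` is a hypothetical stand-in for whatever function calls the movie API, and an in-memory Map stands in for the local database.

```javascript
// Cache-first lookup: only call the remote API for titles not stored locally.
// `fetchDetails` is a hypothetical stand-in for the function that calls the movie API.
function createMovieStore(fetchDetails) {
  const cache = new Map();
  return {
    get(title) {
      // Normalize the key so "Heat" and " heat " share one entry.
      const key = title.trim().toLowerCase();
      if (!cache.has(key)) {
        cache.set(key, fetchDetails(title));
      }
      return cache.get(key);
    },
  };
}

// Fake fetcher so the sketch runs without network access or an API key.
let apiCalls = 0;
const store = createMovieStore((title) => {
  apiCalls += 1;
  return { title, format: 'DVD' };
});

store.get('Heat');
store.get(' heat ');
console.log('API calls made:', apiCalls);
```

If `fetchDetails` returned a Promise, the Promise itself would be cached, so repeat lookups would still avoid a second API call.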

 
There are two views: one shown above for desktop and another, more compact view, when a tablet or mobile device is detected:
[Screenshot: v2]
I’m not sure what’s next for this but it was a quick and fun project that connected my inner home automation and movie geeks.

Logging router traffic and other changes

It’s been a slow couple of months between the holidays, travelling, and work. I did manage to accomplish a few things with the Home Dashboard project, though. I redesigned the UI to move away from an exclusively mobile interface, as the amount and type of data I’m now including in the project simply don’t make sense to squeeze into a mobile UI. Sometime in early January, the system passed the 2 millionth record milestone. I’m unsure what I’ll do with some of the data I’m collecting at this point, but I’ve learned a lot through collecting it and I’m sure I’ll learn more analyzing it at some point in the future.
[Image: _stats]
This brings the list of events I’m collecting to:

  1. Indoor temperature and humidity
  2. Amazon Echo music events
  3. DirecTV program and DVR information
  4. Cell phone location and status details
  5. Local fire and police emergency events
  6. Home lights and other Wink hub events
  7. …and now home network information

Analyzing home network information

The biggest change was the addition of network event logging.  After seeing that a foreign IP was accessing my LAN, I started logging each request to or from my home network until I was sure I had fixed the vulnerability.  After that, I found the information interesting so I just kept logging it.  For example, I was able to discover that an app on my phone was making repeated calls (~2,000 per day) to an app monitoring service (New Relic) which wasn’t doing my phone’s battery life any favors.
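Spotting a chatty app like that comes down to counting requests per device and destination. Here is a minimal sketch of that tally; the `{ device, host }` log-entry shape and the sample hosts are assumptions for illustration, not the schema my logger actually uses.

```javascript
// Count logged requests per device and destination host, busiest first.
// The { device, host } log-entry shape is an assumption for illustration.
function countByDeviceAndHost(entries) {
  const counts = new Map();
  for (const { device, host } of entries) {
    const key = `${device} -> ${host}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Convert to [key, count] pairs and sort by count, descending.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Made-up sample log entries.
const log = [
  { device: 'laptop', host: 'example.org' },
  { device: 'phone', host: 'monitoring.example.com' },
  { device: 'phone', host: 'monitoring.example.com' },
];
console.log(countByDeviceAndHost(log));
```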
[Image: _routerlogs]
[Image: _routerlogsbydevice]

Collecting additional phone data

After launching Location to HTTP, I’ve been tinkering with additional data collection from my cellphone such as Bluetooth, WiFi, GPS, battery, and screen status.  After collecting this information for a month or so, here are some useless data points from the most recent 30 days:

  • I’ve actively used my phone for 104.5 hours (3.5 hrs per day – I need to cut back on the work email…)
  • Average battery level: 61%
  • Average free memory: 516 MB (12.6%)
  • Average location accuracy: +/- 31 ft
  • Average altitude: 154 ft
  • Average speed: 1 mph
  • Average uptime: 153.3 hrs
  • Maximum uptime: 437.4 hrs

[Image: _phonestats]
I also improved the location history mapping to show different color map markers depending on the age of the record and phone details at the time the record was made:


 

Improved home climate visuals

I added some simple graphs of month-over-month (MoM) temperature and humidity and also updated the heat maps for the daily climate information. These are a bit easier to read than those I had in the previous UI. It’s interesting to see the effectiveness of thermostat automation and our daily routines.

More detailed emergency events

Lastly, I expanded on the emergency event information to surface the top event types and the total number of events by type:


 

Collecting and Handling 911 Event Data

Seattle has a pretty awesome approach to data availability and transparency through data.Seattle.gov.  The city has thousands of data sets available (from in-car police video records to land zoning to real-time emergency feeds) and Socrata, a Seattle-based company, has worked with the city (and many other cities) to allow developers to engage this data however they like.  I spent some time playing around with some of the data sets and decided it’d be nice to know when police and fire events occurred near my apartment.
I set up a script to pull the fire and police calls for events occurring within 500 meters of my apartment and started storing them in a local database (Socrata makes it so simple; amazing work by that team). While reading each event from the API, I check its proximity to my address and its type (burglary, suspicious person, traffic stop, etc.) and trigger emails for the ones I really want to know about (such as a nearby rape, burglary, shooting, or vehicle theft). I decided to store all events, even traffic stops, just because. I may find a use for them later. Who knows…
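One common way to do a proximity check like this is the haversine formula, which gives the great-circle distance between two latitude/longitude points. The sketch below is a generic illustration and not necessarily how the script does it; the coordinates are placeholders in downtown Seattle, not a real address.

```javascript
// Great-circle distance in meters between two lat/lon points (haversine formula).
function distanceMeters(lat1, lon1, lat2, lon2) {
  const R = 6371000; // mean Earth radius in meters
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}

// True when an event falls within `radius` meters of home.
function isNearby(home, event, radius) {
  return distanceMeters(home.lat, home.lon, event.lat, event.lon) <= radius;
}

// Placeholder coordinates, not a real address.
const home = { lat: 47.6062, lon: -122.3321 };
console.log(isNearby(home, { lat: 47.608, lon: -122.3321 }, 500));
console.log(isNearby(home, { lat: 47.62, lon: -122.3321 }, 500));
```

The first sample event is roughly 200 meters north of the home point and the second roughly 1.5 km, so only the first falls inside the 500 meter radius.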
After I’ve scrubbed through and sent any notifications for events I care about, I display the data in a simple table in my existing home dashboard and highlight in red any rows for events which are within a certain area of my apartment.
[Image: 911table.png]
To add a nice visual, I also plot the most recent events on a map using the Google Maps API.  Police events are noted with blue pins, fire events are noted with red pins:
[Image: 911map]
Clicking the pins will give us some details about the event:
[Image: 911detail.png]
All told, it was a pretty simple project which helped me gain some experience with the Google Maps API and also poke around with some of the data the city provides. I’m sure I’ll be doing a bit more of that in the future. These two projects have been integrated back into my home automation dashboard so I can continue to build on them.