Sentiment Analysis using Machine Learning .NET (Part 4 of 5)

This is the fourth part of a five-part blog series on Machine Learning .NET (ML.NET). Here are the links to the previous posts in this series:

Part 1, an introduction to Machine Learning .NET: https://cloudandmobileblog.com/2018/07/09/introduction-of-machine-learning-net-part-1-of-5/

Part 2, clustering in Machine Learning .NET: https://cloudandmobileblog.com/2018/07/15/clustering-in-machinelearning-net/

Part 3, understanding binary classification: https://cloudandmobileblog.com/2018/07/28/understanding-binary-classification-using-sentiment-analysis-through-ml-net-part-3-of-5/


Classification is a machine learning task that uses data to determine the category, type, or class of an item or row of data. For example, you can use classification to:

  • Identify sentiment as positive or negative.
  • Classify email as spam, junk, or good.
  • Determine whether a patient’s lab sample is cancerous.
  • Categorize customers by their propensity to respond to a sales campaign.

Classification tasks are frequently one of the following types:

  • Binary: either A or B.
  • Multiclass: multiple categories that can be predicted by using a single model.
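
To make the difference concrete, here is a minimal Python sketch (not ML.NET code). A binary classifier reduces to thresholding a single score, while a multiclass classifier scores every category and picks the best one. The function names, scores, and the 0.5 threshold are illustrative assumptions.

```python
def predict_binary(probability: float, threshold: float = 0.5) -> bool:
    """Binary: one score decides between exactly two classes (A or B)."""
    return probability >= threshold


def predict_multiclass(scores: dict[str, float]) -> str:
    """Multiclass: a single model scores every category and the highest score wins."""
    return max(scores, key=scores.get)


print(predict_binary(0.83))                                         # True (positive)
print(predict_multiclass({"spam": 0.1, "junk": 0.3, "good": 0.6}))  # good
```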

This sample is a console app that uses ML.NET to train a model that classifies sentiment as either positive or negative. It also evaluates the model against a second dataset to check its quality. The training and test data are the IMDB and Yelp sentiment files listed in Step 2.

We first need to understand the problem so that we can break it down into parts that support building and training the model. Breaking the problem down lets you predict and evaluate the results.

The problem in this tutorial is to understand the sentiment of incoming website comments so that the appropriate action can be taken.

We can break the problem down into the sentiment text and sentiment value of the data used to train the model, and a predicted sentiment value that you can evaluate and then use operationally.
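
This decomposition maps onto two small record types: one for a labelled row (text plus known sentiment) and one for the model's prediction. The Python dataclasses below only illustrate that shape. They are not the post's C# classes (SentimentData, SentimentPrediction), and the names `SentimentRow` and `PredictedSentiment`, as well as the True/False encoding, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SentimentRow:
    """Hypothetical training/test input: the comment text and its known sentiment."""
    text: str
    sentiment: bool  # assumed encoding: True = positive, False = negative


@dataclass
class PredictedSentiment:
    """Hypothetical model output: the predicted sentiment to compare against the label."""
    predicted: bool


row = SentimentRow(text="I loved this movie", sentiment=True)
guess = PredictedSentiment(predicted=True)
print(row.sentiment == guess.predicted)  # True: the prediction matches the label
```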


Step 1: Create a new .NET Core console app. I am using Visual Studio for Mac, as shown below; you can also use Visual Studio Code on Linux or Visual Studio 2017 on Windows.

[Screenshot: creating a new .NET Core console app in Visual Studio for Mac]


Step 2: Install the Microsoft.ML NuGet package and download the following datasets:

https://github.com/abhiongithub/ML-for-Dot-Net-developers/blob/master/4-SentimentAnalysis/SentimentAnalysis/SentimentAnalysis/sentiment-imdb-train.txt

https://github.com/abhiongithub/ML-for-Dot-Net-developers/blob/master/4-SentimentAnalysis/SentimentAnalysis/SentimentAnalysis/sentiment-yelp-test.txt

Add these .txt files to the Visual Studio project and set each file's “Copy to output directory” property so that the files are copied next to the built app.
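
Once the files are copied to the output directory, the program can read them by a relative path. ML.NET's text loader handles the parsing for you. The Python sketch below only shows what that parsing amounts to. It assumes a tab-separated layout with the sentence first and a 0/1 label last on each line, so check the downloaded files before relying on that column order. `load_labelled_sentences` is a hypothetical helper.

```python
import io
from typing import TextIO


def load_labelled_sentences(stream: TextIO) -> list[tuple[str, bool]]:
    """Parse lines of the assumed form '<sentence>\t<0 or 1>' into (text, label) pairs."""
    rows = []
    for line in stream:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        text, label = line.rsplit("\t", 1)  # label assumed to be the last column
        rows.append((text, label.strip() == "1"))
    return rows


sample = io.StringIO("Great acting and a clever plot.\t1\nA total waste of time.\t0\n")
print(load_labelled_sentences(sample))
# [('Great acting and a clever plot.', True), ('A total waste of time.', False)]
```

With a real file, you would pass an open handle instead, for example `open("sentiment-imdb-train.txt", encoding="utf-8")`.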


Step 3: Create a new file, SentimentData.cs, as shown below.


Step 4: Create a new file, SentimentPrediction.cs.


Step 5: Add a new file, TestSentimentData.cs.

If you have done everything correctly, your solution should now look like this:

[Screenshot: the solution structure after Step 5]


Step 6: Here we will be using the FastTreeBinaryClassifier.

A decision (or regression) tree is a binary tree-like flow chart, where at each interior node one decides which of the two child nodes to continue to based on one of the feature values from the input. At each leaf node, a value is returned. In the interior nodes, the decision is based on the test ‘x <= v’ where x is the value of the feature in the input sample and v is one of the possible values of this feature. The functions that can be produced by a regression tree are all the piece-wise constant functions.
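
Here is a minimal Python sketch of how such a tree is evaluated. Each interior node holds a feature index and a threshold v. It routes the input to the left child when x <= v and to the right child otherwise. Each leaf returns a constant. The `Node`/`Leaf` types are hypothetical and are not FastTree's internal representation.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    value: float  # constant returned for every input that reaches this leaf


@dataclass
class Node:
    feature: int      # index of the feature tested at this node
    threshold: float  # the split value v in the test x <= v
    left: "Tree"      # followed when x[feature] <= threshold
    right: "Tree"     # followed otherwise


Tree = Union[Leaf, Node]


def evaluate(tree: Tree, x: list[float]) -> float:
    """Walk from the root to a leaf and return that leaf's value."""
    while isinstance(tree, Node):
        tree = tree.left if x[tree.feature] <= tree.threshold else tree.right
    return tree.value


# A depth-2 tree over two features; it computes a piecewise constant function of x.
tree = Node(0, 0.5, Leaf(-1.0), Node(1, 2.0, Leaf(0.5), Leaf(2.0)))
print(evaluate(tree, [0.2, 9.0]))  # -1.0
print(evaluate(tree, [0.9, 1.0]))  # 0.5
print(evaluate(tree, [0.9, 3.0]))  # 2.0
```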

FastTree is an implementation of MART (Multiple Additive Regression Trees), a gradient boosting algorithm. The ensemble of trees is built by computing, at each step, a regression tree that approximates the gradient of the loss function and adding it to the previous trees with coefficients that minimize the loss of the new ensemble. The output of the ensemble produced by MART on a given instance is the sum of the tree outputs.
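
The boosting loop can be sketched in a few lines of Python. This is a simplified stand-in for MART, not FastTree itself. It uses squared loss, for which the negative gradient is simply the residual y - F(x). It fits one-split "stumps" on a single feature to those residuals. It also scales each stump by a fixed learning rate instead of an optimized coefficient. As in MART, the ensemble's output is the sum of the tree outputs. `fit_stump` and `boost` are hypothetical names.

```python
def fit_stump(xs: list[float], residuals: list[float]):
    """Least-squares one-split tree on one feature: x <= v gives the left value, else the right."""
    best = None
    for v in sorted(set(xs))[:-1]:  # the largest value cannot split: nothing would go right
        left = [r for x, r in zip(xs, residuals) if x <= v]
        right = [r for x, r in zip(xs, residuals) if x > v]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, v, lv, rv)
    if best is None:  # all xs identical: fall back to a single constant leaf
        mean = sum(residuals) / len(residuals)
        return lambda x: mean
    _, v, lv, rv = best
    return lambda x: lv if x <= v else rv


def boost(xs: list[float], ys: list[float], rounds: int = 50, learning_rate: float = 0.3):
    """Gradient boosting with squared loss: each new stump is fitted to the current residuals."""
    trees = []

    def predict(x: float) -> float:
        # The ensemble output is the (scaled) sum of the individual tree outputs.
        return sum(learning_rate * tree(x) for tree in trees)

    for _ in range(rounds):
        # For the loss 0.5 * (y - F(x))**2 the negative gradient is the residual y - F(x).
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))
    return predict


model = boost([1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 1.0, 1.0])
print(round(model(1.5), 3), round(model(3.5), 3))  # 0.0 1.0
```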

  • For a binary classification problem, the output is converted to a probability by using some form of calibration.
  • For a regression problem, the output is the predicted value of the function.
  • For a ranking problem, the instances are ordered by the output value of the ensemble.
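
For the three cases listed above, turning a raw ensemble score into a final answer could look like the Python sketch below. The sigmoid is one common form of calibration (Platt scaling). Its slope `a` and offset `b` would normally be fitted on held-out data. The defaults here are placeholders, not the values FastTree uses, and the helper names are hypothetical.

```python
import math


def to_probability(score: float, a: float = 1.0, b: float = 0.0) -> float:
    """Binary classification: squash the raw score into (0, 1) with a Platt-style sigmoid."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


def to_regression_value(score: float) -> float:
    """Regression: the ensemble output is the prediction itself."""
    return score


def rank(items: list[str], scores: list[float]) -> list[str]:
    """Ranking: order instances by descending ensemble output."""
    ordered = sorted(zip(items, scores), key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in ordered]


print(to_probability(0.0))                        # 0.5
print(rank(["a", "b", "c"], [0.2, 1.5, -0.3]))    # ['b', 'a', 'c']
```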

Now update your Program.cs as shown below
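
Before FastTree can split on anything, each sentence has to be turned into numeric features. As a rough Python illustration of that idea (not ML.NET's featurizer, which does considerably more, for example text normalization and n-grams), the sketch below builds a bag-of-words count vector over a vocabulary collected from the training texts. `tokenize`, `build_vocabulary`, and `featurize` are hypothetical helpers.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(texts: list[str]) -> dict[str, int]:
    """Assign a column index to every distinct token seen in the training texts."""
    vocab: dict[str, int] = {}
    for text in texts:
        for token in tokenize(text):
            vocab.setdefault(token, len(vocab))
    return vocab


def featurize(text: str, vocab: dict[str, int]) -> list[float]:
    """Bag of words: count each known token; tokens outside the vocabulary are ignored."""
    vector = [0.0] * len(vocab)
    for token in tokenize(text):
        if token in vocab:
            vector[vocab[token]] += 1.0
    return vector


vocab = build_vocabulary(["Good movie", "Bad movie, bad plot"])
print(vocab)                               # {'good': 0, 'movie': 1, 'bad': 2, 'plot': 3}
print(featurize("bad bad acting", vocab))  # [0.0, 0.0, 2.0, 0.0]
```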


Step 7: When we run the program, we can see that it predicts positive and negative sentiments correctly, as shown below.

[Screenshot: console output showing the predicted sentiments]
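
This sample also evaluates the model against the second (test) dataset. The Python sketch below shows two metrics commonly reported for binary classifiers: accuracy and AUC (area under the ROC curve). The helper names are hypothetical, and the labels and scores are made up, not output from this sample.

```python
def accuracy(labels: list[bool], predictions: list[bool]) -> float:
    """Fraction of test rows whose predicted sentiment matches the label."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def auc(labels: list[bool], scores: list[float]) -> float:
    """Probability that a random positive row scores higher than a random negative one (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [True, True, False, False]
scores = [0.9, 0.4, 0.6, 0.1]
print(accuracy(labels, [s >= 0.5 for s in scores]))  # 0.5
print(auc(labels, scores))                           # 0.75
```

Accuracy depends on the chosen threshold, while AUC measures how well the scores rank positives above negatives regardless of threshold.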


The GitHub repository for this sample code is here:

https://github.com/abhiongithub/ML-for-Dot-Net-developers

Here is the link to the next blog post in this series: https://cloudandmobileblog.com/2018/07/28/building-fare-predictor-using-regression-evaluator-of-machine-learning-netpart-5-of-5/
