Building a Fare Predictor Using the Regression Evaluator of Machine Learning .NET (Part 5 of 5)

This is the fifth and final part of a five-part blog series on Machine Learning .NET. Here are the links to the previous posts in this series.

First blog post, an introduction to Machine Learning .NET: https://cloudandmobileblog.com/2018/07/09/introduction-of-machine-learning-net-part-1-of-5/

Second blog post, on clustering in Machine Learning .NET: https://cloudandmobileblog.com/2018/07/15/clustering-in-machinelearning-net/

Third blog post, on understanding binary classification: https://cloudandmobileblog.com/2018/07/28/understanding-binary-classification-using-sentiment-analysis-through-ml-net-part-3-of-5/

Fourth blog post, on sentiment analysis using FastTreeBinaryClassifier: https://cloudandmobileblog.com/2018/07/28/sentiment-analysis-using-machine-learning-net-part-4-of-5/


Linear regression is a linear approach to modeling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables). The case of one explanatory variable is called simple linear regression. For more than one explanatory variable, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.

In linear regression, the relationships are modeled using linear predictor functions whose unknown model parameters are estimated from the data. Such models are called linear models.

Most commonly, the conditional mean of the response given the values of the explanatory variables (or predictors) is assumed to be an affine function of those values; less commonly, the conditional median or some other quantile is used. Like all forms of regression analysis, linear regression focuses on the conditional probability distribution of the response given the values of the predictors, rather than on the joint probability distribution of all of these variables, which is the domain of multivariate analysis.
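To make the definition concrete, here is a minimal sketch of simple linear regression (one explanatory variable) fitted by ordinary least squares. It is written in plain Python rather than ML.NET, and the function name fit_simple_linear and the toy distance/fare numbers are made up for illustration only.

```python
from statistics import mean


def fit_simple_linear(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x; zero means every x is identical.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must contain at least two distinct values")
    # Sum of cross-deviations of x and y.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical toy data that lies exactly on the line fare = 2.5 + 2.0 * distance.
distances = [1.0, 2.0, 3.0, 4.0]
fares = [4.5, 6.5, 8.5, 10.5]

slope, intercept = fit_simple_linear(distances, fares)
print(f"fare = {intercept:.2f} + {slope:.2f} * distance")
```

With more than one explanatory variable the same idea applies, but the parameters are usually solved with matrix methods instead of these two closed-form sums.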


Now we will solve a problem using regression, specifically with FastTreeRegressor.

FastTree is an efficient implementation of the MART gradient boosting algorithm. Gradient boosting is a machine learning technique for regression problems. It builds regression trees one at a time, in a step-wise fashion, using a predefined loss function to measure the error at each step and correct for it in the next.

So this prediction model is actually an ensemble of weaker prediction models. In regression problems, boosting builds a series of such trees in a step-wise fashion; at each step it picks the tree that best reduces an arbitrary differentiable loss function, and the final prediction combines the outputs of all the trees.
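The step-wise idea can be sketched in a few lines. This is not FastTree or MART itself, only a toy version under some loud assumptions: a single numeric feature, squared-error loss (so the error to correct at each step is simply the residual), and one-split trees ("stumps") as the weak models. All names and numbers are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals by least squares."""
    thresholds = sorted(set(xs))[:-1]  # splitting at the largest x would leave one side empty
    if not thresholds:
        raise ValueError("need at least two distinct x values")
    best = None
    for t in thresholds:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return t, left_mean, right_mean


def predict(model, x):
    """Prediction = base value + learning rate * sum of all stump outputs."""
    base, rate, stumps = model
    return base + rate * sum(lm if x <= t else rm for t, lm, rm in stumps)


def fit_boosted(xs, ys, n_trees=50, rate=0.3):
    """Each new stump is fitted to the errors left by the ensemble built so far."""
    base = sum(ys) / len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - predict((base, rate, stumps), x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return base, rate, stumps


def training_mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical toy data: trip distance and fare.
distances = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
fares = [5.0, 7.0, 9.5, 12.0, 14.0, 17.0]

model = fit_boosted(distances, fares)
baseline = (sum(fares) / len(fares), 0.3, [])  # predicts the mean fare for every trip
print("baseline MSE:", training_mse(baseline, distances, fares))
print("boosted MSE: ", training_mse(model, distances, fares))
```

Note that no single tree is the model here: every stump stays in the ensemble, and each one only nudges the prediction by the learning rate.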


This problem is about predicting the fare of a taxi trip in New York City. At first glance, it may seem to depend simply on the distance traveled. However, taxi vendors in New York charge varying amounts for other factors such as additional passengers or paying with a credit card instead of cash.

The CSV files are here:

https://raw.githubusercontent.com/abhiongithub/ML-for-Dot-Net-developers/master/5-Prediction/FarePredictor/FarePredictor/taxi-fare-test.csv

https://raw.githubusercontent.com/abhiongithub/ML-for-Dot-Net-developers/master/5-Prediction/FarePredictor/FarePredictor/taxi-fare-train.csv

The provided dataset contains the following columns:

  • vendor_id: The ID of the taxi vendor is a feature.
  • rate_code: The rate type of the taxi trip is a feature.
  • passenger_count: The number of passengers on the trip is a feature.
  • trip_time_in_secs: The amount of time the trip took. You want to predict the fare of the trip before the trip is completed. At that moment you don’t know how long the trip would take. Thus, the trip time is not a feature and you’ll exclude this column from the model.
  • trip_distance: The distance of the trip is a feature.
  • payment_type: The payment method (cash or credit card) is a feature.
  • fare_amount: The total taxi fare paid is the label.
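The split between features, the excluded column, and the label can be sketched as follows. This is a plain Python illustration and not the ML.NET data classes used later in the post; the two sample rows are invented for the example and are not taken from the real CSV files.

```python
import csv
import io

# Hypothetical sample in the same column layout as the dataset described above.
SAMPLE = """vendor_id,rate_code,passenger_count,trip_time_in_secs,trip_distance,payment_type,fare_amount
VTS,1,1,1140,3.75,CRD,15.5
CMT,1,2,480,1.2,CSH,7.0
"""

# trip_time_in_secs is deliberately left out: it is unknown before the trip ends.
CATEGORICAL_FEATURES = ["vendor_id", "rate_code", "payment_type"]
NUMERIC_FEATURES = ["passenger_count", "trip_distance"]
LABEL = "fare_amount"


def split_features_and_label(row):
    """Turn one CSV row (a dict of strings) into (features, label)."""
    features = {name: row[name] for name in CATEGORICAL_FEATURES}
    for name in NUMERIC_FEATURES:
        features[name] = float(row[name])
    return features, float(row[LABEL])


def load_rows(text):
    reader = csv.DictReader(io.StringIO(text))
    return [split_features_and_label(row) for row in reader]


for features, label in load_rows(SAMPLE):
    print(features, "->", label)
```

Keeping the categorical columns as strings mirrors the fact that values such as the vendor ID or payment type are labels rather than quantities, so they need to be encoded before a regressor can use them.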

Step 1: Create a new .NET Core console app. I am using Visual Studio for Mac, as shown below; you can also use Visual Studio Code on Linux or Visual Studio 2017 on Windows.

[Screenshot: 1-NewConsoleApp]


Step 2: Add the Microsoft.ML NuGet package and import the CSV files. Set the “Copy to output directory” property of the CSV files, as shown below.

[Screenshot: 2-BuildingFarePRedictor.png]


Step 3: Create a new file, TaxiTrip.cs.

Step 4: Create a new file, TaxiTripFarePrediction.cs.

Step 5: Update Program.cs.

The code for these three files is in the repository linked at the end of this post.


Step 6: When you run this app, you will see a predicted fare of 31.4, while the actual fare is 29.5. The prediction is off by 1.9, which is reasonably close.

[Screenshot: 4-PredictedOutput.png]
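A single prediction only tells part of the story; regression models are usually judged over a whole test set. The sketch below assumes two common regression metrics, root mean squared error (RMSE) and R-squared, and computes them in plain Python. It is not the ML.NET evaluator, and the three-trip test set is hypothetical; only the 31.4 and 29.5 figures come from the run above.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: typical size of a prediction error, in fare units."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(actual, predicted):
    """Share of the variance in the actual values explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values are all identical; R-squared is undefined")
    return 1 - ss_res / ss_tot


# The single trip from Step 6.
abs_error = abs(31.4 - 29.5)
print(f"absolute error: {abs_error:.1f}, relative error: {abs_error / 29.5:.1%}")

# Hypothetical test set of three trips, for illustration only.
actual_fares = [10.0, 20.0, 30.0]
predicted_fares = [12.0, 18.0, 33.0]
print(f"RMSE: {rmse(actual_fares, predicted_fares):.2f}")
print(f"R-squared: {r_squared(actual_fares, predicted_fares):.3f}")
```

A lower RMSE and an R-squared closer to 1 both indicate a better fit on the test data.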

You can download the application code from here:

https://github.com/abhiongithub/ML-for-Dot-Net-developers

 

 
