Build a Pricing Prediction Model Using Python, Scikit-learn, and Linear Regression

In this piece, I will briefly walk you through how to predict a target price based on multiple variables that may be correlated with price changes. By the end of this piece, you will be able to apply this model to your own business cases, using Python and scikit-learn to generate a score that evaluates the price prediction.

Ingredients to prepare in advance: NumPy, Pandas, scikit-learn, Matplotlib, Seaborn, LinearRegression, StandardScaler, RandomForestRegressor

Loading Dataset

This article uses the California housing price dataset as an example. That said, you can use your own business data as the dataset. Just make sure the dataset has a reasonable number of rows, since more data generally produces a better prediction score.

As usual, we can use Pandas to load the dataset and call info() to get a quick overview of its condition.
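Here is a minimal sketch of that loading step. The file name housing.csv is an assumption; point read_csv() at wherever your copy of the California housing data lives.

import pandas as pd

# Load the California housing data; the file name "housing.csv" is an assumption
data = pd.read_csv('housing.csv')

# Column names, non-null counts, and dtypes at a glance
data.info()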

For better prediction results, one of the main principles is that every column should have the same number of non-null rows. As you can see from this sample, the total_bedrooms column contains some NA values. Therefore, we need to drop the NA rows first.

data.dropna(inplace=True)

Data Exploration

First things first, we need to pick a target variable to predict. In this case, the median house value is the target variable, because this experiment is essentially about supporting property purchase decisions. So we drop that column from the feature table and keep the target separately as a new variable in the script.

X = data.drop(['median_house_value'], axis=1)

y = data['median_house_value']

Then we can explore the correlation of each variable with the target variable and get a big-picture sense of whether the dataset makes sense.

Generally we don't need the whole dataset for this purpose. Here, we can again leverage a train/test split. We elaborated on this method in a previous chapter; if you are interested, please explore the other chapters on Easy2Digital.com.

from sklearn.model_selection import train_test_split

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

Then we can join the training features and target back together and plot histograms of each column using the join() and hist() methods, as shown below.
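A minimal sketch of that step follows; the variable name train_data (the re-joined training set) is the same one reused by the log-transform code later in this article.

import matplotlib.pyplot as plt

# Re-attach the target to the training features so we can explore them together
train_data = X_train.join(y_train)

# Histogram of every numeric column
train_data.hist(figsize=(15, 8))
plt.show()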

Or we can show the correlations in a heatmap using the corr() method, which is more visual thanks to the contrast between dark and light colors.
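For example, the heatmap could be drawn with Seaborn as sketched below. The numeric_only=True argument is there because corr() only works on numeric columns (the ocean_proximity strings are excluded); it assumes a reasonably recent Pandas version.

import seaborn as sns
import matplotlib.pyplot as plt

# Correlation heatmap of the numeric columns; annot=True prints the coefficient in each cell
plt.figure(figsize=(15, 8))
sns.heatmap(train_data.corr(numeric_only=True), annot=True, cmap="YlGnBu")
plt.show()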

Data Preprocessing

We can see a bunch of feature variables. Furthermore, when we look at the histogram distributions above, some features look heavily skewed. So we can apply a log transform to see whether the feature distributions become more reasonable.

import numpy as np

# Log-transform the skewed count features; add 1 so that zero values don't break np.log
train_data['total_rooms'] = np.log(train_data['total_rooms'] + 1)

train_data['total_bedrooms'] = np.log(train_data['total_bedrooms'] + 1)

train_data['population'] = np.log(train_data['population'] + 1)

train_data['households'] = np.log(train_data['households'] + 1)

The distributions look much more sensible after applying the log transform in this case. We add one inside the log just in case some of the feature values are zero, since log(0) is undefined.

The other critical part of data preprocessing is converting string columns into numbers, because machine learning is a number-driven process and models cannot handle strings directly.

In this dataset, we find that ocean_proximity is stored as strings. Thus, we can use the Pandas get_dummies() method to one-hot encode it and join the result back into the training data.

# One-hot encode ocean_proximity and drop the original string column, mirroring the test-set step below
train_data = train_data.join(pd.get_dummies(train_data.ocean_proximity)).drop(['ocean_proximity'], axis=1)

Predict Using a Linear Regression Model

Now that the dataset is in place, we can import a model, scale the feature dataset, and test the model's accuracy for predicting the house value.

from sklearn.linear_model import LinearRegression

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

X_train, y_train = train_data.drop(['median_house_value'], axis=1), train_data['median_house_value']

X_train_s = scaler.fit_transform(X_train)

reg = LinearRegression()

reg.fit(X_train_s, y_train)

test_data = X_test.join(y_test)

test_data['total_rooms'] = np.log(test_data['total_rooms'] + 1)

test_data['total_bedrooms'] = np.log(test_data['total_bedrooms'] + 1)

test_data['population'] = np.log(test_data['population'] + 1)

test_data['households'] = np.log(test_data['households'] + 1)

test_data = test_data.join(pd.get_dummies(test_data.ocean_proximity)).drop(['ocean_proximity'], axis=1)

X_test, y_test = test_data.drop(['median_house_value'], axis=1), test_data['median_house_value']

X_test_s = scaler.transform(X_test)

reg.score(X_test_s, y_test)
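Here score() returns the coefficient of determination (R²) on the held-out test set. Since the ingredient list above also mentions a random forest, here is a hedged sketch of swapping in scikit-learn's RandomForestRegressor for comparison; the default hyperparameters are used purely for illustration, not as tuned values.

from sklearn.ensemble import RandomForestRegressor

# Tree ensembles don't need feature scaling, but we reuse the scaled data for a like-for-like comparison
forest = RandomForestRegressor()
forest.fit(X_train_s, y_train)

# R² on the same held-out test set, comparable to the linear regression score above
forest.score(X_test_s, y_test)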

Full Python Script for Building a Price Prediction Model Based on Multiple Variables Using Python and Scikit-learn

If you are interested in Build a Pricing Prediction Model Using Python, Scikit-learn, and Linear Regression, please subscribe to our newsletter and add the message 'price prediction model'. We will send the script to your mailbox right away.

I hope you enjoyed reading Build a Pricing Prediction Model Using Python, Scikit-learn, and Linear Regression. If you did, please support us by doing one of the things listed below, because it always helps out our channel.

Data Science & Machine Learning Coursera Course Recommendation