Build a Pricing Prediction Model Using Python, ScikitLearn, Linear Regression
In this piece, I will briefly walk you through how to predict a price from multiple variables that may be correlated with price changes. By the end, you should be able to apply this approach to your own business cases, using Python and scikit-learn to build a model that predicts prices and to score how well it predicts them.
Ingredients to prepare in advance: NumPy, pandas, scikit-learn, Matplotlib, seaborn, LinearRegression, StandardScaler, RandomForest
Table of Contents: Price Prediction Based on Multiple Variables Using Python, ScikitLearn
- Loading Dataset
- Data Exploration
- Data Preprocessing
- Linear Regression Model
- Full Python Script of Building a Pricing Prediction Model Using Python, ScikitLearn, Linear Regression
- Data Science & Machine Learning Coursera Course Recommendation
Loading Dataset
This article uses the California housing price dataset as an example. That said, you can use data from your own business case instead. Just make sure the dataset is large enough, since more data generally leads to better predictions.
As usual, we can use pandas to load the dataset and call info() to get a quick overview of its columns, data types, and non-null counts.
For a reliable prediction, every column should have the same number of non-null rows. As the info() output for this sample shows, the total_bedrooms column contains some NA values. Therefore, we need to drop the rows with NA values first.
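As a minimal sketch of this loading step: the snippet below reads CSV data with pandas and calls info(). The file name housing.csv in the comment is an assumption (the article does not name the file), and the three inline rows are made-up placeholders so the snippet runs on its own. Only the column names, a subset of those used in this article, follow the dataset.

```python
import io

import pandas as pd

# In practice you would point read_csv at your own file, e.g.
# data = pd.read_csv("housing.csv")  # hypothetical file name
# Here a few made-up rows stand in for the real file.
sample_csv = io.StringIO(
    "total_rooms,total_bedrooms,population,households,"
    "median_house_value,ocean_proximity\n"
    "880,129,322,126,452600,NEAR BAY\n"
    "7099,,2401,1138,358500,NEAR BAY\n"
    "1467,190,496,177,352100,INLAND\n"
)
data = pd.read_csv(sample_csv)

# info() prints each column's dtype and non-null count,
# which is where missing values first become visible.
data.info()
```

In the placeholder rows, total_bedrooms reports one fewer non-null value than the other columns, which is the kind of gap to look for in real data.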
data.dropna(inplace=True)
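A small sketch of checking the missing-value counts before and after dropping rows. The DataFrame here is a made-up stand-in, not the real housing data.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the housing table, with one missing value.
data = pd.DataFrame({
    "total_rooms": [880, 7099, 1467],
    "total_bedrooms": [129, np.nan, 190],
    "median_house_value": [452600, 358500, 352100],
})

# Count missing values per column before cleaning.
missing_before = data.isna().sum()

# Drop every row that contains at least one NA, modifying data in place.
data.dropna(inplace=True)

# Count again to confirm nothing is missing any more.
missing_after = data.isna().sum()
print(missing_before["total_bedrooms"], missing_after["total_bedrooms"], len(data))
```

Note that dropna() removes whole rows, so the values in the other columns of those rows are discarded as well.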
Data Exploration
First things first, we need to set a target variable to predict. In this case, the median house value is the target, because the experiment is meant to support property purchase decisions. So we drop that column from the feature table and store it separately as its own variable in the script.
X = data.drop(['median_house_value'], axis=1)
y = data['median_house_value']
Then we can explore how each variable correlates with the target variable to get a big-picture sense of whether the dataset makes sense.
Generally, we don't need the whole dataset for this. Here we can again use train_test_split to split the data and explore only the training portion. We cover this method in a previous chapter. If you are interested, please explore the other chapters on Easy2Digital.com.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
Then we can join the training features and target back into a single table, for example train_data = X_train.join(y_train), and plot a histogram of every column with hist().
Or we can compute the pairwise correlations with corr() and show them in a heatmap, where the contrast between dark and light colors makes strong correlations easier to spot.
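A hedged sketch of the join() and hist() step. The feature values are randomly generated placeholders, and the Agg backend is chosen only so the snippet runs without a display; in a notebook you would normally leave that line out.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Made-up training features and target that share the same index.
X_train = pd.DataFrame({
    "total_rooms": rng.integers(100, 5000, size=50),
    "population": rng.integers(50, 3000, size=50),
})
y_train = pd.Series(
    rng.integers(50_000, 500_000, size=50), name="median_house_value"
)

# join() aligns on the index and appends the target as a new column.
train_data = X_train.join(y_train)

# hist() draws one histogram per numeric column and returns the axes.
axes = train_data.hist(figsize=(10, 6), bins=20)
```

Joining the target back in means the target's own distribution appears alongside the features, which helps when judging whether any column needs a transform.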
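The heatmap idea can be sketched as follows, assuming seaborn is used for the plot (the article lists it among the ingredients but does not show the call). The data is randomly generated, with households deliberately built from total_rooms so that one strong correlation shows up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
rooms = rng.integers(100, 5000, size=100).astype(float)

# Made-up table: households is constructed to track total_rooms closely.
train_data = pd.DataFrame({
    "total_rooms": rooms,
    "households": rooms * 0.2 + rng.normal(0, 50, size=100),
    "median_house_value": rng.integers(50_000, 500_000, size=100).astype(float),
})

# corr() returns the pairwise Pearson correlation matrix.
# If the table still contains string columns, recent pandas versions
# need corr(numeric_only=True) instead.
corr = train_data.corr()

# annot=True writes each coefficient into its cell.
plt.figure(figsize=(8, 6))
ax = sns.heatmap(corr, annot=True, cmap="YlGnBu")
```

The row or column for the target is usually the most useful one to read first, since it shows which features move with the value being predicted.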
Data Preprocessing
We can see a number of feature variables. Looking at the histograms above, some feature distributions look heavily skewed. So we can try a log transform to see whether those distributions become more balanced.
train_data['total_rooms'] = np.log(train_data['total_rooms'] + 1)
train_data['total_bedrooms'] = np.log(train_data['total_bedrooms'] + 1)
train_data['population'] = np.log(train_data['population'] + 1)
train_data['households'] = np.log(train_data['households'] + 1)
The distributions make more sense after applying the log transform in this case. We add one inside the log in case some feature values are zero, because log(0) is undefined.
The other critical part of data preprocessing is converting string columns into numbers. Machine learning models work on numeric input and cannot handle strings directly.
In the dataset, ocean_proximity is stored as strings. We can use the pandas get_dummies method to turn it into one indicator column per category, join those columns to the table, and drop the original string column.
train_data = train_data.join(pd.get_dummies(train_data.ocean_proximity)).drop(['ocean_proximity'], axis=1)
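To illustrate why the plus one matters, here is a tiny sketch on made-up values. It also shows np.log1p, a NumPy function that computes log(1 + x) directly and could be used in place of np.log(x + 1).

```python
import numpy as np

# Made-up values, including a zero.
values = np.array([0.0, 1.0, 99.0, 9999.0])

# log(x + 1) maps 0 to 0 instead of -inf and compresses large values.
shifted_log = np.log(values + 1)

# np.log1p computes the same quantity in one call.
same_result = np.log1p(values)

print(shifted_log)
print(np.allclose(shifted_log, same_result))
```

The zero stays at zero after the transform, while 9999 shrinks to roughly 9.2, which is what pulls a long right tail in toward the rest of the distribution.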
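One thing to watch with this encoding is that the training and test splits can end up with different indicator columns if a rare category lands in only one of them. The sketch below uses made-up rows and a hypothetical helper named encode (not part of the original script) to show one way of aligning the columns with reindex.

```python
import pandas as pd

# Made-up rows: the training split contains an ISLAND row, the test split does not.
train_data = pd.DataFrame({
    "households": [126, 1138, 177, 219],
    "ocean_proximity": ["NEAR BAY", "INLAND", "ISLAND", "INLAND"],
})
test_data = pd.DataFrame({
    "households": [259, 193],
    "ocean_proximity": ["INLAND", "NEAR BAY"],
})


def encode(df):
    """Hypothetical helper: swap ocean_proximity for one 0/1 column per category."""
    dummies = pd.get_dummies(df["ocean_proximity"], dtype=int)
    return df.join(dummies).drop(columns=["ocean_proximity"])


train_encoded = encode(train_data)

# reindex adds any column seen in training but missing from test, filled with 0,
# and puts the columns in the same order as the training table.
test_encoded = encode(test_data).reindex(columns=train_encoded.columns, fill_value=0)

print(list(test_encoded.columns))
```

Keeping the columns identical matters because a scaler and model fitted on the training table expect the same features, in the same order, at prediction time.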
Predict Using Linear Regression Model
Now that the dataset is in place, we can import a model, scale the features, and test how accurately the model predicts the housing value.
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train, y_train = train_data.drop(['median_house_value'], axis=1), train_data['median_house_value']
X_train_s = scaler.fit_transform(X_train)
reg = LinearRegression()
reg.fit(X_train_s, y_train)
test_data = X_test.join(y_test)
test_data['total_rooms'] = np.log(test_data['total_rooms'] + 1)
test_data['total_bedrooms'] = np.log(test_data['total_bedrooms'] + 1)
test_data['population'] = np.log(test_data['population'] + 1)
test_data['households'] = np.log(test_data['households'] + 1)
test_data = test_data.join(pd.get_dummies(test_data.ocean_proximity)).drop(['ocean_proximity'], axis=1)
X_test, y_test = test_data.drop(['median_house_value'], axis=1), test_data['median_house_value']
X_test_s = scaler.transform(X_test)
reg.score(X_test_s, y_test)
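For context on what this final call reports: for a scikit-learn regressor, score() returns the coefficient of determination R², where 1.0 is a perfect fit and a model that always predicts the mean scores 0. The sketch below checks this against r2_score on randomly generated data. The features, coefficients, and random_state values are assumptions for illustration, not the housing data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Made-up features and a target that is linear in them plus noise.
X = rng.normal(size=(500, 3))
y = X @ np.array([3.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the scaler on the training split only, then reuse it on the test split.
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

reg = LinearRegression().fit(X_train_s, y_train)

# score() and r2_score() should agree on the held-out split.
score = reg.score(X_test_s, y_test)
manual_r2 = r2_score(y_test, reg.predict(X_test_s))
print(round(score, 3), round(manual_r2, 3))
```

Using transform rather than fit_transform on the test split, as the script above does, keeps test-set statistics out of the scaling step.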
Full Python Script of building a Price Prediction model based on Multiple Variables Using Python, ScikitLearn
If you are interested in the full script for Build a Pricing Prediction Model Using Python, ScikitLearn, Linear Regression, please subscribe to our newsletter and include the message ‘price prediction model’. We will send the script to your mailbox right away.