Chapter 76 – Generate Feature Importance Using Scikit-learn and Random Forest

The random forest algorithm has been applied across a number of industries, allowing them to make better business decisions. Some use cases include high credit risk analysis and product recommendation for cross-sell purposes.

In this chapter, I will briefly walk you through several methods of generating feature importance using the classic red wine quality dataset. By the end, you will have a basic understanding of how to apply Random Forest feature importance to your own projects and compare the results across the different methods.

Red Wine Dataset and Train/Test Split

For any machine learning model, getting a proper dataset and preprocessing the data are critical. Kaggle is one of the most popular platforms for finding a suitable dataset. Here is the link for the red wine quality project:

https://www.kaggle.com/datasets/uciml/red-wine-quality-cortez-et-al-2009

The first step is to load the data with Pandas and split it with Scikit-learn's train_test_split.

import pandas as pd
from sklearn.model_selection import train_test_split

# Load the red wine quality dataset (semicolon-separated CSV)
url = "winequality-red.csv"
wine_data = pd.read_csv(url, sep=";")

# Separate the features from the target column 'quality'
x = wine_data.drop('quality', axis=1)
y = wine_data['quality']

# Hold out half of the data for testing
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.5, random_state=50)

Built-in Feature Importance with Scikit-learn

Scikit-learn provides a built-in feature importance method for Random Forest models. According to the documentation, this method is based on the mean decrease in node impurity across all trees.

In a Random Forest, the features are like the questions in a guessing game. Some questions eliminate more possibilities than others. The assumption is that features that eliminate more possibilities quickly are more important, because they get you closer to the correct answer faster. It's very simple to get these feature importances with Scikit-learn:

from sklearn.ensemble import RandomForestRegressor
import matplotlib.pyplot as plt

# Train the forest and read off the impurity-based importances
rf = RandomForestRegressor(n_estimators=100, random_state=50)
rf.fit(x_train, y_train)

inbuilt_importances = pd.Series(rf.feature_importances_, index=x_train.columns)
inbuilt_importances.sort_values(ascending=True, inplace=True)

inbuilt_importances.plot.barh(color='black')
plt.xlabel("Importance")
plt.ylabel("Feature")
plt.title("Feature Importance - Scikit-learn Built-in")
plt.show()

Built-in Scikit-learn Method with a Random Feature

The simplest way to extend this method is to add a random feature to the dataset and see whether the resulting importances deviate from the first run without the random feature.

If a real feature has lower importance than the random feature, it could indicate that its importance is just due to chance.

import numpy as np

def random_method():
    # Add a pure-noise feature to the training set
    x_train_random = x_train.copy()
    x_train_random["RANDOM"] = np.random.RandomState(42).randn(x_train.shape[0])

    # Retrain the forest with the random feature included
    rf_random = RandomForestRegressor(n_estimators=100, random_state=42)
    rf_random.fit(x_train_random, y_train)

    importances_random = pd.Series(rf_random.feature_importances_, index=x_train_random.columns)
    importances_random.sort_values(ascending=True, inplace=True)

    importances_random.plot.barh(color='blue')
    plt.xlabel("Importance")
    plt.ylabel("Feature")
    plt.title("Feature Importance - Scikit-learn Built-in with Random Feature")
    plt.show()

Permutation Feature Importance

Permutation feature importance is another technique to estimate the importance of each feature in a Random Forest model by measuring the change in the model’s performance when the feature’s values are randomly shuffled.

One of the advantages of this method is that it can be used with any model, not just Random Forests, which makes the results between models more comparable.
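
As a minimal sketch, this can be computed with Scikit-learn's permutation_importance function from sklearn.inspection, reusing the rf model and the held-out test split from above (the n_repeats value here is an illustrative choice):

from sklearn.inspection import permutation_importance

# Shuffle each feature n_repeats times and measure the drop in test-set score
perm_result = permutation_importance(rf, x_test, y_test, n_repeats=10, random_state=50)

perm_importances = pd.Series(perm_result.importances_mean, index=x_test.columns)
perm_importances.sort_values(ascending=True, inplace=True)

perm_importances.plot.barh(color='green')
plt.xlabel("Importance")
plt.ylabel("Feature")
plt.title("Feature Importance - Permutation")
plt.show()

Because the importance is measured on held-out data, features that the forest merely memorized during training tend to score lower here than in the impurity-based ranking.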

Random Forest Feature Importance with SHAP

SHAP is a method for interpreting the output of machine learning models based on game theory.

It provides a unified measure of feature importance that, like the permutation importance, can be applied to any model.

The main drawback of it is that it can be computationally expensive, especially for large datasets or complex models.
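
A minimal sketch of this approach, assuming the third-party shap package is installed (pip install shap), uses TreeExplainer on the rf model trained above:

import shap

# TreeExplainer is optimized for tree ensembles such as Random Forests
explainer = shap.TreeExplainer(rf)
shap_values = explainer.shap_values(x_test)

# Rank features by their mean absolute SHAP value across the test set
shap.summary_plot(shap_values, x_test, plot_type="bar")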

Random Forest Path Feature Importance

Another way to understand how each feature contributes to the Random Forest predictions is to look at the decision tree paths that each instance takes.

It calculates the difference between the prediction value at the leaf node and the prediction values at the nodes that precede it to get the estimated contribution of each feature.
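
One common way to compute these path-based contributions is the third-party treeinterpreter package; the sketch below assumes it is installed (pip install treeinterpreter) and decomposes each prediction of the rf model into a bias term plus per-feature contributions:

from treeinterpreter import treeinterpreter as ti

# Decompose each test prediction into bias + sum of feature contributions
prediction, bias, contributions = ti.predict(rf, x_test.values)

# Average the absolute contributions across instances for a global ranking
path_importances = pd.Series(np.abs(contributions).mean(axis=0), index=x_test.columns)
path_importances.sort_values(ascending=True, inplace=True)

path_importances.plot.barh(color='red')
plt.xlabel("Importance")
plt.ylabel("Feature")
plt.title("Feature Importance - Tree Paths")
plt.show()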

Full Python Script of the Feature Importance Generator

If you are interested in the full script for Chapter 76 – Generate Feature Importance Using Scikit-learn and Random Forest, please subscribe to our newsletter and add the message 'Chapter 75 + notion api'. We will send the script to your mailbox right away.

I hope you enjoyed reading Chapter 76 – Generate Feature Importance Using Scikit-learn and Random Forest. If you did, please support us by doing one of the things listed below, because it always helps out our channel.

Data Science & Machine Learning Coursera Course Recommendation