Chapter 68 – Build a Keyword Extractor Using Easy2Digital APIs

louis lu

1 year ago

Extracting keywords from website URLs helps you learn about a new brand quickly, without reading through all of its information. Many tools exist, but the paid ones are expensive to subscribe to and the free ones are often not user-friendly; notably, they rarely provide APIs that you can integrate with your business dashboard.

In this article, I use two Easy2Digital APIs, the brand info scraper and the Google SERP scraper, to build a keyword extractor.

Ingredients for building a keyword extractor using Easy2Digital APIs

Collecting Brand relevant URLs using Easy2Digital APIs
Scrape the content from the URLs using BeautifulSoup
De-duplicate the extracted keywords
Full Python Script of Keyword Extractor

Collecting Brand relevant URLs using Easy2Digital APIs

The first question is where to find keywords related to the brand you are investigating. The answer is the brand’s official site and the portal sites related to it.

Here is a code sample of the API usage:
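As a minimal sketch of this step, assume a SERP-style endpoint that returns a JSON list of results, each with a "link" field. The endpoint URL, the token parameter, the "link" field name, and both function names below are placeholders for illustration, not the documented Easy2Digital API; replace them with the values from your own API account.

```python
import json
import urllib.parse
import urllib.request

# HYPOTHETICAL placeholders: neither value comes from the article.
API_ENDPOINT = "https://api.example.com/serp"
API_TOKEN = "YOUR_API_TOKEN"


def fetch_serp_results(brand):
    """Call the (hypothetical) SERP endpoint and return the parsed JSON."""
    query = urllib.parse.urlencode({"q": brand, "token": API_TOKEN})
    with urllib.request.urlopen(f"{API_ENDPOINT}?{query}", timeout=30) as resp:
        return json.load(resp)


def collect_urls(results, limit=10):
    """Pull unique http(s) URLs out of a list of result dicts.

    Assumes each result stores its address under a "link" key, which is
    an illustrative guess at the response shape.
    """
    urls = []
    for item in results:
        url = item.get("link", "")
        # Keep web URLs only, and skip anything already collected.
        if url.startswith(("http://", "https://")) and url not in urls:
            urls.append(url)
    return urls[:limit]
```

Keeping the URL filtering in its own function means it can be checked against a hand-written list of results before any real API call is made.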

Scrape the content from the URLs using BeautifulSoup

For pages that are not React-based, that is, pages whose content is present in the HTML itself rather than rendered by JavaScript, BeautifulSoup is in my view the best option. To capture as much raw text as possible at this stage, I suggest scraping the h1, h2, h3, and p content separately with find_all(). Here are the code samples:
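The scraping step can be sketched as follows, assuming the beautifulsoup4 package is installed. The function names fetch_html and extract_text are illustrative, and the fetch uses the standard library's urllib only to keep the sketch small; any HTTP client would do.

```python
import urllib.request

from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its HTML as text."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_text(html, tags=("h1", "h2", "h3", "p")):
    """Collect the text of each tag type separately, then join it."""
    soup = BeautifulSoup(html, "html.parser")
    chunks = []
    for tag in tags:
        for element in soup.find_all(tag):
            text = element.get_text(" ", strip=True)
            if text:  # skip empty elements
                chunks.append(text)
    return " ".join(chunks)


# Inline sample so the sketch runs without a network call.
sample_html = "<html><body><h1>Brand</h1><p>Hello world</p><h2>Sub</h2></body></html>"
full_text = extract_text(sample_html)
```

For a live page, full_text = extract_text(fetch_html(url)) would feed the next step. Because the tags are processed one type at a time, the joined text groups all headings before the paragraphs rather than following page order.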

Extract the text’s keywords using stop-words

The keyword extractor’s top argument takes a number that limits the output to the most frequent keywords, for example the top 10 or the top 5. To keep English stop words out of the results, pass a stop-word list through the second argument, stopwords.

Several stop-word modules are available. I recommend the one in scikit-learn because, in my experience, its stop-word list is among the longest. Be sure to install the scikit-learn Python package before importing this module into the script.

from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

The extract_keywords method takes the text we just scraped from the websites. Here I created a variable named full_text that holds the scraped text.

keywords = kw_extractor.extract_keywords(full_text)
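To illustrate what the top and stopwords arguments do, here is a simplified stand-in that uses only the standard library. It is not the extractor used in this article: the library behind kw_extractor is not named here, although the parameter names resemble those of the yake package's KeywordExtractor. The function name and the tiny stop-word set are assumptions for the sketch; scikit-learn's ENGLISH_STOP_WORDS could be passed in instead.

```python
import re
from collections import Counter

# Tiny stand-in stop-word list (assumption, for illustration only).
SAMPLE_STOPWORDS = frozenset(
    {"the", "a", "an", "and", "of", "to", "in", "is", "for", "our"}
)


def extract_keywords_by_frequency(text, top=10, stopwords=SAMPLE_STOPWORDS):
    """Return up to `top` (keyword, count) pairs, most frequent first."""
    words = re.findall(r"[a-z']+", text.lower())
    # Drop stop words and very short tokens before counting.
    counts = Counter(w for w in words if w not in stopwords and len(w) > 2)
    return counts.most_common(top)


full_text = "The brand sells running shoes. Running shoes for the trail and the road."
keywords = extract_keywords_by_frequency(full_text, top=2)
```

A plain frequency count like this only handles single words and ignores context, so a dedicated extractor will usually give better phrases. The sketch is meant to show how the two arguments shape the output, not to replace the extractor.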

De-duplicate or Word Cloudify the extracted keywords

If the script works properly, it returns the extracted keywords, each paired with a score, in a JSON-like structure. We can write a loop that keeps each keyword and drops its score value.

Broadly, there are two ways to turn these keywords into insight. One is to generate a word cloud from thousands of keywords, where the size of each keyword shows its popularity. The other is to de-duplicate the keywords and keep only a list of unique ones. Here I show the de-duplication approach.

final_keyword = list(dict.fromkeys(keywordResult))
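Putting the loop and the de-duplication together, a minimal sketch might look like this. It assumes the extractor output is a list of (keyword, score) pairs; the sample values are made up for illustration.

```python
# Example extractor output: (keyword, score) pairs with made-up values.
keywords = [
    ("running shoes", 0.02),
    ("trail", 0.05),
    ("running shoes", 0.02),
    ("road", 0.09),
]

# Keep the keyword and drop the score value.
keywordResult = [keyword for keyword, _score in keywords]

# dict.fromkeys keeps the first occurrence of each keyword and preserves order.
final_keyword = list(dict.fromkeys(keywordResult))
```

Unlike list(set(keywordResult)), this approach keeps the keywords in their original order, which matters when the extractor has already ranked them.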

Full Python Script of Keyword Extractor

If you are interested in the full script for Chapter 68 – Build a Keyword Extractor Using Easy2Digital APIs, please subscribe to our newsletter and add the message “Chapter 68”. We will send the script to your mailbox immediately. (If you need the email scraper as well, please tell us you need the paid version.)
