Python Tutorial 37 – Clearbit Bot for Scraping Brand Web Domains with Python, Clearbit API and Sqlite3

A goal-oriented scraping project usually consists of several standalone Python bot scripts that connect and work together. One of the most useful pieces of data for researching potential leads is the brand’s web domain, because that is where we first learn about a brand. The question is how to grab these domains automatically and in bulk instead of searching Google one name at a time. This article shows how to build such a bot with Python, the Clearbit API and Sqlite3.


In this Python Tutorial, I’ll walk you through how to create a Clearbit bot that uses Clearbit’s free API to scrape web URLs by name, in bulk and at scale. In the script I use brand names as the queries. By the end of this tutorial, you will be able to use the API, write the code, and open db files on your local device.

Python Modules: clearbit api, requests, json, pandas, sqlite3

What’s Clearbit and why you need the Clearbit bot

Clearbit develops business intelligence that helps companies learn more about their customers in order to improve communication accuracy, increase sales and reduce fraud. Its big data capability gives B2B marketers and entrepreneurs more accurate data for communicating and making decisions.


A brand’s website is one of the most important touch points for learning about its products and identifying business opportunities. From a Python scraper’s point of view, it’s also the first place to grab communication data, such as emails, social profiles, and product and marketing information. Unlike manually searching Google with brand keywords, the Clearbit bot can fetch those brand web URLs in one run and show them in one place, as in the screenshot below.

Clearbit API Endpoint and Accessible Data

The free API endpoint allows 600 requests per minute. You can therefore add a timer to the script that pauses for 60 seconds after every 600 requests. With that pause in place, the Python Clearbit bot can keep scraping without running into the limit.
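The pause can be sketched as a small generator. This is a minimal sketch that assumes the 600-requests-per-minute figure quoted above; `throttled`, `batch_size`, `pause_seconds` and `sleep` are illustrative names, not part of the Clearbit API.

```python
import time


def throttled(items, batch_size=600, pause_seconds=60, sleep=time.sleep):
    """Yield items one by one, pausing before each new batch.

    `sleep` is injectable so the pause can be replaced while testing.
    """
    for index, item in enumerate(items):
        # Before item 600, 1200, ... wait so the next batch starts fresh.
        if index and index % batch_size == 0:
            sleep(pause_seconds)
        yield item
```

Wrapping the list of brand names, as in `for name in throttled(brand_names):`, keeps the request loop itself free of counter logic.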

Clearbit has several API endpoints, and some of them are not free. For scraping brand web URLs, the free endpoint is the one below.

https://autocomplete.clearbit.com/v1/companies/suggest?query=

As you can see from the picture, there are three accessible pieces of data, which are the brand name, domain URL and the logo URL.
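A request to this endpoint can be sketched as follows. This is a minimal sketch rather than the full bot script: `build_suggest_url` and `fetch_suggestions` are illustrative helper names, and the `get` parameter is an assumption added so the function can be tried without network access. The response is assumed to be a JSON list of objects carrying the three fields described above.

```python
import json
from urllib.parse import quote

SUGGEST_URL = "https://autocomplete.clearbit.com/v1/companies/suggest?query="


def build_suggest_url(name):
    """Return the suggest URL for a brand name, URL-encoding the query."""
    return SUGGEST_URL + quote(name)


def fetch_suggestions(name, get=None):
    """Return the list of suggestions (dicts) for one brand name.

    `get` defaults to requests.get; pass a stand-in to run offline.
    """
    if get is None:
        import requests  # imported lazily so build_suggest_url works without it
        get = requests.get
    response = get(build_suggest_url(name), timeout=10)
    response.raise_for_status()  # stop on HTTP errors instead of parsing them
    return json.loads(response.text)
```

Calling `fetch_suggestions` with a brand name should return a list of dictionaries, one per matching company.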


Nested loop code for scraping the data

The response from the Clearbit API may contain more than one item for the same brand name. To avoid missing any of them, we need a nested loop that grabs every item returned for each name.
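The nested loop can be sketched like this. It is a minimal sketch using made-up sample data in place of live API responses; `collect_rows` and `fetch` are illustrative names, and the "name" and "domain" keys follow the fields described earlier in the article.

```python
def collect_rows(brand_names, fetch):
    """Return (query_name, brand_name, web_URL) tuples for every suggestion.

    `fetch` is any callable that maps a brand name to a list of dicts
    with "name" and "domain" keys.
    """
    rows = []
    for query_name in brand_names:        # outer loop: one lookup per name
        for item in fetch(query_name):    # inner loop: every match returned
            rows.append((query_name, item.get("name"), item.get("domain")))
    return rows


# Made-up responses standing in for the API.
sample = {
    "acme": [
        {"name": "Acme", "domain": "acme.example"},
        {"name": "Acme Labs", "domain": "acmelabs.example"},
    ],
    "globex": [{"name": "Globex", "domain": "globex.example"}],
}
rows = collect_rows(["acme", "globex"], lambda name: sample.get(name, []))
print(rows)
```

The resulting list can be turned into a DataFrame with `pandas.DataFrame(rows, columns=["query_name", "brand_name", "web_URL"])`, which matches the three table columns used later.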

What’s sqlite3 and why it matters

SQLite3 is a software library that provides a relational database management system. The lite in SQLite means lightweight in terms of setup, database administration, and required resources. It has the following noticeable features: self-contained, serverless, zero-configuration, transactional.

Thus, Sqlite3 is a good fit for applications, websites or IoT devices with low or medium traffic. For example, you can build an automated Amazon product price monitoring bot on WayScript with sqlite3. A site that gets fewer than 100K hits/day should work fine with SQLite. It emphasizes economy, efficiency, reliability, independence, and simplicity.

SQLite is an embedded database and it is not intended to be used as a client/server DB.

So it is not directly comparable to client/server SQL database engines such as MySQL, Oracle, PostgreSQL, or SQL Server, since SQLite is trying to solve a different problem.

That said, for someone learning databases, Sqlite3 is a lighter and easier data management system to start with and to use for building applications such as bots and websites. Although it’s not comparable to a client/server database, the data can be moved to one if your project needs it.

Also, if you really want to, you can use SQLitening for a client/server deployment. So SQLite is not limited to on-disk, serverless, local storage.

Sqlite3 methods and code to store scraped data

There are a few methods and tactics for storing the scraped clearbit api data. 

First things first, you need to import sqlite3. The good news is that you don’t need to install it separately, because sqlite3 has been included in Python since version 2.5.


Second, you need to create a db file by opening a connection, and then create a cursor from that connection. A cursor is an instance of the Cursor class, through which you can invoke methods that execute SQLite statements and fetch data from the result sets of queries.
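Opening a connection and creating a cursor can be sketched as follows. This is a minimal sketch; the variable names are illustrative, and the in-memory database is used only so that the example does not write a file.

```python
import sqlite3

# ":memory:" keeps this sketch from writing a file; pass a path such as
# "weblinks.db" (a hypothetical filename) to create the db file on disk.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()

# The cursor runs SQL statements and exposes their result sets.
cursor.execute("SELECT sqlite_version()")
version = cursor.fetchone()[0]
print(version)

connection.close()
```

If the file named in `sqlite3.connect` does not exist yet, SQLite creates it, so no separate "create file" step is needed.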

Then, the next step is to create a table and name its columns and their types. We need to use the execute method.

cursor.execute('''CREATE TABLE weblinks(query_name TEXT, brand_name TEXT, web_URL TEXT)''')

  • CREATE TABLE is the SQL statement that creates a table
  • weblinks is the table name, which you can choose freely
  • TEXT is the column type. If a column holds whole numbers, use INTEGER
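The table creation step can be tried on its own as shown here. This is a minimal sketch under two assumptions that are not in the original script: an in-memory database, and the `IF NOT EXISTS` clause so that running the script twice does not fail.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()

# IF NOT EXISTS lets the script run again without raising
# "table weblinks already exists".
cursor.execute(
    """CREATE TABLE IF NOT EXISTS weblinks(
           query_name TEXT, brand_name TEXT, web_URL TEXT)"""
)

# PRAGMA table_info returns one row per column: (cid, name, type, ...).
columns = [(row[1], row[2]) for row in cursor.execute("PRAGMA table_info(weblinks)")]
print(columns)

connection.close()
```

Checking the columns with `PRAGMA table_info` is a quick way to confirm that the names and types match the DataFrame you plan to insert.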

The scraped data set contains more than one row. So we need a loop to insert the data into the sqlite3 db file.

for i in range(len(df)):
    cursor.execute('''INSERT INTO weblinks VALUES(?,?,?)''', tuple(df.iloc[i]))

  • INSERT INTO feeds the scraped data into the weblinks table
  • VALUES() lists the values for the columns you created earlier. One question mark is a placeholder for one column value.
  • df.iloc[i] is the row at position i, where i is the loop variable from above
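As an alternative to the row-by-row loop, the same insert can be sketched with `executemany`, which runs one parameterized statement for every row. This is a minimal sketch, not the original script: the rows are made-up sample data held in a plain list of tuples instead of a DataFrame.

```python
import sqlite3

# Made-up rows standing in for the scraped data.
rows = [
    ("acme", "Acme", "acme.example"),
    ("acme", "Acme Labs", "acmelabs.example"),
]

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE weblinks(query_name TEXT, brand_name TEXT, web_URL TEXT)")

# executemany runs the same INSERT once per tuple in `rows`;
# each ? is filled from the matching position in the tuple.
cursor.executemany("INSERT INTO weblinks VALUES(?,?,?)", rows)

inserted = cursor.execute("SELECT COUNT(*) FROM weblinks").fetchone()[0]
print(inserted)

connection.close()
```

With a pandas DataFrame, `df.itertuples(index=False, name=None)` yields plain tuples that can be passed to `executemany` in the same way.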

Last but not least, remember to call commit(), because it makes sure all of the scraped data is actually written to the db file. Otherwise, you will lose the data. Then, you can close the connection using close().
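The effect of commit() can be checked by writing to a file, closing it, and opening it again. This is a minimal sketch; the temporary directory and the filename `weblinks.db` are assumptions made for the example, and the row is sample data.

```python
import os
import sqlite3
import tempfile

# A path inside a temporary directory, so the sketch does not
# clutter the working folder.
db_path = os.path.join(tempfile.mkdtemp(), "weblinks.db")

connection = sqlite3.connect(db_path)
connection.execute("CREATE TABLE weblinks(query_name TEXT, brand_name TEXT, web_URL TEXT)")
connection.execute("INSERT INTO weblinks VALUES(?,?,?)", ("acme", "Acme", "acme.example"))
connection.commit()  # write the pending INSERT to the file
connection.close()   # close() alone does not commit

# Reopen the file to confirm the row was stored.
check = sqlite3.connect(db_path)
saved = check.execute("SELECT COUNT(*) FROM weblinks").fetchone()[0]
check.close()
print(saved)
```

Reopening the file from Python like this is also a quick way to verify the result before opening it in a database viewer.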

Once the data is stored in a db file, you can open the file with Ridill to check that everything was saved correctly.


Full Python Script of Clearbit Bot

If you are interested in the full script of Clearbit Bot for Scraping Brand Web Domains with Python, Clearbit API and Sqlite3, please subscribe to our newsletter and add the message “Python Tutorial 37”. We will send the script to your mailbox right away.

