September 21, 2021

Python Tutorial 16 – Amazon Product Scraper Using Selenium, BeautifulSoup and gspread

You might wonder how some sellers spot trending niche products early and invest well. Software like Jungle Scout certainly helps them understand their target consumers. I only partially agree that such tools are the answer, though, because what matters most is the mindset and the skills to automate market surveys and monitoring yourself. Instead of paying for and relying on third-party software, a self-developed Amazon product scraper is indispensable if you want to stay ahead of demand and monitor your pricing.

Amazon product information in the search results helps you understand two things: the sales performance of a product, and the customer reviews of the product and its merchant. From there, it can extend to much wider applications, such as a price tracker and P&L market value calculations.

In this Python Tutorial, I will show you how to create an Amazon product scraper and save the fetched data to Google Sheets. By the end of this Python Tutorial, you will know how to install the gspread module and where to find the data elements in the HTML.

Python Tutorial – Import Selenium, BeautifulSoup and gspread Module

Amazon tends to block plain page requests from a Python script, so fetching the page directly and parsing it with BeautifulSoup doesn't work. The response comes back as a “sorry, something wrong” page. Instead, you can load the page with Selenium, fetch the product data smoothly, and then parse the page source with BeautifulSoup. Importing these two modules works the same way as in the scripts I wrote previously for other bots.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from bs4 import BeautifulSoup
import gspread

Also, this script is not only for fetching product information and saving it in a sheet. It also aims to refresh the data automatically on a schedule and to track competitors' marketing information and product pricing. I will cover the price tracker in another article. Because of that, I recommend using the Google Sheets API and managing the data in Google Sheets. Here I recommend gspread, because it makes things simpler.

For creating the Google Sheets API credentials and setting up the service account, please refer to the article I released previously.

I would say gspread makes it much easier to connect to the Google API and manage the fetched data. First things first, you need to install gspread on your machine:

pip3 install gspread

Then, copy the name of the JSON key file you downloaded from your new Google API service account and paste it into the filename argument.

gc = gspread.service_account(filename = 'amazon-price-tracker-321016-494586b5d875.json')

Last but not least, copy the name of your Google Sheet and open it using the open method. Then select the worksheet where you want to save the fetched data. Here I use sheet1.

sh = gc.open('AmazonPriceTracker').sheet1

Python Tutorial – Create a Prototype of Product Search Scraper

You can search for any product's information using search queries. The prototype of the Amazon product scraper below works for whatever product you are searching for and whichever Amazon marketplace you are looking through.

1. Amazon Search Query URL structure

You will find that two parameters in the URL control the SERP. One is k, followed by the keyword. The other is page, followed by the page number.

https://www.amazon.com/s?k=ring+camera&page=2

So you can create two variables for later use. One represents the keyword you are searching for. The other is for dynamically looping through and fetching more pages.

query = "ring+camera"
page = "&page="
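If the keyword contains spaces or special characters, it may be safer to let the standard library encode it instead of typing the plus signs by hand. The following is a minimal sketch, assuming the k and page parameters described above; build_search_url is a hypothetical helper name and is not part of the original script.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.amazon.com/s"  # search endpoint from the example URL above


def build_search_url(keyword, page_number=1):
    """Return a search URL for a plain-text keyword and a page number.

    build_search_url is a hypothetical helper used only for illustration.
    """
    # urlencode turns spaces into "+" and escapes other special characters
    params = urlencode({"k": keyword, "page": page_number})
    return f"{BASE_URL}?{params}"


print(build_search_url("ring camera", 2))
```

With this approach you can keep the keyword readable ("ring camera") and pass the result straight to driver.get().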

2. Find the product information block in the SERP

You can right-click any product title and use the browser's Inspect function to see which element wraps the entire product information block. The block includes all the core product information you aim to fetch, for example the ASIN, pricing, title, URL, review count, etc. It's similar to the web scraping I shared previously.

As you can see, all the information sits in a div with an attribute named data-component-type. So you can draft the code like this. This is for scraping the 1st page:

# driver is the Selenium WebDriver instance created earlier in the script
driver.get("https://www.amazon.com/s?k=" + query)
soup = BeautifulSoup(driver.page_source, 'html.parser')
results = soup.find_all('div', {'data-component-type': 's-search-result'})

3. Select the Data Type and Scrape Specifically

The ASIN is a key element you must fetch, because the price tracker needs it to connect with your current P&L calculator. I will talk about this in another article.

As you can see, the ASIN value sits in the data-asin attribute. So, for each item in results, you can create a variable and use attrs to get the product's ASIN.

asins = item.attrs['data-asin']

Then, the product title is in the h2 in the HTML, so the Python code can look like this. To remove any leading (spaces at the beginning) and trailing (spaces at the end) whitespace, you can call strip() on the text.

try:
    title = item.h2.a.text.strip()
except AttributeError:
    # The h2 or its link is missing for this item, so fall back to an empty title
    title = ''

Pricing is another key element in the price tracker because it's dynamic and might change depending on your competitors' promotions. In the Amazon product scraper, the code below fetches the price. For filtering and calculating market value later, I recommend removing the currency symbol, so the data can be treated as a number in Google Sheets.

try:
    price_parent = item.find('span', 'a-price')
    price = price_parent.find('span', 'a-offscreen').text.replace('$', '')
except Exception as e:
    price_parent = '0'
    price = '0'
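Removing only the dollar sign still leaves a string, and prices above 999 also contain a thousands separator. Here is a minimal sketch of one way to turn the text into a real number, assuming US-style prices such as "$1,299.00"; parse_price is a hypothetical helper name and is not part of the original script.

```python
def parse_price(raw_text, default=0.0):
    """Convert text such as "$1,299.00" to a float, or return default.

    Assumes US-style formatting: "$" prefix, "," thousands separator.
    """
    # Drop the dollar sign, thousands separators, and surrounding whitespace
    cleaned = raw_text.replace("$", "").replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        # The text was not a price (for example an availability message)
        return default


print(parse_price("$1,299.00"))  # 1299.0
print(parse_price("$24.99"))  # 24.99
print(parse_price("Currently unavailable"))  # 0.0
```

You could call it on the a-offscreen text in place of the replace('$','') step, for example price = parse_price(price_text).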

Python Tutorial – Scrape Multi-pages of Search Result

The 1st page of the Amazon SERP has about 22 products on average. That might not be sufficient for you to understand a product's market performance and opportunity. In this case, you need to scrape more than one page. Luckily it's not complicated, and it's similar to the web pagination scraper I shared before.

First things first, you need to create an outer loop around the loop that fetches the specific data. You can use a loop variable x.

Then, in range, you set the pages to scrape. Keep in mind that range stops one before its second argument y, so the last page scraped is y - 1. For example, here the second argument is 3, so the last page is 2.

for x in range(1, 3):
    driver.get("https://www.amazon.com/s?k=" + query + page + str(x))
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    results = soup.find_all('div', {'data-component-type': 's-search-result'})

    for item in results:
        asins = item.attrs['data-asin']

Last but not least, now that you know the Amazon URL structure, you need to update the URL request to include the page number, like this:

driver.get("https://www.amazon.com/s?k="+query+page+str(x))
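Because range excludes its stop value, it is easy to scrape one page fewer than intended. The sketch below shows one way to make the last page inclusive, reusing the query and page variables from earlier; page_urls is a hypothetical helper name and is not part of the original script.

```python
query = "ring+camera"
page = "&page="


def page_urls(query, last_page):
    """Return the search URLs for pages 1 through last_page, inclusive."""
    # range excludes its stop value, so add 1 to include last_page itself
    return [
        "https://www.amazon.com/s?k=" + query + page + str(x)
        for x in range(1, last_page + 1)
    ]


for url in page_urls(query, 2):
    print(url)
```

Each URL in the list can then be passed to driver.get() inside the outer loop.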

Save Fetched Data to the Google Sheets

Now everything is ready, and it's time to save the data to Google Sheets. gspread has a method called append_row. You just need to pass a list of the data variables to this method inside the item loop, like this. Each product is then added as a new row in your destination sheet. Note that rating and review_count are not covered above; fetch them from each item the same way as the title and price.

sh.append_row([asins, title, price, rating, review_count])
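Calling append_row once per product sends one API request per row, which may run into Google Sheets API rate limits on larger scrapes. As an alternative, gspread worksheets also offer append_rows, which writes several rows in one request. The following is a minimal sketch under stated assumptions: save_products is a hypothetical helper, and the dictionary keys are illustrative names rather than part of the original script.

```python
def save_products(worksheet, products):
    """Write all products to the worksheet in one call and return the row count.

    worksheet is expected to be a gspread worksheet, such as sh above.
    products is a list of dicts; the key names are assumptions for this sketch.
    """
    rows = [
        [p["asin"], p["title"], p["price"], p["rating"], p["review_count"]]
        for p in products
    ]
    if rows:
        # USER_ENTERED asks Google Sheets to parse values as if typed by hand,
        # so numeric strings are stored as numbers
        worksheet.append_rows(rows, value_input_option="USER_ENTERED")
    return len(rows)
```

In the scraper, you would collect one dict per item inside the loops and call save_products(sh, products) once at the end.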

Full Python Script of Amazon Product Scraper

If you would like the full version of the Python script of the Amazon Product Scraper, please subscribe to our newsletter and add the message Python Tutorial 16. We will send the script to your mailbox right away.

Contact us

I hope you enjoy reading Python Tutorial 16 – Amazon Product Scraper Using Selenium, BeautifulSoup and gspread. If you did, please support us by doing one of the things listed below, because it always helps out our channel.

  • Support my channel through PayPal (paypal.me/Easy2digital)
  • Subscribe to my channel and turn on the notification bell: Easy2Digital Youtube channel
  • Follow and like my page: Easy2Digital Facebook page
  • Share the article to your social network with the hashtag #easy2digital
  • Buy products with Easy2Digital 10% OFF Discount code (Easy2DigitalNewBuyers2021)
  • Sign up for our weekly newsletter to receive the latest Easy2Digital articles, videos, and discount codes on Buyfromlo products and digital software
  • Subscribe to our monthly membership through Patreon to enjoy exclusive benefits (www.patreon.com/louisludigital)
