As mentioned in the previous Python tutorial, “How to Write, Parse, Read CSV Files with Scraped Data”, we will now discuss how to specify the web data to scrape, because this is one of the key reasons digital marketers like to learn Python.

So in this Python Tutorial for Digital Marketers 4, I’ll walk you through the basic concepts and the BeautifulSoup and Requests methods you need to know to specify web data and scrape it. It helps if you can read HTML, CSS, and JavaScript for this part, but it’s totally okay if you can’t yet, because the purpose here is to find where the data is located and learn some methods to scrape specific data for digital marketing purposes.

During this lesson, I’ll take Ring.com as an example, writing code to scrape all the latest offers and pricing. By the end of the lesson, you will be able to identify where your expected data is located on a target page and scrape it all in minutes.

Identify the data section on a target page

As you can see on the Ring offers page, there are many Ring product bundles and offers, which might be updated irregularly. If you were a Ring reseller or one of Ring’s competitors, you would definitely like to adjust your product marketing and pricing strategy accordingly, so that your conversion rate is not impacted when sales are critical for your business.

To find where the data is located, we need to use the browser’s developer tools and inspect the page source code. Taking Chrome as an example, you can select a product, right-click, and choose Inspect.

We aim to scrape the headline, subheadline, regular price, promotion price, description, and product URL. With this target in mind, we look into the code and find the section, also called a division, that includes all the data we want to scrape:

<div class="plp-product"…….</div>

You will notice that the rest of the product information blocks all start with the same division:

<div class="plp-product"…….</div>

To see if this is correct, we can start using methods: find(), and find_all()

(Note: I’m not going into detail on how to import the BeautifulSoup and Requests modules here. If you’d like to learn more, please check the previous article:

Python Tutorial for Digital Marketers 2: Web Scraping with BeautifulSoup, Requests, Sublime Text)

First of all, let us create variables called ringweb, ringoffers and ringproduct

ringweb = requests.get('https://ring.com/collections/offers').text

ringoffers = BeautifulSoup(ringweb, 'lxml')

The find() method locates the first element that matches the arguments you give it and returns that element. As the target is <div class="plp-product"…….</div>, we can write this line of code:

ringproduct = ringoffers.find('div', class_='plp-product')

In this line of code, we define a variable ringproduct, which holds the first matching section found within ringoffers. Keep in mind that the arguments of the method are separated by commas, and each string, such as the tag name, goes in quotes. For the class, we need to write class_= with a trailing underscore, because class is a reserved keyword in Python.
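To try find() without hitting the live site, here is a minimal sketch that parses a small made-up HTML snippet. Only the class name plp-product comes from the Ring page; the snippet itself, the other-section class, and the use of the built-in html.parser (instead of lxml) are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the offers page; only the class name
# 'plp-product' is taken from the article.
sample_html = """
<div class="plp-product">
  <h4>Starter Kit</h4>
</div>
<div class="other-section">
  <h4>Not a product</h4>
</div>
"""

# 'html.parser' ships with Python; the article itself uses 'lxml'.
ringoffers = BeautifulSoup(sample_html, "html.parser")

# class_ has a trailing underscore because `class` is a reserved keyword.
ringproduct = ringoffers.find("div", class_="plp-product")
print(ringproduct.h4.text)

# find() returns None when nothing matches.
missing = ringoffers.find("div", class_="no-such-class")
print(missing)
```

Note that find() gives back None rather than raising an error when there is no match, which matters later when some bundles lack a field.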

If we print this out and run the script (Command + B in Sublime Text), we can see that these lines of code already grab the section data. It’s working.

As Ring.com sells more than one set of bundles on the offers page, we need the other method, find_all(). Simply replace find() with find_all(), and the data of all bundle sections is returned.
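The difference between the two methods can be sketched on a made-up snippet with three bundles. The product names here are hypothetical; only the plp-product class comes from the Ring page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with three product sections.
sample_html = """
<div class="plp-product"><h4>Starter Kit</h4></div>
<div class="plp-product"><h4>Doorbell Bundle</h4></div>
<div class="plp-product"><h4>Camera Bundle</h4></div>
"""

ringoffers = BeautifulSoup(sample_html, "html.parser")

# find() returns only the first match.
first = ringoffers.find("div", class_="plp-product")

# find_all() returns a list-like collection of every match.
all_products = ringoffers.find_all("div", class_="plp-product")

print(first.h4.text)
print(len(all_products))
```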

Specify the data to parse and scrape

Now we start to parse the target section and pick out the specific data we want to scrape, as listed earlier in this article.

First of all, it’s the product headline

Inspecting the page with the developer tools shows that, on the Ring offer page, h4 is used only for the bundle products’ headlines, so we can directly write a line of code:

headline = ringproduct.h4.text
print(headline)

In Python, we use a dot to step from one element to the next along a path, except inside a method call, where the parts are passed as arguments. As we want the string data, we add .text after h4.
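As a small sketch of the dot notation, assuming a made-up product section: writing ringproduct.h4 is shorthand for ringproduct.find('h4'), and get_text(strip=True) is one way to drop the surrounding whitespace that .text keeps.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the headline is wrapped in extra whitespace on purpose.
sample_html = """
<div class="plp-product">
  <h4>
    Starter Kit
  </h4>
</div>
"""

ringproduct = BeautifulSoup(sample_html, "html.parser").find("div", class_="plp-product")

# Dot access returns the first <h4> inside the section, same as find("h4").
same_tag = ringproduct.find("h4")

# .text keeps whitespace and line breaks from the source.
headline_raw = ringproduct.h4.text

# get_text(strip=True) returns the text with the whitespace trimmed.
headline = ringproduct.h4.get_text(strip=True)

print(repr(headline_raw))
print(repr(headline))
```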

Then, it’s subheadline and description

We can see that h6 is used in two places (subheadline and description). So unlike the headline, we need to use the find() method with a class to locate a specific h6:

subheadline = ringproduct.find('h6', class_='sub-title size-xs').text
print(subheadline)

description = ringproduct.find('h6', class_='product-description font__exlight').text
print(description)
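When class_ is given a string with a space, BeautifulSoup compares it against the whole class attribute exactly as written in the HTML. The sketch below, on made-up markup that reuses the class names above, shows that form next to two alternatives: matching on a single class, and a CSS selector with select_one(). The text values are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup reusing the class names from the article.
sample_html = """
<div class="plp-product">
  <h6 class="sub-title size-xs">Save on security</h6>
  <h6 class="product-description font__exlight">Doorbell plus chime</h6>
</div>
"""

ringproduct = BeautifulSoup(sample_html, "html.parser").find("div", class_="plp-product")

# Exact match on the full class attribute string, as in the article.
subheadline = ringproduct.find("h6", class_="sub-title size-xs").text

# Matching on one class still works if the site reorders or adds classes.
description = ringproduct.find("h6", class_="product-description").text

# CSS selector form: an <h6> that carries both classes, in any order.
subheadline_css = ringproduct.select_one("h6.sub-title.size-xs").text

print(subheadline)
print(description)
```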

You will find that not all product bundles have a subheadline. In Python, we need to handle missing data to avoid errors that stop the script. I’ll talk about it in a moment.

Then, it’s the regular price and promotion price

promote_price = ringproduct.find('span', class_='regular-price').text
print(promote_price)

regular_price = ringproduct.find('span', class_='compare-price').text
print(regular_price)

Last but not least, it’s the product landing URL

Lines of code:

product_url = ringproduct.a['href']
product_link = f'https://ring.com{product_url}'
print(product_link)

Basically, there’s only one unique URL in the source code of each product bundle. For example, the Starter Kit landing path is collections/offers/products/starter-kit. So we don’t need to specify which link we aim to scrape, and can directly use [ ] to get the href value of the first link in the section: product_url = ringproduct.a['href']

However, if we save these paths to a file as they are, they can’t be opened as pages because they lack the domain. To get the full URL, we can create a variable product_link and use an f-string: prefix the string with f and insert product_url inside { }.

product_link = f'https://ring.com{product_url}'
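The f-string works when the scraped path starts with a slash. As a hedged alternative, the standard library’s urljoin() joins a base URL and a path whether or not the leading slash is there. The BASE_URL constant below is a name introduced for this sketch, and the example path is the Starter Kit one mentioned above.

```python
from urllib.parse import urljoin

BASE_URL = "https://ring.com"  # name chosen for this sketch

# f-string approach from the article; needs the leading slash in the path.
product_url = "/collections/offers/products/starter-kit"
product_link = f"{BASE_URL}{product_url}"

# urljoin() gives the same result with or without the leading slash.
joined_with_slash = urljoin(BASE_URL, "/collections/offers/products/starter-kit")
joined_without_slash = urljoin(BASE_URL, "collections/offers/products/starter-kit")

print(product_link)
print(joined_without_slash)
```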

If we print this out, we get the full product URL, which proves it’s working.

Create a loop to scrape all section specified data

These lines of code are working, so we can roll them out to scrape all the data in bulk. For this, we need to use for…in and the find_all() method:

for ringproduct in ringoffers.find_all('div', class_='plp-product'):

As this line is the parent level and all the child lines of code run under it, we need to add a colon at the end of the line and indent the child lines.

If we print inside the loop, the output includes the information of every product bundle.
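Put together, the loop might look like the sketch below. It runs on made-up markup with two bundles so it can be tried offline; the product names, prices, and the doorbell-bundle path are invented for illustration, and only the class names and the ring.com domain come from the steps above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; names, prices and the second path are made up.
sample_html = """
<div class="plp-product">
  <a href="/collections/offers/products/starter-kit"><h4>Starter Kit</h4></a>
  <span class="regular-price">$199.99</span>
</div>
<div class="plp-product">
  <a href="/collections/offers/products/doorbell-bundle"><h4>Doorbell Bundle</h4></a>
  <span class="regular-price">$149.99</span>
</div>
"""

ringoffers = BeautifulSoup(sample_html, "html.parser")

rows = []
# The colon ends the parent line; the indented lines run once per section.
for ringproduct in ringoffers.find_all("div", class_="plp-product"):
    headline = ringproduct.h4.text
    promote_price = ringproduct.find("span", class_="regular-price").text
    product_link = f"https://ring.com{ringproduct.a['href']}"
    rows.append((headline, promote_price, product_link))
    print(headline, promote_price, product_link)
```

Collecting each bundle into rows, rather than only printing it, keeps the data available for saving to a file afterwards.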

Pass missing data in some sections

You might have noticed that not all product bundles have a subheadline and a promotion price. When an element is missing, find() returns None, so if you run the subheadline and promotion price lines of code on such a bundle, you come across this response from Python: 'NoneType' object has no attribute 'text', and the scraping process stops.

This reflects the real world of coding, where not all information is orderly and structured. So we need to use try/except to pass over missing data when we come across it.

Subheading:

try:
    subheadline = ringproduct.find('h6', class_='sub-title size-xs').text
except Exception as e:
    subheadline = None

Promotion price:

try:
    regular_price = ringproduct.find('span', class_='compare-price').text
except Exception as e:
    regular_price = None

For the value assigned under except Exception as e, you can choose whatever you find easy to understand, such as None, 0, 'NA', etc.
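As a variation on the try/except above, the sketch below catches only AttributeError, which is the specific error raised when .text is read from None, so unrelated bugs are not silently hidden. It also shows a small helper, text_or_default, which is a hypothetical name introduced here, that checks for None instead of catching an exception. The markup is made up, with the second bundle missing its subheadline.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second bundle has no subheadline.
sample_html = """
<div class="plp-product">
  <h4>Starter Kit</h4>
  <h6 class="sub-title size-xs">Save on security</h6>
</div>
<div class="plp-product">
  <h4>Doorbell Bundle</h4>
</div>
"""


def text_or_default(parent, name, class_name, default=None):
    """Return the text of the first matching tag, or default if it is absent."""
    tag = parent.find(name, class_=class_name)
    return tag.text if tag is not None else default


ringoffers = BeautifulSoup(sample_html, "html.parser")

with_try = []
with_helper = []
for ringproduct in ringoffers.find_all("div", class_="plp-product"):
    # Variant 1: catch only the error a missing tag causes.
    try:
        subheadline = ringproduct.find("h6", class_="sub-title size-xs").text
    except AttributeError:
        subheadline = None
    with_try.append(subheadline)

    # Variant 2: the helper checks for None before reading .text.
    with_helper.append(text_or_default(ringproduct, "h6", "sub-title size-xs", default="NA"))

print(with_try)
print(with_helper)
```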

I set None, so that is what shows up in the result for any bundle missing that data.

Save the data into a CSV file

Now the Python script is ready, and it’s time to save the scraped data somewhere, which can be either a local file or an online server.

I’m not going to go into detail about the CSV file here, because I covered it in a previous lesson. If you are interested, please check out the other article:

Python Tutorial for Digital Marketers 3: How to Write, Parse, Read CSV Files with Scraped Data

Once the code is done, a CSV file is created that stores all the specified data we aimed to scrape.
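For a quick idea of that last step, here is a minimal sketch using the standard csv module. The rows are made-up values shaped like the scraped fields, the column names and the ring_offers.csv file name are assumptions, and the file goes to the system temp folder so the example leaves nothing behind in your project.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical rows shaped like the values scraped in the loop.
rows = [
    {"headline": "Starter Kit", "subheadline": "Save on security",
     "product_link": "https://ring.com/collections/offers/products/starter-kit"},
    {"headline": "Doorbell Bundle", "subheadline": None,
     "product_link": "https://ring.com/collections/offers/products/doorbell-bundle"},
]
fieldnames = ["headline", "subheadline", "product_link"]

# File name and location are assumptions for this sketch.
csv_path = Path(tempfile.gettempdir()) / "ring_offers.csv"

# newline="" avoids blank lines between rows on Windows.
with open(csv_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

# Read the file back to check what was stored.
with open(csv_path, newline="", encoding="utf-8") as f:
    saved = list(csv.DictReader(f))

print(saved)
```

A value of None is written as an empty cell, so bundles with missing data still produce a complete row.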

So easy, right? I hope you enjoyed reading Python Tutorial for Digital Marketers 4: How to Specify Web Data to Scrape.

Apart from scraping HTML and XML web data, if you are interested in learning to fetch platform data that can only be accessed via an API, please check out this article, where we start with the YouTube channel:

Python Tutorial for Digital Marketers 5: Create Youtube API & Scrape Youtube Videos

By Louis Lu

Growth Hacker & Digital Marketer, with a proven record of over 11 years experience in 20+ Asian markets, and 25,000+ connections in Linkedin
