Chapter 22: Tmall & Taobao Product Scraper Using Keywords to Fetch Item Data

COVID-19 hit China first and hard, but China has also been one of the fastest countries to recover. The Evergrande crisis and power shortages still cloud the outlook, but China's economy grew by a blistering 18.3% in the first quarter of 2021 compared with the same quarter of 2020. In a country with more than 450 million middle-class citizens, Tmall and Taobao product data has become a lighthouse for sellers who already do business in China or who plan to enter and capture a share of this pie.

Tmall and Taobao are the largest online B2C and C2C marketplaces in China, covering almost every product category from dry goods to wet goods. Amazon is an inspiring lighthouse if you are looking for products to sell worldwide or defending your business share in your target markets. In China, Tmall and Taobao are the places you must explore. You can find practically anything there.

In this chapter, I will walk you through creating a Tmall and Taobao product scraper that helps you investigate the market or build an automatic price monitor. By the end of the chapter, you will know the tools and elements needed to build a Tmall and Taobao product scraper. You can then plug the data into your dashboard or P&L calculator.


What’s the difference between Taobao/Tmall and Amazon Product Scraper

As with the Amazon product scraper we walked through earlier, Selenium is the key component of the Taobao and Tmall product scraper. In China, digital platforms often require additional verification, such as SMS codes, a manual swipe, face ID, or a personal ID. Chinese digital channels are particularly strict about suspicious robot crawling, and most platforms have built up some form of immune system against robots.

It's normal for your robot to come across the challenge shown in the above photo in China. Unlike verification on western channels, it not only requires manual action but also changes from time to time, so the script needs adjusting whenever the verification is updated. The same methodology can be rolled out to Douyin, Zhihu, and other Chinese platforms.

In this chapter, we will mainly showcase web driver detection and ActionChains() for manual swipe verification.

Taobao Scraper – ChromeOptions() setting

This setting matters because Taobao and Tmall run JavaScript that inspects incoming traffic and automatically judges whether it comes from a human being or a robot.

Normally, when a user logs in to his or her Taobao or Tmall account, the JavaScript sees window.navigator.webdriver as undefined, which means the visitor is not a robot.

However, the value is true when the browser is controlled by a webdriver, and Taobao and Tmall then trigger a further anti-crawling procedure.
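
To see which value the site's script would read, you can query the flag from your own Selenium session. This is a minimal sketch, assuming `driver` is an already started Selenium WebDriver; the helper name `looks_automated` is hypothetical and not part of the original script.

```python
def looks_automated(driver):
    """Return True when the page reports navigator.webdriver as true.

    `driver` is assumed to be a Selenium WebDriver (any object with an
    execute_script method works). A regular browser reports undefined or
    false, which Selenium hands back as None or False.
    """
    flag = driver.execute_script("return navigator.webdriver")
    return flag is True
```

Calling `looks_automated(driver)` before and after changing the Chrome options is a quick way to check whether the settings had any effect.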

So the very first step is to make your scraping behavior look like a real user's actions. To do this, we need to add some extra code to the Chrome options.

In these settings, I recommend setting the Chrome browser environment to Simplified Chinese, so Taobao and Tmall don't suspect you are a user from an overseas market.

Then come the core components for escaping detection as a robot scraper.

Last but not least, I also recommend adding code that disables image loading to speed up page loads. After all, the photos are not your goal.
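
The three recommendations above (Simplified Chinese, hiding the webdriver flag, and skipping images) might be combined as follows. This is a sketch under assumptions rather than the chapter's original snippet: the constant names and `build_driver` are hypothetical, and the specific flags are commonly used Chrome settings that may stop working if Taobao or Tmall change their detection.

```python
# Plain data first, so the settings can be inspected without starting a browser.
CHROME_ARGUMENTS = [
    "--lang=zh-CN",  # run Chrome in Simplified Chinese
    # Disable the Blink feature that sets navigator.webdriver to true
    "--disable-blink-features=AutomationControlled",
]
CHROME_PREFS = {
    "intl.accept_languages": "zh-CN,zh",  # advertise Chinese to the site
    "profile.managed_default_content_settings.images": 2,  # 2 = block images
}
HIDE_WEBDRIVER_JS = (
    "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"
)


def build_driver():
    """Start Chrome with the settings above (requires selenium and Chrome)."""
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in CHROME_ARGUMENTS:
        options.add_argument(argument)
    options.add_experimental_option("prefs", CHROME_PREFS)
    # Drop the "enable-automation" switch that ChromeDriver adds by default
    options.add_experimental_option("excludeSwitches", ["enable-automation"])

    driver = webdriver.Chrome(options=options)
    # Redefine navigator.webdriver before any page script runs
    driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument", {"source": HIDE_WEBDRIVER_JS}
    )
    return driver
```

Keeping the arguments and preferences as plain lists and dictionaries makes it easy to adjust one setting at a time when the platform's checks change.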

Taobao Scraper – What Product Data you can grab

The scrapable data for Taobao products includes store names, product photos, product page URLs, titles, pricing, and total sales.

The Tmall dataset is quite similar to Taobao's, with one unique metric: the customer review count, which lets you compare sales against the number of reviews.

Taobao Scraper – Taobao and Tmall Product Data Path

The HTML and CSS of the Taobao and Tmall SERPs are different. Here I take Tmall as the example.

First, there is the SERP URL structure. Because this is keyword-based scraping, you need to know how to build a URL with changeable query parameters. What's more, you might want to scrape more than the first SERP. Below are sample URLs for Swans products on Tmall. Each page has 60 products, so each further page adds 60 to the s parameter. For example, s=60 is the 2nd page and s=120 is the 3rd page.

https://list.tmall.com/search_product.htm?q=swans
https://list.tmall.com/search_product.htm?q=swans&s=60
https://list.tmall.com/search_product.htm?q=swans&s=120
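
The pattern in these URLs can be generated from a keyword and a page number. The following is a minimal sketch; the function name `build_serp_url` and the constants are hypothetical, and it assumes the `q` and `s` parameters behave as in the samples above.

```python
from urllib.parse import urlencode

TMALL_SEARCH_URL = "https://list.tmall.com/search_product.htm"
ITEMS_PER_PAGE = 60


def build_serp_url(keyword, page=1):
    """Return the Tmall search URL for `keyword` on a 1-based `page`."""
    if page < 1:
        raise ValueError("page starts at 1")
    params = {"q": keyword}
    offset = (page - 1) * ITEMS_PER_PAGE
    if offset:  # the first page carries no s parameter
        params["s"] = offset
    return f"{TMALL_SEARCH_URL}?{urlencode(params)}"


# The first three result pages for the keyword "swans"
urls = [build_serp_url("swans", page) for page in range(1, 4)]
```

Because `urlencode` percent-encodes the keyword, the same helper also works for Chinese search terms.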

Secondly, each SERP contains 60 product item blocks. When you create a loop in your Python script, you can use this data path to select all the blocks.

results = soup.find_all('div',{'class': 'product-iWrap'})

Last but not least, we need another loop over the results data to scrape the specific fields we need for business purposes. Here I select the product title, URL, shop name, monthly sales, and price.

for tag in results:
    # Each tag is one product item block from the SERP
    title = tag.find('p', {'class': 'productTitle'}).text.strip()
    url = tag.find('a', {'class': 'productImg'})['href']
    Shopname = tag.find('a', {'class': 'productShop-name'}).text.strip()
    monthlySales = tag.find('p', {'class': 'productStatus'}).text.strip()
    price = tag.find('p', {'class': 'productPrice'}).text.strip()
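
Put together, the two loops might look like the sketch below. It assumes BeautifulSoup (the `bs4` package) is installed and reuses the class names shown above; the helper names and the sample HTML are made up for illustration, and real Tmall markup may differ or change. Missing fields come back as None instead of raising an error.

```python
from bs4 import BeautifulSoup


def text_of(tag, name, css_class):
    """Return the stripped text of the first matching child, or None."""
    node = tag.find(name, {"class": css_class})
    return node.text.strip() if node else None


def parse_products(html):
    """Extract one dict per product block from a Tmall results page."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for tag in soup.find_all("div", {"class": "product-iWrap"}):
        link = tag.find("a", {"class": "productImg"})
        products.append({
            "title": text_of(tag, "p", "productTitle"),
            "url": link.get("href") if link else None,
            "shop_name": text_of(tag, "a", "productShop-name"),
            "monthly_sales": text_of(tag, "p", "productStatus"),
            "price": text_of(tag, "p", "productPrice"),
        })
    return products


# Made-up sample markup that only mimics the class names used above
SAMPLE_HTML = """
<div class="product-iWrap">
  <a class="productImg" href="//detail.tmall.com/item.htm?id=1"></a>
  <p class="productPrice">¥199.00</p>
  <p class="productTitle"> SWANS goggles </p>
  <a class="productShop-name">Swans store</a>
  <p class="productStatus">Monthly sales 120</p>
</div>
<div class="product-iWrap"><p class="productTitle">No price yet</p></div>
"""

products = parse_products(SAMPLE_HTML)
```

In a live run, you would pass `driver.page_source` to `parse_products` once the results page has loaded.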

ActionChains() needed for Tmall Scraper

Unlike Taobao, Tmall has a stricter login process because it requires a manual swipe verification step. Taobao, meanwhile, is more like the Instagram scraper I walked you through earlier: it takes you straight to the SERP after you log in.

Fortunately, Selenium is very powerful. It can not only scroll up and down but also click and hold an element and swipe it horizontally using the ActionChains class.

First, you need to import this module at the beginning of the Python script.

from selenium.webdriver.common.action_chains import ActionChains

Then, just as you found the data path of the product items, you need to identify where the swipe element is and use Selenium to locate it.

from selenium.webdriver.common.by import By
slider = driver.find_element(By.ID, 'nc_1_n1z')

Last but not least, we can use ActionChains and a few more methods to simulate the action as a real user would complete it. A friendly reminder: it's better to set a timer after you log in, as the page can sometimes load slowly. Otherwise, your IP might be flagged as a robot and blacklisted by Taobao.

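# Needs: import time, plus By, WebDriverWait, and expected_conditions (as EC) from selenium
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[type='submit']"))).click()
time.sleep(15)  # give the page time to load after login

# Hold the slider, drag it 300 px to the right, pause, then release
ActionChains(driver).click_and_hold(slider).move_by_offset(300, 0).pause(5).release().perform()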
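
One possible variation on the single 300-pixel move is to split the drag into several uneven moves with short pauses, so that it looks less mechanical. This is a sketch of that idea, not the chapter's method: `split_offset` and `swipe_slider` are hypothetical helpers, the step count and pause range are arbitrary assumptions, and there is no guarantee that it passes the verification.

```python
import random


def split_offset(total, steps):
    """Split `total` pixels into `steps` positive moves that sum to `total`."""
    if steps < 1 or total < steps:
        raise ValueError("need 1 <= steps <= total")
    # Pick steps - 1 distinct cut points inside (0, total), then take the gaps
    cuts = sorted(random.sample(range(1, total), steps - 1))
    edges = [0] + cuts + [total]
    return [right - left for left, right in zip(edges, edges[1:])]


def swipe_slider(driver, slider, total=300, steps=6):
    """Drag `slider` to the right by `total` pixels in uneven moves."""
    from selenium.webdriver.common.action_chains import ActionChains

    chain = ActionChains(driver).click_and_hold(slider)
    for move in split_offset(total, steps):
        # Short random pause after each partial move
        chain = chain.move_by_offset(move, 0).pause(random.uniform(0.05, 0.2))
    chain.release().perform()
```

With the `driver` and `slider` objects from the snippets above, `swipe_slider(driver, slider)` would replace the one-line ActionChains call.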

Taobao Scraper – What is the business value of this data?

Like the Amazon product scraper, the Tmall and Taobao product scraper has huge business value for learning about market opportunities and the competitive environment. From the sales and pricing data, you can see the average price and which selling strategies are more popular. You can also learn how many competitors are selling products similar to yours, and you can create a real-time price monitor just as you would for your Amazon business.
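
As a rough illustration of the average price and the price monitor, the sketch below turns scraped price strings into numbers and compares two snapshots. The price format ("¥199.00"), the item keys, and the helper names are assumptions made for this example, and the figures are invented.

```python
import re
from statistics import mean


def parse_price(text):
    """Pull the first number out of a price string such as '¥199.00'."""
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None


def price_changes(previous, current):
    """Map each product key to (old, new) where the price differs."""
    return {
        key: (previous[key], price)
        for key, price in current.items()
        if key in previous and previous[key] != price
    }


# Invented price strings in an assumed format
scraped = ["¥199.00", "¥ 1,299.50", "¥89"]
prices = [parse_price(text) for text in scraped]
average_price = mean(prices)

# Two invented snapshots keyed by a product identifier
yesterday = {"item-1": 199.0, "item-2": 89.0}
today = {"item-1": 189.0, "item-2": 89.0, "item-3": 50.0}
changes = price_changes(yesterday, today)
```

Here `changes` contains only the products whose price moved between the two snapshots, which is the core of a simple price monitor.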

It's also a very helpful way to find fake and refurbished versions of your brand's products, which are also very popular on Taobao and Tmall, although Alibaba has been actively blacklisting stores that sell fake branded goods. However, if your business is a foreign brand, this is still challenging, because legal entities and trademarks in China are not connected with the systems of western countries. So even if you have completed your business registration and trademark registration at home, it doesn't mean they are recognized in China. Hence the saying that in China it's all about censorship, documentation, and starting cost.

Full Python Script of Taobao & Tmall Product Scraper

If you would like the full Python script of the Tmall & Taobao Product Scraper, please subscribe to our newsletter and add the message “Chapter Tutorial 22”. We will send the script to your mailbox immediately.


FAQ:

Q1: What is Taobao Product Scraper?

A: Taobao Product Scraper is a tool that allows you to extract product data from Taobao, a popular Chinese e-commerce platform.

Q2: How does Taobao Product Scraper work?

A: It uses Selenium to drive a Chrome browser through login and search, then parses the results pages with BeautifulSoup to extract fields such as titles, prices, product URLs, shop names, and monthly sales.

Q3: What can I do with the extracted data from Taobao Product Scraper?

A: Once you have extracted the data using Taobao Product Scraper, you can use it for various purposes such as market research, competitor analysis, price comparison, inventory management, and more.

Q4: Is Taobao Product Scraper legal?

A: While web scraping is generally legal, the legality of scraping specific websites can vary. It is important to review and comply with the terms of service of Taobao before using Taobao Product Scraper.

Q5: Can I scrape product data from multiple Taobao stores?

A: Yes. Because the scraper is keyword-based, each search results page it collects lists products from many different stores, and the shop name is captured for every item.

Q6: Does Taobao Product Scraper support scraping product reviews and ratings?

A: The Tmall dataset includes a customer review count for each product, which you can scrape alongside the other fields and compare with sales. Review text and ratings are not covered in this chapter.

Q7: Can I schedule automated scrapes with Taobao Product Scraper?

A: The code shown in this chapter does not include a scheduler, but you can run the script at regular intervals with an external scheduler such as cron or Windows Task Scheduler.

Q8: What formats can I export the scraped data in?

A: The script collects the data as Python values, so you can write it to CSV or JSON with Python's standard csv and json modules, or to Excel with a library such as pandas.

Q9: Is Taobao Product Scraper beginner-friendly?

A: It assumes basic familiarity with Python, Selenium, and BeautifulSoup. There is no graphical interface; you configure the scraper by editing the script.

Q10: Does Taobao Product Scraper offer customer support?

A: The scraper in this chapter is a Python script that you build and maintain yourself rather than a supported product. If you want the complete script, you can request it through our newsletter as described above.