Python Tutorial 22: Tmall & Taobao Product Scraper Using Keywords to Fetch Item Data

COVID-19 hit China first and hard, but China has been among the fastest-recovering economies worldwide. The Evergrande debt crisis and the electricity shortages still cloud the outlook, yet China’s economy grew by a blistering 18.3% in the first quarter of 2021 compared with the same quarter of 2020. In a country with more than 450 million middle-class citizens, Tmall and Taobao product data have become a lighthouse for sellers who already do business in China or who plan to enter the market and capture a share of this pie.

Tmall and Taobao are the largest online B2C and C2C marketplaces in China, covering almost all product categories, from dry goods to wet goods. Amazon is an inspiring lighthouse if you are looking for products to sell worldwide or defending your share in your target markets. In China, Tmall and Taobao are the places you must go and explore. You can find almost anything there.

In this Python Tutorial, I will walk you through building a Tmall and Taobao product scraper that helps you research the market or build an automatic price monitor. By the end of this Python Tutorial, you will know all the tools and elements you need to build a Tmall and Taobao product scraper. You can then flexibly plug the data into your dashboard or P&L calculator.

What’s the Difference Between the Taobao/Tmall and Amazon Product Scrapers

As with the Amazon product scraper we walked through earlier, Selenium is the key component of the Taobao and Tmall product scraper. In China, digital platforms often require additional verification, such as SMS codes, manual swipes, face ID or personal ID. Chinese platforms are particularly strict about suspicious robot crawling, and most of them have built up some kind of immune system against robots.


It’s normal for your robot to run into a verification challenge, such as a slider that must be swiped, on Chinese platforms. Unlike verification on Western channels, it not only requires a manual action but also changes from time to time, so the script needs adjusting whenever the challenge is updated. The same methodology can be applied to Douyin, Zhihu and other Chinese platforms.

In this article, we will mainly showcase how to get around webdriver detection and how to use ActionChains for the manual swipe verification.

Python Tutorial – ChromeOptions() Settings

Taobao and Tmall run JavaScript that inspects the traffic and automatically judges whether it comes from a human being or a robot.

Normally, when a user logs in to her or his Taobao or Tmall account, the JavaScript sees window.navigator.webdriver as undefined (or false, depending on the browser version). That means the visitor is not a robot.

However, the property is true when the browser is controlled by a webdriver, and Taobao and Tmall then trigger an anti-crawling procedure.
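To see what the site’s detection script sees, you can read the same property through Selenium. The sketch below is only an illustration, not Taobao’s or Tmall’s actual check: looks_automated is a hypothetical helper name, and it assumes nothing more than an object with Selenium’s execute_script method.

```python
def looks_automated(driver):
    """Return True if the page would see navigator.webdriver as true.

    `driver` is assumed to be a Selenium WebDriver (anything exposing
    execute_script works). Hypothetical helper, for illustration only.
    """
    # A JavaScript `undefined` comes back as None in Python,
    # so bool() maps both None and False to False.
    flag = driver.execute_script("return window.navigator.webdriver")
    return bool(flag)
```

After opening any page with driver.get(...), calling looks_automated(driver) tells you whether the browser still announces itself as automated.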

So the very first step is to make your scraper look like a real user. For this, we need to add a few extra lines to the Chrome options.

In these settings, I recommend setting the Chrome browser language to Simplified Chinese, so Taobao and Tmall don’t suspect that you are a user from an overseas market.

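from selenium import webdriver
options = webdriver.ChromeOptions()
# Chrome switches need the leading "--"; zh-CN requests Simplified Chinese
options.add_argument('--lang=zh-CN')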

Next come the core settings that keep the scraper from being detected as a robot.

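# Drop the switch that marks the browser session as automated
options.add_experimental_option('excludeSwitches', ['enable-automation'])
# Stop Blink from setting navigator.webdriver to true
options.add_argument('--disable-blink-features=AutomationControlled')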

Last but not least, I also recommend adding a setting that blocks image loading to speed up page loads. After all, the photos are not your goal.

options.add_experimental_option("prefs", {"profile.managed_default_content_settings.images": 2})

Python Tutorial – What Product Data You Can Grab

The scrapable data for Taobao products and items includes the store name, product photos, product page URLs, title, pricing and total sales.

The Tmall dataset is quite similar to Taobao’s, with one unique metric: the number of customer reviews, which lets you compare sales against review counts.
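One way to keep these fields organised before plugging them into a dashboard is a small record type plus a CSV export. This is a minimal sketch: the class name ProductItem, its field names and the sample values are all assumptions made for illustration, not part of the original script.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class ProductItem:
    """One scraped search result. Field names are illustrative assumptions."""
    shop_name: str
    title: str
    url: str
    image_url: str
    price: str
    total_sales: str
    review_count: Optional[str] = None  # Tmall only; stays None for Taobao


def to_csv(items):
    """Serialise a list of ProductItem objects to CSV text."""
    buffer = io.StringIO()
    names = [f.name for f in fields(ProductItem)]
    writer = csv.DictWriter(buffer, fieldnames=names)
    writer.writeheader()
    for item in items:
        writer.writerow(asdict(item))  # None is written as an empty cell
    return buffer.getvalue()


# Made-up sample row, only to show the shape of the output
sample = [ProductItem("Demo Shop", "Swim goggles",
                      "https://example.com/item/1",
                      "https://example.com/img/1.jpg", "199.00", "350")]
csv_text = to_csv(sample)
```

Keeping review_count optional lets the same record serve both sites, since only Tmall exposes that metric.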

Python Tutorial – Taobao and Tmall Product Data Path

The HTML and CSS of the Taobao and Tmall SERPs are different. Here I will take Tmall as the example.

First of all, there is the URL structure of the SERP. Because the scraping is keyword-based, you need to know how to build a URL with changeable query parameters. What’s more, you might want to scrape more than the first SERP page. Below are sample URLs for Swans products on Tmall. Each page has 60 products, so the s parameter increases by 60 per page: s=60 is the 2nd page and s=120 is the 3rd page.

https://list.tmall.com/search_product.htm?q=swans
https://list.tmall.com/search_product.htm?q=swans&s=60
https://list.tmall.com/search_product.htm?q=swans&s=120
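The pagination rule above can be sketched as a small URL builder. The function name build_search_urls and the constant names are assumptions for illustration; the base URL, the q and s parameters and the page size of 60 come from the samples above.

```python
from urllib.parse import urlencode

BASE_URL = "https://list.tmall.com/search_product.htm"
ITEMS_PER_PAGE = 60  # page size described in this tutorial


def build_search_urls(keyword, pages):
    """Return the SERP URLs for the first `pages` pages of a keyword."""
    urls = []
    for page in range(pages):
        params = {"q": keyword}
        if page > 0:
            # s is the result offset: 60 for page 2, 120 for page 3, ...
            params["s"] = page * ITEMS_PER_PAGE
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


urls = build_search_urls("swans", 3)
```

urlencode also percent-encodes non-ASCII keywords, so a Chinese search term can be passed in directly.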

Secondly, each SERP contains 60 product item blocks. When you create a loop in your Python script, you can use this data path to select all the blocks.

results = soup.find_all('div',{'class': 'product-iWrap'})

Last but not least, we need to create another loop over the results and scrape the specific data we need for business purposes. Here I select the product title, URL, shop name, monthly sales and price.

for tag in results:
    title = tag.find('p',{'class': 'productTitle'}).text.strip()
    url = tag.find('a',{'class': 'productImg'})['href']
    Shopname = tag.find('a',{'class': 'productShop-name'}).text.strip()
    monthlySales = tag.find('p',{'class': 'productStatus'}).text.strip()
    price = tag.find('p',{'class': 'productPrice'}).text.strip()
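The values scraped this way are plain text, so they usually need converting to numbers before they are useful in a dashboard or P&L calculator. The helpers below are a hedged sketch: parse_price and parse_count are hypothetical names, and the text formats they handle (a ¥ prefix, thousands separators, and 万 meaning 10,000) are assumptions about what the page returns, so check them against your own scraped strings.

```python
import re


def parse_price(text):
    """Turn a price string such as '¥199.00' into a float (None if absent)."""
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None


def parse_count(text):
    """Turn the first count in a string into an int (None if absent).

    A trailing 万 is treated as a multiplier of 10,000.
    """
    match = re.search(r"(\d+(?:\.\d+)?)(万?)", text)
    if not match:
        return None
    number = float(match.group(1))
    if match.group(2):
        number *= 10_000
    return round(number)
```

For example, parse_price(price) and parse_count(monthlySales) could be called inside the loop right after the text is extracted.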

ActionChains Needed for the Tmall Scraper

Unlike Taobao, Tmall has a stricter login process because it includes a mandatory manual swipe verification step. Taobao, by contrast, is more like the Instagram scraper I walked you through earlier: it takes you straight to the SERP after you log in.


Fortunately, Selenium is very powerful. Besides scrolling up and down, it can also perform a horizontal swipe while holding the mouse button down, using ActionChains.

First, you need to import this class at the beginning of the Python script.

from selenium.webdriver.common.action_chains import ActionChains

Then, just as you found the data path of the product item, you need to identify where the slider is and use Selenium to locate it.

from selenium.webdriver.common.by import By
slider = driver.find_element(By.ID, 'nc_1_n1z')  # find_element_by_id was removed in Selenium 4

Last but not least, we can use ActionChains and a few more methods to simulate the action a real user would perform. A kind reminder: it’s better to pause after you log in, because the page can load slowly, and rushing through the steps might get your IP flagged as a robot and blacklisted by Taobao.

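import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[type='submit']"))).click()
time.sleep(15)  # give the page time to load after login

ActionChains(driver).click_and_hold(slider).move_by_offset(300, 0).pause(5).release().perform()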
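A single 300-pixel jump is the simplest version of the swipe. A possible variation, sketched below, splits the drag into several uneven moves with short pauses. This is an assumption about what might look more natural, not a method confirmed to pass the verification. split_drag and drag_slider are hypothetical helper names, and drag_slider only uses ActionChains methods that already appear above.

```python
import random


def split_drag(total, steps, rng=random):
    """Split a drag of `total` pixels into `steps` positive integer moves."""
    if not 1 <= steps <= total:
        raise ValueError("steps must be between 1 and total")
    # Pick steps-1 distinct cut points inside (0, total), then take the gaps.
    cuts = sorted(rng.sample(range(1, total), steps - 1))
    edges = [0] + cuts + [total]
    return [b - a for a, b in zip(edges, edges[1:])]


def drag_slider(actions, slider, offsets):
    """Queue the drag on an ActionChains-like object and run it."""
    actions.click_and_hold(slider)
    for dx in offsets:
        # Short random pause between moves (the range is an arbitrary choice)
        actions.move_by_offset(dx, 0).pause(random.uniform(0.05, 0.2))
    actions.release().perform()
```

With the objects from this section, the call would look like drag_slider(ActionChains(driver), slider, split_drag(300, 8)).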

What is the business value of this data?

Like the Amazon product scraper, a Tmall and Taobao product scraper has huge business value for learning about market opportunities and the competitive environment. From the sales and pricing data, you can work out the average pricing and which selling strategies are more popular. You can also see how many competitors sell products similar to yours. And you can create a real-time price monitor, just as you would for an Amazon business.
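As a rough illustration of the average pricing and price monitor ideas, the sketch below summarises a list of prices and compares two scrapes. The function names, the 5% threshold and the dictionary layout (product URL mapped to price) are assumptions chosen for the example.

```python
from statistics import mean, median


def price_summary(prices):
    """Return the average, median, lowest and highest price of a list."""
    return {
        "avg": round(mean(prices), 2),
        "median": median(prices),
        "min": min(prices),
        "max": max(prices),
    }


def price_changes(previous, current, threshold=0.05):
    """List items whose price moved by more than `threshold` (5% by default).

    Both arguments map a product URL to its price from one scrape.
    """
    changes = []
    for url, new_price in current.items():
        old_price = previous.get(url)
        if not old_price:
            continue  # new listing or missing price: nothing to compare
        change = (new_price - old_price) / old_price
        if abs(change) > threshold:
            changes.append((url, old_price, new_price, round(change, 4)))
    return changes
```

Running price_changes on yesterday’s and today’s scrape gives a short list of products worth a closer look.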


It’s also a very helpful approach to finding fake and refurbished products of your brand, which are also very popular on Taobao and Tmall, although Alibaba has been actively blacklisting stores that sell fakes. However, if your business is a foreign brand, it’s still challenging, because legal entities and trademarks in China are not connected with the systems of Western countries. So even if your business registration and trademark registration are done at home, it doesn’t mean they are recognized in China. Hence the saying that doing business in China is all about censorship, documentation and starting costs.

Full Python Script of Taobao & Tmall Product Scraper

If you would like the full version of the Python script for the Taobao & Tmall Product Scraper, please subscribe to our newsletter and include the message “Python Tutorial 22”. We will send the script to your mailbox right away.

Contact us

I hope you enjoy reading Python Tutorial 22: Tmall & Taobao Product Scraper Using Keywords to Fetch Item Data. If you did, please support us by doing one of the things listed below, because it always helps out our channel.

  • Support my channel through PayPal (paypal.me/Easy2digital)
  • Subscribe to the Easy2Digital YouTube channel and turn on the notification bell
  • Follow and like the Easy2Digital Facebook page
  • Share the article to your social network with the hashtag #easy2digital
  • Buy products with Easy2Digital 10% OFF Discount code (Easy2DigitalNewBuyers2021)
  • Sign up for our weekly newsletter to receive the latest Easy2Digital articles, videos and discount codes for Buyfromlo products and digital software
  • Subscribe to our monthly membership through Patreon to enjoy exclusive benefits (www.patreon.com/louisludigital)
