Python Tutorial 23: Douyin Bot & Content Scraper – Top Ranking Videos & KOL Profiles

Douyin, the Chinese version of TikTok, announced that users under the age of 14 will be limited to a maximum of 40 minutes a day on the app, and only between 6 a.m. and 10 p.m. The question is how the system knows that a signed-up user is under 14. Apparently the new policy relies on their parents. That said, Douyin averages around 600 million daily active users, and the platform keeps accumulating creative video content and top-performing KOLs. A way to scrape this content in bulk can inspire your product development and marketing creativity.


Douyin belongs to ByteDance, one of the largest companies. It’s famous for its short-form video and for the artificial intelligence it uses to understand the audience’s taste. So the platform has gathered many KOLs and KOCs who create and upload creative videos to engage with their target audience and followers. What’s more, Douyin has more than 600 million users according to Baijiahao statistics, 80% of whom are under the age of 29 and 64% of whom are women. So it represents a critical battleground for understanding the younger generation and nurturing them for your business.

So in this Python Tutorial 23, I will walk you through how to use Selenium to simulate logging into the platform, handle the required identity verification, and search hashtags to download the top-ranking video dataset. By the end of this Python Tutorial, you can start investigating trending content and products in bulk.

Douyin Video Search Methodology

Top-ranking video search is open to any visitor, although it requires a user login when you scroll down for more videos. The URL is listed below, and you just add the keyword at the end of it.

https://www.douyin.com/search/yourkeyword
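Since many Douyin keywords are Chinese, it can help to percent-encode the keyword before appending it to the URL. Here is a minimal sketch using only the standard library. The function name `build_search_url` and the sample keywords are my own placeholders for illustration, not part of the original script.

```python
from urllib.parse import quote

SEARCH_BASE = "https://www.douyin.com/search/"  # base URL shown above


def build_search_url(keyword: str) -> str:
    """Return the search URL for a keyword, percent-encoding non-ASCII text."""
    # safe="" also encodes "/" so a keyword can never add an extra path segment
    return SEARCH_BASE + quote(keyword.strip(), safe="")


if __name__ == "__main__":
    # Placeholder keywords, replace them with your own hashtags
    for kw in ["skincare", "美食"]:
        print(build_search_url(kw))
```

Browsers usually do this encoding for you, but building the URL explicitly keeps the script predictable when the keyword list comes from a spreadsheet or a text file.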

Douyin’s search methodology is similar to the Instagram top-ranking posts we talked about earlier, which are based on hashtags. Users and KOLs add more related hashtags if they want their videos to be found in search. So you can find creative video scripts and best-performing videos with this method.

Horizontal Scroll and Mobile SMS Verification

Unlike the Instagram scraper, Douyin login requires identity verification. So this part can only be semi-automatic.

First things first, we can set up the script to send the phone number to the login input and click the button that requests the verification code. Remember to add a time.sleep, because you also need to manually slide the puzzle image and enter the code sent to your phone, which might take some time.

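import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# "driver" is the ChromeDriver instance that already opened the Douyin login panel
PhoneNumber = WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"input[name='normal-input']")))
PhoneNumber.send_keys("")  # put your phone number between the quotes

# Click the button that requests the SMS verification code
WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH,'//*[@id="login-pannel"]/div[3]/div/article/article/div[1]/div[1]/div[2]/article/div[2]/div/span'))).click()

time.sleep(60)  # time to slide the puzzle image and type the SMS code manually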
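A fixed 60-second sleep either wastes time or runs out too early. As an alternative, here is a small sketch that polls a condition until it becomes true or a timeout passes. The helper `wait_until` is a hypothetical name of mine, and the Selenium check in the comment assumes the login panel disappears once you are logged in, which I have not confirmed for every Douyin page version.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 120.0, poll: float = 2.0) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poll)  # wait a little before checking again
    return False


# Possible use with Selenium (assumption: the element with id "login-pannel"
# is removed from the page after a successful login):
#
#   logged_in = wait_until(
#       lambda: len(driver.find_elements(By.ID, "login-pannel")) == 0,
#       timeout=180,
#   )
```

The script then continues as soon as the manual verification is done, and you can stop it cleanly when `wait_until` returns False.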


After you complete the image and SMS identity verification, you can click login manually. At this stage, you don’t need to go to a specific keyword result page. As long as you have logged into the Douyin account, you can visit any page in a logged-in state in ChromeDriver.

You might want the full version of the Python script as well. If you would also like me to help you handle the verification and scrape top-ranking Douyin video content trends, hashtags, and KOL contacts, please subscribe to our newsletter and add the message “Douyin Scraper Service + Python Script 23”.

The quotation is charged by keyword. A minimum of 10 keywords is priced at US$100. Each keyword has 1000+ pieces of video content and 50+ top-ranking profiles.

Selenium Window Scroll Down & Find All Douyin Posts

Douyin also requires users to scroll down to explore more video posts. It’s very similar to the Instagram explore mechanism. So here is the line of code we can add after opening the specific keyword page.

driver.execute_script("window.scrollBy(0,1000000)")
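A single large scroll only triggers one batch of lazy-loaded videos. If you need more results, one option is to keep scrolling until the page height stops growing. The sketch below assumes the whole document scrolls (not an inner container), and `scroll_to_bottom`, `pause` and `max_rounds` are names I made up for illustration.

```python
import time


def scroll_to_bottom(driver, pause: float = 2.0, max_rounds: int = 30) -> int:
    """Scroll until the page height stops growing; return the number of scrolls."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        rounds += 1
        time.sleep(pause)  # give the page time to load the next batch
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so we reached the end
        last_height = new_height
    return rounds
```

Call it as `scroll_to_bottom(driver)` right after `driver.get(...)`. The `max_rounds` cap keeps the loop from running forever on a feed that never ends.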

For the search result pages of specific keywords, you can create the variables beforehand from the Douyin search URL.

pageB = 'https://www.douyin.com/search/keywordA'
pageC = 'https://www.douyin.com/search/keywordB'
pageD = 'https://www.douyin.com/search/keywordC'
pageE = 'https://www.douyin.com/search/keywordD'
pageA = 'https://www.douyin.com/search/keywordE'
pageF = 'https://www.douyin.com/search/keywordF'
pageG = 'https://www.douyin.com/search/keywordG'

Then, you need to create a loop to fetch all result pages and find the HTML of every video post.

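from bs4 import BeautifulSoup

page = (pageA,pageB,pageC,pageD,pageE,pageF,pageG)

results = []  # collect the video items from every keyword page
for urltoFetch in page:
    driver.get(urltoFetch)  # driver.get() returns None, so there is nothing to assign
    driver.execute_script("window.scrollBy(0,1000000)")
    time.sleep(50)  # wait for the lazy-loaded videos to render
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    # extend instead of assign, so the results of earlier pages are not overwritten
    results.extend(soup.find_all('li',{'class': 'a3cc5072a10a34f3d46c4e722ef788c1-scss'}))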

Douyin Top Ranking Video Headline, URL, Like Count, and Profile URL

Each video has several data fields that are crawlable: the video URL, like count, video headline, profile URL, profile name, video length, and published date.

For investigating creative content, product trends and KOL profiles, the fields below are basically sufficient.

for tag in results:
    like = tag.find('span',{'class': '_04b09e32a7964282872626a4aff3353b-scss'}).text.strip()
    title = tag.find('p',{'class': '_1d72ef4c67644daab0f1496c89e038aa-scss b2c8df63da2ed9be2bc3d38cf902e5b4-scss'}).text.strip()
    channel = tag.find('p',{'class': '_31dc42fa6181927e1afa821a0db10ed0-scss _7cfe89a4f1868679513e50ad5cf7215c-scss'}).text.strip()
    Post_URL = tag.find('a',{'class': 'caa4fd3df2607e91340989a2e41628d8-scss a074d7a61356015feb31633ad4c45f49-scss b388acfeaeef33f0122af9c4f71a93c9-scss'})['href']
    Profile_URL = tag.find('a',{'class': 'caa4fd3df2607e91340989a2e41628d8-scss a074d7a61356015feb31633ad4c45f49-scss _9c247910afecae7b8e47d4c75867113a-scss'})['href']
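The fields extracted above are raw strings, so you will probably want to normalise them before analysis. Here is a hedged sketch with standard-library tools only. It assumes that like counts may be abbreviated with a "w" or "万" suffix for ten thousand and that some href values are relative or protocol-relative; check both against your own scraped data. All function names and the `FIELDS` list are my own placeholders.

```python
import csv
import io
from urllib.parse import urljoin

BASE_URL = "https://www.douyin.com"
FIELDS = ["title", "channel", "like", "Post_URL", "Profile_URL"]


def parse_like_count(text: str) -> int:
    """Convert a displayed like count such as '3456', '1.2w' or '1.2万' to an int."""
    cleaned = text.strip().lower().replace(",", "")
    multiplier = 1
    if cleaned.endswith(("w", "万")):  # assumed abbreviation for 10,000
        multiplier = 10_000
        cleaned = cleaned[:-1]
    try:
        return int(round(float(cleaned) * multiplier))
    except ValueError:
        return 0  # empty or non-numeric text


def absolute_url(href: str) -> str:
    """Turn a relative or protocol-relative href into a full https URL."""
    return urljoin(BASE_URL, href)


def rows_to_csv(rows) -> str:
    """Serialise a list of dicts (one per video) to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Inside the loop you could build one dict per video, for example `{"title": title, "channel": channel, "like": parse_like_count(like), "Post_URL": absolute_url(Post_URL), "Profile_URL": absolute_url(Profile_URL)}`, append it to a list, and write the output of `rows_to_csv` to a file at the end.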

Full Python Script of Douyin Top Ranking Video & KOL Profile Scraper

If you would like the full version of the Python script, please subscribe to our newsletter and add the message “Python Tutorial 23”. We will send the script to your mailbox immediately. Don’t forget to add “Douyin Scraper Service” if you additionally need my help.


I hope you enjoy reading Python Tutorial 23: Douyin Content Scraper – Top Ranking Video & KOL Profile. If you did, please support us by doing one of the things listed below, because it always helps out our channel.

  • Support my channel through PayPal (paypal.me/Easy2digital)
  • Subscribe to my channel and turn on the notification bell Easy2Digital Youtube channel.
  • Follow and like my page Easy2Digital Facebook page
  • Share the article to your social network with the hashtag #easy2digital
  • Buy products with Easy2Digital 10% OFF Discount code (Easy2DigitalNewBuyers2021)
  • Sign up for our weekly newsletter to receive the latest Easy2Digital articles, videos, and discount codes for Buyfromlo products and digital software
  • Subscribe to our monthly membership through Patreon to enjoy exclusive benefits (www.patreon.com/louisludigital)
