In this Python tutorial, I’ll walk you through the elements you need to create a Zhihu bot and run the scraping. By the end of this tutorial, you will know how to write the Python script.
– Python modules: BeautifulSoup, Selenium, csv, time, pandas
– Components: a Zhihu account, either verified or not (automatic messaging requires an account verified with a personal ID card or passport)
- Why you need a Zhihu Bot for marketing purposes in China
- Define a zhihuLogin function
- Tips and codes to scrape Zhihu SERP data
- 3 types of post links in SERP
- Zhihu Q&A
- Zhihu Column
- Full Python Script of Zhihu Bot
Why you need a Zhihu Bot for marketing purposes in China
First things first, Zhihu is one of the largest Q&A communities in China and is recognised for its quality content. People are used to going there to find answers about daily-life problems, brand word of mouth, product reviews, healthcare information, and professional knowledge. As a result, the platform attracts traffic from people who are actively looking for answers.
Foreigners can use Zhihu as well, because the platform allows users to sign up with a foreign mobile phone number. Users who have not verified their identity with a personal identity card are limited in what they can publish and comment. Even so, browsing the platform and scraping the top-ranking content, KOLs, and KOCs is not a problem.
Unlike Instagram, Zhihu encourages users to message each other and to invite others to comment, so there isn’t a daily limit on messaging, commenting, following, and so on. This makes it a friendly environment for marketers to automate data collection and outreach on the platform. Having a Zhihu bot therefore helps you with marketing and recruitment in China.
Define a ZhihuLogin Function
As a foreigner, you need to install the Zhihu app on your phone and complete the sign-up on the mobile device. I’ll release another article with more details on how to sign up for a Zhihu account using a non-China mobile number.
The Zhihu login journey is simple and has 4 steps. Below is the code for each of these 4 steps.
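from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome()  # assumption: Chrome; use whichever browser driver you have installed
## Step 1 - open the login page
driver.get('https://www.zhihu.com/signin?next=%2F')
## Step 2 - click the option to log in with email and password (the selector for this step is not shown in this article)
## Step 3 - locate the email and password elements, and send your credentials to them
username = WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"input[name='username']")))
username.send_keys('YOUR_EMAIL')  # placeholder, replace with your own login email
password = WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"input[name='password']")))
password.send_keys('YOUR_PASSWORD')  # placeholder, replace with your own password
## Step 4 - click the submit button (find_element_by_xpath was removed in Selenium 4)
driver.find_element(By.XPATH, "//button[@type='submit']").click()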
Last but not least, there is a verification process after you click submit. I suggest completing it manually, as it is a one-off and you do not need to verify again. I will release another article on how to pass security checks using Python.
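One way to keep the manual verification a one-off across separate runs of the script is to save the browser cookies after you log in and re-apply them next time. The sketch below is an assumption about how you might do this, not part of the original script: the file name `zhihu_cookies.json` and the function names are hypothetical, and whether Zhihu accepts restored cookies without asking for verification again is not guaranteed. It only uses the standard Selenium WebDriver methods `get_cookies`, `add_cookie`, `get`, and `refresh`.

```python
import json
from pathlib import Path

COOKIE_FILE = Path("zhihu_cookies.json")  # hypothetical file name


def save_cookies(driver, path=COOKIE_FILE):
    """Dump the session cookies once you have logged in and verified by hand."""
    path.write_text(json.dumps(driver.get_cookies()), encoding="utf-8")


def load_cookies(driver, path=COOKIE_FILE):
    """Re-apply saved cookies. Returns False when no cookie file exists yet."""
    if not path.exists():
        return False
    # Selenium only accepts cookies for the site that is currently open,
    # so open Zhihu before adding them.
    driver.get("https://www.zhihu.com")
    for cookie in json.loads(path.read_text(encoding="utf-8")):
        driver.add_cookie(cookie)
    driver.refresh()  # reload so the page picks up the restored session
    return True
```

A typical flow would be to call `load_cookies(driver)` first, and only run the 4 login steps followed by `save_cookies(driver)` when it returns False.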
Tips and codes to scrape Zhihu SERP data
You can filter the posts in the SERP by dimensions such as most upvotes or posts within a year. Here is the URL, with parameters, for the most-upvoted content from the past year, where keywords holds your search term.
url = 'https://www.zhihu.com/search?q=' + keywords + '&sort=upvoted_count&type=content&time_interval=a_year'
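Search keywords on Zhihu are often Chinese or contain spaces, so it can be safer to let the standard library encode them than to concatenate strings. Here is a minimal sketch using `urllib.parse.urlencode`; the function name `build_search_url` is hypothetical, and the parameter values are simply the ones from the URL above.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://www.zhihu.com/search"


def build_search_url(keywords, sort="upvoted_count", time_interval="a_year"):
    """Return the SERP URL for `keywords`, with the query string percent-encoded."""
    params = {
        "q": keywords,
        "sort": sort,
        "type": "content",
        "time_interval": time_interval,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


print(build_search_url("python"))
```

You can then pass the result straight to `driver.get(...)`.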
One trick for scraping the Zhihu SERP data: you need to manually scroll down and load one more page of results first. Otherwise the HTML elements are not reachable and your bot won’t work.
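If you would rather let the script do the scrolling, a minimal sketch could look like the one below. It assumes `driver` is an already-open Selenium WebDriver sitting on the search results page. The function name `scroll_to_load`, the number of rounds, and the pause length are illustrative choices, not values taken from Zhihu.

```python
import time


def scroll_to_load(driver, max_rounds=3, pause=2.0):
    """Scroll to the bottom up to `max_rounds` times.

    Returns how many scrolls actually made the page grow, and stops early
    once a scroll no longer loads anything new.
    """
    loaded = 0
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the lazy-loaded results time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded
        last_height = new_height
        loaded += 1
    return loaded
```

Calling `scroll_to_load(driver, max_rounds=1)` mirrors the "load one more result page" step. Raise `pause` if your connection is slow.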
In the SERP, the scrapable data are the post link, profile name, post title, and the numbers of likes and comments.
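Once those five fields are scraped, they can be saved with the csv module. The sketch below assumes each result has already been collected into a dictionary. The column names in `FIELDS` and the function name `write_serp_rows` are hypothetical, and the code that extracts the values from the page is not shown.

```python
import csv

# Hypothetical column names for the five data points listed above.
FIELDS = ["post_link", "profile_name", "post_title", "likes", "comments"]


def write_serp_rows(rows, fileobj):
    """Write scraped rows as CSV. Missing fields become empty cells.

    Returns the number of rows written.
    """
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow({field: row.get(field, "") for field in FIELDS})
        count += 1
    return count
```

When writing to disk, open the file with `open("zhihu_serp.csv", "w", newline="", encoding="utf-8-sig")` so that Chinese titles display correctly when the file is opened in Excel.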
3 types of post links in SERP
In the SERP, there are three types of post links: Q&A posts, videos, and columns. Here are the URL samples FYI. For the differences in content between them, please refer to this article.
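Because each link type needs different handling, it helps to classify a link before scraping it. The sketch below is based on an assumption about the URL patterns: Q&A posts under `/question/`, videos under `/zvideo/`, and columns on the `zhuanlan.zhihu.com` host under `/p/`. Check these against the links you actually see in the SERP. The function name `classify_post_link` is hypothetical.

```python
from urllib.parse import urlsplit


def classify_post_link(url):
    """Return 'qa', 'video', 'column', or 'unknown' for a SERP post link."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    path = parts.path
    on_zhihu = host == "zhihu.com" or host.endswith(".zhihu.com")
    # Assumed patterns, see the note above the code.
    if host == "zhuanlan.zhihu.com" and path.startswith("/p/"):
        return "column"
    if on_zhihu and path.startswith("/zvideo/"):
        return "video"
    if on_zhihu and path.startswith("/question/"):
        return "qa"
    return "unknown"
```

Links that come back as `'unknown'` are worth logging, since they usually mean a pattern has changed.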
For Q&A post links, the script can grab the profile’s follower number directly, because the follower and following numbers are shown on the right-hand side of the page.
Then you can set up a condition: if the follower number is larger than a threshold you choose, the script automatically messages the KOL.
Here is the Python code for automatic messaging
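The follower-threshold condition could look something like the sketch below. All names here (`parse_count`, `message_if_influential`, the `followers` and `profile_url` keys) are hypothetical, and it assumes the follower number was scraped as text such as "12,345". The actual sending is left to a `send_message` callback that you supply, because the Selenium steps for Zhihu messaging are not covered here.

```python
def parse_count(text):
    """Turn a scraped count such as '12,345' into 12345. Returns 0 if it is not a number."""
    digits = text.replace(",", "").strip()
    return int(digits) if digits.isdigit() else 0


def message_if_influential(profiles, min_followers, send_message):
    """Call send_message(profile_url) for each profile above the threshold.

    Returns the list of profile URLs that were contacted.
    """
    contacted = []
    for profile in profiles:
        if parse_count(profile["followers"]) > min_followers:
            send_message(profile["profile_url"])
            contacted.append(profile["profile_url"])
    return contacted
```

Keeping the list of contacted profiles makes it easy to avoid messaging the same KOL twice on a later run.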
Articles in the SERP may come from Zhihu Column, and when you open such an article you cannot find the profile’s follower number. So the Python script needs to be split into two parts: one scrapes the profile page URL first, and the other fetches the follower number from that profile URL.
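The two-part approach can be sketched as a small pipeline. The function name `followers_for_columns` and the two helper arguments are hypothetical: `get_profile_url` stands for your own code that opens a column article and returns the author's profile URL, and `get_follower_count` stands for your own code that opens a profile URL and returns the follower number. The page-scraping code itself is not shown.

```python
def followers_for_columns(article_urls, get_profile_url, get_follower_count):
    """Map each column article URL to its author's follower count."""
    # Part 1: visit each article and collect the author's profile URL.
    profile_by_article = {url: get_profile_url(url) for url in article_urls}

    # Part 2: visit each distinct profile once and read the follower number.
    count_by_profile = {}
    for profile_url in profile_by_article.values():
        if profile_url not in count_by_profile:
            count_by_profile[profile_url] = get_follower_count(profile_url)

    return {
        article: count_by_profile[profile]
        for article, profile in profile_by_article.items()
    }
```

Looking up each profile only once matters when one author has several articles in the same SERP, because it saves page loads.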
Full Python Script of Zhihu Bot
If you are interested in the full script of the Zhihu bot, please subscribe to our newsletter and add the message “Python tutorial 31”. We will send the script to your mailbox right away.
I hope you enjoyed reading Python Tutorial 31: Zhihu Bot & Scraper for grabbing Q&A, Column Data.