Scraping Instagram influencers and their posts can not only help collect influencer candidate lists to work with in the future but also give you content marketing insights about what content might have better engagement with audiences. If you’re looking to strengthen your brand identity, learning from the profiles of influencers is a good place to start.
However, scraping social data is not the same as scraping an ordinary website, because almost all social platforms require you to log in before you can use any of their features. So in this Python Tutorial, I will walk you through how to use Selenium to simulate logging in to the platform, browsing Instagram, and searching hashtags to download the links of the top posts. By the end of the Python Tutorial, you can start downloading all top Instagram posts by changing the hashtags as you like.
- Install Selenium and ChromeDriver
- Log in to your Instagram account
- Simulate selecting the pop-up options
- Search posts using hashtags and scroll for more posts
- Find the elements you like to scrape and save them in a CSV file
- Full Python Script of Instagram Top Ranking Post and Influencer Profile Scraper By Using Hashtag
Instagram Bot – Install Selenium and ChromeDriver
Selenium is a free, open-source automated testing framework used to validate web applications across different browsers and platforms. You can use multiple programming languages, such as Java, C#, and Python, to create Selenium test scripts. Testing done with the Selenium tool is usually referred to as Selenium testing.
If you have read my earlier Python Tutorial article about setting up pip3, installing Selenium is very easy. You just need to type this command in your Mac terminal:
$ pip3 install selenium
Then, you need a browser driver to act on your behalf in the process. I recommend ChromeDriver in this tutorial. First things first, go to Google, search for ChromeDriver, and click through to its website. You will basically see two versions: the beta and the latest stable release. Just pick the stable release for now.
You can select the build that is compatible with your device; here we select mac64.zip. After the download, extract the zip and install ChromeDriver. A quick note: copy the ChromeDriver location path to the clipboard. It will be used in a moment.
In the Python script, first of all, we need to import the modules we will use, as well as any other Python scripts we created before. Here are the necessary modules:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
import pandas as pd
import time  # used later for time.sleep() while scrolling
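Then, we create a variable called driver and pass in the path we copied just now using executable_path. (The executable_path argument works on Selenium 3; Selenium 4 deprecated it, and recent Selenium 4 releases expect the path to be wrapped in a Service object instead.) We also add a line that requests instagram.com. It is similar to requests.get, but in a Selenium environment we call get on the driver.
driver = webdriver.Chrome(executable_path='/Users/louislu/Desktop/Python/chromedriver')
driver.get("https://www.instagram.com/")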
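On newer Selenium 4 releases, the same setup can be written with a Service object. The following is only a minimal sketch under that assumption; the function names check_driver_path and open_instagram, and the driver_path parameter, are illustrative and not part of the original script.

```python
from pathlib import Path


def check_driver_path(driver_path):
    """Return the ChromeDriver path as a string, or raise if the file is missing."""
    path = Path(driver_path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"ChromeDriver not found at {path}")
    return str(path)


def open_instagram(driver_path):
    """Start Chrome through a Selenium 4 Service object and open instagram.com."""
    # Imported here so the path check above can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    service = Service(executable_path=check_driver_path(driver_path))
    driver = webdriver.Chrome(service=service)
    driver.get("https://www.instagram.com/")
    return driver
```

Calling open_instagram with the path you copied earlier should open a Chrome window on the Instagram home page. Checking the path first gives a clearer error than the one Selenium raises when the driver file is missing.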
Instagram Bot – Log in to your Instagram account
Basically, Selenium simulates my normal browsing on Instagram. So the first step must be the account login.
First, we go to the login page, right-click, and select Inspect to find out which elements make up the username and password boxes. As we can see, the username box is an <input name="username"> element, and the password box is an input element as well. So we can use By.CSS_SELECTOR to point to each box specifically.
Among the Selenium expected_conditions, there is one we can use to check that an element is clickable: element_to_be_clickable. And as we need to allow for the page loading speed, we wrap the lookup in WebDriverWait, which waits up to the given number of seconds for the condition to become true.
Here is the code:
username = WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"input[name='username']")))
password = WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"input[name='password']")))
Secondly, we send the account username and password values to the boxes. Before that, I recommend clearing each box first with clear(), to make sure it is empty. Then we use a method from the Selenium API, send_keys, to type the value into the box.
Last but not least, we also need to inspect which element the login button is, just as we did for the username and password boxes. Then we again use element_to_be_clickable and By.CSS_SELECTOR. As we need to click the button, we add the click() method at the end.
# click() returns nothing, so there is no result to store in a variable
WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"button[type='submit']"))).click()
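The login steps described above can be put together roughly as follows. This is a sketch, not the original script: it assumes the credentials are kept in two environment variables with the hypothetical names IG_USERNAME and IG_PASSWORD, and the helper names load_credentials and log_in are also illustrative.

```python
import os


def load_credentials(env=os.environ):
    """Read the login from environment variables (the names are hypothetical)."""
    username = env.get("IG_USERNAME")
    password = env.get("IG_PASSWORD")
    if not username or not password:
        raise ValueError("Set IG_USERNAME and IG_PASSWORD before running")
    return username, password


def log_in(driver, username, password, timeout=10):
    """Fill in and submit the login form on an already opened login page."""
    # Selenium is imported here so load_credentials works without it.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.wait import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    user_box = wait.until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, "input[name='username']"))
    )
    pass_box = wait.until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, "input[name='password']"))
    )

    # Empty both boxes before typing, then send the values.
    user_box.clear()
    user_box.send_keys(username)
    pass_box.clear()
    pass_box.send_keys(password)

    wait.until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, "button[type='submit']"))
    ).click()
```

A typical call would be log_in(driver, *load_credentials()). Keeping the password out of the script itself makes it safer to share the file later.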
Simulate selecting the pop-up options
Some platforms show pop-up windows after you log in. In this case, you also need to find out which pop-up windows might appear. Instagram generally shows two, and to browse our target content smoothly, we can click Not Now on each.
Here we can also use XPATH to find and click the Not Now button. Here is the code:
# First pop-up: click "Not Now" (click() returns nothing, so nothing is stored)
WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH,"//button[contains(text(),'Not Now')]"))).click()
# Second pop-up: click "Not Now" again
WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH,"//button[contains(text(),'Not Now')]"))).click()
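The two pop-ups do not always both appear, and when one is missing the wait above ends in a timeout error. One possible way to tolerate that is sketched below; the helper names dismiss_popups and not_now_waiter are assumptions made for illustration, and the XPath is the one used in this tutorial.

```python
NOT_NOW_XPATH = "//button[contains(text(),'Not Now')]"


def dismiss_popups(wait_for_button, max_popups=2):
    """Click up to max_popups buttons and return how many were clicked.

    wait_for_button is any callable that returns a clickable element,
    or None when no button showed up in time.
    """
    clicked = 0
    for _ in range(max_popups):
        button = wait_for_button()
        if button is None:
            break  # No further pop-up appeared, so stop waiting.
        button.click()
        clicked += 1
    return clicked


def not_now_waiter(driver, timeout=10):
    """Build a wait_for_button callable backed by Selenium."""
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.wait import WebDriverWait

    def wait_for_button():
        try:
            return WebDriverWait(driver, timeout).until(
                EC.element_to_be_clickable((By.XPATH, NOT_NOW_XPATH))
            )
        except TimeoutException:
            return None  # The pop-up never showed up within the timeout.

    return wait_for_button
```

With a live driver, dismiss_popups(not_now_waiter(driver)) clicks whichever pop-ups appear and carries on if there are fewer than two.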
Search posts using hashtags and scroll down for more posts
For searching posts using hashtags, Instagram has a fixed path, which is https://www.instagram.com/explore/tags/ + keyword. So we need to create a query variable first; here I search for “moussy” as an example. Then we pass the full URL to driver.get() to visit the page. Note that driver.get() returns nothing, so there is no need to store its result in a variable.
query = "moussy"
driver.get("https://www.instagram.com/explore/tags/" + query)
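If you plan to swap hashtags often, the URL construction can be moved into a small helper. This is a sketch only: build_tag_url is a hypothetical name, and stripping a leading "#" and percent-encoding the tag are assumptions about what is convenient, not something Instagram documents.

```python
from urllib.parse import quote

TAG_URL = "https://www.instagram.com/explore/tags/"


def build_tag_url(hashtag):
    """Return the explore URL for a hashtag, with or without a leading '#'."""
    tag = hashtag.strip().lstrip("#")
    if not tag:
        raise ValueError("hashtag must not be empty")
    # Percent-encode so non-ASCII hashtags still form a valid URL.
    return TAG_URL + quote(tag)
```

You would then call driver.get(build_tag_url(query)) with any hashtag you like.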
When you scroll down for more posts, you will notice that the page takes a while to load them. So we need to write code that scrolls, and also add code that keeps the scraping from stopping short because of the loading time. We use the JavaScript window.scrollBy() method, run through the driver, together with Python's time.sleep(). The x and y numbers in the scroll method are the horizontal and vertical distances, in pixels, that you would like to scroll. As the scrolling stops whenever the page is still loading, I recommend setting a larger number first and adding more scroll lines if you aim to scrape more posts.
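The scrolling described above could look like the sketch below. The function name scroll_down and the default values (three rounds, 5000 pixels, a three-second pause) are assumptions chosen for illustration; tune them to how many posts you want.

```python
import time


def scroll_down(driver, rounds=3, pixels=5000, pause=3.0):
    """Scroll the page down `rounds` times, pausing so new posts can load."""
    for _ in range(rounds):
        # window.scrollBy(x, y): x is the horizontal and y the vertical offset in pixels.
        driver.execute_script(f"window.scrollBy(0, {int(pixels)});")
        # Give Instagram time to load the next batch of posts.
        time.sleep(pause)
```

Calling scroll_down(driver, rounds=5) after opening the hashtag page scrolls five times; raise rounds rather than pixels if the page stops loading early.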
Find the elements you like to scrape and save them in a CSV file
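Now basically all posts are loaded, and what we need to do is fetch the post links. Again, we can inspect the page and find the elements: the post links sit in <a> tags. Selenium offers two methods that help here: find_elements, used with By.TAG_NAME to collect every <a> element, and get_attribute(), which reads the href from each one. (The older find_elements_by_tag_name shortcut was removed in later Selenium 4 releases.)
links = driver.find_elements(By.TAG_NAME, 'a')
links = [link.get_attribute('href') for link in links]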
If you print links and the result shows a list of URLs, it means the code is working.
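The list collected this way contains every link on the page, not only posts. A minimal sketch for narrowing it down follows, assuming that post URLs contain the path segment /p/; the helper name filter_post_links is hypothetical.

```python
def filter_post_links(hrefs):
    """Keep unique post links, in the order they first appear."""
    seen = set()
    posts = []
    for href in hrefs:
        # Anchors without an href come back as None; skip them and non-post links.
        if not href or "/p/" not in href:
            continue
        if href not in seen:
            seen.add(href)
            posts.append(href)
    return posts
```

You could then run links = filter_post_links(links) before saving anything.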
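So you can use Pandas to put the links into a column and save it as a CSV file. I covered this in a previous article, so I am not going to elaborate on the details here.
df = pd.DataFrame(links,columns=["InstagramPostLink"])
df.to_csv("instagram_post_links.csv", index=False)  # the file name is just an example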
Full Python Script of Instagram Top Ranking Post and Influencer Profile Scraper By Using Hashtag
If you would like the full version of the Python Script of Instagram Post and Influencer Scraper By Using Hashtag, please subscribe to our newsletter and add the message Python Tutorial 12. We will send the script to your mailbox right away.
So easy, right? I hope you enjoy reading Python Tutorial 12 – Using Hashtags to Scrape Top Instagram Posts and Instagram Users. If you did, please support us by doing one of the things listed below, because it always helps out our channel.
- Support my channel through PayPal (paypal.me/Easy2digital)
- Subscribe to the Easy2Digital YouTube channel and turn on the notification bell
- Follow and like the Easy2Digital Facebook page
- Share the article to your social network with the hashtag #easy2digital
- Buy products with Easy2Digital 10% OFF Discount code (Easy2DigitalNewBuyers2021)
- Sign up for our weekly newsletter to receive the latest Easy2Digital articles, videos, and discount codes on Buyfromlo products and digital software
- Subscribe to our monthly membership through Patreon to enjoy exclusive benefits (www.patreon.com/louisludigital)