September 22, 2021

Python Tutorial 15 – Instagram Photo Scraper Using Selenium and OS

A thin photo library can slow down your content production. Searching for new photos takes time, can be frustrating, and gets expensive if you pay for every image. On top of that, recent and top-ranking photos on Instagram can inspire more creative content ideas. It's worth finding a way to save time rather than repeating the same heavy manual work.


Instagram is a social platform best known for photo sharing, although it has been moving toward video. Browsing the top-ranking photos for a hashtag is a quick way to gather better photo material.

In this Python tutorial, I share a way to download and save top-ranking photos by using hashtags and Python. By the end of this article, you will know which modules and Python methods you need, and you will be able to download photos in bulk within a few minutes.

Python Tutorial – Import Selenium Module and Log in Instagram Account

In previous Instagram scraper articles, we also used the selenium module. It is used to import the web driver and log into an Instagram account. For more details on this step, please refer to one of these articles.

Python Tutorial 14 for Growth Hacker & Digital Marketer – Using Selenium for creating an Instagram Bot to Boost Visibility and Grow Followers

Python Tutorial 13 for Growth Hacker & Digital Marketer – Scrape Instagram Email, Followers, Posts, and More Using Selenium, BeautifulSoup, and JSON

Python Tutorial for Digital Marketer 12 – Using Hashtags to Scrape Top Instagram Posts and Instagram Users

Python Tutorial – Find and Get Instagram Photo Paths

Right-click one of the post images in the hashtag results and inspect it in Chrome. As the screencap below shows, every photo sits in an img tag.

[Screencap: Chrome inspector showing a post image in an img tag with its src attribute]

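I created a new variable, images. You can then use the Selenium method find_elements_by_tag_name to collect every img element on the page. Note that Selenium 4.3 and later removed this method; there, the equivalent call is driver.find_elements(By.TAG_NAME, 'img'), with By imported from selenium.webdriver.common.by.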

images = driver.find_elements_by_tag_name('img')

To download the photos, you first need every photo URL. Call get_attribute() on each image element; as the screencap shows, the attribute that holds the URL is “src”.

images = [image.get_attribute('src') for image in images]
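Not every img element on a page is a post photo, and some elements return no src at all. The following is a minimal sketch of one way to clean the list, assuming the elements expose get_attribute() the way Selenium's WebElement does. The helper name extract_image_urls is hypothetical and not part of Selenium.

```python
def extract_image_urls(elements):
    """Return unique http(s) image URLs from elements, keeping page order."""
    urls = []
    seen = set()
    for element in elements:
        src = element.get_attribute('src')
        # Skip elements without a src, inline data: URIs, and duplicates.
        if not src or not src.startswith('http') or src in seen:
            continue
        seen.add(src)
        urls.append(src)
    return urls
```

With the driver from the earlier step, it could be called as images = extract_image_urls(driver.find_elements_by_tag_name('img')).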

Import OS Module

The os module in Python provides functions for interacting with the operating system. It is part of Python's standard library and offers a portable way of using operating system-dependent functionality. The os and os.path modules include many functions for working with the file system.

Now we need to download the photos and save them to your laptop. So we import the os module to create a new folder and to build the file paths of the photos saved in it.

import os

Python Methods to Interact with the Operating System – getcwd(), join() and mkdir()

The getcwd() method returns the current working directory of the process. If you run the script from the directory where it is located, the photos are saved under that directory as well.

path = os.getcwd()

Despite the name, os.path.join() is not the string method str.join(). It joins one or more path components with the operating system's path separator and returns the combined path.

path = os.path.join(path, query)

The query variable is the hashtag keyword, which you can set to whatever you like. This line appends the hashtag name to the current path.

The os.mkdir() method creates a directory at the given path and raises FileExistsError if that directory already exists. The new directory is therefore named after the query, or hashtag, you set just now.

os.mkdir(path)
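Because os.mkdir() fails when the folder already exists, for example on a second run with the same hashtag, the three steps above can be wrapped in one small function. This is a sketch under assumptions, not the article's original script: the function name make_photo_folder and its base_dir parameter are hypothetical, and os.makedirs() with exist_ok=True is swapped in for os.mkdir().

```python
import os


def make_photo_folder(query, base_dir=None):
    """Create (or reuse) a folder named after the hashtag and return its path."""
    if base_dir is None:
        base_dir = os.getcwd()  # directory the script was started from
    path = os.path.join(base_dir, query)
    # exist_ok=True avoids FileExistsError when the folder is already there.
    os.makedirs(path, exist_ok=True)
    return path
```

Calling path = make_photo_folder(query) then gives the same path variable used in the rest of the tutorial.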

Import wget module, download and save the photos on your computer

The wget module is a third-party Python package (installed with pip install wget) named after the wget command, a non-interactive utility for downloading remote files that ships with many Unix-based operating systems. The command supports HTTP, HTTPS, and FTP, as well as retrieval through HTTP proxies.

import wget

As you might be aware, a hashtag on Instagram returns many photos, so file naming is critical. Without a unique name per photo, the downloads would clash on a single file name, and the run would not give you the separate files you want.

To solve this, we create a counter variable that starts at 0.

number = 0

Then we create a loop that downloads the photos and saves each one under a unique name. Since wget is imported, we can call its download() method and save each photo to the path you specify. Last but not least, don't forget to increase the number variable by 1 on every pass through the loop, after starting at 0.

for image in images:
    save_as = os.path.join(path, query + str(number) + '.jpg')
    wget.download(image, save_as)
    number += 1
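The same loop can be sketched with enumerate(), which supplies the counter, and with a guard so that one failed download does not stop the rest. This is an illustrative variant under assumptions, not the original script: the function name download_all and its download parameter are hypothetical, and the downloader is passed in so that wget.download or any other function taking (url, destination) can be used.

```python
import os


def download_all(urls, folder, prefix, download):
    """Save each URL as <prefix><index>.jpg inside folder; return saved paths."""
    saved = []
    for number, url in enumerate(urls):
        save_as = os.path.join(folder, f'{prefix}{number}.jpg')
        try:
            download(url, save_as)
        except Exception as error:
            # One failed photo should not abort the remaining downloads.
            print(f'Skipped {url}: {error}')
            continue
        saved.append(save_as)
    return saved
```

With the variables from this tutorial, the call would be download_all(images, path, query, wget.download).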

Full Python Script of Instagram Photo Scraper

If you would like the full version of the Instagram Photo Scraper script, please subscribe to our newsletter and add the message Python Tutorial 15. We will send the script to your mailbox right away.

Contact us

I hope you enjoy reading Python Tutorial 15 – Instagram Photo Scraper Using Selenium and OS. If you did, please support us by doing one of the things listed below, because it always helps out our channel.

  • Support my channel through PayPal (paypal.me/Easy2digital)
  • Subscribe to the Easy2Digital YouTube channel and turn on the notification bell
  • Follow and like the Easy2Digital Facebook page
  • Share the article to your social network with the hashtag #easy2digital
  • Buy products with Easy2Digital 10% OFF Discount code (Easy2DigitalNewBuyers2021)
  • Sign up for our weekly newsletter to receive the latest Easy2Digital articles, videos, and discount codes on Buyfromlo products and digital software
  • Subscribe to our monthly membership through Patreon to enjoy exclusive benefits (www.patreon.com/louisludigital)
