In the previous Python Tutorial for digital marketers 2, we talked about how to install beautifulsoup4, requests, lxml, html5lib and Sublime Text, and then used them to scrape web data. But that data is not yet saved in a file or a database, so it is not convenient to use for your business purposes and work operations.
So in this Python Tutorial, we will talk about how to write Python scripts to parse data, save it into CSV files locally, and read those CSV files back in a Python environment.
By the end of this Python Tutorial, you will know which CSV read, parse and write methods you can use to open and save CSV files in a readable format. We are not going to deep dive into specific scraping scripts here, as we will cover those in the next chapter of the Python Tutorial.
- Import CSV module in Python3 on Sublime Text
- Write CSV Files
- Parse CSV Files
- Save Scraped Data into CSV Files
- Read CSV Files
Python Tutorial – Import CSV Module
Previously, I showed how to import the beautifulsoup4 and requests modules in order to scrape the targeted web data and display it correctly in Sublime Text. When working with CSV scripts in Python, we likewise need to import the CSV module. This is as easy as typing the line below at the beginning of the Python file.
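The import in question is a single line at the top of your script; csv ships with the Python standard library, so there is nothing extra to install:

```python
# csv is part of the Python standard library, so no pip install is needed
import csv
```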
Python Tutorial – Write CSV Files
In order to create and write a new CSV file to save your scraped data, you need to learn these two Python methods – open() and csv.writer().
Open() Syntax: open(file, mode)
In the method arguments, file means the path and name of the file, which you can open after the work is done. Then, mode is a string that defines which mode you want to open the file in. Basically, there are four modes:
“r” – Read – Default value. Opens a file for reading, error if the file does not exist
“a” – Append – Opens a file for appending, creates the file if it does not exist
“w” – Write – Opens a file for writing, creates the file if it does not exist
“x” – Create – Creates the specified file, returns an error if the file exists
In this case, we need to create and write a new CSV file, so we can use either “w” or “x”.
For example, we can create a variable (csv_file) and write a line of code like this:
csv_file = open('ecommerce_scrape.csv','w')
Writer() Syntax: csv.writer(file_object)
The csv.writer() method returns a writer object which converts the user’s data into delimited strings on the given file-like object.
For example we can create a variable (csv_writer) and write a line of code like this:
csv_writer = csv.writer(csv_file)
Normally we scrape data with the aim of splitting it up and feeding it into specific columns in the CSV file. So the purpose of writer() is to set up the writer object we will use to parse and write the CSV file data.
Parse CSV Files
We don’t expect to read and use the data from a single spreadsheet cell. Instead, whether we save the files locally or on a server, we aim to split the raw data under different column headers, which makes it convenient for us to read, call and use. In order to get the data into the expected format, we need to parse it. Today, we’ll introduce one method – writerow(). Basically, the writerow() method writes a row of data into the specified file, and we can use it first to create the column headers.
Writerow() Syntax: writerow([value1, value2, value3, …]) or writerow([variable1, variable2, variable3, …]) – note that writerow() takes a single list, whose items become the cells of the row.
For example, we can write a line of code like this:
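A minimal sketch of that header-writing line, shown here with its setup so it runs on its own. The column names (headline, summary, link) are illustrative assumptions, not taken from the original page:

```python
import csv

# Open (or create) the output file in write mode
csv_file = open('ecommerce_scrape.csv', 'w')
csv_writer = csv.writer(csv_file)

# Write the header row first; these column names are just examples
csv_writer.writerow(['headline', 'summary', 'link'])
```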
Now each column name is defined in the CSV file, and we can feed the scraped data in by column.
For example we can write a line of code like this:
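The data-row line would look something like the sketch below. The variables headline, summary and link stand in for whatever variables hold your scraped values; their names and sample contents here are assumptions for illustration:

```python
import csv

# Placeholder values standing in for scraped data; names are illustrative only
headline = 'Example article headline'
summary = 'Example article summary'
link = 'https://www.easy2digital.com/example'

csv_file = open('ecommerce_scrape.csv', 'w')
csv_writer = csv.writer(csv_file)
csv_writer.writerow(['headline', 'summary', 'link'])  # header row
csv_writer.writerow([headline, summary, link])        # one data row per article
csv_file.close()
```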
As you might be aware, the arguments in the writerow() call above are all the variables we created to scrape the different sections of data on the Easy2Digital eCommerce article page. Please keep in mind not to reuse the column names we wrote in the previous step here, or the headers would be repeated as a data row.
(Note: We’ll discuss how to scrape specific data in Python3 on Sublime Text in the next chapter. Before that, you can refer to the other article about “Web Scraping with Google Sheets ImportXML to Automatically Collect Product Price Info”, where you can find the ways to use developer tools to identify the specific data location and path and learn about html structure.)
Save Scraped Data into CSV Files
In order to tell Python3 that the CSV file work is finished and to export the file, or update the data to a server location, we need to use one more method – close().
Python's file method close() closes an open file. A closed file cannot be read or written any more. Any operation that requires the file to be open will raise a ValueError after the file has been closed. Calling close() more than once is allowed.
Python automatically closes a file when its reference object is reassigned to another file. Even so, it is good practice to close files explicitly with the close() method.
For example, we can write a line of code like this:
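The line at this step is presumably just the close() call on our file object; a self-contained sketch, with the example header row carried over from the earlier steps:

```python
import csv

csv_file = open('ecommerce_scrape.csv', 'w')
csv_writer = csv.writer(csv_file)
csv_writer.writerow(['headline', 'summary', 'link'])  # example header row

# Finish the work: flush everything to disk and release the file handle
csv_file.close()

# Any further write would now raise a ValueError, e.g.:
# csv_writer.writerow(['more', 'data'])  # ValueError: I/O operation on closed file
```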
Then, we can press Command + B. Sublime Text still shows the headline and summary output, but you will find that a new CSV file with the name you gave in the script (ecommerce_scrape.csv) shows up in the assigned location.
If you try to open it, you will find all the scraped information saved in the CSV file. There is no limitation on what data you can scrape automatically or where to save the new file; it just depends on your business purpose and work operations.
Read CSV Files
In many cases, you will need to write a Python script to automate a full workflow, such as updating an eCommerce SKU profit calculator. Opening existing files and getting the information out of them is therefore a key ingredient in the automated process. Here we introduce two patterns – with … as and for line in – and two methods – reader() and next().
First of all, let’s import the CSV module and open the existing file we just created in CSV format. As you can see, here we use ‘r’ in the open() method instead of ‘x’ or ‘w’, because we want to read the information, and we define the file object as csv_file by using with open … as:
with open('ecommerce_scrape.csv','r') as csv_file:
Then, we need to use the reader() method to grab the information and show it to us, so we create a variable csv_reading with the line of code listed below:
csv_reading = csv.reader(csv_file)
Note: the reader() method returns a reader object, which is an iterator over the lines of the CSV file.
If we try to print(csv_reading) and press Command + B, the return is just the reader object’s information, not the file’s contents.
In order to show the information in the file, we need to write a line of code by using for line in like this:
for line in csv_reading:
    print(line)
Then, since not all of the information is necessary to grab, you can select just the information you want to use with next() and [number] indexing.
The next() function returns the next item from an iterator. In this case, if you don’t need the column header row, you can call next() once before looping:
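A sketch of skipping the header with next(). The first block only builds a small sample file so the example runs on its own; in the tutorial, ecommerce_scrape.csv already exists from the writing steps, and the sample rows here are assumptions:

```python
import csv

# Build a small sample file so this snippet is self-contained
with open('ecommerce_scrape.csv', 'w', newline='') as f:
    csv.writer(f).writerows([
        ['headline', 'summary', 'link'],
        ['Article A', 'Summary A', 'https://example.com/a'],
        ['Article B', 'Summary B', 'https://example.com/b'],
    ])

data_rows = []
with open('ecommerce_scrape.csv', 'r') as csv_file:
    csv_reading = csv.reader(csv_file)
    next(csv_reading)          # skip the header row
    for line in csv_reading:
        print(line)            # only the data rows are printed
        data_rows.append(line)
```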
Last but not least, you might need only a specific column, like the article headline here. In programming, indexing starts at 0, so 0 refers to the first column. If we just need the article headline, we can add [0] after line in the print() call, like this:
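A sketch of printing only the first column. As above, the sample file and row values are assumptions added so the snippet runs on its own:

```python
import csv

# Sample file so the snippet is self-contained
with open('ecommerce_scrape.csv', 'w', newline='') as f:
    csv.writer(f).writerows([
        ['headline', 'summary', 'link'],
        ['Article A', 'Summary A', 'https://example.com/a'],
        ['Article B', 'Summary B', 'https://example.com/b'],
    ])

headlines = []
with open('ecommerce_scrape.csv', 'r') as csv_file:
    csv_reading = csv.reader(csv_file)
    next(csv_reading)          # drop the header row
    for line in csv_reading:
        print(line[0])         # index 0 is the first column: the headline
        headlines.append(line[0])
```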
As you can see, it shows only the article headlines in the output.
So easy, right? I hope you enjoy reading Python Tutorial 3: How to Write, Parse, Read CSV Files with Scraped Data. If you did, please support us by doing one of the things listed below, because it always helps out our channel.
- Support my channel through PayPal (paypal.me/Easy2digital)
- Subscribe to my channel and turn on the notification bell Easy2Digital Youtube channel.
- Follow and like my page Easy2Digital Facebook page
- Share the article to your social network with the hashtag #easy2digital
- Buy products with Easy2Digital 10% OFF Discount code (Easy2DigitalNewBuyers2020)
- Sign up for our weekly newsletter to receive Easy2Digital’s latest articles, videos, and discount codes on Buyfromlo products and digital software
- Subscribe to our monthly membership through Patreon to enjoy exclusive benefits (www.patreon.com/louisludigital)
If you are interested in the next chapter, please check out the article below
Python Tutorial for Marketers 4: Create a Website Bot to Scrape Specific Website Data Using BeautifulSoup