Chapter 3: Utilise CSV Module to Write, Parse, Read CSV Files to Manage Scraped Data

In Chapter 2, we covered how to install beautifulsoup4, requests, lxml, html5lib, and Sublime Text, and then scrape web data with them. But that data is not yet saved to a file or a database, so it is not convenient to use for your business purposes and work operations.

So in this Python Tutorial, we will talk about how to write Python scripts that parse the data and save it to local CSV files, and how to read those CSV files back in a Python environment.

By the end of this Python Tutorial, you will know which CSV read, parse, and write methods to use to open and save CSV files in a readable format. We are not going to dive deep into writing scraping scripts here; that is the topic of the next chapter of the Python Tutorial.

Import CSV Module

Previously, I showed how to import the beautifulsoup and requests modules in order to scrape the targeted web data and show the correct data in Sublime Text. To work with CSV files in Python, we also need to import the csv module. This is easy: type the line below at the beginning of the Python file.

import csv

Python Tutorial – Write CSV Files

In order to create and write a new CSV file to save your scraped data, you need to learn these two Python functions – open() and csv.writer()

Open() Syntax: open(file, mode)

In the argument list, file is the path and name of the file you want to open. mode is a string that defines which mode to open the file in. There are four basic modes:

“r” – Read – Default value. Opens a file for reading, error if the file does not exist

“a” – Append – Opens a file for appending, creates the file if it does not exist

“w” – Write – Opens a file for writing, creates the file if it does not exist, and overwrites the existing content if it does

“x” – Create – Creates the specified file, returns an error if the file exists

In this case, we need to create and write a new CSV file, so we can use either “w” or “x”. Remember that “x” raises an error if the file already exists, while “w” replaces it.

For example, we can create a variable (csv_file) and write a line of code like this:

# newline='' lets the csv module control line endings (avoids blank rows on Windows)
csv_file = open('ecommerce_scrape.csv', 'w', newline='')
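To see how the modes differ in practice, here is a minimal sketch. It assumes a throwaway file called demo.csv in a temporary folder (both are made up for illustration), and it passes newline="" to open(), which is what the csv module documentation recommends for CSV files.

```python
import os
import tempfile

# Work in a temporary folder so the sketch does not touch real files.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "demo.csv")  # hypothetical file name

# "x" creates the file and fails if it already exists.
with open(path, "x", newline="") as f:
    f.write("first\n")

try:
    open(path, "x")
    second_create_failed = False
except FileExistsError:
    second_create_failed = True

# "a" keeps the existing content and adds to the end of it.
with open(path, "a", newline="") as f:
    f.write("second\n")

with open(path, "r", newline="") as f:
    after_append = f.read()

# "w" empties the file before writing.
with open(path, "w", newline="") as f:
    f.write("third\n")

with open(path, "r", newline="") as f:
    after_write = f.read()

print(second_create_failed)
print(repr(after_append))
print(repr(after_write))
```

The point to take away is that “w” silently replaces whatever was in the file, so “x” is the safer choice when you must not overwrite an earlier scrape.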

Writer() Syntax: csv.writer(csvfile)

The csv.writer() method returns a writer object which converts the user’s data into delimited strings on the given file-like object.

For example, we can create a variable (csv_writer) and write a line of code like this:

csv_writer = csv.writer(csv_file)

Normally, we scrape data with the aim of splitting it into specific columns in the CSV file. csv.writer() gives us the writer object we need before we can write that data row by row.

We don’t want to read and use the data from a single spreadsheet cell. Instead, whether we save the file locally or on a server, we aim to split the raw data into columns with clear headings, which are convenient to read, call, and use. To get the data into that format, we need to parse it. Today, we’ll introduce a method – writerow(). Basically, writerow() writes one row of data to the specified file, so the first call can create the column headings and each later call adds a row of data.

Writerow() Syntax: writerow(['value1', 'value2', 'value3', ...]) or writerow([variable1, variable2, variable3, ...]). Note that writerow() takes a single list, with one item per column.

For example, we can write a line of code like this:

csv_writer.writerow(['Headline','Summary'])

Now the column names are written to the CSV file, and we can feed in the scraped data column by column.

For example, we can write a line of code like this:

csv_writer.writerow([headline,summary])

As you might be aware, the arguments in the writerow() call above are the variables we created to hold the different sections of data scraped from the Easy2Digital eCommerce article page. Keep in mind that these are variable names without quotes, not the quoted column names we used in the previous step.
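Putting the steps so far together, a minimal sketch could look like this. The articles list is made-up sample data standing in for whatever your scraper collects, and the file is written to a temporary folder purely so the sketch is safe to run anywhere.

```python
import csv
import os
import tempfile

# Made-up stand-ins for scraped results; a real scraper would supply these.
articles = [
    ("How to Price a Product", "A short guide, with examples"),
    ("Shipping Basics", 'Why "free shipping" is never free'),
]

path = os.path.join(tempfile.mkdtemp(), "ecommerce_scrape.csv")

csv_file = open(path, "w", newline="", encoding="utf-8")
csv_writer = csv.writer(csv_file)

# First row: quoted strings become the column headings.
csv_writer.writerow(["Headline", "Summary"])

# Following rows: unquoted variables carry the scraped values.
for headline, summary in articles:
    csv_writer.writerow([headline, summary])

csv_file.close()

# Read the raw text back to see what was saved.
with open(path, newline="", encoding="utf-8") as f:
    saved_text = f.read()
print(saved_text)
```

Notice that the writer adds quotation marks around any value that contains a comma or a quotation mark, so the columns stay intact when the file is opened in a spreadsheet.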

(Note: We’ll discuss how to scrape specific data in Python3 on Sublime Text in the next chapter. Before that, you can refer to the other article “Web Scraping with Google Sheets ImportXML to Automatically Collect Product Price Info”, where you can find the ways to use developer tools to identify the specific data location and path and learn about HTML structure.)

Save Scraped Data into CSV Files

To tell Python 3 that the work on the CSV file is finished, and to make sure everything is written to the file on disk or on a server location, we need to use a method – close()

Python file method close() closes the opened file. A closed file cannot be read or written any more. Any operation, which requires that the file be opened will raise a ValueError after the file has been closed. Calling close() more than once is allowed.

Python may close a file automatically when the variable that refers to it is reassigned to another file, but this is not guaranteed, so it is good practice to use the close() method to close a file.

For example, we can write a line of code like this:

csv_file.close()
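The behaviour of close() described above can be checked with a short sketch. It also shows the with statement as an alternative that closes the file for you when the indented block ends. The temporary folder and the example row values are assumptions made for illustration.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "ecommerce_scrape.csv")

csv_file = open(path, "w", newline="")
csv_writer = csv.writer(csv_file)
csv_writer.writerow(["Headline", "Summary"])
csv_file.close()
csv_file.close()  # a second close() is allowed and does nothing

# Writing after close() raises ValueError.
try:
    csv_writer.writerow(["too", "late"])
    error_name = None
except ValueError as error:
    error_name = type(error).__name__

# Alternative: "with" closes the file automatically at the end of the block.
with open(path, "a", newline="") as csv_file:
    csv.writer(csv_file).writerow(["Example headline", "Example summary"])

print(error_name)
print(csv_file.closed)
```

Many Python scripts prefer the with form because the file is closed even if an error happens halfway through the block.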

Then, we can press “Command + B” to run the script in Sublime Text. The output panel still shows the headline and summary, but you will also find a new CSV file, with the name you gave it in the script (ecommerce_scrape.csv), in the assigned location.

 

If you open it, you will find that all the scraped information is saved in the CSV file. There is no limit on what data you scrape automatically or where you save the new file. It just depends on your business purpose and work operation.

Read CSV Files

In many cases, you will need to write a Python script to automate a full workflow, such as updating the eCommerce SKU profit calculator. Opening existing files and getting the information out of them is therefore a key ingredient of the automated process. Here we introduce two patterns, with ... as and for line in, and two functions – reader() and next()

First of all, let’s import the csv module and open the CSV file we just created. As you can see, here we use ‘r’ in open() instead of ‘x’ or ‘w’ because we want to read the information, and we name the opened file csv_file by using with ... as. The lines that work with the file must be indented inside the with block.

import csv

with open('ecommerce_scrape.csv', 'r', newline='') as csv_file:

Then, we use csv.reader() to read the information, so we create a variable csv_reading with the line of code below

csv_reading = csv.reader(csv_file)

Note: The reader() function returns a reader object, which is an iterator over the rows of the CSV file. Each row is a list of strings.

If we print(csv_reading) and press Command + B, the output is only a description of the reader object, not the content of the file.

To show the information in the file, we need to loop over the reader with for line in, like this:
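As a minimal sketch of the difference between the reader object and the rows it yields, the example below first writes a tiny two-row file (made-up sample data, in a temporary folder) and then reads it back.

```python
import csv
import os
import tempfile

# Create a small sample file so the sketch runs on its own.
path = os.path.join(tempfile.mkdtemp(), "ecommerce_scrape.csv")
with open(path, "w", newline="") as csv_file:
    csv.writer(csv_file).writerows([
        ["Headline", "Summary"],
        ["Example headline", "Example summary"],
    ])

with open(path, "r", newline="") as csv_file:
    csv_reading = csv.reader(csv_file)
    print(csv_reading)            # something like <_csv.reader object at 0x...>
    rows = list(csv_reading)      # walks through the file and collects every row
    leftover = list(csv_reading)  # the iterator is now used up

print(rows)
print(leftover)
```

Because the reader is an iterator, it can only be walked through once. To go over the rows again, open the file again or keep the rows in a list as shown here.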

For Looping

for line in csv_reading:

Not all of the information is necessarily needed, so you can select the parts you want by using next() and [number].

The next() function returns the next item from the iterator. For example, in this case, if you don’t need the row of column headings, you can call this once before the loop:

next(csv_reading)

Last but not least, you might need only one column, like the article headline here. In Python, indexes start at 0, so 0 means the first column. If we just need the article headline, we can add a print call inside the for loop like this:

print(line[0])

As you can see, the output panel now shows only the article headlines.
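The reading steps fit together roughly as in the sketch below. The sample rows and the temporary folder are assumptions so that the example can run without a real scrape; in your own script you would open the ecommerce_scrape.csv file you created earlier.

```python
import csv
import os
import tempfile

# Create a small sample file so the sketch runs on its own.
path = os.path.join(tempfile.mkdtemp(), "ecommerce_scrape.csv")
with open(path, "w", newline="") as csv_file:
    csv.writer(csv_file).writerows([
        ["Headline", "Summary"],
        ["First article", "Summary of the first article"],
        ["Second article", "Summary of the second article"],
    ])

headlines = []
with open(path, "r", newline="") as csv_file:
    csv_reading = csv.reader(csv_file)

    # Take the heading row off the iterator before looping.
    header = next(csv_reading)

    # Everything that works with the file is indented inside the with block.
    for line in csv_reading:
        print(line[0])             # index 0 is the first column
        headlines.append(line[0])
```

Note the order: next() is called once before the for loop, so the loop starts at the first row of real data.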

So easy, right? I hope you enjoyed reading Chapter 3: How to Write, Parse, and Read CSV Files with Scraped Data. If you did, please consider supporting us, because it always helps out our channel.

If you are interested in the next chapter, please check out the article below

Python Tutorial 4: Create a Website Bot to Scrape Specific Website Data Using BeautifulSoup

FAQ:

Q1: What is the CSV Python Library?

A: The CSV Python Library is a module in Python that provides functionality for working with comma-separated values (CSV) files.

Q2: How can the CSV Python Library be used?

A: The CSV Python Library can be used to read and write CSV files, parse CSV data, and manipulate CSV data in various ways.

Q3: What are some common use cases for the CSV Python Library?

A: Some common use cases for the CSV Python Library include importing and exporting data in CSV format, analyzing and processing CSV data, and generating reports or summaries from CSV data.

Q4: Is the CSV Python Library built-in or do I need to install it?

A: The CSV Python Library is part of the standard library in Python, so it is already included with your Python installation and does not require any additional installation steps.

Q5: Can the CSV Python Library handle large CSV files?

A: Yes. The reader object is an iterator that returns one row at a time, so you can process a large CSV file row by row without loading the whole file into memory. The writer also writes one row at a time with writerow(), or a batch of rows with writerows().
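As a rough illustration of handling a larger file, the sketch below generates 10,000 made-up rows (the sku and profit columns are hypothetical) and then adds up one column while keeping only a single row in memory at a time.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "large_sample.csv")  # hypothetical file

# Generate sample data one row at a time.
with open(path, "w", newline="") as csv_file:
    csv_writer = csv.writer(csv_file)
    csv_writer.writerow(["sku", "profit"])
    for i in range(10000):
        csv_writer.writerow([f"SKU-{i}", i % 7])

# Process the file row by row; the full file is never held in memory.
total = 0
row_count = 0
with open(path, "r", newline="") as csv_file:
    csv_reading = csv.reader(csv_file)
    next(csv_reading)                 # skip the heading row
    for sku, profit in csv_reading:
        total += int(profit)          # values are read as strings
        row_count += 1

print(row_count, total)
```

Calling list() on the reader would load every row at once, so for very large files it is usually better to stay with the for loop.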

Q6: Are there any limitations or caveats when using the CSV Python Library?

A: One limitation of the CSV Python Library is that it only handles flat rows and columns, so it cannot represent nested structures directly, and the delimiter must be a single character. It also returns every value as a string, so numbers and dates need to be converted by your own code. However, for most common use cases, the library provides more than enough functionality.

Q7: Is the CSV Python Library compatible with other data manipulation libraries?

A: Yes, the CSV Python Library can be easily integrated with other popular data manipulation libraries in Python, such as pandas or numpy. You can read CSV data into these libraries for further processing or export data from these libraries into CSV format using the CSV Python Library.

Q8: Does the CSV Python Library support Unicode characters?

A: Yes, the CSV Python Library supports Unicode characters. The encoding is chosen when you open the file, with the encoding argument of open(), which allows you to work with CSV files that contain characters from different languages or character sets.
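A minimal sketch of a round trip with non-English characters, assuming UTF-8 as the encoding and a made-up sample row. The encoding is passed to open() in this sketch, for both writing and reading.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "unicode_sample.csv")  # hypothetical file

rows = [
    ["Headline", "Summary"],
    ["Café opening", "Preise in € und 日本語"],
]

# Use the same encoding for writing and for reading.
with open(path, "w", newline="", encoding="utf-8") as csv_file:
    csv.writer(csv_file).writerows(rows)

with open(path, "r", newline="", encoding="utf-8") as csv_file:
    loaded = list(csv.reader(csv_file))

print(loaded == rows)
```

If you leave out encoding, Python falls back to a default that depends on your system, which is a common reason for garbled characters when a file moves between computers.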

Q9: Is the CSV Python Library cross-platform?

A: Yes, the CSV Python Library is cross-platform and can be used on any operating system that supports Python.

Q10: Where can I find more information and examples on how to use the CSV Python Library?

A: You can find more information, examples, and documentation on how to use the CSV Python Library in the official Python documentation or by searching online for tutorials and guides specifically related to CSV manipulation in Python.
