In this lesson, we will discuss the basics of working with data:
Separating text into words is a common programming task.
someText = "Read my lips!"
someText
listOfWords = someText.split() # default is to parse on white space
listOfWords
type(listOfWords)
# we can iterate over the items in the list as usual...
for word in listOfWords:
    print(word)
The resulting listOfWords is the same in these three cases:
someText
# Example 1
listOfWords = someText.split('ead')
listOfWords
# quick example to show that you can cascade splits and selections
mypath = "/foo/bar/biff/baz/file.png"
mypath.split("/")[-1].split(".")[0]
'file'
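Chained splits work, but the standard library also has helpers written for paths. The sketch below reuses the mypath value from the example above and compares three ways of getting the file name without its extension. The variable names by_split, by_os_path and by_pathlib are made up for this illustration.

```python
import os.path
from pathlib import PurePosixPath

mypath = "/foo/bar/biff/baz/file.png"

# chained splits, as in the example above
by_split = mypath.split("/")[-1].split(".")[0]

# os.path: take the last path component, then drop the extension
by_os_path = os.path.splitext(os.path.basename(mypath))[0]

# pathlib: .stem is the last path component without its final suffix
by_pathlib = PurePosixPath(mypath).stem

print(by_split, by_os_path, by_pathlib)
```

All three agree for this path. They can differ for names with more than one dot: for "archive.tar.gz" the chained split gives "archive", while the other two give "archive.tar".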
# Example 2
lineOfText = "one fish two fish red fish blue fish"
tokens = lineOfText.split() # default is to parse on white space
# default parsing
for item in tokens:
    print("next item: '%s'" % item)
next item: 'one'
next item: 'fish'
next item: 'two'
next item: 'fish'
next item: 'red'
next item: 'fish'
next item: 'blue'
next item: 'fish'
tokens = lineOfText.split('h') # now, parse on 'h'
# the separator 'h' is removed; the surrounding spaces are kept
for item in tokens:
    print("next item: '%s'" % item) # take note of resulting whitespace
next item: 'one fis'
next item: ' two fis'
next item: ' red fis'
next item: ' blue fis'
next item: ''
tokens = lineOfText.split('fish') # now, parse on 'fish'
# each piece keeps the spaces that surrounded 'fish'
for item in tokens:
    print("next item: '%s'" % item.strip()) # strip() removes the leftover whitespace
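The examples above show that the chosen separator changes what ends up in the list. One case that often surprises people is the difference between split() with no argument and split(" ") with an explicit space. The sketch below uses a made-up string (spaced) that contains two spaces in a row, a tab, and a trailing newline.

```python
spaced = "one  fish\ttwo fish\n"  # two spaces, a tab, and a trailing newline

# no argument: split on runs of any whitespace and drop empty strings
default_tokens = spaced.split()

# explicit single space: every space is a separator, so an empty string
# appears between the two adjacent spaces, and the tab and newline stay
space_tokens = spaced.split(" ")

print(default_tokens)  # ['one', 'fish', 'two', 'fish']
print(space_tokens)    # ['one', '', 'fish\ttwo', 'fish\n']
```

For ordinary text, the no-argument form is usually the one you want.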
There are two types of files: text files and binary files.
In this course, we are going to restrict attention to text files only.
Perhaps the simplest way to create an output file is to "redirect" the normal output of a program (i.e., its print statements) into a file with a specific name.
From the (Unix) command line, the > operator writes program output to a file:
python myscript.py > outfile.txt
This is called "redirecting file output" and it has nothing to do with Python per se. When we are doing this, we are taking advantage of a feature of the operating system. Nonetheless, it's an easy and powerful way of creating an output file.
It's important to note that the > operator creates a new file if the named file (here, "outfile.txt") does not exist. If the file already exists, the previous file is overwritten by the new content.
From the (Unix) command line, the >> operator appends program output to a file. This is useful for running the program on different inputs or situations and collecting all of the results in one file:
python myscript.py >> outfile.txt
Each time the program myscript.py is called in this manner, its output is added to the end of the existing file outfile.txt.
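Redirection with > or >> captures only what the program sends to standard output, which is where print() writes by default. The contents of myscript.py are not shown in this lesson, so the report function below is a hypothetical stand-in for what such a script might contain.

```python
import sys

def report(values):
    """Print one value per line to standard output and return how many were printed."""
    for v in values:
        print(v)                    # standard output: captured by > or >>
    print("done", file=sys.stderr)  # standard error: still shown in the terminal
    return len(values)

count = report([1, 4, 9])
```

If a script like this were run with > outfile.txt, the file would hold the three numbers and the "done" message would still appear on screen.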
We also have the ability for our Python program to write "directly" to a file. In order to do this, we must include Python commands to execute each of the following steps:
Open (or create if it does not exist) the file in write mode
Write the desired information
Close the file (data that has not yet been written to disk may be lost if the file is not closed)
Note that the file will be written to your current "working" directory.
Here is a simple example:
#open a file named "sample1.txt" in 'write' mode
myFile = open("sample1.txt", "w") # the "w" indicates to open in write mode
#write (save) three lines of text
myFile.write("This is line 1.\n") # note the use of \n to end the line with a newline character
myFile.write("And here's line 2.\n")
myFile.write("Finally comes line 3.\n")
#close the file
myFile.close() # data not yet written to disk may be lost if the file is not closed
Having executed the code block above, we should now see a file named sample1.txt in the same directory as this notebook.
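Two variations on the write example are worth knowing. A with block closes the file automatically, and "a" (append) mode keeps the existing content, much like >> on the command line. The sketch below assumes a file name (sample2.txt) and a temporary directory, both chosen here so that sample1.txt is left untouched.

```python
import os
import tempfile

# work in a throwaway directory so no lesson files are touched (an assumption of this sketch)
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample2.txt")

    # "w" creates the file, or overwrites it if it already exists
    with open(path, "w") as myFile:
        myFile.write("This is line 1.\n")
    # the file is closed automatically when the with block ends

    # "a" adds to the end of the existing file instead of overwriting it
    with open(path, "a") as myFile:
        myFile.write("And here's line 2.\n")

    with open(path, "r") as myFile:
        contents = myFile.read()

print(contents)
```

Opening the file a second time in "w" mode instead of "a" would have discarded the first line.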
We can also create Python programs that read the contents of a file.
In order to do this, we must include Python commands to execute each of the following steps:
Open the file in read mode
Read the desired information
Close the file
Note that the file must already exist in your current "working" directory.
For example, we can use Python to read the file we just wrote.
Here are some simple file reading scripts:
# Reading Files Example 1:
#open the file in 'read' mode
myFile = open("sample1.txt", "r") # the "r" indicates to open in read mode
whole_file = myFile.read() # this reads the ENTIRE FILE into the variable
print(whole_file) # to print the whole file
myFile.close()
In the block of code above, the read() method reads the entire file at once, which is often impractical for large files. Instead, we typically read a file line by line.
# Reading Files Example 2:
#open the file in 'read' mode
myFile = open("sample1.txt", "r") # the "r" indicates to open in read mode
first_line = myFile.readline() # this reads one line of the file into the variable
print(first_line) # to print the contents
second_line = myFile.readline() # this reads one line of the file into the variable
print(second_line) # to print the contents
third_line = myFile.readline() # this reads one line of the file into the variable
print(third_line) # to print the contents
myFile.close() #close the file
What a pain. It would be much better to use a loop. As with all loops, there is more than one way to do this. Here's a common one:
# Reading Files Example 3:
#open the file in 'read' mode
myFile = open("sample1.txt", "r") # the "r" indicates to open in read mode
for line in myFile: # the for-loop: a nice way to iterate over the lines in a file
    print(line)
myFile.close() #close the file
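Each line read from a file still ends with its newline character, so print(line) shows a blank line after every entry. The sketch below shows one way to handle that. It writes its own small file first, so the file name lines.txt and the temporary directory are assumptions of this illustration, not part of the lesson files.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "lines.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("This is line 1.\nAnd here's line 2.\nFinally comes line 3.\n")

    clean_lines = []
    with open(path, "r") as f:
        # enumerate numbers the lines as they are read, starting from 1
        for line_number, line in enumerate(f, start=1):
            text = line.rstrip("\n")  # drop only the trailing newline
            clean_lines.append(text)
            print(line_number, text)
```

Using rstrip("\n") rather than strip() keeps any leading spaces, which matters when indentation is meaningful.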
One of the simplest data formats is known as Comma Separated Values or CSV data.
CSV is simply text in which commas separate the individual values. By convention, a ".csv" file extension indicates that the file has this format.
Many programs "know" how to read/write CSV data, including spreadsheet programs like Excel.
Often, the data in a spreadsheet can be converted to a Comma-Separated Value (.csv) format (for example, Microsoft Excel allows you to save spreadsheets as a .csv file).
We can use what we've learned today to read and write from/to .csv files (you will probably need to do this often).
The examples below use the "CSVFile.csv" file, which you should have downloaded and placed in the same directory as this .ipynb file.
# To read a .csv file
target = open("CSVFile.csv", "r") # open the target file in read mode
my_data = [] # create an empty list
for line in target: # run a loop over each line of the target file
    line = line.strip() # this strips any leading/trailing whitespace, including the newline at the end of the line
    my_data.append(line.split(',')) # this splits the line into a list and appends that list to the my_data list
my_data # note that all of the data returned is of type string!
target.close() # close the target file
my_data
my_data.append(['E', '17', '18', '19', '20']) # let's add another line - note everything is a string
my_data
# To write a .csv file
f = open('CSVFile.csv','w') # open the file in write mode and refer to it as "f"
for sublist in my_data: # loop over rows
    f.write(','.join(sublist)) # join the items with commas as the delimiter (no trailing comma)
    f.write('\n') # this writes a newline character at the end of the line
f.close()
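Because every value read from a CSV file is a string, numeric columns have to be converted before any arithmetic. The contents of CSVFile.csv are not shown here, so the sketch below uses hypothetical rows in the same shape as the appended row above (a label followed by whole numbers). The 'D' row is made up for illustration.

```python
# hypothetical rows: a text label followed by numbers stored as strings
rows = [['D', '13', '14', '15', '16'],
        ['E', '17', '18', '19', '20']]

converted = []
for row in rows:
    label = row[0]                             # keep the first column as text
    numbers = [int(item) for item in row[1:]]  # convert the rest to integers
    converted.append([label] + numbers)

# with real numbers we can now do arithmetic, e.g. sum each row
row_sums = {row[0]: sum(row[1:]) for row in converted}

print(converted)
print(row_sums)
```

int() raises a ValueError for text that is not a whole number, so real data with blanks or decimals would need extra handling (for example, float()).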
The csv module
Reading and writing comma-separated value (CSV) data is so common that there is a Python module to make it easier. Check out https://docs.python.org/3/library/csv.html.
Key features of this module: it will split each line on commas (this happens automatically).
Read data into list
import csv
# this is a real shortcut to reading everything into a list-of-lists...
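with open('CSVFile.csv', 'r', newline='') as f: # newline='' is recommended by the csv module documentation
    reader = csv.reader(f)
    my_data = list(reader)
# no f.close() is needed: the with block closes the file automatically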
my_data
my_data.append(['F', '21', '22', '23', '24']) # let's add another line - note everything is a string
my_data
with open('CSVFile.csv', 'w', newline='') as csvfile: # newline='' avoids blank lines between rows on some platforms
    writer = csv.writer(csvfile) # pass additional parameters as appropriate
    for row in my_data:
        writer.writerow(row)
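One reason to prefer the csv module over a manual split(',') is that it understands quoted fields. The sketch below uses a made-up line of CSV text in which one value contains a comma, and parses it both ways. It works on in-memory text rather than CSVFile.csv.

```python
import csv
import io

# one line of CSV text; the second field is quoted because it contains a comma
line = 'A,"Smith, Jane",42\n'

# manual splitting breaks the quoted field in two and keeps the quote marks
naive = line.strip().split(',')

# csv.reader accepts any iterable of lines and handles the quoting
parsed = next(csv.reader([line]))

print(naive)   # ['A', '"Smith', ' Jane"', '42']
print(parsed)  # ['A', 'Smith, Jane', '42']

# csv.writer adds the quotes back when a value needs them
buffer = io.StringIO()
csv.writer(buffer).writerow(parsed)
written = buffer.getvalue()
print(repr(written))
```

By default the writer ends each row with '\r\n', which is why the csv documentation suggests opening files with newline=''.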