
An introduction to list comprehensions

Kernel: Python 3 (Anaconda 5)

List comprehensions

Those of you who have taken some advanced math courses may be familiar with set-builder notation.

This is a way of specifying a subset of a known set determined by some property.

For example, if you want to talk about the subset of the integers $\mathbb{Z}$ that are even, you could describe it as:

$$\{a \in \mathbb{Z} : \text{2 divides } a\}$$

This says "the set of all $a$ in $\mathbb{Z}$ such that 2 divides $a$."

The basic form of the statement is

$$\{a \in S : a \text{ satisfies constraint } C\}$$

Python has a similar syntax called "list comprehension" syntax. (The word "comprehension" comes from its use in set theory.)

The form of the Python syntax is:

[a for a in S if C(a)]

where S is any iterable (a list, set, string, range, etc.) and C(a) is any expression that evaluates to True or False.
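To connect the notation to ordinary Python, here is a minimal sketch showing that `[a for a in S if C(a)]` builds the same list as an explicit loop with `append`. The predicate `C` (here "2 divides a") and the finite range standing in for the integers are illustrative choices, not part of the original exercise.

```python
def C(a):
    """Example constraint (chosen for illustration): 2 divides a."""
    return a % 2 == 0

S = range(-6, 7)  # a finite stand-in for the integers

# Comprehension form
evens = [a for a in S if C(a)]

# Equivalent explicit loop
evens_loop = []
for a in S:
    if C(a):
        evens_loop.append(a)

print(evens)                # [-6, -4, -2, 0, 2, 4, 6]
print(evens == evens_loop)  # True
```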

# Example 1
S = "bElUgaWhAlE!!?"
upper_letters = [letter for letter in S if letter.isupper()]
print("Here are the uppercase letters occurring in ", S)
upper_letters
Here are the uppercase letters occurring in bElUgaWhAlE!!?
['E', 'U', 'W', 'A', 'E']
# Example 2
phrase = "It is not charisma, daring or eloquence that have made her remarkable. Like her mentor and predecessor as chancellor, Helmut Kohl, Ms. Merkel is rather bland in speech and demeanor. Her slogan in the last election — “For a Germany where life is good and we enjoy it” — about summed up the comforting combination of moderation, stability, centrism and decency that have rallied voters behind “Mutti” (Mommy). In her 13 years at the helm, Germany has been a fairly calm and prosperous place, despite some political storms."
# Keep only letters and spaces (compare strings with ==, not "is")
no_symbols = [k for k in phrase if k.isalpha() or k == ' ']
processed_phrase = "".join(no_symbols)
print(processed_phrase)
It is not charisma daring or eloquence that have made her remarkable Like her mentor and predecessor as chancellor Helmut Kohl Ms Merkel is rather bland in speech and demeanor Her slogan in the last election For a Germany where life is good and we enjoy it about summed up the comforting combination of moderation stability centrism and decency that have rallied voters behind Mutti Mommy In her years at the helm Germany has been a fairly calm and prosperous place despite some political storms
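The intermediate list `no_symbols` is not strictly needed: `str.join` accepts any iterable, so a generator expression (a comprehension written with parentheses instead of square brackets) can be passed to it directly. A minimal sketch on a short stand-in string (made up for illustration):

```python
# A short stand-in for the longer phrase above
phrase = "Hello, world! It's fine."

# List comprehension, then join (as in Example 2)
no_symbols = [k for k in phrase if k.isalpha() or k == " "]
with_list = "".join(no_symbols)

# Generator expression passed straight to join: no intermediate list is kept around
with_generator = "".join(k for k in phrase if k.isalpha() or k == " ")

print(with_generator)               # Hello world Its fine
print(with_list == with_generator)  # True
```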
# Recall how "join" works.
sentence = "An old dog can learn new tricks"
"-".join(sentence.split())
'An-old-dog-can-learn-new-tricks'
# Exercise
#
# Use a list comprehension to turn processed_phrase into a list of lowercase words.
words = [word.lower() for word in processed_phrase.split()]
#
# Then use a list comprehension to output all words in the phrase of length 2.
len_two = [word for word in words if len(word) == 2]
# set() removes duplicates; the order of the resulting list is arbitrary
list(set(len_two))
['in', 'we', 'or', 'up', 'at', 'is', 'ms', 'of', 'as', 'it']
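Because a `set` has no guaranteed order, `list(set(len_two))` can come out in a different order in different Python sessions. A minimal sketch of a deterministic alternative: build the set directly with a set comprehension and pass it to `sorted()`. The sample phrase is made up so the block runs on its own.

```python
# Sample phrase (made up for illustration)
processed_phrase = "It is what it is and we go on as we go"

words = [word.lower() for word in processed_phrase.split()]

# A set comprehension removes duplicates; sorted() gives a fixed, alphabetical order
len_two = sorted({word for word in words if len(word) == 2})
print(len_two)  # ['as', 'go', 'is', 'it', 'on', 'we']
```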
# Exercise
# Use a list comprehension to strip the vowels out of your own name.
name = "Hunter Johnson"
vowels = list("aeiouy")
# Lowercase each letter before testing so capital vowels are removed too
[letter for letter in name if letter.lower() not in vowels]
['H', 'n', 't', 'r', ' ', 'J', 'h', 'n', 's', 'n']
set("Hunter Johnson".lower()).intersection(vowels)
{'e', 'o', 'u'}
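To keep only the vowels instead of removing them, flip the test from `not in` to `in`. A minimal sketch with the same name and vowel list; unlike the set intersection above, this preserves order and repeats.

```python
name = "Hunter Johnson"
vowels = list("aeiouy")

# Keep each letter whose lowercase form is a vowel (order and repeats preserved)
name_vowels = [letter for letter in name if letter.lower() in vowels]
print(name_vowels)  # ['u', 'e', 'o', 'o']
```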

Mappings and transformations

List comprehension notation can also be used to transform each element in a list.

The basic form is:

[f(a) for a in L]

where $f$ is any function.
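The function $f$ does not have to be a built-in; any function you define works. A minimal sketch with a made-up function `shout` (hypothetical, for illustration only):

```python
def shout(word):
    """Hypothetical example function: uppercase a word and add an exclamation mark."""
    return word.upper() + "!"

L = ["list", "comprehensions", "rock"]
shouted = [shout(a) for a in L]  # [f(a) for a in L] with f = shout
print(shouted)  # ['LIST!', 'COMPREHENSIONS!', 'ROCK!']
```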

Here are some examples.

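sentence = "I will replace each word in this sentence with its last letter".split()
# Only the last expression in a cell is displayed as output.
[word[-1] for word in sentence]         # last letter of each word
[len(word) for word in sentence]        # length of each word
[word.count("i") for word in sentence]  # number of lowercase "i"s in each word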
[0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0]
[len(word) for word in sentence]
sentence = "I will replace each word with whether or not the word is length 3 or less".split()
[len(word) <= 3 for word in sentence]
[True, False, False, False, False, False, False, True, True, True, False, True, False, True, True, False]
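Since `True` and `False` behave like `1` and `0` in arithmetic, a list of booleans like the one above can be summed to count how many words satisfy the condition. A minimal sketch on the same sentence:

```python
sentence = "I will replace each word with whether or not the word is length 3 or less".split()

# Each comparison yields True (1) or False (0), so the sum counts the short words
num_short = sum(len(word) <= 3 for word in sentence)
print(num_short)  # 7
```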
# Exercise:
# Replace each number in [8,1,7,6,3,0,9] with the string version of the number.
# Hint: use str(), the string conversion function.
L_nums = [8,1,7,6,3,0,9]
# [f(a) for a in S]
[str(a) for a in L_nums]
['8', '1', '7', '6', '3', '0', '9']
# Another approach
list(map(str, L_nums))
['8', '1', '7', '6', '3', '0', '9']
## Invert the operation from above.
## That is, replace each string in ["8","1","7","6","3","0","9"] with its numerical equivalent.
## The output should be [8,1,7,6,3,0,9].
## Hint: use int(), the integer conversion function.
NUMBERS = ["8","1","7","6","3","0","9"]
[int(n) for n in NUMBERS]
[8, 1, 7, 6, 3, 0, 9]
# What if we want robust error checking?
L_strs = ["8","1","7","6","3","0","9"] + ["oh no"]
renum = []
for elem in L_strs:
    try:
        elem_int = int(elem)
        renum.append(elem_int)
    except ValueError:
        # Skip strings that cannot be converted to an integer
        pass
renum
[8, 1, 7, 6, 3, 0, 9]
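A comprehension cannot contain a `try`/`except` statement, but it can call a function that does. A minimal sketch, assuming a hypothetical helper `to_int_or_none` that returns `None` for any string `int()` rejects; a second comprehension then filters those out.

```python
def to_int_or_none(s):
    """Hypothetical helper: convert s to an int, or return None if int() rejects it."""
    try:
        return int(s)
    except ValueError:
        return None

L_strs = ["8","1","7","6","3","0","9"] + ["oh no"]

converted = [to_int_or_none(s) for s in L_strs]   # transform
renum = [n for n in converted if n is not None]   # filter
print(renum)  # [8, 1, 7, 6, 3, 0, 9]
```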
# Exercise:
# Replace each word in this sentence with the same word but reversed.
sentence = "replace each word in this sentence with the same word but reversed"
list_of_words = sentence.split()
word = "example"
print("{} reversed is {}".format(word, word[::-1]))
# Solution
list_of_words_reversed = [word[::-1] for word in list_of_words]
" ".join(list_of_words_reversed)
example reversed is elpmaxe
'ecalper hcae drow ni siht ecnetnes htiw eht emas drow tub desrever'

All together now

You can of course do a transformation and a filter at the same time.

That is, you can write expressions of the form:

[f(a) for a in S if C(a)]

Here are some examples.

# The double of each number in the list L which is divisible by 3.
L = [1,2,3,4,5,6,7,8,9,10]
D = [i*2 for i in L if i % 3 == 0]
D
[6, 12, 18]
# The reverse of each word in the sentence with length >= 4
sentence = "I like pasta but should I eat it out of this strangers purse".split()
[word[::-1] for word in sentence if len(word) >= 4]
['ekil', 'atsap', 'dluohs', 'siht', 'sregnarts', 'esrup']
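The `if` after the `for` filters elements out. If you instead want to keep every element but transform it differently depending on a condition, put a conditional expression (`x if cond else y`) in the transformation part, before the `for`. A minimal sketch using the same list `L`:

```python
L = [1,2,3,4,5,6,7,8,9,10]

# Filter: only elements divisible by 3 appear, doubled
filtered = [i*2 for i in L if i % 3 == 0]

# Conditional expression: every element appears; only those divisible by 3 are doubled
transformed = [i*2 if i % 3 == 0 else i for i in L]

print(filtered)     # [6, 12, 18]
print(transformed)  # [1, 2, 6, 4, 5, 12, 7, 8, 18, 10]
```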
# Exercise:
# Finish specifying the numerizer() function so that it works as described.
digits = ["zero","one","two","three","four","five","six","seven","eight","nine"]
sentence = "if you get 1 or 2 then you might as well get 3".split()

def numerizer(word, digits=digits):
    """
    This function takes a string as input.
    If the string is made up of Arabic numerals, this function returns the number in English.
    For example, if the input is "123" then the output is "one two three".
    If the string is not made up of Arabic numerals, then it just returns the word.
    For example, if the input is "sandwich" then the output is "sandwich".
    """
    try:
        number = int(word)  # This may fail.
        num_str_list = list(word)  # "123" becomes ["1","2","3"]
        # One possible solution: look up the English name of each digit.
        output = " ".join(digits[int(d)] for d in num_str_list)
        return output
    except ValueError:
        return word

print("The cell output should be:\n 'if you get one or two then you might as well get three'")
" ".join(numerizer(word) for word in sentence)
The cell output should be: 'if you get one or two then you might as well get three'
'if you get one or two then you might as well get three'
# Exercise:
# On the midterm you defined H(n) = 1 + 1/2 + 1/3 + ... + 1/n.
# Now use a lambda to define this function using only one line of code.
H = lambda n: sum(1/i for i in range(1, n+1))
H(3) == 1 + 1/2 + 1/3  # Should be True
True
# The sum of the first n numbers...
sum_of_first_n = lambda n: sum(range(0, n+1))
sum_of_first_n(100)
5050
sum_first_n_squares = lambda n: sum(i**2 for i in range(n+1))
sum_first_n_squares(100)
338350
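These sums have well-known closed forms, $1 + 2 + \cdots + n = \frac{n(n+1)}{2}$ and $1^2 + 2^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6}$, which make a handy check on the lambdas above. A minimal sketch comparing the two for many values of n:

```python
sum_of_first_n = lambda n: sum(range(0, n+1))
sum_first_n_squares = lambda n: sum(i**2 for i in range(n+1))

# Closed forms; integer division is exact because the numerators are always divisible
closed_sum = lambda n: n*(n+1)//2
closed_squares = lambda n: n*(n+1)*(2*n+1)//6

print(all(sum_of_first_n(n) == closed_sum(n) for n in range(200)))              # True
print(all(sum_first_n_squares(n) == closed_squares(n) for n in range(200)))     # True
print(closed_sum(100), closed_squares(100))  # 5050 338350
```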

Dictionary comprehensions

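You can use notation much like list comprehension notation to initialize a dictionary. The basic form is:

{k(a): v(a) for a in S}

where each element a of S contributes the key k(a) with the value v(a).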

D = {food: 0 for food in ["pizza", "pasta", "popsicle"]}
D["pizza"] += 1
D
{'pizza': 1, 'pasta': 0, 'popsicle': 0}
squares = {i: i**2 for i in range(100)}
number = 9
"The square of {} is {}".format(number, squares[number])
'The square of 9 is 81'
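A dictionary comprehension can also loop over an existing dictionary's `.items()`. As a minimal sketch, this inverts the `squares` dictionary so that each square maps back to its root (this works here because every value is distinct):

```python
squares = {i: i**2 for i in range(100)}

# Swap keys and values: square -> number
roots = {sq: i for i, sq in squares.items()}
print(roots[81])  # 9
```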

Set comprehensions

Similarly, curly braces without a colon build a set, which lets you select items from an existing set.

S = set([1,2,3,1,2,3,1,2,3] + [12,16,17,18,19,20])
print("The elements of the set S = ", S)
S_evens = {i for i in S if i % 2 == 0}
print("The even elements of the set S are ", S_evens)
The elements of the set S = {1, 2, 3, 12, 16, 17, 18, 19, 20}
The even elements of the set S are {2, 12, 16, 18, 20}
## Exercise:
## Initialize a dictionary D such that D["a"] == D["b"] == 0.
## Then use it to count the frequency of "a" and "b" in the string below.
S = "aaababababbabababababababaabbbbbababababbbababababbabababababababababababababababababababbabababababababbababababab"
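One possible solution, sketched under the assumption that the intended approach is a dictionary comprehension for the initialization followed by a plain loop over the string; `str.count` gives an independent check of the result.

```python
S = "aaababababbabababababababaabbbbbababababbbababababbabababababababababababababababababababbabababababababbababababab"

# Start both counts at zero, as the exercise asks
D = {letter: 0 for letter in "ab"}

# Add one to the count of each character that is a tracked letter
for letter in S:
    if letter in D:
        D[letter] += 1

print(D)
print(D["a"] == S.count("a") and D["b"] == S.count("b"))  # True
```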