% LaTeX source for ``Think Python: How to Think Like a Computer Scientist''
% Copyright (c) 2015 Allen B. Downey.

% License: Creative Commons Attribution-NonCommercial 3.0 Unported License.
% http://creativecommons.org/licenses/by-nc/3.0/
%

%\documentclass[10pt,b5paper]{book}
\documentclass[10pt]{book}
\usepackage[width=5.5in,height=8.5in,hmarginratio=3:2,vmarginratio=1:1]{geometry}

% for some of these packages, you might have to install
% texlive-latex-extra (in Ubuntu)

\usepackage[T1]{fontenc}
\usepackage{textcomp}
\usepackage{mathpazo}
\usepackage{url}
\usepackage{fancyhdr}
\usepackage{graphicx}
\usepackage{amsmath}
\usepackage{amsthm}
%\usepackage{amssymb}
\usepackage{exercise} % texlive-latex-extra
\usepackage{makeidx}
\usepackage{setspace}
\usepackage{hevea}
\usepackage{upquote}
\usepackage{appendix}
\usepackage[bookmarks]{hyperref}

\title{Think Python}
\author{Allen B. Downey}
\newcommand{\thetitle}{Think Python: How to Think Like a Computer Scientist}
\newcommand{\theversion}{2nd Edition, Version 2.4.0}
\newcommand{\thedate}{}

% these styles get translated in CSS for the HTML version
\newstyle{a:link}{color:black;}
\newstyle{p+p}{margin-top:1em;margin-bottom:1em}
\newstyle{img}{border:0px}

% change the arrows
\setlinkstext
{\imgsrc[ALT="Previous"]{back.png}}
{\imgsrc[ALT="Up"]{up.png}}
{\imgsrc[ALT="Next"]{next.png}}

\makeindex

\newif\ifplastex
\plastexfalse

\begin{document}

\frontmatter

% PLASTEX ONLY
\ifplastex
\usepackage{localdef}
\maketitle

\newcount\anchorcnt
\newcommand*{\Anchor}[1]{%
\@bsphack%
\Hy@GlobalStepCount\anchorcnt%
\edef\@currentHref{anchor.\the\anchorcnt}%
\Hy@raisedlink{\hyper@anchorstart{\@currentHref}\hyper@anchorend}%
\M@gettitle{}\label{#1}%
\@esphack%
}


\else
% skip the following for plastex

\newtheorem{exercise}{Exercise}[chapter]

% LATEXONLY

\input{latexonly}

\begin{latexonly}

\renewcommand{\blankpage}{\thispagestyle{empty} \quad \newpage}

%\blankpage
%\blankpage

% TITLE PAGES FOR LATEX VERSION

%-half title--------------------------------------------------
\thispagestyle{empty}

\begin{flushright}
\vspace*{2.0in}

\begin{spacing}{3}
{\huge Think Python}\\
{\Large How to Think Like a Computer Scientist}
\end{spacing}

\vspace{0.25in}

\theversion

\thedate

\vfill

\end{flushright}

%--verso------------------------------------------------------

\blankpage
\blankpage
%\clearemptydoublepage
%\pagebreak
%\thispagestyle{empty}
%\vspace*{6in}

%--title page--------------------------------------------------
\pagebreak
\thispagestyle{empty}

\begin{flushright}
\vspace*{2.0in}

\begin{spacing}{3}
{\huge Think Python}\\
{\Large How to Think Like a Computer Scientist}
\end{spacing}

\vspace{0.25in}

\theversion

\thedate

\vspace{1in}


{\Large
Allen Downey\\
}


\vspace{0.5in}

{\Large Green Tea Press}

{\small Needham, Massachusetts}

%\includegraphics[width=1in]{figs/logo1.pdf}
\vfill

\end{flushright}


%--copyright--------------------------------------------------
\pagebreak
\thispagestyle{empty}

{\small
Copyright \copyright ~2015 Allen Downey.


\vspace{0.2in}

\begin{flushleft}
Green Tea Press \\
9 Washburn Ave \\
Needham MA 02492
\end{flushleft}

Permission is granted to copy, distribute, and/or modify this document
under the terms of the Creative Commons Attribution-NonCommercial 3.0 Unported
License, which is available at \url{http://creativecommons.org/licenses/by-nc/3.0/}.

The original form of this book is \LaTeX\ source code. Compiling this
\LaTeX\ source has the effect of generating a device-independent
representation of a textbook, which can be converted to other formats
and printed.

The \LaTeX\ source for this book is available from
\url{http://www.thinkpython2.com}

\vspace{0.2in}

} % end small

\end{latexonly}


% HTMLONLY

\begin{htmlonly}

% TITLE PAGE FOR HTML VERSION

{\Large \thetitle}

{\large Allen B. Downey}

\theversion

\thedate

\setcounter{chapter}{-1}

\end{htmlonly}

\fi
% END OF THE PART WE SKIP FOR PLASTEX


\chapter{Preface}

\section*{The strange history of this book}

In January 1999 I was preparing to teach an introductory programming
class in Java. I had taught it three times and I was getting
frustrated. The failure rate in the class was too high and, even for
students who succeeded, the overall level of achievement was too low.

One of the problems I saw was the books.
They were too big, with too much unnecessary detail about Java, and
not enough high-level guidance about how to program. And they all
suffered from the trap door effect: they would start out easy,
proceed gradually, and then somewhere around Chapter 5 the bottom would
fall out. The students would get too much new material, too fast,
and I would spend the rest of the semester picking up the pieces.

Two weeks before the first day of classes, I decided to write my
own book. My goals were:

\begin{itemize}

\item Keep it short. It is better for students to read 10 pages
than not read 50 pages.

\item Be careful with vocabulary. I tried to minimize jargon
and define each term at first use.

\item Build gradually. To avoid trap doors, I took the most difficult
topics and split them into a series of small steps.

\item Focus on programming, not the programming language. I included
the minimum useful subset of Java and left out the rest.

\end{itemize}

I needed a title, so on a whim I chose {\em How to Think Like
a Computer Scientist}.

My first version was rough, but it worked. Students did the reading,
and they understood enough that I could spend class time on the hard
topics, the interesting topics and (most important) letting the
students practice.

I released the book under the GNU Free Documentation License,
which allows users to copy, modify, and distribute the book.
\index{GNU Free Documentation License}
\index{Free Documentation License, GNU}

What happened next is the cool part. Jeff Elkner, a high school
teacher in Virginia, adopted my book and translated it into
Python. He sent me a copy of his translation, and I had the
unusual experience of learning Python by reading my own book.
As Green Tea Press, I published the first Python version in 2001.
\index{Elkner, Jeff}

In 2003 I started teaching at Olin College and I got to teach
Python for the first time. The contrast with Java was striking.
Students struggled less, learned more, worked on more interesting
projects, and generally had a lot more fun.
\index{Olin College}

Since then I've continued to develop the book,
correcting errors, improving some of the examples and
adding material, especially exercises.

The result is this book, now with the less grandiose title
{\em Think Python}. Some of the changes are:

\begin{itemize}

\item I added a section about debugging at the end of each chapter.
These sections present general techniques for finding and avoiding
bugs, and warnings about Python pitfalls.

\item I added more exercises, ranging from short tests of
understanding to a few substantial projects. Most exercises
include a link to my solution.

\item I added a series of case studies---longer examples with
exercises, solutions, and discussion.

\item I expanded the discussion of program development plans
and basic design patterns.

\item I added appendices about debugging and analysis of algorithms.

\end{itemize}

The second edition of {\em Think Python} has these new features:

\begin{itemize}

\item The book and all supporting code have been updated to Python 3.

\item I added a few sections, and more details on the web, to help
beginners get started running Python in a browser, so you don't have
to deal with installing Python until you want to.

\item For Chapter~\ref{turtle} I switched from my own turtle graphics
package, called Swampy, to a more standard Python module, {\tt
turtle}, which is easier to install and more powerful.

\item I added a new chapter called ``The Goodies'', which introduces
some additional Python features that are not strictly necessary, but
sometimes handy.

\end{itemize}

I hope you enjoy working with this book, and that it helps
you learn to program and think like
a computer scientist, at least a little bit.


Allen B. Downey \\

Olin College \\


\section*{Acknowledgments}

Many thanks to Jeff Elkner, who
translated my Java book into Python, which got this project
started and introduced me to what has turned out to be my
favorite language.
\index{Elkner, Jeff}

Thanks also to Chris Meyers, who contributed several sections
to {\em How to Think Like a Computer Scientist}.
\index{Meyers, Chris}

Thanks to the Free Software Foundation for developing
the GNU Free Documentation License, which helped make
my collaboration with Jeff and Chris possible, and Creative
Commons for the license I am using now.
\index{GNU Free Documentation License}
\index{Free Documentation License, GNU}
\index{Creative Commons}

Thanks to the editors at Lulu who worked on
{\em How to Think Like a Computer Scientist}.

Thanks to the editors at O'Reilly Media who worked on
{\em Think Python}.

Thanks to all the students who worked with earlier
versions of this book and all the contributors (listed
below) who sent in corrections and suggestions.


\section*{Contributor List}

\index{contributors}
More than 100 sharp-eyed and thoughtful readers have sent in
suggestions and corrections over the past few years. Their
contributions, and enthusiasm for this project, have been a
huge help.

If you have a suggestion or correction, please send email to
{\tt feedback@thinkpython.com}. If I make a change based on your
feedback, I will add you to the contributor list
(unless you ask to be omitted).

If you include at least part of the sentence the
error appears in, that makes it easy for me to search. Page and
section numbers are fine, too, but not quite as easy to work with.
Thanks!

\begin{itemize}

\small
\item Lloyd Hugh Allen sent in a correction to Section 8.4.

\item Yvon Boulianne sent in a correction of a semantic error in
Chapter 5.

\item Fred Bremmer submitted a correction in Section 2.1.

\item Jonah Cohen wrote the Perl scripts to convert the
LaTeX source for this book into beautiful HTML.

\item Michael Conlon sent in a grammar correction in Chapter 2
and an improvement in style in Chapter 1, and he initiated discussion
on the technical aspects of interpreters.

\item Beno\^{i}t Girard sent in a
correction to a humorous mistake in Section 5.6.

\item Courtney Gleason and Katherine Smith wrote {\tt horsebet.py},
which was used as a case study in an earlier version of the book. Their
program can now be found on the website.

\item Lee Harr submitted more corrections than we have room to list
here, and indeed he should be listed as one of the principal editors
of the text.

\item James Kaylin is a student using the text. He has submitted
numerous corrections.

\item David Kershaw fixed the broken {\tt catTwice} function in Section
3.10.

\item Eddie Lam has sent in numerous corrections to Chapters
1, 2, and 3.
He also fixed the Makefile so that it creates an index the first time it is
run and helped us set up a versioning scheme.

\item Man-Yong Lee sent in a correction to the example code in
Section 2.4.

\item David Mayo pointed out that the word ``unconsciously''
in Chapter 1 needed
to be changed to ``subconsciously''.

\item Chris McAloon sent in several corrections to Sections 3.9 and
3.10.

\item Matthew J. Moelter has been a long-time contributor who sent
in numerous corrections and suggestions to the book.

\item Simon Dicon Montford reported a missing function definition and
several typos in Chapter 3. He also found errors in the {\tt increment}
function in Chapter 13.

\item John Ouzts corrected the definition of ``return value''
in Chapter 3.

\item Kevin Parks sent in valuable comments and suggestions as to how
to improve the distribution of the book.

\item David Pool sent in a typo in the glossary of Chapter 1, as well
as kind words of encouragement.

\item Michael Schmitt sent in a correction to the chapter on files
and exceptions.

\item Robin Shaw pointed out an error in Section 13.1, where the
printTime function was used in an example without being defined.

\item Paul Sleigh found an error in Chapter 7 and a bug in Jonah Cohen's
Perl script that generates HTML from LaTeX.

\item Craig T. Snydal is testing the text in a course at Drew
University. He has contributed several valuable suggestions and corrections.

\item Ian Thomas and his students are using the text in a programming
course. They are the first ones to test the chapters in the latter half
of the book, and they have made numerous corrections and suggestions.

\item Keith Verheyden sent in a correction in Chapter 3.

\item Peter Winstanley let us know about a longstanding error in
our Latin in Chapter 3.

\item Chris Wrobel made corrections to the code in the chapter on
file I/O and exceptions.

\item Moshe Zadka has made invaluable contributions to this project.
In addition to writing the first draft of the chapter on Dictionaries, he
provided continual guidance in the early stages of the book.

\item Christoph Zwerschke sent several corrections and
pedagogic suggestions, and explained the difference between {\em gleich}
and {\em selbe}.

\item James Mayer sent us a whole slew of spelling and
typographical errors, including two in the contributor list.

\item Hayden McAfee caught a potentially confusing inconsistency
between two examples.

\item Angel Arnal is part of an international team of translators
working on the Spanish version of the text. He has also found several
errors in the English version.

\item Tauhidul Hoque and Lex Berezhny created the illustrations
in Chapter 1 and improved many of the other illustrations.

\item Dr. Michele Alzetta caught an error in Chapter 8 and sent
some interesting pedagogic comments and suggestions about Fibonacci
and Old Maid.

\item Andy Mitchell caught a typo in Chapter 1 and a broken example
in Chapter 2.

\item Kalin Harvey suggested a clarification in Chapter 7 and
caught some typos.

\item Christopher P. Smith caught several typos and helped us
update the book for Python 2.2.

\item David Hutchins caught a typo in the Foreword.

\item Gregor Lingl is teaching Python at a high school in Vienna,
Austria. He is working on a German translation of the book,
and he caught a couple of bad errors in Chapter 5.

\item Julie Peters caught a typo in the Preface.

\item Florin Oprina sent in an improvement in {\tt makeTime},
a correction in {\tt printTime}, and a nice typo.

\item D.~J.~Webre suggested a clarification in Chapter 3.

\item Ken found a fistful of errors in Chapters 8, 9 and 11.

\item Ivo Wever caught a typo in Chapter 5 and suggested a clarification
in Chapter 3.

\item Curtis Yanko suggested a clarification in Chapter 2.

\item Ben Logan sent in a number of typos and problems with translating
the book into HTML.

\item Jason Armstrong saw the missing word in Chapter 2.

\item Louis Cordier noticed a spot in Chapter 16 where the code
didn't match the text.

\item Brian Cain suggested several clarifications in Chapters 2 and 3.

\item Rob Black sent in a passel of corrections, including some
changes for Python 2.2.

\item Jean-Philippe Rey at \'{E}cole Centrale
Paris sent a number of patches, including some updates for Python 2.2
and other thoughtful improvements.

\item Jason Mader at George Washington University made a number
of useful suggestions and corrections.

\item Jan Gundtofte-Bruun reminded us that ``a error'' is an error.

\item Abel David and Alexis Dinno reminded us that the plural of
``matrix'' is ``matrices'', not ``matrixes''. This error was in the
book for years, but two readers with the same initials reported it on
the same day. Weird.

\item Charles Thayer encouraged us to get rid of the semi-colons
we had put at the ends of some statements and to clean up our
use of ``argument'' and ``parameter''.

\item Roger Sperberg pointed out a twisted piece of logic in Chapter 3.

\item Sam Bull pointed out a confusing paragraph in Chapter 2.

\item Andrew Cheung pointed out two instances of ``use before def''.

\item C. Corey Capel spotted the missing word in the Third Theorem
of Debugging and a typo in Chapter 4.

\item Alessandra helped clear up some Turtle confusion.

\item Wim Champagne found a brain-o in a dictionary example.

\item Douglas Wright pointed out a problem with floor division in
{\tt arc}.

\item Jared Spindor found some jetsam at the end of a sentence.

\item Lin Peiheng sent a number of very helpful suggestions.

\item Ray Hagtvedt sent in two errors and a not-quite-error.

\item Torsten H\"{u}bsch pointed out an inconsistency in Swampy.

\item Inga Petuhhov corrected an example in Chapter 14.

\item Arne Babenhauserheide sent several helpful corrections.

\item Mark E. Casida is is good at spotting repeated words.

\item Scott Tyler filled in a that was missing. And then sent in
a heap of corrections.

\item Gordon Shephard sent in several corrections, all in separate
emails.

\item Andrew Turner {\tt spot}ted an error in Chapter 8.

\item Adam Hobart fixed a problem with floor division in {\tt arc}.

\item Daryl Hammond and Sarah Zimmerman pointed out that I served
up {\tt math.pi} too early. And Zim spotted a typo.

\item George Sass found a bug in a Debugging section.

\item Brian Bingham suggested Exercise~\ref{exrotatepairs}.

\item Leah Engelbert-Fenton pointed out that I used {\tt tuple}
as a variable name, contrary to my own advice. And then found
a bunch of typos and a ``use before def''.

\item Joe Funke spotted a typo.

\item Chao-chao Chen found an inconsistency in the Fibonacci example.

\item Jeff Paine knows the difference between space and spam.

\item Lubos Pintes sent in a typo.

\item Gregg Lind and Abigail Heithoff suggested Exercise~\ref{checksum}.

\item Max Hailperin has sent in a number of corrections and
suggestions. Max is one of the authors of the extraordinary {\em
Concrete Abstractions}, which you might want to read when you are
done with this book.

\item Chotipat Pornavalai found an error in an error message.

\item Stanislaw Antol sent a list of very helpful suggestions.

\item Eric Pashman sent a number of corrections for Chapters 4--11.

\item Miguel Azevedo found some typos.

\item Jianhua Liu sent in a long list of corrections.

\item Nick King found a missing word.

\item Martin Zuther sent a long list of suggestions.

\item Adam Zimmerman found an inconsistency in my instance
of an ``instance'' and several other errors.

\item Ratnakar Tiwari suggested a footnote explaining degenerate
triangles.

\item Anurag Goel suggested another solution for \verb"is_abecedarian"
and sent some additional corrections. And he knows how to
spell Jane Austen.

\item Kelli Kratzer spotted one of the typos.

\item Mark Griffiths pointed out a confusing example in Chapter 3.

\item Roydan Ongie found an error in my Newton's method.

\item Patryk Wolowiec helped me with a problem in the HTML version.

\item Mark Chonofsky told me about a new keyword in Python 3.

\item Russell Coleman helped me with my geometry.

\item Nam Nguyen found a typo and pointed out that I used the Decorator
pattern but didn't mention it by name.

\item St\'{e}phane Morin sent in several corrections and suggestions.

\item Paul Stoop corrected a typo in \verb+uses_only+.

\item Eric Bronner pointed out a confusion in the discussion of the
order of operations.

\item Alexandros Gezerlis set a new standard for the number and
quality of suggestions he submitted. We are deeply grateful!

\item Gray Thomas knows his right from his left.

\item Giovanni Escobar Sosa sent a long list of corrections and
suggestions.

\item Daniel Neilson corrected an error about the order of operations.

\item Will McGinnis pointed out that {\tt polyline} was defined
differently in two places.

\item Frank Hecker pointed out an exercise that was under-specified, and
some broken links.

\item Animesh B helped me clean up a confusing example.

\item Martin Caspersen found two round-off errors.

\item Gregor Ulm sent several corrections and suggestions.

\item Dimitrios Tsirigkas suggested I clarify an exercise.

\item Carlos Tafur sent a page of corrections and suggestions.

\item Martin Nordsletten found a bug in an exercise solution.

\item Sven Hoexter pointed out that a variable named {\tt input}
shadows a built-in function.

\item Stephen Gregory pointed out the problem with {\tt cmp}
in Python 3.

\item Ishwar Bhat corrected my statement of Fermat's last theorem.

\item Andrea Zanella translated the book into Italian, and sent a
number of corrections along the way.

\item Many, many thanks to Melissa Lewis and Luciano Ramalho for
excellent comments and suggestions on the second edition.

\item Thanks to Harry Percival from PythonAnywhere for his help
getting people started running Python in a browser.

\item Xavier Van Aubel made several useful corrections in the second
edition.

\item William Murray corrected my definition of floor division.

\item Per Starb{\"a}ck brought me up to date on universal newlines in Python 3.

\item Laurent Rosenfeld and Mihaela Rotaru translated this book into French. Along the way, they sent many corrections and suggestions.

% ENDCONTRIB

In addition, people who spotted typos or made corrections include
Czeslaw Czapla, Dale Wilson, Francesco Carlo Cimini,
Richard Fursa, Brian McGhie, Lokesh Kumar Makani, Matthew Shultz, Viet
Le, Victor Simeone, Lars O.D. Christensen, Swarup Sahoo, Alix Etienne,
Kuang He, Wei Huang, Karen Barber, and Eric Ransom.




\end{itemize}

\normalsize
\clearemptydoublepage

% TABLE OF CONTENTS
\begin{latexonly}

\tableofcontents

\clearemptydoublepage

\end{latexonly}

% START THE BOOK
\mainmatter

\chapter{The way of the program}

The goal of this book is to teach you to think like a computer
scientist. This way of thinking combines some of the best features of
mathematics, engineering, and natural science. Like mathematicians,
computer scientists use formal languages to denote ideas (specifically
computations). Like engineers, they design things, assembling
components into systems and evaluating tradeoffs among alternatives.
Like scientists, they observe the behavior of complex systems, form
hypotheses, and test predictions. \index{problem solving}

The single most important skill for a computer scientist is {\bf
problem solving}. Problem solving means the ability to formulate
problems, think creatively about solutions, and express a solution
clearly and accurately. As it turns out, the process of learning to
program is an excellent opportunity to practice problem-solving
skills. That's why this chapter is called, ``The way of the
program''.

On one level, you will be learning to program, a useful skill by
itself. On another level, you will use programming as a means to an
end. As we go along, that end will become clearer.


\section{What is a program?}

A {\bf program} is a sequence of instructions that specifies how to
perform a computation. The computation might be something
mathematical, such as solving a system of equations or finding the
roots of a polynomial, but it can also be a symbolic computation, such
as searching and replacing text in a document or something
graphical, like processing an image or playing a video.
\index{program}

The details look different in different languages, but a few basic
instructions appear in just about every language:

\begin{description}

\item[input:] Get data from the keyboard, a file, the network, or some
other device.

\item[output:] Display data on the screen, save it in a
file, send it over the network, etc.

\item[math:] Perform basic mathematical operations like addition and
multiplication.

\item[conditional execution:] Check for certain conditions and
run the appropriate code.

\item[repetition:] Perform some action repeatedly, usually with
some variation.

\end{description}

Believe it or not, that's pretty much all there is to it. Every
program you've ever used, no matter how complicated, is made up of
instructions that look pretty much like these. So you can think of
programming as the process of breaking a large, complex task
into smaller and smaller subtasks until the subtasks are
simple enough to be performed with one of these basic instructions.
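
As a rough sketch of how these basic instructions fit together, here is
a short program that uses each of them. It is an illustration only, not
one of the book's examples: the names {\tt numbers}, {\tt total} and
{\tt n} are made up, and the data is written into the program as a
stand-in for real input from the keyboard or a file.

\begin{verbatim}
# input: a stand-in for data from the keyboard or a file
numbers = [3, 8, 5]

total = 0
for n in numbers:                  # repetition
    total = total + n              # math

if total > 10:                     # conditional execution
    print('big total:', total)     # output
else:
    print('small total:', total)   # output
\end{verbatim}
%
With these numbers the total is 16, so in Python 3 the program displays
{\tt big total: 16}. Each line uses features that are explained in
later chapters, so the details are not important yet.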


\section{Running Python}

One of the challenges of getting started with Python is that you
might have to install Python and related software on your computer.
If you are familiar with your operating system, and especially
if you are comfortable with the command-line interface, you will
have no trouble installing Python. But for beginners, it can be
painful to learn about system administration and programming at the
same time.
\index{running Python}
\index{Python!running}

To avoid that problem, I recommend that you start out running Python
in a browser. Later, when you are comfortable with Python, I'll
make suggestions for installing Python on your computer.
\index{Python in a browser}

There are a number of web pages you can use to run Python. If you
already have a favorite, go ahead and use it. Otherwise I recommend
PythonAnywhere. I provide detailed instructions for getting started
at \url{http://tinyurl.com/thinkpython2e}.
\index{PythonAnywhere}

There are two versions of Python, called Python 2 and Python 3.
They are very similar, so if you learn one, it is easy to switch
to the other. In fact, there are only a few differences you will
encounter as a beginner.
This book is written for Python 3, but I include some notes
about Python 2.
\index{Python 2}

The Python {\bf interpreter} is a program that reads and executes
Python code. Depending on your environment, you might start the
interpreter by clicking on an icon, or by typing {\tt python} on
a command line.
When it starts, you should see output like this:
\index{interpreter}

\begin{verbatim}
Python 3.4.0 (default, Jun 19 2015, 14:20:21)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
\end{verbatim}
%
The first three lines contain information about the interpreter
and the operating system it's running on, so it might be different for
you. But you should check that the version number, which is
{\tt 3.4.0} in this example, begins with 3, which indicates that
you are running Python 3. If it begins with 2, you are running
(you guessed it) Python 2.

The last line is a {\bf prompt} that indicates that the interpreter is
ready for you to enter code.
If you type a line of code and hit Enter, the interpreter displays the
result:
\index{prompt}

\begin{verbatim}
>>> 1 + 1
2
\end{verbatim}
%
Now you're ready to get started.
From here on, I assume that you know how to start the Python
interpreter and run code.
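
If you would rather check the version from inside the interpreter than
read it from the startup message, the following sketch shows one way to
do it. It assumes the standard {\tt sys} module, which is not introduced
in this chapter, and the name {\tt major} is made up for the example.

\begin{verbatim}
import sys

# 3 means Python 3; 2 means Python 2
major = sys.version_info.major
print(major)
\end{verbatim}
%
If the number displayed is 3, you are running Python 3.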


\section{The first program}
\label{hello}
\index{Hello, World}

Traditionally, the first program you write in a new language
is called ``Hello, World!'' because all it does is display the
words ``Hello, World!''. In Python, it looks like this:

\begin{verbatim}
>>> print('Hello, World!')
\end{verbatim}
%
This is an example of a {\bf print statement}, although it
doesn't actually print anything on paper. It displays a result on the
screen. In this case, the result is the words

\begin{verbatim}
Hello, World!
\end{verbatim}
%
The quotation marks in the program mark the beginning and end
of the text to be displayed; they don't appear in the result.
\index{quotation mark}
\index{print statement}
\index{statement!print}

The parentheses indicate that {\tt print} is a function. We'll get
to functions in Chapter~\ref{funcchap}.
\index{function} \index{print function}

In Python 2, the print statement is slightly different; it is not
a function, so it doesn't use parentheses.
\index{Python 2}

\begin{verbatim}
>>> print 'Hello, World!'
\end{verbatim}
%
This distinction will make more sense soon, but that's enough to
get started.
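
To confirm that the quotation marks are not part of the text itself, you
can count the characters. This sketch is not from the original examples:
it assumes the built-in function {\tt len}, which comes up later in the
book, and uses a made-up name, {\tt text}.

\begin{verbatim}
text = 'Hello, World!'
print(text)         # displays the words without the quotation marks
print(len(text))    # 13 characters; the quotation marks are not counted
\end{verbatim}
%
Each {\tt print} here displays a single value, so the parentheses
work in both Python 2 and Python 3.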
925
926
927
\section{Arithmetic operators}
928
\index{operator!arithmetic}
929
\index{arithmetic operator}
930
931
After ``Hello, World'', the next step is arithmetic. Python provides
932
{\bf operators}, which are special symbols that represent computations
933
like addition and multiplication.
934
935
The operators {\tt +}, {\tt -}, and {\tt *} perform addition,
936
subtraction, and multiplication, as in the following examples:
937
938
\begin{verbatim}
939
>>> 40 + 2
940
42
941
>>> 43 - 1
942
42
943
>>> 6 * 7
944
42
945
\end{verbatim}
946
%
947
The operator {\tt /} performs division:
948
949
\begin{verbatim}
950
>>> 84 / 2
951
42.0
952
\end{verbatim}
953
%
954
You might wonder why the result is {\tt 42.0} instead of {\tt 42}.
955
I'll explain in the next section.
956
957
Finally, the operator {\tt **} performs exponentiation; that is,
958
it raises a number to a power:
959
960
\begin{verbatim}
>>> 6**2 + 6
42
\end{verbatim}
964
%
965
In some other languages, \verb"^" is used for exponentiation, but
966
in Python it is a bitwise operator called XOR. If you are not
967
familiar with bitwise operators, the result will surprise you:
968
969
\begin{verbatim}
>>> 6 ^ 2
4
\end{verbatim}
973
%
974
I won't cover
975
bitwise operators in this book, but you can read about
976
them at \url{http://wiki.python.org/moin/BitwiseOperators}.
977
\index{bitwise operator}
978
\index{operator!bitwise}
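
To see these operators side by side, here is a small sketch written as
a script rather than at the prompt. The variable names are
illustrative assumptions, and {\tt bin} is a built-in function, not
otherwise used in this chapter, that displays an integer in binary.
It may help explain why \verb"6 ^ 2" is {\tt 4}.

```python
# The arithmetic operators from this section.
total = 40 + 2        # addition
difference = 43 - 1   # subtraction
product = 6 * 7       # multiplication
quotient = 84 / 2     # division yields a floating-point result in Python 3
power = 6**2 + 6      # exponentiation, then addition

# ^ is bitwise XOR: each bit of the result is 1 where the
# corresponding bits of the two operands differ.
#   6 is 110 in binary
#   2 is 010 in binary
#   4 is 100 in binary
xor_result = 6 ^ 2

print(total, difference, product, quotient, power)
print(bin(6), bin(2), bin(xor_result))
```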
979
980
981
\section{Values and types}
982
\index{value}
983
\index{type}
984
\index{string}
985
986
A {\bf value} is one of the basic things a program works with, like a
987
letter or a number. Some values we have seen so far are {\tt 2},
988
{\tt 42.0}, and \verb"'Hello, World!'".
989
990
These values belong to different {\bf types}:
991
{\tt 2} is an {\bf integer}, {\tt 42.0} is a {\bf floating-point number},
992
and \verb"'Hello, World!'" is a {\bf string},
993
so-called because the letters it contains are strung together.
994
\index{integer}
995
\index{floating-point}
996
997
If you are not sure what type a value has, the interpreter can
998
tell you:
999
1000
\begin{verbatim}
>>> type(2)
<class 'int'>
>>> type(42.0)
<class 'float'>
>>> type('Hello, World!')
<class 'str'>
\end{verbatim}
1008
%
1009
In these results, the word ``class'' is used in the sense of
1010
a category; a type is a category of values.
1011
\index{class}
1012
1013
Not surprisingly, integers belong to the type {\tt int},
1014
strings belong to {\tt str} and floating-point
1015
numbers belong to {\tt float}.
1016
\index{type}
1017
\index{string type}
1018
\index{type!str}
1019
\index{int type}
1020
\index{type!int}
1021
\index{float type}
1022
\index{type!float}
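
As a quick check, the sketch below applies {\tt type} to two results
from the previous section. It assumes Python 3, where {\tt /} yields a
floating-point number even when both operands are integers; the
variable names are only for illustration.

```python
sum_result = 40 + 2    # both operands are integers, and so is the result
quotient = 84 / 2      # in Python 3, / produces a float

print(type(sum_result))
print(type(quotient))
```

This is why the division example displayed {\tt 42.0} rather than
{\tt 42}: the result belongs to the type {\tt float}.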
1023
1024
What about values like \verb"'2'" and \verb"'42.0'"?
1025
They look like numbers, but they are in quotation marks like
1026
strings.
1027
\index{quotation mark}
1028
1029
\begin{verbatim}
>>> type('2')
<class 'str'>
>>> type('42.0')
<class 'str'>
\end{verbatim}
1035
%
1036
They're strings.
1037
1038
When you type a large integer, you might be tempted to use commas
between groups of digits, as in {\tt 1,000,000}. This is not a
legal {\em integer} in Python, but it is legal code:
1041
1042
\begin{verbatim}
>>> 1,000,000
(1, 0, 0)
\end{verbatim}
1046
%
1047
That's not what we expected at all! Python interprets {\tt
1048
1,000,000} as a comma-separated sequence of integers. We'll learn
1049
more about this kind of sequence later.
1050
\index{sequence}
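
If the goal is a readable large integer, one option, assuming
Python 3.6 or later, is to put underscores between groups of digits.
The sketch below contrasts the two spellings; the variable names are
hypothetical.

```python
with_commas = 1,000,000        # three integers separated by commas
with_underscores = 1_000_000   # one integer; the underscores are ignored

print(with_commas)
print(with_underscores)
```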
1051
1052
%This is the first example we have seen of a semantic error: the code
1053
%runs without producing an error message, but it doesn't do the
1054
%``right'' thing.
1055
%\index{semantic error}
1056
%\index{error!semantic}
1057
%\index{error message}
1058
% TODO: use this as an example of a semantic error later
1059
1060
1061
1062
\section{Formal and natural languages}
1063
\index{formal language}
1064
\index{natural language}
1065
\index{language!formal}
1066
\index{language!natural}
1067
1068
{\bf Natural languages} are the languages people speak,
1069
such as English, Spanish, and French. They were not designed
1070
by people (although people try to impose some order on them);
1071
they evolved naturally.
1072
1073
{\bf Formal languages} are languages that are designed by people for
1074
specific applications. For example, the notation that mathematicians
1075
use is a formal language that is particularly good at denoting
1076
relationships among numbers and symbols. Chemists use a formal
1077
language to represent the chemical structure of molecules. And
1078
most importantly:
1079
1080
\begin{quote}
1081
{\bf Programming languages are formal languages that have been
1082
designed to express computations.}
1083
\end{quote}
1084
1085
Formal languages tend to have strict {\bf syntax} rules that
1086
govern the structure of statements.
1087
For example, in mathematics the statement
1088
$3 + 3 = 6$ has correct syntax, but
1089
$3 + = 3 \$ 6$ does not. In chemistry
1090
$H_2O$ is a syntactically correct formula, but $_2Zz$ is not.
1091
\index{syntax}
1092
1093
Syntax rules come in two flavors, pertaining to {\bf tokens} and
structure. Tokens are the basic elements of the language, such as
words, numbers, and chemical elements. One of the problems with
$3 + = 3 \$ 6$ is that \( \$ \) is not a legal token in mathematics
(at least as far as I know). Similarly, $_2Zz$ is not legal because
there is no element with the abbreviation $Zz$.
1099
\index{token}
1100
\index{structure}
1101
1102
The second type of syntax rule pertains to the way tokens are
1103
combined. The equation $3 +/ 3$ is illegal because even though $+$
1104
and $/$ are legal tokens, you can't have one right after the other.
1105
Similarly, in a chemical formula the subscript comes after the element
1106
name, not before.
1107
1108
This is @ well-structured Engli\$h
1109
sentence with invalid t*kens in it. This sentence all valid tokens
1110
has, but invalid structure with.
1111
1112
When you read a sentence in English or a statement in a formal
1113
language, you have to figure out the structure
1114
(although in a natural language you do this subconsciously). This
1115
process is called {\bf parsing}.
1116
\index{parse}
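
Python's own parser can illustrate the two kinds of rule. The sketch
below is only an illustration: \verb"has_valid_syntax" is a
hypothetical helper, it relies on the built-in function {\tt compile}
and on features ({\tt def}, {\tt try}) that come later in the book,
and the mathematical $=$ is written as {\tt ==}, which is how Python
spells a comparison.

```python
def has_valid_syntax(source):
    """Return True if Python can parse source as an expression."""
    try:
        compile(source, '<example>', 'eval')
    except SyntaxError:
        return False
    return True

print(has_valid_syntax('3 + 3 == 6'))    # legal tokens, legal structure
print(has_valid_syntax('3 + = 3 $ 6'))   # $ is not a legal token
print(has_valid_syntax('3 +/ 3'))        # legal tokens, illegal structure
```

The first call should report {\tt True} and the other two
{\tt False}, because parsing fails before any arithmetic happens.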
1117
1118
Although formal and natural languages have many features in
1119
common---tokens, structure, and syntax---there are some
1120
differences:
1121
\index{ambiguity}
1122
\index{redundancy}
1123
\index{literalness}
1124
1125
\begin{description}
1126
1127
\item[ambiguity:] Natural languages are full of ambiguity, which
1128
people deal with by using contextual clues and other information.
1129
Formal languages are designed to be nearly or completely unambiguous,
1130
which means that any statement has exactly one meaning,
1131
regardless of context.
1132
1133
\item[redundancy:] In order to make up for ambiguity and reduce
1134
misunderstandings, natural languages employ lots of
1135
redundancy. As a result, they are often verbose. Formal languages
1136
are less redundant and more concise.
1137
1138
\item[literalness:] Natural languages are full of idiom and metaphor.
1139
If I say, ``The penny dropped'', there is probably no penny and
1140
nothing dropping (this idiom means that someone understood something
1141
after a period of confusion). Formal languages
1142
mean exactly what they say.
1143
1144
\end{description}
1145
1146
Because we all grow up speaking natural languages, it is sometimes
1147
hard to adjust to formal languages. The difference between formal and
1148
natural language is like the difference between poetry and prose, but
1149
more so: \index{poetry} \index{prose}
1150
1151
\begin{description}
1152
1153
\item[Poetry:] Words are used for their sounds as well as for
1154
their meaning, and the whole poem together creates an effect or
1155
emotional response. Ambiguity is not only common but often
1156
deliberate.
1157
1158
\item[Prose:] The literal meaning of words is more important,
1159
and the structure contributes more meaning. Prose is more amenable to
1160
analysis than poetry but still often ambiguous.
1161
1162
\item[Programs:] The meaning of a computer program is unambiguous
1163
and literal, and can be understood entirely by analysis of the
1164
tokens and structure.
1165
1166
\end{description}
1167
1168
Formal languages are more dense
1169
than natural languages, so it takes longer to read them. Also, the
1170
structure is important, so it is not always best to read
1171
from top to bottom, left to right. Instead, learn to parse the
1172
program in your head, identifying the tokens and interpreting the
1173
structure. Finally, the details matter. Small errors in
1174
spelling and punctuation, which you can get away
1175
with in natural languages, can make a big difference in a formal
1176
language.
1177
1178
1179
\section{Debugging}
1180
\index{debugging}
1181
1182
Programmers make mistakes. For whimsical reasons, programming errors
1183
are called {\bf bugs} and the process of tracking them down is called
1184
{\bf debugging}.
1185
\index{debugging}
1186
\index{bug}
1187
1188
Programming, and especially debugging, sometimes brings out strong
1189
emotions. If you are struggling with a difficult bug, you might
1190
feel angry, despondent, or embarrassed.
1191
1192
There is evidence that people naturally respond to computers as if
1193
they were people. When they work well, we think
1194
of them as teammates, and when they are obstinate or rude, we
1195
respond to them the same way we respond to rude,
1196
obstinate people (Reeves and Nass, {\it The Media
1197
Equation: How People Treat Computers, Television, and New Media
1198
Like Real People and Places}).
1199
\index{debugging!emotional response}
1200
\index{emotional debugging}
1201
1202
Preparing for these reactions might help you deal with them.
1203
One approach is to think of the computer as an employee with
1204
certain strengths, like speed and precision, and
1205
particular weaknesses, like lack of empathy and inability
1206
to grasp the big picture.
1207
1208
Your job is to be a good manager: find ways to take advantage
1209
of the strengths and mitigate the weaknesses. And find ways
1210
to use your emotions to engage with the problem,
1211
without letting your reactions interfere with your ability
1212
to work effectively.
1213
1214
Learning to debug can be frustrating, but it is a valuable skill
1215
that is useful for many activities beyond programming. At the
1216
end of each chapter there is a section, like this one,
1217
with my suggestions for debugging. I hope they help!
1218
1219
1220
\section{Glossary}
1221
1222
\begin{description}
1223
1224
\item[problem solving:] The process of formulating a problem, finding
1225
a solution, and expressing it.
1226
\index{problem solving}
1227
1228
\item[high-level language:] A programming language like Python that
1229
is designed to be easy for humans to read and write.
1230
\index{high-level language}
1231
1232
\item[low-level language:] A programming language that is designed
1233
to be easy for a computer to run; also called ``machine language'' or
1234
``assembly language''.
1235
\index{low-level language}
1236
1237
\item[portability:] A property of a program that can run on more
1238
than one kind of computer.
1239
\index{portability}
1240
1241
\item[interpreter:] A program that reads another program and executes
it.
1243
\index{interpret}
1244
1245
\item[prompt:] Characters displayed by the interpreter to indicate
1246
that it is ready to take input from the user.
1247
\index{prompt}
1248
1249
\item[program:] A set of instructions that specifies a computation.
1250
\index{program}
1251
1252
\item[print statement:] An instruction that causes the Python
1253
interpreter to display a value on the screen.
1254
\index{print statement}
1255
\index{statement!print}
1256
1257
\item[operator:] A special symbol that represents a simple computation like
1258
addition, multiplication, or string concatenation.
1259
\index{operator}
1260
1261
\item[value:] One of the basic units of data, like a number or string,
1262
that a program manipulates.
1263
\index{value}
1264
1265
\item[type:] A category of values. The types we have seen so far
1266
are integers (type {\tt int}), floating-point numbers (type {\tt
1267
float}), and strings (type {\tt str}).
1268
\index{type}
1269
1270
\item[integer:] A type that represents whole numbers.
1271
\index{integer}
1272
1273
\item[floating-point:] A type that represents numbers with fractional
1274
parts.
1275
\index{floating-point}
1276
1277
\item[string:] A type that represents sequences of characters.
1278
\index{string}
1279
1280
\item[natural language:] Any one of the languages that people speak that
1281
evolved naturally.
1282
\index{natural language}
1283
1284
\item[formal language:] Any one of the languages that people have designed
1285
for specific purposes, such as representing mathematical ideas or
1286
computer programs; all programming languages are formal languages.
1287
\index{formal language}
1288
1289
\item[token:] One of the basic elements of the syntactic structure of
1290
a program, analogous to a word in a natural language.
1291
\index{token}
1292
1293
\item[syntax:] The rules that govern the structure of a program.
1294
\index{syntax}
1295
1296
\item[parse:] To examine a program and analyze the syntactic structure.
1297
\index{parse}
1298
1299
\item[bug:] An error in a program.
1300
\index{bug}
1301
1302
\item[debugging:] The process of finding and correcting bugs.
1303
\index{debugging}
1304
1305
\end{description}
1306
1307
1308
\section{Exercises}
1309
1310
\begin{exercise}
1311
1312
It is a good idea to read this book in front of a computer so you can
1313
try out the examples as you go.
1314
1315
Whenever you are experimenting with a new feature, you should try
to make mistakes. For example, in the ``Hello, World!'' program,
what happens if you leave out one of the quotation marks? What
if you leave out both? What if you spell {\tt print} wrong?
1319
\index{error message}
1320
1321
This kind of experiment helps you remember what you read; it also
1322
helps when you are programming, because you get to know what the error
1323
messages mean. It is better to make mistakes now and on purpose than
1324
later and accidentally.
1325
1326
\begin{enumerate}
1327
1328
\item In a print statement, what happens if you leave out one
1329
of the parentheses, or both?
1330
1331
\item If you are trying to print a string, what happens if you
1332
leave out one of the quotation marks, or both?
1333
1334
\item You can use a minus sign to make a negative number like
1335
{\tt -2}. What happens if you put a plus sign before a number?
1336
What about {\tt 2++2}?
1337
1338
\item In math notation, leading zeros are ok, as in {\tt 09}.
1339
What happens if you try this in Python? What about {\tt 011}?
1340
1341
\item What happens if you have two values with no operator
1342
between them?
1343
1344
\end{enumerate}
1345
1346
\end{exercise}
1347
1348
1349
1350
\begin{exercise}
1351
1352
Start the Python interpreter and use it as a calculator.
1353
1354
\begin{enumerate}
1355
1356
\item How many seconds are there in 42 minutes 42 seconds?
1357
1358
\item How many miles are there in 10 kilometers? Hint: there are 1.61
1359
kilometers in a mile.
1360
1361
\item If you run a 10 kilometer race in 42 minutes 42 seconds, what is
1362
your average pace (time per mile in minutes and seconds)? What is
1363
your average speed in miles per hour?
1364
1365
\index{calculator}
1366
\index{running pace}
1367
1368
\end{enumerate}
1369
1370
\end{exercise}
1371
1372
1373
1374
1375
\chapter{Variables, expressions and statements}
1376
1377
One of the most powerful features of a programming language is the
1378
ability to manipulate {\bf variables}. A variable is a name that
1379
refers to a value.
1380
\index{variable}
1381
1382
1383
\section{Assignment statements}
1384
\label{variables}
1385
\index{assignment statement}
1386
\index{statement!assignment}
1387
1388
An {\bf assignment statement} creates a new variable and gives
1389
it a value:
1390
1391
\begin{verbatim}
>>> message = 'And now for something completely different'
>>> n = 17
>>> pi = 3.1415926535897932
\end{verbatim}
1396
%
1397
This example makes three assignments. The first assigns a string
1398
to a new variable named {\tt message};
1399
the second gives the integer {\tt 17} to {\tt n}; the third
1400
assigns the (approximate) value of $\pi$ to {\tt pi}.
1401
\index{state diagram}
1402
\index{diagram!state}
1403
1404
A common way to represent variables on paper is to write the name with
1405
an arrow pointing to its value. This kind of figure is
1406
called a {\bf state diagram} because it shows what state each of the
1407
variables is in (think of it as the variable's state of mind).
1408
Figure~\ref{fig.state2} shows the result of the previous example.
1409
1410
\begin{figure}
1411
\centerline
1412
{\includegraphics[scale=0.8]{figs/state2.pdf}}
1413
\caption{State diagram.}
1414
\label{fig.state2}
1415
\end{figure}
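
The same state can be displayed from a script. This sketch simply
prints each name next to its value, in the spirit of the state
diagram; the arrow text is an assumption made for the example, not
special notation.

```python
message = 'And now for something completely different'
n = 17
pi = 3.1415926535897932

# Print each name with an arrow to its value, like the state diagram.
print('message ->', message)
print('n ->', n)
print('pi ->', pi)    # a float keeps only about 16 significant digits
```

The last line is a reminder that the value of {\tt pi} is approximate:
the final digit typed in the assignment does not survive.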
1416
1417
1418
1419
\section{Variable names}
1420
\index{variable}
1421
1422
Programmers generally choose names for their variables that
1423
are meaningful---they document what the variable is used for.
1424
1425
Variable names can be as long as you like. They can contain
both letters and numbers, but they can't begin with a number.
It is legal to use uppercase letters, but it is conventional
to use only lowercase for variable names.
1429
1430
The underscore character, \verb"_", can appear in a name.
1431
It is often used in names with multiple words, such as
1432
\verb"your_name" or \verb"airspeed_of_unladen_swallow".
1433
\index{underscore character}
1434
1435
If you give a variable an illegal name, you get a syntax error:
1436
1437
\begin{verbatim}
>>> 76trombones = 'big parade'
SyntaxError: invalid syntax
>>> more@ = 1000000
SyntaxError: invalid syntax
>>> class = 'Advanced Theoretical Zymurgy'
SyntaxError: invalid syntax
\end{verbatim}
1445
%
1446
{\tt 76trombones} is illegal because it begins with a number.
1447
{\tt more@} is illegal because it contains an illegal character, {\tt
1448
@}. But what's wrong with {\tt class}?
1449
1450
It turns out that {\tt class} is one of Python's {\bf keywords}. The
1451
interpreter uses keywords to recognize the structure of the program,
1452
and they cannot be used as variable names.
1453
\index{keyword}
1454
1455
Python 3 has these keywords (versions 3.7 and later also reserve {\tt async} and {\tt await}):
1456
1457
\begin{verbatim}
False      class      finally    is         return
None       continue   for        lambda     try
True       def        from       nonlocal   while
and        del        global     not        with
as         elif       if         or         yield
assert     else       import     pass
break      except     in         raise
\end{verbatim}
1466
%
1467
You don't have to memorize this list. In most development environments,
1468
keywords are displayed in a different color; if you try to use one
1469
as a variable name, you'll know.
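
If you are not using an editor that colors keywords, the standard
{\tt keyword} module can answer the question. The sketch below is one
way to check a few names; the list of candidate names is made up for
the example, and the {\tt for} loop is a feature covered later in the
book.

```python
import keyword

candidate_names = ['message', 'class', 'airspeed_of_unladen_swallow', 'yield']

for name in candidate_names:
    # keyword.iskeyword returns True if the name is reserved
    print(name, keyword.iskeyword(name))

# The complete list of keywords for the interpreter you are running
print(keyword.kwlist)
```

Because {\tt keyword.kwlist} comes from the running interpreter, it
reflects the keywords of your version of Python rather than a printed
list.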
1470
1471
1472
\section{Expressions and statements}
1473
1474
An {\bf expression} is a combination of values, variables, and operators.
1475
A value all by itself is considered an expression, and so is
1476
a variable, so the following are all legal expressions:
1477
\index{expression}
1478
1479
\begin{verbatim}
>>> 42
42
>>> n
17
>>> n + 25
42
\end{verbatim}
1487
%
1488
When you type an expression at the prompt, the interpreter
1489
{\bf evaluates} it, which means that it finds the value of
1490
the expression.
1491
In this example, {\tt n} has the value 17 and
1492
{\tt n + 25} has the value 42.
1493
\index{evaluate}
1494
1495
A {\bf statement} is a unit of code that has an effect, like
1496
creating a variable or displaying a value.
1497
\index{statement}
1498
1499
\begin{verbatim}
>>> n = 17
>>> print(n)
\end{verbatim}
1503
%
1504
The first line is an assignment statement that gives a value to
1505
{\tt n}. The second line is a print statement that displays the
1506
value of {\tt n}.
1507
1508
When you type a statement, the interpreter {\bf executes} it,
1509
which means that it does whatever the statement says. In general,
1510
statements don't have values.
1511
\index{execute}
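
To make the distinction concrete, here is a short sketch in script
form. The name {\tt total} is an assumption introduced for this
example.

```python
n = 17            # assignment statement: creates n
total = n + 25    # the expression n + 25 is evaluated, then assigned
print(total)      # print statement: displays the value of total
```

Each line is a statement that is executed; only {\tt n + 25} and the
values and variables inside it are expressions that are evaluated.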
1512
1513
1514
\section{Script mode}
1515
1516
So far we have run Python in {\bf interactive mode}, which
1517
means that you interact directly with the interpreter.
1518
Interactive mode is a good way to get started,
1519
but if you are working with more than a few lines of code, it can be
1520
clumsy.
1521
\index{interactive mode}
1522
1523
The alternative is to save code in a file called a {\bf script} and
1524
then run the interpreter in {\bf script mode} to execute the script. By
1525
convention, Python scripts have names that end with {\tt .py}.
1526
\index{script}
1527
\index{script mode}
1528
1529
If you know how to create and run a script on your computer, you
1530
are ready to go. Otherwise I recommend using PythonAnywhere again.
1531
I have posted instructions for running in script mode at
1532
\url{http://tinyurl.com/thinkpython2e}.
1533
1534
Because Python provides both modes,
1535
you can test bits of code in interactive mode before you put them
1536
in a script. But there are differences between interactive mode
1537
and script mode that can be confusing.
1538
\index{interactive mode}
1539
\index{script mode}
1540
1541
For example, if you are using Python as a calculator, you might type
1542
1543
\begin{verbatim}
>>> miles = 26.2
>>> miles * 1.61
42.182
\end{verbatim}
1548
1549
The first line assigns a value to {\tt miles}, but it has no visible
1550
effect. The second line is an expression, so the
1551
interpreter evaluates it and displays the result. It turns out that a
1552
marathon is about 42 kilometers.
1553
1554
But if you type the same code into a script and run it, you get no
1555
output at all.
1556
In script mode an expression, all by itself, has no
1557
visible effect. Python evaluates the expression, but it doesn't
1558
display the result.
1559
To display the result, you need a {\tt print} statement like this:
1560
1561
\begin{verbatim}
miles = 26.2
print(miles * 1.61)
\end{verbatim}
1565
1566
This behavior can be confusing at first.
1567
To check your understanding, type the following statements in the
1568
Python interpreter and see what they do:
1569
1570
\begin{verbatim}
5
x = 5
x + 1
\end{verbatim}
1575
1576
Now put the same statements in a script and run it. What
1577
is the output? Modify the script by transforming each
1578
expression into a print statement and then run it again.
1579
1580
1581
\section{Order of operations}
1582
\index{order of operations}
1583
\index{PEMDAS}
1584
1585
When an expression contains more than one operator, the order of
1586
evaluation depends on the {\bf order of operations}. For
1587
mathematical operators, Python follows mathematical convention.
1588
The acronym {\bf PEMDAS} is a useful way to
1589
remember the rules:
1590
1591
\begin{itemize}
1592
1593
\item {\bf P}arentheses have the highest precedence and can be used
1594
to force an expression to evaluate in the order you want. Since
1595
expressions in parentheses are evaluated first, {\tt 2 * (3-1)} is 4,
1596
and {\tt (1+1)**(5-2)} is 8. You can also use parentheses to make an
1597
expression easier to read, as in {\tt (minute * 100) / 60}, even
1598
if it doesn't change the result.
1599
1600
\item {\bf E}xponentiation has the next highest precedence, so
1601
{\tt 1 + 2**3} is 9, not 27, and {\tt 2 * 3**2} is 18, not 36.
1602
1603
\item {\bf M}ultiplication and {\bf D}ivision have higher precedence
than {\bf A}ddition and {\bf S}ubtraction. So {\tt 2*3-1} is 5, not
4, and {\tt 6+4/2} is 8.0, not 5.0.
1606
1607
\item Operators with the same precedence are evaluated from left to
right (except exponentiation, which is evaluated from right to left).
So in the expression {\tt degrees / 2 * pi}, the division happens
first and the result is multiplied by {\tt pi}. To divide by $2 \pi$,
you can use parentheses or write {\tt degrees / 2 / pi}.
1612
1613
\end{itemize}
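
The examples in this list can be checked directly. The following
sketch assumes Python 3 (so {\tt /} yields a floating-point number)
and uses a made-up value for {\tt degrees}; the other names are also
only illustrative.

```python
pi = 3.1415926535897932
degrees = 360    # hypothetical value chosen for this example

print(2 * (3-1))       # parentheses first: 4
print((1+1)**(5-2))    # 8
print(1 + 2**3)        # exponentiation before addition: 9
print(2 * 3**2)        # exponentiation before multiplication: 18
print(2*3-1)           # multiplication before subtraction: 5
print(6+4/2)           # division before addition: 8.0

# Same precedence, left to right: divide by 2, then multiply by pi.
half_then_times_pi = degrees / 2 * pi

# Either of these divides by 2 pi instead.
with_parentheses = degrees / (2 * pi)
with_two_divisions = degrees / 2 / pi

# Exponentiation groups from right to left.
right_to_left = 2**3**2    # same as 2**(3**2)
print(right_to_left)
```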
1614
1615
I don't work very hard to remember the precedence of
1616
operators. If I can't tell by looking at the expression, I use
1617
parentheses to make it obvious.
1618
1619
1620
\section{String operations}
1621
\index{string!operation}
1622
\index{operator!string}
1623
1624
In general, you can't perform mathematical operations on strings, even
1625
if the strings look like numbers, so the following are illegal:
1626
1627
\begin{verbatim}
'chinese'-'food'    'eggs'/'easy'    'third'*'a charm'
\end{verbatim}
1630
%
1631
But there are two exceptions, {\tt +} and {\tt *}.
1632
1633
The {\tt +} operator performs {\bf string concatenation}, which means
1634
it joins the strings by linking them end-to-end. For example:
1635
\index{concatenation}
1636
1637
\begin{verbatim}
>>> first = 'throat'
>>> second = 'warbler'
>>> first + second
'throatwarbler'
\end{verbatim}
1643
%
1644
The {\tt *} operator also works on strings; it performs repetition.
1645
For example, \verb"'Spam'*3" is \verb"'SpamSpamSpam'". If one of the
1646
values is a string, the other has to be an integer.
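
Here is a brief sketch of both operators, written as a script; the
names {\tt joined}, {\tt repeated}, and \verb"also_repeated" are
assumptions made for the example.

```python
first = 'throat'
second = 'warbler'

joined = first + second       # concatenation
repeated = 'Spam' * 3         # repetition
also_repeated = 3 * 'Spam'    # the integer can also come first

print(joined)
print(repeated)
print(also_repeated)
```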
1647
1648
This use of {\tt +} and {\tt *} makes sense by
1649
analogy with addition and multiplication. Just as {\tt 4*3} is
1650
equivalent to {\tt 4+4+4}, we expect \verb"'Spam'*3" to be the same as
1651
\verb"'Spam'+'Spam'+'Spam'", and it is. On the other hand, there is a
1652
significant way in which string concatenation and repetition are
1653
different from integer addition and multiplication.
1654
Can you think of a property that addition has
1655
that string concatenation does not?
1656
\index{commutativity}
1657
1658
1659
\section{Comments}
1660
\index{comment}
1661
1662
As programs get bigger and more complicated, they get more difficult
1663
to read. Formal languages are dense, and it is often difficult to
1664
look at a piece of code and figure out what it is doing, or why.
1665
1666
For this reason, it is a good idea to add notes to your programs to explain
1667
in natural language what the program is doing. These notes are called
1668
{\bf comments}, and they start with the \verb"#" symbol:
1669
1670
\begin{verbatim}
# compute the percentage of the hour that has elapsed
percentage = (minute * 100) / 60
\end{verbatim}
1674
%
1675
In this case, the comment appears on a line by itself. You can also put
1676
comments at the end of a line:
1677
1678
\begin{verbatim}
percentage = (minute * 100) / 60     # percentage of an hour
\end{verbatim}
1681
%
1682
Everything from the {\tt \#} to the end of the line is ignored---it
1683
has no effect on the execution of the program.
1684
1685
Comments are most useful when they document non-obvious features of
1686
the code. It is reasonable to assume that the reader can figure out
1687
{\em what} the code does; it is more useful to explain {\em why}.
1688
1689
This comment is redundant with the code and useless:
1690
1691
\begin{verbatim}
v = 5     # assign 5 to v
\end{verbatim}
1694
%
1695
This comment contains useful information that is not in the code:
1696
1697
\begin{verbatim}
v = 5     # velocity in meters/second.
\end{verbatim}
1700
%
1701
Good variable names can reduce the need for comments, but
1702
long names can make complex expressions hard to read, so there is
1703
a tradeoff.
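
As a sketch of that tradeoff, compare two versions of the same
computation. All of the names and values here are hypothetical.

```python
# Short names keep the expression compact but rely on comments.
v = 5     # velocity in meters/second
t = 2     # elapsed time in seconds
d = v * t

# Long names need no comments, but the expression is harder to scan.
velocity_in_meters_per_second = 5
elapsed_time_in_seconds = 2
distance_in_meters = velocity_in_meters_per_second * elapsed_time_in_seconds

print(d, distance_in_meters)
```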
1704
1705
1706
\section{Debugging}
1707
\index{debugging}
1708
\index{bug}
1709
1710
Three kinds of errors can occur in a program: syntax errors, runtime
1711
errors, and semantic errors. It is useful
1712
to distinguish between them in order to track them down more quickly.
1713
1714
\begin{description}
1715
1716
\item[Syntax error:] ``Syntax'' refers to the structure of a program
1717
and the rules about that structure. For example, parentheses have
1718
to come in matching pairs, so {\tt (1 + 2)} is legal, but {\tt 8)}
1719
is a {\bf syntax error}. \index{syntax error} \index{error!syntax}
1720
\index{error message}
1721
\index{syntax}
1722
1723
If there is a syntax error
1724
anywhere in your program, Python displays an error message and quits,
1725
and you will not be able to run the program. During the first few
1726
weeks of your programming career, you might spend a lot of
1727
time tracking down syntax errors. As you gain experience, you will
1728
make fewer errors and find them faster.
1729
1730
1731
\item[Runtime error:] The second type of error is a runtime error, so
1732
called because the error does not appear until after the program has
1733
started running. These errors are also called {\bf exceptions}
1734
because they usually indicate that something exceptional (and bad)
1735
has happened. \index{runtime error} \index{error!runtime}
1736
\index{exception} \index{safe language} \index{language!safe}
1737
1738
Runtime errors are rare in the simple programs you will see in the
1739
first few chapters, so it might be a while before you encounter one.
1740
1741
1742
\item[Semantic error:] The third type of error is ``semantic'', which
1743
means related to meaning. If there is a semantic error in your
1744
program, it will run without generating error messages, but it will
1745
not do the right thing. It will do something else. Specifically,
1746
it will do what you told it to do. \index{semantic error}
1747
\index{error!semantic} \index{error message}
1748
1749
Identifying semantic errors can be tricky because it requires you to work
1750
backward by looking at the output of the program and trying to figure
1751
out what it is doing.
1752
1753
\end{description}
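
The three kinds of error can be provoked on purpose. The sketch below
is an illustration under some assumptions: it uses {\tt compile} and
{\tt try}, which this chapter has not covered, so that the script can
keep running after each error, and the numbers are made up.

```python
# 1. Syntax error: the code cannot be parsed, so it never runs.
try:
    compile('8)', '<example>', 'eval')
except SyntaxError as error:
    print('syntax error:', error.msg)

# 2. Runtime error (exception): the code parses, but fails while running.
try:
    minutes = 0
    pace = 42 / minutes
except ZeroDivisionError as error:
    print('runtime error:', error)

# 3. Semantic error: the code runs without any message, but the answer
#    is not the one intended (the average of 40 and 44).
first = 40
second = 44
wrong_average = first + second / 2      # division happens first: 62.0
right_average = (first + second) / 2    # 42.0
print(wrong_average, right_average)
```

Only the first two kinds produce an error message; the third has to
be found by comparing the output with what you expected.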
1754
1755
1756
\section{Glossary}
1757
1758
\begin{description}
1759
1760
\item[variable:] A name that refers to a value.
1761
\index{variable}
1762
1763
\item[assignment:] A statement that assigns a value to a variable.
1764
\index{assignment}
1765
1766
\item[state diagram:] A graphical representation of a set of variables and the
1767
values they refer to.
1768
\index{state diagram}
1769
1770
\item[keyword:] A reserved word that is used to parse a
1771
program; you cannot use keywords like {\tt if}, {\tt def}, and {\tt while} as
1772
variable names.
1773
\index{keyword}
1774
1775
\item[operand:] One of the values on which an operator operates.
1776
\index{operand}
1777
1778
\item[expression:] A combination of variables, operators, and values that
1779
represents a single result.
1780
\index{expression}
1781
1782
\item[evaluate:] To simplify an expression by performing the operations
1783
in order to yield a single value.
1784
1785
\item[statement:] A section of code that represents a command or action. So
1786
far, the statements we have seen are assignments and print statements.
1787
\index{statement}
1788
1789
\item[execute:] To run a statement and do what it says.
1790
\index{execute}
1791
1792
\item[interactive mode:] A way of using the Python interpreter by
1793
typing code at the prompt.
1794
\index{interactive mode}
1795
1796
\item[script mode:] A way of using the Python interpreter to read
1797
code from a script and run it.
1798
\index{script mode}
1799
1800
\item[script:] A program stored in a file.
1801
\index{script}
1802
1803
\item[order of operations:] Rules governing the order in which
1804
expressions involving multiple operators and operands are evaluated.
1805
\index{order of operations}
1806
1807
\item[concatenate:] To join two operands end-to-end.
1808
\index{concatenation}
1809
1810
\item[comment:] Information in a program that is meant for other
1811
programmers (or anyone reading the source code) and has no effect on the
1812
execution of the program.
1813
\index{comment}
1814
1815
\item[syntax error:] An error in a program that makes it impossible
1816
to parse (and therefore impossible to interpret).
1817
\index{syntax error}
1818
1819
\item[exception:] An error that is detected while the program is running.
1820
\index{exception}
1821
1822
\item[semantics:] The meaning of a program.
1823
\index{semantics}
1824
1825
\item[semantic error:] An error in a program that makes it do something
1826
other than what the programmer intended.
1827
\index{semantic error}
1828
1829
\end{description}
1830
1831
1832
\section{Exercises}
1833
1834
\begin{exercise}
1835
1836
Repeating my advice from the previous chapter, whenever you learn
1837
a new feature, you should try it out in interactive mode and make
1838
errors on purpose to see what goes wrong.
1839
1840
\begin{itemize}
1841
1842
\item We've seen that {\tt n = 42} is legal. What about {\tt 42 = n}?
1843
1844
\item How about {\tt x = y = 1}?
1845
1846
\item In some languages every statement ends with a semicolon, {\tt ;}.
What happens if you put a semicolon at the end of a Python statement?
1848
1849
\item What if you put a period at the end of a statement?
1850
1851
\item In math notation you can multiply $x$ and $y$ like this: $x y$.
1852
What happens if you try that in Python?
1853
1854
\end{itemize}
1855
1856
\end{exercise}
1857
1858
1859
\begin{exercise}
1860
1861
Practice using the Python interpreter as a calculator:
1862
\index{calculator}
1863
1864
\begin{enumerate}
1865
1866
\item The volume of a sphere with radius $r$ is $\frac{4}{3} \pi r^3$.
1867
What is the volume of a sphere with radius 5?
1868
1869
\item Suppose the cover price of a book is \$24.95, but bookstores get a
1870
40\% discount. Shipping costs \$3 for the first copy and 75 cents
1871
for each additional copy. What is the total wholesale cost for
1872
60 copies?
1873
1874
\item If I leave my house at 6:52 am and run 1 mile at an easy pace
1875
(8:15 per mile), then 3 miles at tempo (7:12 per mile) and 1 mile at
1876
easy pace again, what time do I get home for breakfast?
1877
\index{running pace}
1878
1879
\end{enumerate}
1880
\end{exercise}
1881
1882
1883
\chapter{Functions}
1884
\label{funcchap}
1885
1886
In the context of programming, a {\bf function} is a named sequence of
1887
statements that performs a computation. When you define a function,
1888
you specify the name and the sequence of statements. Later, you can
1889
``call'' the function by name.
1890
\index{function}
1891
1892
\section{Function calls}
1893
\label{functionchap}
1894
\index{function call}
1895
1896
We have already seen one example of a {\bf function call}:
1897
1898
\begin{verbatim}
1899
>>> type(42)
1900
<class 'int'>
1901
\end{verbatim}
1902
%
1903
The name of the function is {\tt type}. The expression in parentheses
1904
is called the {\bf argument} of the function. The result, for this
1905
function, is the type of the argument.
1906
\index{parentheses!argument in}
1907
1908
It is common to say that a function ``takes'' an argument and ``returns''
1909
a result. The result is also called the {\bf return value}.
1910
\index{argument}
1911
\index{return value}
1912
1913
Python provides functions that convert values
1914
from one type to another. The {\tt int} function takes any value and
1915
converts it to an integer, if it can, or complains otherwise:
1916
\index{conversion!type}
1917
\index{type conversion}
1918
\index{int function}
1919
\index{function!int}
1920
1921
\begin{verbatim}
>>> int('32')
32
>>> int('Hello')
ValueError: invalid literal for int() with base 10: 'Hello'
\end{verbatim}
1927
%
1928
{\tt int} can convert floating-point values to integers, but it
1929
doesn't round off; it chops off the fraction part:
1930
1931
\begin{verbatim}
1932
>>> int(3.99999)
1933
3
1934
>>> int(-2.3)
1935
-2
1936
\end{verbatim}
1937
%
1938
{\tt float} converts integers and strings to floating-point
1939
numbers:
1940
\index{float function}
1941
\index{function!float}
1942
1943
\begin{verbatim}
1944
>>> float(32)
1945
32.0
1946
>>> float('3.14159')
1947
3.14159
1948
\end{verbatim}
1949
%
1950
Finally, {\tt str} converts its argument to a string:
1951
\index{str function}
1952
\index{function!str}
1953
1954
\begin{verbatim}
1955
>>> str(32)
1956
'32'
1957
>>> str(3.14159)
1958
'3.14159'
1959
\end{verbatim}
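
As a small sketch of how these conversions combine, suppose a number
arrives as a string, as it would if a user typed it.  The variable names
{\tt text}, {\tt count}, {\tt half} and {\tt label} are made up for this
illustration:

```python
# Hypothetical input: a number that arrived as text.
text = '32'

count = int(text)               # string -> integer
half = float(text) / 2          # string -> float, then ordinary arithmetic
label = str(count) + ' items'   # integer -> string, so it can be concatenated

print(count + 1)   # 33
print(half)        # 16.0
print(label)       # 32 items
```

Without the call to {\tt str}, the last assignment would try to add an
integer to a string, which is an error.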
1960
%
1961
1962
\section{Math functions}
1963
\index{math function}
1964
\index{function!math}
1965
1966
Python has a math module that provides most of the familiar
1967
mathematical functions. A {\bf module} is a file that contains a
1968
collection of related functions.
1969
\index{module}
1970
\index{module object}
1971
1972
Before we can use the functions in a module, we have to import it with
1973
an {\bf import statement}:
1974
1975
\begin{verbatim}
1976
>>> import math
1977
\end{verbatim}
1978
%
1979
This statement creates a {\bf module object} named math. If
1980
you display the module object, you get some information about it:
1981
1982
\begin{verbatim}
1983
>>> math
1984
<module 'math' (built-in)>
1985
\end{verbatim}
1986
%
1987
The module object contains the functions and variables defined in the
1988
module. To access one of the functions, you have to specify the name
1989
of the module and the name of the function, separated by a dot (also
1990
known as a period). This format is called {\bf dot notation}.
1991
\index{dot notation}
1992
1993
\begin{verbatim}
1994
>>> ratio = signal_power / noise_power
1995
>>> decibels = 10 * math.log10(ratio)
1996
1997
>>> radians = 0.7
1998
>>> height = math.sin(radians)
1999
\end{verbatim}
2000
%
2001
The first example uses \verb"math.log10" to compute
2002
a signal-to-noise ratio in decibels (assuming that \verb"signal_power" and
2003
\verb"noise_power" are defined). The math module also provides {\tt log},
2004
which computes logarithms base {\tt e}.
2005
\index{log function}
2006
\index{function!log}
2007
\index{sine function}
2008
\index{radian}
2009
\index{trigonometric function}
2010
\index{function!trigonometric}
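
Here is one way the first example might look as a complete script.  The
two power values are assumptions chosen for illustration; the fragment
above leaves them undefined:

```python
import math

# Assumed example values; the original fragment does not define them.
signal_power = 1000.0
noise_power = 10.0

ratio = signal_power / noise_power
decibels = 10 * math.log10(ratio)
print(decibels)            # 20.0

# math.log computes the natural logarithm (base e).
print(math.log(math.e))    # 1.0
```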
2011
2012
The second example finds the sine of {\tt radians}. The variable name {\tt radians} is a hint that {\tt sin} and the other trigonometric
2013
functions ({\tt cos}, {\tt tan}, etc.) take arguments in radians. To
2014
convert from degrees to radians, divide by 180 and multiply by
2015
$\pi$:
2016
2017
\begin{verbatim}
2018
>>> degrees = 45
2019
>>> radians = degrees / 180.0 * math.pi
2020
>>> math.sin(radians)
2021
0.707106781187
2022
\end{verbatim}
2023
%
2024
The expression {\tt math.pi} gets the variable {\tt pi} from the math
2025
module. Its value is a floating-point approximation
2026
of $\pi$, accurate to about 15 digits.
2027
\index{pi}
2028
2029
If you know
2030
trigonometry, you can check the previous result by comparing it to
2031
the square root of two, divided by two:
2032
\index{sqrt function}
2033
\index{function!sqrt}
2034
2035
\begin{verbatim}
2036
>>> math.sqrt(2) / 2.0
2037
0.707106781187
2038
\end{verbatim}
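
The same check can be sketched as a short script.  It relies on two
functions from the math module that the text does not use,
{\tt math.radians} and {\tt math.isclose}; the second compares
floating-point values with a tolerance, which is usually safer than
testing them for exact equality:

```python
import math

degrees = 45

# math.radians performs the same conversion as degrees / 180.0 * math.pi.
radians = math.radians(degrees)

# Compare the two floating-point results with a tolerance.
same = math.isclose(math.sin(radians), math.sqrt(2) / 2.0)
print(same)   # True
```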
2039
%
2040
2041
\section{Composition}
2042
\index{composition}
2043
2044
So far, we have looked at the elements of a program---variables,
2045
expressions, and statements---in isolation, without talking about how to
2046
combine them.
2047
2048
One of the most useful features of programming languages is their
2049
ability to take small building blocks and {\bf compose} them. For
2050
example, the argument of a function can be any kind of expression,
2051
including arithmetic operators:
2052
2053
\begin{verbatim}
2054
x = math.sin(degrees / 360.0 * 2 * math.pi)
2055
\end{verbatim}
2056
%
2057
And even function calls:
2058
2059
\begin{verbatim}
2060
x = math.exp(math.log(x+1))
2061
\end{verbatim}
2062
%
2063
Almost anywhere you can put a value, you can put an arbitrary
2064
expression, with one exception: the left side of an assignment
2065
statement has to be a variable name. Any other expression on the left
2066
side is a syntax error (we will see exceptions to this rule
2067
later).
2068
2069
\begin{verbatim}
2070
>>> minutes = hours * 60 # right
2071
>>> hours * 60 = minutes # wrong!
2072
SyntaxError: can't assign to operator
2073
\end{verbatim}
2074
%
2075
\index{SyntaxError}
2076
\index{exception!SyntaxError}
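
As a sketch of composition in a complete script, the following computes
the hypotenuse of a right triangle and one of its angles.  The side
lengths {\tt a} and {\tt b} are assumed values chosen for illustration:

```python
import math

# Assumed example values for the two legs of a right triangle.
a = 3.0
b = 4.0

# The argument of math.sqrt is an expression built from operators.
hypotenuse = math.sqrt(a**2 + b**2)
print(hypotenuse)   # 5.0

# The argument of math.degrees is the result of another function call.
angle = math.degrees(math.atan(b / a))
print(angle)
```

In both statements the left side is a plain variable name, and all of the
composition happens on the right.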
2077
2078
2079
\section{Adding new functions}
2080
2081
So far, we have only been using the functions that come with Python,
2082
but it is also possible to add new functions.
2083
A {\bf function definition} specifies the name of a new function and
2084
the sequence of statements that run when the function is called.
2085
\index{function}
2086
\index{function definition}
2087
\index{definition!function}
2088
2089
Here is an example:
2090
2091
\begin{verbatim}
def print_lyrics():
    print("I'm a lumberjack, and I'm okay.")
    print("I sleep all night and I work all day.")
\end{verbatim}
2096
%
2097
{\tt def} is a keyword that indicates that this is a function
2098
definition. The name of the function is \verb"print_lyrics". The
2099
rules for function names are the same as for variable names: letters,
2100
numbers and underscore are legal, but the first character
2101
can't be a number. You can't use a keyword as the name of a function,
2102
and you should avoid having a variable and a function with the same
2103
name.
2104
\index{def keyword}
2105
\index{keyword!def}
2106
\index{argument}
2107
2108
The empty parentheses after the name indicate that this function
2109
doesn't take any arguments.
2110
\index{parentheses!empty}
2111
\index{header}
2112
\index{body}
2113
\index{indentation}
2114
\index{colon}
2115
2116
The first line of the function definition is called the {\bf header};
2117
the rest is called the {\bf body}. The header has to end with a colon
2118
and the body has to be indented. By convention, indentation is
2119
always four spaces. The body can contain
2120
any number of statements.
2121
2122
The strings in the print statements are enclosed in double
2123
quotes. Single quotes and double quotes do the same thing;
2124
most people use single quotes except in cases like this where
2125
a single quote (which is also an apostrophe) appears in the string.
2126
2127
All quotation marks (single and double)
2128
must be ``straight quotes'', usually
2129
located next to Enter on the keyboard. ``Curly quotes'', like
2130
the ones in this sentence, are not legal in Python.
2131
2132
If you type a function definition in interactive mode, the interpreter
2133
prints dots ({\tt ...}) to let you know that the definition
2134
isn't complete:
2135
\index{ellipses}
2136
2137
\begin{verbatim}
>>> def print_lyrics():
...     print("I'm a lumberjack, and I'm okay.")
...     print("I sleep all night and I work all day.")
...
\end{verbatim}
2143
%
2144
To end the function, you have to enter an empty line.
2145
2146
Defining a function creates a {\bf function object}, which
2147
has type \verb"function":
2148
\index{function type}
2149
\index{type!function}
2150
2151
\begin{verbatim}
2152
>>> print(print_lyrics)
2153
<function print_lyrics at 0xb7e99e9c>
2154
>>> type(print_lyrics)
2155
<class 'function'>
2156
\end{verbatim}
2157
%
2158
The syntax for calling the new function is the same as
2159
for built-in functions:
2160
2161
\begin{verbatim}
2162
>>> print_lyrics()
2163
I'm a lumberjack, and I'm okay.
2164
I sleep all night and I work all day.
2165
\end{verbatim}
2166
%
2167
Once you have defined a function, you can use it inside another
2168
function. For example, to repeat the previous refrain, we could write
2169
a function called \verb"repeat_lyrics":
2170
2171
\begin{verbatim}
def repeat_lyrics():
    print_lyrics()
    print_lyrics()
\end{verbatim}
2176
%
2177
And then call \verb"repeat_lyrics":
2178
2179
\begin{verbatim}
2180
>>> repeat_lyrics()
2181
I'm a lumberjack, and I'm okay.
2182
I sleep all night and I work all day.
2183
I'm a lumberjack, and I'm okay.
2184
I sleep all night and I work all day.
2185
\end{verbatim}
2186
%
2187
But that's not really how the song goes.
2188
2189
2190
\section{Definitions and uses}
2191
\index{function definition}
2192
2193
Pulling together the code fragments from the previous section, the
2194
whole program looks like this:
2195
2196
\begin{verbatim}
def print_lyrics():
    print("I'm a lumberjack, and I'm okay.")
    print("I sleep all night and I work all day.")

def repeat_lyrics():
    print_lyrics()
    print_lyrics()

repeat_lyrics()
\end{verbatim}
2207
%
2208
This program contains two function definitions: \verb"print_lyrics" and
2209
\verb"repeat_lyrics". Function definitions get executed just like other
2210
statements, but the effect is to create function objects. The statements
2211
inside the function do not run until the function is called, and
2212
the function definition generates no output.
2213
\index{use before def}
2214
2215
As you might expect, you have to create a function before you can
2216
run it. In other words, the function definition has to run
2217
before the function gets called.
2218
2219
As an exercise, move the last line of this program
2220
to the top, so the function call appears before the definitions. Run
2221
the program and see what error
2222
message you get.
2223
2224
Now move the function call back to the bottom
2225
and move the definition of \verb"print_lyrics" after the definition of
2226
\verb"repeat_lyrics". What happens when you run this program?
2227
2228
2229
\section{Flow of execution}
2230
\index{flow of execution}
2231
2232
To ensure that a function is defined before its first use,
2233
you have to know the order statements run in, which is
2234
called the {\bf flow of execution}.
2235
2236
Execution always begins at the first statement of the program.
2237
Statements are run one at a time, in order from top to bottom.
2238
2239
Function definitions do not alter the flow of execution of the
2240
program, but remember that statements inside the function don't
2241
run until the function is called.
2242
2243
A function call is like a detour in the flow of execution. Instead of
2244
going to the next statement, the flow jumps to the body of
2245
the function, runs the statements there, and then comes back
2246
to pick up where it left off.
2247
2248
That sounds simple enough, until you remember that one function can
2249
call another. While in the middle of one function, the program might
2250
have to run the statements in another function. Then, while
2251
running that new function, the program might have to run yet
2252
another function!
2253
2254
Fortunately, Python is good at keeping track of where it is, so each
2255
time a function completes, the program picks up where it left off in
2256
the function that called it. When it gets to the end of the program,
2257
it terminates.
2258
2259
In summary, when you read a program, you
2260
don't always want to read from top to bottom. Sometimes it makes
2261
more sense if you follow the flow of execution.
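
One way to see the flow of execution is to number the output in the order
it appears.  In this sketch the function names {\tt inner} and {\tt outer}
are hypothetical; the numbers in the strings show the order in which the
statements run, which is not the order in which they appear in the file:

```python
def inner():
    print('3. inside inner')

def outer():
    print('2. inside outer, before calling inner')
    inner()
    print('4. back in outer')

# The two definitions above run first, but they produce no output.
print('1. first statement that displays anything')
outer()
print('5. back at the top level')
```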
2262
2263
2264
\section{Parameters and arguments}
2265
\label{parameters}
2266
\index{parameter}
2267
\index{function parameter}
2268
\index{argument}
2269
\index{function argument}
2270
2271
Some of the functions we have seen require arguments. For
2272
example, when you call {\tt math.sin} you pass a number
2273
as an argument. Some functions take more than one argument:
2274
{\tt math.pow} takes two, the base and the exponent.
2275
2276
Inside the function, the arguments are assigned to
2277
variables called {\bf parameters}. Here is a definition for
2278
a function that takes an argument:
2279
\index{parentheses!parameters in}
2280
2281
\begin{verbatim}
def print_twice(bruce):
    print(bruce)
    print(bruce)
\end{verbatim}
2286
%
2287
This function assigns the argument to a parameter
2288
named {\tt bruce}. When the function is called, it prints the value of
2289
the parameter (whatever it is) twice.
2290
2291
This function works with any value that can be printed.
2292
2293
\begin{verbatim}
>>> print_twice('Spam')
Spam
Spam
>>> print_twice(42)
42
42
>>> print_twice(math.pi)
3.141592653589793
3.141592653589793
\end{verbatim}
2304
%
2305
The same rules of composition that apply to built-in functions also
2306
apply to programmer-defined functions, so we can use any kind of expression
2307
as an argument for \verb"print_twice":
2308
\index{composition}
2309
\index{programmer-defined function}
2310
\index{function!programmer defined}
2311
2312
\begin{verbatim}
2313
>>> print_twice('Spam '*4)
2314
Spam Spam Spam Spam
2315
Spam Spam Spam Spam
2316
>>> print_twice(math.cos(math.pi))
2317
-1.0
2318
-1.0
2319
\end{verbatim}
2320
%
2321
The argument is evaluated before the function is called, so
2322
in the examples the expressions \verb"'Spam '*4" and
2323
{\tt math.cos(math.pi)} are only evaluated once.
2324
\index{argument}
2325
2326
You can also use a variable as an argument:
2327
2328
\begin{verbatim}
2329
>>> michael = 'Eric, the half a bee.'
2330
>>> print_twice(michael)
2331
Eric, the half a bee.
2332
Eric, the half a bee.
2333
\end{verbatim}
2334
%
2335
The name of the variable we pass as an argument ({\tt michael}) has
2336
nothing to do with the name of the parameter ({\tt bruce}). It
2337
doesn't matter what the value was called back home (in the caller);
2338
here in \verb"print_twice", we call everybody {\tt bruce}.
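
When a function has more than one parameter, arguments are matched to
parameters by position.  The following sketch uses a hypothetical helper,
\verb"print_power", to show that swapping the arguments swaps the values
the parameters receive:

```python
import math

def print_power(base, exponent):
    # Hypothetical helper: shows which argument landed in which parameter.
    print(base, 'to the power', exponent, 'is', math.pow(base, exponent))

print_power(2, 3)   # base is 2, exponent is 3
print_power(3, 2)   # base is 3, exponent is 2
```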
2339
2340
2341
\section{Variables and parameters are local}
2342
\index{local variable}
2343
\index{variable!local}
2344
2345
When you create a variable inside a function, it is {\bf local},
2346
which means that it only
2347
exists inside the function. For example:
2348
\index{parentheses!parameters in}
2349
2350
\begin{verbatim}
def cat_twice(part1, part2):
    cat = part1 + part2
    print_twice(cat)
\end{verbatim}
2355
%
2356
This function takes two arguments, concatenates them, and prints
2357
the result twice. Here is an example that uses it:
2358
\index{concatenation}
2359
2360
\begin{verbatim}
2361
>>> line1 = 'Bing tiddle '
2362
>>> line2 = 'tiddle bang.'
2363
>>> cat_twice(line1, line2)
2364
Bing tiddle tiddle bang.
2365
Bing tiddle tiddle bang.
2366
\end{verbatim}
2367
%
2368
When \verb"cat_twice" terminates, the variable {\tt cat}
2369
is destroyed. If we try to print it, we get an exception:
2370
\index{NameError}
2371
\index{exception!NameError}
2372
2373
\begin{verbatim}
2374
>>> print(cat)
2375
NameError: name 'cat' is not defined
2376
\end{verbatim}
2377
%
2378
Parameters are also local.
2379
For example, outside \verb"print_twice", there is no
2380
such thing as {\tt bruce}.
2381
\index{parameter}
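
A related consequence, sketched here with a hypothetical function named
\verb"set_cat", is that a local variable can share its name with a
variable outside the function without affecting it:

```python
def set_cat():
    cat = 'local value'   # creates a new variable that is local to set_cat
    print(cat)

cat = 'outside value'
set_cat()                 # prints: local value
print(cat)                # prints: outside value
```

The assignment inside \verb"set_cat" creates a separate local variable, so
the variable outside the function keeps its value.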
2382
2383
2384
\section{Stack diagrams}
2385
\label{stackdiagram}
2386
\index{stack diagram}
2387
\index{function frame}
2388
\index{frame}
2389
2390
To keep track of which variables can be used where, it is sometimes
2391
useful to draw a {\bf stack diagram}. Like state diagrams, stack
2392
diagrams show the value of each variable, but they also show the
2393
function each variable belongs to.
2394
\index{stack diagram}
2395
\index{diagram!stack}
2396
2397
Each function is represented by a {\bf frame}. A frame is a box with
2398
the name of a function beside it and the parameters and variables of
2399
the function inside it. The stack diagram for the previous example is
2400
shown in Figure~\ref{fig.stack}.
2401
2402
\begin{figure}
2403
\centerline
2404
{\includegraphics[scale=0.8]{figs/stack.pdf}}
2405
\caption{Stack diagram.}
2406
\label{fig.stack}
2407
\end{figure}
2408
2409
2410
The frames are arranged in a stack that indicates which function
2411
called which, and so on. In this example, \verb"print_twice"
2412
was called by \verb"cat_twice", and \verb"cat_twice" was called by
2413
\verb"__main__", which is a special name for the topmost frame. When
2414
you create a variable outside of any function, it belongs to
2415
\verb"__main__".
2416
2417
\index{main}
2418
2419
Each parameter refers to the same value as its corresponding
2420
argument. So, {\tt part1} has the same value as
2421
{\tt line1}, {\tt part2} has the same value as {\tt line2},
2422
and {\tt bruce} has the same value as {\tt cat}.
2423
2424
If an error occurs during a function call, Python prints the
2425
name of the function, the name of the function that called
2426
it, and the name of the function that called {\em that}, all the
2427
way back to \verb"__main__".
2428
2429
For example, if you try to access {\tt cat} from within
2430
\verb"print_twice", you get a {\tt NameError}:
2431
2432
\begin{verbatim}
Traceback (most recent call last):
  File "test.py", line 13, in __main__
    cat_twice(line1, line2)
  File "test.py", line 5, in cat_twice
    print_twice(cat)
  File "test.py", line 9, in print_twice
    print(cat)
NameError: name 'cat' is not defined
\end{verbatim}
2442
%
2443
This list of functions is called a {\bf traceback}. It tells you what
2444
program file the error occurred in, and what line, and what functions
2445
were executing at the time. It also shows the line of code that
2446
caused the error.
2447
\index{traceback}
2448
2449
The order of the functions in the traceback is the same as the
2450
order of the frames in the stack diagram. The function that is
2451
currently running is at the bottom.
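
You can also inspect the stack without causing an error.  The following
sketch uses {\tt traceback.extract\_stack} from the standard library's
{\tt traceback} module, which this book does not otherwise cover; it
returns the active frames with the outermost first, so the last entry is
the function that is currently running:

```python
import traceback

def print_twice(bruce):
    # The last entry is this function; the one before it is the caller.
    stack = traceback.extract_stack()
    print(stack[-2].name, 'called', stack[-1].name)
    print(bruce)
    print(bruce)

def cat_twice(part1, part2):
    cat = part1 + part2
    print_twice(cat)

cat_twice('Bing tiddle ', 'tiddle bang.')
```

The first line of output names \verb"cat_twice" as the caller of
\verb"print_twice", which matches the order of the frames in the stack
diagram.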
2452
2453
2454
\section{Fruitful functions and void functions}
2455
\index{fruitful function}
2456
\index{void function}
2457
\index{function!fruitful}
2458
\index{function!void}
2459
2460
Some of the functions we have used, such as the math functions, return
2461
results; for lack of a better name, I call them {\bf fruitful
2462
functions}. Other functions, like \verb"print_twice", perform an
2463
action but don't return a value. They are called {\bf void
2464
functions}.
2465
2466
When you call a fruitful function, you almost always
2467
want to do something with the result; for example, you might
2468
assign it to a variable or use it as part of an expression:
2469
2470
\begin{verbatim}
2471
x = math.cos(radians)
2472
golden = (math.sqrt(5) + 1) / 2
2473
\end{verbatim}
2474
%
2475
When you call a function in interactive mode, Python displays
2476
the result:
2477
2478
\begin{verbatim}
>>> math.sqrt(5)
2.23606797749979
\end{verbatim}
2482
%
2483
But in a script, if you call a fruitful function all by itself,
2484
the return value is lost forever!
2485
2486
\begin{verbatim}
2487
math.sqrt(5)
2488
\end{verbatim}
2489
%
2490
This script computes the square root of 5, but since it doesn't store
2491
or display the result, it is not very useful.
2492
\index{interactive mode}
2493
\index{script mode}
2494
2495
Void functions might display something on the screen or have some
2496
other effect, but they don't have a return value. If you
2497
assign the result to a variable, you get a special value called
2498
{\tt None}.
2499
\index{None special value}
2500
\index{special value!None}
2501
2502
\begin{verbatim}
2503
>>> result = print_twice('Bing')
2504
Bing
2505
Bing
2506
>>> print(result)
2507
None
2508
\end{verbatim}
2509
%
2510
The value {\tt None} is not the same as the string \verb"'None'".
2511
It is a special value that has its own type:
2512
2513
\begin{verbatim}
2514
>>> type(None)
2515
<class 'NoneType'>
2516
\end{verbatim}
2517
%
2518
The functions we have written so far are all void. We will start
2519
writing fruitful functions in a few chapters.
2520
\index{NoneType type}
2521
\index{type!NoneType}
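
To make the contrast concrete, here is a sketch of a void function next to
a fruitful one.  Both names are hypothetical, and the fruitful version
uses the {\tt return} statement, which is explained in a later chapter:

```python
def print_area(width, height):
    # Void: displays the result, but the call itself evaluates to None.
    print(width * height)

def area(width, height):
    # Fruitful: hands the result back to the caller.
    return width * height

result = print_area(3, 4)   # prints 12
print(result is None)       # True

total = area(3, 4) + area(1, 2)
print(total)                # 14
```

Only the fruitful version can be used as part of a larger expression.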
2522
2523
2524
\section{Why functions?}
2525
\index{function!reasons for}
2526
2527
It may not be clear why it is worth the trouble to divide
2528
a program into functions. There are several reasons:
2529
2530
\begin{itemize}
2531
2532
\item Creating a new function gives you an opportunity to name a group
2533
of statements, which makes your program easier to read and debug.
2534
2535
\item Functions can make a program smaller by eliminating repetitive
2536
code. Later, if you make a change, you only have
2537
to make it in one place.
2538
2539
\item Dividing a long program into functions allows you to debug the
2540
parts one at a time and then assemble them into a working whole.
2541
2542
\item Well-designed functions are often useful for many programs.
2543
Once you write and debug one, you can reuse it.
2544
2545
\end{itemize}
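
As a small sketch of the second point, suppose a program displays several
headings in the same style.  With a hypothetical function named
\verb"print_banner", the formatting lives in one place, so changing the
style means changing one function:

```python
def print_banner(title):
    # Hypothetical helper: every heading is formatted the same way.
    line = '=' * 20
    print(line)
    print(title)
    print(line)

print_banner('Functions')
print_banner('Why functions?')
```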
2546
2547
2548
\section{Debugging}
2549
2550
One of the most important skills you will acquire is debugging.
2551
Although it can be frustrating, debugging is one of the most
2552
intellectually rich, challenging, and interesting parts of
2553
programming.
2554
\index{experimental debugging}
2555
\index{debugging!experimental}
2556
2557
In some ways debugging is like detective work. You are confronted
2558
with clues and you have to infer the processes and events that led
2559
to the results you see.
2560
2561
Debugging is also like an experimental science. Once you have an idea
2562
about what is going wrong, you modify your program and try again. If
2563
your hypothesis was correct, you can predict the result of the
2564
modification, and you take a step closer to a working program. If
2565
your hypothesis was wrong, you have to come up with a new one. As
2566
Sherlock Holmes pointed out, ``When you have eliminated the
2567
impossible, whatever remains, however improbable, must be the truth.''
2568
(A. Conan Doyle, {\em The Sign of Four})
2569
\index{Holmes, Sherlock}
2570
\index{Doyle, Arthur Conan}
2571
2572
For some people, programming and debugging are the same thing. That
2573
is, programming is the process of gradually debugging a program until
2574
it does what you want. The idea is that you should start with a
2575
working program and make small modifications,
2576
debugging them as you go.
2577
2578
For example, Linux is an operating system that contains millions of
2579
lines of code, but it started out as a simple program Linus Torvalds
2580
used to explore the Intel 80386 chip. According to Larry Greenfield,
2581
``One of Linus's earlier projects was a program that would switch
2582
between printing AAAA and BBBB. This later evolved to Linux.''
2583
({\em The Linux Users' Guide} Beta Version 1).
2584
\index{Linux}
2585
2586
2587
\section{Glossary}
2588
2589
\begin{description}
2590
2591
\item[function:] A named sequence of statements that performs some
2592
useful operation. Functions may or may not take arguments and may or
2593
may not produce a result.
2594
\index{function}
2595
2596
\item[function definition:] A statement that creates a new function,
2597
specifying its name, parameters, and the statements it contains.
2598
\index{function definition}
2599
2600
\item[function object:] A value created by a function definition.
2601
The name of the function is a variable that refers to a function
2602
object.
2603
\index{function object}
2604
2605
\item[header:] The first line of a function definition.
2606
\index{header}
2607
2608
\item[body:] The sequence of statements inside a function definition.
2609
\index{body}
2610
2611
\item[parameter:] A name used inside a function to refer to the value
2612
passed as an argument.
2613
\index{parameter}
2614
2615
\item[function call:] A statement that runs a function. It
2616
consists of the function name followed by an argument list in
2617
parentheses.
2618
\index{function call}
2619
2620
\item[argument:] A value provided to a function when the function is called.
2621
This value is assigned to the corresponding parameter in the function.
2622
\index{argument}
2623
2624
\item[local variable:] A variable defined inside a function. A local
2625
variable can only be used inside its function.
2626
\index{local variable}
2627
2628
\item[return value:] The result of a function. If a function call
2629
is used as an expression, the return value is the value of
2630
the expression.
2631
\index{return value}
2632
2633
\item[fruitful function:] A function that returns a value.
2634
\index{fruitful function}
2635
2636
\item[void function:] A function that always returns {\tt None}.
2637
\index{void function}
2638
2639
\item[{\tt None}:] A special value returned by void functions.
2640
\index{None special value}
2641
\index{special value!None}
2642
2643
\item[module:] A file that contains a
2644
collection of related functions and other definitions.
2645
\index{module}
2646
2647
\item[import statement:] A statement that reads a module file and creates
2648
a module object.
2649
\index{import statement}
2650
\index{statement!import}
2651
2652
\item[module object:] A value created by an {\tt import} statement
2653
that provides access to the values defined in a module.
2654
\index{module object}
2655
2656
\item[dot notation:] The syntax for calling a function in another
2657
module by specifying the module name followed by a dot (period) and
2658
the function name.
2659
\index{dot notation}
2660
2661
\item[composition:] Using an expression as part of a larger expression,
2662
or a statement as part of a larger statement.
2663
\index{composition}
2664
2665
\item[flow of execution:] The order statements run in.
2666
\index{flow of execution}
2667
2668
\item[stack diagram:] A graphical representation of a stack of functions,
2669
their variables, and the values they refer to.
2670
\index{stack diagram}
2671
2672
\item[frame:] A box in a stack diagram that represents a function call.
2673
It contains the local variables and parameters of the function.
2674
\index{function frame}
2675
\index{frame}
2676
2677
\item[traceback:] A list of the functions that are executing,
2678
printed when an exception occurs.
2679
\index{traceback}
2680
2681
2682
\end{description}
2683
2684
2685
\section{Exercises}
2686
2687
\begin{exercise}
2688
\index{len function}
2689
\index{function!len}
2690
2691
Write a function named \verb"right_justify" that takes a string
2692
named {\tt s} as a parameter and prints the string with enough
2693
leading spaces so that the last letter of the string is in column 70
2694
of the display.
2695
2696
\begin{verbatim}
>>> right_justify('monty')
                                                                 monty
\end{verbatim}
2700
2701
Hint: Use string concatenation and repetition. Also,
2702
Python provides a built-in function called {\tt len} that
2703
returns the length of a string, so the value of \verb"len('monty')" is 5.
2704
2705
\end{exercise}
2706
2707
2708
\begin{exercise}
2709
\index{function object}
2710
\index{object!function}
2711
2712
A function object is a value you can assign to a variable
2713
or pass as an argument. For example, \verb"do_twice" is a function
2714
that takes a function object as an argument and calls it twice:
2715
2716
\begin{verbatim}
def do_twice(f):
    f()
    f()
\end{verbatim}
2721
2722
Here's an example that uses \verb"do_twice" to call a function
2723
named \verb"print_spam" twice.
2724
2725
\begin{verbatim}
def print_spam():
    print('spam')

do_twice(print_spam)
\end{verbatim}
2731
2732
\begin{enumerate}
2733
2734
\item Type this example into a script and test it.
2735
2736
\item Modify \verb"do_twice" so that it takes two arguments, a
2737
function object and a value, and calls the function twice,
2738
passing the value as an argument.
2739
2740
\item Copy the definition of
2741
\verb"print_twice" from earlier in this chapter to your script.
2742
2743
\item Use the modified version of \verb"do_twice" to call
2744
\verb"print_twice" twice, passing \verb"'spam'" as an argument.
2745
2746
\item Define a new function called
2747
\verb"do_four" that takes a function object and a value
2748
and calls the function four times, passing the value
2749
as an argument.  There should be only
2750
two statements in the body of this function, not four.
2751
2752
\end{enumerate}
2753
2754
Solution: \url{http://thinkpython2.com/code/do_four.py}.
2755
2756
\end{exercise}
2757
2758
2759
2760
\begin{exercise}
2761
2762
Note: This exercise should be
2763
done using only the statements and other features we have learned so
2764
far.
2765
2766
\begin{enumerate}
2767
2768
\item Write a function that draws a grid like the following:
2769
\index{grid}
2770
2771
\begin{verbatim}
+ - - - - + - - - - +
|         |         |
|         |         |
|         |         |
|         |         |
+ - - - - + - - - - +
|         |         |
|         |         |
|         |         |
|         |         |
+ - - - - + - - - - +
\end{verbatim}
2784
%
2785
Hint: to print more than one value on a line, you can print
2786
a comma-separated sequence of values:
2787
2788
\begin{verbatim}
2789
print('+', '-')
2790
\end{verbatim}
2791
%
2792
By default, {\tt print} advances to the next line, but you
2793
can override that behavior and put a space at the end, like this:
2794
2795
\begin{verbatim}
2796
print('+', end=' ')
2797
print('-')
2798
\end{verbatim}
2799
%
2800
The output of these statements is \verb"'+ -'" on the same line.
2801
The output from the next print statement would begin on the next line.
2802
2803
\item Write a function that draws a similar grid
2804
with four rows and four columns.
2805
2806
\end{enumerate}
2807
2808
Solution: \url{http://thinkpython2.com/code/grid.py}.
2809
Credit: This exercise is based on an exercise in Oualline, {\em
2810
Practical C Programming, Third Edition}, O'Reilly Media, 1997.
2811
2812
\end{exercise}
2813
2814
2815
2816
2817
2818
\chapter{Case study: interface design}
2819
\label{turtlechap}
2820
2821
This chapter presents a case study that demonstrates a process for
2822
designing functions that work together.
2823
2824
It introduces the {\tt turtle} module, which allows you to
2825
create images using turtle graphics. The {\tt turtle} module is
2826
included in most Python installations, but if you are running Python
2827
using PythonAnywhere, you won't be able to run the turtle examples (at
2828
least you couldn't when I wrote this).
2829
2830
If you have already installed Python on your computer, you should
2831
be able to run the examples. Otherwise, now is a good time
2832
to install. I have posted instructions at
2833
\url{http://tinyurl.com/thinkpython2e}.
2834
2835
Code examples from this chapter are available from
2836
\url{http://thinkpython2.com/code/polygon.py}.
2837
2838
2839
\section{The turtle module}
2840
\label{turtle}
2841
2842
To check whether you have the {\tt turtle} module, open the Python
2843
interpreter and type
2844
2845
\begin{verbatim}
>>> import turtle
>>> bob = turtle.Turtle()
\end{verbatim}
2849
2850
When you run this code, it should create a new window
with a small arrow that represents the turtle. Close the window.
2852
2853
Create a file named {\tt mypolygon.py} and type in the following
2854
code:
2855
2856
\begin{verbatim}
import turtle
bob = turtle.Turtle()
print(bob)
turtle.mainloop()
\end{verbatim}
2862
%
2863
The {\tt turtle} module (with a lowercase 't') provides a function
2864
called {\tt Turtle} (with an uppercase 'T') that creates a Turtle
2865
object, which we assign to a variable named {\tt bob}.
2866
Printing {\tt bob} displays something like:
2867
2868
\begin{verbatim}
<turtle.Turtle object at 0xb7bfbf4c>
\end{verbatim}
2871
%
2872
This means that {\tt bob} refers to an object with type
2873
{\tt Turtle}
2874
as defined in module {\tt turtle}.
2875
2876
\verb"mainloop" tells the window to wait for the user
2877
to do something, although in this case there's not much for
2878
the user to do except close the window.
2879
2880
Once you create a Turtle, you can call a {\bf method} to move it
2881
around the window. A method is similar to a function, but it
2882
uses slightly different syntax. For example, to move the turtle
2883
forward:
2884
2885
\begin{verbatim}
bob.fd(100)
\end{verbatim}
2888
%
2889
The method, {\tt fd}, is associated with the turtle
2890
object we're calling {\tt bob}.
2891
Calling a method is like making a request: you are asking {\tt bob}
2892
to move forward.
2893
2894
The argument of {\tt fd} is a distance in pixels, so the actual
2895
size depends on your display.
2896
2897
Other methods you can call on a Turtle are {\tt bk} to move
backward, {\tt lt} for left turn, and {\tt rt} for right turn. The
argument for {\tt lt} and {\tt rt} is an angle in degrees.
2900
2901
Also, each Turtle is holding a pen, which is
2902
either down or up; if the pen is down, the Turtle leaves
2903
a trail when it moves. The methods {\tt pu} and {\tt pd}
2904
stand for ``pen up'' and ``pen down''.
2905
2906
To draw a right angle, add these lines to the program
2907
(after creating {\tt bob} and before calling \verb"mainloop"):
2908
2909
\begin{verbatim}
bob.fd(100)
bob.lt(90)
bob.fd(100)
\end{verbatim}
2914
%
2915
When you run this program, you should see {\tt bob} move east and then
2916
north, leaving two line segments behind.
2917
2918
Now modify the program to draw a square. Don't go on until
2919
you've got it working!
2920
2921
%\newpage
2922
2923
\section{Simple repetition}
2924
\label{repetition}
2925
\index{repetition}
2926
2927
Chances are you wrote something like this:
2928
2929
\begin{verbatim}
bob.fd(100)
bob.lt(90)

bob.fd(100)
bob.lt(90)

bob.fd(100)
bob.lt(90)

bob.fd(100)
\end{verbatim}
2941
%
2942
We can do the same thing more concisely with a {\tt for} statement.
2943
Add this example to {\tt mypolygon.py} and run it again:
2944
\index{for loop}
2945
\index{loop!for}
2946
\index{statement!for}
2947
2948
\begin{verbatim}
for i in range(4):
    print('Hello!')
\end{verbatim}
2952
%
2953
You should see something like this:
2954
2955
\begin{verbatim}
Hello!
Hello!
Hello!
Hello!
\end{verbatim}
2961
%
2962
This is the simplest use of the {\tt for} statement; we will see
2963
more later. But that should be enough to let you rewrite your
2964
square-drawing program. Don't go on until you do.
2965
2966
Here is a {\tt for} statement that draws a square:
2967
2968
\begin{verbatim}
for i in range(4):
    bob.fd(100)
    bob.lt(90)
\end{verbatim}
2973
%
2974
The syntax of a {\tt for} statement is similar to a function
2975
definition. It has a header that ends with a colon and an indented
2976
body. The body can contain any number of statements.
2977
2978
A {\tt for} statement is also called a {\bf loop} because
2979
the flow of execution runs through the body and then loops back
2980
to the top. In this case, it runs the body four times.
2981
\index{loop}
2982
2983
This version is actually a little different from the previous
2984
square-drawing code because it makes another turn after
2985
drawing the last side of the square. The extra turn takes
2986
more time, but it simplifies the code if we do the same thing
2987
every time through the loop. This version also has the effect
2988
of leaving the turtle back in the starting position, facing in
2989
the starting direction.
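
To check the claim about the starting direction, here is a small
sketch that tracks only the turtle's heading, without opening a
window. The variable {\tt heading} is a hypothetical stand-in for the
direction the turtle faces, in degrees; it is not part of the
{\tt turtle} module.

\begin{verbatim}
# Assumption: heading 0 stands for the starting direction, and
# each left turn adds 90 degrees, as bob.lt(90) would.
heading = 0
for i in range(4):
    heading = heading + 90

# Four left turns add up to one full rotation.
print(heading)
\end{verbatim}
%
The total is 360 degrees, a full rotation, which is why the turtle
ends up facing the way it started. With only three turns, as in the
longer version, the total would be 270.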
2990
2991
\section{Exercises}
2992
2993
The following is a series of exercises using the {\tt turtle} module. They
2994
are meant to be fun, but they have a point, too. While you are
2995
working on them, think about what the point is.
2996
2997
The following sections have solutions to the exercises, so
2998
don't look until you have finished (or at least tried).
2999
3000
\begin{enumerate}
3001
3002
\item Write a function called {\tt square} that takes a parameter
3003
named {\tt t}, which is a turtle. It should use the turtle to draw
3004
a square.
3005
3006
Write a function call that passes {\tt bob} as an argument to
3007
{\tt square}, and then run the program again.
3008
3009
\item Add another parameter, named {\tt length}, to {\tt square}.
3010
Modify the body so the length of the sides is {\tt length}, and then
3011
modify the function call to provide a second argument. Run the
3012
program again. Test your program with a range of values for {\tt
3013
length}.
3014
3015
\item Make a copy of {\tt square} and change the name to {\tt
3016
polygon}. Add another parameter named {\tt n} and modify the body
3017
so it draws an n-sided regular polygon. Hint: The exterior angles
3018
of an n-sided regular polygon are $360/n$ degrees. \index{polygon
3019
function} \index{function!polygon}
3020
3021
\item Write a function called {\tt circle} that takes a turtle,
3022
{\tt t}, and radius, {\tt r}, as parameters and that draws an
3023
approximate circle by calling {\tt polygon} with an appropriate
3024
length and number of sides. Test your function with a range of values
3025
of {\tt r}. \index{circle function} \index{function!circle}
3026
3027
Hint: figure out the circumference of the circle and make sure that
3028
{\tt length * n = circumference}.
3029
3030
\item Make a more general version of {\tt circle} called {\tt arc}
3031
that takes an additional parameter {\tt angle}, which determines
3032
what fraction of a circle to draw. {\tt angle} is in units of
3033
degrees, so when {\tt angle=360}, {\tt arc} should draw a complete
3034
circle.
3035
\index{arc function}
3036
\index{function!arc}
3037
3038
\end{enumerate}
3039
3040
3041
\section{Encapsulation}
3042
3043
The first exercise asks you to put your square-drawing code
3044
into a function definition and then call the function, passing
3045
the turtle as an argument. Here is a solution:
3046
3047
\begin{verbatim}
def square(t):
    for i in range(4):
        t.fd(100)
        t.lt(90)

square(bob)
\end{verbatim}
3055
%
3056
The innermost statements, {\tt fd} and {\tt lt}, are indented twice to
3057
show that they are inside the {\tt for} loop, which is inside the
3058
function definition. The next line, {\tt square(bob)}, is flush with
3059
the left margin, which indicates the end of both the {\tt for} loop
3060
and the function definition.
3061
3062
Inside the function, {\tt t} refers to the same turtle {\tt bob}, so
3063
{\tt t.lt(90)} has the same effect as {\tt bob.lt(90)}. In that
3064
case, why not
3065
call the parameter {\tt bob}? The idea is that {\tt t} can be any
3066
turtle, not just {\tt bob}, so you could create a second turtle and
3067
pass it as an argument to {\tt square}:
3068
3069
\begin{verbatim}
alice = turtle.Turtle()
square(alice)
\end{verbatim}
3073
%
3074
Wrapping a piece of code up in a function is called {\bf
3075
encapsulation}. One of the benefits of encapsulation is that it
3076
attaches a name to the code, which serves as a kind of documentation.
3077
Another advantage is that if you re-use the code, it is more concise
3078
to call a function twice than to copy and paste the body!
3079
\index{encapsulation}
3080
3081
3082
\section{Generalization}
3083
3084
The next step is to add a {\tt length} parameter to {\tt square}.
3085
Here is a solution:
3086
3087
\begin{verbatim}
def square(t, length):
    for i in range(4):
        t.fd(length)
        t.lt(90)

square(bob, 100)
\end{verbatim}
3095
%
3096
Adding a parameter to a function is called {\bf generalization}
3097
because it makes the function more general: in the previous
3098
version, the square is always the same size; in this version
3099
it can be any size.
3100
\index{generalization}
3101
3102
The next step is also a generalization. Instead of drawing
3103
squares, {\tt polygon} draws regular polygons with any number of
3104
sides. Here is a solution:
3105
3106
\begin{verbatim}
def polygon(t, n, length):
    angle = 360 / n
    for i in range(n):
        t.fd(length)
        t.lt(angle)

polygon(bob, 7, 70)
\end{verbatim}
3115
%
3116
This example draws a 7-sided polygon with side length 70.
3117
3118
If you are using Python 2, the value of {\tt angle} might be off
3119
because of integer division. A simple solution is to compute
3120
{\tt angle = 360.0 / n}. Because the numerator is a floating-point
3121
number, the result is floating point.
3122
\index{Python 2}
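
As a rough check of why the fractional part matters, the following
sketch compares the exact angle with the truncated one for a 7-sided
polygon. It uses the floor division operator \verb"//" (introduced in
the next chapter) to imitate what Python 2 does with integer operands;
the variable names are chosen only for illustration.

\begin{verbatim}
n = 7
exact = 360 / n        # floating-point division in Python 3
truncated = 360 // n   # floor division, like Python 2's 360 / 7

# Total turn after drawing all n sides.
print(n * exact)
print(n * truncated)
\end{verbatim}
%
With the truncated angle, the turns add up to only 357 degrees, so the
last side does not quite meet the first.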
3123
3124
When a function has more than a few numeric arguments, it is easy to
3125
forget what they are, or what order they should be in. In that case
3126
it is often a good idea to include the names of the parameters in the
3127
argument list:
3128
3129
\begin{verbatim}
polygon(bob, n=7, length=70)
\end{verbatim}
3132
%
3133
These are called {\bf keyword arguments} because they include
3134
the parameter names as ``keywords'' (not to be confused with
3135
Python keywords like {\tt while} and {\tt def}).
3136
\index{keyword argument}
3137
\index{argument!keyword}
3138
3139
This syntax makes the program more readable. It is also a reminder
3140
about how arguments and parameters work: when you call a function, the
3141
arguments are assigned to the parameters.
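
Here is a sketch that shows this assignment at work without drawing
anything. The function {\tt perimeter} is a hypothetical helper, not
part of {\tt polygon.py}; it takes the same {\tt n} and {\tt length}
as {\tt polygon} and uses a {\tt return} statement, which is covered
later in the book.

\begin{verbatim}
def perimeter(n, length):
    """Returns the total side length of a regular polygon.

    Hypothetical helper that illustrates keyword arguments.
    """
    return n * length

# Positional arguments are assigned to parameters in order.
print(perimeter(7, 70))

# Keyword arguments are assigned by name, so order does not matter.
print(perimeter(length=70, n=7))
\end{verbatim}
%
Both calls compute the same value, 490, because in each case {\tt n}
gets 7 and {\tt length} gets 70.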
3142
3143
3144
\section{Interface design}
3145
3146
The next step is to write {\tt circle}, which takes a radius,
3147
{\tt r}, as a parameter. Here is a simple solution that uses
3148
{\tt polygon} to draw a 50-sided polygon:
3149
3150
\begin{verbatim}
import math

def circle(t, r):
    circumference = 2 * math.pi * r
    n = 50
    length = circumference / n
    polygon(t, n, length)
\end{verbatim}
3159
%
3160
The first line computes the circumference of a circle with radius
3161
{\tt r} using the formula $2 \pi r$. Since we use {\tt math.pi}, we
3162
have to import {\tt math}. By convention, {\tt import} statements
3163
are usually at the beginning of the script.
3164
3165
{\tt n} is the number of line segments in our approximation of a circle,
3166
so {\tt length} is the length of each segment. Thus, {\tt polygon}
3167
draws a 50-sided polygon that approximates a circle with radius {\tt r}.
3168
3169
One limitation of this solution is that {\tt n} is a constant, which
3170
means that for very big circles, the line segments are too long, and
3171
for small circles, we waste time drawing very small segments. One
3172
solution would be to generalize the function by taking {\tt n} as
3173
a parameter. This would give the user (whoever calls {\tt circle})
3174
more control, but the interface would be less clean.
3175
\index{interface}
3176
3177
The {\bf interface} of a function is a summary of how it is used: what
3178
are the parameters? What does the function do? And what is the return
3179
value? An interface is ``clean'' if it allows the caller to do
3180
what they want without dealing with unnecessary details.
3181
3182
In this example, {\tt r} belongs in the interface because it
3183
specifies the circle to be drawn. {\tt n} is less appropriate
3184
because it pertains to the details of {\em how} the circle should
3185
be rendered.
3186
3187
Rather than clutter up the interface, it is better
3188
to choose an appropriate value of {\tt n}
3189
depending on {\tt circumference}:
3190
3191
\begin{verbatim}
def circle(t, r):
    circumference = 2 * math.pi * r
    n = int(circumference / 3) + 3
    length = circumference / n
    polygon(t, n, length)
\end{verbatim}
3198
%
3199
Now the number of segments is an integer near {\tt circumference/3},
3200
so the length of each segment is approximately 3, which is small
3201
enough that the circles look good, but big enough to be efficient,
3202
and acceptable for any size circle.
3203
3204
Adding 3 to {\tt n} guarantees that the polygon has at least 3 sides.
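
To see how this choice behaves, the sketch below computes {\tt n} and
{\tt length} for a few radii without drawing anything. The function
{\tt segments} is a hypothetical helper that repeats the arithmetic
from {\tt circle}; the radii are arbitrary.

\begin{verbatim}
import math

def segments(r):
    """Returns the number of segments and the segment length
    that circle would use for radius r (hypothetical helper).
    """
    circumference = 2 * math.pi * r
    n = int(circumference / 3) + 3
    length = circumference / n
    return n, length

for r in [0.1, 10, 100]:
    n, length = segments(r)
    print(r, n, length)
\end{verbatim}
%
For a tiny circle like {\tt r=0.1}, {\tt int(circumference / 3)} is 0,
so {\tt n} is 3. For the larger circles, {\tt n} grows with the
circumference and the segment length stays a little under 3.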
3205
3206
3207
\section{Refactoring}
3208
\label{refactoring}
3209
\index{refactoring}
3210
3211
When I wrote {\tt circle}, I was able to re-use {\tt polygon}
3212
because a many-sided polygon is a good approximation of a circle.
3213
But {\tt arc} is not as cooperative; we can't use {\tt polygon}
3214
or {\tt circle} to draw an arc.
3215
3216
One alternative is to start with a copy
3217
of {\tt polygon} and transform it into {\tt arc}. The result
3218
might look like this:
3219
3220
\begin{verbatim}
def arc(t, r, angle):
    arc_length = 2 * math.pi * r * angle / 360
    n = int(arc_length / 3) + 1
    step_length = arc_length / n
    step_angle = angle / n

    for i in range(n):
        t.fd(step_length)
        t.lt(step_angle)
\end{verbatim}
3231
%
3232
The second half of this function looks like {\tt polygon}, but we
3233
can't re-use {\tt polygon} without changing the interface. We could
3234
generalize {\tt polygon} to take an angle as a fourth argument,
3235
but then {\tt polygon} would no longer be an appropriate name!
3236
Instead, let's call the more general function {\tt polyline}:
3237
3238
\begin{verbatim}
def polyline(t, n, length, angle):
    for i in range(n):
        t.fd(length)
        t.lt(angle)
\end{verbatim}
3244
%
3245
Now we can rewrite {\tt polygon} and {\tt arc} to use {\tt polyline}:
3246
3247
\begin{verbatim}
def polygon(t, n, length):
    angle = 360.0 / n
    polyline(t, n, length, angle)

def arc(t, r, angle):
    arc_length = 2 * math.pi * r * angle / 360
    n = int(arc_length / 3) + 1
    step_length = arc_length / n
    step_angle = float(angle) / n
    polyline(t, n, step_length, step_angle)
\end{verbatim}
3259
%
3260
Finally, we can rewrite {\tt circle} to use {\tt arc}:
3261
3262
\begin{verbatim}
def circle(t, r):
    arc(t, r, 360)
\end{verbatim}
3266
%
3267
This process---rearranging a program to improve
3268
interfaces and facilitate code re-use---is called {\bf refactoring}.
3269
In this case, we noticed that there was similar code in {\tt arc} and
3270
{\tt polygon}, so we ``factored it out'' into {\tt polyline}.
3271
\index{refactoring}
3272
3273
If we had planned ahead, we might have written {\tt polyline} first
3274
and avoided refactoring, but often you don't know enough at the
3275
beginning of a project to design all the interfaces. Once you start
3276
coding, you understand the problem better. Sometimes refactoring is a
3277
sign that you have learned something.
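
One way to gain confidence in a refactoring is to check that the new
version makes the same requests as the old one. The sketch below does
that with {\tt Recorder}, a hypothetical stand-in for a Turtle that
stores each call in a list instead of drawing; it uses a class
definition, a feature that comes later in the book. The names
\verb"polygon_old" and \verb"polygon_new" are chosen here only to keep
both versions in one script.

\begin{verbatim}
class Recorder:
    """Hypothetical stand-in for a Turtle.  It records each
    request in a list instead of drawing.
    """
    def __init__(self):
        self.calls = []

    def fd(self, distance):
        self.calls.append(('fd', distance))

    def lt(self, angle):
        self.calls.append(('lt', angle))

def polygon_old(t, n, length):
    # Version from before the refactoring.
    angle = 360 / n
    for i in range(n):
        t.fd(length)
        t.lt(angle)

def polyline(t, n, length, angle):
    for i in range(n):
        t.fd(length)
        t.lt(angle)

def polygon_new(t, n, length):
    # Version from after the refactoring.
    angle = 360.0 / n
    polyline(t, n, length, angle)

old = Recorder()
new = Recorder()
polygon_old(old, 7, 70)
polygon_new(new, 7, 70)

# Compare the number of requests and the requests themselves.
print(len(old.calls), old.calls == new.calls)
\end{verbatim}
%
If the two lists match, the refactoring changed the structure of the
program without changing what it asks the turtle to do.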
3278
3279
3280
\section{A development plan}
3281
\index{development plan!encapsulation and generalization}
3282
3283
A {\bf development plan} is a process for writing programs. The
3284
process we used in this case study is ``encapsulation and
3285
generalization''. The steps of this process are:
3286
3287
\begin{enumerate}
3288
3289
\item Start by writing a small program with no function definitions.
3290
3291
\item Once you get the program working, identify a coherent piece of
3292
it, encapsulate the piece in a function and give it a name.
3293
3294
\item Generalize the function by adding appropriate parameters.
3295
3296
\item Repeat steps 1--3 until you have a set of working functions.
3297
Copy and paste working code to avoid retyping (and re-debugging).
3298
3299
\item Look for opportunities to improve the program by refactoring.
3300
For example, if you have similar code in several places, consider
3301
factoring it into an appropriately general function.
3302
3303
\end{enumerate}
3304
3305
This process has some drawbacks---we will see alternatives later---but
3306
it can be useful if you don't know ahead of time how to divide the
3307
program into functions. This approach lets you design as you go
3308
along.
3309
3310
3311
\section{docstring}
3312
\label{docstring}
3313
\index{docstring}
3314
3315
A {\bf docstring} is a string at the beginning of a function that
3316
explains the interface (``doc'' is short for ``documentation''). Here
3317
is an example:
3318
3319
\begin{verbatim}
def polyline(t, n, length, angle):
    """Draws n line segments with the given length and
    angle (in degrees) between them.  t is a turtle.
    """
    for i in range(n):
        t.fd(length)
        t.lt(angle)
\end{verbatim}
3328
%
3329
By convention, all docstrings are triple-quoted strings, also known
3330
as multiline strings because the triple quotes allow the string
3331
to span more than one line.
3332
\index{quotation mark}
3333
\index{triple-quoted string}
3334
\index{string!triple-quoted}
3335
\index{multiline string}
3336
\index{string!multiline}
3337
3338
The docstring in this example is terse, but it contains the essential information
3339
someone would need to use this function. It explains concisely what
3340
the function does (without getting into the details of how it does
3341
it). It explains what effect each parameter has on the behavior of
3342
the function and what type each parameter should be (if it is not
3343
obvious).
3344
3345
Writing this kind of documentation is an important part of interface
3346
design. A well-designed interface should be simple to explain;
3347
if you have a hard time explaining one of your functions,
3348
maybe the interface could be improved.
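
Python keeps the docstring with the function, so you can read it back.
The sketch below uses the \verb"__doc__" attribute, a standard Python
feature that this chapter has not introduced.

\begin{verbatim}
def polyline(t, n, length, angle):
    """Draws n line segments with the given length and
    angle (in degrees) between them.  t is a turtle.
    """
    for i in range(n):
        t.fd(length)
        t.lt(angle)

# The docstring is stored as an attribute of the function object.
print(polyline.__doc__)
\end{verbatim}
%
Running this displays the text of the docstring, which is what
someone would read to learn how to use {\tt polyline} without
reading its body.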
3349
3350
3351
\section{Debugging}
3352
\index{debugging}
3353
\index{interface}
3354
3355
An interface is like a contract between a function and a caller.
3356
The caller agrees to provide certain parameters and the function
3357
agrees to do certain work.
3358
3359
For example, {\tt polyline} requires four arguments: {\tt t} has to be
3360
a Turtle; {\tt n} has to be an
3361
integer; {\tt length} should be a positive number; and {\tt
3362
angle} has to be a number, which is understood to be in degrees.
3363
3364
These requirements are called {\bf preconditions} because they
3365
are supposed to be true before the function starts executing.
3366
Conversely, conditions at the end of the function are
3367
{\bf postconditions}. Postconditions include the intended
3368
effect of the function (like drawing line segments) and any
3369
side effects (like moving the Turtle or making other changes).
3370
\index{precondition}
3371
\index{postcondition}
3372
3373
Preconditions are the responsibility of the caller. If the caller
3374
violates a (properly documented!) precondition and the function
3375
doesn't work correctly, the bug is in the caller, not the function.
3376
3377
If the preconditions are satisfied and the postconditions are
3378
not, the bug is in the function. If your pre- and postconditions
3379
are clear, they can help with debugging.
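
The preconditions for {\tt polyline} can also be written down as code.
The following sketch is one possible way to do it, not something from
{\tt polygon.py}: the function \verb"check_polyline_args" is
hypothetical, and it uses the built-in function {\tt isinstance} and
{\tt return} statements, which are covered later in the book. It
leaves out the check on {\tt t}.

\begin{verbatim}
def check_polyline_args(n, length, angle):
    """Returns True if the arguments satisfy the documented
    preconditions of polyline (hypothetical helper).
    """
    if not isinstance(n, int):
        return False
    if not isinstance(length, (int, float)) or length <= 0:
        return False
    if not isinstance(angle, (int, float)):
        return False
    return True

print(check_polyline_args(7, 70, 360 / 7))   # valid arguments
print(check_polyline_args(7.5, 70, 45))      # n is not an integer
print(check_polyline_args(7, -70, 45))       # length is not positive
\end{verbatim}
%
The first call returns {\tt True} and the other two return
{\tt False}. A caller could run a check like this while debugging to
find out which side of the contract was broken.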
3380
3381
3382
\section{Glossary}
3383
3384
\begin{description}
3385
3386
\item[method:] A function that is associated with an object and called
3387
using dot notation.
3388
\index{method}
3389
3390
\item[loop:] A part of a program that can run repeatedly.
3391
\index{loop}
3392
3393
\item[encapsulation:] The process of transforming a sequence of
3394
statements into a function definition.
3395
\index{encapsulation}
3396
3397
\item[generalization:] The process of replacing something
3398
unnecessarily specific (like a number) with something appropriately
3399
general (like a variable or parameter).
3400
\index{generalization}
3401
3402
\item[keyword argument:] An argument that includes the name of
3403
the parameter as a ``keyword''.
3404
\index{keyword argument}
3405
\index{argument!keyword}
3406
3407
\item[interface:] A description of how to use a function, including
3408
the name and descriptions of the arguments and return value.
3409
\index{interface}
3410
3411
\item[refactoring:] The process of modifying a working program to
3412
improve function interfaces and other qualities of the code.
3413
\index{refactoring}
3414
3415
\item[development plan:] A process for writing programs.
3416
\index{development plan}
3417
3418
\item[docstring:] A string that appears at the top of a function
3419
definition to document the function's interface.
3420
\index{docstring}
3421
3422
\item[precondition:] A requirement that should be satisfied by
3423
the caller before a function starts.
3424
\index{precondition}
3425
3426
\item[postcondition:] A requirement that should be satisfied by
3427
the function before it ends.
3428
\index{postcondition}
3429
3430
\end{description}
3431
3432
3433
\section{Exercises}
3434
3435
\begin{exercise}
3436
3437
Download the code in this chapter from
3438
\url{http://thinkpython2.com/code/polygon.py}.
3439
3440
\begin{enumerate}
3441
3442
\item Draw a stack diagram that shows the state of the program
3443
while executing {\tt circle(bob, radius)}. You can do the
3444
arithmetic by hand or add {\tt print} statements to the code.
3445
\index{stack diagram}
3446
3447
\item The version of {\tt arc} in Section~\ref{refactoring} is not
3448
very accurate because the linear approximation of the
3449
circle is always outside the true circle. As a result,
3450
the Turtle ends up a few pixels away from the correct
3451
destination. My solution shows a way to reduce
3452
the effect of this error. Read the code and see if it makes
3453
sense to you. If you draw a diagram, you might see how it works.
3454
3455
\end{enumerate}
3456
3457
\end{exercise}
3458
3459
\begin{figure}
3460
\centerline
3461
{\includegraphics[scale=0.8]{figs/flowers.pdf}}
3462
\caption{Turtle flowers.}
3463
\label{fig.flowers}
3464
\end{figure}
3465
3466
\begin{exercise}
3467
\index{flower}
3468
3469
Write an appropriately general set of functions that
3470
can draw flowers as in Figure~\ref{fig.flowers}.
3471
3472
Solution: \url{http://thinkpython2.com/code/flower.py},
3473
also requires \url{http://thinkpython2.com/code/polygon.py}.
3474
3475
\end{exercise}
3476
3477
\begin{figure}
3478
\centerline
3479
{\includegraphics[scale=0.8]{figs/pies.pdf}}
3480
\caption{Turtle pies.}
3481
\label{fig.pies}
3482
\end{figure}
3483
3484
3485
\begin{exercise}
3486
\index{pie}
3487
3488
Write an appropriately general set of functions that
3489
can draw shapes as in Figure~\ref{fig.pies}.
3490
3491
Solution: \url{http://thinkpython2.com/code/pie.py}.
3492
3493
\end{exercise}
3494
3495
\begin{exercise}
3496
\index{alphabet}
3497
\index{turtle typewriter}
3498
\index{typewriter, turtle}
3499
3500
The letters of the alphabet can be constructed from a moderate number
3501
of basic elements, like vertical and horizontal lines and a few
3502
curves. Design an alphabet that can be drawn with a minimal
3503
number of basic elements and then write functions that draw the letters.
3504
3505
You should write one function for each letter, with names
3506
\verb"draw_a", \verb"draw_b", etc., and put your functions
3507
in a file named {\tt letters.py}. You can download a
3508
``turtle typewriter'' from \url{http://thinkpython2.com/code/typewriter.py}
3509
to help you test your code.
3510
3511
You can get a solution from \url{http://thinkpython2.com/code/letters.py};
3512
it also requires
3513
\url{http://thinkpython2.com/code/polygon.py}.
3514
3515
\end{exercise}
3516
3517
\begin{exercise}
3518
3519
Read about spirals at \url{http://en.wikipedia.org/wiki/Spiral}; then
3520
write a program that draws an Archimedean spiral (or one of the other
3521
kinds). Solution: \url{http://thinkpython2.com/code/spiral.py}.
3522
\index{spiral}
3523
\index{Archimedean spiral}
3524
3525
\end{exercise}
3526
3527
3528
\chapter{Conditionals and recursion}
3529
3530
The main topic of this chapter is the {\tt if} statement, which
3531
executes different code depending on the state of the program.
3532
But first I want to introduce two new operators: floor division
3533
and modulus.
3534
3535
3536
\section{Floor division and modulus}
3537
3538
The {\bf floor division} operator, \verb"//", divides
3539
two numbers and rounds down to an integer. For example, suppose the
3540
run time of a movie is 105 minutes. You might want to know how
3541
long that is in hours. Conventional division
3542
returns a floating-point number:
3543
3544
\begin{verbatim}
>>> minutes = 105
>>> minutes / 60
1.75
\end{verbatim}
3549
3550
But we don't normally write hours with decimal points. Floor
3551
division returns the integer number of hours, rounding down:
3552
3553
\begin{verbatim}
>>> minutes = 105
>>> hours = minutes // 60
>>> hours
1
\end{verbatim}
3559
3560
To get the remainder, you could subtract off one hour in minutes:
3561
3562
\begin{verbatim}
>>> remainder = minutes - hours * 60
>>> remainder
45
\end{verbatim}
3567
3568
\index{floor division}
3569
\index{floating-point division}
3570
\index{division!floor}
3571
\index{division!floating-point}
3572
\index{modulus operator}
3573
\index{operator!modulus}
3574
3575
An alternative is to use the {\bf modulus operator}, \verb"%", which
3576
divides two numbers and returns the remainder.
3577
3578
\begin{verbatim}
>>> remainder = minutes % 60
>>> remainder
45
\end{verbatim}
3583
%
3584
The modulus operator is more useful than it seems. For
3585
example, you can check whether one number is divisible by another---if
3586
{\tt x \% y} is zero, then {\tt x} is divisible by {\tt y}.
3587
\index{divisibility}
3588
3589
Also, you can extract the right-most digit
3590
or digits from a number. For example, {\tt x \% 10} yields the
3591
right-most digit of {\tt x} (in base 10). Similarly {\tt x \% 100}
3592
yields the last two digits.
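
Putting the two operators together, here is a short sketch that
converts a run time to hours and minutes and then tries the
divisibility and digit tricks. The variable names and the sample
value 2718 are arbitrary.

\begin{verbatim}
minutes = 105
hours = minutes // 60
remainder = minutes % 60
print(hours, remainder)

# Floor division and modulus fit together like this:
print(hours * 60 + remainder == minutes)

x = 2718
print(x % 2 == 0)   # is x divisible by 2?
print(x % 10)       # right-most digit
print(x % 100)      # last two digits
\end{verbatim}
%
The run time comes out as 1 hour and 45 minutes, and the last three
lines display {\tt True}, {\tt 8} and {\tt 18}.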
3593
3594
If you are using Python 2, division works differently. The
3595
division operator, \verb"/", performs floor division if both
3596
operands are integers, and floating-point division if either
3597
operand is a {\tt float}.
3598
\index{Python 2}
3599
3600
3601
\section{Boolean expressions}
3602
\index{boolean expression}
3603
\index{expression!boolean}
3604
\index{logical operator}
3605
\index{operator!logical}
3606
3607
A {\bf boolean expression} is an expression that is either true
3608
or false. The following examples use the
3609
operator {\tt ==}, which compares two operands and produces
3610
{\tt True} if they are equal and {\tt False} otherwise:
3611
3612
\begin{verbatim}
>>> 5 == 5
True
>>> 5 == 6
False
\end{verbatim}
3618
%
3619
{\tt True} and {\tt False} are special
3620
values that belong to the type {\tt bool}; they are not strings:
3621
\index{True special value}
3622
\index{False special value}
3623
\index{special value!True}
3624
\index{special value!False}
3625
\index{bool type}
3626
\index{type!bool}
3627
3628
\begin{verbatim}
>>> type(True)
<class 'bool'>
>>> type(False)
<class 'bool'>
\end{verbatim}
3634
%
3635
The {\tt ==} operator is one of the {\bf relational operators}; the
3636
others are:
3637
3638
\begin{verbatim}
x != y      # x is not equal to y
x > y       # x is greater than y
x < y       # x is less than y
x >= y      # x is greater than or equal to y
x <= y      # x is less than or equal to y
\end{verbatim}
3645
%
3646
Although these operations are probably familiar to you, the Python
3647
symbols are different from the mathematical symbols. A common error
3648
is to use a single equal sign ({\tt =}) instead of a double equal sign
3649
({\tt ==}). Remember that {\tt =} is an assignment operator and
3650
{\tt ==} is a relational operator. There is no such thing as
3651
{\tt =<} or {\tt =>}.
3652
\index{relational operator}
3653
\index{operator!relational}
3654
3655
3656
\section{Logical operators}
3657
\index{logical operator}
3658
\index{operator!logical}
3659
3660
There are three {\bf logical operators}: {\tt and}, {\tt
3661
or}, and {\tt not}. The semantics (meaning) of these operators is
3662
similar to their meaning in English. For example,
3663
{\tt x > 0 and x < 10} is true only if {\tt x} is greater than 0
3664
{\em and} less than 10.
3665
\index{and operator}
3666
\index{or operator}
3667
\index{not operator}
3668
\index{operator!and}
3669
\index{operator!or}
3670
\index{operator!not}
3671
3672
{\tt n\%2 == 0 or n\%3 == 0} is true if {\em either or both} of the
3673
conditions is true, that is, if the number is divisible by 2 {\em or}
3674
3.
3675
3676
Finally, the {\tt not} operator negates a boolean
3677
expression, so {\tt not (x > y)} is true if {\tt x > y} is false,
3678
that is, if {\tt x} is less than or equal to {\tt y}.
3679
3680
Strictly speaking, the operands of the logical operators should be
3681
boolean expressions, but Python is not very strict.
3682
Any nonzero number is interpreted as {\tt True}:
3683
3684
\begin{verbatim}
>>> 42 and True
True
\end{verbatim}
3688
%
3689
This flexibility can be useful, but there are some subtleties to
3690
it that might be confusing. You might want to avoid it (unless
3691
you know what you are doing).
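
One of those subtleties is that {\tt and} and {\tt or} do not always
produce {\tt True} or {\tt False}; they return one of their operands.
The sketch below shows this standard Python behavior with a few
arbitrary values.

\begin{verbatim}
# and returns the first operand that counts as false,
# or the last operand if none does.
print(42 and True)     # True
print(0 and True)      # 0
print(42 and 'spam')   # spam

# or returns the first operand that counts as true,
# or the last operand if none does.
print(0 or 'spam')     # spam
print(42 or 'spam')    # 42
\end{verbatim}
%
So {\tt 0 and True} is {\tt 0}, not {\tt False}, which can be
surprising if you expect a boolean.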
3692
3693
3694
\section{Conditional execution}
3695
\label{conditional.execution}
3696
3697
\index{conditional statement}
3698
\index{statement!conditional}
3699
\index{if statement}
3700
\index{statement!if}
3701
\index{conditional execution}
3702
In order to write useful programs, we almost always need the ability
3703
to check conditions and change the behavior of the program
3704
accordingly. {\bf Conditional statements} give us this ability. The
3705
simplest form is the {\tt if} statement:
3706
3707
\begin{verbatim}
if x > 0:
    print('x is positive')
\end{verbatim}
3711
%
3712
The boolean expression after {\tt if} is
3713
called the {\bf condition}. If it is true, the indented
3714
statement runs. If not, nothing happens.
3715
\index{condition}
3716
\index{compound statement}
3717
\index{statement!compound}
3718
3719
{\tt if} statements have the same structure as function definitions:
3720
a header followed by an indented body. Statements like this are
3721
called {\bf compound statements}.
3722
3723
There is no limit on the number of statements that can appear in
3724
the body, but there has to be at least one.
3725
Occasionally, it is useful to have a body with no statements (usually
3726
as a place keeper for code you haven't written yet). In that
3727
case, you can use the {\tt pass} statement, which does nothing.
3728
\index{pass statement}
3729
\index{statement!pass}
3730
3731
\begin{verbatim}
if x < 0:
    pass          # TODO: need to handle negative values!
\end{verbatim}
3735
%
3736
3737
\section{Alternative execution}
3738
\label{alternative.execution}
3739
\index{alternative execution}
3740
\index{else keyword}
3741
\index{keyword!else}
3742
3743
A second form of the {\tt if} statement is ``alternative execution'',
3744
in which there are two possibilities and the condition determines
3745
which one runs. The syntax looks like this:
3746
3747
\begin{verbatim}
if x % 2 == 0:
    print('x is even')
else:
    print('x is odd')
\end{verbatim}
3753
%
3754
If the remainder when {\tt x} is divided by 2 is 0, then we know that
3755
{\tt x} is even, and the program displays an appropriate message. If
3756
the condition is false, the second set of statements runs.
3757
Since the condition must be true or false, exactly one of the
3758
alternatives will run. The alternatives are called {\bf
3759
branches}, because they are branches in the flow of execution.
3760
\index{branch}
3761
3762
3763
3764
\section{Chained conditionals}
3765
\index{chained conditional}
3766
\index{conditional!chained}
3767
3768
Sometimes there are more than two possibilities and we need more than
3769
two branches. One way to express a computation like that is a {\bf
3770
chained conditional}:
3771
3772
\begin{verbatim}
if x < y:
    print('x is less than y')
elif x > y:
    print('x is greater than y')
else:
    print('x and y are equal')
\end{verbatim}
3780
%
3781
{\tt elif} is an abbreviation of ``else if''. Again, exactly one
3782
branch will run. There is no limit on the number of {\tt
3783
elif} statements. If there is an {\tt else} clause, it has to be
3784
at the end, but there doesn't have to be one.
3785
\index{elif keyword}
3786
\index{keyword!elif}
3787
3788
\begin{verbatim}
if choice == 'a':
    draw_a()
elif choice == 'b':
    draw_b()
elif choice == 'c':
    draw_c()
\end{verbatim}
3796
%
3797
Each condition is checked in order. If the first is false,
3798
the next is checked, and so on. If one of them is
3799
true, the corresponding branch runs and the statement
3800
ends. Even if more than one condition is true, only the
3801
first true branch runs.
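
Because only the first true branch runs, the order of the conditions
matters.  The following sketch is a deliberately misordered example
(the variable {\tt label} is made up for illustration): all three
conditions are true when {\tt x} is 500, but only the first branch
runs.

\begin{verbatim}
x = 500

if x > 0:
    label = 'positive'    # this branch runs
elif x > 10:
    label = 'big'         # also true, but never checked
elif x > 100:
    label = 'huge'        # also true, but never checked
else:
    label = 'zero or negative'

print(label)
\end{verbatim}

To get {\tt 'huge'} for large values, the most specific condition
would have to come first.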
3802
3803
3804
\section{Nested conditionals}
3805
\index{nested conditional}
3806
\index{conditional!nested}
3807
3808
One conditional can also be nested within another. We could have
3809
written the example in the previous section like this:
3810
3811
\begin{verbatim}
if x == y:
    print('x and y are equal')
else:
    if x < y:
        print('x is less than y')
    else:
        print('x is greater than y')
\end{verbatim}
3820
%
3821
The outer conditional contains two branches. The
3822
first branch contains a simple statement. The second branch
3823
contains another {\tt if} statement, which has two branches of its
3824
own. Those two branches are both simple statements,
3825
although they could have been conditional statements as well.
3826
3827
Although the indentation of the statements makes the structure
3828
apparent, {\bf nested conditionals} become difficult to read very
3829
quickly. It is a good idea to avoid them when you can.
3830
3831
Logical operators often provide a way to simplify nested conditional
3832
statements. For example, we can rewrite the following code using a
3833
single conditional:
3834
3835
\begin{verbatim}
if 0 < x:
    if x < 10:
        print('x is a positive single-digit number.')
\end{verbatim}
3840
%
3841
The {\tt print} statement runs only if we make it past both
3842
conditionals, so we can get the same effect with the {\tt and} operator:
3843
3844
\begin{verbatim}
if 0 < x and x < 10:
    print('x is a positive single-digit number.')
\end{verbatim}
3848
3849
For this kind of condition, Python provides a more concise option:
3850
3851
\begin{verbatim}
if 0 < x < 10:
    print('x is a positive single-digit number.')
\end{verbatim}
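
If you want to convince yourself that the three forms agree, a quick
check like the following sketch might help.  It uses a list of sample
values and the names {\tt nested}, \verb"with_and", {\tt chained} and
{\tt results}, all of which are made up for this illustration; lists
are covered later in the book.

\begin{verbatim}
results = []
for x in [-5, 0, 3, 9, 10]:
    # nested form
    nested = False
    if 0 < x:
        if x < 10:
            nested = True
    # single condition with `and`
    with_and = 0 < x and x < 10
    # chained comparison
    chained = 0 < x < 10
    results.append((x, nested, with_and, chained))
    print(x, nested, with_and, chained)
\end{verbatim}

For each value of {\tt x}, the three results on a line should match.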
3855
3856
3857
\section{Recursion}
3858
\label{recursion}
3859
\index{recursion}
3860
3861
It is legal for one function to call another;
3862
it is also legal for a function to call itself. It may not be obvious
3863
why that is a good thing, but it turns out to be one of the most
3864
magical things a program can do.
3865
For example, look at the following function:
3866
3867
\begin{verbatim}
def countdown(n):
    if n <= 0:
        print('Blastoff!')
    else:
        print(n)
        countdown(n-1)
\end{verbatim}
3875
%
3876
If {\tt n} is 0 or negative, it outputs the word, ``Blastoff!''
3877
Otherwise, it outputs {\tt n} and then calls a function named {\tt
3878
countdown}---itself---passing {\tt n-1} as an argument.
3879
3880
What happens if we call this function like this?
3881
3882
\begin{verbatim}
>>> countdown(3)
\end{verbatim}
3885
%
3886
The execution of {\tt countdown} begins with {\tt n=3}, and since
3887
{\tt n} is greater than 0, it outputs the value 3, and then calls itself...
3888
3889
\begin{quote}
3890
The execution of {\tt countdown} begins with {\tt n=2}, and since
3891
{\tt n} is greater than 0, it outputs the value 2, and then calls itself...
3892
3893
\begin{quote}
3894
The execution of {\tt countdown} begins with {\tt n=1}, and since
3895
{\tt n} is greater than 0, it outputs the value 1, and then calls itself...
3896
3897
\begin{quote}
3898
The execution of {\tt countdown} begins with {\tt n=0}, and since {\tt
3899
n} is not greater than 0, it outputs the word, ``Blastoff!'' and then
3900
returns.
3901
\end{quote}
3902
3903
The {\tt countdown} that got {\tt n=1} returns.
3904
\end{quote}
3905
3906
The {\tt countdown} that got {\tt n=2} returns.
3907
\end{quote}
3908
3909
The {\tt countdown} that got {\tt n=3} returns.
3910
3911
And then you're back in \verb"__main__". So, the
3912
total output looks like this:
3913
\index{main}
3914
3915
\begin{verbatim}
3
2
1
Blastoff!
\end{verbatim}
3921
%
3922
A function that calls itself is {\bf recursive}; the process of
3923
executing it is called {\bf recursion}.
3924
\index{recursion}
3925
\index{function!recursive}
3926
3927
As another example, we can write a function that prints a
3928
string {\tt n} times.
3929
3930
\begin{verbatim}
def print_n(s, n):
    if n <= 0:
        return
    print(s)
    print_n(s, n-1)
\end{verbatim}
3937
%
3938
If {\tt n <= 0} the {\bf return statement} exits the function. The
3939
flow of execution immediately returns to the caller, and the remaining
3940
lines of the function don't run.
3941
\index{return statement}
3942
\index{statement!return}
3943
3944
The rest of the function is similar to {\tt countdown}: it displays
3945
{\tt s} and then calls itself to display {\tt s} $n-1$ additional
3946
times. So the number of lines of output is {\tt 1 + (n - 1)}, which
3947
adds up to {\tt n}.
3948
3949
For simple examples like this, it is probably easier to use a {\tt
3950
for} loop. But we will see examples later that are hard to write
3951
with a {\tt for} loop and easy to write with recursion, so it is
3952
good to start early.
3953
\index{for loop}
3954
\index{loop!for}
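
For comparison, here is a sketch of what a loop version might look
like next to the recursive one.  The name \verb"print_n_loop" is
made up for this illustration; it is not defined elsewhere in the
book.

\begin{verbatim}
def print_n(s, n):
    """Recursive version from the text."""
    if n <= 0:
        return
    print(s)
    print_n(s, n-1)

def print_n_loop(s, n):
    """Loop version: range(n) is empty when n <= 0,
    so nothing is printed in that case either."""
    for i in range(n):
        print(s)

print_n('Hello', 2)
print_n_loop('Hello', 2)
\end{verbatim}

Both calls should display {\tt Hello} twice.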
3955
3956
3957
\section{Stack diagrams for recursive functions}
3958
\label{recursive.stack}
3959
\index{stack diagram}
3960
\index{function frame}
3961
\index{frame}
3962
3963
In Section~\ref{stackdiagram}, we used a stack diagram to represent
3964
the state of a program during a function call. The same kind of
3965
diagram can help interpret a recursive function.
3966
3967
Every time a function gets called, Python creates a
3968
frame to contain the function's local variables and parameters.
3969
For a recursive function, there might be more than one frame on the
3970
stack at the same time.
3971
3972
Figure~\ref{fig.stack2} shows a stack diagram for {\tt countdown} called with
3973
{\tt n = 3}.
3974
3975
\begin{figure}
3976
\centerline
3977
{\includegraphics[scale=0.8]{figs/stack2.pdf}}
3978
\caption{Stack diagram.}
3979
\label{fig.stack2}
3980
\end{figure}
3981
3982
3983
As usual, the top of the stack is the frame for \verb"__main__".
3984
It is empty because we did not create any variables in
3985
\verb"__main__" or pass any arguments to it.
3986
\index{base case}
3987
\index{recursion!base case}
3988
3989
The four {\tt countdown} frames have different values for the
3990
parameter {\tt n}. The bottom of the stack, where {\tt n=0}, is
3991
called the {\bf base case}. It does not make a recursive call, so
3992
there are no more frames.
3993
3994
As an exercise, draw a stack diagram for \verb"print_n" called with
3995
\verb"s = 'Hello'" and {\tt n=2}.
3996
Then write a function called \verb"do_n" that takes a function
3997
object and a number, {\tt n}, as arguments, and that calls
3998
the given function {\tt n} times.
3999
4000
4001
\section{Infinite recursion}
4002
\index{infinite recursion}
4003
\index{recursion!infinite}
4004
\index{runtime error}
4005
\index{error!runtime}
4006
\index{traceback}
4007
4008
If a recursion never reaches a base case, it goes on making
4009
recursive calls forever, and the program never terminates. This is
4010
known as {\bf infinite recursion}, and it is generally not
4011
a good idea. Here is a minimal program with an infinite recursion:
4012
4013
\begin{verbatim}
def recurse():
    recurse()
\end{verbatim}
4017
%
4018
In most programming environments, a program with infinite recursion
4019
does not really run forever. Python reports an error
4020
message when the maximum recursion depth is reached:
4021
\index{exception!RecursionError}
\index{RecursionError}
\index{exception!RuntimeError}
\index{RuntimeError}

\begin{verbatim}
  File "<stdin>", line 2, in recurse
  File "<stdin>", line 2, in recurse
  File "<stdin>", line 2, in recurse
                  .
                  .
                  .
  File "<stdin>", line 2, in recurse
RecursionError: maximum recursion depth exceeded
\end{verbatim}
%
This traceback is a little bigger than the one we saw in the
previous chapter.  When the error occurs, there are about 1000
{\tt recurse} frames on the stack!  (In Python 3.5 and later the
exception is called {\tt RecursionError}, which is a kind of
{\tt RuntimeError}; older versions report {\tt RuntimeError}
itself.)
4038
4039
If you encounter an infinite recursion by accident, review
4040
your function to confirm that there is a base case that does not
4041
make a recursive call. And if there is a base case, check whether
4042
you are guaranteed to reach it.
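
As a sketch of the second situation, the function below has a base
case but is not guaranteed to reach it.  The name
\verb"countdown_bad" is made up for this illustration, and the
example assumes Python 3.5 or later, where the exception is named
{\tt RecursionError}.  It also uses {\tt try} and {\tt except},
which are explained later in the book.

\begin{verbatim}
import sys

def countdown_bad(n):
    """Buggy variant: if n starts out negative, n == 0
    is never true, so the base case is never reached."""
    if n == 0:
        print('Blastoff!')
    else:
        countdown_bad(n-1)

# The limit Python enforces; the default is typically 1000.
print(sys.getrecursionlimit())

try:
    countdown_bad(-1)     # goes to -2, -3, ... and never stops
except RecursionError:
    print('never reached the base case')
\end{verbatim}

Changing the condition to {\tt n <= 0}, as in {\tt countdown},
guarantees that the base case is reached for any integer argument.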
4043
4044
4045
\section{Keyboard input}
4046
\index{keyboard input}
4047
4048
The programs we have written so far accept no input from the user.
4049
They just do the same thing every time.
4050
4051
Python provides a built-in function called {\tt input} that
4052
stops the program and
4053
waits for the user to type something. When the user presses {\sf
4054
Return} or {\sf Enter}, the program resumes and \verb"input"
4055
returns what the user typed as a string. In Python 2, the same
4056
function is called \verb"raw_input".
4057
\index{Python 2}
4058
\index{input function}
4059
\index{function!input}
4060
4061
\begin{verbatim}
>>> text = input()
What are you waiting for?
>>> text
'What are you waiting for?'
\end{verbatim}
4067
%
4068
Before getting input from the user, it is a good idea to print a
4069
prompt telling the user what to type. \verb"input" can take a
4070
prompt as an argument:
4071
\index{prompt}
4072
4073
\begin{verbatim}
>>> name = input('What...is your name?\n')
What...is your name?
Arthur, King of the Britons!
>>> name
'Arthur, King of the Britons!'
\end{verbatim}
4080
%
4081
The sequence \verb"\n" at the end of the prompt represents a {\bf
4082
newline}, which is a special character that causes a line break.
4083
That's why the user's input appears below the prompt. \index{newline}
4084
4085
If you expect the user to type an integer, you can try to convert
4086
the return value to {\tt int}:
4087
4088
\begin{verbatim}
>>> prompt = 'What...is the airspeed velocity of an unladen swallow?\n'
>>> speed = input(prompt)
What...is the airspeed velocity of an unladen swallow?
42
>>> int(speed)
42
\end{verbatim}
4096
%
4097
But if the user types something other than a string of digits,
4098
you get an error:
4099
4100
\begin{verbatim}
>>> speed = input(prompt)
What...is the airspeed velocity of an unladen swallow?
What do you mean, an African or a European swallow?
>>> int(speed)
ValueError: invalid literal for int() with base 10
\end{verbatim}
4107
%
4108
We will see how to handle this kind of error later.
4109
\index{ValueError}
4110
\index{exception!ValueError}
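
In the meantime, one way to avoid the error is to check the string
before converting it.  The following is only a sketch: the function
name \verb"parse_speed" is made up for this illustration, and it
uses a string method and a return value, both of which are covered
later in the book.  Note that it rejects negative numbers such as
{\tt '-5'}, because the minus sign is not a digit.

\begin{verbatim}
def parse_speed(text):
    """Return text as an int if it consists only of
    decimal digits; otherwise return None."""
    text = text.strip()        # ignore surrounding spaces
    if text.isdecimal():       # False for '' and for non-digits
        return int(text)
    return None

print(parse_speed('42'))
print(parse_speed('What do you mean, an African or a European swallow?'))
\end{verbatim}

In a real program you would pass the result of {\tt input} to this
function and ask again if it returns {\tt None}.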
4111
4112
4113
\section{Debugging}
4114
\label{whitespace}
4115
\index{debugging}
4116
\index{traceback}
4117
4118
When a syntax or runtime error occurs, the error message contains
4119
a lot of information, but it can be overwhelming. The most
4120
useful parts are usually:
4121
4122
\begin{itemize}
4123
4124
\item What kind of error it was, and
4125
4126
\item Where it occurred.
4127
4128
\end{itemize}
4129
4130
Syntax errors are usually easy to find, but there are a few
4131
gotchas. Whitespace errors can be tricky because spaces and
4132
tabs are invisible and we are used to ignoring them.
4133
\index{whitespace}
4134
4135
\begin{verbatim}
>>> x = 5
>>>  y = 6
  File "<stdin>", line 1
    y = 6
    ^
IndentationError: unexpected indent
\end{verbatim}
4143
%
4144
In this example, the problem is that the second line is indented by
4145
one space. But the error message points to {\tt y}, which is
4146
misleading. In general, error messages indicate where the problem was
4147
discovered, but the actual error might be earlier in the code,
4148
sometimes on a previous line.
4149
\index{error!runtime}
4150
\index{runtime error}
4151
4152
The same is true of runtime errors. Suppose you are trying
4153
to compute a signal-to-noise ratio in decibels. The formula
4154
is $SNR_{db} = 10 \log_{10} (P_{signal} / P_{noise})$. In Python,
4155
you might write something like this:
4156
4157
\begin{verbatim}
import math
signal_power = 9
noise_power = 10
ratio = signal_power // noise_power
decibels = 10 * math.log10(ratio)
print(decibels)
\end{verbatim}
4165
%
4166
When you run this program, you get an exception:
4167
%
4168
\index{exception!ValueError}
\index{ValueError}

\begin{verbatim}
Traceback (most recent call last):
  File "snr.py", line 5, in <module>
    decibels = 10 * math.log10(ratio)
ValueError: math domain error
\end{verbatim}
4177
%
4178
The error message indicates line 5, but there is nothing
4179
wrong with that line. To find the real error, it might be
4180
useful to print the value of {\tt ratio}, which turns out to
4181
be 0. The problem is in line 4, which uses floor division
4182
instead of floating-point division.
4183
\index{floor division}
4184
\index{division!floor}
4185
4186
You should take the time to read error messages carefully, but don't
4187
assume that everything they say is correct.
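
For reference, here is a sketch of the signal-to-noise program with
the real error fixed, that is, with floating-point division in place
of floor division.  The extra {\tt print} is only there to check the
intermediate value.

\begin{verbatim}
import math

signal_power = 9
noise_power = 10
ratio = signal_power / noise_power    # 0.9 rather than 0
print('ratio is', ratio)              # check the intermediate value
decibels = 10 * math.log10(ratio)
print(decibels)
\end{verbatim}

With these values the ratio is less than 1, so the result is a small
negative number of decibels.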
4188
4189
4190
\section{Glossary}
4191
4192
\begin{description}
4193
4194
\item[floor division:] An operator, denoted {\tt //}, that divides two
4195
numbers and rounds down (toward negative infinity) to an integer.
4196
\index{floor division}
4197
\index{division!floor}
4198
4199
\item[modulus operator:] An operator, denoted with a percent sign
4200
({\tt \%}), that works on integers and returns the remainder when one
4201
number is divided by another.
4202
\index{modulus operator}
4203
\index{operator!modulus}
4204
4205
\item[boolean expression:] An expression whose value is either
4206
{\tt True} or {\tt False}.
4207
\index{boolean expression}
4208
\index{expression!boolean}
4209
4210
\item[relational operator:] One of the operators that compares
4211
its operands: {\tt ==}, {\tt !=}, {\tt >}, {\tt <}, {\tt >=}, and {\tt <=}.
4212
4213
\item[logical operator:] One of the operators that combines boolean
4214
expressions: {\tt and}, {\tt or}, and {\tt not}.
4215
4216
\item[conditional statement:] A statement that controls the flow of
4217
execution depending on some condition.
4218
\index{conditional statement}
4219
\index{statement!conditional}
4220
4221
\item[condition:] The boolean expression in a conditional statement
4222
that determines which branch runs.
4223
\index{condition}
4224
4225
\item[compound statement:] A statement that consists of a header
4226
and a body. The header ends with a colon (:). The body is indented
4227
relative to the header.
4228
\index{compound statement}
4229
4230
\item[branch:] One of the alternative sequences of statements in
4231
a conditional statement.
4232
\index{branch}
4233
4234
\item[chained conditional:] A conditional statement with a series
4235
of alternative branches.
4236
\index{chained conditional}
4237
\index{conditional!chained}
4238
4239
\item[nested conditional:] A conditional statement that appears
4240
in one of the branches of another conditional statement.
4241
\index{nested conditional}
4242
\index{conditional!nested}
4243
4244
\item[return statement:] A statement that causes a function to
4245
end immediately and return to the caller.
4246
4247
\item[recursion:] The process of calling the function that is
4248
currently executing.
4249
\index{recursion}
4250
4251
\item[base case:] A conditional branch in a
4252
recursive function that does not make a recursive call.
4253
\index{base case}
4254
4255
\item[infinite recursion:] A recursion that doesn't have a
4256
base case, or never reaches it. Eventually, an infinite recursion
4257
causes a runtime error.
4258
\index{infinite recursion}
4259
4260
\end{description}
4261
4262
\section{Exercises}
4263
4264
\begin{exercise}
4265
4266
The {\tt time} module provides a function, also named {\tt time}, that
returns the current time as the number of seconds since ``the epoch'',
which is an arbitrary time used as a reference point.  On UNIX systems,
the epoch is 1 January 1970, 00:00:00 UTC.

\begin{verbatim}
>>> import time
>>> time.time()
1437746094.5735958
\end{verbatim}
4276
4277
Write a script that reads the current time and converts it to
4278
a time of day in hours, minutes, and seconds, plus the number of
4279
days since the epoch.
4280
4281
\end{exercise}
4282
4283
4284
\begin{exercise}
4285
\index{Fermat's Last Theorem}
4286
4287
Fermat's Last Theorem says that there are no positive integers
4288
$a$, $b$, and $c$ such that
4289
4290
\[ a^n + b^n = c^n \]
4291
%
4292
for any values of $n$ greater than 2.
4293
4294
\begin{enumerate}
4295
4296
\item Write a function named \verb"check_fermat" that takes four
4297
parameters---{\tt a}, {\tt b}, {\tt c} and {\tt n}---and
4298
checks to see if Fermat's theorem holds. If
4299
$n$ is greater than 2 and
4300
4301
\[a^n + b^n = c^n \]
4302
%
4303
the program should print, ``Holy smokes, Fermat was wrong!''
4304
Otherwise the program should print, ``No, that doesn't work.''
4305
4306
\item Write a function that prompts the user to input values
4307
for {\tt a}, {\tt b}, {\tt c} and {\tt n}, converts them to
4308
integers, and uses \verb"check_fermat" to check whether they
4309
violate Fermat's theorem.
4310
4311
\end{enumerate}
4312
4313
\end{exercise}
4314
4315
4316
\begin{exercise}
4317
\index{triangle}
4318
4319
If you are given three sticks, you may or may not be able to arrange
4320
them in a triangle. For example, if one of the sticks is 12 inches
4321
long and the other two are one inch long, you will
4322
not be able to get the short sticks to meet in the middle. For any
4323
three lengths, there is a simple test to see if it is possible to form
4324
a triangle:
4325
4326
\begin{quotation}
4327
If any of the three lengths is greater than the sum of the other
4328
two, then you cannot form a triangle. Otherwise, you
4329
can. (If the sum of two lengths equals the third, they form
4330
what is called a ``degenerate'' triangle.)
4331
\end{quotation}
4332
4333
\begin{enumerate}
4334
4335
\item Write a function named \verb"is_triangle" that takes three
4336
integers as arguments, and that prints either ``Yes'' or ``No'', depending
4337
on whether you can or cannot form a triangle from sticks with the
4338
given lengths.
4339
4340
\item Write a function that prompts the user to input three stick
4341
lengths, converts them to integers, and uses \verb"is_triangle" to
4342
check whether sticks with the given lengths can form a triangle.
4343
4344
\end{enumerate}
4345
4346
\end{exercise}
4347
4348
\begin{exercise}
4349
What is the output of the following program?
4350
Draw a stack diagram that shows the state of the program
4351
when it prints the result.
4352
4353
\begin{verbatim}
def recurse(n, s):
    if n == 0:
        print(s)
    else:
        recurse(n-1, n+s)

recurse(3, 0)
\end{verbatim}
4362
4363
\begin{enumerate}
4364
4365
\item What would happen if you called this function like this: {\tt
4366
recurse(-1, 0)}?
4367
4368
\item Write a docstring that explains everything someone would need to
4369
know in order to use this function (and nothing else).
4370
4371
\end{enumerate}
4372
4373
\end{exercise}
4374
4375
4376
The following exercises use the {\tt turtle} module, described in
4377
Chapter~\ref{turtlechap}:
4378
\index{TurtleWorld}
4379
4380
\begin{exercise}
4381
4382
Read the following function and see if you can figure out
4383
what it does (see the examples in Chapter~\ref{turtlechap}). Then run it
4384
and see if you got it right.
4385
4386
\begin{verbatim}
def draw(t, length, n):
    if n == 0:
        return
    angle = 50
    t.fd(length*n)
    t.lt(angle)
    draw(t, length, n-1)
    t.rt(2*angle)
    draw(t, length, n-1)
    t.lt(angle)
    t.bk(length*n)
\end{verbatim}
4399
4400
\end{exercise}
4401
4402
4403
\begin{figure}
4404
\centerline
4405
{\includegraphics[scale=0.8]{figs/koch.pdf}}
4406
\caption{A Koch curve.}
4407
\label{fig.koch}
4408
\end{figure}
4409
4410
\begin{exercise}
4411
\index{Koch curve}
4412
4413
The Koch curve is a fractal that looks something like
4414
Figure~\ref{fig.koch}. To draw a Koch curve with length $x$, all you
4415
have to do is
4416
4417
\begin{enumerate}
4418
4419
\item Draw a Koch curve with length $x/3$.
4420
4421
\item Turn left 60 degrees.
4422
4423
\item Draw a Koch curve with length $x/3$.
4424
4425
\item Turn right 120 degrees.
4426
4427
\item Draw a Koch curve with length $x/3$.
4428
4429
\item Turn left 60 degrees.
4430
4431
\item Draw a Koch curve with length $x/3$.
4432
4433
\end{enumerate}
4434
4435
The exception is if $x$ is less than 3: in that case,
4436
you can just draw a straight line with length $x$.
4437
4438
\begin{enumerate}
4439
4440
\item Write a function called {\tt koch} that takes a turtle and
4441
a length as parameters, and that uses the turtle to draw a Koch
4442
curve with the given length.
4443
4444
\item Write a function called {\tt snowflake} that draws three
4445
Koch curves to make the outline of a snowflake.
4446
4447
Solution: \url{http://thinkpython2.com/code/koch.py}.
4448
4449
\item The Koch curve can be generalized in several ways. See
4450
\url{http://en.wikipedia.org/wiki/Koch_snowflake} for examples and
4451
implement your favorite.
4452
4453
\end{enumerate}
4454
\end{exercise}
4455
4456
4457
\chapter{Fruitful functions}
4458
\label{fruitchap}
4459
4460
Many of the Python functions we have used, such as the math
4461
functions, produce return values. But the functions we've written
4462
are all void: they have an effect, like printing a value
4463
or moving a turtle, but they don't have a return value. In
4464
this chapter you will learn to write fruitful functions.
4465
4466
4467
\section{Return values}
4468
\index{return value}
4469
4470
Calling the function generates a return
4471
value, which we usually assign to a variable or use as part of an
4472
expression.
4473
4474
\begin{verbatim}
e = math.exp(1.0)
height = radius * math.sin(radians)
\end{verbatim}
4478
%
4479
The functions we have written so far are void. Speaking casually,
4480
they have no return value; more precisely,
4481
their return value is {\tt None}.
4482
4483
In this chapter, we are (finally) going to write fruitful functions.
4484
The first example is {\tt area}, which returns the area of a circle
4485
with the given radius:
4486
4487
\begin{verbatim}
def area(radius):
    a = math.pi * radius**2
    return a
\end{verbatim}
4492
%
4493
We have seen the {\tt return} statement before, but in a fruitful
4494
function the {\tt return} statement includes
4495
an expression. This statement means: ``Return immediately from
4496
this function and use the following expression as a return value.''
4497
The expression can be arbitrarily complicated, so we could
4498
have written this function more concisely:
4499
\index{return statement}
4500
\index{statement!return}
4501
4502
\begin{verbatim}
def area(radius):
    return math.pi * radius**2
\end{verbatim}
4506
%
4507
On the other hand, {\bf temporary variables} like {\tt a} can make
4508
debugging easier.
4509
\index{temporary variable}
4510
\index{variable!temporary}
4511
4512
Sometimes it is useful to have multiple return statements, one in each
4513
branch of a conditional:
4514
4515
\begin{verbatim}
def absolute_value(x):
    if x < 0:
        return -x
    else:
        return x
\end{verbatim}
4522
%
4523
Since these {\tt return} statements are in an alternative conditional,
4524
only one runs.
4525
4526
As soon as a return statement runs, the function
4527
terminates without executing any subsequent statements.
4528
Code that appears after a {\tt return} statement, or any other place
4529
the flow of execution can never reach, is called {\bf dead code}.
4530
\index{dead code}
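
As a small illustration (a sketch, not one of the book's own
examples), the last line of the following version is dead code:
both branches of the conditional return before the flow of execution
can get to it.

\begin{verbatim}
def absolute_value(x):
    if x < 0:
        return -x
    else:
        return x
    print('this line is dead code')   # never runs

print(absolute_value(-3))
\end{verbatim}

The function still returns the right answer, but the message is
never displayed.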
4531
4532
In a fruitful function, it is a good idea to ensure
4533
that every possible path through the program hits a
4534
{\tt return} statement. For example:
4535
4536
\begin{verbatim}
def absolute_value(x):
    if x < 0:
        return -x
    if x > 0:
        return x
\end{verbatim}
4543
%
4544
This function is incorrect because if {\tt x} happens to be 0,
4545
neither condition is true, and the function ends without hitting a
4546
{\tt return} statement. If the flow of execution gets to the end
4547
of a function, the return value is {\tt None}, which is not
4548
the absolute value of 0.
4549
\index{None special value}
4550
\index{special value!None}
4551
4552
\begin{verbatim}
>>> print(absolute_value(0))
None
\end{verbatim}
4556
%
4557
By the way, Python provides a built-in function called
4558
{\tt abs} that computes absolute values.
4559
\index{abs function}
4560
\index{function!abs}
4561
4562
As an exercise, write a {\tt compare} function that
4563
takes two values, {\tt x} and {\tt y}, and returns {\tt 1} if {\tt x > y},
4564
{\tt 0} if {\tt x == y}, and {\tt -1} if {\tt x < y}.
4565
\index{compare function}
4566
\index{function!compare}
4567
4568
4569
\section{Incremental development}
4570
\label{incremental.development}
4571
\index{development plan!incremental}
4572
4573
As you write larger functions, you might find yourself
4574
spending more time debugging.
4575
4576
To deal with increasingly complex programs,
4577
you might want to try a process called
4578
{\bf incremental development}. The goal of incremental development
4579
is to avoid long debugging sessions by adding and testing only
4580
a small amount of code at a time.
4581
\index{testing!incremental development}
4582
\index{Pythagorean theorem}
4583
4584
As an example, suppose you want to find the distance between two
4585
points, given by the coordinates $(x_1, y_1)$ and $(x_2, y_2)$.
4586
By the Pythagorean theorem, the distance is:
4587
4588
\begin{displaymath}
4589
\mathrm{distance} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}
4590
\end{displaymath}
4591
%
4592
The first step is to consider what a {\tt distance} function should
4593
look like in Python. In other words, what are the inputs (parameters)
4594
and what is the output (return value)?
4595
4596
In this case, the inputs are two points, which you can represent
4597
using four numbers. The return value is the distance represented by
4598
a floating-point value.
4599
4600
Immediately you can write an outline of the function:
4601
4602
\begin{verbatim}
def distance(x1, y1, x2, y2):
    return 0.0
\end{verbatim}
4606
%
4607
Obviously, this version doesn't compute distances; it always returns
4608
zero. But it is syntactically correct, and it runs, which means that
4609
you can test it before you make it more complicated.
4610
4611
To test the new function, call it with sample arguments:
4612
4613
\begin{verbatim}
>>> distance(1, 2, 4, 6)
0.0
\end{verbatim}
4617
%
4618
I chose these values so that the horizontal distance is 3 and the
4619
vertical distance is 4; that way, the result is 5, the hypotenuse
4620
of a 3-4-5 triangle. When testing a function, it is
4621
useful to know the right answer.
4622
\index{testing!knowing the answer}
4623
4624
At this point we have confirmed that the function is syntactically
4625
correct, and we can start adding code to the body.
4626
A reasonable next step is to find the differences
4627
$x_2 - x_1$ and $y_2 - y_1$. The next version stores those values in
4628
temporary variables and prints them.
4629
4630
\begin{verbatim}
def distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    print('dx is', dx)
    print('dy is', dy)
    return 0.0
\end{verbatim}
4638
%
4639
If the function is working, it should display \verb"dx is 3" and
4640
\verb"dy is 4". If so, we know that the function is getting the right
4641
arguments and performing the first computation correctly. If not,
4642
there are only a few lines to check.
4643
4644
Next we compute the sum of squares of {\tt dx} and {\tt dy}:
4645
4646
\begin{verbatim}
def distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    dsquared = dx**2 + dy**2
    print('dsquared is: ', dsquared)
    return 0.0
\end{verbatim}
4654
%
4655
Again, you would run the program at this stage and check the output
4656
(which should be 25).
4657
Finally, you can use {\tt math.sqrt} to compute and return the result:
4658
\index{sqrt}
4659
\index{function!sqrt}
4660
4661
\begin{verbatim}
def distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    dsquared = dx**2 + dy**2
    result = math.sqrt(dsquared)
    return result
\end{verbatim}
4669
%
4670
If that works correctly, you are done. Otherwise, you might
4671
want to print the value of {\tt result} before the return
4672
statement.

The final version of the function doesn't display anything when it
runs; it only returns a value. The {\tt print} statements we wrote
are useful for debugging, but once you get the function working, you
should remove them. Code like that is called {\bf scaffolding}
because it is helpful for building the program but is not part of the
final product.
\index{scaffolding}

When you start out, you should add only a line or two of code at a
time. As you gain more experience, you might find yourself writing
and debugging bigger chunks. Either way, incremental development
can save you a lot of debugging time.

The key aspects of the process are:

\begin{enumerate}

\item Start with a working program and make small incremental changes.
At any point, if there is an error, you should have a good idea
where it is.

\item Use variables to hold intermediate values so you can
display and check them.

\item Once the program is working, you might want to remove some of
the scaffolding or consolidate multiple statements into compound
expressions, but only if it does not make the program difficult to
read.

\end{enumerate}

As an exercise, use incremental development to write a function
called {\tt hypotenuse} that returns the length of the hypotenuse of a
right triangle given the lengths of the two legs as arguments.
Record each stage of the development process as you go.
\index{hypotenuse}


\section{Composition}
\index{composition}
\index{function composition}

As you should expect by now, you can call one function from within
another. As an example, we'll write a function that takes two points,
the center of the circle and a point on the perimeter, and computes
the area of the circle.

Assume that the center point is stored in the variables {\tt xc} and
{\tt yc}, and the perimeter point is in {\tt xp} and {\tt yp}. The
first step is to find the radius of the circle, which is the distance
between the two points. We just wrote a function, {\tt
distance}, that does that:

\begin{verbatim}
radius = distance(xc, yc, xp, yp)
\end{verbatim}
%
The next step is to find the area of a circle with that radius;
we just wrote that, too:

\begin{verbatim}
result = area(radius)
\end{verbatim}
%
Encapsulating these steps in a function, we get:
\index{encapsulation}

\begin{verbatim}
def circle_area(xc, yc, xp, yp):
    radius = distance(xc, yc, xp, yp)
    result = area(radius)
    return result
\end{verbatim}
%
The temporary variables {\tt radius} and {\tt result} are useful for
development and debugging, but once the program is working, we can
make it more concise by composing the function calls:

\begin{verbatim}
def circle_area(xc, yc, xp, yp):
    return area(distance(xc, yc, xp, yp))
\end{verbatim}
%
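
To try the composed version on its own, the three functions can be put
in one script. This is a sketch under assumptions: the bodies of {\tt
area} and {\tt distance} are compact restatements of the functions
written earlier in the chapter, and the sample point {\tt (3, 4)} is
chosen only because it gives a radius of 5.

```python
import math

def area(radius):
    """Return the area of a circle with the given radius."""
    return math.pi * radius**2

def distance(x1, y1, x2, y2):
    """Return the distance between (x1, y1) and (x2, y2)."""
    return math.sqrt((x2 - x1)**2 + (y2 - y1)**2)

def circle_area(xc, yc, xp, yp):
    """Return the area of the circle centered at (xc, yc)
    that passes through (xp, yp)."""
    return area(distance(xc, yc, xp, yp))

# Center at the origin, perimeter point (3, 4): the radius is 5,
# so the area should be 25 times pi.
print(circle_area(0, 0, 3, 4))
```

The return value of {\tt distance} is passed directly to {\tt area}
as its argument, without a temporary variable.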

\section{Boolean functions}
\label{boolean}

Functions can return booleans, which is often convenient for hiding
complicated tests inside functions. \index{boolean function}
For example:

\begin{verbatim}
def is_divisible(x, y):
    if x % y == 0:
        return True
    else:
        return False
\end{verbatim}
%
It is common to give boolean functions names that sound like yes/no
questions; \verb"is_divisible" returns either {\tt True} or {\tt False}
to indicate whether {\tt x} is divisible by {\tt y}.

Here is an example:

\begin{verbatim}
>>> is_divisible(6, 4)
False
>>> is_divisible(6, 3)
True
\end{verbatim}
%
The result of the {\tt ==} operator is a boolean, so we can write the
function more concisely by returning it directly:

\begin{verbatim}
def is_divisible(x, y):
    return x % y == 0
\end{verbatim}
%
Boolean functions are often used in conditional statements:
\index{conditional statement}
\index{statement!conditional}

\begin{verbatim}
if is_divisible(x, y):
    print('x is divisible by y')
\end{verbatim}
%
It might be tempting to write something like:

\begin{verbatim}
if is_divisible(x, y) == True:
    print('x is divisible by y')
\end{verbatim}
%
But the extra comparison is unnecessary.

As an exercise, write a function \verb"is_between(x, y, z)" that
returns {\tt True} if $x \le y \le z$ or {\tt False} otherwise.


\section{More recursion}
\label{more.recursion}
\index{recursion}
\index{Turing complete language}
\index{language!Turing complete}
\index{Turing, Alan}
\index{Turing Thesis}

We have only covered a small subset of Python, but you might
be interested to know that this subset is a {\em complete}
programming language, which means that anything that can be
computed can be expressed in this language. Any program ever written
could be rewritten using only the language features you have learned
so far (actually, you would need a few commands to control devices
like the mouse, disks, etc., but that's all).

Proving that claim is a nontrivial exercise first accomplished by Alan
Turing, one of the first computer scientists (some would argue that he
was a mathematician, but a lot of early computer scientists started as
mathematicians). Accordingly, it is known as the Turing Thesis.
For a more complete (and accurate) discussion of the Turing Thesis,
I recommend Michael Sipser's book {\em Introduction to the
Theory of Computation}.

To give you an idea of what you can do with the tools you have learned
so far, we'll evaluate a few recursively defined mathematical
functions. A recursive definition is similar to a circular
definition, in the sense that the definition contains a reference to
the thing being defined. A truly circular definition is not very
useful:

\begin{description}

\item[vorpal:] An adjective used to describe something that is vorpal.
\index{vorpal}
\index{circular definition}
\index{definition!circular}

\end{description}

If you saw that definition in the dictionary, you might be annoyed. On
the other hand, if you looked up the definition of the factorial
function, denoted with the symbol $!$, you might get something like
this:
%
\begin{eqnarray*}
&& 0! = 1 \\
&& n! = n (n-1)!
\end{eqnarray*}
%
This definition says that the factorial of 0 is 1, and the factorial
of any other value, $n$, is $n$ multiplied by the factorial of $n-1$.

So $3!$ is 3 times $2!$, which is 2 times $1!$, which is 1 times
$0!$. Putting it all together, $3!$ equals 3 times 2 times 1 times 1,
which is 6.
\index{factorial function}
\index{function!factorial}
\index{recursive definition}
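
Written out one step at a time, the expansion of $3!$ described above
applies the second line of the definition three times and then the
first line once:
%
\begin{eqnarray*}
3! &=& 3 \cdot 2! \\
   &=& 3 \cdot 2 \cdot 1! \\
   &=& 3 \cdot 2 \cdot 1 \cdot 0! \\
   &=& 3 \cdot 2 \cdot 1 \cdot 1 \\
   &=& 6
\end{eqnarray*}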

If you can write a recursive definition of something, you can
write a Python program to evaluate it. The first step is to decide
what the parameters should be. In this case it should be clear
that {\tt factorial} takes an integer:

\begin{verbatim}
def factorial(n):
\end{verbatim}
%
If the argument happens to be 0, all we have to do is return 1:

\begin{verbatim}
def factorial(n):
    if n == 0:
        return 1
\end{verbatim}
%
Otherwise, and this is the interesting part, we have to make a
recursive call to find the factorial of $n-1$ and then multiply it by
$n$:

\begin{verbatim}
def factorial(n):
    if n == 0:
        return 1
    else:
        recurse = factorial(n-1)
        result = n * recurse
        return result
\end{verbatim}
%
The flow of execution for this program is similar to the flow of {\tt
countdown} in Section~\ref{recursion}. If we call {\tt factorial}
with the value 3:

Since 3 is not 0, we take the second branch and calculate the factorial
of {\tt n-1}...

\begin{quote}
Since 2 is not 0, we take the second branch and calculate the factorial of
{\tt n-1}...


\begin{quote}
Since 1 is not 0, we take the second branch and calculate the factorial
of {\tt n-1}...


\begin{quote}
Since 0 equals 0, we take the first branch and return 1
without making any more recursive calls.
\end{quote}


The return value, 1, is multiplied by $n$, which is 1, and the
result is returned.
\end{quote}


The return value, 1, is multiplied by $n$, which is 2, and the
result is returned.
\end{quote}


The return value (2) is multiplied by $n$, which is 3, and the result, 6,
becomes the return value of the function call that started the whole
process.
\index{stack diagram}

Figure~\ref{fig.stack3} shows what the stack diagram looks like for
this sequence of function calls.

\begin{figure}
\centerline
{\includegraphics[scale=0.8]{figs/stack3.pdf}}
\caption{Stack diagram.}
\label{fig.stack3}
\end{figure}

The return values are shown being passed back up the stack. In each
frame, the return value is the value of {\tt result}, which is the
product of {\tt n} and {\tt recurse}.
\index{function frame}
\index{frame}

In the last frame, the local
variables {\tt recurse} and {\tt result} do not exist, because
the branch that creates them does not run.


\section{Leap of faith}
\index{recursion}
\index{leap of faith}

Following the flow of execution is one way to read programs, but
it can quickly become overwhelming. An
alternative is what I call the ``leap of faith''. When you come to a
function call, instead of following the flow of execution, you {\em
assume} that the function works correctly and returns the right
result.

In fact, you are already practicing this leap of faith when you use
built-in functions. When you call {\tt math.cos} or {\tt math.exp},
you don't examine the bodies of those functions. You just
assume that they work because the people who wrote the built-in
functions were good programmers.

The same is true when you call one of your own functions. For
example, in Section~\ref{boolean}, we wrote a function called
\verb"is_divisible" that determines whether one number is divisible by
another. Once we have convinced ourselves that this function is
correct---by examining the code and testing---we can use the function
without looking at the body again.
\index{testing!leap of faith}

The same is true of recursive programs. When you get to the recursive
call, instead of following the flow of execution, you should assume
that the recursive call works (returns the correct result) and then ask
yourself, ``Assuming that I can find the factorial of $n-1$, can I
compute the factorial of $n$?'' It is clear that you
can, by multiplying by $n$.

Of course, it's a bit strange to assume that the function works
correctly when you haven't finished writing it, but that's why
it's called a leap of faith!


\section{One more example}
\label{one.more.example}

\index{fibonacci function}
\index{function!fibonacci}
After {\tt factorial}, the most common example of a recursively
defined mathematical function is {\tt fibonacci}, which has the
following definition (see
\url{http://en.wikipedia.org/wiki/Fibonacci_number}):
%
\begin{eqnarray*}
&& \mathrm{fibonacci}(0) = 0 \\
&& \mathrm{fibonacci}(1) = 1 \\
&& \mathrm{fibonacci}(n) = \mathrm{fibonacci}(n-1) + \mathrm{fibonacci}(n-2)
\end{eqnarray*}
%
Translated into Python, it looks like this:

\begin{verbatim}
def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fibonacci(n-1) + fibonacci(n-2)
\end{verbatim}
%
If you try to follow the flow of execution here, even for fairly
small values of $n$, your head explodes. But according to the
leap of faith, if you assume that the two recursive calls
work correctly, then it is clear that you get
the right result by adding them together.
\index{flow of execution}
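
To get a feeling for why tracing {\tt fibonacci} by hand becomes
overwhelming, the following sketch prints the first few values
together with the number of function calls each one requires. The
helper \verb"count_calls" is hypothetical, added only for this
illustration; it mirrors the structure of {\tt fibonacci} but counts
calls instead of adding values.

```python
def fibonacci(n):
    """Return the n-th Fibonacci number for a non-negative integer n."""
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fibonacci(n-1) + fibonacci(n-2)

def count_calls(n):
    """Return how many calls to fibonacci are made while evaluating
    fibonacci(n), including the initial call (illustrative helper)."""
    if n == 0 or n == 1:
        return 1
    else:
        return 1 + count_calls(n-1) + count_calls(n-2)

# Each row shows n, fibonacci(n), and the number of calls needed.
for i in range(11):
    print(i, fibonacci(i), count_calls(i))
```

Even for $n = 10$ the flow of execution passes through 177 calls,
which is why the leap of faith is the more practical way to read the
function.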


\section{Checking types}
\label{guardian}

What happens if we call {\tt factorial} and give it 1.5 as an argument?
\index{type checking}
\index{error checking}
\index{factorial function}
\index{RuntimeError}
\index{RecursionError}

\begin{verbatim}
>>> factorial(1.5)
RecursionError: maximum recursion depth exceeded
\end{verbatim}
%
It looks like an infinite recursion. How can that be? The function
has a base case---when {\tt n == 0}. But if {\tt n} is not an integer,
we can {\em miss} the base case and recurse forever.
\index{infinite recursion}
\index{recursion!infinite}

In the first recursive call, the value of {\tt n} is 0.5.
In the next, it is -0.5. From there, it gets smaller
(more negative), but it will never be 0.
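
The way the arguments step past the base case can be made visible
with a small sketch. The function \verb"misses_base_case" is a
hypothetical helper, not part of {\tt factorial}; it follows the same
sequence of arguments but gives up after a fixed number of calls so
that it always terminates.

```python
def misses_base_case(n, calls):
    """Follow the arguments factorial would receive, starting with n.

    Return False if one of the first `calls` arguments equals 0
    (the base case is reached), and True if none of them does.
    """
    if calls == 0:
        return True            # gave up without ever seeing n == 0
    if n == 0:
        return False           # the base case would stop the recursion
    print(n)
    return misses_base_case(n-1, calls-1)

print(misses_base_case(1.5, 5))   # arguments 1.5, 0.5, -0.5, -1.5, -2.5
print(misses_base_case(3, 5))     # arguments 3, 2, 1, then 0
```

Starting from 1.5 the arguments jump from 0.5 to -0.5 without ever
being equal to 0, whereas starting from 3 they reach 0 on the fourth
call.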

We have two choices. We can try to generalize the {\tt factorial}
function to work with floating-point numbers, or we can make {\tt
factorial} check the type of its argument. The first option leads to
what is called the gamma function, and it's a
little beyond the scope of this book. So we'll go for the second.
\index{gamma function}

We can use the built-in function {\tt isinstance} to verify the type
of the argument. While we're at it, we can also make sure the
argument is not negative:
\index{isinstance function}
\index{function!isinstance}

\begin{verbatim}
def factorial(n):
    if not isinstance(n, int):
        print('Factorial is only defined for integers.')
        return None
    elif n < 0:
        print('Factorial is not defined for negative integers.')
        return None
    elif n == 0:
        return 1
    else:
        return n * factorial(n-1)
\end{verbatim}
%
The first base case handles nonintegers; the
second handles negative integers. In both cases, the program prints
an error message and returns {\tt None} to indicate that something
went wrong:

\begin{verbatim}
>>> print(factorial('fred'))
Factorial is only defined for integers.
None
>>> print(factorial(-2))
Factorial is not defined for negative integers.
None
\end{verbatim}
%
If we get past both checks, we know that $n$ is a non-negative
integer, so we can prove that the recursion terminates.
\index{guardian pattern}
\index{pattern!guardian}

This program demonstrates a pattern sometimes called a {\bf guardian}.
The first two conditionals act as guardians, protecting the code that
follows from values that might cause an error. The guardians make it
possible to prove the correctness of the code.

In Section~\ref{raise} we will see a more flexible alternative to printing
an error message: raising an exception.
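
As a preview, here is a minimal sketch of what the guardians might
look like with exceptions. It is not the version developed in
Section~\ref{raise}; the choice of {\tt TypeError} and {\tt
ValueError} is an assumption made for this illustration, and the {\tt
try} statement used to demonstrate it has not been introduced yet.

```python
def factorial(n):
    """Return n! for a non-negative integer n.

    The guardians raise an exception instead of printing a message
    and returning None (exception types chosen for illustration).
    """
    if not isinstance(n, int):
        raise TypeError('Factorial is only defined for integers.')
    elif n < 0:
        raise ValueError('Factorial is not defined for negative integers.')
    elif n == 0:
        return 1
    else:
        return n * factorial(n-1)

print(factorial(5))

# A caller can decide how to respond to the error.
try:
    factorial(1.5)
except TypeError as err:
    print('TypeError:', err)
```

With this design the caller cannot mistake {\tt None} for a result,
because no value is returned when a guardian is triggered.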


\section{Debugging}
\label{factdebug}

Breaking a large program into smaller functions creates natural
checkpoints for debugging. If a function is not
working, there are three possibilities to consider:
\index{debugging}

\begin{itemize}

\item There is something wrong with the arguments the function
is getting; a precondition is violated.

\item There is something wrong with the function; a postcondition
is violated.

\item There is something wrong with the return value or the
way it is being used.

\end{itemize}

To rule out the first possibility, you can add a {\tt print} statement
at the beginning of the function and display the values of the
parameters (and maybe their types). Or you can write code
that checks the preconditions explicitly.
\index{precondition}
\index{postcondition}
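
Both techniques can be sketched with \verb"is_divisible" from
Section~\ref{boolean}. The precondition chosen here, that {\tt y} is
not 0, is an assumption made for the example, as is the decision to
return {\tt None} when it is violated.

```python
def is_divisible(x, y):
    """Return True if x is divisible by y, False otherwise."""
    # Scaffolding: display the parameters and their types.
    print('is_divisible called with', x, type(x), y, type(y))

    # Explicit precondition check: x % y is undefined when y is 0.
    if y == 0:
        print('Precondition violated: y must not be 0.')
        return None

    return x % y == 0

print(is_divisible(6, 3))
print(is_divisible(6, 0))
```

The first {\tt print} statement is scaffolding and would be removed
once the function works; the precondition check is a guardian and
could stay in the final version.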

If the parameters look good, add a {\tt print} statement before each
{\tt return} statement and display the return value. If
possible, check the result by hand. Consider calling the
function with values that make it easy to check the result
(as in Section~\ref{incremental.development}).

If the function seems to be working, look at the function call
to make sure the return value is being used correctly (or used
at all!).
\index{flow of execution}

Adding print statements at the beginning and end of a function
can help make the flow of execution more visible.
For example, here is a version of {\tt factorial} with
print statements:

\begin{verbatim}
def factorial(n):
    space = ' ' * (4 * n)
    print(space, 'factorial', n)
    if n == 0:
        print(space, 'returning 1')
        return 1
    else:
        recurse = factorial(n-1)
        result = n * recurse
        print(space, 'returning', result)
        return result
\end{verbatim}
%
{\tt space} is a string of space characters that controls the
indentation of the output. Here is the result of {\tt factorial(4)}:

\begin{verbatim}
                 factorial 4
             factorial 3
         factorial 2
     factorial 1
 factorial 0
 returning 1
     returning 1
         returning 2
             returning 6
                 returning 24
\end{verbatim}
%
If you are confused about the flow of execution, this kind of
output can be helpful. It takes some time to develop effective
scaffolding, but a little bit of scaffolding can save a lot of debugging.


\section{Glossary}

\begin{description}

\item[temporary variable:] A variable used to store an intermediate value in
a complex calculation.
\index{temporary variable}
\index{variable!temporary}

\item[dead code:] Part of a program that can never run, often because
it appears after a {\tt return} statement.
\index{dead code}

\item[incremental development:] A program development plan intended to
avoid debugging by adding and testing only
a small amount of code at a time.
\index{incremental development}

\item[scaffolding:] Code that is used during program development but is
not part of the final version.
\index{scaffolding}

\item[guardian:] A programming pattern that uses a conditional
statement to check for and handle circumstances that
might cause an error.
\index{guardian pattern}
\index{pattern!guardian}

\end{description}


\section{Exercises}

\begin{exercise}

Draw a stack diagram for the following program. What does the program print?
\index{stack diagram}

\begin{verbatim}
def b(z):
    prod = a(z, z)
    print(z, prod)
    return prod

def a(x, y):
    x = x + 1
    return x * y

def c(x, y, z):
    total = x + y + z
    square = b(total)**2
    return square

x = 1
y = x + 1
print(c(x, y+3, x+y))
\end{verbatim}

\end{exercise}


\begin{exercise}
\label{ackermann}

The Ackermann function, $A(m, n)$, is defined:

\begin{eqnarray*}
A(m, n) = \begin{cases}
              n+1 & \mbox{if } m = 0 \\
        A(m-1, 1) & \mbox{if } m > 0 \mbox{ and } n = 0 \\
A(m-1, A(m, n-1)) & \mbox{if } m > 0 \mbox{ and } n > 0.
\end{cases}
\end{eqnarray*}
%
See \url{http://en.wikipedia.org/wiki/Ackermann_function}.
Write a function named {\tt ack} that evaluates the Ackermann function.
Use your function to evaluate {\tt ack(3, 4)}, which should be 125.
What happens for larger values of {\tt m} and {\tt n}?
Solution: \url{http://thinkpython2.com/code/ackermann.py}.
\index{Ackermann function}
\index{function!ack}

\end{exercise}


\begin{exercise}
\label{palindrome}

A palindrome is a word that is spelled the same backward and
forward, like ``noon'' and ``redivider''. Recursively, a word
is a palindrome if the first and last letters are the same
and the middle is a palindrome.
\index{palindrome}

The following are functions that take a string argument and
return the first, last, and middle letters:

\begin{verbatim}
def first(word):
    return word[0]

def last(word):
    return word[-1]

def middle(word):
    return word[1:-1]
\end{verbatim}
%
We'll see how they work in Chapter~\ref{strings}.

\begin{enumerate}

\item Type these functions into a file named {\tt palindrome.py}
and test them out. What happens if you call {\tt middle} with
a string with two letters? One letter? What about the empty
string, which is written \verb"''" and contains no letters?

\item Write a function called \verb"is_palindrome" that takes
a string argument and returns {\tt True} if it is a palindrome
and {\tt False} otherwise. Remember that you can use the
built-in function {\tt len} to check the length of a string.

\end{enumerate}

Solution: \url{http://thinkpython2.com/code/palindrome_soln.py}.

\end{exercise}

\begin{exercise}

A number, $a$, is a power of $b$ if it is divisible by $b$
and $a/b$ is a power of $b$. Write a function called
\verb"is_power" that takes parameters {\tt a} and {\tt b}
and returns {\tt True} if {\tt a} is a power of {\tt b}.
Note: you will have to think about the base case.

\end{exercise}


\begin{exercise}
\index{greatest common divisor (GCD)}
\index{GCD (greatest common divisor)}

The greatest common divisor (GCD) of $a$ and $b$ is the largest number
that divides both of them with no remainder.

One way to find the GCD of two numbers is based on the observation
that if $r$ is the remainder when $a$ is divided by $b$, then $gcd(a,
b) = gcd(b, r)$. As a base case, we can use $gcd(a, 0) = a$.

Write a function called
\verb"gcd" that takes parameters {\tt a} and {\tt b}
and returns their greatest common divisor.

Credit: This exercise is based on an example from Abelson and
Sussman's {\em Structure and Interpretation of Computer Programs}.

\end{exercise}


\chapter{Iteration}

This chapter is about iteration, which is the ability to run
a block of statements repeatedly. We saw a kind of iteration,
using recursion, in Section~\ref{recursion}.
We saw another kind, using a {\tt for} loop,
in Section~\ref{repetition}. In this chapter we'll see yet another
kind, using a {\tt while} statement.
But first I want to say a little more about variable assignment.


\section{Reassignment}
\index{assignment}
\index{statement!assignment}
\index{reassignment}

As you may have discovered, it is legal to make more than one
assignment to the same variable. A new assignment makes an existing
variable refer to a new value (and stop referring to the old value).

\begin{verbatim}
>>> x = 5
>>> x
5
>>> x = 7
>>> x
7
\end{verbatim}
%
The first time we display {\tt x}, its value is 5; the second time,
its value is 7.

Figure~\ref{fig.assign2} shows what {\bf reassignment} looks
like in a state diagram. \index{state diagram} \index{diagram!state}

At this point I want to address a common source of confusion.
Because Python uses the equal sign ({\tt =}) for assignment, it is
tempting to interpret a statement like {\tt a = b} as a mathematical
proposition of equality; that is, the claim that {\tt a} and
{\tt b} are equal. But this interpretation is wrong.
\index{equality and assignment}

First, equality is a symmetric relationship and assignment is not. For
example, in mathematics, if $a=7$ then $7=a$. But in Python, the
statement {\tt a = 7} is legal and {\tt 7 = a} is not.

Also, in mathematics, a proposition of equality is either true or
false for all time. If $a=b$ now, then $a$ will always equal $b$.
In Python, an assignment statement can make two variables equal, but
they don't have to stay that way:

\begin{verbatim}
>>> a = 5
>>> b = a    # a and b are now equal
>>> a = 3    # a and b are no longer equal
>>> b
5
\end{verbatim}
%
The third line changes the value of {\tt a} but does not change the
value of {\tt b}, so they are no longer equal.

Reassigning variables is often useful, but you should use it
with caution. If the values of variables change frequently, it can
make the code difficult to read and debug.

\begin{figure}
\centerline
{\includegraphics[scale=0.8]{figs/assign2.pdf}}
\caption{State diagram.}
\label{fig.assign2}
\end{figure}


\section{Updating variables}
\label{update}

\index{update}
\index{variable!updating}

A common kind of reassignment is an {\bf update},
where the new value of the variable depends on the old.

\begin{verbatim}
>>> x = x + 1
\end{verbatim}
%
This means ``get the current value of {\tt x}, add one, and then
update {\tt x} with the new value.''

If you try to update a variable that doesn't exist, you get an
error, because Python evaluates the right side before it assigns
a value to {\tt x}:

\begin{verbatim}
>>> x = x + 1
NameError: name 'x' is not defined
\end{verbatim}
%
Before you can update a variable, you have to {\bf initialize}
it, usually with a simple assignment:
\index{initialization (before update)}

\begin{verbatim}
>>> x = 0
>>> x = x + 1
\end{verbatim}
%
Updating a variable by adding 1 is called an {\bf increment};
subtracting 1 is called a {\bf decrement}.
\index{increment}
\index{decrement}


\section{The {\tt while} statement}
\index{statement!while}
\index{while loop}
\index{loop!while}
\index{iteration}

Computers are often used to automate repetitive tasks. Repeating
identical or similar tasks without making errors is something that
computers do well and people do poorly. In a computer program,
repetition is also called {\bf iteration}.

We have already seen two functions, {\tt countdown} and
\verb"print_n", that iterate using recursion. Because iteration is so
common, Python provides language features to make it easier.
One is the {\tt for} statement we saw in Section~\ref{repetition}.
We'll get back to that later.

Another is the {\tt while} statement. Here is a version of {\tt
countdown} that uses a {\tt while} statement:

\begin{verbatim}
def countdown(n):
    while n > 0:
        print(n)
        n = n - 1
    print('Blastoff!')
\end{verbatim}
%
You can almost read the {\tt while} statement as if it were English.
It means, ``While {\tt n} is greater than 0,
display the value of {\tt n} and then decrement
{\tt n}. When you get to 0, display the word {\tt Blastoff!}''
\index{flow of execution}

More formally, here is the flow of execution for a {\tt while} statement:

\begin{enumerate}

\item Determine whether the condition is true or false.

\item If false, exit the {\tt while} statement
and continue execution at the next statement.

\item If the condition is true, run the
body and then go back to step 1.

\end{enumerate}

This type of flow is called a loop because the third step
loops back around to the top.
\index{condition}
\index{loop}
\index{body}

The body of the loop should change the value of one or more variables
so that the condition becomes false eventually and the loop
terminates. Otherwise the loop will repeat forever, which is called
an {\bf infinite loop}. An endless source of amusement for computer
scientists is the observation that the directions on shampoo,
``Lather, rinse, repeat'', are an infinite loop.
\index{infinite loop}
\index{loop!infinite}

In the case of {\tt countdown}, we can prove that the loop
terminates: if {\tt n} is zero or negative, the loop never runs.
Otherwise, {\tt n} gets smaller each time through the
loop, so eventually we have to get to 0.

For some other loops, it is not so easy to tell. For example:

\begin{verbatim}
def sequence(n):
    while n != 1:
        print(n)
        if n % 2 == 0:        # n is even
            n = n // 2
        else:                 # n is odd
            n = n*3 + 1
\end{verbatim}
%
The condition for this loop is {\tt n != 1}, so the loop will continue
until {\tt n} is {\tt 1}, which makes the condition false.

Each time through the loop, the program outputs the value of {\tt n}
and then checks whether it is even or odd. If it is even, {\tt n} is
divided by 2 using floor division ({\tt //}), so it stays an integer.
If it is odd, the value of {\tt n} is replaced with
{\tt n*3 + 1}. For example, if the argument passed to {\tt sequence}
is 3, the resulting values of {\tt n} are 3, 10, 5, 16, 8, 4, 2, 1.

Since {\tt n} sometimes increases and sometimes decreases, there is no
obvious proof that {\tt n} will ever reach 1, or that the program
terminates. For some particular values of {\tt n}, we can prove
termination. For example, if the starting value is a power of two,
{\tt n} will be even every time through the loop
until it reaches 1. The previous example ends with such a sequence,
starting with 16.
\index{Collatz conjecture}

The hard question is whether we can prove that this program terminates
for {\em all} positive integer values of {\tt n}. So far, no one has
been able to prove it {\em or} disprove it! (See
\url{http://en.wikipedia.org/wiki/Collatz_conjecture}.)
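
One way to experiment with the sequence is to count the updates
instead of printing every value. The function \verb"collatz_steps" is
a hypothetical variation written for this illustration; it uses floor
division so that {\tt n} stays an integer, and it assumes its argument
is a positive integer.

```python
def collatz_steps(n):
    """Return the number of updates needed for n to reach 1.

    Assumes n is a positive integer; for other values the loop
    might never terminate.
    """
    steps = 0
    while n != 1:
        if n % 2 == 0:        # n is even
            n = n // 2
        else:                 # n is odd
            n = n*3 + 1
        steps = steps + 1
    return steps

print(collatz_steps(3))     # 3, 10, 5, 16, 8, 4, 2, 1 takes 7 updates
print(collatz_steps(16))    # 16, 8, 4, 2, 1 takes 4 updates
```

Trying a range of starting values shows how irregular the step counts
are, which gives some intuition for why a general proof is hard to
find.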

As an exercise, rewrite the function \verb"print_n" from
Section~\ref{recursion} using iteration instead of recursion.


\section{{\tt break}}
\index{break statement}
\index{statement!break}

Sometimes you don't know it's time to end a loop until you get half
way through the body. In that case you can use the {\tt break}
statement to jump out of the loop.

For example, suppose you want to take input from the user until they
type {\tt done}. You could write:

\begin{verbatim}
while True:
    line = input('> ')
    if line == 'done':
        break
    print(line)

print('Done!')
\end{verbatim}
%
The loop condition is {\tt True}, which is always true, so the
loop runs until it hits the break statement.

Each time through, it prompts the user with an angle bracket.
If the user types {\tt done}, the {\tt break} statement exits
the loop. Otherwise the program echoes whatever the user types
and goes back to the top of the loop. Here's a sample run:

\begin{verbatim}
> not done
not done
> done
Done!
\end{verbatim}
%
This way of writing {\tt while} loops is common because you
can check the condition anywhere in the loop (not just at the
top) and you can express the stop condition affirmatively
(``stop when this happens'') rather than negatively (``keep going
until that happens'').
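
To compare the two styles side by side, here is {\tt countdown}
written both ways. The names \verb"countdown_top" and
\verb"countdown_break" are hypothetical, used only to tell the two
versions apart in this sketch.

```python
def countdown_top(n):
    """Count down from n, testing the condition at the top of the loop."""
    while n > 0:               # keep going as long as n is positive
        print(n)
        n = n - 1
    print('Blastoff!')

def countdown_break(n):
    """Count down from n, leaving the loop with a break statement."""
    while True:
        if n <= 0:             # stop when n is zero or negative
            break
        print(n)
        n = n - 1
    print('Blastoff!')

countdown_top(3)
countdown_break(3)
```

The two functions display the same output. In this small case the
first form is shorter; the second becomes more attractive when the
test has to come after some other work in the body, as in the input
loop above.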
5625
5626
5627
\section{Square roots}
5628
\label{squareroot}
5629
\index{square root}
5630
5631
Loops are often used in programs that compute
5632
numerical results by starting with an approximate answer and
5633
iteratively improving it.
5634
\index{Newton's method}
5635
5636
For example, one way of computing square roots is Newton's method.
5637
Suppose that you want to know the square root of $a$. If you start
5638
with almost any estimate, $x$, you can compute a better
5639
estimate with the following formula:
5640
5641
\[ y = \frac{x + a/x}{2} \]
5642
%
5643
For example, if $a$ is 4 and $x$ is 3:
5644
5645
\begin{verbatim}
5646
>>> a = 4
5647
>>> x = 3
5648
>>> y = (x + a/x) / 2
5649
>>> y
5650
2.16666666667
5651
\end{verbatim}
5652
%
5653
The result is closer to the correct answer ($\sqrt{4} = 2$). If we
5654
repeat the process with the new estimate, it gets even closer:
5655
5656
\begin{verbatim}
5657
>>> x = y
5658
>>> y = (x + a/x) / 2
5659
>>> y
5660
2.00641025641
5661
\end{verbatim}
5662
%
5663
After a few more updates, the estimate is almost exact:
5664
\index{update}
5665
5666
\begin{verbatim}
5667
>>> x = y
5668
>>> y = (x + a/x) / 2
5669
>>> y
5670
2.00001024003
5671
>>> x = y
5672
>>> y = (x + a/x) / 2
5673
>>> y
5674
2.00000000003
5675
\end{verbatim}
5676
%
5677
In general we don't know ahead of time how many steps it takes
5678
to get to the right answer, but we know when we get there
5679
because the estimate
5680
stops changing:
5681
5682
\begin{verbatim}
5683
>>> x = y
5684
>>> y = (x + a/x) / 2
5685
>>> y
5686
2.0
5687
>>> x = y
5688
>>> y = (x + a/x) / 2
5689
>>> y
5690
2.0
5691
\end{verbatim}
5692
%
5693
When {\tt y == x}, we can stop. Here is a loop that starts
5694
with an initial estimate, {\tt x}, and improves it until it
5695
stops changing:
5696
5697
\begin{verbatim}
while True:
    print(x)
    y = (x + a/x) / 2
    if y == x:
        break
    x = y
\end{verbatim}
5705
%
5706
For most values of {\tt a} this works fine, but in general it is
5707
dangerous to test {\tt float} equality.
5708
Floating-point values are only approximately right:
5709
most rational numbers, like $1/3$, and irrational numbers, like
5710
$\sqrt{2}$, can't be represented exactly with a {\tt float}.
5711
\index{floating-point}
5712
\index{epsilon}
5713
5714
Rather than checking whether {\tt x} and {\tt y} are exactly equal, it
5715
is safer to use the built-in function {\tt abs} to compute the
5716
absolute value, or magnitude, of the difference between them:
5717
5718
\begin{verbatim}
    if abs(y-x) < epsilon:
        break
\end{verbatim}
5722
%
5723
Here \verb"epsilon" is a small value, such as {\tt 0.0000001}, that
determines how close is close enough.
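
Putting the pieces together, here is a minimal sketch of the loop with
the tolerance test in place of the equality test. The starting values
of {\tt a} and {\tt x} and the value of {\tt epsilon} are assumptions
chosen for illustration:

```python
a = 4.0
x = 3.0           # initial estimate (assumed; any positive value would do here)
epsilon = 1e-7    # assumed tolerance: how close is close enough

while True:
    y = (x + a/x) / 2         # Newton's update from the formula above
    if abs(y - x) < epsilon:  # stop when the estimate barely changes
        break
    x = y

print(y)
```

A smaller {\tt epsilon} gives a more precise estimate but might take
more iterations.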
5725
5726
5727
\section{Algorithms}
5728
\index{algorithm}
5729
5730
Newton's method is an example of an {\bf algorithm}: it is a
5731
mechanical process for solving a category of problems (in this
5732
case, computing square roots).
5733
5734
To understand what an algorithm is, it might help to start with
5735
something that is not an algorithm. When you learned to multiply
5736
single-digit numbers, you probably memorized the multiplication table.
5737
In effect, you memorized 100 specific solutions. That kind of
5738
knowledge is not algorithmic.
5739
5740
But if you were ``lazy'', you might have learned a few
5741
tricks. For example, to find the product of $n$ and 9, you can
5742
write $n-1$ as the first digit and $10-n$ as the second
5743
digit. This trick is a general solution for multiplying any
5744
single-digit number by 9. That's an algorithm!
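
To convince yourself that the trick works, you can check it against
ordinary multiplication. This is only a sketch; the function name
\verb"times_nine" is made up for this example:

```python
def times_nine(n):
    """Multiply a single-digit number n (1 through 9) by 9 using the digit trick."""
    first_digit = n - 1
    second_digit = 10 - n
    return 10 * first_digit + second_digit

# Compare the trick with the built-in multiplication operator.
for n in range(1, 10):
    print(n, times_nine(n), times_nine(n) == 9 * n)
```

The last column is {\tt True} on every row, so the trick agrees with
multiplication for all nine cases.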
5745
\index{addition with carrying}
5746
\index{carrying, addition with}
5747
\index{subtraction!with borrowing}
5748
\index{borrowing, subtraction with}
5749
5750
Similarly, the techniques you learned for addition with carrying,
5751
subtraction with borrowing, and long division are all algorithms. One
5752
of the characteristics of algorithms is that they do not require any
5753
intelligence to carry out. They are mechanical processes where
5754
each step follows from the last according to a simple set of rules.
5755
5756
Executing algorithms is boring, but designing them is interesting,
5757
intellectually challenging, and a central part of computer science.
5758
5759
Some of the things that people do naturally, without difficulty or
5760
conscious thought, are the hardest to express algorithmically.
5761
Understanding natural language is a good example. We all do it, but
5762
so far no one has been able to explain {\em how} we do it, at least
5763
not in the form of an algorithm.
5764
5765
5766
\section{Debugging}
5767
\label{bisectbug}
5768
5769
As you start writing bigger programs, you might find yourself
5770
spending more time debugging. More code means more chances to
5771
make an error and more places for bugs to hide.
5772
\index{debugging!by bisection}
5773
\index{bisection, debugging by}
5774
5775
One way to cut your debugging time is ``debugging by bisection''.
5776
For example, if there are 100 lines in your program and you
5777
check them one at a time, it would take 100 steps.
5778
5779
Instead, try to break the problem in half. Look at the middle
5780
of the program, or near it, for an intermediate value you
5781
can check. Add a {\tt print} statement (or something else
5782
that has a verifiable effect) and run the program.
5783
5784
If the mid-point check is incorrect, there must be a problem in the
5785
first half of the program. If it is correct, the problem is
5786
in the second half.
5787
5788
Every time you perform a check like this, you halve the number of
5789
lines you have to search. After six steps (which is fewer than 100),
5790
you would be down to one or two lines of code, at least in theory.
5791
5792
In practice it is not always clear what
5793
the ``middle of the program'' is and not always possible to
5794
check it. It doesn't make sense to count lines and find the
5795
exact midpoint. Instead, think about places
5796
in the program where there might be errors and places where it
5797
is easy to put a check. Then choose a spot where you
5798
think the chances are about the same that the bug is before
5799
or after the check.
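
The idealized version of this process can be sketched in code. The
example below assumes something that is rarely true in practice: that
you can check the program after any line, and that every check before
the bug passes and every check from the bug onward fails. The function
name \verb"first_bad_checkpoint" and the list {\tt checks} are
hypothetical.

```python
def first_bad_checkpoint(checks):
    """Find the index of the first failing check by bisection.

    checks is a list of booleans: True for every checkpoint before
    the bug and False from the bug onward (an idealized assumption).
    Returns the index and the number of checks performed.
    """
    low = 0
    high = len(checks) - 1
    steps = 0
    while low < high:
        mid = (low + high) // 2
        steps = steps + 1
        if checks[mid]:
            low = mid + 1     # mid-point is correct: bug is in the second half
        else:
            high = mid        # mid-point is incorrect: bug is here or earlier
    return low, steps

# A 100-line program where everything goes wrong starting at line 37.
checks = [i < 37 for i in range(100)]
index, steps = first_bad_checkpoint(checks)
print(index, steps)
```

Each pass through the loop halves the range that remains to be
searched, which is why so few checks are needed.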
5800
5801
5802
5803
5804
\section{Glossary}
5805
5806
\begin{description}
5807
5808
\item[reassignment:] Assigning a new value to a variable that
5809
already exists.
5810
\index{reassignment}
5811
5812
\item[update:] An assignment where the new value of the variable
5813
depends on the old.
5814
\index{update}
5815
5816
\item[initialization:] An assignment that gives an initial value to
5817
a variable that will be updated.
5818
\index{initialization!variable}
5819
5820
\item[increment:] An update that increases the value of a variable
5821
(often by one).
5822
\index{increment}
5823
5824
\item[decrement:] An update that decreases the value of a variable.
5825
\index{decrement}
5826
5827
\item[iteration:] Repeated execution of a set of statements using
5828
either a recursive function call or a loop.
5829
\index{iteration}
5830
5831
\item[infinite loop:] A loop in which the terminating condition is
5832
never satisfied.
5833
\index{infinite loop}
5834
5835
\item[algorithm:] A general process for solving a category of
5836
problems.
5837
\index{algorithm}
5838
5839
\end{description}
5840
5841
5842
\section{Exercises}
5843
5844
\begin{exercise}
5845
\index{algorithm!square root}
5846
5847
Copy the loop from Section~\ref{squareroot}
5848
and encapsulate it in a function called
5849
\verb"mysqrt" that takes {\tt a} as a parameter, chooses a
5850
reasonable value of {\tt x}, and returns an estimate of the square
5851
root of {\tt a}. \index{encapsulation}
5852
5853
To test it, write a function named \verb"test_square_root"
5854
that prints a table like this:
5855
5856
\begin{verbatim}
a   mysqrt(a)     math.sqrt(a)  diff
-   ---------     ------------  ----
1.0 1.0           1.0           0.0
2.0 1.41421356237 1.41421356237 2.22044604925e-16
3.0 1.73205080757 1.73205080757 0.0
4.0 2.0           2.0           0.0
5.0 2.2360679775  2.2360679775  0.0
6.0 2.44948974278 2.44948974278 0.0
7.0 2.64575131106 2.64575131106 0.0
8.0 2.82842712475 2.82842712475 4.4408920985e-16
9.0 3.0           3.0           0.0
\end{verbatim}
5869
%
5870
The first column is a number, $a$; the second column is the square
5871
root of $a$ computed with \verb"mysqrt"; the third column is the
5872
square root computed by {\tt math.sqrt}; the fourth column is the
5873
absolute value of the difference between the two estimates.
5874
\end{exercise}
5875
5876
5877
\begin{exercise}
5878
\index{eval function}
5879
\index{function!eval}
5880
5881
The built-in function {\tt eval} takes a string and evaluates
5882
it using the Python interpreter. For example:
5883
5884
\begin{verbatim}
5885
>>> eval('1 + 2 * 3')
5886
7
5887
>>> import math
5888
>>> eval('math.sqrt(5)')
5889
2.2360679774997898
5890
>>> eval('type(math.pi)')
5891
<class 'float'>
5892
\end{verbatim}
5893
%
5894
Write a function called \verb"eval_loop" that iteratively
5895
prompts the user, takes the resulting input and evaluates
5896
it using {\tt eval}, and prints the result.
5897
5898
It should continue until the user enters \verb"'done'", and then
5899
return the value of the last expression it evaluated.
5900
5901
\end{exercise}
5902
5903
5904
\begin{exercise}
5905
\index{Ramanujan, Srinivasa}
5906
5907
The mathematician Srinivasa Ramanujan found an
5908
infinite series
5909
that can be used to generate a numerical
5910
approximation of $1 / \pi$:
5911
\index{pi}
5912
5913
\[ \frac{1}{\pi} = \frac{2\sqrt{2}}{9801}
\sum^\infty_{k=0} \frac{(4k)!(1103+26390k)}{(k!)^4 396^{4k}} \]
5915
5916
Write a function called \verb"estimate_pi" that uses this formula
5917
to compute and return an estimate of $\pi$. It should use a {\tt while}
5918
loop to compute terms of the summation until the last term is
5919
smaller than {\tt 1e-15} (which is Python notation for $10^{-15}$).
5920
You can check the result by comparing it to {\tt math.pi}.
5921
5922
Solution: \url{http://thinkpython2.com/code/pi.py}.
5923
5924
\end{exercise}
5925
5926
5927
\chapter{Strings}
5928
\label{strings}
5929
5930
Strings are not like integers, floats, and booleans. A string
5931
is a {\bf sequence}, which means it is
5932
an ordered collection of other values. In this chapter you'll see
5933
how to access the characters that make up a string, and you'll
5934
learn about some of the methods strings provide.
5935
\index{sequence}
5936
5937
5938
\section{A string is a sequence}
5939
5940
\index{sequence}
5941
\index{character}
5942
\index{bracket operator}
5943
\index{operator!bracket}
5944
A string is a sequence of characters.
5945
You can access the characters one at a time with the
5946
bracket operator:
5947
5948
\begin{verbatim}
5949
>>> fruit = 'banana'
5950
>>> letter = fruit[1]
5951
\end{verbatim}
5952
%
5953
The second statement selects character number 1 from {\tt fruit}
and assigns it to {\tt letter}.
5955
\index{index}
5956
5957
The expression in brackets is called an {\bf index}.
5958
The index indicates which character in the sequence you
5959
want (hence the name).
5960
5961
But you might not get what you expect:
5962
5963
\begin{verbatim}
5964
>>> letter
5965
'a'
5966
\end{verbatim}
5967
%
5968
For most people, the first letter of \verb"'banana'" is {\tt b}, not
5969
{\tt a}. But for computer scientists, the index is an offset from the
5970
beginning of the string, and the offset of the first letter is zero.
5971
5972
\begin{verbatim}
5973
>>> letter = fruit[0]
5974
>>> letter
5975
'b'
5976
\end{verbatim}
5977
%
5978
So {\tt b} is the 0th letter (``zero-eth'') of \verb"'banana'",
{\tt a} is the 1th letter (``one-eth''), and {\tt n} is the 2th letter
(``two-eth''). \index{index!starting at zero}
\index{zero, index starting at}
5982
5983
As an index you can use an expression that contains variables and
5984
operators:
5985
\index{index}
5986
5987
\begin{verbatim}
5988
>>> i = 1
5989
>>> fruit[i]
5990
'a'
5991
>>> fruit[i+1]
5992
'n'
5993
\end{verbatim}
5994
%
5995
5996
But the value of the index has to be an integer. Otherwise you
5997
get:
5998
\index{exception!TypeError}
5999
\index{TypeError}
6000
6001
\begin{verbatim}
6002
>>> letter = fruit[1.5]
6003
TypeError: string indices must be integers
6004
\end{verbatim}
6005
%
6006
6007
\section{{\tt len}}
6008
\index{len function}
6009
\index{function!len}
6010
6011
{\tt len} is a built-in function that returns the number of characters
6012
in a string:
6013
6014
\begin{verbatim}
6015
>>> fruit = 'banana'
6016
>>> len(fruit)
6017
6
6018
\end{verbatim}
6019
%
6020
To get the last letter of a string, you might be tempted to try something
6021
like this:
6022
\index{exception!IndexError}
6023
\index{IndexError}
6024
6025
\begin{verbatim}
6026
>>> length = len(fruit)
6027
>>> last = fruit[length]
6028
IndexError: string index out of range
6029
\end{verbatim}
6030
%
6031
The reason for the {\tt IndexError} is that there is no letter in
{\tt 'banana'} with the index 6. Since we started counting at zero, the
6033
six letters are numbered 0 to 5. To get the last character, you have
6034
to subtract 1 from {\tt length}:
6035
6036
\begin{verbatim}
6037
>>> last = fruit[length-1]
6038
>>> last
6039
'a'
6040
\end{verbatim}
6041
%
6042
Or you can use negative indices, which count backward from
6043
the end of the string. The expression {\tt fruit[-1]} yields the last
6044
letter, {\tt fruit[-2]} yields the second to last, and so on.
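
As a quick illustration, the following sketch compares the two ways of
getting the last letter. The variable names other than {\tt fruit} are
made up for this example:

```python
fruit = 'banana'

last = fruit[-1]                    # counts backward from the end
second_to_last = fruit[-2]
also_last = fruit[len(fruit) - 1]   # the same letter, using len

print(last, second_to_last, last == also_last)
```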
6045
\index{index!negative}
6046
\index{negative index}
6047
6048
6049
\section{Traversal with a {\tt for} loop}
6050
\label{for}
6051
\index{traversal}
6052
\index{loop!traversal}
6053
\index{for loop}
6054
\index{loop!for}
6055
\index{statement!for}
6056
\index{traversal}
6057
6058
A lot of computations involve processing a string one character at a
6059
time. Often they start at the beginning, select each character in
6060
turn, do something to it, and continue until the end. This pattern of
6061
processing is called a {\bf traversal}. One way to write a traversal
6062
is with a {\tt while} loop:
6063
6064
\begin{verbatim}
index = 0
while index < len(fruit):
    letter = fruit[index]
    print(letter)
    index = index + 1
\end{verbatim}
6071
%
6072
This loop traverses the string and displays each letter on a line by
6073
itself. The loop condition is {\tt index < len(fruit)}, so
6074
when {\tt index} is equal to the length of the string, the
6075
condition is false, and the body of the loop doesn't run. The
6076
last character accessed is the one with the index {\tt len(fruit)-1},
6077
which is the last character in the string.
6078
6079
As an exercise, write a function that takes a string as an argument
6080
and displays the letters backward, one per line.
6081
6082
Another way to write a traversal is with a {\tt for} loop:
6083
6084
\begin{verbatim}
for letter in fruit:
    print(letter)
\end{verbatim}
6088
%
6089
Each time through the loop, the next character in the string is assigned
6090
to the variable {\tt letter}. The loop continues until no characters are
6091
left.
6092
\index{concatenation}
6093
\index{abecedarian}
6094
\index{McCloskey, Robert}
6095
6096
The following example shows how to use concatenation (string addition)
6097
and a {\tt for} loop to generate an abecedarian series (that is, in
6098
alphabetical order). In Robert McCloskey's book {\em Make
6099
Way for Ducklings}, the names of the ducklings are Jack, Kack, Lack,
6100
Mack, Nack, Ouack, Pack, and Quack. This loop outputs these names in
6101
order:
6102
6103
\begin{verbatim}
prefixes = 'JKLMNOPQ'
suffix = 'ack'

for letter in prefixes:
    print(letter + suffix)
\end{verbatim}
6110
%
6111
The output is:
6112
6113
\begin{verbatim}
6114
Jack
6115
Kack
6116
Lack
6117
Mack
6118
Nack
6119
Oack
6120
Pack
6121
Qack
6122
\end{verbatim}
6123
%
6124
Of course, that's not quite right because ``Ouack'' and ``Quack'' are
6125
misspelled. As an exercise, modify the program to fix this error.
6126
6127
6128
6129
\section{String slices}
6130
\label{slice}
6131
\index{slice operator} \index{operator!slice} \index{index!slice}
6132
\index{string!slice} \index{slice!string}
6133
6134
A segment of a string is called a {\bf slice}. Selecting a slice is
6135
similar to selecting a character:
6136
6137
\begin{verbatim}
6138
>>> s = 'Monty Python'
6139
>>> s[0:5]
6140
'Monty'
6141
>>> s[6:12]
6142
'Python'
6143
\end{verbatim}
6144
%
6145
The operator {\tt [n:m]} returns the part of the string from the
6146
``n-eth'' character to the ``m-eth'' character, including the first but
6147
excluding the last. This behavior is counterintuitive, but it might
6148
help to imagine the indices pointing {\em between} the
6149
characters, as in Figure~\ref{fig.banana}.
6150
6151
\begin{figure}
\centerline
{\includegraphics[scale=0.8]{figs/banana.pdf}}
\caption{Slice indices.}
\label{fig.banana}
\end{figure}
6157
6158
If you omit the first index (before the colon), the slice starts at
6159
the beginning of the string. If you omit the second index, the slice
6160
goes to the end of the string:
6161
6162
\begin{verbatim}
6163
>>> fruit = 'banana'
6164
>>> fruit[:3]
6165
'ban'
6166
>>> fruit[3:]
6167
'ana'
6168
\end{verbatim}
6169
%
6170
If the first index is greater than or equal to the second, the result
is an {\bf empty string}, represented by two quotation marks:
6172
\index{quotation mark}
6173
6174
\begin{verbatim}
6175
>>> fruit = 'banana'
6176
>>> fruit[3:3]
6177
''
6178
\end{verbatim}
6179
%
6180
An empty string contains no characters and has length 0, but other
6181
than that, it is the same as any other string.
6182
6183
Continuing this example, what do you think
6184
{\tt fruit[:]} means? Try it and see.
6185
\index{copy!slice}
6186
\index{slice!copy}
6187
6188
6189
6190
\section{Strings are immutable}
6191
\index{mutability}
6192
\index{immutability}
6193
\index{string!immutable}
6194
6195
It is tempting to use the {\tt []} operator on the left side of an
6196
assignment, with the intention of changing a character in a string.
6197
For example:
6198
\index{TypeError}
6199
\index{exception!TypeError}
6200
6201
\begin{verbatim}
6202
>>> greeting = 'Hello, world!'
6203
>>> greeting[0] = 'J'
6204
TypeError: 'str' object does not support item assignment
6205
\end{verbatim}
6206
%
6207
The ``object'' in this case is the string and the ``item'' is
6208
the character you tried to assign. For now, an object is
6209
the same thing as a value, but we will refine that definition
6210
later (Section~\ref{equivalence}).
6211
\index{object}
6212
\index{item}
6213
\index{item assignment}
6214
\index{assignment!item}
6215
\index{immutability}
6216
6217
The reason for the error is that
6218
strings are {\bf immutable}, which means you can't change an
6219
existing string. The best you can do is create a new string
6220
that is a variation on the original:
6221
6222
\begin{verbatim}
6223
>>> greeting = 'Hello, world!'
6224
>>> new_greeting = 'J' + greeting[1:]
6225
>>> new_greeting
6226
'Jello, world!'
6227
\end{verbatim}
6228
%
6229
This example concatenates a new first letter onto
6230
a slice of {\tt greeting}. It has no effect on
6231
the original string.
6232
\index{concatenation}
6233
6234
6235
\section{Searching}
6236
\label{find}
6237
6238
What does the following function do?
6239
\index{find function}
6240
\index{function!find}
6241
6242
\begin{verbatim}
def find(word, letter):
    index = 0
    while index < len(word):
        if word[index] == letter:
            return index
        index = index + 1
    return -1
\end{verbatim}
6251
%
6252
In a sense, {\tt find} is the inverse of the {\tt []} operator.
6253
Instead of taking an index and extracting the corresponding character,
6254
it takes a character and finds the index where that character
6255
appears. If the character is not found, the function returns {\tt -1}.
6257
6258
This is the first example we have seen of a {\tt return} statement
6259
inside a loop. If {\tt word[index] == letter}, the function breaks
6260
out of the loop and returns immediately.
6261
6262
If the character doesn't appear in the string, the program
6263
exits the loop normally and returns {\tt -1}.
6264
6265
This pattern of computation---traversing a sequence and returning
6266
when we find what we are looking for---is called a {\bf search}.
6267
\index{traversal}
6268
\index{search pattern}
6269
\index{pattern!search}
6270
6271
As an exercise, modify {\tt find} so that it has a
6272
third parameter, the index in {\tt word} where it should start
6273
looking.
6274
6275
6276
\section{Looping and counting}
6277
\label{counter}
6278
\index{counter}
6279
\index{counting and looping}
6280
\index{looping and counting}
6281
\index{looping!with strings}
6282
6283
The following program counts the number of times the letter {\tt a}
6284
appears in a string:
6285
6286
\begin{verbatim}
word = 'banana'
count = 0
for letter in word:
    if letter == 'a':
        count = count + 1
print(count)
\end{verbatim}
6294
%
6295
This program demonstrates another pattern of computation called a
{\bf counter}. The variable {\tt count} is initialized to 0 and then
6297
incremented each time an {\tt a} is found.
6298
When the loop exits, {\tt count}
6299
contains the result---the total number of {\tt a}'s.
6300
6301
\index{encapsulation}
6302
As an exercise, encapsulate this code in a function named {\tt count},
and generalize it so that it accepts the string and the
letter as arguments.

Then rewrite the function so that instead of
traversing the string, it uses the three-parameter version of
{\tt find} from the previous section.
6309
6310
6311
\section{String methods}
6312
\label{optional}
6313
6314
Strings provide methods that perform a variety of useful operations.
6315
A method is similar to a function---it takes arguments and
6316
returns a value---but the syntax is different. For example, the
6317
method {\tt upper} takes a string and returns a new string with
6318
all uppercase letters.
6319
\index{method}
6320
\index{string!method}
6321
6322
Instead of the function syntax {\tt upper(word)}, it uses
6323
the method syntax {\tt word.upper()}.
6324
6325
\begin{verbatim}
6326
>>> word = 'banana'
6327
>>> new_word = word.upper()
6328
>>> new_word
6329
'BANANA'
6330
\end{verbatim}
6331
%
6332
This form of dot notation specifies the name of the method,
{\tt upper}, and the name of the string to apply the method to,
{\tt word}. The empty parentheses indicate that this method takes no
arguments.
6336
\index{parentheses!empty}
6337
\index{dot notation}
6338
6339
A method call is called an {\bf invocation}; in this case, we would
6340
say that we are invoking {\tt upper} on {\tt word}.
6341
\index{invocation}
6342
6343
As it turns out, there is a string method named {\tt find} that
6344
is remarkably similar to the function we wrote:
6345
6346
\begin{verbatim}
6347
>>> word = 'banana'
6348
>>> index = word.find('a')
6349
>>> index
6350
1
6351
\end{verbatim}
6352
%
6353
In this example, we invoke {\tt find} on {\tt word} and pass
the letter we are looking for as an argument.
6355
6356
Actually, the {\tt find} method is more general than our function;
6357
it can find substrings, not just characters:
6358
6359
\begin{verbatim}
6360
>>> word.find('na')
6361
2
6362
\end{verbatim}
6363
%
6364
By default, {\tt find} starts at the beginning of the string, but
6365
it can take a second argument, the index where it should start:
6366
\index{optional argument}
6367
\index{argument!optional}
6368
6369
\begin{verbatim}
6370
>>> word.find('na', 3)
6371
4
6372
\end{verbatim}
6373
%
6374
This is an example of an {\bf optional argument};
6375
{\tt find} can
6376
also take a third argument, the index where it should stop:
6377
6378
\begin{verbatim}
6379
>>> name = 'bob'
6380
>>> name.find('b', 1, 2)
6381
-1
6382
\end{verbatim}
6383
%
6384
This search fails because {\tt b} does not
appear in the index range from {\tt 1} to {\tt 2}, not including
{\tt 2}. Searching up to, but not including, the second index makes
{\tt find} consistent with the slice operator.
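
One way to see this consistency is to compare a bounded search with the
corresponding slice. In this sketch, the variables {\tt start} and
{\tt stop} are introduced only for illustration:

```python
name = 'bob'
start = 1
stop = 2

# The bounded search looks only at the characters in name[start:stop].
print(name[start:stop])
print(name.find('b', start, stop))
print('b' in name[start:stop])

# With no upper bound, the search continues to the end of the string.
print(name.find('b', start))
```

The slice contains only the letter {\tt o}, so the bounded search
fails; without the third argument, {\tt find} reaches the second
{\tt b}.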
6388
6389
6390
6391
\section{The {\tt in} operator}
6392
\label{inboth}
6393
\index{in operator}
6394
\index{operator!in}
6395
\index{boolean operator}
6396
\index{operator!boolean}
6397
6398
The word {\tt in} is a boolean operator that takes two strings and
6399
returns {\tt True} if the first appears as a substring in the second:
6400
6401
\begin{verbatim}
6402
>>> 'a' in 'banana'
6403
True
6404
>>> 'seed' in 'banana'
6405
False
6406
\end{verbatim}
6407
%
6408
For example, the following function prints all the
6409
letters from {\tt word1} that also appear in {\tt word2}:
6410
6411
\begin{verbatim}
def in_both(word1, word2):
    for letter in word1:
        if letter in word2:
            print(letter)
\end{verbatim}
6417
%
6418
With well-chosen variable names,
6419
Python sometimes reads like English. You could read
6420
this loop, ``for (each) letter in (the first) word, if (the) letter
6421
(appears) in (the second) word, print (the) letter.''
6422
6423
Here's what you get if you compare apples and oranges:
6424
6425
\begin{verbatim}
6426
>>> in_both('apples', 'oranges')
6427
a
6428
e
6429
s
6430
\end{verbatim}
6431
%
6432
6433
\section{String comparison}
6434
\index{string!comparison}
6435
\index{comparison!string}
6436
6437
The relational operators work on strings. To see if two strings are equal:
6438
6439
\begin{verbatim}
if word == 'banana':
    print('All right, bananas.')
\end{verbatim}
6443
%
6444
Other relational operations are useful for putting words in alphabetical
6445
order:
6446
6447
\begin{verbatim}
if word < 'banana':
    print('Your word, ' + word + ', comes before banana.')
elif word > 'banana':
    print('Your word, ' + word + ', comes after banana.')
else:
    print('All right, bananas.')
\end{verbatim}
6455
%
6456
Python does not handle uppercase and lowercase letters the same way
6457
people do. All the uppercase letters come before all the
6458
lowercase letters, so:
6459
6460
\begin{verbatim}
6461
Your word, Pineapple, comes before banana.
6462
\end{verbatim}
6463
%
6464
A common way to address this problem is to convert strings to a
6465
standard format, such as all lowercase, before performing the
6466
comparison. Keep that in mind in case you have to defend yourself
6467
against a man armed with a Pineapple.
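
For example, here is a small sketch that uses the string method
{\tt lower} to normalize a word before comparing it. The variable names
are chosen only for illustration:

```python
word = 'Pineapple'

# Uppercase 'P' comes before lowercase 'b', so this comparison is True.
before_raw = word < 'banana'

# After converting to lowercase, 'pineapple' comes after 'banana'.
before_normalized = word.lower() < 'banana'

print(before_raw, before_normalized)
```

Because strings are immutable, {\tt word.lower()} returns a new string
and leaves {\tt word} unchanged.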
6468
6469
6470
\section{Debugging}
6471
\index{debugging}
6472
\index{traversal}
6473
6474
When you use indices to traverse the values in a sequence,
6475
it is tricky to get the beginning and end of the traversal
6476
right. Here is a function that is supposed to compare two
6477
words and return {\tt True} if one of the words is the reverse
6478
of the other, but it contains two errors:
6479
6480
\begin{verbatim}
def is_reverse(word1, word2):
    if len(word1) != len(word2):
        return False

    i = 0
    j = len(word2)

    while j > 0:
        if word1[i] != word2[j]:
            return False
        i = i+1
        j = j-1

    return True
\end{verbatim}
6496
%
6497
The first {\tt if} statement checks whether the words are the
6498
same length. If not, we can return {\tt False} immediately.
6499
Otherwise, for the rest of the function, we can assume that the words
6500
are the same length. This is an example of the guardian pattern
6501
in Section~\ref{guardian}.
6502
\index{guardian pattern}
6503
\index{pattern!guardian}
6504
\index{index}
6505
6506
{\tt i} and {\tt j} are indices: {\tt i} traverses {\tt word1}
6507
forward while {\tt j} traverses {\tt word2} backward. If we find
6508
two letters that don't match, we can return {\tt False} immediately.
6509
If we get through the whole loop and all the letters match, we
6510
return {\tt True}.
6511
6512
If we test this function with the words ``pots'' and ``stop'', we
6513
expect the return value {\tt True}, but we get an IndexError:
6514
\index{IndexError}
6515
\index{exception!IndexError}
6516
6517
\begin{verbatim}
>>> is_reverse('pots', 'stop')
...
  File "reverse.py", line 15, in is_reverse
    if word1[i] != word2[j]:
IndexError: string index out of range
\end{verbatim}
6524
%
6525
For debugging this kind of error, my first move is to
6526
print the values of the indices immediately before the line
6527
where the error appears.
6528
6529
\begin{verbatim}
    while j > 0:
        print(i, j)        # print here

        if word1[i] != word2[j]:
            return False
        i = i+1
        j = j-1
\end{verbatim}
6538
%
6539
Now when I run the program again, I get more information:
6540
6541
\begin{verbatim}
6542
>>> is_reverse('pots', 'stop')
6543
0 4
6544
...
6545
IndexError: string index out of range
6546
\end{verbatim}
6547
%
6548
The first time through the loop, the value of {\tt j} is 4,
6549
which is out of range for the string \verb"'stop'" (the value of {\tt word2}).
6550
The index of the last character is 3, so the
6551
initial value for {\tt j} should be {\tt len(word2)-1}.
6552
6553
If I fix that error and run the program again, I get:
6554
6555
\begin{verbatim}
6556
>>> is_reverse('pots', 'stop')
6557
0 3
6558
1 2
6559
2 1
6560
True
6561
\end{verbatim}
6562
%
6563
This time we get the right answer, but it looks like the loop only ran
6564
three times, which is suspicious. To get a better idea of what is
6565
happening, it is useful to draw a state diagram. During the first
6566
iteration, the frame for \verb"is_reverse" is shown in
6567
Figure~\ref{fig.state4}. \index{state diagram} \index{diagram!state}
6568
6569
\begin{figure}
\centerline
{\includegraphics[scale=0.8]{figs/state4.pdf}}
\caption{State diagram.}
\label{fig.state4}
\end{figure}
6575
6576
I took some license by arranging the variables in the frame
6577
and adding dotted lines to show that the values of {\tt i} and
6578
{\tt j} indicate characters in {\tt word1} and {\tt word2}.
6579
6580
Starting with this diagram, run the program on paper, changing the
6581
values of {\tt i} and {\tt j} during each iteration. Find and fix the
6582
second error in this function.
6583
\label{isreverse}
6584
6585
6586
\section{Glossary}
6587
6588
\begin{description}
6589
6590
\item[object:] Something a variable can refer to. For now,
6591
you can use ``object'' and ``value'' interchangeably.
6592
\index{object}
6593
6594
\item[sequence:] An ordered collection of
6595
values where each value is identified by an integer index.
6596
\index{sequence}
6597
6598
\item[item:] One of the values in a sequence.
6599
\index{item}
6600
6601
\item[index:] An integer value used to select an item in
6602
a sequence, such as a character in a string. In Python
6603
indices start from 0.
6604
\index{index}
6605
6606
\item[slice:] A part of a string specified by a range of indices.
6607
\index{slice}
6608
6609
\item[empty string:] A string with no characters and length 0, represented
6610
by two quotation marks.
6611
\index{empty string}
6612
6613
\item[immutable:] The property of a sequence whose items cannot
6614
be changed.
6615
\index{immutability}
6616
6617
\item[traverse:] To iterate through the items in a sequence,
6618
performing a similar operation on each.
6619
\index{traversal}
6620
6621
\item[search:] A pattern of traversal that stops
6622
when it finds what it is looking for.
6623
\index{search pattern}
6624
\index{pattern!search}
6625
6626
\item[counter:] A variable used to count something, usually initialized
6627
to zero and then incremented.
6628
\index{counter}
6629
6630
\item[invocation:] A statement that calls a method.
6631
\index{invocation}
6632
6633
\item[optional argument:] A function or method argument that is not
6634
required.
6635
\index{optional argument}
6636
\index{argument!optional}
6637
6638
\end{description}
6639
6640
6641
\section{Exercises}
6642
6643
\begin{exercise}
6644
\index{string method}
6645
\index{method!string}
6646
6647
Read the documentation of the string methods at
6648
\url{http://docs.python.org/3/library/stdtypes.html#string-methods}.
6649
You might want to experiment with some of them to make sure you
6650
understand how they work. {\tt strip} and {\tt replace} are
6651
particularly useful.
6652
6653
The documentation uses a syntax that might be confusing.
6654
For example, in \verb"find(sub[, start[, end]])", the brackets
6655
indicate optional arguments. So {\tt sub} is required, but
6656
{\tt start} is optional, and if you include {\tt start},
6657
then {\tt end} is optional.
6658
\index{optional argument}
6659
\index{argument!optional}
6660
6661
\end{exercise}
6662
6663
6664
\begin{exercise}
6665
\index{count method}
6666
\index{method!count}
6667
6668
There is a string method called {\tt count} that is similar
6669
to the function in Section~\ref{counter}. Read the documentation
6670
of this method
6671
and write an invocation that counts the number of {\tt a}'s
6672
in \verb"'banana'".
6673
\end{exercise}
6674
6675
6676
\begin{exercise}
6677
\index{step size}
6678
\index{slice operator}
6679
\index{operator!slice}
6680
6681
A string slice can take a third index that specifies the ``step
6682
size''; that is, the number of spaces between successive characters.
6683
A step size of 2 means every other character; 3 means every third,
6684
etc.
6685
6686
\begin{verbatim}
>>> fruit = 'banana'
>>> fruit[0:5:2]
'bnn'
\end{verbatim}
6691
6692
A step size of -1 goes through the word backwards, so
6693
the slice \verb"[::-1]" generates a reversed string.
6694
\index{palindrome}
6695
6696
Use this idiom to write a one-line version of \verb"is_palindrome"
6697
from Exercise~\ref{palindrome}.
6698
\end{exercise}
6699
6700
6701
\begin{exercise}
6702
6703
The following functions are all {\em intended} to check whether a
6704
string contains any lowercase letters, but at least some of them are
6705
wrong. For each function, describe what the function actually does
6706
(assuming that the parameter is a string).
6707
6708
\begin{verbatim}
def any_lowercase1(s):
    for c in s:
        if c.islower():
            return True
        else:
            return False

def any_lowercase2(s):
    for c in s:
        if 'c'.islower():
            return 'True'
        else:
            return 'False'

def any_lowercase3(s):
    for c in s:
        flag = c.islower()
    return flag

def any_lowercase4(s):
    flag = False
    for c in s:
        flag = flag or c.islower()
    return flag

def any_lowercase5(s):
    for c in s:
        if not c.islower():
            return False
    return True
\end{verbatim}
6740
6741
\end{exercise}
6742
6743
6744
\begin{exercise}
6745
\index{letter rotation}
6746
\index{rotation, letter}
6747
6748
\label{exrotate}
6749
A Caesar cypher is a weak form of encryption that involves ``rotating'' each
6750
letter by a fixed number of places. To rotate a letter means
6751
to shift it through the alphabet, wrapping around to the beginning if
6752
necessary, so 'A' rotated by 3 is 'D' and 'Z' rotated by 1 is 'A'.
6753
6754
To rotate a word, rotate each letter by the same amount.
6755
For example, ``cheer'' rotated by 7 is ``jolly'' and ``melon'' rotated
6756
by -10 is ``cubed''. In the movie {\em 2001: A Space Odyssey}, the
6757
ship computer is called HAL, which is IBM rotated by -1.
6758
6759
%For example ``sleep''
6760
%rotated by 9 is ``bunny'' and ``latex'' rotated by 7 is ``shale''.
6761
6762
Write a function called \verb"rotate_word"
6763
that takes a string and an integer as parameters, and returns
6764
a new string that contains the letters from the original string
6765
rotated by the given amount.
6766
6767
You might want to use the built-in function {\tt ord}, which converts
6768
a character to a numeric code, and {\tt chr}, which converts numeric
6769
codes to characters. Letters of the alphabet are encoded in alphabetical
6770
order, so for example:
6771
6772
\begin{verbatim}
>>> ord('c') - ord('a')
2
\end{verbatim}
6776
6777
Because \verb"'c'" is the two-eth letter of the alphabet. But
6778
beware: the numeric codes for upper case letters are different.
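
The following sketch shows how {\tt ord} and {\tt chr} undo each other, and how the codes for upper and lower case letters differ. It deliberately stops short of the wrap-around logic the exercise asks for, and the variable names are only illustrative.

```python
# Offsets within the alphabet are the same for both cases...
lower_offset = ord('c') - ord('a')
upper_offset = ord('C') - ord('A')

# ...but the codes themselves are different.
codes = (ord('a'), ord('A'))

# chr converts a numeric code back to a character.
round_trip = chr(ord('a') + lower_offset)

print(lower_offset, upper_offset, codes, round_trip)
```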
6779
6780
Potentially offensive jokes on the Internet are sometimes encoded in
6781
ROT13, which is a Caesar cypher with rotation 13. If you are not
6782
easily offended, find and decode some of them. Solution:
6783
\url{http://thinkpython2.com/code/rotate.py}.
6784
6785
\end{exercise}
6786
6787
6788
\chapter{Case study: word play}
6789
\label{wordplay}
6790
6791
This chapter presents the second case study, which involves
6792
solving word puzzles by searching for words that have certain
6793
properties. For example, we'll find the longest palindromes
6794
in English and search for words whose letters appear in
6795
alphabetical order. And I will present another program development
6796
plan: reduction to a previously solved problem.
6797
6798
6799
\section{Reading word lists}
6800
\label{wordlist}
6801
6802
For the exercises in this chapter we need a list of English words.
6803
There are lots of word lists available on the Web, but the one most
6804
suitable for our purpose is one of the word lists collected and
6805
contributed to the public domain by Grady Ward as part of the Moby
6806
lexicon project (see \url{http://wikipedia.org/wiki/Moby_Project}). It
6807
is a list of 113,809 official crosswords; that is, words that are
6808
considered valid in crossword puzzles and other word games. In the
6809
Moby collection, the filename is {\tt 113809of.fic}; you can download
6810
a copy, with the simpler name {\tt words.txt}, from
6811
\url{http://thinkpython2.com/code/words.txt}.
6812
\index{Moby Project}
6813
\index{crosswords}
6814
6815
This file is in plain text, so you can open it with a text
6816
editor, but you can also read it from Python. The built-in
6817
function {\tt open} takes the name of the file as a parameter
6818
and returns a {\bf file object} you can use to read the file.
6819
\index{open function}
6820
\index{function!open}
6821
\index{plain text}
6822
\index{text!plain}
6823
\index{object!file}
6824
\index{file object}
6825
6826
\begin{verbatim}
>>> fin = open('words.txt')
\end{verbatim}
6829
%
6830
{\tt fin} is a common name for a file object used for input. The file
6831
object provides several methods for reading, including {\tt readline},
6832
which reads characters from the file until it gets to a newline and
6833
returns the result as a string: \index{readline method}
6834
\index{method!readline}
6835
6836
\begin{verbatim}
>>> fin.readline()
'aa\n'
\end{verbatim}
6840
%
6841
The first word in this particular list is ``aa'', which is a kind of
6842
lava. The sequence \verb"\n" represents the newline character that
6843
separates this word from the next.
6844
6845
The file object keeps track of where it is in the file, so
6846
if you call {\tt readline} again, you get the next word:
6847
6848
\begin{verbatim}
>>> fin.readline()
'aah\n'
\end{verbatim}
6852
%
6853
The next word is ``aah'', which is a perfectly legitimate
6854
word, so stop looking at me like that.
6855
Or, if it's the newline character that's bothering you,
6856
we can get rid of it with the string method {\tt strip}:
6857
\index{strip method}
6858
\index{method!strip}
6859
6860
\begin{verbatim}
>>> line = fin.readline()
>>> word = line.strip()
>>> word
'aahed'
\end{verbatim}
6866
%
6867
You can also use a file object as part of a {\tt for} loop.
6868
This program reads {\tt words.txt} and prints each word, one
6869
per line:
6870
\index{open function}
6871
\index{function!open}
6872
6873
\begin{verbatim}
fin = open('words.txt')
for line in fin:
    word = line.strip()
    print(word)
\end{verbatim}
6879
%
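
If you don't have {\tt words.txt} at hand, the same loop can be tried on a small stand-in file. The sketch below writes three words to a temporary file (the file name and the {\tt with} statement, which closes the file automatically, are assumptions that go beyond what this section has covered) and then collects the stripped words.

```python
import os
import tempfile

# Build a tiny stand-in for words.txt so the sketch runs anywhere.
path = os.path.join(tempfile.mkdtemp(), 'sample_words.txt')
with open(path, 'w') as fout:
    fout.write('aa\naah\naahed\n')

words = []
with open(path) as fin:        # the file is closed when the block ends
    for line in fin:
        words.append(line.strip())

print(words)
```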
6880
6881
\section{Exercises}
6882
6883
There are solutions to these exercises in the next section.
6884
You should at least attempt each one before you read the solutions.
6885
6886
\begin{exercise}
6887
Write a program that reads {\tt words.txt} and prints only the
6888
words with more than 20 characters (not counting whitespace).
6889
\index{whitespace}
6890
6891
\end{exercise}
6892
6893
\begin{exercise}
6894
6895
In 1939 Ernest Vincent Wright published a 50,000 word novel called
6896
{\em Gadsby} that does not contain the letter ``e''. Since ``e'' is
6897
the most common letter in English, that's not easy to do.
6898
6899
In fact, it is difficult to construct a solitary thought without using
6900
that most common symbol. It is slow going at first, but with caution
6901
and hours of training you can gradually gain facility.
6902
6903
All right, I'll stop now.
6904
6905
Write a function called \verb"has_no_e" that returns {\tt True} if
6906
the given word doesn't have the letter ``e'' in it.
6907
6908
Write a program that reads {\tt words.txt} and prints only the words
6909
that have no ``e''. Compute the percentage of words in the list
6910
that have no ``e''.
6911
\index{lipogram}
6912
6913
\end{exercise}
6914
6915
6916
\begin{exercise}
6917
6918
Write a function named {\tt avoids}
6919
that takes a word and a string of forbidden letters, and
6920
that returns {\tt True} if the word doesn't use any of the forbidden
6921
letters.
6922
6923
Write a program that prompts the user to enter a string
6924
of forbidden letters and then prints the number of words that
6925
don't contain any of them.
6926
Can you find a combination of 5 forbidden letters that
6927
excludes the smallest number of words?
6928
6929
\end{exercise}
6930
6931
6932
6933
\begin{exercise}
6934
6935
Write a function named \verb"uses_only" that takes a word and a
6936
string of letters, and that returns {\tt True} if the word contains
6937
only letters in that string.  Can you make a sentence using only the
6938
letters {\tt acefhlo}? Other than ``Hoe alfalfa''?
6939
6940
\end{exercise}
6941
6942
6943
\begin{exercise}
6944
6945
Write a function named \verb"uses_all" that takes a word and a
6946
string of required letters, and that returns {\tt True} if the word
6947
uses all the required letters at least once. How many words are there
6948
that use all the vowels {\tt aeiou}? How about {\tt aeiouy}?
6949
6950
\end{exercise}
6951
6952
6953
\begin{exercise}
6954
6955
Write a function called \verb"is_abecedarian" that returns
6956
{\tt True} if the letters in a word appear in alphabetical order
6957
(double letters are ok).
6958
How many abecedarian words are there?
6959
6960
\index{abecedarian}
6961
6962
\end{exercise}
6963
6964
6965
6966
\section{Search}
6967
\label{search}
6968
\index{search pattern}
6969
\index{pattern!search}
6970
6971
All of the exercises in the previous section have something
6972
in common; they can be solved with the search pattern we saw
6973
in Section~\ref{find}. The simplest example is:
6974
6975
\begin{verbatim}
def has_no_e(word):
    for letter in word:
        if letter == 'e':
            return False
    return True
\end{verbatim}
6982
%
6983
The {\tt for} loop traverses the characters in {\tt word}. If we find
6984
the letter ``e'', we can immediately return {\tt False}; otherwise we
6985
have to go to the next letter. If we exit the loop normally, that
6986
means we didn't find an ``e'', so we return {\tt True}.
6987
\index{traversal}
6988
6989
\index{in operator}
6990
\index{operator!in}
6991
You could write this function more concisely using the {\tt in}
6992
operator, but I started with this version because it
6993
demonstrates the logic of the search pattern.
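
For comparison, here is a sketch of the more concise version, which uses {\tt not in} and returns the same results as the loop.

```python
def has_no_e(word):
    # True if the letter 'e' does not appear anywhere in word.
    return 'e' not in word

print(has_no_e('gadsby'), has_no_e('hello'))
```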
6994
6995
\index{generalization}
6996
{\tt avoids} is a more general version of \verb"has_no_e" but it
6997
has the same structure:
6998
6999
\begin{verbatim}
def avoids(word, forbidden):
    for letter in word:
        if letter in forbidden:
            return False
    return True
\end{verbatim}
7006
%
7007
We can return {\tt False} as soon as we find a forbidden letter;
7008
if we get to the end of the loop, we return {\tt True}.
7009
7010
\verb"uses_only" is similar except that the sense of the condition
7011
is reversed:
7012
7013
\begin{verbatim}
def uses_only(word, available):
    for letter in word:
        if letter not in available:
            return False
    return True
\end{verbatim}
7020
%
7021
Instead of a string of forbidden letters, we have a string of available
7022
letters. If we find a letter in {\tt word} that is not in
7023
{\tt available}, we can return {\tt False}.
7024
7025
\verb"uses_all" is similar except that we reverse the role
7026
of the word and the string of letters:
7027
7028
\begin{verbatim}
def uses_all(word, required):
    for letter in required:
        if letter not in word:
            return False
    return True
\end{verbatim}
7035
%
7036
Instead of traversing the letters in {\tt word}, the loop
7037
traverses the required letters. If any of the required letters
7038
do not appear in the word, we can return {\tt False}.
7039
\index{traversal}
7040
7041
If you were really thinking like a computer scientist, you would
7042
have recognized that \verb"uses_all" was an instance of a
7043
previously solved problem, and you would have written:
7044
7045
\begin{verbatim}
def uses_all(word, required):
    return uses_only(required, word)
\end{verbatim}
7049
%
7050
This is an example of a program development plan called {\bf
7051
reduction to a previously solved problem}, which means that you
7052
recognize the problem you are working on as an instance of a solved
7053
problem and apply an existing solution. \index{reduction to a
7054
previously solved problem} \index{development plan!reduction}
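
One way to convince yourself that the reduction is valid is to run both versions on the same inputs. In this sketch the explicit loop is renamed \verb"uses_all_loop" (a hypothetical name, used only for the comparison), and the test words are arbitrary.

```python
def uses_only(word, available):
    for letter in word:
        if letter not in available:
            return False
    return True

def uses_all_loop(word, required):
    # The explicit version from the text, renamed for this comparison.
    for letter in required:
        if letter not in word:
            return False
    return True

def uses_all(word, required):
    # Reduction: every required letter must be one of the letters of word.
    return uses_only(required, word)

agree = True
for word in ['sequoia', 'banana', 'facetiously', '']:
    for required in ['aeiou', 'aeiouy', 'ban', '']:
        if uses_all(word, required) != uses_all_loop(word, required):
            agree = False

print(agree)
```

Agreement on a handful of inputs is evidence, not proof, but here the two loops check the same condition with the variable names swapped.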
7055
7056
7057
\section{Looping with indices}
7058
\index{looping!with indices}
7059
\index{index!looping with}
7060
7061
I wrote the functions in the previous section with {\tt for}
7062
loops because I only needed the characters in the strings; I didn't
7063
have to do anything with the indices.
7064
7065
For \verb"is_abecedarian" we have to compare adjacent letters,
7066
which is a little tricky with a {\tt for} loop:
7067
7068
\begin{verbatim}
def is_abecedarian(word):
    previous = word[0]
    for c in word:
        if c < previous:
            return False
        previous = c
    return True
\end{verbatim}
7077
7078
An alternative is to use recursion:
7079
7080
\begin{verbatim}
def is_abecedarian(word):
    if len(word) <= 1:
        return True
    if word[0] > word[1]:
        return False
    return is_abecedarian(word[1:])
\end{verbatim}
7088
7089
Another option is to use a {\tt while} loop:
7090
7091
\begin{verbatim}
def is_abecedarian(word):
    i = 0
    while i < len(word)-1:
        if word[i+1] < word[i]:
            return False
        i = i+1
    return True
\end{verbatim}
7100
%
7101
The loop starts at {\tt i=0} and ends when {\tt i=len(word)-1}. Each
7102
time through the loop, it compares the $i$th character (which you can
7103
think of as the current character) to the $(i+1)$th character (which you
7104
can think of as the next).
7105
7106
If the next character is less than (alphabetically before) the current
7107
one, then we have discovered a break in the abecedarian trend, and
7108
we return {\tt False}.
7109
7110
If we get to the end of the loop without finding a fault, then the
7111
word passes the test. To convince yourself that the loop ends
7112
correctly, consider an example like \verb"'flossy'". The
7113
length of the word is 6, so
7114
the last time the loop runs is when {\tt i} is 4, which is the
7115
index of the second-to-last character. On the last iteration,
7116
it compares the second-to-last character to the last, which is
7117
what we want.
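
To see the indices at work, this sketch runs the same {\tt while} loop on \verb"'flossy'" and records each pair of characters it compares. The list {\tt pairs} is an addition for illustration only.

```python
word = 'flossy'
pairs = []

i = 0
while i < len(word) - 1:
    # Record the index, the current character and the next one.
    pairs.append((i, word[i], word[i+1]))
    i = i + 1

for pair in pairs:
    print(pair)
```

The last tuple recorded has index 4, so the final comparison is between the second-to-last character and the last.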
7118
\index{palindrome}
7119
7120
Here is a version of \verb"is_palindrome" (see
7121
Exercise~\ref{palindrome}) that uses two indices; one starts at the
7122
beginning and goes up; the other starts at the end and goes down.
7123
7124
\begin{verbatim}
def is_palindrome(word):
    i = 0
    j = len(word)-1

    while i<j:
        if word[i] != word[j]:
            return False
        i = i+1
        j = j-1

    return True
\end{verbatim}
7137
7138
Or we could reduce to a previously solved
7139
problem and write:
7140
\index{reduction to a previously solved problem}
7141
\index{development plan!reduction}
7142
7143
\begin{verbatim}
def is_palindrome(word):
    return is_reverse(word, word)
\end{verbatim}
7147
%
7148
Using \verb"is_reverse" from Section~\ref{isreverse}.
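
So that the reduction can be run on its own, the sketch below includes a stand-in for \verb"is_reverse". It is written here as an assumption about what that function does (return {\tt True} if one word is the other spelled backward) and may differ in detail from the version in the earlier section.

```python
def is_reverse(word1, word2):
    # Stand-in (an assumption): True if word2 is word1 spelled backward.
    if len(word1) != len(word2):
        return False
    i = 0
    j = len(word2) - 1
    while j >= 0:
        if word1[i] != word2[j]:
            return False
        i = i + 1
        j = j - 1
    return True

def is_palindrome(word):
    # A palindrome is a word that is its own reverse.
    return is_reverse(word, word)

results = []
for w in ['noon', 'redivider', 'nook', '']:
    results.append(is_palindrome(w))

print(results)
```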
7149
7150
7151
\section{Debugging}
7152
\index{debugging}
7153
\index{testing!is hard}
7154
\index{program testing}
7155
7156
Testing programs is hard. The functions in this chapter are
7157
relatively easy to test because you can check the results by hand.
7158
Even so, it is somewhere between difficult and impossible to choose a
7159
set of words that test for all possible errors.
7160
7161
Taking \verb"has_no_e" as an example, there are two obvious
7162
cases to check: for a word that has an ``e'', the function should return
{\tt False}, and for a word that doesn't, it should return {\tt True}.  You should have no
7164
trouble coming up with one of each.
7165
7166
Within each case, there are some less obvious subcases. Among the
7167
words that have an ``e'', you should test words with an ``e'' at the
7168
beginning, the end, and somewhere in the middle. You should test long
7169
words, short words, and very short words, like the empty string. The
7170
empty string is an example of a {\bf special case}, which is one of
7171
the non-obvious cases where errors often lurk.
7172
\index{special case}
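
The subcases above can be written down as a small table of inputs and expected results. This is only a sketch: the particular words are arbitrary choices, and passing these checks does not show that the function is correct.

```python
def has_no_e(word):
    for letter in word:
        if letter == 'e':
            return False
    return True

# (word, expected result) pairs; the words are illustrative choices.
cases = [
    ('eat', False),      # 'e' at the beginning
    ('pencil', False),   # 'e' in the middle
    ('cake', False),     # 'e' at the end
    ('book', True),      # no 'e'
    ('a', True),         # a very short word
    ('', True),          # special case: the empty string
]

failures = []
for word, expected in cases:
    if has_no_e(word) != expected:
        failures.append(word)

print(failures)
```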
7173
7174
In addition to the test cases you generate, you can also test
7175
your program with a word list like {\tt words.txt}. By scanning
7176
the output, you might be able to catch errors, but be careful:
7177
you might catch one kind of error (words that should not be
7178
included, but are) and not another (words that should be included,
7179
but aren't).
7180
7181
In general, testing can help you find bugs, but it is not easy to
7182
generate a good set of test cases, and even if you do, you can't
7183
be sure your program is correct.
7184
According to a legendary computer scientist:
7185
\index{testing!and absence of bugs}
7186
7187
\begin{quote}
7188
Program testing can be used to show the presence of bugs, but never to
7189
show their absence!
7190
7191
--- Edsger W. Dijkstra
7192
\end{quote}
7193
\index{Dijkstra, Edsger}
7194
7195
7196
\section{Glossary}
7197
7198
\begin{description}
7199
7200
\item[file object:] A value that represents an open file.
7201
\index{file object}
7202
\index{object!file}
7203
7204
\item[reduction to a previously solved problem:] A way of solving a
7205
problem by expressing it as an instance of a previously solved
7206
problem. \index{reduction to a previously solved problem}
7207
\index{development plan!reduction}
7208
7209
\item[special case:] A test case that is atypical or non-obvious
7210
(and less likely to be handled correctly).
7211
\index{special case}
7212
7213
\end{description}
7214
7215
7216
\section{Exercises}
7217
7218
\begin{exercise}
7219
\index{Car Talk}
7220
\index{Puzzler}
7221
\index{double letters}
7222
7223
This question is based on a Puzzler that was broadcast on the radio
7224
program {\em Car Talk}
7225
(\url{http://www.cartalk.com/content/puzzlers}):
7226
7227
\begin{quote}
7228
Give me a word with three consecutive double letters. I'll give you a
7229
couple of words that almost qualify, but don't. For example, the word
7230
committee, c-o-m-m-i-t-t-e-e. It would be great except for the `i' that
7231
sneaks in there. Or Mississippi: M-i-s-s-i-s-s-i-p-p-i. If you could
7232
take out those i's it would work. But there is a word that has three
7233
consecutive pairs of letters and to the best of my knowledge this may
7234
be the only word. Of course there are probably 500 more but I can only
7235
think of one. What is the word?
7236
\end{quote}
7237
7238
Write a program to find it.
7239
Solution: \url{http://thinkpython2.com/code/cartalk1.py}.
7240
7241
\end{exercise}
7242
7243
7244
\begin{exercise}
7245
Here's another {\em Car Talk}
7246
Puzzler (\url{http://www.cartalk.com/content/puzzlers}):
7247
\index{Car Talk}
7248
\index{Puzzler}
7249
\index{odometer}
7250
\index{palindrome}
7251
7252
\begin{quote}
7253
``I was driving on the highway the other day and I happened to
7254
notice my odometer. Like most odometers, it shows six digits,
7255
in whole miles only. So, if my car had 300,000
7256
miles, for example, I'd see 3-0-0-0-0-0.
7257
7258
``Now, what I saw that day was very interesting. I noticed that the
7259
last 4 digits were palindromic; that is, they read the same forward as
7260
backward. For example, 5-4-4-5 is a palindrome, so my odometer
7261
could have read 3-1-5-4-4-5.
7262
7263
``One mile later, the last 5 numbers were palindromic. For example, it
7264
could have read 3-6-5-4-5-6. One mile after that, the middle 4 out of
7265
6 numbers were palindromic. And you ready for this? One mile later,
7266
all 6 were palindromic!
7267
7268
``The question is, what was on the odometer when I first looked?''
7269
\end{quote}
7270
7271
Write a Python program that tests all the six-digit numbers and prints
7272
any numbers that satisfy these requirements.
7273
Solution: \url{http://thinkpython2.com/code/cartalk2.py}.
7274
7275
\end{exercise}
7276
7277
7278
\begin{exercise}
7279
Here's another {\em Car Talk} Puzzler you can solve with a
7280
search (\url{http://www.cartalk.com/content/puzzlers}):
7281
\index{Car Talk}
7282
\index{Puzzler}
7283
\index{palindrome}
7284
7285
\begin{quote}
7286
``Recently I had a visit with my mom and we realized that
7287
the two digits that make up my age when reversed resulted in her
7288
age. For example, if she's 73, I'm 37. We wondered how often this has
7289
happened over the years but we got sidetracked with other topics and
7290
we never came up with an answer.
7291
7292
``When I got home I figured out that the digits of our ages have been
7293
reversible six times so far. I also figured out that if we're lucky it
7294
would happen again in a few years, and if we're really lucky it would
7295
happen one more time after that. In other words, it would have
7296
happened 8 times over all. So the question is, how old am I now?''
7297
7298
\end{quote}
7299
7300
Write a Python program that searches for solutions to this Puzzler.
7301
Hint: you might find the string method {\tt zfill} useful.
7302
7303
Solution: \url{http://thinkpython2.com/code/cartalk3.py}.
7304
7305
\end{exercise}
7306
7307
7308
7309
\chapter{Lists}
7310
7311
This chapter presents one of Python's most useful built-in types, lists.
7312
You will also learn more about objects and what can happen when you have
7313
more than one name for the same object.
7314
7315
7316
\section{A list is a sequence}
7317
\label{sequence}
7318
7319
Like a string, a {\bf list} is a sequence of values. In a string, the
7320
values are characters; in a list, they can be any type. The values in
7321
a list are called {\bf elements} or sometimes {\bf items}.
7322
\index{list}
7323
\index{type!list}
7324
\index{element}
7325
\index{sequence}
7326
\index{item}
7327
7328
There are several ways to create a new list; the simplest is to
7329
enclose the elements in square brackets (\verb"[" and \verb"]"):
7330
7331
\begin{verbatim}
[10, 20, 30, 40]
['crunchy frog', 'ram bladder', 'lark vomit']
\end{verbatim}
7335
%
7336
The first example is a list of four integers. The second is a list of
7337
three strings. The elements of a list don't have to be the same type.
7338
The following list contains a string, a float, an integer, and
7339
(lo!) another list:
7340
7341
\begin{verbatim}
['spam', 2.0, 5, [10, 20]]
\end{verbatim}
7344
%
7345
A list within another list is {\bf nested}.
7346
\index{nested list}
7347
\index{list!nested}
7348
7349
A list that contains no elements is
7350
called an empty list; you can create one with empty
7351
brackets, \verb"[]".
7352
\index{empty list}
7353
\index{list!empty}
7354
7355
As you might expect, you can assign list values to variables:
7356
7357
\begin{verbatim}
>>> cheeses = ['Cheddar', 'Edam', 'Gouda']
>>> numbers = [42, 123]
>>> empty = []
>>> print(cheeses, numbers, empty)
['Cheddar', 'Edam', 'Gouda'] [42, 123] []
\end{verbatim}
7364
%
7365
\index{assignment}
7366
7367
7368
\section{Lists are mutable}
7369
\label{mutable}
7370
\index{list!element}
7371
\index{access}
7372
\index{index}
7373
\index{bracket operator}
7374
\index{operator!bracket}
7375
7376
The syntax for accessing the elements of a list is the same as for
7377
accessing the characters of a string---the bracket operator. The
7378
expression inside the brackets specifies the index. Remember that the
7379
indices start at 0:
7380
7381
\begin{verbatim}
>>> cheeses[0]
'Cheddar'
\end{verbatim}
7385
%
7386
Unlike strings, lists are mutable. When the bracket operator appears
7387
on the left side of an assignment, it identifies the element of the
7388
list that will be assigned.
7389
\index{mutability}
7390
7391
\begin{verbatim}
>>> numbers = [42, 123]
>>> numbers[1] = 5
>>> numbers
[42, 5]
\end{verbatim}
7397
%
7398
The one-eth element of {\tt numbers}, which
7399
used to be 123, is now 5.
7400
\index{index!starting at zero}
7401
\index{zero, index starting at}
7402
7403
Figure~\ref{fig.liststate} shows
7404
the state diagram for {\tt
7405
cheeses}, {\tt numbers} and {\tt empty}:
7406
\index{state diagram}
7407
\index{diagram!state}
7408
7409
\begin{figure}
7410
\centerline
7411
{\includegraphics[scale=0.8]{figs/liststate.pdf}}
7412
\caption{State diagram.}
7413
\label{fig.liststate}
7414
\end{figure}
7415
7416
Lists are represented by boxes with the word ``list'' outside
7417
and the elements of the list inside. {\tt cheeses} refers to
7418
a list with three elements indexed 0, 1 and 2.
7419
{\tt numbers} contains two elements; the diagram shows that the
7420
value of the second element has been reassigned from 123 to 5.
7421
{\tt empty} refers to a list with no elements.
7422
\index{item assignment}
7423
\index{assignment!item}
7424
\index{reassignment}
7425
7426
List indices work the same way as string indices:
7427
7428
\begin{itemize}
7429
7430
\item Any integer expression can be used as an index.
7431
7432
\item If you try to read or write an element that does not exist, you
7433
get an {\tt IndexError}.
7434
\index{exception!IndexError}
7435
\index{IndexError}
7436
7437
\item If an index has a negative value, it counts backward from the
7438
end of the list.
7439
7440
\end{itemize}
7441
\index{list!index}
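
The three rules above can be checked with a short sketch. It uses a {\tt try} statement to catch the {\tt IndexError}, which is a convenience for this example and an assumption about what the reader has seen so far.

```python
cheeses = ['Cheddar', 'Edam', 'Gouda']

i = 1
middle = cheeses[i + 1 - 1]    # any integer expression works as an index
last = cheeses[-1]             # negative indices count backward from the end

try:
    cheeses[3]                 # there is no element with index 3
    out_of_range = False
except IndexError:
    out_of_range = True

print(middle, last, out_of_range)
```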
7442
7443
\index{list!membership}
7444
\index{membership!list}
7445
\index{in operator}
7446
\index{operator!in}
7447
7448
The {\tt in} operator also works on lists.
7449
7450
\begin{verbatim}
>>> cheeses = ['Cheddar', 'Edam', 'Gouda']
>>> 'Edam' in cheeses
True
>>> 'Brie' in cheeses
False
\end{verbatim}
7457
7458
7459
\section{Traversing a list}
7460
\index{list!traversal}
7461
\index{traversal!list}
7462
\index{for loop}
7463
\index{loop!for}
7464
\index{statement!for}
7465
7466
The most common way to traverse the elements of a list is
7467
with a {\tt for} loop. The syntax is the same as for strings:
7468
7469
\begin{verbatim}
for cheese in cheeses:
    print(cheese)
\end{verbatim}
7473
%
7474
This works well if you only need to read the elements of the
7475
list. But if you want to write or update the elements, you
7476
need the indices. A common way to do that is to combine
7477
the built-in functions {\tt range} and {\tt len}:
7478
\index{looping!with indices}
7479
\index{index!looping with}
7480
7481
\begin{verbatim}
for i in range(len(numbers)):
    numbers[i] = numbers[i] * 2
\end{verbatim}
7485
%
7486
This loop traverses the list and updates each element. {\tt len}
7487
returns the number of elements in the list.  {\tt range} returns
a sequence of indices from 0 to $n-1$, where $n$ is the length of
7489
the list. Each time through the loop {\tt i} gets the index
7490
of the next element. The assignment statement in the body uses
7491
{\tt i} to read the old value of the element and to assign the
7492
new value.
7493
\index{item update}
7494
\index{update!item}
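
Here is the same loop as a complete sketch, with the indices made visible. The list {\tt numbers} and its values are illustrative, and the result of {\tt range} is converted with {\tt list} only so that it can be displayed.

```python
numbers = [42, 123, 7]

# The indices the loop will visit, converted to a list for display.
indices = list(range(len(numbers)))

for i in range(len(numbers)):
    # Read the old value at index i and store the doubled value back.
    numbers[i] = numbers[i] * 2

print(indices, numbers)
```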
7495
7496
A {\tt for} loop over an empty list never runs the body:
7497
7498
\begin{verbatim}
for x in []:
    print('This never happens.')
\end{verbatim}
7502
%
7503
Although a list can contain another list, the nested
7504
list still counts as a single element. The length of this list is
7505
four:
7506
\index{nested list}
7507
\index{list!nested}
7508
7509
\begin{verbatim}
['spam', 1, ['Brie', 'Roquefort', 'Pol le Veq'], [1, 2, 3]]
\end{verbatim}
7512
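
A quick check with {\tt len} confirms the count. The variable names in this sketch are illustrative.

```python
nested = ['spam', 1, ['Brie', 'Roquefort', 'Pol le Veq'], [1, 2, 3]]

outer_length = len(nested)      # each inner list counts as one element
inner_length = len(nested[2])   # the list of cheeses has its own length
second_cheese = nested[2][1]    # index the outer list, then the inner one

print(outer_length, inner_length, second_cheese)
```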
7513
7514
7515
\section{List operations}
7516
\index{list!operation}
7517
7518
The {\tt +} operator concatenates lists:
7519
\index{concatenation!list}
7520
\index{list!concatenation}
7521
7522
\begin{verbatim}
>>> a = [1, 2, 3]
>>> b = [4, 5, 6]
>>> c = a + b
>>> c
[1, 2, 3, 4, 5, 6]
\end{verbatim}
7529
%
7530
The {\tt *} operator repeats a list a given number of times:
7531
\index{repetition!list}
7532
\index{list!repetition}
7533
7534
\begin{verbatim}
>>> [0] * 4
[0, 0, 0, 0]
>>> [1, 2, 3] * 3
[1, 2, 3, 1, 2, 3, 1, 2, 3]
\end{verbatim}
7540
%
7541
The first example repeats {\tt [0]} four times. The second example
7542
repeats the list {\tt [1, 2, 3]} three times.
7543
7544
7545
\section{List slices}
7546
\index{slice operator}
7547
\index{operator!slice}
7548
\index{index!slice}
7549
\index{list!slice}
7550
\index{slice!list}
7551
7552
The slice operator also works on lists:
7553
7554
\begin{verbatim}
>>> t = ['a', 'b', 'c', 'd', 'e', 'f']
>>> t[1:3]
['b', 'c']
>>> t[:4]
['a', 'b', 'c', 'd']
>>> t[3:]
['d', 'e', 'f']
\end{verbatim}
7563
%
7564
If you omit the first index, the slice starts at the beginning.
7565
If you omit the second, the slice goes to the end. So if you
7566
omit both, the slice is a copy of the whole list.
7567
\index{list!copy}
7568
\index{slice!copy}
7569
\index{copy!slice}
7570
7571
\begin{verbatim}
>>> t[:]
['a', 'b', 'c', 'd', 'e', 'f']
\end{verbatim}
7575
%
7576
Since lists are mutable, it is often useful to make a copy
7577
before performing operations that modify lists.
7578
\index{mutability}
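
For example, this sketch keeps a copy of a list before sorting it. The name {\tt backup} is an illustrative choice.

```python
t = ['c', 'a', 'b']

backup = t[:]    # a new list with the same elements
t.sort()         # modifies t in place; backup is not affected

print(t, backup)
```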
7579
7580
A slice operator on the left side of an assignment
7581
can update multiple elements:
7582
\index{slice!update}
7583
\index{update!slice}
7584
7585
\begin{verbatim}
>>> t = ['a', 'b', 'c', 'd', 'e', 'f']
>>> t[1:3] = ['x', 'y']
>>> t
['a', 'x', 'y', 'd', 'e', 'f']
\end{verbatim}
7591
%
7592
7593
% You can add elements to a list by squeezing them into an empty
7594
% slice:
7595
7596
% % \begin{verbatim}
7597
% >>> t = ['a', 'd', 'e', 'f']
7598
% >>> t[1:1] = ['b', 'c']
7599
% >>> print t
7600
% ['a', 'b', 'c', 'd', 'e', 'f']
7601
% \end{verbatim}
7602
% \afterverb
7603
%
7604
% And you can remove elements from a list by assigning the empty list to
7605
% them:
7606
7607
% % \begin{verbatim}
7608
% >>> t = ['a', 'b', 'c', 'd', 'e', 'f']
7609
% >>> t[1:3] = []
7610
% >>> print t
7611
% ['a', 'd', 'e', 'f']
7612
% \end{verbatim}
7613
% \afterverb
7614
%
7615
% But both of those operations can be expressed more clearly
7616
% with list methods.
7617
7618
7619
\section{List methods}
7620
\index{list!method}
7621
\index{method, list}
7622
7623
Python provides methods that operate on lists. For example,
7624
{\tt append} adds a new element to the end of a list:
7625
\index{append method}
7626
\index{method!append}
7627
7628
\begin{verbatim}
>>> t = ['a', 'b', 'c']
>>> t.append('d')
>>> t
['a', 'b', 'c', 'd']
\end{verbatim}
7634
%
7635
{\tt extend} takes a list as an argument and appends all of
7636
the elements:
7637
\index{extend method}
7638
\index{method!extend}
7639
7640
\begin{verbatim}
>>> t1 = ['a', 'b', 'c']
>>> t2 = ['d', 'e']
>>> t1.extend(t2)
>>> t1
['a', 'b', 'c', 'd', 'e']
\end{verbatim}
7647
%
7648
This example leaves {\tt t2} unmodified.
7649
7650
{\tt sort} arranges the elements of the list from low to high:
7651
\index{sort method}
7652
\index{method!sort}
7653
7654
\begin{verbatim}
>>> t = ['d', 'c', 'e', 'b', 'a']
>>> t.sort()
>>> t
['a', 'b', 'c', 'd', 'e']
\end{verbatim}
7660
%
7661
Most list methods are void; they modify the list and return {\tt None}.
7662
If you accidentally write {\tt t = t.sort()}, you will be disappointed
7663
with the result.
7664
\index{void method}
7665
\index{method!void}
7666
\index{None special value}
7667
\index{special value!None}
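
The following sketch shows the disappointing result, and contrasts it with the built-in function {\tt sorted}, which this section has not introduced; it is mentioned here only as one way to get a new sorted list.

```python
t = ['d', 'c', 'e', 'b', 'a']
result = t.sort()        # sorts t in place; the return value is None

u = ['d', 'c', 'e', 'b', 'a']
new_list = sorted(u)     # returns a new sorted list; u is unchanged

print(result, t)
print(new_list, u)
```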
7668
7669
7670
\section{Map, filter and reduce}
7671
\label{filter}
7672
7673
To add up all the numbers in a list, you can use a loop like this:
7674
7675
% see add.py
7676
7677
\begin{verbatim}
def add_all(t):
    total = 0
    for x in t:
        total += x
    return total
\end{verbatim}
7684
%
7685
{\tt total} is initialized to 0. Each time through the loop,
7686
{\tt x} gets one element from the list. The {\tt +=} operator
7687
provides a short way to update a variable. This
7688
{\bf augmented assignment statement},
7689
\index{update operator}
7690
\index{operator!update}
7691
\index{assignment!augmented}
7692
\index{augmented assignment}
7693
7694
\begin{verbatim}
    total += x
\end{verbatim}
7697
%
7698
is equivalent to
7699
7700
\begin{verbatim}
    total = total + x
\end{verbatim}
7703
%
7704
As the loop runs, {\tt total} accumulates the sum of the
7705
elements; a variable used this way is sometimes called an
7706
{\bf accumulator}.
7707
\index{accumulator!sum}
7708
7709
Adding up the elements of a list is such a common operation
7710
that Python provides it as a built-in function, {\tt sum}:
7711
7712
\begin{verbatim}
>>> t = [1, 2, 3]
>>> sum(t)
6
\end{verbatim}
7717
%
7718
An operation like this that combines a sequence of elements into
7719
a single value is sometimes called {\bf reduce}.
7720
\index{reduce pattern}
7721
\index{pattern!reduce}
7722
\index{traversal}
7723
7724
Sometimes you want to traverse one list while building
7725
another. For example, the following function takes a list of strings
7726
and returns a new list that contains capitalized strings:
7727
7728
\begin{verbatim}
def capitalize_all(t):
    res = []
    for s in t:
        res.append(s.capitalize())
    return res
\end{verbatim}
7735
%
7736
{\tt res} is initialized with an empty list; each time through
7737
the loop, we append the next element. So {\tt res} is another
7738
kind of accumulator.
7739
\index{accumulator!list}
7740
7741
An operation like \verb"capitalize_all" is sometimes called a {\bf
7742
map} because it ``maps'' a function (in this case the method {\tt
7743
capitalize}) onto each of the elements in a sequence.
7744
\index{map pattern}
7745
\index{pattern!map}
7746
\index{filter pattern}
7747
\index{pattern!filter}
7748
7749
Another common operation is to select some of the elements from
7750
a list and return a sublist. For example, the following
7751
function takes a list of strings and returns a list that contains
7752
only the uppercase strings:
7753
7754
\begin{verbatim}
def only_upper(t):
    res = []
    for s in t:
        if s.isupper():
            res.append(s)
    return res
\end{verbatim}
7762
%
7763
{\tt isupper} is a string method that returns {\tt True} if the string
contains at least one upper case letter and no lower case letters.
7765
7766
An operation like \verb"only_upper" is called a {\bf filter} because
7767
it selects some of the elements and filters out the others.
7768
7769
Most common list operations can be expressed as a combination
7770
of map, filter and reduce.
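
As a sketch of such a combination, the following function (the name
\verb"total_upper_length" is hypothetical, not part of the book's code)
filters a list of strings, maps each selected string to its length,
and reduces the lengths to a single total:

```python
def total_upper_length(t):
    """Add up the lengths of the uppercase strings in t."""
    total = 0                  # accumulator for the reduce step
    for s in t:
        if s.isupper():        # filter: keep only uppercase strings
            total += len(s)    # map the string to its length, then reduce
    return total

print(total_upper_length(['SPAM', 'eggs', 'HAM']))   # 7
```

All three patterns fit in one loop here; they could also be written as
three separate functions applied one after another.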
7771
7772
7773
\section{Deleting elements}
7774
\index{element deletion}
7775
\index{deletion, element of list}
7776
7777
There are several ways to delete elements from a list. If you
7778
know the index of the element you want, you can use
7779
{\tt pop}:
7780
\index{pop method}
7781
\index{method!pop}
7782
7783
\begin{verbatim}
>>> t = ['a', 'b', 'c']
>>> x = t.pop(1)
>>> t
['a', 'c']
>>> x
'b'
\end{verbatim}
7791
%
7792
{\tt pop} modifies the list and returns the element that was removed.
7793
If you don't provide an index, it deletes and returns the
7794
last element.
7795
7796
If you don't need the removed value, you can use the {\tt del}
7797
operator:
7798
\index{del operator}
7799
\index{operator!del}
7800
7801
\begin{verbatim}
>>> t = ['a', 'b', 'c']
>>> del t[1]
>>> t
['a', 'c']
\end{verbatim}
7807
%
7808
If you know the element you want to remove (but not the index), you
7809
can use {\tt remove}:
7810
\index{remove method}
7811
\index{method!remove}
7812
7813
\begin{verbatim}
>>> t = ['a', 'b', 'c']
>>> t.remove('b')
>>> t
['a', 'c']
\end{verbatim}
7819
%
7820
The return value from {\tt remove} is {\tt None}.
7821
\index{None special value}
7822
\index{special value!None}
7823
7824
To remove more than one element, you can use {\tt del} with
7825
a slice index:
7826
7827
\begin{verbatim}
>>> t = ['a', 'b', 'c', 'd', 'e', 'f']
>>> del t[1:5]
>>> t
['a', 'f']
\end{verbatim}
7833
%
7834
As usual, the slice selects all the elements up to but not
7835
including the second index.
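
The following sketch exercises the three approaches on a list that
contains a duplicate. It relies on one detail not stated above:
{\tt remove} deletes only the first matching element. The variable
name {\tt last} is only illustrative.

```python
t = ['a', 'b', 'c', 'b']

last = t.pop()       # no index: removes and returns the last element
print(last)          # b
print(t)             # ['a', 'b', 'c']

t.append('b')        # put the duplicate back
t.remove('b')        # removes only the first 'b'
print(t)             # ['a', 'c', 'b']

del t[:2]            # del with a slice removes a range of elements
print(t)             # ['b']
```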
7836
7837
7838
7839
\section{Lists and strings}
7840
\index{list}
7841
\index{string}
7842
\index{sequence}
7843
7844
A string is a sequence of characters and a list is a sequence
7845
of values, but a list of characters is not the same as a
7846
string. To convert from a string to a list of characters,
7847
you can use {\tt list}:
7848
\index{list!function}
7849
\index{function!list}
7850
7851
\begin{verbatim}
>>> s = 'spam'
>>> t = list(s)
>>> t
['s', 'p', 'a', 'm']
\end{verbatim}
7857
%
7858
Because {\tt list} is the name of a built-in function, you should
7859
avoid using it as a variable name. I also avoid {\tt l} because
7860
it looks too much like {\tt 1}. So that's why I use {\tt t}.
7861
7862
The {\tt list} function breaks a string into individual letters. If
7863
you want to break a string into words, you can use the {\tt split}
7864
method:
7865
\index{split method}
7866
\index{method!split}
7867
7868
\begin{verbatim}
>>> s = 'pining for the fjords'
>>> t = s.split()
>>> t
['pining', 'for', 'the', 'fjords']
\end{verbatim}
7874
%
7875
An optional argument called a {\bf delimiter} specifies which
7876
characters to use as word boundaries.
7877
The following example
7878
uses a hyphen as a delimiter:
7879
\index{optional argument}
7880
\index{argument!optional}
7881
\index{delimiter}
7882
7883
\begin{verbatim}
>>> s = 'spam-spam-spam'
>>> delimiter = '-'
>>> t = s.split(delimiter)
>>> t
['spam', 'spam', 'spam']
\end{verbatim}
7890
%
7891
{\tt join} is the inverse of {\tt split}. It
7892
takes a list of strings and
7893
concatenates the elements. {\tt join} is a string method,
7894
so you have to invoke it on the delimiter and pass the
list as an argument:
7896
\index{join method}
7897
\index{method!join}
7898
\index{concatenation}
7899
7900
\begin{verbatim}
>>> t = ['pining', 'for', 'the', 'fjords']
>>> delimiter = ' '
>>> s = delimiter.join(t)
>>> s
'pining for the fjords'
\end{verbatim}
7907
%
7908
In this case the delimiter is a space character, so
7909
{\tt join} puts a space between words. To concatenate
7910
strings without spaces, you can use the empty string,
7911
\verb"''", as a delimiter.
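
As a quick check that {\tt join} undoes {\tt split}, and that the
empty string works as a delimiter, here is a minimal sketch. The
variable names {\tt words}, {\tt rejoined}, {\tt letters} and
{\tt word} are only illustrative.

```python
s = 'spam-spam-spam'

words = s.split('-')          # ['spam', 'spam', 'spam']
rejoined = '-'.join(words)    # back to the original string
print(rejoined == s)          # True

letters = list('spam')        # ['s', 'p', 'a', 'm']
word = ''.join(letters)       # the empty string as delimiter
print(word)                   # spam
```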
7912
\index{empty string}
7913
\index{string!empty}
7914
7915
7916
\section{Objects and values}
7917
\label{equivalence}
7918
\index{object}
7919
\index{value}
7920
7921
If we run these assignment statements:
7922
7923
\begin{verbatim}
a = 'banana'
b = 'banana'
\end{verbatim}
7927
%
7928
we know that {\tt a} and {\tt b} both refer to a string, but we don't
know whether they refer to the {\em same} string.
7931
There are two possible states, shown in Figure~\ref{fig.list1}.
7932
\index{aliasing}
7933
7934
\begin{figure}
7935
\centerline
7936
{\includegraphics[scale=0.8]{figs/list1.pdf}}
7937
\caption{State diagram.}
7938
\label{fig.list1}
7939
\end{figure}
7940
7941
In one case, {\tt a} and {\tt b} refer to two different objects that
7942
have the same value. In the second case, they refer to the same
7943
object.
7944
\index{is operator}
7945
\index{operator!is}
7946
7947
To check whether two variables refer to the same object, you can
7948
use the {\tt is} operator.
7949
7950
\begin{verbatim}
>>> a = 'banana'
>>> b = 'banana'
>>> a is b
True
\end{verbatim}
7956
%
7957
In this example, Python only created one string object, and both {\tt
7958
a} and {\tt b} refer to it. But when you create two lists, you get
7959
two objects:
7960
7961
\begin{verbatim}
>>> a = [1, 2, 3]
>>> b = [1, 2, 3]
>>> a is b
False
\end{verbatim}
7967
%
7968
So the state diagram looks like Figure~\ref{fig.list2}.
7969
\index{state diagram}
7970
\index{diagram!state}
7971
7972
\begin{figure}
7973
\centerline
7974
{\includegraphics[scale=0.8]{figs/list2.pdf}}
7975
\caption{State diagram.}
7976
\label{fig.list2}
7977
\end{figure}
7978
7979
In this case we would say that the two lists are {\bf equivalent},
7980
because they have the same elements, but not {\bf identical}, because
7981
they are not the same object. If two objects are identical, they are
7982
also equivalent, but if they are equivalent, they are not necessarily
7983
identical.
7984
\index{equivalence}
7985
\index{identity}
7986
7987
Until now, we have been using ``object'' and ``value''
7988
interchangeably, but it is more precise to say that an object has a
7989
value. If you evaluate {\tt [1, 2, 3]}, you get a list
7990
object whose value is a sequence of integers. If another
7991
list has the same elements, we say it has the same value, but
7992
it is not the same object.
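
The distinction can be checked directly: the {\tt ==} operator compares
values (equivalence), while {\tt is} compares objects (identity). A
minimal sketch, in which the third name {\tt c} is introduced only for
illustration:

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a

print(a == b)   # True: equivalent, same value
print(a is b)   # False: two different objects
print(a is c)   # True: identical, one object with two names
```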
7993
\index{object}
7994
\index{value}
7995
7996
7997
\section{Aliasing}
7998
\index{aliasing}
7999
\index{reference!aliasing}
8000
8001
If {\tt a} refers to an object and you assign {\tt b = a},
8002
then both variables refer to the same object:
8003
8004
\begin{verbatim}
>>> a = [1, 2, 3]
>>> b = a
>>> b is a
True
\end{verbatim}
8010
%
8011
The state diagram looks like Figure~\ref{fig.list3}.
8012
\index{state diagram}
8013
\index{diagram!state}
8014
8015
\begin{figure}
8016
\centerline
8017
{\includegraphics[scale=0.8]{figs/list3.pdf}}
8018
\caption{State diagram.}
8019
\label{fig.list3}
8020
\end{figure}
8021
8022
The association of a variable with an object is called a {\bf
8023
reference}. In this example, there are two references to the same
8024
object.
8025
\index{reference}
8026
8027
An object with more than one reference has more
8028
than one name, so we say that the object is {\bf aliased}.
8029
\index{mutability}
8030
8031
If the aliased object is mutable, changes made with one alias affect
8032
the other:
8033
8034
\begin{verbatim}
>>> b[0] = 42
>>> a
[42, 2, 3]
\end{verbatim}
8039
%
8040
Although this behavior can be useful, it is error-prone. In general,
8041
it is safer to avoid aliasing when you are working with mutable
8042
objects.
8043
\index{immutability}
8044
8045
For immutable objects like strings, aliasing is not as much of a
8046
problem. In this example:
8047
8048
\begin{verbatim}
a = 'banana'
b = 'banana'
\end{verbatim}
8052
%
8053
it almost never makes a difference whether {\tt a} and {\tt b} refer
to the same string or not.
8055
8056
8057
\section{List arguments}
8058
\label{list.arguments}
8059
\index{list!as argument}
8060
\index{argument}
8061
\index{argument!list}
8062
\index{reference}
8063
\index{parameter}
8064
8065
When you pass a list to a function, the function gets a reference to
8066
the list. If the function modifies the list, the caller sees
8067
the change. For example, \verb"delete_head" removes the first element
8068
from a list:
8069
8070
\begin{verbatim}
def delete_head(t):
    del t[0]
\end{verbatim}
8074
%
8075
Here's how it is used:
8076
8077
\begin{verbatim}
>>> letters = ['a', 'b', 'c']
>>> delete_head(letters)
>>> letters
['b', 'c']
\end{verbatim}
8083
%
8084
The parameter {\tt t} and the variable {\tt letters} are
8085
aliases for the same object. The stack diagram looks like
8086
Figure~\ref{fig.stack5}.
8087
\index{stack diagram}
8088
\index{diagram!stack}
8089
8090
\begin{figure}
8091
\centerline
8092
{\includegraphics[scale=0.8]{figs/stack5.pdf}}
8093
\caption{Stack diagram.}
8094
\label{fig.stack5}
8095
\end{figure}
8096
8097
Since the list is shared by two frames, I drew
8098
it between them.
8099
8100
It is important to distinguish between operations that
8101
modify lists and operations that create new lists. For
8102
example, the {\tt append} method modifies a list, but the
8103
{\tt +} operator creates a new list.
8104
\index{append method}
8105
\index{method!append}
8106
\index{list!concatenation}
8107
\index{concatenation!list}
8108
8109
Here's an example using {\tt append}:
8110
%
8111
\begin{verbatim}
>>> t1 = [1, 2]
>>> t2 = t1.append(3)
>>> t1
[1, 2, 3]
>>> print(t2)
None
\end{verbatim}
8119
%
8120
The return value from {\tt append} is {\tt None}.
8121
8122
Here's an example using the {\tt +} operator:
8123
%
8124
\begin{verbatim}
>>> t3 = t1 + [4]
>>> t1
[1, 2, 3]
>>> t3
[1, 2, 3, 4]
\end{verbatim}
8131
%
8132
The result of the operator is a new list, and the original list is
8133
unchanged.
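
One subtlety is worth checking for yourself: for lists, the augmented
assignment {\tt t += [x]} extends the existing list in place, whereas
{\tt t = t + [x]} builds a new list. The sketch below makes the
difference visible from inside a function; the function names
\verb"add_with_plus" and \verb"add_with_augmented" are hypothetical.

```python
def add_with_plus(t, x):
    t = t + [x]     # creates a new list; the caller's list is untouched

def add_with_augmented(t, x):
    t += [x]        # extends the existing list in place

t1 = [1, 2]
add_with_plus(t1, 3)
print(t1)           # [1, 2]

add_with_augmented(t1, 3)
print(t1)           # [1, 2, 3]
```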
8134
8135
This difference is important when you write functions that
8136
are supposed to modify lists. For example, this function
8137
{\em does not} delete the head of a list:
8138
%
8139
\begin{verbatim}
def bad_delete_head(t):
    t = t[1:]              # WRONG!
\end{verbatim}
8143
%
8144
The slice operator creates a new list and the assignment
8145
makes {\tt t} refer to it, but that doesn't affect the caller.
8146
\index{slice operator}
8147
\index{operator!slice}
8148
%
8149
\begin{verbatim}
>>> t4 = [1, 2, 3]
>>> bad_delete_head(t4)
>>> t4
[1, 2, 3]
\end{verbatim}
8155
%
8156
At the beginning of \verb"bad_delete_head", {\tt t} and {\tt t4}
8157
refer to the same list. At the end, {\tt t} refers to a new list,
8158
but {\tt t4} still refers to the original, unmodified list.
8159
8160
An alternative is to write a function that creates and
8161
returns a new list. For
8162
example, {\tt tail} returns all but the first
8163
element of a list:
8164
8165
\begin{verbatim}
def tail(t):
    return t[1:]
\end{verbatim}
8169
%
8170
This function leaves the original list unmodified.
8171
Here's how it is used:
8172
8173
\begin{verbatim}
>>> letters = ['a', 'b', 'c']
>>> rest = tail(letters)
>>> rest
['b', 'c']
\end{verbatim}
8179
8180
8181
8182
\section{Debugging}
8183
\index{debugging}
8184
8185
Careless use of lists (and other mutable objects)
8186
can lead to long hours of debugging. Here are some common
8187
pitfalls and ways to avoid them:
8188
8189
\begin{enumerate}
8190
8191
\item Most list methods modify the argument and
8192
return {\tt None}. This is the opposite of the string methods,
8193
which return a new string and leave the original alone.
8194
8195
If you are used to writing string code like this:
8196
8197
\begin{verbatim}
word = word.strip()
\end{verbatim}
8200
8201
it is tempting to write list code like this:
8202
8203
\begin{verbatim}
t = t.sort()           # WRONG!
\end{verbatim}
8206
\index{sort method}
8207
\index{method!sort}
8208
8209
Because {\tt sort} returns {\tt None}, the
8210
next operation you perform with {\tt t} is likely to fail.
8211
8212
Before using list methods and operators, you should read the
8213
documentation carefully and then test them in interactive mode.
8214
8215
\item Pick an idiom and stick with it.
8216
8217
Part of the problem with lists is that there are too many
8218
ways to do things. For example, to remove an element from
8219
a list, you can use {\tt pop}, {\tt remove}, {\tt del},
8220
or even a slice assignment.
8221
8222
To add an element, you can use the {\tt append} method or
8223
the {\tt +} operator. Assuming that {\tt t} is a list and
8224
{\tt x} is a list element, these are correct:
8225
8226
\begin{verbatim}
t.append(x)
t = t + [x]
t += [x]
\end{verbatim}
8231
8232
And these are wrong:
8233
8234
\begin{verbatim}
t.append([x])          # WRONG!
t = t.append(x)        # WRONG!
t + [x]                # WRONG!
t = t + x              # WRONG!
\end{verbatim}
8240
8241
Try out each of these examples in interactive mode to make sure
8242
you understand what they do. Notice that only the last
8243
one causes a runtime error; the other three are legal, but they
8244
do the wrong thing.
8245
8246
8247
\item Make copies to avoid aliasing.
8248
\index{aliasing!copying to avoid}
8249
\index{copy!to avoid aliasing}
8250
8251
If you want to use a method like {\tt sort} that modifies
8252
the argument, but you need to keep the original list as
8253
well, you can make a copy.
8254
8255
\begin{verbatim}
>>> t = [3, 1, 2]
>>> t2 = t[:]
>>> t2.sort()
>>> t
[3, 1, 2]
>>> t2
[1, 2, 3]
\end{verbatim}
8264
8265
In this example you could also use the built-in function {\tt sorted},
8266
which returns a new, sorted list and leaves the original alone.
8267
\index{sorted!function}
8268
\index{function!sorted}
8269
8270
\begin{verbatim}
>>> t2 = sorted(t)
>>> t
[3, 1, 2]
>>> t2
[1, 2, 3]
\end{verbatim}
8277
8278
\end{enumerate}
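
The four statements marked wrong in the second pitfall can also be
tried as a script. In this sketch the list is reset before each
statement, the value {\tt x = 4} is an arbitrary choice, and the
{\tt try} statement is used only so that the script keeps running
after the error.

```python
x = 4

t = [1, 2, 3]
t.append([x])        # legal, but appends a nested list
print(t)             # [1, 2, 3, [4]]

t = [1, 2, 3]
t = t.append(x)      # legal, but t is now None
print(t)             # None

t = [1, 2, 3]
t + [x]              # legal, but the new list is discarded
print(t)             # [1, 2, 3]

t = [1, 2, 3]
try:
    t = t + x        # TypeError: a list and an int cannot be concatenated
except TypeError as err:
    print('TypeError:', err)
```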
8279
8280
8281
8282
\section{Glossary}
8283
8284
\begin{description}
8285
8286
\item[list:] A sequence of values.
8287
\index{list}
8288
8289
\item[element:] One of the values in a list (or other sequence);
elements are also called items.
8291
\index{element}
8292
8293
\item[nested list:] A list that is an element of another list.
8294
\index{nested list}
8295
8296
\item[accumulator:] A variable used in a loop to add up or
8297
accumulate a result.
8298
\index{accumulator}
8299
8300
\item[augmented assignment:] A statement that updates the value
8301
of a variable using an operator like \verb"+=".
8302
\index{assignment!augmented}
8303
\index{augmented assignment}
8304
\index{traversal}
8305
8306
\item[reduce:] A processing pattern that traverses a sequence
8307
and accumulates the elements into a single result.
8308
\index{reduce pattern}
8309
\index{pattern!reduce}
8310
8311
\item[map:] A processing pattern that traverses a sequence and
8312
performs an operation on each element.
8313
\index{map pattern}
8314
\index{pattern!map}
8315
8316
\item[filter:] A processing pattern that traverses a list and
8317
selects the elements that satisfy some criterion.
8318
\index{filter pattern}
8319
\index{pattern!filter}
8320
8321
\item[object:] Something a variable can refer to. An object
8322
has a type and a value.
8323
\index{object}
8324
8325
\item[equivalent:] Having the same value.
8326
\index{equivalent}
8327
8328
\item[identical:] Being the same object (which implies equivalence).
8329
\index{identical}
8330
8331
\item[reference:] The association between a variable and its value.
8332
\index{reference}
8333
8334
\item[aliasing:] A circumstance where two or more variables refer to the same
8335
object.
8336
\index{aliasing}
8337
8338
\item[delimiter:] A character or string used to indicate where a
8339
string should be split.
8340
\index{delimiter}
8341
8342
\end{description}
8343
8344
8345
\section{Exercises}
8346
8347
You can download solutions to these exercises from
8348
\url{http://thinkpython2.com/code/list_exercises.py}.
8349
8350
\begin{exercise}
8351
8352
Write a function called \verb"nested_sum" that takes a list of lists
8353
of integers and adds up the elements from all of the nested lists.
8354
For example:
8355
8356
\begin{verbatim}
>>> t = [[1, 2], [3], [4, 5, 6]]
>>> nested_sum(t)
21
\end{verbatim}
8361
8362
\end{exercise}
8363
8364
\begin{exercise}
8365
\label{cumulative}
8366
\index{cumulative sum}
8367
8368
Write a function called {\tt cumsum} that takes a list of numbers and
8369
returns the cumulative sum; that is, a new list where the $i$th
8370
element is the sum of the first $i+1$ elements from the original list.
8371
For example:
8372
8373
\begin{verbatim}
>>> t = [1, 2, 3]
>>> cumsum(t)
[1, 3, 6]
\end{verbatim}
8378
8379
\end{exercise}
8380
8381
\begin{exercise}
8382
8383
Write a function called \verb"middle" that takes a list and
8384
returns a new list that contains all but the first and last
8385
elements. For example:
8386
8387
\begin{verbatim}
>>> t = [1, 2, 3, 4]
>>> middle(t)
[2, 3]
\end{verbatim}
8392
8393
\end{exercise}
8394
8395
\begin{exercise}
8396
8397
Write a function called \verb"chop" that takes a list, modifies it
8398
by removing the first and last elements, and returns {\tt None}.
8399
For example:
8400
8401
\begin{verbatim}
>>> t = [1, 2, 3, 4]
>>> chop(t)
>>> t
[2, 3]
\end{verbatim}
8407
8408
\end{exercise}
8409
8410
8411
\begin{exercise}
8412
Write a function called \verb"is_sorted" that takes a list as a
8413
parameter and returns {\tt True} if the list is sorted in ascending
8414
order and {\tt False} otherwise. For example:
8415
8416
\begin{verbatim}
>>> is_sorted([1, 2, 2])
True
>>> is_sorted(['b', 'a'])
False
\end{verbatim}
8422
8423
\end{exercise}
8424
8425
8426
\begin{exercise}
8427
\label{anagram}
8428
\index{anagram}
8429
8430
Two words are anagrams if you can rearrange the letters from one
8431
to spell the other. Write a function called \verb"is_anagram"
8432
that takes two strings and returns {\tt True} if they are anagrams.
8433
\end{exercise}
8434
8435
8436
8437
\begin{exercise}
8438
\label{duplicate}
8439
\index{duplicate}
8440
\index{uniqueness}
8441
8442
Write a function called \verb"has_duplicates" that takes
8443
a list and returns {\tt True} if there is any element that
8444
appears more than once. It should not modify the original
8445
list.
8446
8447
\end{exercise}
8448
8449
8450
\begin{exercise}
8451
8452
This exercise pertains to the so-called Birthday Paradox, which you
8453
can read about at \url{http://en.wikipedia.org/wiki/Birthday_paradox}.
8454
\index{birthday paradox}
8455
8456
If there are 23 students in your class, what are the chances
8457
that two of you have the same birthday? You can estimate this
8458
probability by generating random samples of 23 birthdays
8459
and checking for matches. Hint: you can generate random birthdays
8460
with the {\tt randint} function in the {\tt random} module.
8461
\index{random module}
8462
\index{module!random}
8463
\index{randint function}
8464
\index{function!randint}
8465
8466
You can download my
8467
solution from \url{http://thinkpython2.com/code/birthday.py}.
8468
8469
\end{exercise}
8470
8471
8472
8473
\begin{exercise}
8474
\index{append method}
8475
\index{method!append}
8476
\index{list!concatenation}
8477
\index{concatenation!list}
8478
8479
Write a function that reads the file {\tt words.txt} and builds
8480
a list with one element per word. Write two versions of
8481
this function, one using the {\tt append} method and the
8482
other using the idiom {\tt t = t + [x]}. Which one takes
8483
longer to run? Why?
8484
8485
Solution: \url{http://thinkpython2.com/code/wordlist.py}.
8486
\index{time module}
8487
\index{module!time}
8488
8489
\end{exercise}
8490
8491
8492
\begin{exercise}
8493
\label{wordlist1}
8494
\label{bisection}
8495
\index{membership!bisection search}
8496
\index{bisection search}
8497
\index{search, bisection}
8498
\index{membership!binary search}
8499
\index{binary search}
8500
\index{search, binary}
8501
8502
To check whether a word is in the word list, you could use
8503
the {\tt in} operator, but it would be slow because it searches
8504
through the words in order.
8505
8506
Because the words are in alphabetical order, we can speed things up
8507
with a bisection search (also known as binary search), which is
8508
similar to what you do when you look a word up in the dictionary (the book, not the data structure). You
8509
start in the middle and check to see whether the word you are looking
8510
for comes before the word in the middle of the list. If so, you
8511
search the first half of the list the same way. Otherwise you search
8512
the second half.
8513
8514
Either way, you cut the remaining search space in half. If the
8515
word list has 113,809 words, it will take about 17 steps to
8516
find the word or conclude that it's not there.
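
The figure of 17 steps comes from repeated halving: the number of
halvings needed to shrink $n$ candidates down to one is about
$\log_2 n$. A minimal check of that arithmetic, using the standard
{\tt math} module (the variable names are only illustrative):

```python
import math

n = 113809                       # number of words in the list
steps = math.ceil(math.log2(n))  # halvings needed to get down to one word
print(steps)                     # 17
```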
8517
8518
Write a function called \verb"in_bisect" that takes a sorted list
and a target value and returns {\tt True} if the target is
in the list and {\tt False} if it's not.
8521
\index{bisect module}
8522
\index{module!bisect}
8523
8524
Or you could read the documentation of the {\tt bisect} module
8525
and use that! Solution: \url{http://thinkpython2.com/code/inlist.py}.
8526
8527
\end{exercise}
8528
8529
\begin{exercise}
8530
\index{reverse word pair}
8531
8532
Two words are a ``reverse pair'' if each is the reverse of the
8533
other. Write a program that finds all the reverse pairs in the
8534
word list. Solution: \url{http://thinkpython2.com/code/reverse_pair.py}.
8535
8536
\end{exercise}
8537
8538
\begin{exercise}
8539
\index{interlocking words}
8540
8541
Two words ``interlock'' if taking alternating letters from each forms
8542
a new word. For example, ``shoe'' and ``cold''
8543
interlock to form ``schooled''.
8544
Solution: \url{http://thinkpython2.com/code/interlock.py}.
8545
Credit: This exercise is inspired by an example at \url{http://puzzlers.org}.
8546
8547
\begin{enumerate}
8548
8549
\item Write a program that finds all pairs of words that interlock.
8550
Hint: don't enumerate all pairs!
8551
8552
\item Can you find any words that are three-way interlocked; that is,
8553
every third letter forms a word, starting from the first, second or
8554
third?
8555
8556
\end{enumerate}
8557
\end{exercise}
8558
8559
8560
\chapter{Dictionaries}
8561
8562
This chapter presents another built-in type called a dictionary.
8563
Dictionaries are one of Python's best features; they are the
8564
building blocks of many efficient and elegant algorithms.
8565
8566
8567
\section{A dictionary is a mapping}
8568
8569
\index{dictionary}
8571
\index{type!dict}
8572
\index{key}
8573
\index{key-value pair}
8574
\index{index}
8575
A {\bf dictionary} is like a list, but more general. In a list,
8576
the indices have to be integers; in a dictionary they can
8577
be (almost) any type.
8578
8579
A dictionary contains a collection of indices, which are called {\bf
8580
keys}, and a collection of values. Each key is associated with a
8581
single value. The association of a key and a value is called a {\bf
8582
key-value pair} or sometimes an {\bf item}. \index{item}
8583
8584
In mathematical language, a dictionary represents a {\bf mapping}
8585
from keys to values, so you can also say that each key
8586
``maps to'' a value.
8587
As an example, we'll build a dictionary that maps from English
8588
to Spanish words, so the keys and the values are all strings.
8589
8590
The function {\tt dict} creates a new dictionary with no items.
8591
Because {\tt dict} is the name of a built-in function, you
8592
should avoid using it as a variable name.
8593
\index{dict function}
8594
\index{function!dict}
8595
8596
\begin{verbatim}
>>> eng2sp = dict()
>>> eng2sp
{}
\end{verbatim}
8601
8602
The squiggly-brackets, \verb"{}", represent an empty dictionary.
8603
To add items to the dictionary, you can use square brackets:
8604
\index{squiggly bracket}
8605
\index{bracket!squiggly}
8606
8607
\begin{verbatim}
>>> eng2sp['one'] = 'uno'
\end{verbatim}
8610
%
8611
This line creates an item that maps from the key
8612
\verb"'one'" to the value \verb"'uno'". If we print the
8613
dictionary again, we see a key-value pair with a colon
8614
between the key and value:
8615
8616
\begin{verbatim}
>>> eng2sp
{'one': 'uno'}
\end{verbatim}
8620
%
8621
This output format is also an input format. For example,
8622
you can create a new dictionary with three items:
8623
8624
\begin{verbatim}
>>> eng2sp = {'one': 'uno', 'two': 'dos', 'three': 'tres'}
\end{verbatim}
8627
%
8628
But if you print {\tt eng2sp}, you might be surprised:
8629
8630
\begin{verbatim}
>>> eng2sp
{'one': 'uno', 'three': 'tres', 'two': 'dos'}
\end{verbatim}
8634
%
8635
The order of the key-value pairs might not be the same. If
you type the same example on your computer, you might get a
different result. In versions of Python before 3.7, the order of
items in a dictionary is unpredictable; since Python 3.7, a dictionary
preserves the order in which its items were inserted.
8639
8640
But that's not a problem because
8641
the elements of a dictionary are never indexed with integer indices.
8642
Instead, you use the keys to look up the corresponding values:
8643
8644
\begin{verbatim}
>>> eng2sp['two']
'dos'
\end{verbatim}
8648
%
8649
The key \verb"'two'" always maps to the value \verb"'dos'" so the order
8650
of the items doesn't matter.
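
To see that order plays no role in what a dictionary means, the
following sketch builds the same mapping with the items written in two
different orders. The names {\tt d1} and {\tt d2} are only
illustrative.

```python
d1 = {'one': 'uno', 'two': 'dos', 'three': 'tres'}
d2 = {'three': 'tres', 'one': 'uno', 'two': 'dos'}

print(d1 == d2)              # True: same key-value pairs
print(d1['two'], d2['two'])  # dos dos
```

Two dictionaries are equivalent if they contain the same key-value
pairs, regardless of the order in which the pairs were added.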
8651
8652
If the key isn't in the dictionary, you get an exception:
8653
\index{exception!KeyError}
8654
\index{KeyError}
8655
8656
\begin{verbatim}
>>> eng2sp['four']
KeyError: 'four'
\end{verbatim}
8660
%
8661
The {\tt len} function works on dictionaries; it returns the
8662
number of key-value pairs:
8663
\index{len function}
8664
\index{function!len}
8665
8666
\begin{verbatim}
>>> len(eng2sp)
3
\end{verbatim}
8670
%
8671
The {\tt in} operator works on dictionaries, too; it tells you whether
8672
something appears as a {\em key} in the dictionary (appearing
8673
as a value is not good enough).
8674
\index{membership!dictionary}
8675
\index{in operator}
8676
\index{operator!in}
8677
8678
\begin{verbatim}
>>> 'one' in eng2sp
True
>>> 'uno' in eng2sp
False
\end{verbatim}
8684
%
8685
To see whether something appears as a value in a dictionary, you
8686
can use the method {\tt values}, which returns a collection of
8687
values, and then use the {\tt in} operator:
8688
\index{values method}
8689
\index{method!values}
8690
8691
\begin{verbatim}
>>> vals = eng2sp.values()
>>> 'uno' in vals
True
\end{verbatim}
8696
%
8697
The {\tt in} operator uses different algorithms for lists and
8698
dictionaries. For lists, it searches the elements of the list in
8699
order, as in Section~\ref{find}. As the list gets longer, the search
8700
time gets longer in direct proportion.
8701
8702
Python dictionaries use a data structure
8703
called a {\bf hashtable} that has a remarkable property: the
8704
{\tt in} operator takes about the same amount of time no matter how
8705
many items are in the dictionary. I explain how that's possible
8706
in Section~\ref{hashtable}, but the explanation might not make
8707
sense until you've read a few more chapters.
8708
8709
8710
\section{Dictionary as a collection of counters}
8711
\label{histogram}
8712
\index{counter}
8713
8714
Suppose you are given a string and you want to count how many
8715
times each letter appears. There are several ways you could do it:
8716
8717
\begin{enumerate}
8718
8719
\item You could create 26 variables, one for each letter of the
8720
alphabet. Then you could traverse the string and, for each
8721
character, increment the corresponding counter, probably using
8722
a chained conditional.
8723
8724
\item You could create a list with 26 elements. Then you could
8725
convert each character to a number (using the built-in function
8726
{\tt ord}), use the number as an index into the list, and increment
8727
the appropriate counter.
8728
8729
\item You could create a dictionary with characters as keys
8730
and counters as the corresponding values. The first time you
8731
see a character, you would add an item to the dictionary. After
8732
that you would increment the value of an existing item.
8733
8734
\end{enumerate}

Each of these options performs the same computation, but each
of them implements that computation in a different way.
\index{implementation}

An {\bf implementation} is a way of performing a computation;
some implementations are better than others. For example,
an advantage of the dictionary implementation is that we don't
have to know ahead of time which letters appear in the string
and we only have to make room for the letters that do appear.

Here is what the code might look like:

\begin{verbatim}
def histogram(s):
    d = dict()
    for c in s:
        if c not in d:
            d[c] = 1
        else:
            d[c] += 1
    return d
\end{verbatim}
%
The name of the function is {\tt histogram}, which is a statistical
term for a collection of counters (or frequencies).
\index{histogram}
\index{frequency}
\index{traversal}

The first line of the
function creates an empty dictionary. The {\tt for} loop traverses
the string. Each time through the loop, if the character {\tt c} is
not in the dictionary, we create a new item with key {\tt c} and the
initial value 1 (since we have seen this letter once). If {\tt c} is
already in the dictionary we increment {\tt d[c]}.
\index{histogram}

Here's how it works:

\begin{verbatim}
>>> h = histogram('brontosaurus')
>>> h
{'b': 1, 'r': 2, 'o': 2, 'n': 1, 't': 1, 's': 2, 'a': 1, 'u': 2}
\end{verbatim}
%
The histogram indicates that the letters \verb"'a'" and \verb"'b'"
appear once; \verb"'o'" appears twice, and so on.


\index{get method}
\index{method!get}
Dictionaries have a method called {\tt get} that takes a key
and a default value. If the key appears in the dictionary,
{\tt get} returns the corresponding value; otherwise it returns
the default value. For example:

\begin{verbatim}
>>> h = histogram('a')
>>> h
{'a': 1}
>>> h.get('a', 0)
1
>>> h.get('c', 0)
0
\end{verbatim}
%
As an exercise, use {\tt get} to write {\tt histogram} more concisely. You
should be able to eliminate the {\tt if} statement.


\section{Looping and dictionaries}
\index{dictionary!looping with}
\index{looping!with dictionaries}
\index{traversal}

If you use a dictionary in a {\tt for} statement, it traverses
the keys of the dictionary. For example, \verb"print_hist"
prints each key and the corresponding value:

\begin{verbatim}
def print_hist(h):
    for c in h:
        print(c, h[c])
\end{verbatim}
%
Here's what the output looks like:

\begin{verbatim}
>>> h = histogram('parrot')
>>> print_hist(h)
p 1
a 1
r 2
o 1
t 1
\end{verbatim}
%
In Python 3.7 and later, the keys appear in the order in which they
were added to the dictionary; in earlier versions the order is
unpredictable. To traverse the keys
in sorted order, you can use the built-in function {\tt sorted}:
\index{sorted!function}
\index{function!sorted}

\begin{verbatim}
>>> for key in sorted(h):
...     print(key, h[key])
a 1
o 1
p 1
r 2
t 1
\end{verbatim}
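
The same built-in function can also order the keys by their counts rather than alphabetically. The sketch below relies on the optional arguments {\tt key} and {\tt reverse} of {\tt sorted}, which this chapter has not introduced; the variable name \verb"by_count" is made up for the illustration.

```python
def histogram(s):
    d = dict()
    for c in s:
        if c not in d:
            d[c] = 1
        else:
            d[c] += 1
    return d

h = histogram('parrot')

# key=h.get tells sorted to compare the keys by their values in h;
# reverse=True puts the most frequent letters first.
by_count = sorted(h, key=h.get, reverse=True)

for key in by_count:
    print(key, h[key])
```

Letters with the same count stay in the order in which {\tt sorted} received them.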

%TODO: get this on Atlas


\section{Reverse lookup}
\label{raise}
\index{dictionary!lookup}
\index{dictionary!reverse lookup}
\index{lookup, dictionary}
\index{reverse lookup, dictionary}

Given a dictionary {\tt d} and a key {\tt k}, it is easy to
find the corresponding value {\tt v = d[k]}. This operation
is called a {\bf lookup}.

But what if you have {\tt v} and you want to find {\tt k}?
You have two problems: first, there might be more than one
key that maps to the value {\tt v}. Depending on the application,
you might be able to pick one, or you might have to make
a list that contains all of them. Second, there is no
simple syntax to do a {\bf reverse lookup}; you have to search.

Here is a function that takes a value and returns the first
key that maps to that value:

\begin{verbatim}
def reverse_lookup(d, v):
    for k in d:
        if d[k] == v:
            return k
    raise LookupError()
\end{verbatim}
%
This function is yet another example of the search pattern, but it
uses a feature we haven't seen before, {\tt raise}. The
{\bf raise statement} causes an exception; in this case it causes a
{\tt LookupError}, which is a built-in exception used to indicate
that a lookup operation failed.
\index{search}
\index{pattern!search} \index{raise statement} \index{statement!raise}
\index{exception!LookupError} \index{LookupError}

If we get to the end of the loop, that means {\tt v}
doesn't appear in the dictionary as a value, so we raise an
exception.

Here is an example of a successful reverse lookup:

\begin{verbatim}
>>> h = histogram('parrot')
>>> key = reverse_lookup(h, 2)
>>> key
'r'
\end{verbatim}
%
And an unsuccessful one:

\begin{verbatim}
>>> key = reverse_lookup(h, 3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in reverse_lookup
LookupError
\end{verbatim}
%
The effect when you raise an exception is the same as when
Python raises one: it prints a traceback and an error message.
\index{traceback}
\index{optional argument}
\index{argument!optional}

When you raise an exception, you can provide a detailed error message as an optional argument. For example:

\begin{verbatim}
>>> raise LookupError('value does not appear in the dictionary')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: value does not appear in the dictionary
\end{verbatim}
%
A reverse lookup is much slower than a forward lookup; if you
have to do it often, or if the dictionary gets big, the performance
of your program will suffer.
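
If the application needs every key that maps to a value, rather than just the first one, a variation on the same search can collect them in a list. This is only a sketch: the name \verb"reverse_lookup_all" is made up here, and returning an empty list instead of raising an exception is a design choice, not the only reasonable one.

```python
def reverse_lookup_all(d, v):
    """Return a list of every key in d that maps to v (possibly empty)."""
    keys = []
    for k in d:
        if d[k] == v:
            keys.append(k)
    return keys

# The histogram of 'parrot', written out as a literal.
h = {'p': 1, 'a': 1, 'r': 2, 'o': 1, 't': 1}
print(reverse_lookup_all(h, 1))
print(reverse_lookup_all(h, 3))
```

Like \verb"reverse_lookup", this function has to visit every item, so it is no faster.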


\section{Dictionaries and lists}
\label{invert}

Lists can appear as values in a dictionary. For example, if you
are given a dictionary that maps from letters to frequencies, you
might want to invert it; that is, create a dictionary that maps
from frequencies to letters. Since there might be several letters
with the same frequency, each value in the inverted dictionary
should be a list of letters.
\index{invert dictionary}
\index{dictionary!invert}

Here is a function that inverts a dictionary:

\begin{verbatim}
def invert_dict(d):
    inverse = dict()
    for key in d:
        val = d[key]
        if val not in inverse:
            inverse[val] = [key]
        else:
            inverse[val].append(key)
    return inverse
\end{verbatim}
%
Each time through the loop, {\tt key} gets a key from {\tt d} and
{\tt val} gets the corresponding value. If {\tt val} is not in {\tt
inverse}, that means we haven't seen it before, so we create a new
item and initialize it with a {\bf singleton} (a list that contains a
single element). Otherwise we have seen this value before, so we
append the corresponding key to the list. \index{singleton}

Here is an example:

\begin{verbatim}
>>> hist = histogram('parrot')
>>> hist
{'p': 1, 'a': 1, 'r': 2, 'o': 1, 't': 1}
>>> inverse = invert_dict(hist)
>>> inverse
{1: ['p', 'a', 'o', 't'], 2: ['r']}
\end{verbatim}

\begin{figure}
\centerline
{\includegraphics[scale=0.8]{figs/dict1.pdf}}
\caption{State diagram.}
\label{fig.dict1}
\end{figure}

Figure~\ref{fig.dict1} is a state diagram showing {\tt hist} and {\tt inverse}.
A dictionary is represented as a box with the type {\tt dict} above it
and the key-value pairs inside. If the values are integers, floats or
strings, I draw them inside the box, but I usually draw lists
outside the box, just to keep the diagram simple.
\index{state diagram}
\index{diagram!state}

Lists can be values in a dictionary, as this example shows, but they
cannot be keys. Here's what happens if you try:
\index{TypeError}
\index{exception!TypeError}


\begin{verbatim}
>>> t = [1, 2, 3]
>>> d = dict()
>>> d[t] = 'oops'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
\end{verbatim}
%
I mentioned earlier that a dictionary is implemented using
a hashtable and that means that the keys have to be {\bf hashable}.
\index{hash function}
\index{hashable}

A {\bf hash} is a function that takes a value (of any kind)
and returns an integer. Dictionaries use these integers,
called hash values, to store and look up key-value pairs.
\index{immutability}

This system works fine if the keys are immutable. But if the
keys are mutable, like lists, bad things happen. For example,
when you create a key-value pair, Python hashes the key and
stores it in the corresponding location. If you modify the
key and then hash it again, it would go to a different location.
In that case you might have two entries for the same key,
or you might not be able to find a key. Either way, the
dictionary wouldn't work correctly.
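
You can see hash values directly with the built-in function {\tt hash}. The sketch below only shows that equal immutable values get equal hash values and that a list has none; it uses a {\tt try} statement, which has not been introduced at this point, to catch the exception so the program can keep running. The actual integers vary from one run of Python to the next, so they are not shown.

```python
# Equal immutable values always have equal hash values.
print(hash('parrot') == hash('parrot'))
print(hash(42) == hash(42))

# A list is mutable, so Python refuses to hash it.
try:
    hash([1, 2, 3])
except TypeError as err:
    print('TypeError:', err)
```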

That's why keys have to be hashable, and why mutable types like
lists aren't. The simplest way to get around this limitation is to
use tuples, which we will see in the next chapter.

Since dictionaries are mutable, they can't be used as keys,
but they {\em can} be used as values.


\section{Memos}
\label{memoize}

If you played with the {\tt fibonacci} function from
Section~\ref{one.more.example}, you might have noticed that the bigger
the argument you provide, the longer the function takes to run.
Furthermore, the run time increases quickly.
\index{fibonacci function}
\index{function!fibonacci}

To understand why, consider Figure~\ref{fig.fibonacci}, which shows
the {\bf call graph} for {\tt fibonacci} with {\tt n=4}:

\begin{figure}
\centerline
{\includegraphics[scale=0.7]{figs/fibonacci.pdf}}
\caption{Call graph.}
\label{fig.fibonacci}
\end{figure}

A call graph shows a set of function frames, with lines connecting each
frame to the frames of the functions it calls. At the top of the
graph, {\tt fibonacci} with {\tt n=4} calls {\tt fibonacci} with {\tt
n=3} and {\tt n=2}. In turn, {\tt fibonacci} with {\tt n=3} calls
{\tt fibonacci} with {\tt n=2} and {\tt n=1}. And so on.
\index{function frame}
\index{frame}
\index{call graph}

Count how many times {\tt fibonacci(0)} and {\tt fibonacci(1)} are
called. This is an inefficient solution to the problem, and it gets
worse as the argument gets bigger.
\index{memo}

One solution is to keep track of values that have already been
computed by storing them in a dictionary. A previously computed value
that is stored for later use is called a {\bf memo}. Here is a
``memoized'' version of {\tt fibonacci}:

\begin{verbatim}
known = {0:0, 1:1}

def fibonacci(n):
    if n in known:
        return known[n]

    res = fibonacci(n-1) + fibonacci(n-2)
    known[n] = res
    return res
\end{verbatim}
%
{\tt known} is a dictionary that keeps track of the Fibonacci
numbers we already know. It starts with
two items: 0 maps to 0 and 1 maps to 1.

Whenever {\tt fibonacci} is called, it checks {\tt known}.
If the result is already there, it can return
immediately. Otherwise it has to
compute the new value, add it to the dictionary, and return it.

If you run this version of {\tt fibonacci} and compare it with
the original, you will find that it is much faster.
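
One way to make the comparison concrete is to count function calls instead of measuring time. In the sketch below, \verb"fib_plain" is a stand-in for the original function (it may differ in detail from the version in Section~\ref{one.more.example}), \verb"fib_memo" is the memoized version, and the dictionary {\tt calls} is a counter added only for this experiment.

```python
calls = {'plain': 0, 'memo': 0}   # hypothetical counters for this demo

def fib_plain(n):
    calls['plain'] += 1
    if n < 2:
        return n
    return fib_plain(n-1) + fib_plain(n-2)

known = {0: 0, 1: 1}

def fib_memo(n):
    calls['memo'] += 1
    if n in known:
        return known[n]
    res = fib_memo(n-1) + fib_memo(n-2)
    known[n] = res
    return res

# Both versions return the same value, but with very different call counts.
print(fib_plain(20), calls['plain'])
print(fib_memo(20), calls['memo'])
```

The memoized version computes each Fibonacci number only once, so its call count grows in proportion to {\tt n} rather than exponentially.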



\section{Global variables}
\index{global variable}
\index{variable!global}

In the previous example, {\tt known} is created outside the function,
so it belongs to the special frame called \verb"__main__".
Variables in \verb"__main__" are sometimes called {\bf global}
because they can be accessed from any function. Unlike local
variables, which disappear when their function ends, global variables
persist from one function call to the next.
\index{flag}
\index{main}

It is common to use global variables for {\bf flags}; that is,
boolean variables that indicate (``flag'') whether a condition
is true. For example, some programs use
a flag named {\tt verbose} to control the level of detail in the
output:

\begin{verbatim}
verbose = True

def example1():
    if verbose:
        print('Running example1')
\end{verbatim}
%
If you try to reassign a global variable, you might be surprised.
The following example is supposed to keep track of whether the
function has been called:
\index{reassignment}

\begin{verbatim}
been_called = False

def example2():
    been_called = True         # WRONG
\end{verbatim}
%
But if you run it you will see that the value of \verb"been_called"
doesn't change. The problem is that {\tt example2} creates a new local
variable named \verb"been_called". The local variable goes away when
the function ends, and has no effect on the global variable.
\index{global statement}
\index{statement!global}
\index{declaration}

To reassign a global variable inside a function you have to
{\bf declare} the global variable before you use it:

\begin{verbatim}
been_called = False

def example2():
    global been_called
    been_called = True
\end{verbatim}
%
The {\bf global statement} tells the interpreter
something like, ``In this function, when I say \verb"been_called", I
mean the global variable; don't create a local one.''
\index{update!global variable}
\index{global variable!update}

Here's an example that tries to update a global variable:

\begin{verbatim}
count = 0

def example3():
    count = count + 1          # WRONG
\end{verbatim}
%
If you run it you get:
\index{UnboundLocalError}
\index{exception!UnboundLocalError}

\begin{verbatim}
UnboundLocalError: local variable 'count' referenced before assignment
\end{verbatim}
%
Python assumes that {\tt count} is local, and under that assumption
you are reading it before writing it. The solution, again,
is to declare {\tt count} global.
\index{counter}

\begin{verbatim}
def example3():
    global count
    count += 1
\end{verbatim}
%
If a global variable refers to a mutable value, you can modify
the value without declaring the variable:
\index{mutability}

\begin{verbatim}
known = {0:0, 1:1}

def example4():
    known[2] = 1
\end{verbatim}
%
So you can add, remove and replace elements of a global list or
dictionary, but if you want to reassign the variable, you
have to declare it:

\begin{verbatim}
def example5():
    global known
    known = dict()
\end{verbatim}
%
Global variables can be useful, but if you have a lot of them,
and you modify them frequently, they can make programs
hard to debug.


\section{Debugging}
\index{debugging}

As you work with bigger datasets it can become unwieldy to
debug by printing and checking the output by hand. Here are some
suggestions for debugging large datasets:

\begin{description}

\item[Scale down the input:] If possible, reduce the size of the
dataset. For example, if the program reads a text file, start with
just the first 10 lines, or with the smallest example you can find.
You can either edit the files themselves, or (better) modify the
program so it reads only the first {\tt n} lines.

If there is an error, you can reduce {\tt n} to the smallest
value that manifests the error, and then increase it gradually
as you find and correct errors.

\item[Check summaries and types:] Instead of printing and checking the
entire dataset, consider printing summaries of the data: for example,
the number of items in a dictionary or the total of a list of numbers.

A common cause of runtime errors is a value that is not the right
type. For debugging this kind of error, it is often enough to print
the type of a value.

\item[Write self-checks:] Sometimes you can write code to check
for errors automatically. For example, if you are computing the
average of a list of numbers, you could check that the result is
not greater than the largest element in the list or less than
the smallest. This is called a ``sanity check'' because it detects
results that are ``insane''.
\index{sanity check}
\index{consistency check}

Another kind of check compares the results of two different
computations to see if they are consistent. This is called a
``consistency check''.

\item[Format the output:] Formatting debugging output
can make it easier to spot an error. We saw an example in
Section~\ref{factdebug}. Another tool you might find useful is the {\tt pprint} module, which provides
a {\tt pprint} function that displays built-in types in
a more human-readable format ({\tt pprint} stands for
``pretty print'').
\index{pretty print}
\index{pprint module}
\index{module!pprint}

\end{description}
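
Several of these suggestions fit in a few lines of code. The following sketch is one possible arrangement, not a recipe: the function {\tt average}, the list {\tt data} and the dictionary {\tt hist} are made up for the illustration, and the sanity check assumes integer data (with floats, rounding could push the result slightly outside the range).

```python
from pprint import pprint

def average(t):
    """Return the mean of a non-empty list of numbers, with a sanity check."""
    result = sum(t) / len(t)
    # Sanity check: a mean should never lie outside the range of the data.
    if result < min(t) or result > max(t):
        raise ValueError('average is out of range')
    return result

data = [3, 1, 4, 1, 5, 9, 2, 6]

# Summaries and types instead of the whole dataset.
print(type(data), len(data), sum(data))
print(average(data))

# pprint displays the dictionary with its keys in sorted order.
hist = {'p': 1, 'a': 1, 'r': 2, 'o': 1, 't': 1}
pprint(hist)
```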

Again, time you spend building scaffolding can reduce
the time you spend debugging.
\index{scaffolding}


\section{Glossary}

\begin{description}

\item[mapping:] A relationship in which each element of one set
corresponds to an element of another set.
\index{mapping}

\item[dictionary:] A mapping from keys to their
corresponding values.
\index{dictionary}

\item[key-value pair:] The representation of the mapping from
a key to a value.
\index{key-value pair}

\item[item:] In a dictionary, another name for a key-value
pair.
\index{item!dictionary}

\item[key:] An object that appears in a dictionary as the
first part of a key-value pair.
\index{key}

\item[value:] An object that appears in a dictionary as the
second part of a key-value pair. This is more specific than
our previous use of the word ``value''.
\index{value}

\item[implementation:] A way of performing a computation.
\index{implementation}

\item[hashtable:] The data structure used to implement Python
dictionaries.
\index{hashtable}

\item[hash function:] A function used by a hashtable to compute the
location for a key.
\index{hash function}

\item[hashable:] A type that has a hash function. Immutable
types like integers,
floats and strings are hashable; mutable types like lists and
dictionaries are not.
\index{hashable}

\item[lookup:] A dictionary operation that takes a key and finds
the corresponding value.
\index{lookup}

\item[reverse lookup:] A dictionary operation that takes a value and finds
one or more keys that map to it.
\index{reverse lookup}

\item[raise statement:] A statement that (deliberately) raises an exception.
\index{raise statement}
\index{statement!raise}

\item[singleton:] A list (or other sequence) with a single element.
\index{singleton}

\item[call graph:] A diagram that shows every frame created during
the execution of a program, with an arrow from each caller to
each callee.
\index{call graph}
\index{diagram!call graph}

\item[memo:] A computed value stored to avoid unnecessary future
computation.
\index{memo}

\item[global variable:] A variable defined outside a function. Global
variables can be accessed from any function.
\index{global variable}

\item[global statement:] A statement that declares a variable name
global.
\index{global statement}
\index{statement!global}

\item[flag:] A boolean variable used to indicate whether a condition
is true.
\index{flag}

\item[declaration:] A statement like {\tt global} that tells the
interpreter something about a variable.
\index{declaration}

\end{description}


\section{Exercises}

\begin{exercise}
\label{wordlist2}
\index{set membership}
\index{membership!set}

Write a function that reads the words in {\tt words.txt} and
stores them as keys in a dictionary. It doesn't matter what the
values are. Then you can use the {\tt in} operator
as a fast way to check whether a string is in
the dictionary.

If you did Exercise~\ref{wordlist1}, you can compare the speed
of this implementation with the list {\tt in} operator and the
bisection search.

\end{exercise}


\begin{exercise}
\label{setdefault}

Read the documentation of the dictionary method {\tt setdefault}
and use it to write a more concise version of \verb"invert_dict".
Solution: \url{http://thinkpython2.com/code/invert_dict.py}.
\index{setdefault method}
\index{method!setdefault}

\end{exercise}


\begin{exercise}
Memoize the Ackermann function from Exercise~\ref{ackermann} and see if
memoization makes it possible to evaluate the function with bigger
arguments. Hint: no.
Solution: \url{http://thinkpython2.com/code/ackermann_memo.py}.
\index{Ackermann function}
\index{function!ack}

\end{exercise}



\begin{exercise}
\index{duplicate}

If you did Exercise~\ref{duplicate}, you already have
a function named \verb"has_duplicates" that takes a list
as a parameter and returns {\tt True} if there is any object
that appears more than once in the list.

Use a dictionary to write a faster, simpler version of
\verb"has_duplicates".
Solution: \url{http://thinkpython2.com/code/has_duplicates.py}.

\end{exercise}


\begin{exercise}
\label{exrotatepairs}
\index{letter rotation}
\index{rotation!letters}

Two words are ``rotate pairs'' if you can rotate one of them
and get the other (see \verb"rotate_word" in Exercise~\ref{exrotate}).

Write a program that reads a wordlist and finds all the rotate
pairs. Solution: \url{http://thinkpython2.com/code/rotate_pairs.py}.

\end{exercise}


\begin{exercise}
\index{Car Talk}
\index{Puzzler}

Here's another Puzzler from {\em Car Talk}
(\url{http://www.cartalk.com/content/puzzlers}):

\begin{quote}
This was sent in by a fellow named Dan O'Leary. He came upon a common
one-syllable, five-letter word recently that has the following unique
property. When you remove the first letter, the remaining letters form
a homophone of the original word, that is a word that sounds exactly
the same. Replace the first letter, that is, put it back and remove
the second letter and the result is yet another homophone of the
original word. And the question is, what's the word?

Now I'm going to give you an example that doesn't work. Let's look at
the five-letter word, `wrack.' W-R-A-C-K, you know like to `wrack with
pain.' If I remove the first letter, I am left with a four-letter
word, `R-A-C-K.' As in, `Holy cow, did you see the rack on that buck!
It must have been a nine-pointer!' It's a perfect homophone. If you
put the `w' back, and remove the `r,' instead, you're left with the
word, `wack,' which is a real word, it's just not a homophone of the
other two words.

But there is, however, at least one word that Dan and we know of,
which will yield two homophones if you remove either of the first two
letters to make two, new four-letter words. The question is, what's
the word?
\end{quote}
\index{homophone}
\index{reducible word}
\index{word, reducible}

You can use the dictionary from Exercise~\ref{wordlist2} to check
whether a string is in the word list.

To check whether two words are homophones, you can use the CMU
Pronouncing Dictionary. You can download it from
\url{http://www.speech.cs.cmu.edu/cgi-bin/cmudict} or from
\url{http://thinkpython2.com/code/c06d} and you can also download
\url{http://thinkpython2.com/code/pronounce.py}, which provides a function
named \verb"read_dictionary" that reads the pronouncing dictionary and
returns a Python dictionary that maps from each word to a string that
describes its primary pronunciation.

Write a program that lists all the words that solve the Puzzler.
Solution: \url{http://thinkpython2.com/code/homophone.py}.

\end{exercise}



\chapter{Tuples}
\label{tuplechap}

This chapter presents one more built-in type, the tuple, and then
shows how lists, dictionaries, and tuples work together.
I also present a useful feature for variable-length argument lists,
the gather and scatter operators.

One note: there is no consensus on how to pronounce ``tuple''.
Some people say ``tuh-ple'', which rhymes with ``supple''. But
in the context of programming, most people say ``too-ple'', which
rhymes with ``quadruple''.


\section{Tuples are immutable}
\index{tuple}
\index{type!tuple}
\index{sequence}

A tuple is a sequence of values. The values can be any type, and
they are indexed by integers, so in that respect tuples are a lot
like lists. The important difference is that tuples are immutable.
\index{mutability}
\index{immutability}

Syntactically, a tuple is a comma-separated list of values:

\begin{verbatim}
>>> t = 'a', 'b', 'c', 'd', 'e'
\end{verbatim}
%
Although it is not necessary, it is common to enclose tuples in
parentheses:
\index{parentheses!tuples in}

\begin{verbatim}
>>> t = ('a', 'b', 'c', 'd', 'e')
\end{verbatim}
%
To create a tuple with a single element, you have to include a final
comma:
\index{singleton}
\index{tuple!singleton}

\begin{verbatim}
>>> t1 = 'a',
>>> type(t1)
<class 'tuple'>
\end{verbatim}
%
A value in parentheses is not a tuple:

\begin{verbatim}
>>> t2 = ('a')
>>> type(t2)
<class 'str'>
\end{verbatim}
%
Another way to create a tuple is the built-in function {\tt tuple}.
With no argument, it creates an empty tuple:
\index{tuple function}
\index{function!tuple}

\begin{verbatim}
>>> t = tuple()
>>> t
()
\end{verbatim}
%
If the argument is a sequence (string, list or tuple), the result
is a tuple with the elements of the sequence:

\begin{verbatim}
>>> t = tuple('lupins')
>>> t
('l', 'u', 'p', 'i', 'n', 's')
\end{verbatim}
%
Because {\tt tuple} is the name of a built-in function, you should
avoid using it as a variable name.

Most list operators also work on tuples. The bracket operator
indexes an element:
\index{bracket operator}
\index{operator!bracket}

\begin{verbatim}
>>> t = ('a', 'b', 'c', 'd', 'e')
>>> t[0]
'a'
\end{verbatim}
%
And the slice operator selects a range of elements.
\index{slice operator}
\index{operator!slice}
\index{tuple!slice}
\index{slice!tuple}

\begin{verbatim}
>>> t[1:3]
('b', 'c')
\end{verbatim}
%
But if you try to modify one of the elements of the tuple, you get
an error:
\index{exception!TypeError}
\index{TypeError}
\index{item assignment}
\index{assignment!item}

\begin{verbatim}
>>> t[0] = 'A'
TypeError: 'tuple' object does not support item assignment
\end{verbatim}
%
Because tuples are immutable, you can't modify the elements. But you
can replace one tuple with another:

\begin{verbatim}
>>> t = ('A',) + t[1:]
>>> t
('A', 'b', 'c', 'd', 'e')
\end{verbatim}
%
This statement makes a new tuple and then makes {\tt t} refer to it.

The relational operators work with tuples and other sequences;
Python starts by comparing the first element from each
sequence. If they are equal, it goes on to the next elements,
and so on, until it finds elements that differ. Subsequent
elements are not considered (even if they are really big).
\index{comparison!tuple}
\index{tuple!comparison}

\begin{verbatim}
>>> (0, 1, 2) < (0, 3, 4)
True
>>> (0, 1, 2000000) < (0, 3, 4)
True
\end{verbatim}
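
To make the comparison rule explicit, here is a sketch of a function that mimics what the {\tt <} operator does with two tuples. The name \verb"tuple_less_than" is made up, and the last line covers a case not discussed above: when one tuple runs out of elements before a difference is found, the shorter one is considered smaller.

```python
def tuple_less_than(t1, t2):
    """Sketch of how < compares two tuples element by element."""
    n = min(len(t1), len(t2))
    for i in range(n):
        if t1[i] != t2[i]:
            return t1[i] < t2[i]    # the first difference decides
    return len(t1) < len(t2)        # no difference: the shorter one is smaller

print(tuple_less_than((0, 1, 2), (0, 3, 4)))
print(tuple_less_than((0, 1, 2000000), (0, 3, 4)))
```

In practice you would just use the built-in operator; the function is only meant to show the order in which elements are examined.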



\section{Tuple assignment}
\label{tuple.assignment}
\index{tuple!assignment}
\index{assignment!tuple}
\index{swap pattern}
\index{pattern!swap}

It is often useful to swap the values of two variables.
With conventional assignments, you have to use a temporary
variable. For example, to swap {\tt a} and {\tt b}:

\begin{verbatim}
>>> temp = a
>>> a = b
>>> b = temp
\end{verbatim}
%
This solution is cumbersome; {\bf tuple assignment} is more elegant:

\begin{verbatim}
>>> a, b = b, a
\end{verbatim}
%
The left side is a tuple of variables; the right side is a tuple of
expressions. Each value is assigned to its respective variable.
All the expressions on the right side are evaluated before any
of the assignments.
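
The order of evaluation matters when the right side uses the variables being assigned. As a small illustration (the loop and the starting values are chosen only for this example), the following code steps through the Fibonacci sequence without a temporary variable:

```python
a, b = 0, 1
for i in range(5):
    # Both b and a + b are computed from the old values of a and b
    # before either variable is reassigned.
    a, b = b, a + b

print(a, b)
```

If the two assignments were written as separate statements, {\tt a} would already have its new value when {\tt a + b} was computed, and the result would be different.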
9660
9661
The number of variables on the left and the number of
9662
values on the right have to be the same:
9663
\index{exception!ValueError}
9664
\index{ValueError}
9665
9666
\begin{verbatim}
9667
>>> a, b = 1, 2, 3
9668
ValueError: too many values to unpack
9669
\end{verbatim}
9670
%
9671
More generally, the right side can be any kind of sequence
9672
(string, list or tuple). For example, to split an email address
9673
into a user name and a domain, you could write:
9674
\index{split method}
9675
\index{method!split}
9676
\index{email address}
9677
9678
\begin{verbatim}
9679
>>> addr = 'monty@python.org'
9680
>>> uname, domain = addr.split('@')
9681
\end{verbatim}
9682
%
9683
The return value from {\tt split} is a list with two elements;
9684
the first element is assigned to {\tt uname}, the second to
9685
{\tt domain}.
9686
9687
\begin{verbatim}
9688
>>> uname
9689
'monty'
9690
>>> domain
9691
'python.org'
9692
\end{verbatim}
9693
%
9694
9695
\section{Tuples as return values}
9696
\index{tuple}
9697
\index{value!tuple}
9698
\index{return value!tuple}
9699
\index{function!tuple as return value}
9700
9701
Strictly speaking, a function can only return one value, but
9702
if the value is a tuple, the effect is the same as returning
9703
multiple values. For example, if you want to divide two integers
9704
and compute the quotient and remainder, it is inefficient to
9705
compute {\tt x//y} and then {\tt x\%y}. It is better to compute
9706
them both at the same time.
9707
\index{divmod}
9708
9709
The built-in function {\tt divmod} takes two arguments and
9710
returns a tuple of two values, the quotient and remainder.
9711
You can store the result as a tuple:
9712
9713
\begin{verbatim}
9714
>>> t = divmod(7, 3)
9715
>>> t
9716
(2, 1)
9717
\end{verbatim}
9718
%
9719
Or use tuple assignment to store the elements separately:
9720
\index{tuple assignment}
9721
\index{assignment!tuple}
9722
9723
\begin{verbatim}
9724
>>> quot, rem = divmod(7, 3)
9725
>>> quot
9726
2
9727
>>> rem
9728
1
9729
\end{verbatim}
9730
%
9731
Here is an example of a function that returns a tuple:
9732
9733
\begin{verbatim}
def min_max(t):
    return min(t), max(t)
\end{verbatim}
9737
%
9738
{\tt max} and {\tt min} are built-in functions that find
9739
the largest and smallest elements of a sequence. \verb"min_max"
9740
computes both and returns a tuple of two values.
9741
\index{max function}
9742
\index{function!max}
9743
\index{min function}
9744
\index{function!min}
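As a short usage sketch, the caller can unpack the returned tuple with tuple assignment. The list of numbers and the variable names {\tt low} and {\tt high} are chosen here for illustration.

\begin{verbatim}
def min_max(t):
    return min(t), max(t)

low, high = min_max([3, 1, 4, 1, 5])
print(low, high)     # prints 1 5
\end{verbatim}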
9745
9746
9747
\section{Variable-length argument tuples}
9748
\label{gather}
9749
\index{variable-length argument tuple}
9750
\index{argument!variable-length tuple}
9751
\index{gather}
9752
\index{parameter!gather}
9753
\index{argument!gather}
9754
9755
Functions can take a variable number of arguments. A parameter
9756
name that begins with {\tt *} {\bf gathers} arguments into
9757
a tuple. For example, {\tt printall}
9758
takes any number of arguments and prints them:
9759
9760
\begin{verbatim}
def printall(*args):
    print(args)
\end{verbatim}
9764
%
9765
The gather parameter can have any name you like, but {\tt args} is
9766
conventional. Here's how the function works:
9767
9768
\begin{verbatim}
9769
>>> printall(1, 2.0, '3')
9770
(1, 2.0, '3')
9771
\end{verbatim}
9772
%
9773
The complement of gather is {\bf scatter}. If you have a
9774
sequence of values and you want to pass it to a function
9775
as multiple arguments, you can use the {\tt *} operator.
9776
For example, {\tt divmod} takes exactly two arguments; it
9777
doesn't work with a tuple:
9778
\index{scatter}
9779
\index{argument scatter}
9780
\index{TypeError}
9781
\index{exception!TypeError}
9782
9783
\begin{verbatim}
9784
>>> t = (7, 3)
9785
>>> divmod(t)
9786
TypeError: divmod expected 2 arguments, got 1
9787
\end{verbatim}
9788
%
9789
But if you scatter the tuple, it works:
9790
9791
\begin{verbatim}
9792
>>> divmod(*t)
9793
(2, 1)
9794
\end{verbatim}
9795
%
9796
Many of the built-in functions use
9797
variable-length argument tuples. For example, {\tt max}
9798
and {\tt min} can take any number of arguments:
9799
\index{max function}
9800
\index{function!max}
9801
\index{min function}
9802
\index{function!min}
9803
9804
\begin{verbatim}
9805
>>> max(1, 2, 3)
9806
3
9807
\end{verbatim}
9808
%
9809
But {\tt sum} does not.
9810
\index{sum function}
9811
\index{function!sum}
9812
9813
\begin{verbatim}
9814
>>> sum(1, 2, 3)
9815
TypeError: sum expected at most 2 arguments, got 3
9816
\end{verbatim}
9817
%
9818
As an exercise, write a function called \verb"sum_all" that takes any number
9819
of arguments and returns their sum.
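Gather and scatter can appear in the same program. As a sketch that deliberately does not solve the exercise, the hypothetical function \verb"value_range" (a name chosen here for illustration) gathers its arguments into a tuple and passes that tuple to {\tt max} and {\tt min}. The last call scatters a list into separate arguments.

\begin{verbatim}
def value_range(*args):
    # args is a tuple that holds all of the positional arguments
    return max(args) - min(args)

print(value_range(1, 2, 3))    # prints 2

t = [7, 3, 10]
print(value_range(*t))         # same as value_range(7, 3, 10); prints 7
\end{verbatim}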
9820
9821
9822
\section{Lists and tuples}
9823
\index{zip function}
9824
\index{function!zip}
9825
9826
{\tt zip} is a built-in function that takes two or more sequences and
9827
interleaves them. The name of the function refers to
9828
a zipper, which interleaves two rows of teeth.
9829
9830
This example zips a string and a list:
9831
9832
\begin{verbatim}
9833
>>> s = 'abc'
9834
>>> t = [0, 1, 2]
9835
>>> zip(s, t)
9836
<zip object at 0x7f7d0a9e7c48>
9837
\end{verbatim}
9838
%
9839
The result is a {\bf zip object} that knows how to iterate through
9840
the pairs. The most common use of {\tt zip} is in a {\tt for} loop:
9841
9842
\begin{verbatim}
>>> for pair in zip(s, t):
...     print(pair)
...
('a', 0)
('b', 1)
('c', 2)
\end{verbatim}
9850
%
9851
A zip object is a kind of {\bf iterator}, which is any object
9852
that iterates through a sequence. Iterators are similar to lists in some
9853
ways, but unlike lists, you can't use an index to select an element from
9854
an iterator.
9855
\index{iterator}
9856
9857
If you want to use list operators and methods, you can
9858
use a zip object to make a list:
9859
9860
\begin{verbatim}
9861
>>> list(zip(s, t))
9862
[('a', 0), ('b', 1), ('c', 2)]
9863
\end{verbatim}
9864
%
9865
The result is a list of tuples; in this example, each tuple contains
9866
a character from the string and the corresponding element from
9867
the list.
9868
\index{list!of tuples}
9869
9870
If the sequences are not the same length, the result has the
9871
length of the shorter one.
9872
9873
\begin{verbatim}
9874
>>> list(zip('Anne', 'Elk'))
9875
[('A', 'E'), ('n', 'l'), ('n', 'k')]
9876
\end{verbatim}
9877
%
9878
You can use tuple assignment in a {\tt for} loop to traverse a list of
9879
tuples:
9880
\index{traversal}
9881
\index{tuple assignment}
9882
\index{assignment!tuple}
9883
9884
\begin{verbatim}
t = [('a', 0), ('b', 1), ('c', 2)]
for letter, number in t:
    print(number, letter)
\end{verbatim}
9889
%
9890
Each time through the loop, Python selects the next tuple in
9891
the list and assigns the elements to {\tt letter} and
9892
{\tt number}. The output of this loop is:
9893
\index{loop}
9894
9895
\begin{verbatim}
9896
0 a
9897
1 b
9898
2 c
9899
\end{verbatim}
9900
%
9901
If you combine {\tt zip}, {\tt for} and tuple assignment, you get a
9902
useful idiom for traversing two (or more) sequences at the same
9903
time. For example, \verb"has_match" takes two sequences, {\tt t1} and
9904
{\tt t2}, and returns {\tt True} if there is an index {\tt i}
9905
such that {\tt t1[i] == t2[i]}:
9906
\index{for loop}
9907
9908
\begin{verbatim}
def has_match(t1, t2):
    for x, y in zip(t1, t2):
        if x == y:
            return True
    return False
\end{verbatim}
9915
%
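Here is a small usage sketch for \verb"has_match"; the argument values are made up. Because {\tt zip} works with any sequences, the function accepts lists and strings alike.

\begin{verbatim}
def has_match(t1, t2):
    for x, y in zip(t1, t2):
        if x == y:
            return True
    return False

print(has_match([1, 2, 3], [3, 2, 1]))   # True: both have 2 at index 1
print(has_match('abc', 'bca'))           # False: no position agrees
\end{verbatim}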
9916
If you need to traverse the elements of a sequence and their
9917
indices, you can use the built-in function {\tt enumerate}:
9918
\index{traversal}
9919
\index{enumerate function}
9920
\index{function!enumerate}
9921
9922
\begin{verbatim}
for index, element in enumerate('abc'):
    print(index, element)
\end{verbatim}
9926
%
9927
The result from {\tt enumerate} is an enumerate object, which
iterates through a sequence of pairs; each pair contains an index (starting
from 0) and an element from the given sequence.
In this example, the output is the same as that of the previous loop:

\begin{verbatim}
0 a
1 b
2 c
\end{verbatim}
9939
\index{iterator}
9940
\index{object!enumerate}
9941
\index{enumerate object}
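{\tt enumerate} and {\tt zip} can be combined when you need both the index and the paired elements. In this sketch, the hypothetical function \verb"match_indices" returns every index where two sequences agree. The loop header {\tt i, (x, y)} is a nested form of tuple assignment: each item from {\tt enumerate} is a pair whose second element is itself a pair.

\begin{verbatim}
def match_indices(t1, t2):
    """Return the indices i where t1[i] == t2[i]."""
    result = []
    for i, (x, y) in enumerate(zip(t1, t2)):
        if x == y:
            result.append(i)
    return result

print(match_indices('banana', 'bandana'))    # prints [0, 1, 2]
\end{verbatim}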
9942
9943
9944
\section{Dictionaries and tuples}
9945
\label{dictuple}
9946
\index{dictionary}
9947
\index{items method}
9948
\index{method!items}
9949
\index{key-value pair}
9950
9951
Dictionaries have a method called {\tt items} that returns a sequence of
9952
tuples, where each tuple is a key-value pair.
9953
9954
\begin{verbatim}
9955
>>> d = {'a':0, 'b':1, 'c':2}
9956
>>> t = d.items()
9957
>>> t
9958
dict_items([('c', 2), ('a', 0), ('b', 1)])
9959
\end{verbatim}
9960
%
9961
The result is a \verb"dict_items" object, which is an iterable view of
the key-value pairs. You can use it in a {\tt for} loop
like this:
9964
\index{iterator}
9965
9966
\begin{verbatim}
>>> for key, value in d.items():
...     print(key, value)
...
c 2
a 0
b 1
\end{verbatim}
9974
%
9975
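The items here are in no particular order, which is what you should
expect from older versions of Python. Since Python 3.7, a dictionary
keeps its items in the order they were inserted, so a current
interpreter displays them as {\tt a}, {\tt b}, {\tt c}.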
9977
9978
Going in the other direction, you can use a list of tuples to
9979
initialize a new dictionary: \index{dictionary!initialize}
9980
9981
\begin{verbatim}
9982
>>> t = [('a', 0), ('c', 2), ('b', 1)]
9983
>>> d = dict(t)
9984
>>> d
9985
{'a': 0, 'c': 2, 'b': 1}
9986
\end{verbatim}
9987
9988
Combining {\tt dict} with {\tt zip} yields a concise way
9989
to create a dictionary:
9990
\index{zip function!use with dict}
9991
9992
\begin{verbatim}
9993
>>> d = dict(zip('abc', range(3)))
9994
>>> d
9995
{'a': 0, 'c': 2, 'b': 1}
9996
\end{verbatim}
9997
%
9998
The dictionary method {\tt update} also takes a list of tuples
9999
and adds them, as key-value pairs, to an existing dictionary.
10000
\index{update method}
10001
\index{method!update}
10002
\index{traverse!dictionary}
10003
\index{dictionary!traversal}
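As a brief sketch of {\tt update} with a list of tuples (the keys and values are made up): a pair whose key is new adds an item, and a pair whose key already exists replaces the old value.

\begin{verbatim}
d = dict(zip('abc', range(3)))
d.update([('d', 3), ('a', 10)])    # adds 'd', replaces the value of 'a'
print(d['a'], d['d'], len(d))      # prints 10 3 4
\end{verbatim}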
10004
10005
It is common to use tuples as keys in dictionaries (primarily because
10006
you can't use lists). For example, a telephone directory might map
10007
from last-name, first-name pairs to telephone numbers. Assuming
10008
that we have defined {\tt last}, {\tt first} and {\tt number}, we
10009
could write:
10010
\index{tuple!as key in dictionary}
10011
\index{hashable}
10012
10013
\begin{verbatim}
10014
directory[last, first] = number
10015
\end{verbatim}
10016
%
10017
The expression in brackets is a tuple. We could use tuple
10018
assignment to traverse this dictionary.
10019
\index{tuple!in brackets}
10020
10021
\begin{verbatim}
for last, first in directory:
    print(first, last, directory[last, first])
\end{verbatim}
10025
%
10026
This loop traverses the keys in {\tt directory}, which are tuples. It
10027
assigns the elements of each tuple to {\tt last} and {\tt first}, then
10028
prints the name and corresponding telephone number.
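To run the loop, you need a directory to traverse. Here is a self-contained sketch; the names come from the examples in this chapter, but the telephone numbers are placeholders made up for illustration.

\begin{verbatim}
directory = dict()
directory['Cleese', 'John'] = '555-0100'    # placeholder numbers
directory['Idle', 'Eric'] = '555-0101'

for last, first in directory:
    print(first, last, directory[last, first])
\end{verbatim}

The key {\tt 'Cleese', 'John'} inside the brackets is the same as the tuple {\tt ('Cleese', 'John')}; the parentheses are optional there.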
10029
10030
There are two ways to represent tuples in a state diagram. The more
10031
detailed version shows the indices and elements just as they appear in
10032
a list. For example, the tuple \verb"('Cleese', 'John')" would appear
10033
as in Figure~\ref{fig.tuple1}.
10034
\index{state diagram}
10035
\index{diagram!state}
10036
10037
\begin{figure}
10038
\centerline
10039
{\includegraphics[scale=0.8]{figs/tuple1.pdf}}
10040
\caption{State diagram.}
10041
\label{fig.tuple1}
10042
\end{figure}
10043
10044
But in a larger diagram you might want to leave out the
10045
details. For example, a diagram of the telephone directory might
10046
appear as in Figure~\ref{fig.dict2}.
10047
10048
\begin{figure}
10049
\centerline
10050
{\includegraphics[scale=0.8]{figs/dict2.pdf}}
10051
\caption{State diagram.}
10052
\label{fig.dict2}
10053
\end{figure}
10054
10055
Here the tuples are shown using Python syntax as a graphical
10056
shorthand. The telephone number in the diagram is the complaints line
10057
for the BBC, so please don't call it.
10058
10059
10060
\section{Sequences of sequences}
10061
\index{sequence}
10062
10063
I have focused on lists of tuples, but almost all of the examples in
10064
this chapter also work with lists of lists, tuples of tuples, and
10065
tuples of lists. To avoid enumerating the possible combinations, it
10066
is sometimes easier to talk about sequences of sequences.
10067
10068
In many contexts, the different kinds of sequences (strings, lists and
10069
tuples) can be used interchangeably. So how should you choose one
10070
over the others?
10071
\index{string}
10072
\index{list}
10073
\index{tuple}
10074
\index{mutability}
10075
\index{immutability}
10076
10077
To start with the obvious, strings are more limited than other
10078
sequences because the elements have to be characters. They are
10079
also immutable. If you need the ability to change the characters
10080
in a string (as opposed to creating a new string), you might
10081
want to use a list of characters instead.
10082
10083
Lists are more common than tuples, mostly because they are mutable.
10084
But there are a few cases where you might prefer tuples:
10085
10086
\begin{enumerate}
10087
10088
\item In some contexts, like a {\tt return} statement, it is
10089
syntactically simpler to create a tuple than a list.
10090
10091
\item If you want to use a sequence as a dictionary key, you
10092
have to use an immutable type like a tuple or string.
10093
10094
\item If you are passing a sequence as an argument to a function,
10095
using tuples reduces the potential for unexpected behavior
10096
due to aliasing.
10097
10098
\end{enumerate}
10099
10100
Because tuples are immutable, they don't provide methods like
{\tt sort} and {\tt reverse}, which modify existing lists. But Python
provides the built-in function {\tt sorted}, which takes any sequence
and returns a new list with the same elements in sorted order, and
{\tt reversed}, which takes a sequence and returns an iterator that
traverses the sequence in reverse order.
10106
\index{sorted function}
10107
\index{function!sorted} \index{reversed function}
10108
\index{function!reversed}
10109
\index{iterator}
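A short sketch of both functions applied to a tuple (the values are arbitrary). Neither function modifies its argument, and you can pass either result to {\tt list} or {\tt tuple} to get the kind of sequence you want.

\begin{verbatim}
t = (3, 1, 2)

print(sorted(t))            # a new list: [1, 2, 3]
print(list(reversed(t)))    # the iterator, made into a list: [2, 1, 3]
print(tuple(sorted(t)))     # back to a tuple: (1, 2, 3)
print(t)                    # the original is unchanged: (3, 1, 2)
\end{verbatim}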
10110
10111
10112
\section{Debugging}
10113
\index{debugging}
10114
\index{data structure}
10115
\index{shape error}
10116
\index{error!shape}
10117
10118
Lists, dictionaries and tuples are examples of {\bf data
10119
structures}; in this chapter we are starting to see compound data
10120
structures, like lists of tuples, or dictionaries that contain tuples
10121
as keys and lists as values. Compound data structures are useful, but
10122
they are prone to what I call {\bf shape errors}; that is, errors
10123
caused when a data structure has the wrong type, size, or structure.
10124
For example, if you are expecting a list with one integer and I
10125
give you a plain old integer (not in a list), it won't work.
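As a small illustration of that case, the hypothetical function {\tt total} below expects a sequence of numbers. A list with one integer works, but a plain integer causes a {\tt TypeError} because an integer is not a sequence.

\begin{verbatim}
def total(t):
    # expects a sequence of numbers
    return sum(t)

print(total([5]))     # a list with one integer works; prints 5

try:
    total(5)          # a plain integer has the wrong shape
except TypeError as err:
    print('shape error:', err)
\end{verbatim}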
10126
\index{structshape module}
10127
\index{module!structshape}
10128
10129
To help debug these kinds of errors, I have written a module
10130
called {\tt structshape} that provides a function, also called
10131
{\tt structshape}, that takes any kind of data structure as
10132
an argument and returns a string that summarizes its shape.
10133
You can download it from \url{http://thinkpython2.com/code/structshape.py}.
10134
10135
Here's the result for a simple list:
10136
10137
\begin{verbatim}
10138
>>> from structshape import structshape
10139
>>> t = [1, 2, 3]
10140
>>> structshape(t)
10141
'list of 3 int'
10142
\end{verbatim}
10143
%
10144
A fancier program might write ``list of 3 int{\em s}'', but it
10145
was easier not to deal with plurals. Here's a list of lists:
10146
10147
\begin{verbatim}
10148
>>> t2 = [[1,2], [3,4], [5,6]]
10149
>>> structshape(t2)
10150
'list of 3 list of 2 int'
10151
\end{verbatim}
10152
%
10153
If the elements of the list are not the same type,
10154
{\tt structshape} groups them, in order, by type:
10155
10156
\begin{verbatim}
10157
>>> t3 = [1, 2, 3, 4.0, '5', '6', [7], [8], 9]
10158
>>> structshape(t3)
10159
'list of (3 int, float, 2 str, 2 list of int, int)'
10160
\end{verbatim}
10161
%
10162
Here's a list of tuples:
10163
10164
\begin{verbatim}
10165
>>> s = 'abc'
10166
>>> lt = list(zip(t, s))
10167
>>> structshape(lt)
10168
'list of 3 tuple of (int, str)'
10169
\end{verbatim}
10170
%
10171
And here's a dictionary with 3 items that map integers to strings.
10172
10173
\begin{verbatim}
10174
>>> d = dict(lt)
10175
>>> structshape(d)
10176
'dict of 3 int->str'
10177
\end{verbatim}
10178
%
10179
If you are having trouble keeping track of your data structures,
10180
{\tt structshape} can help.
10181
10182
10183
\section{Glossary}
10184
10185
\begin{description}
10186
10187
\item[tuple:] An immutable sequence of elements.
10188
\index{tuple}
10189
10190
\item[tuple assignment:] An assignment with a sequence on the
10191
right side and a tuple of variables on the left. The right
10192
side is evaluated and then its elements are assigned to the
10193
variables on the left.
10194
\index{tuple assignment}
10195
\index{assignment!tuple}
10196
10197
\item[gather:] An operation that collects multiple arguments into a tuple.
10198
\index{gather}
10199
10200
\item[scatter:] An operation that makes a sequence behave like multiple arguments.
10201
\index{scatter}
10202
10203
\item[zip object:] The result of calling the built-in function {\tt zip};
an object that iterates through a sequence of tuples.
10205
\index{zip object}
10206
\index{object!zip}
10207
10208
\item[iterator:] An object that can iterate through a sequence, but
10209
which does not provide list operators and methods.
10210
\index{iterator}
10211
10212
\item[data structure:] A collection of related values, often
10213
organized in lists, dictionaries, tuples, etc.
10214
\index{data structure}
10215
10216
\item[shape error:] An error caused because a value has the
10217
wrong shape; that is, the wrong type or size.
10218
\index{shape}
10219
10220
\end{description}
10221
10222
10223
\section{Exercises}
10224
10225
\begin{exercise}
10226
10227
Write a function called \verb"most_frequent" that takes a string and
10228
prints the letters in decreasing order of frequency. Find text
10229
samples from several different languages and see how letter frequency
10230
varies between languages. Compare your results with the tables at
10231
\url{http://en.wikipedia.org/wiki/Letter_frequencies}. Solution:
10232
\url{http://thinkpython2.com/code/most_frequent.py}. \index{letter
10233
frequency} \index{frequency!letter}
10234
10235
\end{exercise}
10236
10237
10238
\begin{exercise}
10239
\label{anagrams}
10240
\index{anagram set}
10241
\index{set!anagram}
10242
10243
More anagrams!
10244
10245
\begin{enumerate}
10246
10247
\item Write a program
10248
that reads a word list from a file (see Section~\ref{wordlist}) and
10249
prints all the sets of words that are anagrams.
10250
10251
Here is an example of what the output might look like:
10252
10253
\begin{verbatim}
10254
['deltas', 'desalt', 'lasted', 'salted', 'slated', 'staled']
10255
['retainers', 'ternaries']
10256
['generating', 'greatening']
10257
['resmelts', 'smelters', 'termless']
10258
\end{verbatim}
10259
%
10260
Hint: you might want to build a dictionary that maps from a
10261
collection of letters to a list of words that can be spelled with those
10262
letters. The question is, how can you represent the collection of
10263
letters in a way that can be used as a key?
10264
10265
\item Modify the previous program so that it prints the longest list
10266
of anagrams first, followed by the second longest, and so on.
10267
\index{Scrabble}
10268
\index{bingo}
10269
10270
\item In Scrabble a ``bingo'' is when you play all seven tiles in
10271
your rack, along with a letter on the board, to form an eight-letter
10272
word. What collection of 8 letters forms the most possible bingos?
10273
10274
% (7, ['angriest', 'astringe', 'ganister', 'gantries', 'granites',
10275
% 'ingrates', 'rangiest'])
10276
10277
Solution: \url{http://thinkpython2.com/code/anagram_sets.py}.
10278
10279
\end{enumerate}
10280
\end{exercise}
10281
10282
\begin{exercise}
10283
\index{metathesis}
10284
10285
Two words form a ``metathesis pair'' if you can transform one into the
10286
other by swapping two letters; for example, ``converse'' and
10287
``conserve''. Write a program that finds all of the metathesis pairs
10288
in the dictionary. Hint: don't test all pairs of words, and don't
10289
test all possible swaps. Solution:
10290
\url{http://thinkpython2.com/code/metathesis.py}. Credit: This
10291
exercise is inspired by an example at \url{http://puzzlers.org}.
10292
10293
\end{exercise}
10294
10295
10296
\begin{exercise}
10297
\index{Car Talk}
10298
\index{Puzzler}
10299
10300
Here's another Car Talk Puzzler
10301
(\url{http://www.cartalk.com/content/puzzlers}):
10302
10303
\begin{quote}
10304
What is the longest English word, that remains a valid English word,
10305
as you remove its letters one at a time?
10306
10307
Now, letters can be removed from either end, or the middle, but you
10308
can't rearrange any of the letters. Every time you drop a letter, you
10309
wind up with another English word. If you do that, you're eventually
10310
going to wind up with one letter and that too is going to be an
10311
English word---one that's found in the dictionary. I want to know
10312
what's the longest word and how many letters does it
10313
have?
10314
10315
I'm going to give you a little modest example: Sprite. Ok? You start
10316
off with sprite, you take a letter off, one from the interior of the
10317
word, take the r away, and we're left with the word spite, then we
10318
take the e off the end, we're left with spit, we take the s off, we're
10319
left with pit, it, and I.
10320
\end{quote}
10321
\index{reducible word}
10322
\index{word, reducible}
10323
10324
Write a program to find all words that can be reduced in this way,
10325
and then find the longest one.
10326
10327
This exercise is a little more challenging than most, so here are
10328
some suggestions:
10329
10330
\begin{enumerate}
10331
10332
\item You might want to write a function that takes a word and
10333
computes a list of all the words that can be formed by removing one
10334
letter. These are the ``children'' of the word.
10335
\index{recursive definition}
10336
\index{definition!recursive}
10337
10338
\item Recursively, a word is reducible if any of its children
10339
are reducible. As a base case, you can consider the empty
10340
string reducible.
10341
10342
\item The wordlist I provided, {\tt words.txt}, doesn't
10343
contain single letter words. So you might want to add
10344
``I'', ``a'', and the empty string.
10345
10346
\item To improve the performance of your program, you might want
10347
to memoize the words that are known to be reducible.
10348
10349
\end{enumerate}
10350
10351
Solution: \url{http://thinkpython2.com/code/reducible.py}.
10352
10353
\end{exercise}
10354
10355
10356
10357
10358
%\begin{exercise}
10359
%\url{http://en.wikipedia.org/wiki/Word_Ladder}
10360
%\end{exercise}
10361
10362
10363
10364
10365
\chapter{Case study: data structure selection}
10366
10367
At this point you have learned about Python's core data structures,
10368
and you have seen some of the algorithms that use them.
10369
If you would like to know more about algorithms, this might be a good
10370
time to read Chapter~\ref{algorithms}.
10371
But you don't have to read it before you go on; you can read
10372
it whenever you are interested.
10373
10374
This chapter presents a case study with exercises that let
10375
you think about choosing data structures and practice using them.
10376
10377
10378
\section{Word frequency analysis}
10379
\label{analysis}
10380
10381
As usual, you should at least attempt the exercises
10382
before you read my solutions.
10383
10384
\begin{exercise}
10385
10386
Write a program that reads a file, breaks each line into
10387
words, strips whitespace and punctuation from the words, and
10388
converts them to lowercase.
10389
\index{string module}
10390
\index{module!string}
10391
10392
Hint: The {\tt string} module provides a string named {\tt whitespace},
10393
which contains space, tab, newline, etc., and {\tt
10394
punctuation} which contains the punctuation characters. Let's see
10395
if we can make Python swear:
10396
10397
\begin{verbatim}
10398
>>> import string
10399
>>> string.punctuation
10400
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
10401
\end{verbatim}
10402
%
10403
Also, you might consider using the string methods {\tt strip},
10404
{\tt replace} and {\tt translate}.
10405
\index{strip method}
10406
\index{method!strip}
10407
\index{replace method}
10408
\index{method!replace}
10409
\index{translate method}
10410
\index{method!translate}
10411
10412
\end{exercise}
10413
10414
10415
\begin{exercise}
10416
\index{Project Gutenberg}
10417
10418
Go to Project Gutenberg (\url{http://gutenberg.org}) and download
10419
your favorite out-of-copyright book in plain text format.
10420
\index{plain text}
10421
\index{text!plain}
10422
10423
Modify your program from the previous exercise to read the book
10424
you downloaded, skip over the header information at the beginning
10425
of the file, and process the rest of the words as before.
10426
10427
Then modify the program to count the total number of words in
10428
the book, and the number of times each word is used.
10429
\index{word frequency}
10430
\index{frequency!word}
10431
10432
Print the number of different words used in the book. Compare
10433
different books by different authors, written in different eras.
10434
Which author uses the most extensive vocabulary?
10435
\end{exercise}
10436
10437
10438
\begin{exercise}
10439
10440
Modify the program from the previous exercise to print the
10441
20 most frequently used words in the book.
10442
10443
\end{exercise}
10444
10445
10446
\begin{exercise}
10447
10448
Modify the previous program to read a word list (see
10449
Section~\ref{wordlist}) and then print all the words in the book that
10450
are not in the word list. How many of them are typos? How many of
10451
them are common words that {\em should} be in the word list, and how
10452
many of them are really obscure?
10453
10454
\end{exercise}
10455
10456
10457
\section{Random numbers}
10458
\index{random number}
10459
\index{number, random}
10460
\index{deterministic}
10461
\index{pseudorandom}
10462
10463
Given the same inputs, most computer programs generate the same
10464
outputs every time, so they are said to be {\bf deterministic}.
10465
Determinism is usually a good thing, since we expect the same
10466
calculation to yield the same result. For some applications, though,
10467
we want the computer to be unpredictable. Games are an obvious
10468
example, but there are more.
10469
10470
Making a program truly nondeterministic turns out to be difficult,
10471
but there are ways to make it at least seem nondeterministic. One of
10472
them is to use algorithms that generate {\bf pseudorandom} numbers.
10473
Pseudorandom numbers are not truly random because they are generated
10474
by a deterministic computation, but just by looking at the numbers it
10475
is all but impossible to distinguish them from random.
10476
\index{random module}
10477
\index{module!random}
10478
10479
The {\tt random} module provides functions that generate
10480
pseudorandom numbers (which I will simply call ``random'' from
10481
here on).
10482
\index{random function}
10483
\index{function!random}
10484
10485
The function {\tt random} returns a random float
10486
between 0.0 and 1.0 (including 0.0 but not 1.0). Each time you
10487
call {\tt random}, you get the next number in a long series. To see a
10488
sample, run this loop:
10489
10490
\begin{verbatim}
import random

for i in range(10):
    x = random.random()
    print(x)
\end{verbatim}
10497
%
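Because the numbers come from a deterministic computation, the series can be repeated. As a sketch, the function {\tt random.seed} resets the generator to a known starting point, so the same seed yields the same series. The seed value 17 is arbitrary, and {\tt sample} is a helper written for this example.

\begin{verbatim}
import random

def sample(n):
    # collect the next n numbers in the series
    t = []
    for i in range(n):
        t.append(random.random())
    return t

random.seed(17)
first = sample(3)

random.seed(17)       # start the series over
second = sample(3)

print(first == second)    # prints True
\end{verbatim}

Setting the seed is useful for debugging, because it makes a program that uses random numbers behave the same way on every run.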
10498
The function {\tt randint} takes parameters {\tt low} and
10499
{\tt high} and returns an integer between {\tt low} and
10500
{\tt high} (including both).
10501
\index{randint function}
10502
\index{function!randint}
10503
10504
\begin{verbatim}
10505
>>> random.randint(5, 10)
10506
5
10507
>>> random.randint(5, 10)
10508
9
10509
\end{verbatim}
10510
%
10511
To choose an element from a sequence at random, you can use
10512
{\tt choice}:
10513
\index{choice function}
10514
\index{function!choice}
10515
10516
\begin{verbatim}
10517
>>> t = [1, 2, 3]
10518
>>> random.choice(t)
10519
2
10520
>>> random.choice(t)
10521
3
10522
\end{verbatim}
10523
%
10524
The {\tt random} module also provides functions to generate
10525
random values from continuous distributions including
10526
Gaussian, exponential, gamma, and a few more.
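For example, this sketch draws one value from each of three of those distributions. The parameter values are arbitrary, and the results differ from run to run.

\begin{verbatim}
import random

x = random.gauss(0, 1)           # mean 0, standard deviation 1
y = random.expovariate(2.0)      # rate 2.0, so the mean is 0.5
z = random.gammavariate(2, 1)    # shape 2, scale 1

print(x, y, z)
\end{verbatim}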
10527
10528
\begin{exercise}
10529
\index{histogram!random choice}
10530
10531
Write a function named \verb"choose_from_hist" that takes
10532
a histogram as defined in Section~\ref{histogram} and returns a
10533
random value from the histogram, chosen with probability
10534
in proportion to frequency. For example, for this histogram:
10535
10536
\begin{verbatim}
10537
>>> t = ['a', 'a', 'b']
10538
>>> hist = histogram(t)
10539
>>> hist
10540
{'a': 2, 'b': 1}
10541
\end{verbatim}
10542
%
10543
your function should return \verb"'a'" with probability $2/3$ and \verb"'b'"
10544
with probability $1/3$.
10545
\end{exercise}
10546
10547
10548
\section{Word histogram}
10549
10550
You should attempt the previous exercises before you go on.
10551
You can download my solution from
10552
\url{http://thinkpython2.com/code/analyze_book1.py}. You will
10553
also need \url{http://thinkpython2.com/code/emma.txt}.
10554
10555
Here is a program that reads a file and builds a histogram of the
10556
words in the file:
10557
\index{histogram!word frequencies}
10558
10559
\begin{verbatim}
import string

def process_file(filename):
    hist = dict()
    fp = open(filename)
    for line in fp:
        process_line(line, hist)
    return hist

def process_line(line, hist):
    line = line.replace('-', ' ')

    for word in line.split():
        word = word.strip(string.punctuation + string.whitespace)
        word = word.lower()
        hist[word] = hist.get(word, 0) + 1

hist = process_file('emma.txt')
\end{verbatim}
10579
%
10580
This program reads {\tt emma.txt}, which contains the text of {\em
10581
Emma} by Jane Austen.
10582
\index{Austen, Jane}
10583
10584
\verb"process_file" loops through the lines of the file,
10585
passing them one at a time to \verb"process_line". The histogram
10586
{\tt hist} is being used as an accumulator.
10587
\index{accumulator!histogram}
10588
\index{traversal}
10589
10590
\verb"process_line" uses the string method {\tt replace} to replace
10591
hyphens with spaces before using {\tt split} to break the line into a
10592
list of strings. It traverses the list of words and uses {\tt strip}
10593
and {\tt lower} to remove punctuation and convert to lower case. (It
10594
is a shorthand to say that strings are ``converted''; remember that
10595
strings are immutable, so methods like {\tt strip} and {\tt lower}
10596
return new strings.)
10597
10598
Finally, \verb"process_line" updates the histogram by creating a new
10599
item or incrementing an existing one.
10600
\index{update!histogram}
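To see these steps without downloading the book, you can call \verb"process_line" on a single line. The sample sentence below is made up for illustration; it includes a hyphen, punctuation, and mixed case.

\begin{verbatim}
import string

def process_line(line, hist):
    line = line.replace('-', ' ')

    for word in line.split():
        word = word.strip(string.punctuation + string.whitespace)
        word = word.lower()
        hist[word] = hist.get(word, 0) + 1

hist = dict()
process_line('"Well-done!" she said; well done.', hist)
print(hist)
\end{verbatim}

The resulting histogram maps {\tt well} and {\tt done} to 2, and {\tt she} and {\tt said} to 1.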
10601
10602
To count the total number of words in the file, we can add up
10603
the frequencies in the histogram:
10604
10605
\begin{verbatim}
def total_words(hist):
    return sum(hist.values())
\end{verbatim}
10609
%
10610
The number of different words is just the number of items in
10611
the dictionary:
10612
10613
\begin{verbatim}
def different_words(hist):
    return len(hist)
\end{verbatim}
10617
%
10618
Here is some code to print the results:
10619
10620
\begin{verbatim}
10621
print('Total number of words:', total_words(hist))
10622
print('Number of different words:', different_words(hist))
10623
\end{verbatim}
10624
%
10625
And the results:
10626
10627
\begin{verbatim}
10628
Total number of words: 161080
10629
Number of different words: 7214
10630
\end{verbatim}
10631
%
10632
10633
\section{Most common words}
10634
10635
To find the most common words, we can make a list of tuples,
10636
where each tuple contains a word and its frequency,
10637
and sort it.
10638
10639
The following function takes a histogram and returns a list of
10640
word-frequency tuples:
10641
10642
\begin{verbatim}
def most_common(hist):
    t = []
    for key, value in hist.items():
        t.append((value, key))

    t.sort(reverse=True)
    return t
\end{verbatim}
10651
10652
In each tuple, the frequency appears first, so the resulting list is
10653
sorted by frequency. Here is a loop that prints the ten most common
10654
words:
10655
10656
\begin{verbatim}
t = most_common(hist)
print('The most common words are:')
for freq, word in t[:10]:
    print(word, freq, sep='\t')
\end{verbatim}
10662
%
10663
I use the keyword argument {\tt sep} to tell {\tt print} to use a tab
10664
character as a ``separator'', rather than a space, so the second
10665
column is lined up. Here are the results from {\em Emma}:
10666
10667
\begin{verbatim}
10668
The most common words are:
10669
to 5242
10670
the 5205
10671
and 4897
10672
of 4295
10673
i 3191
10674
a 3130
10675
it 2529
10676
her 2483
10677
was 2400
10678
she 2364
10679
\end{verbatim}
10680
%
10681
This code can be simplified using the {\tt key} parameter of
the {\tt sort} method. If you are curious, you can read about it
10683
at \url{https://wiki.python.org/moin/HowTo/Sorting}.
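One way that simplification might look, offered as a sketch rather than a definitive version: the built-in function {\tt sorted} takes the same {\tt key} and {\tt reverse} parameters, so the items can be sorted directly by their second element. The helper {\tt second} and the tiny histogram are made up for illustration. Note that this version returns word-frequency pairs rather than frequency-word pairs, and that words with the same frequency stay in their original order.

\begin{verbatim}
def second(pair):
    # the sort key: the frequency in a (word, frequency) pair
    return pair[1]

def most_common(hist):
    return sorted(hist.items(), key=second, reverse=True)

hist = {'the': 3, 'emma': 1, 'of': 2}
for word, freq in most_common(hist):
    print(word, freq, sep='\t')
\end{verbatim}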


\section{Optional parameters}
\index{optional parameter}
\index{parameter!optional}

We have seen built-in functions and methods that take optional
arguments.  It is possible to write programmer-defined functions
with optional arguments, too.  For example, here is a function that
prints the most common words in a histogram:
\index{programmer-defined function}
\index{function!programmer defined}

\begin{verbatim}
def print_most_common(hist, num=10):
    t = most_common(hist)
    print('The most common words are:')
    for freq, word in t[:num]:
        print(word, freq, sep='\t')
\end{verbatim}

The first parameter is required; the second is optional.
The {\bf default value} of {\tt num} is 10.
\index{default value}
\index{value!default}

If you only provide one argument:

\begin{verbatim}
print_most_common(hist)
\end{verbatim}

{\tt num} gets the default value.  If you provide two arguments:

\begin{verbatim}
print_most_common(hist, 20)
\end{verbatim}

{\tt num} gets the value of the argument instead.  In other
words, the optional argument {\bf overrides} the default value.
\index{override}

If a function has both required and optional parameters, all
the required parameters have to come first, followed by the
optional ones.
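
Here is a small self-contained sketch of the same idea.  The function
\verb"top_words" and the toy histogram are made up for this example;
the last call also shows that an optional argument can be passed by
keyword.

```python
def top_words(hist, num=2):
    """Return the num most frequent words (hypothetical helper)."""
    pairs = sorted(hist.items(), key=lambda pair: pair[1], reverse=True)
    return [word for word, freq in pairs[:num]]


hist = {'to': 5, 'the': 4, 'and': 3}

print(top_words(hist))          # num gets the default value, 2
print(top_words(hist, 3))       # the argument overrides the default
print(top_words(hist, num=1))   # the same kind of override, by keyword
```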


\section{Dictionary subtraction}
\label{dictsub}
\index{dictionary!subtraction}
\index{subtraction!dictionary}

Finding the words from the book that are not in the word list
from {\tt words.txt} is a problem you might recognize as set
subtraction; that is, we want to find all the words from one
set (the words in the book) that are not in the other (the
words in the list).

{\tt subtract} takes dictionaries {\tt d1} and {\tt d2} and returns a
new dictionary that contains all the keys from {\tt d1} that are not
in {\tt d2}.  Since we don't really care about the values, we
set them all to {\tt None}.

\begin{verbatim}
def subtract(d1, d2):
    res = dict()
    for key in d1:
        if key not in d2:
            res[key] = None
    return res
\end{verbatim}
%
To find the words in the book that are not in {\tt words.txt},
we can use \verb"process_file" to build a histogram for
{\tt words.txt}, and then subtract:

\begin{verbatim}
words = process_file('words.txt')
diff = subtract(hist, words)

print("Words in the book that aren't in the word list:")
for word in diff:
    print(word, end=' ')
\end{verbatim}
%
Here are some of the results from {\em Emma}:

\begin{verbatim}
Words in the book that aren't in the word list:
rencontre jane's blanche woodhouses disingenuousness
friend's venice apartment ...
\end{verbatim}
%
Some of these words are names and possessives.  Others, like
``rencontre'', are no longer in common use.  But a few are common
words that should really be in the list!

\begin{exercise}
\index{set}
\index{type!set}

Python provides a data structure called {\tt set} that supports many
common set operations.  You can read about them in Section~\ref{sets},
or read the documentation at
\url{http://docs.python.org/3/library/stdtypes.html#types-set}.

Write a program that uses set subtraction to find words in the book
that are not in the word list.  Solution:
\url{http://thinkpython2.com/code/analyze_book2.py}.

\end{exercise}


\section{Random words}
\label{randomwords}
\index{histogram!random choice}

To choose a random word from the histogram, the simplest algorithm
is to build a list with multiple copies of each word, according
to the observed frequency, and then choose from the list:

\begin{verbatim}
import random

def random_word(h):
    t = []
    for word, freq in h.items():
        t.extend([word] * freq)

    return random.choice(t)
\end{verbatim}
%
The expression {\tt [word] * freq} creates a list with {\tt freq}
copies of the string {\tt word}.  The {\tt extend}
method is similar to {\tt append} except that the argument is
a sequence.

This algorithm works, but it is not very efficient; each time you
choose a random word, it rebuilds the list, which is as big as
the original book.  An obvious improvement is to build the list
once and then make multiple selections, but the list is still big.

An alternative is:

\begin{enumerate}

\item Use {\tt keys} to get a list of the words in the book.

\item Build a list that contains the cumulative sum of the word
  frequencies (see Exercise~\ref{cumulative}).  The last item
  in this list is the total number of words in the book, $n$.

\item Choose a random number from 1 to $n$.  Use a bisection search
  (see Exercise~\ref{bisection}) to find the index where the random
  number would be inserted in the cumulative sum.

\item Use the index to find the corresponding word in the word list.

\end{enumerate}

\begin{exercise}
\label{randhist}
\index{algorithm}

Write a program that uses this algorithm to choose a random word from
the book.  Solution:
\url{http://thinkpython2.com/code/analyze_book3.py}.

\end{exercise}
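
If you get stuck, the four steps listed before the exercise might be
sketched as follows.  This is one possible reading of the algorithm,
not the solution linked in the exercise.  The function names and the
toy histogram are hypothetical, and the sketch uses \verb"bisect_left"
from the standard {\tt bisect} module in place of a hand-written
bisection search.

```python
import bisect
import random


def build_cumulative(hist):
    """Return the list of words and the cumulative sums of their frequencies."""
    words = list(hist.keys())
    cumulative = []
    total = 0
    for word in words:
        total += hist[word]
        cumulative.append(total)
    return words, cumulative


def choose_word(words, cumulative):
    """Choose a word with probability proportional to its frequency."""
    n = cumulative[-1]                 # total number of words
    x = random.randint(1, n)           # random number from 1 to n, inclusive
    index = bisect.bisect_left(cumulative, x)
    return words[index]


hist = {'a': 1, 'b': 2}                # toy histogram
words, cumulative = build_cumulative(hist)
print(cumulative)
print(choose_word(words, cumulative))
```

The two lists are built once; after that, each selection takes time
proportional to the logarithm of the number of different words.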



\section{Markov analysis}
\label{markov}
\index{Markov analysis}

If you choose words from the book at random, you can get a
sense of the vocabulary, but you probably won't get a sentence:

\begin{verbatim}
this the small regard harriet which knightley's it most things
\end{verbatim}
%
A series of random words seldom makes sense because there
is no relationship between successive words.  For example, in
a real sentence you would expect an article like ``the'' to
be followed by an adjective or a noun, and probably not a verb
or adverb.

One way to measure these kinds of relationships is Markov
analysis, which
characterizes, for a given sequence of words, the probability of the
words that might come next.  For example, the song {\em Eric, the Half a
  Bee} begins:

\begin{quote}
Half a bee, philosophically, \\
Must, ipso facto, half not be. \\
But half the bee has got to be \\
Vis a vis, its entity. D'you see? \\
\\
But can a bee be said to be \\
Or not to be an entire bee \\
When half the bee is not a bee \\
Due to some ancient injury? \\
\end{quote}
%
In this text,
the phrase ``half the'' is always followed by the word ``bee'',
but the phrase ``the bee'' might be followed by either
``has'' or ``is''.
\index{prefix}
\index{suffix}
\index{mapping}

The result of Markov analysis is a mapping from each prefix
(like ``half the'' and ``the bee'') to all possible suffixes
(like ``has'' and ``is'').
\index{random text}
\index{text!random}

Given this mapping, you can generate a random text by
starting with any prefix and choosing at random from the
possible suffixes.  Next, you can combine the end of the
prefix and the new suffix to form the next prefix, and repeat.

For example, if you start with the prefix ``Half a'', then the
next word has to be ``bee'', because the prefix only appears
once in the text.  The next prefix is ``a bee'', so the
next suffix might be ``philosophically'', ``be'' or ``due''.

In this example the length of the prefix is always two, but
you can do Markov analysis with any prefix length.

\begin{exercise}

Markov analysis:

\begin{enumerate}

\item Write a program to read a text from a file and perform Markov
analysis.  The result should be a dictionary that maps from
prefixes to a collection of possible suffixes.  The collection
might be a list, tuple, or dictionary; it is up to you to make
an appropriate choice.  You can test your program with prefix
length two, but you should write the program in a way that makes
it easy to try other lengths.

\item Add a function to the previous program to generate random text
based on the Markov analysis.  Here is an example from {\em Emma}
with prefix length 2:

\begin{quote}
He was very clever, be it sweetness or be angry, ashamed or only
amused, at such a stroke. She had never thought of Hannah till you
were never meant for me?" "I cannot make speeches, Emma:" he soon cut
it all himself.
\end{quote}

For this example, I left the punctuation attached to the words.
The result is almost syntactically correct, but not quite.
Semantically, it almost makes sense, but not quite.

What happens if you increase the prefix length?  Does the random
text make more sense?

\item Once your program is working, you might want to try a mash-up:
if you combine text from two or more books, the random
text you generate will blend the vocabulary and phrases from
the sources in interesting ways.
\index{mash-up}

\end{enumerate}

Credit: This case study is based on an example from Kernighan and
Pike, {\em The Practice of Programming}, Addison-Wesley, 1999.

\end{exercise}

You should attempt this exercise before you go on; then you can
download my solution from \url{http://thinkpython2.com/code/markov.py}.
You will also need \url{http://thinkpython2.com/code/emma.txt}.


\section{Data structures}
\index{data structure}

Using Markov analysis to generate random text is fun, but there is
also a point to this exercise: data structure selection.  In your
solution to the previous exercises, you had to choose:

\begin{itemize}

\item How to represent the prefixes.

\item How to represent the collection of possible suffixes.

\item How to represent the mapping from each prefix to
the collection of possible suffixes.

\end{itemize}

The last one is easy: a dictionary is the obvious choice
for a mapping from keys to corresponding values.

For the prefixes, the most obvious options are string,
list of strings, or tuple of strings.

For the suffixes,
one option is a list; another is a histogram (dictionary).
\index{implementation}

How should you choose?  The first step is to think about
the operations you will need to implement for each data structure.
For the prefixes, we need to be able to remove words from
the beginning and add to the end.  For example, if the current
prefix is ``Half a'', and the next word is ``bee'', you need
to be able to form the next prefix, ``a bee''.
\index{tuple!as key in dictionary}

Your first choice might be a list, since it is easy to add
and remove elements, but we also need to be able to use the
prefixes as keys in a dictionary, and that rules out lists,
which are mutable and so cannot be used as keys.
With tuples, you can't append or remove, but you can use
the addition operator to form a new tuple:

\begin{verbatim}
def shift(prefix, word):
    return prefix[1:] + (word,)
\end{verbatim}
%
{\tt shift} takes a tuple of words, {\tt prefix}, and a string,
{\tt word}, and forms a new tuple that has all the words
in {\tt prefix} except the first, and {\tt word} added to
the end.
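
As a quick check, here is {\tt shift} applied to the prefix from the
example in the text.  The dictionary \verb"suffix_map" is a made-up
name, included only to show that the resulting tuple works as a key.

```python
def shift(prefix, word):
    """Form the next prefix: drop the first word, add word at the end."""
    return prefix[1:] + (word,)


prefix = ('Half', 'a')
next_prefix = shift(prefix, 'bee')
print(next_prefix)

# shift builds a new tuple; the original prefix is unchanged.
print(prefix)

# Tuples are hashable, so a prefix can be used as a dictionary key.
suffix_map = {next_prefix: ['philosophically']}
print(suffix_map[('a', 'bee')])
```

Because slicing works for any length, the same function handles
prefixes of one word, two words, or more.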

For the collection of suffixes, the operations we need to
perform include adding a new suffix (or increasing the frequency
of an existing one), and choosing a random suffix.

Adding a new suffix is equally easy for the list implementation
or the histogram.  Choosing a random element from a list
is easy; choosing from a histogram is harder to do
efficiently (see Exercise~\ref{randhist}).
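
The trade-off can be seen in a small sketch.  The helper names and the
sample suffixes below are made up for illustration; both helpers record
the suffixes of a single prefix.  For the histogram, the sketch leans on
{\tt random.choices}, which accepts weights, rather than the
cumulative-sum algorithm from the exercise.

```python
import random


def add_to_list(suffixes, word):
    """List version: append the word, keeping duplicates."""
    suffixes.append(word)


def add_to_hist(suffixes, word):
    """Histogram version: count how often each word appears."""
    suffixes[word] = suffixes.get(word, 0) + 1


as_list = []
as_hist = {}
for word in ['has', 'is', 'is']:
    add_to_list(as_list, word)
    add_to_hist(as_hist, word)

# Choosing from the list is one call; repeated words are more likely.
print(random.choice(as_list))

# Choosing from the histogram needs the frequencies as weights.
words = list(as_hist)
weights = [as_hist[w] for w in words]
print(random.choices(words, weights=weights)[0])
```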

So far we have been talking mostly about ease of implementation,
but there are other factors to consider in choosing data structures.
One is run time.  Sometimes there is a theoretical reason to expect
one data structure to be faster than another; for example, I mentioned
that the {\tt in} operator is faster for dictionaries than for lists,
at least when the number of elements is large.

But often you don't know ahead of time which implementation will
be faster.  One option is to implement both of them and see which
is better.  This approach is called {\bf benchmarking}.  A practical
alternative is to choose the data structure that is
easiest to implement, and then see if it is fast enough for the
intended application.  If so, there is no need to go on.  If not,
there are tools, like the {\tt profile} module, that can identify
the places in a program that take the most time.
\index{benchmarking}
\index{profile module}
\index{module!profile}
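
As a minimal sketch of benchmarking, the following code uses the
standard {\tt timeit} module to time the {\tt in} operator on a list
and on a dictionary.  The sizes and variable names are arbitrary
choices for this example, and the measured times will differ from one
machine and one run to the next.

```python
import timeit

n = 10000
as_list = list(range(n))
as_dict = dict.fromkeys(range(n))
target = n - 1   # worst case for the list: the last element

# Each call to timeit runs the membership test 200 times
# and returns the total elapsed time in seconds.
list_time = timeit.timeit(lambda: target in as_list, number=200)
dict_time = timeit.timeit(lambda: target in as_dict, number=200)

print('list:', list_time)
print('dict:', dict_time)
```

A result like this only describes the inputs you tried, so it is worth
repeating the measurement with sizes close to the ones your program
will actually see.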

The other factor to consider is storage space.  For example, using a
histogram for the collection of suffixes might take less space because
you only have to store each word once, no matter how many times it
appears in the text.  In some cases, saving space can also make your
program run faster, and in the extreme, your program might not run at
all if you run out of memory.  But for many applications, space is a
secondary consideration after run time.

One final thought: in this discussion, I have implied that
we should use one data structure for both analysis and generation.  But
since these are separate phases, it would also be possible to use one
structure for analysis and then convert to another structure for
generation.  This would be a net win if the time saved during
generation exceeded the time spent in conversion.


\section{Debugging}
\index{debugging}

When you are debugging a program, and especially if you are
working on a hard bug, there are five things to try:

\begin{description}

\item[Reading:] Examine your code, read it back to yourself, and
check that it says what you meant to say.

\item[Running:] Experiment by making changes and running different
versions.  Often if you display the right thing at the right place
in the program, the problem becomes obvious, but sometimes you have to
build scaffolding.

\item[Ruminating:] Take some time to think!  What kind of error
is it: syntax, runtime, or semantic?  What information can you get from
the error messages, or from the output of the program?  What kind of
error could cause the problem you're seeing?  What did you change
last, before the problem appeared?

\item[Rubberducking:] If you explain the problem to someone else, you
sometimes find the answer before you finish asking the question.
Often you don't need the other person; you could just talk to a rubber
duck.  And that's the origin of the well-known strategy called {\bf
  rubber duck debugging}.  I am not making this up; see
\url{https://en.wikipedia.org/wiki/Rubber_duck_debugging}.

\item[Retreating:] At some point, the best thing to do is back
off, undoing recent changes, until you get back to a program that
works and that you understand.  Then you can start rebuilding.

\end{description}

Beginning programmers sometimes get stuck on one of these activities
and forget the others.  Each activity comes with its own failure
mode.
\index{typographical error}

For example, reading your code might help if the problem is a
typographical error, but not if the problem is a conceptual
misunderstanding.  If you don't understand what your program does, you
can read it 100 times and never see the error, because the error is in
your head.
\index{experimental debugging}

Running experiments can help, especially if you run small, simple
tests.  But if you run experiments without thinking or reading your
code, you might fall into a pattern I call ``random walk programming'',
which is the process of making random changes until the program
does the right thing.  Needless to say, random walk programming
can take a long time.
\index{random walk programming}
\index{development plan!random walk programming}

You have to take time to think.  Debugging is like an
experimental science.  You should have at least one hypothesis about
what the problem is.  If there are two or more possibilities, try to
think of a test that would eliminate one of them.

But even the best debugging techniques will fail if there are too many
errors, or if the code you are trying to fix is too big and
complicated.  Sometimes the best option is to retreat, simplifying the
program until you get to something that works and that you
understand.

Beginning programmers are often reluctant to retreat because
they can't stand to delete a line of code (even if it's wrong).
If it makes you feel better, copy your program into another file
before you start stripping it down.  Then you can copy the pieces
back one at a time.

Finding a hard bug requires reading, running, ruminating, and
sometimes retreating.  If you get stuck on one of these activities,
try the others.


\section{Glossary}

\begin{description}

\item[deterministic:] Pertaining to a program that does the same
thing each time it runs, given the same inputs.
\index{deterministic}

\item[pseudorandom:] Pertaining to a sequence of numbers that appears
to be random, but is generated by a deterministic program.
\index{pseudorandom}

\item[default value:] The value given to an optional parameter if no
argument is provided.
\index{default value}

\item[override:] To replace a default value with an argument.
\index{override}

\item[benchmarking:] The process of choosing between data structures
by implementing alternatives and testing them on a sample of the
possible inputs.
\index{benchmarking}

\item[rubber duck debugging:] Debugging by explaining your problem
to an inanimate object such as a rubber duck.  Articulating the
problem can help you solve it, even if the rubber duck doesn't know
Python.
\index{rubber duck debugging}
\index{debugging!rubber duck}

\end{description}


\section{Exercises}

\begin{exercise}
\index{word frequency}
\index{frequency!word}
\index{Zipf's law}

The ``rank'' of a word is its position in a list of words
sorted by frequency: the most common word has rank 1, the
second most common has rank 2, etc.

Zipf's law describes a relationship between the ranks and frequencies
of words in natural languages
(\url{http://en.wikipedia.org/wiki/Zipf's_law}).  Specifically, it
predicts that the frequency, $f$, of the word with rank $r$ is:

\[ f = c r^{-s} \]
%
where $s$ and $c$ are parameters that depend on the language and the
text.  If you take the logarithm of both sides of this equation, you
get:
\index{logarithm}

\[ \log f = \log c - s \log r \]
%
So if you plot $\log f$ versus $\log r$, you should get
a straight line with slope $-s$ and intercept $\log c$.

Write a program that reads a text from a file, counts
word frequencies, and prints one line
for each word, in descending order of frequency, with
$\log f$ and $\log r$.  Use the graphing program of your
choice to plot the results and check whether they form
a straight line.  Can you estimate the value of $s$?

Solution: \url{http://thinkpython2.com/code/zipf.py}.
To run my solution, you need the plotting module {\tt matplotlib}.
If you installed Anaconda, you already have {\tt matplotlib};
otherwise you might have to install it.
\index{matplotlib}

\end{exercise}



\chapter{Files}

This chapter introduces the idea of ``persistent'' programs that
keep data in permanent storage, and shows how to use different
kinds of permanent storage, like files and databases.


\section{Persistence}
\index{file}
\index{type!file}
\index{persistence}

Most of the programs we have seen so far are transient in the
sense that they run for a short time and produce some output,
but when they end, their data disappears.  If you run the program
again, it starts with a clean slate.

Other programs are {\bf persistent}: they run for a long time
(or all the time); they keep at least some of their data
in permanent storage (a hard drive, for example); and
if they shut down and restart, they pick up where they left off.

Examples of persistent programs are operating systems, which
run pretty much whenever a computer is on, and web servers,
which run all the time, waiting for requests to come in on
the network.

One of the simplest ways for programs to maintain their data
is by reading and writing text files.  We have already seen
programs that read text files; in this chapter we will see programs
that write them.

An alternative is to store the state of the program in a database.
In this chapter I will present a simple database and a module,
{\tt pickle}, that makes it easy to store program data.
\index{pickle module}
\index{module!pickle}


\section{Reading and writing}
\index{file!reading and writing}

A text file is a sequence of characters stored on a permanent
medium like a hard drive, flash memory, or CD-ROM.  We saw how
to open and read a file in Section~\ref{wordlist}.
\index{open function}
\index{function!open}

To write a file, you have to open it with mode \verb"'w'" as the second
argument:

\begin{verbatim}
>>> fout = open('output.txt', 'w')
\end{verbatim}
%
If the file already exists, opening it in write mode clears out
the old data and starts fresh, so be careful!
If the file doesn't exist, a new one is created.

{\tt open} returns a file object that provides methods for working
with the file.
The {\tt write} method puts data into the file.

\begin{verbatim}
>>> line1 = "This here's the wattle,\n"
>>> fout.write(line1)
24
\end{verbatim}
%
The return value is the number of characters that were written.
The file object keeps track of where it is, so if
you call {\tt write} again, it adds the new data to the end of
the file.

\begin{verbatim}
>>> line2 = "the emblem of our land.\n"
>>> fout.write(line2)
24
\end{verbatim}
%
When you are done writing, you should close the file.

\begin{verbatim}
>>> fout.close()
\end{verbatim}
%
\index{close method}
\index{method!close}
%
If you don't close the file, it gets closed for you when the
program ends.
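
A common way to make sure a file gets closed is the {\tt with}
statement, which this section does not cover.  The sketch below is an
addition for illustration: it writes the same two lines and reads them
back, and it uses the standard {\tt tempfile} module so the example
does not touch any of your own files.

```python
import os
import tempfile

# A temporary directory keeps the example from touching real files.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'output.txt')

    # The file is closed when the with block ends, even after an error.
    with open(path, 'w') as fout:
        fout.write("This here's the wattle,\n")
        fout.write("the emblem of our land.\n")

    with open(path) as fin:
        contents = fin.read()

print(contents)
```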


\section{Format operator}
\index{format operator}
\index{operator!format}

The argument of {\tt write} has to be a string, so if we want
to put other values in a file, we have to convert them to
strings.  The easiest way to do that is with {\tt str}:

\begin{verbatim}
>>> x = 52
>>> fout.write(str(x))
\end{verbatim}
%
An alternative is to use the {\bf format operator}, {\tt \%}.  When
applied to integers, {\tt \%} is the modulus operator.  But
when the first operand is a string, {\tt \%} is the format operator.
\index{format string}

The first operand is the {\bf format string}, which contains
one or more {\bf format sequences}, which
specify how
the second operand is formatted.  The result is a string.
\index{format sequence}

For example, the format sequence \verb"'%d'" means that
the second operand should be formatted as a decimal
integer:

\begin{verbatim}
>>> camels = 42
>>> '%d' % camels
'42'
\end{verbatim}
%
The result is the string \verb"'42'", which is not to be confused
with the integer value {\tt 42}.

A format sequence can appear anywhere in the string,
so you can embed a value in a sentence:

\begin{verbatim}
>>> 'I have spotted %d camels.' % camels
'I have spotted 42 camels.'
\end{verbatim}
%
If there is more than one format sequence in the string,
the second operand has to be a tuple.  Each format sequence is
matched with an element of the tuple, in order.

The following example uses \verb"'%d'" to format an integer,
\verb"'%g'" to format a floating-point number, and
\verb"'%s'" to format a string:

\begin{verbatim}
>>> 'In %d years I have spotted %g %s.' % (3, 0.1, 'camels')
'In 3 years I have spotted 0.1 camels.'
\end{verbatim}
%
The number of elements in the tuple has to match the number
of format sequences in the string.  Also, the types of the
elements have to match the format sequences:
\index{exception!TypeError}
\index{TypeError}

\begin{verbatim}
>>> '%d %d %d' % (1, 2)
TypeError: not enough arguments for format string
>>> '%d' % 'dollars'
TypeError: %d format: a number is required, not str
\end{verbatim}
%
In the first example, there aren't enough elements; in the
second, the element is the wrong type.

For more information on the format operator, see
\url{https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting}.  A more powerful alternative is the string
format method, which you can read about at
\url{https://docs.python.org/3/library/stdtypes.html#str.format}.
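
For comparison, here is a brief sketch of the camel examples written
with the string {\tt format} method.  The third variant uses a
formatted string literal (an ``f-string''), which is not discussed in
this chapter and requires Python 3.6 or later; the variable names are
made up for this example.

```python
camels = 42

with_operator = 'I have spotted %d camels.' % camels
with_method = 'I have spotted {} camels.'.format(camels)
with_fstring = f'I have spotted {camels} camels.'   # Python 3.6 or later

print(with_operator)
print(with_method)
print(with_fstring)

# Several values: the {} fields are filled in order.
print('In {} years I have spotted {} {}.'.format(3, 0.1, 'camels'))
```

All three variants build the same string, so the choice between them is
mostly a matter of readability.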

% You can specify the number of digits as part of the format sequence.
% For example, the sequence \verb"'%8.2f'"
% formats a floating-point number to be 8 characters long, with
% 2 digits after the decimal point:

% % \begin{verbatim}
% >>> '%8.2f' % 3.14159
% '    3.14'
% \end{verbatim}
% \afterverb
% %
% The result takes up eight spaces with two
% digits after the decimal point.


\section{Filenames and paths}
\label{paths}
\index{filename}
\index{path}
\index{directory}
\index{folder}

Files are organized into {\bf directories} (also called ``folders'').
Every running program has a ``current directory'', which is the
default directory for most operations.
For example, when you open a file for reading, Python looks for it in the
current directory.
\index{os module}
\index{module!os}

The {\tt os} module provides functions for working with files and
directories (``os'' stands for ``operating system'').  {\tt os.getcwd}
returns the name of the current directory:
\index{getcwd function}
\index{function!getcwd}

\begin{verbatim}
>>> import os
>>> cwd = os.getcwd()
>>> cwd
'/home/dinsdale'
\end{verbatim}
%
{\tt cwd} stands for ``current working directory''.  The result in
this example is {\tt /home/dinsdale}, which is the home directory of a
user named {\tt dinsdale}.
\index{working directory}
\index{directory!working}

A string like \verb"'/home/dinsdale'" that identifies a file or
directory is called a {\bf path}.

A simple filename, like {\tt memo.txt}, is also considered a path,
but it is a {\bf relative path} because it relates to the current
directory.  If the current directory is {\tt /home/dinsdale}, the
filename {\tt memo.txt} would refer to {\tt /home/dinsdale/memo.txt}.
\index{relative path}  \index{path!relative}
\index{absolute path}  \index{path!absolute}

A path that begins with {\tt /} does not depend on the current
directory; it is called an {\bf absolute path}.  To find the absolute
path to a file, you can use {\tt os.path.abspath}:

\begin{verbatim}
>>> os.path.abspath('memo.txt')
'/home/dinsdale/memo.txt'
\end{verbatim}
%
{\tt os.path} provides other functions for working with filenames
and paths.  For example,
{\tt os.path.exists} checks
whether a file or directory exists:
\index{exists function}
\index{function!exists}

\begin{verbatim}
>>> os.path.exists('memo.txt')
True
\end{verbatim}
%
If it exists, {\tt os.path.isdir} checks whether it's a directory:

\begin{verbatim}
>>> os.path.isdir('memo.txt')
False
>>> os.path.isdir('/home/dinsdale')
True
\end{verbatim}
%
Similarly, {\tt os.path.isfile} checks whether it's a file.

{\tt os.listdir} returns a list of the files (and other directories)
in the given directory:

\begin{verbatim}
>>> os.listdir(cwd)
['music', 'photos', 'memo.txt']
\end{verbatim}
%
To demonstrate these functions, the following example
``walks'' through a directory, prints
the names of all the files, and calls itself recursively on
all the directories.
\index{walk, directory}
\index{directory!walk}

\begin{verbatim}
def walk(dirname):
    for name in os.listdir(dirname):
        path = os.path.join(dirname, name)

        if os.path.isfile(path):
            print(path)
        else:
            walk(path)
\end{verbatim}
%
{\tt os.path.join} takes a directory and a file name and joins
them into a complete path.
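
Here is a short sketch of what {\tt os.path.join} does.  The file name
{\tt camel.jpg} is made up for this example, and the separator that
appears in the output depends on your operating system.

```python
import os

path = os.path.join('photos', 'camel.jpg')
print(path)

# join inserts the separator only where one is needed.
print(os.path.join('photos' + os.sep, 'camel.jpg'))

# dirname and basename split a path back into its parts.
print(os.path.dirname(path))
print(os.path.basename(path))
```

Using {\tt os.path.join} instead of concatenating strings with
\verb"'/'" keeps the code independent of the separator the operating
system uses.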

The {\tt os} module provides a function called {\tt walk} that is
similar to this one but more versatile.  As an exercise, read the
documentation and use it to print the names of the files in a given
directory and its subdirectories.  You can download my solution from
\url{http://thinkpython2.com/code/walk.py}.


\section{Catching exceptions}
\label{catch}

A lot of things can go wrong when you try to read and write
files.  If you try to open a file that doesn't exist, you get a
{\tt FileNotFoundError}:
\index{open function}
\index{function!open}
\index{exception!FileNotFoundError}
\index{FileNotFoundError}

\begin{verbatim}
>>> fin = open('bad_file')
FileNotFoundError: [Errno 2] No such file or directory: 'bad_file'
\end{verbatim}
%
If you don't have permission to access a file:
\index{file!permission}
\index{permission, file}

\begin{verbatim}
>>> fout = open('/etc/passwd', 'w')
PermissionError: [Errno 13] Permission denied: '/etc/passwd'
\end{verbatim}
%
And if you try to open a directory for reading, you get
an {\tt IsADirectoryError}:

\begin{verbatim}
>>> fin = open('/home')
IsADirectoryError: [Errno 21] Is a directory: '/home'
\end{verbatim}
%
11550
To avoid these errors, you could use functions like {\tt os.path.exists}
11551
and {\tt os.path.isfile}, but it would take a lot of time and code
11552
to check all the possibilities (if ``{\tt Errno 21}'' is any
11553
indication, there are at least 21 things that can go wrong).
11554
\index{exception, catching}
11555
\index{try statement}
11556
\index{statement!try}
11557
11558
It is better to go ahead and try---and deal with problems if they
11559
happen---which is exactly what the {\tt try} statement does. The
11560
syntax is similar to an {\tt if...else} statement:
11561
11562
\begin{verbatim}
try:
    fin = open('bad_file')
except:
    print('Something went wrong.')
\end{verbatim}
11568
%
11569
Python starts by executing the {\tt try} clause. If all goes
11570
well, it skips the {\tt except} clause and proceeds. If an
11571
exception occurs, it jumps out of the {\tt try} clause and
11572
runs the {\tt except} clause.
11573
11574
Handling an exception with a {\tt try} statement is called {\bf
11575
catching} an exception. In this example, the {\tt except} clause
11576
prints an error message that is not very helpful. In general,
11577
catching an exception gives you a chance to fix the problem, or try
11578
again, or at least end the program gracefully.
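
As a minimal sketch of dealing with problems after they happen, the
following function tries to open a file and reports what went wrong.
The name \verb"read_or_report" is hypothetical; it is not part of the
{\tt os} module or of the code that accompanies this book. It relies on
the fact that the file-related exceptions shown above are all kinds of
{\tt OSError}.

```python
def read_or_report(path):
    """Return the contents of path, or a short message if opening fails.

    Hypothetical helper, for illustration only.
    """
    try:
        with open(path) as fin:
            return fin.read()
    except FileNotFoundError:
        # The most common problem gets its own message.
        return 'No such file: ' + path
    except OSError as err:
        # OSError also covers PermissionError, IsADirectoryError and others.
        return 'Could not read ' + path + ': ' + type(err).__name__
```

Naming the exception type in the {\tt except} clause means that
unrelated errors, like a misspelled variable name, are not silently
caught along with the ones you expect.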
11579
11580
11581
\section{Databases}
11582
\index{database}
11583
11584
A {\bf database} is a file that is organized for storing data. Many
11585
databases are organized like a dictionary in the sense that they map
11586
from keys to values. The biggest difference between a database and a
11587
dictionary is that the database is on disk (or other permanent
11588
storage), so it persists after the program ends. \index{dbm
11589
module} \index{module!dbm}
11590
11591
The module {\tt dbm} provides an interface for creating
11592
and updating database files.
11593
As an example, I'll create a database
11594
that contains captions for image files.
11595
\index{open function}
11596
\index{function!open}
11597
11598
Opening a database is similar to opening other files:
11599
11600
\begin{verbatim}
11601
>>> import dbm
11602
>>> db = dbm.open('captions', 'c')
11603
\end{verbatim}
11604
%
11605
The mode \verb"'c'" means that the database should be created if
11606
it doesn't already exist. The result is a database object
11607
that can be used (for most operations) like a dictionary.
11608
\index{database object}
11609
\index{object!database}
11610
11611
When you create a new item, {\tt dbm} updates the database file.
11612
\index{update!database}
11613
11614
\begin{verbatim}
11615
>>> db['cleese.png'] = 'Photo of John Cleese.'
11616
\end{verbatim}
11617
%
11618
When you access one of the items, {\tt dbm} reads the file:
11619
11620
\begin{verbatim}
11621
>>> db['cleese.png']
11622
b'Photo of John Cleese.'
11623
\end{verbatim}
11624
%
11625
The result is a {\bf bytes object}, which is why it begins with {\tt
11626
b}. A bytes object is similar to a string in many ways. When you
11627
get farther into Python, the difference becomes important, but for now
11628
we can ignore it.
11629
\index{bytes object}
11630
\index{object!bytes}
11631
11632
If you make another assignment to an existing key, {\tt dbm} replaces
11633
the old value:
11634
11635
\begin{verbatim}
11636
>>> db['cleese.png'] = 'Photo of John Cleese doing a silly walk.'
11637
>>> db['cleese.png']
11638
b'Photo of John Cleese doing a silly walk.'
11639
\end{verbatim}
11640
%
11641
11642
Some dictionary methods, like {\tt items}, might not work with
database objects, depending on which database library {\tt dbm} uses
on your system. But iteration with a {\tt for}
loop works:
11645
\index{dictionary methods!dbm module}
11646
11647
\begin{verbatim}
for key in db:
    print(key, db[key])
\end{verbatim}
11651
%
11652
As with other files, you should close the database when you are
11653
done:
11654
11655
\begin{verbatim}
11656
>>> db.close()
11657
\end{verbatim}
11658
%
11659
\index{close method}
11660
\index{method!close}
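
The steps above can be collected into one short script. This is only a
sketch: the function name \verb"store_and_fetch" is made up for the
example, the database is created in a temporary directory so nothing is
left behind, and the {\tt with} statement is used so that the database
is closed automatically.

```python
import dbm
import os
import tempfile

def store_and_fetch(dirname):
    """Store one caption in a database under dirname and read it back."""
    filename = os.path.join(dirname, 'captions')

    # The with statement closes the database even if an error occurs.
    with dbm.open(filename, 'c') as db:
        db['cleese.png'] = 'Photo of John Cleese.'

    # Reopen the database read-only to show that the data persisted.
    with dbm.open(filename, 'r') as db:
        raw = db['cleese.png']        # a bytes object
        return raw.decode('utf-8')    # convert the bytes to a string

with tempfile.TemporaryDirectory() as tmp:
    caption = store_and_fetch(tmp)
print(caption)
```

Decoding the value with {\tt decode} is one way to get an ordinary
string back from the bytes object that {\tt dbm} returns.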
11661
11662
11663
\section{Pickling}
11664
\index{pickling}
11665
11666
A limitation of {\tt dbm} is that the keys and values have to be
11667
strings or bytes. If you try to use any other type, you get an error.
11668
\index{pickle module} \index{module!pickle}
11669
11670
The {\tt pickle} module can help. It translates
almost any type of object into a bytes object suitable for storage in a
database, and then translates bytes objects back into objects.

{\tt pickle.dumps} takes an object as a parameter and returns
a representation of it as a bytes object ({\tt dumps} is short for ``dump string''):
11676
11677
\begin{verbatim}
11678
>>> import pickle
11679
>>> t = [1, 2, 3]
11680
>>> pickle.dumps(t)
11681
b'\x80\x03]q\x00(K\x01K\x02K\x03e.'
11682
\end{verbatim}
11683
%
11684
The format isn't obvious to human readers; it is meant to be
11685
easy for {\tt pickle} to interpret. {\tt pickle.loads}
11686
(``load string'') reconstitutes the object:
11687
11688
\begin{verbatim}
11689
>>> t1 = [1, 2, 3]
11690
>>> s = pickle.dumps(t1)
11691
>>> t2 = pickle.loads(s)
11692
>>> t2
11693
[1, 2, 3]
11694
\end{verbatim}
11695
%
11696
Although the new object has the same value as the old, it is
11697
not (in general) the same object:
11698
11699
\begin{verbatim}
11700
>>> t1 == t2
11701
True
11702
>>> t1 is t2
11703
False
11704
\end{verbatim}
11705
%
11706
In other words, pickling and then unpickling has the same effect
11707
as copying the object.
11708
11709
You can use {\tt pickle} to store non-strings in a database.
11710
In fact, this combination is so common that it has been
11711
encapsulated in a module called {\tt shelve}.
11712
\index{shelve module}
11713
\index{module!shelve}
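
Here is a small sketch of what using {\tt shelve} might look like. The
function name \verb"demo_shelf", the file name and the stored data are
all invented for the example; the point is only that a shelf accepts a
list as a value, which a plain {\tt dbm} database would reject.

```python
import os
import shelve
import tempfile

def demo_shelf(dirname):
    """Store a list in a shelf under dirname and read it back."""
    filename = os.path.join(dirname, 'demo_shelf')   # hypothetical name

    # A shelf pickles each value before storing it.
    with shelve.open(filename) as shelf:
        shelf['primes'] = [2, 3, 5]

    # Reading unpickles the value, so we get a list, not bytes.
    with shelve.open(filename) as shelf:
        return shelf['primes']

with tempfile.TemporaryDirectory() as tmp:
    result = demo_shelf(tmp)
print(result)
```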
11714
11715
11716
\section{Pipes}
11717
\index{shell}
11718
\index{pipe}
11719
11720
Most operating systems provide a command-line interface,
11721
also known as a {\bf shell}. Shells usually provide commands
11722
to navigate the file system and launch applications. For
11723
example, in Unix you can change directories with {\tt cd},
11724
display the contents of a directory with {\tt ls}, and launch
11725
a web browser by typing (for example) {\tt firefox}.
11726
\index{ls (Unix command)}
11727
\index{Unix command!ls}
11728
11729
Any program that you can launch from the shell can also be
11730
launched from Python using a {\bf pipe object}, which
11731
represents a running program.
11732
11733
For example, the Unix command {\tt ls -l} normally displays the
11734
contents of the current directory in long format. You can
11735
launch {\tt ls} with {\tt os.popen}\footnote{{\tt popen} is deprecated
11736
now, which means we are supposed to stop using it and start using
11737
the {\tt subprocess} module. But for simple cases, I find
11738
{\tt subprocess} more complicated than necessary. So I am going
11739
to keep using {\tt popen} until they take it away.}:
11740
\index{popen function}
11741
\index{function!popen}
11742
11743
\begin{verbatim}
11744
>>> cmd = 'ls -l'
11745
>>> fp = os.popen(cmd)
11746
\end{verbatim}
11747
%
11748
The argument is a string that contains a shell command. The
11749
return value is an object that behaves like an open
11750
file. You can read the output from the {\tt ls} process one
11751
line at a time with {\tt readline} or get the whole thing at
11752
once with {\tt read}:
11753
\index{readline method}
11754
\index{method!readline}
11755
\index{read method}
11756
\index{method!read}
11757
11758
\begin{verbatim}
11759
>>> res = fp.read()
11760
\end{verbatim}
11761
%
11762
When you are done, you close the pipe like a file:
11763
\index{close method}
11764
\index{method!close}
11765
11766
\begin{verbatim}
11767
>>> stat = fp.close()
11768
>>> print(stat)
11769
None
11770
\end{verbatim}
11771
%
11772
The return value is the final status of the {\tt ls} process;
11773
{\tt None} means that it ended normally (with no errors).
11774
11775
For example, most Unix systems provide a command called {\tt md5sum}
11776
that reads the contents of a file and computes a ``checksum''.
11777
You can read about MD5 at \url{http://en.wikipedia.org/wiki/Md5}. This
11778
command provides an efficient way to check whether two files
11779
have the same contents. The probability that different contents
11780
yield the same checksum is very small (that is, unlikely to happen
11781
before the universe collapses).
11782
\index{md5}
11783
\index{checksum}
11784
11785
You can use a pipe to run {\tt md5sum} from Python and get the result:
11786
11787
\begin{verbatim}
11788
>>> filename = 'book.tex'
11789
>>> cmd = 'md5sum ' + filename
11790
>>> fp = os.popen(cmd)
11791
>>> res = fp.read()
11792
>>> stat = fp.close()
11793
>>> print(res)
11794
1e0033f0ed0656636de0d75144ba32e0 book.tex
11795
>>> print(stat)
11796
None
11797
\end{verbatim}
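
For comparison, here is a sketch of the same idea using the
{\tt subprocess} module mentioned in the footnote. The helper
\verb"run_command" is a made-up name. To keep the example independent
of the operating system, it launches the Python interpreter itself
rather than {\tt ls} or {\tt md5sum}.

```python
import subprocess
import sys

def run_command(args):
    """Run a program given as a list of strings.

    Returns the program's output and its exit status.
    """
    result = subprocess.run(args, capture_output=True, text=True)
    return result.stdout, result.returncode

# On a Unix system you could pass ['ls', '-l'] instead.
res, stat = run_command([sys.executable, '-c', 'print(6 * 7)'])
print(res)
print(stat)
```

One difference from {\tt os.popen} is that the status of a program
that ends normally is reported here as {\tt 0} rather than {\tt None}.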
11798
11799
11800
11801
\section{Writing modules}
11802
\label{modules}
11803
\index{module, writing}
11804
\index{word count}
11805
11806
Any file that contains Python code can be imported as a module.
11807
For example, suppose you have a file named {\tt wc.py} with the following
11808
code:
11809
11810
\begin{verbatim}
def linecount(filename):
    count = 0
    for line in open(filename):
        count += 1
    return count

print(linecount('wc.py'))
\end{verbatim}
11819
%
11820
If you run this program, it reads itself and prints the number
11821
of lines in the file, which is 7.
11822
You can also import it like this:
11823
11824
\begin{verbatim}
11825
>>> import wc
11826
7
11827
\end{verbatim}
11828
%
11829
Now you have a module object {\tt wc}:
11830
\index{module object}
11831
\index{object!module}
11832
11833
\begin{verbatim}
11834
>>> wc
11835
<module 'wc' from 'wc.py'>
11836
\end{verbatim}
11837
%
11838
The module object provides \verb"linecount":
11839
11840
\begin{verbatim}
11841
>>> wc.linecount('wc.py')
11842
7
11843
\end{verbatim}
11844
%
11845
So that's how you write modules in Python.
11846
11847
The only problem with this example is that when you import
11848
the module it runs the test code at the bottom. Normally
11849
when you import a module, it defines new functions but it
11850
doesn't run them.
11851
\index{import statement}
11852
\index{statement!import}
11853
11854
Programs that will be imported as modules often
11855
use the following idiom:
11856
11857
\begin{verbatim}
if __name__ == '__main__':
    print(linecount('wc.py'))
\end{verbatim}
11861
%
11862
\verb"__name__" is a built-in variable that is set when the
11863
program starts. If the program is running as a script,
11864
\verb"__name__" has the value \verb"'__main__'"; in that
11865
case, the test code runs. Otherwise,
11866
if the module is being imported, the test code is skipped.
11867
11868
\index{name built-in variable}
11869
\index{main}
11870
11871
As an exercise, type this example into a file named {\tt wc.py} and run
11872
it as a script. Then run the Python interpreter and
11873
{\tt import wc}. What is the value of \verb"__name__"
11874
when the module is being imported?
11875
11876
Warning: If you import a module that has already been imported,
11877
Python does nothing. It does not re-read the file, even if it has
11878
changed.
11879
\index{module!reload}
11880
\index{reload function}
11881
\index{function!reload}
11882
11883
If you want to reload a module, you can use the function
{\tt reload} from the {\tt importlib} module, but it can be tricky, so
the safest thing to do is
restart the interpreter and then import the module again.
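
The following sketch illustrates the warning. In Python 3 the reload
function is {\tt importlib.reload}. The module name
\verb"reload_demo_mod" and the function \verb"demo_reload" are invented
for the example, which writes a tiny module into a temporary directory,
changes it, and shows that only an explicit reload picks up the change.

```python
import importlib
import os
import sys
import tempfile

def demo_reload():
    """Return a module variable after import, re-import and reload."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'reload_demo_mod.py')
        with open(path, 'w') as fout:
            fout.write('VALUE = 1\n')

        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            import reload_demo_mod
            first = reload_demo_mod.VALUE

            # Change the file on disk.
            with open(path, 'w') as fout:
                fout.write('VALUE = 2000\n')

            # Importing again does nothing; the old value is still there.
            import reload_demo_mod
            second = reload_demo_mod.VALUE

            # An explicit reload re-reads the file.
            importlib.reload(reload_demo_mod)
            third = reload_demo_mod.VALUE
        finally:
            # Clean up so the demonstration leaves no trace.
            sys.path.remove(tmp)
            sys.modules.pop('reload_demo_mod', None)
    return first, second, third

print(demo_reload())
```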
11886
11887
11888
\section{Debugging}
11889
\index{debugging}
11890
\index{whitespace}
11891
11892
When you are reading and writing files, you might run into problems
11893
with whitespace. These errors can be hard to debug because spaces,
11894
tabs and newlines are normally invisible:
11895
11896
\begin{verbatim}
>>> s = '1 2\t 3\n 4'
>>> print(s)
1 2      3
 4
\end{verbatim}
11902
\index{repr function}
11903
\index{function!repr}
11904
\index{string representation}
11905
11906
The built-in function {\tt repr} can help. It takes any object as an
11907
argument and returns a string representation of the object. For
11908
strings, it represents whitespace
11909
characters with backslash sequences:
11910
11911
\begin{verbatim}
11912
>>> print(repr(s))
11913
'1 2\t 3\n 4'
11914
\end{verbatim}
11915
11916
This can be helpful for debugging.
11917
11918
One other problem you might run into is that different systems
11919
use different characters to indicate the end of a line. Some
11920
systems use a newline, represented \verb"\n". Others use
11921
a return character, represented \verb"\r". Some use both.
11922
If you move files between different systems, these inconsistencies
11923
can cause problems.
11924
\index{end of line character}
11925
11926
For most systems, there are applications to convert from one
11927
format to another. You can find them (and read more about this
11928
issue) at \url{http://en.wikipedia.org/wiki/Newline}. Or, of course, you
11929
could write one yourself.
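
As a sketch of what writing one yourself might involve, the following
function converts the line endings in a string to newlines. The name
\verb"normalize_newlines" is made up for this example, and the sample
string is built by hand rather than read from a file.

```python
def normalize_newlines(text):
    """Return text with every line ending written as a newline."""
    # Replace the two-character ending first; otherwise a return
    # followed by a newline would turn into two newlines.
    return text.replace('\r\n', '\n').replace('\r', '\n')

s = 'first\r\nsecond\rthird\n'
print(repr(s))
print(repr(normalize_newlines(s)))
```

Printing with {\tt repr} makes it possible to see which line endings
the string contains before and after the conversion.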
11930
11931
11932
\section{Glossary}
11933
11934
\begin{description}
11935
11936
\item[persistent:] Pertaining to a program that runs indefinitely
11937
and keeps at least some of its data in permanent storage.
11938
\index{persistence}
11939
11940
\item[format operator:] An operator, {\tt \%}, that takes a format
11941
string and a tuple and generates a string that includes
11942
the elements of the tuple formatted as specified by the format string.
11943
\index{format operator}
11944
\index{operator!format}
11945
11946
\item[format string:] A string, used with the format operator, that
11947
contains format sequences.
11948
\index{format string}
11949
11950
\item[format sequence:] A sequence of characters in a format string,
11951
like {\tt \%d}, that specifies how a value should be formatted.
11952
\index{format sequence}
11953
11954
\item[text file:] A sequence of characters stored in permanent
11955
storage like a hard drive.
11956
\index{text file}
11957
11958
\item[directory:] A named collection of files, also called a folder.
11959
\index{directory}
11960
11961
\item[path:] A string that identifies a file.
11962
\index{path}
11963
11964
\item[relative path:] A path that starts from the current directory.
11965
\index{relative path}
11966
11967
\item[absolute path:] A path that starts from the topmost directory
11968
in the file system.
11969
\index{absolute path}
11970
11971
\item[catch:] To prevent an exception from terminating
11972
a program using the {\tt try}
11973
and {\tt except} statements.
11974
\index{catch}
11975
11976
\item[database:] A file whose contents are organized like a dictionary
11977
with keys that correspond to values.
11978
\index{database}
11979
11980
\item[bytes object:] An object similar to a string.
11981
\index{bytes object}
11982
\index{object!bytes}
11983
11984
\item[shell:] A program that allows users to type commands and then
11985
executes them by starting other programs.
11986
\index{shell}
11987
11988
\item[pipe object:] An object that represents a running program, allowing
11989
a Python program to run commands and read the results.
11990
\index{pipe object}
11991
\index{object!pipe}
11992
11993
\end{description}
11994
11995
11996
\section{Exercises}
11997
11998
\begin{exercise}
11999
12000
Write a function called {\tt sed} that takes as arguments a pattern string,
12001
a replacement string, and two filenames; it should read the first file
12002
and write the contents into the second file (creating it if
12003
necessary). If the pattern string appears anywhere in the file, it
12004
should be replaced with the replacement string.
12005
12006
If an error occurs while opening, reading, writing or closing files,
12007
your program should catch the exception, print an error message, and
12008
exit. Solution: \url{http://thinkpython2.com/code/sed.py}.
12009
12010
\end{exercise}
12011
12012
12013
\begin{exercise}
12014
\index{anagram set}
12015
\index{set!anagram}
12016
12017
If you download my solution to Exercise~\ref{anagrams} from
12018
\url{http://thinkpython2.com/code/anagram_sets.py}, you'll see that it creates
12019
a dictionary that maps from a sorted string of letters to the list of
12020
words that can be spelled with those letters. For example,
12021
\verb"'opst'" maps to the list
12022
\verb"['opts', 'post', 'pots', 'spot', 'stop', 'tops']".
12023
12024
Write a module that imports \verb"anagram_sets" and provides
12025
two new functions: \verb"store_anagrams" should store the
12026
anagram dictionary in a ``shelf''; \verb"read_anagrams" should
12027
look up a word and return a list of its anagrams.
12028
Solution: \url{http://thinkpython2.com/code/anagram_db.py}.
12029
12030
\end{exercise}
12031
12032
12033
\begin{exercise}
12034
\label{checksum}
12035
\index{MP3}
12036
12037
In a large collection of MP3 files, there may be more than one
12038
copy of the same song, stored in different directories or with
12039
different file names. The goal of this exercise is to search for
12040
duplicates.
12041
12042
\begin{enumerate}
12043
12044
\item Write a program that searches a directory and all of its
12045
subdirectories, recursively, and returns a list of complete paths
12046
for all files with a given suffix (like {\tt .mp3}).
12047
Hint: {\tt os.path} provides several useful functions for
12048
manipulating file and path names.
12049
\index{duplicate}
12050
\index{MD5 algorithm}
12051
\index{algorithm!MD5}
12052
\index{checksum}
12053
12054
\item To recognize duplicates, you can use {\tt md5sum}
12055
to compute a ``checksum'' for each file. If two files have
12056
the same checksum, they probably have the same contents.
12057
\index{md5sum}
12058
12059
\item To double-check, you can use the Unix command {\tt diff}.
12060
\index{diff}
12061
12062
\end{enumerate}
12063
12064
Solution: \url{http://thinkpython2.com/code/find_duplicates.py}.
12065
12066
\end{exercise}
12067
12068
12069
12070
\chapter{Classes and objects}
12071
\label{clobjects}
12072
12073
At this point you know how to use
12074
functions to organize code and
12075
built-in types to organize data. The next step is to learn
12076
``object-oriented programming'', which uses programmer-defined types
12077
to organize both code and data. Object-oriented programming is
12078
a big topic; it will take a few chapters to get there.
12079
\index{object-oriented programming}
12080
12081
Code examples from this chapter are available from
12082
\url{http://thinkpython2.com/code/Point1.py}; solutions
12083
to the exercises are available from
12084
\url{http://thinkpython2.com/code/Point1_soln.py}.
12085
12086
12087
\section{Programmer-defined types}
12088
\label{point}
12089
\index{programmer-defined type}
12090
\index{type!programmer-defined}
12091
12092
We have used many of Python's built-in types; now we are going
12093
to define a new type. As an example, we will create a type
12094
called {\tt Point} that represents a point in two-dimensional
12095
space.
12096
\index{point, mathematical}
12097
12098
In mathematical notation, points are often written in
12099
parentheses with a comma separating the coordinates. For example,
12100
$(0,0)$ represents the origin, and $(x,y)$ represents the
12101
point $x$ units to the right and $y$ units up from the origin.
12102
12103
There are several ways we might represent points in Python:
12104
12105
\begin{itemize}
12106
12107
\item We could store the coordinates separately in two
12108
variables, {\tt x} and {\tt y}.
12109
12110
\item We could store the coordinates as elements in a list
12111
or tuple.
12112
12113
\item We could create a new type to represent points as
12114
objects.
12115
12116
\end{itemize}
12117
\index{representation}
12118
12119
Creating a new type
12120
is more complicated than the other options, but
12121
it has advantages that will be apparent soon.
12122
12123
A programmer-defined type is also called a {\bf class}.
12124
A class definition looks like this:
12125
\index{class}
12126
\index{object!class}
12127
\index{class definition}
12128
\index{definition!class}
12129
12130
\begin{verbatim}
class Point:
    """Represents a point in 2-D space."""
\end{verbatim}
12134
%
12135
The header indicates that the new class is called {\tt Point}.
12136
The body is a docstring that explains what the class is for.
12137
You can define variables and methods inside a class definition,
12138
but we will get back to that later.
12139
\index{Point class}
12140
\index{class!Point}
12141
\index{docstring}
12142
12143
Defining a class named {\tt Point} creates a {\bf class object}.
12144
12145
\begin{verbatim}
12146
>>> Point
12147
<class '__main__.Point'>
12148
\end{verbatim}
12149
%
12150
Because {\tt Point} is defined at the top level, its ``full
12151
name'' is \verb"__main__.Point".
12152
\index{object!class}
12153
\index{class object}
12154
12155
The class object is like a factory for creating objects. To create a
12156
Point, you call {\tt Point} as if it were a function.
12157
12158
\begin{verbatim}
12159
>>> blank = Point()
12160
>>> blank
12161
<__main__.Point object at 0xb7e9d3ac>
12162
\end{verbatim}
12163
%
12164
The return value is a reference to a Point object, which we
12165
assign to {\tt blank}.
12166
12167
Creating a new object is called
12168
{\bf instantiation}, and the object is an {\bf instance} of
12169
the class.
12170
\index{instance}
12171
\index{instantiation}
12172
12173
When you print an instance, Python tells you what class it
12174
belongs to and where it is stored in memory (the prefix
12175
{\tt 0x} means that the following number is in hexadecimal).
12176
\index{hexadecimal}
12177
12178
Every object is an instance of some class, so ``object'' and
12179
``instance'' are interchangeable. But in this chapter I use
12180
``instance'' to indicate that I am talking about a programmer-defined
12181
type.
12182
12183
12184
\section{Attributes}
12185
\label{attributes}
12186
\index{instance attribute}
12187
\index{attribute!instance}
12188
\index{dot notation}
12189
12190
You can assign values to an instance using dot notation:
12191
12192
\begin{verbatim}
12193
>>> blank.x = 3.0
12194
>>> blank.y = 4.0
12195
\end{verbatim}
12196
%
12197
This syntax is similar to the syntax for selecting a variable from a
12198
module, such as {\tt math.pi} or {\tt string.whitespace}. In this case,
12199
though, we are assigning values to named elements of an object.
12200
These elements are called {\bf attributes}.
12201
12202
As a noun, ``AT-trib-ute'' is pronounced with emphasis on the first
12203
syllable, as opposed to ``a-TRIB-ute'', which is a verb.
12204
12205
Figure~\ref{fig.point} is a state diagram that shows the result of these assignments.
12206
A state diagram that shows an object and its attributes is
12207
called an {\bf object diagram}.
12208
\index{state diagram}
12209
\index{diagram!state}
12210
\index{object diagram}
12211
\index{diagram!object}
12212
12213
\begin{figure}
12214
\centerline
12215
{\includegraphics[scale=0.8]{figs/point.pdf}}
12216
\caption{Object diagram.}
12217
\label{fig.point}
12218
\end{figure}
12219
12220
The variable {\tt blank} refers to a Point object, which
12221
contains two attributes. Each attribute refers to a
12222
floating-point number.
12223
12224
You can read the value of an attribute using the same syntax:
12225
12226
\begin{verbatim}
12227
>>> blank.y
12228
4.0
12229
>>> x = blank.x
12230
>>> x
12231
3.0
12232
\end{verbatim}
12233
%
12234
The expression {\tt blank.x} means, ``Go to the object {\tt blank}
12235
refers to and get the value of {\tt x}.'' In the example, we assign that
12236
value to a variable named {\tt x}. There is no conflict between
12237
the variable {\tt x} and the attribute {\tt x}.
12238
12239
You can use dot notation as part of any expression. For example:
12240
12241
\begin{verbatim}
>>> '(%g, %g)' % (blank.x, blank.y)
'(3, 4)'
>>> distance = math.sqrt(blank.x**2 + blank.y**2)
>>> distance
5.0
\end{verbatim}
12248
%
12249
You can pass an instance as an argument in the usual way.
12250
For example:
12251
\index{instance!as argument}
12252
12253
\begin{verbatim}
def print_point(p):
    print('(%g, %g)' % (p.x, p.y))
\end{verbatim}
12257
%
12258
\verb"print_point" takes a point as an argument and displays it in
12259
mathematical notation. To invoke it, you can pass {\tt blank} as
12260
an argument:
12261
12262
\begin{verbatim}
>>> print_point(blank)
(3, 4)
\end{verbatim}
12266
%
12267
Inside the function, {\tt p} is an alias for {\tt blank}, so if
12268
the function modifies {\tt p}, {\tt blank} changes.
12269
\index{aliasing}
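
To make the effect of aliasing concrete, here is a short sketch. The
function \verb"shift_right" is a hypothetical example, not part of the
code for this chapter; it modifies the Point it is given.

```python
class Point:
    """Represents a point in 2-D space."""

def shift_right(p, dx):
    """Modify p in place by adding dx to its x attribute."""
    p.x += dx

blank = Point()
blank.x = 3.0
blank.y = 4.0

# Inside shift_right, p refers to the same object as blank.
shift_right(blank, 1.5)
print(blank.x)
```

After the call, {\tt blank.x} is {\tt 4.5}, even though the function
never mentions {\tt blank} by name.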
12270
12271
As an exercise, write a function called \verb"distance_between_points"
12272
that takes two Points as arguments and returns the distance between
12273
them.
12274
12275
12276
\section{Rectangles}
12277
\label{rectangles}
12278
12279
Sometimes it is obvious what the attributes of an object should be,
12280
but other times you have to make decisions. For example, imagine you
12281
are designing a class to represent rectangles. What attributes would
12282
you use to specify the location and size of a rectangle? You can
12283
ignore angle; to keep things simple, assume that the rectangle is
12284
either vertical or horizontal.
12285
\index{representation}
12286
12287
There are at least two possibilities:
12288
12289
\begin{itemize}
12290
12291
\item You could specify one corner of the rectangle
12292
(or the center), the width, and the height.
12293
12294
\item You could specify two opposing corners.
12295
12296
\end{itemize}
12297
12298
At this point it is hard to say whether either is better than
12299
the other, so we'll implement the first one, just as an example.
12300
\index{Rectangle class}
12301
\index{class!Rectangle}
12302
12303
Here is the class definition:
12304
12305
\begin{verbatim}
class Rectangle:
    """Represents a rectangle.

    attributes: width, height, corner.
    """
\end{verbatim}
12312
%
12313
The docstring lists the attributes: {\tt width} and
12314
{\tt height} are numbers; {\tt corner} is a Point object that
12315
specifies the lower-left corner.
12316
12317
To represent a rectangle, you have to instantiate a Rectangle
12318
object and assign values to the attributes:
12319
12320
\begin{verbatim}
12321
box = Rectangle()
12322
box.width = 100.0
12323
box.height = 200.0
12324
box.corner = Point()
12325
box.corner.x = 0.0
12326
box.corner.y = 0.0
12327
\end{verbatim}
12328
%
12329
The expression {\tt box.corner.x} means,
12330
``Go to the object {\tt box} refers to and select the attribute named
12331
{\tt corner}; then go to that object and select the attribute named
12332
{\tt x}.''
12333
12334
\begin{figure}
12335
\centerline
12336
{\includegraphics[scale=0.8]{figs/rectangle.pdf}}
12337
\caption{Object diagram.}
12338
\label{fig.rectangle}
12339
\end{figure}
12340
12341
12342
Figure~\ref{fig.rectangle} shows the state of this object.
12343
An object that is an attribute of another object is {\bf embedded}.
12344
\index{state diagram}
12345
\index{diagram!state}
12346
\index{object diagram}
12347
\index{diagram!object}
12348
\index{embedded object}
12349
\index{object!embedded}
12350
12351
12352
\section{Instances as return values}
12353
\index{instance!as return value}
12354
\index{return value}
12355
12356
Functions can return instances. For example, \verb"find_center"
12357
takes a {\tt Rectangle} as an argument and returns a {\tt Point}
12358
that contains the coordinates of the center of the {\tt Rectangle}:
12359
12360
\begin{verbatim}
def find_center(rect):
    p = Point()
    p.x = rect.corner.x + rect.width/2
    p.y = rect.corner.y + rect.height/2
    return p
\end{verbatim}
12367
%
12368
Here is an example that passes {\tt box} as an argument and assigns
12369
the resulting Point to {\tt center}:
12370
12371
\begin{verbatim}
12372
>>> center = find_center(box)
12373
>>> print_point(center)
12374
(50, 100)
12375
\end{verbatim}
12376
%
12377
12378
\section{Objects are mutable}
12379
\index{object!mutable}
12380
\index{mutability}
12381
12382
You can change the state of an object by making an assignment to one of
12383
its attributes. For example, to change the size of a rectangle
12384
without changing its position, you can modify the values of {\tt
12385
width} and {\tt height}:
12386
12387
\begin{verbatim}
12388
box.width = box.width + 50
12389
box.height = box.height + 100
12390
\end{verbatim}
12391
%
12392
You can also write functions that modify objects. For example,
12393
\verb"grow_rectangle" takes a Rectangle object and two numbers,
12394
{\tt dwidth} and {\tt dheight}, and adds the numbers to the
12395
width and height of the rectangle:
12396
12397
\begin{verbatim}
def grow_rectangle(rect, dwidth, dheight):
    rect.width += dwidth
    rect.height += dheight
\end{verbatim}
12402
%
12403
Here is an example that demonstrates the effect:
12404
12405
\begin{verbatim}
12406
>>> box.width, box.height
12407
(150.0, 300.0)
12408
>>> grow_rectangle(box, 50, 100)
12409
>>> box.width, box.height
12410
(200.0, 400.0)
12411
\end{verbatim}
12412
%
12413
Inside the function, {\tt rect} is an
12414
alias for {\tt box}, so when the function modifies {\tt rect},
12415
{\tt box} changes.
12416
12417
As an exercise, write a function named \verb"move_rectangle" that takes
12418
a Rectangle and two numbers named {\tt dx} and {\tt dy}. It
12419
should change the location of the rectangle by adding {\tt dx}
12420
to the {\tt x} coordinate of {\tt corner} and adding {\tt dy}
12421
to the {\tt y} coordinate of {\tt corner}.
12422
12423
12424
\section{Copying}
12425
\label{copying}
12426
\index{aliasing}
12427
12428
Aliasing can make a program difficult to read because changes
12429
in one place might have unexpected effects in another place.
12430
It is hard to keep track of all the variables that might refer
12431
to a given object.
12432
\index{copying objects}
12433
\index{object!copying}
12434
\index{copy module}
12435
\index{module!copy}
12436
12437
Copying an object is often an alternative to aliasing.
12438
The {\tt copy} module contains a function called {\tt copy} that
12439
can duplicate any object:
12440
12441
\begin{verbatim}
12442
>>> p1 = Point()
12443
>>> p1.x = 3.0
12444
>>> p1.y = 4.0
12445
12446
>>> import copy
12447
>>> p2 = copy.copy(p1)
12448
\end{verbatim}
12449
%
12450
{\tt p1} and {\tt p2} contain the same data, but they are
12451
not the same Point.
12452
12453
\begin{verbatim}
12454
>>> print_point(p1)
12455
(3, 4)
12456
>>> print_point(p2)
12457
(3, 4)
12458
>>> p1 is p2
12459
False
12460
>>> p1 == p2
12461
False
12462
\end{verbatim}
12463
%
12464
The {\tt is} operator indicates that {\tt p1} and {\tt p2} are not the
12465
same object, which is what we expected. But you might have expected
12466
{\tt ==} to yield {\tt True} because these points contain the same
12467
data. In that case, you will be disappointed to learn that for
12468
instances, the default behavior of the {\tt ==} operator is the same
12469
as the {\tt is} operator; it checks object identity, not object
12470
equivalence. That's because for programmer-defined types, Python doesn't
12471
know what should be considered equivalent. At least, not yet.
12472
\index{is operator}
12473
\index{operator!is}
12474
\index{identity}
12475
\index{equivalence}
12476
12477
If you use {\tt copy.copy} to duplicate a Rectangle, you will find
12478
that it copies the Rectangle object but not the embedded Point.
12479
\index{embedded object!copying}
12480
12481
\begin{verbatim}
>>> box2 = copy.copy(box)
>>> box2 is box
False
>>> box2.corner is box.corner
True
\end{verbatim}
12488
12489
\begin{figure}
12490
\centerline
12491
{\includegraphics[scale=0.8]{figs/rectangle2.pdf}}
12492
\caption{Object diagram.}
12493
\label{fig.rectangle2}
12494
\end{figure}
12495
12496
Figure~\ref{fig.rectangle2} shows what the object diagram looks like.
12497
\index{state diagram}
12498
\index{diagram!state}
12499
\index{object diagram}
12500
\index{diagram!object}
12501
This operation is called a {\bf shallow copy} because it copies the
12502
object and any references it contains, but not the embedded objects.
12503
\index{shallow copy}
12504
\index{copy!shallow}
12505
12506
For most applications, this is not what you want. In this example,
12507
invoking \verb"grow_rectangle" on one of the Rectangles would not
12508
affect the other, but invoking \verb"move_rectangle" on either would
12509
affect both! This behavior is confusing and error-prone.
12510
\index{deep copy}
12511
\index{copy!deep}
12512
12513
Fortunately, the {\tt copy} module provides a function named {\tt deepcopy}
that copies not only the object but also
12515
the objects it refers to, and the objects {\em they} refer to,
12516
and so on.
12517
You will not be surprised to learn that this operation is
12518
called a {\bf deep copy}.
12519
\index{deepcopy function}
12520
\index{function!deepcopy}
12521
12522
\begin{verbatim}
>>> box3 = copy.deepcopy(box)
>>> box3 is box
False
>>> box3.corner is box.corner
False
\end{verbatim}
12529
%
12530
{\tt box3} and {\tt box} are completely separate objects.
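
To see the practical difference between the two kinds of copy, here is
a self-contained sketch. It defines bare {\tt Point} and {\tt Rectangle}
classes for illustration, and the variable names {\tt shallow} and
{\tt deep} are assumptions, not names used elsewhere in this chapter.

\begin{verbatim}
import copy

class Point:
    """Represents a point in 2-D space."""

class Rectangle:
    """Represents a rectangle. attributes: width, height, corner."""

box = Rectangle()
box.width = 100.0
box.height = 200.0
box.corner = Point()
box.corner.x = 0.0
box.corner.y = 0.0

shallow = copy.copy(box)      # shares the embedded Point with box
deep = copy.deepcopy(box)     # gets its own copy of the Point

box.corner.x += 50.0          # modify the Point through box

print(shallow.corner.x)       # 50.0, because the Point is shared
print(deep.corner.x)          # 0.0, because the Point was copied
\end{verbatim}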
12531
12532
As an exercise, write a version of \verb"move_rectangle" that creates and
12533
returns a new Rectangle instead of modifying the old one.
12534
12535
12536
\section{Debugging}
12537
\label{hasattr}
12538
\index{debugging}
12539
12540
When you start working with objects, you are likely to encounter
12541
some new exceptions. If you try to access an attribute
12542
that doesn't exist, you get an {\tt AttributeError}:
12543
\index{exception!AttributeError}
12544
\index{AttributeError}
12545
12546
\begin{verbatim}
>>> p = Point()
>>> p.x = 3
>>> p.y = 4
>>> p.z
AttributeError: 'Point' object has no attribute 'z'
\end{verbatim}
12553
%
12554
If you are not sure what type an object is, you can ask:
12555
\index{type function}
12556
\index{function!type}
12557
12558
\begin{verbatim}
>>> type(p)
<class '__main__.Point'>
\end{verbatim}
12562
%
12563
You can also use {\tt isinstance} to check whether an object
12564
is an instance of a class:
12565
\index{isinstance function}
12566
\index{function!isinstance}
12567
12568
\begin{verbatim}
>>> isinstance(p, Point)
True
\end{verbatim}
12572
%
12573
If you are not sure whether an object has a particular attribute,
12574
you can use the built-in function {\tt hasattr}:
12575
\index{hasattr function}
12576
\index{function!hasattr}
12577
12578
\begin{verbatim}
>>> hasattr(p, 'x')
True
>>> hasattr(p, 'z')
False
\end{verbatim}
12584
%
12585
The first argument can be any object; the second argument is a {\em
12586
string} that contains the name of the attribute.
12587
\index{attribute}
12588
12589
You can also use a {\tt try} statement to see if the object has the
12590
attributes you need:
12591
\index{try statement}
12592
\index{statement!try}
12593
12594
\begin{verbatim}
try:
    x = p.x
except AttributeError:
    x = 0
\end{verbatim}
12600
12601
This approach can make it easier to write functions that work with
12602
different types; more on that topic is
12603
coming up in Section~\ref{polymorphism}.
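
A related option is the built-in function {\tt getattr}, which accepts
an optional third argument to return when the attribute is missing.
The following is a minimal sketch; the helper name {\tt coordinate}
is hypothetical.

\begin{verbatim}
class Point:
    """Represents a point in 2-D space."""

def coordinate(obj, name):
    """Return the named attribute of obj, or 0 if it is missing."""
    return getattr(obj, name, 0)

p = Point()
p.x = 3
p.y = 4

x = coordinate(p, 'x')   # 3
z = coordinate(p, 'z')   # 0, because p has no attribute z
\end{verbatim}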
12604
12605
12606
\section{Glossary}
12607
12608
\begin{description}
12609
12610
\item[class:] A programmer-defined type. A class definition creates a new
12611
class object.
12612
\index{class}
12613
\index{programmer-defined type}
12614
\index{type!programmer-defined}
12615
12616
\item[class object:] An object that contains information about a
12617
programmer-defined type. The class object can be used to create instances
12618
of the type.
12619
\index{class object}
12620
\index{object!class}
12621
12622
\item[instance:] An object that belongs to a class.
12623
\index{instance}
12624
12625
\item[instantiate:] To create a new object.
12626
\index{instantiate}
12627
12628
\item[attribute:] One of the named values associated with an object.
12629
\index{attribute!instance}
12630
\index{instance attribute}
12631
12632
\item[embedded object:] An object that is stored as an attribute
12633
of another object.
12634
\index{embedded object}
12635
\index{object!embedded}
12636
12637
\item[shallow copy:] To copy the contents of an object, including
12638
any references to embedded objects;
12639
implemented by the {\tt copy} function in the {\tt copy} module.
12640
\index{shallow copy}
12641
12642
\item[deep copy:] To copy the contents of an object as well as any
12643
embedded objects, and any objects embedded in them, and so on;
12644
implemented by the {\tt deepcopy} function in the {\tt copy} module.
12645
\index{deep copy}
12646
12647
\item[object diagram:] A diagram that shows objects, their
12648
attributes, and the values of the attributes.
12649
\index{object diagram}
12650
\index{diagram!object}
12651
12652
\end{description}
12653
12654
12655
\section{Exercises}
12656
12657
\begin{exercise}
12658
12659
Write a definition for a class named {\tt Circle} with attributes
12660
{\tt center} and {\tt radius}, where {\tt center} is a Point object
12661
and radius is a number.
12662
12663
Instantiate a Circle object that represents a circle with its center
12664
at $(150, 100)$ and radius 75.
12665
12666
Write a function named \verb"point_in_circle" that takes a Circle and
12667
a Point and returns True if the Point lies in or on the boundary of
12668
the circle.
12669
12670
Write a function named \verb"rect_in_circle" that takes a Circle and a
12671
Rectangle and returns True if the Rectangle lies entirely in or on the boundary
12672
of the circle.
12673
12674
Write a function named \verb"rect_circle_overlap" that takes a Circle
12675
and a Rectangle and returns True if any of the corners of the Rectangle fall
12676
inside the circle. Or as a more challenging version, return True if
12677
any part of the Rectangle falls inside the circle.
12678
12679
Solution: \url{http://thinkpython2.com/code/Circle.py}.
12680
12681
\end{exercise}
12682
12683
12684
\begin{exercise}
12685
12686
Write a function called \verb"draw_rect" that takes a Turtle object
12687
and a Rectangle and uses the Turtle to draw the Rectangle. See
12688
Chapter~\ref{turtlechap} for examples using Turtle objects.
12689
12690
Write a function called \verb"draw_circle" that takes a Turtle and
12691
a Circle and draws the Circle.
12692
12693
Solution: \url{http://thinkpython2.com/code/draw.py}.
12694
12695
\end{exercise}
12696
12697
12698
12699
\chapter{Classes and functions}
12700
\label{time}
12701
12702
Now that we know how to create new types, the next
12703
step is to write functions that take programmer-defined objects
12704
as parameters and return them as results. In this chapter I
12705
also present ``functional programming style'' and two new
12706
program development plans.
12707
12708
Code examples from this chapter are available from
12709
\url{http://thinkpython2.com/code/Time1.py}.
12710
Solutions to the exercises are at
12711
\url{http://thinkpython2.com/code/Time1_soln.py}.
12712
12713
12714
\section{Time}
12715
\label{isafter}
12716
12717
As another example of a programmer-defined type, we'll define a class
12718
called {\tt Time} that records the time of day. The class definition
12719
looks like this: \index{programmer-defined type}
12720
\index{type!programmer-defined} \index{Time class} \index{class!Time}
12721
12722
\begin{verbatim}
class Time:
    """Represents the time of day.

    attributes: hour, minute, second
    """
\end{verbatim}
12729
%
12730
We can create a new {\tt Time} object and assign
12731
attributes for hours, minutes, and seconds:
12732
12733
\begin{verbatim}
time = Time()
time.hour = 11
time.minute = 59
time.second = 30
\end{verbatim}
12739
%
12740
The state diagram for the {\tt Time} object looks like Figure~\ref{fig.time}.
12741
\index{state diagram}
12742
\index{diagram!state}
12743
\index{object diagram}
12744
\index{diagram!object}
12745
12746
As an exercise, write a function called \verb"print_time" that takes a
12747
Time object and prints it in the form {\tt hour:minute:second}.
12748
Hint: the format sequence \verb"'%.2d'" prints an integer using
12749
at least two digits, including a leading zero if necessary.
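
To see what the hint means, here is a small sketch that applies the
format sequence to a few integers; the variable names are only for
illustration.

\begin{verbatim}
one_digit = '%.2d' % 5        # '05'
two_digits = '%.2d' % 45      # '45'
three_digits = '%.2d' % 123   # '123'
\end{verbatim}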
12750
12751
Write a boolean function called \verb"is_after" that
12752
takes two Time objects, {\tt t1} and {\tt t2}, and
12753
returns {\tt True} if {\tt t1} follows {\tt t2} chronologically and
12754
{\tt False} otherwise. Challenge: don't use an {\tt if} statement.
12755
12756
\begin{figure}
12757
\centerline
12758
{\includegraphics[scale=0.8]{figs/time.pdf}}
12759
\caption{Object diagram.}
12760
\label{fig.time}
12761
\end{figure}
12762
12763
12764
\section{Pure functions}
12765
\index{prototype and patch}
12766
\index{development plan!prototype and patch}
12767
12768
In the next few sections, we'll write two functions that add time
12769
values. They demonstrate two kinds of functions: pure functions and
12770
modifiers. They also demonstrate a development plan I'll call {\bf
12771
prototype and patch}, which is a way of tackling a complex problem
12772
by starting with a simple prototype and incrementally dealing with the
12773
complications.
12774
12775
Here is a simple prototype of \verb"add_time":
12776
12777
\begin{verbatim}
def add_time(t1, t2):
    sum = Time()
    sum.hour = t1.hour + t2.hour
    sum.minute = t1.minute + t2.minute
    sum.second = t1.second + t2.second
    return sum
\end{verbatim}
12785
%
12786
The function creates a new {\tt Time} object, initializes its
12787
attributes, and returns a reference to the new object. This is called
12788
a {\bf pure function} because it does not modify any of the objects
12789
passed to it as arguments and it has no effect,
12790
like displaying a value or getting user input,
12791
other than returning a value.
12792
\index{pure function}
12793
\index{function type!pure}
12794
12795
To test this function, I'll create two Time objects: {\tt start}
12796
contains the start time of a movie, like {\em Monty Python and the
12797
Holy Grail}, and {\tt duration} contains the run time of the movie,
12798
which is one hour 35 minutes.
12799
\index{Monty Python and the Holy Grail}
12800
12801
\verb"add_time" figures out when the movie will be done.
12802
12803
\begin{verbatim}
>>> start = Time()
>>> start.hour = 9
>>> start.minute = 45
>>> start.second = 0

>>> duration = Time()
>>> duration.hour = 1
>>> duration.minute = 35
>>> duration.second = 0

>>> done = add_time(start, duration)
>>> print_time(done)
10:80:00
\end{verbatim}
12818
%
12819
The result, {\tt 10:80:00}, might not be what you were hoping
12820
for. The problem is that this function does not deal with cases where the
12821
number of seconds or minutes adds up to more than sixty. When that
12822
happens, we have to ``carry'' the extra seconds into the minute column
12823
or the extra minutes into the hour column.
12824
\index{carrying, addition with}
12825
12826
Here's an improved version:
12827
12828
\begin{verbatim}
def add_time(t1, t2):
    sum = Time()
    sum.hour = t1.hour + t2.hour
    sum.minute = t1.minute + t2.minute
    sum.second = t1.second + t2.second

    if sum.second >= 60:
        sum.second -= 60
        sum.minute += 1

    if sum.minute >= 60:
        sum.minute -= 60
        sum.hour += 1

    return sum
\end{verbatim}
12845
%
12846
Although this function is correct, it is starting to get big.
12847
We will see a shorter alternative later.
12848
12849
12850
\section{Modifiers}
12851
\label{increment}
12852
\index{modifier}
12853
\index{function type!modifier}
12854
12855
Sometimes it is useful for a function to modify the objects it gets as
12856
parameters. In that case, the changes are visible to the caller.
12857
Functions that work this way are called {\bf modifiers}.
12858
\index{increment}
12859
12860
{\tt increment}, which adds a given number of seconds to a {\tt Time}
12861
object, can be written naturally as a
12862
modifier. Here is a rough draft:
12863
12864
\begin{verbatim}
def increment(time, seconds):
    time.second += seconds

    if time.second >= 60:
        time.second -= 60
        time.minute += 1

    if time.minute >= 60:
        time.minute -= 60
        time.hour += 1
\end{verbatim}
12876
%
12877
The first line performs the basic operation; the remainder deals
12878
with the special cases we saw before.
12879
\index{special case}
12880
12881
Is this function correct? What happens if {\tt seconds}
12882
is much greater than sixty?
12883
12884
In that case, it is not enough to carry once; we have to keep doing it
12885
until {\tt time.second} is less than sixty. One solution is to
12886
replace the {\tt if} statements with {\tt while} statements. That
12887
would make the function correct, but not very efficient. As an
12888
exercise, write a correct version of {\tt increment} that doesn't
12889
contain any loops.
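
The {\tt while} version mentioned above might look like the following
sketch. It assumes that {\tt seconds} is a nonnegative integer and that
the Time object is valid to begin with; the name
\verb"increment_loop" is hypothetical, chosen so that it does not
clash with the rough draft.

\begin{verbatim}
class Time:
    """Represents the time of day."""

def increment_loop(time, seconds):
    """Add seconds to time in place, carrying as often as needed."""
    time.second += seconds

    while time.second >= 60:
        time.second -= 60
        time.minute += 1

    while time.minute >= 60:
        time.minute -= 60
        time.hour += 1

t = Time()
t.hour = 9
t.minute = 45
t.second = 0
increment_loop(t, 1337)   # t now represents 10:07:17
\end{verbatim}

The first loop runs once for every full minute contained in
{\tt seconds}, which is why this version is correct but slow for
large arguments.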
12890
12891
Anything that can be done with modifiers can also be done with pure
12892
functions. In fact, some programming languages only allow pure
12893
functions. There is some evidence that programs that use pure
12894
functions are faster to develop and less error-prone than programs
12895
that use modifiers. But modifiers are convenient at times,
12896
and functional programs tend to be less efficient.
12897
12898
In general, I recommend that you write pure functions whenever it is
12899
reasonable and resort to modifiers only if there is a compelling
12900
advantage. This approach might be called a {\bf functional
12901
programming style}.
12902
\index{functional programming style}
12903
12904
As an exercise, write a ``pure'' version of {\tt increment} that
12905
creates and returns a new Time object rather than modifying the
12906
parameter.
12907
12908
12909
\section{Prototyping versus planning}
12910
\label{prototype}
12911
\index{prototype and patch}
12912
\index{development plan!prototype and patch}
12913
\index{planned development}
12914
\index{development plan!designed}
12915
12916
The development plan I am demonstrating is called ``prototype and
12917
patch''. For each function, I wrote a prototype that performed the
12918
basic calculation and then tested it, patching errors along the
12919
way.
12920
12921
This approach can be effective, especially if you don't yet have a
12922
deep understanding of the problem. But incremental corrections can
12923
generate code that is unnecessarily complicated---since it deals with
12924
many special cases---and unreliable---since it is hard to know if you
12925
have found all the errors.
12926
12927
An alternative is {\bf designed development}, in which high-level
12928
insight into the problem can make the programming much easier. In
12929
this case, the insight is that a Time object is really a three-digit
12930
number in base 60 (see \url{http://en.wikipedia.org/wiki/Sexagesimal})! The
12931
{\tt second} attribute is the ``ones column'', the {\tt minute}
12932
attribute is the ``sixties column'', and the {\tt hour} attribute is
12933
the ``thirty-six hundreds column''.
12934
\index{sexagesimal}
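
For example, reading the time 11:59:30 as a three-digit number in
base 60 gives
\[
11 \times 3600 + 59 \times 60 + 30 = 39600 + 3540 + 30 = 43170,
\]
which is the number of seconds since midnight.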
12935
12936
When we wrote \verb"add_time" and {\tt increment}, we were effectively
12937
doing addition in base 60, which is why we had to carry from one
12938
column to the next.
12939
\index{carrying, addition with}
12940
12941
This observation suggests another approach to the whole problem---we
12942
can convert Time objects to integers and take advantage of the fact
12943
that the computer knows how to do integer arithmetic.
12944
12945
Here is a function that converts Times to integers:
12946
12947
\begin{verbatim}
def time_to_int(time):
    minutes = time.hour * 60 + time.minute
    seconds = minutes * 60 + time.second
    return seconds
\end{verbatim}
12953
%
12954
And here is a function that converts an integer to a Time
12955
(recall that {\tt divmod} divides the first argument by the second
12956
and returns the quotient and remainder as a tuple).
12957
\index{divmod}
12958
12959
\begin{verbatim}
def int_to_time(seconds):
    time = Time()
    minutes, time.second = divmod(seconds, 60)
    time.hour, time.minute = divmod(minutes, 60)
    return time
\end{verbatim}
12966
%
12967
You might have to think a bit, and run some tests, to convince
12968
yourself that these functions are correct. One way to test them is to
12969
check that \verb"time_to_int(int_to_time(x)) == x" for many values of
12970
{\tt x}. This is an example of a consistency check.
12971
\index{consistency check}
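
One way to run such a check is sketched below. It repeats the two
conversion functions so that it is self-contained, and it tests every
second in a 24-hour day; the range and the name {\tt mismatches} are
choices made for this illustration.

\begin{verbatim}
class Time:
    """Represents the time of day."""

def time_to_int(time):
    minutes = time.hour * 60 + time.minute
    seconds = minutes * 60 + time.second
    return seconds

def int_to_time(seconds):
    time = Time()
    minutes, time.second = divmod(seconds, 60)
    time.hour, time.minute = divmod(minutes, 60)
    return time

# Collect every x for which the round trip does not return x.
mismatches = [x for x in range(24 * 60 * 60)
              if time_to_int(int_to_time(x)) != x]
print(len(mismatches))   # 0 if the functions are consistent
\end{verbatim}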
12972
12973
Once you are convinced they are correct, you can use them to
12974
rewrite \verb"add_time":
12975
12976
\begin{verbatim}
def add_time(t1, t2):
    seconds = time_to_int(t1) + time_to_int(t2)
    return int_to_time(seconds)
\end{verbatim}
12981
%
12982
This version is shorter than the original, and easier to verify. As
12983
an exercise, rewrite {\tt increment} using \verb"time_to_int" and
12984
\verb"int_to_time".
12985
12986
In some ways, converting from base 60 to base 10 and back is harder
12987
than just dealing with times. Base conversion is more abstract; our
12988
intuition for dealing with time values is better.
12989
12990
But if we have the insight to treat times as base 60 numbers and make
12991
the investment of writing the conversion functions (\verb"time_to_int"
12992
and \verb"int_to_time"), we get a program that is shorter, easier to
12993
read and debug, and more reliable.
12994
12995
It is also easier to add features later. For example, imagine
12996
subtracting two Times to find the duration between them. The
12997
naive approach would be to implement subtraction with borrowing.
12998
Using the conversion functions would be easier and more likely to be
12999
correct.
13000
\index{subtraction with borrowing}
13001
\index{borrowing, subtraction with}
13002
\index{generalization}
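
As a sketch of that idea, subtraction could be written with the same
two conversion functions. The name \verb"sub_time" is hypothetical,
and the sketch assumes that the first Time is not earlier than the
second, so the difference is nonnegative.

\begin{verbatim}
class Time:
    """Represents the time of day."""

def time_to_int(time):
    minutes = time.hour * 60 + time.minute
    seconds = minutes * 60 + time.second
    return seconds

def int_to_time(seconds):
    time = Time()
    minutes, time.second = divmod(seconds, 60)
    time.hour, time.minute = divmod(minutes, 60)
    return time

def sub_time(t1, t2):
    """Return the duration from t2 to t1 (assumes t1 is not earlier)."""
    seconds = time_to_int(t1) - time_to_int(t2)
    return int_to_time(seconds)

end = int_to_time(11 * 3600 + 20 * 60)     # 11:20:00
start = int_to_time(9 * 3600 + 45 * 60)    # 09:45:00
duration = sub_time(end, start)            # 01:35:00
\end{verbatim}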
13003
13004
Ironically, sometimes making a problem harder (or more general) makes it
13005
easier (because there are fewer special cases and fewer opportunities
13006
for error).
13007
13008
13009
\section{Debugging}
13010
\index{debugging}
13011
13012
A Time object is well-formed if the values of {\tt minute} and {\tt
13013
second} are between 0 and 60 (including 0 but not 60) and if
13014
{\tt hour} is nonnegative. {\tt hour} and {\tt minute} should be
13015
integral values, but we might allow {\tt second} to have a
13016
fraction part.
13017
\index{invariant}
13018
13019
Requirements like these are called {\bf invariants} because
13020
they should always be true. To put it a different way, if they
13021
are not true, something has gone wrong.
13022
13023
Writing code to check invariants can help detect errors
13024
and find their causes. For example, you might have a function
13025
like \verb"valid_time" that takes a Time object and returns
13026
{\tt False} if it violates an invariant:
13027
13028
\begin{verbatim}
def valid_time(time):
    if time.hour < 0 or time.minute < 0 or time.second < 0:
        return False
    if time.minute >= 60 or time.second >= 60:
        return False
    return True
\end{verbatim}
13036
%
13037
At the beginning of each function you could check the
13038
arguments to make sure they are valid:
13039
\index{raise statement}
13040
\index{statement!raise}
13041
13042
\begin{verbatim}
def add_time(t1, t2):
    if not valid_time(t1) or not valid_time(t2):
        raise ValueError('invalid Time object in add_time')
    seconds = time_to_int(t1) + time_to_int(t2)
    return int_to_time(seconds)
\end{verbatim}
13049
%
13050
Or you could use an {\bf assert statement}, which checks a given invariant
13051
and raises an exception if it fails:
13052
\index{assert statement}
13053
\index{statement!assert}
13054
13055
\begin{verbatim}
def add_time(t1, t2):
    assert valid_time(t1) and valid_time(t2)
    seconds = time_to_int(t1) + time_to_int(t2)
    return int_to_time(seconds)
\end{verbatim}
13061
%
13062
{\tt assert} statements are useful because they distinguish
13063
code that deals with normal conditions from code
13064
that checks for errors.
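
As a small demonstration, the following sketch builds a Time object
that violates an invariant and shows that the assertion fails. The
variable names and the error message are assumptions made for this
example. Keep in mind that Python skips {\tt assert} statements when
it runs with the {\tt -O} option, so they are a debugging aid and not
a substitute for checking user input.

\begin{verbatim}
class Time:
    """Represents the time of day."""

def valid_time(time):
    if time.hour < 0 or time.minute < 0 or time.second < 0:
        return False
    if time.minute >= 60 or time.second >= 60:
        return False
    return True

bad = Time()
bad.hour = 10
bad.minute = 80    # violates the invariant minute < 60
bad.second = 0

try:
    assert valid_time(bad), 'invalid Time object'
    caught = False
except AssertionError as err:
    caught = True
    message = str(err)
\end{verbatim}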
13065
13066
13067
\section{Glossary}
13068
13069
\begin{description}
13070
13071
\item[prototype and patch:] A development plan that involves
13072
writing a rough draft of a program, testing, and correcting errors as
13073
they are found.
13074
\index{prototype and patch}
13075
13076
\item[designed development:] A development plan that involves
13077
high-level insight into the problem and more planning than incremental
13078
development or prototype development.
13079
\index{designed development}
13080
13081
\item[pure function:] A function that does not modify any of the objects it
13082
receives as arguments. Most pure functions are fruitful.
13083
\index{pure function}
13084
13085
\item[modifier:] A function that changes one or more of the objects it
13086
receives as arguments. Most modifiers are void; that is, they
13087
return {\tt None}. \index{modifier}
13088
13089
\item[functional programming style:] A style of program design in which the
13090
majority of functions are pure.
13091
\index{functional programming style}
13092
13093
\item[invariant:] A condition that should always be true during the
13094
execution of a program.
13095
\index{invariant}
13096
13097
\item[assert statement:] A statement that checks a condition and raises
13098
an exception if it fails.
13099
\index{assert statement}
13100
\index{statement!assert}
13101
13102
\end{description}
13103
13104
13105
\section{Exercises}
13106
13107
Code examples from this chapter are available from
13108
\url{http://thinkpython2.com/code/Time1.py}; solutions to the
13109
exercises are available from \url{http://thinkpython2.com/code/Time1_soln.py}.
13110
13111
\begin{exercise}
13112
13113
Write a function called \verb"mul_time" that takes a Time object
13114
and a number and returns a new Time object that contains
13115
the product of the original Time and the number.
13116
13117
Then use \verb"mul_time" to write a function that takes a Time
13118
object that represents the finishing time in a race, and a number
13119
that represents the distance, and returns a Time object that represents
13120
the average pace (time per mile).
13121
\index{running pace}
13122
13123
\end{exercise}
13124
13125
13126
\begin{exercise}
13127
\index{datetime module}
13128
\index{module!datetime}
13129
13130
The {\tt datetime} module provides {\tt time} objects
13131
that are similar to the Time objects in this chapter, but
13132
they provide a rich set of methods and operators. Read the
13133
documentation at \url{http://docs.python.org/3/library/datetime.html}.
13134
13135
\begin{enumerate}
13136
13137
\item Use the {\tt datetime} module to write a program that gets the
13138
current date and prints the day of the week.
13139
13140
\item Write a program that takes a birthday as input and prints the
13141
user's age and the number of days, hours, minutes and seconds until
13142
their next birthday.
13143
\index{birthday}
13144
13145
\item For two people born on different days, there is a day when one
13146
is twice as old as the other. That's their Double Day. Write a
13147
program that takes two birth dates and computes their Double Day.
13148
13149
\item For a little more challenge, write the more general version that
13150
computes the day when one person is $n$ times as old as the other.
13151
\index{Double Day}
13152
13153
\end{enumerate}
13154
13155
Solution: \url{http://thinkpython2.com/code/double.py}
13156
13157
\end{exercise}
13158
13159
13160
\chapter{Classes and methods}
13161
13162
Although we are using some of Python's object-oriented features,
13163
the programs from the last two chapters are not really
13164
object-oriented because they don't represent the relationships
13165
between programmer-defined types and the functions that operate
13166
on them. The next step is to transform those functions into
13167
methods that make the relationships explicit.
13168
13169
Code examples from this chapter are available from
13170
\url{http://thinkpython2.com/code/Time2.py}, and solutions
13171
to the exercises are in \url{http://thinkpython2.com/code/Point2_soln.py}.
13172
13173
13174
\section{Object-oriented features}
13175
\index{object-oriented programming}
13176
13177
Python is an {\bf object-oriented programming language}, which means
13178
that it provides features that support object-oriented
13179
programming, which has these defining characteristics:
13180
13181
\begin{itemize}
13182
13183
\item Programs include class and method definitions.
13184
13185
\item Most of the computation is expressed in terms of operations on
13186
objects.
13187
13188
\item Objects often represent things
13189
in the real world, and methods often
13190
correspond to the ways things in the real world interact.
13191
13192
\end{itemize}
13193
13194
For example, the {\tt Time} class defined in Chapter~\ref{time}
13195
corresponds to the way people record the time of day, and the
13196
functions we defined correspond to the kinds of things people do with
13197
times. Similarly, the {\tt Point} and {\tt Rectangle} classes
13198
in Chapter~\ref{clobjects}
13199
correspond to the mathematical concepts of a point and a rectangle.
13200
13201
So far, we have not taken advantage of the features Python provides to
13202
support object-oriented programming. These
13203
features are not strictly necessary; most of them provide
13204
alternative syntax for things we have already done. But in many cases,
13205
the alternative is more concise and more accurately conveys the
13206
structure of the program.
13207
13208
For example, in {\tt Time1.py} there is no obvious
13209
connection between the class definition and the function definitions
13210
that follow. With some examination, it is apparent that every function
13211
takes at least one {\tt Time} object as an argument.
13212
\index{method}
13213
\index{function}
13214
13215
This observation is the motivation for {\bf methods}; a method is
13216
a function that is associated with a particular class.
13217
We have seen methods for strings, lists, dictionaries and tuples.
13218
In this chapter, we will define methods for programmer-defined types.
13219
\index{syntax}
13220
\index{semantics}
13221
\index{programmer-defined type}
13222
\index{type!programmer-defined}
13223
13224
Methods are semantically the same as functions, but there are
13225
two syntactic differences:
13226
13227
\begin{itemize}
13228
13229
\item Methods are defined inside a class definition in order
13230
to make the relationship between the class and the method explicit.
13231
13232
\item The syntax for invoking a method is different from the
13233
syntax for calling a function.
13234
13235
\end{itemize}
13236
13237
In the next few sections, we will take the functions from the previous
13238
two chapters and transform them into methods. This transformation is
13239
purely mechanical; you can do it by following a sequence of
13240
steps. If you are comfortable converting from one form to another,
13241
you will be able to choose the best form for whatever you are doing.
13242
13243
13244
\section{Printing objects}
13245
\index{object!printing}
13246
13247
In Chapter~\ref{time}, we defined a class named
13248
{\tt Time} and in Section~\ref{isafter}, you
13249
wrote a function named \verb"print_time":
13250
13251
\begin{verbatim}
class Time:
    """Represents the time of day."""

def print_time(time):
    print('%.2d:%.2d:%.2d' % (time.hour, time.minute, time.second))
\end{verbatim}
13258
%
13259
To call this function, you have to pass a {\tt Time} object as an
13260
argument:
13261
13262
\begin{verbatim}
>>> start = Time()
>>> start.hour = 9
>>> start.minute = 45
>>> start.second = 00
>>> print_time(start)
09:45:00
\end{verbatim}
13270
%
13271
To make \verb"print_time" a method, all we have to do is
13272
move the function definition inside the class definition. Notice
13273
the change in indentation.
13274
\index{indentation}
13275
13276
\begin{verbatim}
class Time:
    def print_time(time):
        print('%.2d:%.2d:%.2d' % (time.hour, time.minute, time.second))
\end{verbatim}
13281
%
13282
Now there are two ways to call \verb"print_time". The first
13283
(and less common) way is to use function syntax:
13284
\index{function syntax}
13285
\index{dot notation}
13286
13287
\begin{verbatim}
>>> Time.print_time(start)
09:45:00
\end{verbatim}
13291
%
13292
In this use of dot notation, {\tt Time} is the name of the class,
13293
and \verb"print_time" is the name of the method. {\tt start} is
13294
passed as an argument.
13295
13296
The second (and more concise) way is to use method syntax:
13297
\index{method syntax}
13298
13299
\begin{verbatim}
>>> start.print_time()
09:45:00
\end{verbatim}
13303
%
13304
In this use of dot notation, \verb"print_time" is the name of the
13305
method (again), and {\tt start} is the object the method is
13306
invoked on, which is called the {\bf subject}. Just as the
13307
subject of a sentence is what the sentence is about, the subject
13308
of a method invocation is what the method is about.
13309
\index{subject}
13310
13311
Inside the method, the subject is assigned to the first
13312
parameter, so in this case {\tt start} is assigned
13313
to {\tt time}.
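
The following sketch shows that the two call syntaxes do the same
thing. To make the result easy to compare, it uses a hypothetical
method named \verb"format_time" that returns the string instead of
printing it; that method is an assumption for this example and is not
part of {\tt Time2.py}.

\begin{verbatim}
class Time:
    """Represents the time of day."""

    def format_time(time):
        # The subject of the invocation is assigned to the first
        # parameter, which is named time here.
        return '%.2d:%.2d:%.2d' % (time.hour, time.minute, time.second)

start = Time()
start.hour = 9
start.minute = 45
start.second = 0

via_function_syntax = Time.format_time(start)   # '09:45:00'
via_method_syntax = start.format_time()         # '09:45:00'
\end{verbatim}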
13314
\index{self (parameter name)}
13315
\index{parameter!self}
13316
13317
By convention, the first parameter of a method is
13318
called {\tt self}, so it would be more common to write
13319
\verb"print_time" like this:
13320
13321
\begin{verbatim}
class Time:
    def print_time(self):
        print('%.2d:%.2d:%.2d' % (self.hour, self.minute, self.second))
\end{verbatim}
13326
%
13327
The reason for this convention is an implicit metaphor:
13328
\index{metaphor, method invocation}
13329
13330
\begin{itemize}
13331
13332
\item The syntax for a function call, \verb"print_time(start)",
13333
suggests that the function is the active agent. It says something
13334
like, ``Hey \verb"print_time"! Here's an object for you to print.''
13335
13336
\item In object-oriented programming, the objects are the active
13337
agents. A method invocation like \verb"start.print_time()" says
13338
``Hey {\tt start}! Please print yourself.''
13339
13340
\end{itemize}
13341
13342
This change in perspective might be more polite, but it is not obvious
13343
that it is useful. In the examples we have seen so far, it may not
13344
be. But sometimes shifting responsibility from the functions onto the
13345
objects makes it possible to write more versatile functions (or
13346
methods), and makes it easier to maintain and reuse code.

As an exercise, rewrite \verb"time_to_int" (from
Section~\ref{prototype}) as a method. You might be tempted to
rewrite \verb"int_to_time" as a method, too, but that doesn't
really make sense because there would be no object to invoke
it on.


\section{Another example}
\index{increment}

Here's a version of {\tt increment} (from Section~\ref{increment})
rewritten as a method:

\begin{verbatim}
# inside class Time:

    def increment(self, seconds):
        seconds += self.time_to_int()
        return int_to_time(seconds)
\end{verbatim}
%
This version assumes that \verb"time_to_int" is written
as a method. Also, note that
it is a pure function, not a modifier.

Here's how you would invoke {\tt increment}:

\begin{verbatim}
>>> start.print_time()
09:45:00
>>> end = start.increment(1337)
>>> end.print_time()
10:07:17
\end{verbatim}
%
The subject, {\tt start}, gets assigned to the first parameter,
{\tt self}. The argument, {\tt 1337}, gets assigned to the
second parameter, {\tt seconds}.

This mechanism can be confusing, especially if you make an error.
For example, if you invoke {\tt increment} with two arguments, you
get:
\index{exception!TypeError}
\index{TypeError}

\begin{verbatim}
>>> end = start.increment(1337, 460)
TypeError: increment() takes 2 positional arguments but 3 were given
\end{verbatim}
%
The error message is initially confusing, because there are
only two arguments in parentheses. But the subject is also
considered an argument, so all together that's three.

By the way, a {\bf positional argument} is an argument that
doesn't have a parameter name; that is, it is not a keyword
argument. In this function call:
\index{positional argument}
\index{argument!positional}

\begin{verbatim}
sketch(parrot, cage, dead=True)
\end{verbatim}

{\tt parrot} and {\tt cage} are positional, and {\tt dead} is
a keyword argument.
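
To make the distinction concrete, here is a small sketch. The body of
{\tt sketch} and its parameter names {\tt subject} and {\tt location}
are hypothetical; the text only shows the call.

```python
def sketch(subject, location, dead=False):
    """Hypothetical stand-in; returns a description instead of drawing."""
    state = 'dead' if dead else 'alive'
    return '%s in a %s (%s)' % (subject, location, state)


# Two positional arguments and one keyword argument, as in the text.
a = sketch('parrot', 'cage', dead=True)

# Any parameter can be passed by keyword, and keywords can be in any order.
b = sketch(dead=True, location='cage', subject='parrot')

# All three arguments passed positionally.
c = sketch('parrot', 'cage', True)

print(a == b == c)
```

All three calls bind the same values to the same parameters; only the
way the arguments are matched to parameters differs.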


\section{A more complicated example}

Rewriting \verb"is_after" (from Section~\ref{isafter}) is slightly
more complicated because it takes two Time objects as parameters. In
this case it is conventional to name the first parameter {\tt self}
and the second parameter {\tt other}: \index{other (parameter name)}
\index{parameter!other}

\begin{verbatim}
# inside class Time:

    def is_after(self, other):
        return self.time_to_int() > other.time_to_int()
\end{verbatim}
%
To use this method, you have to invoke it on one object and pass
the other as an argument:

\begin{verbatim}
>>> end.is_after(start)
True
\end{verbatim}
%
One nice thing about this syntax is that it almost reads
like English: ``end is after start?''


\section{The init method}
\index{init method}
\index{method!init}

The init method (short for ``initialization'') is
a special method that gets invoked when an object is instantiated.
Its full name is \verb"__init__" (two underscore characters,
followed by {\tt init}, and then two more underscores). An
init method for the {\tt Time} class might look like this:

\begin{verbatim}
# inside class Time:

    def __init__(self, hour=0, minute=0, second=0):
        self.hour = hour
        self.minute = minute
        self.second = second
\end{verbatim}
%
It is common for the parameters of \verb"__init__"
to have the same names as the attributes. The statement

\begin{verbatim}
        self.hour = hour
\end{verbatim}
%
stores the value of the parameter {\tt hour} as an attribute
of {\tt self}.
\index{optional parameter}
\index{parameter!optional}
\index{default value}
\index{override}

The parameters are optional, so if you call {\tt Time} with
no arguments, you get the default values.

\begin{verbatim}
>>> time = Time()
>>> time.print_time()
00:00:00
\end{verbatim}
%
If you provide one argument, it overrides {\tt hour}:

\begin{verbatim}
>>> time = Time(9)
>>> time.print_time()
09:00:00
\end{verbatim}
%
If you provide two arguments, they override {\tt hour} and
{\tt minute}.

\begin{verbatim}
>>> time = Time(9, 45)
>>> time.print_time()
09:45:00
\end{verbatim}
%
And if you provide three arguments, they override all three
default values.
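
Positional arguments override the defaults from left to right. As a
brief sketch (the variable name \verb"quarter_to" is made up for this
example), a keyword argument can override one default while leaving
the earlier ones alone:

```python
class Time:
    """Represents the time of day."""

    def __init__(self, hour=0, minute=0, second=0):
        self.hour = hour
        self.minute = minute
        self.second = second

    def print_time(self):
        print('%.2d:%.2d:%.2d' % (self.hour, self.minute, self.second))


# Positional arguments fill hour, then minute, then second.
Time(9, 45).print_time()

# A keyword argument overrides minute only; hour and second keep
# their default values.
quarter_to = Time(minute=45)
quarter_to.print_time()
```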

As an exercise, write an init method for the {\tt Point} class that takes
{\tt x} and {\tt y} as optional parameters and assigns
them to the corresponding attributes.
\index{Point class}
\index{class!Point}


\section{The {\tt \_\_str\_\_} method}
\index{str method@\_\_str\_\_ method}
\index{method!\_\_str\_\_}

\verb"__str__" is a special method, like \verb"__init__",
that is supposed to return a string representation of an object.
\index{string representation}

For example, here is a {\tt str} method for Time objects:

\begin{verbatim}
# inside class Time:

    def __str__(self):
        return '%.2d:%.2d:%.2d' % (self.hour, self.minute, self.second)
\end{verbatim}
%
When you {\tt print} an object, Python invokes the {\tt str} method:
\index{print statement}
\index{statement!print}

\begin{verbatim}
>>> time = Time(9, 45)
>>> print(time)
09:45:00
\end{verbatim}
%
When I write a new class, I almost always start by writing
\verb"__init__", which makes it easier to instantiate objects, and
\verb"__str__", which is useful for debugging.

As an exercise, write a {\tt str} method for the {\tt Point} class.
Create a Point object and print it.


\section{Operator overloading}
\label{operator.overloading}

By defining other special methods, you can specify the behavior
of operators on programmer-defined types. For example, if you define
a method named \verb"__add__" for the {\tt Time} class, you can use the
{\tt +} operator on Time objects.
\index{programmer-defined type}
\index{type!programmer-defined}

Here is what the definition might look like:
\index{add method}
\index{method!add}

\begin{verbatim}
# inside class Time:

    def __add__(self, other):
        seconds = self.time_to_int() + other.time_to_int()
        return int_to_time(seconds)
\end{verbatim}
%
And here is how you could use it:

\begin{verbatim}
>>> start = Time(9, 45)
>>> duration = Time(1, 35)
>>> print(start + duration)
11:20:00
\end{verbatim}
%
When you apply the {\tt +} operator to Time objects, Python invokes
\verb"__add__". When you print the result, Python invokes
\verb"__str__". So there is a lot happening behind the scenes!
\index{operator overloading}

Changing the behavior of an operator so that it works with
programmer-defined types is called {\bf operator overloading}. For every
operator in Python there is a corresponding special method, like
\verb"__add__". For more details, see
\url{http://docs.python.org/3/reference/datamodel.html#specialnames}.

As an exercise, write an {\tt add} method for the Point class.


\section{Type-based dispatch}

In the previous section we added two Time objects, but you
also might want to add an integer to a Time object. The
following is a version of \verb"__add__"
that checks the type of {\tt other} and invokes either
\verb"add_time" or {\tt increment}:

\begin{verbatim}
# inside class Time:

    def __add__(self, other):
        if isinstance(other, Time):
            return self.add_time(other)
        else:
            return self.increment(other)

    def add_time(self, other):
        seconds = self.time_to_int() + other.time_to_int()
        return int_to_time(seconds)

    def increment(self, seconds):
        seconds += self.time_to_int()
        return int_to_time(seconds)
\end{verbatim}
%
The built-in function {\tt isinstance} takes a value and a
class object, and returns {\tt True} if the value is an instance
of the class.
\index{isinstance function}
\index{function!isinstance}

If {\tt other} is a Time object, \verb"__add__" invokes
\verb"add_time". Otherwise it assumes that the parameter
is a number and invokes {\tt increment}. This operation is
called a {\bf type-based dispatch} because it dispatches the
computation to different methods based on the type of the
arguments.
\index{type-based dispatch}
\index{dispatch, type-based}

Here are examples that use the {\tt +} operator with different
types:

\begin{verbatim}
>>> start = Time(9, 45)
>>> duration = Time(1, 35)
>>> print(start + duration)
11:20:00
>>> print(start + 1337)
10:07:17
\end{verbatim}
%
13645
Unfortunately, this implementation of addition is not commutative.
13646
If the integer is the first operand, you get
13647
\index{commutativity}
13648
13649
\begin{verbatim}
13650
>>> print(1337 + start)
13651
TypeError: unsupported operand type(s) for +: 'int' and 'instance'
13652
\end{verbatim}
%
The problem is, instead of asking the Time object to add an integer,
Python is asking an integer to add a Time object, and it doesn't know
how. But there is a clever solution for this problem: the
special method \verb"__radd__", which stands for ``right-side add''.
This method is invoked when a Time object appears on the right side of
the {\tt +} operator. Here's the definition:
\index{radd method}
\index{method!radd}

\begin{verbatim}
# inside class Time:

    def __radd__(self, other):
        return self.__add__(other)
\end{verbatim}
%
And here's how it's used:

\begin{verbatim}
>>> print(1337 + start)
10:07:17
\end{verbatim}
%
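
The pieces of this section can be assembled into one runnable sketch.
The bodies of \verb"time_to_int" and \verb"int_to_time" below are
assumptions modeled on the versions from Section~\ref{prototype};
the dispatch logic is the one shown above.

```python
class Time:
    """Represents the time of day."""

    def __init__(self, hour=0, minute=0, second=0):
        self.hour = hour
        self.minute = minute
        self.second = second

    def __str__(self):
        return '%.2d:%.2d:%.2d' % (self.hour, self.minute, self.second)

    def time_to_int(self):
        # Assumed implementation: seconds since midnight.
        minutes = self.hour * 60 + self.minute
        return minutes * 60 + self.second

    def __add__(self, other):
        # Type-based dispatch: choose a method based on the operand's type.
        if isinstance(other, Time):
            return self.add_time(other)
        else:
            return self.increment(other)

    def __radd__(self, other):
        # Invoked for other + self when other cannot handle the addition.
        return self.__add__(other)

    def add_time(self, other):
        return int_to_time(self.time_to_int() + other.time_to_int())

    def increment(self, seconds):
        return int_to_time(seconds + self.time_to_int())


def int_to_time(seconds):
    # Assumed implementation: the inverse of time_to_int.
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    return Time(hour, minute, second)


start = Time(9, 45)
duration = Time(1, 35)
print(start + duration)   # Time + Time: __add__ dispatches to add_time
print(start + 1337)       # Time + int:  __add__ dispatches to increment
print(1337 + start)       # int + Time:  __radd__ forwards to __add__
```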

As an exercise, write an {\tt add} method for Points that works with
either a Point object or a tuple:

\begin{itemize}

\item If the second operand is a Point, the method should return a new
Point whose $x$ coordinate is the sum of the $x$ coordinates of the
operands, and likewise for the $y$ coordinates.

\item If the second operand is a tuple, the method should add the
first element of the tuple to the $x$ coordinate and the second
element to the $y$ coordinate, and return a new Point with the result.

\end{itemize}




\section{Polymorphism}
\label{polymorphism}

Type-based dispatch is useful when it is necessary, but (fortunately)
it is not always necessary. Often you can avoid it by writing functions
that work correctly for arguments with different types.
\index{type-based dispatch}
\index{dispatch!type-based}

Many of the functions we wrote for strings also
work for other sequence types.
For example, in Section~\ref{histogram}
we used {\tt histogram} to count the number of times each letter
appears in a word.

\begin{verbatim}
def histogram(s):
    d = dict()
    for c in s:
        if c not in d:
            d[c] = 1
        else:
            d[c] = d[c]+1
    return d
\end{verbatim}
%
This function also works for lists, tuples, and even dictionaries,
as long as the elements of {\tt s} are hashable, so they can be used
as keys in {\tt d}.

\begin{verbatim}
>>> t = ['spam', 'egg', 'spam', 'spam', 'bacon', 'spam']
>>> histogram(t)
{'spam': 4, 'egg': 1, 'bacon': 1}
\end{verbatim}
%
Functions that work with several types are called {\bf polymorphic}.
Polymorphism can facilitate code reuse. For example, the built-in
function {\tt sum}, which adds the elements of a sequence, works
as long as the elements of the sequence support addition.
\index{polymorphism}

Since Time objects provide \verb"__add__" and \verb"__radd__" methods,
they work with {\tt sum}, which starts from the integer 0 and adds
each element in turn:

\begin{verbatim}
>>> t1 = Time(7, 43)
>>> t2 = Time(7, 41)
>>> t3 = Time(7, 37)
>>> total = sum([t1, t2, t3])
>>> print(total)
23:01:00
\end{verbatim}
%
In general, if all of the operations inside a function
work with a given type, the function works with that type.
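
As a quick check of that rule, the sketch below passes several types
to {\tt histogram}. The sample values are made up for illustration.
The last call shows what happens when one of the operations inside the
function does not work with the elements: lists are not hashable.

```python
def histogram(s):
    d = dict()
    for c in s:
        if c not in d:
            d[c] = 1
        else:
            d[c] = d[c] + 1
    return d


print(histogram('banana'))             # a string: counts characters
print(histogram((1, 2, 1)))            # a tuple: counts elements
print(histogram({'a': 10, 'b': 20}))   # a dictionary: iterates over the keys

# Lists are not hashable, so testing "c not in d" raises TypeError.
try:
    histogram([[1, 2], [1, 2]])
except TypeError as err:
    print('TypeError:', err)
```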

The best kind of polymorphism is the unintentional kind, where
you discover that a function you already wrote can be
applied to a type you never planned for.


\section{Debugging}
\index{debugging}

It is legal to add attributes to objects at any point in the execution
of a program, but if you have objects with the same type that don't
have the same attributes, it is easy to make mistakes.
It is considered a good idea to
initialize all of an object's attributes in the init method.
\index{init method}
\index{attribute!initializing}

If you are not sure whether an object has a particular attribute, you
can use the built-in function {\tt hasattr} (see Section~\ref{hasattr}).
\index{hasattr function}
\index{function!hasattr}
\index{dict attribute@\_\_dict\_\_ attribute}
\index{attribute!\_\_dict\_\_}

Another way to access attributes is the built-in function {\tt vars},
which takes an object and returns a dictionary that maps from
attribute names (as strings) to their values:

\begin{verbatim}
>>> p = Point(3, 4)
>>> vars(p)
{'x': 3, 'y': 4}
\end{verbatim}
%
For purposes of debugging, you might find it useful to keep this
function handy:

\begin{verbatim}
def print_attributes(obj):
    for attr in vars(obj):
        print(attr, getattr(obj, attr))
\end{verbatim}
%
\verb"print_attributes" traverses the dictionary
and prints each attribute name and its corresponding value.
\index{traversal!dictionary}
\index{dictionary!traversal}

The built-in function {\tt getattr} takes an object and an attribute
name (as a string) and returns the attribute's value.
\index{getattr function}
\index{function!getattr}
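
Here is a short sketch of these tools working together. The
{\tt Point} class below is an assumed version with an init method, and
the extra attribute {\tt z} is invented to show the kind of mismatch
described above.

```python
class Point:
    """Represents a point in 2-D space."""

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y


def print_attributes(obj):
    for attr in vars(obj):
        print(attr, getattr(obj, attr))


p = Point(3, 4)
q = Point(1, 2)
p.z = 5    # legal, but now p and q have different attributes

print_attributes(p)
print(hasattr(p, 'z'))         # p has the extra attribute
print(hasattr(q, 'z'))         # q does not
print(getattr(q, 'z', None))   # optional third argument: value if missing
```

Without the third argument, {\tt getattr(q, 'z')} would raise an
{\tt AttributeError}, just as {\tt q.z} does.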


\section{Interface and implementation}

One of the goals of object-oriented design is to make software more
maintainable, which means that you can keep the program working when
other parts of the system change, and modify the program to meet new
requirements.
\index{interface}
\index{implementation}
\index{maintainable}
\index{object-oriented design}

A design principle that helps achieve that goal is to keep
interfaces separate from implementations. For objects, that means
that the methods a class provides should not depend on how the
attributes are represented.
\index{attribute}

For example, in this chapter we developed a class that represents
a time of day. Methods provided by this class include
\verb"time_to_int", \verb"is_after", and \verb"add_time".

We could implement those methods in several ways. The details of the
implementation depend on how we represent time. In this chapter, the
attributes of a {\tt Time} object are {\tt hour}, {\tt minute}, and
{\tt second}.

As an alternative, we could replace these attributes with
a single integer representing the number of seconds
since midnight. This implementation would make some methods,
like \verb"is_after", easier to write, but it would make other methods
harder.

After you deploy a new class, you might discover a better
implementation. If other parts of the program are using your
class, it might be time-consuming and error-prone to change the
interface.

But if you designed the interface carefully, you can
change the implementation without changing the interface, which
means that other parts of the program don't have to change.
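
As one possible sketch of that idea (an illustration, not the book's
solution), here is a {\tt Time} class that keeps the same method names
but stores a single attribute, here called {\tt seconds}. The
attribute name is an assumption.

```python
class Time:
    """Represents the time of day, stored as seconds since midnight."""

    def __init__(self, hour=0, minute=0, second=0):
        # Same parameters as before; only the stored attribute differs.
        self.seconds = hour * 3600 + minute * 60 + second

    def __str__(self):
        # Harder than before: the fields have to be recomputed.
        minutes, second = divmod(self.seconds, 60)
        hour, minute = divmod(minutes, 60)
        return '%.2d:%.2d:%.2d' % (hour, minute, second)

    def time_to_int(self):
        return self.seconds

    def is_after(self, other):
        # Easier than before: a single integer comparison.
        return self.seconds > other.seconds

    def add_time(self, other):
        return Time(second=self.seconds + other.seconds)


start = Time(9, 45)
duration = Time(1, 35)
end = start.add_time(duration)
print(end)
print(end.is_after(start))
```

Code that only calls the methods, as the last four lines do, runs
unchanged with either representation.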


\section{Glossary}

\begin{description}

\item[object-oriented language:] A language that provides features,
such as programmer-defined types and methods, that facilitate
object-oriented programming.
\index{object-oriented language}

\item[object-oriented programming:] A style of programming in which
data and the operations that manipulate it are organized into classes
and methods.
\index{object-oriented programming}

\item[method:] A function that is defined inside a class definition and
is invoked on instances of that class.
\index{method}

\item[subject:] The object a method is invoked on.
\index{subject}

\item[positional argument:] An argument that does not include
a parameter name, so it is not a keyword argument.
\index{positional argument}
\index{argument!positional}

\item[operator overloading:] Changing the behavior of an operator like
{\tt +} so it works with a programmer-defined type.
\index{overloading}
\index{operator!overloading}

\item[type-based dispatch:] A programming pattern that checks the type
of an operand and invokes different functions for different types.
\index{type-based dispatch}

\item[polymorphic:] Pertaining to a function that can work with more
than one type.
\index{polymorphism}

\end{description}


\section{Exercises}

\begin{exercise}

Download the code from this chapter from
\url{http://thinkpython2.com/code/Time2.py}. Change the attributes of
{\tt Time} to be a single integer representing seconds since
midnight. Then modify the methods (and the function
\verb"int_to_time") to work with the new implementation. You
should not have to modify the test code in {\tt main}. When you
are done, the output should be the same as before. Solution:
\url{http://thinkpython2.com/code/Time2_soln.py}.

\end{exercise}


\begin{exercise}
\label{kangaroo}
\index{default value!avoiding mutable}
\index{mutable object, as default value}
\index{worst bug}
\index{bug!worst}
\index{Kangaroo class}
\index{class!Kangaroo}

This exercise is a cautionary tale about one of the most
common, and difficult to find, errors in Python.
Write a definition for a class named {\tt Kangaroo} with the following
methods:

\begin{enumerate}

\item An \verb"__init__" method that initializes an attribute named
\verb"pouch_contents" to an empty list.

\item A method named \verb"put_in_pouch" that takes an object
of any type and adds it to \verb"pouch_contents".

\item A \verb"__str__" method that returns a string representation
of the Kangaroo object and the contents of the pouch.

\end{enumerate}
%
Test your code
by creating two {\tt Kangaroo} objects, assigning them to variables
named {\tt kanga} and {\tt roo}, and then adding {\tt roo} to the
contents of {\tt kanga}'s pouch.

Download \url{http://thinkpython2.com/code/BadKangaroo.py}. It contains
a solution to the previous problem with one big, nasty bug.
Find and fix the bug.

If you get stuck, you can download
\url{http://thinkpython2.com/code/GoodKangaroo.py}, which explains the
problem and demonstrates a solution.
\index{aliasing}
\index{embedded object}
\index{object!embedded}

\end{exercise}



\chapter{Inheritance}

The language feature most often associated with object-oriented
programming is {\bf inheritance}. Inheritance is the ability to
define a new class that is a modified version of an existing class.
In this chapter I demonstrate inheritance using classes that represent
playing cards, decks of cards, and poker hands.
\index{deck}
\index{card, playing}
\index{poker}

If you don't play
poker, you can read about it at
\url{http://en.wikipedia.org/wiki/Poker}, but you don't have to; I'll
tell you what you need to know for the exercises.

Code examples from
this chapter are available from
\url{http://thinkpython2.com/code/Card.py}.


\section{Card objects}

There are fifty-two cards in a deck, each of which belongs to one of
four suits and one of thirteen ranks. The suits are Spades, Hearts,
Diamonds, and Clubs (in descending order in bridge). The ranks are
Ace, 2, 3, 4, 5, 6, 7, 8, 9, 10, Jack, Queen, and King. Depending on
the game that you are playing, an Ace may be higher than King
or lower than 2.
\index{rank}
\index{suit}

If we want to define a new object to represent a playing card, it is
obvious what the attributes should be: {\tt rank} and
{\tt suit}. It is not as obvious what type the attributes
should be. One possibility is to use strings containing words like
\verb"'Spade'" for suits and \verb"'Queen'" for ranks. One problem with
this implementation is that it would not be easy to compare cards to
see which had a higher rank or suit.
\index{encode}
\index{encrypt}
\index{map to}
\index{representation}

An alternative is to use integers to {\bf encode} the ranks and suits.
In this context, ``encode'' means that we are going to define a mapping
between numbers and suits, or between numbers and ranks. This
kind of encoding is not meant to be a secret (that
would be ``encryption'').

\newcommand{\mymapsto}{$\mapsto$}

For example, this table shows the suits and the corresponding integer
codes:

\begin{tabular}{l c l}
Spades & \mymapsto & 3 \\
Hearts & \mymapsto & 2 \\
Diamonds & \mymapsto & 1 \\
Clubs & \mymapsto & 0
\end{tabular}

This code makes it easy to compare cards; because higher suits map to
higher numbers, we can compare suits by comparing their codes.

The mapping for ranks is fairly obvious; each of the numerical ranks
maps to the corresponding integer, and for face cards:

\begin{tabular}{l c l}
Jack & \mymapsto & 11 \\
Queen & \mymapsto & 12 \\
King & \mymapsto & 13
\end{tabular}

I am using the \mymapsto~symbol to make it clear that these mappings
are not part of the Python program. They are part of the program
design, but they don't appear explicitly in the code.
\index{Card class}
\index{class!Card}

The class definition for {\tt Card} looks like this:

\begin{verbatim}
class Card:
    """Represents a standard playing card."""

    def __init__(self, suit=0, rank=2):
        self.suit = suit
        self.rank = rank
\end{verbatim}
%
As usual, the init method takes an optional
parameter for each attribute. The default card is
the 2 of Clubs.
\index{init method}
\index{method!init}

To create a Card, you call {\tt Card} with the
suit and rank of the card you want.

\begin{verbatim}
queen_of_diamonds = Card(1, 12)
\end{verbatim}
%


\section{Class attributes}
\label{class.attribute}
\index{class attribute}
\index{attribute!class}

In order to print Card objects in a way that people can easily
read, we need a mapping from the integer codes to the corresponding
ranks and suits. A natural way to
do that is with lists of strings. We assign these lists to {\bf class
attributes}:

\begin{verbatim}
# inside class Card:

    suit_names = ['Clubs', 'Diamonds', 'Hearts', 'Spades']
    rank_names = [None, 'Ace', '2', '3', '4', '5', '6', '7',
                  '8', '9', '10', 'Jack', 'Queen', 'King']

    def __str__(self):
        return '%s of %s' % (Card.rank_names[self.rank],
                             Card.suit_names[self.suit])
\end{verbatim}
%
Variables like \verb"suit_names" and \verb"rank_names", which are
defined inside a class but outside of any method, are called
class attributes because they are associated with the class object
{\tt Card}.
\index{instance attribute}
\index{attribute!instance}

This term distinguishes them from variables like {\tt suit} and {\tt
rank}, which are called {\bf instance attributes} because they are
associated with a particular instance.
\index{dot notation}

Both kinds of attribute are accessed using dot notation. For
example, in \verb"__str__", {\tt self} is a Card object,
and {\tt self.rank} is its rank. Similarly, {\tt Card}
is a class object, and \verb"Card.rank_names" is a
list of strings associated with the class.

Every card has its own {\tt suit} and {\tt rank}, but there
is only one copy of \verb"suit_names" and \verb"rank_names".

Putting it all together, the expression
\verb"Card.rank_names[self.rank]" means ``use the attribute {\tt rank}
from the object {\tt self} as an index into the list \verb"rank_names"
from the class {\tt Card}, and select the appropriate string.''

The first element of \verb"rank_names" is {\tt None} because there
is no card with rank zero. By including {\tt None} as a place-keeper,
we get a mapping with the nice property that the index 2 maps to the
string \verb"'2'", and so on. To avoid this tweak, we could have
used a dictionary instead of a list.
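
A sketch of the dictionary alternative follows. It is one possible
way to build the mapping, not the version used in the rest of the
chapter. The last line also checks that instances share a single copy
of the class attribute.

```python
class Card:
    """Represents a standard playing card."""

    suit_names = ['Clubs', 'Diamonds', 'Hearts', 'Spades']

    # A dictionary needs no place-keeper for the unused rank 0.
    rank_names = {1: 'Ace', 11: 'Jack', 12: 'Queen', 13: 'King'}
    for n in range(2, 11):
        rank_names[n] = str(n)
    del n  # keep the loop variable from becoming a class attribute

    def __init__(self, suit=0, rank=2):
        self.suit = suit
        self.rank = rank

    def __str__(self):
        return '%s of %s' % (Card.rank_names[self.rank],
                             Card.suit_names[self.suit])


card1 = Card(2, 11)
card2 = Card()
print(card1)
print(card2)

# Both instances find the same class attribute; there is only one copy.
print(card1.rank_names is card2.rank_names is Card.rank_names)
```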

With the methods we have so far, we can create and print cards:

\begin{verbatim}
>>> card1 = Card(2, 11)
>>> print(card1)
Jack of Hearts
\end{verbatim}

\begin{figure}
\centerline
{\includegraphics[scale=0.8]{figs/card1.pdf}}
\caption{Object diagram.}
\label{fig.card1}
\end{figure}

Figure~\ref{fig.card1} is a diagram of the {\tt Card} class object and
one Card instance. {\tt Card} is a class object; its type is {\tt
type}. {\tt card1} is an instance of {\tt Card}, so its type is
{\tt Card}. To save space, I didn't draw the contents of
\verb"suit_names" and \verb"rank_names". \index{state diagram}
\index{diagram!state} \index{object diagram} \index{diagram!object}


\section{Comparing cards}
\label{comparecard}
\index{operator!relational}
\index{relational operator}

For built-in types, there are relational operators
({\tt <}, {\tt >}, {\tt ==}, etc.)
that compare
values and determine when one is greater than, less than, or equal to
another. For programmer-defined types, we can override the behavior of
the built-in operators by providing a method named
\verb"__lt__", which stands for ``less than''.
\index{programmer-defined type}
\index{type!programmer-defined}

\verb"__lt__" takes two parameters, {\tt self} and {\tt other},
and returns {\tt True} if {\tt self} is strictly less than {\tt other}.
\index{override}
\index{operator overloading}

The correct ordering for cards is not obvious.
For example, which
is better, the 3 of Clubs or the 2 of Diamonds? One has a higher
rank, but the other has a higher suit. In order to compare
cards, you have to decide whether rank or suit is more important.

The answer might depend on what game you are playing, but to keep
things simple, we'll make the arbitrary choice that suit is more
important, so all of the Spades outrank all of the Diamonds,
and so on.
\index{lt method@\_\_lt\_\_ method}
\index{method!\_\_lt\_\_}

With that decided, we can write \verb"__lt__":

\begin{verbatim}
# inside class Card:

    def __lt__(self, other):
        # check the suits
        if self.suit < other.suit: return True
        if self.suit > other.suit: return False

        # suits are the same... check ranks
        return self.rank < other.rank
\end{verbatim}
%
You can write this more concisely using tuple comparison:
\index{tuple!comparison}
\index{comparison!tuple}

\begin{verbatim}
# inside class Card:

    def __lt__(self, other):
        t1 = self.suit, self.rank
        t2 = other.suit, other.rank
        return t1 < t2
\end{verbatim}
%
As an exercise, write an \verb"__lt__" method for Time objects. You
can use tuple comparison, but you also might consider
comparing integers.
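
One payoff of defining \verb"__lt__" is that the built-in function
{\tt sorted} can order Card objects, because sorting compares elements
with {\tt <}. The following sketch assembles the pieces of the
{\tt Card} class shown so far. The list {\tt hand} is made up for the
example.

```python
class Card:
    """Represents a standard playing card."""

    suit_names = ['Clubs', 'Diamonds', 'Hearts', 'Spades']
    rank_names = [None, 'Ace', '2', '3', '4', '5', '6', '7',
                  '8', '9', '10', 'Jack', 'Queen', 'King']

    def __init__(self, suit=0, rank=2):
        self.suit = suit
        self.rank = rank

    def __str__(self):
        return '%s of %s' % (Card.rank_names[self.rank],
                             Card.suit_names[self.suit])

    def __lt__(self, other):
        # Tuples compare element by element, so suit is checked first.
        t1 = self.suit, self.rank
        t2 = other.suit, other.rank
        return t1 < t2


hand = [Card(1, 12), Card(3, 1), Card(1, 2), Card(0, 13)]

for card in sorted(hand):
    print(card)

# Suit is more important than rank: 3 of Clubs < 2 of Diamonds.
print(Card(0, 3) < Card(1, 2))
```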


\section{Decks}
\index{list!of objects}
\index{deck, playing cards}

Now that we have Cards, the next step is to define Decks. Since a
deck is made up of cards, it is natural for each Deck to contain a
list of cards as an attribute.
\index{init method}
\index{method!init}

The following is a class definition for {\tt Deck}. The
init method creates the attribute {\tt cards} and generates
the standard set of fifty-two cards:
\index{composition}
\index{loop!nested}
\index{Deck class}
\index{class!Deck}

\begin{verbatim}
class Deck:

    def __init__(self):
        self.cards = []
        for suit in range(4):
            for rank in range(1, 14):
                card = Card(suit, rank)
                self.cards.append(card)
\end{verbatim}
%
The easiest way to populate the deck is with a nested loop. The outer
loop enumerates the suits from 0 to 3. The inner loop enumerates the
ranks from 1 to 13. Each iteration
creates a new Card with the current suit and rank,
and appends it to {\tt self.cards}.
\index{append method}
\index{method!append}


\section{Printing the deck}
\label{printdeck}
\index{str method@\_\_str\_\_ method}
\index{method!\_\_str\_\_}

Here is a \verb"__str__" method for {\tt Deck}:

\begin{verbatim}
# inside class Deck:

    def __str__(self):
        res = []
        for card in self.cards:
            res.append(str(card))
        return '\n'.join(res)
\end{verbatim}
%
This method demonstrates an efficient way to accumulate a large
string: building a list of strings and then using the string method
{\tt join}. The built-in function {\tt str} invokes the
\verb"__str__" method on each card and returns the string
representation. \index{accumulator!string} \index{string!accumulator}
\index{join method} \index{method!join} \index{newline}

Since we invoke {\tt join} on a newline character, the cards
are separated by newlines. Here's what the result looks like:

\begin{verbatim}
>>> deck = Deck()
>>> print(deck)
Ace of Clubs
2 of Clubs
3 of Clubs
...
10 of Spades
Jack of Spades
Queen of Spades
King of Spades
\end{verbatim}
%
Even though the result appears on 52 lines, it is
one long string that contains newlines.
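
To see that the result really is a single string, here is a small self-contained sketch. The {\tt Card} class is a simplified stand-in written for this example (its name lists are assumed to match the ones used earlier in the chapter).

```python
class Card:
    """Stand-in Card with a simple __str__ (assumed for this sketch)."""

    suit_names = ['Clubs', 'Diamonds', 'Hearts', 'Spades']
    rank_names = [None, 'Ace', '2', '3', '4', '5', '6', '7',
                  '8', '9', '10', 'Jack', 'Queen', 'King']

    def __init__(self, suit=0, rank=2):
        self.suit = suit
        self.rank = rank

    def __str__(self):
        return '%s of %s' % (Card.rank_names[self.rank],
                             Card.suit_names[self.suit])


class Deck:

    def __init__(self):
        self.cards = []
        for suit in range(4):
            for rank in range(1, 14):
                self.cards.append(Card(suit, rank))

    def __str__(self):
        res = []
        for card in self.cards:
            res.append(str(card))
        return '\n'.join(res)


s = str(Deck())
lines = s.splitlines()
print(type(s).__name__)       # one str object
print(s.count('\n'))          # join puts newlines between items only
print(lines[0], '/', lines[-1])
```

Because {\tt join} inserts the separator only between items, 52 cards produce 51 newlines and no trailing newline.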


\section{Add, remove, shuffle and sort}

To deal cards, we would like a method that
removes a card from the deck and returns it.
The list method {\tt pop} provides a convenient way to do that:
\index{pop method}
\index{method!pop}

\begin{verbatim}
# inside class Deck:

    def pop_card(self):
        return self.cards.pop()
\end{verbatim}
%
Since {\tt pop} removes the {\em last} card in the list, we are
dealing from the bottom of the deck.
\index{append method}
\index{method!append}

To add a card, we can use the list method {\tt append}:

\begin{verbatim}
# inside class Deck:

    def add_card(self, card):
        self.cards.append(card)
\end{verbatim}
%
A method like this that uses another method without doing
much work is sometimes called a {\bf veneer}. The metaphor
comes from woodworking, where a veneer is a thin
layer of good quality wood glued to the surface of a cheaper piece of
wood to improve the appearance.
\index{veneer}

In this case \verb"add_card" is a ``thin'' method that expresses
a list operation in terms appropriate for decks. It
improves the appearance, or interface, of the
implementation.

As another example, we can write a Deck method named {\tt shuffle}
using the function {\tt shuffle} from the {\tt random} module:
\index{random module}
\index{module!random}
\index{shuffle function}
\index{function!shuffle}

\begin{verbatim}
# inside class Deck:

    def shuffle(self):
        random.shuffle(self.cards)
\end{verbatim}
%
Don't forget to import {\tt random}.
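
Here is a minimal sketch that puts the three veneer methods together. To keep it short, the deck holds plain {\tt (suit, rank)} tuples instead of Card objects; that substitution is an assumption of this example only.

```python
import random


class Deck:
    """Deck of (suit, rank) tuples; tuples stand in for Card objects."""

    def __init__(self):
        self.cards = []
        for suit in range(4):
            for rank in range(1, 14):
                self.cards.append((suit, rank))

    def pop_card(self):
        return self.cards.pop()          # removes and returns the last card

    def add_card(self, card):
        self.cards.append(card)

    def shuffle(self):
        random.shuffle(self.cards)       # reorders the list in place


deck = Deck()
bottom = deck.pop_card()
print(bottom, len(deck.cards))
deck.add_card(bottom)
deck.shuffle()
print(len(deck.cards))
```

Note that {\tt random.shuffle} modifies the list and returns {\tt None}, so the method has nothing to return; shuffling changes the order of the cards but not which cards are in the deck.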

As an exercise, write a Deck method named {\tt sort} that uses the
list method {\tt sort} to sort the cards in a {\tt Deck}. {\tt sort}
uses the \verb"__lt__" method we defined to determine the order.
\index{sort method} \index{method!sort}



\section{Inheritance}
\index{inheritance}
\index{object-oriented programming}

Inheritance is the ability to define a new class that is a modified
version of an existing class. As an example, let's say we want a
class to represent a ``hand'', that is, the cards held by one player.
A hand is similar to a deck: both are made up of a collection of
cards, and both require operations like adding and removing cards.

A hand is also different from a deck; there are operations we want for
hands that don't make sense for a deck. For example, in poker we
might compare two hands to see which one wins. In bridge, we might
compute a score for a hand in order to make a bid.

This relationship between classes---similar, but different---lends
itself to inheritance.
To define a new class that inherits from an existing class,
you put the name of the existing class in parentheses:
\index{parentheses!parent class in}
\index{parent class}
\index{class!parent}
\index{Hand class}
\index{class!Hand}

\begin{verbatim}
class Hand(Deck):
    """Represents a hand of playing cards."""
\end{verbatim}
%
This definition indicates that {\tt Hand} inherits from {\tt Deck};
that means we can use methods like \verb"pop_card" and \verb"add_card"
for Hands as well as Decks.

When a new class inherits from an existing one, the existing
one is called the {\bf parent} and the new class is
called the {\bf child}.
\index{parent class}
\index{child class}
\index{class!child}

In this example, {\tt Hand} inherits \verb"__init__" from {\tt Deck},
but it doesn't really do what we want: instead of populating the hand
with 52 new cards, the init method for Hands should initialize {\tt
cards} with an empty list. \index{override} \index{init method}
\index{method!init}

If we provide an init method in the {\tt Hand} class, it overrides the
one in the {\tt Deck} class:

\begin{verbatim}
# inside class Hand:

    def __init__(self, label=''):
        self.cards = []
        self.label = label
\end{verbatim}
%
When you create a Hand, Python invokes this init method, not the
one in {\tt Deck}.

\begin{verbatim}
>>> hand = Hand('new hand')
>>> hand.cards
[]
>>> hand.label
'new hand'
\end{verbatim}
%
The other methods are inherited from {\tt Deck}, so we can use
\verb"pop_card" and \verb"add_card" to deal a card:

\begin{verbatim}
>>> deck = Deck()
>>> card = deck.pop_card()
>>> hand.add_card(card)
>>> print(hand)
King of Spades
\end{verbatim}
%
A natural next step is to encapsulate this code in a method
called \verb"move_cards":
\index{encapsulation}

\begin{verbatim}
# inside class Deck:

    def move_cards(self, hand, num):
        for i in range(num):
            hand.add_card(self.pop_card())
\end{verbatim}
%
\verb"move_cards" takes two arguments, a Hand object and the number of
cards to deal. It modifies both {\tt self} and {\tt hand}, and
returns {\tt None}.

In some games, cards are moved from one hand to another,
or from a hand back to the deck. You can use \verb"move_cards"
for any of these operations: {\tt self} can be either a Deck
or a Hand, and {\tt hand}, despite the name, can also be a {\tt Deck}.
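
The following sketch exercises all three directions: deck to hand, hand to hand, and hand back to deck. As a simplifying assumption, cards are {\tt (suit, rank)} tuples rather than Card objects, and the labels {\tt 'alice'} and {\tt 'bob'} are made up for the example.

```python
class Deck:
    """Deck of (suit, rank) tuples; tuples stand in for Card objects."""

    def __init__(self):
        self.cards = []
        for suit in range(4):
            for rank in range(1, 14):
                self.cards.append((suit, rank))

    def pop_card(self):
        return self.cards.pop()

    def add_card(self, card):
        self.cards.append(card)

    def move_cards(self, hand, num):
        for i in range(num):
            hand.add_card(self.pop_card())


class Hand(Deck):
    """Represents a hand of playing cards."""

    def __init__(self, label=''):
        self.cards = []
        self.label = label


deck = Deck()
alice = Hand('alice')
bob = Hand('bob')

deck.move_cards(alice, 5)     # deck -> hand
alice.move_cards(bob, 2)      # hand -> hand
bob.move_cards(deck, 1)       # hand -> deck

print(len(deck.cards), len(alice.cards), len(bob.cards))
```

{\tt Hand} defines only \verb"__init__"; everything else, including \verb"move_cards", is inherited from {\tt Deck}, and the total number of cards stays at 52 throughout.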

Inheritance is a useful feature. Some programs that would be
repetitive without inheritance can be written more elegantly
with it. Inheritance can facilitate code reuse, since you can
customize the behavior of parent classes without having to modify
them. In some cases, the inheritance structure reflects the natural
structure of the problem, which makes the design easier to
understand.

On the other hand, inheritance can make programs difficult to read.
When a method is invoked, it is sometimes not clear where to find its
definition. The relevant code may be spread across several modules.
Also, many of the things that can be done using inheritance can be
done as well or better without it.


\section{Class diagrams}
\label{class.diagram}

So far we have seen stack diagrams, which show the state of
a program, and object diagrams, which show the attributes
of an object and their values. These diagrams represent a snapshot
in the execution of a program, so they change as the program
runs.

They are also highly detailed; for some purposes, too
detailed. A class diagram is a more abstract representation
of the structure of a program. Instead of showing individual
objects, it shows classes and the relationships between them.

There are several kinds of relationship between classes:

\begin{itemize}

\item Objects in one class might contain references to objects
in another class. For example, each Rectangle contains a reference
to a Point, and each Deck contains references to many Cards.
This kind of relationship is called {\bf HAS-A}, as in, ``a Rectangle
has a Point.''

\item One class might inherit from another. This relationship
is called {\bf IS-A}, as in, ``a Hand is a kind of a Deck.''

\item One class might depend on another in the sense that objects
in one class take objects in the second class as parameters, or
use objects in the second class as part of a computation. This
kind of relationship is called a {\bf dependency}.

\end{itemize}
\index{IS-A relationship}
\index{HAS-A relationship}
\index{class diagram}
\index{diagram!class}
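
The three relationships listed above can be seen side by side in a short sketch. {\tt Card} is a minimal stand-in, and {\tt Dealer} is a hypothetical class invented here only to illustrate a dependency; neither is taken from the book's code.

```python
class Card:
    """Minimal stand-in for the chapter's Card class."""

    def __init__(self, suit=0, rank=2):
        self.suit = suit
        self.rank = rank


class Deck:

    def __init__(self):
        # HAS-A: a Deck stores references to many Cards
        self.cards = []
        for suit in range(4):
            for rank in range(1, 14):
                self.cards.append(Card(suit, rank))


class Hand(Deck):
    # IS-A: a Hand is a kind of Deck

    def __init__(self, label=''):
        self.cards = []
        self.label = label


class Dealer:
    """Hypothetical class: it takes Deck and Hand objects as parameters
    but stores neither, so it only depends on them."""

    def deal(self, deck, hand, num):
        for i in range(num):
            hand.cards.append(deck.cards.pop())


deck = Deck()
hand = Hand('example')
Dealer().deal(deck, hand, 5)

print(isinstance(hand, Deck))             # IS-A
print(isinstance(deck.cards[0], Card))    # HAS-A
print(len(deck.cards), len(hand.cards))   # result of the dependency in action
```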

A {\bf class diagram} is a graphical representation of these
relationships. For example, Figure~\ref{fig.class1} shows the
relationships between {\tt Card}, {\tt Deck} and {\tt Hand}.

\begin{figure}
\centerline
{\includegraphics[scale=0.8]{figs/class1.pdf}}
\caption{Class diagram.}
\label{fig.class1}
\end{figure}

The arrow with a hollow triangle head represents an IS-A
relationship; in this case it indicates that Hand inherits
from Deck.

The standard arrow head represents a HAS-A
relationship; in this case a Deck has references to Card
objects.
\index{multiplicity (in class diagram)}

The star ({\tt *}) near the arrow head is a
{\bf multiplicity}; it indicates how many Cards a Deck has.
A multiplicity can be a simple number, like {\tt 52}, a range,
like {\tt 5..7} or a star, which indicates that a Deck can
have any number of Cards.

There are no dependencies in this diagram. They would normally
be shown with a dashed arrow. Or if there are a lot of
dependencies, they are sometimes omitted.

A more detailed diagram might show that a Deck actually
contains a {\em list} of Cards, but built-in types
like list and dict are usually not included in class diagrams.


\section{Debugging}
\index{debugging}

Inheritance can make debugging difficult because when you invoke a
method on an object, it might be hard to figure out which method will
be invoked.
\index{inheritance}

Suppose you are writing a function that works with Hand objects.
You would like it to work with all kinds of Hands, like
PokerHands, BridgeHands, etc. If you invoke a method like
{\tt shuffle}, you might get the one defined in {\tt Deck},
but if any of the subclasses override this method, you'll
get that version instead. This behavior is usually a good
thing, but it can be confusing.

Any time you are unsure about the flow of execution through your
program, the simplest solution is to add print statements at the
beginning of the relevant methods. If {\tt Deck.shuffle} prints a
message that says something like {\tt Running Deck.shuffle}, then as
the program runs it traces the flow of execution.
\index{flow of execution}

As an alternative, you could use this function, which takes an
object and a method name (as a string) and returns the class that
provides the definition of the method:

\begin{verbatim}
def find_defining_class(obj, meth_name):
    for ty in type(obj).mro():
        if meth_name in ty.__dict__:
            return ty
\end{verbatim}
%
Here's an example:

\begin{verbatim}
>>> hand = Hand()
>>> find_defining_class(hand, 'shuffle')
<class '__main__.Deck'>
\end{verbatim}
%
So the {\tt shuffle} method for this Hand is the one in {\tt Deck}.
\index{mro method}
\index{method!mro}
\index{method resolution order}

\verb"find_defining_class" uses the {\tt mro} method to get the list
of class objects (types) that will be searched for methods. ``MRO''
stands for ``method resolution order'', which is the sequence of
classes Python searches to ``resolve'' a method name.
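
To make the search order concrete, here is a small sketch with a three-level hierarchy. The classes are bare-bones stand-ins, and the overriding {\tt PokerHand.shuffle} is a hypothetical method added only to show how the answer changes.

```python
class Deck:
    def shuffle(self):
        print('Running Deck.shuffle')


class Hand(Deck):
    pass


class PokerHand(Hand):
    """Hypothetical subclass that overrides shuffle."""

    def shuffle(self):
        print('Running PokerHand.shuffle')


def find_defining_class(obj, meth_name):
    for ty in type(obj).mro():
        if meth_name in ty.__dict__:
            return ty


# The MRO lists the classes in the order Python searches them.
print([ty.__name__ for ty in PokerHand.mro()])

print(find_defining_class(Hand(), 'shuffle').__name__)
print(find_defining_class(PokerHand(), 'shuffle').__name__)

# If no class in the MRO defines the name, the loop ends and
# the function returns None.
print(find_defining_class(Hand(), 'no_such_method'))
```

The search starts at the object's own class and moves toward {\tt object}, so the first class that defines the name wins.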

Here's a design suggestion: when you override a method,
the interface of the new method should be the same as the old. It
should take the same parameters, return the same type, and obey the
same preconditions and postconditions. If you follow this rule, you
will find that any function designed to work with an instance of a
parent class, like a Deck, will also work with instances of child
classes like a Hand and PokerHand.
\index{override}
\index{interface}
\index{precondition}
\index{postcondition}

If you violate this rule, which is called the ``Liskov substitution
principle'', your code will collapse like (sorry) a house of cards.
\index{Liskov substitution principle}
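
As a rough illustration of what goes wrong, consider the sketch below. Everything in it is invented for the example: integers stand in for cards, {\tt Hand} takes an optional list of cards, and {\tt BadHand} is a hypothetical child that changes the parameters of \verb"pop_card".

```python
class Deck:
    def __init__(self):
        self.cards = list(range(52))     # integers stand in for Cards

    def pop_card(self):
        return self.cards.pop()


class Hand(Deck):
    """Keeps the parent's pop_card interface."""

    def __init__(self, cards=None):
        if cards is None:
            cards = []
        self.cards = cards


class BadHand(Hand):
    """Hypothetical child that changes the interface of pop_card."""

    def pop_card(self, index):           # extra required parameter
        return self.cards.pop(index)


def deal_one(deck):
    """Written for Decks; relies only on the Deck interface."""
    return deck.pop_card()


print(deal_one(Deck()))
print(deal_one(Hand([7, 8, 9])))
try:
    deal_one(BadHand([7, 8, 9]))
except TypeError as err:
    print('BadHand breaks deal_one:', type(err).__name__)
```

{\tt deal\_one} works for {\tt Deck} and {\tt Hand} because both accept the same call; {\tt BadHand} fails at the call site even though the function itself did not change.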


\section{Data encapsulation}

The previous chapters demonstrate a development plan we might call
``object-oriented design''. We identified objects we needed---like
{\tt Point}, {\tt Rectangle} and {\tt Time}---and defined classes to
represent them. In each case there is an obvious correspondence
between the object and some entity in the real world (or at least a
mathematical world).
\index{development plan!data encapsulation}

But sometimes it is less obvious what objects you need
and how they should interact. In that case you need a different
development plan. In the same way that we discovered function
interfaces by encapsulation and generalization, we can discover
class interfaces by {\bf data encapsulation}.
\index{data encapsulation}

Markov analysis, from Section~\ref{markov}, provides a good example.
If you download my code from \url{http://thinkpython2.com/code/markov.py},
you'll see that it uses two global variables---\verb"suffix_map" and
\verb"prefix"---that are read and written from several functions.

\begin{verbatim}
suffix_map = {}
prefix = ()
\end{verbatim}

Because these variables are global, we can only run one analysis at a
time. If we read two texts, their prefixes and suffixes would be
added to the same data structures (which makes for some interesting
generated text).

To run multiple analyses, and keep them separate, we can encapsulate
the state of each analysis in an object.
Here's what that looks like:

\begin{verbatim}
class Markov:

    def __init__(self):
        self.suffix_map = {}
        self.prefix = ()
\end{verbatim}

Next, we transform the functions into methods. For example,
here's \verb"process_word":

\begin{verbatim}
    def process_word(self, word, order=2):
        if len(self.prefix) < order:
            self.prefix += (word,)
            return

        try:
            self.suffix_map[self.prefix].append(word)
        except KeyError:
            # if there is no entry for this prefix, make one
            self.suffix_map[self.prefix] = [word]

        self.prefix = shift(self.prefix, word)
\end{verbatim}

Transforming a program like this---changing the design without
changing the behavior---is another example of refactoring
(see Section~\ref{refactoring}).
\index{refactoring}
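
The payoff of the refactoring is that two analyses no longer share state. The sketch below assembles the pieces shown above into a runnable class and feeds it two tiny made-up word sequences. The helper {\tt shift} is written here on the assumption that it drops the first word of the prefix and appends the new one, as in {\tt markov.py}.

```python
def shift(t, word):
    """Form a new prefix: drop the first word, append the new one.
    Assumed to match the helper of the same name in markov.py."""
    return t[1:] + (word,)


class Markov:

    def __init__(self):
        self.suffix_map = {}
        self.prefix = ()

    def process_word(self, word, order=2):
        if len(self.prefix) < order:
            self.prefix += (word,)
            return

        try:
            self.suffix_map[self.prefix].append(word)
        except KeyError:
            # if there is no entry for this prefix, make one
            self.suffix_map[self.prefix] = [word]

        self.prefix = shift(self.prefix, word)


first = Markov()
for word in 'the cat sat on the cat mat'.split():
    first.process_word(word)

second = Markov()
for word in 'a dog ran'.split():
    second.process_word(word)

print(first.suffix_map[('the', 'cat')])
print(second.suffix_map)
```

Each {\tt Markov} object has its own \verb"suffix_map" and {\tt prefix}, so the words from the second text never show up in the first map.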

This example suggests a development plan for designing objects and
methods:

\begin{enumerate}

\item Start by writing functions that read and write global
variables (when necessary).

\item Once you get the program working, look for associations
between global variables and the functions that use them.

\item Encapsulate related variables as attributes of an object.

\item Transform the associated functions into methods of the new
class.

\end{enumerate}

As an exercise, download my Markov code from
\url{http://thinkpython2.com/code/markov.py}, and follow the steps
described above to encapsulate the global variables as attributes of a
new class called {\tt Markov}. Solution:
\url{http://thinkpython2.com/code/markov2.py}.


\section{Glossary}

\begin{description}

\item[encode:] To represent one set of values using another
set of values by constructing a mapping between them.
\index{encode}

\item[class attribute:] An attribute associated with a class
object. Class attributes are defined inside
a class definition but outside any method.
\index{class attribute}
\index{attribute!class}

\item[instance attribute:] An attribute associated with an
instance of a class.
\index{instance attribute}
\index{attribute!instance}

\item[veneer:] A method or function that provides a different
interface to another function without doing much computation.
\index{veneer}

\item[inheritance:] The ability to define a new class that is a
modified version of a previously defined class.
\index{inheritance}

\item[parent class:] The class from which a child class inherits.
\index{parent class}

\item[child class:] A new class created by inheriting from an
existing class; also called a ``subclass''.
\index{child class}
\index{class!child}

\item[IS-A relationship:] A relationship between a child class
and its parent class.
\index{IS-A relationship}

\item[HAS-A relationship:] A relationship between two classes
where instances of one class contain references to instances of
the other.
\index{HAS-A relationship}

\item[dependency:] A relationship between two classes
where instances of one class use instances of the other class,
but do not store them as attributes.
\index{dependency}

\item[class diagram:] A diagram that shows the classes in a program
and the relationships between them.
\index{class diagram}
\index{diagram!class}

\item[multiplicity:] A notation in a class diagram that shows, for
a HAS-A relationship, how many references there are to instances
of another class.
\index{multiplicity (in class diagram)}

\item[data encapsulation:] A program development plan that
involves a prototype using global variables and a final version
that makes the global variables into instance attributes.
\index{data encapsulation}
\index{development plan!data encapsulation}

\end{description}


\section{Exercises}

\begin{exercise}
For the following program, draw a UML class diagram that shows
these classes and the relationships among them.

\begin{verbatim}
class PingPongParent:
    pass

class Ping(PingPongParent):
    def __init__(self, pong):
        self.pong = pong


class Pong(PingPongParent):
    def __init__(self, pings=None):
        if pings is None:
            self.pings = []
        else:
            self.pings = pings

    def add_ping(self, ping):
        self.pings.append(ping)

pong = Pong()
ping = Ping(pong)
pong.add_ping(ping)
\end{verbatim}

\end{exercise}



\begin{exercise}
Write a Deck method called \verb"deal_hands" that
takes two parameters, the number of hands and the number of cards per
hand. It should create the appropriate number of Hand objects, deal
the appropriate number of cards per hand, and return a list of Hands.
\end{exercise}


\begin{exercise}
\label{poker}

The following are the possible hands in poker, in increasing order
of value and decreasing order of probability:
\index{poker}

\begin{description}

\item[pair:] two cards with the same rank
\vspace{-0.05in}

\item[two pair:] two pairs of cards with the same rank
\vspace{-0.05in}

\item[three of a kind:] three cards with the same rank
\vspace{-0.05in}

\item[straight:] five cards with ranks in sequence (aces can
be high or low, so {\tt Ace-2-3-4-5} is a straight and so is {\tt
10-Jack-Queen-King-Ace}, but {\tt Queen-King-Ace-2-3} is not).
\vspace{-0.05in}

\item[flush:] five cards with the same suit
\vspace{-0.05in}

\item[full house:] three cards with one rank, two cards with another
\vspace{-0.05in}

\item[four of a kind:] four cards with the same rank
\vspace{-0.05in}

\item[straight flush:] five cards in sequence (as defined above) and
with the same suit
\vspace{-0.05in}

\end{description}
%
The goal of these exercises is to estimate
the probability of drawing these various hands.

\begin{enumerate}

\item Download the following files from \url{http://thinkpython2.com/code}:

\begin{description}

\item[{\tt Card.py}]: A complete version of the {\tt Card},
{\tt Deck} and {\tt Hand} classes in this chapter.

\item[{\tt PokerHand.py}]: An incomplete implementation of a class
that represents a poker hand, and some code that tests it.

\end{description}
%
\item If you run {\tt PokerHand.py}, it deals seven 7-card poker hands
and checks to see if any of them contains a flush. Read this
code carefully before you go on.

\item Add methods to {\tt PokerHand.py} named \verb"has_pair",
\verb"has_twopair", etc. that return True or False according to
whether or not the hand meets the relevant criteria. Your code should
work correctly for ``hands'' that contain any number of cards
(although 5 and 7 are the most common sizes).

\item Write a method named {\tt classify} that figures out
the highest-value classification for a hand and sets the
{\tt label} attribute accordingly. For example, a 7-card hand
might contain a flush and a pair; it should be labeled ``flush''.

\item When you are convinced that your classification methods are
working, the next step is to estimate the probabilities of the various
hands. Write a function in {\tt PokerHand.py} that shuffles a deck of
cards, divides it into hands, classifies the hands, and counts the
number of times various classifications appear.

\item Print a table of the classifications and their probabilities.
Run your program with larger and larger numbers of hands until the
output values converge to a reasonable degree of accuracy. Compare
your results to the values at \url{http://en.wikipedia.org/wiki/Hand_rankings}.

\end{enumerate}

Solution: \url{http://thinkpython2.com/code/PokerHandSoln.py}.
\end{exercise}


\chapter{The Goodies}

One of my goals for this book has been to teach you as little Python
as possible. When there were two ways to do something, I picked
one and avoided mentioning the other. Or sometimes I put the second
one into an exercise.

Now I want to go back for some of the good bits that got left behind.
Python provides a number of features that are not really necessary---you
can write good code without them---but with them you can sometimes
write code that's more concise, readable or efficient, and sometimes
all three.

% TODO: add the with statement

\section{Conditional expressions}

We saw conditional statements in Section~\ref{conditional.execution}.
Conditional statements are often used to choose one of two values;
for example:
\index{conditional expression}
\index{expression!conditional}

\begin{verbatim}
if x > 0:
    y = math.log(x)
else:
    y = float('nan')
\end{verbatim}

This statement checks whether {\tt x} is positive. If so, it computes
{\tt math.log(x)}. If not, {\tt math.log} would raise a ValueError. To
avoid stopping the program, we generate a ``NaN'', which is a special
floating-point value that represents ``Not a Number''.
\index{NaN}
\index{floating-point}

We can write this statement more concisely using a {\bf conditional
expression}:

\begin{verbatim}
y = math.log(x) if x > 0 else float('nan')
\end{verbatim}

You can almost read this line like English: ``{\tt y} gets log-{\tt x}
if {\tt x} is greater than 0; otherwise it gets NaN''.

Recursive functions can sometimes be rewritten using conditional
expressions. For example, here is a recursive version of {\tt factorial}:
\index{factorial}
\index{function!factorial}

\begin{verbatim}
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)
\end{verbatim}

We can rewrite it like this:

\begin{verbatim}
def factorial(n):
    return 1 if n == 0 else n * factorial(n-1)
\end{verbatim}

Another use of conditional expressions is handling optional
arguments. For example, here is the init method from
{\tt GoodKangaroo} (see Exercise~\ref{kangaroo}):
\index{optional argument}
\index{argument!optional}

\begin{verbatim}
    def __init__(self, name, contents=None):
        self.name = name
        if contents is None:
            contents = []
        self.pouch_contents = contents
\end{verbatim}

We can rewrite this one like this:

\begin{verbatim}
    def __init__(self, name, contents=None):
        self.name = name
        self.pouch_contents = [] if contents is None else contents
\end{verbatim}

In general, you can replace a conditional statement with a conditional
expression if both branches contain simple expressions that are
either returned or assigned to the same variable.
\index{conditional statement}
\index{statement!conditional}
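
Here is a short runnable sketch of the first two rewrites. The wrapper name \verb"safe_log" is hypothetical, introduced only so the expression can be called with different values.

```python
import math


def safe_log(x):
    """Hypothetical wrapper around the conditional expression."""
    return math.log(x) if x > 0 else float('nan')


def factorial(n):
    return 1 if n == 0 else n * factorial(n-1)


print(safe_log(1))
# NaN is not equal to anything, itself included, so test it with isnan.
print(math.isnan(safe_log(-1)))
print(factorial(5))
```

Only the selected branch is evaluated: when {\tt x} is not positive, {\tt math.log(x)} is never called, which is why no ValueError is raised.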



\section{List comprehensions}

In Section~\ref{filter} we saw the map and filter patterns. For
example, this function takes a list of strings, maps the string method
{\tt capitalize} to the elements, and returns a new list of strings:

\begin{verbatim}
def capitalize_all(t):
    res = []
    for s in t:
        res.append(s.capitalize())
    return res
\end{verbatim}

We can write this more concisely using a {\bf list comprehension}:
\index{list comprehension}

\begin{verbatim}
def capitalize_all(t):
    return [s.capitalize() for s in t]
\end{verbatim}

The bracket operators indicate that we are constructing a new
list. The expression inside the brackets specifies the elements
of the list, and the {\tt for} clause indicates what sequence
we are traversing.
\index{list}
\index{for loop}

The syntax of a list comprehension is a little awkward because
the loop variable, {\tt s} in this example, appears in the expression
before we get to the definition.
\index{loop variable}

List comprehensions can also be used for filtering. For example,
this function selects only the elements of {\tt t} that are
upper case, and returns a new list:
\index{filter pattern}
\index{pattern!filter}

\begin{verbatim}
def only_upper(t):
    res = []
    for s in t:
        if s.isupper():
            res.append(s)
    return res
\end{verbatim}

We can rewrite it using a list comprehension:

\begin{verbatim}
def only_upper(t):
    return [s for s in t if s.isupper()]
\end{verbatim}
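
The two patterns can also be combined in a single comprehension. The function \verb"capitalize_upper" below is a hypothetical example, not from the book; the word list is made up.

```python
def capitalize_all(t):
    return [s.capitalize() for s in t]


def only_upper(t):
    return [s for s in t if s.isupper()]


def capitalize_upper(t):
    """Hypothetical example: filter and map in one comprehension."""
    return [s.capitalize() for s in t if s.isupper()]


words = ['SPAM', 'eggs', 'HAM', 'Toast']
print(capitalize_all(words))
print(only_upper(words))
print(capitalize_upper(words))
```

The {\tt if} clause is applied first to decide which elements are kept; the expression at the front is then evaluated only for those elements.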

List comprehensions are concise and easy to read, at least for simple
expressions. And they are usually faster than the equivalent for
loops, sometimes much faster. So if you are mad at me for not
mentioning them earlier, I understand.

But, in my defense, list comprehensions are harder to debug because
you can't put a print statement inside the loop. I suggest that you
use them only if the computation is simple enough that you are likely
to get it right the first time. And for beginners that means never.
\index{debugging}



\section{Generator expressions}

{\bf Generator expressions} are similar to list comprehensions, but
with parentheses instead of square brackets:
\index{generator expression}
\index{expression!generator}

\begin{verbatim}
>>> g = (x**2 for x in range(5))
>>> g
<generator object <genexpr> at 0x7f4c45a786c0>
\end{verbatim}
%
The result is a generator object that knows how to iterate through
a sequence of values. But unlike a list comprehension, it does not
compute the values all at once; it waits to be asked. The built-in
function {\tt next} gets the next value from the generator:
\index{generator object}
\index{object!generator}

\begin{verbatim}
>>> next(g)
0
>>> next(g)
1
\end{verbatim}
%
When you get to the end of the sequence, {\tt next} raises a
StopIteration exception. You can also use a {\tt for} loop to iterate
through the values:
\index{StopIteration}
\index{exception!StopIteration}

\begin{verbatim}
>>> for val in g:
...     print(val)
4
9
16
\end{verbatim}
%
The generator object keeps track of where it is in the sequence,
so the {\tt for} loop picks up where {\tt next} left off. Once the
generator is exhausted, it continues to raise {\tt StopIteration}:

\begin{verbatim}
>>> next(g)
StopIteration
\end{verbatim}

Generator expressions are often used with functions like {\tt sum},
{\tt max}, and {\tt min}:
\index{sum}
\index{function!sum}

\begin{verbatim}
>>> sum(x**2 for x in range(5))
30
\end{verbatim}
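
The behavior described in this section can be collected into one script. The only feature used here that the text does not mention is the optional second argument of {\tt next}, which supplies a default value instead of raising StopIteration.

```python
g = (x**2 for x in range(5))

first = next(g)
second = next(g)
rest = list(g)                 # consumes whatever is left
print(first, second, rest)

# An exhausted generator produces nothing more.
print(list(g))
print(next(g, 'done'))         # default value instead of StopIteration

# Each expression below builds a fresh generator, so none is exhausted.
print(sum(x**2 for x in range(5)),
      max(x**2 for x in range(5)),
      min(x**2 for x in range(5)))
```

Because a generator can be traversed only once, store the values in a list if you need to use them more than once.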


\section{{\tt any} and {\tt all}}

Python provides a built-in function, {\tt any}, that takes a sequence
of boolean values and returns {\tt True} if any of the values are {\tt
True}. It works on lists:
\index{any}
\index{built-in function!any}

\begin{verbatim}
>>> any([False, False, True])
True
\end{verbatim}
%
But it is often used with generator expressions:
\index{generator expression}
\index{expression!generator}

\begin{verbatim}
>>> any(letter == 't' for letter in 'monty')
True
\end{verbatim}
%
That example isn't very useful because it does the same thing
as the {\tt in} operator. But we could use {\tt any} to rewrite
some of the search functions we wrote in Section~\ref{search}. For
example, we could write {\tt avoids} like this:
\index{search pattern}
\index{pattern!search}

\begin{verbatim}
def avoids(word, forbidden):
    return not any(letter in forbidden for letter in word)
\end{verbatim}
%
The function almost reads like English, ``{\tt word} avoids
{\tt forbidden} if there are not any forbidden letters in {\tt word}.''

Using {\tt any} with a generator expression is efficient because
it stops immediately if it finds a {\tt True} value,
so it doesn't have to evaluate the whole sequence.
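
One way to observe the early exit is to record which letters actually get examined. The helper \verb"is_forbidden" and the list {\tt checked} are hypothetical, added only for this demonstration.

```python
def avoids(word, forbidden):
    return not any(letter in forbidden for letter in word)


checked = []

def is_forbidden(letter, forbidden):
    """Hypothetical helper that records each letter it examines."""
    checked.append(letter)
    return letter in forbidden


result = any(is_forbidden(letter, 'xyz') for letter in 'syzygy')
print(result, checked)

print(avoids('monty', 'abc'), avoids('monty', 'xyz'))
```

The word has six letters, but {\tt any} stops at the second one, the first that is forbidden. With a list comprehension in place of the generator expression, all six letters would be checked before {\tt any} even started.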

Python provides another built-in function, {\tt all}, that returns
{\tt True} if every element of the sequence is {\tt True}. As
an exercise, use {\tt all} to rewrite \verb"uses_all" from
Section~\ref{search}.
\index{all}
\index{built-in function!all}
15170
15171
15172
\section{Sets}
15173
\label{sets}
15174
15175
In Section~\ref{dictsub} I use dictionaries to find the words
15176
that appear in a document but not in a word list. The function
15177
I wrote takes {\tt d1}, which contains the words from the document
15178
as keys, and {\tt d2}, which contains the list of words. It
15179
returns a dictionary that contains the keys from {\tt d1} that
15180
are not in {\tt d2}.
15181
15182
\begin{verbatim}
def subtract(d1, d2):
    res = dict()
    for key in d1:
        if key not in d2:
            res[key] = None
    return res
\end{verbatim}
15190
%
15191
In all of these dictionaries, the values are {\tt None} because
15192
we never use them. As a result, we waste some storage space.
15193
\index{dictionary subtraction}
15194
15195
Python provides another built-in type, called a {\tt set}, that
15196
behaves like a collection of dictionary keys with no values. Adding
15197
elements to a set is fast; so is checking membership. And sets
15198
provide methods and operators to compute common set operations.
15199
\index{set}
15200
\index{object!set}
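
As a quick illustration of those operators, here is a minimal sketch.
The variable names are made up for the example.

```python
vowels = set('aeiou')
word = set('python')

common = word & vowels      # intersection: letters in both
rest = word - vowels        # difference: letters only in word
combined = word | vowels    # union: letters in either

print(common)               # {'o'}
print(rest == set('pythn')) # True
print('y' in word)          # True; membership tests are fast
```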
15201
15202
For example, set subtraction is available as a method called
15203
{\tt difference} or as an operator, {\tt -}. So we can rewrite
15204
{\tt subtract} like this:
15205
\index{set subtraction}
15206
15207
\begin{verbatim}
def subtract(d1, d2):
    return set(d1) - set(d2)
\end{verbatim}
15211
%
15212
The result is a set instead of a dictionary, but for operations like
15213
iteration, the behavior is the same.
15214
15215
Some of the exercises in this book can be done concisely and
15216
efficiently with sets. For example, here is a solution to
15217
\verb"has_duplicates", from
15218
Exercise~\ref{duplicate}, that uses a dictionary:
15219
15220
\begin{verbatim}
def has_duplicates(t):
    d = {}
    for x in t:
        if x in d:
            return True
        d[x] = True
    return False
\end{verbatim}
15229
15230
When an element appears for the first time, it is added to the
15231
dictionary. If the same element appears again, the function returns
15232
{\tt True}.
15233
15234
Using sets, we can write the same function like this:
15235
15236
\begin{verbatim}
def has_duplicates(t):
    return len(set(t)) < len(t)
\end{verbatim}
15240
%
15241
An element can only appear in a set once, so if an element in {\tt t}
15242
appears more than once, the set will be smaller than {\tt t}. If there
15243
are no duplicates, the set will be the same size as {\tt t}.
15244
\index{duplicate}
15245
15246
We can also use sets to do some of the exercises in
15247
Chapter~\ref{wordplay}. For example, here's a version of
15248
\verb"uses_only" with a loop:
15249
15250
\begin{verbatim}
def uses_only(word, available):
    for letter in word:
        if letter not in available:
            return False
    return True
\end{verbatim}
15257
%
15258
\verb"uses_only" checks whether all letters in {\tt word} are
15259
in {\tt available}. We can rewrite it like this:
15260
15261
\begin{verbatim}
def uses_only(word, available):
    return set(word) <= set(available)
\end{verbatim}
15265
%
15266
The \verb"<=" operator checks whether one set is a subset of another,
15267
including the possibility that they are equal, which is true if all
15268
the letters in {\tt word} appear in {\tt available}.
15269
\index{subset}
15270
15271
As an exercise, rewrite \verb"avoids" using sets.
15272
15273
15274
\section{Counters}
15275
15276
A Counter is like a set, except that if an element appears more
15277
than once, the Counter keeps track of how many times it appears.
15278
If you are familiar with the mathematical idea of a {\bf multiset},
15279
a Counter is a natural way to represent a multiset.
15280
\index{Counter}
15281
\index{object!Counter}
15282
\index{multiset}
15283
15284
Counter is defined in a standard module called {\tt collections},
15285
so you have to import it. You can initialize a Counter with a string,
15286
list, or anything else that supports iteration:
15287
\index{collections}
15288
\index{module!collections}
15289
15290
\begin{verbatim}
>>> from collections import Counter
>>> count = Counter('parrot')
>>> count
Counter({'r': 2, 'p': 1, 'a': 1, 'o': 1, 't': 1})
\end{verbatim}
15296
15297
Counters behave like dictionaries in many ways; they map from each
15298
key to the number of times it appears. As in dictionaries,
15299
the keys have to be hashable.
15300
15301
Unlike dictionaries, Counters don't raise an exception if you access
15302
an element that doesn't appear. Instead, they return 0:
15303
15304
\begin{verbatim}
>>> count['d']
0
\end{verbatim}
15308
15309
We can use Counters to rewrite \verb"is_anagram" from
15310
Exercise~\ref{anagram}:
15311
15312
\begin{verbatim}
def is_anagram(word1, word2):
    return Counter(word1) == Counter(word2)
\end{verbatim}
15316
15317
If two words are anagrams, they contain the same letters with the same
15318
counts, so their Counters are equivalent.
15319
15320
Counters provide methods and operators to perform set-like operations,
15321
including addition, subtraction, union and intersection. And
15322
they provide an often-useful method, \verb"most_common", which
15323
returns a list of value-frequency pairs, sorted from most common to
15324
least:
15325
15326
\begin{verbatim}
>>> count = Counter('parrot')
>>> for val, freq in count.most_common(3):
...     print(val, freq)
r 2
p 1
a 1
\end{verbatim}
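
The set-like operators mentioned above can be sketched as follows.
The two words are arbitrary examples, and the variable names are
chosen only for illustration.

```python
from collections import Counter

c1 = Counter('parrot')   # p:1 a:1 r:2 o:1 t:1
c2 = Counter('carrot')   # c:1 a:1 r:2 o:1 t:1

total = c1 + c2    # addition: counts are summed
diff = c1 - c2     # subtraction: only positive counts are kept
both = c1 & c2     # intersection: the smaller of each pair of counts
either = c1 | c2   # union: the larger of each pair of counts

print(total['r'])  # 4
print(diff)        # Counter({'p': 1})
```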
15334
15335
15336
\section{defaultdict}
15337
15338
The {\tt collections} module also provides {\tt defaultdict}, which is
15339
like a dictionary except that if you access a key that doesn't exist,
15340
it can generate a new value on the fly.
15341
\index{defaultdict}
15342
\index{object!defaultdict}
15343
\index{collections}
15344
\index{module!collections}
15345
15346
When you create a defaultdict, you provide a function that's used to
15347
create new values. A function used to create objects is sometimes
15348
called a {\bf factory}. The built-in functions that create lists, sets,
15349
and other types can be used as factories:
15350
\index{factory function}
15351
15352
\begin{verbatim}
>>> from collections import defaultdict
>>> d = defaultdict(list)
\end{verbatim}
15356
15357
Notice that the argument is {\tt list}, which is a class object,
15358
not {\tt list()}, which is a new list. The function you provide
15359
doesn't get called unless you access a key that doesn't exist.
15360
15361
\begin{verbatim}
>>> t = d['new key']
>>> t
[]
\end{verbatim}
15366
15367
The new list, which we're calling {\tt t}, is also added to the
15368
dictionary. So if we modify {\tt t}, the change appears in {\tt d}:
15369
15370
\begin{verbatim}
>>> t.append('new value')
>>> d
defaultdict(<class 'list'>, {'new key': ['new value']})
\end{verbatim}
15375
15376
If you are making a dictionary of lists, you can often write simpler
15377
code using {\tt defaultdict}. In my solution to
15378
Exercise~\ref{anagrams}, which you can get from
15379
\url{http://thinkpython2.com/code/anagram_sets.py}, I make a
15380
dictionary that maps from a sorted string of letters to the list of
15381
words that can be spelled with those letters. For example, {\tt
15382
'opst'} maps to the list {\tt ['opts', 'post', 'pots', 'spot',
15383
'stop', 'tops']}.
15384
15385
Here's the original code:
15386
15387
\begin{verbatim}
def all_anagrams(filename):
    d = {}
    for line in open(filename):
        word = line.strip().lower()
        t = signature(word)
        if t not in d:
            d[t] = [word]
        else:
            d[t].append(word)
    return d
\end{verbatim}
15399
15400
This can be simplified using {\tt setdefault}, which you might
15401
have used in Exercise~\ref{setdefault}:
15402
\index{setdefault}
15403
15404
\begin{verbatim}
def all_anagrams(filename):
    d = {}
    for line in open(filename):
        word = line.strip().lower()
        t = signature(word)
        d.setdefault(t, []).append(word)
    return d
\end{verbatim}
15413
15414
This solution has the drawback that it makes a new list
15415
every time, regardless of whether it is needed. For lists,
15416
that's no big deal, but if the factory
15417
function is complicated, it might be.
15418
\index{factory function}
15419
15420
We can avoid this problem and
15421
simplify the code using a {\tt defaultdict}:
15422
15423
\begin{verbatim}
def all_anagrams(filename):
    d = defaultdict(list)
    for line in open(filename):
        word = line.strip().lower()
        t = signature(word)
        d[t].append(word)
    return d
\end{verbatim}
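
To make the difference concrete, here is a rough sketch that counts
how often a factory is called under each approach. The factory
\verb"make_list" and the list {\tt calls} are hypothetical, and
\verb"''.join(sorted(word))" stands in for {\tt signature}, whose
definition is not shown here.

```python
from collections import defaultdict

calls = []

def make_list():
    """Hypothetical factory that records each time it is called."""
    calls.append(1)
    return []

words = ['opts', 'post', 'pots']

# setdefault: the default argument is evaluated on every call
d1 = {}
for word in words:
    key = ''.join(sorted(word))
    d1.setdefault(key, make_list()).append(word)
setdefault_calls = len(calls)

# defaultdict: the factory runs only when a key is missing
calls.clear()
d2 = defaultdict(make_list)
for word in words:
    key = ''.join(sorted(word))
    d2[key].append(word)
defaultdict_calls = len(calls)

print(setdefault_calls, defaultdict_calls)  # 3 1
```

Both dictionaries end up with the same contents; only the number of
lists created along the way differs.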
15432
15433
My solution to Exercise~\ref{poker}, which you can download from
15434
\url{http://thinkpython2.com/code/PokerHandSoln.py},
15435
uses {\tt setdefault} in the function
15436
\verb"has_straightflush". This solution has the drawback
15437
of creating a {\tt Hand} object every time through the loop, whether
15438
it is needed or not. As an exercise, rewrite it using
15439
a defaultdict.
15440
15441
15442
\section{Named tuples}
15443
15444
Many simple objects are basically collections of related values.
15445
For example, the Point object defined in Chapter~\ref{clobjects} contains
15446
two numbers, {\tt x} and {\tt y}. When you define a class like
15447
this, you usually start with an init method and a str method:
15448
15449
\begin{verbatim}
class Point:

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __str__(self):
        return '(%g, %g)' % (self.x, self.y)
\end{verbatim}
15459
15460
This is a lot of code to convey a small amount of information.
15461
Python provides a more concise way to say the same thing:
15462
15463
\begin{verbatim}
from collections import namedtuple
Point = namedtuple('Point', ['x', 'y'])
\end{verbatim}
15467
15468
The first argument is the name of the class you want to create.
15469
The second is a list of the attributes Point objects should have,
15470
as strings. The return value from {\tt namedtuple} is a class object:
15471
\index{namedtuple}
15472
\index{object!namedtuple}
15473
\index{collections}
15474
\index{module!collections}
15475
15476
\begin{verbatim}
>>> Point
<class '__main__.Point'>
\end{verbatim}
15480
15481
{\tt Point} automatically provides methods like \verb"__init__" and
15482
\verb"__str__" so you don't have to write them.
15483
\index{class object}
15484
\index{object!class}
15485
15486
To create a Point object, you use the Point class as a function:
15487
15488
\begin{verbatim}
>>> p = Point(1, 2)
>>> p
Point(x=1, y=2)
\end{verbatim}
15493
15494
The init method assigns the arguments to attributes using the names
15495
you provided. The str method prints a representation of the Point
15496
object and its attributes.
15497
15498
You can access the elements of the named tuple by name:
15499
15500
\begin{verbatim}
>>> p.x, p.y
(1, 2)
\end{verbatim}
15504
15505
But you can also treat a named tuple as a tuple:
15506
15507
\begin{verbatim}
>>> p[0], p[1]
(1, 2)

>>> x, y = p
>>> x, y
(1, 2)
\end{verbatim}
15515
15516
Named tuples provide a quick way to define simple classes.
15517
The drawback is that simple classes don't always stay simple.
15518
You might decide later that you want to add methods to a named tuple.
15519
In that case, you could define a new class that inherits from
15520
the named tuple:
15521
\index{inheritance}
15522
15523
\begin{verbatim}
class Pointier(Point):
    # add more methods here
\end{verbatim}
15527
15528
Or you could switch to a conventional class definition.
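
For the inheritance option, a filled-in version might look like the
following sketch. The method \verb"distance_to_origin" is a made-up
example, not something from the earlier chapters.

```python
from collections import namedtuple
import math

Point = namedtuple('Point', ['x', 'y'])

class Pointier(Point):
    """A Point with one extra method (hypothetical example)."""

    def distance_to_origin(self):
        return math.hypot(self.x, self.y)

p = Pointier(3, 4)
print(p.distance_to_origin())  # 5.0

x, y = p                       # still unpacks like a tuple
print(x, y)                    # 3 4
```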
15529
15530
15531
\section{Gathering keyword args}
15532
15533
In Section~\ref{gather}, we saw how to write a function that
15534
gathers its arguments into a tuple:
15535
\index{gather}
15536
15537
\begin{verbatim}
def printall(*args):
    print(args)
\end{verbatim}
15541
%
15542
You can call this function with any number of positional arguments
15543
(that is, arguments that don't have keywords):
15544
\index{positional argument}
15545
\index{argument!positional}
15546
15547
\begin{verbatim}
>>> printall(1, 2.0, '3')
(1, 2.0, '3')
\end{verbatim}
15551
%
15552
But the {\tt *} operator doesn't gather keyword arguments:
15553
\index{keyword argument}
15554
\index{argument!keyword}
15555
15556
\begin{verbatim}
>>> printall(1, 2.0, third='3')
TypeError: printall() got an unexpected keyword argument 'third'
\end{verbatim}
15560
%
15561
To gather keyword arguments, you can use the {\tt **} operator:
15562
15563
\begin{verbatim}
def printall(*args, **kwargs):
    print(args, kwargs)
\end{verbatim}
15567
%
15568
You can call the keyword gathering parameter anything you want, but
15569
{\tt kwargs} is a common choice. The result is a dictionary that maps
15570
keywords to values:
15571
15572
\begin{verbatim}
>>> printall(1, 2.0, third='3')
(1, 2.0) {'third': '3'}
\end{verbatim}
15576
%
15577
If you have a dictionary of keywords and values, you can use the
15578
scatter operator, {\tt **} to call a function:
15579
\index{scatter}
15580
15581
\begin{verbatim}
>>> d = dict(x=1, y=2)
>>> Point(**d)
Point(x=1, y=2)
\end{verbatim}
15586
%
15587
Without the scatter operator, the function would treat {\tt d} as
15588
a single positional argument, so it would assign {\tt d} to
15589
{\tt x} and complain because there's nothing to assign to {\tt y}:
15590
15591
\begin{verbatim}
>>> d = dict(x=1, y=2)
>>> Point(d)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __new__() missing 1 required positional argument: 'y'
\end{verbatim}
15598
%
15599
When you are working with functions that have a large number of
15600
parameters, it is often useful to create and pass around dictionaries
15601
that specify frequently used options.
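
As a sketch of that idea, suppose we have a function with several
optional parameters. The function \verb"draw_line" and its parameters
are invented for this example; it returns its arguments so we can
see what it received.

```python
def draw_line(start, end, color='black', width=1, style='solid'):
    """Hypothetical function with several optional parameters."""
    return (start, end, color, width, style)

# options that are reused across many calls
thick_red = dict(color='red', width=3)

a = draw_line((0, 0), (1, 1), **thick_red)
b = draw_line((1, 1), (2, 0), style='dashed', **thick_red)

print(a)  # ((0, 0), (1, 1), 'red', 3, 'solid')
print(b)  # ((1, 1), (2, 0), 'red', 3, 'dashed')
```

Keeping the shared options in one dictionary means a later change,
such as a different width, has to be made in only one place.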
15602
15603
15604
\section{Glossary}
15605
15606
\begin{description}
15607
15608
\item[conditional expression:] An expression that has one of two
15609
values, depending on a condition.
15610
\index{conditional expression}
15611
\index{expression!conditional}
15612
15613
\item[list comprehension:] An expression with a {\tt for} loop in square
15614
brackets that yields a new list.
15615
\index{list comprehension}
15616
15617
\item[generator expression:] An expression with a {\tt for} loop in parentheses
15618
that yields a generator object.
15619
\index{generator expression}
15620
\index{expression!generator}
15621
15622
\item[multiset:] A mathematical entity that represents a mapping
15623
between the elements of a set and the number of times they appear.
15624
15625
\item[factory:] A function, usually passed as a parameter, used to
15626
create objects.
15627
\index{factory}
15628
15629
\end{description}
15630
15631
15632
15633
15634
\section{Exercises}
15635
15636
\begin{exercise}
15637
15638
The following function computes the binomial
coefficient recursively.
15640
15641
\begin{verbatim}
def binomial_coeff(n, k):
    """Compute the binomial coefficient "n choose k".

    n: number of trials
    k: number of successes

    returns: int
    """
    if k == 0:
        return 1
    if n == 0:
        return 0

    res = binomial_coeff(n-1, k) + binomial_coeff(n-1, k-1)
    return res
\end{verbatim}
15658
15659
Rewrite the body of the function using nested conditional
15660
expressions.
15661
15662
One note: this function is not very efficient because it ends up computing
15663
the same values over and over. You could make it more efficient by
15664
memoizing (see Section~\ref{memoize}). But you will find that it's harder to
15665
memoize if you write it using conditional expressions.
15666
15667
\end{exercise}
15668
15669
15670
15671
\appendix
15672
15673
\chapter{Debugging}
15674
\index{debugging}
15675
15676
When you are debugging, you should distinguish among different
15677
kinds of errors in order to track them down more quickly:
15678
15679
\begin{itemize}
15680
15681
\item Syntax errors are discovered by the interpreter when it is
15682
translating the source code into byte code. They indicate
15683
that there is something wrong with the structure of the program.
15684
Example: Omitting the colon at the end of a {\tt def} statement
15685
generates the somewhat redundant message {\tt SyntaxError: invalid
15686
syntax}.
15687
\index{syntax error}
15688
\index{error!syntax}
15689
15690
\item Runtime errors are produced by the interpreter if something goes
15691
wrong while the program is running. Most runtime error messages
15692
include information about where the error occurred and what
15693
functions were executing. Example: An infinite recursion eventually
15694
causes the runtime error ``maximum recursion depth exceeded''.
15695
\index{runtime error}
15696
\index{error!runtime}
15697
\index{exception}
15698
15699
\item Semantic errors are problems with a program that runs without
15700
producing error messages but doesn't do the right thing. Example:
15701
An expression may not be evaluated in the order you expect, yielding
15702
an incorrect result.
15703
\index{semantic error}
15704
\index{error!semantic}
15705
15706
\end{itemize}
15707
15708
The first step in debugging is to figure out which kind of
15709
error you are dealing with. Although the following sections are
15710
organized by error type, some techniques are
15711
applicable in more than one situation.
15712
15713
15714
\section{Syntax errors}
15715
\index{error message}
15716
15717
Syntax errors are usually easy to fix once you figure out what they
15718
are. Unfortunately, the error messages are often not helpful.
15719
The most common messages are {\tt SyntaxError: invalid syntax} and
15720
{\tt SyntaxError: invalid token}, neither of which is very informative.
15721
15722
On the other hand, the message does tell you where in the program the
15723
problem occurred. Actually, it tells you where Python
15724
noticed a problem, which is not necessarily where the error
15725
is. Sometimes the error is prior to the location of the error
15726
message, often on the preceding line.
15727
\index{incremental development}
15728
\index{development plan!incremental}
15729
15730
If you are building the program incrementally, you should have
15731
a good idea about where the error is. It will be in the last
15732
line you added.
15733
15734
If you are copying code from a book, start by comparing
15735
your code to the book's code very carefully. Check every character.
15736
At the same time, remember that the book might be wrong, so
15737
if you see something that looks like a syntax error, it might be.
15738
15739
Here are some ways to avoid the most common syntax errors:
15740
\index{syntax}
15741
15742
\begin{enumerate}
15743
15744
\item Make sure you are not using a Python keyword for a variable name.
15745
\index{keyword}
15746
15747
\item Check that you have a colon at the end of the header of every
15748
compound statement, including {\tt for}, {\tt while},
15749
{\tt if}, and {\tt def} statements.
15750
\index{header}
15751
\index{colon}
15752
15753
\item Make sure that any strings in the code have matching
15754
quotation marks. Make sure that all quotation marks are
15755
``straight quotes'', not ``curly quotes''.
15756
\index{quotation mark}
15757
15758
\item If you have multiline strings with triple quotes (single or double), make
15759
sure you have terminated the string properly. An unterminated string
15760
may cause an {\tt invalid token} error at the end of your program,
15761
or it may treat the following part of the program as a string until it
15762
comes to the next string. In the second case, it might not produce an error
15763
message at all!
15764
\index{multiline string}
15765
\index{string!multiline}
15766
15767
\item An unclosed opening operator---\verb+(+, \verb+{+, or
15768
\verb+[+---makes Python continue with the next line as part of the
15769
current statement. Generally, an error occurs almost immediately in
15770
the next line.
15771
15772
\item Check for the classic {\tt =} instead of {\tt ==} inside
15773
a conditional.
15774
\index{conditional}
15775
15776
\item Check the indentation to make sure it lines up the way it
is supposed to. Python can handle spaces and tabs, but if you mix
them it can cause problems; Python 3 rejects indentation that mixes
them inconsistently. The best way to avoid this problem
is to use a text editor that knows about Python and generates
consistent indentation.
15781
\index{indentation}
15782
\index{whitespace}
15783
15784
\item If you have non-ASCII characters in the code (including strings
15785
and comments), that might cause a problem, although Python 3 usually
15786
handles non-ASCII characters. Be careful if you paste in text from
15787
a web page or other source.
15788
15789
\end{enumerate}
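
If you want to check a fragment of code without running it, one
option is the built-in function {\tt compile}, which raises
{\tt SyntaxError} for code that cannot be translated. The helper
\verb"check_syntax" below is a hypothetical name, and this is only a
sketch of the idea.

```python
def check_syntax(source):
    """Return None if source compiles, else (line number, message)."""
    try:
        compile(source, '<example>', 'exec')
    except SyntaxError as err:
        return err.lineno, err.msg
    return None

good = "x = 1\nif x == 1:\n    print(x)\n"
bad = "x = 1\nif x = 1:\n    print(x)\n"   # '=' instead of '=='

print(check_syntax(good))  # None
print(check_syntax(bad))   # line 2; the message wording varies by version
```

The code is only compiled, never executed, so nothing is printed by
the fragments themselves.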
15790
15791
If nothing works, move on to the next section...
15792
15793
15794
\subsection{I keep making changes and it makes no difference.}
15795
15796
If the interpreter says there is an error and you don't see it, that
15797
might be because you and the interpreter are not looking at the same
15798
code. Check your programming environment to make sure that the
15799
program you are editing is the one Python is trying to run.
15800
15801
If you are not sure, try putting an obvious and deliberate syntax
15802
error at the beginning of the program. Now run it again. If the
15803
interpreter doesn't find the new error, you are not running the
15804
new code.
15805
15806
There are a few likely culprits:
15807
15808
\begin{itemize}
15809
15810
\item You edited the file and forgot to save the changes before
15811
running it again. Some programming environments do this
15812
for you, but some don't.
15813
15814
\item You changed the name of the file, but you are still running
15815
the old name.
15816
15817
\item Something in your development environment is configured
15818
incorrectly.
15819
15820
\item If you are writing a module and using {\tt import},
15821
make sure you don't give your module the same name as one
15822
of the standard Python modules.
15823
15824
\item If you are using {\tt import} to read a module, remember
that you have to restart the interpreter or use {\tt importlib.reload}
to read a modified file. If you just import the module again, it
doesn't do anything.
15828
\index{module!reload}
15829
\index{reload function}
15830
\index{function!reload}
15831
15832
\end{itemize}
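
The last culprit in the list can be demonstrated with a throwaway
module. In this sketch the module name \verb"scratch_mod" and the
temporary directory are assumptions made for the demonstration;
bytecode caching is switched off so that two quick edits cannot be
confused with each other.

```python
import importlib
import os
import sys
import tempfile

# Work in a throwaway directory so no real files are touched.
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'scratch_mod.py')   # hypothetical module
sys.path.insert(0, folder)
sys.dont_write_bytecode = True   # avoid stale cached bytecode in this demo

with open(path, 'w') as f:
    f.write("VALUE = 1\n")

importlib.invalidate_caches()
import scratch_mod
first = scratch_mod.VALUE        # 1

with open(path, 'w') as f:
    f.write("VALUE = 'changed'\n")

import scratch_mod               # no effect: the module is already loaded
second = scratch_mod.VALUE       # still 1

importlib.reload(scratch_mod)    # re-executes the modified file
third = scratch_mod.VALUE        # 'changed'

print(first, second, third)
```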
15833
15834
If you get stuck and you can't figure out what is going on, one
15835
approach is to start again with a new program like ``Hello, World!'',
15836
and make sure you can get a known program to run. Then gradually add
15837
the pieces of the original program to the new one.
15838
15839
15840
\section{Runtime errors}
15841
15842
Once your program is syntactically correct,
15843
Python can read it and at least start running it. What could
15844
possibly go wrong?
15845
15846
15847
\subsection{My program does absolutely nothing.}
15848
15849
This problem is most common when your file consists of functions and
15850
classes but does not actually invoke a function to start execution.
15851
This may be intentional if you only plan to import this module to
15852
supply classes and functions.
15853
15854
If it is not intentional, make sure there is a function call
15855
in the program, and make sure the flow of execution reaches
15856
it (see ``Flow of Execution'' below).
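
A common way to arrange this is sketched below. The function name
{\tt main} is a convention rather than a requirement, and the body
here is only a placeholder.

```python
def main():
    """Entry point: put the program's top-level steps here."""
    message = 'Hello, World!'
    print(message)
    return message

# Without a call like this, defining main() does nothing visible.
# The test on __name__ keeps the call from running when the file
# is imported as a module.
if __name__ == '__main__':
    main()
```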
15857
15858
15859
\subsection{My program hangs.}
15860
\index{infinite loop}
15861
\index{infinite recursion}
15862
\index{hanging}
15863
15864
If a program stops and seems to be doing nothing, it is ``hanging''.
15865
Often that means that it is caught in an infinite loop or infinite
15866
recursion.
15867
15868
\begin{itemize}
15869
15870
\item If there is a particular loop that you suspect is the
15871
problem, add a {\tt print} statement immediately before the loop that says
15872
``entering the loop'' and another immediately after that says
15873
``exiting the loop''.
15874
15875
Run the program. If you get the first message and not the second,
15876
you've got an infinite loop. Go to the ``Infinite Loop'' section
15877
below.
15878
15879
\item Most of the time, an infinite recursion will cause the program
to run for a while and then produce a ``RecursionError: maximum
recursion depth exceeded'' error. If that happens, go to the
``Infinite Recursion'' section below.
15883
15884
If you are not getting this error but you suspect there is a problem
15885
with a recursive method or function, you can still use the techniques
15886
in the ``Infinite Recursion'' section.
15887
15888
\item If neither of those steps works, start testing other
15889
loops and other recursive functions and methods.
15890
15891
\item If that doesn't work, then it is possible that
15892
you don't understand the flow of execution in your program.
15893
Go to the ``Flow of Execution'' section below.
15894
15895
\end{itemize}
15896
15897
15898
\subsubsection{Infinite Loop}
15899
\index{infinite loop}
15900
\index{loop!infinite}
15901
\index{condition}
15902
\index{loop!condition}
15903
15904
If you think you have an infinite loop and you think you know
15905
what loop is causing the problem, add a {\tt print} statement at
15906
the end of the loop that prints the values of the variables in
15907
the condition and the value of the condition.
15908
15909
For example:
15910
15911
\begin{verbatim}
while x > 0 and y < 0:
    # do something to x
    # do something to y

    print('x: ', x)
    print('y: ', y)
    print("condition: ", (x > 0 and y < 0))
\end{verbatim}
15920
%
15921
Now when you run the program, you will see three lines of output
15922
for each time through the loop. The last time through the
15923
loop, the condition should be {\tt False}. If the loop keeps
15924
going, you will be able to see the values of {\tt x} and {\tt y},
15925
and you might figure out why they are not being updated correctly.
15926
15927
15928
\subsubsection{Infinite Recursion}
15929
\index{infinite recursion}
15930
\index{recursion!infinite}
15931
15932
Most of the time, infinite recursion causes the program to run
for a while and then produce a {\tt maximum recursion depth exceeded}
error.
15935
15936
If you suspect that a function is causing an infinite
15937
recursion, make sure that there is a base case.
15938
There should be some condition that causes the
15939
function to return without making a recursive invocation.
15940
If not, you need to rethink the algorithm and identify a base
15941
case.
15942
15943
If there is a base case but the program doesn't seem to be reaching
15944
it, add a {\tt print} statement at the beginning of the function
15945
that prints the parameters. Now when you run the program, you will see
15946
a few lines of output every time the function is invoked,
15947
and you will see the parameter values. If the parameters are not moving
15948
toward the base case, you will get some ideas about why not.
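
Here is a small sketch of that technique. The function
{\tt countdown} is deliberately buggy, and instead of printing, it
appends each parameter value to a list called {\tt trace} (a
stand-in for the {\tt print} statement) so the output stays short.

```python
def countdown(n, trace):
    """Buggy on purpose: the base case tests n == 0, which odd n skips."""
    trace.append(n)          # stands in for print(n) at the top
    if n == 0:
        return
    countdown(n - 2, trace)

trace = []
try:
    countdown(5, trace)
except RecursionError:
    pass                     # expected: the recursion never stops

print(trace[:5])   # [5, 3, 1, -1, -3]
```

The first few values show the problem: {\tt n} steps from 1 to $-1$
and never equals 0, so the base case is never reached.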
15949
15950
15951
\subsubsection{Flow of Execution}
15952
\index{flow of execution}
15953
15954
If you are not sure how the flow of execution is moving through
15955
your program, add {\tt print} statements to the beginning of each
15956
function with a message like ``entering function {\tt foo}'', where
15957
{\tt foo} is the name of the function.
15958
15959
Now when you run the program, it will print a trace of each
15960
function as it is invoked.
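
If there are many functions, adding the same {\tt print} statement to
each one by hand gets tedious. One alternative, which uses a feature
not covered in this book (a decorator), is sketched below. The names
{\tt traced}, {\tt square}, and \verb"sum_of_squares" are
hypothetical.

```python
import functools

def traced(func):
    """Print a message each time func is entered (a debugging aid)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print('entering function', func.__name__)
        return func(*args, **kwargs)
    return wrapper

@traced
def square(x):
    return x * x

@traced
def sum_of_squares(a, b):
    return square(a) + square(b)

result = sum_of_squares(3, 4)
print(result)  # 25
```

Running this prints one ``entering function'' line for
\verb"sum_of_squares" and then one for each call to {\tt square}.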
15961
15962
15963
\subsection{When I run the program I get an exception.}
15964
\index{exception}
15965
\index{runtime error}
15966
15967
If something goes wrong during runtime, Python
15968
prints a message that includes the name of the
15969
exception, the line of the program where the problem occurred,
15970
and a traceback.
15971
\index{traceback}
15972
15973
The traceback identifies the function that is currently running, and
15974
then the function that called it, and then the function that called
15975
{\em that}, and so on. In other words, it traces the sequence of
15976
function calls that got you to where you are, including the line
15977
number in your file where each call occurred.
15978
15979
The first step is to examine the place in the program where
15980
the error occurred and see if you can figure out what happened.
15981
These are some of the most common runtime errors:
15982
15983
\begin{description}
15984
15985
\item[NameError:] You are trying to use a variable that doesn't
15986
exist in the current environment. Check if the name
15987
is spelled right, or at least consistently.
15988
And remember that local variables are local; you
15989
cannot refer to them from outside the function where they are defined.
15990
\index{NameError}
15991
\index{exception!NameError}
15992
15993
\item[TypeError:] There are several possible causes:
15994
\index{TypeError}
15995
\index{exception!TypeError}
15996
15997
\begin{itemize}
15998
15999
\item You are trying to use a value improperly. Example: indexing
16000
a string, list, or tuple with something other than an integer.
16001
\index{index}
16002
16003
\item There is a mismatch between the items in a format string and
16004
the items passed for conversion. This can happen if either the number
16005
of items does not match or an invalid conversion is called for.
16006
\index{format operator}
16007
\index{operator!format}
16008
16009
\item You are passing the wrong number of arguments to a function.
16010
For methods, look at the method definition and
16011
check that the first parameter is {\tt self}. Then look at the
16012
method invocation; make sure you are invoking the method on an
16013
object with the right type and providing the other arguments
16014
correctly.
16015
16016
\end{itemize}
16017
16018
\item[KeyError:] You are trying to access an element of a dictionary
16019
using a key that the dictionary does not contain. If the keys
16020
are strings, remember that capitalization matters.
16021
\index{KeyError}
16022
\index{exception!KeyError}
16023
\index{dictionary}
16024
16025
\item[AttributeError:] You are trying to access an attribute or method
16026
that does not exist. Check the spelling! You can use the built-in
function {\tt vars} to list the attributes that do exist, or {\tt dir},
which also lists methods.
16028
\index{dir function}
16029
\index{function!dir}
16030
16031
If an AttributeError indicates that an object has {\tt NoneType},
16032
that means that it is {\tt None}. So the problem is not the
16033
attribute name, but the object.
16034
16035
The reason the object is {\tt None} might be that you forgot
16036
to return a value from a function; if you get to the end of
16037
a function without hitting a {\tt return} statement, it returns
16038
{\tt None}. Another common cause is using the result from
16039
a list method, like {\tt sort}, that returns {\tt None}.
16040
\index{AttributeError}
16041
\index{exception!AttributeError}
16042
16043
\item[IndexError:] The index you are using
to access a list, string, or tuple is greater than
its length minus one (or, for a negative index, less than
the negative of its length). Immediately before the site of the error,
add a {\tt print} statement to display
the value of the index and the length of the sequence.
Is the sequence the right size? Is the index the right value?
16049
\index{IndexError}
16050
\index{exception!IndexError}
16051
16052
\end{description}
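
To see these exceptions in a controlled setting, the following sketch
triggers each one on purpose and reports its name. The helper
\verb"exception_name" is hypothetical, and the {\tt lambda}
expressions are just a compact way to delay each faulty operation
until the helper runs it.

```python
def exception_name(action):
    """Run action() and return the name of the exception it raises."""
    try:
        action()
    except Exception as err:
        return type(err).__name__
    return None

t = [1, 2, 3]
d = {'a': 1}

print(exception_name(lambda: undefined_name))   # NameError
print(exception_name(lambda: 'abc'['1']))       # TypeError
print(exception_name(lambda: '%d %d' % (1,)))   # TypeError
print(exception_name(lambda: d['A']))           # KeyError
print(exception_name(lambda: t.sort().pop()))   # AttributeError
print(exception_name(lambda: t[3]))             # IndexError
```

The {\tt AttributeError} case is the one described above:
{\tt t.sort()} returns {\tt None}, so the following method call is
applied to {\tt None}.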
16053
16054
The Python debugger ({\tt pdb}) is useful for tracking down
16055
exceptions because it allows you to examine the state of the
16056
program immediately before the error. You can read
16057
about {\tt pdb} at \url{https://docs.python.org/3/library/pdb.html}.
16058
\index{debugger (pdb)}
16059
\index{pdb (Python debugger)}
16060
16061
16062
\subsection{I added so many {\tt print} statements I get inundated with
output.}
\index{print statement}
\index{statement!print}

One of the problems with using {\tt print} statements for debugging
is that you can end up buried in output. There are two ways
to proceed: simplify the output or simplify the program.

To simplify the output, you can remove or comment out {\tt print}
statements that aren't helping, or combine them, or format
the output so it is easier to understand.

To simplify the program, there are several things you can do. First,
scale down the problem the program is working on. For example, if you
are searching a list, search a {\em small} list. If the program takes
input from the user, give it the simplest input that causes the
problem.
\index{dead code}

Second, clean up the program. Remove dead code and reorganize the
program to make it as easy to read as possible. For example, if you
suspect that the problem is in a deeply nested part of the program,
try rewriting that part with simpler structure. If you suspect a
large function, try splitting it into smaller functions and testing them
separately.
\index{testing!minimal test case}
\index{test case, minimal}

Often the process of finding the minimal test case leads you to the
bug. If you find that a program works in one situation but not in
another, that gives you a clue about what is going on.

Similarly, rewriting a piece of code can help you find subtle
bugs. If you make a change that you think shouldn't affect the
program, and it does, that can tip you off.


\section{Semantic errors}

In some ways, semantic errors are the hardest to debug,
because the interpreter provides no information
about what is wrong. Only you know what the program is supposed to
do.
\index{semantic error}
\index{error!semantic}

The first step is to make a connection between the program
text and the behavior you are seeing. You need a hypothesis
about what the program is actually doing. One of the things
that makes that hard is that computers run so fast.

You will often wish that you could slow the program down to human
speed, and with some debuggers you can. But the time it takes to
insert a few well-placed {\tt print} statements is often short compared to
setting up the debugger, inserting and removing breakpoints, and
``stepping'' the program to where the error is occurring.


\subsection{My program doesn't work.}

You should ask yourself these questions:

\begin{itemize}

\item Is there something the program was supposed to do but
which doesn't seem to be happening? Find the section of the code
that performs that function and make sure it is executing when
you think it should.

\item Is something happening that shouldn't? Find code in
your program that performs that function and see if it is
executing when it shouldn't.

\item Is a section of code producing an effect that is not
what you expected? Make sure that you understand the code in
question, especially if it involves functions or methods in
other Python modules. Read the documentation for the functions you call.
Try them out by writing simple test cases and checking the results.

\end{itemize}

In order to program, you need a mental model of how
programs work. If you write a program that doesn't do what you expect,
often the problem is not in the program; it's in your mental
model.
\index{model, mental}
\index{mental model}

The best way to correct your mental model is to break the program
into its components (usually the functions and methods) and test
each component independently. Once you find the discrepancy
between your model and reality, you can solve the problem.

Of course, you should be building and testing components as you
develop the program. If you encounter a problem,
there should be only a small amount of new code
that is not known to be correct.


\subsection{I've got a big hairy expression and it doesn't
do what I expect.}
\index{expression!big and hairy}
\index{big, hairy expression}

Writing complex expressions is fine as long as they are readable,
but they can be hard to debug. It is often a good idea to
break a complex expression into a series of assignments to
temporary variables.

For example:

\begin{verbatim}
self.hands[i].addCard(self.hands[self.findNeighbor(i)].popCard())
\end{verbatim}
%
This can be rewritten as:

\begin{verbatim}
neighbor = self.findNeighbor(i)
pickedCard = self.hands[neighbor].popCard()
self.hands[i].addCard(pickedCard)
\end{verbatim}
%
The explicit version is easier to read because the variable
names provide additional documentation, and it is easier to debug
because you can check the types of the intermediate variables
and display their values.
\index{temporary variable}
\index{variable!temporary}

Another problem that can occur with big expressions is
that the order of evaluation may not be what you expect.
For example, if you are translating the expression
$\frac{x}{2 \pi}$ into Python, you might write:

\begin{verbatim}
y = x / 2 * math.pi
\end{verbatim}
%
That is not correct because multiplication and division have
the same precedence and are evaluated from left to right.
So this expression computes $x \pi / 2$.
\index{order of operations}
\index{precedence}

A good way to debug expressions is to add parentheses to make
the order of evaluation explicit:

\begin{verbatim}
y = x / (2 * math.pi)
\end{verbatim}
%
Whenever you are not sure of the order of evaluation, use
parentheses. Not only will the program be correct (in the sense
of doing what you intended), it will also be more readable for
other people who haven't memorized the order of operations.


\subsection{I've got a function that doesn't return what I
expect.}
\index{return statement}
\index{statement!return}

If you have a {\tt return} statement with a complex expression,
you don't have a chance to print the result before
returning. Again, you can use a temporary variable. For
example, instead of:

\begin{verbatim}
return self.hands[i].removeMatches()
\end{verbatim}
%
you could write:

\begin{verbatim}
count = self.hands[i].removeMatches()
return count
\end{verbatim}
%
Now you have the opportunity to display the value of
{\tt count} before returning.


\subsection{I'm really, really stuck and I need help.}

First, try getting away from the computer for a few minutes.
Computers emit waves that affect the brain, causing these
symptoms:

\begin{itemize}

\item Frustration and rage.
\index{frustration}
\index{rage}
\index{debugging!emotional response}
\index{emotional debugging}

\item Superstitious beliefs (``the computer hates me'') and
magical thinking (``the program only works when I wear my
hat backward'').
\index{debugging!superstition}
\index{superstitious debugging}

\item Random walk programming (the attempt to program by writing
every possible program and choosing the one that does the right
thing).
\index{random walk programming}
\index{development plan!random walk programming}

\end{itemize}

If you find yourself suffering from any of these symptoms, get
up and go for a walk. When you are calm, think about the program.
What is it doing? What are some possible causes of that
behavior? When was the last time you had a working program,
and what did you do next?

Sometimes it just takes time to find a bug. I often find bugs
when I am away from the computer and let my mind wander. Some
of the best places to find bugs are trains, showers, and in bed,
just before you fall asleep.


\subsection{No, I really need help.}

It happens. Even the best programmers occasionally get stuck.
Sometimes you work on a program so long that you can't see the
error. You need a fresh pair of eyes.

Before you bring someone else in, make sure you are prepared.
Your program should be as simple
as possible, and you should be working on the smallest input
that causes the error. You should have {\tt print} statements in the
appropriate places (and the output they produce should be
comprehensible). You should understand the problem well enough
to describe it concisely.

When you bring someone in to help, be sure to give
them the information they need:

\begin{itemize}

\item If there is an error message, what is it
and what part of the program does it indicate?

\item What was the last thing you did before this error occurred?
What were the last lines of code that you wrote, or what is
the new test case that fails?

\item What have you tried so far, and what have you learned?

\end{itemize}

When you find the bug, take a second to think about what you
could have done to find it faster. Next time you see something
similar, you will be able to find the bug more quickly.

Remember, the goal is not just to make the program
work. The goal is to learn how to make the program work.


\chapter{Analysis of Algorithms}
\label{algorithms}

\begin{quote}
This appendix is an edited excerpt from {\it Think Complexity}, by
Allen B. Downey, also published by O'Reilly Media (2012). When you
are done with this book, you might want to move on to that one.
\end{quote}

{\bf Analysis of algorithms} is a branch of computer science that
studies the performance of algorithms, especially their run time and
space requirements. See
\url{http://en.wikipedia.org/wiki/Analysis_of_algorithms}.
\index{algorithm} \index{analysis of algorithms}

The practical goal of algorithm analysis is to predict the performance
of different algorithms in order to guide design decisions.

During the 2008 United States Presidential Campaign, candidate
Barack Obama was asked to perform an impromptu analysis when
he visited Google. Chief executive Eric Schmidt jokingly asked him
for ``the most efficient way to sort a million 32-bit integers.''
Obama had apparently been tipped off, because he quickly
replied, ``I think the bubble sort would be the wrong way to go.''
See \url{http://www.youtube.com/watch?v=k4RRi_ntQc8}.
\index{Obama, Barack}
\index{Schmidt, Eric}
\index{bubble sort}

This is true: bubble sort is conceptually simple but slow for
large datasets. The answer Schmidt was probably looking for is
``radix sort'' (\url{http://en.wikipedia.org/wiki/Radix_sort})\footnote{
But if you get a question like this in an interview, I think
a better answer is, ``The fastest way to sort a million integers
is to use whatever sort function is provided by the language
I'm using. Its performance is good enough for the vast majority
of applications, but if it turned out that my application was too
slow, I would use a profiler to see where the time was being
spent. If it looked like a faster sort algorithm would have
a significant effect on performance, then I would look
around for a good implementation of radix sort.''}.
\index{radix sort}

The goal of algorithm analysis is to make meaningful
comparisons between algorithms, but there are some problems:
\index{comparing algorithms}

\begin{itemize}

\item The relative performance of the algorithms might
depend on characteristics of the hardware, so one algorithm
might be faster on Machine A, another on Machine B.
The general solution to this problem is to specify a
{\bf machine model} and analyze the number of steps, or
operations, an algorithm requires under a given model.
\index{machine model}

\item Relative performance might depend on the details of
the dataset. For example, some sorting
algorithms run faster if the data are already partially sorted;
other algorithms run slower in this case.
A common way to avoid this problem is to analyze the
{\bf worst case} scenario. It is sometimes useful to
analyze average case performance, but that's usually harder,
and it might not be obvious what set of cases to average over.
\index{worst case}
\index{average case}

\item Relative performance also depends on the size of the
problem. A sorting algorithm that is fast for small lists
might be slow for long lists.
The usual solution to this problem is to express run time
(or number of operations) as a function of problem size,
and group functions into categories depending on how quickly
they grow as problem size increases.

\end{itemize}

The good thing about this kind of comparison is that it lends
itself to simple classification of algorithms. For example,
if I know that the run time of Algorithm A tends to be
proportional to the size of the input, $n$, and Algorithm B
tends to be proportional to $n^2$, then I
expect A to be faster than B, at least for large values of $n$.

This kind of analysis comes with some caveats, but we'll get
to those later.


\section{Order of growth}

Suppose you have analyzed two algorithms and expressed
their run times in terms of the size of the input:
Algorithm A takes $100n+1$ steps to solve a problem with
size $n$; Algorithm B takes $n^2 + n + 1$ steps.
\index{order of growth}

The following table shows the run time of these algorithms
for different problem sizes:

\begin{tabular}{|r|r|r|}
\hline
Input & Run time of & Run time of \\
size & Algorithm A & Algorithm B \\
\hline
10 & 1 001 & 111 \\
100 & 10 001 & 10 101 \\
1 000 & 100 001 & 1 001 001 \\
10 000 & 1 000 001 & 100 010 001 \\
\hline
\end{tabular}

At $n=10$, Algorithm A looks pretty bad; it takes almost 10 times
longer than Algorithm B. But for $n=100$ they are about the same, and
for larger values A is much better.
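
The entries in the table can be reproduced with a few lines of Python.
This is only a sketch: the function names \verb"steps_a" and
\verb"steps_b" are made up here, and the formulas are the ones given
above.

```python
def steps_a(n):
    """Number of steps for Algorithm A, as given in the text."""
    return 100 * n + 1

def steps_b(n):
    """Number of steps for Algorithm B, as given in the text."""
    return n**2 + n + 1

# Print one row per problem size: n, steps for A, steps for B.
for n in [10, 100, 1000, 10000]:
    print(n, steps_a(n), steps_b(n))
```

Adding more sizes to the list is an easy way to see how quickly the
gap between the two columns widens.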

The fundamental reason is that for large values of $n$, any function
that contains an $n^2$ term will grow faster than a function whose
leading term is $n$. The {\bf leading term} is the term with the
highest exponent.
\index{leading term}
\index{exponent}

For Algorithm A, the leading term has a large coefficient, 100, which
is why B does better than A for small $n$. But regardless of the
coefficients, there will always be some value of $n$ beyond which
$a n^2 > b n$, for any positive values of $a$ and $b$.
\index{leading coefficient}

The same argument applies to the non-leading terms. Even if the run
time of Algorithm A were $n+1000000$, it would still be better than
Algorithm B for sufficiently large $n$.

In general, we expect an algorithm with a smaller leading term to be a
better algorithm for large problems, but for smaller problems, there
may be a {\bf crossover point} where another algorithm is better. The
location of the crossover point depends on the details of the
algorithms, the inputs, and the hardware, so it is usually ignored for
purposes of algorithmic analysis. But that doesn't mean you can forget
about it.
\index{crossover point}
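
For the two step counts used in this section, the crossover point can
be found by brute force. The following sketch assumes the formulas
$100n+1$ and $n^2+n+1$ from the text; the names \verb"steps_a",
\verb"steps_b", and \verb"find_crossover" are made up for the example.

```python
def steps_a(n):
    return 100 * n + 1

def steps_b(n):
    return n**2 + n + 1

def find_crossover(f, g, limit=10**6):
    """Return the smallest n >= 1 where g(n) > f(n), or None
    if there is no such n below limit."""
    for n in range(1, limit):
        if g(n) > f(n):
            return n
    return None

crossover = find_crossover(steps_a, steps_b)
print(crossover)
```

With different coefficients the crossover moves, which is the reason
it is left out of the order of growth but still matters in practice.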

If two algorithms have the same leading order term, it is hard to say
which is better; again, the answer depends on the details. So for
algorithmic analysis, functions with the same leading term
are considered equivalent, even if they have different coefficients.

An {\bf order of growth} is a set of functions whose growth
behavior is considered equivalent. For example, $2n$, $100n$ and $n+1$
belong to the same order of growth, which is written $O(n)$ in
{\bf Big-Oh notation} and often called {\bf linear} because every function
in the set grows linearly with $n$.
\index{big-oh notation}
\index{linear growth}

All functions with the leading term $n^2$ belong to $O(n^2)$; they are
called {\bf quadratic}.
\index{quadratic growth}

The following table shows some of the orders of growth that
appear most commonly in algorithmic analysis,
in increasing order of badness.
\index{badness}

\begin{tabular}{|r|r|}
\hline
Order of & Name \\
growth & \\
\hline
$O(1)$ & constant \\
$O(\log_b n)$ & logarithmic (for any $b$) \\
$O(n)$ & linear \\
$O(n \log_b n)$ & linearithmic \\
$O(n^2)$ & quadratic \\
$O(n^3)$ & cubic \\
$O(c^n)$ & exponential (for any $c > 1$) \\
\hline
\end{tabular}

For the logarithmic terms, the base of the logarithm doesn't matter;
changing bases is the equivalent of multiplying by a constant, which
doesn't change the order of growth. Exponential functions are
different: changing the base does change the order of growth (for
example, $3^n$ is not in $O(2^n)$), but all of these functions are
called exponential.
Exponential functions grow very quickly, so exponential algorithms are
only useful for small problems.
\index{logarithmic growth}
\index{exponential growth}
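
To see why the base of a logarithm does not matter, recall the
change-of-base identity. For any bases $a > 1$ and $b > 1$,
\[
\log_b n = \frac{\log_a n}{\log_a b} ,
\]
so $\log_b n$ and $\log_a n$ differ only by the constant factor
$1 / \log_a b$. For example,
$\log_{10} n = \log_2 n / \log_2 10 \approx 0.301 \log_2 n$.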


\begin{exercise}

Read the Wikipedia page on Big-Oh notation at
\url{http://en.wikipedia.org/wiki/Big_O_notation} and
answer the following questions:

\begin{enumerate}
\item What is the order of growth of $n^3 + n^2$?
What about $1000000 n^3 + n^2$?
What about $n^3 + 1000000 n^2$?

\item What is the order of growth of $(n^2 + n) \cdot (n + 1)$? Before
you start multiplying, remember that you only need the leading term.

\item If $f$ is in $O(g)$, for some unspecified function $g$, what can
we say about $af+b$, where $a$ and $b$ are constants?

\item If $f_1$ and $f_2$ are in $O(g)$, what can we say about $f_1 + f_2$?

\item If $f_1$ is in $O(g)$
and $f_2$ is in $O(h)$,
what can we say about $f_1 + f_2$?

\item If $f_1$ is in $O(g)$ and $f_2$ is in $O(h)$,
what can we say about $f_1 \cdot f_2$?
\end{enumerate}

\end{exercise}

Programmers who care about performance often find this kind of
analysis hard to swallow. They have a point: sometimes the
coefficients and the non-leading terms make a real difference.
Sometimes the details of the hardware, the programming language, and
the characteristics of the input make a big difference. And for small
problems, order of growth is irrelevant.

But if you keep those caveats in mind, algorithmic analysis is a
useful tool. At least for large problems, the ``better'' algorithm
is usually better, and sometimes it is {\em much} better. The
difference between two algorithms with the same order of growth is
usually a constant factor, but the difference between a good algorithm
and a bad algorithm is unbounded!


\section{Analysis of basic Python operations}

In Python, most arithmetic operations are constant time;
multiplication usually takes longer than addition and subtraction, and
division takes even longer, but these run times don't depend on the
magnitude of the operands. Very large integers are an exception; in
that case the run time increases with the number of digits.
\index{analysis of primitives}

Indexing operations---reading or writing elements in a sequence
or dictionary---are also constant time, regardless of the size
of the data structure.
\index{indexing}

A {\tt for} loop that traverses a sequence or dictionary is
usually linear, as long as all of the operations in the body
of the loop are constant time. For example, adding up the
elements of a list is linear:

\begin{verbatim}
total = 0
for x in t:
    total += x
\end{verbatim}

The built-in function {\tt sum} is also linear because it does
the same thing, but it tends to be faster because it is a more
efficient implementation; in the language of algorithmic analysis,
it has a smaller leading coefficient.
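
One way to check this claim on your own machine is to time both
versions with the standard {\tt timeit} module. The following is a
rough sketch; the name \verb"loop_sum" and the list size are made up,
and the measured times depend on the hardware and the Python version,
so no particular output is shown.

```python
import timeit

t = list(range(100000))

def loop_sum(t):
    """Add up the elements of t with an explicit loop."""
    total = 0
    for x in t:
        total += x
    return total

# Run each version 20 times and report the elapsed seconds.
loop_time = timeit.timeit(lambda: loop_sum(t), number=20)
builtin_time = timeit.timeit(lambda: sum(t), number=20)

print(loop_sum(t) == sum(t))
print(loop_time, builtin_time)
```

If you double the length of {\tt t}, both times should roughly double,
which is what linear growth means; only the ratio between them
reflects the leading coefficients.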

As a rule of thumb, if the body of a loop is in $O(n^a)$ then
the whole loop is in $O(n^{a+1})$. The exception is if you can
show that the loop exits after a constant number of iterations.
If a loop runs $k$ times regardless of $n$, then
the loop is in $O(n^a)$, even for large $k$.

Multiplying by $k$ doesn't change the order of growth, but neither
does dividing. So if the body of a loop is in $O(n^a)$ and it runs
$n/k$ times, the loop is in $O(n^{a+1})$, even for large $k$.
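
These rules can be checked by counting steps directly. In the sketch
below the body of each outer loop is a linear inner loop ($a = 1$);
the function names and the values of $k$ are made up for the example.

```python
def count_nested(n):
    """Outer loop runs n times: n * n steps, quadratic."""
    steps = 0
    for i in range(n):
        for j in range(n):
            steps += 1
    return steps

def count_fixed(n, k=1000):
    """Outer loop runs k times regardless of n: k * n steps, linear."""
    steps = 0
    for i in range(k):
        for j in range(n):
            steps += 1
    return steps

def count_strided(n, k=10):
    """Outer loop runs about n/k times: about n * n / k steps,
    still quadratic."""
    steps = 0
    for i in range(0, n, k):
        for j in range(n):
            steps += 1
    return steps

for n in [100, 200]:
    print(n, count_nested(n), count_fixed(n), count_strided(n))
```

Doubling $n$ doubles the count for \verb"count_fixed" but quadruples
it for the other two, even though \verb"count_strided" does the least
work of the three at these sizes.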

Most string and tuple operations are linear, except indexing and {\tt
len}, which are constant time. The built-in functions {\tt min} and
{\tt max} are linear. The run time of a slice operation is
proportional to the length of the output, but independent of the size
of the input.
\index{string methods}
\index{tuple methods}

String concatenation is linear; the run time depends on the sum
of the lengths of the operands.
\index{string concatenation}

All string methods are linear, but if the lengths of
the strings are bounded by a constant---for example, operations on single
characters---they are considered constant time.
The string method {\tt join} is linear; the run time depends on
the total length of the strings.
\index{join@{\tt join}}
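
One consequence is that building a long string by concatenating in a
loop can be quadratic overall, while {\tt join} stays linear. The
sketch below counts copied characters under a simple model in which
every concatenation copies both operands; the function names are made
up, and some Python implementations optimize this pattern, so actual
timings may look better than the model suggests.

```python
def concat_loop(words):
    """Build a string with repeated concatenation."""
    result = ''
    for w in words:
        result = result + w     # makes a new string from both operands
    return result

def chars_copied(words):
    """Count the characters copied by concat_loop, assuming every
    concatenation copies both operands."""
    copied = 0
    length = 0
    for w in words:
        length += len(w)        # length of the new result
        copied += length
    return copied

words = ['x'] * 1000
print(concat_loop(words) == ''.join(words))
print(chars_copied(words))      # grows like n**2 / 2
print(len(''.join(words)))      # join copies each character once
```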

Most list methods are linear, but there are some exceptions:
\index{list methods}

\begin{itemize}

\item Adding an element to the end of a list is constant time on
average; when it runs out of room it occasionally gets copied
to a bigger location, but the total time for $n$ operations
is $O(n)$, so the average time for each
operation is $O(1)$.

\item Removing an element from the end of a list is constant time.

\item Sorting is $O(n \log n)$.
\index{sorting}

\end{itemize}
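
The argument about adding to the end of a list can be illustrated
with a simplified model in which the underlying array doubles its
capacity whenever it is full. This is an assumption made for the
sketch: Python's lists use a different growth pattern, and the name
\verb"count_copies" is made up.

```python
def count_copies(n):
    """Simulate n appends to an array that doubles when full.

    Returns the total number of element copies caused by resizing.
    """
    capacity = 1
    size = 0
    copies = 0
    for _ in range(n):
        if size == capacity:
            copies += size      # move every existing element
            capacity *= 2
        size += 1
    return copies

for n in [10, 1000, 100000]:
    print(n, count_copies(n), count_copies(n) / n)
```

In this model the total number of copies stays below $2n$, so the
average cost per append is bounded by a constant even though a single
append is occasionally expensive.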

Most dictionary operations and methods are constant time, but
there are some exceptions:
\index{dictionary methods}

\begin{itemize}

\item The run time of {\tt update} is
proportional to the size of the dictionary passed as a parameter,
not the dictionary being updated.

\item {\tt keys}, {\tt values} and {\tt items} are constant time because
they return view objects rather than new lists. But
if you loop through a view, the loop will be linear.
\index{iterator}

\end{itemize}
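
A short sketch of the point about {\tt keys}, {\tt values} and
{\tt items}: in Python 3 these methods return view objects, so
calling them is cheap, but looping over the result visits every item.
The dictionary {\tt d} is made up for the example.

```python
d = {'a': 1, 'b': 2}

keys = d.keys()          # no copy of the keys is made here
d['c'] = 3               # the view reflects later changes to d
print(list(keys))

total = 0
for v in d.values():     # looping over a view is linear
    total += v
print(total)
```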

The performance of dictionaries is one of the minor miracles of
computer science. We will see how they work in
Section~\ref{hashtable}.


\begin{exercise}

Read the Wikipedia page on sorting algorithms at
\url{http://en.wikipedia.org/wiki/Sorting_algorithm} and answer
the following questions:
\index{sorting}

\begin{enumerate}

\item What is a ``comparison sort?'' What is the best worst-case order
of growth for a comparison sort? What is the best worst-case order
of growth for any sort algorithm?
\index{comparison sort}

\item What is the order of growth of bubble sort, and why does Barack
Obama think it is ``the wrong way to go?''

\item What is the order of growth of radix sort? What preconditions
do we need to use it?

\item What is a stable sort and why might it matter in practice?
\index{stable sort}

\item What is the worst sorting algorithm (that has a name)?

\item What sort algorithm does the C library use? What sort algorithm
does Python use? Are these algorithms stable? You might have to
Google around to find these answers.

\item Many of the non-comparison sorts are linear, so why does
Python use an $O(n \log n)$ comparison sort?

\end{enumerate}

\end{exercise}


\section{Analysis of search algorithms}

A {\bf search} is an algorithm that takes a collection and a target
item and determines whether the target is in the collection, often
returning the index of the target.
\index{search}

The simplest search algorithm is a ``linear search'', which traverses
the items of the collection in order, stopping if it finds the target.
In the worst case it has to traverse the entire collection, so the run
time is linear.
\index{linear search}

The {\tt in} operator for sequences uses a linear search; so do string
methods like {\tt find} and {\tt count}.
\index{in@{\tt in} operator}

If the elements of the sequence are in order, you can use a {\bf
bisection search}, which is $O(\log n)$. Bisection search is
similar to the algorithm you might use to look a word up in a
dictionary (a paper dictionary, not the data structure). Instead of
starting at the beginning and checking each item in order, you start
with the item in the middle and check whether the word you are looking
for comes before or after. If it comes before, then you search the
first half of the sequence. Otherwise you search the second half.
Either way, you cut the number of remaining items in half.
\index{bisection search}

If the sequence has 1,000,000 items, it will take about 20 steps to
find the word or conclude that it's not there. So that's about 50,000
times faster than a linear search.
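
Here is a minimal sketch of a bisection search that also counts how
many items it examines, so you can check the figure of about 20 steps.
The function name \verb"bisect_search" is made up; the standard
library also provides a {\tt bisect} module for this purpose.

```python
def bisect_search(t, target):
    """Search the sorted list t for target.

    Returns (index, steps): index is the position of target, or
    None if it is absent; steps is the number of items examined.
    """
    low, high = 0, len(t) - 1
    steps = 0
    while low <= high:
        mid = (low + high) // 2
        steps += 1
        if t[mid] == target:
            return mid, steps
        elif t[mid] < target:
            low = mid + 1       # target can only be in the upper half
        else:
            high = mid - 1      # target can only be in the lower half
    return None, steps

t = list(range(1000000))
index, steps = bisect_search(t, 999999)
missing, missing_steps = bisect_search(t, -1)
print(index, steps)
print(missing, missing_steps)
```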

Bisection search can be much faster than linear search, but
it requires the sequence to be in order, which might require
extra work.

There is another data structure, called a {\bf hashtable}, that
is even faster---it can do a search in constant time---and it
doesn't require the items to be sorted. Python dictionaries
are implemented using hashtables, which is why most dictionary
operations, including the {\tt in} operator, are constant time.


\section{Hashtables}
\label{hashtable}

To explain how hashtables work and why their performance is so
good, I start with a simple implementation of a map and
gradually improve it until it's a hashtable.
\index{hashtable}

I use Python to demonstrate these implementations, but in real
life you wouldn't write code like this in Python; you would just use a
dictionary! So for the rest of this chapter, you have to imagine that
dictionaries don't exist and you want to implement a data structure
that maps from keys to values. The operations you have to
implement are:

\begin{description}

\item[{\tt add(k, v)}:] Add a new item that maps from key {\tt k}
to value {\tt v}. With a Python dictionary, {\tt d}, this operation
is written {\tt d[k] = v}.

\item[{\tt get(k)}:] Look up and return the value that corresponds
to key {\tt k}. With a Python dictionary, {\tt d}, this operation
is written {\tt d[k]} or {\tt d.get(k)}.

\end{description}

For now, I assume that each key only appears once.
The simplest implementation of this interface uses a list of
tuples, where each tuple is a key-value pair.
\index{LinearMap@{\tt LinearMap}}

\begin{verbatim}
class LinearMap:

    def __init__(self):
        self.items = []

    def add(self, k, v):
        self.items.append((k, v))

    def get(self, k):
        for key, val in self.items:
            if key == k:
                return val
        raise KeyError
\end{verbatim}

{\tt add} appends a key-value tuple to the list of items, which
takes constant time.

{\tt get} uses a {\tt for} loop to search the list:
if it finds the target key it returns the corresponding value;
otherwise it raises a {\tt KeyError}.
So {\tt get} is linear.
\index{KeyError@{\tt KeyError}}

An alternative is to keep the list sorted by key. Then {\tt get}
could use a bisection search, which is $O(\log n)$. But inserting a
new item in the middle of a list is linear, so this might not be the
best option. There are other data structures that can implement {\tt
add} and {\tt get} in log time, but that's still not as good as
constant time, so let's move on.
\index{red-black tree}
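
For comparison, here is a sketch of the sorted-list alternative using
the standard {\tt bisect} module. The class name {\tt SortedMap} and
its two parallel lists are made up for this example, and it assumes
that the keys can be compared with each other.

```python
import bisect

class SortedMap:
    """Hypothetical variant of LinearMap that keeps keys sorted."""

    def __init__(self):
        self.keys = []
        self.vals = []

    def add(self, k, v):
        # Finding the position is O(log n), but list.insert is linear.
        i = bisect.bisect_left(self.keys, k)
        self.keys.insert(i, k)
        self.vals.insert(i, v)

    def get(self, k):
        # Bisection search: O(log n).
        i = bisect.bisect_left(self.keys, k)
        if i < len(self.keys) and self.keys[i] == k:
            return self.vals[i]
        raise KeyError(k)

m = SortedMap()
for k, v in [('c', 3), ('a', 1), ('b', 2)]:
    m.add(k, v)
print(m.keys, m.get('b'))
```

The lookup is faster than in {\tt LinearMap}, but {\tt add} is now
linear, which is the tradeoff described above.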

One way to improve {\tt LinearMap} is to break the list of key-value
pairs into smaller lists. Here's an implementation called
{\tt BetterMap}, which is a list of 100 LinearMaps. As we'll see
in a second, the order of growth for {\tt get} is still linear,
but {\tt BetterMap} is a step on the path toward hashtables:
\index{BetterMap@{\tt BetterMap}}

\begin{verbatim}
class BetterMap:

    def __init__(self, n=100):
        self.maps = []
        for i in range(n):
            self.maps.append(LinearMap())

    def find_map(self, k):
        index = hash(k) % len(self.maps)
        return self.maps[index]

    def add(self, k, v):
        m = self.find_map(k)
        m.add(k, v)

    def get(self, k):
        m = self.find_map(k)
        return m.get(k)
\end{verbatim}

\verb"__init__" makes a list of {\tt n} {\tt LinearMap}s.

\verb"find_map" is used by
{\tt add} and {\tt get}
to figure out which map to put the
new item in, or which map to search.

\verb"find_map" uses the built-in function {\tt hash}, which takes
almost any Python object and returns an integer. A limitation of this
implementation is that it only works with hashable keys. Mutable
types like lists and dictionaries are unhashable.
\index{hash function}

Hashable objects that are considered equivalent return the same hash
value, but the converse is not necessarily true: two objects with
different values can return the same hash value.
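
These properties of {\tt hash} are easy to check in the interpreter.
The following sketch uses only built-in types; the collision it shows
is a detail of the CPython implementation and might not hold in other
implementations.

```python
# Objects that compare equal must have equal hash values.
same = hash(1) == hash(1.0) == hash(True)

# Mutable built-in types such as lists are unhashable.
try:
    hash([1, 2, 3])
    list_hashable = True
except TypeError:
    list_hashable = False

# Different values may share a hash value.  In CPython, for example,
# hash(-1) and hash(-2) are both -2 (an implementation detail).
collision = hash(-1) == hash(-2)

print(same, list_hashable, collision)
```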

\verb"find_map" uses the modulus operator to wrap the hash values
into the range from 0 to {\tt len(self.maps) - 1}, so the result is a
legal index into the list. Of course, this means that many different
hash values will wrap onto the same index. But if the hash function
spreads things out pretty evenly (which is what hash functions
are designed to do), then we expect $n/100$ items per LinearMap,
where $n$ is the total number of items.
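
Here is a rough way to check that claim. The keys are made-up strings
and \verb"num_maps" is a name chosen for this sketch; the expression
inside {\tt Counter} is the same computation \verb"find_map" performs.
Because Python randomizes string hashes from one run to the next, the
exact counts vary, so the example only prints the extremes.

\begin{verbatim}
from collections import Counter

num_maps = 100        # same default as BetterMap
keys = ['key%d' % i for i in range(10000)]    # hypothetical keys

# Count how many keys land on each index.
counts = Counter(hash(k) % num_maps for k in keys)

# Every index is a legal position in a list of num_maps items.
assert all(0 <= index < num_maps for index in counts)
assert sum(counts.values()) == len(keys)

# We expect about 10000 / 100 = 100 keys per index.
print(min(counts.values()), max(counts.values()))
\end{verbatim}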

Since the run time of {\tt LinearMap.get} is proportional to the
number of items, we expect {\tt BetterMap} to be about 100 times faster
than {\tt LinearMap}. The order of growth is still linear, but the
leading coefficient is smaller. That's nice, but still not
as good as a hashtable.

Here (finally) is the crucial idea that makes hashtables fast: if you
can keep the maximum length of the LinearMaps bounded, {\tt
LinearMap.get} is constant time. All you have to do is keep track
of the number of items and, when the number of
items per LinearMap exceeds a threshold, resize the hashtable by
adding more LinearMaps.
\index{bounded}

Here is an implementation of a hashtable:
\index{HashMap}

\begin{verbatim}
class HashMap:

    def __init__(self):
        self.maps = BetterMap(2)
        self.num = 0

    def get(self, k):
        return self.maps.get(k)

    def add(self, k, v):
        if self.num == len(self.maps.maps):
            self.resize()

        self.maps.add(k, v)
        self.num += 1

    def resize(self):
        new_maps = BetterMap(self.num * 2)

        for m in self.maps.maps:
            for k, v in m.items:
                new_maps.add(k, v)

        self.maps = new_maps
\end{verbatim}

\verb"__init__" creates a {\tt BetterMap} and initializes {\tt num},
which keeps track of the number of items.

{\tt get} just dispatches to {\tt BetterMap}. The real work happens
in {\tt add}, which checks the number of items and the size of the
{\tt BetterMap}: if they are equal, the average number of items per
LinearMap is 1, so it calls {\tt resize}.

{\tt resize} makes a new {\tt BetterMap}, twice as big as the previous
one, and then ``rehashes'' the items from the old map to the new.

Rehashing is necessary because changing the number of LinearMaps
changes the divisor of the modulus operator in
\verb"find_map". That means that some objects that used
to hash into the same LinearMap will get split up (which is
what we wanted, right?).
\index{rehashing}
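
To watch the resizing happen, here is a self-contained sketch that
puts the pieces together. The {\tt LinearMap} class is my
reconstruction of the one from earlier in this chapter, so treat it as
an assumption; {\tt BetterMap} and {\tt HashMap} are the classes shown
above.

\begin{verbatim}
class LinearMap:
    """Assumed to match the LinearMap defined earlier."""

    def __init__(self):
        self.items = []

    def add(self, k, v):
        self.items.append((k, v))

    def get(self, k):
        for key, val in self.items:
            if key == k:
                return val
        raise KeyError


class BetterMap:

    def __init__(self, n=100):
        self.maps = []
        for i in range(n):
            self.maps.append(LinearMap())

    def find_map(self, k):
        index = hash(k) % len(self.maps)
        return self.maps[index]

    def add(self, k, v):
        m = self.find_map(k)
        m.add(k, v)

    def get(self, k):
        m = self.find_map(k)
        return m.get(k)


class HashMap:

    def __init__(self):
        self.maps = BetterMap(2)
        self.num = 0

    def get(self, k):
        return self.maps.get(k)

    def add(self, k, v):
        if self.num == len(self.maps.maps):
            self.resize()

        self.maps.add(k, v)
        self.num += 1

    def resize(self):
        new_maps = BetterMap(self.num * 2)

        for m in self.maps.maps:
            for k, v in m.items:
                new_maps.add(k, v)

        self.maps = new_maps


m = HashMap()
for i, letter in enumerate('abcdefghij'):
    m.add(letter, i)

print(m.get('c'))           # 2
print(m.num)                # 10
print(len(m.maps.maps))     # 16
\end{verbatim}

After ten adds the map has resized three times, from 2 LinearMaps to
4, then 8, then 16.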

Rehashing is linear, so
{\tt resize} is linear, which might seem bad, since I promised
that {\tt add} would be constant time. But remember that
we don't have to resize every time, so {\tt add} is usually
constant time and only occasionally linear. The total amount
of work to run {\tt add} $n$ times is proportional to $n$,
so the average time of each {\tt add} is constant!
\index{constant time}

To see how this works, think about starting with an empty
{\tt HashMap} and adding a sequence of items. We start with 2 LinearMaps,
so the first 2 adds are fast (no resizing required). Let's
say that they take one unit of work each. The next add
requires a resize, so we have to rehash the first two
items (let's call that 2 more units of work) and then
add the third item (one more unit). Adding the next item
costs 1 unit, so the total so far is
6 units of work for 4 items.

The next {\tt add} costs 5 units, but the next three
are only one unit each, so the total is 14 units for the
first 8 adds.

The next {\tt add} costs 9 units, but then we can add 7 more
before the next resize, so the total is 30 units for the
first 16 adds.

After 32 adds, the total cost is 62 units, and I hope you are starting
to see a pattern. After $n$ adds, where $n$ is a power of two, the
total cost is $2n-2$ units, so the average work per add is
a little less than 2 units. When $n$ is a power of two, that's
the best case; for other values of $n$ the average work is a little
higher, but that's not important. The important thing is that it
is $O(1)$.
\index{average cost}
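
The pattern can be checked with a short calculation, sketched here
under the same cost model (one unit per add, plus one unit per rehashed
item). Suppose $n = 2^k$ with $k \ge 1$; the symbol $k$ is introduced
only for this derivation. During the first $n$ adds, the map resizes
when it holds $2, 4, \ldots, 2^{k-1}$ items, and each resize rehashes
all of them, so the total cost is
\[ n + \sum_{i=1}^{k-1} 2^i = n + (2^k - 2) = 2n - 2 . \]
For $n=4$ that gives 6 units, and for $n=8$ it gives 14, which agrees
with the totals above.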

Figure~\ref{fig.hash} shows how this works graphically. Each
block represents a unit of work. The columns show the total
work for each add in order from left to right: the first two
{\tt add}s cost 1 unit each, the third costs 3 units, etc.

\begin{figure}
\centerline{\includegraphics[width=5.5in]{figs/towers.pdf}}
\caption{The cost of a hashtable add.\label{fig.hash}}
\end{figure}

The extra work of rehashing appears as a sequence of increasingly
tall towers with increasing space between them. Now if you knock
over the towers, spreading the cost of resizing over all
adds, you can see graphically that the total cost after $n$
adds is $2n - 2$.
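
If you would rather count than knock over towers, here is a small
simulation of the same cost model. The function \verb"add_costs" is a
hypothetical helper written for this sketch; it does not store any
items, it only tracks how much work each {\tt add} would do.

\begin{verbatim}
def add_costs(n, initial_size=2):
    """Cost of each of n adds under the unit-cost model.

    Each add costs 1 unit, plus 1 unit per item rehashed when
    the table is full and has to double.
    """
    size = initial_size
    costs = []
    for num in range(n):        # num items are already stored
        cost = 1
        if num == size:         # same test as HashMap.add
            cost += num         # rehash every existing item
            size = num * 2
        costs.append(cost)
    return costs

print(add_costs(8))             # [1, 1, 3, 1, 5, 1, 1, 1]

for n in [4, 8, 16, 32]:
    assert sum(add_costs(n)) == 2 * n - 2
\end{verbatim}

The list printed for 8 adds corresponds to the heights of the first
eight columns in the figure.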

An important feature of this algorithm is that when we resize the
hashtable it grows geometrically; that is, we multiply the size by a
constant. If you increase the size
arithmetically---adding a fixed number each time---the average time
per {\tt add} is linear.
\index{geometric resizing}
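
A quick experiment makes the difference concrete. This sketch uses the
same unit-cost model as before; \verb"total_cost", {\tt double} and
\verb"add_two" are hypothetical names, and growing by 2 is just one
example of arithmetic growth.

\begin{verbatim}
def total_cost(n, grow):
    """Total work for n adds; grow(size) returns the new size."""
    size = 2
    total = 0
    for num in range(n):
        if num == size:         # table is full: rehash everything
            total += num
            size = grow(size)
        total += 1              # the add itself
    return total

def double(size):
    return size * 2             # geometric growth

def add_two(size):
    return size + 2             # arithmetic growth

for n in [64, 128, 256]:
    geometric = total_cost(n, double)
    arithmetic = total_cost(n, add_two)
    print(n, geometric / n, arithmetic / n)
\end{verbatim}

With doubling, the average cost per {\tt add} stays just under 2 units.
With a fixed increment, the average roughly doubles every time $n$
doubles, which is what linear growth looks like.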

You can download my implementation of HashMap from
\url{http://thinkpython2.com/code/Map.py}, but remember that there
is no reason to use it; if you want a map, just use a Python dictionary.

\section{Glossary}

\begin{description}

\item[analysis of algorithms:] A way to compare algorithms in terms of
their run time and/or space requirements.
\index{analysis of algorithms}

\item[machine model:] A simplified representation of a computer used
to describe algorithms.
\index{machine model}

\item[worst case:] The input that makes a given algorithm run slowest (or
require the most space).
\index{worst case}

\item[leading term:] In a polynomial, the term with the highest exponent.
\index{leading term}

\item[crossover point:] The problem size where two algorithms require
the same run time or space.
\index{crossover point}

\item[order of growth:] A set of functions that all grow in a way
considered equivalent for purposes of analysis of algorithms.
For example, all functions that grow linearly belong to the same
order of growth.
\index{order of growth}

\item[Big-Oh notation:] Notation for representing an order of growth;
for example, $O(n)$ represents the set of functions that grow
linearly.
\index{Big-Oh notation}

\item[linear:] An algorithm whose run time is proportional to
problem size, at least for large problem sizes.
\index{linear}

\item[quadratic:] An algorithm whose run time is proportional to
$n^2$, where $n$ is a measure of problem size.
\index{quadratic}

\item[search:] The problem of locating an element of a collection
(like a list or dictionary) or determining that it is not present.
\index{search}

\item[hashtable:] A data structure that represents a collection of
key-value pairs and performs search in constant time.
\index{hashtable}

\end{description}


\printindex

\clearemptydoublepage
%\blankpage
%\blankpage
%\blankpage

\end{document}