% LaTeX source for ``Think DSP: Digital Signal Processing for Programmers''
% Copyright 2014 Allen B. Downey.

% License: Creative Commons Attribution-NonCommercial 3.0 Unported License.
% http://creativecommons.org/licenses/by-nc/3.0/
%

\documentclass[12pt]{book}
\usepackage[width=5.5in,height=8.5in,
hmarginratio=3:2,vmarginratio=1:1]{geometry}

% for some of these packages, you might have to install
% texlive-latex-extra (in Ubuntu)

\usepackage[T1]{fontenc}
\usepackage{textcomp}
\usepackage{mathpazo}
\usepackage{url}
\usepackage{graphicx}
\usepackage{subfig}
\usepackage{amsmath}
\usepackage{amsthm}
\usepackage{makeidx}
\usepackage{setspace}
\usepackage{hevea}
\usepackage{upquote}
\usepackage{fancyhdr}
\usepackage[bookmarks]{hyperref}

\title{Think DSP}
\author{Allen B. Downey}

\newcommand{\thetitle}{Think DSP: Digital Signal Processing in Python}
\newcommand{\theversion}{0.10.4}

% these styles get translated in CSS for the HTML version
\newstyle{a:link}{color:black;}
\newstyle{p+p}{margin-top:1em;margin-bottom:1em}
\newstyle{img}{border:0px}

% change the arrows in the HTML version
\setlinkstext
{\imgsrc[ALT="Previous"]{back.png}}
{\imgsrc[ALT="Up"]{up.png}}
{\imgsrc[ALT="Next"]{next.png}}

\makeindex

\newif\ifplastex
\plastexfalse
\begin{document}

\frontmatter

\ifplastex

\else
\fi

\ifplastex
\usepackage{localdef}
\maketitle

\else

\newtheoremstyle{exercise}% name of the style to be used
{\topsep}% measure of space to leave above the theorem. E.g.: 3pt
{\topsep}% measure of space to leave below the theorem. E.g.: 3pt
{}% name of font to use in the body of the theorem
{0pt}% measure of space to indent
{\bfseries}% name of head font
{}% punctuation between head and body
{ }% space after theorem head; " " = normal interword space
{}% Manually specify head

\theoremstyle{exercise}
\newtheorem{exercise}{Exercise}[chapter]

\input{latexonly}

\begin{latexonly}

\renewcommand{\blankpage}{\thispagestyle{empty} \quad \newpage}

% TITLE PAGES FOR LATEX VERSION

%-half title--------------------------------------------------
\thispagestyle{empty}

\begin{flushright}
\vspace*{2.0in}

\begin{spacing}{3}
{\huge Think DSP}\\
{\Large Digital Signal Processing in Python}
\end{spacing}

\vspace{0.25in}

Version \theversion

\vfill

\end{flushright}

%--verso------------------------------------------------------

\blankpage
\blankpage

%--title page--------------------------------------------------
\pagebreak
\thispagestyle{empty}

\begin{flushright}
\vspace*{2.0in}

\begin{spacing}{3}
{\huge Think DSP}\\
{\Large Digital Signal Processing in Python}
\end{spacing}

\vspace{0.25in}

Version \theversion

\vspace{1in}

{\Large
Allen B. Downey\\
}

\vspace{0.5in}

{\Large Green Tea Press}

{\small Needham, Massachusetts}

\vfill

\end{flushright}

%--copyright--------------------------------------------------
\pagebreak
\thispagestyle{empty}

Copyright \copyright ~2014 Allen B. Downey.

\vspace{0.2in}

\begin{flushleft}
Green Tea Press \\
9 Washburn Ave \\
Needham MA 02492
\end{flushleft}

Permission is granted to copy, distribute, and/or modify this document
under the terms of the Creative Commons Attribution-NonCommercial 3.0 Unported
License, which is available at \url{http://creativecommons.org/licenses/by-nc/3.0/}.

\vspace{0.2in}

\end{latexonly}

% HTMLONLY

\begin{htmlonly}

% TITLE PAGE FOR HTML VERSION

{\Large \thetitle}

{\large Allen B. Downey}

Version \theversion

\vspace{0.25in}

Copyright 2012 Allen B. Downey

\vspace{0.25in}

Permission is granted to copy, distribute, and/or modify this document
under the terms of the Creative Commons Attribution-NonCommercial 3.0
Unported License, which is available at
\url{http://creativecommons.org/licenses/by-nc/3.0/}.

\setcounter{chapter}{-1}

\end{htmlonly}

\fi
% END OF THE PART WE SKIP FOR PLASTEX

\chapter{Preface}
\label{preface}

Signal processing is one of my favorite topics. It is useful
in many areas of science and engineering, and if you understand
the fundamental ideas, it provides insight into many things
we see in the world, and especially the things we hear.

But unless you studied electrical or mechanical engineering, you
probably haven't had a chance to learn about signal processing. The
problem is that most books (and the classes that use them) present the
material bottom-up, starting with mathematical abstractions like
phasors. And they tend to be theoretical, with few applications and
little apparent relevance.

The premise of this book is that if you know how to program, you
can use that skill to learn other things, and have fun doing it.

With a programming-based approach, I can present the most important
ideas right away. By the end of the first chapter, you can analyze
sound recordings and other signals, and generate new sounds. Each
chapter introduces a new technique and an application you can
apply to real signals. At each step you learn how to use a
technique first, and then how it works.

This approach is more practical and, I hope you'll agree, more fun.

\section{Who is this book for?}

The examples and supporting code for this book are in Python. You
should know core Python and you should be
familiar with object-oriented features, at least using objects if not
defining your own.

If you are not already familiar with Python, you might want to start
with my other book, {\it Think Python}, which is an introduction to
Python for people who have never programmed, or Mark
Lutz's {\it Learning Python}, which might be better for people with
programming experience.

I use NumPy and SciPy extensively. If you are familiar with them
already, that's great, but I will also explain the functions
and data structures I use.

I assume that the reader knows basic mathematics, including complex
numbers. You don't need much calculus; if you understand the concepts
of integration and differentiation, that will do.
I use some linear algebra, but I will explain it as we
go along.

\section{Using the code}
\label{code}

The code and sound samples used in this book are available from
\url{https://github.com/AllenDowney/ThinkDSP}. Git is a version
control system that allows you to keep track of the files that
make up a project. A collection of files under Git's control is
called a {\bf repository}. GitHub is a hosting service that provides
storage for Git repositories and a convenient web interface.
\index{repository}
\index{Git}
\index{GitHub}

The GitHub homepage for my repository provides several ways to
work with the code:

\begin{itemize}

\item You can create a copy of my repository
on GitHub by pressing the {\sf Fork} button. If you don't already
have a GitHub account, you'll need to create one. After forking, you'll
have your own repository on GitHub that you can use to keep track
of code you write while working on this book. Then you can
clone the repo, which means that you copy the files
to your computer.
\index{fork}

\item Or you could clone
my repository. You don't need a GitHub account to do this, but you
won't be able to write your changes back to GitHub.
\index{clone}

\item If you don't want to use Git at all, you can download the files
in a Zip file using the button in the lower-right corner of the
GitHub page.

\end{itemize}

All of the code is written to work in both Python 2 and Python 3
with no translation.

I developed this book using Anaconda from
Continuum Analytics, which is a free Python distribution that includes
all the packages you'll need to run the code (and lots more).
I found Anaconda easy to install. By default it does a user-level
installation, not system-level, so you don't need administrative
privileges. And it supports both Python 2 and Python 3. You can
download Anaconda from \url{http://continuum.io/downloads}.
\index{Anaconda}

If you don't want to use Anaconda, you will need the following
packages:

\begin{itemize}

\item NumPy for basic numerical computation, \url{http://www.numpy.org/};
\index{NumPy}

\item SciPy for scientific computation,
\url{http://www.scipy.org/};
\index{SciPy}

\item matplotlib for visualization, \url{http://matplotlib.org/}.
\index{matplotlib}

\end{itemize}

Although these are commonly used packages, they are not included with
all Python installations, and they can be hard to install in some
environments. If you have trouble installing them, I
recommend using Anaconda or one of the other Python distributions
that include these packages.
\index{installation}

Most exercises use Python scripts, but some also use the IPython
notebook. If you have not used the IPython notebook before, I suggest
you start with the documentation at
\url{http://ipython.org/ipython-doc/stable/notebook/notebook.html}.
\index{IPython}

Good luck, and have fun!

\section*{Contributor List}

If you have a suggestion or correction, please send email to
{\tt downey@allendowney.com}. If I make a change based on your
feedback, I will add you to the contributor list
(unless you ask to be omitted).
\index{contributors}

If you include at least part of the sentence the
error appears in, that makes it easy for me to search. Page and
section numbers are fine, too, but not as easy to work with.
Thanks!

\small

\begin{itemize}

\item Before I started writing, my thoughts about this book
benefited from conversations with Boulos Harb at Google and
Aurelio Ramos, formerly at Harmonix Music Systems.

\item During the Fall 2013 semester, Nathan Lintz and Ian Daniher
worked with me on an independent study project and helped me with
the first draft of this book.

\item On Reddit's DSP forum, the anonymous user RamjetSoundwave
helped me fix a problem with my implementation of Brownian Noise.
And andodli found a typo.

\item In Spring 2015 I had the pleasure of teaching this material
along with Prof. Oscar Mur-Miranda and Prof. Siddhartan Govindasamy.
Both made many suggestions and corrections.

\item Silas Gyger corrected an arithmetic error.

% ENDCONTRIB

\end{itemize}

\normalsize

\clearemptydoublepage

% TABLE OF CONTENTS
\begin{latexonly}

\tableofcontents

\clearemptydoublepage

\end{latexonly}

% START THE BOOK
\mainmatter


\chapter{Sounds and signals}
\label{sounds}

A {\bf signal} represents a quantity that varies in time,
or space, or both. That definition is pretty abstract, so let's start
with a concrete example: sound. Sound is variation in air pressure.
A sound signal represents variations in air pressure over time.

A microphone is a device that measures these variations and generates
an electrical signal that represents sound. A speaker is a device
that takes an electrical signal and produces sound.
Microphones and speakers are called {\bf transducers} because they
transduce, or convert, signals from one form to another.

This book is about signal processing, which includes processes for
synthesizing, transforming, and analyzing signals. I will focus on
sound signals, but the same methods apply to electronic signals,
mechanical vibration, and signals in many other domains.

They also apply to signals that vary in space rather than time, like
elevation along a hiking trail. And they apply to signals in more
than one dimension, like an image, which you can think of as a signal
that varies in two-dimensional space. Or a movie, which is
a signal that varies in two-dimensional space {\it and} time.

But we start with simple one-dimensional sound.

The code for this chapter is in {\tt chap01.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp01}.

\section{Periodic signals}
\label{violin}

\begin{figure}
% sounds.py
\centerline{\includegraphics[height=2.5in]{figs/sounds1.eps}}
\caption{Segment from a recording of a bell.}
\label{fig.sounds1}
\end{figure}

We'll start with {\bf periodic signals}, which are signals that
repeat themselves after some period of time. For example, if you
strike a bell, it vibrates and generates sound. If you record
that sound and plot the transduced signal, it looks like
Figure~\ref{fig.sounds1}.

This signal resembles a {\bf sinusoid}, which means it has the same
shape as the trigonometric sine function.

You can see that this signal is periodic. I chose the duration
to show three full periods, also known as {\bf cycles}.
The duration of each cycle is about 2.3 ms.

The {\bf frequency} of a signal is the number of cycles
per second, which is the inverse of the period.
The units of frequency are cycles per second, or {\bf Hertz},
abbreviated ``Hz''.

The frequency of this signal is about 439 Hz, slightly lower than 440
Hz, which is the standard tuning pitch for orchestral music. The
musical name of this note is A, or more specifically, A4. If you are
not familiar with ``scientific pitch notation'', the numerical suffix
indicates which octave the note is in. A4 is the A above middle C.
A5 is one octave higher. See
\url{http://en.wikipedia.org/wiki/Scientific_pitch_notation}.

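Since frequency is the inverse of the period, the two numbers quoted above are easy to check against each other. Here is a quick sketch in plain Python (not part of {\tt thinkdsp}):

```python
# Frequency is the inverse of the period, so a 439 Hz tone
# should have a period close to the 2.3 ms cycles seen in the plot.
freq = 439                # cycles per second (Hz)
period = 1 / freq         # seconds per cycle
print(period * 1000)      # period in milliseconds: about 2.28
```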
\begin{figure}
% sounds.py
\centerline{\includegraphics[height=2.5in]{figs/sounds2.eps}}
\caption{Segment from a recording of a violin.}
\label{fig.sounds2}
\end{figure}

A tuning fork generates a sinusoid because the vibration of the tines
is a form of simple harmonic motion. Most musical instruments
produce periodic signals, but the shape of these signals is not
sinusoidal. For example, Figure~\ref{fig.sounds2} shows a segment
from a recording of a violin playing
Boccherini's String Quintet No. 5 in E, 3rd
movement.

%\footnote{The recording is from
% \url{http://www.freesound.org/people/jcveliz/sounds/92002/}.
%I identified the piece using \url{http://www.musipedia.org}.}
% Parson's code: DUUDDUURDR

Again we can see that the signal is periodic, but the shape of the
signal is more complex. The shape of a periodic signal is called
the {\bf waveform}. Most musical instruments produce waveforms more
complex than a sinusoid. The shape of the waveform determines the
musical {\bf timbre}, which is our perception of the quality of the
sound. People usually perceive complex waveforms as rich, warm and
more interesting than sinusoids.

\section{Spectral decomposition}

\begin{figure}
% sounds.py
\centerline{\includegraphics[height=2.5in]{figs/sounds3.eps}}
\caption{Spectrum of a segment from the violin recording.}
\label{fig.sounds3}
\end{figure}

The most important topic in this book is {\bf spectral decomposition},
which is the idea that any signal can be expressed as the sum of
sinusoids with different frequencies.

The most important mathematical idea in this book is the {\bf discrete
Fourier transform}, or {\bf DFT}, which takes a signal and produces
its {\bf spectrum}. The spectrum is the set of sinusoids that add up to
produce the signal.

And the most important algorithm in this book is the {\bf Fast
Fourier transform}, or {\bf FFT}, which is an efficient way to
compute the DFT.

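As a preview, NumPy's FFT functions can already do this. The sketch below uses plain NumPy (not the {\tt thinkdsp} interface introduced later) to find the dominant frequency of a pure 440 Hz tone:

```python
import numpy as np

framerate = 11025                        # samples per second
ts = np.arange(framerate) / framerate    # one second of sample times
ys = np.cos(2 * np.pi * 440 * ts)        # a 440 Hz cosine signal

hs = np.fft.rfft(ys)                          # complex spectrum
fs = np.fft.rfftfreq(len(ys), 1 / framerate)  # matching frequencies in Hz

# the strongest component is at the frequency of the tone
print(fs[np.argmax(np.abs(hs))])         # 440.0
```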
For example, Figure~\ref{fig.sounds3} shows the spectrum of the violin
recording in Figure~\ref{fig.sounds2}. The x-axis is the range of
frequencies that make up the signal. The y-axis shows the strength
or {\bf amplitude} of each frequency component.

The lowest frequency component is called the {\bf fundamental
frequency}. The fundamental frequency of this signal is near 440 Hz
(actually a little lower, or ``flat'').

In this signal the fundamental frequency has the largest amplitude,
so it is also the {\bf dominant frequency}.
Normally the perceived pitch of a sound is determined by the
fundamental frequency, even if it is not dominant.

The other spikes in the spectrum are at frequencies 880, 1320, 1760, and
2200, which are integer multiples of the fundamental.
These components are called {\bf harmonics} because they are
musically harmonious with the fundamental:

\begin{itemize}

\item 880 is the frequency of A5, one octave higher than the fundamental.

\item 1320 is approximately E6, which is a major fifth above A5.
If you are not familiar with musical intervals like ``major fifth'', see
\url{https://en.wikipedia.org/wiki/Interval_(music)}.

\item 1760 is A6, two octaves above the fundamental.

\item 2200 is approximately C$\sharp$7, which is a major third
above A6.

\end{itemize}

These harmonics make up the notes of an A major
chord, although not all in the same octave. Some of them are only
approximate because the notes that make up Western music have been
adjusted for {\bf equal temperament} (see
\url{http://en.wikipedia.org/wiki/Equal_temperament}).

Given the harmonics and their amplitudes, you can reconstruct the
signal by adding up sinusoids.
Next we'll see how.

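As a preview, the reconstruction idea can be sketched with plain NumPy: sum one cosine per harmonic. The frequencies below are the harmonics listed above; the amplitudes are invented for illustration:

```python
import numpy as np

framerate = 11025
ts = np.arange(framerate) / framerate    # one second of sample times

# fundamental plus harmonics; the amplitudes here are made up
components = [(440, 1.0), (880, 0.5), (1320, 0.3), (1760, 0.2), (2200, 0.1)]
ys = sum(amp * np.cos(2 * np.pi * freq * ts) for freq, amp in components)
```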
\section{Signals}

I wrote a Python module called {\tt thinkdsp.py} that contains classes
and functions for working with signals and spectrums\footnote{The
plural of ``spectrum'' is often written ``spectra'', but I prefer
to use standard English plurals. If you are familiar with ``spectra'',
I hope my choice doesn't sound too strange.}. You
will find it in the repository for this book (see Section~\ref{code}).

To represent signals, {\tt thinkdsp} provides a class called
{\tt Signal}, which is the parent class for several signal types,
including {\tt Sinusoid}, which represents both sine and cosine
signals.

{\tt thinkdsp} provides functions to create sine and cosine signals:

\begin{verbatim}
cos_sig = thinkdsp.CosSignal(freq=440, amp=1.0, offset=0)
sin_sig = thinkdsp.SinSignal(freq=880, amp=0.5, offset=0)
\end{verbatim}

{\tt freq} is frequency in Hz. {\tt amp} is amplitude in unspecified
units where 1.0 is defined as the largest amplitude we can record or
play back.

{\tt offset} is a {\bf phase offset} in radians. Phase offset
determines where in the period the signal starts. For example, a
sine signal with {\tt offset=0} starts at $\sin 0$, which is 0.
With {\tt offset=pi/2} it starts at $\sin \pi/2$, which is 1. A cosine
signal with {\tt offset=0} starts at $\cos 0$, which is 1. In fact, a cosine signal
with {\tt offset=0} is identical to a sine signal with {\tt
offset=pi/2}.

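You can verify that identity directly with NumPy (a quick sketch, independent of {\tt thinkdsp}):

```python
import numpy as np

ts = np.linspace(0, 0.01, 100)     # a few milliseconds of sample times
phases = 2 * np.pi * 440 * ts      # phase of a 440 Hz signal

# sin with a phase offset of pi/2 matches cos with no offset
print(np.allclose(np.sin(phases + np.pi / 2), np.cos(phases)))   # True
```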
Signals have an \verb"__add__" method, so you can use the {\tt +}
operator to add them:

\begin{verbatim}
mix = sin_sig + cos_sig
\end{verbatim}

The result is a {\tt SumSignal}, which represents the sum of two
or more signals.

A Signal is basically a Python representation of a mathematical
function. Most signals are defined for all values of {\tt t},
from negative infinity to infinity.

You can't do much with a Signal until you evaluate it. In this
context, ``evaluate'' means taking a sequence of points in time, {\tt
ts}, and computing the corresponding values of the signal, {\tt ys}.
I represent {\tt ts} and {\tt ys} using NumPy arrays and encapsulate
them in an object called a Wave.

A Wave represents a signal evaluated at a sequence of points in
time. Each point in time is called a {\bf frame} (a term borrowed
from movies and video). The measurement itself is called a
{\bf sample}, although ``frame'' and ``sample'' are sometimes
used interchangeably.

{\tt Signal} provides \verb"make_wave", which returns a new
Wave object:

\begin{verbatim}
wave = mix.make_wave(duration=0.5, start=0, framerate=11025)
\end{verbatim}

{\tt duration} is the length of the Wave in seconds. {\tt start} is
the start time, also in seconds. {\tt framerate} is the (integer)
number of frames per second, which is also the number of samples
per second.

11,025 frames per second is one of several framerates commonly used in
audio file formats, including Waveform Audio File (WAV) and mp3.

This example evaluates the signal from {\tt t=0} to {\tt t=0.5} at
5,513 equally-spaced frames (half of 11,025, rounded up).
The time between frames, or {\bf timestep}, is {\tt 1/11025} seconds, or
about 91 $\mu$s.

{\tt Wave} provides a {\tt plot} method that uses {\tt pyplot}.
You can plot the wave like this:

\begin{verbatim}
wave.plot()
pyplot.show()
\end{verbatim}

{\tt pyplot} is part of {\tt matplotlib}; it is included in many
Python distributions, or you might have to install it.

\begin{figure}
% sounds4.py
\centerline{\includegraphics[height=2.5in]{figs/sounds4.eps}}
\caption{Segment from a mixture of two sinusoid signals.}
\label{fig.sounds4}
\end{figure}

At {\tt freq=440} there are 220 periods in 0.5 seconds, so this plot
would look like a solid block of color. To zoom in on a small number
of periods, we can use {\tt segment}, which copies a segment of a Wave
and returns a new wave:

\begin{verbatim}
period = mix.period
segment = wave.segment(start=0, duration=period*3)
\end{verbatim}

{\tt period} is a property of a Signal; it returns the period in seconds.

{\tt start} and {\tt duration} are in seconds. This example copies
the first three periods from {\tt mix}. The result is a Wave object.

If we plot {\tt segment}, it looks like Figure~\ref{fig.sounds4}.
This signal contains two frequency components, so it is more
complicated than the signal from the tuning fork, but less complicated
than the violin.

\section{Reading and writing Waves}

{\tt thinkdsp} provides \verb"read_wave", which reads a WAV
file and returns a Wave:

\begin{verbatim}
violin_wave = thinkdsp.read_wave('input.wav')
\end{verbatim}

And {\tt Wave} provides {\tt write}, which writes a WAV file:

\begin{verbatim}
wave.write(filename='output.wav')
\end{verbatim}

You can listen to the Wave with any media player that plays WAV
files. On UNIX systems, I use {\tt aplay}, which is simple, robust,
and included in many Linux distributions.

{\tt thinkdsp} also provides \verb"play_wave", which runs
the media player as a subprocess:

\begin{verbatim}
thinkdsp.play_wave(filename='output.wav', player='aplay')
\end{verbatim}

It uses {\tt aplay} by default, but you can provide the
name of another player.

\section{Spectrums}
\label{spectrums}

{\tt Wave} provides \verb"make_spectrum", which returns a
{\tt Spectrum}:

\begin{verbatim}
spectrum = wave.make_spectrum()
\end{verbatim}

And {\tt Spectrum} provides {\tt plot}:

\begin{verbatim}
spectrum.plot()
thinkplot.show()
\end{verbatim}

{\tt thinkplot} is a module I wrote to provide wrappers around some of
the functions in {\tt pyplot}. It is included in the
Git repository for this book (see Section~\ref{code}).

{\tt Spectrum} provides three methods that modify the spectrum:

\begin{itemize}

\item \verb"low_pass" applies a low-pass filter, which means that
components above a given cutoff frequency are attenuated (that is,
reduced in magnitude) by a factor.

\item \verb"high_pass" applies a high-pass filter, which means that
it attenuates components below the cutoff.

\item \verb"band_stop" attenuates components in the band of
frequencies between two cutoffs.

\end{itemize}

This example attenuates all frequencies above 600 Hz by 99\%:

\begin{verbatim}
spectrum.low_pass(cutoff=600, factor=0.01)
\end{verbatim}

A low pass filter removes bright, high-frequency sounds, so
the result sounds muffled and darker. To hear what it sounds
like, you can convert the Spectrum back to a Wave, and then play it.

\begin{verbatim}
wave = spectrum.make_wave()
wave.play('temp.wav')
\end{verbatim}

The {\tt play} method writes the wave to a file and then plays it.
If you use IPython notebooks, you can use \verb"make_audio", which
makes an Audio widget that plays the sound.

\section{Wave objects}

\begin{figure}
% http://yuml.me/edit/5294b377
% pdftops -eps diagram1.pdf
\centerline{\includegraphics[width=3.5in]{figs/diagram1.eps}}
\caption{Relationships among the classes in {\tt thinkdsp}.}
\label{fig.diagram1}
\end{figure}

There is nothing very complicated in {\tt thinkdsp.py}. Most
of the functions it provides are thin wrappers around functions
from NumPy and SciPy.

The primary classes in {\tt thinkdsp} are Signal, Wave, and Spectrum.
Given a Signal, you can make a Wave. Given a Wave, you can
make a Spectrum, and vice versa. These relationships are shown
in Figure~\ref{fig.diagram1}.

A Wave object contains three attributes: {\tt ys} is a NumPy array
that contains the values in the signal; {\tt ts} is an array of the
times where the signal was evaluated or sampled; and {\tt
framerate} is the number of samples per unit of time. The
unit of time is usually seconds, but it doesn't have to be. In
one of my examples, it's days.

Wave also provides three read-only properties: {\tt start},
{\tt end}, and {\tt duration}. If you modify {\tt ts}, these
properties change accordingly.

To modify a wave, you can access the {\tt ts} and {\tt ys} directly.
For example:

\begin{verbatim}
wave.ys *= 2
wave.ts += 1
\end{verbatim}

The first line scales the wave by a factor of 2, making
it louder. The second line shifts the wave in time, making it
start 1 second later.

But Wave provides methods that perform many common operations.
For example, the same two transformations could be written:

\begin{verbatim}
wave.scale(2)
wave.shift(1)
\end{verbatim}

You can read the documentation of these methods and others at
\url{http://think-dsp.com/thinkdsp.html}.

\section{Signal objects}
\label{sigobs}

Signal is a parent class that provides functions common to all
kinds of signals, like \verb"make_wave". Child classes inherit
these methods and provide {\tt evaluate}, which evaluates the
signal at a given sequence of times.

For example, Sinusoid is a child class of Signal, with this
definition:

\begin{verbatim}
class Sinusoid(Signal):

    def __init__(self, freq=440, amp=1.0, offset=0, func=np.sin):
        Signal.__init__(self)
        self.freq = freq
        self.amp = amp
        self.offset = offset
        self.func = func
\end{verbatim}

The parameters of \verb"__init__" are:

\begin{itemize}

\item {\tt freq}: frequency in cycles per second, or Hz.

\item {\tt amp}: amplitude. The units of amplitude are arbitrary,
usually chosen so 1.0 corresponds to the maximum input from a
microphone or maximum output to a speaker.

\item {\tt offset}: indicates where in its period the signal starts;
{\tt offset} is in units of radians, for reasons I explain below.

\item {\tt func}: a Python function used
to evaluate the signal at a particular point in time. It is
usually either {\tt np.sin} or {\tt np.cos}, yielding a sine or
cosine signal.

\end{itemize}

Like many init methods, this one just tucks the parameters away for
future use.

Signal provides \verb"make_wave", which looks like
this:

\begin{verbatim}
def make_wave(self, duration=1, start=0, framerate=11025):
    n = round(duration * framerate)
    ts = start + np.arange(n) / framerate
    ys = self.evaluate(ts)
    return Wave(ys, ts, framerate=framerate)
\end{verbatim}

{\tt start} and {\tt duration} are the start time and duration
in seconds. {\tt framerate} is the number of frames (samples)
per second.

{\tt n} is the number of samples, and {\tt ts} is a NumPy array
of sample times.

To compute the {\tt ys}, \verb"make_wave" invokes {\tt evaluate},
which is provided by {\tt Sinusoid}:

\begin{verbatim}
def evaluate(self, ts):
    phases = PI2 * self.freq * ts + self.offset
    ys = self.amp * self.func(phases)
    return ys
\end{verbatim}

Let's unwind this function one step at a time:

\begin{enumerate}

\item {\tt self.freq} is frequency in cycles per second, and each
element of {\tt ts} is a time in seconds, so their product is the
number of cycles since the start time.

\item {\tt PI2} is a constant that stores $2 \pi$. Multiplying by
{\tt PI2} converts from cycles to {\bf phase}. You can think of
phase as ``cycles since the start time'' expressed in radians. Each
cycle is $2 \pi$ radians.

\item {\tt self.offset} is the phase when $t=0$.
It has the effect of shifting the signal left or right in time.

\item If {\tt self.func} is {\tt np.sin} or {\tt np.cos}, the result is a
value between $-1$ and $+1$.

\item Multiplying by {\tt self.amp} yields a signal that ranges from
{\tt -self.amp} to {\tt +self.amp}.

\end{enumerate}

In math notation, {\tt evaluate} is written like this:
%
\[ y = A \cos (2 \pi f t + \phi_0) \]
%
where $A$ is amplitude, $f$ is frequency, $t$ is time, and $\phi_0$
is the phase offset. It may seem like I wrote a lot of code
to evaluate one simple expression, but as we'll see, this code
provides a framework for dealing with all kinds of signals, not
just sinusoids.

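The same expression can be evaluated directly with NumPy, outside the class (a minimal sketch; the parameter values here are arbitrary):

```python
import numpy as np

A, f, phi0 = 0.5, 440, 0                 # amplitude, frequency, phase offset
framerate = 11025
ts = np.arange(framerate) / framerate    # one second of sample times

ys = A * np.cos(2 * np.pi * f * ts + phi0)

# as the enumeration above predicts, the result ranges from -A to +A
print(np.abs(ys).max() <= A)             # True
```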
\section{Exercises}
925
926
Before you begin these exercises, you should download the code
927
for this book, following the instructions in Section~\ref{code}.
928
929
Solutions to these exercises are in {\tt chap01soln.ipynb}.
930
931
\begin{exercise}
932
If you have IPython, load {\tt chap01.ipynb}, read through it, and run
933
the examples. You can also view this notebook at
934
\url{http://tinyurl.com/thinkdsp01}.
935
\end{exercise}
936
937
938
\begin{exercise}
939
Go to \url{http://freesound.org} and download a sound sample that
940
includes music, speech, or other sounds that have a well-defined pitch.
941
Select a roughly half-second segment where the pitch is
942
constant. Compute and plot the spectrum of the segment you selected.
943
What connection can you make between the timbre of the sound and the
944
harmonic structure you see in the spectrum?
945
946
Use \verb"high_pass", \verb"low_pass", and \verb"band_stop" to
947
filter out some of the harmonics. Then convert the spectrum back
948
to a wave and listen to it. How does the sound relate to the
949
changes you made in the spectrum?
950
\end{exercise}
951
952
953
\begin{exercise}
954
Synthesize a compound signal by creating SinSignal and CosSignal
955
objects and adding them up. Evaluate the signal to get a Wave,
956
and listen to it. Compute its Spectrum and plot it.
957
What happens if you add frequency
958
components that are not multiples of the fundamental?
959
\end{exercise}
960
961
962
\begin{exercise}
963
Write a function called {\tt stretch} that takes a Wave and a stretch
964
factor and speeds up or slows down the wave by modifying {\tt ts} and
965
{\tt framerate}. Hint: it should only take two lines of code.
966
\end{exercise}


\chapter{Harmonics}
\label{harmonics}

In this chapter I present several new waveforms; we will look at
their spectrums to understand their {\bf harmonic structure}, which is
the set of sinusoids they are made up of.

I'll also introduce one of the most important phenomena in digital
signal processing: aliasing. And I'll explain a little more about how
the Spectrum class works.

The code for this chapter is in {\tt chap02.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp02}.


\section{Triangle waves}
\label{triangle}

A sinusoid contains only one frequency component, so its spectrum
has only one peak. More complicated waveforms, like the
violin recording, yield DFTs with many peaks. In this section we
investigate the relationship between waveforms and their spectrums.

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/triangle-200-1.eps}}
\caption{Segment of a triangle signal at 200 Hz.}
\label{fig.triangle.200.1}
\end{figure}

I'll start with a triangle waveform, which is like a straight-line
version of a sinusoid. Figure~\ref{fig.triangle.200.1} shows a
triangle waveform with frequency 200 Hz.

To generate a triangle wave, you can use {\tt thinkdsp.TriangleSignal}:

\begin{verbatim}
class TriangleSignal(Sinusoid):

    def evaluate(self, ts):
        cycles = self.freq * ts + self.offset / PI2
        frac, _ = np.modf(cycles)
        ys = np.abs(frac - 0.5)
        ys = normalize(unbias(ys), self.amp)
        return ys
\end{verbatim}

{\tt TriangleSignal} inherits \verb"__init__" from {\tt Sinusoid},
so it takes the same arguments: {\tt freq}, {\tt amp}, and {\tt offset}.

The only difference is {\tt evaluate}. As we saw before,
{\tt ts} is the sequence of sample times where we want to
evaluate the signal.

There are many ways to generate a triangle wave. The details
are not important, but here's how {\tt evaluate} works:

\begin{enumerate}

\item {\tt cycles} is the number of cycles since the start time.
{\tt np.modf} splits the number of cycles into the fraction
part, stored in {\tt frac}, and the integer part, which is
ignored\footnote{Using an underscore as a variable name is a convention that
means, ``I don't intend to use this value.''}.

\item {\tt frac} is a sequence that ramps from 0 to 1 with the given
frequency. Subtracting 0.5 yields values between -0.5 and 0.5.
Taking the absolute value yields a waveform that zig-zags between
0.5 and 0.

\item {\tt unbias} shifts the waveform down so it is centered at 0; then
{\tt normalize} scales it to the given amplitude, {\tt amp}.

\end{enumerate}
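
The steps above can be sketched without {\tt thinkdsp}; the {\tt unbias}
and {\tt normalize} functions here are minimal stand-ins for the
library's helpers, written for this illustration:

\begin{verbatim}
import numpy as np

def unbias(ys):
    # stand-in for thinkdsp's unbias: shift so the mean is 0
    return ys - ys.mean()

def normalize(ys, amp=1.0):
    # stand-in for thinkdsp's normalize: scale extremes to +/- amp
    return amp * ys / max(ys.max(), -ys.min())

def triangle(ts, freq=200, amp=1.0, offset=0.0):
    PI2 = 2 * np.pi
    cycles = freq * ts + offset / PI2
    frac, _ = np.modf(cycles)     # fractional part ramps from 0 to 1
    ys = np.abs(frac - 0.5)       # zig-zags between 0.5 and 0
    return normalize(unbias(ys), amp)

ts = np.arange(10000) / 10000.0   # one second at 10,000 frames per second
ys = triangle(ts, freq=200)
\end{verbatim}

The result zig-zags between {\tt -amp} and {\tt +amp} and is centered at 0.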

Here's the code that generates Figure~\ref{fig.triangle.200.1}:

\begin{verbatim}
signal = thinkdsp.TriangleSignal(200)
signal.plot()
\end{verbatim}

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/triangle-200-2.eps}}
\caption{Spectrum of a triangle signal at 200 Hz.}
\label{fig.triangle.200.2}
\end{figure}

Next we can use the Signal to make a Wave, and use the Wave to
make a Spectrum:

\begin{verbatim}
wave = signal.make_wave(duration=0.5, framerate=10000)
spectrum = wave.make_spectrum()
spectrum.plot()
\end{verbatim}

Figure~\ref{fig.triangle.200.2} shows the result. As expected, the
highest peak is at the fundamental frequency, 200 Hz, and there
are additional peaks at harmonic frequencies, which are integer
multiples of 200.

But one surprise is that there are no peaks at the even multiples:
400, 800, etc. The harmonics of a triangle wave are all
odd multiples of the fundamental frequency, in this example
600, 1000, 1400, etc.

Another feature of this spectrum is the relationship between the
amplitude and frequency of the harmonics. Their amplitude drops off
in proportion to frequency squared. For example, the frequency ratio
of the first two harmonics (200 and 600 Hz) is 3, and the amplitude
ratio is approximately 9. The frequency ratio of the next two
harmonics (600 and 1000 Hz) is 1.7, and the amplitude ratio is
approximately $1.7^2 = 2.9$. This relationship is called the
{\bf harmonic structure}.


\section{Square waves}
\label{square}

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/square-100-1.eps}}
\caption{Segment of a square signal at 100 Hz.}
\label{fig.square.100.1}
\end{figure}

{\tt thinkdsp} also provides {\tt SquareSignal}, which represents
a square signal. Here's the class definition:

\begin{verbatim}
class SquareSignal(Sinusoid):

    def evaluate(self, ts):
        cycles = self.freq * ts + self.offset / PI2
        frac, _ = np.modf(cycles)
        ys = self.amp * np.sign(unbias(frac))
        return ys
\end{verbatim}

Like {\tt TriangleSignal}, {\tt SquareSignal} inherits
\verb"__init__" from {\tt Sinusoid}, so it takes the same
parameters.

And the {\tt evaluate} method is similar. Again, {\tt cycles} is
the number of cycles since the start time, and {\tt frac} is the
fractional part, which ramps from 0 to 1 each period.

{\tt unbias} shifts {\tt frac} so it ramps from -0.5 to 0.5,
then {\tt np.sign} maps the negative values to -1 and the
positive values to 1. Multiplying by {\tt amp} yields a square
wave that jumps between {\tt -amp} and {\tt amp}.
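
Here is the same idea as a standalone sketch; as before, {\tt unbias}
is a minimal stand-in for the {\tt thinkdsp} helper:

\begin{verbatim}
import numpy as np

def unbias(ys):
    # stand-in for thinkdsp's unbias: shift so the mean is 0
    return ys - ys.mean()

def square(ts, freq=100, amp=1.0, offset=0.0):
    PI2 = 2 * np.pi
    cycles = freq * ts + offset / PI2
    frac, _ = np.modf(cycles)           # ramps from 0 to 1 each period
    return amp * np.sign(unbias(frac))  # -amp then +amp each period

ts = np.arange(10000) / 10000.0         # one second at 10,000 frames per second
ys = square(ts, freq=100, amp=0.5)
\end{verbatim}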

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/square-100-2.eps}}
\caption{Spectrum of a square signal at 100 Hz.}
\label{fig.square.100.2}
\end{figure}

Figure~\ref{fig.square.100.1} shows three periods of a square
wave with frequency 100 Hz,
and Figure~\ref{fig.square.100.2} shows its spectrum.

Like a triangle wave, the square wave contains only odd harmonics,
which is why there are peaks at 300, 500, and 700 Hz, etc.
But the amplitude of the harmonics drops off more slowly.
Specifically, amplitude drops in proportion to frequency (not frequency
squared).

The exercises at the end of this chapter give you a chance to
explore other waveforms and other harmonic structures.


\section{Aliasing}
\label{aliasing}

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/triangle-1100-2.eps}}
\caption{Spectrum of a triangle signal at 1100 Hz sampled at
10,000 frames per second.}
\label{fig.triangle.1100.2}
\end{figure}

I have a confession. I chose the examples in the previous section
carefully to avoid showing you something confusing. But now it's
time to get confused.

Figure~\ref{fig.triangle.1100.2} shows the spectrum of a triangle wave
at 1100 Hz, sampled at 10,000 frames per second. The harmonics
of this wave should be at 3300, 5500, 7700, and 9900 Hz.

In the figure, there are peaks at 1100 and 3300 Hz, as expected, but
the third peak is at 4500, not 5500 Hz. The
fourth peak is at 2300, not 7700 Hz. And if you look closely, the
peak that should be at 9900 is actually at 100 Hz. What's going on?

The problem is that when you evaluate the signal at
discrete points in time, you lose information about what happened
between samples. For low frequency components, that's not a
problem, because you have lots of samples per period.

But if you sample a signal at 5000 Hz with 10,000 frames per second,
you only have two samples per period. That turns out to be enough,
just barely, but if the frequency is higher, it's not.

To see why, let's generate cosine signals at 4500 and 5500 Hz,
and sample them at 10,000 frames per second:

\begin{verbatim}
framerate = 10000

signal = thinkdsp.CosSignal(4500)
duration = signal.period*5
segment = signal.make_wave(duration, framerate=framerate)
segment.plot()

signal = thinkdsp.CosSignal(5500)
segment = signal.make_wave(duration, framerate=framerate)
segment.plot()
\end{verbatim}

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/aliasing1.eps}}
\caption{Cosine signals at 4500 and 5500 Hz, sampled at 10,000 frames
per second. They are identical.}
\label{fig.aliasing1}
\end{figure}

Figure~\ref{fig.aliasing1} shows the result. I plotted the
samples using vertical lines, to make it easier to compare the
two Waves, and I offset them slightly in time. The problem
should be clear: the two Waves are exactly the same!

When we sample a 5500 Hz signal at 10,000 frames per second, the
result is indistinguishable from a 4500 Hz signal.
For the same reason, a 7700 Hz signal is indistinguishable
from 2300 Hz, and a 9900 Hz signal is indistinguishable from 100 Hz.
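
You can check this claim numerically with plain NumPy, without
{\tt thinkdsp}; the choice of 50 sample times is mine:

\begin{verbatim}
import numpy as np

framerate = 10000
ts = np.arange(50) / framerate   # 50 sample times at 10,000 frames per second

# At these sample times the two cosines take identical values, because
# cos(2*pi*5500*n/10000) = cos(2*pi*n - 2*pi*4500*n/10000).
ys_4500 = np.cos(2 * np.pi * 4500 * ts)
ys_5500 = np.cos(2 * np.pi * 5500 * ts)
print(np.allclose(ys_4500, ys_5500))   # True
\end{verbatim}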

This effect is called {\bf aliasing} because when the high frequency
signal is sampled, it appears to be a low frequency signal.

In this example, the highest frequency we can measure is 5000 Hz,
which is half the sampling rate. Frequencies above 5000 Hz are folded
back below 5000 Hz, which is why this threshold is sometimes called
the ``folding frequency''. It is sometimes also called the {\bf
Nyquist frequency}. See
\url{http://en.wikipedia.org/wiki/Nyquist_frequency}.

The folding pattern continues if the aliased frequency goes below
zero. For example, the 5th harmonic of the 1100 Hz triangle wave is
at 12,100 Hz. Folded at 5000 Hz, it would appear at -2100 Hz, but it
gets folded again at 0 Hz, back to 2100 Hz. In fact, you can see a
small peak at 2100 Hz in Figure~\ref{fig.triangle.1100.2}, and the next
one at 4300 Hz.
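
The folding arithmetic is easy to write down; {\tt alias} below is a
hypothetical helper for this illustration, not part of {\tt thinkdsp}:

\begin{verbatim}
def alias(freq, framerate=10000):
    """Return the apparent (folded) frequency of a sampled sinusoid."""
    folding = framerate / 2
    freq = freq % framerate       # aliases repeat every framerate Hz
    if freq > folding:
        freq = framerate - freq   # fold frequencies above the folding frequency
    return freq

print([alias(f) for f in [3300, 5500, 7700, 9900, 12100]])
# [3300, 4500, 2300, 100, 2100]
\end{verbatim}

This reproduces the peak locations in Figure~\ref{fig.triangle.1100.2}.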


\section{Computing the spectrum}

We have seen the Wave method \verb"make_spectrum" several times.
Here is the implementation (leaving out some details we'll get
to later):

\begin{verbatim}
from numpy.fft import rfft, rfftfreq

# class Wave:
    def make_spectrum(self):
        n = len(self.ys)
        d = 1 / self.framerate

        hs = rfft(self.ys)
        fs = rfftfreq(n, d)

        return Spectrum(hs, fs, self.framerate)
\end{verbatim}

The parameter {\tt self} is a Wave object. {\tt n} is the number
of samples in the wave, and {\tt d} is the inverse of the
frame rate, which is the time between samples.

{\tt np.fft} is the NumPy module that provides functions related
to the {\bf Fast Fourier Transform} (FFT), which is an efficient
algorithm that computes the Discrete Fourier Transform (DFT).

%TODO: add a forward reference to full fft
\verb"make_spectrum" uses {\tt rfft}, which stands for ``real
FFT'', because the Wave contains real values, not complex. Later
we'll see the full FFT, which can handle complex signals. The result
of {\tt rfft}, which I call {\tt hs}, is a NumPy array of complex
numbers that represents the amplitude and phase offset of each
frequency component in the wave.

The result of {\tt rfftfreq}, which I call {\tt fs}, is an array that
contains frequencies corresponding to the {\tt hs}.
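
A tiny example shows what {\tt rfft} and {\tt rfftfreq} return; the
particular framerate and signal here are mine, chosen to keep the
numbers small:

\begin{verbatim}
import numpy as np
from numpy.fft import rfft, rfftfreq

framerate = 8                       # 8 samples per second
n = 8
ts = np.arange(n) / framerate
ys = np.cos(2 * np.pi * 1 * ts)     # one cycle of a 1 Hz cosine

hs = rfft(ys)
fs = rfftfreq(n, d=1/framerate)

print(fs)                           # [0. 1. 2. 3. 4.]
print(np.abs(hs).round(6))          # only nonzero magnitude, n/2 = 4, at fs == 1
\end{verbatim}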

To understand the values in {\tt hs}, consider these two ways to think
about complex numbers:

\begin{itemize}

\item A complex number is the sum of a real part and an imaginary
part, often written $x + iy$, where $i$ is the imaginary unit,
$\sqrt{-1}$. You can think of $x$ and $y$ as Cartesian coordinates.

\item A complex number is also the product of a magnitude and a
complex exponential, $A e^{i \phi}$, where $A$ is the {\bf
magnitude} and $\phi$ is the {\bf angle} in radians, also called
the ``argument''. You can think of $A$ and $\phi$ as polar
coordinates.

\end{itemize}

Each value in {\tt hs} corresponds to a frequency component: its
magnitude is proportional to the amplitude of the corresponding
component; its angle is the phase offset.

The Spectrum class provides two read-only properties, {\tt amps}
and {\tt angles}, which return NumPy arrays representing the
magnitudes and angles of the {\tt hs}. When we plot a Spectrum
object, we usually plot {\tt amps} versus {\tt fs}. Sometimes
it is also useful to plot {\tt angles} versus {\tt fs}.

Although it might be tempting to look at the real and imaginary
parts of {\tt hs}, you will almost never need to. I encourage
you to think of the DFT as a vector of amplitudes and phase offsets
that happen to be encoded in the form of complex numbers.
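
NumPy's {\tt abs} and {\tt angle} convert between the two
representations; the particular number here is just an example:

\begin{verbatim}
import numpy as np

z = 3 + 4j              # a complex number in Cartesian form
A = np.abs(z)           # magnitude: sqrt(3**2 + 4**2) = 5.0
phi = np.angle(z)       # angle in radians

# reconstructing z from polar form recovers the Cartesian parts
print(np.isclose(A * np.exp(1j * phi), z))   # True
\end{verbatim}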

To modify a Spectrum, you can access the {\tt hs} directly.
For example:

\begin{verbatim}
spectrum.hs *= 2
spectrum.hs[spectrum.fs > cutoff] = 0
\end{verbatim}

The first line multiplies the elements of {\tt hs} by 2, which
doubles the amplitudes of all components. The second line
sets to 0 only the elements of {\tt hs} where the corresponding
frequency exceeds some cutoff frequency.

But Spectrum also provides methods to perform these operations:

\begin{verbatim}
spectrum.scale(2)
spectrum.low_pass(cutoff)
\end{verbatim}

You can read the documentation of these methods and others at
\url{http://think-dsp.com/thinkdsp.html}.

At this point you should have a better idea of how the Signal, Wave,
and Spectrum classes work, but I have not explained how the Fast
Fourier Transform works. That will take a few more chapters.


\section{Exercises}

Solutions to these exercises are in {\tt chap02soln.ipynb}.

\begin{exercise}
If you use IPython, load {\tt chap02.ipynb} and try out the examples.
You can also view the notebook at \url{http://tinyurl.com/thinkdsp02}.
\end{exercise}

\begin{exercise}
A sawtooth signal has a waveform that ramps up linearly from -1 to 1,
then drops to -1 and repeats. See
\url{http://en.wikipedia.org/wiki/Sawtooth_wave}.

Write a class called
{\tt SawtoothSignal} that extends {\tt Signal} and provides
{\tt evaluate} to evaluate a sawtooth signal.

Compute the spectrum of a sawtooth wave. How does the harmonic
structure compare to triangle and square waves?
\end{exercise}

\begin{exercise}
Make a square signal at 1100 Hz and make a wave that samples it
at 10000 frames per second. If you plot the spectrum, you can
see that most of the harmonics are aliased.
When you listen to the wave, can you hear the aliased harmonics?
\end{exercise}


\begin{exercise}
If you have a spectrum object, {\tt spectrum}, and print the
first few values of {\tt spectrum.fs}, you'll see that they
start at zero. So {\tt spectrum.hs[0]} is the magnitude
of the component with frequency 0. But what does that mean?

Try this experiment:

\begin{enumerate}

\item Make a triangle signal with frequency 440 and make
a Wave with duration 0.01 seconds. Plot the waveform.

\item Make a Spectrum object and print {\tt spectrum.hs[0]}.
What is the amplitude and phase of this component?

\item Set {\tt spectrum.hs[0] = 100}. Make a Wave from the
modified Spectrum and plot it. What effect does this operation
have on the waveform?

\end{enumerate}

\end{exercise}


\begin{exercise}
Write a function that takes a Spectrum as a parameter and modifies
it by dividing each element of {\tt hs} by the corresponding frequency
from {\tt fs}. Hint: since division by zero is undefined, you
might want to set {\tt spectrum.hs[0] = 0}.

Test your function using a square, triangle, or sawtooth wave.

\begin{enumerate}

\item Compute the Spectrum and plot it.

\item Modify the Spectrum using your function and plot it again.

\item Make a Wave from the modified Spectrum and listen to it. What
effect does this operation have on the signal?

\end{enumerate}

\end{exercise}

\begin{exercise}
Triangle and square waves have odd harmonics only; the sawtooth
wave has both even and odd harmonics. The harmonics of the
square and sawtooth waves drop off in proportion to $1/f$; the
harmonics of the triangle wave drop off like $1/f^2$. Can you
find a waveform that has even and odd harmonics that drop off
like $1/f^2$?

Hint: There are two ways you could approach this: you could
construct the signal you want by adding up sinusoids, or you
could start with a signal that is similar to what you want and
modify it.
\end{exercise}


\chapter{Non-periodic signals}
\label{nonperiodic}

The signals we have worked with so far are periodic, which means
that they repeat forever. It also means that the frequency
components they contain do not change over time.
In this chapter, we consider non-periodic signals,
whose frequency components {\em do} change over time.
In other words, pretty much all sound signals.

This chapter also presents spectrograms, a common way to visualize
non-periodic signals.

The code for this chapter is in {\tt chap03.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp03}.


\section{Linear chirp}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=2.5in]{figs/chirp3.eps}}
\caption{Chirp waveform near the beginning, middle, and end.}
\label{fig.chirp3}
\end{figure}

We'll start with a {\bf chirp}, which is a signal with variable
frequency. {\tt thinkdsp} provides a Signal called Chirp that
makes a sinusoid that sweeps linearly through a range of
frequencies.

Here's an example that sweeps from 220 to 880 Hz, which
is two octaves from A3 to A5:

\begin{verbatim}
signal = thinkdsp.Chirp(start=220, end=880)
wave = signal.make_wave()
\end{verbatim}

Figure~\ref{fig.chirp3} shows segments of this wave near the
beginning, middle, and end. It's clear that the frequency is
increasing.

Before we go on, let's see how Chirp is implemented. Here
is the class definition:

\begin{verbatim}
class Chirp(Signal):

    def __init__(self, start=440, end=880, amp=1.0):
        self.start = start
        self.end = end
        self.amp = amp
\end{verbatim}

{\tt start} and {\tt end} are the frequencies, in Hz, at the start
and end of the chirp. {\tt amp} is amplitude.

Here is the function that evaluates the signal:

\begin{verbatim}
    def evaluate(self, ts):
        freqs = np.linspace(self.start, self.end, len(ts)-1)
        return self._evaluate(ts, freqs)
\end{verbatim}

{\tt ts} is the sequence of points in time where the signal should be
evaluated; to keep this function simple, I assume they are equally spaced.

If the length of {\tt ts} is $n$, you can think of it as a sequence of
$n-1$ intervals of time. To compute the frequency during each
interval, I use {\tt np.linspace}, which returns a NumPy array of
$n-1$ values between {\tt start} and {\tt end}.

\verb"_evaluate" is a private method
that does the rest of the math\footnote{Beginning a method name
with an underscore makes it ``private'', indicating that it is not
part of the API that should be used outside the class definition.}:

\begin{verbatim}
    def _evaluate(self, ts, freqs):
        dts = np.diff(ts)
        dphis = PI2 * freqs * dts
        phases = np.cumsum(dphis)
        phases = np.insert(phases, 0, 0)
        ys = self.amp * np.cos(phases)
        return ys
\end{verbatim}

{\tt np.diff} computes the difference between adjacent elements
of {\tt ts}, returning the length of each interval in seconds.
If the elements of {\tt ts} are equally spaced,
the {\tt dts} are all the same.

The next step is to figure out how much the phase changes during
each interval. In Section~\ref{sigobs} we saw that when frequency is
constant, the phase, $\phi$, increases linearly over time:
%
\[ \phi = 2 \pi f t \]
%
When frequency is a function of time, the {\em change} in phase
during a short time interval, $\Delta t$, is:
%
\[ \Delta \phi = 2 \pi f(t) \Delta t \]
%
In Python, since {\tt freqs} contains $f(t)$ and {\tt dts}
contains the time intervals, we can write

\begin{verbatim}
        dphis = PI2 * freqs * dts
\end{verbatim}

Now, since {\tt dphis} contains the changes in phase, we can
get the total phase at each timestep by adding up the changes:

\begin{verbatim}
        phases = np.cumsum(dphis)
        phases = np.insert(phases, 0, 0)
\end{verbatim}

{\tt np.cumsum} computes the cumulative sum, which is almost
what we want, but it doesn't start at 0. So I use {\tt np.insert}
to add a 0 at the beginning.

The result is a NumPy array where the {\tt i}th element contains the
sum of the first {\tt i} terms from {\tt dphis}; that is, the total
phase at the end of the {\tt i}th interval. Finally, {\tt np.cos}
computes the amplitude of the wave as a function of phase (remember
that phase is expressed in radians).
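
A tiny example shows what the {\tt cumsum}/{\tt insert} combination
does; the phase changes here are made-up numbers for illustration:

\begin{verbatim}
import numpy as np

dphis = np.array([0.1, 0.2, 0.3])    # phase change over each interval, radians

phases = np.cumsum(dphis)            # [0.1, 0.3, 0.6]: running total, no leading 0
phases = np.insert(phases, 0, 0)     # [0.0, 0.1, 0.3, 0.6]: phase at each sample

print(phases)
\end{verbatim}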

If you know calculus, you might notice that the limit as
$\Delta t$ gets small is
%
\[ d\phi = 2 \pi f(t) dt \]
%
Dividing through by $dt$ yields
%
\[ \frac{d\phi}{dt} = 2 \pi f(t) \]
%
In other words, frequency is the derivative of phase. Conversely,
phase is the integral of frequency. When we used {\tt cumsum}
to go from frequency to phase, we were approximating integration.


\section{Exponential chirp}

When you listen to this chirp, you might notice that the pitch
rises quickly at first and then slows down.
The chirp spans two octaves, but it only takes 1/3 s to span
the first octave, and twice as long to span the second.

The reason is that our perception of pitch depends on the logarithm of
frequency. As a result, the {\bf interval} we hear between two notes
depends on the {\em ratio} of their frequencies, not the difference.
``Interval'' is the musical term for the perceived difference between
two pitches.

For example, an octave is an interval where the ratio of two
pitches is 2. So the interval from 220 to 440 is one octave
and the interval from 440 to 880 is also one octave. The difference
in frequency is bigger, but the ratio is the same.

As a result, if frequency increases linearly, as in a linear
chirp, the perceived pitch increases logarithmically.

If you want the perceived pitch to increase linearly, the frequency
has to increase exponentially. A signal with that shape is called
an {\bf exponential chirp}.

Here's the code that makes one:

\begin{verbatim}
class ExpoChirp(Chirp):

    def evaluate(self, ts):
        start, end = np.log10(self.start), np.log10(self.end)
        freqs = np.logspace(start, end, len(ts)-1)
        return self._evaluate(ts, freqs)
\end{verbatim}

Instead of {\tt np.linspace}, this version of evaluate uses
{\tt np.logspace}, which creates a series of frequencies
whose logarithms are equally spaced, which means that they increase
exponentially.
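
You can see the difference with a small example; the choice of five
frequencies is mine:

\begin{verbatim}
import numpy as np

# five frequencies from 220 to 880 Hz with equally spaced logarithms
freqs = np.logspace(np.log10(220), np.log10(880), 5)

ratios = freqs[1:] / freqs[:-1]
print(freqs.round(1))    # [220.  311.1 440.  622.3 880. ]
print(ratios.round(4))   # each step multiplies by the same factor, sqrt(2)
\end{verbatim}

With {\tt np.linspace}, the {\em differences} would be constant instead.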

That's it; everything else is the same as Chirp. Here's the code
that makes one:

\begin{verbatim}
signal = thinkdsp.ExpoChirp(start=220, end=880)
wave = signal.make_wave(duration=1)
\end{verbatim}

You can listen to these examples in {\tt chap03.ipynb} and compare
the linear and exponential chirps.


\section{Spectrum of a chirp}
\label{sauron}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=2.5in]{figs/chirp1.eps}}
\caption{Spectrum of a one-second one-octave chirp.}
\label{fig.chirp1}
\end{figure}

What do you think happens if you compute the spectrum of a chirp?
Here's an example that constructs a one-second, one-octave chirp and
its spectrum:

\begin{verbatim}
signal = thinkdsp.Chirp(start=220, end=440)
wave = signal.make_wave(duration=1)
spectrum = wave.make_spectrum()
\end{verbatim}

Figure~\ref{fig.chirp1} shows the result. The spectrum has
components at every frequency from 220 to 440 Hz, with variations
that look a little like the Eye of Sauron
(see \url{http://en.wikipedia.org/wiki/Sauron}).

The spectrum is approximately flat between 220 and 440 Hz, which
indicates that the signal spends equal time at each frequency in this
range. Based on that observation, you should be able to guess what
the spectrum of an exponential chirp looks like.

The spectrum gives hints about the structure of the signal,
but it obscures the relationship between frequency and time.
For example, we cannot tell by looking at this spectrum whether
the frequency went up or down, or both.


\section{Spectrogram}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=2.5in]{figs/chirp2.eps}}
\caption{Spectrogram of a one-second one-octave chirp.}
\label{fig.chirp2}
\end{figure}

To recover the relationship between frequency and time, we can break
the chirp into segments and plot the spectrum of each segment. The
result is called a {\bf short-time Fourier transform} (STFT).

There are several ways to visualize an STFT, but the most common
is a {\bf spectrogram}, which shows time on the x-axis and frequency
on the y-axis. Each column in the spectrogram shows the spectrum of
a short segment, using color or grayscale to represent amplitude.

As an example, I'll compute the spectrogram of this chirp:

\begin{verbatim}
signal = thinkdsp.Chirp(start=220, end=440)
wave = signal.make_wave(duration=1, framerate=11025)
\end{verbatim}

{\tt Wave} provides \verb"make_spectrogram", which returns a
{\tt Spectrogram} object:

\begin{verbatim}
spectrogram = wave.make_spectrogram(seg_length=512)
spectrogram.plot(high=700)
\end{verbatim}

\verb"seg_length" is the number of samples in each segment. I chose
512 because FFT is most efficient when the number of samples is a
power of 2.

Figure~\ref{fig.chirp2} shows the result. The x-axis shows time from
0 to 1 seconds. The y-axis shows frequency from 0 to 700 Hz. I cut
off the top part of the spectrogram; the full range goes to 5512.5 Hz,
which is half of the framerate.

The spectrogram shows clearly that frequency increases linearly
over time. Similarly, in the spectrogram of an exponential chirp, we
can see the shape of the exponential curve.

However, notice that the peak in each column is blurred across 2--3
cells. This blurring reflects the limited resolution of the
spectrogram.


\section{The Gabor limit}
\label{gabor}

The {\bf time resolution} of the spectrogram is the duration of the
segments, which corresponds to the width of the cells in the
spectrogram. Since each segment is 512 frames, and there are 11,025
frames per second, there are 0.046 seconds per segment.

The {\bf frequency resolution} is the frequency range between
elements in the spectrum, which corresponds to the height of the
cells. With 512 frames, we get 256 frequency components over a range
from 0 to 5512.5 Hz, so the range between components is 21.5 Hz.

More generally, if $n$ is the segment length, the spectrum contains
$n/2$ components. If the framerate is $r$, the maximum frequency in
the spectrum is $r/2$. So the time resolution is $n/r$ and the
frequency resolution is
%
\[ \frac{r/2}{n/2} \]
%
which is $r/n$.
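
Plugging in the numbers from this example confirms the values above:

\begin{verbatim}
# time and frequency resolution of a spectrogram, from the segment
# length n and framerate r used in this chapter
n = 512          # samples per segment
r = 11025        # frames per second

time_res = n / r     # seconds per segment
freq_res = r / n     # Hz between frequency components

print(round(time_res, 3))   # 0.046
print(round(freq_res, 1))   # 21.5
\end{verbatim}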

Ideally we would like time resolution to be small, so we can see rapid
changes in frequency. And we would like frequency resolution to be
small so we can see small changes in frequency. But you can't have
both. Notice that time resolution, $n/r$, is the inverse of frequency
resolution, $r/n$. So if one gets smaller, the other gets bigger.

For example, if you double the segment length, you cut frequency
resolution in half (which is good), but you double time resolution
(which is bad). Even increasing the framerate doesn't help. You get
more samples, but the range of frequencies increases at
the same time.

This tradeoff is called the {\bf Gabor limit} and it is a fundamental
limitation of this kind of time-frequency analysis.


\section{Leakage}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=2.5in]{figs/windowing1.eps}}
\caption{Spectrum of a periodic segment of a sinusoid (left), a
non-periodic segment (middle), a windowed non-periodic segment
(right).}
\label{fig.windowing1}
\end{figure}

In order to explain how \verb"make_spectrogram" works, I have
to explain windowing; and in order to explain windowing, I have to
show you the problem it is meant to address, which is leakage.

The Discrete Fourier Transform (DFT), which we use to compute
Spectrums, treats waves as if they are periodic; that is, it assumes
that the finite segment it operates on is a complete period from an
infinite signal that repeats over all time. In practice, this
assumption is often false, which creates problems.

One common problem is discontinuity at the beginning and end of the
segment. Because DFT assumes that the signal is periodic, it
implicitly connects the end of the segment back to the beginning to
make a loop. If the end does not connect smoothly to the beginning,
the discontinuity creates additional frequency components in the
segment that are not in the signal.

As an example, let's start with a sine signal that contains only
one frequency component at 440 Hz.

\begin{verbatim}
signal = thinkdsp.SinSignal(freq=440)
\end{verbatim}
1778
1779
If we select a segment that happens to be an integer multiple of
1780
the period, the end of the segment connects smoothly with the
1781
beginning, and DFT behaves well.
1782
1783
\begin{verbatim}
1784
duration = signal.period * 30
1785
wave = signal.make_wave(duration)
1786
spectrum = wave.make_spectrum()
1787
\end{verbatim}
1788
1789
Figure~\ref{fig.windowing1} (left) shows the result. As expected,
1790
there is a single peak at 440 Hz.
1791
1792
But if the duration is not a multiple of the period, bad things
1793
happen. With {\tt duration = signal.period * 30.25}, the signal
1794
starts at 0 and ends at 1.
1795
1796
Figure~\ref{fig.windowing1} (middle) shows
1797
the spectrum of this segment. Again, the peak is at 440 Hz, but now
1798
there are additional components spread out from 240 to 640 Hz. This
1799
spread is called ``spectral leakage'', because some of the energy that
1800
is actually at the fundamental frequency leaks into other frequencies.
1801
1802
In this example, leakage happens because we are using DFT on a segment
1803
that becomes discontinuous when we treat it as periodic.
1804
1805
1806
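The same effect can be reproduced with plain NumPy.  This is a
sketch, not thinkdsp's implementation; it uses 441 Hz rather than
440 Hz so that one period is exactly 25 samples at a framerate of
11,025 Hz, which makes the periodic case exact:

```python
import numpy as np

framerate = 11025
freq = 441                 # one period is exactly 25 samples
period = 1 / freq

def amp_spectrum(duration):
    """Amplitude spectrum of a sine segment with the given duration."""
    n = int(round(duration * framerate))
    ts = np.arange(n) / framerate
    ys = np.sin(2 * np.pi * freq * ts)
    return np.abs(np.fft.rfft(ys))

clean = amp_spectrum(period * 30)     # integer number of periods
leaky = amp_spectrum(period * 30.25)  # non-integer: leakage

# With an integer number of periods, almost all amplitude is in one
# component; with 30.25 periods, it spreads to the neighbors.
print(clean.max() / clean.sum())   # close to 1
print(leaky.max() / leaky.sum())   # much smaller
```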
\section{Windowing}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=3.5in]{figs/windowing2.eps}}
\caption{Segment of a sinusoid (top), Hamming window (middle), product
of the segment and the window (bottom).}
\label{fig.windowing2}
\end{figure}

We can reduce leakage by smoothing out the discontinuity between
the beginning and end of the segment, and one way to do that is
{\bf windowing}.

A ``window'' is a function designed to transform a non-periodic
segment into something that can pass for periodic.
Figure~\ref{fig.windowing2} (top) shows a segment where the end does not
connect smoothly to the beginning.

Figure~\ref{fig.windowing2} (middle) shows a ``Hamming window'', one of the
more common window functions.  No window function is perfect, but some
can be shown to be optimal for different applications, and Hamming
is a good, all-purpose window.

Figure~\ref{fig.windowing2} (bottom) shows the result of multiplying the
window by the original signal.  Where the window is close to 1, the
signal is unchanged.  Where the window is close to 0, the signal is
attenuated.  Because the window tapers at both ends, the end of the
segment connects smoothly to the beginning.

Figure~\ref{fig.windowing1} (right) shows the spectrum of the windowed
signal.  Windowing has reduced leakage substantially, but not
completely.

Here's what the code looks like.  {\tt Wave} provides {\tt window},
which applies a Hamming window:

\begin{verbatim}
#class Wave:
    def window(self, window):
        self.ys *= window
\end{verbatim}

And NumPy provides {\tt hamming}, which computes a Hamming window
with a given length:

\begin{verbatim}
window = np.hamming(len(wave))
wave.window(window)
\end{verbatim}

NumPy provides functions to compute other window
functions, including {\tt bartlett}, {\tt blackman}, {\tt hanning},
and {\tt kaiser}.  One of the exercises at the end of this chapter
asks you to experiment with these other windows.

\section{Implementing spectrograms}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=2.5in]{figs/windowing3.eps}}
\caption{Overlapping Hamming windows.}
\label{fig.windowing3}
\end{figure}

Now that we understand windowing, we can understand the
implementation of spectrograms.
Here is the {\tt Wave} method that computes spectrograms:

\begin{verbatim}
#class Wave:
    def make_spectrogram(self, seg_length):
        window = np.hamming(seg_length)
        i, j = 0, seg_length
        step = seg_length // 2

        spec_map = {}

        while j < len(self.ys):
            segment = self.slice(i, j)
            segment.window(window)

            t = (segment.start + segment.end) / 2
            spec_map[t] = segment.make_spectrum()

            i += step
            j += step

        return Spectrogram(spec_map, seg_length)
\end{verbatim}

This is the longest function in the book, so if you can handle
this, you can handle anything.

The parameter, {\tt self}, is a Wave object.
\verb"seg_length" is the number of samples in each segment.

{\tt window} is a Hamming window with the same length as the segments.

{\tt i} and {\tt j} are the slice indices that select segments from
the wave.  {\tt step} is the offset between segments.  Since {\tt
step} is half of \verb"seg_length", the segments overlap by half.
Figure~\ref{fig.windowing3} shows what these overlapping windows look
like.

\verb"spec_map" is a dictionary that maps from a timestamp to
a Spectrum.

Inside the while loop, we select a slice from the wave and apply
the window; then we construct a Spectrum
object and add it to \verb"spec_map".  The nominal time of
each segment, {\tt t}, is the midpoint.

Then we advance {\tt i} and {\tt j}, and continue as long as {\tt j}
doesn't go past the end of the Wave.

Finally, the method constructs and returns a Spectrogram.  Here
is the definition of Spectrogram:

\begin{verbatim}
class Spectrogram(object):
    def __init__(self, spec_map, seg_length):
        self.spec_map = spec_map
        self.seg_length = seg_length
\end{verbatim}

Like many init methods, this one just stores the
parameters as attributes.

{\tt Spectrogram} provides {\tt plot}, which generates a
pseudocolor plot with time along the x-axis and frequency along
the y-axis.

And that's how Spectrograms are implemented.

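The same segmentation logic can be sketched with plain NumPy (an
illustration, not thinkdsp's code): slide a half-overlapping Hamming
window along the signal and take the spectrum of each windowed
segment.

```python
import numpy as np

# Minimal sketch of the make_spectrogram logic with plain NumPy.
def sketch_spectrogram(ys, framerate, seg_length):
    window = np.hamming(seg_length)
    step = seg_length // 2
    spec_map = {}
    i, j = 0, seg_length
    while j < len(ys):
        segment = ys[i:j] * window
        t = (i + j) / 2 / framerate    # midpoint time of the segment
        spec_map[t] = np.abs(np.fft.rfft(segment))
        i += step
        j += step
    return spec_map

framerate = 11025
ts = np.arange(framerate) / framerate
ys = np.sin(2 * np.pi * 440 * ts)      # one second at 440 Hz
spec_map = sketch_spectrogram(ys, framerate, seg_length=512)

# Each spectrum has 512 // 2 + 1 = 257 frequency components.
print(len(spec_map))    # number of half-overlapping segments
print(len(next(iter(spec_map.values()))))
```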
\section{Exercises}

Solutions to these exercises are in {\tt chap03soln.ipynb}.

\begin{exercise}
Run and listen to the examples in {\tt chap03.ipynb}, which is
in the repository for this book, and also available at
\url{http://tinyurl.com/thinkdsp03}.

In the leakage example, try replacing the Hamming window with one of
the other windows provided by NumPy, and see what effect they have on
leakage.  See
\url{http://docs.scipy.org/doc/numpy/reference/routines.window.html}.
\end{exercise}

\begin{exercise}
Write a class called {\tt SawtoothChirp} that extends {\tt Chirp}
and overrides {\tt evaluate} to generate a sawtooth waveform with
frequency that increases (or decreases) linearly.

Hint: combine the evaluate functions from {\tt Chirp} and
{\tt SawtoothSignal}.

Draw a sketch of what you think the spectrogram of this signal
looks like, and then plot it.  The effect of aliasing should be
visually apparent, and if you listen carefully, you can hear it.
\end{exercise}

\begin{exercise}
Make a sawtooth chirp that sweeps from 2500 to 3000 Hz, then use it to
make a wave with duration 1 s and framerate 20 kHz.  Draw a sketch of
what you think the spectrum will look like.  Then plot the
spectrum and see if you got it right.
\end{exercise}

\begin{exercise}
In musical terminology, a ``glissando'' is a note that slides from one
pitch to another, so it is similar to a chirp.

Find or make a recording of a glissando and plot a spectrogram of the
first few seconds.  One suggestion: George Gershwin's {\it Rhapsody in
Blue} starts with a famous clarinet glissando, which you can download
from \url{http://archive.org/details/rhapblue11924}.
\end{exercise}

\begin{exercise}
A trombone player can play a glissando by extending the trombone
slide while blowing continuously.  As the slide extends, the total
length of the tube gets longer, and the resulting pitch is inversely
proportional to length.

Assuming that the player moves the slide at a constant speed, how
does frequency vary with time?

Write a class called {\tt TromboneGliss} that extends {\tt Chirp} and
provides {\tt evaluate}.  Make a wave that simulates a trombone
glissando from C3 up to F3 and back down to C3.  C3 is 262 Hz; F3 is
349 Hz.

Plot a spectrogram of the resulting wave.  Is a trombone glissando
more like a linear or exponential chirp?
\end{exercise}

\begin{exercise}
Make or find a recording of a series of vowel sounds and look at the
spectrogram.  Can you identify different vowels?
\end{exercise}

\chapter{Noise}

In English, ``noise'' means an unwanted or unpleasant sound.  In the
context of signal processing, it has two different senses:

\begin{enumerate}

\item As in English, it can mean an unwanted signal of any kind.  If
two signals interfere with each other, each signal would consider
the other to be noise.

\item ``Noise'' also refers to a signal that contains components at
many frequencies, so it lacks the harmonic structure of the periodic
signals we saw in previous chapters.

\end{enumerate}

This chapter is about the second kind.

The code for this chapter is in {\tt chap04.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp04}.

\section{Uncorrelated noise}

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/whitenoise0.eps}}
\caption{Waveform of uncorrelated uniform noise.}
\label{fig.whitenoise0}
\end{figure}

The simplest way to understand noise is to generate it, and the
simplest kind to generate is uncorrelated uniform noise (UU noise).
``Uniform'' means the signal contains random values from a uniform
distribution; that is, every value in the range is equally likely.
``Uncorrelated'' means that the values are independent; that is,
knowing one value provides no information about the others.

Here's a class that represents UU noise:

\begin{verbatim}
class UncorrelatedUniformNoise(_Noise):

    def evaluate(self, ts):
        ys = np.random.uniform(-self.amp, self.amp, len(ts))
        return ys
\end{verbatim}

{\tt UncorrelatedUniformNoise} inherits from \verb"_Noise", which
inherits from {\tt Signal}.

As usual, the evaluate function takes {\tt ts}, the times when the
signal should be evaluated.  It uses
{\tt np.random.uniform}, which generates values from a
uniform distribution.  In this example, the values are in
the range from {\tt -amp} to {\tt amp}.

The following example generates UU noise with duration 0.5
seconds at 11,025 samples per second.

\begin{verbatim}
signal = thinkdsp.UncorrelatedUniformNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
\end{verbatim}

If you play this wave, it sounds like the static you hear if you tune
a radio between channels.  Figure~\ref{fig.whitenoise0} shows what the
waveform looks like.  As expected, it looks pretty random.

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/whitenoise1.eps}}
\caption{Power spectrum of uncorrelated uniform noise.}
\label{fig.whitenoise1}
\end{figure}

Now let's take a look at the spectrum:

\begin{verbatim}
spectrum = wave.make_spectrum()
spectrum.plot_power()
\end{verbatim}

\verb"Spectrum.plot_power" is similar to \verb"Spectrum.plot",
except that it plots power instead of amplitude.
Power is the square of amplitude.
I am switching from amplitude to power in this chapter because
it is more conventional in the context of noise.

Figure~\ref{fig.whitenoise1} shows the result.  Like the signal, the
spectrum looks pretty random.  In fact, it {\em is} random, but we have to
be more precise about the word ``random''.  There are at least three
things we might like to know about a noise signal or its spectrum:

\begin{itemize}

\item Distribution: The distribution of a random signal is the set of
possible values and their probabilities.  For example, in the
uniform noise signal, the set of values is the range from -1 to 1,
and all values have the same probability.  An alternative is
{\bf Gaussian noise}, where the set of values is the range from negative
to positive infinity, but values near 0 are the most likely, with
probability that drops off according to the Gaussian or
``bell'' curve.

\item Correlation: Is each value in the signal independent of the
others, or are there dependencies between them?  In UU noise, the
values are independent.
An alternative is {\bf Brownian noise}, where each value is the sum
of the previous value and a random ``step''.  So if the value of the
signal is high at a particular point in time, we expect it to stay
high, and if it is low, we expect
it to stay low.

\item Relationship between power and frequency: In the spectrum of UU
noise, the power at all frequencies is drawn from the same
distribution; that is, the average power is the same for all
frequencies.  An alternative is {\bf pink noise}, where power is
inversely related to frequency; that is, the power at frequency $f$
is drawn from a distribution whose mean is proportional to $1/f$.

\end{itemize}

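Here is a NumPy-only sketch of the third point (an illustration, not
thinkdsp's code): the power spectrum of UU noise is the squared
magnitude of its DFT, and its average is roughly the same in every
frequency band.

```python
import numpy as np

# Power spectrum of UU noise with plain NumPy: power is the squared
# magnitude of the spectrum, and on average it is the same in every
# frequency band.
np.random.seed(0)
ys = np.random.uniform(-1, 1, 11025)     # 1 s of UU noise
hs = np.fft.rfft(ys)
amps = np.abs(hs)
power = amps ** 2                        # power = amplitude squared

# Compare the average power in the lower and upper halves of the
# spectrum; for white noise the ratio should be close to 1.
half = len(power) // 2
low = power[1:half].mean()
high = power[half:].mean()
print(low / high)
```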
\section{Integrated spectrum}

For UU noise we can see the relationship between power and frequency
more clearly by looking at the {\bf integrated spectrum}, which
is a function of frequency, $f$, that shows the cumulative power in
the spectrum up to $f$.

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/whitenoise2.eps}}
\caption{Integrated spectrum of uncorrelated uniform noise.}
\label{fig.whitenoise2}
\end{figure}

{\tt Spectrum} provides a method that computes the IntegratedSpectrum:

\begin{verbatim}
    def make_integrated_spectrum(self):
        cs = np.cumsum(self.power)
        cs /= cs[-1]
        return IntegratedSpectrum(cs, self.fs)
\end{verbatim}

{\tt self.power} is a NumPy array containing power for each frequency.
{\tt np.cumsum} computes the cumulative sum of the powers.
Dividing through by the last element normalizes the integrated
spectrum so it runs from 0 to 1.

The result is an IntegratedSpectrum.  Here is the class definition:

\begin{verbatim}
class IntegratedSpectrum(object):
    def __init__(self, cs, fs):
        self.cs = cs
        self.fs = fs
\end{verbatim}

Like Spectrum, IntegratedSpectrum provides \verb"plot_power", so
we can compute and plot the integrated spectrum like this:

\begin{verbatim}
integ = spectrum.make_integrated_spectrum()
integ.plot_power()
\end{verbatim}

The result, shown in Figure~\ref{fig.whitenoise2}, is a straight line,
which indicates that power at all frequencies is constant, on average.
Noise with equal power at all frequencies is called {\bf white noise}
by analogy with light, because an equal mixture of light at all
visible frequencies is white.

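We can check numerically that the integrated spectrum of UU noise is
close to a straight line.  This sketch (not thinkdsp's code) repeats
the cumsum-and-normalize step with plain NumPy:

```python
import numpy as np

# Integrated spectrum of UU noise, compared to a straight line.
np.random.seed(0)
ys = np.random.uniform(-1, 1, 11025)
power = np.abs(np.fft.rfft(ys)) ** 2

cs = np.cumsum(power)
cs /= cs[-1]                    # normalize so it runs from 0 to 1

# Compare to a perfectly straight ramp from 0 to 1.
ramp = np.linspace(0, 1, len(cs))
print(np.abs(cs - ramp).max())  # small deviation from a straight line
```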
\section{Brownian noise}
\label{brownian}

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/rednoise0.eps}}
\caption{Waveform of Brownian noise.}
\label{fig.rednoise0}
\end{figure}

UU noise is uncorrelated, which means that each value does not depend
on the others.  An alternative is Brownian noise, in which each value
is the sum of the previous value and a random ``step''.

It is called ``Brownian'' by analogy with Brownian motion, in which a
particle suspended in a fluid moves apparently at random, due to
unseen interactions with the fluid.  Brownian motion is often
described using a {\bf random walk}, which is a mathematical model
of a path where the distance between steps is characterized by a
random distribution.

In a one-dimensional random walk, the particle moves up or down
by a random amount at each time step.  The location of the particle
at any point in time is the sum of all previous steps.

This observation suggests a way to generate Brownian noise:
generate uncorrelated random steps and then add them up.
Here is a class definition that implements this algorithm:

\begin{verbatim}
class BrownianNoise(_Noise):

    def evaluate(self, ts):
        dys = np.random.uniform(-1, 1, len(ts))
        ys = np.cumsum(dys)
        ys = normalize(unbias(ys), self.amp)
        return ys
\end{verbatim}

{\tt evaluate} uses {\tt np.random.uniform} to generate an
uncorrelated signal and {\tt np.cumsum} to compute its cumulative
sum.

Since the sum is likely to escape the range from -1 to
1, we have to use {\tt unbias} to shift the mean to 0, and {\tt
normalize} to get the desired maximum amplitude.

Here's the code that generates a BrownianNoise object and plots the
waveform.

\begin{verbatim}
signal = thinkdsp.BrownianNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
wave.plot()
\end{verbatim}

Figure~\ref{fig.rednoise0} shows the result.  The waveform
wanders up and down, but there is a clear correlation between
successive values.  When the amplitude is high, it tends to stay
high, and vice versa.

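We can quantify that correlation with a short NumPy sketch: the
lag-one correlation of the steps is near zero, while the lag-one
correlation of their cumulative sum is near one.

```python
import numpy as np

# The steps of a random walk are uncorrelated, but the walk itself
# is strongly correlated from one value to the next.
np.random.seed(0)
dys = np.random.uniform(-1, 1, 11025)    # uncorrelated steps
ys = np.cumsum(dys)                      # Brownian noise

step_corr = np.corrcoef(dys[:-1], dys[1:])[0, 1]
walk_corr = np.corrcoef(ys[:-1], ys[1:])[0, 1]

print(step_corr)   # near 0: the steps are uncorrelated
print(walk_corr)   # near 1: successive values are strongly correlated
```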
\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/rednoise3.eps}}
\caption{Spectrum of Brownian noise on a log-log scale.}
\label{fig.rednoise3}
\end{figure}

If you plot the spectrum of Brownian noise, it doesn't look like
much.  Nearly all of the power is at the lowest frequencies; on a
linear scale, the higher frequency components are not visible.

To see the shape of the spectrum more clearly, we can plot power
and frequency on a log-log scale.  Here's the code:

\begin{verbatim}
spectrum = wave.make_spectrum()
spectrum.plot_power(linewidth=1, alpha=0.5)
thinkplot.config(xscale='log', yscale='log')
\end{verbatim}

The result is in Figure~\ref{fig.rednoise3}.  The relationship between
power and frequency is noisy, but roughly linear.

{\tt Spectrum} provides \verb"estimate_slope", which uses SciPy to compute
a least squares fit to the power spectrum:

\begin{verbatim}
#class Spectrum

    def estimate_slope(self):
        x = np.log(self.fs[1:])
        y = np.log(self.power[1:])
        t = scipy.stats.linregress(x, y)
        return t
\end{verbatim}

It discards the first component of the spectrum because
this component corresponds to $f=0$, and $\log 0$ is undefined.

\verb"estimate_slope" returns the result from {\tt
scipy.stats.linregress}, which is an object that contains the
estimated slope and intercept, the correlation coefficient ($r$),
the p-value, and the standard error.  For our purposes, we only
need the slope.

For Brownian noise, the slope of the power spectrum is -2 (we'll see
why in Chapter~\ref{diffint}), so we can write this relationship:
%
\[ \log P = k - 2 \log f \]
%
where $P$ is power, $f$ is frequency, and $k$ is the intercept
of the line, which is not important for our purposes.
Exponentiating both sides yields:
%
\[ P = K / f^{2} \]
%
where $K$ is $e^k$, but still not important.  More relevant is
that power is proportional to $1/f^2$, which is characteristic
of Brownian noise.

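Here is a self-contained version of this estimate, using {\tt
np.polyfit} instead of {\tt linregress} so the sketch needs only
NumPy (an illustration, not thinkdsp's code):

```python
import numpy as np

np.random.seed(0)
framerate = 11025
dys = np.random.uniform(-1, 1, framerate)
ys = np.cumsum(dys)                      # one second of Brownian noise

fs = np.fft.rfftfreq(len(ys), d=1 / framerate)
power = np.abs(np.fft.rfft(ys)) ** 2

# Fit log power vs log frequency, skipping f = 0 (log 0 is undefined)
# and frequencies near the Nyquist frequency, where the discrete
# spectrum flattens out.
keep = (fs > 0) & (fs < framerate / 4)
slope, intercept = np.polyfit(np.log(fs[keep]), np.log(power[keep]), 1)
print(slope)    # roughly -2
```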
Brownian noise is also called {\bf red noise}, for the same reason that
white noise is called ``white''.  If you combine visible light with
power proportional to $1/f^2$, most of the power
would be at the low-frequency end of the spectrum, which is red.
Brownian noise is also sometimes called ``brown noise'', but I think
that's confusing, so I won't use it.

%TODO: refer back to this in the diff/int chapter

\section{Pink Noise}
\label{pink}

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/pinknoise0.eps}}
\caption{Waveform of pink noise with $\beta=1$.}
\label{fig.pinknoise0}
\end{figure}

For red noise, the relationship between frequency
and power is
%
\[ P = K / f^{2} \]
%
There is nothing special about the exponent 2.  More generally,
we can synthesize noise with any exponent, $\beta$:
%
\[ P = K / f^{\beta} \]
%
When $\beta = 0$, power is constant at all frequencies,
so the result is white noise.  When $\beta=2$ the result is red noise.

When $\beta$ is between 0 and 2, the result is between white and
red noise, so it is called ``pink noise''.

There are several ways to generate pink noise.  The simplest is to
generate white noise and then apply a low-pass filter with the
desired exponent.  {\tt thinkdsp} provides a class that represents
a pink noise signal:

\begin{verbatim}
class PinkNoise(_Noise):

    def __init__(self, amp=1.0, beta=1.0):
        self.amp = amp
        self.beta = beta
\end{verbatim}

{\tt amp} is the desired amplitude of the signal.
{\tt beta} is the desired exponent.  {\tt PinkNoise} provides
\verb"make_wave", which generates a Wave.

\begin{verbatim}
    def make_wave(self, duration=1, start=0, framerate=11025):
        signal = UncorrelatedUniformNoise()
        wave = signal.make_wave(duration, start, framerate)
        spectrum = wave.make_spectrum()

        spectrum.pink_filter(beta=self.beta)

        wave2 = spectrum.make_wave()
        wave2.unbias()
        wave2.normalize(self.amp)
        return wave2
\end{verbatim}

{\tt duration} is the length of the wave in seconds.  {\tt start} is
the start time of the wave; it is included so that \verb"make_wave"
has the same interface for all types of signals, but for random noise,
the start time is irrelevant.  And {\tt framerate} is the number of
samples per second.

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/noise-triple.eps}}
\caption{Spectrum of white, pink, and red noise on a log-log scale.}
\label{fig.noise-triple}
\end{figure}

\verb"make_wave" creates a white noise wave, computes its spectrum,
applies a filter with the desired exponent, and then converts the
filtered spectrum back to a wave.  Then it unbiases and normalizes
the wave.

{\tt Spectrum} provides \verb"pink_filter":

\begin{verbatim}
    def pink_filter(self, beta=1.0):
        denom = self.fs ** (beta / 2.0)
        denom[0] = 1
        self.hs /= denom
\end{verbatim}

\verb"pink_filter" divides each element of the spectrum by
$f^{\beta/2}$.  Since power is the square of amplitude, this
operation divides the power at
each component by $f^\beta$.  It treats the component
at $f=0$ as a special case, partly to avoid dividing by 0, and partly
because this element represents the bias of the signal,
which we are going to set to 0 anyway.

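The whole pipeline can be sketched with plain NumPy (an illustration,
not thinkdsp's code): white noise, forward transform, divide by
$f^{\beta/2}$, inverse transform, then unbias and normalize.

```python
import numpy as np

# Pink-noise pipeline: white noise -> spectrum ->
# divide by f**(beta/2) -> back to a wave.
np.random.seed(0)
beta = 1.0
framerate = 11025
ys = np.random.uniform(-1, 1, framerate)    # 1 s of white noise

hs = np.fft.rfft(ys)
fs = np.fft.rfftfreq(len(ys), d=1 / framerate)

denom = fs ** (beta / 2.0)
denom[0] = 1                  # leave the bias component alone
hs = hs / denom

wave2 = np.fft.irfft(hs, len(ys))
wave2 = wave2 - wave2.mean()              # unbias
wave2 = wave2 / np.abs(wave2).max()       # normalize to amp=1

# The power spectrum of the result falls off roughly as 1/f.
power = np.abs(np.fft.rfft(wave2)) ** 2
keep = fs > 0
slope, _ = np.polyfit(np.log(fs[keep]), np.log(power[keep]), 1)
print(slope)    # close to -beta, that is, about -1
```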
Figure~\ref{fig.pinknoise0} shows the resulting waveform.  Like
Brownian noise, it wanders up and down in a way that suggests
correlation between successive values, but at least visually, it looks
more random.  In the next chapter we will come back to this
observation and I will be more precise about what I mean by
``correlation'' and ``more random''.

Finally, Figure~\ref{fig.noise-triple} shows a spectrum for
white, pink, and red noise on the same log-log scale.
The relationship between the exponent, $\beta$, and the slope
of the spectrum is apparent in this figure.

\section{Gaussian noise}

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/noise1.eps}}
\caption{Normal probability plot for the real and imaginary parts
of the spectrum of Gaussian noise.}
\label{fig.noise1}
\end{figure}

We started with uncorrelated uniform (UU) noise and showed that,
because its spectrum has equal power at all frequencies, on
average, UU noise is white.

But when people talk about ``white noise'', they don't always
mean UU noise.  In fact, more often they mean uncorrelated
Gaussian (UG) noise.

{\tt thinkdsp} provides an implementation of UG noise:

\begin{verbatim}
class UncorrelatedGaussianNoise(_Noise):

    def evaluate(self, ts):
        ys = np.random.normal(0, self.amp, len(ts))
        return ys
\end{verbatim}

{\tt np.random.normal} returns a NumPy array of values from a
Gaussian distribution, in this case with mean 0 and standard deviation
{\tt self.amp}.  In theory the range of values is from negative to
positive infinity, but we expect about 99.7\% of the values to be
between -3 and 3.

UG noise is similar in many ways to UU noise.  The spectrum has
equal power at all frequencies, on average, so UG is also white.
And it has one other interesting property: the spectrum of UG
noise is also UG noise.  More precisely, the real and imaginary
parts of the spectrum are uncorrelated Gaussian values.

To test that claim, we can generate the spectrum of UG noise and
then generate a ``normal probability plot'', which is a graphical
way to test whether a distribution is Gaussian.

\begin{verbatim}
signal = thinkdsp.UncorrelatedGaussianNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
spectrum = wave.make_spectrum()

thinkstats2.NormalProbabilityPlot(spectrum.real)
thinkstats2.NormalProbabilityPlot(spectrum.imag)
\end{verbatim}

{\tt NormalProbabilityPlot} is provided by {\tt thinkstats2}, which is
included in the repository for this book.  If you are not familiar
with normal probability plots, you can read about them in Chapter 5
of {\it Think Stats} at \url{http://thinkstats2.com}.

Figure~\ref{fig.noise1} shows the results.  The gray lines
show a linear model fit to the data; the dark lines show the
data.

A straight line on a normal probability plot indicates
that the data come from a Gaussian distribution.  Except for
some random variation at the extremes, these lines are straight,
which indicates that the spectrum of UG noise is UG noise.

The spectrum of UU noise is also UG noise, at least approximately.  In
fact, by the Central Limit Theorem, the spectrum of almost any
uncorrelated noise is approximately Gaussian, as long as the
distribution has finite mean and standard deviation, and the number of
samples is large.

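The claim can also be checked numerically.  This sketch (not
thinkdsp's code) uses the excess kurtosis as a stand-in for the
probability plot: for a Gaussian distribution it is 0.

```python
import numpy as np

# If the real part of the spectrum of UG noise is Gaussian, its
# excess kurtosis should be near 0.
np.random.seed(0)
ys = np.random.normal(0, 1, 11025)      # UG noise with amp 1
hs = np.fft.rfft(ys)
real = hs.real[1:]                      # skip the DC component

z = (real - real.mean()) / real.std()
kurtosis = (z ** 4).mean() - 3          # 0 for a Gaussian distribution
print(kurtosis)
```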
\section{Exercises}

Solutions to these exercises are in {\tt chap04soln.ipynb}.

\begin{exercise}
``A Soft Murmur'' is a web site that plays a mixture of natural
noise sources, including rain, waves, wind, etc.  At
\url{http://asoftmurmur.com/about/} you can find their list
of recordings, most of which are at \url{http://freesound.org}.

Download a few of these files and compute the spectrum of each
signal.  Does the power spectrum look like white noise, pink noise,
or Brownian noise?  How does the spectrum vary over time?
\end{exercise}

\begin{exercise}
In a noise signal, the mixture of frequencies changes over time.
In the long run, we expect the power at all frequencies to be equal,
but in any sample, the power at each frequency is random.

To estimate the long-term average power at each frequency, we can
break a long signal into segments, compute the power spectrum
for each segment, and then compute the average across
the segments.  You can read more about this algorithm at
\url{http://en.wikipedia.org/wiki/Bartlett's_method}.

Implement Bartlett's method and use it to estimate the power
spectrum for a noise wave.  Hint: look at the implementation
of \verb"make_spectrogram".
\end{exercise}

\begin{exercise}
At \url{http://www.coindesk.com} you can download the daily
price of Bitcoin as a CSV file.  Read this file and compute
the spectrum of Bitcoin prices as a function of time.
Does it resemble white, pink, or Brownian noise?
\end{exercise}

\begin{exercise}
A Geiger counter is a device that detects radiation.
When an ionizing particle strikes the detector, it outputs a surge of
current.  The total output at a point in time can be modeled as
uncorrelated Poisson (UP) noise, where each sample is
a random quantity from a Poisson distribution, which corresponds to the
number of particles detected during an interval.

Write a class called {\tt UncorrelatedPoissonNoise} that inherits
from \verb"thinkdsp._Noise" and provides {\tt evaluate}.  It should
use {\tt np.random.poisson} to generate random values from a Poisson
distribution.  The parameter of this function, {\tt lam}, is the
average number of particles during each interval.  You can use the
attribute {\tt amp} to specify {\tt lam}.  For example, if the
framerate is 10 kHz and {\tt amp} is 0.001, we expect about 10
``clicks'' per second.

Generate about a second of UP noise and listen to it.  For low values
of {\tt amp}, like 0.001, it should sound like a Geiger counter.  For
higher values it should sound like white noise.  Compute and plot the
power spectrum to see whether it looks like white noise.
\end{exercise}

\begin{exercise}
The algorithm in this chapter for generating pink noise is
conceptually simple but computationally expensive.  There are
more efficient alternatives, like the Voss-McCartney algorithm.
Research this method, implement it, compute the spectrum of
the result, and confirm that it has the desired relationship
between power and frequency.
\end{exercise}

\chapter{Autocorrelation}

In the previous chapter I characterized white noise as ``uncorrelated'',
which means that each value is independent of the others, and Brownian
noise as ``correlated'', because each value depends on the preceding
value.  In this chapter I define these terms more precisely and
present the {\bf autocorrelation function}, which is a useful tool
for signal analysis.

The code for this chapter is in {\tt chap05.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp05}.

\section{Correlation}
2596
2597
In general, correlation between variables means that if you know the
2598
value of one, you have some information about the other. There are
2599
several ways to quantify correlation, but the most common is the
2600
Pearson product-moment correlation coefficient, usually denoted
2601
$\rho$. For two variables, $x$ and $y$, that each contain $N$ values:
2602
%
2603
\[ \rho = \frac{ \sum_i (x_i - \mu_x) (y_i - \mu_y)}{N \sigma_x \sigma_y} \]
2604
%
2605
Where $\mu_x$ and $\mu_y$ are the means of $x$ and $y$, and
2606
$\sigma_x$ and $\sigma_y$ are their standard deviations.
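As a sanity check, we can evaluate this formula directly and compare the result with {\tt np.corrcoef} (this snippet is mine, not from {\tt thinkdsp}):

```python
import numpy as np

def pearson_corr(xs, ys):
    """Pearson's correlation coefficient, dividing by N (ddof=0)."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    n = len(xs)
    devs = (xs - xs.mean()) * (ys - ys.mean())
    return devs.sum() / (n * xs.std() * ys.std())

xs = np.linspace(0, 1, 100)
ys = np.sin(2 * np.pi * xs)
rho = pearson_corr(xs, ys)
```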

Pearson's correlation is always between -1 and +1 (including both).
If $\rho$ is positive, we say that the correlation is positive,
which means that when one variable is high, the other tends to be
high. If $\rho$ is negative, the correlation is negative, so
when one variable is high, the other tends to be low.

The magnitude of $\rho$ indicates the strength of the correlation. If
$\rho$ is 1 or -1, the variables are perfectly correlated, which means
that if you know one, you can make a perfect prediction about the
other. If $\rho$ is near zero, the correlation is probably weak, so
if you know one, it doesn't tell you much about the other.

I say ``probably weak'' because it is also possible that there is
a nonlinear relationship that is not captured by the coefficient
of correlation. Nonlinear relationships are often important in
statistics, but less often relevant for signal processing, so I
won't say more about them here.

Python provides several ways to compute correlations. {\tt
np.corrcoef} takes any number of variables and computes a {\bf
correlation matrix} that includes correlations between each pair of
variables.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr1.eps}}
\caption{Two sine waves that differ by a phase offset of 1 radian;
their coefficient of correlation is 0.54.}
\label{fig.autocorr1}
\end{figure}

I'll present an example with only two variables. First, I define
a function that constructs sine waves with different phase offsets:

\begin{verbatim}
def make_sine(offset):
    signal = thinkdsp.SinSignal(freq=440, offset=offset)
    wave = signal.make_wave(duration=0.5, framerate=10000)
    return wave
\end{verbatim}

Next I instantiate two waves with different offsets:

\begin{verbatim}
wave1 = make_sine(offset=0)
wave2 = make_sine(offset=1)
\end{verbatim}

Figure~\ref{fig.autocorr1} shows what the first few periods of these
waves look like. When one wave is high, the other is usually high, so we
expect them to be correlated.

\begin{verbatim}
>>> corr_matrix = np.corrcoef(wave1.ys, wave2.ys, ddof=0)
[[ 1.    0.54]
 [ 0.54  1.  ]]
\end{verbatim}

The option {\tt ddof=0} indicates that {\tt corrcoef} should divide by
$N$, as in the equation above, rather than use the default, $N-1$.

The result is a correlation matrix:
the first element is the correlation of {\tt wave1}
with itself, which is always 1. Similarly, the last element
is the correlation of {\tt wave2} with itself.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr2.eps}}
\caption{The correlation of two sine waves as a function of the
phase offset between them. The result is a cosine.}
\label{fig.autocorr2}
\end{figure}

The off-diagonal elements contain the value we're interested in,
the correlation of {\tt wave1} and {\tt wave2}. The value 0.54
indicates that the strength of the correlation is moderate.

As the phase offset increases, this correlation decreases until
the waves are 180 degrees out of phase, which yields correlation
-1. Then it increases again until the offset reaches 360 degrees.
At that point we have come full circle and the correlation is 1.

Figure~\ref{fig.autocorr2} shows the relationship between correlation and
phase offset for a sine wave. The shape of that curve should look
familiar; it is a cosine.

{\tt thinkdsp} provides a simple interface for computing the
correlation between waves:

\begin{verbatim}
>>> wave1.corr(wave2)
0.54
\end{verbatim}


\section{Serial correlation}

Signals often represent measurements of quantities that vary in
time. For example, the sound signals we've worked with represent
measurements of voltage (or current), which correspond to the changes
in air pressure we perceive as sound.

Measurements like this almost always have serial correlation, which
is the correlation between each element and the next (or the previous).
To compute serial correlation, we can shift a signal and then compute
the correlation of the shifted version with the original.

\begin{verbatim}
def serial_corr(wave, lag=1):
    n = len(wave)
    y1 = wave.ys[lag:]
    y2 = wave.ys[:n-lag]
    corr = np.corrcoef(y1, y2, ddof=0)[0, 1]
    return corr
\end{verbatim}

\verb"serial_corr" takes a Wave object and
{\tt lag}, which is the integer number of places to shift the wave.
It computes the correlation of the wave with a shifted version
of itself.

We can test this function with the noise signals from the previous
chapter. We expect UU noise to be uncorrelated, based on the
way it's generated (not to mention the name):

\begin{verbatim}
signal = thinkdsp.UncorrelatedGaussianNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
serial_corr(wave)
\end{verbatim}

When I ran this example, I got 0.006, which indicates a very
small serial correlation. You might get a different value when you run
it, but it should be comparably small.

In a Brownian noise signal, each value is the sum of the previous
value and a random ``step'', so we expect a strong serial
correlation:

\begin{verbatim}
signal = thinkdsp.BrownianNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
serial_corr(wave)
\end{verbatim}

Sure enough, the result I got is greater than 0.999.
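You can reproduce both observations without {\tt thinkdsp}, standing in for UU noise with Gaussian samples and for Brownian noise with a cumulative sum of random steps (consistent with the description above); this is my own sketch:

```python
import numpy as np

def serial_corr_array(ys, lag=1):
    """Correlation between ys and a copy of itself shifted by lag."""
    n = len(ys)
    return np.corrcoef(ys[lag:], ys[:n - lag])[0, 1]

rng = np.random.default_rng(0)
uu = rng.normal(size=5512)                    # uncorrelated Gaussian noise
brownian = np.cumsum(rng.normal(size=5512))   # sum of random steps

uu_corr = serial_corr_array(uu)               # close to 0
brownian_corr = serial_corr_array(brownian)   # close to 1
```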

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr3.eps}}
\caption{Serial correlation for pink noise with a range of
parameters.}
\label{fig.autocorr3}
\end{figure}

Since pink noise is in some sense between Brownian noise and UU noise,
we might expect an intermediate correlation:

\begin{verbatim}
signal = thinkdsp.PinkNoise(beta=1)
wave = signal.make_wave(duration=0.5, framerate=11025)
serial_corr(wave)
\end{verbatim}

With parameter $\beta=1$, I got a serial correlation of 0.851.
As we vary the parameter from $\beta=0$, which is uncorrelated
noise, to $\beta=2$, which is Brownian, serial correlation
ranges from 0 to almost 1, as shown in Figure~\ref{fig.autocorr3}.


\section{Autocorrelation}
\label{autopink}

In the previous section we computed the correlation between each
value and the next, so we shifted the elements of the array by 1.
But we can easily compute serial correlations with
different lags.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr4.eps}}
\caption{Autocorrelation functions for pink noise with a range
of parameters.}
\label{fig.autocorr4}
\end{figure}

You can think of \verb"serial_corr" as a function that
maps from each value of lag to the corresponding correlation, and we
can evaluate that function by looping through values of {\tt lag}:

\begin{verbatim}
def autocorr(wave):
    lags = range(len(wave.ys)//2)
    corrs = [serial_corr(wave, lag) for lag in lags]
    return lags, corrs
\end{verbatim}

{\tt autocorr} takes a Wave object and returns the autocorrelation
function as a pair of sequences: {\tt lags} is a sequence of
integers from 0 to half the length of the wave; {\tt corrs}
is the sequence of serial correlations for each lag.

Figure~\ref{fig.autocorr4} shows autocorrelation functions for pink
noise with three values of $\beta$. For low values of $\beta$, the
signal is less correlated, and the autocorrelation function drops
off to zero quickly. For larger values, serial correlation
is stronger and drops off more slowly. With $\beta=1.7$ serial
correlation is strong even for long lags; this phenomenon is called
{\bf long-range dependence}, because it indicates that each value in
the signal depends on many preceding values.


\section{Autocorrelation of periodic signals}

The autocorrelation of pink noise has interesting mathematical
properties, but limited applications. The autocorrelation of
periodic signals is more useful.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr5.eps}}
\caption{Spectrogram of a vocal chirp.}
\label{fig.autocorr5}
\end{figure}

As an example, I downloaded from \url{freesound.org} a recording of
someone singing a chirp; the repository for this book includes the
file: \url{28042__bcjordan__voicedownbew.wav}. You can use the
IPython notebook for this chapter, {\tt chap05.ipynb}, to play it.

Figure~\ref{fig.autocorr5} shows the spectrogram of this wave.
The fundamental frequency and some of the harmonics show up clearly.
The chirp starts near 500 Hz and drops down to about 300 Hz, roughly
from C5 to E4.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr6.eps}}
\caption{Spectrum of a segment from a vocal chirp.}
\label{fig.autocorr6}
\end{figure}

To estimate pitch at a particular point in time, we could use the
spectrum, but it doesn't work very well. To see why not, I'll take
a short segment from the wave and plot its spectrum:

\begin{verbatim}
duration = 0.01
segment = wave.segment(start=0.2, duration=duration)
spectrum = segment.make_spectrum()
spectrum.plot(high=1000)
\end{verbatim}

This segment starts at 0.2 seconds and lasts 0.01 seconds.
Figure~\ref{fig.autocorr6} shows its spectrum. There is a clear peak
near 400 Hz, but it is hard to identify the pitch precisely. The
length of the segment is 441 samples at a framerate of 44100 Hz, so
the frequency resolution is 100 Hz (see Section~\ref{gabor}).
That means the estimated pitch might be off by 50 Hz; in musical
terms, the range from 350 Hz to 450 Hz is about 5 semitones, which is
a big difference!
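The arithmetic behind that claim is just the relationship between segment length and frequency resolution:

```python
framerate = 44100        # samples per second
duration = 0.01          # segment length in seconds
n = round(duration * framerate)    # 441 samples in the segment
freq_res = framerate / n           # spacing between frequency components
max_error = freq_res / 2           # an estimate can be off by half that
```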

We could get better frequency resolution by taking a longer segment,
but since the pitch is changing over time, we would also get ``motion
blur''; that is, the peak would spread between the start and end pitch of
the segment, as we saw in Section~\ref{sauron}.

We can estimate pitch more precisely using autocorrelation.
If a signal is periodic, we expect the autocorrelation to spike
when the lag equals the period.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr7.eps}}
\caption{Two segments from a chirp, one starting 0.0023 seconds
after the other.}
\label{fig.autocorr7}
\end{figure}

To show why that works, I'll plot two segments from the same
recording.

\begin{verbatim}
def plot_shifted(wave, offset=0.001, start=0.2):
    thinkplot.preplot(2)
    segment1 = wave.segment(start=start, duration=0.01)
    segment1.plot(linewidth=2, alpha=0.8)

    segment2 = wave.segment(start=start-offset, duration=0.01)
    segment2.shift(offset)
    segment2.plot(linewidth=2, alpha=0.4)

    corr = segment1.corr(segment2)
    text = r'$\rho =$ %.2g' % corr
    thinkplot.text(segment1.start+0.0005, -0.8, text)
    thinkplot.config(xlabel='Time (s)')
\end{verbatim}

One segment starts at 0.2 seconds; the other starts 0.0023 seconds
later. Figure~\ref{fig.autocorr7} shows the result. The segments
are similar, and their correlation is 0.99. This result suggests
that the period is near 0.0023 seconds, which corresponds to a frequency
of 435 Hz.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr8.eps}}
\caption{Autocorrelation function for a segment from a chirp.}
\label{fig.autocorr8}
\end{figure}

For this example, I estimated the period by trial and error. To automate
the process, we can use the autocorrelation function.

\begin{verbatim}
lags, corrs = autocorr(segment)
thinkplot.plot(lags, corrs)
\end{verbatim}

Figure~\ref{fig.autocorr8} shows the autocorrelation function for
the segment starting at $t=0.2$ seconds. The first peak occurs at
{\tt lag=101}. We can compute the frequency that corresponds
to that period like this:

\begin{verbatim}
period = lag / segment.framerate
frequency = 1 / period
\end{verbatim}

The estimated fundamental frequency is 437 Hz. To evaluate the
precision of the estimate, we can run the same computation with
lags 100 and 102, which correspond to frequencies 441 and 432 Hz.
The frequency precision using autocorrelation is less than 10 Hz,
compared with 100 Hz using the spectrum. In musical terms, the
expected error is about 30 cents (a third of a semitone).
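The whole procedure fits in a few lines of NumPy. Here is a self-contained sketch that synthesizes a sine wave and recovers its frequency from the highest autocorrelation peak in a plausible range of lags ({\tt estimate_pitch} is my helper, not part of {\tt thinkdsp}):

```python
import numpy as np

def estimate_pitch(ys, framerate, low=50, high=2000):
    """Estimate pitch from the peak of the autocorrelation function."""
    n = len(ys)
    corrs = np.correlate(ys, ys, mode='full')[n - 1:]   # lags 0..n-1
    # search only lags that correspond to plausible pitches
    min_lag = int(framerate / high)
    max_lag = int(framerate / low)
    lag = min_lag + np.argmax(corrs[min_lag:max_lag])
    return framerate / lag

framerate = 44100
ts = np.arange(441) / framerate          # a 10 ms segment
ys = np.sin(2 * np.pi * 437 * ts)
freq = estimate_pitch(ys, framerate)     # close to 437 Hz
```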


\section{Correlation as dot product}
\label{dotproduct}

I started the chapter with this definition of Pearson's
correlation coefficient:
%
\[ \rho = \frac{ \sum_i (x_i - \mu_x) (y_i - \mu_y)}{N \sigma_x \sigma_y} \]
%
Then I used $\rho$ to define serial correlation and autocorrelation.
That's consistent with how these terms are used in statistics,
but in the context of signal processing, the definitions are
a little different.

In signal processing, we are often working with unbiased signals,
where the mean is 0, and normalized signals, where the standard
deviation is 1. In that case, the definition of $\rho$ simplifies to:
%
\[ \rho = \frac{1}{N} \sum_i x_i y_i \]
%
And it is common to simplify even further:
%
\[ r = \sum_i x_i y_i \]
%
This definition of correlation is not ``standardized'', so it doesn't
generally fall between -1 and 1. But it has other useful properties.

If you think of $x$ and $y$ as vectors, you might recognize this
formula as the {\bf dot product}, $x \cdot y$. See
\url{http://en.wikipedia.org/wiki/Dot_product}.

\newcommand{\norm}{\mathrm{norm}}

The dot product indicates the degree to which the signals are similar.
If they are normalized so their lengths are 1,
%
\[ x \cdot y = \cos \theta \]
%
where $\theta$ is the angle between the vectors. And that explains
why Figure~\ref{fig.autocorr2} is a cosine curve.
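We can check the geometry numerically: standardize two sine waves that differ in phase by 1 radian, and the dot product, divided by $N$, matches both $\cos 1$ and Pearson's correlation (my check, under the book's definitions):

```python
import numpy as np

def standardize(ys):
    """Shift to mean 0 and scale to standard deviation 1."""
    return (ys - ys.mean()) / ys.std()

n = 10000
ts = np.arange(n) / n
x = standardize(np.cos(2 * np.pi * ts))
y = standardize(np.cos(2 * np.pi * ts + 1))   # phase offset of 1 radian

rho = np.dot(x, y) / n   # Pearson's correlation as a dot product
```

The result is $\cos 1 \approx 0.54$, the same value as in Figure~\ref{fig.autocorr1}.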


\section{Using NumPy}
\label{correlate}

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr9.eps}}
\caption{Autocorrelation function computed with {\tt np.correlate}.}
\label{fig.autocorr9}
\end{figure}

NumPy provides a function, {\tt correlate}, that computes
the correlation of two functions or the autocorrelation of one
function. We can use it to compute the autocorrelation of
the segment from the previous section:

\begin{verbatim}
corrs2 = np.correlate(segment.ys, segment.ys, mode='same')
\end{verbatim}

The option {\tt mode} tells {\tt correlate} what range
of {\tt lag} to use. With the value {\tt 'same'}, the
range is from $-N/2$ to $N/2$, where $N$ is the length of the
wave array.

Figure~\ref{fig.autocorr9} shows the result. It is symmetric because
the two signals are identical, so a negative lag on one has the same
effect as a positive lag on the other. To compare with the results
from {\tt autocorr}, we can select the second half:

\begin{verbatim}
N = len(corrs2)
half = corrs2[N//2:]
\end{verbatim}

If you compare Figure~\ref{fig.autocorr9} to Figure~\ref{fig.autocorr8},
you'll notice that the correlations computed by {\tt np.correlate}
get smaller as the lags increase. That's because {\tt np.correlate}
uses the unstandardized definition of correlation;
as the lag gets bigger, the
overlap between the two signals gets smaller, so the magnitude of
the correlations decreases.

We can correct that by dividing through by the lengths:

\begin{verbatim}
lengths = range(N, N//2, -1)
half /= lengths
\end{verbatim}

Finally, we can standardize the results so the correlation with
{\tt lag=0} is 1.

\begin{verbatim}
half /= half[0]
\end{verbatim}
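Putting those steps together, here is the whole adjustment as one function, tested on a short sine segment standing in for the chirp (a sketch; the book's version operates on a {\tt thinkdsp} Wave):

```python
import numpy as np

def autocorr_np(ys):
    """Standardized autocorrelation via np.correlate: divide by the
    overlap length at each lag, then scale so lag 0 has correlation 1."""
    N = len(ys)
    corrs2 = np.correlate(ys, ys, mode='same')
    half = corrs2[N // 2:].astype(float)
    lengths = np.arange(N, N // 2, -1)   # overlap at each lag
    half /= lengths
    half /= half[0]
    return half

ts = np.arange(441) / 44100
ys = np.sin(2 * np.pi * 437 * ts)        # stand-in for the chirp segment
half = autocorr_np(ys - ys.mean())
```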

With these adjustments, the results computed by {\tt autocorr} and
{\tt np.correlate} are nearly the same. They still differ by
1-2\%. The reason is not important, but if you are curious: {\tt autocorr}
standardizes the correlations independently for each lag; for
{\tt np.correlate}, we standardized them all at the end.

More importantly, now you know what autocorrelation is, how to
use it to estimate the fundamental period of a signal, and two
ways to compute it.


\section{Exercises}

Solutions to these exercises are in {\tt chap05soln.ipynb}.

\begin{exercise}
The IPython notebook for this chapter, {\tt chap05.ipynb}, includes
an interaction that lets you compute autocorrelations for different
lags. Use this interaction to estimate the pitch of the vocal chirp
for a few different start times.
\end{exercise}


\begin{exercise}
The example code in \verb"chap05.ipynb" shows how to use autocorrelation
to estimate the fundamental frequency of a periodic signal.
Encapsulate this code in a function called \verb"estimate_fundamental",
and use it to track the pitch of a recorded sound.

To see how well it works, try superimposing your pitch estimates on a
spectrogram of the recording.
\end{exercise}


\begin{exercise}
If you did the exercises in the previous chapter, you downloaded
the historical price of Bitcoin and estimated the power spectrum
of the price changes. Using the same data, compute the autocorrelation
of Bitcoin prices. Does the autocorrelation function drop off quickly?
Is there evidence of periodic behavior?
\end{exercise}


\begin{exercise}
In the repository for this book you will find an IPython notebook
called \verb"saxophone.ipynb" that explores autocorrelation,
pitch perception, and a phenomenon called the ``missing fundamental''.
Read through this notebook and run the examples. Try selecting
a different segment of the recording and running the examples again.
\end{exercise}



\chapter{Discrete cosine transform}
\label{dct}

The topic of this chapter is the {\bf Discrete Cosine
Transform} (DCT), which is used in MP3 and related formats for
compressing music; JPEG and similar formats for images; and the MPEG
family of formats for video.

DCT is similar in many ways to the Discrete Fourier Transform (DFT),
which we have been using for spectral analysis.
Once we learn how DCT works, it will be easier to explain DFT.

Here are the steps to get there:

\begin{enumerate}

\item We'll start with the synthesis problem: given a set of frequency
components and their amplitudes, how can we construct a wave?

\item Next we'll rewrite the synthesis problem using NumPy arrays.
This move is good for performance, and also provides insight
for the next step.

\item We'll look at the analysis problem: given a signal and a
set of frequencies, how can we find the amplitude of each frequency
component? We'll start with a solution that is conceptually simple
but slow.

\item Finally, we'll use some principles from linear algebra to find a
more efficient algorithm. If you already know linear algebra,
that's great, but I'll explain what you need as we go.

\end{enumerate}

The code for this chapter is in {\tt
chap06.ipynb}, which is in the repository for this book (see
Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp06}.
\section{Synthesis}
\label{synth1}

Suppose I give you a list of amplitudes and a list of frequencies,
and ask you to construct a signal that is the sum of these frequency
components. Using objects in the {\tt thinkdsp} module, there is
a simple way to perform this operation, which is called {\bf synthesis}:

\begin{verbatim}
def synthesize1(amps, fs, ts):
    components = [thinkdsp.CosSignal(freq, amp)
                  for amp, freq in zip(amps, fs)]
    signal = thinkdsp.SumSignal(*components)

    ys = signal.evaluate(ts)
    return ys
\end{verbatim}

{\tt amps} is a list of amplitudes, {\tt fs} is the list
of frequencies, and {\tt ts} is the sequence
of times where the signal should be evaluated.

{\tt components} is a list of {\tt CosSignal} objects, one for
each amplitude-frequency pair. {\tt SumSignal} represents the
sum of these frequency components.

Finally, {\tt evaluate} computes the value of the signal at each
time in {\tt ts}.

We can test this function like this:

\begin{verbatim}
amps = np.array([0.6, 0.25, 0.1, 0.05])
fs = [100, 200, 300, 400]
framerate = 11025

ts = np.linspace(0, 1, framerate)
ys = synthesize1(amps, fs, ts)
wave = thinkdsp.Wave(ys, framerate)
\end{verbatim}

This example makes a signal that contains a fundamental frequency at
100 Hz and three harmonics (100 Hz is a sharp G2). It renders the
signal for one second at 11,025 frames per second and puts the results
into a Wave object.

Conceptually, synthesis is pretty simple. But in this form it doesn't
help much with {\bf analysis}, which is the inverse problem: given the
wave, how do we identify the frequency components and their amplitudes?


\section{Synthesis with arrays}
\label{synthesis}

\begin{figure}
\input{diagram}
\caption{Synthesis with arrays.}
\label{fig.synthesis}
\end{figure}

Here's another way to write {\tt synthesize}:

\begin{verbatim}
def synthesize2(amps, fs, ts):
    args = np.outer(ts, fs)
    M = np.cos(PI2 * args)
    ys = np.dot(M, amps)
    return ys
\end{verbatim}

This function looks very different, but it does the same thing.
Let's see how it works:

\begin{enumerate}

\item {\tt np.outer} computes the outer product of {\tt ts} and
{\tt fs}. The result is an array with one row for each element
of {\tt ts} and one column for each element of {\tt fs}. Each
element in the array is the product of a frequency and a time, $f
t$.

\item We multiply {\tt args} by $2 \pi$ and apply {\tt cos}, so each
element of the result is $\cos (2 \pi f t)$. Since the {\tt ts} run
down the columns, each column contains a cosine signal at a
particular frequency, evaluated at a sequence of times.

\item {\tt np.dot} multiplies each row of {\tt M} by {\tt amps},
element-wise, and then adds up the products. In terms of linear
algebra, we are multiplying a matrix, {\tt M}, by a vector, {\tt
amps}. In terms of signals, we are computing the weighted sum
of frequency components.

\end{enumerate}

Figure~\ref{fig.synthesis} shows the structure of this computation.
Each row of the matrix, {\tt M}, corresponds to a time
from 0.0 to 1.0 seconds; $t_n$ is the time of the $n$th row.
Each column corresponds to a frequency from
100 to 400 Hz; $f_k$ is the frequency of the $k$th column.

I labeled the $n$th row with the letters $a$ through $d$; as an
example, the value of $a$ is $\cos [2 \pi (100) t_n]$.

The result of the dot product, {\tt ys}, is a vector with one element
for each row of {\tt M}. The $n$th element, labeled $e$, is the sum
of products:
%
\[ e = 0.6 a + 0.25 b + 0.1 c + 0.05 d \]
%
And likewise with the other elements of {\tt ys}. So each element
of {\tt ys} is the sum of four frequency components, evaluated at
a point in time, and multiplied by the corresponding amplitudes.
And that's exactly what we wanted.

We can use the code from the previous section to check that the two
versions of {\tt synthesize} produce the same results.

\begin{verbatim}
ys1 = synthesize1(amps, fs, ts)
ys2 = synthesize2(amps, fs, ts)
max(abs(ys1 - ys2))
\end{verbatim}

The biggest difference between {\tt ys1} and {\tt ys2} is about {\tt
1e-13}, which is what we expect due to floating-point errors.
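The same check works without {\tt thinkdsp} if we compare the matrix version against a naive loop that adds up one cosine per component (my stand-in for {\tt synthesize1}; {\tt PI2} is the book's name for $2\pi$):

```python
import numpy as np

PI2 = 2 * np.pi

def synthesize2(amps, fs, ts):
    M = np.cos(PI2 * np.outer(ts, fs))
    return np.dot(M, amps)

def synthesize_naive(amps, fs, ts):
    # add up one cosine per (amplitude, frequency) pair
    ys = np.zeros(len(ts))
    for amp, freq in zip(amps, fs):
        ys += amp * np.cos(PI2 * freq * ts)
    return ys

amps = np.array([0.6, 0.25, 0.1, 0.05])
fs = [100, 200, 300, 400]
ts = np.linspace(0, 1, 11025)
diff = np.max(np.abs(synthesize2(amps, fs, ts) - synthesize_naive(amps, fs, ts)))
```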

Writing this computation in terms of linear algebra makes the code
smaller and faster. Linear algebra
provides concise notation for operations on matrices and vectors. For
example, we could write {\tt synthesize} like this:
%
\begin{eqnarray*}
M &=& \cos (2 \pi t \otimes f) \\
y &=& M a
\end{eqnarray*}
%
where $a$ is a vector of amplitudes,
$t$ is a vector of times, $f$ is a vector of frequencies, and
$\otimes$ is the symbol for the outer product of two vectors.

\section{Analysis}
\label{analysis}

Now we are ready to solve the analysis problem. Suppose I give you
a wave and tell you that it is the sum of cosines with a given set
of frequencies. How would you find the amplitude for each frequency
component? In other words, given {\tt ys}, {\tt ts} and {\tt fs},
can you recover {\tt amps}?

In terms of linear algebra, the first step is the same as for
synthesis: we compute $M = \cos (2 \pi t \otimes f)$. Then we want
to find $a$ so that $y = M a$; in other words, we want to solve a
linear system. NumPy provides {\tt linalg.solve}, which does
exactly that.

Here's what the code looks like:

\begin{verbatim}
def analyze1(ys, fs, ts):
    args = np.outer(ts, fs)
    M = np.cos(PI2 * args)
    amps = np.linalg.solve(M, ys)
    return amps
\end{verbatim}

The first two lines use {\tt ts} and {\tt fs} to build the
matrix, {\tt M}. Then {\tt np.linalg.solve} computes {\tt amps}.

But there's a hitch. In general we can only solve a system of linear
equations if the matrix is square; that is, the number of equations
(rows) is the same as the number of unknowns (columns).

In this example, we have only 4 frequencies, but we evaluated the
signal at 11,025 times. So we have many more equations than unknowns.

In general if {\tt ys}
contains more than 4 elements, it is unlikely that we can analyze it
using only 4 frequencies.

But in this case, we know that the {\tt ys} were actually generated by
adding only 4 frequency components, so we can use any 4 values from
the wave array to recover {\tt amps}.

For simplicity, I'll use the first 4 samples from the signal.
Using the values of {\tt ys}, {\tt fs} and {\tt ts} from
the previous section, we can run {\tt analyze1} like this:

\begin{verbatim}
n = len(fs)
amps2 = analyze1(ys[:n], fs, ts[:n])
\end{verbatim}

And sure enough, {\tt amps2} is

\begin{verbatim}
[ 0.6   0.25  0.1   0.05 ]
\end{verbatim}
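Here is the complete round trip in plain NumPy: synthesize a wave from 4 components, then recover the amplitudes from the first 4 samples (a self-contained sketch of the book's procedure; {\tt PI2} is $2\pi$):

```python
import numpy as np

PI2 = 2 * np.pi

def analyze1(ys, fs, ts):
    args = np.outer(ts, fs)
    M = np.cos(PI2 * args)
    return np.linalg.solve(M, ys)

amps = np.array([0.6, 0.25, 0.1, 0.05])
fs = [100, 200, 300, 400]
framerate = 11025
ts = np.linspace(0, 1, framerate)
ys = np.dot(np.cos(PI2 * np.outer(ts, fs)), amps)   # synthesize

n = len(fs)
amps2 = analyze1(ys[:n], fs, ts[:n])   # recover the amplitudes
```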

This algorithm works, but it is slow. Solving a linear
system of equations takes time proportional to $n^3$, where $n$ is
the number of columns in $M$. We can do better.

\section{Orthogonal matrices}

One way to solve linear systems is by inverting matrices. The
inverse of a matrix $M$ is written $M^{-1}$, and it has the property
that $M^{-1}M = I$. $I$ is the identity matrix, which has
the value 1 on all diagonal elements and 0 everywhere else.

So, to solve the equation $y = Ma$, we can multiply both sides by
$M^{-1}$, which yields:
%
\[ M^{-1}y = M^{-1} M a \]
%
On the right side, we can replace $M^{-1}M$ with $I$:
%
\[ M^{-1}y = I a \]
%
If we multiply $I$ by any vector $a$, the result is $a$, so
%
\[ M^{-1}y = a \]
%
This implies that if we can compute $M^{-1}$ efficiently, we can find
$a$ with a simple matrix multiplication (using {\tt np.dot}). That
takes time proportional to $n^2$, which is better than $n^3$.

Inverting a matrix is slow, in general, but some special cases are
faster. In particular, if $M$ is {\bf orthogonal}, the inverse of $M$
is just the transpose of $M$, written $M^T$. In NumPy
transposing an array is a constant-time operation. It
doesn't actually move the elements of the array; instead, it creates a
``view'' that changes the way the elements are accessed.

Again, a matrix is orthogonal if its transpose is also its inverse;
that is, $M^T = M^{-1}$. That implies that $M^TM = I$, which means we
can check whether a matrix is orthogonal by computing $M^TM$.
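For example, a rotation matrix is orthogonal, and we can verify that its transpose is its inverse (my example, not from the book):

```python
import numpy as np

theta = 0.6   # an arbitrary rotation angle, in radians
M = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

product = M.T @ M   # the identity matrix, up to float error
```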

So let's see what the matrix looks like in {\tt synthesize2}. In
the previous example, $M$ has 11,025 rows, so it might be a good idea
to work with a smaller example:

\begin{verbatim}
def test1():
    amps = np.array([0.6, 0.25, 0.1, 0.05])
    N = 4.0
    time_unit = 0.001
    ts = np.arange(N) / N * time_unit
    max_freq = N / time_unit / 2
    fs = np.arange(N) / N * max_freq
    ys = synthesize2(amps, fs, ts)
\end{verbatim}

{\tt amps} is the same vector of amplitudes we saw before.
Since we have 4 frequency components, we'll sample the signal
at 4 points in time. That way, $M$ is square.

{\tt ts} is a vector of equally spaced sample times in the range from
0 to 1 time unit. I chose the time unit to be 1 millisecond, but it
is an arbitrary choice, and we will see in a minute that it drops out
of the computation anyway.

Since the frame rate is $N$ samples per time unit, the Nyquist
frequency is \verb"N / time_unit / 2", which is 2000 Hz in this
example. So {\tt fs} is a vector of equally spaced frequencies
between 0 and 2000 Hz.

With these values of {\tt ts} and {\tt fs}, the matrix, $M$, is:

\begin{verbatim}
[[ 1.     1.     1.     1.   ]
 [ 1.     0.707  0.    -0.707]
 [ 1.     0.    -1.    -0.   ]
 [ 1.    -0.707 -0.     0.707]]
\end{verbatim}

You might recognize 0.707 as an approximation of $\sqrt{2}/2$,
which is $\cos \pi/4$. You also might notice that this matrix
is {\bf symmetric}, which means that the element at $(j, k)$ always
equals the element at $(k, j)$. This implies that $M$ is its own
transpose; that is, $M^T = M$.

But sadly, $M$ is not orthogonal. If we compute $M^TM$, we get:

\begin{verbatim}
[[ 4.  1. -0.  1.]
 [ 1.  2.  1. -0.]
 [-0.  1.  2.  1.]
 [ 1. -0.  1.  2.]]
\end{verbatim}

And that's not the identity matrix.

\section{DCT-IV}
\label{dctiv}

But if we choose {\tt ts} and {\tt fs} carefully,
we can make $M$ orthogonal.  There are several ways to do it, which
is why there are several versions of the discrete cosine transform (DCT).

One simple option is to shift {\tt ts} and {\tt fs} by a half unit.
This version is called DCT-IV, where ``IV'' is a Roman numeral
indicating that this is the fourth of eight versions of the DCT.

Here's an updated version of {\tt test1}:

\begin{verbatim}
def test2():
    amps = np.array([0.6, 0.25, 0.1, 0.05])
    N = 4.0
    ts = (0.5 + np.arange(N)) / N
    fs = (0.5 + np.arange(N)) / 2
    ys = synthesize2(amps, fs, ts)
\end{verbatim}

If you compare this to the previous version, you'll notice
two changes.  First, I added 0.5 to {\tt ts} and {\tt fs}.
Second, I canceled out \verb"time_unit", which simplifies
the expression for {\tt fs}.

With these values, $M$ is

\begin{verbatim}
[[ 0.981  0.831  0.556  0.195]
 [ 0.831 -0.195 -0.981 -0.556]
 [ 0.556 -0.981  0.195  0.831]
 [ 0.195 -0.556  0.831 -0.981]]
\end{verbatim}

And $M^TM$ is

\begin{verbatim}
[[ 2.  0.  0.  0.]
 [ 0.  2. -0.  0.]
 [ 0. -0.  2. -0.]
 [ 0.  0. -0.  2.]]
\end{verbatim}

Some of the off-diagonal elements are displayed as -0, which means
that the floating-point representation is a small negative number.
This matrix is very close to $2I$, which means $M$ is almost
orthogonal; it's just off by a factor of 2.  And for our purposes,
that's good enough.

Because $M$ is symmetric and (almost) orthogonal, the inverse of $M$
is just $M/2$.  Now we can write a more efficient version of {\tt
analyze}:

\begin{verbatim}
def analyze2(ys, fs, ts):
    args = np.outer(ts, fs)
    M = np.cos(PI2 * args)
    amps = np.dot(M, ys) / 2
    return amps
\end{verbatim}

Instead of using {\tt np.linalg.solve}, we just multiply
by $M/2$.

Combining {\tt test2} and {\tt analyze2}, we can write an
implementation of DCT-IV:

\begin{verbatim}
def dct_iv(ys):
    N = len(ys)
    ts = (0.5 + np.arange(N)) / N
    fs = (0.5 + np.arange(N)) / 2
    args = np.outer(ts, fs)
    M = np.cos(PI2 * args)
    amps = np.dot(M, ys) / 2
    return amps
\end{verbatim}

Again, {\tt ys} is the wave array.  We don't have to pass
{\tt ts} and {\tt fs} as parameters; \verb"dct_iv" can
figure them out based on {\tt N}, the length of {\tt ys}.

If we've got it right, this function should solve the analysis
problem; that is, given {\tt ys} it should be able to recover
{\tt amps}.  We can test it like this:

\begin{verbatim}
amps = np.array([0.6, 0.25, 0.1, 0.05])
N = 4.0
ts = (0.5 + np.arange(N)) / N
fs = (0.5 + np.arange(N)) / 2
ys = synthesize2(amps, fs, ts)
amps2 = dct_iv(ys)
max(abs(amps - amps2))
\end{verbatim}

Starting with {\tt amps}, we synthesize a wave array, then use
\verb"dct_iv" to compute {\tt amps2}.  The biggest
difference between {\tt amps} and {\tt amps2} is about {\tt 1e-16},
which is what we expect due to floating-point errors.


\section{Inverse DCT}

Finally, notice that {\tt analyze2} and {\tt synthesize2} are almost
identical.  The only difference is that {\tt analyze2} divides the
result by 2.  We can use this insight to compute the inverse DCT:

\begin{verbatim}
def inverse_dct_iv(amps):
    return dct_iv(amps) * 2
\end{verbatim}

\verb"inverse_dct_iv" solves the synthesis problem: it
takes the vector of amplitudes and returns
the wave array, {\tt ys}.  We can test it by starting
with {\tt amps}, applying \verb"inverse_dct_iv" and \verb"dct_iv",
and testing that we get back what we started with.

\begin{verbatim}
amps = np.array([0.6, 0.25, 0.1, 0.05])
ys = inverse_dct_iv(amps)
amps2 = dct_iv(ys)
max(abs(amps - amps2))
\end{verbatim}

Again, the biggest difference is about {\tt 1e-16}.
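The whole round trip can be checked with a self-contained sketch; the function names follow the text, and the matrix is built inside \verb"dct_iv" so the snippet runs on its own:

```python
import numpy as np

PI2 = 2 * np.pi

def dct_iv(ys):
    # Build the DCT-IV matrix from half-shifted times and frequencies.
    N = len(ys)
    ts = (0.5 + np.arange(N)) / N
    fs = (0.5 + np.arange(N)) / 2
    M = np.cos(PI2 * np.outer(ts, fs))
    return np.dot(M, ys) / 2

def inverse_dct_iv(amps):
    # Because M is symmetric and M M = 2I, the inverse of dct_iv
    # is the same operation times 2.
    return dct_iv(amps) * 2

amps = np.array([0.6, 0.25, 0.1, 0.05])
ys = inverse_dct_iv(amps)
amps2 = dct_iv(ys)
print(max(abs(amps - amps2)))   # on the order of 1e-16
```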


\section{The Dct class}

\begin{figure}
% dct.py
\centerline{\includegraphics[height=2.5in]{figs/dct1.eps}}
\caption{DCT of a triangle signal at 400 Hz, sampled at 10 kHz.}
\label{fig.dct1}
\end{figure}

{\tt thinkdsp} provides a Dct class that encapsulates the
DCT in the same way the Spectrum class encapsulates the FFT.
To make a Dct object, you can invoke \verb"make_dct" on a Wave.

\begin{verbatim}
signal = thinkdsp.TriangleSignal(freq=400)
wave = signal.make_wave(duration=1.0, framerate=10000)
dct = wave.make_dct()
dct.plot()
\end{verbatim}

The result is the DCT of a triangle wave at 400 Hz, shown in
Figure~\ref{fig.dct1}.  The values of the DCT can be positive or negative;
a negative value in the DCT corresponds to a negated cosine or,
equivalently, to a cosine shifted by 180 degrees.

\verb"make_dct" uses DCT-II, which is the most common type of DCT,
provided by {\tt scipy.fftpack}.

\begin{verbatim}
import scipy.fftpack

# class Wave:
    def make_dct(self):
        N = len(self.ys)
        hs = scipy.fftpack.dct(self.ys, type=2)
        fs = (0.5 + np.arange(N)) / 2
        return Dct(hs, fs, self.framerate)
\end{verbatim}

The results from {\tt dct} are stored in {\tt hs}.  The corresponding
frequencies, computed as in Section~\ref{dctiv}, are stored in {\tt fs}.
Then both are used to initialize the Dct object.

Dct provides \verb"make_wave", which performs the inverse DCT.
We can test it like this:

\begin{verbatim}
wave2 = dct.make_wave()
max(abs(wave.ys-wave2.ys))
\end{verbatim}

The biggest difference between {\tt wave.ys} and {\tt wave2.ys} is
about {\tt 1e-16}, which is what we expect due to floating-point
errors.

\verb"make_wave" uses {\tt scipy.fftpack.idct}:

\begin{verbatim}
# class Dct
    def make_wave(self):
        n = len(self.hs)
        ys = scipy.fftpack.idct(self.hs, type=2) / 2 / n
        return Wave(ys, framerate=self.framerate)
\end{verbatim}

By default, the inverse DCT doesn't normalize the result, so we have
to divide through by $2N$.
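A short sketch confirms the normalization: with the default (unnormalized) transforms, \verb"idct(dct(ys))" returns $2N$ times the original signal, so dividing by $2N$ recovers it.

```python
import numpy as np
import scipy.fftpack

ys = np.array([0.6, 0.25, 0.1, 0.05])
N = len(ys)

hs = scipy.fftpack.dct(ys, type=2)
# The unnormalized inverse returns 2N times the original signal,
# so we divide through by 2N to recover ys.
ys2 = scipy.fftpack.idct(hs, type=2) / (2 * N)
print(max(abs(ys - ys2)))   # on the order of 1e-16
```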


\section{Exercises}

For the following exercises, I provide some starter code in
{\tt chap06starter.ipynb}.
Solutions are in {\tt chap06soln.ipynb}.

\begin{exercise}
In this chapter I claim that {\tt analyze1} takes time proportional
to $n^3$ and {\tt analyze2} takes time proportional to $n^2$.  To
see if that's true, run them on a range of input sizes and time
them.  In IPython, you can use the ``magic command'' \verb"%timeit".

If you plot run time versus input size on a log-log scale, you
should get a straight line with slope 3 for {\tt analyze1} and
slope 2 for {\tt analyze2}.

You also might want to test \verb"dct_iv"
and {\tt scipy.fftpack.dct}.
\end{exercise}

\begin{exercise}
One of the major applications of the DCT is compression for both
sound and images.  In its simplest form, DCT-based compression
works like this:

\begin{enumerate}

\item Break a long signal into segments.

\item Compute the DCT of each segment.

\item Identify frequency components with amplitudes so low they are
inaudible, and remove them.  Store only the frequencies and
amplitudes that remain.

\item To play back the signal, load the frequencies and amplitudes
for each segment and apply the inverse DCT.

\end{enumerate}

Implement a version of this algorithm and apply it to a recording
of music or speech.  How many components can you eliminate before
the difference is perceptible?

In order to make this method practical, you need some way to store a
sparse array; that is, an array where most of the elements are zero.
SciPy provides several implementations of sparse arrays, which you can
read about at
\url{http://docs.scipy.org/doc/scipy/reference/sparse.html}.
\end{exercise}

\begin{exercise}
In the repository for this book you will find an IPython notebook
called \verb"phase.ipynb" that explores the effect of phase on sound
perception.
Read through this notebook and run the examples.
Choose another segment of sound and run the same experiments.
Can you find any general relationships between the phase structure
of a sound and how we perceive it?
\end{exercise}


\chapter{Discrete Fourier Transform}
\label{dft}

We've been using the discrete Fourier transform (DFT) since
Chapter~\ref{sounds}, but I haven't explained how it works.  Now is
the time.

If you understand the discrete cosine transform (DCT), you will
understand the DFT.  The only difference is that instead of using the
cosine function, we'll use the complex exponential function.  I'll
start by explaining complex exponentials, then I'll follow the
same progression as in Chapter~\ref{dct}:

\begin{enumerate}

\item We'll start with the synthesis
problem: given a set of frequency components and their amplitudes,
how can we construct a signal?  The synthesis problem is
equivalent to the inverse DFT.

\item Then I'll rewrite the synthesis problem in the form of matrix
multiplication using NumPy arrays.

\item Next we'll solve the analysis problem, which is equivalent to
the DFT: given a signal, how do we find the amplitude and phase
offset of its frequency components?

\item Finally, we'll use linear algebra to find a more efficient way
to compute the DFT.

\end{enumerate}

The code for this chapter is in {\tt chap07.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp07}.


\section{Complex exponentials}

One of the more interesting moves in mathematics is the generalization
of an operation from one type to another.  For example, factorial is a
function that operates on integers; the natural definition for
factorial of $n$ is the product of all integers from 1 to $n$.

If you are of a certain inclination, you might wonder how to compute
the factorial of a non-integer like 3.5.  Since the natural definition
doesn't apply, you might look for other ways to compute the factorial
function, ways that would work with non-integers.

In 1730, Leonhard Euler found one, a generalization of the factorial
function that we know as the gamma function (see
\url{http://en.wikipedia.org/wiki/Gamma_function}).

Euler also found one of the most useful generalizations in applied
mathematics, the complex exponential function.

The natural definition of exponentiation is repeated multiplication.
For example, $\phi^3 = \phi \cdot \phi \cdot \phi$.  But this
definition doesn't apply to non-integer exponents.

However, exponentiation can also be expressed as a power series:
%
\[ e^\phi = 1 + \phi + \phi^2/2! + \phi^3/3! + ... \]
%
This definition works with real numbers, imaginary numbers and, by a simple
extension, with complex numbers.  Applying this definition
to a pure imaginary number, $i\phi$, we get
%
\[ e^{i\phi} = 1 + i\phi - \phi^2/2! - i\phi^3/3! + ... \]
%
By rearranging terms, we can show that this is equivalent to:
%
\[ e^{i\phi} = \cos \phi + i \sin \phi \]
%
You can see the derivation at
\url{http://en.wikipedia.org/wiki/Euler's_formula}.
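Euler's formula is easy to check numerically; a minimal sketch:

```python
import numpy as np

phi = 1.5
lhs = np.exp(1j * phi)                    # complex exponential
rhs = np.cos(phi) + 1j * np.sin(phi)      # Euler's formula
print(abs(lhs - rhs))   # effectively zero
print(abs(lhs))         # magnitude is 1
```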

This formula
implies that $e^{i\phi}$ is a complex number with magnitude 1; if you
think of it as a point in the complex plane, it is always on the unit
circle.  And if you think of it as a vector, the angle in radians
between the vector and the positive x-axis is the argument, $\phi$.

In the case where the exponent is a complex number, we have:
%
\[ e^{a + i\phi} = e^a e^{i\phi} = A e^{i\phi} \]
%
where $A$ is a real number that indicates amplitude and
$e^{i\phi}$ is a unit complex number that indicates angle.

NumPy provides a version of {\tt exp} that works with complex numbers:

\begin{verbatim}
>>> phi = 1.5
>>> z = np.exp(1j * phi)
>>> z
(0.0707+0.997j)
\end{verbatim}

Python uses {\tt j} to represent the imaginary unit, rather
than {\tt i}.  A number ending in {\tt j} is considered imaginary,
so {\tt 1j} is just $i$.

When the argument to {\tt np.exp} is imaginary or complex, the
result is a complex number; specifically, an {\tt np.complex128},
which is represented by two 64-bit floating-point numbers.
In this example, the result is {\tt 0.0707+0.997j}.

Complex numbers have attributes {\tt real} and {\tt imag}:

\begin{verbatim}
>>> z.real
0.0707
>>> z.imag
0.997
\end{verbatim}

To get the magnitude, you can use the built-in function {\tt abs}
or {\tt np.absolute}:

\begin{verbatim}
>>> abs(z)
1.0
>>> np.absolute(z)
1.0
\end{verbatim}

To get the angle, you can use {\tt np.angle}:

\begin{verbatim}
>>> np.angle(z)
1.5
\end{verbatim}

This example confirms that $e^{i \phi}$ is a complex number with
magnitude 1 and angle $\phi$ radians.


\section{Complex signals}

If $\phi(t)$ is a function of time, $e^{i \phi(t)}$ is also a function
of time.  Specifically,
%
\[ e^{i \phi(t)} = \cos \phi(t) + i \sin \phi(t) \]
%
This function describes a quantity that varies in time, so it is
a signal.  Specifically, it is a {\bf complex exponential signal}.

In the special case where the frequency of the signal is constant,
$\phi(t)$ is $2 \pi f t$ and the result is a {\bf complex sinusoid}:
%
\[ e^{i 2 \pi f t} = \cos 2 \pi f t + i \sin 2 \pi f t \]
%
Or more generally, the signal might start at a phase offset
$\phi_0$, yielding
%
\[ e^{i (2 \pi f t + \phi_0)} \]
%
{\tt thinkdsp} provides an implementation of this signal,
{\tt ComplexSinusoid}:

\begin{verbatim}
class ComplexSinusoid(Sinusoid):

    def evaluate(self, ts):
        phases = PI2 * self.freq * ts + self.offset
        ys = self.amp * np.exp(1j * phases)
        return ys
\end{verbatim}

{\tt ComplexSinusoid} inherits \verb"__init__" from
{\tt Sinusoid}.  It provides a version of {\tt evaluate}
that is almost identical to {\tt Sinusoid.evaluate}; the
only difference is that it uses {\tt np.exp} instead of
{\tt np.sin}.

The result is a NumPy array of complex numbers:

\begin{verbatim}
>>> signal = thinkdsp.ComplexSinusoid(freq=1, amp=0.6, offset=1)
>>> wave = signal.make_wave(duration=1, framerate=4)
>>> print(wave.ys)
[ 0.324+0.505j -0.505+0.324j -0.324-0.505j  0.505-0.324j]
\end{verbatim}

The frequency of this signal is 1 cycle per second; the amplitude
is 0.6 (in unspecified units); and the phase offset is 1 radian.

This example evaluates the signal at 4 places equally spaced between
0 and 1 second.  The resulting samples are complex numbers.

\section{The synthesis problem}

Just as we did with real sinusoids, we can create compound signals
by adding up complex sinusoids with different frequencies.  And that
brings us to the complex version of the synthesis problem: given the
frequency and amplitude of each complex component, how do we evaluate the
signal?

The simplest solution is to create {\tt ComplexSinusoid} objects and
add them up.

\begin{verbatim}
def synthesize1(amps, fs, ts):
    components = [thinkdsp.ComplexSinusoid(freq, amp)
                  for amp, freq in zip(amps, fs)]
    signal = thinkdsp.SumSignal(*components)
    ys = signal.evaluate(ts)
    return ys
\end{verbatim}

This function is almost identical to {\tt synthesize1} in
Section~\ref{synth1}; the only difference is that I replaced
{\tt CosSignal} with {\tt ComplexSinusoid}.

Here's an example:

\begin{verbatim}
>>> amps = np.array([0.6, 0.25, 0.1, 0.05])
>>> fs = [100, 200, 300, 400]
>>> framerate = 11025
>>> ts = np.linspace(0, 1, framerate)
>>> ys = synthesize1(amps, fs, ts)
>>> print(ys)
[ 1.000 +0.000e+00j  0.995 +9.093e-02j  0.979 +1.803e-01j ...,
  0.979 -1.803e-01j  0.995 -9.093e-02j  1.000 -5.081e-15j]
\end{verbatim}

At the lowest level, a complex signal is a sequence of complex
numbers.  But how should we interpret it?  We have some intuition for
real signals: they represent quantities that vary in time; for
example, a sound signal represents changes in air pressure.
But nothing we measure in the world yields complex numbers.

\begin{figure}
% dft.py
\centerline{\includegraphics[height=2.5in]{figs/dft1.eps}}
\caption{Real and imaginary parts of a mixture of complex sinusoids.}
\label{fig.dft1}
\end{figure}

So what is a complex signal?  I don't have a satisfying answer to this
question.  The best I can offer is two unsatisfying
answers:

\begin{enumerate}

\item A complex signal is a mathematical abstraction that is useful
for computation and analysis, but it does not correspond directly
with anything in the real world.

\item If you like, you can think of a complex signal as a sequence of
complex numbers that contains two signals as its real and imaginary
parts.

\end{enumerate}

Taking the second point of view, we can split the previous
signal into its real and imaginary parts:

\begin{verbatim}
n = 500
thinkplot.plot(ts[:n], ys[:n].real, label='real')
thinkplot.plot(ts[:n], ys[:n].imag, label='imag')
\end{verbatim}

Figure~\ref{fig.dft1} shows a segment of the result.  The
real part is a sum of cosines; the imaginary part is
a sum of sines.  Although the waveforms look different, they
contain the same frequency components in the same proportions.
To our ears, they sound the same (in general, we don't hear
phase offsets).


\section{Synthesis with matrices}
\label{synthmat}

As we saw in Section~\ref{synthesis}, we can also express the synthesis
problem in terms of matrix multiplication:

\begin{verbatim}
PI2 = np.pi * 2

def synthesize2(amps, fs, ts):
    args = np.outer(ts, fs)
    M = np.exp(1j * PI2 * args)
    ys = np.dot(M, amps)
    return ys
\end{verbatim}

Again, {\tt amps} is a NumPy array that contains a sequence of
amplitudes.

{\tt fs} is a sequence containing the frequencies of the
components.  {\tt ts} contains the times where we will evaluate
the signal.

{\tt args} contains the outer product of {\tt ts} and {\tt fs},
with the {\tt ts} running down the rows and the {\tt fs} running
across the columns (you might want to refer back to
Figure~\ref{fig.synthesis}).

Each column of matrix {\tt M} contains a complex sinusoid with
a particular frequency, evaluated at a sequence of {\tt ts}.

When we multiply {\tt M} by the amplitudes, the result is a vector
whose elements correspond to the {\tt ts}; each element is the sum of
several complex sinusoids, evaluated at a particular time.

Here's the example from the previous section again:

\begin{verbatim}
>>> ys = synthesize2(amps, fs, ts)
>>> print(ys)
[ 1.000 +0.000e+00j  0.995 +9.093e-02j  0.979 +1.803e-01j ...,
  0.979 -1.803e-01j  0.995 -9.093e-02j  1.000 -5.081e-15j]
\end{verbatim}

The result is the same.

In this example the amplitudes are real, but they could also be
complex.  What effect does a complex amplitude have on the result?
Remember that we can think of a complex number in two ways: either the
sum of a real and imaginary part, $x + i y$, or the product of a real
amplitude and a complex exponential, $A e^{i \phi_0}$.  Using the
second interpretation, we can see what happens when we multiply
a complex amplitude by a complex sinusoid.  For each frequency, $f$,
we have:
%
\[ A e^{i \phi_0} \cdot e^{i 2 \pi f t} = A e^{i (2 \pi f t + \phi_0)} \]
%
Multiplying by $A e^{i \phi_0}$ multiplies the amplitude by $A$
and adds the phase offset $\phi_0$.
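That identity is easy to spot-check numerically; a minimal sketch with arbitrary values:

```python
import numpy as np

A, phi0, f, t = 0.6, 1.5, 100.0, 0.003

# Complex amplitude times complex sinusoid...
lhs = A * np.exp(1j * phi0) * np.exp(1j * 2 * np.pi * f * t)
# ...equals a sinusoid with the phase offset folded into the exponent.
rhs = A * np.exp(1j * (2 * np.pi * f * t + phi0))
print(abs(lhs - rhs))   # effectively zero
```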

\begin{figure}
% dft.py
\centerline{\includegraphics[height=2.5in]{figs/dft2.eps}}
\caption{Real part of two complex signals that differ by a phase
offset.}
\label{fig.dft2}
\end{figure}

We can test that claim by running the previous example with
$\phi_0 = 1.5$ for all frequency components:

\begin{verbatim}
phi = 1.5
amps2 = amps * np.exp(1j * phi)
ys2 = synthesize2(amps2, fs, ts)

thinkplot.plot(ts[:n], ys.real[:n])
thinkplot.plot(ts[:n], ys2.real[:n])
\end{verbatim}

Since {\tt amps}
is an array of reals, multiplying by {\tt np.exp(1j * phi)} yields
an array of complex numbers with phase offset {\tt phi} radians, and
the same magnitudes as {\tt amps}.

Figure~\ref{fig.dft2} shows the result.  The phase offset
$\phi_0 = 1.5$ shifts the wave to the left by about one quarter of
a cycle; it also changes the waveform, because the same phase
offset applied to different frequencies changes how the frequency
components line up with each other.

Now that we have the more general solution to the synthesis problem --
one that handles complex amplitudes -- we are ready for the analysis
problem.


\section{The analysis problem}

The analysis problem is the inverse of the synthesis problem: given a
sequence of samples, $y$, and knowing the frequencies
that make up the signal, can we compute the complex amplitudes of the
components, $a$?

As we saw in Section~\ref{analysis}, we can solve this problem by forming
the synthesis matrix, $M$, and solving the system of linear
equations, $M a = y$, for $a$.

\begin{verbatim}
def analyze1(ys, fs, ts):
    args = np.outer(ts, fs)
    M = np.exp(1j * PI2 * args)
    amps = np.linalg.solve(M, ys)
    return amps
\end{verbatim}

{\tt analyze1} takes a (possibly complex) wave array, {\tt ys}, a
sequence of real frequencies, {\tt fs}, and a sequence of real
times, {\tt ts}.  It returns a sequence of complex amplitudes, {\tt
amps}.

Continuing the previous example, we can confirm that {\tt analyze1}
recovers the amplitudes we started with.  For the linear system
solver to work, {\tt M} has to be square, so we need {\tt ys}, {\tt
fs} and {\tt ts} to have the same length.  I'll ensure that by
slicing {\tt ys} and {\tt ts} down to the length of {\tt fs}:

\begin{verbatim}
>>> n = len(fs)
>>> amps2 = analyze1(ys[:n], fs, ts[:n])
>>> print(amps2)
[ 0.60+0.j  0.25-0.j  0.10+0.j  0.05-0.j]
\end{verbatim}

These are approximately the amplitudes we started with, although
each component has a small imaginary part due to
floating-point errors.


\section{Efficient analysis}

Unfortunately, solving a linear system of equations is slow.  For the
DCT, we were able to speed things up by choosing {\tt fs} and {\tt
ts} so that {\tt M} is orthogonal.  That way, the inverse of {\tt M}
is the transpose of {\tt M}, and we can compute both DCT and inverse
DCT by matrix multiplication.

We'll do the same thing for the DFT, with one small change.
Since {\tt M} is complex, we need it to be {\bf unitary}, rather
than orthogonal, which means that the inverse of {\tt M} is
the conjugate transpose of {\tt M}, which we can compute by
transposing the matrix and negating the imaginary part of each
element.  See \url{http://en.wikipedia.org/wiki/Unitary_matrix}.

The NumPy methods {\tt conj} and {\tt transpose} do what we
want.  Here's the code that computes {\tt M} for $N=4$ components:

\begin{verbatim}
>>> N = 4
>>> ts = np.arange(N) / N
>>> fs = np.arange(N)
>>> args = np.outer(ts, fs)
>>> M = np.exp(1j * PI2 * args)
\end{verbatim}

If $M$ is unitary, $M^*M = I$, where $M^*$ is the conjugate transpose
of $M$, and $I$ is the identity matrix.  We can test whether $M$
is unitary like this:

\begin{verbatim}
>>> MstarM = M.conj().transpose().dot(M)
\end{verbatim}

The result, within the tolerance of floating-point error, is
$4 I$, so $M$ is unitary except for an extra factor of $N$,
similar to the extra factor of 2 we found with the DCT.
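Here's a self-contained sketch of that check, confirming that $M^*M$ is $N I$ rather than $I$:

```python
import numpy as np

PI2 = 2 * np.pi
N = 4

ts = np.arange(N) / N
fs = np.arange(N)
M = np.exp(1j * PI2 * np.outer(ts, fs))

# For this choice of ts and fs, M* M is N times the identity matrix.
MstarM = M.conj().transpose().dot(M)
print(np.allclose(MstarM, N * np.eye(N)))
```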

We can use this result to write a faster version of {\tt analyze1}:

\begin{verbatim}
def analyze2(ys, fs, ts):
    N = len(fs)
    args = np.outer(ts, fs)
    M = np.exp(1j * PI2 * args)
    amps = M.conj().transpose().dot(ys) / N
    return amps
\end{verbatim}

And test it with appropriate values of {\tt fs} and {\tt ts}:

\begin{verbatim}
>>> N = 4
>>> amps = np.array([0.6, 0.25, 0.1, 0.05])
>>> fs = np.arange(N)
>>> ts = np.arange(N) / N
>>> ys = synthesize2(amps, fs, ts)

>>> amps3 = analyze2(ys, fs, ts)
>>> print(amps3)
[ 0.60+0.j  0.25+0.j  0.10-0.j  0.05-0.j]
\end{verbatim}

Again, the result is correct within the tolerance of floating-point
arithmetic.


\section{DFT}
\label{dftsection}

As a function, {\tt analyze2} would be hard to use because it
only works if {\tt fs} and {\tt ts} are chosen correctly.
Instead, I will rewrite it to take just {\tt ys} and compute {\tt fs}
and {\tt ts} itself.

First, I'll make a function to compute the synthesis matrix, $M$:

\begin{verbatim}
def synthesis_matrix(N):
    ts = np.arange(N) / N
    fs = np.arange(N)
    args = np.outer(ts, fs)
    M = np.exp(1j * PI2 * args)
    return M
\end{verbatim}

Then I'll write the function that takes {\tt ys} and returns
{\tt amps}:

\begin{verbatim}
def analyze3(ys):
    N = len(ys)
    M = synthesis_matrix(N)
    amps = M.conj().transpose().dot(ys) / N
    return amps
\end{verbatim}

We are almost done; {\tt analyze3} computes something very
close to the DFT, with one difference.  The conventional definition
of DFT does not divide by {\tt N}:

\begin{verbatim}
def dft(ys):
    N = len(ys)
    M = synthesis_matrix(N)
    amps = M.conj().transpose().dot(ys)
    return amps
\end{verbatim}

Now we can confirm that my version yields the same result as
FFT:

\begin{verbatim}
>>> dft(ys)
[ 2.4+0.j  1.0+0.j  0.4-0.j  0.2-0.j]
\end{verbatim}

The result is close to {\tt amps * N}.
And here's the version in {\tt np.fft}:

\begin{verbatim}
>>> np.fft.fft(ys)
[ 2.4+0.j  1.0+0.j  0.4-0.j  0.2-0.j]
\end{verbatim}

They are the same, within floating point error.
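You can reproduce that comparison with a self-contained sketch; the matrix construction follows the text, and the input is an arbitrary test vector:

```python
import numpy as np

PI2 = 2 * np.pi

def synthesis_matrix(N):
    ts = np.arange(N) / N
    fs = np.arange(N)
    return np.exp(1j * PI2 * np.outer(ts, fs))

def dft(ys):
    # Conjugate transpose of the synthesis matrix, with no 1/N factor,
    # matching the conventional definition of the DFT.
    N = len(ys)
    M = synthesis_matrix(N)
    return M.conj().transpose().dot(ys)

ys = np.array([0.9, -0.3, 0.2, 0.4])
print(np.allclose(dft(ys), np.fft.fft(ys)))
```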

The inverse DFT is almost the same, except we don't have to transpose
and conjugate $M$, and {\em now} we have to divide through by $N$:

\begin{verbatim}
def idft(ys):
    N = len(ys)
    M = synthesis_matrix(N)
    amps = M.dot(ys) / N
    return amps
\end{verbatim}

Finally, we can confirm that {\tt dft(idft(amps))} yields {\tt amps}.

\begin{verbatim}
>>> ys = idft(amps)
>>> dft(ys)
[ 0.60+0.j  0.25+0.j  0.10-0.j  0.05-0.j]
\end{verbatim}

If I could go back in time, I might change the definition of
DFT so it divides by $N$ and the inverse DFT doesn't.  That would
be more consistent with my presentation of the synthesis and analysis
problems.

Or I might change the definition so that both operations divide
through by $\sqrt{N}$.  Then the DFT and inverse DFT would be
more symmetric.

But I can't go back in time (yet!), so we're stuck with a slightly
weird convention.  For practical purposes it doesn't really
matter.


\section{The DFT is periodic}

In this chapter I presented the DFT in the form of matrix multiplication.
We compute the synthesis matrix, $M$, and the analysis matrix, $M^*$.
When we multiply $M^{*}$ by the wave array, $y$, each element of the
result is the product of a row from $M^*$ and $y$, which we can
write in the form of a summation:
%
\[ DFT(y)[k] = \sum_n y[n] \exp(-2 \pi i n k / N) \]
%
where $k$ is an index of frequency from
$0$ to $N-1$ and $n$ is an index of time from $0$ to $N-1$.
So $DFT(y)[k]$ is the $k$th element of the DFT of $y$.

Normally we evaluate this summation for $N$ values of $k$, running from
0 to $N-1$.  We {\em could} evaluate it for other values of $k$, but
there is no point, because they start to repeat.  That is, the value at
$k$ is the same as the value at $k+N$ or $k+2N$ or $k-N$, etc.

We can see that mathematically by plugging $k+N$ into the summation:
%
\[ DFT(y)[k+N] = \sum_n y[n] \exp(-2 \pi i n (k+N) / N) \]
%
Since there is a sum in the exponent, we can break it into two parts:
%
\[ DFT(y)[k+N] = \sum_n y[n] \exp(-2 \pi i n k / N) \exp(-2 \pi i n N / N) \]
%
In the second term, the exponent is always an integer multiple of
$2 \pi$, so the result is always 1, and we can drop it:
%
\[ DFT(y)[k+N] = \sum_n y[n] \exp(-2 \pi i n k / N) \]
%
And we can see that this summation is equivalent to $DFT(y)[k]$.
So the DFT is periodic, with period $N$.  You will need this result
for one of the exercises below, which asks you to implement the Fast Fourier
Transform (FFT).
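The periodicity is easy to confirm by evaluating the summation directly for an out-of-range index; the helper name \verb"dft_element" is my own, for illustration:

```python
import numpy as np

def dft_element(ys, k):
    # Evaluate the DFT summation directly for a single
    # (possibly out-of-range) frequency index k.
    N = len(ys)
    n = np.arange(N)
    return np.sum(ys * np.exp(-2j * np.pi * n * k / N))

ys = np.array([0.6, 0.25, 0.1, 0.05])
N = len(ys)
k = 1
# The value at k + N equals the value at k.
print(np.allclose(dft_element(ys, k), dft_element(ys, k + N)))
```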

As an aside, writing the DFT in the form of a summation provides an
insight into how it works.  If you review the diagram in
Section~\ref{synthesis}, you'll see that each column of the synthesis matrix
is a signal evaluated at a sequence of times.  The analysis matrix is
the (conjugate) transpose of the synthesis matrix, so each {\em row}
is a signal evaluated at a sequence of times.

Therefore, each summation is the correlation of $y$ with one of the
signals in the array (see Section~\ref{dotproduct}).  That is, each
element of the DFT is a correlation that quantifies the similarity of
the wave array, $y$, and a complex exponential at a particular
frequency.


\section{DFT of real signals}

\begin{figure}
% dft.py
\centerline{\includegraphics[height=2.5in]{figs/dft3.eps}}
\caption{DFT of a 500 Hz sawtooth signal sampled at 10 kHz.}
\label{fig.dft3}
\end{figure}

The Spectrum class in {\tt thinkdsp} is based on {\tt np.fft.rfft},
which computes the ``real DFT''; that is, it works with real signals.
But the DFT as presented in this chapter is more general than that; it
works with complex signals.

So what happens when we apply the ``full DFT'' to a real signal?
Let's look at an example:

\begin{verbatim}
>>> signal = thinkdsp.SawtoothSignal(freq=500)
>>> wave = signal.make_wave(duration=0.1, framerate=10000)
>>> hs = dft(wave.ys)
>>> amps = np.absolute(hs)
\end{verbatim}

This code makes a sawtooth wave with frequency 500 Hz, sampled at
framerate 10 kHz.  {\tt hs} contains the complex DFT of the wave;
{\tt amps} contains the amplitude at each frequency.  But what
frequency do these amplitudes correspond to?  If we look at the
body of {\tt dft}, we see:

\begin{verbatim}
>>> fs = np.arange(N)
\end{verbatim}

It's tempting to think that these values are the right frequencies.
The problem is that {\tt dft} doesn't know the sampling rate.  The DFT
assumes that the duration of the wave is 1 time unit, so it thinks the
sampling rate is $N$ per time unit.  In order to interpret the
frequencies, we have to convert from these arbitrary time units back
to seconds, like this:

\begin{verbatim}
>>> fs = np.arange(N) * framerate / N
\end{verbatim}

With this change, the range of frequencies is from 0 to the actual
framerate, 10 kHz.  Now we can plot the spectrum:

\begin{verbatim}
>>> thinkplot.plot(fs, amps)
>>> thinkplot.config(xlabel='frequency (Hz)',
                     ylabel='amplitude')
\end{verbatim}

Figure~\ref{fig.dft3} shows the amplitude of the signal for each
frequency component from 0 to 10 kHz.  The left half of the figure
is what we should expect: the dominant frequency is at 500 Hz, with
harmonics dropping off like $1/f$.

But the right half of the figure is a surprise.  Past 5000 Hz, the
amplitudes of the harmonics start growing again, peaking at 9500 Hz.
What's going on?

The answer: aliasing.  Remember that with framerate 10000 Hz, the
folding frequency is 5000 Hz.  As we saw in Section~\ref{aliasing},
a component at 5500 Hz is indistinguishable from a component
at 4500 Hz.  When we evaluate the DFT at 5500 Hz, we get the same
value as at 4500 Hz.  Similarly, the value at 6000 Hz is the same
as the one at 4000 Hz, and so on.

The DFT of a real signal is symmetric around the folding frequency.
Since there is no additional information past this point, we can
save time by evaluating only the first half of the DFT,
and that's exactly what {\tt np.fft.rfft} does.
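We can verify this symmetry directly; a sketch, using any real-valued array:

```python
import numpy as np

ys = np.random.random(16)      # any real signal
hs = np.fft.fft(ys)
N = len(ys)

# conjugate symmetry: the value at N-k is the conjugate of the value at k
print(all(np.isclose(hs[k], np.conj(hs[N - k])) for k in range(1, N)))  # True

# so rfft keeps only the first half, plus the folding frequency
print(np.allclose(np.fft.rfft(ys), hs[:N // 2 + 1]))  # True
```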


\section{Exercises}

Solutions to these exercises are in {\tt chap07soln.ipynb}.


\begin{exercise}
The notebook for this chapter, {\tt chap07.ipynb}, contains
additional examples and explanations.  Read through it and run
the code.
\end{exercise}


\begin{exercise}
In this chapter, I showed how we can express the DFT and inverse DFT
as matrix multiplications.  These operations take time proportional to
$N^2$, where $N$ is the length of the wave array.  That is fast enough
for many applications, but there is a faster
algorithm, the Fast Fourier Transform (FFT), which takes time
proportional to $N \log N$.

The key to the FFT is the Danielson-Lanczos lemma:
%
\[ DFT(y)[n] = DFT(e)[n] + \exp(-2 \pi i n / N) DFT(o)[n] \]
%
where $DFT(y)[n]$ is the $n$th element of the DFT of $y$; $e$ is a
wave array containing the even elements of $y$, and $o$ contains the
odd elements of $y$.

This lemma suggests a recursive algorithm for the DFT:

\begin{enumerate}

\item Given a wave array, $y$, split it into its even elements, $e$,
and its odd elements, $o$.

\item Compute the DFT of $e$ and $o$ by making recursive calls.

\item Compute $DFT(y)$ for each value of $n$ using the
Danielson-Lanczos lemma.

\end{enumerate}

For the base case of this recursion, you could wait until the length
of $y$ is 1.  In that case, $DFT(y) = y$.  Or if the length of $y$
is sufficiently small, you could compute its DFT by matrix multiplication,
possibly using a precomputed matrix.

Hint: I suggest you implement this algorithm incrementally by starting
with a version that is not truly recursive.  In Step 2, instead of
making a recursive call, use {\tt dft}, as defined in
Section~\ref{dftsection}, or {\tt np.fft.fft}.  Get Step 3 working,
and confirm that the results are consistent with the other
implementations.  Then add a base case and confirm that it works.
Finally, replace Step 2 with recursive calls.

One more hint: Remember that the DFT is periodic; you might find {\tt
np.tile} useful.

You can read more about the FFT at
\url{https://en.wikipedia.org/wiki/Fast_Fourier_transform}.

\end{exercise}
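As a nudge in that direction, here is a sketch of the non-recursive version the hint describes; Step 2 uses {\tt np.fft.fft} on the halves, and Step 3 applies the lemma.  The name \verb"fft_norec" is mine, not from the book, and the recursive version is left to you:

```python
import numpy as np

def fft_norec(ys):
    """One level of the FFT: split, transform the halves with
    np.fft.fft, then combine using the Danielson-Lanczos lemma."""
    N = len(ys)
    He = np.fft.fft(ys[::2])    # DFT of the even elements
    Ho = np.fft.fft(ys[1::2])   # DFT of the odd elements

    ns = np.arange(N)
    W = np.exp(-2j * np.pi * ns / N)

    # the half-length DFTs are periodic, so tile them to length N
    return np.tile(He, 2) + W * np.tile(Ho, 2)

ys = np.random.random(8)
print(np.allclose(fft_norec(ys), np.fft.fft(ys)))  # True
```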


\chapter{Filtering and Convolution}

In this chapter I present one of the most important and useful
ideas related to signal processing: the Convolution Theorem.
But before we can understand the Convolution Theorem, we have to understand
convolution.  I'll start with a simple example, smoothing, and
we'll go from there.

The code for this chapter is in {\tt chap08.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp08}.


\section{Smoothing}
\label{smoothing}

\begin{figure}
% convolution1.py
\centerline{\includegraphics[height=2.5in]{figs/convolution1.eps}}
\caption{Daily closing price of Facebook stock and a 30-day moving
average.}
\label{fig.convolution1}
\end{figure}

Smoothing is an operation that tries to remove short-term variations
from a signal in order to reveal long-term trends.  For example, if
you plot daily changes in the price of a stock, it would look noisy;
a smoothing operator might make it easier to see whether the price
was generally going up or down over time.

A common smoothing algorithm is a moving average, which computes
the mean of the previous $n$ values, for some value of $n$.

For example, Figure~\ref{fig.convolution1} shows the daily
closing price of Facebook from May 17, 2012 to December 8,
2015.  The gray line
is the raw data; the darker line shows the 30-day moving average.
Smoothing removes the most
extreme changes and makes it easier to see long-term trends.

Smoothing operations also apply to sound signals.  As an example, I'll
start with a square wave at 440 Hz.  As we saw in
Section~\ref{square}, the harmonics of a square wave drop off
slowly, so it contains many high-frequency components.

\begin{verbatim}
>>> signal = thinkdsp.SquareSignal(freq=440)
>>> wave = signal.make_wave(duration=1, framerate=44100)
>>> segment = wave.segment(duration=0.01)
\end{verbatim}

{\tt wave} is a 1-second segment of the signal; {\tt segment}
is a shorter segment I'll use for plotting.

To compute the moving average of this signal, I'll create
a window with 11 elements and normalize it so the elements
add up to 1.

\begin{verbatim}
>>> window = np.ones(11)
>>> window /= sum(window)
\end{verbatim}

Now I can compute the average of the first 11 elements by
multiplying the window by the wave array:

\begin{verbatim}
>>> ys = segment.ys
>>> N = len(ys)
>>> padded = thinkdsp.zero_pad(window, N)
>>> prod = padded * ys
>>> sum(prod)
-1
\end{verbatim}

{\tt padded} is a version of the window with zeros added to
the end so it has the same length as {\tt segment.ys}.
{\tt prod} is the product of the window and the wave array.

The sum of the elementwise products is the average of the first 11
elements of the array.  Since these elements are all -1, their average
is -1.

\begin{figure}
% convolution2.py
\centerline{\includegraphics[height=2.5in]{figs/convolution2.eps}}
\caption{A square signal at 440 Hz (gray) and an 11-element
moving average.}
\label{fig.convolution2}
\end{figure}

To compute the next element of the smoothed signal, we shift the
window to the right and compute the average of the next 11 elements of
the wave array, starting with the second.

\begin{verbatim}
>>> rolled = np.roll(padded, 1)
>>> prod = rolled * ys
>>> sum(prod)
-1
\end{verbatim}

To compute the moving average, we repeat this process and store
the result in an array named {\tt smoothed}:

\begin{verbatim}
>>> smoothed = np.zeros_like(ys)
>>> rolled = padded
>>> for i in range(N):
...     smoothed[i] = sum(rolled * ys)
...     rolled = np.roll(rolled, 1)
\end{verbatim}

{\tt rolled} is a copy of {\tt padded} that gets shifted to
the right by one element each time through the loop.  Inside
the loop, we multiply the segment by {\tt rolled} to select
11 elements, and then add them up.

Figure~\ref{fig.convolution2} shows the result.  The gray line
is the original signal; the darker line is the smoothed signal.
The smoothed signal starts to ramp up when the leading edge of
the window reaches the first transition, and levels off when
the window crosses the transition.  As a result, the transitions
are less abrupt, and the corners less sharp.  If you listen
to the smoothed signal, it sounds less buzzy and slightly muffled.
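The loop above depends on {\tt thinkdsp}; here is a self-contained sketch of the same computation in plain NumPy, with the square wave generated directly as a stand-in for {\tt SquareSignal}:

```python
import numpy as np

# square wave at 440 Hz, 10 ms segment at 44100 frames per second
framerate = 44100
ts = np.arange(441) / framerate
ys = np.where(np.sin(2 * np.pi * 440 * ts) >= 0, 1.0, -1.0)

window = np.ones(11)
window /= sum(window)

N = len(ys)
padded = np.concatenate([window, np.zeros(N - len(window))])
smoothed = np.zeros(N)
rolled = padded
for i in range(N):
    smoothed[i] = sum(rolled * ys)
    rolled = np.roll(rolled, 1)

# the loop matches np.convolve wherever the window fully overlaps
valid = np.convolve(ys, window, mode='valid')
print(np.allclose(smoothed[:len(valid)], valid))  # True
```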


\section{Convolution}
\label{convolution}

The operation we just computed is called {\bf convolution},
and it is such a common operation that NumPy provides an
implementation that is simpler and faster than my version:

\begin{verbatim}
>>> convolved = np.convolve(ys, window, mode='valid')
>>> smooth2 = thinkdsp.Wave(convolved, framerate=wave.framerate)
\end{verbatim}

{\tt np.convolve} computes the convolution of the wave
array and the window.  The mode flag {\tt valid} indicates
that it should only compute values when the window and the
wave array overlap completely, so it stops when the right
edge of the window reaches the end of the wave array.  Other
than that, the result is the same as in Figure~\ref{fig.convolution2}.

\newcommand{\conv}{\ast}

Actually, there is one other difference.  The loop in the
previous section actually computes {\bf cross-correlation}:
%
\[ (f \star g)[n] = \sum_{m=0}^{N-1} f[m] g[n+m] \]
%
where $f$ is a wave array with length $N$, $g$ is the window,
and $\star$ is the symbol for cross-correlation.  To
compute the $n$th element of the result, we shift $g$ to
the right, which is why the index is $n+m$.

The definition of convolution is slightly different:
%
\[ (f \conv g)[n] = \sum_{m=0}^{N-1} f[m] g[n-m] \]
%
The symbol $\conv$ represents convolution.  The difference is in the
index of $g$: $m$ has been negated, so the summation iterates the
elements of $g$ backward (assuming that negative indices wrap around
to the end of the array).

Because the window we used in this example is symmetric,
cross-correlation and convolution yield the same result.  When we use
other windows, we will have to be more careful.

You might wonder why convolution is defined the way it is.  There
are two reasons:

\begin{itemize}

\item This definition comes up naturally for several applications,
especially analysis of signal-processing systems, which is
the topic of Chapter~\ref{systems}.

\item Also, this definition is the basis of the Convolution Theorem,
coming up very soon.

\end{itemize}
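The difference shows up as soon as the window is not symmetric; a small sketch with a hypothetical three-element window:

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 0.0, 0.0])   # an asymmetric window

# np.convolve reverses g before sliding it; np.correlate does not
print(np.convolve(f, g, mode='valid'))   # [3. 4.]
print(np.correlate(f, g, mode='valid'))  # [1. 2.]
```

With a symmetric window, like the boxcar above, the two results would be identical.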


\section{The frequency domain}

\begin{figure}
% convolution4.py
\centerline{\includegraphics[height=2.5in]{figs/convolution4.eps}}
\caption{Spectrum of the square wave before and after smoothing.}
\label{fig.convolution4}
\end{figure}

Smoothing makes the transitions in a square signal less abrupt,
and makes the sound slightly muffled.  Let's see what effect this
operation has on the spectrum.  First I'll plot the spectrum
of the original wave:

\begin{verbatim}
>>> spectrum = wave.make_spectrum()
>>> spectrum.plot(color=GRAY)
\end{verbatim}

Then the smoothed wave:

\begin{verbatim}
>>> convolved = np.convolve(wave.ys, window, mode='same')
>>> smooth = thinkdsp.Wave(convolved, framerate=wave.framerate)
>>> spectrum2 = smooth.make_spectrum()
>>> spectrum2.plot()
\end{verbatim}

The mode flag {\tt same} indicates that the result should have the
same length as the input.  In this example, it will include a few values
that ``wrap around'', but that's ok for now.

Figure~\ref{fig.convolution4} shows the result.  The fundamental
frequency is almost unchanged; the first few harmonics are
attenuated, and the higher harmonics are almost eliminated.  So
smoothing has the effect of a low-pass filter, which we
saw in Section~\ref{spectrums} and Section~\ref{pink}.

To see how much each component has been attenuated, we can
compute the ratio of the two spectrums:

\begin{verbatim}
>>> amps = spectrum.amps
>>> amps2 = spectrum2.amps
>>> ratio = amps2 / amps
>>> ratio[amps<560] = 0
>>> thinkplot.plot(ratio)
\end{verbatim}

{\tt ratio} is the ratio of the amplitude before and after
smoothing.  When {\tt amps} is small, this ratio can be big
and noisy, so for simplicity I set the ratio to 0 except
where the harmonics are.

\begin{figure}
% convolution5.py
\centerline{\includegraphics[height=2.5in]{figs/convolution5.eps}}
\caption{Ratio of spectrums for the square wave, before and after smoothing.}
\label{fig.convolution5}
\end{figure}

Figure~\ref{fig.convolution5} shows the result.  As expected, the
ratio is high for low frequencies and drops off at a cutoff frequency
near 4000 Hz.  But there is another feature we did not expect: above
the cutoff, the ratio bounces around between 0 and 0.2.
What's up with that?


\section{The convolution theorem}
\label{convtheorem}

\begin{figure}
% convolution6.py
\centerline{\includegraphics[height=2.5in]{figs/convolution6.eps}}
\caption{Ratio of spectrums for the square wave, before and after
smoothing, along with the DFT of the smoothing window.}
\label{fig.convolution6}
\end{figure}

\newcommand{\DFT}{\mathrm{DFT}}
\newcommand{\IDFT}{\mathrm{IDFT}}

The answer is the convolution theorem.  Stated mathematically:
%
\[ \DFT(f \conv g) = \DFT(f) \cdot \DFT(g) \]
%
where $f$ is a wave array and $g$ is a window.  In words,
the convolution theorem says that if we convolve $f$ and $g$,
and then compute the DFT, we get the same answer as computing
the DFT of $f$ and $g$, and then multiplying the results
element-wise.  More concisely, convolution in the time
domain corresponds to multiplication in the frequency domain.

And that explains Figure~\ref{fig.convolution5}, because when we
convolve a wave and a window, we multiply the spectrum of
the wave with the spectrum of the window.  To see how that works,
we can compute the DFT of the window:

\begin{verbatim}
padded = zero_pad(window, N)
dft_window = np.fft.rfft(padded)
thinkplot.plot(abs(dft_window))
\end{verbatim}

{\tt padded} contains the window, padded with zeros to be the
same length as {\tt wave}.  \verb"dft_window" contains the
DFT of the smoothing window.

\newcommand{\abs}{\mathrm{abs}}

Figure~\ref{fig.convolution6} shows the result, along with the
ratios we computed in the previous section.  The ratios are
exactly the amplitudes in \verb"dft_window".  Mathematically:
%
\[ \abs(\DFT(f \conv g)) / \abs(\DFT(f)) = \abs(\DFT(g)) \]
%
In this context, the DFT of a window is called a {\bf filter}.
For any convolution window in the time domain, there is a
corresponding filter in the frequency domain.  And for any
filter that can be expressed by element-wise multiplication in
the frequency domain, there is a corresponding window.
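We can also check the theorem numerically; note that the DFT version of the theorem applies to {\em circular} convolution, where indices wrap around, which is why the direct computation below uses the modulus operator (a sketch):

```python
import numpy as np

N = 16
f = np.random.random(N)
g = np.zeros(N)
g[:4] = 0.25                      # a zero-padded smoothing window

# circular convolution, computed directly from the definition
conv = np.array([sum(f[m] * g[(n - m) % N] for m in range(N))
                 for n in range(N)])

lhs = np.fft.fft(conv)
rhs = np.fft.fft(f) * np.fft.fft(g)
print(np.allclose(lhs, rhs))  # True
```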


\section{Gaussian filter}

\begin{figure}
% convolution7.py
\centerline{\includegraphics[height=2.5in]{figs/convolution7.eps}}
\caption{Boxcar and Gaussian windows.}
\label{fig.convolution7}
\end{figure}

The moving average window we used in the previous section is a
low-pass filter, but it is not a very good one.  The DFT drops off
steeply at first, but then it bounces around.  Those bounces are
called {\bf sidelobes}, and they are there because the moving average
window is like a square wave, so its spectrum contains high-frequency
harmonics that drop off proportionally to $1/f$, which is relatively
slow.

We can do better with a Gaussian window.  SciPy provides functions
that compute many common convolution windows, including {\tt
gaussian}:

\begin{verbatim}
gaussian = scipy.signal.gaussian(M=11, std=2)
gaussian /= sum(gaussian)
\end{verbatim}

{\tt M} is the number of elements in the window; {\tt std}
is the standard deviation of the Gaussian distribution used to
compute it.  Figure~\ref{fig.convolution7} shows the shape
of the window.  It is a discrete approximation of the Gaussian
``bell curve''.  The figure also shows the moving average window
from the previous example, which is sometimes called a
``boxcar'' window because it looks like a rectangular railway car.

\begin{figure}
% convolution.py
\centerline{\includegraphics[height=2.5in]{figs/convolution8.eps}}
\caption{Ratio of spectrums before and after Gaussian smoothing, and
the DFT of the window.}
\label{fig.convolution8}
\end{figure}

I ran the computations from the previous sections again
with this curve, and generated Figure~\ref{fig.convolution8},
which shows the ratio of the spectrums before and after
smoothing, along with the DFT of the Gaussian window.

As a low-pass filter, Gaussian smoothing is better than a simple
moving average.  After the ratio drops off, it stays low, with almost
none of the sidelobes we saw with the boxcar window.  So it does a
better job of cutting off the higher frequencies.

The reason it does so well is that the DFT of a Gaussian curve is also a
Gaussian curve.  So the ratio drops off in proportion to $\exp(-f^2)$,
which is much faster than $1/f$.
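To see the difference, we can compare the two filters directly.  In this sketch the Gaussian window is computed with NumPy so the example stands alone; the formula matches what {\tt scipy.signal.gaussian(M=11, std=2)} computes:

```python
import numpy as np

N = 256
boxcar = np.ones(11) / 11

ns = np.arange(11)
gaussian = np.exp(-0.5 * ((ns - 5) / 2.0) ** 2)   # M=11, std=2
gaussian /= gaussian.sum()

def filter_mags(window, N):
    """Zero-pad the window to length N and return the magnitude of its DFT."""
    padded = np.concatenate([window, np.zeros(N - len(window))])
    return np.abs(np.fft.rfft(padded))

# well past the cutoff, the boxcar's sidelobes dominate the
# Gaussian filter's tail
print(filter_mags(boxcar, N)[60:].max()
      > filter_mags(gaussian, N)[60:].max())  # True
```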


\section{Efficient convolution}
\label{effconv}

One of the reasons the FFT is such an important algorithm is that,
combined with the Convolution Theorem, it provides an efficient
way to compute convolution, cross-correlation, and autocorrelation.

Again, the Convolution Theorem states
%
\[ \DFT(f \conv g) = \DFT(f) \cdot \DFT(g) \]
%
So one way to compute a convolution is:
%
\[ f \conv g = \IDFT(\DFT(f) \cdot \DFT(g)) \]
%
where $\IDFT$ is the inverse DFT.  A simple implementation of
convolution takes time proportional to $N^2$; this algorithm,
using FFT, takes time proportional to $N \log N$.

We can confirm that it works by computing the same convolution
both ways.  As an example, I'll apply it to the daily closing
price of BitCoin.

\begin{verbatim}
df = pandas.read_csv('coindesk-bpi-USD-close.csv',
                     nrows=1625,
                     parse_dates=[0])
ys = df.Close.values
\end{verbatim}

This example uses Pandas to read the data from the CSV file (included
in the repository for this book).  If you are not familiar with
Pandas, don't worry: I'm not going to do much with it in this book.
But if you're interested, you can learn more about it in
{\it Think Stats} at \url{http://thinkstats2.com}.

The result, {\tt df}, is a DataFrame, one of the data structures
provided by Pandas.  {\tt ys} is a NumPy array that contains daily
closing prices.

Next I'll create a Gaussian window and convolve it with {\tt ys}:

\begin{verbatim}
window = scipy.signal.gaussian(M=30, std=6)
window /= window.sum()
smoothed = np.convolve(ys, window, mode='valid')
\end{verbatim}

\verb"fft_convolve" computes the same thing using FFT:

\begin{verbatim}
from numpy.fft import fft, ifft

def fft_convolve(signal, window):
    fft_signal = fft(signal)
    fft_window = fft(window)
    return ifft(fft_signal * fft_window)
\end{verbatim}

We can test it by padding the window to the same length
as {\tt ys} and then computing the convolution:

\begin{verbatim}
padded = zero_pad(window, N)
smoothed2 = fft_convolve(ys, padded)
\end{verbatim}

The result has $M-1$ bogus values at the beginning, where $M$ is the
length of the window.  If we slice off the bogus values, the result
agrees with {\tt np.convolve} with about 12 digits of precision.

\begin{verbatim}
M = len(window)
smoothed2 = smoothed2[M-1:]
\end{verbatim}
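Here is the same test in a self-contained form, with random data standing in for the price data and the Gaussian window computed directly:

```python
import numpy as np
from numpy.fft import fft, ifft

def fft_convolve(signal, window):
    # convolution via the Convolution Theorem: multiply the DFTs
    return ifft(fft(signal) * fft(window))

ys = np.random.random(100)                  # stand-in for the price data
window = np.exp(-0.5 * ((np.arange(30) - 14.5) / 6.0) ** 2)
window /= window.sum()

smoothed = np.convolve(ys, window, mode='valid')

N, M = len(ys), len(window)
padded = np.concatenate([window, np.zeros(N - M)])
smoothed2 = fft_convolve(ys, padded).real[M-1:]   # slice off bogus values

print(np.allclose(smoothed, smoothed2))  # True
```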


\section{Efficient autocorrelation}

\begin{figure}
% convolution.py
\centerline{\includegraphics[height=2.5in]{figs/convolution9.eps}}
\caption{Autocorrelation functions computed by NumPy and
{\tt fft\_autocorr}.}
\label{fig.convolution9}
\end{figure}

In Section~\ref{convolution} I presented definitions of
cross-correlation and convolution, and we saw that they are
almost the same, except that in convolution the window is
reversed.

Now that we have an efficient algorithm for convolution, we
can also use it to compute cross-correlations and autocorrelations.
Using the data from the previous section, we can compute the
autocorrelation of the closing prices:

\begin{verbatim}
corrs = np.correlate(ys, ys, mode='same')
\end{verbatim}

With {\tt mode='same'}, the result has the same length as {\tt ys},
corresponding to lags from $-N/2$ to $N/2-1$.
The gray line in Figure~\ref{fig.convolution9} shows the result.
Except at {\tt lag=0}, there are no peaks, so there is no apparent
periodic behavior in this signal.  However, the autocorrelation
function drops off slowly, suggesting that this signal resembles
pink noise, as we saw in Section~\ref{autopink}.

To compute autocorrelation using convolution, we
have to zero-pad the signal to double the length.
This trick is necessary because the FFT is based
on the assumption that the signal is periodic; that is, that it wraps
around from the end to the beginning.  With time-series data like
this, that assumption is invalid.  Adding zeros, and then trimming
the results, removes the bogus values.

Also, remember that convolution reverses the direction of the window.
In order to cancel that effect, we reverse the direction of the
window before calling \verb"fft_convolve", using {\tt np.flipud},
which flips a NumPy array.  The result is a view of the array,
not a copy, so this operation is fast.

\begin{verbatim}
def fft_autocorr(signal):
    N = len(signal)
    signal = thinkdsp.zero_pad(signal, 2*N)
    window = np.flipud(signal)

    corrs = fft_convolve(signal, window)
    corrs = np.roll(corrs, N//2+1)[:N]
    return corrs
\end{verbatim}

The result from \verb"fft_convolve" has length $2N$.  Of those,
the first and last $N/2$ are valid; the rest are the result of
zero-padding.  To select the valid elements, we roll the results
and select the first $N$, corresponding to lags from $-N/2$ to
$N/2-1$.

As shown in Figure~\ref{fig.convolution9}, the results from
\verb"fft_autocorr" and {\tt np.correlate} are identical (with
about 9 digits of precision).

Notice that the correlations in Figure~\ref{fig.convolution9} are
large numbers; we could normalize them (between -1 and 1) as shown
in Section~\ref{correlate}.

The strategy we used here for auto-correlation also works for
cross-correlation.  Again, you have to prepare the signals by flipping
one and padding both, and then you have to trim the invalid parts of
the result.  This padding and trimming is a nuisance, but that's why
libraries like NumPy provide functions to do it for you.
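Putting the pieces together, we can check \verb"fft_autocorr" against {\tt np.correlate} without {\tt thinkdsp}; here the zero-padding is done with {\tt np.concatenate}:

```python
import numpy as np
from numpy.fft import fft, ifft

def fft_convolve(signal, window):
    return ifft(fft(signal) * fft(window))

def fft_autocorr(signal):
    N = len(signal)
    signal = np.concatenate([signal, np.zeros(N)])  # zero-pad to 2N
    window = np.flipud(signal)

    corrs = fft_convolve(signal, window)
    corrs = np.roll(corrs, N//2+1)[:N]
    return corrs.real

ys = np.random.random(20)
print(np.allclose(fft_autocorr(ys),
                  np.correlate(ys, ys, mode='same')))  # True
```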


\section{Exercises}

Solutions to these exercises are in {\tt chap08soln.ipynb}.

\begin{exercise}
The notebook for this chapter is {\tt chap08.ipynb}.
Read through it and run the code.

It contains an interactive widget that lets you
experiment with the parameters of the Gaussian window to see
what effect they have on the cutoff frequency.

What goes wrong when you increase the width of the Gaussian,
{\tt std}, without increasing the number of elements in the window,
{\tt M}?
\end{exercise}


\begin{exercise}
In this chapter I claimed that the Fourier transform of a Gaussian
curve is also a Gaussian curve.  For discrete Fourier transforms,
this relationship is approximately true.

Try it out for a few examples.  What happens to the Fourier transform
as you vary {\tt std}?
\end{exercise}


\begin{exercise}
If you did the exercises in Chapter~\ref{nonperiodic}, you saw the
effect of the Hamming window, and some of the other windows provided
by NumPy, on spectral leakage.  We can get some insight into the
effect of these windows by looking at their DFTs.

In addition to the Gaussian window we used in this chapter, create a
Hamming window with the same size.  Zero-pad the windows and plot
their DFTs.  Which window acts as a better low-pass filter?  You might
find it useful to plot the DFTs on a log-$y$ scale.

Experiment with a few different windows and a few different sizes.
\end{exercise}



\chapter{Differentiation and Integration}
\label{diffint}

This chapter picks up where the previous chapter left off,
looking at the relationship between windows in the time domain
and filters in the frequency domain.

In particular, we'll look at the effect of a finite difference
window, which approximates differentiation, and the cumulative
sum operation, which approximates integration.

The code for this chapter is in {\tt chap09.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp09}.


\section{Finite differences}
\label{diffs}

In Section~\ref{smoothing}, we applied a smoothing window to
the stock price of Facebook and found that a smoothing
window in the time domain corresponds to a low-pass filter in
the frequency domain.

In this section, we'll look at daily price changes and
see that computing the difference between successive elements,
in the time domain, corresponds to a high-pass filter.

Here's the code to read the data, store it as a wave, and compute its
spectrum.

\begin{verbatim}
import pandas as pd

names = ['date', 'open', 'high', 'low', 'close', 'volume']
df = pd.read_csv('fb.csv', header=0, names=names)
ys = df.close.values[::-1]
close = thinkdsp.Wave(ys, framerate=1)
spectrum = close.make_spectrum()
\end{verbatim}

This example uses Pandas to read the CSV file; the
result is a DataFrame, {\tt df}, with columns for the opening
price, closing price, and high and low prices.  I select the closing
prices and save them in a Wave object.
The framerate is 1 sample per day.

Figure~\ref{fig.diff_int1} shows
this time series and its spectrum.
Visually, the time series resembles Brownian noise (see
Section~\ref{brownian}).
And the spectrum looks like a straight
line, albeit a noisy one.  The estimated slope is -1.9,
which is consistent with Brownian noise.

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int1.eps}}
\caption{Daily closing price of Facebook and the spectrum of this time
series.}
\label{fig.diff_int1}
\end{figure}

Now let's compute the daily price change using {\tt np.diff}:

\begin{verbatim}
diff = np.diff(ys)
change = thinkdsp.Wave(diff, framerate=1)
change_spectrum = change.make_spectrum()
\end{verbatim}

Figure~\ref{fig.diff_int2} shows the resulting wave and its spectrum.
The daily changes resemble white noise, and the estimated slope of the
spectrum, -0.06, is near zero, which is what we expect for white
noise.

\begin{figure}
% diff_int2.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int2.eps}}
\caption{Daily price change of Facebook and the spectrum of this time series.}
\label{fig.diff_int2}
\end{figure}


\section{The frequency domain}

Computing the difference
between successive elements is the same as convolution with
the window {\tt [1, -1]}.
If the order of those elements seems backward,
remember that convolution reverses the window before applying it
to the signal.
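A quick check of that claim:

```python
import numpy as np

ys = np.random.random(10)
diff_window = np.array([1.0, -1.0])

# convolution reverses the window to [-1, 1], so each output element
# is ys[i+1] - ys[i], the same as np.diff
print(np.allclose(np.convolve(ys, diff_window, mode='valid'),
                  np.diff(ys)))  # True
```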
5118
5119
We can see the effect of this operation in the frequency domain
5120
by computing the DFT of the window.
5121
5122
\begin{verbatim}
5123
diff_window = np.array([1.0, -1.0])
5124
padded = thinkdsp.zero_pad(diff_window, len(close))
5125
diff_wave = thinkdsp.Wave(padded, framerate=close.framerate)
5126
diff_filter = diff_wave.make_spectrum()
5127
\end{verbatim}
5128
5129
\begin{figure}
5130
% diff_int.py
5131
\centerline{\includegraphics[height=2.5in]{figs/diff_int3.eps}}
5132
\caption{Filters corresponding to the diff and differentiate operators (left) and integration operator (right, log-$y$ scale).}
5133
\label{fig.diff_int3}
5134
\end{figure}
5135
5136
Figure~\ref{fig.diff_int3} shows the result. The finite difference
5137
window corresponds to a high pass filter: its amplitude increases with
5138
frequency, linearly for low frequencies, and then sublinearly after
5139
that. In the next section, we'll see why.
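
We can verify that shape directly: the DFT of the zero-padded diff
window is $1 - e^{-2 \pi i k / N}$, whose magnitude is $2 \sin(\pi k /
N)$, which rises linearly for small $k$ and sublinearly after that.
A sketch using NumPy alone, without thinkdsp:

```python
import numpy as np

N = 256
diff_window = np.zeros(N)
diff_window[0], diff_window[1] = 1.0, -1.0   # [1, -1], zero-padded

# amplitude of the diff filter: |1 - exp(-2 pi i k / N)| = 2 sin(pi k / N)
amps = np.abs(np.fft.fft(diff_window))
expected = 2 * np.abs(np.sin(np.pi * np.arange(N) / N))
```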

\section{Differentiation}
\label{effdiff}

The window we used in the previous section is a
numerical approximation of the first derivative, so the filter
approximates the effect of differentiation.

Differentiation in the time domain corresponds to a simple filter
in the frequency domain; we can figure out what it is with a little
math.

Suppose we have a complex sinusoid with frequency $f$:
%
\[ E_f(t) = e^{2 \pi i f t} \]
%
The first derivative of $E_f$ is
%
\[ \frac{d}{dt} E_f(t) = 2 \pi i f e^{2 \pi i f t} \]
%
which we can rewrite as
%
\[ \frac{d}{dt} E_f(t) = 2 \pi i f E_f(t) \]
%
In other words, taking the derivative of $E_f$ is the same
as multiplying by $2 \pi i f$, which is a complex number
with magnitude $2 \pi f$ and angle $\pi/2$.

We can compute the filter that corresponds to differentiation,
like this:

\begin{verbatim}
deriv_filter = close.make_spectrum()
deriv_filter.hs = PI2 * 1j * deriv_filter.fs
\end{verbatim}

I started with the spectrum of {\tt close}, which has the right
size and framerate, then replaced the {\tt hs} with $2 \pi i f$.
Figure~\ref{fig.diff_int3} (left) shows this filter; it is a straight line.

As we saw in Section~\ref{synthmat}, multiplying a complex sinusoid
by a complex number has two effects: it multiplies
the amplitude, in this case by $2 \pi f$, and shifts the phase
offset, in this case by $\pi/2$.

If you are familiar with the language of operators and eigenfunctions,
each $E_f$ is an eigenfunction of the differentiation operator, with the
corresponding eigenvalue $2 \pi i f$. See
\url{http://en.wikipedia.org/wiki/Eigenfunction}.

If you are not familiar with that language, here's what it
means:

\newcommand{\op}{\mathcal{A}}

\begin{itemize}

\item An operator is a function that takes a function and returns
another function. For example, differentiation is an operator.

\item A function, $g$, is an eigenfunction of an operator, $\op$, if
applying $\op$ to $g$ has the effect of multiplying the function by
a scalar. That is, $\op g = \lambda g$.

\item In that case, the scalar $\lambda$ is the eigenvalue that
corresponds to the eigenfunction $g$.

\item A given operator might have many eigenfunctions, each with
a corresponding eigenvalue.

\end{itemize}

Because complex sinusoids are eigenfunctions of the differentiation
operator, they are easy to differentiate. All we have to do is
multiply by a complex scalar.

For signals with more than one
component, the process is only slightly harder:

\begin{enumerate}

\item Express the signal as the sum of complex sinusoids.

\item Compute the derivative of each component by multiplication.

\item Add up the differentiated components.

\end{enumerate}

If that process sounds familiar, that's because it is identical
to the algorithm for convolution in Section~\ref{effconv}: compute
the DFT, multiply by a filter, and compute the inverse DFT.
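
Here is that three-step algorithm as a NumPy sketch, without thinkdsp:
differentiate a sine with an integer number of cycles by multiplying
its DFT by $2 \pi i f$ and transforming back; because the signal is
periodic, the result matches the analytic derivative.

```python
import numpy as np

framerate = 256
ts = np.arange(framerate) / framerate        # one second of samples
f0 = 5
ys = np.sin(2 * np.pi * f0 * ts)             # integer number of cycles -> periodic

# step 1: express the signal as a sum of complex sinusoids
hs = np.fft.fft(ys)
fs = np.fft.fftfreq(len(ys), d=1/framerate)

# step 2: differentiate each component by multiplying by 2 pi i f
# step 3: add the components back up with the inverse DFT
deriv = np.fft.ifft(2j * np.pi * fs * hs).real

expected = 2 * np.pi * f0 * np.cos(2 * np.pi * f0 * ts)
```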

{\tt Spectrum} provides a method that applies the differentiation
filter:

\begin{verbatim}
# class Spectrum:

    def differentiate(self):
        self.hs *= PI2 * 1j * self.fs
\end{verbatim}

We can use it to compute the derivative of the Facebook time series:

\begin{verbatim}
deriv_spectrum = close.make_spectrum()
deriv_spectrum.differentiate()
deriv = deriv_spectrum.make_wave()
\end{verbatim}

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int4.eps}}
\caption{Comparison of daily price changes computed by
{\tt np.diff} and by applying the differentiation filter.}
\label{fig.diff_int4}
\end{figure}

Figure~\ref{fig.diff_int4} compares the daily price changes computed by
{\tt np.diff} with the derivative we just computed.
I selected the first 50 values in the time series so we can see the
differences more clearly.

The derivative is noisier, because it amplifies the high frequency
components more, as shown in Figure~\ref{fig.diff_int3} (left). Also, the
first few elements of the derivative are very noisy. The problem
there is that the DFT-based derivative is based on the assumption that
the signal is periodic. In effect, it connects the last element in
the time series back to the first element, which creates artifacts at
the boundaries.

To summarize, we have shown:

\begin{itemize}

\item Computing the difference between successive values in a signal
can be expressed as convolution with a simple window. The result is
an approximation of the first derivative.

\item Differentiation in the time domain corresponds to a simple
filter in the frequency domain. For periodic signals, the result is
the first derivative, exactly. For some non-periodic signals, it
can approximate the derivative.

\end{itemize}

Using the DFT to compute derivatives is the basis of {\bf spectral
methods} for solving differential equations (see
\url{http://en.wikipedia.org/wiki/Spectral_method}).

In particular, it is useful for the analysis of linear, time-invariant
systems, which is coming up in Chapter~\ref{systems}.

\section{Integration}

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int5.eps}}
\caption{Comparison of the original time series and the integrated
derivative.}
\label{fig.diff_int5}
\end{figure}

In the previous section, we showed that differentiation in the time
domain corresponds to a simple filter in the frequency domain: it
multiplies each component by $2 \pi i f$. Since integration is
the inverse of differentiation, it also corresponds to a simple
filter: it divides each component by $2 \pi i f$.

We can compute this filter like this:

\begin{verbatim}
integ_filter = close.make_spectrum()
integ_filter.hs = 1 / (PI2 * 1j * integ_filter.fs)
\end{verbatim}

Figure~\ref{fig.diff_int3} (right) shows this filter on a log-$y$ scale,
which makes it easier to see.

{\tt Spectrum} provides a method that applies the integration filter:

\begin{verbatim}
# class Spectrum:

    def integrate(self):
        self.hs /= PI2 * 1j * self.fs
\end{verbatim}

We can confirm that the integration filter is correct by applying it
to the spectrum of the derivative we just computed:

\begin{verbatim}
integ_spectrum = deriv_spectrum.copy()
integ_spectrum.integrate()
\end{verbatim}

But notice that at $f=0$, we are dividing by 0. The result in
NumPy is NaN, which is a special floating-point value that
represents ``not a number''. We can partially deal with this
problem by setting this value to 0 before converting the
spectrum back to a wave:

\begin{verbatim}
integ_spectrum.hs[0] = 0
integ_wave = integ_spectrum.make_wave()
\end{verbatim}

Figure~\ref{fig.diff_int5} shows this integrated derivative along with
the original time series. They are almost identical, but the
integrated derivative has been shifted down. The problem is that when
we clobbered the $f=0$ component, we set the bias of the signal to 0.
But that should not be surprising; in general, differentiation loses
information about the bias, and integration can't recover it. In some
sense, the NaN at $f=0$ is telling us that this element is unknown.

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int6.eps}}
\caption{A sawtooth wave and its spectrum.}
\label{fig.diff_int6}
\end{figure}

If we provide this ``constant of integration'', the results are
identical, which confirms that this integration filter is the correct
inverse of the differentiation filter.
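
Here is a round-trip sketch in plain NumPy: differentiate a periodic
signal with the $2 \pi i f$ filter, integrate it back, zero out the
NaN at $f=0$, and supply the lost bias as the ``constant of
integration''; the original signal is recovered.

```python
import numpy as np

framerate = 256
ts = np.arange(framerate) / framerate
ys = np.sin(2 * np.pi * 3 * ts) + 7          # periodic signal with bias 7

fs = np.fft.fftfreq(len(ys), d=1/framerate)
hs = np.fft.fft(ys) * (2j * np.pi * fs)      # differentiation filter

with np.errstate(divide='ignore', invalid='ignore'):
    hs = hs / (2j * np.pi * fs)              # integration filter; NaN at f=0
hs[0] = 0                                    # clobber the unknown f=0 component

# adding the mean back in is the "constant of integration"
recovered = np.fft.ifft(hs).real + ys.mean()
```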

\section{Cumulative sum}
\label{cumsum}

In the same way that the diff operator approximates differentiation,
the cumulative sum approximates integration.
I'll demonstrate with a sawtooth signal.

\begin{verbatim}
signal = thinkdsp.SawtoothSignal(freq=50)
in_wave = signal.make_wave(duration=0.1, framerate=44100)
\end{verbatim}

Figure~\ref{fig.diff_int6} shows this wave and its spectrum.

{\tt Wave} provides a method that computes the cumulative sum of
a wave array and returns a new Wave object:

\begin{verbatim}
# class Wave:

    def cumsum(self):
        ys = np.cumsum(self.ys)
        ts = self.ts.copy()
        return Wave(ys, ts, self.framerate)
\end{verbatim}

We can use it to compute the cumulative sum of \verb"in_wave":

\begin{verbatim}
out_wave = in_wave.cumsum()
out_wave.unbias()
\end{verbatim}

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int7.eps}}
\caption{A parabolic wave and its spectrum.}
\label{fig.diff_int7}
\end{figure}

Figure~\ref{fig.diff_int7} shows the resulting wave and its spectrum.
If you did the exercises in Chapter~\ref{harmonics}, this waveform should
look familiar: it's a parabolic signal.

Comparing the spectrum of the parabolic signal to the spectrum of the
sawtooth, the amplitudes of the components drop off more quickly. In
Chapter~\ref{harmonics}, we saw that the components of the sawtooth
drop off in proportion to $1/f$. Since the cumulative sum
approximates integration, and integration filters components in
proportion to $1/f$, the components of the parabolic wave drop off in
proportion to $1/f^2$.

We can see that graphically by computing the filter that corresponds
to the cumulative sum:

\begin{verbatim}
cumsum_filter = diff_filter.copy()
cumsum_filter.hs = 1 / cumsum_filter.hs
\end{verbatim}

Because {\tt cumsum} is the inverse operation of {\tt diff}, we
start with a copy of \verb"diff_filter", which is the filter
that corresponds to the {\tt diff} operation, and then invert the
{\tt hs}.
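
For a zero-mean periodic signal this relationship is exact: the
circular diff of the cumulative sum reproduces the signal, so the
ratio of their DFTs equals the inverted diff filter at every nonzero
frequency. A NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64
x = rng.standard_normal(N)
x -= x.mean()                       # zero mean makes the circular relation exact
y = np.cumsum(x)

window = np.zeros(N)
window[0], window[1] = 1.0, -1.0    # the diff window, zero-padded

ratio = np.fft.fft(y)[1:] / np.fft.fft(x)[1:]     # skip f=0
cumsum_filter = 1 / np.fft.fft(window)[1:]
```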

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int8.eps}}
\caption{Filters corresponding to cumulative sum and integration.}
\label{fig.diff_int8}
\end{figure}

Figure~\ref{fig.diff_int8} shows the filters corresponding to
cumulative sum and integration. The cumulative sum is a good
approximation of integration except at the highest frequencies,
where it drops off a little faster.

To confirm that this is the correct filter for the cumulative
sum, we can compare it to the ratio of the spectrum of
\verb"out_wave" to the spectrum of \verb"in_wave":

\begin{verbatim}
in_spectrum = in_wave.make_spectrum()
out_spectrum = out_wave.make_spectrum()
ratio_spectrum = out_spectrum.ratio(in_spectrum, thresh=1)
\end{verbatim}

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int9.eps}}
\caption{Filter corresponding to cumulative sum and actual ratios of
the before-and-after spectrums.}
\label{fig.diff_int9}
\end{figure}

And here's the method that computes the ratios:

\begin{verbatim}
    def ratio(self, denom, thresh=1):
        ratio_spectrum = self.copy()
        ratio_spectrum.hs /= denom.hs
        ratio_spectrum.hs[denom.amps < thresh] = np.nan
        return ratio_spectrum
\end{verbatim}

When {\tt denom.amps} is small, the resulting ratio is noisy,
so I set those values to NaN.

Figure~\ref{fig.diff_int9} shows the ratios and the filter
corresponding to the cumulative sum. They agree, which confirms that
inverting the filter for {\tt diff} yields the filter for {\tt
cumsum}.

Finally, we can confirm that the Convolution Theorem applies by
applying the {\tt cumsum} filter in the frequency domain:

\begin{verbatim}
out_wave2 = (in_spectrum * cumsum_filter).make_wave()
\end{verbatim}

Within the limits of floating-point error, \verb"out_wave2" is
identical to \verb"out_wave", which we computed using {\tt cumsum}, so
the Convolution Theorem works! But note that this demonstration only
works with periodic signals.

\section{Integrating noise}

In Section~\ref{brownian}, we generated Brownian noise by computing the
cumulative sum of white noise.
Now that we understand the effect of {\tt cumsum} in the frequency
domain, we have some insight into the spectrum of Brownian noise.

White noise has equal power at all frequencies, on average. When we
compute the cumulative sum, the amplitude of each component is divided
by $f$. Since power is the square of magnitude, the power of each
component is divided by $f^2$. So on average, the power at frequency
$f$ is proportional to $1 / f^2$:
%
\[ P_f = K / f^2 \]
%
where $K$ is a constant that's not important.
Taking the log of both sides yields:
%
\[ \log P_f = \log K - 2 \log f \]
%
And that's why, when we plot the spectrum of Brownian noise on a
log-log scale, we expect to see a straight line with slope -2, at
least approximately.

In Section~\ref{diffs} we plotted the spectrum of closing prices for
Facebook, and estimated that the slope is -1.9, which is consistent
with Brownian noise. Many stock prices have similar spectrums.

When we used the {\tt diff} operator to compute daily changes, we
multiplied the {\em amplitude} of each component by a filter proportional to
$f$, which means we multiplied the {\em power} of each component by $f^2$.
On a log-log scale, this operation adds 2 to the slope of the
power spectrum, which is why the estimated slope of the result
is near 0.1 (but a little lower, because {\tt diff} only approximates
differentiation).
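
We can check the slope prediction numerically: generate white noise,
take its cumulative sum, and fit a line to the log-log power spectrum.
The estimate should be near -2. This sketch uses NumPy directly
rather than thinkdsp:

```python
import numpy as np

rng = np.random.default_rng(1)
white = rng.standard_normal(16384)
brown = np.cumsum(white)            # Brownian noise
brown -= brown.mean()

power = np.abs(np.fft.rfft(brown))**2
fs = np.fft.rfftfreq(len(brown))
slope = np.polyfit(np.log(fs[1:]), np.log(power[1:]), 1)[0]
```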

\section{Exercises}

Solutions to these exercises are in {\tt chap09soln.ipynb}.

\begin{exercise}
The notebook for this chapter is {\tt chap09.ipynb}.
Read through it and run the code.

In Section~\ref{cumsum}, I mentioned that some of the
examples don't work with non-periodic signals. Try replacing the
sawtooth wave, which is periodic, with the Facebook data, which is
not, and see what goes wrong.
\end{exercise}

\begin{exercise}
The goal of this exercise is to explore the effect of {\tt diff} and
{\tt differentiate} on a signal. Create a triangle wave and plot
it. Apply {\tt diff} and plot the result. Compute the spectrum of the
triangle wave, apply {\tt differentiate}, and plot the result. Convert
the spectrum back to a wave and plot it. Are there differences between
the effect of {\tt diff} and {\tt differentiate} for this wave?
\end{exercise}

\begin{exercise}
The goal of this exercise is to explore the effect of {\tt cumsum} and
{\tt integrate} on a signal. Create a square wave and plot it. Apply
{\tt cumsum} and plot the result. Compute the spectrum of the square
wave, apply {\tt integrate}, and plot the result. Convert the spectrum
back to a wave and plot it. Are there differences between the effect
of {\tt cumsum} and {\tt integrate} for this wave?
\end{exercise}

\begin{exercise}
The goal of this exercise is to explore the effect of integrating
twice. Create a sawtooth wave, compute its spectrum, then apply {\tt
integrate} twice. Plot the resulting wave and its spectrum. What is
the mathematical form of the wave? Why does it resemble a sinusoid?
\end{exercise}

\begin{exercise}
The goal of this exercise is to explore the effect of the 2nd
difference and 2nd derivative. Create a {\tt CubicSignal}, which is
defined in {\tt thinkdsp}. Compute the second difference by applying
{\tt diff} twice. What does the result look like? Compute the second
derivative by applying {\tt differentiate} to the spectrum twice.
Does the result look the same?

Plot the filters that correspond to the 2nd difference and the 2nd
derivative and compare them. Hint: In order to get the filters on the
same scale, use a wave with framerate 1.
\end{exercise}

\chapter{LTI systems}
\label{systems}

This chapter presents the theory of signals and systems, using
musical acoustics as an example. And it explains an
important application of the Convolution Theorem, characterization
of linear, time-invariant systems (which I'll define soon).

The code for this chapter is in {\tt chap10.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp10}.

\section{Signals and systems}

In the context of signal processing, a {\bf system} is an abstract
representation of anything that takes a signal as input and produces
a signal as output.

For example, an electronic amplifier is a circuit that takes an
electrical signal as input and produces a (louder) signal as output.

As another example, when you listen to a musical performance, you
can think of the room as a system that takes the sound of the
performance at the location where it is produced and produces a
somewhat different sound at the location where you hear it.

A {\bf linear, time-invariant system}\footnote{My presentation here
follows \url{http://en.wikipedia.org/wiki/LTI_system_theory}.} is a
system with these additional properties:

\begin{enumerate}

\item Linearity: If you put two inputs into the system at the same
time, the result is the sum of their outputs. Mathematically, if an
input $x_1$ produces output $y_1$ and another input $x_2$ produces
$y_2$, then $a x_1 + b x_2$ produces $a y_1 + b y_2$, where $a$ and
$b$ are scalars.

\item Time invariance: The
effect of the system doesn't vary over time, or depend on the state
of the system. So if inputs $x_1$ and $x_2$ differ by a shift in time,
their outputs $y_1$ and $y_2$ differ by the same shift, but are otherwise
identical.

\end{enumerate}

Many physical systems have these properties, at least approximately.

\begin{itemize}

\item Circuits that contain only resistors, capacitors and inductors are
LTI, to the degree that the components behave like their idealized
models.

\item Mechanical systems that contain springs, masses and
dashpots are also LTI, assuming linear springs (force proportional
to displacement) and dashpots (force proportional to velocity).

\item Also, and most relevant to applications in this book,
the media that transmit sounds (including air, water
and solids) are well-modeled by LTI systems.

\end{itemize}
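
Both properties are easy to check numerically for a system implemented
as convolution with a fixed impulse response; the three-tap {\tt h}
below is a made-up example, not drawn from thinkdsp:

```python
import numpy as np

h = np.array([0.5, 0.3, 0.2])       # a made-up impulse response

def system(x):
    return np.convolve(x, h)

rng = np.random.default_rng(0)
x1 = rng.standard_normal(32)
x2 = rng.standard_normal(32)

# linearity: a*x1 + b*x2 produces a*y1 + b*y2
a, b = 2.0, -3.0
linear_ok = np.allclose(system(a*x1 + b*x2), a*system(x1) + b*system(x2))

# time invariance: delaying the input delays the output by the same amount
delayed = np.concatenate([np.zeros(5), x1])
invariant_ok = np.allclose(system(delayed),
                           np.concatenate([np.zeros(5), system(x1)]))
```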

LTI systems are described by linear differential equations, and
the solutions of those equations are complex sinusoids (see
\url{http://en.wikipedia.org/wiki/Linear_differential_equation}).

This result provides an algorithm for computing the effect of
an LTI system on an input signal:

\begin{enumerate}

\item Express the signal as the sum of complex sinusoid components.

\item For each input component, compute the corresponding output component.

\item Add up the output components.

\end{enumerate}

At this point, I hope this algorithm sounds familiar. It's the
same algorithm we used for convolution in Section~\ref{effconv}, and
for differentiation in Section~\ref{effdiff}. This process
is called {\bf spectral decomposition} because we ``decompose''
the input signal into its spectral components.

In order to apply this process to an LTI system, we have to {\bf
characterize} the system by finding its effect on each component
of the input signal. For mechanical systems, it turns out that there
is a simple and efficient way to do that: you kick it and record
the output.

Technically, the ``kick'' is called an {\bf impulse} and the
output is called the {\bf impulse response}. You might wonder
how a single impulse can completely characterize a system. You
can see the answer by computing the DFT of an impulse. Here's
a wave array with an impulse at $t=0$:

\begin{verbatim}
>>> impulse = np.zeros(8)
>>> impulse[0] = 1
>>> impulse
[ 1.  0.  0.  0.  0.  0.  0.  0.]
\end{verbatim}

And here's its spectrum:

\begin{verbatim}
>>> spectrum = np.fft.fft(impulse)
>>> spectrum
[ 1.+0.j  1.+0.j  1.+0.j  1.+0.j  1.+0.j  1.+0.j  1.+0.j  1.+0.j]
\end{verbatim}

That's right: an impulse is the sum of components with equal
magnitudes at all frequencies\footnote{Not to be confused with white
noise, which has the same average power at all frequencies, but
varies around that average.} (and a phase offset of 0).

Therefore when you test a system by inputting
an impulse, you are testing the response of the
system at all frequencies. And you can test them all at the same
time because the system is linear, so simultaneous tests don't
interfere with each other.
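
To see this concretely, here is a sketch with a toy FIR system (the
coefficients in {\tt h} are invented): feeding in an impulse recovers
the impulse response, and its DFT is the system's response at every
frequency.

```python
import numpy as np

h = np.array([0.5, 0.3, 0.2])                 # made-up FIR system coefficients

def system(x):
    return np.convolve(x, h)[:len(x)]

N = 8
impulse = np.zeros(N)
impulse[0] = 1

# the output is the impulse response, zero-padded to length N
impulse_response = system(impulse)

# its DFT is the transfer function: the system's response at every frequency
transfer = np.fft.fft(impulse_response)
```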

\section{Transfer functions}

To characterize the acoustic response of a room or open space, a
simple way to generate an impulse is to pop a balloon or
fire a gun. A gunshot puts an impulse into
the system; the sound you hear is the impulse response.

%TODO: update the numbers for the figures in this chapter
\begin{figure}
% systems.py
\centerline{\includegraphics[height=2.5in]{figs/systems6.eps}}
\caption{Waveform of a gunshot.}
\label{fig.systems6}
\end{figure}

As an example, I'll use a recording of a gunshot to characterize
the room where the gun was fired, then use the impulse response
to simulate the effect of that room on a violin recording.

This example is in {\tt chap10.ipynb}, which is in the repository
for this book; you can also view it, and listen to the examples,
at \url{http://tinyurl.com/thinkdsp10}.

Here's the gunshot:

\begin{verbatim}
response = thinkdsp.read_wave('180961__kleeb__gunshots.wav')
response = response.segment(start=0.26, duration=5.0)
response.normalize()
response.plot()
\end{verbatim}

I select a segment starting at 0.26 seconds to remove the silence
before the gunshot. Figure~\ref{fig.systems6} (left) shows the
waveform of the gunshot. Next we compute the DFT of {\tt response}:

\begin{verbatim}
transfer = response.make_spectrum()
transfer.plot()
\end{verbatim}

Figure~\ref{fig.systems6} (right) shows the result. This spectrum
encodes the response of the room; for each frequency, the spectrum
contains a complex number that represents an amplitude multiplier and
a phase shift. This spectrum is called a {\bf transfer
function} because it contains information about how the system transfers
the input to the output.

Now we can simulate the effect this room would have on the sound
of a violin. Here is the violin recording we used in Section~\ref{violin}:

\begin{verbatim}
wave = thinkdsp.read_wave('92002__jcveliz__violin-origional.wav')
wave.ys = wave.ys[:len(response)]
wave.normalize()
\end{verbatim}

The violin and gunshot waves were sampled at the same framerate,
44,100 Hz. And coincidentally, the duration of both is about the
same. I trimmed the violin wave to the same length as the gunshot.

Next I compute the DFT of the violin wave:

\begin{verbatim}
spectrum = wave.make_spectrum()
\end{verbatim}

Now I know the magnitude and phase of each component in the
input, and I know the transfer function of the system. Their
product is the DFT of the output, which we can use to compute the
output wave:

\begin{verbatim}
output = (spectrum * transfer).make_wave()
output.normalize()
output.plot()
\end{verbatim}

%TODO: Am I missing a figure here?
Figure~\ref{fig.systems7} shows the output of the system superimposed
on the input. The overall shape is similar, but there are substantial
differences. And those differences are clearly audible. Load
{\tt chap10.ipynb} and listen to them. One thing I find striking
about this example is that you can get a sense of what the room
was like; to me, it sounds like a long, narrow room with hard floors
and ceilings. That is, like a firing range.

There's one thing I glossed over in this example that I'll mention
in case it bothers anyone. The violin recording I started with
has already been transformed by one system: the room where it was
recorded. So what I really computed in my example is the sound
of the violin after two transformations. To properly simulate
the sound of a violin in a different room, I should have characterized
the room where the violin was recorded and applied the inverse
of that transfer function first.

\section{Systems and convolution}

\begin{figure}
% systems.py
\centerline{\includegraphics[height=2.5in]{figs/systems8.eps}}
\caption{Sum of a gunshot waveform with a shifted, scaled version of
itself.}
\label{fig.systems8}
\end{figure}

If you think the previous example is black magic,
you are not alone. I've been thinking about it for a while and it
still makes my head hurt.

In the previous section, I suggested one way to think about it:

\begin{itemize}

\item An impulse is made up of components with amplitude 1 at all
frequencies.

\item The impulse response contains the sum of the responses of the
system to all of these components.

\item The transfer function, which is the DFT of the impulse response,
encodes the effect of the system on each frequency component in the form
of an amplitude multiplier and a phase shift.

\item For any input, we can compute the response of the system
by breaking the input into components, computing the response to
each component, and adding them up.

\end{itemize}

But if you don't like that, there's another way to think about
it altogether: convolution!

Suppose that instead of firing one gun, you fire two guns:
a big one with amplitude 1 at $t=0$ and a smaller one with
amplitude 0.5 at $t=1$.

We can compute the response of the system by adding up
the original impulse response and a scaled, shifted version of itself.
Here's a function that makes a shifted, scaled version of
a wave:

\begin{verbatim}
def shifted_scaled(wave, shift, factor):
    res = wave.copy()
    res.shift(shift)
    res.scale(factor)
    return res
\end{verbatim}

Here's how we use it to compute the response to a two-gun salute:

\begin{verbatim}
dt = 1
shift = dt * response.framerate
factor = 0.5
response2 = response + shifted_scaled(response, shift, factor)
\end{verbatim}

Figure~\ref{fig.systems8} shows the result. You can hear what
it sounds like in {\tt chap10.ipynb}. Not surprisingly, it
sounds like two gunshots, the first one louder than the second.

Now suppose instead of two guns, you add up 100 guns, with
a shift of 100 between them. This loop computes the result:

\begin{verbatim}
total = 0
for j in range(100):
    total += shifted_scaled(response, j*100, 1.0)
\end{verbatim}

At a framerate of 44,100 Hz, there are 441 gunshots per second,
so you don't hear the individual shots. Instead, it sounds
like a periodic signal at 441 Hz. If you play this example, it
sounds like a car horn in a garage.

And that brings us to a key insight: you can think of any wave as a
series of samples, where each sample is an impulse with a different
magnitude.

As an example, I'll generate a sawtooth signal at 410 Hz:

\begin{verbatim}
signal = thinkdsp.SawtoothSignal(freq=410)
wave = signal.make_wave(duration=0.1,
                        framerate=response.framerate)
\end{verbatim}

Now I'll loop through the series of impulses that make up the
sawtooth, and add up the impulse responses:

\begin{verbatim}
total = 0
for j, y in enumerate(wave.ys):
    total += shifted_scaled(response, j, y)
\end{verbatim}

The result is what it would sound like to play a sawtooth wave in a
firing range. Again, you can listen to it in {\tt chap10.ipynb}.

\begin{figure}
\input{diagram2}
\caption{Diagram of the sum of scaled and shifted copies of $g$.}
\label{fig.convolution}
\end{figure}

Figure~\ref{fig.convolution} shows a diagram of this computation,
where $f$ is the sawtooth, $g$ is the impulse response, and $h$
is the sum of the shifted, scaled copies of $g$.

For the example shown:

\[ h[2] = f[0]g[2] + f[1]g[1] + f[2]g[0] \]

And more generally,

\[ h[n] = \sum_{m=0}^{N-1} f[m] g[n-m] \]

You might recognize this equation from Section~\ref{convolution}. It
is the convolution of $f$ and $g$. This shows that if the input is $f$
and the impulse response of the system is $g$, the output is the
convolution of $f$ and $g$.
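
This construction is easy to test: summing scaled, shifted copies of
$g$, one per sample of $f$, produces exactly what {\tt np.convolve}
computes. A sketch with random stand-ins for the sawtooth and the
impulse response:

```python
import numpy as np

def shifted_scaled_sum(f, g):
    # one scaled, shifted copy of g per sample of f, all added up
    total = np.zeros(len(f) + len(g) - 1)
    for j, y in enumerate(f):
        total[j:j + len(g)] += y * g
    return total

rng = np.random.default_rng(0)
f = rng.standard_normal(16)     # stand-in for the input signal
g = rng.standard_normal(32)     # stand-in for the impulse response
```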

In summary, there are two ways to think about the effect of a system
on a signal:

\begin{enumerate}

\item The input is a sequence of impulses, so the output is the sum of
scaled, shifted copies of the impulse response; that sum is the
convolution of the input and the impulse response.

\item The DFT of the impulse response is a transfer function that
encodes the effect of the system on each frequency component as a
magnitude and phase offset. The DFT of the input encodes the
magnitude and phase offset of the frequency components it contains.
Multiplying the DFT of the input by the transfer function yields
the DFT of the output.

\end{enumerate}

The equivalence of these two descriptions should not be a surprise;
it is basically a statement of the Convolution Theorem:
convolution of $f$ and $g$ in the time
domain corresponds to multiplication in the frequency domain.
And if you wondered why convolution is defined as it is, which
seemed backwards when we talked about smoothing and difference
windows, now you know the reason: the definition of convolution
appears naturally in the response of an LTI system to a signal.
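
As a numerical sanity check before the proof (for DFTs the relevant
operation is {\em circular} convolution): the DFT of the circular
convolution of two random vectors matches the product of their DFTs.

```python
import numpy as np

def circular_convolve(f, g):
    # direct sum: h[n] = sum_m f[m] g[(n - m) mod N]
    N = len(f)
    return np.array([sum(f[m] * g[(n - m) % N] for m in range(N))
                     for n in range(N)])

rng = np.random.default_rng(0)
f = rng.standard_normal(8)
g = rng.standard_normal(8)

lhs = np.fft.fft(circular_convolve(f, g))
rhs = np.fft.fft(f) * np.fft.fft(g)
```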


\section{Proof of the Convolution Theorem}

Well, I've put it off long enough. It's time to prove the Convolution
Theorem (CT), which states:

\[ \DFT(f \conv g) = \DFT(f) \DFT(g) \]

where $f$ and $g$ are vectors with the same length, $N$.

I'll proceed in two steps:

\begin{enumerate}

\item I'll show that in the special case where $f$ is a complex
exponential, convolution with $g$ has the effect of multiplying
$f$ by a scalar.

\item In the more general case where $f$ is not a complex exponential,
we can use the DFT to express it as a sum of complex exponentials,
compute the convolution of each component (by multiplication), and
then add up the results.

\end{enumerate}

Together these steps prove the Convolution Theorem. But first, let's
assemble the pieces we'll need. The DFT of $g$ is:
%
\[ \DFT(g) = G[k] = \sum_n g[n] \exp(-2 \pi i n k / N) \]
%
where $k$ is an index of frequency from
0 to $N-1$ and $n$ is an index of time from 0 to $N-1$.
The result, $G$, is a vector of $N$ complex numbers.

The inverse DFT of $F$, which I'll call $f$, is:
%
\[ \IDFT(F) = f[n] = \sum_k F[k] \exp(2 \pi i n k / N) \]
%
And here's the definition of convolution:
%
\[ (f \conv g)[n] = \sum_m f[m] g[n-m] \]
%
where $m$ is another index of time from 0 to $N-1$.
Convolution is commutative, so I could equivalently write:
%
\[ (f \conv g)[n] = \sum_m f[n-m] g[m] \]
%
Now let's consider the special case where $f$ is a complex
exponential with frequency $k$, which I'll call $e_k$:
%
\[ f[n] = e_k[n] = \exp(2 \pi i n k / N) \]
%
where $k$ is an index of frequency and $n$ is an index of time.

Plugging $e_k$ into the second definition of convolution yields
%
\[ (e_k \conv g)[n] = \sum_m \exp(2 \pi i (n-m) k / N) g[m] \]
%
We can split the first term into a product:
%
\[ (e_k \conv g)[n] = \sum_m \exp(2 \pi i n k / N) \exp(-2 \pi i m k / N) g[m] \]
%
The first half does not depend on $m$, so we can pull it out of the
summation:
%
\[ (e_k \conv g)[n] = \exp(2 \pi i n k / N) \sum_m \exp(-2 \pi i m k / N) g[m] \]
%
Now we recognize that the first term is $e_k$, and the summation is
$G[k]$ (using $m$ as the index of time). So we can write:
%
\[ (e_k \conv g)[n] = e_k[n] G[k] \]
%
which shows that for each complex exponential, $e_k$, convolution
with $g$ has the effect of multiplying $e_k$ by $G[k]$. In mathematical
terms, each $e_k$ is an eigenvector of this operation, and
$G[k]$ is the corresponding eigenvalue.
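This eigenvector relation is easy to verify numerically. In the sketch below, the length $N$, the frequency index $k$, and the impulse response {\tt g} are arbitrary choices; it computes the circular convolution of $e_k$ with $g$ from the definition and checks that the result equals $e_k[n] \, G[k]$:

```python
import numpy as np

N = 8
k = 3                                   # arbitrary frequency index
n = np.arange(N)
e_k = np.exp(2j * np.pi * n * k / N)    # complex exponential with frequency k

rng = np.random.default_rng(0)
g = rng.standard_normal(N)              # arbitrary impulse response
G = np.fft.fft(g)                       # DFT of g

# circular convolution of e_k with g, from the definition
conv = np.array([sum(e_k[(i - m) % N] * g[m] for m in range(N))
                 for i in range(N)])

# the eigenvector relation: (e_k conv g)[n] = e_k[n] G[k]
print(np.allclose(conv, e_k * G[k]))  # True
```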

Now for the second part of the proof. If the input signal, $f$, doesn't
happen to be a complex exponential, we can express it as a sum of
complex exponentials by computing its DFT, $F$.
For each value of $k$ from 0 to $N-1$, $F[k]$ is the complex
magnitude of the component with frequency $k$.

Each input component is a complex exponential with magnitude
$F[k]$, so each output component is a complex
exponential with magnitude $F[k]G[k]$, based on the first part of
the proof.

Because the system is linear, the output is just the sum of the
output components:
%
\[ (f \conv g)[n] = \sum_k F[k] G[k] e_k[n] \]
%
Plugging in the definition of $e_k$ yields
%
\[ (f \conv g)[n] = \sum_k F[k] G[k] \exp(2 \pi i n k / N) \]
%
The right-hand side is the inverse DFT of the product $F G$. Thus:
%
\[ (f \conv g) = \IDFT( F G ) \]
%
Substituting $F = \DFT(f)$ and $G = \DFT(g)$:
%
\[ (f \conv g) = \IDFT( \DFT(f) \DFT(g) ) \]
%
Finally, taking the DFT of both sides yields the Convolution Theorem:
%
\[ \DFT(f \conv g) = \DFT(f) \DFT(g) \]
%
QED
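As a numerical sanity check, with randomly chosen $f$ and $g$ of the same length, we can confirm that the DFT of the circular convolution equals the elementwise product of the DFTs:

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.standard_normal(6)
g = rng.standard_normal(6)
N = len(f)

# circular convolution from the definition
fg = np.array([sum(f[m] * g[(n - m) % N] for m in range(N))
               for n in range(N)])

# the theorem: DFT(f conv g) == DFT(f) DFT(g), elementwise
print(np.allclose(np.fft.fft(fg), np.fft.fft(f) * np.fft.fft(g)))  # True
```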


\section{Exercises}

% TODO: make exercises
\begin{exercise}
\end{exercise}

\begin{exercise}
\end{exercise}

Solutions to these exercises are in {\tt chap10soln.ipynb}.
\chapter{Modulation and sampling}

In Section~\ref{aliasing} I defined the folding frequency, also called
the Nyquist frequency, and I claimed that ???

This chapter presents the theory of signals and systems, using
musical acoustics as an example. And it explains an
important application of the Convolution Theorem, characterization
of linear, time-invariant systems (which I'll define soon).

The code for this chapter is in {\tt chap11.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp11}.


\section{Convolution with impulses}

\begin{figure}
% sampling.py
\centerline{\includegraphics[height=2.5in]{figs/sampling1.eps}}
\caption{.}
\label{fig.sampling1}
\end{figure}

Figure~\ref{fig.sampling1} shows

\end{document}

Case study ideas:

\begin{itemize}

\item Pitch tracking in Rock Band

\item Auto-Tune

\item JPEG compression

\item Image recognition in Fourier space (money)

\item Application of cepstrum?

\item Application of z transform -- digital control systems

\item tilt shift effect on a photo

\end{itemize}