% LaTeX source for ``Think DSP: Digital Signal Processing for Programmers''
% Copyright 2014 Allen B. Downey.

% License: Creative Commons Attribution-NonCommercial 3.0 Unported License.
% http://creativecommons.org/licenses/by-nc/3.0/
%

\documentclass[12pt]{book}
\usepackage[width=5.5in,height=8.5in,
hmarginratio=3:2,vmarginratio=1:1]{geometry}

% for some of these packages, you might have to install
% texlive-latex-extra (in Ubuntu)

\usepackage[T1]{fontenc}
\usepackage{textcomp}
\usepackage{mathpazo}
\usepackage{url}
\usepackage{graphicx}
\usepackage{subfig}
\usepackage{amsmath}
\usepackage{amsthm}
\usepackage{makeidx}
\usepackage{setspace}
\usepackage{hevea}
\usepackage{upquote}
\usepackage{fancyhdr}
\usepackage[bookmarks]{hyperref}

\title{Think DSP}
\author{Allen B. Downey}

\newcommand{\thetitle}{Think DSP: Digital Signal Processing in Python}
\newcommand{\theversion}{1.0.5}

% these styles get translated in CSS for the HTML version
\newstyle{a:link}{color:black;}
\newstyle{p+p}{margin-top:1em;margin-bottom:1em}
\newstyle{img}{border:0px}

% change the arrows in the HTML version
\setlinkstext
{\imgsrc[ALT="Previous"]{back.png}}
{\imgsrc[ALT="Up"]{up.png}}
{\imgsrc[ALT="Next"]{next.png}}

\makeindex

\newif\ifplastex
\plastexfalse

\begin{document}
\frontmatter

\ifplastex

\else
\fi

\ifplastex
\usepackage{localdef}
\maketitle

\else

\newtheoremstyle{exercise}% name of the style to be used
{\topsep}% measure of space to leave above the theorem. E.g.: 3pt
{\topsep}% measure of space to leave below the theorem. E.g.: 3pt
{}% name of font to use in the body of the theorem
{}% measure of space to indent
{\bfseries}% name of head font
{}% punctuation between head and body
{ }% space after theorem head; " " = normal interword space
{}% Manually specify head

\theoremstyle{exercise}
\newtheorem{exercise}{Exercise}[chapter]

\input{latexonly}

\begin{latexonly}
\renewcommand{\topfraction}{0.9}
\renewcommand{\blankpage}{\thispagestyle{empty} \quad \newpage}

% TITLE PAGES FOR LATEX VERSION

%-half title--------------------------------------------------
\thispagestyle{empty}

\begin{flushright}
\vspace*{2.0in}

\begin{spacing}{3}
{\huge Think DSP}\\
{\Large Digital Signal Processing in Python}
\end{spacing}

\vspace{0.25in}

Version \theversion

\vfill

\end{flushright}

%--verso------------------------------------------------------

\blankpage
\blankpage

%--title page--------------------------------------------------
\pagebreak
\thispagestyle{empty}

\begin{flushright}
\vspace*{2.0in}

\begin{spacing}{3}
{\huge Think DSP}\\
{\Large Digital Signal Processing in Python}
\end{spacing}

\vspace{0.25in}

Version \theversion

\vspace{1in}


{\Large
Allen B. Downey\\
}


\vspace{0.5in}

{\Large Green Tea Press}

{\small Needham, Massachusetts}

\vfill

\end{flushright}


%--copyright--------------------------------------------------
\pagebreak
\thispagestyle{empty}

Copyright \copyright ~2014 Allen B. Downey.


\vspace{0.2in}

\begin{flushleft}
Green Tea Press \\
9 Washburn Ave \\
Needham MA 02492
\end{flushleft}

Permission is granted to copy, distribute, and/or modify this document
under the terms of the Creative Commons Attribution-NonCommercial 3.0 Unported
License, which is available at \url{http://creativecommons.org/licenses/by-nc/3.0/}.

\vspace{0.2in}

\end{latexonly}


% HTMLONLY

\begin{htmlonly}

% TITLE PAGE FOR HTML VERSION

{\Large \thetitle}

{\large Allen B. Downey}

Version \theversion

\vspace{0.25in}
Copyright 2014 Allen B. Downey

\vspace{0.25in}

Permission is granted to copy, distribute, and/or modify this document
under the terms of the Creative Commons Attribution-NonCommercial 3.0
Unported License, which is available at
\url{http://creativecommons.org/licenses/by-nc/3.0/}.

\setcounter{chapter}{-1}

\end{htmlonly}

\fi
% END OF THE PART WE SKIP FOR PLASTEX
\chapter{Preface}
\label{preface}

Signal processing is one of my favorite topics. It is useful
in many areas of science and engineering, and if you understand
the fundamental ideas, it provides insight into many things
we see in the world, and especially the things we hear.

But unless you studied electrical or mechanical engineering, you
probably haven't had a chance to learn about signal processing. The
problem is that most books (and the classes that use them) present the
material bottom-up, starting with mathematical abstractions like
phasors. And they tend to be theoretical, with few applications and
little apparent relevance.

The premise of this book is that if you know how to program, you
can use that skill to learn other things, and have fun doing it.

With a programming-based approach, I can present the most important
ideas right away. By the end of the first chapter, you can analyze
sound recordings and other signals, and generate new sounds. Each
chapter introduces a new technique and an application you can
apply to real signals. At each step you learn how to use a
technique first, and then how it works.

This approach is more practical and, I hope you'll agree, more fun.

\section{Who is this book for?}

The examples and supporting code for this book are in Python. You
should know core Python and you should be
familiar with object-oriented features, at least using objects if not
defining your own.

If you are not already familiar with Python, you might want to start
with my other book, {\it Think Python}, which is an introduction to
Python for people who have never programmed, or Mark
Lutz's {\it Learning Python}, which might be better for people with
programming experience.

I use NumPy and SciPy extensively. If you are familiar with them
already, that's great, but I will also explain the functions
and data structures I use.

I assume that the reader knows basic mathematics, including complex
numbers. You don't need much calculus; if you understand the concepts
of integration and differentiation, that will do.
I use some linear algebra, but I will explain it as we
go along.

\section{Using the code}
\label{code}

The code and sound samples used in this book are available from
\url{https://github.com/AllenDowney/ThinkDSP}. Git is a version
control system that allows you to keep track of the files that
make up a project. A collection of files under Git's control is
called a ``repository''. GitHub is a hosting service that provides
storage for Git repositories and a convenient web interface.
\index{repository}
\index{Git}
\index{GitHub}

The GitHub homepage for my repository provides several ways to
work with the code:

\begin{itemize}

\item You can create a copy of my repository
on GitHub by pressing the {\sf Fork} button. If you don't already
have a GitHub account, you'll need to create one. After forking, you'll
have your own repository on GitHub that you can use to keep track
of code you write while working on this book. Then you can
clone the repo, which means that you copy the files
to your computer.
\index{fork}

\item Or you could clone
my repository. You don't need a GitHub account to do this, but you
won't be able to write your changes back to GitHub.
\index{clone}

\item If you don't want to use Git at all, you can download the files
in a Zip file using the button in the lower-right corner of the
GitHub page.

\end{itemize}

All of the code is written to work in both Python 2 and Python 3
with no translation.

I developed this book using Anaconda from
Continuum Analytics, which is a free Python distribution that includes
all the packages you'll need to run the code (and lots more).
I found Anaconda easy to install. By default it does a user-level
installation, not system-level, so you don't need administrative
privileges. And it supports both Python 2 and Python 3. You can
download Anaconda from \url{https://www.anaconda.com/distribution/}.
\index{Anaconda}

If you don't want to use Anaconda, you will need the following
packages:

\begin{itemize}

\item NumPy for basic numerical computation, \url{http://www.numpy.org/};
\index{NumPy}

\item SciPy for scientific computation,
\url{http://www.scipy.org/};
\index{SciPy}

\item matplotlib for visualization, \url{http://matplotlib.org/}.
\index{matplotlib}

\end{itemize}

Although these are commonly used packages, they are not included with
all Python installations, and they can be hard to install in some
environments. If you have trouble installing them, I
recommend using Anaconda or one of the other Python distributions
that include these packages.
\index{installation}

Most exercises use Python scripts, but some also use Jupyter
notebooks. If you have not used Jupyter before, you can read about
it at \url{http://jupyter.org}.
\index{Jupyter}

There are three ways you can work with the Jupyter notebooks:

\begin{description}

\item[Run Jupyter on your computer]

If you installed Anaconda, you
probably got Jupyter by default. To check, start the server from
the command line, like this:

\begin{verbatim}
$ jupyter notebook
\end{verbatim}

If it's not installed, you can install it in Anaconda like this:

\begin{verbatim}
$ conda install jupyter
\end{verbatim}

When you start the server, it should launch your default web browser
or create a new tab in an open browser window.

\item[Run Jupyter on Binder]

Binder is a service that runs Jupyter in a virtual machine. If you
follow this link, \url{http://mybinder.org/repo/AllenDowney/ThinkDSP},
you should get a Jupyter home page with the notebooks for this book
and the supporting data and scripts.

You can run the scripts and modify them to run your own code, but the
virtual machine you run in is temporary. Any changes you make will
disappear, along with the virtual machine, if you leave it idle for
more than about an hour.

\item[View notebooks on nbviewer]

When we refer to notebooks later in the book, we will provide links to
nbviewer, which provides a static view of the code and results. You
can use these links to read the notebooks and listen to the examples,
but you won't be able to modify or run the code, or use the
interactive widgets.

\end{description}

Good luck, and have fun!



\section*{Contributor List}

If you have a suggestion or correction, please send email to
{\tt downey@allendowney.com}. If I make a change based on your
feedback, I will add you to the contributor list
(unless you ask to be omitted).
\index{contributors}

If you include at least part of the sentence the
error appears in, that makes it easy for me to search. Page and
section numbers are fine, too, but not as easy to work with.
Thanks!

\small

\begin{itemize}

\item Before I started writing, my thoughts about this book
benefited from conversations with Boulos Harb at Google and
Aurelio Ramos, formerly at Harmonix Music Systems.

\item During the Fall 2013 semester, Nathan Lintz and Ian Daniher
worked with me on an independent study project and helped me with
the first draft of this book.

\item On Reddit's DSP forum, the anonymous user RamjetSoundwave
helped me fix a problem with my implementation of Brownian Noise.
And andodli found a typo.

\item In Spring 2015 I had the pleasure of teaching this material
along with Prof. Oscar Mur-Miranda and Prof. Siddhartan Govindasamy.
Both made many suggestions and corrections.

\item Silas Gyger corrected an arithmetic error.

\item Giuseppe Masetti sent a number of very helpful suggestions.

\item Eric Peters sent many helpful suggestions.

% ENDCONTRIB

\end{itemize}

Special thanks to Freesound, which is the source of many of the
sound samples I use in this book, and to the Freesound users who
uploaded those sounds. I include some of their wave files in
the GitHub repository for this book, using the original file
names, so it should be easy to find their sources.

Unfortunately, most Freesound users don't make their real names
available, so I can only thank them using their user names. Samples
used in this book were contributed by Freesound users: iluppai,
wcfl10, thirsk, docquesting, kleeb, landup, zippi1, themusicalnomad,
bcjordan, rockwehrmann, marcgascon7, jcveliz. Thank you all!

%100475__iluppai__saxophone-weep.wav
%http://www.freesound.org/people/iluppai/sounds/100475/

%105977__wcfl10__favorite-station.wav
%http://www.freesound.org/people/wcfl10/sounds/105977/

%120994__thirsk__120-oboe.wav
%http://www.freesound.org/people/Thirsk/sounds/120994/

%132736__ciccarelli__ocean-waves.wav
%http://www.freesound.org/people/ciccarelli/sounds/132736/

%180960__kleeb__gunshot.wav
%http://www.freesound.org/people/Kleeb/sounds/180960/

%18871__zippi1__sound-bell-440hz.wav
%http://www.freesound.org/people/zippi1/sounds/18871/

%253887__themusicalnomad__positive-beeps.wav
%http://www.freesound.org/people/themusicalnomad/sounds/253887/

%28042__bcjordan__voicedownbew.wav
%http://www.freesound.org/people/bcjordan/sounds/28042/

%72475__rockwehrmann__glissup02.wav
%http://www.freesound.org/people/rockwehrmann/sounds/72475/

%87778__marcgascon7__vocals.wav
%http://www.freesound.org/people/marcgascon7/sounds/87778/

%92002__jcveliz__violin-origional.wav
%http://www.freesound.org/people/jcveliz/sounds/92002/


\normalsize

\clearemptydoublepage

% TABLE OF CONTENTS
\begin{latexonly}

\tableofcontents

\clearemptydoublepage

\end{latexonly}

% START THE BOOK
\mainmatter

\chapter{Sounds and signals}
\label{sounds}

A {\bf signal} represents a quantity that varies in time. That
definition is pretty abstract, so let's start with a concrete example:
sound. Sound is variation in air pressure. A sound signal represents
variations in air pressure over time.

A microphone is a device that measures these variations and generates
an electrical signal that represents sound. A speaker is a device
that takes an electrical signal and produces sound.
Microphones and speakers are called {\bf transducers} because they
transduce, or convert, signals from one form to another.

This book is about signal processing, which includes processes for
synthesizing, transforming, and analyzing signals. I will focus on
sound signals, but the same methods apply to electronic signals,
mechanical vibration, and signals in many other domains.

They also apply to signals that vary in space rather than time, like
elevation along a hiking trail. And they apply to signals in more
than one dimension, like an image, which you can think of as a signal
that varies in two-dimensional space. Or a movie, which is
a signal that varies in two-dimensional space {\it and} time.

But we start with simple one-dimensional sound.

The code for this chapter is in {\tt chap01.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp01}.


\section{Periodic signals}
\label{violin}

\begin{figure}
% sounds.py
\centerline{\includegraphics[height=2.5in]{figs/sounds1.pdf}}
\caption{Segment from a recording of a bell.}
\label{fig.sounds1}
\end{figure}

We'll start with {\bf periodic signals}, which are signals that
repeat themselves after some period of time. For example, if you
strike a bell, it vibrates and generates sound. If you record
that sound and plot the transduced signal, it looks like
Figure~\ref{fig.sounds1}.

This signal resembles a {\bf sinusoid}, which means it has the same
shape as the trigonometric sine function.

You can see that this signal is periodic. I chose the duration
to show three full repetitions, also known as {\bf cycles}.
The duration of each cycle, called the {\bf period}, is about 2.3 ms.

The {\bf frequency} of a signal is the number of cycles
per second, which is the inverse of the period.
The units of frequency are cycles per second, or {\bf Hertz},
abbreviated ``Hz''. (Strictly speaking, the number of cycles is
a dimensionless number, so a Hertz is really a ``per second'').

The frequency of this signal is about 439 Hz, slightly lower than 440
Hz, which is the standard tuning pitch for orchestral music. The
musical name of this note is A, or more specifically, A4. If you are
not familiar with ``scientific pitch notation'', the numerical suffix
indicates which octave the note is in. A4 is the A above middle C.
A5 is one octave higher. See
\url{http://en.wikipedia.org/wiki/Scientific_pitch_notation}.
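As a quick check of the relationship between frequency and period, we can invert the estimated frequency; both numbers here are approximate, since they are read off the plot:

```python
# frequency is the inverse of the period, and vice versa
freq = 439            # Hz, estimated from the recording
period = 1 / freq     # seconds per cycle
print(period * 1000)  # about 2.28 ms, consistent with the 2.3 ms read from the plot
```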

\begin{figure}
% sounds.py
\centerline{\includegraphics[height=2.5in]{figs/sounds2.pdf}}
\caption{Segment from a recording of a violin.}
\label{fig.sounds2}
\end{figure}

A tuning fork generates a sinusoid because the vibration of the tines
is a form of simple harmonic motion. Most musical instruments
produce periodic signals, but the shape of these signals is not
sinusoidal. For example, Figure~\ref{fig.sounds2} shows a segment
from a recording of a violin playing
Boccherini's String Quintet No. 5 in E, 3rd
movement.

%\footnote{The recording is from
% \url{http://www.freesound.org/people/jcveliz/sounds/92002/}.
%I identified the piece using \url{http://www.musipedia.org}.}
% Parson's code: DUUDDUURDR

Again we can see that the signal is periodic, but the shape of the
signal is more complex. The shape of a periodic signal is called
the {\bf waveform}. Most musical instruments produce waveforms more
complex than a sinusoid. The shape of the waveform determines the
musical {\bf timbre}, which is our perception of the quality of the
sound. People usually perceive complex waveforms as rich, warm and
more interesting than sinusoids.


\section{Spectral decomposition}

\begin{figure}
% sounds.py
\centerline{\includegraphics[height=2.5in]{figs/sounds3.pdf}}
\caption{Spectrum of a segment from the violin recording.}
\label{fig.sounds3}
\end{figure}

The most important topic in this book is {\bf spectral decomposition},
which is the idea that any signal can be expressed as the sum of
sinusoids with different frequencies.

The most important mathematical idea in this book is the {\bf discrete
Fourier transform}, or {\bf DFT}, which takes a signal and produces
its {\bf spectrum}. The spectrum is the set of sinusoids that add up to
produce the signal.

And the most important algorithm in this book is the {\bf Fast
Fourier transform}, or {\bf FFT}, which is an efficient way to
compute the DFT.

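To make this concrete, here is a small NumPy-only preview (independent of the {\tt thinkdsp} code introduced later) that applies the FFT to a synthetic signal and recovers the frequencies that went into it:

```python
import numpy as np

framerate = 11025                      # samples per second
ts = np.arange(framerate) / framerate  # one second of sample times
# a signal with components at 440 Hz and 880 Hz
ys = np.sin(2 * np.pi * 440 * ts) + 0.5 * np.sin(2 * np.pi * 880 * ts)

hs = np.fft.rfft(ys)                          # DFT of a real signal, computed by the FFT
fs = np.fft.rfftfreq(len(ys), d=1/framerate)  # frequency of each component

# the two strongest components are at the frequencies we put in
top2 = sorted(float(f) for f in fs[np.argsort(np.abs(hs))[-2:]])
print(top2)  # [440.0, 880.0]
```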
For example, Figure~\ref{fig.sounds3} shows the spectrum of the violin
610
recording in Figure~\ref{fig.sounds2}. The x-axis is the range of
611
frequencies that make up the signal. The y-axis shows the strength
612
or {\bf amplitude} of each frequency component.
613
614
The lowest frequency component is called the {\bf fundamental
615
frequency}. The fundamental frequency of this signal is near 440 Hz
616
(actually a little lower, or ``flat'').
617
618
In this signal the fundamental frequency has the largest amplitude,
619
so it is also the {\bf dominant frequency}.
620
Normally the perceived pitch of a sound is determined by the
621
fundamental frequency, even if it is not dominant.
622
623
The other spikes in the spectrum are at frequencies 880, 1320, 1760, and
624
2200, which are integer multiples of the fundamental.
625
These components are called {\bf harmonics} because they are
626
musically harmonious with the fundamental:
627
628
\begin{itemize}
629
630
\item 880 is the frequency of A5, one octave higher than the fundamental.
631
An {\bf octave} is a doubling in frequency.
632
633
\item 1320 is approximately E6, which is a major fifth above A5.
634
If you are not familiar with musical intervals like "major fifth'', see
635
\url{https://en.wikipedia.org/wiki/Interval_(music)}.
636
637
\item 1760 is A6, two octaves above the fundamental.
638
639
\item 2200 is approximately C$\sharp$7, which is a major third
640
above A6.
641
642
\end{itemize}
643
644
These harmonics make up the notes of an A major
645
chord, although not all in the same octave. Some of them are only
646
approximate because the notes that make up Western music have been
647
adjusted for {\bf equal temperament} (see
648
\url{http://en.wikipedia.org/wiki/Equal_temperament}).
649
650
Given the harmonics and their amplitudes, you can reconstruct the
651
signal by adding up sinusoids.
652
Next we'll see how.
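As a preview in plain NumPy, here is a sketch of that reconstruction; the amplitudes are made up for illustration, not measured from the violin recording:

```python
import numpy as np

framerate = 44000   # chosen so one 440 Hz period is exactly 100 samples
ts = np.arange(framerate) / framerate   # one second of sample times

# hypothetical amplitudes for the fundamental and its harmonics
harmonics = {440: 1.0, 880: 0.5, 1320: 0.3, 1760: 0.2, 2200: 0.1}

# reconstruct the waveform by adding up one sinusoid per harmonic
ys = sum(amp * np.sin(2 * np.pi * freq * ts)
         for freq, amp in harmonics.items())

# the sum is periodic with the period of the fundamental: 1/440 s = 100 samples
assert np.allclose(ys[:100], ys[100:200])
```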


\section{Signals}

I wrote a Python module called {\tt thinkdsp.py} that contains classes
and functions for working with signals and spectrums\footnote{The
plural of ``spectrum'' is often written ``spectra'', but I prefer
to use standard English plurals. If you are familiar with ``spectra'',
I hope my choice doesn't sound too strange.}. You
will find it in the repository for this book (see Section~\ref{code}).

To represent signals, {\tt thinkdsp} provides a class called
{\tt Signal}, which is the parent class for several signal types,
including {\tt Sinusoid}, which represents both sine and cosine
signals.

{\tt thinkdsp} provides functions to create sine and cosine signals:

\begin{verbatim}
cos_sig = thinkdsp.CosSignal(freq=440, amp=1.0, offset=0)
sin_sig = thinkdsp.SinSignal(freq=880, amp=0.5, offset=0)
\end{verbatim}

{\tt freq} is frequency in Hz. {\tt amp} is amplitude in unspecified
units where 1.0 is defined as the largest amplitude we can record or
play back.

{\tt offset} is a {\bf phase offset} in radians. Phase offset
determines where in the period the signal starts. For example, a
sine signal with {\tt offset=0} starts at $\sin 0$, which is 0.
With {\tt offset=pi/2} it starts at $\sin \pi/2$, which is 1.

Signals have an \verb"__add__" method, so you can use the {\tt +}
operator to add them:

\begin{verbatim}
mix = sin_sig + cos_sig
\end{verbatim}

The result is a {\tt SumSignal}, which represents the sum of two
or more signals.

A Signal is basically a Python representation of a mathematical
function. Most signals are defined for all values of {\tt t},
from negative infinity to infinity.

You can't do much with a Signal until you evaluate it. In this
context, ``evaluate'' means taking a sequence of points in time, {\tt
ts}, and computing the corresponding values of the signal, {\tt ys}.
I represent {\tt ts} and {\tt ys} using NumPy arrays and encapsulate
them in an object called a Wave.

A Wave represents a signal evaluated at a sequence of points in
time. Each point in time is called a {\bf frame} (a term borrowed
from movies and video). The measurement itself is called a
{\bf sample}, although ``frame'' and ``sample'' are sometimes
used interchangeably.

{\tt Signal} provides \verb"make_wave", which returns a new
Wave object:

\begin{verbatim}
wave = mix.make_wave(duration=0.5, start=0, framerate=11025)
\end{verbatim}

{\tt duration} is the length of the Wave in seconds. {\tt start} is
the start time, also in seconds. {\tt framerate} is the (integer)
number of frames per second, which is also the number of samples
per second.

11,025 frames per second is one of several framerates commonly used in
audio file formats, including Waveform Audio File (WAV) and mp3.

This example evaluates the signal from {\tt t=0} to {\tt t=0.5} at
5,513 equally-spaced frames (because 5,513 is half of 11,025, rounded up).
The time between frames, or {\bf timestep}, is {\tt 1/11025} seconds, about
91 $\mu$s.
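We can check this arithmetic directly; this snippet is independent of {\tt thinkdsp}:

```python
framerate = 11025         # frames per second
duration = 0.5            # seconds

n = duration * framerate  # number of frames, before rounding
dt = 1 / framerate        # timestep between frames, in seconds
print(n, dt * 1e6)        # 5512.5 frames, about 90.7 microseconds
```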

{\tt Wave} provides a {\tt plot} method that uses {\tt pyplot}.
You can plot the wave like this:

\begin{verbatim}
wave.plot()
pyplot.show()
\end{verbatim}

{\tt pyplot} is part of {\tt matplotlib}; it is included in many
Python distributions, or you might have to install it.

\begin{figure}
% sounds4.py
\centerline{\includegraphics[height=2.5in]{figs/sounds4.pdf}}
\caption{Segment from a mixture of two sinusoid signals.}
\label{fig.sounds4}
\end{figure}

At {\tt freq=440} there are 220 periods in 0.5 seconds, so this plot
would look like a solid block of color. To zoom in on a small number
of periods, we can use {\tt segment}, which copies a segment of a Wave
and returns a new wave:

\begin{verbatim}
period = mix.period
segment = wave.segment(start=0, duration=period*3)
\end{verbatim}

{\tt period} is a property of a Signal; it returns the period in seconds.

{\tt start} and {\tt duration} are in seconds. This example copies
the first three periods from {\tt mix}. The result is a Wave object.

If we plot {\tt segment}, it looks like Figure~\ref{fig.sounds4}.
This signal contains two frequency components, so it is more
complicated than the signal from the tuning fork, but less complicated
than the violin.

\section{Reading and writing Waves}

{\tt thinkdsp} provides \verb"read_wave", which reads a WAV
file and returns a Wave:

\begin{verbatim}
violin_wave = thinkdsp.read_wave('input.wav')
\end{verbatim}

And {\tt Wave} provides {\tt write}, which writes a WAV file:

\begin{verbatim}
wave.write(filename='output.wav')
\end{verbatim}

You can listen to the Wave with any media player that plays WAV
files. On UNIX systems, I use {\tt aplay}, which is simple, robust,
and included in many Linux distributions.

{\tt thinkdsp} also provides \verb"play_wave", which runs
the media player as a subprocess:

\begin{verbatim}
thinkdsp.play_wave(filename='output.wav', player='aplay')
\end{verbatim}

It uses {\tt aplay} by default, but you can provide the
name of another player.

\section{Spectrums}
\label{spectrums}

{\tt Wave} provides \verb"make_spectrum", which returns a
{\tt Spectrum}:

\begin{verbatim}
spectrum = wave.make_spectrum()
\end{verbatim}

And {\tt Spectrum} provides {\tt plot}:

\begin{verbatim}
spectrum.plot()
thinkplot.show()
\end{verbatim}

{\tt thinkplot} is a module I wrote to provide wrappers around some of
the functions in {\tt pyplot}. It is included in the
Git repository for this book (see Section~\ref{code}).

{\tt Spectrum} provides three methods that modify the spectrum:

\begin{itemize}

\item \verb"low_pass" applies a low-pass filter, which means that
components above a given cutoff frequency are attenuated (that is,
reduced in magnitude) by a factor.

\item \verb"high_pass" applies a high-pass filter, which means that
it attenuates components below the cutoff.

\item \verb"band_stop" attenuates components in the band of
frequencies between two cutoffs.

\end{itemize}

This example attenuates all frequencies above 600 by 99\%:

\begin{verbatim}
spectrum.low_pass(cutoff=600, factor=0.01)
\end{verbatim}

A low-pass filter removes bright, high-frequency sounds, so
the result sounds muffled and darker. To hear what it sounds
like, you can convert the Spectrum back to a Wave, and then play it.

\begin{verbatim}
wave = spectrum.make_wave()
wave.play('temp.wav')
\end{verbatim}

The {\tt play} method writes the wave to a file and then plays it.
If you use Jupyter notebooks, you can use \verb"make_audio", which
makes an Audio widget that plays the sound.
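If you are curious how a filter like \verb"low_pass" can work under the hood, here is a minimal NumPy sketch of the same idea: scale the frequency components above the cutoff. It is an illustration, not the {\tt thinkdsp} implementation:

```python
import numpy as np

framerate = 11025
ts = np.arange(framerate) / framerate
# a signal with a 440 Hz component and a bright 5000 Hz component
ys = np.sin(2 * np.pi * 440 * ts) + np.sin(2 * np.pi * 5000 * ts)

hs = np.fft.rfft(ys)                          # frequency components
fs = np.fft.rfftfreq(len(ys), d=1/framerate)  # their frequencies

hs[fs > 600] *= 0.01                          # attenuate above the cutoff by 99%
filtered = np.fft.irfft(hs, n=len(ys))        # back to a wave

# the 5000 Hz component is now tiny compared to the 440 Hz component
mags = np.abs(np.fft.rfft(filtered))
print(mags[5000] / mags[440])  # about 0.01
```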


\section{Wave objects}

\begin{figure}
% http://yuml.me/edit/5294b377
% pdftops -eps uml_diagram1.pdf
\centerline{\includegraphics[width=3.5in]{figs/uml_diagram1.pdf}}
\caption{Relationships among the classes in {\tt thinkdsp}.}
\label{fig.diagram1}
\end{figure}

There is nothing very complicated in {\tt thinkdsp.py}. Most
of the functions it provides are thin wrappers around functions
from NumPy and SciPy.

The primary classes in {\tt thinkdsp} are Signal, Wave, and Spectrum.
Given a Signal, you can make a Wave. Given a Wave, you can
make a Spectrum, and vice versa. These relationships are shown
in Figure~\ref{fig.diagram1}.

A Wave object contains three attributes: {\tt ys} is a NumPy array
that contains the values in the signal; {\tt ts} is an array of the
times where the signal was evaluated or sampled; and {\tt
framerate} is the number of samples per unit of time. The
unit of time is usually seconds, but it doesn't have to be. In
one of my examples, it's days.

Wave also provides three read-only properties: {\tt start},
{\tt end}, and {\tt duration}. If you modify {\tt ts}, these
properties change accordingly.
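One way to get that behavior (a sketch of the idea, not necessarily how {\tt thinkdsp} defines these properties) is to compute the properties from {\tt ts} on demand:

```python
import numpy as np

class SketchWave:
    """Minimal stand-in for Wave, showing read-only properties."""
    def __init__(self, ys, ts, framerate):
        self.ys, self.ts, self.framerate = ys, ts, framerate

    @property
    def start(self):
        return self.ts[0]     # recomputed every time it is read

    @property
    def end(self):
        return self.ts[-1]

    @property
    def duration(self):
        return len(self.ys) / self.framerate

ts = np.arange(100) / 100
wave = SketchWave(np.zeros(100), ts, framerate=100)
wave.ts += 1                  # shift the wave 1 second later
print(wave.start, wave.end)   # 1.0 1.99 -- the properties follow ts
```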

To modify a wave, you can access the {\tt ts} and {\tt ys} directly.
For example:

\begin{verbatim}
wave.ys *= 2
wave.ts += 1
\end{verbatim}

The first line scales the wave by a factor of 2, making
it louder. The second line shifts the wave in time, making it
start 1 second later.
898
899
But Wave provides methods that perform many common operations.
900
For example, the same two transformations could be written:
901
902
\begin{verbatim}
903
wave.scale(2)
904
wave.shift(1)
905
\end{verbatim}
906
907
You can read the documentation of these methods and others at
908
\url{http://greenteapress.com/thinkdsp.html}.
909
910
911
\section{Signal objects}
\label{sigobs}

Signal is a parent class that provides functions common to all
kinds of signals, like \verb"make_wave". Child classes inherit
these methods and provide {\tt evaluate}, which evaluates the
signal at a given sequence of times.

For example, Sinusoid is a child class of Signal, with this
definition:

\begin{verbatim}
class Sinusoid(Signal):

    def __init__(self, freq=440, amp=1.0, offset=0, func=np.sin):
        Signal.__init__(self)
        self.freq = freq
        self.amp = amp
        self.offset = offset
        self.func = func
\end{verbatim}
The parameters of \verb"__init__" are:

\begin{itemize}

\item {\tt freq}: frequency in cycles per second, or Hz.

\item {\tt amp}: amplitude. The units of amplitude are arbitrary,
usually chosen so 1.0 corresponds to the maximum input from a
microphone or maximum output to a speaker.

\item {\tt offset}: indicates where in its period the signal starts;
{\tt offset} is in units of radians, for reasons I explain below.

\item {\tt func}: a Python function used
to evaluate the signal at a particular point in time. It is
usually either {\tt np.sin} or {\tt np.cos}, yielding a sine or
cosine signal.

\end{itemize}

Like many init methods, this one just tucks the parameters away for
future use.
Signal provides \verb"make_wave", which looks like
this:

\begin{verbatim}
def make_wave(self, duration=1, start=0, framerate=11025):
    n = round(duration * framerate)
    ts = start + np.arange(n) / framerate
    ys = self.evaluate(ts)
    return Wave(ys, ts, framerate=framerate)
\end{verbatim}

{\tt start} and {\tt duration} are the start time and duration
in seconds. {\tt framerate} is the number of frames (samples)
per second.

{\tt n} is the number of samples, and {\tt ts} is a NumPy array
of sample times.
To compute the {\tt ys}, \verb"make_wave" invokes {\tt evaluate},
which is provided by {\tt Sinusoid}:

\begin{verbatim}
def evaluate(self, ts):
    phases = PI2 * self.freq * ts + self.offset
    ys = self.amp * self.func(phases)
    return ys
\end{verbatim}
Let's unwind this function one step at a time:

\begin{enumerate}

\item {\tt self.freq} is frequency in cycles per second, and each
element of {\tt ts} is a time in seconds, so their product is the
number of cycles since the start time.

\item {\tt PI2} is a constant that stores $2 \pi$. Multiplying by
{\tt PI2} converts from cycles to {\bf phase}. You can think of
phase as ``cycles since the start time'' expressed in radians. Each
cycle is $2 \pi$ radians.

\item {\tt self.offset} is the phase when $t=0$.
It has the effect of shifting the signal left or right in time.

\item If {\tt self.func} is {\tt np.sin} or {\tt np.cos}, the result is a
value between $-1$ and $+1$.

\item Multiplying by {\tt self.amp} yields a signal that ranges from
{\tt -self.amp} to {\tt +self.amp}.

\end{enumerate}
In math notation, {\tt evaluate} is written like this:
%
\[ y = A \cos (2 \pi f t + \phi_0) \]
%
where $A$ is amplitude, $f$ is frequency, $t$ is time, and $\phi_0$
is the phase offset. It may seem like I wrote a lot of code
to evaluate one simple expression, but as we'll see, this code
provides a framework for dealing with all kinds of signals, not
just sinusoids.
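The same computation can be done with plain NumPy, outside the class
structure. This is a sketch, not thinkdsp code; the function name
{\tt sinusoid} is mine:

```python
import numpy as np

PI2 = 2 * np.pi

def sinusoid(ts, freq=440, amp=1.0, offset=0.0, func=np.cos):
    """Evaluate amp * func(2*pi*freq*t + offset) at an array of times."""
    phases = PI2 * freq * ts + offset
    return amp * func(phases)

# 10 milliseconds of a 440 Hz cosine at 11025 frames per second
ts = np.arange(0, 0.01, 1 / 11025)
ys = sinusoid(ts, freq=440, amp=0.6)
```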
\section{Exercises}

Before you begin these exercises, you should download the code
for this book, following the instructions in Section~\ref{code}.

Solutions to these exercises are in {\tt chap01soln.ipynb}.

\begin{exercise}
If you have Jupyter, load {\tt chap01.ipynb}, read through it, and run
the examples. You can also view this notebook at
\url{http://tinyurl.com/thinkdsp01}.
\end{exercise}

\begin{exercise}
Go to \url{http://freesound.org} and download a sound sample that
includes music, speech, or other sounds that have a well-defined pitch.
Select a roughly half-second segment where the pitch is
constant. Compute and plot the spectrum of the segment you selected.
What connection can you make between the timbre of the sound and the
harmonic structure you see in the spectrum?

Use \verb"high_pass", \verb"low_pass", and \verb"band_stop" to
filter out some of the harmonics. Then convert the spectrum back
to a wave and listen to it. How does the sound relate to the
changes you made in the spectrum?
\end{exercise}

\begin{exercise}
Synthesize a compound signal by creating SinSignal and CosSignal
objects and adding them up. Evaluate the signal to get a Wave,
and listen to it. Compute its Spectrum and plot it.
What happens if you add frequency
components that are not multiples of the fundamental?
\end{exercise}

\begin{exercise}
Write a function called {\tt stretch} that takes a Wave and a stretch
factor and speeds up or slows down the wave by modifying {\tt ts} and
{\tt framerate}. Hint: it should only take two lines of code.
\end{exercise}
\chapter{Harmonics}
\label{harmonics}

In this chapter I present several new waveforms; we will look at
their spectrums to understand their {\bf harmonic structure}, which is
the set of sinusoids they are made up of.

I'll also introduce one of the most important phenomena in digital
signal processing: aliasing. And I'll explain a little more about how
the Spectrum class works.

The code for this chapter is in {\tt chap02.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp02}.
\section{Triangle waves}
\label{triangle}

A sinusoid contains only one frequency component, so its spectrum
has only one peak. More complicated waveforms, like the
violin recording, yield DFTs with many peaks. In this section we
investigate the relationship between waveforms and their spectrums.

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/triangle-200-1.pdf}}
\caption{Segment of a triangle signal at 200 Hz.}
\label{fig.triangle.200.1}
\end{figure}

I'll start with a triangle waveform, which is like a straight-line
version of a sinusoid. Figure~\ref{fig.triangle.200.1} shows a
triangle waveform with frequency 200 Hz.

To generate a triangle wave, you can use {\tt thinkdsp.TriangleSignal}:

\begin{verbatim}
class TriangleSignal(Sinusoid):

    def evaluate(self, ts):
        cycles = self.freq * ts + self.offset / PI2
        frac, _ = np.modf(cycles)
        ys = np.abs(frac - 0.5)
        ys = normalize(unbias(ys), self.amp)
        return ys
\end{verbatim}

{\tt TriangleSignal} inherits \verb"__init__" from {\tt Sinusoid},
so it takes the same arguments: {\tt freq}, {\tt amp}, and {\tt offset}.

The only difference is {\tt evaluate}. As we saw before,
{\tt ts} is the sequence of sample times where we want to
evaluate the signal.

There are many ways to generate a triangle wave. The details
are not important, but here's how {\tt evaluate} works:
\begin{enumerate}

\item {\tt cycles} is the number of cycles since the start time.
{\tt np.modf} splits the number of cycles into the fraction
part, stored in {\tt frac}, and the integer part, which is
ignored.\footnote{Using an underscore as a variable name is a convention that
means, ``I don't intend to use this value.''}

\item {\tt frac} is a sequence that ramps from 0 to 1 with the given
frequency. Subtracting 0.5 yields values between -0.5 and 0.5.
Taking the absolute value yields a waveform that zig-zags between
0.5 and 0.

\item {\tt unbias} shifts the waveform down so it is centered at 0; then
{\tt normalize} scales it to the given amplitude, {\tt amp}.

\end{enumerate}
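Here is a standalone version of those steps, substituting a
mean-subtraction for {\tt unbias} and peak-scaling for {\tt normalize}
(a sketch; the real thinkdsp helpers are slightly more general):

```python
import numpy as np

PI2 = 2 * np.pi

def triangle(ts, freq=200, amp=1.0, offset=0.0):
    """Evaluate a triangle wave following the three steps above."""
    cycles = freq * ts + offset / PI2
    frac, _ = np.modf(cycles)          # fraction part ramps from 0 to 1
    ys = np.abs(frac - 0.5)            # zig-zag between 0.5 and 0
    ys = ys - ys.mean()                # unbias: center the waveform at 0
    ys = amp * ys / np.abs(ys).max()   # normalize to the given amplitude
    return ys

ts = np.arange(0, 0.01, 1 / 10000)     # two periods at 200 Hz
ys = triangle(ts)
```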
Here's the code that generates Figure~\ref{fig.triangle.200.1}:

\begin{verbatim}
signal = thinkdsp.TriangleSignal(200)
signal.plot()
\end{verbatim}

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/triangle-200-2.pdf}}
\caption{Spectrum of a triangle signal at 200 Hz, shown on two
vertical scales. The version on the right cuts off the fundamental
to show the harmonics more clearly.}
\label{fig.triangle.200.2}
\end{figure}

Next we can use the Signal to make a Wave, and use the Wave to
make a Spectrum:

\begin{verbatim}
wave = signal.make_wave(duration=0.5, framerate=10000)
spectrum = wave.make_spectrum()
spectrum.plot()
\end{verbatim}

Figure~\ref{fig.triangle.200.2} shows two views of the result; the
view on the right is scaled to show the harmonics more clearly. As
expected, the highest peak is at the fundamental frequency, 200 Hz,
and there are additional peaks at harmonic frequencies, which are
integer multiples of 200.

But one surprise is that there are no peaks at the even multiples:
400, 800, etc. The harmonics of a triangle wave are all
odd multiples of the fundamental frequency, in this example
600, 1000, 1400, etc.

Another feature of this spectrum is the relationship between the
amplitude and frequency of the harmonics. Their amplitude drops off
in proportion to frequency squared. For example, the frequency ratio
of the first two harmonics (200 and 600 Hz) is 3, and the amplitude
ratio is approximately 9. The frequency ratio of the next two
harmonics (600 and 1000 Hz) is 1.7, and the amplitude ratio is
approximately $1.7^2 = 2.9$. This relationship is called the
{\bf harmonic structure}.
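We can check the drop-off numerically by synthesizing a triangle wave
and comparing harmonic amplitudes directly (a sketch using NumPy's FFT
functions rather than the Spectrum class):

```python
import numpy as np
from numpy.fft import rfft, rfftfreq

framerate = 10000
ts = np.arange(0, 0.5, 1 / framerate)
frac, _ = np.modf(200 * ts)
ys = np.abs(frac - 0.5) - 0.25        # triangle wave at 200 Hz, centered at 0

hs = rfft(ys)
fs = rfftfreq(len(ys), 1 / framerate)
amps = np.abs(hs)

a1 = amps[fs == 200][0]               # fundamental
a3 = amps[fs == 600][0]               # third harmonic
ratio = a1 / a3                       # close to 9, since amplitude ~ 1/f**2
```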
\section{Square waves}
\label{square}

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/square-100-1.pdf}}
\caption{Segment of a square signal at 100 Hz.}
\label{fig.square.100.1}
\end{figure}

{\tt thinkdsp} also provides {\tt SquareSignal}, which represents
a square signal. Here's the class definition:

\begin{verbatim}
class SquareSignal(Sinusoid):

    def evaluate(self, ts):
        cycles = self.freq * ts + self.offset / PI2
        frac, _ = np.modf(cycles)
        ys = self.amp * np.sign(unbias(frac))
        return ys
\end{verbatim}

Like {\tt TriangleSignal}, {\tt SquareSignal} inherits
\verb"__init__" from {\tt Sinusoid}, so it takes the same
parameters.

And the {\tt evaluate} method is similar. Again, {\tt cycles} is
the number of cycles since the start time, and {\tt frac} is the
fractional part, which ramps from 0 to 1 each period.

{\tt unbias} shifts {\tt frac} so it ramps from -0.5 to 0.5,
then {\tt np.sign} maps the negative values to -1 and the
positive values to 1. Multiplying by {\tt amp} yields a square
wave that jumps between {\tt -amp} and {\tt amp}.
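As a standalone sketch (again substituting a mean-subtraction for
{\tt unbias}):

```python
import numpy as np

PI2 = 2 * np.pi

def square(ts, freq=100, amp=1.0, offset=0.0):
    """Evaluate a square wave following the steps above."""
    cycles = freq * ts + offset / PI2
    frac, _ = np.modf(cycles)
    frac = frac - frac.mean()          # unbias: ramps from about -0.5 to 0.5
    return amp * np.sign(frac)         # map to -amp and +amp

ts = np.arange(0, 0.03, 1 / 10000)     # three periods at 100 Hz
ys = square(ts, amp=0.5)
```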
\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/square-100-2.pdf}}
\caption{Spectrum of a square signal at 100 Hz.}
\label{fig.square.100.2}
\end{figure}

Figure~\ref{fig.square.100.1} shows three periods of a square
wave with frequency 100 Hz,
and Figure~\ref{fig.square.100.2} shows its spectrum.

Like a triangle wave, the square wave contains only odd harmonics,
which is why there are peaks at 300, 500, and 700 Hz, etc.
But the amplitude of the harmonics drops off more slowly.
Specifically, amplitude drops in proportion to frequency (not frequency
squared).

The exercises at the end of this chapter give you a chance to
explore other waveforms and other harmonic structures.
\section{Aliasing}
\label{aliasing}

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=2.5in]{figs/triangle-1100-2.pdf}}
\caption{Spectrum of a triangle signal at 1100 Hz sampled at
10,000 frames per second. The view on the right is scaled to
show the harmonics.}
\label{fig.triangle.1100.2}
\end{figure}

I have a confession. I chose the examples in the previous section
carefully to avoid showing you something confusing. But now it's
time to get confused.

Figure~\ref{fig.triangle.1100.2} shows the spectrum of a triangle wave
at 1100 Hz, sampled at 10,000 frames per second. Again, the view on
the right is scaled to show the harmonics.

The harmonics
of this wave should be at 3300, 5500, 7700, and 9900 Hz.
In the figure, there are peaks at 1100 and 3300 Hz, as expected, but
the third peak is at 4500, not 5500 Hz. The
fourth peak is at 2300, not 7700 Hz. And if you look closely, the
peak that should be at 9900 is actually at 100 Hz. What's going on?

The problem is that when you evaluate the signal at
discrete points in time, you lose information about what happened
between samples. For low frequency components, that's not a
problem, because you have lots of samples per period.

But if you sample a signal at 5000 Hz with 10,000 frames per second,
you only have two samples per period. That turns out to be enough,
just barely, but if the frequency is higher, it's not.

To see why, let's generate cosine signals at 4500 and 5500 Hz,
and sample them at 10,000 frames per second:

\begin{verbatim}
framerate = 10000

signal = thinkdsp.CosSignal(4500)
duration = signal.period*5
segment = signal.make_wave(duration, framerate=framerate)
segment.plot()

signal = thinkdsp.CosSignal(5500)
segment = signal.make_wave(duration, framerate=framerate)
segment.plot()
\end{verbatim}

\begin{figure}
% aliasing.py
\centerline{\includegraphics[height=3.5in]{figs/aliasing1.pdf}}
\caption{Cosine signals at 4500 and 5500 Hz, sampled at 10,000 frames
per second. The signals are different, but the samples are identical.}
\label{fig.aliasing1}
\end{figure}

Figure~\ref{fig.aliasing1} shows the result. I plotted the Signals
with thin gray lines and the samples using vertical lines,
to make it easier to compare the
two Waves. The problem
should be clear: even though the Signals are different, the
Waves are identical!

When we sample a 5500 Hz signal at 10,000 frames per second, the
result is indistinguishable from a 4500 Hz signal.
For the same reason, a 7700 Hz signal is indistinguishable
from a 2300 Hz signal, and a 9900 Hz signal is indistinguishable
from a 100 Hz signal.

This effect is called {\bf aliasing} because when the high frequency
signal is sampled, it appears to be a low frequency signal.

In this example, the highest frequency we can measure is 5000 Hz,
which is half the sampling rate. Frequencies above 5000 Hz are folded
back below 5000 Hz, which is why this threshold is sometimes called
the ``folding frequency''. It is sometimes also called the {\bf
Nyquist frequency}. See
\url{http://en.wikipedia.org/wiki/Nyquist_frequency}.

The folding pattern continues if the aliased frequency goes below
zero. For example, the 5th harmonic of the 1100 Hz triangle wave is
at 12,100 Hz. Folded at 5000 Hz, it would appear at -2100 Hz, but it
gets folded again at 0 Hz, back to 2100 Hz. In fact, you can see a
small peak at 2100 Hz in Figure~\ref{fig.triangle.1100.2}, and the next
one at 4300 Hz.
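The folding rule can be captured in a small helper function (a sketch;
{\tt alias} is not part of thinkdsp):

```python
def alias(freq, framerate=10000):
    """Fold a frequency into the measurable range 0 to framerate/2."""
    freq = freq % framerate            # aliases repeat every framerate Hz
    folding = framerate / 2
    if freq > folding:
        freq = framerate - freq        # fold back below the folding frequency
    return freq

# The harmonics of the 1100 Hz triangle wave land where the figure shows
peaks = [alias(h) for h in [1100, 3300, 5500, 7700, 9900, 12100]]
```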
\section{Computing the spectrum}

We have seen the Wave method \verb"make_spectrum" several times.
Here is the implementation (leaving out some details we'll get
to later):

\begin{verbatim}
from numpy.fft import rfft, rfftfreq

# class Wave:
    def make_spectrum(self):
        n = len(self.ys)
        d = 1 / self.framerate

        hs = rfft(self.ys)
        fs = rfftfreq(n, d)

        return Spectrum(hs, fs, self.framerate)
\end{verbatim}
The parameter {\tt self} is a Wave object. {\tt n} is the number
of samples in the wave, and {\tt d} is the inverse of the
frame rate, which is the time between samples.

{\tt np.fft} is the NumPy module that provides functions related
to the {\bf Fast Fourier Transform} (FFT), which is an efficient
algorithm that computes the Discrete Fourier Transform (DFT).

%TODO: add a forward reference to full fft
\verb"make_spectrum" uses {\tt rfft}, which stands for ``real
FFT'', because the Wave contains real values, not complex. Later
we'll see the full FFT, which can handle complex signals. The result
of {\tt rfft}, which I call {\tt hs}, is a NumPy array of complex
numbers that represents the amplitude and phase offset of each
frequency component in the wave.

The result of {\tt rfftfreq}, which I call {\tt fs}, is an array that
contains frequencies corresponding to the {\tt hs}.

To understand the values in {\tt hs}, consider these two ways to think
about complex numbers:

\begin{itemize}

\item A complex number is the sum of a real part and an imaginary
part, often written $x + iy$, where $i$ is the imaginary unit,
$\sqrt{-1}$. You can think of $x$ and $y$ as Cartesian coordinates.

\item A complex number is also the product of a magnitude and a
complex exponential, $A e^{i \phi}$, where $A$ is the {\bf
magnitude} and $\phi$ is the {\bf angle} in radians, also called
the ``argument''. You can think of $A$ and $\phi$ as polar
coordinates.

\end{itemize}

Each value in {\tt hs} corresponds to a frequency component: its
magnitude is proportional to the amplitude of the corresponding
component; its angle is the phase offset.

The Spectrum class provides two read-only properties, {\tt amps}
and {\tt angles}, which return NumPy arrays representing the
magnitudes and angles of the {\tt hs}. When we plot a Spectrum
object, we usually plot {\tt amps} versus {\tt fs}. Sometimes
it is also useful to plot {\tt angles} versus {\tt fs}.

Although it might be tempting to look at the real and imaginary
parts of {\tt hs}, you will almost never need to. I encourage
you to think of the DFT as a vector of amplitudes and phase offsets
that happen to be encoded in the form of complex numbers.
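The same quantities can be computed directly with NumPy: {\tt np.abs}
gives the magnitudes and {\tt np.angle} gives the phase angles. A
sketch, assuming a pure cosine at an exact analysis frequency:

```python
import numpy as np
from numpy.fft import rfft, rfftfreq

framerate = 10000
ts = np.arange(0, 0.5, 1 / framerate)
ys = 2.0 * np.cos(2 * np.pi * 200 * ts)   # amplitude 2, frequency 200 Hz

hs = rfft(ys)
fs = rfftfreq(len(ys), 1 / framerate)

amps = np.abs(hs)        # magnitudes, proportional to component amplitudes
angles = np.angle(hs)    # phase offsets in radians

peak_freq = fs[np.argmax(amps)]           # 200.0 Hz
```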
To modify a Spectrum, you can access the {\tt hs} directly.
For example:

\begin{verbatim}
spectrum.hs *= 2
spectrum.hs[spectrum.fs > cutoff] = 0
\end{verbatim}

The first line multiplies the elements of {\tt hs} by 2, which
doubles the amplitudes of all components. The second line
sets to 0 only the elements of {\tt hs} where the corresponding
frequency exceeds some cutoff frequency.

But Spectrum also provides methods to perform these operations:

\begin{verbatim}
spectrum.scale(2)
spectrum.low_pass(cutoff)
\end{verbatim}

You can read the documentation of these methods and others at
\url{http://greenteapress.com/thinkdsp.html}.

At this point you should have a better idea of how the Signal, Wave,
and Spectrum classes work, but I have not explained how the Fast
Fourier Transform works. That will take a few more chapters.
\section{Exercises}

Solutions to these exercises are in {\tt chap02soln.ipynb}.

\begin{exercise}
If you use Jupyter, load {\tt chap02.ipynb} and try out the examples.
You can also view the notebook at \url{http://tinyurl.com/thinkdsp02}.
\end{exercise}

\begin{exercise}
A sawtooth signal has a waveform that ramps up linearly from -1 to 1,
then drops to -1 and repeats. See
\url{http://en.wikipedia.org/wiki/Sawtooth_wave}.

Write a class called
{\tt SawtoothSignal} that extends {\tt Signal} and provides
{\tt evaluate} to evaluate a sawtooth signal.

Compute the spectrum of a sawtooth wave. How does the harmonic
structure compare to triangle and square waves?
\end{exercise}

\begin{exercise}
Make a square signal at 1100 Hz and make a wave that samples it
at 10,000 frames per second. If you plot the spectrum, you can
see that most of the harmonics are aliased.
When you listen to the wave, can you hear the aliased harmonics?
\end{exercise}

\begin{exercise}
If you have a spectrum object, {\tt spectrum}, and print the
first few values of {\tt spectrum.fs}, you'll see that they
start at zero. So {\tt spectrum.hs[0]} is the magnitude
of the component with frequency 0. But what does that mean?

Try this experiment:

\begin{enumerate}

\item Make a triangle signal with frequency 440 and make
a Wave with duration 0.01 seconds. Plot the waveform.

\item Make a Spectrum object and print {\tt spectrum.hs[0]}.
What is the amplitude and phase of this component?

\item Set {\tt spectrum.hs[0] = 100}. Make a Wave from the
modified Spectrum and plot it. What effect does this operation
have on the waveform?

\end{enumerate}

\end{exercise}

\begin{exercise}
Write a function that takes a Spectrum as a parameter and modifies
it by dividing each element of {\tt hs} by the corresponding frequency
from {\tt fs}. Hint: since division by zero is undefined, you
might want to set {\tt spectrum.hs[0] = 0}.

Test your function using a square, triangle, or sawtooth wave.

\begin{enumerate}

\item Compute the Spectrum and plot it.

\item Modify the Spectrum using your function and plot it again.

\item Make a Wave from the modified Spectrum and listen to it. What
effect does this operation have on the signal?

\end{enumerate}

\end{exercise}

\begin{exercise}
Triangle and square waves have odd harmonics only; the sawtooth
wave has both even and odd harmonics. The harmonics of the
square and sawtooth waves drop off in proportion to $1/f$; the
harmonics of the triangle wave drop off like $1/f^2$. Can you
find a waveform that has even and odd harmonics that drop off
like $1/f^2$?

Hint: There are two ways you could approach this: you could
construct the signal you want by adding up sinusoids, or you
could start with a signal that is similar to what you want and
modify it.
\end{exercise}
\chapter{Non-periodic signals}
\label{nonperiodic}

The signals we have worked with so far are periodic, which means
that they repeat forever. It also means that the frequency
components they contain do not change over time.
In this chapter, we consider non-periodic signals,
whose frequency components {\em do} change over time.
In other words, pretty much all sound signals.

This chapter also presents spectrograms, a common way to visualize
non-periodic signals.

The code for this chapter is in {\tt chap03.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp03}.
\section{Linear chirp}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=2.5in]{figs/chirp3.pdf}}
\caption{Chirp waveform near the beginning, middle, and end.}
\label{fig.chirp3}
\end{figure}

We'll start with a {\bf chirp}, which is a signal with variable
frequency. {\tt thinkdsp} provides a Signal called Chirp that
makes a sinusoid that sweeps linearly through a range of
frequencies.

Here's an example that sweeps from 220 to 880 Hz, which
is two octaves from A3 to A5:

\begin{verbatim}
signal = thinkdsp.Chirp(start=220, end=880)
wave = signal.make_wave()
\end{verbatim}

Figure~\ref{fig.chirp3} shows segments of this wave near the
beginning, middle, and end. It's clear that the frequency is
increasing.

Before we go on, let's see how Chirp is implemented. Here
is the class definition:

\begin{verbatim}
class Chirp(Signal):

    def __init__(self, start=440, end=880, amp=1.0):
        self.start = start
        self.end = end
        self.amp = amp
\end{verbatim}

{\tt start} and {\tt end} are the frequencies, in Hz, at the start
and end of the chirp. {\tt amp} is amplitude.

Here is the function that evaluates the signal:

\begin{verbatim}
def evaluate(self, ts):
    freqs = np.linspace(self.start, self.end, len(ts)-1)
    return self._evaluate(ts, freqs)
\end{verbatim}

{\tt ts} is the sequence of points in time where the signal should be
evaluated; to keep this function simple, I assume they are equally spaced.

If the length of {\tt ts} is $n$, you can think of it as a sequence of
$n-1$ intervals of time. To compute the frequency during each
interval, I use {\tt np.linspace}, which returns a NumPy array of
$n-1$ values between {\tt start} and {\tt end}.

\verb"_evaluate" is a private method
that does the rest of the math\footnote{Beginning a method name
with an underscore makes it ``private'', indicating that it is not
part of the API that should be used outside the class definition.}:
\begin{verbatim}
def _evaluate(self, ts, freqs):
    dts = np.diff(ts)
    dphis = PI2 * freqs * dts
    phases = np.cumsum(dphis)
    phases = np.insert(phases, 0, 0)
    ys = self.amp * np.cos(phases)
    return ys
\end{verbatim}

{\tt np.diff} computes the difference between adjacent elements
of {\tt ts}, returning the length of each interval in seconds.
If the elements of {\tt ts} are equally spaced,
the {\tt dts} are all the same.

The next step is to figure out how much the phase changes during
each interval. In Section~\ref{sigobs} we saw that when frequency is
constant, the phase, $\phi$, increases linearly over time:
%
\[ \phi = 2 \pi f t \]
%
When frequency is a function of time, the {\em change} in phase
during a short time interval, $\Delta t$, is:
%
\[ \Delta \phi = 2 \pi f(t) \Delta t \]
%
In Python, since {\tt freqs} contains $f(t)$ and {\tt dts}
contains the time intervals, we can write

\begin{verbatim}
dphis = PI2 * freqs * dts
\end{verbatim}

Now, since {\tt dphis} contains the changes in phase, we can
get the total phase at each timestep by adding up the changes:

\begin{verbatim}
phases = np.cumsum(dphis)
phases = np.insert(phases, 0, 0)
\end{verbatim}

{\tt np.cumsum} computes the cumulative sum, which is almost
what we want, but it doesn't start at 0. So I use {\tt np.insert}
to add a 0 at the beginning.

The result is a NumPy array where the {\tt i}th element contains the
sum of the first {\tt i} terms from {\tt dphis}; that is, the total
phase at the end of the {\tt i}th interval. Finally, {\tt np.cos}
computes the amplitude of the wave as a function of phase (remember
that phase is expressed in radians).

If you know calculus, you might notice that the limit as
$\Delta t$ gets small is
%
\[ d\phi = 2 \pi f(t) dt \]
%
Dividing through by $dt$ yields
%
\[ \frac{d\phi}{dt} = 2 \pi f(t) \]
%
In other words, frequency is the derivative of phase. Conversely,
phase is the integral of frequency. When we used {\tt cumsum}
to go from frequency to phase, we were approximating integration.
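For a linear sweep, we can compare the cumsum approximation with the
exact integral (a sketch, not thinkdsp code; {\tt exact} comes from
integrating $2 \pi f(t)$ by hand):

```python
import numpy as np

PI2 = 2 * np.pi
f0, f1, T = 220.0, 440.0, 1.0
framerate = 10000

ts = np.arange(0, T, 1 / framerate)
freqs = f0 + (f1 - f0) * ts[:-1] / T       # frequency during each interval

# Numerical integration with cumsum, as in _evaluate
dts = np.diff(ts)
dphis = PI2 * freqs * dts
phases = np.insert(np.cumsum(dphis), 0, 0)

# Closed-form integral: phi(t) = 2*pi*(f0*t + (f1-f0)*t**2 / (2*T))
exact = PI2 * (f0 * ts + (f1 - f0) * ts**2 / (2 * T))

max_err = np.abs(phases - exact).max()     # small compared to the total phase
```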
\section{Exponential chirp}

When you listen to this chirp, you might notice that the pitch
rises quickly at first and then slows down.
The chirp spans two octaves, but it only takes 2/3 s to span
the first octave, and twice as long to span the second.

The reason is that our perception of pitch depends on the logarithm of
frequency. As a result, the {\bf interval} we hear between two notes
depends on the {\em ratio} of their frequencies, not the difference.
``Interval'' is the musical term for the perceived difference between
two pitches.

For example, an octave is an interval where the ratio of two
pitches is 2. So the interval from 220 to 440 is one octave
and the interval from 440 to 880 is also one octave. The difference
in frequency is bigger, but the ratio is the same.

As a result, if frequency increases linearly, as in a linear
chirp, the perceived pitch increases logarithmically.

If you want the perceived pitch to increase linearly, the frequency
has to increase exponentially. A signal with that shape is called
an {\bf exponential chirp}.

Here's the code that makes one:

\begin{verbatim}
class ExpoChirp(Chirp):

    def evaluate(self, ts):
        start, end = np.log10(self.start), np.log10(self.end)
        freqs = np.logspace(start, end, len(ts)-1)
        return self._evaluate(ts, freqs)
\end{verbatim}

Instead of {\tt np.linspace}, this version of evaluate uses
{\tt np.logspace}, which creates a series of frequencies
whose logarithms are equally spaced, which means that they increase
exponentially.
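You can see the difference directly: with {\tt np.logspace} the ratio
between successive frequencies is constant, so each step is the same
musical interval. A quick check (not thinkdsp code):

```python
import numpy as np

# Five frequencies spanning two octaves, from 220 to 880 Hz
start, end = np.log10(220), np.log10(880)
freqs = np.logspace(start, end, 5)

ratios = freqs[1:] / freqs[:-1]    # constant ratio: each step is a half octave
```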
That's it; everything else is the same as Chirp. Here's the code
that makes one:

\begin{verbatim}
signal = thinkdsp.ExpoChirp(start=220, end=880)
wave = signal.make_wave(duration=1)
\end{verbatim}

You can listen to these examples in {\tt chap03.ipynb} and compare
the linear and exponential chirps.
\section{Spectrum of a chirp}
\label{sauron}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=2.5in]{figs/chirp1.pdf}}
\caption{Spectrum of a one-second one-octave chirp.}
\label{fig.chirp1}
\end{figure}

What do you think happens if you compute the spectrum of a chirp?
Here's an example that constructs a one-second, one-octave chirp and
computes its spectrum:

\begin{verbatim}
signal = thinkdsp.Chirp(start=220, end=440)
wave = signal.make_wave(duration=1)
spectrum = wave.make_spectrum()
\end{verbatim}

Figure~\ref{fig.chirp1} shows the result. The spectrum has
components at every frequency from 220 to 440 Hz, with variations
that look a little like the Eye of Sauron
(see \url{http://en.wikipedia.org/wiki/Sauron}).

The spectrum is approximately flat between 220 and 440 Hz, which
indicates that the signal spends equal time at each frequency in this
range. Based on that observation, you should be able to guess what
the spectrum of an exponential chirp looks like.

The spectrum gives hints about the structure of the signal,
but it obscures the relationship between frequency and time.
For example, we cannot tell by looking at this spectrum whether
the frequency went up or down, or both.
\section{Spectrogram}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=2.5in]{figs/chirp2.pdf}}
\caption{Spectrogram of a one-second one-octave chirp.}
\label{fig.chirp2}
\end{figure}

To recover the relationship between frequency and time, we can break
the chirp into segments and plot the spectrum of each segment. The
result is called a {\bf short-time Fourier transform} (STFT).

There are several ways to visualize an STFT, but the most common
is a {\bf spectrogram}, which shows time on the x-axis and frequency
on the y-axis. Each column in the spectrogram shows the spectrum of
a short segment, using color or grayscale to represent amplitude.

As an example, I'll compute the spectrogram of this chirp:

\begin{verbatim}
signal = thinkdsp.Chirp(start=220, end=440)
wave = signal.make_wave(duration=1, framerate=11025)
\end{verbatim}

{\tt Wave} provides \verb"make_spectrogram", which returns a
{\tt Spectrogram} object:

\begin{verbatim}
spectrogram = wave.make_spectrogram(seg_length=512)
spectrogram.plot(high=700)
\end{verbatim}

\verb"seg_length" is the number of samples in each segment. I chose
512 because FFT is most efficient when the number of samples is a
power of 2.

Figure~\ref{fig.chirp2} shows the result. The x-axis shows time from
0 to 1 seconds. The y-axis shows frequency from 0 to 700 Hz. I cut
off the top part of the spectrogram; the full range goes to 5512.5 Hz,
which is half of the framerate.

The spectrogram shows clearly that frequency increases linearly
over time. Similarly, in the spectrogram of an exponential chirp, we
can see the shape of the exponential curve.

However, notice that the peak in each column is blurred across 2--3
cells. This blurring reflects the limited resolution of the
spectrogram.
\section{The Gabor limit}
\label{gabor}

The {\bf time resolution} of the spectrogram is the duration of the
segments, which corresponds to the width of the cells in the
spectrogram. Since each segment is 512 frames, and there are 11,025
frames per second, the duration of each segment is about 0.046 seconds.

The {\bf frequency resolution} is the frequency range between
elements in the spectrum, which corresponds to the height of the
cells. With 512 frames, we get 256 frequency components over a range
from 0 to 5512.5 Hz, so the range between components is about 21.5 Hz.

More generally, if $n$ is the segment length, the spectrum contains
$n/2$ components. If the framerate is $r$, the maximum frequency in
the spectrum is $r/2$. So the time resolution is $n/r$ and the
frequency resolution is
%
\[ \frac{r/2}{n/2} \]
%
which is $r/n$.

Ideally we would like time resolution to be small, so we can see rapid
changes in frequency. And we would like frequency resolution to be
small so we can see small changes in frequency. But you can't have
both. Notice that time resolution, $n/r$, is the inverse of frequency
resolution, $r/n$. So if one gets smaller, the other gets bigger.

For example, if you double the segment length, you cut frequency
resolution in half (which is good), but you double time resolution
(which is bad). Even increasing the framerate doesn't help. You get
more samples, but the range of frequencies increases at
the same time.

This tradeoff is called the {\bf Gabor limit} and it is a fundamental
limitation of this kind of time-frequency analysis.
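
The tradeoff is easy to confirm numerically: for any segment length,
the product of time resolution and frequency resolution is exactly 1.

```python
# Time and frequency resolution for a few segment lengths at a
# fixed framerate: their product is always 1, so improving one
# necessarily worsens the other.
framerate = 11025
for seg_length in [256, 512, 1024]:
    time_res = seg_length / framerate    # seconds per column
    freq_res = framerate / seg_length    # Hz per row
    print(seg_length, round(time_res, 4), round(freq_res, 1))
```

With \verb"seg_length=512", this reproduces the numbers above:
about 0.046 s and 21.5 Hz.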


\section{Leakage}

\begin{figure}
% chirp.py
\centerline{\includegraphics[width=5.5in]{figs/windowing1.pdf}}
\caption{Spectrum of a periodic segment of a sinusoid (left), a
non-periodic segment (middle), and a windowed non-periodic segment
(right).}
\label{fig.windowing1}
\end{figure}

In order to explain how \verb"make_spectrogram" works, I have
to explain windowing; and in order to explain windowing, I have to
show you the problem it is meant to address, which is leakage.

The Discrete Fourier Transform (DFT), which we use to compute
Spectrums, treats waves as if they are periodic; that is, it assumes
that the finite segment it operates on is a complete period from an
infinite signal that repeats over all time. In practice, this
assumption is often false, which creates problems.

One common problem is discontinuity at the beginning and end of the
segment. Because the DFT assumes that the signal is periodic, it
implicitly connects the end of the segment back to the beginning to
make a loop. If the end does not connect smoothly to the beginning,
the discontinuity creates additional frequency components in the
segment that are not in the signal.

As an example, let's start with a sine signal that contains only
one frequency component at 440 Hz.

\begin{verbatim}
signal = thinkdsp.SinSignal(freq=440)
\end{verbatim}

If we select a segment whose duration is an integer multiple of
the period, the end of the segment connects smoothly with the
beginning, and the DFT behaves well.

\begin{verbatim}
duration = signal.period * 30
wave = signal.make_wave(duration)
spectrum = wave.make_spectrum()
\end{verbatim}

Figure~\ref{fig.windowing1} (left) shows the result. As expected,
there is a single peak at 440 Hz.

But if the duration is not a multiple of the period, bad things
happen. With {\tt duration = signal.period * 30.25}, the signal
starts at 0 and ends at 1.

Figure~\ref{fig.windowing1} (middle) shows
the spectrum of this segment. Again, the peak is at 440 Hz, but now
there are additional components spread out from 240 to 640 Hz. This
spread is called {\bf spectral leakage}, because some of the energy that
is actually at the fundamental frequency leaks into other frequencies.

In this example, leakage happens because we are using the DFT on a segment
that becomes discontinuous when we treat it as periodic.
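
You can see the same effect with plain NumPy. This sketch (segment
length and bin counts are arbitrary choices of mine) compares a
segment containing a whole number of cycles with one that ends
mid-cycle, measuring how much of the power stays near the peak:

```python
import numpy as np

# Leakage demo: an integer number of cycles per segment keeps the
# DFT energy in one bin; a fractional number spreads it out.
n = 1024

def band_fraction(cycles):
    """Fraction of spectral power within one bin of the peak."""
    ts = np.arange(n) / n
    ys = np.sin(2 * np.pi * cycles * ts)
    power = np.abs(np.fft.rfft(ys)) ** 2
    k = int(round(cycles))
    return power[k-1:k+2].sum() / power.sum()

periodic = band_fraction(30)     # 30 whole cycles
leaky = band_fraction(30.25)     # segment ends mid-cycle
print(periodic, leaky)
```

The periodic segment keeps essentially all of its power in one bin;
the non-periodic segment leaks a noticeable fraction into the rest
of the spectrum.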


\section{Windowing}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=3.5in]{figs/windowing2.pdf}}
\caption{Segment of a sinusoid (top), Hamming window (middle), product
of the segment and the window (bottom).}
\label{fig.windowing2}
\end{figure}

We can reduce leakage by smoothing out the discontinuity between
the beginning and end of the segment, and one way to do that is
{\bf windowing}.

A ``window'' is a function designed to transform a non-periodic
segment into something that can pass for periodic.
Figure~\ref{fig.windowing2} (top) shows a segment where the end does not
connect smoothly to the beginning.

Figure~\ref{fig.windowing2} (middle) shows a ``Hamming window'', one of the
more common window functions. No window function is perfect, but some
can be shown to be optimal for different applications, and Hamming
is a good, all-purpose window.

Figure~\ref{fig.windowing2} (bottom) shows the result of multiplying the
window by the original signal. Where the window is close to 1, the
signal is unchanged. Where the window is close to 0, the signal is
attenuated. Because the window tapers at both ends, the end of the
segment connects smoothly to the beginning.

Figure~\ref{fig.windowing1} (right) shows the spectrum of the windowed
signal. Windowing has reduced leakage substantially, but not
completely.

Here's what the code looks like. {\tt Wave} provides {\tt window},
which applies a Hamming window:

\begin{verbatim}
#class Wave:
    def window(self, window):
        self.ys *= window
\end{verbatim}

And NumPy provides {\tt hamming}, which computes a Hamming window
with a given length:

\begin{verbatim}
window = np.hamming(len(wave))
wave.window(window)
\end{verbatim}

NumPy provides functions to compute other window
functions, including {\tt bartlett}, {\tt blackman}, {\tt hanning},
and {\tt kaiser}. One of the exercises at the end of this chapter
asks you to experiment with these other windows.
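
Here is a NumPy-only sketch of the effect (the 5-bin cutoff for
``far from the peak'' is an arbitrary choice of mine): window the
non-periodic segment from the leakage example and compare how much
power ends up far from the peak.

```python
import numpy as np

# Windowing demo: apply a Hamming window to a non-periodic segment
# and compare the power that leaks far from the peak.
n = 1024
ts = np.arange(n) / n
ys = np.sin(2 * np.pi * 30.25 * ts)   # 30.25 cycles: not periodic

def far_leakage(segment):
    """Power more than 5 bins from the peak, as a fraction of total."""
    power = np.abs(np.fft.rfft(segment)) ** 2
    k = np.argmax(power)
    mask = np.abs(np.arange(len(power)) - k) > 5
    return power[mask].sum() / power.sum()

plain = far_leakage(ys)
windowed = far_leakage(ys * np.hamming(n))
print(windowed < plain)
```

The windowed segment leaks far less power away from the peak,
matching Figure~\ref{fig.windowing1} (right).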


\section{Implementing spectrograms}

\begin{figure}
% chirp.py
\centerline{\includegraphics[height=2.5in]{figs/windowing3.pdf}}
\caption{Overlapping Hamming windows.}
\label{fig.windowing3}
\end{figure}

Now that we understand windowing, we can understand the
implementation of spectrograms.
Here is the {\tt Wave} method that computes spectrograms:

\begin{verbatim}
#class Wave:
    def make_spectrogram(self, seg_length):
        window = np.hamming(seg_length)
        i, j = 0, seg_length
        step = seg_length // 2

        spec_map = {}

        while j < len(self.ys):
            segment = self.slice(i, j)
            segment.window(window)

            t = (segment.start + segment.end) / 2
            spec_map[t] = segment.make_spectrum()

            i += step
            j += step

        return Spectrogram(spec_map, seg_length)
\end{verbatim}

This is the longest function in the book, so if you can handle
this, you can handle anything.

The parameter, {\tt self}, is a Wave object.
\verb"seg_length" is the number of samples in each segment.

{\tt window} is a Hamming window with the same length as the segments.

{\tt i} and {\tt j} are the slice indices that select segments from
the wave. {\tt step} is the offset between segments; the floor
division (\verb"//") keeps it an integer. Since {\tt
step} is half of \verb"seg_length", the segments overlap by half.
Figure~\ref{fig.windowing3} shows what these overlapping windows look
like.

\verb"spec_map" is a dictionary that maps from a timestamp to
a Spectrum.

Inside the while loop, we select a slice from the wave and apply
the window; then we construct a Spectrum
object and add it to \verb"spec_map". The nominal time of
each segment, {\tt t}, is the midpoint.

Then we advance {\tt i} and {\tt j}, and continue as long as {\tt j}
doesn't go past the end of the Wave.

Finally, the method constructs and returns a Spectrogram. Here
is the definition of Spectrogram:

\begin{verbatim}
class Spectrogram(object):
    def __init__(self, spec_map, seg_length):
        self.spec_map = spec_map
        self.seg_length = seg_length
\end{verbatim}

Like many init methods, this one just stores the
parameters as attributes.

{\tt Spectrogram} provides {\tt plot}, which generates a
pseudocolor plot with time along the x-axis and frequency along
the y-axis.

And that's how Spectrograms are implemented.
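
If you want to see the same bookkeeping without the Wave and Spectrum
classes, here is a plain-NumPy sketch; the function name
\verb"spectrogram_columns" is my own, not part of {\tt thinkdsp}.

```python
import numpy as np

def spectrogram_columns(ys, framerate, seg_length=512):
    """Map from segment midpoint time to magnitude spectrum,
    mirroring the structure of Wave.make_spectrogram."""
    window = np.hamming(seg_length)
    step = seg_length // 2          # segments overlap by half
    spec_map = {}
    i = 0
    while i + seg_length <= len(ys):
        segment = ys[i:i + seg_length] * window
        t = (i + seg_length / 2) / framerate   # midpoint time
        spec_map[t] = np.abs(np.fft.rfft(segment))
        i += step
    return spec_map

# For a pure 440 Hz tone, every column should peak near
# bin 440 * seg_length / framerate, which is about 20.
framerate = 11025
ts = np.arange(framerate // 2) / framerate
sp = spectrogram_columns(np.sin(2 * np.pi * 440 * ts), framerate)
print(sorted({int(np.argmax(col)) for col in sp.values()}))
```

For a constant-frequency input, every column peaks in the same place;
for a chirp, the peak climbs from column to column.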


\section{Exercises}

Solutions to these exercises are in {\tt chap03soln.ipynb}.

\begin{exercise}
Run and listen to the examples in {\tt chap03.ipynb}, which is
in the repository for this book, and also available at
\url{http://tinyurl.com/thinkdsp03}.

In the leakage example, try replacing the Hamming window with one of
the other windows provided by NumPy, and see what effect they have on
leakage. See
\url{http://docs.scipy.org/doc/numpy/reference/routines.window.html}.
\end{exercise}

\begin{exercise}
Write a class called {\tt SawtoothChirp} that extends {\tt Chirp}
and overrides {\tt evaluate} to generate a sawtooth waveform with
frequency that increases (or decreases) linearly.

Hint: combine the evaluate functions from {\tt Chirp} and
{\tt SawtoothSignal}.

Draw a sketch of what you think the spectrogram of this signal
looks like, and then plot it. The effect of aliasing should be
visually apparent, and if you listen carefully, you can hear it.
\end{exercise}

\begin{exercise}
Make a sawtooth chirp that sweeps from 2500 to 3000 Hz, then use it to
make a wave with duration 1 s and framerate 20 kHz. Draw a sketch of
what you think the spectrum will look like. Then plot the
spectrum and see if you got it right.
\end{exercise}

\begin{exercise}
In musical terminology, a ``glissando'' is a note that slides from one
pitch to another, so it is similar to a chirp.

Find or make a recording of a glissando and plot a spectrogram of the
first few seconds. One suggestion: George Gershwin's {\it Rhapsody in
Blue} starts with a famous clarinet glissando, which you can download
from \url{http://archive.org/details/rhapblue11924}.
\end{exercise}

\begin{exercise}
A trombone player can play a glissando by extending the trombone
slide while blowing continuously. As the slide extends, the total
length of the tube gets longer, and the resulting pitch is inversely
proportional to length.

Assuming that the player moves the slide at a constant speed, how
does frequency vary with time?

Write a class called {\tt TromboneGliss} that extends {\tt Chirp} and
provides {\tt evaluate}. Make a wave that simulates a trombone
glissando from C3 up to F3 and back down to C3. C3 is 262 Hz; F3 is
349 Hz.

Plot a spectrogram of the resulting wave. Is a trombone glissando
more like a linear or exponential chirp?
\end{exercise}

\begin{exercise}
Make or find a recording of a series of vowel sounds and look at the
spectrogram. Can you identify different vowels?
\end{exercise}


\chapter{Noise}

In English, ``noise'' means an unwanted or unpleasant sound. In the
context of signal processing, it has two different senses:

\begin{enumerate}

\item As in English, it can mean an unwanted signal of any kind. If
two signals interfere with each other, each signal would consider
the other to be noise.

\item ``Noise'' also refers to a signal that contains components at
many frequencies, so it lacks the harmonic structure of the periodic
signals we saw in previous chapters.

\end{enumerate}

This chapter is about the second kind.

The code for this chapter is in {\tt chap04.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp04}.


\section{Uncorrelated noise}

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/whitenoise0.pdf}}
\caption{Waveform of uncorrelated uniform noise.}
\label{fig.whitenoise0}
\end{figure}

The simplest way to understand noise is to generate it, and the
simplest kind to generate is uncorrelated uniform noise (UU noise).
``Uniform'' means the signal contains random values from a uniform
distribution; that is, every value in the range is equally likely.
``Uncorrelated'' means that the values are independent; that is,
knowing one value provides no information about the others.

Here's a class that represents UU noise:

\begin{verbatim}
class UncorrelatedUniformNoise(_Noise):

    def evaluate(self, ts):
        ys = np.random.uniform(-self.amp, self.amp, len(ts))
        return ys
\end{verbatim}

{\tt UncorrelatedUniformNoise} inherits from \verb"_Noise", which
inherits from {\tt Signal}.

As usual, the evaluate function takes {\tt ts}, the times when the
signal should be evaluated. It uses
{\tt np.random.uniform}, which generates values from a
uniform distribution. In this example, the values are in
the range from {\tt -amp} to {\tt amp}.

The following example generates UU noise with duration 0.5
seconds at 11,025 samples per second.

\begin{verbatim}
signal = thinkdsp.UncorrelatedUniformNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
\end{verbatim}

If you play this wave, it sounds like the static you hear if you tune
a radio between channels. Figure~\ref{fig.whitenoise0} shows what the
waveform looks like. As expected, it looks pretty random.

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/whitenoise1.pdf}}
\caption{Power spectrum of uncorrelated uniform noise.}
\label{fig.whitenoise1}
\end{figure}

Now let's take a look at the spectrum:

\begin{verbatim}
spectrum = wave.make_spectrum()
spectrum.plot_power()
\end{verbatim}

\verb"Spectrum.plot_power" is similar to \verb"Spectrum.plot",
except that it plots power instead of amplitude.
Power is the square of amplitude.
I am switching from amplitude to power in this chapter because
it is more conventional in the context of noise.

Figure~\ref{fig.whitenoise1} shows the result. Like the signal, the
spectrum looks pretty random. In fact, it {\em is} random, but we have to
be more precise about the word ``random''. There are at least three
things we might like to know about a noise signal or its spectrum:

\begin{itemize}

\item Distribution: The distribution of a random signal is the set of
possible values and their probabilities. For example, in the
uniform noise signal, the set of values is the range from -1 to 1,
and all values have the same probability. An alternative is
{\bf Gaussian noise}, where the set of values is the range from negative
to positive infinity, but values near 0 are the most likely, with
probability that drops off according to the Gaussian or
``bell'' curve.

\item Correlation: Is each value in the signal independent of the
others, or are there dependencies between them? In UU noise, the
values are independent.
An alternative is {\bf Brownian noise}, where each value is the sum
of the previous value and a random ``step''. So if the value of the
signal is high at a particular point in time, we expect it to stay
high, and if it is low, we expect
it to stay low.

\item Relationship between power and frequency: In the spectrum of UU
noise, the power at all frequencies is drawn from the same
distribution; that is, the average power is the same for all
frequencies. An alternative is {\bf pink noise}, where power is
inversely related to frequency; that is, the power at frequency $f$
is drawn from a distribution whose mean is proportional to $1/f$.

\end{itemize}


\section{Integrated spectrum}

For UU noise we can see the relationship between power and frequency
more clearly by looking at the {\bf integrated spectrum}, which
is a function of frequency, $f$, that shows the cumulative power in
the spectrum up to $f$.

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/whitenoise2.pdf}}
\caption{Integrated spectrum of uncorrelated uniform noise.}
\label{fig.whitenoise2}
\end{figure}

{\tt Spectrum} provides a method that computes the IntegratedSpectrum:

\begin{verbatim}
def make_integrated_spectrum(self):
    cs = np.cumsum(self.power)
    cs /= cs[-1]
    return IntegratedSpectrum(cs, self.fs)
\end{verbatim}

{\tt self.power} is a NumPy array containing power for each frequency.
{\tt np.cumsum} computes the cumulative sum of the powers.
Dividing through by the last element normalizes the integrated
spectrum so it runs from 0 to 1.

The result is an IntegratedSpectrum. Here is the class definition:

\begin{verbatim}
class IntegratedSpectrum(object):
    def __init__(self, cs, fs):
        self.cs = cs
        self.fs = fs
\end{verbatim}

Like Spectrum, IntegratedSpectrum provides \verb"plot_power", so
we can compute and plot the integrated spectrum like this:

\begin{verbatim}
integ = spectrum.make_integrated_spectrum()
integ.plot_power()
\end{verbatim}

The result, shown in Figure~\ref{fig.whitenoise2}, is a straight line,
which indicates that power at all frequencies is constant, on average.
Noise with equal power at all frequencies is called {\bf white noise}
by analogy with light, because an equal mixture of light at all
visible frequencies is white.
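
The same computation takes only a few lines with plain NumPy. This
sketch (seed and sample count are my own choices) checks the
straight-line claim: halfway through the frequency range we should
have accumulated about half the power.

```python
import numpy as np

# Integrated spectrum of white noise: the normalized cumulative
# power should climb roughly in a straight line from 0 to 1.
rng = np.random.default_rng(17)
ys = rng.uniform(-1, 1, 11025)
power = np.abs(np.fft.rfft(ys)) ** 2

cs = np.cumsum(power)
cs /= cs[-1]               # normalize so the last value is 1

print(round(cs[len(cs) // 2], 2))   # close to 0.5 for white noise
```

For pink or Brownian noise, the same curve would bend sharply upward
at low frequencies instead.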


\section{Brownian noise}
\label{brownian}

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/rednoise0.pdf}}
\caption{Waveform of Brownian noise.}
\label{fig.rednoise0}
\end{figure}

UU noise is uncorrelated, which means that each value does not depend
on the others. An alternative is {\bf Brownian noise}, in which each value
is the sum of the previous value and a random ``step''.

It is called ``Brownian'' by analogy with Brownian motion, in which a
particle suspended in a fluid moves apparently at random, due to
unseen interactions with the fluid. Brownian motion is often
described using a {\bf random walk}, which is a mathematical model
of a path where the distance between steps is characterized by a
random distribution.

In a one-dimensional random walk, the particle moves up or down
by a random amount at each time step. The location of the particle
at any point in time is the sum of all previous steps.

This observation suggests a way to generate Brownian noise:
generate uncorrelated random steps and then add them up.
Here is a class definition that implements this algorithm:

\begin{verbatim}
class BrownianNoise(_Noise):

    def evaluate(self, ts):
        dys = np.random.uniform(-1, 1, len(ts))
        ys = np.cumsum(dys)
        ys = normalize(unbias(ys), self.amp)
        return ys
\end{verbatim}

{\tt evaluate} uses {\tt np.random.uniform} to generate an
uncorrelated signal and {\tt np.cumsum} to compute its cumulative
sum.

Since the sum is likely to escape the range from -1 to
1, we have to use {\tt unbias} to shift the mean to 0, and {\tt
normalize} to get the desired maximum amplitude.

Here's the code that generates a BrownianNoise object and plots the
waveform.

\begin{verbatim}
signal = thinkdsp.BrownianNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
wave.plot()
\end{verbatim}

Figure~\ref{fig.rednoise0} shows the result. The waveform
wanders up and down, but there is a clear correlation between
successive values. When the amplitude is high, it tends to stay
high, and vice versa.

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/rednoise3.pdf}}
\caption{Spectrum of Brownian noise on a linear scale (left) and
log-log scale (right).}
\label{fig.rednoise3}
\end{figure}

If you plot the spectrum of Brownian noise on a linear scale, as
in Figure~\ref{fig.rednoise3} (left), it
doesn't look like much. Nearly all of the power is at the lowest
frequencies; the higher frequency components are not visible.

To see the shape of the spectrum more clearly, we can plot power
and frequency on a log-log scale. Here's the code:

\begin{verbatim}
spectrum = wave.make_spectrum()
spectrum.plot_power(linewidth=1, alpha=0.5)
thinkplot.config(xscale='log', yscale='log')
\end{verbatim}

The result is in Figure~\ref{fig.rednoise3} (right). The relationship between
power and frequency is noisy, but roughly linear.

{\tt Spectrum} provides \verb"estimate_slope", which uses SciPy to compute
a least squares fit to the power spectrum:

\begin{verbatim}
#class Spectrum

    def estimate_slope(self):
        x = np.log(self.fs[1:])
        y = np.log(self.power[1:])
        t = scipy.stats.linregress(x, y)
        return t
\end{verbatim}

It discards the first component of the spectrum because
this component corresponds to $f=0$, and $\log 0$ is undefined.

\verb"estimate_slope" returns the result from {\tt
scipy.stats.linregress}, which is an object that contains the
estimated slope and intercept, the correlation coefficient ($r$),
the p-value, and the standard error. For our purposes, we only need
the slope.

For Brownian noise, the slope of the power spectrum is -2 (we'll see
why in Chapter~\ref{diffint}), so we can write this relationship:
%
\[ \log P = k - 2 \log f \]
%
where $P$ is power, $f$ is frequency, and $k$ is the intercept
of the line, which is not important for our purposes.
Exponentiating both sides yields:
%
\[ P = K / f^{2} \]
%
where $K$ is $e^k$, but still not important. More relevant is
that power is proportional to $1/f^2$, which is characteristic
of Brownian noise.

Brownian noise is also called {\bf red noise}, for the same reason that
white noise is called ``white''. If you combine visible light with
power proportional to $1/f^2$, most of the power
would be at the low-frequency end of the spectrum, which is red.
Brownian noise is also sometimes called ``brown noise'', but I think
that's confusing, so I won't use it.
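
Here is a self-contained sketch of the slope estimate, with
{\tt np.polyfit} standing in for {\tt scipy.stats.linregress} (for a
log-log fit both give the same slope); the seed and sample count are
my own choices:

```python
import numpy as np

# Estimate the power-spectrum slope of synthetic Brownian noise.
rng = np.random.default_rng(0)
ys = np.cumsum(rng.uniform(-1, 1, 16384))
ys -= ys.mean()                      # unbias

power = np.abs(np.fft.rfft(ys)) ** 2
# Bin indices work as frequencies here: rescaling f only shifts
# the intercept of the log-log fit, not its slope.
fs = np.arange(len(power))

# Skip f=0, where log is undefined, and fit a line in log-log space.
slope, intercept = np.polyfit(np.log(fs[1:]), np.log(power[1:]), 1)
print(round(slope, 1))
```

The estimated slope comes out near -2, as claimed, though it varies
from run to run with the random steps.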

%TODO: refer back to this in the diff/int chapter


\section{Pink Noise}
\label{pink}

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/pinknoise0.pdf}}
\caption{Waveform of pink noise with $\beta=1$.}
\label{fig.pinknoise0}
\end{figure}

For red noise, the relationship between frequency
and power is
%
\[ P = K / f^{2} \]
%
There is nothing special about the exponent 2. More generally,
we can synthesize noise with any exponent, $\beta$:
%
\[ P = K / f^{\beta} \]
%
When $\beta = 0$, power is constant at all frequencies,
so the result is white noise. When $\beta=2$, the result is red noise.

When $\beta$ is between 0 and 2, the result is between white and
red noise, so it is called {\bf pink noise}.

There are several ways to generate pink noise. The simplest is to
generate white noise and then apply a low-pass filter with the
desired exponent. {\tt thinkdsp} provides a class that represents
a pink noise signal:

\begin{verbatim}
class PinkNoise(_Noise):

    def __init__(self, amp=1.0, beta=1.0):
        self.amp = amp
        self.beta = beta
\end{verbatim}

{\tt amp} is the desired amplitude of the signal.
{\tt beta} is the desired exponent. {\tt PinkNoise} provides
\verb"make_wave", which generates a Wave.

\begin{verbatim}
def make_wave(self, duration=1, start=0, framerate=11025):
    signal = UncorrelatedUniformNoise()
    wave = signal.make_wave(duration, start, framerate)
    spectrum = wave.make_spectrum()

    spectrum.pink_filter(beta=self.beta)

    wave2 = spectrum.make_wave()
    wave2.unbias()
    wave2.normalize(self.amp)
    return wave2
\end{verbatim}

{\tt duration} is the length of the wave in seconds. {\tt start} is
the start time of the wave; it is included so that \verb"make_wave"
has the same interface for all types of signal, but for random noise,
start time is irrelevant. And {\tt framerate} is the number of
samples per second.

\begin{figure}
% noise.py
\centerline{\includegraphics[height=2.5in]{figs/noise-triple.pdf}}
\caption{Spectrum of white, pink, and red noise on a log-log scale.}
\label{fig.noise-triple}
\end{figure}

\verb"make_wave" creates a white noise wave, computes its spectrum,
applies a filter with the desired exponent, and then converts the
filtered spectrum back to a wave. Then it unbiases and normalizes
the wave.

{\tt Spectrum} provides \verb"pink_filter":

\begin{verbatim}
def pink_filter(self, beta=1.0):
    denom = self.fs ** (beta/2.0)
    denom[0] = 1
    self.hs /= denom
\end{verbatim}

\verb"pink_filter" divides each element of the spectrum by
$f^{\beta/2}$. Since power is the square of amplitude, this
operation divides the power at
each component by $f^\beta$. It treats the component
at $f=0$ as a special case, partly to avoid dividing by 0, and partly
because this element represents the bias of the signal,
which we are going to set to 0 anyway.
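
To see the arithmetic concretely, here is the same filter applied to
a deliberately flat spectrum of ones (a stand-in for white noise, so
the result is deterministic): with $\beta=1$, power falls off as
$1/f$, so a component at 100 Hz ends up with 10 times the power of
one at 1000 Hz.

```python
import numpy as np

# Effect of the pink filter on a flat spectrum: amplitude falls
# off as f**(-beta/2), so power falls off as f**(-beta).
framerate, n = 11025, 11025
fs = np.fft.rfftfreq(n, 1 / framerate)   # 0, 1, 2, ... Hz
hs = np.ones(len(fs), dtype=complex)     # flat (white) spectrum

beta = 1.0
denom = fs ** (beta / 2)
denom[0] = 1                # leave the f=0 bias component alone
hs /= denom

p100 = np.abs(hs[100]) ** 2     # power at 100 Hz
p1000 = np.abs(hs[1000]) ** 2   # power at 1000 Hz
print(round(p100 / p1000, 2))   # prints 10.0
```

Setting \verb"beta = 2.0" instead would make the ratio 100, matching
red noise.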
2522
2523
Figure~\ref{fig.pinknoise0} shows the resulting waveform. Like
2524
Brownian noise, it wanders up and down in a way that suggests
2525
correlation between successive values, but at least visually, it looks
2526
more random. In the next chapter we will come back to this
2527
observation and I will be more precise about what I mean by
2528
``correlation'' and ``more random''.
2529
2530
Finally, Figure~\ref{fig.noise-triple} shows a spectrum for
2531
white, pink, and red noise on the same log-log scale.
2532
The relationship between the exponent, $\beta$, and the slope
2533
of the spectrum is apparent in this figure.
2534
2535
2536
\section{Gaussian noise}
2537
2538
\begin{figure}
2539
% noise.py
2540
\centerline{\includegraphics[height=2.5in]{figs/noise1.pdf}}
2541
\caption{Normal probability plot for the real and imaginary parts
2542
of the spectrum of Gaussian noise.}
2543
\label{fig.noise1}
2544
\end{figure}
2545
2546
We started with uncorrelated uniform (UU) noise and showed that,
2547
because its spectrum has equal power at all frequencies, on
2548
average, UU noise is white.
2549
2550
But when people talk about ``white noise'', they don't always
2551
mean UU noise. In fact, more often they mean uncorrelated
2552
Gaussian (UG) noise.
2553
2554
{\tt thinkdsp} provides an implementation of UG noise:
2555
2556
\begin{verbatim}
2557
class UncorrelatedGaussianNoise(_Noise):
2558
2559
def evaluate(self, ts):
2560
ys = np.random.normal(0, self.amp, len(ts))
2561
return ys
2562
\end{verbatim}
2563
2564
{\tt np.random.normal} returns a NumPy array of values from a
2565
Gaussian distribution, in this case with mean 0 and standard deviation
2566
{\tt self.amp}. In theory the range of values is from negative to
2567
positive infinity, but we expect about 99\% of the values to be
2568
between -3 and 3.
2569
2570
UG noise is similar in many ways to UU noise. The spectrum has
2571
equal power at all frequencies, on average, so UG is also white.
2572
And it has one other interesting property: the spectrum of UG
2573
noise is also UG noise. More precisely, the real and imaginary
2574
parts of the spectrum are uncorrelated Gaussian values.
2575
2576
To test that claim, we can generate the spectrum of UG noise and
2577
then generate a ``normal probability plot'', which is a graphical
2578
way to test whether a distribution is Gaussian.
2579
2580
\begin{verbatim}
2581
signal = thinkdsp.UncorrelatedGaussianNoise()
2582
wave = signal.make_wave(duration=0.5, framerate=11025)
2583
spectrum = wave.make_spectrum()
2584
2585
thinkstats2.NormalProbabilityPlot(spectrum.real)
2586
thinkstats2.NormalProbabilityPlot(spectrum.imag)
2587
\end{verbatim}
2588
2589
{\tt NormalProbabilityPlot} is provided by {\tt thinkstats2}, which is
2590
included in the repository for this book. If you are not familiar
2591
with normal probability plots, you can read about them in Chapter 5
2592
of {\it Think Stats} at \url{http://thinkstats2.com}.

Figure~\ref{fig.noise1} shows the results. The gray lines
show a linear model fit to the data; the dark lines show the
data.

A straight line on a normal probability plot indicates
that the data come from a Gaussian distribution. Except for
some random variation at the extremes, these lines are straight,
which indicates that the spectrum of UG noise is UG noise.

The spectrum of UU noise is also UG noise, at least approximately. In
fact, by the Central Limit Theorem, the spectrum of almost any
uncorrelated noise is approximately Gaussian, as long as the
distribution has finite mean and standard deviation, and the number of
samples is large.
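That claim about UU noise is easy to check numerically. This sketch (my own, not from the book) generates uniform noise, takes its spectrum with {\tt np.fft.rfft}, and computes the skewness and excess kurtosis of the real and imaginary parts; for a Gaussian distribution both statistics are 0.

```python
import numpy as np
from scipy import stats

np.random.seed(18)

# One second of UU noise: uniform values in [-1, 1].
ys = np.random.uniform(-1, 1, 11025)
hs = np.fft.rfft(ys)

# Drop the DC and Nyquist bins, which are purely real.
real = hs.real[1:-1]
imag = hs.imag[1:-1]

# Both statistics are 0 for a Gaussian distribution.
print(stats.skew(real), stats.kurtosis(real))
print(stats.skew(imag), stats.kurtosis(imag))
```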


\section{Exercises}

Solutions to these exercises are in {\tt chap04soln.ipynb}.

\begin{exercise}
``A Soft Murmur'' is a web site that plays a mixture of natural
noise sources, including rain, waves, wind, etc. At
\url{http://asoftmurmur.com/about/} you can find their list
of recordings, most of which are at \url{http://freesound.org}.

Download a few of these files and compute the spectrum of each
signal. Does the power spectrum look like white noise, pink noise,
or Brownian noise? How does the spectrum vary over time?
\end{exercise}

\begin{exercise}
In a noise signal, the mixture of frequencies changes over time.
In the long run, we expect the power at all frequencies to be equal,
but in any sample, the power at each frequency is random.

To estimate the long-term average power at each frequency, we can
break a long signal into segments, compute the power spectrum
for each segment, and then compute the average across
the segments. You can read more about this algorithm at
\url{http://en.wikipedia.org/wiki/Bartlett's_method}.

Implement Bartlett's method and use it to estimate the power
spectrum for a noise wave. Hint: look at the implementation
of \verb"make_spectrogram".
\end{exercise}

\begin{exercise}
At \url{http://www.coindesk.com} you can download the daily
price of a BitCoin as a CSV file. Read this file and compute
the spectrum of BitCoin prices as a function of time.
Does it resemble white, pink, or Brownian noise?
\end{exercise}

\begin{exercise}
A Geiger counter is a device that detects radiation.
When an ionizing particle strikes the detector, it outputs a surge of
current. The total output at a point in time can be modeled as
uncorrelated Poisson (UP) noise, where each sample is
a random quantity from a Poisson distribution, which corresponds to the
number of particles detected during an interval.

Write a class called {\tt UncorrelatedPoissonNoise} that inherits
from \verb"thinkdsp._Noise" and provides {\tt evaluate}. It should
use {\tt np.random.poisson} to generate random values from a Poisson
distribution. The parameter of this function, {\tt lam}, is the
average number of particles during each interval. You can use the
attribute {\tt amp} to specify {\tt lam}. For example, if the
framerate is 10 kHz and {\tt amp} is 0.001, we expect about 10
``clicks'' per second.

Generate about a second of UP noise and listen to it. For low values
of {\tt amp}, like 0.001, it should sound like a Geiger counter. For
higher values it should sound like white noise. Compute and plot the
power spectrum to see whether it looks like white noise.
\end{exercise}

\begin{exercise}
The algorithm in this chapter for generating pink noise is
conceptually simple but computationally expensive. There are
more efficient alternatives, like the Voss-McCartney algorithm.
Research this method, implement it, compute the spectrum of
the result, and confirm that it has the desired relationship
between power and frequency.
\end{exercise}


\chapter{Autocorrelation}

In the previous chapter I characterized white noise as ``uncorrelated'',
which means that each value is independent of the others, and Brownian
noise as ``correlated'', because each value depends on the preceding
value. In this chapter I define these terms more precisely and
present the {\bf autocorrelation function}, which is a useful tool
for signal analysis.

The code for this chapter is in {\tt chap05.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp05}.


\section{Correlation}

In general, correlation between variables means that if you know the
value of one, you have some information about the other. There are
several ways to quantify correlation, but the most common is the
Pearson product-moment correlation coefficient, usually denoted
$\rho$. For two variables, $x$ and $y$, that each contain $N$ values:
%
\[ \rho = \frac{ \sum_i (x_i - \mu_x) (y_i - \mu_y)}{N \sigma_x \sigma_y} \]
%
where $\mu_x$ and $\mu_y$ are the means of $x$ and $y$, and
$\sigma_x$ and $\sigma_y$ are their standard deviations.
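Written as code, the formula is a one-liner; here is a sketch (with made-up data) that checks it against {\tt np.corrcoef}:

```python
import numpy as np

def pearson(xs, ys):
    # Direct translation of the formula: covariance divided by
    # the product of standard deviations, dividing by N throughout.
    xs = np.asarray(xs)
    ys = np.asarray(ys)
    cov = np.mean((xs - xs.mean()) * (ys - ys.mean()))
    return cov / (xs.std() * ys.std())

xs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
ys = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

print(pearson(xs, ys))
print(np.corrcoef(xs, ys)[0, 1])   # should agree
```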

Pearson's correlation is always between -1 and +1 (including both).
If $\rho$ is positive, we say that the correlation is positive,
which means that when one variable is high, the other tends to be
high. If $\rho$ is negative, the correlation is negative, so
when one variable is high, the other tends to be low.

The magnitude of $\rho$ indicates the strength of the correlation. If
$\rho$ is 1 or -1, the variables are perfectly correlated, which means
that if you know one, you can make a perfect prediction about the
other. If $\rho$ is near zero, the correlation is probably weak, so
if you know one, it doesn't tell you much about the other.

I say ``probably weak'' because it is also possible that there is
a nonlinear relationship that is not captured by the coefficient
of correlation. Nonlinear relationships are often important in
statistics, but less often relevant for signal processing, so I
won't say more about them here.

Python provides several ways to compute correlations. {\tt
np.corrcoef} takes any number of variables and computes a {\bf
correlation matrix} that includes correlations between each pair of
variables.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr1.pdf}}
\caption{Two sine waves that differ by a phase offset of 1 radian;
their coefficient of correlation is 0.54.}
\label{fig.autocorr1}
\end{figure}

I'll present an example with only two variables. First, I define
a function that constructs sine waves with different phase offsets:

\begin{verbatim}
def make_sine(offset):
    signal = thinkdsp.SinSignal(freq=440, offset=offset)
    wave = signal.make_wave(duration=0.5, framerate=10000)
    return wave
\end{verbatim}

Next I instantiate two waves with different offsets:

\begin{verbatim}
wave1 = make_sine(offset=0)
wave2 = make_sine(offset=1)
\end{verbatim}

Figure~\ref{fig.autocorr1} shows what the first few periods of these
waves look like. When one wave is high, the other is usually high, so we
expect them to be correlated.

\begin{verbatim}
>>> corr_matrix = np.corrcoef(wave1.ys, wave2.ys, ddof=0)
[[ 1.    0.54]
 [ 0.54  1.  ]]
\end{verbatim}

The option {\tt ddof=0} indicates that {\tt corrcoef} should divide by
$N$, as in the equation above, rather than use the default, $N-1$.

The result is a correlation matrix:
the first element is the correlation of {\tt wave1}
with itself, which is always 1. Similarly, the last element
is the correlation of {\tt wave2} with itself.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr2.pdf}}
\caption{The correlation of two sine waves as a function of the
phase offset between them. The result is a cosine.}
\label{fig.autocorr2}
\end{figure}

The off-diagonal elements contain the value we're interested in,
the correlation of {\tt wave1} and {\tt wave2}. The value 0.54
indicates that the strength of the correlation is moderate.

As the phase offset increases, this correlation decreases until
the waves are 180 degrees out of phase, which yields correlation
-1. Then it increases until the offset is 360 degrees.
At that point we have come full circle and the correlation is 1.

Figure~\ref{fig.autocorr2} shows the relationship between correlation and
phase offset for a sine wave. The shape of that curve should look
familiar; it is a cosine.

{\tt thinkdsp} provides a simple interface for computing the
correlation between waves:

\begin{verbatim}
>>> wave1.corr(wave2)
0.54
\end{verbatim}


\section{Serial correlation}

Signals often represent measurements of quantities that vary in
time. For example, the sound signals we've worked with represent
measurements of voltage (or current), which correspond to the changes
in air pressure we perceive as sound.

Measurements like this almost always have serial correlation, which
is the correlation between each element and the next (or the previous).
To compute serial correlation, we can shift a signal and then compute
the correlation of the shifted version with the original.

\begin{verbatim}
def serial_corr(wave, lag=1):
    n = len(wave)
    y1 = wave.ys[lag:]
    y2 = wave.ys[:n-lag]
    corr = np.corrcoef(y1, y2, ddof=0)[0, 1]
    return corr
\end{verbatim}

\verb"serial_corr" takes a Wave object and
{\tt lag}, which is the integer number of places to shift the wave.
It computes the correlation of the wave with a shifted version
of itself.
We can test this function with the noise signals from the previous
chapter. We expect UG noise to be uncorrelated, based on the
way it's generated (not to mention the name):

\begin{verbatim}
signal = thinkdsp.UncorrelatedGaussianNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
serial_corr(wave)
\end{verbatim}

When I ran this example, I got 0.006, which indicates a very
small serial correlation. You might get a different value when you run
it, but it should be comparably small.

In a Brownian noise signal, each value is the sum of the previous
value and a random ``step'', so we expect a strong serial
correlation:

\begin{verbatim}
signal = thinkdsp.BrownianNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
serial_corr(wave)
\end{verbatim}

Sure enough, the result I got is greater than 0.999.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr3.pdf}}
\caption{Serial correlation for pink noise with a range of
parameters.}
\label{fig.autocorr3}
\end{figure}

Since pink noise is in some sense between Brownian noise and UU noise,
we might expect an intermediate correlation:

\begin{verbatim}
signal = thinkdsp.PinkNoise(beta=1)
wave = signal.make_wave(duration=0.5, framerate=11025)
serial_corr(wave)
\end{verbatim}

With parameter $\beta=1$, I got a serial correlation of 0.851.
As we vary the parameter from $\beta=0$, which is uncorrelated
noise, to $\beta=2$, which is Brownian, serial correlation
ranges from 0 to almost 1, as shown in Figure~\ref{fig.autocorr3}.


\section{Autocorrelation}
\label{autopink}

In the previous section we computed the correlation between each
value and the next, so we shifted the elements of the array by 1.
But we can easily compute serial correlations with
different lags.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr4.pdf}}
\caption{Autocorrelation functions for pink noise with a range
of parameters.}
\label{fig.autocorr4}
\end{figure}

You can think of \verb"serial_corr" as a function that
maps from each value of lag to the corresponding correlation, and we
can evaluate that function by looping through values of {\tt lag}:

\begin{verbatim}
def autocorr(wave):
    lags = range(len(wave.ys)//2)
    corrs = [serial_corr(wave, lag) for lag in lags]
    return lags, corrs
\end{verbatim}

{\tt autocorr} takes a Wave object and returns the autocorrelation
function as a pair of sequences: {\tt lags} is a sequence of
integers from 0 to half the length of the wave; {\tt corrs}
is the sequence of serial correlations for each lag.

Figure~\ref{fig.autocorr4} shows autocorrelation functions for pink
noise with three values of $\beta$. For low values of $\beta$, the
signal is less correlated, and the autocorrelation function drops
off to zero quickly. For larger values, serial correlation
is stronger and drops off more slowly. With $\beta=1.7$ serial
correlation is strong even for long lags; this phenomenon is called
{\bf long-range dependence}, because it indicates that each value in
the signal depends on many preceding values.
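If you want to experiment without {\tt thinkdsp}, you can get the same qualitative behavior by shaping the spectrum of white noise so that power drops off like $1/f^\beta$. This is a sketch of the idea, not the book's {\tt PinkNoise} implementation:

```python
import numpy as np

def shaped_noise(beta, n=5512, seed=19):
    # Filter white Gaussian noise so that power ~ 1 / f**beta.
    rng = np.random.RandomState(seed)
    hs = np.fft.rfft(rng.normal(0, 1, n))
    fs = np.fft.rfftfreq(n)
    fs[0] = fs[1]                    # avoid dividing by zero at DC
    hs *= fs ** (-beta / 2)          # amplitude ~ f**(-beta/2)
    return np.fft.irfft(hs, n)

def serial_corr(ys, lag=1):
    # Array version of serial correlation for a given lag.
    return np.corrcoef(ys[lag:], ys[:-lag])[0, 1]

for beta in [0, 1, 2]:
    print(beta, serial_corr(shaped_noise(beta)))
```

As $\beta$ goes from 0 (white) to 2 (Brownian-like), the lag-1 serial correlation should climb from near 0 toward 1, mirroring Figure~\ref{fig.autocorr3}.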
2923
2924
2925
\section{Autocorrelation of periodic signals}
2926
2927
The autocorrelation of pink noise has interesting mathematical
2928
properties, but limited applications. The autocorrelation of
2929
periodic signals is more useful.
2930
2931
\begin{figure}
2932
% autocorr.py
2933
\centerline{\includegraphics[height=2.5in]{figs/autocorr5.pdf}}
2934
\caption{Spectrogram of a vocal chirp.}
2935
\label{fig.autocorr5}
2936
\end{figure}
2937
2938
As an example, I downloaded from \url{freesound.org} a recording of
2939
someone singing a chirp; the repository for this book includes the
2940
file: \url{28042__bcjordan__voicedownbew.wav}. You can use the
2941
Jupyter notebook for this chapter, {\tt chap05.ipynb}, to play it.
2942
2943
Figure~\ref{fig.autocorr5} shows the spectrogram of this wave.
2944
The fundamental frequency and some of the harmonics show up clearly.
2945
The chirp starts near 500 Hz and drops down to about 300 Hz, roughly
2946
from C5 to E4.
2947
2948
\begin{figure}
2949
% autocorr.py
2950
\centerline{\includegraphics[height=2.5in]{figs/autocorr6.pdf}}
2951
\caption{Spectrum of a segment from a vocal chirp.}
2952
\label{fig.autocorr6}
2953
\end{figure}
2954
2955
To estimate pitch at a particular point in time, we could use the
2956
spectrum, but it doesn't work very well. To see why not, I'll take
2957
a short segment from the wave and plot its spectrum:
2958
2959
\begin{verbatim}
2960
duration = 0.01
2961
segment = wave.segment(start=0.2, duration=duration)
2962
spectrum = segment.make_spectrum()
2963
spectrum.plot(high=1000)
2964
\end{verbatim}
2965
2966
This segment starts at 0.2 seconds and lasts 0.01 seconds.
2967
Figure~\ref{fig.autocorr6} shows its spectrum. There is a clear peak
2968
near 400 Hz, but it is hard to identify the pitch precisely. The
2969
length of the segment is 441 samples at a framerate of 44100 Hz, so
2970
the frequency resolution is 100 Hz (see Section~\ref{gabor}).
2971
That means the estimated pitch might be off by 50 Hz; in musical
2972
terms, the range from 350 Hz to 450 Hz is about 5 semitones, which is
2973
a big difference!
2974
2975
We could get better frequency resolution by taking a longer segment,
2976
but since the pitch is changing over time, we would also get ``motion
2977
blur''; that is, the peak would spread between the start and end pitch of
2978
the segment, as we saw in Section~\ref{sauron}.
2979
2980
We can estimate pitch more precisely using autocorrelation.
2981
If a signal is periodic, we expect the autocorrelation to spike
2982
when the lag equals the period.
2983
2984
\begin{figure}
2985
% autocorr.py
2986
\centerline{\includegraphics[height=2.5in]{figs/autocorr7.pdf}}
2987
\caption{Two segments from a chirp, one starting 0.0023 seconds
2988
after the other.}
2989
\label{fig.autocorr7}
2990
\end{figure}
2991
2992
To show why that works, I'll plot two segments from the same
2993
recording.
2994
2995
\begin{verbatim}
def plot_shifted(wave, offset=0.001, start=0.2):
    thinkplot.preplot(2)
    segment1 = wave.segment(start=start, duration=0.01)
    segment1.plot(linewidth=2, alpha=0.8)

    segment2 = wave.segment(start=start-offset, duration=0.01)
    segment2.shift(offset)
    segment2.plot(linewidth=2, alpha=0.4)

    corr = segment1.corr(segment2)
    text = r'$\rho =$ %.2g' % corr
    thinkplot.text(segment1.start+0.0005, -0.8, text)
    thinkplot.config(xlabel='Time (s)')
\end{verbatim}

One segment starts at 0.2 seconds; the other starts 0.0023 seconds
later. Figure~\ref{fig.autocorr7} shows the result. The segments
are similar, and their correlation is 0.99. This result suggests
that the period is near 0.0023 seconds, which corresponds to a frequency
of 435 Hz.

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr8.pdf}}
\caption{Autocorrelation function for a segment from a chirp.}
\label{fig.autocorr8}
\end{figure}

For this example, I estimated the period by trial and error. To automate
the process, we can use the autocorrelation function.

\begin{verbatim}
lags, corrs = autocorr(segment)
thinkplot.plot(lags, corrs)
\end{verbatim}

Figure~\ref{fig.autocorr8} shows the autocorrelation function for
the segment starting at $t=0.2$ seconds. The first peak occurs at
{\tt lag=101}. We can compute the frequency that corresponds
to that period like this:

\begin{verbatim}
period = lag / segment.framerate
frequency = 1 / period
\end{verbatim}

The estimated fundamental frequency is 437 Hz. To evaluate the
precision of the estimate, we can run the same computation with
lags 100 and 102, which correspond to frequencies 441 and 432 Hz.
The frequency precision using autocorrelation is less than 10 Hz,
compared with 100 Hz using the spectrum. In musical terms, the
expected error is about 30 cents (a third of a semitone).
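To express that error in cents, recall that the interval between two frequencies is $1200 \log_2(f_1/f_2)$ cents. A quick check (my arithmetic, not code from the book):

```python
import numpy as np

def interval_in_cents(f1, f2):
    # Musical interval between two frequencies (100 cents per semitone).
    return 1200 * np.log2(f1 / f2)

framerate = 44100
for lag in [100, 101, 102]:
    print(lag, framerate / lag)      # candidate frequencies

# Spread between the highest and lowest candidates, in cents.
print(interval_in_cents(framerate / 100, framerate / 102))
```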


\section{Correlation as dot product}
\label{dotproduct}

I started the chapter with this definition of Pearson's
correlation coefficient:
%
\[ \rho = \frac{ \sum_i (x_i - \mu_x) (y_i - \mu_y)}{N \sigma_x \sigma_y} \]
%
Then I used $\rho$ to define serial correlation and autocorrelation.
That's consistent with how these terms are used in statistics,
but in the context of signal processing, the definitions are
a little different.

In signal processing, we are often working with unbiased signals,
where the mean is 0, and normalized signals, where the standard
deviation is 1. In that case, the definition of $\rho$ simplifies to:
%
\[ \rho = \frac{1}{N} \sum_i x_i y_i \]
%
And it is common to simplify even further:
%
\[ r = \sum_i x_i y_i \]
%
This definition of correlation is not ``standardized'', so it doesn't
generally fall between -1 and 1. But it has other useful properties.

If you think of $x$ and $y$ as vectors, you might recognize this
formula as the {\bf dot product}, $x \cdot y$. See
\url{http://en.wikipedia.org/wiki/Dot_product}.

\newcommand{\norm}{\mathrm{norm}}

The dot product indicates the degree to which the signals are similar.
If they are normalized so their standard deviations are 1,
%
\[ x \cdot y = \cos \theta \]
%
where $\theta$ is the angle between the vectors. And that explains
why Figure~\ref{fig.autocorr2} is a cosine curve.
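We can see this numerically by normalizing two sampled sine waves to unit length and comparing their dot product with the cosine of the phase offset. This sketch uses the same parameters as {\tt make\_sine}, but plain NumPy instead of {\tt thinkdsp}:

```python
import numpy as np

def unit_sine(offset, freq=440, framerate=10000, duration=0.5):
    # Sample a sine wave, then scale the samples to unit length
    # so the dot product of two such vectors is cos(theta).
    n = int(duration * framerate)
    ts = np.arange(n) / framerate
    ys = np.sin(2 * np.pi * freq * ts + offset)
    return ys / np.linalg.norm(ys)

x = unit_sine(offset=0)
y = unit_sine(offset=1)

print(np.dot(x, y))    # nearly identical to...
print(np.cos(1))
```

For a phase offset of 1 radian, the dot product comes out close to $\cos 1 \approx 0.54$, the same value as the correlation computed earlier.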


\section{Using NumPy}
\label{correlate}

\begin{figure}
% autocorr.py
\centerline{\includegraphics[height=2.5in]{figs/autocorr9.pdf}}
\caption{Autocorrelation function computed with {\tt np.correlate}.}
\label{fig.autocorr9}
\end{figure}

NumPy provides a function, {\tt correlate}, that computes
the correlation of two functions or the autocorrelation of one
function. We can use it to compute the autocorrelation of
the segment from the previous section:

\begin{verbatim}
corrs2 = np.correlate(segment.ys, segment.ys, mode='same')
\end{verbatim}

The option {\tt mode} tells {\tt correlate} what range
of {\tt lag} to use. With the value {\tt 'same'}, the
range is from $-N/2$ to $N/2$, where $N$ is the length of the
wave array.
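A tiny example shows the difference between the modes (my illustration of NumPy's documented behavior, not code from the book):

```python
import numpy as np

a = np.array([1, 2, 3])

# 'full' includes every possible lag, from -(N-1) to N-1.
print(np.correlate(a, a, mode='full'))   # [ 3  8 14  8  3]
# 'same' keeps the middle N values, centered on lag 0.
print(np.correlate(a, a, mode='same'))   # [ 8 14  8]
```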

Figure~\ref{fig.autocorr9} shows the result. It is symmetric because
the two signals are identical, so a negative lag on one has the same
effect as a positive lag on the other. To compare with the results
from {\tt autocorr}, we can select the second half:

\begin{verbatim}
N = len(corrs2)
half = corrs2[N//2:]
\end{verbatim}

If you compare Figure~\ref{fig.autocorr9} to Figure~\ref{fig.autocorr8},
you'll notice that the correlations computed by {\tt np.correlate}
get smaller as the lags increase. That's because {\tt np.correlate}
uses the unstandardized definition of correlation;
as the lag gets bigger, the
overlap between the two signals gets smaller, so the magnitude of
the correlations decreases.

We can correct that by dividing through by the lengths:

\begin{verbatim}
lengths = range(N, N//2, -1)
half /= lengths
\end{verbatim}

Finally, we can standardize the results so the correlation with
{\tt lag=0} is 1.

\begin{verbatim}
half /= half[0]
\end{verbatim}

With these adjustments, the results computed by {\tt autocorr} and
{\tt np.correlate} are nearly the same. They still differ by
1-2\%. The reason is not important, but if you are curious: {\tt autocorr}
standardizes the correlations independently for each lag; for
{\tt np.correlate}, we standardized them all at the end.

More importantly, now you know what autocorrelation is, how to
use it to estimate the fundamental period of a signal, and two
ways to compute it.


\section{Exercises}

Solutions to these exercises are in {\tt chap05soln.ipynb}.

\begin{exercise}
The Jupyter notebook for this chapter, {\tt chap05.ipynb}, includes
an interaction that lets you compute autocorrelations for different
lags. Use this interaction to estimate the pitch of the vocal chirp
for a few different start times.
\end{exercise}

\begin{exercise}
The example code in \verb"chap05.ipynb" shows how to use autocorrelation
to estimate the fundamental frequency of a periodic signal.
Encapsulate this code in a function called \verb"estimate_fundamental",
and use it to track the pitch of a recorded sound.

To see how well it works, try superimposing your pitch estimates on a
spectrogram of the recording.
\end{exercise}

\begin{exercise}
If you did the exercises in the previous chapter, you downloaded
the historical price of BitCoins and estimated the power spectrum
of the price changes. Using the same data, compute the autocorrelation
of BitCoin prices. Does the autocorrelation function drop off quickly?
Is there evidence of periodic behavior?
\end{exercise}

\begin{exercise}
In the repository for this book you will find a Jupyter notebook
called \verb"saxophone.ipynb" that explores autocorrelation,
pitch perception, and a phenomenon called the {\bf missing fundamental}.
Read through this notebook and run the examples. Try selecting
a different segment of the recording and running the examples again.

Vi Hart has an excellent video called ``What is up with Noises? (The
Science and Mathematics of Sound, Frequency, and Pitch)''; it
demonstrates the missing fundamental phenomenon and explains how pitch
perception works (at least, to the degree that we know). Watch it at
\url{https://www.youtube.com/watch?v=i_0DXxNeaQ0}.
\end{exercise}


\chapter{Discrete cosine transform}
\label{dct}

The topic of this chapter is the {\bf Discrete Cosine
Transform} (DCT), which is used in MP3 and related formats for
compressing music; JPEG and similar formats for images; and the MPEG
family of formats for video.

DCT is similar in many ways to the Discrete Fourier Transform (DFT),
which we have been using for spectral analysis.
Once we learn how DCT works, it will be easier to explain DFT.

Here are the steps to get there:

\begin{enumerate}

\item We'll start with the synthesis problem: given a set of frequency
components and their amplitudes, how can we construct a wave?

\item Next we'll rewrite the synthesis problem using NumPy arrays.
This move is good for performance, and also provides insight
for the next step.

\item We'll look at the analysis problem: given a signal and a
set of frequencies, how can we find the amplitude of each frequency
component? We'll start with a solution that is conceptually simple
but slow.

\item Finally, we'll use some principles from linear algebra to find a
more efficient algorithm. If you already know linear algebra,
that's great, but I'll explain what you need as we go.

\end{enumerate}

The code for this chapter is in {\tt chap06.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp06}.


\section{Synthesis}
\label{synth1}

Suppose I give you a list of amplitudes and a list of frequencies,
and ask you to construct a signal that is the sum of these frequency
components. Using objects in the {\tt thinkdsp} module, there is
a simple way to perform this operation, which is called {\bf synthesis}:

\begin{verbatim}
def synthesize1(amps, fs, ts):
    components = [thinkdsp.CosSignal(freq, amp)
                  for amp, freq in zip(amps, fs)]
    signal = thinkdsp.SumSignal(*components)

    ys = signal.evaluate(ts)
    return ys
\end{verbatim}

{\tt amps} is a list of amplitudes, {\tt fs} is the list
of frequencies, and {\tt ts} is the sequence
of times where the signal should be evaluated.

{\tt components} is a list of {\tt CosSignal} objects, one for
each amplitude-frequency pair. {\tt SumSignal} represents the
sum of these frequency components.

Finally, {\tt evaluate} computes the value of the signal at each
time in {\tt ts}.

We can test this function like this:

\begin{verbatim}
amps = np.array([0.6, 0.25, 0.1, 0.05])
fs = [100, 200, 300, 400]
framerate = 11025

ts = np.linspace(0, 1, framerate)
ys = synthesize1(amps, fs, ts)
wave = thinkdsp.Wave(ys, framerate)
\end{verbatim}

This example makes a signal that contains a fundamental frequency at
100 Hz and three harmonics (100 Hz is a sharp G2). It renders the
signal for one second at 11,025 frames per second and puts the results
into a Wave object.

Conceptually, synthesis is pretty simple. But in this form it doesn't
help much with {\bf analysis}, which is the inverse problem: given the
wave, how do we identify the frequency components and their amplitudes?

\section{Synthesis with arrays}
\label{synthesis}

\begin{figure}
\centerline{\includegraphics[height=2.5in]{figs/diagram1.pdf}}
\caption{Synthesis with arrays.}
\label{fig.synthesis}
\end{figure}

Here's another way to write {\tt synthesize}:

\begin{verbatim}
def synthesize2(amps, fs, ts):
    args = np.outer(ts, fs)
    M = np.cos(PI2 * args)
    ys = np.dot(M, amps)
    return ys
\end{verbatim}

This function looks very different, but it does the same thing.
Let's see how it works:

\begin{enumerate}

\item {\tt np.outer} computes the outer product of {\tt ts} and
{\tt fs}. The result is an array with one row for each element
of {\tt ts} and one column for each element of {\tt fs}. Each
element in the array is the product of a frequency and a time, $f t$.

\item We multiply {\tt args} by $2 \pi$ and apply {\tt cos}, so each
element of the result is $\cos (2 \pi f t)$. Since the {\tt ts} run
down the columns, each column contains a cosine signal at a
particular frequency, evaluated at a sequence of times.

\item {\tt np.dot} multiplies each row of {\tt M} by {\tt amps},
element-wise, and then adds up the products. In terms of linear
algebra, we are multiplying a matrix, {\tt M}, by a vector,
{\tt amps}. In terms of signals, we are computing the weighted sum
of frequency components.

\end{enumerate}

Figure~\ref{fig.synthesis} shows the structure of this computation.
Each row of the matrix, {\tt M}, corresponds to a time
from 0.0 to 1.0 seconds; $t_n$ is the time of the $n$th row.
Each column corresponds to a frequency from
100 to 400 Hz; $f_k$ is the frequency of the $k$th column.

I labeled the $n$th row with the letters $a$ through $d$; as an
example, the value of $a$ is $\cos [2 \pi (100) t_n]$.

The result of the dot product, {\tt ys}, is a vector with one element
for each row of {\tt M}. The $n$th element, labeled $e$, is the sum
of products:
%
\[ e = 0.6 a + 0.25 b + 0.1 c + 0.05 d \]
%
And likewise with the other elements of {\tt ys}. So each element
of {\tt ys} is the sum of four frequency components, evaluated at
a point in time, and multiplied by the corresponding amplitudes.
And that's exactly what we wanted.

We can use the code from the previous section to check that the two
versions of {\tt synthesize} produce the same results.

\begin{verbatim}
ys1 = synthesize1(amps, fs, ts)
ys2 = synthesize2(amps, fs, ts)
max(abs(ys1 - ys2))
\end{verbatim}

The biggest difference between {\tt ys1} and {\tt ys2} is about {\tt
1e-13}, which is what we expect due to floating-point errors.

Writing this computation in terms of linear algebra makes the code
smaller and faster. Linear algebra
provides concise notation for operations on matrices and vectors. For
example, we could write {\tt synthesize} like this:
%
\begin{eqnarray*}
M &=& \cos (2 \pi t \otimes f) \\
y &=& M a
\end{eqnarray*}
%
where $a$ is a vector of amplitudes,
$t$ is a vector of times, $f$ is a vector of frequencies, and
$\otimes$ is the symbol for the outer product of two vectors.


\section{Analysis}
\label{analysis}

Now we are ready to solve the analysis problem. Suppose I give you
a wave and tell you that it is the sum of cosines with a given set
of frequencies. How would you find the amplitude for each frequency
component? In other words, given {\tt ys}, {\tt ts} and {\tt fs},
can you recover {\tt amps}?

In terms of linear algebra, the first step is the same as for
synthesis: we compute $M = \cos (2 \pi t \otimes f)$. Then we want
to find $a$ so that $y = M a$; in other words, we want to solve a
linear system. NumPy provides {\tt linalg.solve}, which does
exactly that.

Here's what the code looks like:

\begin{verbatim}
def analyze1(ys, fs, ts):
    args = np.outer(ts, fs)
    M = np.cos(PI2 * args)
    amps = np.linalg.solve(M, ys)
    return amps
\end{verbatim}
3414
3415
The first two lines use {\tt ts} and {\tt fs} to build the
3416
matrix, {\tt M}. Then {\tt np.linalg.solve} computes {\tt amps}.
3417
3418
But there's a hitch. In general we can only solve a system of linear
3419
equations if the matrix is square; that is, the number of equations
3420
(rows) is the same as the number of unknowns (columns).
3421
3422
In this example, we have only 4 frequencies, but we evaluated the
3423
signal at 11,025 times. So we have many more equations than unknowns.
3424
3425
In general if {\tt ys}
3426
contains more than 4 elements, it is unlikely that we can analyze it
3427
using only 4 frequencies.
3428
3429
But in this case, we know that the {\tt ys} were actually generated by
3430
adding only 4 frequency components, so we can use any 4 values from
3431
the wave array to recover {\tt amps}.
3432
3433
For simplicity, I'll use the first 4 samples from the signal.
3434
Using the values of {\tt ys}, {\tt fs} and {\tt ts} from
3435
the previous section, we can run {\tt analyze1} like this:
3436
3437
\begin{verbatim}
3438
n = len(fs)
3439
amps2 = analyze1(ys[:n], fs, ts[:n])
3440
\end{verbatim}
3441
3442
And sure enough, {\tt amps2} is
3443
3444
\begin{verbatim}
3445
[ 0.6 0.25 0.1 0.05 ]
3446
\end{verbatim}
3447
3448
This algorithm works, but it is slow. Solving a linear
3449
system of equations takes time proportional to $n^3$, where $n$ is
3450
the number of columns in $M$. We can do better.
3451
3452
3453
\section{Orthogonal matrices}
3454
3455
One way to solve linear systems is by inverting matrices. The
3456
inverse of a matrix $M$ is written $M^{-1}$, and it has the property
3457
that $M^{-1}M = I$. $I$ is the identity matrix, which has
3458
the value 1 on all diagonal elements and 0 everywhere else.
3459
3460
So, to solve the equation $y = Ma$, we can multiply both sides by
3461
$M^{-1}$, which yields:
3462
%
3463
\[ M^{-1}y = M^{-1} M a \]
3464
%
3465
On the right side, we can replace $M^{-1}M$ with $I$:
3466
%
3467
\[ M^{-1}y = I a \]
3468
%
3469
If we multiply $I$ by any vector $a$, the result is $a$, so
3470
%
3471
\[ M^{-1}y = a \]
3472
%
3473
This implies that if we can compute $M^{-1}$ efficiently, we can find
3474
$a$ with a simple matrix multiplication (using {\tt np.dot}). That
3475
takes time proportional to $n^2$, which is better than $n^3$.
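
As a sketch of this idea, with a small random matrix standing in for $M$: {\tt np.linalg.solve} and multiplication by a precomputed inverse give the same answer. (Computing the inverse is itself an $n^3$ operation, so the payoff comes only when we know $M^{-1}$ in advance, as we will for orthogonal matrices.)

```python
import numpy as np

rng = np.random.default_rng(17)
M = rng.standard_normal((4, 4))
y = rng.standard_normal(4)

a1 = np.linalg.solve(M, y)   # solve the system directly
Minv = np.linalg.inv(M)      # invert once...
a2 = Minv.dot(y)             # ...then solving is just a matrix-vector multiply
print(np.allclose(a1, a2))
```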

Inverting a matrix is slow, in general, but some special cases are
faster. In particular, if $M$ is {\bf orthogonal}, the inverse of $M$
is just the transpose of $M$, written $M^T$. In NumPy,
transposing an array is a constant-time operation. It
doesn't actually move the elements of the array; instead, it creates a
``view'' that changes the way the elements are accessed.

Again, a matrix is orthogonal if its transpose is also its inverse;
that is, $M^T = M^{-1}$. That implies that $M^TM = I$, which means we
can check whether a matrix is orthogonal by computing $M^TM$.
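
For example, a rotation matrix is orthogonal, and we can confirm it with this check (the angle is arbitrary):

```python
import numpy as np

theta = 0.3  # arbitrary rotation angle
M = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# For an orthogonal matrix, M^T M is the identity matrix.
print(np.allclose(M.T.dot(M), np.eye(2)))
```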

So let's see what the matrix looks like in {\tt synthesize2}. In
the previous example, $M$ has 11,025 rows, so it might be a good idea
to work with a smaller example:

\begin{verbatim}
def test1():
    amps = np.array([0.6, 0.25, 0.1, 0.05])
    N = 4.0
    time_unit = 0.001
    ts = np.arange(N) / N * time_unit
    max_freq = N / time_unit / 2
    fs = np.arange(N) / N * max_freq
    ys = synthesize2(amps, fs, ts)
\end{verbatim}

{\tt amps} is the same vector of amplitudes we saw before.
Since we have 4 frequency components, we'll sample the signal
at 4 points in time. That way, $M$ is square.

{\tt ts} is a vector of equally spaced sample times in the range from
0 to 1 time unit. I chose the time unit to be 1 millisecond, but it
is an arbitrary choice, and we will see in a minute that it drops out
of the computation anyway.

Since the frame rate is $N$ samples per time unit, the Nyquist
frequency is \verb"N / time_unit / 2", which is 2000 Hz in this
example. So {\tt fs} is a vector of equally spaced frequencies
between 0 and 2000 Hz.

With these values of {\tt ts} and {\tt fs}, the matrix, $M$, is:

\begin{verbatim}
[[ 1.     1.     1.     1.   ]
 [ 1.     0.707  0.    -0.707]
 [ 1.     0.    -1.    -0.   ]
 [ 1.    -0.707 -0.     0.707]]
\end{verbatim}

You might recognize 0.707 as an approximation of $\sqrt{2}/2$,
which is $\cos \pi/4$. You also might notice that this matrix
is {\bf symmetric}, which means that the element at $(j, k)$ always
equals the element at $(k, j)$. This implies that $M$ is its own
transpose; that is, $M^T = M$.

But sadly, $M$ is not orthogonal. If we compute $M^TM$, we get:

\begin{verbatim}
[[ 4.  1. -0.  1.]
 [ 1.  2.  1. -0.]
 [-0.  1.  2.  1.]
 [ 1. -0.  1.  2.]]
\end{verbatim}

And that's not the identity matrix.


\section{DCT-IV}
\label{dctiv}

But if we choose {\tt ts} and {\tt fs} carefully,
we can make $M$ orthogonal. There are several ways to do it, which
is why there are several versions of the discrete cosine transform (DCT).

One simple option is to shift {\tt ts} and {\tt fs} by a half unit.
This version is called DCT-IV, where ``IV'' is a Roman numeral
indicating that this is the fourth of eight versions of the DCT.

Here's an updated version of {\tt test1}:

\begin{verbatim}
def test2():
    amps = np.array([0.6, 0.25, 0.1, 0.05])
    N = 4.0
    ts = (0.5 + np.arange(N)) / N
    fs = (0.5 + np.arange(N)) / 2
    ys = synthesize2(amps, fs, ts)
\end{verbatim}

If you compare this to the previous version, you'll notice
two changes. First, I added 0.5 to {\tt ts} and {\tt fs}.
Second, I canceled out \verb"time_unit", which simplifies
the expression for {\tt fs}.

With these values, $M$ is

\begin{verbatim}
[[ 0.981  0.831  0.556  0.195]
 [ 0.831 -0.195 -0.981 -0.556]
 [ 0.556 -0.981  0.195  0.831]
 [ 0.195 -0.556  0.831 -0.981]]
\end{verbatim}

And $M^TM$ is

\begin{verbatim}
[[ 2.  0.  0.  0.]
 [ 0.  2. -0.  0.]
 [ 0. -0.  2. -0.]
 [ 0.  0. -0.  2.]]
\end{verbatim}

Some of the off-diagonal elements are displayed as -0, which means
that the floating-point representation is a small negative number.
This matrix is very close to $2I$, which means $M$ is almost
orthogonal; it's just off by a factor of 2. And for our purposes,
that's good enough.
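
We can confirm both properties, symmetry and near-orthogonality, with a short check:

```python
import numpy as np

PI2 = 2 * np.pi
N = 4
ts = (0.5 + np.arange(N)) / N
fs = (0.5 + np.arange(N)) / 2
M = np.cos(PI2 * np.outer(ts, fs))

print(np.allclose(M, M.T))                     # M is symmetric
print(np.allclose(M.T.dot(M), 2 * np.eye(N)))  # M^T M = 2I, so M/2 inverts M
```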

Because $M$ is symmetric and (almost) orthogonal, the inverse of $M$
is just $M/2$. Now we can write a more efficient version of {\tt
analyze}:

\begin{verbatim}
def analyze2(ys, fs, ts):
    args = np.outer(ts, fs)
    M = np.cos(PI2 * args)
    amps = np.dot(M, ys) / 2
    return amps
\end{verbatim}

Instead of using {\tt np.linalg.solve}, we just multiply
by $M/2$.

Combining {\tt test2} and {\tt analyze2}, we can write an
implementation of DCT-IV:

\begin{verbatim}
def dct_iv(ys):
    N = len(ys)
    ts = (0.5 + np.arange(N)) / N
    fs = (0.5 + np.arange(N)) / 2
    args = np.outer(ts, fs)
    M = np.cos(PI2 * args)
    amps = np.dot(M, ys) / 2
    return amps
\end{verbatim}

Again, {\tt ys} is the wave array. We don't have to pass
{\tt ts} and {\tt fs} as parameters; \verb"dct_iv" can
figure them out based on {\tt N}, the length of {\tt ys}.

If we've got it right, this function should solve the analysis
problem; that is, given {\tt ys} it should be able to recover
{\tt amps}. We can test it like this:

\begin{verbatim}
amps = np.array([0.6, 0.25, 0.1, 0.05])
N = 4.0
ts = (0.5 + np.arange(N)) / N
fs = (0.5 + np.arange(N)) / 2
ys = synthesize2(amps, fs, ts)
amps2 = dct_iv(ys)
max(abs(amps - amps2))
\end{verbatim}

Starting with {\tt amps}, we synthesize a wave array, then use
\verb"dct_iv" to compute {\tt amps2}. The biggest
difference between {\tt amps} and {\tt amps2} is about {\tt 1e-16},
which is what we expect due to floating-point errors.


\section{Inverse DCT}

Finally, notice that {\tt analyze2} and {\tt synthesize2} are almost
identical. The only difference is that {\tt analyze2} divides the
result by 2. We can use this insight to compute the inverse DCT:

\begin{verbatim}
def inverse_dct_iv(amps):
    return dct_iv(amps) * 2
\end{verbatim}

\verb"inverse_dct_iv" solves the synthesis problem: it
takes the vector of amplitudes and returns
the wave array, {\tt ys}. We can test it by starting
with {\tt amps}, applying \verb"inverse_dct_iv" and \verb"dct_iv",
and testing that we get back what we started with.

\begin{verbatim}
amps = [0.6, 0.25, 0.1, 0.05]
ys = inverse_dct_iv(amps)
amps2 = dct_iv(ys)
max(abs(amps - amps2))
\end{verbatim}

Again, the biggest difference is about {\tt 1e-16}.


\section{The Dct class}

\begin{figure}
% dct.py
\centerline{\includegraphics[height=2.5in]{figs/dct1.pdf}}
\caption{DCT of a triangle signal at 400 Hz, sampled at 10 kHz.}
\label{fig.dct1}
\end{figure}

{\tt thinkdsp} provides a Dct class that encapsulates the
DCT in the same way the Spectrum class encapsulates the FFT.
To make a Dct object, you can invoke \verb"make_dct" on a Wave.

\begin{verbatim}
signal = thinkdsp.TriangleSignal(freq=400)
wave = signal.make_wave(duration=1.0, framerate=10000)
dct = wave.make_dct()
dct.plot()
\end{verbatim}

The result is the DCT of a triangle wave at 400 Hz, shown in
Figure~\ref{fig.dct1}. The values of the DCT can be positive or negative;
a negative value in the DCT corresponds to a negated cosine or,
equivalently, to a cosine shifted by 180 degrees.

\verb"make_dct" uses DCT-II, which is the most common type of DCT,
provided by {\tt scipy.fftpack}.

\begin{verbatim}
import scipy.fftpack

# class Wave:
    def make_dct(self):
        N = len(self.ys)
        hs = scipy.fftpack.dct(self.ys, type=2)
        fs = (0.5 + np.arange(N)) / 2
        return Dct(hs, fs, self.framerate)
\end{verbatim}

The results from {\tt dct} are stored in {\tt hs}. The corresponding
frequencies, computed as in Section~\ref{dctiv}, are stored in {\tt fs}.
Then both are used to initialize the Dct object.

Dct provides \verb"make_wave", which performs the inverse DCT.
We can test it like this:

\begin{verbatim}
wave2 = dct.make_wave()
max(abs(wave.ys-wave2.ys))
\end{verbatim}

The biggest difference between {\tt wave.ys} and {\tt wave2.ys} is
about {\tt 1e-16}, which is what we expect due to floating-point
errors.

\verb"make_wave" uses {\tt scipy.fftpack.idct}:

\begin{verbatim}
# class Dct
    def make_wave(self):
        n = len(self.hs)
        ys = scipy.fftpack.idct(self.hs, type=2) / 2 / n
        return Wave(ys, framerate=self.framerate)
\end{verbatim}

By default, the inverse DCT doesn't normalize the result, so we have
to divide through by $2N$.
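
We can check that scaling directly: with the default normalization, running {\tt dct} and then {\tt idct} scales the input by $2N$.

```python
import numpy as np
import scipy.fftpack

ys = np.array([0.6, 0.25, 0.1, 0.05])
N = len(ys)

# With the default normalization, idct(dct(ys)) == ys * 2N
round_trip = scipy.fftpack.idct(scipy.fftpack.dct(ys, type=2), type=2)
print(np.allclose(round_trip / (2 * N), ys))
```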


\section{Exercises}

For the following exercises, I provide some starter code in
{\tt chap06starter.ipynb}.
Solutions are in {\tt chap06soln.ipynb}.

\begin{exercise}
In this chapter I claim that {\tt analyze1} takes time proportional
to $n^3$ and {\tt analyze2} takes time proportional to $n^2$. To
see if that's true, run them on a range of input sizes and time
them. In Jupyter, you can use the ``magic command'' \verb"%timeit".

If you plot run time versus input size on a log-log scale, you
should get a straight line with slope 3 for {\tt analyze1} and
slope 2 for {\tt analyze2}.

You also might want to test \verb"dct_iv"
and {\tt scipy.fftpack.dct}.
\end{exercise}


\begin{exercise}
One of the major applications of the DCT is compression for both
sound and images. In its simplest form, DCT-based compression
works like this:

\begin{enumerate}

\item Break a long signal into segments.

\item Compute the DCT of each segment.

\item Identify frequency components with amplitudes so low they are
inaudible, and remove them. Store only the frequencies and
amplitudes that remain.

\item To play back the signal, load the frequencies and amplitudes
for each segment and apply the inverse DCT.

\end{enumerate}

Implement a version of this algorithm and apply it to a recording
of music or speech. How many components can you eliminate before
the difference is perceptible?

In order to make this method practical, you need some way to store a
sparse array; that is, an array where most of the elements are zero.
SciPy provides several implementations of sparse arrays, which you can
read about at
\url{http://docs.scipy.org/doc/scipy/reference/sparse.html}.
\end{exercise}


\begin{exercise}
In the repository for this book you will find a Jupyter notebook
called \verb"phase.ipynb" that explores the effect of phase on sound
perception.
Read through this notebook and run the examples.
Choose another segment of sound and run the same experiments.
Can you find any general relationships between the phase structure
of a sound and how we perceive it?
\end{exercise}


\chapter{Discrete Fourier Transform}
\label{dft}

We've been using the discrete Fourier transform (DFT) since
Chapter~\ref{sounds}, but I haven't explained how it works. Now is
the time.

If you understand the discrete cosine transform (DCT), you will
understand the DFT. The only difference is that instead of using the
cosine function, we'll use the complex exponential function. I'll
start by explaining complex exponentials, then I'll follow the
same progression as in Chapter~\ref{dct}:

\begin{enumerate}

\item We'll start with the synthesis
problem: given a set of frequency components and their amplitudes,
how can we construct a signal? The synthesis problem is
equivalent to the inverse DFT.

\item Then I'll rewrite the synthesis problem in the form of matrix
multiplication using NumPy arrays.

\item Next we'll solve the analysis problem, which is equivalent to
the DFT: given a signal, how do we find the amplitude and phase
offset of its frequency components?

\item Finally, we'll use linear algebra to find a more efficient way
to compute the DFT.

\end{enumerate}

The code for this chapter is in {\tt chap07.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp07}.


\section{Complex exponentials}

One of the more interesting moves in mathematics is the generalization
of an operation from one type to another. For example, factorial is a
function that operates on integers; the natural definition for
factorial of $n$ is the product of all integers from 1 to $n$.

If you are of a certain inclination, you might wonder how to compute
the factorial of a non-integer like 3.5. Since the natural definition
doesn't apply, you might look for other ways to compute the factorial
function, ways that would work with non-integers.

In 1730, Leonhard Euler found one, a generalization of the factorial
function that we know as the gamma function (see
\url{http://en.wikipedia.org/wiki/Gamma_function}).

Euler also found one of the most useful generalizations in applied
mathematics, the complex exponential function.

The natural definition of exponentiation is repeated multiplication.
For example, $\phi^3 = \phi \cdot \phi \cdot \phi$. But this
definition doesn't apply to non-integer exponents.

However, exponentiation can also be expressed as a power series:
%
\[ e^\phi = 1 + \phi + \phi^2/2! + \phi^3/3! + ... \]
%
This definition works with real numbers, imaginary numbers and, by a simple
extension, with complex numbers. Applying this definition
to a pure imaginary number, $i\phi$, we get
%
\[ e^{i\phi} = 1 + i\phi - \phi^2/2! - i\phi^3/3! + ... \]
%
By rearranging terms, we can show that this is equivalent to:
%
\[ e^{i\phi} = \cos \phi + i \sin \phi \]
%
You can see the derivation at
\url{http://en.wikipedia.org/wiki/Euler's_formula}.

This formula
implies that $e^{i\phi}$ is a complex number with magnitude 1; if you
think of it as a point in the complex plane, it is always on the unit
circle. And if you think of it as a vector, the angle in radians
between the vector and the positive x-axis is the argument, $\phi$.
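
We can verify Euler's formula numerically for an arbitrary angle:

```python
import numpy as np

phi = 0.7  # an arbitrary angle in radians

# Euler's formula: e^{i phi} = cos(phi) + i sin(phi)
z = np.exp(1j * phi)
print(np.allclose(z, np.cos(phi) + 1j * np.sin(phi)))
print(np.allclose(abs(z), 1.0))  # magnitude 1: always on the unit circle
```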

In the case where the exponent is a complex number, we have:
%
\[ e^{a + i\phi} = e^a e^{i\phi} = A e^{i\phi} \]
%
where $A$ is a real number that indicates amplitude and
$e^{i\phi}$ is a unit complex number that indicates angle.

NumPy provides a version of {\tt exp} that works with complex numbers:

\begin{verbatim}
>>> phi = 1.5
>>> z = np.exp(1j * phi)
>>> z
(0.0707+0.997j)
\end{verbatim}

Python uses {\tt j} to represent the imaginary unit, rather
than {\tt i}. A number ending in {\tt j} is considered imaginary,
so {\tt 1j} is just $i$.

When the argument to {\tt np.exp} is imaginary or complex, the
result is a complex number; specifically, a {\tt np.complex128},
which is represented by two 64-bit floating-point numbers.
In this example, the result is {\tt 0.0707+0.997j}.

Complex numbers have attributes {\tt real} and {\tt imag}:

\begin{verbatim}
>>> z.real
0.0707
>>> z.imag
0.997
\end{verbatim}

To get the magnitude, you can use the built-in function {\tt abs}
or {\tt np.absolute}:

\begin{verbatim}
>>> abs(z)
1.0
>>> np.absolute(z)
1.0
\end{verbatim}

To get the angle, you can use {\tt np.angle}:

\begin{verbatim}
>>> np.angle(z)
1.5
\end{verbatim}

This example confirms that $e^{i \phi}$ is a complex number with
magnitude 1 and angle $\phi$ radians.


\section{Complex signals}

If $\phi(t)$ is a function of time, $e^{i \phi(t)}$ is also a function
of time. Specifically,
%
\[ e^{i \phi(t)} = \cos \phi(t) + i \sin \phi(t) \]
%
This function describes a quantity that varies in time, so it is
a signal. Specifically, it is a {\bf complex exponential signal}.

In the special case where the frequency of the signal is constant,
$\phi(t)$ is $2 \pi f t$ and the result is a {\bf complex sinusoid}:
%
\[ e^{i 2 \pi f t} = \cos 2 \pi f t + i \sin 2 \pi f t \]
%
Or more generally, the signal might start at a phase offset
$\phi_0$, yielding
%
\[ e^{i (2 \pi f t + \phi_0)} \]
%
{\tt thinkdsp} provides an implementation of this signal,
{\tt ComplexSinusoid}:

\begin{verbatim}
class ComplexSinusoid(Sinusoid):

    def evaluate(self, ts):
        phases = PI2 * self.freq * ts + self.offset
        ys = self.amp * np.exp(1j * phases)
        return ys
\end{verbatim}

{\tt ComplexSinusoid} inherits \verb"__init__" from
{\tt Sinusoid}. It provides a version of {\tt evaluate}
that is almost identical to {\tt Sinusoid.evaluate}; the
only difference is that it uses {\tt np.exp} instead of
{\tt np.sin}.

The result is a NumPy array of complex numbers:

\begin{verbatim}
>>> signal = thinkdsp.ComplexSinusoid(freq=1, amp=0.6, offset=1)
>>> wave = signal.make_wave(duration=1, framerate=4)
>>> wave.ys
[ 0.324+0.505j -0.505+0.324j -0.324-0.505j  0.505-0.324j]
\end{verbatim}

The frequency of this signal is 1 cycle per second; the amplitude
is 0.6 (in unspecified units); and the phase offset is 1 radian.

This example evaluates the signal at 4 places equally spaced between
0 and 1 second. The resulting samples are complex numbers.
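
The same samples can be computed directly with NumPy, without {\tt thinkdsp}; every sample lies on a circle of radius {\tt amp} in the complex plane:

```python
import numpy as np

PI2 = 2 * np.pi
freq, amp, offset = 1, 0.6, 1

# four samples equally spaced between 0 and 1 second
ts = np.arange(4) / 4
ys = amp * np.exp(1j * (PI2 * freq * ts + offset))

print(np.allclose(abs(ys), amp))  # constant magnitude: a circle of radius amp
```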


\section{The synthesis problem}

Just as we did with real sinusoids, we can create compound signals
by adding up complex sinusoids with different frequencies. And that
brings us to the complex version of the synthesis problem: given the
frequency and amplitude of each complex component, how do we evaluate the
signal?

The simplest solution is to create {\tt ComplexSinusoid} objects and
add them up.

\begin{verbatim}
def synthesize1(amps, fs, ts):
    components = [thinkdsp.ComplexSinusoid(freq, amp)
                  for amp, freq in zip(amps, fs)]
    signal = thinkdsp.SumSignal(*components)
    ys = signal.evaluate(ts)
    return ys
\end{verbatim}

This function is almost identical to {\tt synthesize1} in
Section~\ref{synth1}; the only difference is that I replaced
{\tt CosSignal} with {\tt ComplexSinusoid}.

Here's an example:

\begin{verbatim}
amps = np.array([0.6, 0.25, 0.1, 0.05])
fs = [100, 200, 300, 400]
framerate = 11025
ts = np.linspace(0, 1, framerate)
ys = synthesize1(amps, fs, ts)
\end{verbatim}

The result is:

\begin{verbatim}
[ 1.000 +0.000e+00j  0.995 +9.093e-02j  0.979 +1.803e-01j ...,
  0.979 -1.803e-01j  0.995 -9.093e-02j  1.000 -5.081e-15j]
\end{verbatim}

At the lowest level, a complex signal is a sequence of complex
numbers. But how should we interpret it? We have some intuition for
real signals: they represent quantities that vary in time; for
example, a sound signal represents changes in air pressure.
But nothing we measure in the world yields complex numbers.

\begin{figure}
% dft.py
\centerline{\includegraphics[height=2.5in]{figs/dft1.pdf}}
\caption{Real and imaginary parts of a mixture of complex sinusoids.}
\label{fig.dft1}
\end{figure}

So what is a complex signal? I don't have a satisfying answer to this
question. The best I can offer is two unsatisfying
answers:

\begin{enumerate}

\item A complex signal is a mathematical abstraction that is useful
for computation and analysis, but it does not correspond directly
with anything in the real world.

\item If you like, you can think of a complex signal as a sequence of
complex numbers that contains two signals as its real and imaginary
parts.

\end{enumerate}

Taking the second point of view, we can split the previous
signal into its real and imaginary parts:

\begin{verbatim}
n = 500
thinkplot.plot(ts[:n], ys[:n].real, label='real')
thinkplot.plot(ts[:n], ys[:n].imag, label='imag')
\end{verbatim}

Figure~\ref{fig.dft1} shows a segment of the result. The
real part is a sum of cosines; the imaginary part is
a sum of sines. Although the waveforms look different, they
contain the same frequency components in the same proportions.
To our ears, they sound the same (in general, we don't hear
phase offsets).


\section{Synthesis with matrices}
\label{synthmat}

As we saw in Section~\ref{synthesis}, we can also express the synthesis
problem in terms of matrix multiplication:

\begin{verbatim}
PI2 = np.pi * 2

def synthesize2(amps, fs, ts):
    args = np.outer(ts, fs)
    M = np.exp(1j * PI2 * args)
    ys = np.dot(M, amps)
    return ys
\end{verbatim}

Again, {\tt amps} is a NumPy array that contains a sequence of
amplitudes.

{\tt fs} is a sequence containing the frequencies of the
components. {\tt ts} contains the times where we will evaluate
the signal.

{\tt args} contains the outer product of {\tt ts} and {\tt fs},
with the {\tt ts} running down the rows and the {\tt fs} running
across the columns (you might want to refer back to
Figure~\ref{fig.synthesis}).

Each column of matrix {\tt M} contains a complex sinusoid with
a particular frequency, evaluated at a sequence of {\tt ts}.

When we multiply {\tt M} by the amplitudes, the result is a vector
whose elements correspond to the {\tt ts}; each element is the sum of
several complex sinusoids, evaluated at a particular time.

Here's the example from the previous section again:

\begin{verbatim}
>>> ys = synthesize2(amps, fs, ts)
>>> ys
[ 1.000 +0.000e+00j  0.995 +9.093e-02j  0.979 +1.803e-01j ...,
  0.979 -1.803e-01j  0.995 -9.093e-02j  1.000 -5.081e-15j]
\end{verbatim}

The result is the same.

In this example the amplitudes are real, but they could also be
complex. What effect does a complex amplitude have on the result?
Remember that we can think of a complex number in two ways: either the
sum of a real and an imaginary part, $x + i y$, or the product of a real
amplitude and a complex exponential, $A e^{i \phi_0}$. Using the
second interpretation, we can see what happens when we multiply
a complex amplitude by a complex sinusoid. For each frequency, $f$,
we have:
%
\[ A e^{i \phi_0} \cdot e^{i 2 \pi f t} = A e^{i (2 \pi f t + \phi_0)} \]
%
Multiplying by $A e^{i \phi_0}$ multiplies the amplitude by $A$
and adds the phase offset $\phi_0$.
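
We can check that behavior at a single point in time; the values of $f$ and $t$ are chosen arbitrarily, small enough that the total phase stays below $\pi$ so {\tt np.angle} doesn't wrap:

```python
import numpy as np

PI2 = 2 * np.pi
A, phi0 = 0.6, 1.5
f, t = 100, 0.001

a = A * np.exp(1j * phi0)        # complex amplitude
z = np.exp(1j * PI2 * f * t)     # complex sinusoid evaluated at time t

product = a * z
print(np.allclose(abs(product), A))                        # magnitude multiplied by A
print(np.allclose(np.angle(product), PI2 * f * t + phi0))  # phase offset added
```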

\begin{figure}
% dft.py
\centerline{\includegraphics[height=2.5in]{figs/dft2.pdf}}
\caption{Real part of two complex signals that differ by a phase
offset.}
\label{fig.dft2}
\end{figure}

We can test that claim by running the previous example with
$\phi_0 = 1.5$ for all frequency components:

\begin{verbatim}
phi = 1.5
amps2 = amps * np.exp(1j * phi)
ys2 = synthesize2(amps2, fs, ts)

thinkplot.plot(ts[:n], ys.real[:n])
thinkplot.plot(ts[:n], ys2.real[:n])
\end{verbatim}

Since {\tt amps}
is an array of reals, multiplying by {\tt np.exp(1j * phi)} yields
an array of complex numbers with phase offset {\tt phi} radians, and
the same magnitudes as {\tt amps}.

Figure~\ref{fig.dft2} shows the result. The phase offset
$\phi_0 = 1.5$ shifts the wave to the left by about one quarter of
a cycle; it also changes the waveform, because the same phase
offset applied to different frequencies changes how the frequency
components line up with each other.

Now that we have the more general solution to the synthesis problem --
one that handles complex amplitudes -- we are ready for the analysis
problem.


\section{The analysis problem}

The analysis problem is the inverse of the synthesis problem: given a
sequence of samples, $y$, and knowing the frequencies
that make up the signal, can we compute the complex amplitudes of the
components, $a$?

As we saw in Section~\ref{analysis}, we can solve this problem by forming
the synthesis matrix, $M$, and solving the system of linear
equations $M a = y$ for $a$.

\begin{verbatim}
def analyze1(ys, fs, ts):
    args = np.outer(ts, fs)
    M = np.exp(1j * PI2 * args)
    amps = np.linalg.solve(M, ys)
    return amps
\end{verbatim}

{\tt analyze1} takes a (possibly complex) wave array, {\tt ys}, a
sequence of real frequencies, {\tt fs}, and a sequence of real
times, {\tt ts}. It returns a sequence of complex amplitudes, {\tt
amps}.

Continuing the previous example, we can confirm that {\tt analyze1}
recovers the amplitudes we started with. For the linear system
solver to work, {\tt M} has to be square, so we need {\tt ys}, {\tt
fs} and {\tt ts} to have the same length. I'll ensure that by
slicing {\tt ys} and {\tt ts} down to the length of {\tt fs}:

\begin{verbatim}
>>> n = len(fs)
>>> amps2 = analyze1(ys[:n], fs, ts[:n])
>>> amps2
[ 0.60+0.j  0.25-0.j  0.10+0.j  0.05-0.j]
\end{verbatim}

These are approximately the amplitudes we started with, although
each component has a small imaginary part due to
floating-point errors.
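
Here is a self-contained version of the same round trip, using plain NumPy in place of {\tt thinkdsp}; the sample times differ slightly from the ones above, which is fine for the demonstration:

```python
import numpy as np

PI2 = 2 * np.pi

def analyze1(ys, fs, ts):
    # build the square synthesis matrix and solve M a = y
    M = np.exp(1j * PI2 * np.outer(ts, fs))
    return np.linalg.solve(M, ys)

amps = np.array([0.6, 0.25, 0.1, 0.05])
fs = np.array([100, 200, 300, 400])
framerate = 11025
ts = np.arange(framerate) / framerate

# synthesize the signal, then analyze the first len(fs) samples
ys = np.exp(1j * PI2 * np.outer(ts, fs)).dot(amps)
n = len(fs)
amps2 = analyze1(ys[:n], fs, ts[:n])
print(np.allclose(amps2, amps))
```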


\section{Efficient analysis}

Unfortunately, solving a linear system of equations is slow. For the
DCT, we were able to speed things up by choosing {\tt fs} and {\tt
ts} so that {\tt M} is orthogonal. That way, the inverse of {\tt M}
is the transpose of {\tt M}, and we can compute both DCT and inverse
DCT by matrix multiplication.

We'll do the same thing for the DFT, with one small change.
Since {\tt M} is complex, we need it to be {\bf unitary}, rather
than orthogonal. The inverse of a unitary matrix is its conjugate
transpose, which we can compute by transposing the matrix and
negating the imaginary part of each element. See
\url{http://en.wikipedia.org/wiki/Unitary_matrix}.

The NumPy methods {\tt conj} and {\tt transpose} do what we
want. Here's the code that computes {\tt M} for $N=4$ components:

\begin{verbatim}
N = 4
ts = np.arange(N) / N
fs = np.arange(N)
args = np.outer(ts, fs)
M = np.exp(1j * PI2 * args)
\end{verbatim}

If $M$ is unitary, $M^*M = I$, where $M^*$ is the conjugate transpose
of $M$, and $I$ is the identity matrix. We can test whether $M$
is unitary like this:

\begin{verbatim}
MstarM = M.conj().transpose().dot(M)
\end{verbatim}

The result, within the tolerance of floating-point error, is
$4 I$, so $M$ is unitary except for an extra factor of $N$,
similar to the extra factor of 2 we found with the DCT.
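
Running that check as a script confirms the factor of $N$:

```python
import numpy as np

PI2 = 2 * np.pi
N = 4
ts = np.arange(N) / N
fs = np.arange(N)
M = np.exp(1j * PI2 * np.outer(ts, fs))

MstarM = M.conj().transpose().dot(M)
print(np.allclose(MstarM, N * np.eye(N)))  # M*M = N I, not I
```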

We can use this result to write a faster version of {\tt analyze1}:

\begin{verbatim}
def analyze2(ys, fs, ts):
    N = len(fs)
    args = np.outer(ts, fs)
    M = np.exp(1j * PI2 * args)
    amps = M.conj().transpose().dot(ys) / N
    return amps
\end{verbatim}

And test it with appropriate values of {\tt fs} and {\tt ts}:

\begin{verbatim}
N = 4
amps = np.array([0.6, 0.25, 0.1, 0.05])
fs = np.arange(N)
ts = np.arange(N) / N
ys = synthesize2(amps, fs, ts)
amps3 = analyze2(ys, fs, ts)
\end{verbatim}

Again, the result is correct within the tolerance of floating-point
arithmetic.

\begin{verbatim}
[ 0.60+0.j  0.25+0.j  0.10-0.j  0.05-0.j]
\end{verbatim}
4292
4293
4294
\section{DFT}
\label{dftsection}

As a function, {\tt analyze2} would be hard to use because it
only works if {\tt fs} and {\tt ts} are chosen correctly.
Instead, I will rewrite it to take just {\tt ys} and compute {\tt fs}
and {\tt ts} itself.

First, I'll make a function to compute the synthesis matrix, $M$:

\begin{verbatim}
def synthesis_matrix(N):
    ts = np.arange(N) / N
    fs = np.arange(N)
    args = np.outer(ts, fs)
    M = np.exp(1j * PI2 * args)
    return M
\end{verbatim}

Then I'll write the function that takes {\tt ys} and returns
{\tt amps}:

\begin{verbatim}
def analyze3(ys):
    N = len(ys)
    M = synthesis_matrix(N)
    amps = M.conj().transpose().dot(ys) / N
    return amps
\end{verbatim}

We are almost done; {\tt analyze3} computes something very
close to the DFT, with one difference. The conventional definition
of DFT does not divide by {\tt N}:

\begin{verbatim}
def dft(ys):
    N = len(ys)
    M = synthesis_matrix(N)
    amps = M.conj().transpose().dot(ys)
    return amps
\end{verbatim}

Now we can confirm that my version yields the same result as
FFT:

\begin{verbatim}
>>> dft(ys)
[ 2.4+0.j  1.0+0.j  0.4-0.j  0.2-0.j]
\end{verbatim}

The result is close to {\tt amps * N}.
And here's the version in {\tt np.fft}:

\begin{verbatim}
>>> np.fft.fft(ys)
[ 2.4+0.j  1.0+0.j  0.4-0.j  0.2-0.j]
\end{verbatim}

They are the same, within floating-point error.
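
To make the comparison concrete, here is a self-contained version of
{\tt dft} checked against {\tt np.fft.fft} on a random input:

\begin{verbatim}
import numpy as np

PI2 = 2 * np.pi

def synthesis_matrix(N):
    ts = np.arange(N) / N
    fs = np.arange(N)
    return np.exp(1j * PI2 * np.outer(ts, fs))

def dft(ys):
    M = synthesis_matrix(len(ys))
    return M.conj().transpose().dot(ys)

ys = np.random.default_rng(17).standard_normal(8)
assert np.allclose(dft(ys), np.fft.fft(ys))
\end{verbatim}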

The inverse DFT is almost the same, except we don't have to transpose
and conjugate $M$, and {\em now} we have to divide through by $N$:

\begin{verbatim}
def idft(ys):
    N = len(ys)
    M = synthesis_matrix(N)
    amps = M.dot(ys) / N
    return amps
\end{verbatim}

Finally, we can confirm that {\tt dft(idft(amps))} yields {\tt amps}.

\begin{verbatim}
>>> ys = idft(amps)
>>> dft(ys)
[ 0.60+0.j  0.25+0.j  0.10-0.j  0.05-0.j]
\end{verbatim}

If I could go back in time, I might change the definition of
DFT so it divides by $N$ and the inverse DFT doesn't. That would
be more consistent with my presentation of the synthesis and analysis
problems.

Or I might change the definition so that both operations divide
through by $\sqrt{N}$. Then the DFT and inverse DFT would be
more symmetric.

But I can't go back in time (yet!), so we're stuck with a slightly
weird convention. For practical purposes it doesn't really
matter.

\section{The DFT is periodic}

In this chapter I presented the DFT in the form of matrix multiplication.
We compute the synthesis matrix, $M$, and the analysis matrix, $M^*$.
When we multiply $M^{*}$ by the wave array, $y$, each element of the
result is the product of a row from $M^*$ and $y$, which we can
write in the form of a summation:
%
\[ DFT(y)[k] = \sum_n y[n] \exp(-2 \pi i n k / N) \]
%
where $k$ is an index of frequency from
$0$ to $N-1$ and $n$ is an index of time from $0$ to $N-1$.
So $DFT(y)[k]$ is the $k$th element of the DFT of $y$.

Normally we evaluate this summation for $N$ values of $k$, running from
0 to $N-1$. We {\em could} evaluate it for other values of $k$, but
there is no point, because they start to repeat. That is, the value at
$k$ is the same as the value at $k+N$ or $k+2N$ or $k-N$, etc.

We can see that mathematically by plugging $k+N$ into the summation:
%
\[ DFT(y)[k+N] = \sum_n y[n] \exp(-2 \pi i n (k+N) / N) \]
%
Since there is a sum in the exponent, we can break it into two parts:
%
\[ DFT(y)[k+N] = \sum_n y[n] \exp(-2 \pi i n k / N) \exp(-2 \pi i n N / N) \]
%
In the second term, the exponent is always an integer multiple of
$2 \pi$, so the result is always 1, and we can drop it:
%
\[ DFT(y)[k+N] = \sum_n y[n] \exp(-2 \pi i n k / N) \]
%
And we can see that this summation is equivalent to $DFT(y)[k]$.
So the DFT is periodic, with period $N$. You will need this result
for one of the exercises below, which asks you to implement the Fast Fourier
Transform (FFT).
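
We can also check the periodicity numerically by evaluating the
summation at out-of-range values of $k$. The helper {\tt dft\_at} is
introduced here just for this check:

\begin{verbatim}
import numpy as np

def dft_at(ys, k):
    # evaluate the DFT summation at an arbitrary value of k
    N = len(ys)
    n = np.arange(N)
    return np.sum(ys * np.exp(-2j * np.pi * n * k / N))

ys = np.array([0.6, 0.25, 0.1, 0.05])
N = len(ys)
assert np.isclose(dft_at(ys, 1), dft_at(ys, 1 + N))
assert np.isclose(dft_at(ys, 2), dft_at(ys, 2 - N))
\end{verbatim}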

As an aside, writing the DFT in the form of a summation provides an
insight into how it works. If you review the diagram in
Section~\ref{synthesis}, you'll see that each column of the synthesis matrix
is a signal evaluated at a sequence of times. The analysis matrix is
the (conjugate) transpose of the synthesis matrix, so each {\em row}
is a signal evaluated at a sequence of times.

Therefore, each summation is the correlation of $y$ with one of the
signals in the array (see Section~\ref{dotproduct}). That is, each
element of the DFT is a correlation that quantifies the similarity of
the wave array, $y$, and a complex exponential at a particular
frequency.

\section{DFT of real signals}

\begin{figure}
% dft.py
\centerline{\includegraphics[height=2.5in]{figs/dft3.pdf}}
\caption{DFT of a 500 Hz sawtooth signal sampled at 10 kHz.}
\label{fig.dft3}
\end{figure}

The Spectrum class in {\tt thinkdsp} is based on {\tt np.fft.rfft},
which computes the ``real DFT''; that is, it works with real signals.
But the DFT as presented in this chapter is more general than that; it
works with complex signals.

So what happens when we apply the ``full DFT'' to a real signal?
Let's look at an example:

\begin{verbatim}
signal = thinkdsp.SawtoothSignal(freq=500)
wave = signal.make_wave(duration=0.1, framerate=10000)
hs = dft(wave.ys)
amps = np.absolute(hs)
\end{verbatim}

This code makes a sawtooth wave with frequency 500 Hz, sampled at
framerate 10 kHz. {\tt hs} contains the complex DFT of the wave;
{\tt amps} contains the amplitude at each frequency. But what
frequency do these amplitudes correspond to? If we look at the
body of {\tt dft}, we see:

\begin{verbatim}
fs = np.arange(N)
\end{verbatim}

It's tempting to think that these values are the right frequencies.
The problem is that {\tt dft} doesn't know the sampling rate. The DFT
assumes that the duration of the wave is 1 time unit, so it thinks the
sampling rate is $N$ per time unit. In order to interpret the
frequencies, we have to convert from these arbitrary time units back
to seconds, like this:

\begin{verbatim}
fs = np.arange(N) * framerate / N
\end{verbatim}

With this change, the range of frequencies is from 0 to the actual
framerate, 10 kHz. Now we can plot the spectrum:

\begin{verbatim}
thinkplot.plot(fs, amps)
thinkplot.config(xlabel='frequency (Hz)',
                 ylabel='amplitude')
\end{verbatim}

Figure~\ref{fig.dft3} shows the amplitude of the signal for each
frequency component from 0 to 10 kHz. The left half of the figure
is what we should expect: the dominant frequency is at 500 Hz, with
harmonics dropping off like $1/f$.

But the right half of the figure is a surprise. Past 5000 Hz, the
amplitudes of the harmonics start growing again, peaking at 9500 Hz.
What's going on?

The answer: aliasing. Remember that with framerate 10000 Hz, the
folding frequency is 5000 Hz. As we saw in Section~\ref{aliasing},
a component at 5500 Hz is indistinguishable from a component
at 4500 Hz. When we evaluate the DFT at 5500 Hz, we get the same
value as at 4500 Hz. Similarly, the value at 6000 Hz is the same
as the one at 4000 Hz, and so on.

The DFT of a real signal is symmetric around the folding frequency.
Since there is no additional information past this point, we can
save time by evaluating only the first half of the DFT,
and that's exactly what {\tt np.fft.rfft} does.
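
Here is a quick numerical check of both claims, using a random real
signal: the full DFT is (conjugate) symmetric around the folding
frequency, and {\tt np.fft.rfft} returns just the first half.

\begin{verbatim}
import numpy as np

ys = np.random.default_rng(42).standard_normal(16)
hs = np.fft.fft(ys)
N = len(ys)

# component k is the complex conjugate of component N-k
assert np.allclose(hs[1:], np.conj(hs[-1:0:-1]))

# rfft keeps only the first N//2 + 1 components
assert np.allclose(np.fft.rfft(ys), hs[:N//2 + 1])
\end{verbatim}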


\section{Exercises}

Solutions to these exercises are in {\tt chap07soln.ipynb}.

\begin{exercise}
The notebook for this chapter, {\tt chap07.ipynb}, contains
additional examples and explanations. Read through it and run
the code.
\end{exercise}


\begin{exercise}
In this chapter, I showed how we can express the DFT and inverse DFT
as matrix multiplications. These operations take time proportional to
$N^2$, where $N$ is the length of the wave array. That is fast enough
for many applications, but there is a faster
algorithm, the Fast Fourier Transform (FFT), which takes time
proportional to $N \log N$.

The key to the FFT is the Danielson-Lanczos lemma:
%
\[ DFT(y)[n] = DFT(e)[n] + \exp(-2 \pi i n / N) DFT(o)[n] \]
%
where $DFT(y)[n]$ is the $n$th element of the DFT of $y$; $e$ is a
wave array containing the even elements of $y$, and $o$ contains the
odd elements of $y$.
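
Before implementing anything, you can confirm the lemma numerically.
The sketch below uses {\tt np.fft.fft} for the half-size DFTs and
{\tt np.tile} to exploit their periodicity:

\begin{verbatim}
import numpy as np

ys = np.random.default_rng(1).standard_normal(8)
N = len(ys)
He = np.fft.fft(ys[::2])    # DFT of the even elements
Ho = np.fft.fft(ys[1::2])   # DFT of the odd elements

n = np.arange(N)
lemma = np.tile(He, 2) + np.exp(-2j * np.pi * n / N) * np.tile(Ho, 2)
assert np.allclose(lemma, np.fft.fft(ys))
\end{verbatim}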

This lemma suggests a recursive algorithm for the DFT:

\begin{enumerate}

\item Given a wave array, $y$, split it into its even elements, $e$,
  and its odd elements, $o$.

\item Compute the DFT of $e$ and $o$ by making recursive calls.

\item Compute $DFT(y)$ for each value of $n$ using the
  Danielson-Lanczos lemma.

\end{enumerate}

For the base case of this recursion, you could wait until the length
of $y$ is 1. In that case, $DFT(y) = y$. Or if the length of $y$
is sufficiently small, you could compute its DFT by matrix multiplication,
possibly using a precomputed matrix.

Hint: I suggest you implement this algorithm incrementally by starting
with a version that is not truly recursive. In Step 2, instead of
making a recursive call, use {\tt dft}, as defined in
Section~\ref{dftsection}, or {\tt np.fft.fft}. Get Step 3 working,
and confirm that the results are consistent with the other
implementations. Then add a base case and confirm that it works.
Finally, replace Step 2 with recursive calls.

One more hint: Remember that the DFT is periodic; you might find {\tt
np.tile} useful.

You can read more about the FFT at
\url{https://en.wikipedia.org/wiki/Fast_Fourier_transform}.

\end{exercise}


\chapter{Filtering and Convolution}

In this chapter I present one of the most important and useful
ideas related to signal processing: the Convolution Theorem.
But before we can understand the Convolution Theorem, we have to understand
convolution. I'll start with a simple example, smoothing, and
we'll go from there.

The code for this chapter is in {\tt chap08.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp08}.


\section{Smoothing}
\label{smoothing}

\begin{figure}
% convolution1.py
\centerline{\includegraphics[height=2.5in]{figs/convolution1.pdf}}
\caption{Daily closing price of Facebook stock and a 30-day moving
  average.}
\label{fig.convolution1}
\end{figure}

Smoothing is an operation that tries to remove short-term variations
from a signal in order to reveal long-term trends. For example, if
you plot daily changes in the price of a stock, it would look noisy;
a smoothing operator might make it easier to see whether the price
was generally going up or down over time.

A common smoothing algorithm is a moving average, which computes
the mean of the previous $n$ values, for some value of $n$.

For example, Figure~\ref{fig.convolution1} shows the daily
closing price of Facebook from May 17, 2012 to December 8,
2015. The gray line
is the raw data; the darker line shows the 30-day moving average.
Smoothing removes the most
extreme changes and makes it easier to see long-term trends.

Smoothing operations also apply to sound signals. As an example, I'll
start with a square wave at 440 Hz. As we saw in
Section~\ref{square}, the harmonics of a square wave drop off
slowly, so it contains many high-frequency components.

\begin{verbatim}
signal = thinkdsp.SquareSignal(freq=440)
wave = signal.make_wave(duration=1, framerate=44100)
segment = wave.segment(duration=0.01)
\end{verbatim}

{\tt wave} is a 1-second segment of the signal; {\tt segment}
is a shorter segment I'll use for plotting.

To compute the moving average of this signal, I'll create
a window with 11 elements and normalize it so the elements
add up to 1.

\begin{verbatim}
window = np.ones(11)
window /= sum(window)
\end{verbatim}

Now I can compute the average of the first 11 elements by
multiplying the window by the wave array:

\begin{verbatim}
ys = segment.ys
N = len(ys)
padded = thinkdsp.zero_pad(window, N)
prod = padded * ys
sum(prod)
\end{verbatim}

{\tt padded} is a version of the window with zeros added to
the end so it has the same length as {\tt segment.ys}.
{\tt prod} is the product of the window and the wave array.

The sum of the elementwise products is the average of the first 11
elements of the array. Since these elements are all -1, their average
is -1.

\begin{figure}
% convolution2.py
\centerline{\includegraphics[height=2.5in]{figs/convolution2.pdf}}
\caption{A square signal at 440 Hz (gray) and an 11-element
  moving average.}
\label{fig.convolution2}
\end{figure}

To compute the next element of the smoothed signal, we shift the
window to the right and compute the average of the next 11 elements of
the wave array, starting with the second.

\begin{verbatim}
rolled = np.roll(padded, 1)
prod = rolled * ys
sum(prod)
\end{verbatim}

The result is -1 again.
To compute the moving average, we repeat this process and store
the result in an array named {\tt smoothed}:

\begin{verbatim}
smoothed = np.zeros_like(ys)
rolled = padded
for i in range(N):
    smoothed[i] = sum(rolled * ys)
    rolled = np.roll(rolled, 1)
\end{verbatim}

{\tt rolled} is a copy of {\tt padded} that gets shifted to
the right by one element each time through the loop. Inside
the loop, we multiply the segment by {\tt rolled} to select
11 elements, and then add them up.

Figure~\ref{fig.convolution2} shows the result. The gray line
is the original signal; the darker line is the smoothed signal.
The smoothed signal starts to ramp up when the leading edge of
the window reaches the first transition, and levels off when
the window crosses the transition. As a result, the transitions
are less abrupt, and the corners less sharp. If you listen
to the smoothed signal, it sounds less buzzy and slightly muffled.
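
The same computation works without the book's helper classes. Here is
a self-contained sketch in which {\tt np.sign} of a sine stands in for
{\tt thinkdsp.SquareSignal}:

\begin{verbatim}
import numpy as np

framerate = 44100
ts = np.arange(441) / framerate
ys = np.sign(np.sin(2 * np.pi * 440 * ts))   # square wave at 440 Hz

window = np.ones(11) / 11
N = len(ys)
padded = np.concatenate([window, np.zeros(N - len(window))])

smoothed = np.zeros_like(ys)
rolled = padded
for i in range(N):
    smoothed[i] = np.sum(rolled * ys)
    rolled = np.roll(rolled, 1)

# agrees with np.convolve except where the window wraps around the end
conv = np.convolve(ys, window, mode='valid')
assert np.allclose(smoothed[:len(conv)], conv)
\end{verbatim}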


\section{Convolution}
\label{convolution}

The operation we just computed is called {\bf convolution},
and it is such a common operation that NumPy provides an
implementation that is simpler and faster than my version:

\begin{verbatim}
convolved = np.convolve(ys, window, mode='valid')
smooth2 = thinkdsp.Wave(convolved, framerate=wave.framerate)
\end{verbatim}

{\tt np.convolve} computes the convolution of the wave
array and the window. The mode flag {\tt valid} indicates
that it should only compute values when the window and the
wave array overlap completely, so it stops when the right
edge of the window reaches the end of the wave array. Other
than that, the result is the same as in Figure~\ref{fig.convolution2}.

\newcommand{\conv}{\ast}

Actually, there is one other difference. The loop in the
previous section actually computes {\bf cross-correlation}:
%
\[ (f \star g)[n] = \sum_{m=0}^{N-1} f[m] g[n+m] \]
%
where $f$ is a wave array with length $N$, $g$ is the window,
and $\star$ is the symbol for cross-correlation. To
compute the $n$th element of the result, we shift $g$ to
the right, which is why the index is $n+m$.

The definition of convolution is slightly different:
%
\[ (f \conv g)[n] = \sum_{m=0}^{N-1} f[m] g[n-m] \]
%
The symbol $\conv$ represents convolution. The difference is in the
index of $g$: $m$ has been negated, so the summation iterates the
elements of $g$ backward (assuming that negative indices wrap around
to the end of the array).

Because the window we used in this example is symmetric,
cross-correlation and convolution yield the same result. When we use
other windows, we will have to be more careful.
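
To see the difference, try an asymmetric window: convolving with $g$
gives the same result as correlating with $g$ reversed.

\begin{verbatim}
import numpy as np

f = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 0.0, -1.0])   # an asymmetric window

# convolution reverses g; correlation does not
assert np.allclose(np.convolve(f, g, mode='valid'),
                   np.correlate(f, g[::-1], mode='valid'))
\end{verbatim}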

You might wonder why convolution is defined the way it is. There
are two reasons:

\begin{itemize}

\item This definition comes up naturally for several applications,
  especially analysis of signal-processing systems, which is
  the topic of Chapter~\ref{systems}.

\item Also, this definition is the basis of the Convolution Theorem,
  coming up very soon.

\end{itemize}


\section{The frequency domain}

\begin{figure}
% convolution4.py
\centerline{\includegraphics[height=2.5in]{figs/convolution4.pdf}}
\caption{Spectrum of the square wave before and after smoothing.}
\label{fig.convolution4}
\end{figure}

Smoothing makes the transitions in a square signal less abrupt,
and makes the sound slightly muffled. Let's see what effect this
operation has on the spectrum. First I'll plot the spectrum
of the original wave:

\begin{verbatim}
spectrum = wave.make_spectrum()
spectrum.plot(color=GRAY)
\end{verbatim}

Then the smoothed wave:

\begin{verbatim}
convolved = np.convolve(wave.ys, window, mode='same')
smooth = thinkdsp.Wave(convolved, framerate=wave.framerate)
spectrum2 = smooth.make_spectrum()
spectrum2.plot()
\end{verbatim}

The mode flag {\tt same} indicates that the result should have the
same length as the input. In this example, it will include a few values
that ``wrap around'', but that's ok for now.

Figure~\ref{fig.convolution4} shows the result. The fundamental
frequency is almost unchanged; the first few harmonics are
attenuated, and the higher harmonics are almost eliminated. So
smoothing has the effect of a low-pass filter, which we
saw in Section~\ref{spectrums} and Section~\ref{pink}.

To see how much each component has been attenuated, we can
compute the ratio of the two spectrums:

\begin{verbatim}
amps = spectrum.amps
amps2 = spectrum2.amps
ratio = amps2 / amps
ratio[amps<560] = 0
thinkplot.plot(ratio)
\end{verbatim}

{\tt ratio} is the ratio of the amplitude before and after
smoothing. When {\tt amps} is small, this ratio can be big
and noisy, so for simplicity I set the ratio to 0 except
where the harmonics are.

\begin{figure}
% convolution5.py
\centerline{\includegraphics[height=2.5in]{figs/convolution5.pdf}}
\caption{Ratio of spectrums for the square wave, before and after smoothing.}
\label{fig.convolution5}
\end{figure}

Figure~\ref{fig.convolution5} shows the result. As expected, the
ratio is high for low frequencies and drops off at a cutoff frequency
near 4000 Hz. But there is another feature we did not expect: above
the cutoff, the ratio bounces around between 0 and 0.2.
What's up with that?


\section{The convolution theorem}
\label{convtheorem}

\begin{figure}
% convolution6.py
\centerline{\includegraphics[height=2.5in]{figs/convolution6.pdf}}
\caption{Ratio of spectrums for the square wave, before and after
  smoothing, along with the DFT of the smoothing window.}
\label{fig.convolution6}
\end{figure}

\newcommand{\DFT}{\mathrm{DFT}}
\newcommand{\IDFT}{\mathrm{IDFT}}

The answer is the convolution theorem. Stated mathematically:
%
\[ \DFT(f \conv g) = \DFT(f) \cdot \DFT(g) \]
%
where $f$ is a wave array and $g$ is a window. In words,
the convolution theorem says that if we convolve $f$ and $g$,
and then compute the DFT, we get the same answer as computing
the DFT of $f$ and $g$, and then multiplying the results
element-wise. More concisely, convolution in the time
domain corresponds to multiplication in the frequency domain.
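
We can verify the theorem numerically. One detail to keep in mind:
for the DFT, the theorem holds exactly for {\em circular} convolution,
where the window wraps around from the end of the array to the
beginning.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal(16)
g = rng.standard_normal(16)
N = len(f)

# circular convolution of f and g, computed from the definition
conv = np.array([sum(f[m] * g[(n - m) % N] for m in range(N))
                 for n in range(N)])

assert np.allclose(np.fft.fft(conv), np.fft.fft(f) * np.fft.fft(g))
\end{verbatim}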

And that explains Figure~\ref{fig.convolution5}, because when we
convolve a wave and a window, we multiply the spectrum of
the wave with the spectrum of the window. To see how that works,
we can compute the DFT of the window:

\begin{verbatim}
padded = zero_pad(window, N)
dft_window = np.fft.rfft(padded)
thinkplot.plot(abs(dft_window))
\end{verbatim}

{\tt padded} contains the window, padded with zeros to be the
same length as {\tt wave}. \verb"dft_window" contains the
DFT of the smoothing window.

\newcommand{\abs}{\mathrm{abs}}

Figure~\ref{fig.convolution6} shows the result, along with the
ratios we computed in the previous section. The ratios are
exactly the amplitudes in \verb"dft_window". Mathematically:
%
\[ \abs(\DFT(f \conv g)) / \abs(\DFT(f)) = \abs(\DFT(g)) \]
%
In this context, the DFT of a window is called a {\bf filter}.
For any convolution window in the time domain, there is a
corresponding filter in the frequency domain. And for any
filter that can be expressed by element-wise multiplication in
the frequency domain, there is a corresponding window.


\section{Gaussian filter}

\begin{figure}
% convolution7.py
\centerline{\includegraphics[height=2.5in]{figs/convolution7.pdf}}
\caption{Boxcar and Gaussian windows.}
\label{fig.convolution7}
\end{figure}

The moving average window we used in the previous section is a
low-pass filter, but it is not a very good one. The DFT drops off
steeply at first, but then it bounces around. Those bounces are
called {\bf sidelobes}, and they are there because the moving average
window is like a square wave, so its spectrum contains high-frequency
harmonics that drop off proportionally to $1/f$, which is relatively
slow.

We can do better with a Gaussian window. SciPy provides functions
that compute many common convolution windows, including {\tt
gaussian}:

\begin{verbatim}
gaussian = scipy.signal.gaussian(M=11, std=2)
gaussian /= sum(gaussian)
\end{verbatim}

{\tt M} is the number of elements in the window; {\tt std}
is the standard deviation of the Gaussian distribution used to
compute it. Figure~\ref{fig.convolution7} shows the shape
of the window. It is a discrete approximation of the Gaussian
``bell curve''. The figure also shows the moving average window
from the previous example, which is sometimes called a
{\bf boxcar window} because it looks like a rectangular railway car.

\begin{figure}
% convolution.py
\centerline{\includegraphics[height=2.5in]{figs/convolution8.pdf}}
\caption{Ratio of spectrums before and after Gaussian smoothing, and
  the DFT of the window.}
\label{fig.convolution8}
\end{figure}

I ran the computations from the previous sections again
with this curve, and generated Figure~\ref{fig.convolution8},
which shows the ratio of the spectrums before and after
smoothing, along with the DFT of the Gaussian window.

As a low-pass filter, Gaussian smoothing is better than a simple
moving average. After the ratio drops off, it stays low, with almost
none of the sidelobes we saw with the boxcar window. So it does a
better job of cutting off the higher frequencies.

The reason it does so well is that the DFT of a Gaussian curve is also a
Gaussian curve. So the ratio drops off in proportion to $\exp(-f^2)$,
which is much faster than $1/f$.
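
The following sketch compares the two filters directly. The Gaussian
here is computed with NumPy, which should match
{\tt scipy.signal.gaussian(M=11, std=2)}; beyond the cutoff, its
response stays well below the boxcar's sidelobes.

\begin{verbatim}
import numpy as np

N = 512
boxcar = np.ones(11) / 11
n = np.arange(11)
gaussian = np.exp(-0.5 * ((n - 5) / 2.0)**2)
gaussian /= gaussian.sum()

def filter_of(window):
    # zero-pad the window and compute the magnitude of its DFT
    padded = np.concatenate([window, np.zeros(N - len(window))])
    return np.abs(np.fft.rfft(padded))

# beyond the cutoff, the Gaussian response is far below the
# boxcar sidelobes
assert filter_of(gaussian)[100:].max() < filter_of(boxcar)[100:].max()
\end{verbatim}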


\section{Efficient convolution}
\label{effconv}

One of the reasons the FFT is such an important algorithm is that,
combined with the Convolution Theorem, it provides an efficient
way to compute convolution, cross-correlation, and autocorrelation.

Again, the Convolution Theorem states
%
\[ \DFT(f \conv g) = \DFT(f) \cdot \DFT(g) \]
%
So one way to compute a convolution is:
%
\[ f \conv g = \IDFT(\DFT(f) \cdot \DFT(g)) \]
%
where $\IDFT$ is the inverse DFT. A simple implementation of
convolution takes time proportional to $N^2$; this algorithm,
using FFT, takes time proportional to $N \log N$.

We can confirm that it works by computing the same convolution
both ways. As an example, I'll apply it to a time series of
daily closing prices for BitCoin.

\begin{verbatim}
df = pandas.read_csv('coindesk-bpi-USD-close.csv',
                     nrows=1625,
                     parse_dates=[0])
ys = df.Close.values
\end{verbatim}

This example uses Pandas to read the data from the CSV file (included
in the repository for this book). If you are not familiar with
Pandas, don't worry: I'm not going to do much with it in this book.
But if you're interested, you can learn more about it in
{\it Think Stats} at \url{http://thinkstats2.com}.

The result, {\tt df}, is a DataFrame, one of the data structures
provided by Pandas. {\tt ys} is a NumPy array that contains daily
closing prices.

Next I'll create a Gaussian window and convolve it with {\tt ys}:

\begin{verbatim}
window = scipy.signal.gaussian(M=30, std=6)
window /= window.sum()
smoothed = np.convolve(ys, window, mode='valid')
\end{verbatim}

\verb"fft_convolve" computes the same thing using FFT:

\begin{verbatim}
from numpy.fft import fft, ifft

def fft_convolve(signal, window):
    fft_signal = fft(signal)
    fft_window = fft(window)
    return ifft(fft_signal * fft_window)
\end{verbatim}

We can test it by padding the window to the same length
as {\tt ys} and then computing the convolution:

\begin{verbatim}
N = len(ys)
padded = zero_pad(window, N)
smoothed2 = fft_convolve(ys, padded)
\end{verbatim}

The result has $M-1$ bogus values at the beginning, where $M$ is the
length of the window. If we slice off the bogus values, the result
agrees with {\tt smoothed} with about 12 digits of precision.

\begin{verbatim}
M = len(window)
smoothed2 = smoothed2[M-1:]
\end{verbatim}
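
Here is the same check as a self-contained sketch, with random data
standing in for the closing prices and a boxcar window for simplicity:

\begin{verbatim}
import numpy as np

def fft_convolve(signal, window):
    # circular convolution via the Convolution Theorem
    return np.fft.ifft(np.fft.fft(signal) * np.fft.fft(window))

rng = np.random.default_rng(2)
ys = rng.standard_normal(200)      # stand-in for the closing prices
window = np.ones(30) / 30

N = len(ys)
padded = np.concatenate([window, np.zeros(N - len(window))])
smoothed2 = fft_convolve(ys, padded).real

# discard the M-1 wrap-around values at the beginning
M = len(window)
smoothed = np.convolve(ys, window, mode='valid')
assert np.allclose(smoothed2[M-1:], smoothed)
\end{verbatim}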


\section{Efficient autocorrelation}

\begin{figure}
% convolution.py
\centerline{\includegraphics[height=2.5in]{figs/convolution9.pdf}}
\caption{Autocorrelation functions computed by NumPy and
  {\tt fft\_correlate}.}
\label{fig.convolution9}
\end{figure}

In Section~\ref{convolution} I presented definitions of
cross-correlation and convolution, and we saw that they are
almost the same, except that in convolution the window is
reversed.

Now that we have an efficient algorithm for convolution, we
can also use it to compute cross-correlations and autocorrelations.
Using the data from the previous section, we can compute the
autocorrelation of the closing prices:

\begin{verbatim}
corrs = np.correlate(ys, ys, mode='same')
\end{verbatim}

With {\tt mode='same'}, the result has the same length as {\tt ys},
corresponding to lags from $-N/2$ to $N/2-1$.
The gray line in Figure~\ref{fig.convolution9} shows the result.
Except at {\tt lag=0}, there are no peaks, so there is no apparent
periodic behavior in this signal. However, the autocorrelation
function drops off slowly, suggesting that this signal resembles
pink noise, as we saw in Section~\ref{autopink}.

To compute autocorrelation using convolution, we
have to zero-pad the signal to double the length.
This trick is necessary because the FFT is based
on the assumption that the signal is periodic; that is, that it wraps
around from the end to the beginning. With time-series data like
this, that assumption is invalid. Adding zeros, and then trimming
the results, removes the bogus values.

Also, remember that convolution reverses the direction of the window.
In order to cancel that effect, we reverse the direction of the
window before calling \verb"fft_convolve", using {\tt np.flipud},
which flips a NumPy array. The result is a view of the array,
not a copy, so this operation is fast.

\begin{verbatim}
def fft_autocorr(signal):
    N = len(signal)
    signal = thinkdsp.zero_pad(signal, 2*N)
    window = np.flipud(signal)

    corrs = fft_convolve(signal, window)
    corrs = np.roll(corrs, N//2+1)[:N]
    return corrs
\end{verbatim}

The result from \verb"fft_convolve" has length $2N$. Of those,
the first and last $N/2$ are valid; the rest are the result of
zero-padding. To select the valid elements, we roll the results
and select the first $N$, corresponding to lags from $-N/2$ to
$N/2-1$.
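
Here is a self-contained version of this function, with plain NumPy
replacing {\tt thinkdsp.zero\_pad}, checked against
{\tt np.correlate}:

\begin{verbatim}
import numpy as np

def fft_convolve(signal, window):
    return np.fft.ifft(np.fft.fft(signal) * np.fft.fft(window))

def fft_autocorr(signal):
    N = len(signal)
    signal = np.concatenate([signal, np.zeros(N)])  # zero-pad to 2N
    window = np.flipud(signal)
    corrs = fft_convolve(signal, window).real
    return np.roll(corrs, N//2 + 1)[:N]

ys = np.random.default_rng(3).standard_normal(100)
assert np.allclose(fft_autocorr(ys), np.correlate(ys, ys, mode='same'))
\end{verbatim}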

As shown in Figure~\ref{fig.convolution9}, the results from
\verb"fft_autocorr" and {\tt np.correlate} are identical (with
about 9 digits of precision).

Notice that the correlations in Figure~\ref{fig.convolution9} are
large numbers; we could normalize them (between -1 and 1) as shown
in Section~\ref{correlate}.

The strategy we used here for autocorrelation also works for
cross-correlation. Again, you have to prepare the signals by flipping
one and padding both, and then you have to trim the invalid parts of
the result. This padding and trimming is a nuisance, but that's why
libraries like NumPy provide functions to do it for you.


\section{Exercises}

Solutions to these exercises are in {\tt chap08soln.ipynb}.

\begin{exercise}
The notebook for this chapter is {\tt chap08.ipynb}.
Read through it and run the code.

It contains an interactive widget that lets you
experiment with the parameters of the Gaussian window to see
what effect they have on the cutoff frequency.

What goes wrong when you increase the width of the Gaussian,
{\tt std}, without increasing the number of elements in the window,
{\tt M}?
\end{exercise}


\begin{exercise}
In this chapter I claimed that the Fourier transform of a Gaussian
curve is also a Gaussian curve. For discrete Fourier transforms,
this relationship is approximately true.

Try it out for a few examples. What happens to the Fourier transform
as you vary {\tt std}?
\end{exercise}


\begin{exercise}
If you did the exercises in Chapter~\ref{nonperiodic}, you saw the
effect of the Hamming window, and some of the other windows provided
by NumPy, on spectral leakage. We can get some insight into the
effect of these windows by looking at their DFTs.

In addition to the Gaussian window we used in this chapter, create a
Hamming window with the same size. Zero pad the windows and plot
their DFTs. Which window acts as a better low-pass filter? You might
find it useful to plot the DFTs on a log-$y$ scale.

Experiment with a few different windows and a few different sizes.
\end{exercise}
5140
5141
5142
5143
\chapter{Differentiation and Integration}
\label{diffint}

This chapter picks up where the previous chapter left off,
looking at the relationship between windows in the time domain
and filters in the frequency domain.

In particular, we'll look at the effect of a finite difference
window, which approximates differentiation, and the cumulative
sum operation, which approximates integration.

The code for this chapter is in {\tt chap09.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp09}.


\section{Finite differences}
\label{diffs}

In Section~\ref{smoothing}, we applied a smoothing window to
the stock price of Facebook and found that a smoothing
window in the time domain corresponds to a low-pass filter in
the frequency domain.

In this section, we'll look at daily price changes and
see that computing the difference between successive elements,
in the time domain, corresponds to a high-pass filter.

Here's the code to read the data, store it as a wave, and compute its
spectrum.

\begin{verbatim}
import pandas as pd

names = ['date', 'open', 'high', 'low', 'close', 'volume']
df = pd.read_csv('fb.csv', header=0, names=names)
ys = df.close.values[::-1]
close = thinkdsp.Wave(ys, framerate=1)
spectrum = close.make_spectrum()
\end{verbatim}

This example uses Pandas to read the CSV file; the
result is a DataFrame, {\tt df}, with columns for the opening
price, closing price, and high and low prices. I select the closing
prices and save them in a Wave object.
The framerate is 1 sample per day.

Figure~\ref{fig.diff_int1} shows
this time series and its spectrum.
Visually, the time series resembles Brownian noise (see
Section~\ref{brownian}).
And the spectrum looks like a straight
line, albeit a noisy one. The estimated slope is -1.9,
which is consistent with Brownian noise.

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int1.pdf}}
\caption{Daily closing price of Facebook and the spectrum of this time
series.}
\label{fig.diff_int1}
\end{figure}

Now let's compute the daily price change using {\tt np.diff}:

\begin{verbatim}
diff = np.diff(ys)
change = thinkdsp.Wave(diff, framerate=1)
change_spectrum = change.make_spectrum()
\end{verbatim}

Figure~\ref{fig.diff_int2} shows the resulting wave and its spectrum.
The daily changes resemble white noise, and the estimated slope of the
spectrum, -0.06, is near zero, which is what we expect for white
noise.

\begin{figure}
% diff_int2.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int2.pdf}}
\caption{Daily price change of Facebook and the spectrum of this time series.}
\label{fig.diff_int2}
\end{figure}

\section{The frequency domain}

Computing the difference
between successive elements is the same as convolution with
the window {\tt [1, -1]}.
If the order of those elements seems backward,
remember that convolution reverses the window before applying it
to the signal.
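
To check this equivalence numerically, here is a short NumPy sketch
(my example, independent of {\tt thinkdsp}) showing that {\tt np.diff}
matches convolution with the window {\tt [1, -1]}; the price array is
made up for the demonstration.

```python
import numpy as np

# A made-up price series standing in for the Facebook data.
ys = np.array([10.0, 11.5, 11.0, 12.25, 13.0])

# Difference between successive elements.
diff = np.diff(ys)

# Convolution with the window [1, -1]; 'valid' mode keeps only the
# positions where the window fully overlaps the signal.
conv = np.convolve(ys, [1, -1], mode='valid')

print(diff)   # values: 1.5, -0.5, 1.25, 0.75
print(conv)   # the same values
```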

We can see the effect of this operation in the frequency domain
by computing the DFT of the window.

\begin{verbatim}
diff_window = np.array([1.0, -1.0])
padded = thinkdsp.zero_pad(diff_window, len(close))
diff_wave = thinkdsp.Wave(padded, framerate=close.framerate)
diff_filter = diff_wave.make_spectrum()
\end{verbatim}

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int3.pdf}}
\caption{Filters corresponding to the diff and differentiate operators (left) and integration operator (right, log-$y$ scale).}
\label{fig.diff_int3}
\end{figure}

Figure~\ref{fig.diff_int3} shows the result. The finite difference
window corresponds to a high-pass filter: its amplitude increases with
frequency, linearly for low frequencies, and then sublinearly after
that. In the next section, we'll see why.

\section{Differentiation}
\label{effdiff}

The window we used in the previous section is a
numerical approximation of the first derivative, so the filter
approximates the effect of differentiation.

Differentiation in the time domain corresponds to a simple filter
in the frequency domain; we can figure out what it is with a little
math.

Suppose we have a complex sinusoid with frequency $f$:
%
\[ E_f(t) = e^{2 \pi i f t} \]
%
The first derivative of $E_f$ is
%
\[ \frac{d}{dt} E_f(t) = 2 \pi i f e^{2 \pi i f t} \]
%
which we can rewrite as
%
\[ \frac{d}{dt} E_f(t) = 2 \pi i f E_f(t) \]
%
In other words, taking the derivative of $E_f$ is the same
as multiplying by $2 \pi i f$, which is a complex number
with magnitude $2 \pi f$ and angle $\pi/2$.
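
As a quick numerical sanity check (my sketch, not from the book's
code), we can compare a centered finite difference of $E_f$ with the
analytic derivative $2 \pi i f E_f$; the frequency and grid are
arbitrary choices.

```python
import numpy as np

f = 3.0                            # arbitrary frequency, in Hz
t = np.linspace(0, 1, 10001)       # fine grid so the approximation is good
E = np.exp(2j * np.pi * f * t)     # the complex sinusoid E_f(t)

# Centered finite differences approximate dE/dt.
numeric = np.gradient(E, t)

# Analytic derivative: multiply by 2*pi*i*f.
analytic = 2j * np.pi * f * E

# Compare away from the endpoints, where np.gradient is one-sided.
err = np.max(np.abs(numeric[1:-1] - analytic[1:-1]))
print(err)   # small compared to the magnitude 2*pi*f ~ 18.8
```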

We can compute the filter that corresponds to differentiation,
like this:

\begin{verbatim}
deriv_filter = close.make_spectrum()
deriv_filter.hs = PI2 * 1j * deriv_filter.fs
\end{verbatim}

I started with the spectrum of {\tt close}, which has the right
size and framerate, then replaced the {\tt hs} with $2 \pi i f$.
Figure~\ref{fig.diff_int3} (left) shows this filter; it is a straight line.

As we saw in Section~\ref{synthmat}, multiplying a complex sinusoid
by a complex number has two effects: it multiplies
the amplitude, in this case by $2 \pi f$, and shifts the phase
offset, in this case by $\pi/2$.

If you are familiar with the language of operators and eigenfunctions,
each $E_f$ is an eigenfunction of the differentiation operator, with the
corresponding eigenvalue $2 \pi i f$. See
\url{http://en.wikipedia.org/wiki/Eigenfunction}.

If you are not familiar with that language, here's what it
means:

\newcommand{\op}{\mathcal{A}}

\begin{itemize}

\item An operator is a function that takes a function and returns
another function. For example, differentiation is an operator.

\item A function, $g$, is an eigenfunction of an operator, $\op$, if
applying $\op$ to $g$ has the effect of multiplying the function by
a scalar. That is, $\op g = \lambda g$.

\item In that case, the scalar $\lambda$ is the eigenvalue that
corresponds to the eigenfunction $g$.

\item A given operator might have many eigenfunctions, each with
a corresponding eigenvalue.

\end{itemize}

Because complex sinusoids are eigenfunctions of the differentiation
operator, they are easy to differentiate. All we have to do is
multiply by a complex scalar.

For signals with more than one
component, the process is only slightly harder:

\begin{enumerate}

\item Express the signal as the sum of complex sinusoids.

\item Compute the derivative of each component by multiplication.

\item Add up the differentiated components.

\end{enumerate}

If that process sounds familiar, that's because it is identical
to the algorithm for convolution in Section~\ref{effconv}: compute
the DFT, multiply by a filter, and compute the inverse DFT.
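
Here is a sketch of those three steps in plain NumPy, without
{\tt thinkdsp}, applied to a sine wave whose derivative we know
analytically; the sample rate and frequency are arbitrary choices.
For a periodic, band-limited signal like this one, the spectral
derivative is exact to floating-point precision.

```python
import numpy as np

framerate = 64
f = 4.0                                  # arbitrary frequency, in Hz
t = np.arange(64) / framerate            # one second, 64 samples
x = np.sin(2 * np.pi * f * t)

# Step 1: express the signal as a sum of complex sinusoids.
hs = np.fft.fft(x)
fs = np.fft.fftfreq(len(x), d=1/framerate)

# Step 2: differentiate each component by multiplying by 2*pi*i*f.
hs = hs * (2j * np.pi * fs)

# Step 3: add up the differentiated components (inverse DFT).
deriv = np.fft.ifft(hs).real

# Analytic derivative of sin(2*pi*f*t) is 2*pi*f*cos(2*pi*f*t).
expected = 2 * np.pi * f * np.cos(2 * np.pi * f * t)
print(np.max(np.abs(deriv - expected)))   # near machine precision
```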

{\tt Spectrum} provides a method that applies the differentiation
filter:

\begin{verbatim}
# class Spectrum:

    def differentiate(self):
        self.hs *= PI2 * 1j * self.fs
\end{verbatim}

We can use it to compute the derivative of the Facebook time series:

\begin{verbatim}
deriv_spectrum = close.make_spectrum()
deriv_spectrum.differentiate()
deriv = deriv_spectrum.make_wave()
\end{verbatim}

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int4.pdf}}
\caption{Comparison of daily price changes computed by
{\tt np.diff} and by applying the differentiation filter.}
\label{fig.diff_int4}
\end{figure}

Figure~\ref{fig.diff_int4} compares the daily price changes computed by
{\tt np.diff} with the derivative we just computed.
I selected the first 50 values in the time series so we can see the
differences more clearly.

The derivative is noisier, because it amplifies the high frequency
components more, as shown in Figure~\ref{fig.diff_int3} (left). Also, the
first few elements of the derivative are very noisy. The problem
there is that the DFT-based derivative is based on the assumption that
the signal is periodic. In effect, it connects the last element in
the time series back to the first element, which creates artifacts at
the boundaries.

To summarize, we have shown:

\begin{itemize}

\item Computing the difference between successive values in a signal
can be expressed as convolution with a simple window. The result is
an approximation of the first derivative.

\item Differentiation in the time domain corresponds to a simple
filter in the frequency domain. For periodic signals, the result is
the first derivative, exactly. For some non-periodic signals, it
can approximate the derivative.

\end{itemize}

Using the DFT to compute derivatives is the basis of {\bf spectral
methods} for solving differential equations (see
\url{http://en.wikipedia.org/wiki/Spectral_method}).

In particular, it is useful for the analysis of linear, time-invariant
systems, which is coming up in Chapter~\ref{systems}.


\section{Integration}

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int5.pdf}}
\caption{Comparison of the original time series and the integrated
derivative.}
\label{fig.diff_int5}
\end{figure}

In the previous section, we showed that differentiation in the time
domain corresponds to a simple filter in the frequency domain: it
multiplies each component by $2 \pi i f$. Since integration is
the inverse of differentiation, it also corresponds to a simple
filter: it divides each component by $2 \pi i f$.

We can compute this filter like this:

\begin{verbatim}
integ_filter = close.make_spectrum()
integ_filter.hs = 1 / (PI2 * 1j * integ_filter.fs)
\end{verbatim}

Figure~\ref{fig.diff_int3} (right) shows this filter on a log-$y$ scale,
which makes it easier to see.

{\tt Spectrum} provides a method that applies the integration filter:

\begin{verbatim}
# class Spectrum:

    def integrate(self):
        self.hs /= PI2 * 1j * self.fs
\end{verbatim}

We can confirm that the integration filter is correct by applying it
to the spectrum of the derivative we just computed:

\begin{verbatim}
integ_spectrum = deriv_spectrum.copy()
integ_spectrum.integrate()
\end{verbatim}

But notice that at $f=0$, we are dividing by 0. The result in
NumPy is NaN, which is a special floating-point value that
represents ``not a number''. We can partially deal with this
problem by setting this value to 0 before converting the
spectrum back to a wave:

\begin{verbatim}
integ_spectrum.hs[0] = 0
integ_wave = integ_spectrum.make_wave()
\end{verbatim}

Figure~\ref{fig.diff_int5} shows this integrated derivative along with
the original time series. They are almost identical, but the
integrated derivative has been shifted down. The problem is that when
we clobbered the $f=0$ component, we set the bias of the signal to 0.
But that should not be surprising; in general, differentiation loses
information about the bias, and integration can't recover it. In some
sense, the NaN at $f=0$ is telling us that this element is unknown.

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int6.pdf}}
\caption{A sawtooth wave and its spectrum.}
\label{fig.diff_int6}
\end{figure}

If we provide this ``constant of integration'', the results are
identical, which confirms that this integration filter is the correct
inverse of the differentiation filter.
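
The round trip can be checked in plain NumPy (a sketch, not the
book's code): differentiate a periodic signal in the frequency
domain, integrate the result, restore the $f=0$ component, and
compare with the original.

```python
import numpy as np

rng = np.random.default_rng(17)
x = rng.standard_normal(128)         # an arbitrary signal, treated as periodic

hs = np.fft.fft(x)
fs = np.fft.fftfreq(len(x))          # framerate of 1 for simplicity

# Differentiate: multiply each component by 2*pi*i*f.
deriv_hs = hs * (2j * np.pi * fs)

# Integrate: divide by 2*pi*i*f, avoiding the f=0 bin.
integ_hs = np.zeros_like(deriv_hs)
integ_hs[1:] = deriv_hs[1:] / (2j * np.pi * fs[1:])

# Supply the "constant of integration": the original f=0 component.
integ_hs[0] = hs[0]

roundtrip = np.fft.ifft(integ_hs).real
print(np.max(np.abs(roundtrip - x)))   # near machine precision
```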

\section{Cumulative sum}
\label{cumsum}

In the same way that the diff operator approximates differentiation,
the cumulative sum approximates integration.
I'll demonstrate with a sawtooth signal.

\begin{verbatim}
signal = thinkdsp.SawtoothSignal(freq=50)
in_wave = signal.make_wave(duration=0.1, framerate=44100)
\end{verbatim}

Figure~\ref{fig.diff_int6} shows this wave and its spectrum.

{\tt Wave} provides a method that computes the cumulative sum of
a wave array and returns a new Wave object:

\begin{verbatim}
# class Wave:

    def cumsum(self):
        ys = np.cumsum(self.ys)
        ts = self.ts.copy()
        return Wave(ys, ts, self.framerate)
\end{verbatim}

We can use it to compute the cumulative sum of \verb"in_wave":

\begin{verbatim}
out_wave = in_wave.cumsum()
out_wave.unbias()
\end{verbatim}

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int7.pdf}}
\caption{A parabolic wave and its spectrum.}
\label{fig.diff_int7}
\end{figure}

Figure~\ref{fig.diff_int7} shows the resulting wave and its spectrum.
If you did the exercises in Chapter~\ref{harmonics}, this waveform should
look familiar: it's a parabolic signal.

Comparing the spectrum of the parabolic signal to the spectrum of the
sawtooth, the amplitudes of the components drop off more quickly. In
Chapter~\ref{harmonics}, we saw that the components of the sawtooth
drop off in proportion to $1/f$. Since the cumulative sum
approximates integration, and integration filters components in
proportion to $1/f$, the components of the parabolic wave drop off in
proportion to $1/f^2$.

We can see that graphically by computing the filter that corresponds
to the cumulative sum:

\begin{verbatim}
cumsum_filter = diff_filter.copy()
cumsum_filter.hs = 1 / cumsum_filter.hs
\end{verbatim}

Because {\tt cumsum} is the inverse operation of {\tt diff}, we
start with a copy of \verb"diff_filter", which is the filter
that corresponds to the {\tt diff} operation, and then invert the
{\tt hs}.
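
To see that {\tt cumsum} and {\tt diff} are (nearly) inverses in the
time domain, here is a minimal NumPy check with an arbitrary array;
the only caveat is the first element, which {\tt diff} discards.

```python
import numpy as np

x = np.array([2.0, 3.5, 3.0, 5.0, 4.5])

# diff then cumsum: recovers x, offset by its first value.
recovered = np.cumsum(np.diff(x)) + x[0]
print(recovered)                 # the values x[1:]

# cumsum then diff: recovers everything but the first element.
print(np.diff(np.cumsum(x)))     # also the values x[1:]
```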

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int8.pdf}}
\caption{Filters corresponding to cumulative sum and integration.}
\label{fig.diff_int8}
\end{figure}

Figure~\ref{fig.diff_int8} shows the filters corresponding to
cumulative sum and integration. The cumulative sum is a good
approximation of integration except at the highest frequencies,
where it drops off a little faster.

To confirm that this is the correct filter for the cumulative
sum, we can compare it to the ratio of the spectrum of
\verb"out_wave" to the spectrum of \verb"in_wave":

\begin{verbatim}
in_spectrum = in_wave.make_spectrum()
out_spectrum = out_wave.make_spectrum()
ratio_spectrum = out_spectrum.ratio(in_spectrum, thresh=1)
\end{verbatim}

\begin{figure}
% diff_int.py
\centerline{\includegraphics[height=2.5in]{figs/diff_int9.pdf}}
\caption{Filter corresponding to cumulative sum and actual ratios of
the before-and-after spectrums.}
\label{fig.diff_int9}
\end{figure}

And here's the method that computes the ratios:

\begin{verbatim}
    def ratio(self, denom, thresh=1):
        ratio_spectrum = self.copy()
        ratio_spectrum.hs /= denom.hs
        ratio_spectrum.hs[denom.amps < thresh] = np.nan
        return ratio_spectrum
\end{verbatim}

When {\tt denom.amps} is small, the resulting ratio is noisy,
so I set those values to NaN.

Figure~\ref{fig.diff_int9} shows the ratios and the filter
corresponding to the cumulative sum. They agree, which confirms that
inverting the filter for {\tt diff} yields the filter for {\tt
cumsum}.

Finally, we can confirm that the Convolution Theorem applies by
applying the {\tt cumsum} filter in the frequency domain:

\begin{verbatim}
out_wave2 = (in_spectrum * cumsum_filter).make_wave()
\end{verbatim}

Within the limits of floating-point error, \verb"out_wave2" is
identical to \verb"out_wave", which we computed using {\tt cumsum}, so
the Convolution Theorem works! But note that this demonstration only
works with periodic signals.
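
The Convolution Theorem itself is easy to check in plain NumPy
(my sketch, with arbitrary arrays): multiplying DFTs is the same as
circular convolution in the time domain.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(16)
h = rng.standard_normal(16)

# Frequency-domain product, then inverse DFT.
via_fft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)).real

# Direct circular convolution, from the definition.
n = len(x)
direct = np.array([sum(x[m] * h[(k - m) % n] for m in range(n))
                   for k in range(n)])

print(np.max(np.abs(via_fft - direct)))   # near machine precision
```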

\section{Integrating noise}

In Section~\ref{brownian}, we generated Brownian noise by computing the
cumulative sum of white noise.
Now that we understand the effect of {\tt cumsum} in the frequency
domain, we have some insight into the spectrum of Brownian noise.

White noise has equal power at all frequencies, on average. When we
compute the cumulative sum, the amplitude of each component is divided
by $f$. Since power is the square of magnitude, the power of each
component is divided by $f^2$. So on average, the power at frequency
$f$ is proportional to $1 / f^2$:
%
\[ P_f = K / f^2 \]
%
where $K$ is a constant that's not important.
Taking the log of both sides yields:
%
\[ \log P_f = \log K - 2 \log f \]
%
And that's why, when we plot the spectrum of Brownian noise on a
log-log scale, we expect to see a straight line with slope -2, at
least approximately.
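
We can check the predicted slope directly (a sketch, not from the
book's code): generate white noise, take its cumulative sum, and fit
a line to the log-log power spectrum. The sample size, seed, and
frequency cutoff are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
white = rng.standard_normal(2 ** 14)      # white noise
brownian = np.cumsum(white)               # Brownian noise

# Power spectrum (framerate 1, so f is in cycles per sample).
hs = np.fft.rfft(brownian)
fs = np.fft.rfftfreq(len(brownian))
power = np.abs(hs) ** 2

# Fit a line to log power versus log frequency at low frequencies,
# skipping f=0, where the power law does not apply.
mask = (fs > 0) & (fs < 0.1)
slope, intercept = np.polyfit(np.log(fs[mask]), np.log(power[mask]), 1)
print(slope)   # should be near -2
```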

In Section~\ref{diffs} we plotted the spectrum of closing prices for
Facebook, and estimated that the slope is -1.9, which is consistent
with Brownian noise. Many stock prices have similar spectrums.

When we use the {\tt diff} operator to compute daily changes, we
multiply the {\em amplitude} of each component by a filter proportional to
$f$, which means we multiply the {\em power} of each component by $f^2$.
On a log-log scale, this operation adds 2 to the slope of the
power spectrum, which is why the estimated slope of the result
is near 0.1 (but a little lower, because {\tt diff} only approximates
differentiation).

\section{Exercises}

The notebook for this chapter is {\tt chap09.ipynb}.
You might want to read through it and run the code.

Solutions to these exercises are in {\tt chap09soln.ipynb}.

\begin{exercise}
The goal of this exercise is to explore the effect of {\tt diff} and
{\tt differentiate} on a signal. Create a triangle wave and plot
it. Apply {\tt diff} and plot the result. Compute the spectrum of the
triangle wave, apply {\tt differentiate}, and plot the result. Convert
the spectrum back to a wave and plot it. Are there differences between
the effect of {\tt diff} and {\tt differentiate} for this wave?
\end{exercise}

\begin{exercise}
The goal of this exercise is to explore the effect of {\tt cumsum} and
{\tt integrate} on a signal. Create a square wave and plot it. Apply
{\tt cumsum} and plot the result. Compute the spectrum of the square
wave, apply {\tt integrate}, and plot the result. Convert the spectrum
back to a wave and plot it. Are there differences between the effect
of {\tt cumsum} and {\tt integrate} for this wave?
\end{exercise}

\begin{exercise}
The goal of this exercise is to explore the effect of integrating
twice. Create a sawtooth wave, compute its spectrum, then apply
{\tt integrate} twice. Plot the resulting wave and its spectrum. What is
the mathematical form of the wave? Why does it resemble a sinusoid?
\end{exercise}

\begin{exercise}
The goal of this exercise is to explore the effect of the 2nd
difference and 2nd derivative. Create a {\tt CubicSignal}, which is
defined in {\tt thinkdsp}. Compute the second difference by applying
{\tt diff} twice. What does the result look like? Compute the second
derivative by applying {\tt differentiate} to the spectrum twice.
Does the result look the same?

Plot the filters that correspond to the 2nd difference and the 2nd
derivative and compare them. Hint: In order to get the filters on the
same scale, use a wave with framerate 1.
\end{exercise}

\chapter{LTI systems}
\label{systems}

This chapter presents the theory of signals and systems, using
musical acoustics as an example. It explains an
important application of the Convolution Theorem, characterization
of linear, time-invariant systems (which I'll define soon).

The code for this chapter is in {\tt chap10.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp10}.


\section{Signals and systems}

In the context of signal processing, a {\bf system} is an abstract
representation of anything that takes a signal as input and produces
a signal as output.

For example, an electronic amplifier is a circuit that takes an
electrical signal as input and produces a (louder) signal as output.

As another example, when you listen to a musical performance, you
can think of the room as a system that takes the sound of the
performance at the location where it is generated and produces a
somewhat different sound at the location where you hear it.

A {\bf linear, time-invariant system}\footnote{My presentation here
follows \url{http://en.wikipedia.org/wiki/LTI_system_theory}.} is a
system with these two properties:

\begin{enumerate}

\item Linearity: If you put two inputs into the system at the same
time, the result is the sum of their outputs. Mathematically, if an
input $x_1$ produces output $y_1$ and another input $x_2$ produces
$y_2$, then $a x_1 + b x_2$ produces $a y_1 + b y_2$, where $a$ and
$b$ are scalars.

\item Time invariance: The
effect of the system doesn't vary over time, or depend on the state
of the system. So if inputs $x_1$ and $x_2$ differ by a shift in time,
their outputs $y_1$ and $y_2$ differ by the same shift, but are otherwise
identical.

\end{enumerate}
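
As a concrete check (my example, not from the book's code), here is a
tiny system, a 2-element moving average, tested for both properties
with arbitrary inputs.

```python
import numpy as np

def system(x):
    # A 2-element moving average: a simple LTI system.
    return np.convolve(x, [0.5, 0.5], mode='full')

rng = np.random.default_rng(3)
x1 = rng.standard_normal(8)
x2 = rng.standard_normal(8)
a, b = 2.0, -3.0

# Linearity: system(a*x1 + b*x2) == a*system(x1) + b*system(x2).
lhs = system(a * x1 + b * x2)
rhs = a * system(x1) + b * system(x2)
print(np.allclose(lhs, rhs))   # True

# Time invariance: delaying the input by one sample delays the
# output by one sample, and changes nothing else.
shifted = np.concatenate([[0.0], x1])
print(np.allclose(system(shifted)[1:], system(x1)))   # True
```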

Many physical systems have these properties, at least approximately.

\begin{itemize}

\item Circuits that contain only resistors, capacitors and inductors are
LTI, to the degree that the components behave like their idealized
models.

\item Mechanical systems that contain springs, masses and
dashpots are also LTI, assuming linear springs (force proportional
to displacement) and dashpots (force proportional to velocity).

\item Also, and most relevant to applications in this book,
the media that transmit sound (including air, water
and solids) are well-modeled by LTI systems.

\end{itemize}

LTI systems are described by linear differential equations, and
the solutions of those equations are complex sinusoids (see
\url{http://en.wikipedia.org/wiki/Linear_differential_equation}).

This result provides an algorithm for computing the effect of
an LTI system on an input signal:

\begin{enumerate}

\item Express the signal as the sum of complex sinusoid components.

\item For each input component, compute the corresponding output component.

\item Add up the output components.

\end{enumerate}

At this point, I hope this algorithm sounds familiar. It's the
same algorithm we used for convolution in Section~\ref{effconv}, and
for differentiation in Section~\ref{effdiff}. This process
is called {\bf spectral decomposition} because we ``decompose''
the input signal into its spectral components.

In order to apply this process to an LTI system, we have to {\bf
characterize} the system by finding its effect on each component
of the input signal. For mechanical systems, it turns out that there
is a simple and efficient way to do that: you kick it and record
the output.

Technically, the ``kick'' is called an {\bf impulse} and the
output is called the {\bf impulse response}. You might wonder
how a single impulse can completely characterize a system. You
can see the answer by computing the DFT of an impulse. Here's
a wave array with an impulse at $t=0$:

\begin{verbatim}
impulse = np.zeros(8)
impulse[0] = 1
impulse_spectrum = np.fft.fft(impulse)
\end{verbatim}

Here's the wave array:

\begin{verbatim}
[ 1.  0.  0.  0.  0.  0.  0.  0.]
\end{verbatim}

And here's its spectrum:

\begin{verbatim}
[ 1.+0.j  1.+0.j  1.+0.j  1.+0.j  1.+0.j  1.+0.j  1.+0.j  1.+0.j]
\end{verbatim}

The spectrum is all ones; that is, an impulse is the sum of components
with equal magnitudes at all frequencies. This
spectrum should not be confused with white noise, which has the same
{\em average} power at all frequencies, but varies around that
average.

When you test a system by inputting
an impulse, you are testing the response of the
system at all frequencies. And you can test them all at the same
time because the system is linear, so simultaneous tests don't
interfere with each other.
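
To make the contrast with white noise concrete, this sketch (my
example) compares the spectrum of an impulse with the spectrum of
white noise: the first is exactly 1 everywhere, the second only on
average.

```python
import numpy as np

rng = np.random.default_rng(2)

impulse = np.zeros(8)
impulse[0] = 1
noise = rng.standard_normal(8)

imp_mags = np.abs(np.fft.fft(impulse))
noise_mags = np.abs(np.fft.fft(noise))

print(imp_mags)     # exactly 1 at every frequency
print(noise_mags)   # varies from frequency to frequency
```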

\section{Windows and filters}
\label{winfilt}

To show why this kind of system characterization works, I
will start with a simple example: a 2-element moving average.
We can think of this operation as a system that takes a signal
as an input and produces a slightly smoother signal as an output.

In this example we know what the window is, so we can compute
the corresponding filter. But that's not usually the case; in the
next section we'll look at an example where we don't know the
window or the filter ahead of time.

\begin{figure}
% systems.py
\centerline{\includegraphics[height=2.5in]{figs/systems1.pdf}}
\caption{DFT of a 2-element moving average window.}
\label{fig.systems1}
\end{figure}

Here's a window that computes a
2-element moving average (see Section~\ref{smoothing}):

\begin{verbatim}
window_array = np.array([0.5, 0.5, 0, 0, 0, 0, 0, 0,])
window = thinkdsp.Wave(window_array, framerate=8)
\end{verbatim}

We can find the corresponding filter by computing the DFT of the
window:

\begin{verbatim}
filtr = window.make_spectrum(full=True)
\end{verbatim}

Figure~\ref{fig.systems1} shows the result. The filter that corresponds to a
moving average window is a low-pass filter with the approximate shape
of a Gaussian curve.

Now imagine that we did not know the window or the corresponding filter,
and we wanted to characterize this system. We would do that by
inputting an impulse and measuring the impulse response.

In this example, we can compute the impulse response by multiplying
the spectrum of the impulse and the filter, and then converting the result
from a spectrum to a wave:

\begin{verbatim}
product = impulse_spectrum * filtr
filtered = product.make_wave()
\end{verbatim}

Since \verb"impulse_spectrum" is all ones, the product is
identical to the filter, and the filtered wave is identical to the
window.

This example demonstrates two things:

\begin{itemize}

\item Because the spectrum of an impulse is all ones, the DFT
of the impulse response is identical to the filter that
characterizes the system.

\item Therefore, the impulse response is identical to the
convolution window that characterizes the system.

\end{itemize}
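
The same demonstration in plain NumPy (a sketch, independent of
{\tt thinkdsp}): convolving an impulse with the moving-average window
returns the window itself, so the DFT of the impulse response is the
filter.

```python
import numpy as np

window = np.array([0.5, 0.5, 0, 0, 0, 0, 0, 0])
impulse = np.zeros(8)
impulse[0] = 1

# Apply the system to the impulse in the frequency domain:
# multiply the spectra, then invert. The result is the window.
response = np.fft.ifft(np.fft.fft(impulse) * np.fft.fft(window)).real
print(np.allclose(response, window))                          # True

# So the DFT of the impulse response equals the filter.
print(np.allclose(np.fft.fft(response), np.fft.fft(window)))  # True
```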

\section{Acoustic response}
\label{response}

To characterize the acoustic response of a room or open space, a
simple way to generate an impulse is to pop a balloon or
fire a gun. A gunshot puts an impulse into
the system; the sound you hear is the impulse response.

\begin{figure}
% systems.py
\centerline{\includegraphics[height=2.5in]{figs/systems6.pdf}}
\caption{Waveform of a gunshot.}
\label{fig.systems6}
\end{figure}

As an example, I'll use a recording of a gunshot to characterize
the room where the gun was fired, then use the impulse response
to simulate the effect of that room on a violin recording.

This example is in {\tt chap10.ipynb}, which is in the repository
for this book; you can also view it, and listen to the examples,
at \url{http://tinyurl.com/thinkdsp10}.

Here's the gunshot:

\begin{verbatim}
response = thinkdsp.read_wave('180961__kleeb__gunshots.wav')
response = response.segment(start=0.26, duration=5.0)
response.normalize()
response.plot()
\end{verbatim}

I select a segment starting at 0.26 seconds to remove the silence
before the gunshot. Figure~\ref{fig.systems6} (left) shows the
waveform of the gunshot. Next we compute the DFT of {\tt response}:

\begin{verbatim}
transfer = response.make_spectrum()
transfer.plot()
\end{verbatim}

Figure~\ref{fig.systems6} (right) shows the result. This spectrum
encodes the response of the room; for each frequency, the spectrum
contains a complex number that represents an amplitude multiplier and
a phase shift. This spectrum is called a {\bf transfer
function} because it contains information about how the system transfers
the input to the output.

Now we can simulate the effect this room would have on the sound
of a violin. Here is the violin recording we used in Section~\ref{violin}:

\begin{verbatim}
violin = thinkdsp.read_wave('92002__jcveliz__violin-origional.wav')
violin.truncate(len(response))
violin.normalize()
\end{verbatim}

The violin and gunshot waves were sampled at the same framerate,
44,100 Hz. And coincidentally, the duration of both is about the
same. I trimmed the violin wave to the same length as the gunshot.

Next I compute the DFT of the violin wave:

\begin{verbatim}
spectrum = violin.make_spectrum()
\end{verbatim}

Now I know the magnitude and phase of each component in the
input, and I know the transfer function of the system. Their
product is the DFT of the output, which we can use to compute the
output wave:

\begin{verbatim}
output = (spectrum * transfer).make_wave()
output.normalize()
output.plot()
\end{verbatim}

\begin{figure}
% systems.py
\centerline{\includegraphics[height=2.5in]{figs/systems7.pdf}}
\caption{The waveform of the violin recording before and after convolution.}
\label{fig.systems7}
\end{figure}

Figure~\ref{fig.systems7} shows the input (top) and output (bottom) of
the system. They are substantially different, and the differences are
clearly audible. Load {\tt chap10.ipynb} and listen to them. One
thing I find striking about this example is that you can get a sense
of what the room was like; to me, it sounds like a long, narrow room
with hard floors and ceilings. That is, like a firing range.
5991
5992
There's one thing I glossed over in this example that I'll mention
5993
in case it bothers anyone. The violin recording I started with
5994
has already been transformed by one system: the room where it was
5995
recorded. So what I really computed in my example is the sound
5996
of the violin after two transformations. To properly simulate
5997
the sound of a violin in a different room, I should have characterized
5998
the room where the violin was recorded and applied the inverse
5999
of that transfer function first.
6000
6001
6002
\section{Systems and convolution}
6003
\label{sysconv}
6004
6005
\begin{figure}
6006
% systems.py
6007
\centerline{\includegraphics[height=2.5in]{figs/systems8.pdf}}
6008
\caption{Sum of a wave and a shifted, scaled copy.}
6009
\label{fig.systems8}
6010
\end{figure}
6011
6012
If you think the previous example is black magic,
6013
you are not alone. I've been thinking about it for a while and it
6014
still makes my head hurt.
6015
6016
In the previous section, I suggested one way to think about it:
6017
6018
\begin{itemize}
6019
6020
\item An impulse is made up of components with amplitude 1 at all
6021
frequencies.
6022
6023
\item The impulse response contains the sum of the responses of the
6024
system to all of these components.
6025
6026
\item The transfer function, which is the DFT of the impulse response,
6027
encodes the effect of the system on each frequency component in the form
6028
of an amplitude multiplier and a phase shift.
6029
6030
\item For any input, we can compute the response of the system
6031
by breaking the input into components, computing the response to
6032
each component, and adding them up.
6033
6034
\end{itemize}

But if you don't like that, there's another way to think about
it altogether: convolution! By the Convolution Theorem, multiplication
in the frequency domain corresponds to convolution in the time
domain. In this example, the output of the system is the convolution
of the input and the system response.

Here are the keys to understanding why that works:

\begin{itemize}

\item You can think of the samples in the input wave as a sequence
of impulses with varying amplitude.

\item Each impulse in the input yields a copy of the impulse response,
shifted in time (because the system is time-invariant) and scaled by
the amplitude of the input.

\item The output is the sum of the shifted, scaled copies of the
impulse response. The copies add up because the system is linear.

\end{itemize}

Let's work our way up gradually: suppose that instead of firing one
gun, we fire two: a big one with amplitude 1 at $t=0$ and a
smaller one with amplitude 0.5 at $t=1$.

We can compute the response of the system by adding up
the original impulse response and a scaled, shifted copy of itself.
Here's a function that makes a shifted, scaled copy of
a wave:

\begin{verbatim}
def shifted_scaled(wave, shift, factor):
    res = wave.copy()
    res.shift(shift)
    res.scale(factor)
    return res
\end{verbatim}

The parameter {\tt shift} is a time shift in seconds; {\tt factor}
is a multiplicative factor.

Here's how we use it to compute the response to a two-gun salute:

\begin{verbatim}
shift = 1
factor = 0.5
gun2 = response + shifted_scaled(response, shift, factor)
\end{verbatim}

Figure~\ref{fig.systems8} shows the result. You can hear what
it sounds like in {\tt chap10.ipynb}. Not surprisingly, it
sounds like two gunshots, the first one louder than the second.

Now suppose instead of two guns, you add up 100 guns fired at a rate
of 441 shots per second.
This loop computes the result:

\begin{verbatim}
dt = 1 / 441
total = 0
for k in range(100):
    total += shifted_scaled(response, k*dt, 1.0)
\end{verbatim}

With 441 shots per second, you don't hear the individual shots.
Instead, it sounds like a periodic signal at 441 Hz. If you play
this example, it sounds like a car horn in a garage.

And that brings us to a key insight: you can think of any wave as a
series of samples, where each sample is an impulse with a different
amplitude.

As an example, I'll generate a sawtooth signal at 441 Hz:

\begin{verbatim}
signal = thinkdsp.SawtoothSignal(freq=441)
wave = signal.make_wave(duration=0.1,
                        framerate=response.framerate)
\end{verbatim}

Now I'll loop through the series of impulses that make up the
sawtooth, and add up the impulse responses:

\begin{verbatim}
total = 0
for t, y in zip(wave.ts, wave.ys):
    total += shifted_scaled(response, t, y)
\end{verbatim}

The result is what it would sound like to play a sawtooth wave in a
firing range. Again, you can listen to it in {\tt chap10.ipynb}.

\begin{figure}
\centerline{\includegraphics[height=1.2in]{figs/diagram2.pdf}}
\caption{Diagram of the sum of scaled and shifted copies of $g$.}
\label{fig.convolution}
\end{figure}

Figure~\ref{fig.convolution} shows a diagram of this computation,
where $f$ is the sawtooth, $g$ is the impulse response, and $h$
is the sum of the shifted, scaled copies of $g$.

For the example shown:

\[ h[2] = f[0]g[2] + f[1]g[1] + f[2]g[0] \]

And more generally,

\[ h[n] = \sum_{m=0}^{N-1} f[m] g[n-m] \]

You might recognize this equation from Section~\ref{convolution}. It
is the convolution of $f$ and $g$. This shows that if the input is $f$
and the impulse response of the system is $g$, the output is the
convolution of $f$ and $g$.
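
The equation translates directly into code. The following sketch
(plain Python with NumPy; the function name is mine) builds $h$ by
adding up shifted, scaled copies of $g$, which is exactly linear
convolution:

```python
import numpy as np

def convolve_sum(f, g):
    """Convolution as the sum of shifted, scaled copies of g."""
    h = np.zeros(len(f) + len(g) - 1)
    for m, fm in enumerate(f):
        h[m:m + len(g)] += fm * g    # shift g by m, scale by f[m]
    return h

f = np.array([1.0, 2.0, 3.0])
g = np.array([0.5, 0.25])
assert np.allclose(convolve_sum(f, g), np.convolve(f, g))
```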

In summary, there are two ways to think about the effect of a system
on a signal:

\begin{enumerate}

\item The input is a sequence of impulses, so the output is the sum of
scaled, shifted copies of the impulse response; that sum is the
convolution of the input and the impulse response.

\item The DFT of the impulse response is a transfer function that
encodes the effect of the system on each frequency component as a
magnitude and phase offset. The DFT of the input encodes the
magnitude and phase offset of the frequency components it contains.
Multiplying the DFT of the input by the transfer function yields
the DFT of the output.

\end{enumerate}

The equivalence of these descriptions should not be a surprise;
it is basically a statement of the Convolution Theorem:
convolution of $f$ and $g$ in the time
domain corresponds to multiplication in the frequency domain.
And if you wondered why convolution is defined as it is, which
seemed backwards when we talked about smoothing and difference
windows, now you know the reason: the definition of convolution
appears naturally in the response of an LTI system to a signal.


\section{Proof of the Convolution Theorem}

Well, I've put it off long enough. It's time to prove the Convolution
Theorem (CT), which states:

\[ \DFT(f \conv g) = \DFT(f) \DFT(g) \]

where $f$ and $g$ are vectors with the same length, $N$.

I'll proceed in two steps:

\begin{enumerate}

\item I'll show that in the special case where $f$ is a complex
exponential, convolution with $g$ has the effect of multiplying
$f$ by a scalar.

\item In the more general case where $f$ is not a complex exponential,
we can use the DFT to express it as a sum of exponential components,
compute the convolution of each component (by multiplication) and
then add up the results.

\end{enumerate}

Together these steps prove the Convolution Theorem. But first, let's
assemble the pieces we'll need. The DFT of $g$, which I'll call $G$,
is:
%
\[ \DFT(g)[k] = G[k] = \sum_n g[n] \exp(-2 \pi i n k / N) \]
%
where $k$ is an index of frequency from
0 to $N-1$ and $n$ is an index of time from 0 to $N-1$.
The result is a vector of $N$ complex numbers.

The inverse DFT of $F$, which I'll call $f$, is:
%
\[ \IDFT(F)[n] = f[n] = \sum_k F[k] \exp(2 \pi i n k / N) \]
%
Here's the definition of convolution:
%
\[ (f \conv g)[n] = \sum_m f[m] g[n-m] \]
%
where $m$ is another index of time from 0 to $N-1$.
Convolution is commutative, so I could equivalently write:
%
\[ (f \conv g)[n] = \sum_m f[n-m] g[m] \]
%
Now let's consider the special case where $f$ is a complex
exponential with frequency $k$, which I'll call $e_k$:
%
\[ f[n] = e_k[n] = \exp(2 \pi i n k / N) \]
%
where $k$ is an index of frequency and $n$ is an index of time.

Plugging $e_k$ into the second definition of convolution yields
%
\[ (e_k \conv g)[n] = \sum_m \exp(2 \pi i (n-m) k / N) g[m] \]
%
We can split the first term into a product:
%
\[ (e_k \conv g)[n] = \sum_m \exp(2 \pi i n k / N) \exp(-2 \pi i m k / N) g[m] \]
%
The first half does not depend on $m$, so we can pull it out of the
summation:
%
\[ (e_k \conv g)[n] = \exp(2 \pi i n k / N) \sum_m \exp(-2 \pi i m k / N) g[m] \]
%
Now we recognize that the first term is $e_k$, and the summation is
$G[k]$ (using $m$ as the index of time). So we can write:
%
\[ (e_k \conv g)[n] = e_k[n] G[k] \]
%
which shows that for each complex exponential, $e_k$, convolution
with $g$ has the effect of multiplying $e_k$ by $G[k]$. In mathematical
terms, each $e_k$ is an eigenvector of this operation, and
$G[k]$ is the corresponding eigenvalue (see Section~\ref{effdiff}).

Now for the second part of the proof. If the input signal, $f$, doesn't
happen to be a complex exponential, we can express it as a sum of
complex exponentials by computing its DFT, $F$.
For each value of $k$ from 0 to $N-1$, $F[k]$ is the complex
magnitude of the component with frequency $k$.

Each input component is a complex exponential with magnitude
$F[k]$, so each output component is a complex
exponential with magnitude $F[k] G[k]$, based on the first part of
the proof.

Because the system is linear, the output is just the sum of the
output components:
%
\[ (f \conv g)[n] = \sum_k F[k] G[k] e_k[n] \]
%
Plugging in the definition of $e_k$ yields
%
\[ (f \conv g)[n] = \sum_k F[k] G[k] \exp(2 \pi i n k / N) \]
%
The right hand side is the inverse DFT of the product $F G$. Thus:
%
\[ (f \conv g) = \IDFT( F G ) \]
%
Substituting $F = \DFT(f)$ and $G = \DFT(g)$:
%
\[ (f \conv g) = \IDFT( \DFT(f) \DFT(g) ) \]
%
Finally, taking the DFT of both sides yields the Convolution Theorem:
%
\[ \DFT(f \conv g) = \DFT(f) \DFT(g) \]
%
QED
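
The proof is easy to check numerically. This sketch (plain NumPy, not
from the book) computes a circular convolution from the definition,
with the index $n-m$ taken mod $N$, and confirms that its DFT equals
the elementwise product of the DFTs:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 64
f = rng.standard_normal(N)
g = rng.standard_normal(N)

# Circular convolution from the definition, indices taken mod N
ns = np.arange(N)
conv = np.array([np.sum(f * g[(k - ns) % N]) for k in range(N)])

# Convolution Theorem: DFT(f conv g) == DFT(f) DFT(g)
assert np.allclose(np.fft.fft(conv), np.fft.fft(f) * np.fft.fft(g))
```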


\section{Exercises}

Solutions to these exercises are in {\tt chap10soln.ipynb}.

\begin{exercise}
In Section~\ref{sysconv} I describe convolution as the sum of shifted,
scaled copies of a signal. Strictly speaking, this operation is
{\em linear} convolution, which does not assume that the signal
is periodic.

But in Section~\ref{response}, when we multiply the
DFT of the signal by the transfer function, that operation corresponds
to {\em circular} convolution, which assumes that the signal is
periodic. As a result, you might notice that the output contains
an extra note at the beginning, which wraps around from the end.

Fortunately, there is a standard solution to this problem. If you
add enough zeros to the end of the signal before computing the DFT,
you can avoid wrap-around and compute a linear convolution.

Modify the example in {\tt chap10.ipynb} and confirm that zero-padding
eliminates the extra note at the beginning of the output.
\end{exercise}
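
The zero-padding idea can be checked with plain NumPy (a sketch, not
the notebook's code): padding both signals to length $N + M - 1$
before taking the DFT makes the circular convolution agree with the
linear one.

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, -1.0, 0.5])

# fft(x, L) zero-pads x to length L before transforming
L = len(f) + len(g) - 1
padded = np.fft.ifft(np.fft.fft(f, L) * np.fft.fft(g, L)).real

# With enough padding there is no wrap-around
assert np.allclose(padded, np.convolve(f, g))
```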


\begin{exercise}
The Open AIR library provides a ``centralized... on-line resource for
anyone interested in auralization and acoustical impulse response
data'' (\url{http://www.openairlib.net}). Browse their collection
of impulse response data and download one that sounds interesting.
Find a short recording that has the same sample rate as the impulse
response you downloaded.

Simulate the sound of your recording in the space where the impulse
response was measured, computed two ways: by convolving the recording
with the impulse response, and by computing the filter that corresponds
to the impulse response and multiplying by the DFT of the recording.
\end{exercise}

%TODO: one more?
%\begin{exercise}
%\end{exercise}



\chapter{Modulation and sampling}

In Section~\ref{aliasing} we saw that when a signal is sampled at
10,000 Hz, a component at 5500 Hz is indistinguishable from a
component at 4500 Hz. In this example, the folding frequency, 5000 Hz,
is half of the sampling rate. But I didn't explain why.

This chapter explores the effect of sampling and presents the
Sampling Theorem, which explains aliasing and the folding frequency.

I'll start by exploring the effect of convolution with impulses;
I'll use that to explain amplitude modulation (AM), which
turns out to be useful for understanding the Sampling Theorem.

The code for this chapter is in {\tt chap11.ipynb}, which is in the
repository for this book (see Section~\ref{code}).
You can also view it at \url{http://tinyurl.com/thinkdsp11}.


\section{Convolution with impulses}

As we saw in Section~\ref{sysconv}, convolution of a signal with
a series of impulses has the effect of making shifted, scaled
copies of the signal.

As an example, I'll read a signal that sounds like a beep:

\begin{verbatim}
filename = '253887__themusicalnomad__positive-beeps.wav'
wave = thinkdsp.read_wave(filename)
wave.normalize()
\end{verbatim}

And I'll construct a wave with four impulses:

\begin{verbatim}
imp_sig = thinkdsp.Impulses([0.005, 0.3, 0.6, 0.9],
                            amps=[1, 0.5, 0.25, 0.1])
impulses = imp_sig.make_wave(start=0, duration=1.0,
                             framerate=wave.framerate)
\end{verbatim}

And then convolve them:

\begin{verbatim}
convolved = wave.convolve(impulses)
\end{verbatim}

\begin{figure}
% sampling.py
\centerline{\includegraphics[height=3.5in]{figs/sampling1.pdf}}
\caption{The effect of convolving a signal (top left) with a series of
impulses (bottom left). The result (right) is the sum of shifted,
scaled copies of the signal.}
\label{fig.sampling1}
\end{figure}

Figure~\ref{fig.sampling1} shows the results, with the signal in
the top left, the impulses in the lower left, and the result on
the right.

You can hear the result in {\tt chap11.ipynb}; it sounds like
a series of four beeps with decreasing loudness.

The point of this example is just to demonstrate that convolution
with impulses makes shifted, scaled copies. This result will be
useful in the next section.


\section{Amplitude modulation}
\label{am}

\begin{figure}
% sampling.py
\centerline{\includegraphics[height=5.5in]{figs/sampling2.pdf}}
\caption{Demonstration of amplitude modulation. The top row is the
spectrum of the signal; the next row is the spectrum after modulation;
the next row is the spectrum after demodulation; the last row is the
demodulated signal after low-pass filtering.}
\label{fig.sampling2}
\end{figure}

Amplitude modulation (AM) is used to broadcast AM radio, among other
applications. At the transmitter, the signal (which might contain
speech, music, etc.) is ``modulated'' by multiplying it with a cosine
signal that acts as a ``carrier wave''. The result is a
high-frequency wave that is suitable for broadcast by radio. Typical
frequencies for AM radio in the United States are 500--1600 kHz (see
\url{https://en.wikipedia.org/wiki/AM_broadcasting}).

At the receiving end, the broadcast signal is ``demodulated'' to
recover the original signal. Surprisingly, demodulation works by
multiplying the broadcast signal, again, by the same carrier wave.

To see how that works, I'll modulate a signal with a carrier wave at
10 kHz. Here's the signal:

\begin{verbatim}
filename = '105977__wcfl10__favorite-station.wav'
wave = thinkdsp.read_wave(filename)
wave.unbias()
wave.normalize()
\end{verbatim}

And here's the carrier:

\begin{verbatim}
carrier_sig = thinkdsp.CosSignal(freq=10000)
carrier_wave = carrier_sig.make_wave(duration=wave.duration,
                                     framerate=wave.framerate)
\end{verbatim}

We can multiply them using the {\tt *} operator, which multiplies
the wave arrays elementwise:

\begin{verbatim}
modulated = wave * carrier_wave
\end{verbatim}

The result sounds pretty bad. You can hear it in {\tt chap11.ipynb}.

Figure~\ref{fig.sampling2} shows what's happening in the frequency
domain. The top row is the spectrum of the original signal. The
next row is the spectrum of the modulated signal, after multiplying
by the carrier. It contains two copies of the original spectrum,
shifted by plus and minus 10 kHz.

To understand why, recall that convolution in the time domain corresponds
to multiplication in the frequency domain. Conversely, multiplication
in the time domain corresponds to convolution in the frequency domain.
When we multiply the signal by the carrier, we are convolving its
spectrum with the DFT of the carrier.

Since the carrier is a simple cosine wave, its DFT is two impulses, at
plus and minus 10 kHz. Convolution with these impulses makes
shifted, scaled copies of the spectrum. Notice that the amplitude of
the spectrum is smaller after modulation. The energy from the original
signal is split between the copies.
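
The effect is easy to reproduce with synthetic data (plain NumPy; a
1 kHz cosine stands in for the recorded signal): after multiplying by
a 10 kHz carrier, the spectrum has peaks at 10 kHz plus and minus
1 kHz, and nothing left at 1 kHz.

```python
import numpy as np

framerate = 44100
ts = np.arange(framerate) / framerate          # one second of samples

signal = np.cos(2 * np.pi * 1000 * ts)         # stand-in for the program
carrier = np.cos(2 * np.pi * 10000 * ts)       # 10 kHz carrier
modulated = signal * carrier

spectrum = np.abs(np.fft.rfft(modulated))
freqs = np.fft.rfftfreq(len(ts), d=1 / framerate)

# The energy ends up at 9000 and 11000 Hz, split between the two copies
peaks = freqs[spectrum > spectrum.max() / 2]
assert set(peaks) == {9000.0, 11000.0}
```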

We demodulate the signal by multiplying by the carrier wave again:

\begin{verbatim}
demodulated = modulated * carrier_wave
\end{verbatim}

The third row of Figure~\ref{fig.sampling2} shows the result. Again,
multiplication in the time domain corresponds to convolution in the
frequency domain, which makes shifted, scaled copies of the spectrum.

Since the modulated spectrum contains two peaks, each peak gets split
in half and shifted by plus and minus 10 kHz. Two of the copies
meet at 0 kHz and get added together; the other two copies end up
centered at plus and minus 20 kHz.

If you listen to the demodulated signal, it sounds pretty good. The
extra copies of the spectrum add high frequency components that were
not in the original signal, but they are so high that your speakers
probably can't play them, and most people can't hear them. But if
you have good speakers and good ears, you might.

In that case, you can get rid of the extra components by applying a
low-pass filter:

\begin{verbatim}
demodulated_spectrum = demodulated.make_spectrum(full=True)
demodulated_spectrum.low_pass(10000)
filtered = demodulated_spectrum.make_wave()
\end{verbatim}

The result is quite close to the original wave, although about half
of the power is lost after demodulating and filtering. But that's not
a problem in practice, because much more of the power is lost in
transmitting and receiving the broadcast signal. Since we have to
amplify the result anyway, another factor of 2 is not an issue.


\section{Sampling}

\begin{figure}
% sampling.py
\centerline{\includegraphics[height=3.0in]{figs/sampling3.pdf}}
\caption{Spectrum of a signal before (top) and after (bottom) sampling.}
\label{fig.sampling3}
\end{figure}

I explained amplitude modulation in part because it is interesting, but
mostly because it will help us understand sampling. ``Sampling'' is
the process of measuring an analog signal at a series of points in
time, usually with equal spacing.

For example, the WAV files we have used as examples were
recorded by sampling the output of a microphone using an analog-to-digital
converter (ADC). The sampling rate for most of them is 44.1 kHz,
which is the standard rate for ``CD quality'' sound, or 48 kHz, which
is the standard for DVD sound.

At 48 kHz, the folding frequency is 24 kHz, which is higher than most
people can hear (see \url{https://en.wikipedia.org/wiki/Hearing_range}).

In most of these waves, each sample has 16 bits, so there
are $2^{16}$ distinct levels. This ``bit depth'' turns out to be enough
that adding more bits does not improve the sound quality noticeably
(see \url{https://en.wikipedia.org/wiki/Digital_audio}).

Of course, applications other than audio signals might require higher
sampling rates, in order to capture higher frequencies, or higher
bit-depth, in order to reproduce waveforms with more fidelity.

%TODO: refer to Vi Hart's video
%TODO: refer to the sampling video

To demonstrate the effect of the sampling process, I am going to start
with a wave sampled at 44.1 kHz and select samples from it at about 11 kHz.
This is not exactly the same as sampling from an analog signal, but
the effect is the same.

First I'll load a recording of a drum solo:

\begin{verbatim}
filename = '263868__kevcio__amen-break-a-160-bpm.wav'
wave = thinkdsp.read_wave(filename)
wave.normalize()
\end{verbatim}

Figure~\ref{fig.sampling3} (top) shows the spectrum of this wave.
Now here's the function that samples from the wave:

\begin{verbatim}
def sample(wave, factor=4):
    ys = np.zeros(len(wave))
    ys[::factor] = wave.ys[::factor]
    return thinkdsp.Wave(ys, framerate=wave.framerate)
\end{verbatim}

I'll use it to select every fourth element:

\begin{verbatim}
sampled = sample(wave, 4)
\end{verbatim}

The result has the same framerate as the original, but most of the
elements are zero. If you play the sampled wave, it doesn't sound
very good. The sampling process introduces high-frequency
components that were not in the original.

Figure~\ref{fig.sampling3} (bottom) shows the spectrum of the sampled
wave. It contains four copies of the original spectrum (it looks like
five copies because one is split between the highest and lowest
frequencies).

\begin{figure}
% sampling.py
\centerline{\includegraphics[height=4in]{figs/sampling9.pdf}}
\caption{The DFT of an impulse train is also an impulse train.}
\label{fig.sampling9}
\end{figure}

To understand where these copies come from, we can think of the
sampling process as multiplication with a series of impulses. Instead
of using {\tt sample} to select every fourth element, we could use
this function to make a series of impulses, sometimes called an
{\bf impulse train}:

\begin{verbatim}
def make_impulses(wave, factor):
    ys = np.zeros(len(wave))
    ys[::factor] = 1
    ts = np.arange(len(wave)) / wave.framerate
    return thinkdsp.Wave(ys, ts, wave.framerate)
\end{verbatim}

And then multiply the original wave by the impulse train:

\begin{verbatim}
impulses = make_impulses(wave, 4)
sampled = wave * impulses
\end{verbatim}

The result is the same; it still doesn't sound very good, but now
we understand why. Multiplication in the time domain corresponds
to convolution in the frequency domain. When we multiply
by an impulse train, we are convolving with the DFT of an
impulse train. As it turns out, the DFT of an impulse
train is also an impulse train.

Figure~\ref{fig.sampling9} shows two examples. The top row is
the impulse train in the example, with frequency 11,025 Hz.
The DFT is a train of 4 impulses, which is why we get 4 copies
of the spectrum. The bottom row shows an impulse
train with a lower frequency, about 5512 Hz. Its DFT is a train
of 8 impulses. In general, more impulses in the time
domain correspond to fewer impulses in the frequency
domain.
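
You can check this with a small example in plain NumPy (the sizes here
are mine): with $N = 32$, a train of 8 impulses transforms to a train
of 4, and a train of 4 transforms to a train of 8.

```python
import numpy as np

def impulse_train(n, factor):
    """An impulse train: 1 at every factor-th sample, 0 elsewhere."""
    ys = np.zeros(n)
    ys[::factor] = 1
    return ys

N = 32

# 8 impulses in the time domain -> 4 impulses in the frequency domain
mag = np.abs(np.fft.fft(impulse_train(N, 4)))
assert list(np.nonzero(mag > 1e-9)[0]) == [0, 8, 16, 24]

# 4 impulses in the time domain -> 8 impulses in the frequency domain
mag = np.abs(np.fft.fft(impulse_train(N, 8)))
assert list(np.nonzero(mag > 1e-9)[0]) == [0, 4, 8, 12, 16, 20, 24, 28]
```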

In summary:

\begin{itemize}

\item We can think of sampling as multiplication by an impulse train.

\item Multiplying by an impulse train corresponds
to convolution with an impulse train in the frequency domain.

\item Convolution with an impulse train makes multiple copies of the
signal's spectrum.

\end{itemize}


\section{Aliasing}

\begin{figure}
% sampling.py
\centerline{\includegraphics[height=5.5in]{figs/sampling4.pdf}}
\caption{Spectrum of the drum solo (top), spectrum of the impulse
train (second row), spectrum of the sampled wave (third row),
and the spectrum after low-pass filtering (bottom).}
\label{fig.sampling4}
\end{figure}

In Section~\ref{am}, after demodulating an AM signal we got rid of
the extra copies of the spectrum by applying a low-pass filter.
We can do the same thing after sampling, but it turns out
not to be a perfect solution.

Figure~\ref{fig.sampling4} shows why not. The top row is the spectrum
of the drum solo. It contains high frequency components that extend
past 10 kHz. When we sample this wave, we convolve the spectrum
with the impulse train (second row), which makes copies of the spectrum
(third row). The bottom row shows the result after applying a low-pass
filter with a cutoff at the folding frequency, 5512 Hz.

If we convert the result back to a wave, it is similar to the original
wave, but there are two problems:

\begin{itemize}

\item Because of the low-pass filter, the components above 5500 Hz
have been lost, so the result sounds muted.

\item Even the components below 5500 Hz are not quite right, because
they include leftover contributions from the spectral copies we
tried to filter out.

\end{itemize}

\begin{figure}
% sampling.py
\centerline{\includegraphics[height=4.8in]{figs/sampling5.pdf}}
\caption{Spectrum of a bass guitar solo (top), its spectrum after
sampling (middle), and after filtering (bottom).}
\label{fig.sampling5}
\end{figure}

If the spectral copies overlap after sampling, we lose information
about the spectrum, and we won't be able to recover it.

But if the copies don't overlap, things work out pretty well. As
a second example, I loaded a recording of a bass guitar solo.

Figure~\ref{fig.sampling5} shows its spectrum (top row), which contains
no visible energy above 5512 Hz. The second row shows the spectrum of
the sampled wave, and the third row shows the spectrum after the
low-pass filter. The amplitude is lower because we've filtered out some
of the energy, but the shape of the spectrum is almost exactly what we
started with. And if we convert back to a wave, it sounds the same.


\section{Interpolation}

\begin{figure}
% sampling.py
\centerline{\includegraphics[height=2.5in]{figs/sampling6.pdf}}
\caption{A brick wall low-pass filter (right) and the corresponding
convolution window (left).}
\label{fig.sampling6}
\end{figure}

The low-pass filter I used in the last step is a so-called {\bf brick
wall filter}; frequencies above the cutoff are removed completely,
as if they hit a brick wall.

Figure~\ref{fig.sampling6} (right) shows what this filter looks like.
Of course, multiplication by this filter, in the frequency domain,
corresponds to convolution with a window in the time domain. We can
find out what that window is by computing the inverse DFT of the
filter, which is shown in Figure~\ref{fig.sampling6} (left).

That function has a name; it is the normalized sinc function, or at
least a discrete approximation of it (see
\url{https://en.wikipedia.org/wiki/Sinc_function}):

\[ \mathrm{sinc}(x) = \frac{\sin \pi x}{\pi x} \]

When we apply the low-pass filter, we are convolving with a sinc
function. We can think of this convolution as the sum of shifted,
scaled copies of the sinc function.

The value of sinc is 1 at 0 and 0 at every other integer
value of $x$. When we shift the sinc function, we move the zero point.
When we scale it, we change the height at the zero point.
So when we add up the shifted, scaled copies, they interpolate
between the sampled points.
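
Here is a minimal sketch of that idea in plain NumPy ({\tt np.sinc} is
the normalized sinc function defined above; the signal and rates are
mine): at a sample time, every copy but one is zero, so the sum
recovers the sample exactly.

```python
import numpy as np

framerate = 100
ts = np.arange(50) / framerate            # sample times
ys = np.cos(2 * np.pi * 7 * ts)           # samples of a 7 Hz cosine

def interpolate(t):
    # Sum of shifted, scaled copies of the sinc function
    return np.sum(ys * np.sinc((t - ts) * framerate))

# At a sample time, only one copy is nonzero: we get the sample back
assert np.isclose(interpolate(ts[13]), ys[13])

# Between samples, the copies fill in the curve (approximately,
# since we only have a finite number of samples)
mid = (ts[24] + ts[25]) / 2
print(interpolate(mid), np.cos(2 * np.pi * 7 * mid))
```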

\begin{figure}
% sampling.py
\centerline{\includegraphics[height=2.5in]{figs/sampling8.pdf}}
\caption{Interpolation by adding up shifted, scaled copies of the
sinc function.}
\label{fig.sampling8}
\end{figure}

Figure~\ref{fig.sampling8} shows how that works using a short segment
of the bass guitar solo. The line across the top is the original
wave. The vertical gray lines show the sampled values. The thin
curves are the shifted, scaled copies of the sinc function.
The sum of these sinc functions is identical to the original wave.

Read that last sentence again, because it is more surprising than it
might seem. Because we started with a signal that contained no energy
above 5512 Hz, and we sampled at 11,025 Hz, we were able to recover
the original spectrum exactly.

In this example, I started with a wave that had already been
sampled at 44,100 Hz, and I resampled it at 11,025 Hz. After
resampling, the gap between the spectral copies is the sampling
rate, 11.025 kHz. If the original signal contains components that
exceed half of the sampling rate, 5512 Hz, the copies overlap
and we lose information.

But if the signal is ``bandwidth limited'', that is, if it contains no
energy above 5512 Hz, the spectral copies don't overlap, we don't lose
information, and we can recover the original signal exactly.

This result is known as the Nyquist-Shannon sampling theorem (see
\url{https://en.wikipedia.org/wiki/Nyquist-Shannon_sampling_theorem}).

This example does not prove the Sampling Theorem, but I hope it
helps you understand what it says and why it works.

Notice that the argument I made does
not depend on the original sampling rate, 44.1 kHz. The result
would be the same if the original had been sampled at a higher
frequency, or even if the original had been a continuous analog
signal: if we sample at framerate $f$, we can recover the original
signal exactly, as long as it contains no energy at frequencies
above $f/2$.
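
The aliasing example from the beginning of the chapter makes a good
sanity check (plain NumPy): sampled at 10,000 Hz, components at
4500 Hz and 5500 Hz produce exactly the same samples.

```python
import numpy as np

framerate = 10000
ts = np.arange(framerate) / framerate     # one second of samples

low = np.cos(2 * np.pi * 4500 * ts)
high = np.cos(2 * np.pi * 5500 * ts)

# 5500 Hz is above the folding frequency, so it aliases to 4500 Hz
assert np.allclose(low, high)
```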


\section{Exercises}

Solutions to these exercises are in {\tt chap11soln.ipynb}.

\begin{exercise}
The code in this chapter is in {\tt chap11.ipynb}. Read through
it and listen to the examples.
\end{exercise}


\begin{exercise}
Chris ``Monty'' Montgomery has an excellent video called ``D/A and A/D
| Digital Show and Tell''; it demonstrates the Sampling Theorem in
action, and presents lots of other excellent information about
sampling. Watch it at
\url{https://www.youtube.com/watch?v=cIQ9IXSUzuM}.
\end{exercise}


\begin{exercise}
As we have seen, if you sample a signal at too low a
framerate, frequencies above the folding frequency get aliased.
Once that happens, it is no longer possible to filter out
these components, because they are indistinguishable from
lower frequencies.

It is a good idea to filter out these frequencies {\em before}
sampling; a low-pass filter used for this purpose is called
an {\bf anti-aliasing filter}.

Returning to the drum solo example, apply a low-pass filter
before sampling, then apply the low-pass filter again to remove
the spectral copies introduced by sampling. The result should
be identical to the filtered signal.
\end{exercise}


\backmatter
\printindex

\end{document}