In [1]:
Introduction
Why Machine Learning?
Problems Machine Learning Can Solve
Knowing Your Task and Knowing Your Data
Why Python?
scikit-learn
Installing scikit-learn
Essential Libraries and Tools
Jupyter Notebook
NumPy
In [2]:
x:
[[1 2 3]
[4 5 6]]
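The code cell itself is not preserved in this export; a minimal reconstruction that produces the output above (variable name `x` taken from the output) is:

```python
import numpy as np

# Create a 2x3 NumPy array from a nested Python list
x = np.array([[1, 2, 3], [4, 5, 6]])
print("x:\n{}".format(x))
```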
SciPy
In [3]:
NumPy array:
[[1. 0. 0. 0.]
[0. 1. 0. 0.]
[0. 0. 1. 0.]
[0. 0. 0. 1.]]
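The cell that printed this 4×4 identity matrix was presumably along these lines (a sketch, not the original cell):

```python
import numpy as np

# A 4x4 identity matrix: ones on the diagonal, zeros everywhere else
eye = np.eye(4)
print("NumPy array:\n{}".format(eye))
```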
In [4]:
SciPy sparse CSR matrix:
(0, 0) 1.0
(1, 1) 1.0
(2, 2) 1.0
(3, 3) 1.0
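A plausible reconstruction of the missing cell: convert the dense identity matrix to SciPy's compressed sparse row (CSR) format, which stores only the nonzero entries shown above:

```python
import numpy as np
from scipy import sparse

# Dense 4x4 identity matrix
eye = np.eye(4)
# CSR format keeps only the nonzero entries and their positions
sparse_matrix = sparse.csr_matrix(eye)
print("SciPy sparse CSR matrix:\n{}".format(sparse_matrix))
```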
In [5]:
COO representation:
(0, 0) 1.0
(1, 1) 1.0
(2, 2) 1.0
(3, 3) 1.0
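The same matrix can be built directly in COO (coordinate) format by listing the data values together with their row and column indices; a sketch of the cell that likely produced this output (variable names assumed):

```python
import numpy as np
from scipy import sparse

# One data value per nonzero entry, plus its row and column index
data = np.ones(4)
row_indices = np.arange(4)
col_indices = np.arange(4)
eye_coo = sparse.coo_matrix((data, (row_indices, col_indices)))
print("COO representation:\n{}".format(eye_coo))
```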
matplotlib
In [6]:
[<matplotlib.lines.Line2D at 0x7f1d73e83ba8>]
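The `Line2D` list echoed above is the return value of `plt.plot`. A sketch of a cell that produces such output (the sine curve and marker choice are assumptions based on typical introductory examples):

```python
import numpy as np
import matplotlib.pyplot as plt

# 100 evenly spaced points between -10 and 10, and their sine values
x = np.linspace(-10, 10, 100)
y = np.sin(x)
# plot returns a list containing one Line2D object,
# which the notebook echoes as the cell's result
plt.plot(x, y, marker="x")
```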
pandas
In [7]:
| | Name | Location | Age |
|---|---|---|---|
| 0 | John | New York | 24 |
| 1 | Anna | Paris | 13 |
| 2 | Peter | Berlin | 53 |
| 3 | Linda | London | 33 |
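The table above is the notebook's HTML rendering of a DataFrame; a reconstruction of the cell (variable name `data_pandas` assumed) is:

```python
import pandas as pd

# Build a small DataFrame from a dict mapping column names to values
data = {'Name': ["John", "Anna", "Peter", "Linda"],
        'Location': ["New York", "Paris", "Berlin", "London"],
        'Age': [24, 13, 53, 33]}
data_pandas = pd.DataFrame(data)
data_pandas  # Jupyter renders the DataFrame as an HTML table
```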
In [8]:
| | Name | Location | Age |
|---|---|---|---|
| 2 | Peter | Berlin | 53 |
| 3 | Linda | London | 33 |
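This second table shows only the rows where `Age` exceeds 30, which points to Boolean indexing; a self-contained sketch (the threshold 30 is inferred from the rows that survive):

```python
import pandas as pd

data_pandas = pd.DataFrame({
    'Name': ["John", "Anna", "Peter", "Linda"],
    'Location': ["New York", "Paris", "Berlin", "London"],
    'Age': [24, 13, 53, 33]})

# Boolean indexing: keep only the rows whose Age column is above 30
older_than_30 = data_pandas[data_pandas.Age > 30]
older_than_30
```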
mglearn
Python 2 versus Python 3
Versions Used in this Book
In [9]:
Python version: 3.7.0 (default, Jun 28 2018, 13:15:42)
[GCC 7.2.0]
pandas version: 0.23.4
matplotlib version: 3.0.0
NumPy version: 1.15.2
SciPy version: 1.1.0
IPython version: 6.4.0
scikit-learn version: 0.21.dev0
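A cell like the following (library list taken from the output above) prints these version strings; the exact version numbers will of course differ on other machines:

```python
import sys
import pandas as pd
import matplotlib
import numpy as np
import scipy as sp
import IPython
import sklearn

print("Python version: {}".format(sys.version))
print("pandas version: {}".format(pd.__version__))
print("matplotlib version: {}".format(matplotlib.__version__))
print("NumPy version: {}".format(np.__version__))
print("SciPy version: {}".format(sp.__version__))
print("IPython version: {}".format(IPython.__version__))
print("scikit-learn version: {}".format(sklearn.__version__))
```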
A First Application: Classifying Iris Species
Meet the Data
In [10]:
In [11]:
Keys of iris_dataset:
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])
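The keys listed above come from scikit-learn's built-in Iris dataset; a reconstruction of the loading cell:

```python
from sklearn.datasets import load_iris

# load_iris returns a Bunch object, which behaves like a dictionary
iris_dataset = load_iris()
print("Keys of iris_dataset:\n{}".format(iris_dataset.keys()))
```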
In [12]:
.. _iris_dataset:
Iris plants dataset
--------------------
**Data Set Characteristics:**
:Number of Instances: 150 (50 in each of three classes)
:Number of Attributes: 4 numeric, pre
...
In [13]:
Target names: ['setosa' 'versicolor' 'virginica']
In [14]:
Feature names:
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
In [15]:
Type of data: <class 'numpy.ndarray'>
In [16]:
Shape of data: (150, 4)
In [17]:
First five rows of data:
[[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2]
[4.6 3.1 1.5 0.2]
[5. 3.6 1.4 0.2]]
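The cells producing the last few outputs inspect the `data` array; a combined sketch:

```python
from sklearn.datasets import load_iris

iris_dataset = load_iris()
data = iris_dataset["data"]

# data is a NumPy array with one row per flower and one column per feature
print("Type of data: {}".format(type(data)))
print("Shape of data: {}".format(data.shape))
print("First five rows of data:\n{}".format(data[:5]))
```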
In [18]:
Type of target: <class 'numpy.ndarray'>
In [19]:
Shape of target: (150,)
In [20]:
Target:
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2]
Measuring Success: Training and Testing Data
In [21]:
In [22]:
X_train shape: (112, 4)
y_train shape: (112,)
In [23]:
X_test shape: (38, 4)
y_test shape: (38,)
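The 112/38 split of the 150 samples matches scikit-learn's default 75%/25% split; a reconstruction of the cell (the `random_state=0` value is an assumption, though it is consistent with the deterministic outputs later in the notebook):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()
# 75% training / 25% test by default; fixing random_state
# makes the pseudo-random shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0)
print("X_train shape: {}".format(X_train.shape))
print("y_train shape: {}".format(y_train.shape))
print("X_test shape: {}".format(X_test.shape))
print("y_test shape: {}".format(y_test.shape))
```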
First Things First: Look at Your Data
In [24]:
[Output: a 4×4 array of matplotlib AxesSubplot objects — the pair-plot grid of all four iris features; figure not preserved in this export]
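The 4×4 grid of subplot objects is what `pandas.plotting.scatter_matrix` returns. A sketch of the plotting cell; the exact styling options are assumptions (the notebook's version likely also passed a colormap from the book's `mglearn` helper library, omitted here to stay self-contained):

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0)

# Pair plot of every feature against every other, colored by class label;
# the diagonal shows a histogram of each feature
iris_dataframe = pd.DataFrame(X_train, columns=iris_dataset["feature_names"])
grid = pd.plotting.scatter_matrix(
    iris_dataframe, c=y_train, figsize=(15, 15), marker='o',
    hist_kwds={'bins': 20}, s=60, alpha=.8)
```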
Building Your First Model: k-Nearest Neighbors
In [25]:
In [26]:
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=None, n_neighbors=1, p=2,
weights='uniform')
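The estimator repr above is what the notebook echoes after fitting; a reconstruction of the two cells (`n_neighbors=1` is confirmed by the repr):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0)

# n_neighbors=1: classify each new point by its single nearest training neighbor
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)  # fit returns the estimator itself, which is echoed
```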
Making Predictions
In [27]:
X_new.shape: (1, 4)
In [28]:
Prediction: [0]
Predicted target name: ['setosa']
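A reconstruction of the prediction cells. The measurement values in `X_new` are an assumption; any flower with a petal this short would yield the `setosa` prediction shown above:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0)
knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

# A single new flower; scikit-learn always expects a 2D array
# of shape (n_samples, n_features), hence the extra brackets
X_new = np.array([[5, 2.9, 1, 0.2]])
print("X_new.shape: {}".format(X_new.shape))

prediction = knn.predict(X_new)
print("Prediction: {}".format(prediction))
print("Predicted target name: {}".format(
    iris_dataset["target_names"][prediction]))
```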
Evaluating the Model
In [29]:
Test set predictions:
[2 1 0 2 0 2 0 1 1 1 2 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2 1 0
2]
In [30]:
Test set score: 0.97
In [31]:
Test set score: 0.97
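The two identical scores above suggest the accuracy was computed twice, once by hand and once with the estimator's `score` method; a reconstruction:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0)
knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

y_pred = knn.predict(X_test)
print("Test set predictions:\n{}".format(y_pred))

# Accuracy computed manually: fraction of correct predictions...
print("Test set score: {:.2f}".format(np.mean(y_pred == y_test)))
# ...and via the estimator's built-in score method
print("Test set score: {:.2f}".format(knn.score(X_test, y_test)))
```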
Summary and Outlook
In [32]:
Test set score: 0.97
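The final cell likely condenses the whole workflow — load, split, fit, evaluate — into a few lines; a sketch reproducing the score above:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# The complete workflow in one cell
iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0)

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)

print("Test set score: {:.2f}".format(knn.score(X_test, y_test)))
```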