Kernel: Python 3 (system-wide)

Statsmodels OLS fit

import pandas as pd
data = pd.DataFrame({ 'x': [1, 2, 4, 5, 6, 7.7, 8.2, 9, 10, 11], 'y': [3, 3.6, 4.2, 4.1, 5, 4.9, 5.5, 6, 6.1, 6.5], })
import statsmodels.formula.api as smf
model = smf.ols('y ~ x', data=data)
res = model.fit()
res.predict(pd.DataFrame({'x': [0, 5, 10, 100]}))
0     2.741911
1     4.422732
2     6.103553
3    36.358325
dtype: float64
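The last prediction above, at x = 100, lies far outside the observed range of x (1 to 11), so the point estimate alone says little about how reliable it is. A minimal sketch of how interval estimates could be requested alongside the point predictions, assuming a statsmodels version that provides `get_prediction` and `summary_frame` on the fitted results (the names `new_x`, `frame` and `width` are illustrative, not from the original notebook):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Same data and model as in the notebook
data = pd.DataFrame({
    'x': [1, 2, 4, 5, 6, 7.7, 8.2, 9, 10, 11],
    'y': [3, 3.6, 4.2, 4.1, 5, 4.9, 5.5, 6, 6.1, 6.5],
})
res = smf.ols('y ~ x', data=data).fit()

# Points at which to predict (hypothetical name, chosen for this sketch)
new_x = pd.DataFrame({'x': [0, 5, 10, 100]})

# summary_frame returns the predicted mean plus confidence and prediction intervals
frame = res.get_prediction(new_x).summary_frame(alpha=0.05)
print(frame[['mean', 'obs_ci_lower', 'obs_ci_upper']])

# Width of the 95% prediction interval at each point
width = frame['obs_ci_upper'] - frame['obs_ci_lower']
print(width)
```

The `mean` column should match the output of `res.predict`, and the interval at x = 100 should come out noticeably wider than the one at x = 5, which is the usual cost of extrapolating.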
res.summary()
/usr/local/lib/python3.6/dist-packages/scipy/stats/stats.py:1535: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=10
  "anyway, n=%i" % int(n))
                      OLS Regression Results
Dep. Variable:     y                  R-squared:            0.962
Model:             OLS                Adj. R-squared:       0.957
Method:            Least Squares      F-statistic:          201.4
Date:              Sun, 28 Jun 2020   Prob (F-statistic):   5.91e-07
Time:              09:20:50           Log-Likelihood:       1.2199
No. Observations:  10                 AIC:                  1.560
Df Residuals:      8                  BIC:                  2.165
Df Model:          1
Covariance Type:   nonrobust

              coef   std err        t    P>|t|   [0.025   0.975]
Intercept   2.7419     0.169   16.201    0.000    2.352    3.132
x           0.3362     0.024   14.193    0.000    0.282    0.391

Omnibus:         2.060    Durbin-Watson:      2.907
Prob(Omnibus):   0.357    Jarque-Bera (JB):   1.180
Skew:           -0.805    Prob(JB):           0.554
Kurtosis:        2.512    Cond. No.           16.2


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
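As a cross-check on the summary above, the intercept, slope, R-squared and slope standard error of a one-predictor OLS fit can be reproduced from the textbook closed-form formulas. The sketch below uses only NumPy; the variable names (`sxx`, `sxy`, `syy`, `slope_se`, and so on) are chosen here for illustration and do not appear in the original notebook.

```python
import numpy as np

# Same x and y values as in the notebook
x = np.array([1, 2, 4, 5, 6, 7.7, 8.2, 9, 10, 11])
y = np.array([3, 3.6, 4.2, 4.1, 5, 4.9, 5.5, 6, 6.1, 6.5])

# Centered sums of squares and cross-products
sxx = np.sum((x - x.mean()) ** 2)
sxy = np.sum((x - x.mean()) * (y - y.mean()))
syy = np.sum((y - y.mean()) ** 2)

# Closed-form least-squares estimates for y = intercept + slope * x
slope = sxy / sxx
intercept = y.mean() - slope * x.mean()

# R-squared is 1 minus residual sum of squares over total sum of squares
resid = y - (intercept + slope * x)
rss = np.sum(resid ** 2)
r_squared = 1 - rss / syy

# Nonrobust standard error of the slope, with n - 2 residual degrees of freedom
sigma2 = rss / (len(x) - 2)
slope_se = np.sqrt(sigma2 / sxx)

print(intercept, slope, r_squared, slope_se)
```

The printed values should agree, up to rounding, with the `Intercept` and `x` rows and the `R-squared` entry of the summary table.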
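import numpy as np
import matplotlib.pyplot as plt
# DataFrame.plot.scatter returns a matplotlib Axes, so name it accordingly
ax = data.plot.scatter('x', 'y', s=40, grid=True)
# Evaluate the fitted line on a grid slightly wider than the observed x range
xx = np.linspace(-2, 15, 100)
yy = res.predict(pd.DataFrame({'x': xx}))
ax.plot(xx, yy, color='green')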
[<matplotlib.lines.Line2D at 0x7f48100caf98>]
[Figure: scatter plot of y against x with the fitted regression line in green]
data2 = pd.DataFrame({
    'x1': [1, 2, 4, 5, 6, 7.7, 8.2, 9, 10, 11],
    'x2': [4, 5, 4, 5, 6, 7.7, 7, 7.9, 8, 8.1],
    'y': [3, 3.6, 4.2, 4.1, 5, 4.9, 5.5, 6, 6.1, 6.5],
})
model2 = smf.ols('y ~ x1 + x2', data=data2)
res2 = model2.fit()
res2.summary()
/usr/local/lib/python3.6/dist-packages/scipy/stats/stats.py:1535: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=10
  "anyway, n=%i" % int(n))
                      OLS Regression Results
Dep. Variable:     y                  R-squared:            0.962
Model:             OLS                Adj. R-squared:       0.951
Method:            Least Squares      F-statistic:          88.75
Date:              Sun, 28 Jun 2020   Prob (F-statistic):   1.06e-05
Time:              09:20:54           Log-Likelihood:       1.2542
No. Observations:  10                 AIC:                  3.492
Df Residuals:      7                  BIC:                  4.399
Df Model:          2
Covariance Type:   nonrobust

              coef   std err        t    P>|t|   [0.025   0.975]
Intercept   2.8433     0.496    5.730    0.001    1.670    4.017
x1          0.3503     0.069    5.063    0.001    0.187    0.514
x2         -0.0306     0.139   -0.219    0.833   -0.360    0.299

Omnibus:         1.553    Durbin-Watson:      2.877
Prob(Omnibus):   0.460    Jarque-Bera (JB):   0.992
Skew:           -0.707    Prob(JB):           0.609
Kurtosis:        2.383    Cond. No.           61.4


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
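In the two-predictor fit, the `x2` coefficient has P>|t| = 0.833 and the standard error on `x1` is 0.069, compared with 0.024 for `x` in the single-predictor fit. One plausible contributor is that `x1` and `x2` are strongly correlated with each other. The sketch below measures this with NumPy; with exactly two predictors the variance inflation factor reduces to 1 / (1 - r^2), where r is their correlation. The names `r` and `vif` are illustrative and not part of the original notebook.

```python
import numpy as np

# Predictor columns from data2 in the notebook
x1 = np.array([1, 2, 4, 5, 6, 7.7, 8.2, 9, 10, 11])
x2 = np.array([4, 5, 4, 5, 6, 7.7, 7, 7.9, 8, 8.1])

# Pearson correlation between the two predictors
r = np.corrcoef(x1, x2)[0, 1]

# Variance inflation factor for either predictor in a two-predictor model
vif = 1.0 / (1.0 - r ** 2)

print(f"correlation(x1, x2) = {r:.3f}")
print(f"VIF = {vif:.2f}")
```

A correlation close to 1 (and a correspondingly large VIF) would be consistent with the wider confidence intervals seen above, and with `x2` adding little once `x1` is in the model, as the lower adjusted R-squared and higher AIC also suggest.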