Contact
CoCalc Logo Icon
StoreFeaturesDocsShareSupport News AboutSign UpSign In
| Download

Github repo cloud-examples: https://github.com/sagemath/cloud-examples

Views: 7807
License: MIT

OLS vs RLM

The following shows a comparison between ordinary least squares and a robust linear model in statsmodels.

%auto import statsmodels.api as sm import numpy as np import matplotlib.pyplot as plt
%python COEFF = .7654321 # inclination OFFSET = 3.1415 # offset EPS = 3. # amout of disturbance xl = -5. # lower bound xu = 5. # upper bound nb = 100 # number of data points
xx = np.linspace(xl,xu,nb) + ((xu-xl) / nb / 2.) * np.random.randn(nb) yy = COEFF * xx + OFFSET + (5. / nb * xx * np.random.randn(nb)) selsize = nb / 10 randsel = np.random.choice(np.arange(nb / 2), selsize, replace=False) yy[randsel] += np.random.random(selsize) * EPS xx[randsel] -= np.random.random(selsize) * EPS randsel = np.random.choice(np.arange(nb / 2) + nb/2, selsize, replace=False) yy[randsel] -= np.random.random(selsize) * EPS xx[randsel] += np.random.random(selsize) * EPS
omodel = sm.OLS(yy, np.c_[np.ones(nb), xx]) ofit = omodel.fit() rmodel = sm.RLM(yy, np.c_[np.ones(nb), xx], M=sm.robust.norms.HuberT()) rfit = rmodel.fit()
true = np.array([OFFSET, COEFF]) print 'True: %s' % true print 'OLS: %s (%.5f)' % (ofit.params, np.linalg.norm(ofit.params - true)) print 'RLS: %s (%.5f)' % (rfit.params, np.linalg.norm(rfit.params - true))
True: [ 3.1415 0.7654321] OLS: [ 3.18771324 0.54335458] (0.22683) RLS: [ 3.1702073 0.73458569] (0.04214)
print rfit.summary()
Robust linear Model Regression Results ============================================================================== Dep. Variable: y No. Observations: 100 Model: RLM Df Residuals: 98 Method: IRLS Df Model: 1 Norm: HuberT Scale Est.: mad Cov Type: H1 Date: Sun, 11 Aug 2013 Time: 10:07:26 No. Iterations: 29 ============================================================================== coef std err z P>|z| [95.0% Conf. Int.] ------------------------------------------------------------------------------ const 3.1702 0.025 126.952 0.000 3.121 3.219 x1 0.7346 0.008 97.298 0.000 0.720 0.749 ============================================================================== If the model instance has been used for another fit with different fit parameters, then the fit options might not be the correct ones anymore .
plt.clf() _ = plt.plot(xx, yy, 'k.') _ = plt.plot(xx, ofit.fittedvalues, 'r-', linewidth=2) _= plt.plot(xx, rfit.fittedvalues, 'g-', linewidth=2) plt.show()