Robust regression

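In robust statistics, robust regression seeks to overcome some limitations of traditional regression analysis. Regression analysis models the relationship between one or more independent variables and a dependent variable. Standard types of regression, such as ordinary least squares, have favourable properties when their underlying assumptions hold, but can give misleading results otherwise (that is, they are not robust to violations of those assumptions). Robust regression methods are designed to limit the effect that violations of assumptions by the underlying data-generating process have on the regression estimates.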

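For example, least squares estimates of regression models are highly sensitive to outliers: an outlier whose error is twice that of a typical observation contributes four times (two squared) as much to the squared error loss, and therefore has more influence on the regression estimates. The Huber loss function is a robust alternative to the ordinary squared error loss; it reduces the contribution of outliers to the loss and thereby limits their impact on the regression estimates.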
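The difference between the two losses can be checked numerically. The following is a minimal sketch, assuming the common form of the Huber loss with threshold `delta` (quadratic up to `delta`, linear beyond it); the function names and the choice `delta=1.0` are illustrative and not taken from the text.

```python
def squared_loss(residual):
    """Ordinary squared-error loss for a single residual."""
    return residual ** 2


def huber_loss(residual, delta=1.0):
    """Huber loss: quadratic for |residual| <= delta, linear beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a ** 2
    return delta * (a - 0.5 * delta)


typical = 1.0
for outlier in (2.0, 10.0):
    # Ratio of the outlier's loss to the loss of a typical observation.
    print(outlier,
          squared_loss(outlier) / squared_loss(typical),
          huber_loss(outlier) / huber_loss(typical))
```

Under these assumptions the squared loss ratio grows quadratically with the size of the outlier, whereas the Huber ratio grows only linearly once the residual exceeds `delta`.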

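Applications

Heteroscedastic errors

Robust estimation should be considered when there is a strong suspicion of heteroscedasticity. The homoscedastic model assumes that the variance of the error term is constant for all values of x; heteroscedasticity allows the variance to depend on x. For example, the variance of expenditure is often larger for individuals with higher incomes than for individuals with lower incomes. Software packages usually default to a homoscedastic model, even though it may be less accurate than a heteroscedastic one. One simple approach (Tofallis, 2008) is to apply least squares to percentage errors, which reduces the influence of large values of the dependent variable compared with ordinary least squares.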
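Minimising squared percentage errors is equivalent to weighted least squares with weights $1/y_i^{2}$. The following is a minimal sketch for a single predictor, assuming all observed y values are non-zero; the function names and the data are hypothetical, and this is not presented as the exact procedure of Tofallis (2008).

```python
def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b


def percentage_least_squares(x, y):
    """Minimise sum(((y - a - b*x) / y)**2), i.e. WLS with weights 1/y**2."""
    if any(yi == 0 for yi in y):
        raise ValueError("percentage errors are undefined when y contains 0")
    return weighted_line_fit(x, y, [1.0 / yi ** 2 for yi in y])


# Hypothetical income (x) and expenditure (y) figures, for illustration only.
income = [10.0, 20.0, 40.0, 80.0, 160.0]
spending = [9.0, 17.0, 30.0, 70.0, 100.0]

print("ordinary LS  :", weighted_line_fit(income, spending, [1.0] * len(income)))
print("percentage LS:", percentage_least_squares(income, spending))
```

Because each squared residual is divided by $y_i^{2}$, observations with large responses no longer dominate the fit.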

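Outliers

Another common situation in which robust estimation is used is when the data contain outliers. If the outliers do not come from the same data-generating process as the rest of the data, least squares estimation is inefficient and can be biased. Because the least squares predictions are dragged towards the outliers, and because the variance of the estimates is artificially inflated, outliers can be masked. (In many situations, including some areas of geostatistics and medical statistics, it is precisely the outliers that are of interest.)

It is sometimes claimed that least squares (or classical statistical methods in general) are robust, but they are robust only in the sense that the type I error rate does not increase when the model is violated. In fact, when outliers are present the type I error rate tends to fall below the nominal level, while the type II error rate often rises dramatically. The reduction of the type I error rate has been called the conservatism of classical methods.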

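History and unpopularity of robust regression

Although robust regression methods outperform least squares in many situations, they are still not widely used. Several reasons may explain their unpopularity (Hampel et al. 1986, 2005). One is that there are several competing methods and the field had many false starts. Another is that robust estimates are much more computationally intensive than least squares estimates; in recent years this objection has become less relevant as computing power has increased greatly. A further reason may be that some popular statistical software packages have not implemented the methods (Stromberg, 2004). The belief of many statisticians that classical methods are robust may be yet another reason [citation needed].

Although uptake of robust methods has been slow, modern mainstream statistics textbooks usually discuss them (for example, the books by Seber & Lee and by Faraway; for an overview of how the various robust regression methods developed from one another, see Andersen's book).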

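Methods for robust regression

Least squares alternatives

The simplest approach is to estimate the parameters of the regression model by least absolute deviations, which is less sensitive to outliers than least squares. Even so, gross outliers can still have a considerable impact on the model, which has motivated research into even more robust approaches.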
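For a single predictor, a least absolute deviations line can be found exactly on small data sets. This works because the problem is a linear programme with an optimal solution passing through at least two data points. The following is a minimal sketch under that assumption, using brute-force enumeration of point pairs (quadratic in the number of points, so only suitable for illustration); the function names and the data are hypothetical.

```python
from itertools import combinations


def lad_line(x, y):
    """Least absolute deviations fit of y = a + b*x by enumerating point pairs."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line is not a valid candidate
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        cost = sum(abs(yk - a - b * xk) for xk, yk in zip(x, y))
        if best is None or cost < best[0]:
            best = (cost, a, b)
    if best is None:
        raise ValueError("need at least two points with distinct x values")
    return best[1], best[2]


def ols_line(x, y):
    """Ordinary least squares fit of y = a + b*x, for comparison."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    return ybar - b * xbar, b


# Five points on y = 1 + 2x plus one gross outlier in the response.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0, 50.0]

print("LAD (a, b):", lad_line(x, y))
print("OLS (a, b):", ols_line(x, y))
```

On this data the least absolute deviations fit stays on the line through the five clean points, whereas the least squares slope is pulled strongly towards the outlier.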

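In 1964, Huber introduced M-estimation for regression, where the M stands for "maximum likelihood type". The method is robust to outliers in the response variable, but it is not resistant to outliers in the explanatory variables (leverage points). In fact, when there are outliers in the explanatory variables, the method has no advantage over least squares.

In the 1980s, several alternatives to M-estimation were proposed in an attempt to overcome this lack of resistance; see the book by Rousseeuw and Leroy. Least trimmed squares (LTS) is a viable alternative and, as of 2007, was the preferred choice of Rousseeuw and of Ryan (1997, 2008). The Theil–Sen estimator has a lower breakdown point than LTS but is statistically efficient and popular. Another proposed solution is S-estimation, which finds a line (plane or hyperplane) that minimizes a robust estimate of the scale of the residuals (from which the method takes the S in its name). This method is highly resistant to leverage points and robust to outliers in the response, but it tends to be inefficient.

MM-estimation attempts to retain the robustness of S-estimation while gaining the efficiency of M-estimation. The method first finds a highly robust and resistant S-estimate that minimizes an M-estimate of the scale of the residuals (the first M). The estimated scale is then held constant while an M-estimate of the parameters is located (the second M).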
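Of the estimators described above, the Theil–Sen estimator is the easiest to state for a single predictor: the slope is the median of the slopes between all pairs of points. The following is a minimal sketch under that definition, taking the intercept as the median of $y_i - b x_i$ (one common choice, assumed here); the function name and the data are hypothetical.

```python
from itertools import combinations
from statistics import median


def theil_sen_line(x, y):
    """Theil-Sen fit of y = a + b*x: median pairwise slope, median intercept."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[i] != x[j]  # skip pairs that would give an undefined slope
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x values")
    b = median(slopes)
    a = median(yi - b * xi for xi, yi in zip(x, y))
    return a, b


# Five points on y = 1 + 2x plus one gross outlier in the response.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0, 50.0]

print(theil_sen_line(x, y))
```

Here ten of the fifteen pairwise slopes come from clean points, so the median slope, and hence the fitted line, is unaffected by the single outlier.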

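Parametric alternatives

Another approach to robust estimation of regression models is to replace the normal distribution with a heavy-tailed distribution. A t-distribution with 4 to 6 degrees of freedom has been reported to be a good choice in various practical situations. Bayesian robust regression, being fully parametric, relies heavily on such distributions.

Under the assumption of t-distributed residuals, the distribution is a location-scale family, that is, $x\leftarrow (x-\mu )/\sigma$ . The degrees of freedom of the t-distribution is sometimes called the kurtosis parameter. Lange, Little & Taylor (1989) discuss this model in depth from a non-Bayesian point of view; a Bayesian account appears in Gelman et al. (2003).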
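The location-scale statement can be made explicit as follows. This is a sketch in notation introduced here rather than taken from the text: $y_i$ is the response, $x_i$ the vector of explanatory variables, $\beta$ the coefficient vector and $\nu$ the degrees of freedom.

$$y_i = x_i^{\top}\beta + \sigma\,\varepsilon_i,\qquad \varepsilon_i \sim t_{\nu},$$

so that the residual $e_i = y_i - x_i^{\top}\beta$ has density

$$f(e_i) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{\nu\pi}\,\sigma}\left(1+\frac{e_i^{2}}{\nu\sigma^{2}}\right)^{-\frac{\nu+1}{2}}.$$

For $\nu>4$ the excess kurtosis of this distribution is $6/(\nu-4)$, which is why $\nu$ controls the heaviness of the tails; as $\nu\to\infty$ the normal model is recovered.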

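Another parametric approach is to assume that the residuals follow a mixture of normal distributions (Daemi et al. 2019), in particular a contaminated normal distribution, in which the majority of observations come from a specified normal distribution and a small proportion come from a normal distribution with a much larger variance. That is, a residual has probability $1-\varepsilon$ of coming from a normal distribution with variance $\sigma ^{2}$ , where $\varepsilon$ is small, and probability $\varepsilon$ of coming from a normal distribution with variance $c\sigma ^{2}$ for some $c>1$ :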

$$e_{i}\sim (1-\varepsilon )N(0,\sigma ^{2})+\varepsilon N(0,c\sigma ^{2}).$$

Typically, $\varepsilon <0.1$ . This is sometimes called the $\varepsilon$-contamination model.
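Residuals from this model are straightforward to simulate. The following is a minimal sketch using only the Python standard library; the function names and the default values `eps=0.05` and `c=9.0` are illustrative choices, not values from the text.

```python
import random


def sample_contaminated_normal(n, sigma=1.0, eps=0.05, c=9.0, seed=0):
    """Draw n residuals from (1-eps)*N(0, sigma^2) + eps*N(0, c*sigma^2)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        # With probability eps the draw comes from the wide component.
        sd = sigma * c ** 0.5 if rng.random() < eps else sigma
        draws.append(rng.gauss(0.0, sd))
    return draws


def mixture_variance(sigma=1.0, eps=0.05, c=9.0):
    """Variance of the mixture: (1 - eps)*sigma^2 + eps*c*sigma^2."""
    return (1.0 - eps) * sigma ** 2 + eps * c * sigma ** 2


residuals = sample_contaminated_normal(100_000)
# The mixture has mean zero, so the mean square estimates its variance.
empirical = sum(r * r for r in residuals) / len(residuals)
print(empirical, mixture_variance())
```

Even a small contamination fraction inflates the variance noticeably: with these illustrative values, 5% of the observations account for almost a third of the total variance.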

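Parametric approaches have the advantage that likelihood theory provides an "off-the-shelf" approach to inference (although for mixture models such as the $\varepsilon$-contamination model the usual regularity conditions may not apply), and that simulation models can be built from the fit. However, such parametric models still assume that the underlying model is literally true, so they do not account for skewed residual distributions or finite observation precision.

Unit weights

Another robust method is the use of unit weights (Wainer & Thissen, 1976), which can be applied when there are multiple predictors of a single outcome. Ernest Burgess (1928) used unit weights to predict success on parole. He scored 21 positive factors as present (e.g., "no prior arrest" = 1) or absent ("prior arrest" = 0) and summed them to obtain a predictor score, which proved to be a useful predictor of parole success. Samuel S. Wilks (1938) showed that nearly all sets of regression weights, including unit weights, yield sums that are very highly correlated with one another, a result referred to as Wilks' theorem (Ree, Carretta, & Earles, 1998). Robyn Dawes (1979) examined decision making in applied settings and found that simple models with unit weights often outperformed human experts. Bobko, Roth, and Buster (2007) reviewed the literature on unit weights and concluded that decades of empirical studies show that unit weights perform similarly to ordinary regression weights on cross-validation.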
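A Burgess-style score is simply a count of the factors that are present. The following is a minimal sketch; the function name and the factor names are hypothetical, apart from "no prior arrest", which is the example given in the text.

```python
def unit_weight_score(factors):
    """Sum binary indicators (1 = factor present, 0 = absent), weight 1 each."""
    if any(v not in (0, 1) for v in factors.values()):
        raise ValueError("each factor must be coded 0 or 1")
    return sum(factors.values())


# Hypothetical case record with three of the factors coded.
case = {"no_prior_arrest": 1, "steady_employment": 0, "stable_residence": 1}
print(unit_weight_score(case))
```

No weights are estimated from data, so the score cannot be distorted by outliers in a fitting sample.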

References

Liu, J.; Cosman, P. C.; Rao, B. D. Robust Linear Regression via L0 Regularization. IEEE Transactions on Signal Processing. 2018, 66 (3): 698–713. doi:10.1109/TSP.2017.2771720.
Andersen, R. Modern Methods for Robust Regression. Sage University Paper Series on Quantitative Applications in the Social Sciences, 07-152. 2008.
Ben-Gal I., Outlier detection, In: Maimon O. and Rockach L. (Eds.) Data Mining and Knowledge Discovery Handbook: A Complete Guide for Practitioners and Researchers, Kluwer Academic Publishers, 2005, ISBN 0-387-24435-2.
Bobko, P., Roth, P. L., & Buster, M. A. (2007). "The usefulness of unit weights in creating composite scores: A literature review, application to content validity, and meta-analysis". Organizational Research Methods, volume 10, pages 689-709. doi:10.1177/1094428106294734
Daemi, Atefeh, Hariprasad Kodamana, and Biao Huang. "Gaussian process modelling with Gaussian mixture likelihood." Journal of Process Control 81 (2019): 209-220. doi:10.1016/j.jprocont.2019.06.007
Breiman, L. Statistical Modeling: the Two Cultures. Statistical Science. 2001, 16 (3): 199–231. JSTOR 2676681. doi:10.1214/ss/1009213725.
Burgess, E. W. (1928). "Factors determining success or failure on parole". In A. A. Bruce (Ed.), The Workings of the Indeterminate Sentence Law and Parole in Illinois (pp. 205–249). Springfield, Illinois: Illinois State Parole Board.
Dawes, Robyn M. (1979). "The robust beauty of improper linear models in decision making". American Psychologist, volume 34, pages 571-582. doi:10.1037/0003-066X.34.7.571
Draper, David. Rank-Based Robust Analysis of Linear Models. I. Exposition and Review. Statistical Science. 1988, 3 (2): 239–257. JSTOR 2245578. doi:10.1214/ss/1177012915.
Faraway, J. J. Linear Models with R. Chapman & Hall/CRC. 2004.
Fornalski, K. W. Applications of the robust Bayesian regression analysis. International Journal of Society Systems Science. 2015, 7 (4): 314–333. doi:10.1504/IJSSS.2015.073223.
Gelman, A.; J. B. Carlin; H. S. Stern; D. B. Rubin. Bayesian Data Analysis (2nd ed.). Chapman & Hall/CRC. 2003.
Hampel, F. R.; E. M. Ronchetti; P. J. Rousseeuw; W. A. Stahel. Robust Statistics: The Approach Based on Influence Functions. Wiley. 2005 [1986].
Lange, K. L.; R. J. A. Little; J. M. G. Taylor. Robust statistical modeling using the t-distribution. Journal of the American Statistical Association. 1989, 84 (408): 881–896. JSTOR 2290063. doi:10.2307/2290063.
Lerman, G.; McCoy, M.; Tropp, J. A.; Zhang T. (2012). "Robust computation of linear models, or how to find a needle in a haystack", arXiv:1202.4044.
Maronna, R.; D. Martin; V. Yohai. Robust Statistics: Theory and Methods. Wiley. 2006.
McKean, Joseph W. Robust Analysis of Linear Models. Statistical Science. 2004, 19 (4): 562–570. JSTOR 4144426. doi:10.1214/088342304000000549.
Radchenko S.G. Robust methods for statistical models estimation: Monograph (in Russian). Kiev: РР «Sanspariel». 2005: 504. ISBN 978-966-96574-0-4.
Ree, M. J., Carretta, T. R., & Earles, J. A. (1998). "In top-down decisions, weighting variables does not matter: A consequence of Wilk's theorem". Organizational Research Methods, volume 1(4), pages 407-420. doi:10.1177/109442819814003
Rousseeuw, P. J.; A. M. Leroy. Robust Regression and Outlier Detection. Wiley. 2003 [1986].
Ryan, T. P. Modern Regression Methods. Wiley. 2008 [1997].
Seber, G. A. F.; A. J. Lee. Linear Regression Analysis (2nd ed.). Wiley. 2003.
Stromberg, A. J. Why write statistical software? The case of robust statistical methods. Journal of Statistical Software. 2004, 10 (5). doi:10.18637/jss.v010.i05.
Strutz, T. Data Fitting and Uncertainty (A practical introduction to weighted least squares and beyond). Springer Vieweg. 2016. ISBN 978-3-658-11455-8.
Tofallis, Chris. Least Squares Percentage Regression. Journal of Modern Applied Statistical Methods. 2008, 7: 526–534. SSRN 1406472. doi:10.2139/ssrn.1406472.
Venables, W. N.; B. D. Ripley. Modern Applied Statistics with S. Springer. 2002.
Wainer, H., & Thissen, D. (1976). "Three steps toward robust regression." Psychometrika, volume 41(1), pages 9–34. doi:10.1007/BF02291695
Wilks, S. S. (1938). "Weighting systems for linear functions of correlated variables when there is no dependent variable". Psychometrika, volume 3, pages 23–40. doi:10.1007/BF02287917

External links

R programming wikibooks
Brian Ripley's robust statistics course notes.
Nick Fieller's course notes on Statistical Modelling and Computation contain material on robust regression.
Olfa Nasraoui's Overview of Robust Statistics
Olfa Nasraoui's Overview of Robust Clustering
Why write statistical software? The case of robust statistical methods, A. J. Stromberg
Free software (Fortran 95) L1-norm regression. Minimization of absolute deviations instead of least squares.
Free open-source python implementation for robust nonlinear regression.