Computational statistics

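Computational statistics, or statistical computing, is the link between statistics and computer science, and refers to statistical methods that are carried out through computational methods. It is the area of computational science specific to the mathematical science of statistics. The area is developing rapidly, which has led to calls for a broader concept of computing to be taught as part of general statistical education.^[1]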

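As in traditional statistics, the goal is to transform raw data into knowledge,^[2] but the focus lies on computer-intensive statistical methods, such as cases with very large sample sizes and non-homogeneous data sets.^[2]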

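The terms "computational statistics" and "statistical computing" are often used interchangeably. Carlo Lauro, a former president of the International Association for Statistical Computing, proposed distinguishing them: "statistical computing" can be defined as "the application of computer science to statistics", and "computational statistics" as "the design of algorithms for implementing statistical methods on computers, including algorithms unthinkable before the computer age (such as the bootstrap and Monte Carlo methods), as well as coping with analytically intractable problems".^[3]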

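The term "computational statistics" may also refer to computationally intensive statistical methods, such as resampling, Markov chain Monte Carlo, local regression, kernel density estimation, artificial neural networks and generalized additive models.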

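History

Although computational statistics is widely used today, it has a relatively short history of acceptance in the statistics community. For the most part, the founders of the field of statistics relied on mathematics and asymptotic approximations when developing computational statistical methods.^[4]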

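In statistics, the term "computer" (literally, a machine for computing) first appeared in an 1891 article by Robert P. Porter in the Journal of the American Statistical Association, which discussed the use of Herman Hollerith's machine in the 11th Census of the United States.^{[citation needed]} Hollerith's machine, also called the tabulating machine, was an electromechanical machine designed to help summarize information stored on punched cards. Its inventor, Herman Hollerith (February 29, 1860 – November 7, 1929), was an American businessman, inventor and statistician. The tabulating machine was patented in 1884 and used in the 1890 Census of the United States. The 1880 census covered about 50 million people and took more than seven years to tabulate, whereas the 1890 census covered more than 62 million people yet took less than a year. This marked the beginning of the era of mechanized computational statistics and semi-automatic data processing systems.

In 1908, William Sealy Gosset performed his now well-known Monte Carlo simulation, which led to the discovery of the Student's t-distribution.^[5] With the help of computational methods, he also plotted the empirical distributions alongside the corresponding theoretical distributions. The computer has revolutionized simulation and has made replicating Gosset's experiment little more than an exercise.^[6]^[7]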

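Later, scientists put forward computational ways of generating pseudo-random deviates, used the inverse cumulative distribution function or acceptance-rejection methods to transform uniform deviates into other distributional forms, and developed state-space methodology for Markov chain Monte Carlo.^[8] In 1947, the RAND Corporation made one of the first attempts to generate random digits in a fully automated way. The resulting tables were published in 1955 as the book A Million Random Digits with 100,000 Normal Deviates.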
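The inverse cumulative distribution function transformation mentioned above can be illustrated with a minimal sketch. It assumes an exponential target distribution, chosen only because its cumulative distribution function can be inverted in closed form. The function name `sample_exponential`, the rate of 2.0 and the seed are illustrative choices, not taken from the cited sources.

```python
import math
import random


def sample_exponential(rate, rng):
    """Draw one Exponential(rate) deviate by inverting its CDF."""
    u = rng.random()  # uniform deviate in [0, 1)
    # F(x) = 1 - exp(-rate * x), so F^{-1}(u) = -ln(1 - u) / rate.
    return -math.log(1.0 - u) / rate


rng = random.Random(0)  # fixed seed so the sketch is reproducible
draws = [sample_exponential(2.0, rng) for _ in range(100_000)]

# The theoretical mean of Exponential(rate) is 1 / rate = 0.5,
# so the sample mean should land close to that value.
sample_mean = sum(draws) / len(draws)
print(sample_mean)
```

The same pattern applies to any distribution whose inverse cumulative distribution function is available. When it is not, acceptance-rejection is the usual alternative.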

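By the mid-1950s, several articles and patents had proposed devices for random number generation.^[9] Their development was motivated by the need to use random digits for simulation and for other fundamental components of statistical analysis. The best known of these devices is ERNIE, which produces the random numbers that determine the winners of the Premium Bond, a lottery bond issued in the United Kingdom. In 1958, John Tukey developed the jackknife, a method for reducing the bias of parameter estimates in samples under nonstandard conditions.^[10] The method requires computers for practical implementation. By this point, computers had made many tedious statistical studies feasible.^[11]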
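The bias reduction performed by the jackknife can be sketched as follows. This is a minimal illustration of the common leave-one-out formulation, not a reproduction of the original papers. The helper names and the small data set are hypothetical. The example uses the plug-in variance (dividing by n), whose bias is known, so the correction can be checked against the unbiased sample variance.

```python
import statistics


def plug_in_variance(xs):
    """Biased variance estimator that divides by n rather than n - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def jackknife_bias_corrected(xs, estimator):
    """Return n * theta_hat - (n - 1) * mean of leave-one-out estimates."""
    n = len(xs)
    full = estimator(xs)
    # Recompute the estimator n times, each time leaving out one observation.
    leave_one_out = [estimator(xs[:i] + xs[i + 1:]) for i in range(n)]
    loo_mean = sum(leave_one_out) / n
    return n * full - (n - 1) * loo_mean


data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8]  # illustrative values only
corrected = jackknife_bias_corrected(data, plug_in_variance)
unbiased = statistics.variance(data)  # divides by n - 1

print(plug_in_variance(data), corrected, unbiased)
```

For this particular estimator the jackknife correction recovers the unbiased sample variance up to floating-point rounding. For most other estimators it only reduces the bias. The n recomputations of the estimator are the reason the method is tedious without a computer.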

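Methods

Maximum likelihood estimation

Maximum likelihood estimation is used to estimate the parameters of an assumed probability distribution from observed data. It works by maximizing a likelihood function so that the observed data are most probable under the assumed statistical model.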
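As a minimal sketch of this idea, the example below assumes an exponential model with unknown rate and maximizes its log-likelihood numerically. The data values, the search interval and the golden-section search are all illustrative choices. This model also has a closed-form estimate, n divided by the sum of the observations, which is used here only to check the numerical result.

```python
import math


def exp_log_likelihood(rate, xs):
    """Log-likelihood of i.i.d. Exponential(rate) observations."""
    return len(xs) * math.log(rate) - rate * sum(xs)


def golden_section_maximize(f, lo, hi, tol=1e-8):
    """Maximize a unimodal function f on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2


data = [0.8, 1.3, 0.2, 2.1, 0.5, 0.9, 1.7, 0.4]  # illustrative values only

numeric_mle = golden_section_maximize(
    lambda rate: exp_log_likelihood(rate, data), 1e-6, 10.0
)
closed_form_mle = len(data) / sum(data)

print(numeric_mle, closed_form_mle)
```

In practice the likelihood is rarely this simple, and a general-purpose numerical optimizer is used in place of the one-dimensional search shown here.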

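Monte Carlo method

The Monte Carlo method is a statistical method that relies on repeated random sampling to obtain numerical results. The underlying idea is to use randomness to solve problems that may be deterministic in principle. It is commonly applied to physical and mathematical problems and is often effective when other approaches are difficult to use. Monte Carlo methods are mainly used for three classes of problems: optimization, numerical integration, and generating draws from a probability distribution.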
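The numerical-integration use can be sketched with a classic toy problem: estimating π from the area of a quarter circle. The problem itself is deterministic, and randomness is used only as a computational device. The function name, sample size and seed are illustrative assumptions.

```python
import random


def estimate_pi(n_samples, rng):
    """Estimate pi from the fraction of random points inside a quarter circle."""
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()  # uniform point in the unit square
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle has area pi / 4, so the hit fraction estimates pi / 4.
    return 4.0 * inside / n_samples


pi_estimate = estimate_pi(200_000, random.Random(1))
print(pi_estimate)
```

The error of such an estimate shrinks roughly in proportion to the inverse square root of the number of samples, which is why large sample sizes and therefore computers are needed for useful accuracy.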

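Markov chain Monte Carlo

Markov chain Monte Carlo methods create samples from a continuous random variable whose probability density is proportional to a known function. These samples can be used to evaluate an integral over that variable, such as its expected value or variance. The more steps that are included, the more closely the distribution of the sample matches the actual desired distribution.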
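A minimal sketch of one such method, random-walk Metropolis sampling, is shown below. The text above does not name a specific algorithm, so this choice is an assumption. The target is the unnormalized function exp(-x²/2), which is proportional to a standard normal density, so the sample mean and variance should approach 0 and 1. The step size, chain length, burn-in and seed are illustrative.

```python
import math
import random


def log_target(x):
    """Log of an unnormalized density proportional to a standard normal."""
    return -0.5 * x * x


def metropolis(log_density, n_steps, step_size, x0, rng):
    """Random-walk Metropolis sampler returning the visited states."""
    x = x0
    log_p = log_density(x)
    chain = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)  # symmetric proposal
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p_new / p); only the ratio is needed,
        # so the normalizing constant of the target never has to be known.
        if math.log(1.0 - rng.random()) < log_p_new - log_p:
            x, log_p = proposal, log_p_new
        chain.append(x)
    return chain


rng = random.Random(42)
chain = metropolis(log_target, 60_000, 2.0, 0.0, rng)
kept = chain[5_000:]  # discard early steps taken before the chain settles

post_mean = sum(kept) / len(kept)
post_var = sum((x - post_mean) ** 2 for x in kept) / len(kept)
print(post_mean, post_var)
```

The expected value and variance are estimated as plain averages over the retained states, which is the integral evaluation described above.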

Applications

Associations

International Association for Statistical Computing

See also

References

[1] Nolan, D.; Temple Lang, D. Computing in the Statistics Curricula. The American Statistician. 2010, 64 (2): 97–107.
[2] Wegman, Edward J. Computational Statistics: A New Agenda for Statistical Theory and Practice. Journal of the Washington Academy of Sciences. 1988, 78 (4): 310–322. JSTOR
[3] Lauro, Carlo. Computational statistics or statistical computing, is that the question?. Computational Statistics & Data Analysis. 1996, 23 (1): 191–193. doi:10.1016/0167-9473(96)88920-1.
[4] Watnik, Mitchell. Early Computational Statistics. Journal of Computational and Graphical Statistics. 2011, 20 (4): 811–817 [retrieved 2024-02-06]. ISSN 1061-8600. S2CID 120111510. doi:10.1198/jcgs.2011.204b. (Archived from the original on 2023-12-21.)
[5] "Student" [William Sealy Gosset]. The probable error of a mean (PDF). Biometrika. 1908, 6 (1): 1–25 [retrieved 2024-02-06]. JSTOR 2331554. doi:10.1093/biomet/6.1.1. hdl:10338.dmlcz/143545. (Archived (PDF) from the original on 2008-03-08.)
[6] Trahan, Travis John. Recent Advances in Monte Carlo Methods at Los Alamos National Laboratory. 2019-10-03. OSTI 1569710. doi:10.2172/1569710.
[7] Metropolis, Nicholas; Ulam, S. The Monte Carlo Method. Journal of the American Statistical Association. 1949, 44 (247): 335–341. ISSN 0162-1459. PMID 18139350. doi:10.1080/01621459.1949.10483310.
[8] Robert, Christian; Casella, George. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data. Statistical Science. 2011-02-01, 26 (1). ISSN 0883-4237. S2CID 2806098. arXiv:0808.2902. doi:10.1214/10-sts351.
[9] L'Ecuyer, Pierre. History of uniform random number generation (PDF). 2017 Winter Simulation Conference (WSC). 2017: 202–230 [retrieved 2024-02-06]. ISBN 978-1-5386-3428-8. S2CID 4567651. doi:10.1109/WSC.2017.8247790. (Archived (PDF) from the original on 2022-08-04.)
[10] Quenouille, M. H. Notes on Bias in Estimation. Biometrika. 1956, 43 (3–4): 353–360. ISSN 0006-3444. doi:10.1093/biomet/43.3-4.353.
[11] Teichroew, Daniel. A History of Distribution Sampling Prior to the Era of the Computer and its Relevance to Simulation. Journal of the American Statistical Association. 1965, 60 (309): 27–49. ISSN 0162-1459. doi:10.1080/01621459.1965.10480773.

Further reading

Articles

Albert, James H.; Gentle, James E. (eds.), Special Section: Teaching Computational Statistics, The American Statistician, 2004, 58: 1, S2CID 219596225, doi:10.1198/0003130042872
Wilkinson, Leland, The Future of Statistical Computing (with discussion), Technometrics, 2008, 50 (4): 418–435, S2CID 3521989, doi:10.1198/004017008000000460

Books

Drew, John H.; Evans, Diane L.; Glen, Andrew G.; Lemis, Lawrence M., Computational Probability: Algorithms and Applications in the Mathematical Sciences, Springer International Series in Operations Research & Management Science, Springer, 2007, ISBN 978-0-387-74675-3
Gentle, James E., Elements of Computational Statistics, Springer, 2002, ISBN 0-387-95489-9
Gentle, James E.; Härdle, Wolfgang; Mori, Yuichi (eds.), Handbook of Computational Statistics: Concepts and Methods, Springer, 2004, ISBN 3-540-40464-3
Givens, Geof H.; Hoeting, Jennifer A., Computational Statistics, Wiley Series in Probability and Statistics, Wiley-Interscience, 2005, ISBN 978-0-471-46124-1
Klemens, Ben, Modeling with Data: Tools and Techniques for Statistical Computing, Princeton University Press, 2008, ISBN 978-0-691-13314-0
Monahan, John, Numerical Methods of Statistics, Cambridge University Press, 2001, ISBN 978-0-521-79168-7
Rose, Colin; Smith, Murray D., Mathematical Statistics with Mathematica, Springer Texts in Statistics, Springer, 2002, ISBN 0-387-95234-9
Thisted, Ronald Aaron, Elements of Statistical Computing: Numerical Computation, CRC Press, 1988, ISBN 0-412-01371-1
Gharieb, Reda. R., Data Science: Scientific and Statistical Computing, Noor Publishing, 2017, ISBN 978-3-330-97256-8

External links

Associations

International Association for Statistical Computing
Statistical Computing section of the American Statistical Association

Journals

Computational Statistics & Data Analysis
Journal of Computational & Graphical Statistics
Statistics and Computing
