Curse of dimensionality

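The curse of dimensionality is a term first introduced by the American applied mathematician Richard Bellman when considering optimization problems^[1]^[2]. It describes the various problems that arise when analyzing and organizing data in high-dimensional spaces (often with hundreds or thousands of dimensions), because volume grows exponentially as the dimension of the (mathematical) space increases. Such difficulties do not arise in low-dimensional settings, such as physical space, which is usually modelled with only three dimensions.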

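For example, 100 evenly spaced points suffice to sample a unit interval with no more than 0.01 distance between adjacent points. Sampling a 10-dimensional unit hypercube with a lattice whose adjacent points are no more than 0.01 apart requires 10²⁰ sample points. In this sense the 10-dimensional hypercube can be said to be 10¹⁸ times "larger" than the unit interval. (This example is due to Richard Bellman.)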

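Phenomena of this name are discussed in many fields, such as sampling, combinatorics, machine learning, and data mining. The common theme of these problems is that when the dimensionality increases, the volume of the space increases so fast that the available data become sparse. This sparsity is a problem for any method that requires statistical significance: to obtain a statistically sound and reliable result, the amount of data needed to support it typically grows exponentially with the dimensionality. In addition, organizing and searching data often relies on detecting regions in which objects form groups with similar properties. In high-dimensional space, however, all data are sparse and dissimilar in many respects, which makes common data-organization strategies extremely inefficient.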

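The "curse of dimensionality" is often used as a weak excuse for not dealing with high-dimensional data. Nevertheless, the research community remains interested in it, and work on the topic continues. One reason is the notion of intrinsic dimension: any low-dimensional data set can trivially be converted to a higher-dimensional one by adding redundant (for example, duplicated) or random dimensions, and conversely many high-dimensional data sets can be reduced to low-dimensional data without losing important information. This is also reflected in the effectiveness of many dimensionality reduction methods, such as the widely used principal component analysis. For distance functions and nearest neighbor search, current research also shows that data sets exhibiting the curse of dimensionality can still be handled unless they contain too many irrelevant dimensions, since relevant dimensions can actually make many problems (such as cluster analysis) easier. In addition, methods such as Markov chain Monte Carlo or shared nearest neighbor methods^[3] can perform well on data sets that many other methods find intractable because of high dimensionality.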
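As a small illustration of the point about redundant dimensions, the following Python sketch embeds 2-dimensional points in 4 dimensions by duplicating each coordinate. The function names and sample points are hypothetical choices made for this example only. Duplicating every coordinate scales each pairwise Euclidean distance by the same factor √2, so the nearest-neighbor structure is unchanged even though the nominal dimension has doubled.

```python
import math


def duplicate_dimensions(point):
    """Embed a point in twice as many dimensions by repeating its coordinates."""
    return tuple(point) + tuple(point)


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor(index, points):
    """Index of the point closest to points[index], excluding itself."""
    others = [j for j in range(len(points)) if j != index]
    return min(others, key=lambda j: euclidean(points[index], points[j]))


# Hypothetical sample data: four points in the plane.
points_2d = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (5.0, 5.0)]
points_4d = [duplicate_dimensions(p) for p in points_2d]

for i in range(len(points_2d)):
    # The nearest neighbor is the same in the 2-D and the 4-D representation.
    print(i, nearest_neighbor(i, points_2d), nearest_neighbor(i, points_4d))
```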

Combinatorics

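In some problems, each variable can take one of several discrete values, or the range of possible values is divided to give a finite number of possibilities. Taking the variables together, a huge number of combinations of values must be considered. This effect is known as the combinatorial explosion. Even in the simplest case of $d$ binary variables, the number of possible combinations is already $O(2^{d})$, exponential in the dimensionality. In this binary case, each additional dimension doubles the effort needed to try all combinations.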
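The growth in the binary case can be checked directly. The sketch below enumerates every assignment of $d$ binary variables with `itertools.product` and counts them; the helper name `binary_combinations` is a hypothetical name chosen for this illustration.

```python
import itertools


def binary_combinations(d):
    """Lazily yield every assignment of d binary variables as a tuple of 0s and 1s."""
    return itertools.product((0, 1), repeat=d)


for d in (1, 2, 3, 10, 20):
    # Count by exhaustive enumeration; the count equals 2**d.
    count = sum(1 for _ in binary_combinations(d))
    print(d, count)
```

Adding one variable doubles the count, so exhaustive enumeration quickly becomes impractical as $d$ grows.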

Sampling

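Adding extra dimensions to a mathematical space causes an exponential increase in volume. For example, 10² = 100 evenly spaced sample points suffice to sample a unit interval (a "1-dimensional cube") with no more than 10⁻² = 0.01 distance between points. An equivalent sampling of a 10-dimensional unit hypercube, with a lattice spacing of 10⁻² = 0.01 between adjacent points, would require 10²⁰ sample points. In general, with a spacing of $10^{-n}$, the 10-dimensional hypercube requires $10^{n(10-1)}=(10^{n})^{10}/10^{n}$ times as many sample points as the 1-dimensional hypercube, that is, the unit interval. In the example above, $n=2$: with a sampling distance of 0.01, the 10-dimensional hypercube requires 10¹⁸ times as many sample points as the unit interval. This effect is a combination of the combinatorics problem described above and the distance function problem explained below.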
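The arithmetic in this example can be reproduced with exact integers. The sketch below follows the text's convention of $10^{n}$ lattice points per axis for a spacing of $10^{-n}$; the function names are hypothetical.

```python
def lattice_points(n, d):
    """Lattice points needed for spacing 10**-n in a d-dimensional unit hypercube,
    using the text's convention of 10**n points per axis."""
    return (10 ** n) ** d


def growth_factor(n, d):
    """How many times more points the d-dimensional cube needs than the unit interval."""
    return lattice_points(n, d) // lattice_points(n, 1)


print(lattice_points(2, 1))    # unit interval, spacing 0.01
print(lattice_points(2, 10))   # 10-dimensional unit hypercube, spacing 0.01
print(growth_factor(2, 10))    # equals 10**(2 * (10 - 1))
```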

Optimization

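When solving dynamic optimization problems by numerical backward induction, the objective function must be computed for every possible combination of values. This is extremely difficult when the dimension of the state variable is large.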
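The cost of backward induction can be made concrete with a toy problem. The sketch below is a minimal illustration under assumed conditions, not a general solver: the state is a $d$-dimensional vector on a hypothetical 3-point grid per coordinate, there are three hypothetical actions, and the reward and transition functions are invented for the example. The solver counts how often the objective is evaluated.

```python
import itertools


def backward_induction(grid, d, horizon, actions, reward, transition):
    """Finite-horizon backward induction over the full state grid.

    Returns the value function at the first period and the number of
    objective evaluations performed.
    """
    states = list(itertools.product(grid, repeat=d))
    value = {s: 0.0 for s in states}  # terminal value is zero
    evaluations = 0
    for _ in range(horizon):
        updated = {}
        for s in states:
            best = float("-inf")
            for a in actions:
                evaluations += 1
                best = max(best, reward(s, a) + value[transition(s, a)])
            updated[s] = best
        value = updated
    return value, evaluations


# Hypothetical toy problem used only for illustration.
GRID = (0, 1, 2)
ACTIONS = (-1, 0, 1)


def toy_transition(state, action):
    """Shift every coordinate by the action, clamped to the grid."""
    return tuple(min(max(x + action, GRID[0]), GRID[-1]) for x in state)


def toy_reward(state, action):
    """Penalize large state values and non-zero actions."""
    return -float(sum(state)) - abs(action)


for d in (1, 2, 4, 6):
    _, evaluations = backward_induction(GRID, d, 3, ACTIONS, toy_reward, toy_transition)
    print(d, evaluations)  # horizon * len(GRID)**d * len(ACTIONS)
```

The number of evaluations grows as `len(GRID) ** d`, so even this coarse grid becomes expensive once the state has more than a handful of dimensions.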

Machine learning

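In machine learning problems that involve learning a "state of nature" (possibly an infinite distribution) from a finite number of data samples in a high-dimensional feature space, where each feature can take a range of possible values, a considerable amount of training data is required so that every combination of values is represented by some samples. With a fixed number of training samples, predictive power decreases as the dimensionality increases. This is known as the Hughes effect^[4] or Hughes phenomenon (named after Gordon F. Hughes).^[5]^[6]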
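The requirement that combinations of feature values be represented in the training data can be illustrated with a short simulation. This sketch assumes uniformly distributed features, each discretized into 10 bins, and a fixed budget of 1000 samples; these settings and the function name are hypothetical. It reports only how many cells of the feature grid contain at least one sample, and says nothing about the accuracy of any particular classifier.

```python
import random


def covered_fraction(n_samples, d, bins=10, seed=0):
    """Fraction of the bins**d feature-value combinations hit by at least one sample."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(n_samples):
        # rng.random() lies in [0, 1), so each bin index lies in 0..bins-1.
        cell = tuple(int(rng.random() * bins) for _ in range(d))
        seen.add(cell)
    return len(seen) / bins ** d


for d in (1, 2, 3, 6):
    print(d, covered_fraction(1000, d))
```

With the sample budget held fixed, the covered fraction can never exceed `n_samples / bins ** d`, which shrinks exponentially in `d`.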

Bayesian statistics

In Bayesian statistics the curse of dimensionality is often a difficulty, because posterior distributions usually involve many parameters.

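However, this problem has been largely overcome by the advent of simulation-based Bayesian inference, especially Markov chain Monte Carlo methods, which are suitable for many practical problems. Simulation-based methods converge slowly, however, and are therefore not a panacea for high-dimensional problems.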
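A minimal random-walk Metropolis sampler gives a feel for why simulation-based methods are not a panacea. The sketch below is an illustration under strong assumptions: a standard normal density in $d$ dimensions stands in for a posterior, the proposal step size is held fixed at a hypothetical value of 1.0, and the chain starts at the mode. In this setup the acceptance rate drops sharply as $d$ grows, so the chain explores the target ever more slowly unless the sampler is retuned.

```python
import math
import random


def log_density(x):
    """Unnormalized log density of a standard normal in len(x) dimensions."""
    return -0.5 * sum(v * v for v in x)


def metropolis(d, n_steps, step_size=1.0, seed=0):
    """Random-walk Metropolis; returns the samples and the acceptance rate."""
    rng = random.Random(seed)
    x = [0.0] * d
    log_p = log_density(x)
    samples, accepted = [], 0
    for _ in range(n_steps):
        proposal = [v + rng.gauss(0.0, step_size) for v in x]
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p_new / p_old).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
            accepted += 1
        samples.append(list(x))
    return samples, accepted / n_steps


for d in (1, 10, 100):
    _, rate = metropolis(d, 2000)
    print(d, rate)
```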

Distance functions

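When a measure such as Euclidean distance is defined using many coordinates, there is little difference in the distances between different pairs of samples.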

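One way to illustrate the "vastness" of high-dimensional Euclidean space is to compare the volume of an inscribed hypersphere of radius $r$ and dimension $d$ with that of a hypercube of the same dimension with edges of length $2r$. The volume of such a sphere is: $V_{\mathrm {hypersphere} }={\frac {2r^{d}\pi ^{d/2}}{d\Gamma (d/2)}}$ ^{[Note 1]}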

The volume of the cube is: $V_{\mathrm {hypercube} }=(2r)^{d}$
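The two volume formulas above can be evaluated numerically with the standard library's `math.gamma`. The sketch below computes the ratio of the inscribed hypersphere's volume to the hypercube's volume for a few dimensions; the function names and the default radius `r=1.0` are choices made for this illustration.

```python
import math


def hypersphere_volume(d, r=1.0):
    """Volume of a d-dimensional ball of radius r: 2 r^d pi^(d/2) / (d Gamma(d/2))."""
    return 2 * r ** d * math.pi ** (d / 2) / (d * math.gamma(d / 2))


def hypercube_volume(d, r=1.0):
    """Volume of a d-dimensional cube with edge length 2r."""
    return (2 * r) ** d


for d in (1, 2, 3, 5, 10, 20):
    ratio = hypersphere_volume(d) / hypercube_volume(d)
    print(d, ratio)
```

For $d=2$ the ratio is $\pi /4$ (a disc inside a square) and for $d=3$ it is $\pi /6$. The radius cancels in the ratio, so the result does not depend on $r$.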

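As the dimension $d$ of the space increases, the volume of the hypersphere becomes insignificant relative to that of the hypercube. This can be seen clearly by comparing the proportions as $d$ goes to infinity: ${\frac {V_{\mathrm {hypersphere} }}{V_{\mathrm {hypercube} }}}={\frac {\pi ^{d/2}}{d2^{d-1}\Gamma (d/2)}}\rightarrow 0$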

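Thus, in some sense, nearly all of the high-dimensional space is far away from the centre. Put another way, the high-dimensional unit hypercube can be said to consist almost entirely of its "corners", with almost no "middle". This is an important intuition for understanding the chi-squared distribution. Furthermore, for a given fixed distribution, the difference between the maximum and minimum distances, relative to the minimum distance, converges to 0, so that the minimum and maximum distances become indistinguishable: $\lim _{d\to \infty }{\frac {\operatorname {dist} _{\max }-\operatorname {dist} _{\min }}{\operatorname {dist} _{\min }}}=0$ .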

This is often cited as an example of distance functions losing their meaning in high dimensions.
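The shrinking contrast between the nearest and farthest point can be observed in a small simulation. This sketch assumes points drawn independently and uniformly from the unit hypercube and measures Euclidean distances from one random query point. The sample size, seed, and function name are hypothetical choices, and other data distributions may behave differently.

```python
import math
import random


def relative_contrast(d, n_points=200, seed=0):
    """(dist_max - dist_min) / dist_min for random points around a random query."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(d)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(d)]
        distances.append(math.dist(query, point))
    return (max(distances) - min(distances)) / min(distances)


for d in (2, 10, 100, 1000):
    print(d, relative_contrast(d))
```

In low dimensions the farthest point is many times farther away than the nearest one. In 1000 dimensions the two distances differ by only a small fraction of the nearest distance.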

See also

Combinatorial explosion
Concentration of measure
Dimensionality reduction
Fourier-related transforms
Clustering high-dimensional data

Notes

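[Note 1] In $n$-dimensional Euclidean space, the $n$-dimensional volume of a ball of radius $R$ is

$V_{n}(R)={\frac {\pi ^{n/2}}{\Gamma ({\frac {n}{2}}+1)}}R^{n}$ (Note eq. 1)

where $\Gamma$ denotes the gamma function. By (Note eq. 1), the volume of a ball of radius $r$ and dimension $d$ is therefore

$V_{\mathrm {hypersphere} }=V_{d}(r)={\frac {\pi ^{d/2}}{\Gamma ({\frac {d}{2}}+1)}}r^{d}$ (Note eq. 2)

The recurrence relation of the gamma function is

$\Gamma (x+1)=x\Gamma (x)$ (Note eq. 3)

Substituting $d/2$ for $x$ in (Note eq. 3) gives

$\Gamma \left({\frac {d}{2}}+1\right)={\frac {d\Gamma (d/2)}{2}}$ (Note eq. 4)

Substituting (Note eq. 4) into (Note eq. 2) yields

$V_{\mathrm {hypersphere} }={\frac {2r^{d}\pi ^{d/2}}{d\Gamma (d/2)}}$ (Note eq. 5)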

References

[1] Richard Ernest Bellman; Rand Corporation. Dynamic Programming. Princeton University Press. 1957. ISBN 978-0-691-07951-6. ^{[dead link]}
Republished: Richard Ernest Bellman. Dynamic Programming. Courier Dover Publications. 2003 [2012-05-18]. ISBN 978-0-486-42809-3. (Archived from the original on 2021-02-23).
[2] Richard Ernest Bellman. Adaptive control processes: a guided tour. Princeton University Press. 1961 [2012-05-18]. (Archived from the original on 2021-02-23).
[3] Michael E. Houle, Hans-Peter Kriegel, Peer Kröger, Erich Schubert, Arthur Zimek. Can Shared-Neighbor Distances Defeat the Curse of Dimensionality?. Springer, Berlin, Heidelberg: 482–500. 2010-06-30 [2018-04-02]. ISBN 9783642138171. doi:10.1007/978-3-642-13818-8_34. (Archived from the original on 2018-06-17).
[4] Thomas Oommen, Debasmita Misra, Navin K. C. Twarakavi, Anupma Prakash, Bhaskar Sahoo, Sukumar Bandopadhyay. An Objective Analysis of Support Vector Machine Based Classification for Remote Sensing. Mathematical Geosciences. 2008-05-01, 40 (4): 409–424 [2018-04-02]. ISSN 1874-8961. doi:10.1007/s11004-008-9156-6. (Archived from the original on 2018-06-18).
[5] Hughes, G.F., 1968. "On the mean accuracy of statistical pattern recognizers", IEEE Transactions on Information Theory, IT-14:55-63.
[6] Not to be confused with the unrelated, but similarly named, Hughes effect in electromagnetism (named after Declan C. Hughes^{[permanent dead link]}) which refers to an asymmetry in the hysteresis curves of laminated cores made of certain magnetic materials, such as permalloy or mu-metal, in alternating magnetic fields.
[7] R. B. Marimont and M. B. Shapiro, "Nearest Neighbour Searches and the Curse of Dimensionality", Journal of the Institute of Mathematics and its Applications, 24, 1979, 59-70.
[8] E. Chavez et al., "Searching in Metric Spaces", ACM Computing Surveys, 33, 2001, 273-321.
[9] Thomas Bernecker, Michael E. Houle, Hans-Peter Kriegel, Peer Kröger, Matthias Renz, Erich Schubert, Arthur Zimek. Quality of Similarity Rankings in Time Series. Springer, Berlin, Heidelberg: 422–440. 2011-08-24 [2018-04-02]. ISBN 9783642229213. doi:10.1007/978-3-642-22922-0_25. (Archived from the original on 2019-12-23).
[10] Radovanović, Miloš; Nanopoulos, Alexandros; Ivanović, Mirjana. Hubs in space: Popular nearest neighbors in high-dimensional data (PDF). Journal of Machine Learning Research. 2010, 11: 2487–2531 [2012-05-18]. (Archived (PDF) from the original on 2019-07-17).
[11] Miloš Radovanović, Alexandros Nanopoulos, Mirjana Ivanović. On the existence of obstinate results in vector space models. ACM: 186–193. 2010-07-19 [2018-04-02]. ISBN 9781450301534. doi:10.1145/1835449.1835482.

Bellman, R.E. 1957. Dynamic Programming. Princeton University Press, Princeton, NJ.
- Republished 2003: Dover, ISBN 0486428095.
Bellman, R.E. 1961. Adaptive Control Processes. Princeton University Press, Princeton, NJ.
Powell, Warren B. 2007. Approximate Dynamic Programming: Solving the Curses of Dimensionality. Wiley, ISBN 0470171553.
