Training, validation, and test sets

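A common task in machine learning is to learn and build a model from data (a process called training) so that the model can make predictions on data it encounters in the future.^[1] The data used to build the final model usually comes from several datasets. Three are commonly used at different stages of model building: the training set, the validation set, and the test set.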

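The model is first fit on the training set (also called the training dataset).^[2] In supervised learning, the training set is the set of examples used to fit the parameters (for example, the weights of the connections between neurons in an artificial neural network).^[3] In practice, the training set usually consists of pairs of an input vector (or scalar) and an output vector (or scalar), where the output is called the target or label. During training, the current model makes a prediction for each example in the training set and compares it with the target. Based on the result of that comparison, the learning algorithm updates the model's parameters. Model fitting may include both feature selection and parameter estimation.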
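The predict, compare, update cycle described above can be sketched as follows. This is a minimal illustration, not a method taken from the cited sources: the one-parameter-pair linear model, the squared-error gradient step, the function name `train_linear`, the learning rate, and the toy data are all assumptions chosen for brevity.

```python
def train_linear(pairs, lr=0.05, epochs=500):
    """Fit y ~ w*x + b by per-example gradient descent on squared error.

    `pairs` is a training set of (input, target) scalars. The model form,
    learning rate, and epoch count are illustrative assumptions.
    """
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in pairs:
            prediction = w * x + b       # predict with the current model
            error = prediction - target  # compare prediction with the target
            w -= lr * error * x          # update the parameters from the error
            b -= lr * error
    return w, b


# Hypothetical training set generated from y = 2x + 1 (no noise).
training_set = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train_linear(training_set)
```

On this noise-free data the fitted `w` and `b` approach 2 and 1. Real training sets are noisy and much larger, and the update rule depends on the model and the learning algorithm.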

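Next, the fitted model is used to make predictions on a second dataset, the validation set (also called the validation dataset).^[2] While the model's hyperparameters (for example, the number of neurons in the hidden layers of a neural network^[3]) are being tuned, the validation set provides an unbiased evaluation of the model fit on the training set.^[4] The validation set can also be used for regularization by early stopping: training is stopped when the error on the validation set rises, since this is a sign of overfitting to the training set.^[5] In practice, however, the validation error fluctuates during training, so this simple rule does not always work. Various rules have therefore been devised to give a more reliable signal of overfitting.^[5]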
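One widely used way to tolerate fluctuations in the validation error is a "patience" rule: keep training until the error has failed to improve on its best value for a fixed number of consecutive epochs. The sketch below is an illustration under that assumption and is not necessarily one of the criteria in the cited source; the function name `early_stopping_epoch`, the `patience` parameter, and the error values are hypothetical.

```python
def early_stopping_epoch(validation_errors, patience=3):
    """Return (best_epoch, stop_epoch) for per-epoch validation errors.

    Training is treated as stopped at the first epoch whose error has not
    improved on the best error seen for `patience` epochs. If that never
    happens, stop_epoch is the last epoch.
    """
    best_error = float("inf")
    best_epoch = -1
    stop_epoch = len(validation_errors) - 1
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            # New best validation error: remember where it occurred.
            best_error, best_epoch = error, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop here.
            stop_epoch = epoch
            break
    return best_epoch, stop_epoch


# Hypothetical validation errors with a temporary rise at epoch 3.
errors = [0.90, 0.70, 0.55, 0.57, 0.50, 0.52, 0.53, 0.56, 0.60]
best_epoch, stop_epoch = early_stopping_epoch(errors, patience=3)
```

With these sample errors the rule keeps the model from epoch 4 and stops at epoch 7. Stopping at the first rise would have halted at epoch 3 and missed the lower error reached at epoch 4.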

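Finally, the test set (also called the test dataset) is used to provide an unbiased evaluation of the final model.^[4] If the test set has never been used during training (for example, it was not used in cross-validation), it is also called a holdout set.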
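As a rough illustration of how the three datasets can be kept separate, the sketch below shuffles a list of examples once and cuts it into disjoint training, validation, and test portions. The function name `three_way_split`, the 60/20/20 proportions, the fixed seed, and the toy examples are assumptions made for this example; the source does not prescribe any particular split.

```python
import random


def three_way_split(examples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle `examples` and split them into three disjoint lists.

    Returns (training_set, validation_set, test_set). The fractions and
    the seed are illustrative defaults, not recommended values.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test_set = shuffled[:n_test]                      # held out until the end
    validation_set = shuffled[n_test:n_test + n_val]  # used while tuning
    training_set = shuffled[n_test + n_val:]          # used to fit parameters
    return training_set, validation_set, test_set


# Hypothetical (input, target) pairs.
examples = [(x, 2 * x + 1) for x in range(10)]
training_set, validation_set, test_set = three_way_split(examples)
```

Because the test portion is set aside before any fitting or tuning, it plays the role of a holdout set: it is consulted only once, to evaluate the final model.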

References

[1] Kohavi, Ron; Provost, Foster. Glossary of terms. Machine Learning. 1998, 30: 271–274. Retrieved 2019-12-10. Archived from the original on 2019-11-11.
[2] James, Gareth. An Introduction to Statistical Learning: with Applications in R. Springer. 2013: 176. ISBN 978-1461471370. Retrieved 2019-12-10. Archived from the original on 2019-06-23.
[3] Ripley, Brian. Pattern Recognition and Neural Networks. Cambridge University Press. 1996: 354. ISBN 978-0521717700.
[4] Brownlee, Jason. What is the Difference Between Test and Validation Datasets?. 2017-07-13. Retrieved 2017-10-12. Archived from the original on 2019-12-10.
[5] Prechelt, Lutz; Geneviève B. Orr. Early Stopping - But When?. In Grégoire Montavon; Klaus-Robert Müller (eds.). Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science. Springer Berlin Heidelberg. 2012-01-01: 53–67. ISBN 978-3-642-35289-8. doi:10.1007/978-3-642-35289-8_5.
