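Semi-supervised learning is a class of machine learning techniques in which training uses both labeled and unlabeled data, typically a small amount of labeled data together with a large amount of unlabeled data. Semi-supervised learning falls between unsupervised learning (no labeled data at all) and supervised learning (all data labeled). Labeling data for machine learning often requires a skilled human expert to classify the training examples by hand. The cost of this process can make a fully labeled data set infeasible, whereas unlabeled data is usually much cheaper to obtain. In such situations, semi-supervised learning can be of great practical value.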

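One example of semi-supervised learning is co-training, in which two or more learners are trained together on the same data set, but each learner uses a different set of features, ideally sets that are independent of one another.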
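The co-training idea can be sketched in a few lines of Python. This is only a minimal illustration, not the original algorithm of any particular paper: the nearest-centroid learner, the function names (`fit_centroids`, `predict`, `co_train`), the confidence measure, and the toy data are all assumptions made for the example. Each example has two features, and each feature is treated as one "view" used by one learner.

```python
def fit_centroids(values, labels):
    """Return the mean of a 1-D feature for each class (0 and 1).

    Assumes both classes appear at least once in `labels`.
    """
    means = {}
    for c in (0, 1):
        members = [v for v, y in zip(values, labels) if y == c]
        means[c] = sum(members) / len(members)
    return means


def predict(means, v):
    """Return (label, confidence); confidence is the distance margin."""
    d0, d1 = abs(v - means[0]), abs(v - means[1])
    return (0 if d0 <= d1 else 1), abs(d0 - d1)


def co_train(labeled, unlabeled, rounds=10):
    """labeled: list of ((x_a, x_b), y); unlabeled: list of (x_a, x_b).

    In every round, each view's learner is refit on the shared pool and
    moves the unlabeled example it is most confident about into the pool,
    tagged with its own predicted label.
    """
    pool, rest = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not rest:
                break
            means = fit_centroids([x[view] for x, _ in pool],
                                  [y for _, y in pool])
            best = max(range(len(rest)),
                       key=lambda i: predict(means, rest[i][view])[1])
            x = rest.pop(best)
            pool.append((x, predict(means, x[view])[0]))
    # Final models: one learner per view, fit on the enlarged pool.
    models = [fit_centroids([x[v] for x, _ in pool], [y for _, y in pool])
              for v in (0, 1)]
    return models, pool


# Toy data (hypothetical): class 0 lies near 0 and class 1 near 4 in both views.
class0 = [(-1 + 0.1 * i, -1 + 0.1 * ((7 * i) % 20)) for i in range(20)]
class1 = [(3 + 0.1 * i, 3 + 0.1 * ((7 * i) % 20)) for i in range(20)]
seed = [((0.0, 0.2), 0), ((4.1, 3.8), 1)]  # only two labeled examples

models, pool = co_train(seed, class0 + class1, rounds=10)
print(len(pool), "examples in the pool after co-training")
```

Here the two learners share one growing pool, so an example labeled confidently through one view becomes training data for the other view. Real co-training setups usually use stronger base learners and add several examples per class in each round.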

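An alternative approach is to model the joint probability distribution of the features and the labels. For the unlabeled data, the labels can then be treated as "missing data". Techniques for handling missing data, such as Gibbs sampling or expectation maximization, can be used to estimate the parameters.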
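As a rough illustration of the missing-data view, the sketch below runs expectation maximization on a two-class model with one Gaussian per class over a single feature. The model choice, the function names (`normal_pdf`, `em_semi_supervised`), the initial variance of 1.0, and the toy data are assumptions made for the example. Labeled points keep their known class in the E-step, while unlabeled points receive posterior class probabilities.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_semi_supervised(labeled, unlabeled, iterations=50):
    """labeled: list of (x, y) with y in {0, 1}; unlabeled: list of x.

    Returns (prior, mean, var), each a list indexed by class.
    Assumes both classes appear in `labeled`.
    """
    # Initialise from the labeled data alone; var = 1.0 is an arbitrary guess.
    prior = [0.5, 0.5]
    mean = [sum(x for x, y in labeled if y == c) /
            sum(1 for _, y in labeled if y == c) for c in (0, 1)]
    var = [1.0, 1.0]
    xs = [x for x, _ in labeled] + list(unlabeled)

    for _ in range(iterations):
        # E-step: known labels give one-hot responsibilities;
        # missing labels are replaced by their posterior p(y | x).
        resp = [[1.0 if y == c else 0.0 for c in (0, 1)] for _, y in labeled]
        for x in unlabeled:
            joint = [prior[c] * normal_pdf(x, mean[c], var[c]) for c in (0, 1)]
            total = sum(joint)  # a log-space version would be safer for extreme x
            resp.append([j / total for j in joint])

        # M-step: weighted maximum-likelihood updates of the joint model.
        for c in (0, 1):
            weight = sum(r[c] for r in resp)
            prior[c] = weight / len(xs)
            mean[c] = sum(r[c] * x for r, x in zip(resp, xs)) / weight
            var[c] = max(sum(r[c] * (x - mean[c]) ** 2
                             for r, x in zip(resp, xs)) / weight, 1e-6)
    return prior, mean, var


# Toy data (hypothetical): two labeled points, 42 unlabeled ones.
labeled = [(0.3, 0), (3.6, 1)]
unlabeled = [-1 + 0.1 * i for i in range(21)] + [3 + 0.1 * i for i in range(21)]

prior, mean, var = em_semi_supervised(labeled, unlabeled)
print("class means:", mean)
```

With only the two labeled points, the class means would be estimated as 0.3 and 3.6; the unlabeled points pull the estimates toward the centers of the two clusters. Gibbs sampling would replace the soft E-step with sampled labels, which is not shown here.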

Further reading

  1. Abney, S., Semisupervised Learning for Computational Linguistics. Chapman & Hall/CRC, 2008.
  2. Blum, A., Mitchell, T. Combining labeled and unlabeled data with co-training. COLT: Proceedings of the Workshop on Computational Learning Theory, Morgan Kaufmann, 1998, p. 92-100.
  3. Chapelle, O., B. Schölkopf and A. Zien: Semi-Supervised Learning. MIT Press, Cambridge, MA (2006).
  4. Huang T-M., Kecman V., Kopriva I., Kernel Based Algorithms for Mining Huge Data Sets, Supervised, Semisupervised and Unsupervised Learning, Springer-Verlag, Berlin, Heidelberg, 260 pp. 96 illus., Hardcover, ISBN 3-540-31681-7, 2006.
  5. O'Neill, T. J. (1978) "Normal discrimination with unclassified observations". Journal of the American Statistical Association, 73, 821–826.
  6. Theodoridis S., Koutroumbas K. (2009) Pattern Recognition, 4th Edition, Academic Press, ISBN 978-1-59749-272-0.
  7. Zhu, X. Semi-supervised learning literature survey.
  8. Zhu, X., Goldberg, A. (2009) Introduction to Semi-Supervised Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 3, 1-130. Morgan & Claypool Publishers, 2009.
  9. Song, E. et al., Semi-supervised multi-class Adaboost by exploiting unlabeled data, Expert Systems with Applications, Vol. 38, Issue 6, p. 6720-6726, June 2011.