Reinforcement learning

Reinforcement learning (RL; Cantonese: 強化學習, Jyutping: koeng4 faa3 hok6 zaap6) is a learning paradigm in machine learning.

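In reinforcement learning, the researcher does not supply data $x$ for the machine learning program to look at and learn from, unlike in supervised or unsupervised learning. Instead, the program is left to interact with its surrounding environment on its own (the environment can be a real-world setting or a simulated one): at each time step $t$, the program produces an action, represented by the numbers it outputs, and the environment then returns some feedback, which in simple terms tells the program whether its action was right or not. The program then uses this feedback to work out how to adjust its parameters so that its next actions are more likely to receive a positive response^[1]^[2].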
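The interaction loop described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the environment (`environment_feedback`, where action 2 happens to be the "right" one), the parameter names and the step size are all hypothetical, and the update rule is a gradient-bandit style softmax update without a baseline, chosen here for brevity rather than taken from the cited sources.

```python
import math
import random


def softmax(prefs):
    """Turn raw preferences into action probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def environment_feedback(action):
    """Hypothetical environment: action 2 is 'right' (+1), the rest are 'wrong' (-1)."""
    return 1.0 if action == 2 else -1.0


def train(num_actions=3, steps=500, step_size=0.1, seed=0):
    rng = random.Random(seed)
    prefs = [0.0] * num_actions  # the program's adjustable parameters
    for t in range(steps):
        probs = softmax(prefs)
        # The program produces an action at time step t.
        action = rng.choices(range(num_actions), weights=probs)[0]
        # The environment says whether the action was right or wrong.
        feedback = environment_feedback(action)
        # Adjust the parameters: positive feedback makes the chosen action
        # more likely next time, negative feedback makes it less likely.
        for a in range(num_actions):
            indicator = 1.0 if a == action else 0.0
            prefs[a] += step_size * feedback * (indicator - probs[a])
    return softmax(prefs)


final_probs = train()
print(final_probs)  # probability of each action after training
```

After training, most of the probability mass should sit on the action that the hypothetical environment rewards.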

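Concepts

A reinforcement learning process can be modelled as a Markov decision process^[3]:

Environment:
- The environment has a number of possible states, $s$;
- The state of the environment can change as a result of the agent's actions;
- From the experimenter's point of view, the rule governing how the environment's state changes (the environment's state-transition function) may be known (as in a simulated environment) or unknown (as in the real world).
Agent:
- The agent has a number of possible states, $S_{a}$;
- The agent has a number of possible actions, $a$;
- The policy is the mapping from the state $s$ perceived from the outside world to the action $a$ to be taken;
- $P_{a}(s,s')=\Pr(s_{t+1}=s'\mid s_{t}=s,a_{t}=a)$ is the probability that, at time step $t$, taking action $a$ changes the environment from $s$ to $s'$;
- $R_{a}(s,s')$ is the reinforcement, also called the reward; put simply, it reflects how much the agent "likes" the outcome when taking action $a$ changes the environment from $s$ to $s'$.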
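To make the notation concrete, the components listed above can be written down as plain Python dictionaries. The two states, two actions and all of the numbers below are invented purely for illustration; `P[a][s][s2]` plays the role of $P_{a}(s,s')$ and `R[a][s][s2]` the role of $R_{a}(s,s')$.

```python
import random

STATES = ["s0", "s1"]
ACTIONS = ["stay", "move"]

# P[a][s][s2] = Pr(s_{t+1} = s2 | s_t = s, a_t = a); each row sums to 1.
P = {
    "stay": {"s0": {"s0": 0.9, "s1": 0.1}, "s1": {"s0": 0.1, "s1": 0.9}},
    "move": {"s0": {"s0": 0.2, "s1": 0.8}, "s1": {"s0": 0.8, "s1": 0.2}},
}

# R[a][s][s2] = reward for the transition s -> s2 under action a.
# In this made-up example the agent "likes" ending up in s1.
R = {
    "stay": {"s0": {"s0": 0.0, "s1": 1.0}, "s1": {"s0": 0.0, "s1": 1.0}},
    "move": {"s0": {"s0": 0.0, "s1": 1.0}, "s1": {"s0": 0.0, "s1": 1.0}},
}

# A policy maps each perceived state s to the action a to take.
policy = {"s0": "move", "s1": "stay"}


def expected_reward(s, a):
    """Expected one-step reward: sum over s2 of P_a(s, s2) * R_a(s, s2)."""
    return sum(P[a][s][s2] * R[a][s][s2] for s2 in STATES)


def step(s, a, rng):
    """Sample the next state from P_a(s, .) and return it with its reward."""
    next_states = list(P[a][s])
    weights = [P[a][s][s2] for s2 in next_states]
    s2 = rng.choices(next_states, weights=weights)[0]
    return s2, R[a][s][s2]


print(expected_reward("s0", policy["s0"]))  # 0.2 * 0.0 + 0.8 * 1.0
next_state, reward = step("s0", policy["s0"], random.Random(0))
```

When the transition table `P` is known, as in a simulated environment, quantities such as `expected_reward` can be computed directly; when it is unknown, the agent can only call something like `step` and learn from the samples it gets back.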

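Reinforcement learning has a wide range of uses. For example, it can be used to teach an AI program to play video games: as long as the researcher finds some way for the program to perceive the state of the game and to send inputs to it, reinforcement learning can, if all goes well, let the program learn to play the game^[4]^[5].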

Before-and-after demonstration of using reinforcement learning to teach a program to crawl in a virtual space

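Combining with curiosity

Reinforcement learning can be combined with artificial curiosity. In the early twenty-first century, the biggest weakness of AI was over-specialization: an AI taught to help diagnose illness will not know how to make, say, legal judgments. Everyday experience shows, however, that people can learn one thing and then go on to learn another. One important reason is that people are curious: when the information at hand is not enough, they often actively seek out new information to absorb. Some AI researchers therefore proposed the concept of "artificial curiosity", arguing that computers should simulate human curiosity so that an AI does not need to be fed information by humans every time, but instead knows how to find information for itself^[6]^[7].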

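One example given to illustrate artificial curiosity goes as follows^[8]:

- Imagine a person walking in circles around a supermarket, looking for spinach;
- At every step, the person passes several rows of shelves, none of which hold any spinach;
- If the person acts according to simple RL, they keep finding that "the nearby shelves have no spinach, so no option gets reinforced", and end up in a state of "never leaving the loop";
- But if the person is capable of curiosity, and is for example motivated to explore routes not yet taken, picking at random a route outside the loop, they may eventually find the spinach, or at least escape the state of "going round in circles".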
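The supermarket example can be played out as a toy simulation. Everything in the sketch below is an assumption made for illustration: the aisle layout (`AISLES`), the location of the spinach, and the two decision rules. The non-curious walker has received no reward, so nothing has been reinforced and it simply keeps taking its default (first-listed) option; the curious walker prefers the neighbouring aisle it has visited least, a deterministic stand-in for the random exploration described in the example.

```python
# Hypothetical layout: A -> B -> C -> A is the loop; D is a side aisle off B.
AISLES = {
    "A": ["B"],
    "B": ["C", "D"],
    "C": ["A"],
    "D": [],
}
SPINACH_AT = "D"


def walk(curious, max_steps=20, start="A"):
    """Return the number of moves needed to reach the spinach, or None."""
    visits = {aisle: 0 for aisle in AISLES}  # memory of past experience
    position = start
    visits[position] += 1
    for step in range(1, max_steps + 1):
        options = AISLES[position]
        if curious:
            # Prefer the least-visited neighbour (ties go to the first listed).
            position = min(options, key=lambda aisle: visits[aisle])
        else:
            # No option has ever been reinforced, so keep the default route.
            position = options[0]
        visits[position] += 1
        if position == SPINACH_AT:
            return step
    return None


print(walk(curious=False))  # None: still circling after max_steps moves
print(walk(curious=True))   # 5: one full lap, then the unexplored side aisle
```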

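Some researchers argue that this is precisely the problem with AI that lacks curiosity: an agent without curiosity can only grow if someone gives it useful information or a useful environment, yet in reality people often face situations where the surroundings offer little useful information and they have to seek it out themselves^[9]. Curiosity is exactly what makes people seek out useful information of their own accord. To achieve artificial curiosity, an RL algorithm needs at least the following^[8]:

- The algorithm must have memory, so that it can remember past experience;
- The algorithm must be able to compare what it observes at the present moment with the episodes held in memory;
- $R_{a}$ depends not only on whether an action achieves the goal, but also, to a greater or lesser extent, on how novel the observations brought about by the action are (for details, see the causes of curiosity).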

... and so on.
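The three requirements above (memory, comparison with memory, and a reward that also depends on novelty) can be sketched as a small reward wrapper. This is one possible reading rather than the method of the cited sources: the class name `CuriousReward`, the `novelty_weight` parameter and the count-based novelty formula are all hypothetical choices.

```python
import math
from collections import Counter


class CuriousReward:
    """Adds a novelty bonus to the goal-based (extrinsic) reward."""

    def __init__(self, novelty_weight=0.5):
        self.memory = Counter()  # how often each observation has been seen
        self.novelty_weight = novelty_weight

    def novelty(self, observation):
        """Compare the current observation with memory: 1.0 if never seen,
        shrinking towards 0 the more often it has been seen."""
        return 1.0 / math.sqrt(1 + self.memory[observation])

    def __call__(self, extrinsic_reward, observation):
        bonus = self.novelty_weight * self.novelty(observation)
        self.memory[observation] += 1  # remember this experience
        return extrinsic_reward + bonus


reward_fn = CuriousReward()
first = reward_fn(0.0, "aisle 1")   # never seen before: full bonus
second = reward_fn(0.0, "aisle 1")  # seen once already: smaller bonus
fresh = reward_fn(0.0, "aisle 2")   # a new observation: full bonus again
print(first, second, fresh)
```

With a bonus of this kind, an action that reaches no goal can still be reinforced if it leads somewhere new, which is what lets the agent leave a loop in which every goal-based reward is zero.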


References

Auer, Peter; Jaksch, Thomas; Ortner, Ronald (2010). "Near-optimal regret bounds for reinforcement learning". Journal of Machine Learning Research. 11: 1563–1600.
Busoniu, Lucian; Babuska, Robert; De Schutter, Bart; Ernst, Damien (2010). Reinforcement Learning and Dynamic Programming using Function Approximators. Taylor & Francis CRC Press. ISBN 978-1-4398-2108-4.
François-Lavet, Vincent; Henderson, Peter; Islam, Riashat; Bellemare, Marc G.; Pineau, Joelle (2018). "An Introduction to Deep Reinforcement Learning". Foundations and Trends in Machine Learning. 11 (3–4): 219–354. arXiv:1811.12560. Bibcode:2018arXiv181112560F. doi:10.1561/2200000071.
Powell, Warren (2007). Approximate dynamic programming: solving the curses of dimensionality. Wiley-Interscience. ISBN 978-0-470-17155-4.
Sutton, Richard S.; Barto, Andrew G. (1998). Reinforcement Learning: An Introduction. MIT Press. ISBN 978-0-262-19398-6.
Sutton, Richard S. (1988). "Learning to predict by the method of temporal differences". Machine Learning. 3: 9–44. doi:10.1007/BF00115009.
Szita, Istvan; Szepesvari, Csaba (2010). "Model-based Reinforcement Learning with Nearly Tight Exploration Complexity Bounds" (PDF). ICML 2010. Omnipress. pp. 1031–1038. Archived from the original (PDF) on 2010-07-14.

Notes

[1] Kaelbling, Leslie P.; Littman, Michael L.; Moore, Andrew W. (1996). "Reinforcement Learning: A Survey". Journal of Artificial Intelligence Research. 4: 237–285.
[2] Dominic, S.; Das, R.; Whitley, D.; Anderson, C. (July 1991). "Genetic reinforcement learning for neural networks". IJCNN-91-Seattle International Joint Conference on Neural Networks. Seattle, Washington, USA: IEEE.
[3] François-Lavet, Vincent; Henderson, Peter; Islam, Riashat; Bellemare, Marc G.; Pineau, Joelle (2018). "An Introduction to Deep Reinforcement Learning". Foundations and Trends in Machine Learning. 11 (3–4): 219–354.
[4] Dubey, R.; Agrawal, P.; Pathak, D.; Griffiths, T. L.; Efros, A. A. (2018). "Investigating human priors for playing video games". arXiv:1802.10217.
[5] Algorta, S.; Şimşek, Ö. (2019). "The Game of Tetris in Machine Learning". arXiv:1905.01652.
[6] Schmidhuber, J. (2006). "Developmental robotics, optimal artificial curiosity, creativity, music, and the fine arts" (PDF). Connection Science. 18 (2): 173–187.
[7] Schmidhuber, J. (2020). "Generative adversarial networks are special cases of artificial curiosity (1990) and also closely related to predictability minimization (1991)". Neural Networks. 127: 58–66.
[8] "curiosity artificial intelligence (curiosity AI)". Techtarget.
[9] "How can Artificial Intelligence become curious?". Towards Data Science.

External links

Reinforcement Learning Repository.
Reinforcement Learning and Artificial Intelligence (RLAI, Rich Sutton's lab at the University of Alberta).
Autonomous Learning Laboratory (ALL, Andrew Barto's lab at the University of Massachusetts Amherst).
Hybrid reinforcement learning.
Real-world reinforcement learning experiments at Delft University of Technology.
Stanford University Andrew Ng Lecture on Reinforcement Learning.
Dissecting Reinforcement Learning Series of blog post on RL with Python code.
