AlphaGo

AlphaGo is a Go-playing artificial intelligence program developed by DeepMind, an artificial intelligence company owned by Google.

Overview

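In twenty-first-century AI research, playing Go is generally regarded as a very hard task, far harder than playing chess: chess offers about 35 possible moves at each decision point, whereas Go offers about 250, so a Go program has many more possibilities to consider^[1]^[2]. AlphaGo took what was then a novel approach. The program contains two deep neural networks: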

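one is a policy network, which computes $\Pr(\text{action} \mid \text{current state})$ ^{[note 1]} and is used to decide which move to play, and
the other is a value network, which computes $\Pr(\text{victory} \mid \text{state and action})$ ^{[note 2]} and is used to evaluate the position on the board.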
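The gap between about 35 and about 250 candidate moves compounds with every turn. A rough way to see this, assuming as a simplification that the branching factor b stays constant, is that the number of distinct move sequences d plies deep is b^d. The sketch below uses only the two branching factors quoted in the overview; the depths it prints are an arbitrary choice, and `move_sequences` is a hypothetical helper name.

```python
# Branching factors quoted in the text: roughly how many legal moves a player
# has at a typical decision point.
BRANCHING = {"chess": 35, "go": 250}


def move_sequences(branching, plies):
    """Count distinct move sequences `plies` moves deep, assuming the same
    branching factor at every move (a simplification; real positions vary)."""
    return branching ** plies


for plies in (1, 2, 4, 8):
    chess = move_sequences(BRANCHING["chess"], plies)
    go = move_sequences(BRANCHING["go"], plies)
    print(f"{plies} plies: chess {chess:,} vs go {go:,} (go/chess about {go / chess:,.0f})")
```

At a depth of four plies this gives 1,500,625 sequences for chess against 3,906,250,000 for Go, a ratio of roughly 2,600, and the ratio keeps growing with depth, which is why exhaustive look-ahead is far less feasible in Go.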

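The team first trained the policy network with supervised learning, having AlphaGo study a large collection of games played by professional players so that it learned to compute $\Pr(\text{action} \mid \text{current state})$. They then trained the policy network further with reinforcement learning, having AlphaGo play game after game against itself to learn which ${\text{action}}$s lead to victory. Finally, they used reinforcement learning to train the value network to compute $\Pr(\text{victory} \mid \text{state and action})$^[3].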
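The three training stages can be illustrated with deliberately tiny stand-ins. This is a sketch under stated assumptions, not DeepMind's implementation: a board state is a short hypothetical feature vector, and each "network" is a single linear layer. The policy is a softmax over a fixed set of candidate moves; the value estimate is a logistic output read as Pr(victory) for the position reached after a move (one reading of "state and action"). Stage 1 is a maximum-likelihood step toward an expert move; stage 2 is a REINFORCE-style policy-gradient step that scales the same gradient by the game result z = +1 (win) or -1 (loss); stage 3 fits the value estimate to game results. The log-loss used for the value estimate is a substitution chosen to match this article's probability framing; the Nature paper instead regressed on game outcomes with a mean-squared-error loss. All class and function names are hypothetical.

```python
import math

# Hypothetical toy setup: a state is a short feature vector and there are
# NUM_MOVES candidate moves. Real AlphaGo uses deep convolutional networks
# over the 19x19 board; here each "network" is one linear layer.
NUM_MOVES = 3
NUM_FEATURES = 2


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class PolicyNet:
    """Linear softmax policy giving Pr(action | current state)."""

    def __init__(self):
        self.w = [[0.0] * NUM_FEATURES for _ in range(NUM_MOVES)]

    def probs(self, state):
        logits = [sum(wj * xj for wj, xj in zip(row, state)) for row in self.w]
        return softmax(logits)

    def step(self, state, action, scale):
        """Gradient ascent on scale * log Pr(action | state)."""
        p = self.probs(state)
        for k in range(NUM_MOVES):
            coeff = (1.0 if k == action else 0.0) - p[k]
            for j in range(NUM_FEATURES):
                self.w[k][j] += scale * coeff * state[j]


class ValueNet:
    """Logistic estimate of Pr(victory) for a position (assumed: post-move state)."""

    def __init__(self):
        self.u = [0.0] * NUM_FEATURES

    def prob_win(self, state):
        return sigmoid(sum(uj * xj for uj, xj in zip(self.u, state)))

    def step(self, state, won, lr):
        """Gradient step on the log-loss between Pr(victory) and the game result."""
        err = (1.0 if won else 0.0) - self.prob_win(state)
        for j in range(NUM_FEATURES):
            self.u[j] += lr * err * state[j]


def supervised_step(policy, state, expert_move, lr=0.1):
    # Stage 1: make the professional player's move more likely.
    policy.step(state, expert_move, lr)


def reinforce_step(policy, trajectory, z, lr=0.1):
    # Stage 2: self-play policy gradient; z = +1 for a win, -1 for a loss.
    for state, move in trajectory:
        policy.step(state, move, lr * z)


def value_step(value_net, state, won, lr=0.1):
    # Stage 3: fit the value estimate to the result of a self-play game.
    value_net.step(state, won, lr)
```

For example, repeatedly calling `supervised_step(policy, [1.0, 0.5], 2)` raises `policy.probs([1.0, 0.5])[2]` above its starting value of 1/3, and a `reinforce_step` with z = -1 on that same move lowers it again.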

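During play, the program chooses its moves with Monte Carlo Tree Search (MCTS), guided by the value and policy networks: for each move, it uses the networks' outputs to decide which possibilities to examine, runs a number of simulations, and then picks the move to play according to the simulation results.

In October 2015, in its first outing, AlphaGo played a professional player, European champion Fan Hui, on a standard 19×19 board and won all five games. In March 2016 it played Lee Sedol, a 9-dan professional (the highest rank), and won four of the five games, an unprecedented success for artificial intelligence in game playing^[4].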
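The move-selection procedure can be sketched as a small MCTS loop. This is a minimal illustration, not AlphaGo's code, and it makes several assumptions. A toy game stands in for Go: players alternately remove 1, 2 or 3 stones and whoever takes the last stone wins. A uniform prior stands in for the policy network and a coin-flip estimate stands in for the value network; both are hypothetical. Leaves are scored by the value estimate alone, whereas the original AlphaGo mixed the value network with the result of a fast rollout. The selection rule is the PUCT form, Q(s, a) plus an exploration bonus proportional to P(s, a)·√(Σ_b N(s, b)) / (1 + N(s, a)). The move finally played is the most-visited one at the root. All names are hypothetical.

```python
import math

# Hypothetical toy game standing in for Go: players alternately remove 1, 2 or 3
# stones from a pile, and whoever takes the last stone wins.
def legal_moves(stones):
    return [k for k in (1, 2, 3) if k <= stones]


def policy_prior(stones):
    """Hypothetical stand-in for the policy network: uniform Pr(action | state)."""
    moves = legal_moves(stones)
    return {move: 1.0 / len(moves) for move in moves}


def value_estimate(stones):
    """Hypothetical stand-in for the value network: Pr(victory) for the player to move."""
    if stones == 0:
        return 0.0  # the opponent took the last stone, so the player to move has lost
    return 0.5      # no real knowledge: treat every other position as a coin flip


class Node:
    def __init__(self, prior):
        self.prior = prior      # P(s, a): policy probability of the move leading here
        self.visits = 0         # N(s, a): simulations that passed through this move
        self.value_sum = 0.0    # W(s, a): summed results, from the mover's point of view
        self.children = {}      # move -> Node

    def q(self):
        """Q(s, a): mean simulation result for this move (0 if never visited)."""
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(node, c_puct):
    """Pick the move maximising Q(s, a) + u(s, a) (the PUCT rule)."""
    total = sum(child.visits for child in node.children.values())

    def score(item):
        _, child = item
        u = c_puct * child.prior * math.sqrt(total) / (1 + child.visits)
        return child.q() + u

    return max(node.children.items(), key=score)


def run_simulation(root, stones, c_puct):
    node, path = root, [root]
    # 1. Selection: follow the PUCT rule down to a node with no children yet.
    while node.children:
        move, node = select_child(node, c_puct)
        stones -= move
        path.append(node)
    # 2. Expansion: attach the legal moves, with priors from the policy stand-in.
    if stones > 0:
        for move, prior in policy_prior(stones).items():
            node.children[move] = Node(prior)
    # 3. Evaluation: v is Pr(victory) for the player to move at the leaf.
    v = value_estimate(stones)
    # 4. Backup: each node stores results for the player who made its move, so
    #    flip v before adding it, and flip again at every level (players alternate).
    for visited in reversed(path):
        v = 1.0 - v
        visited.visits += 1
        visited.value_sum += v


def choose_move(stones, num_simulations=800, c_puct=1.5):
    root = Node(prior=1.0)
    for _ in range(num_simulations):
        run_simulation(root, stones, c_puct)
    # Play the move whose subtree was simulated most often.
    return max(root.children.items(), key=lambda item: item[1].visits)[0]


print(choose_move(3))  # prints 3: taking all three stones wins at once
```

Even with an uninformative value stand-in, the search concentrates its visits on the immediately winning move because terminal positions return exact results; with a trained value network, non-terminal positions would also be scored meaningfully, which is the role the value network plays in the search.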

Notes

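↑ Roughly, "given that the board is in this state, the probability that a master-level player would play this move"; for the meaning of the mathematical notation, see the glossary of probability theory.
↑ Roughly, "given that the board is in this state and this move has been played, the probability that our side wins".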

Further reading

Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., ... & Dieleman, S. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484.

References

↑ Allis, L. V. Searching for Solutions in Games and Artificial Intelligence. PhD thesis, Univ. Limburg, Maastricht, The Netherlands (1994).
↑ van den Herik, H., Uiterwijk, J. W. & van Rijswijck, J. Games solved: now and in the future. Artif. Intell. 134, 277–311 (2002).
↑ AlphaGo: How it works technically? Medium.
↑ Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., ... & Dieleman, S. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484.

External links

AlphaGo official website
AlphaGo wiki at Sensei's Library, including links to AlphaGo games.
AlphaGo page, with archive and games.
