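Deep learning

Deep learning (Cantonese: 深度學習, Jyutping: sam1 dou6 hok6 zaap6), also called deep structured learning, is a family of techniques built on multi-layer artificial neural networks that learn "in depth"; both feedforward and recurrent neural networks can be used^[1]^[2].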

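Diagram of a three-layer feedforward neural network; each circle represents a cell, and an arrow means that the activation level of the cell the arrow points to is affected by the cell the arrow starts from.

An illustration of deep learning; the neural network abstracts the information it receives layer by layer.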

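Deep learning builds on artificial neural networks: a multi-layer feedforward network has several layers of hidden cells, and the activation level of each layer is determined by the cells of the layer before it. When the input-layer cells receive a signal ( $i$ ) and activate, the activation values of the later layers change in turn, and the output-layer cells end up with a particular output value ( $o$ ), so that $o=f(i)$ , where $f$ is the function that represents the network^[1]^[3]. An ordinary machine learning algorithm adjusts the parameters inside $f$ (the weights and so on) so that, after learning, the network computes $o$ accurately from $i$ . Deep learning goes a step further and tries to ensure that every pair of adjacent layers has an accurate input-output relationship: if any two adjacent layers are taken out of a trained deep network, those two layers can serve as an independent network that makes accurate predictions^[4]^[5].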

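As an example, imagine a feedforward network that processes images of animals. If it is a non-deep network, the designer only cares that the relationship between the input layer and the output layer is right, that is, that the network can accurately tell from the image which animal is in it. If it is a deep network, it works more like this^[6]^[7]:

The first layer receives the raw features of the image (colours and points),
The second layer uses the positions of the raw features to work out which lines the image contains,
The third layer uses the lines to work out which body parts the image contains (e.g. two adjacent vertical lines may indicate a leg),
The fourth layer considers all the body parts together and builds an abstracted representation of the animal,
Finally, the fifth layer works out which known animal the abstracted representation most resembles and labels it accordingly (e.g. "the animal is an elephant").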

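After training, any two adjacent layers taken from this network can form a neural network that performs a useful computation on its own^{[note 1]}. This also means that deep learning lets an artificial neural network learn hierarchical knowledge representations much as humans do. For this reason deep learning techniques receive considerable attention in the field of artificial intelligence (AI)^[8]^[9].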
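
The relationship $o=f(i)$ and the idea that adjacent layers can stand alone can be sketched in code. This is only an illustration and is not taken from the source: the weights are made-up numbers, the sigmoid activation is an assumption, and `layer` and `network` are hypothetical helper names. It shows that a network is a chain of layer functions and that a slice of that chain is itself a callable network; it does not show training.

```python
import math

def layer(weight_rows):
    """Build one layer: each output cell is a sigmoid of a weighted sum."""
    def forward(activations):
        return [
            1.0 / (1.0 + math.exp(-sum(w * a for w, a in zip(row, activations))))
            for row in weight_rows
        ]
    return forward

def network(layers):
    """Chain layers so that the whole network computes o = f(i)."""
    def f(i):
        for step in layers:
            i = step(i)
        return i
    return f

# Hypothetical, hand-picked weights (one row per cell in the layer).
layers = [
    layer([[0.5, -0.5], [1.0, 1.0], [-1.0, 0.5]]),  # 2 inputs -> 3 hidden cells
    layer([[1.0, -1.0, 0.5], [0.2, 0.2, 0.2]]),     # 3 hidden -> 2 hidden cells
    layer([[1.5, -1.5]]),                           # 2 hidden -> 1 output cell
]

f = network(layers)          # the full network
sub = network(layers[1:2])   # the middle pair of adjacent layers, used on its own

o = f([1.0, 0.0])
```

Calling `sub` on a three-element list runs only the middle step, which is the sense in which a pair of adjacent layers can be treated as a small network of its own.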

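Theoretical basis

An abstract diagram of a three-layer feedforward network; each circle represents a simulated neuron, and the activation level of each cell is determined by the activation levels of the cells in the layer before it^[10].

Deep learning is essentially a way of representing knowledge hierarchically with artificial neural networks^[1]^{:p. 199}:

Artificial neural networks

Deep learning builds on artificial neural networks: an artificial neural network is made up of a large number of artificial neurons. When building a neural network as a computer program, the researcher can give each artificial neuron a variable that represents its activation level^[11], and the value of each neuron's activation level is computed by a formula that includes the activation levels of the neurons before it. The parameters in these formulas can change, and if the program has an algorithm that adjusts them from experience, the network is able to learn^[12]^[13]. For example, for each cell the program has a formula along these lines: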

$t=W_{1}A_{1}+W_{2}A_{2}+\cdots$ (the activation function)

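Here $t$ is the activation level of the neuron, $A_{n}$ is the activation level of the $n$ -th neuron in the previous layer, and $W_{n}$ is the weight of that $n$ -th neuron (how strongly it influences $t$ ). So when one artificial neuron activates, it drives the cells after it to activate as well, much like the neurons in a biological neural network. If the program lets the network change the values of $W_{n}$ by itself according to experience, the network can learn^{[note 2]}^[12]^[14].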
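
A minimal sketch of the weighted sum for a single cell follows. The weights and upstream activations are made-up values, and passing $t$ through a sigmoid is an assumption, since the text does not say which activation function is meant.

```python
import math

def weighted_sum(weights, activations):
    """t = W_1*A_1 + W_2*A_2 + ... for a single cell."""
    if len(weights) != len(activations):
        raise ValueError("one weight is needed per upstream cell")
    return sum(w * a for w, a in zip(weights, activations))

def sigmoid(t):
    """Assumed activation function; squashes t into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

# Hypothetical values: three upstream cells and the weights attached to them.
A = [1.0, 0.0, 0.5]
W = [0.2, -0.4, 0.6]

t = weighted_sum(W, A)       # 0.2*1.0 + (-0.4)*0.0 + 0.6*0.5
activation = sigmoid(t)
```

Learning, in the sense described above, would amount to a procedure that changes the numbers in `W` in response to experience; that procedure is not shown here.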

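Knowledge representation

Artificial neural networks can be used for knowledge representation: knowledge representation refers to how a mind, whether human or artificial, internally represents information about how the surrounding world works, and how it uses that information to make judgements and predictions about what it perceives^[15]^[16]^[17].

As a simple example, imagine an artificial neural network with two layers, each containing a number of neurons. The first layer represents "the features of the object seen" and the second layer represents "the category the object is assigned to". Each second-layer neuron is connected to certain first-layer cells and activates when those particular first-layer cells activate. That is, for the $i$ -th second-layer cell: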

$C_{i}=W_{1}A_{1}+W_{2}A_{2}+\cdots$ (with an activation function)

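Suppose each first-layer cell represents one feature and each second-layer cell represents one category. When the network sees an object in the outside world with features such as "has fur", "has whiskers" and "has four legs", the first-layer cells for those features activate, and the second-layer cell connected to them, for example the one representing "cat", activates in turn. In other words, as soon as the network receives certain features it assigns the object to the corresponding category: "cats have fur, whiskers and four legs" is a piece of knowledge, and this knowledge has been "represented" in the form of a neural network^[18]^[19].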
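
The cat example can be written out as a toy two-layer network. Everything concrete here is an assumption made for illustration: the feature list, the "bird" category, the hand-set weights, and the choice to pick the category with the largest weighted sum instead of applying an activation function.

```python
# First-layer cells: one per feature (hypothetical feature set).
FEATURES = ["fur", "whiskers", "four_legs", "wings"]

# Second-layer cells: one per category, each with one weight per feature.
# The weights are hand-set for illustration, not learned.
WEIGHTS = {
    "cat":  [1.0, 1.0, 1.0, -2.0],
    "bird": [-1.0, -1.0, -1.0, 2.0],
}

def category_activations(observed):
    """C_i = W_1*A_1 + W_2*A_2 + ... for every category cell."""
    a = [1.0 if feature in observed else 0.0 for feature in FEATURES]
    return {
        name: sum(w * x for w, x in zip(row, a))
        for name, row in WEIGHTS.items()
    }

def classify(observed):
    """Return the category whose cell is most strongly activated."""
    scores = category_activations(observed)
    return max(scores, key=scores.get)

label = classify({"fur", "whiskers", "four_legs"})
```

The knowledge "cats have fur, whiskers and four legs" lives entirely in the row of weights for `"cat"`, which is the sense in which the network represents it.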

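Core idea

Most twentieth-century machine learning techniques were "shallow". What an artificial neural network does can be thought of as feature extraction: computing new values from the data values, where the new values may carry useful information. For example, when a network computes the activation levels of the 8 cells in a hidden layer (the new values) from the activation levels of the 12 cells in the input layer (the data values), that is a feature extraction step. Twentieth-century artificial neural networks usually had at most one or two layers of non-linear feature extraction. Experiments showed that this approach solves many relatively simple problems effectively, but in theory such simple networks have a limited capacity to represent knowledge (knowledge being relationships between things), and in practice they proved to struggle with complex problems such as language and images^[1]^{:Ch. 2}.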

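A cross-section of a human brain; the region shaded light green (the one nearest the back of the head) is the visual cortex, the part of the cerebral cortex that specialises in processing visual information.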

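At the same time, developments in cognitive science also inspired the field of machine learning. Research on cognition showed that in many situations the human brain represents knowledge hierarchically. Studies of vision, for example, found that the human visual system has a particular group of neurons that handles the most basic information (such as "what colour is each point in the visual field"), while the next layer of cells handles information extracted from it (such as "which lines are in the visual field"), and so on^[20]^[21]; in how it is layered, this resembles a multi-layer feedforward neural network^{[note 3]}^[22]. This kind of encoding, in which every layer performs feature extraction and every layer represents some specific useful knowledge, is what is called hierarchical knowledge representation, and deep learning is an idea inspired by this way the human brain represents knowledge^[1]^{:Ch. 2}.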

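The "depth" in deep learning roughly indicates how many layers of artificial neurons perform feature extraction. The depth of an artificial neural network can be pictured with the concept of the credit assignment path (CAP): a CAP is the chain of transformations from input to output, and the length of the CAP reflects the "depth" of the learning. For example, in a feedforward network with 4 hidden layers, data passes through at most 4 + 1 = 5 transformations on the way from the input-layer cells to the final output. In a recurrent network the CAP length can in theory be unbounded: if the recurrent network is designed to store the input at time $t$ ( $i_{t}$ ) and let that stored information affect the hidden-layer state at times $t+1$ , $t+2$ , $t+3$ , ..., then if the network keeps running without stopping, $i_{t}$ can in principle affect the state of cells arbitrarily many layers later. Deep learning is about the length of the CAP. Generally speaking, a CAP length above 2 counts as "deep" learning, and how a network's CAP length affects its learning ability (how complex a rule it can learn) is a major topic in artificial neural network research^[23].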
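
The arithmetic for the feedforward case can be written as a two-line helper. The function names are hypothetical, and the threshold of 2 is simply the rule of thumb quoted above.

```python
def feedforward_cap_length(num_hidden_layers):
    """Transformations from input to output: one per hidden layer, plus the output layer."""
    if num_hidden_layers < 0:
        raise ValueError("the number of hidden layers cannot be negative")
    return num_hidden_layers + 1

def counts_as_deep(cap_length, threshold=2):
    """Rule of thumb from the text: a CAP length above 2 counts as deep."""
    return cap_length > threshold

cap = feedforward_cap_length(4)   # the 4-hidden-layer example: 4 + 1
deep = counts_as_deep(cap)
```

By the same rule, a network with a single hidden layer has a CAP length of 2 and does not count as deep.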

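Restricted Boltzmann machines

A restricted Boltzmann machine (RBM) is a simple kind of neural network, and deep neural networks are often built by stacking several restricted Boltzmann machines together. A restricted Boltzmann machine has hidden units and visible units. Hidden units cannot be connected to one another, and neither can visible units; every connection joins one hidden unit to one visible unit, and every visible unit is connected to every hidden unit. For example, a restricted Boltzmann machine might have four visible units (the $v_{i}$ ) and three hidden units (the $h_{i}$ )^[24]^[25].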

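Now imagine a researcher does the following. The researcher feeds a set of inputs to the visible units, putting them in state $\mathbf {v} _{0}$ , and lets the hidden units activate according to $\mathbf {v} _{0}$ and the weights, reaching state $\mathbf {h} _{0}$ (the forward pass). The second step is reconstruction: the hidden state $\mathbf {h} _{0}$ is used as input, and the visible units change to state $\mathbf {v} _{1}$ according to that input and the weights. Because the weights are usually initialised to random values, the difference between $\mathbf {v} _{1}$ and $\mathbf {v} _{0}$ (the reconstruction error) will be considerable, and the task of "reconstructing the original input" fails. During this process the restricted Boltzmann machine yields two pieces of information^[24]^[26]:

During the forward pass, the network gives information about $\Pr(\mathbf {h} _{0}|\mathbf {v} _{0})$ (the probability distribution of $\mathbf {h} _{0}$ estimated from $\mathbf {v} _{0}$ ); and
During reconstruction, the network gives information about $\Pr(\mathbf {v} _{1}|\mathbf {h} _{0})$ (the probability distribution of $\mathbf {v} _{1}$ estimated from $\mathbf {h} _{0}$ ).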
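
The forward pass and the reconstruction can be sketched for an untrained machine with four visible and three hidden units. This sketch assumes the usual binary formulation, in which each unit turns on with a probability given by a sigmoid of its weighted input; the bias terms, the random seed and the function names are assumptions added for illustration, and no learning step is included.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_probs(v, W, b_h):
    """Pr(h_j = 1 | v) for each hidden unit j; W[i][j] links visible i to hidden j."""
    return [
        sigmoid(b_h[j] + sum(v[i] * W[i][j] for i in range(len(v))))
        for j in range(len(b_h))
    ]

def visible_probs(h, W, b_v):
    """Pr(v_i = 1 | h) for each visible unit i, reusing the same weights."""
    return [
        sigmoid(b_v[i] + sum(W[i][j] * h[j] for j in range(len(h))))
        for i in range(len(b_v))
    ]

def sample(probs, rng):
    """Draw a binary state for each unit from its activation probability."""
    return [1 if rng.random() < p else 0 for p in probs]

rng = random.Random(0)
n_visible, n_hidden = 4, 3
# Random initial weights, as in the untrained machine described above.
W = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]
b_v = [0.0] * n_visible
b_h = [0.0] * n_hidden

v0 = [1, 0, 1, 0]
p_h0 = hidden_probs(v0, W, b_h)    # forward pass: Pr(h0 | v0)
h0 = sample(p_h0, rng)
p_v1 = visible_probs(h0, W, b_v)   # reconstruction: Pr(v1 | h0)
reconstruction_error = sum((a - b) ** 2 for a, b in zip(v0, p_v1))
```

With weights this close to zero every probability stays near 0.5, so the reconstruction carries almost no information about `v0`; a learning algorithm would adjust `W` to shrink `reconstruction_error`.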

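Now imagine that a learning algorithm has been run and the restricted Boltzmann machine can reliably reconstruct the original input, so that $\mathbf {v} _{1}$ ≈ $\mathbf {v} _{0}$ every time (see also autoencoder). The input to this restricted Boltzmann machine is an image with some number of pixels ( $\mathbf {v}$ ), and the hidden-layer cells ( $\mathbf {h}$ ) represent "which body parts are in the image". Next, the researcher uses the hidden layer $\mathbf {h}$ as input and stacks another hidden layer ( $\mathbf {\text{new h}}$ ) on top, where $\mathbf {\text{new h}}$ represents "which animal the image shows" (input-output stacking), forming a deep belief network. The resulting network can do the following:

Accurately compute $\Pr(\mathbf {h} _{0}|\mathbf {v} _{0})$ , equivalent to "given this image, estimate which body parts are in it";
Accurately compute $\Pr(\mathbf {v} _{1}|\mathbf {h} _{0})$ , equivalent to "given these body parts, what the image would roughly look like";
Accurately compute $\Pr(\mathbf {\text{new h}} _{0}|\mathbf {h} _{0})$ , equivalent to "given the body parts in the image, estimate which animal it shows" (e.g. if the image has four legs, the creature is probably not an insect);
Accurately compute $\Pr(\mathbf {h} _{0}|\mathbf {\text{new h}} _{0})$ , equivalent to "given this kind of animal, which body parts it has".

This network thereby achieves hierarchical knowledge representation^[26]^[27].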

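Deep belief networks

Learning paradigms

Supervised learning

Deep learning can be done with supervised learning. The most direct way to build a deep neural network is input-output stacking: first train a multilayer perceptron (MLP) with supervised learning; once that network is trained and gives the correct output, use its output as the input of the next multilayer perceptron, and have the new network learn, again by supervised learning, to compute the next layer's output accurately from the previous network's output; and so on, until the result is a deep stacking network in which every layer represents useful knowledge^[28]^[29].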

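A deep network can be thought of as a neural abstraction pyramid like the one pictured above: the lowest sub-network computes slightly more abstract data (e.g. individual lines) from the most basic data (e.g. the colour of each point in an image), and each layer in turn abstracts the data further.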

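Unsupervised learning

Deep neural networks can also be trained with unsupervised learning. A deep autoencoder is a deep neural network trained this way. An autoencoder is a neural network trained to produce an output identical to its input, and it can be used for tasks such as dimensionality reduction. The simplest autoencoder of the "shrinking" kind has a single hidden layer with fewer cells than the input layer, and an output layer with as many cells as the input layer. When it runs, the hidden layer extracts features from the input layer, and because the hidden layer has fewer cells than the input layer, it ends up (provided the autoencoder has been trained well) "representing the input with a small number of features", which is what dimensionality reduction means^[28]^[30].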
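
A very small autoencoder with one narrow hidden layer can be trained in plain Python. This is a sketch under several assumptions that are not in the source: the cells are linear (no activation function), training is per-sample gradient descent on squared reconstruction error, and the toy data set, learning rate, epoch count and function names are made up. The data are 4-dimensional but only vary along two directions, so two hidden cells are in principle enough to represent them.

```python
import random

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def reconstruct(W_enc, W_dec, x):
    h = matvec(W_enc, x)           # hidden layer: fewer cells than the input
    return h, matvec(W_dec, h)     # output layer: same size as the input

def total_loss(W_enc, W_dec, data):
    """Sum of squared reconstruction errors over the data set."""
    loss = 0.0
    for x in data:
        _, x_hat = reconstruct(W_enc, W_dec, x)
        loss += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return loss

def train(data, n_hidden, epochs=1000, lr=0.02, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0])
    W_enc = [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    W_dec = [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_in)]
    for _ in range(epochs):
        for x in data:
            h, x_hat = reconstruct(W_enc, W_dec, x)
            err = [a - b for a, b in zip(x_hat, x)]
            # Gradient with respect to the hidden activations,
            # computed before W_dec is modified.
            grad_h = [2 * sum(err[i] * W_dec[i][k] for i in range(n_in))
                      for k in range(n_hidden)]
            for i in range(n_in):
                for k in range(n_hidden):
                    W_dec[i][k] -= lr * 2 * err[i] * h[k]
            for k in range(n_hidden):
                for j in range(n_in):
                    W_enc[k][j] -= lr * grad_h[k] * x[j]
    return W_enc, W_dec

# Toy inputs of the form [a, b, a, b]: four numbers, two underlying features.
data = [[0, 0, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 1, 1]]

W_enc0, W_dec0 = train(data, n_hidden=2, epochs=0)   # untrained weights (same seed)
W_enc, W_dec = train(data, n_hidden=2)
initial_loss = total_loss(W_enc0, W_dec0, data)
final_loss = total_loss(W_enc, W_dec, data)
code, _ = reconstruct(W_enc, W_dec, data[1])         # the 2-number representation
```

The two-element `code` is the reduced-dimension representation of a four-element input; comparing `initial_loss` with `final_loss` shows how much training has improved the reconstruction.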

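A deep autoencoder contains two mutually symmetric deep belief networks. It is divided into an encoder and a decoder: the encoder is a deep belief network built by stacking 4 to 5 restricted Boltzmann machines layer by layer, and the decoder mirrors the encoder. The encoder extracts features layer by layer, ending in a compressed feature vector; the decoder then (assuming the deep autoencoder has been trained well) decodes layer by layer until its output matches the input. In practice, deep autoencoders can be used for data compression^[31]^[32].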

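The figure above shows the structure of a simple deep autoencoder (5 layers in total): $X$ is the input, $X'$ is the output layer, the layers in between are hidden layers, and $z$ is the compressed feature vector. The encoder on the left extracts features from $X$ , and then (as long as the network is trained well) the decoder turns the encoded feature values back into the original input, so that $X'$ ≈ $X$ .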
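
The symmetry between encoder and decoder can be expressed as a rule for layer widths. The helper name and the example widths below are hypothetical; the only point taken from the text is that the decoder mirrors the encoder around the compressed feature vector.

```python
def mirrored_layer_sizes(encoder_sizes):
    """Given layer widths from the input X down to the code z,
    return the widths of the whole autoencoder from X to X'."""
    if len(encoder_sizes) < 2:
        raise ValueError("need at least an input layer and a code layer")
    # The decoder repeats the encoder widths in reverse, without repeating z.
    return list(encoder_sizes) + list(encoder_sizes[-2::-1])

# Hypothetical widths: X has 784 values, z has 32.
sizes = mirrored_layer_sizes([784, 256, 32])
```

Three encoder widths give five layers in total, matching the 5-layer structure described above, with the first and last layers the same size so that $X'$ can be compared with $X$ .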

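Applications in cognitive science

An abstract diagram of a cross-section of the cerebral cortex; the cerebral cortex is divided into six main layers.

The idea of deep learning is closely tied to cognitive science: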

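Research focus

Pure cognitive science is also interested in deep learning, but compared with applied artificial intelligence its research places more weight on how realistic the deep learning model is. For example, researchers in cognitive science have pointed out that deep neural networks do resemble biological neural networks in being "multi-layered", but the deep networks of the late twentieth century usually updated their weights with the backpropagation algorithm, and in this respect they do not resemble biological neural networks: according to neuroscience research, if biological neurons really learned in a backpropagation-like way, then after sending a signal a neuron should receive a signal propagated backwards telling it how to change its synapses, but this does not match what is observed^[33]. For this reason some cognitive scientists have studied whether algorithms more consistent with neuroscientific findings can be used to teach deep networks. Applied artificial intelligence research, by contrast, is engineering, and cares more about whether a model solves problems effectively, even if the model cannot accurately describe real-world cognitive systems^[34]^[35].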

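Relation to development

In theory, multi-layer feature extraction (whether in the human brain or in artificial intelligence) lets a cognitive system build new knowledge on top of existing knowledge. If a cognitive system can represent knowledge hierarchically, then when it acquires new knowledge it can represent that knowledge as features extracted from some earlier knowledge. For example, suppose the system has already learned to decide from an image whether an animal is a cat or a tiger; if it can represent knowledge hierarchically, it can stack another layer of knowledge on top that associates "tiger" with "danger" but does not associate "cat" with danger, which in effect saves cognitive resources^[36]^[37].

Actual neuroscience research has shown that the way the human brain learns resembles deep learning models in being "layered": the cerebral cortex (the outermost layer of the human brain) has layered structures in which the neurons of each layer take signals from the cells of the preceding layer and send signals on to the cells of the next layer, and during brain development these cell layers also mature one layer at a time (a lower layer finishes developing before the next one does)^[38]^[39].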

Applications

In the twenty-first century deep learning has the following applications, and in some cases it even outperforms humans:

... and so on.

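Criticism

Brief history

Notes

[note 1] See also restricted Boltzmann machine.
[note 2] Learning is by definition "changing one's own behaviour according to experience".
[note 3] Artificial neurons nevertheless differ from biological neurons in certain structural features.

See also

Bibliography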

Bengio, Y., Lee, D. H., Bornschein, J., Mesnard, T., & Lin, Z. (2015). Towards biologically plausible deep learning (PDF). arXiv preprint arXiv:1502.04156.
Bengio, Yoshua; Lamblin, Pascal; Popovici, Dan; Larochelle, Hugo (2007). Greedy layer-wise training of deep networks (PDF). Advances in neural information processing systems. pp. 153–160.
Deng, L.; Yu, D. (2014). "Deep Learning: Methods and Applications" (PDF). Foundations and Trends in Signal Processing. 7 (3–4): 1-199.
Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron (2016). Deep Learning. MIT Press. ISBN 978-0-26203561-3.
Kriegeskorte, N. (2015). Deep neural networks: a new framework for modeling biological vision and brain information processing (PDF). Annual review of vision science, 1, 417-446.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning (PDF). Nature, 521(7553), 436-444.
Perconti, P., & Plebe, A. (2020). Deep learning and cognitive science. Cognition, 203, 104365.
Utgoff, P. E.; Stracuzzi, D. J. (2002). "Many-layered learning" (PDF). Neural Computation. 14 (10): 2497–2529.

References

[1] Deng, L.; Yu, D. (2014). "Deep Learning: Methods and Applications" (PDF). Foundations and Trends in Signal Processing. 7 (3–4): 1-199.
[2] LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning (PDF). Nature, 521(7553), 436-444.
[3] Omidvar, O., & Elliott, D. L. (1997). Neural systems for control. Elsevier.
[4] Bengio, Y.; Courville, A.; Vincent, P. (2013). "Representation Learning: A Review and New Perspectives". IEEE Transactions on Pattern Analysis and Machine Intelligence. 35 (8): 1798–1828.
[5] Schmidhuber, J. (2015). "Deep Learning in Neural Networks: An Overview". Neural Networks. 61: 85–117.
[6] Ciresan, Dan; Meier, U.; Schmidhuber, J. (June 2012). "Multi-column deep neural networks for image classification". 2012 IEEE Conference on Computer Vision and Pattern Recognition: 3642–3649.
[7] Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey (2012). "ImageNet Classification with Deep Convolutional Neural Networks" (PDF). NIPS 2012: Neural Information Processing Systems, Lake Tahoe, Nevada.
[8] Marblestone, Adam H.; Wayne, Greg; Kording, Konrad P. (2016). "Toward an Integration of Deep Learning and Neuroscience". Frontiers in Computational Neuroscience. 10: 94.
[9] Olshausen, B. A. (1996). "Emergence of simple-cell receptive field properties by learning a sparse code for natural images". Nature. 381 (6583): 607–609.
[10] "Artificial Neural Networks as Models of Neural Information Processing | Frontiers Research Topic". Retrieved 2018-02-20.
[11] The Machine Learning Dictionary - activation level. Archived at the Internet Archive on 26 August 2018.
[12] Learning process of a neural network. Towards Data Science. Archived at the Internet Archive on 11 February 2021.
[13] Tahmasebi; Hezarkhani (2012). "A hybrid neural networks-fuzzy logic-genetic algorithm for grade estimation". Computers & Geosciences. 42: 18–27.
[14] Ivakhnenko, A. G.; Grigorʹevich Lapa, Valentin (1967). Cybernetics and forecasting techniques. American Elsevier Pub. Co.
[15] Roger Schank; Robert Abelson (1977). Scripts, Plans, Goals, and Understanding: An Inquiry Into Human Knowledge Structures. Lawrence Erlbaum Associates, Inc.
[16] Davis, Randall; Howard Shrobe; Peter Szolovits (Spring 1993). "What Is a Knowledge Representation?". AI Magazine. 14 (1): 17–33.
[17] Hoskins, J. C., & Himmelblau, D. M. (1988). Artificial neural network models of knowledge representation in chemical engineering. Computers & Chemical Engineering, 12(9-10), 881-890.
[18] Socher, R., Chen, D., Manning, C. D., & Ng, A. (2013). Reasoning with neural tensor networks for knowledge base completion. In Advances in neural information processing systems (pp. 926-934).
[19] Duda, Richard O.; Hart, Peter Elliot; Stork, David G. (2001). Pattern classification (2 ed.). Wiley.
[20] Riesenhuber, M., and Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nat. Neurosci. 2, 1019–1025.
[21] Hung, C. P., Kreiman, G., Poggio, T., & DiCarlo, J. J. (2005). Fast readout of object identity from macaque inferior temporal cortex (PDF). Science, 310(5749), 863-866.
[22] The differences between Artificial and Biological Neural Networks. Towards Data Science.
[23] Shigeki, Sugiyama (2019-04-12). Human Behavior and Another Kind in Consciousness: Emerging Research and Opportunities: Emerging Research and Opportunities. IGI Global.
[24] A Beginner's Guide to Restricted Boltzmann Machines (RBMs). Pathmind.
[25] Larochelle, H.; Bengio, Y. (2008). Classification using discriminative restricted Boltzmann machines (PDF). Proceedings of the 25th international conference on Machine learning - ICML '08.
[26] Restricted Boltzmann Machines - Simplified. Towards Data Science.
[27] Bengio, Y. (2009). "Learning Deep Architectures for AI". Foundations and Trends in Machine Learning. 2 (1): 1–127.
[28] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle. Greedy layerwise training of deep networks. In Proceedings of Neural Information Processing Systems (NIPS). 2006.
[29] D. Wolpert. Stacked generalization. Neural Networks, 5(2):241–259. 1992.
[30] Vincent, Pascal; Larochelle, Hugo (2010). "Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion". Journal of Machine Learning Research. 11: 3371–3408.
[31] Deep Autoencoders.
[32] Autoencoders: Neural Networks for Unsupervised Learning. Medium.
[33] Rumelhart, D. E., Hinton, G. E., and Williams, R. J. 1986a. Learning internal representations by error propagation, Chap. 8. In Parallel Distributed Processing. Volume I: Foundations, D. E. Rumelhart, J. L. McClelland, and PDP Research Group, eds., pp. 318-362. MIT Press, Cambridge, MA.
[34] Mazzoni, P., Andersen, R. A., & Jordan, M. I. (1991). A more biologically plausible learning rule for neural networks (PDF). Proceedings of the National Academy of Sciences, 88(10), 4433-4437.
[35] O'Reilly, R. C. (1996). Biologically plausible error-driven learning using local activation differences: The generalized recirculation algorithm (PDF). Neural computation, 8(5), 895-938.
[36] Utgoff, P. E.; Stracuzzi, D. J. (2002). "Many-layered learning" (PDF). Neural Computation. 14 (10): 2497–2529.
[37] Elman, Jeffrey L. (1998). Rethinking Innateness: A Connectionist Perspective on Development. MIT Press.
[38] Shrager, J.; Johnson, MH (1996). "Dynamic plasticity influences the emergence of function in a simple cortical array". Neural Networks. 9 (7): 1119–1129.
[39] Quartz, SR; Sejnowski, TJ (1997). "The neural basis of cognitive development: A constructivist manifesto". Behavioral and Brain Sciences. 20 (4): 537–556.

External links

Deep Learning. Scholarpedia.
International Machine Learning Society.
mloss is an academic database of open-source machine learning software.
