K-平均聚類法

K-平均聚類法（英文：k-means clustering）係一種常用嘅聚類分析演算法。

做法

而家

k=3

，研究者想將啲數據分做三個聚類－由藍色圓圈、黃色交叉同埋紅色十字代表。

K-平均聚類法嘅出發點係假定咗已知拃數據分幾多個聚類（聚類數量 $=k$ ）。喺最簡單嗰種情況下，K-平均聚類法可以噉嚟想像^[1]^[2]：

設 $n$ 做個案嘅數量而 $k$ 做聚類數量；
建立 $k$ 咁多個中心點（cluster center），每個中心點嘅位置叫 ${\boldsymbol {\mu }}_{i}$ ^{[註 1]}；
是但攞其中一個個案 $\mathbf {x} _{i}$ 嚟睇， $\left\|\mathbf {x} _{i}-{\boldsymbol {\mu }}_{i}\right\|$ 表示 $\mathbf {x} _{i}$ 同離佢最近嗰個 ${\boldsymbol {\mu }}_{i}$ 之間嘅距離；
慢慢調節啲中心點嘅位置^{[註 2]}，務求令啲 $\left\|\mathbf {x} -{\boldsymbol {\mu }}_{i}\right\|$ 嘅總值有咁細得咁細^{[註 3]}^[3]。

數學化啲噉講，K-平均聚類法係要將 $n$ 個個案分拆做 $k$ 個集（ $k\leq n$ ）， $\mathbf {S} =\{S_{1},S_{2},...S_{k}\}$ ，搵出：

{\underset {\mathbf {S} }{\operatorname {arg\,min} }}\sum _{i=1}^{k}\sum _{\mathbf {x} \in S_{i}}\left\|\mathbf {x} -{\boldsymbol {\mu }}_{i}\right\|^{2}={\underset {\mathbf {S} }{\operatorname {arg\,min} }}\sum _{i=1}^{k}|S_{i}|\operatorname {Var} S_{i}

K-平均聚類法亦可以用圖嚟表示。睇吓附圖幅 gif：想像家陣 $k=3$ ，研究者想將啲數據分做三個聚類－聚類由藍色圓圈、黃色交叉同埋紅色十字代表；當中大藍色圓圈、大黃色交叉同大紅色十字表示嗰三個聚類嘅預想中心點；K-平均聚類法就可以當成一次一次（隨住 iteration 數值上升）噉郁動三個中心點，直接達到理想數值（例如 $\left\|\mathbf {x} -{\boldsymbol {\mu }}_{i}\right\|$ 嘅總值去到最細）為止。

註釋

↑ 初始化嗰陣，中心點嘅位置可以隨機噉設。
↑ 即係好似爬山演算法等嘅最佳化做法噉。
↑ K-平均聚類法仲可以用第啲基準嚟界定乜先係「理想嘅中心點位置」，例如係要每個聚類內部嘅距離有咁細得咁細。

睇埋

等級聚類法

參考資料

↑ Tryon, Robert C. (1939). Cluster Analysis: Correlation Profile and Orthometric (factor) Analysis for the Isolation of Unities in Mind and Personality. Edwards Brothers.
↑ Arthur, David; Manthey, B.; Roeglin, H. (2009). "k-means has polynomial smoothed complexity". Proceedings of the 50th Symposium on Foundations of Computer Science (FOCS). arXiv:0904.1113.
↑ 引用錯誤無效嘅<ref>標籤；無文字提供畀叫做kaufman1990嘅參照

呢篇統計學文係楔位文。歡迎幫維基百科擴寫佢。

[3] 初始化嗰陣，中心點嘅位置可以隨機噉設。

[4] 即係好似爬山演算法等嘅最佳化做法噉。

[5] K-平均聚類法仲可以用第啲基準嚟界定乜先係「理想嘅中心點位置」，例如係要每個聚類內部嘅距離有咁細得咁細。

[1] Tryon, Robert C. (1939). Cluster Analysis: Correlation Profile and Orthometric (factor) Analysis for the Isolation of Unities in Mind and Personality. Edwards Brothers.

[2] Arthur, David; Manthey, B.; Roeglin, H. (2009). "k-means has polynomial smoothed complexity". Proceedings of the 50th Symposium on Foundations of Computer Science (FOCS). arXiv:0904.1113.

[kaufman1990-6] 引用錯誤無效嘅<ref>標籤；無文字提供畀叫做kaufman1990嘅參照

[1]

[2]

[註 1]

[註 2]

[註 3]

[3]