Conditional probability

${\displaystyle P(X\mid Y)}$

Basic concepts

${\displaystyle P(A\mid B)}$

${\displaystyle P(A\mid B)={\frac {P(A\cap B)}{P(B)}}}$ [1]

${\displaystyle P(A\cap B)=0.12}$ (the probability that ${\displaystyle A}$ and ${\displaystyle B}$ both occur)
${\displaystyle P(B)=0.12+0.04}$ (the probability that ${\displaystyle B}$ occurs)

${\displaystyle P(A\mid B_{2})}$  would then be

${\displaystyle {\frac {0.12}{0.12+0.04}}=0.75}$
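The arithmetic above can be sketched in a few lines of Python, as a minimal check of the definition using the joint probabilities quoted in the text:

```python
# Conditional probability from the definition P(A|B) = P(A ∩ B) / P(B),
# using the joint probabilities given above.
p_a_and_b = 0.12       # P(A ∩ B): probability that A and B both occur
p_b = 0.12 + 0.04      # P(B): probability that B occurs

p_a_given_b = p_a_and_b / p_b
print(p_a_given_b)     # close to 0.75 (up to floating-point rounding)
```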

${\displaystyle P(X\mid Y)={\frac {P(X\cap Y)}{P(Y)}}={\frac {0}{P(Y)}}=0}$

In words: "given that ${\displaystyle Y}$  has occurred and the two events are mutually exclusive, ${\displaystyle X}$  will not occur[note 1]."

Statistical independence

If ${\displaystyle A}$ and ${\displaystyle B}$ are statistically independent, as opposed to mutually exclusive, the key quantities compare as follows:

• ${\displaystyle P(A\mid B)}$  equals ${\displaystyle P(A)}$  if independent, but ${\displaystyle 0}$  if mutually exclusive;
• ${\displaystyle P(B\mid A)}$  equals ${\displaystyle P(B)}$  if independent, but ${\displaystyle 0}$  if mutually exclusive;
• ${\displaystyle P(A\cap B)}$  equals ${\displaystyle P(A)P(B)}$  if independent, but ${\displaystyle 0}$  if mutually exclusive.

${\displaystyle P(A\cap B)=P(A)P(B)}$

${\displaystyle P(A\mid B)={\frac {P(A\cap B)}{P(B)}}}$

${\displaystyle P(A\mid B)={\frac {P(A)P(B)}{P(B)}}}$
${\displaystyle P(A\mid B)=P(A)}$
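The derivation above can be checked numerically. The following sketch uses a toy example not taken from the text (two fair dice, with the events chosen so that they are independent by construction):

```python
from itertools import product

# Toy check of statistical independence: two fair dice,
# A = "first die shows an even number", B = "second die shows a 6".
space = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes

def prob(event):
    """Probability of an event, given as a predicate on outcomes."""
    return sum(1 for o in space if event(o)) / len(space)

A = lambda o: o[0] % 2 == 0
B = lambda o: o[1] == 6

p_a = prob(A)                                  # 1/2
p_b = prob(B)                                  # 1/6
p_ab = prob(lambda o: A(o) and B(o))           # 1/12

print(p_ab == p_a * p_b)                       # True: P(A ∩ B) = P(A)P(B)
print(p_ab / p_b == p_a)                       # True: P(A | B) = P(A)
```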

From this it can be seen that if two events are statistically independent, then "knowing that ${\displaystyle B}$  has occurred" does not affect "the estimate of how likely ${\displaystyle A}$  is to occur". Beyond this, one can also consider conditional independence[e 5]: taking ${\displaystyle C}$  as the condition, to say that ${\displaystyle A}$  and ${\displaystyle B}$  are conditionally independent under this condition means that[3]

${\displaystyle P(A\cap B\mid C)=P(A\mid C)P(B\mid C)}$

${\displaystyle P(A\mid B\cap C)=P(A\mid C)}$

${\displaystyle P(A\cap B\mid C)=P(A\mid C)P(B\mid C)}$  → by the definition of conditional probability...

iff ${\displaystyle {\frac {P(A\cap B\cap C)}{P(C)}}=\left({\frac {P(A\cap C)}{P(C)}}\right)\left({\frac {P(B\cap C)}{P(C)}}\right)}$  → multiplying both sides by ${\displaystyle P(C)}$ ...

iff ${\displaystyle P(A\cap B\cap C)={\frac {P(A\cap C)P(B\cap C)}{P(C)}}}$  → dividing both sides by ${\displaystyle P(B\cap C)}$ ...

iff ${\displaystyle {\frac {P(A\cap B\cap C)}{P(B\cap C)}}={\frac {P(A\cap C)}{P(C)}}}$  → by the definition of conditional probability...

iff ${\displaystyle P(A\mid B\cap C)=P(A\mid C)}$
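The chain of equivalences above can be sanity-checked numerically. The distribution below is a made-up toy example, not from the text: a joint distribution over (A, B, C) built so that A and B are conditionally independent given C.

```python
from itertools import product

# P(C), P(A | C = c), P(B | C = c) for a toy distribution.
p_c   = {True: 0.3, False: 0.7}
p_a_c = {True: 0.9, False: 0.2}
p_b_c = {True: 0.8, False: 0.1}

def joint(a, b, c):
    """Joint probability, conditionally independent by construction."""
    pa = p_a_c[c] if a else 1 - p_a_c[c]
    pb = p_b_c[c] if b else 1 - p_b_c[c]
    return p_c[c] * pa * pb

def prob(pred):
    """Probability of the event described by predicate `pred`."""
    return sum(joint(a, b, c)
               for a, b, c in product([True, False], repeat=3)
               if pred(a, b, c))

# Verify P(A | B ∩ C) = P(A | C), conditioning on C = True:
lhs = prob(lambda a, b, c: a and b and c) / prob(lambda a, b, c: b and c)
rhs = prob(lambda a, b, c: a and c) / prob(lambda a, b, c: c)
print(abs(lhs - rhs) < 1e-12)   # True
```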

Common misconceptions

A common fallacy[e 6] is to treat a conditional probability and its inverse as if they were roughly equal:

${\displaystyle P(A\mid B)\approx P(B\mid A)}$

${\displaystyle P({\text{B}}\mid {\text{sam}})=100\%}$

In fact, Bayes' theorem[e 7] gives

${\displaystyle {\begin{aligned}P(B\mid A)&={\frac {P(A\mid B)P(B)}{P(A)}}\\\Leftrightarrow {\frac {P(B\mid A)}{P(A\mid B)}}&={\frac {P(B)}{P(A)}}\end{aligned}}}$

so the two quantities coincide only when ${\displaystyle P(A)=P(B)}$.
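The gap between a conditional probability and its inverse can be illustrated with a classic rare-disease calculation. The numbers below are made up for illustration, not taken from the text:

```python
# Confusion of the inverse: a test detects a disease very reliably,
# but the disease itself is rare.
p_d = 0.001                     # P(disease)
p_pos_given_d = 0.99            # P(positive | disease)
p_pos_given_not_d = 0.05        # P(positive | no disease), false-positive rate

# Law of total probability: P(positive)
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' theorem: P(disease | positive)
p_d_given_pos = p_pos_given_d * p_d / p_pos

print(round(p_d_given_pos, 3))  # 0.019: nowhere near P(positive | disease) = 0.99
```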

Application examples

N-gram

我星期日會同阿爺阿嫲去飲茶，最鍾意嗌燒賣嚟食。 (Cantonese: "On Sundays I go to yum cha with my grandpa and grandma, and I love ordering siu mai most of all.")

${\displaystyle x_{i-(n-1)},\dots ,x_{i-1}}$

${\displaystyle P(x_{i}\mid x_{i-(n-1)},\dots ,x_{i-1})}$

That is, one computes the conditional probability "given that the preceding string of symbols is ${\displaystyle x_{i-(n-1)},\dots ,x_{i-1}}$ , the symbol ${\displaystyle x_{i}}$  will be such-and-such". Relying on this method alone, an n-gram model can already carry out some fairly basic natural language processing tasks, such as language identification. For example, compared with written Standard Chinese (which is in principle based on Standard Mandarin), written Cantonese is considerably more likely to have certain characters appear after a noun, so an artificial intelligence can use these conditional probabilities to tell different languages apart.
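A minimal bigram (n = 2) sketch of this idea, estimating ${\displaystyle P(x_{i}\mid x_{i-1})}$ by counting character pairs in a tiny corpus (here just the example sentence from the text, so the estimates are only illustrative):

```python
from collections import Counter

# The example sentence from the text, punctuation removed.
corpus = "我星期日會同阿爺阿嫲去飲茶最鍾意嗌燒賣嚟食"

pairs = Counter(zip(corpus, corpus[1:]))   # counts of (x_{i-1}, x_i) pairs
firsts = Counter(corpus[:-1])              # counts of each x_{i-1}

def p_next(prev, nxt):
    """Maximum-likelihood estimate of P(nxt | prev)."""
    return pairs[(prev, nxt)] / firsts[prev] if firsts[prev] else 0.0

# "阿" is followed once by "爺" and once by "嫲" in the corpus:
print(p_next("阿", "爺"))                  # 0.5
```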

Association rules

• Customer A: lychee, beer, white rice, pork
• Customer B: lychee, beer, white rice
• Customer C: cheese, beer, white rice, pork
(... roughly 6,000 further cases omitted)

${\displaystyle P({\text{lai zi}})={\frac {\text{number of customers who bought lychee}}{\text{total number of customers}}}}$  (${\displaystyle {\text{lai zi}}}$  being romanized from 荔枝, "lychee")

• Decide to discard all products whose support is below (say) 1% and analyse them no further;
• Confidence[e 12]: let C and D be two products in the supermarket. In association rule analysis, the confidence is "the probability that a customer who bought C will also buy D". Writing ${\displaystyle {\text{sap bok}}}$  (from the Cantonese 十扑) for support, this is[12]
${\displaystyle {\text{seon sam}}(C\rightarrow D)={\frac {{\text{sap bok}}(C\cup D)}{{\text{sap bok}}(C)}}={\frac {P(C\cap D)}{P(C)}}=P(D\mid C)}$
• Lift[e 13]: relying on confidence alone is not enough, because the confidence metric takes no account of how many people buy product D (${\displaystyle P(D)}$ ). Lift can be interpreted as "the confidence of C → D with product D's support held constant[note 3]", that is
${\displaystyle {\text{tai sing}}(C\rightarrow D)={\frac {P(C\cap D)}{P(C)\times P(D)}}={\frac {P(D\mid C)}{P(D)}}}$  [note 4]
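The three metrics can be sketched directly on the three example baskets given above (the remaining ~6,000 cases are omitted, so the numbers are only illustrative):

```python
# Support, confidence and lift on the toy basket data from the text.
baskets = [
    {"lychee", "beer", "white rice", "pork"},   # customer A
    {"lychee", "beer", "white rice"},           # customer B
    {"cheese", "beer", "white rice", "pork"},   # customer C
]

def support(items):
    """Fraction of customers whose basket contains every item in `items`."""
    return sum(1 for b in baskets if items <= b) / len(baskets)

def confidence(c, d):
    """P(D | C): of the customers who bought c, the fraction who also bought d."""
    return support({c, d}) / support({c})

def lift(c, d):
    """Confidence of C -> D relative to D's overall support."""
    return confidence(c, d) / support({d})

print(confidence("lychee", "beer"))   # 1.0: both lychee buyers also bought beer
print(lift("lychee", "beer"))         # 1.0: but everyone buys beer anyway
```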

Notes

1. Or, more precisely, it is next to impossible for it to occur.
2. See concepts such as context and syntax.
3. See also the concept of a controlled variable.
4. If this value equals 1, buying C and buying D are simply unrelated. If it is greater than 1, buying C raises the probability of buying D; if it is less than 1, buying C lowers the probability of buying D.

References

1. probability theory, the mathematical theory that studies probability
2. given
3. mutually exclusive
4. statistical independence
5. conditional independence
6. conditional probability fallacy / confusion of the inverse
7. Bayes' theorem
8. natural language processing, NLP
9. association rule
10. marketing
11. support
12. confidence
13. lift

1. Kolmogorov, Andrey (1956), Foundations of the Theory of Probability, Chelsea.
2. Russell, Stuart; Norvig, Peter (2002). Artificial Intelligence: A Modern Approach. Prentice Hall. p. 478.
3. Horimoto, K. (2013). Conditional Independence. In: Dubitzky, W., Wolkenhauer, O., Cho, KH., Yokota, H. (eds) Encyclopedia of Systems Biology. Springer, New York, NY.
4. Paulos, J.A. (1988) Innumeracy: Mathematical Illiteracy and its Consequences, Hill and Wang. (p. 63 et seq.)
5. Stuart, A.; Ord, K. (1994), Kendall's Advanced Theory of Statistics: Volume I - Distribution Theory, Edward Arnold, §8.7
6. Russell, S., & Norvig, P. (2002). Artificial intelligence: a Modern Approach. Pearson. Ch. 2.
7. Brown, P. F., Desouza, P. V., Mercer, R. L., Pietra, V. J. D., & Lai, J. C. (1992). Class-based n-gram models of natural language. Computational linguistics, 18(4), 467-479.
8. Jurafsky, D., & Martin, J. H. (2021). Speech and Language Processing. Stanford University. Ch. 3.
9. Millington, I. (2019). AI for Games. CRC Press. p. 582-584.
10. Kumbhare, T. A., & Chobe, S. V. (2014). An overview of association rule mining algorithms. International Journal of Computer Science and Information Technologies, 5(1), 927-930. "The performance of FP-growth is better than all other algorithms."
11. (In English) An introduction to using the R programming language for association rule mining, covering the three key metrics of support, confidence and lift.
12. Hornik, K., Grün, B., & Hahsler, M. (2005). arules - A computational environment for mining association rules and frequent item sets. Journal of Statistical Software, 14(15), 1-25.
13. Kumbhare, T. A., & Chobe, S. V. (2014). An overview of association rule mining algorithms. International Journal of Computer Science and Information Technologies, 5(1), 927-930. "The performance of FP-growth is better than all other algorithms."
14. Ng, A., & Soo, K. (2017). Numsense! Data Science for the Layman. Annalyn Ng and Kenneth Soo.