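Factor analysis

Factor analysis (Cantonese: 因素分析; Jyutping: jan1 sou3 fan1 sik1) is a family of statistical methods used to reduce a large number of variables to a small number of factors. A "factor" is usually something whose value cannot be measured directly, so it has to be "reflected" by variables that can be measured. IQ testing is one application: an IQ test aims to measure intelligence, but intelligence (the factor) cannot be measured with a ruler, so an individual's intelligence can only be reflected by the scores obtained on the test items (the measurable variables)^[1].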

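In statistical terms, factor analysis has many uses. Besides finding the "hidden factors" behind a set of variables, it can be combined with structural equation modelling to do confirmatory factor analysis, which checks whether the factor model found by an earlier factor analysis holds up, and which can also test whether the hidden factors relate to variables outside the factor model in the expected way.

Factor analysis is valuable in academic research. In the social sciences, psychologists have used it since the twentieth century to study intelligence and IQ; marketing researchers often use it to analyse consumers' purchasing behaviour and so inform business decisions; and researchers in the natural sciences use it on occasion as well.

To understand factor analysis, it also helps to look at regression analysis first.

Basic idea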

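Schematic of factor analysis: the researcher wants to use the measured variables $X_{1}$, $X_{2}$, $X_{3}$ ... $X_{k}$ to get at $T$ (for example intelligence), a "factor" that cannot be seen. In an EFA, the researcher does not know in advance how many such $T$ there are.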

In one sentence, the main goal of factor analysis is^[2]^{:p 1}

to turn a large number of variables into a smaller number of factors.

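In research, investigators often face many variables, yet a whole batch of variables frequently reflects some "latent factors" (latent variables), and factor analysis uses a series of algorithms to find them. As a concrete example, suppose a researcher gives subjects an IQ test:

the test has $k$ items, and $X_{1}$, $X_{2}$, $X_{3}$ ... $X_{k}$ denote a subject's score on each item; each item has an error term $e_{i}$; these $k$ variables are the observed random variables;
$T$ denotes intelligence (what the IQ test is meant to measure); each $X_{i}$ carries a value $\lambda _{i}$ which, put simply, reflects how strongly the score on that item is related to $T$, so that roughly
$X_{i}=\lambda _{i}T+e_{i}$
$T$ is a random variable that is not directly observed; this example has only the one latent factor $T$, but more complex factor models can contain more than one;
before the factor analysis is run, the value of $T$ is unknown, and what factor analysis does is find the values of the parameters (the $\lambda _{i}$).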
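The single-factor model above can be made concrete with a small simulation. This is only a sketch: the sample size, the number of items and the loading values are hypothetical, and NumPy is assumed to be available. It generates scores from $X_{i}=\lambda _{i}T+e_{i}$ and shows that the correlation between two items comes out close to $\lambda _{i}\lambda _{j}$, which is the pattern a factor analysis works backwards from.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed so the sketch is repeatable
n, k = 2000, 4                            # hypothetical: 2000 test takers, 4 items
lam = np.array([0.9, 0.8, 0.7, 0.6])      # hypothetical loadings lambda_i

T = rng.standard_normal(n)                # latent factor (never observed in practice)
e = rng.standard_normal((n, k)) * np.sqrt(1 - lam**2)   # item-specific errors e_i
X = T[:, None] * lam + e                  # X_i = lambda_i * T + e_i, one column per item

# Under this model the correlation between items i and j is about lambda_i * lambda_j.
observed = np.corrcoef(X, rowvar=False)
implied = np.outer(lam, lam)
np.fill_diagonal(implied, 1.0)
max_gap = float(np.abs(observed - implied).max())

print(np.round(observed, 2))
print(np.round(implied, 2))
```

In a real analysis only `X` would be available; `T` and `lam` are exactly what the analysis tries to recover.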

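To visualise this, factor analysis is usually drawn as a model like the figure at the top of the article. Once the parameter values are found, the researcher can run many further analyses, such as "do these variables really reflect the same latent factor?" or "what is the structure of the latent factor, and can it be split into two sub-factors?"^[3]. This kind of analysis is in fact the mathematical basis of the concept of IQ.

Factor analysis comes in two main types: exploratory (EFA) and confirmatory (CFA). A researcher running an EFA does not specify the number of factors in advance; the computer is asked, under certain criteria, "how many factors do these data appear to split into?", and the aim is to generate a theoretical model from the data. A researcher running a CFA specifies in advance how many factors there are and which observed variables belong to each factor, and then asks the computer how well that model agrees with what the data show. In statistical terms, a CFA involves testing the hypothesis at hand^[4].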

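EFA steps

The steps of an EFA are as follows.

When to use it

In data science, exploratory factor analysis can be very useful^[5]^{:p 2}: it reduces the number of variables to consider, for instance by summarising a whole batch of scores with the single value of $T$, so that a phenomenon is explained with fewer concepts; it can also be used to explore how the variables relate to one another and to probe the "internal structure" of a theoretical concept such as IQ (for example, do some variables reflect IQ especially well?). It can additionally be used to deal with multicollinearity, a problem that comes up from time to time in statistical analysis.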

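Before running a factor analysis, the analyst should check the following:

Sample size: factor analysis is a fairly complex statistical analysis; the sample generally needs at least 100 individuals to have adequate statistical power^[6]^[7], and stricter benchmarks call for at least 300^[8].
Ratio of sample size to observed variables: the ratio between the number of individuals in the sample ($n$) and the number of observed variables also matters; a ratio of 10:1 is generally considered safe, and some statisticians argue that it should reach 20:1 before it is acceptable^{[note 1]}.
Correlation: before the factor analysis proper, researchers usually look at the correlations among the observed variables; some statisticians hold that the correlations should be at least .30 before the variables can plausibly reflect latent variables, and that .50 or above counts as "ideal"^[9].
Normal distribution: factor analysis commonly assumes that the variables are normally distributed.

and so on.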
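The first three checks in the list above are simple enough to script. The following is a rough sketch, assuming NumPy and a data matrix with one row per individual and one column per observed variable; the function name `efa_prechecks` and the simulated data are hypothetical, the thresholds are just the rules of thumb quoted above, and normality is not checked here.

```python
import numpy as np

def efa_prechecks(X, min_n=100, min_ratio=10.0, min_r=0.30):
    """Rule-of-thumb screening before an EFA (thresholds as quoted in the text)."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    # Absolute correlations above the diagonal, i.e. each pair counted once.
    off_diag = np.abs(R[np.triu_indices(p, k=1)])
    return {
        "n": n,
        "n_ok": n >= min_n,
        "ratio": n / p,
        "ratio_ok": n / p >= min_ratio,
        "share_r_ge_min": float((off_diag >= min_r).mean()),
    }

# Hypothetical data: 300 individuals, 6 variables driven by one latent variable.
rng = np.random.default_rng(1)
latent = rng.standard_normal((300, 1))
X = 0.8 * latent + 0.6 * rng.standard_normal((300, 6))

report = efa_prechecks(X)
print(report)
```

A low `share_r_ge_min` would suggest that few variable pairs are correlated strongly enough to share a latent variable.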

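Factor extraction

In an EFA the researcher does not know in advance how many factors the model has; this is left to the computer to decide^{[note 2]}, and the decision is far from easy.

Suppose the computer has computed^{[note 3]} several factor models: according to model A there are three latent variables behind the variables, model B says there are two, and model C says there are four. The analyst then needs criteria for deciding which of the models found is the most acceptable, or the most likely to be true. Choosing the number of factors is inherently a dilemma. By Occam's razor, science seeks to explain the most phenomena with the fewest concepts, so the fewer factors the better; on the other hand, as the number of factors rises, the variance the model explains always rises too, even if only slightly, for example by just 1% for one more factor.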

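Looking at eigenvalues

The eigenvalue (denoted $\lambda$ here, not to be confused with the loadings $\lambda _{i}$ earlier) is a concept that comes up often in statistics. Put simply, an eigenvalue reflects how much the explained variance rises when a factor is added^[10]. A simple way to choose the number of factors is to add factors to the model one at a time while watching the eigenvalues: once the gain in explained variance from adding the $p$-th factor (as reflected by its eigenvalue) falls below a preset threshold (for example, an eigenvalue below 1), the computer stops adding factors and ends up with a model of $p-1$ factors. By commonly used standards, a factor model in the natural sciences should explain at least 95% of the variance, while one in the social sciences should explain at least 50 to 60%^[6].

As in the following example^[5]^{:p 7} (explained variance is given in %):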


Factor	Eigenvalue	Variance explained by adding this factor (%)	Cumulative variance explained (%)
Factor 1	19.095	40.627	40.627
Factor 2	2.644	5.625	46.252
Factor 3	1.733	3.688	49.940
Factor 4	1.354	2.882	52.822
Factor 5	1.156	2.459	55.281
Factor 6	1.144	2.433	57.714
Factor 7	1.014	2.158	59.873

When the 8th factor is added, the eigenvalue drops below 1, so the result is a 7-factor model that explains about 60% of the variance.
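The percentages in the table above follow from the eigenvalues by simple arithmetic. When the variables are standardised, the total variance equals the number of variables, so each eigenvalue divided by that number gives the share of variance explained. The sketch below assumes 47 variables; that figure is not stated in the text and is only inferred from 19.095 / 0.40627, so treat it as an assumption.

```python
eigenvalues = [19.095, 2.644, 1.733, 1.354, 1.156, 1.144, 1.014]
n_variables = 47   # assumption: inferred from 19.095 / 0.40627, not given in the text

rows = []
cumulative = 0.0
for number, eigenvalue in enumerate(eigenvalues, start=1):
    percent = 100 * eigenvalue / n_variables   # share of total variance, in %
    cumulative += percent
    rows.append((number, eigenvalue, round(percent, 3), round(cumulative, 3)))

# Eigenvalue-greater-than-1 rule: keep every factor whose eigenvalue exceeds 1.
retained = sum(1 for eigenvalue in eigenvalues if eigenvalue > 1)

for row in rows:
    print(row)
print("factors retained:", retained)
```

The computed values agree with the table to within rounding in the last decimal place.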

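Scree plot

The "add factors while watching the eigenvalues" way of thinking can be visualised with a scree plot. A scree plot has a horizontal axis showing the number of factors and a vertical axis showing the eigenvalue. As the number of factors rises the eigenvalues become smaller and smaller, so the line in a scree plot slopes downward, falling more and more slowly, as in the figure below^[11]:

Figure: scree plot, with the eigenvalue on the vertical axis and the component number (number of factors) on the horizontal axis.

The dashed line marks where the eigenvalue equals 1; once the eigenvalue falls below 1, the computer stops adding factors. In the situation shown in the figure, the model finally produced has 3 factors.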

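Factor rotation

Producing a model is not enough. In practice the models that factor analysis produces are often "not tidy enough": once a model has been produced, any given variable has an equation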

$x_{i,m}-\mu _{i}=l_{i,1}f_{1,m}+\dots +l_{i,k}f_{k,m}+\varepsilon _{i,m}$

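where

$x_{i,m}$ is the value of the $m$-th individual on the $i$-th variable;
$\mu _{i}$ is the mean of the $i$-th variable^{[note 4]};
$l_{i,j}$ is the factor loading of the $i$-th variable on the $j$-th factor;
$f_{j,m}$ is the value of the $m$-th individual on the $j$-th factor;
$\varepsilon _{i,m}$ is the error for $(i,m)$, which cannot be estimated; it has mean 0 and finite variance.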

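In matrix notation this can be written more compactly as

$X-\mathrm {M} =LF+\varepsilon$

where $F$ is subject to the following constraints:

$F$ and $\varepsilon$ are independent of each other;
the expected value of $F$ is 0;
${\text{Cov}}(F)=I$, where ${\text{Cov}}(F)$ is the covariance matrix of $F$ and $I$ is the identity matrix.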

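For any variable-factor pair the factor loading can differ: if $l_{i,j}$ is close to 0, the variable and the factor are essentially unrelated, whereas an $l_{i,j}$ that is large in absolute value means they are strongly related. Factor rotation tries to give the model more high loadings and fewer low loadings^[5]^{:p 9}; informally, it can be pictured as "adjusting the $l_{i,j}$ step by step so that the model looks better".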
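Rotation is possible at all because the model $X-\mathrm {M} =LF+\varepsilon$ does not pin down $L$ uniquely: for any orthogonal matrix $R$, the pair $LR$ and $R^{\top }F$ gives exactly the same product, and the rotated factors still satisfy ${\text{Cov}}(F)=I$. The sketch below checks this numerically with NumPy; the loading values, the 30 degree angle and the factor scores are all hypothetical.

```python
import numpy as np

# Hypothetical loadings: 4 variables on 2 factors.
L = np.array([[0.7, 0.3],
              [0.6, 0.4],
              [0.3, 0.7],
              [0.2, 0.8]])

theta = np.deg2rad(30)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # an orthogonal rotation matrix

rng = np.random.default_rng(2)
F = rng.standard_normal((2, 5))     # factor scores for 5 hypothetical individuals

L_rot = L @ R                       # rotated loadings
F_rot = R.T @ F                     # factor scores rotated to match

same_fit = np.allclose(L @ F, L_rot @ F_rot)        # the product LF is unchanged
same_cov = np.allclose(L @ L.T, L_rot @ L_rot.T)    # so is the implied L L^T

print(same_fit, same_cov)
print(np.round(L_rot, 3))
```

Since the data cannot distinguish `L` from `L_rot`, a rotation method is free to pick the version that is easiest to interpret.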

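Rotation methods

Many rotation methods are used in factor analysis^[12]:

Varimax: seeks to minimise the number of variables that have high loadings on each factor, and is said to simplify the interpretation of the factors effectively.
Quartimax: minimises the number of factors each variable "needs", so that every variable can be fully "explained" by one or two factors.
Equamax: combines varimax (which simplifies the factors) and quartimax (which simplifies the variables), seeking to reduce both the number of variables with high loadings on a factor and the number of factors with high loadings on a variable.
Promax: allows some correlation among the factors. It is relatively quick to compute and is therefore said to cope better with large data sets.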
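To give a feel for what a rotation method does, here is a sketch of varimax using a commonly seen SVD-based iteration. It is an illustration under assumptions, not the routine any particular statistics package uses: the function names, the tolerance and the example loadings are hypothetical. The example takes a clean two-factor structure, blurs it with a 30 degree rotation, and lets varimax sharpen it again.

```python
import numpy as np

def varimax(loadings, max_iter=500, tol=1e-8):
    """Orthogonal varimax rotation via a commonly used SVD-based iteration."""
    p, k = loadings.shape
    R = np.eye(k)
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ R
        # Direction in which the varimax criterion increases.
        target = rotated**3 - rotated * (rotated**2).sum(axis=0) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        R = u @ vt                       # closest orthogonal matrix
        if s.sum() < previous * (1 + tol):
            break                        # no meaningful improvement any more
        previous = s.sum()
    return loadings @ R, R

def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    return float((loadings**2).var(axis=0).sum())

# Hypothetical simple structure: items 1-3 on factor 1, items 4-6 on factor 2.
simple = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                   [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
theta = np.deg2rad(30)
blur = np.array([[np.cos(theta), -np.sin(theta)],
                 [np.sin(theta),  np.cos(theta)]])
unrotated = simple @ blur               # every item now loads on both factors

rotated, R = varimax(unrotated)
before = varimax_criterion(unrotated)
after = varimax_criterion(rotated)
print(np.round(rotated, 3))
print(round(before, 4), round(after, 4))
```

Because `R` is orthogonal, each item's communality (the row sum of squared loadings) is the same before and after; only the way it is split across factors changes.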

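In the early twenty-first century, factor rotation drew a fair amount of criticism among statisticians: small changes in the data can produce large changes in the rotated result. For example, take 300 individuals and run an EFA with varimax rotation, then remove the data of any 10 of those individuals and rerun the same analysis; the resulting factor model can be entirely different (a different number of factors, and different variables loading on different factors). This also means that rotation makes it hard to compare results across studies. It has happened in social science: a group of researchers studying culture all used factor analysis but with different rotation methods; later researchers found that the unrotated factor models from these studies were very similar, whereas the rotated models all differed, so that each researcher held a different factor model, believed they had discovered something new, and even coined new concepts to explain these "new findings"^[13].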

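Interpreting the results

Once these steps are done, the analyst has to interpret the result^{[note 5]}: even after rotation the model is just a pile of numbers, and the analyst must give those numbers meaning. As a simplified example, suppose a researcher studying IQ has a test with 30 questions. The EFA yields a model with a single factor, in which the standardised loadings of the first 10 questions remain low even after rotation (below 0.4), while each of the last 20 questions has a standardised loading above 0.7. The researcher then has reason to believe that:

the questions in the test, for the most part, reflect one and the same thing;
the first 10 questions do not reflect the factor well and could be dropped from future use;
based on the theory at hand, the questions reflect intelligence, so the factor can reasonably be named intelligence.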
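The screening step in this example amounts to comparing each loading with two cut-offs. A minimal sketch in plain Python follows; the function name `screen_items`, the shortened 7-item list and the loading values are hypothetical, while the 0.4 and 0.7 cut-offs are the ones used in the example above.

```python
def screen_items(loadings, drop_below=0.4, keep_above=0.7):
    """Sort item numbers by the size of their standardised loading on one factor."""
    keep, review, drop = [], [], []
    for item, value in enumerate(loadings, start=1):
        size = abs(value)          # the sign only gives the direction of the relation
        if size > keep_above:
            keep.append(item)
        elif size < drop_below:
            drop.append(item)
        else:
            review.append(item)    # in-between items need a judgement call
    return keep, review, drop

# Hypothetical loadings for a shortened 7-item example.
loadings = [0.25, 0.31, 0.38, 0.72, 0.81, 0.76, 0.74]
keep, review, drop = screen_items(loadings)
print("keep:", keep, "review:", review, "drop:", drop)
```

The cut-offs only flag candidates; whether an item is actually removed, and what the factor is called, remains the researcher's decision.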

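Some researchers point out that how meaningful the factors from a factor analysis are ultimately depends on the researcher's definition^[14]. In the early twenty-first century there was no precise standard for naming factors; researchers often looked at the variables loading highly on a factor, judged that they "seem broadly to reflect theoretical concept XXX", and treated them as reflecting that concept.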

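CFA steps

After an EFA, people often go on to do a CFA. Suppose a group of researchers have collected some data and built a factor model with EFA; they will often want to find another sample and run a CFA on the new data, to test how well the factor model found in the first sample carries over to the new one^[15]^[16]. What a CFA does is take

a set of data whose variables are normally distributed, together with
a factor model defined by the researcher,

and output a set of values describing how well the model fits the data^[17]. For the mathematical details, see structural equation modelling.

Note also that a CFA has to estimate many model parameters and so makes heavy demands on degrees of freedom; generally the sample needs at least 500 to 1,000 individuals before a researcher can attempt a CFA^[18]^{:p 4}.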

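Specifying the model

The first step of a CFA is to specify the model. A researcher usually does a CFA either because earlier studies indicate what factor structure the variables should have, or because an EFA has yielded a factor model. Either way, the researcher then tells the computer what the intended model is, that is, a factor model of the form

$x_{1,m}-\mu _{1}=l_{1,1}f_{1,m}+\dots +l_{1,k}f_{k,m}+\varepsilon _{1,m}$

$x_{2,m}-\mu _{2}=l_{2,1}f_{1,m}+\dots +l_{2,k}f_{k,m}+\varepsilon _{2,m}$

...

$x_{i,m}-\mu _{i}=l_{i,1}f_{1,m}+\dots +l_{i,k}f_{k,m}+\varepsilon _{i,m}$

The researcher has to specify the number of factors (the value of $k$), the number of variables (the value of $i$), and which factor each variable reflects (which can be thought of as specifying which $l$ are 0 and which are non-zero). In other words, the researcher specifies a theoretical model and then analyses whether the data support it. Not everything has to be specified, though: in most cases the researcher does not need to give the actual values of the $l$, which are estimated by the CFA algorithm^{[note 3]}^[19].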
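One way to picture such a specification is as a 0/1 pattern over the loading matrix: 1 where a loading is free to be estimated, 0 where it is fixed at zero. The sketch below uses NumPy with a hypothetical 6-variable, 2-factor pattern and made-up loading values standing in for estimates. It builds the covariance matrix the model implies, $LL^{\top }+\Psi$ with $\Psi$ the diagonal matrix of error variances (using ${\text{Cov}}(F)=I$ as above), and counts degrees of freedom.

```python
import numpy as np

# 1 = loading is free (to be estimated), 0 = loading is fixed at zero.
pattern = np.array([[1, 0],
                    [1, 0],
                    [1, 0],
                    [0, 1],
                    [0, 1],
                    [0, 1]])

# Hypothetical values standing in for what an estimation routine would return.
free_values = np.array([0.8, 0.7, 0.6, 0.9, 0.8, 0.7])
L = np.zeros(pattern.shape)
L[pattern == 1] = free_values            # filled row by row

# Error variances chosen so that every variable has total variance 1.
Psi = np.diag(1.0 - (L**2).sum(axis=1))
Sigma = L @ L.T + Psi                    # covariance matrix implied by the model

p = pattern.shape[0]
n_known = p * (p + 1) // 2               # distinct variances and covariances
n_free = int(pattern.sum()) + p          # free loadings plus error variances
df = n_known - n_free

print(np.round(Sigma, 2))
print("degrees of freedom:", df)
```

Variables assigned to different factors have an implied covariance of 0 here, which is exactly the kind of claim the fit statistics go on to test against the observed covariance matrix.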

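Computing goodness of fit

Finding a model is not enough: once the model's parameter values (the $l$) are found, the analyst still has to check whether the model's goodness of fit is high enough. Goodness of fit refers to how well a statistical model (such as a CFA model) matches the data at hand^[20]; for example

Chi-square test, denoted $\chi ^{2}$ ^{[note 6]}: this treats "the model is correct" as $H_{0}$ (the null hypothesis) and runs a chi-square test comparing the covariance matrix implied by the model with the covariance matrix actually observed; the larger the test statistic ($\chi ^{2}$), the bigger the difference between the two matrices and the more reason the researcher has to believe the model is wrong^[21]^[22].
Root mean square error of approximation (RMSEA): a fit index for which lower is better; the smallest possible RMSEA is 0, and a value of 0.1 or above is generally taken to mean the fit is unacceptably poor^[23].
Standardised root mean square residual (SRMR): another fit index for which lower is better; the SRMR should generally be below 0.1, and some statisticians hold that it must be below 0.08 for the model to have adequate fit^[21].
Comparative fit index (CFI): a fit index that compares the model with a baseline model in which the variables are uncorrelated, so higher is better; generally a CFI above 0.95 means the model is acceptable^[21].

and so on. If the fit indices indicate that the model is "acceptable"^{[note 7]}, the researcher can move on to interpreting it.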
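Two of the indices listed above can be computed directly from chi-square statistics. The sketch below uses the commonly quoted formulas for RMSEA and CFI; some software divides by $N$ rather than $N-1$ in the RMSEA, so treat the exact form as an assumption. All the input numbers are hypothetical and serve only to show the arithmetic.

```python
import math

def rmsea(chi2, df, n):
    """Root mean square error of approximation (N - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_model, df_model, chi2_null, df_null):
    """Comparative fit index: improvement over the null (baseline) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_null = max(chi2_null - df_null, 0.0)
    denominator = max(d_model, d_null)
    return 1.0 if denominator == 0 else 1.0 - d_model / denominator

# Hypothetical results: tested model, null model, and sample size.
chi2_model, df_model = 85.0, 40
chi2_null, df_null = 1500.0, 55
n = 500

model_rmsea = rmsea(chi2_model, df_model, n)
model_cfi = cfi(chi2_model, df_model, chi2_null, df_null)
print(round(model_rmsea, 4), round(model_cfi, 4))
```

With these made-up inputs the RMSEA falls well below 0.1 and the CFI is above 0.95, so the model would count as acceptable under the thresholds quoted above.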

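Applications

In the social sciences of the early twenty-first century, CFA has many uses^[17]^:22.2.

Supporting the existence of latent variables: some researchers begin a CFA by fitting a null model^[18]^{:p 2}, meaning a model with no hidden factors at all; if this model fits very badly, the researcher has reason to believe that hidden factors really are present.
Testing convergent validity^[17]^{:p 4} ^[24]: with subjects' item scores in hand, the researcher can run a CFA to see how the latent variable relates to other variables. For example, suppose a researcher developing a new IQ test runs an EFA on the item scores and finds that the items reflect a latent "intelligence" factor; a CFA model can then test whether this latent factor really has a strong positive correlation with, say, other IQ tests. If it does, the researcher has stronger grounds to claim that the new test also measures intelligence; in statistical terms, this establishes the new test's convergent validity.
Testing discriminant validity: suppose the researcher has also measured some personality variables that are known to be unrelated to intelligence. The researcher can run a further CFA to see whether scores on the new IQ test really are "not too strongly correlated" with personality. If they are too strongly correlated, the test is not purely measuring intelligence and fails to "diverge from things it should be unrelated to".
Shortening an existing instrument: suppose an IQ test has 100 items, which takes a long time to complete, so the researcher wants to develop a short version. The researcher might, for example, run an EFA, pick the 20 items with the highest factor loadings, and give the 20-item version to subjects; if a subsequent CFA finds that the 20-item version has good convergent and discriminant validity, a 20-item short form has been provisionally established^[25].

and so on.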

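Applications in the social sciences

Psychometrics

A typical IQ test item: the subject has to work out the pattern in the figures and guess what the next figure should be.

One of the best-known applications of factor analysis is helping to measure intelligence in psychometrics. Intelligence is an important concept in cognition; an individual's intelligence can roughly be taken as how well their brain processes information for thinking and problem solving^[26].

Intelligence cannot be measured with, say, a ruler, but it can be reflected by certain behavioural indicators. Since the twentieth century, a common way to measure intelligence has been to give subjects a large set of items that call for thinking and problem-solving ability; the researcher then runs a factor analysis on the subjects' scores on these items (each item can be pictured as a variable with two values, right and wrong) to find the latent variable behind them, which should be the intelligence factor psychologists are looking for^[27]. In fact, in the early twentieth century psychologists found that children's abilities in different school subjects were quite strongly positively correlated, which gave rise to the idea that "these abilities are governed by one and the same factor".

Besides intelligence, factor analysis is also widely used in personality psychology. Personality cannot be measured with a ruler either, but in principle it can be reflected by a large set of behaviours. How extraverted someone is, for example, can be judged from how often they talk, how often they walk up to strangers to say hello, how often they make friends with strangers, and so on. Researchers can build a personality test, have subjects answer questions describing their own behaviour, or ask people around the subjects to describe them, and then run a factor analysis on these behavioural variables to see whether, for example,

how often they talk,
how often they walk up to strangers to say hello, and
how often they make friends with strangers

really do have a latent variable behind them that resembles the expected concept of "extraversion"^[28].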

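Business applications

Psychometric methods of this kind are also useful in business fields such as marketing and management^[29]. Marketing is the branch of business studies concerned with how to sell one's products and services effectively. In marketing research, investigators often want to measure variables such as how consumers view a product (attitude), how high they think its quality is, and how willing they are to buy it. None of these can be measured directly, so factor analysis may come into play^[30].

For example, imagine a questionnaire with the following items^[31]:

Brand X meets my expectations.

I have confidence in brand X.

Brand X never disappoints me.

Using brand X's products guarantees satisfaction.

Brand X is honest and dependable.

and so on, with respondents asked how much they agree with each statement (perhaps on a scale of 1 to 5, where a higher score means stronger agreement). In principle these statements reflect how much respondents trust brand X. The researchers run an EFA to see whether the scores on these statements really do form one factor (the EFA yields a single factor and every item has a reasonably high factor loading). If so, they can claim with some confidence that the questionnaire appears to measure respondents' trust in brand X, and can go on to study, for example, what influences respondents' trust in a brand.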
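The single-factor check described above can be tried on simulated answers. The sketch below is a rough stand-in rather than a full EFA: it assumes NumPy, invents 500 respondents whose 1-to-5 answers to the five statements are driven by one latent "trust" variable, and then applies the eigenvalue-greater-than-1 rule to the correlation matrix and reads loadings off the first principal component. All names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n_respondents, n_items = 500, 5

trust = rng.standard_normal(n_respondents)             # hypothetical latent trust
noise = rng.standard_normal((n_respondents, n_items))  # item-specific noise
raw = 3 + 0.9 * trust[:, None] + 0.6 * noise
answers = np.clip(np.rint(raw), 1, 5)                  # agreement scores from 1 to 5

R = np.corrcoef(answers, rowvar=False)
values, vectors = np.linalg.eigh(R)                    # eigenvalues in ascending order

n_factors = int((values > 1).sum())                    # eigenvalue-greater-than-1 rule
# Principal-component loadings on the largest component (sign is arbitrary).
loadings = np.abs(vectors[:, -1]) * np.sqrt(values[-1])

print(np.round(values[::-1], 3))
print("factors suggested:", n_factors)
print(np.round(loadings, 2))
```

If real survey data produced a second eigenvalue above 1, or one statement with a much lower loading, that would be a sign the statements are not all measuring the same thing.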

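Other applications

Factor analysis originated in social sciences such as psychology, business studies and sociology. The social sciences are by nature the study of human behaviour and often seek to measure what people think or feel, including attitudes and other psychological variables. Such variables usually cannot be measured directly, so researchers measure a large set of variables that they have reason to think "reflect the same thing" and then use factor analysis to probe the latent variables behind them. Even so, factor analysis can also be very useful in some natural science and engineering research.

For example, some earth science studies have used factor analysis to analyse water quality. Water quality is a multidimensional concept reflected by the chemical composition of the water; it can roughly be thought of as whether the water is "clean enough" for uses such as drinking. Suppose researchers take water samples from a water body and measure the concentrations of various chemical substances (including iron and lead) in the samples. With the data in hand, they run a factor analysis with these concentrations as variables to see whether there are "hidden factors" behind them. The result was that there are indeed such "hidden factors", and that they reflect important information such as the presence of certain minerals or water pollution events^[32].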

Glossary

Key concepts and technical terms used above, with the original Cantonese terms:

factor: 因素
latent variable: 潛在變數
observed variable: 可觀察變數
exploratory factor analysis (EFA): 探索型因素分析
confirmatory factor analysis (CFA): 確定型因素分析
statistical correlation: 統計相關
multicollinearity: 多重共線性
normal distribution: 常態分佈
maximum likelihood estimation (MLE): 最大似然估計
explained variance: 可解釋變異
eigenvalue: 特徵值
scree plot: 岩屑堆圖
factor loading: 因素負荷量
rotation (in factor analysis): 旋轉
(to) interpret: 詮釋
degree of freedom (df): 自由度
factor structure: 因素結構
nested model: 嵌套模型
null model: 虛無模型
convergent validity: 聚合效度
discriminant validity: 分歧效度
intelligence: 智能
IQ test (IQ 測試), where IQ stands for intelligence quotient
personality test: 性格測驗
marketing: 營銷

Further reading

Statistics in Psychosocial Research - Lecture 8 Factor Analysis I (PDF).
Chapter 14 - Factor analysis (PDF).
Courtney, M. G. R. (2013). Determining the number of factors to retain in EFA: Using the SPSS R-Menu v2.0 to make more judicious estimations. Practical Assessment, Research and Evaluation, 18(8).
Henson RK, Roberts JK (2006). Use of exploratory factor analysis in published research: Common errors and some comment on improved practice. Educational and Psychological measurement, 66(3), 393-416. This paper discusses some criticisms of EFA, in particular that the analysis "too easily yields statistically significant results", as if a researcher could get a result simply by trying the rotation methods one by one.
Mundfrom, D. J., Shaw, D. G., & Ke, T. L. (2005). Minimum sample size recommendations for conducting factor analyses. International journal of testing, 5(2), 159-168.
Thompson B, Daniel LG. Factor analytic evidence for the construct validity of scores: A historical overview and some guidelines. Educational and psychological measurement. 1996 Apr;56(2):197-208. Discusses how to use parallel analysis to decide the number of factors.

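Notes

[note 1] In the early twenty-first century there was considerable debate among statisticians over how large the sample for a factor analysis must be at minimum.
[note 2] In practice, the statistical software has to decide by itself, from the set of variables supplied by the researcher, how many factors the model should have.
[note 3] On which algorithms the computer uses for these calculations, see concepts such as maximum likelihood estimation.
[note 4] See the concept of the standard score.
[note 5] Cronbach's alpha is also often used to check the factor structure.
[note 6] The chi-square test is only useful when the model has positive degrees of freedom ($df>0$), that is, when the data supply more distinct variances and covariances than there are parameters to estimate; and if the sample is too large, the chi-square test becomes unreliable.
[note 7] Or, after running several different models, the researcher finds that the model at hand has the best fit indices. See the concept of nested models.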

References

Academic literature and web pages cited in this article:

[1] Child, Dennis (2006), The Essentials of Factor Analysis (3rd ed.), Continuum International.
[2] Yong, A. G., & Pearce, S. (2013). A beginner's guide to factor analysis: Focusing on exploratory factor analysis. Tutorials in quantitative methods for psychology, 9(2), 79-94.
[3] Bandalos, D. L. (2017). Measurement Theory and Applications for the Social Sciences. The Guilford Press.
[4] Pett MA, Lackey NR, Sullivan JJ. Making Sense of Factor Analysis: The use of factor analysis for instrument development in health care research. California: Sage Publications Inc; 2003.
[5] Williams, B., Onsman, A., & Brown, T. (2010). Exploratory factor analysis: A five-step guide for novices. Australasian journal of paramedicine, 8, 1-13. Its Table 2 covers several tests to run before starting a factor analysis. It describes rotation as follows: "Rotation maximises high item loadings and minimises low item loadings, therefore producing a more interpretable and simplified solution."
[6] Hair J, Anderson RE, Tatham RL, Black WC. Multivariate data analysis. 4th ed. New Jersey: Prentice-Hall Inc; 1995.
[7] de Winter, J. C., Dodou, D., & Wieringa, P. A. (2009). Exploratory factor analysis with small sample sizes. Multivariate behavioral research, 44(2), 147-181. This paper also mentions that the sample should have at least 50 individuals.
[8] Tabachnick BG, Fidell LS. Using Multivariate Statistics. Boston: Pearson Education Inc; 2007.
[9] Hair J, Anderson RE, Tatham RL, Black WC. Multivariate data analysis. 4th ed. New Jersey: Prentice-Hall Inc; 1995. Hair et al. (1995) categorised these loadings using another rule of thumb as ±0.30=minimal, ±0.40=important, and ±.50=practically significant.
[10] Factor Analysis - Rachael Smyth and Andrew Johnson. They write: "Eigenvalues are a measure of the amount of variance accounted for by a factor, and so they can be useful in determining the number of factors that we need to extract."
[11] George Thomas Lewith; Wayne B. Jonas; Harald Walach (23 November 2010). Clinical Research in Complementary Therapies: Principles, Problems and Solutions. Elsevier Health Sciences. p. 354.
[12] Factor Analysis Rotation. IBM SPSS.
[13] Fog, A (2022). "Two-Dimensional Models of Cultural Differences: Statistical and Theoretical Analysis" (PDF). Cross-Cultural Research. 57 (2–3): 115–165.
[14] Henson RK, Roberts JK (2006). Use of exploratory factor analysis in published research: Common errors and some comment on improved practice. Educational and Psychological measurement, 66(3), 393-416. They state: "The meaningfulness of latent factors is ultimately dependent on researcher definition."
[15] So, K. K. F., King, C., & Sparks, B. (2014). Customer engagement with tourism brands: Scale development and validation. Journal of Hospitality & Tourism Research, 38(3), 304-329. In this tourism study the questionnaire is designed by defining the concept to be measured, describing what the concept covers, designing items and checking validity, running a pilot study with EFA, collecting new data for a CFA to check reliability, and then testing whether the questionnaire has convergent validity.
[16] Hollebeek, L. D., Glynn, M. S., & Brodie, R. J. (2014). Consumer brand engagement in social media: Conceptualization, scale development and validation. Journal of interactive marketing, 28(2), 149-165. A marketing study designing a questionnaire: interviews to generate items and check validity, an EFA, a CFA on new data, and then tests of whether the questionnaire has convergent validity.
[17] Brown, T. A., & Moore, M. T. (2012). Confirmatory factor analysis. Handbook of structural equation modeling, 361, 379. The end of page 4 and the start of page 5 state: "Convergent validity is indicated by evidence that different indicators of theoretically similar or overlapping constructs are strongly interrelated; e.g., symptoms purported to be manifestations of a single mental disorder load on the same factor. Discriminant validity is indicated by results showing that indicators of theoretically distinct constructs are not highly intercorrelated."
[18] Bryant, F. B., Yarnold, P. R., & Michelson, E. A. (1999). Statistical methodology: VIII. Using confirmatory factor analysis (CFA) in emergency medicine research. Academic emergency medicine, 6(1), 54-66.
[19] Raykov, T. (2001). Estimation of congeneric scale reliability using covariance structure analysis with nonlinear constraints. British Journal of Mathematical and Statistical Psychology, 54, 315-323.
[20] Singh, R. (2009). Does my structural model represent the real phenomenon?: a review of the appropriate use of Structural Equation Modelling (SEM) model fit indices. The Marketing Review, 9(3), 199-212.
[21] Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1-55.
[22] Deciding Between Competing Models: Chi-Square Difference Tests (PDF). Introduction to Structural Equation Modeling with LISREL.
[23] Browne, M. W.; Cudeck, R. (1993). "Alternative ways of assessing model fit". In Bollen, K. A.; Long, J. S. (eds.). Testing structural equation models. Newbury Park, CA: Sage.
[24] Farrell, A. M., & Rudd, J. M. (2009). Factor analysis and discriminant validity: A brief review of some practical issues. In Australia and New Zealand Marketing Academy Conference 2009. Anzmac.
[25] Goetz, C., Coste, J., Lemetayer, F., Rat, A. C., Montel, S., Recchia, S., ... & Guillemin, F. (2013). Item reduction based on rigorous methodological guidelines is necessary to maintain validity when shortening composite measurement scales. Journal of Clinical Epidemiology, 66(7), 710-718.
[26] Gottfredson, Linda S. (1997). "Mainstream Science on Intelligence (editorial)" (PDF). Intelligence. 24: 13-23.
[27] Paulhus, D. L., Lysy, D. C., & Yik, M. S. (1998). Self-report measures of intelligence: Are they useful as proxy IQ tests?. Journal of personality, 66(4), 525-554.
[28] Sharpe, J. P., Martin, N. R., & Roth, K. A. (2011). Optimism and the Big Five factors of personality: Beyond neuroticism and extraversion. Personality and individual differences, 51(8), 946-951.
[29] Wong, S. K. S. (2013). Environmental requirements, knowledge sharing and green innovation: Empirical evidence from the electronics industry in China. Business Strategy and the Environment, 22(5), 321-338.
[30] Asshidin, N. H. N., Abidin, N., & Borhan, H. B. (2016). Perceived quality and emotional value that influence consumer's purchase intention towards American and local products. Procedia Economics and Finance, 35, 639-643.
[31] Sahin, A., Zehir, C., & Kitapçı, H. (2011). The effects of brand experiences, trust and satisfaction on building brand loyalty; an empirical research on global brands. Procedia-Social and Behavioral Sciences, 24, 1288-1301. Its Table 2 covers the factor analysis they carried out.
[32] Love, D., Hallbauer, D., Amos, A., & Hranova, R. (2004). Factor analysis as a tool in groundwater quality management: two southern African case studies. Physics and Chemistry of the Earth, Parts A/B/C, 29(15-18), 1135-1143.
[33] Çokluk Bökeoğlu, Ö., & Koçak, D. (2016). Using Horn's parallel analysis method in exploratory factor analysis for determining the number of factors. Educational sciences-theory & practice, 16(2).
[34] Factor Analysis Vs. PCA (Principal Component Analysis) – Which One to Use? Archived at the Internet Archive on 11 May 2024. Analytix Labs.

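External links

Factor analysis, IBM SPSS: a brief introduction to factor analysis.
A beginner's introduction to factor analysis, focusing on EFA (PDF file).
Running factor analysis in the R programming language, GeeksForGeeks.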
