竹家庄: How should one interpret a regression model in which the interaction is significant but the simple slopes are not?

Anonymous @ 2009-07-30:

Hi, Dr. ZHU, let me introduce myself first: I am a PhD candidate at your university, majoring in xxx (EDITED BY 庄主). I found your forum by accident and have read some of the posts on it; I really feel it is a good place to learn more about statistics.

I want to ask a question about the interaction effect of two independent variables (or, put differently, the moderating effect involving one independent variable and one moderator; I will call them "A" and "B") on one dependent variable (I will call it "C"). I first used the traditional method, OLS multiple linear regression in SPSS, to test the effect of the product term A*B on C. I got a significant result: the t value is around 2.2, thus p < 0.05.

Then I further explored the internal mechanism of the interaction effect using the simple slopes test of Cohen and Cohen (1983), Aiken and West (1991) and Dawson and Richter (2006). [This method is designed for interpreting the interaction effects of two continuous predictor variables; with it one can assess the significance of the relationship between the independent variable and the dependent variable at high or low levels of the moderator. To illustrate and test the significant interaction effects, separate regression lines are computed, plotted, and tested at one standard deviation below the mean of the moderating variable as well as one standard deviation above it.] This time I found that the causal relationship between A and C is insignificant at both the high and the low level of B; in fact, the coefficients take opposite directions. At the high level of B the coefficient between A and C is negative but insignificant; at the low level of B it is positive but insignificant.

I originally explained the significant moderating effect as follows: although the causal relationship between A and C is insignificant at both the high and the low level of B, the coefficients have opposite signs, so the moderating effect (A*B) may still be significant. But a famous professor has now rejected my explanation; he told me that the results were inconsistent: 1. on the one hand, the moderating effect is insignificant; 2. on the other hand, at both the high and the low level of B the causal relationship between A and C is insignificant and thus can be treated as no relationship. How can you claim a moderating effect first and then tell us the effects are equal (A and C have no relationship) under both conditions (high and low B)? Thus I am a little confused. As you know, running a linear regression will often lead to the situation I described, so how can we generally explain this phenomenon to cope with journal reviewers' critiques on this issue?

Many thanks!

庄主 @ 2009-08-29:

Thanks for the detailed explanation of your question. It’s satisfying to know that someone from my own institution also reads this blog. Sorry for the delayed response; I have been traveling over the summer. To benefit other readers who might not be proficient in English, please allow me to reply in Chinese.

Let me first briefly restate your question. You have Model 1:

C = b₀ + b₁A + b₂B + b₃AB (1)

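where A, B and C are all interval-level variables. Using OLS regression, you found that b₃ (or was it b₁ or b₂?) has t = 2.2 (p < 0.05), i.e., A and B have a significant interaction effect on C. To better understand the “internal mechanism” of this interaction, you applied the test of simple regression slopes recommended by Cohen & Cohen and others: based on the results of Model 1, you substituted the values of B at its mean minus and plus one standard deviation (denoted B_L and B_H, respectively) into Model 1 to obtain the slopes of the following two simple regression models: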

C = b₀ + b₁A + b₂B_L+ b₃AB_L = (b₀ + b₂B_L) + (b₁+ b₃B_L)A (2)

and

C = b₀ + b₁A + b₂B_H+ b₃AB_H = (b₀ + b₂B_H) + (b₁+ b₃B_H)A (3)

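Note: because B_H and B_L are each a constant (whereas the original B is a variable), substituting them into Model 1 yields Models 2 and 3, which after rearrangement are both one-predictor (simple) regression models containing only A. The term b₁ + b₃B_L in Model 2 and the term b₁ + b₃B_H in Model 3 are the simple slopes you mention (the coefficient of A in the last parenthesis of each equation). You found that the simple slope of Model 2, b₁ + b₃B_L, is positive, while the simple slope of Model 3, b₁ + b₃B_H, is negative. You then tested the significance of the two slopes and found that neither is significant. Your conclusion was that “although A has no significant effect on C under either condition of the moderator B, the significant interaction between A and B still exists.” A famous professor, however, disagrees, on two grounds: first, your interaction effect is not significant (I do not understand this statement, since it contradicts the t = 2.2 you report for Model 1); second, the effect of A on C is not significant under either condition of B. (Have I understood you correctly?)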
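The substitution that turns Model 1 into Models 2 and 3 can be sketched in a few lines of Python. This is only an illustration: the coefficient values, the mean and the standard deviation of B are hypothetical, not taken from the reader's data, and the function names are mine.

```python
def simple_slope(b1, b3, b_value):
    """Slope of A in Model 1 when the moderator B is held at b_value."""
    return b1 + b3 * b_value


def simple_intercept(b0, b2, b_value):
    """Intercept of the same one-predictor model."""
    return b0 + b2 * b_value


# Hypothetical Model 1 coefficients (assumed values, for illustration only).
b0, b1, b2, b3 = 2.0, 0.25, 0.5, -0.5

# Hypothetical centered moderator: mean 0, standard deviation 1.
b_mean, b_sd = 0.0, 1.0
B_L = b_mean - b_sd  # one standard deviation below the mean
B_H = b_mean + b_sd  # one standard deviation above the mean

slope_low = simple_slope(b1, b3, B_L)    # slope of Model 2
slope_high = simple_slope(b1, b3, B_H)   # slope of Model 3
intercept_low = simple_intercept(b0, b2, B_L)
intercept_high = simple_intercept(b0, b2, B_H)

print("Model 2: C =", intercept_low, "+", slope_low, "* A")
print("Model 3: C =", intercept_high, "+", slope_high, "* A")
```

With these made-up numbers the slope is positive at low B and negative at high B, the same sign pattern the reader describes. Whether either slope differs significantly from zero is a separate question that depends on its standard error.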

Now, here is my view.

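First, two things are unclear to me. One, as I asked above, is the significant coefficient in Model 1 b₃ or some other coefficient? Two, you did not mention whether A and B in Model 1 were centered before forming the product AB (to avoid or reduce the correlation between AB and A and between AB and B). Since you have read classics such as Cohen & Cohen and Aiken & West, you should be familiar with the basic steps for testing an interaction, so my answer rests on the following assumptions: one, the significant coefficient with t = 2.2 in Model 1 is b₃; two, AB is uncorrelated with A and with B in Model 1 (this is important; otherwise assumption one is meaningless).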
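To see what centering does to the correlation between a predictor and the product term, here is a small simulation sketch. Everything in it is an assumption made for illustration: A and B are drawn as independent normal variables with hypothetical means 5 and 3, and the helper `pearson` is written out by hand so that only the standard library is needed.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation computed from its definition."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


random.seed(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical predictors with non-zero means (assumption for illustration).
A = [random.gauss(5.0, 1.0) for _ in range(n)]
B = [random.gauss(3.0, 1.0) for _ in range(n)]

# Product term built from the raw scores.
raw_product = [a * b for a, b in zip(A, B)]

# Product term built from mean-centered scores.
mean_a, mean_b = sum(A) / n, sum(B) / n
A_c = [a - mean_a for a in A]
B_c = [b - mean_b for b in B]
centered_product = [a * b for a, b in zip(A_c, B_c)]

r_raw = pearson(A, raw_product)
r_centered = pearson(A_c, centered_product)
print(f"corr(A, AB) with raw scores:      {r_raw:.2f}")
print(f"corr(A, AB) with centered scores: {r_centered:.2f}")
```

Under these assumptions the raw product is clearly correlated with A, while the centered product is nearly uncorrelated with it. Real data with skewed or dependent predictors may behave less neatly.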

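If both conditions hold, then A and B do have a significant interaction effect on C in your sample. Of course, because your t value is close to the critical value (1.96 or larger, depending on the degrees of freedom), the interaction effect is marginal and should be treated with caution. At a minimum, check whether the data on A contain outliers; if so, remove them and re-estimate Model 1 to see whether the effect of AB remains significant, so as to establish the robustness of the model.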

Your main puzzle (which is also your professor's second objection) is this: given that b₃ in Model 1 is significant, why are the slopes of Models 2 and 3 not significant? There is a fact here that many textbooks do not spell out and that beginners often misunderstand. A significant interaction effect ensures not only that the main effect of an independent variable (A) on a dependent variable (C), that is, the slope of A at a given value of B, varies across different levels of a moderator variable (B), but also that at least one of those main effects is significantly different from zero (otherwise AB could not be significant). However, the significant interaction doesn’t guarantee that all main effects of the independent variable on the dependent variable differ significantly from zero. The following figure makes the point easy to see.

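I sketched the figure on the left freehand. It contains five regression lines, all drawn from the same model (such as your Model 1) with B set to its maximum (Max), minimum (Min), mean (Mean), mean minus one standard deviation (your Model 2), and mean plus one standard deviation (your Model 3). Although there are no actual data, one can roughly infer the values of b₁ and b₃ in the underlying Model 1: b₁ (the overall slope of A, i.e., its slope when B is at its mean) should equal 0, and b₃ (the interaction effect of AB) is clearly greater than 0 (because the effect of A on C fans out, i.e., increases as B increases). We do not know (and need not know) the values of b₀ and b₂, because they are irrelevant to the question at hand.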

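The line for B = mean is evidently horizontal, so its slope is not significant (it equals 0). The two lines for B = mean plus or minus one standard deviation are not horizontal but nearly so; once their sampling errors (not to be confused with the standard deviation) are taken into account, their slopes do not differ significantly from 0 (perhaps this is the situation in your data). The slopes of the two lines for B = maximum and minimum, however, clearly differ from 0. If we added two more lines for B = mean ± 2 standard deviations, we can imagine that their slopes would also differ from 0. Conclusion: when A and B have a significant interaction effect on C, the effect of A on C varies with the value of B, and it may be significant at some values of B and not at others.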

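The above is an intuitive explanation. Let us summarize it a little more formally. Return to Models 2 and 3 and note the slope in each (the coefficient of A). Whether each slope equals 0 is tested with the following formulas:

t_L = (b₁ + b₃B_L) / √(Var_b₁ + 2B_L·Cov_b₁b₃ + B_L²·Var_b₃)

and

t_H = (b₁ + b₃B_H) / √(Var_b₁ + 2B_H·Cov_b₁b₃ + B_H²·Var_b₃)

where Var_b₁ and Var_b₃ are the variances of b₁ and b₃, and Cov_b₁b₃ is the covariance of b₁ and b₃. We skip the technical details of how the variances and covariance of the coefficients are computed and look instead at what it takes for t_L and t_H to reach significance (i.e., exceed 2 in absolute value). Obviously the numerator must be large and the denominator small. As for the numerator, on the surface b₁, b₃ and B_H or B_L should each be as large as possible, but more importantly the terms b₁ and b₃B must point in the same direction, otherwise they cancel out (this is not easy to achieve, especially after centering, when B_L is necessarily negative and the sign of b₁ can differ from that in the uncentered data). Moreover, a larger moderator value (B_L or B_H) is not always better, because it also enlarges the denominator. Conclusion: t_L and t_H are subject to many opposing influences and can hardly be significant every time. Their significance has no one-to-one correspondence with whether b₃ is significant.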
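The test of a simple slope can be sketched in a few lines. The sketch assumes the standard Aiken and West form of the test, in which the slope b₁ + b₃B is divided by the square root of Var_b₁ + 2B·Cov_b₁b₃ + B²·Var_b₃. All coefficient, variance and covariance values are hypothetical (chosen so that b₁ = 0 and b₃ has t = 2.2, as in the fan-shaped figure), B is assumed to be centered with a standard deviation of 1, and 1.96 is used as a large-sample critical value.

```python
import math


def simple_slope_t(b1, b3, var_b1, var_b3, cov_b1b3, b_value):
    """t statistic for the slope of A when B is held at b_value."""
    slope = b1 + b3 * b_value
    variance = var_b1 + 2.0 * b_value * cov_b1b3 + b_value ** 2 * var_b3
    return slope / math.sqrt(variance)


# Hypothetical estimates from Model 1 (assumed values, for illustration only).
b1, b3 = 0.0, 0.22
var_b1, var_b3, cov_b1b3 = 0.01, 0.01, 0.0

CRITICAL_T = 1.96  # large-sample, two-sided 5% level (assumption)

# t value of the interaction coefficient itself.
t_interaction = b3 / math.sqrt(var_b3)
print(f"t for b3: {t_interaction:.2f}")

# Simple slopes at several moderator values, in standard deviation units.
t_values = {}
for b_value in (-3.0, -1.0, 0.0, 1.0, 3.0):
    t = simple_slope_t(b1, b3, var_b1, var_b3, cov_b1b3, b_value)
    t_values[b_value] = t
    verdict = "significant" if abs(t) > CRITICAL_T else "not significant"
    print(f"B = {b_value:+.0f} SD: t = {t:+.2f} ({verdict})")
```

With these made-up numbers the slope is not significant at ±1 standard deviation (|t| is about 1.56) but is significant at ±3 (|t| is about 2.09), even though the interaction coefficient itself has t = 2.2. A non-zero covariance between b₁ and b₃ would shift these values, which is one reason the two tests need not agree.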

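Finally, you ask how to get journal reviewers to accept your explanation. Yes, some reviewers (if I said “many”, people might think me arrogant, but in fact “some” is not a small number) also equate interaction effects with main effects. So you must not only understand the matter thoroughly yourself but also explain it clearly and accessibly; here formulas and figures are necessary aids. Writing up interaction effects well is indeed a challenge. Read Cohen & Cohen a few more times (for the third edition they invited Aiken and West to join as coauthors).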

2009-08-30
