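Question 1 should be binary logistic regression. I only know how to do that in Minitab... sorry.
I don't know what OR means. count/4000 needs to be expressed as a decimal.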
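For what it's worth, a binary logistic fit can also be done in R with `glm(..., family = binomial)`, and in that setting "OR" most likely stands for odds ratio, i.e. the exponentiated coefficient. The sketch below is only an illustration under assumptions: the data frame `d1`, the predictor `x`, and all the counts are made up, since the actual variables for question 1 are not shown here.

```r
# Hypothetical data: 'count' successes out of 4000 trials at each value of 'x'.
# These numbers are invented for illustration only.
d1 <- data.frame(x = 1:5, count = c(120, 260, 510, 900, 1400))
d1$prop <- d1$count / 4000   # proportion expressed as a decimal

# Logistic regression on (successes, failures)
fit1 <- glm(cbind(count, 4000 - count) ~ x,
            family = binomial(link = "logit"), data = d1)

# Odds ratios are the exponentiated coefficients
or <- exp(coef(fit1))
print(or)
```

The entry of `or` for `x` is the multiplicative change in the odds for a one-unit increase in `x`.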
Question 2
> x1<-count1
> x2<-edu
> x3<-view
> d<-data.frame(x1,x2,x3)
> cor(d)
x1 x2 x3
x1 1.0000000 0.387191 0.1081838
x2 0.3871910 1.000000 0.0000000
x3 0.1081838 0.000000 1.0000000
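x1 and x2 have a weak positive linear relationship (r ≈ 0.39); between x1 and x3 it is extremely weak (r ≈ 0.11).
x2 and x3 would be better treated as categorical predictors.
First, assume x1 and x2 have a linear relationship (edu stays numeric, view becomes a factor):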
> job<-within(job,{view<-factor(view,levels=1:4, labels=c("view1","view2","view3","view4"))})
> m1=glm(count1~view+edu,family=poisson(link=log), data=job)
> summary(m1)
Call:
glm(formula = count1 ~ view + edu, family = poisson(link = log),
data = job)
Deviance Residuals:
Min 1Q Median 3Q Max
-12.390 -4.687 -2.100 4.209 10.019
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.13697 0.10357 20.634 <2e-16 ***
viewview2 1.39829 0.09274 15.078 <2e-16 ***
viewview3 2.13478 0.08782 24.309 <2e-16 ***
viewview4 0.01370 0.11704 0.117 0.907
edu 0.36656 0.01675 21.878 <2e-16 ***
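
Because the model uses a log link, each coefficient is a difference in the log of the expected count, so exponentiating it gives a multiplicative rate ratio. A small sketch using the estimates printed above (the vector names `est` and `rate_ratio` are just for illustration):

```r
# Estimates copied from the Poisson summary above (edu treated as numeric)
est <- c(viewview2 = 1.39829, viewview3 = 2.13478,
         viewview4 = 0.01370, edu = 0.36656)

# With a log link, exp(coefficient) is the rate ratio
rate_ratio <- exp(est)
print(round(rate_ratio, 3))
```

For example, exp(0.36656) is roughly 1.44, so each one-step increase in edu multiplies the expected count by about 1.44 in this model, while view3 has roughly 8.5 times the expected count of view1.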
Then treat both x2 and x3 as categorical predictors:
> job<-within(job,{edu<-factor(edu,levels=1:5, labels=c("edu1","edu2","edu3","edu4","edu5"))})
> m1=glm(count1~view+edu,family=poisson(link=log), data=job)
> summary(m1)
Call:
glm(formula = count1 ~ view + edu, family = poisson(link = log),
data = job)
Deviance Residuals:
Min 1Q Median 3Q Max
-5.8543 -1.5090 -0.4735 1.5840 3.1883
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.03824 0.17553 5.915 3.32e-09 ***
viewview2 1.39829 0.09274 15.078 < 2e-16 ***
viewview3 2.13478 0.08782 24.309 < 2e-16 ***
viewview4 0.01370 0.11704 0.117 0.907
eduedu2 1.33628 0.17550 7.614 2.65e-14 ***
eduedu3 2.74163 0.16113 17.015 < 2e-16 ***
eduedu4 3.02695 0.15991 18.929 < 2e-16 ***
eduedu5 2.34087 0.16352 14.316 < 2e-16 ***
R-Sq      R-Sq(adj)   AIC
96.63%    96.37%      221.45
The R-squared is very good, and the p-values are all very small except for view4 (p = 0.907).
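
Before moving to a negative binomial model, it can help to check whether the Poisson fit is overdispersed. A rough sketch follows, assuming a fitted Poisson `glm`; the data frame `sim` is a simulated stand-in, not the real `job` data. The dispersion statistic is the Pearson chi-square divided by the residual degrees of freedom, and values well above 1 suggest overdispersion. The last line computes one common deviance-based analogue of R-squared; whether that matches the R-Sq figure quoted above is not confirmed.

```r
set.seed(1)
# Simulated stand-in for the job data (hypothetical values)
sim <- expand.grid(view = factor(1:4), edu = factor(1:5))
lambda <- exp(1 + 0.3 * as.integer(sim$view) + 0.4 * as.integer(sim$edu))
sim$count1 <- rpois(nrow(sim), lambda = lambda)

fit <- glm(count1 ~ view + edu, family = poisson(link = "log"), data = sim)

# Pearson dispersion statistic: values near 1 are consistent with Poisson
dispersion <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)

# Deviance-based pseudo R-squared
pseudo_r2 <- 1 - fit$deviance / fit$null.deviance

print(c(dispersion = dispersion, pseudo_r2 = pseudo_r2))
```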
Then compare with negative binomial regression:
> library(MASS)
> m2=glm.nb(count1~view+edu,link=log, data=job)
> summary(m2)
Call:
glm.nb(formula = count1 ~ view + edu, data = job, link = log,
init.theta = 18.36584586)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.5034 -0.6295 -0.1434 0.8038 1.4841
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.0284 0.2552 4.029 5.59e-05 ***
viewview2 1.5430 0.1901 8.116 4.80e-16 ***
viewview3 2.1684 0.1873 11.577 < 2e-16 ***
viewview4 0.1115 0.2062 0.541 0.589
eduedu2 1.2238 0.2600 4.706 2.52e-06 ***
eduedu3 2.7106 0.2460 11.020 < 2e-16 ***
eduedu4 2.9712 0.2449 12.132 < 2e-16 ***
eduedu5 2.2527 0.2486 9.061 < 2e-16 ***
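The negative binomial model fits slightly better, but the coefficients of the two models are very close; with a larger data set they would probably be almost identical.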
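
One way to make this comparison concrete is to put the AIC values and the coefficients of both fits side by side. The sketch below is only an illustration: it uses simulated overdispersed counts in a made-up data frame `sim` rather than the real `job` data, and the object names `pois_fit`, `nb_fit`, `aic_tab`, and `coef_tab` are hypothetical.

```r
library(MASS)  # provides glm.nb

set.seed(2)
# Simulated stand-in data with three replicates per view/edu cell
sim <- expand.grid(view = factor(1:4), edu = factor(1:5), rep = 1:3)
mu <- exp(1 + 0.3 * as.integer(sim$view) + 0.4 * as.integer(sim$edu))
sim$count1 <- rnbinom(nrow(sim), mu = mu, size = 5)  # overdispersed counts

pois_fit <- glm(count1 ~ view + edu, family = poisson(link = "log"), data = sim)
nb_fit   <- glm.nb(count1 ~ view + edu, link = log, data = sim)

# Lower AIC indicates the better trade-off between fit and complexity
aic_tab <- AIC(pois_fit, nb_fit)
print(aic_tab)

# Coefficients side by side
coef_tab <- cbind(poisson = coef(pois_fit), negbin = coef(nb_fit))
print(round(coef_tab, 3))
```

When the counts are overdispersed, the point estimates tend to stay similar while the negative binomial standard errors come out larger, which is the pattern visible in the two summaries above.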