Views: 4792 | Replies: 1

Exploratory Multivariate Analysis by Example Using R (English edition) download

Posted on 2013-2-15 17:39:46
Exploratory Multivariate Analysis by Example Using R.pdf
Table of Contents
1 Principal Component Analysis (PCA) 1
1.1 Data | Notation | Examples . . . . . . . . . . . . . . . . . 1
1.2 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2.1 Studying Individuals . . . . . . . . . . . . . . . . . . . 2
1.2.2 Studying Variables . . . . . . . . . . . . . . . . . . . . 3
1.2.3 Relationships between the Two Studies . . . . . . . . 5
1.3 Studying Individuals . . . . . . . . . . . . . . . . . . . . . . 5
1.3.1 The Cloud of Individuals . . . . . . . . . . . . . . . . 5
1.3.2 Fitting the Cloud of Individuals . . . . . . . . . . . . 7
1.3.2.1 Best Plane Representation of N_I . . . . . . 7
1.3.2.2 Sequence of Axes for Representing N_I . . . 9
1.3.2.3 How Are the Components Obtained? . . . . 10
1.3.2.4 Example . . . . . . . . . . . . . . . . . . . . 10
1.3.3 Representation of the Variables as an Aid for
Interpreting the Cloud of Individuals . . . . . . . . . . 11
1.4 Studying Variables . . . . . . . . . . . . . . . . . . . . . . . . 13
1.4.1 The Cloud of Variables . . . . . . . . . . . . . . . . . 13
1.4.2 Fitting the Cloud of Variables . . . . . . . . . . . . . . 14
1.5 Relationships between the Two Representations N_I and N_K 16
1.6 Interpreting the Data . . . . . . . . . . . . . . . . . . . . . . 17
1.6.1 Numerical Indicators . . . . . . . . . . . . . . . . . . . 17
1.6.1.1 Percentage of Inertia Associated with a
Component . . . . . . . . . . . . . . . . . . . 17
1.6.1.2 Quality of Representation of an Individual or
Variable . . . . . . . . . . . . . . . . . . . . . 18
1.6.1.3 Detecting Outliers . . . . . . . . . . . . . . . 19
1.6.1.4 Contribution of an Individual or Variable to
the Construction of a Component . . . . . . 19
1.6.2 Supplementary Elements . . . . . . . . . . . . . . . . . 20
1.6.2.1 Representing Supplementary Quantitative
Variables . . . . . . . . . . . . . . . . . . . . 21
1.6.2.2 Representing Supplementary Categorical
Variables . . . . . . . . . . . . . . . . . . . . 22
1.6.2.3 Representing Supplementary Individuals . . 23
1.6.3 Automatic Description of the Components . . . . . . . 24
1.7 Implementation with FactoMineR . . . . . . . . . . . . . . . 25
1.8 Additional Results . . . . . . . . . . . . . . . . . . . . . . . . 26
1.8.1 Testing the Significance of the Components . . . . . . 26
1.8.2 Variables: Loadings versus Correlations . . . . . . . . 27
1.8.3 Simultaneous Representation: Biplots . . . . . . . . . 27
1.8.4 Missing Values . . . . . . . . . . . . . . . . . . . . . . 28
1.8.5 Large Datasets . . . . . . . . . . . . . . . . . . . . . . 28
1.8.6 Varimax Rotation . . . . . . . . . . . . . . . . . . . . 28
1.9 Example: The Decathlon Dataset . . . . . . . . . . . . . . . 29
1.9.1 Data Description | Issues . . . . . . . . . . . . . . . 29
1.9.2 Analysis Parameters . . . . . . . . . . . . . . . . . . . 31
1.9.2.1 Choice of Active Elements . . . . . . . . . . 31
1.9.2.2 Should the Variables Be Standardised? . . . 31
1.9.3 Implementation of the Analysis . . . . . . . . . . . . . 31
1.9.3.1 Choosing the Number of Dimensions to
Examine . . . . . . . . . . . . . . . . . . . . 32
1.9.3.2 Studying the Cloud of Individuals . . . . . . 33
1.9.3.3 Studying the Cloud of Variables . . . . . . . 36
1.9.3.4 Joint Analysis of the Cloud of Individuals and
the Cloud of Variables . . . . . . . . . . . . . 39
1.9.3.5 Comments on the Data . . . . . . . . . . . . 43
1.10 Example: The Temperature Dataset . . . . . . . . . . . . . . 44
1.10.1 Data Description | Issues . . . . . . . . . . . . . . . 44
1.10.2 Analysis Parameters . . . . . . . . . . . . . . . . . . . 44
1.10.2.1 Choice of Active Elements . . . . . . . . . . 44
1.10.2.2 Should the Variables Be Standardised? . . . 45
1.10.3 Implementation of the Analysis . . . . . . . . . . . . . 46
1.11 Example of Genomic Data: The Chicken Dataset . . . . . . 51
1.11.1 Data Description | Issues . . . . . . . . . . . . . . . 51
1.11.2 Analysis Parameters . . . . . . . . . . . . . . . . . . . 52
1.11.3 Implementation of the Analysis . . . . . . . . . . . . . 52
2 Correspondence Analysis (CA) 59
2.1 Data | Notation | Examples . . . . . . . . . . . . . . . . . 59
2.2 Objectives and the Independence Model . . . . . . . . . . . . 61
2.2.1 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.2.2 Independence Model and χ² Test . . . . . . . . . . . . 62
2.2.3 The Independence Model and CA . . . . . . . . . . . 64
2.3 Fitting the Clouds . . . . . . . . . . . . . . . . . . . . . . . . 65
2.3.1 Clouds of Row Profiles . . . . . . . . . . . . . . . . . 65
2.3.2 Clouds of Column Profiles . . . . . . . . . . . . . . . . 66
2.3.3 Fitting Clouds N_I and N_J . . . . . . . . . . . . . . . 68
2.3.4 Example: Women's Attitudes to Women's Work in France
in 1970 . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.3.4.1 Column Representation (Mother's Activity) . 70
2.3.4.2 Row Representation (Partner's Work) . . . . 72
2.3.5 Superimposed Representation of Both Rows and
Columns . . . . . . . . . . . . . . . . . . . . . . . . . . 72
2.4 Interpreting the Data . . . . . . . . . . . . . . . . . . . . . . 77
2.4.1 Inertias Associated with the Dimensions (Eigenvalues) 77
2.4.2 Contribution of Points to a Dimension's Inertia . . . . 80
2.4.3 Representation Quality of Points on a Dimension or
Plane . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
2.4.4 Distance and Inertia in the Initial Space . . . . . . . . 82
2.5 Supplementary Elements (= Illustrative) . . . . . . . . . . . 83
2.6 Implementation with FactoMineR . . . . . . . . . . . . . . . 86
2.7 CA and Textual Data Processing . . . . . . . . . . . . . . . . 88
2.8 Example: The Olympic Games Dataset . . . . . . . . . . . . 92
2.8.1 Data Description | Issues . . . . . . . . . . . . . . . 92
2.8.2 Implementation of the Analysis . . . . . . . . . . . . . 94
2.8.2.1 Choosing the Number of Dimensions to
Examine . . . . . . . . . . . . . . . . . . . . 95
2.8.2.2 Studying the Superimposed Representation . 96
2.8.2.3 Interpreting the Results . . . . . . . . . . . . 96
2.8.2.4 Comments on the Data . . . . . . . . . . . . 100
2.9 Example: The White Wines Dataset . . . . . . . . . . . . . . 101
2.9.1 Data Description | Issues . . . . . . . . . . . . . . . 101
2.9.2 Margins . . . . . . . . . . . . . . . . . . . . . . . . . . 104
2.9.3 Inertia . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
2.9.4 Representation on the First Plane . . . . . . . . . . . 106
2.10 Example: The Causes of Mortality Dataset . . . . . . . . . . 109
2.10.1 Data Description | Issues . . . . . . . . . . . . . . . 109
2.10.2 Margins . . . . . . . . . . . . . . . . . . . . . . . . . . 111
2.10.3 Inertia . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
2.10.4 First Dimension . . . . . . . . . . . . . . . . . . . . . 115
2.10.5 Plane 2-3 . . . . . . . . . . . . . . . . . . . . . . . . . 117
2.10.6 Projecting the Supplementary Elements . . . . . . . . 121
2.10.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . 125
3 Multiple Correspondence Analysis (MCA) 127
3.1 Data | Notation | Examples . . . . . . . . . . . . . . . . . 127
3.2 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
3.2.1 Studying Individuals . . . . . . . . . . . . . . . . . . . 128
3.2.2 Studying the Variables and Categories . . . . . . . . . 129
3.3 Defining Distances between Individuals and Distances between
Categories . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
3.3.1 Distances between the Individuals . . . . . . . . . . . 130
3.3.2 Distances between the Categories . . . . . . . . . . . . 130
3.4 CA on the Indicator Matrix . . . . . . . . . . . . . . . . . . 132
3.4.1 Relationship between MCA and CA . . . . . . . . . . 132
3.4.2 The Cloud of Individuals . . . . . . . . . . . . . . . . 133
3.4.3 The Cloud of Variables . . . . . . . . . . . . . . . . . 134
3.4.4 The Cloud of Categories . . . . . . . . . . . . . . . . . 135
3.4.5 Transition Relations . . . . . . . . . . . . . . . . . . . 138
3.5 Interpreting the Data . . . . . . . . . . . . . . . . . . . . . . 140
3.5.1 Numerical Indicators . . . . . . . . . . . . . . . . . . . 140
3.5.1.1 Percentage of Inertia Associated with a
Component . . . . . . . . . . . . . . . . . . . 140
3.5.1.2 Contribution and Representation Quality of
an Individual or Category . . . . . . . . . . . 141
3.5.2 Supplementary Elements . . . . . . . . . . . . . . . . . 142
3.5.3 Automatic Description of the Components . . . . . . . 143
3.6 Implementation with FactoMineR . . . . . . . . . . . . . . . 145
3.7 Addendum . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
3.7.1 Analysing a Survey . . . . . . . . . . . . . . . . . . . . 148
3.7.1.1 Designing a Questionnaire: Choice of Format 148
3.7.1.2 Accounting for Rare Categories . . . . . . . . 150
3.7.2 Description of a Categorical Variable or a
Subpopulation . . . . . . . . . . . . . . . . . . . . . . 150
3.7.2.1 Description of a Categorical Variable by a
Categorical Variable . . . . . . . . . . . . . . 150
3.7.2.2 Description of a Subpopulation (or a
Category) by a Quantitative Variable . . . . 151
3.7.2.3 Description of a Subpopulation (or a
Category) by the Categories of a Categorical
Variable . . . . . . . . . . . . . . . . . . . . . 152
3.7.3 The Burt Table . . . . . . . . . . . . . . . . . . . . . . 154
3.8 Example: The Survey on the Perception of Genetically
Modified Organisms . . . . . . . . . . . . . . . . . . . . . . 155
3.8.1 Data Description | Issues . . . . . . . . . . . . . . . 155
3.8.2 Analysis Parameters and Implementation with
FactoMineR . . . . . . . . . . . . . . . . . . . . . . . . 158
3.8.3 Analysing the First Plane . . . . . . . . . . . . . . . . 159
3.8.4 Projection of Supplementary Variables . . . . . . . . . 160
3.8.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . 162
3.9 Example: The Sorting Task Dataset . . . . . . . . . . . . . . 162
3.9.1 Data Description | Issues . . . . . . . . . . . . . . . 162
3.9.2 Analysis Parameters . . . . . . . . . . . . . . . . . . . 164
3.9.3 Representation of Individuals on the First Plane . . . 164
3.9.4 Representation of Categories . . . . . . . . . . . . . . 165
3.9.5 Representation of the Variables . . . . . . . . . . . . . 166
4 Clustering 169
4.1 Data | Issues . . . . . . . . . . . . . . . . . . . . . . . . . . 169
4.2 Formalising the Notion of Similarity . . . . . . . . . . . . . . 173
4.2.1 Similarity between Individuals . . . . . . . . . . . . . 173
4.2.1.1 Distances and Euclidean Distances . . . . . . 173
4.2.1.2 Example of Non-Euclidean Distance . . . . . 174
4.2.1.3 Other Euclidean Distances . . . . . . . . . . 175
4.2.1.4 Similarities and Dissimilarities . . . . . . . . 175
4.2.2 Similarity between Groups of Individuals . . . . . . . 176
4.3 Constructing an Indexed Hierarchy . . . . . . . . . . . . . . 177
4.3.1 Classic Agglomerative Algorithm . . . . . . . . . . . . 177
4.3.2 Hierarchy and Partitions . . . . . . . . . . . . . . . . . 179
4.4 Ward's Method . . . . . . . . . . . . . . . . . . . . . . . . . 179
4.4.1 Partition Quality . . . . . . . . . . . . . . . . . . . . . 180
4.4.2 Agglomeration According to Inertia . . . . . . . . . . 181
4.4.3 Two Properties of the Agglomeration Criterion . . . . 183
4.4.4 Analysing Hierarchies, Choosing Partitions . . . . . . 184
4.5 Direct Search for Partitions: K-means Algorithm . . . . . . . 185
4.5.1 Data | Issues . . . . . . . . . . . . . . . . . . . . . . 185
4.5.2 Principle . . . . . . . . . . . . . . . . . . . . . . . . . 186
4.5.3 Methodology . . . . . . . . . . . . . . . . . . . . . . . 187
4.6 Partitioning and Hierarchical Clustering . . . . . . . . . . . . 187
4.6.1 Consolidating Partitions . . . . . . . . . . . . . . . . . 188
4.6.2 Mixed Algorithm . . . . . . . . . . . . . . . . . . . . . 188
4.7 Clustering and Principal Component Methods . . . . . . . . 188
4.7.1 Principal Component Methods Prior to AHC . . . . . 189
4.7.2 Simultaneous Analysis of a Principal Component Map
and Hierarchy . . . . . . . . . . . . . . . . . . . . . . . 189
4.8 Example: The Temperature Dataset . . . . . . . . . . . . . . 190
4.8.1 Data Description | Issues . . . . . . . . . . . . . . . 190
4.8.2 Analysis Parameters . . . . . . . . . . . . . . . . . . . 190
4.8.3 Implementation of the Analysis . . . . . . . . . . . . . 191
4.9 Example: The Tea Dataset . . . . . . . . . . . . . . . . . . . 197
4.9.1 Data Description | Issues . . . . . . . . . . . . . . . 197
4.9.2 Constructing the AHC . . . . . . . . . . . . . . . . . . 197
4.9.3 Defining the Clusters . . . . . . . . . . . . . . . . . . 199
4.10 Dividing Quantitative Variables into Classes . . . . . . . . . 202
Appendix 205
A.1 Percentage of Inertia Explained by the First Component or by
the First Plane . . . . . . . . . . . . . . . . . . . . . . . . . . 205
A.2 R Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
A.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . 210
A.2.2 The Rcmdr Package . . . . . . . . . . . . . . . . . . . 214
A.2.3 The FactoMineR Package . . . . . . . . . . . . . . . . 216
Bibliography of Software Packages 221
Bibliography 223
Index 225


Download:
Exploratory_Multivariate_Analysis_by_Example_Using_R.rar (8.6 MB, downloads: 0, price: 5)

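Note:
Many people have a habit of collecting piles of material and never reading it. To make good use of resources and encourage reading each book you download, this download requires forum credits. Thank you for understanding.
Take part in forum activities and help others, and you will easily earn enough credits!
Enjoy!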

Posted on 2013-4-23 23:41:27
Not enough money, ugh...