We’ll use the Old Faithful data that comes with R, type just “faithful” and hit enter to see the raw data
faithful
## eruptions waiting
## 1 3.600 79
## 2 1.800 54
## 3 3.333 74
## 4 2.283 62
## 5 4.533 85
## 6 2.883 55
## 7 4.700 88
## 8 3.600 85
## 9 1.950 51
## 10 4.350 85
## 11 1.833 54
## 12 3.917 84
## 13 4.200 78
## 14 1.750 47
## 15 4.700 83
## 16 2.167 52
## 17 1.750 62
## 18 4.800 84
## 19 1.600 52
## 20 4.250 79
## 21 1.800 51
## 22 1.750 47
## 23 3.450 78
## 24 3.067 69
## 25 4.533 74
## 26 3.600 83
## 27 1.967 55
## 28 4.083 76
## 29 3.850 78
## 30 4.433 79
## 31 4.300 73
## 32 4.467 77
## 33 3.367 66
## 34 4.033 80
## 35 3.833 74
## 36 2.017 52
## 37 1.867 48
## 38 4.833 80
## 39 1.833 59
## 40 4.783 90
## 41 4.350 80
## 42 1.883 58
## 43 4.567 84
## 44 1.750 58
## 45 4.533 73
## 46 3.317 83
## 47 3.833 64
## 48 2.100 53
## 49 4.633 82
## 50 2.000 59
## 51 4.800 75
## 52 4.716 90
## 53 1.833 54
## 54 4.833 80
## 55 1.733 54
## 56 4.883 83
## 57 3.717 71
## 58 1.667 64
## 59 4.567 77
## 60 4.317 81
## 61 2.233 59
## 62 4.500 84
## 63 1.750 48
## 64 4.800 82
## 65 1.817 60
## 66 4.400 92
## 67 4.167 78
## 68 4.700 78
## 69 2.067 65
## 70 4.700 73
## 71 4.033 82
## 72 1.967 56
## 73 4.500 79
## 74 4.000 71
## 75 1.983 62
## 76 5.067 76
## 77 2.017 60
## 78 4.567 78
## 79 3.883 76
## 80 3.600 83
## 81 4.133 75
## 82 4.333 82
## 83 4.100 70
## 84 2.633 65
## 85 4.067 73
## 86 4.933 88
## 87 3.950 76
## 88 4.517 80
## 89 2.167 48
## 90 4.000 86
## 91 2.200 60
## 92 4.333 90
## 93 1.867 50
## 94 4.817 78
## 95 1.833 63
## 96 4.300 72
## 97 4.667 84
## 98 3.750 75
## 99 1.867 51
## 100 4.900 82
## 101 2.483 62
## 102 4.367 88
## 103 2.100 49
## 104 4.500 83
## 105 4.050 81
## 106 1.867 47
## 107 4.700 84
## 108 1.783 52
## 109 4.850 86
## 110 3.683 81
## 111 4.733 75
## 112 2.300 59
## 113 4.900 89
## 114 4.417 79
## 115 1.700 59
## 116 4.633 81
## 117 2.317 50
## 118 4.600 85
## 119 1.817 59
## 120 4.417 87
## 121 2.617 53
## 122 4.067 69
## 123 4.250 77
## 124 1.967 56
## 125 4.600 88
## 126 3.767 81
## 127 1.917 45
## 128 4.500 82
## 129 2.267 55
## 130 4.650 90
## 131 1.867 45
## 132 4.167 83
## 133 2.800 56
## 134 4.333 89
## 135 1.833 46
## 136 4.383 82
## 137 1.883 51
## 138 4.933 86
## 139 2.033 53
## 140 3.733 79
## 141 4.233 81
## 142 2.233 60
## 143 4.533 82
## 144 4.817 77
## 145 4.333 76
## 146 1.983 59
## 147 4.633 80
## 148 2.017 49
## 149 5.100 96
## 150 1.800 53
## 151 5.033 77
## 152 4.000 77
## 153 2.400 65
## 154 4.600 81
## 155 3.567 71
## 156 4.000 70
## 157 4.500 81
## 158 4.083 93
## 159 1.800 53
## 160 3.967 89
## 161 2.200 45
## 162 4.150 86
## 163 2.000 58
## 164 3.833 78
## 165 3.500 66
## 166 4.583 76
## 167 2.367 63
## 168 5.000 88
## 169 1.933 52
## 170 4.617 93
## 171 1.917 49
## 172 2.083 57
## 173 4.583 77
## 174 3.333 68
## 175 4.167 81
## 176 4.333 81
## 177 4.500 73
## 178 2.417 50
## 179 4.000 85
## 180 4.167 74
## 181 1.883 55
## 182 4.583 77
## 183 4.250 83
## 184 3.767 83
## 185 2.033 51
## 186 4.433 78
## 187 4.083 84
## 188 1.833 46
## 189 4.417 83
## 190 2.183 55
## 191 4.800 81
## 192 1.833 57
## 193 4.800 76
## 194 4.100 84
## 195 3.966 77
## 196 4.233 81
## 197 3.500 87
## 198 4.366 77
## 199 2.250 51
## 200 4.667 78
## 201 2.100 60
## 202 4.350 82
## 203 4.133 91
## 204 1.867 53
## 205 4.600 78
## 206 1.783 46
## 207 4.367 77
## 208 3.850 84
## 209 1.933 49
## 210 4.500 83
## 211 2.383 71
## 212 4.700 80
## 213 1.867 49
## 214 3.833 75
## 215 3.417 64
## 216 4.233 76
## 217 2.400 53
## 218 4.800 94
## 219 2.000 55
## 220 4.150 76
## 221 1.867 50
## 222 4.267 82
## 223 1.750 54
## 224 4.483 75
## 225 4.000 78
## 226 4.117 79
## 227 4.083 78
## 228 4.267 78
## 229 3.917 70
## 230 4.550 79
## 231 4.083 70
## 232 2.417 54
## 233 4.183 86
## 234 2.217 50
## 235 4.450 90
## 236 1.883 54
## 237 1.850 54
## 238 4.283 77
## 239 3.950 79
## 240 2.333 64
## 241 4.150 75
## 242 2.350 47
## 243 4.933 86
## 244 2.900 63
## 245 4.583 85
## 246 3.833 82
## 247 2.083 57
## 248 4.367 82
## 249 2.133 67
## 250 4.350 74
## 251 2.200 54
## 252 4.450 83
## 253 3.567 73
## 254 4.500 73
## 255 4.150 88
## 256 3.817 80
## 257 3.917 71
## 258 4.450 83
## 259 2.000 56
## 260 4.283 79
## 261 4.767 78
## 262 4.533 84
## 263 1.850 58
## 264 4.250 83
## 265 1.983 43
## 266 2.250 60
## 267 4.750 75
## 268 4.117 81
## 269 2.150 46
## 270 4.417 90
## 271 1.817 46
## 272 4.467 74
To get information on this data, see the help file
?faithful
Notice this is a “data frame” in R, which just means a table of data with named columns. To retrieve a single column, use the “$” operator:
faithful$eruptions
## [1] 3.600 1.800 3.333 2.283 4.533 2.883 4.700 3.600 1.950 4.350 1.833 3.917
## [13] 4.200 1.750 4.700 2.167 1.750 4.800 1.600 4.250 1.800 1.750 3.450 3.067
## [25] 4.533 3.600 1.967 4.083 3.850 4.433 4.300 4.467 3.367 4.033 3.833 2.017
## [37] 1.867 4.833 1.833 4.783 4.350 1.883 4.567 1.750 4.533 3.317 3.833 2.100
## [49] 4.633 2.000 4.800 4.716 1.833 4.833 1.733 4.883 3.717 1.667 4.567 4.317
## [61] 2.233 4.500 1.750 4.800 1.817 4.400 4.167 4.700 2.067 4.700 4.033 1.967
## [73] 4.500 4.000 1.983 5.067 2.017 4.567 3.883 3.600 4.133 4.333 4.100 2.633
## [85] 4.067 4.933 3.950 4.517 2.167 4.000 2.200 4.333 1.867 4.817 1.833 4.300
## [97] 4.667 3.750 1.867 4.900 2.483 4.367 2.100 4.500 4.050 1.867 4.700 1.783
## [109] 4.850 3.683 4.733 2.300 4.900 4.417 1.700 4.633 2.317 4.600 1.817 4.417
## [121] 2.617 4.067 4.250 1.967 4.600 3.767 1.917 4.500 2.267 4.650 1.867 4.167
## [133] 2.800 4.333 1.833 4.383 1.883 4.933 2.033 3.733 4.233 2.233 4.533 4.817
## [145] 4.333 1.983 4.633 2.017 5.100 1.800 5.033 4.000 2.400 4.600 3.567 4.000
## [157] 4.500 4.083 1.800 3.967 2.200 4.150 2.000 3.833 3.500 4.583 2.367 5.000
## [169] 1.933 4.617 1.917 2.083 4.583 3.333 4.167 4.333 4.500 2.417 4.000 4.167
## [181] 1.883 4.583 4.250 3.767 2.033 4.433 4.083 1.833 4.417 2.183 4.800 1.833
## [193] 4.800 4.100 3.966 4.233 3.500 4.366 2.250 4.667 2.100 4.350 4.133 1.867
## [205] 4.600 1.783 4.367 3.850 1.933 4.500 2.383 4.700 1.867 3.833 3.417 4.233
## [217] 2.400 4.800 2.000 4.150 1.867 4.267 1.750 4.483 4.000 4.117 4.083 4.267
## [229] 3.917 4.550 4.083 2.417 4.183 2.217 4.450 1.883 1.850 4.283 3.950 2.333
## [241] 4.150 2.350 4.933 2.900 4.583 3.833 2.083 4.367 2.133 4.350 2.200 4.450
## [253] 3.567 4.500 4.150 3.817 3.917 4.450 2.000 4.283 4.767 4.533 1.850 4.250
## [265] 1.983 2.250 4.750 4.117 2.150 4.417 1.817 4.467
faithful$waiting
## [1] 79 54 74 62 85 55 88 85 51 85 54 84 78 47 83 52 62 84 52 79 51 47 78 69 74
## [26] 83 55 76 78 79 73 77 66 80 74 52 48 80 59 90 80 58 84 58 73 83 64 53 82 59
## [51] 75 90 54 80 54 83 71 64 77 81 59 84 48 82 60 92 78 78 65 73 82 56 79 71 62
## [76] 76 60 78 76 83 75 82 70 65 73 88 76 80 48 86 60 90 50 78 63 72 84 75 51 82
## [101] 62 88 49 83 81 47 84 52 86 81 75 59 89 79 59 81 50 85 59 87 53 69 77 56 88
## [126] 81 45 82 55 90 45 83 56 89 46 82 51 86 53 79 81 60 82 77 76 59 80 49 96 53
## [151] 77 77 65 81 71 70 81 93 53 89 45 86 58 78 66 76 63 88 52 93 49 57 77 68 81
## [176] 81 73 50 85 74 55 77 83 83 51 78 84 46 83 55 81 57 76 84 77 81 87 77 51 78
## [201] 60 82 91 53 78 46 77 84 49 83 71 80 49 75 64 76 53 94 55 76 50 82 54 75 78
## [226] 79 78 78 70 79 70 54 86 50 90 54 54 77 79 64 75 47 86 63 85 82 57 82 67 74
## [251] 54 83 73 73 88 80 71 83 56 79 78 84 58 83 43 60 75 81 46 90 46 74
The “summary” command in R will give you some basic statistics
summary(faithful)
## eruptions waiting
## Min. :1.600 Min. :43.0
## 1st Qu.:2.163 1st Qu.:58.0
## Median :4.000 Median :76.0
## Mean :3.488 Mean :70.9
## 3rd Qu.:4.454 3rd Qu.:82.0
## Max. :5.100 Max. :96.0
The usual sample statistics are also commands:
mean(faithful$eruptions) # mean
## [1] 3.487783
median(faithful$eruptions) # median
## [1] 4
var(faithful$eruptions) # variance
## [1] 1.302728
sd(faithful$eruptions) # standard deviation
## [1] 1.141371
We can look at joint statistics such as sample covariance and correlation also
cov(faithful$waiting, faithful$eruptions)
## [1] 13.97781
cor(faithful$waiting, faithful$eruptions)
## [1] 0.9008112
As a first visualization, we could look at the histogram
hist(faithful$eruptions, main="Histogram of Old Faithful Eruption Time")
The empirical cdf is plotted using the “ecdf” command
plot(ecdf(faithful$eruptions), cex.points=0.5, main="Empirical CDF for Old Faithful Eruption Time")
Here’s how to do box plots:
boxplot(faithful$eruptions, main="Box Plot of Old Faithful Eruption Time")
Scatter plots are useful for looking at two random variables, X, Y, and their relationships
plot(faithful$eruptions, faithful$waiting, main="Scatter Plot of Old Faithful Waiting vs Eruption Times")
This example is taken from Wikipedia. It is a box plot of the Michelson-Morley speed of light experiments
morley$Expt <- factor(morley$Expt)
par(las=1, mar=c(5.1, 5.1, 2.1, 2.1))
boxplot(Speed ~ Expt, morley, xlab = "Experiment No.",
ylab="Speed of light (km/s minus 299,000)")
abline(h=792.458, col="red")
text(3,792.458,"true\nspeed")
Another nice example of a box plot is at the bottom of the help page for boxplot
?boxplot