Math 3080-1                TERM PROJECT                Treibergs
Due Thursday, April 28, 2005

Instructions

You are to work alone on this project, using only your texts and notes. You may ask either Mike Peterson or me questions, but no one else. In answering each question, you should provide a typewritten response with references to statistical output that you will attach to the back of your report. Be sure to number the pages and refer to the appropriate page number. The response to each question should conclude with a brief summary of your findings, described in the context of the practical situation and written for an audience of non-statisticians.

Questions on the Earthquake Data

Data from S. Kouratkis, G. Kouratkis, et al., Seismic hazard in Greece based on different strong ground motion parameters, Journal of Earthquake Engineering, 2002; as described by Navidi, McGraw-Hill, 2006.

The researchers were interested in "strong ground motion", which is the length of time that the acceleration of the ground exceeds a specified value. For each earth tremor event, the strong ground motion was timed at one or more locations. The table records 121 measurements. The variables are:

y = the duration in seconds that the ground's acceleration exceeded twice the acceleration due to gravity;
m = the magnitude of the earthquake;
d = the distance in kilometers from the measurement site to the epicenter;
s1, s2 = two indicator variables that represent the soil type.

s1 = 1 if the soil consists of soft alluvial deposits, and s1 = 0 otherwise. s2 = 1 if the soil consists of tertiary or older rock, and s2 = 0 otherwise. Cases where s1 = s2 = 0 correspond to intermediate soil conditions.

1. Using the earthquake data, develop a multiple linear regression model to predict the duration y from all of the variables m, d, s1 and s2. Be sure to consider transformations of the variables as well as powers of the variables and interactions between the independent variables. Are there any variables you would delete from the model?
If so, explain why. Remove the variables you do not find useful and give the best model you would recommend. Give the reasons for your choice. Discuss the usefulness of your model and discuss whether the model assumptions are reasonably satisfied.

EARTHQUAKE DATA

obs     y   m   d s1 s2    obs     y   m   d s1 s2    obs     y   m   d s1 s2
  1  8.82 6.4  30  1  0     42  4.31 5.3   6  0  0     82  5.74 5.6  15  0  0
  2  4.08 5.2   7  0  0     43 28.27 6.6  31  1  0     83  5.13 6.9 128  1  0
  3  15.9 6.9 105  1  0     44 17.94 6.9  33  0  0     84   3.2 5.1  13  0  0
  4  6.04 5.8  15  0  0     45   3.6 5.4   6  0  0     85  7.29 5.2  19  1  0
  5  0.15 4.9  16  1  0     46  7.98 5.3  12  1  0     86  0.02 6.2  68  1  0
  6  5.06 6.2  75  1  0     47 16.23 6.2  13  0  0     87  7.03 5.4  10  0  0
  7  0.01 6.6 119  0  1     48  3.67 6.6  85  1  0     88  2.17 5.1  45  0  1
  8  4.13 5.1  10  1  0     49  6.44 5.2  21  0  0     89  4.27 5.2  18  1  0
  9  0.02 5.3  22  0  1     50 10.45 5.3  11  0  1     90  2.25 4.8  14  0  1
 10  2.14 4.5  12  0  1     51  8.32 5.5  22  1  0     91   3.1 5.5  15  0  0
 11  4.41 5.2  17  0  0     52  5.43 5.2  49  0  1     92  6.18 5.2  13  0  0
 12 17.19 5.9   9  0  0     53  4.78 5.5   1  0  0     93  4.56 5.5   1  0  0
 13  5.14 5.5  10  1  0     54  2.82 5.5  20  0  1     94  0.94   5   6  0  1
 14  0.05 4.9  14  1  0     55  3.51 5.7  22  0  0     95  2.85 4.6  21  1  0
 15    20 5.8  16  1  0     56 13.92 5.8  34  1  0     96  4.21 4.7  20  1  0
 16 12.04 6.1  31  0  0     57  3.96 6.1  44  0  0     97  1.93 5.7  39  1  0
 17  0.87   5  65  1  0     58  6.91 5.4  16  0  0     98  1.56   5  44  1  0
 18  0.62 4.8  11  1  0     59  5.63 5.3   6  1  0     99  5.03 5.1   2  1  0
 19   8.1 5.4  12  1  0     60   0.1 5.2  21  1  0    100  0.51 4.9  14  1  0
 20   1.3 5.8  34  1  0     61   5.1 4.8  16  1  0    101 13.14 5.6   5  1  0
 21 11.92 5.6   5  0  0     62 16.52 5.5  15  1  0    102  8.16 5.5  12  1  0
 22  3.93 5.7  65  1  0     63 19.84 5.7  50  1  0    103 10.04 5.1  28  1  0
 23     2 5.4  27  0  1     64  1.65 5.4  27  1  0    104  0.79 5.4  35  0  0
 24  0.43 5.4  31  0  1     65  1.75 5.4  30  0  1    105  0.02 5.4  32  1  0
 25 14.22 6.5  20  0  1     66  6.37 6.5  90  1  0    106   0.1 6.5  61  0  1
 26  0.06 6.5  72  0  1     67  2.78 4.9   8  0  0    107  5.43 5.2   9  0  0
 27  1.48 5.2  27  0  0     68  2.14 5.2  22  0  0    108  0.81 4.6   9  0  0
 28  3.27 5.1  12  0  0     69  0.92 5.2  29  0  0    109  0.73 5.2  22  0  0
 29  6.36 5.2  14  0  0     70  3.18 4.8  15  0  0    110 11.18   5   8  0  0
 30  0.18   5  19  0  0     71   1.2   5  19  0  0    111  2.54 4.5   6  0  0
 31  0.31 4.5  12  0  0     72  4.37 4.7   5  0  0    112  1.55 4.7  13  0  1
 32   1.9 4.7  12  0  0     73  1.02   5  14  0  0    113  0.01 4.5  17  0  0
 33  0.29 4.7   5  1  0     74  0.71 4.8   4  1  0    114  0.21 4.8   5  0  1
 34  6.26 6.3   9  1  0     75  4.27 6.3   9  0  1    115  0.04 4.5   3  1  0
 35  3.44 5.4   4  1  0     76  3.25 5.4   4  0  1    116  0.01 4.5   1  1  0
 36  2.32 5.4   5  1  0     77   0.9 4.7   4  1  0    117  1.19 4.7   3  1  0
 37  1.49   5   4  1  0     78  0.37   5   4  0  1    118  2.66 5.4   1  1  0
 38  2.85 5.4   1  0  1     79 21.07 6.4  78  0  1    119  7.47 6.4 104  0  0
 39  0.01 6.4  86  0  1     80  0.04 6.4 105  0  1    120 30.45 6.6  51  1  0
 40  9.34 6.6 116  0  1     81  15.3 6.6  82  0  1    121 12.78 6.6  65  1  0
 41 10.47 6.6 117  0  0

Questions on the Traffic Data

Data from Huber, Transportation Research Board, National Research Council, Washington D.C., 1957, as described by Sen and Srivastava, Springer, 1990.

The more cars there are on a road, the slower the traffic moves. The transportation planner needs to know how speed depends on density in order to predict travel times for future highways. The data give DENSITY, in vehicles per mile, and SPEED, in miles per hour.

2. Develop a simple regression model to describe the density of traffic as a function of the speed. Test the assumptions of your model and summarize your findings. What does the model predict as the mean density when the traffic is flowing at 25 mph?

3. If y = density of traffic and x = speed in mph, examine at least two transformations of the (x,y) data that might improve your model and discuss your findings. Choose a final model that you think best describes the data and explain your choice. What does your model now predict as the mean density when the traffic is flowing at 25 mph?

Traffic Data

DENSITY  SPEED
   20.4   38.8
   27.4   31.5
  106.2   10.6
   80.4   16.1
  141.3    7.7
  130.9    8.3
  121.7    8.5
  106.5   11.1
  130.5    8.6
  101.1   11.1
  123.9    9.8
  144.2    7.8
   29.5   31.8
   30.8   31.6
   26.5   34.0
   35.7   28.9
   30.0   28.8
  106.2   10.5
   97.0   12.3
   90.1   13.2
  106.7   11.4
   99.3   11.2
  107.2   10.3
  109.1   11.4
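
For readers who want to see the mechanics behind Questions 2 and 3, the following is a minimal sketch in plain Python, assuming no statistical package is available. The names TRAFFIC and fit_line are illustrative and not part of the assignment, and the log transformation is only one candidate among the transformations Question 3 asks you to examine. The sketch fits the least squares line and evaluates it at 25 mph; it does not check the model assumptions or produce the statistical output the report requires.

```python
import math

# (DENSITY, SPEED) pairs transcribed from the Traffic Data table.
TRAFFIC = [
    (20.4, 38.8), (27.4, 31.5), (106.2, 10.6), (80.4, 16.1),
    (141.3, 7.7), (130.9, 8.3), (121.7, 8.5), (106.5, 11.1),
    (130.5, 8.6), (101.1, 11.1), (123.9, 9.8), (144.2, 7.8),
    (29.5, 31.8), (30.8, 31.6), (26.5, 34.0), (35.7, 28.9),
    (30.0, 28.8), (106.2, 10.5), (97.0, 12.3), (90.1, 13.2),
    (106.7, 11.4), (99.3, 11.2), (107.2, 10.3), (109.1, 11.4),
]


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1*x.

    Returns (b0, b1, r_squared). Hypothetical helper for illustration.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx                      # slope
    b0 = y_bar - b1 * x_bar             # intercept
    r_squared = sxy ** 2 / (sxx * syy)  # valid for simple regression
    return b0, b1, r_squared


density = [d for d, _ in TRAFFIC]
speed = [s for _, s in TRAFFIC]

# Question 2: straight-line model, density = b0 + b1 * speed.
b0, b1, r2 = fit_line(speed, density)
pred_linear = b0 + b1 * 25.0

# Question 3, one candidate transformation: log(density) = c0 + c1 * speed.
c0, c1, r2_log = fit_line(speed, [math.log(d) for d in density])
# Naive back-transform to the density scale; this estimates a median-type
# value on the original scale rather than the mean.
pred_log = math.exp(c0 + c1 * 25.0)

print(f"linear: b0={b0:.3f}, b1={b1:.3f}, R^2={r2:.3f}, at 25 mph: {pred_linear:.1f}")
print(f"log(y): c0={c0:.3f}, c1={c1:.3f}, R^2={r2_log:.3f}, at 25 mph: {pred_log:.1f}")
```

The two R^2 values are computed on different response scales, so they are not directly comparable; residual plots on each scale are a better basis for choosing between the models.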