
301114 The Nature of Data
Autumn 2017

In this assignment there are 3 questions (split into various parts). For each question/part you should draw appropriate plots, conduct the analysis and describe the conclusions in words.

A report can be submitted as either a PDF (preferred) or Word document. It is advisable that your report includes the R code used so that partial credit can be awarded in case of error. You can create the Word/PDF file using R Markdown if you know how, but this is not compulsory.

Submission is due by Friday of week 12. Submission is by the vUWS online system.

1. a. The following table shows the number of prisoners in Australian prisons in 2014, broken down by age group and gender. Is there evidence that the age distribution differs by gender? Make an appropriate plot of the data.

                        Males   Females
   19 years and under     920        65
   20 to 24 years        4684       316
   25 to 29 years        5624       475
   30 to 34 years        5407       483
   35 to 39 years        4609       404
   40 to 44 years        3756       349
   45 to 49 years        2408       234
   50 to 54 years        1541       135
   55 to 59 years         975        72
   60 to 64 years         595        32
   65 years and over      681        27

   prisoners = cbind(Males = c(920, 4684, 5624, 5407, 4609, 3756, 2408, 1541, 975, 595, 681),
                     Females = c(65, 316, 475, 483, 404, 349, 234, 135, 72, 32, 27))
   rownames(prisoners) = c("19 years and under", "20 to 24 years", "25 to 29 years",
                           "30 to 34 years", "35 to 39 years", "40 to 44 years",
                           "45 to 49 years", "50 to 54 years", "55 to 59 years",
                           "60 to 64 years", "65 years and over")

   b. Data has been collected on the number of car insurance claims in two areas of Sweden: Stockholm and surrounds, and rural southern Sweden. In Stockholm there were 23174 claims from 326149 policies, whereas in the rural area there were 31913 claims from 846957 policies. Is there evidence that the rate of car insurance claims is different in the two areas?

2. The file PIMA.csv contains information about the Pima people from North America. A number of social and environmental factors have contributed to them having one of the highest rates of type 2 diabetes in the world. This data set contains information on around 700 female individuals, including:

   • ever.pregnant: whether the individual has ever been pregnant
   • diastolic: diastolic blood pressure
   • bmi: Body Mass Index
   • age: age in years

   Is there evidence that body mass index differs for women who have had a pregnancy versus those who have not? In what direction is any difference? Make an appropriate plot of the data.

3. a. Diastolic blood pressure is thought to vary by age. Make an appropriate plot of the data. Compute a 95% confidence interval for the Pearson correlation of age and diastolic blood pressure in female Pima people.

   b. Fit the simple linear regression of diastolic on age. Interpret the slope of the regression. Compute a 95% confidence interval for the mean diastolic blood pressure of a 40-year-old female of the Pima people.

                   Question 1   Question 2   Question 3   Total
   Mark Possible           15           10           15      40

This assignment is worth 40% of the unit assessment tasks.
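
As a rough starting point for question 1, the sketch below shows one way the two analyses might be set up in R using only the counts given in the question. The choice of chisq.test for the age-by-gender table and prop.test for the two claim rates is an assumption about what is intended, not something the assignment prescribes, and the names age_dist, age_test, claims, policies and rate_test are illustrative only.

```r
# Question 1a: prisoner counts by age group and gender (from the assignment).
prisoners <- cbind(
  Males   = c(920, 4684, 5624, 5407, 4609, 3756, 2408, 1541, 975, 595, 681),
  Females = c(65, 316, 475, 483, 404, 349, 234, 135, 72, 32, 27)
)
rownames(prisoners) <- c(
  "19 years and under", "20 to 24 years", "25 to 29 years", "30 to 34 years",
  "35 to 39 years", "40 to 44 years", "45 to 49 years", "50 to 54 years",
  "55 to 59 years", "60 to 64 years", "65 years and over"
)

# Age distribution within each gender (each column sums to 1), so the two
# groups can be compared despite very different totals.
age_dist <- prop.table(prisoners, margin = 2)

# Side-by-side bars of the within-gender proportions for each age group.
barplot(t(age_dist), beside = TRUE, legend.text = TRUE, las = 2,
        ylab = "Proportion of prisoners")

# Assumed approach: chi-squared test of independence of age group and gender.
age_test <- chisq.test(prisoners)

# Question 1b: claims and policies in the two areas (from the assignment).
claims   <- c(Stockholm = 23174, Rural = 31913)
policies <- c(Stockholm = 326149, Rural = 846957)

# Assumed approach: two-sample test of equal claim proportions.
rate_test <- prop.test(claims, policies)
```

Printing age_test and rate_test shows the test statistics and p-values and, for prop.test, a confidence interval for the difference in claim rates. The conclusions in words still need to be drawn from that output.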
