Epidemiology & Biostatistics (MPH406)
Index No. EPID6001 (EPID6002)
Assignment 2 Semester/Session 2, 2018
PLEASE READ THE INSTRUCTIONS CAREFULLY BEFORE YOU COMMENCE.
INSTRUCTIONS:
 This assignment is due in OUA Week 12, on Wednesday 3rd October 2018 (AWST).
 The assignment is marked out of 50 and counts for 25% of your final mark for this unit. It is a requirement of this course that all assessments be completed independently (i.e., they must be your own work).
 Submission:
 Step 1: Submit your assignment to Turnitin (plagiarism detection software) via “Turnitin Assignment 2”.
 Step 2: Resubmit a revised Assignment 2 to “Turnitin Assignment 2: Revision 1” if the Originality Report from Turnitin in Step 1 suggests that a revision is necessary.
 Step 3: Submit the final revised version of Assignment 2 to Blackboard through Assessments → All Students Assessments → Assessment 2: Assignment 2, by clicking the assignment title ‘Assignment 2’.
Please note
 Do plan ahead to avoid late submission as it may take hours to obtain the Originality Report from Turnitin.
 Your assignment will NOT be marked unless you have submitted to both Blackboard AND Turnitin.
 The assignment will not be accepted unless the Declaration below is signed. All forms of plagiarism, cheating and unauthorised collusion are regarded seriously by the University and could result in penalties including failure and possible exclusion from the University. Please make sure you avoid plagiarism at all times.
 Use Stata to obtain the statistics and analyses. Copy and paste relevant sections of the computer output into each question if applicable, and fully interpret the results you present.
 Include only relevant Stata output in the assignment, or marks will be deducted. Do not include more than one copy of each table or graph.
 Do not submit Stata output separately with your assignment: the relevant Stata output should be copied and pasted into your assignment for corresponding questions.
 Submit your assignment as one Word document; otherwise, it will not be accepted.
 Please keep each question and its assigned marks alongside your answer to facilitate marking.
 You will be able to see your mark and your marked assignment, with feedback, on Blackboard approximately 3 weeks after the due date.
 Late assessment policy (if a student does not have an approved assessment extension):
This ensures that the requirements for submission of assignments and other work to be assessed are fair, transparent, equitable, and that penalties will be consistently applied in this unit.
 For assessment items submitted within the first 24 hours after the due date/time, students will be penalised by a deduction of 5% of the total marks allocated for the assessment task;
 For each additional 24-hour period commenced, an additional penalty of 10% of the total marks allocated for the assessment item will be deducted; and
 Assessment items submitted more than 168 hours late (7 calendar days) will receive a mark of zero.
Details can be found in the unit outline.
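The late-penalty rules above amount to a short calculation. The following is a minimal Python sketch under one reading of the policy: 5% for the first 24 hours, a further 10% for each additional 24-hour period commenced, and a mark of zero after 168 hours. The function names and the treatment of exact boundaries (e.g. exactly 24 hours late counting as the first period) are assumptions; the unit outline remains the authority.

```python
import math

TOTAL_MARKS = 50  # total marks for this assignment


def late_penalty_percent(hours_late):
    """Percentage of the total marks deducted, under the reading described above."""
    if hours_late <= 0:
        return 0.0      # submitted on time: no penalty
    if hours_late > 168:
        return 100.0    # more than 7 calendar days late: mark of zero
    periods = math.ceil(hours_late / 24)  # number of 24-hour periods commenced
    return 5.0 + 10.0 * (periods - 1)     # 5% for the first, 10% for each later one


def marks_deducted(hours_late, total_marks=TOTAL_MARKS):
    """Penalty expressed in marks rather than as a percentage."""
    return total_marks * late_penalty_percent(hours_late) / 100.0
```

Under this reading, a submission 30 hours late falls in the second 24-hour period and loses 15% of the 50 marks, i.e. 7.5 marks.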
 You are required to keep a copy of the completed assignment for your own record.
Please do contact your lecturer/tutor if you have any queries not covered in the explanations given: Dr Yun Zhao: [email protected]
Declaration
As I type (sign) my name below, I declare that the submitted assignment is my own work and has not previously been submitted for assessment. I have carried out the analyses, interpreted and answered all questions in this assignment myself. This work complies with Curtin University rules concerning plagiarism and copyright. I understand that all forms of plagiarism, cheating and unauthorised collusion are regarded seriously by the University and could result in penalties including failure and possible exclusion from the University. I have retained a copy of this assignment for my own records.
__________________________ ______________________ _______________
Name & ID of student Signature of student Date
Note: electronic signature is accepted
Assignment 2: BIOSTATISTICS
(Total marks 50 – to be scaled to 25%)
QUESTION ONE
(Total: 23 marks)
In a study, a fictitious random sample (Assign2Pulse2018S2.dta) of 80 subjects was obtained, with information on pulse rate, gender, smoking status, level of activity, and BMI. One aim of the study is to understand the difference in pulse rate between overweight and non-overweight people, while also accounting for gender differences. In this question, you are given one continuous dependent variable Y (pulse) and two categorical independent variables (gender and BMICat), as follows:
Table 1
Variable  Description 
pulse  Pulse rate (beats per minute)
gender  1 = male, 2 = female
BMICat  1 = non-overweight, 2 = overweight
Your task is to investigate the relationship between pulse and BMICat using appropriate procedures and techniques, accounting for gender in the analyses as a potential effect modifier. Use a significance level α of 5%.
Hint:
 You may find it helpful to follow the instructions in Lab 1 for the t test.
 You may find it helpful to follow the strategy for analyses given in Module B6 and Computing Lab 6.
 (2 marks) Obtain the sample mean pulse rate, the standard deviation (both to 3 decimal places) and the number of subjects for each BMICat group within each gender group, and fill in the following table. Calculate and comment on the difference in mean pulse between non-overweight and overweight subjects for each gender group, in relation to a possible interaction between gender and BMICat. (No Stata output is required for this question)
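Gender  BMI  Pulse mean  Pulse s.d.  n
Female  Non-overweight  ________  ________  ________
Female  Overweight  ________  ________  ________
Female  Total  ________  ________  ________
Male  Non-overweight  ________  ________  ________
Male  Overweight  ________  ________  ________
Male  Total  ________  ________  ________
Total  Non-overweight  ________  ________  ________
Total  Overweight  ________  ________  ________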
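The table asks for the mean, standard deviation and count of pulse within each gender-by-BMICat cell. The analysis itself must be done in Stata; purely to illustrate the arithmetic, here is a small Python sketch on made-up records. The values below are hypothetical and are not taken from Assign2Pulse2018S2.dta; only the variable codes follow Table 1.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records: (gender, BMICat, pulse).
# gender: 1 = male, 2 = female; BMICat: 1 = non-overweight, 2 = overweight.
records = [
    (1, 1, 70), (1, 1, 74), (1, 2, 82), (1, 2, 86),
    (2, 1, 72), (2, 1, 76), (2, 2, 75), (2, 2, 79),
]


def cell_summaries(rows):
    """Return {(gender, BMICat): (mean, sample s.d., n)}, rounded to 3 decimals."""
    cells = defaultdict(list)
    for gender, bmicat, pulse in rows:
        cells[(gender, bmicat)].append(pulse)
    return {
        key: (round(mean(values), 3), round(stdev(values), 3), len(values))
        for key, values in cells.items()
    }


summary = cell_summaries(records)

# Difference in mean pulse (overweight minus non-overweight) within each gender.
diff_by_gender = {g: summary[(g, 2)][0] - summary[(g, 1)][0] for g in (1, 2)}
```

In this toy data the difference is 12 beats per minute for males and 3 for females. Comparing the two within-gender differences in this way is how a cell-mean table is usually read when thinking about a possible interaction.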
 (4 marks) Test the hypothesis that the population mean pulse rate is the same for non-overweight and overweight subjects.
(No Stata output(s) are required for this question)
 Hypotheses: (1 mark)
H_{0}: __________________________________________________
H_{A}: __________________________________________________
 Name the t test you used for the hypothesis (0.5 marks):___________________
 P value of the t test you used (0.5 marks): ___________________________________
 Conclusion of the hypothesis test: (2 marks)
____________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
____________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________
 (7 marks) Now assess the difference in the population mean pulse rate between non-overweight and overweight subjects using a multiple regression model, accounting for gender in the analyses as a potential effect modifier.
 Name the multiple regression model that is appropriate for this question, and explain why.
___________________________________________________________________
(1 mark)
 The mean plots for this question are given below:
Based on the mean plots given, justify whether the interaction term between BMICat and gender should be included and assessed in your model. (1 mark)
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
 Fit the model you recommended for pulse on BMICat and gender. (2 marks)
Attach relevant Stata output (e.g., ANOVA table) here
_______________________________________________________________________
_______________________________________________________________________
 Based on the ANOVA table in Question iii, test the hypothesis that there is no interaction in the population between BMICat and gender, including your interpretations and conclusions (1 mark).
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
 Comment on whether a further model is necessary by selecting an answer below (2 marks):
 Yes. Which variable should be removed from the model, and why?
Attach Stata output (e.g., parameter estimation table) here
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
 No, a further model is not necessary. Why?
Attach Stata output (e.g., parameter estimation table) here
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
 (6 marks) Based on your final model:
 Write down the regression equation (estimated regression coefficients rounded to 3 decimal places)
(1 mark)
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
 Interpret the constant in the final model. (1 mark)
_______________________________________________________________________
_______________________________________________________________________

 Calculate the predicted pulse rate for male overweight and male non-overweight subjects based on the regression equation obtained in Q4 a. (2 marks)
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
 Do you agree that the regression coefficient ‘15.207’ for BMICat could be interpreted as ‘non-overweight subjects had a lower pulse rate, by 15.207 beats per minute on average, than overweight subjects’? (2 marks)
Yes. I agree because…
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
No. I disagree because…
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
 (4 marks) Using information you obtained from the final model in Q4, draw a detailed conclusion for the final model with regard to the research aim.
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
QUESTION TWO
(Total: 27 marks)
To identify the predictors of obesity, as measured by a subject’s body mass index (a function of weight and height), a fictitious data set (Assign2BMI2018S2.dta) from a random sample of 110 adults was used. The independent variables to be assessed are gender, smoking status, alcohol consumption (grams of ethanol in a week), socioeconomic status, whether or not the person regularly participates in physical activity, and the subject’s age. Information on the variables is given in Table 1 below:
Table 1: Variables information
Variable  Description
age  The age of the participant (in years)
gender  The gender of the participant: {1 = Male, 2 = Female}
smoking  Whether the person smokes or not: {1 = Yes, 2 = No}
alcohol  Alcohol consumption (grams of ethanol)
physact  Whether the person regularly participates in physical activity: {1 = Yes, 2 = No}
ses  The socioeconomic status of the participant: {1 = Lower, 2 = Medium, 3 = Higher}
BMI  Body mass index (in kg/m^{2})
Your task is to investigate the relationship between BMI (the dependent variable) and all the independent variables given in the above table, using the appropriate procedures and techniques.
Hint:
 You may find it helpful to follow the strategy for analyses given in Modules B7 & B8 and Computing Lab 8.
 (7 marks) Exploratory analyses using descriptive statistics and plots.
 Examine the linear relationship between BMI and age using a scatter plot and Pearson’s correlation coefficient. (1 mark)
(No Stata output(s) are required for this question)
Pearson’s correlation coefficient = ______________________, p = ________________
Draw a conclusion about the relationship between BMI and age:
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
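As a reminder of what Pearson’s correlation coefficient measures, the sketch below computes it from first principles in Python. The ages and BMI values are hypothetical and are not taken from Assign2BMI2018S2.dta; the coefficient and p value reported in your answer must come from Stata.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation: co-variation of x and y scaled by their spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical illustration data only.
age = [20, 30, 40, 50, 60]
bmi = [22, 24, 23, 27, 29]
r = pearson_r(age, bmi)
```

A value of r near +1 or -1 indicates a strong linear relationship, and a value near 0 a weak one; the scatter plot is still needed to check that the relationship is in fact linear.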
 Test the association between Y (i.e., BMI) and each selected categorical X (using independent-samples t tests or one-way ANOVA) to assess the strength of the association between the X’s and BMI; that is, for each factor, are there significant differences between the groups?
 BMI and gender (2 marks)
Attach Stata output here
List the test you used: _______________________________________________
Provide the corresponding P value obtained: ______________________________
Draw a conclusion from your test:
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
 BMI and smoking (2 marks)
Attach Stata output here
Draw a conclusion from your test:
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
 BMI and physact (1 mark)
(No Stata output(s) are required for this question)
Draw a conclusion from your test:
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
 BMI and ses (1 mark)
(No Stata output(s) are required for this question)
List the test you used: _______________________________________________
Draw a conclusion from your test:
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
 (4 marks) Details of your model building process.
You need to:
 Build a parsimonious regression model for BMI, using a backward elimination process.
 Treat all the independent variables equally, i.e., there is no major variable of interest.
 Do NOT test for interaction or confounding effects.
 List each step of modelling as follows:
 Model 1 (1 mark)
List variables included initially: ________________________________________________
Attach Stata output here
 Model 2 (1 mark)
List variables removed from Model 1: ___________________________________________
Reason for removing: ________________________________________________
Attach Stata output here
 Model 3 (1 mark)
List variables removed from Model 2: ___________________________________________
Reason for removing: ________________________________________________
Attach Stata output here
 Model 4 (1 mark)
List variables removed from Model 3: ___________________________________________
Reason for removing: ________________________________________________
Attach Stata outputs (if this is your final model, please attach parameter estimation table too) here
 (4 marks) Assessment of assumptions for the final model obtained in Question 2 above (include your interpretations and conclusions).
 Assess and comment on the normality of the standardised residuals; (1 mark)
(No Stata output(s) are required for this question)
Your conclusion: the standardised residuals can be assumed to have a __________.
 Normal distribution
 Strongly positively skewed distribution
 Strongly negatively skewed distribution
 Bimodal distribution
List the names of the 5 measures you used for the assessment
__________________________________________________________________
 Assess and comment on the assumption of the constant variation; (1 mark)
Attach Stata output here
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
 Assess and comment on the assumption of equal variances. (2 marks)
Attach Stata output here
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
 (3 marks) Assess the goodness-of-fit of the final model
(No Stata output(s) are required for this question)
 List the adjusted R^{2} (0.5 marks)
_______________________________________________________________________
 List the range of the standardised residual values. (0.5 marks)
_______________________________________________________________________
 Interpret the adjusted R^{2} value and comment on the range of the standardised residuals in relation to the fit of the final model. Do you think the final model is a reasonably good model for practical prediction? (2 marks)
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
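For reference, the adjusted R^{2} penalises R^{2} for the number of predictors in the model, using the standard formula 1 - (1 - R^{2})(n - 1)/(n - k - 1), where n is the sample size and k the number of predictors. A minimal Python sketch follows; the R^{2} value and k used in the example are hypothetical, and only n = 110 comes from the question. Stata reports the adjusted R^{2} directly, so this is only a check on the arithmetic.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a linear model with k predictors fitted to n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than estimated parameters")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical example: R-squared of 0.5 with 3 predictors and n = 110 subjects.
example = adjusted_r2(0.5, n=110, k=3)
```

The adjustment is small when n is large relative to k, which is why the adjusted and unadjusted values are often close in a sample of this size.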
 (7 marks) Detailed interpretations and conclusions.
(No Stata output(s) are required for this question)
 Write down the regression equation (three decimal places) based on the final model you obtained in Q2. (1 mark)
_______________________________________________________________________
 Interpret the regression coefficients and their confidence interval(s) for those variables included in the final model (6 marks)
For physact _______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
For age
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
For smoking
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
 (2 marks) Based on the final model,
 Obtain the mean predicted value of BMI for a 70-year-old non-smoker who participated in physical activity, based on your final model. (1 mark)
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
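A mean predicted value is obtained by substituting the subject’s covariate values into the fitted regression equation. The Python sketch below shows the mechanics only. The coefficients are hypothetical placeholders, not estimates from Assign2BMI2018S2.dta, and the sketch assumes the categorical variables enter the equation with their original 1/2 codes from Table 1; if your model uses 0/1 indicator variables instead, substitute those values.

```python
# Hypothetical coefficients, for illustration only.
coef = {"_cons": 20.0, "age": 0.05, "smoking": 1.0, "physact": 2.0}


def predict_bmi(age, smoking, physact, b=coef):
    """Plug covariate values into a linear regression equation."""
    return (
        b["_cons"]
        + b["age"] * age
        + b["smoking"] * smoking
        + b["physact"] * physact
    )


# A 70-year-old non-smoker (smoking = 2) who is physically active (physact = 1).
pred = predict_bmi(age=70, smoking=2, physact=1)
```

With these placeholder coefficients the prediction is 20 + 0.05 × 70 + 1 × 2 + 2 × 1 = 27.5 kg/m^{2}; your own answer should use the coefficients from your final model.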
 Based on the final model, Yun concluded that the difference in the mean predicted BMI between her two friends (one a smoker and the other a non-smoker) is 1.586 kg/m^{2}. Do you think her conclusion is correct? Justify your answer. (1 mark)
______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
End of Assignment 2