Epidemiology & Biostatistics

Epidemiology & Biostatistics (MPH406)

Index No. EPID6001 (EPID6002)

 

Assignment 2 Semester/Session 2, 2018

 

PLEASE READ THE INSTRUCTIONS CAREFULLY BEFORE YOU COMMENCE.

 

INSTRUCTIONS:

  • This assignment is due to be handed in on OUA Week 12, Wednesday 3rd October 2018 AWST
  • The total marks are 50 marks and this assignment counts towards 25% of your final mark for this unit. It is a requirement of this course that all assessments be completed on an independent basis (i.e., your own work).
  • Submission:
  • Step 1: Submit your assignment to Turnitin (plagiarism detection software) via the “Turnitin Assignment 2
  • Step 2: Resubmit a revised Assignment 2 to “Turnitin Assignment 2: Revision 1” if the Originality Report from Turnitin in Step 1 suggests that a revision is necessary.
  • Step 3: Submit the final revised version of Assignment 2 to Blackboard via Assessments → All Students Assessments → Assessment 2: Assignment 2 by clicking the assignment title ‘Assignment 2’.

 

Please note

  • Do plan ahead to avoid late submission as it may take hours to obtain the Originality Report from Turnitin.
  • Your assignment will NOT be marked unless you have submitted to both Blackboard AND Turnitin.
  • The assignment will not be accepted unless the Declaration below is signed. All forms of plagiarism, cheating and unauthorised collusion are regarded seriously by the University and could result in penalties including failure and possible exclusion from the University. Please make sure you avoid plagiarism at all times!
  • Use Stata to obtain the statistics and analyses. Copy and paste relevant sections of the computer output into each question if applicable, and fully interpret the results you present.
  • Include only relevant Stata output as part of the assignment or marks will be deducted, but do not include more than one copy of each table/graph.
  • Do not submit Stata output separately with your assignment: the relevant Stata output should be copied and pasted into your assignment for corresponding questions.
  • Submit your assignment as one Word document. Otherwise, it will not be accepted.
  • Please keep each question and its assigned marks alongside your answer to facilitate marking.

 

  • You will be able to see your mark and your marked assignment with feedback in Blackboard approximately 3 weeks after the due date.

 

 

  • Late assessment policy (if a student does not have an approved assessment extension):

This ensures that the requirements for submission of assignments and other work to be assessed are fair, transparent, equitable, and that penalties will be consistently applied in this unit.

  • For assessment items submitted within the first 24 hours after the due date/time, students will be penalised by a deduction of 5% of the total marks allocated for the assessment task;
  • For each additional 24 hour period commenced an additional penalty of 10% of the total marks allocated for the assessment item will be deducted; and
  • Assessment items submitted more than 168 hours late (7 calendar days) will receive a mark of zero.
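
As a rough illustration of how these penalties combine, the sketch below computes the deduction in Python. It assumes that a submission exactly 24 hours late still falls in the first period, that each further 24-hour period counts as soon as it has commenced, and that a mark of zero is equivalent to a 100% deduction. The function names are hypothetical, and the unit outline remains the authoritative source.

```python
import math


def late_penalty_percent(hours_late: float) -> float:
    """Penalty as a percentage of the total marks allocated to the task."""
    if hours_late <= 0:
        return 0.0  # submitted on time
    if hours_late > 168:
        return 100.0  # more than 7 calendar days late: mark of zero
    # First 24 hours: 5%. Each additional 24-hour period commenced: +10%.
    additional_periods = math.ceil(hours_late / 24) - 1
    return 5.0 + 10.0 * additional_periods


def mark_after_penalty(raw_mark: float, hours_late: float,
                       total_marks: float = 50.0) -> float:
    """Apply the deduction to a raw mark (this assignment is out of 50)."""
    deduction = total_marks * late_penalty_percent(hours_late) / 100
    return max(0.0, raw_mark - deduction)


if __name__ == "__main__":
    for hours in (0, 10, 30, 168, 169):
        print(hours, late_penalty_percent(hours), mark_after_penalty(40, hours))
```

Under these assumptions, a submission 30 hours late loses 15% of the 50 available marks, that is, 7.5 marks.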

 

Details can be found in the unit outline.

 

  • You are required to keep a copy of the completed assignment for your own record.

 

Please do contact your lecturer/tutor if you have any queries not covered in the explanations given:  Dr Yun Zhao: [email protected]

 

Declaration

As I type (sign) my name below, I declare that the submitted assignment is my own work and has not previously been submitted for assessment. I have carried out the analyses, interpreted and answered all questions in this assignment myself. This work complies with Curtin University rules concerning plagiarism and copyright. I understand that all forms of plagiarism, cheating and unauthorised collusion are regarded seriously by the University and could result in penalties including failure and possible exclusion from the University. I have retained a copy of this assignment for my own records.

 

 

__________________________      ______________________    _______________

Name & ID of student                           Signature of student                     Date

 

Note: electronic signature is accepted
Assignment 2: BIOSTATISTICS

(Total marks 50 – to be scaled to 25%)

 

Question ONE

(Total: 23 marks)

 

In a study, a fictitious random sample (Assign2Pulse2018S2.dta) was obtained with information on pulse rate, gender, smoking status, level of activity, and BMI measured for 80 subjects. One aim of the study is to understand the difference in pulse rate between overweight and non-overweight people, and gender differences among the subjects need to be accounted for as well. In this question, you are given one continuous dependent variable Y (pulse) and two categorical independent variables (gender and BMICat) as follows:

 

Table 1

Variable     Description
pulse        Pulse rate (beats per minute)
gender       1 = male, 2 = female
BMICat       1 = non-overweight, 2 = overweight

 

Your task is to investigate the relationship between pulse and BMICat using appropriate procedures and techniques, accounting for gender in the analyses as a potential effect modifier. Use a significance level α of 5%.
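
Effect modification can be previewed descriptively by comparing the overweight versus non-overweight difference in mean pulse within each gender. The sketch below does this in Python with made-up pulse readings (not values from Assign2Pulse2018S2.dta). The dictionary layout and the function name are assumptions for illustration only.

```python
from statistics import mean

# Hypothetical pulse readings (beats per minute), NOT from the assignment data.
pulse = {
    ("male", "non-overweight"): [68, 72, 70, 74],
    ("male", "overweight"): [80, 84, 82, 86],
    ("female", "non-overweight"): [74, 78, 76, 80],
    ("female", "overweight"): [78, 82, 80, 84],
}


def bmi_difference(gender: str) -> float:
    """Mean pulse of overweight minus non-overweight subjects for one gender."""
    return (mean(pulse[(gender, "overweight")])
            - mean(pulse[(gender, "non-overweight")]))


male_diff = bmi_difference("male")
female_diff = bmi_difference("female")
# If the two within-gender differences are far apart, the effect of BMICat
# appears to depend on gender, which is what an interaction term captures.
interaction_contrast = male_diff - female_diff

print(male_diff, female_diff, interaction_contrast)
```

Clearly different within-gender differences hint at an interaction, but only the interaction term in the fitted model tests it formally.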

 

Hint:

  • You may find it helpful to follow the instructions in Lab 1 for the t test.
  • You may find it helpful to follow the strategy for analyses given in Module B6 and Computing Lab 6.

 

  1. (2 marks) Obtain the sample mean pulse rate, the standard deviation (both to 3 decimal places) and the number of subjects for each BMICat group within each gender group, and fill in the following table. Calculate and comment on the difference in mean pulse between non-overweight and overweight subjects for each gender group in relation to a possible interaction between gender and BMICat. (No Stata output(s) are required for this question)

 

 

Gender     BMI                Pulse
                              mean        s.d.        n
Female     Non-overweight
           Overweight
           Total
Male       Non-overweight
           Overweight
           Total
Total      Non-overweight
           Overweight

 

 

 

 

 

 

 

 

  2. (4 marks) Test the hypothesis that the population mean pulse rate is the same for non-overweight and overweight subjects.

(No Stata output(s) are required for this question)

 

  i. Hypotheses: (1 mark)

H0: __________________________________________________

 

HA: __________________________________________________

 

  ii. Name the t test you used for the hypothesis (0.5 marks): ___________________

 

  iii. P value of the t test you used (0.5 marks): ___________________________________

 

  iv. Conclusion of the hypothesis test: (2 marks)

 

____________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________

____________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________

 

  3. (7 marks) Now assess the difference in the population mean pulse rate between non-overweight and overweight subjects using a multiple regression model, accounting for gender in the analyses as a potential effect modifier.

 

  i. Name the multiple regression model which is appropriate for this question. Why?

 

___________________________________________________________________

(1 mark)

 

  ii. The mean plots for this question are given below:

 

 

 

 

Based on the mean plots given, justify whether the interaction term between BMICat and gender should be included and assessed in your model. (1 mark)

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

 

  iii. Fit the model you recommended for pulse on BMICat and gender. (2 marks)

Attach relevant Stata output (e.g., ANOVA table) here

 

_______________________________________________________________________

_______________________________________________________________________

 

  iv. Based on the ANOVA table in Question iii, test the hypothesis that there is no interaction in the population between BMICat and gender, including your interpretations and conclusions (1 mark).

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

 

  v. Comment on whether a further model is necessary by selecting an answer below (2 marks):

 

  a. Yes. Which variable should be removed from the model, and why?

Attach Stata output (e.g., parameter estimation table) here

 

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

 

  b. No, a further model is not necessary. Why?

Attach Stata output (e.g., parameter estimation table) here

 

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

 

 

 

  4. (6 marks) Based on your final model:

 

    a. Write down the regression equation (round the estimated regression coefficients to 3 decimal places)

(1 mark)

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

 

    b. Interpret the constant in the final model. (1 mark)

 

_______________________________________________________________________

_______________________________________________________________________

 

    c. Calculate the predicted pulse rate for male overweight and male non-overweight subjects based on the regression equation obtained in Q4 a. (2 marks)

 

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

 

    d. Do you agree that the regression coefficient ‘-15.207’ for BMICat could be interpreted as ‘non-overweight subjects had a lower pulse rate by 15.207 beats per minute than overweight subjects on average’? (2 marks)

 

Yes. I agree because…

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

 

 

No. I disagree because…

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

 

 

 

 

 

 

 

 

 

  5. (4 marks) Using information you obtained from the final model in Q4, draw a detailed conclusion for the final model with regard to the research aim.

 

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

QUESTION TWO

(Total: 27 marks)

To identify the predictors of obesity as measured by a subject’s body mass index (a function of weight and height), a fictitious data set (Assign2BMI2018S2.dta) from a random sample of 110 adults was used. The independent variables to be assessed are gender, smoking status, alcohol consumption (grams of ethanol in a week), socio-economic status, whether the person regularly participates in physical activity, and the subject’s age. Information on the variables is given in Table 1 below:

 

Table 1: Variables information

Variable     Description
age          The age of the participant (in years)
gender       The gender of the participant: { 1 = Male, 2 = Female }
smoking      Whether the person smokes or not: { 1 = Yes, 2 = No }
alcohol      Alcohol consumption (grams of ethanol)
physact      Whether the person regularly participates in physical activity: { 1 = Yes, 2 = No }
ses          The socio-economic status of the participant: { 1 = Lower, 2 = Medium, 3 = Higher }
BMI          Body mass index (in kg/m2)

 

Your task is to investigate the relationship between BMI (dependent variable) and all the independent variables given in the above table, using the appropriate procedures and techniques.

 

 

Hint:

  1. You may find it helpful to follow the strategy for analyses given in Modules B7 & B8 and Computing Lab 8.

 

  1. (7 marks) Exploratory analyses using descriptive statistics and plots.

 

  • Examine the linear relationship between BMI and age using scatter plot and Pearson’s correlation coefficient. (1 mark)

(No Stata output(s) are required for this question)

 

Pearson’s correlation coefficient = ______________________, p = ________________

Draw a conclusion about the relationship between BMI and age:

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________
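
The correlation asked for above can be sketched as follows. This Python example computes Pearson's r from first principles, together with the t statistic (with n − 2 degrees of freedom) commonly used to test for a zero population correlation. The ages and BMI values are invented for illustration and the function names are hypothetical. The P value itself requires the t distribution, which statistical software such as Stata can report.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t_statistic(r, n):
    """t statistic for H0: population correlation = 0 (df = n - 2)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


if __name__ == "__main__":
    # Hypothetical values, NOT taken from Assign2BMI2018S2.dta.
    ages = [25, 32, 41, 47, 53, 60, 68]
    bmis = [22.1, 24.3, 23.8, 26.5, 27.0, 26.2, 28.9]
    r = pearson_r(ages, bmis)
    print(round(r, 3), round(correlation_t_statistic(r, len(ages)), 3))
```

A scatter plot should still be inspected first, because r only summarises the strength of a linear relationship.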

 

 

 

 

 

 

 

  • Test the association of Y (i.e., BMI) with each selected categorical X (independent samples t test or one-way ANOVA) to assess the strength of the association between the X’s and BMI; i.e., for each factor, are there significant differences between the groups?

 

  1. BMI and gender (2 marks)

Attach Stata output here

 

List the test you used: _______________________________________________

 

Provide the corresponding P value obtained:  ______________________________

State the conclusion of your test:

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

 

  2. BMI and smoking (2 marks)

Attach Stata output here

 

State the conclusion of your test:

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

 

  3. BMI and physact (1 mark)

(No Stata output(s) are required for this question)

 

State the conclusion of your test:

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

 

 

 

 

 

 

 

  4. BMI and ses (1 mark)

(No Stata output(s) are required for this question)

 

List the test you used: _______________________________________________

 

State the conclusion of your test:

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

 

  2. (4 marks) Details of your model building process.

You need to

  1. Build a parsimonious regression model for BMI, using a backward elimination process.
  2. Treat all the independent variables equally, i.e., there is no major variable of interest.
  3. Do NOT test for interaction or confounding effects.
  4. List each step of modelling as follows:

 

  1. Model 1 (1 mark)

List variables included initially: ________________________________________________

Attach Stata output here

 

  2. Model 2 (1 mark)

List variables removed from Model 1: ___________________________________________

Reason for removing: ________________________________________________

Attach Stata output here

 

  3. Model 3 (1 mark)

List variables removed from Model 2: ___________________________________________

Reason for removing: ________________________________________________

Attach Stata output here

 

  4. Model 4 (1 mark)

List variables removed from Model 3: ___________________________________________

Reason for removing: ________________________________________________

Attach Stata output here (if this is your final model, please also attach the parameter estimation table)

 

 

 

 

 

 

 

 

 

 

  3. (4 marks) Assessment of assumptions for the final model obtained in Question 2 above (include your interpretations and conclusions).

 

  • Assess and comment on the normality of the standardised residuals; (1 mark)

(No Stata output(s) are required for this question)

 

Your conclusion: the standardised residuals can be assumed to have a __________.

  1. Normal distribution
  2. Strongly positively skewed distribution
  3. Strongly negatively skewed distribution
  4. Bimodal distribution

 

List the names of the 5 measures you used for the assessment:

 

__________________________________________________________________

 

 

  • Assess and comment on the assumption of the constant variation; (1 mark)

Attach Stata output here

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

 

  • Assess and comment on the assumption of equal variances. (2 marks)

Attach Stata output here

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

 

 

  4. (3 marks) Assess the goodness-of-fit of the final model

(No Stata output(s) are required for this question)

 

  • List the adjusted R² (0.5 marks)

_______________________________________________________________________

 

  • List the range of the standardised residual values. (0.5 marks)

_______________________________________________________________________

 

  • Interpret the adjusted R² value and comment on the range of the standardised residuals in relation to the fit of the final model. Do you think the final model is a reasonably good model for practical prediction? (2 marks)

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________
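
The two goodness-of-fit summaries above can be sketched as follows. The Python example assumes the usual definition of adjusted R², 1 − (1 − R²)(n − 1)/(n − p − 1) for n observations and p predictors, and uses the common rule of thumb that roughly 95% of standardised residuals should lie within ±2 when the model fits well. The function names and all input values are hypothetical.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-squared: penalises R-squared for the number of predictors."""
    df_residual = n_obs - n_predictors - 1
    if df_residual <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r_squared) * (n_obs - 1) / df_residual


def share_outside(standardised_residuals, limit=2.0):
    """Proportion of standardised residuals with absolute value above limit."""
    flagged = [z for z in standardised_residuals if abs(z) > limit]
    return len(flagged) / len(standardised_residuals)


if __name__ == "__main__":
    # Hypothetical inputs: R-squared of 0.30 from 110 adults and 3 predictors.
    print(round(adjusted_r_squared(0.30, 110, 3), 3))
    # Hypothetical standardised residuals.
    print(share_outside([-2.5, 0.1, 1.9, 3.2, -0.4, 0.8]))
```

A low adjusted R² with well-behaved residuals suggests a model that meets its assumptions but explains little of the variation in BMI, which limits its usefulness for prediction.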

 

 

  5. (7 marks) Detailed interpretations and conclusions.

(No Stata output(s) are required for this question)

 

  • Write down the regression equation (three decimal places) based on the final model you obtained in Q2. (1 mark)

 

_______________________________________________________________________

 

  • Interpret the regression coefficients and their confidence interval(s) for those variables included in the final model. (6 marks)

For physact

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

For age

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

 

For smoking

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

 

  6. (2 marks) Based on the final model:

 

  • Obtain the mean predicted value of BMI for a 70-year-old non-smoker who regularly participates in physical activity. (1 mark)

_______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

 

  • Based on the final model, Yun concluded that the difference in the mean predicted BMI between her two friends (one a smoker and the other a non-smoker) is 1.586 kg/m2. Do you think her conclusion is correct? Justify your answer. (1 mark)

______________________________________________________________________

_______________________________________________________________________

_______________________________________________________________________

End of Assignment 2