Decision Analytics in Practice

Analysis of Emergency Department Data of Epworth HealthCare
MIS779 – Decision Analytics in Practice
Group 8
Assessment 2
BAJAJ, Puneet – 217242489
GILL, Ikam – 217459128
ORAS, Remson – 217046545

TABLE OF CONTENTS
EXECUTIVE SUMMARY
INTRODUCTION
APPROACH
Data Collection
Data Cleansing
Statistical and Analytical Methods
ASSUMPTIONS
DATA ANALYSIS
Exploratory Analysis
Summary of Initial Exploratory Analysis
Secondary Exploratory Analysis
Advanced Analysis
Dataset: Variables of Interest
Data Modelling
Linear Regression Model Evaluation
RECOMMENDATIONS
GENERAL ANALYTICS ISSUES
APPENDICES
APPENDIX A
Initial Exploratory Analysis
APPENDIX B
Secondary Exploratory Analysis
APPENDIX C
Predictive Modelling
REFERENCES
EXECUTIVE SUMMARY
Epworth HealthCare is the largest not-for-profit organisation providing health care
services to patients across Victoria. Epworth is well known for the wide range of services
delivered by its knowledgeable team and for the quality of its care. However, Epworth’s
emergency department has difficulty determining the optimal staffing level needed to
cater for the patients who present there.
The purpose of this study is to explore and understand the three-year dataset that was
extracted from Epworth’s emergency department database. The dataset includes historical
patient visits and employee timesheets. It was explored to provide Epworth with
information on optimal staffing requirements based on trends in historical patient visits.
Exploratory analysis of the patient dataset revealed a seasonal pattern in which patient
visits increase during March, May, August and December. The hourly trend showed that
patient arrivals spike at 10 AM, 1 PM, 4 PM and 7 PM. An increase in staffing during these
periods is therefore recommended.
Patient length of stay, measured from arrival in the emergency department, was also
analysed. The median length of stay is 175 minutes and the most frequently observed
length of stay is 150 minutes.
Based on the exploratory results and the research reviewed, the group hypothesised that
patient length of stay is a measure of overcrowding, and that an increase in patient
length of stay in the emergency department increases the need for staff to attend to
patients.
A linear regression model was then built to predict patient length of stay, so that the
organisation can judge whether more staff are needed in the next hour of operation
throughout the day. The model achieved an average squared correlation of 19.5% and an
RMSE of 128.86 minutes. Improving the model through better pre-processing of the data and
the use of ensembles is therefore recommended.

INTRODUCTION
Emergency care is one of the most challenging areas of the healthcare industry, and the
challenge is usually framed in terms of crowding and urgency. The urgency of care is
usually determined by a range of physical and psychological factors, and in many emergency
situations a life-threatening condition brings the patient to the emergency department of
a healthcare provider.
Studies define overcrowding in an emergency department as the situation in which the
department’s function is impeded, primarily because the total number of patients
undergoing treatment, waiting to be attended to and leaving the care of the professionals
exceeds what the previously arranged staffing can handle. Overcrowding is the most common
issue in emergency departments and is the underlying problem that this study seeks to
address.
The emergency department at Epworth provides healthcare services to many people every
day. The resulting demand for emergency care and specialist treatment creates a
substantial workload, and Epworth must align its staffing and scheduling procedures with
the specific requirements of its patients.
To help provide Epworth with optimal staffing arrangements, Deakin University students
were asked to examine years’ worth of data provided by Epworth on its emergency
department and, in the process, to identify the case mix and the patterns in both triage
scales and reported symptoms.
The purpose of this study is to explore the patient and employee datasets and search for
patterns or trends that would help provide Epworth with information on the optimal
staffing requirement for its emergency department.
APPROACH
Data Collection
The dataset used in this study was extracted from Epworth’s emergency department
database. It contains employee timesheet, roster and shift schedule data dating from July
2016 to January 2019, and data on patients who visited the department from July 2016 to
July 2018.

The ‘TimesheetsActuals’ and ‘PatientAttendences’ tabs were the main data tabs used for
analysis and for developing recommendations on the staffing requirement problem.
The ‘TimesheetsActuals’ tab contains 13,052 records of employees’ start dates together
with their start and end times. It also contains each employee’s role and shift type.
The ‘PatientAttendences’ tab contains 14,878 records covering the patient’s age, triage
category, time of arrival at and departure from the emergency department, arrival mode,
discharge location, admitting ward and length of time in the emergency department.
Data Cleansing
Duplicate records were found in the ‘TimesheetsActuals’ tab; 19,448 records were removed,
leaving 13,052 records for analysis. In the ‘PatientAttendences’ tab, 755 records were
removed because they had an invalid data source input and remarks marking them as not
correct. Records with negative minutes in the length of stay from doctor seen or in
arrival to triage were also removed because they could lead to misinterpretation.
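The group carried out these cleansing steps in Microsoft Excel. Purely as an illustration, the same two rules (dropping exact duplicate records and dropping records with negative durations) could be sketched in Python as follows. The field names and sample rows are hypothetical and are not taken from the Epworth dataset.

```python
from typing import Dict, List

Record = Dict[str, object]


def drop_duplicates(records: List[Record]) -> List[Record]:
    """Keep the first occurrence of each fully identical record."""
    seen = set()
    unique = []
    for record in records:
        # Sort by field name so the key does not depend on field order.
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def drop_negative_durations(records: List[Record], fields: List[str]) -> List[Record]:
    """Remove records in which any of the given duration fields is negative."""
    return [r for r in records if all(r[f] >= 0 for f in fields)]


# Hypothetical sample rows, for illustration only.
timesheets = [
    {"employee": "E1", "start": "07:00", "end": "15:00"},
    {"employee": "E1", "start": "07:00", "end": "15:00"},  # exact duplicate
    {"employee": "E2", "start": "08:00", "end": "16:00"},
]
visits = [
    {"visit": 1, "arrival_to_triage_min": 5, "los_from_dr_seen_min": 120},
    {"visit": 2, "arrival_to_triage_min": -3, "los_from_dr_seen_min": 90},
]

clean_timesheets = drop_duplicates(timesheets)
clean_visits = drop_negative_durations(
    visits, ["arrival_to_triage_min", "los_from_dr_seen_min"]
)
```

In this sketch a record counts as a duplicate only when every field matches, which is an assumption; the report does not state which columns were used to identify duplicates.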
Statistical and Analytical Methods
Microsoft Excel was the main tool used to explore the data and to conduct statistical and
inferential analysis. Its data analysis feature was used to produce descriptive statistics
for the data.
Visualisations such as the line and bar charts were generated in Tableau, although some
had to be created in Microsoft Excel.
RapidMiner was used to create the predictive model, in which a Linear Regression model
estimates the patients’ length of stay.

ASSUMPTIONS
The analysis conducted in this study led to the following assumptions:
1. There is no direct link between the patient data and the employee data that specifies
the role of the employee needed to attend to a patient (for example, doctor specialisation).
2. The decrease in patient visits in July 2018 occurred because data was collected for
only half of that month.
3. The patient visit data covers only two years, since it was gathered for the second half
of 2016, all of 2017 and the first half of 2018.
DATA ANALYSIS
Exploratory Analysis
Summary of Initial Exploratory Analysis
An initial exploratory analysis of the dataset was conducted (see Appendix A – Figures 1
to 9). It found that the largest group of patients arriving at the emergency department
is aged 0-10, followed by patients aged 61-70 (see Appendix A – Figure 1). Patient visit
counts per quarter were also explored: visits increased by an average of 14.93% per
quarter, and a decrease of 1.09% was observed from Q1 2018 to Q2 2018 (see Appendix A –
Figure 2). From this exploration, a Tableau forecast was generated in which patient visits
increase to 2,395 in Q3 2018 and to 2,767 in Q4 2018 (see Appendix A – Figure 3).
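The quarterly growth figures were read from the Tableau and Excel outputs. As a rough illustration of how an average quarter-on-quarter percentage change is calculated, a minimal Python sketch is given below. The visit counts in it are hypothetical and are not the Epworth figures.

```python
def quarterly_changes(counts):
    """Percentage change from each quarter to the next."""
    return [
        (current - previous) / previous * 100
        for previous, current in zip(counts, counts[1:])
    ]


def average_change(counts):
    """Mean of the quarter-on-quarter percentage changes."""
    changes = quarterly_changes(counts)
    return sum(changes) / len(changes)


# Hypothetical quarterly visit counts, for illustration only.
visits_per_quarter = [1000, 1200, 1500, 1485]

changes = quarterly_changes(visits_per_quarter)
mean_change = average_change(visits_per_quarter)
```

A negative entry in `changes` corresponds to a quarter in which visits fell, as the report observed between Q1 2018 and Q2 2018.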
Daily patient visits were explored further, showing that Mondays, Saturdays and Sundays
are the busiest days (see Appendix A – Figure 4). The timing of patient arrivals was also
explored: from 8 AM to 9 PM every day, patient visits exceed the average daily patient
visit (see Appendix A – Figure 5).
Patients categorised as triage 4 make up the largest group visiting the emergency
department, followed by triage 3 and then triage 5 patients (see Appendix A – Figure 6).
The length of patient stay by ward was explored and showed that patients in the EG5DO
ward stay for an average of 2,025 minutes (see Appendix A – Figure 7).
The employee data was also explored, and it was found that not all employee roles are
present throughout the year. For example, the Medical Team Leader was only present
during Q2 and Q3 of 2017 (see Appendix A – Figure 8). Lastly, total working hours were
examined: there was an average increase of 6.48% in total working hours for employees
under Department 76 and an average increase of 49.7% for Department 78 (see Appendix A –
Figure 9).
Secondary Exploratory Analysis
A secondary exploratory analysis was conducted (see Appendix B – Figures 10 to 15) to
gain a further understanding of the data. Some of the earlier explorations were refined in
this section to connect them more closely to a solution for the emergency department
problem.
The further exploration provided insight into the pattern of monthly patient visits. A
seasonal pattern was observed in which patient visits increase during March, May, August
and December (see Appendix B – Figure 10). Patient arrivals were also explored on an
hourly basis, and increases were observed at 10 AM, 1 PM, 4 PM and 7 PM (see Appendix B –
Figure 11). A comparison of the number of patients in each triage category shows that most
patients fall under triage 4, with twice as many patients as triage 3 (see Appendix B –
Figure 12).
Descriptive statistics were produced for the numerical variables in the patient data (see
Appendix B – Figure 13). The median length of stay in the emergency department is 175
minutes, and the most frequently observed length of stay is 150 minutes. The bottom 25% of
patients stayed in the emergency department for up to 100 minutes, and the top 25% stayed
for 266 minutes or more.
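The descriptive statistics in this report came from Microsoft Excel. For readers who want to reproduce the same kind of summary elsewhere, a minimal Python sketch using only the standard library is shown below. The sample lengths of stay are hypothetical, and the choice of the "inclusive" quartile method is an assumption, since the report does not state which quartile definition was used.

```python
import statistics


def describe(lengths_of_stay):
    """Return quartiles and mode for a list of lengths of stay (minutes)."""
    q1, median, q3 = statistics.quantiles(
        lengths_of_stay, n=4, method="inclusive"
    )
    return {
        "q1": q1,          # 25% of stays are at or below this value
        "median": median,  # middle value
        "q3": q3,          # 25% of stays are at or above this value
        "mode": statistics.mode(lengths_of_stay),  # most frequent value
    }


# Hypothetical lengths of stay in minutes, for illustration only.
sample = [100, 150, 150, 175, 200, 266, 300]
summary = describe(sample)
```

The median and mode are reported separately because, for a right-skewed variable such as length of stay, the most frequent value can sit well below the middle value.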
The average count of employees was compared with the average count of patient visits on
an hourly basis (see Appendix B – Figure 14). The figure shows that most employees start
their shift at 7 AM or 8 AM, while some start at 1 PM, 4 PM, 5 PM or 6 PM. The aim of this
visualisation is to show whether the number of employees can cater for patient arrivals
at any given hour. The count of employees per month was also observed: it peaked in
October in 2016, in August in 2017 and in May in 2018 (see Appendix B – Figure 15).
Based on these explorations, the group reasoned that predicting patient length of stay
would help determine the optimal staffing requirement, since the longer a patient stays
in the emergency department, the greater the need for staff to care for patients. The
predictive model is created in the next part of this study.
Advanced Analysis
After the exploratory analysis, an advanced analysis was conducted. In this part of the
study, patient length of stay is predicted; matching the predictions with employee hours
and availability is intended to provide an optimal staffing requirement. The cleaned
dataset was imported into RapidMiner, where the predictive model was created.
Dataset: Variables of Interest
The variables of interest (see Appendix C – Figure 16) are Arrival to Triage, Triage
Minutes, Triage to Seen Minutes, Arrival, Arrival Mode, Arrived Day, ED Length of Stay
from Arrival, ED Length of Stay from Dr Seen By Time, Month, Patient Age, Patients,
Repeat Patient, Triage Category.
The Arrival variable was derived from the time patients arrived at the emergency
department. It has three levels: patients who arrived from 7:00 AM to 12:00 PM were
tagged as morning, those who arrived from 12:01 PM to 6:00 PM as afternoon, and those who
arrived from 6:01 PM onwards as evening. This reduces the number of time categories,
instead of having 24 levels representing each hour.
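The derivation of the Arrival variable can be illustrated with a short Python sketch. The cut-off times follow the description above. The report does not say how arrivals between midnight and 6:59 AM were tagged, so the sketch assumes they fall under evening; that choice is an assumption, not something confirmed by the source.

```python
from datetime import time


def arrival_period(arrival: time) -> str:
    """Map an arrival time to Morning, Afternoon or Evening."""
    if time(7, 0) <= arrival <= time(12, 0):
        return "Morning"
    if time(12, 0) < arrival <= time(18, 0):
        return "Afternoon"
    # Assumption: everything else, including 12:00 AM to 6:59 AM,
    # is treated as Evening.
    return "Evening"


# Hypothetical arrival times, for illustration only.
examples = [time(9, 30), time(12, 1), time(18, 0), time(23, 15), time(3, 0)]
periods = [arrival_period(t) for t in examples]
```

Collapsing 24 hourly levels into three keeps the number of dummy variables small once the attribute is converted to numerical form.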
The Patients variable was derived from the Discharge Location variable, in which patients
were tagged as admit to ward, transferred, discharged to home and other categories. The
Patients variable has two levels: Admitted and Discharged. The Admitted category contains
all admissions and transfers, while the Discharged category contains all discharges, did
not register, did not wait and not specified.
The Repeat Patient variable was derived from the uniqueness of the patient ID. It has two
levels: repeat patients are tagged as ‘Yes’, and patients with a unique patient ID are
tagged as ‘No’, that is, considered non-repeat patients.
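The two derived variables described above, Patients and Repeat Patient, can be sketched in Python as follows. This is only an illustration: the discharge location labels and patient IDs are hypothetical, and the exact label text in the Epworth data may differ.

```python
from collections import Counter

# Hypothetical labels for the discharge locations that count as Admitted.
ADMITTED_LOCATIONS = {"admit to ward", "transferred"}


def patient_outcome(discharge_location: str) -> str:
    """Collapse a discharge location into Admitted or Discharged."""
    if discharge_location.strip().lower() in ADMITTED_LOCATIONS:
        return "Admitted"
    # Discharges, did not register, did not wait and not specified.
    return "Discharged"


def repeat_flags(patient_ids):
    """Tag each visit 'Yes' if its patient ID occurs more than once."""
    counts = Counter(patient_ids)
    return ["Yes" if counts[pid] > 1 else "No" for pid in patient_ids]


# Hypothetical visits, for illustration only.
ids = ["P1", "P2", "P1", "P3"]
flags = repeat_flags(ids)
outcomes = [patient_outcome(loc) for loc in ["Admit to Ward", "Did Not Wait"]]
```

In this sketch every visit by a patient whose ID appears more than once is flagged, including the first visit.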
Data Modelling
After the dataset was stored as data for modelling, the Select Attributes operator was
added to the design to select the variables of interest for predicting patient length of
stay. The Set Role operator was used to identify the target variable, ED Length of Stay
from Arrival. All categorical variables were then converted to numerical variables using
the Nominal to Numerical operator, with dummy coding set as the coding type.
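Dummy coding was performed by RapidMiner in this study. To show what the transformation does, a minimal Python sketch is given below. It is not the RapidMiner implementation; the function name and the optional `drop_first` reference level are illustrative assumptions.

```python
def dummy_code(name, values, drop_first=False):
    """Return one dict of 0/1 indicator columns per input value.

    Column names follow the 'Attribute = Level' pattern used in the
    coefficient table of this report.
    """
    levels = sorted(set(values))
    if drop_first:
        levels = levels[1:]  # the first level becomes the reference category
    return [
        {f"{name} = {level}": int(value == level) for level in levels}
        for value in values
    ]


# Hypothetical values, for illustration only.
rows = dummy_code("Arrival", ["Morning", "Evening", "Morning"])
```

Each categorical level becomes its own 0/1 column, which is why the regression output lists one coefficient per level rather than one per attribute.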
The Correlation Matrix operator was then used to determine the correlation between all
attributes. Attributes with weights greater than 0.5 were selected for modelling (see
Appendix C – Figure 17). Multi-collinear variables such as Arrival Mode = Ambulance (AV)
and ED Length of Stay from Dr Seen By Time were removed from modelling because they could
lead to misinterpretation of the results. A screenshot of this process can be seen in
Appendix C – Figure 18.
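The attribute weights were produced by RapidMiner, and the report does not state exactly how they were calculated. As a hedged illustration of threshold-based selection, the sketch below assumes that the weight of an attribute is the absolute Pearson correlation between that attribute and the target. The column names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def select_by_correlation(features, target, threshold=0.5):
    """Keep attributes whose absolute correlation with the target
    exceeds the threshold (assumed definition of 'weight')."""
    return [
        name
        for name, column in features.items()
        if abs(pearson(column, target)) > threshold
    ]


# Hypothetical columns, for illustration only.
features = {"age": [1, 2, 3, 4], "noise": [1, -1, -1, 1]}
length_of_stay = [2, 4, 6, 8]
selected = select_by_correlation(features, length_of_stay)
```

The same `pearson` function applied between two predictors would flag multi-collinear pairs, which is the reason given above for removing some attributes.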
A Linear Regression model was chosen because the target, patient length of stay in the
emergency department, is a numerical variable. A screenshot of the process can be seen in
Appendix C – Figure 19. Cross-validation with 20 folds was applied in training and
validating the model.
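Cross-validation in this study was run through RapidMiner. To illustrate the idea, the Python sketch below fits a least-squares line on all folds but one, scores it on the held-out fold, and averages the error across folds. It is a simplification under stated assumptions: it uses a single predictor rather than the full set of attributes, and it assigns rows to folds by position rather than by the sampling RapidMiner applies.

```python
import math


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def k_fold_rmse(x, y, k):
    """Average RMSE over k folds; fold i holds out rows where index % k == i."""
    scores = []
    for fold in range(k):
        train = [i for i in range(len(x)) if i % k != fold]
        test = [i for i in range(len(x)) if i % k == fold]
        intercept, slope = fit_simple_ols(
            [x[i] for i in train], [y[i] for i in train]
        )
        squared_errors = [(y[i] - (intercept + slope * x[i])) ** 2 for i in test]
        scores.append(math.sqrt(sum(squared_errors) / len(squared_errors)))
    return sum(scores) / len(scores)


# Hypothetical data with an exact linear relationship, for illustration only.
ages = list(range(40))
stays = [3 + 2 * a for a in ages]
average_rmse = k_fold_rmse(ages, stays, k=20)
```

Because every row is held out exactly once, the averaged error reflects performance on data the model did not see during fitting.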
Linear Regression Model Evaluation
After running the regression model, the intercept and the coefficient of each variable
were generated. The table below shows the coefficients of the predictors.

Variable                        Coefficient
Intercept                           255.603
Arrived Day = Sunday                 -1.232
Arrived Day = Tuesday               -10.497
Arrived Day = Saturday               10.375
Arrived Day = Friday                  9.749
Arrived Day = Wednesday              -6.687
Arrived Day = Thursday                7.597
Arrival Mode = Others               -65.874
Arrival Mode = GP Referral           36.571
Arrival Mode = Non-Urgent            17.615
Arrival Mode = Ambulance (MAS)      -25.646
Arrival Mode = Code Blue            -36.666
Arrival = Evening                   -44.83
Arrival = Morning                    33.601
Arrival = Afternoon                  11.136
Patient Age                           1.011
Triage Category                     -37.273
Month                                -0.128
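To show how the coefficients combine into a prediction, the Python sketch below applies them to one hypothetical patient. The coefficients are those reported in the table above; the patient profile is invented for illustration. Levels that do not appear in the table (for example, Arrived Day = Monday) are assumed to contribute nothing to the prediction, which is an assumption about how the model was encoded.

```python
INTERCEPT = 255.603

# Coefficients as reported in the table above.
COEFFICIENTS = {
    "Arrived Day = Sunday": -1.232,
    "Arrived Day = Tuesday": -10.497,
    "Arrived Day = Saturday": 10.375,
    "Arrived Day = Friday": 9.749,
    "Arrived Day = Wednesday": -6.687,
    "Arrived Day = Thursday": 7.597,
    "Arrival Mode = Others": -65.874,
    "Arrival Mode = GP Referral": 36.571,
    "Arrival Mode = Non-Urgent": 17.615,
    "Arrival Mode = Ambulance (MAS)": -25.646,
    "Arrival Mode = Code Blue": -36.666,
    "Arrival = Evening": -44.83,
    "Arrival = Morning": 33.601,
    "Arrival = Afternoon": 11.136,
    "Patient Age": 1.011,
    "Triage Category": -37.273,
    "Month": -0.128,
}


def predict_length_of_stay(active_levels, patient_age, triage_category, month):
    """Predicted ED length of stay in minutes for one visit."""
    total = INTERCEPT
    # Levels missing from the table are assumed to add nothing.
    total += sum(COEFFICIENTS.get(level, 0.0) for level in active_levels)
    total += COEFFICIENTS["Patient Age"] * patient_age
    total += COEFFICIENTS["Triage Category"] * triage_category
    total += COEFFICIENTS["Month"] * month
    return total


# Hypothetical patient: 40 years old, triage 4, GP referral,
# arriving on a Saturday morning in March.
predicted = predict_length_of_stay(
    ["Arrived Day = Saturday", "Arrival Mode = GP Referral", "Arrival = Morning"],
    patient_age=40,
    triage_category=4,
    month=3,
)
```

For this hypothetical patient the prediction works out to about 227 minutes. The negative Triage Category coefficient means that, in this model, each step towards a less urgent triage category lowers the predicted length of stay by roughly 37 minutes.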

The performance of the linear regression model was recorded after running it. The model
achieved an average squared correlation of 19.5%, meaning that 19.5% of the variation in
patient length of stay is explained by the variation in arrival day, arrival mode,
arrival period (morning, afternoon, evening), patient age, triage category and month of
arrival. The root mean squared error averaged 128.86 minutes, which indicates that the
residuals are widely spread.
The predictive ability of the model is not strong enough to be reliable. However, with
better pre-processing of the data and the use of ensembles, the model is expected to
produce smaller errors and better predictions of patient length of stay.
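The two performance measures quoted above were reported by RapidMiner. For reference, a minimal Python sketch of how squared correlation and root mean squared error are calculated from actual and predicted values is given below. The sample values are hypothetical and unrelated to the model results.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def squared_correlation(actual, predicted):
    """Square of the Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_a, mean_p = sum(actual) / n, sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_a * var_p)


# Hypothetical lengths of stay in minutes, for illustration only.
actual_stay = [100, 150, 200, 250]
predicted_stay = [110, 140, 210, 240]

error = rmse(actual_stay, predicted_stay)
r_squared = squared_correlation(actual_stay, predicted_stay)
```

RMSE is expressed in the same unit as the target, so an RMSE of 128.86 minutes can be compared directly with the median stay of 175 minutes reported earlier.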

RECOMMENDATIONS
Based on the analysis conducted throughout this study, the recommendations are as follows:
1. Since a seasonal pattern exists in which patient visits increase during March, May,
August and December, staffing should be increased in those months.
2. The model should be improved through better pre-processing of the dataset and the use
of ensembles, to increase predictive power and reduce the error in predicting patient
length of stay.
3. The model, if improved, should be used on an hourly basis in a real setting to provide
real-time information on whether additional staff are needed.
GENERAL ANALYTICS ISSUES
Epworth HealthCare provided us with a dataset on the agreement that it would not be shared
with anyone other than members of the group taking the same unit of study. The group
observed the confidentiality agreement in its strictest sense, and the confidentiality of
the data was maintained throughout the completion of this study.
Before the unit team provided the dataset, every member of the group signed the
non-disclosure agreement form. The dataset was imported into two software packages
(Tableau and RapidMiner), after first checking with the unit team that doing so would not
violate the confidentiality agreement.
The dataset used in this study was anonymised and does not provide specific information
about the patients or employees involved.

APPENDICES
APPENDIX A
Initial Exploratory Analysis
Figure 1: Patient Age Analysis
Figure 2: Patient Visit Count – Quarterly
Figure 3: Quarterly Patient Visit Graph

Figure 4: Patient Visit – Daily
Figure 5: Time and Day of Patient Visit Analysis

Figure 6: Triage Analysis
Figure 7: Length of Patient Stay per Ward

Figure 8: Count of Staff Available
Figure 9: Employee Total vs Assigned Hours

APPENDIX B
Secondary Exploratory Analysis
Figure 10: Monthly Patient Visit
Figure 11: Patient Arrival per Day/Hour

Figure 12: Patient Number Against Triage Category per Week
Figure 13: Descriptive Statistics of Numerical Variables from Patient Data

Figure 14: Average Count of Patients and Employees
Figure 15: Count of Employees per Month

APPENDIX C
Predictive Modelling
Figure 16: Variables of Interest
Figure 17: Attribute Weights

Figure 18: Screenshot of Correlation and Selection Process
Figure 19: Linear Regression Model Process
Figure 20: Model Performance

REFERENCES
Calegari, R., Fogliattio, FS., Lucini, FR., Neyeloff, J., Kuchenbecker, RS. & Schaan, BD. 2016, ‘Forecasting Daily Volume and Acuity of Patients in the Emergency Department’, Computational and Mathematical Methods in Medicine, 8, retrieved 20 September 2018, <http://dx.doi.org/10.1155/2016/3863268>
Graff, I. May 2016, ‘Nurse Staffing Calculation in the Emergency Department – Performance-Oriented Calculation Based on the Manchester Triage System at the University Hospital Bonn’, retrieved 10 August 2018, <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4854466/>
Humphreys, S. 2013, ‘Healthcare datasets: ethical concerns’, Br J Gen Pract., <doi: 10.3399/bjgp13X668230>
Jensen K. 2017, Staffing Your Emergency Department Efficiently, Effectively and Safely: Core Concepts, Envision Physician Services, retrieved 20 September 2018, <https://www.envisionphysicianservices.com/campaigns/breakthrough-series/presentationmaterials/presentations/09-staffing-your-ed-core-concepts.pdf>
Jetten, L. & Sharon, S. 2016, ‘Selected Issues Concerning the Ethical Use of Big Data Health Analytics’, Washington and Lee Law Review Online, vol. 72, no. 3, retrieved 23 September 2018, <https://scholarlycommons.law.wlu.edu/cgi/viewcontent.cgi?referer=https://www.google.com.au/&httpsredir=1&article=1037&context=wlulr-online>
Levin, S., France, DJ., Hemphill, R., Jones, IR., Chen, KY., Rickard, D., Makowski, R. & Aronsky, D. 2006, ‘Tracking Workload in the Emergency Department’, Human Factors The Journal of the Human Factors and Ergonomics Society, <DOI: 10.1518/001872006778606903>
Parker, BT. & Marco, C. 2014, ‘Emergency Department Length of Stay: Accuracy of Patient Estimates’, West J Emerg Med., <doi: 10.5811/westjem.2013.9.15816>
Rathley, NK., Obendorfer, D., White, LF., Rebholz, C., Magauran, B., Baker, W., Ulrich, A., Fisher, L. & Olshaker, J. 2012, ‘Time Series Analysis of Emergency Department Length of Stay per 8-Hour Shift’, West J Emerg Med, <doi: 10.5811/westjem.2011.7.6743>
Skinner J., Higbea R., Buer D. & Horvath C. 2018, Using Predictive Analytics to Align ED Staffing Resources With Patient Demand, The Healthcare Financial Management Association, <https://www.hfma.org/Content.aspx?id=59165>