Data Analytics: Research Project

1 Overview

The Hadoop framework is commonly used to store and process large quantities of data in applications where fault tolerance, scalability and throughput are a priority. In this project, you will research various aspects of Hadoop and summarise your findings in a report. Further details of the report requirements are given in Section 2. The marking scheme is given in Section 3.

Certain elements of this project require you to conduct independent research outside of the material covered in class. Where appropriate, you should identify reputable, peer-reviewed sources of information to use. External sources of information must be referenced. Guidelines for reference styles are given in Section 2.3.

Your project report must be submitted by 17:00 Greenwich Mean Time (GMT), February 21, 2018. Please note that, as college policy strictly prohibits plagiarism, all reports must be processed using Turnitin after submission.

2 Requirements

2.1 Core requirements

To complete this project, you must write a structured report in the style of an academic paper. Your report should be approximately 3000 words long (±10%), excluding references and captions, and provide concise answers to the following questions:

• How is Hadoop architected? What are the purposes of the Hadoop Distributed File System (HDFS), Yet Another Resource Negotiator (YARN) and MapReduce?
• How is fault tolerance achieved in Hadoop applications?
• How are scalability and throughput achieved in Hadoop applications?

2.2 Additional research

Your report should also include some additional research into a Hadoop-related topic of your choice, e.g.

• A feature of Hadoop not addressed above.
• A drawback of Hadoop addressed by another technology.
• A case study detailing the application of Hadoop to a real-world data problem.

While the choice of topic is your own, you must ensure that the discussion is sufficiently deep (see Section 3).

2.3 Referencing style

You may freely reference but not copy material from external sources. If you reference material from an external source, you must cite it in a consistent and accepted style (e.g. IEEE, Chicago, MLA). Wikipedia and personal blogs are not acceptable sources for citation. Instead, you should try to identify reputable and peer-reviewed sources of information.

3 Marking scheme

The project report will be graded as follows:

• Research into the core aspects of Hadoop, as specified in Section 2.1: 60%.
• Additional research into a Hadoop-related topic of your choice: 25%.
• General presentation of the report, including layout, diagrams and references: 15%.

Overall, this project is worth 30% of the marks for this module.
