
Review: An Evaluation of Major Threats in Cloud Computing Associated with Big Data

Kamalpreet Kaur, Ali Syed, Azeem Mohammad, Malka N. Halgamuge
School of Computing and Mathematics, Charles Sturt University, Melbourne, Victoria 3000, Australia
e-mail: [email protected], [email protected], [email protected]

Abstract: In today’s corporate society, the productivity, stability, and management of an organization rely upon the power of databases. Most organizations outsource their databases in the form of big data and transfer them into the cloud. Although cloud computing brings many benefits to an organization, its security risks remain a major barrier to widespread adoption. This raises a critical question: is information secure in the cloud? Given this uncertainty, the primary aim of this study is to identify and describe the most vulnerable aspects of security in the cloud environment through content analysis, and to highlight and evaluate gaps in the literature in order to draw scholarly attention to them. The identified gaps are evaluated to answer this question and to suggest possible solutions. This research will inform both vendors and users about security issues that have been heightened by recent advancements and growing demand, and that have been pointed out as needing improvement. The study has reviewed literature in the field over a span of six years and concludes that the security issues in cloud computing associated with big data must be taken into account by security practitioners when assessing the needs of service providers. This study has found that the cloud environment is an innovation, and that the blend of parallel computing and cloud computing can offer various advantages.
The cloud is suitable for many kinds of applications with different needs, provided that the expense of modifying applications is weighed together with the cost of setting up and maintaining cloud computing. The content analysis has also found that a single management solution for secure big data after its integration with the cloud is yet to be designed.

Keywords: Big Data, MapReduce, Cloud computing, RDBM

I. INTRODUCTION

Big data plays an essential role in the business world, as the secure storage of significant amounts of complex data is one of the most crucial aspects of corporate operations. Big data is a term that describes huge amounts of structured, semi-structured, and unstructured data. The demand for innovative, cost-effective methods of processing such data, in order to automate processes and enable decision making, has sparked scholarly interest. Big data is characterized by the 3Vs: high volume, high velocity, and high variety of the types of information that need to be mined. Although no specific quantity defines big data, it can be measured in petabytes or exabytes. According to Peter Skomoroch, before big data technologies such as MapReduce and Hadoop existed, processing large datasets was very difficult, error-prone, and time-consuming; with these technologies, dataset processing is much easier than with traditional processing tools. Many organizations collect this complex data by conducting worldwide surveys to improve their decision-making process, which is extremely important for sustaining a healthy future for their business [1]. However, storing big data in the cloud comes with security challenges: content owners do not know where their data is kept and therefore do not trust the cloud.

Cloud computing is a set of IT services provided to end users over the internet, allowing them to scale their service requirements up or down.
Cloud computing is the fastest growing sector of the IT industry, as its capacity can be increased dynamically without investing in new infrastructure or licensing software. Cloud computing has many advantages, such as cost efficiency, storage capacity, backup and recovery, quick deployment, and easy access to information from anywhere. Despite its advantages, it also has disadvantages that users need to be aware of before adopting it, for instance technical issues (network or connectivity problems), security weaknesses, and exposure to attacks (hacking). Its pros outweigh its cons, provided that the cloud computing technology is secure enough that it does not leak stored sensitive information [20]. Moreover, cloud computing provides virtual resources to consumers via the internet so that they can use cloud infrastructure, services, and software. This is cheaper than other computing systems, with zero maintenance cost for the consumer. The service provider is responsible only for the availability of the services, which are also known as “IT on demand” or utility computing [1].

Google introduced MapReduce as a programming framework; its open-source implementation, Hadoop, stores data in the Hadoop Distributed File System (HDFS) [2]. In MapReduce, a vast amount of data is first converted into tuples (the map step); these tuples are then taken as input and combined into a smaller set of tuples (the reduce step). In this way big data can be managed; at the same time, it can still create security problems related to business growth and data monitoring. MapReduce alone is not sufficient, as it lacks protection for sensitive data, and confidentiality becomes a serious problem. This paper also discusses how techniques built on the Hadoop framework can be used to address this issue.
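The map and reduce steps described above can be illustrated with a minimal, single-machine sketch. It is not Hadoop code and does not reproduce any cited system; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample records are hypothetical, chosen only to show how data becomes (key, value) tuples that are then reduced to a smaller set of tuples.

```python
from collections import defaultdict


def map_phase(records):
    """Map step: turn every input record into (key, value) tuples."""
    for record in records:
        for word in record.lower().split():
            yield (word, 1)


def shuffle(pairs):
    """Group the values of all tuples that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: collapse each group into one smaller tuple (key, total)."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(records):
    """Run map, shuffle, and reduce in sequence on one machine."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    # Hypothetical sample input; a real job would read blocks from HDFS.
    print(word_count(["cloud data cloud", "big data"]))
```

In a real cluster the map and reduce functions run in parallel on many nodes, which is why every tuple leaving a mapper is a potential leak of sensitive data if the job code is untrusted.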
Although there are many security and privacy issues, they can still be addressed by using encryption, multi-factor authentication, security hardware, and applications.

II. OUTSOURCING DATA AND APPLICATIONS

Cloud computing provides access to data anywhere at any time; the challenge is to verify that only authorized entities gain access. Because of significant challenges such as monitoring, cloud computing struggles to reach its full potential. Customers have to rely on a third party, so cloud providers need to make the service secure and communicate its reliability to their customers. In a cloud environment, customers make decisions about complex data and platforms in a way that no earlier computing model required [3], [5], and there is no technical mechanism that prevents cloud providers from accessing customers’ data unethically [2]. Therefore, it is important to build a new layer that supports a contract negotiation phase between the cloud provider and end users. A combination of technical and non-technical mechanisms is needed to gain customers’ trust in the cloud environment [1], [2].

III. SECURITY AND PRIVACY CHALLENGES

A service level agreement (SLA) is a contract that is used to build a new layer supporting a contract negotiation phase between the cloud provider and customers. However, it is still very difficult to negotiate security, privacy, and trust. Although there are ways to assure customers that the provider’s services are subject to a contract agreement, doing so remains difficult.
(i) Access control: Credential-based access control must be in place for access to data and services, so that customers’ provenance information is not leaked.
(ii) Trust management: To manage trust among customers, the service provider needs to build new access control policies rather than simply telling customers that their data will never be breached.
(iii) Authentication and identity management: Identity management (IDM) mechanisms are used to authenticate users and services based on their credentials and characteristics.
(iv) Privacy and data protection: Privacy is the biggest issue among the challenges discussed so far.
Most companies are not comfortable storing their confidential data outside their premises. When data is outsourced to a shared infrastructure, customers’ background information might be at risk. This information can be used for many purposes, such as auditing, traceback, and history-based access control. Balancing the use of clients’ personal information against their privacy is therefore the biggest drawback of the cloud environment [10], [15], [17], [25].

IV. MATERIAL AND METHOD

This study uses content analysis: it interprets trends in data sourced from the published literature in order to trace developments and measure progress on this complex issue. The research follows a historical-logical method, which assists in interpreting the findings of previous research on security in cloud computing. Gathering information and opinions from experts and evaluating them against the retrieved data allows the study to shed light on areas for further development.

V. RESULTS AND DISCUSSION

The cloud environment has different domains, each with different security and privacy challenges that need to be considered before migrating data to the cloud. The security challenges are categorized into different levels, including the network level, the data level, and the authentication level. The protection of sensitive information is a major concern. To enhance security, it is most important to provide access control, authentication, and authorization for the data stored in the cloud. Information security has three main elements, known as CIA (Confidentiality, Integrity, and Availability), and it is the responsibility of the cloud provider to supply these basic elements to customers.
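Two of the controls named above, authorization and integrity, can be sketched with the Python standard library. This is an illustrative sketch only, not a mechanism taken from any of the reviewed papers: the role table `PERMISSIONS` and the functions `is_authorized`, `make_tag`, and `verify` are hypothetical names, and the example assumes the customer keeps a secret key that the cloud provider never sees.

```python
import hashlib
import hmac

# Hypothetical role-to-permission table used only for this illustration.
PERMISSIONS = {
    "analyst": {"read"},
    "admin": {"read", "write"},
}


def is_authorized(role: str, action: str) -> bool:
    """Authorization: allow an action only if the role explicitly grants it."""
    return action in PERMISSIONS.get(role, set())


def make_tag(key: bytes, data: bytes) -> str:
    """Integrity: compute an HMAC-SHA256 tag before the data leaves the premises."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(key: bytes, data: bytes, tag: str) -> bool:
    """Recompute the tag on retrieval and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, data), tag)


if __name__ == "__main__":
    key = b"customer-held-secret-key"      # assumed to stay with the customer
    record = b"patient=42;diagnosis=..."   # placeholder payload
    tag = make_tag(key, record)
    print(is_authorized("analyst", "write"))   # role lacks the write permission
    print(verify(key, record, tag))            # unchanged data passes the check
    print(verify(key, record + b"x", tag))     # modified data fails the check
```

A tag of this kind lets a customer detect unauthorized modification of outsourced data, but it does not provide confidentiality; that would require encrypting the record as well.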
Figure 1 shows that confidentiality (31%), integrity (24%), and availability (19%) are the most threatened attributes compared with other attributes, including usability, accountability, security, and reliability. Data security, integrity, and recovery are the biggest risks in cloud computing. It is very important to know what happens to the stored data and, if the cloud fails, how the applications and backups are recovered [22]. Cloud computing technology uses third-party networks to access cloud services, so there are many risks in accessing services in the cloud. Different cloud providers’ services need to be compared before moving to the cloud. These services are becoming more popular as they are convenient, affordable, and provide storage space [25]. Moreover, cloud systems face various attacks, such as loss of data, denial of service, unauthorized access to data, and unauthorized data modification [24].

Bleikertz et al. [8] state that highly flexible but very complex cloud computing services are configured by users through a web interface. Misconfiguration by users can lead to security vulnerabilities. Their work uses a methodology that helps to understand issues from both the user side and the server side, and Amazon’s Elastic Compute Cloud (EC2) was chosen for the evaluation. The strength of this work is that it provides a robust analysis of vulnerabilities and security attacks, which helps vendors to enhance their security policies. The weakness is that it is specific to Amazon; it would contribute more if it were general [2], [5].

Figure 1. The percentage of compromised attributes in cloud environment associated with big data

Lombardi et al. [9] discuss a new architecture, the Transparent Cloud Protection System (TCPS), which aims to improve the security of cloud resources in response to integrity protection problems in the cloud environment. According to the authors, TCPS can be used to monitor the integrity of guests while preserving transparency and virtualization. To manage system images, TCPS uses image filters and scanners that detect malicious images and so prevent security vulnerabilities and attacks. The strength of this work is that it proposes a mechanism providing enhanced security, transparency, and intrusion detection. The limitation is that the authors have not validated their work, nor have they deployed it in a professional cloud computing scenario [2], [5].

In MapReduce, a massive amount of data is converted into tuples; these tuples are provided as input to the reduce step, which combines them into a smaller set of tuples [28], [29]. In this way big data can be managed, although security problems remain, because data monitoring and business continuity can cause glitches. MapReduce alone is not sufficient because it lacks protection for sensitive data [4]. Airavat has therefore been proposed as a way to reduce security issues in the cloud. Airavat is a MapReduce-based system that provides strong security and privacy for sensitive data (healthcare records, shopping transactions, etc.). It is a novel integration of mandatory access control and differential privacy [28].
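The differential privacy side of this idea can be sketched as follows: each value reaching the reducer is clamped to a declared range, and Laplace noise scaled to that range is added to the reducer's sum. This is a simplified illustration under stated assumptions, not Airavat's implementation; the functions `clamp`, `laplace_noise`, and `private_sum`, the range bounds, and the privacy parameter `epsilon` are all hypothetical names introduced here.

```python
import random


def clamp(value: float, lo: float, hi: float) -> float:
    """Force a mapper output into the declared range [lo, hi]."""
    return max(lo, min(hi, value))


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_sum(values, lo: float, hi: float, epsilon: float,
                rng: random.Random) -> float:
    """Reducer-side sum with noise calibrated to one record's maximum influence."""
    clamped = [clamp(v, lo, hi) for v in values]
    # Adding or removing one record changes the sum by at most this amount.
    sensitivity = max(abs(lo), abs(hi))
    return sum(clamped) + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random()
    # Hypothetical per-patient costs, each assumed to lie in [0, 1000].
    costs = [120.0, 640.0, 75.5, 980.0]
    print(private_sum(costs, lo=0.0, hi=1000.0, epsilon=0.5, rng=rng))
```

A smaller `epsilon` gives stronger privacy but a noisier answer, which is the accuracy cost that any noise-based approach has to accept.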
Airavat runs MapReduce computations in parallel on clusters. Take healthcare data as an example: if this sensitive information leaks to a third party such as an insurance company, the company can learn about patients’ medical conditions, which could result in higher premiums for them [28], [29]. Protecting data from breaches of confidentiality therefore requires strong security, which makes Airavat well suited to securing confidentiality. With this technique, the untrusted MapReduce program is sent to Airavat so that the data it processes can be protected, as seen in Figure 1. Without such protection, the output of a MapReduce computation could leak information. When implemented on Hadoop, Airavat uses Security-Enhanced Linux (SELinux) to add mandatory access control. This technique provides strong security and privacy by preventing leakage of sensitive data; it is the first system that combines access control with differential privacy without requiring untrusted code to be audited. The weakness of this system is that it supports only a small set of reducers and must generate enough noise to assure the differential privacy of values [28], [29].

Cloud computing is a valuable tool. However, its risks still need to be understood and managed in depth by organizations before any agreement is executed. Fortunately, there are mitigation strategies that cloud customers can follow to reduce the level of risk [26]. Gatewood [27] suggests determining a vendor’s internal audit process: how frequently the vendor is audited by external organizations, what standards the vendor is held to, and whether it is open to being audited regularly. Maintaining compliance with security policies and regulatory requirements can be hard to demonstrate.
Gatewood warns that, as vendors rush to develop and deploy cloud-based solutions, they may fall short on including the essential records management controls. Moreover, investigating various security features [30]-[32] could be an interesting path to explore in the future to protect big data [33].

Researchers have discovered many issues in the cloud environment and have started working to minimize them. There are threats in using utility computing, and some significant results from the literature correspond to those given in Table I [2]. The table summarizes the different techniques used by different studies to address the security and privacy of big data in cloud computing. The remaining papers identify further security and privacy issues that fall outside the argument of this study.

TABLE I. THE COMPARISON OF DIFFERENT CLOUD SECURITY TECHNIQUES

| No | Author | Technique/method used | Description | Problem identified | Result |
|---|---|---|---|---|---|
| 1 | Inukollu et al. (2014) [2] | Virtual Computing Laboratory (VCL) technique | VCL is an open source implementation that provides NYU students with virtual access to software applications that are academically relevant. | End-to-end service insulation via VPN, SSH tunnels, and VLANs | A theoretical concept, so it is not widely proposed. It could have contributed more if practical aspects had been discussed in this work. |
| 2 | Zhou et al. (2014, 2016, 2009) [4], [17], [18] | Declarative Secure Distributed Systems (DS2) | This technique is used to explore the security premises of secure data sharing between the apps hosted on the clouds. | Data management issues: system analysis and forensics; distributed query processing; query correction assurance | Data-centric security provides secure query processing, efficient end-to-end verification of data, and system analysis and forensics. |
| 3 | Rongxing et al. (2010) [3] | Bilinear pairing technique | This system uses five steps to control unauthorized user access and resolve disputes over big data: Setup, key generation, AnonyAuth, AuthAccess, and Provenance tracking. | Data forensics and post examination | Difficult to implement because it is based on a complex mathematical model. However, this system pushes cloud computing towards full recognition by the public. |
| 4 | Bleikertz et al. (2010) [8] | Amazon’s EC2 | A specialized query policy language is applied to the security analysis model and evaluated in a practical domain. The security analysis was implemented in Python and evaluated on Amazon EC2. | Reachability audit of Amazon security graphs and groups | Provides a robust analysis of security attacks and vulnerabilities to enhance security policies. However, it is specific to Amazon and would contribute more if it were general. |
| 5 | Lombardi et al. [4], [9] | Transparent Cloud Protection System (TCPS) | TCPS can be used to monitor the integrity of guests while preserving transparency and virtualization. | Cloud security vulnerabilities and attacks | TCPS gives enhanced security, transparency, and intrusion detection; however, the authors have not validated their work, nor have they deployed it in a professional cloud computing scenario. |
| 6 | Gupta et al. (2014, 2012) [4], [6] | XML document cryptography and digital signature technique | Queries are processed according to the policy provided by the cloud provider, instead of processing all queries. | The information size increased in XML format, which created integrity issues in the government, health, and finance areas because of the mode of content delivery. | An XML data document is used to provide a secure environment for third-party access control, which introduces another trusted layer of security to the model. |
| 7 | Narwal et al. (1985, 2015) [6], [7] | Kerberos encryption technology based on the Needham-Schroeder protocol | Kerberos is an authentication system that uses cryptographic tickets to avoid sending plain text passwords over the wire. | Kerberos encrypts a considerably shorter one-way hash. | This technique provides secure authentication in an open environment, but it is very costly in CPU power and time. |
| 8 | Kevin Hamlen (2014) [2] | Cryptographic | Sensitive data can be stored in the database in encrypted form rather than as plain text. | Managing private and public keys | Even if an intruder obtains the database, they cannot get the actual data because it is encrypted. |
| 9 | Roy et al. (2010) [28] | Airavat MapReduce-based system | This technique provides strong security and privacy by preventing leakage of sensitive data using an access control mechanism. | It supports only a small set of reducers. Airavat generates enough noise to assure the differential privacy of values. | Airavat is the first system that combines access control with differential privacy without auditing untrusted code. |

VI. CONCLUSION

This study has reviewed selected publications from different sources to identify and evaluate gaps regarding security issues in cloud computing associated with big data. Cloud computing is an emerging technology that most organizations are adopting, and it is constantly evolving. It has numerous benefits; nonetheless, security and privacy issues are paramount for controlling sensitive data, and the many technologies combined in cloud computing, such as databases, networking, virtualization, and operating systems, are themselves problematic. Using different studies, we have identified some major research problems that should be taken into account to ensure the security, and thus the success, of big data. Moreover, a single management solution for secure big data after its integration with the cloud is yet to be designed. In RDBM (Relational Database Management), protecting the system from attacks while utilizing its resources and mitigating risks is only one important problem. Security solutions are still being developed, and even leading providers such as Amazon and Google are facing security issues. Therefore, the decision to adopt cloud computing is still in progress and should be based on the ratio of benefits to threats and risks.

REFERENCES

[1] B. Matturdi, X. Zhou, S. Li and F. Lin, “Big Data security and privacy: A review”, China Communications, vol. 11, no. 14, pp. 135-145, 2014.
[2] V. Inukollu, S. Arsi and S. Rao Ravuri, “Security Issues Associated with Big Data in Cloud Computing”, International Journal of Network Security & Its Applications, vol. 6, no. 3, pp. 45-56, 2014.
[3] Rongxing et al., “Secure Provenance: The Essential Bread and Butter of Data Forensics in Cloud Computing”, ASIACCS ’10, Beijing, China.
[4] A. Gholami and E. Laure, “Big Data Security and Privacy Issues in the CLOUD”, International Journal of Network Security & Its Applications, vol. 8, no. 1, pp. 59-79, 2016.
[5] F. Shaikh and S. Haider, “Security threats in Cloud Computing”, Int. Conf. for Internet Technology and Secured Transactions (ICITST), Abu Dhabi, pp. 214-219, 2011.
[6] P. Hoving and J. Essén, “Minutes from the first meeting of TC 11, security and protection in information processing systems”, Computers & Security, vol. 4, no. 2, pp. 149-152, 1985.
[7] A. Narwal and S. Tomar, “Kerberos Protocol: A Review”, IJERT, vol. 4, no. 04, 2015.
[8] S. Bleikertz et al., “Security Audits of Multi-tier Virtual Infrastructures in Public Infrastructure Clouds”, 2010.
[9] Y. Zhang and Y. Zhou, “TransOS: a transparent computing-based operating system for the cloud”, International Journal of Cloud Computing, vol. 1, no. 4, pp. 287, 2012.
[10] T. Haeberlen and L. Dupré, Cloud Computing Benefits, risks and recommendations for information security, 2nd ed. Rev B, 2012.
[11] K. Choo, “Legal Issues in the Cloud”, IEEE Cloud Comput., vol. 1, no. 1, pp. 94-96, 2014.
[12] P. Giri and L. Soames, Intellectual Property Arrangements, 1st ed. 2015.
[13] R. Sandhu and I. Chana, “Cloud Computing Standardization Initiatives: State of Play”, IJ-CLOSER, vol. 2, no. 5, 2013.
[14] A. Gordon, “The Hybrid Cloud Security Professional”, IEEE Cloud Comput., vol. 3, no. 1, pp. 82-86, 2016.
[15] “IEEE Cloud Computing Special Issue on Cloud Security”, IEEE Cloud Comput., vol. 2, no. 5, pp. c2-c2, 2015.
[16] C. Oppenheim, “Legal issues for information professionals X: Legal issues associated with cloud computing”, Business Information Review, vol. 28, no. 1, pp. 25-29, 2011.
[17] W. Zhou, “Towards a Data-centric View of Cloud Security”, 2016.
[18] J. Bayuk, “Data-centric security”, Computer Fraud & Security, vol. 2009, no. 3, pp. 7-11, 2009.
[19] “IEEE Cloud Computing Special Issue on Cloud Security”, IEEE Cloud Comput., vol. 2, no. 5, pp. c2-c2, 2015.
[20] T. Erl, R. Puttini and Z. Mahmood, Cloud computing. 2013.
[21] S. Madria and A. Sen, “Offline Risk Assessment of Cloud Service Providers”, IEEE Cloud Comput., vol. 2, no. 3, pp. 50-57, 2015.
[22] T. Erl, R. Puttini and Z. Mahmood, Cloud computing. 2013.
[23] T. Rodrigues, “Side-by-side comparisons of IaaS service providers”, 2016.
[24] C. Chih and Y. Huang, “An Adjustable Risk Assessment Method for a Cloud System”, IEEE International Conference, vol. 2, no. 6, p. 1, 2015.
[25] C. Pfleeger, Security in Computing. Upper Saddle River, NJ: Prentice Hall PTR, 1997.
[26] T. Betcher, “Cloud Computing: Key IT-Related Risks and Mitigation Strategies for Consideration by IT Security Practitioners”, 2010.
[27] B. Gatewood, “Clouds On the Information Horizon: How To Avoid The Storm”, CRM, vol. 43, no. 4, pp. 32-36, 2009.
[28] I. Roy et al., “Airavat: Security and Privacy for MapReduce”, NSDI, vol. 10, pp. 297-312, 2010.
[29] I. Roy, “Airavat: Security and Privacy for MapReduce”, 2016.
[30] D. V. Pham, A. Syed, A. Mohammad and M. N. Halgamuge, “Threat Analysis of Portable Hack Tools from USB Storage Devices and Protection Solutions”, International Conference on Information and Emerging Technologies, pp. 1-5, Karachi, Pakistan, June 2010.
[31] D. V. Pham, A. Syed and M. N. Halgamuge, “Universal serial bus based software attacks and protection solutions”, Digital Investigation, vol. 7, no. 3, pp. 172-184, 2011.
[32] D. V. Pham, M. N. Halgamuge, A. Syed and P. Mendis, “Optimizing windows security features to block malware and hack tools on USB storage devices”, Progress in Electromagnetics Research Symposium, pp. 350-355, 2010.
[33] V. Vargas, A. Syed, A. Mohammad and M. N. Halgamuge, “Pentaho and Jaspersoft: A Comparative Study of Business Intelligence Open Source Tools Processing Big Data to Evaluate Performances”, Int. Journal of Advanced Computer Science and Applications (IJACSA), vol. 7, no. 10, pp. 20-29, November 2016.
