Advances in technology have enabled companies to accumulate data from several sources, producing big data that was originally meant to be used to extract valuable information for running businesses. Big data now allows companies to build conceptual models that help them not only adapt to new market trends but also understand consumer behavior. These models are used to differentiate the products companies offer so that they match customers' expectations (Alguliyev, Gasimova & Abbaslı 2017, pp. 28-35).
Big data is defined as massive datasets with a large, diverse and complex structure that are difficult to store, analyze and visualize for further courses of action (Sagiroglu & Sinanc 2013, pp. 42-47). These data are produced from numerous sources such as online transactions, emails, scientific data and many more. Big data is commonly characterized by the 3Vs: volume, velocity and variety. Volume signifies the huge amount of data being produced every day; up to zettabytes (10²¹ bytes) or brontobytes (10²⁷ bytes) of data are produced daily.
Velocity is the rate of growth: how quickly the data are collected and the speed at which they are analyzed. Variety refers to the different types of data that can now be used. Generally, big data comes in three formats: structured, semi-structured and unstructured. Structured data are tagged with fixed fields, making them easy to sort. Semi-structured data contain tags that separate data elements but do not follow fixed fields, and unstructured data are free-form, making them hard to analyze (Sagiroglu & Sinanc 2013, pp. 42-47). The focus in the past was on structured data that can be put into tables, but according to Bagiwa (2017, pp. 181-187), 80 percent of the data in the world is now unstructured and cannot easily be put into tables or relational databases.
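The three formats can be illustrated with a small sketch. The records below are hypothetical and only meant to show why structured data sorts easily while unstructured data does not:

```python
import csv
import io
import json

# Structured: fixed fields, fits directly into a table or relational database.
structured = io.StringIO("id,name,amount\n1,Alice,9.99\n2,Bob,4.50\n")
rows = list(csv.DictReader(structured))

# Semi-structured: elements are tagged, but records need not share fields.
semi = json.loads('[{"id": 1, "name": "Alice"}, {"id": 2, "email": "b@x.io"}]')

# Unstructured: free text with no tags; analysis requires parsing or NLP.
unstructured = "Bob emailed support about a refund of $4.50 yesterday."

print(rows[0]["name"])        # uniform fields are directly addressable
print(semi[1].get("name"))    # None: no fixed schema to rely on
```

The structured rows can be queried by field name immediately; the semi-structured records need defensive access because fields vary; the free text would need a separate extraction step before any of its facts become usable.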
Acharjya and P (2016, pp. 511-518) add a fourth V, veracity, which refers to the accessibility and trustworthiness of data. With many forms of big data, quality and accuracy are less controllable, but the sheer volume can compensate for the shortfall.
Challenges in Big Data
Various fields such as health care, public administration, retail, biochemistry and other interdisciplinary scientific research have been collecting big data in recent years. Web-based applications, such as social computing and internet search indexing, gather big data regularly. Big data offers researchers new prospects in knowledge-processing tasks, but opportunities are always accompanied by challenges (Acharjya & P 2016, pp. 511-518). Handling the challenge of analyzing big data requires dealing with computational complexity, information security and computational methods. The challenges of big data analytics can be categorized into four groups: 1. Data collection, storage and analysis; 2. Scalability; 3. Information security; and 4. Real-world applications. These issues are discussed in the sections below.
A. Data Collection, Storage and Analysis
The most significant problem in data collection is the diversity of data sources. Heterogeneous data problems result from the variety, representation and semantics of the data sources. Semantic problems arise when two parties use different methods of data collection. Data representation problems are comparable to semantic problems, as misrepresentation can cause the same information to be expressed as different types of data. Another important challenge is collecting only the data needed for the purpose at hand: the speed and amount of data acquired require immediate choices about what to preserve and what to discard, a process that typically takes huge effort and resources given the nature of big data. A third problem is the transfer of the collected data; due to the volume involved, transfer speed may become a bottleneck in the process (Alguliyev, Gasimova & Abbaslı 2017, pp. 28-35).
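A minimal sketch of the semantics and representation problems described above: two hypothetical sources record the same purchase with different field names, units and date formats, and a per-source mapping step is needed before the records can be compared at all. All field names and values here are invented for illustration:

```python
from datetime import datetime

# Two made-up sources reporting the same purchase differently
# (the heterogeneity problem: names, units, and formats all diverge).
source_a = {"customer_id": "42", "total_usd": "19.99", "date": "2019-08-31"}
source_b = {"custId": 42, "amountCents": 1999, "ts": "31/08/2019"}

def normalize_a(r):
    """Map source A's schema onto one common representation."""
    return {"customer": int(r["customer_id"]),
            "amount_cents": round(float(r["total_usd"]) * 100),
            "date": datetime.strptime(r["date"], "%Y-%m-%d").date()}

def normalize_b(r):
    """Map source B's schema onto the same common representation."""
    return {"customer": int(r["custId"]),
            "amount_cents": int(r["amountCents"]),
            "date": datetime.strptime(r["ts"], "%d/%m/%Y").date()}

# After normalization the two records describe the same fact identically.
assert normalize_a(source_a) == normalize_b(source_b)
```

At big data scale this mapping must be written, validated and maintained for every source, which is exactly why heterogeneous collection is costly.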
Collected data must be stored in some system. Every day a massive amount of data is created by various means, costing companies millions of dollars just to store it. These data are sometimes ignored or deleted because there is not enough storage for them. According to Alguliyev, Gasimova and Abbaslı (2017, pp. 28-35), Switch, a company created solely to help other companies store data, runs server facilities the size of seven football courts, helping Google, Morgan Stanley and others store the data their businesses require. They add that the data storage market has grown to 70 billion dollars a year.
Collected data need to be analyzed and sorted before companies can use them. When dealing with large datasets, data reduction, data selection and feature selection are crucial tasks. This proves to be a challenge for data analysts, as current algorithms may take a large amount of time when dealing with big data. One major challenge is to develop new machine learning algorithms that automate this process while remaining reliable (Acharjya & P 2016, pp. 511-518).
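As one small, concrete instance of the feature selection task mentioned above, the sketch below drops features whose variance across samples is near zero, since a (nearly) constant column carries no information for a downstream model. The data and the threshold are illustrative only, not taken from the cited survey:

```python
from statistics import pvariance

samples = [            # rows = observations, columns = features
    [1.0, 5.0, 0.0],
    [2.0, 5.0, 1.0],
    [3.0, 5.0, 2.0],
]

def select_features(rows, min_variance=0.01):
    """Keep only the columns whose population variance exceeds a threshold."""
    cols = list(zip(*rows))
    keep = [i for i, col in enumerate(cols) if pvariance(col) >= min_variance]
    return keep, [[row[i] for i in keep] for row in rows]

kept, reduced = select_features(samples)
print(kept)     # [0, 2]: the constant middle column is discarded
```

Real pipelines use far richer criteria (correlation with the target, mutual information, model-based importance), but the shape of the problem is the same: decide cheaply which columns are worth keeping before the expensive analysis runs.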
B. Scalability
Another major problem for big data analysis techniques is scalability and security. In recent decades researchers have focused on accelerating data analysis and speeding up processors, following Moore's Law (Acharjya & P 2016, pp. 511-518). Acharjya and P (2016, pp. 511-518) add: "Incremental techniques have good scalability property in the aspect of big data analysis. As the data size is scaling much faster than CPU speeds, there is a natural dramatic shift in processor technology being embedded with increasing number of cores. This shift in processors leads to the development of parallel computing".
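The shift to parallel computing that the quote describes can be sketched in miniature: split a workload into chunks and fan it out to a pool of workers, so that additional cores, rather than a faster clock, provide the scaling. The workload below is illustrative; for genuinely CPU-bound jobs in Python one would swap in `ProcessPoolExecutor` to engage multiple cores, while threads keep this sketch portable:

```python
from concurrent.futures import ThreadPoolExecutor

def sum_of_squares(chunk):
    """The per-worker task: an embarrassingly parallel partial computation."""
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, workers=4):
    # Round-robin split of the data into one chunk per worker.
    chunks = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map each chunk to a worker, then combine the partial results.
        return sum(pool.map(sum_of_squares, chunks))

data = list(range(1_000))
assert parallel_sum_of_squares(data) == sum(x * x for x in data)
```

This split-map-combine shape is the same pattern that frameworks such as MapReduce apply across whole clusters rather than the cores of one machine.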
C. Information Security
When analyzing big data, huge amounts of data are correlated, analyzed and mined for meaningful patterns. Organizations safeguard sensitive information using different guidelines, and a major challenge in analyzing big data is preserving that sensitive information. The security risks associated with big data thus make information security itself a big data analytics problem. To increase the security of big data, techniques such as authentication, authorization and encryption can be deployed. The scale of the network, real-time security monitoring, the variety of devices involved and the lack of intrusion detection systems are some of the security issues big data applications face. This has drawn the attention of information security researchers toward developing multi-level security policy models and prevention systems. Despite extensive research into protecting big data, there is still room for improvement (Acharjya & P 2016, pp. 511-518).
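Two of the techniques named above can be sketched with only the standard library: salted password hashing for authentication, and an HMAC tag so a stored record can be checked for tampering. The keys and records here are invented for illustration; a production system would use a managed key store and a vetted cryptography library:

```python
import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes = None) -> tuple:
    """Derive a salted hash so stored credentials never contain the password."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(hash_password(password, salt)[1], digest)

SECRET = os.urandom(32)  # illustrative; in practice a managed, persistent key

def sign_record(record: bytes) -> bytes:
    """Tag a record so later tampering is detectable."""
    return hmac.new(SECRET, record, hashlib.sha256).digest()

salt, digest = hash_password("s3cret")
assert verify_password("s3cret", salt, digest)
assert not verify_password("wrong", salt, digest)
assert hmac.compare_digest(sign_record(b"row-1"), sign_record(b"row-1"))
```

The hard part at big data scale is not these primitives but applying them to billions of records and thousands of devices without destroying throughput, which is why the text frames security as an analytics problem in its own right.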
D. Real World Application
Finding qualified data scientists who can deal with big data is proving to be a greater challenge for companies than the analytical process itself. Every year companies spend huge sums of money preparing their employees to manage big data. Analysts who can deal with big data are rare in the job market, and even fewer people can comprehend the data and the meaning behind the numbers; most analysts have a hard time recognizing false results in the data (Alguliyev, Gasimova & Abbaslı 2017, pp. 28-35). A study by Manyika et al. (2011) forecasts that "by 2018, the U.S. alone may face a 50 percent to 60 percent gap between supply and requisite demand of deep analytic talent".
In recent years data have been generated at a dramatic pace, and analyzing these data is challenging for the average person. Big data creates challenges from data collection through data storage and analysis. Despite the challenges, companies will continue to use big data for commercial purposes, and these problems will be tackled with various analytical and scientific tools. The reality is that the future depends on big data, and these problems need to be solved for companies to thrive and operate. Companies can hope to attract excellent data analysts with high salaries. Big data has already attracted a great deal of attention, and much work is being done on fundamental problems that could change the way we perceive reality. Technologies that understand and process huge amounts of data in order to interact with humans are increasingly becoming reality and attracting customers.
The amount of global data collected today is expected to double every two years. This abundance of information has no benefit unless it is analyzed, so it is necessary to develop techniques that facilitate big data analysis. These techniques can be implemented through the development of powerful computers leading to automated IT systems. Transforming big data into relevant and useful information is not an easy feat, even for "high performance large-scale data processing, including exploiting parallelism of current and upcoming computer architectures for data mining" (Acharjya & P 2016, pp. 511-518). Moreover, uncertainty is involved in many different forms of these data (Acharjya & P 2016, pp. 511-518).
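The doubling claim compounds quickly, which is worth making explicit: n years of growth multiply today's volume by 2^(n/2). The starting figure of 33 zettabytes below is a made-up round number for illustration, not taken from the cited sources:

```python
def projected_volume_zb(start_zb: float, years: float) -> float:
    """Volume after `years` of growth, doubling every two years."""
    return start_zb * 2 ** (years / 2)

assert projected_volume_zb(33, 2) == 66     # one doubling period
assert projected_volume_zb(33, 6) == 264    # three doublings: 33 * 8
```

Eight-fold growth in six years is the scale that makes storage costs and analysis throughput, not collection, the binding constraints.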
Reduction techniques have been developed because big data are often reduced to include only the characteristics relevant to a specific point of view or area of application. Additionally, machine learning concepts and tools will assist researchers in obtaining meaningful results. Efficient tools must also be developed with the capability to filter through "noisy and imbalance data, uncertainty and inconsistency, and missing values" (Acharjya & P 2016, pp. 511-518).
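A minimal cleaning pass of the kind the quote describes might look as follows. The records, the missing-value policy (discard rather than impute) and the outlier cap are all illustrative choices, not prescriptions from the cited survey:

```python
records = [
    {"age": 34, "income": 52_000},
    {"age": None, "income": 48_000},    # missing value
    {"age": 29, "income": 9_900_000},   # implausible outlier
    {"age": 41, "income": 61_000},
]

def clean(rows, income_cap=1_000_000):
    """Drop records with missing ages and clip implausible incomes."""
    out = []
    for r in rows:
        if r["age"] is None:            # here we discard; imputing is the alternative
            continue
        if r["income"] > income_cap:    # treat the extreme value as noise and clip it
            r = {**r, "income": income_cap}
        out.append(r)
    return out

print(len(clean(records)))  # 3 records survive the pass
```

Every one of these decisions (discard vs. impute, clip vs. drop, where to set the cap) changes the downstream analysis, which is why the text calls for efficient, principled tools rather than ad hoc filtering.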
- Acharjya, DP & P, KA 2016, 'A Survey on Big Data Analytics: Challenges, Open Research Issues and Tools', International Journal of Advanced Computer Science and Applications (IJACSA), vol. 7, no. 2, pp. 511-518.
- Alguliyev, RM, Gasimova, RT & Abbaslı, RN 2017, 'The Obstacles in Big Data Process', International Journal of Modern Education and Computer Science, vol. 9, no. 3, pp. 28-35.
- Bagiwa, LI 2017, 'Big Data: Concepts, Approaches and Challenges', International Journal of Computer Networks and Communications Security, vol. 5, no. 8, pp. 181-187.
- Internetlivestats.com 2019, Internet Live Stats – Internet Usage & Social Media Statistics, viewed 31 August 2019, <https://www.internetlivestats.com/>.
- Manyika, J, Chui, M, Brown, B, Bughin, J, Dobbs, R, Roxburgh, C & Byers, A 2011, Big data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute, viewed 1 September 2019, <https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/big-data-the-next-frontier-for-innovation>.
- Sagiroglu, S & Sinanc, D 2013, 'Big Data: A Review', 2013 International Conference on Collaboration Technologies and Systems (CTS), pp. 42-47.