Executive Summary
Technological advancements have changed how information is captured, stored, and shared, especially in the digital sphere. These changes have generated a wealth of information commonly referred to as big data. Initially, data storage was an enterprise feat in itself because of the intricacies involved in recording and analyzing the data collected. Today, the collection, analysis, and storage of data have been transformed by larger storage capacities and a diversified range of systems. The article on big data offers valuable insight into the concepts that have sustained its survival and evolution in the technology sphere. An analysis of past and present data platforms shows how big data has evolved over the years and what needs to be done to ensure its survival in a rapidly changing world.
Introduction
Governments, public health organizations, and businesses use online platforms to collect and monitor information. Governments mine tweets and blogs to gauge prevailing public sentiment in the country. Public health institutions track trending news and searches to monitor epidemics. Online businesses use data from purchases and searches to improve their customer service and marketing. Social scientists likewise follow social media platforms and blogs to determine how particular messages spread and how they can serve the public good. These needs have driven intensive data collection, computing, and storage, widening the scope of big data management and analytics. The origins of databases, their subsequent changes, and current business practices show big data to be a key player in industry; hence the need for the ASTERIX project as a solution to the existing challenges.
Historical Development of Big Data
The growth of data storage from megabytes to the terabytes and petabytes in use today illustrates the expansion surrounding big data. In time, data volumes are expected to reach exabytes to accommodate the information generated on online platforms. Initially, data was stored on a single large computer system with limited processing capability. Storage then shifted to parallel databases running on multiple connected computer systems, each with its own disks, memory, and processors. Data is spread over the partitioned systems using random, hash, or range partitioning, and queries are processed with hash-based, divide-and-conquer parallel methods. First-generation parallel database systems appeared in the 1980s and were followed by a second wave of parallel SQL database products, many of which were later absorbed by key software and hardware vendors through acquisitions.
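To make the partitioning idea concrete, the following Python sketch, which is not part of the original article, shows how records might be assigned to nodes by hash or by range; the node count and range boundaries are hypothetical assumptions.

```python
import zlib

# Minimal sketch of hash and range partitioning over a shared-nothing cluster.
# The node count and range boundaries are illustrative assumptions, not values
# taken from the article.

NUM_NODES = 4

def hash_partition(key: str, num_nodes: int = NUM_NODES) -> int:
    """Assign a record to a node by hashing its key."""
    return zlib.crc32(key.encode()) % num_nodes

def range_partition(key: int, boundaries=(1000, 2000, 3000)) -> int:
    """Assign a record to a node by comparing its key against range boundaries."""
    for node, upper in enumerate(boundaries):
        if key < upper:
            return node
    return len(boundaries)  # the last node holds the highest keys

if __name__ == "__main__":
    print(hash_partition("customer-42"))  # node chosen by hashing the key
    print(range_partition(2500))          # node 2, since 2000 <= 2500 < 3000
```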
The 1990s introduced the World Wide Web, which created operational challenges for companies such as Google, Inktomi, and Yahoo as they indexed and served content on their sites. Google built the Google File System and coupled it with the MapReduce programming model to process big data. Yahoo, Facebook, and other companies developed an open-source Apache counterpart that grew into Hadoop, a platform for storing and processing data at scale. SQL-based online transaction processing proved too expensive, complex, and slow for many web workloads, creating a need for friendlier data stores. Google developed BigTable and Amazon developed Dynamo, while open-source counterparts such as Apache Cassandra and HBase emerged to meet the same complexity, cost, and speed needs.
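The MapReduce model itself can be illustrated with a short Python sketch; the in-memory word-count below only shows the shape of the programming model and does not represent Google's or Hadoop's actual distributed implementation.

```python
from collections import defaultdict

# In-memory emulation of the MapReduce programming model on a word-count task.
# Real platforms such as Hadoop run these phases in parallel across a cluster;
# this sketch only illustrates the shape of the model.

def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in the input document."""
    for word in document.split():
        yield word.lower(), 1

def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: fold all values for one key into a single result."""
    return key, sum(values)

if __name__ == "__main__":
    docs = ["big data platforms", "big data analytics"]
    emitted = [pair for doc in docs for pair in map_phase(doc)]
    counts = dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())
    print(counts)  # {'big': 2, 'data': 2, 'platforms': 1, 'analytics': 1}
```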
Interpretation of the Procedures
The article draws on case studies, analyzing and interpreting the findings before making recommendations. It contrasts the features of Hadoop, HDFS, MapReduce, and SQL with one another while suggesting a viable platform for big data management (Borkar, Carey, & Li, 2012). Hadoop and HDFS have matured over the years to dominate big data analytics in large companies, even though hand-coded MapReduce no longer offers a competitive advantage over earlier database systems. High-level languages such as Hive and Pig, from Facebook and Yahoo respectively, allow data analyses to be expressed quickly and easily, making them the most likely choice for big data firms (Borkar et al., 2012). Jobs expressed in these languages are translated into MapReduce tasks and executed on Hadoop clusters. As a result, hand-written MapReduce functions are being displaced by higher-level, SQL-like languages.
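As a rough illustration of the translation such languages perform, the sketch below pairs a Hive-style grouped count with the map and reduce steps it would roughly compile to; the page-visit records and field names are hypothetical, not taken from the article.

```python
from collections import Counter

# A Hive- or Pig-style language lets an analyst state only *what* to compute,
# for example: SELECT page, COUNT(*) FROM visits GROUP BY page;
# and the system translates that statement into MapReduce tasks. The map and
# reduce steps below show roughly what such a grouped count compiles to.
# The records and field names are hypothetical.

visits = [{"page": "/home"}, {"page": "/cart"}, {"page": "/home"}]

def map_step(record):
    """Emit a (page, 1) pair for each visit record."""
    yield record["page"], 1

def reduce_step(emitted_pairs):
    """Sum the emitted ones per page, yielding the grouped count."""
    counts = Counter()
    for page, one in emitted_pairs:
        counts[page] += one
    return dict(counts)

print(reduce_step(pair for rec in visits for pair in map_step(rec)))
# {'/home': 2, '/cart': 1}
```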
Data Analysis Strategies
The big data article takes a comparative approach, setting out the pros and cons of the competing platforms. It weighs the advantages and disadvantages of the Hadoop/MapReduce stack against parallel SQL databases. The benefits cited include ready availability and low cost, a stack of components and layers that can be used separately, support for accessing data held in external files, automatic rather than manual data placement, and replication that proceeds without operator intervention (Borkar et al., 2012). The disadvantages cited include an impedance mismatch arising from the layered design; reliance on random partitioning alone, which limits efficient access to the necessary files; and the complex coupling of back-end and front-end big data components, which demands extensive hand-written "baling wire and bubble gum" scripts. The stack is also quite heavyweight, and its flexibility leads to maintenance problems.
Recommendations
Since Hadoop has proved insufficient to address all big data problems, a highly scalable tool is needed to meet search, analytics, and storage needs. ASTERIX is such a system, developed by combining features of parallel databases, semi-structured data management, and first-generation data-intensive computing platforms. It is a large-scale information management system that can access, store, ingest, index, analyze, publish, and query large volumes of semi-structured data.
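To clarify what "semi-structured" means here, the snippet below shows JSON-like records whose fields vary from one record to the next; the records and the keyword query are hypothetical illustrations, not drawn from the ASTERIX project itself.

```python
import json

# Semi-structured records are self-describing, and their shape can vary from
# record to record: "tags" appears in one record below and "location" in the
# other. Systems like ASTERIX aim to store, index, and query such data at
# scale. These example records and field names are hypothetical.

records = [
    {"user": "alice", "text": "flu symptoms rising", "tags": ["health"]},
    {"user": "bob", "text": "great service today",
     "location": {"city": "Irvine"}},
]

# A simple scan-style "query": find records whose text mentions a keyword.
matches = [r for r in records if "flu" in r["text"]]
print(json.dumps(matches, indent=2))
```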
Conclusion
Big data has largely transformed how information is acquired in the present age. The historical development, especially the growth in storage capacity, marks the milestones that have made it possible to access, store, and publish large volumes of data. The article also traces the continuing growth of platforms such as SQL databases and Hadoop and their role in transforming giant online companies such as Google, Amazon, and Facebook. With systems able to convert and store large volumes of data, multiple users can access these sites for purposes such as searching and online purchasing. Institutions, in turn, can draw on the data users generate and apply it to the public good, for example by monitoring epidemics. Social scientists can likewise observe patterns in people's online activities to study their behavior in relevant social studies.
Reference
Borkar, V., Carey, M., & Li, C. (2012). Big data platforms: What’s next? XRDS: Crossroads, the ACM Magazine for Students, 19(1), 44-49. Web.