Geospatial Big Data Fusion: A Study of the Effects of Renewable and Non-renewable Resources in Los Angeles County
Abstract—The deterioration of air quality caused by rising greenhouse gas emissions is directly associated with adverse health outcomes worldwide. Replacing non-renewable energy sources with renewable ones can reduce emissions of hazardous air pollutants and thereby lessen those effects in densely populated cities. We create an interactive map that serves as both a visual representation and a predictive model of air pollution, built from geospatial, pollution, and population big data. We compare a controlled sample of data from Los Angeles, California with rural areas of Kentucky to show the relationship among population density, pollution levels, and the replacement of non-renewable energy with renewable energy. We hope this research will increase knowledge about the health implications of air pollution and inspire stakeholders to reduce its damaging effects and improve the quality of life for people in metropolitan areas.
Keywords— Data Fusion, Big Data Visualization, Environmental Monitoring, Air Pollution, Urban Areas, Predictive Modelling
Pollution and greenhouse gas emissions have increased across the United States during the past decades. California has developed the “worst US air pollution levels in the United States”. As a result, air pollution has caused many health issues, such as asthma, cardiovascular disease, and low birth weight in infants. California responded by passing AB 32, the California Global Warming Solutions Act, which underpins the state's cap-and-trade program. AB 32 requires a reduction of greenhouse gas emissions to 1990 levels by 2020 and sets the stage for the transition to a sustainable low-carbon future. AB 32 takes a comprehensive approach to improving the environment and natural resources while maintaining a strong economy. With big data fusion, we can combine different data sources, such as population, pollution, non-renewable electricity plants, and renewable resources, to visualize the effect of renewable energy on pollution levels. A map layout displays the data in a user-friendly interface that researchers can use to explore multiple fused data sources.
- Related Work
A. Big Data Fusion
Technology trends, including linked data built on the OGC standards baseline and new sources of geospatial observations, have increased the variety and volume of geospatial data.
The integration of data and knowledge from several sources is known as data fusion. In general, any task that requires estimating parameters from multiple sources can benefit from data/information fusion methods. The terms information fusion and data fusion are typically used as synonyms, but in some scenarios data fusion refers to raw data and information fusion refers to already processed data. The most widely accepted definition of data fusion was provided by the Joint Directors of Laboratories (JDL): “A multi-level process dealing with the association, correlation, combination of data and information from single and multiple sources to achieve refined position, identity estimates and complete and timely assessments of situations, threats, and their significance.”
In principle, a decentralized data fusion system is more difficult to implement because of the computation and communication requirements. However, in practice, there is no single best architecture, and the selection of the most appropriate architecture should be made depending on the requirements, demand, existing networks, data availability, node processing capabilities, and organization of the data fusion system.
B. Geospatial Data
Geospatial data fusion applies processing and abstraction to associate observations and create new data elements through observation fusion, object/feature fusion, and decision fusion. New deployment platforms and sensor types increase the variety and complexity of geospatial observations. These developments in big data technology make information easier to exchange within remote sensing research and, together with growing geospatial data volumes, enhance data analysis.
In response, a live data feed can be created with a responsive server-side application that handles real-city data and several other data importing operations. The increasing variety and volume of geospatial data demand greater abilities to combine and associate information from multiple sources to create knowledge about the geographic world. Geographic Information Systems (GIS) support this by bringing geographic and big data environments together with sensor networks and data analysis.
C. Global/Environmental Challenges
Air pollution harms not just humans but the environment too. It has accelerated climate change, created an ozone hole, and increased particulate matter in the air. Pollution can damage the human respiratory, cardiovascular, nervous, and immune systems. Some progress has been made to prevent future harm by prohibiting some particularly harmful chemicals and by forecasting air pollution. Several AI systems can forecast pollution much better than statistical measures, but there is no direct comparison among all of them.
With the rise of cities and the growth of the population, air pollution has become more dangerous than before and continues to threaten the health of the general population. Mortality rates are very high in heavily polluted areas, particularly in developing countries. The Environmental Kuznets Curve describes the relationship between industrialization and air pollutants: as a country begins to develop economically, a variety of factors will create very large amounts of pollution if precautions are not taken. Some of the claimed effects of air pollution cannot yet be traced to it. However, the number of people afflicted and the death data from areas of high pollution support the view that air pollution is dangerous.
Tropospheric ozone (O3) pollution is a major problem worldwide and in the United States in particular, especially during the summer months. The oxidative capacity of ozone and its impact on human health have attracted the attention of the scientific community. In the USA, sparse spatial observations of O3 may not provide a reliable source of data over a geo-environmental region. The Geostatistical Analyst extension in ArcGIS can interpolate values at unmonitored locations of interest. In a study of O3 pollution in eastern Texas, hourly episodes for spring and summer 2012 were selectively identified. Geostatistics applies to regionalized phenomena, both natural and manmade. It assumes that phenomena occurring in nature are spatially dependent or correlated, and it uses the first law of geography as the core of spatial interpolation and geostatistical analysis.
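To make the idea of spatial interpolation concrete, the following is a minimal sketch of inverse distance weighting (IDW), one of the simplest interpolators that follows from the first law of geography (near things are more related than distant things). It is not the method of the cited study, which relied on the geostatistical tools in ArcGIS; the function name, the planar coordinates, and the ozone readings are all hypothetical.

```python
import math


def idw_estimate(stations, target, power=2.0):
    """Estimate a value at target (x, y) from (x, y, value) station tuples.

    Each station is weighted by 1 / distance**power, so nearby monitors
    influence the estimate more than distant ones.
    """
    if not stations:
        raise ValueError("at least one station is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # the target coincides with a monitor
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical ozone readings (ppb) at planar coordinates in km.
stations = [(0.0, 0.0, 40.0), (10.0, 0.0, 60.0), (0.0, 10.0, 50.0)]
estimate = idw_estimate(stations, (5.0, 0.0))
print(round(estimate, 2))
```

The target point lies midway between the first two monitors, so their readings contribute equally and the more distant third monitor contributes less. Geostatistical methods such as kriging go further by modelling the spatial correlation explicitly, which IDW does not do.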
D. Energy Studies
Renewable energy in California is growing as the state pursues action against climate change and the adoption of electric cars. California's power grid draws on a growing mix of renewable resources, including (but not limited to) photovoltaic (PV) and other solar, wind, and biomass. PV and wind energy have grown to make up around 50% of the renewable energy going to the grid.
Some of the energy generated in California is wasted or unused because the power grid has no capacity for it or because the energy is directed to places where it is not needed. A large factor in this issue is the unmanaged charging of electric vehicles. Many vehicles are charged only at certain times of the day, and these peaks require more energy than the California grid can supply. In addition, during periods of low activity there is not enough storage for the energy produced, which leads to overfilled storage and the occasional blackout. Managed charging would use grid-integrated technology to control how much power goes where, which would increase energy efficiency and keep resources from being sent to areas that do not need them. Energy from California's power grid could also be sent to nearby states to increase the use of clean energy across the United States.
California has enacted legislation, including the Clean Energy and Pollution Reduction Act, that sets goals for renewable energy to supply 33% of the state's energy by 2020 and 60% (formerly 50%) by 2030. As of November 2018, the 33% goal had already been surpassed, and California has even begun constructing homes with solar panels. The Million Solar Roofs Initiative, launched in 2006, set a goal of one million solar roofs by 2018. California did not quite reach that goal but came close, with about 958,000 installed. The push for solar power is important enough that financial aid was given to residents who chose to invest in solar panels. Observing air pollution data as generation shifts from non-renewable to renewable sources can show how effective the transition is.
E. Predictive Modelling
Satellite observation combined with machine learning algorithms is increasingly prominent in predictive modelling, where it is used to improve the accuracy of predictions of PM2.5, an atmospheric component. Because ground monitoring is point-based and its spatial coverage is uneven, these methods support the analysis of health effects, epidemiology, and climate effects and aid the study of atmospheric concentrations. Such advancements further contribute to pollution forecasting, as they help in understanding the severity of climate change and the increasing danger of atmospheric shifts.
Air quality forecasting and modelling rely on machine-learning predictive algorithms, which use accurate sensor readings and complex calculations to analyse and predict the air quality index. Data-driven analysis of air pollution risk requires gas sensors and is most accurate when combined with neural network analysis to identify environmental patterns that pollution could disrupt; most such neural network models report accuracies of 94.2% to 99.6%.
In our experiment, however, the data analysis uses simpler algorithms, mainly regression models, for observation. Spatial mapping of air pollution involves the systematic recording, collection, and archiving of the meteorological elements of the studied area. These findings are combined into a predictive system fitted to the local environment to increase predictive accuracy. Such advancements matter primarily in operational applications of atmospheric analysis.
- Data Representation
The big data we are analysing come from several sources, including Electricity Generation Data from the CARB Pollution Mapping Tool on the California Air Resources Board website and Solar Measured Production data from the California Solar Initiative data on the Go Solar California website. The CARB Pollution Mapping Tool provides statistical data on emissions from cement plants, hydrogen plants, oil and gas production, cogeneration, refineries, and power plants. Because this research focuses on the harmful effects of power plant emissions, the yearly power plant emissions data were compiled into a dataset identified as Electricity Generation Data. This dataset includes the type and address of each power plant that is part of Cap-and-Trade in California, the amount of greenhouse gases each plant produces per year, and the county in which each plant is located. Our research requires the location of each power plant and the amount of CO2 it produces per year. The location data were converted from street addresses to latitude and longitude. We used the CO2 data because CO2 is the most common greenhouse gas and serves as a representative measure of greenhouse gases overall. The Solar Measured Production dataset provides Application Number, Program Administrator, Program, Host Customer Physical Address, Zip Code, Production Period End Date (by month), and Period kWh Production. From it we need only the zip code, the Production Period End Date, and the Period kWh Production. The data are reported by month, so the Period kWh Production values were summed into yearly totals.
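The monthly-to-yearly condensation described above can be sketched as follows. This is an illustrative Python version under stated assumptions, not the authors' code: the field names `zip_code`, `period_end_date`, and `period_kwh` are hypothetical stand-ins for the dataset's columns, dates are assumed to be ISO formatted (YYYY-MM-DD), and the sample rows are invented.

```python
from collections import defaultdict


def yearly_production(monthly_rows):
    """Sum monthly kWh readings into totals keyed by (zip_code, year)."""
    totals = defaultdict(float)
    for row in monthly_rows:
        # Assumes ISO dates, so the first four characters are the year.
        year = int(row["period_end_date"][:4])
        totals[(row["zip_code"], year)] += float(row["period_kwh"])
    return dict(totals)


# Hypothetical monthly records for two zip codes.
rows = [
    {"zip_code": "90001", "period_end_date": "2018-01-31", "period_kwh": "410.5"},
    {"zip_code": "90001", "period_end_date": "2018-02-28", "period_kwh": "389.5"},
    {"zip_code": "90001", "period_end_date": "2019-01-31", "period_kwh": "420.0"},
    {"zip_code": "90210", "period_end_date": "2018-01-31", "period_kwh": "100.0"},
]
totals = yearly_production(rows)
for key in sorted(totals):
    print(key, totals[key])
```

Keying the totals on both zip code and year keeps the solar data at the same yearly grain as the power plant emissions, which is what allows the two datasets to be combined later.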
- Data Fusion
After cleaning the data, we merged the datasets into a single dataset through data fusion. We then analysed the combined data extensively to extract meaningful information.
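One simple way to picture this fusion step is a join on a shared key. The sketch below assumes that both datasets have already been aggregated to a common (county, year) key, which for the solar data would require a zip-code-to-county lookup that is not shown. The function name, field names, units, and figures are hypothetical, and the join strategy is an assumption rather than the authors' documented procedure.

```python
def fuse_by_key(emissions, solar):
    """Outer-join two {(county, year): value} mappings into one record per key."""
    fused = []
    for key in sorted(set(emissions) | set(solar)):
        county, year = key
        fused.append({
            "county": county,
            "year": year,
            "co2_tonnes": emissions.get(key),  # None when no plant data exist
            "solar_kwh": solar.get(key),       # None when no solar data exist
        })
    return fused


# Hypothetical yearly totals per county.
emissions = {("Los Angeles", 2017): 1200000.0, ("Kern", 2017): 800000.0}
solar = {("Los Angeles", 2017): 5500000.0, ("San Diego", 2017): 2100000.0}

fused = fuse_by_key(emissions, solar)
for record in fused:
    print(record)
```

An outer join keeps counties that appear in only one source and marks the missing side as `None`, so gaps in coverage stay visible instead of being silently dropped.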
- Data Delivery
The demo prompts the user to upload four distinct datasets to visualize. After importing our sample data, the user can choose between plotting the data on a map and displaying it as a chart. Our example used small, controlled environments in both California and Kentucky to show comparable features between distinct locations within our software. When the user chooses to plot map data, the model places markers across California and Kentucky to represent locations and area-based sections in both states. Hovering the cursor over a marker shows the air quality in that area. The charting option displays a two-variable histogram that compares various renewable and non-renewable parameters between the states. California is shown in blue and Kentucky in green, so that users can tell the states apart and compare the parameters in the given datasets more easily.
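The marker step can be illustrated with a short sketch. The demo itself was written in C#, so the Python below is only a stand-in showing how fused records might be turned into map markers with hover text. The `Marker` class, the field names, and the sample readings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Marker:
    latitude: float
    longitude: float
    tooltip: str


def build_markers(records):
    """Turn location records into markers, skipping invalid coordinates."""
    markers = []
    for record in records:
        lat, lon = record["latitude"], record["longitude"]
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            continue  # a bad coordinate would place the marker off the map
        text = f'{record["name"]}: PM2.5 {record["pm25"]:.1f} ug/m3'
        markers.append(Marker(lat, lon, text))
    return markers


# Hypothetical readings; the last row has an out-of-range latitude.
records = [
    {"name": "Los Angeles", "latitude": 34.05, "longitude": -118.24, "pm25": 12.0},
    {"name": "Lexington", "latitude": 38.04, "longitude": -84.50, "pm25": 9.5},
    {"name": "Bad row", "latitude": 134.05, "longitude": -118.24, "pm25": 1.0},
]
markers = build_markers(records)
for marker in markers:
    print(marker.tooltip)
```

Validating coordinates before plotting is a small design choice that keeps a geocoding error in one row from distorting the map view.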
Geospatial Big Data Fusion is a developing field with many future applications for forthcoming researchers to explore. This research used several tools and methods across Data Representation, Data Fusion, and Data Delivery. The datasets included information on how each state produces energy and on air quality in each state down to the local level. After fusing the data, we observe that states that depend on coal for energy have higher levels of PM 2.5 particulate matter in the atmosphere. The data also covered the energy sources used to produce electric power, including coal, natural gas, ethanol, hydroelectric power, biomass, and other renewables. Using C#, the data were plotted across California and Kentucky by their longitude and latitude coordinates. A demo built in C# lets the user input data that can be overlaid on a map or, alternatively, shown on a graph. The user can then hover over a location and read the air quality level as measured by PM 2.5 (fine particulate matter). The demo shows that Kentucky primarily uses coal to power its energy plants, with natural gas as its second-largest source for energy production. California's use of coal for energy production, on the other hand, is substantially lower than Kentucky's. California powers its grid with a mix of other sources, including natural gas, ethanol, hydroelectric power, and biomass. When we compare air quality between California and Kentucky, the results show that Kentucky has a higher PM 2.5 level. As a result, the air quality in Kentucky is much worse than in California.
The researchers would like to acknowledge Dr. Justin Zhan, Hadid Salman, the Army Education Outreach Program (AEOP), Research & Engineering Apprenticeship Program (REAP), the National Science Foundation, University of Nevada, Las Vegas, and Research Experiences for Teachers (RET). The publication is a direct outcome of a collaboration among these programs, which gave secondary students and teachers the opportunity to carry out graduate-level research by working with graduate students and professors on a university-level research project.
 oehha.ca.gov. [Online]. Available: https://oehha.ca.gov/calenviroscreen/indicators. [Accessed: 14-Jun-2019].
 S. X. staff, “California has worst US air pollution: report,” Phys.org, 19-Apr-2018. [Online]. Available: https://phys.org/news/2018-04-california-worst-air-pollution.html. [Accessed: 14-Jun-2019].
 California Air Resources Board, “Assembly Bill 32 Overview,” California Environmental Protection Agency Air Resources Board. [Online]. Available: https://www.arb.ca.gov/cc/ab32/ab32.htm. [Accessed: 14-Jun-2019].
 J. Zhang, J. Jorgenson, T. Markel and K. Walkowicz, “Value to the Grid From Managed Charging Based on California’s High Renewables Study,” in IEEE Transactions on Power Systems, vol. 34, no. 2, pp. 831-840, March 2019.
 J. Roy, “From Data Fusion to Situation Analysis,” https://www.semanticscholar.org/paper/From-Data-Fusion-to-Situation-Analysis-Roy/ff790d907dc0f53d6c90342f7ec90053dfae0827?citationIntent=methodology#citing-papers, 2001. [Online]. Available: http://fusion.isif.org/proceedings/fusion01CD/fusion/searchengine/pdf/ThC21.pdf. [Accessed: 21-Jun-2019]
 J. Geng, S. Wang, W. Gan, H. Yuan, Z. Chen, and T. Dai, “Promoting Geospatial Service from Information to Knowledge with Spatiotemporal Semantics,” Complexity, 21-Jan-2019. [Online]. Available: https://www.hindawi.com/journals/complexity/2019/9301420/. [Accessed: 25-Jun-2019].
 G. Percivall and T. Taylor, “Advances in fusion of big geospatial data,” 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, 2017, pp. 380-383.
 P. A. Parikh and T. D. Nielsen, “Transforming traditional geographic information system to support smart distribution systems,” 2009 IEEE/PES Power Systems Conference and Exposition, Seattle, WA, 2009, pp. 1-4.doi: 10.1109/PSCE.2009.4839979
 Chen, B., & Kan, H. (2008). Air pollution and population health: a global challenge. Environmental health and preventive medicine, 13(2), 94–101.
 Kethireddy, S. R., Tchounwou, P. B., Ahmad, H. A., Yerramilli, A., & Young, J. H. (2014). Geospatial Interpolation and Mapping of Tropospheric Ozone Pollution Using Geostatistics. International journal of environmental research and public health, 11(1), 983–1000.
 Partain, Larry, and Lewis Fraas. “Displacing California’s Coal and Nuclear Generation with Solar PV and Wind by 2022 Using Vehicle-to-Grid Energy Storage.” 2015 IEEE 42nd Photovoltaic Specialist Conference (PVSC), 14 June 2015
 California Energy Commission – Tracking Progress Appendix M. Weng-GutierrezEnergy Commission, “California Energy Commission – Tracking Progress,” California Energy Commission Tracking Progress, 10-Jan-2019.
 Bai, L., Wang, J., Ma, X., & Lu, H. (2018). Air Pollution Forecasts: An Overview. International journal of
 Bellinger, C., Mohomed Jabbar, M. S., Zaïane, O., & Osornio-Vargas, A. (2017). A systematic review of data mining and machine learning for air pollution epidemiology. BMC public health, 17(1), 907.
 T. M. Amado and J. C. Dela Cruz, “Development of Machine Learning-based Predictive Models for Air Quality Monitoring and Characterization,” TENCON 2018 – 2018 IEEE Region 10 Conference, Jeju, Korea (South), 2018, pp. 0668-0672.
 California Solar Statistics. [Online]. Available: https://www.californiasolarstatistics.ca.gov/data_downloads/. [Accessed: 26-Jun-2019].