Display a Kepler.gl Map Using Jupyter Notebook
At this early stage of my data science career, I haven’t worked much with geospatial data. In fact, it seems to me that some data scientists avoid geospatial data altogether, leaving it to GIS professionals who devote their careers to understanding geospatial relationships and producing geospatial frameworks, models and visualizations for real-world use. That said, there are definitely moments in data science when geospatial data is necessary and important. For insight into the blurred boundaries between data science, GIS and cartography, see Towards Spatial Data Science and What is Spatial Data Science?.
Many libraries have been created to help data scientists who specialize outside of geospatial information bridge the gap in understanding and implementing geospatial modeling and visualization. Geopandas and Shapely are among the many that are frequently pieced together in Jupyter Notebooks to create good geospatial visualizations highlighting important geographical data. More libraries and frameworks are covered in Essential Geospatial Python Libraries and in 10 Python Libraries for GIS and Mapping. Because the field is changing rapidly, I think it is also worth checking out this Towards Data Science article: The Best New Geospatial Data Science Libraries in 2019. This article details one of my favorite new frameworks in all of data science, Kepler.gl.
KEPLER.GL (Geospatial Framework Made For Data Scientists)
Very simply, Kepler.gl is a quick way to create 3D interactive maps using geospatial data. It is an open-source geo-analytics tool developed and maintained by Uber’s data science team. It is built on top of the WebGL data visualization framework Deck.gl. Like Deck.gl, Kepler.gl uses a layered approach to data visualization that speeds up the pace at which you can gather insights and present your data.
Kepler.gl was developed specifically for data scientists, or anyone for that matter, who are not particularly focused on GIS. It gives non-GIS data scientists and visualization professionals a quick way to create interactive maps from geospatial data. Using Kepler.gl has made Exploratory Data Analysis all the more enjoyable for me.
OPEN-SOURCE AND DEVELOPED BY UBER FOR REAL USE
Kepler.gl was developed by Uber data scientists for real-world use cases. It is an open-source framework and is available as part of Uber’s Vis.gl suite of industrial-grade data visualization tools. You really don’t have to think about the rest of Uber’s visualization suite when working with Kepler.gl in a Python environment, but it is helpful to know that you can embed it into React-Redux applications if you have the desire and aptitude to deploy it to the web.
Kepler is used at Uber for real applications. They use it as the map component in several of their dashboarding apps and allow their developers to manipulate Kepler and add other components based on their custom needs.
They designed this framework knowing that geo-analytics is usually highly domain-specific and abstract, but with the goal of making it approachable for data visualization beginners and non-technical practitioners.
Kepler.gl does something different than just the 2D x and y plane plots that data scientists are used to working with. It introduces a third dimension that adds depth to go with the highly interactive functionality of the tool. With height or altitude features enabled, users can more quickly identify outliers in the data.
TWO WAYS TO USE KEPLER.GL
1. Their Webpage at Kepler.gl
Here you will find some quick links that can make you dangerous in a short time.
2. Jupyter Notebook Library
It is not the easiest task to find the Jupyter Notebook implementation of Kepler.gl on their site. To save you some time here is the link to it: Kepler.gl Jupyter Notebook Documentation.
Geopandas and Geometry Objects Not Necessary
Initially I was under the impression that geopandas was a necessary import for converting latitude and longitude coordinates to a geometry object, but I was wrong. Kepler.gl works well, at least in my case, using plain latitude and longitude columns. Kepler offers different layer types depending on the source data. Figure 1.0 details the layer options available. With just one latitude and one longitude coordinate for each record you can create 3D Grids, 3D Hexbins, Clusters and Heatmaps. Figure 1.2 shows the type of map that can be quickly created using the Hexbin layer and adjusting its height attribute. Figure 1.3 shows how you can zoom in and look at statistics using a tooltip.
With a geometry object you can use other types of layers. Figure 2.0 uses a polygon layer, but you can also use geometry objects to create a trip map. In Figure 1.4 I use the polygon layer to highlight the accident data for the entire country. I didn’t play with this feature too much, so I didn’t create a map color-coded state by state, but it is possible. I just wanted to highlight again the use of the tooltip and the amount of data that can be included in it.
There are other types of data you can use as well. I haven’t played with them yet, so I can’t provide much info, but I do know that you can add layers that include your own objects, like 3D buildings and cars, and that if you have hex ID information you can create H3 maps.
CAUTION ON JUPYTER LAB IMPLEMENTATION
There is some documentation on using Kepler.gl in Jupyter Lab, but when I tried it I had no success: the visualizations would not render in Jupyter Lab. There may be workarounds that I am unaware of, but once I stopped using Jupyter Lab and switched to Jupyter Notebook the interactive maps worked seamlessly.
Kepler will allow you to upload CSV, JSON, GeoJSON or a saved map JSON. There is also support for other file formats, but they do not work as smoothly. These data options work for both the online user interface and the Jupyter Notebook version. However, I think it’s best to use CSV files in Jupyter Notebook because they tend to be the smallest.
CAUTION ON BIG DATA
Right on its front page, Kepler.gl is advertised as a tool for large data sets. This is true, but probably only to the extent of my own abilities, hardware and software. I was using the U.S. Accident DataSet housed on Kaggle, which has over 4 million records and is gigabytes in size. Precisely because Kepler.gl was advertised to work on large data sets, I initially added the entire dataset to Kepler.gl within my Jupyter Notebook. This was very computationally expensive and felt a bit like performing a moderately complex grid search: it took somewhere around 30 to 40 minutes to complete. I did this two or three times with Jupyter Lab before figuring out that switching to Jupyter Notebook would let me actually display the map. Needless to say, it can be quite frustrating to spend all this time just to get something up and running.
For your sanity, start with a smaller dataset (which I should have done), as you are likely to make some initial mistakes that will cost you time. In my experience, Kepler.gl works very well in a Jupyter Notebook environment with about 500,000 records or fewer, or about 100MB of data. Maps of this size display in your Jupyter Notebook and produce output HTML files of about 50MB that run smoothly when hosted locally in your web browser.
I haven’t yet truly attempted to put these visualizations on the web, but if you are considering Streamlit, there may be size limitations and other challenges there. Again, if you are familiar with React, this can be done with that framework.
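One way to act on that advice is to trim the data before it ever reaches Kepler.gl. The following is a minimal sketch, assuming a pandas DataFrame; the helper name `shrink_for_kepler`, the column names and the tiny stand-in frame are all hypothetical, and the 500,000-row cap is simply the rough comfort limit mentioned above.

```python
import pandas as pd

MAX_ROWS = 500_000  # rough comfort limit reported in this article, not a hard rule


def shrink_for_kepler(df, columns, max_rows=MAX_ROWS, seed=42):
    """Keep only the needed columns and randomly sample down to max_rows."""
    slim = df[columns]
    if len(slim) > max_rows:
        # A fixed seed keeps the sample reproducible between notebook runs.
        slim = slim.sample(n=max_rows, random_state=seed)
    return slim.reset_index(drop=True)


# Tiny stand-in frame (hypothetical columns); the real dataset has millions of rows.
demo = pd.DataFrame({
    "Start_Lat": [34.05, 36.17, 37.77, 32.72],
    "Start_Lng": [-118.24, -115.14, -122.42, -117.16],
    "Description": ["a", "b", "c", "d"],
})

small = shrink_for_kepler(demo, ["Start_Lat", "Start_Lng"], max_rows=3)
print(len(small), "rows,", small.memory_usage(deep=True).sum(), "bytes")
```

Checking the memory footprint before adding the data gives a quick sense of whether you are near the roughly 100MB range described above.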
JUPYTER NOTEBOOK WORKFLOW
1. Import Libraries
In addition to your standard imports, import Kepler.gl and, if you need to convert any data to geometry objects, geopandas as well.
2. Read In Data
This is also a good time to limit your DataFrame to only the columns you want to use. Slim down your data as much as possible to keep the map responsive.
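As a rough sketch of this step, `pandas.read_csv` can drop unwanted columns at load time through its `usecols` parameter. The column names below are assumptions for illustration and may not match your file; an in-memory string stands in for the CSV on disk so the snippet runs on its own.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk; the column names are assumed for illustration.
csv_text = """ID,Severity,Start_Lat,Start_Lng,State,Description
A-1,3,34.0522,-118.2437,CA,Lane blocked
A-2,2,37.7749,-122.4194,CA,Slow traffic
A-3,2,36.1699,-115.1398,NV,Accident on ramp
"""

# Only these columns are parsed; everything else is skipped while reading.
keep = ["Severity", "Start_Lat", "Start_Lng", "State"]
df = pd.read_csv(io.StringIO(csv_text), usecols=keep)

# Optional: narrow to a single state to shrink the data further.
df_ca = df[df["State"] == "CA"].reset_index(drop=True)
print(df_ca)
```

With a real file you would pass its path in place of `io.StringIO(csv_text)`.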
3. Convert to GeoDataFrame if you need a Geometry Object
This step is optional, but it is required if you want to use any layer that needs a geometry object.
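A minimal sketch of the conversion, assuming point data with latitude and longitude columns (the column names and sample values are hypothetical): geopandas can build point geometries with `points_from_xy`, which takes x (longitude) first and y (latitude) second.

```python
import geopandas as gpd
import pandas as pd

# Hypothetical sample records with assumed column names.
df = pd.DataFrame({
    "Severity": [3, 2],
    "Start_Lat": [34.0522, 37.7749],
    "Start_Lng": [-118.2437, -122.4194],
})

gdf = gpd.GeoDataFrame(
    df,
    # x is longitude and y is latitude; swapping them is a common mistake.
    geometry=gpd.points_from_xy(df["Start_Lng"], df["Start_Lat"]),
    crs="EPSG:4326",  # WGS84, the usual CRS for plain latitude/longitude values
)
print(gdf.head())
```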
4. Instantiate a Kepler.gl map object
5. Add data to Kepler.gl map object
The data parameter takes either the Pandas or Geopandas DataFrame you would like to visualize. The name parameter can be any name you would like to give your dataset. It will show up inside the map on the tooltip.
Note: You can do this one time if your data is slim enough. This is the case in Example 1. ‘map_ca’ is the Kepler.gl map object that I instantiated above.
You can also add more data to the map, as in Example 2. You can add data as many times as you like, although you will experience severe lag when using the interactive features, as well as long loading times.
6. Display Kepler.gl in Your Jupyter Notebook
‘map_ca’ is the Kepler.gl map object that I then added data to.
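Steps 4 through 6 can be sketched together as follows. This is an illustration under assumptions rather than the original notebook code: `build_map` is a hypothetical helper, the sample records and dataset names are made up, and the only Kepler.gl calls used are the `KeplerGl(height=...)` constructor and `add_data(data=..., name=...)`.

```python
import pandas as pd


def build_map(datasets, height=600):
    """Create a Kepler.gl widget and add each named DataFrame to it."""
    # Imported here rather than at the top so the rest of the cell still
    # runs where keplergl is not installed (pip install keplergl).
    from keplergl import KeplerGl

    kepler_map = KeplerGl(height=height)
    for name, frame in datasets.items():
        # `name` is the label Kepler.gl shows for this dataset in the map UI.
        kepler_map.add_data(data=frame, name=name)
    return kepler_map


# Hypothetical sample records; swap in your own DataFrame or GeoDataFrame.
accidents = pd.DataFrame({
    "Severity": [3, 2, 4],
    "Start_Lat": [34.0522, 37.7749, 32.7157],
    "Start_Lng": [-118.2437, -122.4194, -117.1611],
})
severe = accidents[accidents["Severity"] >= 3]

# In a notebook cell:
#   map_ca = build_map({"accidents": accidents})                    # one dataset
#   map_ca = build_map({"accidents": accidents, "severe": severe})  # several
#   map_ca  # a bare expression on the last line of a cell renders the widget
```

Each extra dataset adds to the load on the browser, so it is worth adding them one at a time and checking that the map stays responsive.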
7. Save Configuration File
After playing with the interactive features and their settings, you may want to preserve the state of the map as you created it. To do so, copy the config into your Jupyter Notebook and save it along with the Kepler.gl HTML file that we will create in the next step.
Go to the curly brackets, click on them and click copy. This will copy the config as JSON. This step is circled in red below. After copying it, paste it into a cell in your Jupyter Notebook and save it to a variable.
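The pasted config ends up as an ordinary Python dictionary, so it can also be written to disk and reloaded later. The sketch below uses a deliberately tiny stand-in config with illustrative values; a config copied from a real map is much longer, and the file name and location are arbitrary choices.

```python
import json
import tempfile
from pathlib import Path

# Paste the copied config here. This skeleton is only a stand-in that sets
# the camera; a config copied from a real map contains far more settings.
config = {
    "version": "v1",
    "config": {
        "mapState": {
            "latitude": 36.7783,
            "longitude": -119.4179,
            "zoom": 5,
            "pitch": 45,
            "bearing": 0,
        }
    },
}

# Any writable path works; the temp directory is used so the sketch runs anywhere.
config_path = Path(tempfile.gettempdir()) / "map_ca_config.json"
config_path.write_text(json.dumps(config, indent=2))

# Later, or in another notebook, load it back into a dictionary.
restored = json.loads(config_path.read_text())
```

The map object also exposes its current settings through its `config` attribute, which may be a convenient alternative to copying from the side panel.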
8. Save Your Kepler.gl Map to an HTML File
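A minimal sketch of the export, assuming `map_ca` is the map object from the earlier steps and `config` is the dictionary saved in step 7. The wrapper `export_map` and the default file name are hypothetical; the underlying call is the map object's `save_to_html` method.

```python
def export_map(kepler_map, file_name="map_ca.html", config=None, read_only=False):
    """Write the map, its data and an optional saved config to one HTML file."""
    # When config is None, the map's current configuration is used instead.
    kepler_map.save_to_html(file_name=file_name, config=config, read_only=read_only)
    return file_name


# In a notebook cell, after building map_ca and saving config:
#   export_map(map_ca, file_name="map_ca.html", config=config)
#   export_map(map_ca, file_name="map_ca_view.html", config=config, read_only=True)
```

Setting `read_only=True` hides the side panel in the exported page, which is useful when you want viewers to explore the map without changing its layers.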
Congratulations! After following these few steps you have quickly configured, displayed and saved a Kepler.gl map object in Jupyter Notebook. If you have any questions or suggestions for improvement, please feel free to message me on Medium.