14

I'd like to construct a shapefile from a pandas DataFrame using its lon and lat columns.

I have a CSV file, which I process with pandas into a DataFrame because that is easier to handle.

Is it possible to do this without looping over the data line by line?

PolyGeo
kamome

3 Answers

25

Yes, that can be done with shapely and geopandas.

Suppose your pandas DataFrame looks something like this:

import pandas as pd
data = [
        {'some_attribute': 'abc', 'lat': '50.1234', 'lon': '10.4023'},
        {'some_attribute': 'def', 'lat': '40.5678', 'lon': '8.3365'},
        {'some_attribute': 'ghi', 'lat': '60.9012', 'lon': '6.2541'},
        {'some_attribute': 'jkl', 'lat': '45.3456', 'lon': '12.5478'},
        {'some_attribute': 'mno', 'lat': '35.7890', 'lon': '14.3957'},
        ]

df = pd.DataFrame(data)
print(df)

=>

       lat      lon some_attribute
0  50.1234  10.4023            abc
1  40.5678   8.3365            def
2  60.9012   6.2541            ghi
3  45.3456  12.5478            jkl
4  35.7890  14.3957            mno

First, make sure that geopandas and shapely are installed properly, which is sometimes not easy because they come with some dependencies (e.g. GEOS and GDAL). If pip install geopandas shapely does not work at first, search for the error on Google or StackOverflow/GIS.StackExchange; most likely an answer that solves the problem is already available.

Then, it is just a matter of creating a new geometry column in your dataframe which combines the lat and lon values into a shapely Point() object. Note that the Point() constructor expects a tuple of float values, so conversion must be included if the dataframe's column dtypes are not already set to float.

from shapely.geometry import Point

# combine the lat and lon columns into a shapely Point() object

df['geometry'] = df.apply(lambda x: Point((float(x.lon), float(x.lat))), axis=1)
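As an alternative to calling float() on every row, the coordinate columns can be converted once before building the points. The following is only a minimal sketch: the names raw and clean, the sample values (including the deliberately bad 'n/a' entry), and the choice to drop unparseable rows are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data; coordinates arrive as strings, one of them invalid.
raw = pd.DataFrame({
    'some_attribute': ['abc', 'def', 'ghi'],
    'lat': ['50.1234', '40.5678', 'n/a'],
    'lon': ['10.4023', '8.3365', '6.2541'],
})

clean = raw.copy()
for col in ('lat', 'lon'):
    # errors='coerce' turns values that cannot be parsed into NaN
    # instead of raising an exception.
    clean[col] = pd.to_numeric(clean[col], errors='coerce')

# Assumption: rows without usable coordinates are simply discarded.
clean = clean.dropna(subset=['lat', 'lon'])

print(clean.dtypes)
```

With float columns in place, the float() calls inside the lambda are no longer needed.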

Now, convert the pandas DataFrame into a GeoDataFrame. The geopandas constructor expects a geometry column which can consist of shapely geometry objects, so the column we created is just fine:

import geopandas
df = geopandas.GeoDataFrame(df, geometry='geometry')

To dump this GeoDataFrame into a shapefile, use geopandas' to_file() method (other drivers supported by Fiona such as GeoJSON should also work):

df.to_file('MyGeometries.shp', driver='ESRI Shapefile')

And that is what the resulting shapefile looks like when visualized with QGIS:

[Figure: resulting shapefile]

PolyGeo
Dirk
  • 2
    Hi, I have a similar situation, but instead of points I have polygons. Is it possible to do something similar to df['geometry'] = df.apply(lambda x: Point((float(x.lon), float(x.lat))), axis=1), but with polygons? – VDC Aug 23 '16 at 17:48
7

For ArcMap, you need to define the projection (the coordinate reference system) before exporting to a shapefile.

import geopandas
df = geopandas.GeoDataFrame(df, geometry='geometry')

# proj WGS84
df.crs = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"

df.to_file('MyGeometries.shp', driver='ESRI Shapefile')
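
In reasonably recent GeoPandas versions, the CRS can also be passed to the constructor as an EPSG code rather than assigned afterwards as a PROJ string. This is a sketch under that assumption; the sample data and the name gdf are illustrative only, and geopandas.points_from_xy is used here in place of the apply() call from the accepted answer.

```python
import pandas as pd
import geopandas

# Hypothetical sample data with float coordinates.
df = pd.DataFrame({
    'some_attribute': ['abc', 'def'],
    'lat': [50.1234, 40.5678],
    'lon': [10.4023, 8.3365],
})

# Build the point geometries and declare WGS84 (EPSG:4326) in one step.
gdf = geopandas.GeoDataFrame(
    df,
    geometry=geopandas.points_from_xy(df['lon'], df['lat']),
    crs='EPSG:4326',
)

print(gdf.crs)
```

Note that declaring a CRS only labels the coordinates; it does not transform them. Reprojecting to a different system is a separate step (GeoDataFrame.to_crs).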
tinlyx
Ivan Moran
  • I did something similar, using df.crs= "+init=epsg:27700" to project my shapefile according to the British National Grid. However, when I open it in ArcGIS, the coordinate system is unknown. What am I doing wrong? – FaCoffee Sep 18 '17 at 18:27
6

If you haven't done so already, install GeoPandas (e.g. with pip install geopandas, or with conda install geopandas for Anaconda/Miniconda users).

Here is how to read a CSV file with pandas, then use the geopandas.points_from_xy helper function to create a geometry column, then write a shapefile:

import pandas
import geopandas

from io import StringIO

# example CSV file
csv_input = StringIO("""
Name,Lat,Long
Kingston,18,-76.8
Lima,-12.05,-77.05
Reykjavik,64.15,-21.95
""")

gdf = geopandas.GeoDataFrame(pandas.read_csv(csv_input))
gdf.set_geometry(
    geopandas.points_from_xy(gdf['Long'], gdf['Lat']),
    inplace=True, crs='EPSG:4326')
gdf.drop(['Lat', 'Long'], axis=1, inplace=True)  # optional
gdf.to_file('some_capitals.shp')

Mike T
  • Definitely faster! Same data: this method "56.2 ms ± 2.13 ms per loop" vs. accepted answer + assign crs "1.75 s ± 41.6 ms per loop" – user1013346 Feb 17 '23 at 06:06