
I have a folder with around 90 subfolders. Each subfolder represents a region of a country, and they all have the same internal structure. From each region I need two shapefiles, need_one.shp and need_two.shp, which sit in a folder I'll call /NEED/. The shapefiles have the same structure in every region; they just live in different region folders, and I want to merge them into one. So the structure is main_folder/region1/NEED/file_names.shp (where the file name is either need_one.shp or need_two.shp), then main_folder/region2/NEED/file_names.shp, and so on.

with open("/main_folder/region1/NEED/need_one.shp", 'w') as file:
    for dir_, dirpath, filename in os.walk("/main_folder/"):
        file = os.path.join(dir_, filename)
        file.write("main_folder/Merged_Results/need_one.shp")

This is what I have so far. I am still learning Python (I am using Python 3). I have created a virtual environment for this and installed several packages I need.

What am I doing wrong?

When I run it, I get the following error:

TypeError: join() argument must be str or bytes, not 'list'

The second version goes like this:

dirpath = "/main_folder/"
regexp = '.shp'
def iter_matching(dirpath, regexp):
    with open("/main_folder/region1/NEED/need_one", 'w') as file:
        for dir_, dirpath, filename in os.walk(dirpath):
            abspath = os.path.join(dir_, filename)
            if regexp.match(abspath):
                yield abspath
            file.write("/output_folder/need_one")

This runs but gives me no results.

PolyGeo
Matija
  • I think so. Maybe some library to merge them, perhaps geopandas – Matija May 09 '20 at 11:15
  • If the join error message is caused by the line where you call os.path.join, then try printing out dir_ and filename just before it to see what is actually being passed in. You could also use a Python debugger to stop the code at that point and inspect the values. At this point this is a Python question and doesn't really have any GIS aspect, so it might be better on Stack Overflow... – Spacedman May 09 '20 at 11:58
  • Also, you are using with to open a file as file, which is what you do if you intend to read from or write to that file inside the with block, and then two lines later you overwrite the file variable with something else. You might find an answer on SO by searching for "Python find files matching pattern" – Spacedman May 09 '20 at 12:01
  • Can you give a real example of the exact path to both shapefiles that need to be merged? Or even better, two examples so we can see how to distinguish two merges. – BERA May 09 '20 at 12:17
  • Sure, /Volumes/External/DPSG/BD_TOPO/Regions/Region_01/HYDROGRAPHY/SURFACE_WATER.shp and /Volumes/External/DPSG/BD_TOPO/Regions/Region_01/HYDROGRAPHY/SURFACE_WATER_POLYGON.shp After that, /Volumes/External/DPSG/BD_TOPO/Regions/Region_02/HYDROGRAPHY/SURFACE_WATER.shp As you can see, the region number changes, nothing else. – Matija May 09 '20 at 12:20
  • @Spacedman Thanks, I actually did this. I know it is a list, I am just not sure how to get around it. Doing this also turned up something odd (not an error, just an oddity): the Regions folder contains both a Region_01 folder and a Region_01.md5 file, and I am not entirely sure what the latter is. I get a list of those files, and not the folders – Matija May 09 '20 at 12:22
  • I am using a with open to open a file and later write, I am just not entirely sure how to do this as is clearly visible. I will try SO search – Matija May 09 '20 at 12:23
  • Have a look at gdal_merge.py: https://github.com/OSGeo/gdal/blob/master/gdal/swig/python/scripts/gdal_merge.py – Zoltan May 09 '20 at 12:30
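As a rough sketch of the point raised in these comments: os.walk yields a (dirpath, dirnames, filenames) tuple for every directory it visits, where dirpath is a string and the other two are lists, so passing the whole filenames list to os.path.join raises the TypeError shown in the question. The helper name find_shapefiles and its parameters are hypothetical and only illustrate one way to collect the wanted paths with the standard library.

```python
import os


def find_shapefiles(root, subfolder, names):
    """Collect root/<region>/<subfolder>/<name> for every name in names.

    Hypothetical helper: `subfolder` is the folder name shared by all
    regions and `names` are the exact shapefile names to keep.
    """
    wanted = set(names)
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # dirpath is a str; dirnames and filenames are lists of str.
        if os.path.basename(dirpath) != subfolder:
            continue
        for filename in filenames:
            # Join one file name at a time, never the whole list.
            if filename in wanted:
                found.append(os.path.join(dirpath, filename))
    # os.walk does not promise a sorted order, so sort for repeatability.
    return sorted(found)
```

With the paths given above, a call such as find_shapefiles("/Volumes/External/DPSG/BD_TOPO/Regions", "HYDROGRAPHY", ["SURFACE_WATER.shp"]) would be expected to return one path per region. Files such as Region_01.md5 are skipped because only exact file names inside the named subfolder are kept.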

1 Answer

I doubt you can merge shapefiles using open. You will need a Python library that can handle GIS data, for example geopandas:

import os
import geopandas as gpd
import pandas as pd

outfolder = r'C:\GIS\data\testdata\outshapes'
rootDir = r'C:\GIS\data\testdata\interpoints'

#List files
to_merge = []
for dirName, subdirList, fileList in os.walk(rootDir): #https://www.pythoncentral.io/how-to-traverse-a-directory-tree-in-python-guide-to-os-walk/
    print('Found directory: %s' % dirName)
    for fname in fileList:
        if fname.endswith('.shp'):
            to_merge.append(os.path.join(dirName,fname))
# Merge
for i, (shp1, shp2) in enumerate(zip(to_merge[::2], to_merge[1::2])): # For each consecutive pair of shapefiles in to_merge
    df1 = gpd.read_file(shp1)
    df2 = gpd.read_file(shp2)
    merged = pd.concat([df1,df2], ignore_index=True) #https://gis.stackexchange.com/questions/162659/joining-concat-list-of-similar-dataframes-in-geopandas
    merged.to_file(os.path.join(outfolder, 'Merged_{}.shp'.format(i))) # Change the naming as needed. Output shapefiles are named Merged_0.shp, Merged_1.shp, ...
BERA
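The loop in the answer writes one output per pair of consecutive paths. If the goal is instead a single merged shapefile per file name across all regions, one possible variation is to group the collected paths by file name and concatenate each group. This is only a sketch under that assumption: group_by_filename and merge_group are hypothetical names, and it relies on the same gpd.read_file, pd.concat and to_file calls the answer already uses. It also assumes every file in a group shares the same columns, geometry type and coordinate reference system.

```python
import os
from collections import defaultdict


def group_by_filename(paths):
    """Map each file name (e.g. 'SURFACE_WATER.shp') to all paths with that name."""
    groups = defaultdict(list)
    for path in paths:
        groups[os.path.basename(path)].append(path)
    return dict(groups)


def merge_group(paths, out_path):
    """Read every shapefile in paths and write them out as one shapefile."""
    # Imported here so the grouping helper can be used without geopandas.
    import geopandas as gpd
    import pandas as pd

    if not paths:
        raise ValueError("no shapefiles to merge")
    frames = [gpd.read_file(path) for path in paths]
    merged = pd.concat(frames, ignore_index=True)
    merged.to_file(out_path)
```

With to_merge and outfolder defined as in the answer, this could be driven by looping over group_by_filename(to_merge).items() and calling merge_group(paths, os.path.join(outfolder, name)) for each name. Note that outfolder must lie outside rootDir, otherwise a later run would pick up the merged outputs as inputs.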
  • Thanks @BERA, I'll check and test it and let you know how it goes. By the way, I can't access pythoncentral.io; I get a 403 Forbidden for some reason – Matija May 09 '20 at 12:44
  • So far, I have managed to locate the files by changing the check under for fname in fileList: to if fname.endswith('.shp') and 'SURFACE_WATER' in fname (there are a lot of .shp files and I only want specific ones), but I have no idea how to merge them into one and output it to a folder. I saw the merge link, but I am still trying to figure out how to output the data to a folder in the first place. It is all really confusing me – Matija May 09 '20 at 15:06
  • Are the schema (columns) the same in both shapefiles? – BERA May 09 '20 at 15:08
  • Yes, the Shapefiles are identical, they are just in different regions and regions are separated into different folders. – Matija May 09 '20 at 15:24
  • Can you help me to understand where does it output it? Should I change the "outfolder" with the desired path to folder? – Matija May 09 '20 at 15:27
  • I see now the outfolder you added above. My bad there, sorry, I have a bad headache – Matija May 09 '20 at 15:29
  • I am not getting anything and I changed the rootDir to my root folder and outfolder to desired output folder. – Matija May 09 '20 at 15:34
  • I did get some output but not from all folders, maybe one or two regions – Matija May 09 '20 at 15:42
  • Another thought, should I add the and 'SURFACE_WATER' in fname? In order to get only this .shp from each folder and not other shp too? Is it a valid statement? – Matija May 09 '20 at 15:46
  • It should be valid. Add a print statement and see if it works. – BERA May 09 '20 at 15:47
  • It is merging some but not sure how, I got 32 output shp files. Basically, let's say that one region has around 1500 features, the merged ones have slightly more on average, maybe around 2500 features or so – Matija May 09 '20 at 15:57
  • I can't debug this for you. Focus on the first part until the to_merge list looks OK – BERA May 09 '20 at 15:58
  • I can *guarantee* that you can't merge shapefiles via simple concatenation. First, there's the 100-byte header which specifies file length in bytes 25-28. Then there's the record number embedded in each row. (Both of these problems exist in the .shx as well, and dBase also has multiple header records.) Finally, even if you ignore the headers and update the offset, if geometries of differing type or dimensionality exist (or if the dBase field definitions are not identical), then you have an invalid shapefile as well. – Vince Jun 09 '20 at 01:21
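To illustrate the header point, here is a small standard-library sketch that decodes the fixed 100-byte .shp header. In the ESRI shapefile layout, the file code 9994 is a big-endian integer at offset 0, the file length (counted in 16-bit words) is a big-endian integer at offsets 24-27 (bytes 25-28 when counting from 1), and the version and shape type are little-endian integers at offsets 28 and 32. The names read_shp_header and make_header are hypothetical, and make_header only builds a synthetic header for demonstration; it does not produce a usable shapefile.

```python
import struct


def read_shp_header(header):
    """Decode file length, version and shape type from a .shp header."""
    if len(header) < 100:
        raise ValueError("a .shp header is 100 bytes long")
    (file_code,) = struct.unpack_from(">i", header, 0)
    if file_code != 9994:
        raise ValueError("not a shapefile header")
    # The length is stored in 16-bit words, so double it to get bytes.
    (length_words,) = struct.unpack_from(">i", header, 24)
    version, shape_type = struct.unpack_from("<ii", header, 28)
    return {
        "file_length_bytes": length_words * 2,
        "version": version,
        "shape_type": shape_type,
    }


def make_header(length_bytes, shape_type):
    """Build a synthetic 100-byte header (demonstration only)."""
    big = struct.pack(">i5ii", 9994, 0, 0, 0, 0, 0, length_bytes // 2)
    # Version 1000, the shape type, then eight doubles for the bounding box.
    little = struct.pack("<ii8d", 1000, shape_type, *([0.0] * 8))
    return big + little


# Two header-only "files" glued together: the first header still claims
# the original length, which no longer matches the combined size.
first = make_header(100, 5)
second = make_header(100, 5)
combined = first + second
declared = read_shp_header(combined[:100])["file_length_bytes"]
actual = len(combined)
```

Here declared and actual disagree, which is the first of the problems described above; the embedded record numbers and the .shx and .dbf files would need the same kind of repair, which is why a GIS library is the practical route.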