1

I have 3 xls files that have 3 sheets per file. All sheets have same column headers but order as you see below is different

1.xls
Name Address Date City State Zip

2.xls
Address Date City Zip Name State

3.xls
City Zip Name Address Date State

I want my final xls file to concatenate all 3 files and sheets

Output.xls
Name Address Date City State Zip RowNumber SheetName

The rownumber should be the specific row number from each file and sheet the data comes from before concatenation.Sheetname should be the sheet it comes from within the xls file.

My attempt-

import os
import pandas as pd
#set src directory
os.chdir('C:/Users/hhh/Desktop/python/Concat')
def read_sheets(filename):
    result = []
    sheets = pd.read_excel(filename, sheet_name=None)
    for name, sheet in sheets.items():
        sheet['Sheetname'] = name
        sheet['Row'] = sheet.index
        result.append(sheet)
    return pd.concat(result, ignore_index=True)
files = [file for file in os.listdir(folder_path) if file.endswith(".xls")] 
dfoo = read_sheets(files)

But nothing happens and i just receive an assertion error saying assert content_or_path is not None. Is this because column orders don't match? is there a workaround? Number of columns are same in all files and sheets. Within each file sheets have same order. But if you compare 1.xls sheets to 2.xls the column order is different as you can see in my reprex above

Joe Tha
  • 163
  • 9

1 Answers1

1

I believe your question is asking to take 9 different sheets (3 each in 3 different .xls files) and combine them into a single sheet in a new spreadsheet Output.xls.

A few comments to start:

  • The different column order for different input files shouldn't be a problem.
  • You may want to consider having the output file be a .xlsx file instead of .xls, since the xlwt package required to work with .xls files raises a warning:
FutureWarning: 
As the xlwt package is no longer maintained, the xlwt engine will be removed in a future version of pandas. 
This is the only engine in pandas that supports writing in the xls format. 
Install openpyxl and write to an xlsx file instead. 
You can set the option io.excel.xls.writer to 'xlwt' to silence this warning. 
While this option is deprecated and will also raise a warning, it can be globally set and the warning suppressed.
  writer = pd.ExcelWriter('Output.xls')
  • The sample code in your question sends a list of files to the function read_sheets(), so this function needs to be changed to expect this list rather than a single file.
  • The code needs to loop over input files, then over the sheets in each file.

Here is a modification of your code which does what I think you are asking (with a different argument for os.chdir() to match my test environment):

import os
import pandas as pd
#set src directory
#os.chdir('C:/Users/hhh/Desktop/python/Concat')
os.chdir('./Concat')
def read_sheets(files):
    result = []
    for filename in files:
        sheets = pd.read_excel(filename, sheet_name=None)
        for name, sheet in sheets.items():
            sheet['Sheetname'] = name
            sheet['Row'] = sheet.index
            result.append(sheet)
    return pd.concat(result, ignore_index=True)
folder_path = '.'
files = [file for file in os.listdir(folder_path) if file.endswith(".xls")] 
dfCombined = read_sheets(files)
writer = pd.ExcelWriter('Output.xls')
dfCombined.to_excel(writer, index=None, sheet_name='Combined')
writer.save()
writer.close()

The sample output looks like this:

Name    Address             Date    City        State           Zip     Sheetname   Row
Alice   1 Main St           11      Nome        Alaska          11111   Sheet1      0
Bob     1 Main St           12      Providence  Rhode Island    22222   Sheet1      1
Candace 1 Main St           13      Denver      Colorado        33333   Sheet1      2
Dirk    1 Main St           14      Wilmington  Delaware        44444   Sheet1      3
Edward  1 Marvin Gardens    11      Nome        Alaska          11111   Sheet2      0
Fran    1 Marvin Gardens    12      Providence  Rhode Island    22222   Sheet2      1
George  1 Marvin Gardens    13      Denver      Colorado        33333   Sheet2      2
Hannah  1 Marvin Gardens    14      Wilmington  Delaware        44444   Sheet2      3
Irvin   1 St Marks Place    11      Nome        Alaska          11111   Sheet3      0
Jasmine 1 St Marks Place    12      Providence  Rhode Island    22222   Sheet3      1
Kirk    1 St Marks Place    13      Denver      Colorado        33333   Sheet3      2
Lana    1 St Marks Place    14      Wilmington  Delaware        44444   Sheet3      3
Alice   2 Main St           11      Nome        Alaska          11111   Sheet1      0
Bob     2 Main St           12      Providence  Rhode Island    22222   Sheet1      1
Candace 2 Main St           13      Denver      Colorado        33333   Sheet1      2
Dirk    2 Main St           14      Wilmington  Delaware        44444   Sheet1      3
Edward  2 Marvin Gardens    11      Nome        Alaska          11111   Sheet2      0
Fran    2 Marvin Gardens    12      Providence  Rhode Island    22222   Sheet2      1
George  2 Marvin Gardens    13      Denver      Colorado        33333   Sheet2      2
Hannah  2 Marvin Gardens    14      Wilmington  Delaware        44444   Sheet2      3
Irvin   2 St Marks Place    11      Nome        Alaska          11111   Sheet3      0
Jasmine 2 St Marks Place    12      Providence  Rhode Island    22222   Sheet3      1
Kirk    2 St Marks Place    13      Denver      Colorado        33333   Sheet3      2
Lana    2 St Marks Place    14      Wilmington  Delaware        44444   Sheet3      3
Alice   3 Main St           11      Nome        Alaska          11111   Sheet1      0
Bob     3 Main St           12      Providence  Rhode Island    22222   Sheet1      1
Candace 3 Main St           13      Denver      Colorado        33333   Sheet1      2
Dirk    3 Main St           14      Wilmington  Delaware        44444   Sheet1      3
Edward  3 Marvin Gardens    11      Nome        Alaska          11111   Sheet2      0
Fran    3 Marvin Gardens    12      Providence  Rhode Island    22222   Sheet2      1
George  3 Marvin Gardens    13      Denver      Colorado        33333   Sheet2      2
Hannah  3 Marvin Gardens    14      Wilmington  Delaware        44444   Sheet2      3
Irvin   3 St Marks Place    11      Nome        Alaska          11111   Sheet3      0
Jasmine 3 St Marks Place    12      Providence  Rhode Island    22222   Sheet3      1
Kirk    3 St Marks Place    13      Denver      Colorado        33333   Sheet3      2
Lana    3 St Marks Place    14      Wilmington  Delaware        44444   Sheet3      3

constantstranger
  • 4,502
  • 2
  • 2
  • 15
  • Hey constant, thank yiou for your response. Unfortunately,i receive the following error - raise ValueError("No objects to concatenate") ValueError: No objects to concatenate – Joe Tha May 11 '22 at 01:01
  • Nevermind. I forgot to define folder path in my code. Sorry about that. Marking this as correct. I also changed the to output.xlsx to avoid the warning. Thanks again for your canonical answer.Very helpful! – Joe Tha May 11 '22 at 01:05
  • Hi constantstranger, im facing an issue within the output. So since this is a concatenation, there are some blank rows at the end of sheets that are also being concatenated. I have a way to distinguish these rows. If Zip = 00nan, then i want the entire row deleted. How can I implement this within your answer? – Joe Tha May 11 '22 at 02:01
  • Will dfCombined = dfCombined [dfCombined.Zip != '00nan'] work? – Joe Tha May 11 '22 at 02:05
  • 1
    Yes, that should do it. – constantstranger May 11 '22 at 02:30
  • constantstranger, it worked. I wanted to know How i can write the output.xlsx file to a different folder. I created an output_path and did writer = pd.ExcelWriter(output_path,'Output.xlsx'). But im getting a value error saying No excel writer 'Output.xlsx'. is output_path not a valid arguement? – Joe Tha May 11 '22 at 17:36
  • @Joe Tha I think you just need `pd.ExcelWriter(output_path + '/' + 'Output.xlsx')`. Or, if you prefer to use an f-string, `pd.ExcelWriter(f'{output_path}/Output.xlsx')`. Your code probably made ExcelWriter think you were trying to set an argument other than `path` to be `Output.xlsx`. – constantstranger May 11 '22 at 19:43
  • Hey. thanks for replying. Although these two methods are writing to the new outputpath. for some reason the file says excel cannot open the file 'Output.xlsx' because format/extension is not valid – Joe Tha May 11 '22 at 21:24
  • when i dont set this output path an change the pd.excelwriter script. and just go by your original answer script, i do not get this error and am able to open the file – Joe Tha May 11 '22 at 21:26
  • Did you manually change the name of any file from .xls to .xlsx? That can cause this kind of behavior. See [here](https://techcommunity.microsoft.com/t5/excel/excel-cannot-be-open-the-file-filename-xlsx-because-the-file/m-p/1656863/highlight/true#M74623) for example. Please ensure that you can open all your input files, and I would suggest deleting the Output.xlsx file (if it exists) and trying again. – constantstranger May 11 '22 at 21:51
  • never changed names manually. Im also able to open all input files. deleted output.xlsx and tried again but same error. Do i need to change something in dfCombined.to_excel(writer, index=None, sheet_name='Combined')? – Joe Tha May 11 '22 at 21:59
  • I did dfCombined.to_excel('C:/Users/hhh/Desktop/autotranscribe/python/Vet Matching/Results/Output.xlsx', index = False) and i can see the file now. I removed the line under that dfCombined.to_excel(writer, index=None, sheet_name='Combined'). Is this ok or will the data quality change? – Joe Tha May 11 '22 at 22:22
  • Looks like you solved it, just open the Output.xlsx file and make sure it looks right. – constantstranger May 11 '22 at 22:29
  • yeah it looks good. Thanks constant and have a great rest of your day buddy! – Joe Tha May 11 '22 at 22:31
  • https://stackoverflow.com/questions/72209641/how-do-i-read-sql-stored-procedure-data-through-pyodbc-and-get-results-into-a-da hey,not sure if you have familiarity with pyodbc but thought id ask incase – Joe Tha May 12 '22 at 05:40