1

My code-

import pandas as pd
import datetime as dt
import os
import shutil
    path = "C:/Users/rocky/Desktop/autotranscribe/python/Matching"
destination_path = ('C:/Users/rocky/Desktop/autotranscribe/python/Matching/Processed')
    for file in os.listdir("C:/Users/rocky/Desktop/autotranscribe/python/Matching"):
        if file.startswith("TVC")and "(updated headers)" not in file:
            dfs = pd.read_excel(file, sheet_name=None)
            output = dict()
            for ws, df in dfs.items():
                if ws in ["Opt-Ins", "New Voting Members", "Temporary Members"]:
                    continue
                if ws in ["Voting Members", "Removed Members"]:
                    temp = df
                    dt = pd.to_datetime(os.path.getctime(os.path.join(path,file)),unit="s").replace(nanosecond=0)
                    temp['Status'] = "Active" if ws == "Voting Members" else "Cancelled"
                    output[ws] = temp
            writer = pd.ExcelWriter(f'{file.replace(".xlsx","")} (updated headers).xlsx')
            for ws, df in output.items():
                df.to_excel(writer, index=None, sheet_name=ws)
            writer.save()
            writer.close()
shutil.move(path, destination_path )

I just want the file that is being processed here if file.startswith("TVC")and "(updated headers)" not in file: to be moved to another folder directory after all my code is processed and the output file is produced..where do i apply shutil.move? i put it outside my inner loop but the file didnt populate in my dest folder as expected. Do i need an if statement? if script runs successfully and output xls produced then move original file to destination folder?

Joe Tha
  • 163
  • 9
  • You should be able to move the folder any time after loading the pandas dataframe. Outside of which loop are you putting `shutil.move()` – sami-amer May 07 '22 at 02:51
  • l@sami-amer i dont want to move the folder. just the file. lets say you have an xls called df.xls that you read into with pandas from a directory. then you append using some pandas commands, after that it saves the output file dfnew.xls in the source dir, but the original gile df.xls should be moved to a different directory since i have already ran the script to process it. does this help? – Joe Tha May 07 '22 at 03:11
  • Sorry misspoke, I meant file. You should still be able to move it any time after you load the data frame as it is no longer needed. Could you please share the code you are using to move the file? Additionally, I am not using where the variable `path` is being initialized. – sami-amer May 07 '22 at 03:31
  • 1
    This isn't related to your issue, but `f'{file.replace(".xlsx","")} (updated headers).xlsx'` is a very complicated way to write `file.replace(".xlsx", " (updated headers).xlsx")`. – Blckknght May 07 '22 at 03:47
  • @sami-amer hey i re edited. hope this helps – Joe Tha May 07 '22 at 03:48
  • what shutil.move is doing is just taking all files from path and sending to dest. i just want it to send startswith("TVC") after my script has finished running – Joe Tha May 07 '22 at 05:29

2 Answers2

1

Given what the code in your question is trying to do, you may want to consider building out the capability and/or modifying the file creation strategy slightly.

Here are two possible strategies.

#1: Create the new file and then move it.

#2: Create the new file in place in its destination directory.

In either case, the following related issues could also be considered:

  • What to do if the destination directory doesn't exist? You could allow default behavior of python standard library calls (raise an exception) or you can create the directory.
  • What to do if the destination file already exists? You can allow default standard library behavior (raise an exception), or overwrite it (silently or with a warning), or rename the existing file before putting the new file into the destination directory.

As an example, the code below uses strategy #1 (create and move file), creates the destination directory if needed, and overwrites any existing file with a warning. (I have changed the names of the source and destination paths for testing purposes.)

import pandas as pd
#import datetime as dt
import os
import shutil
#path = "C:/Users/rocky/Desktop/autotranscribe/python/Matching"
#destination_path = ('C:/Users/rocky/Desktop/autotranscribe/python/Matching/Processed')
path = "."
destination_path = ('./Processed')
for file in os.listdir(path):
    if file.startswith("TVC")and "(updated headers)" not in file:
        dfs = pd.read_excel(file, sheet_name=None)
        output = dict()
        for ws, df in dfs.items():
            if ws in ["Opt-Ins", "New Voting Members", "Temporary Members"]:
                continue
            if ws in ["Voting Members", "Removed Members"]:
                temp = df
                #dt = pd.to_datetime(os.path.getctime(os.path.join(path,file)),unit="s").replace(nanosecond=0)
                temp['Status'] = "Active" if ws == "Voting Members" else "Cancelled"
                output[ws] = temp

        destination_file = f'{file.replace(".xlsx","")} (updated headers).xlsx'
        destination_file_with_path = f'{destination_path}/{file.replace(".xlsx","")} (updated headers).xlsx'

        # Create and move file, create the destination directory if needed, and overwrite any existing file with a warning:
        writer = pd.ExcelWriter(destination_file)
        for ws, df in output.items():
            df.to_excel(writer, index=None, sheet_name=ws)
        writer.save()
        writer.close()
        if not os.path.exists(destination_path):
            os.makedirs(destination_path)
        if os.path.exists(destination_file_with_path):
            print(f'{destination_file_with_path} already exists, overwriting')
            os.remove(destination_file_with_path)
        shutil.move(destination_file, destination_path)

Alternatively, if you want to use strategy #2 (create file in its destination directory) again creating the destination directory if needed and overwriting any existing file with a warning, you can do it this way (note that shutil is not required for this approach):

import pandas as pd
#import datetime as dt
import os
#import shutil
#path = "C:/Users/rocky/Desktop/autotranscribe/python/Matching"
#destination_path = ('C:/Users/rocky/Desktop/autotranscribe/python/Matching/Processed')
path = "."
destination_path = ('./Processed')
for file in os.listdir(path):
    if file.startswith("TVC")and "(updated headers)" not in file:
        dfs = pd.read_excel(file, sheet_name=None)
        output = dict()
        for ws, df in dfs.items():
            if ws in ["Opt-Ins", "New Voting Members", "Temporary Members"]:
                continue
            if ws in ["Voting Members", "Removed Members"]:
                temp = df
                #dt = pd.to_datetime(os.path.getctime(os.path.join(path,file)),unit="s").replace(nanosecond=0)
                temp['Status'] = "Active" if ws == "Voting Members" else "Cancelled"
                output[ws] = temp

        destination_file = f'{file.replace(".xlsx","")} (updated headers).xlsx'
        destination_file_with_path = f'{destination_path}/{file.replace(".xlsx","")} (updated headers).xlsx'

        # Create the destination directory if needed, create the new file in the destination directory, and overwrite any existing file with a warning:
        if not os.path.exists(destination_path):
            os.makedirs(destination_path)
        if os.path.exists(destination_file_with_path):
            print(f'{destination_file_with_path} already exists, overwriting')
            os.remove(destination_file_with_path)
        writer = pd.ExcelWriter(destination_file_with_path)
        for ws, df in output.items():
            df.to_excel(writer, index=None, sheet_name=ws)
        writer.save()
        writer.close()
constantstranger
  • 4,502
  • 2
  • 2
  • 15
  • 1
    @Joe Tha Do you need additional help on this question? – constantstranger May 08 '22 at 04:28
  • constsnt stranger. thank you for your canonical answer. had to test mockups for both strategies involved.First one seems more applicable in my case. – Joe Tha May 09 '22 at 19:17
  • Hey constantstranger. one more sub question. How do I change the name of my output file? I know it is being defined here - writer = pd.ExcelWriter(f'{file.replace(".xlsx","")} (updated headers).xlsx') . However, I want my final processed file to just be called TVC_Processed(updated headers).xlsx instead of TVC Membership List 04-01-22 (updated headers). – Joe Tha May 10 '22 at 16:21
  • 1
    @Joe Tha You can just change the destination_file assignment to be `destination_file = TVC_Processed(updated headers).xlsx'`. Or, if you want it to be slightly more generic, `destination_file = f'{file[:3]}_Processed(updated headers).xlsx'`. – constantstranger May 10 '22 at 17:08
  • https://stackoverflow.com/questions/72189803/how-do-i-send-my-output-xls-files-to-a-specific-path-in-python I have another question related to this – Joe Tha May 10 '22 at 17:18
  • constantstranger. ive changed my strategy a bit because of some inconsistencies for file drop by the team. i want to write my output directly to a different path. and then whenever the script will run it will just overwrite files in the new output folder. Please see the link above for my new question – Joe Tha May 10 '22 at 17:34
0

There are several problems. If you want to move files based on the names you get in the for loop, then you need to do the move in the for loop.

A bigger problem is that os.listdir just returns file names. If you want to open or access the file, you need to add the path back in.

There's no point in your temp = df stuff. Just make the changes to df.

import pandas as pd
import datetime as dt
import os
import shutil

path = "C:/Users/rocky/Desktop/autotranscribe/python/Matching"
destination_path = path + '/Processed'
for file in os.listdir(path)
    if file.startswith("TVC") and "(updated headers)" not in file:
        fullpath = path + '/' + file
        dfs = pd.read_excel(fullpath, sheet_name=None)
        output = dict()
        for ws, df in dfs.items():
            if ws in ["Voting Members", "Removed Members"]:
                dt = pd.to_datetime(os.path.getctime(os.path.join(path,file)),unit="s").replace(nanosecond=0)
                df['Status'] = "Active" if ws == "Voting Members" else "Cancelled"
                output[ws] = df
        writer = pd.ExcelWriter(fullpath.replace(".xlsx","(updated headers).xlsx")
        for ws, df in output.items():
            df.to_excel(writer, index=None, sheet_name=ws)
        writer.save()
        writer.close()
        shutil.move(fullpath, destination_path )
Tim Roberts
  • 34,376
  • 3
  • 17
  • 24