
Each time I develop something in Python, I get annoyed that I have to switch between these two statements for my scripts to work both when I run them in an interpreter (e.g. Spyder on Ubuntu, or directly in a Python console) and when I launch them from a terminal with root@machine# python script.py:

import os
some_input_data_path = os.path.join(os.path.dirname(os.path.dirname(__file__)), 'input.csv')

works when the script is launched as an executable, but not in an interpreter.

import os
some_input_data_path = os.path.join(os.path.dirname(os.getcwd()),'input.csv')

works when the script is run in an interpreter, but not when it is launched as an executable.

I have set up a convenience try: block at the beginning of each of my script files to set __file__, like so:

import os

try:
    __file__
except NameError:
    __file__ = os.path.join(os.getcwd(), 'test.py')
    print("Warning: script is not run as a module. "
          "Setting '__file__' to: {}".format(__file__))
else:
    pass

I wonder whether there are good practices, or something else (better) that I can do, so that my scripts work without any manual switching both within my interpreter (mainly for development) and when executed in a terminal (mainly in production)?

Use case

Using this file:
$ cat script.py

import os

some_input_data_path = os.path.join(os.getcwd(), 'input.csv')

print(some_input_data_path)

When I execute this in Spyder, I get this printed:
'/home/username/scriptdir/input.csv'
which is fine.

If I execute this script in bash:

user@machine:/home/username/scriptdir$ python script.py
'/home/username/scriptdir/input.csv'

but if I cd ..:

user@machine:/home/username$ python scriptdir/script.py
'/home/username/input.csv' # <- this is obviously no longer where the csv input data file is.
s.k
  • Do you maybe have an example of how you are using `__file__` subsequently? Maybe there could be a use-case-specific "best practice". Note that `os.getcwd()` and the way you are setting `__file__` might not get you what you want: if you execute a script in a subfolder (such as `python foo/bar.py`), then `os.getcwd()` will give you the parent directory and not `./foo`. – sim Jan 10 '21 at 12:40
  • Yes, and in that case the script is no longer able to figure out the right location of some input data files (as shown in my edit). – s.k Jan 10 '21 at 12:46
  • If the use case is about data inputs, then I would suggest having a lightweight configuration (see https://stackoverflow.com/questions/6198372/most-pythonic-way-to-provide-global-configuration-variables-in-config-py?noredirect=1&lq=1 for some discussion on related best practices) that defines a data directory. – sim Jan 11 '21 at 08:17

1 Answer


I think I understand what you're trying to do (and if I didn't please comment below).

If you're asking what is a good practice, I'd suggest the following:

  • Try to avoid module level variables.
  • Write some initializer function that accepts a path and sets all dependent objects in your module to your path. This way, your module is generic.
  • Next, when you "double-click" to execute, there should be a separate "main" module. This main module calls your module and initializes paths with os.path.dirname(__file__).
  • When you run it in a REPL (Python console), you would be working with your module (not main). Import it, and call the same initialization function with os.getcwd().

Example:

File main.py

import os
from yourlib.filename import process
process(os.path.dirname(os.path.abspath(__file__)))

File yourlib/filename.py

# ..
# ..
def process(path):
    # Whatever you want to do with the directory `path`.
    ...
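
As a rough illustration of such an initializer, process could resolve the input file against the directory it receives. This is only a sketch: the helper name input_path and its default argument are assumptions made for the example, with the filename input.csv borrowed from the question.

```python
import os


def input_path(base_dir, filename="input.csv"):
    """Return the absolute path of `filename` inside `base_dir` (hypothetical helper)."""
    return os.path.join(os.path.abspath(base_dir), filename)


def process(base_dir):
    """Read the input file from an explicit directory instead of a module-level path."""
    path = input_path(base_dir)
    if not os.path.isfile(path):
        raise FileNotFoundError("input file not found: {}".format(path))
    with open(path) as handle:
        return handle.read()
```

Because the directory is an argument, the module itself never needs to know whether it was started from a terminal or from a console.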

When running in the python console:

>>> import os
>>> from yourlib.filename import process
>>> process(os.getcwd())
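
To avoid choosing between the two calls by hand, one option is a small helper that falls back to the current working directory only when `__file__` is missing. This is a sketch under assumptions, not part of the example above: resolve_base_dir is a hypothetical name, and it expects the caller's namespace, typically globals().

```python
import os


def resolve_base_dir(namespace):
    """Return the directory of the running script, or the cwd when `__file__` is absent.

    `namespace` is expected to be the caller's globals() (an assumption of this sketch).
    """
    script = namespace.get("__file__")
    if script is None:
        # Interactive session: no script file, so fall back to the working directory.
        return os.getcwd()
    return os.path.dirname(os.path.abspath(script))
```

Both main.py and an interactive session could then call process(resolve_base_dir(globals())). The fallback still depends on where the interpreter was started, so it only helps when the console is opened in the script's directory.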
UltraInstinct