95

I have a main class that has a ton of different functions in it. It's getting hard to manage. I'd like to be able to separate those functions into a separate file, but I'm finding it hard to come up with a good way to do so.

Here's what I've done so far:

File main.py

import separate

class MainClass(object):
    self.global_var_1 = ...
    self.global_var_2 = ...

    def func_1(self, x, y):
        ...
    def func_2(self, z):
        ...
    # tons of similar functions, and then the ones I moved out:

    def long_func_1(self, a, b):
        return separate.long_func_1(self, a, b)

File separate.py

def long_func_1(obj, a, b):
    if obj.global_var_1:
        ...
    obj.func_2(z)
    ...
    return ...
# Lots of other similar functions that use info from MainClass

I do this because if I do:

obj_1 = MainClass()

I want to be able to do:

obj_1.long_func_1(a, b)

instead of:

separate.long_func_1(obj_1, a, b)

I know this seems kind of nit-picky, but I want just about all of the code to start with obj_1., so there isn't confusion.

Is there a better solution than what I'm currently doing? The only issues that I have with my current setup are:

  1. I have to change arguments for both instances of the function
  2. It seems needlessly repetitive

I know this has been asked a couple of times, but I couldn't quite understand the previous answers and/or I don't think the solution quite represents what I'm shooting for. I'm still pretty new to Python, so I'm having a tough time figuring this out.

Peter Mortensen
  • 30,030
  • 21
  • 100
  • 124
  • 11
    If you are new to Python, **just stick to the conventions** and keep all methods for a class in the same file. – Martijn Pieters Nov 29 '17 at 21:09
  • 5
    If you must group your methods into separate modules, use inheritance; create a base class in one module, import it and subclass it in the other. – Martijn Pieters Nov 29 '17 at 21:11
  • 4
    @MartijnPieters I know I could do that, but none of the functions within the class are finalized, so I find myself scrolling a lot to find the appropriate one, which takes more time than I'd like simply because there's so many. –  Nov 29 '17 at 21:11
  • 4
    That's not a problem to be solved by changing the code; that's a problem to be solved by using [an IDE](https://wiki.python.org/moin/IntegratedDevelopmentEnvironments) which allows you to jump to the location of a function. (Or use your text editor's "find" functionality.) – David Z Nov 29 '17 at 21:31
  • 3
    If a file is not enough for all the methods, then likely you have a problem with the design. The class is too `heavy` and probably splitting it into two or three classes (and files) is the solution. – trinchet Nov 29 '17 at 21:33
  • 1
    Yeah, you have to ask yourself why you have a class with so many big methods. – Robert Moskal Aug 19 '19 at 18:45
  • I split a big class into a core class and a library module that takes the class as an argument. It works and it is an approved pattern, but it does not look very natural at the call points; I would rather have a big file. How bad is 2k lines in one .py file? It is easy to break this limit with explicit module names, type annotations and a low limit on line length. – uuu777 Oct 14 '20 at 16:12

3 Answers

128

Here is how I do it:

  1. The class (or group of classes) is actually a full module. You don't have to do it this way, but if you're splitting a class across multiple files I think this is 'cleanest' (opinion).

  2. The definition is in __init__.py, methods are split into files by a meaningful grouping.

  3. A method file is just a regular Python file with functions, except you can't forget 'self' as a first argument. You can have auxiliary methods here, both taking self and not.

  4. Methods are imported directly into the class definition.

Suppose my class is some fitting GUI (this is actually what I first did this for). So my file hierarchy may look something like

mymodule/
     __init__.py
     _plotstuff.py
     _fitstuff.py
     _datastuff.py

So plot stuff will have plotting methods, fit stuff contains fitting methods, and data stuff contains methods for loading and handling of data - you get the point. By convention I mark the files with a _ to indicate these really aren't meant to be imported directly anywhere outside the module. So _plotstuff.py for example may look like:

def plot(self,x,y):
     #body
def clear(self):
     #body

etc. Now the important thing is file __init__.py:

class Fitter(object):
     def __init__(self,whatever):
         self.field1 = 0
         self.field2 = whatever

     # Imported methods
     from ._plotstuff import plot, clear
     from ._fitstuff  import fit
     from ._datastuff import load

     # static methods need to be set
     from ._static_example import something
     something = staticmethod(something)

     # Some more small functions
     def printHi(self):
         print("Hello world")

Tom Sawyer mentions PEP-8 recommends putting all imports at the top, so you may wish to put them before __init__, but I prefer it this way. I have to say, my Flake8 checker does not complain, so likely this is PEP-8 compliant.

Note that from ... import ... is particularly useful for hiding 'helper' functions that your methods rely on but that you don't want accessible through objects of the class. I usually also place the custom exceptions for the class in the different files, but import them directly so they can be accessed as Fitter.myexception.

If this module is in your path then you can access your class with

from mymodule import Fitter
f = Fitter('whatever') # __init__ requires one argument
f.load('somefile') # Imported method
f.plot(x, y)       # Imported method; x and y are your data
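
To try the layout above end-to-end without setting up a project, here is a minimal sketch that writes a throwaway package to a temporary directory and imports it. The package name mymodule_demo, the helper _fmt, and the function version are made up for illustration; only the pattern (relative imports inside the class body, staticmethod for functions without self) comes from the answer.

```python
import sys
import tempfile
import textwrap
from pathlib import Path

# Build a throwaway package on disk so the relative imports behave
# exactly as they would in a real project.
root = Path(tempfile.mkdtemp())
pkg = root / "mymodule_demo"  # hypothetical package name
pkg.mkdir()

(pkg / "_plotstuff.py").write_text(textwrap.dedent('''
    def _fmt(x, y):
        # Helper: never imported into the class, so instances cannot reach it.
        return f"({x}, {y})"

    def plot(self, x, y):
        # A method-to-be: note the explicit self.
        return f"{self.name} plots {_fmt(x, y)}"

    def version():
        # No self, so it must be wrapped in staticmethod inside the class.
        return "0.1"
'''))

(pkg / "__init__.py").write_text(textwrap.dedent('''
    class Fitter:
        def __init__(self, name):
            self.name = name

        # Runs once, while the class body is executed.
        from ._plotstuff import plot, version
        version = staticmethod(version)
'''))

sys.path.insert(0, str(root))
from mymodule_demo import Fitter

f = Fitter("demo")
print(f.plot(1, 2))        # demo plots (1, 2)
print(Fitter.version())    # 0.1
print(hasattr(f, "_fmt"))  # False
```

The helper _fmt still works inside plot because functions look up globals in the module where they were defined, not in the class they end up attached to.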

It is not completely intuitive, but not too difficult either. The short version for your specific problem: you were close. Just move the import into the class body, and use

from separate import long_func_1

and don't forget your self!
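
Applied to the question's names, that might look like the sketch below. So that it runs as a single script, separate.py is faked with types.ModuleType; in a real project it would be an ordinary file. The bodies of func_2 and long_func_1 are invented for the demonstration.

```python
import sys
import types

# Stand-in for a real file separate.py (assumption: done only so this
# example is a single runnable script).
separate = types.ModuleType("separate")


def long_func_1(self, a, b):
    """Plain function; self is whatever instance it ends up bound to."""
    return self.func_2(a) + b


separate.long_func_1 = long_func_1
sys.modules["separate"] = separate
del long_func_1  # the class can now only get it through the import


class MainClass:
    def __init__(self):
        self.global_var_1 = 10

    def func_2(self, z):
        return self.global_var_1 * z

    # Executed once with the class body; binds the function as a class
    # attribute, which makes it an ordinary method.
    from separate import long_func_1


obj_1 = MainClass()
result = obj_1.long_func_1(2, 3)  # 10 * 2 + 3
print(result)  # 23
```

The signature now lives in one place only, so changing the arguments of long_func_1 no longer means editing a wrapper in main.py as well.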

kabanus
  • 22,925
  • 6
  • 32
  • 68
  • Do the private imports occur in `__init__` or outside of it? (Hint: check your indentation...) – cowbert Nov 29 '17 at 21:54
  • 1
    @cowbert Outside, see tabbing. I'll add some example code to make it clear. You only need it to compile once with the class, not every object. – kabanus Nov 29 '17 at 21:55
  • 3
    I'd like to add as a general comment that sometimes it doesn't make sense to split a class into sub-classes, and refactoring long class code is something you may run into even in Python. – kabanus Nov 29 '17 at 22:02
  • 1
    Splitting a "class" into multiple files is also a common ECMAScript pattern (using the `var PseudoClass = PseudoClass || {}` idiom), so if you are doing some fullstack development with Python middleware, it might make sense to split a Python class for various reasons. – cowbert Nov 29 '17 at 22:44
  • 1
    using import in middle of the file is bad practice – TomSawyer Apr 12 '19 at 19:11
  • @TomSawyer Is that so? I never heard or had reason to believe what I did here is problematic. Can you explain what are the consequences? – kabanus Apr 12 '19 at 19:14
  • @kabanus https://stackoverflow.com/questions/1188640/good-or-bad-practice-in-python-import-in-the-middle-of-a-file – TomSawyer Apr 13 '19 at 02:41
  • @TomSawyer Thanks, I am familiar with the guidelines and the question, I thought you had something else in mind. My only 2 counterpoints are: 1. I usually put all in-class imports below init, which is not really "middle", and you could put them before - which would be top. This happens often when there is an initial non-import line, such as conditional imports. Matter of preference and strictness I suppose. 2. This use-case is unique. The imports are not of some library, but of the defined class itself and by definition should be scoped. I have my own rule of thumb - it's good practice - – kabanus Apr 13 '19 at 08:56
  • - to consider applying guidelines on a case by case basis. This is of course, completely a matter of opinion. You are welcome to add your alternative solution, it would be good to have as many different useful answers as possible. – kabanus Apr 13 '19 at 08:57
  • Is there a way to put the __init__() in separate script just like the methods? I am keen to 'plug and play' different initializations while maintaining everything else – Tian Oct 09 '19 at 06:18
  • 1
    @Tian Yup, same as any method. Make a file, and import it from there (`from .init1 import __init__`, where init1.py contains `def __init__(self...):...`) etc. Next time I suggest trying out these things yourself (probably faster). – kabanus Oct 09 '19 at 08:29
  • A way to create static methods is by calling `plot = staticmethod(plot)` after the import. You could also automate this process by writing a metaclass... – Nearoo Feb 21 '20 at 07:40
  • @Nearoo Thanks, I added that. – kabanus Feb 21 '20 at 11:21
  • This may create a `circular (or cyclic) imports` issue if there is `if name == __main__ ` on the imported module which tests itself – alper May 05 '20 at 11:12
  • @alper I'm not sure I understand what you say. Which module tests for `__name__ == __main__` (I'm guessing you had a typo?)? The modules imported into the class are by the way I defined above not supposed to be imported standalone, as they are an integral part of the class. That would be like importing methods from a class without a class. As such they really shouldn't have that `if` line at all. Regardless, it is also not clear to me how that `if` statement can create a circular import? – kabanus May 05 '20 at 11:24
  • For example in the `_plotstuff.py` file, let's say I write the following for testing: `f = Fitter(); f.plot() #Imported method` under an `if __name__ == "__main__":`. When I add `from mymodule import Fitter` to the top of the file, a `circular import` error shows up. So I had to move `from mymodule import Fitter` into the `if __name__ == __main__:` – alper May 05 '20 at 11:31
  • @alper I see - that would be a complete anti-pattern to this. `_plotstuff` cannot import `Fitter` by definition. In fact, it should never (the way this design is imagined) be invoked directly. That is like a method in your class trying to import your parsed class into an inner method (pre-parsing), after trying to invoke it without parsing the class - which does not make sense to me. To me it seems the circularity is made by hand here - a bad design which causes this is not related specifically to whether we are splitting a class or just have two modules which have a circular dependence. – kabanus May 05 '20 at 11:38
  • 1
    @alper The only place I can imagine something remotely close needed is if you have in a method in `_plotstuff` something like `Fitter.method(...)`. I think that is the only thing that is truly missing compared to a one-file class, but of course, you can always use `self` instead, which my opinion is better design (though up for debate I guess). – kabanus May 05 '20 at 11:40
  • I was converting my code into the pattern you explained, but all the files like `_plotstuff.py` have their own `if __name__ == __main__: ` to test themselves. Basically when I call `_plotstuff.py` it was testing the `plot()` function. With this design I will move my testing into a completely different file, which would be a better design I guess – alper May 05 '20 at 11:45
  • 1
    @alper I'm afraid that won't work for the above reasons. You have to test the class as a whole, and as such test `plot` only as part of a complete object, just like you would any class. I think your conclusion is correct - testing should be separated or at least only exist in the `__init__` file, where the class is actually made. That would fit better with the whole concept – kabanus May 05 '20 at 11:48
  • Is there a way to import necessary third-party modules only inside the top-level file (__init__.py) and not inside every single file of this class (_plotstuff.py, _fitstuff.py, _datastuff.py) ? I have many repetitive imports. – tevang Jul 05 '20 at 11:03
  • @tevang If you have a dependency that is relevant to every implementation file (as in you call it from there), you will have to import it there. That is the dependency structure, the implementations do not see the `__init__` file at all. This also conforms to how Python works. Perhaps if you have a specific use-case it has a better solution? What do you need to import everywhere? – kabanus Jul 05 '20 at 12:22
  • @kabanus in the simplest scenario "import copy", in the most complicated other necessary functions from my code. When you have all functions within a class placed within a single file, you import the modules only once at the top of the file. However, when you split the functions into separate files you have to import the same modules in every file individually, is this right? – tevang Jul 05 '20 at 12:28
  • @tevang That's still too general. The short answer to your question is yes, you must import individually. What I meant is that sometimes it makes sense to save these methods within the object itself. For example, if you are `deepcopy`ing fields a lot, then it might make sense for the fields to have their own copy method. `numpy` arrays for example have their own copy methods (or function arguments). Another option is for the object itself to have a copy method (so you can use `self.my_copy(...)`), but again, whether this makes sense is very dependent on your exact use case. – kabanus Jul 05 '20 at 12:50
  • @tevang This is too much for comments though, if you want you can post a question with specific examples of which libraries you need everywhere, so it will be clear if there are better solutions (within the split class framework) to your specific example. – kabanus Jul 05 '20 at 12:52
  • There is an error in the example: when calling Fitter.__init__ it expects a parameter – Amir Katz Sep 23 '20 at 09:53
  • 1
    Excellent work! Here lies the difference between responsible choices and bureaucracy. (The otherwise suggested) subclassing is intended to show inheritance, not as a workaround to group parts of the same entity. Sometimes even mixins don't fit well for this. – dawid Oct 22 '20 at 23:25
  • 1
    Additionally, if the tools (linter, IDE etc.) complain or cannot handle our sensible choices, the problem is with the tools. Software architecture should be served by the tools not the opposite. BTW, the Python built-in libs themselves are plenty of examples of better choices, including name convention, than that *recommended* by PEP8 (which is intended for the language development, not as a sacred book for everything else). – dawid Oct 22 '20 at 23:32
  • 1
    No PEPper could predict all scenarios/hardware/software advances (very far from that, they were mostly worried with the language development itself), and no rule is perfect, and they knew that, as it is clearly stated as guidelines. – dawid Oct 22 '20 at 23:37
14

I use the approach I found here. It shows many different approaches, but if you scroll down to the end, the preferred method basically goes in the opposite direction of Martijn Pieters's suggestion: the main class inherits from other classes (mixins), and your methods live in those classes.

So the folder structure is something like:

_DataStore/
    __init__.py
    DataStore.py
    _DataStore.py

So your base class would be:

File DataStore.py

import _DataStore

class DataStore(_DataStore.Mixin): # Could inherit many more mixins

    def __init__(self):
        self._a = 1
        self._b = 2
        self._c = 3

    def small_method(self):
        return self._a

Then your Mixin class:

File _DataStore.py

class Mixin:

    def big_method(self):
        return self._b

    def huge_method(self):
        return self._c

Your separate methods would be located in other appropriately named files, and in this example it is just _DataStore.
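
As a single-file sketch of how this composes with more than one mixin: the names PlotMixin and FitMixin are hypothetical, and in practice each would sit in its own file and be imported into DataStore.py.

```python
class PlotMixin:
    """Would live in its own file, e.g. a hypothetical _plot.py."""

    def big_method(self):
        # Relies on _b being set by the concrete class's __init__.
        return self._b


class FitMixin:
    """Would live in another file, e.g. a hypothetical _fit.py."""

    def huge_method(self):
        return self._c


class DataStore(PlotMixin, FitMixin):
    def __init__(self):
        self._a = 1
        self._b = 2
        self._c = 3

    def small_method(self):
        return self._a


store = DataStore()
values = (store.small_method(), store.big_method(), store.huge_method())
print(values)  # (1, 2, 3)

# Method lookup follows the MRO, so a name defined in several mixins
# resolves to the leftmost one in the class statement.
mro_names = [cls.__name__ for cls in DataStore.__mro__]
print(mro_names)  # ['DataStore', 'PlotMixin', 'FitMixin', 'object']
```

Note that the mixins are not usable on their own: they assume attributes that only DataStore.__init__ creates.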

I am interested to hear what others think about this approach. I showed it to someone at work and they were scared by it, but it seemed to be a clean and easy way to separate a class into multiple files.

Jeff Tilton
  • 1,126
  • 11
  • 25
  • 1
    I think it's a valid approach. You might also raise an exception in `__init__` from the Mixin class to discourage users from instantiating the Mixin class. – mjspier Nov 18 '19 at 10:32
  • 1
    I have used mixins, probably too much. It is an easy way to keep file sizes down but lets class sizes be any size. The problem is you aren't refactoring the way you do with inheritance, rather, just making a mega-class. Whether using mixins or monkey patching, the downfall is your linter can't process it, so you have to find errors the hard way. For this one reason, subclassing is preferred, not to mention better encapsulation, et al. – Wyrmwood Aug 27 '20 at 22:58
6

Here is an implementation of Martijn Pieters's comment to use subclasses:

File main.py

from separate import BaseClass

class MainClass(BaseClass):
    def long_func_1(self, a, b):
        if self.global_var_1:
            ...
        self.func_2(z)
        ...
        return ...
    # Lots of other similar functions that use info from BaseClass

File separate.py

class BaseClass(object):

    # You almost always want to initialize instance variables in the `__init__` method.
    def __init__(self):
        self.global_var_1 = ...
        self.global_var_2 = ...

    def func_1(self, x, y):
        ...
    def func_2(self, z):
        ...
    # tons of similar functions, and then the ones I moved out:
    #
    # Why are there "tons" of _similar_ functions?
    # Remember that functions can be defined to take a
    # variable number of/optional arguments, lists/tuples
    # as arguments, dicts as arguments, etc.

from main import MainClass
m = MainClass()
m.func_1(1, 2)
....
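
Since the code above elides the function bodies, here is a runnable single-file version of the same idea. The bodies and values are placeholders invented for the demonstration; in practice BaseClass would stay in separate.py and MainClass in main.py.

```python
class BaseClass:
    # Instance variables are initialized in __init__, not in the class body.
    def __init__(self):
        self.global_var_1 = True
        self.global_var_2 = 5

    def func_2(self, z):
        return z + self.global_var_2


class MainClass(BaseClass):
    # The long functions live here and reach the base class's state and
    # methods through self.
    def long_func_1(self, a, b):
        if self.global_var_1:
            a = self.func_2(a)
        return a * b


m = MainClass()
result = m.long_func_1(1, 2)  # (1 + 5) * 2
print(result)  # 12
```

MainClass defines no __init__ of its own, so BaseClass.__init__ runs automatically; if it did define one, it would need to call super().__init__().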
cowbert
  • 2,939
  • 2
  • 21
  • 33