
I want to structure a Python repo that contains multiple Spark applications, each of which is a separate application. I want to have some common packages that all the others can use, and some packages that are standalone Spark applications.

I need to be able to build each of the packages separately into a wheel file, both the common packages and the standalone Spark applications.

I also want to have separate test files for each of these packages.

Is the following structure a good practice?

root
├── common_package_a
│   ├── package_a_tests
│   ├── requirements.txt
│   ├── venv
│   ├── setup.py
├── common_package_b
│   ├── package_b_tests
│   ├── requirements.txt
│   ├── venv
│   ├── setup.py
│   .
│   .
│   .
├── spark_application_a
│   ├── spark_application_a_tests
│   ├── requirements.txt
│   ├── venv
│   ├── setup.py
├── spark_application_b
│   ├── spark_application_b_tests
│   ├── requirements.txt
│   ├── venv
│   ├── setup.py
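
For reference, this is roughly what I imagine each sub-package's setup.py would look like (a minimal sketch; the package name, version, and the pyspark pin are placeholders I made up):

# common_package_a/setup.py
from setuptools import setup, find_packages

setup(
    name="common_package_a",
    version="0.1.0",
    packages=find_packages(exclude=["package_a_tests", "package_a_tests.*"]),
    install_requires=[
        # runtime dependencies for this package only; pyspark is often
        # provided by the cluster, so it might belong in extras_require instead
        "pyspark>=3.0,<4.0",
    ],
)

Each spark_application_* would then list the common packages it needs in its own install_requires, and I assume I could build each directory separately with pip wheel . or python -m build --wheel to get one wheel per package.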

I can't find a recommended structure for this goal; all the examples of how to build a Python project have a single setup.py in the root dir and a single venv for the entire project.

I've looked at some questions similar to mine:

  1. https://discuss.python.org/t/how-to-best-structure-a-large-project-into-multiple-installable-packages/5404/2
  2. How do you organise a python project that contains multiple packages so that each file in a package can still be run individually?

Thanks!
