Good day,
I am trying to use a library that relies on Spark/PySpark, and I want to parallelize my calls to it with multiprocessing. However, whenever I use Pool, I get a Py4J error. Similar situations are already discussed in these questions (q1 & q2), but they still don't answer (at least for me) why Pool fails while ThreadPool works. What's more, I don't understand why that replacement is acceptable: as far as I understand it, multiprocessing spreads the work over a certain number of processes/cores, while multithreading divides up the tasks but still performs them one at a time. Is it still possible to run several processes simultaneously, or not? And finally, why does the Py4JError actually occur?
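Here is a minimal sketch of the pattern I mean (the DataFrame and the count_filtered function are just placeholders, not the actual library I'm using):

```python
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("pool-test").getOrCreate()
df = spark.range(100)

def count_filtered(threshold):
    # Goes through the JVM-backed DataFrame, i.e. through the Py4J gateway
    return df.filter(df["id"] > threshold).count()

thresholds = [10, 20, 30, 40]

# This works for me: the threads live in the driver process
with ThreadPool(4) as tp:
    print(tp.map(count_filtered, thresholds))

# This is where I get the Py4J error
with Pool(4) as p:
    print(p.map(count_filtered, thresholds))
```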