I am solving (or say, trying to find good solutions for) an arbitrary combinatorial optimization problem, think of it as a Vehicle Routing Problem with a bunch of side constraints that are not relevant for this question. I coded an Adaptive Large Neighborhood Search (ALNS) for it.
I want to explore "easy" opportunities to enhance its speed by using some kind of parallelization, to at least make use of what every modern computer has available nowadays (more than a single core).
I have already tried two simple "tweaks". 1) Simply run the ALNS for each available thread and pick the best solution at the end, and 2) Run the ALNS for each available thread, but ensure that after every X iterations each thread continues with the best solution of all the threads at that time.
My question is if there are other alternatives to parallelize an ALNS which are easy to implement, won't require days of coding, and potentially impact the runtime/solution quality in a positive way?
Ps. I code in C++, so suggestions tailored towards that are welcome.