
While well-known IP/LP solvers such as CPLEX and Gurobi can run on multiple threads, is there any tool that helps exploit not only the multiple threads of a single CPU but also, increasingly, the multiple processing units on a machine? For instance, my laptop has one CPU with 8 threads plus two GPUs.

(a) Are there solvers that exploit the presence of multiple processing units -- CPU + additional GPUs on a machine?

(b) Is there any benefit to learning DPC++ and SYCL (or other options such as OpenCL/CUDA) programming for operations research computations?

(c) Are there any open source template code bases in C/C++ that someone can use to build their own OR models?

Tryer

2 Answers


(a) Are there solvers that exploit the presence of multiple processing units -- CPU + additional GPUs on a machine?

Many solvers can use multiple CPU cores. Fewer can do distributed optimization. GPUs have been used in some academic studies, but in general they don't work very well on large, sparse LP/MIP models, so there is, as of yet, not much commercial interest. Of course, GPUs are much more popular in dense linear algebra and machine learning.

(b) Is there any benefit to learning DPC++ and SYCL (or other options such as OpenCL/CUDA) programming for operations research computations?

If you want to develop your own high-performance algorithms, this may be of some interest. For modeling, not so much: it is usually better to use readily available tools, since buying good tools is much cheaper than developing them.
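
If you do want to explore this route, here is what a GPU-offloaded vector update looks like in SYCL 2020. This is only a minimal sketch (not part of the original answer), assuming a SYCL compiler such as DPC++ or AdaptiveCpp is available; the kernel is an axpy-style update of the kind that appears inside first-order and interior-point methods.

```cpp
#include <sycl/sycl.hpp>
#include <vector>

int main() {
    const size_t n = 1 << 20;
    std::vector<float> x(n, 1.0f), y(n, 2.0f);

    sycl::queue q;  // picks a default device (a GPU if one is available)
    {
        // Buffers hand the host data over to the SYCL runtime.
        sycl::buffer<float> bx{x.data(), sycl::range<1>{n}};
        sycl::buffer<float> by{y.data(), sycl::range<1>{n}};

        q.submit([&](sycl::handler& h) {
            sycl::accessor ax{bx, h, sycl::read_only};
            sycl::accessor ay{by, h, sycl::read_write};
            // y := y + 2 * x, executed as n independent work-items.
            h.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> i) {
                ay[i] += 2.0f * ax[i];
            });
        });
    }  // buffer destructors wait for the kernel and copy results back to y
    return 0;
}
```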

(c) Are there any open source template code bases in C/C++ that someone can use to build their own OR models?

Most model development is not done in C++ but in higher-level languages such as Python or in modeling languages such as AMPL and GAMS. C++ is more for developing high-performance algorithms than for model building. Modelers like to be efficient (they often work under time pressure), and low-level languages do not help there; model maintenance also tends to be harder in a low-level language. Performance is usually less of an issue when building models, as opposed to solving them: solvers are often written in C or C++.
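
To make the contrast concrete, here is what even a two-variable LP (maximize 3x + 2y subject to x + y <= 4 and x + 3y <= 6) looks like through a low-level open-source C API. This is a minimal sketch using GLPK, assuming it is installed and linked with -lglpk; the same model is only a few lines in AMPL, GAMS, or a Python modeling layer, which is exactly why modelers prefer those.

```cpp
#include <glpk.h>
#include <cstdio>

int main() {
    glp_prob* lp = glp_create_prob();
    glp_set_obj_dir(lp, GLP_MAX);

    glp_add_rows(lp, 2);
    glp_set_row_bnds(lp, 1, GLP_UP, 0.0, 4.0);   // x +  y <= 4
    glp_set_row_bnds(lp, 2, GLP_UP, 0.0, 6.0);   // x + 3y <= 6

    glp_add_cols(lp, 2);
    glp_set_col_bnds(lp, 1, GLP_LO, 0.0, 0.0);   // x >= 0
    glp_set_col_bnds(lp, 2, GLP_LO, 0.0, 0.0);   // y >= 0
    glp_set_obj_coef(lp, 1, 3.0);                // objective: 3x + 2y
    glp_set_obj_coef(lp, 2, 2.0);

    // Constraint matrix in 1-based coordinate (row, col, value) form.
    int ia[5], ja[5];
    double ar[5];
    ia[1] = 1; ja[1] = 1; ar[1] = 1.0;
    ia[2] = 1; ja[2] = 2; ar[2] = 1.0;
    ia[3] = 2; ja[3] = 1; ar[3] = 1.0;
    ia[4] = 2; ja[4] = 2; ar[4] = 3.0;
    glp_load_matrix(lp, 4, ia, ja, ar);

    glp_simplex(lp, nullptr);
    std::printf("obj = %g, x = %g, y = %g\n",
                glp_get_obj_val(lp),
                glp_get_col_prim(lp, 1),
                glp_get_col_prim(lp, 2));
    glp_delete_prob(lp);
    return 0;
}
```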

It seems you are mixing up model development and solver development. These are two very different crafts, at least for traditional mathematical programming (e.g., LP/MIP) and for constraint programming (CP). When writing your own tailored heuristic algorithms, the distinction becomes a bit more blurred.

Erwin Kalvelagen
  • Thank you. Even in model development, could it not be the case that you have to write your own callback routine, separation algorithm, pricing algorithm, etc., to feed into CPLEX/Gurobi, and that these could benefit from HPC/GPU offloading? It seems to me that in such cases very few programming languages can beat C/C++. Hence my post. – Tryer Feb 09 '23 at 06:54
  • Most practical models don't use these facilities, so I did not consider this; it covers less than 1% of the cases (maybe more for academics). Practical models are often less well structured and don't benefit as much from these approaches. – Erwin Kalvelagen Feb 09 '23 at 06:58
  • Many of the HPC examples I have come across seem to come from the academic mechanical engineering/physics community. So even within the academic OR community there may not be many applications of HPC/GPU programming other than solver programming. That is somewhat surprising, to be honest, as I would expect the academic OR/stats/applied math modelling community to be highly interested in GPU programming. Thank you for your insight. – Tryer Feb 09 '23 at 07:06
  • I don't see much overlap in modeling and developing algorithms/code for GPUs. I would hope modelers would spend more time on understanding the business case than on tinkering with GPUs. – Erwin Kalvelagen Feb 09 '23 at 08:12

There are at least 4 or 5 companies building and selling software for the solution of (mixed-integer) linear optimization problems. All of these companies have many very clever employees who have put many person-years into the software.

I am pretty sure that if any of those companies thought they could make their software much better than the competitors' by using GPUs, they would do it, simply because they could make a lot of money that way.

I have been working on Mosek for almost 25 years, and based on that experience there is no simple trick, like using a GPU, that will make the software much better. Rather, if none of the optimization software vendors is doing something, then it is most likely impossible or not worthwhile.

Regarding the multithreading question raised in the comments after the initial post: I only have experience with parallelizing interior-point methods. My conclusion is that parallelizing something like an inner product provides little benefit, because the computation is memory bound. Parallelizing something like a big dense matrix multiplication is fairly easy, because you parallelize O(n^3) computations but only read O(n^2) data. Therefore, the benefit of solving an LP in parallel will depend heavily on which type of computation dominates, and that is problem dependent.
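
To illustrate the memory-bound versus compute-bound distinction, here is a rough sketch (not from the answer itself; it assumes a compiler with OpenMP support, e.g. compiled with -fopenmp). The inner product does roughly one multiply-add per 16 bytes read, so extra threads mostly wait on memory, while the dense matrix product performs O(n^3) flops on O(n^2) data and scales much better.

```cpp
#include <vector>

// Inner product: ~1 multiply-add per 16 bytes read -> memory bound,
// so adding threads gives little speed-up once bandwidth is saturated.
double dot(const std::vector<double>& x, const std::vector<double>& y) {
    double s = 0.0;
    #pragma omp parallel for reduction(+ : s)
    for (long i = 0; i < static_cast<long>(x.size()); ++i)
        s += x[i] * y[i];
    return s;
}

// Naive dense matmul C += A * B (n x n, row-major, C zero-initialized):
// O(n^3) flops on O(n^2) data -> compute bound, parallelizes well over rows.
void matmul(int n, const std::vector<double>& A,
            const std::vector<double>& B, std::vector<double>& C) {
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < n; ++k) {
            const double aik = A[i * n + k];
            for (int j = 0; j < n; ++j)
                C[i * n + j] += aik * B[k * n + j];
        }
}
```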

ErlingMOSEK
  • Thank you for sharing your experience. This has also been my experience with general multithreaded (not GPU-distributed) LP/IP solving on a single CPU. Somehow, multithreaded computation did not seem to provide the promised speed-up when solving hard IP problems. Perhaps the I/O, synchronization, etc., involved in multithreading/GPU offloading are too time consuming, making this type of programming less than worthwhile for the additional programming/coding effort needed to get it to work bug-free. – Tryer Feb 10 '23 at 03:21
  • No, multi-threading helps a lot. It works so well, that solvers by default use all available cores. – Erwin Kalvelagen Feb 10 '23 at 03:40