I'm looking for parallel computing in the smallest form factor possible. For example, two 18-core Intel Xeons mounted on either side of a credit-card-sized motherboard would be the ideal solution.
I couldn't find any motherboards (compatible with CPUs with 10+ cores) that come without USB, Wi-Fi, PCIe and other ports, which only make them bigger.
More specifically, I need the best possible cores-per-square-meter ratio: CPU and motherboard models.
- 1.5-2 GHz is OK. More is better, but not critical
- yes, it is to build a high-density CPU farm
- no, it is not for mining; it is for math, non-GPU calculations
- yes, the number of cores (multithreading) is critical
- yes, huge server motherboards with 4x Intel Broadwell-EP CPUs are OK as long as they give the best cores-per-area ratio
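To make the cores-per-square-meter metric concrete, here is a minimal Python sketch. It assumes density is measured over the board footprint only (no cooling, PSU or rack space), and it takes the credit-card size to be the ISO/IEC 7810 ID-1 format, 85.60 mm x 53.98 mm. The function name is just for illustration.

```python
# Footprint of a credit card (ISO/IEC 7810 ID-1), in metres.
CARD_WIDTH_M = 0.0856
CARD_HEIGHT_M = 0.05398


def cores_per_square_metre(cores, width_m, height_m):
    """Core density over the board footprint only (assumption:
    cooling, power supply and rack space are not counted)."""
    if width_m <= 0 or height_m <= 0:
        raise ValueError("board dimensions must be positive")
    return cores / (width_m * height_m)


# The "ideal" board from the question: two 18-core Xeons on a
# credit-card-sized motherboard.
ideal = cores_per_square_metre(2 * 18, CARD_WIDTH_M, CARD_HEIGHT_M)
print(f"ideal board: {ideal:.0f} cores per square metre")
```

The ideal board works out to roughly 7,800 cores per square meter. Any real CPU/motherboard pair can be compared against that number by plugging in its own core count and board dimensions.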
Updated (12.07):
Considering the answers I have received so far:
- No AMD at all; Xeon processors it is. Whether E5 or E7 is to be evaluated by cost/density/power; I will update with calculations later.
- I'm also interested in extreme-density blade servers WITH Xeon Phi support: some of the applications will run nicely on it, reusing code and data. I have spent a day looking at specifications and costs, and will update as soon as I have a good configuration in hand.
- Regarding CUDA and Nvidia Tesla: that is a separate question and it is already solved. I will share the specifications with you later. Thank you, SEJPM!
What is already decided is that the configuration will have BOTH Xeon E5/E7 and Xeon Phi on the same boards (though not on all of them). Cray supercomputers already use this combination.
Updated (12.07)[2]:
To be clear: I have a lot of small binaries (cross-platform, C++ and Java), each of which runs for a few seconds and fully supports parallel computing. When there are significantly fewer cores than threads/processes started, overall efficiency drops because of heavy context switching between processes. And there is no way to queue jobs; let's say they have to be done in real time.
The best way is to distribute them among Xeon Phi cards (store the code there and preseed it with data) and run the others as tiny services waiting for jobs (which solves the binary-loading overhead).
That is why I am searching for a comparatively cheap solution with high core density: so that I don't have to maintain a large number of units, which also solves more of the data-logistics problems.
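The sizing argument can be sketched in a few lines of Python. Since jobs cannot be queued, the farm needs at least as many cores as the peak number of concurrently runnable threads, and denser nodes mean fewer units. The peak thread count below is a made-up placeholder, not a measured figure, and the function names are only illustrative.

```python
def oversubscription(runnable_threads, cores):
    """Runnable threads per core; above 1.0 the threads must
    time-share cores, which is where context switching comes from."""
    return runnable_threads / cores


def nodes_needed(peak_threads, cores_per_node):
    """Fewest nodes that give every peak-time thread its own core
    (ceiling division in integer arithmetic)."""
    return -(-peak_threads // cores_per_node)


PEAK_THREADS = 1000      # hypothetical peak load, for illustration only
CORES_PER_NODE = 2 * 18  # the dual 18-core board from the question

print(f"one node: {oversubscription(PEAK_THREADS, CORES_PER_NODE):.1f} threads per core")
print(f"nodes needed without oversubscription: {nodes_needed(PEAK_THREADS, CORES_PER_NODE)}")
```

Doubling the cores per node halves the number of units to maintain for the same peak load, which is the whole point of chasing density.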


