Largest number of cores on the smallest board

Question

I'm looking for parallel computing in the tiniest form possible. For example, two 18-core Intel Xeons mounted on both sides of a credit-card-sized motherboard would be the ideal solution.

I couldn't find any motherboards (compatible with CPUs with 10+ cores) that come without USB, Wi-Fi, PCIe and other ports, which only make them bigger.

More specifically, I need the best possible cores-per-square-meter ratio: CPU and motherboard models.

1.5-2 GHz is OK. More is better, but not critical
yes, it is to build a high-density CPU farm
no, it is not for mining; it is for math, non-GPU calculations
yes, the number of cores (multithreading) is critical
yes, huge server motherboards with 4x "Intel Broadwell-EP" CPUs are OK as long as they give the best cores-per-area ratio

Updated (12.07):

Considering the answers I have so far:

No AMD at all; Xeon processors it is. E5 vs. E7 is to be evaluated by cost/density/power, and I will update with calculations later.
I'm also looking at extreme-density blade servers WITH Xeon Phi support - some of the applications will run nicely on it, reusing code and data. I have spent a day looking at specifications and cost and will update as soon as I have a good configuration in hand
Regarding CUDA and Nvidia Tesla, that is a separate question and it is already solved; I will share specifications later. Thank you, SEJPM!

What is already decided is that the configuration will have BOTH Xeon E5/E7 and Xeon Phi on the same boards (not on all of them, though). Cray supercomputers already use this combination.

Updated (12.07)[2]:

To be clear: I have a lot of small binaries (cross-platform, C++ and Java), each of which runs for a few seconds and fully supports parallel computing. When there are significantly fewer cores than threads/processes started, overall efficiency drops because of heavy context switching between processes. And there is no way to queue jobs; let's say they have to be done in real time.

The best way is to distribute them among Xeon Phis (store the code and preseed it with data) and run the others as tiny services waiting for jobs (which solves the binary-loading overhead).

That is why I'm searching for a comparatively cheap solution with high core density, so that I don't have to maintain a large number of units, which would bring more data-logistics problems.
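The "tiny services waiting for jobs" idea can be sketched roughly as follows. This is only an illustration under assumptions, not the asker's actual setup: `start_workers`, `stop_workers` and the `handler` callback are hypothetical names, and the sketch uses Python threads purely to show the shape (one long-lived worker per core, so runnable workers never outnumber cores). In the real C++/Java binaries the workers would be native threads or processes. The queue here is only the hand-off channel; with as many workers as cores, a job is picked up as soon as a core is free.

```python
import os
import queue
import threading


def start_workers(handler, n_workers=None):
    """Start long-lived workers that block on a shared job channel.

    n_workers defaults to the number of logical cores, so the number of
    runnable workers never exceeds the number of cores.
    """
    n = n_workers or os.cpu_count() or 1
    jobs = queue.Queue()
    results = queue.Queue()

    def loop():
        while True:
            job = jobs.get()          # sleeps until a job arrives
            if job is None:           # sentinel: shut this worker down
                break
            results.put((job, handler(job)))

    threads = [threading.Thread(target=loop, daemon=True) for _ in range(n)]
    for t in threads:
        t.start()
    return jobs, results, threads


def stop_workers(jobs, threads):
    """Send one shutdown sentinel per worker and wait for all of them."""
    for _ in threads:
        jobs.put(None)
    for t in threads:
        t.join()
```

Usage would look like `jobs, results, threads = start_workers(my_handler)`, then `jobs.put(x)` for each incoming job and `stop_workers(jobs, threads)` on shutdown, where `my_handler` is whatever computation a worker performs.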

score 4 · Answer 1 · answered Sep 08 '16 at 19:05


If you are willing to do a bit more work on integration with whatever you want to run, you can try a Parallella board. That gives you a 16-core RISC coprocessor plus a dual-core main processor on a board the size of a credit card. They use very little energy and are specifically designed to be used in clusters or in parallel (hence the name) applications. As a bonus, they start at about $100 each and they run Linux. https://www.parallella.org/
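To put a rough number on the cores-per-area metric the question asks about, here is a small sketch. The board dimensions are assumptions: it uses the ISO/IEC 7810 ID-1 card size (85.60 x 53.98 mm) as a stand-in for "credit card size", and a standard mini-ITX footprint (170 x 170 mm) with a single 22-core Xeon for comparison. `cores_per_m2` is a hypothetical helper name. The cores being compared are very different in capability, so this says nothing about throughput.

```python
def cores_per_m2(cores, width_mm, depth_mm):
    """Cores per square metre of board footprint (ignores height and cooling)."""
    area_m2 = (width_mm / 1000.0) * (depth_mm / 1000.0)
    return cores / area_m2


# Parallella: 16 coprocessor cores + 2 host cores.
# ASSUMPTION: ID-1 card size stands in for the real board outline.
parallella = cores_per_m2(18, 85.60, 53.98)

# ASSUMPTION: one 22-core Xeon on a standard 170 x 170 mm mini-ITX board.
xeon_itx = cores_per_m2(22, 170, 170)

print(f"Parallella:    {parallella:,.0f} cores/m^2")
print(f"Xeon mini-ITX: {xeon_itx:,.0f} cores/m^2")
```

Footprint alone ignores power supplies, cooling and interconnect, which tend to dominate real cluster density.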


JBiggs


Now, THIS is something I may be interested in, though I would likely find myself building a custom case out of wood or something. Cannot just have them lying around all unprotected. – NZKshatriya Nov 01 '16 at 21:39

score 2 · Accepted Answer · answered Jul 11 '16 at 14:03

I am going to assume you need x86-compatible cores/threads, and I am going to assume you don't want to use something like the Xeon Phi compute card, for some reason or another. I am also going to assume, because you didn't mention them, that power requirements and heat dissipation are not something you're worried about.

That being said, there are some absolute monsters out there you can fit into pretty small builds.

On the AMD side, you're stuck back in 2012 unless you get fancy with ARM, but you can still get very powerful 64 real-core systems built on quad-socket G34 server boards that are somehow crammed into 1U chassis by Supermicro and other companies.

On the Intel side, where I suspect you'll end up, you can get the formidable Intel Xeon E5-2699 V4, with 22 cores and 44 threads PER PROCESSOR, onto large multi-CPU boards, or onto thin ITX boards like this one: http://www.anandtech.com/show/9221/asrock-rack-announces-epc612d4i, or this one http://www.asrockrack.com/general/productdetail.asp?Model=EP2C612D8HM#Specifications - whichever way nets you the most density. I'm not recommending AsRock as a brand per se either, I'm just showing you what's possible using their site.

If none of that will suffice, you're really gonna have to look into stuff like Xeon Phi, IBM Power8, or ARM architectures. Although more exotic and therefore more difficult to work with, these types of CPU offer very high compute densities, each taking a different approach. More information/research would be necessary before anything in this area could be recommended.

AsRock with two 2011-R3 sockets is the best so far. I will check Xeon Phi and add more comments after that, thank you! If nothing better is found in the next few days, I will accept your answer. — iXCray, Jul 11 '16 at 14:35
@iXCray note that Xeon E7s sometimes have more cores (for Broadwells at least) and you can run those in 4 and 8 socket configurations as opposed to 2 sockets with E5s. They are super expensive though (as are the 4/8 socket boards) — SEJPM, Jul 11 '16 at 17:25
Thank you SEJPM - I thought I was missing something regarding the E7 lineup, but in my brief search I couldn't find sites selling those with enough information for me to make an informed recommendation.
To the OP - please consider an E7 configuration, cost no object. Even if you had to step "down" to Broadwell, the IPC differences under consideration would not outweigh almost any addition of threads. — Adam Wykes, Jul 11 '16 at 17:33

score 2 · Answer 3 · answered Jul 13 '16 at 15:24

Another option you may want to look at is industrial single-board computers based on the PICMG 1.3 spec.

For example, the ROBO-8122VG2R SBC supports a pair of E5-2600 series CPUs.

There is also the Advantech PCE-9228, which actually specifies that it supports v3 CPUs.

Combine these with a quad split chassis backplane and you could potentially have 8 Xeons and 4 Xeon Phis in a single 4U 19" rack chassis.

I'm not sure this would be any more compact than a cluster of 1U rack servers, but I suspect that they might be easier to work on.

Well, they are great, but it is more likely that I'll use Knights Landing (Xeon Phi), which is now available not only as a coprocessor but also as a standalone CPU. 60+ cores, 240+ hardware threads. — iXCray, Jul 14 '16 at 17:36

score 1 · Answer 4 · answered Jul 11 '16 at 19:48

In my answer I'll expand a little bit on Adam's answer. I'll also restrict myself to Intel processors and all brand recommendations should be taken with a grain of salt given the fact that I don't have any actual / hands-on experience with this sort of hardware.

So your aim is maximal density of cores per unit of space. What this means is that you want to fit as many processors into as small a space as possible.

If you don't really care about the distribution of CPUs across mainboards, then 1U servers with full-scale 2-socket Xeon E5 v4s are the way to go. Or, if you're willing to spend some time searching and asking hardware providers, you can probably also get a 4-socket Xeon E7 v4 1U setup (or, if you ask really nicely, an 8-socket mainboard may also fit; note that 4XXX CPUs are for 4-socket configurations and 8XXX CPUs are for 8-socket configurations). This should have the most cores per unit of space if you don't care about grouping.

If you want to have as many cores on a single board as possible, then going with 8-socket Xeon E7 v4s is the way to go. Although chances are that it's going to be less optimal in the cores / space department.

Additionally, you should consider using Xeon Phi and/or Nvidia Tesla accelerator cards. The Nvidia cards go well with highly parallelizable (small) workloads, while the Phis are basically dumbed-down Intel processors grouped onto a PCIe card and thus cope much better with more sequential, general-purpose workloads.

One last note: depending on what you plan to do, a proper RISC architecture such as PowerPC, ARM or SPARC may be worth a look, given that a lot of the current Top500 supercomputers use these architectures.

Example providers of Xeon E7 equipment include Delta Computers (only in German?), Lenovo and Supermicro.
As for the CPUs, the Intel Xeon E5-2699 v4 is the dual-socket CPU with the most cores. The Intel Xeon E5-4669 v4 is the option for 4-socket boards (22 cores each) and the Intel Xeon E7-8890 v4 for 8-socket boards (24 physical cores each). The Nvidia Tesla P100 is Nvidia's best current compute card, and the Intel Xeon Phi 7290F is the top current Xeon Phi part (a socketed Knights Landing processor rather than a PCIe card).
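As a quick sanity check on what these socket counts add up to per board, here is a small sketch. The core counts are the ones quoted in this thread; the factor of 2 hardware threads per core assumes Hyper-Threading is enabled, and `totals` is a hypothetical helper name.

```python
def totals(sockets, cores_per_socket, threads_per_core=2):
    """Return (physical cores, hardware threads) for one board."""
    cores = sockets * cores_per_socket
    return cores, cores * threads_per_core


# ASSUMPTION: Hyper-Threading on, i.e. 2 hardware threads per core.
configs = {
    "2 x E5-2699 v4": totals(2, 22),
    "8 x E7-8890 v4": totals(8, 24),
}

for name, (cores, threads) in configs.items():
    print(f"{name}: {cores} cores, {threads} threads")
```

Whether the 8-socket board wins on density still depends on how much chassis space it needs compared with several 2-socket 1U servers.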

I didn't include the Teslas because AFAIK they are essentially GPUs without the RAMDACs and other graphics-specific parts on them, and DP precision enabled in the firmware. If you look at OP's post, they appear not to want GPGPU compute. — Adam Wykes, Jul 11 '16 at 20:51
@AdamWykes, I've included them as an option, because I don't know the exact work load he has and to make the position of the Xeon Phis more clear and easy to understand. — SEJPM, Jul 11 '16 at 20:53
