This issue has become much more nuanced as changes in architecture have shifted the HPC landscape. Since Wolfgang Bangerth's answer covers the longstanding view, I'll split my answer into basic definitions and further details.
Basic Definitions
A node refers to the physical box, i.e. the CPU sockets with north/south bridges connecting the memory systems and expansion cards, e.g. disks, NICs, and accelerators.
A CPU socket is the connector between these systems and the CPU cores: into it you plug a chip containing multiple CPU cores. Having more than one socket splits the cache/memory space, hence the need for NUMA-aware code.
A CPU core is an independent computing unit with its own pipeline, logical units, and memory controller. Each CPU core can service a number of hardware threads, each with an independent instruction stream but sharing the core's memory controller and other logical units. A quick way to see this layout on a node is sketched below.
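As a minimal sketch of how this hierarchy shows up in practice, assuming a Linux x86 machine where /proc/cpuinfo and /sys/devices/system/node are available (on a real HPC node you would more likely run lscpu or use hwloc):

```python
# Minimal sketch: count the hardware threads, physical cores, sockets, and NUMA
# nodes visible on a Linux machine by parsing /proc/cpuinfo and /sys.
import glob
import os

logical_cpus = os.cpu_count()  # hardware threads the OS can schedule on
numa_nodes = len(glob.glob("/sys/devices/system/node/node[0-9]*"))

sockets, cores = set(), set()
physical_id = None
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("physical id"):    # identifies the socket
            physical_id = line.split(":")[1].strip()
            sockets.add(physical_id)
        elif line.startswith("core id"):      # identifies the core within that socket
            cores.add((physical_id, line.split(":")[1].strip()))

print(f"logical CPUs (threads): {logical_cpus}")
print(f"physical cores:         {len(cores)}")
print(f"sockets:                {len(sockets)}")
print(f"NUMA nodes:             {numa_nodes}")
```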
This notion of node and CPU core gets you through most HPC queuing systems, but note that many HPC centers charge in "Service Units" (SUs), a rate that varies with the characteristics of the node you request.
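Every center defines its own formula, but as a hypothetical sketch (the charge factors below are invented for illustration, not any real center's rates), an SU bill usually amounts to node-hours scaled by a per-node-type factor:

```python
# Hypothetical Service Unit (SU) accounting; rates are made up for illustration.
CHARGE_FACTOR = {      # SUs per node-hour, varying with "aspects of the node"
    "standard": 1.0,
    "big-memory": 2.0,
    "gpu": 4.0,
}

def service_units(node_type: str, nodes: int, wallclock_hours: float) -> float:
    """Charge = node-hours * per-node-type factor."""
    return nodes * wallclock_hours * CHARGE_FACTOR[node_type]

# e.g. a 16-node GPU job running for 3 hours:
print(service_units("gpu", nodes=16, wallclock_hours=3.0))  # 192.0 SUs
```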
Going further
If you want to give meaningful performance details for a distributed code, the story is a bit more troublesome. Let me put it in terms of questions this model doesn't answer:
- How many cores does a GPU accelerator have?
A GPU is built from many very small processors with few logical units each, so comparing them to x86 CPU cores is not fair. Nonetheless, marketing will tell you that GPUs have thousands of "cores" (see the sketch after this list).
- What is a node when cloud architectures put many traditional nodes on a single physical server with integrated networking between them?
Companies like Calxeda are working around many of the inefficiencies of current node configurations, and what used to be a traditional node now shares many more systems with its neighbors. The idea of a node is becoming vague.
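To make the GPU-core comparison concrete, here is a back-of-the-envelope sketch with invented, merely illustrative numbers: the marketed count multiplies the independent schedulers (streaming multiprocessors) by the narrow arithmetic lanes inside each, and those lanes are closer to SIMD lanes than to independent x86 cores.

```python
# Invented numbers for illustration only; they do not describe any specific GPU.
streaming_multiprocessors = 80   # independent schedulers, the closest analogue to CPU cores
lanes_per_sm = 64                # narrow arithmetic units per SM, marketed as "cores"

marketed_cores = streaming_multiprocessors * lanes_per_sm
print(f"marketed 'cores':       {marketed_cores}")              # 5120
print(f"independent schedulers: {streaming_multiprocessors}")   # 80
```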