This issue has become much more nuanced as changes in architecture have shifted the HPC landscape. Since Wolfgang Bangerth's answer covers the longstanding view, I'll split my answer into basic definitions and further details.
Basic Definitions
A node refers to the physical box, i.e. the CPU sockets with north/south bridges connecting the memory systems and expansion cards, e.g. disks, NICs, and accelerators.
A CPU socket is the connector between these systems and the CPU cores: into it you plug a chip containing multiple CPU cores. Having more than one socket splits the cache/memory space, hence the need for NUMA-aware code.
A CPU core is an independent computing unit with its own pipeline, logical units, and memory controller. Each CPU core can service a number of hardware threads, each with an independent instruction stream but sharing the core's memory controller and other logical units. A quick way to see this layout on a node is sketched below.
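As a minimal sketch of how this hierarchy shows up in practice, assuming a Linux x86 machine where /proc/cpuinfo and /sys/devices/system/node are available (on a real HPC node you would more likely run lscpu or use hwloc):

```python
# Minimal sketch: count the hardware threads, physical cores, sockets, and NUMA
# nodes visible on a Linux machine by parsing /proc/cpuinfo and /sys.
import glob
import os

logical_cpus = os.cpu_count()  # hardware threads the OS can schedule on
numa_nodes = len(glob.glob("/sys/devices/system/node/node[0-9]*"))

sockets, cores = set(), set()
physical_id = None
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("physical id"):    # identifies the socket
            physical_id = line.split(":")[1].strip()
            sockets.add(physical_id)
        elif line.startswith("core id"):      # identifies the core within that socket
            cores.add((physical_id, line.split(":")[1].strip()))

print(f"logical CPUs (threads): {logical_cpus}")
print(f"physical cores:         {len(cores)}")
print(f"sockets:                {len(sockets)}")
print(f"NUMA nodes:             {numa_nodes}")
```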
This notion of node and CPU core gets you through most HPC queuing systems, but note that many HPC centers charge in "Service Units" (SUs), a rate that varies with the characteristics of the node you request.
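Every center defines its own formula, but as a hypothetical sketch (the charge factors below are invented for illustration, not any real center's rates), an SU bill usually amounts to node-hours scaled by a per-node-type factor:

```python
# Hypothetical Service Unit (SU) accounting; rates are made up for illustration.
CHARGE_FACTOR = {      # SUs per node-hour, varying with "aspects of the node"
    "standard": 1.0,
    "big-memory": 2.0,
    "gpu": 4.0,
}

def service_units(node_type: str, nodes: int, wallclock_hours: float) -> float:
    """Charge = node-hours * per-node-type factor."""
    return nodes * wallclock_hours * CHARGE_FACTOR[node_type]

# e.g. a 16-node GPU job running for 3 hours:
print(service_units("gpu", nodes=16, wallclock_hours=3.0))  # 192.0 SUs
```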
Going further
If you want to give meaningful performance details for a distributed code, the story is a bit more troublesome. Let me put it in terms of questions this model doesn't answer:
- How many cores does a GPU accelerator have?
A GPU is built from many very small processors with few logical units each, so comparing them to x86 CPU cores is not fair. Nonetheless, marketing will tell you that GPUs have thousands of "cores" (see the sketch after this list).
- What is a node when cloud architectures put many traditional nodes on a single physical server with integrated networking between them?
Companies like Calxeda are working around many of the inefficiencies of current node configurations, and what used to be a traditional node now shares many more systems with its neighbors. The idea of a node is becoming vague.
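To make the GPU-core comparison concrete, here is a back-of-the-envelope sketch with invented, merely illustrative numbers: the marketed count multiplies the independent schedulers (streaming multiprocessors) by the narrow arithmetic lanes inside each, and those lanes are closer to SIMD lanes than to independent x86 cores.

```python
# Invented numbers for illustration only; they do not describe any specific GPU.
streaming_multiprocessors = 80   # independent schedulers, the closest analogue to CPU cores
lanes_per_sm = 64                # narrow arithmetic units per SM, marketed as "cores"

marketed_cores = streaming_multiprocessors * lanes_per_sm
print(f"marketed 'cores':       {marketed_cores}")              # 5120
print(f"independent schedulers: {streaming_multiprocessors}")   # 80
```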