
We would like to run molecular dynamics (MD) simulations of proteins of around 20,000 atoms in explicit water, with trajectories of around 1 microsecond each. We are looking at different options for computing resources to complete these simulations.

Since we (in Europe) cannot apply for supercomputing time on Anton (from D. E. Shaw Research), and we have some funds available (up to 500k €), we wonder which would be the best cluster or HPC infrastructure to buy for such calculations.

Aron Ahmadia
Open the way
  • What scientific question are you trying to answer? I do not believe that microsecond MD answers many scientific questions. – Jeff Hammond Oct 05 '12 at 01:31
  • If you study protein folding, you should be aware that many proteins fold on the micro- and even millisecond timescales, and some take even longer. Ligand-induced conformational changes in proteins fall into similar time domains. Do you need specific bibliographic references? – Open the way Oct 05 '12 at 03:13
  • I posted an answer, but before continuing, is the simulation 20K atoms before water is included? If so, how much water will you need to use? – aeismail Oct 05 '12 at 04:47
  • let's say protein + water around 100000 atoms – Open the way Oct 05 '12 at 07:15
  • You may want to decide first what software you want to use before making a decision on hardware. Although you can't request time from Anton, what's keeping you from requesting time at JuGene, for instance? – Deathbreath Oct 05 '12 at 18:24
  • Time on Jugene is awarded through a competitive application process, and may take a long time to become available. The next call has a very near deadline (two weeks from now), and requires a written proposal. Besides, they are looking for jobs requiring thousands of nodes, and this is too small for that, unless they nest jobs together. – aeismail Oct 05 '12 at 19:39

4 Answers


For such a small simulation, I would strongly suggest looking into GPU-based solutions. This is probably what will get you the most ns/day/Euro.

In my opinion, the fastest fully-featured GPU-based molecular dynamics (MD) software out there is ACEMD (see here for timings). The software is commercial, but a free single-GPU version is available for evaluation purposes.

Other fully-featured, yet open-source, GPU-enabled MD packages include NAMD and GROMACS 4.6. Other projects include FenZi, but they don't seem to make their code available.

On the Joint Amber-Charmm (JAC) benchmark, which consists of 23,558 atoms with a relatively short cutoff of 0.9 nm, all of these codes will get a handful of ns per day on a commodity GPU. That is still many days of computing to reach 1 µs, but not bad considering that it runs on one single machine.
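
A ns/day benchmark figure like this converts directly into wall-clock time for a target trajectory length, which is a useful way to compare options. A minimal sketch — the 50 ns/day throughput below is a placeholder for illustration, not a measured benchmark result:

```python
def days_for_trajectory(ns_per_day: float, target_ns: float) -> float:
    """Wall-clock days needed to simulate target_ns at a given throughput."""
    return target_ns / ns_per_day

# Hypothetical throughput of 50 ns/day: a 1 us (1000 ns) trajectory
# would then take 20 days of continuous running on one machine.
days = days_for_trajectory(50.0, 1000.0)
```

Plugging in a measured ns/day figure from whichever code and GPU you evaluate gives an immediate time-to-solution estimate for the 1 µs target.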

Pedro
  • @aeismail: I appreciate your edit, but LAMMPS is probably not the right tool for protein simulations, and HOOMD does not seem to do complex systems such as biomolecules at all. I have, however, added FenZi as another full-featured simulation package for proteins. – Pedro Oct 05 '12 at 10:15
  • Why do you think LAMMPS is inadequate for proteins, @Pedro? – Deathbreath Oct 05 '12 at 18:22
  • The usual arguments are that NAMD and Gromacs are faster than LAMMPS, and have some features (including allowing SHAKE along a backbone) that LAMMPS does not. That said, for the specific case of GPU-based processing, LAMMPS and HOOMD are superior options. – aeismail Oct 05 '12 at 19:37
  • @Deathbreath: LAMMPS was initially not designed for protein simulation and while it can do proteins today, it doesn't have a very large following in that field, and thus lacks the ecosystem of tools found in both GROMACS and NAMD (see this discussion for an example). I have also yet to be convinced that LAMMPS is any faster. – Pedro Oct 08 '12 at 09:35
  • @aeismail: I suppose you have references to back-up that claim? I have yet to see a direct comparison of HOOMD against anything but LAMMPS, less so for the specific case of simulating biomolecules. – Pedro Oct 08 '12 at 09:37
  • @Pedro: The issue is that as a GPU tool, both NAMD and Gromacs have many limitations in what they can do, at least relative to LAMMPS and HOOMD. While proteins might be a challenge for the latter codes, these are issues that can be corrected through contributions to the code—which are also much more easily adapted than NAMD and Gromacs. – aeismail Oct 08 '12 at 18:37
  • @aeismail: Sure, but the question referred to proteins, and the asker made no allusions to wanting to develop any code himself/herself, which is why I gave the advice I gave. – Pedro Oct 08 '12 at 20:57
  • @Pedro: The application purpose in the development of MD simulators seems irrelevant since they all integrate classical EOM, which are independent of the specific target molecules; unless LAMMPS doesn't support the functional forms of force fields (which is not the case AFAIK). I'll give you the tools point, but then analysis tools are not MD. On the timing issues, I don't believe any comparisons and speed-ups unless I've tested them on my hardware myself. – Deathbreath Oct 11 '12 at 13:59
  • @Deathbreath: Yes, the actual integration, if the interaction potentials are implemented, is the same. so LAMMPS can definitely do MD. The tools I was referring to are not for the analysis of the end result, but for producing the input: Generating topologies from PDB files, hydrating and adding ions, equilibrating an initial setup... All these things, which have little to do with the simulation itself, are extremely important for actual practical users. – Pedro Oct 11 '12 at 15:37
  • @Pedro, you seem to know quite a bit about simulation (I'm still somewhat new at it), so I have to ask: for serial computation, and with money being no factor, wouldn't CPUs be the fastest? GPUs are composed of many thread-blocks running in parallel at much lower clockspeeds than CPUs, which is great for large systems of millions of atoms, but if I'm not mistaken, a huge number of CPUs would still be faster for simulating long timescales, right? (And by huge, I mean an entirely impractical number of them). – Nick Apr 14 '13 at 19:20
  • As an extreme example to illustrate the point, consider the implications of simulating just 3 atoms for as long a timescale as possible. – Nick Apr 14 '13 at 19:21
  • @Nick, the problem with increasing the number of CPUs is that they somehow have to communicate, whereas all the cores on a single GPU share a common memory and can communicate/synchronize implicitly. Multi-core CPUs use shared memory, but GPUs have orders of magnitude more cores per device. If you have only a very small number of atoms, then no amount of parallelism, CPU or GPU, will save you. – Pedro Apr 14 '13 at 19:43

Even though you have a substantial amount of money available to spend on computing resources, the bigger issue, as Pedro points out, is that your problem is relatively small. With roughly 100,000 atoms, your "sweet spot" on CPUs will likely be about 100 cores. If you try to use more than that, you will end up spending an increasing fraction of each time step communicating information between processors rather than computing. You could try to purchase multithreaded processors, but they might not help nearly as much.
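
The communication trade-off behind that "sweet spot" can be illustrated with a toy cost model: the compute part of each time step shrinks as 1/P across P ranks, while the communication part grows with P, so the speedup peaks at a finite core count. All constants here are invented purely for illustration:

```python
def step_time(p: int, compute: float = 1.0, comm_per_rank: float = 1e-4) -> float:
    """Toy per-step cost: compute work splits across p ranks,
    communication overhead grows linearly with p."""
    return compute / p + comm_per_rank * p

def best_core_count(max_p: int = 1024) -> int:
    """Core count that minimises the toy per-step time."""
    return min(range(1, max_p + 1), key=step_time)
```

With these made-up constants the optimum lands near 100 cores; beyond that, adding ranks makes each step slower, which is exactly the saturation behaviour described above. Real codes have more complicated cost structures, but the qualitative shape is the same.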

However, relatively speaking, your system is in the sweet spot for GPU computing. So your best bet would be either to use a package like HOOMD, or to take advantage of shared-memory machines using the multithreaded package options in codes such as LAMMPS or NAMD, to avoid some of the internal message passing.

aeismail

For all-atom, explicit-solvent biomolecular systems of O(100k) atoms, you should now be using GPU-accelerated codes. Even without knowing the exact setup of your simulations, it is most probable that ACEMD, AMBER, GROMACS, and NAMD would all be adequate for your needs.

Generally, at your simulation size, these codes will not scale beyond a single machine (or even beyond a few GPUs) without a high-performance network such as InfiniBand, and they strongly favour GPU over CPU performance. So focus on machine configurations with several high-performance GPUs and good PCIe connectivity, and plan for 1-2 CPU cores per GPU. With some codes there is no need for multi-CPU systems, since the computation is done on the GPU (note that GROMACS will use both CPU and GPU effectively, so their quality should be balanced), nor for a high-performance interconnect such as InfiniBand, since jobs stay within a single box.

All of these codes use CUDA, so NVIDIA GPUs are the way to go. GeForce cards are perfectly adequate (e.g. the 4 GB GeForce GTX 680) and substantially more economical than the Teslas.
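
Since the budget is fixed, the figure of merit is aggregate ns/day per euro across a fleet of single-node machines rather than the speed of any one box. A back-of-the-envelope sketch — the per-node price and throughput below are placeholders, not real quotes:

```python
def fleet_throughput(budget_eur, cost_per_node_eur, ns_per_day_per_node):
    """How many nodes a budget buys, and their aggregate sampling rate (ns/day).
    Independent trajectories run one per node, so throughput adds up linearly."""
    nodes = budget_eur // cost_per_node_eur
    return nodes, nodes * ns_per_day_per_node

# e.g. a 500k EUR budget and a hypothetical 5k EUR GPU workstation:
nodes, total_ns_per_day = fleet_throughput(500_000, 5_000, 100.0)
```

Because each 1 µs trajectory fits on one node, many cheap GPU workstations running independent trajectories can be far more productive than one expensive tightly-coupled cluster of the same price.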

We sell a workstation optimised for ACEMD and other MD codes, the Acellera Metrocubo. Alternatively, register for the NVIDIA GPU Test Drive to be put in touch with other suitable hardware resellers.

With regard to the criticism of hydrogen mass re-partitioning, the theoretical and technical basis was first described in:

Improving efficiency of large time-scale molecular dynamics simulations of hydrogen-rich systems, Feenstra et al., J. Comput. Chem. (1999)

doi:10.1002/(SICI)1096-987X(199906)20:8<786::AID-JCC5>3.0.CO;2-B

It is a widely used method, implemented not only in ACEMD but also in GROMACS and, recently, Anton:

Atomic-level description of ubiquitin folding, Piana et al, PNAS (2013)

doi:10.1073/pnas.1218321110

mabraham
MJH

A deterministic code is not necessarily reversible, so determinism should not actually add any benefit in terms of statistical sampling; it simply produces the same errors, but consistently. That consistency is useful for debugging. All codes integrate the equations of motion numerically, so they will all drift away from constant energy sooner or later. It is important to stay close enough to the energy surface even when sampling in NVT, and all codes in common use manage this with mixed or fixed precision (NAMD, LAMMPS, ACEMD, AMBER, GROMACS, DESMOND, ANTON).
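
The drift mentioned above can be quantified by fitting a least-squares slope to the total energy over time. A minimal sketch, assuming you have already extracted a list of total energies sampled at fixed intervals (the function name and units are mine, not from any particular MD package):

```python
def energy_drift(energies, dt_ns):
    """Least-squares slope of total energy vs. time, i.e. drift per ns.
    `energies` are total energies sampled every `dt_ns` nanoseconds."""
    n = len(energies)
    t = [i * dt_ns for i in range(n)]
    t_mean = sum(t) / n
    e_mean = sum(energies) / n
    num = sum((ti - t_mean) * (ei - e_mean) for ti, ei in zip(t, energies))
    den = sum((ti - t_mean) ** 2 for ti in t)
    return num / den

# A perfectly conserved energy series gives zero drift.
```

Comparing this slope across precision modes (mixed vs. fixed vs. double) on your own hardware is a straightforward sanity check of the kind described above.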

For a system of this size, a single GTX 680 produces around 160 ns/day, and a single GTX Titan around 220 ns/day. So you can get 1 microsecond in under a week with the free version of ACEMD.

gianni
  • gianni, Welcome to SciComp! It appears that this answer is more of a comment on discussion on another answer rather than an answer of its own accord, and it doesn't address the original question which asked about hardware rather than software. It would be helpful if you edited the answer to include hardware recommendations for your suggested codes that would be efficient for this size of problem. – Godric Seer Apr 15 '13 at 12:10
  • Done. I just edited the note above. – gianni Apr 15 '13 at 14:32
  • @gianni: Welcome to SciComp! Your answer is still mostly a software recommendation and a comment on another answer, and doesn't give a substantive hardware recommendation for a cluster (you mention a single sentence about performance on a single card, with no reference to system size, so I'm skeptical). Please answer the question more substantially, ask to convert your answer to a comment (in which case I'll remove the part about graphics cards), or delete your answer. – Geoff Oxberry Apr 16 '13 at 03:28