7

A friend got an MSI GTX 950 2GD5T graphics card for xmas.

I said "yeah, graphics cards are super fast now, that thing probably processes 2 billion triangles per second". Then I tried to look it up to check if that was even in the right ballpark, but I was unable to find a "triangles per second" stat for that card, and actually for many cards.

Is "triangles per second" even a meaningful stat? If so, what is the approximate number for this card? (given otherwise average conditions)

M Katz
  • 1
    'Processes' is very vague. Is that vertex shader ops? Rasterizer? Shading? All of the above? None of these are meaningful, because they have a massive scene dependence. FLOPS is kind of better, but still not great because it doesn't take into account register pressure, memory latency, etc. – RichieSams Dec 28 '15 at 16:52
  • I understand that there are all of these factors. Nonetheless, I'd be interested to know about how many triangles per second can be drawn assuming modest/reasonable/typical choices for the various factors (simple/default vertex and pixel shaders, simple lighting, big model being textured by some reasonable texture sheets). – M Katz Dec 29 '15 at 18:32

2 Answers

3

Yes, it's a meaningful stat: GPUs have dedicated triangle setup HW, and the rate is measured in triangles per GPU clock. According to white papers available on NV's website, the 680 (Kepler) could issue one triangle per SM every other clock; with 8 SMs, this yielded 4 triangles/clock. The Maxwell white paper doesn't indicate a change in this rate per SM. The 980 has 16 SMs so, if there really is no rate change per SM, it can produce 8 triangles/clock. The 980 has 2048 CUDA cores (128 per SM), while the 950 has 768, implying 6 SMs and 3 triangles/clock. The chip runs at around 1 GHz, so the 950 is probably limited to about 3 billion triangles per second.
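The arithmetic above can be written out as a small sketch. The function names are made up for illustration. The 128 cores per SM figure is inferred from the 980's numbers (2048 cores / 16 SMs), and the 0.5 triangles per SM per clock rate is the assumption carried over from the Kepler white paper. This is a theoretical setup ceiling, not a measured throughput.

```python
def triangles_per_clock(cuda_cores, cores_per_sm=128, tris_per_sm_per_clock=0.5):
    """Peak triangle setup rate per clock (hypothetical helper).

    cores_per_sm=128 is inferred from the 980 (2048 cores / 16 SMs).
    tris_per_sm_per_clock=0.5 assumes one triangle per SM every other
    clock, i.e. that the Kepler rate is unchanged on Maxwell.
    """
    sm_count = cuda_cores // cores_per_sm
    return sm_count * tris_per_sm_per_clock


def peak_triangles_per_second(cuda_cores, clock_hz):
    """Upper bound on triangles per second from setup rate alone."""
    return triangles_per_clock(cuda_cores) * clock_hz


# GTX 980: 2048 cores -> 16 SMs -> 8 triangles/clock
print(triangles_per_clock(2048))

# GTX 950: 768 cores -> 6 SMs -> 3 triangles/clock, at roughly 1 GHz
gtx950 = peak_triangles_per_second(768, 1.0e9)
print(f"{gtx950 / 1e9:.1f} billion triangles/s")  # prints "3.0 billion triangles/s"
```

Swapping in the actual boost clock of a given board scales the result linearly, so the estimate is only as good as the clock figure used.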

  • 2
    Modern cards differ from previous generation cards as they no longer have a fixed pipeline for triangles, and thus it's harder to say what the rate is, as it varies with conditions outside the card. – joojaa Jan 20 '16 at 06:22
  • I was worried that might be the case for NV. AMD did have a fixed triangle setup engine in Hawaii, but I wouldn't be surprised if it went away in the next architecture revision. –  Jan 20 '16 at 06:39
  • Again it seems to me that you can make certain default assumptions that take a lot of the uncertainty out of what's "outside the card" for the purpose of comparing card speeds (simple/default vertex and pixel shaders, simple lighting, big model being textured by some reasonable texture sheets). But I'll take this fixed-pipeline estimate of 2 billion per second as being similar to what you'd get with those assumptions. Thanks. – M Katz Jan 21 '16 at 01:26
  • I'm going to study up on NV's white papers - I need to catch up on their architecture (I worked for AMD a few years back). Ideally, there would be synthetic benchmarks which would help everybody understand where the bottlenecks are, but there was a long history of cheating on them... –  Jan 21 '16 at 03:31
  • If it's an operation that always (or nearly always) needs to be done (and triangle set up is one such thing) then there can be large power/area/efficiency reasons for using dedicated hardware. – Simon F Jan 21 '16 at 10:31
1

In my tests a GTX 1050 can do ~1B triangles per second with glDrawElementsInstanced(GL_TRIANGLES, .... That's roughly 2/3 of the chip clock rate, i.e. about two triangles every three clocks. Arguably GL_TRIANGLE_STRIP can give you a 3x speed boost, but e.g. idtech4 only supports GL_TRIANGLES.
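One way to see where the threefold figure for strips could come from is to count the indices submitted per triangle. The sketch below uses made-up helper names and assumes the best case of a single unbroken strip. It only compares index counts and says nothing about how a particular GPU or driver actually performs.

```python
def indices_for_triangle_list(triangle_count):
    """GL_TRIANGLES: every triangle submits three indices of its own."""
    return 3 * triangle_count


def indices_for_triangle_strip(triangle_count):
    """GL_TRIANGLE_STRIP: the first triangle needs three indices,
    each further triangle reuses two and adds one (single strip assumed)."""
    return triangle_count + 2 if triangle_count > 0 else 0


n = 1_000_000
ratio = indices_for_triangle_list(n) / indices_for_triangle_strip(n)
print(f"list/strip index ratio for {n} triangles: {ratio:.2f}")  # prints 3.00
```

Real meshes rarely form one long strip, so the ratio in practice would be lower than this ideal.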