
Sometimes Java outperforms C++ in benchmarks. Of course, sometimes C++ outperforms Java.

See the following links:

But how is this even possible? It boggles my mind that interpreted bytecode could ever be faster than a compiled language.

Can someone please explain? Thanks!

Joachim Sauer
    You can take a look at http://shootout.alioth.debian.org/u32/benchmark.php?test=all&lang=gpp&lang2=javasteady to see the kind of problems that run faster on java/c++... See the pattern of problems, and not these specific problems... – c0da Sep 26 '11 at 05:48
  • Each language serves specific jobs. You can't use Java, nor C++, nor Pascal, nor JS for everything. If you want to be a broad professional, you just have to know more languages and use the right one for each job. – NaN Mar 14 '17 at 13:22

13 Answers


Hand-rolled C/C++ written by an expert with unlimited time is going to be at least as fast as Java, or faster. Ultimately, Java itself is written in C/C++, so you can of course do everything Java does if you are willing to put in enough engineering effort.

In practice however, Java often executes very fast for the following reasons:

  • JIT compilation - although Java classes are stored as bytecode, this is (usually) compiled to native code by the JIT compiler as the program starts up. Once compiled, it is pure native code - so theoretically it can be expected to perform just as well as compiled C/C++ once the program has been running for long enough (i.e. after all the JIT compilation has been done)
  • Garbage collection in Java is extremely fast and efficient - the Hotspot GC is probably the best all-round GC implementation in the world. It's the result of many man-years of expert effort by Sun and other companies. Pretty much any complex memory management system that you roll yourself in C/C++ will be worse. Of course you can write pretty fast/lightweight basic memory management schemes in C/C++, but they won't be nearly as versatile as a full GC system. Since most modern systems need complex memory management, Java therefore has a big advantage for real-world situations.
  • Better platform targeting - by delaying compilation to application start-up (JIT compilation etc.) the Java compiler can take advantage of the fact that it knows the exact processor it is executing on. This can enable some very beneficial optimisations that you wouldn't be able to do in pre-compiled C/C++ code that needs to target a "lowest common denominator" processor instruction set.
  • Runtime statistics - because JIT compilation is done at runtime, it can gather statistics while the program is executing which enable better optimisations (e.g. knowing the probability that a particular branch is taken). This can enable Java JIT compilers to produce better code than C/C++ compilers (which have to "guess" the most likely branch in advance, an assumption which may often be wrong).
  • Very good libraries - the Java runtime contains a host of very well written libraries with good performance (especially for server-side applications). Often these are better than you could write yourself or obtain easily for C/C++.
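The JIT warm-up in the first bullet can be observed directly: the first timed call of a hot method includes interpretation and compile time, while later calls run as native code. Below is a minimal sketch (class and method names are mine, and the timings are illustrative and machine-dependent; a serious measurement would use a harness such as JMH):

```java
// Sketch: observing JIT warm-up. Timings vary by machine and JVM flags,
// so the printout is illustrative only; real benchmarks should use JMH.
class JitWarmupDemo {
    // A small hot method the JIT will compile after enough invocations.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    public static void main(String[] args) {
        // First call: likely interpreted, plus compilation overhead.
        long t0 = System.nanoTime();
        long r1 = sumOfSquares(1_000_000);
        long cold = System.nanoTime() - t0;

        // Warm up: repeated calls trigger JIT compilation of the method.
        for (int i = 0; i < 10_000; i++) {
            sumOfSquares(1_000);
        }

        // Later call: typically runs as compiled native code.
        long t1 = System.nanoTime();
        long r2 = sumOfSquares(1_000_000);
        long warm = System.nanoTime() - t1;

        System.out.println("cold ns=" + cold + ", warm ns=" + warm);
        if (r1 != r2) throw new AssertionError("results differ");
    }
}
```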

At the same time C/C++ also have some advantages:

  • More time to do advanced optimisations - C/C++ compilation is done once, and can therefore spend considerable time doing advanced optimisations if you configure it to do so. There's no theoretical reason why Java couldn't do the same, but in practice you want Java to JIT-compile code relatively quickly, so the JIT compiler tends to focus on "simpler" optimisations.
  • Instructions that aren't expressible in bytecode - while Java bytecode is fully general purpose, there are still some things you can do at a low level that you can't do in bytecode (unchecked pointer arithmetic is a good example!). By (ab)using these kinds of tricks you can get some performance advantages.
  • Less "safety" constraints - Java does some extra work to ensure that programs are safe and reliable. Examples are bounds checks on arrays, certain concurrency guarantees, null pointer checks, type safety on casts etc. By avoiding these in C/C++ you can get some performance gains (although arguably this can be a bad idea!)
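The safety work in the last bullet is easy to see in code: every Java array access is bounds-checked, and an out-of-range index becomes a catchable exception rather than the undefined behaviour the equivalent C/C++ access would give. A small illustration (names are mine):

```java
// Sketch: the bounds checks Java performs automatically on every array
// access. In C/C++ the same out-of-range read is undefined behaviour.
class SafetyChecksDemo {
    static String readAt(int[] data, int index) {
        try {
            return "value=" + data[index];       // bounds check happens here
        } catch (ArrayIndexOutOfBoundsException e) {
            return "out of bounds";              // Java turns it into an exception
        }
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30};
        System.out.println(readAt(data, 1));   // value=20
        System.out.println(readAt(data, 99));  // out of bounds
    }
}
```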

Overall:

  • Java and C/C++ can achieve similar speeds
  • C/C++ probably has the slight edge in extreme circumstances (it's not surprising that AAA game developers still prefer it, for example)
  • In practice it will depend on how the different factors listed above balance out for your particular application.
mikera
    Ad "more time for optimisations in C++": That's one of the tweaks that the Oracle VM does when you chose the Server VM: It accepts a higher start-up cost in order to allow higher performance in the long run. The Client VM, however is tweaked for optimal startup time. So that distinction even exists within Java. – Joachim Sauer Sep 26 '11 at 06:55
    -1: A C++ compiler can take much more time (hours, literally, for a large library) to create a very optimized binary. Java JIT compiler cannot take so much time, even the "server" version. I seriously doubt that Java JIT compiler would be capable of performing Whole Program Optimization the way the MS C++ compiler does. – quant_dev Sep 26 '11 at 12:56
    @quant_dev: sure, but isn't that exactly what I said in my answer as a C++ advantage (more time to do advanced optimisation)? So why the -1? – mikera Sep 26 '11 at 13:37
    Garbage collection is not a speed advantage for Java. It's only a speed advantage if you're a C++ programmer that does not know what you're doing. If all you're checking is how fast you can allocate, then yes, the garbage collector will win. Overall program performance, however, can still be better done by manually managing memory. – Billy ONeal Sep 26 '11 at 14:19
  • small nitpick: I was under the impression that Sun's JVM (maybe others too) doesn't explicitly check for null as an optimization: The HotSpot Virtual Machine installs signal handlers to implement various features and to handle fatal error conditions. For example, in an optimization to avoid explicit null checks in cases where java.lang.NullPointerException will be thrown rarely, the SIGSEGV signal is caught and handled, and the NullPointerException is thrown. – Idan K Sep 26 '11 at 14:43
  • @IdanK: that's probably true for most implementations, but it's a two-sided coin: it means that in the "good" case the null check comes at zero cost, but it also means that if your code uses NPE for flow control (which is a very bad idea, but possible), then you'll frequently have signal handlers running, which will almost certainly slow down the whole application. – Joachim Sauer Sep 26 '11 at 15:36
    @IdanK: Null checks aren't exactly common at all in C++ code -- if you use C++ references, they are not allowed to be null in the first place. No null check is ever required in such cases. The few times you have to use pointers, it's still common to have no null check.. Obviously C or C++ can implement the same (catching a SIGSEGV) behavior. (on platforms where the concept of a SIGSEGV exists, of course). – Billy ONeal Sep 26 '11 at 16:54
    Great read! Interesting to know because in school, at least from my professors, we were always told that C/C++ will ALWAYS be faster than Java, hence, why it's used in OS development and game development and that Java was for lazy people :P –  Sep 26 '11 at 18:19
  • @BillyONeal: I wasn't implying that it is common practice in C/C++... Simply that it's not in JVM implementations either as this answer suggested. – Idan K Sep 26 '11 at 18:20
  • Don't you mean "Least Common Multiple"? ;) – Mateen Ulhaq Sep 26 '11 at 23:42
    ... But with C++, you could always theoretically put a "JIT-like layer" that does similar branch optimizations at runtime, while maintaining the raw speed of a C++ program. (Theoretically. :( ) – Mateen Ulhaq Sep 26 '11 at 23:47
  • @BillyONeal, have you any empirical evidence to back up that claim? –  Sep 28 '11 at 11:37
  • @ThorbjørnRavnAndersen: What I mean is, there is always a memory data structure which is optimal under the given workload. Consider most compilers. Usually, one allocates a bunch of tree structures from a single memory block, and then frees the whole mass as a single block. A garbage collector would have to either copy the contents of the live program memory (expensive), or perform checks to see which objects were reachable (may be relatively expensive) -- while with this kind of block structure the overhead for memory management is effectively zero. GCs are really good, but not zero. – Billy ONeal Sep 28 '11 at 13:49
  • @ThorbjørnRavnAndersen: Even runtimes that use garbage collection usually do something similar to that kind of optimizations for (e.g.) strings. Take a look at libraries like heap layers ( http://www.heaplayers.org/ ) for examples of high performance allocators tailored to specific applications. – Billy ONeal Sep 28 '11 at 13:52
  • @Mikera: "Runtime statistics" is incorrect -- most C++ compilers these days implement these kinds of things by instrumenting the binary, collecting statistics, and accounting for those things during compilation. – Billy ONeal Sep 28 '11 at 13:57
    This is a well formulated answer and it is even politically correct. However, it's wrong. In the best cases, C and C++ have much more than a "slight edge" on Java. Several of your positive points for Java cite features that CONSUME resources and thus make Java slower until they've done their job (JIT stuff). Furthermore, all DBMS are written in C for a good reason, all games are written in C++ for a good reason, and GPUs are being modified to handle C++ for a reason. Top scores on code contests are in C++ for a reason. That reason is that C++ IS inherently faster than Java and will always be. – Morg. Oct 06 '11 at 11:39
  • "A C++ compiler can take much more time (hours, literally, for a large library) to crate a very optimized binary." Sure, but the speed improvement from this is often quite small and not proportional to the time spent optimizing. How often do you see significant performance improvements from switching GCC from -O2 to -O3? Most of the time, except some very specific cases, I could hardly see any, and if there are any, they are nowhere near as huge as going from no optimisation at all to -O2. Server Hotspot does almost all of -O2 optimisations, some of -O3 and some Java-specific ones. – Piotr Kolaczkowski Nov 14 '14 at 21:01
    "Furthermore, all DBMS are written in C for a good reason" Not true. There are many DBMSes written in Java, some of them being leaders in the performance category. Apache Cassandra is one of them. "All games are written in C++" - again not true. Many games are written in other languages. The best selling game on PC to date was written... in Java. – Piotr Kolaczkowski May 09 '15 at 12:45

First, most JVMs include a compiler, so "interpreted bytecode" is actually pretty rare (at least in benchmark code -- it's not quite as rare in real life, where your code is usually more than a few trivial loops that get repeated extremely often).

Second, a fair number of the benchmarks involved appear to be quite biased (whether by intent or incompetence, I can't really say). Just for example, years ago I looked at some of the source code linked from one of the links you posted. It had code like this:

  init0 = (int*)calloc(max_x,sizeof(int));
  init1 = (int*)calloc(max_x,sizeof(int));
  init2 = (int*)calloc(max_x,sizeof(int));
  for (x=0; x<max_x; x++) {
    init2[x] = 0;
    init1[x] = 0;
    init0[x] = 0;
  }

Since calloc provides memory that's already zeroed, using the for loop to zero it again is obviously useless. This was followed (if memory serves) by filling the memory with other data anyway (and no dependence on it being zeroed), so all the zeroing was completely unnecessary anyway. Replacing the code above with a simple malloc (like any sane person would have used to start with) improved the speed of the C++ version enough to beat the Java version (by a fairly wide margin, if memory serves).
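The same kind of redundancy is possible on the Java side of such benchmarks: like calloc, Java guarantees that a newly allocated array is zero-filled, so clearing it again afterwards is pure wasted work. A small sketch of the equivalent mistake (illustrative; not taken from the benchmark source):

```java
import java.util.Arrays;

// Sketch: like calloc in C, Java array allocation already zeroes the
// memory, so a follow-up clearing pass is redundant O(n) work.
class ZeroInitDemo {
    static int[] allocate(int n, boolean redundantClear) {
        int[] a = new int[n];     // guaranteed all zeros by the language spec
        if (redundantClear) {
            Arrays.fill(a, 0);    // does nothing useful, just burns time
        }
        return a;
    }

    public static void main(String[] args) {
        int[] a = allocate(4, false);
        int[] b = allocate(4, true);
        System.out.println(Arrays.equals(a, b));  // identical contents either way
    }
}
```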

Consider (for another example) the methcall benchmark used in the blog entry in your last link. Despite the name (and how things might even look), the C++ version of this is not really measuring much about method call overhead at all. The part of the code that turns out to be critical is in the Toggle class:

class Toggle {
public:
    Toggle(bool start_state) : state(start_state) { }
    virtual ~Toggle() {  }
    bool value() {
        return(state);
    }
    virtual Toggle& activate() {
        state = !state;
        return(*this);
    }
    bool state;
};

The critical part turns out to be the state = !state;. Consider what happens when we change the code to encode the state as an int instead of a bool:

class Toggle {
    enum names{ bfalse = -1, btrue = 1};
    const static names values[2];
    int state;

public:
    Toggle(bool start_state) : state(values[start_state]) 
    { }
    virtual ~Toggle() {  }
    bool value() {  return state==btrue;    }

    virtual Toggle& activate() {
        state = -state;
        return(*this);
    }
};

This minor change improves the overall speed by about a 5:1 margin. Even though the benchmark was intended to measure method call time, in reality most of what it was measuring was the time to convert between int and bool. I'd certainly agree that the inefficiency shown by the original is unfortunate -- but given how rarely it seems to arise in real code, and the ease with which it can be fixed when/if it does arise, I have a difficult time thinking of it as meaning much.

In case anybody decides to re-run the benchmarks involved, I should also add that there's an almost equally trivial modification to the Java version that produces (or at least at one time produced -- I haven't re-run the tests with a recent JVM to confirm they still do) a fairly substantial improvement in the Java version as well. The Java version has an NthToggle::activate() that looks like this:

public Toggle activate() {
    this.counter += 1;
    if (this.counter >= this.count_max) {
        this.state = !this.state;
        this.counter = 0;
    }
    return this;
}

Changing this to call the base function instead of manipulating this.state directly gives quite a substantial speed improvement (though not enough to keep up with the modified C++ version).
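The change described might look roughly like this (my reconstruction, not the benchmark's exact source; field names are guesses):

```java
// Reconstruction of the change described above: NthToggle.activate()
// delegates the state flip to the base class instead of touching
// this.state directly.
class Toggle {
    boolean state;

    Toggle(boolean startState) { state = startState; }

    boolean value() { return state; }

    Toggle activate() {
        state = !state;
        return this;
    }
}

class NthToggle extends Toggle {
    final int countMax;
    int counter = 0;

    NthToggle(boolean startState, int countMax) {
        super(startState);
        this.countMax = countMax;
    }

    @Override
    Toggle activate() {
        counter += 1;
        if (counter >= countMax) {
            super.activate();   // flip via the base class, not this.state
            counter = 0;
        }
        return this;
    }
}
```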

So, what we end up with is a false assumption about interpreted byte codes vs. some of the worst benchmarks (I've) ever seen. Neither is giving a meaningful result.

My own experience is that with equally experienced programmers paying equal attention to optimizing, C++ will beat Java more often than not -- but (at least between these two), the language will rarely make as much difference as the programmers and design. The benchmarks being cited tell us more about the (in)competence/(dis)honesty of their authors than they do about the languages they purport to benchmark.

[Edit: As implied in one place above but never stated as directly as I probably should have, the results I'm quoting are those I got when I tested this ~5 years ago, using C++ and Java implementations that were current at that time. I haven't rerun the tests with current implementations. A glance, however, indicates that the code hasn't been fixed, so all that would have changed would be the compiler's ability to cover up the problems in the code.]

If we ignore the Java examples, however, it is actually possible for interpreted code to run faster than compiled code (though difficult and somewhat unusual).

The usual way this happens is that the code being interpreted is much more compact than the machine code, or it's running on a CPU that has a larger data cache than code cache.

In such a case, a small interpreter (e.g., the inner interpreter of a Forth implementation) may be able to fit entirely in the code cache, and the program it's interpreting fits entirely in the data cache. The cache is typically faster than main memory by a factor of at least 10, and often much more (a factor of 100 isn't particularly rare any more).

So, if the cache is faster than main memory by a factor of N, and it takes fewer than N machine code instructions to implement each byte code, the byte code should win (I'm simplifying, but I think the general idea should still be apparent).
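To make the density argument concrete, here is a toy stack-based interpreter (written in Java for readability, though the cache argument above properly applies to a native interpreter such as Forth's inner loop): each operation is a single byte, so the "program" below is nine bytes of data, far smaller than the equivalent compiled machine code would be.

```java
// Sketch: a tiny stack-based bytecode interpreter. One byte per
// operation makes programs far denser than native code -- the property
// the cache argument relies on.
class TinyVm {
    static final byte PUSH = 0, ADD = 1, MUL = 2, HALT = 3;

    static int run(byte[] code) {
        int[] stack = new int[16];
        int sp = 0, pc = 0;
        while (true) {
            switch (code[pc++]) {
                case PUSH: stack[sp++] = code[pc++]; break;   // operand byte follows
                case ADD:  sp--; stack[sp - 1] += stack[sp]; break;
                case MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
                case HALT: return stack[sp - 1];
                default:   throw new IllegalStateException("bad opcode");
            }
        }
    }

    public static void main(String[] args) {
        // (2 + 3) * 4 encoded in nine bytes of "program"
        byte[] prog = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT};
        System.out.println(run(prog));
    }
}
```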

Jerry Coffin
    +1, full ack. Especially "the language will rarely make as much difference as the programmers and design" - you'll often stumble upon problems where you can optimize the algorithm, e.g. improve big-O, which will give much more of a boost than the best compiler could. – schnaader Sep 26 '11 at 10:31
    "In case anybody decides to re-run the benchmarks involved..." DON'T! Back in 2005 those old tasks were discarded and replaced by the tasks now shown in the benchmarks game. If anybody wants to re-run some programs then please re-run the current programs for the current tasks shown on the benchmarks game home page http://shootout.alioth.debian.org/ – igouy Sep 26 '11 at 16:15
  • @igouy: Some people might want to simply confirm/deny the results from the benchmarks they ran, with the minimum of corrections necessary to at least give them some minimal relationship with reality. At the same time, you're basically right: the benchmarks in question are so bad that just fixing the most obvious errors isn't going to help much. – Jerry Coffin Sep 26 '11 at 16:32
  • And that's why, back in 2005, they were discarded and replaced by the tasks now shown in the benchmarks game. People who don't know any better re-run those old programs. – igouy Sep 26 '11 at 18:01
    +1 I don't like people coding C++ in either C or Java style and then stating Java to be superior. disclaimer: I don't call any language superior, but writing crappy C++ code in a style that might be perfectly suited to another language doesn't make both languages comparable. – Christian Rau Sep 26 '11 at 21:38

All things being equal, you could say: no, Java should never be faster. You could always implement Java in C++ from scratch and thereby get at least as good performance. In practice, however:

  • JIT compiles the code on the end-user's machine, allowing it to optimise for the exact CPU that they are running. While there's an overhead here for the compilation, it may well pay off for intensive apps. In real life, precompiled programs are often not built for the exact CPU they run on.
  • The Java compiler may well be better at automatically optimising things than a C++ compiler. Or it may not, but in the real world, things aren't always perfect.
  • Performance behaviour can vary due to other factors, such as garbage collection. In C++, you typically call the destructor immediately when done with an object. In Java, you simply release the reference, delaying the actual destruction. This is another example of a difference which is neither here nor there, in terms of performance. Of course, you can argue that you could implement GC in C++ and be done with it, but the reality is that few people do / want to / can.

As an aside, this reminds me of the debate regarding C in the 80s / 90s. Everyone was wondering "can C ever be as fast as assembly?". Basically, the answer was: no on paper, but in reality the C compiler created more efficient code than 90% of the assembly programmers (well, once it matured a bit).

Daniel B
    Regarding GC, it is not just that GC may delay the destruction of objects (which shouldn't matter in the longer term); the fact is that with the modern GCs, allocation/deallocation of short-lived objects is extremely cheap in Java compared to C++. – Péter Török Sep 26 '11 at 07:25
  • @PéterTörök yes, you are right, good point. – Daniel B Sep 26 '11 at 07:30
    @PéterTörök But in C++, short-lived object are often put on the stack, which in turn is much faster than any GC-ed heap Java can use. – quant_dev Sep 26 '11 at 12:57
  • @quant_dev, you forgot another significant GC effect: compactification. So I would not be so sure which way is faster. – SK-logic Sep 26 '11 at 13:49
  • "implement Java in C++ from scratch" -- This seems unlikely to actually give performance gains. – Sean McMillan Sep 26 '11 at 14:00
  • @quant_dev: Yes, and as a C++ programmer you have to think about that when you write the code. Java programmers write the same code whatever the lifespan of the object. (OK, not a performance benefit but rather a “programmers' cognitive load” benefit…) – Donal Fellows Sep 26 '11 at 14:01
  • @Donal Having experience of writing performance sensitive code in both C++ and Java, I can assure you that you need to think while coding in both languages. – quant_dev Sep 26 '11 at 14:05
  • @quant: Anyone can write bad code in any language. (I shudder to think what it would look like in a language like SNUSP, but that's a story for another time.) Skill lies in writing good code, and often that's about choosing the right data structures and the right algorithms (which in turn takes thought). There's not much different in any language. Having to worry about memory management at the same time though? Hard to see that as being an advantage in the 99% case. :-) – Donal Fellows Sep 26 '11 at 14:44
    @DonalFellows What makes you think I have to worry about memory management in C++? Most of the time I don't. There are simple patterns you need to apply, which are different from Java, but that's it. – quant_dev Sep 26 '11 at 15:31

The Java runtime isn't interpreting bytecode. Rather, it uses what's called Just-In-Time compilation. Basically, as the program runs, it takes bytecode and converts it into native code optimized for the particular CPU.

GrandmasterB
  • In practice, yes. In principle, it depends - early Java virtual machines used bytecode interpreters, and you can probably still find bytecode-interpreting VMs if you look hard enough. –  Sep 26 '11 at 06:24
    @Steve314: but the purely interpreting VMs won't be the ones that outperform C++, so they are not really relevant to this question. – Joachim Sauer Sep 26 '11 at 06:53
  • The JIT compiler may also optimize dynamically for the specific use of the code, which is not possible with code that is compiled statically. – starblue Sep 26 '11 at 07:46
    @starblue, well, it is somewhat possible with a static compilation - see the profile-guided optimisation. – SK-logic Sep 26 '11 at 10:08

But allocation is only half of memory management -- deallocation is the other half. It turns out that for most objects, the direct garbage collection cost is -- zero. This is because a copying collector does not need to visit or copy dead objects, only live ones. So objects that become garbage shortly after allocation contribute no workload to the collection cycle.

...

JVMs are surprisingly good at figuring out things that we used to assume only the developer could know. By letting the JVM choose between stack allocation and heap allocation on a case-by-case basis, we can get the performance benefits of stack allocation without making the programmer agonize over whether to allocate on the stack or on the heap.

http://www.ibm.com/developerworks/java/library/j-jtp09275/index.html
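The quoted point can be made concrete. In the sketch below, each Point is dead before the next iteration and never escapes the method, which is exactly the shape HotSpot's escape analysis looks for; whether scalar replacement actually kicks in depends on the JVM version and flags (the class and method names here are mine):

```java
// Sketch: a short-lived object that never escapes the loop body.
// HotSpot's escape analysis can replace the Point allocation with two
// local ints ("scalar replacement"), so nothing reaches the heap --
// the behaviour the quoted article describes.
class EscapeAnalysisDemo {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        int sum() { return x + y; }
    }

    static long sumPoints(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1);  // never stored, never returned
            total += p.sum();               // candidate for scalar replacement
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumPoints(1_000_000));
    }
}
```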

Landei
  • This is only a small part of the whole picture, but pretty relevant nonetheless. – Joachim Sauer Sep 26 '11 at 07:00
    I like how the substance of this is : java is for noobs, trust the magic GC, it knows better. – Morg. Nov 21 '12 at 08:47
    @Morg: Or you can read it that way: Java is for people who like to get things done instead of wasting their time with bit twiddling and manual memory management. – Landei Nov 21 '12 at 19:38
    @Landei I think your comment would have way more credibility if any decent long-lasting widely used codebase had been written in Java. In my world, real OS's are written in C, postgreSQL is written in C, as are most important tools that would really be a pain to rewrite. Java was (and that's even the official version) to enable less skilled people to program in herds and yet reach tangible results. – Morg. Dec 01 '12 at 15:50
  • @Morg A lot of financial software is written in COBOL, a lot of web applications are written in PHP, Apple uses ObjectiveC (which is IMHO an abomination). Does this say much about the quality of the underlying language? You can write good OS's in Java (JavaOS, Android) or C# (Singularity). – Landei Dec 01 '12 at 21:02
  • @Landei Financial software is not an important tool, web apps can be written in PHP (even I do it), but it's still an inferior language that's slow and broken. You can't write good OS's in Java. Android is a Linux with a Java GUI, and that GUI is among the most broken GUIs in existence. It even shows some of the ugliest aspects of Java - none of the OS's you cite come close to being relevant vs Unix. In summary, it does say a lot about the quality of the underlying language. – Morg. Dec 04 '12 at 10:19
    @Morg I find it very strange how you seem to focus just on OSs. This simply can't be a good measure, for several reasons. First, the requirements of OSs are crucially different from most other software, second you have the Panda thumb principle (who wants to rewrite a complete OS in another language, who wants to write his own OS if there are working and even free alternatives?) and third other software uses the features of the OS, so there is no need to ever write a disk driver, task manager, etc. If you can't provide some better arguments (not based entirely on OSs) you sound like a hater. – Landei Dec 04 '12 at 15:48
  • Well, my dear Landei, the only codebases that I know to be even remotely good quality are the kernel and postgresql, both of which are written in C. If of course you have another example you would like to show me, I'd be delighted to read it. – Morg. Dec 06 '12 at 14:40

Posted by Tim Holloway on JavaRanch:

Here's a primitive example: Back when machines operated in mathematically-determined cycles, a branch instruction typically had 2 different timings. One for when the branch was taken, one for when the branch wasn't taken. Usually, the no-branch case was faster. Obviously, this meant that you could optimize logic based on the knowledge of which case was more common (subject to the constraint that what we "know" isn't always what's actually the case).

JIT recompilation takes this one step further. It monitors the actual real-time usage, and flips the logic based on what actually is the most common case. And flip it back again if the workload shifts. Statically-compiled code can't do this. That's how Java can sometimes out-perform hand-tuned assembly/C/C++ code.

Source: http://www.coderanch.com/t/547458/Performance/java/Ahead-Time-vs-Just-time
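The kind of branchy, virtual call the JIT profiles at run time can be sketched as follows (names are mine). If only one Shape implementation is ever observed at the call site, HotSpot can devirtualize and inline area(), and deoptimize later if a second implementation shows up, which is the "flip it back" the quote mentions:

```java
// Sketch: a virtual call site the JIT profiles at run time. With only
// one receiver type ever seen, HotSpot can devirtualize and inline
// area(); loading a second Shape later forces deoptimization.
class ProfilingDemo {
    interface Shape { double area(); }

    static final class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    static double totalArea(Shape[] shapes) {
        double total = 0;
        for (Shape s : shapes) {
            total += s.area();  // monomorphic in practice -> inlining candidate
        }
        return total;
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Square(2), new Square(3) };
        System.out.println(totalArea(shapes));
    }
}
```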

    And once again, this is wrong/incomplete. Static compilers with profile-guided optimisation can recognise this. – Konrad Rudolph Sep 26 '11 at 21:12
    Konrad, static compilers can flip the logic based on current workload? As I understand, static compilers generate code once and it stays the same forever. – Thiago Negri Sep 27 '11 at 15:50
    Current workload, no. But typical workload. Profile-guided optimisation analyses how your program runs under typical load and optimises hot-spots accordingly, just like the HotSpot JIT does. – Konrad Rudolph Sep 27 '11 at 16:02
    @KonRadRudolph So, it can't do what a JIT compiler can. Note the most common kind of branch optimized like this is a virtual method call. – user253751 Jul 18 '21 at 11:13

While a completely optimized Java program will seldom beat a completely optimized C++ program, differences in things like memory management can make a lot of algorithms idiomatically implemented in Java faster than the same algorithms idiomatically implemented in C++.

As @Jerry Coffin pointed out, there are a lot of cases where simple changes can make the code much faster -- but often it can take too much unclean tweaking in one language or the other for the performance improvement to be worthwhile. That's probably what you'd see in a good benchmark that shows Java doing better than C++.

Also, though usually not all that significant, there are some performance optimizations that a JIT language like Java can do that C++ can't. The Java runtime can include improvements after the code has been compiled, which means that the JIT can potentially produce optimized code that takes advantage of new (or at least different) CPU features. For this reason, a 10-year-old Java binary might potentially outperform a 10-year-old C++ binary.

Lastly, complete type safety in the bigger picture can, in very rare cases, offer extreme performance improvements. Singularity, an experimental OS written almost entirely in a C#-based language, has much faster interprocess communication and multitasking due to the fact that there's no need for hardware process boundaries or expensive context switches.

Rei Miyasaka

That is because the final step of generating machine code happens transparently inside the JVM while running your Java program, instead of explicitly when building your C++ program.

You should consider the fact that modern JVMs spend quite a lot of time compiling the bytecode on the fly to native machine code to make it as fast as possible. This allows the JVM to do all kinds of compiler tricks, which can be even better informed by knowing the profiling data of the program being run.

Even something as simple as automatically inlining a getter, so that no jump and return are needed just to fetch a value, speeds things up.
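The getter case mentioned above can be sketched like this (class and method names are mine). Once the JIT has compiled total(), calls to getBalance() are typically inlined away, leaving a plain field load in the loop:

```java
// Sketch: a trivial getter the JIT typically inlines. After inlining,
// the call in the loop becomes a direct field load -- no call/return.
class Account {
    private final long balance;

    Account(long balance) { this.balance = balance; }

    long getBalance() { return balance; }  // small enough to always inline

    static long total(Account[] accounts) {
        long sum = 0;
        for (Account a : accounts) {
            sum += a.getBalance();  // after inlining: a direct read of a.balance
        }
        return sum;
    }

    public static void main(String[] args) {
        Account[] accounts = { new Account(5), new Account(7) };
        System.out.println(total(accounts));
    }
}
```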

However, the thing that has really allowed fast programs is better cleaning up afterwards. The garbage collection mechanism in Java is faster than manual malloc/free in C. Many modern malloc/free implementations use a garbage collector underneath.

  • Note that this embedded stuff makes starting the JVM larger and slower until the better code has a chance of catching up. –  Sep 26 '11 at 06:13
    "Many modern malloc-free implementations use a garbage collector underneath." Really? I'd like to know more; Do you have any references? – Sean McMillan Sep 28 '11 at 12:51
  • Thank you. I was trying to find a way of saying that the JVM no longer contains simply a just-in-time compiler compiling to executable code, but a hot spot compiler which profiles the running code and optimises further as a result. A one-time compiler like C++'s struggles to match that. – Highland Mark Sep 29 '11 at 18:43
  • @SeanMcMillan, I saw an analysis a while back about the performance of malloc-free implementations where it was mentioned that the fastest one used a garbage collector underneath. I cannot remember where I read it. –  Sep 30 '11 at 07:24
  • Was it the BDW conservative GC? – Demi Aug 14 '16 at 22:19

Short answer - it is not. Forget it; the topic is as old as fire or the wheel. Java or .NET is not, and will not be, faster than C/C++. It's fast enough for most tasks where you don't need to think about optimization at all, like forms and SQL processing, but that's where it ends.

For benchmarks, or small apps written by incompetent developers, yes, the end result will be that Java/.NET is probably going to be close and maybe even faster.

In reality, simple things like allocating memory on the stack, or simply using memzones, will kill Java/.NET on the spot.

The garbage-collected world uses a sort of memzone with all the accounting. Add a memzone to C and C will be faster right there on the spot. Especially for those Java vs. C "high-performance code" benchmarks that go like this:

for(...)
{
    alloc_memory    // allocating heap in a loop is verrry good, isn't it?
    zero_memory     // extra zeroing, we really need it in our performance code
    do_stuff        // something like memory[i]++
    realloc         // this is a lovely speedup
    strlen          // loop through all memory, because storing the string length is soo getting old
    free            // Java will do that outside our timing loop, but oh well, we're comparing apples to oranges here
}  // loop 100000 times

Try using stack-based variables in C/C++ (or placement new): they translate into a sub esp, 0xff, a single x86 instruction. Beat that with Java - you can't...

Most of the time, the benchmarks I see comparing Java against C++ make me go: wth? Wrong memory allocation strategies, self-growing containers without reserves, multiple news. This is not even close to performance-oriented C/C++ code.

Also a good read: https://days2011.scala-lang.org/sites/days2011/files/ws3-1-Hundt.pdf

Coder
  • 6,968
  • 3
    Wrong. Entirely wrong. You won't be able to outperform a compactifying GC with your manual memory management. Naive reference counting won't ever be better than a proper mark'n'sweep. As soon as it comes to a complicated memory management, C++ is a retard. – SK-logic Sep 26 '11 at 14:03
  • 3
  • @SK-Logic: Wrong. With memory zones or stack allocation there is NO memory allocation or deallocation AT ALL. You have a block of memory and you just write to it. Mark the block as free with a volatile variable or concurrency protection like InterlockedExchange, and the next thread just dumps its data into the preallocated block, without going to the OS for memory at all, if it sees it's free. With the stack it's even easier, with the only exceptions that you can't dump 50MB on the stack, and that the object lifetime is only inside {}. – Coder Sep 26 '11 at 14:46
  • 2
    With the 1st generation of a GC there is no "allocation" either; it is equivalent to the stack (one register increment). Once you're dealing with even marginally complicated data structures, you'd really need all the benefits that compactification can offer. – SK-logic Sep 26 '11 at 14:55
  • @SK-logic: compactification and complex data structures kill performance more than Java vs. C/C++ does. If you have either of them, it's not an app that is performance-oriented in any way. Under those circumstances you can use C/C++ memory allocators without a problem as well, and they will be as fast as Java either way. This is why there are Java vs. C/C++ benchmarks that Java wins: choose insane amounts of unnecessary allocation, stick with complex data structures, and voila, complex and inefficient code runs equally on forgiving and unforgiving platforms. – Coder Sep 26 '11 at 15:21
  • 2
    are you trying to tell me that it is possible to avoid using complex data structures? Interesting. – SK-logic Sep 26 '11 at 15:38
  • 1
    @SK-logic: For performance-critical parts of the application, yes, I do. Looping over a null-terminated byte array is always faster than looping through a tab-terminated string via streaming libraries. Likewise searching via static hash tables vs. dynamic name-value collections. You have to know when to choose which. And a GC is not going to save you from the fact that your data container is simply ineffective; it can amortize the costs, but it ends right there. And when you know you need to optimize one exact hotspot, it's a lot easier to do that in C/C++. – Coder Sep 26 '11 at 16:16
  • 2
    ok, compilers are often quite performance-critical. Go on, tell me how to avoid building abstract syntax trees, control flow graphs, all the other sorts of DAGs, and many other simple structures which are already too complex for ad hoc memory management. – SK-logic Sep 26 '11 at 16:18
  • 2
    @SK-logic: Compilers are correctness first, performance second. Search engines, database engines, real-time trading systems, and games are what I would consider performance-critical. And most of those rely on flattish structures. Either way, compilers are mostly written in C/C++ - with custom allocators, I guess. Then again, I see no problem with using tree or list elements over rammap. You just use placement new. There is not much complexity in that. – Coder Sep 26 '11 at 16:38
  • 1
    I suspect you know what is important for compilers better than I do. I can agree that games are way more important and thus performance-critical, but as for database engines: if you ever had a chance to hack a query optimiser, you would notice that it is almost as complicated as an average compiler. No flattish data structures there. Just admit it - low-level, non-managed environments are not very good at handling memory management. A smart, tuned GC is much faster for the typical load profile of a data-structure-intensive application. – SK-logic Sep 26 '11 at 16:44
  • 3
    @SK-logic: It's not much faster; every .NET/Java app I've seen always turned out to be slower, and a real hog. Every rewrite of a managed app into SANE C/C++ code resulted in a cleaner and lighter app. Managed apps are always heavy. See VS2010 vs. 2008: same data structures, but VS2010 is a HOG. Correctly written C/C++ apps usually boot up in milliseconds and don't get stuck on splash screens, while also consuming a lot less memory. The only downside is that you have to code with the hardware in mind, and a lot of people don't know how it is nowadays. It's only in benchmarks that managed has a chance. – Coder Sep 26 '11 at 17:12
  • 4
    your anecdotal evidence does not count. Proper benchmarks show the real difference. It is especially strange that you're referring to GUI applications, bound to bulky and suboptimal GUI libraries. And, what is more important - in theory the performance limit is much higher for a properly implemented GC. – SK-logic Sep 26 '11 at 17:32
  • @SK-logic: Whatever, it's the managed crowd that has to prove anything, especially from practical perspective. Case closed. – Coder Sep 26 '11 at 18:28
2

The reality is that they are both just high-level assemblers that do exactly what the programmer tells them to, exactly how the programmer tells them to, in the exact order the programmer tells them. The performance differences are so small as to be inconsequential for all practical purposes.

The language is not "slow"; the programmer wrote a slow program. Very rarely will a program written the best way in one language outperform (for any practical purpose) a program doing the same thing the best way in the alternate language, unless the author of the study is out to grind his particular axe.

Obviously, if you are looking at a rare edge case like hard real-time embedded systems, the language choice may make a difference. But how often is this the case? And of those cases, how often is the correct choice not blindingly obvious?

mattnz
  • 21,362
  • 3
    In theory, an "ideal" JITting VM must outperform statically compiled code, by adjusting its optimisations to the dynamically collected profiling information. In practice, JIT compilers are not that smart yet, but they're at least capable of producing code of similar quality to their bigger and slower static peers. – SK-logic Sep 26 '11 at 10:15
2

See the following links ... But how is this even possible? It boggles my mind that interpreted bytecode could ever be faster than a compiled language.

  1. Do those blog posts provide trustworthy evidence?
  2. Do those blog posts provide definitive evidence?
  3. Do those blog posts even provide evidence about "interpreted bytecode"?

Keith Lea tells you there are "obvious flaws" but does nothing about those "obvious flaws". Back in 2005 those old tasks were discarded and replaced by the tasks now shown in the benchmarks game.

Keith Lea tells you he "took the benchmark code for C++ and Java from the now outdated Great Computer Language Shootout and ran the tests" but actually he only shows measurements for 14 out of 25 of those outdated tests.

Keith Lea now tells you he wasn't trying to prove anything with the blog post seven years before, but back then he said "I was sick of hearing people say Java was slow, when I know it's pretty fast..." which suggests back then there was something he was trying to prove.

Christian Felde tells you "I didn’t create the code, just re-ran the tests." as if that absolved him from any responsibility for his decision to publicise measurements of the tasks and programs Keith Lea selected.

Do measurements of even 25 tiny tiny programs provide definitive evidence?

Those measurements are for programs run as "mixed mode" Java not interpreted Java - "Remember how HotSpot works." You can easily find out how well Java runs "interpreted bytecode", because you can force Java to only interpret bytecode - simply time some Java programs run with and without the -Xint option.

igouy
  • 574
1

Maybe the research we recently did could also give some ideas on how Java could be faster than C++ (in some cases). The article can be found here: https://www.researchgate.net/publication/352705416_Improving_productivity_in_large_scale_testing_at_the_compiler_level_by_changing_the_intermediate_language_from_C_to_Java

In this research project, we extended a compiler to use Java as an additional intermediate language, besides C/C++. We kept as much of the architecture as Java allowed. For the exact same input (source code) the build produced executables which read the exact same configuration files, produced the exact same functional behavior, and generated the exact same logs.

In short: we observed that the C/C++ implementations started up faster, but over time the Java implementation gradually became faster, to the point where, in long-running scenarios, it could end up finishing earlier.

It would be reasonable to believe that the Just-In-Time (JIT) compiler of the JVM was able to notice some optimization opportunities during execution that were either missed by the C/C++ compiler used or only present themselves at run time. Please note that JIT compilation itself, while looking for optimization opportunities, might also contribute some runtime performance penalty to the program, as it takes resources away from the actual work. In the situation we observed, this work seemed to bring more benefit in the long run. We hope that in a future research project we can study this effect more deeply.

-1

Actually, JIT compilation has one major advantage over ahead-of-time (AOT) compilation: the JIT compiler is able to optimize the machine code based on what the code is actually doing, not just what it could do.

A good example is a virtual function call (method call in Java). When you write (C++) pBase->SomeVirtualFunction(); it may call either Derived1::SomeVirtualFunction or Derived2::SomeVirtualFunction. In C++, the compiler will emit instructions to check the object type and call the appropriate function (this takes the form of an indirect branch through a vtable).

In Java, the JIT compiler is able to notice that you're always calling Derived1::SomeVirtualFunction - and just call it directly. This is called inline caching (for Java see here). The benefit is that the CPU already knows which function is being called and doesn't have to wait for a memory access or indirect branch to resolve. The compiler can even inline the function or specialize it.

Of course, it also has to insert code to make sure this was the correct function to call - if not, it has to undo whatever the code did up to that point, then go back and recompile the code with a proper virtual call and resume it. The check can be allowed to run in parallel with the actual code - think of it as speculative execution on a software level.

However, this is unlikely to affect benchmarks, as benchmarks are written for optimal performance in each language. If virtual calls are slow in C++, the benchmark will be written to avoid virtual calls.

user253751
  • 4,873
  • 1
    Benchmarks should be written from scratch in each language by someone proficient. Not doing so provides more controversy though, which is often the true goal. Also, profile-driven optimization can nearly always provide all the advantages of adapting the code to the encountered data, without the overhead of keeping track and recompiling. – Deduplicator Jul 18 '21 at 12:18