68

Well, I know that there are things like malloc/free for C, and new/using-a-destructor for memory management in C++, but I was wondering why there aren't "new updates" to these languages that allow the user to have the option to manually manage memory, or for the system to do it automatically (garbage collection)?

Somewhat of a newb-ish question, but only been in CS for about a year.

genesis
  • 205
  • 1
  • 10
Dark Templar
  • 6,303
  • 16
  • 47
  • 47
  • 10
    We've got a module in iPhone development this semester. After coding apps for Android for 2 years, this question struck most of the class pretty hard. Only now do we see how many hours Java has actually saved us in not having to track down nasty memory management errors and not having to write boiler plate code. – siamii Oct 08 '11 at 23:28
  • D has this capability (both GC and manual memory management). C and C++ can have it throw Boehm GC. – deadalnix Oct 09 '11 at 02:04
  • I could be wrong, but the JVM spec doesn't call for a GC. – NullUserException Oct 09 '11 at 04:25
  • 7
    @NullUserException, since it doesn't specify a way to reclaim memory that pretty much implies a GC. – Winston Ewert Oct 09 '11 at 04:48
  • 1
    @bizso09: Did you look at ARC yet? No need for slow/fat/non-deterministic GC when you've got system-support for reference counting: http://developer.apple.com/technologies/ios5/ – JBRWilkinson Oct 10 '11 at 12:01
  • For all those saying that C cannot have a garbage collector, why not use one which supports the malloc-free API? –  Oct 14 '11 at 09:58
  • @acidzombie24 I wouldn't say GC is bad. It eliminates a common source for mistakes and takes work away from the programmer, at a price. Every language is a child of its time intended use. – Mark Oct 28 '14 at 07:32
  • 4
    The answers to this pretty good question are full of religious bullshit. – abatishchev Apr 06 '15 at 03:47
  • Mainly because garbage collection is not free. – user253751 Sep 07 '15 at 03:35
  • 1
    In C and C++ it is possible to take a pointer, cast it to int and add a number to it. Later substract the number from the int and cast the result back to a pointer. You will get the same pointer as before. Good luck in implementing a GC which does NOT collect that memory while its address is stored only in the variable which also has another value. I know the example is silly but a XOR linked list uses something similar. I would post this as an answer but the question is closed. – Marian Spanik Jun 18 '16 at 09:26
  • And how should you implement a GC on platforms with only 256 Bytes (Yes Bytes, like ATTINY4) of RAM? There you can not use dynamic memory at all. How would you write a realtime software with a GC? There you most likely do not use dynamic memory. And how would you implement a GC without a language like C or C++ when you not want to use assembler? – 12431234123412341234123 Jul 05 '17 at 08:02
  • @MarianSpanik This can run into UB if sizeof(int)<sizeof((yourtype*)) is True or when adding the number result in a overflow. To avoid this use uintptr_t. – 12431234123412341234123 Jul 05 '17 at 08:11

16 Answers16

79

Garbage collection requires data structures for tracking allocations and/or reference counting. These create overhead in memory, performance, and the complexity of the language. C++ is designed to be "close to the metal", in other words, it takes the higher performance side of the tradeoff vs convenience features. Other languages make that tradeoff differently. This is one of the considerations in choosing a language, which emphasis you prefer.

That said, there are a lot of schemes for reference counting in C++ that are fairly lightweight and performant, but they are in libraries, both commercial and open source, rather than part of the language itself. Reference counting to manage object lifetime is not the same as garbage collection, but it addresses many of the same kinds of issues, and is a better fit with C++'s basic approach.

Others
  • 101
  • 4
kylben
  • 2,308
  • 31
    A secondary issue is the GC is non-deterministic. The object may or may not still be in memory long after the program has "dropped" it. Refcount lifecycles are deterministic, when the last reference is dropped, the memory is dropped. This has implications not only for memory efficiency, but for debugging as well. A common programming error is the "zombie" object, reference memory that has theoretically been dropped. GC is much more likely to mask this effect, and produce bugs that are intermittent and extremely difficult to track down. – kylben Oct 08 '11 at 22:17
  • 2
    If you want to experiment with a language that supports both, try Objective-C. It can do reference counting or garbage collection. –  Oct 08 '11 at 22:28
  • 24
  • modern gc's neither track allocations or count references. They build a graph from everything currently on the stack and just condense and wipe everything else (simplified), and GC normally results in reduced language complexity. Even the performance benefit is questionable.
  • – Joel Coehoorn Oct 09 '11 at 00:56
  • 14
    Er, @kylben, the whole point of having automatic GC baked into the language is that it becomes impossible to reference zombie objects, because the GC only frees objects that are impossible to reference! You get the sort of hard-to-track bugs you're talking about when you make mistakes with manual memory management. – Ben Oct 09 '11 at 01:08
  • 14
    -1, GC does not count references. Plus, depending of your memory usage and allocation scheme, a GC can be faster (with an overhead in memory usage). So the argument about performance is fallacy too. Only the close to the metal is a valid point actually. – deadalnix Oct 09 '11 at 02:07
  • 1
    My understanding was that both Java and C# count references AND do graph-traversal. This is because reference-counting has much, much less overhead, and works correctly about 95% of the time. Also, I think kylben meant increased complexity in implementing the language, not programming for it. – BlueRaja - Danny Pflughoeft Oct 09 '11 at 02:44
  • 14
    Neither Java nor C# use reference counting: GC schemes based on reference counting are pretty primitive in comparison and perform much worse than modern garbage collectors (mainly because they need to do memory writes to alter reference counts whenever you copy a reference!) – mikera Oct 09 '11 at 02:59
  • 2
    Ben, that's true if the language is designed to lock that and force all references through certain channels. C++ does not, so this argument still holds against grafting it on.

    Danny, yes, that is what I meant.

    As far as all the comments on implementation, I'm not familiar with the nitty gritty and was not aware of the stack-inspection method. That's good to know, but I don't think it changes the argument too much, it's still overhead compared to traditional C/C++ ownership-based explicit deletion, or compared to refcounting, which is basically an atomic exchange and an if (jz).

    – kylben Oct 09 '11 at 06:43
  • 1
    @kylben You explicitly drew a distinction between reference counting schemes in C/C++ and GC, and then wrote a comment stating that GC has the risk of masking zombie object bugs. This is severely misleading; it's only true if you're using a library based GC such as Boehm and you circumvent it. GC was specifically invented to make bugs such as zombie objects impossible. – Ben Oct 09 '11 at 23:00
  • 7
    @kylben Also, as others have said, reference counting is hugely inefficient compared to sophisticated modern GCs, and even hand-tuned manual free can be sub-par for some workloads. The advantage of C/C++ isn't that they're always more efficient than more managed environments, it's that they make it possible to write more efficient code when the managed environment is sub-par for your use-case. – Ben Oct 09 '11 at 23:04
  • 4
    @Ben: You can prevent access to zombie objects, or you can free unreachable subgraphs with cycles, or you can support finalizers, pick two. If you free an unreachable subgraph, finalizers running on components of that subgraph may still reference now-destroyed portions of said subgraph. – Ben Voigt Oct 15 '11 at 03:49
  • @BenVoigt I don't know that that's inherently true. I can imagine schemes where you run all the finalizers before deallocating any of the objects and only afterwards free anything that's still unreachable. I have gotten the impression that some modern languages are moving away from finalizers being a good idea because of complications with GC though, so your point has a measure of truth. I'd though it was impossible to do the GC/finalizers combo efficiently rather than impossible to do it though. – Ben Oct 15 '11 at 05:25
  • 1
    @Ben, which again gets at what the question is about. It's not whether GC is better or worse, its why C++ in particular doesn't have it. The design of C++ works against reliable and efficient GC. – kylben Oct 15 '11 at 13:01
  • 5
    @Ben: But if there are cycles in the subgraph being finalized, there is no "safe" order to run the finalizers. You won't be accessing memory that's repurposed, but you will have access to zombie objects (those for which the finalizer has already run). – Ben Voigt Oct 15 '11 at 13:27
  • 1
    @BenVoigt, and cycles are always a problem vis a vis object lifetimes, no matter what approach you take to cleanup. Its why the company I work for now has a fairly strict policy of object models always being acyclic (a very large C++ app). It's made life a hell of a lot easier, and dealing with the legacy code a hell of a lot more hellish, by comparison. A lot of this gordian knot is cleaved by just not allowing cycles. The assumption that there could be cycles is baked into most languages' designs, but it is an assumption that is usually never questioned. It should be. – kylben Oct 15 '11 at 13:37
  • 3
    @kylben: Agreed. But "they deal with cycles" is one of the most touted features of generational GC. In fact, there are still landmines. – Ben Voigt Oct 15 '11 at 13:38
  • @kylben - in the most general case, statically disallowing reference cycles while allowing linked lists, binary trees etc would be impossible, and immediately detecting and disallowing the creation of cycles at runtime would a potentially problematic overhead. Banning reference cycles in some data structures (e.g. double-linked lists) is impossible by definition, though that's usually an issue you can leave to your library. It's a non-issue too, if nodes don't have destructors. I certainly believe high-level code (and a lot of low-level code) can work with a reference-cycle ban. –  Oct 15 '11 at 16:24
  • @kylben - a slightly more permissive option might be to ban destructors for recursive data structure nodes (containing direct or indirect pointers to the same type - should be statically checkable). This shouldn't be a hassle - the destructor cleanup just moves up a layer to the container-class level instead of the data-structure node level. Of course the problem is that this is a language-design option - not applicable to C++, Java or any other current language I'm aware of. –  Oct 15 '11 at 16:30
  • @Steve314, sure, that's why I hedged a bit with "fairly strict", there are cases in libs and frameworks where the implementations have local cycles, but those are generally subject to much higher scrutiny and careful implementation. For a double-linked list, just don't have your operational objects hold the links, link together very lightweight objects that hold references to them, and don't let the reference holders manage lifetime in any way. I believe STL containers do the first part of that at least. – kylben Oct 15 '11 at 16:37
  • @Steve314, some reference counting schemes in C++ do just that.

    There's no programming problem that can't be solved with another layer of indirection. :-)

    – kylben Oct 15 '11 at 17:40
  • This answer is out of date as of C++11 since C++11 introduced smart pointers. – Pharap Jun 17 '16 at 00:07