Garbage Collection in Java

· 7 min read · #java #garbage-collection #memory #c #cpp

Java is a garbage-collected language. You create objects with new, and you never manually free their memory; something else reclaims that memory for you. It is worth understanding what that something else does, and what it costs.

Start with the problem it solves. Conceptually, every object your program creates has to live somewhere in memory, in a region called the heap. The heap is not infinite. Every object takes a chunk of it, and once the program can no longer use an object, that chunk has to be handed back, or the heap fills up and the program runs out of memory. The question is who hands it back, and when. C, C++, and Java answer that question in three different ways, and comparing the three is the clearest way to understand what garbage collection is.

C: you free memory by hand

In C, you manage the heap yourself. You ask for memory with malloc, and you give it back with free:

int *p = malloc(sizeof(int));   // reserve a chunk of heap, get its address
*p = 5;                         // use it
free(p);                        // hand the chunk back

The rule is simple: every successful malloc must eventually be matched by exactly one free, no more and no less. If you get that wrong, you hit one of three common bugs.

The first is a memory leak. You allocate, you forget to free, and that chunk stays reserved until the program ends. One small leak in a short-lived program may not matter much. A leak inside a loop, in a program that runs for a long time, accumulates until the program runs out of memory.

The second is a dangling pointer. You free a chunk, but you still hold a pointer to it, and later you use that pointer. By then the memory may already have been handed out to something else, so you are reading or writing data that no longer belongs to you. These bugs are hard to reproduce and are a common cause of crashes and security vulnerabilities.

The third is a double free: calling free twice on the same chunk. This is undefined behavior and can corrupt the heap's bookkeeping.

C does not catch any of these for you. You have full control over memory and full responsibility for managing it correctly.

C++: cleanup tied to scope

C++ keeps everything C has and adds new and delete, which play the same roles as malloc and free (while also running constructors and destructors). The same three bugs are still possible. But C++ also provides a way to avoid freeing memory by hand.

The idea is to tie the lifetime of a heap object to the lifetime of an ordinary stack variable. When a C++ object with automatic storage duration goes out of scope, C++ guarantees that a special method called its destructor runs automatically. If that stack object owns heap memory, its destructor can free that heap memory, so cleanup happens automatically when the owner goes out of scope. This pattern is called RAII ("resource acquisition is initialization"); despite the name, the idea is simply to let scope exit do the freeing.

In modern C++ you rarely write delete yourself. You use smart pointers that apply this pattern for you:

std::unique_ptr<int> p = std::make_unique<int>(5);
// use *p ...
// no delete needed: when p goes out of scope, the int is freed

A unique_ptr owns its object and destroys it when it goes out of scope. A shared_ptr lets several owners share one object: it keeps a count of how many owners there are and destroys the object when that count drops to zero, that is, when the last owner goes away.

The important property is that the cleanup is deterministic. You know exactly which event triggers it (scope exit for a unique_ptr, the last owner going away for a shared_ptr), and it happens at that moment; there is no background process deciding when. You still decide who owns what, and you can still get it wrong: if two objects hold shared_ptrs to each other, they form a cycle in which each keeps the other's count above zero, so neither is ever freed. But the common case is handled predictably.

Java: the runtime frees memory for you

Java takes the responsibility away from you. You allocate with new, and there is no second step. There is no free and no delete. You never reclaim memory yourself.

Employee e = new Employee(1);
// ... use it ...
// no free, no delete; you simply stop using it

Instead, the JVM runs a garbage collector: a part of the runtime that, from time to time, determines which objects are still in use and reclaims the memory of the rest. To decide what is still in use, it looks at references: the link a variable holds to an object, and the links that object holds to other objects.

An object is live if it is reachable: if you can get to it by starting from a set of roots (the local variables currently on the stack, the static fields of your classes, and a few others) and following references from object to object. If no chain of references leads to an object, then nothing in your program can name it or use it again. It is garbage, and the collector can reclaim its memory.
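To make "reachable from the roots" concrete, here is a toy model of the marking step of a tracing collector, written in ordinary Java. It is a sketch under stated assumptions: ToyObject, the heap list, and the roots list are hypothetical stand-ins, and a real JVM neither exposes its heap like this nor traces it this simply.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of tracing reachability. ToyObject, the heap list, and the
// roots list are hypothetical stand-ins, not a JVM API.
public class ReachabilityDemo {

    // One object in the toy heap: a name plus the references it holds.
    static final class ToyObject {
        final String name;
        final List<ToyObject> refs = new ArrayList<>();

        ToyObject(String name) {
            this.name = name;
        }
    }

    // Mark step: start from the roots and follow references, collecting
    // every object that some chain of references reaches.
    static Set<ToyObject> markReachable(List<ToyObject> roots) {
        Set<ToyObject> marked = new HashSet<>();
        Deque<ToyObject> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            ToyObject obj = work.pop();
            if (marked.add(obj)) {       // add() returns false if already marked
                work.addAll(obj.refs);   // visit everything this object points to
            }
        }
        return marked;
    }

    // Everything in the heap that was not marked is garbage.
    static List<String> garbage(List<ToyObject> heap, List<ToyObject> roots) {
        Set<ToyObject> live = markReachable(roots);
        List<String> result = new ArrayList<>();
        for (ToyObject obj : heap) {
            if (!live.contains(obj)) {
                result.add(obj.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ToyObject dept = new ToyObject("dept");
        ToyObject alice = new ToyObject("alice");
        ToyObject bob = new ToyObject("bob");
        dept.refs.add(alice);                  // dept -> alice; nothing points to bob

        List<ToyObject> heap = List.of(dept, alice, bob);
        List<ToyObject> roots = List.of(dept); // e.g. a local variable holding dept

        System.out.println(garbage(heap, roots)); // prints [bob]
    }
}
```

Calling garbage with an empty roots list, which models the moment the last variable referring to dept is dropped, reports all three objects as garbage: alice was only reachable through dept.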

This changes how you free something. You do not free it; you make it unreachable by dropping the references to it:

Employee e = new Employee(1);
e = null;   // if no other reachable reference points to that object, it is now eligible for collection

You are not telling the runtime to delete the object. You are only dropping your reference to it; if nothing else reachable refers to it, the collector reclaims it at some later point. (Letting a local variable go out of scope has the same effect, without writing null.)

In exchange for giving up control, you avoid all three C bugs. You cannot have a dangling pointer, because the collector never reclaims an object that is still reachable, so any reference you can still use points to a live object. You cannot double-free, because you never free anything at all. And the "forgot to free" leak is mostly gone, because the collector reclaims objects once they are no longer reachable. This safety goes hand in hand with Java's lack of C/C++-style pointers: with no manual free, no raw addresses, and no pointer arithmetic, dangling-pointer and double-free bugs are removed from ordinary Java code.

Tracing collection has another benefit. Because the collector works by tracing what is reachable, it handles cycles automatically. Two Java objects that refer to each other but that nothing else points to are still unreachable from the roots, so they are collected. This is exactly the case that reference counting, as in C++'s shared_ptr, does not handle on its own.
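The difference can be sketched with another toy model that tracks both a reference count and reachability for the same two-object cycle. Node, pointTo, and reachable are hypothetical names for illustration only; the counts here track only node-to-node references, which models the point after any local owners have already gone away.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy comparison of reference counting and tracing on a cycle.
// Node, pointTo, and reachable are hypothetical, not a JVM or C++ API.
public class CycleDemo {

    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        int refCount = 0;   // number of node-to-node references pointing here

        Node(String name) {
            this.name = name;
        }

        // Add a reference to target and bump its count, as a counter would.
        void pointTo(Node target) {
            refs.add(target);
            target.refCount++;
        }
    }

    // Tracing: a node is live only if some chain from a root reaches it.
    static boolean reachable(Node target, List<Node> roots) {
        Set<Node> seen = new HashSet<>();
        Deque<Node> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (n == target) {
                return true;
            }
            if (seen.add(n)) {
                work.addAll(n.refs);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        a.pointTo(b);
        b.pointTo(a);                   // a <-> b form a cycle
        List<Node> roots = List.of();   // no root refers to either node

        // Reference counting: neither count drops to zero, so neither is freed.
        System.out.println(a.refCount + " " + b.refCount);                    // prints 1 1
        // Tracing: neither node is reachable from the roots, so both are garbage.
        System.out.println(reachable(a, roots) + " " + reachable(b, roots));  // prints false false
    }
}
```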

Garbage collection also has costs.

The main cost is that collection is non-deterministic. Unlike C++'s scope-based cleanup, you do not know when an object will be collected, or even whether it will be collected before the program ends. The collector runs on its own schedule, and when it runs it may briefly pause your program while it works (a "stop-the-world" pause). Modern collectors are designed to keep these pauses short, but pause behavior depends on the collector and the workload, so pauses remain a concern for latency-sensitive software. Garbage-collected programs also tend to need extra memory headroom, because memory is usually reclaimed in batches rather than at the moment an object becomes unreachable.

There is also a subtler problem. Java can still leak, though not in the C sense; the leak is a logical one. If you keep a reference to an object you no longer need (a common cause is an object left in a long-lived collection or cache), it stays reachable, so the collector will never reclaim it. The collector reclaims what is unreachable; it cannot know that a still-reachable object is one you are done with. Java leaks are leaks of unintended reachability, not of forgotten frees.
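A minimal sketch of such a logical leak, assuming a hypothetical request handler that stores per-request buffers in a static map. Because a static field is one of the roots, every buffer in the map stays reachable until it is explicitly removed. LeakDemo, handleRequest, and finishRequest are illustrative names, not part of any library.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical logical leak: a long-lived cache keeps per-request data
// reachable after the program is done with it.
public class LeakDemo {

    // A static field is a GC root, so everything this map references stays
    // reachable for as long as the class is loaded.
    static final Map<Integer, byte[]> CACHE = new HashMap<>();

    static void handleRequest(int id) {
        byte[] buffer = new byte[1024];   // per-request data
        CACHE.put(id, buffer);            // stored "for later"...
        // ... work with buffer; without finishRequest it is never removed
    }

    // The fix: drop the reference once the data is no longer needed, so the
    // buffer becomes unreachable and eligible for collection.
    static void finishRequest(int id) {
        CACHE.remove(id);
    }

    public static void main(String[] args) {
        for (int id = 0; id < 1000; id++) {
            handleRequest(id);
        }
        System.out.println(CACHE.size());   // prints 1000: every buffer is still reachable

        for (int id = 0; id < 1000; id++) {
            finishRequest(id);
        }
        System.out.println(CACHE.size());   // prints 0: the buffers can now be collected
    }
}
```

The collector is doing its job correctly in the first half of main: the buffers really are reachable. Only the program knows they are no longer needed, which is why the fix has to be a change in the program's references.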

One more consequence: because the collector may move objects around in memory to reduce fragmentation, an object's address is not stable over its lifetime. This is one reason Java does not show you raw addresses or let you do pointer arithmetic: the address you saw a moment ago might no longer be correct.

And because collection timing is unpredictable, Java does not give you a reliable destructor the way C++ does. (Java historically had finalizers, but they are unreliable and deprecated; they are not a substitute for deterministic cleanup.) That is fine for memory, which the collector handles, but not for other resources like open files or network sockets, which need to be closed promptly. For those, Java provides try-with-resources: a scope-based cleanup mechanism for objects that implement AutoCloseable, in which Java calls close() automatically when the block exits, whether it finishes normally or throws an exception.
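A small sketch of try-with-resources, assuming a hypothetical TrackedResource class that only records events in a log so the order of operations is visible. Real code would put a stream, reader, or socket in the try header instead; those also implement AutoCloseable.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of try-with-resources. TrackedResource is a hypothetical class that
// only records events in LOG; it stands in for a file or socket.
public class TryWithResourcesDemo {

    static final List<String> LOG = new ArrayList<>();

    static final class TrackedResource implements AutoCloseable {
        private final String name;

        TrackedResource(String name) {
            this.name = name;
            LOG.add("open " + name);
        }

        void use() {
            LOG.add("use " + name);
        }

        // Declared without "throws Exception", so callers need no catch for close().
        @Override
        public void close() {
            LOG.add("close " + name);
        }
    }

    static void useAndFail() {
        try (TrackedResource r = new TrackedResource("file")) {
            r.use();
            throw new IllegalStateException("something went wrong");
        } // r.close() has already run by the time the exception leaves this block
    }

    public static void main(String[] args) {
        try {
            useAndFail();
        } catch (IllegalStateException e) {
            LOG.add("caught " + e.getMessage());
        }
        // Prints: [open file, use file, close file, caught something went wrong]
        System.out.println(LOG);
    }
}
```

The log shows that close() runs at the end of the block, before the caller sees the exception, which is the same scope-based guarantee a C++ destructor gives, applied to resources rather than memory.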

The trade-off

The three languages form a spectrum that trades control against safety:

  • C gives you full manual control: you decide exactly when memory is freed, and you carry full responsibility for getting it right.
  • C++ keeps the control but automates the work. With RAII and smart pointers, cleanup happens deterministically at scope exit, and you mostly stop writing delete by hand.
  • Java gives the job to the runtime. Whole categories of bugs are removed, at the cost of determinism, some memory overhead, and occasional pauses.

No point on this spectrum is universally correct. A database engine or an operating-system kernel may need C's control; a large business application is usually better off with Java's safety.

The short version

  • In C, you free memory by hand with free, and you own every mistake: leaks, dangling pointers, double frees.
  • In C++, you can tie cleanup to scope with destructors and smart pointers, so owned memory is reclaimed automatically and deterministically when the owner is destroyed. You control when, and the language does the work.
  • In Java, a garbage collector reclaims whatever your references no longer reach. You stop managing memory and start managing reachability: safer, but you give up knowing exactly when cleanup happens.

You never call free in Java. But you are still responsible for letting go of references you no longer need.