When it comes to compilation (not only in Java), one can distinguish at least two kinds. Let's check out the differences.
Let’s say you are developing an application that consists of only one class. After the job is finished you definitely want to run the code. What happens under the hood?
Your code is transformed (that is, compiled) into bytecode using javac. Bytecode is a set of instructions for the JVM and is considered an intermediate code representation.
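As a minimal sketch, here is a trivial class and, in the comments, the bytecode javac emits for its method (you can inspect it yourself with `javap -c Adder` after compiling):

```java
// A minimal class; compile with `javac Adder.java`, inspect with `javap -c Adder`.
public class Adder {
    // javac compiles this method into a short sequence of JVM instructions:
    //   iload_0   // push the first int argument onto the operand stack
    //   iload_1   // push the second int argument
    //   iadd      // pop both, push their sum
    //   ireturn   // return the int on top of the stack
    public static int add(int a, int b) {
        return a + b;
    }
}
```

These stack-machine instructions are what the JVM consumes, regardless of the hardware underneath.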
Of course, your code eventually runs in a concrete environment, so the bytecode instructions must be translated into instructions the processor is familiar with.
When your code runs on the JVM, the interpreter first reads the bytecode instruction by instruction and translates each command into a corresponding machine instruction using a lookup table. The interpreter is a nice thing: it has almost zero startup cost because no time is spent on compilation. But that comes at the price of throughput; interpreting each bytecode instruction one at a time is slow.
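You can observe pure interpretation on a HotSpot JVM by disabling the JIT with the `-Xint` flag; the program below (a deliberately compute-heavy recursion, hypothetical example) starts instantly but runs noticeably slower than with the JIT enabled:

```java
// Run normally: `java Fib`
// Run interpreter-only (no JIT): `java -Xint Fib`
// Startup is immediate in both cases, but steady-state speed drops sharply
// under -Xint, since every bytecode instruction is interpreted one at a time.
public class Fib {
    static long fib(int n) {
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long result = fib(32); // enough work to make the difference visible
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println("fib(32) = " + result + " in " + ms + " ms");
    }
}
```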
The JVM is able to detect the most frequently used, or "hot", blocks. Blocks that run many times are compiled and optimized: the compiler transforms these commands into machine instructions that are much more efficient than sequential interpretation. This compiler is usually called the C1 compiler. Compiled code is stored in the code cache.
After the code has been running for some time and the JVM has profiled the main routes through the application, the C2 compiler comes on stage. It performs heavier optimizations and replaces the machine instructions C1 previously placed in the code cache.
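You can watch C1 and C2 at work with HotSpot's `-XX:+PrintCompilation` flag. A sketch, assuming a HotSpot JVM with tiered compilation enabled (the default); the class name and loop counts are illustrative:

```java
// Run with `java -XX:+PrintCompilation HotLoop` and watch the log:
// the `sum` method first appears at a C1 tier, then again at the C2 tier
// once the JVM has gathered enough profiling data.
public class HotLoop {
    static long sum(int n) {
        long s = 0;
        for (int i = 0; i < n; i++) {
            s += i; // executed many times -> the method becomes "hot"
        }
        return s;
    }

    public static void main(String[] args) {
        long total = 0;
        for (int i = 0; i < 20_000; i++) {
            total += sum(1_000); // enough invocations to trigger C1, then C2
        }
        System.out.println(total);
    }
}
```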
Before Java 7 we had to choose which compiler to use: C1 was faster but provided less aggressive optimizations, while C2 was slower and more resource-hungry, but produced better-optimized code and was a good fit for server-side applications.
Since Java 8 it has been possible to use both C1 and C2 at the same time (tiered compilation).
Optimizations
Which optimizations are we talking about?
They include:
- Removing code that is never executed (dead code elimination);
- Allocating objects that are created inside a method and never returned on the stack instead of the heap (escape analysis);
- Combining, unrolling, and inverting loops;
- Moving the bodies of small methods into their callers (method inlining);
- Removing a lock if only one thread ever uses it (lock elision);
- Removing a null check if the variable can never be null (null check elimination);
- and many others!
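A few of the optimization targets above can be sketched in code. This is illustrative (the class and method names are made up); whether the JIT actually applies each optimization depends on the JVM and the profile it gathers:

```java
public class OptimizationTargets {
    static class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Escape analysis: `p` never leaves this method, so the JIT may keep it
    // on the stack (or scalar-replace it entirely) instead of the heap.
    static int distanceSquared(int x, int y) {
        Point p = new Point(x, y);
        return p.x * p.x + p.y * p.y;
    }

    // Method inlining: small, frequently called methods like this one are
    // likely candidates to be inlined into their callers.
    static int square(int n) {
        return n * n;
    }

    // Lock elision: StringBuffer methods are synchronized, but this buffer
    // is visible to one thread only, so the JIT can remove the locking.
    static String greet(String name) {
        StringBuffer sb = new StringBuffer();
        sb.append("Hello, ").append(name);
        return sb.toString();
    }
}
```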
Neither C1 nor C2 stops the application during execution; compilation happens in the background.
Why not just use pre-compilation?
Well, why do we need to run an interpreter first and only later compile hot blocks to machine code (in other words, use JIT: just-in-time compilation)? Why not pre-compile the whole program and run it (i.e., use ahead-of-time, or AOT, compilation)?
AOT compilation produces compiled code (oh, really?) that is tied to the environment it was compiled for. It cannot be moved to another hardware architecture; it is not cross-platform. The compiled code is loaded directly into the code cache.
Since Java 9 we have had the opportunity to compile Java code directly to machine instructions using jaotc (for example, `jaotc --output libHello.so Hello.class`).
Summary
- Use bytecode if portability is important;
- Bytecode is initially interpreted one instruction at a time;
- C1 compiler does fast yet not very efficient optimizations;
- C2 compiler does relatively slow but efficient optimizations;
- AOT compilation means the whole program is pre-compiled directly into machine instructions, bypassing the interpretation stage;
- The code cache stores compiled code blocks produced by C1 and C2, or directly loads pre-compiled machine instructions.