Essentials of Interpretation. Lecture [3/18] Compilers: AOT, JIT, Transpiler
Welcome back! In the previous video we discussed interpreters, in particular AST interpreters and bytecode interpreters, and today we'll talk about compilers.

Let's start with the AOT compiler, where "AOT" stands for "Ahead-of-time" translation. What does "ahead of time" mean? It means before code execution: we fully translate the source code before the program runs. So here is the AST we obtained in the previous lecture. In the case of an AOT compiler, this AST is passed at static time, that is, at compile time, to a module called the code generator. The code generator, after producing multiple intermediate representations (IRs), eventually emits native code. In this case it's x86 or x64, but it can be any target architecture, for example ARM, WebAssembly, and others. We should also note that compiler engineers tend to reserve the term "IR" specifically for the representations that come after the code generator, roughly everything after the bytecode; and as we can see, there may be multiple intermediate representations, each of which may have its own code generator. The native code is eventually passed to the CPU, which produces the final result, and this happens at runtime. So that was the AOT compiler.

When we talk about compilers we also distinguish frontend engineers and backend engineers. The frontend boundary is usually everything before and including the AST, and the backend is everything after: the code generator, the multiple IRs, and finally the target code.
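To make that pipeline concrete, here is a minimal JavaScript sketch of the "code generator" step: it lowers an expression AST into a flat list of stack-machine-style instructions, standing in for the lower-level representations a real backend would keep refining toward native code. All the names here (node types, opcodes) are made up for this sketch, not taken from any real compiler.

```javascript
// Toy "code generator": lowers an expression AST into a flat list of
// stack-machine instructions. Node shapes and opcode names are illustrative.
function codegen(node, out = []) {
  switch (node.type) {
    case "num":
      out.push(["PUSH", node.value]);
      break;
    case "add":
    case "sub":
      // Lower both operands first, then emit the operator.
      codegen(node.left, out);
      codegen(node.right, out);
      out.push([node.type === "add" ? "ADD" : "SUB"]);
      break;
    default:
      throw new Error(`Unknown node type: ${node.type}`);
  }
  return out;
}

// (15 + 10) - 5, the running expression from the previous lecture:
const ast = {
  type: "sub",
  left: {
    type: "add",
    left: { type: "num", value: 15 },
    right: { type: "num", value: 10 },
  },
  right: { type: "num", value: 5 },
};

console.log(JSON.stringify(codegen(ast)));
// [["PUSH",15],["PUSH",10],["ADD"],["PUSH",5],["SUB"]]
```

A real backend would continue lowering such instructions through further IRs, register allocation, and finally machine code for a concrete target.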
From this perspective there is a project called LLVM, which originally stood for "Low-Level Virtual Machine" and which allows us to reuse an existing backend. Let's see what it is. Here is our parsed code again; using the LLVM backend allows us to generate native code for multiple architectures. So if you want cross-platform, cross-compiled code for x64, ARM, and so on, you can use LLVM, which will handle this for you automatically. In the case of LLVM we need to produce a specific bytecode: much as we had a bytecode emitter before, here we have an LLVM IR generator, which is nothing but a bytecode emitter, except it emits not an arbitrary bytecode but the specific LLVM bytecode called LLVM IR. Once we have generated this LLVM IR, we can pass it further to the black box called "LLVM", which fully abstracts the code generation. What is important here, as we said, is that it is able to produce native code for multiple target platforms, which again is eventually passed to the CPU and executed at runtime. OK, let's take
a look at a compilation example. We have a source C++ file, and as you can see, it contains our expression from the previous lecture: x = 15; and then return x + 10 - 5;. To compile this code I use the standard clang++ utility, which produces the "a.out" file; the name historically stands for "assembler output". We can execute this generated binary normally, and as we can see, it prints the correct result. And if we introspect the file, we really do see the generated binary code, which is exactly the machine code. We can also obtain a source-level representation of this machine code by passing the -S option: this produces a ".s" file, which is nothing but the assembly output for this program. The specific instructions here move the value 15 into the %eax register, then add 10 and subtract 5; again, the result is in %eax. The interesting thing is that if we pass an optimization option, we see that the compiler was able to completely eliminate all the calculations and pre-evaluate the result, 20, moving it directly into the final %eax register. That's an optimizing compiler, and this is nothing but calling an interpreter at compilation stage: we call an interpreter to pre-evaluate some expressions. We have talked about LLVM, and clang is actually able to emit LLVM IR specifically if we pass the -emit-llvm option. In this case it produces a "source.ll" file, and this is exactly the LLVM IR. As we can see, it still contains the same steps: move 15, add 10, subtract 5. So that is the LLVM compiler. Now let's talk about
the JIT compiler. "JIT" stands for "Just-in-time". What is "just in time"? It means at runtime, that is, translating the code while the program is being executed. "At runtime" assumes there is already some runtime involved, and in fact when people talk about a JIT compiler they usually assume a virtual machine. As we remember, a virtual machine, after obtaining the bytecode, is able to obtain the result directly by interpreting that bytecode. However, imagine the situation where we have some heavyweight function that does a complex calculation and is called many times during program execution. Eventually this function becomes a performance bottleneck. What if, instead of getting the output here directly, we obtained it indirectly? Instead of interpreting this function over and over again, we pause for a second and call the code generator directly at runtime. This code generator produces native code, which is then passed directly to the CPU, and the CPU obtains the correct result. Once the CPU has executed this compiled code, it can jump back to the interpreter, and the interpreter can proceed normally from there. The next time this function is called, we don't even compile the code at runtime; instead we use the cached compiled version and jump directly to the CPU. Again, the purpose of the JIT compiler is to improve the performance of heavyweight operations. We should mention a trade-off, though: for simple operations it might actually be faster to just interpret them instead of calling the code generator, spending time on compilation, and setting up the jumps back and forth. So you may reserve this optimization for hot paths in the code, where it makes sense to compile to native code. Okay, so that was the JIT compiler, that is, translation at runtime. Now let's take a look at the
final transformation pipeline, known as the AST-transformer. The AST-transformer provides a high-level translation, and it is usually called a "Transpiler", from "transformer" plus "compiler". The output of an AST-transformer is another AST. Let's take a look. Here is our parsed code again; in this case we pass it to a module called the AST-transformer, which produces, as we said, the next representation, and that representation is also just an AST. What is important here is that this AST may be in the same language or in a completely different language. For example, we can translate a new version of JavaScript to an old version of JavaScript, or we can translate, say, Python to JavaScript, obtaining a completely different AST. Then this AST is passed again to a code generator, although a high-level one, sometimes called a "printer", which prints the next representation: the high-level source in the other programming language. In this case, let's pretend we translated Ruby code to JavaScript code; as you can see, we added parentheses and a semicolon at the end. And once we have this new source code, we can pass it to a black box called "Compiler", which contains the full transformation cycle starting from tokens, through the parser, ASTs, and so on, and after the eventual transformation it will get the correct result. So again, this is the AST-transformer, and as you can see, it is a pure frontend: it operates mainly on the AST, there is no access to memory, machine instructions, and so on, and you fully rely on this black-box "Compiler"; you hope that there is some interpreter for the translated code. So we have been talking about interpreters and compilers,
and this may raise an interesting question: is JavaScript, or Python, or C++, or whatever, an interpreted or a compiled language? In fact, this is the wrong question. It is not languages that are interpreted or compiled, but their implementations. We can easily have an interpreter for C++: we can allocate a virtual heap in JavaScript and implement C++ semantics on top of it. Moreover, if we take, for example, an optimizing compiler for C++, it is nothing but calling a C++ interpreter at compilation time. From one side it's a C++ compiler, but to do pre-evaluation and optimizations it needs to call an interpreter for the same C++ language. What this means is that whatever you choose to implement your language with, whether an AST interpreter, a bytecode interpreter, or an AOT compiler, doesn't really matter. What matters is the semantics: the final semantics must be preserved. From this perspective, when you design your language, it is actually a good idea to first implement an AST interpreter as a proof of concept that your language is viable. After you have implemented the full programming language at the AST level, you may go further and optimize, for example, the program storage, translating the AST to bytecode. Once we have a bytecode interpreter, it may make sense to add JIT-compiler support, again for heavyweight operations that require faster execution. And that's exactly the reason why we'll be using an AST interpreter in our class: it's easier to explain the semantics of a programming language this way, and we'll still be able to implement the full semantics of a programming language. Okay, so at this
step we have reached the checkpoint for Part 1. Let's do a quick recap. We know there are interpreters and compilers, and that what is interpreted or compiled is not a language but its implementations. We also know that compilers just translate from one language to another and don't execute any code; what actually executes the code is an interpreter or the machine. We also know the concept of the Abstract Syntax Tree, or AST, and we know about AST interpreters as well as AST-transformers, or Transpilers.
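To tie those two ideas together, here is a minimal JavaScript sketch of an AST interpreter and of an AST-transformer (a constant folder) over the lecture's running expression. The node shapes are illustrative, not from any particular parser, and the folder also demonstrates the earlier point that an optimizing compiler is, in part, an interpreter called at compile time.

```javascript
// AST interpreter: walks the tree and computes the value directly.
function evaluate(node) {
  switch (node.type) {
    case "num": return node.value;
    case "add": return evaluate(node.left) + evaluate(node.right);
    case "sub": return evaluate(node.left) - evaluate(node.right);
    default: throw new Error(`Unknown node type: ${node.type}`);
  }
}

// AST-transformer (here, a constant folder): takes an AST and returns
// another AST. When both operands are already constants, it calls the
// interpreter at "compile time", the same pre-evaluation the optimizing
// compiler performed on x + 10 - 5.
function fold(node) {
  if (node.type === "num") return node;
  const folded = {
    type: node.type,
    left: fold(node.left),
    right: fold(node.right),
  };
  if (folded.left.type === "num" && folded.right.type === "num") {
    return { type: "num", value: evaluate(folded) };
  }
  return folded;
}

// (15 + 10) - 5, the running example with x already substituted:
const expr = {
  type: "sub",
  left: {
    type: "add",
    left: { type: "num", value: 15 },
    right: { type: "num", value: 10 },
  },
  right: { type: "num", value: 5 },
};

console.log(evaluate(expr));              // 20
console.log(JSON.stringify(fold(expr))); // {"type":"num","value":20}
```

A transpiler would then hand the transformed tree to a printer that emits source text in the target language.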
We also talked about AOT compilers, which fully translate to the target language before execution, and about JIT compilers, that is, just-in-time compilation, which happens at runtime. We also said that a bytecode interpreter, or virtual machine, is usually an imitation of a real machine, and that it optimizes for the program format: a plain array of bytecode instructions. And we know that a compiler has a frontend and a backend: the frontend is everything up to the AST level, and the backend is usually everything after code generation, that is, the different IRs, bytecodes, and eventually the native code. Okay, at this point we have a high-level understanding of the different compilation and interpretation pipelines, and this knowledge is enough for us to actually start building our language. In the next lectures we'll start building the first parts of our interpreter, and we'll consider the simplest expressions. That's it for today, thanks, and see you in the class.
