The Angular 2 Compiler Tobias Bosch

TOBIAS: Welcome to the talk. My name is Tobias Bosch. I work for Google, I’m
part of the Angular core team, and I built most of the bits of the compiler. This talk is
about how Angular works internally. It’s not about high-level features. The hope is that you
can take some of the lessons we’ve learned to your applications; maybe you build your
own compiler or framework, as we all do. So I will talk about how the compiler
works and how we got there.

What is the compiler? On one side you have the inputs of the
application, which might be your directives, your templates, and so forth, and on the other
side your running application. Everything in between is what the compiler does. From the outside,
it seems like magic, but it’s not. We will go through different things today. First, we will talk
a little bit about performance: what does it mean to be fast? We will talk about what
kind of inputs we have. We will talk about parsing: what is parsing, and
what does Angular 2 do with parsing? We will talk about how Angular 2
takes the output that it parsed and brings it to life. We will actually do this three times:
first the simple implementation, and then a better one, and then a better one. And that is
actually just how we did it. These were the iterations we actually went through
in our compiler to make it faster. You can see some keywords that you might recognise
from previous keynotes of ours. Then we will talk about different environments,
and we will talk about just-in-time compilation versus ahead-of-time
compilation. I hope we have time for a little demo at the end. Let’s get started.

First, performance: what does it mean to be
fast? If I write an application and I say, “This
is fast,” what do I mean? I could
mean that it loads super quickly. Maybe you saw the AMP pages in Google search? They load super fast. So this could be fast. Or fast could mean I’m in a big application
and I switch from one part to the other part, like from an overview to a
detail page, and that is really snappy, so I could say that is fast. Or I could say, I
have a big table, and now I just exchange all the values. I create new values but the
structure stays the same, and say, “That’s actually fast.” These are different aspects, different things to look at in the
application. We want
to focus on switching the route. The important thing is that the framework
destroys everything and recreates everything anew. It’s not keeping the structure; it’s destroying
everything and recreating everything. How can we make this fast? To be fast, you
need to measure it. One of the benchmarks we use is the deep-tree
benchmark. Maybe
you have heard about it? It’s a component that uses itself two times. It is a recursive
component, up to a certain depth. We have 512 components in there, and the
buttons can destroy or create them. Then we measure how long it takes to click
destroy and create. So that is your route switch, from one view to the other: destroy
everything, create everything.

What kind of inputs do we have? We’ve got
components; probably everybody here knows them. They have a template,
which you can put inline or in a separate file, and they have a context. The
component instance itself is the context; it’s the data that is used to render the
template. In this case, we have a user, and the user
has a name, and it’s my name. Then we have templates. They are first of all plain old HTML: you can
put forms, inputs, whatever HTML has in there. They also have some new syntax in there; you may
recall double curlies, square brackets, round brackets. These are data bindings for
properties or for events. In this talk, we only talk about the curlies, the
interpolation, and what they mean. Semantically, this means: take the data from
the component and put it in this place. Then, after the templates, we have directives. Directives have a selector, which is a CSS
selector, and the idea is that when Angular goes through this template, whenever it finds an
element that matches a directive’s selector, it should instantiate that directive. Let’s say we have a selector
for form: whenever we create a form element, please create this directive. The
same goes for ngModel: when you create an element with an ngModel
attribute, please create this directive as well. Directives can have dependencies, so that
is our dependency injection. It’s hierarchical in the way that an ngModel
can ask for an ngForm. Angular will
look up the tree to find the nearest ngForm upwards. It won’t search the
siblings, only the parents. There are more inputs, but we won’t talk about these: they
are pipes, inputs, outputs, content children, and so on and so forth. We will focus on a few
but do a deep dive into these. Everything that we do in Angular goes through
the compiler.

So we’ve got the inputs. The next thing we need to do is
understand the inputs. Is it gibberish what you wrote, or does
it actually make sense? That is the process of parsing. Let’s say we have a template, some
HTML. How can we represent it so the compiler can
understand it? That’s what a
parser does. It reads each character and
makes sense of it. It builds
up a tree. For each element, you can have one object. Let’s say there’s a name on it,
which is the element name, and it has children. A text node we can represent as a
JSON object as well, with a text property on it. Attributes, for example on the input
element, let’s say we encode as nested lists: the first entry is the key; the
second entry is the value of the attribute. The ngModel attribute has an empty value. So far,
that’s straightforward. This representation we call an abstract
syntax tree, AST for short. You will hear this keyword a lot. That covers HTML. What about this binding? How
can we represent this binding? We can do it as follows. It’s a text node, so we’ve got a
JSON object with a text property. The text itself is empty because, initially,
there’s no text to show. It depends on the data that comes in. The data that comes in is
represented as an expression. Expressions we also analyse: what do they
mean? They look like JavaScript, but they’re a little bit more and a little bit
less. You cannot
declare a function in there, and you cannot do a for loop in there, but we have things
like pipes, a special syntax we have in there. You can represent this
one, user.name, as a property path that you read, say a prop path, to represent this
kind of expression, and we also capture where it came from in your template. So why
is that important? Why do we want to know where it came from
in the AST? It’s
because we want to show you meaningful error messages at runtime. Let’s say,
at runtime, an exception happens for your user: “Cannot read property
name of undefined.” If this just happens, then you need
to go into your debugger, maybe set a break
point on the first error, and then you know where it actually happened. Angular wants to give you more information:
it wants to tell you from which place in your template
this originated. We want to tell you that it
actually came from this interpolation here, at line 2, column 14 of your template.

What kind of parsers can we use to produce
this AST? There are multiple options. We
can use the browser. The browser is good at parsing HTML, right? It does it every day. That’s the approach we took for Angular 1,
and we initially took that approach for Angular 2 as well. Right now we don’t use it any more, for two
reasons. First, we don’t get line and column numbers out of the browser: if you ask a browser to parse HTML, there
are no line and column numbers. The other reason is that we want to run Angular
on the server as well. On the server, there is no browser, obviously,
so we could say in the browser we use the browser, and on the server we use something
else. We actually had that. But then you
run into corner cases, with SVG namespaces, for example, and you
want to have the exact same semantics everywhere. It’s simpler to have a JavaScript
parser, and that’s what we do right now.

We have talked about HTML. How do we represent the directives we find? We can reference them in the JSON
objects that represent an element: we point to the constructor functions of these
directives. In this case, when we look at this input with
ngModel, we can represent this element as a JSON object. There’s a name on
it, input. It has this attribute, ngModel,
and it has directives on it. We have a pointer to the constructor, and we capture the dependencies: we
need to say that whenever we create an ngModel, it needs an ngForm.

How do we bring it to life? What is the fastest way to create a DOM element? The first thing you could do is use an element’s
innerHTML. The second thing you can do is take an existing element and
clone it. The third thing you can do is
call document.createElement. Let’s do a show of hands. Who thinks innerHTML is the
fastest? Who thinks element.cloneNode is the fastest? Who thinks
document.createElement is the fastest? It changed over time, obviously, but, recently,
it’s the case that innerHTML is the slowest. The browser has to take your string, walk through each
character and build up the DOM elements, so that is obviously slow. cloneNode is the
fastest because the browser already has the representation and can simply clone it. It is essentially just
allocating new memory; that’s all it needs to do.
document.createElement is slower, but very close to cloneNode. So you might say, let’s use
cloneNode for the DOM. It turns out this is not a fair comparison,
not for the use case that Angular has. The use case of Angular is that we need to create
something and, second, we need to locate elements in it. In this case, we want to stamp out new
text nodes, but we need to find the one that is
responsible for user.name because we want to update it later on. So to compare fairly, we should compare creating plus
locating the interesting elements. If you use cloneNode or innerHTML, you need to walk
a path to find them. With createElement, you just call it and you have the instance
directly in hand. There’s nothing
to walk. In this regard, if you put these two together,
createElement and createTextNode are about the same speed as cloneNode,
depending on the number of bindings you have, and, secondly, you need a lot less
data structure. You don’t need to keep
track of the indices and this stuff, so it’s simpler. We’re using this, and other
frameworks have switched to this as well. Now we can create DOM elements. The other thing we need
to create are directives. For directives, we need this dependency
injection that looks upwards. Let’s say we have a data structure called
NgElement. It has a parent as well, so it’s a
simple tree that follows the DOM tree. It just wraps the DOM element plus the
directives. So then how can we create a DOM element from
the AST? We saw the element in the AST. In our NgElement, we create the element, loop
over the attributes, set them on the element, and then
append our element to the parent. So
there is no magic. Then, for the directives,
we would call new on the constructor and store the instance in a map. The map would go from the type of
directive, like ngModel, to the instance of the directive. The look-up for the directives
would work like this: a method getDirective would first
check the element itself. Do we have the directive in there? If not, walk up to the parent and check there. That is about the simplest thing you can do. We started along these lines, actually. And it
works. There is one part missing: the bindings. How can we represent the bindings? You
just make a class Binding that has a target, which will be the text node; a target property,
in this case nodeValue, which is the place where the value should go; plus the
expression. Then the binding works like this: each time,
you evaluate the expression and, only when it has changed, you store it in
the target. You could have a method check
like this, with a previously stored value. That’s how our change detection works. If you look at
the generated code, you see a lot of these statements: get the value, compare it, and store it
in the target. And then there are these nice exceptions that
we had: we put a try/catch around the evaluation, and when an exception happens we
rethrow the exception and wrap it with the line and column number. That’s how you get these errors with line numbers like
we saw before. We
bind these together into a view, so that is the last data structure. A view is the instance
of a template. If you read our code, you will see the
word view a lot. That just means
it’s an instance of a template. We put these together in a view. The view has the
reference to the component, the NgElements and the bindings, and there is a
dirty-check method on it that loops over the bindings and checks these.

So, we are
done with the first iteration. Good. We have a new compiler. How fast are we? About the same
speed as Angular 1. That’s good: that’s a simple approach, and we get that
speed. Angular 1 is not slow. How do we get faster? What is the next step? What are we
missing? Let’s see. We need to look at something that is related to our
data structures. And
data structures, that’s actually really hard. If you profile the previous program that we
wrote, you will see that there is not one place you will find that
is super slow, not one function that takes two seconds
that you need to optimise. If the reason why
your program is slow is your data structures, it’s hard to tell because the cost is
scattered all over your program. It won’t show up in the profile directly; it’s
like a long tail, but, if you add it all up, then it gets
slow. We experimented and tried things out. One thing we looked at is the directive
map in the NgElement. It means that for each
element in the DOM tree, we create a new map, right? We could be smart and say, if
there are no directives, we don’t create it. But otherwise we always create it, we
populate it with the directives and we read from it. An alternative approach is to say, okay,
we only allow ten directives per element, and then we create a class called
InlineNgElement with ten properties for the directive types, and then, to look up a directive, we
would just code ten if statements: if it is this type, return that; if it is that type,
return that. Is this faster? It could be. There is not much memory allocation, right? For
setting, you just set a property, not an entry in a map. Reading might be a little slower
because there are ten if statements. But the interesting part here is that this is a
pattern that JavaScript VMs optimise for. Wherever you do this, the JavaScript VM can
create a hidden class. This is what makes JavaScript VMs fast. Switching to this data structure
made us faster. We will see it later in our benchmark. The other thing to optimise in our
data structures is to reuse existing instances. We could say, okay, if we have
an ngFor, some rows are destroyed and others show up, so why not keep the destroyed ones
in a cache and just change the data the next time they show up? So we did this. We
created a so-called view pool or view cache that stored the old view instances.
Before a view goes into the view pool, you need to destroy its state, so we cleared all the
directives out. And then, when it comes back out of the pool,
we need to create the directives again. Those are the hydrate and dehydrate methods. The DOM nodes we
kept around, because everything is driven from the model, all the state is in the model, so
we could keep them around.

So we did benchmarks again. To understand these
benchmarks: the baseline is a hand-coded JavaScript program. There’s no
framework involved; it’s coded just for this deep-tree benchmark case. It does some
dirty-checking, and everything is hard-coded. Everything is expressed as a ratio to that baseline. Compared to that, Angular 1 is at 5.7. Previously, we were also at that speed. With
these optimised data structures, without the view cache, we were at 2.7, so that is
good. We got two times faster because of the fast
property access, and, with the view cache, we got to 1.0 relative to the baseline, so
that is cool. We had this and we
thought, “Our performance work is done.” Maybe you remember this slide from a previous
conference, actually. So we had this. We built some applications with it. And then we
saw there are problems. The view cache is bad for memory. Your old route stays
in memory because it is cached, right? When do you evict things from the cache? That’s
actually a hard question. We could have said we add some new primitives
so the user can tell us whether to cache a view or not, but
that would be awkward. The other
problem is that DOM elements have hidden state. Focus, for example: even if you
don’t have a binding to focus, removing an element and adding it back can change the focus
of this element or other elements. We didn’t think about this. There were bugs coming
in because of that. One way to go would be to say we need to clean
up the DOM elements thoroughly to remove the state, or
even recreate them. But this would have
destroyed our view cache performance, if we needed to recreate the DOM. In the end,
we were really back at this 2.7. How can we get faster than that?

Third iteration. Now, the idea
was: let’s look at our view class. What do we have? We’ve got the component and these
NgElements, which we already optimised, right? And we’ve got the bindings. But the
view class still has these arrays. Can we do the same thing there? It turns out, we
can. What would this look like? It would look something like this. We
would have a template, as we had before, and for each template we would
generate code that represents our view class. In the constructor, for each element we would
call document.createElement. The first one would be stored in a property
node0; for the second one we would call document.createElement again and store it in
node1. When we need to append an element to its parent,
we have the properties, right? We just need to do it in the right order. Everything is fields. We can use the
property to refer to previously created state. This is what we would do with the DOM. For
directives, we do exactly the same. We just have a property for each instance,
and we just need to be sure we do it in the right
order, so that dependencies come first and the things that use the dependency come
afterwards: we would first new up the ngForm and then the ngModel. For the bindings, we would take the expression and
convert it back into JavaScript; in this case it
would be this.component.user.name. We compare it to the previous value, which is
also just a property, and, if it changed, we update the text node. In the end, we end up with a view class as the
data structure. There are only properties on it. No arrays, no maps, so this is fast property
access everywhere. Okay, this is faster; I will show the numbers in a second. But
the question is, how do we make this? Somebody needs to produce the string
that evaluates to this new class, right? How do we do it? We just apply what we
had in our first implementation to our current implementation. The idea is: wherever before
we created DOM nodes, now we generate code to create DOM nodes. Wherever before
we compared things, we now generate code to compare things. Wherever before we
referred to directive instances or DOM nodes, we now store the property name where
they are stored. Then the code looks like this. We had
our NgElement before; now we have a CompileElement. These classes actually exist right now in
the compiler. There is a CompileElement, a CompileView, and so forth. Previously,
we had a property where the DOM element is stored; now we store the name of that property. Previously, we called
document.createElement; now we produce a string, which is super handy for code
generation, that says this.document.createElement with the element name. Wherever we called appendChild, we now generate a call to
appendChild. The same goes for the directive-dependency
look-up: exactly the same algorithm, just now
generating code. If we look
at the numbers, we’re a lot better now: before, we were at 2.7; now
we are at 1.5. It is not quite two times, but almost two
times faster. The view cache was still
a little bit faster, but we dropped it because of the known problems. Okay, we are
good, right? We could be done. We’re not. So far, we talked about just-in-time
compilation. It means we compile in the browser. To
recap, it works like this: you have some inputs that are on your server. The browser loads them, and then the
compiler kicks in: it parses everything as we saw, we generate the view class source,
and then we need to eval the source code to get an actual class, and then we can new
up this class, and we get a running application. There are some problems with
this. One problem is that we need to parse and generate
code in the browser. This takes some
time, depending on how many components you have. Going character by character is
pretty fast, but it still takes some time, so your page load will not be the
fastest. Second, the compiler actually needs to be in the browser,
so you need to load the whole compiler into your browser, and it got
bigger the more features we added and the more code gen we added. The next problem is eval, which is evil, right? So
the main problem with eval is that you can make mistakes
using it, and because of these mistakes, somebody can hijack
your website. That’s why you
should not use eval, even if you think your code is correct. The other problem is that
browsers now have Content Security Policy as a feature, maybe you know it. With
that, a server can tell a browser, don’t allow eval on this page at all. This would mean a
server can effectively tell a browser, don’t run Angular on this site, which is not so nice.

The other problem we faced is minification. If you want to
minify your code, there are tricks you can do. You can rename variables; that’s
safe because you can determine where they’re used. You can also rename properties on objects. That’s harder, because you need
to know where the objects are actually used. Closure Compiler supports it, and we use
that internally at Google. We use it a lot. If we use this enhanced minification with our
runtime compilation, this happens. We have our components, they load up in the
browser, and the browser analyses the template and generates
the view class. So far, so good. The
view class reads user.name, and the component also has user.name on it, but it was minified with enhanced
optimisation. The component is called maybe C1, and then
my user suddenly is just u and the name is just n. The problem is that the
minifier doesn’t know about my template. The generated code goes to user.name, which doesn’t exist any more on
my component. There are some
workarounds, so you can tell the minifier not to rename this property. But you want to have
it short, and with just-in-time compilation and eval, that’s not going to work.

Our next step in evolution was to introduce
ahead-of-time compilation. You have the inputs again, but with ahead-of-time compilation you generate the view classes on the
server, and the browser picks them up and loads them as regular script, however you
load your JavaScript. It’s good because parsing happens on the
server, so loading is fast. The compiler doesn’t need to be shipped to
the browser; that’s also good. We also don’t use eval any more, right? That’s fine with all the browsers. It’s
also good for enhanced minification: if you run a minifier, it can process the view classes
as well. If it does its work, it will rename the
properties in our view classes too, so we
can use enhanced minification. With that, we get low numbers. Everything is good. We’ve got ahead-of-time compilation and we
are good. As with
everything, there are always downsides. One problem we have with ahead-of-time
compilation is that we need to produce different code. For eval in the browser,
you generate something which uses local variables, or something that gets arguments
that you pass to a function. You don’t use a require statement or the like. But if we
generate code on the server, we want to generate TypeScript code, for two reasons. One
is we wanted to type-check the expressions that you used, whether the properties actually
exist on your component. The other reason is we wanted to leverage
TypeScript for modules: we
wanted to generate imports, and then you don’t need to care what module system you want to
use. It is a lot nicer from a code-generation perspective
to generate imports rather than worrying about each module system, Google Closure’s
and so forth. We wanted to
produce 2016 code as well. How did we do it? Actually, if you know compilers, it is
a common pattern. Instead of producing strings, you produce
a data structure that is similar to your output, and this data structure,
this AST, you can then serialise into different outputs. This AST contains things like declare variable,
invoke method, and so forth, plus the types. For ES5,
we serialise it without the types; for TypeScript, we serialise it with the types. It looks something like this. This is an output AST
that we generate, with just a declare variable inside. We would say declare variable with name
el, and for ES5 you produce this code, var el; in TypeScript, we would also produce the type. Then we can invoke a method. document is a global, so we need to read it. We can invoke a
method on it with the name createElement and these arguments. We pass it a literal
called div, because if you produce strings, your code needs to escape, right? The value
could contain quotes, you need to escape the quotes, and so forth. With that, we
can do all of this. The nice thing now is that we have the same code
gen running in the browser and on the server. No different code paths. That’s super cool. I’m glad we added this. That was
really cool.

The second problem we faced for ahead-of-time
compilation is the metadata. Think about the code here. What is problematic in this code? Let’s say we
have some directive that does some feature detection for cookies, and depending on whether you have
cookies, it does something different. This works in the browser; you can compile it. Perfect. It does not
work with ahead-of-time compilation. If you think about it, why is that? If this
gets down-levelled to ES5, this is what is produced. This decorator, in the end, what it
does is add a property to your constructor that stores this metadata. In the end, it just
adds a statement, SomeDir.annotations. The problem is, if you run this on the server,
it will not work, right? There is no document on the server, so we
have a problem. What do we do? One, we could say let’s build up a browser
environment on the server. Let’s declare a variable document or a variable
window, and so forth. This works for
some cases, but there are always cases that will fail. The other way we can do it,
since hopefully we’re all experts in ASTs by now, is to look at the AST and extract the
metadata from the AST without evaluating the code. The AST of this code could be
represented like on this slide here. Our class SomeDir could have in the AST a
property decorators, where the AST records what
to call, that is, the decorator definition, plus
the arguments. And that’s actually what we do. We extract this metadata into JSON
files, and our ahead-of-time compiler takes these JSON files and generates the code
from there. There are some limitations, obviously. We’re not evaluating the JavaScript code
directly, so you can’t do everything that you used to do in the browser. So our
ahead-of-time compiler restricts what you can put in an annotation, but this will work in all
cases, and that is nice.

Let’s run benchmarks again. This is a simple app that we
wrote, and the load time went down significantly. It is, first, because we don’t parse any more,
and, second, because the Angular 2 compiler isn’t shipped any more. This is about
three times faster, and our size also dropped a lot, from 144 kilobytes minified and gzipped to 49. As a comparison, React right now is at 46 kB
gzipped, for the library itself. This is with
enhanced minification as well. This is why we are able to get this small. Okay, I think
we’ve got time for a little demo. Let’s see. Big enough? You can see that? Let’s say
I have a component, exactly like the one we just saw, an Angular component, and
we’ve got the ngForm, and the ngModel has the dependency on it. Let’s run ngc over
it. Let’s make this bigger. ngc works together with the TypeScript compiler, because
it depends on TypeScript to extract the metadata and so forth,
so let’s run this guy. It’s generated in here. If we look at our generated code, you will
notice that this code here looks very familiar, right? We get the user name, and if it changed, in this
case, we update an input of a directive, so we just compare it to the previous
value and then set it. There is a nice
thing about this, though. Let’s say in my template I change this
name here to wrongName, so our template no longer refers
to user.name. If I look at
the generated code, TypeScript will give me an error, because these views are typed; they
have a generic on them based on the component type. Just by compiling, you will
detect errors in your templates, just because we used TypeScript. There is no extra thing we
needed to do.

The last part: we agreed we want to get better. Our goal is to reach 10
kB minified and gzipped. Then we want to be faster than the baseline. This baseline is not
optimal, and we can do better than the baseline, and those are some things that we will work
on next. The nice thing is that, for you as users, you
don’t need to change your code: because Angular is this declarative, we can make
all of these changes and generate different code, and you won’t even notice it, except
for better numbers.

So, to summarise: we talked
about performance, the different aspects of performance. We talked about the inputs. We talked about parsing, what an AST is, and how you can represent
a template. We
talked about how you can instantiate your template. We learned that
document.createElement is really good for frameworks like Angular. We found that
hidden classes are great tools to optimise your code. We learned that code generation,
if you do it right, that is, not just with eval but also with offline support and
with this output AST thing, is really powerful, and it can help you on this path of
generating these hidden classes. And we talked about just-in-time compilation,
ahead-of-time compilation, and things you can optimise. Last but not least, we had a
little demo. If you’re interested in the slides, that’s
a short link to get to my slides. My
name is Tobias Bosch. Thank you very much. [APPLAUSE].

NAOMI: What happens if you get under 10K? Did Brad promise us something?

TOBIAS: So Brad is our manager. And he’s really good at baking, so he loves
to bake bread, and he’s good at baking cakes. So he promised to bake a cake for us once
we hit 10K.

NAOMI: We have to keep reminding him so that
he doesn’t forget, so you can all help remind him.

TOBIAS: It’s not the only reason, though.

NAOMI: The satisfaction of a job well done
is also important. Thank you. [APPLAUSE]. Let’s have a couple of minutes’ break. We will change over the room. Stretch your legs
and your arms. We will set up for the next speaker. [Break]
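
Editor's sketch of the first-iteration binding described in the talk: a target, a target property, and an expression, checked against the previously stored value, with evaluation errors rethrown with the template's line and column. This is a minimal illustration, not Angular's code. The names Binding, SourceLocation, check, and UserComponent are hypothetical, the expression is modelled as a plain function, and the target is a plain object standing in for a DOM text node.

```typescript
// Hypothetical sketch: these names are illustrative, not Angular's real classes.
interface SourceLocation {
  line: number;
  column: number;
}

class Binding<C> {
  // Last value written to the target; undefined until a first value arrives.
  private previousValue: unknown = undefined;
  private target: Record<string, unknown>;
  private targetProp: string;
  private expression: (context: C) => unknown;
  private source: SourceLocation;

  constructor(
    target: Record<string, unknown>,      // stand-in for a DOM text node
    targetProp: string,                   // e.g. "nodeValue"
    expression: (context: C) => unknown,  // e.g. c => c.user.name
    source: SourceLocation                // where the binding sits in the template
  ) {
    this.target = target;
    this.targetProp = targetProp;
    this.expression = expression;
    this.source = source;
  }

  // Evaluates the expression and writes to the target only when the value changed.
  // Returns true when the target was updated.
  check(context: C): boolean {
    let currentValue: unknown;
    try {
      currentValue = this.expression(context);
    } catch (e) {
      // Rethrow with the template position so the error points at the binding.
      const message = e instanceof Error ? e.message : String(e);
      throw new Error(
        `${message} (template line ${this.source.line}, column ${this.source.column})`
      );
    }
    if (currentValue === this.previousValue) {
      return false;
    }
    this.previousValue = currentValue;
    this.target[this.targetProp] = currentValue;
    return true;
  }
}

interface UserComponent {
  user?: { name: string };
}

const textNode: Record<string, unknown> = { nodeValue: "" };
const nameBinding = new Binding<UserComponent>(
  textNode,
  "nodeValue",
  (c) => c.user!.name,
  { line: 2, column: 14 }
);

nameBinding.check({ user: { name: "Tobias" } }); // writes "Tobias" to textNode.nodeValue
nameBinding.check({ user: { name: "Tobias" } }); // unchanged, so nothing is written
```

Calling check with a component that has no user makes the expression throw, and the rethrown error message ends with the template position, which is the behaviour the talk describes for its runtime error messages.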

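Editor's sketch of the output AST idea from the talk: instead of concatenating strings, the code generator builds a small tree (declare variable, read variable, invoke method, literal) and serialises it once without types for ES5 and once with types for TypeScript. The node shapes, the function names, and the HTMLElement type annotation are assumptions made for illustration; the real compiler's output AST is richer.

```typescript
// Hypothetical node shapes for illustration only.
type OutputExpr =
  | { kind: "readVar"; name: string }
  | { kind: "literal"; value: string }
  | { kind: "invokeMethod"; receiver: OutputExpr; method: string; args: OutputExpr[] };

interface DeclareVarStmt {
  kind: "declareVar";
  name: string;
  type: string; // only emitted by the TypeScript serialiser
  value: OutputExpr;
}

function emitExpr(expr: OutputExpr): string {
  switch (expr.kind) {
    case "readVar":
      return expr.name;
    case "literal":
      // JSON.stringify adds the quotes and escapes embedded quotes and backslashes.
      return JSON.stringify(expr.value);
    case "invokeMethod": {
      const args = expr.args.map((arg) => emitExpr(arg)).join(", ");
      return `${emitExpr(expr.receiver)}.${expr.method}(${args})`;
    }
  }
}

// ES5 output: same statement, types dropped.
function emitEs5(stmt: DeclareVarStmt): string {
  return `var ${stmt.name} = ${emitExpr(stmt.value)};`;
}

// TypeScript output: same statement, with the type annotation.
function emitTypeScript(stmt: DeclareVarStmt): string {
  return `var ${stmt.name}: ${stmt.type} = ${emitExpr(stmt.value)};`;
}

const createDiv: DeclareVarStmt = {
  kind: "declareVar",
  name: "el",
  type: "HTMLElement",
  value: {
    kind: "invokeMethod",
    receiver: { kind: "readVar", name: "document" },
    method: "createElement",
    args: [{ kind: "literal", value: "div" }],
  },
};

const es5Source = emitEs5(createDiv);       // var el = document.createElement("div");
const tsSource = emitTypeScript(createDiv); // var el: HTMLElement = document.createElement("div");
```

Keeping the literal as a node rather than a pre-quoted string is what lets the serialiser handle escaping in one place, which is the reason the talk gives for passing a literal instead of a raw string.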