Bottom Up Parsing – Computerphile


Last time you [Sean] did a really good
animation about top-down parsing. We had these sentences in this totally
artificial grammar, which has only got about 30 words in it, all in al. But we
started here withisand, in the top-down approach, you
basically say: “OK I’ll push those on the stack in reverse order:” And, starting at the top, of the stack, at the left, you say: “Well, what can
abe? So, you do these things on your stack, top down, one at a time, and
you start off at the root of the tree and you develop the component parts, left
to right, one at a time. It is a lot easier, totally by hand, to write a
top-down parser rather than a bottom-up one. But what I shall be getting to, later on
and in fact I’m starting right now, is to say: “Well what happens if instead of
starting up at the root and developing the leaves of the tree, as it were, you
start right down at the bottom, with the text string that you know is correct, and
try and work upwards you know can you work back upwards, from the leaves
to the root?” So, in fact let’s write that test sentence
below here: the robot stroked two furry …” And actually, since layout doesn’t matter
too much at the moment I’ll squeeze the word ‘dice’ in at the right there.
Now, be clear, we’re talking about bottom-up parsing. Now, in bottom-up
parsing you start with the string that you think is correct and you say:
“Starting with a string can I look into the rules and see how to work up the tree,
not down the tree?” And, yes, so therefore you’re looking at possible matches on the
right-hand side [of the rules] for components of this string, reading from left to right.
OK ‘the’. How many ways are there you can match the string ‘the’ against one
of these classifications here? Well the text – the string ‘the’ – is the right-hand
side possibility of anicle. That’s one way to do it. Oh! and then look up
here, right at the top of the grammar, if you have anicle followed by a,
like ‘the’ followed by something else, that could be aect, and that’s looking
good because right up at the top of the grammar we want to end up withect
looking just at “the robot”, and looking at the grammar right hand
sides, I could do it by saying: “Well it’s the subject of the sentence, it’s at the
left-hand side, and if I goicleI get ‘the’ and I get ‘robot’, but what
I’ve done, just to act as a talking point and it illustrates a lot of things here,
I’ve given you a shortcut. If you want to, you can just do “the robot”, with no
further interior analysis at all. It’s an allowed phrase; it’s theect. Now I
have to say that as we develop this story we will get into bottom-up parsing
because one of the tools were going to use called ‘yacc’ basically produces
bottom-up parses for you, not top-down ones. nd it’s a yacc behavior symptom that it loves, when you’re trying to match text strings, it likes to match the longest one that
it can [legally] manage. So it is going to seize on “the robot”, all as one phrase, as being a
wonderful long solution. Why doesn’t it use ‘the’ and then wait patiently
for? That’s not the bottom up way. If you can see a longer … Oh! and this thing
by the way, that you’re looking at, is built upon a stack, of course, it’s called
the ‘handle’. I get a longer handle by going for this option here and getting
“the robot”, all in one go. OK “the robot” then and you’ve got to get used to
reading from right to left now, in bottom-up parsing, “the robot” all as one
phrase is an example of aect. So, we can now say: “OK, ‘the robot’ is myect”.
Now that act – of picking up a substring from your sentence and going upwards, and
making it more abstract if you like – it’s called ‘reducing’. in bottom-up parsing.
So, looking for a longer and longer and longer string to get your handle that’s
called ‘shifting’, because you’re shifting characters, one after another, and making
the string longer and longer and saying “… can I go any further?” That’s shifting but
when you say: “Ooh! that’s a nice long string – and it matches” and then you go
up and say: “Oh! that’s my subject”, that is called ‘reduction’ because you’re going to
something simpler further up the tree. So, you can tick that off as being done
bottom up next thing is you see this string of characters called ‘stroked’ and,
once again, it’s right-hand side driven. What is there, on the right hand side,
and which rule is it that could possibly match ‘stroked’? You see in here, against
, ,or. Those three strings are your possibilities. So, that’s
fine. Going right to left you say ‘stroked’ is an example of a.
So we’ve got ourthere. Now,again, I’ve cheated but it’s wonderful fun! I have
not analyzed “two furry dice” into adjectives and nouns or anything like that.
I’ve just put it in as a interesting short-cut to have there. And it is an
example of what I would call an object-phrase Some of you, who are really good
English linguists, may want to go on about my lack of understanding about what a
direct and indirect object are – not to mention ‘predicates’ and so on. But please
forgive me. I regard it as being a phrase in an object position so I’m saying
there’s a quick match here and bottom-up I love this: “two furry dice” is a great
long handle. Oh! and if I’ve match it there, what’s the left-hand side it corresponds to?. OK then, we’ve won! We have worked
bottom-up to havingectect on our stack starting with the string.
And, what’s more, we’ve exhausted the string now. It’s the end of it. There’s a
sort of full stop after that. There we are then.
We’ve got top-down which tends to be more – how shall we say? Eager?- you know a top-down
parse would very probably leap on the word ‘the’, and not bother to go any
further because it’s found a quick match whereas bottom-up is the other way round.
It’s basically saying: “I want the longest possible handle”. Even at this stage in
the late 50s and early 60s there was a sneaking suspicion coming around
that actually bottom-up parsing was a little bit more powerful than top-down.
I’m going to put out a set of notes for this so that you can look up for yourself.
Just examples of why it [i.e. bottom up] is more powerful. But roughly speaking I think
you can sense that because you’ve not only got something you’re looking for
but you’ve [also] got a handle that you’ve already accumulated, it’s like gathering
more contextual information – going bottom-up. But, on the other hand, handling
the stack and working out what’s happening is a darn sight more complicated – if
you do it by hand – coming bottom up. Rather than doing it all by hand, why not
me and you lot [together]. It’s a good way to learn ‘lex’ and ‘yacc’ In other words
don’t write the C directly yourself get a software tool, like these two,
to do it for you. So, that’s exactly what we’re going to do. I’ve got the program
‘putty’ hat does ssh connected here. I’m talking to my other Linux machine in the
other room where I have got set up a parser – complete parser: front-end lexical
analyzer [then] syntax analysis – for this ‘furry grammar’ and all legal sentences in it.
And I know, first of all, you will want me to call up the program that implements
this and the test sentence first of all is: “the robot stroked two furry dice”. Here we go! So, “furry”. It’s hanging there, it’s waiting for us to give
a correct furry sentence [reads from screen] “… dice … return” Look at that! It’s happy with it!
I’m giving it in subject-verb-object order and I have numbered those rules, in
the grammar as I showed you earlier, and I now have as it were a map of how it
has parsed it. Rule 3? now that is the one that effectively
says I can do “the robot” all as one phrase. It has chosen not to go for ‘the’
and ‘robot’ asicle and, as separate entities. It might well have done that,
had I gone top-down, but because this yacc-confected parser system goes bottom-up
it’s gone for the longest possible handle at that stage and it’s matched it.
Rule 4: the middle piece is matched ‘stroked’ as the verb and finally it has
spotted, right at the very end, that I put in another sneaky short-cut to “two furry
dice” is Rule 6. ASnd that is my parse. So, should we try one more just to make sure?
Go on Sean, tell me what you want to try next.>>SEan: Try “the woman bit the dog”
>>DFB: yep, “the woman bit the dog” There you are look – Rule 2 for “the woman” now. Rule 2
not rule 3. if it has followed Rule 2 it’s gone down theicleroute, which
means it knows that’s the only way to match “the woman”. There is no
shortcut way. OK? Rule 4 – arule again; you chose ‘bit’
Rule 5: now this time, again, there is no short-cut to “two furry dice” at Rule 6
You’ve got to go the long way around and following Rule 5, you break it down
intoicleagain: ‘the’ and ‘dog’ So, there we are. You could say
well you’ve written a compiler for the ‘furry’ language, with the help of lex and
yacc. We could go into details of that later, if need be, but not now. It’s fair enough
but it’s not doing anything really is it? What more shall we do with this,
now we’ve written it this far then Sean? You tell me ?!
>>Sean: Well I think next time we need
to come up with an action, it needs to do something ….>>DFB: We need to transform that
grammar in some way. Those of you who, in the previous video,
actually bothered to look at the EXTRA BITS may have had a sneak preview as to
what we’re going to do, as our much more interesting actions now we have
recognized the innate structure. So, remember – always watch the EXTRA BITS !

Leave a Reply

Your email address will not be published. Required fields are marked *