Template Parser
In the previous section, we looked in detail at the AST, which is the result of parsing.
Now, let's explore the parser that actually generates that AST.
The parser is divided into two steps: `parse` and `tokenize`. We start with `tokenize`.
Tokenize
`tokenize` is the lexical analysis step. Lexical analysis is the process of breaking code, which is just a string, into units called tokens (lexemes). Tokens are meaningful chunks of the string. Let's look at the actual source code to see what they are.
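To build intuition first, here is a rough illustration of what "tokens" mean here. This is purely a hypothetical sketch using a naive regex split, not Vue's actual API; the real tokenizer emits these chunks one character at a time via a state machine.

```ts
// A template breaks into meaningful chunks such as tag-open, text,
// interpolation, and tag-close.
const template = '<p>hello {{ msg }}</p>'

// Naive regex-based split, purely to visualize the chunks a real
// tokenizer would produce.
const tokens = template
  .split(/({{.*?}}|<\/?[a-z]+>)/)
  .filter(s => s.length > 0)

console.log(tokens) // [ '<p>', 'hello ', '{{ msg }}', '</p>' ]
```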
```ts
export enum State {
  Text = 1,

  // interpolation
  InterpolationOpen,
  Interpolation,
  InterpolationClose,

  // Tags
  BeforeTagName, // After <
  InTagName,
  InSelfClosingTag,
  BeforeClosingTagName,
  InClosingTagName,
  AfterClosingTagName,

  // Attrs
  BeforeAttrName,
  InAttrName,
  InDirName,
  InDirArg,
  InDirDynamicArg,
  InDirModifier,
  AfterAttrName,
  BeforeAttrValue,
  InAttrValueDq, // "
  InAttrValueSq, // '
  InAttrValueNq,

  // Declarations
  BeforeDeclaration, // !
  InDeclaration,

  // Processing instructions
  InProcessingInstruction, // ?

  // Comments & CDATA
  BeforeComment,
  CDATASequence,
  InSpecialComment,
  InCommentLike,

  // Special tags
  BeforeSpecialS, // Decide if we deal with `<script` or `<style`
  BeforeSpecialT, // Decide if we deal with `<title` or `<textarea`
  SpecialStartSequence,
  InRCDATA,

  InEntity,

  InSFCRootTagName,
}
```
Tips
Actually, this tokenizer is a forked implementation of a library called htmlparser2. htmlparser2 is known as one of the fastest HTML parsers, and Vue.js significantly improved parsing performance from v3.4 onward by adopting it.
This is also mentioned in the source code:
```ts
/**
 * This Tokenizer is adapted from htmlparser2 under the MIT License listed at
 * https://github.com/fb55/htmlparser2/blob/master/LICENSE
 */
```
Tokens are represented as `State` in the source code. The tokenizer has a single internal state called `state`, which is one of the states defined in the `State` enum. Looking at it concretely, there is the default state `Text`, a state for `{{` (the start of an interpolation), one for `}}` (the end of an interpolation), one for the content in between, one for `<` (the start of a tag), one for `>` (the end of a tag), and so on.
As you can see around here:
```ts
export enum CharCodes {
  Tab = 0x9, // "\t"
  NewLine = 0xa, // "\n"
  FormFeed = 0xc, // "\f"
  CarriageReturn = 0xd, // "\r"
  Space = 0x20, // " "
  ExclamationMark = 0x21, // "!"
  Number = 0x23, // "#"
  Amp = 0x26, // "&"
  SingleQuote = 0x27, // "'"
  DoubleQuote = 0x22, // '"'
  GraveAccent = 96, // "`"
  Dash = 0x2d, // "-"
  Slash = 0x2f, // "/"
  Zero = 0x30, // "0"
  Nine = 0x39, // "9"
  Semi = 0x3b, // ";"
  Lt = 0x3c, // "<"
  Eq = 0x3d, // "="
  Gt = 0x3e, // ">"
  Questionmark = 0x3f, // "?"
  UpperA = 0x41, // "A"
  LowerA = 0x61, // "a"
  UpperF = 0x46, // "F"
  LowerF = 0x66, // "f"
  UpperZ = 0x5a, // "Z"
  LowerZ = 0x7a, // "z"
  LowerX = 0x78, // "x"
  LowerV = 0x76, // "v"
  Dot = 0x2e, // "."
  Colon = 0x3a, // ":"
  At = 0x40, // "@"
  LeftSquare = 91, // "["
  RightSquare = 93, // "]"
}

const defaultDelimitersOpen = new Uint8Array([123, 123]) // "{{"
const defaultDelimitersClose = new Uint8Array([125, 125]) // "}}"
```
The characters to be parsed are handled as character codes (plain numbers), and the default delimiters are pre-encoded as `Uint8Array`, to improve performance. (Comparing numbers avoids allocating and comparing substrings, so it is faster.)
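To illustrate the idea, here is a hypothetical sketch (not Vue's actual code) of scanning with character codes instead of string comparisons:

```ts
// Mirrors the CharCodes idea: compare numbers, not strings.
const Lt = 0x3c // "<"
const input = 'a<b'

// charCodeAt returns a number, so each comparison is a single integer
// check, with no intermediate one-character string as in input[i] === '<'.
let ltIndex = -1
for (let i = 0; i < input.length; i++) {
  if (input.charCodeAt(i) === Lt) {
    ltIndex = i
    break
  }
}

console.log(ltIndex) // 1
```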
Since it's a forked implementation of `htmlparser2`, it's debatable whether this counts as reading Vue.js source code, but let's actually read a bit of the Tokenizer's implementation. Below is where it starts:
```ts
export default class Tokenizer {
```
As you can see from the constructor, callbacks for each token are defined to achieve "tokenize -> parse". (In `parser.ts`, which we'll look at shortly, template parsing is realized by defining these callbacks.)
```ts
  constructor(
    private readonly stack: ElementNode[],
    private readonly cbs: Callbacks,
  ) {
```
```ts
export interface Callbacks {
  ontext(start: number, endIndex: number): void
  ontextentity(char: string, start: number, endIndex: number): void

  oninterpolation(start: number, endIndex: number): void

  onopentagname(start: number, endIndex: number): void
  onopentagend(endIndex: number): void
  onselfclosingtag(endIndex: number): void
  onclosetag(start: number, endIndex: number): void

  onattribdata(start: number, endIndex: number): void
  onattribentity(char: string, start: number, end: number): void
  onattribend(quote: QuoteType, endIndex: number): void
  onattribname(start: number, endIndex: number): void
  onattribnameend(endIndex: number): void

  ondirname(start: number, endIndex: number): void
  ondirarg(start: number, endIndex: number): void
  ondirmodifier(start: number, endIndex: number): void

  oncomment(start: number, endIndex: number): void
  oncdata(start: number, endIndex: number): void

  onprocessinginstruction(start: number, endIndex: number): void
  // ondeclaration(start: number, endIndex: number): void
  onend(): void
  onerr(code: ErrorCodes, index: number): void
}
```
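The "tokenize -> parse" pattern can be sketched with a minimal, hypothetical tokenizer of our own (the types and scanning logic below are simplifications, not Vue's actual implementation):

```ts
// Minimal sketch of the callback-driven tokenizer pattern.
interface MiniCallbacks {
  ontext(start: number, end: number): void
  onopentagname(start: number, end: number): void
}

class MiniTokenizer {
  constructor(private readonly cbs: MiniCallbacks) {}

  parse(input: string): void {
    // Trivial scan: treat what follows "<" as a tag name,
    // everything else as text.
    let i = 0
    while (i < input.length) {
      if (input[i] === '<') {
        const start = i + 1
        let end = start
        while (end < input.length && input[end] !== '>') end++
        this.cbs.onopentagname(start, end)
        i = end + 1
      } else {
        const start = i
        while (i < input.length && input[i] !== '<') i++
        this.cbs.ontext(start, i)
      }
    }
  }
}

// The consumer passes only indices back into the source string,
// just like the real Callbacks interface.
const events: string[] = []
const input = 'hi<p'
new MiniTokenizer({
  ontext: (s, e) => events.push(`text:${input.slice(s, e)}`),
  onopentagname: (s, e) => events.push(`tag:${input.slice(s, e)}`),
}).parse(input)

console.log(events) // [ 'text:hi', 'tag:p' ]
```

Note that the callbacks receive only `start`/`end` indices; the caller slices the source itself, which avoids allocating substrings inside the tokenizer.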
Then, the `parse` method is the entry point:
```ts
  /**
   * Iterates through the buffer, calling the function corresponding to the current state.
   *
   * States that are more likely to be hit are higher up, as a performance improvement.
   */
  public parse(input: string): void {
```
It reads (stores) the source into the buffer and processes it one character at a time.
```ts
    this.buffer = input
    while (this.index < this.buffer.length) {
```
It executes callbacks depending on the current state. The initial value is `State.Text`, so processing starts there.
```ts
      switch (this.state) {
        case State.Text: {
          this.stateText(c)
          break
        }
        case State.InterpolationOpen: {
          this.stateInterpolationOpen(c)
          break
        }
```
For example, if the `state` is `Text` and the current character is `<`, it executes the `ontext` callback and updates `state` to `State.BeforeTagName`.
```ts
  private stateText(c: number): void {
    if (c === CharCodes.Lt) {
      if (this.index > this.sectionStart) {
        this.cbs.ontext(this.sectionStart, this.index)
      }
      this.state = State.BeforeTagName
      this.sectionStart = this.index
```
In this way, it reads characters in a given state and transitions between states based on the character type, proceeding one step at a time. Essentially, this process just repeats. Since the implementations for the other states and characters are extensive, we'll omit them here. (There's a lot of code, but it all does the same kind of thing.)
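The overall loop can be condensed into a self-contained sketch. This is an assumed simplification of `Tokenizer.parse` and `stateText`, modeling only the `Text` and `BeforeTagName` states, not the real implementation:

```ts
// Condensed state-machine loop in the style of the tokenizer.
enum MiniState { Text = 1, BeforeTagName }

const Lt = 0x3c // "<"

function scan(input: string): { texts: string[]; tagStarts: number[] } {
  let state = MiniState.Text
  let sectionStart = 0
  const texts: string[] = []
  const tagStarts: number[] = []

  for (let index = 0; index < input.length; index++) {
    const c = input.charCodeAt(index)
    switch (state) {
      case MiniState.Text:
        if (c === Lt) {
          // Emit the pending text section, then transition,
          // just as stateText does.
          if (index > sectionStart) texts.push(input.slice(sectionStart, index))
          state = MiniState.BeforeTagName
          sectionStart = index
        }
        break
      case MiniState.BeforeTagName:
        // Record where the tag name starts.
        tagStarts.push(index)
        state = MiniState.Text // (the real tokenizer goes to InTagName, etc.)
        sectionStart = index
        break
    }
  }
  return { texts, tagStarts }
}

console.log(scan('ab<p')) // { texts: [ 'ab' ], tagStarts: [ 3 ] }
```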
Parse
Now that we have a general understanding of the tokenizer's implementation, let's move on to `parse`. This is implemented in `parser.ts`.
packages/compiler-core/src/parser.ts
Here, the `Tokenizer` we just discussed is used:
```ts
const tokenizer = new Tokenizer(stack, {
```
Callbacks are registered for each token to build the template's AST. Let's look at one example: the `oninterpolation` callback. As the name suggests, this handles the `Interpolation` node.
```ts
  oninterpolation(start, end) {
```
Using the lengths of the delimiters (by default `{{` and `}}`) and the indices passed in, it calculates the indices of the interpolation's inner content.
```ts
    let innerStart = start + tokenizer.delimiterOpen.length
    let innerEnd = end - tokenizer.delimiterClose.length
```
Based on those indices, it retrieves the inner content:
```ts
    let exp = getSlice(innerStart, innerEnd)
```
Finally, it generates a Node:
```ts
    addNode({
      type: NodeTypes.INTERPOLATION,
      content: createExp(exp, false, getLoc(innerStart, innerEnd)),
      loc: getLoc(start, end),
    })
```
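The index arithmetic is simple enough to verify by hand. Here is a hypothetical helper (the function name and hard-coded indices are ours, not Vue's) showing the same computation with the default `{{` / `}}` delimiters:

```ts
const delimiterOpen = '{{'
const delimiterClose = '}}'

// Skip past the opening delimiter and stop before the closing one.
function interpolationInner(source: string, start: number, end: number) {
  const innerStart = start + delimiterOpen.length
  const innerEnd = end - delimiterClose.length
  return source.slice(innerStart, innerEnd).trim()
}

const source = '<p>{{ msg }}</p>'
// The tokenizer would report start=3 (at "{{") and end=12 (just past "}}").
const exp = interpolationInner(source, 3, 12)
console.log(exp) // "msg"
```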
`addNode` is a function that pushes the node into the children of the element at the top of the stack if one exists, or into the root's children otherwise.
```ts
function addNode(node: TemplateChildNode) {
  ;(stack[0] || currentRoot).children.push(node)
}
```
The `stack` is a stack onto which elements are pushed as they nest. While we're at it, let's look at that process as well. When an open tag is finished (for example, at the `>` of `<p>`), the current tag is `unshift`ed onto the stack:
```ts
function endOpenTag(end: number) {
  if (tokenizer.inSFCRoot) {
    // in SFC mode, generate locations for root-level tags' inner content.
    currentOpenTag!.innerLoc = getLoc(end + 1, end + 1)
  }
  addNode(currentOpenTag!)
  const { tag, ns } = currentOpenTag!
  if (ns === Namespaces.HTML && currentOptions.isPreTag(tag)) {
    inPre++
  }
  if (currentOptions.isVoidTag(tag)) {
    onCloseTag(currentOpenTag!, end)
  } else {
    stack.unshift(currentOpenTag!)
    if (ns === Namespaces.SVG || ns === Namespaces.MATH_ML) {
      tokenizer.inXML = true
    }
  }
  currentOpenTag = null
}
```
Then, in `onclosetag`, it `shift`s the stack:
```ts
  onclosetag(start, end) {
    const name = getSlice(start, end)
    if (!currentOptions.isVoidTag(name)) {
      let found = false
      for (let i = 0; i < stack.length; i++) {
        const e = stack[i]
        if (e.tag.toLowerCase() === name.toLowerCase()) {
          found = true
          if (i > 0) {
            emitError(ErrorCodes.X_MISSING_END_TAG, stack[0].loc.start.offset)
          }
          for (let j = 0; j <= i; j++) {
            const el = stack.shift()!
```
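The stack-based tree building can be sketched on its own. The node shape and helper functions below are hypothetical simplifications; only the `unshift`-on-open / `shift`-on-close idea mirrors the real parser:

```ts
interface MiniNode { tag: string; children: MiniNode[] }

const root: MiniNode = { tag: 'root', children: [] }
const stack: MiniNode[] = []

function addNode(node: MiniNode) {
  // Same idea as Vue's addNode: top of stack if nesting, else the root.
  ;(stack[0] || root).children.push(node)
}

function openTag(tag: string) {
  const node: MiniNode = { tag, children: [] }
  addNode(node)
  stack.unshift(node) // this node becomes the current parent
}

function closeTag() {
  stack.shift() // back to the enclosing parent
}

// Simulates parsing <div><p></p></div>
openTag('div')
openTag('p')
closeTag()
closeTag()

console.log(root.children[0].tag) // "div"
console.log(root.children[0].children[0].tag) // "p"
```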
In this way, the AST is constructed by making full use of the `Tokenizer` callbacks. Although the amount of implementation is large, it's essentially just these processes carried out steadily.