Template Parser

In the previous section, we looked in detail at the AST, which is the result of parsing.

Now, let's explore the parser that actually generates that AST.

The parser is divided into two phases: tokenize and parse.

We start with tokenize.

Tokenize

tokenize is the lexical analysis step.

Lexical analysis is the process of analyzing code, which is just a string, into units called tokens (lexemes).

Tokens are meaningful chunks of strings. Let's look at the actual source code to see what they are.

```ts
export enum State {
  Text = 1,

  // interpolation
  InterpolationOpen,
  Interpolation,
  InterpolationClose,

  // Tags
  BeforeTagName, // After <
  InTagName,
  InSelfClosingTag,
  BeforeClosingTagName,
  InClosingTagName,
  AfterClosingTagName,

  // Attrs
  BeforeAttrName,
  InAttrName,
  InDirName,
  InDirArg,
  InDirDynamicArg,
  InDirModifier,
  AfterAttrName,
  BeforeAttrValue,
  InAttrValueDq, // "
  InAttrValueSq, // '
  InAttrValueNq,

  // Declarations
  BeforeDeclaration, // !
  InDeclaration,

  // Processing instructions
  InProcessingInstruction, // ?

  // Comments & CDATA
  BeforeComment,
  CDATASequence,
  InSpecialComment,
  InCommentLike,

  // Special tags
  BeforeSpecialS, // Decide if we deal with `<script` or `<style`
  BeforeSpecialT, // Decide if we deal with `<title` or `<textarea`
  SpecialStartSequence,
  InRCDATA,

  InEntity,

  InSFCRootTagName,
}
```

Tips

Actually, this tokenizer is a forked implementation of a library called htmlparser2.

htmlparser2 is known as one of the fastest HTML parsers, and by adopting it Vue.js significantly improved parsing performance from v3.4 onward.

This is also mentioned in the source code:

```ts
/**
 * This Tokenizer is adapted from htmlparser2 under the MIT License listed at
 * https://github.com/fb55/htmlparser2/blob/master/LICENSE
 */
```

Tokens are represented as State in the source code.

The tokenizer has a single internal state called state, which is one of the states defined in the State enum.

Looking at it concretely: there is the default state Text; {{ marks the start of an interpolation (InterpolationOpen), }} marks its end (InterpolationClose), with Interpolation for the content in between; < marks the start of a tag, > the end of a tag; and so on.
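To make the state-machine idea concrete, here is a minimal sketch in the same spirit (hypothetical names, not Vue's actual code): one state enum, one index moving over the input, and a switch that transitions between states. It only handles Text and a `{{ ... }}` interpolation, and a lone `{` is simply dropped for brevity.

```ts
enum MiniState {
  Text,
  InterpolationOpen, // saw the first "{"
  Interpolation, // inside "{{ ... }}"
  InterpolationClose, // saw the first "}"
}

type MiniToken = { type: 'text' | 'interpolation'; content: string }

function miniTokenize(input: string): MiniToken[] {
  const tokens: MiniToken[] = []
  let state = MiniState.Text
  let sectionStart = 0
  for (let i = 0; i < input.length; i++) {
    const c = input[i]
    switch (state) {
      case MiniState.Text:
        if (c === '{') {
          // flush any accumulated text before entering the delimiter
          if (i > sectionStart) {
            tokens.push({ type: 'text', content: input.slice(sectionStart, i) })
          }
          state = MiniState.InterpolationOpen
        }
        break
      case MiniState.InterpolationOpen:
        if (c === '{') {
          state = MiniState.Interpolation
          sectionStart = i + 1
        } else {
          state = MiniState.Text // not a real delimiter (lone "{" is dropped in this sketch)
        }
        break
      case MiniState.Interpolation:
        if (c === '}') state = MiniState.InterpolationClose
        break
      case MiniState.InterpolationClose:
        if (c === '}') {
          tokens.push({
            type: 'interpolation',
            content: input.slice(sectionStart, i - 1).trim(),
          })
          state = MiniState.Text
          sectionStart = i + 1
        } else {
          state = MiniState.Interpolation
        }
        break
    }
  }
  // flush trailing text
  if (state === MiniState.Text && input.length > sectionStart) {
    tokens.push({ type: 'text', content: input.slice(sectionStart) })
  }
  return tokens
}
```

Running `miniTokenize("Hello, {{ name }}!")` yields a text token, an interpolation token, and another text token, which is exactly the shape of work the real tokenizer performs, character by character, state by state.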

As you can see in the following excerpt:

```ts
export enum CharCodes {
  Tab = 0x9, // "\t"
  NewLine = 0xa, // "\n"
  FormFeed = 0xc, // "\f"
  CarriageReturn = 0xd, // "\r"
  Space = 0x20, // " "
  ExclamationMark = 0x21, // "!"
  Number = 0x23, // "#"
  Amp = 0x26, // "&"
  SingleQuote = 0x27, // "'"
  DoubleQuote = 0x22, // '"'
  GraveAccent = 96, // "`"
  Dash = 0x2d, // "-"
  Slash = 0x2f, // "/"
  Zero = 0x30, // "0"
  Nine = 0x39, // "9"
  Semi = 0x3b, // ";"
  Lt = 0x3c, // "<"
  Eq = 0x3d, // "="
  Gt = 0x3e, // ">"
  Questionmark = 0x3f, // "?"
  UpperA = 0x41, // "A"
  LowerA = 0x61, // "a"
  UpperF = 0x46, // "F"
  LowerF = 0x66, // "f"
  UpperZ = 0x5a, // "Z"
  LowerZ = 0x7a, // "z"
  LowerX = 0x78, // "x"
  LowerV = 0x76, // "v"
  Dot = 0x2e, // "."
  Colon = 0x3a, // ":"
  At = 0x40, // "@"
  LeftSquare = 91, // "["
  RightSquare = 93, // "]"
}

const defaultDelimitersOpen = new Uint8Array([123, 123]) // "{{"
const defaultDelimitersClose = new Uint8Array([125, 125]) // "}}"
```

the characters to be compared are represented as numeric character codes (and the delimiters as a Uint8Array) to improve performance: comparing numbers avoids allocating one-character strings and makes each check in the hot loop a cheap integer comparison.
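A tiny illustration of that trick (my own example, not Vue code): `charCodeAt` returns a number, so the loop body never creates intermediate strings while scanning.

```ts
const Lt = 0x3c // "<", same value as CharCodes.Lt

// Count how many "<" characters appear, comparing character codes only.
function countLtByCharCode(input: string): number {
  let count = 0
  for (let i = 0; i < input.length; i++) {
    // an integer comparison, no string allocation per character
    if (input.charCodeAt(i) === Lt) count++
  }
  return count
}
```

The equivalent `input[i] === "<"` would work too, but materializes a string per character, which matters in a loop that runs once per character of every template.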

Since it's a forked implementation of htmlparser2, it's debatable whether this counts as reading Vue.js source code, but let's actually read a bit of the Tokenizer's implementation.

Below is where the Tokenizer's implementation starts:

```ts
export default class Tokenizer {
```

As you can see from the constructor, the tokenizer receives a set of callbacks, one for each kind of token; this is how "tokenize -> parse" is wired together.

(In the upcoming parser.ts, the parsing of templates is realized by defining these callbacks.)

```ts
  constructor(
    private readonly stack: ElementNode[],
    private readonly cbs: Callbacks,
  ) {
```

The Callbacks interface is defined as follows:

```ts
export interface Callbacks {
  ontext(start: number, endIndex: number): void
  ontextentity(char: string, start: number, endIndex: number): void

  oninterpolation(start: number, endIndex: number): void

  onopentagname(start: number, endIndex: number): void
  onopentagend(endIndex: number): void
  onselfclosingtag(endIndex: number): void
  onclosetag(start: number, endIndex: number): void

  onattribdata(start: number, endIndex: number): void
  onattribentity(char: string, start: number, end: number): void
  onattribend(quote: QuoteType, endIndex: number): void
  onattribname(start: number, endIndex: number): void
  onattribnameend(endIndex: number): void

  ondirname(start: number, endIndex: number): void
  ondirarg(start: number, endIndex: number): void
  ondirmodifier(start: number, endIndex: number): void

  oncomment(start: number, endIndex: number): void
  oncdata(start: number, endIndex: number): void

  onprocessinginstruction(start: number, endIndex: number): void
  // ondeclaration(start: number, endIndex: number): void
  onend(): void
  onerr(code: ErrorCodes, index: number): void
}
```

Then, the parse method is the initial function:

```ts
  /**
   * Iterates through the buffer, calling the function corresponding to the current state.
   *
   * States that are more likely to be hit are higher up, as a performance improvement.
   */
  public parse(input: string): void {
```

It reads (stores) the source into the buffer and processes it one character at a time.

```ts
    this.buffer = input
    while (this.index < this.buffer.length) {
```

It executes callbacks in specific states.

The initial value is State.Text, so it starts there.

```ts
      switch (this.state) {
        case State.Text: {
          this.stateText(c)
          break
        }
        case State.InterpolationOpen: {
          this.stateInterpolationOpen(c)
          break
        }
```

For example, when the state is Text and the current character is <, it fires the ontext callback (for any text accumulated so far) and updates state to State.BeforeTagName.

```ts
  private stateText(c: number): void {
    if (c === CharCodes.Lt) {
      if (this.index > this.sectionStart) {
        this.cbs.ontext(this.sectionStart, this.index)
      }
      this.state = State.BeforeTagName
      this.sectionStart = this.index
```

In this way, it reads characters in specific states and transitions states based on the character type, proceeding step by step.

Basically, it's a repetition of this process.

Due to the large amount of implementation for other states and characters, we'll omit them.

(There's a lot, but they're doing the same thing.)

Parse

Now that we have a general understanding of the tokenizer's implementation, let's move on to parse.

This is implemented in parser.ts.

packages/compiler-core/src/parser.ts

Here, the Tokenizer we just discussed is used:

```ts
const tokenizer = new Tokenizer(stack, {
```

Callbacks are registered for each token to build the template's AST.

Let's look at one example.

Please focus on the oninterpolation callback.

As the name suggests, this is processing related to the Interpolation Node.

```ts
  oninterpolation(start, end) {
```

Using the length of the delimiters (default is {{ and }}) and the passed indices, it calculates the indices of the inner content of the Interpolation.

```ts
    let innerStart = start + tokenizer.delimiterOpen.length
    let innerEnd = end - tokenizer.delimiterClose.length
```

Based on those indices, it retrieves the inner content:

```ts
    let exp = getSlice(innerStart, innerEnd)
```

Finally, it generates a Node:

```ts
    addNode({
      type: NodeTypes.INTERPOLATION,
      content: createExp(exp, false, getLoc(innerStart, innerEnd)),
      loc: getLoc(start, end),
    })
```
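Here is that index arithmetic worked through as a standalone sketch (my own `innerRange` helper, with the default `{{` / `}}` delimiters; the real code walks past the surrounding whitespace instead of trimming):

```ts
const delimiterOpen = '{{'
const delimiterClose = '}}'

// Given the outer range [start, end) of "{{ expr }}", compute the inner range.
function innerRange(src: string, start: number, end: number) {
  const innerStart = start + delimiterOpen.length
  const innerEnd = end - delimiterClose.length
  return { innerStart, innerEnd, exp: src.slice(innerStart, innerEnd).trim() }
}
```

For the source `"{{ msg }}"`, the tokenizer reports start = 0 and end = 9; stepping past the two-character delimiters gives the inner range [2, 7), whose slice is `" msg "`, i.e. the expression `msg`.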

addNode is a function that pushes the Node into the existing stack if there is one, or into the root's children if not.

```ts
function addNode(node: TemplateChildNode) {
  ;(stack[0] || currentRoot).children.push(node)
}
```

The stack holds the currently open elements; they are pushed onto it as they nest.

Since we're here, let's look at that process as well.

When an open tag is finished (for example, for <p>, at the moment the > is read), the current tag is unshifted onto the stack:

```ts
function endOpenTag(end: number) {
  if (tokenizer.inSFCRoot) {
    // in SFC mode, generate locations for root-level tags' inner content.
    currentOpenTag!.innerLoc = getLoc(end + 1, end + 1)
  }
  addNode(currentOpenTag!)
  const { tag, ns } = currentOpenTag!
  if (ns === Namespaces.HTML && currentOptions.isPreTag(tag)) {
    inPre++
  }
  if (currentOptions.isVoidTag(tag)) {
    onCloseTag(currentOpenTag!, end)
  } else {
    stack.unshift(currentOpenTag!)
    if (ns === Namespaces.SVG || ns === Namespaces.MATH_ML) {
      tokenizer.inXML = true
    }
  }
  currentOpenTag = null
}
```
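Because new elements are unshifted to the front, `stack[0]` is always the innermost open element, so addNode always appends to the right parent. A self-contained sketch of just that mechanic (hypothetical `MiniElement`/`open`/`close` names):

```ts
type MiniElement = { tag: string; children: (MiniElement | string)[] }

const root: MiniElement = { tag: 'root', children: [] }
const stack: MiniElement[] = []

// Same shape as Vue's addNode: innermost open element, or the root.
function addNode(node: MiniElement | string) {
  ;(stack[0] || root).children.push(node)
}

function open(tag: string) {
  const el: MiniElement = { tag, children: [] }
  addNode(el) // attach to the current parent first
  stack.unshift(el) // then make it the new innermost element
}

function close() {
  stack.shift()
}

// Simulates parsing <div><p>hi</p></div>
open('div')
open('p')
addNode('hi')
close()
close()
```

After these calls, `root.children` contains the div element, whose only child is the p element, whose only child is the text "hi": the nesting of open/close calls is reproduced in the tree.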

Then, in onclosetag, it shifts the stack:

```ts
  onclosetag(start, end) {
    const name = getSlice(start, end)
    if (!currentOptions.isVoidTag(name)) {
      let found = false
      for (let i = 0; i < stack.length; i++) {
        const e = stack[i]
        if (e.tag.toLowerCase() === name.toLowerCase()) {
          found = true
          if (i > 0) {
            emitError(ErrorCodes.X_MISSING_END_TAG, stack[0].loc.start.offset)
          }
          for (let j = 0; j <= i; j++) {
            const el = stack.shift()!
```
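The loop above searches the stack for the matching open tag, reports anything above it as missing an end tag, then pops everything down to and including the match. Reduced to a standalone sketch (hypothetical `closeTag`, returning the unclosed tag names instead of emitting X_MISSING_END_TAG):

```ts
function closeTag(stack: { tag: string }[], name: string): string[] {
  const missing: string[] = []
  for (let i = 0; i < stack.length; i++) {
    if (stack[i].tag.toLowerCase() === name.toLowerCase()) {
      // pop every element down to (and including) the match
      for (let j = 0; j <= i; j++) {
        const el = stack.shift()!
        // elements above the match never saw their own close tag
        if (j < i) missing.push(el.tag)
      }
      break
    }
  }
  return missing
}
```

For example, with the stack `[span, div]` (i.e. the source was `<div><span>...</div>`), closing `div` pops both elements and reports `span` as missing its end tag.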

In this way, by making full use of the Tokenizer callbacks, the AST is constructed.

Although the amount of implementation is large, we're essentially just steadily doing these processes.