Architecture
This document provides a deep dive into the architecture and design decisions of Ox Content.
Overview
Ox Content is designed as a modular, high-performance Markdown processing toolkit. The architecture follows the Oxc philosophy of prioritizing speed, memory efficiency, and correctness.
Crate Structure
ox-content/
├── crates/
│ ├── ox_content_allocator/ # Foundation: Arena allocator
│ ├── ox_content_ast/ # Core: AST definitions
│ ├── ox_content_parser/ # Core: Markdown parser
│ ├── ox_content_renderer/ # Core: HTML renderer
│ ├── ox_content_search/ # Core: Full-text search engine
│ ├── ox_content_ssg/ # Core: Static site generation
│ ├── ox_content_napi/ # Bindings: Node.js via napi-rs
│ ├── ox_content_wasm/ # Bindings: WebAssembly
│ ├── ox_content_vite/ # Integration: Vite plugin
│ ├── ox_content_og_image/ # Feature: OG image generation
│ └── ox_content_docs/ # Feature: Source code documentation
├── playground/ # Interactive playground
└── docs/ # Documentation (self-hosted)
Memory Management
Arena Allocation with bumpalo
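Ox Content uses bumpalo for arena-based allocation: AST nodes are bump-allocated into a single arena and freed together when the arena is dropped. This is the main source of its performance advantage.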
How Arena Allocation Works
Traditional: 4 separate heap allocations, 4 separate deallocations
Arena: 1 contiguous region, 1 deallocation (drop arena)
Benefits
Fast Allocation - Just bump a pointer, no free list traversal
Zero-Copy Parsing - AST nodes can reference source slices directly
Efficient Deallocation - Drop the entire arena at once
Cache-Friendly - Related data stored contiguously in memory
No Fragmentation - Memory is allocated linearly
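To make the "bump a pointer, free everything at once" idea concrete, here is a minimal sketch that uses only the standard library. It is a simplified stand-in, not how bumpalo or `ox_content_allocator` is implemented: `StrArena` and `StrId` are hypothetical names, and this toy hands out handles where bumpalo hands out references.

```rust
/// Toy arena: every string lives in one contiguous buffer.
/// Hypothetical illustration only; not part of ox_content_allocator.
pub struct StrArena {
    buf: String,
}

/// Handle to a string stored in the arena (byte range into the buffer).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StrId {
    start: usize,
    end: usize,
}

impl StrArena {
    pub fn new() -> Self {
        Self { buf: String::new() }
    }

    /// "Bump" allocation: append at the end of the buffer and return a handle.
    /// There is no free list to search.
    pub fn alloc(&mut self, s: &str) -> StrId {
        let start = self.buf.len();
        self.buf.push_str(s);
        StrId { start, end: self.buf.len() }
    }

    /// Resolves a handle back to the stored string.
    pub fn get(&self, id: StrId) -> &str {
        &self.buf[id.start..id.end]
    }

    /// Total bytes handed out so far.
    pub fn bytes_used(&self) -> usize {
        self.buf.len()
    }
}
```

Dropping the `StrArena` releases every stored string in one step, which mirrors the "1 deallocation (drop arena)" behaviour described above.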
Implementation
// ox_content_allocator/src/lib.rs
use bumpalo::Bump;
/// Arena allocator for AST nodes.
pub struct Allocator {
bump: Bump,
}
impl Allocator {
/// Creates a new allocator with default capacity.
pub fn new() -> Self {
Self { bump: Bump::new() }
}
/// Allocates a value in the arena.
pub fn alloc<T>(&self, value: T) -> &mut T {
self.bump.alloc(value)
}
/// Allocates a string slice in the arena.
pub fn alloc_str(&self, s: &str) -> &str {
self.bump.alloc_str(s)
}
/// Creates a new Vec in the arena.
pub fn new_vec<T>(&self) -> Vec<'_, T> {
Vec::new_in(&self.bump)
}
}
// Re-export arena-aware types with standard names
pub type Box<'a, T> = bumpalo::boxed::Box<'a, T>;
pub type Vec<'a, T> = bumpalo::collections::Vec<'a, T>;
pub type String<'a> = bumpalo::collections::String<'a>;
Usage Pattern
fn process_markdown(source: &str) -> String {
// Create arena - all allocations happen here
let allocator = Allocator::new();
// Parse document - AST allocated in arena
let parser = Parser::new(&allocator, source);
let document = parser.parse().unwrap();
// Render to HTML - output is owned String
let mut renderer = HtmlRenderer::new();
let html = renderer.render(&document);
html
// allocator dropped here - all AST memory freed at once
}
AST Design
mdast Specification
The AST follows the mdast specification, which is part of the unified ecosystem. This ensures compatibility with existing tools and plugins.
Node Hierarchy
Document (root)
├── Block Nodes
│ ├── Paragraph
│ │ └── Inline Nodes...
│ ├── Heading (depth: 1-6)
│ │ └── Inline Nodes...
│ ├── CodeBlock (lang, meta, value)
│ ├── BlockQuote
│ │ └── Block Nodes...
│ ├── List (ordered, start, spread)
│ │ └── ListItem (checked)
│ │ └── Block Nodes...
│ ├── Table
│ │ └── TableRow
│ │ └── TableCell
│ │ └── Inline Nodes...
│ ├── ThematicBreak
│ └── Html (raw)
│
└── Inline Nodes
├── Text (value)
├── Emphasis
│ └── Inline Nodes...
├── Strong
│ └── Inline Nodes...
├── InlineCode (value)
├── Link (url, title)
│ └── Inline Nodes...
├── Image (url, alt, title)
├── Break
├── Delete (GFM)
│ └── Inline Nodes...
└── FootnoteReference (identifier)
Span Information
Every node includes source location information:
/// Source span (byte offsets).
#[derive(Debug, Clone, Copy)]
pub struct Span {
/// Start byte offset (inclusive).
pub start: u32,
/// End byte offset (exclusive).
pub end: u32,
}
impl Span {
pub fn new(start: u32, end: u32) -> Self {
Self { start, end }
}
}
This enables:
Error messages with precise source locations
Source maps for debugging
Syntax highlighting in editors
Incremental re-parsing
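As a sketch of the first point, a byte-offset span can be turned into a line and column for an error message. `Span` is repeated here so the example compiles on its own; `line_col` and `snippet` are hypothetical helper names and are not claimed to exist in `ox_content_ast`.

```rust
/// Source span (byte offsets), repeated from above for self-containment.
#[derive(Debug, Clone, Copy)]
pub struct Span {
    pub start: u32,
    pub end: u32,
}

/// Converts a byte offset into a 1-based (line, column) pair.
/// The column is counted in bytes, not in characters.
pub fn line_col(source: &str, offset: u32) -> (usize, usize) {
    // Clamp so an out-of-range offset points at the end of the input.
    let offset = (offset as usize).min(source.len());
    let before = &source.as_bytes()[..offset];
    let line = before.iter().filter(|&&b| b == b'\n').count() + 1;
    // Start of the current line is just after the last newline, if any.
    let line_start = before
        .iter()
        .rposition(|&b| b == b'\n')
        .map_or(0, |i| i + 1);
    (line, offset - line_start + 1)
}

/// Returns the source text covered by a span, or None if the span is
/// out of bounds or does not fall on UTF-8 character boundaries.
pub fn snippet<'a>(source: &'a str, span: Span) -> Option<&'a str> {
    source.get(span.start as usize..span.end as usize)
}
```

For the input `"# Title\n\nHello *world*"`, offset 9 is the `H` of `Hello`, so `line_col` reports line 3, column 1.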
Visitor Pattern
The AST can be traversed using the visitor pattern:
/// Trait for visiting AST nodes.
pub trait Visit<'a> {
fn visit_document(&mut self, document: &Document<'a>) {
for node in &document.children {
self.visit_node(node);
}
}
fn visit_node(&mut self, node: &Node<'a>) {
match node {
Node::Paragraph(n) => self.visit_paragraph(n),
Node::Heading(n) => self.visit_heading(n),
Node::CodeBlock(n) => self.visit_code_block(n),
// ... other variants
}
}
fn visit_paragraph(&mut self, paragraph: &Paragraph<'a>) {
for child in &paragraph.children {
self.visit_node(child);
}
}
fn visit_heading(&mut self, heading: &Heading<'a>) {
for child in &heading.children {
self.visit_node(child);
}
}
// ... other visit methods with default implementations
}
Example: Table of Contents Generator
use ox_content_ast::{Visit, Document, Heading, Node};
struct TocGenerator {
entries: Vec<TocEntry>,
}
struct TocEntry {
depth: u8,
text: String,
id: String,
}
impl<'a> Visit<'a> for TocGenerator {
fn visit_heading(&mut self, heading: &Heading<'a>) {
let mut text = String::new();
for child in &heading.children {
if let Node::Text(t) = child {
text.push_str(t.value);
}
}
// `slugify` is a helper that is not defined in this snippet.
let id = slugify(&text);
self.entries.push(TocEntry {
depth: heading.depth,
text,
id,
});
}
}
fn generate_toc(document: &Document<'_>) -> Vec<TocEntry> {
let mut generator = TocGenerator { entries: vec![] };
generator.visit_document(document);
generator.entries
}
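The example calls a `slugify` helper that the snippet does not define. One possible version is sketched below, assuming a simple rule: lowercase alphanumerics are kept and every other run of characters becomes a single dash. The real project may generate heading IDs differently.

```rust
/// Turns heading text into a URL-friendly identifier.
/// Hypothetical helper; the slug rules here are an assumption.
pub fn slugify(text: &str) -> String {
    let mut slug = String::with_capacity(text.len());
    // Set when a separator run was seen since the last kept character.
    let mut pending_dash = false;
    for ch in text.chars() {
        if ch.is_alphanumeric() {
            // Emit one dash per separator run, never at the start.
            if pending_dash && !slug.is_empty() {
                slug.push('-');
            }
            pending_dash = false;
            slug.extend(ch.to_lowercase());
        } else {
            pending_dash = true;
        }
    }
    slug
}
```

Because a dash is only written just before the next kept character, leading and trailing separators never produce stray dashes.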
Parser Design
Architecture
Parser Options
/// Parser configuration options.
#[derive(Debug, Clone, Default)]
pub struct ParserOptions {
/// Enable GFM (GitHub Flavored Markdown) extensions.
pub gfm: bool,
/// Enable footnotes.
pub footnotes: bool,
/// Enable task lists.
pub task_lists: bool,
/// Enable tables.
pub tables: bool,
/// Enable strikethrough.
pub strikethrough: bool,
/// Enable autolinks.
pub autolinks: bool,
/// Maximum nesting depth for block elements.
pub max_nesting_depth: usize,
}
impl ParserOptions {
/// Creates options with all GFM extensions enabled.
pub fn gfm() -> Self {
Self {
gfm: true,
footnotes: true,
task_lists: true,
tables: true,
strikethrough: true,
autolinks: true,
max_nesting_depth: 100,
}
}
}
Parsing Strategy
Block-First - Parse block structure first (paragraphs, headings, etc.)
Inline Later - Parse inline content within blocks
Lazy Evaluation - Only parse what's needed
Error Recovery - Continue parsing after errors when possible
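The block-first, inline-later split can be illustrated with a toy two-phase parser. This is a deliberately tiny sketch and not the `ox_content_parser` implementation: it only knows ATX headings, paragraphs, and backtick code spans, it is not CommonMark compliant, and `Block`, `Inline`, `parse_blocks`, and `parse_inlines` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
pub enum Block<'a> {
    Heading { depth: u8, raw: &'a str },
    Paragraph { raw: &'a str },
}

#[derive(Debug, PartialEq)]
pub enum Inline<'a> {
    Text(&'a str),
    Code(&'a str),
}

/// Phase 1: split the source into blocks; inline content stays as raw slices.
pub fn parse_blocks(source: &str) -> Vec<Block<'_>> {
    source
        .split("\n\n")
        .map(str::trim)
        .filter(|chunk| !chunk.is_empty())
        .map(|chunk| {
            let hashes = chunk.bytes().take_while(|&b| b == b'#').count();
            match chunk.get(hashes..) {
                // 1 to 6 '#' characters followed by a space form an ATX heading.
                Some(rest) if (1..=6).contains(&hashes) && rest.starts_with(' ') => {
                    Block::Heading { depth: hashes as u8, raw: rest.trim() }
                }
                _ => Block::Paragraph { raw: chunk },
            }
        })
        .collect()
}

/// Phase 2: parse inline content of one block, only when it is needed.
pub fn parse_inlines(raw: &str) -> Vec<Inline<'_>> {
    let mut out = Vec::new();
    let mut rest = raw;
    while let Some(open) = rest.find('`') {
        let after = &rest[open + 1..];
        // An unclosed backtick is treated as plain text (error recovery).
        let close = match after.find('`') {
            Some(i) => i,
            None => break,
        };
        if open > 0 {
            out.push(Inline::Text(&rest[..open]));
        }
        out.push(Inline::Code(&after[..close]));
        rest = &after[close + 1..];
    }
    if !rest.is_empty() {
        out.push(Inline::Text(rest));
    }
    out
}
```

Both phases return slices of the original input, so no text is copied, and a caller that only needs headings can skip `parse_inlines` for every paragraph.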
CommonMark Compliance
The parser follows the CommonMark spec:
ATX headings (# Heading)
Setext headings (underlined)
Fenced code blocks (``` or ~~~)
Indented code blocks
Block quotes (>)
Lists (ordered and unordered)
Thematic breaks (---, ***, ___)
Emphasis and strong emphasis
Links and images
Hard and soft line breaks
Renderer Design
HTML Renderer
/// HTML renderer with customizable options.
pub struct HtmlRenderer {
options: HtmlRendererOptions,
output: String,
}
/// Renderer configuration.
#[derive(Debug, Clone)]
pub struct HtmlRendererOptions {
/// Use XHTML-style self-closing tags.
pub xhtml: bool,
/// Soft break string.
pub soft_break: String,
/// Hard break string.
pub hard_break: String,
/// Enable syntax highlighting.
pub highlight: bool,
/// Sanitize HTML output.
pub sanitize: bool,
}
Renderer Trait
Custom renderers can be implemented:
/// Trait for rendering AST to output format.
pub trait Renderer {
type Output;
fn render(&mut self, document: &Document<'_>) -> RenderResult<Self::Output>;
}
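A custom renderer might look like the sketch below, which emits plain text instead of HTML. To keep it self-contained, `Node`, `Document`, `RenderError`, and `RenderResult` are simplified stand-ins defined inline; their real definitions in Ox Content are not shown in this document and will differ. `PlainTextRenderer` is a hypothetical name.

```rust
// Simplified stand-in types (assumptions, not the real ox_content_ast types).
pub enum Node<'a> {
    Heading { depth: u8, text: &'a str },
    Paragraph { text: &'a str },
}

pub struct Document<'a> {
    pub children: Vec<Node<'a>>,
}

#[derive(Debug)]
pub struct RenderError(pub String);

pub type RenderResult<T> = Result<T, RenderError>;

/// Same shape as the trait shown above.
pub trait Renderer {
    type Output;
    fn render(&mut self, document: &Document<'_>) -> RenderResult<Self::Output>;
}

/// Renders headings in upper case and paragraphs verbatim, one per line.
pub struct PlainTextRenderer;

impl Renderer for PlainTextRenderer {
    type Output = String;

    fn render(&mut self, document: &Document<'_>) -> RenderResult<String> {
        let mut out = String::new();
        for node in &document.children {
            match node {
                Node::Heading { depth, text } => {
                    // Reject depths outside the 1-6 range used by the AST.
                    if !(1..=6).contains(depth) {
                        return Err(RenderError(format!("invalid heading depth {}", depth)));
                    }
                    out.push_str(&text.to_uppercase());
                }
                Node::Paragraph { text } => out.push_str(text),
            }
            out.push('\n');
        }
        Ok(out)
    }
}
```

The associated `Output` type is what lets one trait cover renderers that produce a `String`, a byte buffer, or some structured value.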
HTML Escaping
The renderer properly escapes HTML entities:
| Character | Entity |
|---|---|
| & | &amp; |
| < | &lt; |
| > | &gt; |
| " | &quot; |
| ' | &#39; |
URL encoding is also handled for link/image URLs.
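A minimal sketch of such an escaping routine follows. `escape_html` is a hypothetical name, and writing the apostrophe as `&#39;` is an assumption; the renderer's actual function and entity choice may differ.

```rust
/// Escapes the five characters that are significant in HTML text and
/// attribute values. Hypothetical helper, not the renderer's actual API.
pub fn escape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            // Assumption: numeric reference for the apostrophe.
            '\'' => out.push_str("&#39;"),
            _ => out.push(ch),
        }
    }
    out
}
```

Each input character is examined exactly once, so an ampersand that the function itself emits as part of an entity is never escaped a second time.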
NAPI Bindings
Architecture
Data Transfer
AST is serialized to JSON for JavaScript interop
HTML rendering happens in Rust for maximum performance
Async support for large documents
Thread Safety
The NAPI bindings are designed to be thread-safe:
Each parse operation creates its own allocator
No shared mutable state between calls
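The same principle can be shown in plain Rust with scoped threads. In this sketch a per-call scratch vector stands in for the per-parse allocator; `process` and `process_all` are hypothetical names, and counting heading lines is only a placeholder for real parsing.

```rust
use std::thread;

/// Stand-in for one parse call: all working state is created inside the
/// call, so nothing is shared between concurrent invocations.
fn process(source: &str) -> usize {
    // Per-call scratch state plays the role of the per-parse arena.
    let mut scratch: Vec<&str> = Vec::new();
    scratch.extend(source.lines().filter(|line| line.starts_with('#')));
    scratch.len()
}

/// Processes every document on its own thread and returns the results in
/// input order.
pub fn process_all(sources: &[&str]) -> Vec<usize> {
    thread::scope(|s| {
        let handles: Vec<_> = sources
            .iter()
            .map(|src| s.spawn(move || process(src)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect::<Vec<usize>>()
    })
}
```

Since each call owns its scratch state, no locks are needed and the threads cannot observe each other's intermediate data.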
Vite Integration
Environment API
Ox Content integrates with Vite's Environment API for SSG:
// Creates a server-side environment for Markdown processing
const mdEnv = new Environment('markdown', {
// Custom module resolution for .md files
resolve: {
extensions: ['.md'],
},
// Transform .md to JS modules
transform: async (code, id) => {
if (id.endsWith('.md')) {
const result = await parseAndRender(code);
return `export default ${JSON.stringify(result)}`;
}
},
});
Hot Module Replacement
The Vite plugin supports HMR for Markdown files:
File change detected
Re-parse changed file
Send update to client
Update rendered content without full reload
Performance Characteristics
Memory Usage
| Content Size | Traditional Parser | Ox Content |
|---|---|---|
| 1 KB | ~50 KB heap | ~8 KB arena |
| 10 KB | ~500 KB heap | ~80 KB arena |
| 100 KB | ~5 MB heap | ~800 KB arena |
Parse Speed (approximate)
| Content Size | Traditional Parser | Ox Content |
|---|---|---|
| 1 KB | ~1 ms | ~0.1 ms |
| 10 KB | ~10 ms | ~1 ms |
| 100 KB | ~100 ms | ~10 ms |
Benchmarks vary by content complexity and hardware.
Security Considerations
HTML Sanitization
When rendering untrusted Markdown, enable sanitization:
let options = HtmlRendererOptions {
sanitize: true,
..Default::default()
};
let mut renderer = HtmlRenderer::with_options(options);
Nesting Limits
The parser enforces maximum nesting depth to prevent stack overflow:
let options = ParserOptions {
max_nesting_depth: 50, // Limit nesting
..Default::default()
};
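The kind of check such a limit implies can be sketched for block quotes alone. This is an illustration under simplifying assumptions, not the parser's actual logic: it only counts leading `>` markers per line, and `check_blockquote_depth` and `NestingError` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
pub enum NestingError {
    TooDeep { limit: usize },
}

/// Returns the deepest block quote nesting found, or an error as soon as
/// a line exceeds `max_nesting_depth`.
pub fn check_blockquote_depth(
    source: &str,
    max_nesting_depth: usize,
) -> Result<usize, NestingError> {
    let mut deepest = 0;
    for line in source.lines() {
        // Count '>' markers at the start of the line, ignoring whitespace.
        let depth = line
            .chars()
            .filter(|c| !c.is_whitespace())
            .take_while(|&c| c == '>')
            .count();
        if depth > max_nesting_depth {
            return Err(NestingError::TooDeep { limit: max_nesting_depth });
        }
        deepest = deepest.max(depth);
    }
    Ok(deepest)
}
```

Rejecting the input before recursing keeps stack usage bounded regardless of how deeply an attacker nests the markers.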
Input Validation
Maximum input size limits
Handling of invalid UTF-8
Graceful handling of malformed Markdown
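The first two checks can be sketched as a small gate in front of the parser. `validate_input` and `InputError` are hypothetical names, and the size limit is a caller-supplied assumption; the document does not state where or how Ox Content performs these checks.

```rust
#[derive(Debug, PartialEq)]
pub enum InputError {
    TooLarge { len: usize, max: usize },
    InvalidUtf8 { valid_up_to: usize },
}

/// Validates raw bytes before parsing: enforces a size limit, then checks
/// that the input is well-formed UTF-8 and returns it as a string slice.
pub fn validate_input(bytes: &[u8], max_len: usize) -> Result<&str, InputError> {
    // Check the size first so oversized input is rejected cheaply.
    if bytes.len() > max_len {
        return Err(InputError::TooLarge { len: bytes.len(), max: max_len });
    }
    std::str::from_utf8(bytes).map_err(|e| InputError::InvalidUtf8 {
        // Byte index up to which the input was valid UTF-8.
        valid_up_to: e.valid_up_to(),
    })
}
```

Returning a borrowed `&str` on success means validation adds no copy before the source is handed to the parser.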
Future Directions
Incremental Parsing - Re-parse only changed portions
Streaming Parser - Parse large documents in chunks
WASM Build - Run in browsers without NAPI
Custom Syntax Extensions - Plugin system for custom blocks
Source Maps - Full source map generation