Utilities
Utility functions for Markdown serialization, HTML sanitization, and video parsing.
Blokhaus exports four utility functions for common tasks: sanitizing pasted HTML, serializing to and from Markdown, and parsing video embed URLs.
Import
import {
sanitizePastedHTML,
serializeNodesToMarkdown,
$parseMarkdownToLexicalNodes,
parseVideoEmbed,
} from "@blokhaus/core";
sanitizePastedHTML
Sanitizes raw HTML from the clipboard for safe insertion into the Lexical AST. Uses DOMParser to parse the HTML and applies a strict allowlist-based sanitization pipeline.
Signature
function sanitizePastedHTML(html: string): string;
Parameters
| Parameter | Type | Description |
|---|---|---|
| html | string | Raw HTML string from the clipboard (typically from ClipboardEvent.clipboardData.getData('text/html')). |
Returns
A sanitized HTML string safe for conversion into Lexical nodes via $generateNodesFromDOM.
What it does
The sanitization pipeline performs the following operations:
- Strips dangerous tags entirely (element and all children): <script>, <style>, <iframe>, <object>, <noscript>.
- Strips all style attributes -- removes inline font sizes, colors, margins.
- Strips all class attributes.
- Normalizes heading levels -- Google Docs uses generic elements with inline font-size styles. Before stripping styles, the sanitizer checks font sizes and converts them to semantic headings:
  - 32px+ maps to <h1>
  - 24px+ maps to <h2>
  - 18px+ maps to <h3>
- Unwraps <span> and <font> elements -- these are non-semantic after style stripping.
- Converts <div> to <p> -- divs are treated as paragraphs.
- Normalizes non-semantic tags: <b> becomes <strong>, <i> becomes <em>, <del> and <strike> become <s>.
- Preserves only semantic attributes: href on <a>, src and alt on <img>. All other attributes are stripped.
- Applies a strict element allowlist -- anything not on the list is unwrapped to its text content.
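The font-size thresholds in the heading normalization step can be sketched as a small lookup. This is a minimal illustration, not the Blokhaus implementation: `headingTagForStyle` is a hypothetical helper, and it assumes the font size is given in px inside an inline style string.

```typescript
// Hypothetical helper for illustration; not part of the Blokhaus API.
type HeadingTag = "h1" | "h2" | "h3";

// Thresholds from the pipeline description, ordered from largest to smallest.
const HEADING_THRESHOLDS: ReadonlyArray<readonly [number, HeadingTag]> = [
  [32, "h1"],
  [24, "h2"],
  [18, "h3"],
];

// Returns the heading tag implied by an inline style string, or null when
// the style has no px font size or the size is below the smallest threshold.
// Assumption: only px units are handled in this sketch.
function headingTagForStyle(style: string): HeadingTag | null {
  const match = /font-size\s*:\s*(\d+(?:\.\d+)?)px/i.exec(style);
  if (!match) return null;
  const px = parseFloat(match[1]);
  for (const [minPx, tag] of HEADING_THRESHOLDS) {
    if (px >= minPx) return tag;
  }
  return null;
}
```

Because the check has to read the style attribute, it must run before the style-stripping step, which matches the ordering described in the list.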
Allowlist
The following HTML elements survive sanitization:
p, br, h1, h2, h3, h4, h5, h6, strong, em, u, s, code, pre,
blockquote, ul, ol, li, a, img, table, thead, tbody, tr, th, td, hr
Additionally, b, i, del, and strike are accepted but normalized to their semantic equivalents (strong, em, s).
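One way to picture how the allowlist interacts with the dangerous-tag and normalization rules is a per-tag decision function. The sketch below is an assumption about how such a decision could be structured; `decideTag`, `TagDecision`, and the three sets are hypothetical names, and folding the div-to-p conversion into the rename map is a simplification made here.

```typescript
// Hypothetical sketch; names and return shape are illustrative only.
const DANGEROUS = new Set(["script", "style", "iframe", "object", "noscript"]);

const ALLOWED = new Set([
  "p", "br", "h1", "h2", "h3", "h4", "h5", "h6", "strong", "em", "u", "s",
  "code", "pre", "blockquote", "ul", "ol", "li", "a", "img", "table",
  "thead", "tbody", "tr", "th", "td", "hr",
]);

// Non-semantic tags and their semantic equivalents (div -> p included here
// as a simplification).
const RENAMED = new Map<string, string>([
  ["b", "strong"],
  ["i", "em"],
  ["del", "s"],
  ["strike", "s"],
  ["div", "p"],
]);

type TagDecision =
  | { action: "remove" } // drop the element and all of its children
  | { action: "unwrap" } // drop the element, keep its content
  | { action: "keep"; tag: string }; // keep, possibly under a new tag name

function decideTag(rawTag: string): TagDecision {
  const tag = rawTag.toLowerCase();
  if (DANGEROUS.has(tag)) return { action: "remove" };
  const normalized = RENAMED.get(tag) ?? tag;
  if (ALLOWED.has(normalized)) return { action: "keep", tag: normalized };
  return { action: "unwrap" };
}
```

Under these assumptions, `decideTag("B")` keeps the element as `strong`, `decideTag("script")` removes it, and `decideTag("span")` unwraps it.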
Example
const dirty =
  '<div style="font-size: 32px" class="heading"><b onclick="alert()">Title</b></div>';
const clean = sanitizePastedHTML(dirty);
// Result: '<h1><strong>Title</strong></h1>'
This function is automatically called by the PastePlugin. You typically do not need to call it directly unless building a custom paste handler.
serializeNodesToMarkdown
Serializes an array of Lexical nodes to a Markdown string. This is a pure function with no side effects -- it does not read from or write to the editor.
Signature
function serializeNodesToMarkdown(nodes: LexicalNode[]): string;
Parameters
| Parameter | Type | Description |
|---|---|---|
| nodes | LexicalNode[] | An array of Lexical nodes to serialize. Typically obtained via $getRoot().getChildren() inside editor.read(). |
Returns
A Markdown string with double newlines (\n\n) separating block-level elements.
Supported node types
| Node type | Markdown output |
|---|---|
| HeadingNode (h1-h6) | # , ## , ### , etc. |
| ParagraphNode | Plain text |
| QuoteNode | > text |
| ListNode (bullet) | - item |
| ListNode (number) | 1. item |
| ListNode (check) | - [x] item or - [ ] item |
| ImageNode | ![alt](src) |
| VideoNode | [title](src) |
| HorizontalRuleNode | --- |
| CalloutNode | > [!emoji] text |
| ToggleContainerNode | <details><summary>...</summary>...</details> |
| TableNode | Pipe-delimited Markdown table with header separator |
| LinkNode (inline) | [text](url) |
| Bold text | **text** |
| Italic text | *text* |
| Strikethrough text | ~~text~~ |
| Code text | `text` |
| Unknown nodes | Falls back to getTextContent() |
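The inline rows of the table can be illustrated with a small, self-contained sketch. It does not use Lexical types: `InlineRun`, `serializeRun`, and `serializeBlocks` are hypothetical stand-ins, and the nesting order of the markers is an assumption rather than documented behavior.

```typescript
// Hypothetical, simplified stand-in for a formatted text node.
interface InlineRun {
  text: string;
  bold?: boolean;
  italic?: boolean;
  strikethrough?: boolean;
  code?: boolean;
}

// Wraps the text in the Markdown markers listed in the table above.
// Assumption: code is innermost, then italic, bold, strikethrough.
function serializeRun(run: InlineRun): string {
  let out = run.text;
  if (run.code) out = "`" + out + "`";
  if (run.italic) out = `*${out}*`;
  if (run.bold) out = `**${out}**`;
  if (run.strikethrough) out = `~~${out}~~`;
  return out;
}

// Joins the runs of each block, then separates blocks with a blank line,
// mirroring the "\n\n" separator described under Returns.
function serializeBlocks(blocks: InlineRun[][]): string {
  return blocks
    .map((runs) => runs.map((run) => serializeRun(run)).join(""))
    .join("\n\n");
}
```

For example, two blocks `[{ text: "Hello " }, { text: "world", bold: true }]` and `[{ text: "Second" }]` would serialize to `Hello **world**`, a blank line, then `Second`.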
Example
const markdown = editor.read(() => {
  const root = $getRoot();
  return serializeNodesToMarkdown(root.getChildren());
});
console.log(markdown);
// # Hello World
//
// This is a paragraph with **bold** and *italic* text.
//
// - Item one
// - Item two
This function must be called with pre-read nodes. Call it inside editor.read() or with nodes that have already been extracted from the editor state. Do not call it inside editor.update().
$parseMarkdownToLexicalNodes
Parses a Markdown string into an array of Lexical nodes. The $ prefix indicates that this function creates Lexical nodes and must be called inside editor.update().
Signature
function $parseMarkdownToLexicalNodes(markdown: string): LexicalNode[];
Parameters
| Parameter | Type | Description |
|---|---|---|
| markdown | string | A Markdown string to parse. |
Returns
An array of Lexical nodes ready to be inserted into the AST.
Supported Markdown syntax
| Markdown | Lexical node |
|---|---|
| # Heading through ###### Heading | HeadingNode (h1-h6) |
| > Quote text | QuoteNode |
| - item or * item | ListNode (bullet) |
| 1. item | ListNode (number) |
| - [x] item or - [ ] item | ListNode (check) with checked/unchecked items |
| **bold** | TextNode with bold format |
| *italic* | TextNode with italic format |
| `code` | TextNode with code format |
| ~~strikethrough~~ | TextNode with strikethrough format |
| Plain text | ParagraphNode |
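To make the block-level rows of the table concrete, here is a rough sketch of how a single line might be classified before any nodes are created. It is not how $parseMarkdownToLexicalNodes is implemented; `classifyLine` and `BlockKind` are hypothetical names, and the regular expressions are assumptions chosen to match the syntax listed above.

```typescript
// Hypothetical result type for illustration; no Lexical nodes are created.
type BlockKind =
  | { kind: "heading"; level: number; text: string }
  | { kind: "quote"; text: string }
  | { kind: "check"; checked: boolean; text: string }
  | { kind: "bullet"; text: string }
  | { kind: "number"; text: string }
  | { kind: "paragraph"; text: string };

function classifyLine(line: string): BlockKind {
  let m: RegExpExecArray | null;
  // One to six "#" characters followed by whitespace.
  if ((m = /^(#{1,6})\s+(.*)$/.exec(line))) {
    return { kind: "heading", level: m[1].length, text: m[2] };
  }
  if ((m = /^>\s?(.*)$/.exec(line))) {
    return { kind: "quote", text: m[1] };
  }
  // Check items are tested before bullets because both start with "- ".
  if ((m = /^-\s+\[( |x|X)\]\s+(.*)$/.exec(line))) {
    return { kind: "check", checked: m[1] !== " ", text: m[2] };
  }
  if ((m = /^[-*]\s+(.*)$/.exec(line))) {
    return { kind: "bullet", text: m[1] };
  }
  if ((m = /^\d+\.\s+(.*)$/.exec(line))) {
    return { kind: "number", text: m[1] };
  }
  // Anything else is treated as plain paragraph text.
  return { kind: "paragraph", text: line };
}
```

The ordering matters: a line such as `- [x] done` would otherwise match the bullet pattern first. A line starting with `**bold**` falls through to the paragraph case because the bullet pattern requires whitespace after the marker.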
Example
editor.update(() => {
  const nodes = $parseMarkdownToLexicalNodes(
    "# Hello\n\nThis is **bold** text.",
  );
  const root = $getRoot();
  root.clear();
  nodes.forEach((node) => root.append(node));
});
This function creates Lexical nodes that require an active editor context. Always call it inside editor.update(). Calling it outside an update context will throw.
parseVideoEmbed
Parses a URL string and returns embed information for known video providers. Pure function with no side effects.
Signature
function parseVideoEmbed(url: string): VideoEmbedInfo | null;
Parameters
| Parameter | Type | Description |
|---|---|---|
| url | string | A URL string to parse. |
Returns
A VideoEmbedInfo object if the input is a valid HTTP(S) URL, or null if the input is not a valid URL or uses any other protocol.
interface VideoEmbedInfo {
  provider: "youtube" | "vimeo" | "loom" | "generic";
  embedUrl: string;
  thumbnailUrl?: string;
}
Supported providers
| Provider | Recognized URL patterns | Embed URL format |
|---|---|---|
| YouTube | youtube.com/watch?v=ID, youtu.be/ID, youtube.com/embed/ID, youtube.com/shorts/ID, youtube.com/live/ID, m.youtube.com/watch?v=ID | https://www.youtube.com/embed/{ID} |
| Vimeo | vimeo.com/ID, player.vimeo.com/video/ID, vimeo.com/channels/*/ID | https://player.vimeo.com/video/{ID} |
| Loom | loom.com/share/ID, loom.com/embed/ID | https://www.loom.com/embed/{ID} |
| Generic | Any other valid HTTP(S) URL | The original URL (passed through) |
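The table above can be approximated with the standard URL API. The following is a sketch under stated assumptions, not the actual parseVideoEmbed source: `parseVideoEmbedSketch` and `EmbedSketch` are hypothetical names, host matching is simplified to stripping a leading `www.` or `m.`, and Vimeo IDs are assumed to be numeric.

```typescript
// Illustrative only; the real parseVideoEmbed may differ.
interface EmbedSketch {
  provider: "youtube" | "vimeo" | "loom" | "generic";
  embedUrl: string;
  thumbnailUrl?: string;
}

function parseVideoEmbedSketch(input: string): EmbedSketch | null {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return null; // not a parseable absolute URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return null;

  const host = url.hostname.replace(/^(www\.|m\.)/, "");
  const segments = url.pathname.split("/").filter((s) => s.length > 0);
  const first = segments[0];

  if (host === "youtube.com" || host === "youtu.be") {
    let id: string | null = null;
    if (host === "youtu.be") id = first ?? null;
    else if (first === "watch") id = url.searchParams.get("v");
    else if (first === "embed" || first === "shorts" || first === "live") {
      id = segments[1] ?? null;
    }
    if (id) {
      return {
        provider: "youtube",
        embedUrl: `https://www.youtube.com/embed/${id}`,
        thumbnailUrl: `https://img.youtube.com/vi/${id}/hqdefault.jpg`,
      };
    }
  }

  if (host === "vimeo.com" || host === "player.vimeo.com") {
    // Assumption: the video ID is the last path segment and is numeric.
    const id = segments[segments.length - 1];
    if (id && /^\d+$/.test(id)) {
      return { provider: "vimeo", embedUrl: `https://player.vimeo.com/video/${id}` };
    }
  }

  if (host === "loom.com" && (first === "share" || first === "embed") && segments[1]) {
    return { provider: "loom", embedUrl: `https://www.loom.com/embed/${segments[1]}` };
  }

  // Any other valid HTTP(S) URL is passed through unchanged.
  return { provider: "generic", embedUrl: input };
}
```

Relying on `new URL()` for validation means malformed input and non-HTTP(S) schemes are both rejected early, before any provider matching is attempted.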
Examples
parseVideoEmbed("https://www.youtube.com/watch?v=dQw4w9WgXcQ");
// {
// provider: 'youtube',
// embedUrl: 'https://www.youtube.com/embed/dQw4w9WgXcQ',
// thumbnailUrl: 'https://img.youtube.com/vi/dQw4w9WgXcQ/hqdefault.jpg',
// }
parseVideoEmbed("https://vimeo.com/123456789");
// {
// provider: 'vimeo',
// embedUrl: 'https://player.vimeo.com/video/123456789',
// }
parseVideoEmbed("https://www.loom.com/share/abc123def456");
// {
// provider: 'loom',
// embedUrl: 'https://www.loom.com/embed/abc123def456',
// }
parseVideoEmbed("https://example.com/my-video.mp4");
// {
// provider: 'generic',
// embedUrl: 'https://example.com/my-video.mp4',
// }
parseVideoEmbed("not a url");
// null
parseVideoEmbed("ftp://files.example.com/video.mp4");
// null (only http and https are supported)
Thumbnail support
Thumbnails are returned for YouTube videos only (https://img.youtube.com/vi/{ID}/hqdefault.jpg). Vimeo thumbnails require an API call and are not included. Loom thumbnails are not supported.
Related
- PastePlugin -- Uses sanitizePastedHTML internally
- AIPreviewNode -- Uses $parseMarkdownToLexicalNodes on accept
- AI Integration guide -- Uses serializeNodesToMarkdown for context
- VideoPlugin -- Uses parseVideoEmbed for URL processing
- Paste Handling guide -- Paste pipeline details