Utilities

Utility functions for Markdown serialization, HTML sanitization, and video parsing.

Blokhaus exports four utility functions for common tasks: sanitizing pasted HTML, serializing to and from Markdown, and parsing video embed URLs.

Import

import {
  sanitizePastedHTML,
  serializeNodesToMarkdown,
  $parseMarkdownToLexicalNodes,
  parseVideoEmbed,
} from "@blokhaus/core";

sanitizePastedHTML

Sanitizes raw HTML from the clipboard so it can be safely inserted into the Lexical AST. The function parses the HTML with DOMParser and applies a strict allowlist-based sanitization pipeline.

Signature

function sanitizePastedHTML(html: string): string;

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| html | string | Raw HTML string from the clipboard (typically from ClipboardEvent.clipboardData.getData('text/html')). |

Returns

A sanitized HTML string safe for conversion into Lexical nodes via $generateNodesFromDOM.

What it does

The sanitization pipeline performs the following operations:

  1. Strips dangerous tags entirely (element and all children): <script>, <style>, <iframe>, <object>, <noscript>.

  2. Strips all style attributes -- removes inline font sizes, colors, margins.

  3. Strips all class attributes.

  4. Normalizes heading levels -- Google Docs represents headings as generic elements with inline font-size styles. The sanitizer reads these font sizes before the style attributes are stripped (step 2) and converts the elements to semantic headings, using the largest threshold that matches:

    • 32px or larger maps to <h1>
    • 24px or larger maps to <h2>
    • 18px or larger maps to <h3>

  5. Unwraps <span> and <font> elements -- these are non-semantic after style stripping.

  6. Converts <div> to <p> -- divs are treated as paragraphs.

  7. Normalizes non-semantic tags: <b> becomes <strong>, <i> becomes <em>, <del> and <strike> become <s>.

  8. Preserves only semantic attributes: href on <a>, src and alt on <img>. All other attributes are stripped.

  9. Applies a strict element allowlist -- anything not on the list is unwrapped to its text content.
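The font-size thresholds in step 4 can be illustrated with a small pure function. This is only a sketch: headingTagForFontSize is a hypothetical helper, not part of the Blokhaus API, and it assumes the size is given in px inside a style attribute string.

```typescript
// Hypothetical helper, not part of the Blokhaus API.
// Maps an inline style string to a heading tag using the documented thresholds.
type HeadingTag = "h1" | "h2" | "h3";

function headingTagForFontSize(style: string): HeadingTag | null {
  // Pull a pixel value out of e.g. "color: red; font-size: 32px".
  // Assumption: only px units are handled in this sketch.
  const match = /font-size\s*:\s*(\d+(?:\.\d+)?)px/i.exec(style);
  if (match === null) return null;

  const px = Number.parseFloat(match[1] ?? "");

  // Check the largest threshold first so 32px does not fall through to h2.
  if (px >= 32) return "h1";
  if (px >= 24) return "h2";
  if (px >= 18) return "h3";
  return null; // below 18px: treat the element as body text
}
```

A style with no font-size declaration, or a size below 18px, yields null, which a caller could take to mean "leave the element as it is".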

Allowlist

The following HTML elements survive sanitization:

p, br, h1, h2, h3, h4, h5, h6, strong, em, u, s, code, pre,
blockquote, ul, ol, li, a, img, table, thead, tbody, tr, th, td, hr

Additionally, b, i, del, and strike are accepted but normalized to their semantic equivalents (strong, em, s).
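One way to picture how the allowlist, the normalization rules, and the attribute rules combine is as a per-tag decision. The sketch below restates the rules described on this page as data. The names classifyTag, keepAttribute, and TagAction are illustrative assumptions, not the library's internals, and the code works on tag names only rather than on a real DOM.

```typescript
// Illustrative only: restates the documented rules, not Blokhaus source code.
const DROPPED_TAGS: ReadonlySet<string> = new Set([
  "script", "style", "iframe", "object", "noscript",
]);

const ALLOWED_TAGS: ReadonlySet<string> = new Set([
  "p", "br", "h1", "h2", "h3", "h4", "h5", "h6", "strong", "em", "u", "s",
  "code", "pre", "blockquote", "ul", "ol", "li", "a", "img", "table",
  "thead", "tbody", "tr", "th", "td", "hr",
]);

// Tags that are accepted but rewritten to another tag.
const NORMALIZED_TAGS: ReadonlyMap<string, string> = new Map([
  ["b", "strong"], ["i", "em"], ["del", "s"], ["strike", "s"], ["div", "p"],
]);

const ALLOWED_ATTRIBUTES: ReadonlyMap<string, readonly string[]> = new Map([
  ["a", ["href"]],
  ["img", ["src", "alt"]],
]);

type TagAction =
  | { kind: "drop" } // remove the element and all of its children
  | { kind: "keep"; tag: string } // keep the element, possibly renamed
  | { kind: "unwrap" }; // discard the element, keep its content

function classifyTag(rawTag: string): TagAction {
  const tag = rawTag.toLowerCase();
  if (DROPPED_TAGS.has(tag)) return { kind: "drop" };
  const normalized = NORMALIZED_TAGS.get(tag) ?? tag;
  if (ALLOWED_TAGS.has(normalized)) return { kind: "keep", tag: normalized };
  return { kind: "unwrap" };
}

function keepAttribute(tag: string, attribute: string): boolean {
  const allowed = ALLOWED_ATTRIBUTES.get(tag.toLowerCase());
  return allowed?.includes(attribute.toLowerCase()) ?? false;
}
```

Under these assumptions, classifyTag("b") keeps the element as strong, classifyTag("span") unwraps it, and keepAttribute("b", "onclick") is false, which matches the example below where the onclick handler disappears.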

Example

const dirty =
  '<div style="font-size: 32px" class="heading"><b onclick="alert()">Title</b></div>';
const clean = sanitizePastedHTML(dirty);
// Result: '<h1><strong>Title</strong></h1>'

This function is automatically called by the PastePlugin. You typically do not need to call it directly unless building a custom paste handler.


serializeNodesToMarkdown

Serializes an array of Lexical nodes to a Markdown string. This is a pure function with no side effects -- it does not read from or write to the editor.

Signature

function serializeNodesToMarkdown(nodes: LexicalNode[]): string;

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| nodes | LexicalNode[] | An array of Lexical nodes to serialize. Typically obtained via $getRoot().getChildren() inside editor.read(). |

Returns

A Markdown string with double newlines (\n\n) separating block-level elements.

Supported node types

| Node type | Markdown output |
| --- | --- |
| HeadingNode (h1-h6) | # , ## , ### , etc. |
| ParagraphNode | Plain text |
| QuoteNode | > text |
| ListNode (bullet) | - item |
| ListNode (number) | 1. item |
| ListNode (check) | - [x] item or - [ ] item |
| ImageNode | ![alt](src) |
| VideoNode | [title](src) |
| HorizontalRuleNode | --- |
| CalloutNode | > [!emoji] text |
| ToggleContainerNode | <details><summary>...</summary>...</details> |
| TableNode | Pipe-delimited Markdown table with header separator |
| LinkNode (inline) | [text](url) |
| Bold text | **text** |
| Italic text | *text* |
| Strikethrough text | ~~text~~ |
| Code text | `text` |
| Unknown nodes | Falls back to getTextContent() |
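To make the block-level mapping concrete, here is a minimal sketch over a simplified, hypothetical block model. The Block type and the serializeBlock and serializeBlocks functions are assumptions made for illustration. Real Lexical nodes are richer, and numbering ordered items by position is also an assumption.

```typescript
// Hypothetical, simplified block model; this is not the Lexical node API.
type ListItem = { text: string; checked?: boolean };

type Block =
  | { type: "heading"; level: 1 | 2 | 3 | 4 | 5 | 6; text: string }
  | { type: "paragraph"; text: string }
  | { type: "quote"; text: string }
  | { type: "list"; listType: "bullet" | "number" | "check"; items: ListItem[] }
  | { type: "rule" };

function serializeBlock(block: Block): string {
  switch (block.type) {
    case "heading":
      return `${"#".repeat(block.level)} ${block.text}`;
    case "paragraph":
      return block.text;
    case "quote":
      return `> ${block.text}`;
    case "rule":
      return "---";
    case "list": {
      const { listType, items } = block;
      return items
        .map((item, index) => {
          // Assumption: ordered items are numbered by their position.
          if (listType === "number") return `${index + 1}. ${item.text}`;
          if (listType === "check") {
            return `- [${item.checked ? "x" : " "}] ${item.text}`;
          }
          return `- ${item.text}`;
        })
        .join("\n");
    }
  }
}

// Block-level elements are separated by a blank line (\n\n).
function serializeBlocks(blocks: Block[]): string {
  return blocks.map(serializeBlock).join("\n\n");
}
```

Items inside a list are joined with a single newline, while top-level blocks are joined with a double newline, which is the separator described under Returns.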

Example

const markdown = editor.read(() => {
  const root = $getRoot();
  return serializeNodesToMarkdown(root.getChildren());
});

console.log(markdown);
// # Hello World
//
// This is a paragraph with **bold** and *italic* text.
//
// - Item one
// - Item two

This function expects nodes that have already been read from the editor state. Call it inside editor.read(), or pass nodes that were extracted from the editor state beforehand. Do not call it inside editor.update().


$parseMarkdownToLexicalNodes

Parses a Markdown string into an array of Lexical nodes. The $ prefix indicates that this function creates Lexical nodes and must be called inside editor.update().

Signature

function $parseMarkdownToLexicalNodes(markdown: string): LexicalNode[];

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| markdown | string | A Markdown string to parse. |

Returns

An array of Lexical nodes ready to be inserted into the AST.

Supported Markdown syntax

| Markdown | Lexical node |
| --- | --- |
| # Heading through ###### Heading | HeadingNode (h1-h6) |
| > Quote text | QuoteNode |
| - item or * item | ListNode (bullet) |
| 1. item | ListNode (number) |
| - [x] item or - [ ] item | ListNode (check) with checked/unchecked items |
| **bold** | TextNode with bold format |
| *italic* | TextNode with italic format |
| `code` | TextNode with code format |
| ~~strikethrough~~ | TextNode with strikethrough format |
| Plain text | ParagraphNode |
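The block-level rows of the table above amount to classifying each line by its prefix. The following sketch shows one possible way to do that. classifyLine and LineKind are hypothetical names, the regular expressions are assumptions, and inline formats such as bold and italic are deliberately left out.

```typescript
// Hypothetical line classifier; not the parser Blokhaus actually uses.
type LineKind =
  | { kind: "heading"; level: number; text: string }
  | { kind: "quote"; text: string }
  | { kind: "check"; checked: boolean; text: string }
  | { kind: "bullet"; text: string }
  | { kind: "number"; text: string }
  | { kind: "paragraph"; text: string };

function classifyLine(line: string): LineKind {
  // One to six # characters followed by whitespace start a heading.
  const heading = /^(#{1,6})\s+(.*)$/.exec(line);
  if (heading) {
    const [, hashes = "", text = ""] = heading;
    return { kind: "heading", level: hashes.length, text };
  }

  const quote = /^>\s?(.*)$/.exec(line);
  if (quote) return { kind: "quote", text: quote[1] ?? "" };

  // Check items must be tested before bullets: "- [x] item" also looks like a bullet.
  const check = /^-\s+\[([ xX])\]\s+(.*)$/.exec(line);
  if (check) {
    const [, mark = " ", text = ""] = check;
    return { kind: "check", checked: mark.toLowerCase() === "x", text };
  }

  const bullet = /^[-*]\s+(.*)$/.exec(line);
  if (bullet) return { kind: "bullet", text: bullet[1] ?? "" };

  const numbered = /^\d+\.\s+(.*)$/.exec(line);
  if (numbered) return { kind: "number", text: numbered[1] ?? "" };

  // Anything else is treated as paragraph text.
  return { kind: "paragraph", text: line };
}
```

The order of the tests matters in this sketch: the more specific check-item pattern runs before the general bullet pattern, and plain text is the fallback.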

Example

editor.update(() => {
  const nodes = $parseMarkdownToLexicalNodes(
    "# Hello\n\nThis is **bold** text.",
  );
  const root = $getRoot();
  root.clear();
  nodes.forEach((node) => root.append(node));
});

This function creates Lexical nodes that require an active editor context. Always call it inside editor.update(). Calling it outside an update context will throw.


parseVideoEmbed

Parses a URL string and returns embed information for known video providers. Pure function with no side effects.

Signature

function parseVideoEmbed(url: string): VideoEmbedInfo | null;

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| url | string | A URL string to parse. |

Returns

A VideoEmbedInfo object if the input is a valid HTTP(S) URL, or null otherwise (the input cannot be parsed as a URL, or it uses a scheme other than http or https).

interface VideoEmbedInfo {
  provider: "youtube" | "vimeo" | "loom" | "generic";
  embedUrl: string;
  thumbnailUrl?: string;
}

Supported providers

| Provider | Recognized URL patterns | Embed URL format |
| --- | --- | --- |
| YouTube | youtube.com/watch?v=ID, youtu.be/ID, youtube.com/embed/ID, youtube.com/shorts/ID, youtube.com/live/ID, m.youtube.com/watch?v=ID | https://www.youtube.com/embed/{ID} |
| Vimeo | vimeo.com/ID, player.vimeo.com/video/ID, vimeo.com/channels/*/ID | https://player.vimeo.com/video/{ID} |
| Loom | loom.com/share/ID, loom.com/embed/ID | https://www.loom.com/embed/{ID} |
| Generic | Any other valid HTTP(S) URL | The original URL (passed through) |

Examples

parseVideoEmbed("https://www.youtube.com/watch?v=dQw4w9WgXcQ");
// {
//   provider: 'youtube',
//   embedUrl: 'https://www.youtube.com/embed/dQw4w9WgXcQ',
//   thumbnailUrl: 'https://img.youtube.com/vi/dQw4w9WgXcQ/hqdefault.jpg',
// }

parseVideoEmbed("https://vimeo.com/123456789");
// {
//   provider: 'vimeo',
//   embedUrl: 'https://player.vimeo.com/video/123456789',
// }

parseVideoEmbed("https://www.loom.com/share/abc123def456");
// {
//   provider: 'loom',
//   embedUrl: 'https://www.loom.com/embed/abc123def456',
// }

parseVideoEmbed("https://example.com/my-video.mp4");
// {
//   provider: 'generic',
//   embedUrl: 'https://example.com/my-video.mp4',
// }

parseVideoEmbed("not a url");
// null

parseVideoEmbed("ftp://files.example.com/video.mp4");
// null (only http and https are supported)
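For readers curious how this kind of provider matching can work, the sketch below reproduces the documented behaviour with the standard URL class. It is not the library's implementation. sketchParseVideoEmbed is a hypothetical name, the interface is a local copy of the one shown above, and falling back to "generic" for unrecognized paths on a known host is an assumption.

```typescript
// Local copy of the documented interface so the sketch stands alone.
interface VideoEmbedInfo {
  provider: "youtube" | "vimeo" | "loom" | "generic";
  embedUrl: string;
  thumbnailUrl?: string;
}

// Hypothetical re-implementation for illustration only.
function sketchParseVideoEmbed(input: string): VideoEmbedInfo | null {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return null; // input is not parseable as a URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return null;

  // Treat www. and m. subdomains like the bare host.
  const host = url.hostname.replace(/^(www\.|m\.)/, "");
  const segments = url.pathname.split("/").filter(Boolean);
  const first = segments[0];

  if (host === "youtu.be" || host === "youtube.com") {
    let id: string | undefined;
    if (host === "youtu.be") id = first;
    else if (first === "watch") id = url.searchParams.get("v") ?? undefined;
    else if (first === "embed" || first === "shorts" || first === "live") id = segments[1];
    if (id) {
      return {
        provider: "youtube",
        embedUrl: `https://www.youtube.com/embed/${id}`,
        thumbnailUrl: `https://img.youtube.com/vi/${id}/hqdefault.jpg`,
      };
    }
  }

  if (host === "vimeo.com" || host === "player.vimeo.com") {
    // In all documented Vimeo patterns the numeric ID is the last path segment.
    const id = segments[segments.length - 1];
    if (id !== undefined && /^\d+$/.test(id)) {
      return { provider: "vimeo", embedUrl: `https://player.vimeo.com/video/${id}` };
    }
  }

  if (host === "loom.com" && (first === "share" || first === "embed")) {
    const id = segments[1];
    if (id) return { provider: "loom", embedUrl: `https://www.loom.com/embed/${id}` };
  }

  // Assumption: any other valid HTTP(S) URL passes through unchanged.
  return { provider: "generic", embedUrl: input };
}
```

Parsing with URL first means malformed input and non-HTTP(S) schemes are rejected before any provider matching happens, which corresponds to the two null cases in the examples above.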

Thumbnail support

Thumbnails are returned for YouTube videos only (https://img.youtube.com/vi/{ID}/hqdefault.jpg). Vimeo thumbnails require an API call and are not included. Loom thumbnails are not supported.