mdast-util-to-hast
mdast utility to transform to hast.
Contents
* [`defaultFootnoteBackContent(referenceIndex, rereferenceIndex)`](#defaultfootnotebackcontentreferenceindex-rereferenceindex)
* [`defaultFootnoteBackLabel(referenceIndex, rereferenceIndex)`](#defaultfootnotebacklabelreferenceindex-rereferenceindex)
* [`defaultHandlers`](#defaulthandlers)
* [`toHast(tree[, options])`](#tohasttree-options)
* [`FootnoteBackContentTemplate`](#footnotebackcontenttemplate)
* [`FootnoteBackLabelTemplate`](#footnotebacklabeltemplate)
* [`Handler`](#handler)
* [`Handlers`](#handlers)
* [`Options`](#options)
* [`Raw`](#raw)
* [`State`](#state)
* [Example: supporting HTML in markdown naïvely](#example-supporting-html-in-markdown-naïvely)
* [Example: supporting HTML in markdown properly](#example-supporting-html-in-markdown-properly)
* [Example: footnotes in languages other than English](#example-footnotes-in-languages-other-than-english)
* [Example: supporting custom nodes](#example-supporting-custom-nodes)
* [Default handling](#default-handling)
* [Fields on nodes](#fields-on-nodes)
* [Nodes](#nodes)
What is this?
This package is a utility that takes an mdast (markdown) syntax tree as input and turns it into a hast (HTML) syntax tree.
When should I use this?
This project is useful when you want to deal with ASTs and turn markdown to HTML.
The hast utility `hast-util-to-mdast` does the inverse of this utility. It turns HTML into markdown.
The remark plugin `remark-rehype` wraps this utility to also turn markdown to HTML at a higher-level (easier) abstraction.
Install
This package is ESM only. In Node.js (version 16+), install with npm:
npm install mdast-util-to-hast
In Deno with esm.sh:
import {toHast} from 'https://esm.sh/mdast-util-to-hast@13'
In browsers with esm.sh:
<script type="module">
  import {toHast} from 'https://esm.sh/mdast-util-to-hast@13?bundle'
</script>
Use
Say we have the following `example.md`:
## Hello **World**!
…and next to it a module `example.js`:
import fs from 'node:fs/promises'
import {toHtml} from 'hast-util-to-html'
import {fromMarkdown} from 'mdast-util-from-markdown'
import {toHast} from 'mdast-util-to-hast'
const markdown = String(await fs.readFile('example.md'))
const mdast = fromMarkdown(markdown)
const hast = toHast(mdast)
const html = toHtml(hast)
console.log(html)
…now running `node example.js` yields:
<h2>Hello <strong>World</strong>!</h2>
API
This package exports the identifiers `defaultFootnoteBackContent`, `defaultFootnoteBackLabel`, `defaultHandlers`, and `toHast`.
There is no default export.
defaultFootnoteBackContent(referenceIndex, rereferenceIndex)
Generate the default content that GitHub uses on backreferences.
Parameters
* `referenceIndex` (`number`): index of the definition in the order that they are first referenced, 0-indexed
* `rereferenceIndex` (`number`): index of calls to the same definition, 0-indexed
Returns
Content (`Array<ElementContent>`).
defaultFootnoteBackLabel(referenceIndex, rereferenceIndex)
Generate the default label that GitHub uses on backreferences.
Parameters
* `referenceIndex` (`number`): index of the definition in the order that they are first referenced, 0-indexed
* `rereferenceIndex` (`number`): index of calls to the same definition, 0-indexed
Returns
Label (`string`).
defaultHandlers
Default handlers for nodes (`Handlers`).
toHast(tree[, options])
Transform mdast to hast.
Parameters
* `tree` (`MdastNode`): mdast tree
* `options` (`Options`, optional): configuration
Returns
hast tree (`HastNode`).
Notes
HTML
Raw HTML is available in mdast as `html` nodes and can be embedded in hast as semistandard `raw` nodes.
Most utilities ignore `raw` nodes but two notable ones don’t:
* `hast-util-to-html` also has an option `allowDangerousHtml` which will output the raw HTML. This is typically discouraged as noted by the option name but is useful if you completely trust authors
* `hast-util-raw` can handle the raw embedded HTML strings by parsing them into standard hast nodes (`element`, `text`, etc). This is a heavy task as it needs a full HTML parser, but it is the only way to support untrusted content
Footnotes
Many options supported here relate to footnotes. Footnotes are not specified by CommonMark, which we follow by default. They are supported by GitHub, so footnotes can be enabled in markdown with `mdast-util-gfm`.
The options `footnoteBackLabel` and `footnoteLabel` define natural language that explains footnotes, which is hidden for sighted users but shown to assistive technology. When your page is not in English, you must define translated values.
Back references use ARIA attributes, but the section label itself uses a heading that is hidden with an `sr-only` class. To show it to sighted users, define different attributes in `footnoteLabelProperties`.
Clobbering
Footnotes introduce a problem, as they link footnote calls to footnote definitions on the page through `id` attributes generated from user content, which results in DOM clobbering.
DOM clobbering is this:
<p id=x></p>
<script>alert(x) // `x` now refers to the DOM `p#x` element</script>
Elements by their ID are made available by browsers on the `window` object, which is a security risk. Using a prefix solves this problem.
More information on how to handle clobbering and the prefix is explained in Example: headings (DOM clobbering) in `rehype-sanitize`.
Unknown nodes
Unknown nodes are nodes with a type that isn’t in `handlers` or `passThrough`. The default behavior for unknown nodes is:
- when the node has a `value` (and doesn’t have `data.hName`, `data.hProperties`, or `data.hChildren`, see later), create a hast `text` node
- otherwise, create a `<div>` element (which could be changed with `data.hName`), with its children mapped from mdast to hast as well
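The default behavior described in the list above can be sketched in plain JavaScript. This is a simplified illustration, not the library's implementation: `sketchUnknown` and its `mapChildren` parameter are hypothetical names, and `data.hProperties` and `data.hChildren` are only checked for presence, not applied.

```js
// Simplified sketch of the default handling of unknown nodes.
// `sketchUnknown` and `mapChildren` are hypothetical names for illustration.
function sketchUnknown(node, mapChildren) {
  const data = node.data || {}
  const hasOverride =
    'hName' in data || 'hProperties' in data || 'hChildren' in data

  // A node with a `value` and no overrides becomes a hast `text` node.
  if ('value' in node && !hasOverride) {
    return {type: 'text', value: node.value}
  }

  // Anything else becomes an element: `<div>` unless `data.hName` is set.
  return {
    type: 'element',
    tagName: data.hName || 'div',
    properties: {},
    children: mapChildren(node.children || [])
  }
}

const asText = sketchUnknown({type: 'custom', value: 'hi'}, (nodes) => nodes)
const asDiv = sketchUnknown({type: 'custom', children: []}, (nodes) => nodes)
console.log(asText.type, asDiv.tagName)
```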
This behavior can be changed by passing an `unknownHandler`.
FootnoteBackContentTemplate
Generate content for the backreference dynamically.
For the following markdown:
Alpha[^micromark], bravo[^micromark], and charlie[^remark].
This function will be called with:
* `0` and `0` for the backreference from `things about micromark` to `alpha`, as it is the first used definition, and the first call to it
* `0` and `1` for the backreference from `things about micromark` to `bravo`, as it is the first used definition, and the second call to it
* `1` and `0` for the backreference from `things about remark` to `charlie`, as it is the second used definition
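To make the index pairs above concrete, the following sketch derives them from the order of footnote calls in a document. `backreferenceIndices` is a hypothetical helper written for this illustration and follows the 0-indexed convention documented here; it is not part of this package.

```js
// Hypothetical helper: given the identifiers of footnote calls in document
// order, compute the (referenceIndex, rereferenceIndex) pair for each call.
function backreferenceIndices(calls) {
  const order = [] // identifiers in the order they are first referenced
  const counts = new Map() // how often each identifier was called so far

  return calls.map((id) => {
    if (!counts.has(id)) {
      order.push(id)
      counts.set(id, 0)
    }

    const rereferenceIndex = counts.get(id)
    counts.set(id, rereferenceIndex + 1)
    return [order.indexOf(id), rereferenceIndex]
  })
}

// Calls from the markdown above: alpha, bravo, charlie.
const pairs = backreferenceIndices(['micromark', 'micromark', 'remark'])
console.log(JSON.stringify(pairs)) // prints [[0,0],[0,1],[1,0]]
```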
Parameters
* `referenceIndex` (`number`): index of the definition in the order that they are first referenced, 0-indexed
* `rereferenceIndex` (`number`): index of calls to the same definition, 0-indexed
Returns
Content for the backreference when linking back from definitions to their reference (`Array<ElementContent>`, `ElementContent`, or `string`).
FootnoteBackLabelTemplate
Generate a back label dynamically.
For the following markdown:
Alpha[^micromark], bravo[^micromark], and charlie[^remark].
This function will be called with:
* `0` and `0` for the backreference from `things about micromark` to `alpha`, as it is the first used definition, and the first call to it
* `0` and `1` for the backreference from `things about micromark` to `bravo`, as it is the first used definition, and the second call to it
* `1` and `0` for the backreference from `things about remark` to `charlie`, as it is the second used definition
Parameters
* `referenceIndex` (`number`): index of the definition in the order that they are first referenced, 0-indexed
* `rereferenceIndex` (`number`): index of calls to the same definition, 0-indexed
Returns
Back label to use when linking back from definitions to their reference (`string`).
Handler
Handle a node (TypeScript type).
Parameters
* `state` (`State`): info passed around
* `node` (`MdastNode`): node to handle
* `parent` (`MdastNode | undefined`): parent of `node`
Returns
Result (`Array<HastNode> | HastNode | undefined`).
Handlers
Handle nodes (TypeScript type).
Type
type Handlers = Partial<Record<Nodes['type'], Handler>>
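As an illustration of the shape of a `Handlers` object, here is a sketch with one handler that overrides how `thematicBreak` nodes are turned into hast. The `divider` class name and the `stubState` object are assumptions made so the snippet runs on its own; in real use, the object would be passed as `options.handlers` to `toHast`, which supplies the actual `state`.

```js
// A handlers map: keys are mdast node types, values are handler functions
// that receive `(state, node, parent)` and return hast.
const handlers = {
  thematicBreak(state, node) {
    const result = {
      type: 'element',
      tagName: 'hr',
      properties: {className: ['divider']}, // hypothetical class name
      children: []
    }
    // Let the state patch the result and honor `data` fields of the node.
    state.patch(node, result)
    return state.applyData(node, result)
  }
}

// Stand-in for the real `state`, only so this sketch can run by itself.
const stubState = {patch() {}, applyData: (from, to) => to}
const hr = handlers.thematicBreak(stubState, {type: 'thematicBreak'})
console.log(hr.tagName, hr.properties.className)
```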
Options
Configuration (TypeScript type).
Fields
* `allowDangerousHtml` (`boolean`, default: `false`): whether to persist raw HTML in markdown in the hast tree
* `clobberPrefix` (`string`, default: `'user-content-'`): prefix to use before the `id` property on footnotes to prevent them from *clobbering*
* `file` (`VFile`, optional): corresponding virtual file representing the input document
* `footnoteBackContent` (`FootnoteBackContentTemplate` or `string`, default: `defaultFootnoteBackContent`): content of the backreference back to references
* `footnoteBackLabel` (`FootnoteBackLabelTemplate` or `string`, default: `defaultFootnoteBackLabel`): label to describe the backreference back to references
* `footnoteLabel` (`string`, default: `'Footnotes'`): label to use for the footnotes section (affects screen readers)
* `footnoteLabelProperties` (`Properties`, default: `{className: ['sr-only']}`): properties to use on the footnote label (note that `id: 'footnote-label'` is always added as footnote calls use it with `aria-describedby` to provide an accessible label)
* `footnoteLabelTagName` (`string`, default: `'h2'`): tag name to use for the footnote label
* `handlers` (`Handlers`, optional): extra handlers for nodes
* `passThrough` (`Array<Nodes['type']>`, optional): list of custom mdast node types to pass through (keep) in hast (note that the node itself is passed, but eventual children are transformed)
* `unknownHandler` (`Handler`, optional): handle all unknown nodes
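For orientation, a configuration object that combines several of these fields might look as follows. The concrete values (`'doc-'`, `'Notes'`, `'h3'`, `'visually-hidden'`, `'customNode'`) are made up for this sketch; only the field names come from the documentation above.

```js
// Example configuration; all values are illustrative assumptions.
const options = {
  allowDangerousHtml: false, // drop raw HTML (the documented default)
  clobberPrefix: 'doc-', // prefix for footnote `id`s
  footnoteLabel: 'Notes', // text of the footnotes section label
  footnoteLabelTagName: 'h3', // tag name used for that label
  footnoteLabelProperties: {className: ['visually-hidden']},
  passThrough: ['customNode'] // custom node type kept as-is in hast
}

console.log(Object.keys(options).length)
```

Such an object would be passed as the second argument, as in `toHast(tree, options)`.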
Raw
Raw string of HTML embedded into HTML AST (TypeScript type).
Type
import type {Data, Literal} from 'hast'
interface Raw extends Literal {
type: 'raw'
data?: RawData | undefined
}
interface RawData extends Data {}
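In other words, a `raw` node is a literal whose `value` holds the HTML source. The sketch below shows such a node and a small type guard; `isRaw` is a hypothetical helper for illustration, not an export of this package.

```js
// A `raw` node as it could appear in a hast tree.
const rawNode = {type: 'raw', value: '<i>works</i>'}

// Hypothetical guard: check that something looks like a `raw` node.
function isRaw(node) {
  return (
    typeof node === 'object' &&
    node !== null &&
    node.type === 'raw' &&
    typeof node.value === 'string'
  )
}

console.log(isRaw(rawNode), isRaw({type: 'text', value: 'x'}))
```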
State
Info passed around about the current state (TypeScript type).
Fields
* `all` (`(node: MdastNode) => Array<HastNode>`): transform the children of an mdast parent to hast
* `applyData` (`<Type extends HastNode>(from: MdastNode, to: Type) => Type | HastElement`): honor the `data` of `from` and maybe generate an element instead of `to`
* `definitionById` (`Map<string, Definition>`): definitions by their uppercased identifier
* `footnoteById` (`Map<string, FootnoteDefinition>`): footnote definitions by their uppercased identifier
* `footnoteCounts` (`Map<string, number>`): counts for how often the same footnote was called
* `footnoteOrder` (`Array<string>`): identifiers in the order in which footnote calls first appear in the tree
* `handlers` (`Handlers`): applied node handlers
* `one` (`(node: MdastNode, parent: MdastNode | undefined) => HastNode | Array<HastNode> | undefined`): transform an mdast node to hast
* `options` (`Options`): configuration
* `patch` (`(from: MdastNode, to: HastNode) => undefined`): copy a node’s positional info
* `wrap` (`<Type extends HastNode>(nodes: Array<Type>, loose?: boolean) => Array<Type | HastText>`): wrap `nodes` with line endings between each node, adds initial/final line endings when `loose`
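The description of `wrap` can be read as the following sketch. `wrapSketch` is a hypothetical reimplementation based only on the wording above, and the library's own function may differ in edge cases such as empty input.

```js
// Hypothetical reading of `wrap`: put a line ending between each node and,
// when `loose`, also before the first and after the last node.
function wrapSketch(nodes, loose) {
  const lineEnding = () => ({type: 'text', value: '\n'})
  const result = []

  if (loose) result.push(lineEnding()) // initial line ending

  nodes.forEach((node, index) => {
    if (index > 0) result.push(lineEnding()) // between each node
    result.push(node)
  })

  if (loose && nodes.length > 0) result.push(lineEnding()) // final line ending
  return result
}

const items = [
  {type: 'element', tagName: 'li', properties: {}, children: []},
  {type: 'element', tagName: 'li', properties: {}, children: []}
]
console.log(wrapSketch(items).length, wrapSketch(items, true).length)
```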
Examples
Example: supporting HTML in markdown naïvely
If you completely trust authors (or plugins) and want to allow them to use HTML in markdown, and the last utility has an `allowDangerousHtml` option as well (such as `hast-util-to-html`), you can pass `allowDangerousHtml` to this utility (`mdast-util-to-hast`):
import {fromMarkdown} from 'mdast-util-from-markdown'
import {toHast} from 'mdast-util-to-hast'
import {toHtml} from 'hast-util-to-html'
const markdown = 'It <i>works</i>! <img onerror="alert(1)">'
const mdast = fromMarkdown(markdown)
const hast = toHast(mdast, {allowDangerousHtml: true})
const html = toHtml(hast, {allowDangerousHtml: true})
console.log(html)
…now running `node example.js` yields:
<p>It <i>works</i>! <img onerror="alert(1)"></p>
⚠️ Danger: observe that the XSS attack through the `onerror` attribute is still present.
Example: supporting HTML in markdown properly
If you do not trust the authors of the input markdown, or if you want to make sure that further utilities can see HTML embedded in markdown, use `hast-util-raw`. The following example passes `allowDangerousHtml` to this utility (`mdast-util-to-hast`), then turns the raw embedded HTML into proper HTML nodes (`hast-util-raw`), and finally sanitizes the HTML by only allowing safe things (`hast-util-sanitize`):
import {raw} from 'hast-util-raw'
import {sanitize} from 'hast-util-sanitize'
import {toHtml} from 'hast-util-to-html'
import {fromMarkdown} from 'mdast-util-from-markdown'
import {toHast} from 'mdast-util-to-hast'
const markdown = 'It <i>works</i>! <img onerror="alert(1)">'
const mdast = fromMarkdown(markdown)
const hast = raw(toHast(mdast, {allowDangerousHtml: true}))
const safeHast = sanitize(hast)
const html = toHtml(safeHast)
console.log(html)
…now running `node example.js` yields:
<p>It <i>works</i>! <img></p>
👉 Note: observe that the XSS attack through the `onerror` attribute is no longer present.
Example: footnotes in languages other than English
If you know that the markdown is authored in a language other than English, and you’re using `micromark-extension-gfm` and `mdast-util-gfm` to match how GitHub renders markdown, and you know that footnotes are (or can be) used, you should translate the labels associated with them.
Let’s first set the stage:
import {toHtml} from 'hast-util-to-html'
import {gfm} from 'micromark-extension-gfm'
import {fromMarkdown} from 'mdast-util-from-markdown'
import {gfmFromMarkdown} from 'mdast-util-gfm'
import {toHast} from 'mdast-util-to-hast'
const markdown = 'Bonjour[^1]\n\n[^1]: Monde!'
const mdast = fromMarkdown(markdown, {
extensions: [gfm()],
mdastExtensions: [gfmFromMarkdown()]
})
const hast = toHast(mdast)
const html = toHtml(hast)
console.log(html)
…now running `node example.js` yields:
<p>Bonjour<sup><a href="#user-content-fn-1" id="user-content-fnref-1" data-footnote-ref aria-describedby="footnote-label">1</a></sup></p>
<section data-footnotes class="footnotes"><h2 class="sr-only" id="footnote-label">Footnotes</h2>
<ol>
<li id="user-content-fn-1">
<p>Monde! <a href="#user-content-fnref-1" data-footnote-backref="" aria-label="Back to reference 1" class="data-footnote-backref">↩</a></p>
</li>
</ol>
</section>
This is a mix of English and French that screen readers can’t handle nicely. Let’s say our program does know that the markdown is in French. In that case, it’s important to translate and define the labels relating to footnotes so that screen readers can properly pronounce the page:
@@ -9,7 +9,16 @@ const mdast = fromMarkdown(markdown, {
extensions: [gfm()],
mdastExtensions: [gfmFromMarkdown()]
})
-const hast = toHast(mdast)
+const hast = toHast(mdast, {
+ footnoteLabel: 'Notes de bas de page',
+ footnoteBackLabel(referenceIndex, rereferenceIndex) {
+ return (
+ 'Retour à la référence ' +
+ (referenceIndex + 1) +
+ (rereferenceIndex > 1 ? '-' + rereferenceIndex : '')
+ )
+ }
+})
const html = toHtml(hast)
console.log(html)
…now running `node example.js` with the above patch applied yields:
@@ -1,8 +1,8 @@
<p>Bonjour<sup><a href="#user-content-fn-1" id="user-content-fnref-1" data-footnote-ref aria-describedby="footnote-label">1</a></sup></p>
-<section data-footnotes class="footnotes"><h2 class="sr-only" id="footnote-label">Footnotes</h2>
+<section data-footnotes class="footnotes"><h2 class="sr-only" id="footnote-label">Notes de bas de page</h2>
<ol>
<li id="user-content-fn-1">
-<p>Monde! <a href="#user-content-fnref-1" data-footnote-backref="" aria-label="Back to reference 1" class="data-footnote-backref">↩</a></p>
+<p>Monde! <a href="#user-content-fnref-1" data-footnote-backref="" aria-label="Retour à la référence 1" class="data-footnote-backref">↩</a></p>
</li>
</ol>
</section>
Example: supporting custom nodes
This project supports CommonMark and the GFM constructs (footnotes, strikethrough, tables) and the frontmatter constructs YAML and TOML. Support can be extended to other constructs in two ways: a) with handlers, b) through fields on nodes.
For example, when we represent a mark element in markdown and want to turn it into a `<mark>` element in HTML, we can use a handler:
import {toHtml} from 'hast-util-to-html'
import {toHast} from 'mdast-util-to-hast'
const mdast = {
type: 'paragraph',
children: [{type: 'mark', children: [{type: 'text', value: 'x'}]}]
}
const hast = toHast(mdast, {
handlers: {
mark(state, node) {
return {
type: 'element',
tagName: 'mark',
properties: {},
children: state.all(node)
}
}
}
})
console.log(toHtml(hast))
We can do the same through certain fields on nodes:
import {toHtml} from 'hast-util-to-html'
import {toHast} from 'mdast-util-to-hast'
const mdast = {
type: 'paragraph',
children: [
{
type: 'mark',
children: [{type: 'text', value: 'x'}],
data: {hName: 'mark'}
}
]
}
console.log(toHtml(toHast(mdast)))
Algorithm
This project by default handles CommonMark, GFM (footnotes, strikethrough, tables) and common frontmatter (YAML, TOML).
Existing handlers can be overwritten and handlers for more nodes can be added. It’s also possible to define how mdast is turned into hast through fields on nodes.
Default handling
The following table gives insight into what input turns into what output:
mdast node | markdown example | hast node | html example |
---|---|---|---|
blockquote | | element (blockquote) | |
break | | element (br) | |
code | | element (pre and code) | |
delete (GFM) | | element (del) | |
emphasis | | element (em) | |
footnoteReference, footnoteDefinition (GFM) | | element (section, sup, a) | |
heading | | element (h1…h6) | |
html | | Nothing (default), raw (when allowDangerousHtml: true) | n/a |
image | | element (img) | |
imageReference, definition | | element (img) | |
inlineCode | | element (code) | |
link | | element (a) | |
linkReference, definition | | element (a) | |
list, listItem | | element (li and ol or ul) | |
paragraph | | element (p) | |
root | | root | |
strong | | element (strong) | |
text | | text | |
table, tableRow, tableCell | | element (table, thead, tbody, tr, td, th) | |
thematicBreak | | element (hr) | |
toml (frontmatter) | | Nothing | n/a |
yaml (frontmatter) |
```markdownfenced: yes
…yields (hast):
|