Markdown to HTML: How Conversion Works and Which Tools to Use
Converting markdown to HTML is the core operation behind every markdown-based website, documentation tool, and blog. Understanding how markdown-to-HTML conversion works helps you choose the right tool, troubleshoot unexpected output, and make informed decisions about sanitization and rendering security.
Quick Answer: Markdown-to-HTML conversion follows a two-step process: parsing to an AST (abstract syntax tree), then rendering to an HTML string. The top tools are the JavaScript parsers markdown-it (best spec compliance) and marked (fastest), plus the command-line converter Pandoc (widest format support). For untrusted input, always sanitize the output with DOMPurify or sanitize-html to prevent XSS attacks.
How Does Markdown-to-HTML Conversion Work?
A markdown processor takes a plain text input and produces an HTML string. Internally, most processors follow the same two-step approach:
- Parsing: The input is read and converted into an abstract syntax tree (AST). Each element (paragraph, heading, code block, link) becomes a node in the tree.
- Rendering: The AST is walked and each node is serialized to an HTML string.
This two-step approach is important because it allows processors to support plugins that transform the AST before rendering. A plugin can add nodes (for example, auto-inserting a table of contents), modify nodes (adding rel="noopener" to external links), or remove nodes (stripping certain elements from untrusted content).
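The parse, transform, and render stages can be sketched in a few lines. This is a toy pipeline, not how markdown-it or marked is implemented: it only understands `#` headings and plain paragraphs, and the function names (`parse`, `addHeadingIds`, `render`) are made up for illustration.

```javascript
// Toy two-step pipeline: parse to an AST, transform it, then render HTML.
// It handles only "#" headings and plain paragraphs; real parsers do far more.

// Escape characters that are significant in HTML text and attributes.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Step 1: parse blank-line-separated blocks into AST nodes.
function parse(source) {
  return source
    .split(/\n{2,}/)
    .map((block) => block.trim())
    .filter((block) => block.length > 0)
    .map((block) => {
      const heading = /^(#{1,6})\s+(.*)$/.exec(block);
      if (heading) {
        return { type: 'heading', level: heading[1].length, text: heading[2] };
      }
      return { type: 'paragraph', text: block };
    });
}

// Turn heading text into a lowercase, dash-separated slug.
function slugify(text) {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-|-$/g, '');
}

// Plugin-style transform: it works on the AST, never on the HTML string.
// Here it gives every heading node an id for the renderer to emit.
function addHeadingIds(ast) {
  return ast.map((node) =>
    node.type === 'heading' ? { ...node, id: slugify(node.text) } : node
  );
}

// Step 2: walk the AST and serialize each node to HTML.
function render(ast) {
  return ast
    .map((node) => {
      if (node.type === 'heading') {
        const idAttr = node.id ? ` id="${escapeHtml(node.id)}"` : '';
        return `<h${node.level}${idAttr}>${escapeHtml(node.text)}</h${node.level}>`;
      }
      return `<p>${escapeHtml(node.text)}</p>`;
    })
    .join('\n');
}

const html = render(addHeadingIds(parse('## Getting Started\n\nInstall the package.')));
console.log(html);
// <h2 id="getting-started">Getting Started</h2>
// <p>Install the package.</p>
```

Because the transform sits between the two steps, it can be added or removed without touching either the parser or the renderer, which is the property plugin systems rely on.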
A simple markdown input like this:

````markdown
## Getting Started

Install the package using npm:

```bash
npm install my-package
```

See the [documentation](https://example.com) for configuration options.
````

Produces output like this:

```html
<h2>Getting Started</h2>
<p>Install the package using npm:</p>
<pre><code class="language-bash">npm install my-package
</code></pre>
<p>See the <a href="https://example.com">documentation</a> for configuration options.</p>
```
The conversion is straightforward for standard markdown. The differences between processors become apparent when you use extended syntax, raw HTML, or edge cases in the spec.
What Is the Inline vs. Block Element Mapping?
Markdown maps cleanly to HTML’s inline/block distinction:
| Markdown Element | HTML Output | Type |
|---|---|---|
| `**bold**` | `<strong>bold</strong>` | Inline |
| `*italic*` | `<em>italic</em>` | Inline |
| `` `code` `` | `<code>code</code>` | Inline |
| `[link](url)` | `<a href="url">link</a>` | Inline |
| `![img](src)` | `<img src="src" alt="img">` | Inline |
| `# Heading` | `<h1>Heading</h1>` | Block |
| Paragraph text | `<p>text</p>` | Block |
| `> blockquote` | `<blockquote><p>...</p></blockquote>` | Block |
| `---` | `<hr>` | Block |
| Fenced code block | `<pre><code>...</code></pre>` | Block |
| Unordered list | `<ul><li>...</li></ul>` | Block |
| Ordered list | `<ol><li>...</li></ol>` | Block |
Understanding this mapping helps when styling the output. If you apply CSS to a rendered markdown page, you target standard HTML elements: h1, h2, p, ul, li, code, pre, blockquote, table, and so on.
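To make the inline rows of the mapping concrete, here is a minimal sketch of a single-pass inline renderer. It is an illustration only: `renderInline` and `INLINE_PATTERN` are hypothetical names, nesting (bold inside a link, for example) is not handled, and link URLs are not validated.

```javascript
// Toy inline renderer covering four inline rows of the mapping:
// code span, bold, italic, and link. Single pass, no nesting.

// Escape characters that are significant in HTML text and attributes.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// One alternative per construct, tried in this order at each position.
const INLINE_PATTERN = /`([^`]+)`|\*\*([^*]+)\*\*|\*([^*]+)\*|\[([^\]]+)\]\(([^)\s]+)\)/g;

function renderInline(text) {
  // Escape first so raw "<" or "&" in the source is emitted as text.
  return escapeHtml(text).replace(
    INLINE_PATTERN,
    (match, code, bold, italic, label, url) => {
      if (code !== undefined) return `<code>${code}</code>`;
      if (bold !== undefined) return `<strong>${bold}</strong>`;
      if (italic !== undefined) return `<em>${italic}</em>`;
      return `<a href="${url}">${label}</a>`;
    }
  );
}

console.log(renderInline('Use **bold**, *italic*, `code`, or a [link](https://example.com).'));
// Use <strong>bold</strong>, <em>italic</em>, <code>code</code>, or a <a href="https://example.com">link</a>.
```

Block elements need a separate pass because they depend on line structure, which is one reason real parsers tokenize blocks first and inline content second.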
What Are the Most Common Markdown-to-HTML Tools?
markdown-it
markdown-it is a JavaScript/Node.js parser that is CommonMark-compliant and highly extensible through a plugin architecture. It is the engine behind popular documentation tools such as VuePress and VitePress.
```javascript
import MarkdownIt from 'markdown-it';

const md = new MarkdownIt();
const html = md.render('# Hello\n\nThis is **markdown**.');
console.log(html);
// <h1>Hello</h1>
// <p>This is <strong>markdown</strong>.</p>
```
Adding plugins extends its capabilities:
```javascript
import MarkdownIt from 'markdown-it';
import markdownItFootnote from 'markdown-it-footnote';
import markdownItTaskLists from 'markdown-it-task-lists';

// Each .use() call registers one plugin and returns the parser for chaining.
const md = new MarkdownIt()
  .use(markdownItFootnote)
  .use(markdownItTaskLists);
```
markdown-it is the best choice for JavaScript/Node.js projects where you need a reliable CommonMark parser with plugin support.
marked
marked is another JavaScript parser, known for its simplicity and speed. It is less strict about spec compliance than markdown-it but processes large documents very quickly.
```javascript
import { marked } from 'marked';

const html = marked.parse('# Hello\n\nThis is **markdown**.');
console.log(html);
```
marked supports custom renderers, which allow you to override how specific elements are output:
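```javascript
import { marked, Renderer } from 'marked';

// Note: these positional arguments match older marked releases. Recent major
// versions pass a single token object to renderer methods instead, so check
// the documentation for the version you have installed.
const renderer = new Renderer();
renderer.link = (href, title, text) => {
  return `<a href="${href}" target="_blank" rel="noopener noreferrer">${text}</a>`;
};

marked.use({ renderer });
```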
marked is a good choice for simpler use cases or when raw performance is the priority.
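The renderer above opens every link in a new tab. If you only want that behavior for external links, a custom renderer can first check where the link points. The following is a minimal sketch using the built-in `URL` class; `isExternalLink` and the `siteOrigin` parameter are hypothetical names, not part of marked.

```javascript
// Decide whether a link leaves the current site, so a custom renderer can
// add target="_blank" and rel="noopener noreferrer" only where needed.
function isExternalLink(href, siteOrigin) {
  try {
    const site = new URL(siteOrigin);
    // Relative hrefs such as "/guide" or "#top" resolve against the site.
    const target = new URL(href, site);
    const isWeb = target.protocol === 'http:' || target.protocol === 'https:';
    return isWeb && target.origin !== site.origin;
  } catch {
    // Unparseable hrefs are treated as internal and left to the sanitizer.
    return false;
  }
}

console.log(isExternalLink('https://other.example/page', 'https://docs.example')); // true
console.log(isExternalLink('/guide', 'https://docs.example')); // false
```

Non-web schemes such as `mailto:` are reported as internal here, so they do not receive `target="_blank"`.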
Pandoc
Pandoc is a command-line document converter written in Haskell. It handles a much wider range of formats than any JavaScript library in this list: it reads and writes markdown, HTML, DOCX, EPUB, LaTeX, RST, and many others, and it can also produce PDF output.
```bash
pandoc input.md -o output.html
```
With syntax highlighting and a standalone HTML file:
```bash
pandoc input.md --highlight-style=pygments -s -o output.html
```
Pandoc's default input format is Pandoc Markdown, an extended Markdown dialect that adds footnotes, definition lists, superscript, subscript, math (rendered in HTML via MathJax or KaTeX), and more. For converting markdown to PDF, either via LaTeX or via HTML with an engine such as WeasyPrint, Pandoc is the standard tool. The Markdown to PDF Guide covers this workflow in detail.
showdown
showdown is a JavaScript bidirectional converter (markdown to HTML and HTML to markdown). It is the oldest of the JavaScript parsers listed here and predates the CommonMark spec. Its default behavior does not fully follow CommonMark, which can cause subtle differences in how edge cases are rendered.
```javascript
import showdown from 'showdown';

const converter = new showdown.Converter();
const html = converter.makeHtml('# Hello\n\nThis is **markdown**.');
```
showdown is worth knowing about because it appears in older codebases, but for new projects markdown-it or marked are better choices.
What Are the Output Quality Differences Between Processors?
The main quality differences between processors come down to:
- Spec compliance: markdown-it is the most strictly CommonMark-compliant of the JavaScript parsers. Pandoc follows its own extended spec but is also rigorous. marked and showdown have historical quirks.
- Extended syntax support: Pandoc supports the widest range of extensions. markdown-it supports them through plugins. marked has limited extension support beyond the core. For a full breakdown of which extensions exist and where they are supported, see Markdown Extended Syntax.
- Syntax highlighting: Pandoc has built-in syntax highlighting through its skylighting library, which uses Kate syntax definitions. JavaScript parsers typically delegate to a separate library like highlight.js or Shiki.
- Table rendering: All four tools support tables, but only markdown-it and Pandoc handle complex table edge cases reliably.
Why Is Sanitization Critical for Markdown-to-HTML Conversion?
When you render markdown from untrusted sources (user-submitted content, forum posts, comments), the HTML output must be sanitized before embedding in a webpage. This is critical for security. An unsanitized markdown renderer can be exploited through raw HTML injection:
```html
<script>document.cookie = 'stolen=true'</script>
```
Or through a link whose target is a javascript: URL:
```markdown
[Click me](javascript:alert('XSS'))
```
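One way to guard against the link-based attack is to check each link's scheme against an allowlist before it is emitted. The sketch below uses the built-in `URL` class; the `isSafeHref` name and the chosen set of schemes are assumptions for illustration, and a check like this complements a full sanitizer rather than replacing one.

```javascript
// Schemes treated as safe for links (an assumption for this sketch).
const SAFE_PROTOCOLS = new Set(['http:', 'https:', 'mailto:']);

function isSafeHref(href) {
  let url;
  try {
    // The placeholder base lets relative links such as "/docs" parse.
    // The URL parser also lowercases the scheme and ignores leading spaces.
    url = new URL(href.trim(), 'https://placeholder.invalid/');
  } catch {
    // Anything the parser rejects (or a non-string value) is unsafe.
    return false;
  }
  return SAFE_PROTOCOLS.has(url.protocol);
}

console.log(isSafeHref("javascript:alert('XSS')")); // false
console.log(isSafeHref('https://example.com')); // true
console.log(isSafeHref('/docs/setup')); // true
```

Parsing the URL instead of matching on a `javascript:` prefix avoids missing variants that differ only in letter case or leading whitespace.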
Sanitization with DOMPurify
DOMPurify is a widely used client-side sanitization library for HTML strings:
```javascript
import MarkdownIt from 'markdown-it';
import DOMPurify from 'dompurify';

const md = new MarkdownIt({ html: false }); // Disable raw HTML first
// untrustedMarkdownInput is the markdown string received from the user
const rawHtml = md.render(untrustedMarkdownInput);
const cleanHtml = DOMPurify.sanitize(rawHtml);
document.getElementById('output').innerHTML = cleanHtml;
```
The best approach is to disable raw HTML in the parser itself (html: false in markdown-it, which is also its default) and then additionally sanitize the output. This is defense in depth: if one layer misses something, the other still applies.
Server-Side Sanitization
For Node.js server rendering, use the sanitize-html package:
```javascript
import sanitizeHtml from 'sanitize-html';

// rawHtml is the HTML string produced by your markdown renderer
const clean = sanitizeHtml(rawHtml, {
  allowedTags: ['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'p', 'ul', 'ol', 'li',
    'code', 'pre', 'blockquote', 'strong', 'em', 'a', 'img',
    'table', 'thead', 'tbody', 'tr', 'th', 'td'],
  allowedAttributes: {
    'a': ['href', 'title'],
    'img': ['src', 'alt'],
    'code': ['class'],
  },
});
```
Being explicit about which tags and attributes are allowed is safer than relying on a denylist.
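The difference between the two strategies shows up as soon as an unanticipated tag appears. The sketch below compares them on a plain list of tag names; it is not an HTML sanitizer, and the two sets are arbitrary examples chosen for illustration.

```javascript
// Compare allowlist and denylist filtering on a list of tag names.
const ALLOWED_TAGS = new Set(['p', 'a', 'code', 'pre', 'strong', 'em']);
const DENIED_TAGS = new Set(['script', 'iframe']);

// Allowlist: keep a tag only if it was explicitly approved.
const keepWithAllowlist = (tags) =>
  tags.filter((tag) => ALLOWED_TAGS.has(tag.toLowerCase()));

// Denylist: keep every tag that was not explicitly banned.
const keepWithDenylist = (tags) =>
  tags.filter((tag) => !DENIED_TAGS.has(tag.toLowerCase()));

const seenTags = ['p', 'script', 'object', 'svg', 'A'];
console.log(keepWithAllowlist(seenTags)); // [ 'p', 'A' ]
console.log(keepWithDenylist(seenTags)); // [ 'p', 'object', 'svg', 'A' ]
```

The denylist lets `object` and `svg` through simply because nobody thought to ban them, while the allowlist rejects anything it does not recognize.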
How Do You Embed Converted HTML in a Webpage?
Once you have your sanitized HTML string, embedding it is straightforward. In a React component:
```tsx
type Props = {
  content: string; // Pre-sanitized HTML string
};

export function MarkdownContent({ content }: Props) {
  return (
    <article
      className="prose prose-neutral max-w-none"
      dangerouslySetInnerHTML={{ __html: content }}
    />
  );
}
```
The Tailwind Typography plugin (prose class) applies clean, readable default styles to the rendered HTML elements (headings, paragraphs, lists, code blocks, blockquotes). It removes the need to write custom CSS for each element type.
For server components in Next.js, you can render markdown during the build step and pass the resulting HTML string as a prop, keeping the markdown parsing work off the client entirely.
How Do You Choose the Right Markdown-to-HTML Tool?
| Use Case | Recommended Tool |
|---|---|
| JavaScript/Node.js web app | markdown-it |
| Simple fast rendering | marked |
| CLI conversion, PDF, DOCX output | Pandoc |
| Legacy project | showdown (maintain) or migrate to markdown-it |
| User-generated content | markdown-it + DOMPurify |
| Extended syntax (footnotes, math) | Pandoc or markdown-it with plugins |
For most modern JavaScript projects, markdown-it with appropriate plugins and DOMPurify for untrusted content is the right combination. Pandoc is the right choice for anything that needs to go to PDF, Word, or other non-HTML formats.
If you want to see live markdown-to-HTML conversion in action without writing any code, edtr.md renders your markdown instantly in the browser as you type.