
HTML Entity Decoder

Decode HTML entities (&amp;, &#x2026;, &quot;, &#39;, etc.) back to their original characters. Handles named, decimal, and hex entities. Bidirectional: an encode mode is also available.


HTML Entities, Decoded and Explained

HTML entities are character references that let you embed special characters in HTML source without breaking the markup. The most common ones — &amp;, &lt;, &gt;, &quot;, &#39; — encode the five characters that have syntactic meaning in HTML and would otherwise be parsed as markup or attribute boundaries. Beyond those, entities can represent any Unicode character, from accented letters to emoji.

This tool handles three formats interchangeably:

  • Named entities: &amp;, &copy;, &mdash;, &hellip;. HTML 4 defined about 250 of these; the HTML5 spec extends the list to more than 2,000 named character references. The decoder ships with the full list.
  • Decimal numeric: &#38;, &#169;, &#8212;. The number is the Unicode codepoint in base 10.
  • Hexadecimal numeric: &#x26;, &#xA9;, &#x2014;. Same codepoint, base 16. Both &#X26; (uppercase X) and &#x26; (lowercase x) are valid.
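The three syntaxes differ only in how the codepoint is spelled. A minimal Python sketch using the standard library's html module; to_numeric_refs is an illustrative helper name, not part of this tool:

```python
import html

def to_numeric_refs(text: str, use_hex: bool = False) -> str:
    """Write every character as a numeric reference (decimal by default)."""
    template = "&#x{:X};" if use_hex else "&#{};"
    return "".join(template.format(ord(ch)) for ch in text)

# U+00A9 (the copyright sign) is 169 in base 10 and A9 in base 16.
assert ord("\u00a9") == 169 == 0xA9
print(to_numeric_refs("\u00a9"))                # &#169;
print(to_numeric_refs("\u00a9", use_hex=True))  # &#xA9;

# The named, decimal, and hex spellings all decode to the same character.
assert (html.unescape("&copy;") == html.unescape("&#169;")
        == html.unescape("&#xA9;") == html.unescape("&#XA9;"))
```

Writing every character as a decimal reference is also the form used when obfuscating mailto: addresses.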

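All three formats need a trailing semicolon to be parsed reliably. The HTML spec treats a missing semicolon as a parse error, although browsers still decode some such references (for example &amp written without the semicolon) for backward compatibility. This tool does not: it decodes only semicolon-terminated references.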
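Python's html.unescape follows the HTML5 error-recovery rules, so it still decodes legacy names such as &amp without a semicolon. A strict variant that only decodes semicolon-terminated references can be sketched on top of the standard library's html.entities.html5 table. strict_unescape is a hypothetical name, and this is an illustration, not the tool's actual implementation:

```python
import html
import re
from html.entities import html5  # maps names such as "amp;" to "&"

# Only semicolon-terminated references are candidates.
STRICT_REF = re.compile(r"&(#[xX][0-9A-Fa-f]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")

def strict_unescape(text: str) -> str:
    def replace(m):
        body = m.group(1)
        if body[:2] in ("#x", "#X"):
            cp = int(body[2:], 16)
        elif body[0] == "#":
            cp = int(body[1:])
        else:
            # Known names are replaced; unknown names are left untouched.
            return html5.get(body + ";", m.group(0))
        # Out-of-range and surrogate codepoints are left untouched
        # (no special handling for NUL or other invalid codepoints).
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return m.group(0)
        return chr(cp)
    return STRICT_REF.sub(replace, text)

print(html.unescape("Tom &amp Jerry"))    # Tom & Jerry     (lenient)
print(strict_unescape("Tom &amp Jerry"))  # Tom &amp Jerry  (strict)
```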

Why You Need to Decode

You'll encounter encoded text in three main places:

  • Web scraping — page source has Tom &amp; Jerry; you want Tom & Jerry in your database.
  • Double-escaped JSON — an API returns a JSON string field whose value contains HTML, and that value was HTML-encoded before being JSON-encoded. You need to JSON-parse, then HTML-decode.
  • RSS/Atom feeds — feeds frequently encode the article body as HTML inside a CDATA or escaped string. To extract plain text, decode then strip tags.
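The double-escaped JSON and feed cases come down to the same order of operations: JSON-parse, HTML-decode, then strip tags. A Python sketch using only the standard library; the payload is a made-up example, and the HTMLParser subclass is a rough tag stripper for illustration, not a sanitizer:

```python
import html
import json
from html.parser import HTMLParser

class _TextCollector(HTMLParser):
    """Collects text nodes and drops tags (script/style contents are kept as text)."""
    def __init__(self):
        super().__init__(convert_charrefs=True)  # entities inside text are decoded for us
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def html_to_text(fragment: str) -> str:
    parser = _TextCollector()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts)

# Hypothetical API payload: an HTML body that was HTML-encoded, then JSON-encoded.
payload = '{"body": "&lt;p&gt;Tom &amp;amp; Jerry&lt;/p&gt;"}'
body = json.loads(payload)["body"]  # step 1: JSON-parse
markup = html.unescape(body)        # step 2: HTML-decode -> "<p>Tom &amp; Jerry</p>"
print(html_to_text(markup))         # step 3: strip tags  -> Tom & Jerry
```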

Why You Need to Encode

The opposite problem is cross-site scripting (XSS). If you paste user-supplied text into HTML without encoding it, an input like <script>alert('xss')</script> becomes executable JavaScript in the visitor's browser. The fix is to encode the five HTML-special characters as entities before embedding:

  • & → &amp;
  • < → &lt;
  • > → &gt;
  • " → &quot; (required only inside attribute values, but harmless elsewhere)
  • ' → &#39; (same)
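The same five replacements as a Python sketch; encode_entities is an illustrative name. The standard library's html.escape does the equivalent job but writes the apostrophe as &#x27; instead of &#39;, which decodes to the same character:

```python
import html

# The five replacements from the list above. str.translate maps each input
# character independently, so the "&" produced inside "&lt;" is never re-escaped.
_HTML_ESCAPES = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
})

def encode_entities(text: str) -> str:
    return text.translate(_HTML_ESCAPES)

print(encode_entities("<script>alert('xss')</script>"))
# &lt;script&gt;alert(&#39;xss&#39;)&lt;/script&gt;
print(html.escape("it's"))  # it&#x27;s  (stdlib uses the hex form for ')
```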

Modern frameworks (React, Vue, Svelte, Angular) encode by default in template expressions, so you only have to think about it when bypassing the framework with dangerouslySetInnerHTML, v-html, or an equivalent. For server-side templating without auto-escaping (some older PHP, HTML strings concatenated by hand in Express, EJS's unescaped <%- tag, etc.), encoding is your responsibility.
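Auto-escaping, and the explicit opt-out that dangerouslySetInnerHTML or v-html represents, can be modeled in a few lines. This is a toy sketch built on Python's string.Template; Raw and render are hypothetical names, not any framework's API:

```python
import html
from string import Template

class Raw(str):
    """Marks a value as trusted HTML that must not be escaped (the 'bypass' case)."""

def render(template: str, **values) -> str:
    # Escape every substituted value unless it was explicitly wrapped in Raw.
    safe = {
        name: value if isinstance(value, Raw) else html.escape(str(value))
        for name, value in values.items()
    }
    return Template(template).substitute(safe)

user = "<script>alert('xss')</script>"
print(render("<p>Hello, $name</p>", name=user))
# <p>Hello, &lt;script&gt;alert(&#x27;xss&#x27;)&lt;/script&gt;</p>
print(render("<p>$body</p>", body=Raw("<b>bold</b>")))
# <p><b>bold</b></p>
```

Making escaping the default and trust the exception is the design choice that matters: a forgotten Raw produces visible entities, while a forgotten escape produces an XSS hole.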

Doing the Same in Code

// JavaScript (browser) — leverage the DOM parser
function decodeEntities(s) {
  const t = document.createElement('textarea');
  t.innerHTML = s;
  return t.value;
}
function encodeEntities(s) {
  return s.replace(/[<>&"']/g, c => (
    { '&': '&amp;', '<': '&lt;', '>': '&gt;',
      '"': '&quot;', "'": '&#39;' }[c]
  ));
}

// Node.js
import { decode, encode } from 'html-entities';
decode('Tom &amp; Jerry');   // 'Tom & Jerry'
encode('<script>');         // '&lt;script&gt;'

# Python
import html
html.unescape('Tom &amp; Jerry')   # 'Tom & Jerry'
html.escape('<script>')            # '&lt;script&gt;'

Common Pitfalls

  • Don't roll your own decoder with a few .replace() calls — you will miss entities and edge cases. Use the browser's DOM parser, a library, or this tool.
  • Don't decode then write back as HTML without re-encoding — that defeats the purpose and reintroduces XSS.
  • Watch for double-encoding bugs — if you see &amp;amp; in output, somewhere in your pipeline a string was encoded twice.
  • UTF-8 is preferred over entities for non-ASCII characters in modern HTML — there's no need to write &#233; when é is fine in a UTF-8 page. Entities are still useful for obfuscating mailto: addresses or when the encoding is uncertain.
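Two of these pitfalls are easy to reproduce. The Python sketch below (standard library only; naive_decode and looks_double_encoded are hypothetical helpers) shows a chain of .replace() calls decoding &amp;lt; one level too far, plus a rough regex check for double-encoded output. The check can give false positives on text that legitimately discusses entities, such as this page:

```python
import html
import re

def naive_decode(text: str) -> str:
    # Anti-pattern: hand-rolled replacements. Decoding &amp; first lets its
    # output be decoded again by the later replacements.
    return text.replace("&amp;", "&").replace("&lt;", "<").replace("&gt;", ">")

# An escaped "&amp;" followed by what looks like a reference body, e.g. "&amp;lt;".
DOUBLE_ENCODED = re.compile(r"&amp;(?:#[xX][0-9A-Fa-f]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")

def looks_double_encoded(text: str) -> bool:
    return DOUBLE_ENCODED.search(text) is not None

s = "&amp;lt;"           # the literal text "&lt;", encoded once
print(naive_decode(s))   # <     (wrong: decoded twice)
print(html.unescape(s))  # &lt;  (correct: decoded once)
print(looks_double_encoded("Tom &amp;amp; Jerry"))  # True
```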

How to Use

  1. Pick Decode or Encode — Decode turns &amp; into &, Encode turns & into &amp;.
  2. Paste your text — output appears live as you type.
  3. Try a sample — see how named, decimal, and hex entities all decode correctly.
  4. Copy the result — round-trip-safe: encoding then decoding restores the original.
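The round-trip guarantee in step 4 is easy to spot-check for any encoder/decoder pair. This sketch uses Python's html.escape and html.unescape as stand-ins for the tool's two modes, which is an assumption, since the tool's own implementation isn't shown here:

```python
import html

samples = [
    "Tom & Jerry",
    "<script>alert('xss')</script>",
    "&amp; is already an entity",  # encoding must escape this "&" too
    'quotes: " and \'',
    "emoji \U0001F600 and CJK \u4e2d",
]

for s in samples:
    encoded = html.escape(s)          # Encode mode
    decoded = html.unescape(encoded)  # Decode mode
    assert decoded == s, (s, encoded, decoded)
print("all round trips OK")
```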

Frequently Asked Questions

What are HTML entities and why do I need to decode them?

HTML entities are character references like &amp; (= &), &lt; (= <), &quot; (= "), &#x2026; (= …) that allow special characters to appear safely in HTML source. You'll see them when scraping web pages, reading API responses that double-escape strings, or working with XML/RSS feeds. Decoding turns them back into the actual characters for display, comparison, or further processing.

What's the difference between named, decimal, and hex entities?

All three reference the same characters, just in different syntaxes. Named entities (&amp;, &quot;, &copy;) use a predefined name from the HTML spec; HTML5 defines more than 2,000 of them. Decimal entities (&#38;, &#34;, &#169;) use the Unicode codepoint as a base-10 number. Hex entities (&#x26;, &#x22;, &#xA9;) use the codepoint as a base-16 number. The decoder handles all three.

Why does &amp;amp; decode to '&amp;' instead of '&'?

That's a double-escape: the original &amp; was already an entity for &, and someone re-encoded the entire string. One pass of decoding turns &amp;amp; into &amp; (an entity for &). A second pass turns &amp; into &. The tool decodes one level at a time; paste the output back in if you need a second pass.
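Each pass of a standard decoder removes exactly one level of escaping. In Python, for example (decode_passes is a hypothetical helper; looping until the text stops changing is only safe when you know the input was over-escaped, because it would also decode text that legitimately contains &amp;):

```python
import html

def decode_passes(text: str, max_passes: int = 5) -> list:
    """Return the text after each decoding pass, stopping once nothing changes."""
    results = [text]
    for _ in range(max_passes):
        nxt = html.unescape(results[-1])
        if nxt == results[-1]:
            break
        results.append(nxt)
    return results

print(decode_passes("&amp;amp;"))  # ['&amp;amp;', '&amp;', '&']
```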

When should I encode instead of decode?

Encode whenever you embed user-supplied text directly into HTML. Without encoding, a username like <script>alert('xss')</script> becomes executable JavaScript. Encoding the five HTML-special characters (& < > " ') as entities turns the string into harmless text. Modern frameworks (React, Vue, Angular) auto-encode by default; manual encoding is mostly needed when building HTML strings by hand.

Does this decode emoji and CJK characters?

Yes. Decimal and hex entities can reference any Unicode codepoint, including emoji (&#x1F600; → 😀) and CJK characters (&#x4E2D; → 中). The decoder handles surrogate pairs and codepoints above U+FFFF correctly.
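Codepoints above U+FFFF need two UTF-16 code units (a surrogate pair), which is why a decoder written in JavaScript, whose strings are UTF-16, must use String.fromCodePoint rather than String.fromCharCode. The arithmetic, sketched in Python (to_surrogate_pair is an illustrative helper):

```python
import html

def to_surrogate_pair(codepoint: int) -> tuple:
    """Split a supplementary-plane codepoint into UTF-16 high and low surrogates."""
    if not 0x10000 <= codepoint <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need a surrogate pair")
    offset = codepoint - 0x10000       # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low

print([hex(u) for u in to_surrogate_pair(0x1F600)])  # ['0xd83d', '0xde00']
print(ascii(html.unescape("&#x1F600; &#x4E2D;")))    # '\U0001f600 \u4e2d'
```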

Why is my decoded output still showing entities?

Three common reasons. (1) Your input has a typo, such as a missing semicolon: &amp instead of &amp;. Entities require the trailing semicolon to decode reliably. (2) The entity name is invalid: &quote; (with an extra 'e') is not recognized. (3) The character itself is a literal ampersand: an & followed by random text doesn't form an entity. The tool only decodes well-formed entities.
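To see which of these cases applies, a small scanner can flag reference-like text that a strict decoder would skip. A rough Python sketch; diagnose is a hypothetical helper, and it checks names against the standard library's HTML5 table, not against this tool's list. Lenient decoders behave differently: Python's html.unescape, which follows the HTML5 recovery rules, turns &amp into & and reads &quote; as the legacy &quot followed by e;.

```python
import re
from html.entities import html5  # HTML5 names, keyed with and without the trailing ";"

# "&" followed by a numeric or named reference body, with or without a semicolon.
CANDIDATE = re.compile(r"&(#[xX][0-9A-Fa-f]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*)(;?)")

def diagnose(text: str) -> list:
    """List reference-like substrings that a strict decoder would leave undecoded."""
    problems = []
    for m in CANDIDATE.finditer(text):
        body, semicolon = m.group(1), m.group(2)
        if not semicolon:
            problems.append((m.group(0), "missing semicolon, or a literal &"))
        elif not body.startswith("#") and body + ";" not in html5:
            problems.append((m.group(0), "unknown entity name"))
    return problems

print(diagnose("Tom &amp Jerry, &quote;hi&quot; & friends"))
# [('&amp', 'missing semicolon, or a literal &'), ('&quote;', 'unknown entity name')]
```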
