Unicode Inspector
Inspect any character's Unicode code point, name, category, and HTML entity.
About this tool
The Unicode Inspector examines the technical details of any character or piece of text. Everything you type, whether it's a letter, emoji, space, or symbol, is represented by one or more code points assigned by the Unicode Standard. This tool reveals that hidden structure, showing you the exact code point (like U+0041 for the letter A), the character's formal Unicode name, its category, the UTF-8 bytes that represent it, and its HTML entity. Knowing what lies beneath the surface of your text helps when working with internationalization, debugging encoding issues, or simply satisfying curiosity about how digital text is actually stored.
To use the tool, paste or type your text into the input field, and the Unicode Inspector immediately generates a breakdown of every character. The table displays each character alongside its code point in hexadecimal (U+XXXX), its official Unicode name (for example, 'LATIN CAPITAL LETTER A' or 'EURO SIGN'), its Unicode general category (letter, digit, punctuation, symbol, space, or control), the UTF-8 byte sequence that encodes it, and its HTML entity (useful for inserting special characters in web pages). You can copy all results as a tab-separated table for use in spreadsheets or documentation.
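The HTML entity and tab-separated columns described above can be sketched in Python. This is only an illustration, not the tool's own logic: it assumes a named entity is preferred when the standard library's HTML 4 table has one and a hexadecimal numeric reference is used otherwise, and the helper names `html_entity` and `to_tsv` are hypothetical.

```python
import unicodedata
from html.entities import codepoint2name


def html_entity(char: str) -> str:
    """Return a named entity if one is known, else a hex numeric reference."""
    cp = ord(char)
    name = codepoint2name.get(cp)  # HTML 4 names only, e.g. 8364 -> "euro"
    return f"&{name};" if name else f"&#x{cp:X};"


def to_tsv(text: str) -> str:
    """Build a tab-separated table with one row per code point (assumed layout)."""
    rows = ["Char\tCode point\tName\tCategory\tUTF-8\tHTML entity"]
    for char in text:
        utf8_hex = " ".join(f"{b:02X}" for b in char.encode("utf-8"))
        rows.append("\t".join([
            char,
            f"U+{ord(char):04X}",
            unicodedata.name(char, "UNKNOWN"),
            unicodedata.category(char),
            utf8_hex,
            html_entity(char),
        ]))
    return "\n".join(rows)


print(to_tsv("A€"))
```

Pasting the printed text into a spreadsheet should place each field in its own column, since the fields are separated by tab characters.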
This tool is particularly useful for developers working with text processing, web developers handling special characters in HTML or CSS, linguists studying different writing systems, and anyone troubleshooting mysterious encoding problems or string length mismatches. It also helps you understand why certain characters, such as accented letters, can be represented in more than one way: either as a single precomposed code point or as a base letter plus combining marks. That distinction matters for database storage, string comparison, and internationalization.
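The precomposed versus combining distinction is easy to see in Python. The sketch below uses "é" as an example chosen for illustration; it compares the two spellings and then uses `unicodedata.normalize` to convert between them.

```python
import unicodedata

precomposed = "\u00E9"   # é as one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

# The two strings render the same but differ in code points, length, and bytes.
print(precomposed == decomposed)              # False
print(len(precomposed), len(decomposed))      # 1 2
print(precomposed.encode("utf-8").hex(" "))   # c3 a9
print(decomposed.encode("utf-8").hex(" "))    # 65 cc 81

# Normalizing both sides to the same form makes them compare equal.
nfc = unicodedata.normalize("NFC", decomposed)   # compose into one code point
nfd = unicodedata.normalize("NFD", precomposed)  # decompose into base + mark
print(nfc == precomposed, nfd == decomposed)     # True True
```

Normalizing text to a single form (often NFC) before storing or comparing it is a common way to avoid mismatches like the one shown in the first comparison.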
Code Implementation
import unicodedata


def inspect_unicode(text: str) -> None:
    """Print Unicode code point details for each character."""
    for char in text:
        cp = ord(char)
        # name() returns the default for characters without a name (e.g. control codes)
        name = unicodedata.name(char, "UNKNOWN")
        # Two-letter general category, e.g. "Lu" for an uppercase letter
        category = unicodedata.category(char)
        # UTF-8 bytes as space-separated uppercase hex
        utf8_bytes = char.encode("utf-8")
        utf8_hex = " ".join(f"{b:02X}" for b in utf8_bytes)
        print(f"U+{cp:04X} {char!r:6} {name:<40} {category} UTF-8: {utf8_hex}")


inspect_unicode("Hello €")
# Output (column padding collapsed here):
# U+0048 'H' LATIN CAPITAL LETTER H Lu UTF-8: 48
# U+0065 'e' LATIN SMALL LETTER E Ll UTF-8: 65
# ...
# U+20AC '€' EURO SIGN Sc UTF-8: E2 82 AC