Unicode is a universal character encoding standard that assigns a unique number, called a code point, to every character in the world's languages and writing systems, as well as to symbols and emoji. From the English letter "A" (U+0041) to the Chinese character "中" (U+4E2D) to the emoji "🎉" (U+1F389), Unicode ensures that computers can consistently represent and process text regardless of platform, software, or language.
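To see the numbers Unicode assigns, the short JavaScript sketch below looks up the code point of each character in a string and formats it in U+ notation. It relies only on the standard codePointAt method of strings; the helper names toUPlus and codePoints are hypothetical, chosen for this illustration.

```javascript
// Hypothetical helper: format a code point in U+XXXX notation (at least four hex digits).
function toUPlus(codePoint) {
  return "U+" + codePoint.toString(16).toUpperCase().padStart(4, "0");
}

// Hypothetical helper: pair each character of a string with its code point.
// for...of iterates by code point, so "🎉" (stored by JavaScript as two
// UTF-16 code units) is visited once, and codePointAt(0) returns its full value.
function codePoints(text) {
  const result = [];
  for (const ch of text) {
    result.push(ch + " " + toUPlus(ch.codePointAt(0)));
  }
  return result;
}

const listed = codePoints("A中🎉");
// listed → ["A U+0041", "中 U+4E2D", "🎉 U+1F389"]
```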
Unicode escape sequences represent a character by its hexadecimal code point. For example, the Chinese character "你" has the code point U+4F60, which can be written as the escape sequence \u4F60 in a JavaScript string or as the numeric character reference &#x4F60; in HTML. Escape sequences are particularly useful for representing non-ASCII characters in environments that support only ASCII.
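As a sketch of how both escape forms follow from the code point, the hypothetical helper escapeForms below derives the JavaScript \uXXXX escape and the HTML hexadecimal character reference from a single character. It assumes the input is exactly one BMP character, so four hex digits always suffice; anything else is rejected rather than handled.

```javascript
// Hypothetical helper: derive escape forms from a character's code point.
// Assumption: the input is exactly one BMP character (U+0000 to U+FFFF).
function escapeForms(ch) {
  const cp = ch.codePointAt(0);
  if ([...ch].length !== 1 || cp > 0xffff) {
    throw new RangeError("expected exactly one BMP character");
  }
  const hex = cp.toString(16).toUpperCase().padStart(4, "0");
  return {
    js: "\\u" + hex,         // escape for a JavaScript string literal
    html: "&#x" + hex + ";", // HTML hexadecimal numeric character reference
  };
}

const forms = escapeForms("你");
// forms.js   → the six characters \u4F60
// forms.html → "&#x4F60;"

// The escape and the literal character produce the same string.
console.log("\u4F60" === "你"); // true
```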
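In JavaScript, you can escape a character with the \uXXXX format (exactly four hex digits), which covers the Basic Multilingual Plane (BMP, U+0000 to U+FFFF), or with the \u{XXXXX} format introduced in ES2015, which accepts one to six hex digits for any code point up to U+10FFFF, including supplementary characters. Without braces, a supplementary character must be written as a surrogate pair of two \uXXXX escapes, and JSON, which supports only the four-digit form, always requires that pair. Escaping is essential when working with JSON, string literals, or any context where raw special characters might cause parsing issues.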
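The sketch below checks that the braced escape and the surrogate-pair spelling of 🎉 (U+1F389) denote the same JavaScript string, then shows one common way to produce ASCII-only JSON: serialize with JSON.stringify and rewrite every remaining non-ASCII UTF-16 code unit as \uXXXX. Because the regular expression has no u flag, it matches individual code units, so supplementary characters come out as surrogate pairs. The helper name toAsciiJson is hypothetical.

```javascript
// Two spellings of U+1F389: the ES2015 braced escape and a surrogate pair.
console.log("\u{1F389}" === "\uD83C\uDF89"); // true
console.log("\u{1F389}".length); // 2 (UTF-16 code units)

// Hypothetical helper: serialize a value as JSON using only ASCII characters.
// JSON.stringify already escapes quotes, backslashes and control characters;
// replace() then escapes each non-ASCII code unit. Without the u flag the
// character class matches single code units, so an emoji yields two escapes.
function toAsciiJson(value) {
  return JSON.stringify(value).replace(/[\u0080-\uffff]/g, (unit) =>
    "\\u" + unit.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const json = toAsciiJson("hi 🎉");
console.log(json); // "hi \uD83C\uDF89"
console.log(JSON.parse(json) === "hi 🎉"); // true
```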
Unicode defines the mapping between characters and code points, but different encoding forms represent those code points as bytes in different ways. UTF-8 uses 1 to 4 bytes per code point and is the most common encoding for web content. UTF-16 uses one or two 16-bit code units (2 or 4 bytes) per code point; JavaScript strings are sequences of UTF-16 code units, which is why "🎉".length is 2. UTF-32 uses a fixed 4 bytes per code point.
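To make the byte counts concrete, this JavaScript sketch measures the same characters in all three encoding forms. The UTF-8 figure comes from the standard TextEncoder, which always produces UTF-8; the UTF-16 and UTF-32 figures are computed rather than encoded, assuming no byte-order mark, as two bytes per UTF-16 code unit (what .length counts) and four bytes per code point. The helper name encodedSizes is hypothetical.

```javascript
// Hypothetical helper: bytes needed for a string in each encoding form,
// assuming no byte-order mark.
function encodedSizes(text) {
  return {
    utf8: new TextEncoder().encode(text).length, // TextEncoder emits UTF-8
    utf16: text.length * 2,                      // .length counts 16-bit code units
    utf32: [...text].length * 4,                 // spreading a string yields code points
  };
}

for (const s of ["A", "\u00E9", "中", "🎉"]) {
  console.log(s, encodedSizes(s));
}
// A  (U+0041):  utf8 1, utf16 2, utf32 4
// é  (U+00E9):  utf8 2, utf16 2, utf32 4
// 中 (U+4E2D):  utf8 3, utf16 2, utf32 4
// 🎉 (U+1F389): utf8 4, utf16 4, utf32 4
```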
Before Unicode, different systems used conflicting character encodings, so text written in one encoding appeared as garbled characters (mojibake) when decoded with another. Unicode solved this by providing a single, unified character set. Today virtually all modern software supports Unicode, making it possible to mix languages, symbols, and emoji in a single document.
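Mojibake is easy to reproduce: encode text as UTF-8, then decode the bytes with a different encoding. This JavaScript sketch uses ISO-8859-1 (Latin-1) as the mismatched encoding, implemented by hand because in Latin-1 each byte value is also the character's code point; the helper name decodeAsLatin1 is hypothetical. Decoding the same bytes as UTF-8 with the standard TextDecoder recovers the original text.

```javascript
// Hypothetical helper: decode bytes as ISO-8859-1, where byte N is code point N.
function decodeAsLatin1(bytes) {
  let out = "";
  for (const b of bytes) out += String.fromCharCode(b);
  return out;
}

const utf8Bytes = new TextEncoder().encode("你");            // bytes E4 BD A0
const garbled = decodeAsLatin1(utf8Bytes);                    // wrong encoding: mojibake
const restored = new TextDecoder("utf-8").decode(utf8Bytes); // right encoding

// garbled  → "ä½" followed by U+00A0 (no-break space): three characters
// restored → "你"
```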