Common Myths in Character Counting
Even in 2025, many myths about character counting persist across blogs, documentation, and developer forums. These misconceptions lead to real bugs in production software. Here are the most common ones — and why they’re wrong.
The Fixed-Size Emoji Myth
Myth: “All emojis are 2 code points.” Reality: a simple face emoji is a single code point (it merely occupies two UTF-16 units), a skin-tone variant adds a modifier code point on top, and ZWJ sequences like families add many more. There is no single fixed size for “an emoji”.
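A quick sketch of the difference in JavaScript; the two emojis here are just illustrative picks:

```javascript
// One visible emoji can be one, two, or many code points.
const face = "\u{1F600}";            // 😀 grinning face: one code point
const toned = "\u{1F44D}\u{1F3FD}";  // 👍🏽 thumbs up + skin-tone modifier: two

const codePoints = (s) => [...s].length; // string iteration walks code points

console.log(codePoints(face));  // 1
console.log(codePoints(toned)); // 2
console.log(face.length);       // 2 — UTF-16 units, the source of the myth
```

Spreading a string iterates by code point, while `.length` counts UTF-16 code units, which is exactly where the "2 code points" confusion comes from.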
The Persistent Flag Myth
Myth: “Every flag emoji is 4 code points.” Reality: every flag is exactly two regional indicator symbols. The number four comes from counting UTF-16 units, not code points. The flag itself is one grapheme cluster made of two code points.
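The arithmetic is easy to verify in a browser console or Node; the US flag below is an arbitrary example:

```javascript
const flag = "\u{1F1FA}\u{1F1F8}"; // 🇺🇸 = two regional indicator symbols

console.log([...flag].length); // 2 code points
console.log(flag.length);      // 4 UTF-16 code units (each indicator is astral)

const seg = new Intl.Segmenter("en", { granularity: "grapheme" });
console.log([...seg.segment(flag)].length); // 1 grapheme cluster
```

Each regional indicator lives outside the Basic Multilingual Plane, so each one costs two UTF-16 units, producing the mythical "four".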
The Family Emoji Confusion
Myth: “The family emoji is always 8 code points and 20 UTF-16 units.” Reality: numbers like those only apply to modified variants, such as versions with skin tones. The default yellow family emoji is seven code points (four person emojis joined by three zero-width joiners) and eleven UTF-16 units, yet many tools and articles still quote counts taken from a modified version.
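The default family emoji, spelled out code point by code point, makes the real counts concrete:

```javascript
// 👨‍👩‍👧‍👦 built explicitly: four people glued together with zero-width joiners.
const family = "\u{1F468}" + "\u200D" +  // man  + ZWJ
               "\u{1F469}" + "\u200D" +  // woman + ZWJ
               "\u{1F467}" + "\u200D" +  // girl  + ZWJ
               "\u{1F466}";              // boy

console.log([...family].length); // 7 code points
console.log(family.length);      // 11 UTF-16 code units (4×2 astral + 3 ZWJ)
```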
The “JavaScript is Broken” Fallacy
Myth: “JavaScript string.length is broken.” Reality: it works exactly as designed — it counts UTF-16 code units, not human characters. The web now has Intl.Segmenter for the human count. Both are correct in their own context.
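The two counts side by side, assuming a runtime that ships Intl.Segmenter (Node 16+ and all modern browsers); the sample string is an invented example mixing a combining accent with a ZWJ sequence:

```javascript
// "Héllo 👨‍👩‍👦" — é written as e + combining acute, family as a ZWJ sequence.
const text = "He\u0301llo \u{1F468}\u200D\u{1F469}\u200D\u{1F466}";

console.log(text.length); // 15 — UTF-16 code units, exactly as designed

const seg = new Intl.Segmenter("en", { granularity: "grapheme" });
const graphemes = [...seg.segment(text)];
console.log(graphemes.length); // 7 — human-perceived characters
```

Neither number is "broken"; one answers "how much storage does this take in UTF-16?", the other answers "how many characters does a person see?".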
The Future
As Unicode continues to evolve, new sequences and modifiers will appear. Tools and documentation that lock in “fixed” numbers for emojis will quickly become outdated. The only future-proof approach is to use the official Unicode segmentation rules — exactly what modern browsers provide.
Understanding these distinctions doesn’t just satisfy curiosity — it leads to better software that works correctly for everyone, everywhere, with any text they choose to write.
Myths fade. Accurate text handling stays.