docs/objects/strings.md
Strings are a fundamental data type in JavaScript, and V8 uses a complex hierarchy of string representations to optimize various operations like concatenation, slicing, and internalization.
All strings in V8 inherit from the String class (defined in src/objects/string.h). V8 uses different concrete classes depending on how the string was created and how it is used.

### Sequential strings (SeqString)

Captures sequential string values where the characters are stored directly in the object.

- SeqOneByteString: Characters are stored as 8-bit Latin-1 code units. Used for strings whose characters all fit in Latin-1, which includes plain ASCII.
- SeqTwoByteString: Characters are stored as 16-bit UTF-16 code units. Used for strings containing non-Latin-1 characters.

### Cons strings (ConsString)

Describes string values built by using the addition operator (+) on strings.

- A ConsString is a pair of pointers to the two constituent strings.
- When a ConsString is read or becomes too deep, V8 may "flatten" it by allocating a sequential string and copying the characters into it.

### Sliced strings (SlicedString)

Describes strings that are substrings of another sequential string.

- Instead of copying characters for substr() or slice(), a SlicedString contains a pointer to the parent string, an offset, and a length.

### Thin strings (ThinString)

Describes string objects that are just references to another string object.

- When a string cannot be internalized in place, the original string is converted to a ThinString pointing at its internalized version (which is allocated as a new object).

### External strings (ExternalString)

Describes string values that are backed by a string resource that lies outside the V8 heap (e.g., in an embedder such as Chrome or Node.js).

- The embedder must ensure that the resource is not deallocated while the ExternalString is live.

When a string is used as a property key (e.g., obj["prop"]), V8 internalizes it: it ensures that only one unique instance of that string value exists in the String Table (a hash table).

- If a SeqString is internalized, it might be changed to an InternalizedString in place if possible.
- If in-place internalization is not possible (e.g., for a ConsString), V8 creates a new InternalizedString and converts the original string into a ThinString pointing to the new one.

As mentioned above, ConsString instances are tree structures. To read characters efficiently or pass them to APIs that expect flat buffers, V8 will flatten the tree into a single SeqString.
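
The cons-string tree and its flattening can be illustrated with a small toy model. This is only a sketch in Python, not V8's implementation: the class names mirror the text, while `concat`, `flatten`, and the cached `length` field are hypothetical helpers introduced for the example.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class SeqString:
    """Toy flat string: the characters live directly in the object."""
    chars: str

    def __len__(self) -> int:
        return len(self.chars)


@dataclass
class ConsString:
    """Toy concatenation node: two child pointers plus a cached length."""
    first: "StringNode"
    second: "StringNode"
    length: int = 0

    def __post_init__(self) -> None:
        # Children already know their lengths, so this is O(1).
        self.length = len(self.first) + len(self.second)

    def __len__(self) -> int:
        return self.length


StringNode = Union[SeqString, ConsString]


def concat(a: StringNode, b: StringNode) -> StringNode:
    """Concatenate without copying characters (hypothetical helper)."""
    if len(a) == 0:
        return b
    if len(b) == 0:
        return a
    return ConsString(a, b)


def flatten(s: StringNode) -> SeqString:
    """Copy every leaf, left to right, into one flat string.

    An explicit stack is used instead of recursion so that very deep
    trees do not overflow the call stack.
    """
    parts: List[str] = []
    stack: List[StringNode] = [s]
    while stack:
        node = stack.pop()
        if isinstance(node, ConsString):
            stack.append(node.second)  # visited after `first`
            stack.append(node.first)
        else:
            parts.append(node.chars)
    return SeqString("".join(parts))
```

In this model, repeated `concat` calls only allocate small nodes, and the character copy is deferred until `flatten` is called, which is the trade-off the text describes.
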
V8 uses the InstanceType field in the object Map to identify the specific representation and encoding of a string. For strings, the high-order bits (bits 7-15) are cleared, and the lower bits form a bitfield:

- Representation:
  - 000: Sequential String
  - 001: Cons String
  - 010: External String
  - 011: Sliced String
  - 101: Thin String
- Encoding:
  - 0: Two-Byte (UTF-16)
  - 1: One-Byte (Latin-1)
- Internalization:
  - 0: Internalized String
  - 1: Not Internalized String

This bitfield layout allows V8 to perform extremely fast checks (e.g., checking if a string is one-byte or internalized) using simple bitwise operations.
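
The bitwise checks can be sketched as follows. The tag values (000, 001, and so on) and the meaning of each flag come from the list above, but the bit positions used here (representation in bits 0-2, encoding in bit 3, internalization in bit 5) are assumptions made for illustration and should be verified against the V8 version in use. All names in the block are hypothetical.

```python
# Assumed bit positions; only the tag values come from the text.
REPRESENTATION_MASK = 0b111        # assumed: bits 0-2
ONE_BYTE_BIT = 1 << 3              # assumed: bit 3, set means one-byte
NOT_INTERNALIZED_BIT = 1 << 5      # assumed: bit 5, set means NOT internalized
NOT_STRING_MASK = 0xFF80           # bits 7-15, cleared for every string

REPRESENTATIONS = {
    0b000: "sequential",
    0b001: "cons",
    0b010: "external",
    0b011: "sliced",
    0b101: "thin",
}


def is_string(instance_type: int) -> bool:
    """A string has all of bits 7-15 cleared."""
    return (instance_type & NOT_STRING_MASK) == 0


def is_one_byte(instance_type: int) -> bool:
    """Encoding flag: 1 is one-byte (Latin-1), 0 is two-byte (UTF-16)."""
    return (instance_type & ONE_BYTE_BIT) != 0


def is_internalized(instance_type: int) -> bool:
    """Internalization flag: 0 is internalized, 1 is not internalized."""
    return (instance_type & NOT_INTERNALIZED_BIT) == 0


def describe(instance_type: int):
    """Decode an instance type into (representation, encoding, internalized)."""
    if not is_string(instance_type):
        raise ValueError("not a string instance type")
    representation = REPRESENTATIONS.get(instance_type & REPRESENTATION_MASK)
    if representation is None:
        raise ValueError("unknown string representation")
    encoding = "one-byte" if is_one_byte(instance_type) else "two-byte"
    return (representation, encoding, is_internalized(instance_type))
```

Each predicate is a single mask-and-compare, so a check such as "is this an internalized one-byte string" needs no pointer chasing beyond loading the instance type.
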
The String Table is a hash table that stores all internalized strings.
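
A toy model can show how a string table keeps one canonical instance per value, including the in-place and ThinString paths described earlier. This is a sketch under simplifying assumptions, not V8's data structure: a Python dict stands in for the hash table, a mutable `kind` field stands in for changing an object's representation, and `Str`, `value_of`, and `StringTable` are hypothetical names. The sketch also forwards any duplicate string to the existing entry, which is a choice of this model.

```python
class Str:
    """Toy heap string with a mutable representation tag."""

    def __init__(self, kind, chars=None, parts=None):
        self.kind = kind      # "seq", "cons", "internalized", or "thin"
        self.chars = chars    # characters for "seq" and "internalized"
        self.parts = parts    # (left, right) children for "cons"
        self.actual = None    # forwarding pointer for "thin"


def value_of(s):
    """Return the characters a toy string represents."""
    if s.kind == "thin":
        return value_of(s.actual)
    if s.kind == "cons":
        return value_of(s.parts[0]) + value_of(s.parts[1])
    return s.chars


class StringTable:
    """Maps each string value to its single canonical instance."""

    def __init__(self):
        self._entries = {}

    def internalize(self, s):
        if s.kind == "internalized":
            return s
        if s.kind == "thin":
            return s.actual
        key = value_of(s)
        canonical = self._entries.get(key)
        if canonical is None:
            if s.kind == "seq":
                # In-place path: the flat string itself becomes canonical.
                s.kind = "internalized"
                self._entries[key] = s
                return s
            # Not possible in place (e.g. a cons string): allocate a new one.
            canonical = Str("internalized", chars=key)
            self._entries[key] = canonical
        # Turn the original into a thin string that forwards to the entry.
        s.kind = "thin"
        s.chars = None
        s.parts = None
        s.actual = canonical
        return canonical
```

Because every value maps to exactly one object, two internalized strings can be compared by identity instead of character by character.
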

- src/objects/string.h: Main header file defining the string hierarchy.
- src/objects/string.tq: Torque definitions for strings.
- src/snapshot/code-serializer.cc: Handles serialization of strings for code caching.