files/en-us/web/javascript/reference/global_objects/string/index.md
The String object is used to represent and manipulate a
sequence of characters.
Strings are useful for holding data that can be represented in text form. Some of the
most-used operations on strings are checking their {{jsxref("String/length", "length")}}, building and concatenating them using the
+ and += string operators,
checking for the existence or location of substrings with the
{{jsxref("String/indexOf", "indexOf()")}} method, and extracting substrings
with the {{jsxref("String/substring", "substring()")}} method.
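As a small sketch of the operations listed above (the variable names are illustrative only):

```javascript
// Build a string with the + and += operators.
let greeting = "Hello";
greeting += ", " + "world";

// length counts the UTF-16 code units in the string.
const size = greeting.length; // 12

// indexOf() returns the position of a substring, or -1 if it is absent.
const position = greeting.indexOf("world"); // 7
const missing = greeting.indexOf("moon"); // -1

// substring() extracts the characters between two indices.
const word = greeting.substring(position, position + 5); // "world"
```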
Strings can be created as primitives, from string literals, or as objects, using the {{jsxref("String/String", "String()")}} constructor:
const string1 = "A string primitive";
const string2 = 'Also a string primitive';
const string3 = `Yet another string primitive`;
const string4 = new String("A String object");
String primitives and string objects share many behaviors, but have other important differences and caveats. See "String primitives and String objects" below.
String literals can be specified using single or double quotes, which are treated identically, or using the backtick character <kbd>`</kbd>. This last form specifies a template literal: with this form you can interpolate expressions. For more information on the syntax of string literals, see lexical grammar.
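The following sketch illustrates both points; the names user and count are made up for the example:

```javascript
// Single and double quotes produce identical string values.
const single = 'Hello';
const double = "Hello";
const sameLiteral = single === double; // true

// A template literal can interpolate arbitrary expressions with ${...}.
const user = "Ada";
const count = 3;
const message = `${user} has ${count + 1} new messages`; // "Ada has 4 new messages"
```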
There are two ways to access an individual character in a string. The first is the {{jsxref("String/charAt", "charAt()")}} method:
"cat".charAt(1); // gives value "a"
The other way is to treat the string as an array-like object, where individual characters correspond to a numerical index:
"cat"[1]; // gives value "a"
When using bracket notation for character access, attempting to delete or assign a value to these properties will not succeed. The properties involved are neither writable nor configurable. (See {{jsxref("Object.defineProperty()")}} for more information.)
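One way to observe this, sketched here with a String wrapper object so that the property descriptor can be inspected directly:

```javascript
// Wrap the primitive so its index properties can be inspected.
const wrapped = Object("cat");

// The index property is neither writable nor configurable.
const descriptor = Object.getOwnPropertyDescriptor(wrapped, "1");
// { value: "a", writable: false, enumerable: true, configurable: false }

// Reflect.set() and Reflect.deleteProperty() report failure instead of throwing.
const assigned = Reflect.set(wrapped, "1", "x"); // false
const deleted = Reflect.deleteProperty(wrapped, "1"); // false

// The character is unchanged.
const stillA = wrapped[1]; // "a"
```

With plain assignment (`"cat"[1] = "x"`), the failure is silent in sloppy mode and throws a TypeError in strict mode.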
Use the less-than and greater-than operators to compare strings:
const a = "a";
const b = "b";
if (a < b) {
// true
console.log(`${a} is less than ${b}`);
} else if (a > b) {
console.log(`${a} is greater than ${b}`);
} else {
console.log(`${a} and ${b} are equal.`);
}
Note that all comparison operators, including === and ==, compare strings case-sensitively. A common way to compare strings case-insensitively is to convert both to the same case (upper or lower) before comparing them.
function areEqualCaseInsensitive(str1, str2) {
return str1.toUpperCase() === str2.toUpperCase();
}
The choice of whether to transform by toUpperCase() or toLowerCase() is mostly arbitrary, and neither one is fully robust when extending beyond the Latin alphabet. For example, the German lowercase letter ß and ss are both transformed to SS by toUpperCase(), while the Turkish letter ı would be falsely reported as unequal to I by toLowerCase() unless specifically using toLocaleLowerCase("tr").
const areEqualInUpperCase = (str1, str2) =>
str1.toUpperCase() === str2.toUpperCase();
const areEqualInLowerCase = (str1, str2) =>
str1.toLowerCase() === str2.toLowerCase();
areEqualInUpperCase("ß", "ss"); // true; should be false
areEqualInLowerCase("ı", "I"); // false; should be true
A locale-aware and robust solution for testing case-insensitive equality is to use the {{jsxref("Intl.Collator")}} API or the string's localeCompare() method — they share the same interface — with the sensitivity option set to "accent" or "base".
const areEqual = (str1, str2, locale = "en-US") =>
str1.localeCompare(str2, locale, { sensitivity: "accent" }) === 0;
areEqual("ß", "ss", "de"); // false
areEqual("ı", "I", "tr"); // true
The localeCompare() method enables string comparison in a fashion similar to strcmp(): it allows sorting strings in a locale-aware manner.
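A minimal sketch of locale-aware sorting and comparison, assuming an environment with full "en" locale data (the sample words are arbitrary):

```javascript
const words = ["banana", "Cherry", "apple"];

// The default sort compares UTF-16 code units, so uppercase letters come first.
const byCodeUnit = [...words].sort(); // ["Cherry", "apple", "banana"]

// localeCompare() orders the words alphabetically for the given locale.
const byLocale = [...words].sort((x, y) => x.localeCompare(y, "en"));
// ["apple", "banana", "Cherry"]

// With sensitivity "base", differences in case and accents are ignored.
const collator = new Intl.Collator("en", { sensitivity: "base" });
const sameBaseLetters = collator.compare("resume", "RÉSUMÉ") === 0; // true
```

When comparing many strings, reusing a single Intl.Collator instance avoids repeating the locale setup on every comparison.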
Note that JavaScript distinguishes between String objects and
{{Glossary("Primitive", "primitive string")}} values. (The same is true of
{{jsxref("Boolean")}} and {{jsxref("Number", "Numbers")}}.)
String literals (denoted by double or single quotes) and strings returned from
String calls in a non-constructor context (that is, called without using
the {{jsxref("Operators/new", "new")}} keyword) are primitive strings. In contexts where a
method is to be invoked on a primitive string or a property lookup occurs, JavaScript
will automatically wrap the string primitive and call the method or perform the property
lookup on the wrapper object instead.
const strPrim = "foo"; // A literal is a string primitive
const strPrim2 = String(1); // Coerced into the string primitive "1"
const strPrim3 = String(true); // Coerced into the string primitive "true"
const strObj = new String(strPrim); // String with new returns a string wrapper object.
console.log(typeof strPrim); // "string"
console.log(typeof strPrim2); // "string"
console.log(typeof strPrim3); // "string"
console.log(typeof strObj); // "object"
[!WARNING] You should rarely find yourself using String as a constructor.
String primitives and String objects also give different results when
using {{jsxref("Global_Objects/eval", "eval()")}}. Primitives passed to
eval are treated as source code; String objects are treated as
all other objects are, by returning the object. For example:
const s1 = "2 + 2"; // creates a string primitive
const s2 = new String("2 + 2"); // creates a String object
console.log(eval(s1)); // returns the number 4
console.log(eval(s2)); // returns the string "2 + 2"
For these reasons, code may break when it encounters String objects
where it expects a primitive string, although generally, authors need not worry
about the distinction.
A String object can always be converted to its primitive counterpart with
the {{jsxref("String/valueOf", "valueOf()")}} method.
console.log(eval(s2.valueOf())); // returns the number 4
Many built-in operations that expect strings first coerce their arguments to strings (which is largely why String objects behave similarly to string primitives). The operation can be summarized as follows:
- Strings are returned as-is.
- undefined turns into "undefined".
- null turns into "null".
- true turns into "true"; false turns into "false".
- Numbers are converted with the same algorithm as toString(10).
- BigInts are converted with the same algorithm as toString(10).
- Symbols throw a {{jsxref("TypeError")}}.
- Objects are first converted to a primitive by calling their [Symbol.toPrimitive]() (with "string" as hint), toString(), and valueOf() methods, in that order. The resulting primitive is then converted to a string.

There are several ways to achieve nearly the same effect in JavaScript.

- Template literal: `${x}` does exactly the string coercion steps explained above for the embedded expression.
- The String() function: String(x) uses the same algorithm to convert x, except that Symbols don't throw a {{jsxref("TypeError")}}, but return "Symbol(description)", where description is the description of the Symbol.
- Using the + operator: "" + x coerces its operand to a primitive instead of a string, and, for some objects, has entirely different behaviors from normal string coercion. See its reference page for more details.

Depending on your use case, you may want to use `${x}` (to mimic built-in behavior) or String(x) (to handle symbol values without throwing an error), but you should not use "" + x.
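The differences between these approaches can be sketched as follows. The label symbol and the temperature object are hypothetical values chosen only to make each behavior visible:

```javascript
// Symbols: String() returns a description, while a template literal throws.
const label = Symbol("id");
const viaString = String(label); // "Symbol(id)"
let templateThrows = false;
try {
  `${label}`;
} catch (error) {
  templateThrows = error instanceof TypeError; // true
}

// Objects: string coercion prefers toString(), but "" + x converts to a
// primitive without the "string" hint, so valueOf() is called first.
const temperature = {
  toString() {
    return "21 degrees";
  },
  valueOf() {
    return 21;
  },
};
const viaTemplate = `${temperature}`; // "21 degrees"
const viaFunction = String(temperature); // "21 degrees"
const viaConcat = "" + temperature; // "21"
```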
Strings are represented fundamentally as sequences of UTF-16 code units. In UTF-16 encoding, every code unit is exactly 16 bits long. This means there are a maximum of 2<sup>16</sup>, or 65536, possible characters representable as single UTF-16 code units. This character set is called the basic multilingual plane (BMP), and includes the most common characters like the Latin, Greek, Cyrillic alphabets, as well as many East Asian characters. Each code unit can be written in a string with \u followed by exactly four hex digits.
However, the entire Unicode character set is much, much bigger than 65536. The extra characters are stored in UTF-16 as surrogate pairs, which are pairs of 16-bit code units that represent a single character. To avoid ambiguity, the two parts of the pair must be between 0xD800 and 0xDFFF, and these code units are not used to encode single-code-unit characters. (More precisely, leading surrogates, also called high-surrogate code units, have values between 0xD800 and 0xDBFF, inclusive, while trailing surrogates, also called low-surrogate code units, have values between 0xDC00 and 0xDFFF, inclusive.) Each Unicode character, comprised of one or two UTF-16 code units, is also called a Unicode code point. Each Unicode code point can be written in a string with \u{xxxxxx} where xxxxxx represents 1–6 hex digits.
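As an illustration, the following sketch inspects one character outside the BMP (U+1F604, "😄") at both the code unit and the code point level:

```javascript
// One code point written with the \u{...} escape.
const smiley = "\u{1F604}";

// It occupies two UTF-16 code units: a leading and a trailing surrogate.
const units = smiley.length; // 2
const leading = smiley.charCodeAt(0).toString(16); // "d83d"
const trailing = smiley.charCodeAt(1).toString(16); // "de04"

// codePointAt() reads the whole surrogate pair as one code point.
const codePoint = smiley.codePointAt(0).toString(16); // "1f604"

// Writing the two surrogates with \u escapes yields the same string.
const sameString = smiley === "\uD83D\uDE04"; // true

// A BMP character such as "é" (U+00E9) needs only one code unit.
const bmpUnits = "\u00E9".length; // 1
```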
A "lone surrogate" is a 16-bit code unit satisfying one of the descriptions below:
- It is in the range 0xD800–0xDBFF, inclusive (i.e., is a leading surrogate), but it is the last code unit in the string, or the next code unit is not a trailing surrogate.
- It is in the range 0xDC00–0xDFFF, inclusive (i.e., is a trailing surrogate), but it is the first code unit in the string, or the previous code unit is not a leading surrogate.

Lone surrogates do not represent any Unicode character. Although most JavaScript built-in methods handle them correctly because they all work based on UTF-16 code units, lone surrogates are often not valid values when interacting with other systems. For example, encodeURI() will throw a {{jsxref("URIError")}} for lone surrogates, because URI encoding uses UTF-8 encoding, which does not have any encoding for lone surrogates. Strings not containing any lone surrogates are called well-formed strings, and are safe to be used with functions that do not deal with UTF-16 (such as encodeURI() or {{domxref("TextEncoder")}}). You can check if a string is well-formed with the {{jsxref("String/isWellFormed", "isWellFormed()")}} method, or sanitize lone surrogates with the {{jsxref("String/toWellFormed", "toWellFormed()")}} method.
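The two conditions that define a lone surrogate can be sketched as a regular expression. The helper name hasLoneSurrogate is hypothetical and is shown only to illustrate the definition; where isWellFormed() is available, prefer the built-in method.

```javascript
// Hypothetical helper, for illustration only.
// First alternative: a leading surrogate not followed by a trailing one.
// Second alternative: a trailing surrogate not preceded by a leading one.
const loneSurrogate =
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/;

function hasLoneSurrogate(str) {
  return loneSurrogate.test(str);
}

const paired = hasLoneSurrogate("\uD83D\uDE04"); // false: a valid pair
const danglingLead = hasLoneSurrogate("ab\uD800"); // true
const danglingTrail = hasLoneSurrogate("\uDC00ab"); // true

// encodeURI() rejects strings that contain a lone surrogate.
let uriError = false;
try {
  encodeURI("ab\uD800");
} catch (error) {
  uriError = error instanceof URIError; // true
}
```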
On top of Unicode characters, there are certain sequences of Unicode characters that should be treated as one visual unit, known as a grapheme cluster. The most common case is emojis: many emojis that have a range of variations are actually formed by multiple emojis, usually joined by the <ZWJ> (U+200D) character.
You must be careful which level of characters you are iterating on. For example, split("") will split by UTF-16 code units and will separate surrogate pairs. String indexes also refer to the index of each UTF-16 code unit. On the other hand, [Symbol.iterator]() iterates by Unicode code points. Iterating through grapheme clusters will require some custom code.
"😄".split(""); // ['\ud83d', '\ude04']; splits into two lone surrogates
// "Backhand Index Pointing Right: Dark Skin Tone"
[..."👉🏿"]; // ['👉', '🏿']
// splits into the basic "Backhand Index Pointing Right" emoji and
// the "Dark skin tone" emoji
// "Family: Man, Boy"
[..."👨\u200D👦"]; // [ '👨', '\u200d', '👦' ]
// splits into the "Man" and "Boy" emoji, joined by a ZWJ
// The United Nations flag
[..."🇺🇳"]; // [ '🇺', '🇳' ]
// splits into two "regional indicator" letters "U" and "N".
// All flag emojis are formed by joining two regional indicator letters
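One possible approach to iterating by grapheme cluster, in environments that support it, is the Intl.Segmenter API. The sketch below assumes such an environment; the helper name splitGraphemes is hypothetical, and exact segmentation depends on the Unicode data shipped with the engine.

```javascript
// Hypothetical helper: splits text into grapheme clusters.
function splitGraphemes(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text), (part) => part.segment);
}

// The emoji are written with escapes so the invisible ZWJ is explicit.
const family = "\u{1F468}\u200D\u{1F466}"; // "Family: Man, Boy"
const flag = "\u{1F1FA}\u{1F1F3}"; // United Nations flag
const pointing = "\u{1F449}\u{1F3FF}"; // pointing hand with dark skin tone

// Spreading iterates by code point; the segmenter iterates by visual unit.
const familyCodePoints = [...family].length; // 3
const familyGraphemes = splitGraphemes(family).length;
const flagGraphemes = splitGraphemes(flag).length;
const pointingGraphemes = splitGraphemes(pointing).length;
```

With current Unicode data, each of the three strings is expected to segment into a single grapheme cluster.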
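{{jsxref("String/String", "String()")}}
: Creates String objects. When called as a function, it returns primitive values of type String.

These properties are defined on String.prototype and shared by all String instances.

{{jsxref("Object/constructor", "String.prototype.constructor")}}
: The constructor function that created the instance object. For String instances, the initial value is the {{jsxref("String/String", "String")}} constructor.

These properties are own properties of each String instance.

{{jsxref("String/length", "length")}}
: Reflects the length of the string. Read-only.

{{jsxref("String.prototype.at()")}}
: Returns the character (exactly one UTF-16 code unit) at the specified index. Accepts negative integers, which count back from the last string character.
{{jsxref("String.prototype.charAt()")}}
: Returns the character (exactly one UTF-16 code unit) at the specified index.
{{jsxref("String.prototype.charCodeAt()")}}
: Returns a number that is the UTF-16 code unit value at the given index.
{{jsxref("String.prototype.codePointAt()")}}
: Returns a nonnegative integer that is the code point value of the UTF-16 encoded code point starting at the specified pos.
{{jsxref("String.prototype.concat()")}}
: Combines the text of two (or more) strings and returns a new string.
{{jsxref("String.prototype.endsWith()")}}
: Determines whether a string ends with the characters of the string searchString.
{{jsxref("String.prototype.includes()")}}
: Determines whether the calling string contains searchString.
{{jsxref("String.prototype.indexOf()")}}
: Returns the index within this string of the first occurrence of searchValue, or -1 if not found.
{{jsxref("String.prototype.isWellFormed()")}}
: Returns a boolean indicating whether this string contains no lone surrogates.
{{jsxref("String.prototype.lastIndexOf()")}}
: Returns the index within this string of the last occurrence of searchValue, or -1 if not found.
{{jsxref("String.prototype.localeCompare()")}}
: Returns a number indicating whether the reference string compareString comes before, after, or is equivalent to the
given string in sort order.
{{jsxref("String.prototype.match()")}}
: Used to match the regular expression regexp against a string.
{{jsxref("String.prototype.matchAll()")}}
: Returns an iterator of all regexp's matches.
{{jsxref("String.prototype.normalize()")}}
: Returns the Unicode Normalization Form of the calling string value.
{{jsxref("String.prototype.padEnd()")}}
: Pads the current string from the end with a given string and returns a new string of the length targetLength.
{{jsxref("String.prototype.padStart()")}}
: Pads the current string from the start with a given string and returns a new string of the length targetLength.
{{jsxref("String.prototype.repeat()")}}
: Returns a string consisting of the calling string repeated count times.
{{jsxref("String.prototype.replace()")}}
: Used to replace occurrences of searchFor using
replaceWith. searchFor may be a string
or Regular Expression, and replaceWith may be a string or
function.
{{jsxref("String.prototype.replaceAll()")}}
: Used to replace all occurrences of searchFor using
replaceWith. searchFor may be a string
or Regular Expression, and replaceWith may be a string or
function.
{{jsxref("String.prototype.search()")}}
: Searches for a match between a regular expression regexp and
the calling string.
{{jsxref("String.prototype.slice()")}}
: Extracts a section of a string and returns a new string.
{{jsxref("String.prototype.split()")}}
: Returns an array of strings populated by splitting the calling string at occurrences of the substring sep.
{{jsxref("String.prototype.startsWith()")}}
: Determines whether the calling string begins with the characters of the string searchString.
{{jsxref("String.prototype.substr()")}} {{deprecated_inline}}
: Returns a portion of the string, starting at the specified index and extending for a given number of characters afterwards.
{{jsxref("String.prototype.substring()")}}
: Returns a new string containing characters of the calling string from (or between) the specified index (or indices).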
{{jsxref("String.prototype.toLocaleLowerCase()")}}
: The characters within a string are converted to lowercase while respecting the current locale.
For most languages, this will return the same as {{jsxref("String/toLowerCase", "toLowerCase()")}}.
{{jsxref("String.prototype.toLocaleUpperCase()")}}
: The characters within a string are converted to uppercase while respecting the current locale.
For most languages, this will return the same as {{jsxref("String/toUpperCase", "toUpperCase()")}}.
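{{jsxref("String.prototype.toLowerCase()")}}
: Returns the calling string value converted to lowercase.
{{jsxref("String.prototype.toString()")}}
: Returns a string representing the specified object. Overrides the {{jsxref("Object.prototype.toString()")}} method.
{{jsxref("String.prototype.toUpperCase()")}}
: Returns the calling string value converted to uppercase.
{{jsxref("String.prototype.toWellFormed()")}}
: Returns a string where all lone surrogates of this string are replaced with the Unicode replacement character U+FFFD.
{{jsxref("String.prototype.trim()")}}
: Trims whitespace from the beginning and end of the string.
{{jsxref("String.prototype.trimEnd()")}}
: Trims whitespace from the end of the string.
{{jsxref("String.prototype.trimStart()")}}
: Trims whitespace from the beginning of the string.
{{jsxref("String.prototype.valueOf()")}}
: Returns the primitive value of the specified object. Overrides the {{jsxref("Object.prototype.valueOf()")}} method.
String.prototype[Symbol.iterator]()
: Returns a new iterator object that iterates over the code points of a String value, returning each code point as a String value.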
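A brief sketch exercising a handful of these methods; the input values are arbitrary and chosen only for illustration:

```javascript
const title = "  JavaScript strings  ";

// trim() removes whitespace from both ends.
const trimmed = title.trim(); // "JavaScript strings"

// at() accepts negative indices, counting back from the end.
const lastChar = trimmed.at(-1); // "s"

// startsWith() and includes() return booleans.
const startsWithJava = trimmed.startsWith("Java"); // true
const hasScript = trimmed.includes("Script"); // true

// split() breaks the string at each occurrence of the separator.
const parts = trimmed.split(" "); // ["JavaScript", "strings"]

// padStart(), repeat(), and replaceAll() each return a new string.
const padded = "7".padStart(3, "0"); // "007"
const repeated = "ab".repeat(3); // "ababab"
const replaced = "a-b-c".replaceAll("-", "+"); // "a+b+c"
```

None of these calls modifies title: strings are immutable, so every method returns a new value.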
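[!WARNING] Deprecated. Avoid these methods.
They are of limited use, as they are based on a very old HTML standard and provide only a subset of the currently available HTML tags and attributes. Many of them create deprecated or non-standard markup today. In addition, they do string concatenation without any validation or sanitization, which makes them a potential security threat when directly inserted using
innerHTML. Use DOM APIs such as document.createElement() instead.

{{jsxref("String.prototype.anchor()")}} {{deprecated_inline}}
: <a name="name"> (hypertext target)
{{jsxref("String.prototype.big()")}} {{deprecated_inline}}
: <big>
{{jsxref("String.prototype.blink()")}} {{deprecated_inline}}
: <blink>
{{jsxref("String.prototype.bold()")}} {{deprecated_inline}}
: <b>
{{jsxref("String.prototype.fixed()")}} {{deprecated_inline}}
: <tt>
{{jsxref("String.prototype.fontcolor()")}} {{deprecated_inline}}
: <font color="color">
{{jsxref("String.prototype.fontsize()")}} {{deprecated_inline}}
: <font size="size">
{{jsxref("String.prototype.italics()")}} {{deprecated_inline}}
: <i>
{{jsxref("String.prototype.link()")}} {{deprecated_inline}}
: <a href="url"> (link to URL)
{{jsxref("String.prototype.small()")}} {{deprecated_inline}}
: <small>
{{jsxref("String.prototype.strike()")}} {{deprecated_inline}}
: <strike>
{{jsxref("String.prototype.sub()")}} {{deprecated_inline}}
: <sub>
{{jsxref("String.prototype.sup()")}} {{deprecated_inline}}
: <sup>

Note that these methods do not check if the string itself contains HTML tags, so it's possible to create invalid HTML: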
"</b>".bold(); // <b></b></b>
The only escaping they do is to replace " in the attribute value (for {{jsxref("String/anchor", "anchor()")}}, {{jsxref("String/fontcolor", "fontcolor()")}}, {{jsxref("String/fontsize", "fontsize()")}}, and {{jsxref("String/link", "link()")}}) with &quot;.
"foo".anchor('"Hello"'); // <a name="&quot;Hello&quot;">foo</a>
The String() function is a more reliable way of converting values to strings than calling the toString() method of the value, as the former works when used on null and {{jsxref("undefined")}}. For example:
// You cannot access properties on null or undefined
const nullVar = null;
nullVar.toString(); // TypeError: Cannot read properties of null
String(nullVar); // "null"
const undefinedVar = undefined;
undefinedVar.toString(); // TypeError: Cannot read properties of undefined
String(undefinedVar); // "undefined"
{{Specifications}}
{{Compat}}