A few days ago someone asked me how I would go about determining the Unicode escape sequence for an arbitrary character. Of course, I went to Google Chrome’s web console and showed them how it can be done using the charCodeAt, toString, and slice functions. The following function takes a string and returns the equivalent string with every character converted to its Unicode escape sequence:
/**
 * Replaces each character in the string with the corresponding
 * Unicode escape sequence.
 * @param {string} str The string of characters to escape.
 * @return {string} The string with the escape sequences.
 */
function toUnicodeSequence(str) {
  // Walk backwards so that splicing a 6-character escape in at index i
  // never shifts the positions of the characters still to be processed.
  for (var i = str.length; i--;) {
    str = str.slice(0, i) + '\\u'
      + ('000' + str.charCodeAt(i).toString(16)).slice(-4)
      + str.slice(i + 1);
  }
  return str;
}
Let’s review some basic JavaScript functionality:
- The String.prototype.charCodeAt function returns the integer value of the character at a given index of a string (more precisely, its UTF-16 code unit, a number from 0 to 65535).
- The Number.prototype.toString function converts a number into a string that represents that number in a given base (radix). In other words, (5).toString(2) results in "101" because the binary (base 2, from the 2 passed in) representation of 5 is 101.
- The String.prototype.slice function is similar to the String.prototype.substring function, but it has one advantage: you can pass in negative arguments as well. If you pass in a negative number, the index is counted from the end of the string instead of the beginning. Therefore, executing ("JavaScript").slice(-6) results in "Script" because those are the last 6 characters of the string "JavaScript".
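To see the three calls working together, here is a small sketch that applies them step by step to a few sample characters. The helper name escapeCodeUnit is hypothetical and only used for illustration. Because charCodeAt works on UTF-16 code units, a character outside the Basic Multilingual Plane (such as an emoji) is stored as two code units, a surrogate pair, and therefore produces two escape sequences.

```javascript
// Hypothetical helper: escape the single UTF-16 code unit at index i of str.
function escapeCodeUnit(str, i) {
  var code = str.charCodeAt(i);          // e.g. 233 for 'é'
  var hex = code.toString(16);           // e.g. 'e9' (base 16, lowercase)
  var padded = ('000' + hex).slice(-4);  // keep the last 4 digits: '00e9'
  return '\\u' + padded;                 // a literal backslash, then 'u'
}

console.log(escapeCodeUnit('A', 0));  // \u0041
console.log(escapeCodeUnit('é', 0));  // \u00e9
console.log(escapeCodeUnit('€', 0));  // \u20ac

// U+1F600 (grinning face) is two code units, so it needs two escapes.
var emoji = '\uD83D\uDE00';
console.log(emoji.length);                                        // 2
console.log(escapeCodeUnit(emoji, 0) + escapeCodeUnit(emoji, 1)); // \ud83d\ude00
```

Prepending exactly three zeros is enough: toString(16) always yields at least one digit, and a code unit never exceeds 0xFFFF, so it never needs more than four.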
By using these three functions together, we can create a function such as toUnicodeSequence, which returns the escape sequences for all of the characters in a string. I am not sure how often one would need such a function, but have fun with it. 8)
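One quick way to check the output is to feed it back through a parser that understands \uXXXX escapes. JSON string syntax uses the same four-hex-digit escapes, so JSON.parse can decode them. The string below is what toUnicodeSequence should return for "Hi!" (H is 0x48, i is 0x69, ! is 0x21), written out by hand so the snippet is self-contained.

```javascript
// Expected output of toUnicodeSequence('Hi!'), written as a JS string literal.
// The doubled backslashes make the string contain a literal backslash
// followed by 'u', exactly as toUnicodeSequence produces it.
var escaped = '\\u0048\\u0069\\u0021';

console.log(escaped);         // \u0048\u0069\u0021
console.log(escaped.length);  // 18 (three 6-character escapes)

// JSON.parse understands the same \uXXXX escapes, so wrapping the text in
// double quotes and parsing it decodes the sequences back to characters.
var decoded = JSON.parse('"' + escaped + '"');
console.log(decoded);         // Hi!
```

The same text can also be pasted directly between quotes in JavaScript source code, where '\u0048\u0069\u0021' evaluates to "Hi!".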
UPDATE (2013-08-30): Shorter Definition
I just realized that this function could easily be shortened by using a regular expression with the global flag set, together with a callback function, to replace all of the characters in a string:
/**
 * Replaces each character in the string with the corresponding
 * Unicode escape sequence.
 * @param {string} str The string of characters to escape.
 * @return {string} The string with the escape sequences.
 */
function toUnicodeSequence(str) {
  // [\s\S] matches any single UTF-16 code unit, including newlines,
  // and the g flag makes replace() call the callback for every match.
  return str.replace(/[\s\S]/g, function(c) {
    return '\\u' + ('000' + c.charCodeAt(0).toString(16)).slice(-4);
  });
}
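The shorter version relies on String.prototype.replace calling the callback once per match when the regular expression has the g flag. The character class [\s\S] (whitespace or non-whitespace) matches every code unit, including line terminators; the more obvious . does not match \n, so newlines would slip through unescaped. The sketch below compares the two patterns; escapeChar is a hypothetical name for the callback used above.

```javascript
// Hypothetical name for the callback used in the shorter toUnicodeSequence.
function escapeChar(c) {
  return '\\u' + ('000' + c.charCodeAt(0).toString(16)).slice(-4);
}

var input = 'a\nb';

// '.' does not match line terminators, so the newline is left as-is.
var withDot = input.replace(/./g, escapeChar);
// [\s\S] matches any UTF-16 code unit, so every character is escaped.
var withClass = input.replace(/[\s\S]/g, escapeChar);

console.log(withDot.indexOf('\n') !== -1);  // true (raw newline survived)
console.log(withClass);                     // \u0061\u000a\u0062
```

In engines that support the ES2018 s (dotAll) flag, /./gs behaves the same as /[\s\S]/g here.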
2 Comments
ildar · June 6, 2013 at 7:29 AM
var outStr = escape(inStr).replace(/%(u[0-9a-f]{2})?([0-9a-f]{2})/ig, function($0, $1, $2) {
  return '\\' + ($1 || 'u00') + $2;
});
ildar · June 6, 2013 at 7:40 AM
And a couple of crazy modifications:
var outStr = escape(inStr)
  .replace(/%(?=[0-9a-f]{2})/ig, '\\u00')
  .replace(/%(?=u[0-9a-f]{4})/ig, '\\');

var outStr = escape(inStr)
  .replace(/%(?=[0-9a-f]{2})/ig, '%u00')
  .replace(/%(?=u[0-9a-f]{4})/ig, '\\');