A few days ago someone asked me how I would go about determining the Unicode escape sequence for an arbitrary character. Naturally, I opened Google Chrome’s web console and showed them how it can be done with the charCodeAt, toString, and slice functions. The following function takes a string and returns the equivalent with every character converted to its Unicode escape sequence:


/**
 * Replaces each character in the string with the corresponding
 * unicode escape sequence.
 * @param {string} str  The string of characters to escape.
 * @return {string}  The string with the escape sequences.
 */
function toUnicodeSequence(str) {
  for(var i = str.length; i--;) {
    str = str.slice(0, i) + '\\u'
        + ('000' + str.charCodeAt(i).toString(16)).slice(-4)
        + str.slice(i + 1);
  }
  return str;
}

Let’s review some basic JavaScript functionality:

  • The String.prototype.charCodeAt function returns the integer value of the UTF-16 code unit at a given index within a string.
  • The Number.prototype.toString function converts a number into a string that represents it in a given base. In other words, (5).toString(2) results in "101" because the binary (from the 2 passed in) representation of 5 is 101.
  • The String.prototype.slice function is similar to the String.prototype.substring function but has one advantage: it also accepts negative parameters. A negative number is treated as an index counted from the end of the string instead of the beginning. Therefore executing ("JavaScript").slice(-6) results in "Script" because those are the last 6 characters of the string "JavaScript".

By using these three functions together, we can create a function such as toUnicodeSequence which returns the escape sequences for all of the characters in a string. I am not sure how often one would need such a function, but have fun with it. 8)
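The three building blocks can also be tried one at a time in the console. The following is only a small sketch for a single character; the variable names are my own and purely illustrative:

```javascript
// Illustrative walk-through for one character. The variable names are
// hypothetical and not part of toUnicodeSequence itself.
var ch = 'A';

// 1. charCodeAt returns the UTF-16 code unit at the given index.
var code = ch.charCodeAt(0);           // 65

// 2. toString(16) renders that number in hexadecimal.
var hex = code.toString(16);           // '41'

// 3. Prefixing '000' and keeping the last 4 characters pads to 4 digits.
var padded = ('000' + hex).slice(-4);  // '0041'

// 4. Prepend a literal backslash and 'u' to form the escape sequence.
var escaped = '\\u' + padded;
console.log(escaped);                  // prints \u0041
```

The '000' prefix is enough because a code unit always has at least one hex digit and never more than four.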

    UPDATE (2013-08-30): Shorter Definition

    I just realized that this function could be easily shortened by taking advantage of the fact that you can use a regular expression with the global flag set and a callback function in order to replace all of the characters in a string:

    
    /**
     * Replaces each character in the string with the corresponding
     * unicode escape sequence.
     * @param {string} str  The string of characters to escape.
     * @return {string}  The string with the escape sequences.
     */
    function toUnicodeSequence(str) {
      return str.replace(/[\s\S]/g, function(c) {
        return '\\u' + ('000' + c.charCodeAt(0).toString(16)).slice(-4);
      });
    }
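
One way to sanity-check the output is to let JSON.parse decode the escape sequences again. The snippet below is a sketch that repeats the shorter definition so it can run on its own; the roundTrip helper is a hypothetical name I made up for this check. Keep in mind that charCodeAt works on UTF-16 code units, so a character outside the Basic Multilingual Plane comes out as two escape sequences (a surrogate pair).

```javascript
// Same shorter definition as above, repeated so this snippet is self-contained.
function toUnicodeSequence(str) {
  return str.replace(/[\s\S]/g, function(c) {
    return '\\u' + ('000' + c.charCodeAt(0).toString(16)).slice(-4);
  });
}

// Hypothetical helper: wrap the escaped text in double quotes and let
// JSON.parse turn the \uXXXX sequences back into characters.
function roundTrip(str) {
  return JSON.parse('"' + toUnicodeSequence(str) + '"');
}

console.log(toUnicodeSequence('Hi!'));      // prints \u0048\u0069\u0021
console.log(roundTrip('Hi!') === 'Hi!');    // prints true

// U+1F600 is stored as two UTF-16 code units, so two escapes come back.
var smiley = '\ud83d\ude00';
console.log(toUnicodeSequence(smiley));     // prints \ud83d\ude00
console.log(roundTrip(smiley) === smiley);  // prints true
```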
    

2 Comments

ildar · June 6, 2013 at 7:29 AM

var outStr = escape(inStr).replace(/%(u[0-9a-f]{2})?([0-9a-f]{2})/ig, function($0, $1, $2)
{
return '\\' + ($1 || 'u00') + $2;
});

ildar · June 6, 2013 at 7:40 AM

and the couple of crazy modifications

var outStr = escape(inStr)
.replace(/%(?=[0-9a-f]{2})/ig, '\\u00')
.replace(/%(?=u[0-9a-f]{4})/ig, '\\');

var outStr = escape(inStr)
.replace(/%(?=[0-9a-f]{2})/ig, '%u00')
.replace(/%(?=u[0-9a-f]{4})/ig, '\\');
