Capitalizing words in JavaScript with RegExes

regexp, javascript

JavaScript's replace method accepts a function as the replacement argument. One use for this is to imitate the /e flag that other languages, such as Perl, offer for evaluating code on each match.

"abb Cdd-Eff.ghh 1AAA".toLowerCase()
  .replace(/(^|[^a-z0-9-])([a-z])/g,
    function(m, m1, m2, p) {
      return m1 + m2.toUpperCase();
    });
//output: Abb Cdd-eff.Ghh 1aaa

Breaking it down:

    "abb Cdd-Eff.ghh 1AAA"

    an anonymous string literal I am using on the console to test

    .toLowerCase()

    make it all lower case first. String methods work on plain string literals too; there is no need to create a String object with new String( "blah" )

    .replace(

    start the regular expression fun

    / .... /g

    replace takes either a regular expression or a string as its first argument. A string is matched literally, not converted into a regular expression, and only its first occurrence is replaced. Here a regular expression with the g (global) flag is used, so the search continues to the end of the string and every match is replaced

    (^

    match whatever is before the beginning of a word - the start of the string...

    |

    ...or...

    [^a-z0-9-])

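    a character which is neither a letter, a digit, nor the dash. Note that the dashes inside a-z and 0-9 denote ranges, while the one at the very end is a literal dash. Because the dash is excluded here, a letter that follows a dash is not treated as the start of a word, which is why the output contains "Cdd-eff"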

    ([a-z])

    the second group: the first letter of the word, i.e. the character that will be capitalized

    , function(

    the second argument to replace is often a string, but it can also be a function. As the RegExp was created with the g flag, this function will be called for every match - i.e., in this case, for every word. Its return value is used as the replacement text.

    m,

    the arguments passed to the function are similar to those returned by the match method. The first one is the complete matched string (not used in this function)

    m1, m2

    these are the captured groups, i.e. the text matched by each parenthesised subexpression: the character before the start of a word (an empty string at the start of the input) and the first letter of the word, respectively

    p)

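    p is the offset in the string where the match starts - not used in this function. replace also passes the complete original string as a further argument after the offset; this function does not declare it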

    { return m1 + m2.toUpperCase(); }

    the body of the function builds the replacement for each match. m1 is put back unchanged where it was found, and m2 is capitalized before being put back. The remaining characters were not matched and are therefore left untouched.
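
To see what the callback receives, the same call can be instrumented to record its arguments. This is only a sketch for experimenting on the console; the names input, calls and result are made up for the illustration.

```javascript
// Record the arguments that replace() passes to the callback on each match.
const input = "abb Cdd-Eff.ghh 1AAA".toLowerCase();
const calls = [];

const result = input.replace(/(^|[^a-z0-9-])([a-z])/g, function (m, m1, m2, p) {
  calls.push({ m: m, m1: m1, m2: m2, p: p });
  return m1 + m2.toUpperCase();
});

console.log(calls);
// [ { m: 'a',  m1: '',  m2: 'a', p: 0 },
//   { m: ' c', m1: ' ', m2: 'c', p: 3 },
//   { m: '.g', m1: '.', m2: 'g', p: 11 } ]
console.log(result); // Abb Cdd-eff.Ghh 1aaa
```

The callback runs three times: once at the start of the string, once after the space, and once after the full stop. The letters after the dash and after the digit never reach it.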

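    Note that this regular expression is not Unicode-aware: [a-z] only covers the ASCII letters, so a letter such as an umlaut is treated as a word separator. A word that starts with one is not capitalized, and the ASCII letter following an umlaut inside a word is capitalized by mistake.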
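
One possible way around this limitation, assuming an engine that supports the u flag and Unicode property escapes (ES2018 or later), is to replace the ASCII ranges with \p{L} (letters) and \p{N} (numbers). The sketch below is a variation on the original, not a drop-in from it: the function name capitalizeWords is hypothetical, and \p{M} (combining marks) is added to the excluded class so that a decomposed accent does not count as a word boundary.

```javascript
// Unicode-aware variant of the capitalizer (a sketch, needs ES2018+).
// \p{L} = any letter, \p{N} = any number, \p{M} = combining marks.
function capitalizeWords(text) {
  return text.toLowerCase().replace(
    /(^|[^\p{L}\p{N}\p{M}\-])(\p{L})/gu,
    function (m, m1, m2) {
      return m1 + m2.toUpperCase();
    }
  );
}

console.log(capitalizeWords("abb Cdd-Eff.ghh 1AAA"));    // Abb Cdd-eff.Ghh 1aaa
console.log(capitalizeWords("über ärger-öl.élan 1ÄÄÄ")); // Über Ärger-öl.Élan 1äää
```

Note that toLowerCase and toUpperCase follow the Unicode case mappings, so some letters change length when capitalized (a German sharp s, for example, becomes "SS").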