Escaping Fun: replaceAll ("\\\\", "\\\\\\\\")!

There is a small escaping bug in cucumber-jvm. The Groovy step snippets generated by the Java code do not properly escape the escape character \ in the step's regular expression.

Currently it generates:

Given(~'^I have (\d+) cukes in my "([^"]*)" belly') { int arg1 ->
    // Express the Regexp above with the code you wish you had
    throw new PendingException()
}

which should be:

Given(~'^I have (\\d+) cukes in my "([^"]*)" belly$') { int arg1 ->
    // Express the Regexp above with the code you wish you had
    throw new PendingException()
}

Because Cucumber generates source code here, the escape character has to be escaped in the snippet output too, i.e. the (\\d+). I have modified the Groovy snippet generation before, so it should be an easy fix. Or so I thought. ;-)

It was not a big issue, but it took me longer than expected to understand, because escaping the escape character with replaceAll() is a bit confusing at first.

Escaping the escape character (\) gets interesting inside a regular expression: it needs to be escaped again. All this endless escaping turns into this stupid piece of code:

public String escapePattern(String pattern) {
    return pattern.replaceAll ("\\\\", "\\\\\\\\");
}

The method above gets a regular expression string for a step as input (as I see it in the debugger):

"^I have (\\d+) cukes in my \"([^\"]*)\" belly$"

Actually the real string is just:

^I have (\d+) cukes in my "([^"]*)" belly$

And what we would like to see as the final regular expression is:

^I have (\\d+) cukes in my "([^"]*)" belly$

We just want to replace \ with \\.
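To make the difference between the debugger view and the real string concrete, here is a minimal sketch that counts the actual backslash characters. The class name BackslashCount and the helper countBackslashes are made up for this illustration; they are not part of cucumber-jvm.

```java
public class BackslashCount {

    // Hypothetical helper: counts the real backslash characters in a string.
    static long countBackslashes(String s) {
        return s.chars().filter(c -> c == '\\').count();
    }

    public static void main(String[] args) {
        // In Java source, \\ is ONE backslash at runtime and \" is a plain quote.
        String input = "^I have (\\d+) cukes in my \"([^\"]*)\" belly$";
        // What the generated snippet should contain: TWO backslashes.
        String wanted = "^I have (\\\\d+) cukes in my \"([^\"]*)\" belly$";

        System.out.println(input);                    // the "real" string
        System.out.println(countBackslashes(input));  // 1
        System.out.println(wanted);                   // the escaped string
        System.out.println(countBackslashes(wanted)); // 2
    }
}
```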

replaceAll takes a regular expression pattern (String) as the first parameter so we have to escape the \ twice to match it:

  • \ => \\ because \ is the escape character for regular expressions
  • \\ => \\\\ because \ is the escape character for Strings

\ is also a special character in the replacement (second) parameter of replaceAll, so we have to escape it twice there as well:

  • \\ => \\\\ because \ is the escape character in the replacement parameter
  • \\\\ => \\\\\\\\ because \ is the escape character for Strings
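The two layers can be pulled apart in code. The sketch below is only an illustration, not what the pull request does: it lets Pattern.quote and Matcher.quoteReplacement (both standard java.util.regex methods) add the regex-level escaping, so only the String-level escaping remains in the source. The class and method names are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeLayers {

    // All escaping written out by hand, as in the post.
    static String escapeByHand(String pattern) {
        return pattern.replaceAll("\\\\", "\\\\\\\\");
    }

    // Same replacement, but the regex-level escaping is delegated to the JDK.
    static String escapeWithQuoting(String pattern) {
        String target = "\\";        // one backslash at runtime
        String replacement = "\\\\"; // two backslashes at runtime
        return pattern.replaceAll(Pattern.quote(target),
                                  Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // String-level escaping halves the number of backslashes.
        System.out.println("\\\\".length());     // 2
        System.out.println("\\\\\\\\".length()); // 4

        String step = "^I have (\\d+) cukes";
        // Both variants turn \d into \\d.
        System.out.println(escapeByHand(step));
        System.out.println(escapeByHand(step).equals(escapeWithQuoting(step)));
    }
}
```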

Which finally leads to this stupid line: pattern.replaceAll ("\\\\", "\\\\\\\\"); !

This can be simplified by using String.replace (CharSequence target, CharSequence replacement), available since Java 1.5. It does not use regular expressions, which allows us to drop one level of escaping:

pattern.replace ("\\", "\\\\");
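As a quick sanity check, the sketch below compares the literal replace with the replaceAll version on the step from above. The class name SnippetEscaper is a placeholder for this example, not the real class in cucumber-jvm.

```java
public class SnippetEscaper {

    // Literal replacement: no regex involved, only String-level escaping.
    static String escapePattern(String pattern) {
        return pattern.replace("\\", "\\\\");
    }

    // The regex-based version from earlier, kept for comparison.
    static String escapePatternWithRegex(String pattern) {
        return pattern.replaceAll("\\\\", "\\\\\\\\");
    }

    public static void main(String[] args) {
        String step = "^I have (\\d+) cukes in my \"([^\"]*)\" belly$";

        // Prints the step with \d turned into \\d.
        System.out.println(escapePattern(step));

        // Both versions produce the same result.
        System.out.println(escapePattern(step).equals(escapePatternWithRegex(step)));
    }
}
```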

That is a lot easier to understand, and it is also the final solution for the pull request :-)
