Java: Escape special characters in a text, whether or not the text is enclosed in double quotes?


import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Characters to escape: ( ) ' " [ ] *
String SPECIAL_REGEX_CHARS = "[()'\"\\[\\]*]";
//String s = "\"te(st\""; // => "te\(st"
//String s = "te(st"; // => te\(st
String s = "te\"st"; // => te\"st
StringBuffer result = new StringBuffer();
// Group 1: text between enclosing quotes; Group 2: any other text
Matcher m = Pattern.compile("(?s)\"(.*)\"|(.*)").matcher(s);
if (m.matches()) {
    if (m.group(1) == null) { // we have no quotes around
        m.appendReplacement(result, m.group(2).replaceAll(SPECIAL_REGEX_CHARS, "\\\\\\\\$0"));
    }
    else { // quoted: escape the inside, then put the quotes back
        m.appendReplacement(result, "\"" + m.group(1).replaceAll(SPECIAL_REGEX_CHARS, "\\\\\\\\$0") + "\"");
    }
}
m.appendTail(result);
System.out.println(result.toString());
  • The regex (?s)\"(.*)\"|(.*) has two alternative branches: \"(.*)\" matches a string that starts with " and ends with ", and (.*) matches any other string. (?s) is the inline DOTALL modifier, which lets . match line breaks as well.
  • If the first alternative matches, we escape the selected special characters in capture group 1 and then add the " back on both ends.
  • If the second alternative matches, we just add the escaping symbol throughout the whole of group 2.
  • Matcher#appendReplacement() together with Matcher#appendTail() lets you build the result from the manipulated groups.
  • To end up with one literal backslash, you need \\\\\\\\ in the replacement pattern: the Java string literal turns eight backslashes into four, replaceAll turns those into two, and appendReplacement turns those into one.
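
To make the backslash counting concrete, here is a minimal sketch of a single replaceAll pass, where four backslashes in the source are enough. The class name BackslashLevels and the method escapeOnce are hypothetical names chosen for illustration. Matcher.quoteReplacement is shown as one possible way to make already-escaped text safe for appendReplacement; it is not something the snippet at the top uses.

```java
import java.util.regex.Matcher;

public class BackslashLevels {
    // Hypothetical helper: escapes the same character set in a single pass.
    static String escapeOnce(String text) {
        // The literal "\\\\$0" is the 4-character string \\$0.
        // replaceAll reads \\ as one literal backslash and $0 as the whole match.
        return text.replaceAll("[()'\"\\[\\]*]", "\\\\$0");
    }

    public static void main(String[] args) {
        String escaped = escapeOnce("te(st");
        System.out.println(escaped); // te\(st

        // quoteReplacement doubles each backslash (and escapes $), so the text
        // survives a further pass through appendReplacement unchanged.
        System.out.println(Matcher.quoteReplacement(escaped)); // te\\(st
    }
}
```

The eight-backslash form is needed only because the replaced text is then fed to appendReplacement, which interprets backslashes a second time.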

The snippet at the top is how I would do that with one regex using an alternation.

I have added a "simpler" solution with minimal regex.

You know, "simplified" may mean different things to different people :) I think this is already simple, taking your requirements into account. If anything is unclear, please ask.

You may use if (s.startsWith("\"") && s.endsWith("\"")) to check whether a string has both a leading and a trailing ". If it does, strip them with replaceAll("^\"|\"$", ""), escape the rest using your escaping regex, and then add the " back. Otherwise, just escape the characters in your set.
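
The startsWith/endsWith approach could be sketched roughly as follows. The class name QuoteAwareEscaper and the method escape are hypothetical. The length check is an added assumption, so that a string consisting of a single " is treated as unquoted text, which matches what the regex version does.

```java
public class QuoteAwareEscaper {
    // Same character set as SPECIAL_REGEX_CHARS in the regex-based version.
    private static final String SPECIAL_REGEX_CHARS = "[()'\"\\[\\]*]";

    static String escape(String s) {
        // "\\\\$0" is the replacement \\$0: one literal backslash, then the matched character.
        String replacement = "\\\\$0";
        // Assumption: a lone " does not count as an enclosing pair of quotes.
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            // Strip the enclosing quotes, escape the inside, then add the quotes back.
            String inner = s.replaceAll("^\"|\"$", "");
            return "\"" + inner.replaceAll(SPECIAL_REGEX_CHARS, replacement) + "\"";
        }
        // No enclosing quotes: escape every character from the set.
        return s.replaceAll(SPECIAL_REGEX_CHARS, replacement);
    }

    public static void main(String[] args) {
        System.out.println(escape("\"te(st\"")); // "te\(st"
        System.out.println(escape("te(st"));     // te\(st
        System.out.println(escape("te\"st"));    // te\"st
    }
}
```

Because replaceAll is applied only once here, four backslashes in the source are sufficient.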

Note