Open the file with a FileInputStream, then use an InputStreamReader with the UTF-8 Charset to read characters from the stream, and use a BufferedReader to read lines, e.g. via BufferedReader#readLine, which will give you a string. Once you have the string, you can check for characters that aren't what you consider to be printable.

E.g. (without error checking), using try-with-resources (available since Java 7):

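import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

String line;
try (
    InputStream fis = new FileInputStream("the_file_name");
    InputStreamReader isr = new InputStreamReader(fis, StandardCharsets.UTF_8);
    BufferedReader br = new BufferedReader(isr);
) {
    while ((line = br.readLine()) != null) {
        // Deal with the line, e.g. check it for unprintable characters
    }
}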

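Or, for one less step, open the file with a FileReader and use a BufferedReader to read lines. Note that before Java 11 FileReader has no constructor taking a charset and always uses the platform default encoding, so this shortcut only reads UTF-8 reliably when UTF-8 is the default.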
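
What counts as "printable" is up to you. A minimal sketch of the per-line check, assuming (hypothetically) that "unprintable" means any ISO control character other than tab; the class and method names are made up for illustration:

```java
final class PrintableCheck {
    // Assumption: a line is "unprintable" if it contains an ISO control
    // character other than tab. Adjust the rule to your own definition.
    static boolean hasUnprintable(String line) {
        for (int i = 0; i < line.length(); ) {
            int codePoint = line.codePointAt(i);
            if (Character.isISOControl(codePoint) && codePoint != '\t') {
                return true;
            }
            // Step by code point so surrogate pairs are handled as one character
            i += Character.charCount(codePoint);
        }
        return false;
    }
}
```

Inside the read loop you would call PrintableCheck.hasUnprintable(line) for each line returned by readLine.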

@abhisheknaik96: Thank you for your edit, but only the isr bit was correct; the () are supposed to be (), not {}, and the last semicolon isn't required (but it's allowed, so I've left it -- more in keeping with the lines above it).

java - Check line for unprintable characters while reading text file -...

java file file-io

To repeat a string n times, Apache Commons provides a repeat method in its StringUtils class. It takes the string, the separator to put between the repeated strings, and the number of times the string should repeat.

StringUtils.repeat("Hello"," ",2);

The example above repeats the string Hello two times with a space as separator, giving "Hello Hello". The separator is the second argument and the number of repetitions is the third argument.
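
If adding Apache Commons is not an option, a rough stand-in can be sketched with the standard library alone (Java 8 or later). The class and method names here are hypothetical, and this is not how StringUtils is implemented:

```java
import java.util.Collections;

final class RepeatDemo {
    // Joins `times` copies of `s`, placing `separator` between them.
    static String repeat(String s, String separator, int times) {
        return String.join(separator, Collections.nCopies(times, s));
    }
}
```

RepeatDemo.repeat("Hello", " ", 2) gives "Hello Hello", and a count of 0 gives the empty string.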

This is my preferred solution too as it is very readable, and especially when using a separator it has its special cases taken care of

The class is actually "StringUtils" not "Stringutils"

loops - How to repeat string "n" times in java? - Stack Overflow

java loops for-loop

The Path class does not have a notion of "extension", probably because the file system itself does not have one. That is why you need to check its String representation and see whether it ends with the five-character string .java. Note that you need a different comparison than a simple endsWith if you want to cover mixed case, such as ".JAVA" and ".Java":

path.toString().toLowerCase().endsWith(".java");
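
The same check can be wrapped in a small helper. This is a sketch with hypothetical names; it looks only at the file name and uses Locale.ROOT so the lower-casing does not depend on the default locale:

```java
import java.nio.file.Path;
import java.util.Locale;

final class PathExtension {
    // Returns true if the file name of `path` ends with "." + extension,
    // ignoring case. `extension` is given without the leading dot.
    static boolean hasExtension(Path path, String extension) {
        Path fileName = path.getFileName();
        if (fileName == null) {
            return false; // e.g. a root path has no file name
        }
        String suffix = "." + extension.toLowerCase(Locale.ROOT);
        return fileName.toString().toLowerCase(Locale.ROOT).endsWith(suffix);
    }
}
```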

How to check the extension of a Java 7 Path - Stack Overflow

java path java-7

Here is an elegant and pure Java one-line solution:

String str = new String(new char[10]).replace("\0", "1");
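
The one-liner works because a new char array is filled with '\0', and each of those is then replaced. A more explicit variant, sketched here with hypothetical names, fills the array directly:

```java
import java.util.Arrays;

final class RepeatChar {
    // Builds a string consisting of `count` copies of `c`.
    static String of(char c, int count) {
        char[] chars = new char[count];
        Arrays.fill(chars, c);
        return new String(chars);
    }
}
```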

Can one initialise a java String with a single repeated character to a...

java string initialization

Suppose we have a list of String like:

List<String> strList = new ArrayList<>(5);
// insert up to five items to list.

Then we can remove duplicate elements in multiple ways.

// Plain Java
List<String> deDupStringList = new ArrayList<>(new HashSet<>(strList));
// Guava
List<String> deDupStringList2 = Lists.newArrayList(Sets.newHashSet(strList));

Note: If we want to maintain the insertion order then we need to use LinkedHashSet in place of HashSet.

// Java 8 streams; note that the map step also lower-cases every element
List<String> deDupStringList3 = strList.parallelStream().map(String::toLowerCase).distinct().collect(Collectors.toList());
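
The order-preserving variant mentioned in the note can be sketched as follows (class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

final class Dedup {
    // LinkedHashSet remembers insertion order, so the first occurrence of
    // each element keeps its position and later duplicates are dropped.
    static <T> List<T> keepFirstOccurrences(List<T> items) {
        return new ArrayList<>(new LinkedHashSet<>(items));
    }
}
```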

java - How do I remove repeated elements from ArrayList? - Stack Overf...

java list collections arraylist duplicates

The Tesseract API class provides an isValidWord method to check whether a string is a valid word. You can use this to check the recognized characters, which will increase the accuracy of the output.

I am developing with Tess4J, which is a Java JNA wrapper for tesseract-ocr, and it gives quite good results after checking.

Inaccurate results might be due to the text size, check this out. It says "Accuracy drops off below 10pt x 300dpi, rapidly below 8pt x 300dpi."

Further, not being able to detect more than 4 words depends on a lot of factors, what kind (with how many features) of test image, the size of the image, platform etc.

Thanks but i wanted to know how can we improve the recognition ? Like for instance if you see the project uploaded by Robert Theis at github.com/rmtheis/android-ocr then you can see he has used image enhancement algorithms and even though he uses the same Tesseract API as mine the recognition rate is higher

Oh of course, image pre-processing would increase the accuracy of the OCR engine, but at an additional cost in time. For pre-processing you can increase the DPI of the image, resize the image, and also try blurring/sharpening. High contrast between text and background is recognized much better. After that, try de-noising and binarizing the image. It increases the accuracy quite a lot.

java - How do I improve the accuracy of the OCR text from Tesseract? -...

java android android-ndk ocr tesseract

It is not deprecated. The decode method with two parameters is not deprecated. Please check again. The first parameter is the String to decode; the second is the name of the character encoding to use (e.g., "UTF-8").
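
The answer does not name the class. Assuming it refers to java.net.URLDecoder, whose one-argument decode is the deprecated overload, a minimal sketch of the two-parameter call looks like this (the wrapper class and method are hypothetical):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class DecodeDemo {
    // Decodes an application/x-www-form-urlencoded string as UTF-8.
    static String decodeUtf8(String encoded) {
        try {
            // First argument: the string to decode; second: the encoding name
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8
            throw new IllegalStateException(e);
        }
    }
}
```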

How do decode html entities in java? - Stack Overflow

java decode utf8-decode

If strings only can occur between brackets, then you don't need to check for them at all and just use "[^"]*" as your regex and find all matches (assuming no escaped quotes).

If that doesn't work because strings could occur in other places too, where you don't want to capture them, do it in two steps.

  • Find the bracketed section with \[[^\]]*\]
  • Find all occurrences of "[^"]*" within the result of the first match. Or even use a JSON parser to read that string.
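
The two steps can be sketched in Java as follows. The class and method names are hypothetical, and like the regexes above it assumes no escaped quotes and no brackets inside the strings:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BracketedStrings {
    // Step 1: a bracketed section. Step 2: quoted strings inside it.
    private static final Pattern BRACKETS = Pattern.compile("\\[[^\\]]*\\]");
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    static List<String> find(String input) {
        List<String> result = new ArrayList<>();
        Matcher outer = BRACKETS.matcher(input);
        while (outer.find()) {
            // Search for quoted strings only within the bracketed match
            Matcher inner = QUOTED.matcher(outer.group());
            while (inner.find()) {
                result.add(inner.group());
            }
        }
        return result;
    }
}
```

Quoted strings outside any brackets are ignored because the inner search never sees them.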

Third possibility, cheating a bit:

Search for "[^"\[\]]*"(?=[^\[\]]*\]). That will match a string only if the next bracket that follows is a closing bracket. Limitation: No brackets are allowed inside the strings. I consider this ugly, especially if you look at what it looks like in Java:

List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("\"[^\"\\[\\]]*\"(?=[^\\[\\]]*\\])");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    matchList.add(regexMatcher.group());
}

Do you think anybody who looks at this in a few months can tell what it's doing?

Well, OK, there is another way, but I don't think it's a good one. I've added it to my answer.

java - Regex: How to capture this? (a nested group inside a repeated g...

java regex

This does what you want in java as a single regex, although I would personally use something like the solution provided by Mark Rhodes. This will get ridiculous quick (if it isn't already...) as the rules get more complicated.

String regex = "^(?=.*?\\p{Lu})(?=.*?[\\p{L}&&[^\\p{Lu}]])(?=.*?\\d)" +
               "(?=.*?[`~!@#$%^&*()\\-_=+\\\\\\|\\[{\\]};:'\",<.>/?]).*$";

+1 Thank you, Affe, can you explain this to me? I must confess that I'm not a regular expression expert and I just wanted to solve my problem with regular expressions, instead of checking my strings character-by-character.
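
Each (?=.*?X) in the regex is a lookahead asserting that X occurs somewhere in the input, so the pattern is really four independent checks. A sketch that spells them out one by one, for single-line input (the class and method names are hypothetical, and this is not the alternative solution the answer refers to):

```java
import java.util.regex.Pattern;

final class PasswordRules {
    // One pattern per lookahead of the combined regex
    private static final Pattern UPPER = Pattern.compile("\\p{Lu}");
    private static final Pattern NON_UPPER_LETTER = Pattern.compile("[\\p{L}&&[^\\p{Lu}]]");
    private static final Pattern DIGIT = Pattern.compile("\\d");
    private static final Pattern SPECIAL =
            Pattern.compile("[`~!@#$%^&*()\\-_=+\\\\\\|\\[{\\]};:'\",<.>/?]");

    static boolean isValid(String s) {
        // find() succeeds if the pattern occurs anywhere in the string
        return UPPER.matcher(s).find()
                && NON_UPPER_LETTER.matcher(s).find()
                && DIGIT.matcher(s).find()
                && SPECIAL.matcher(s).find();
    }
}
```

Splitting the rules this way also makes it easy to report which rule failed.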

java - How to check whether a string contains lowercase letter, upperc...

java regex

You can use String.charAt(int) to get a character from any point in the string. Note that, like arrays, the first character in a string is index 0, not 1.

input = input.toUpperCase(); // Accept both lower-case and upper-case input
int x = input.charAt(0) - 'A';
int y = input.charAt(1) - '1';
char currentValue = board[x][y];

After that, currentValue will contain the value currently on the game board at that location.
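
A slightly more defensive sketch of the same conversion, assuming a 3x3 board and input such as "B2" (the class name, method name, and validation rules are illustrative assumptions):

```java
import java.util.Locale;

final class BoardIndex {
    // Converts input like "B2" into zero-based {row, column} indices.
    static int[] parse(String input) {
        if (input == null || input.length() != 2) {
            throw new IllegalArgumentException("Expected a letter and a digit, e.g. B2");
        }
        String s = input.toUpperCase(Locale.ROOT);
        int x = s.charAt(0) - 'A'; // 'A' -> 0, 'B' -> 1, 'C' -> 2
        int y = s.charAt(1) - '1'; // '1' -> 0, '2' -> 1, '3' -> 2
        if (x < 0 || x > 2 || y < 0 || y > 2) {
            throw new IllegalArgumentException("Out of range: " + input);
        }
        return new int[] { x, y };
    }
}
```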

Do you mind elaborating on what is going on in your second and third line of code? I don't get why you're subtracting 'A' and '1'.

It's because arrays start from 0: if the user inputs 1, he actually refers to cell 0 in the array. Same for 'A'. It's the ASCII code: 'A' - 'A' = 0, 'B' - 'A' = 1 ('A' has code 65 and '1' has code 49; the chars are converted to int for the subtraction).

tic tac toe - Java Tic Tac Toe: convert string to check for value in 2...

java tic-tac-toe

How it works: a simpler example

At a high level, the pattern matches any one character ., but additionally performs a grab$2 action, which captures the reversal "mate" of the character that was matched into group 2. This capture is done by building a suffix of the input string whose length matches the length of the prefix up to the current position. We do this by applying assertSuffix on a pattern that grows the suffix by one character, repeating this once forEachDotBehind. Group 1 captures this suffix. The first character of that suffix, captured in group 2, is the reversal "mate" for the character that was matched.

Thus, replacing each matched character with its "mate" has the effect of reversing a string.

To better understand how the regex pattern works, let's first apply it on a simpler input. Also, for our replacement pattern, we'll just "dump" out all the captured strings so we get a better idea of what's going on. Here's a Java version:

System.out.println(
    "123456789"
        .replaceAll(REVERSE, "[$0; $1; $2]\n")
);

[1; 9; 9]
[2; 89; 8]
[3; 789; 7]
[4; 6789; 6]
[5; 56789; 5]
[6; 456789; 4]
[7; 3456789; 3]
[8; 23456789; 2]
[9; 123456789; 1]

Thus, e.g. [3; 789; 7] means that the dot matched 3 (captured in group 0), the corresponding suffix is 789 (group 1), whose first character is 7 (group 2). Note that 7 is 3's "mate".

                      current position after
                      the dot matched 3
                                      ________
                      1  2 [3] 4  5  6 (7) 8  9
                      \______/         \______/
                       3 dots        corresponding
                       behind      suffix of length 3

Note that a character's "mate" may be to its right or left. A character may even be its own "mate".
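
To make the "mate" relationship concrete, here is a plain-loop sketch that computes the same three values the dump shows. It is not the regex itself, only an illustration of what groups 0, 1 and 2 end up holding; the class and method names are hypothetical:

```java
final class MateDump {
    // For each position i, the suffix has the same length as the prefix
    // up to and including i; its first character is the "mate".
    static String dump(String s) {
        StringBuilder out = new StringBuilder();
        int n = s.length();
        for (int i = 0; i < n; i++) {
            String suffix = s.substring(n - (i + 1));
            char mate = suffix.charAt(0);
            out.append('[').append(s.charAt(i)).append("; ")
               .append(suffix).append("; ").append(mate).append("]\n");
        }
        return out.toString();
    }

    // Replacing every character with its mate reverses the string.
    static String reverse(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            out.append(s.charAt(s.length() - 1 - i));
        }
        return out.toString();
    }
}
```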

The pattern responsible for matching and building the growing suffix is the following:

    ((.) \1?)
    |\_/    |
    | 2     |       "suffix := (.) + suffix
    |_______|                    or just (.) if there's no suffix"
        1

Note that within the definition of group 1 is a reference to itself (with \1), though it is optional (with ?). The optional part provides the "base case", a way for the group to match without the reference to itself. This is required because an attempt to match a group reference always fails when the group hasn't captured anything yet.

Once group 1 captures something, the optional part is never exercised in our setup, since the suffix that we just captured last time will still be there this time, and we can always prepend another character to the beginning of this suffix with (.). This prepended character is captured into group 2.

Thus this pattern attempts to grow the suffix by one dot. Repeating this once forEachDotBehind will therefore result in a suffix whose length is exactly the length of the prefix up to our current position.

Note that so far we've treated assertSuffix and forEachDotBehind as black boxes. In fact, leaving this discussion for last is a deliberate act: the names and the brief documentation suggest WHAT they do, and this was enough information for us to write and read our REVERSE pattern!

Upon closer inspection, we see that the Java and C# implementations of these abstractions slightly differ. This is due to the differences between the two regex engines.

The .NET regex engine allows full regular expressions in a lookbehind, so these meta-patterns look a lot more natural in that flavor.

  • AssertSuffix(pattern) := (?=.*$(?<=pattern)), i.e. we use a lookahead to go all the way to the end of the string, then use a nested lookbehind to match the pattern against a suffix.
  • ForEachDotBehind(assertion) := (?<=(?:.assertion)*), i.e. we simply match .* in a lookbehind, tagging the assertion along with the dot inside a non-capturing group.

Since Java doesn't officially support infinite-length lookbehind (but it works anyway under certain circumstances), its counterpart is a bit more awkward:

  • assertSuffix(pattern) := (?<=(?=^.*?pattern$).*), i.e. we use a lookbehind to go all the way to the beginning of the string, then use a nested lookahead to match the entire string, prepending the suffix pattern with .*? to reluctantly match some irrelevant prefix.
  • forEachDotBehind(assertion) := (?<=^(?:.assertion)*?), i.e. we use an anchored lookbehind with reluctant repetition, i.e. ^.*? (and likewise tagging the assertion along with the dot inside a non-capturing group).
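
Because meta-patterns are just string templates, the two Java definitions above can be sketched as plain string-building helpers. The class name is hypothetical, and the sketch only builds the pattern text; whether the result compiles depends on the lookbehind support of the Java version in use, as noted above:

```java
final class MetaPatterns {
    // assertSuffix(pattern) := (?<=(?=^.*?pattern$).*)
    static String assertSuffix(String pattern) {
        return "(?<=(?=^.*?" + pattern + "$).*)";
    }

    // forEachDotBehind(assertion) := (?<=^(?:.assertion)*?)
    static String forEachDotBehind(String assertion) {
        return "(?<=^(?:." + assertion + ")*?)";
    }
}
```

Nesting the calls, e.g. MetaPatterns.forEachDotBehind(MetaPatterns.assertSuffix("((.) \\1?)")), produces the pattern text for growing the suffix once per dot behind the current position.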

It should be noted that while the C# implementation of these meta-patterns doesn't work in Java, the Java implementation DOES work in C# (see on ideone.com). Thus, there is no actual need to have different implementations for C# and Java, but the C# implementation deliberately took advantage of the more powerful .NET regex engine lookbehind support to express the patterns more naturally.

We have thus shown the benefits of using meta-pattern abstractions:

  • Meta-patterns promote reuse, and programmatic generation means there's less duplication

While this particular manifestation of the concept is rather primitive, it's also possible to take this further and develop a more robust programmatic pattern generation framework, with a library of well-tested and optimized meta-patterns.

It needs to be reiterated that reversing a string with regex is NOT a good idea in practice. It's way more complicated than necessary, and the performance is quite poor.

That said, this article shows that it CAN in fact be done, and that when expressed at higher levels using meta-pattern abstractions, the solution is in fact quite readable. As a key component of the solution, the nested reference is showcased once again in what is hopefully another engaging example.

Less tangibly, perhaps the article also shows the determination required to solve a problem that may seem difficult (or even "impossible") at first. Perhaps it also shows the clarity of thought that comes with a deeper understanding of a subject matter, a result of numerous studies and hard work.

No doubt regex can be an intimidating subject, and certainly it's not designed to solve all of your problems. This is no excuse for hateful ignorance, however, and this is one surprisingly deep well of knowledge if you're willing to learn.

c# - How does this regex replacement reverse a string? - Stack Overflo...

c# java regex lookaround nested-reference

You might want to remove the accents and diacritical marks first, then at each character position check whether the "simplified" string has an ASCII letter: if it does, the original position contains a word character; if not, it can be removed.
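
A minimal sketch of this idea using java.text.Normalizer. Each character is simplified on its own so that positions stay aligned with the original; the class and method names are hypothetical, and only letters are kept:

```java
import java.text.Normalizer;

final class WordChars {
    // Keeps a character if its accent-stripped form is an ASCII letter.
    static String keepLetters(String input) {
        // Compose first so that "e" + combining accent becomes one character
        String composed = Normalizer.normalize(input, Normalizer.Form.NFC);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < composed.length(); i++) {
            char original = composed.charAt(i);
            // Decompose the single character and drop its combining marks
            String simplified = Normalizer
                    .normalize(String.valueOf(original), Normalizer.Form.NFD)
                    .replaceAll("\\p{M}", "");
            if (simplified.length() == 1 && isAsciiLetter(simplified.charAt(0))) {
                out.append(original); // keep the accented original
            }
        }
        return out.toString();
    }

    private static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }
}
```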

Class java.text.Normalizer is not supported before Android API level 9, so if your app must be compatible with API level 8 (13% of total devices, according to Google's Android dashboard), this method is not viable.

regex - Remove all non-"word characters" from a String in Java, leavin...

java regex string

You can also unconditionally append the delimiter string after every element, and after the loop remove the extra delimiter at the end. Then an "if list is empty then return this string" at the beginning will allow you to avoid the check at the end (as you cannot remove characters from an empty string).
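
A sketch of that approach, with hypothetical class and method names and the empty string chosen as the result for an empty list:

```java
import java.util.List;

final class Delimited {
    static String join(List<String> items, String delimiter) {
        if (items.isEmpty()) {
            return ""; // early exit, so the trim below is always safe
        }
        StringBuilder sb = new StringBuilder();
        for (String item : items) {
            sb.append(item).append(delimiter); // append unconditionally
        }
        // Remove the one extra delimiter added after the last element
        sb.setLength(sb.length() - delimiter.length());
        return sb.toString();
    }
}
```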

So the question really is:

"Given a loop and an if, what do you think is the clearest way to have these together?"

pretty print - Clearest way to comma-delimit a list (Java)? - Stack Ov...

java pretty-print

You could check the length: all hashes have exactly the same length provided the same algorithm is used. With jBCrypt the full string is 60 characters: a 7-character prefix such as $2a$10$ followed by 53 characters (a 22-character salt and a 31-character hash). To make this more reliable you could also check that the first character is $ and that positions 3 and 6 (counting from 0) also contain $. There are other factors that can be checked as well, such as the version at positions 1 and 2 and the work factor at positions 4 and 5 being the ones your application uses. Combine all of this with a validation rule so that a user cannot enter a password of that form. If this is not viable, create an instance boolean that is set to true when the password is hashed, which requires each password to be its own object.
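
These format checks can be sketched as a single regex. This is only a heuristic under the assumptions above (a $2$, $2a$, $2b$ or $2y$ version, a two-digit work factor, and 53 characters from the bcrypt base-64 alphabet); it cannot prove that a string really is a hash, and the names are hypothetical:

```java
import java.util.regex.Pattern;

final class BcryptFormat {
    // $ + version + $ + two-digit cost + $ + 53 salt-and-hash characters
    private static final Pattern SHAPE =
            Pattern.compile("\\$2[aby]?\\$\\d{2}\\$[./A-Za-z0-9]{53}");

    static boolean looksLikeBcrypt(String candidate) {
        return candidate != null && SHAPE.matcher(candidate).matches();
    }
}
```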

Yes, I thought of that but as said by @Hexaholic, this is not 100% sure since a user can have a password that fit exactly these requirements.

java - Check if a string has been hashed with BCrypt or not - Stack Ov...

java hash bcrypt jbcrypt

Simple solution with one escaping regex only

You may use the if (s.startsWith("\"") && s.endsWith("\"")) to check if a string has both leading and trailing ", and if it does, you can then trim out the leading and trailing " with replaceAll("^\"|\"$", ""), then escape using your escaping regex, and then add " back. Else, just escape the characters in your set.
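
A sketch of this simple approach. The class and method names are hypothetical, the character set is the one used in this answer, and the length check for a lone quote character is an added assumption:

```java
final class QuoteAwareEscaper {
    private static final String SPECIAL_REGEX_CHARS = "[()'\"\\[\\]*]";

    static String escape(String s) {
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            // Trim the surrounding quotes, escape the inside, add them back
            String inner = s.replaceAll("^\"|\"$", "");
            return "\"" + escapeSpecials(inner) + "\"";
        }
        return escapeSpecials(s);
    }

    private static String escapeSpecials(String s) {
        // "\\\\$0" is a literal backslash followed by the matched character
        return s.replaceAll(SPECIAL_REGEX_CHARS, "\\\\$0");
    }
}
```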

Here is how I would do that with one regex using an alternation:

String SPECIAL_REGEX_CHARS = "[()'\"\\[\\]*]";
//String s = "\"te(st\""; // => "te\(st"
//String s = "te(st"; // => te\(st
String s = "te\"st"; // => te\"st
StringBuffer result = new StringBuffer();
Matcher m = Pattern.compile("(?s)\"(.*)\"|(.*)").matcher(s);
if (m.matches()) {
    if (m.group(1) == null) { // we have no quotes around
        m.appendReplacement(result, m.group(2).replaceAll(SPECIAL_REGEX_CHARS, "\\\\\\\\$0"));
    }
    else {
        m.appendReplacement(result, "\"" + m.group(1).replaceAll(SPECIAL_REGEX_CHARS, "\\\\\\\\$0") + "\"");
    }
}
m.appendTail(result);
System.out.println(result.toString());

  • Matcher#appendReplacement() together with Matcher#appendTail() allows manipulating groups.
  • Using (?s)\"(.*)\"|(.*) regex with 2 alternative branches: ".*" matching a string starting with " and ending with " (note that (?s) is a DOTALL inline modifier allowing matching strings with newline sequences) or a .* alternative just matching all other strings.
  • If the 1st alternative is matched, we just replace the selected special characters in the first capture group, and then add the " on both ends.
  • If the second alternative is matched, just add the escaping symbol in the whole Group 2.
  • To replace with a literal backslash, you need \\\\\\\\ in the replacement pattern.

You know, "simplified" may mean different things for different people :) I think this is already simple taking into account your requirements. If anything is unclear, please ask.

I have added a "simpler" solution with minimal regex.

java - Escape special characters in a text when text is either enclose...

java regex