
How do I create a Java string from the contents of a file?


List<String> lines = Files.readAllLines(Paths.get(path), encoding);
String content = readFile("test.txt", Charset.defaultCharset());
String content = readFile("test.txt", StandardCharsets.UTF_8);
static String readFile(String path, Charset encoding) 
  throws IOException 
{
  byte[] encoded = Files.readAllBytes(Paths.get(path));
  return new String(encoded, encoding);
}
try (BufferedReader r = Files.newBufferedReader(path, encoding)) {
  r.lines().forEach(System.out::println);
}

@Sébastien Nussbaumer: I also ran into this problem. Amazing that the bug has been marked "Will Not Fix". This essentially means that FileChannel#map is, in general, unusable.

@Sébastien Nussbaumer: The bug has been deleted from the Oracle / Sun Bug Database: "This bug is not available." Google cached the site at webcache.googleusercontent.com/search?q=cache:bugs.sun.com/

For reading large files, you need a different design for your program: one that reads a chunk of text from a stream, processes it, and then moves on to the next, reusing the same fixed-size memory block. Here, "large" depends on the computer's specs; nowadays this threshold might be many gigabytes of RAM. The third method, using a Stream<String>, is one way to do this, if your input "records" happen to be individual lines. (Using the readLine() method of BufferedReader is the procedural equivalent of this approach.)
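As a sketch of that chunked design (assuming line-oriented records; the file contents and the per-record processing here are purely illustrative):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CountChars {
    public static void main(String[] args) throws IOException {
        // Scaffolding: create a small sample file for the demo
        Path path = Files.createTempFile("big", ".txt");
        Files.write(path, java.util.Arrays.asList("one", "two", "three"), StandardCharsets.UTF_8);

        long total = 0;
        // Process one line at a time; only the current line is held in memory,
        // so this works no matter how large the file is.
        try (BufferedReader r = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            for (String line; (line = r.readLine()) != null; ) {
                total += line.length();  // replace with real per-record processing
            }
        }
        System.out.println(total);  // 3 + 3 + 5 = 11
        Files.deleteIfExists(path);
    }
}
```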

Here's a compact, robust idiom for Java 7, wrapped up in a utility method:

In Java 8, BufferedReader added a new method, lines(), to produce a Stream<String>. If an IOException is encountered while reading the file, it is wrapped in an UncheckedIOException, since Stream doesn't accept lambdas that throw checked exceptions.
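A sketch of how the stream approach can still produce a single String, and where the UncheckedIOException would surface (the temp-file setup is just scaffolding for the example):

```java
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class JoinLines {
    public static void main(String[] args) throws Exception {
        Path path = Files.createTempFile("demo", ".txt");
        Files.write(path, java.util.Arrays.asList("alpha", "beta"), StandardCharsets.UTF_8);
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            // joining() normalizes every separator to "\n" -- lossy, as noted above
            String content = lines.collect(Collectors.joining("\n"));
            System.out.println(content.equals("alpha\nbeta"));  // true
        } catch (UncheckedIOException e) {
            // lazy reads report I/O failures here, not as a checked IOException
            throw e.getCause();
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```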

Java 7 added a convenience method to read a file as lines of text, represented as a List<String>. This approach is "lossy" because the line separators are stripped from the end of each line.

Note: after exercising that code a bit, I found that you can't reliably delete the file right after reading it with this method, which may be a non-issue in some cases, but not in mine. Could it be related to this issue: bugs.sun.com/bugdatabase/view_bug.do?bug_id=4715154 ? I eventually went with Jon Skeet's proposal, which doesn't suffer from this bug. Anyway, I just wanted to pass the information along for other people, just in case...

Note: This answer largely replaces my Java 6 version. The Java 7 utility safely simplifies the code, and the old answer, which used a mapped byte buffer, prevented the file that was read from being deleted until the mapped buffer was garbage collected. You can view the old version via the "edited" link on this answer.

One thing that is missing from the sample in the original post is the character encoding. There are some special cases where the platform default is what you want, but they are rare, and you should be able to justify your choice.

Possible typo? NIO has a Charset (not CharSet) class called java.nio.charset.Charset. Is this what CharSet should have been?

Technically speaking, it's O(n) in time and space. Qualitatively, due to the immutability requirement of Strings, it's pretty hard on memory: temporarily there are two copies of the char data in memory, plus room for the encoded bytes. Assuming a single-byte encoding, it will (temporarily) require 5 bytes of memory for each character in the file. Since the question asks specifically for a String, that's what I show, but if you can work with the CharBuffer returned by decode(), the memory requirement is much less. Time-wise, I don't think you'll find anything faster in the core Java libs.

The StandardCharsets class defines some constants for the encodings required of all Java runtimes:
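For illustration, these are the six charsets that StandardCharsets guarantees to be present on every JVM:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Encodings {
    public static void main(String[] args) {
        // Every conforming Java runtime must support these six encodings
        Charset[] required = {
            StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8, StandardCharsets.UTF_16,
            StandardCharsets.UTF_16BE, StandardCharsets.UTF_16LE
        };
        for (Charset cs : required) {
            System.out.println(cs.name());
        }
    }
}
```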

The first method, which preserves line breaks, can temporarily require memory several times the size of the file, because for a short time the raw file contents (a byte array) and the decoded characters (each of which is 16 bits, even if encoded as 8 bits in the file) reside in memory at once. It is safest to apply this to files that you know to be small relative to the available memory.

The platform default is available from the Charset class itself:
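For example (the result varies by OS and locale, which is exactly why relying on it implicitly is risky):

```java
import java.nio.charset.Charset;

public class DefaultEncoding {
    public static void main(String[] args) {
        // The JVM-wide default; it depends on the OS and locale settings,
        // so an explicit charset is usually the safer choice.
        Charset platformDefault = Charset.defaultCharset();
        System.out.println(platformDefault.name());
    }
}
```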

The second method, reading lines, is usually more memory efficient, because the input byte buffer for decoding doesn't need to contain the entire file. However, it's still not suitable for files that are very large relative to available memory.



public static String readFileAsString(String filePath) throws IOException {
    DataInputStream dis = new DataInputStream(new FileInputStream(filePath));
    try {
        long len = new File(filePath).length();
        if (len > Integer.MAX_VALUE) {
            throw new IOException("File " + filePath + " too large, was " + len + " bytes.");
        }
        byte[] bytes = new byte[(int) len];
        dis.readFully(bytes);  // fills the whole buffer, or throws EOFException if the file shrinks
        return new String(bytes, "UTF-8");
    } finally {
        dis.close();  // always release the file handle, even on failure
    }
}

The approach is to read the file as binary and convert it to a String at the end.



Agreed that Java is long on high-level abstractions but short on convenience methods.

Java attempts to be extremely general and flexible in all it does. As a result, something which is relatively simple in a scripting language (your code would be replaced with "open(file).read()" in python) is a lot more complicated. There doesn't seem to be any shorter way of doing it, except using an external library (like Willi aus Rohr mentioned). Your options:

True, Java has an insane number of ways of dealing with Files and many of them seem complicated. But this is fairly close to what we have in higher level languages: byte[] bytes = Files.readAllBytes(someFile.toPath());

Yep. It gives "high-level language" a different meaning: Java is high-level compared with C, but low-level compared with Python or Ruby.

Your best bet is probably the 2nd one, as it has the least dependencies.
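For completeness, here is the Files.readAllBytes two-liner mentioned in the comments, fleshed out into a runnable sketch (the temp file is just scaffolding for the example):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class TwoLiner {
    public static void main(String[] args) throws IOException {
        // Scaffolding: create a sample file
        Path someFile = Files.createTempFile("demo", ".txt");
        Files.write(someFile, "content".getBytes(StandardCharsets.UTF_8));

        // Read everything, then decode with an explicit charset
        byte[] bytes = Files.readAllBytes(someFile);
        String text = new String(bytes, StandardCharsets.UTF_8);

        System.out.println(text);  // content
        Files.deleteIfExists(someFile);
    }
}
```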



FileUtils#readFileToString
FileUtils.readFileToString
public static String readFileToString(File file)
                       throws IOException
public static long copyLarge(InputStream input, OutputStream output)
       throws IOException {
   byte[] buffer = new byte[DEFAULT_BUFFER_SIZE];
   long count = 0;
   int n = 0;
   while (-1 != (n = input.read(buffer))) {
       output.write(buffer, 0, n);
       count += n;
   }
   return count;
}
readFileToString(File file)
readFileToString(File file, Charset encoding)
  • file - the file to read, must not be null
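The copyLarge loop above is the heart of it. As a rough, self-contained sketch (pure JDK, not the actual Commons IO internals), the same buffered-copy idiom can slurp a file like this:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CopyIdiom {
    // Illustrative re-implementation of the buffered-copy idiom shown above
    static long copy(InputStream input, OutputStream output) throws IOException {
        byte[] buffer = new byte[4096];  // fixed-size block, reused for every chunk
        long count = 0;
        int n;
        while (-1 != (n = input.read(buffer))) {
            output.write(buffer, 0, n);
            count += n;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("demo", ".txt");
        Files.write(path, "hello".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = Files.newInputStream(path)) {
            copy(in, out);
        }
        System.out.println(out.toString("UTF-8"));  // hello
        Files.deleteIfExists(path);
    }
}
```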

@Guillaume: The biggest question is whether you're comfortable having a dependency on a 3rd party library. If you do have Commons IO or Guava in your project, then use that (just for code simplicity; otherwise there likely won't be a noticeable difference).

I'm using FileUtils too, but I'm wondering what is better betwwen using FileUtils or the accepted nio answer?

It's in the class org.apache.commons.io.FileUtils

Reads the contents of a file into a String using the default encoding for the VM. The file is always closed.

Returns: the file contents, never null

The code used (indirectly) by that class is:

Throws: - IOException - in case of an I/O error



Scanner scanner = new Scanner(new File("poem.txt"), "UTF-8");
String text = scanner.useDelimiter("\\A").next();  // throws java.util.NoSuchElementException if the file is empty
scanner.close(); // Put this call in a finally block

As the poster, I can say I really don't know if and when the file is properly closed... I never write this in production code; I use it only for tests or debugging.

From this page a very lean solution:

Scanner implements Closeable (it invokes close on the source), so while elegant, it shouldn't really be a one-liner. The default buffer size is 1024, but Scanner will increase it as necessary (see Scanner#makeSpace()).

\\A works because there is no "other beginning of file", so you are in fact reading the last token... which is also the first. I never tried it with \\Z. Also note that you can read anything that is Readable, like files, InputStreams, channels... I sometimes use this code to read from the display window of Eclipse, when I'm not sure whether I'm reading one file or another... yes, the classpath confuses me.
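Putting those caveats together, here is a sketch of the same idiom with try-with-resources, so the Scanner (and the underlying file) is closed even on failure (the file contents are illustrative):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;

public class ScannerSlurp {
    public static void main(String[] args) throws IOException {
        // Scaffolding: create a sample file
        Path path = Files.createTempFile("poem", ".txt");
        Files.write(path, "line one\nline two".getBytes(StandardCharsets.UTF_8));

        // try-with-resources closes the Scanner (and the file) automatically
        try (Scanner scanner = new Scanner(path.toFile(), "UTF-8")) {
            String text = scanner.useDelimiter("\\A").next();  // \A = beginning of input
            System.out.println(text.equals("line one\nline two"));  // true
        }
        Files.deleteIfExists(path);
    }
}
```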
