multithreading - Difference between volatile and synchronized in Java ...

It's important to understand that there are two aspects to thread safety: (1) execution control, and (2) memory visibility. The first has to do with controlling when code executes (including the order in which instructions are executed) and whether it can execute concurrently; the second with when the effects in memory of what has been done are visible to other threads. Because each CPU has several levels of cache between it and main memory, threads running on different CPUs or cores can see "memory" differently at any given moment, because threads are permitted to obtain and work on private copies of main memory.

Using synchronized prevents any other thread from obtaining the monitor (or lock) for the same object, thereby preventing all code blocks protected by synchronization on the same object from executing concurrently. Synchronization also creates a "happens-before" memory barrier, causing a memory visibility constraint such that anything done up to the point some thread releases a lock appears to another thread subsequently acquiring the same lock to have happened before it acquired the lock. In practical terms, on current hardware, this typically causes flushing of the CPU caches when a monitor is acquired and writes to main memory when it is released, both of which are (relatively) expensive.

Using volatile, on the other hand, forces all accesses (read or write) to the volatile variable to occur to main memory, effectively keeping the volatile variable out of CPU caches. This can be useful for some actions where it is simply required that visibility of the variable be correct and the order of accesses is not important. Using volatile also changes the treatment of long and double to require accesses to them to be atomic; on some (older) hardware this might require locks, though not on modern 64-bit hardware. Under the new (JSR-133) memory model for Java 5+, the semantics of volatile have been strengthened to be almost as strong as synchronized with respect to memory visibility and instruction ordering (see http://www.cs.umd.edu/users/pugh/java/memoryModel/jsr-133-faq.html#volatile). For the purposes of visibility, each access to a volatile field acts like half a synchronization.

Under the new memory model, it is still true that volatile variables cannot be reordered with each other. The difference is that it is now no longer so easy to reorder normal field accesses around them. Writing to a volatile field has the same memory effect as a monitor release, and reading from a volatile field has the same memory effect as a monitor acquire. In effect, because the new memory model places stricter constraints on reordering of volatile field accesses with other field accesses, volatile or not, anything that was visible to thread A when it writes to volatile field f becomes visible to thread B when it reads f.
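
For example, the classic flag-publication idiom relies on exactly this guarantee. A minimal sketch (my example, not from the original answer):

class FlagPublication {
    private int data;               // plain, non-volatile field
    private volatile boolean ready; // volatile flag

    void writer() {                 // runs on thread A
        data = 42;                  // plain write
        ready = true;               // volatile write acts as a "release"
    }

    void reader() {                 // runs on thread B
        if (ready) {                // volatile read acts as an "acquire"
            System.out.println(data); // guaranteed to see 42, not a stale 0
        }
    }
}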

So, now both forms of memory barrier (under the current JMM) cause an instruction re-ordering barrier which prevents the compiler or run-time from re-ordering instructions across the barrier. In the old JMM, volatile did not prevent re-ordering. This can be important, because apart from memory barriers the only limitation imposed is that, for any particular thread, the net effect of the code is the same as it would be if the instructions were executed in precisely the order in which they appear in the source.

One use of volatile is where a shared but immutable object is recreated on the fly, with many other threads taking a reference to the object at a particular point in their execution cycle. I recently had code exactly like that, with the threads picking up the reference at the start of handling a message; volatile is perfect for that situation. The other threads need to begin using the recreated object as soon as it is published, but do not need the additional overhead of full synchronization and its attendant contention and cache flushing.

// Declaration
public class SharedLocation {
    public static volatile SomeObject someObject = new SomeObject(); // default object
}

// Publishing code
// Note: do not simply use SharedLocation.someObject.xxx(), since although
//       someObject will be internally consistent for xxx(), a subsequent
//       call to yyy() might be inconsistent with xxx() if the object was
//       replaced in between calls.
SharedLocation.someObject = new SomeObject(...); // new object is published

// Using code
private String getError() {
    SomeObject myCopy = SharedLocation.someObject; // gets current copy
    ...
    int cod = myCopy.getErrorCode();
    String txt = myCopy.getErrorText();
    return (cod + " - " + txt);
}
// And so on, with myCopy always in a consistent state within and across calls
// Eventually we will return to the code that gets the current SomeObject.

Speaking to your read-update-write question, specifically. Consider the following unsafe code:

public void updateCounter() {
    if (counter == 1000) { counter = 0; }
    else                 { counter++; }
}

Now, with the updateCounter() method unsynchronized, two threads may enter it at the same time. Among the many permutations of what could happen, one is that thread-1 does the test for counter==1000 and finds it true and is then suspended. Then thread-2 does the same test and also sees it true and is suspended. Then thread-1 resumes and sets counter to 0. Then thread-2 resumes and again sets counter to 0 because it missed the update from thread-1. This can also happen even if thread switching does not occur as I have described, but simply because two different cached copies of counter were present in two different CPU cores and the threads each ran on a separate core. For that matter, one thread could have counter at one value and the other could have counter at some entirely different value just because of caching.

What's important in this example is that the variable counter was read from main memory into cache, updated in cache and only written back to main memory at some indeterminate point later when a memory barrier occurred or when the cache memory was needed for something else. Making the counter volatile is insufficient for thread-safety of this code, because the test for the maximum and the assignments are discrete operations, including the increment which is a set of non-atomic read+increment+write machine instructions, something like:

MOV EAX,counter
INC EAX
MOV counter,EAX

Volatile variables are useful only when all operations performed on them are "atomic", such as my example where a reference to a fully formed object is only read or written (and, indeed, typically it's only written from a single point). Another example would be a volatile array reference backing a copy-on-write list, provided the array was only read by first taking a local copy of the reference to it.
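
For completeness, a sketch (mine, not part of the original answer, and assuming Java 8+) of making the counter example thread-safe with an AtomicInteger:

import java.util.concurrent.atomic.AtomicInteger;

public class SafeCounter {
    private final AtomicInteger counter = new AtomicInteger();

    // The lambda may be retried under contention, but each successful
    // update applies the whole test-and-set atomically, so no update
    // can be lost the way it can with a plain or volatile int.
    public void updateCounter() {
        counter.updateAndGet(c -> (c == 1000) ? 0 : c + 1);
    }
}

Declaring the original method synchronized would be equally correct, at the cost of acquiring a lock on every call.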

Thanks very much! The example with the counter is simple to understand. However, when things get real, it's a bit different.

Nice answer and a special +1 for asm!

"In practical terms, on current hardware, this typically causes flushing of the CPU caches when a monitor is acquired and writes to main memory when it is released, both of which are expensive (relatively speaking)." . When you say CPU caches, is it the same as Java Stacks local to each thread? or does a thread has its own local version of Heap? Apologize if i am being silly here.

@nishm It's not the same, but it would include the local caches of the threads involved.

@MarianPadzioch: An increment or decrement is NOT a read or a write, it's a read and a write; it's a read into a register, then a register increment, then a write back to memory. Reads and writes are individually atomic, but multiple such operations are not.

So, according to the FAQ, not only are the actions made since a lock acquisition made visible after unlock, but all actions made by that thread are made visible, even actions made before the lock acquisition.

Generating a Random Number between 1 and 10 Java - Stack Overflow

The standard way to do this is as follows:

/**
 * Returns a pseudo-random number between min and max, inclusive.
 * The difference between min and max can be at most
 * <code>Integer.MAX_VALUE - 1</code>.
 *
 * @param min Minimum value
 * @param max Maximum value.  Must be greater than min.
 * @return Integer between min and max, inclusive.
 * @see java.util.Random#nextInt(int)
 */
public static int randInt(int min, int max) {

    // Usually this can be a field rather than a method variable
    Random rand = new Random();

    // nextInt is normally exclusive of the top value,
    // so add 1 to make it inclusive
    int randomNum = rand.nextInt((max - min) + 1) + min;

    return randomNum;
}

As explained by Aurund, Random objects created within a short time of each other will tend to produce similar output, so it would be a good idea to keep the created Random object as a field, rather than in the method as I have done (for explanation purposes only).

Random rand = new Random(); I would go so far as to say that it must be a field. Random objects created within a short time of each other will tend to produce similar output. So many calls to randInt within a short period of time will not give evenly distributed output.
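
Following the advice above, a sketch (mine) with the Random held as a field; the Java 7+ ThreadLocalRandom alternative avoids both the seeding issue and contention on a shared instance:

import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

public class RandomUtil {
    // Seeded once at class load, so rapid successive calls do not
    // produce the correlated output that freshly created Randoms can.
    private static final Random RAND = new Random();

    public static int randInt(int min, int max) {
        return RAND.nextInt((max - min) + 1) + min;
    }

    // Java 7+ alternative: a per-thread generator, no shared state.
    public static int randInt7(int min, int max) {
        return ThreadLocalRandom.current().nextInt(min, max + 1); // bound is exclusive
    }
}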

multithreading - Difference between volatile and synchronized in Java ...

volatile is a field modifier, while synchronized modifies code blocks and methods. So we can specify three variations of a simple accessor using those two keywords:

int i1;
int geti1() { return i1; }

volatile int i2;
int geti2() { return i2; }

int i3;
synchronized int geti3() { return i3; }

geti1() accesses the value currently stored in i1 in the current thread. Threads can have local copies of variables, and the data does not have to be the same as the data held in other threads. In particular, another thread may have updated i1 in its thread, but the value in the current thread could be different from that updated value. In fact Java has the idea of a "main" memory, and this is the memory that holds the current "correct" value for variables. Threads can have their own copy of data for variables, and the thread copy can be different from the "main" memory. So in fact, it is possible for the "main" memory to have a value of 1 for i1, for thread1 to have a value of 2 for i1, and for thread2 to have a value of 3 for i1 if thread1 and thread2 have both updated i1 but those updated values have not yet been propagated to "main" memory or other threads.

On the other hand, geti2() effectively accesses the value of i2 from "main" memory. A volatile variable is not allowed to have a local copy of a variable that is different from the value currently held in "main" memory. Effectively, a variable declared volatile must have its data synchronized across all threads, so that whenever you access or update the variable in any thread, all other threads immediately see the same value. Generally volatile variables have a higher access and update overhead than "plain" variables, since allowing threads to have their own copies of data is better for efficiency.

There are two differences between volatile and synchronized.

Firstly synchronized obtains and releases locks on monitors which can force only one thread at a time to execute a code block. That's the fairly well known aspect to synchronized. But synchronized also synchronizes memory. In fact synchronized synchronizes the whole of thread memory with "main" memory. So executing geti3() does the following:

  • The thread acquires the lock on the monitor for object this.
  • The thread memory flushes all its variables, i.e. it has all of its variables effectively read from "main" memory.
  • The code block is executed (in this case setting the return value to the current value of i3, which may have just been reset from "main" memory).
  • (Any changes to variables would normally now be written out to "main" memory, but for geti3() we have no changes.)
  • The thread releases the lock on the monitor for object this.

So where volatile only synchronizes the value of one variable between thread memory and "main" memory, synchronized synchronizes the value of all variables between thread memory and "main" memory, and locks and releases a monitor to boot. Clearly synchronized is likely to have more overhead than volatile.
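
For reference, a synchronized instance method locks the monitor of this, so geti3() above is equivalent to the following block form (a sketch added here to make the monitor named in the steps explicit):

int geti3() {
    synchronized (this) { // the same monitor a synchronized instance method uses
        return i3;
    }
}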

-1, Volatile does not acquire a lock, it uses the underlying CPU architecture to ensure visibility across all threads after the write.

It's worth noting that there may be some cases where a lock may be used to guarantee atomicity of writes, e.g. writing a long on a 32-bit platform that doesn't support extended-width writes. Intel avoids this by using SSE2 registers (128 bits wide) to handle volatile longs. However, considering a volatile as a lock will likely lead to nasty bugs in your code.

The important semantic shared by locks and volatile variables is that they both provide Happens-Before edges (Java 1.5 and later). Entering a synchronized block, taking out a lock and reading from a volatile are all considered an "acquire", and the release of a lock, exiting a synchronized block and writing a volatile are all forms of a "release".

It's also important to realize that while the five steps in the answer may be what Java does (currently), it is not what the JMM specifically requires; for example, if the CPU supported it, all that is required within a synchronized block is that any variable accessed be read from main memory on first access; similarly, on exiting a synchronized block all that is required is that any variable that was updated since the synchronize started be updated in main memory. In other words, within the synchronize block the code must see fresh values, and any updates must update main memory.

Thank you a lot, very clear. Could you please articulate what happens if iX are references to objects instead of primitive types? Especially references to instances of classes that already provide synchronized methods.

dictionary - What is the difference between the HashMap and Map object...

There is no difference between the objects; you have a HashMap<String, Object> in both cases. There is a difference in the interface you have to the object. In the first case, the interface is HashMap<String, Object>, whereas in the second it's Map<String, Object>. But the underlying object is the same.

The advantage to using Map<String, Object> is that you can change the underlying object to be a different kind of map without breaking your contract with any code that's using it. If you declare it as HashMap<String, Object>, you have to change your contract if you want to change the underlying implementation.

class Foo {
    private HashMap<String, Object> things;
    private HashMap<String, Object> moreThings;

    protected HashMap<String, Object> getThings() {
        return this.things;
    }

    protected HashMap<String, Object> getMoreThings() {
        return this.moreThings;
    }

    public Foo() {
        this.things = new HashMap<String, Object>();
        this.moreThings = new HashMap<String, Object>();
    }

    // ...more...
}

The class has a couple of internal maps of string->object which it shares (via accessor methods) with subclasses. Let's say I write it with HashMaps to start with because I think that's the appropriate structure to use when writing the class.

Later, Mary writes code subclassing it. She has something she needs to do with both things and moreThings, so naturally she puts that in a common method, and she uses the same type I used on getThings/getMoreThings when defining her method:
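
class SpecialFoo extends Foo {
    private void doSomething(HashMap<String, Object> t) {
        // ...
    }

    public void whatever() {
        this.doSomething(this.getThings());
        this.doSomething(this.getMoreThings());
    }
}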

Later, I decide that actually, it's better if I use TreeMap instead of HashMap in Foo. I update Foo, changing HashMap to TreeMap. Now, SpecialFoo doesn't compile anymore, because I've broken the contract: Foo used to say it provided HashMaps, but now it's providing TreeMaps instead. So we have to fix SpecialFoo now (and this kind of thing can ripple through a codebase).

Unless I had a really good reason for sharing that my implementation was using a HashMap (and that does happen), what I should have done was declare getThings and getMoreThings as just returning Map<String, Object> without being any more specific than that. In fact, barring a good reason to do something else, even within Foo I should probably declare things and moreThings as Map, not HashMap/TreeMap:

class Foo {
    private Map<String, Object> things;             // <== Changed
    private Map<String, Object> moreThings;         // <== Changed

    protected Map<String, Object> getThings() {     // <== Changed
        return this.things;
    }

    protected Map<String, Object> getMoreThings() { // <== Changed
        return this.moreThings;
    }

    public Foo() {
        this.things = new HashMap<String, Object>();
        this.moreThings = new HashMap<String, Object>();
    }

    // ...more...
}

Note how I'm now using Map<String, Object> everywhere I can, only being specific when I create the actual objects.

If I had done that, then Mary would have done this:

class SpecialFoo extends Foo {
    private void doSomething(Map<String, Object> t) { // <== Changed
        // ...
    }

    public void whatever() {
        this.doSomething(this.getThings());
        this.doSomething(this.getMoreThings());
    }
}

Interfaces (and base classes) let us reveal only as much as is necessary, keeping our flexibility under the covers to make changes as appropriate. In general, we want to have our references be as basic as possible. If we don't need to know it's a HashMap, just call it a Map.

This isn't a blind rule, but in general, coding to the most general interface is going to be less brittle than coding to something more specific. If I'd remembered that, I wouldn't have created a Foo that set Mary up for failure with SpecialFoo. If Mary had remembered that, then even though I messed up Foo, she would have declared her private method with Map instead of HashMap and my changing Foo's contract wouldn't have impacted her code.

Sometimes you can't do that, sometimes you have to be specific. But unless you have a reason to be, err toward the least-specific interface.
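
To make that concrete, a small sketch (mine, not part of the original answer): declaring to the interface makes swapping the implementation a one-line change.

import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapDemo {
    public static void main(String[] args) {
        Map<String, Object> m = new HashMap<>(); // declare to the interface
        m.put("b", 2);
        m.put("a", 1);
        // Swap the implementation; every use site typed as Map still compiles:
        m = new TreeMap<>(m); // iteration order is now sorted by key
        System.out.println(m); // prints {a=1, b=2}
    }
}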

So the only difference is when I pass it as a parameter, for example: then I need to reference one as Map<blah> and the other as HashMap<blah>, but they are indeed the same exact type of object?

Yes, they're the exact same object; it's about the contract you're forming with any code using it. I updated the answer a bit to clarify.

Ah, so the difference is that in general, Map has certain methods associated with it. But there are different ways of creating a map, such as a HashMap, and these different ways provide unique methods that not all maps have. So if I use a Map, I can only use Map methods, but I have a HashMap underneath, so any speed benefits, search benefits, etc. of HashMap will be seen in the Map. And if I used a HashMap, I could use those HashMap-specific methods, but if I ultimately need to change the map type it's a lot more work.

I think what he's saying is that even if you're referring to a HashMap as a Map, the implementation remains HashMap and so nothing changes about how the methods execute. If you changed the implementation behind the Map interface, properties of execution (such as speed) could indeed change.

This is a great answer. It would be even better with examples.

multithreading - The difference between the Runnable and Callable inte...

What are the differences in the applications of Runnable and Callable? Is the difference only with the return parameter present in Callable?


What is the need of having both if Callable can do all that Runnable does?

Because the Runnable interface cannot do everything that Callable does!

Runnable has been around since Java 1.0, but Callable was only introduced in Java 1.5 ... to handle use-cases that Runnable does not support. In theory, the Java team could have changed the signature of the Runnable.run() method, but this would have broken binary compatibility with pre-1.5 code, requiring recoding when migrating old Java code to newer JVMs. That is a BIG NO-NO. Java strives to be backwards compatible ... and that's been one of Java's biggest selling points for business computing.

And, obviously, there are use-cases where a task doesn't need to return a result or throw a checked exception. For those use-cases, using Runnable is more concise than using Callable<Void> and returning a dummy (null) value from the call() method.
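
A minimal sketch of the difference in practice (my example, using the standard java.util.concurrent API): a Callable returns a value and may throw checked exceptions, both delivered through a Future; a Runnable can do neither.

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TaskDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();

        // Runnable: side effects only, no result, no checked exceptions.
        Runnable task = () -> System.out.println("fire and forget");
        pool.submit(task);

        // Callable: produces a result (and could throw a checked exception).
        Callable<Integer> calc = () -> 6 * 7;
        Future<Integer> answer = pool.submit(calc);
        System.out.println(answer.get()); // blocks, then prints 42

        pool.shutdown();
    }
}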

@prash - the basic facts are to be found in old textbooks. Like the first edition of Java in a Nutshell.

(@prash - Also ... by starting to use Java in the Java 1.1 era.)

@StephenC If I read your answer correctly, you're suggesting that Runnable exists (largely) for backward compatibility reasons. But aren't there situations where it's unnecessary or too expensive to implement (or to require) the Callable interface (e.g., in ScheduledFuture<?> ScheduledExecutorService.schedule(Runnable command, long delay, TimeUnit unit))? So isn't there a benefit to maintaining both interfaces in the language even if the history didn't force the current outcome?

@max - Well I said that, and I still agree with that. However, that is a secondary reason. But even so, I suspect that Runnable would have been modified if there had not been an imperative to maintain compatibility. The "boilerplate" of return null; is a weak argument. (At least, that would have been my decision ... in the hypothetical context where you could ignore backwards compatibility.)

reference - What's the difference between SoftReference and WeakRefere...

[The garbage collector] uses algorithms to decide whether or not to reclaim a softly reachable object, but always reclaims a weakly reachable object.

Clear and concise, this is the perfect summary for previous answers.

Why did you highlight that as code? Is it a quote?
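
A minimal sketch of that asymmetry (my example; actual reclamation timing is JVM-dependent, and System.gc() is only a hint):

import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

public class RefDemo {
    public static void main(String[] args) {
        Object payload = new Object();
        SoftReference<Object> soft = new SoftReference<>(payload);
        WeakReference<Object> weak = new WeakReference<>(payload);

        payload = null; // drop the only strong reference
        System.gc();    // request a collection (a hint, not a guarantee)

        // A weakly reachable object is always eligible once the GC runs:
        System.out.println("weak: " + weak.get()); // very likely null
        // A softly reachable object is normally kept until memory is tight:
        System.out.println("soft: " + soft.get()); // usually still non-null
    }
}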

Rectangle 27 40

When you say "type" I'm going to assume you mean static type mostly. But I'll talk about dynamic types shortly.

A static type is a property of a portion of a program that can be statically proven (static means "without running it"). In a statically typed language, every expression has a type whether you write it or not. For instance, in the C-ish "int x = a * b + c - d", a, b, c, and d have types, a * b has a type, a * b + c has a type, and a * b + c - d has a type. But we've only annotated x with a type. In other languages, such as Scala, C#, Haskell, SML, and F#, even that wouldn't be necessary.

Exactly what properties are provable depends on the type checker.

A Scala style class, on the other hand, is just the specification for a set of objects. That specification includes some type information and includes a lot of implementation and representation details such as method bodies and private fields, etc. In Scala a class also specifies some module boundaries.

Many languages have types but don't have classes and many languages have classes but don't have (static) types.

There are several observable differences between types and classes. List[String] is a type but not a class. In Scala, List is a class but normally not a type (it's actually a higher-kinded type). In C# List isn't a type of any sort, and in Java it's a "raw type".
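On the Java side, a small sketch of that distinction (the class name is made up for illustration): List<String> and the raw List are different types, yet there is only one class behind both of them:

import java.util.ArrayList;
import java.util.List;

public class RawTypeDemo {
    public static void main(String[] args) {
        List<String> strings = new ArrayList<>(); // List<String>: a type, not a class
        List raw = strings; // the "raw type": the class without type arguments

        // Both views share a single runtime class, because type
        // arguments are erased at compile time:
        System.out.println(strings.getClass() == raw.getClass()); // true
    }
}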

Scala offers structural types. {def foo : Bar} means any object that provably has a foo method that returns a Bar, regardless of class. It's a type, but not a class.

Types can be abstracted using type parameters. When you write def foo[T](x : T) = ..., then inside the body of foo T is a type. But T is not a class.

Types can be virtual in Scala (i.e. "abstract type members"), but classes can't be virtual in Scala today (although there's a boilerplate-heavy way to encode virtual classes: https://wiki.scala-lang.org/display/SIW/VirtualClassesDesign)

Now, dynamic types. Dynamic types are properties of objects that the runtime automatically checks before performing certain operations. In dynamically typed class-based OO languages there's a strong correlation between types and classes. The same thing happens on JVM languages such as Scala and Java, which have operations that can only be checked dynamically, such as reflection and casting. In those languages, "type erasure" more or less means that the dynamic type of most objects is the same as their class. More or less. That's not true of, e.g., arrays, which aren't typically erased, so that the runtime can tell the difference between Array[Int] and Array[String].

But remember my broad definition: "dynamic types are properties of objects that the runtime automatically checks." When you use reflection it is possible to send any message to any object. If the object supports that message then everything works out. Thus it makes sense to talk of all objects that can quack like a duck as a dynamic type, even though it's not a class. That's the essence of what the Python and Ruby communities call "duck typing."

Also, by my broad definition even "zeroness" is a dynamic type in the sense that, in most languages, the runtime automatically checks numbers to make sure you don't divide by zero. There are a very, very few languages that can prove that statically by making zero (or not-zero) a static type.
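A minimal Java sketch of the erasure point above (the class name is made up): arrays keep their element type at runtime, while generic type arguments do not survive compilation:

import java.util.ArrayList;

public class DynamicTypeDemo {
    public static void main(String[] args) {
        Object array = new int[] {1, 2, 3};
        // Arrays are not erased: the runtime can check the element type.
        System.out.println(array instanceof int[]);  // true
        System.out.println(array instanceof long[]); // false

        Object list = new ArrayList<String>();
        // Generic type arguments are erased: only the class is checkable.
        System.out.println(list instanceof ArrayList); // true
        // "list instanceof ArrayList<String>" would not even compile
    }
}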

Finally, as others have mentioned, there are types like int which don't have a class as an implementation detail, types like Null and Any which are a bit special but COULD have classes and don't, and types like Nothing which doesn't even have any values, let alone a class.

As I come to understand from your explanation above, a question has just crossed my mind. Will I be wrong if I assume that, due to type checking at run-time, dynamically-typed languages (like Python) are slower than statically-typed languages (Scala, Java), because dynamically-typed languages have to perform extensive type checking, and at run-time at that?

What is the difference between a class and a type in Scala (and Java)?...

java class scala types language-design

What is the main difference in object creation between Java and C++?

Unlike Java, in C++ objects can also be created on the stack.

Class obj; // object created on the stack

In Java you can write

Class obj;         // obj is just a reference (not an object)
obj = new Class(); // obj now refers to an object on the heap

I would argue that this is the memory allocation that happens before object creation. Does it really count?

I'm not sure that they are asking about memory allocation, but on the other hand, I'm not sure that they aren't asking about memory allocation.

I would extend this answer to consider placement new... C++ allows creating the object anywhere: automatically on the stack or heap, or even at a memory location YOU specify. In Java, this is all done by the JVM... and it's always on the heap.

@Deep-B: Placement new is a very advanced technique. I would not expect an average C++ user to know how to use it, let alone a student who is studying the difference between C++ and Java.

What is the main difference in object creation between Java and C++? -...

java c++ object creation

In addition to other excellent answers, there is one very important thing, usually ignored, forgotten, or misunderstood (which explains why I detail the process below):

  • Let's imagine a Base class, with a virtual method foo().
  • Let's imagine a Derived class, inheriting from Base, which overrides the method foo().

The difference between C++ and Java is:

  • In Java, calling foo() from the Base class constructor will call Derived.foo()
  • In C++, calling foo() from the Base class constructor will call Base.foo()

The "bugs" for each languages are different:

  • In Java, calling any method in the constructor could lead to subtle bugs, as the overridden virtual method could try to access a variable which is declared/initialized in the Derived class (see the sketch after the quotes below).

Conceptually, the constructor's job is to bring the object into existence (which is hardly an ordinary feat). Inside any constructor, the entire object might be only partially formed; you can know only that the base-class objects have been initialized, but you cannot know which classes are inherited from you. A dynamically-bound method call, however, reaches forward or outward into the inheritance hierarchy. It calls a method in a derived class. If you do this inside a constructor, you call a method that might manipulate members that haven't been initialized yet: a sure recipe for disaster.

  • In C++, one must remember that a virtual method won't work as expected, as only the method of the class currently being constructed will be called. The reason is to avoid accessing data members or even methods that do not exist yet.

During base class construction, virtual functions never go down into derived classes. Instead, the object behaves as if it were of the base type. Informally speaking, during base class construction, virtual functions aren't.
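Here is a minimal Java sketch of the trap described above (class names are made up for illustration); the overriding method runs before Derived's field initializers have executed:

class Base {
    Base() {
        foo(); // in Java this dispatches to Derived.foo(), not Base.foo()
    }
    void foo() { System.out.println("Base.foo"); }
}

class Derived extends Base {
    private String name = "derived"; // initialized only after Base() returns

    @Override
    void foo() {
        System.out.println("Derived.foo, name = " + name);
    }
}

public class ConstructorDispatch {
    public static void main(String[] args) {
        new Derived(); // prints "Derived.foo, name = null"
    }
}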

+1 I stumbled upon this on my own... my parents told me about many dangerous things out there but never told me about that, and I had to look it up on the internet.

@Hemant & @Archimedix: Thanks for your comments! The question's author did mention the "virtual thing" in his question, but I guessed the problem was devious/vicious enough to have a developed answer, complete with description, facts and sources, instead of being limited to half a sentence in the question. I'm happy to see I was not wrong about that guess.

What is the main difference in object creation between Java and C++? -...

java c++ object creation

I'm totally confused right now - mostly because of the terminology, I guess. Can someone please walk me through the differences, or provide a few links to Dummy-proof material? Especially URI to URL and Resource to File? To me, it feels like they should be the same thing, respectively...

The terminology is confusing and sometimes befuddling, and was mostly born from the evolution of Java both as an API and as a platform over time. To understand how these terms came to mean what they do, it is important to recognise two things that influence Java's design:

  • Backwards compatibility. Old applications should run on newer installations, ideally without modification. This means that an old API (with its names and terminology) needs to be maintained through all newer versions.
  • Cross-platform. The API should provide a usable abstraction of its underlying platform, whether that be an operating system or a browser.

Resource: an abstract, generic piece of data that can be located and read.

It does not have a direct class or interface representation in Java.

Because one of Java's early design goals was to be run inside of a browser, as a sandboxed application (applets!) with very limited rights/privileges/security clearance, Java makes a clear (theoretical) difference between a file (something on the local file system) and a resource (something it needs to read). This is why reading something relative to the application (icons, class files, and so on) is done through ClassLoader.getResource and not through the File class.

Unfortunately, because "resource" is also a useful generic term outside of this interpretation, it is also used to name very specific things (e.g. class ResourceBundle, UIResource, Resource) that are not, in this sense, a resource.

File: "An abstract representation of file and directory pathnames."

The File class represents a resource that is reachable through the platform's native file system. It contains only the name of the file, so it is really more a path (see later) that the host platform interprets according to its own settings, rules, and syntax.

Note that File doesn't need to point to something local, just something that the host platform understands in the context of file access, e.g. a UNC path in Windows. If you mount a ZIP file as a file system in your OS, then File will read its contained entries just fine.

Class URL represents a Uniform Resource Locator, a pointer to a "resource" on the World Wide Web. A resource can be something as simple as a file or a directory, or it can be a reference to a more complicated object, such as a query to a database or to a search engine.

In tandem with the concept of a resource, the URL represents that resource the same way the File class represents a file in the host platform: as a structured string that points to a resource. URL additionally contains a scheme that hints at how to reach the resource (with "file:" being "ask the host platform"), and so allows pointing at resources through HTTP, FTP, inside a JAR, and whatnot.

Unfortunately, URLs come with their own syntax and terminology, including the use of "file" and "path". In case the URL is a file-URL, URL.getFile will return a string identical to the path string of the referenced file.

Class.getResource returns a URL: it is more flexible than returning File, and it has served the needs of the system as imagined in the early 1990s.
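A small sketch of that flexibility (the resource name is hypothetical); the same lookup works whether the resource sits in a directory or inside a JAR:

import java.io.InputStream;
import java.net.URL;

public class ResourceLookup {
    public static void main(String[] args) throws Exception {
        // "/config.properties" is a hypothetical resource on the classpath.
        URL url = ResourceLookup.class.getResource("/config.properties");
        if (url != null) {
            System.out.println("resolved to: " + url); // file:, jar:, ...
            try (InputStream in = url.openStream()) {
                // read the resource, wherever it actually lives
            }
        }
    }
}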

URI: "Represents a Uniform Resource Identifier (URI) reference."

URI is a (slight) abstraction over URL. The difference between URI and URL is conceptual and mostly academic, but URI is better defined in a formal sense, and covers a wider range of use cases. Because URL and URI are/were not the same thing, a new class was introduced to represent them, with methods URI.toURL and URL.toURI to move between one and the other.

In Java, the main difference between URL and URI is that a URL carries the expectation of being resolvable, something the application might want an InputStream from; a URI is treated more like an abstract thingamajig that might point to something resolvable (and usually does), but what it means and how to reach it are more open to context and interpretation.
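For illustration, moving between the two (the example address is made up):

import java.net.URI;
import java.net.URL;

public class UriUrlDemo {
    public static void main(String[] args) throws Exception {
        URI uri = URI.create("https://example.com/data?q=1"); // just an identifier
        URL url = uri.toURL();  // commit to something resolvable
        URI back = url.toURI(); // and back again
        System.out.println(uri.equals(back)); // true for this example
    }
}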

Path: "An object that may be used to locate a file in a file system. It will typically represent a system dependent file path."

The new file API, iconified in the Path interface, allows for much greater flexibility than the File class could offer. The Path interface is an abstraction of the File class, and is part of the New IO File API. Where File necessarily points to a "file" as understood by the host platform, Path is more generic: it represents a file (resource) in an arbitrary file system.

Path takes away the reliance on the host platform's concept of a file. It could be an entry in a ZIP file, a file reachable through FTP or SSH-FS, a multi-rooted representation of the application classpath, or really anything that can be meaningfully represented through the FileSystem interface and its driver, FileSystemProvider. It brings the power of "mounting" file systems into the context of a Java application.

The host platform is represented through the "default file system"; when you call File.toPath, you get a Path on the default file system.
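A sketch of both sides of that idea ("example.txt" and "archive.zip" are hypothetical files): the same Path abstraction covers the host platform's default file system and a mounted ZIP archive:

import java.io.File;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PathDemo {
    public static void main(String[] args) throws Exception {
        // A Path on the default (host platform) file system:
        Path onDisk = new File("example.txt").toPath();
        System.out.println(onDisk.getFileSystem()); // the default file system

        // The same abstraction over a ZIP archive mounted as a file system:
        try (FileSystem zipFs =
                 FileSystems.newFileSystem(Paths.get("archive.zip"), (ClassLoader) null)) {
            Path inZip = zipFs.getPath("/readme.txt");
            System.out.println(inZip.getFileSystem()); // the ZIP file system
        }
    }
}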

Now, if I have a locator that references a class or package in a jar file, will those two (i.e. the path and file strings) differ?

Unlikely. If the jar file is on the local file system, you should not have a query component, so URL.getPath and URL.getFile should return the same result. However, pick the one you need: file-URLs may not typically have query components, but I could sure add one anyway.

Lastly - and most importantly - why do I need File object; why isn't a Resource (URL) enough?

URL might not be enough because File gives you access to housekeeping data such as permissions (readable, writable, executable), file type (am I a directory?), and the ability to search and manipulate the local file system. If these are features you need, then File or Path provide them.

You don't need File if you have access to Path. Some older API may require File, though.

No, there isn't. There are many things named like it, but they are not a resource in the sense of ClassLoader.getResource.

Wow, very thorough. Just going through it, but already have the first follow-up question: when you say a File "contains only the name of the file", don't you contradict your initial statement that it's "an abstract representation of file and directory pathnames", i.e. more?

@Christian I meant "only the name" as in: does not in any way model the contents of the file; it's merely a thin wrapper around a string. The "abstract representation" part is quoted from the API docs. ;)

What's the difference between a Resource, URI, URL, Path and File in J...

java url terminology

I was just going to do this as a comment on the accepted answer, but it got too funky (I hate not having line breaks).

Ah, so the difference is that, in general, Map has certain methods associated with it. But there are different ways of creating a map, such as a HashMap, and these different ways provide unique methods that not all maps have.

Exactly, and you always want to use the most general interface you possibly can. Consider ArrayList vs LinkedList. Huge difference in how you use them, but if you use "List" you can switch between them readily.

In fact, you can replace the right-hand side of the initializer with a more dynamic statement. How about something like this:

List<String> collection;
if (keepSorted)
    collection = new LinkedList<>();
else
    collection = new ArrayList<>();

This way, if you are going to fill in the collection with an insertion sort, you would use a linked list (an insertion sort into an array list is criminal). But if you don't need to keep it sorted and are just appending, you use an ArrayList (more efficient for other operations).

This is a pretty big stretch here because collections aren't the best example, but in OO design one of the most important concepts is using the interface facade to access different objects with the exact same code.

As for your map comment below, yes, using the "Map" interface restricts you to only those methods unless you cast the collection back from Map to HashMap (which COMPLETELY defeats the purpose).

Often what you will do is create an object and fill it in using its specific type (HashMap), in some kind of "create" or "initialize" method, but that method will return a "Map" that doesn't need to be manipulated as a HashMap any more.
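A sketch of that idiom (the names are made up for illustration):

import java.util.HashMap;
import java.util.Map;

public class MapFactory {
    // Built with the concrete type, handed out through the interface.
    static Map<String, String> createIndex() {
        Map<String, String> index = new HashMap<>();
        index.put("key", "value");
        return index; // callers only ever see a Map
    }
}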

If you ever have to cast, by the way, you are probably using the wrong interface or your code isn't structured well enough. Note that it is acceptable to have one section of your code treat it as a "HashMap" while another treats it as a "Map", but this should flow "down", so that you are never casting.

Also notice the semi-neat aspect of roles indicated by interfaces. A LinkedList makes a good stack or queue, while an ArrayList makes a good stack but a horrific queue (again, a remove would cause a shift of the entire list), so LinkedList implements the Queue interface and ArrayList does not (as sketched just below).
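For instance (a minimal sketch):

import java.util.LinkedList;
import java.util.Queue;

public class RoleDemo {
    public static void main(String[] args) {
        Queue<String> queue = new LinkedList<>(); // fine: LinkedList implements Queue
        // Queue<String> bad = new ArrayList<>(); // does not compile: ArrayList doesn't
        queue.offer("first");
        System.out.println(queue.poll()); // "first"
    }
}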

But in this example, I only get the methods from the general List interface, right? Regardless of whether I make it a LinkedList() or an ArrayList()? It's just that if I use insertion sort (which I imagine must be a method for List that LinkedList and ArrayList get by inheritance) it works way faster on the LinkedList?

I guess what I'm looking for is whether or not, when I say Map<String, String> m = new HashMap<String, String>(), my Map m can use the methods specific to HashMap. I'm thinking it can't?

Ah, wait, no, my Map m from above must have the methods from HashMap.

So basically the only perk of using Map in the 'interface sense' is that if I have a method that requires a map, I'm guaranteeing any type of map will work in this method. But if I used a HashMap, I'm saying the method only works with HashMaps. Or, put another way, my method only uses methods defined in the Map interface but inherited by the other classes which implement Map.

In addition to the perk you mentioned above, where using List means I don't need to decide which type of List I want until runtime, whereas if the interface thing didn't exist I'd have to pick one before compiling and running.

dictionary - What is the difference between the HashMap and Map object...

java dictionary hashmap