java What's the best way to validate an XML file against an XSD file?


import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.*;
import java.net.URL;
import org.xml.sax.SAXException;
import java.io.File;
import java.io.IOException;
...
URL schemaFile = new URL("http://host:port/filename.xsd");
// webapp example xsd: 
// URL schemaFile = new URL("http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd");
// local file example:
// File schemaFile = new File("/location/to/localfile.xsd"); // etc.
Source xmlFile = new StreamSource(new File("web.xml"));
SchemaFactory schemaFactory = SchemaFactory
    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
try {
  Schema schema = schemaFactory.newSchema(schemaFile);
  Validator validator = schema.newValidator();
  validator.validate(xmlFile);
  System.out.println(xmlFile.getSystemId() + " is valid");
} catch (SAXException e) {
  System.out.println(xmlFile.getSystemId() + " is NOT valid reason:" + e);
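} catch (IOException e) {
  // the XML file or the schema could not be read; report it instead of swallowing it
  System.out.println(xmlFile.getSystemId() + " could not be read: " + e);
}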

@Chry Cheng - not sure I understand; the example code does use the constant W3C_XML_SCHEMA_NS_URI.

@ziggy - this is an implementation detail of the JAXP implementation. Sun's JDK 6 uses SAX parser with a StreamSource. A JAXP implementation could legally use a DOM parser in this case, but there is no reason to. If you use a DOM parser explicitly for validation, you will definitely instantiate a DOM tree.

Are you using a DOM or SAX parser in this example? How do I tell which parser you are using, as I can't see a reference to either?

Hours upon the net trying to find this; why haven't I learned to search SO first yet?

How do I use an ErrorHandler with the above? Is it a case of just creating the ErrorHandler and associating it with the validator, i.e. validator.setErrorHandler(), as in the example in this SO question stackoverflow.com/questions/4864681/?

Shouldn't exceptions just be used for exceptional situations and not for control flow?

Shouldn't the value passed to SchemaFactory.newInstance be XMLConstants.W3C_XML_SCHEMA_NS_URI instead as described in stackoverflow.com/questions/2396903/?

The Java runtime library supports validation. Last time I checked this was the Apache Xerces parser under the covers. You should probably use a javax.xml.validation.Validator.

The schema factory constant is the string http://www.w3.org/2001/XMLSchema which defines XSDs. The above code validates a WAR deployment descriptor against the URL http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd but you could just as easily validate against a local file.

You should not use the DOMParser to validate a document (unless your goal is to create a document object model anyway). This will start creating DOM objects as it parses the document - wasteful if you aren't going to use them.

You're right. My mistake. I've gone cross-eyed. I thought you referenced W3C_XML_SCHEMA_INSTANCE_NS_URI instead.

We build our project using ant, so we can use the schemavalidate task to check our config files:

<schemavalidate>
    <fileset dir="${configdir}" includes="**/*.xml" />
</schemavalidate>

Now naughty config files will fail our build!

import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.*;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
...
// parse an XML document into a DOM tree
DocumentBuilder parser = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document document = parser.parse(new File("instance.xml"));

// create a SchemaFactory capable of understanding WXS schemas
SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

// load a WXS schema, represented by a Schema instance
Source schemaFile = new StreamSource(new File("mySchema.xsd"));
Schema schema = factory.newSchema(schemaFile);

// create a Validator instance, which can be used to validate an instance document
Validator validator = schema.newValidator();

// validate the DOM tree
try {
    validator.validate(new DOMSource(document));
} catch (SAXException e) {
    // instance document is invalid!
}
Source
parser.parse(new File("instance.xml"))
validator
validator.validate(new StreamSource(new File("instance.xml")))

There are "errors" (e.g. validation errors) and "fatal errors" (well-formedness errors). One fatal error typically stops the parsing. But a validation error does not stop it: you have to explicitly throw an exception. Thus, it is necessary to provide an ErrorHandler if you need to do validation.

Using Java 7 you can follow the documentation provided in the javax.xml.validation package description.

Working this way, a SAXException is thrown at the first error in the XML file and validation stops there. But I want to know all (!) errors. If I use an ErrorHandler (my own class that implements ErrorHandler) instead, it sees all errors, but validator.validate no longer throws any exception. How do I detect an error in the class that invokes the validate method of my validator? Thanks for your help!
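One possible way to answer the question above is to let the handler remember what it saw and have the caller inspect it after validate() returns. The following is only a sketch under assumptions: the names CollectingValidation and CollectingErrorHandler are made up for illustration, and the demo schema and document in main are invented. Warnings and errors are collected; fatal (well-formedness) errors are still rethrown because the parser cannot reliably continue after them.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class CollectingValidation {

    /** Hypothetical handler: records problems instead of throwing on the first one. */
    static class CollectingErrorHandler implements ErrorHandler {
        final List<SAXParseException> problems = new ArrayList<>();

        @Override public void warning(SAXParseException e) { problems.add(e); }
        @Override public void error(SAXParseException e) { problems.add(e); }
        // Well-formedness errors are not recoverable, so they still abort validation.
        @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
    }

    /** Validates xml against the schema and returns every warning/error reported. */
    static List<SAXParseException> validate(Source schemaSource, Source xml)
            throws SAXException, IOException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(schemaSource);
        Validator validator = schema.newValidator();
        CollectingErrorHandler handler = new CollectingErrorHandler();
        validator.setErrorHandler(handler);
        validator.validate(xml);
        return handler.problems; // empty list means the document is valid
    }

    public static void main(String[] args) throws Exception {
        // Invented demo schema: <root> must contain two integer children.
        String xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                + "<xs:element name='root'><xs:complexType><xs:sequence>"
                + "<xs:element name='a' type='xs:int'/>"
                + "<xs:element name='b' type='xs:int'/>"
                + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
        String xml = "<root><a>x</a><b>y</b></root>"; // both children are invalid
        List<SAXParseException> problems = validate(
                new StreamSource(new StringReader(xsd)),
                new StreamSource(new StringReader(xml)));
        for (SAXParseException p : problems) {
            System.out.println("line " + p.getLineNumber() + ": " + p.getMessage());
        }
    }
}
```

The calling class then decides what "invalid" means, for example by treating a non-empty list as a failure and reporting every entry.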

import org.apache.xerces.parsers.DOMParser;

public class SchemaTest {
  public static void main (String args[]) {
      try {
        DOMParser parser = new DOMParser();
        parser.setFeature("http://xml.org/sax/features/validation", true);
        // Xerces only applies XSD validation when this feature is enabled as well
        parser.setFeature("http://apache.org/xml/features/validation/schema", true);
        parser.setProperty(
             "http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation",
             "memory.xsd");
        // ErrorChecker is not shown in this answer; it has to be your own class
        // implementing org.xml.sax.ErrorHandler
        ErrorChecker errors = new ErrorChecker();
        parser.setErrorHandler(errors);
        parser.parse("memory.xml");
     } catch (Exception e) {
        System.out.print("Problem parsing the file.");
     }
  }
}

"ErrorChecker cannot be resolved to a type" .. missing import ?

All releases available for download on the apache site are vulnerable to DoS attacks (see issues.apache.org/jira/browse/XERCESJ-970). So relying on xerces-j is not a good idea.

Original attribution: blatantly copied from here:

The SAX parser would be more efficient - the DOM parser creates DOM objects; wasteful operations in this instance.

The question is to validate an XML against a XSD. In this answer you are going further and getting a Parser object, which is not needed, right?
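For readers who want the streaming alternative the comments allude to, here is a minimal sketch using only the JDK's JAXP classes rather than Xerces directly: a Schema is attached to a SAXParserFactory, so the document is checked while it is parsed and no DOM tree is built. The class name SaxValidation and the method isValid are made up for illustration. Both validation errors and well-formedness errors are reported as "not valid" here, which is a simplification.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Source;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class SaxValidation {

    /** Returns true if xml parses and conforms to the schema, false otherwise. */
    static boolean isValid(Source schemaSource, InputSource xml) throws Exception {
        Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(schemaSource);

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // XSD validation needs namespace awareness
        factory.setSchema(schema);       // validate against this schema while parsing
        SAXParser parser = factory.newSAXParser();

        try {
            parser.parse(xml, new DefaultHandler() {
                // DefaultHandler ignores validation errors unless we rethrow them.
                @Override
                public void error(SAXParseException e) throws SAXParseException {
                    throw e;
                }
            });
            return true;
        } catch (SAXParseException e) {
            return false; // first validation or well-formedness problem
        }
    }
}
```

A call might look like isValid(new StreamSource(new File("memory.xsd")), new InputSource("memory.xml")), reusing the file names from the answer above.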

It's the one that actually worked for me with a minimum of fuss.

public static void verifyValidatesInternalXsd(String filename) throws Exception {
    InputStream xmlStream = new FileInputStream(filename);
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setValidating(true);
    factory.setNamespaceAware(true);
    factory.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
                 "http://www.w3.org/2001/XMLSchema");
    DocumentBuilder builder = factory.newDocumentBuilder();
    builder.setErrorHandler(new RaiseOnErrorHandler());
    builder.parse(new InputSource(xmlStream));
    xmlStream.close();
  }

  public static class RaiseOnErrorHandler implements ErrorHandler {
    public void warning(SAXParseException e) throws SAXException {
      throw new RuntimeException(e);
    }
    public void error(SAXParseException e) throws SAXException {
      throw new RuntimeException(e);
    }
    public void fatalError(SAXParseException e) throws SAXException {
      throw new RuntimeException(e);
    }
  }

<document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.example.com/document http://www.example.com/document.xsd">
  ...

SchemaFactory factory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
Schema schema = factory.newSchema();

Source xmlFile = new StreamSource(xmlFileLocation);
SchemaFactory schemaFactory = SchemaFactory
                                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema();
Validator validator = schema.newValidator();
validator.setResourceResolver(new LSResourceResolver() {
  @Override
  public LSInput resolveResource(String type, String namespaceURI,
                                 String publicId, String systemId, String baseURI) {
    InputSource is = new InputSource(
                           getClass().getResourceAsStream(
                          "some_local_file_in_the_jar.xsd"));
                          // or lookup by URI, etc...
    return new Input(is); // for class Input see 
                          // https://stackoverflow.com/a/2342859/32453
  }
});
validator.validate(xmlFile);

"If you create a schema without specifying a URL, file, or source, then the Java language creates one that looks in the document being validated to find the schema it should use. For example:"

Here's an example that validates an XML file against any XSDs it references (even if it has to pull them from the network):

Since this is a popular question, I would also like to point out that Java can validate against a "referred to" XSD, for instance if the .xml file itself specifies an XSD, using xsi:schemaLocation or xsi:noNamespaceSchemaLocation (or xsi for particular namespaces) as stated here:

You can avoid pulling referenced XSDs from the network, even though the XML files reference URLs, by specifying the XSD manually (see some other answers here) or by using an "XML catalog" style resolver. Spring apparently can also intercept the URL requests to serve local files for validation. Or you can set your own via setResourceResolver, e.g.:

and this works for multiple namespaces, etc. The problem with this approach is that the XSD location given in the document is probably a network location, so it'll go out and hit the network with each and every validation, which is not always optimal.

or schemaLocation (always a list of namespace to xsd mappings)
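The setResourceResolver snippet in this answer returns an Input object whose class is only linked, not shown. As a rough, self-contained sketch of the idea (not the class from the linked answer), the following uses a hypothetical StringLSInput that serves a schema from an in-memory string. The example.invalid URL, the schema text and the class names are all assumptions for illustration; in a real project the content would more likely come from getResourceAsStream.

```java
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;

/** Hypothetical read-only LSInput backed by a string. */
class StringLSInput implements LSInput {
    private final String publicId;
    private final String systemId;
    private final String content;

    StringLSInput(String publicId, String systemId, String content) {
        this.publicId = publicId;
        this.systemId = systemId;
        this.content = content;
    }

    @Override public Reader getCharacterStream() { return new StringReader(content); }
    @Override public InputStream getByteStream() { return null; }
    @Override public String getStringData() { return content; }
    @Override public String getSystemId() { return systemId; }
    @Override public String getPublicId() { return publicId; }
    @Override public String getBaseURI() { return null; }
    @Override public String getEncoding() { return null; }
    @Override public boolean getCertifiedText() { return false; }

    // The setters are required by the interface but unused in this sketch.
    @Override public void setCharacterStream(Reader characterStream) { }
    @Override public void setByteStream(InputStream byteStream) { }
    @Override public void setStringData(String stringData) { }
    @Override public void setSystemId(String systemId) { }
    @Override public void setPublicId(String publicId) { }
    @Override public void setBaseURI(String baseURI) { }
    @Override public void setEncoding(String encoding) { }
    @Override public void setCertifiedText(boolean certifiedText) { }
}

class LocalSchemaResolution {
    // Invented schema standing in for a local copy of the remote XSD.
    static final String LOCAL_XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='root' type='xs:int'/></xs:schema>";

    /** Validates xml against the XSD it references, served locally instead of fetched. */
    static void validate(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // No-argument newSchema(): the validator follows the document's own xsi hints.
        Validator validator = factory.newSchema().newValidator();
        LSResourceResolver resolver = (type, namespaceURI, publicId, systemId, baseURI) ->
                systemId != null && systemId.endsWith("doc.xsd")
                        ? new StringLSInput(publicId, systemId, LOCAL_XSD)
                        : null; // null lets the default lookup happen
        validator.setResourceResolver(resolver);
        validator.validate(new StreamSource(new StringReader(xml)));
    }
}
```

With this in place, a document whose xsi:noNamespaceSchemaLocation points at a URL ending in doc.xsd should be validated against the local copy without a network request.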

If you are generating XML files programmatically, you may want to look at the XMLBeans library. Using a command line tool, XMLBeans will automatically generate and package up a set of Java objects based on an XSD. You can then use these objects to build an XML document based on this schema.

It has built-in support for schema validation, and can convert Java objects to an XML document and vice-versa.

Castor and JAXB are other Java libraries that serve a similar purpose to XMLBeans.
