You can use SAX to do this sort of thing, but you will probably find it gets tedious quickly. SAX is a basic, building-block sort of tool. Assuming your documents are less than 20 MB or so, you will almost certainly find it more convenient to load the entire document into memory and process it using more powerful tools. DOM is a bit tedious to program against too, mostly because its API is poorly designed, but it has the virtue that you can run XPath expressions against it, effectively letting you find all nodes with a certain key value. You might find that other tree APIs like JDOM, XOM and DOM4J are more to your liking. Ultimately, however, you will probably end up wanting to use a richer programming language like XSLT or XQuery. XSLT has a built-in key declaration (xsl:key) that lets you define an index for rapid lookup of items based on keys like those you describe, and it provides a rich programming environment for processing XML.

But isn't it true that for large files, the only feasible way is SAX?

Yes, true - note my caveat of 20 MB for loading into memory. Of course YMMV.

With VTD-XML, that 20 MB could easily be 200 MB.

java - How to use SAX on this xml file - Stack Overflow

java xml sax saxparser

For your first question: yes, you have to maintain any context that is used by the parser (i.e., you have to keep track of whether or not you are inside a Facts element).

As for associating different Fact elements by key, yes, with caveats. You can load the file into DOM (assuming that you have enough memory), then use XPath to extract all elements with a specific FactKey.

//Facts[@FactKey="2832154551"]
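As a rough illustration of the DOM plus XPath route, here is a minimal sketch using only the JDK's built-in parsers. The class name FactXPathDemo, the Data root element, and the sample document are assumptions made for the example; only Facts and FactKey come from the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FactXPathDemo {
    // Loads the whole document into a DOM tree and counts the Facts
    // elements whose FactKey attribute equals the given key.
    static int countFacts(String xml, String factKey) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Building the expression by concatenation is fine for a trusted key;
            // untrusted input would need escaping or an XPath variable.
            String expression = "//Facts[@FactKey='" + factKey + "']";
            NodeList hits = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample document.
        String xml = "<Data><Facts FactKey=\"1\"/><Facts FactKey=\"2\"/><Facts FactKey=\"1\"/></Data>";
        System.out.println(countFacts(xml, "1"));
    }
}
```

The whole document is held in memory here, so this only suits files that fit comfortably in the heap.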

However, if you want to read the file and accumulate Facts with the same key, then a multimap is your best bet. A DOM parser may still be useful, as you could have a multimap that associates the string keys to Element values.
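The multimap idea could look roughly like the sketch below, which uses SAX and a plain Map of lists instead of a multimap library. The class name FactGrouper and the Value attribute are hypothetical; the real Facts elements may carry their data differently.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class FactGrouper extends DefaultHandler {
    // Hand-rolled multimap: FactKey -> every Value seen for that key, in document order.
    private final Map<String, List<String>> factsByKey = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("Facts".equals(qName)) {
            String key = atts.getValue("FactKey");
            if (key != null) {
                // "Value" is an assumed attribute name used only for this sketch.
                factsByKey.computeIfAbsent(key, k -> new ArrayList<>()).add(atts.getValue("Value"));
            }
        }
    }

    // Parses the XML text and returns the accumulated groups.
    static Map<String, List<String>> group(String xml) {
        try {
            FactGrouper handler = new FactGrouper();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.factsByKey;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

Because nothing but the map is retained, memory use grows with the number of facts rather than with the size of the document.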

java - How to use SAX on this xml file - Stack Overflow

java xml sax saxparser

The best way I've found (so far) of parsing XML using SAX is to use a stack and conditional statements in the relevant callbacks. Here's an article describing it, and my summary of it:

The basic premise is that as you parse the document, you create objects to store the parsed data, pushing them onto the stack as you go, peeking at the top of the stack to add data to the current element, and at the end of each element popping it off the stack and storing it in the parent.

The effect is that you parse the tree of elements depth first, and at the end of each branch you roll it back into the parent until you're left with a single object (such as your ConnectionList) that contains all of the parsed data ready to be used. Essentially, you end up with a series of objects that mirror the structure of the original XML.

That means you need some data objects that can store the data in the same structure as the XML. Complex elements will normally become classes, while simple elements will normally be attributes within classes. The root element is often represented by a list of some kind.

To start with, you create a stack object to hold the data as you parse it.

Then, at the start of each element you identify what type it is using the localName.equals() method, create an instance of the appropriate class, and push it onto the stack. If the element is a simple element, you will probably model it as an attribute of the class representing the parent element, and you will need a series of flags that tell the parser that such an element has been encountered and which element it is, so it can be processed in the characters() method.

The actual data is read using the characters() method, and again you use conditional logic to determine what to do with the data, based on the value of the flag. Essentially, you peek at the top of the stack and use the appropriate method to write the data into the object, converting from text where necessary.

At the end of each element, you pop the top of the stack and use localName.equals() again to determine how to store it in the object before it (e.g. which setter method needs to be called).

When you reach the end of the document you should have captured all the data in the document.
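The steps above can be sketched as follows. This is only an illustration under assumptions: the Connection, Host and Port element names and the ConnectionInfo class are invented (only ConnectionList appears in the text), the handler matches on qName because the default JDK parser is not namespace-aware, and a text buffer stands in for the series of flags described above.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical data object mirroring one <Connection> element.
class ConnectionInfo {
    String host;
    int port;
}

class ConnectionListHandler extends DefaultHandler {
    private final Deque<Object> stack = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();
    private List<ConnectionInfo> result;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);
        // Complex elements get an object pushed onto the stack.
        if ("ConnectionList".equals(qName)) {
            stack.push(new ArrayList<ConnectionInfo>());
        } else if ("Connection".equals(qName)) {
            stack.push(new ConnectionInfo());
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may arrive in several pieces
    }

    @Override
    @SuppressWarnings("unchecked")
    public void endElement(String uri, String localName, String qName) {
        if ("Host".equals(qName)) {
            // Simple elements are written into the object on top of the stack.
            ((ConnectionInfo) stack.peek()).host = text.toString().trim();
        } else if ("Port".equals(qName)) {
            ((ConnectionInfo) stack.peek()).port = Integer.parseInt(text.toString().trim());
        } else if ("Connection".equals(qName)) {
            // Pop the finished object and store it in its parent.
            ConnectionInfo done = (ConnectionInfo) stack.pop();
            ((List<ConnectionInfo>) stack.peek()).add(done);
        } else if ("ConnectionList".equals(qName)) {
            result = (List<ConnectionInfo>) stack.pop();
        }
    }

    static List<ConnectionInfo> parse(String xml) {
        try {
            ConnectionListHandler handler = new ConnectionListHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```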

Thanks, Chris! I think that is what I've been looking for :-)

The first advice I usually give about StAX/SAX parsing is to not use them whenever they can be avoided...

java - How to properly parse XML with SAX? - Stack Overflow

java xml parsing sax

I am not convinced that SAX is the best approach for you. There are different ways you could use SAX here, though.

If element order is not guaranteed within certain elements, like ListingDetails, then you need to be proactive.

When you start a ListingDetails, initialize a map as a member variable on the handler. In each subelement, set the appropriate key-value in that map. When you finish a ListingDetails, examine the map and explicitly mock values such as nulls for the missing elements. Assuming you have one ListingDetails per item, save it to a member variable in the handler.

Now, when your item element is over, have a function that writes the CSV line based on the map, in the column order you want.

The risk with this is if you have corrupted XML. I would strongly consider setting all these variables to null when an item starts, and then checking for errors and announcing them when the item ends.
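A minimal sketch of that approach, assuming hypothetical City and Price subelements inside ListingDetails and an item wrapper element. It resets its state when an item starts, fills a map regardless of element order, and reports an item that never contained a ListingDetails. CSV quoting and escaping are deliberately left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class ListingCsvHandler extends DefaultHandler {
    // Hypothetical output columns, in the order the CSV should list them.
    private static final List<String> COLUMNS = Arrays.asList("City", "Price");

    private final List<String> lines = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private Map<String, String> details; // null until a ListingDetails starts

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);
        if ("item".equals(qName)) {
            details = null; // reset state for every item
        } else if ("ListingDetails".equals(qName)) {
            details = new HashMap<>();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (details != null && COLUMNS.contains(qName)) {
            details.put(qName, text.toString().trim());
        } else if ("item".equals(qName)) {
            if (details == null) {
                throw new SAXException("item without ListingDetails");
            }
            List<String> cells = new ArrayList<>();
            for (String column : COLUMNS) {
                cells.add(details.getOrDefault(column, "")); // missing element -> empty cell
            }
            lines.add(String.join(",", cells));
        }
    }

    static List<String> toCsvLines(String xml) {
        try {
            ListingCsvHandler handler = new ListingCsvHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.lines;
        } catch (Exception e) {
            throw new IllegalStateException("Could not convert XML", e);
        }
    }
}
```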

Convert XML file to CSV in java - Stack Overflow

java xml stax

You can't navigate back and forth when using SAX. You should try using DOM. If you have to use SAX, then you can use a Stack to hold the previous data and pop it as required.
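One way to keep track of the parent with a stack is sketched below: every open element name is pushed in startElement and popped in endElement, so the top of the stack is always the current parent. The class name ParentTracker and the sample element names are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ParentTracker extends DefaultHandler {
    private final Deque<String> openElements = new ArrayDeque<>();
    private final List<String> parents = new ArrayList<>();
    private final String target;

    ParentTracker(String target) {
        this.target = target;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals(target)) {
            // The element on top of the stack is the parent of the one just opened.
            parents.add(openElements.isEmpty() ? null : openElements.peek());
        }
        openElements.push(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        openElements.pop();
    }

    // Returns the parent element name for each occurrence of the target element.
    static List<String> parentsOf(String xml, String target) {
        try {
            ParentTracker handler = new ParentTracker(target);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.parents;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```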

java - How can I get parent node while i using SAX parser? - Stack Ove...

java xml sax

If the way you have to chunk your files is fixed and known, the easiest solution is to use SAX or StAX to do it programmatically. I personally prefer StAX for this kind of task as the code is generally cleaner and easier to understand but SAX will do the job equally well.

XSLT is a great tool but its main drawback is that it can only produce one output. And apart from a few exceptions XSLT engines don't support streaming processing, so if the initial file is too big to fit in memory, you can't use them.

Update: In XSLT 2.0 <xsl:result-document> can be used to produce multiple output files, but if you want to get your chunks one by one and not store them in files, it's not ideal.
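For the StAX route, a sketch along these lines hands back the chunks one by one as strings instead of writing files. It assumes the file is split on a repeating element whose name is passed in (rec in the sample is hypothetical) and that the chunk element does not nest inside itself in a way that matters; the exact serialization of each chunk depends on the StAX implementation.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class XmlSplitter {
    // Copies every element named chunkElement (with its subtree) into its own string.
    static List<String> split(String xml, String chunkElement) {
        List<String> chunks = new ArrayList<>();
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
            StringWriter buffer = null;
            XMLEventWriter writer = null;
            int depth = 0; // nesting depth inside the current chunk

            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (writer == null && event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals(chunkElement)) {
                    buffer = new StringWriter();
                    writer = outputFactory.createXMLEventWriter(buffer);
                }
                if (writer == null) {
                    continue; // outside any chunk: skip the event
                }
                writer.add(event);
                if (event.isStartElement()) {
                    depth++;
                } else if (event.isEndElement()) {
                    depth--;
                    if (depth == 0) { // the chunk element itself just closed
                        writer.flush();
                        writer.close();
                        chunks.add(buffer.toString());
                        writer = null;
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not split XML", e);
        }
        return chunks;
    }
}
```

For a real file you would pass an InputStream to createXMLEventReader and process each chunk as it is completed rather than collecting them all in a list.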

java - Splitting a big XML file into smaller ones - Stack Overflow

java xml xslt

I use a boolean variable "stopParse" to consume the listeners, since I don't like to use throw new SAXException();

private boolean stopParse;

article.getChild("title").setEndTextElementListener(new EndTextElementListener() {
    public void end(String body) {
        if (stopParse) {
            return; // if stopParse is true, consume the listener
        }
        setTitle(body);
    }
});

Suppose you have this XML:

<root>
    <article>
        <title>Jorgesys</title>
    </article>
    <article>
        <title>Android</title>
    </article>
    <article>
        <title>Java</title>
    </article>
</root>

The parser to get the "title" value using Android SAX must be:

import android.sax.Element;
import android.sax.EndTextElementListener;
import android.sax.RootElement;
...
...
...
RootElement root = new RootElement("root");
Element article = root.getChild("article");
article.getChild("title").setEndTextElementListener(new EndTextElementListener() {
    public void end(String body) {
        if (stopParse) {
            return; // if stopParse is true, consume the listener
        }
        setTitle(body);
    }
});

Thanks. I'm not developing for Android but clears it up! I guess this is then not doable in regular Java (which I'm using).
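In regular Java, without android.sax, the usual way to really stop the parser is the exception route mentioned above. The following is a sketch of that idiom, not the answerer's method: the handler throws a marker subclass of SAXException once it has what it needs, and the caller treats that exception as a normal stop. The names FirstTitleHandler and StopParsingException are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class FirstTitleHandler extends DefaultHandler {
    /** Marker exception thrown only to abort the parse; it does not signal an error. */
    static class StopParsingException extends SAXException {
        StopParsingException() {
            super("Parsing stopped on purpose");
        }
    }

    private final StringBuilder text = new StringBuilder();
    private boolean inTitle;
    private String title;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("title".equals(qName)) {
            inTitle = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inTitle) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if ("title".equals(qName)) {
            title = text.toString();
            throw new StopParsingException(); // nothing after the first title is read
        }
    }

    // Returns the text of the first title element, or null if there is none.
    static String firstTitle(String xml) {
        FirstTitleHandler handler = new FirstTitleHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (StopParsingException expected) {
            // The parse was aborted deliberately; the result is already captured.
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.title;
    }
}
```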

java - How to stop parsing xml document with SAX at any time? - Stack ...

java xml sax

You're right to say that SAX is probably not the best option if you want to keep the nodes that you've not "consumed". You could still do it using some kind of "SAX store" that would keep the SAX events and replay them (there are a few implementations of such a thing around), but an object-model-based API would be much easier to use: you'd easily keep the complete object model and just update "your" nodes.

Of course, you can use DOM which is the standard, but you may also want to consider alternatives which provide an easier access to the specific nodes that you'll be using in an arbitrary data model. Among them, JDOM (http://www.jdom.org/) and XOM (http://www.xom.nu/) are interesting candidates.

java - How to preserve XML nodes that are not bound to an object when ...

java android xml sax

The particular example you provide seems to be HTML or XHTML. Trying to edit HTML or XML using regular expressions is fraught with problems. For the kind of editing you seem to be interested in doing, you should look at using XSLT. Another possibility is to use SAX, the streaming XML parser, and have your back-end write the edited output on the fly. If the text is actually HTML, you might be better off using a tolerant HTML parser, such as JSoup, to build a parsed representation of the document (like the DOM), and manipulate that before outputting it.

java - Alternative to successive String.replace - Stack Overflow

java string replace

If you control the definition of the XML, you could use an XML binding tool, for example JAXB (Java Architecture for XML Binding.) In JAXB you can define a schema for the XML structure (XSD and others are supported) or annotate your Java classes in order to define the serialization rules. Once you have a clear declarative mapping between XML and Java, marshalling and unmarshalling to/from XML becomes trivial.

Using JAXB does require more memory than SAX handlers, but there exist methods to process the XML documents by parts: Dealing with large documents.

java - Better way to parse xml - Stack Overflow

java xml sax

Unless you're using the research prototype of streaming XPath, it is very likely that your XPath engine is loading everything into memory, so it will have similar characteristics to DOM. So it rather depends on your definition of 'efficiency'. It's certainly easier to use, and the XPath implementations could change to be more efficient, whereas DOM will always have some representation of the whole document on the client machine, and SAX will always be a lot more awkward to program than XPath.

I find it odd that the other answers don't mention your point, since XPath still has to parse the document in some way. DOM, SAX and XPath are different APIs for accessing a document; but only DOM and SAX are parsers of a document. Unless some #C does a parser for XPath that we don't know about?

BTW: your linked XSQ uses SAX for parsing underneath - it doesn't have a specific XPath parser.

Yes, it's a layer above a streaming parser rather than an object model.

xml - Is XPath much more efficient as compared to DOM and SAX? - Stack...

xml dom xpath sax

Many problems call for different kinds of XML files for different purposes. I won't try to cover everything; instead I'll describe, from my own experience, what I needed all this for.

Java is perhaps my favorite programming language. That affection is only strengthened by the fact that you can solve almost any problem with it without having to reinvent the wheel.

So, I needed to build a client-server pair around a database, so that the client could remotely make entries in the server's database. Needless to say, the input data has to be validated and so on, but that is not the point here.

As the working principle I chose, without hesitation, to transmit the information as an XML file of the following form:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<doc>
    <id>3</id>
    <fam>Ivanov</fam>
    <name>Ivan</name>
    <otc>I.</otc>
    <dateb>10-03-2005</dateb>
    <datep>10-03-2005</datep>
    <datev>10-03-2005</datev>
    <datebegin>09-06-2009</datebegin>
    <dateend>10-03-2005</dateend>
    <vdolid>1</vdolid>
    <specid>1</specid>
    <klavid>1</klavid>
    <stav>2.0</stav>
    <progid>1</progid>
</doc>

It is easy to read; suffice it to say that it holds information about the doctors of a medical institution: last name, first name, unique id, and so on. In general, a data record. The file arrives safely on the server side, and then parsing begins.

Of the two parsing options (SAX vs DOM) I chose SAX, because it is faster and because it was the first one I got my hands on :)

So. As you know, to work with the parser we need to override the DefaultHandler methods we are interested in. To begin, import the required packages (org.xml.sax.* and org.xml.sax.helpers.DefaultHandler).

Now we can start writing our parser:

public class SAXPars extends DefaultHandler {
 ... 
}

Let's start with the startDocument() method. As the name implies, it is called when the beginning-of-document event occurs. Here you can hook in various actions, such as allocating memory or resetting values, but our example is quite simple, so we just announce the start of work with a message:

@Override
public void startDocument() throws SAXException {
    System.out.println("Start parse XML ...");
}

Next, as the parser goes through the document, it encounters an element. This triggers the startElement() method, whose signature is startElement(String namespaceURI, String localName, String qName, Attributes atts). Here namespaceURI is the namespace, localName is the local name of the element, qName is the combination of the namespace prefix and the local name (separated by a colon), and atts holds the attributes of the element. In our case everything is simple: it is enough to take qName and store it in a helper field, thisElement. That way we record which element we are currently in.

@Override
public void startElement(String namespaceURI, String localName, String qName, Attributes atts) throws SAXException {
    thisElement = qName;
}

Next, having met an element, we get to its value. This is where the characters() method comes in. Its signature is characters(char[] ch, int start, int length). Everything is clear here: ch is the array containing the character data inside the element, while start and length give the starting position within the array and the number of characters.

@Override
public void characters(char[] ch, int start, int length) throws SAXException {
    // Note: a parser may deliver one element's text in several characters() calls;
    // this simple handler assumes a single call per element.
    if (thisElement.equals("id")) {
        doc.setId(Integer.valueOf(new String(ch, start, length)));
    }
    if (thisElement.equals("fam")) {
        doc.setFam(new String(ch, start, length));
    }
    if (thisElement.equals("name")) {
        doc.setName(new String(ch, start, length));
    }
    if (thisElement.equals("otc")) {
        doc.setOtc(new String(ch, start, length));
    }
    if (thisElement.equals("dateb")) {
        doc.setDateb(new String(ch, start, length));
    }
    if (thisElement.equals("datep")) {
        doc.setDatep(new String(ch, start, length));
    }
    if (thisElement.equals("datev")) {
        doc.setDatev(new String(ch, start, length));
    }
    if (thisElement.equals("datebegin")) {
        doc.setDatebegin(new String(ch, start, length));
    }
    if (thisElement.equals("dateend")) {
        doc.setDateend(new String(ch, start, length));
    }
    if (thisElement.equals("vdolid")) {
        doc.setVdolid(Integer.valueOf(new String(ch, start, length)));
    }
    if (thisElement.equals("specid")) {
        doc.setSpecid(Integer.valueOf(new String(ch, start, length)));
    }
    if (thisElement.equals("klavid")) {
        doc.setKlavid(Integer.valueOf(new String(ch, start, length)));
    }
    if (thisElement.equals("stav")) {
        doc.setStav(Float.valueOf(new String(ch, start, length)));
    }
    if (thisElement.equals("progid")) {
        doc.setProgid(Integer.valueOf(new String(ch, start, length)));
    }
}

Ah, yes, I almost forgot. The object into which the parsed data is stored is of type Doctors. This class is already defined and has all the necessary setters and getters.

Next, obviously, the element ends and the next one follows. The endElement() method is responsible for the end of an element. It signals that the element has ended, and you can do whatever you need at that point. Let's do just that and clear thisElement.

@Override
public void endElement(String namespaceURI, String localName, String qName) throws SAXException {
    thisElement = "";
}

Having gone through the entire document this way, we reach the end of the file, and endDocument() is called. In it we can free memory, print some diagnostics, and so on. In our case we simply report that parsing has finished.

@Override
public void endDocument() {
    System.out.println("Stop parse XML ...");
}

So we have a class for parsing XML in our format. Here is the full text:

import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.*;

public class SAXPars extends DefaultHandler {

    // Object that collects the parsed data
    Doctors doc = new Doctors();
    // Name of the element currently being read ("" when between elements)
    String thisElement = "";

    public Doctors getResult() {
        return doc;
    }

    @Override
    public void startDocument() throws SAXException {
        System.out.println("Start parse XML ...");
    }

    @Override
    public void startElement(String namespaceURI, String localName, String qName, Attributes atts) throws SAXException {
        thisElement = qName;
    }

    @Override
    public void endElement(String namespaceURI, String localName, String qName) throws SAXException {
        thisElement = "";
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (thisElement.equals("id")) {
            doc.setId(Integer.valueOf(new String(ch, start, length)));
        }
        if (thisElement.equals("fam")) {
            doc.setFam(new String(ch, start, length));
        }
        if (thisElement.equals("name")) {
            doc.setName(new String(ch, start, length));
        }
        if (thisElement.equals("otc")) {
            doc.setOtc(new String(ch, start, length));
        }
        if (thisElement.equals("dateb")) {
            doc.setDateb(new String(ch, start, length));
        }
        if (thisElement.equals("datep")) {
            doc.setDatep(new String(ch, start, length));
        }
        if (thisElement.equals("datev")) {
            doc.setDatev(new String(ch, start, length));
        }
        if (thisElement.equals("datebegin")) {
            doc.setDatebegin(new String(ch, start, length));
        }
        if (thisElement.equals("dateend")) {
            doc.setDateend(new String(ch, start, length));
        }
        if (thisElement.equals("vdolid")) {
            doc.setVdolid(Integer.valueOf(new String(ch, start, length)));
        }
        if (thisElement.equals("specid")) {
            doc.setSpecid(Integer.valueOf(new String(ch, start, length)));
        }
        if (thisElement.equals("klavid")) {
            doc.setKlavid(Integer.valueOf(new String(ch, start, length)));
        }
        if (thisElement.equals("stav")) {
            doc.setStav(Float.valueOf(new String(ch, start, length)));
        }
        if (thisElement.equals("progid")) {
            doc.setProgid(Integer.valueOf(new String(ch, start, length)));
        }
    }

    @Override
    public void endDocument() {
        System.out.println("Stop parse XML ...");
    }
}

I hope this post helped convey the essence of the SAX parser.

Don't judge too harshly, this is my first article :) I hope it was useful to at least someone.

UPD: To run this parser, you can use this code:

SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
SAXPars saxp = new SAXPars();

parser.parse(new File("..."), saxp);

java - How to parse XML using the SAX parser - Stack Overflow

java android xml parsing sax

I'd recommend using a SAX parser rather than a DOM parser for a file this large. Nokogiri has a nice SAX parser built in: http://nokogiri.org/Nokogiri/XML/SAX.html

The SAX way of doing things is nice for large files simply because it doesn't build a giant DOM tree, which in your case is overkill; you can build up your own structures when events fire (for counting nodes, for example).

FWIW, see my answer for the comparison; though the memory savings of SAX are nice (critical sometimes), the performance is worse even for something so trivial as this.

Parsing Large XML files w/ Ruby & Nokogiri - Stack Overflow

ruby xml nokogiri

Thanks a lot for the info. Is Xerces a standalone JAR file, or does it come with the JDK? Is JAXP better than Xerces? Also, when I want to access one or two elements of a large message, is it efficient to use a SAX parser and then destroy the parser object? Or does it work the same way as a DOM parser? What is the advantage of using it? Also, if I want to create a SOAP message for consuming a web service, can I use the Xerces parser? What should I use?

The advantage of a SAX parser is that it hands you the elements as it parses them. So if you just want to know something like a country code before you pass the XML on to another process, this is very efficient. A DOM parser, on the other hand, builds a complex nested tree structure: you must wait for all the XML to be parsed and the complete tree to be built before you can access any element. If you just want to examine one or two elements, that is very expensive. Because SAX is a stream parser, you can handle documents of any size, whereas a DOM parser needs to fit everything in memory.

Thanks a lot for the info

Parsing XML Elements in JAVA - Stack Overflow

java xml dom java-ee sax

DOM was designed as a language-independent object model to hold any XML data, and as such is a large and complex system. It suits well the two-phase approach of first loading an XML document in, then performing various operations on it. SAX, on the other hand, was designed as a fairly light-weight system using a single-phase approach. With SAX, user-specified operations are performed as the document is loaded. Some applications use SAX to generate a smaller object model, with uninteresting information filtered out, which is then processed similarly to DOM. Note that although DOM and SAX are the well-known "standard" XML APIs, there are plenty of others available, and sometimes a particular application may be better off using a non-standard API. With XML the important bit is always the data; code can be rewritten.

  • SAX is good for large documents because it takes comparatively less memory than DOM.
  • SAX takes less time to read a document, whereas DOM takes more time.
  • With SAX we can access data but can't modify it. With DOM we can modify data.
  • We can stop SAX parsing whenever and wherever we want.
  • SAX parses sequentially, but with DOM we can also move backwards.
  • SAX is better for parsing machine-generated code. DOM is useful for parsing human-readable documents.

But can they both skip tags? Or can only SAX? In other words, does DOM need to parse all the tags?

DOM cannot skip tags. It returns all the information as a tree model, and from that tree you choose the specific information you want. So to answer your question: no, DOM cannot skip tags, only SAX can. There are different ways to return only the information from a certain level of the tree (like the lowest level), but you still need to go through all the tags to do this.

android - DOM vs SAX Java - Stack Overflow

java android parsing dom xml-parsing

Since you want an instance of some Product class that encapsulates the data from the XML, probably in a structured way by preference, you'd do much better to use JAXB for this task. Unless you have really specific requirements regarding the customization of binding XML input to objects, this will turn out a lot simpler than using SAX.

  • Get a W3C XML Schema for your XML. If you don't have one and can't obtain one, then there are tools out there that can generate a schema based on input XML. Trang makes this very easy.
  • Generate Java classes from the schema. For this you can use XJC (the XML-to-Java Compiler) available with Sun's JDK. If you're using build tools like Ant or Maven, there are plugins available. This can automate the process to make it part of a full build.
  • Use the JAXB API with your generated classes to easily turn XML documents into objects and vice-versa.

Although JAXB will take some time to learn (especially if the desired XML-Java mapping isn't 1-to-1), I think you'll end up saving time in the long run.

Sorry if you really do need SAX and this answer is not applicable, but I figured I'd rather let you know your options before using a somewhat archaic XML processing model. Note that DOM and StAX might also be of interest to you.

java - How to create a handler for parsing a xml using SAX? - Stack Ov...

java xml spring sax xerces

Add a root element and use SAX, StAX or VTD-XML.
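The question is about Python, but the idea is language-independent. As a sketch in Java, the root element can be added on the fly by concatenating streams, so the large file never has to be rewritten. The class name RootWrapper and the root element name are arbitrary, and the sketch assumes the fragments contain no XML declaration or DOCTYPE of their own.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class RootWrapper {
    // Presents "<root>" + fragments + "</root>" as one stream without copying the data.
    static InputStream wrapInRoot(InputStream fragments) {
        List<InputStream> parts = Arrays.asList(
                new ByteArrayInputStream("<root>".getBytes(StandardCharsets.UTF_8)),
                fragments,
                new ByteArrayInputStream("</root>".getBytes(StandardCharsets.UTF_8)));
        return new SequenceInputStream(Collections.enumeration(parts));
    }

    // Demonstration: count the elements with the given name in rootless pseudo-XML.
    static int countElements(String pseudoXml, String name) {
        final int[] count = {0};
        DefaultHandler counter = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (name.equals(qName)) {
                    count[0]++;
                }
            }
        };
        byte[] bytes = pseudoXml.getBytes(StandardCharsets.UTF_8);
        try (InputStream in = wrapInRoot(new ByteArrayInputStream(bytes))) {
            SAXParserFactory.newInstance().newSAXParser().parse(in, counter);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse wrapped XML", e);
        }
        return count[0];
    }
}
```

For a real file, a FileInputStream would take the place of the in-memory byte array.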

I linked the meta account with this one, where is the 100 points that you promised?

Parsing large pseudo-xml files in python - Stack Overflow

python xml