Best way to process large XML in PHP
For a large file, you'll want to use a SAX parser rather than a DOM parser.
With a DOM parser it will read in the whole file and load it into an object tree in memory. With a SAX parser, it will read the file sequentially and call your user-defined callback functions to handle the data (start tags, end tags, CDATA, etc.).
With a SAX parser you'll need to maintain state yourself (e.g. what tag you are currently in), which makes it a bit more complicated, but for a large file it will be much more efficient memory-wise.
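A rough sketch of what that looks like with PHP's built-in expat-based xml_* functions; the file name is a placeholder, and the state (current tag and its text) is tracked by hand as described above:
<?php
// SAX sketch: callbacks handle start tags, end tags and character data,
// while the script itself keeps track of which tag it is currently in.
$currentTag = '';
$text       = '';

$parser = xml_parser_create();

xml_set_element_handler(
    $parser,
    // start-tag callback: remember which tag we are in
    function ($parser, $name, $attrs) use (&$currentTag, &$text) {
        $currentTag = $name;            // expat upper-cases names by default
        $text = '';
    },
    // end-tag callback: for leaf elements, $text now holds their text content
    function ($parser, $name) use (&$currentTag, &$text) {
        echo $name . ': ' . trim($text) . PHP_EOL;   // do something useful here
        $currentTag = '';
        $text = '';
    }
);

// character-data callback: accumulate text for the current tag
xml_set_character_data_handler($parser, function ($parser, $data) use (&$text) {
    $text .= $data;
});

// feed the file to the parser in chunks instead of loading it whole
$fp = fopen('myLargeXmlFile.xml', 'r');
while ($chunk = fread($fp, 8192)) {
    xml_parse($parser, $chunk, false);
}
xml_parse($parser, '', true);   // signal end of data
fclose($fp);
xml_parser_free($parser);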
My take on it:
https://github.com/prewk/XmlStreamer
A simple class that will extract all children of the XML root element while streaming the file. Tested on a 108 MB XML file from pubmed.com.
class SimpleXmlStreamer extends XmlStreamer {
    public function processNode($xmlString, $elementName, $nodeIndex) {
        $xml = simplexml_load_string($xmlString);

        // Do something with your SimpleXML object

        return true;
    }
}

$streamer = new SimpleXmlStreamer("myLargeXmlFile.xml");
$streamer->parse();
When using a DOMDocument with large XML files, don't forget to pass the LIBXML_PARSEHUGE flag in the options of the load() method. (The same applies to the other load methods of the DOMDocument object.)
$checkDom = new \DOMDocument('1.0', 'UTF-8');
$checkDom->load($filePath, LIBXML_PARSEHUGE);
(Works with a 120 MB XML file)
A SAX parser, as Eric Petroelje recommends, would be better for large XML files. A DOM parser loads the entire XML file into memory and lets you run XPath queries; a SAX (Simple API for XML) parser reads the document sequentially and gives you hook points for processing.
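For comparison, a minimal sketch of the DOM route mentioned above, loading everything and then querying it with XPath; the file name and the //item/title expression are placeholders:
<?php
// DOM + XPath sketch: convenient, but the entire document is held in memory,
// so only suitable for files that fit comfortably in RAM.
$dom = new DOMDocument();
$dom->load('mySmallerXmlFile.xml');

$xpath = new DOMXPath($dom);
foreach ($xpath->query('//item/title') as $node) {
    echo $node->textContent, PHP_EOL;
}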
It really depends on what you want to do with the data. Do you need it all in memory to work with it effectively?
6.5 MB is not that big, in terms of today's computers. You could, for example, ini_set('memory_limit', '128M');
However, if your data can be streamed, you may want to look at using a SAX parser. It really depends on your usage needs.
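If keeping the whole document in memory is acceptable, that idea might look roughly like this; the limit value and file name are assumptions:
<?php
// Whole-document sketch: raise the memory limit, then load with SimpleXML.
// Only sensible when the file comfortably fits in memory.
ini_set('memory_limit', '128M');

$xml = simplexml_load_file('my6MbFile.xml');
foreach ($xml->children() as $child) {
    // work with each child node in memory
}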
A SAX parser is the way to go. I've found that SAX parsing can get messy if you don't stay organised.
I use an approach based on STX (Streaming Transformations for XML) to parse large XML files. I use the SAX methods to build a SimpleXML object to keep track of the data in the current context (i.e. just the nodes between the root and the current node). Other functions are then used for processing the SimpleXML document.
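The answer above doesn't include code; a rough sketch of a similar "SimpleXML object for the current context" idea, using XMLReader to stream and SimpleXML per record rather than the answerer's own STX-based code (the <record> element and file name are assumptions):
<?php
// Streaming sketch (not the STX code described above): XMLReader walks the
// file, and each repeating <record> element is handed to SimpleXML on its own.
$reader = new XMLReader();
$reader->open('myLargeXmlFile.xml');

// advance to the first <record> element
while ($reader->read() && !($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'record')) {
}

while ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'record') {
    $record = simplexml_load_string($reader->readOuterXml());
    // process $record: a SimpleXML view of just the current node and its children

    $reader->next('record');   // jump to the next <record>, skipping the current subtree
}
$reader->close();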
I needed to parse a large XML file that happened to have an element on each line (the StackOverflow data dump). In this specific case it was sufficient to read the file one line at a time and parse each line using SimpleXML. For me this had the advantage of not having to learn anything new.
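That approach might look roughly like this, assuming each record is a self-closing <row .../> on its own line as in the data dump (the file name is a placeholder):
<?php
// Line-by-line sketch: only works when each record sits on its own line.
$fp = fopen('posts.xml', 'r');
while (($line = fgets($fp)) !== false) {
    $line = trim($line);
    // skip the XML declaration and the opening/closing root tags
    if (strpos($line, '<row') !== 0) {
        continue;
    }
    $row = simplexml_load_string($line);
    if ($row !== false) {
        // work with $row, e.g. (string) $row['Id']
    }
}
fclose($fp);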