<div id="divID" name="notWorking">This is not working!</div>
<?php

    $dom = new DOMDocument("1.0", "utf-8");
    $dom->loadHTMLFile('YourFile.html');
    $div = $dom->getElementById('divID');

    echo $div->textContent;

    $div->setAttribute("name", "yesItWorks");
?>

It should also work without the file, as long as you pass well-formed XML or XHTML content, by changing

$dom->loadHTMLFile('YourFile.html');

to

$dom->loadHTML($html);

Oh yeah, and of course, to CHANGE the content (For completeness):

$div->removeChild($div->firstChild);
$newText = new DOMText('Yes this works!');
$div->appendChild($newText);

Then you can just echo it again.

Life saver. I changed it up a little to fit my exact situation, but this helped out so much. I might add, for anyone else who uses this: if the HTML file is not "properly formatted", it will run but give a bunch of warnings. You can suppress the markup errors by switching $dom->loadHTMLFile('file.html'); to @$dom->loadHTMLFile('file.html');
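
As an alternative to the @ operator, libxml's own error buffer can collect the warnings so they can be inspected rather than discarded. A minimal sketch, assuming an inline HTML string with a stray closing tag stands in for the badly formatted file (the markup and variable names are made up for illustration):

```php
<?php
// Buffer libxml parse warnings internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
// Hypothetical sloppy markup: the stray </span> has no opening tag.
$dom->loadHTML('<div id="divID">This is not working!</div></span>');

// Fetch whatever the parser complained about, then reset the buffer.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

$text = $dom->getElementsByTagName('div')->item(0)->textContent;
echo $text, "\n";

foreach ($errors as $error) {
    // Each entry is a LibXMLError with message, line and column properties.
    echo trim($error->message), ' (line ', $error->line, ")\n";
}
```

Which messages show up, if any, depends on the libxml version PHP was built against, so it is safer not to rely on a particular count.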

In PHP, using DomDocument getElementByID not working? What am I doing ...

php domdocument getelementbyid nodevalue

getElementById
$caption = "blah blah<p id ='test'>Test message</p>";
$doc = new DOMDocument;
$doc->validateOnParse = true;  // validate HTML
$doc->loadHTML($caption);  // This loads an HTML string
$xmessage = $doc->getElementById('test');

(NOTE: You need to use loadHTML, not loadHTMLFile).

This still may not work, as the HTML may not be valid. In that case, query for the element with XPath instead:

$caption = "blah blah<p id ='test'>Test message</p>";
$doc = new DOMDocument;
$doc->loadHTML($caption);  // $caption is an HTML string, not a file name
$xpath = new DOMXPath($doc);
$xmessage = $xpath->query("//p[@id='test']")->item(0);

domdocument - Remove paragraph by id with php Dom - Stack Overflow

php domdocument

$dom = new DOMDocument;
$dom->loadHTML($html_content);

function preg_replace_dom($regex, $replacement, DOMNode $dom, array $excludeParents = array()) {
  if (!empty($dom->childNodes)) {
    foreach ($dom->childNodes as $node) {
      if ($node instanceof DOMText && 
          !in_array($node->parentNode->nodeName, $excludeParents)) 
      {
        $node->nodeValue = preg_replace($regex, $replacement, $node->nodeValue);
      } 
      else
      {
        preg_replace_dom($regex, $replacement, $node, $excludeParents);
      }
    }
  }
}

preg_replace_dom('/match this text/i', 'IT WORKS', $dom->documentElement, array('a'));
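
The same effect can probably be had without recursion by letting XPath pick the text nodes that have no a ancestor. This is only a sketch of that alternative, not the answer's own method; the sample markup is borrowed from the question and the variable names are illustrative:

```php
<?php
$html = '<p>Match this text and replace it</p>'
      . '<p>Don\'t <a href="/">match this text</a></p>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
// Select every text node that is not inside a link, at any depth.
foreach ($xpath->query('//text()[not(ancestor::a)]') as $textNode) {
    $textNode->nodeValue = preg_replace(
        '/match this text/i',
        'IT WORKS',
        $textNode->nodeValue
    );
}

$result = $dom->saveHTML();
echo $result;
```

Because ancestor::a looks at the whole chain of parents, text wrapped in further tags inside a link is skipped too, whereas the parent-name check above only looks one level up.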

php - Regex / DOMDocument - match and replace text not in a link - Sta...

php regex xpath preg-replace domdocument

DOMDocument::createElementNS()
$root = $dom->createElementNS("http://www.sitemaps.org/schemas/sitemap/0.9", "urlset");
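
For context, a slightly fuller sketch of how that root element might be used to build a small sitemap. The child elements and the example.com URL are assumptions added for illustration; only the createElementNS() call on the root comes from the answer:

```php
<?php
$ns = 'http://www.sitemaps.org/schemas/sitemap/0.9';

$dom = new DOMDocument('1.0', 'UTF-8');
$dom->formatOutput = true;

// The namespace declaration is emitted on the element created with it.
$root = $dom->createElementNS($ns, 'urlset');
$dom->appendChild($root);

// Children are created in the same namespace so they belong to the sitemap vocabulary.
$url = $dom->createElementNS($ns, 'url');
$root->appendChild($url);

// Hypothetical page URL, used only as sample data.
$loc = $dom->createElementNS($ns, 'loc', 'https://example.com/');
$url->appendChild($loc);

$xml = $dom->saveXML();
echo $xml;
```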

xml - PHP DOMDocument - how to add Namespace declaration? - Stack Over...

php xml domdocument

You can use childNodes. This is a property of a DOM element that contains a NodeList containing all the element's children. Ideally you'd be able to do $el->childNodes->item(2) (note that it's 0-based, not 1-based, so 2 is the third item). However, this includes text nodes. So it's hard to predict what number your node will be. This probably isn't the best solution.

You could go with alexn's solution (getElementsByTagName('*')->item(2)), but again this has its drawbacks. If your nodes have child nodes, they will also be included in the selection. This could throw your calculation off.

My preferred solution would be to use XPath: it's probably the most stable solution, and not particularly hard.

You'll need to have created an XPath object with $xpath = new DOMXPath($document) somewhere, where $document is your DOMDocument instance. I'm going to assume that $el is the parent div node, the "context" that we're searching in.

$node = $xpath->query('*', $el)->item(2);

Note that, again, we're using a 0-based index to find which element in the selection it is. Here, we're looking at child nodes of the top level div only, and * selects only element nodes, so the calculations with text nodes are unnecessary.
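
To make the difference concrete, here is a small sketch comparing childNodes with the XPath query on a made-up fragment. The markup and the childByPath helper are hypothetical, added only to illustrate walking down level by level with a list of indexes:

```php
<?php
$document = new DOMDocument();
// The whitespace between the tags becomes text nodes in the DOM.
$document->loadXML('<div> <h1>Title</h1> <p>First</p> <p>Second</p> </div>');
$xpath = new DOMXPath($document);
$el = $document->documentElement;

$viaChildNodes = $el->childNodes->item(2);      // a whitespace text node
$viaXPath = $xpath->query('*', $el)->item(2);   // the third element child

echo $viaChildNodes->nodeName, "\n"; // #text
echo $viaXPath->textContent, "\n";   // Second

// Hypothetical helper: follow a list of 0-based element indexes downwards.
function childByPath(DOMXPath $xpath, DOMNode $start, array $indexes): ?DOMNode
{
    $node = $start;
    foreach ($indexes as $index) {
        $node = $xpath->query('*', $node)->item($index);
        if ($node === null) {
            return null; // no element child at this index
        }
    }
    return $node;
}

// From the document node: element 0 is the div, its element 1 is the first p.
$found = childByPath($xpath, $document, [0, 1]);
echo $found->textContent, "\n"; // First
```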

+1 for XPath, that really is the solution

Hi, thanks for your answer. I tried $el->childNode->item(2). It stripped off all the HTML tags. I also tried $x->query('/div/*', $el)->item(2); but notice that the child node is not fixed to div... I need to dynamically get a child level by level down with a series of indexes like 1 3 4 0.

query('*', $el)

dom - How to get a child of PHP DOMDocument by index - Stack Overflow

php dom

The axis is spelled descendant, not descendent:

descendant::p[@class="4textlist"]

or, abbreviated and relative to the current context node:

.//p[@class="4textlist"]
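
A small sketch of how the three spellings behave when a context node is passed to query(). The sample XML and the $context variable are assumptions for illustration; only the class name comes from the question:

```php
<?php
$dom = new DOMDocument();
$dom->loadXML(
    '<root>'
    . '<div id="a"><p class="4textlist">one</p>'
    . '<span><p class="4textlist">two</p></span></div>'
    . '<div id="b"><p class="4textlist">three</p></div>'
    . '</root>'
);
$xpath = new DOMXPath($dom);

// Hypothetical context node: the first div only.
$context = $xpath->query('//div[@id="a"]')->item(0);

// A leading // always starts at the document root and ignores the context node.
$everywhere = $xpath->query('//p[@class="4textlist"]', $context)->length;

// Both of these stay inside the context node.
$viaAxis = $xpath->query('descendant::p[@class="4textlist"]', $context)->length;
$viaDot  = $xpath->query('.//p[@class="4textlist"]', $context)->length;

printf("%d %d %d\n", $everywhere, $viaAxis, $viaDot); // 3 2 2
```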

Thank you @Jens. When I change it to $xpath->query('descendant::p[@class="4textlist"]', $paragraph); it returns 0 results. With your second proposal, I get all 6 p tags with this class, but I need to query only within the selected element, not the entire DOMDocument.

Just add the current context (.). I updated the answer.

Thanks again, but in this case, I am getting an empty result with the 2nd proposal.

Then your current context does not contain the elements. What does the XPath expression . return? Or are XML namespaces involved?

Hm, when I get the node value of the current $paragraph, I get the content I am expecting. The XPath expression . returns a node list with 1 element.

xml - XPath and PHP DomDocument - How to select all childs with specif...

php xml xpath domdocument

$a='<p>Match this text and replace it</p>
<p>Don\'t <a href="/">match this text</a></p>
<p>We still need to match this text and replace it</p>';

echo preg_replace('~match this text(?![^<]*</a>)~i','replacement',$a);

The negative lookahead ensures the replacement happens only if the next tag is not a closing link tag (</a>). It works fine with your example, though it won't work if you happen to use other tags inside your links.
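
To illustrate that limitation, here is a short sketch running the same pattern against a plain link and against a link with a nested tag. The nested-tag input is made up for the demonstration:

```php
<?php
$pattern = '~match this text(?![^<]*</a>)~i';

// The link text is followed directly by </a>, so the lookahead blocks the match.
$plainLink = '<p>Don\'t <a href="/">match this text</a></p>';

// Hypothetical input: the next tag is </b>, so the lookahead no longer protects the link.
$nestedLink = '<p>Don\'t <a href="/"><b>match this text</b> here</a></p>';

$first  = preg_replace($pattern, 'replacement', $plainLink);
$second = preg_replace($pattern, 'replacement', $nestedLink);

echo $first, "\n";  // unchanged
echo $second, "\n"; // the text inside the link was replaced
```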

php - Regex / DOMDocument - match and replace text not in a link - Sta...

php regex xpath preg-replace domdocument

Use DOMDocument and getElementsByTagName() to fetch the coordinates elements:
$dom = new DOMDocument();
// load file 
$dom->load("file.kml");
// get coordinates tag
$coordinates = $dom->getElementsByTagName("coordinates");
foreach($coordinates as $coordinate){
    echo $coordinate->nodeValue;
}
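
If the individual values are needed rather than the raw string, the node value can be split further. A sketch, assuming the usual KML layout of whitespace-separated lon,lat[,alt] tuples and using an inline sample document in place of file.kml (the sample data and the $points array are illustrative):

```php
<?php
// Hypothetical sample standing in for file.kml.
$kml = '<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark><Point>
    <coordinates>-122.0822,37.4222,0</coordinates>
  </Point></Placemark>
</kml>';

$dom = new DOMDocument();
$dom->loadXML($kml);

$points = [];
foreach ($dom->getElementsByTagName('coordinates') as $coordinate) {
    // One element may hold several tuples separated by whitespace.
    foreach (preg_split('/\s+/', trim($coordinate->nodeValue)) as $tuple) {
        $parts = array_map('floatval', explode(',', $tuple));
        $points[] = ['lon' => $parts[0], 'lat' => $parts[1]];
    }
}

print_r($points);
```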

xml - How to get specific tag in KML file using php DOMDocument? - Sta...

php xml parsing kml domdocument

Certain versions of libxml require a doctype to be present in order that getElementById will work correctly, hence the quite "hacky" approach here which tricks libxml slightly.

$doc='<!doctype>';

    $html='
        <div class="something">
            important stuff
            <div id="delete_me">
                not so important stuff, better delete me
            </div>
        </div>';

    /* append the doctype */
    $html=$doc . $html;

    $dom=new DOMDocument;
    $dom->validateOnParse = false;
    $dom->loadHTML( $html );

    /* get the element to be deleted */
    $div=$dom->getElementById('delete_me');

    /* delete the node */
    if( $div && $div->nodeType==XML_ELEMENT_NODE ){
        $div->parentNode->removeChild( $div );
    }
    echo $dom->saveHTML();
    $dom=null;

Alternatively use DOMXPath to find the element by querying for the id and delete.

$html='
        <div class="something">
            important stuff
            <div id="delete_me">
                not so important stuff, better delete me
            </div>
        </div>';
    $dom=new DOMDocument;
    $dom->validateOnParse = false;
    $dom->loadHTML( $html );
    $xp=new DOMXPath( $dom );

    $col = $xp->query( '//div[ @id="delete_me" ]' );
    if( !empty( $col ) ){
        foreach( $col as $node ){
            $node->parentNode->removeChild( $node );
        }
    }
    echo $dom->saveHTML();
    $dom=null;

PHP DOMDocument, remove element by Id - Stack Overflow

php domdocument

The first level of elements below the root node can be accessed with

$dom->documentElement->childNodes

The childNodes property contains a DOMNodeList, which you can iterate with foreach.

DOMDocument::documentElement

This is a convenience attribute that allows direct access to the child node that is the document element of the document.

and DOMNode::childNodes

A DOMNodeList that contains all children of this node. If there are no children, this is an empty DOMNodeList.

Since childNodes is a property of DOMNode, any class extending DOMNode (which is most of the classes in DOM) has this property, so to get the first level of elements below a DOMElement, access that DOMElement's childNodes property.

Note that if you use DOMDocument::loadHTML() on invalid HTML or partial documents, the HTML parser module will add an HTML skeleton with html and body tags, so in the DOM tree, the HTML in your example will be

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><div id="header">
</div>
<div id="content">
    <div id="sidebar">
    </div>
    <div id="info">
    </div>
</div>
<div id="footer">
</div></body></html>

which you have to take into account when traversing or using XPath. Consequently, using

$dom = new DOMDocument;
$dom->loadHTML($str);
foreach ($dom->documentElement->childNodes as $node) {
    echo $node->nodeName; // body
}

will only iterate the <body> DOMElement node. Knowing that libxml will add the skeleton, you will have to iterate over the childNodes of the <body> element to get the div elements from your example code, e.g.

$dom->getElementsByTagName('body')->item(0)->childNodes

However, doing so will also take into account any whitespace nodes, so you either have to make sure to set preserveWhiteSpace to false or query for the right element nodeType if you only want to get DOMElement nodes, e.g.

foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $node) {
    if ($node->nodeType === XML_ELEMENT_NODE) {
        echo $node->nodeName;
    }
}

or select only the element children of <body> with XPath:

$dom->loadHTML($str);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('/html/body/*') as $node) {
    echo $node->nodeName;
}
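
Putting the pieces together, a self-contained sketch using the example markup from the question. The $str contents are reconstructed from the tree shown above and the $ids array is only there to show the result:

```php
<?php
$str = '<div id="header"></div>'
     . '<div id="content"><div id="sidebar"></div><div id="info"></div></div>'
     . '<div id="footer"></div>';

$dom = new DOMDocument();
$dom->loadHTML($str); // the parser wraps the fragment in html and body

$xpath = new DOMXPath($dom);
$ids = [];
// * matches element nodes only, and only direct children of body.
foreach ($xpath->query('/html/body/*') as $node) {
    $ids[] = $node->getAttribute('id');
}

print_r($ids); // header, content, footer
```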

xpath - How get first level of dom elements by Domdocument PHP? - Stac...

php xpath domdocument

For the second part of the question, the result of the query has a length property which you can use to see if anything was matched:

$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//div[contains(attribute::class, "foo")]');

printf('Removing %d nodes', $nodes->length);
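
Note that contains() does a substring match, so it would also match a class such as "foobar". A sketch of a stricter token match followed by the actual removal; the sample markup is made up and the padded-class expression is a common XPath 1.0 idiom rather than something from the answer:

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML(
    '<div class="foo">a</div><div class="foobar">b</div><div class="bar foo">c</div>'
);

$xpath = new DOMXPath($doc);
// Pad the class list with spaces so only the whole token "foo" matches.
$nodes = $xpath->query(
    '//div[contains(concat(" ", normalize-space(@class), " "), " foo ")]'
);

$removed = $nodes->length;
printf("Removing %d nodes\n", $removed);

// Copy to an array first so removal cannot disturb the iteration.
foreach (iterator_to_array($nodes) as $node) {
    $node->parentNode->removeChild($node);
}

$html = $doc->saveHTML();
echo $html;
```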

PHP DOMDocument: Delete elements by class - Stack Overflow

php domdocument

You can use XPath to directly grab the element you're looking for:

$dom = new DOMDocument();
$dom->loadXML('<p><a id="1">test 1</a><span><a id="2">test 2</a></span></p>');

$xpath = new DOMXpath( $dom);
$a = $xpath->query( '//a[@id="1"]')->item( 0);
echo $a->textContent;

Output:

test 1

+1 for beating me to the punch. Good catch. Specifying the first index ->item(0); will ensure that only one element is returned instead of a DOMNodeList.

PHP DomDocument - getAllChildrenByTagName - Stack Overflow

php domdocument children

you might have to set preserveWhiteSpace to false though as well
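
A sketch of how the two settings might be combined when re-indenting existing XML. The input string is made up; preserveWhiteSpace is set before loading because it affects parsing:

```php
<?php
$dom = new DOMDocument();
// Drop whitespace-only text nodes while parsing, otherwise they block re-indentation.
$dom->preserveWhiteSpace = false;
$dom->loadXML("<root>\n<item>1</item>\n\n<item>2</item></root>");

// Ask the serializer to add line breaks and indentation.
$dom->formatOutput = true;
$xml = $dom->saveXML();
echo $xml;
```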

php domdocument: when i create an xml, how can i ident each element pr...

php xml domdocument

$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->loadXML("<div>$xhtml</div>"); // we need the div as root element

// find all img elements in paragraphs in the partial body
$xp = new DOMXPath($dom);
foreach ($xp->query('/div/p/img') as $img) {

    $parentNode = $img->parentNode; // store for later
    $parentNode->removeChild($img); // unlink all found img elements

    // create the a element
    $a = $dom->createElement('a');
    $a->setAttribute('href', '/files/fullview/' . basename($img->getAttribute('src')));
    $a->setAttribute('rel', sprintf('lightbox[group][%s]', $img->getAttribute('alt')));
    $a->appendChild($img);

    // prepend img src with path to thumbs and remove alt attribute
    $img->setAttribute('src', '/files/thumbs' . $img->getAttribute('src'));
    $img->removeAttribute('alt'); // imo you should keep it for accessibility though

    // create the holding div
    $div = $dom->createElement('div');
    $div->setAttribute('class', 'custom');
    $div->appendChild($a);

    // insert the holding div
    $parentNode->parentNode->insertBefore($div, $parentNode);
}

$dom->formatOutput = true;
echo $dom->saveXml($dom->documentElement);

Thank you very much for the code. As a whole I couldn't get it to work, the only output ever was the xml declaration line. I am not experienced enough to debug this but I did use your way to get the attributes from the original images.

@Paul I'm not sure what you mean by you couldn't get it to work as a whole. The code snippet above produces the wanted output you show in your question. Click the demo link on top to see.

I did look at the demo, and was all the more confused why it didn't work when I embedded that into my code. The last two lines regarding the output still puzzle me. But even when I changed that bit the script always returned <!--?xml version="1.0" encoding="utf-8"?--> and nothing else.

@Paul formatOutput in combination with preserveWhiteSpace at the top makes sure you get a pretty-printed XML string instead of a compact one. Passing the documentElement to saveXml makes sure you don't get the XML prolog.

Now that's weird since I got the xml prolog and only that. Well, then I probably messed something up. But I really appreciate your solution, I got quite some understanding of the whole system out of it. Thanks again.

xpath - How to properly replace inline images with styled floating div...

php xpath domdocument

Returns a typed result if possible or a DOMNodeList containing all nodes matching the given XPath expression.

so your XPath apparently returns multiple nodes, which likely stems from you using //, which means "find everywhere". If you do echo $url->length; you'll see there are 460 items (no matter the passed context node).

  • //para selects all the para descendants of the document root and thus selects all para elements in the same document as the context node
  • .//para selects the para element descendants of the context node

So you need to use .//a/@href instead. This gives only 1 result for echo $url->length;, but a node set cannot be returned as a typed result, so you have to wrap the expressions in string() and change your code to

$url = $xpath->evaluate('string(.//a/@href)', $event);
$nom = $xpath->evaluate('string(.//a)', $event);
$lieu = $xpath->evaluate('string(../li[@class="lieu"]/a)', $event);
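
A self-contained sketch of the difference, using a tiny made-up event list in place of the real page. The markup, class names and values are assumptions for illustration:

```php
<?php
$doc = new DOMDocument();
$doc->loadXML(
    '<ul>'
    . '<li class="event"><a href="/a">Event A</a></li>'
    . '<li class="event"><a href="/b">Event B</a></li>'
    . '</ul>'
);
$xpath = new DOMXPath($doc);

// Hypothetical context node: the second event.
$event = $xpath->query('//li[@class="event"]')->item(1);

// query() returns a DOMNodeList; // ignores the context, .// respects it.
$all    = $xpath->query('//a/@href', $event)->length;
$scoped = $xpath->query('.//a/@href', $event)->length;

// evaluate() with string() returns a plain PHP string.
$url = $xpath->evaluate('string(.//a/@href)', $event);
$nom = $xpath->evaluate('string(.//a)', $event);

printf("%d %d %s %s\n", $all, $scoped, $url, $nom); // 2 1 /b Event B
```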

Also note that you can shorten your DOMDocument creation and loading to

libxml_use_internal_errors(true);
$doc = new DOMDocument;
$doc->loadHTMLFile('http://www.parisbouge.com/events/2012/05/01/');
libxml_use_internal_errors

I tried with .//a/@href and still get the same error

xpath - PHP - DOMXpath - Get the result - Stack Overflow

php xpath domdocument domxpath evaluate

If you must save the HTML as a string, there is DOMDocument::saveHTML

$elems = $xpath->query('//tr');

foreach ($elems as $elem) {
  $array[] = $doc->saveHTML($elem);
}
saveHTML

I'd recommend saving the nodes themselves, though, and converting them to string only shortly before you output them.
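
A sketch of that recommendation: keep the DOMElement objects around, work on them, and serialize only at output time. The table markup and the class attribute are made up for the example:

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<table><tr><td>a</td></tr><tr><td>b</td></tr></table>');
$xpath = new DOMXPath($doc);

// Keep the nodes themselves so they can still be modified later.
$rows = iterator_to_array($xpath->query('//tr'));
$rows[1]->setAttribute('class', 'last');

// Convert to strings only shortly before output.
$array = array_map(
    function (DOMNode $tr) use ($doc) {
        return trim($doc->saveHTML($tr)); // trim drops any formatting newline
    },
    $rows
);

print_r($array);
```

Passing a node to saveHTML() requires a reasonably recent PHP (the parameter was added in 5.3.6).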

That's what I was doing wrong. I was using saveHTML(). Thanks!

Already up-voted, I'm way ahead of you. The OP explicitly stated DOMXPath, but how exactly the nodes are being selected is actually secondary for this question.

php domdocument or domxpath: how to extract TRs and save html - Stack ...

php domdocument domxpath

The quick solution to your problem is to use an XPath expression to grab the body.

$dom= new DOMDocument();
$dom->loadHTML('<div><p>Hello World');      
$xpath = new DOMXPath($dom);
$body = $xpath->query('/html/body');
echo($dom->saveXml($body->item(0)));

A word of warning here. Sometimes loadHTML will throw a warning when it encounters certain poorly formed HTML documents. If you're parsing those kinds of HTML documents, you'll need to find a better HTML parser [self link warning].

this will return <body>[CONTENT]</body> ... how can you get just [CONTENT]?

you can always do a search and replace before output ...
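
Instead of a search and replace, the children of the body can be serialized one by one, which leaves out the body tags themselves. A sketch, assuming well-formed input so that no parser warnings get in the way (the $content variable is illustrative):

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<div><p>Hello World</p></div>');

$body = $dom->getElementsByTagName('body')->item(0);

// Serialize each child of body and concatenate the results.
$content = '';
foreach ($body->childNodes as $child) {
    $content .= $dom->saveHTML($child);
}

echo $content;
```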

PHP DOMDocument - get html source of BODY - Stack Overflow

php html dom parsing domdocument

If you use DOMDocument, you can use getElementsByTagName('*'), which returns a DOMNodeList with all elements in your document. You can then call its item method, which takes an index as a parameter:

$nodes = $dom->getElementsByTagName('*');
$targetNode = $nodes->item(3);

Hi, thank you for the answer. Your solution gets all the elements from all descendant nodes, and all HTML tags are stripped off. My intention is to get a child level by level down with a series of indexes like 1 3 4 0. Hope this is clear.

dom - How to get a child of PHP DOMDocument by index - Stack Overflow

php dom