
How to add non-escaped ampersands to HTML with Nokogiri::XML::Builder?


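require 'nokogiri'

# Workaround: add an `entity` helper to the builder. It parses a tiny
# document containing the numeric character reference, then inserts the
# resulting text node at the builder's current position.
class Nokogiri::XML::Builder
  def entity(code)
    doc = Nokogiri::XML("<?xml version='1.0'?><root>&##{code};</root>")
    insert(doc.root.children.first)
  end
end

# The helper must be defined before the builder block uses it.
builder = Nokogiri::XML::Builder.new do |xml|
  xml.span {
    xml.text "I can has "
    xml.entity 8665
    xml.text " entity?"
  }
end
puts builder.to_xml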
<!ENTITY bull CDATA "&#8226;" -- bullet, =black small circle, u+2022 ISOpub -->
<?xml version="1.0"?>
<span>I can has &#x2022; entity?</span>

PS: this is only a workaround. For a clean solution, please refer to the libxml2 documentation (Nokogiri is built on libxml2). However, even the libxml2 folks admit that handling entities can be quite, err, cumbersome sometimes.

If I use 8226 instead of 8665, it parses it to "bull;" :/

Thanks, adrian. What is an "entity", and where did you get 8665?



(In general, non-ASCII characters in Ruby 1.8 are tricky. The byte-based interfaces don't mesh well with XML's world of all-text-is-Unicode.)

So just type a bullet: '•'. Of course, your source code and your XML file will have to use the same encoding for that to come out right. If your XML file is UTF-8 but your source code isn't, you'd probably have to write "\xe2\x80\xa2" (in double quotes, since Ruby does not interpret \x escapes in single-quoted strings), which is the UTF-8 byte sequence for the bullet character as a string literal.
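To make the byte-sequence claim concrete, here is a minimal sketch in plain Ruby. It assumes Ruby 1.9 or later (for force_encoding, which Ruby 1.8 lacks); the variable names are only illustrative.

```ruby
# The three UTF-8 bytes for the bullet, written with \x escapes.
# Double quotes are required for the escapes to be interpreted.
bullet_from_bytes = "\xE2\x80\xA2".force_encoding("UTF-8")

# The same character built from its code point: 8226 decimal is 0x2022.
bullet_from_codepoint = [8226].pack("U")

# Both spellings produce the same one-character string.
same_character = (bullet_from_bytes == bullet_from_codepoint)

# Hex dump of the underlying bytes, e.g. for comparing against a file.
hex_bytes = bullet_from_bytes.bytes.map { |b| format("%02x", b) }.join(" ")

puts same_character
puts hex_bytes
```

This also explains where numbers like 8226 come from: they are the decimal form of the Unicode code point, which the serializer may print in hexadecimal instead.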

When you're setting the text of an element, you really are setting text, not HTML source. < and & don't have any special meaning in plain text.

Why do you need that particular escaped version? If you are having encoding troubles, so that the • doesn't appear as you typed it, then you should fix those by setting your encoding correctly rather than resorting to HTML escapes. (Whilst in other environments you might ask your HTML serialiser to escape all non-ASCII characters to HTML ampersand sequences to get around this, Ruby doesn't currently have that level of Unicode support.)
