Rectangle 27 42

You can get a string from the element and then write that (from the lxml tutorial):

xml_bytes = etree.tostring(root, pretty_print=True)

Alternatively, write directly from the element tree:

et = etree.ElementTree(root)
et.write(sys.stdout, pretty_print=True)

To save the serialized result of tostring to a file, open the file in binary mode:

with open('pretty.html', 'wb') as f: f.write(xml_bytes)

As of python3, you need to use sys.stdout.buffer instead of sys.stdout - which essentially is the same as what @laviex pointed out, only for the special case of sys.stdout.
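
A minimal sketch of both approaches under Python 3, assuming lxml is installed. The element names and the in-memory buffer are made up for illustration; the buffer stands in for a file opened in binary mode or for sys.stdout.buffer.

```python
import io

from lxml import etree

# Hypothetical document, built only for this illustration
root = etree.Element("root")
etree.SubElement(root, "child").text = "hello"

# tostring() returns bytes under Python 3, so the target must accept bytes
xml_bytes = etree.tostring(root, pretty_print=True)

# ElementTree.write() also emits bytes; a BytesIO object stands in here
# for open('pretty.html', 'wb') or sys.stdout.buffer
buffer = io.BytesIO()
etree.ElementTree(root).write(
    buffer, pretty_print=True, xml_declaration=True, encoding="UTF-8"
)

print(buffer.getvalue().decode("utf-8"))
```

Passing sys.stdout itself fails under Python 3 because it is a text stream, which is why the binary sys.stdout.buffer is needed.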

Write xml file using lxml library in Python - Stack Overflow

python xml lxml
Rectangle 27 23

To get to your particular troubles, as the comments point out, you're missing GCC. On OS X, Xcode Command Line Tools provides GCC, as well as many other programs necessary for building software on OS X. For OS X 10.9 (Mavericks) and newer, either install Xcode through the App Store, or alternatively, install only the Xcode Command Line Tools with

xcode-select --install

For more details, please see the Apple Developer FAQ or search the web for "install Xcode Command Line Tools".

For older versions of OS X, you can get Xcode Command Line Tools from the downloads page of the Apple Developer website (free registration required).

Once you have GCC installed, you may still encounter errors during compilation if the C/C++ library dependencies are not installed on your system. On OS X, the Homebrew project is the easiest way to install and manage such dependencies. Follow the instructions on the Homebrew website to install Homebrew on your system, then issue

brew update
brew install libxml2 libxslt

Possibly causing further trouble in your case, you placed the downloaded setuptools in /Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/. Please do not download any files to this location. Instead, I suggest you download the file to your home directory, or your usual Downloads directory. After downloading it, you're supposed to run sh setuptools-X.Y.Z.egg, which will then install it properly into the appropriate site-packages and put the executable easy_install on your path.

+1 for the great graphic

Above pip link is broken. I think it is this: pip.pypa.io/en/latest

python - Installing easy_install... to get to installing lxml - Stack ...

python lxml easy-install
Rectangle 27 7

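I had the same problem. If you installed it with pip as follows: pip install lxml, try installing it with STATIC_DEPS set instead, so that the build uses its own statically linked libxml2 and libxslt rather than the system libraries: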

STATIC_DEPS=true pip install lxml

Thanks, I had already installed that package after dozens of tests, but your solution seems good.

get errors when import lxml.etree to python - Stack Overflow

python python-2.7 lxml
Rectangle 27 5

If you've installed libxml2, then it's possible that it's just not picking up the right version (there's a version installed with OS X by default). In particular, suppose you've installed libxml2 to /usr/local. You can check what shared libraries etree.so references:

$> otool -L /Library/Python/2.7/site-packages/lxml-3.2.1-py2.7-macosx-10.7-intel.egg/lxml/etree.so 
/Library/Python/2.7/site-packages/lxml-3.2.1-py2.7-macosx-10.7-intel.egg/lxml/etree.so:
    /usr/lib/libxslt.1.dylib (compatibility version 3.0.0, current version 3.24.0)
    /usr/local/lib/libexslt.0.dylib (compatibility version 9.0.0, current version 9.17.0)
    /usr/lib/libxml2.2.dylib (compatibility version 10.0.0, current version 10.3.0)
    /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.5)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 159.1.0)

Checking for that symbol in the system-installed version:

$> nm /usr/lib/libxml2.2.dylib | grep ___xmlStructuredErrorContext

For me, it's not present in the system-installed library. In the version I installed, however:

$> nm /usr/local/lib/libxml2.2.dylib | grep ___xmlStructuredErrorContext
000000000007dec0 T ___xmlStructuredErrorContext

Setting DYLD_LIBRARY_PATH so that the version I installed is found first fixes the import:

$> export DYLD_LIBRARY_PATH=/usr/local/lib
$> python
>>> from lxml import etree
# Success!
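
Once the import succeeds, lxml itself can report which libxml2 it was built against and which one the dynamic linker actually loaded. This is a small sketch, assuming lxml is installed; the `mismatch` variable is introduced here only for illustration.

```python
from lxml import etree

# Version of lxml itself
print("lxml:            ", etree.LXML_VERSION)

# libxml2 version lxml was compiled against vs. the one loaded at runtime
print("libxml2 compiled:", etree.LIBXML_COMPILED_VERSION)
print("libxml2 running: ", etree.LIBXML_VERSION)

# A difference hints that a different copy of the library is being picked up
mismatch = etree.LIBXML_COMPILED_VERSION != etree.LIBXML_VERSION
print("version mismatch:", mismatch)
```

A mismatch is not an error by itself, but it is a useful hint when symbols such as the one above go missing.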

This solved my problem after an hour of trying... What is the permanent solution to this problem?

On Linux, the permanent equivalent is to list the library directory in /etc/ld.so.conf.d/lxml.conf and run ldconfig.

As for me, doing the opposite solved my problem (because the symbol was actually in the system's libxml2 version), so I had to put /usr/lib as the first entry of DYLD_LIBRARY_PATH.

get errors when import lxml.etree to python - Stack Overflow

python python-2.7 lxml
Rectangle 27 10

sudo apt-get install python-lxml

Better to use pip. It will give you a more recent version than your package maintainer. Sooner or later you'll be glad you used it.

python - Installing easy_install... to get to installing lxml - Stack ...

python lxml easy-install
Rectangle 27 2

How to get content of page dynamically modified within browser?

Here is snippet from the page:

<form id="vCSS_mainform" method="post" name="MainForm" action="/ProductDetails.asp?ProductCode=MCFFGB" onsubmit="javascript:return QtyEnabledAddToCart_SuppressFormIE();">
      <img src="/v/vspfiles/templates/MAKO/images/clear1x1.gif" width="5" height="5" alt="" /><br />
      <table width="100%" cellpadding="0" cellspacing="0" border="0" id="v65-product-parent">
        <tr>
          <td colspan="2" class="vCSS_breadcrumb_td"><b>
&nbsp; 
<a href="http://www.makospearguns.com/">Home</a> >

The element with id "v65-product-parent" is of type table and has subelement tr.

There can be only one element with such id (otherwise it would be broken xml).

The xpath is expecting tbody as child of given element (table) and there is none in whole page.

>>> "tbody" in page.text
False

You can also download the page:

$ wget http://www.makospearguns.com/product-p/mcffgb.htm

and review its content; it does not contain a single element named tbody.

This often happens when JavaScript comes into play and generates some page content in the browser. But as LegoStormtroopr noted, that is not the case here; this time it is the browser itself that modifies the document to make it correct.

You have to give some sort of browser a chance. E.g. if you use selenium, you would get it.

from selenium import webdriver
from lxml import html

url = "http://www.makospearguns.com/product-p/mcffgb.htm"
xpath = '//*[@id="v65-product-parent"]/tbody/tr[2]/td[2]/table[1]/tbody/tr/td/table/tbody/tr[2]/td[2]/table/tbody/tr[1]/td[1]/div/table/tbody/tr/td/font/div/b/span/text()'

browser = webdriver.Firefox()
browser.get(url)
# page_source is the document as the browser sees it, tbody included
html_source = browser.page_source
browser.quit()  # close the browser once the source is captured
print("test tbody", "tbody" in html_source)

tree = html.fromstring(html_source)
text = tree.xpath(xpath)
print(text)

$ python byselenimum.py
test tbody True
['$149.95']

Selenium is great when it comes to changes made within the browser. However, it is a rather heavy tool, and if you can do it a simpler way, do it that way. LegoStormtroopr has proposed such a simpler solution that works on the plainly fetched web page.
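
The effect can be reproduced without a browser. The sketch below uses a tiny made-up table (not the real page) and assumes lxml is installed: lxml parses the markup as served, so an XPath containing tbody finds nothing, while the same path without tbody matches.

```python
from lxml import html

# Made-up markup that mimics the served page: a table with no tbody element
RAW_HTML = """
<html><body>
<table id="v65-product-parent">
  <tr><td>header</td></tr>
  <tr><td>label</td><td><span>$149.95</span></td></tr>
</table>
</body></html>
"""

tree = html.fromstring(RAW_HTML)

# XPath as copied from a browser, which inserts tbody into its DOM
browser_style = tree.xpath(
    '//*[@id="v65-product-parent"]/tbody/tr[2]/td[2]/span/text()'
)

# Same path with the tbody step removed, matching the markup as served
raw_style = tree.xpath('//*[@id="v65-product-parent"]/tr[2]/td[2]/span/text()')

print(browser_style)
print(raw_style)
```

Dropping the tbody steps from a browser-copied XPath is usually the first thing to try before reaching for a full browser.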

I just now went to the page and inspected it. When I right click on the span with the price and select "Copy XPath", this is exactly what it gives me. And when I plug that copied xpath into firepath, it shows me the correct part of the page. If the path is simply wrong, then why did that work?

-1 because this "It gets generated dynamically by JavaScript after it is loaded into browser" is wrong.

@JanVlcinsky Check my answer. The page is altered by the browser to massage it into the DOM, before any Javascript is called.

xml - Why does this xpath fail using lxml in python? - Stack Overflow

python xml xpath lxml
Rectangle 27 16

Parameters in a URL (e.g. key=listOfUsers/user1) are GET parameters and you shouldn't be using them for POST requests. A quick explanation of the difference between GET and POST can be found here.

In your case, to make use of REST principles, you should probably have:

http://ip:5000/users
http://ip:5000/users/<user_id>

Then, on each URL, you can define the behaviour of different HTTP methods (GET, POST, PUT, DELETE). For example, on /users/<user_id>, you want the following:

GET /users/<user_id> - return the information for <user_id>
POST /users/<user_id> - modify/update the information for <user_id> by providing the data
PUT - I will omit this for now as it is similar enough to `POST` at this level of depth
DELETE /users/<user_id> - delete user with ID <user_id>

So, in your example, you want to do a POST to /users/user_1 with the POST data being "John". Then the XPath expression, or whatever other way you want to access your data, should be hidden from the user and not tightly coupled to the URL. This way, if you decide to change the way you store and access data, instead of all your URLs changing, you will simply have to change the code on the server side.

Now, the answer to your question: Below is a basic semi-pseudocode of how you can achieve what I mentioned above:

from flask import Flask, request

app = Flask(__name__)

@app.route('/users/<user_id>', methods=['GET', 'POST', 'DELETE'])
def user(user_id):
    if request.method == 'GET':
        # return the information for <user_id>
        ...
    elif request.method == 'POST':
        # modify/update the information for <user_id>
        # you can use <user_id>, which is a str but could be
        # changed to be int or whatever you want, along
        # with your lxml knowledge to make the required
        # changes
        data = request.form  # a multidict containing POST data
        ...
    elif request.method == 'DELETE':
        # delete user with ID <user_id>
        ...
    else:
        # Error 405 Method Not Allowed
        ...

There are a lot of other things to consider like the POST request content-type but I think what I've said so far should be a reasonable starting point. I know I haven't directly answered the exact question you were asking but I hope this helps you. I will make some edits/additions later as well.
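
To make the outline concrete, here is one possible filled-in version. It is only a sketch, assuming Flask is installed: the in-memory USERS dict, the "name" form field, and the JSON response shapes are all made up for illustration and stand in for whatever storage (for example an lxml tree) sits behind the URL.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical in-memory store standing in for the real data source
USERS = {"user_1": "Jane"}


@app.route("/users/<user_id>", methods=["GET", "POST", "DELETE"])
def user(user_id):
    if request.method == "GET":
        # Return the information for <user_id>
        if user_id not in USERS:
            return jsonify(error="not found"), 404
        return jsonify(id=user_id, name=USERS[user_id])
    if request.method == "POST":
        # Update the user from the form data ("name" is an assumed field)
        USERS[user_id] = request.form["name"]
        return jsonify(id=user_id, name=USERS[user_id])
    # Only DELETE remains; Flask rejects other methods with 405 by itself
    USERS.pop(user_id, None)
    return "", 204


# Exercise the routes with Flask's built-in test client (no server needed)
client = app.test_client()
updated = client.post("/users/user_1", data={"name": "John"})
fetched = client.get("/users/user_1")
print(fetched.get_json())
```

Because the route declares its allowed methods, Flask answers anything else with 405 Method Not Allowed before the function is called, so the handler needs no explicit error branch for that.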

Thanks and I hope this is helpful. Please do let me know if I have gotten something wrong.

Do you have to do something special for the POST to get routed back correctly? I have /competitions/<int: id> set up, but when the POST occurs, it posts to /competitions instead, so my POST handling logic is never reached.

python - Flask example with POST - Stack Overflow

python rest flask lxml
Rectangle 27 11

I had exactly this problem. Turned out to be a memory problem - I was installing reporter.py, which depends on lxml, on a server with only 500MB RAM, of which only 150MB was free. I killed off a few things to get up to ~300MB free, and just managed to squeeze out the installation of lxml. (Watching TOP showed available memory going down to 4MB at one point!)

I had the same problem and my solution was based on yours. I'm using Vagrant and my VM had only 512 MB of RAM, so I raised the limit to 1 GB. The installation then completed without error.

I solved this by adding a 500 MB swap file, because there was no way to increase the RAM.

Same for me. Vagrant default box. Bumping the memory up to 1024 MB fixes this hellish error:

config.vm.provider "virtualbox" do |vb|
  vb.memory = 1024
end

python - can't installing lxml on Ubuntu 12.04 - Stack Overflow

python ubuntu-12.04 lxml python-import
Rectangle 27 3

import gzip
from lxml import etree

# f is the path to the gzipped XML file, as in the question
name = None
level = 0
for event, element in etree.iterparse(gzip.GzipFile(f), events=('end', 'start'), tag='label'):
    # Update current level
    if event == 'start':
        level += 1
    elif event == 'end':
        level -= 1
    # Get name for top level label
    if level == 0:
        name = element.xpath('name/text()')

As an alternate solution, parse the whole file and use xpath to get the top label name:

import gzip
from lxml import html

# f is the path to the gzipped file; use a separate name for the file object
with gzip.open(f, 'rb') as gz:
    file_content = gz.read()
    tree = html.fromstring(file_content)
    name = tree.xpath('//label/name/text()')

The file is huge. Parsing the whole thing at once is not an option.
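
For a file too large to load at once, the depth-tracking idea from the first snippet can be tried on a small sample. The sketch below swaps in the standard library's xml.etree.ElementTree instead of lxml (its iterparse has no tag argument, so the tag is filtered by hand); the sample document and the function name are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Made-up sample with a nested label inside a top-level one
SAMPLE = b"""<root>
  <label><name>outer</name>
    <label><name>inner</name></label>
  </label>
  <label><name>second</name></label>
</root>"""


def top_level_label_names(source):
    """Stream through source and collect the name of each outermost label."""
    names = []
    depth = 0
    for event, element in ET.iterparse(source, events=("start", "end")):
        if element.tag != "label":
            continue
        if event == "start":
            depth += 1
        else:
            depth -= 1
            if depth == 0:
                # Back at the outermost label: read its direct name child
                names.append(element.findtext("name"))
                # Free the finished subtree to keep memory use low
                element.clear()
    return names


names = top_level_label_names(io.BytesIO(SAMPLE))
print(names)
```

For a gzipped file, the BytesIO object can be replaced by the file object returned by gzip.open(path, 'rb').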

python - lxml eTree iterparse depth - Stack Overflow

python lxml
Rectangle 27 7

import xml.etree.ElementTree as et
import csv

xmltext = """
<dicts>
    <key>1375</key>
    <dict>
        <key>Key 1</key><integer>1375</integer>
        <key>Key 2</key><string>Some String</string>
        <key>Key 3</key><string>Another string</string>
        <key>Key 4</key><string>Yet another string</string>
        <key>Key 5</key><string>Strings anyone?</string>
    </dict>
</dicts>
"""

tree = et.fromstring(xmltext)

# newline='' lets the csv module control line endings (Python 3)
with open('output.txt', 'w', newline='') as f:
    writer = csv.writer(f, quoting=csv.QUOTE_NONNUMERIC)

    # iterate over the dict elements
    for dict_el in tree.iterfind('dict'):
        data = []
        # get the text contents of each non-key element
        for el in dict_el:
            if el.tag == 'string':
                data.append(el.text)
            # if it's an integer element convert to int so csv won't quote it
            elif el.tag == 'integer':
                data.append(int(el.text))
        writer.writerow(data)

Thanks for posting so soon. The problem is, I cannot get lxml to run on my machine. I have python 2.7 and have made several attempts to get that module installed, but have failed. I was hoping there was another way that doesn't involve lxml.

I'm running Ubuntu Maverick Meerkat Netbook edition...

How are you trying to install it? Have you tried installing it with pip?

Python XML Parsing - Stack Overflow

python xml parsing lxml
Rectangle 27 5

Since it seems that /usr/include/libxml2 is being included, I think the most probable reason is that you don't have libxml2 installed on your system. This is most likely due to missing "command line tools". Get them here: https://developer.apple.com/downloads/index.action?=command%20line%20tools

This can also be solved by installing libxml2 via MacPorts or Homebrew (but do this only as a last resort). Using system libraries instead of Homebrew or MacPorts whenever possible can save you from a lot of incompatibility pitfalls.

Hi, I reinstalled libxml2 via Homebrew. Maybe the upgrade to Mavericks 10.9.1 overwrote the prior install. Will get back here.

I did "brew install libxml2" which was successful. But "pip install --upgrade lxml" still fails with same error

Have you checked that /usr/include/libxml2 exists? I bet it doesn't, in which case my recommendation is to install the command line tools from Apple. Preferring system libs when both system and Homebrew are available will almost always save you some headache. PS. The reason why Homebrew's libxml2 didn't solve it is that /usr/local/include probably isn't included by pip while building. Before trying to hack that, I'd really suggest checking /usr/include and installing the command line tools - that will probably solve it all :)

You are correct - the /usr/include/libxml2 does not exist. I just upgraded to 10.9.1 - did that wipe out the command line tools??

Yep. The "command line tools" are OS-specific and need to be re-installed each time you upgrade to a new "major version" of OSX.
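
The check suggested above can also be scripted. This is a small sketch using only the standard library; the function name is made up, and looking for libxml/xmlversion.h is just one reasonable way to confirm that the headers are really there.

```python
import os


def libxml2_headers_present(include_dir="/usr/include/libxml2"):
    """Return True if the libxml2 headers appear to be installed."""
    # xmlversion.h ships with the libxml2 development headers
    return os.path.isfile(os.path.join(include_dir, "libxml", "xmlversion.h"))


print("libxml2 headers found:", libxml2_headers_present())
```

If this prints False on OS X after a system upgrade, reinstalling the command line tools is the first thing to try.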

Error installing python module lxml on osx mavericks - Stack Overflow

python osx lxml osx-mavericks
Rectangle 27 3

There is no silver bullet. Different HTML parsers behave differently and you should pick the one that works for your particular page. "Works" in this case basically means that you can get to your desired data.

The lxml parser is generally faster; html5lib is the most lenient. This kind of difference is relevant if you have broken or non-well-formed HTML to parse. html.parser is built in and helps avoid extra dependencies, if that is a concern. Here is a related table that highlights the differences.

So to be sure to get all the links, I must use several methods, several parsers?

@Anonymus nope, usually you just pick a parser and stick to it. But, I can imagine a page being non well-formed and parsing it with different parsers might get a bigger picture than with a single one. Though, I haven't been in that situation ever yet. Thanks.
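
To see how the parsers treat the same markup, one option is to run each installed parser over the page and compare the links found. The following is a sketch only, assuming beautifulsoup4 is installed; the broken sample markup and the helper name are made up, and parsers that are not installed are skipped.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Made-up, deliberately sloppy markup: unclosed <li> tags and a stray </p>
BROKEN_HTML = (
    '<ul><li><a href="/one">one</a>'
    '<li><a href="/two">two</a></p></ul>'
)


def links_by_parser(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Map each available parser name to the hrefs it finds in markup."""
    results = {}
    for parser in parsers:
        try:
            soup = BeautifulSoup(markup, parser)
        except FeatureNotFound:
            # The parser library is not installed; skip it
            continue
        results[parser] = [a.get("href") for a in soup.find_all("a")]
    return results


results = links_by_parser(BROKEN_HTML)
for parser, hrefs in results.items():
    print(parser, hrefs)
```

If every parser reports the same links for your page, any of them will do, and speed or dependencies can decide.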

python beautifulsoup : lxml html.parser - Stack Overflow

python beautifulsoup lxml html-parser
Rectangle 27 2

It looks like lxml wants to build an extension that requires access to a C compiler. You will need gcc for that. Try running sudo apt-get install build-essential and that should fix this particular issue.

Running sudo apt-get install gcc gives: sudo: apt-get: command not found

@John The better command for Debian/Ubuntu is sudo apt-get install build-essential because it includes tools like make and a few other friends that are usually used in concert with gcc/g++.

Ah. OSX doesn't install the gcc compiler. Get Homebrew (github.com/mxcl/homebrew) or its less-intelligent cousin, ports, and then install gcc through them instead. Most of your pain is happening because there isn't an official, sensible packager on OS X. Sorry. =/

python - Installing easy_install... to get to installing lxml - Stack ...

python lxml easy-install