
Searching for expected content by means of lxml

This assumes you already have the content of the page containing the data you need. The code below fetches it with an HTTP request; if the page has to be rendered in a browser, see the later part of my answer for how to get it.

If you want all values of the data-hiringurl attribute, try the XPath //@data-hiringurl:

from lxml import html
import requests

url = "http://wearemadeinny.com/find-a-job/"

# Fetch the raw HTML and parse it into an element tree.
page = requests.get(url)
tree = html.fromstring(page.text)

# Select the value of every data-hiringurl attribute in the document.
xp = "//@data-hiringurl"
job_urls = tree.xpath(xp)

print(job_urls)

However, I am not sure whether the URL you provided contains such data. I did not find it there.
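To check the XPath itself independently of the site, here is a minimal sketch that runs it against a made-up HTML fragment. The markup and the example.com/example.org URLs are assumptions for illustration, not what the real page contains. It also shows one way to skip attributes that are present but empty.

```python
from lxml import html

# Made-up fragment standing in for the real page (an assumption, not the site's markup).
SAMPLE = """
<ul id="companies">
  <li class="company" data-hiringurl="http://example.com/jobs">Acme</li>
  <li class="company" data-hiringurl="">Initech</li>
  <li class="company" data-hiringurl="http://example.org/careers">Globex</li>
  <li class="company">No attribute here</li>
</ul>
"""

tree = html.fromstring(SAMPLE)

# Every value of the attribute, empty ones included.
all_urls = tree.xpath("//@data-hiringurl")

# Only the non-empty values: the predicate filters the attribute nodes themselves.
filled_urls = tree.xpath("//@data-hiringurl[. != '']")

print(all_urls)
print(filled_urls)
```

If the same XPath returns an empty list on the real page, the attribute is most likely not in the HTML that requests received.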

If the content you are interested in is rendered dynamically on the client, you need to provide a browser context and let the page render there. Selenium can do that:

>>> from selenium import webdriver
>>> browser = webdriver.Firefox()
>>> url = "http://wearemadeinny.com/find-a-job/"
>>> browser.get(url)
>>> page = browser.page_source
>>> print(page)

Now the page variable holds the rendered content of the page, and you can proceed with lxml as described above.
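Putting the two steps together might look like the sketch below. The helper names fetch_rendered_source and extract_hiring_urls are hypothetical, introduced only for this illustration, and the Selenium part assumes Firefox and its driver are installed.

```python
from lxml import html


def fetch_rendered_source(url):
    """Load url in Firefox via Selenium and return the rendered HTML."""
    # Imported here so the parsing helper below works without Selenium installed.
    from selenium import webdriver

    browser = webdriver.Firefox()
    try:
        browser.get(url)
        return browser.page_source
    finally:
        browser.quit()  # always close the browser, even if loading fails


def extract_hiring_urls(page_source):
    """Return the non-empty data-hiringurl values found in page_source."""
    tree = html.fromstring(page_source)
    return [str(value) for value in tree.xpath("//@data-hiringurl") if value.strip()]


# Typical use (needs Firefox and its driver installed):
# urls = extract_hiring_urls(fetch_rendered_source("http://wearemadeinny.com/find-a-job/"))
```

Note that page_source reflects the DOM at the moment it is read, so content that the page loads later may still be missing unless you wait for it first.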

Note: I do not guarantee that you will get the expected content in page; I only know that it arrives in rendered form. If you need to go further by clicking elements on the page, filling in text, or pressing buttons, all of that can be done with the browser instance shown above. Just read the Selenium docs.

This returns an empty list. The data-hiringurl attribute can be found if you open Chrome DevTools and inspect one of the companies marked "we're hiring" in the right rail. Edit: all of the companies have the data-hiringurl attribute, though some are empty.

@Barnaby If these data-hiringurl attributes are filled in by JavaScript, then requests will not deliver them to lxml. You should look at mechanize or Selenium drivers.

I assume that this is the case. I will investigate mechanize or selenium.

