Say you have HTML similar to the following:

<div style="background-image: url('https://some.domain/image')"></div>

and you want to extract https://some.domain/image using XPath. With XPath 2.0, you can select the URL with something like

substring-before(substring-after(//div/@style, "background-image: url("), ")")

but when using XPath 1.0, this fails. I think it’s because nested functions aren’t supported in XPath 1.0, but I haven’t been able to find documentation to confirm that. Is there a way to accomplish this using XPath 1.0?

  • towerful@programming.dev
    1 month ago

    A quick Google search suggests the same approach you have.

    If the code you quoted is verbatim what you tried, it seems like you need to strip the parentheses and possibly a single or double quote, depending on the source CSS. The example source you gave uses single quotes.

    substring-before(substring-after(//div/@style, "background-image: url("), ")")

    Should be (notice the extra ' matching the quotes in url('...')):

    substring-before(substring-after(//div/@style, "background-image: url('"), "')")

    But I don’t think that would cause XPath to fail… It would just extract the wrong value.

    Edit:
    Further reading suggests XPath 1.0 does have limited functionality. But, like you, I can’t find anything concrete.
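
    For what it’s worth, the XPath 1.0 core function library does define substring-before and substring-after, and function calls can be nested, so an expression like substring-before(substring-after(//div/@style, "url('"), "')") should be expressible in 1.0. To sanity-check what that nested call ought to return, here is a minimal Python sketch. It does not run an XPath engine; the two helper functions are hypothetical names that only mimic the XPath 1.0 string semantics (empty string when the separator is missing), and it assumes the style attribute uses single quotes as in the example.

```python
def substring_after(s: str, sep: str) -> str:
    """Mimic XPath 1.0 substring-after: text after the first sep, or "" if absent."""
    idx = s.find(sep)
    return "" if idx == -1 else s[idx + len(sep):]


def substring_before(s: str, sep: str) -> str:
    """Mimic XPath 1.0 substring-before: text before the first sep, or "" if absent."""
    idx = s.find(sep)
    return "" if idx == -1 else s[:idx]


# String value of //div/@style, taken from the example HTML in the question.
style = "background-image: url('https://some.domain/image')"

# Same nesting as: substring-before(substring-after(//div/@style, "url('"), "')")
url = substring_before(substring_after(style, "url('"), "')")
print(url)  # prints https://some.domain/image
```

    Matching on just url(' rather than the full background-image: url(' prefix is a design choice here: it tolerates differences in whitespace after the colon, at the cost of also matching other url(...) values in the same style attribute.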

  • spartanatreyu@programming.dev
    1 month ago

    Asking just because I’m curious… why are you using xpath?

    Also, is this for a website you control or for someone else’s website?

    If you’re rendering the page (in a browser, e2e test-runner, spider bot, etc…), have you considered running some js on the page to get the image? Something like: const imagePath = document.getElementById('exampleIdOnElement').style.backgroundImage

    • Kalcifer@sh.itjust.worksOP
      1 month ago

      Asking just because I’m curious… why are you using xpath?

      I’m using a service called FreshRSS that automatically fetches RSS feeds. It has a feature that lets you create custom feeds for sites by scraping the HTML with user-specified XPath expressions.

      I know that this isn’t exactly “web development”, but it uses webdev tools, and I wasn’t entirely sure where else to post this.

      If you’re rendering the page (in a browser, e2e test-runner, spider bot, etc…), have you considered running some js on the page to get the image? Something like: const imagePath = document.getElementById('exampleIdOnElement').style.backgroundImage

      JS is, unfortunately, not possible here. I can only use XPath expressions.