Using PHP SimpleXML to get XML Namespace Elements
by Sherif
PHP has a great SimpleXML library that converts XML to an object that can be processed with normal property selectors and array iterators. I’ve been using this quite a bit lately to process some XML documents.
The library documentation isn’t that great when it comes to processing namespaced elements within your XML document. An example of such a use case is parsing an RSS feed that contains elements from other XML namespaces.
Consider the following example:
    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <channel>
        ....
        <item>
          <title>My Title</title>
          <description>My Item</description>
          <dc:publisher>ABC</dc:publisher>
          <dc:creator>DEF</dc:creator>
          <dc:date>2009-02-12T16:53:25Z</dc:date>
        </item>
        ...
      </channel>
    </rss>
For me to access things like the title and description elements, it’s as simple as:
    $feed = file_get_contents("http://linkto.my.feed");
    $xml = new SimpleXMLElement($feed);
    foreach ($xml->channel->item as $entry){
        echo $entry->title;
        echo $entry->description;
    }
But what if I want to access my namespace elements such as dc:publisher or dc:creator? You would think it ‘could’ be as simple as this:
    //This doesn't work
    ...
    foreach ($xml->channel->item as $entry){
        echo $entry->publisher;
        echo $entry->creator;
        ...
    }
The code above doesn’t work because the publisher and creator elements sit in a different namespace from the rest of the item, and plain property access only sees elements that have no namespace. So how do we do this? If you recall, the second line of our feed had this:
    .... xmlns:dc="http://purl.org/dc/elements/1.1/">
So we know from above that the dc prefix refers to this namespace URI: http://purl.org/dc/elements/1.1/ (the trailing slash is part of it). Now that we know this, we can easily do this:
    $feed = file_get_contents("http://linkto.my.feed");
    $xml = new SimpleXMLElement($feed);
    foreach ($xml->channel->item as $entry){
        echo $entry->title;
        echo $entry->description;
        //Use that namespace
        $dc = $entry->children('http://purl.org/dc/elements/1.1/');
        echo $dc->publisher;
        echo $dc->creator;
    }
That would work. A cleaner way is to read the namespace URI from the document itself using the getNamespaces method:
    ...
    foreach ($xml->channel->item as $entry){
        ...
        //Use that namespace
        $namespaces = $entry->getNamespaces(true);
        //Now we don't have the URL hard-coded
        $dc = $entry->children($namespaces['dc']);
        echo $dc->publisher;
        echo $dc->creator;
    }
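To try this without a live feed, here is a minimal self-contained sketch. It assumes an inline copy of the sample feed instead of the remote URL, and the fallback to the hard-coded Dublin Core URI when the dc prefix is absent is an addition for illustration, not part of the original snippets.

```php
<?php
// Inline stand-in for the feed so the sketch runs without a network call.
$feed = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <title>My Title</title>
      <description>My Item</description>
      <dc:publisher>ABC</dc:publisher>
      <dc:creator>DEF</dc:creator>
    </item>
  </channel>
</rss>
XML;

$xml = new SimpleXMLElement($feed);
$entry = $xml->channel->item[0];

// Plain property access only sees elements without a namespace,
// so this yields an empty string rather than the publisher.
$plain = (string) $entry->publisher;

// Look the URI up by prefix; fall back to the known Dublin Core URI
// if the document does not use the 'dc' prefix (assumed fallback).
$namespaces = $xml->getNamespaces(true);
$dcUri = $namespaces['dc'] ?? 'http://purl.org/dc/elements/1.1/';

// children() with a namespace URI exposes the elements in that namespace.
$dc = $entry->children($dcUri);
$publisher = (string) $dc->publisher;
$creator = (string) $dc->creator;

echo $publisher, "\n", $creator, "\n";
```

Casting to string before using the values avoids passing SimpleXMLElement objects around by accident.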
That’s it! I found this useful when getting an RSS feed using SimpleXML and wanting to parse the XML Namespace elements.
[...] AHA, I found the site: http://blog.sherifmansour.com/?p=302 [...]
VERY helpful. Thank you!
hi,
Thanks a lot. You have saved me a lot of time. Thanks again.
I couldn’t give anything but the clicks on the Google ads.
Thanks for this! I spent yesterday afternoon unsuccessfully dealing with an item’s title because it was namespaced.
Thaaaank you !
I spent all last day looking for that solution, and it’s gonna save my next two days (at least) !
I love you !
Thanks for this. Very good and simple example.
Cheers for this – very helpful indeed.
Very helpful. Learning about namespaces in XML is totally cool.
I am a very experienced mainframe programmer but new to PHP. Great stuff.
thx swampy
Didn’t work for me, but this did: http://www.leftontheweb.com/message/A_small_SimpleXML_gotcha_with_namespaces
Awesome – this saved my day –
Thanks; your examples helped me a lot.
Thanks! I was scouring the web to figure out how to do XML namespaces with SimpleXML and this post solved it for me!
I just want to thank you for this tip. It seems in just about any programing language, xml has been just hard for me. Between SimpleXML and this tip, I feel like I am getting a handle on things finally.
Very cool and most appreciated!
Great info! I was worried I was going to spend all day figuring this out. Thanks!
My output:
Request Processed
Array
(
[] => urn:melbourneit-com:xml:spin:genericapi-1.0
[bulk] => urn:melbourneit-com:xml:spin:bulk-1.0
[xsi] => http://www.w3.org/2001/XMLSchema-instance
[domain] => urn:melbourneit-com:xml:spin:domain-1.0
[contact] => urn:melbourneit-com:xml:common:contact-1.0
)
we reached here
1 =
————
The XML file spinResponse-listDomainContactData.xml
Request Processed
bulkSearch
2006-05-04T18:13:51.0Z
name0
namespace0
0
ns0
ns1
2006-05-04T18:13:51.0Z
CUSTOMER_LOCK
false
y
0
0
0
0
Very good! It’s a helpful article. It helped me a lot with an integration problem with the feeds from YouTube! I could not pull any of the nodes containing a namespace. Thanks.
Can anybody tell me what I should do when I have an XML document like this one? (dc starts at the very beginning of the tags)
….
My Title
My Item
ABC
DEF
2009-02-12T16:53:25Z
…
Very useful, thanks!
That’s ace. Thank you!
This was exactly what I needed, thanks!
Hello
I like your blog. It is interesting to read.
I am curious why I didn’t know about this blog before.
I will try to share. Some of my friends will appreciate this.
Thanks and keep’em coming!
I would like to thank you for this too. It helped me very much.
Best regards, Sabine
Thank you so much. Why the hell doesn’t my book mention this… bah!
[...] I never knew how to extract those from the RSS using SimpleXML, a revelation however! See – http://blog.sherifmansour.com/?p=302 [...]
Excellent, to-the-point post. Did you add comments to the PHP manual? If not, you should submit a link to this page. Brilliant, thanks.
This was a lifesaver–trying to deal with the google analytics api and just couldn’t figure out how to get an attribute value for the dxp:metric element. You rock!
Thanxxxxxxxxxxxxxxxxxx!!!!!!!!!
I was just out of my mind parsing an atom feed, but couldn’t get the data with namespace.
Thanks for this article. I just faced this issue and this gave me the right answer.
Nevertheless I don’t completely agree with your ‘clean’ way.
This way you expect that your input document will use ‘dc’ as the namespace prefix. This is not mandatory at all: the XML document producer might decide to use another prefix, and the document will be exactly the same from an XML point of view.
The thing that cannot change is the namespace URI. So in my opinion there is absolutely no problem with hard-coding the namespace URI, as this ensures the received document conforms to what is expected.
Br,
Eric Bourlon
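The point about prefixes can be illustrated with a small sketch. The two item fragments below are hypothetical (the ‘dublin’ prefix is made up for the example); both bind the same Dublin Core URI, and only the URI-based lookup finds the element in both.

```php
<?php
// Two hypothetical fragments: same namespace URI, different prefixes.
$docs = [
    '<item xmlns:dc="http://purl.org/dc/elements/1.1/"><dc:creator>DEF</dc:creator></item>',
    '<item xmlns:dublin="http://purl.org/dc/elements/1.1/"><dublin:creator>DEF</dublin:creator></item>',
];

$dcUri = 'http://purl.org/dc/elements/1.1/';
$byUri = [];
$byPrefix = [];

foreach ($docs as $doc) {
    $item = new SimpleXMLElement($doc);

    // Matching on the URI works whatever prefix the producer chose.
    $byUri[] = (string) $item->children($dcUri)->creator;

    // Matching on the 'dc' prefix only works when the producer used 'dc'.
    $byPrefix[] = (string) $item->children('dc', true)->creator;
}

print_r($byUri);
print_r($byPrefix);
```

In this sketch the prefix-based lookup comes back empty for the second fragment, which is the failure mode described above.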
Working with google maps… u saved life
Is there any way this can be customised to allow for multiple blog feeds?
Very useful. Thanks!
Hello. Thank you very much. Helped me A LOT
Thank you, this helped me a lot
Thanks! Helped very much!!
If you are struggling with xml namespaces, there is a great tutorial on xpath namespaces at xml reports. It walks you through it in very simple steps
xml reports http://www.xml-reports.com/2011/05/xml-namespaces-for-dummies-part-1.html
If you are trying to pull an image or thumbnail out of a namespace where it’s embedded in an attribute, try this. Notice how there is no closing tag. What I want to do is pull the URL so I can use that to display an image.
SAMPLE:
…
CODE:
<?php
foreach ($xml->channel->item as $entry){
    …
    //Use that namespace
    $namespaces = $entry->getNamespaces(true);
    //Now we don't have the URL hard-coded
    $media = $entry->children($namespaces['media']);
    // Must call attributes() with this format. Easy peasy.
    echo $media->thumbnail->attributes()->url;
}
?>
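Since the sample markup in this comment did not survive, here is a self-contained sketch of the same idea. The item, the thumbnail URL, and the width and height values are placeholders invented for illustration; the namespace URI is the one Media RSS uses.

```php
<?php
// Hypothetical item; the attribute values are placeholders.
$item = new SimpleXMLElement(
    '<item xmlns:media="http://search.yahoo.com/mrss/">'
    . '<title>My Title</title>'
    . '<media:thumbnail url="http://example.com/thumb.jpg" width="75" height="50"/>'
    . '</item>'
);

// Resolve the URI bound to the 'media' prefix, then select that namespace.
$namespaces = $item->getNamespaces(true);
$media = $item->children($namespaces['media']);

// The url attribute itself carries no prefix, so attributes() is called
// without arguments to get the un-namespaced attributes of the element.
$attrs = $media->thumbnail->attributes();
$url = (string) $attrs->url;
$width = (int) $attrs->width;

echo $url, "\n";
```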
How can I retrieve a list of Dublin Core records from a URL like: http://memory.loc.gov/cgi-bin/oai2_0?verb=ListRecords&metadataPrefix=oai_dc&set=mussm
If someone could help, I’d appreciate it.
Thanks in advance,
Flávio
Thanks for this great example. Also, thanks to Richie for the tips on how to pull an image out – helped me loads!
nice, just what the doctor ordered, or in this instance.. the client…
thanks Sherif!
/configs/application.xml
How do I parse this?
[config][php:const php:name="APPLICATION_PATH" /] /configs/application.xml [/config]
How do I parse this?
Many thanks for this! Very helpful.
It’s about time someone explain this!!! Thanks
Dude! thanks so much!
Big thanks, you saved me a lot of time and headache.
Hi Sherif,
This is the first example that actually works – thanks.
Question, though: I can get all the XML data except for the parameters/values on the namespace tags, for example:
How do I get the ‘id’, ‘available’ and ‘archived’ attributes of that XML tag?
Thanks!
It’s really a cool and helpful piece of information. I’m glad that you shared this helpful information with us. Please keep us informed like this. Thanks for sharing.
It’s simple and helpful; I needn’t look through the documentation now.
If you know the prefix ($namespaces['dc']?) and the child’s name (publisher?) all you need is:
$entry->children('dc', true)->publisher
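A minimal sketch of that prefix form, assuming a cut-down item based on the article’s sample feed. Passing true as the second argument tells children() to treat the first argument as a prefix rather than a namespace URI.

```php
<?php
// Cut-down item modelled on the article's sample feed.
$item = new SimpleXMLElement(
    '<item xmlns:dc="http://purl.org/dc/elements/1.1/">'
    . '<title>My Title</title>'
    . '<dc:publisher>ABC</dc:publisher>'
    . '<dc:creator>DEF</dc:creator>'
    . '</item>'
);

// Second argument true: 'dc' is a prefix, not a namespace URI.
$publisher = (string) $item->children('dc', true)->publisher;

// The same call can be iterated to collect every dc: child of the item.
$dcFields = [];
foreach ($item->children('dc', true) as $child) {
    // getName() returns the local name without the prefix.
    $dcFields[$child->getName()] = (string) $child;
}

echo $publisher, "\n";
print_r($dcFields);
```

This is shorter than the getNamespaces lookup, but it shares the same dependency on the producer using the dc prefix.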