Using PHP SimpleXML to get XML Namespace Elements

by Sherif

PHP has a great SimpleXML library that converts XML to an object that can be processed with normal property selectors and array iterators. I’ve been using this quite a bit lately to process some XML documents.

The library documentation isn’t that great when it comes to processing namespaced elements within your XML document. One such use case is parsing an RSS feed that contains elements from other XML namespaces.

Consider the following example:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
....
  <item>
    <title>My Title</title>
    <description>My Item</description>
    <dc:publisher>ABC</dc:publisher>
    <dc:creator>DEF</dc:creator>
    <dc:date>2009-02-12T16:53:25Z</dc:date>
  </item>
  ...
</channel>
</rss>

To access elements like title and description, it’s as simple as:

$feed = file_get_contents("http://linkto.my.feed");
$xml = new SimpleXMLElement($feed);
foreach ($xml->channel->item as $entry){
  echo $entry->title;
  echo $entry->description;
}
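
The snippet above assumes that both the download and the parse succeed. Here is a minimal sketch of a more defensive parse, using an inline sample string instead of a real feed URL. The helper name loadItemTitles and the sample data are made up for illustration and are not part of any library:

```php
<?php
// Hypothetical helper (illustration only): parse a feed string and return
// the item titles, or null when the XML is malformed.
function loadItemTitles(string $feed): ?array
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($feed);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        return null;
    }

    $titles = [];
    foreach ($xml->channel->item as $entry) {
        // Cast to string: $entry->title is a SimpleXMLElement, not plain text.
        $titles[] = (string) $entry->title;
    }
    return $titles;
}

$sample = '<rss version="2.0"><channel><item><title>My Title</title></item></channel></rss>';
print_r(loadItemTitles($sample));
var_dump(loadItemTitles('<rss><channel>'));
```

The first call returns the single title from the sample. The second call gets truncated XML, so simplexml_load_string returns false and the helper returns null.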

But what if I want to access my namespace elements such as dc:publisher or dc:creator? You would think it ‘could’ be as simple as this:

//This doesn't work
...
foreach ($xml->channel->item as $entry){
  echo $entry->publisher;
  echo $entry->creator;
  ...
}

The code above doesn’t work because the publisher and creator elements belong to the dc namespace, and plain property access does not look at elements with a namespace prefix. So how do we do this? If you recall, the second line of our feed had this:

.... xmlns:dc="http://purl.org/dc/elements/1.1/">

So we know from above that the dc prefix refers to this URI: http://purl.org/dc/elements/1.1/ (the trailing slash is part of the URI and must match exactly). Now that we know this, we can pass it to the children method:

$feed = file_get_contents("http://linkto.my.feed");
$xml = new SimpleXMLElement($feed);
foreach ($xml->channel->item as $entry){
  echo $entry->title;
  echo $entry->description;
  //Use that namespace
  $dc = $entry->children('http://purl.org/dc/elements/1.1/');
  echo $dc->publisher;
  echo $dc->creator;
}
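
To try this without a live feed, here is a minimal self-contained sketch that uses a trimmed inline copy of the sample feed. The variable names are only illustrative. It also shows what plain property access gives you for a prefixed element:

```php
<?php
// Inline copy of the sample feed so the sketch runs without a network call.
$feed = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
  <item>
    <title>My Title</title>
    <dc:publisher>ABC</dc:publisher>
    <dc:creator>DEF</dc:creator>
  </item>
</channel>
</rss>
XML;

$xml = new SimpleXMLElement($feed);
$entry = $xml->channel->item[0];

// Plain property access does not see prefixed elements such as dc:publisher,
// so this lookup matches nothing.
$plainCount = count($entry->publisher);

// children() with the namespace URI exposes the dc: elements.
$dc = $entry->children('http://purl.org/dc/elements/1.1/');
$publisher = (string) $dc->publisher;
$creator = (string) $dc->creator;

echo $plainCount, ' ', $publisher, ' ', $creator, PHP_EOL; // prints "0 ABC DEF"
```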

Hard-coding the URI works. A cleaner way is to read the namespace URI from the document itself using the getNamespaces method:

...
foreach ($xml->channel->item as $entry){
  ...
  //Use that namespace
  $namespaces = $entry->getNamespaces(true);
  //Now we don't have the URL hard-coded
  $dc = $entry->children($namespaces['dc']);
  echo $dc->publisher;
  echo $dc->creator;
}
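
One caveat: getNamespaces(true) only reports namespaces actually used by the element and its descendants, so an item with no dc elements has no 'dc' key. Here is a minimal sketch of a guarded lookup. The helper name dcFields and the two-item sample feed are made up for illustration:

```php
<?php
// Hypothetical helper (illustration only): return the dc: fields of one item
// as name => text pairs, or an empty array when the item uses no "dc" prefix.
function dcFields(SimpleXMLElement $entry): array
{
    $namespaces = $entry->getNamespaces(true);
    if (!isset($namespaces['dc'])) {
        return [];
    }

    $fields = [];
    foreach ($entry->children($namespaces['dc']) as $name => $value) {
        $fields[$name] = (string) $value;
    }
    return $fields;
}

// Sample feed: the first item has a dc element, the second has none.
$xml = new SimpleXMLElement(
    '<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel>'
    . '<item><title>A</title><dc:creator>DEF</dc:creator></item>'
    . '<item><title>B</title></item>'
    . '</channel></rss>'
);

$results = [];
foreach ($xml->channel->item as $entry) {
    $results[] = dcFields($entry);
}
print_r($results);
```

The children method also accepts a prefix instead of a URI when its second argument is true, as in $entry->children('dc', true). This is shorter, but it depends on the feed using that exact prefix.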

That’s it! I found this useful when getting an RSS feed using SimpleXML and wanting to parse the XML Namespace elements.