Wednesday, May 02, 2007

Angry at the DOM

This is not a bug, it's a feature. This really frustrates me.

In short, the code example below demonstrates a huge inconsistency in a particularly poorly documented (in the areas you want to use, anyway) implementation of DOM Level 3.

$d = new DOMDocument();
$example = $d->createElementNS('','example');
$example->setAttributeNS('', 'bar:bar',"value");
$example->setAttributeNS('', 'monkey',"value");

What the heck am I angry about? You can...
  • $a->setAttribute('href', '');
  • $a->getAttribute('href');
  • $a->getAttributeNS('', 'href');
but only 50% of the time do you get to do
  • $a->setAttributeNS('', 'href', '');
Now, the reason for it makes sense. If the namespace URI is not mapped to an XML namespace declaration (xmlns), then how does it add that? Accordingly, you have to pass in a qualified name:

  • $a->setAttributeNS('', 'xhtml:href', '');
Up until this point, I am perfectly fine with how things are implemented, except that forgetting to put in a qualified name gives you a "Namespace Error" when it should be telling you that "No qualified xmlns was found". I can deal with that. What I can't deal with is this:
  1. $a->setAttributeNS('', 'xhtml:href', '');
  2. $a->setAttributeNS('', 'href', '');
What threw an exception moments ago no longer does, because doing the first line changed a state internally, and invisibly.
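
To make the state change visible, here is a minimal sketch. The XHTML namespace URI and the trySetAttributeNS() helper are my own assumptions for illustration (the URIs in the snippets above are blank), and whether the first unprefixed call throws may depend on your PHP and libxml versions, so the script only reports what happened.

```php
<?php
// Assumed namespace URI, for illustration only; the post's own URIs are not shown.
const XHTML_NS = 'http://www.w3.org/1999/xhtml';

// Hypothetical helper: attempt the call and report whether it succeeded.
function trySetAttributeNS(DOMElement $el, string $ns, string $qname, string $value): bool
{
    try {
        $el->setAttributeNS($ns, $qname, $value);
        return true;
    } catch (DOMException $e) {
        return false;
    }
}

$d = new DOMDocument();
$a = $d->createElement('a');
$d->appendChild($a);

// Unprefixed, before any prefix for XHTML_NS is in scope.
$before = trySetAttributeNS($a, XHTML_NS, 'href', '');

// Prefixed: this typically also declares xmlns:xhtml on the element as a side effect.
$prefixed = trySetAttributeNS($a, XHTML_NS, 'xhtml:href', '');

// Unprefixed again, now that a declaration for XHTML_NS is in scope.
$after = trySetAttributeNS($a, XHTML_NS, 'title', '');

var_dump($before, $prefixed, $after);
echo $d->saveXML($a), "\n";
```

If $before and $after differ on your installation, that difference is the hidden state I am complaining about.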

So to get it right all the time, you use the same qualified name prefix, right? 'xhtml:foo'.
  1. $a->setAttributeNS('', 'xhtml:href', '');
  2. $a->setAttributeNS('', 'xhtml:href', '');
But what if someone gives you a document and uses different qualifiers for the same urls?
  1. $a->setAttributeNS('', $a->lookupPrefix('') . ':href', '');
  2. $a->setAttributeNS('', $a->lookupPrefix('') . ':href', '');
But now, you are stuck, because if the namespace isn't declared, lookupPrefix gives you a null. You are suddenly right back at the start, and you get a heck of a lot of exceptions. This should be easy to do. Instead, you have to be careful about doing it, otherwise you end up with a pile of unusable code very quickly.
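
A small sketch of that trap, again using an assumed namespace URI: lookupPrefix() has nothing to return until a declaration is in scope, and one way to put a declaration in scope is to set an xmlns:* attribute through the reserved xmlns namespace.

```php
<?php
// Assumed namespace URI, for illustration only.
$ns = 'http://www.w3.org/1999/xhtml';

$d = new DOMDocument();
$a = $d->createElement('a');
$d->appendChild($a);

// No declaration for $ns is in scope yet, so there is no prefix to find.
$missing = $a->lookupPrefix($ns);

// Declare the prefix explicitly through the reserved xmlns namespace.
$a->setAttributeNS('http://www.w3.org/2000/xmlns/', 'xmlns:xhtml', $ns);
$found = $a->lookupPrefix($ns);

var_dump($missing, $found);
```

Concatenating $missing with ':href' would produce the qualified name ':href', which is exactly the kind of input that ends in an exception.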

So the best strategy to deal with this is:

determinePrefix($element, $namespace, $suggestedPrefix = null)
  • Should check for the existence of $namespace in the scope of $element
  • If a $namespace prefix is found, return that
  • If no $namespace prefix exists, register the namespace on the element
  • If a $suggestedPrefix is defined, use that for the new declaration

... which is clunky and verbose.
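
For what it's worth, the strategy above might be sketched like this. It is only one possible reading of the bullet points: the fallback prefix 'ns', the numeric suffix used to dodge collisions, and the LINK_NS constant are all my own assumptions, not anything DOMDocument provides.

```php
<?php
// Assumed namespace URI, for illustration only.
const LINK_NS = 'http://www.w3.org/1999/xhtml';

function determinePrefix(DOMElement $element, string $namespace, ?string $suggestedPrefix = null): string
{
    // If the namespace is already in scope with a prefix, reuse that prefix.
    $prefix = $element->lookupPrefix($namespace);
    if ($prefix !== null && $prefix !== '') {
        return $prefix;
    }

    // Otherwise start from the suggestion, or from an arbitrary fallback.
    $base = $suggestedPrefix ?? 'ns';
    $candidate = $base;
    $i = 0;

    // Skip any prefix that is already bound to a different namespace.
    while (($bound = $element->lookupNamespaceURI($candidate)) !== null && $bound !== $namespace) {
        $candidate = $base . (++$i);
    }

    // Register the namespace on the element under the chosen prefix.
    $element->setAttributeNS('http://www.w3.org/2000/xmlns/', 'xmlns:' . $candidate, $namespace);

    return $candidate;
}

$d = new DOMDocument();
$a = $d->createElement('a');
$d->appendChild($a);

$prefix = determinePrefix($a, LINK_NS, 'xhtml');
$a->setAttributeNS(LINK_NS, $prefix . ':href', '');
echo $d->saveXML($a), "\n";
```

Calling it before every setAttributeNS() keeps the calls consistent, at the price of exactly the verbosity I was grumbling about.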

Why don't the internals of DOMDocument either:
  • Refuse at any time to let you add something without a qualified prefix, and give you methods to add such a prefix, e.g. a registerNamespace() that adds xmlns:fish=""
  • Or, let you add in all of the unqualified prefixes you like, and give them anonymous names; and if you later append the node to another document or whatever, resolve them then?


Anonymous said...

Why do you care about the prefix? It's all about the namespace. Always use a prefix with the attribute and you should be fine. It seems the only issue you really have is:
"But what if someone gives you a document and uses different qualifiers for the same urls?"

The answer is that it makes absolutely no difference as long as the prefixes point to the same namespaceURI. As far as the XML is concerned, they mean the same thing. The difference is purely visual.

Doing this, the only time you should get an exception is when the prefix you are using with the attribute is already bound to a different namespace (it must already be declared directly on the element you are setting the attribute on).
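
A quick way to see the point about prefixes, as a sketch with a made-up namespace URI ('urn:example:links') bound to two different prefixes in the same document:

```php
<?php
// 'urn:example:links' is a made-up namespace URI bound to two different prefixes.
$xml = '<root xmlns:a="urn:example:links" xmlns:b="urn:example:links">'
     . '<item a:href="one"/><item b:href="two"/></root>';

$d = new DOMDocument();
$d->loadXML($xml);

$values = [];
foreach ($d->getElementsByTagName('item') as $item) {
    // Lookup is by namespace URI and local name; the prefix plays no part.
    $values[] = $item->getAttributeNS('urn:example:links', 'href');
}

print_r($values);
```

Both attributes are found by the same getAttributeNS() call even though one was written with a: and the other with b:.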

Dan said...

re/prefix - Good point, I wasn't thinking about that too much.

Most of this is because I want to KISS and DRY - why define xmlns:a and xmlns:b if I can just define xmlns:a?

Additionally, what happens if I pick a prefix that is already used for something else?

I pick xmlns:a for xlink; someone else picks xmlns:a for anchors; what's my code to do?

Admittedly, the chances of collision are low; but it's still a niggle.

Anonymous said...

No matter what you are doing with XML, when mixing documents from different sources and there are no documented standards for them, the *ONLY* way you can make sure the same prefix is used is to look up the prefix of a namespaceURI before EVERY call.

When nodes are inserted into the document, the subtree is then reconciled to check for namespace problems that might have occurred. This can result in prefixes being renamed and declared in different places to maintain the namespaceURI through the proper scope. This is one of the many reasons why you can't guarantee prefix naming.

Can you guess why most people hate namespaces? :)

NOTE: I did find a bug in the setAttributeNS function (will be fixed after the 5.2.2 release) in that the namespaces of the parent element and its subtree are not reconciled if needed, which can result in breaking the namespacing scope in certain cases.