Mark Nottingham digs in on Infosets, and offers some interesting insight. Count him as a Thought Leader in my book.
He provides some thoughts about the Infoset that echo some undercurrent feelings I have about it, but of course, he has actually turned them into quite a nice essay. The first and the last sentence of this excerpt do it for me.
To sum up: XML ain't all that great for data modeling, but the readability sure is nice.
And I quote:
I’m forming a belief that the complexity of the Infoset as a data model forces an unwelcome choice upon its users:
1) You can describe your format in terms of the Infoset, and therefore get easy human-readability and writability, while getting a lot of baggage as part of the bargain. I believe that a lot of the problems evident in the use of XML Schema and XML itself have their root in this complexity.
2) Or, you can layer a model on top of the Infoset that explains how format-specific components are serialised into XML. This is great for particular formats, but a fair amount of work. For example, WSDL 2.0 defines a component model that gets serialised into XML; the markup is still very human-readable, and the model is clear. However, it takes a fair amount of work to do this, and it’s very tricky to get the full benefits of Infoset-layer mechanisms like Schema in your component model.
3) The other option is to layer a generic model on top of the Infoset. This is the approach that RDF/XML takes; it insulates the data model from the XML serialisation, and as a consequence loses much of the intuitive readability of XML. Ask anybody about RDF, and they’ll tell you that they love the model, but hate the syntax.
The root of this, I think, is that XML was first and foremost a markup language, not a data modelling language; we’ve seen a number of attempts to layer something more appropriate on top of it (e.g., SOAP encoding, RDF/RDFS, XML Schema, etc.) but the human-readability draws people back to the Infoset level every time.
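To make the second option from the quote a bit more concrete, here is a minimal sketch of layering a format-specific model on top of XML. The `order`/`item` format, the `Order` and `Item` classes, and the helper functions are all invented for illustration; they are not from Mark's essay or from WSDL. Python's `ElementTree` is also only a rough stand-in for the Infoset, not a faithful implementation of it.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical format, made up for this example only.
DOC = """<order id="42">
  <item sku="A1" qty="2"/>
  <item sku="B7" qty="1"/>
</order>"""


@dataclass
class Item:
    sku: str
    qty: int


@dataclass
class Order:
    order_id: int
    items: list


def parse_order(xml_text: str) -> Order:
    """Map the XML tree onto the component model, dropping markup-level detail."""
    root = ET.fromstring(xml_text)
    items = [
        Item(sku=e.get("sku"), qty=int(e.get("qty")))  # attributes arrive as strings
        for e in root.findall("item")
    ]
    return Order(order_id=int(root.get("id")), items=items)


def serialise_order(order: Order) -> str:
    """Write the component model back out as XML."""
    root = ET.Element("order", {"id": str(order.order_id)})
    for item in order.items:
        ET.SubElement(root, "item", {"sku": item.sku, "qty": str(item.qty)})
    return ET.tostring(root, encoding="unicode")


order = parse_order(DOC)

# Markup-level baggage: the tree still carries the indentation whitespace
# as text content of <order>, which the component model simply ignores.
whitespace_in_tree = ET.fromstring(DOC).text
```

The markup stays readable, and the model is clear, but every mapping in both directions is hand-written, which is the "fair amount of work" the quote is talking about.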