Document.parse

Takes XML-ish data and tries to build a DOM tree out of it.

The goal isn't to be perfect, just good enough to approximate JavaScript's behavior.

If strict is true, it throws on input that doesn't make sense, such as mismatched tags. (It doesn't validate!) If strict is false, it tries to recover anyway, and only throws when something is REALLY unworkable.

If strict is false, it uses a magic list of tags that needn't be closed. If you are writing a document specifically for this parser, try to avoid relying on that list; use self-closed tags at least. They are easier to parse.
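A minimal sketch of the strict/non-strict difference, assuming arsd.dom is on the import path. The helper name parsesCleanly and the sample markup are made up for illustration. This page does not name the exception type, so the sketch catches the base Exception class.

```d
import arsd.dom;
import std.stdio;

/// Hypothetical helper: returns true if `markup` parses without throwing.
bool parsesCleanly(string markup, bool strict)
{
    auto document = new Document();
    try
    {
        // caseSensitive = false; dataEncoding is left at its "UTF-8" default.
        document.parse(markup, false, strict);
        return true;
    }
    catch (Exception e)
    {
        // The exact exception type is an unknown here, so catch the base class.
        writeln("parse failed (strict = ", strict, "): ", e.msg);
        return false;
    }
}

void main()
{
    // Mismatched tags: the case this page says strict mode rejects.
    enum mismatched = "<root><a><b></a></b></root>";
    // Explicitly self-closed tags: no help needed from the non-strict tag list.
    enum selfClosed = "<root><item /><item /></root>";

    writeln("mismatched, strict:      ", parsesCleanly(mismatched, true));
    writeln("mismatched, non-strict:  ", parsesCleanly(mismatched, false));
    writeln("self-closed, strict:     ", parsesCleanly(selfClosed, true));
}
```

The sketch prints what happened rather than asserting it, because how far non-strict recovery goes on a given input is up to the parser.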

The dataEncoding argument can be used to pass a specific charset encoding for automatic conversion. If null (which is NOT the default!), it tries to determine the encoding from the data itself, using the XML prolog or meta tags, and assumes UTF-8 if unsure.

If this assumption is wrong, it can throw on non-ASCII characters!
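As a rough sketch, asking for detection means passing null explicitly as the fourth argument. The helper name parseDetectingEncoding and the sample page are hypothetical. Per this page, the null path needs arsd.characterencodings to be available at compile time.

```d
import arsd.dom;

/// Hypothetical helper: parse markup of unknown origin and let the parser
/// work out the charset instead of assuming UTF-8 up front.
Document parseDetectingEncoding(string rawdata)
{
    auto document = new Document();
    // null for dataEncoding: look at the XML prolog or meta tags,
    // falling back to UTF-8 if neither settles it.
    document.parse(rawdata, false, false, null);
    return document;
}

void main()
{
    import std.stdio : writeln;

    // Made-up sample input, plain ASCII so any detected charset decodes it.
    enum page = `<html><head><meta charset="utf-8"></head><body><p>hello</p></body></html>`;
    auto document = parseDetectingEncoding(page);
    writeln(document.toString());
}
```

If the input's real encoding is known, passing its name as dataEncoding instead of null skips the guessing.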

class Document
void parse()(in string rawdata, bool caseSensitive = false, bool strict = false, string dataEncoding = "UTF-8")

Detailed Description

Note that it previously assumed the data was encoded as UTF-8, which is why the dataEncoding argument defaults to that, so it shouldn't break backward compatibility.

But, if you want the best behavior on wild data - figuring it out from the document instead of assuming - you'll probably want to change that argument to null.

This is a template, so it imports arsd.characterencodings lazily; that module is required to fix up data encodings.

If you are sure the encoding is good, try parseUtf8 or parseStrict to avoid the dependency. If the data comes from the Internet, say from a random website, the declared encoding is often a lie. This function, if dataEncoding == null, can correct for that, or you can try parseGarbage. In those cases, arsd.characterencodings is required to compile.
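One way to sketch that choice in code. The helper name load and its trusted flag are hypothetical, and the single-argument calls to parseUtf8 and parseGarbage are an assumption, since this page names those functions without giving their signatures.

```d
import arsd.dom;

/// Hypothetical helper: choose an entry point by how far the input is trusted.
Document load(string data, bool trustedUtf8)
{
    auto document = new Document();
    if (trustedUtf8)
    {
        // Known-good UTF-8: no encoding fix-up needed.
        // Assumed call shape: parseUtf8(data).
        document.parseUtf8(data);
    }
    else
    {
        // Wild data whose declared encoding may be wrong.
        // Assumed call shape: parseGarbage(data).
        document.parseGarbage(data);
    }
    return document;
}

void main()
{
    import std.stdio : writeln;

    auto document = load("<html><body><p>hello</p></body></html>", true);
    writeln(document.toString());
}
```

Because this helper mentions parseGarbage, a program containing it would presumably still need arsd.characterencodings to compile; code that only ever calls parseUtf8 or parseStrict is what avoids the dependency.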

Meta