Creates a document with the given source data. If you want HTML behavior, set caseSensitive and strict to false. For XML mode, set them both to true.
Creates an empty document. It has *nothing* in it at all and is ready to be filled in.
Adds objects to the DOM representing things normally stripped out during the default parse, such as comments, <!instructions>, <% code %>, and <? code ?>, enabling all of them at once.
Part of implementing the FileResource interface; it calls toString.
These functions all forward to the root element. See the documentation in the Element class.
FIXME: this could just be a lazy range.
This uses an unusual rule: it matches [name=] if the given name has no colon, and [property=] if it has one.
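The colon rule can be sketched in a few lines of D. metaSelector is a hypothetical helper written for illustration, not part of dom.d, and it does not escape quote characters inside the name.

```d
import std.algorithm.searching : canFind;

/// Hypothetical helper: builds a CSS selector following the rule above.
/// Names without a colon (e.g. "description") match [name=...];
/// names with a colon (e.g. Open Graph's "og:title") match [property=...].
/// Note: quotes inside `name` are not escaped.
string metaSelector(string name)
{
    immutable attr = name.canFind(':') ? "property" : "name";
    return `meta[` ~ attr ~ `="` ~ name ~ `"]`;
}
```

Passing the result to a selector query would then find either a traditional meta tag or an Open Graph one with the same call.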
This returns the <body> element, if there is one. (It is named differently than in JavaScript, where it is called 'body', because body is a keyword in D.)
This is just something I'm toying with. Right now, you use opIndex to pass in CSS selectors. It returns a struct that forwards calls to all the elements it holds and returns itself, so you can chain calls.
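A stripped-down sketch of the forwarding idea, assuming D's opDispatch is used to catch arbitrary method names; the real struct may be built differently. ToyElement and ToyCollection are hypothetical stand-ins for the library's element and collection types.

```d
/// Hypothetical element type standing in for a real DOM Element.
class ToyElement
{
    string[] classes;
    string[string] attrs;

    void addClass(string c) { classes ~= c; }
    void setAttribute(string name, string value) { attrs[name] = value; }
}

/// Holds several elements, forwards any method call to each of them,
/// then returns itself so calls can be chained.
struct ToyCollection
{
    ToyElement[] elements;

    // Only forward names the element type actually has, for clearer errors.
    ToyCollection opDispatch(string method, Args...)(Args args)
        if (__traits(hasMember, ToyElement, method))
    {
        foreach (e; elements)
            mixin("e." ~ method ~ "(args);");
        return this;
    }
}
```

With this, c.addClass("note").setAttribute("data-x", "1") applies both calls to every held element. Since the elements are class references, the copy returned by each call still points at the same objects.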
Takes XML-ish data and tries to build the DOM tree out of it.
Given the kind of garbage you find on the Internet, tries to make sense of it. Equivalent to document.parse(data, false, false, null); (case-insensitive, non-strict, determines the character encoding from the data). NOTE: this makes no attempt at added security, but it will try to recover from anything instead of throwing.
Parses well-formed, case-sensitive UTF-8 XML or XHTML. Throws exceptions on things like unclosed tags.
Parses well-formed UTF-8 in loose mode (by default). Tries to correct tag soup, but does NOT try to correct bad character encodings.
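The practical difference between strict and loose parsing shows up when a closing tag does not match the innermost open tag. The D sketch uses a hypothetical closeTag helper (not dom.d's actual parser) operating on a stack of open tag names: strict mode throws, while loose mode implicitly closes the tags in between and ignores stray closing tags. It also applies the caseSensitive flag to tag-name comparison.

```d
import std.uni : toLower;

/// Hypothetical helper: handles a closing tag against a stack of open tags.
/// Returns the tag names that got closed, innermost first.
string[] closeTag(ref string[] openTags, string name, bool caseSensitive, bool strict)
{
    // In case-insensitive (HTML) mode, </P> closes <p>.
    string norm(string s) { return caseSensitive ? s : s.toLower; }
    auto target = norm(name);

    if (strict)
    {
        // Strict (XML) mode: the close tag must match the innermost open tag.
        if (openTags.length == 0 || norm(openTags[$ - 1]) != target)
            throw new Exception("unexpected </" ~ name ~ ">");
        auto closed = [openTags[$ - 1]];
        openTags = openTags[0 .. $ - 1];
        return closed;
    }

    // Loose (tag soup) mode: search outward for a matching open tag and
    // implicitly close everything nested inside it.
    foreach_reverse (i, open; openTags)
    {
        if (norm(open) == target)
        {
            string[] closed;
            foreach_reverse (t; openTags[i .. $])
                closed ~= t;
            openTags = openTags[0 .. i];
            return closed;
        }
    }

    // No matching open tag at all: ignore the stray close tag instead of throwing.
    return null;
}
```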
Sets a meta tag in the document header. It is kinda hacky, so that it works easily for both Facebook Open Graph and traditional HTML meta tags.
Returns or sets the string before the root element. This is, for example, <!DOCTYPE html>\n or similar.
Writes the document out with whitespace for easier eyeball debugging.
Returns the document in string form. Please note that anything in piecesAfterRoot is discarded. If you want to add those pieces to the file, loop over piecesAfterRoot and append them yourself (but remember that XML isn't supposed to have anything after the root element).
If you're using this for some other kind of XML, you can set the content type here.
Part of implementing the FileResource interface, useful for sending via HTTP automatically.
Gets the <title> element's innerText, if one exists.
Sets the title of the page, creating a <title> element if needed.
Convenience method for web scraping. Requires arsd.http2 to be included in the build as well as arsd.characterencodings.
List of elements that are considered inline for pretty printing. The defaults for a Document are hard-coded to something appropriate for HTML. For XmlDocument, it defaults to empty. You can modify this after construction but before parsing.
If the parser sees <% asp code... %>, it will call this callback. It will be passed "% asp code... %" or "%= asp code .. %". Return true if you want the node appended to the document. It will be in an AspCode object.
If it sees a <! that is not CDATA or a comment (CDATA is handled automatically and comments call parseSawComment), it calls this function with the contents: <!SOMETHING foo> calls parseSawBangInstruction("SOMETHING foo"). Return true if you want the node appended to the document. It will be in a BangInstruction object.
If the parser sees an HTML comment, it will call this callback: <!-- comment --> will call parseSawComment(" comment "). Return true if you want the node appended to the document. It will be in an HtmlComment object.
If the parser sees <?php php code... ?>, it will call this callback. It will be passed "?php php code... ?" or "?= php code .. ?". Note: dom.d cannot identify the other PHP <? code ?> short format. Return true if you want the node appended to the document. It will be in a PhpCode object.
If it sees a <?xxx> that is not PHP or ASP, it calls this function with the contents: <?SOMETHING foo> calls parseSawQuestionInstruction("?SOMETHING foo"). Unlike the PHP/ASP ones, this ends on the first > it sees, without requiring ?>. Return true if you want the node appended to the document. It will be in a QuestionInstruction object.
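To make the argument shapes of these five callbacks concrete, here is a D sketch that routes one already-isolated special tag to the matching hook, slicing the contents exactly as described. SpecialTagHooks and dispatchSpecialTag are hypothetical names; this is not how dom.d's parser is structured, only an illustration of which callback sees which string.

```d
import std.algorithm.searching : startsWith, endsWith;

/// Hypothetical hook set; each hook returns true to keep the node in the DOM.
struct SpecialTagHooks
{
    bool delegate(string) sawComment;
    bool delegate(string) sawBangInstruction;
    bool delegate(string) sawAspCode;
    bool delegate(string) sawPhpCode;
    bool delegate(string) sawQuestionInstruction;
}

/// Routes one complete special tag (from '<' through '>') to its hook.
/// Returns the hook's answer, or false if no hook applies or it is unset.
bool dispatchSpecialTag(string tag, SpecialTagHooks hooks)
{
    static bool call(bool delegate(string) hook, string contents)
    {
        return hook !is null && hook(contents);
    }

    if (tag.length < 3 || tag[0] != '<' || tag[$ - 1] != '>')
        return false;

    if (tag.length >= 7 && tag.startsWith("<!--") && tag.endsWith("-->"))
        return call(hooks.sawComment, tag[4 .. $ - 3]);             // " comment "
    if (tag.startsWith("<!") && !tag.startsWith("<![CDATA["))
        return call(hooks.sawBangInstruction, tag[2 .. $ - 1]);     // "SOMETHING foo"
    if (tag.startsWith("<%") && tag.endsWith("%>"))
        return call(hooks.sawAspCode, tag[1 .. $ - 1]);             // "% asp code %"
    if ((tag.startsWith("<?php") || tag.startsWith("<?=")) && tag.endsWith("?>"))
        return call(hooks.sawPhpCode, tag[1 .. $ - 1]);             // "?php code ?"
    if (tag.startsWith("<?"))
        return call(hooks.sawQuestionInstruction, tag[1 .. $ - 1]); // "?SOMETHING foo"

    // CDATA and anything else: no hook involved.
    return false;
}
```

Note the asymmetry the sketch preserves: bang instructions lose their leading "!", while ASP, PHP, and question instructions keep their leading "%" or "?".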
Stuff after the root, only stored in non-strict mode and not used in toString, but available in case you want it.
If these were kept, this is the stuff that appeared before the root element, such as <?xml version ?> declarations and <!DOCTYPE>s.
The root element, like <html>. Most of the methods on Document forward to this object.
List of elements that can be assumed to be self-closed in this document. The defaults for a Document are a hard-coded list of ones appropriate for HTML. For XmlDocument, it defaults to empty. You can modify this after construction but before parsing.
The content type of the file, e.g. "text/html; charset=utf-8" or "image/png".
The data.
The filename, or null if there is none.
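The three members above (content type, data, filename) can be pictured as a small D interface. FileResourceSketch and ToyDocument are hypothetical names, and the real FileResource declaration may use different signatures or attributes. The toy class serves its serialized text as the data by calling toString, mirroring the Document implementation described earlier.

```d
/// Hypothetical interface mirroring the three members described above.
/// The real FileResource declaration may differ.
interface FileResourceSketch
{
    @property string contentType(); // e.g. "text/html; charset=utf-8"
    immutable(ubyte)[] getData();   // the raw bytes
    @property string filename();    // null if there is none
}

/// Toy document that serves its serialized text as the resource data.
class ToyDocument : FileResourceSketch
{
    private string markup;

    // Settable, so other XML flavors can report their own content type.
    string contentTypeValue = "text/html; charset=utf-8";

    this(string markup) { this.markup = markup; }

    override string toString() { return markup; }

    @property string contentType() { return contentTypeValue; }

    // Just calls toString; a D string is immutable(char)[],
    // so its UTF-8 bytes can be viewed as immutable ubytes.
    immutable(ubyte)[] getData() { return cast(immutable(ubyte)[]) toString(); }

    @property string filename() { return null; }
}
```

Code that only knows about the interface (an HTTP response writer, say) can then send any such object without caring whether it is a parsed document or an image.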
The main document interface, including a html or xml parser.
There are three main ways to create a Document:
If you want to parse something and inspect the tags, you can pass the source data straight to the constructor, as in new Document(data).
If you want to download something and parse it in one call, the static fromUrl function can help.
(note that this requires my arsd.characterencodings and arsd.http2 libraries)
And, if you need to inspect things like <%= foo %> tags and comments, you can add them to the DOM by creating an empty Document, calling enableAddingSpecialTagsToDom on it, and then parsing with parseUtf8 or parseGarbage.
However you parse it, it will put a few things into special variables.
root contains the root element. prolog contains the instructions before the root (like <!DOCTYPE html>). To keep the original things, you will need to call enableAddingSpecialTagsToDom first; otherwise, the library will return generic strings in there. piecesBeforeRoot will have other parsed instructions, if enableAddingSpecialTagsToDom is called. piecesAfterRoot will contain any XML-looking data after the root tag is closed.
Most often though, you will not need to look at any of that data, since Document itself has methods like querySelector, appendChild, and more which will forward to the root Element for you.