package SZXX

module DOM : sig ... end

Basic XML types and accessor functions

module SAX : sig ... end

Advanced parsing utilities: custom parser options and tools to stream huge documents

type document = {
  decl_attrs : DOM.attr_list;  (** The declaration attributes, e.g. version and encoding *)
  top : DOM.element;  (** The top element of the document *)
}
val sexp_of_document : document -> Sexplib0.Sexp.t
val compare_document : document -> document -> Base.int
val equal_document : document -> document -> Base.bool
val parse_document : ?parser:SAX.node Angstrom.t -> ?strict:Base.bool -> Feed.t -> (document, Base.string) Base.Result.t

Progressively parses a fully formed, fully escaped XML document. Parsing begins as soon as input is available; the whole input does not need to be read first.

parser: Override the default parser. Make your own parser with SZXX.Xml.SAX.make_parser or pass SZXX.Xml.html_parser.

strict: Default: true. When false, non-closed elements are treated as self-closing elements, HTML-style. For example, a <br> without a matching </br> is treated as a self-closing <br />.

feed: A producer of raw input data. Create a feed by using the SZXX.Feed module.
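As an illustration of the parser and strict options described above, the following is a minimal sketch of a lenient, HTML-style parse. The helper name parse_html is hypothetical, and the sketch assumes the caller has already created the feed with the SZXX.Feed module; only functions listed on this page are called.

```ocaml
(* Hypothetical helper: lenient, HTML-style parse of an existing feed.
   [feed] is assumed to have been created with the SZXX.Feed module. *)
let parse_html (feed : SZXX.Feed.t) : (SZXX.Xml.document, string) result =
  SZXX.Xml.parse_document
    ~parser:SZXX.Xml.html_parser (* replaces the default parser *)
    ~strict:false                (* a lone <br> is treated as <br /> *)
    feed
```

Both options are independent, so either one can be passed without the other.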

val parse_document_from_string : ?parser:SAX.node Angstrom.t -> ?strict:Base.bool -> Base.string -> (document, Base.string) Base.Result.t

Same as parse_document, but reads its input from a string instead of a feed.
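A short sketch of how parse_document_from_string might be used on an in-memory document. The sample XML is made up for illustration, and the sketch assumes the szxx and sexplib0 libraries are available to the build. It prints the parsed document with sexp_of_document rather than assuming anything about the accessors in DOM.

```ocaml
(* Illustrative input: a tiny document with a declaration and one child. *)
let xml =
  {|<?xml version="1.0" encoding="UTF-8"?><root><item>Hello</item></root>|}

let () =
  match SZXX.Xml.parse_document_from_string xml with
  | Ok doc ->
    (* Render the whole document (decl_attrs and top) as an S-expression. *)
    let sexp = SZXX.Xml.sexp_of_document doc in
    print_endline (Sexplib0.Sexp.to_string_hum sexp)
  | Error msg ->
    (* The error case carries a plain string message. *)
    prerr_endline ("XML parse error: " ^ msg)
```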

val html_parser : SAX.node Angstrom.t
val stream_matching_elements : ?parser:SAX.node Angstrom.t -> ?strict:Base.bool -> filter_path:Base.string Base.list -> on_match:(DOM.element -> Base.unit) -> Feed.t -> (document, Base.string) Base.Result.t

Progressively assembles an XML DOM, except that every element matching filter_path is passed to on_match instead of being added to the DOM. The resulting "shallow DOM" is then returned. All text nodes are properly unescaped. Parsing begins as soon as input is available; the whole input does not need to be read first.

parser: Override the default parser. Make your own parser with SZXX.Xml.SAX.make_parser or pass SZXX.Xml.html_parser.

strict: Default: true. When false, non-closed elements are treated as self-closing elements, HTML-style. For example, a <br> without a matching </br> is treated as a self-closing <br />.

feed: A producer of raw input data. Create a feed by using the SZXX.Feed module.

filter_path: Indicates which part of the DOM should be streamed out instead of being stored in the DOM. For example, ["html"; "body"; "div"; "div"; "p"] will emit all the <p> tags nested inside exactly 2 levels of <div> tags in an HTML document.

on_match: Called on every element that matched filter_path
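The streaming behaviour described above could be exercised roughly as follows. This is a sketch, not a reference implementation: count_paragraphs is a hypothetical helper, the feed is assumed to come from the SZXX.Feed module, and the filter path is the one from the example above. The callback only counts matches, so nothing is assumed about the fields of DOM.element.

```ocaml
(* Hypothetical helper: count the <p> elements streamed out of an HTML
   document, and return the count together with the shallow document. *)
let count_paragraphs (feed : SZXX.Feed.t)
    : (int * SZXX.Xml.document, string) result =
  let count = ref 0 in
  let result =
    SZXX.Xml.stream_matching_elements
      ~parser:SZXX.Xml.html_parser
      ~strict:false
      ~filter_path:[ "html"; "body"; "div"; "div"; "p" ]
      (* Called once per matching element; here we only count them. *)
      ~on_match:(fun (_ : SZXX.Xml.DOM.element) -> incr count)
      feed
  in
  (* Parsing has finished at this point, so the counter is final. *)
  Result.map (fun shallow -> (!count, shallow)) result
```

Because matched elements are handed to on_match and not stored, the returned document stays small even when the input contains many matches.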
