lambdasoup

Easy functional HTML scraping and manipulation with CSS selectors

Lambda Soup is a functional HTML scraping and manipulation library for OCaml
aimed at being easy to use.

Lambda Soup is simple. It provides a set of
elementary traversals for getting from node to node, familiar
functional combinators such as filter, map, and fold, and
support for all CSS selectors that still make sense when not running in a
browser (and a few obvious extensions on top of that).

Here is a trivial self-contained example:

(parse "<p class='Hello'>World!</p>") $ ".Hello" |> R.leaf_text;;
- : string = "World!"

And a mutation:

let soup = parse "<p class='Hello'>World!</p>" in
wrap (soup $ ".Hello" |> R.child) (create_element "strong");
soup |> to_string;;
- : string = "<p class=\"Hello\"><strong>World!</strong></p>"
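The traversals, combinators, and selectors mentioned earlier can be combined along the following lines. This is only a sketch: it assumes the Soup module has been opened, as the snippets above do, and the shopping-list markup and the names finished, pending, total_chars, and second are made up for illustration.

```ocaml
open Soup

(* Hypothetical input, invented for this sketch. *)
let soup =
  parse
    "<ul><li class='done'>tea</li><li>milk</li><li class='done'>jam</li></ul>"

(* CSS selector: the text of every <li> that carries the class "done". *)
let finished : string list =
  soup $$ "li.done" |> to_list |> List.map R.leaf_text

(* filter: keep the <li> elements that have no class at all, then count them. *)
let pending : int =
  soup $$ "li" |> filter (fun li -> classes li = []) |> count

(* fold: total length of the text found in all <li> elements. *)
let total_chars : int =
  soup $$ "li"
  |> fold (fun acc li -> acc + String.length (R.leaf_text li)) 0

(* Elementary traversal: step from the first <li> to the element after it. *)
let second : string option =
  match soup $ "li" |> next_element with
  | Some li -> leaf_text li
  | None -> None
```

Functions in the R submodule raise an exception when there is no result, while their counterparts outside R return an option, which is why the last binding pattern-matches instead of calling an R function.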

For some more examples, see the Lambda Soup postprocessor that
runs on Lambda Soup's own documentation after it is generated by
ocamldoc.

The library is tested thoroughly.

Lambda Soup is based on Markup.ml. As a consequence, it resolves
entity references, detects character encodings automatically, and converts
everything to UTF-8. You can also use Lambda Soup on XML by
parsing the XML with Markup.ml and feeding the resulting
signals to Lambda Soup.
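The XML route might look roughly like the sketch below. It assumes that the signal stream is handed over with Soup.from_signals and that the markup package is installed alongside lambdasoup. The feed document and the names xml, ids, and titles are invented for the example.

```ocaml
open Soup

(* Hypothetical XML input, invented for this sketch. *)
let xml =
  "<feed><entry id='a'>First</entry><entry id='b'>Second</entry></feed>"

(* Parse as XML with Markup.ml, then pass the signal stream to Lambda Soup. *)
let soup : soup node =
  Markup.string xml |> Markup.parse_xml |> Markup.signals |> from_signals

(* From here on, the usual selectors and accessors apply. *)
let ids : string list =
  soup $$ "entry" |> to_list |> List.map (R.attribute "id")

let titles : string list =
  soup $$ "entry" |> to_list |> List.map R.leaf_text
```

Because the parsing step is Markup.ml's own, any option its XML parser accepts can be set there before the signals reach Lambda Soup.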

Installing

opam install lambdasoup

Starting from scratch

To use Lambda Soup interactively as in the GIF at the top of this README, you
need to have done something like this:

your-package-manager install ocaml opam
opam init
eval `opam config env`          # Or restart your shell
opam install lambdasoup

and make sure your ~/.ocamlinit file looks something like this:

let () =
  try Topdirs.dir_directory (Sys.getenv "OCAML_TOPLEVEL_PATH")
  with Not_found -> ()
;;

#use "topfind";;

Then, run ocaml -short-paths to start the top-level, and scrape away!
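A first session in the top-level could then start as follows. This is a sketch that assumes the findlib package carries the same name as the opam package, lambdasoup, and that the ~/.ocamlinit shown above has already loaded topfind. The name greeting is chosen only for this example.

```ocaml
(* Load the library through findlib, then bring its functions into scope. *)
#require "lambdasoup";;
open Soup;;

(* The trivial example from the top of this README, bound to a name. *)
let greeting = parse "<p class='Hello'>World!</p>" $ ".Hello" |> R.leaf_text;;
```

The top-level should report greeting as the string "World!", matching the first example in this README.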

Depending

Lambda Soup uses semantic versioning, but is currently in 0.x.x. For now, the
minor version number will be incremented on breaking changes. So, to give
yourself a chance to review the changelog before your code breaks, put the
following constraint on Lambda Soup: lambdasoup {< "0.7.0"}.

Documentation

Lambda Soup's interface consists of one module Soup, whose signature is
documented here.

Developing

See CONTRIBUTING. All feedback is welcome – open an issue on
GitHub, or send me an email at antonbachin@yahoo.com. If you find
yourself repeatedly writing the same helper on top of Lambda Soup's functions,
perhaps we should add it to Lambda Soup.

Published: 02 Jun 2018
Sources: 0.6.3.tar.gz (md5=89f0596aa05a6e7a33bf9d74797905f1)
Dependencies:
ounit {with-test}
markup {>= "0.7.1"}
jbuilder {>= "1.0+beta10"}