mechaml

A functional web scraping library

Description

Mechaml is a functional web scraping library that lets you:

  • Fetch web content

  • Analyze, fill and submit HTML forms

  • Handle cookies, headers and redirections

Mechaml is built on top of existing libraries that provide low-level features: Cohttp and
Lwt for asynchronous I/O and HTTP handling, and Lambdasoup for HTML parsing. It provides
an interface that handles the interactions between these libraries and adds a few
other features.

Overview

The library is divided into three main modules:

  • Agent: user-agent features. Perform requests and get back content, headers, status codes, etc.

  • Cookiejar: cookie handling

  • Page: HTML parsing and form handling

The Format module provides helpers to manage formatted content in forms, such
as dates and colors. For more details, see the documentation.
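
To give an idea of what "formatted content" means here, the sketch below builds the string values that HTML5 date and color inputs expect (YYYY-MM-DD and #rrggbb). It uses only the OCaml standard library. The names date_value and color_value are hypothetical and are not part of Mechaml's Format module, whose actual interface is described in the documentation.

```ocaml
(* Hypothetical helpers, NOT Mechaml's Format API: they only illustrate the
   string formats that HTML5 date and color inputs expect. *)

(* Format a calendar date as YYYY-MM-DD, the value format of
   <input type="date">. *)
let date_value ~year ~month ~day =
  Printf.sprintf "%04d-%02d-%02d" year month day

(* Format an RGB triple as #rrggbb, the value format of
   <input type="color">. Each component is clamped to 0..255. *)
let color_value ~red ~green ~blue =
  let clamp c = max 0 (min 255 c) in
  Printf.sprintf "#%02x%02x%02x" (clamp red) (clamp green) (clamp blue)

let () =
  print_endline (date_value ~year:2021 ~month:5 ~day:6);
  print_endline (color_value ~red:255 ~green:128 ~blue:0)
```

A string built this way could then be passed as the value of a form field, in the same way as the plain strings in the usage example.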

Installation

From opam

opam install mechaml

From source

Mechaml uses the dune build system, which can be installed through opam. Once
dune is installed, run

dune build

to build the library.

Use dune build @doc to generate the documentation, dune runtest to build and
execute tests, and dune build examples/XXX.exe to compile example XXX.

Usage

Here is a code sample that fetches a web page, fills in a login form, and
submits it in the monadic style:

open Mechaml
module M = Agent.Monad
open M.Infix

(* Unwrap an option, failing with [msg] when the value is missing. *)
let require msg = function
  | Some a -> a
  | None -> failwith msg

(* Fetch the page, fill in the login form, then submit it. *)
let action_login =
  Agent.get "http://www.somewebsite.com"
  >|= Agent.HttpResponse.page
  >|= (fun page ->
    page
    |> Page.form_with "[name=login]"
    |> require "Can't find the login form !"
    |> Page.Form.set "username" "mynick"
    |> Page.Form.set "password" "@xlz43")
  >>= Agent.submit

let _ =
  M.run (Agent.init ()) action_login
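
For readers new to the monadic style, the following toy model may help show how the two operators used above differ: >|= applies a pure function to a result, while >>= chains another action. This is a minimal sketch using only the standard library, and it is not Mechaml's implementation. The agent record, fake_get, return and run are hypothetical stand-ins that perform no network access.

```ocaml
(* Toy model, NOT Mechaml's implementation: an action takes an agent state
   and returns the updated state together with a result. *)
type agent = { requests : int }  (* hypothetical state: a request counter *)
type 'a action = agent -> agent * 'a

(* Wrap a plain value into an action that leaves the agent untouched. *)
let return (x : 'a) : 'a action = fun agent -> (agent, x)

(* bind: run [m], then feed its result to [f], which yields the next action. *)
let ( >>= ) (m : 'a action) (f : 'a -> 'b action) : 'b action =
  fun agent ->
    let agent', x = m agent in
    f x agent'

(* map: run [m], then transform its result with a pure function. *)
let ( >|= ) (m : 'a action) (f : 'a -> 'b) : 'b action =
  fun agent ->
    let agent', x = m agent in
    (agent', f x)

(* Hypothetical stand-in for a request: counts the call, returns a fake body. *)
let fake_get (url : string) : string action =
  fun agent -> ({ requests = agent.requests + 1 }, "<html>" ^ url ^ "</html>")

(* Execute an action from an initial agent state. *)
let run (agent : agent) (m : 'a action) : agent * 'a = m agent

let final_agent, length =
  run { requests = 0 }
    (fake_get "a"
     >|= String.length  (* pure step: no request issued *)
     >>= fun n -> fake_get (string_of_int n)  (* effectful step *)
     >|= String.length)

let () =
  Printf.printf "requests: %d, length: %d\n" final_agent.requests length
```

In this model the agent state is threaded through every step automatically, which is why the usage example never passes the agent explicitly until the final call to run.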

More examples are available in the examples folder.

License

GNU LGPL v3

Published: 06 May 2021

Sources: 1.2.1.tar.gz

  • md5=5c04d389b4f167ee03fda1b85b7b8099

  • sha512=269eeba6a3b9e178f1c9d2e9d6569d113aeb14094485069f1958d5975d92b72b4f2c8cb6e5935f66a767cde9e955c0432539344e014abb4540389624dd4ee9c7

Dependencies

  • ocaml >= "4.03.0"

  • alcotest with-test & >= "0.8.0"

  • lambdasoup < "0.8.0"

  • cohttp >= "0.21.0" & < "5.0.0"

  • dune >= "1.8.0"