encore

Isomorphism between encoder & decoder


Encore is a small library that generates both an Angstrom decoder and an
internal encoder from a single shared description. It was written for
ocaml-git, to guarantee that decoding a Git object and encoding it again is an
isomorphism, so the object keeps the same hash/identifier.

Examples

A good example can be found in the test/ directory. It provides a description of
a Git object and derives both an Angstrom decoder and an encoder from it. The
test then runs over the Encore Git repository itself to check that every object
is intact after a serialization and a de-serialization.
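
To make the idea concrete, here is a minimal, self-contained sketch of a shared description interpreted once as a decoder and once as an encoder. It does not use Encore's or Angstrom's API: the `desc` type, its constructors, and the `decode`, `encode`, `header` and `roundtrip` names are all hypothetical, and the "blob <length>\000" header is only a simplified stand-in for a Git object header. It assumes OCaml 4.05 or later for `int_of_string_opt`.

```ocaml
(* Hypothetical description type, NOT Encore's API. *)
type _ desc =
  | Const : string -> unit desc                (* a fixed piece of text *)
  | Nat : int desc                             (* decimal, no leading zeros *)
  | Seq : 'a desc * 'b desc -> ('a * 'b) desc  (* one description after another *)

let is_digit c = c >= '0' && c <= '9'

(* Interpret a description as a decoder: read from [s] at [pos] and
   return the value together with the new position. *)
let rec decode : type a. a desc -> string -> int -> (a * int) option =
  fun d s pos ->
    match d with
    | Const k ->
      let n = String.length k in
      if pos + n <= String.length s && String.sub s pos n = k
      then Some ((), pos + n)
      else None
    | Nat ->
      let stop = ref pos in
      while !stop < String.length s && is_digit s.[!stop] do incr stop done;
      let len = !stop - pos in
      (* Leading zeros are rejected: "042" would be re-encoded as "42",
         which would break the round trip. *)
      if len = 0 || (len > 1 && s.[pos] = '0') then None
      else (match int_of_string_opt (String.sub s pos len) with
          | Some n -> Some (n, !stop)
          | None -> None)
    | Seq (a, b) ->
      (match decode a s pos with
       | None -> None
       | Some (x, pos) ->
         match decode b s pos with
         | None -> None
         | Some (y, pos) -> Some ((x, y), pos))

(* Interpret the same description as an encoder. *)
let rec encode : type a. a desc -> a -> Buffer.t -> unit =
  fun d v buf ->
    match d with
    | Const k -> Buffer.add_string buf k
    | Nat ->
      if v < 0 then invalid_arg "encode: negative number";
      Buffer.add_string buf (string_of_int v)
    | Seq (a, b) ->
      let (x, y) = v in
      encode a x buf;
      encode b y buf

(* One description, used in both directions. *)
let header = Seq (Const "blob ", Seq (Nat, Const "\000"))

(* Decode, then encode again; the output should equal the input. *)
let roundtrip input =
  match decode header input 0 with
  | None -> None
  | Some (v, _) ->
    let buf = Buffer.create 16 in
    encode header v buf;
    Some (Buffer.contents buf)
```

With this sketch, `roundtrip "blob 42\000"` gives back the same bytes, while an input such as "blob 042\000" is refused by the decoder instead of being silently rewritten.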

Benchmark

Encore adds a small overhead compared with a decoder and an encoder written by
hand. The repository includes a benchmark which compares a specific version of
ocaml-git (encore branch) with the decoder/encoder produced by Encore. You can
run this benchmark locally with jbuilder build @runbench, but first you need to
pin ocaml-git:

$ opam pin add git https://github.com/dinosaure/ocaml-git.git#encore
$ opam pin add git-http https://github.com/dinosaure/ocaml-git.git#encore
$ opam pin add git-unix https://github.com/dinosaure/ocaml-git.git#encore

Then, on my computer (ThinkPad X1 Carbon - Intel i7-7500U CPU @ 2.70 GHz - 2.90
GHz), I get this result:

┌────────┬──────────┬─────────┬──────────┬──────────┬────────────┐
│ Name   │ Time/Run │ mWd/Run │ mjWd/Run │ Prom/Run │ Percentage │
├────────┼──────────┼─────────┼──────────┼──────────┼────────────┤
│ encore │  37.24ms │  3.45Mw │ 194.32kw │  18.09kw │    100.00% │
│ git    │  32.84ms │  3.52Mw │ 229.67kw │  13.92kw │     88.16% │
└────────┴──────────┴─────────┴──────────┴──────────┴────────────┘

So we can observe a small overhead (about 13% more time per run), but the
guarantees provided by Encore are worth more than a faster decoder/encoder.

Some notes about internal encoder

The internal encoder is a small encoder which keeps memory consumption under
control when you serialize an OCaml value with a description. It uses a bounded
bigarray and, when that buffer is full, it explicitly asks the user to flush it.
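
The flush protocol can be sketched as follows. This is only an illustration under simplifying assumptions, not Encore's internal encoder: the buffer is an immutable string instead of a bigarray, the capacity is fixed to 4 bytes to keep the trace short, and the names `step`, `write`, `encode` and `drain` are hypothetical.

```ocaml
(* Hypothetical names throughout; this is not Encore's internal encoder. *)
let capacity = 4

type step =
  | Flush of string * (unit -> step)  (* buffer full: hand it out, then resume *)
  | End of string                     (* encoding finished: remaining bytes *)

(* Copy [s] from offset [off] into the pending buffer, one byte at a time.
   When the buffer reaches [capacity], stop and ask the caller to flush;
   the closure resumes with an empty buffer. *)
let rec write s off pending k =
  if off = String.length s then k pending
  else if String.length pending = capacity then
    Flush (pending, fun () -> write s off "" k)
  else write s (off + 1) (pending ^ String.make 1 s.[off]) k

let encode s = write s 0 "" (fun pending -> End pending)

(* A caller that "flushes" by collecting the chunks in order. *)
let rec drain acc = function
  | Flush (chunk, resume) -> drain (chunk :: acc) (resume ())
  | End chunk -> List.rev (chunk :: acc)
```

Here `drain [] (encode "hello world")` yields the chunks "hell", "o wo" and "rld": the encoder never holds more than `capacity` bytes, and the caller decides what flushing means (writing to a file descriptor, a socket, and so on).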

The internal encoder is written in continuation-passing style (CPS), like
Angstrom, and uses only purely functional data structures. This is a big
difference from Faraday. As a consequence, this encoder is slower than Faraday
(about 3 times). However, we cannot use Faraday in this context because its
state is mutable, which gets in the way when the encoder has to choose between
two branches.

Indeed, when the encoder fails in one branch, it raises an exception to jump to
the other branch. With a mutable structure, it is hard to roll back to the
previous state of the encoder and retry with the other branch. With this
encoder, no trick is needed to roll back because every step produces a new pure
state and the old one stays valid.
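
A minimal sketch of this rollback-for-free behaviour, assuming a deliberately simplified state (an immutable list of chunks) rather than Encore's real one. The exception `Bad`, the `<|>` operator and the functions `write`, `nat`, `text` and `length` are hypothetical names chosen for the illustration.

```ocaml
(* Hypothetical sketch, not Encore's API. *)
exception Bad

(* Immutable encoder state: chunks already written, newest first. *)
type state = string list

let write s (st : state) : state = s :: st

let to_string (st : state) = String.concat "" (List.rev st)

(* Try [a]; if it raises [Bad], run [b] from the very same state.
   Nothing has to be undone because [st] was never modified. *)
let ( <|> ) a b = fun st -> try a st with Bad -> b st

(* An encoder that only accepts non-negative numbers. *)
let nat n st = if n >= 0 then write (string_of_int n) st else raise Bad

let text s st = write s st

(* First branch: writes "len=" and then the number, which may fail after
   the partial write. Second branch: a fallback. *)
let length n = (fun st -> nat n (write "len=" st)) <|> text "len=?"
```

With these definitions, `to_string (length 3 [])` is "len=3" and `to_string (length (-1) [])` is "len=?": the "len=" chunk written by the failed branch lives only in a state that is dropped, so it never reaches the output.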

Inspirations

This project is inspired by the finale project, which focuses on producing a
pretty-printer. Encore is closer to a low-level encoder like Faraday than to a
pretty-printer generator.

Improvements

This library was made specifically for ocaml-git. The API may be inconsistent
and hard to use for a general user, so feedback to improve it is very welcome.
Finally, the main issue seems to be the performance of the internal encoder. It
would be interesting to improve it, but the assumptions behind the encoding
process, such as immutability, are a little difficult to understand. So feel
free to contribute!

Published: 01 Apr 2018

Sources
encore-0.1.tbz
md5=79647f73f51e1681cd64790b62d13e59

Dependencies
alcotest (with-test)
angstrom >= "0.9.0" & < "0.14.0"
jbuilder >= "1.0+beta9"
ocaml >= "4.03.0"

Reverse Dependencies
git >= "2.0.0" & < "2.1.3"