package irmin-chunk


This package provides an Irmin backend to cut raw contents into blocks of the same size, while preserving the keys used in the store. It can be used to optimize space usage when dealing with large files or as an intermediate layer for a raw block device backend.

Managing Chunks

This module exposes functors to store raw contents into append-only stores as chunks of the same size. It exposes the AO functor, which splits the raw contents into Data blocks, addressed by Node blocks. This is the usual rope-like representation of strings, except that chunk trees are always built perfectly balanced and blocks are addressed by their hash (or by the stable keys returned by the underlying store).
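
The bottom-up construction of a balanced chunk tree can be sketched as follows. This is a minimal in-memory model, not the irmin-chunk implementation: the names tree, split, group, build, contents and depth, and the fanout parameter, are hypothetical, and children are held directly instead of being addressed by key.

```ocaml
(* Hypothetical in-memory model of a chunk tree; not the irmin-chunk types. *)
type tree =
  | Data of string      (* leaf: one slice of the raw contents *)
  | Index of tree list  (* inner node: ordered children *)

(* Split [s] into consecutive slices of at most [size] bytes. *)
let split ~size s =
  if size <= 0 then invalid_arg "split: size must be positive";
  let len = String.length s in
  let rec go off acc =
    if off >= len then List.rev acc
    else
      let n = min size (len - off) in
      go (off + n) (String.sub s off n :: acc)
  in
  go 0 []

(* Group a list into consecutive sublists of at most [n] elements. *)
let group n l =
  let rec go cur k acc = function
    | [] -> List.rev (if cur = [] then acc else List.rev cur :: acc)
    | x :: tl ->
      if k = n then go [ x ] 1 (List.rev cur :: acc) tl
      else go (x :: cur) (k + 1) acc tl
  in
  go [] 0 [] l

(* Build the tree level by level: each pass packs up to [fanout] nodes
   under one Index node, until a single root remains. Every leaf ends up
   at the same depth, which keeps the tree balanced. *)
let build ~size ~fanout s =
  if fanout < 2 then invalid_arg "build: fanout must be at least 2";
  let rec up = function
    | [] -> Data ""
    | [ root ] -> root
    | level -> up (List.map (fun c -> Index c) (group fanout level))
  in
  up (List.map (fun d -> Data d) (split ~size s))

(* Reading the leaves left to right gives back the original contents. *)
let rec contents = function
  | Data d -> d
  | Index children -> String.concat "" (List.map contents children)

(* Number of Index levels above the leaves. *)
let rec depth = function
  | Data _ -> 0
  | Index children ->
    1 + List.fold_left (fun m c -> max m (depth c)) 0 children
```

In this model, a value that fits in a single slice is represented by a lone Data leaf, and larger values gain one Index level each time the number of nodes at a level exceeds fanout.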

A chunk has the following structure:

     ---------------------------     ---------------------------
     | uint8_t type            |     | uint8_t type            |
     ---------------------------     ---------------------------
     | uint16_t length         |     | uint64_t length         |
     ---------------------------     ---------------------------
     | key children[length]    |     | byte data[length]       |
     ---------------------------     ---------------------------

type is either Data (0) or Index (1). If the chunk contains data, length is the payload length. Otherwise it is the number of children that the node has.
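
The layout above can be illustrated with a small encoder and decoder. This is a sketch under stated assumptions, not the library's serialisation code: the byte order (big-endian here), the fixed key width passed as key_size, and the names chunk, encode and decode are all hypothetical. Only the field widths and the type tags come from the description above.

```ocaml
(* Hypothetical chunk model: child keys are assumed to be fixed-width strings. *)
type chunk =
  | Data of string
  | Index of string list

(* Serialise a chunk: 1-byte type tag, then the length field, then the body.
   Big-endian byte order is an assumption made for this sketch. *)
let encode = function
  | Data d ->
    let len = String.length d in
    let b = Bytes.create (1 + 8 + len) in
    Bytes.set_uint8 b 0 0;                       (* type = Data (0) *)
    Bytes.set_int64_be b 1 (Int64.of_int len);   (* 64-bit payload length *)
    Bytes.blit_string d 0 b 9 len;
    Bytes.to_string b
  | Index keys ->
    let n = List.length keys in
    if n > 0xFFFF then invalid_arg "encode: too many children";
    let body = String.concat "" keys in
    let b = Bytes.create (1 + 2 + String.length body) in
    Bytes.set_uint8 b 0 1;                       (* type = Index (1) *)
    Bytes.set_uint16_be b 1 n;                   (* 16-bit child count *)
    Bytes.blit_string body 0 b 3 (String.length body);
    Bytes.to_string b

(* Parse a chunk back; [key_size] is the assumed width of each child key. *)
let decode ~key_size s =
  let b = Bytes.of_string s in
  match Bytes.get_uint8 b 0 with
  | 0 ->
    let len = Int64.to_int (Bytes.get_int64_be b 1) in
    Data (Bytes.sub_string b 9 len)
  | 1 ->
    let n = Bytes.get_uint16_be b 1 in
    Index
      (List.init n (fun i -> Bytes.sub_string b (3 + (i * key_size)) key_size))
  | t -> invalid_arg (Printf.sprintf "decode: unknown chunk type %d" t)
```

The type byte is read first because it determines both the width of the length field and how the remaining bytes are interpreted.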

It also exposes AO_stable which, like AO, stores raw contents into chunks of the same size. In addition, it preserves the property that values are addressed by their own hash, rather than by the hash of the root chunk node as is the case for AO.

module Conf : sig ... end
val config : ?size:int -> ?min_size:int -> ?chunking:[ `Max | `Best_fit ] -> Irmin.config -> Irmin.config

config ?size ?min_size ?chunking c is the configuration value extending c with bindings associating chunk_size to size.

If chunking is Best_fit (the default), new chunks are at most max_size long but can be smaller when their contents do not need to be chunked. If chunking is Max, all new chunks are exactly max_size long.

Raises Invalid_argument if size is smaller than min_size. By default, min_size is set to 4000 (to avoid hash collisions on smaller sizes), but it can be tweaked for testing purposes. Note: the smaller size is, the bigger the risk of hash collisions, so use reasonable values.
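
The interaction between size, min_size and chunking can be sketched with a standalone model. This is an illustration, not the library's code: check_size and chunk_sizes are hypothetical names, max_size is assumed to correspond to the configured size, chunk headers are ignored, and the final chunk under Max is assumed to be padded up to size.

```ocaml
type chunking = [ `Max | `Best_fit ]

(* Mirrors the documented check: reject a [size] below [min_size]
   (4000 by default, as stated above). *)
let check_size ?(min_size = 4000) size =
  if size < min_size then
    invalid_arg
      (Printf.sprintf "chunk size %d is smaller than min_size %d" size min_size)

(* Sizes of the chunks needed to hold [len] payload bytes.
   Assumption: with `Best_fit the last chunk only takes the space it
   needs, while with `Max it is padded up to [size]. *)
let chunk_sizes ?min_size ?(chunking : chunking = `Best_fit) ~size len =
  check_size ?min_size size;
  let full = len / size and rest = len mod size in
  let tail =
    if rest = 0 then []
    else match chunking with `Best_fit -> [ rest ] | `Max -> [ size ]
  in
  List.init full (fun _ -> size) @ tail
```

For example, under these assumptions a 10000-byte value with size 4096 yields chunks of 4096, 4096 and 1808 bytes with Best_fit, and three chunks of 4096 bytes with Max.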

Content_addressable(X) is a content-addressable store which stores values, cut into chunks, in the underlying store X.