Decoder of a PACK file.
Throughout this module, the type ('a, 's) io, parameterized by a scheduler 's, is needed for the operations which perform a syscall. To be able to use them, the user must create a new type 's which represents the scheduler. For example, with LWT:
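module Lwt_scheduler = Make (Lwt)

let scheduler =
  let open Lwt.Infix in
  let open Lwt_scheduler in
  {
    (* prj unwraps the io value into an Lwt.t before binding on it *)
    bind = (fun x f -> inj (prj x >>= fun x -> prj (f x)));
    (* the plain value is lifted into Lwt before being wrapped *)
    return = (fun x -> inj (Lwt.return x));
  }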
The produced module has two functions, inj and prj, to convert from or to an LWT value. The user can use these functions like this:
let fiber =
  let ( >>= ) = scheduler.bind in
  let return = scheduler.return in
  weight_of_offset scheduler ~map t ~weight:null 0L >>= fun weight ->
  let raw = make_raw ~weight in
  of_offset scheduler ~map t raw ~cursor:0L in
prj fiber ;;
- : (Carton.v, [> error ]) Lwt.t = <abstr>
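The encoding behind inj and prj can be illustrated without LWT. The following is a minimal sketch, assuming a simplified re-creation of the io type, the scheduler record and the Make functor; the names FUNCTOR, Id, Id_scheduler and id_scheduler are hypothetical and only serve the illustration. It instantiates the scheduler with a direct-style "identity" monad, where a computation is just a value.

```ocaml
(* Simplified stand-in for the encoding described above; the definitions
   below are illustrative assumptions, not Carton's own code. *)
type ('a, 's) io (* abstract: "an 'a computed under the scheduler 's" *)

type 's scheduler = {
  bind : 'a 'b. ('a, 's) io -> ('a -> ('b, 's) io) -> ('b, 's) io;
  return : 'a. 'a -> ('a, 's) io;
}

module type FUNCTOR = sig
  type 'a t
end

module Make (T : FUNCTOR) : sig
  type t (* the brand which stands for the scheduler *)

  val inj : 'a T.t -> ('a, t) io
  val prj : ('a, t) io -> 'a T.t
end = struct
  type t

  (* Both conversions are the identity at runtime. *)
  external inj : 'a T.t -> ('a, t) io = "%identity"
  external prj : ('a, t) io -> 'a T.t = "%identity"
end

(* A direct-style scheduler: a computation is a plain value. *)
module Id = struct
  type 'a t = 'a
end

module Id_scheduler = Make (Id)

let id_scheduler : Id_scheduler.t scheduler =
  let open Id_scheduler in
  { bind = (fun x f -> f (prj x)); return = (fun x -> inj x) }

let result =
  let ( >>= ) = id_scheduler.bind in
  let return = id_scheduler.return in
  Id_scheduler.prj
    (return 20 >>= fun x -> return (x + 1) >>= fun y -> return (y * 2))
```

With a real concurrency library, only the module given to Make and the bodies of bind and return change; code written against the scheduler record stays the same.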
type weight = private int
Type of weight. A weight is not the length of an object but the number of bytes needed to extract it.
val weight_of_int_exn : int -> weight
weight_of_int_exn n is the weight of n.
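Because weight is a private int, a weight can be read as an integer with a coercion but can only be built through the provided constructor. A minimal sketch of that pattern follows, using a hypothetical Weight module; that null is 0 and that negative values are rejected are assumptions made for the illustration.

```ocaml
module Weight : sig
  type t = private int

  val null : t
  val of_int_exn : int -> t
end = struct
  type t = int

  (* Assumption: the null weight is 0. *)
  let null = 0

  (* Assumption: negative values are rejected; the real check may differ. *)
  let of_int_exn n =
    if n < 0 then invalid_arg "Weight.of_int_exn: negative weight" else n
end

let w = Weight.of_int_exn 4096

(* Reading a weight back as an int requires an explicit coercion. *)
let as_int = (w :> int)
```

Writing (4096 : Weight.t) directly is rejected by the type checker, which is what forces every weight through the checked constructor.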
type ('fd, 's) read =
'fd ->
bytes ->
off:int ->
len:int ->
(int, 's) Carton__.Sigs.io
module Fp (Uid : sig ... end) : sig ... end
Type of the state used to access any object in a Carton file.
with_z new t replaces the temporary buffer used by t with new. When the user extracts an object, the internal temporary buffer stores the inflated object, so running two extractions in parallel or concurrently with the same t is unsafe.
This function therefore lets the user create a new t with a dedicated temporary buffer (physically different from the old one) in order to start a parallel/concurrent process.
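The idea can be sketched with a toy state. The record state, the function with_tmp and the file name below are hypothetical; they only show that the derived state shares the resource but owns a physically distinct buffer.

```ocaml
(* Toy state: [fd] stands for the PACK resource, [tmp] for the buffer used
   to inflate objects. *)
type 'fd state = { fd : 'fd; tmp : Bytes.t }

(* Same idea as with_z: keep everything else, swap the temporary buffer. *)
let with_tmp tmp t = { t with tmp }

let t0 = { fd = "pack-1234.pack"; tmp = Bytes.create 0x1000 }
let t1 = with_tmp (Bytes.create 0x1000) t0

(* Each worker writes into its own buffer without disturbing the other. *)
let () =
  Bytes.fill t0.tmp 0 8 'a';
  Bytes.fill t1.tmp 0 8 'b'
```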
val with_w : 'fd W.t -> ('fd, 'uid) t -> ('fd, 'uid) t
with_w w t replaces the table W.t used by t with w. Like with_z, the purpose of this function is to make it possible to use several t in parallel.
val with_allocate :
allocate:(int -> De.window) ->
('fd, 'uid) t ->
('fd, 'uid) t
with_allocate allocate t replaces the function which allocates the window needed to inflate objects with allocate. Like with_z, the purpose of this function is to make it possible to use several t in parallel.
val fd : ('fd, 'uid) t -> 'fd
fd t returns the underlying fd resource used by t to map parts of it into memory. On Unix, even though a mapped memory region can outlive the closing of fd, the resource should stay open as long as the user extracts objects.
Type of a Carton object as it is stored in a Carton file.
make_raw ~weight
allocates a raw.
val v : kind:[ `A | `B | `C | `D ] -> ?depth:int -> Bigstringaf.t -> v
v ~kind ?depth raw is the value raw typed by kind. ?depth is an optional value giving the depth at which the object exists in the PACK file it came from (defaults to 1).
val kind : v -> [ `A | `B | `C | `D ]
kind v
is the type of the object v
.
raw v
is the contents of the object v
.
Note. The Bigstringaf.t can be larger than len v (see len) and contain extra contents. The user should apply Bigstringaf.sub to it with the real length of the object.
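The note above can be sketched with the standard library's Bigarray, which is the type underlying Bigstringaf.t. The record v and the function payload are hypothetical stand-ins for the real value and its accessors.

```ocaml
open Bigarray

type bigstring = (char, int8_unsigned_elt, c_layout) Array1.t

(* Toy value: a buffer which may be larger than the object it holds. *)
type v = { raw : bigstring; len : int }

(* Restrict the buffer to the real length of the object (no copy). *)
let payload { raw; len } = Array1.sub raw 0 len

let v =
  let raw = Array1.create char c_layout 16 in
  Array1.fill raw '\000';
  String.iteri (fun i c -> Array1.set raw i c) "hello";
  { raw; len = 5 }

(* Only the first [len] bytes belong to the object. *)
let contents = String.init (Array1.dim (payload v)) (Array1.get (payload v))
```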
len v
is the length of the object v
.
depth v is the depth of the object in the PACK file it came from.
val make :
'fd ->
z:Zl.bigstring ->
allocate:(int -> Zl.window) ->
uid_ln:int ->
uid_rw:(string -> 'uid) ->
('uid -> int64) ->
('fd, 'uid) t
make fd ~z ~allocate ~uid_ln ~uid_rw where returns a state associated with fd, which is the user-defined representation of a Carton file. Some information is needed:
z is an underlying buffer used to inflate an object.
allocate is an allocator of the underlying window used to inflate an object.
uid_ln is the length of the raw representation of the user-defined uid.
uid_rw is the cast function from a string to the user-defined uid.
where is the function which associates a uid with an offset into the associated Carton file.
Each argument depends on what the user wants. For example, if t is used by Verify.verify, allocate must be thread-safe according to IO. where is not used by Verify.verify. uid_ln and uid_rw depend on the Carton file associated with fd. Each function below describes precisely what it does with t.
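As an illustration of the uid-related arguments, here is a minimal sketch assuming that the uid is the raw 20-byte form of a SHA-1 digest and that offsets come from an in-memory table. The type uid, the table index and the value some_uid are hypothetical; a real where would typically be backed by an IDX file.

```ocaml
(* Hypothetical uid: the raw 20-byte form of a SHA-1 digest. *)
type uid = Uid of string

let uid_ln = 20

(* Cast a raw string to a uid, checking its length. *)
let uid_rw (s : string) : uid =
  if String.length s <> uid_ln then invalid_arg "uid_rw: unexpected length";
  Uid s

(* Toy index from uid to offset. *)
let index : (uid, int64) Hashtbl.t = Hashtbl.create 16

(* Raises Not_found when the uid is unknown. *)
let where (uid : uid) : int64 = Hashtbl.find index uid

let some_uid = uid_rw (String.make uid_ln '\x2a')
let () = Hashtbl.replace index some_uid 12L
```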
Weight of object.
Before extracting an object, we must know the resources needed to extract it. weight_of_offset/weight_of_uid perform a simple analysis and return the largest length needed to store the requested object, such as:
weight_of_offset unix ~map t ~weight:null 0L >>= fun weight ->
assert ((null :> int) <= (weight :> int)) ;
Fmt.epr "Object at %08Lx needs %d byte(s).\n%!" 0L (weight :> int) ;
let resource = make_raw ~weight in
...
An object can need another object (see OBJ_OFS_DELTA and OBJ_REF_DELTA). In this case, the resource must be large enough to store both objects. So the analysis is recursive over the delta-chain.
Note. If the given PACK file represented by t is malformed, Cycle is raised. It means that an object A refers to an object B which refers back to object A.
Note. This process is not tail-recursive and discovers at each step whether it needs to continue along the delta-chain.
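The recursion over the delta-chain and the cycle check can be sketched on a toy model. Everything below is an assumption made for the illustration: the PACK file is a table from offsets to entries, an entry only records its inflated size and the offset of its base, and the weight is the largest size met along the chain.

```ocaml
exception Cycle

(* Toy entry: inflated size, plus the offset of its base when it is a delta. *)
type entry = { size : int; base : int64 option }

(* Follow the chain, keep the largest size seen so far and remember the
   visited offsets to detect a chain which loops back on itself. *)
let rec weight_of_offset pack ~weight ?(visited = []) offset =
  if List.mem offset visited then raise Cycle;
  let { size; base } = Hashtbl.find pack offset in
  let weight = max weight size in
  match base with
  | None -> weight
  | Some base -> weight_of_offset pack ~weight ~visited:(offset :: visited) base

let pack : (int64, entry) Hashtbl.t = Hashtbl.create 8

let () =
  Hashtbl.replace pack 12L { size = 100; base = None };
  Hashtbl.replace pack 80L { size = 40; base = Some 12L };
  (* A broken pair of entries which refer to each other. *)
  Hashtbl.replace pack 200L { size = 1; base = Some 300L };
  Hashtbl.replace pack 300L { size = 1; base = Some 200L }

let w = weight_of_offset pack ~weight:0 80L
```

Since the accumulator only grows, the result is never smaller than the ~weight given as input.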
val weight_of_offset :
map:'fd W.map ->
('fd, 'uid) t ->
weight:weight ->
?visited:int64 list ->
int64 ->
weight
weight_of_offset sched ~map t ~weight offset returns the weight of the given object available at offset in t. This function ensures:
weight_of_offset sched ~map t ~weight:a offset >>= fun b ->
assert ((a :> int) <= (b :> int))
Note. This function can try to partially inflate objects. So, this function can use internal buffers and it is not thread-safe.
Note. This function can try to look up another object if it extracts an OBJ_REF_DELTA object. However, if we suppose that we process a PACKv2, an OBJ_REF_DELTA usually points to an external object (see thin-pack).
val weight_of_uid :
map:'fd W.map ->
('fd, 'uid) t ->
weight:weight ->
?visited:int64 list ->
'uid ->
weight
weight_of_uid sched ~map t ~weight uid returns the weight of the given object identified by uid in t. This function provides the same guarantee as weight_of_offset.
Note. Like weight_of_offset, this function can inflate objects and use internal buffers, and it is not thread-safe.
Note. Unlike weight_of_offset, this function looks up the object from the given reference.
val length_of_offset : map:'fd W.map -> ('fd, 'uid) t -> int64 -> int
Value of object.
val of_offset : map:'fd W.map -> ('fd, 'uid) t -> raw -> cursor:int64 -> v
of_offset sched ~map t raw ~cursor is the object at the offset cursor in t. The function is not tail-recursive. It discovers at each step whether the object depends on another one (see OBJ_REF_DELTA or OBJ_OFS_DELTA).
Note. This function does not allocate large resources (or, at least, only the allocate function given to t is able to allocate a large resource). raw (which should be created with the associated weight given by weight_of_offset) is enough to extract the object.
val of_uid : map:'fd W.map -> ('fd, 'uid) t -> raw -> 'uid -> v
Like of_offset, of_uid sched ~map t raw uid is the object identified by uid in t.
Path of object.
Because of_offset/of_uid are not tail-recursive, another solution exists to extract an object from the PACK file. However, this solution requires a meta-data path to be able to extract an object.
A path is the delta-chain of the object. It assumes that a delta-chain cannot be longer than 60 (see Git assumptions). From it, the way to construct an object is well known and the step which discovers whether an object depends on another one is removed; we also ensure that the reconstruction is bounded by our path.
This solution fits well when we want to memoize the extraction.
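The two-phase approach can be sketched on a toy model: first walk the delta-chain to record the path, then rebuild the object with a loop bounded by that path. The entry type, the "append a suffix" delta and the table pack are hypothetical simplifications; real deltas are copy/insert instructions applied to the base.

```ocaml
(* Toy entry: either a full object or a delta appending [suffix] to its base. *)
type entry = Base of string | Delta of { base : int64; suffix : string }

let max_depth = 60

(* Walk the chain once and record the offsets, requested object first. *)
let path_of_offset pack cursor =
  let rec go acc depth cursor =
    if depth > max_depth then failwith "delta-chain too deep";
    match Hashtbl.find pack cursor with
    | Base _ -> List.rev (cursor :: acc)
    | Delta { base; _ } -> go (cursor :: acc) (depth + 1) base
  in
  go [] 1 cursor

(* Rebuild from the base upwards; the loop is bounded by the path length
   and no longer has to discover the dependencies. *)
let of_offset_with_path pack path =
  List.fold_left
    (fun acc cursor ->
      match Hashtbl.find pack cursor with
      | Base s -> s
      | Delta { suffix; _ } -> acc ^ suffix)
    "" (List.rev path)

let pack : (int64, entry) Hashtbl.t = Hashtbl.create 8

let () =
  Hashtbl.replace pack 12L (Base "hello");
  Hashtbl.replace pack 80L (Delta { base = 12L; suffix = " world" });
  Hashtbl.replace pack 150L (Delta { base = 80L; suffix = "!" })

let path = path_of_offset pack 150L
let obj = of_offset_with_path pack path
```

A computed path can be kept next to the offset it was computed for, so that a later extraction skips the discovery step entirely.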
val path_to_list : path -> int64 list
path_to_list path
returns the delta-chain of the given path
.
val kind_of_path : path -> [ `A | `B | `C | `D ]
kind_of_path path returns the kind of the object associated with the given path. The PACK format assumes that all the objects a delta-chain refers to have the same type/kind.
val path_of_offset : map:'fd W.map -> ('fd, 'uid) t -> cursor:int64 -> path
path_of_offset sched ~map t ~cursor is the path of the given object available at cursor.
Note. This function can try to partially inflate objects. So, this function can use internal buffers and it is not thread-safe.
Note. This function can try to look up another object if it extracts an OBJ_REF_DELTA object. However, if we suppose that we process a PACKv2, an OBJ_REF_DELTA usually points to an external object (see thin-pack).
val path_of_uid : map:'fd W.map -> ('fd, 'uid) t -> 'uid -> path
path_of_uid sched ~map t uid is the path of the given object identified by uid in t.
Note. Like weight_of_offset, this function can inflate objects and use internal buffers, and it is not thread-safe.
Note. Unlike weight_of_offset, this function looks up the object from the given reference.
val of_offset_with_path :
map:'fd W.map ->
('fd, 'uid) t ->
path:path ->
raw ->
cursor:int64 ->
v
of_offset_with_path sched ~map t ~path raw ~cursor is the object available at cursor in t. This function is tail-recursive and bounded by the given path.
Uid of object.
The unique identifier of an object is a user-defined type which is not described by the format of the PACK file. Consequently, the way to digest an object is at the user's discretion. For example, Git prepends a header to the value, such as:
let digest v =
  let kind = match kind v with
    | `A -> "commit"
    | `B -> "tree"
    | `C -> "blob"
    | `D -> "tag" in
  (* Git-style header: kind, length, then a NUL byte *)
  let hdr = Fmt.str "%s %d\000" kind (len v) in
  let ctx = Digest.empty in
  feed_string ctx hdr ;
  feed_bigstring ctx (Bigstringaf.sub (raw v) 0 (len v)) ;
  finalize ctx
Of course, the user can decide how to digest a value (see digest). However, two objects with the same contents but different types should have different unique identifiers.
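The requirement that the kind takes part in the identifier can be checked with a self-contained sketch. It uses the standard library's Digest module (MD5) purely as a stand-in, which is an assumption made so the example runs without extra dependencies; Git itself uses SHA-1. The helper names string_of_kind and header are hypothetical.

```ocaml
let string_of_kind = function
  | `A -> "commit"
  | `B -> "tree"
  | `C -> "blob"
  | `D -> "tag"

(* Git-style header: "<kind> <length>\000". *)
let header ~kind ~len = Printf.sprintf "%s %d\000" (string_of_kind kind) len

(* Stand-in digest: the stdlib's MD5, NOT the SHA-1 which Git uses. *)
let digest ~kind contents =
  Digest.string (header ~kind ~len:(String.length contents) ^ contents)

(* Same contents, different kinds. *)
let blob = digest ~kind:`C "hello"
let tag = digest ~kind:`D "hello"
```

Because the header is hashed together with the contents, the two identifiers differ although the payload is the same.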
type 'uid digest =
kind:[ `A | `B | `C | `D ] ->
?off:int ->
?len:int ->
Bigstringaf.t ->
'uid
val uid_of_offset :
map:'fd W.map ->
digest:'uid digest ->
('fd, 'uid) t ->
raw ->
cursor:int64 ->
[ `A | `B | `C | `D ] * 'uid
val uid_of_offset_with_source :
map:'fd W.map ->
digest:'uid digest ->
('fd, 'uid) t ->
kind:[ `A | `B | `C | `D ] ->
raw ->
depth:int ->
cursor:int64 ->
'uid
type 'uid children = cursor:int64 -> uid:'uid -> int64 list
type where = cursor:int64 -> int
Verify.
When the user gets a PACK file, they must generate an IDX file (see Idx) from it in order to look up objects from their uid. Verify is a process which tries to create an OCaml representation of the IDX file. This process requires some information (see oracle) which can be collected by a first analysis (see Fp). The process then takes the opportunity to parallelize extraction (depending on the IO implementation).
module Ip
(Scheduler : sig ... end)
(IO : sig ... end)
(Uid : sig ... end) :
sig ... end