package zipc

ZIP archives.

Consult the quick start and limitations.

References.

Archive members

type compression =
  | Bzip2
  | Deflate  (* Via Zipc_deflate. *)
  | Lzma
  | Stored  (* No compression. *)
  | Xz
  | Zstd
  | Other of int

The type for compression formats.

Zipc only handles Stored and Deflate, but third-party libraries can be used to support other formats or to plug in an alternate implementation of Deflate.

val pp_compression : Stdlib.Format.formatter -> compression -> unit

pp_compression formats compression formats.

module Fpath : sig ... end

File paths and modes.

module Ptime : sig ... end

POSIX time.

module File : sig ... end

Archive file data.

module Member : sig ... end

Archive members.

Archives

type t

The type for ZIP archives.

val empty : t

empty is an empty archive.

val is_empty : t -> bool

is_empty z is true iff z is empty.

val mem : Fpath.t -> t -> bool

mem p z is true iff z has a member with path p.

val find : Fpath.t -> t -> Member.t option

find p z is the member with path p of z (if any).

val fold : (Member.t -> 'a -> 'a) -> t -> 'a -> 'a

fold f z acc folds f over the members of z starting with acc in increasing lexicographic member path order. In particular this means that directory members, if they exist, are folded over before any of their content (assuming paths without relative segments).
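
The ordering claim can be checked without Zipc. The following is a minimal sketch, assuming the member order is the byte-wise order of String.compare (the order used by Stdlib.Map.Make(Stdlib.String), which to_string_map exposes). The paths and the names String_map, archive_like and visited are made up for illustration.

```ocaml
module String_map = Map.Make (String)

(* Hypothetical member paths; directory paths end with '/'. *)
let paths =
  [ "docs/img/logo.png"; "docs/"; "README"; "docs/index.html"; "docs/img/" ]

(* Stand-in for an archive: a map keyed by member path. *)
let archive_like =
  List.fold_left (fun m p -> String_map.add p () m) String_map.empty paths

(* Folding visits keys in increasing order, so each directory path is
   seen before the paths it prefixes. *)
let visited =
  List.rev (String_map.fold (fun p () acc -> p :: acc) archive_like [])
```

Here "docs/" is visited before "docs/img/" and "docs/index.html", and "docs/img/" before "docs/img/logo.png", because a string always sorts before any longer string it is a prefix of.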

val add : Member.t -> t -> t

add member z is z with member added. Overrides a previous member with the same path in z (if any).

val remove : Fpath.t -> t -> t

remove p z is z with the member with path p removed (if any).

val member_count : t -> int

member_count z is the number of members in z.

val to_string_map : t -> Member.t Stdlib.Map.Make(Stdlib.String).t

to_string_map z is z as a map from member paths (see Member.path) to members.

val of_string_map : Member.t Stdlib.Map.Make(Stdlib.String).t -> t

of_string_map map is map as a ZIP archive.

Warning. It is assumed that in map each key k maps to a member m with Member.path m = k. This is not checked by the function.
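
Since the invariant is not checked, a caller may want to verify it before calling of_string_map. The following is a minimal sketch of such a check. The name keys_match_paths is hypothetical, and the function is written against any map value type so that it can be tried without Zipc; with Zipc one would presumably pass ~path:Zipc.Member.path.

```ocaml
module String_map = Map.Make (String)

(* [keys_match_paths ~path map] is [true] iff every key [k] of [map]
   is bound to a value [m] with [path m = k]. *)
let keys_match_paths ~path map =
  String_map.for_all (fun key m -> String.equal key (path m)) map
```

Because OCaml functors are applicative, String_map.t here is the same type as Stdlib.Map.Make(Stdlib.String).t, so the check applies directly to the maps taken by of_string_map.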

Decode

val string_has_magic : string -> bool

string_has_magic s is true iff s has at least 4 bytes and starts with PK\x03\x04 or PK\x05\x06 (empty archive).
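
For illustration, the test described above can be sketched with the standard library alone. This is not Zipc's implementation, only an equivalent check under the stated definition; the name has_zip_magic is hypothetical.

```ocaml
(* [has_zip_magic s] is [true] iff [s] has at least 4 bytes and starts
   with a local file header signature or with the end of central
   directory signature (the latter starts an empty archive). *)
let has_zip_magic s =
  String.length s >= 4 &&
  (match String.sub s 0 4 with
   | "PK\x03\x04" | "PK\x05\x06" -> true
   | _ -> false)
```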

val of_binary_string : string -> (t, string) Stdlib.result

of_binary_string s decodes a ZIP archive from s.

Note. ZIP archives' integrity constraints are unclear. For now, based on sanity and on certain archives found in the wild that are supported by the unzip tool, the following is done:

  • As a rule of thumb, all member metadata is determined only from the archive's central directory file header; local file headers and data descriptors are ignored.
  • If a directory member pretends to have file data this data is ignored.
  • If a path is defined more than once, the second definition takes over.
  • If the central directory CRC-32 of a file member is 0 we look up and use the value found in its local file header.

Encode

val encoding_size : t -> int

encoding_size z is the number of bytes needed to encode z.

val to_binary_string : ?first:Fpath.t -> t -> (string, string) Stdlib.result

to_binary_string z is the encoding of archive z. Error _ is returned with a suitable error message in case z has more members than Member.max.

If a member with path first exists in z then this member's data is written first in the ZIP archive. It defaults to "mimetype" to support the EPUB OCF ZIP container constraint (you are however in charge of making sure this member is not compressed in this case).

Note.

  • Member.mtime values that are before Ptime.dos_epoch are silently truncated to that date.
  • Except for first, member data is encoded in the (deterministic) increasing lexical order of their path.
  • The encoding does not use data descriptors, so bit 3 of File.gp_flags is always set to 0 on encoding.

val write_bytes : ?first:Fpath.t -> t -> ?start:int -> bytes -> (unit, string) Stdlib.result

write_bytes z ~start b writes the encoding of z (see to_binary_string) to b starting at start (defaults to 0).

Raises Invalid_argument if b is too small.
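
The decoding and encoding functions can be combined into a round trip. The following is a sketch that requires the zipc library to be installed. It assumes that Zipc.File.stored_of_binary_string, Zipc.File.to_binary_string, Zipc.Member.make, Zipc.Member.kind and the Dir and File constructors behave as used here; these live in the File and Member submodules, whose signatures are elided on this page. The names zip_string and unzip_member are illustrative.

```ocaml
(* Requires the zipc library (assumption: installed, e.g. via opam). *)
let ( let* ) = Result.bind

(* Encode an archive with a single uncompressed member [path]
   holding [data]. *)
let zip_string ~path data =
  let* file = Zipc.File.stored_of_binary_string data in
  let* member = Zipc.Member.make ~path (Zipc.Member.File file) in
  Zipc.to_binary_string (Zipc.add member Zipc.empty)

(* Decode [archive] and extract the contents of the member [path]. *)
let unzip_member ~path archive =
  let* z = Zipc.of_binary_string archive in
  match Option.map Zipc.Member.kind (Zipc.find path z) with
  | None -> Error (Printf.sprintf "No member %s in archive" path)
  | Some Zipc.Member.Dir -> Error (Printf.sprintf "%s: is a directory" path)
  | Some (Zipc.Member.File file) -> Zipc.File.to_binary_string file
```

Every step returns a result, so errors from member creation, encoding and decoding all propagate through the let* binding operator (OCaml 4.08 or later).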

Limitations

Up to the limitations listed below Zipc is suitable for the following:

  • Reading and writing the subset of ZIP archives defined by ISO/IEC 21320-1, which is used as a document container for the Office Open XML or OpenDocument file formats. This subset mandates only stored or deflate compression formats.
  • Reading and writing the EPUB file format which loosely refers to the previous standard in its definition. These may however be ZIP64 if needed (see below).
  • Reading and writing dozens of other formats that are based on ZIP like .jar, .usdz (mandates no compression), .kmz, etc. Note that these formats do not always formally restrict the compression formats but deflate seems to be widely used.

It is not the aim of Zipc to be able to read every ZIP archive out there. The format is quite loose, highly denormalized, has plenty of ways to encode metadata and allows many modern and legacy compression algorithms to be used. Hence take into account the following points:

  • The current implementation is simple: it needs the whole archive in memory for encoding or decoding.
  • The current implementation does not preserve the information about the order of files in the ZIP archive and generally writes members in the lexicographic order of their path, save for the first one, which can be specified with the optional argument first in Zipc.to_binary_string and defaults to "mimetype". This supports the EPUB OCF ZIP container constraint, which is the only format we are aware of that mandates an ordering in ZIP archives. A more general scheme (e.g. a Zipc.Member.order property) could be devised should that be needed.
  • It handles only deflate and stored (no compression) compression formats. It has decent performance but if you find yourself limited by it or need other formats, third-party compression libraries can be easily integrated.
  • It is possible to rewrite an archive without touching or decompressing some of its members, however some metadata like comment fields may be lost in the process. See also of_binary_string.
  • For now it does not handle ZIP64. ZIP64 is needed if your ZIP archive or decompressed file sizes exceed 4GB (2^32-1 bytes) or if you need more than 65535 archive members.
  • It does not handle encrypted ZIP archives. Most standards avoid this anyways.
  • It does not handle multipart archives. Most standards avoid this anyways.
  • On 32-bit platforms one is severely limited by Sys.max_string_size.
  • Compressed and decompressed sizes are uint32 values in ZIP archives but are represented by an OCaml int in Zipc. This is not a problem on 64-bit platforms but can be one on 32-bit platforms and js_of_ocaml, where Int.max_int is respectively 2^30-1 and 2^31-1. See Zipc.File.max_size for more information.
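
The numeric limits listed above can be made concrete with a small standard library sketch. It is not part of Zipc; the names max_uint32, max_member_count, needs_zip64 and fits_in_int are hypothetical, and sizes are modeled as Int64 values so that the checks are meaningful on 32-bit platforms too.

```ocaml
(* Limits of the non-ZIP64 format: sizes are uint32 values and the
   member count is a uint16 value. *)
let max_uint32 = 0xFFFF_FFFFL   (* 2^32 - 1 *)
let max_member_count = 0xFFFF   (* 65535 *)

(* [needs_zip64 ~member_count ~sizes] is [true] iff the member count
   or one of the byte sizes (archive or decompressed file sizes)
   exceeds what a non-ZIP64 archive can express. *)
let needs_zip64 ~member_count ~sizes =
  member_count > max_member_count ||
  List.exists (fun size -> Int64.compare size max_uint32 > 0) sizes

(* [fits_in_int size] is [true] iff [size] can be represented as a
   non-negative OCaml [int] on the running platform. *)
let fits_in_int size =
  Int64.compare size 0L >= 0 &&
  Int64.compare size (Int64.of_int max_int) <= 0
```

On a 64-bit platform fits_in_int accepts every uint32 size. On a 32-bit platform it rejects sizes above 2^30-1, which is the situation the last limitation describes.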