package bap-byteweight


Byteweight algorithm interface.

Byteweight is a supervised machine learning algorithm. Given an input string in which each substring is labeled true or false, it infers a function that maps substrings into the boolean domain.

For example, if the labeling function tells whether a given substring is the start of a function, the inferred function can be used to find function starts.
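As a rough illustration of the labeling step, the sketch below models the input as an OCaml string of machine-code bytes and the labels as a hypothetical list of known function-start offsets (starts). A substring is labeled true exactly when it begins at one of those offsets. The names is_function_start and labeled_substrings are illustrative only and are not part of this library.

```ocaml
(* Hypothetical labeling function: a substring is a positive example
   exactly when it starts at one of the known function-start offsets. *)
let is_function_start ~starts pos = List.mem pos starts

(* Enumerate every substring of [code] whose length is between 1 and
   [max_length], paired with its boolean label. *)
let labeled_substrings ~starts ~max_length code =
  let n = String.length code in
  let acc = ref [] in
  for pos = 0 to n - 1 do
    for len = 1 to min max_length (n - pos) do
      acc := (String.sub code pos len, is_function_start ~starts pos) :: !acc
    done
  done;
  List.rev !acc
```

For instance, with code = "\x55\x89\xe5\xc3\x55\x89\xe5" (two copies of the x86 prologue push ebp; mov ebp, esp around a ret) and starts = [0; 4], both occurrences of "\x55\x89" are labeled true, while "\x89\xe5" is labeled false.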

type t
include Bin_prot.Binable.S with type t := t
include Bin_prot.Binable.S_only_functions with type t := t
val bin_size_t : t Bin_prot.Size.sizer
val bin_write_t : t Bin_prot.Write.writer
val bin_read_t : t Bin_prot.Read.reader
val __bin_read_t__ : (int -> t) Bin_prot.Read.reader

This function only needs an implementation if t is exposed as a polymorphic variant. Despite what its type suggests, it does *not* produce a function after reading; instead, it takes the constructor tag (int) before reading and reads the rest of the variant t afterwards.

val bin_shape_t : Bin_prot.Shape.t
val bin_writer_t : t Bin_prot.Type_class.writer
val bin_reader_t : t Bin_prot.Type_class.reader
include Ppx_sexp_conv_lib.Sexpable.S with type t := t
val t_of_sexp : Sexplib0.Sexp.t -> t
val sexp_of_t : t -> Sexplib0.Sexp.t
type key
type corpus
val create : unit -> t

create () creates an empty instance of the byteweight decider.

val train : t -> max_length:int -> (key -> bool) -> corpus -> unit

train decider ~max_length test corpus trains the decider on the specified corpus. The test function classifies the extracted substrings, and max_length bounds the length of the extracted substrings.
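A minimal sketch of what training might look like, assuming a plain hash table rather than whatever structure the library actually uses. The types key, corpus and t below are hypothetical stand-ins for the abstract types declared above, and the counts (h0, h1) record how often a substring was labeled false and true.

```ocaml
(* Hypothetical stand-ins for the abstract [key] and [corpus] types:
   a key carries its offset so that [test] can consult external labels. *)
type key = { pos : int; data : string }
type corpus = string

(* Hypothetical decider: maps each substring to (h0, h1), the number of
   times it was labeled false and true, respectively. *)
type t = (string, int * int) Hashtbl.t

let create () : t = Hashtbl.create 64

(* Visit every substring of length 1..max_length and bump the counter
   that matches the label returned by [test]. *)
let train (t : t) ~max_length (test : key -> bool) (corpus : corpus) =
  let n = String.length corpus in
  for pos = 0 to n - 1 do
    for len = 1 to min max_length (n - pos) do
      let data = String.sub corpus pos len in
      let h0, h1 =
        match Hashtbl.find_opt t data with Some c -> c | None -> (0, 0)
      in
      let counts = if test { pos; data } then (h0, h1 + 1) else (h0 + 1, h1) in
      Hashtbl.replace t data counts
    done
  done

(* Number of distinct substrings the decider has seen. *)
let length (t : t) = Hashtbl.length t
```

With this model, training on "\x55\x89\xe5\xc3\x55\x89\xe5" with ~max_length:2 and a test that accepts offsets 0 and 4 records "\x55\x89" with counts (0, 2), and length reports 8 distinct substrings.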

val length : t -> int

length decider is the total number of distinct substrings known to decider.

val next : t -> length:int -> threshold:float -> corpus -> int -> int option

next t ~length ~threshold data begin finds the next positive chunk: it returns the offset, greater than begin, of the next longest substring of at most length bytes for which h1 / (h0 + h1) > threshold, where h0 and h1 count how often the substring was labeled false and true during training. It returns None if there is no such offset.

This is a specialization of the next_if function from the extended V1.V2.S interface.
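The scan that next performs could be sketched as follows, again over a hypothetical hash table mapping substrings to (h0, h1) counts; this is not the library's implementation. Since begin is an OCaml keyword, the starting offset is called start here. Scoring the longest known substring at each offset is an assumption read from the description above.

```ocaml
(* Hypothetical decider, not the library's type: substring -> (h0, h1). *)
type t = (string, int * int) Hashtbl.t

(* Fraction of training occurrences labeled true: h1 / (h0 + h1). *)
let score (h0, h1) = float_of_int h1 /. float_of_int (h0 + h1)

(* Counts of the longest known substring that starts at [pos] and is at
   most [length] bytes long, if any. *)
let longest_known (t : t) ~length data pos =
  let rec go len =
    if len <= 0 then None
    else
      match Hashtbl.find_opt t (String.sub data pos len) with
      | Some counts -> Some counts
      | None -> go (len - 1)
  in
  go (min length (String.length data - pos))

(* First offset strictly greater than [start] whose longest known
   substring scores above [threshold]; [None] if there is none.
   ([begin] is an OCaml keyword, hence the name [start].) *)
let next (t : t) ~length ~threshold data start =
  let rec loop pos =
    if pos >= String.length data then None
    else
      match longest_known t ~length data pos with
      | Some counts when score counts > threshold -> Some pos
      | _ -> loop (pos + 1)
  in
  loop (start + 1)
```

Preferring the longest known substring lets a longer, more specific match (say "\x55\x89" with counts (0, 2), score 1.0) take precedence over a weaker shorter one ("\x55" with counts (1, 3), score 0.75).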
val pp : Format.formatter -> t -> unit

pp ppf decider prints all chunks known to decider.
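For illustration only, a possible printer over the same kind of hypothetical table (substring to (h0, h1) counts). The output layout, hex bytes followed by the counts, one chunk per line, is an assumption and not the format the library produces.

```ocaml
(* Hypothetical decider: substring -> (h0, h1) counts. *)
type t = (string, int * int) Hashtbl.t

(* One chunk: its bytes in hex, then its false/true counts. *)
let pp_chunk ppf (chunk, (h0, h1)) =
  String.iter (fun c -> Format.fprintf ppf "%02x " (Char.code c)) chunk;
  Format.fprintf ppf "h0=%d h1=%d" h0 h1

(* All chunks, one per line inside a vertical box, sorted so that the
   output is deterministic. *)
let pp ppf (t : t) =
  let chunks = Hashtbl.fold (fun k v acc -> (k, v) :: acc) t [] in
  Format.fprintf ppf "@[<v>%a@]"
    (Format.pp_print_list pp_chunk)
    (List.sort compare chunks)
```

Using a vertical box with pp_print_list's default cut separator keeps the line breaks under Format's control, so the printer composes with other Format-based printers.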