Substrings.
A substring defines a possibly empty subsequence of bytes in a base string.
The positions of a string s of length l are the slits found before each byte and after the last byte of the string. They are labelled from left to right by increasing number in the range [0;l].
positions  0   1   2   3   4    l-1    l
           +---+---+---+---+     +-----+
  indices  | 0 | 1 | 2 | 3 | ... | l-1 |
           +---+---+---+---+     +-----+
The ith byte index is between positions i and i+1.
Formally, we define a substring of s as a subsequence of bytes delimited by a start and a stop position. The former is always smaller than or equal to the latter. When both positions are equal the substring is empty. Note that for a given base string there are as many empty substrings as there are positions in the string.
As in strings, we index the bytes of a substring using zero-based indices.
See how to use substrings to parse data.
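The position and index conventions above can be made concrete with a small model. The sketch below uses only the OCaml standard library; the record type, its field names and the error handling are assumptions made for this illustration and do not reflect the library's internal representation.

```ocaml
(* Illustrative model only: the record layout is an assumption made for
   this example, not the library's internal representation. *)
type sub = { base : string; start : int; stop : int }

(* Build a substring from two positions in [0;l] with start <= stop.
   Rejecting invalid positions with Invalid_argument is an assumption. *)
let v ?(start = 0) ?stop base =
  let l = String.length base in
  let stop = match stop with None -> l | Some p -> p in
  if start < 0 || stop > l || start > stop
  then invalid_arg "invalid start or stop position"
  else { base; start; stop }

(* Number of bytes between the two positions. *)
let length s = s.stop - s.start

(* Byte at zero-based index [i] of the substring: it sits between
   positions [start + i] and [start + i + 1] of the base string. *)
let get s i =
  if i < 0 || i >= length s then invalid_arg "index out of bounds"
  else s.base.[s.start + i]

let to_string s = String.sub s.base s.start (length s)

let () =
  let s = v ~start:2 ~stop:6 "Revolt now!" in
  assert (to_string s = "volt");
  assert (get s 0 = 'v');
  (* Equal positions give an empty substring; a base string of length l
     has l + 1 positions, hence l + 1 distinct empty substrings. *)
  assert (length (v ~start:3 ~stop:3 "Revolt now!") = 0)
```

In this model an index is always relative to the substring, while start and stop are always positions in the base string.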
type t = sub
The type for substrings.
val empty : sub
empty is the empty substring of the empty string String.empty.
val v : ?start:int -> ?stop:int -> string -> sub
v ~start ~stop s is the substring of s that starts at position start (defaults to 0) and stops at position stop (defaults to String.length s).
val start_pos : sub -> int
start_pos s is s's start position in the base string.
val stop_pos : sub -> int
stop_pos s is s's stop position in the base string.
val base_string : sub -> string
base_string s is s's base string.
val length : sub -> int
length s is the number of bytes in s.
val get : sub -> int -> char
get s i is the byte of s at its zero-based index i.
val get_byte : sub -> int -> int
get_byte s i is Char.to_int (get s i).
val head : ?rev:bool -> sub -> char option
head s is Some (get s h) with h = 0 if rev = false (default) or h = length s - 1 if rev = true. None is returned if s is empty.
val of_string : string -> sub
of_string s is v s.
val to_string : sub -> string
to_string s is the bytes of s as a string.
rebase s is v (to_string s). This puts s on a base string made solely of its bytes.
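The effect of rebase on positions can be sketched with a small standard-library model. The record type below is an assumption made for illustration, not the library's representation.

```ocaml
(* Illustrative model only; the record type is an assumption. *)
type sub = { base : string; start : int; stop : int }

let to_string s = String.sub s.base s.start (s.stop - s.start)

(* Copy the bytes of [s] into a fresh base string and span all of it. *)
let rebase s =
  let base = to_string s in
  { base; start = 0; stop = String.length base }

let () =
  let s = { base = "Revolt now!"; start = 2; stop = 6 } in
  let r = rebase s in
  (* The bytes are unchanged, but the base string and positions differ. *)
  assert (to_string r = to_string s);
  assert (r.base = "volt");
  assert (r.start = 0 && r.stop = 4)
```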
val hash : sub -> int
hash s is Hashtbl.hash s.
See the graphical guide.
tail s is s without its first (rev is false, default) or last (rev is true) byte or s if it is empty.
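The head and tail descriptions can be read as simple position arithmetic. The following sketch models them with the standard library only; the record type is an assumption made for this example.

```ocaml
(* Illustrative model only; the record type is an assumption. *)
type sub = { base : string; start : int; stop : int }

let length s = s.stop - s.start

(* First byte (or last byte when [rev] is true), None on empty input. *)
let head ?(rev = false) s =
  if length s = 0 then None
  else Some (if rev then s.base.[s.stop - 1] else s.base.[s.start])

(* Drop the first byte (or the last when [rev] is true) by moving one
   position; an empty substring is returned unchanged. *)
let tail ?(rev = false) s =
  if length s = 0 then s
  else if rev then { s with stop = s.stop - 1 }
  else { s with start = s.start + 1 }

let () =
  let a = { base = "Revolt now!"; start = 2; stop = 6 } in
  assert (head a = Some 'v');
  assert (head ~rev:true a = Some 't');
  assert (tail a = { a with start = 3 });
  assert (tail ~rev:true a = { a with stop = 5 })
```

Note that in this model the result of tail stays on the same base string; only one position moves.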
extend ~rev ~max ~sat s extends s by at most max consecutive sat satisfying bytes of the base string located after stop s (rev is false, default) or before start s (rev is true). If max is unspecified the extension is limited by the extents of the base string of s. sat defaults to fun _ -> true.
reduce ~rev ~max ~sat s reduces s by at most max consecutive sat satisfying bytes of s located before stop s (rev is false, default) or after start s (rev is true). If max is unspecified the reduction is limited by the extents of the substring s. sat defaults to fun _ -> true.
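Extending and reducing both amount to moving one position while a predicate holds. The sketch below models the two descriptions with the standard library only. The record type is an assumption, and so is the requirement that max be non-negative; the optional max argument is bound to a local name m so that the standard max function stays usable.

```ocaml
(* Illustrative model only; the record type is an assumption. *)
type sub = { base : string; start : int; stop : int }

(* Move [stop] right (or [start] left when [rev] is true) over at most
   [m] consecutive bytes of the base string that satisfy [sat].
   Assumes [m] is non-negative. *)
let extend ?(rev = false) ?max:m ?(sat = fun _ -> true) s =
  if not rev then begin
    let l = String.length s.base in
    let limit = match m with None -> l | Some m -> min l (s.stop + m) in
    let rec go p = if p < limit && sat s.base.[p] then go (p + 1) else p in
    { s with stop = go s.stop }
  end else begin
    let limit = match m with None -> 0 | Some m -> max 0 (s.start - m) in
    let rec go p = if p > limit && sat s.base.[p - 1] then go (p - 1) else p in
    { s with start = go s.start }
  end

(* Move [stop] left (or [start] right when [rev] is true) over at most
   [m] consecutive bytes of the substring that satisfy [sat]. *)
let reduce ?(rev = false) ?max:m ?(sat = fun _ -> true) s =
  if not rev then begin
    let limit = match m with
      | None -> s.start
      | Some m -> max s.start (s.stop - m) in
    let rec go p = if p > limit && sat s.base.[p - 1] then go (p - 1) else p in
    { s with stop = go s.stop }
  end else begin
    let limit = match m with
      | None -> s.stop
      | Some m -> min s.stop (s.start + m) in
    let rec go p = if p < limit && sat s.base.[p] then go (p + 1) else p in
    { s with start = go s.start }
  end

let () =
  let a = { base = "Revolt now!"; start = 2; stop = 6 } in
  (* Without max or sat, extension runs to the end of the base string. *)
  assert (extend a = { a with stop = 11 });
  assert (extend ~rev:true a = { a with start = 0 });
  (* Without max or sat, reduction runs until the substring is empty. *)
  assert (reduce a = { a with stop = 2 });
  assert (reduce ~rev:true a = { a with start = 6 })
```

The key difference the model makes visible: extend inspects bytes of the base string outside s, whereas reduce only inspects bytes inside s.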
extent s s' is the smallest substring that includes all the positions of s and s'.
overlap s s' is the smallest substring that includes all the positions common to s and s' or None if there are no such positions. Note that the overlap substring may be empty.
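Both operations reduce to taking minima and maxima of positions. The sketch below is a standard-library model, with the record type as an assumption. Rejecting substrings that live on different base strings with Invalid_argument is also an assumption of this example; the text above does not say what happens in that case.

```ocaml
(* Illustrative model only; the record type is an assumption. *)
type sub = { base : string; start : int; stop : int }

(* Assumption of this sketch: both arguments must share the same base
   string (physical equality), otherwise Invalid_argument is raised. *)
let check_same_base s s' =
  if s.base != s'.base then invalid_arg "different base strings"

(* Smallest substring covering every position of [s] and [s']. *)
let extent s s' =
  check_same_base s s';
  { s with start = min s.start s'.start; stop = max s.stop s'.stop }

(* Smallest substring covering the common positions, if any. The result
   is empty when exactly one position is shared. *)
let overlap s s' =
  check_same_base s s';
  let start = max s.start s'.start and stop = min s.stop s'.stop in
  if start > stop then None else Some { s with start; stop }

let () =
  let base = "Revolt now!" in
  let a = { base; start = 2; stop = 6 } in
  let c = { base; start = 7; stop = 7 } in
  let d = { base; start = 7; stop = 11 } in
  assert (extent a c = { base; start = 2; stop = 7 });
  (* a stops at position 6 and c sits at position 7: nothing in common. *)
  assert (overlap a c = None);
  (* d and c share position 7 only: the overlap exists but is empty. *)
  assert (overlap d c = Some c)
```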
append s s' is like String.append. The substrings can be on different bases and the result is on a base string that holds exactly the appended bytes.
concat ~sep ss is like String.concat. The substrings can all be on different bases and the result is on a base string that holds exactly the concatenated bytes.
val is_empty : sub -> bool
is_empty s is length s = 0.
is_prefix is like String.is_prefix. Only bytes are compared; affix can be on a different base string.
is_infix is like String.is_infix. Only bytes are compared; affix can be on a different base string.
is_suffix is like String.is_suffix. Only bytes are compared; affix can be on a different base string.
val for_all : (char -> bool) -> sub -> bool
for_all is like String.for_all on the substring.
val exists : (char -> bool) -> sub -> bool
exists is like String.exists on the substring.
same_base s s' is true iff the substrings s and s' have the same base string according to physical equality.
equal_bytes s s' is true iff the substrings s and s' have exactly the same bytes. The substrings can be on a different base string.
compare_bytes s s' compares the bytes of s and s' in lexicographical order. The substrings can be on a different base string.
compare s s' compares the positions of s and s' in lexicographical order.
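The difference between comparing bytes and comparing positions is easy to miss. The sketch below models the four functions with the standard library only; the record type is an assumption, and compare_pos is a hypothetical name used here for the position comparison to avoid shadowing the standard compare.

```ocaml
(* Illustrative model only; the record type is an assumption. *)
type sub = { base : string; start : int; stop : int }

let to_string s = String.sub s.base s.start (s.stop - s.start)

(* Physical equality of the base strings. *)
let same_base s s' = s.base == s'.base

(* Byte-wise comparisons ignore where the bytes live. *)
let equal_bytes s s' = String.equal (to_string s) (to_string s')
let compare_bytes s s' = String.compare (to_string s) (to_string s')

(* Position comparison: start first, then stop. [compare_pos] is a
   hypothetical name chosen for this sketch. *)
let compare_pos s s' = compare (s.start, s.stop) (s'.start, s'.stop)

let () =
  let base = "abab" in
  let x = { base; start = 0; stop = 2 } in
  let y = { base; start = 2; stop = 4 } in
  (* Same bytes "ab", but at different positions of the same base. *)
  assert (same_base x y);
  assert (equal_bytes x y);
  assert (compare_bytes x y = 0);
  assert (compare_pos x y < 0)
```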
Extracted substrings are always on the same base string as the substring s acted upon.
with_range is like String.sub_with_range. The indices are the substring's zero-based ones, not those in the base string.
with_index_range is like String.sub_with_index_range. The indices are the substring's zero-based ones, not those in the base string.
trim is like String.trim. If all bytes are dropped, returns an empty substring located in the middle of the argument.
span is like String.span. For a substring s a left empty span is start s and a right empty span is stop s.
take is like String.take.
drop is like String.drop.
cut is like String.cut. sep can be on a different base string.
cuts is like String.cuts. sep can be on a different base string.
fields is like String.fields.
find ~rev sat s is the substring of s (if any) that spans the first byte that satisfies sat in s after position start s (rev is false, default) or before stop s (rev is true). None is returned if there is no matching byte in s.
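The find description can be sketched as a scan that returns a one-byte substring on the same base string. This is a standard-library model; the record type is an assumption made for the example.

```ocaml
(* Illustrative model only; the record type is an assumption. *)
type sub = { base : string; start : int; stop : int }

(* Scan forward from [start] (or backward from [stop] when [rev] is
   true) and return the one-byte substring spanning the first match. *)
let find ?(rev = false) sat s =
  let hit i = Some { s with start = i; stop = i + 1 } in
  let rec fwd i =
    if i >= s.stop then None
    else if sat s.base.[i] then hit i
    else fwd (i + 1)
  in
  let rec bwd i =
    if i < s.start then None
    else if sat s.base.[i] then hit i
    else bwd (i - 1)
  in
  if rev then bwd (s.stop - 1) else fwd s.start

let () =
  let a = { base = "Revolt now!"; start = 2; stop = 6 } in
  assert (find (fun c -> c = 'o') a = Some { a with start = 3; stop = 4 });
  (* Bytes of the base string outside the substring are never examined. *)
  assert (find (fun c -> c = 'R') a = None)
```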
find_sub ~rev ~sub s is the substring of s (if any) that spans the first match of sub in s after position start s (rev is false, default) or before stop s (rev is true). Only bytes are compared and sub can be on a different base string. None is returned if there is no match of sub in s.
filter sat s is like String.filter. The result is on a base string that holds only the filtered bytes.
filter_map f s is like String.filter_map. The result is on a base string that holds only the filtered bytes.
map is like String.map. The result is on a base string that holds only the mapped bytes.
mapi is like String.mapi. The result is on a base string that holds only the mapped bytes. The indices are the substring's zero-based ones, not those in the base string.
val fold_left : ('a -> char -> 'a) -> 'a -> sub -> 'a
fold_left is like String.fold_left.
val fold_right : (char -> 'a -> 'a) -> sub -> 'a -> 'a
fold_right is like String.fold_right.
val iter : (char -> unit) -> sub -> unit
iter is like String.iter.
val iteri : (int -> char -> unit) -> sub -> unit
iteri is like String.iteri. The indices are the substring's zero-based ones, not those in the base string.
val pp : Stdlib.Format.formatter -> sub -> unit
pp ppf s prints s's bytes on ppf.
val dump : Stdlib.Format.formatter -> sub -> unit
dump ppf s prints s as a syntactically valid OCaml string on ppf using Ascii.escape_string.
val dump_raw : Stdlib.Format.formatter -> sub -> unit
dump_raw ppf s prints an unspecified raw internal representation of s on ppf.
val of_char : char -> sub
of_char c is a substring that contains the byte c.
val to_char : sub -> char option
to_char s is the single byte in s or None if there is no byte or more than one in s.
val of_bool : bool -> sub
of_bool b is a string representation for b. Relies on Stdlib.string_of_bool.
val to_bool : sub -> bool option
to_bool s is a bool from s, if any. Relies on Stdlib.bool_of_string.
val of_int : int -> sub
of_int i is a string representation for i. Relies on Stdlib.string_of_int.
val to_int : sub -> int option
to_int s is an int from s, if any. Relies on Stdlib.int_of_string.
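The conversion functions in this group follow one pattern: of_* produces a representation, and to_* returns an option. The sketch below models the int pair with the standard library only. The record type is an assumption, and the model uses int_of_string_opt to obtain the option, which may differ from how the library wraps Stdlib.int_of_string.

```ocaml
(* Illustrative model only; the record type is an assumption. *)
type sub = { base : string; start : int; stop : int }

let to_string s = String.sub s.base s.start (s.stop - s.start)

(* The representation lives on a fresh base string that holds exactly
   the decimal digits of [i]. *)
let of_int i =
  let base = string_of_int i in
  { base; start = 0; stop = String.length base }

(* Only the bytes of the substring are parsed; the surrounding bytes of
   the base string are ignored. Unparseable input yields None. *)
let to_int s = int_of_string_opt (to_string s)

let () =
  let s = { base = "abc 42 def"; start = 4; stop = 6 } in
  assert (to_int s = Some 42);
  assert (to_int { s with start = 0; stop = 3 } = None);
  assert (to_int (of_int (-7)) = Some (-7))
```

Parsing a field in place like this avoids building an intermediate substring for every token until the final conversion.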
val of_nativeint : nativeint -> sub
of_nativeint i is a string representation for i. Relies on Nativeint.to_string.
val to_nativeint : sub -> nativeint option
to_nativeint s is a nativeint from s, if any. Relies on Nativeint.of_string.
val of_int32 : int32 -> sub
of_int32 i is a string representation for i. Relies on Int32.to_string.
val to_int32 : sub -> int32 option
to_int32 s is an int32 from s, if any. Relies on Int32.of_string.
val of_int64 : int64 -> sub
of_int64 i is a string representation for i. Relies on Int64.to_string.
val to_int64 : sub -> int64 option
to_int64 s is an int64 from s, if any. Relies on Int64.of_string.
val of_float : float -> sub
of_float f is a string representation for f. Relies on Stdlib.string_of_float.
val to_float : sub -> float option
to_float s is a float from s, if any. Relies on Stdlib.float_of_string.
+---+---+---+---+---+---+---+---+---+---+---+
| R | e | v | o | l | t |   | n | o | w | ! |
+---+---+---+---+---+---+---+---+---+---+---+
        |---------------|                     a
        |                                     start a
                        |                     stop a
            |-----------|                     tail a
        |-----------|                         tail ~rev:true a
        |-----------------------------------| extend a
|-----------------------|                     extend ~rev:true a
|-------------------------------------------| base a
|-----------|                                 b
|                                             start b
            |                                 stop b
    |-------|                                 tail b
|-------|                                     tail ~rev:true b
|-------------------------------------------| extend b
|-----------|                                 extend ~rev:true b
|-------------------------------------------| base b
|-----------------------|                     extent a b
        |---|                                 overlap a b
                            |                 c
                            |                 start c
                            |                 stop c
                            |                 tail c
                            |                 tail ~rev:true c
                            |---------------| extend c
|---------------------------|                 extend ~rev:true c
|-------------------------------------------| base c
        |-------------------|                 extent a c
                            None              overlap a c
                            |---------------| d
                            |                 start d
                                            | stop d
                                |-----------| tail d
                            |-----------|     tail ~rev:true d
                            |---------------| extend d
|-------------------------------------------| extend ~rev:true d
|-------------------------------------------| base d
                            |---------------| extent d c
                            |                 overlap d c