package bos
include module type of struct include Astring.String end
Strings, substrings, string sets and maps.
A string s of length l is a zero-based indexed sequence of l bytes. An index i of s is an integer in the range [0;l-1]; it represents the ith byte of s, which can be accessed using the string indexing operator s.[i].
Important. OCaml's strings became immutable in 4.02. Whenever possible, compile your code with the -safe-string option. This module does not expose any mutable operation on strings and assumes strings are immutable. See the porting guide.
String
v len f is a string s of length len with s.[i] = f i for all indices i of s. f is invoked in increasing index order.
get s i is the byte of s at index i. This is equivalent to the s.[i] notation.
head s is Some (get s h) with h = 0 if rev = false (default) or h = length s - 1 if rev = true. None is returned if s is empty.
get_head s is like head but raises Invalid_argument if s is empty.
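For illustration, here is a minimal sketch of the documented behaviour of head and get_head using only the OCaml standard library. It is not Astring's implementation: both functions are redefined locally, and the Invalid_argument message is an arbitrary choice made for this sketch.

```ocaml
(* Sketch of [head]: the first byte, or the last one when [rev] is true;
   [None] on the empty string. *)
let head ?(rev = false) s =
  let len = String.length s in
  if len = 0 then None
  else Some (if rev then s.[len - 1] else s.[0])

(* Sketch of [get_head]: like [head] but raises on the empty string.
   The exception message is illustrative only. *)
let get_head ?rev s =
  match head ?rev s with
  | Some c -> c
  | None -> invalid_arg "get_head: empty string"
```

The option-returning head is the total variant; get_head is convenient when the caller already knows the string is non-empty.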
hash s is Hashtbl.hash s.
Appending strings
concat ~sep ss concatenates the list of strings ss, separating consecutive elements of ss with sep (defaults to the empty string).
Predicates
is_prefix ~affix s is true iff affix.[i] = s.[i] for all indices i of affix.
is_infix ~affix s is true iff there exists an index j in s such that for all indices i of affix we have affix.[i] = s.[j + i].
is_suffix ~affix s is true iff affix.[n - i] = s.[m - i] for all indices i of affix, with n = String.length affix - 1 and m = String.length s - 1.
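The three affix predicates can be sketched with Stdlib String.sub, comparing the affix against the slice of s where it would have to sit. This is an illustrative, allocation-heavy sketch of the documented semantics, not the library's code; the functions are redefined locally.

```ocaml
(* [is_prefix ~affix s]: affix equals the first bytes of s. *)
let is_prefix ~affix s =
  let n = String.length affix in
  n <= String.length s && String.sub s 0 n = affix

(* [is_suffix ~affix s]: affix equals the last bytes of s. *)
let is_suffix ~affix s =
  let n = String.length affix and m = String.length s in
  n <= m && String.sub s (m - n) n = affix

(* [is_infix ~affix s]: affix equals the slice of s starting at some j.
   Each candidate j copies a slice; fine for a sketch, not for hot paths. *)
let is_infix ~affix s =
  let n = String.length affix and m = String.length s in
  let rec try_at j = j + n <= m && (String.sub s j n = affix || try_at (j + 1)) in
  try_at 0
```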
for_all p s is true iff for all indices i of s, p s.[i] = true.
exists p s is true iff there exists an index i of s with p s.[i] = true.
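Both predicates short-circuit: for_all stops at the first byte that fails p, exists at the first byte that satisfies it. A minimal Stdlib-only sketch follows (OCaml 4.13 and later also ship String.for_all and String.exists with the same meaning).

```ocaml
(* [for_all p s]: true on the empty string; stops at the first failure. *)
let for_all p s =
  let len = String.length s in
  let rec loop i = i >= len || (p s.[i] && loop (i + 1)) in
  loop 0

(* [exists p s]: false on the empty string; stops at the first success. *)
let exists p s =
  let len = String.length s in
  let rec loop i = i < len && (p s.[i] || loop (i + 1)) in
  loop 0
```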
compare s s' is Stdlib.compare s s'; it compares the byte sequences of s and s' in lexicographical order.
Extracting substrings
Tip. These functions extract substrings as new strings. Using substrings may be less wasteful and more flexible.
with_range ~first ~len s are the consecutive bytes of s whose indices exist in the range [first;first + len - 1]. first defaults to 0 and len to max_int. Note that first can be any integer and len any positive integer.
with_index_range ~first ~last s are the consecutive bytes of s whose indices exist in the range [first;last]. first defaults to 0 and last to String.length s - 1.
Note that both first and last can be any integer. If first > last the interval is empty and the empty string is returned.
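Because first, len and last may be any integers, both functions amount to intersecting the requested interval with the valid indices of s. A minimal Stdlib-only sketch of that clamping, redefining the functions locally (not Astring's code); the saturation of first + len - 1 at max_int is this sketch's own way of avoiding overflow with the default len = max_int.

```ocaml
(* Intersect [first; last] with [0; String.length s - 1]. *)
let with_index_range ?(first = 0) ?last s =
  let max_idx = String.length s - 1 in
  let last = match last with None -> max_idx | Some l -> min l max_idx in
  let first = max first 0 in
  if first > last then "" else String.sub s first (last - first + 1)

(* [first; first + len - 1], saturating the upper bound at max_int. *)
let with_range ?(first = 0) ?(len = max_int) s =
  if len <= 0 then ""
  else
    let last =
      if first > 0 && len > max_int - first then max_int else first + len - 1
    in
    with_index_range ~first ~last s
```

For example, with_range ~first:(-2) ~len:4 "hello" asks for indices -2 to 1, of which only 0 and 1 exist, so the result is "he".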
trim ~drop s is s with prefix and suffix bytes satisfying drop removed. drop defaults to Char.Ascii.is_white.
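A minimal Stdlib-only sketch of trim: skip dropped bytes from the left, then from the right, and keep the slice in between. The local is_white is a hypothetical stand-in for Char.Ascii.is_white, assumed here to accept space, tab, line feed, vertical tab, form feed and carriage return.

```ocaml
(* Hypothetical stand-in for Char.Ascii.is_white (assumed byte set). *)
let is_white = function
  | ' ' | '\t' | '\n' | '\011' | '\012' | '\r' -> true
  | _ -> false

let trim ?(drop = is_white) s =
  let len = String.length s in
  let rec first i = if i < len && drop s.[i] then first (i + 1) else i in
  let rec last i = if i >= 0 && drop s.[i] then last (i - 1) else i in
  let i = first 0 in
  if i = len then "" (* every byte was dropped *)
  else
    (* s.[i] is kept, so the right scan stops at an index >= i. *)
    let j = last (len - 1) in
    String.sub s i (j - i + 1)
```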
span ~rev ~min ~max ~sat s is (l, r) where:
- if rev is false (default), l is at least min and at most max consecutive sat satisfying initial bytes of s, or empty if there are no such bytes. r are the remaining bytes of s.
- if rev is true, r is at least min and at most max consecutive sat satisfying final bytes of s, or empty if there are no such bytes. l are the remaining bytes of s.
If max is unspecified the span is unlimited. If min is unspecified it defaults to 0. If min > max the condition can't be satisfied and the left or right span, depending on rev, is always empty. sat defaults to (fun _ -> true).
The invariant l ^ r = s holds.
take ~rev ~min ~max ~sat s is the matching span of span without the remaining one. In other words:
(if rev then snd else fst) @@ span ~rev ~min ~max ~sat s
drop ~rev ~min ~max ~sat s is the remaining span of span without the matching span. In other words:
(if rev then fst else snd) @@ span ~rev ~min ~max ~sat s
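A minimal Stdlib-only sketch of span, with take and drop derived from it exactly as in the two equations above. It counts satisfying bytes from the chosen end, stops at max, and discards the count if it is below min; the min > max case then yields an empty matching span automatically, because the count never exceeds max. This illustrates the documented semantics and is not Astring's implementation.

```ocaml
let span ?(rev = false) ?(min = 0) ?(max = max_int) ?(sat = fun _ -> true) s =
  let len = String.length s in
  (* Count consecutive satisfying bytes from the chosen end, up to max. *)
  let rec count n =
    if n >= max || n >= len then n
    else
      let c = if rev then s.[len - 1 - n] else s.[n] in
      if sat c then count (n + 1) else n
  in
  let n = count 0 in
  let n = if n < min then 0 else n in
  if rev then (String.sub s 0 (len - n), String.sub s (len - n) n)
  else (String.sub s 0 n, String.sub s n (len - n))

let take ?(rev = false) ?min ?max ?sat s =
  (if rev then snd else fst) (span ~rev ?min ?max ?sat s)

let drop ?(rev = false) ?min ?max ?sat s =
  (if rev then fst else snd) (span ~rev ?min ?max ?sat s)
```

In every case the two components concatenate back to s, which is the l ^ r = s invariant.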
cut ~sep s is either the pair Some (l,r) of the two (possibly empty) substrings of s that are delimited by the first match of the non-empty separator string sep, or None if sep can't be matched in s. Matching starts from the beginning of s (rev is false, default) or the end (rev is true).
The invariant l ^ sep ^ r = s holds.
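A minimal Stdlib-only sketch of cut: locate the first (or, with rev, the last) position where sep occurs and split around it. Raising Invalid_argument on an empty separator is this sketch's own choice for the documented non-empty requirement; the message text is arbitrary.

```ocaml
let cut ?(rev = false) ~sep s =
  let n = String.length sep and m = String.length s in
  if n = 0 then invalid_arg "cut: empty separator";
  let matches i = String.sub s i n = sep in
  (* Forward scan from index 0, or backward scan from the last start
     position where sep still fits. *)
  let rec fwd i = if i + n > m then None else if matches i then Some i else fwd (i + 1) in
  let rec bwd i = if i < 0 then None else if matches i then Some i else bwd (i - 1) in
  match (if rev then bwd (m - n) else fwd 0) with
  | None -> None
  | Some i -> Some (String.sub s 0 i, String.sub s (i + n) (m - i - n))
```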
cuts ~sep s is the list of all substrings of s that are delimited by matches of the non-empty separator string sep. Empty substrings are omitted in the list if empty is false (defaults to true).
Matching separators in s starts from the beginning of s (rev is false, default) or the end (rev is true). Once one is found, the separator is skipped and matching starts again, that is, separator matches can't overlap. If there is no separator match in s, the list [s] is returned.
The following invariants hold:
concat ~sep (cuts ~empty:true ~sep s) = s
cuts ~empty:true ~sep s <> []
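The non-overlapping rule can be seen in a minimal left-to-right Stdlib-only sketch: after each match, scanning resumes right after the separator. This sketch omits the rev option (it only matters when separator matches could overlap, as with "aa" in "aaa") and raises Invalid_argument on an empty separator as its own choice.

```ocaml
let cuts ?(empty = true) ~sep s =
  let n = String.length sep and m = String.length s in
  if n = 0 then invalid_arg "cuts: empty separator";
  (* [start] is where the current piece begins, [i] the candidate match. *)
  let rec loop acc start i =
    if i + n > m then List.rev (String.sub s start (m - start) :: acc)
    else if String.sub s i n = sep then
      loop (String.sub s start (i - start) :: acc) (i + n) (i + n)
    else loop acc start (i + 1)
  in
  let parts = loop [] 0 0 in
  if empty then parts else List.filter (fun p -> p <> "") parts
```

With empty left at true, String.concat sep on the result gives back s, and the result is never the empty list, matching the two invariants above.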
fields ~empty ~is_sep s is the list of (possibly empty) substrings that are delimited by bytes for which is_sep is true. Empty substrings are omitted in the list if empty is false (defaults to true). is_sep defaults to Char.Ascii.is_white.
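A minimal Stdlib-only sketch of fields: every separator byte ends the current piece, so runs of separators produce empty pieces unless empty is false. As in the trim sketch, the local is_white is a hypothetical stand-in for Char.Ascii.is_white with an assumed byte set.

```ocaml
(* Hypothetical stand-in for Char.Ascii.is_white (assumed byte set). *)
let is_white = function
  | ' ' | '\t' | '\n' | '\011' | '\012' | '\r' -> true
  | _ -> false

let fields ?(empty = true) ?(is_sep = is_white) s =
  let m = String.length s in
  let rec loop acc start i =
    if i = m then List.rev (String.sub s start (m - start) :: acc)
    else if is_sep s.[i] then
      loop (String.sub s start (i - start) :: acc) (i + 1) (i + 1)
    else loop acc start (i + 1)
  in
  let parts = loop [] 0 0 in
  if empty then parts else List.filter (fun p -> p <> "") parts
```

Passing ~empty:false is the usual way to split on arbitrary runs of whitespace.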
Substrings
type sub = Astring.String.sub
The type for substrings.
val sub_with_range : ?first:int -> ?len:int -> string -> sub
sub_with_range is like with_range but returns a substring value. If first is smaller than 0 the empty string at the start of s is returned. If first is greater than the last index of s the empty string at the end of s is returned.
val sub_with_index_range : ?first:int -> ?last:int -> string -> sub
sub_with_index_range is like with_index_range but returns a substring value. If first and last are smaller than 0 the empty string at the start of s is returned. If first is greater than the last index of s the empty string at the end of s is returned. If first > last and first is an index of s the empty string at first is returned.
module Sub = Astring.String.Sub
Substrings.
Traversing strings
find ~rev ~start sat s is:
- If rev is false (default), the smallest index i, if any, greater than or equal to start such that sat s.[i] is true. start defaults to 0.
- If rev is true, the greatest index i, if any, smaller than or equal to start such that sat s.[i] is true. start defaults to String.length s - 1.
Note that start can be any integer.
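Since start can be any integer, a straightforward sketch clamps it to the valid index range before scanning in the chosen direction. This Stdlib-only version redefines find locally to illustrate the documented semantics; it is not the library's code.

```ocaml
let find ?(rev = false) ?start sat s =
  let len = String.length s in
  if rev then begin
    (* Scan downwards from start, clamped to the last index. *)
    let start = match start with None -> len - 1 | Some i -> i in
    let rec loop i =
      if i < 0 then None else if sat s.[i] then Some i else loop (i - 1)
    in
    loop (min start (len - 1))
  end else begin
    (* Scan upwards from start, clamped to the first index. *)
    let start = match start with None -> 0 | Some i -> i in
    let rec loop i =
      if i >= len then None else if sat s.[i] then Some i else loop (i + 1)
    in
    loop (max start 0)
  end
```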
find_sub ~rev ~start ~sub s is:
- If rev is false (default), the smallest index i, if any, greater than or equal to start such that sub can be found starting at i in s, that is s.[i] = sub.[0], s.[i+1] = sub.[1], and so on. start defaults to 0.
- If rev is true, the greatest index i, if any, smaller than or equal to start such that sub can be found starting at i in s, that is s.[i] = sub.[0], s.[i+1] = sub.[1], and so on. start defaults to String.length s - 1.
Note that start can be any integer.
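find_sub follows the same clamp-then-scan pattern as find, with the per-index test replaced by "sub occurs at i". A minimal Stdlib-only sketch (naive matching with String.sub, redefined locally, not Astring's code):

```ocaml
let find_sub ?(rev = false) ?start ~sub s =
  let n = String.length sub and m = String.length s in
  (* True iff sub fits in s at i and its bytes match there. *)
  let occurs i = i >= 0 && i + n <= m && String.sub s i n = sub in
  if rev then begin
    let start = match start with None -> m - 1 | Some i -> i in
    let rec loop i =
      if i < 0 then None else if occurs i then Some i else loop (i - 1)
    in
    (* No match can start after m - n. *)
    loop (min start (m - n))
  end else begin
    let start = match start with None -> 0 | Some i -> i in
    let rec loop i =
      if i + n > m then None else if occurs i then Some i else loop (i + 1)
    in
    loop (max start 0)
  end
```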
filter sat s is the string made of the bytes of s that satisfy sat, in the same order.
filter_map f s is the string made of the bytes of s as mapped by f, in the same order.
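Both functions can be sketched with a Buffer that collects the kept bytes in order. The sketch assumes f has type char -> char option, with None meaning "drop this byte"; that signature is an assumption here, and the functions are redefined locally for illustration.

```ocaml
(* Keep the bytes satisfying [sat], in order. *)
let filter sat s =
  let b = Buffer.create (String.length s) in
  String.iter (fun c -> if sat c then Buffer.add_char b c) s;
  Buffer.contents b

(* Assumed signature: f : char -> char option; None drops the byte. *)
let filter_map f s =
  let b = Buffer.create (String.length s) in
  String.iter
    (fun c -> match f c with Some c' -> Buffer.add_char b c' | None -> ())
    s;
  Buffer.contents b
```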
map f s is s' with s'.[i] = f s.[i] for all indices i of s. f is invoked in increasing index order.
mapi f s is s' with s'.[i] = f i s.[i] for all indices i of s. f is invoked in increasing index order.
fold_left f acc s is f (...(f (f acc s.[0]) s.[1])...) s.[m] with m = String.length s - 1.
fold_right f s acc is f s.[0] (f s.[1] (...(f s.[m] acc)...)) with m = String.length s - 1.
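The two folds differ only in traversal direction and argument order: fold_left feeds bytes from index 0 upwards into the accumulator, fold_right from index m downwards. A minimal Stdlib-only sketch with loops, redefined locally for illustration:

```ocaml
(* f (... (f (f acc s.[0]) s.[1]) ...) s.[m] *)
let fold_left f acc s =
  let acc = ref acc in
  for i = 0 to String.length s - 1 do acc := f !acc s.[i] done;
  !acc

(* f s.[0] (f s.[1] (... (f s.[m] acc) ...)) *)
let fold_right f s acc =
  let acc = ref acc in
  for i = String.length s - 1 downto 0 do acc := f s.[i] !acc done;
  !acc
```

Consing bytes onto a list shows the difference: fold_left yields the bytes reversed, fold_right in their original order.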
iter f s is f s.[0]; f s.[1]; ... f s.[m] with m = String.length s - 1.
iteri f s is f 0 s.[0]; f 1 s.[1]; ... f m s.[m] with m = String.length s - 1.
Uniqueness
uniquify ss is ss without duplicates; the list order is preserved.
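One way to remove duplicates while preserving order is a single left-to-right pass that remembers seen strings in a hash table. This Stdlib-only sketch assumes the first occurrence of each string is the one kept; it is an illustration, not Astring's implementation.

```ocaml
let uniquify ss =
  let seen = Hashtbl.create 16 in
  (* Keep a string only the first time it is encountered. *)
  let add acc s =
    if Hashtbl.mem seen s then acc
    else (Hashtbl.add seen s (); s :: acc)
  in
  List.rev (List.fold_left add [] ss)
```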
Strings as US-ASCII character sequences
module Ascii = Astring.String.Ascii
US-ASCII string support.
Pretty printing
val pp : Format.formatter -> string -> unit
pp ppf s prints s's bytes on ppf.
val dump : Format.formatter -> string -> unit
dump ppf s prints s as a syntactically valid OCaml string on ppf using Ascii.escape_string.
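The difference between the two printers can be sketched with Stdlib Format: pp writes the bytes as they are, while dump writes a quoted, escaped OCaml literal. This sketch uses the "%S" conversion for dump; Stdlib's escaping may differ in detail from Ascii.escape_string, so treat it as an approximation.

```ocaml
(* [pp]: print the raw bytes of s. *)
let pp ppf s = Format.pp_print_string ppf s

(* [dump]: print s as an OCaml string literal. "%S" relies on Stdlib's
   escaping, which may not match Ascii.escape_string exactly. *)
let dump ppf s = Format.fprintf ppf "%S" s
```

For instance, Format.asprintf "%a" dump applied to the three-byte string a"b yields the five-byte literal "a\"b".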
String sets and maps
type set = Astring.String.set
The type for string sets.
module Set = Astring.String.Set
String sets.
module Map = Astring.String.Map
String maps.
type +'a map = 'a Map.t
The type for maps from strings to values of type 'a.
OCaml base type conversions
to_char s is the single byte in s, or None if there is no byte or more than one in s.
of_bool b is a string representation for b. Relies on Stdlib.string_of_bool.
to_bool s is a bool from s, if any. Relies on Stdlib.bool_of_string.
of_int i is a string representation for i. Relies on Stdlib.string_of_int.
to_int s is an int from s, if any. Relies on Stdlib.int_of_string.
of_nativeint i is a string representation for i. Relies on Nativeint.to_string.
to_nativeint s is a nativeint from s, if any. Relies on Nativeint.of_string.
of_int32 i is a string representation for i. Relies on Int32.to_string.
to_int32 s is an int32 from s, if any. Relies on Int32.of_string.
of_int64 i is a string representation for i. Relies on Int64.to_string.
to_int64 s is an int64 from s, if any. Relies on Int64.of_string.
of_float f is a string representation for f. Relies on Stdlib.string_of_float.
to_float s is a float from s, if any. Relies on Stdlib.float_of_string.
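The conversions pair a Stdlib printer with a Stdlib parser. A minimal sketch, assuming the to_* functions return None exactly when the underlying Stdlib parser fails, can therefore reuse the *_opt parsers that Stdlib provides. This is an approximation of the documented behaviour, not Astring's code; to_char is included for completeness.

```ocaml
(* Some c only when s is exactly one byte long. *)
let to_char s = if String.length s = 1 then Some s.[0] else None

(* Printers: the Stdlib *_to_string / string_of_* functions. *)
let of_bool = string_of_bool
let of_int = string_of_int
let of_int32 = Int32.to_string
let of_int64 = Int64.to_string
let of_nativeint = Nativeint.to_string
let of_float = string_of_float

(* Parsers: assumed to behave like the Stdlib *_opt variants. *)
let to_bool = bool_of_string_opt
let to_int = int_of_string_opt
let to_int32 = Int32.of_string_opt
let to_int64 = Int64.of_string_opt
let to_nativeint = Nativeint.of_string_opt
let to_float = float_of_string_opt
```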