package ez_search


This module implements full-text search with regular expressions over a set of files. It works in two steps: first, a database is generated from all the files; then, searches are run against that database.

module TYPES : sig ... end
val index_directory : db_dir:string -> ?db_name:string -> select:(string -> bool) -> string -> unit

index_directory ~db_dir ?db_name ~select DIRECTORY indexes all files in DIRECTORY and stores the index in db_dir. Every top-level directory in DIRECTORY is treated as a file_entry name, and each file_name is a relative path within its top-level directory. select takes a path as argument and returns true if the content of that path should be indexed. WARNING: not reentrant, because the function temporarily changes the current directory (chdir) to the directory.
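As an illustration of the select argument, here is a minimal sketch of a predicate that keeps only OCaml sources. The name select_ocaml_sources and the choice of extensions are hypothetical; whether select receives absolute or relative paths, and whether it is also called on directories, is not stated above, so treat the predicate as a starting point only.

```ocaml
(* Hypothetical predicate with the documented shape string -> bool.
   It accepts a path when it ends in .ml or .mli. *)
let select_ocaml_sources (path : string) : bool =
  List.exists (fun ext -> Filename.check_suffix path ext) [ ".ml"; ".mli" ]
```

Such a function could be passed as ~select:select_ocaml_sources to index_directory.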

val index_files : db_dir:string -> ?db_name:string -> ((file_entry:string -> file_name:string -> file_content:string -> unit) -> unit) -> unit

index_files ~db_dir ?db_name f creates an index on disk in directory db_dir with database name db_name. index_files calls f with a function; f should in turn call that function once for each file to index, passing the arguments ~file_entry ~file_name ~file_content.
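The callback-within-a-callback shape of index_files can be easier to see in code. The sketch below does not use the library: it defines a stand-in with the same type as the documented index_files that only records what it is given, so that the calling convention can be tried out in isolation. The entry and file names are made up for the example.

```ocaml
(* Stand-in for illustration only: it records each file it is handed
   instead of writing an index to disk under db_dir. *)
let recorded : (string * string * int) list ref = ref []

let index_files ~db_dir ?db_name f =
  ignore (db_dir : string);
  ignore (db_name : string option);
  (* This is the function handed to the caller's f. *)
  let add ~file_entry ~file_name ~file_content =
    recorded :=
      (file_entry, file_name, String.length file_content) :: !recorded
  in
  (f add : unit)

(* Caller side: f receives add and calls it once per file to index. *)
let () =
  index_files ~db_dir:"_index" (fun add ->
      add ~file_entry:"pkg.1.0" ~file_name:"src/main.ml"
        ~file_content:"let () = ()\n";
      add ~file_entry:"pkg.1.0" ~file_name:"README.md"
        ~file_content:"# pkg\n")
```

With the real function, the caller side would look the same; only the effect of add differs.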

val load_db : db_dir:string -> ?db_name:string -> ?use_mapfile:bool -> unit -> TYPES.db

load_db ~db_dir ?db_name ?use_mapfile () loads the database into memory. use_mapfile controls whether to use a memory-mapped file or to load the database normally. Memory-mapped files are normally more efficient, but support for them may be less stable.

val count_lines_total : db:TYPES.db -> int

count_lines_total ~db counts the number of '\n' characters in the database. This takes some time, since it iterates over the whole text.
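To see why this is a linear-time operation, here is a minimal sketch of the same kind of count over an ordinary string. It is not the library's implementation, and count_newlines is a hypothetical name; it only shows that every character has to be visited once.

```ocaml
(* Count '\n' characters with a single pass over the string. *)
let count_newlines (text : string) : int =
  let n = ref 0 in
  String.iter (fun c -> if c = '\n' then incr n) text;
  !n
```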

val length : db:TYPES.db -> int

length ~db returns the number of chars in the database.

search ~db ~f ?pos ?last ?len find searches the database using find, starting from pos, from just after the last occurrence last, or from the beginning. It calls f for every occurrence found. f returns a boolean: true if the search should continue, or false if it should terminate immediately. len is the string length to use.
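The continue-or-stop protocol of f can be sketched on a plain string. Everything below is illustrative: search_sketch and naive_find are hypothetical names, the sketch takes a string where the documented function takes a db, it omits ?last, and it assumes that find returns the match position or -1 when nothing is found, which is not stated above.

```ocaml
(* Naive finder with the shape pos:int -> len:int -> string -> int once
   ~needle is applied. ASSUMPTION: -1 means "no match before len". *)
let naive_find ~needle ~pos ~len text =
  let n = String.length needle in
  let rec go i =
    if i + n > len then -1
    else if String.sub text i n = needle then i
    else go (i + 1)
  in
  go pos

(* Call f on each match position; stop as soon as f returns false. *)
let search_sketch ~text ~f ?(pos = 0) ?len find =
  let len = match len with Some l -> l | None -> String.length text in
  let rec loop pos =
    let p = find ~pos ~len text in
    if p >= 0 && f p then loop (p + 1)
  in
  loop pos

(* Collect match positions, asking to stop after the third one. *)
let hits : int list ref = ref []

let () =
  search_sketch ~text:"ab ab ab ab"
    ~f:(fun p ->
      hits := p :: !hits;
      List.length !hits < 3)
    (naive_find ~needle:"ab")
```

Here the text contains four matches, but the callback returns false on the third, so the fourth is never reported.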

val search_and_count : db:TYPES.db -> ?is_regexp:bool -> ?is_case_sensitive:bool -> ?ncores:int -> ?maxn:int -> ?find:(pos:int -> len:int -> string -> int) -> ?engine:[ `Re | `Str ] -> string -> int * TYPES.occurrence list

search_and_count ~db ?is_regexp ?is_case_sensitive ?ncores ?maxn ?find term searches for term in the database, either using find if provided, or a mix of Str and memmem otherwise (depending on is_regexp and is_case_sensitive). It uses Parmap to split the computation across multiple cores, using at most ncores if provided. It returns a very close approximation of the number of occurrences (exact on 1 core), and a list of at least maxn occurrences.

val occurrence_file : db:TYPES.db -> TYPES.occurrence -> TYPES.occurrence_file

occurrence_file ~db pos returns the file occurrence of the match.

val occurrence_line : db:TYPES.db -> TYPES.occurrence_file -> int

occurrence_line ~db occ returns the line number in the file.

val occurrence_context : db:TYPES.db -> line:int -> TYPES.occurrence_file -> max:int -> TYPES.occurrence_context

occurrence_context ~db ~line occ ~max returns the context of the occurrence in the file. line must be the line number of the occurrence, as returned by occurrence_line. The parameter max controls how many lines should be returned before and after the occurrence.
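The functions above are meant to be chained: a search yields occurrences, each occurrence is resolved to its file, then to a line, then to a context. The sketch below expresses that chain as a functor over a module type whose value signatures are copied from this page. The module type SEARCH, the functor Contexts and the function contexts are hypothetical names; the real library keeps its types in TYPES, so a small adapter module would be needed to apply the functor to it.

```ocaml
(* Signatures copied from this page, with the types made abstract. *)
module type SEARCH = sig
  type db
  type occurrence
  type occurrence_file
  type occurrence_context

  val search_and_count :
    db:db -> ?is_regexp:bool -> ?is_case_sensitive:bool -> ?ncores:int ->
    ?maxn:int -> ?find:(pos:int -> len:int -> string -> int) ->
    ?engine:[ `Re | `Str ] -> string -> int * occurrence list

  val occurrence_file : db:db -> occurrence -> occurrence_file
  val occurrence_line : db:db -> occurrence_file -> int

  val occurrence_context :
    db:db -> line:int -> occurrence_file -> max:int -> occurrence_context
end

module Contexts (S : SEARCH) = struct
  (* Returns the reported count, plus (line, context) for each occurrence. *)
  let contexts ~db ?maxn ~max term =
    let total, occs = S.search_and_count ~db ?maxn term in
    let ctxs =
      List.map
        (fun occ ->
          let file_occ = S.occurrence_file ~db occ in
          (* The line must be computed first and passed to the context. *)
          let line = S.occurrence_line ~db file_occ in
          (line, S.occurrence_context ~db ~line file_occ ~max))
        occs
    in
    (total, ctxs)
end
```

Writing the chain against a signature keeps the example independent of how the database is built or loaded.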

val file_content : db:TYPES.db -> TYPES.file -> string

file_content ~db file returns the content of the file, as retrieved from the database.

val files : db:TYPES.db -> TYPES.file array

files ~db returns all the files stored in the database.

val pos : TYPES.occurrence -> int
val text : db:TYPES.db -> string
val memmem : haystack:string -> pos:int -> len:int -> needle:string -> int
val time : string -> ('a -> 'b) -> 'a -> 'b

time msg f x prints the time spent executing f x.
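A wrapper with this signature can be sketched in a few lines. The version below is an assumption about the behaviour, not the library's code: time_sketch is a hypothetical name, it measures processor time with Sys.time, prints to standard error in a made-up format, and prints nothing if f raises.

```ocaml
(* Run f x, report the elapsed processor time under the label msg,
   and return the result of f x unchanged. *)
let time_sketch (msg : string) (f : 'a -> 'b) (x : 'a) : 'b =
  let t0 = Sys.time () in
  let result = f x in
  Printf.eprintf "%s: %.3fs\n%!" msg (Sys.time () -. t0);
  result
```

For example, time_sketch "sum" (List.fold_left ( + ) 0) [ 1; 2; 3 ] evaluates the sum and reports how long it took.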