package bio_io

In_channel for FASTA records. For more general information, see the mli file for the Record_in_channel module.

Examples

Return all records in a list

let records = Fasta.In_channel.with_file_records "sequences.fasta"

Iterating over records

Use the iter functions when you need to visit each record and perform side effects with it.

Print sequence IDs and sequence lengths

let () =
  Fasta.In_channel.with_file_iter_records "sequences.fasta"
    ~f:(fun record ->
      let open Fasta.Record in
      printf "%s => %d\n" (id record) (seq_length record))

Print sequence index, IDs, and sequence lengths

This is like the last example, except that we also print the record index. The index passed to f is 0-based (the first record is 0, the second is 1, and so on), so the example adds 1 to print 1-based positions.

let () =
  Fasta.In_channel.with_file_iteri_records "sequences.fasta"
    ~f:(fun i record ->
      let open Fasta.Record in
      printf "%d: %s => %d\n" (i + 1) (id record) (seq_length record))

Folding over records

If you need to reduce all the records down to a single value, use the fold functions.

Get the total length of all sequences in the file.

let total_length =
  Fasta.In_channel.with_file_fold_records "sequences.fasta" ~init:0
    ~f:(fun length record -> length + Fasta.Record.seq_length record)

Pipelines with records

Sometimes you have a "pipeline" of computations that you need to apply to records one after the other. In that case, you could use the sequence functions. Here's a silly example.

let () =
  Fasta.In_channel.with_file "sequences.fasta" ~f:(fun chan ->
      Fasta.In_channel.record_sequence chan
      (* Add sequence index to record description *)
      |> Sequence.mapi ~f:(fun i record ->
             let new_desc =
               match Fasta.Record.desc record with
               | None -> Some (sprintf "sequence %d" i)
               | Some old_desc ->
                   Some (sprintf "%s -- sequence %d" old_desc i)
             in
             Fasta.Record.with_desc new_desc record)
      (* Convert all sequence chars to lowercase *)
      |> Sequence.map ~f:(fun record ->
             let new_seq = String.lowercase (Fasta.Record.seq record) in
             Fasta.Record.with_seq new_seq record)
      (* Print sequences *)
      |> Sequence.iter ~f:(fun record ->
             print_endline @@ Fasta.Record.to_string record))

One thing to watch out for, though: if an exception is raised partway through and, as here, the pipeline runs side-effecting code, then some of the side effects will already have occurred and the rest will not.

For example, if a step in the pipeline hit assert false whenever the FASTA file contained more than one sequence, the first record would already have been printed by the time the exception was raised.
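The partial-side-effect behaviour can be reproduced with a small sketch that uses only the OCaml standard library's Seq module, not Base.Sequence or bio_io. Plain strings stand in for FASTA records and their names are made up for illustration. The pipeline fails on every record after the first, so the first record has already been "printed" (appended to a buffer) when the exception escapes.

```ocaml
(* Stdlib-only sketch: strings stand in for FASTA records (an assumption;
   this does not use bio_io or Base.Sequence). *)
let printed = Buffer.create 64

let run_pipeline () =
  let seen = ref 0 in
  List.to_seq [ "seq1"; "seq2"; "seq3" ]
  (* Like an assert false step: fail on every record after the first. *)
  |> Seq.map (fun record ->
         incr seen;
         if !seen > 1 then assert false else record)
  (* Side effect: "print" each record into a buffer. *)
  |> Seq.iter (fun record -> Buffer.add_string printed (record ^ "\n"))

let () =
  match run_pipeline () with
  | () -> print_endline "pipeline finished"
  | exception Assert_failure _ ->
      (* The first record's side effect survived the failure. *)
      print_string ("printed before failure:\n" ^ Buffer.contents printed)
```

Running it reports the failure after seq1 has been written; seq2 and seq3 never reach the side-effecting step.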

include Record_in_channel.S with type record := Record.t

API

type t
val stdin : t

stdin is a t that reads from the standard input channel.

val create : Base.string -> t

create file_name opens an input channel on the file specified by file_name. Consider using it with Base.Exn.protectx, or use with_file instead, so that the channel is closed even if an exception is raised.

val close : t -> Base.unit

close t closes t.

val with_file : Base.string -> f:(t -> 'a) -> 'a

with_file file_name ~f executes ~f on the channel created from file_name and ensures it is closed properly.
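One way to picture with_file is as create and close wrapped in an exception-safe bracket. The sketch below is hypothetical and is not bio_io's implementation: it uses a plain stdlib in_channel as t and Stdlib's Fun.protect (OCaml 4.08 or later) in place of Base.Exn.protectx.

```ocaml
(* Hypothetical stand-ins: t is a plain stdlib input channel here. *)
type t = in_channel

let create (file_name : string) : t = open_in file_name

let close (t : t) : unit = close_in t

(* Run f on a freshly opened channel; Fun.protect calls ~finally whether
   f returns normally or raises, so the channel is always closed. *)
let with_file (file_name : string) ~(f : t -> 'a) : 'a =
  let t = create file_name in
  Fun.protect ~finally:(fun () -> close t) (fun () -> f t)
```

For example, with_file "sequences.fasta" ~f:input_line would return the first line of that (hypothetical) file and close the channel afterwards.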

val equal : t -> t -> Base.bool

equal t1 t2 compares t1 and t2 for equality.

val input_record : t -> Record.t Base.option

input_record t returns Some record if there is a record to return. If there are no more records, None is returned. Raises exceptions on bad input (e.g., bad file format).
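To make the Some/None protocol concrete, here is a deliberately simplified, hypothetical FASTA reader over a stdlib in_channel. It is not bio_io's parser: the record type, the pending_header field, and the header rule (the first space separates the ID from the description) are assumptions made for this sketch, and it rejects only one kind of bad input (sequence data before the first header).

```ocaml
(* Hypothetical, simplified FASTA reader; not bio_io's parser. *)
type record = { id : string; desc : string option; seq : string }

(* pending_header holds a header line that was read while collecting the
   previous record's sequence lines. *)
type t = { chan : in_channel; mutable pending_header : string option }

let of_in_channel chan = { chan; pending_header = None }

let next_line t = try Some (input_line t.chan) with End_of_file -> None

(* Split ">id some description" into the ID and an optional description. *)
let parse_header line =
  let body = String.sub line 1 (String.length line - 1) in
  match String.index_opt body ' ' with
  | None -> (body, None)
  | Some i ->
      ( String.sub body 0 i,
        Some (String.sub body (i + 1) (String.length body - i - 1)) )

(* Some record for the next record, None at end of input. *)
let input_record t : record option =
  let rec find_header () =
    match t.pending_header with
    | Some header ->
        t.pending_header <- None;
        Some header
    | None -> (
        match next_line t with
        | None -> None
        | Some "" -> find_header () (* skip blank lines *)
        | Some line when line.[0] = '>' -> Some line
        | Some _ -> failwith "bad FASTA: sequence data before a header")
  in
  match find_header () with
  | None -> None
  | Some header ->
      let id, desc = parse_header header in
      let buf = Buffer.create 80 in
      (* Gather sequence lines until the next header or end of input. *)
      let rec collect () =
        match next_line t with
        | None -> ()
        | Some line when String.length line > 0 && line.[0] = '>' ->
            t.pending_header <- Some line
        | Some line ->
            Buffer.add_string buf (String.trim line);
            collect ()
      in
      collect ();
      Some { id; desc; seq = Buffer.contents buf }
```

Calling input_record repeatedly until it returns None visits each record exactly once.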

Folding over records

val fold_records : t -> init:'a -> f:('a -> Record.t -> 'a) -> 'a

fold_records t ~init ~f reduces all records from a t down to a single value of type 'a.

val foldi_records : t -> init:'a -> f:(Base.int -> 'a -> Record.t -> 'a) -> 'a

foldi_records t ~init ~f is like fold_records except that f is provided the 0-based record index as its first argument.
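The relationship between fold_records and foldi_records can be sketched by threading a counter through the accumulator. In this hypothetical stdlib-only version, the record source is a next function of type unit -> 'r option (the same shape as input_record t), and source_of_list is an illustrative helper that is not part of bio_io.

```ocaml
(* Hypothetical stdlib-only sketch; next plays the role of input_record t. *)
let fold_records (next : unit -> 'r option) ~init ~f =
  let rec loop acc =
    match next () with
    | None -> acc
    | Some record -> loop (f acc record)
  in
  loop init

(* foldi threads a 0-based index alongside the caller's accumulator. *)
let foldi_records next ~init ~f =
  let _, result =
    fold_records next ~init:(0, init) ~f:(fun (i, acc) record ->
        (i + 1, f i acc record))
  in
  result

(* Illustrative helper: hands out list elements one at a time. *)
let source_of_list items =
  let remaining = ref items in
  fun () ->
    match !remaining with
    | [] -> None
    | x :: rest ->
        remaining := rest;
        Some x

let () =
  let indexed =
    foldi_records (source_of_list [ "s1"; "s2"; "s3" ]) ~init:[]
      ~f:(fun i acc id -> (i, id) :: acc)
    |> List.rev
  in
  List.iter (fun (i, id) -> Printf.printf "%d: %s\n" i id) indexed
```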

Folding with file name

val with_file_fold_records : Base.string -> init:'a -> f:('a -> Record.t -> 'a) -> 'a

with_file_fold_records file_name ~init ~f is like fold_records t ~init ~f except that it is passed a file name, and it manages t automatically. See with_file.

val with_file_foldi_records : Base.string -> init:'a -> f:(Base.int -> 'a -> Record.t -> 'a) -> 'a

with_file_foldi_records file_name ~init ~f is like foldi_records t ~init ~f except that it is passed a file name, and it manages t automatically. See with_file.

Iterating over records

The iter functions are like the fold functions, except that they do not take an init value and f returns unit instead of a value of type 'a; accordingly, they return unit rather than a value of type 'a.

Use them for side effects.
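Since an iter function is a fold whose accumulator carries no information, the two can be related as in the following hypothetical stdlib-only sketch. As before, next stands in for input_record t; none of these definitions are bio_io's own code.

```ocaml
(* Hypothetical stdlib-only sketch; next plays the role of input_record t. *)
let fold_records (next : unit -> 'r option) ~init ~f =
  let rec loop acc =
    match next () with
    | None -> acc
    | Some record -> loop (f acc record)
  in
  loop init

(* iter: the accumulator is just (), so f is run only for its effects. *)
let iter_records next ~(f : 'r -> unit) =
  fold_records next ~init:() ~f:(fun () record -> f record)

(* iteri: the accumulator is the 0-based index of the next record. *)
let iteri_records next ~(f : int -> 'r -> unit) =
  ignore (fold_records next ~init:0 ~f:(fun i record -> f i record; i + 1))

let () =
  let remaining = ref [ "s1"; "s2" ] in
  let next () =
    match !remaining with
    | [] -> None
    | x :: rest ->
        remaining := rest;
        Some x
  in
  iteri_records next ~f:(fun i id -> Printf.printf "%d: %s\n" i id)
```

The usage at the end prints "0: s1" and then "1: s2".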

val iter_records : t -> f:(Record.t -> Base.unit) -> Base.unit

iter_records t ~f calls f on each record in t. As f returns unit, this is generally used for side effects.

val iteri_records : t -> f:(Base.int -> Record.t -> Base.unit) -> Base.unit

iteri_records t ~f is like iter_records t ~f except that f is passed the 0-based record index as its first argument.

Iterating with file name

val with_file_iter_records : Base.string -> f:(Record.t -> Base.unit) -> Base.unit

with_file_iter_records file_name ~f is like iter_records t ~f except that it is passed a file name, and it manages t automatically. See with_file.

val with_file_iteri_records : Base.string -> f:(Base.int -> Record.t -> Base.unit) -> Base.unit

with_file_iteri_records file_name ~f is like iteri_records t ~f except that it is passed a file name, and it manages t automatically. See with_file.

Getting records as a list

These functions return all records as a list, so every record is held in memory at once.

val records : t -> Record.t Base.list
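Building the list can be thought of as a fold that conses each record and reverses once at the end. This is a hypothetical stdlib-only sketch, not bio_io's code; next again stands in for input_record t.

```ocaml
(* Hypothetical stdlib-only sketch; next plays the role of input_record t. *)
let fold_records (next : unit -> 'r option) ~init ~f =
  let rec loop acc =
    match next () with
    | None -> acc
    | Some record -> loop (f acc record)
  in
  loop init

(* Cons each record onto the front, then reverse once to restore file
   order. The complete list exists in memory when this returns. *)
let records next =
  List.rev (fold_records next ~init:[] ~f:(fun acc record -> record :: acc))

let () =
  let remaining = ref [ "s1"; "s2"; "s3" ] in
  let next () =
    match !remaining with
    | [] -> None
    | x :: rest ->
        remaining := rest;
        Some x
  in
  List.iter print_endline (records next)
```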

With file name

val with_file_records : Base.string -> Record.t Base.list

Getting records as a sequence

These are a bit different:

* There are no with_file versions: the channel would have to stay open until the sequence had been fully consumed, which would make such functions awkward to use.

* If an exception is raised partway through the pipeline, it will propagate, but any processing that succeeded before it will already have happened. So be careful if you are doing side-effecting things.

val record_sequence : t -> Record.t Base.Sequence.t

record_sequence t returns a Sequence.t of records.
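A sequence of records can be modelled as a lazy Stdlib Seq.t built on top of a next-record function. This is a hypothetical stdlib-only sketch, not how bio_io builds its Base.Sequence.t; next_from and the reads counter exist only to show when records are actually pulled from the source.

```ocaml
(* Hypothetical stdlib-only sketch; not bio_io's Base.Sequence-based code.
   next plays the role of input_record t. *)
let rec record_sequence (next : unit -> 'r option) : 'r Seq.t =
 fun () ->
  match next () with
  | None -> Seq.Nil
  | Some record -> Seq.Cons (record, record_sequence next)

(* Illustrative source that counts how many records have been pulled. *)
let reads = ref 0

let next_from items =
  let remaining = ref items in
  fun () ->
    match !remaining with
    | [] -> None
    | x :: rest ->
        incr reads;
        remaining := rest;
        Some x

let () =
  let seq = record_sequence (next_from [ "s1"; "s2"; "s3" ]) in
  (* Building and mapping the sequence has not read anything yet. *)
  let upper = Seq.map String.uppercase_ascii seq in
  Printf.printf "reads before iterating: %d\n" !reads;
  Seq.iter print_endline upper;
  Printf.printf "reads after iterating: %d\n" !reads
```

In this sketch each element is produced by calling next, so the sequence is single-pass: traversing it again continues from wherever the source left off rather than starting over.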