package cmarkit

CommonMark parser and abstract syntax tree.

See examples.

References.

Abstract syntax tree

module Textloc : sig ... end

Text locations.

module Meta : sig ... end

Node metadata.

type 'a node = 'a * Meta.t

The type for abstract syntax tree nodes. The data of type 'a and its metadata.

module Layout : sig ... end

Types for layout information.

module Block_line : sig ... end

Block lines.

module Label : sig ... end

Labels.

Link definitions.

module Inline : sig ... end

Inlines.

module Block : sig ... end

Blocks.

module Doc : sig ... end

Documents (and parser).

Maps and folds

module Mapper : sig ... end

Abstract syntax tree mappers.

module Folder : sig ... end

Abstract syntax tree folders.

Extensions

For some documents, bare CommonMark just misses it. The extensions are here to make it hit the mark. To enable them use Doc.of_string with strict:false.
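As a minimal sketch of enabling the extensions, the snippet below renders the same source twice, once in strict mode and once with extensions. It assumes the cmarkit library is installed and uses Cmarkit_html.of_doc for rendering; the helper name html_of_markdown is hypothetical.

```ocaml
(* Sketch: parse with or without extensions, then render to HTML.
   [html_of_markdown] is an illustrative helper, not part of Cmarkit. *)
let html_of_markdown ~strict md =
  let doc = Cmarkit.Doc.of_string ~strict md in
  Cmarkit_html.of_doc ~safe:true doc

let () =
  let md = "Strikethrough your ~~perfect~~ imperfect thoughts." in
  (* Strict CommonMark: the ~~ delimiters are kept as plain text. *)
  print_string (html_of_markdown ~strict:true md);
  (* Extensions enabled: the delimited text becomes a strikethrough. *)
  print_string (html_of_markdown ~strict:false md)
```

Since extensions cannot be enabled selectively, the strict flag is the only switch involved.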

Please note the following:

  1. There is no plan to provide an extension mechanism at the parsing level. A lot can already be achieved by using reference resolvers, abusing code fences, post-processing the abstract syntax tree, or extending the renderers.
  2. In order to minimize dialects and extension interaction oddities, there is no plan to allow extensions to be enabled selectively.
  3. If one day the CommonMark specification standardizes a set of extensions, Cmarkit will support those.
  4. In the short term, there is no plan to support more extensions than those that are listed here.

Strikethrough

According to pandoc.

Strikethrough your ~~perfect~~ imperfect thoughts.

Inline text delimited between two ~~ gets into an Inline.Ext_strikethrough node.

The text delimited by ~~ cannot start or end with Unicode whitespace. When a closer can close multiple openers, the nearest opener is closed. Strikethrough inlines can be nested.

Math

According to a mix of pandoc, GLFM, GFM.

Inline math

This is an inline $\sqrt(x - 1)$ math expression.

Inline text delimited between $ gets into an Inline.Ext_math_span node.

The text delimited by $ cannot start and end with Unicode whitespace. Inline math cannot be nested: after an opener, the nearest (non-escaped) closing delimiter matches. Otherwise it is parsed in essence like a code span.

Display math

It's better to get that $$ \left( \sum_{k=1}^n a_k b_k \right)^2 $$
on its own line. A math block may also be more convenient:

```math
\left( \sum_{k=1}^n a_k b_k \right)^2 < \Phi
```

Inline text delimited by $$ gets into an Inline.Ext_math_span with the Inline.Math_span.display property set to true. Alternatively, code blocks whose language is math get into Block.Ext_math_block blocks.

In contrast to $, the text delimited by $$ can start and end with whitespace; however, it can't contain a blank line. Display math cannot be nested: after an opener, the nearest (non-escaped) closing delimiter matches. Otherwise it's parsed in essence like a code span.
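To illustrate how inline and display math end up in the same kind of node, here is a hedged sketch that collects every math span of a document with a Folder. It assumes the cmarkit library is installed and that Inline.Math_span.tex returns the TeX source of a span; the function name math_spans is hypothetical.

```ocaml
(* Sketch: list the math spans of a document as (is_display, tex) pairs.
   Assumes Inline.Math_span.tex gives the TeX code of the span. *)
let math_spans (doc : Cmarkit.Doc.t) : (bool * string) list =
  let open Cmarkit in
  let inline _folder acc = function
    | Inline.Ext_math_span (ms, _meta) ->
        let entry = (Inline.Math_span.display ms, Inline.Math_span.tex ms) in
        Folder.ret (entry :: acc)
    | _ ->
        (* Let the folder recurse into other inlines by default. *)
        Folder.default
  in
  let folder = Folder.make ~inline () in
  List.rev (Folder.fold_doc folder [] doc)

let () =
  let md = "Inline $a + b$ and display $$ c + d $$ math." in
  let doc = Cmarkit.Doc.of_string ~strict:false md in
  List.iter
    (fun (display, tex) -> Printf.printf "display=%b tex=%s\n" display tex)
    (math_spans doc)
```

Math written as a code block with language math is not reported by this sketch, since it is stored in Block.Ext_math_block nodes rather than in inlines.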

List task items

According to a mix of md4c, GLFM, GFM and personal ad-hoc brewery.

* [ ] That's unchecked.
* [x] That's checked.
* [~] That's cancelled.

A list item is a task item if it starts with up to three spaces, followed by [, a single Unicode character, ] and a space (the space can be omitted if the line is empty, but subsequent indentation considers there was one). The Unicode character gets stored in Block.List_item.ext_task_marker and counts as one column regardless of the character's render width. The task marker, including the final space, is considered part of the list marker as far as subsequent indentation is concerned.

The Unicode character indicates the status of the task. Its interpretation is up to the client, but the function Block.List_item.task_status_of_task_marker, which is used by the built-in renderers, makes the following choices:

  • Unchecked: ' ' (U+0020).
  • Checked: 'x' (U+0078), 'X' (U+0058), '✓' (U+2713, CHECK MARK), '✔' (U+2714, HEAVY CHECK MARK), '𐄂' (U+10102, AEGEAN CHECK MARK), '🗸' (U+1F5F8, LIGHT CHECK MARK).
  • Cancelled: '~' (U+007E).
  • Other: any other character, interpretation left to clients or renderers (built-in ones equate it with done).
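The mapping above can be sketched with the standard library alone. The following is an independent illustration of the listed choices, not the library's implementation; the type task_status and the function task_status_of_marker are hypothetical names.

```ocaml
(* Hypothetical re-statement of the marker table above, stdlib only. *)
type task_status = Unchecked | Checked | Cancelled | Other of Uchar.t

let task_status_of_marker (u : Uchar.t) : task_status =
  match Uchar.to_int u with
  | 0x0020 -> Unchecked                              (* ' ' *)
  | 0x0078 | 0x0058                                  (* 'x', 'X' *)
  | 0x2713 | 0x2714 | 0x10102 | 0x1F5F8 -> Checked   (* check marks *)
  | 0x007E -> Cancelled                              (* '~' *)
  | _ -> Other u                                     (* left to clients *)

let () =
  let show = function
    | Unchecked -> "unchecked"
    | Checked -> "checked"
    | Cancelled -> "cancelled"
    | Other u -> Printf.sprintf "other (U+%04X)" (Uchar.to_int u)
  in
  List.iter
    (fun c -> print_endline (show (task_status_of_marker (Uchar.of_char c))))
    [ ' '; 'x'; '~'; '?' ]
```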

Tables

According to djot.

| # |      Name | Description           |                    Link |
|:-:|----------:|:----------------------|------------------------:|
| 1 |     OCaml | The OCaml website     |     <https://ocaml.org> |
| 2 |   Haskell | The Haskell website   |   <https://haskell.org> |
| 3 |       MDN | Web dev docs | <https://developer.mozilla.org/> |
| 4 | Wikipedia | The Free Encyclopedia | <https://wikipedia.org> |

A table is a sequence of rows. Each row starts and ends with a (non-escaped) pipe | character. The first row can't be indented by more than three spaces; subsequent rows can be arbitrarily indented. Blanks after the final pipe are allowed.

Each row of the table contains cells separated by (non-escaped) pipe | characters. Pipes embedded in inline constructs do not count as separators (the parsing strategy is to parse the row as an inline, split the result on the | present in toplevel text nodes and strip initial and trailing blanks in cells). The number of | separators plus 1 determines the number of columns of a row. The number of columns of a table is the greatest number of columns of its rows.

A separator line is a row in which every cell content is made only of one or more - optionally prefixed and suffixed by :. These rows are not data: they indicate the alignment of data in their cell for subsequent rows (multiple separator lines in a single table are allowed) and that the previous line (if any) was a row of column headers. :- is left aligned, -: is right aligned, :-: is centered. If there's no alignment specified it's left aligned.
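The separator line rule can be sketched with the standard library alone. This is an illustration of the rule as stated above, working on already split and stripped cell contents; it is not the library's parser, and the types and function names are hypothetical.

```ocaml
(* Hypothetical types, not the library's own representation. *)
type align = Left | Center | Right
type separator_cell = Not_a_separator | Separator of align option

(* Classifies one cell content: one or more '-' optionally prefixed
   and suffixed by ':'. [Separator None] means no alignment specified. *)
let classify_cell (cell : string) : separator_cell =
  let cell = String.trim cell in
  let len = String.length cell in
  let starts = len > 0 && cell.[0] = ':' in
  let ends = len > 1 && cell.[len - 1] = ':' in
  let first = if starts then 1 else 0 in
  let last = if ends then len - 2 else len - 1 in
  let rec all_dashes i = i > last || (cell.[i] = '-' && all_dashes (i + 1)) in
  if last < first || not (all_dashes first) then Not_a_separator
  else
    match (starts, ends) with
    | true, true -> Separator (Some Center)
    | true, false -> Separator (Some Left)
    | false, true -> Separator (Some Right)
    | false, false -> Separator None

(* A row is a separator line if every one of its cells is a separator. *)
let is_separator_row (cells : string list) : bool =
  cells <> []
  && List.for_all (fun c -> classify_cell c <> Not_a_separator) cells
```

For the second row of the example table, the cells :-:, ----------:, :---------------------- and ------------------------: classify as centered, right, left and right aligned respectively.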

Tables are stored in Block.Ext_table nodes.

Footnotes

According to djot for the footnote contents.

This is a footnote in history[^1] with multiple references[^1].
Footnotes are not [very special][^1] references.

 [^1]: Footnotes can have
lazy continuation lines and multiple paragraphs.

  If you start one column after the left bracket, blocks still get
  into the footnote.

 But this is no longer the footnote.

Footnotes go through the label resolution mechanism and share the same namespace as link references (including the ^). They end up being defined in the Doc.defs as Block.Footnote.Def definitions. Footnote references are simply made by using Inline.Link with the corresponding labels.

Definition

A footnote definition starts with a (single line) link label followed by :. The label must start with a ^. Footnote labels go through the label resolution mechanism.

All subsequent lines indented one column further than the start of the label (i.e. starting on the ^) get into the footnote. Lazy continuation lines are supported.

The result is stored in the document's Doc.defs in Block.Footnote.Def cases, and its position in the document is witnessed by a Block.Ext_footnote_definition node, which is kept for layout.
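As a hedged sketch of how a client might find footnote definitions, the function below walks Doc.defs and keeps the keys bound to Block.Footnote.Def cases. It assumes the cmarkit library is installed and that label definitions are stored in a Label.Map keyed by strings; the function name footnote_label_keys is hypothetical.

```ocaml
(* Sketch: collect the label keys of footnote definitions.
   Assumes Doc.defs returns a Label.Map from string keys to definitions. *)
let footnote_label_keys (doc : Cmarkit.Doc.t) : string list =
  let open Cmarkit in
  let add key def acc =
    match def with
    | Block.Footnote.Def _ -> key :: acc
    | _ -> acc (* link reference definitions and other cases *)
  in
  List.rev (Label.Map.fold add (Doc.defs doc) [])

let () =
  let md = "A claim[^1].\n\n [^1]: Its footnote.\n" in
  let doc = Cmarkit.Doc.of_string ~strict:false md in
  List.iter print_endline (footnote_label_keys doc)
```

Because footnotes share the namespace of link references, the same map also holds ordinary link reference definitions, which the catch-all case skips.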

References

Footnote references are simply reference links with the footnote label. Linking text on footnotes is allowed. Shortcut and collapsed references to footnotes are rendered specially by Cmarkit_html.
