January 2, 2023 · couchdb, javascript and language

tango implements textual syntax for logical/relational expressions compatible with Mango selectors.

In my previous post, I mentioned about couchilla which you can use to bundle design documents for creating map/reduce views on CouchDB.

The /db/_find API is another way of querying a database and results in faster responses in most cases. You interface with it using Mango selector expressions in JSON syntax.

For example, the following query retrieves movies directed by Roy Andersson which are released after 2007.

{
  "$and": [
    {
      "director": {
        "$eq": "Roy Andersson"
      }
    },
    {
      "year": {
        "$gt": 2007
      }
    }
  ]
}

This looks very much like a syntax tree, isn't it? It is a tree-like object that contains syntactic structure of a simple logical expression, such as the following C expression written in infix form.

director == "Roy Andersson" && year > 2007

Actually, this could be a nice addition to my HTML template language as an embedded sub-language for selecting items within section blocks. I already use CouchDB in my frontend server stack, it's not a bad idea to bake the logic for building query expressions into the template language itself.

Parser algorithm

A language parser starts with lexical analysis which is tokenizing a stream of text and grouping tokens into grammatical phrases. There is really nothing special with a lexer, you simply implement a state machine which identifies keywords, symbols and literals character by character. At worst you will need to deal with look-ahead. For example, an arithmetic expression such as 1 + 2 should yield a number literal, followed by an operator token and another number literal.

However, a stream of tokens doesn't give a lot of information to process. A more common internal representation for syntactic structures is AST—abstract syntax tree.

The Shunting yard algorithm I used in tango is invented by Edsger W. Dijkstra; it's a linear time algorithm which implements a parsing method known as operator precedence parsing.

Simply told: the algorithm works by employing a stack and a queue, the stack is for maintaining operator tokens based on their order of operations precedence, and the queue is for outputting the same expression in RPN (Reverse Polish Notation) or to produce an AST instead.

Currently, tango supports the standard C comparison operators only, and parentheses for explicit operator precedence.

Example

(x > y || z == 10) && name == "example"

Parsing this expression outputs an AST which conforms to the declarative JSON querying language found in CouchDB.

{
  "$and": [
    {
      "$or": [
        {
          "x": {
            "$gt": "y"
          }
        },
        {
          "z": {
            "$eq": 10
          }
        }
      ]
    },
    {
      "name": {
        "$eq": "example"
      }
    }
  ]
}

There are other features of Mango which are not supported in here such as the $in operator for matching array elements. Also note that, tango doesn't support unary operators.