LambdaDB currently supports eight types of indexes. Note that the whole document is stored as is regardless of its existence in index configurations, but being stored does not necessarily mean they are searchable.

text

This type is for full-text values, such as the body of an email or the description of a product. These full-text values are analyzed that they are passed through an analyzer to convert the string into a list of individual terms before being indexed. The analysis process allows LambdaDB to search for individual words within each full text field.

text indexes are best suited for unstructured but human-readable content. If you need to index structured content such as email addresses, hostnames, status codes, or tags, it is likely that you should rather use a keyword index.

LambdaDB supports four analyzers for tokenization: standard(default), korean, japanese, english. You can specify multiple analyzers to a single text field to improve search performance.

. (dot) character cannot be used as an index name and you can only add indexes to existing collections. Modifying or deleting existing indexes is not supported.

keyword

keyword type is used for structured content such as IDs, email addresses, hostnames, status codes, zip codes, or tags. Keyword indexes are often used in sorting, aggregations, and term-level queries, such as term. Note that LambdaDB does not index any string longer than 4096.

long

This type os for a signed 64-bit integer with a minimum value of 263-2^{63} and a maximum value of 26312^{63}-1. long indexes are optimized for scoring, sorting, and range queries.

double

This type is for a double-precision 64-bit IEEE 754 floating point number, restricted to finite values. double indexes are optimized for scoring, sorting, and range queries.

boolean

boolean indexes accept JSON true and false values, but can also accept strings which are interpreted as either true or false.

datetime

This type is for date and time in RFC 3339 format. datetime indexes are optimized for sorting and range queries, or to search for a specific time.

import pytz
from datetime import datetime

dt = datetime.now()
print(dt.astimezone(pytz.UTC).isoformat(timespec="seconds"))

# 2024-11-05T14:27:56+00:00

object

JSON documents are hierarchical in nature: the document may contain inner objects which, in turn, may contain inner objects themselves. Internally, this document is indexed as a simple, flat list of key-value pairs. The fields within the object, which can be of any data type, including object. objectIndexConfigs should be specified in order to index the fields inside the object.

{
  "content": {
    "type": "text",
    "analyzers": ["english", "korean", "japanese"]
  },
  "metadata": {
    "type": "object",
    "objectIndexConfigs": {
      "url": {
        "type": "keyword"
      },
      "author": {
        "type": "keyword"
      }
    }
  }
}

vector

The vector type indexes dense vectors of numeric values. vector indexes are primarily used for k-nearest neighbor (kNN) search. The vector type does not support aggregations or sorting. You add a vector field as an array of numeric values

A k-nearest neighbor (kNN) search finds the k nearest vectors to a query vector, as measured by a similarity metric. LambdaDB supports four similarity metrics including l2_norm(euclidean), dot_product, cosine, max_inner_product. You can define the vector similarity to use in kNN search.

An example index configurations

{
  "index_id": {
    "type": "keyword"
  },
  "parent_id": {
    "type": "keyword"
  },
  "summary": {
    "type": "text",
    "analyzers": ["english", "korean", "japanese"]
  },
  "summary_model": {
    "type": "keyword"
  },
  "node_type": {
    "type": "keyword"
  },
  "embed_model": {
    "type": "keyword"
  },
  "content_type": {
    "type": "keyword"
  },
  "content": {
    "type": "text",
    "analyzers": ["english", "korean", "japanese"]
  },
  "text_embedding": {
    "dimensions": 512,
    "similarity": "cosine",
    "type": "vector"
  },
  "created_at": {
    "type": "datetime"
  },
  "updated_at": {
    "type": "datetime"
  },
  "node_info": {
    "type": "object",
    "objectIndexConfigs": {
      "next": {
        "type": "keyword"
      },
      "prev": {
        "type": "keyword"
      }
    }
  },
  "metadata": {
    "type": "object",
    "objectIndexConfigs": {
      "robots_allowed": {
        "type": "boolean"
      },
      "failed_crawl": {
        "type": "boolean"
      },
      "md5_digest": {
        "type": "keyword"
      },
      "url": {
        "type": "keyword"
},
      "og_image": {
        "type": "keyword"
      },
      "related_images": {
        "type": "keyword"
      }
    }
  }
}