LambdaDB currently supports nine types of indexes. Note that each document is stored in full whether or not its fields appear in the index configuration, but being stored does not by itself make a field searchable.
Field names cannot contain the dot (.) character. You can add indexes to an existing collection, but modifying or deleting existing indexes is not supported.
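Because existing indexes cannot be modified or deleted, it can help to check field names before submitting a configuration. The following is a minimal sketch, not part of the LambdaDB SDK: the function name `find_invalid_field_names` is hypothetical, and plain strings stand in for the `models.Type` values used elsewhere on this page.

```python
def find_invalid_field_names(index_config: dict, path: str = "") -> list[str]:
    """Return the locations of field names that contain a dot.

    Nested fields under "objectIndexConfigs" are checked recursively.
    Locations are joined with "/" purely for readability in the report.
    """
    invalid = []
    for name, spec in index_config.items():
        here = f"{path}/{name}" if path else name
        if "." in name:
            invalid.append(here)
        nested = spec.get("objectIndexConfigs") if isinstance(spec, dict) else None
        if isinstance(nested, dict):
            invalid.extend(find_invalid_field_names(nested, here))
    return invalid


# Placeholder config: the string type names are stand-ins for illustration only
config = {
    "user.name": {"type": "keyword"},
    "metadata": {
        "type": "object",
        "objectIndexConfigs": {
            "a.b": {"type": "keyword"},
            "url": {"type": "keyword"},
        },
    },
}
print(find_invalid_field_names(config))
# → ['user.name', 'metadata/a.b']
```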

text

This type is for full-text values, such as the body of an email or the description of a product. Before indexing, each full-text value is passed through an analyzer, which converts the string into a list of individual terms. This analysis step is what allows LambdaDB to search for individual words within a full-text field. The text type is best suited for unstructured but human-readable content. If you need to index structured content such as email addresses, hostnames, status codes, or tags, use a keyword index instead.

LambdaDB supports four analyzers for tokenization: standard (default), korean, japanese, and english. You can specify multiple analyzers for a single text field to improve search performance.
{
    "content": {
        "type": models.TypeText.TEXT,
        "analyzers": [
            models.Analyzer.ENGLISH,
            models.Analyzer.KOREAN,
            models.Analyzer.JAPANESE
        ]
    }
}
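To make the idea of analysis concrete, here is a toy sketch of turning a string into terms. It is an illustration only: `toy_analyze` is a hypothetical name, and LambdaDB's actual analyzers (especially the language-specific ones) do considerably more than lowercasing and splitting.

```python
import re


def toy_analyze(text: str) -> list[str]:
    """Lowercase the input and split it into runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


terms = toy_analyze("The quick brown fox, quickly!")
print(terms)
# → ['the', 'quick', 'brown', 'fox', 'quickly']
```

Each resulting term is what a full-text query would be matched against, which is why a search for "fox" can find this sentence even though the stored value is one long string.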

keyword

The keyword type is for structured content such as IDs, email addresses, hostnames, status codes, zip codes, or tags. keyword indexes are often used for sorting, aggregations, and term-level queries.
LambdaDB does not index strings longer than 4096 characters.
For fields such as tags or categories, you can also store multiple keyword values as an array.
{
    "status": {"type": models.Type.KEYWORD},
    "category": {"type": models.Type.KEYWORD}
}
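The length limit and array support can be checked on the client side before upserting. This is a rough sketch under two assumptions that the text above does not confirm: that the 4096-character limit applies to each array element individually, and that Python's `len` counts characters the same way LambdaDB does. The helper name `indexable_keywords` is hypothetical.

```python
MAX_KEYWORD_LENGTH = 4096  # limit stated in the keyword section


def indexable_keywords(value) -> list[str]:
    """Return the keyword values that are short enough to be indexed.

    Accepts either a single string or a list of strings.
    """
    values = value if isinstance(value, list) else [value]
    return [v for v in values if len(v) <= MAX_KEYWORD_LENGTH]


doc = {"status": "active", "category": ["books", "x" * 5000]}
print(indexable_keywords(doc["status"]))    # → ['active']
print(indexable_keywords(doc["category"]))  # → ['books']
```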

long

This type is for a signed 64-bit integer with a minimum value of $-2^{63}$ and a maximum value of $2^{63}-1$. long indexes are optimized for scoring, sorting, and range queries.
{
    "user_id": {"type": models.Type.LONG},
    "timestamp": {"type": models.Type.LONG}
}
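Python integers have arbitrary precision, so a value can silently exceed the 64-bit range. A minimal client-side check, with the hypothetical helper name `fits_long`, might look like this:

```python
LONG_MIN = -(2**63)
LONG_MAX = 2**63 - 1


def fits_long(value) -> bool:
    """Return True if value is an int within the signed 64-bit range.

    bool is excluded explicitly because it is a subclass of int in Python.
    """
    return (
        isinstance(value, int)
        and not isinstance(value, bool)
        and LONG_MIN <= value <= LONG_MAX
    )


print(fits_long(1730816876))  # → True
print(fits_long(2**63))       # → False
```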

double

This type is for a double-precision 64-bit IEEE 754 floating point number, restricted to finite values. double indexes are optimized for scoring, sorting, and range queries.
{
    "score": {"type": models.Type.DOUBLE},
    "price": {"type": models.Type.DOUBLE}
}
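Since only finite values are accepted, NaN and infinities need to be filtered out or replaced before indexing. A small sketch using the standard library follows; the helper name `is_indexable_double` is hypothetical.

```python
import math


def is_indexable_double(value: float) -> bool:
    """Return True for finite floats; NaN and +/-infinity are rejected."""
    return math.isfinite(value)


print(is_indexable_double(19.99))         # → True
print(is_indexable_double(float("inf")))  # → False
print(is_indexable_double(float("nan")))  # → False
```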

boolean

boolean indexes accept the JSON values true and false, as well as strings that are interpreted as either true or false.
{
    "is_active": {"type": models.Type.BOOLEAN},
    "published": {"type": models.Type.BOOLEAN}
}

datetime

This type is for date and time in RFC 3339 format. datetime indexes are optimized for sorting and range queries.
from datetime import datetime, timezone

# Format the current UTC time as an RFC 3339 timestamp with second precision
dt = datetime.now(timezone.utc)
print(dt.isoformat(timespec="seconds"))
# Example output: 2024-11-05T14:27:56+00:00

# Index configuration
{
    "created_at": {"type": models.Type.DATETIME},
    "updated_at": {"type": models.Type.DATETIME}
}

vector

The vector type indexes dense vectors of numeric values. vector indexes are primarily used for k-nearest neighbor (kNN) search and do not support aggregations or sorting. You add a vector field as an array of numeric values.

A kNN search finds the k nearest vectors to a query vector, as measured by a similarity metric. LambdaDB supports four similarity metrics: euclidean, dot_product, cosine, and max_inner_product. You choose the metric used in kNN search with the similarity setting of each vector field.

LambdaDB also supports multi-field vector search, which lets you perform kNN searches across multiple vector fields within a single query. This enables semantic search scenarios that combine different types of embeddings (e.g., text embeddings and image embeddings) in one search operation.
{
    "embedding": {
        "type": models.TypeVector.VECTOR,
        "dimensions": 768,
        "similarity": models.Similarity.COSINE
    },
    "image_vector": {
        "type": models.TypeVector.VECTOR,
        "dimensions": 512,
        "similarity": models.Similarity.EUCLIDEAN
    }
}
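For intuition about what these metrics measure, the sketch below implements the textbook definitions of three of them and a brute-force kNN over a handful of vectors. It is an illustration only: the function names are hypothetical, nothing here reflects how LambdaDB computes or normalizes scores internally, and max_inner_product is omitted.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance; smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Dot product of the vectors after scaling both to unit length."""
    return dot_product(a, b) / (
        math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    )


def brute_force_knn(query, vectors, k, similarity):
    """Return the ids of the k vectors with the highest similarity to query."""
    ranked = sorted(
        vectors, key=lambda vid: similarity(query, vectors[vid]), reverse=True
    )
    return ranked[:k]


vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(brute_force_knn([1.0, 0.0], vectors, k=2, similarity=cosine_similarity))
# → ['a', 'c']
```

Cosine similarity ignores vector length, while dot product and Euclidean distance do not, which is one reason the choice of metric should match how the embedding model was trained.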

sparseVector

The sparseVector type is designed for storing and indexing sparse vectors, in which most elements are zero or missing. Unlike dense vectors, sparse vectors store only the non-zero values along with their corresponding indices. The sparseVector type supports only the dot_product similarity metric.
{
    "sparse_embedding": {"type": models.Type.SPARSE_VECTOR}
}
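The dot product of two sparse vectors only needs the positions that are non-zero in both. The sketch below assumes a sparse vector is represented as a dict mapping an index (or token) to a weight. That representation is chosen for illustration, not taken from the LambdaDB API, and `sparse_dot` is a hypothetical name.

```python
def sparse_dot(a: dict, b: dict) -> float:
    """Dot product of two sparse vectors stored as {index: weight} dicts."""
    # Iterate over the smaller vector and look up matches in the larger one
    if len(a) > len(b):
        a, b = b, a
    return sum(weight * b[key] for key, weight in a.items() if key in b)


query = {"3": 0.5, "10": 2.0}
document = {"10": 1.5, "42": 1.0}
print(sparse_dot(query, document))
# → 3.0
```

Only index "10" appears in both vectors, so the result is 2.0 * 1.5.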

object

JSON documents are hierarchical in nature: a document may contain inner objects which, in turn, may contain inner objects themselves. Internally, such a document is indexed as a simple, flat list of key-value pairs. The fields within an object can be of any data type, including object. To index the fields inside an object, you must specify them in objectIndexConfigs.
{
    "metadata": {
        "type": models.TypeObject.OBJECT,
        "objectIndexConfigs": {
            "url": {"type": models.Type.KEYWORD},
            "author": {"type": models.Type.KEYWORD},
            "content": {
                "type": models.TypeText.TEXT,
                "analyzers": [models.Analyzer.ENGLISH, models.Analyzer.KOREAN]
            }
        }
    }
}
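The flattening described above can be pictured with the following sketch. It assumes nested keys are joined into dot-separated paths, a common convention that this page does not state explicitly, and the function name `flatten` is hypothetical.

```python
def flatten(doc: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into a single dict keyed by dot-separated paths."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into inner objects, carrying the path built so far
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


doc = {
    "title": "hello",
    "metadata": {"url": "https://example.com", "author": {"name": "kim"}},
}
print(flatten(doc))
# → {'title': 'hello', 'metadata.url': 'https://example.com', 'metadata.author.name': 'kim'}
```

Under this assumption, a dot inside a field name would be indistinguishable from a nesting boundary, which would be consistent with the restriction on dots in field names.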

Complete example configuration

Here’s a comprehensive example that demonstrates all index types:
from lambdadb import models

complete_index_config = {
    "text": {
        "type": models.TypeText.TEXT,
        "analyzers": [
            models.Analyzer.JAPANESE,
            models.Analyzer.KOREAN,
            models.Analyzer.ENGLISH,
        ],
    },
    "keyword": {"type": models.Type.KEYWORD},
    "long": {"type": models.Type.LONG},
    "double": {"type": models.Type.DOUBLE},
    "boolean": {"type": models.Type.BOOLEAN},
    "datetime": {"type": models.Type.DATETIME},
    "vector": {
        "type": models.TypeVector.VECTOR,
        "dimensions": 10,
        "similarity": models.Similarity.COSINE,
    },
    "sparseVector": {"type": models.Type.SPARSE_VECTOR},
    "object": {
        "type": models.TypeObject.OBJECT,
        "objectIndexConfigs": {
            "text": {
                "type": models.TypeText.TEXT,
                "analyzers": [
                    models.Analyzer.JAPANESE,
                    models.Analyzer.KOREAN,
                    models.Analyzer.ENGLISH,
                ],
            },
            "keyword": {"type": models.Type.KEYWORD},
            "long": {"type": models.Type.LONG},
            "double": {"type": models.Type.DOUBLE},
            "boolean": {"type": models.Type.BOOLEAN},
            "datetime": {"type": models.Type.DATETIME},
            "vector": {
                "type": models.TypeVector.VECTOR,
                "dimensions": 10,
                "similarity": models.Similarity.COSINE,
            },
            "sparseVector": {"type": models.Type.SPARSE_VECTOR},
        },
    },
}

Supported analyzers

| Analyzer | Description |
| --- | --- |
| standard | Default general-purpose analyzer |
| english | English language analyzer |
| korean | Korean language analyzer |
| japanese | Japanese language analyzer |

Leave a comment in our community channel or contact us if you need an analyzer not listed above.

Supported similarity metrics

| Metric | Description | Use Case |
| --- | --- | --- |
| cosine | Cosine similarity | Most common for text embeddings |
| euclidean | Euclidean distance | Geometric distance calculations |
| dot_product | Dot product similarity | Fast similarity computation |
| max_inner_product | Maximum inner product | Specialized similarity metric |