Overview¶

What is Forage?¶

Forage is a Java library that builds an in-memory search index from your existing database. It wraps Apache Lucene to provide full-text search without requiring dedicated search infrastructure.

The Problem¶

When you have a small-to-medium dataset and need search capabilities, the traditional approaches have significant drawbacks:

Approach	Problem
Dedicated Search Engine (Elasticsearch, Solr)	Overkill for small datasets, expensive to operate, adds operational complexity
Database Indexes	Limited to exact matches, no full-text search, bloats your database
LIKE Queries	Slow, no relevance scoring, no fuzzy matching

The Solution¶

Forage solves this by:

Creating an in-memory Lucene index in each application node
Periodically syncing with your database to stay up-to-date
Providing a simple API to define indexing rules and execute searches

How It Works¶

Architecture¶

Figure: Forage Architecture

With the what and the why covered, let's look at the how. At the heart of Forage is Lucene. Why Lucene? It is one of the most mature open-source Java search libraries available, and it powers Nutch, Solr, and Elasticsearch, among others. It is well maintained, backed by the Apache Software Foundation, and receives continuous contributions.

The library operates in four phases:

1. Bootstrapping¶

When your application starts, Forage scans your database and builds the initial search index:

@Override
public void bootstrap(Consumer<IndexableDocument> consumer) {
    // Scan your database
    for (Book book : getAllBooksFromDatabase()) {
        // Create indexable documents
        consumer.accept(createIndexableDocument(book));
    }
}

2. Periodic Updates¶

A background thread periodically re-bootstraps from your database to capture changes:

PeriodicUpdateEngine<IndexableDocument> updateEngine =
        new PeriodicUpdateEngine<>(
                dataStore,
                new AsyncQueuedConsumer<>(searchEngine),
                60, TimeUnit.SECONDS  // Refresh every 60 seconds
        );
updateEngine.start();
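
The internals of PeriodicUpdateEngine are not shown on this page. As a rough mental model only, a "rebuild everything on a timer" loop can be sketched with the JDK's ScheduledExecutorService. The class PeriodicRefreshSketch and all of its members are hypothetical names for illustration and are not part of Forage.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical illustration of periodic full refresh; not a Forage class.
class PeriodicRefreshSketch<T> {
    // Stands in for a full database scan.
    private final Supplier<List<T>> source;
    // The snapshot that readers see; replaced as a whole on every refresh.
    private final AtomicReference<List<T>> snapshot = new AtomicReference<>(List.of());
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "refresh-sketch");
                t.setDaemon(true); // do not keep the JVM alive for the refresher
                return t;
            });

    PeriodicRefreshSketch(Supplier<List<T>> source) {
        this.source = source;
    }

    // Re-read everything from the source and swap the snapshot atomically.
    void refreshOnce() {
        snapshot.set(List.copyOf(source.get()));
    }

    // Refresh immediately, then again after each interval.
    void start(long interval, TimeUnit unit) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                refreshOnce();
            } catch (RuntimeException e) {
                // Keep the previous snapshot; an uncaught exception would cancel future runs.
            }
        }, 0, interval, unit);
    }

    void stop() {
        scheduler.shutdownNow();
    }

    List<T> current() {
        return snapshot.get();
    }
}
```

The point of the sketch is that readers only ever see a complete snapshot, and that data changed in the source becomes visible only after the next refresh.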

3. Indexing Rules¶

You define which fields are searchable and how they should be analyzed:

new ForageDocument(book.getId(), Arrays.asList(
        new TextField("title", book.getTitle()),        // Full-text searchable
        new TextField("author", book.getAuthor()),      // Full-text searchable
        new StringField("genre", book.getGenre()),      // Exact match only
        new FloatField("rating", book.getRating()),     // Numeric, sortable
        new IntField("pages", book.getPages())          // Numeric, range queries
));
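
To make the "full-text searchable" versus "exact match only" distinction concrete, here is a toy, JDK-only sketch. It assumes that text fields are lowercased and split into tokens before matching, while string fields are compared verbatim. Real analysis is done by Lucene and is more involved; FieldMatchingSketch and its methods are hypothetical names, not Forage or Lucene APIs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical toy model of analyzed vs. exact matching; not a Forage class.
class FieldMatchingSketch {
    // Crude stand-in for an analyzer: lowercase, then split on non-alphanumeric characters.
    static List<String> analyze(String value) {
        return Arrays.stream(value.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    // "Text field" behaviour: the term matches if it equals any token of the value.
    static boolean textMatches(String fieldValue, String term) {
        return analyze(fieldValue).contains(term.toLowerCase(Locale.ROOT));
    }

    // "String field" behaviour: the whole value must equal the term exactly.
    static boolean exactMatches(String fieldValue, String term) {
        return fieldValue.equals(term);
    }
}
```

Under these assumptions, a title such as "Clean Code: A Handbook" matches the term "code" as a text field, whereas a genre stored as "Science Fiction" matches only the exact value "Science Fiction" as a string field.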

4. Search Queries¶

Execute searches using the fluent QueryBuilder API:

ForageQueryResult<Book> results = searchEngine.search(
        QueryBuilder.booleanQuery()
                .query(QueryBuilder.matchQuery("title", "programming").boost(2.0f).build())
                .query(QueryBuilder.matchQuery("author", "martin").build())
                .clauseType(ClauseType.SHOULD)
                .buildForageQuery(10)
);

Data Flow¶

sequenceDiagram
    participant App as Application
    participant FE as ForageEngine
    participant LI as Lucene Index
    participant DB as Database

    Note over App,DB: Bootstrap Phase
    App->>FE: start()
    FE->>DB: bootstrap()
    DB-->>FE: Documents
    FE->>LI: Index documents

    Note over App,DB: Search Phase
    App->>FE: search(query)
    FE->>LI: Execute query
    LI-->>FE: Doc IDs + Scores
    FE->>DB: get(ids)
    DB-->>FE: Full documents
    FE-->>App: ForageQueryResult
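
The two-step search phase in the diagram (the index returns IDs and scores, then the full documents are fetched by ID) can be sketched with plain JDK collections. This is a toy model with a naive term-count score, not how Lucene scores documents; SearchFlowSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical toy model of the search phase; Forage delegates the index step to Lucene.
class SearchFlowSketch {
    private final Map<String, String> indexedText = new HashMap<>(); // docId -> searchable text
    private final Map<String, String> database = new HashMap<>();    // docId -> full record

    void add(String id, String text, String fullRecord) {
        indexedText.put(id, text);
        database.put(id, fullRecord);
    }

    // Step 1: the "index" returns matching ids, best score first.
    List<String> matchIds(String term, int limit) {
        String needle = term.toLowerCase(Locale.ROOT);
        Map<String, Integer> scores = new HashMap<>();
        for (Map.Entry<String, String> entry : indexedText.entrySet()) {
            int score = 0;
            for (String token : entry.getValue().toLowerCase(Locale.ROOT).split("\\s+")) {
                if (token.equals(needle)) {
                    score++; // naive score: number of occurrences of the term
                }
            }
            if (score > 0) {
                scores.put(entry.getKey(), score);
            }
        }
        List<String> ids = new ArrayList<>(scores.keySet());
        ids.sort((a, b) -> {
            int byScore = Integer.compare(scores.get(b), scores.get(a)); // higher score first
            return byScore != 0 ? byScore : a.compareTo(b);              // tie-break by id
        });
        return new ArrayList<>(ids.subList(0, Math.min(limit, ids.size())));
    }

    // Step 2: hydrate the ranked ids into full records from the "database".
    List<String> search(String term, int limit) {
        List<String> results = new ArrayList<>();
        for (String id : matchIds(term, limit)) {
            String record = database.get(id);
            if (record != null) {
                results.add(record);
            }
        }
        return results;
    }
}
```

Keeping only searchable fields in the index and fetching full records by ID is what lets the index stay smaller than the data it describes.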

Key Components¶

Bootstrapper¶

The Bootstrapper<IndexableDocument> interface is how you feed data into Forage:

public interface Bootstrapper<T> {
    void bootstrap(Consumer<T> consumer) throws Exception;
}
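
As a minimal sketch of an implementation, the class below "scans" an in-memory list instead of a database. The interface is copied from the definition above so the example compiles on its own; InMemoryBootstrapper is a hypothetical name, and String stands in for the document type.

```java
import java.util.List;
import java.util.function.Consumer;

// Copied from the interface shown above so this sketch is self-contained.
interface Bootstrapper<T> {
    void bootstrap(Consumer<T> consumer) throws Exception;
}

// Hypothetical implementation backed by a fixed list rather than a real database.
class InMemoryBootstrapper implements Bootstrapper<String> {
    private final List<String> rows;

    InMemoryBootstrapper(List<String> rows) {
        this.rows = List.copyOf(rows); // defensive copy
    }

    @Override
    public void bootstrap(Consumer<String> consumer) {
        // Hand every row to the consumer; a real implementation would stream from the database.
        for (String row : rows) {
            consumer.accept(row);
        }
    }
}
```

Pushing documents into a Consumer, rather than returning a list, means an implementation can stream rows one at a time without holding the whole result set in memory.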

Store¶

The Store<D> interface retrieves full data objects given their IDs:

public interface Store<D> {
    Map<String, D> get(List<String> ids);
}
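
A minimal sketch of a Store backed by an in-memory map follows. The interface is copied from the definition above; MapBackedStore is a hypothetical name, and silently skipping unknown IDs is an assumption, since this page does not specify how missing IDs should be handled.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Copied from the interface shown above so this sketch is self-contained.
interface Store<D> {
    Map<String, D> get(List<String> ids);
}

// Hypothetical implementation backed by a map rather than a real database.
class MapBackedStore<D> implements Store<D> {
    private final Map<String, D> rows;

    MapBackedStore(Map<String, D> rows) {
        this.rows = Map.copyOf(rows); // defensive copy
    }

    @Override
    public Map<String, D> get(List<String> ids) {
        Map<String, D> found = new LinkedHashMap<>(); // preserves the requested order
        for (String id : ids) {
            D row = rows.get(id);
            if (row != null) {
                found.put(id, row); // unknown ids are skipped (an assumption of this sketch)
            }
        }
        return found;
    }
}
```

Taking a list of IDs in one call lets a database-backed implementation fetch all matches with a single batched query.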

ForageQuery¶

The query abstraction that supports multiple query types:

MatchQuery - Term matching
BooleanQuery - Combine queries
RangeQuery - Numeric ranges
FuzzyMatchQuery - Typo tolerance
PhraseMatchQuery - Exact phrases
PrefixMatchQuery - Prefix matching
FunctionScoreQuery - Custom scoring

ForageQueryResult¶

Search results containing:

matchingResults - List of matched documents with scores
total - Total number of matches
nextPage - Cursor for pagination

Prerequisites¶

Before using Forage, ensure:

Data Accessibility: You can scan/stream all data from your database
Memory Capacity: You have sufficient heap space (2-4x your raw data size)
Java 17+: Forage requires Java 17 or later
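
The 2-4x rule of thumb above can be turned into a quick back-of-the-envelope check. HeapEstimateSketch is a hypothetical helper, not part of Forage, and the result is only as good as the rule of thumb itself.

```java
// Hypothetical helper applying the 2-4x heap rule of thumb; not a Forage class.
class HeapEstimateSketch {
    // Returns {low, high} heap estimates in MB for the given raw data size in MB.
    static long[] estimateHeapMb(long rawDataMb) {
        return new long[] { 2 * rawDataMb, 4 * rawDataMb };
    }

    // True if the upper estimate fits within this JVM's maximum heap.
    static boolean fitsInCurrentJvm(long rawDataMb) {
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        return estimateHeapMb(rawDataMb)[1] <= maxHeapMb;
    }
}
```

For example, 500 MB of raw data gives an estimate of 1000 to 2000 MB of heap for the index, on top of whatever the rest of the application needs.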

Limitations¶

Limitation	Details
Dataset Size	Optimized for up to ~1 million documents
Memory Bound	Index must fit in JVM heap
Single Node	No distributed search (each node has its own index)
Eventual Consistency	Changes reflect after the next bootstrap cycle
Database Workload	Each bootstrap cycle must fetch all documents from the primary datastore rather than only the incremental changes

Next Steps¶

Installation Guide - Add Forage to your project
Quick Start - Build your first search engine
Core Concepts - Deep dive into Forage concepts