
Performance

This guide covers strategies for getting the best performance out of Forage: memory sizing, indexing, query construction, caching, and bootstrap tuning.

Memory Management

Heap Sizing

Forage maintains the index in JVM heap memory. Plan for:

Required Heap ≈ Raw Data Size × 2-4

Data Size    Minimum Heap    Recommended Heap
10 MB        40 MB           80 MB
100 MB       400 MB          800 MB
1 GB         4 GB            8 GB

JVM Configuration

# For 500K documents (~500MB data)
java -Xms2g -Xmx4g -XX:+UseG1GC -jar myapp.jar
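
To confirm the heap is actually large enough once the index is loaded, enable GC logging and check heap occupancy after collections. This is standard JVM tooling rather than a Forage feature; the flag below assumes JDK 9+ unified logging.

# Same settings as above, plus unified GC logging (JDK 9+)
java -Xms2g -Xmx4g -XX:+UseG1GC -Xlog:gc*:file=gc.log:time -jar myapp.jar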

Monitoring Memory

// Log memory usage after bootstrap
Runtime runtime = Runtime.getRuntime();
long usedMemory = runtime.totalMemory() - runtime.freeMemory();
log.info("Index memory usage: {} MB", usedMemory / (1024 * 1024));

Indexing Performance

Minimize Fields

Only index fields you'll search:

// Good: Minimal fields
new ForageDocument(book.getId(), Arrays.asList(
    new TextField("title", book.getTitle()),
    new TextField("author", book.getAuthor())
));

// Avoid: Unnecessary fields
new ForageDocument(book.getId(), Arrays.asList(
    new TextField("title", book.getTitle()),
    new TextField("author", book.getAuthor()),
    new TextField("internalNotes", book.getInternalNotes()),    // Never searched
    new TextField("legacyDescription", book.getLegacyDesc())    // Never searched
));

Use Appropriate Field Types

// Efficient: Right field type for the job
new StringField("isbn", book.getIsbn()),     // Exact match only
new IntField("year", new int[]{book.getYear()})  // Numeric

// Less efficient: TextField for non-searchable data
new TextField("isbn", book.getIsbn())  // Unnecessarily analyzed

Batch Bootstrap

Stream data efficiently:

@Override
public void bootstrap(Consumer<IndexableDocument> consumer) {
    // Good: Stream processing
    try (Stream<Book> books = repository.streamAll()) {
        books.forEach(book -> consumer.accept(createDocument(book)));
    }

    // Avoid: Loading all into memory first
    // List<Book> allBooks = repository.findAll(); // OOM risk!
}

Query Performance

Be Specific

// Fast: Specific field
QueryBuilder.matchQuery("title", searchTerm)

// Slower: Multiple fields
QueryBuilder.booleanQuery()
    .query(QueryBuilder.matchQuery("title", searchTerm).build())
    .query(QueryBuilder.matchQuery("author", searchTerm).build())
    .query(QueryBuilder.matchQuery("description", searchTerm).build())
    .query(QueryBuilder.matchQuery("tags", searchTerm).build())
    .clauseType(ClauseType.SHOULD)

Use Filters

Filter clauses skip score computation and can be cached, which makes them faster than scoring clauses:

// Fast: FILTER doesn't compute scores
QueryBuilder.booleanQuery()
    .query(QueryBuilder.matchQuery("title", searchTerm).build())
    .query(QueryBuilder.matchQuery("category", "technology")
        .clauseType(ClauseType.FILTER).build())  // Cached filter
    .clauseType(ClauseType.MUST)

// Slower: MUST computes scores
QueryBuilder.booleanQuery()
    .query(QueryBuilder.matchQuery("title", searchTerm).build())
    .query(QueryBuilder.matchQuery("category", "technology").build())  // Scores computed
    .clauseType(ClauseType.MUST)

Limit Results

// Good: Reasonable limit
QueryBuilder.matchQuery("title", searchTerm).buildForageQuery(20)

// Bad: Excessive results
QueryBuilder.matchQuery("title", searchTerm).buildForageQuery(10000)

Avoid Short Prefixes

// Slow: Very short prefix
QueryBuilder.prefixMatchQuery("title", "a")  // Matches too many

// Fast: Longer prefix
QueryBuilder.prefixMatchQuery("title", "prog")  // More selective

Caching Strategies

Query Result Caching

private final Cache<String, ForageQueryResult<Book>> queryCache =
    CacheBuilder.newBuilder()
        .maximumSize(1000)
        .expireAfterWrite(5, TimeUnit.MINUTES)
        .build();

public ForageQueryResult<Book> search(String query) {
    String cacheKey = "search:" + query.toLowerCase();
    try {
        return queryCache.get(cacheKey, () -> engine.search(
            QueryBuilder.matchQuery("title", query).buildForageQuery(20)
        ));
    } catch (ExecutionException e) {
        // Guava's Cache.get(K, Callable) declares a checked ExecutionException
        throw new RuntimeException("Search failed", e);
    }
}

Autocomplete Caching

private final Cache<String, List<String>> autocompleteCache =
    CacheBuilder.newBuilder()
        .maximumSize(5000)
        .expireAfterWrite(10, TimeUnit.MINUTES)
        .build();
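
The cache above only holds the suggestions; a typical wrapper looks like the sketch below. It combines the prefix-length guard from "Avoid Short Prefixes" with the Guava Callable-loader pattern used for the query cache, and assumes prefixMatchQuery exposes the same buildForageQuery(...) terminal as matchQuery. extractTitles is a hypothetical helper, since mapping a ForageQueryResult to suggestion strings is not covered in this guide.

public List<String> autocomplete(String prefix) {
    // Skip very short prefixes entirely (see "Avoid Short Prefixes" above)
    if (prefix.length() < 3) {
        return Collections.emptyList();
    }
    String cacheKey = "ac:" + prefix.toLowerCase();
    try {
        return autocompleteCache.get(cacheKey, () -> {
            ForageQueryResult<Book> results = engine.search(
                QueryBuilder.prefixMatchQuery("title", prefix).buildForageQuery(10)
            );
            return extractTitles(results);  // placeholder: map results to suggestion strings
        });
    } catch (ExecutionException e) {
        throw new RuntimeException("Autocomplete lookup failed", e);
    }
}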

Bootstrap Optimization

Parallel Processing

@Override
public void bootstrap(Consumer<IndexableDocument> consumer) {
    // findAll() materializes every book in memory; only do this when the dataset fits in heap
    List<Book> allBooks = repository.findAll();

    // Process in parallel (consumer is thread-safe)
    allBooks.parallelStream()
        .map(this::createDocument)
        .forEach(consumer);
}

Optimize Update Frequency

// Consider your data freshness requirements
new PeriodicUpdateEngine<>(
    bootstrapper,
    new AsyncQueuedConsumer<>(engine),
    300,  // 5 minutes - balance freshness vs CPU
    TimeUnit.SECONDS
);

Benchmarking

Query Timing

public ForageQueryResult<Book> searchWithTiming(String query) {
    long start = System.nanoTime();

    ForageQueryResult<Book> results = engine.search(
        QueryBuilder.matchQuery("title", query).buildForageQuery(20)
    );

    long elapsed = System.nanoTime() - start;
    log.debug("Query '{}' took {} ms, found {} results",
        query, elapsed / 1_000_000.0, results.getTotal().getValue());

    return results;
}

Bootstrap Timing

@Override
public void bootstrap(Consumer<IndexableDocument> consumer) {
    long start = System.currentTimeMillis();
    AtomicInteger count = new AtomicInteger();

    // Close the stream when done, as in the batch bootstrap example above
    try (Stream<Book> books = repository.streamAll()) {
        books.forEach(book -> {
            consumer.accept(createDocument(book));
            count.incrementAndGet();
        });
    }

    log.info("Bootstrapped {} documents in {} ms",
        count.get(), System.currentTimeMillis() - start);
}

Performance Checklist

  • [ ] Heap sized appropriately (2-4x data size)
  • [ ] Only necessary fields indexed
  • [ ] Using correct field types
  • [ ] Result limits applied
  • [ ] Filters used where scoring not needed
  • [ ] Caching implemented for frequent queries
  • [ ] Bootstrap interval tuned for use case
  • [ ] Monitoring in place

Typical Performance

Operation                    Documents    Typical Time
Simple match query           100K         < 5ms
Boolean query (3 clauses)    100K         < 10ms
Function score query         100K         < 15ms
Full bootstrap               100K         5-15 seconds