Data Store

The Store interface retrieves the full data objects for the document IDs that a search returns. The Lucene index holds document IDs and indexed fields; the Store supplies the complete objects.

The Store Interface

public interface Store<D> {
    Map<String, D> get(List<String> ids);
}

Parameters

  • ids: List of document IDs that matched the search query
  • Returns: Map of ID to full data object
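
Because the result is a Map, it does not necessarily preserve the order of the incoming ID list. The sketch below shows one way code that calls a Store directly (a unit test, for example) might put the objects back into the order of the IDs. It assumes the ID list arrives in ranking order, which this page does not state, and `RankedResults` and `inRankOrder` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Mirrors the Store interface shown above.
interface Store<D> {
    Map<String, D> get(List<String> ids);
}

// Hypothetical helper, not part of the library.
final class RankedResults {

    private RankedResults() {
    }

    // Returns the stored objects in the same order as rankedIds,
    // skipping any ID the store did not return.
    static <D> List<D> inRankOrder(Store<D> store, List<String> rankedIds) {
        Map<String, D> byId = store.get(rankedIds);
        List<D> ordered = new ArrayList<>();
        for (String id : rankedIds) {
            D item = byId.get(id);
            if (item != null) {
                ordered.add(item);
            }
        }
        return ordered;
    }
}
```

Iterating over the ID list rather than over the map keeps the result order independent of the Map implementation a particular Store happens to return.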

Basic Implementation

public class BookStore implements Store<Book> {

    private final Map<String, Book> booksCache = new ConcurrentHashMap<>();

    @Override
    public Map<String, Book> get(List<String> ids) {
        Map<String, Book> results = new HashMap<>();
        for (String id : ids) {
            Book book = booksCache.get(id);
            if (book != null) {
                results.put(id, book);
            }
        }
        return results;
    }
}

Database-Backed Implementation

In production, you typically fetch from a database:

public class BookStore implements Store<Book> {

    private final BookRepository repository;  // Your JPA/JDBC repository

    public BookStore(BookRepository repository) {
        this.repository = repository;
    }

    @Override
    public Map<String, Book> get(List<String> ids) {
        if (ids == null || ids.isEmpty()) {
            return Collections.emptyMap();
        }

        // Batch fetch from database
        List<Book> books = repository.findAllById(ids);

        return books.stream()
            .collect(Collectors.toMap(Book::getId, book -> book));
    }
}

Caching Considerations

Since Store.get() is on the path of every search, consider caching:

Option 1: In-Memory Cache

public class CachedBookStore implements Store<Book> {

    private final BookRepository repository;
    private final Cache<String, Book> cache;

    public CachedBookStore(BookRepository repository) {
        this.repository = repository;
        this.cache = CacheBuilder.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(5, TimeUnit.MINUTES)
            .build();
    }

    @Override
    public Map<String, Book> get(List<String> ids) {
        Map<String, Book> results = new HashMap<>();
        List<String> missingIds = new ArrayList<>();

        // Check cache first
        for (String id : ids) {
            Book cached = cache.getIfPresent(id);
            if (cached != null) {
                results.put(id, cached);
            } else {
                missingIds.add(id);
            }
        }

        // Fetch missing from database
        if (!missingIds.isEmpty()) {
            List<Book> fromDb = repository.findAllById(missingIds);
            for (Book book : fromDb) {
                results.put(book.getId(), book);
                cache.put(book.getId(), book);
            }
        }

        return results;
    }
}

Option 2: Full In-Memory Store

For small datasets, keep everything in memory:

public class InMemoryBookStore implements Bootstrapper<IndexableDocument>, Store<Book> {

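    private final Map<String, Book> books = new ConcurrentHashMap<>();
    private final BookRepository repository;

    // The final repository field must be assigned, so inject it here
    public InMemoryBookStore(BookRepository repository) {
        this.repository = repository;
    }
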
    @Override
    public void bootstrap(Consumer<IndexableDocument> consumer) {
        // Load all books into memory during bootstrap
        List<Book> allBooks = repository.findAll();
        for (Book book : allBooks) {
            books.put(book.getId(), book);
            consumer.accept(createDocument(book));
        }
    }

    @Override
    public Map<String, Book> get(List<String> ids) {
        // Fast in-memory lookup
        return ids.stream()
            .filter(books::containsKey)
            .collect(Collectors.toMap(id -> id, books::get));
    }
}

Handling Missing Data

When a document ID exists in the index but not in the store, you can either log and skip it or return a placeholder:

@Override
public Map<String, Book> get(List<String> ids) {
    Map<String, Book> results = new HashMap<>();

    for (String id : ids) {
        Book book = fetchFromDatabase(id);
        if (book != null) {
            results.put(id, book);
        } else {
            // Option 1: Log and skip
            log.warn("Book not found for id: {}", id);

            // Option 2: Return a placeholder
            // results.put(id, Book.notFound(id));
        }
    }

    return results;
}

Index-Store Consistency

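If data is deleted from your database between bootstrap cycles, search results may include stale IDs until the next bootstrap rebuilds the index. Your Store implementation should tolerate these IDs, for example by omitting them from the returned map instead of throwing an exception.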
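
One way to detect stale IDs without issuing a query per ID is to compare the requested IDs with the keys that a batch fetch returned. The helper below is a minimal sketch of that comparison; `StaleIds` and `missingFrom` are hypothetical names, not part of the library.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the library.
final class StaleIds {

    private StaleIds() {
    }

    // Returns the requested IDs that are absent from the fetched map,
    // de-duplicated and in the order they were requested.
    static Set<String> missingFrom(List<String> requestedIds, Map<String, ?> fetched) {
        Set<String> missing = new LinkedHashSet<>(requestedIds);
        missing.removeAll(fetched.keySet());
        return missing;
    }
}
```

A Store could call this once after its batch query and log the resulting set in a single message, rather than logging inside a per-ID loop.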

Combining Bootstrapper and Store

Typically, one class implements both interfaces:

public class BookDataStore implements Bootstrapper<IndexableDocument>, Store<Book> {

    private final BookRepository repository;
    private final Map<String, Book> cache = new ConcurrentHashMap<>();

    public BookDataStore(BookRepository repository) {
        this.repository = repository;
    }

    @Override
    public void bootstrap(Consumer<IndexableDocument> consumer) {
        // Clear and rebuild cache during bootstrap
        cache.clear();

        repository.findAll().forEach(book -> {
            cache.put(book.getId(), book);
            consumer.accept(createDocument(book));
        });
    }

    @Override
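    public Map<String, Book> get(List<String> ids) {
        // Use cache populated during bootstrap. A single get() per ID avoids a
        // NullPointerException from Collectors.toMap if bootstrap clears the
        // cache between a containsKey check and the lookup.
        Map<String, Book> results = new HashMap<>();
        for (String id : ids) {
            Book book = cache.get(id);
            if (book != null) {
                results.put(id, book);
            }
        }
        return results;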
    }

    private ForageDocument createDocument(Book book) {
        return new ForageDocument(book.getId(), Arrays.asList(
            new TextField("title", book.getTitle()),
            new TextField("author", book.getAuthor()),
            new FloatField("rating", new float[]{book.getRating()})
        ));
    }
}

Performance Tips

1. Batch Database Queries

// Good: Single batch query
List<Book> books = repository.findAllById(ids);

// Bad: N+1 queries
for (String id : ids) {
    Book book = repository.findById(id);
}

2. Limit Result Size

// Limit search results so Store.get() receives fewer IDs
QueryBuilder.matchQuery("title", "java")
    .buildForageQuery(10);  // Only fetch top 10

3. Use Projections for Large Objects

// If Book has large fields you don't need in search results
public class BookSummary {
    private String id;
    private String title;
    private String author;
    private float rating;
    // Omit large fields like 'fullContent'
}

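public class BookStore implements Store<BookSummary> {

    private final BookRepository repository;

    public BookStore(BookRepository repository) {
        this.repository = repository;
    }

    @Override
    public Map<String, BookSummary> get(List<String> ids) {
        // findSummariesByIds is assumed to be a custom repository method
        // that returns Map<String, BookSummary> with only the summary fields
        return repository.findSummariesByIds(ids);
    }
}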

Next Steps