Best Practices
Follow these best practices to build robust, performant search functionality with Forage.
Indexing Best Practices
1. Index Only What You Search
// Good: Minimal, purposeful fields
new ForageDocument(book.getId(), Arrays.asList(
new TextField("title", book.getTitle()),
new TextField("author", book.getAuthor()),
new FloatField("rating", new float[]{book.getRating()})
));
// Avoid: Indexing everything
new ForageDocument(book.getId(), Arrays.asList(
new TextField("title", book.getTitle()),
new TextField("author", book.getAuthor()),
new TextField("description", book.getDescription()),
new TextField("tableOfContents", book.getToc()), // Never searched
new TextField("publisherNotes", book.getNotes()), // Never searched
new TextField("internalComments", book.getComments()) // Never searched
));
2. Use Correct Field Types
| Data | Field Type | Reason |
|---|---|---|
| Searchable text | TextField | Tokenized for full-text search |
| IDs, codes, exact values | StringField | Not tokenized, exact match |
| Numbers for filtering/sorting | IntField/FloatField | Range queries, sorting |
| Numbers for scoring | FloatField | Function score access |
new ForageDocument(id, data, Arrays.asList(
new TextField("title", title), // Full-text search
new StringField("isbn", isbn), // Exact match
new StringField("status", status), // Exact match
new IntField("year", new int[]{year}), // Range queries
new FloatField("rating", new float[]{rating}) // Sorting, scoring
));
3. Handle Null Values
private ForageDocument createDocument(Book book) {
List<Field> fields = new ArrayList<>();
// Required fields
fields.add(new TextField("title",
book.getTitle() != null ? book.getTitle() : ""));
// Optional fields - only add if present
if (book.getDescription() != null && !book.getDescription().isEmpty()) {
fields.add(new TextField("description", book.getDescription()));
}
// Numeric fields - use defaults for null
fields.add(new FloatField("rating", new float[]{
book.getRating() != null ? book.getRating() : 0.0f
}));
return new ForageDocument(book.getId(), fields);
}
4. Normalize Data
// Normalize text for consistent searching
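// Apply the same normalization to query values so exact-match lookups still hit
fields.add(new TextField("title", book.getTitle().trim()));
// Locale.ROOT keeps lowercasing independent of the JVM's default locale
fields.add(new StringField("category", book.getCategory().toLowerCase(Locale.ROOT)));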
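One way to keep index-time and query-time normalization in sync is to route both through a single helper. The sketch below is plain Java with no Forage dependency; the class `TextNormalizer` and its method names are hypothetical and only illustrate the idea.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Hypothetical helper (not part of Forage) for normalizing values before indexing or querying. */
final class TextNormalizer {
    private TextNormalizer() { }

    /** Trims, collapses runs of whitespace to one space, and maps null to the empty string. */
    static String cleanText(String raw) {
        if (raw == null) {
            return "";
        }
        return raw.trim().replaceAll("\\s+", " ");
    }

    /** Builds a canonical key for exact-match fields such as category or status. */
    static String exactKey(String raw) {
        // NFKC folds compatibility characters (e.g. full-width letters) into their plain forms
        String cleaned = Normalizer.normalize(cleanText(raw), Normalizer.Form.NFKC);
        return cleaned.toLowerCase(Locale.ROOT);
    }
}
```

Calling `TextNormalizer.exactKey(...)` both when building a `StringField` and when building the query value means a lookup for " Technology " and a stored "technology" resolve to the same term.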
Query Best Practices
1. Be Specific
// Good: Search specific field
QueryBuilder.matchQuery("title", searchTerm)
// Less optimal: Search many fields
QueryBuilder.booleanQuery()
.query(QueryBuilder.matchQuery("title", searchTerm).build())
.query(QueryBuilder.matchQuery("author", searchTerm).build())
.query(QueryBuilder.matchQuery("description", searchTerm).build())
.clauseType(ClauseType.SHOULD)
2. Use Filters for Non-Scoring Clauses
// Good: FILTER for non-scoring conditions
QueryBuilder.booleanQuery()
.query(QueryBuilder.matchQuery("title", searchTerm).build())
.query(QueryBuilder.matchQuery("category", "technology")
.clauseType(ClauseType.FILTER).build()) // No scoring overhead
.clauseType(ClauseType.MUST)
// Less optimal: MUST computes unnecessary scores
QueryBuilder.booleanQuery()
.query(QueryBuilder.matchQuery("title", searchTerm).build())
.query(QueryBuilder.matchQuery("category", "technology").build())
.clauseType(ClauseType.MUST)
3. Limit Results
// Good: Reasonable limit
QueryBuilder.matchQuery("title", searchTerm).buildForageQuery(20)
// Avoid: Excessive results
QueryBuilder.matchQuery("title", searchTerm).buildForageQuery(10000)
4. Use Minimum Score for Quality
// Filter out weak matches
QueryBuilder.matchQuery("title", searchTerm)
.buildForageQuery(20, null, 0.3f)
Scoring Best Practices
1. Start Simple, Add Complexity
// Start with basic search
QueryBuilder.matchQuery("title", searchTerm)
// Then add boosting if needed
QueryBuilder.matchQuery("title", searchTerm).boost(2.0f)
// Then add function scoring if needed
QueryBuilder.functionScoreQuery()
.baseQuery(QueryBuilder.matchQuery("title", searchTerm).build())
.fieldValueFactor("rating")
2. Document Your Boosts
// Documented boost strategy
QueryBuilder.booleanQuery()
.query(QueryBuilder.matchQuery("title", term)
.boost(3.0f).build()) // Title: highest priority
.query(QueryBuilder.matchQuery("author", term)
.boost(2.0f).build()) // Author: high priority
.query(QueryBuilder.matchQuery("description", term)
.boost(1.0f).build()) // Description: baseline
.clauseType(ClauseType.SHOULD)
3. Test with Real Data
// Verify ranking with actual queries
@Test
void testSearchRanking() {
var results = engine.search(
QueryBuilder.matchQuery("title", "java").buildForageQuery(10)
);
// Verify expected order
assertEquals("Effective Java", results.getMatchingResults().get(0).getData().getTitle());
}
Performance Best Practices
1. Size Heap Appropriately
# Rule of thumb: set the max heap to 2-4x the raw size of the indexed data
java -Xmx4g -jar myapp.jar
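The rule of thumb can also be checked at startup. The following is a minimal sketch that only applies the 2-4x arithmetic; `HeapSizing` is a hypothetical helper, not a Forage class, and the raw data size is something the application would have to estimate itself.

```java
/** Hypothetical helper (not part of Forage) that applies the 2-4x heap rule of thumb. */
final class HeapSizing {
    private HeapSizing() { }

    /** Lower bound of the suggested heap: 2x the raw data size. */
    static long minHeapBytes(long rawDataBytes) {
        return Math.multiplyExact(rawDataBytes, 2L);
    }

    /** Upper bound of the suggested heap: 4x the raw data size. */
    static long maxHeapBytes(long rawDataBytes) {
        return Math.multiplyExact(rawDataBytes, 4L);
    }

    /** True when the running JVM's maximum heap reaches at least the lower bound. */
    static boolean currentHeapLooksSufficient(long rawDataBytes) {
        return Runtime.getRuntime().maxMemory() >= minHeapBytes(rawDataBytes);
    }
}
```

With 1 GiB of raw data the suggested range is 2 to 4 GiB, so `-Xmx4g` sits at the top of it.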
2. Stream Large Datasets
@Override
public void bootstrap(Consumer<IndexableDocument> consumer) {
// Good: Stream processing
try (Stream<Book> stream = repository.streamAll()) {
stream.forEach(book -> consumer.accept(createDocument(book)));
}
// Avoid: Loading all into memory
// List<Book> all = repository.findAll();
}
3. Cache Frequent Queries
// Guava cache: holds up to 1000 results, each dropped 5 minutes after it was written
private final Cache<String, ForageQueryResult<Book>> cache =
CacheBuilder.newBuilder()
.maximumSize(1000)
.expireAfterWrite(5, TimeUnit.MINUTES)
.build();
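A cache only helps if equivalent queries share a key and lookups fall through to the engine on a miss. The sketch below shows that read-through pattern with the JDK alone: a size-bounded `LinkedHashMap` stands in for the Guava cache so the example has no dependencies, and it has no time-based expiry. `CachedSearch` and `cacheKey` are hypothetical names, and lowercasing the key assumes the searched field is analyzed case-insensitively.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical read-through cache wrapper (not part of Forage or Guava). */
final class CachedSearch<R> {
    private final Function<String, R> loader; // e.g. q -> engine.search(buildQuery(q))
    private final Map<String, R> lru;

    CachedSearch(int maxEntries, Function<String, R> loader) {
        this.loader = loader;
        // accessOrder = true keeps the least recently used entry first
        this.lru = new LinkedHashMap<String, R>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, R> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Maps equivalent inputs such as " Java " and "java" to one key. */
    static String cacheKey(String query) {
        return query.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    /** Returns the cached result, or runs the loader once and stores its result. */
    synchronized R search(String query) {
        String key = cacheKey(query);
        R cached = lru.get(key);
        if (cached == null) {
            cached = loader.apply(key);
            lru.put(key, cached);
        }
        return cached;
    }
}
```

With Guava, the same shape is usually written as `cache.get(key, () -> ...)`; the point of the sketch is the key normalization and the single load per key. Cached results can be stale for up to the expiry window after the index is refreshed.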
4. Monitor Performance
public ForageQueryResult<Book> search(String query) {
long start = System.nanoTime();
var results = engine.search(buildQuery(query));
long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
if (elapsed > 100) {
log.warn("Slow query: '{}' took {}ms", query, elapsed);
}
return results;
}
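The timing logic above can be factored into a reusable wrapper so every search path is measured the same way, including calls that throw. This is a minimal sketch with no Forage dependency: `Timed` and its parameters are hypothetical names, and the slow-call callback stands in for the `log.warn` call.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;
import java.util.function.Supplier;

/** Hypothetical timing wrapper (not part of Forage). */
final class Timed {
    private Timed() { }

    /**
     * Runs the action and reports the elapsed milliseconds to onSlow
     * when they exceed thresholdMillis, even if the action throws.
     */
    static <T> T call(Supplier<T> action, long thresholdMillis, LongConsumer onSlow) {
        long start = System.nanoTime();
        try {
            return action.get();
        } finally {
            long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            if (elapsed > thresholdMillis) {
                onSlow.accept(elapsed);
            }
        }
    }
}
```

A call site might look like `Timed.call(() -> engine.search(buildQuery(query)), 100, ms -> log.warn("Slow query: '{}' took {}ms", query, ms))`.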
Error Handling Best Practices
1. Handle Bootstrap Errors Gracefully
@Override
public void bootstrap(Consumer<IndexableDocument> consumer) {
int errors = 0;
for (Book book : repository.findAll()) {
try {
consumer.accept(createDocument(book));
} catch (Exception e) {
errors++;
log.error("Failed to index {}", book.getId(), e); // pass the exception so the stack trace is logged
}
}
if (errors > 0) {
log.warn("Bootstrap completed with {} errors", errors);
}
}
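Skipping bad records is reasonable, but if nearly every record fails (for example, because the data source is down), finishing with an almost empty index is probably worse than failing fast. The sketch below adds an error budget to the loop above. `TolerantIndexer` and `maxErrors` are hypothetical, and the generic `Consumer<T>` stands in for the bootstrap consumer.

```java
import java.util.function.Consumer;

/** Hypothetical helper (not part of Forage) that tolerates a bounded number of indexing failures. */
final class TolerantIndexer {
    private TolerantIndexer() { }

    /**
     * Feeds each item to the sink and returns the number of failures.
     * Throws IllegalStateException once failures exceed maxErrors.
     */
    static <T> int indexAll(Iterable<T> items, Consumer<T> sink, int maxErrors) {
        int errors = 0;
        for (T item : items) {
            try {
                sink.accept(item);
            } catch (RuntimeException e) {
                errors++;
                if (errors > maxErrors) {
                    throw new IllegalStateException(
                            "Aborting bootstrap after " + errors + " failures", e);
                }
            }
        }
        return errors;
    }
}
```

The returned count can feed the same warning log as before, or a metric that alerts when bootstrap quality degrades.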
2. Handle Missing Store Data
@Override
public Map<String, Book> get(List<String> ids) {
Map<String, Book> results = new HashMap<>();
for (String id : ids) {
Book book = repository.findById(id).orElse(null);
if (book != null) {
results.put(id, book);
} else {
log.warn("Book not found in store: {}", id);
}
}
return results;
}
3. Validate Queries
public ForageQueryResult<Book> search(String query) {
if (query == null || query.trim().isEmpty()) {
return ForageQueryResult.empty();
}
if (query.length() > 1000) {
throw new IllegalArgumentException("Query too long");
}
return engine.search(buildQuery(query.trim()));
}
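The same checks can live in a small standalone validator, which makes them easy to unit test without a running engine. A minimal sketch, assuming the 1000-character limit from the example above; `QueryValidator` and `sanitize` are hypothetical names.

```java
import java.util.Optional;

/** Hypothetical validator (not part of Forage) for raw user queries. */
final class QueryValidator {
    static final int MAX_QUERY_LENGTH = 1000; // limit taken from the example above

    private QueryValidator() { }

    /**
     * Returns the trimmed query, or an empty Optional when there is nothing to search for.
     * Throws IllegalArgumentException when the trimmed query exceeds MAX_QUERY_LENGTH.
     */
    static Optional<String> sanitize(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        String trimmed = raw.trim();
        if (trimmed.isEmpty()) {
            return Optional.empty();
        }
        if (trimmed.length() > MAX_QUERY_LENGTH) {
            throw new IllegalArgumentException(
                    "Query too long: " + trimmed.length() + " characters (max " + MAX_QUERY_LENGTH + ")");
        }
        return Optional.of(trimmed);
    }
}
```

The search method can then return its empty result when `sanitize` yields nothing and pass the sanitized string to `buildQuery` otherwise.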
Testing Best Practices
1. Test Different Query Types
@Test
void testMatchQuery() { ... }
@Test
void testBooleanQuery() { ... }
@Test
void testRangeQuery() { ... }
@Test
void testFunctionScoreQuery() { ... }
2. Test Edge Cases
@Test
void testEmptyQuery() { ... }
@Test
void testSpecialCharacters() { ... }
@Test
void testVeryLongQuery() { ... }
@Test
void testNoResults() { ... }
3. Test Ranking
@Test
void testResultsOrderedByRelevance() {
// Results should come back in descending score order
var results = engine.search(query);
var matches = results.getMatchingResults();
assertTrue(matches.size() >= 2, "Need at least two results to compare scores");
assertTrue(matches.get(0).getDocScore().getScore() >=
matches.get(1).getDocScore().getScore());
}
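Comparing only the first two results leaves the rest of the list unchecked. A small helper can verify the whole ordering; the sketch below works on a plain list of scores so it stays independent of Forage's result types. `RankingAssertions` is a hypothetical name.

```java
import java.util.List;

/** Hypothetical test helper (not part of Forage) for checking score order. */
final class RankingAssertions {
    private RankingAssertions() { }

    /** Returns true when every score is greater than or equal to the one after it. */
    static boolean isNonIncreasing(List<Float> scores) {
        for (int i = 1; i < scores.size(); i++) {
            if (scores.get(i) > scores.get(i - 1)) {
                return false;
            }
        }
        return true;
    }
}
```

In a test, the scores would be collected from each result's `getDocScore().getScore()` and passed to `assertTrue(RankingAssertions.isNonIncreasing(scores))`.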
Summary Checklist
- [ ] Index only necessary fields
- [ ] Use correct field types
- [ ] Handle null values
- [ ] Use FILTER for non-scoring clauses
- [ ] Limit result sizes
- [ ] Apply minimum score thresholds
- [ ] Size heap appropriately
- [ ] Stream large datasets
- [ ] Cache frequent queries
- [ ] Handle errors gracefully
- [ ] Write comprehensive tests