Core Concepts
Understanding Forage's core concepts is essential for building effective search solutions. This section covers the fundamental building blocks.
Architecture Overview
graph TB
subgraph "Your Application"
DS[Data Store]
BS[Bootstrapper]
ST[Store]
end
subgraph "Forage Engine"
FE[ForageEngine]
LI[Lucene Index]
QG[QueryGenerator]
end
subgraph "Data Flow"
DS -->|implements| BS
DS -->|implements| ST
BS -->|IndexableDocument| FE
FE -->|Index| LI
Q[ForageQuery] --> QG
QG --> LI
LI -->|Doc IDs| FE
FE -->|get ids| ST
ST -->|Full Data| FE
FE --> R[ForageQueryResult]
end
Key Components
| Component | Purpose | Interface |
|---|---|---|
| Indexable Documents | Define what gets indexed | IndexableDocument |
| Field Types | Specify how data is analyzed | TextField, StringField, etc. |
| Data Store | Retrieve full data objects | Store<D> |
| Bootstrapping | Feed data into the index | Bootstrapper<T> |
The Search Flow
1. Indexing Phase
When bootstrapping occurs:
// Your bootstrapper creates IndexableDocuments
consumer.accept(new ForageDocument(
"book-123", // Unique ID
Arrays.asList( // Fields to index
new TextField("title", "Effective Java"),
new TextField("author", "Joshua Bloch"),
new FloatField("rating", new float[]{4.7f})
)
));
Lucene then:
- Analyzes text fields (tokenization, lowercasing, etc.)
- Creates inverted indexes for fast lookup
- Stores numeric fields for range queries and sorting
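To make the inverted-index idea concrete, here is a minimal toy sketch. It is not Forage or Lucene code: `ToyInvertedIndex` and its methods are hypothetical names, and the "analysis" step is reduced to lowercasing and splitting on non-alphanumeric characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy inverted index: illustrative only, not Forage or Lucene code.
final class ToyInvertedIndex {
    // term -> IDs of the documents that contain it, in insertion order
    private final Map<String, List<String>> postings = new TreeMap<>();

    // Simplified "analysis": lowercase, then split on non-alphanumerics.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Records the document ID under every term found in the text.
    void add(String docId, String text) {
        for (String term : analyze(text)) {
            List<String> ids = postings.computeIfAbsent(term, t -> new ArrayList<>());
            if (!ids.contains(docId)) {
                ids.add(docId);
            }
        }
    }

    // Looking up a term is a single map access, which is why search is fast.
    List<String> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), List.of());
    }
}
```

After adding "Effective Java" as `book-123`, `lookup("java")` returns a list containing `book-123` without scanning any document text.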
2. Query Phase
When a search is executed:
ForageQueryResult<Book> results = engine.search(
QueryBuilder.matchQuery("title", "java").buildForageQuery(10)
);
Forage:
- Converts your query to a Lucene query
- Executes it against the in-memory index
- Gets matching document IDs and scores
- Calls your Store.get() to fetch full objects
- Returns combined results
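The last two steps (IDs from the index, then full objects from your store) can be sketched as follows. This is an illustration under assumptions, not the Forage implementation: `TwoStepSearch`, `Hit`, and `hydrate` are hypothetical names, the store is modeled as a plain `Function<String, D>`, and dropping IDs the store cannot resolve is a choice made for this sketch only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative two-step lookup; names are hypothetical, not the Forage API.
final class TwoStepSearch {
    // A matched document: its ID, relevance score, and hydrated object.
    static final class Hit<D> {
        final String id;
        final float score;
        final D data;

        Hit(String id, float score, D data) {
            this.id = id;
            this.score = score;
            this.data = data;
        }
    }

    // scoredIds stands in for what the index returns; store stands in for Store.get().
    static <D> List<Hit<D>> hydrate(Map<String, Float> scoredIds, Function<String, D> store) {
        List<Hit<D>> hits = new ArrayList<>();
        for (Map.Entry<String, Float> entry : scoredIds.entrySet()) {
            D data = store.apply(entry.getKey());
            if (data != null) { // assumption of this sketch: skip unresolved IDs
                hits.add(new Hit<>(entry.getKey(), entry.getValue(), data));
            }
        }
        // Highest score first.
        hits.sort((a, b) -> Float.compare(b.score, a.score));
        return hits;
    }
}
```

Because the index holds only IDs and indexed fields, the full objects stay in your store and are fetched only for the documents that matched.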
3. Update Phase
Periodically, the PeriodicUpdateEngine:
- Triggers a new bootstrap
- Builds a fresh index
- Atomically swaps the old index with the new one
- Leaves the old index to be garbage collected
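The rebuild-and-swap cycle can be sketched with a plain `AtomicReference`. This is a minimal illustration, not the PeriodicUpdateEngine implementation: `SwappableIndex` is a hypothetical name, and the `Supplier` stands in for a full bootstrap.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Illustrative rebuild-and-swap holder; not the PeriodicUpdateEngine implementation.
final class SwappableIndex<I> {
    private final AtomicReference<I> current;
    private final Supplier<I> rebuild; // stands in for a full bootstrap

    SwappableIndex(Supplier<I> rebuild) {
        this.rebuild = rebuild;
        this.current = new AtomicReference<>(rebuild.get());
    }

    // Readers always get a fully built index, never a half-populated one.
    I get() {
        return current.get();
    }

    // Build the replacement off to the side, then publish it in one atomic step.
    // The previous index becomes unreachable once in-flight readers finish with it.
    void refresh() {
        I fresh = rebuild.get();
        current.set(fresh);
    }
}
```

To run the refresh periodically, one option is `Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(index::refresh, 60, 60, TimeUnit.SECONDS)`; the 60-second interval is an arbitrary example value.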
Data Model
ForageDocument
The primary indexable document type:
public class ForageDocument implements IndexableDocument {
private final String id; // Unique identifier
private final Object data; // Original data object
private final List<Field> fields; // Indexed fields
}
ForageQuery
The query abstraction:
public interface ForageQuery {
// Visitor pattern for different query types
<T> T accept(ForageQueryVisitor<T> visitor);
}
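For readers unfamiliar with the visitor pattern, here is a small self-contained sketch of how an `accept` method dispatches to type-specific logic. Every type name below (`ToyQuery`, `ToyQueryVisitor`, `ToyMatchQuery`, `ToyRangeQuery`, `DescribeVisitor`) is hypothetical; the methods of the real ForageQueryVisitor are not shown in this section and may differ.

```java
// Toy visitor-pattern sketch; all type names here are hypothetical, not Forage classes.
interface ToyQueryVisitor<T> {
    T visit(ToyMatchQuery query);
    T visit(ToyRangeQuery query);
}

interface ToyQuery {
    <T> T accept(ToyQueryVisitor<T> visitor);
}

final class ToyMatchQuery implements ToyQuery {
    final String field;
    final String value;

    ToyMatchQuery(String field, String value) {
        this.field = field;
        this.value = value;
    }

    @Override
    public <T> T accept(ToyQueryVisitor<T> visitor) {
        return visitor.visit(this); // resolves to the ToyMatchQuery overload
    }
}

final class ToyRangeQuery implements ToyQuery {
    final String field;
    final float low;
    final float high;

    ToyRangeQuery(String field, float low, float high) {
        this.field = field;
        this.low = low;
        this.high = high;
    }

    @Override
    public <T> T accept(ToyQueryVisitor<T> visitor) {
        return visitor.visit(this); // resolves to the ToyRangeQuery overload
    }
}

// One visitor is one operation over every query type; here, rendering a debug string.
final class DescribeVisitor implements ToyQueryVisitor<String> {
    @Override
    public String visit(ToyMatchQuery query) {
        return query.field + ":" + query.value;
    }

    @Override
    public String visit(ToyRangeQuery query) {
        return query.field + ":[" + query.low + " TO " + query.high + "]";
    }
}
```

The benefit of this design is that a new operation over queries (for example, translating them to Lucene queries) is a new visitor class, with no changes to the query types themselves.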
ForageQueryResult
Search results:
public class ForageQueryResult<D> {
private List<MatchingResult<D>> matchingResults; // Matched documents
private TotalResults total; // Total count
private String nextPage; // Pagination cursor
}
MatchingResult
Individual result with score:
public class MatchingResult<D> {
private String id; // Document ID
private D data; // Full data object from Store
private DocScore docScore; // Relevance score
}
Memory Model
Forage maintains the index entirely in JVM heap memory:
┌─────────────────────────────────────────────┐
│ JVM Heap │
│ ┌─────────────────────────────────────┐ │
│ │ Lucene Index │ │
│ │ ┌──────────────────────────────┐ │ │
│ │ │ Inverted Index (Terms) │ │ │
│ │ │ ───────────────────── │ │ │
│ │ │ "java" → [doc1, doc5] │ │ │
│ │ │ "code" → [doc2, doc3] │ │ │
│ │ └──────────────────────────────┘ │ │
│ │ ┌──────────────────────────────┐ │ │
│ │ │ DocValues (Numerics) │ │ │
│ │ │ ───────────────────── │ │ │
│ │ │ rating: [4.7, 4.4, 4.5] │ │ │
│ │ └──────────────────────────────┘ │ │
│ │ ┌──────────────────────────────┐ │ │
│ │ │ Stored Fields (IDs) │ │ │
│ │ └──────────────────────────────┘ │ │
│ └─────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────┐ │
│ │ Your Data Store Reference │ │
│ │ (for Store.get() calls) │ │
│ └─────────────────────────────────────┘ │
└─────────────────────────────────────────────┘
Memory Planning
Plan for 2-4x your raw data size in heap memory. The multiplier depends on:
- Number of text fields (more = more memory)
- Average document size
- Text analysis complexity
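As a quick worked example of the 2-4x guideline, the helper below turns a raw data size into a low and high heap estimate. `HeapEstimate` is a hypothetical name used only for this arithmetic; where your workload falls inside the range depends on the factors listed above.

```java
// Back-of-the-envelope heap estimate from the 2-4x rule of thumb.
final class HeapEstimate {
    static final long MIN_MULTIPLIER = 2;
    static final long MAX_MULTIPLIER = 4;

    // Returns {low, high} heap estimates in bytes for the given raw data size.
    static long[] rangeBytes(long rawDataBytes) {
        return new long[] {
            rawDataBytes * MIN_MULTIPLIER,
            rawDataBytes * MAX_MULTIPLIER
        };
    }
}
```

For 500 MiB of raw data, this gives a range of 1000 MiB to 2000 MiB.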
Thread Safety
Forage is designed for concurrent access:
- Read operations: Fully thread-safe; multiple threads can search simultaneously
- Write operations: Handled by the AsyncQueuedConsumer, which serializes writes
- Index swap: Atomic reference swap ensures readers always see a consistent index
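The idea of serializing writes through a queue can be sketched with a single-threaded executor. This is an illustration of the general technique, not the AsyncQueuedConsumer implementation: `SerializedWriter` is a hypothetical name, and the 10-second shutdown timeout is an arbitrary example value.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Illustrative write serializer; not the AsyncQueuedConsumer implementation.
final class SerializedWriter<T> implements Consumer<T>, AutoCloseable {
    // A single worker thread drains the executor's internal queue in FIFO order,
    // so the delegate never sees two writes at once.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Consumer<T> delegate;

    SerializedWriter(Consumer<T> delegate) {
        this.delegate = delegate;
    }

    // Callable from any thread; returns as soon as the write is queued.
    @Override
    public void accept(T item) {
        worker.execute(() -> delegate.accept(item));
    }

    // Stops accepting writes and waits for the queued ones to finish.
    @Override
    public void close() {
        worker.shutdown();
        try {
            if (!worker.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("queued writes did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining writes", e);
        }
    }
}
```

Because only the worker thread ever touches the delegate, the delegate itself does not need to be thread-safe, while callers can submit writes from any thread.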
Next Steps
Dive deeper into each concept:
- Indexable Documents - How to structure your data for indexing
- Field Types - Choosing the right field type
- Data Store - Implementing the Store interface
- Bootstrapping - Feeding data into Forage