Indexable Documents

Indexable documents are the fundamental unit of data in Forage. They define what gets stored in the Lucene index and how it can be searched.

ForageDocument

ForageDocument is the primary implementation of IndexableDocument:

import com.livetheoogway.forage.search.engine.model.index.ForageDocument;
import com.livetheoogway.forage.search.engine.model.index.field.*;
import java.util.Arrays;

ForageDocument document = new ForageDocument(
    "unique-id-123",           // Document ID
    Arrays.asList(             // List of indexed fields
        new TextField("title", "Effective Java"),
        new TextField("author", "Joshua Bloch"),
        new FloatField("rating", new float[]{4.7f})
    )
);

Document Structure

Document ID

Every document requires a unique identifier:

new ForageDocument("book-123", ...)  // String ID

ID Uniqueness

Document IDs must be unique across the entire index. If two documents share an ID, the later one overwrites the earlier one.
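
Because a repeated ID replaces an existing document rather than raising an error, it can help to check the source data for collisions before indexing. The following is a minimal sketch in plain Java; DuplicateIdCheck and findDuplicates are hypothetical names for illustration and are not part of Forage.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Forage: reports IDs that occur more than once.
final class DuplicateIdCheck {

    /** Returns every ID that appears more than once, in the order the repeats were seen. */
    static Set<String> findDuplicates(List<String> ids) {
        Set<String> seen = new LinkedHashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String id : ids) {
            // Set.add returns false when the element is already present
            if (!seen.add(id)) {
                duplicates.add(id);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("book-123", "book-456", "book-123");
        System.out.println(findDuplicates(ids)); // prints [book-123]
    }
}
```

Running a check like this over the IDs returned by your data source before bootstrapping makes silent overwrites visible early.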

Fields

The second parameter is a list of fields to index:

Arrays.asList(
    new TextField("title", book.getTitle()),
    new TextField("author", book.getAuthor()),
    new StringField("isbn", book.getIsbn()),
    new FloatField("rating", new float[]{book.getRating()}),
    new IntField("pages", new int[]{book.getPages()})
)

Creating Documents in Bootstrap

Typically, you create documents during the bootstrap phase:

@Override
public void bootstrap(Consumer<IndexableDocument> consumer) {
    for (Book book : getAllBooks()) {
        consumer.accept(createDocument(book));
    }
}

private ForageDocument createDocument(Book book) {
    List<Field> fields = new ArrayList<>();

    // Required: searchable fields
    fields.add(new TextField("title", book.getTitle()));
    fields.add(new TextField("author", book.getAuthor()));

    // Optional: exact match field
    if (book.getIsbn() != null) {
        fields.add(new StringField("isbn", book.getIsbn()));
    }

    // Numeric fields for filtering and sorting
    fields.add(new FloatField("rating", new float[]{book.getRating()}));
    fields.add(new IntField("pages", new int[]{book.getPages()}));

    return new ForageDocument(book.getId(), fields);
}

Field Selection Strategy

Choose fields based on your search requirements:

Search Need	Field Type	Example
Full-text search	`TextField`	Title, description, content
Exact matching	`StringField`	ISBN, category codes, status
Range filtering	`IntField`, `FloatField`	Price, rating, page count
Sorting	Any numeric field	Date (as epoch), popularity score
Function scoring	Numeric fields	Rating, freshness score

Best Practices

1. Index Only What You Search

// Good: Only index searchable fields
new ForageDocument(book.getId(), Arrays.asList(
    new TextField("title", book.getTitle()),
    new TextField("author", book.getAuthor())
));

// Avoid: Indexing fields you never search
new ForageDocument(book.getId(), Arrays.asList(
    new TextField("title", book.getTitle()),
    new TextField("internalNotes", book.getInternalNotes()),  // Never searched
    new TextField("auditLog", book.getAuditLog())             // Never searched
));

2. Handle Null Values

private ForageDocument createDocument(Book book) {
    List<Field> fields = new ArrayList<>();

    // Always add required fields
    fields.add(new TextField("title",
        book.getTitle() != null ? book.getTitle() : ""));

    // Conditionally add optional fields
    if (book.getDescription() != null && !book.getDescription().isEmpty()) {
        fields.add(new TextField("description", book.getDescription()));
    }

    return new ForageDocument(book.getId(), fields);
}
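
When several fields are optional, the repeated null checks can be folded into one helper. The sketch below is an illustration under assumptions: OptionalFields and addIfPresent are hypothetical names, not part of Forage, and the helper is generic so that it runs without the library on the classpath. It also skips whitespace-only values, which is slightly stricter than the isEmpty() check above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper, not part of Forage.
final class OptionalFields {

    /** Adds factory.apply(value) to fields only when value is non-null and not blank. */
    static <F> void addIfPresent(List<F> fields, String value, Function<String, F> factory) {
        if (value != null && !value.trim().isEmpty()) {
            fields.add(factory.apply(value));
        }
    }

    public static void main(String[] args) {
        // Strings stand in for Forage field objects so the sketch runs on its own.
        List<String> fields = new ArrayList<>();
        addIfPresent(fields, "A guide to best practices", v -> "description=" + v);
        addIfPresent(fields, null, v -> "isbn=" + v);      // skipped: null
        addIfPresent(fields, "   ", v -> "subtitle=" + v); // skipped: blank
        System.out.println(fields); // prints [description=A guide to best practices]
    }
}
```

With Forage types, a call would presumably look like addIfPresent(fields, book.getIsbn(), v -> new StringField("isbn", v)).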

3. Normalize Text

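// Normalize before indexing for consistent search.
// Locale.ROOT (java.util.Locale) keeps lowercasing independent of the JVM's default locale.
fields.add(new TextField("title", book.getTitle().trim().toLowerCase(Locale.ROOT)));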
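
The normalization step can be pulled into a small null-safe helper so every field is treated the same way. This is a sketch, not a Forage API: TextNormalizer and normalize are hypothetical names. Whether lowercasing is needed at all depends on the analyzer your index is configured with; if the analyzer already lowercases tokens, this step is harmless but redundant.

```java
import java.util.Locale;

// Hypothetical helper, not part of Forage.
final class TextNormalizer {

    /** Trims, collapses runs of whitespace to one space, and lowercases; null becomes "". */
    static String normalize(String raw) {
        if (raw == null) {
            return "";
        }
        // Locale.ROOT avoids locale-specific case rules (for example, Turkish dotless i)
        return raw.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(normalize("  Effective   Java ")); // prints effective java
    }
}
```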

4. Composite Fields

Create composite fields for cross-field searching:

// Index individual fields
fields.add(new TextField("title", book.getTitle()));
fields.add(new TextField("author", book.getAuthor()));

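// Create a composite field for "search all".
// String.join would render a null description as the literal text "null",
// so skip null parts (needs java.util.Objects and java.util.stream.*).
String allText = Stream.of(book.getTitle(), book.getAuthor(), book.getDescription())
    .filter(Objects::nonNull)
    .collect(Collectors.joining(" "));
fields.add(new TextField("_all", allText));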

Then search across all content:

QueryBuilder.matchQuery("_all", "java programming").buildForageQuery();

Document Size Considerations

// Monitor document complexity
private ForageDocument createDocument(Book book) {
    List<Field> fields = new ArrayList<>();

    // Text fields consume more memory
    fields.add(new TextField("title", book.getTitle()));           // ~100 bytes
    fields.add(new TextField("description", book.getDescription())); // ~2KB average

    // Numeric fields are compact
    fields.add(new FloatField("rating", new float[]{book.getRating()})); // ~8 bytes
    fields.add(new IntField("pages", new int[]{book.getPages()}));       // ~8 bytes

    return new ForageDocument(book.getId(), fields);
}

Memory Estimation

For a rough memory estimate:

Text fields: 2-3x the raw text size
Numeric fields: ~8-16 bytes per field
Per-document overhead: ~100-200 bytes
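
The rules of thumb above can be turned into a quick back-of-the-envelope calculation. The sketch below is plain arithmetic under the stated ranges, not a measurement of actual Lucene memory use; IndexMemoryEstimate and estimateBytes are hypothetical names, and the corpus size (100,000 documents with about 2,000 bytes of text and two numeric fields each) is an assumed example.

```java
// Hypothetical helper, not part of Forage: applies the rough per-document figures above.
final class IndexMemoryEstimate {

    /** Estimated bytes = docs * (text * factor + numeric fields * bytes each + overhead). */
    static long estimateBytes(long docs, long textBytesPerDoc, int numericFields,
                              double textFactor, long bytesPerNumericField, long overheadPerDoc) {
        long perDoc = Math.round(textBytesPerDoc * textFactor)
                + numericFields * bytesPerNumericField
                + overheadPerDoc;
        return docs * perDoc;
    }

    public static void main(String[] args) {
        // Low end of each range: 2x text, 8 bytes per numeric field, 100 bytes overhead
        long low = estimateBytes(100_000, 2_000, 2, 2.0, 8, 100);
        // High end of each range: 3x text, 16 bytes per numeric field, 200 bytes overhead
        long high = estimateBytes(100_000, 2_000, 2, 3.0, 16, 200);
        System.out.println("Low estimate:  " + low + " bytes");  // 411600000
        System.out.println("High estimate: " + high + " bytes"); // 623200000
    }
}
```

For this assumed corpus the estimate lands between roughly 410 MB and 625 MB, and text dominates the total, which is why trimming unsearched text fields has the largest effect.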

Next Steps

Field Types - Detailed guide to each field type
Bootstrapping - How to feed documents into Forage