
Indexable Documents

Indexable documents are the fundamental unit of data in Forage. They define what gets stored in the Lucene index and how it can be searched.

ForageDocument

ForageDocument is the primary implementation of IndexableDocument:

import com.livetheoogway.forage.search.engine.model.index.ForageDocument;
import com.livetheoogway.forage.search.engine.model.index.field.*;
import java.util.Arrays;

ForageDocument document = new ForageDocument(
    "unique-id-123",           // Document ID
    Arrays.asList(             // List of indexed fields
        new TextField("title", "Effective Java"),
        new TextField("author", "Joshua Bloch"),
        new FloatField("rating", new float[]{4.7f})
    )
);

Document Structure

Document ID

Every document requires a unique identifier:

new ForageDocument("book-123", ...)  // String ID

ID Uniqueness

Document IDs must be unique across your entire index. Duplicate IDs will result in overwritten documents.
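Because a repeated ID silently replaces an earlier document, it can help to catch duplicates before they reach the index. The sketch below is plain Java and not part of the Forage API: DuplicateIdCheck is a hypothetical helper, and idOf is an accessor you supply (for example Book::getId) that returns each record's ID.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical helper, not part of Forage: finds IDs that would collide in the index. */
final class DuplicateIdCheck {

    private DuplicateIdCheck() {
    }

    /**
     * Returns every ID that occurs more than once, in the order its first
     * repeat was seen. An empty list means all IDs are unique.
     */
    static <T> List<String> findDuplicateIds(List<T> items, Function<T, String> idOf) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (T item : items) {
            String id = idOf.apply(item);
            // Set.add returns false when the ID was already present.
            if (!seen.add(id)) {
                duplicates.add(id);
            }
        }
        return Collections.unmodifiableList(new ArrayList<>(duplicates));
    }
}
```

Running this over the source records before bootstrap lets you fail fast, or at least log, when the result is non-empty instead of discovering missing documents at search time.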

Fields

The second parameter is a list of fields to index:

Arrays.asList(
    new TextField("title", book.getTitle()),
    new TextField("author", book.getAuthor()),
    new StringField("isbn", book.getIsbn()),
    new FloatField("rating", new float[]{book.getRating()}),
    new IntField("pages", new int[]{book.getPages()})
)

Creating Documents in Bootstrap

Typically, you create documents during the bootstrap phase:

@Override
public void bootstrap(Consumer<IndexableDocument> consumer) {
    for (Book book : getAllBooks()) {
        consumer.accept(createDocument(book));
    }
}

private ForageDocument createDocument(Book book) {
    List<Field> fields = new ArrayList<>();

    // Required: searchable fields
    fields.add(new TextField("title", book.getTitle()));
    fields.add(new TextField("author", book.getAuthor()));

    // Optional: exact match field
    if (book.getIsbn() != null) {
        fields.add(new StringField("isbn", book.getIsbn()));
    }

    // Numeric fields for filtering and sorting
    fields.add(new FloatField("rating", new float[]{book.getRating()}));
    fields.add(new IntField("pages", new int[]{book.getPages()}));

    return new ForageDocument(book.getId(), fields);
}
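Since bootstrap only pushes documents into a Consumer, the document-building logic can be exercised without a running index by passing a collecting consumer such as List::add. The following self-contained sketch shows that pattern; Book, SimpleDoc and BookBootstrap are hypothetical stand-ins (SimpleDoc plays the role of ForageDocument) so that it compiles with only the JDK.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical domain class standing in for your own Book. */
final class Book {
    final String id;
    final String title;

    Book(String id, String title) {
        this.id = id;
        this.title = title;
    }
}

/** Hypothetical stand-in for ForageDocument: an ID plus the names of its fields. */
final class SimpleDoc {
    final String id;
    final List<String> fieldNames;

    SimpleDoc(String id, List<String> fieldNames) {
        this.id = id;
        this.fieldNames = Collections.unmodifiableList(fieldNames);
    }
}

/** Mirrors the bootstrap shape above: one document per source record. */
final class BookBootstrap {
    private final List<Book> books;

    BookBootstrap(List<Book> books) {
        this.books = books;
    }

    void bootstrap(Consumer<SimpleDoc> consumer) {
        for (Book book : books) {
            consumer.accept(createDocument(book));
        }
    }

    private SimpleDoc createDocument(Book book) {
        List<String> fields = new ArrayList<>();
        // Only record a title field when the source actually has a title.
        if (book.title != null) {
            fields.add("title");
        }
        return new SimpleDoc(book.id, fields);
    }
}
```

In a unit test, calling bootstrap(collected::add) with an ArrayList gives you every generated document to assert on.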

Field Selection Strategy

Choose fields based on your search requirements:

| Search need      | Field type           | Examples                         |
|------------------|----------------------|----------------------------------|
| Full-text search | TextField            | Title, description, content      |
| Exact matching   | StringField          | ISBN, category codes, status     |
| Range filtering  | IntField, FloatField | Price, rating, page count        |
| Sorting          | Any numeric field    | Date (as epoch), popularity score|
| Function scoring | Numeric fields       | Rating, freshness score          |
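When storing a date "as epoch" in an int-valued field such as IntField, the unit matters: present-day epoch milliseconds do not fit in a 32-bit int, while days since 1970-01-01 fit easily. The JDK-only sketch below uses a hypothetical DateEncoding helper; if you need finer resolution than one day, check whether a 64-bit numeric field is available to you.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

/** Hypothetical helper: encodes dates as ints for an int-valued numeric field. */
final class DateEncoding {

    private DateEncoding() {
    }

    /** Days since 1970-01-01; throws ArithmeticException only if it exceeds int range. */
    static int toEpochDay(LocalDate date) {
        return Math.toIntExact(date.toEpochDay());
    }

    /** Converts an instant to its UTC calendar day, then encodes that day. */
    static int toEpochDay(Instant instant) {
        return toEpochDay(instant.atZone(ZoneOffset.UTC).toLocalDate());
    }

    /** Illustrates the pitfall: present-day epoch millis overflow int, so this throws. */
    static int epochMillisAsInt(Instant instant) {
        return Math.toIntExact(instant.toEpochMilli());
    }
}
```

Because epoch days increase monotonically with the date, range filters and sorts on the encoded value behave the same as on the original dates.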

Best Practices

1. Index Only Searchable Fields

// Good: Only index searchable fields
new ForageDocument(book.getId(), Arrays.asList(
    new TextField("title", book.getTitle()),
    new TextField("author", book.getAuthor())
));

// Avoid: Indexing fields you never search
new ForageDocument(book.getId(), Arrays.asList(
    new TextField("title", book.getTitle()),
    new TextField("internalNotes", book.getInternalNotes()),  // Never searched
    new TextField("auditLog", book.getAuditLog())             // Never searched
));

2. Handle Null Values

private ForageDocument createDocument(Book book) {
    List<Field> fields = new ArrayList<>();

    // Always add required fields
    fields.add(new TextField("title",
        book.getTitle() != null ? book.getTitle() : ""));

    // Conditionally add optional fields
    if (book.getDescription() != null && !book.getDescription().isEmpty()) {
        fields.add(new TextField("description", book.getDescription()));
    }

    return new ForageDocument(book.getId(), fields);
}
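When a document has many optional fields, the null checks above can be folded into a small helper. This is a sketch with a hypothetical FieldCollector that gathers name/value pairs in plain Java; in real code each accepted pair would become a TextField (or another Forage field) rather than a map entry. Note that it also skips whitespace-only values, which is slightly stricter than the isEmpty() check above.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: applies the required/optional null rules described above. */
final class FieldCollector {
    private final Map<String, String> values = new LinkedHashMap<>();

    /** Required field: always present, with null replaced by an empty string. */
    FieldCollector required(String name, String value) {
        values.put(name, value != null ? value : "");
        return this;
    }

    /** Optional field: skipped when null, empty, or whitespace-only. */
    FieldCollector optional(String name, String value) {
        if (value != null && !value.trim().isEmpty()) {
            values.put(name, value);
        }
        return this;
    }

    /** Returns an immutable snapshot, preserving insertion order. */
    Map<String, String> build() {
        return Collections.unmodifiableMap(new LinkedHashMap<>(values));
    }
}
```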

3. Normalize Text

// Normalize before indexing for consistent search.
// Locale.ROOT avoids locale-specific casing rules (e.g. the Turkish dotless i).
fields.add(new TextField("title", book.getTitle().toLowerCase(Locale.ROOT).trim()));
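If you normalize text yourself before indexing, it is usually safest to run user query strings through the exact same routine, so both sides are transformed identically. The sketch below is one possible normalizer built only on the JDK; TextNormalizer is a hypothetical name, and the chosen steps (NFKC folding, whitespace collapsing, root-locale lowercasing) are an assumption about what "consistent" means for your data.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Hypothetical helper: one normalization routine shared by indexing and querying. */
final class TextNormalizer {

    private TextNormalizer() {
    }

    static String normalize(String raw) {
        if (raw == null) {
            return "";
        }
        // NFKC folds compatibility characters, e.g. the U+FB01 "fi" ligature becomes "fi".
        String folded = Normalizer.normalize(raw, Normalizer.Form.NFKC);
        // Trim the ends, then collapse runs of spaces, tabs and newlines into one space.
        String collapsed = folded.trim().replaceAll("\\s+", " ");
        // Locale.ROOT gives locale-independent lowercasing.
        return collapsed.toLowerCase(Locale.ROOT);
    }
}
```

Calling TextNormalizer.normalize both when building the TextField and on the search input keeps indexed terms and query terms aligned.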

4. Composite Fields

Create composite fields for cross-field searching:

// Index individual fields
fields.add(new TextField("title", book.getTitle()));
fields.add(new TextField("author", book.getAuthor()));

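// Create a composite field for "search all".
// String.join would turn a null element into the literal text "null", so drop
// missing values first (uses java.util.Objects, java.util.stream.Stream/Collectors).
String allText = Stream.of(book.getTitle(), book.getAuthor(), book.getDescription())
    .filter(Objects::nonNull)
    .collect(Collectors.joining(" "));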
fields.add(new TextField("_all", allText));

Then search across all content:

QueryBuilder.matchQuery("_all", "java programming").buildForageQuery();

Document Size Considerations

// Monitor document complexity
private ForageDocument createDocument(Book book) {
    List<Field> fields = new ArrayList<>();

    // Text fields consume more memory
    fields.add(new TextField("title", book.getTitle()));           // ~100 bytes
    fields.add(new TextField("description", book.getDescription())); // ~2KB average

    // Numeric fields are compact
    fields.add(new FloatField("rating", new float[]{book.getRating()})); // ~8 bytes
    fields.add(new IntField("pages", new int[]{book.getPages()}));       // ~8 bytes

    return new ForageDocument(book.getId(), fields);
}

Memory Estimation

For a rough memory estimate:

  • Text fields: 2-3x the raw text size
  • Numeric fields: ~8-16 bytes per field
  • Per-document overhead: ~100-200 bytes
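These rules of thumb can be turned into a quick back-of-the-envelope calculator. The sketch below simply encodes the three heuristics above (2-3x text expansion, 8-16 bytes per numeric field, 100-200 bytes of per-document overhead) to give a low and a high bound; IndexMemoryEstimate and its parameters are hypothetical, and the result is only as good as the heuristics themselves.

```java
/** Hypothetical estimator applying the rough heuristics listed above. */
final class IndexMemoryEstimate {

    private IndexMemoryEstimate() {
    }

    /** Lower bound in bytes: 2x raw text, 8 bytes per numeric field, 100 bytes overhead. */
    static long lowBytes(long docCount, long avgTextBytesPerDoc, int numericFieldsPerDoc) {
        return docCount * (2 * avgTextBytesPerDoc + 8L * numericFieldsPerDoc + 100);
    }

    /** Upper bound in bytes: 3x raw text, 16 bytes per numeric field, 200 bytes overhead. */
    static long highBytes(long docCount, long avgTextBytesPerDoc, int numericFieldsPerDoc) {
        return docCount * (3 * avgTextBytesPerDoc + 16L * numericFieldsPerDoc + 200);
    }
}
```

For example, assuming 100,000 books that each carry about 2,100 bytes of text (a ~100-byte title plus a ~2,000-byte description) and two numeric fields, lowBytes returns 431,600,000 and highBytes returns 653,200,000, i.e. roughly 430-650 MB.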

Next Steps