Document Splitter Guide

The Document Splitter feature allows you to divide large, multi-page documents into smaller, well-structured sub-documents.

For example, if you upload a 40-page file that contains several document types (such as invoices, receipts, IRS forms, or documents handled by custom models), the Document Splitter can automatically separate each type and send it to the correct processing flow.
Each document type is then processed with the appropriate model, which saves time and improves consistency.

How It Works

  1. Open your Custom Model.

  2. In the Info tab, set the Model Category to Document Splitter.

  3. Open the Splitter tab.

  4. Define your splitting rules (see “Defining Splitting Rules” below for details).

  5. Optionally, configure your fallback flow and fallback flow types for pages that match no rule (see “Handling Unclassified Documents” below).

  6. Save the custom model.

  7. Create a new Flow for multi-page document ingestion.

    • This ingestion flow classifies all incoming documents as Custom Model Splitter.

    • Only OCR is applied at this stage; no data extraction is performed.

    • The splitter model then routes each sub-document to the appropriate flow based on your defined rules.

  8. Assign your Splitter Model to the ingestion flow.

Defining Splitting Rules

Rules determine how and when a document is divided into sub-documents.
Each rule specifies a condition that identifies the beginning or end of a sub-document, or how unclassified pages should be handled.

Name

  • A unique identifier for the rule, used to label sub-documents.

  • If multiple criteria (for example, startsWith and endsWith) apply to the same logical rule, use the same Name.

  • Rules sharing the same name are merged during processing.
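The merging behavior can be pictured with a minimal Python sketch. The row layout (name, rule, value keys) and the sample values are hypothetical illustrations, not the product's actual storage format.

```python
from collections import defaultdict

# Hypothetical rule rows mirroring the Name / Rule / Value fields.
rules = [
    {"name": "ACORD 125", "rule": "startsWith", "value": "/acord 125/i"},
    {"name": "ACORD 125", "rule": "endsWith", "value": "/signature of applicant/i"},
    {"name": "Invoice", "rule": "startsWith", "value": "/invoice number/i"},
]


def merge_by_name(rows):
    """Group rule rows that share a Name into one logical rule."""
    merged = defaultdict(lambda: defaultdict(list))
    for row in rows:
        merged[row["name"]][row["rule"]].append(row["value"])
    # Convert the nested defaultdicts to plain dicts for predictable lookups.
    return {name: dict(criteria) for name, criteria in merged.items()}


merged_rules = merge_by_name(rules)
```

In this sketch the two ACORD 125 rows collapse into a single entry that carries both a startsWith and an endsWith criterion, while Invoice keeps only its startsWith criterion.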

Rule

The Rule field defines the type of condition that triggers a document split. The following rule types are supported:

  • startsWith

Begins a new sub-document when the current page contains all values specified in the Value field.

Example:
If the Value is /acord 125/i, a new sub-document begins on any page containing the phrase “ACORD 125” (case-insensitive).
Subsequent pages remain part of this sub-document until another rule matches.

  • endsWith

Ends a sub-document when the current page contains all values specified in the Value field.

Note: An endsWith rule requires a corresponding startsWith rule to function correctly.

  • means

Used when a page cannot be classified through startsWith or endsWith rules.
This rule uses an LLM-based prompt to classify the page based on its contextual meaning.

Example prompt:

This page belongs to ACORD 125 only if the text explicitly contains the exact phrase "ACORD 125".
Do not match pages that only reference related forms, schedules, or identifiers. The phrase "ACORD 125" must appear literally in the text for a match to occur. If the text does not include the exact string "ACORD 125", then it does not belong to ACORD 125.
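
The order of evaluation described for means (pattern rules first, prompt-based classification only when they fail) could look roughly like the Python sketch below. Everything here is an assumption made for illustration: classify_page, the YES/NO answer format, and fake_llm, which stands in for a real model call, are not part of the product.

```python
from typing import Callable, Optional


def classify_page(
    page_text: str,
    pattern_label: Optional[str],
    means_rules: dict[str, str],
    ask_llm: Callable[[str], str],
) -> Optional[str]:
    """Return a rule Name for the page, or None if nothing matches.

    pattern_label is the outcome of startsWith/endsWith matching (None when
    those rules did not classify the page). means_rules maps a rule Name to
    its prompt text. ask_llm is a placeholder for a real model call.
    """
    if pattern_label is not None:
        return pattern_label  # pattern rules already classified the page
    for name, prompt in means_rules.items():
        question = f"{prompt}\n\nPage text:\n{page_text}\n\nAnswer YES or NO."
        if ask_llm(question).strip().upper().startswith("YES"):
            return name
    return None


def fake_llm(question: str) -> str:
    """Stand-in model: says YES when the page text contains 'ACORD 125'."""
    page_text = question.split("Page text:\n", 1)[1]
    return "YES" if "ACORD 125" in page_text else "NO"
```

Passing the model call in as a function keeps the sketch runnable without any external service; a real integration would replace fake_llm with an actual client.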

Value

Defines the matching criteria for the rule:

  • For startsWith and endsWith, the value can be either plain text or a regular expression.

  • For means, the value should be prompt text that provides instructions for contextual classification.
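
As a rough illustration of how a startsWith or endsWith Value might be interpreted, the Python sketch below treats a value written as /pattern/flags as a regular expression and anything else as literal text. The flag mapping, the case-sensitive handling of plain text, and the idea that several values arrive as a list are assumptions; the product's actual parsing may differ.

```python
import re

# Assumed mapping from JavaScript-style flag letters to Python regex flags.
_FLAGS = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}
_REGEX_LITERAL = re.compile(r"^/(.+)/([a-z]*)$", re.DOTALL)


def compile_value(value: str) -> re.Pattern:
    """Turn a Value into a compiled pattern.

    "/acord 125/i" is treated as a regular expression with flags; anything
    else is treated as literal text (case-sensitive here, an assumption).
    """
    literal = _REGEX_LITERAL.match(value)
    if literal is None:
        return re.compile(re.escape(value))
    body, flag_letters = literal.groups()
    flags = 0
    for letter in flag_letters:
        if letter not in _FLAGS:
            raise ValueError(f"unsupported regex flag: {letter!r}")
        flags |= _FLAGS[letter]
    return re.compile(body, flags)


def page_matches(page_text: str, values: list[str]) -> bool:
    """Return True only if the page contains every value (and there is one)."""
    return bool(values) and all(compile_value(v).search(page_text) for v in values)
```

For example, page_matches("Form ACORD 125 (2016/03)", ["/acord 125/i"]) is true in this sketch, while the plain-text value "acord 125" would not match the same page because literal text is compared case-sensitively here.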

Model Type

Select the model (or default classification) that should process the sub-document produced by this rule.

Target Flow

Select the Flow to which the sub-document should be sent after splitting.
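
Putting the rule fields together, the following Python sketch walks through pages in order and groups them into sub-documents. It is a simplified model under stated assumptions: the Rule and SubDocument classes, the flow identifiers, and the choice to include the endsWith page as the last page of its sub-document are all hypothetical, and matching is reduced to a single case-insensitive regular expression per criterion.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

UNRECOGNIZED = "Unrecognized"


@dataclass
class Rule:
    name: str
    starts_with: str                 # regex that opens a sub-document
    model_type: str
    target_flow: str                 # hypothetical flow identifier
    ends_with: Optional[str] = None  # optional regex that closes it


@dataclass
class SubDocument:
    name: str
    model_type: Optional[str]
    target_flow: Optional[str]
    pages: list[int] = field(default_factory=list)


def _hit(pattern: str, text: str) -> bool:
    return re.search(pattern, text, re.IGNORECASE) is not None


def split_pages(pages: list[str], rules: list[Rule]) -> list[SubDocument]:
    result: list[SubDocument] = []
    open_rule: Optional[Rule] = None  # rule of the sub-document being filled
    for number, text in enumerate(pages, start=1):
        starter = next((r for r in rules if _hit(r.starts_with, text)), None)
        if starter is not None:
            # A startsWith match always opens a new sub-document.
            open_rule = starter
            result.append(SubDocument(starter.name, starter.model_type, starter.target_flow))
        elif open_rule is None and (not result or result[-1].name != UNRECOGNIZED):
            # No sub-document is open: collect the page as Unrecognized.
            result.append(SubDocument(UNRECOGNIZED, None, None))
        result[-1].pages.append(number)
        if open_rule is not None and open_rule.ends_with and _hit(open_rule.ends_with, text):
            open_rule = None  # assumed: the endsWith page is the last page
    return result


rules = [
    Rule("ACORD 125", r"acord 125", "ACORD 125", "acord-flow", r"signature of applicant"),
    Rule("Invoice", r"invoice #", "Invoice", "invoice-flow"),
]
pages = ["Cover letter", "ACORD 125 page 1", "details", "Signature of applicant",
         "misc", "INVOICE #1", "line items"]
parts = split_pages(pages, rules)
```

With these sample pages, the sketch produces an Unrecognized group for page 1, an ACORD 125 sub-document for pages 2 to 4, another Unrecognized group for page 5, and an Invoice sub-document for pages 6 and 7. Note that in this simplified model a page repeating the start phrase would open a new sub-document each time.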

Handling Unclassified Documents

If any pages do not satisfy the defined criteria, they are labeled as Unrecognized and are not sent to any flow by default.

If you want to process unclassified pages as well:

  • Assign a Model Type (the default is OCR).

  • Choose a Target Flow for these pages.
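
The fallback behavior can be sketched in Python as follows. The dictionary layout, plan_routing, and the flow names are hypothetical; the only details taken from this guide are that Unrecognized pages are not sent anywhere by default and that the fallback Model Type defaults to OCR.

```python
from typing import Optional


def plan_routing(sub_documents: list[dict], fallback: Optional[dict] = None) -> list[dict]:
    """Return the sub-documents that should be sent to a flow."""
    routed = []
    for doc in sub_documents:
        if doc["name"] != "Unrecognized":
            routed.append(doc)
        elif fallback is not None:
            # Fallback configured: send Unrecognized pages to the chosen flow.
            routed.append({
                **doc,
                "model_type": fallback.get("model_type", "OCR"),
                "target_flow": fallback["target_flow"],
            })
        # Without a fallback, Unrecognized pages are left out (the default).
    return routed


docs = [
    {"name": "Unrecognized", "pages": [1]},
    {"name": "Invoice", "pages": [2, 3], "model_type": "Invoice", "target_flow": "invoice-flow"},
]
without_fallback = plan_routing(docs)
with_fallback = plan_routing(docs, {"target_flow": "review-flow"})
```

Here without_fallback contains only the Invoice sub-document, while with_fallback also includes page 1, assigned to the OCR model type and the hypothetical review-flow.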

Summary

The Document Splitter enables automated separation and routing of multi-page documents.
By defining precise rules using text, regular expressions, or LLM-based prompts, you can:

  • Automatically split large documents into logical sub-documents.

  • Assign each sub-document to the appropriate model and flow.

  • Handle unrecognized pages through fallback settings.

  • Maintain accuracy and efficiency in document processing pipelines.