Document Splitter Guide
The Document Splitter feature allows you to divide large, multi-page documents into smaller, well-structured sub-documents.
For example, if you upload a 40-page file that includes multiple document types (such as invoices, receipts, IRS forms, or documents handled by your custom models), the Document Splitter can automatically separate each type and send it to the correct processing flow.
This ensures that every document type is processed with the appropriate model, saving time and improving consistency.
How It Works
1. Open your Custom Model.
2. In the Info tab, set the Model Category to Document Splitter.
3. Open the Splitter tab.
4. Define your splitting rules (see “Defining Splitting Rules” below for details).
5. Optionally, configure your fallback flow and fallback flow types for pages that match no rule (see “Handling Unclassified Documents” below).
6. Save the custom model.
7. Create a new Flow for multi-page document ingestion:
   - This ingestion flow classifies all incoming documents as Custom Model Splitter.
   - Only OCR is applied at this stage; no data extraction is performed.
   - The splitter model then routes each sub-document to the appropriate flow based on your defined rules.
8. Assign your Splitter Model to the ingestion flow.
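To make the division of work concrete, here is a minimal Python sketch of the three stages described in these steps: OCR in the ingestion flow, splitting, and routing. The function `ingest`, the `ocr` and `split` callables, and the `flows` mapping are hypothetical stand-ins for what the platform does internally; they are not a product API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical type: a split result is a list of (flow name, page indices).
SplitResult = List[Tuple[str, List[int]]]


def ingest(
    page_images: List[bytes],
    ocr: Callable[[bytes], str],
    split: Callable[[List[str]], SplitResult],
    flows: Dict[str, Callable[[List[str]], None]],
) -> None:
    """Sketch of the ingestion flow: OCR, then split, then route."""
    # Stage 1: OCR only; no field extraction happens in the ingestion flow.
    texts = [ocr(image) for image in page_images]
    # Stage 2: the splitter model groups page indices into sub-documents.
    for flow_name, page_indices in split(texts):
        # Stage 3: each sub-document is handed to its own flow for extraction.
        flows[flow_name]([texts[i] for i in page_indices])


# Demo with stub OCR, a fixed split, and flows that just record their input.
received: Dict[str, List[str]] = {}
ingest(
    [b"page 1", b"page 2", b"page 3"],
    ocr=lambda image: image.decode(),
    split=lambda texts: [("Invoices", [0, 1]), ("Receipts", [2])],
    flows={
        "Invoices": lambda pages: received.update(Invoices=pages),
        "Receipts": lambda pages: received.update(Receipts=pages),
    },
)
print(received)  # {'Invoices': ['page 1', 'page 2'], 'Receipts': ['page 3']}
```

Passing the stages in as functions keeps the sketch independent of any particular OCR engine or splitter implementation.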
Defining Splitting Rules
Rules determine how and when a document is divided into sub-documents.
Each rule specifies a condition that identifies the beginning or end of a sub-document, or how unclassified pages should be handled.
Name
A unique identifier for the rule, used to label sub-documents.
If multiple criteria (for example, startsWith and endsWith) apply to the same logical rule, use the same Name. Rules sharing the same Name are merged during processing.
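As an illustration of name-based merging, the following sketch groups rule rows by Name so that a startsWith and an endsWith criterion become one logical rule. The dictionary keys and the endsWith value "Signature of Applicant" are invented for the example; the actual storage format of splitter rules is not described in this guide.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical rule rows; keys mirror the Splitter tab fields (Name, Rule, Value).
rules = [
    {"name": "ACORD125", "rule": "startsWith", "value": "/acord 125/i"},
    {"name": "ACORD125", "rule": "endsWith", "value": "Signature of Applicant"},
    {"name": "Invoice", "rule": "startsWith", "value": "INVOICE"},
]


def merge_by_name(rows: List[dict]) -> Dict[str, Dict[str, List[str]]]:
    """Group criteria that share a Name into one logical rule."""
    merged: Dict[str, Dict[str, List[str]]] = defaultdict(dict)
    for row in rows:
        # Keep every value per criterion type, in the order it was defined.
        merged[row["name"]].setdefault(row["rule"], []).append(row["value"])
    return dict(merged)


merged = merge_by_name(rules)
print(merged["ACORD125"])
# {'startsWith': ['/acord 125/i'], 'endsWith': ['Signature of Applicant']}
```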
Rule
The Rule field defines the type of condition that triggers a document split. The following rule types are supported:
startsWith
Begins a new sub-document when the current page contains all values specified in the Value field.
Example:
If the Value is /acord 125/i, a new sub-document begins on any page containing the phrase “ACORD 125” (case-insensitive).
Pages continue to be part of this sub-document until another rule match occurs.
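The following Python sketch shows how the example Value could be evaluated. It assumes that `/acord 125/i` is a JavaScript-style regex literal (pattern between slashes, `i` meaning case-insensitive) and applies the "all values" requirement with `all()`; the function name `starts_new_subdocument` is hypothetical.

```python
import re
from typing import List

# /acord 125/i is a JavaScript-style regex literal; the Python equivalent is
# the pattern "acord 125" compiled with re.IGNORECASE.
ACORD_125 = re.compile(r"acord 125", re.IGNORECASE)


def starts_new_subdocument(page_text: str, patterns: List[re.Pattern]) -> bool:
    """startsWith semantics: the page must contain *all* configured values."""
    return all(p.search(page_text) for p in patterns)


pages = [
    "ACORD 125 Commercial Insurance Application",
    "Applicant information (continued)",
    "Acord 125 Commercial Insurance Application",
]
start_pages = [i for i, text in enumerate(pages) if starts_new_subdocument(text, [ACORD_125])]
print(start_pages)  # [0, 2]
```

Note that the third page also starts a new sub-document: every page containing the phrase triggers the rule, not only the first one.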
endsWith
Ends a sub-document when the current page contains all values specified in the Value field.
Note: An endsWith rule requires a corresponding startsWith rule to function correctly.
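Putting startsWith and endsWith together, a sketch of the splitting loop for a single named rule might look like this. Two behaviors are assumptions rather than documented facts: the page that matches endsWith stays in the sub-document, and pages outside any sub-document are collected as unrecognized. The sketch also shows why endsWith needs a startsWith partner: an end match with no open sub-document has nothing to close. `split_pages` and `SubDocument` are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SubDocument:
    name: str
    pages: List[int] = field(default_factory=list)


def split_pages(
    pages: List[str],
    name: str,
    start: re.Pattern,
    end: Optional[re.Pattern] = None,
) -> Tuple[List[SubDocument], List[int]]:
    """Split pages with one startsWith pattern and an optional endsWith pattern.

    Assumptions: the page that matches endsWith is kept in the sub-document,
    and pages outside every sub-document are returned as unrecognized.
    """
    docs: List[SubDocument] = []
    unrecognized: List[int] = []
    current: Optional[SubDocument] = None
    for i, text in enumerate(pages):
        if start.search(text):
            # A startsWith match closes any open sub-document and opens a new one.
            if current is not None:
                docs.append(current)
            current = SubDocument(name)
        if current is None:
            # No open sub-document: an endsWith match here has nothing to close.
            unrecognized.append(i)
            continue
        current.pages.append(i)
        if end is not None and end.search(text):
            docs.append(current)
            current = None
    if current is not None:
        docs.append(current)
    return docs, unrecognized


pages = [
    "ACORD 125 application",
    "Applicant details",
    "Signature of Applicant",
    "Cover letter",
    "ACORD 125 application",
    "Additional details",
]
docs, unrecognized = split_pages(
    pages,
    "ACORD125",
    start=re.compile(r"acord 125", re.IGNORECASE),
    end=re.compile(r"signature of applicant", re.IGNORECASE),
)
print([d.pages for d in docs], unrecognized)  # [[0, 1, 2], [4, 5]] [3]
```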
means
Used when a document cannot be classified through startsWith or endsWith rules.
This rule uses an LLM-based prompt to classify a page based on contextual meaning.
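The guide does not document how the prompt is sent to the model, so the sketch below is only one plausible shape: the rule's prompt and a page's OCR text are combined into a single request, and a YES/NO reply decides the match. `build_prompt`, `page_matches`, and `fake_llm` are hypothetical; `fake_llm` replaces a real LLM call so the example runs offline, and the rule prompt is the first sentence of the example prompt in this section.

```python
from typing import Callable

# First sentence of this section's example prompt, used as the rule's Value.
RULE_PROMPT = (
    'This page belongs to ACORD 125 only if the text explicitly contains '
    'the exact phrase "ACORD 125".'
)


def build_prompt(rule_prompt: str, page_text: str) -> str:
    """Hypothetical wrapper combining the rule's prompt with one page's OCR text."""
    return (
        f"{rule_prompt}\n\n"
        f"Page text:\n{page_text}\n\n"
        "Answer YES if the page belongs, otherwise answer NO."
    )


def page_matches(rule_prompt: str, page_text: str, llm: Callable[[str], str]) -> bool:
    """Ask the model and treat a reply starting with YES as a match."""
    reply = llm(build_prompt(rule_prompt, page_text))
    return reply.strip().upper().startswith("YES")


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call: a literal check on the page text only."""
    page_text = prompt.split("Page text:\n", 1)[1].split("\n\nAnswer", 1)[0]
    return "YES" if "ACORD 125" in page_text else "NO"


print(page_matches(RULE_PROMPT, "ACORD 125 Commercial Insurance Application", fake_llm))  # True
print(page_matches(RULE_PROMPT, "See the attached ACORD 126 schedule", fake_llm))  # False
```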
Example prompt:
This page belongs to ACORD 125 only if the text explicitly contains the exact phrase "ACORD 125".
Do not match pages that only reference related forms, schedules, or identifiers. The phrase "ACORD 125" must appear literally in the text for a match to occur. If the text does not include the exact string "ACORD 125", then it does not belong to ACORD 125.
Value
Defines the matching criteria for the rule:
For startsWith and endsWith, the value can be either plain text or a regular expression.
For means, the value should be a prompt that provides instructions for contextual classification.
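One way to tell the two kinds of startsWith/endsWith Value apart is sketched below. It assumes, based on the `/acord 125/i` example, that a Value wrapped in slashes with optional trailing flags is a regular expression, and that anything else is plain text matched literally and case-sensitively; the guide confirms neither rule, and `compile_value` is a hypothetical helper.

```python
import re

# Assumption: a Value written as /pattern/flags (like /acord 125/i) is a regular
# expression; any other Value is plain text and is matched literally.
REGEX_LITERAL = re.compile(r"^/(.*)/([a-z]*)$", re.DOTALL)
FLAGS = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}


def compile_value(value: str) -> re.Pattern:
    """Turn a startsWith/endsWith Value into a compiled pattern."""
    literal = REGEX_LITERAL.match(value)
    if literal:
        pattern, letters = literal.groups()
        flags = 0
        for letter in letters:
            # Flags with no Python equivalent (for example g) are ignored.
            flags |= FLAGS.get(letter, 0)
        return re.compile(pattern, flags)
    # Plain text: escape metacharacters such as ( ) $ so they match literally.
    return re.compile(re.escape(value))


print(bool(compile_value("/acord 125/i").search("Acord 125 form")))  # True
print(bool(compile_value("Total (USD)").search("Grand Total (USD)")))  # True
print(bool(compile_value("Total (USD)").search("Total USD")))  # False
```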
Model Type
Select the model (or default classification) that should process the sub-document produced by this rule.
Target Flow
Select the Flow to which the sub-document should be sent after splitting.
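A rule's Model Type and Target Flow together decide where a sub-document goes. The sketch below models one row of the Splitter tab as a Python dataclass; the class, its field names, and the example model and flow names ("Insurance Intake", "Accounts Payable") are illustrative assumptions, and rules that share a Name are assumed to share the same destination.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SplitterRule:
    """Hypothetical mirror of one row in the Splitter tab."""
    name: str         # Name: label given to matching sub-documents
    rule: str         # "startsWith", "endsWith", or "means"
    value: str        # plain text, a regular expression, or a prompt
    model_type: str   # Model Type that processes the sub-document
    target_flow: str  # Target Flow that receives the sub-document


RULES = [
    SplitterRule("ACORD125", "startsWith", "/acord 125/i", "ACORD 125", "Insurance Intake"),
    SplitterRule("Invoice", "startsWith", "INVOICE", "Invoice", "Accounts Payable"),
]


def destination(label: str, rules: List[SplitterRule]) -> Optional[Tuple[str, str]]:
    """Return (model_type, target_flow) for a sub-document label, or None."""
    for rule in rules:
        if rule.name == label:
            return rule.model_type, rule.target_flow
    return None


print(destination("Invoice", RULES))       # ('Invoice', 'Accounts Payable')
print(destination("Unrecognized", RULES))  # None
```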
Handling Unclassified Documents
Pages that do not satisfy any rule are labeled as Unrecognized and, by default, are not sent to any flow.
To process unclassified pages as well, configure the fallback settings:
Assign a Model Type (the default is OCR).
Choose a Target Flow for these pages.
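The default and fallback behaviors can be summarized in a short routing sketch. It assumes each sub-document arrives as a (label, page indices) pair and that the fallback Model Type defaults to OCR, as stated above; `route`, its parameters, and the "Manual Review" flow name are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

SubDoc = Tuple[str, List[int]]  # (label, page indices)
Route = Tuple[str, str]         # (model_type, target_flow)


def route(
    sub_docs: List[SubDoc],
    routes: Dict[str, Route],
    fallback_flow: Optional[str] = None,
    fallback_model: str = "OCR",
) -> Tuple[List[Tuple[str, str, List[int]]], List[List[int]]]:
    """Dispatch sub-documents; pages with no matching rule go to the fallback, if any.

    Returns (dispatched, dropped), where dispatched holds (flow, model, pages).
    """
    dispatched, dropped = [], []
    for label, pages in sub_docs:
        if label in routes:
            model, flow = routes[label]
            dispatched.append((flow, model, pages))
        elif fallback_flow is not None:
            # Unrecognized pages, processed with the fallback Model Type (default OCR).
            dispatched.append((fallback_flow, fallback_model, pages))
        else:
            # Default behavior: Unrecognized pages are not sent to any flow.
            dropped.append(pages)
    return dispatched, dropped


sub_docs = [("Invoice", [0, 1]), ("Unrecognized", [2])]
routes = {"Invoice": ("Invoice", "Accounts Payable")}
print(route(sub_docs, routes))
# ([('Accounts Payable', 'Invoice', [0, 1])], [[2]])
print(route(sub_docs, routes, fallback_flow="Manual Review"))
# ([('Accounts Payable', 'Invoice', [0, 1]), ('Manual Review', 'OCR', [2])], [])
```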
Summary
The Document Splitter enables automated separation and routing of multi-page documents.
By defining precise rules using text, regular expressions, or LLM-based prompts, you can:
Automatically split large documents into logical sub-documents.
Assign each sub-document to the appropriate model and flow.
Handle unrecognized pages through fallback settings.
Maintain accuracy and efficiency in document processing pipelines.
