Taxonomy Management for Document Processing

What is taxonomy?

Most documents don’t follow a standardized naming convention for their field names, which makes it a challenge to normalize extracted data points across documents with different formats. Your system may expect a specific phrase for a key, but the same field can appear under one name in one document and another name in the next. This discrepancy in phrasing can cause the system to miss the expected key and fail to extract the corresponding value.

Human reviewers often need to manually review and interpret data from different documents. This process is time-consuming, error-prone, and adds unnecessary overhead to validation. Automation tools also struggle to extract data accurately without a consistent naming convention, resulting in inefficiencies and delays. Extracting data from multiple sources with varying naming conventions becomes a laborious task that hampers workflow productivity, especially when using outdated automation tools.

Taxonomy addresses these issues by mapping the various names a field can take to a single key in a workflow. This means that even if a company reviews multiple document types for the same values, fields labeled differently are still recognized, ensuring smooth document data extraction.

Why is taxonomy important?

Taxonomy plays a crucial role in maintaining the integrity and quality of data by establishing clear guidelines for validation. It helps ensure that valid and accurate data is extracted, preventing errors or discrepancies caused by inconsistent naming conventions or alternative wording. By letting users define and customize their own taxonomy rules, Base64.ai gives users full control over how their data is classified and organized, so they can align the extraction process with their preferred outcomes. This level of customization helps users eliminate ambiguity and achieve higher accuracy by tailoring the taxonomy to their specific domain or industry requirements.

Base64.ai's taxonomy management feature

With this feature, you can specify the desired wording for a field and list all possible alternatives that may appear in the document data. By including alternate spellings, synonyms, or variations of field names, you can ensure accurate data extraction, even when dealing with inconsistent naming conventions.

Taxonomy can be managed within your Flow Settings. Click on your preferred Flow, then Settings in the top right, and finally click “Edit This Flow.”

Updating Taxonomy later, whether by adding new alternatives or removing old ones, is seamless. Simply return to “Enhancements” and make your updates. Taxonomy settings stay in effect until a user updates Taxonomy from Flow settings.

Taxonomies can be included in Custom Models or Flows, where they will automatically be applied to every upload.

Taxonomy in practice

While handling documents like forms, users might come across diverse naming conventions for identical fields. For instance, disparities could exist between "Family name" and "Last name" or "Given name" and "First name."

In this particular case, two alternate names were added to each taxonomy, but users can add as many alternate names as necessary. There is also no limit on the number of taxonomies that can be added per Flow.
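To make the idea concrete, here is a minimal Python sketch of how alternate field names could be mapped to a preferred name. It is an illustration only and does not reflect how Base64.ai stores or applies taxonomies. The `TAXONOMY` structure, the function names, the choice of "Last name" and "First name" as the preferred wording, the extra alternates "Surname" and "Forename", and the case-insensitive matching are all assumptions made for this example.

```python
# Hypothetical taxonomy: preferred field name -> alternate names.
# Illustrative only; this is not Base64.ai's internal format.
TAXONOMY = {
    "Last name": ["Family name", "Surname"],
    "First name": ["Given name", "Forename"],
}


def _canonical(name: str) -> str:
    """Lowercase and collapse whitespace so 'Family  Name' matches 'family name'."""
    return " ".join(name.lower().split())


def build_lookup(taxonomy: dict) -> dict:
    """Map every preferred and alternate name to its preferred name."""
    lookup = {}
    for preferred, alternates in taxonomy.items():
        for name in [preferred, *alternates]:
            lookup[_canonical(name)] = preferred
    return lookup


def normalize_fields(extracted: dict, taxonomy: dict) -> dict:
    """Rename extracted keys to their preferred names; unknown keys pass through."""
    lookup = build_lookup(taxonomy)
    return {
        lookup.get(_canonical(key), key): value
        for key, value in extracted.items()
    }


# Two forms that label the same fields differently.
form_a = {"Family name": "Rivera", "Given name": "Ana"}
form_b = {"last name": "Chen", "First Name": "Li", "Date": "2024-01-05"}

normalized_a = normalize_fields(form_a, TAXONOMY)
normalized_b = normalize_fields(form_b, TAXONOMY)
```

After normalization, both forms expose the same "Last name" and "First name" keys, while a field with no taxonomy entry (here "Date") is left untouched.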

Benefits of Base64.ai's Flow feature for taxonomy

Flexibility and Adaptability: With the Taxonomy Flow feature, users can handle various document layouts and naming conventions within a single document processing Flow. This makes data extraction both more accurate and easier to use.

Reduced Manual Intervention: Incorporating alternate names or synonyms into the taxonomy reduces the need for manual intervention in data validation. The system can automatically fall back on alternate naming conventions when the preferred field does not exist in the extracted result.

Streamlined Workflow: Taxonomy streamlines the data extraction process, increasing efficiency and productivity. With consistent and accurate data extraction, users can focus on higher-value workflows.
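The fallback behavior described under Reduced Manual Intervention can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Base64.ai's implementation: the function name `resolve_field`, the plain-dictionary result format, and the rule that alternates are tried in the order listed are all hypothetical.

```python
from typing import Optional, Sequence


def resolve_field(
    extracted: dict,
    preferred: str,
    alternates: Sequence[str],
) -> Optional[str]:
    """Return the value for the preferred field, falling back to alternates.

    The preferred name is checked first; alternates are only consulted
    when the preferred field is absent from the extracted result.
    """
    for name in [preferred, *alternates]:
        if name in extracted:
            return extracted[name]
    return None  # Neither the preferred field nor any alternate was found.


# Hypothetical extraction results from two differently labeled forms.
result_with_preferred = {"Last name": "Rivera", "Family name": "Rivera-Lopez"}
result_with_alternate = {"Family name": "Chen"}
result_without_field = {"City": "Boston"}

alternates = ["Family name", "Surname"]
value_1 = resolve_field(result_with_preferred, "Last name", alternates)
value_2 = resolve_field(result_with_alternate, "Last name", alternates)
value_3 = resolve_field(result_without_field, "Last name", alternates)
```

Here `value_1` comes from the preferred field even though an alternate is also present, `value_2` comes from the "Family name" alternate, and `value_3` is `None`, which a reviewer could treat as a signal that the field needs manual attention.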