Extract Document Text

Experimental Feature

This feature is experimental and may change based on user feedback and testing. Share your feedback through our chatbot to help us improve it.

The Extract Document Text block extracts text from supported documents and images using OCR (Optical Character Recognition). This is useful when you need to validate or reuse text from scanned PDFs or image files inside an automation flow (for example, extracting text from a scanned contract before checking an approval keyword). This feature is available starting from Release 2025.1.486.
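The approval-keyword check mentioned above is built visually in a flow, so the following Python is only a sketch of the logic, not how the check is configured in the product. It assumes OCR output may break a phrase across lines or vary in letter case, and normalizes both before searching. The function name `contains_keyword` and the sample text are hypothetical.

```python
import re


def contains_keyword(extracted_text: str, keyword: str) -> bool:
    """Return True if keyword occurs in the text, ignoring case and whitespace layout."""
    # OCR output often breaks lines mid-phrase, so collapse every
    # whitespace run (spaces, tabs, newlines) into a single space.
    normalized_text = re.sub(r"\s+", " ", extracted_text).strip().casefold()
    normalized_keyword = re.sub(r"\s+", " ", keyword).strip().casefold()
    if not normalized_keyword:
        # An empty keyword would match everything; treat it as "not found".
        return False
    return normalized_keyword in normalized_text


# Hypothetical sample of what the Extracted text output might contain.
sample_text = "Contract No. 1042\nStatus:   APPROVED\nby the\nreview board"

print(contains_keyword(sample_text, "approved by the review board"))
print(contains_keyword(sample_text, "rejected"))
```

Normalizing before comparing keeps the check tolerant of layout differences between scans, at the cost of ignoring line structure.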

Note:

  • To use this block, cloud blocks must be activated for your tenant in the Add-ons section of the Customer Portal. Make sure they are activated before adding the block to your flows.

  • The screenshot on this page uses the Elegance Design, introduced in 2025.3. If you are using an earlier version, your layout may look different.

When fully expanded, the Extract Document Text block displays the following properties:

[Screenshot: the fully expanded Extract Document Text block]

Quick-start

  1. Drag Extract Document Text onto the canvas.

  2. Connect the block to the flow, select a Source type, and then provide the file under File Input.

  3. Run the flow when it’s ready.

Building block parameters

  • Block header: The green input connector triggers the block to start executing. The green output connector triggers when the text has been successfully extracted. You can rename the block by double-clicking the header text and typing a new title.

  • File Input: Defines the document or image to extract text from. The imported file must be of a supported file type.

    • Source type: Selects where the input file comes from. The block recognizes the file type once the file is provided.

      • Data file: Extracts text from a file stored in Leapwork as a Data File.

        • Select file to extract text: The file picker/drop area used to provide the input file (for example .pdf, .jpeg, .png). Selecting Import New File opens a window where you can upload the document.

      • Local path: Extracts text from a file at a specified local path.

        • Path to file: Specifies the local path to the file to extract text from. This field is used when Source type is set to Local path.

        • Text fields: Stores key–value pairs that can be inserted as dynamic tokens into Path to file via Insert token. Click Add field to define one or more fields, then use Insert token in Path to file to reuse them.

  • Extracted text: Returns the recognized text from the document or image. If a PDF contains multiple pages, the text is extracted from all pages and combined in page order.

  • Failed: Triggers if the block cannot extract text (for example, because of unreadable or low-quality scans, an unsupported format, password-protected or encrypted PDFs, or corrupted files).

  • Default timeout: Controls whether the block uses the default timeout from the flow settings or a custom timeout value.

  • Timeout (sec): Sets the maximum time spent extracting text before giving up and triggering Failed. This field is used when Default timeout is not selected.
    Note: All cases have a global timeout configured in the Settings panel. This is unrelated to the timeout of a single building block. However, a running case will automatically be cancelled if it runs for longer than the global timeout.
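To illustrate how Text fields feed into Path to file, and how some of the Failed causes listed above could be caught before a run, here is a minimal Python sketch. It is not Leapwork's implementation: the `{name}` token syntax, the function names, and the extension set (taken only from the examples on this page) are all assumptions made for illustration.

```python
from pathlib import Path

# Assumption: these extensions come from the examples on this page;
# the actual list of supported formats may be different.
EXAMPLE_EXTENSIONS = {".pdf", ".jpeg", ".png"}


def build_path(template: str, text_fields: dict[str, str]) -> str:
    """Fill a path template from key-value pairs.

    The {name} placeholder syntax is hypothetical and stands in for
    tokens inserted via Insert token. A missing key raises KeyError.
    """
    return template.format(**text_fields)


def preflight_problems(path_str: str) -> list[str]:
    """Return a list of reasons the file would likely fail extraction."""
    path = Path(path_str)
    problems = []
    if path.suffix.lower() not in EXAMPLE_EXTENSIONS:
        problems.append(f"unexpected extension: {path.suffix or '(none)'}")
    if not path.is_file():
        problems.append("file not found")
    elif path.stat().st_size == 0:
        problems.append("file is empty")
    return problems


# Hypothetical text fields and path template.
fields = {"customer": "acme", "doc": "contract_2025"}
resolved = build_path("/scans/{customer}/{doc}.pdf", fields)
print(resolved)  # /scans/acme/contract_2025.pdf
print(preflight_problems(resolved))
```

A check like this cannot detect low-quality scans or password-protected PDFs, so the Failed connector still needs to be handled in the flow.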

Resources

  • Flows FAQ: Common questions about creating, running, and managing flows in Leapwork.

  • Flows Troubleshooting: Guidelines and solutions for identifying and fixing issues that occur when building or running flows in Leapwork.

  • Customer Portal Add-ons: Customer Portal section to activate your cloud blocks.