Get Text from OCR Page

Reads text from the specified page of a PDF document using optical character recognition (OCR).

File Name

[Text] The name of the PDF file from which the text will be extracted. You can enter the full file name, including the path.

Page Number

[Number] The number of the page from which the text will be extracted. Numbering starts from 1.

Text Language

Select the language of the text to be recognized.

Module

Select the OCR module used to recognize the text in the page image.

Segmentation Method

[Text] The recognized text can be automatically segmented into sections, separated by commas.

Segmentation method:

  • 0 - Use the specified block delimiter;
  • 1 - Automatic segmentation (for Yandex only);
  • 2 - Segment by empty spaces longer than the specified number of characters.
Block Delimiter

[Number] The hexadecimal code of the character that is treated as the block delimiter. For example, a space has code 20 (0x20) and a tab has code 9 (0x09).

Used when segmentation method 0 is selected.
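
As an illustration of how a hexadecimal delimiter code maps to a character, here is a minimal Python sketch of method 0. The function names `delimiter_from_hex` and `segment_by_delimiter` are hypothetical and are not part of the block. The comma-joined output is only an assumption about how the sections might be combined.

```python
def delimiter_from_hex(code) -> str:
    """Convert a hexadecimal character code such as 20 or 9 to its character."""
    # The code is read as hexadecimal digits, so 20 means 0x20 (space).
    return chr(int(str(code), 16))


def segment_by_delimiter(text: str, code) -> list[str]:
    """Split text on the delimiter character, dropping empty sections."""
    delimiter = delimiter_from_hex(code)
    return [part for part in text.split(delimiter) if part]


# Hypothetical recognized text with tab-separated blocks (code 9).
sections = segment_by_delimiter("Invoice\t2024-01-15\tTotal", 9)
print(",".join(sections))
```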

Number of Characters

[Number] The length of empty space in the recognized text, measured in characters, above which the text is split into sections. Used when segmentation method 2 is selected.
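
Segmentation by empty space (method 2) can be approximated as follows. This is a sketch, not the block's actual implementation. The name `segment_by_gaps` is hypothetical, and the code assumes that "empty space" means a run of spaces or tabs strictly longer than the configured number of characters.

```python
import re


def segment_by_gaps(text: str, min_gap: int) -> list[str]:
    """Split text wherever a run of spaces or tabs is longer than min_gap."""
    # "Longer than min_gap" means at least min_gap + 1 whitespace characters.
    pattern = re.compile(r"[ \t]{%d,}" % (min_gap + 1))
    return [part.strip() for part in pattern.split(text) if part.strip()]


# Hypothetical recognized line: single spaces stay inside a section,
# while the five-space gap starts a new section when min_gap is 3.
line = "Name: John Smith     Date: 2024-01-15"
print(",".join(segment_by_gaps(line, 3)))
```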
Zoom

[Number] The factor by which the page image is enlarged before recognition.

Depending on the engine used, enlarging the image by a factor of 2 or 3 can improve recognition quality.
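
To give a sense of what the zoom factor means for the image passed to the OCR engine, the sketch below computes the pixel size of a rendered page. It assumes a base resolution of 72 DPI (one pixel per PDF point), which is a common default but is not confirmed for this block. The function name `rendered_size` is hypothetical.

```python
BASE_DPI = 72  # Assumed base resolution: one pixel per PDF point.


def rendered_size(width_pt: float, height_pt: float, zoom: float) -> tuple[int, int, float]:
    """Return (width_px, height_px, effective_dpi) for a page at the given zoom."""
    return round(width_pt * zoom), round(height_pt * zoom), BASE_DPI * zoom


# An A4 page is roughly 595 x 842 points.
for zoom in (1, 2, 3):
    width, height, dpi = rendered_size(595, 842, zoom)
    print(f"zoom {zoom}: {width} x {height} px at about {dpi} DPI")
```

Under this assumption, a larger zoom gives each character more pixels, which is why small print often becomes easier to recognize, at the cost of longer processing time.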

Auto Rotate Page

When selected, the page is rotated automatically during recognition.

Process Annotations

When selected, annotations on the page are also processed.

Result

[Text] Returns the text extracted from the page.

Error Handling Level

Select the error handling level. Possible values:

  • "Default" - default;
  • "Ignore" - errors are ignored;
  • "Handle" - errors are handled.

If "Default" is selected, the value from the "Start" block of this diagram will be used.

Message Level

Select the level of messages that the block outputs during operation. Possible values:

  • "Default" - default;
  • "Release" - output is disabled;
  • "Debug" - main information output;
  • "Detailed" - detailed information output.

If "Default" is selected, the value from the "Start" block of this diagram will be used.
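
The "Default" fallback described for both Error Handling Level and Message Level can be pictured with a short sketch. The function `resolve_level` is hypothetical and only illustrates the lookup order: the block's own setting first, then the value from the "Start" block.

```python
def resolve_level(block_value: str, start_value: str) -> str:
    """Return the effective level for a block.

    "Default" defers to the value configured in the "Start" block;
    any other value set on the block itself takes precedence.
    """
    return start_value if block_value == "Default" else block_value


print(resolve_level("Default", "Debug"))   # the block inherits the diagram setting
print(resolve_level("Release", "Debug"))   # the block's own setting wins
```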

Error Text

[Text] Returns detailed information about the error if the block fails to execute correctly.