LEADTOOLS OCR (Leadtools.Forms.Ocr assembly)

GetRecognizedCharacters Method

Gets the most recently recognized character data of this IOcrPage.
Syntax
IOcrPageCharacters GetRecognizedCharacters()
'Declaration
 
Function GetRecognizedCharacters() As IOcrPageCharacters
'Usage
 
Dim instance As IOcrPage
Dim value As IOcrPageCharacters
 
value = instance.GetRecognizedCharacters()
- (nullable LTOcrPageCharacters *)recognizedCharacters:(NSError **)error
public OcrPageCharacters getRecognizedCharacters()
function Leadtools.Forms.Ocr.IOcrPage.GetRecognizedCharacters()
IOcrPageCharacters^ GetRecognizedCharacters(); 

Return Value

An instance of IOcrPageCharacters containing the most recently recognized character data of this IOcrPage.
Remarks

You must call this method after the IOcrPage has been recognized with the Recognize method. That is, if the value of this page's IsRecognized property is false, calling this method throws an exception.
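The precondition above can be sketched as a small guard. This is a minimal illustration, not LEADTOOLS sample code: the class and method names (RecognizedCharactersHelper, GetCharactersSafely) are hypothetical, and the caller is assumed to pass a valid IOcrPage created from an engine that has already been started.

```csharp
using Leadtools.Forms.Ocr;

public static class RecognizedCharactersHelper
{
   // Hypothetical helper (not part of LEADTOOLS): returns the recognized
   // characters of a page, recognizing the page first if needed.
   public static IOcrPageCharacters GetCharactersSafely(IOcrPage ocrPage)
   {
      // GetRecognizedCharacters throws when IsRecognized is false,
      // so run recognition first in that case
      if (!ocrPage.IsRecognized)
         ocrPage.Recognize(null);

      return ocrPage.GetRecognizedCharacters();
   }
}
```

Whether to recognize on demand or to report an error to the caller is a design choice; the sketch picks the former only to keep the example short.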

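You can use GetRecognizedCharacters to examine the recognized character data. This data contains the character codes, their confidence, guess codes, and location and position in the page, as well as font information. For more information, refer to OcrCharacter.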

The GetRecognizedCharacters method returns an instance of IOcrPageCharacters, which is a collection of IOcrZoneCharacters. The IOcrZoneCharacters.ZoneIndex property contains the 0-based index of the zone. You can get the zone information by using the same index into the Zones property of this IOcrPage.
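The relationship between IOcrZoneCharacters.ZoneIndex and the page's zone collection might look like the sketch below. The helper name (ZoneCharactersDump.PrintZoneSummary) is hypothetical, the page is assumed to be recognized already, and the sketch assumes that OcrZone exposes a Bounds property and that the zone collection can be indexed by integer.

```csharp
using System;
using Leadtools.Forms.Ocr;

public static class ZoneCharactersDump
{
   // Hypothetical helper (not part of LEADTOOLS): prints the index, bounds,
   // and character count of every zone that has recognized characters.
   public static void PrintZoneSummary(IOcrPage ocrPage)
   {
      IOcrPageCharacters pageCharacters = ocrPage.GetRecognizedCharacters();

      foreach (IOcrZoneCharacters zoneCharacters in pageCharacters)
      {
         // ZoneIndex is the 0-based index of the zone these characters belong to
         int zoneIndex = zoneCharacters.ZoneIndex;

         // Use the same index to look up the zone definition on the page
         // (assumption: OcrZone has a Bounds property)
         OcrZone zone = ocrPage.Zones[zoneIndex];

         Console.WriteLine("Zone {0}: bounds {1}, {2} characters",
            zoneIndex, zone.Bounds, zoneCharacters.Count);
      }
   }
}
```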

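To modify the recognition data and apply it back to the page, use SetRecognizedCharacters.

To get the recognized words of a zone, use IOcrZoneCharacters.GetWords.

Notes on spaces: The LEADTOOLS Advantage OCR engine does not return any space characters when using the GetRecognizedCharacters method.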

The LEADTOOLS Professional OCR engine does not return any space characters if the value of the Boolean Recognition.SpaceIsValidCharacter setting is false (the default). If you require space characters in the recognition results when using the LEADTOOLS Professional engine, set the value of the Boolean Recognition.SpaceIsValidCharacter setting to true (ocrEngineInstance.SettingManager.SetBooleanValue("Recognition.SpaceIsValidCharacter", true)). For more information on OCR settings, refer to IOcrSettingManager and LEADTOOLS OCR Professional Engine Settings.
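As a sketch of the setting change described above, the hypothetical helper below wraps the SetBooleanValue call. It assumes the engine passed in is the LEADTOOLS Professional engine and has already been started. The setting name comes from this topic; the helper and class names are illustrative only.

```csharp
using Leadtools.Forms.Ocr;

public static class SpaceSettingHelper
{
   // Hypothetical helper (not part of LEADTOOLS): makes the Professional
   // engine keep space characters in the recognition results.
   // Assumption: ocrEngine is a started Professional engine.
   public static void EnableSpaceCharacters(IOcrEngine ocrEngine)
   {
      IOcrSettingManager settings = ocrEngine.SettingManager;

      // false is the default; true keeps space characters in the results
      settings.SetBooleanValue("Recognition.SpaceIsValidCharacter", true);
   }
}
```

A caller would typically invoke EnableSpaceCharacters(ocrEngine) once after Startup and before recognizing pages or calling SetRecognizedCharacters.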

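The SetRecognizedCharacters method accepts space characters with the LEADTOOLS Advantage engine. However, these space characters may be used when generating the final document (PDF) and may affect the final output. Therefore, inserting space characters is not recommended when using the LEADTOOLS Advantage engine.

If the value of the Boolean Recognition.SpaceIsValidCharacter setting is false (the default), the LEADTOOLS Professional OCR engine strips any space characters from the results passed to SetRecognizedCharacters. If you require space characters in the recognition results when using the LEADTOOLS Professional engine, set the value of the Boolean Recognition.SpaceIsValidCharacter setting to true before calling SetRecognizedCharacters.

If you use the GetRecognizedCharacters and SetRecognizedCharacters methods to modify the recognition results before saving to an output file, and you plan to use the engine's native save functionality (by setting the IOcrDocumentManager.EngineFormat property and using DocumentFormat.User with the IOcrDocument.Save method), you must change the Boolean Recognition.SpaceIsValidCharacter setting to true.

The IOcrPageCharacters interface also contains the IOcrPageCharacters.UpdateWord method, which lets you modify the OCR recognition results by updating or deleting words. You can then save the results to the final output document if needed.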

Example

This example gets the recognized characters of a page, modifies them, and sets them back before saving the final document.

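' Visual Basic
Imports System
Imports System.Collections.Generic
Imports System.Drawing
Imports System.IO
Imports Leadtools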
Imports Leadtools.Codecs
Imports Leadtools.Forms.Ocr
Imports Leadtools.Forms
Imports Leadtools.Forms.DocumentWriters
Imports Leadtools.WinForms
Imports Leadtools.Drawing
Imports Leadtools.ImageProcessing
Imports Leadtools.ImageProcessing.Color

<TestMethod>
Public Sub RecognizedCharactersExample()
   ' Create an image with some text in it
   Dim image As New RasterImage(RasterMemoryFlags.Conventional, _
                                 640, 200, 24, _
                                 RasterByteOrder.Bgr, _
                                 RasterViewPerspective.TopLeft, _
                                 Nothing, IntPtr.Zero, 0)
   Dim imageRect As New Rectangle(0, 0, image.ImageWidth, image.ImageHeight)
   Dim hdc As IntPtr = RasterImagePainter.CreateLeadDC(image)
   Using g As Graphics = Graphics.FromHdc(hdc)
      g.SmoothingMode = System.Drawing.Drawing2D.SmoothingMode.HighQuality
      g.FillRectangle(Brushes.White, imageRect)

      Using f As New Font("Arial", 20, FontStyle.Regular)
         g.DrawString("Normal line", f, Brushes.Black, 0, 0)
      End Using

      Using f As New Font("Arial", 20, FontStyle.Bold)
         g.DrawString("Bold, italic and underline", f, Brushes.Black, 0, 40)
      End Using

      Using f As New Font("Courier New", 20, FontStyle.Regular)
         g.DrawString("Monospaced line", f, Brushes.Black, 0, 80)
      End Using
   End Using

   RasterImagePainter.DeleteLeadDC(hdc)

   Dim textFileName As String = Path.Combine(LEAD_VARS.ImagesDir, "MyImageWithTest.txt")
   Dim pdfFileName As String = Path.Combine(LEAD_VARS.ImagesDir, "MyImageWithTest.pdf")

   ' Create an instance of the engine
   Using ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)
      ' Start the engine using default parameters
      ocrEngine.Startup(Nothing, Nothing, Nothing, LEAD_VARS.OcrAdvantageRuntimeDir)

      ' Create an OCR page
      Dim ocrPage As IOcrPage = ocrEngine.CreatePage(image, OcrImageSharingMode.AutoDispose)

      ' Recognize this page
      ocrPage.Recognize(Nothing)

      ' Dump the characters into a text file
      Using writer As StreamWriter = File.CreateText(textFileName)
         Dim ocrPageCharacters As IOcrPageCharacters = ocrPage.GetRecognizedCharacters()
         For Each ocrZoneCharacters As IOcrZoneCharacters In ocrPageCharacters
            ' Show the words found in this zone. Get the word boundaries in inches
            Dim words As ICollection(Of OcrWord) = ocrZoneCharacters.GetWords(ocrPage.DpiX, ocrPage.DpiY, LogicalUnit.Inch)
            Console.WriteLine("Words:")
            For Each word As OcrWord In words
               Console.WriteLine("Word: {0}, at {1}, characters index from {2} to {3}", _
                                 word.Value, word.Bounds, word.FirstCharacterIndex, word.LastCharacterIndex)
            Next

            Dim nextCharacterIsNewWord As Boolean = True

            For i As Integer = 0 To ocrZoneCharacters.Count - 1
               Dim ocrCharacter As OcrCharacter = ocrZoneCharacters(i)

               ' Capitalize the first letter if this is a new word
               If nextCharacterIsNewWord Then
                  ocrCharacter.Code = [Char].ToUpper(ocrCharacter.Code)
               End If

               writer.WriteLine("Code: {0}, Confidence: {1}, WordIsCertain: {2}, Bounds: {3}, Position: {4}, FontSize: {5}, FontStyle: {6}", _
                                 ocrCharacter.Code, _
                                 ocrCharacter.Confidence, _
                                 ocrCharacter.WordIsCertain, _
                                 ocrCharacter.Bounds, _
                                 ocrCharacter.Position, _
                                 ocrCharacter.FontSize, _
                                 ocrCharacter.FontStyle)

               ' If the character is bold, also make it italic and underlined
               If (ocrCharacter.FontStyle And OcrCharacterFontStyle.Bold) = OcrCharacterFontStyle.Bold Then
                  ocrCharacter.FontStyle = ocrCharacter.FontStyle Or OcrCharacterFontStyle.Italic
                  ocrCharacter.FontStyle = ocrCharacter.FontStyle Or OcrCharacterFontStyle.Underline
               End If

               ' Check if next character is the start of a new word
               If (ocrCharacter.Position And OcrCharacterPosition.EndOfWord) = OcrCharacterPosition.EndOfWord OrElse _
                  (ocrCharacter.Position And OcrCharacterPosition.EndOfLine) = OcrCharacterPosition.EndOfLine Then
                  nextCharacterIsNewWord = True
               Else
                  nextCharacterIsNewWord = False
               End If

               ocrZoneCharacters(i) = ocrCharacter
            Next
         Next

         ' Replace the characters with the modified one before we save
         ocrPage.SetRecognizedCharacters(ocrPageCharacters)
      End Using

      ' Create an OCR document so we can save the results
      Using ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument(Nothing, OcrCreateDocumentOptions.AutoDeleteFile)
         ' Add the page and dispose it
         ocrDocument.Pages.Add(ocrPage)
         ocrPage.Dispose()

         ' Set the PDF options to save as PDF/A text only
         Dim pdfOptions As PdfDocumentOptions = TryCast(ocrEngine.DocumentWriterInstance.GetOptions(DocumentFormat.Pdf), PdfDocumentOptions)
         pdfOptions.DocumentType = PdfDocumentType.PdfA
         pdfOptions.ImageOverText = False
         ocrEngine.DocumentWriterInstance.SetOptions(DocumentFormat.Pdf, pdfOptions)


         ' Open and check the result file, it should contain the following text
         ' "Normal Line"
         ' "Bold, Italic And Underline"
         ' "Monospaced Line"
         ' With the second line bold, italic, and underlined now
         ocrDocument.Save(pdfFileName, DocumentFormat.Pdf, Nothing)
      End Using

      ' Shut down the engine
      ' Note: calling Dispose will also automatically shut down the engine if it has been started
      ocrEngine.Shutdown()
   End Using
End Sub

Public NotInheritable Class LEAD_VARS
   Public Const ImagesDir As String = "C:\Users\Public\Documents\LEADTOOLS Images"
   Public Const OcrAdvantageRuntimeDir As String = "C:\LEADTOOLS 19\Bin\Common\OcrAdvantageRuntime"
End Class

// C#
using System;
using System.Collections.Generic;
using System.Drawing;
using System.IO;
using Leadtools;
using Leadtools.Codecs;
using Leadtools.Forms.Ocr;
using Leadtools.Forms;
using Leadtools.Forms.DocumentWriters;
using Leadtools.WinForms;
using Leadtools.Drawing;
using Leadtools.ImageProcessing;
using Leadtools.ImageProcessing.Color;

public void RecognizedCharactersExample()
{
   // Create an image with some text in it
   RasterImage image = new RasterImage(RasterMemoryFlags.Conventional, 640, 200, 24, RasterByteOrder.Bgr, RasterViewPerspective.TopLeft, null, IntPtr.Zero, 0);
   Rectangle imageRect = new Rectangle(0, 0, image.ImageWidth, image.ImageHeight);
   IntPtr hdc = RasterImagePainter.CreateLeadDC(image);
   using (Graphics g = Graphics.FromHdc(hdc))
   {
      g.SmoothingMode = System.Drawing.Drawing2D.SmoothingMode.HighQuality;
      g.FillRectangle(Brushes.White, imageRect);

      using (Font f = new Font("Arial", 20, FontStyle.Regular))
         g.DrawString("Normal line", f, Brushes.Black, 0, 0);

      using (Font f = new Font("Arial", 20, FontStyle.Bold))
         g.DrawString("Bold, italic and underline", f, Brushes.Black, 0, 40);

      using (Font f = new Font("Courier New", 20, FontStyle.Regular))
         g.DrawString("Monospaced line", f, Brushes.Black, 0, 80);
   }

   RasterImagePainter.DeleteLeadDC(hdc);

   string textFileName = Path.Combine(LEAD_VARS.ImagesDir, "MyImageWithTest.txt");
   string pdfFileName = Path.Combine(LEAD_VARS.ImagesDir, "MyImageWithTest.pdf");

   // Create an instance of the engine
   using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false))
   {
      // Start the engine using default parameters
      ocrEngine.Startup(null, null, null, LEAD_VARS.OcrAdvantageRuntimeDir);

      // Create an OCR page
      IOcrPage ocrPage = ocrEngine.CreatePage(image, OcrImageSharingMode.AutoDispose);

      // Recognize this page
      ocrPage.Recognize(null);

      // Dump the characters into a text file
      using (StreamWriter writer = File.CreateText(textFileName))
      {
         IOcrPageCharacters ocrPageCharacters = ocrPage.GetRecognizedCharacters();
         foreach (IOcrZoneCharacters ocrZoneCharacters in ocrPageCharacters)
         {
            // Show the words found in this zone. Get the word boundaries in inches
            ICollection<OcrWord> words = ocrZoneCharacters.GetWords(ocrPage.DpiX, ocrPage.DpiY, LogicalUnit.Inch);
            Console.WriteLine("Words:");
            foreach (OcrWord word in words)
               Console.WriteLine("Word: {0}, at {1}, characters index from {2} to {3}", word.Value, word.Bounds, word.FirstCharacterIndex, word.LastCharacterIndex);

            bool nextCharacterIsNewWord = true;

            for (int i = 0; i < ocrZoneCharacters.Count; i++)
            {
               OcrCharacter ocrCharacter = ocrZoneCharacters[i];

               // Capitalize the first letter if this is a new word
               if (nextCharacterIsNewWord)
                  ocrCharacter.Code = Char.ToUpper(ocrCharacter.Code);

               writer.WriteLine("Code: {0}, Confidence: {1}, WordIsCertain: {2}, Bounds: {3}, Position: {4}, FontSize: {5}, FontStyle: {6}",
                  ocrCharacter.Code,
                  ocrCharacter.Confidence,
                  ocrCharacter.WordIsCertain,
                  ocrCharacter.Bounds,
                  ocrCharacter.Position,
                  ocrCharacter.FontSize,
                  ocrCharacter.FontStyle);

               // If the character is bold, also make it italic and underlined
               if ((ocrCharacter.FontStyle & OcrCharacterFontStyle.Bold) == OcrCharacterFontStyle.Bold)
               {
                  ocrCharacter.FontStyle |= OcrCharacterFontStyle.Italic;
                  ocrCharacter.FontStyle |= OcrCharacterFontStyle.Underline;
               }

               // Check if next character is the start of a new word
               if ((ocrCharacter.Position & OcrCharacterPosition.EndOfWord) == OcrCharacterPosition.EndOfWord ||
                  (ocrCharacter.Position & OcrCharacterPosition.EndOfLine) == OcrCharacterPosition.EndOfLine)
                  nextCharacterIsNewWord = true;
               else
                  nextCharacterIsNewWord = false;

               ocrZoneCharacters[i] = ocrCharacter;
            }
         }

         // Replace the characters with the modified one before we save
         ocrPage.SetRecognizedCharacters(ocrPageCharacters);
      }

      // Create an OCR document so we can save the results
      using (IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument(null, OcrCreateDocumentOptions.AutoDeleteFile))
      {
         // Add the page and dispose it
         ocrDocument.Pages.Add(ocrPage);
         ocrPage.Dispose();

         // Set the PDF options to save as PDF/A text only
         PdfDocumentOptions pdfOptions = ocrEngine.DocumentWriterInstance.GetOptions(DocumentFormat.Pdf) as PdfDocumentOptions;
         pdfOptions.DocumentType = PdfDocumentType.PdfA;
         pdfOptions.ImageOverText = false;
         ocrEngine.DocumentWriterInstance.SetOptions(DocumentFormat.Pdf, pdfOptions);

         ocrDocument.Save(pdfFileName, DocumentFormat.Pdf, null);

         // Open and check the result file, it should contain the following text
         // "Normal Line"
         // "Bold, Italic And Underline"
         // "Monospaced Line"
         // With the second line bold, italic, and underlined now
      }

      // Shut down the engine
      // Note: calling Dispose will also automatically shut down the engine if it has been started
      ocrEngine.Shutdown();
   }
}

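static class LEAD_VARS
{
   public const string ImagesDir = @"C:\Users\Public\Documents\LEADTOOLS Images";
   public const string OcrAdvantageRuntimeDir = @"C:\LEADTOOLS 19\Bin\Common\OcrAdvantageRuntime";
}

// C# (Windows Runtime)
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;
using Windows.Storage;
using Windows.Storage.Streams;
using Leadtools;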
using Leadtools.Codecs;
using Leadtools.Controls;
using Leadtools.Forms.Ocr;
using Leadtools.Forms;
using Leadtools.Forms.DocumentWriters;
using Leadtools.ImageProcessing;

      
public async Task RecognizedCharactersExample()
{
   string imageFileName = @"Assets\OCR1.TIF";
   string textFileName = "OCR1.txt";
   string pdfFileName = "OCR1.pdf";
   // Create an instance of the engine
   IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);

   // Start the engine using default parameters
   ocrEngine.Startup(null, null, String.Empty, Tools.OcrEnginePath);

   // Create an OCR document
   IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument();

   // Add this image to the document
   IOcrPage ocrPage = null;
   using (RasterCodecs codecs = new RasterCodecs())
   {
      StorageFile loadFile = await Tools.AppInstallFolder.GetFileAsync(imageFileName);
      using (RasterImage image = await codecs.LoadAsync(LeadStreamFactory.Create(loadFile)))
         ocrPage = ocrDocument.Pages.AddPage(image, null);
   }

   // Recognize this page
   ocrPage.Recognize(null);

   // Dump the characters into a text file
   StorageFile file = await Tools.AppLocalFolder.CreateFileAsync(textFileName);
   using (IRandomAccessStream fileStream = await file.OpenAsync(FileAccessMode.ReadWrite))
   {
      using (IOutputStream outputStream = fileStream.GetOutputStreamAt(0))
      {
         using (DataWriter writer = new DataWriter(outputStream))
         {
            IOcrPageCharacters ocrPageCharacters = ocrPage.GetRecognizedCharacters();
            foreach (IOcrZoneCharacters ocrZoneCharacters in ocrPageCharacters)
            {
               // Show the words found in this zone.
               ICollection<OcrWord> words = ocrZoneCharacters.GetWords();
               Debug.WriteLine("Words:");
               foreach (OcrWord word in words)
                  Debug.WriteLine("Word: {0}, at {1}, characters index from {2} to {3}", word.Value, word.Bounds, word.FirstCharacterIndex, word.LastCharacterIndex);

               bool nextCharacterIsNewWord = true;

               for (int i = 0; i < ocrZoneCharacters.Count; i++)
               {
                  OcrCharacter ocrCharacter = ocrZoneCharacters[i];

                  // Capitalize the first letter if this is a new word
                  if (nextCharacterIsNewWord)
                     ocrCharacter.Code = Char.ToUpper(ocrCharacter.Code);

                  writer.WriteString(string.Format("Code: {0}, Confidence: {1}, WordIsCertain: {2}, Bounds: {3}, Position: {4}, FontSize: {5}, FontStyle: {6}",
                     ocrCharacter.Code,
                     ocrCharacter.Confidence,
                     ocrCharacter.WordIsCertain,
                     ocrCharacter.Bounds,
                     ocrCharacter.Position,
                     ocrCharacter.FontSize,
                     ocrCharacter.FontStyle));

                  // If the character is bold, also make it italic and underlined
                  if ((ocrCharacter.FontStyle & OcrCharacterFontStyle.Bold) == OcrCharacterFontStyle.Bold)
                  {
                     ocrCharacter.FontStyle |= OcrCharacterFontStyle.Italic;
                     ocrCharacter.FontStyle |= OcrCharacterFontStyle.Underline;
                  }

                  // Check if next character is the start of a new word
                  if ((ocrCharacter.Position & OcrCharacterPosition.EndOfWord) == OcrCharacterPosition.EndOfWord ||
                     (ocrCharacter.Position & OcrCharacterPosition.EndOfLine) == OcrCharacterPosition.EndOfLine)
                     nextCharacterIsNewWord = true;
                  else
                     nextCharacterIsNewWord = false;

                  ocrZoneCharacters[i] = ocrCharacter;
               }
            }

            // Replace the characters with the modified one before we save
            ocrPage.SetRecognizedCharacters(ocrPageCharacters);

            await writer.StoreAsync();
            writer.DetachStream();
         }

         await outputStream.FlushAsync();
      }
   }

   // Set the PDF options to save as PDF/A text only
   PdfDocumentOptions pdfOptions = ocrEngine.DocumentWriterInstance.GetOptions(DocumentFormat.Pdf) as PdfDocumentOptions;
   pdfOptions.DocumentType = PdfDocumentType.PdfA;
   pdfOptions.ImageOverText = false;
   ocrEngine.DocumentWriterInstance.SetOptions(DocumentFormat.Pdf, pdfOptions);

   StorageFile saveFile = await Tools.AppLocalFolder.CreateFileAsync(pdfFileName, CreationCollisionOption.ReplaceExisting);
   await ocrDocument.SaveAsync(LeadStreamFactory.Create(saveFile), DocumentFormat.Pdf, null);

   // Shut down the engine
   ocrEngine.Shutdown();
}

Requirements

Target Platforms

See Also

Reference

IOcrPage Interface
IOcrPage Members
SetRecognizedCharacters Method
OcrCharacter Structure
IOcrPageCharacters Interface
IOcrZoneCharacters Interface
IOcrPageCollection Interface
IOcrZoneCollection Interface
OcrZone Structure
Auto Zone
Programming with the LEADTOOLS .NET OCR
OCR Confidence Reporting

Leadtools.Forms.Ocr requires a Recognition or Document Imaging Suite license and unlock key. For more information, refer to: LEADTOOLS Toolkit Features