About Hidden Text Strings in PDF Files

PDF files commonly contain hidden text strings that only become visible when the file is opened in an application such as Scan2CAD.

It may appear that Scan2CAD has added duplicate or incorrect text strings in your PDF file. This is not the case. These text strings are created by the application which created the PDF.

Why do hidden text strings exist?

Hidden text strings are created by the application that created the PDF (not Scan2CAD).

These text strings are typically created to make the PDF indexable (searchable). 

Hidden text strings can be found in both raster and vector PDF files. They may appear as duplicate strings on top of exploded vector text (typical for PDFs created by CAD applications) or as strings on top of raster images (typical for PDFs created by scanner applications).

These text strings are often poor interpretations of the original text, because some are generated using unsuitable OCR technology.
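
For readers curious about what a hidden text string looks like inside the file: one common mechanism (an assumption here, since a given PDF may hide text in other ways) is the PDF text rendering mode 3, set with the operator "3 Tr", which tells a viewer to draw the text with neither fill nor stroke. OCR tools often use it to lay a searchable text layer over a scanned image. The Python sketch below scans an already decompressed page content stream and lists the literal strings shown while that mode is active. The function name find_invisible_text and the sample stream are hypothetical, the tokenizer is deliberately simplified (no hex strings, no nested parentheses, no font decoding), and it is not a description of how Scan2CAD reads PDFs.

```python
import re

# Simplified tokens: literal strings "(...)" with backslash escapes,
# array brackets, and any other run of non-space characters.
TOKEN = re.compile(r"\((?:\\.|[^\\()])*\)|\[|\]|[^\s\[\]()]+")


def is_operand(tok: str) -> bool:
    """Strings, names, brackets and numbers are operands; the rest are operators."""
    return tok[0] in "([]/<+-." or tok[0].isdigit()


def find_invisible_text(content: str) -> list[str]:
    """Return literal strings shown while text rendering mode is 3 (invisible)."""
    mode = 0        # PDF default: filled, visible text
    saved = []      # rendering modes saved by the q operator
    operands = []   # operands collected since the last operator
    hidden = []
    for tok in TOKEN.findall(content):
        if is_operand(tok):
            operands.append(tok)
            continue
        if tok == "Tr" and operands:
            mode = int(float(operands[-1]))
        elif tok == "q":
            saved.append(mode)
        elif tok == "Q" and saved:
            mode = saved.pop()
        elif tok in ("Tj", "TJ", "'", '"') and mode == 3:
            # Join the string pieces of one show operator, unescaping \( \) \\ only.
            pieces = [re.sub(r"\\([()\\])", r"\1", o[1:-1])
                      for o in operands if o.startswith("(")]
            if pieces:
                hidden.append("".join(pieces))
        operands = []
    return hidden


# Hypothetical content stream: one visible string, two invisible ones.
sample = """
BT /F1 12 Tf 0 Tr (Visible title) Tj ET
BT /F1 12 Tf 3 Tr (Hidden OCR text) Tj [(Spl) -20 (it)] TJ ET
"""
print(find_invisible_text(sample))  # ['Hidden OCR text', 'Split']
```

In a real file the content stream is usually compressed and the string bytes depend on the font encoding, so a PDF library would be needed to obtain input for a check like this.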

Why do I only see these text strings in Scan2CAD?

Basic PDF viewer applications are designed to make a PDF easy to view, but they do not display much of the data that exists in the file.

As a CAD conversion solution, Scan2CAD displays as much of the data in the PDF as possible. This includes text strings that other applications may hide. These strings may be useful for some users. If they are not useful to you, you can easily remove them.

How to automatically remove hidden text strings

You can quickly remove all text strings held in your PDF with a simple feature in Scan2CAD.

  1. Open the PDF in Scan2CAD
  2. Open the ‘Convert Vector Image’ dialog
  3. Select ‘Optimize Vectors’
  4. In the ‘Optimize Vectors’ tab, enable ‘Remove text’
  5. Click ‘Run’ and then ‘OK’ to save the results to the canvas

All text strings in the file will be removed.