
Each page of my PDF document contains a line with this string:

%REPLACE%

I'd like to find it and replace it with another string.

Does anyone know how to do this with a command-line application such as pdftk?

This person gave me an important clue; however, I'd like something more direct.

Thanks.

Community
Roger
  • Does this answer your question? [How to program a text search and replace in PDF files](https://stackoverflow.com/questions/220445/how-to-program-a-text-search-and-replace-in-pdf-files) – rogerdpack Jun 11 '21 at 06:09
  • I added an answer to the above question of a custom program I wrote for this purpose https://stackoverflow.com/a/67932076/32453 – rogerdpack Jun 16 '21 at 04:15

3 Answers


You can try to modify the content of your PDF as follows:

  1. Uncompress the text streams of the PDF

    pdftk file.pdf output uncompressed.pdf uncompress
    
  2. Use sed to replace your text with another

    sed -e "s/ORIGINALSTRING/NEWSTRING/g" <uncompressed.pdf >modified.pdf
    
  3. If this attempt was successful, re-compress the PDF with pdftk

    pdftk modified.pdf output recompressed.pdf compress
    

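Note: This approach does not always work, mainly because of font subsetting: a subset font embeds only the glyphs the document already uses, so the replacement text may need characters that the file does not contain.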
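The `sed` step above can be wrapped in a small guard so that a file in which the string is not stored as plain text is detected before any output is written. This is only a sketch: `replace_in_pdf` is a hypothetical helper name, and it assumes that neither string contains characters that are special to `sed` (such as `/`, `&`, `\`, `.`, `*`, `[` or `]`).

```shell
#!/bin/sh
# Sketch: byte-safe search and replace in an already-uncompressed PDF.
# replace_in_pdf is a hypothetical helper, not part of pdftk.
# Assumption: $1 and $2 contain no characters that are special to sed.
replace_in_pdf() {
    search=$1 replace=$2 infile=$3 outfile=$4

    # LC_ALL=C makes grep and sed work on raw bytes instead of decoding
    # the file as text in the current locale.
    # grep -a reads binary data as text, -F matches the literal string,
    # -q prints nothing and only sets the exit status.
    if ! LC_ALL=C grep -aqF -- "$search" "$infile"; then
        echo "string not found as plain text: $search" >&2
        return 1
    fi

    LC_ALL=C sed "s/$search/$replace/g" "$infile" > "$outfile"
}
```

It would sit between the two pdftk calls from the steps above, for example `replace_in_pdf '%REPLACE%' 'NEWSTRING' uncompressed.pdf modified.pdf`. A non-zero exit status means the string was not found literally in the uncompressed file, which is the case where this approach cannot help.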

thirdender
Dingo
  • I can't make this work with a PDF file exported from Google Docs (even when I choose Arial as the only font). I'm afraid I'd have to use some other application just to write the page and then try the very simple and wonderful code you wrote... – Roger Mar 26 '12 at 14:52
  • With *pdfedit* you have a better chance of editing the text content (if the fonts are fully embedded) - http://pdfedit.cz/en/index.html – Dingo Mar 26 '12 at 15:01
  • pdfedit can also be used from the command line without the GUI (see its site for the command-line utilities) – Dingo Mar 27 '12 at 12:35
  • Note that this only works when the text is drawn with the `Tj` operator and plain ASCII characters. As soon as octal, hex, or glyph references are used, you are lost. – Michael-O Dec 14 '18 at 11:00
  • For anyone with Mac M1 this might be useful - https://stackoverflow.com/questions/60859527/how-to-solve-pdftk-bad-cpu-type-in-executable-on-mac – PeteW Sep 09 '21 at 15:51
  • I had to replace `sed`, because of encoding issues, with `perl -pi.bak -e 's/findthis/replacewiththis/g' uncompressed.pdf` from https://stackoverflow.com/a/6995010/241542 – pgericson Jan 03 '22 at 09:27
  • Can `sed` use a regex here? Without a regex it works, but with a regex it says: `Error: Unable to find file. Error: Failed to open PDF file: modified.pdf Errors encountered. No output created. Done. Input errors, so no output created.` – Nor.Z May 20 '22 at 20:47

For a small change on just a few pages, Inkscape can do a good job. It can also fix some issues in diagrams and with table borders. Each page must be processed separately, though, and the pages stuck back together using pdfunite. (Unchanged page ranges can be extracted with pdfseparate.)

Inspiration: https://tatica.org/2015/07/13/edit-pdf-inkscape/
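The split-and-rejoin part of this workflow might look like the sketch below. `split_pages` and `join_pages` are hypothetical helper names, and for simplicity the sketch splits every page rather than extracting only the unchanged ranges. It assumes a poppler version whose `pdfseparate` accepts a zero-padded `%04d` page pattern, so that the shell glob returns the page files in page order.

```shell
#!/bin/sh
# Sketch: split a PDF into single pages so one page can be edited,
# then join the pages again. Helper names are hypothetical.

# Write every page of $1 to $2/page-0001.pdf, $2/page-0002.pdf, ...
# The directory $2 must already exist.
split_pages() {
    pdfseparate "$1" "$2/page-%04d.pdf"
}

# Join all page files in directory $1 into the single file $2.
# The glob expands in sorted order, which is page order thanks to
# the zero padding used by split_pages.
join_pages() {
    pdfunite "$1"/page-*.pdf "$2"
}
```

A session could then be: `mkdir pages`, `split_pages in.pdf pages`, open the page to change with `inkscape pages/page-0003.pdf` and save it back as PDF under the same name, then `join_pages pages out.pdf`.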

Joachim Wagner

changepagestring can do this in a single step:

    changepagestring -o -v infile.pdf search-regex replace-str outfile.pdf

However, like the currently accepted answer, this is hit-or-miss and doesn't work as expected with all files.

Brian Z