PDFBox: Merge adds unused fonts, how to remove them?

I merge two PDF files into one with PDFBox version 2.
The first one has these fonts:

name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
XXMGEM+Arial-BoldMT                  TrueType          WinAnsi          yes yes yes     15  0
XXMGEM+ArialMT                       TrueType          WinAnsi          yes yes yes     19  0
XXMGEM+ArialMT                       CID TrueType      Identity-H       yes yes yes     27  0
XXMGEM+ArialNarrow-Bold              TrueType          WinAnsi          yes yes yes     40  0
XXMGEM+ArialNarrow                   TrueType          WinAnsi          yes yes yes     44  0

And the second one:

name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
UNTWVR+HelveticaLTCom-Roman          CID TrueType      Identity-H       yes yes yes     25  0
UNTYID+HelveticaLTCom-Bold           CID TrueType      Identity-H       yes yes yes     26  0
UNTZUP+ArialMT                       CID TrueType      Identity-H       yes yes yes     27  0
UNUBHB+Arial-BoldMT                  CID TrueType      Identity-H       yes yes yes     28  0
Helvetica-Bold                       Type 1            WinAnsi          no  no  no      29  0
UNXPUH+HelveticaLTCom-Roman          CID TrueType      Identity-H       yes yes yes     50  0
UNXRGT+HelveticaLTCom-Bold           CID TrueType      Identity-H       yes yes yes     51  0
UNXSTF+ArialMT                       CID TrueType      Identity-H       yes yes yes     52  0
UNXUFR+Arial-BoldMT                  CID TrueType      Identity-H       yes yes yes     53  0

After merging, this happens:

name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
SRWYVL+HelveticaLTCom-Roman          CID TrueType      Identity-H       yes yes yes    420  0
SRXAHX+HelveticaLTCom-Bold           CID TrueType      Identity-H       yes yes yes    421  0
SRXBUJ+ArialMT                       CID TrueType      Identity-H       yes yes yes    422  0
SRXDGV+Arial-BoldMT                  CID TrueType      Identity-H       yes yes yes    423  0
Helvetica-Bold                       Type 1            WinAnsi          no  no  no     424  0
SRWYVL+HelveticaLTCom-Roman          CID TrueType      Identity-H       yes yes yes    425  0
SRXAHX+HelveticaLTCom-Bold           CID TrueType      Identity-H       yes yes yes    426  0
SRXBUJ+ArialMT                       CID TrueType      Identity-H       yes yes yes    427  0
SRXDGV+Arial-BoldMT                  CID TrueType      Identity-H       yes yes yes    428  0
SRWYVL+ArialMT                       CID TrueType      Identity-H       yes yes yes    429  0
SRXAHX+HelveticaLTCom-Roman          CID TrueType      Identity-H       yes yes yes    430  0
SRXBUJ+HelveticaLTCom-Bold           CID TrueType      Identity-H       yes yes yes    431  0
SRXDGV+Arial-BoldMT                  CID TrueType      Identity-H       yes yes yes    432  0
WDEGAT+Arial-BoldMT                  TrueType          WinAnsi          yes yes yes    436  0
GSEDXU+ArialMT                       TrueType          WinAnsi          yes yes yes    437  0
Arial                                TrueType          WinAnsi          yes no  no     416  0
ZapfDingbats                         TrueType          WinAnsi          yes no  yes    419  0
ArialNarrow                          TrueType          WinAnsi          yes no  no     417  0
ACHRDX+ZapfDingbats                  TrueType          WinAnsi          yes yes yes    618  0
ACHRDX+ZapfDingbats                  TrueType          WinAnsi          yes yes yes    619  0
ACHRDX+ZapfDingbats                  TrueType          WinAnsi          yes yes yes    620  0
ACHRDX+ZapfDingbats                  TrueType          WinAnsi          yes yes yes    621  0
ACHRDX+ZapfDingbats                  TrueType          WinAnsi          yes yes yes    622  0
GSEDXU+ArialNarrow-Bold              TrueType          WinAnsi          yes yes yes    560  0
NVGLHQ+ArialNarrow                   TrueType          WinAnsi          yes yes yes    561  0
KWHHMM+ArialMT                       CID TrueType      Identity-H       yes yes yes    578  0

My code in Java:

final PDFMergerUtility pdfMerger = new PDFMergerUtility();
pdfMerger.setDestinationStream(outputStream);
pdfMerger.addSources(additionalPdfStreams);
pdfMerger.addSource(inputStreamPdDocument);
pdfMerger.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());

The problem is that an API from a third-party vendor has a problem with these fonts.
So: what am I doing wrong, and how can I remove the unused and duplicated fonts?

asked Nov 15 '18 at 9:27

Skary

5210

This question has an open bounty worth +350
reputation from danny117 ending in 3 days.

This question has not received enough attention.

Consolidate fonts, consolidate backgrounds, consolidate images, optimize for web viewing: the same things Acrobat Standard does when a user opens a PDF and then saves it as PDF.


Please also share the source PDF files to allow reproducing the issue. In particular I'm surprised that your test run seems to indicate that PDFBox renames embedded subsets. It is possible I missed that but I don't consider it probable.

– mkl
Nov 15 '18 at 10:29

PDFBox doesn't rename fonts. What PDFBox version do you use? Are you sure that the result file was font-analysed directly after the merge, and not after something else? Is it the correct file?

– Tilman Hausherr
Nov 15 '18 at 12:27

Hi, I cannot upload the PDF files; they are not for the public. @TilmanHausherr: Yes, the PDF was analyzed directly after PDFBox merged it. We are using 2.0.11.

– Skary
Nov 16 '18 at 7:45

Current version is 2.0.12. Can you reproduce the problem by using the command line merge utility? If yes, could you try to reproduce the problem with two non-confidential PDF files?

– Tilman Hausherr
Nov 16 '18 at 9:01

This is easily reproduced: just copy mypdf.pdf to "copy of mypdf.pdf", then merge the two together. The result carries double fonts, double images and double backgrounds.

– danny117
Feb 27 at 22:18

java pdfbox

asked Nov 15 '18 at 9:27

Skary

5210

asked Nov 15 '18 at 9:27

Skary

5210

asked Nov 15 '18 at 9:27

Skary

5210

asked Nov 15 '18 at 9:27

Skary

5210

asked Nov 15 '18 at 9:27

Skary

5210

This question has an open bounty worth +350
reputation from danny117 ending in 3 days.

This question has not received enough attention.

Consolidate fonts, Consolidate backgrounds, Consolidate images. Optimize for web viewing. Same things acobat standard does when user opens a pdf followed by save as pdf.

This question has an open bounty worth +350
reputation from danny117 ending in 3 days.

This question has not received enough attention.

Consolidate fonts, Consolidate backgrounds, Consolidate images. Optimize for web viewing. Same things acobat standard does when user opens a pdf followed by save as pdf.

2

Please also share the source PDF files to allow reproducing the issue. In particular I'm surprised that your test run seems to indicate that PDFBox renames embedded subsets. It is possible I missed that but I don't consider it probable.

– mkl
Nov 15 '18 at 10:29

PDFBox doesn't rename fonts. What PDFBox version do you use? Are you sure that the result file was font-analysed directly after the merge, and not after something else? Is it the correct file?

– Tilman Hausherr
Nov 15 '18 at 12:27

Hi, i cannot upload the PDF Files its not for the public. @TilmanHausherr : Yes, the PDF was analyzed directly after the PDFBox merged it we are using 2.0.11

– Skary
Nov 16 '18 at 7:45

Current version is 2.0.12. Can you reproduce the problem by using the command line merge utility? If yes, could you try to reproduce the problem with two non confidential PDF files?

– Tilman Hausherr
Nov 16 '18 at 9:01

This is easily duplicated just copy mypdf.pdf to copy of mypdf.pdf then merge them together. They carry double fonts double images double backgrounds.

– danny117
Feb 27 at 22:18

|
show 2 more comments

2

Please also share the source PDF files to allow reproducing the issue. In particular I'm surprised that your test run seems to indicate that PDFBox renames embedded subsets. It is possible I missed that but I don't consider it probable.

– mkl
Nov 15 '18 at 10:29

PDFBox doesn't rename fonts. What PDFBox version do you use? Are you sure that the result file was font-analysed directly after the merge, and not after something else? Is it the correct file?

– Tilman Hausherr
Nov 15 '18 at 12:27

Hi, i cannot upload the PDF Files its not for the public. @TilmanHausherr : Yes, the PDF was analyzed directly after the PDFBox merged it we are using 2.0.11

– Skary
Nov 16 '18 at 7:45

Current version is 2.0.12. Can you reproduce the problem by using the command line merge utility? If yes, could you try to reproduce the problem with two non confidential PDF files?

– Tilman Hausherr
Nov 16 '18 at 9:01

This is easily duplicated just copy mypdf.pdf to copy of mypdf.pdf then merge them together. They carry double fonts double images double backgrounds.

– danny117
Feb 27 at 22:18

Please also share the source PDF files to allow reproducing the issue. In particular I'm surprised that your test run seems to indicate that PDFBox renames embedded subsets. It is possible I missed that but I don't consider it probable.

– mkl
Nov 15 '18 at 10:29

PDFBox doesn't rename fonts. What PDFBox version do you use? Are you sure that the result file was font-analysed directly after the merge, and not after something else? Is it the correct file?

– Tilman Hausherr
Nov 15 '18 at 12:27

Hi, i cannot upload the PDF Files its not for the public. @TilmanHausherr : Yes, the PDF was analyzed directly after the PDFBox merged it we are using 2.0.11

– Skary
Nov 16 '18 at 7:45

Current version is 2.0.12. Can you reproduce the problem by using the command line merge utility? If yes, could you try to reproduce the problem with two non confidential PDF files?

– Tilman Hausherr
Nov 16 '18 at 9:01

This is easily duplicated just copy mypdf.pdf to copy of mypdf.pdf then merge them together. They carry double fonts double images double backgrounds.

– danny117
Feb 27 at 22:18

|
show 2 more comments

1 Answer

The "duplication" issue seems to come from multiple pages: each page has its own resources dictionary that references the fonts it uses. If you iterate over the pages and collect the font names, you will see duplicates in the output whenever a font is used on more than one page.

Something seems very wrong with the details in the question, though. Neither of the source files has a ZapfDingbats font, so how did it get into the merged document?
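One hypothesis worth checking (the question does not confirm it) is that such fonts come from interactive form fields rather than from page content: check boxes and radio buttons conventionally draw their marks with ZapfDingbats, and form fonts can sit in the AcroForm default resources (/DR) instead of in any page's resources. The helper below is a hypothetical sketch, assuming PDFBox 2.0.x; the class and method names are made up for illustration. It lists the /DR fonts by resource name and /BaseFont, reading the raw catalog dictionaries instead of going through PDAcroForm, so that nothing the library might add to the form on access ends up in the report.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;

// Hypothetical helper (not a PDFBox API): lists fonts in /AcroForm /DR /Font.
final class AcroFormFonts {

    static List<String> defaultResourceFontNames(PDDocument doc) {
        List<String> names = new ArrayList<>();
        // Raw catalog access: no PDAcroForm wrapper is created
        COSBase acroForm = doc.getDocumentCatalog().getCOSObject().getDictionaryObject(COSName.ACRO_FORM);
        if (!(acroForm instanceof COSDictionary)) {
            return names; // the document has no interactive form
        }
        COSBase dr = ((COSDictionary) acroForm).getDictionaryObject(COSName.DR);
        if (!(dr instanceof COSDictionary)) {
            return names; // the form has no default resources
        }
        COSBase fonts = ((COSDictionary) dr).getDictionaryObject(COSName.FONT);
        if (fonts instanceof COSDictionary) {
            COSDictionary fontMap = (COSDictionary) fonts;
            for (COSName key : fontMap.keySet()) {
                // getDictionaryObject dereferences indirect references
                COSBase font = fontMap.getDictionaryObject(key);
                if (font instanceof COSDictionary) {
                    // Resource name, then the font's /BaseFont, e.g. "ZaDb -> ZapfDingbats"
                    names.add(key.getName() + " -> "
                            + ((COSDictionary) font).getNameAsString(COSName.BASE_FONT));
                }
            }
        }
        return names;
    }
}
```

Running it on each source file and on the merged result would show whether the form resources are where the extra fonts live; it only reports, it does not remove anything.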

First, I wrote a couple of helper methods:

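import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;

static String mergePdfs(InputStream is1, InputStream is2) throws IOException {
    PDFMergerUtility pdfMerger = new PDFMergerUtility();
    pdfMerger.addSource(is1);
    pdfMerger.addSource(is2);

    // new File(dir, name) adds the path separator, which java.io.tmpdir may lack
    String destFile = new File(System.getProperty("java.io.tmpdir"), System.nanoTime() + ".pdf").getPath();
    pdfMerger.setDestinationFileName(destFile);
    pdfMerger.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());

    return destFile;
}

// Lists the fonts of each page's top-level resources, so a font shared
// by several pages is reported once per page
static List<String> getFontNames(PDDocument doc) throws IOException {
    List<String> result = new ArrayList<>();
    for (int i = 0; i < doc.getNumberOfPages(); i++) {
        PDPage page = doc.getPage(i);
        PDResources res = page.getResources();
        if (res == null) {
            continue; // page without a resources dictionary
        }
        for (COSName fontName : res.getFontNames()) {
            result.add(res.getFont(fontName).toString());
        }
    }
    return result;
}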

Then I created 3 test PDF documents. The first two, test-pdf-1.pdf and test-pdf-2.pdf, contain one page each and use the same two fonts: PDTrueTypeFont BAAAAA+ArialMT and PDTrueTypeFont CAAAAA+Roboto-Black. The third one, test-pdf-3.pdf, contains the 2 pages from the first two documents and was created with a text editor, not with PDFBox.

Then I added the following test code:

Class<?> clazz = Test.class;

String src1 = "/test-pdf-1.pdf";
String src2 = "/test-pdf-2.pdf";
String src3 = "/test-pdf-3.pdf";

// Merge the first two test files (the merge consumes these streams)
String merged;
try (InputStream is1 = clazz.getResourceAsStream(src1);
     InputStream is2 = clazz.getResourceAsStream(src2)) {
    merged = mergePdfs(is1, is2);
}

// Reload both sources, the editor-made file and the merged result
try (InputStream is1 = clazz.getResourceAsStream(src1);
     InputStream is2 = clazz.getResourceAsStream(src2);
     InputStream is3 = clazz.getResourceAsStream(src3);
     PDDocument doc1 = PDDocument.load(is1);
     PDDocument doc2 = PDDocument.load(is2);
     PDDocument doc3 = PDDocument.load(is3);
     PDDocument doc4 = PDDocument.load(new File(merged))) {

    System.out.println(src1 + " >\n\t" + getFontNames(doc1));
    System.out.println(src2 + " >\n\t" + getFontNames(doc2));
    System.out.println(src3 + " >\n\t" + getFontNames(doc3));
    System.out.println(merged + " >\n\t" + getFontNames(doc4));
}

The output is as follows (I truncated the last file name for readability and easier comparison):

/test-pdf-1.pdf >
    [PDTrueTypeFont BAAAAA+ArialMT, PDTrueTypeFont CAAAAA+Roboto-Black]
/test-pdf-2.pdf >
    [PDTrueTypeFont BAAAAA+ArialMT, PDTrueTypeFont CAAAAA+Roboto-Black]
/test-pdf-3.pdf >
    [PDTrueTypeFont BAAAAA+ArialMT, PDTrueTypeFont CAAAAA+Roboto-Black, PDTrueTypeFont BAAAAA+ArialMT, PDTrueTypeFont CAAAAA+Roboto-Black]
C:\Temp\..9.pdf >
    [PDTrueTypeFont BAAAAA+ArialMT, PDTrueTypeFont CAAAAA+Roboto-Black, PDTrueTypeFont BAAAAA+ArialMT, PDTrueTypeFont CAAAAA+Roboto-Black]

You can see that the file created by PDFBox's merge, "C:\temp\7193671804393899.pdf" (abbreviated in the output for readability), and "test-pdf-3.pdf", which was created with an editor, produce the same font output: each font is listed twice, once for each page.

Opening the merged file in Acrobat Reader confirms that only one copy of each font exists:

[Screenshot: C:\temp\7193671804393899.pdf Properties > Fonts]
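The per-page effect can also be checked programmatically. The following is a minimal sketch, assuming PDFBox 2.0.x on the classpath; FontListing, perPageListings and distinctFonts are hypothetical names. It collects the font dictionaries referenced from each page's top-level /Font resources, like getFontNames above, and then counts each underlying font object only once by object identity. For a loaded file this should reflect shared indirect font objects, since PDFBox resolves each indirect object to one in-memory instance; it does not collapse different subsets such as SRWYVL+ArialMT and SRXBUJ+ArialMT, which are genuinely separate font objects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;

// Hypothetical helper: compares "fonts listed per page" with "distinct font objects".
final class FontListing {

    // Font dictionaries in the /Font entry of one resources dictionary
    static List<COSDictionary> fontDicts(COSDictionary resources) {
        List<COSDictionary> result = new ArrayList<>();
        COSBase fonts = resources.getDictionaryObject(COSName.FONT);
        if (fonts instanceof COSDictionary) {
            COSDictionary fontMap = (COSDictionary) fonts;
            for (COSName key : fontMap.keySet()) {
                // getDictionaryObject dereferences indirect references
                COSBase font = fontMap.getDictionaryObject(key);
                if (font instanceof COSDictionary) {
                    result.add((COSDictionary) font);
                }
            }
        }
        return result;
    }

    // One entry per (page, font) pair: a font used on n pages appears n times
    static List<COSDictionary> perPageListings(PDDocument doc) {
        List<COSDictionary> result = new ArrayList<>();
        for (PDPage page : doc.getPages()) {
            PDResources res = page.getResources(); // null if the page has no resources
            if (res != null) {
                result.addAll(fontDicts(res.getCOSObject()));
            }
        }
        return result;
    }

    // Same entries, keeping each font dictionary object once (identity, not equals)
    static Set<COSDictionary> distinctFonts(PDDocument doc) {
        Set<COSDictionary> result = Collections.newSetFromMap(new IdentityHashMap<>());
        result.addAll(perPageListings(doc));
        return result;
    }
}
```

Comparing perPageListings(doc).size() with distinctFonts(doc).size() on a merged file shows how much of the apparent duplication is only an artifact of listing fonts page by page.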

edited Feb 28 at 2:37

answered Feb 28 at 1:59

isapir

6,86254662

Your code gets only the top-level resources. There could be more fonts in form XObjects, in field widgets, etc. By the way, if you used a set instead of a list for the result, you'd eliminate the duplicates.

– Tilman Hausherr
Feb 28 at 10:29

The intent here was to address the question specifically, so using a Set would have defeated the purpose. I created the test documents, so I know that there are no forms, etc. I kept the code as simple as possible for clarity and readability.

– isapir
Feb 28 at 15:00
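Tilman's point about nested resources can be sketched along the same lines. The hypothetical DeepFontCollector below (an illustration assuming PDFBox 2.0.x, not a PDFBox API) also walks form XObjects referenced from /XObject resources and the normal appearance streams (/AP /N) of annotations such as form field widgets, and gathers the font dictionaries into an identity-based set, which removes the duplicates as suggested. It only reads the document; it does not remove or merge fonts, and it ignores other places fonts can live, such as Type 3 font resources, patterns or the AcroForm default resources.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

import org.apache.pdfbox.cos.COSArray;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.cos.COSStream;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;

// Hypothetical helper: distinct font dictionaries reachable from pages,
// nested form XObjects and annotation appearance streams.
final class DeepFontCollector {
    private final Set<COSDictionary> fonts = Collections.newSetFromMap(new IdentityHashMap<>());
    // Resources already visited, so shared or self-referencing forms are walked once
    private final Set<COSDictionary> seenResources = Collections.newSetFromMap(new IdentityHashMap<>());

    static Set<COSDictionary> collect(PDDocument doc) {
        DeepFontCollector collector = new DeepFontCollector();
        for (PDPage page : doc.getPages()) {
            PDResources res = page.getResources(); // includes inherited /Resources
            if (res != null) {
                collector.visitResources(res.getCOSObject());
            }
            collector.visitAnnotations(page.getCOSObject().getDictionaryObject(COSName.ANNOTS));
        }
        return collector.fonts;
    }

    private void visitResources(COSDictionary resources) {
        if (!seenResources.add(resources)) {
            return;
        }
        COSBase fontMap = resources.getDictionaryObject(COSName.FONT);
        if (fontMap instanceof COSDictionary) {
            for (COSName key : ((COSDictionary) fontMap).keySet()) {
                COSBase font = ((COSDictionary) fontMap).getDictionaryObject(key);
                if (font instanceof COSDictionary) {
                    fonts.add((COSDictionary) font);
                }
            }
        }
        COSBase xobjects = resources.getDictionaryObject(COSName.XOBJECT);
        if (xobjects instanceof COSDictionary) {
            for (COSName key : ((COSDictionary) xobjects).keySet()) {
                COSBase xobject = ((COSDictionary) xobjects).getDictionaryObject(key);
                // Only form XObjects (not images) have resources that can hold fonts
                if (xobject instanceof COSStream
                        && COSName.FORM.equals(((COSStream) xobject).getDictionaryObject(COSName.SUBTYPE))) {
                    visitStreamResources((COSStream) xobject);
                }
            }
        }
    }

    private void visitStreamResources(COSStream stream) {
        COSBase res = stream.getDictionaryObject(COSName.RESOURCES);
        if (res instanceof COSDictionary) {
            visitResources((COSDictionary) res);
        }
    }

    private void visitAnnotations(COSBase annots) {
        if (!(annots instanceof COSArray)) {
            return;
        }
        COSArray array = (COSArray) annots;
        for (int i = 0; i < array.size(); i++) {
            COSBase annot = array.getObject(i);
            if (!(annot instanceof COSDictionary)) {
                continue;
            }
            COSBase ap = ((COSDictionary) annot).getDictionaryObject(COSName.AP);
            if (!(ap instanceof COSDictionary)) {
                continue;
            }
            // /N is one stream, or a map of states to streams (e.g. check boxes);
            // test COSStream first because it is a subclass of COSDictionary
            COSBase normal = ((COSDictionary) ap).getDictionaryObject(COSName.N);
            if (normal instanceof COSStream) {
                visitStreamResources((COSStream) normal);
            } else if (normal instanceof COSDictionary) {
                COSDictionary states = (COSDictionary) normal;
                for (COSName state : states.keySet()) {
                    COSBase appearance = states.getDictionaryObject(state);
                    if (appearance instanceof COSStream) {
                        visitStreamResources((COSStream) appearance);
                    }
                }
            }
        }
    }
}
```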

Something seems very wrong with the details in the question, though. Neither of the source files has a ZapfDingbats font, so how did it get into the merged document?
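One place worth checking, offered as an assumption rather than a diagnosis, is the AcroForm default resources (/DR): form fields such as check boxes commonly use ZapfDingbats, and /DR is a document-level dictionary rather than a page resource, so a page-by-page font listing won't show it. This sketch (PDFBox 2.0.x assumed; the class name DefaultResourceFonts is hypothetical) reads the raw COS dictionaries instead of going through PDAcroForm, to avoid any defaults the higher-level wrapper might fill in when it is created.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;

public class DefaultResourceFonts {

    // Returns "resourceName -> BaseFont" for each entry in /AcroForm /DR /Font,
    // reading the raw COS objects so nothing is added as a side effect
    public static List<String> list(PDDocument doc) {
        List<String> result = new ArrayList<>();
        COSDictionary catalog = doc.getDocumentCatalog().getCOSObject();
        COSDictionary fonts =
                subDict(subDict(subDict(catalog, COSName.ACRO_FORM), COSName.DR), COSName.FONT);
        if (fonts == null) {
            return result; // no form, no /DR, or no fonts in /DR
        }
        for (COSName key : fonts.keySet()) {
            COSDictionary font = subDict(fonts, key);
            String baseFont = font == null ? null : font.getNameAsString(COSName.BASE_FONT);
            result.add(key.getName() + " -> " + baseFont);
        }
        return result;
    }

    // Resolves dict[key] and returns it if it is a dictionary, otherwise null
    private static COSDictionary subDict(COSDictionary dict, COSName key) {
        if (dict == null) {
            return null;
        }
        COSBase value = dict.getDictionaryObject(key);
        return value instanceof COSDictionary ? (COSDictionary) value : null;
    }
}
```

Running DefaultResourceFonts.list on the merged document and on each source would show whether a ZapfDingbats entry is present in any of them.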

First, I wrote a couple of helper methods:

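import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.font.PDFont;

// Merges two PDFs into a new file in the system temp directory and returns its path
static String mergePdfs(InputStream is1, InputStream is2) throws IOException {
    PDFMergerUtility pdfMerger = new PDFMergerUtility();
    pdfMerger.addSource(is1);
    pdfMerger.addSource(is2);

    // new File(dir, name) adds the path separator, which java.io.tmpdir may lack
    String destFile = new File(System.getProperty("java.io.tmpdir"),
            System.nanoTime() + ".pdf").getPath();
    pdfMerger.setDestinationFileName(destFile);
    pdfMerger.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());

    return destFile;
}

// Lists the fonts in each page's top-level resources, one entry per page
// (a List on purpose, so fonts shared by several pages show up repeatedly)
static List<String> getFontNames(PDDocument doc) throws IOException {
    List<String> result = new ArrayList<>();
    for (PDPage page : doc.getPages()) {
        PDResources res = page.getResources();
        if (res == null) {
            continue; // page without a resource dictionary
        }
        for (COSName fontName : res.getFontNames()) {
            PDFont font = res.getFont(fontName);
            if (font != null) {
                result.add(font.toString());
            }
        }
    }
    return result;
}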

And then added the following test code:

Class<?> clazz = Test.class;

String src1 = "/test-pdf-1.pdf";
String src2 = "/test-pdf-2.pdf";
String src3 = "/test-pdf-3.pdf"; // created with an editor, for comparison

// Merge the first two test files
String merged;
try (InputStream is1 = clazz.getResourceAsStream(src1);
     InputStream is2 = clazz.getResourceAsStream(src2)) {
    merged = mergePdfs(is1, is2);
}

// Reload both sources, the editor-made file and the merged result, then print their fonts
try (PDDocument doc1 = PDDocument.load(clazz.getResourceAsStream(src1));
     PDDocument doc2 = PDDocument.load(clazz.getResourceAsStream(src2));
     PDDocument doc3 = PDDocument.load(clazz.getResourceAsStream(src3));
     PDDocument doc4 = PDDocument.load(new File(merged))) {
    System.out.println(src1 + " >\n\t" + getFontNames(doc1));
    System.out.println(src2 + " >\n\t" + getFontNames(doc2));
    System.out.println(src3 + " >\n\t" + getFontNames(doc3));
    System.out.println(merged + " >\n\t" + getFontNames(doc4));
}

The output is as follows (I truncated the last file name for readability and easier comparison):

/test-pdf-1.pdf >
    [PDTrueTypeFont BAAAAA+ArialMT, PDTrueTypeFont CAAAAA+Roboto-Black]
/test-pdf-2.pdf >
    [PDTrueTypeFont BAAAAA+ArialMT, PDTrueTypeFont CAAAAA+Roboto-Black]
/test-pdf-3.pdf >
    [PDTrueTypeFont BAAAAA+ArialMT, PDTrueTypeFont CAAAAA+Roboto-Black, PDTrueTypeFont BAAAAA+ArialMT, PDTrueTypeFont CAAAAA+Roboto-Black]
C:\Temp\..9.pdf >
    [PDTrueTypeFont BAAAAA+ArialMT, PDTrueTypeFont CAAAAA+Roboto-Black, PDTrueTypeFont BAAAAA+ArialMT, PDTrueTypeFont CAAAAA+Roboto-Black]

Opening the merged file in Acrobat Reader confirms that only one copy of the fonts exists:

[Screenshot: Acrobat Reader, C:\temp\7193671804393899.pdf Properties > Fonts]
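To check this from code rather than in Acrobat, one option (my own sketch, assuming PDFBox 2.0.x; FontObjects is a hypothetical name) is to count distinct font dictionaries instead of font names. Pages that point at the same font object collapse into one entry, while fonts copied separately from each source file show up as separate entries even when their names match. It relies on PDFBox resolving each indirect object to a single Java instance, which I believe holds for a document loaded from a file, but treat that as an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;

public class FontObjects {

    // Returns the /BaseFont of every distinct font dictionary referenced from
    // page-level /Font resources; identical names from different objects appear twice
    public static List<String> distinctFontDictionaries(PDDocument doc) {
        Set<COSDictionary> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        List<String> baseFonts = new ArrayList<>();
        for (PDPage page : doc.getPages()) {
            PDResources res = page.getResources();
            if (res == null) {
                continue;
            }
            COSBase fontsBase = res.getCOSObject().getDictionaryObject(COSName.FONT);
            if (!(fontsBase instanceof COSDictionary)) {
                continue; // page has no /Font resources
            }
            COSDictionary fonts = (COSDictionary) fontsBase;
            for (COSName key : fonts.keySet()) {
                COSBase font = fonts.getDictionaryObject(key); // resolves indirect references
                // Identity comparison: only the very same dictionary object counts as a repeat
                if (font instanceof COSDictionary && seen.add((COSDictionary) font)) {
                    baseFonts.add(((COSDictionary) font).getNameAsString(COSName.BASE_FONT));
                }
            }
        }
        return baseFonts;
    }
}
```

If the merged file lists each of ArialMT and Roboto-Black once here, the pages really do share the font objects; if each name appears twice, the merge kept two copies.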
