Extract a page out of a PdfDocument

Hi All,

I have an AcroForm PdfDocument that contains several pages. In certain situations I only need one particular page, so I tried taking its contents with

byte[] b = pdf.Pages[x].GetContents();

which returns a byte array, and using that to create a new PdfDocument:

PdfDocument p = new PdfDocument(b);

but the constructor throws "Invalid PDF File. Cross-reference table not found."

My intention is to use p like this:

MergeDocument m = new MergeDocument();
m.Append(p, new MergeOptions(true, prefix));

and then flatten the form by calling CreateLabel on the fields of p.

I am forced to split the original AcroForm document because fields on several pages were given the same name. As a result, GetOriginalPageNumber returns -1 for any such field, and CreateLabel fails.

Any help will be appreciated.

Thanks.

Posted by a ceTe Software moderator

Hello,

The pdf.Pages[x].GetContents() method returns the byte data of a single page, not a complete PDF document, so that data cannot be used to create a PdfDocument object. Instead, use the original PdfDocument object directly when merging: the Append method accepts the start page number and the number of pages to merge, along with the MergeOptions. That way you can still reorganize (prefix) the form field names without any problem.

MergeDocument document = new MergeDocument();
document.Append(pdfDocument, startPageNumber, numberOfPagesToMerge, new MergeOptions(true, prefix));

Thanks,
ceTe Software Support Team.

Thanks for the reply.

The problem with the method you suggested is that I am flattening the form, so I still have to call CreateLabel on the fields of the original PdfDocument, and CreateLabel needs the page on which the label is placed.

That is the call that fails.

This is why I want to split the original AcroForm PdfDocument (say it has 3 pages), so that I can prefix the duplicated field names on each page and then merge the pages back into a new, corrected AcroForm PdfDocument.

Note that retrieving the list of form fields in this situation returns a single entry for each duplicated field, even though the field appears on three different pages. So my question is: is it possible to break up the three pages of the AcroForm PdfDocument so that each page becomes a full-fledged PdfDocument? If so, please explain how.

The easiest fix would be to correct the AcroForm PDF itself so that every field name is unique, but at the moment I don't have that luxury.

Thanks once again.

Posted by a ceTe Software moderator

Hello,

Are you trying to flatten the form after merging the pages with reorganized (prefixed) field names? If so, you can do this by creating a new PdfDocument object from the final PDF produced by the merge. Below is sample code for this.

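   using ceTe.DynamicPDF;          // Font
   using ceTe.DynamicPDF.Merger;   // PdfDocument, MergeDocument, MergeOptions, PdfFormField

   // Load the original AcroForm PDF.
   PdfDocument pdf = new PdfDocument(@"C:\Temp\document.pdf");
   MergeDocument doc = new MergeDocument();
   for (int i = 0; i < 5; i++)
   {
      // Append page 1 (one page) with the prefix "0".."4", so each copy's
      // fields get a unique full name such as "0.f1-1".
      doc.Append(pdf, 1, 1, new MergeOptions(true, i.ToString()));
   }

   // Render the merged PDF and reload it so the renamed fields can be looked up.
   PdfDocument pdfNew = new PdfDocument(doc.Draw());
   MergeDocument docNew = new MergeDocument(pdfNew, MergeOptions.None);

   // GetOriginalPageNumber() is 1-based, while docNew.Pages is 0-based.
   PdfFormField f = pdfNew.Form.Fields["0.f1-1"];
   f.CreateLabel(docNew.Pages[f.GetOriginalPageNumber() - 1], "New flattened text", Font.CourierBold, 12);

   // Send the flattened PDF to the browser (ASP.NET).
   docNew.DrawToWeb();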

Thanks,
ceTe Software Support Team.

Aah, yes. I think I found what I want.

Thanks a million.

Anyone looking for a tool to extract pages from a PDF file could also try one I found on the internet, the Pcinfotools PDF Split and Merge tool. It splits large multi-page PDF files into separate files and can extract individual pages from them. It also keeps the data safe during splitting and supports all versions of Microsoft Windows.

DynamicPDF CoreSuite for .NET (v5) Forum