Composition

Page-level PDF assembly. The JetsonPDF.Composition package does two things and does them losslessly: PageExtractor pulls a chosen subset of pages out of a PDF into a new file, and Merger concatenates whole PDFs into one. Both copy a page's content, resources, fonts, images, and annotations in their original encoded form — nothing is re-rendered or re-encoded.

Two static entry points.

Extract — PageExtractor.Extract(source, 1, 3, 5). Copy chosen 1-based pages into a new PDF. Order is preserved, so the same call reorders and duplicates pages too.
Merge — Merger.Merge(a, b, c). Concatenate documents in order, carrying over and de-colliding their outlines, named destinations, and AcroForm fields.
Lossless — both are a COS object-graph copy (ISO 32000-2 §7.3), not a render pass. Output quality is identical to the input; there is no generational loss from repeated extract/merge cycles.

Package: dotnet add package JetsonPDF.Composition · targets net8.0 / netstandard2.0 / net462 · depends on Common, Reader.

Overview

Composition sits on top of the Reader — it does not pull in the Writer. A merge or extract is an object-graph copy rather than a render pass:

Parse & resolve — the source is read by the Reader's file parser, which resolves the cross-reference table/stream and decrypts the file when a password is supplied.
Deep-copy each page — the page dictionary and everything reachable from it (content streams, /Resources, fonts, images, annotations) are copied into a fresh object table with all indirect references remapped. A dedup map keyed by (source, object-number) handles cycles and shared resources, so a font used by ten pages is copied once.
Materialize inherited attributes — /Resources, /MediaBox, /CropBox, and /Rotate are flattened onto each copied page (§7.7.3.4) before it is reparented under the new page tree; /Parent is dropped.
Write a fresh file — a new catalog, page tree, classic cross-reference table (§7.5.4), and trailer with a new /ID. The output PDF version is the maximum of the source versions.
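The deep-copy step can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not JetsonPDF's implementation: `Cos`, `CosName`, `CosRef`, `CosArray`, `CosDict`, `SourceDoc`, and `GraphCopier` are hypothetical stand-ins for a parsed object table, streams and numeric scalars are left out, and destinations that point at pages outside the selection are not handled. It shows the (source, object-number) map doing two jobs at once: a font shared by two pages is copied once, and an annotation's /P back-reference to its page resolves to the copy already registered instead of recursing forever. Skipping the page's own /Parent is what keeps the rest of the source page tree out of the output.

```csharp
using System.Collections.Generic;

// Hypothetical minimal COS model, for illustration only (not JetsonPDF's own types).
// Streams and numbers are omitted; any other Cos subclass is treated as an immutable scalar.
public abstract class Cos { }
public sealed class CosName : Cos { public string Value { get; } public CosName(string v) => Value = v; }
public sealed class CosRef : Cos { public int Num { get; } public CosRef(int n) => Num = n; }   // "Num 0 R"
public sealed class CosArray : Cos { public List<Cos> Items { get; } = new(); }
public sealed class CosDict : Cos { public Dictionary<string, Cos> Map { get; } = new(); }

// One parsed source document: object number -> object.
public sealed class SourceDoc
{
    public int Id { get; init; }                            // distinguishes sources in the dedup key
    public Dictionary<int, Cos> Objects { get; } = new();
}

public sealed class GraphCopier
{
    public Dictionary<int, Cos> Output { get; } = new();    // the fresh object table
    private readonly Dictionary<(int Src, int Num), int> _remap = new();
    private int _next = 1;                                  // next free output object number

    // Copy indirect object `num` of `src` plus everything reachable from it; return its new number.
    public int CopyIndirect(SourceDoc src, int num)
    {
        if (_remap.TryGetValue((src.Id, num), out int done))
            return done;                                    // shared resource or cycle: reuse the copy
        int newNum = _next++;
        _remap[(src.Id, num)] = newNum;                     // register BEFORE recursing so cycles terminate
        Output[newNum] = CopyDirect(src, src.Objects[num]);
        return newNum;
    }

    private Cos CopyDirect(SourceDoc src, Cos obj) => obj switch
    {
        CosRef r   => new CosRef(CopyIndirect(src, r.Num)), // remap the indirect reference
        CosArray a => CopyArray(src, a),
        CosDict d  => CopyDict(src, d),
        _          => obj,                                  // immutable scalars can be shared
    };

    private CosArray CopyArray(SourceDoc src, CosArray a)
    {
        var copy = new CosArray();
        foreach (var item in a.Items) copy.Items.Add(CopyDirect(src, item));
        return copy;
    }

    private CosDict CopyDict(SourceDoc src, CosDict d)
    {
        bool isPage = d.Map.TryGetValue("Type", out var t) && t is CosName { Value: "Page" };
        var copy = new CosDict();
        foreach (var kv in d.Map)
        {
            // A page's /Parent is not followed: it would drag in the whole source page tree.
            if (isPage && kv.Key == "Parent") continue;
            copy.Map[kv.Key] = CopyDirect(src, kv.Value);
        }
        return copy;
    }
}
```

Registering the new number before recursing is the design choice that makes cycles safe: an object's copy can refer to its own slot before that slot is filled.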

Targets — net8.0 / netstandard2.0 / net462. No native dependencies.
Stateless — both types are static and thread-safe; each call builds its own assembler over the input bytes.
In memory — input streams are read fully into memory before processing.
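The byte[]-first design and the in-memory rule fit together simply: a stream overload only has to drain its input, call the core, and write the result. A minimal sketch, assuming a hypothetical `StreamWrapping.Run` helper that receives the core as a delegate; it is not JetsonPDF's internal code, but it follows the ownership rule documented for the stream overloads (the output stream is written and flushed, never closed).

```csharp
using System;
using System.IO;

// Hypothetical sketch: a Stream overload layered over a byte[] core (not JetsonPDF's code).
public static class StreamWrapping
{
    // Drain the input from its current position into memory; the stream itself is left open.
    public static byte[] ReadFully(Stream input)
    {
        using var buffer = new MemoryStream();   // disposing the buffer does not touch `input`
        input.CopyTo(buffer);
        return buffer.ToArray();
    }

    // Run a byte[] -> byte[] core (for example an extract) between two caller-owned streams.
    public static void Run(Func<byte[], byte[]> core, Stream input, Stream output)
    {
        if (core is null) throw new ArgumentNullException(nameof(core));
        if (input is null) throw new ArgumentNullException(nameof(input));
        if (output is null) throw new ArgumentNullException(nameof(output));

        byte[] result = core(ReadFully(input));
        output.Write(result, 0, result.Length);
        output.Flush();                          // flush, but do not close: the caller owns `output`
    }
}
```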

Quick start

Add the package to any .NET project:

dotnet add package JetsonPDF.Composition

Then extract or merge — the in-memory byte[] overloads are the core; file and stream overloads wrap them.

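using System.IO;
using JetsonPDF.Composition;

// Load the inputs (file names here are placeholders)
byte[] reportBytes   = File.ReadAllBytes("report.pdf");
byte[] coverBytes    = File.ReadAllBytes("cover.pdf");
byte[] bodyBytes     = File.ReadAllBytes("body.pdf");
byte[] appendixBytes = File.ReadAllBytes("appendix.pdf");

// Pull pages 1, 3 and 5 out of a report into a new PDF
byte[] excerpt  = PageExtractor.Extract(reportBytes, 1, 3, 5);

// Concatenate three PDFs into one
byte[] combined = Merger.Merge(coverBytes, bodyBytes, appendixBytes);

File.WriteAllBytes("excerpt.pdf",  excerpt);
File.WriteAllBytes("combined.pdf", combined);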

Extract pages

Extracts a subset of pages from an existing PDF into a brand-new PDF. Page numbers are 1-based, and the output keeps them in the exact order you list — so the same call also reorders and duplicates pages.

using JetsonPDF.Composition;

// Single page
byte[] cover    = PageExtractor.Extract(sourceBytes, 1);

// Several pages, in the order given
byte[] picked   = PageExtractor.Extract(sourceBytes, 3, 1, 5);

// Reorder + duplicate: page 2, then page 1 twice
byte[] shuffled = PageExtractor.Extract(sourceBytes, 2, 1, 1);

// Inclusive 1-based range (pages 5..12)
byte[] chapter  = PageExtractor.ExtractRange(sourceBytes, 5, 12);

// Encrypted source — decrypt with the password, output is not encrypted
byte[] unlocked = PageExtractor.Extract(sourceBytes, password: "secret", 1, 2);

File-to-file and stream-to-stream overloads avoid the manual read/write. Streams are never closed by the call, so their lifetimes stay yours.

// File to file
PageExtractor.Extract("report.pdf", "summary.pdf", 1, 2, 10);

// Stream to stream
using var input  = File.OpenRead("report.pdf");
using var output = File.Create("summary.pdf");
PageExtractor.Extract(input, output, 1, 2, 10);

Member	Returns	Notes
`Extract(byte[] source, params int[] pageNumbers)`	`byte[]`	1-based, order preserved.
`Extract(byte[] source, string password, params int[] pageNumbers)`	`byte[]`	Decrypts an encrypted source.
`Extract(string inputPath, string outputPath, params int[] pageNumbers)`	`void`	Reads and writes files (password overload too).
`Extract(Stream input, Stream output, params int[] pageNumbers)`	`void`	Neither stream is closed.
`ExtractRange(byte[] source, int firstPage, int lastPage)`	`byte[]`	Inclusive 1-based range.
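The selection rules (1-based numbers, caller's order kept, duplicates allowed, ExtractRange as an inclusive range) reduce to a small index mapping. A minimal sketch with a hypothetical `PageSelection` helper, not the library's code; rejecting a number past the last page with ArgumentOutOfRangeException is an assumption, since the Errors table only lists the `< 1` case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper illustrating the selection rules; not JetsonPDF's code.
public static class PageSelection
{
    // Pick pages in the caller's order; a number listed twice yields that page twice.
    public static List<T> Pick<T>(IReadOnlyList<T> pages, IEnumerable<int> pageNumbers)
    {
        var picked = new List<T>();
        foreach (int n in pageNumbers)
        {
            // Assumption: numbers past the last page are rejected like numbers below 1.
            if (n < 1 || n > pages.Count)
                throw new ArgumentOutOfRangeException(nameof(pageNumbers), n, $"Page {n} is outside 1..{pages.Count}.");
            picked.Add(pages[n - 1]);   // 1-based page number -> 0-based index
        }
        return picked;
    }

    // ExtractRange(first, last) behaves like listing first, first + 1, ..., last explicitly.
    public static IEnumerable<int> Range(int firstPage, int lastPage) =>
        Enumerable.Range(firstPage, lastPage - firstPage + 1);
}
```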

Merge documents

Concatenates multiple PDFs into one, in the order supplied. Every page of every source is copied losslessly into a fresh page tree and catalog.

using JetsonPDF.Composition;

// params overload
byte[] combined = Merger.Merge(firstBytes, secondBytes, thirdBytes);

// IEnumerable overload — merge a whole folder in name order
byte[] all = Merger.Merge(
    Directory.EnumerateFiles("chapters", "*.pdf")
             .OrderBy(p => p)
             .Select(File.ReadAllBytes));

// File to file
Merger.Merge(new[] { "a.pdf", "b.pdf", "c.pdf" }, "combined.pdf");

// Stream to stream: the output stream is written but not closed, so dispose every stream yourself
using var first  = File.OpenRead("a.pdf");
using var second = File.OpenRead("b.pdf");
using var output = File.Create("combined.pdf");
Merger.Merge(new[] { first, second }, output);

Encrypted sources must be decrypted first. Merge has no password parameter and throws InvalidOperationException on an encrypted file it can't read. Extract every page of each encrypted source with its password (which yields a decrypted byte[]), then merge the results.

Member	Returns	Notes
`Merge(params byte[][] sources)`	`byte[]`	Concatenate in argument order.
`Merge(IEnumerable<byte[]> sources)`	`byte[]`	Concatenate a sequence.
`Merge(IEnumerable<string> inputPaths, string outputPath)`	`void`	Read files, write the result.
`Merge(IEnumerable<Stream> inputs, Stream output)`	`void`	Output stream is not closed.

Navigation & interactive features

This is where Composition does more than byte-splicing. The document-level features that reference pages are merged across all sources, and cross-document name collisions are disambiguated so nothing silently shares state.

Outlines / bookmarks — each source's outline tree is appended under one merged /Outlines root. Destinations are remapped to the new page objects; a bookmark whose target page was dropped (and which has no surviving children) is pruned. Prev/Next/First/Last/Count linkage is rebuilt, preserving open/closed state.
Named destinations — the modern /Names /Dests name tree and the legacy /Dests dictionary are merged into one name tree. Destinations targeting dropped pages are removed; name collisions across documents are suffixed (intro, intro_2, …).
AcroForm fields — a combined /AcroForm with a unified /Fields list, a merged default-resource (/DR) dictionary, OR-combined /NeedAppearances and /SigFlags, and a concatenated calculation order (/CO). Top-level field-name collisions are suffixed (signature → signature + signature_2) so two forms that reuse a field name stay independent rather than sharing a value.
Default-resource fonts — identical standard fonts from different documents are shared under one resource name; a genuinely different font that lands under an already-used name is added under a fresh name and the referring /DA appearance strings are rewritten to match.
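The outline rules can be sketched as a recursive prune followed by a /Count rebuild. `OutlineItem` and `OutlineMerge` are hypothetical types, not JetsonPDF's API. The prune drops an item only when its target page is gone and nothing beneath it survives; keeping such a parent with no destination is an assumption about how the library writes it. The /Count rule follows ISO 32000: an open item stores its number of visible descendants, a closed item stores the negative of the number that would become visible if it were opened.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical outline model, for illustration only (not JetsonPDF's types).
public sealed class OutlineItem
{
    public string Title { get; init; } = "";
    public int? TargetPage { get; init; }   // page index the bookmark points at; null = no page target
    public bool IsOpen { get; init; }
    public List<OutlineItem> Children { get; } = new();
    public int Count { get; set; }          // value for /Count; 0 means "omit"
}

public static class OutlineMerge
{
    // oldToNew maps each kept source page index to its output page index.
    // An item is dropped only if its target page is gone AND none of its children survive.
    public static List<OutlineItem> Prune(IEnumerable<OutlineItem> items, IReadOnlyDictionary<int, int> oldToNew)
    {
        var kept = new List<OutlineItem>();
        foreach (var item in items)
        {
            List<OutlineItem> survivors = Prune(item.Children, oldToNew);
            int? newTarget = item.TargetPage is int p && oldToNew.TryGetValue(p, out int q) ? q : null;
            bool targetDropped = item.TargetPage.HasValue && newTarget is null;
            if (targetDropped && survivors.Count == 0) continue;

            // Assumption: a kept item whose own target was dropped becomes a plain container (no target).
            var copy = new OutlineItem { Title = item.Title, IsOpen = item.IsOpen, TargetPage = newTarget };
            copy.Children.AddRange(survivors);
            kept.Add(copy);
        }
        return kept;
    }

    // /Count: open items store the number of visible descendants at all levels;
    // closed items store the negative of the number that would be visible if opened.
    public static void RebuildCounts(OutlineItem item)
    {
        foreach (var child in item.Children) RebuildCounts(child);
        int visible = VisibleIfOpen(item);
        item.Count = item.IsOpen ? visible : -visible;
    }

    private static int VisibleIfOpen(OutlineItem item) =>
        item.Children.Sum(c => 1 + (c.IsOpen ? VisibleIfOpen(c) : 0));
}
```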

Collision suffixing is consistent across features: a bookmark that points at a renamed named destination follows the rename, and a widget on a renamed field carries the new name too.

// Two PDFs that both define a "signature" field merge into
// "signature" + "signature_2" — each keeps its own value.
byte[] combined = Merger.Merge(formA, formB);
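Consistent suffixing falls out of claiming names through a per-source rename map: a bookmark that targets `intro` in the second document, or a widget on its `signature` field, is rewritten through the same map that renamed the target. A minimal sketch with a hypothetical `NameDeduper`, one instance per namespace (destination names, top-level field names); the `_2`, `_3` pattern matches the examples above, but the exact rule for choosing the next free suffix is an assumption.

```csharp
using System.Collections.Generic;

// Hypothetical sketch of cross-document name de-collision (not JetsonPDF's code).
// Use one instance per namespace: destination names and field names never collide with each other.
public sealed class NameDeduper
{
    private readonly HashSet<string> _used = new();

    // Returns `name` if it is free, otherwise the first free name_2, name_3, ...
    public string Claim(string name)
    {
        if (_used.Add(name)) return name;
        for (int i = 2; ; i++)
        {
            string candidate = $"{name}_{i}";
            if (_used.Add(candidate)) return candidate;
        }
    }

    // Claim every name of one source document and record old -> new, so every reference
    // from that source (bookmark destinations, widget field names) goes through the same map.
    public Dictionary<string, string> ClaimAll(IEnumerable<string> namesInSource)
    {
        var renames = new Dictionary<string, string>();
        foreach (string n in namesInSource)
            if (!renames.ContainsKey(n)) renames[n] = Claim(n);
        return renames;
    }
}
```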

The runnable PdfCompositionDemo sample builds a report (with an outline and named destinations) and a form (with AcroForm fields), then prints the page counts, outline titles, destination keys, and field names of every output so you can confirm exactly what carried over.

Errors

Both operations validate their arguments eagerly.

Condition	Exception
`source` / `sources` / `inputPaths` is `null`	`ArgumentNullException`
No page numbers passed to `Extract`	`ArgumentException`
A page number `< 1` (page numbers are 1-based)	`ArgumentOutOfRangeException`
`ExtractRange` with `firstPage < 1` or `lastPage < firstPage`	`ArgumentException`
Empty `sources` passed to `Merge`	`ArgumentException`
Source is encrypted and the password didn't unlock it (or none was supplied)	`InvalidOperationException`
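Because validation is eager, every row except the last can be checked before a single byte is parsed. A minimal sketch of those checks with a hypothetical `CompositionArgs` helper, not the library's code; exception types follow the table above, while treating an explicit `null` page array as ArgumentNullException is an assumption the table does not cover. The encryption row is omitted because it can only be detected once the file is parsed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical argument checks mirroring the Errors table (not JetsonPDF's code).
public static class CompositionArgs
{
    public static void ValidateExtract(byte[] source, params int[] pageNumbers)
    {
        if (source is null) throw new ArgumentNullException(nameof(source));
        // Assumption: an explicit null array is not listed in the table.
        if (pageNumbers is null) throw new ArgumentNullException(nameof(pageNumbers));
        if (pageNumbers.Length == 0)
            throw new ArgumentException("At least one page number is required.", nameof(pageNumbers));
        foreach (int n in pageNumbers)
            if (n < 1) throw new ArgumentOutOfRangeException(nameof(pageNumbers), n, "Page numbers are 1-based.");
    }

    public static void ValidateRange(byte[] source, int firstPage, int lastPage)
    {
        if (source is null) throw new ArgumentNullException(nameof(source));
        if (firstPage < 1 || lastPage < firstPage)
            throw new ArgumentException($"Invalid page range {firstPage}..{lastPage}.");
    }

    public static IReadOnlyList<byte[]> ValidateMerge(IEnumerable<byte[]> sources)
    {
        if (sources is null) throw new ArgumentNullException(nameof(sources));
        var list = sources.ToList();   // materialize once so a lazy sequence is not enumerated twice
        if (list.Count == 0)
            throw new ArgumentException("At least one source is required.", nameof(sources));
        return list;
    }
}
```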

Scope & limitations

A fresh catalog and page tree are always emitted. What is and isn't carried over:

Carried over	Not carried over
Page content, resources, fonts, images	Document structure tree (tagged-PDF `/StructTreeRoot`)
Per-page annotations (links, widgets, markup)	Catalog-level viewer preferences
Outlines / bookmarks (remapped + pruned)	Page labels
Named destinations (modern tree + legacy dict)	Article threads, OCG layer config
AcroForm fields (`/Fields`, `/DR`, `/CO`, flags)

The document information dictionary (/Info) is preserved — from the source on extract, and from the first document on merge.

Need to build the pages you're assembling? Author them with the Writer, Fluent, or Flow APIs, then compose the results here. To edit the fields of an existing form rather than merge whole documents, see JetsonPDF.Forms.
