PdfReader – PDF Structure Inspector

A lightweight C# console tool for inspecting PDF structure, detecting whether a PDF contains text or images, and comparing blank vs. filled PDFs. It is especially useful when debugging auto-generated PDFs whose content may be invisible or missing.

🚀 Features

Automatically scans all PDF files in the Files/ directory
Prints, for each PDF:
    Total pages
    Text length
    Word and letter counts
    Number of images
    Page text content
Helps you detect PDFs that are effectively “blank”
No command-line arguments needed: just run dotnet run
Uses PdfPig (free and open source)

📁 Project Structure

PdfReader/
│
├── Files/                   # Put your PDF files here
│   ├── sample1.pdf
│   └── sample2.pdf
│
├── Program.cs               # Entry point – scans Files/ folder
├── PdfAnalyzer.cs           # Extracts PDF structure information
├── PdfReport.cs             # Formats and prints analysis results
├── PdfReader.csproj
└── README.md

🛠 Requirements

.NET 8.0 or later
Windows, Linux, or macOS
NuGet package: UglyToad.PdfPig
Install PdfPig:
```
dotnet add package UglyToad.PdfPig
```

📦 Installation

Restore dependencies:

dotnet restore

Build the project:

dotnet build

📂 Adding PDF Files

Inside the project directory, create a folder named:

Files

Add any .pdf files you want to analyze:

PdfReader/Files/
    blank.pdf
    document1.pdf
    invoice.pdf

▶️ Running the Project

Run:

dotnet run

The app will:

Automatically detect all PDFs inside Files/

Process each PDF one by one

Print structured reports to the console

Example output:

📁 Found 3 PDF(s) in: .../PdfReader/bin/Debug/net8.0/Files

======================================================
📄 Processing: blank.pdf
======================================================
------------- PAGE 1 -------------
Text Length:   0
Letters Count: 0
Words Count:   0
Images:        0

Text Content:
[NO TEXT]

======================================================
📄 Processing: filled.pdf
======================================================
------------- PAGE 1 -------------
Text Length:   120
Letters Count: 145
Words Count:   18
Images:        1

Text Content:
Patient: John Doe...

⚙️ Ensuring Files/ Folder Is Copied to Output

The Files folder must be included in your build output so the app can find the PDFs when running from bin/.

Your PdfReader.csproj must contain:

<ItemGroup>
  <Content Include="Files\**\*">
    <CopyToOutputDirectory>Always</CopyToOutputDirectory>
  </Content>
</ItemGroup>

This ensures that files are available at runtime under:

bin/Debug/net8.0/Files/
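
Because the files are copied next to the compiled binary, the app can look them up relative to its own base directory instead of the current working directory. The following is a minimal sketch of that lookup, not the project's actual code: the class name FilesLocator and both method names are assumptions made for illustration.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the original project).
public static class FilesLocator
{
    // Builds the path of the Files/ folder under the given base directory,
    // for example bin/Debug/net8.0/Files when passed AppContext.BaseDirectory.
    public static string GetFilesDirectory(string baseDirectory)
    {
        return Path.Combine(baseDirectory, "Files");
    }

    // Same as above, but fails with a descriptive message when the folder
    // was not copied to the build output.
    public static string RequireFilesDirectory(string baseDirectory)
    {
        string dir = GetFilesDirectory(baseDirectory);
        if (!Directory.Exists(dir))
        {
            throw new DirectoryNotFoundException(
                $"Files folder not found at '{dir}'. " +
                "Check the Content item in PdfReader.csproj.");
        }
        return dir;
    }
}
```

A caller would typically pass AppContext.BaseDirectory, which is the folder containing the app's assemblies: FilesLocator.RequireFilesDirectory(AppContext.BaseDirectory).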

📐 Code Overview

PdfAnalyzer.cs

Responsible for analyzing each PDF and building a PdfAnalysisResult that records, for each page:

Page text

Word count

Letter count

Images count

Example shape of the per-page data (simplified):

public class PdfPageInfo
{
    public int PageNumber { get; set; }
    public string Text { get; set; } = "";
    public int ImagesCount { get; set; }
    public int WordsCount { get; set; }
    public int LettersCount { get; set; }
}
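
To illustrate how these fields could be filled with PdfPig, here is a hedged sketch. The class PdfAnalyzerSketch and its AnalyzePages method are hypothetical names, and the real PdfAnalyzer.cs may compute the counts differently. PdfPageInfo is repeated so the snippet compiles on its own.

```csharp
using System.Collections.Generic;
using System.Linq;
using UglyToad.PdfPig;

// Same simplified shape as above, repeated so the sketch is self-contained.
public class PdfPageInfo
{
    public int PageNumber { get; set; }
    public string Text { get; set; } = "";
    public int ImagesCount { get; set; }
    public int WordsCount { get; set; }
    public int LettersCount { get; set; }
}

// Hypothetical analyzer; an assumption about how PdfAnalyzer.cs might work.
public static class PdfAnalyzerSketch
{
    // Opens the PDF at pdfPath and collects one PdfPageInfo per page.
    public static List<PdfPageInfo> AnalyzePages(string pdfPath)
    {
        var pages = new List<PdfPageInfo>();

        // PdfDocument is disposable, so release the file handle when done.
        using (PdfDocument document = PdfDocument.Open(pdfPath))
        {
            foreach (var page in document.GetPages())
            {
                pages.Add(new PdfPageInfo
                {
                    PageNumber = page.Number,                // 1-based page index
                    Text = page.Text,                        // raw extracted text
                    ImagesCount = page.GetImages().Count(),  // images on the page
                    WordsCount = page.GetWords().Count(),    // default word extractor
                    LettersCount = page.Letters.Count        // individual glyphs
                });
            }
        }

        return pages;
    }
}
```

Counting letters and words through PdfPig rather than by splitting Text is a design choice in this sketch: it counts what the library actually found on the page, which is the signal that matters when a PDF looks blank.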

PdfReport.cs

Formats and prints readable console output for each PdfAnalysisResult.
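
As a rough illustration of that formatting, the sketch below builds the per-page block seen in the example output. PdfReportSketch and FormatPage are hypothetical names, and treating "Text Length" as the character count of the page text is an assumption, not something the original code confirms.

```csharp
using System;
using System.Text;

// Hypothetical formatter; the real PdfReport.cs may be structured differently.
public static class PdfReportSketch
{
    // Builds the report block for one page as a string, so it can be written
    // with Console.Write or inspected in a test.
    public static string FormatPage(int pageNumber, string text,
                                    int letters, int words, int images)
    {
        var sb = new StringBuilder();
        sb.AppendLine($"------------- PAGE {pageNumber} -------------");
        // Assumption: text length is the number of characters in the page text.
        sb.AppendLine($"Text Length:   {text.Length}");
        sb.AppendLine($"Letters Count: {letters}");
        sb.AppendLine($"Words Count:   {words}");
        sb.AppendLine($"Images:        {images}");
        sb.AppendLine();
        sb.AppendLine("Text Content:");
        // Show a placeholder instead of an empty line for blank pages.
        sb.AppendLine(string.IsNullOrWhiteSpace(text) ? "[NO TEXT]" : text);
        return sb.ToString();
    }
}
```

Returning a string instead of writing to the console directly keeps the formatting easy to test and leaves the choice of output target to the caller, for example Console.Write(PdfReportSketch.FormatPage(1, "", 0, 0, 0)).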

Program.cs

Locates the Files/ folder

Enumerates all *.pdf files

Uses PdfAnalyzer to analyze each PDF

Uses PdfReport to print the results
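
The steps above can be sketched as follows. This is an illustration under stated assumptions rather than the original Program.cs: PdfFileScanner, FindPdfs, and ProcessAll are hypothetical names, and the processPdf callback stands in for "analyze with PdfAnalyzer, then print with PdfReport". The sketch filters on the file extension case-insensitively instead of using a *.pdf search pattern, so files named like REPORT.PDF are also picked up on case-sensitive file systems.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper illustrating the Program.cs flow.
public static class PdfFileScanner
{
    // Returns every file in the folder whose extension is .pdf (any casing),
    // sorted by path so the processing order is stable.
    public static List<string> FindPdfs(string filesDirectory)
    {
        return Directory
            .EnumerateFiles(filesDirectory)
            .Where(path => string.Equals(Path.GetExtension(path), ".pdf",
                                         StringComparison.OrdinalIgnoreCase))
            .OrderBy(path => path, StringComparer.OrdinalIgnoreCase)
            .ToList();
    }

    // Invokes processPdf once per PDF and returns how many files were handled.
    public static int ProcessAll(string filesDirectory, Action<string> processPdf)
    {
        List<string> pdfs = FindPdfs(filesDirectory);
        Console.WriteLine($"Found {pdfs.Count} PDF(s) in: {filesDirectory}");

        foreach (string pdf in pdfs)
        {
            Console.WriteLine($"Processing: {Path.GetFileName(pdf)}");
            processPdf(pdf);
        }

        return pdfs.Count;
    }
}
```

A caller could wire it up as PdfFileScanner.ProcessAll(Path.Combine(AppContext.BaseDirectory, "Files"), path => { /* analyze and print */ }).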

🧪 Detecting “Blank PDFs” (Optional Helper)

You can add a small helper method to classify a PDF as “basically empty”:

using System.Linq;

bool IsBasicallyEmpty(PdfAnalysisResult pdf)
{
    return pdf.Pages.All(p =>
        string.IsNullOrWhiteSpace(p.Text) &&
        p.ImagesCount == 0 &&
        p.LettersCount == 0 &&
        p.WordsCount == 0
    );
}

You can then call this for each file after analysis to quickly decide whether the PDF has any meaningful content.
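
For illustration, the sketch below wraps the helper in a class and adds a one-line summary per file. The shapes of PdfAnalysisResult (a FileName plus a Pages list) and the BlankCheck and Describe names are assumptions, since the original only shows PdfPageInfo. Note that All returns true for an empty sequence, so a result with zero pages is also classified as empty.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal shapes; the real types may carry more fields.
public class PdfPageInfo
{
    public int PageNumber { get; set; }
    public string Text { get; set; } = "";
    public int ImagesCount { get; set; }
    public int WordsCount { get; set; }
    public int LettersCount { get; set; }
}

public class PdfAnalysisResult
{
    public string FileName { get; set; } = "";
    public List<PdfPageInfo> Pages { get; set; } = new List<PdfPageInfo>();
}

public static class BlankCheck
{
    // True when no page has text, letters, words, or images.
    // A result with no pages at all also counts as empty.
    public static bool IsBasicallyEmpty(PdfAnalysisResult pdf)
    {
        return pdf.Pages.All(p =>
            string.IsNullOrWhiteSpace(p.Text) &&
            p.ImagesCount == 0 &&
            p.LettersCount == 0 &&
            p.WordsCount == 0);
    }

    // One-line summary suitable for printing after each file's report.
    public static string Describe(PdfAnalysisResult pdf)
    {
        return IsBasicallyEmpty(pdf)
            ? $"{pdf.FileName}: basically empty"
            : $"{pdf.FileName}: has content";
    }
}
```

Checking ImagesCount alongside the text fields matters for scanned documents: a page that holds only an image has no extractable text but is not blank.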

📈 Possible Future Enhancements

Export analysis results to JSON or CSV

Compare two PDFs side by side

Highlight structural differences between PDFs

Colored console output for better readability

Save reports into a /Reports directory

Heuristics to distinguish scanned-image PDFs vs. digital-text PDFs

📝 License

This project is intended for debugging, testing, and internal development use. You are free to modify or extend it according to your needs.
