javascriptroom blog

How to Detect File Type (PDF, JPG) from Buffer in Node.js: A Practical Guide

In Node.js, handling files is a common task—whether you’re building a file upload service, processing user-generated content, or validating data integrity. A critical security and functionality requirement is detecting the actual file type of a resource, rather than relying on filenames or extensions (which are easily spoofed). For example, a malicious user might rename a .exe file to .jpg, but the file’s content will reveal its true nature.

This guide focuses on detecting common file types like PDF and JPG directly from a Node.js Buffer (raw binary data). We’ll explore how file signatures (magic numbers) work, leverage popular libraries, and walk through practical examples—from basic buffer analysis to real-world scenarios like web uploads.

2025-12

Table of Contents#

  1. Prerequisites
  2. Understanding File Signatures (Magic Numbers)
  3. Popular Libraries for File Type Detection
  4. Step-by-Step Implementation
  5. Advanced Scenarios
  6. Common Pitfalls and Solutions
  7. Conclusion
  8. References

Prerequisites#

Before diving in, ensure you have:

  • Node.js 18 or later as a safe baseline: the examples use ES modules and top-level await, and recent file-type releases are ESM-only.
  • Basic familiarity with Node.js Buffer objects (raw binary data containers).
  • npm or yarn for package management.

Understanding File Signatures (Magic Numbers)#

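File types are identified by magic numbers (file signatures): specific byte sequences, usually at the start of a file. Most formats have a distinctive signature, which makes content checks far more reliable than filenames, although some formats share a container signature (for example, ZIP-based formats such as DOCX).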

Key Signatures for PDF and JPG:#

  • JPG/JPEG: Starts with the bytes 0xFF 0xD8 0xFF (hex). These bytes are outside the ASCII range; decoded as Latin-1 they appear as ÿØÿ.
  • PDF: Starts with %PDF- (ASCII), which translates to hex 25 50 44 46 2D.

Libraries like file-type use these signatures to detect file types from a buffer.
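
To make the signatures concrete, here is a minimal sketch that checks them by hand using only Buffer methods. The names startsWith and sniffPdfOrJpg are illustrative and not part of any library; a real detector covers far more formats and edge cases.

```javascript
// Signatures taken from the list above. Helper names are hypothetical.
const JPG_SIGNATURE = Buffer.from([0xff, 0xd8, 0xff]);
const PDF_SIGNATURE = Buffer.from('%PDF-', 'latin1');

// True when `buffer` begins with the bytes in `signature`.
function startsWith(buffer, signature) {
  return (
    buffer.length >= signature.length &&
    buffer.subarray(0, signature.length).equals(signature)
  );
}

// Returns 'jpg', 'pdf', or undefined when neither signature matches.
function sniffPdfOrJpg(buffer) {
  if (startsWith(buffer, JPG_SIGNATURE)) return 'jpg';
  if (startsWith(buffer, PDF_SIGNATURE)) return 'pdf';
  return undefined;
}

console.log(sniffPdfOrJpg(Buffer.from('%PDF-1.7\n'))); // pdf
console.log(sniffPdfOrJpg(Buffer.from([0xff, 0xd8, 0xff, 0xe0]))); // jpg
console.log(sniffPdfOrJpg(Buffer.from('hello'))); // undefined
```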

Popular Libraries for File Type Detection#

Several Node.js libraries simplify file type detection from buffers. Here are the most reliable:

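| Library   | Use Case                                                      | Features                                                       |
|-----------|---------------------------------------------------------------|----------------------------------------------------------------|
| file-type | General-purpose detection (supports a wide range of formats)  | Promise-based, supports buffers/streams, minimal dependencies. |
| is-jpg    | JPG-specific detection                                        | Lightweight, checks JPG magic numbers.                         |
| is-pdf    | PDF-specific detection                                        | Lightweight, checks PDF header.                                |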

We’ll focus on file-type for its versatility—it handles both PDF and JPG (and hundreds more) with minimal code.

Step-by-Step Implementation#

Basic Detection with file-type#

First, install file-type:

npm install file-type  
# or  
yarn add file-type  

file-type works by analyzing the leading bytes of a buffer. fileTypeFromBuffer returns a promise that resolves to an object with mime (MIME type) and ext (file extension) properties, or to undefined if no known signature matches.

Example 1: Detect File Type from a Buffer#

// Import file-type (ES modules)  
import { fileTypeFromBuffer } from 'file-type';  
// file-type v17+ is ESM-only, so from CommonJS use a dynamic import instead of require:
//   const { fileTypeFromBuffer } = await import('file-type');
 
import { readFile } from 'fs/promises';  
 
async function detectFile(buffer) {  
  const type = await fileTypeFromBuffer(buffer);  
  if (!type) {  
    console.log('File type not detected');  
    return;  
  }  
  console.log(`Detected: ${type.ext} (${type.mime})`);  
}  
 
// Test with a JPG file (await so errors surface and output stays in order)
const jpgBuffer = await readFile('image.jpg');
await detectFile(jpgBuffer); // Output: "Detected: jpg (image/jpeg)"

// Test with a PDF file
const pdfBuffer = await readFile('document.pdf');
await detectFile(pdfBuffer); // Output: "Detected: pdf (application/pdf)"

Detecting PDF and JPG Specifically#

To explicitly check for PDF or JPG, validate the ext or mime properties from file-type:

async function isPdfOrJpg(buffer) {  
  const type = await fileTypeFromBuffer(buffer);  
  // Keep the return shape consistent so callers can always destructure it
  if (!type) return { isPdf: false, isJpg: false };
 
  const isPdf = type.ext === 'pdf' && type.mime === 'application/pdf';  
  const isJpg = type.ext === 'jpg' && type.mime === 'image/jpeg';  
 
  return { isPdf, isJpg };  
}  
 
// Usage  
const buffer = await readFile('unknown-file');  
const { isPdf, isJpg } = await isPdfOrJpg(buffer);  
if (isPdf) console.log('It’s a PDF!');  
if (isJpg) console.log('It’s a JPG!');  
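
If the set of accepted formats is likely to grow, an allow-list keeps the check in one place. The sketch below is one possible shape: ALLOWED_MIME_TYPES and isAllowedType are hypothetical names, and the function only inspects the { ext, mime } object (or undefined) that fileTypeFromBuffer resolves to, so it can be tried without the library installed.

```javascript
// Hypothetical allow-list; extend it as more formats are accepted.
const ALLOWED_MIME_TYPES = new Set(['application/pdf', 'image/jpeg']);

// `type` is a detection result: { ext, mime } or undefined.
function isAllowedType(type, allowed = ALLOWED_MIME_TYPES) {
  return type != null && allowed.has(type.mime);
}

console.log(isAllowedType({ ext: 'pdf', mime: 'application/pdf' })); // true
console.log(isAllowedType({ ext: 'png', mime: 'image/png' })); // false
console.log(isAllowedType(undefined)); // false
```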

Streaming Large Files#

For large files, you don’t need the entire buffer—just the first few bytes. Use file-type’s streaming API to avoid loading the whole file into memory:

import { fileTypeFromStream } from 'file-type';  
import { createReadStream } from 'fs';  
 
async function detectFromStream(filePath) {
  const stream = createReadStream(filePath);
  try {
    const type = await fileTypeFromStream(stream);
    console.log(`Stream detected: ${type?.ext}`);
  } finally {
    stream.destroy(); // Stop reading once detection is done
  }
}

await detectFromStream('large-image.jpg'); // Output: "Stream detected: jpg"
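
When only a prefix of the data is needed, another option is to collect the leading bytes yourself and stop reading. The helper below is a sketch and not part of file-type: readHead is a hypothetical name, the 4100-byte default is an assumption based on the sample size mentioned in the pitfalls table, and the source is assumed to yield Buffer or Uint8Array chunks (no string encoding set).

```javascript
// Collect at most `limit` bytes from any async iterable of binary chunks.
// Node.js Readable streams are async iterable, so they work as `source`.
async function readHead(source, limit = 4100) {
  const chunks = [];
  let total = 0;
  for await (const chunk of source) {
    const remaining = limit - total;
    // Trim the last chunk so the result never exceeds `limit` bytes.
    chunks.push(chunk.length > remaining ? chunk.subarray(0, remaining) : chunk);
    total += Math.min(chunk.length, remaining);
    if (total >= limit) break; // Leaving the loop early stops consumption
  }
  return Buffer.concat(chunks, total);
}
```

With a file, this could be called as readHead(createReadStream(filePath)) and the result passed to fileTypeFromBuffer. Exiting a for await loop early destroys a Node.js readable stream by default, so the rest of the file is not read.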

Advanced Scenarios#

Detecting from Base64 Strings#

If your data is in Base64 (common in APIs), convert it to a buffer first:

async function detectFromBase64(base64String) {  
  // Convert Base64 to buffer  
  const buffer = Buffer.from(base64String, 'base64');  
  return fileTypeFromBuffer(buffer);  
}  
 
// Example: Base64-encoded JPG snippet  
const base64Jpg = '/9j/4AAQSkZJRgABAQEAYABgAAD//gA7Q1JFQVRPUjogZ2QtanBlZyB2MS4wICh1c2luZyBJSkcgSlBFRyB2ODApLCBxdWFsaXR5ID0gOTAK/...';  
const type = await detectFromBase64(base64Jpg);  
console.log(type?.ext); // "jpg" (undefined if the type is not recognized)
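
Base64 payloads sent by browsers often arrive as data URLs (for example, data:image/jpeg;base64,...), and that prefix has to be removed before decoding. The sketch below shows one way to do that and to decode only the leading part of the string, since detection needs only the first bytes. base64HeadToBuffer is a hypothetical helper, the 4100-byte default is an assumption, and the payload is assumed to contain no whitespace or line breaks.

```javascript
// Decode only the first `maxBytes` bytes of a Base64 string or data URL.
function base64HeadToBuffer(input, maxBytes = 4100) {
  // Drop an optional "data:<mime>;base64," prefix.
  const commaIndex = input.indexOf(',');
  const payload =
    input.startsWith('data:') && commaIndex !== -1
      ? input.slice(commaIndex + 1)
      : input;
  // Every 4 Base64 characters encode 3 bytes, so cut on a 4-character boundary.
  const charCount = Math.ceil(maxBytes / 3) * 4;
  return Buffer.from(payload.slice(0, charCount), 'base64').subarray(0, maxBytes);
}

const head = base64HeadToBuffer('data:image/jpeg;base64,/9j/4AAQSkZJRg==', 3);
console.log(head); // <Buffer ff d8 ff>
```

The returned buffer can then be passed to fileTypeFromBuffer in the same way as in detectFromBase64.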

Express.js File Uploads#

In web apps, validate uploaded files (e.g., with Express and multer). Access the buffer from req.file.buffer:

npm install express multer

import express from 'express';
import multer from 'multer';  
import { fileTypeFromBuffer } from 'file-type';  
 
const app = express();  
const upload = multer(); // In-memory storage (use diskStorage for large files)  
 
app.post('/upload', upload.single('file'), async (req, res) => {  
  try {  
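    // req.file is undefined when the request contains no "file" field
    if (!req.file) {
      return res.status(400).send('No file uploaded');
    }
    const buffer = req.file.buffer;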
    const type = await fileTypeFromBuffer(buffer);  
 
    if (!type || !(type.ext === 'pdf' || type.ext === 'jpg')) {  
      return res.status(400).send('Only PDF/JPG files are allowed');  
    }  
 
    res.send(`Valid file: ${type.ext}`);  
  } catch (err) {  
    res.status(500).send('Error processing file');  
  }  
});  
 
app.listen(3000, () => console.log('Server running on port 3000'));  
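
multer also exposes the client-declared MIME type as req.file.mimetype. That value comes from the request and can be forged just like an extension, so comparing it with the detected type is a cheap way to flag suspicious uploads. The function below is an illustrative sketch: checkUpload, ALLOWED, and the returned { ok, reason } shape are hypothetical, and rejecting on any mismatch is a deliberately strict choice.

```javascript
const ALLOWED = new Set(['application/pdf', 'image/jpeg']);

// `declaredMime` is the client-supplied value (e.g. req.file.mimetype);
// `detected` is a detection result: { ext, mime } or undefined.
function checkUpload(declaredMime, detected) {
  if (!detected) {
    return { ok: false, reason: 'unrecognized content' };
  }
  if (!ALLOWED.has(detected.mime)) {
    return { ok: false, reason: 'type not allowed' };
  }
  if (declaredMime !== detected.mime) {
    return { ok: false, reason: 'declared type does not match content' };
  }
  return { ok: true, reason: null };
}

console.log(checkUpload('image/jpeg', { ext: 'jpg', mime: 'image/jpeg' }).ok); // true
console.log(checkUpload('image/jpeg', { ext: 'pdf', mime: 'application/pdf' }).reason);
// declared type does not match content
```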

Common Pitfalls and Solutions#

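| Pitfall                             | Solution                                                                                                                                             |
|-------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------|
| Relying on filename extensions      | Always validate the buffer content (extensions are easily faked).                                                                                    |
| Insufficient buffer size            | Pass enough leading bytes. file-type has historically documented the first 4100 bytes as a sufficient sample; a much shorter slice can cause misses. |
| Corrupted or unrecognized files     | Handle undefined results from file-type (the file may be corrupted, or it may be a text-based format with no binary signature).                       |
| Streaming without early termination | Use file-type’s fileTypeFromStream so detection stops after the leading bytes, then destroy the stream if the rest is not needed.                    |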

Conclusion#

Detecting file types from buffer content makes file handling in Node.js applications safer and more reliable. By leveraging libraries like file-type and focusing on content-based signatures (magic numbers), you avoid the risks of trusting filenames, though a matching signature does not prove that the rest of the file is well-formed or harmless. Key takeaways:

  • Use file-type for versatile, promise-based detection.
  • Validate buffers (not just extensions) for uploaded/processed files.
  • Handle streams and large files efficiently with streaming APIs.

By following this guide, you’ll build robust file validation into your Node.js projects.

References#