Table of Contents#
- Prerequisites
- Understanding File Signatures (Magic Numbers)
- Popular Libraries for File Type Detection
- Step-by-Step Implementation
- Advanced Scenarios
- Common Pitfalls and Solutions
- Conclusion
- References
Prerequisites#
Before diving in, ensure you have:
- Node.js (v14+ recommended, as we’ll use modern features like ES modules).
- Basic familiarity with Node.js Buffer objects (raw binary data containers).
- npm or yarn for package management.
Understanding File Signatures (Magic Numbers)#
File types are identified by magic numbers (file signatures): specific byte sequences at the start of a file. Because they are part of the file's content rather than its name, they are far more reliable than filename extensions, although some formats share a signature and plain text has none.
Key Signatures for PDF and JPG:#
- JPG/JPEG: Starts with 0xFF 0xD8 0xFF (hex), which displays as ÿØÿ when decoded as Latin-1.
- PDF: Starts with %PDF- (ASCII), which translates to hex 25 50 44 46 2D.
Libraries like file-type use these signatures to detect file types from a buffer.
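To make the signatures above concrete, here is a minimal sketch of a hand-rolled check that uses only Node's built-in Buffer. The names SIGNATURES and sniffSignature are hypothetical and chosen for illustration; this is not how file-type is implemented, and it covers only the two formats listed above.

```javascript
// Known signatures from the list above, stored as raw bytes.
const SIGNATURES = {
  jpg: Buffer.from([0xff, 0xd8, 0xff]),
  pdf: Buffer.from('%PDF-', 'ascii'),
};

// Returns 'jpg', 'pdf', or undefined when no signature matches.
function sniffSignature(buffer) {
  for (const [ext, signature] of Object.entries(SIGNATURES)) {
    // Compare only the leading bytes; skip buffers shorter than the signature.
    if (
      buffer.length >= signature.length &&
      buffer.subarray(0, signature.length).equals(signature)
    ) {
      return ext;
    }
  }
  return undefined;
}
```

A check like this is fine for two formats, but a maintained library is usually the better choice once more types or edge cases are involved.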
Popular Libraries for File Type Detection#
Several Node.js libraries simplify file type detection from buffers. Here are the most reliable:
| Library | Use Case | Features |
|---|---|---|
| file-type | General-purpose detection (supports a broad range of binary formats) | Promise-based, supports buffers/streams, minimal dependencies. |
| is-jpg | JPG-specific detection | Lightweight, checks JPG magic numbers. |
| is-pdf | PDF-specific detection | Lightweight, checks PDF header. |
We’ll focus on file-type for its versatility: it handles both PDF and JPG (and many other formats) with minimal code.
Step-by-Step Implementation#
Basic Detection with file-type#
First, install file-type:
npm install file-type
# or
yarn add file-type
file-type works by analyzing the first few bytes of a buffer. Its fileTypeFromBuffer function returns a promise that resolves to an object with mime (MIME type) and ext (file extension) properties, or to undefined if the type can't be detected.
Example 1: Detect File Type from a Buffer#
// Import file-type (ES modules)
import { fileTypeFromBuffer } from 'file-type';
// Recent file-type releases are ESM-only. From CommonJS, use a dynamic import inside an async function:
// const { fileTypeFromBuffer } = await import('file-type');
import { readFile } from 'fs/promises';
// Logs the detected extension and MIME type for a buffer
async function detectFile(buffer) {
  const type = await fileTypeFromBuffer(buffer);
  if (!type) {
    console.log('File type not detected');
    return;
  }
  console.log(`Detected: ${type.ext} (${type.mime})`);
}
// Test with a JPG file
const jpgBuffer = await readFile('image.jpg');
await detectFile(jpgBuffer); // Output: "Detected: jpg (image/jpeg)"
// Test with a PDF file
const pdfBuffer = await readFile('document.pdf');
await detectFile(pdfBuffer); // Output: "Detected: pdf (application/pdf)"
Detecting PDF and JPG Specifically#
To explicitly check for PDF or JPG, validate the ext or mime properties from file-type:
async function isPdfOrJpg(buffer) {
  const type = await fileTypeFromBuffer(buffer);
  // Always return the same shape so callers can safely destructure the result
  if (!type) return { isPdf: false, isJpg: false };
  const isPdf = type.ext === 'pdf' && type.mime === 'application/pdf';
  const isJpg = type.ext === 'jpg' && type.mime === 'image/jpeg';
  return { isPdf, isJpg };
}
// Usage
const buffer = await readFile('unknown-file');
const { isPdf, isJpg } = await isPdfOrJpg(buffer);
if (isPdf) console.log('It’s a PDF!');
if (isJpg) console.log('It’s a JPG!');
Streaming Large Files#
For large files, you don’t need the entire buffer, only the leading bytes. Use file-type’s streaming API to avoid loading the whole file into memory:
import { fileTypeFromStream } from 'file-type';
import { createReadStream } from 'fs';
async function detectFromStream(filePath) {
  const stream = createReadStream(filePath);
  try {
    const type = await fileTypeFromStream(stream);
    console.log(`Stream detected: ${type?.ext}`);
  } finally {
    // Stop reading once detection is done so the rest of the file is not loaded
    stream.destroy();
  }
}
detectFromStream('large-image.jpg'); // Output: "Stream detected: jpg"
Advanced Scenarios#
Detecting from Base64 Strings#
If your data is in Base64 (common in APIs), convert it to a buffer first:
async function detectFromBase64(base64String) {
  // Convert Base64 to buffer
  const buffer = Buffer.from(base64String, 'base64');
  return fileTypeFromBuffer(buffer);
}
// Example: Base64-encoded JPG snippet
const base64Jpg = '/9j/4AAQSkZJRgABAQEAYABgAAD//gA7Q1JFQVRPUjogZ2QtanBlZyB2MS4wICh1c2luZyBJSkcgSlBFRyB2ODApLCBxdWFsaXR5ID0gOTAK/...';
const type = await detectFromBase64(base64Jpg);
console.log(type?.ext); // "jpg"
Express.js File Uploads#
In web apps, validate uploaded files (e.g., with Express and multer). Access the buffer from req.file.buffer:
npm install express multer
import express from 'express';
import multer from 'multer';
import { fileTypeFromBuffer } from 'file-type';
const app = express();
const upload = multer(); // In-memory storage (use diskStorage for large files)
app.post('/upload', upload.single('file'), async (req, res) => {
  try {
    // req.file is undefined when the request has no "file" field
    if (!req.file) {
      return res.status(400).send('No file uploaded');
    }
    const type = await fileTypeFromBuffer(req.file.buffer);
    if (!type || !(type.ext === 'pdf' || type.ext === 'jpg')) {
      return res.status(400).send('Only PDF/JPG files are allowed');
    }
    res.send(`Valid file: ${type.ext}`);
  } catch (err) {
    res.status(500).send('Error processing file');
  }
});
app.listen(3000, () => console.log('Server running on port 3000'));
Common Pitfalls and Solutions#
| Pitfall | Solution |
|---|---|
| Relying on filename extensions | Always validate the buffer content (extensions are easily faked). |
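| Insufficient buffer size | file-type only inspects the start of the file, and its documentation has cited 4100 bytes as a sufficient sample. Pass at least that much when the file is larger. |
| Corrupted files | Handle undefined results from file-type (the file may be corrupted, truncated, or of an unsupported type). |
| Streaming without early termination | Use file-type’s fileTypeFromStream to detect the type from the start of the stream, then destroy the stream instead of reading it to the end. |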
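As one way to address the buffer-size pitfall without loading a whole file, the sketch below reads only the head of a file with Node's built-in fs/promises API. The helper name readFileHead and the SAMPLE_SIZE constant are hypothetical, and the 4100-byte default simply mirrors the figure in the table above rather than a guaranteed requirement.

```javascript
import { open } from 'node:fs/promises';

// Assumption: 4100 bytes is a sufficient sample, per the table above.
const SAMPLE_SIZE = 4100;

// Reads at most `size` bytes from the start of a file.
async function readFileHead(filePath, size = SAMPLE_SIZE) {
  const handle = await open(filePath, 'r');
  try {
    const buffer = Buffer.alloc(size);
    // Read `size` bytes into the buffer, starting at file position 0.
    const { bytesRead } = await handle.read(buffer, 0, size, 0);
    // Trim the buffer when the file is shorter than the requested sample.
    return buffer.subarray(0, bytesRead);
  } finally {
    await handle.close();
  }
}
```

The returned buffer can then be passed to fileTypeFromBuffer in place of the full file contents.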
Conclusion#
Detecting file types from buffers improves the security and reliability of Node.js applications. By leveraging libraries like file-type and focusing on content-based signatures (magic numbers), you avoid the risks of trusting filenames. Key takeaways:
- Use file-type for versatile, promise-based detection.
- Validate buffers (not just extensions) for uploaded/processed files.
- Handle streams and large files efficiently with streaming APIs.
By following this guide, you’ll build robust file validation into your Node.js projects.