May 28, 07:18

How does Puppeteer implement page screenshots and PDF generation? What are the advanced options and practical use cases?

Puppeteer provides powerful page screenshot and PDF generation capabilities, which can be used for automated testing, document generation, web archiving, and many other scenarios.

1. Page Screenshots

Basic Screenshot:

javascript
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');

  // Basic screenshot
  await page.screenshot({ path: 'example.png' });

  await browser.close();
})();

Screenshot Options:

javascript
// Reference listing of the options; not every combination is valid in one call
await page.screenshot({
  path: 'screenshot.jpg',        // Save path
  type: 'jpeg',                  // Format: 'png' (default), 'jpeg', or 'webp'
  quality: 90,                   // Quality (0-100); not accepted for PNG
  fullPage: false,               // true captures the full scrollable page; cannot be combined with clip
  clip: {                        // Clip region
    x: 0,
    y: 0,
    width: 800,
    height: 600
  },
  omitBackground: false,         // true drops the default white background (transparent PNG)
  encoding: 'base64',            // Encoding: 'base64' or 'binary' (default)
  captureBeyondViewport: false   // Capture content outside the viewport
});
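Some of these options constrain each other: as far as I know, Puppeteer rejects `clip` together with `fullPage`, and `quality` with PNG output. The sketch below checks those two rules before the options reach `page.screenshot`. `buildScreenshotOptions` is a hypothetical helper name, not part of Puppeteer, and the error messages are my own.

```javascript
// Hypothetical helper (not a Puppeteer API): checks a screenshot options
// object against the two constraints described above and fills in the type.
function buildScreenshotOptions(options = {}) {
  const type = options.type || 'png';

  if (options.clip && options.fullPage) {
    throw new Error('Use either clip or fullPage, not both');
  }

  if (options.quality !== undefined) {
    if (type === 'png') {
      throw new Error('quality applies only to lossy formats such as jpeg');
    }
    if (typeof options.quality !== 'number' ||
        options.quality < 0 || options.quality > 100) {
      throw new Error('quality must be a number from 0 to 100');
    }
  }

  // Return a copy so the caller's object is left untouched
  return { ...options, type };
}
```

A call might then look like `await page.screenshot(buildScreenshotOptions({ path: 'shot.jpg', type: 'jpeg', quality: 80 }))`.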

Screenshot Specific Element:

javascript
const element = await page.$('#header');
if (element) {
  // page.$ resolves to null when nothing matches the selector
  await element.screenshot({ path: 'header.png' });
}

Screenshot Viewport:

javascript
await page.setViewport({ width: 1920, height: 1080 });
await page.screenshot({ path: 'viewport.png' });

Full Page Screenshot:

javascript
await page.screenshot({ path: 'fullpage.png', fullPage: true });

High Quality JPEG:

javascript
await page.screenshot({ path: 'high-quality.jpg', type: 'jpeg', quality: 95 });

Transparent Background:

javascript
await page.screenshot({ path: 'transparent.png', omitBackground: true });

Get Screenshot as Base64:

javascript
const base64 = await page.screenshot({ encoding: 'base64' });
console.log(base64);
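A base64 string is usually an intermediate form rather than the end result. The sketch below shows two things one might do with it: wrap it in a data URI for embedding in HTML, or decode it back to bytes. Both function names are hypothetical, and the default MIME type assumes a PNG screenshot.

```javascript
// Hypothetical helpers for a base64-encoded screenshot string.

// Wraps the string in a data URI, e.g. for an <img src="..."> attribute.
function toDataUri(base64, mimeType = 'image/png') {
  return `data:${mimeType};base64,${base64}`;
}

// Decodes the string back into a Node.js Buffer of raw image bytes.
function toBuffer(base64) {
  return Buffer.from(base64, 'base64');
}
```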

2. PDF Generation

Basic PDF:

javascript
await page.pdf({ path: 'page.pdf' });

PDF Options:

javascript
await page.pdf({
  path: 'output.pdf',            // Save path
  scale: 1,                      // Rendering scale factor (0.1 to 2)
  displayHeaderFooter: false,    // Display header/footer
  headerTemplate: '',            // Header HTML template
  footerTemplate: '',            // Footer HTML template
  printBackground: false,        // Print background graphics
  landscape: false,              // Landscape orientation
  pageRanges: '',                // Page ranges, e.g., '1-5, 8, 11-13'; '' prints all pages
  format: 'A4',                  // Paper format (default 'Letter'); takes priority over width/height
  width: '',                     // Paper width, e.g., '10in'
  height: '',                    // Paper height, e.g., '20in'
  margin: {                      // Margins
    top: '1cm',
    right: '1cm',
    bottom: '1cm',
    left: '1cm'
  },
  preferCSSPageSize: false       // Let a CSS @page size override width, height, and format
});

Supported Paper Formats:

  • Letter: 8.5in x 11in
  • Legal: 8.5in x 14in
  • Tabloid: 11in x 17in
  • Ledger: 17in x 11in
  • A0: 33.1in x 46.8in
  • A1: 23.4in x 33.1in
  • A2: 16.5in x 23.4in
  • A3: 11.7in x 16.5in
  • A4: 8.27in x 11.7in
  • A5: 5.83in x 8.27in
  • A6: 4.13in x 5.83in
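The list above can be kept as a lookup table when code needs the physical dimensions, for example to size a layout before printing. In this sketch, `PAPER_FORMATS_IN` and `paperSize` are hypothetical names, and swapping width and height is only my model of what `landscape: true` does.

```javascript
// Dimensions in inches as [width, height], copied from the list above.
const PAPER_FORMATS_IN = {
  Letter: [8.5, 11],
  Legal: [8.5, 14],
  Tabloid: [11, 17],
  Ledger: [17, 11],
  A0: [33.1, 46.8],
  A1: [23.4, 33.1],
  A2: [16.5, 23.4],
  A3: [11.7, 16.5],
  A4: [8.27, 11.7],
  A5: [5.83, 8.27],
  A6: [4.13, 5.83]
};

// Hypothetical helper: returns the size in inches for a named format,
// swapping width and height when landscape is requested.
function paperSize(format, { landscape = false } = {}) {
  const size = PAPER_FORMATS_IN[format];
  if (!size) {
    throw new Error(`Unknown paper format: ${format}`);
  }
  const [width, height] = size;
  return landscape ? { width: height, height: width } : { width, height };
}
```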

Landscape PDF:

javascript
await page.pdf({ path: 'landscape.pdf', landscape: true, format: 'A4' });

Custom Paper Size:

javascript
await page.pdf({ path: 'custom.pdf', width: '200mm', height: '300mm' });

Set Margins:

javascript
await page.pdf({ path: 'margin.pdf', margin: { top: '20px', right: '20px', bottom: '20px', left: '20px' } });

Print Background Graphics:

javascript
await page.pdf({ path: 'background.pdf', printBackground: true });

Add Header/Footer:

javascript
await page.pdf({
  path: 'header-footer.pdf',
  displayHeaderFooter: true,
  headerTemplate: `
    <div style="font-size: 10px; text-align: center; width: 100%;">
      Generated by Puppeteer
    </div>
  `,
  footerTemplate: `
    <div style="font-size: 10px; text-align: center; width: 100%;">
      Page <span class="pageNumber"></span> of <span class="totalPages"></span>
    </div>
  `,
  // The header and footer are drawn inside the page margins, so leave room for them
  margin: { top: '40px', bottom: '40px' }
});
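When several documents share a footer, the template string can come from a small function. The sketch below relies on the `pageNumber`, `totalPages`, and `date` classes that Puppeteer fills in for header and footer templates. `buildFooterTemplate` and its options are hypothetical.

```javascript
// Hypothetical helper: builds a footer template string for page.pdf().
// Puppeteer injects values into elements with the classes
// "pageNumber", "totalPages", and "date".
function buildFooterTemplate({ fontSize = 10, showDate = false } = {}) {
  const datePart = showDate ? '<span class="date"></span> | ' : '';
  return (
    `<div style="font-size: ${fontSize}px; text-align: center; width: 100%;">` +
    `${datePart}Page <span class="pageNumber"></span> of <span class="totalPages"></span>` +
    '</div>'
  );
}
```

The result would be passed as `footerTemplate`, together with `displayHeaderFooter: true` and a bottom margin large enough to show it.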

Print Specific Pages:

javascript
await page.pdf({ path: 'pages.pdf', pageRanges: '1-3, 5, 8-10' });
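If the pages to print come from application data rather than a hand-written string, the range string can be generated. This is a minimal sketch: `toPageRanges` is a hypothetical helper that collapses a list of 1-based page numbers into the `'1-3, 5, 8-10'` syntax shown above.

```javascript
// Hypothetical helper: turns a list of 1-based page numbers into a
// pageRanges string, merging consecutive pages into "start-end" ranges.
function toPageRanges(pages) {
  const sorted = [...new Set(pages)].sort((a, b) => a - b);
  const parts = [];
  let start = null;
  let prev = null;

  const flush = () => {
    parts.push(start === prev ? `${start}` : `${start}-${prev}`);
  };

  for (const page of sorted) {
    if (!Number.isInteger(page) || page < 1) {
      throw new Error(`Invalid page number: ${page}`);
    }
    if (start === null) {
      start = prev = page;          // first page opens a range
    } else if (page === prev + 1) {
      prev = page;                  // extend the current range
    } else {
      flush();                      // close the range and start a new one
      start = prev = page;
    }
  }
  if (start !== null) {
    flush();
  }
  return parts.join(', ');
}
```

For example, `toPageRanges([1, 2, 3, 5, 8, 9, 10])` yields `'1-3, 5, 8-10'`, which can be passed directly as `pageRanges`.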

3. Practical Use Cases

Use Case 1: Web Archiving

javascript
const puppeteer = require('puppeteer');

async function archiveWebpage(url, outputPath) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });

    // Generate PDF archive
    await page.pdf({
      path: outputPath,
      format: 'A4',
      printBackground: true,
      margin: { top: '1cm', right: '1cm', bottom: '1cm', left: '1cm' }
    });
  } finally {
    // Close the browser even if navigation or PDF generation fails
    await browser.close();
  }
}

archiveWebpage('https://example.com', 'archive.pdf').catch(console.error);

Use Case 2: Batch Screenshot Service

javascript
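const fs = require('fs');
const puppeteer = require('puppeteer');

async function batchScreenshots(urls) {
  // page.screenshot does not create missing directories
  fs.mkdirSync('screenshots', { recursive: true });

  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();

    for (const url of urls) {
      await page.goto(url, { waitUntil: 'networkidle2' });

      // Derive a file name from the URL: drop the scheme, replace slashes
      const filename = url
        .replace(/https?:\/\//, '')
        .replace(/\//g, '_') + '.png';

      await page.screenshot({
        path: `screenshots/${filename}`,
        fullPage: true
      });
      console.log(`Screenshot saved: ${filename}`);
    }
  } finally {
    await browser.close();
  }
}

batchScreenshots([
  'https://example.com',
  'https://example.com/about',
  'https://example.com/contact'
]).catch(console.error);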
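The file name logic in this use case only strips the scheme and replaces slashes, so characters such as `?` or `:` in a URL would still end up in the path. A slightly more defensive variant is sketched below. `urlToFilename` is a hypothetical helper; it ignores the query string and fragment, which is an assumption that may cause collisions for URLs that differ only there.

```javascript
// Hypothetical helper: derives a file-system-friendly name from a URL.
// Uses only the host name and path; query string and fragment are ignored.
function urlToFilename(url, extension = 'png') {
  const { hostname, pathname } = new URL(url);
  const slug = `${hostname}${pathname}`
    .replace(/[^a-zA-Z0-9.-]+/g, '_')  // anything unusual becomes "_"
    .replace(/^_+|_+$/g, '');          // trim leading/trailing "_"
  return `${slug}.${extension}`;
}
```

With this helper, `https://example.com/about` maps to `example.com_about.png`.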

Use Case 3: Generate Invoice PDF

javascript
const puppeteer = require('puppeteer');

async function generateInvoice(invoiceData) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Load invoice template
  await page.setContent(`
    <html>
      <head>
        <style>
          body { font-family: Arial, sans-serif; padding: 40px; }
          .header { text-align: center; margin-bottom: 40px; }
          .invoice-info { margin-bottom: 30px; }
          table { width: 100%; border-collapse: collapse; }
          th, td { border: 1px solid #ddd; padding: 10px; text-align: left; }
          th { background-color: #f2f2f2; }
          .total { text-align: right; font-weight: bold; margin-top: 20px; }
        </style>
      </head>
      <body>
        <div class="header">
          <h1>INVOICE</h1>
          <p>Invoice #: ${invoiceData.number}</p>
        </div>
        <div class="invoice-info">
          <p>Date: ${invoiceData.date}</p>
          <p>Customer: ${invoiceData.customer}</p>
        </div>
        <table>
          <thead>
            <tr>
              <th>Item</th>
              <th>Quantity</th>
              <th>Price</th>
              <th>Total</th>
            </tr>
          </thead>
          <tbody>
            ${invoiceData.items.map(item => `
              <tr>
                <td>${item.name}</td>
                <td>${item.quantity}</td>
                <td>$${item.price}</td>
                <td>$${item.quantity * item.price}</td>
              </tr>
            `).join('')}
          </tbody>
        </table>
        <div class="total">
          Total: $${invoiceData.total}
        </div>
      </body>
    </html>
  `);

  // Generate PDF
  await page.pdf({
    path: `invoice_${invoiceData.number}.pdf`,
    format: 'A4',
    printBackground: true,
    margin: { top: '20px', right: '20px', bottom: '20px', left: '20px' }
  });

  await browser.close();
}

generateInvoice({
  number: 'INV-001',
  date: '2024-01-15',
  customer: 'John Doe',
  items: [
    { name: 'Product A', quantity: 2, price: 50 },
    { name: 'Product B', quantity: 1, price: 75 }
  ],
  total: 175
}).catch(console.error);
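The invoice template interpolates values straight into HTML and trusts the caller's `total`. If the data comes from user input, it may be worth escaping it and deriving the total from the line items instead. The two helpers below are hypothetical additions, not part of the original example.

```javascript
// Hypothetical helper: escapes characters that have special meaning in HTML
// so that values such as customer names cannot break the template markup.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical helper: computes the invoice total from its line items
// instead of relying on a separately supplied number.
function invoiceTotal(items) {
  return items.reduce((sum, item) => sum + item.quantity * item.price, 0);
}
```

In the template, `${item.name}` would become `${escapeHtml(item.name)}`, and the total line could use `invoiceTotal(invoiceData.items)`.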

Use Case 4: Responsive Design Test Screenshots

javascript
const puppeteer = require('puppeteer');

async function responsiveScreenshots(url) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  const viewports = [
    { name: 'mobile', width: 375, height: 667 },
    { name: 'tablet', width: 768, height: 1024 },
    { name: 'desktop', width: 1920, height: 1080 }
  ];

  for (const { name, width, height } of viewports) {
    // Set the viewport before navigating so the page lays out at this size
    await page.setViewport({ width, height });
    await page.goto(url, { waitUntil: 'networkidle2' });
    await page.screenshot({
      path: `${name}.png`,
      fullPage: true
    });
    console.log(`Screenshot saved: ${name}.png`);
  }

  await browser.close();
}

responsiveScreenshots('https://example.com').catch(console.error);

4. Performance Optimization Tips

1. Parallel Processing:

javascript
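// Assumes this runs inside an async function with puppeteer already required
const urls = [
  'https://example.com',
  'https://example.com/about',
  'https://example.com/contact'
];
const browser = await puppeteer.launch();

// Open one page per URL and process them concurrently
await Promise.all(urls.map(async (url, index) => {
  const page = await browser.newPage();
  await page.goto(url);
  await page.screenshot({ path: `screenshot-${index}.png` });
  await page.close();
}));

await browser.close();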
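`Promise.all` over every URL opens all pages at once, which can exhaust memory on long lists. One common alternative is to cap the number of pages in flight. The sketch below is a generic limiter with no Puppeteer dependency; `mapWithConcurrency` is a hypothetical name, and the worker function is supplied by the caller.

```javascript
// Hypothetical helper: runs worker(item, index) for every item, with at
// most `limit` workers in flight at once. Results keep the input order.
async function mapWithConcurrency(items, limit, worker) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new Error('limit must be a positive integer');
  }
  const results = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index.
  async function run() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, () => run());
  await Promise.all(runners);
  return results;
}
```

A worker for screenshots would open a page with `browser.newPage()`, navigate, capture, and close the page, for example `mapWithConcurrency(urls, 3, async (url, i) => { ... })`.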

2. Reuse Browser Instance:

javascript
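// Assumes this runs inside an async function and `urls` is an array of URLs
const browser = await puppeteer.launch();

// Reuse the same browser instance for every URL
for (const [index, url] of urls.entries()) {
  const page = await browser.newPage();
  await page.goto(url);
  // A raw URL contains characters such as ':' and '/' that do not belong in a file name
  await page.screenshot({ path: `screenshot-${index}.png` });
  await page.close();
}

await browser.close();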

3. Disable Unnecessary Resources:

javascript
await page.setRequestInterception(true);

page.on('request', (request) => {
  // Skip resource types that the output does not need
  if (['image', 'font', 'media'].includes(request.resourceType())) {
    request.abort();
  } else {
    request.continue();
  }
});
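The blocking decision can be separated from the event wiring, which makes the list of blocked types easy to adjust and test. In this sketch, `BLOCKED_TYPES`, `shouldBlock`, and `handleRequest` are hypothetical names; `resourceType()`, `abort()`, and `continue()` are the request methods used in the snippet above.

```javascript
// Resource types to skip; taken from the example above.
const BLOCKED_TYPES = new Set(['image', 'font', 'media']);

// Hypothetical helper: decides whether a resource type should be blocked.
function shouldBlock(resourceType, blocked = BLOCKED_TYPES) {
  return blocked.has(resourceType);
}

// Hypothetical handler: aborts blocked requests and lets the rest through.
// Intended for page.on('request', handleRequest) after interception is enabled.
function handleRequest(request) {
  if (shouldBlock(request.resourceType())) {
    return request.abort();
  }
  return request.continue();
}
```

Blocking images and fonts changes how the page renders, so this suits text-oriented output better than visual screenshots.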

5. Important Notes

  1. PDF Generation Limitation: Older Puppeteer and Chrome versions support PDF generation only in headless mode; check the documentation for the version you use
  2. Font Support: Ensure required fonts are installed on the system
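  3. Page Loading: waitUntil: 'networkidle2' waits until there have been no more than 2 network connections for at least 500 ms; pages that render content later may still need an explicit wait such as page.waitForSelector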
  4. Memory Management: Monitor memory usage when processing many pages
  5. Error Handling: Add appropriate error handling logic
  6. Timeout Settings: Adjust timeout based on page complexity
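Notes 5 and 6 can be combined in a small retry wrapper around flaky steps such as navigation. This is a minimal sketch, not a recommended policy: `withRetries` and its option names are hypothetical, and the fixed delay between attempts is an arbitrary choice.

```javascript
// Hypothetical helper: calls task(attempt) up to `attempts` times, waiting
// `delayMs` between failures, and rethrows the last error if all attempts fail.
async function withRetries(task, { attempts = 3, delayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

A navigation with a longer timeout could then be written as `await withRetries(() => page.goto(url, { waitUntil: 'networkidle2', timeout: 60000 }))`.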
Tags: Puppeteer