pdf2text

Stars: 4 | Issues: 0 | Version: 1.1.0 | Updated: 9 years ago | Created: 9 years ago

PDF2Text

Extract text from a PDF into an array of pages, where each page is an array of text strings. Useful for parsing structured PDF text. Uses no external dependencies other than npm modules.
Modified from Brian C's pdf-text; uses Mozilla's pdf.js via pdf2json.

Install

npm install pdf2text

Usage

var pdf2Text = require('pdf2text')
var pathToPdf = __dirname + "/info.pdf"

pdf2Text(pathToPdf).then(function(pages) {
  //pages is an array of string arrays 
  //loosely corresponding to text objects within the pdf
})

//or parse a buffer of pdf data
//this is handy when you already have the pdf in memory
//and don't want to write it to a temp file
var fs = require('fs')
var buffer = fs.readFileSync(pathToPdf)
pdf2Text(buffer).then(function(pages) {
  //pages has the same shape as when parsing from a path
})

Example output from parsing a W-4 form:

[[ 'Form W-4 (2013)',
    'Purpose. ',
    'Complete Form W-4 so that your',
    'employer can withhold the correct federal income',
    'tax from your pay. Consider completing a new',
    'Form ',
    'W-4 each year and when your personal or',
    'financial ',
    'situation changes.',
    'Exemption from withholding. ',
    'If you are',
    'exempt, ',
    'complete ',
    ' only  ',
    'lines 1, 2, 3, 4, and 7',
    'and sign the ',
    ...
  ],
  [ ... ]
]
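
Because a single sentence is often split across several strings, it can help to join a page's fragments back into one string before searching it. The following is a minimal sketch, not part of pdf2text: pageToText is a hypothetical helper, and the sample array is copied from the start of the output above.

```javascript
// Hypothetical helper (not provided by pdf2text): joins one page's
// fragments with spaces, collapses runs of whitespace, and trims the ends.
function pageToText(fragments) {
  return fragments.join(' ').replace(/\s+/g, ' ').trim()
}

// Sample data copied from the first lines of the W-4 output above
var samplePage = [
  'Form W-4 (2013)',
  'Purpose. ',
  'Complete Form W-4 so that your',
  'employer can withhold the correct federal income'
]

var text = pageToText(samplePage)
console.log(text)
```

With real data, the same helper could be applied to every page inside the then callback, for example pages.map(pageToText).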

API

pdf2text(string pathToPdfFile): Promise.

The promise resolves to an array of pages; each page is an array of all the strings on that page. The strings are ordered similarly to how the text appears on the page, so you can extract key pieces by locating them relative to other 'known' pieces of text on the page.
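
One way to use that ordering is to look for a known label and read the string that follows it. The sketch below is an illustration only: textAfter is a hypothetical helper, not part of pdf2text, and the sample page reuses fragments from the W-4 output in the Usage section. It assumes the value of interest sits a fixed number of fragments after the label, which may not hold for every PDF.

```javascript
// Hypothetical helper (not provided by pdf2text): finds the first fragment
// containing `label` and returns the fragment `offset` positions after it
// (default 1). Returns null if the label or the target fragment is missing.
function textAfter(page, label, offset) {
  var step = offset === undefined ? 1 : offset
  for (var i = 0; i < page.length; i++) {
    if (page[i].indexOf(label) !== -1) {
      var target = page[i + step]
      return target === undefined ? null : target
    }
  }
  return null
}

// Fragments taken from the W-4 example output
var page = [
  'Exemption from withholding. ',
  'If you are',
  'exempt, ',
  'complete '
]

var found = textAfter(page, 'Exemption from withholding')
var missing = textAfter(page, 'No such label')
console.log(found, missing)
```

The match is a case-sensitive substring test, so trailing spaces in the extracted strings do not prevent a label from being found.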

pdf2text(Buffer bufferOfPdfContents): Promise.

Optionally pass a buffer of pdf data instead of a path to the file.
