# BFJ

Big-Friendly JSON. Asynchronous streaming functions for large JSON data sets.
- Why would I want those?
- Is it fast?
- What functions does it implement?
- How do I install it?
- How do I read a JSON file?
- How do I parse a stream of JSON?
- How do I selectively parse individual items from a JSON stream?
- How do I write a JSON file?
- How do I create a stream of JSON?
- How do I create a JSON string?
- What other methods are there?
- Is it possible to pause parsing or serialisation from calling code?
- Can it handle newline-delimited JSON (NDJSON)?
- Why does it default to bluebird promises?
- Can I specify a different promise implementation?
- Is there a change log?
- How do I set up the dev environment?
- What versions of Node.js does it support?
- What license is it released under?
## Why would I want those?

If you need to parse huge JSON strings or stringify huge JavaScript data sets, doing so synchronously monopolises the event loop and can lead to out-of-memory exceptions. BFJ implements asynchronous functions and uses pre-allocated fixed-length arrays to try to alleviate those issues.

## Is it fast?

No. BFJ yields frequently to avoid monopolising the event loop, interrupting its own execution to let other event handlers run. The frequency of those yields can be controlled with the `yieldRate` option, but fundamentally it is not designed for speed.

Furthermore, when serialising data to a stream, BFJ uses a fixed-length buffer to avoid exhausting available memory. Whenever that buffer is full, serialisation is paused until the receiving stream processes some more data, regardless of the value of `yieldRate`. You can control the size of the buffer using the `bufferLength` option but really, if you need quick results, BFJ is not for you.

## What functions does it implement?

Nine functions are exported.

Five are concerned with parsing, or turning JSON strings into JavaScript data:

- `read` asynchronously parses a JSON file from disk.
- `parse` and `unpipe` are for asynchronously parsing streams of JSON.
- `match` selectively parses individual items from a JSON stream.
- `walk` asynchronously walks a stream, emitting events as it encounters JSON tokens. Analogous to a SAX parser.

The other four functions handle the reverse transformations, serialising JavaScript data to JSON:

- `write` asynchronously serialises data to a JSON file on disk.
- `streamify` asynchronously serialises data to a stream of JSON.
- `stringify` asynchronously serialises data to a JSON string.
- `eventify` asynchronously traverses a data structure depth-first, emitting events as it encounters items. By default it coerces promises, buffers and iterables to JSON-friendly values.
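As a sketch of the yielding technique described under "Is it fast?" (illustrative only, not bfj's actual code), here is a function that processes a large array in chunks, handing control back to the event loop every `yieldRate` items so other handlers get a chance to run:

```javascript
// Hypothetical sketch: sum a large array without blocking the event loop.
// Every `yieldRate` items, execution is deferred with setImmediate so
// other queued event handlers can run before the next chunk is processed.
function sumWithYields (items, yieldRate, callback) {
  let total = 0, index = 0;

  function step () {
    const limit = Math.min(index + yieldRate, items.length);
    while (index < limit) {
      total += items[index++];
    }
    if (index < items.length) {
      setImmediate(step); // yield to the event loop, then continue
    } else {
      callback(total);
    }
  }

  step();
}
```

A larger `yieldRate` means fewer interruptions (faster, but less responsive); a smaller one means the opposite, which is the trade-off the `yieldRate` option exposes.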
## How do I install it?

If you're using npm:

```
npm i bfj --save
```

Or if you just want the git repo:

```
git clone git@gitlab.com:philbooth/bfj.git
```
## How do I read a JSON file?

```js
const bfj = require('bfj');

bfj.read(path, options)
  .then(data => {
    // :)
  })
  .catch(error => {
    // :(
  });
```

`read` returns a bluebird promise and asynchronously parses a JSON file from disk. It takes two arguments: the path to the JSON file and an options object.

If there are no syntax errors, the returned promise is resolved with the parsed data. If syntax errors occur, the promise is rejected with the first error.
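Among the parsing options is a `reviver`, invoked in the same way as the `reviver` argument to `JSON.parse`. To illustrate what a reviver does, using the built-in (the ISO-date handling here is a hypothetical example, not something bfj does by default):

```javascript
// A reviver transforms each parsed value before it is returned.
// Here, strings that look like ISO timestamps become Date objects.
const isoDate = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z$/;

const reviver = (key, value) =>
  typeof value === 'string' && isoDate.test(value) ? new Date(value) : value;

const parsed = JSON.parse('{"when":"2020-01-01T00:00:00Z","n":1}', reviver);
// parsed.when is now a Date instance; parsed.n is untouched
```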
## How do I parse a stream of JSON?

```js
const bfj = require('bfj');

// By passing a readable stream to bfj.parse():
bfj.parse(fs.createReadStream(path), options)
  .then(data => {
    // :)
  })
  .catch(error => {
    // :(
  });

// ...or by passing the result from bfj.unpipe() to stream.pipe():
request({ url }).pipe(bfj.unpipe((error, data) => {
  if (error) {
    // :(
  } else {
    // :)
  }
}));
```

`parse` returns a bluebird promise and asynchronously parses a stream of JSON. It takes two arguments: a readable stream from which the JSON will be parsed and an options object.

If there are no syntax errors, the returned promise is resolved with the parsed data. If syntax errors occur, the promise is rejected with the first error.

`unpipe` returns a writable stream that can be passed to `stream.pipe`, then parses JSON data read from the stream. It takes two arguments: a callback function that will be called after parsing is complete and an options object.

If there are no errors, the callback is invoked with the result as the second argument. If errors occur, the first error is passed to the callback as the first argument.
## How do I selectively parse individual items from a JSON stream?

```js
const bfj = require('bfj');

// Call match with your stream and a selector predicate/regex/JSONPath/string
const dataStream = bfj.match(jsonStream, selector, options);

// Get data out of the returned stream with event handlers
dataStream.on('data', item => { /* ... */ });
dataStream.on('end', () => { /* ... */ });
dataStream.on('error', () => { /* ... */ });
dataStream.on('dataError', () => { /* ... */ });

// ...or you can pipe it to another stream
dataStream.pipe(someOtherStream);
```
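The predicate form of the selector is called with `key`, `value` and `depth` for each candidate item, and any truthy return value marks a match. A sketch of such a predicate (the data shape here is hypothetical):

```javascript
// Match objects that have an `id` property, but only below the root level.
const selector = (key, value, depth) =>
  depth > 0 && value !== null && typeof value === 'object' && 'id' in value;

// With a stream of [{"id":1},{"name":"x"}], only {"id":1} would match:
// const dataStream = bfj.match(jsonStream, selector, options);
```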
`match` returns a readable, object-mode stream and asynchronously parses individual matching items from an input JSON stream. It takes three arguments: a readable stream from which the JSON will be parsed; a selector argument for determining matches, which may be a string, a regular expression, a JSONPath expression or a predicate function; and an options object.

If the selector is a string, it will be compared to property keys to determine whether each item in the data is a match. If it is a regular expression, the comparison will be made by calling the RegExp `test` method with the property key. If it is a JSONPath expression, it must start with `$.` to identify the root node and only use child scope expressions for subsequent nodes. Predicate functions will be called with three arguments: `key`, `value` and `depth`. If the result of the predicate is a truthy value, the item will be deemed a match.

In addition to the regular options accepted by other parsing functions, you can also specify `minDepth` to only apply the selector to certain depths. This can improve performance and memory usage if you know that you're not interested in parsing top-level items.

If there are any syntax errors in the JSON, a `dataError` event will be emitted. If any other errors occur, an `error` event will be emitted.

## How do I write a JSON file?
```js
const bfj = require('bfj');

bfj.write(path, data, options)
  .then(() => {
    // :)
  })
  .catch(error => {
    // :(
  });
```

`write` returns a bluebird promise and asynchronously serialises a data structure to a JSON file on disk. The promise is resolved when the file has been written, or rejected with the error if writing failed. It takes three arguments: the path to the JSON file, the data structure to serialise and an options object.
## How do I create a stream of JSON?

```js
const bfj = require('bfj');

const stream = bfj.streamify(data, options);

// Get data out of the stream with event handlers
stream.on('data', chunk => { /* ... */ });
stream.on('end', () => { /* ... */ });
stream.on('error', () => { /* ... */ });
stream.on('dataError', () => { /* ... */ });

// ...or you can pipe it to another stream
stream.pipe(someOtherStream);
```

`streamify` returns a readable stream and asynchronously serialises a data structure to JSON, pushing the result to the returned stream. It takes two arguments: the data structure to serialise and an options object.

If a circular reference is encountered in the data and `options.circular` is not set to `'ignore'`, a `dataError` event will be emitted. If any other errors occur, an `error` event will be emitted.

## How do I create a JSON string?

```js
const bfj = require('bfj');

bfj.stringify(data, options)
  .then(json => {
    // :)
  })
  .catch(error => {
    // :(
  });
```

`stringify` returns a bluebird promise and asynchronously serialises a data structure to a JSON string. The promise is resolved to the JSON string when serialisation is complete. It takes two arguments: the data structure to serialise and an options object.
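The serialisation functions accept a `space` option that behaves broadly like the `space` argument to `JSON.stringify`. Illustrating the effect with the built-in:

```javascript
// Without space, output is compact; with space set to 2, nested
// structures are pretty-printed with two-space indentation.
const compact = JSON.stringify({ a: [1, 2] });          // '{"a":[1,2]}'
const pretty = JSON.stringify({ a: [1, 2] }, null, 2);
// pretty === '{\n  "a": [\n    1,\n    2\n  ]\n}'
```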
## What other methods are there?

### bfj.walk (stream, options)

```js
const bfj = require('bfj');

const emitter = bfj.walk(fs.createReadStream(path), options);

emitter.on(bfj.events.array, () => { /* ... */ });
emitter.on(bfj.events.object, () => { /* ... */ });
emitter.on(bfj.events.property, name => { /* ... */ });
emitter.on(bfj.events.string, value => { /* ... */ });
emitter.on(bfj.events.number, value => { /* ... */ });
emitter.on(bfj.events.literal, value => { /* ... */ });
emitter.on(bfj.events.endArray, () => { /* ... */ });
emitter.on(bfj.events.endObject, () => { /* ... */ });
emitter.on(bfj.events.error, error => { /* ... */ });
emitter.on(bfj.events.dataError, error => { /* ... */ });
emitter.on(bfj.events.end, () => { /* ... */ });
```

`walk` returns an event emitter and asynchronously walks a stream of JSON data, emitting events as it encounters tokens. It takes two arguments: a readable stream from which the JSON will be read and an options object.
The emitted events are defined as public properties of an object, `bfj.events`:

- `bfj.events.array` is emitted when a `[` character is encountered.
- `bfj.events.endArray` is emitted when a `]` character is encountered.
- `bfj.events.object` is emitted when a `{` character is encountered.
- `bfj.events.endObject` is emitted when a `}` character is encountered.
- `bfj.events.property` is emitted when a property name is encountered; the listener will be passed the name as its argument.
- `bfj.events.string` is emitted when a string has been encountered; the listener will be passed the value as its argument.
- `bfj.events.number` is emitted when a number has been encountered; the listener will be passed the value as its argument.
- `bfj.events.literal` is emitted when a JSON literal (`true`, `false` or `null`) has been encountered. The listener will be passed the value as its argument.
- `bfj.events.error` is emitted when an error occurs; the listener will be passed an `Error` instance as its argument.
- `bfj.events.dataError` is emitted when a syntax error is encountered in the JSON; the listener will be passed an `Error` instance decorated with `actual`, `expected`, `lineNumber` and `columnNumber` properties as its argument.
- `bfj.events.end` is emitted when the end of the data is reached.
- `bfj.events.endLine` is emitted when a root-level newline is encountered, if the `ndjson` option is set.

If you are using `bfj.walk` to sequentially parse items in an array, you might also be interested in the bfj-collections module.

### bfj.eventify (data, options)
```js
const bfj = require('bfj');

const emitter = bfj.eventify(data, options);

emitter.on(bfj.events.array, () => { /* ... */ });
emitter.on(bfj.events.object, () => { /* ... */ });
emitter.on(bfj.events.property, name => { /* ... */ });
emitter.on(bfj.events.string, value => { /* ... */ });
emitter.on(bfj.events.number, value => { /* ... */ });
emitter.on(bfj.events.literal, value => { /* ... */ });
emitter.on(bfj.events.endArray, () => { /* ... */ });
emitter.on(bfj.events.endObject, () => { /* ... */ });
emitter.on(bfj.events.error, error => { /* ... */ });
emitter.on(bfj.events.dataError, error => { /* ... */ });
emitter.on(bfj.events.end, () => { /* ... */ });
```
`eventify` returns an event emitter and asynchronously traverses a data structure depth-first, emitting events as it encounters items. By default it coerces promises, buffers and iterables to JSON-friendly values. It takes two arguments: the data structure to traverse and an options object.

The emitted events are defined as public properties of an object, `bfj.events`:

- `bfj.events.array` is emitted when an array is encountered.
- `bfj.events.endArray` is emitted when the end of an array is encountered.
- `bfj.events.object` is emitted when an object is encountered.
- `bfj.events.endObject` is emitted when the end of an object is encountered.
- `bfj.events.property` is emitted when a property name is encountered; the listener will be passed the name as its argument.
- `bfj.events.string` is emitted when a string has been encountered; the listener will be passed the value as its argument.
- `bfj.events.number` is emitted when a number has been encountered; the listener will be passed the value as its argument.
- `bfj.events.literal` is emitted when a JSON literal (`true`, `false` or `null`) has been encountered. The listener will be passed the value as its argument.
- `bfj.events.error` is emitted when an error occurs; the listener will be passed an `Error` instance as its argument.
- `bfj.events.dataError` is emitted when a circular reference is encountered and the `circular` option was not set to `'ignore'`. The listener will be passed an `Error` instance as its argument.
- `bfj.events.end` is emitted when the end of the data is reached.
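To see the order in which these events fire for a small structure, here is a minimal depth-first traversal sketch. It records event names into an array rather than emitting them, and it is only an illustration of the event sequence, not bfj's implementation:

```javascript
// Record walk-style events for a small data structure, depth-first.
function traverse (value, events) {
  if (Array.isArray(value)) {
    events.push('array');
    value.forEach(item => traverse(item, events));
    events.push('endArray');
  } else if (value !== null && typeof value === 'object') {
    events.push('object');
    for (const [key, child] of Object.entries(value)) {
      events.push(`property:${key}`);
      traverse(child, events);
    }
    events.push('endObject');
  } else if (typeof value === 'string') {
    events.push(`string:${value}`);
  } else if (typeof value === 'number') {
    events.push(`number:${value}`);
  } else {
    events.push(`literal:${value}`); // true, false or null
  }
}

const events = [];
traverse({ foo: [1, 'bar', true] }, events);
// events: ['object', 'property:foo', 'array', 'number:1',
//          'string:bar', 'literal:true', 'endArray', 'endObject']
```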
## What options can I specify?

### Options for parsing functions

- `options.reviver`: a transformation function, invoked in the same way as the `reviver` argument to `JSON.parse`.
- `options.yieldRate`: the number of data items to process per timeslice before yielding to the event loop. The default value is `16384`.
- `options.Promise`: a promise constructor to use instead of bluebird promises.
- `options.ndjson`: if set to `true`, newline characters at the root level will be treated as delimiters between discrete chunks of JSON. See NDJSON for more information.
- `options.numbers`: for `bfj.match` only, set this to `true` if you wish to match against numbers with a string or regular expression `selector` argument.
- `options.bufferLength`: for `bfj.match` only, the length of the match buffer. Smaller values use less memory but may result in a slower parse time. The default value is `1024`.
- `options.highWaterMark`: for `bfj.match` only, set this if you would like to pass a value for the `highWaterMark` option to the readable stream constructor.

### Options for serialisation functions

- `options.space`: the indentation to use, in the same way as the `space` argument to `JSON.stringify`.
- `options.promises`: set this to `'ignore'` for improved performance if you don't need to coerce promises.
- `options.buffers`: by default, buffers are coerced using their `toString` method. Set this property to `'ignore'` for improved performance if you don't need to coerce buffers.
- `options.maps`: set this to `'ignore'` for improved performance if you don't need to coerce maps.
- `options.iterables`: set this to `'ignore'` for improved performance if you don't need to coerce iterables.
- `options.circular`: set this to `'ignore'` if you'd prefer to silently skip past circular references in the data.
- `options.bufferLength`: the length of the serialisation buffer. The default value is `1024`.
- `options.highWaterMark`: set this if you would like to pass a value for the `highWaterMark` option to the readable stream constructor.
- `options.yieldRate`: the number of data items to process per timeslice before yielding to the event loop. The default value is `16384`.
- `options.Promise`: a promise constructor to use instead of bluebird promises.
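Putting a few of these together, a serialisation options object might look like this (the values shown are either the documented defaults or illustrative choices):

```javascript
// Example options for a serialisation call such as bfj.write or
// bfj.streamify; every value here is optional.
const options = {
  space: 2,            // pretty-print with two-space indentation
  promises: 'ignore',  // don't coerce promises
  circular: 'ignore',  // silently skip circular references
  bufferLength: 1024,  // serialisation buffer length (the default)
  yieldRate: 16384     // items processed per timeslice (the default)
};
```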
## Is it possible to pause parsing or serialisation from calling code?

Yes it is! Both `walk` and `eventify` decorate their returned event emitters with a `pause` method that will prevent any further events being emitted. The `pause` method itself returns a `resume` function that you can call to indicate that processing should continue.

For example:

```js
const bfj = require('bfj');
const emitter = bfj.walk(fs.createReadStream(path), options);

// Later, when you want to pause parsing:
const resume = emitter.pause();

// Then when you want to resume:
resume();
```
## Can it handle newline-delimited JSON (NDJSON)?

Yes. If you pass the `ndjson` option to `bfj.walk`, `bfj.match` or `bfj.parse`, newline characters at the root level will act as delimiters between discrete JSON values:

- `bfj.walk` will emit a `bfj.events.endLine` event.
- `bfj.match` will just ignore the newlines.
- `bfj.parse` will resolve with the first value; when the data is exhausted, it resolves to `undefined` (`undefined` is not a valid JSON token).

`bfj.unpipe` and `bfj.read` will not parse NDJSON.

## Why does it default to bluebird promises?
Until version `4.2.4`, native promises were used. But they were found to cause out-of-memory errors when serialising large amounts of data to JSON, due to well-documented problems with the native promise implementation. So in version `5.0.0`, bluebird promises were used instead. In version `5.1.0`, an option was added that enables callers to specify the promise constructor to use. Use it at your own risk.

## Can I specify a different promise implementation?
Yes. Just pass the `Promise` option to any method. If you get out-of-memory errors when using that option, consider changing your promise implementation.

## Is there a change log?

Yes, see the history file.

## How do I set up the dev environment?
The development environment relies on Node.js, ESLint, Mocha, Chai, Proxyquire and Spooks. Assuming that you already have node and npm set up, you just need to run `npm install` to install all of the dependencies as listed in `package.json`.

You can lint the code with the command `npm run lint`.

You can run the tests with the command `npm test`.
## What versions of Node.js does it support?

As of version `8.0.0`, only Node.js versions 18 or greater are supported. Between versions `3.0.0` and `6.1.2`, only Node.js versions 6 or greater were supported. Until version `2.1.2`, only Node.js versions 4 or greater were supported.