@gmod/cram
Read CRAM files (indexed or unindexed) with pure JavaScript; works in Node.js or in the browser.
- Reads CRAM 3.x and 2.x (3.1 added in v1.6.0)
- Does not read CRAM 1.x
- Can use .crai indexes out of the box, for efficient sequence fetching, but
- Has preliminary support for bzip2 and lzma codecs. lzma requires the latest
Install
$ npm install --save @gmod/cram
# or
$ yarn add @gmod/cram
Usage
const { IndexedCramFile, CramFile, CraiIndex } = require('@gmod/cram')
// Use indexedfasta library for seqFetch, if using local file (see below)
const { IndexedFasta, BgzipIndexedFasta } = require('@gmod/indexedfasta')
// this uses local file paths on node.js for IndexedFasta; for remote URLs,
// see the indexedfasta docs on filehandles and
// https://github.com/gmod/generic-filehandle
const t = new IndexedFasta({
path: '/filesystem/yourfile.fa',
faiPath: '/filesystem/yourfile.fa.fai',
})
// example of fetching records from an indexed CRAM file.
// NOTE: only numeric IDs for the reference sequence are accepted.
// The numeric ID is the order in which the sequence names appear in the
// @SQ lines of the CRAM's SAM header (the mapping is built below)
// Wrap in an async function and then run it
const run = async () => {
const idToName = []
const nameToId = {}
// example opening local files on node.js
// can also pass `cramUrl` (for the IndexedCramFile class), and `url` (for
// the CraiIndex) params to open remote URLs
//
// alternatively `cramFilehandle` (for the IndexedCramFile class) and
// `filehandle` (for the CraiIndex) can be used, see for examples
// https://github.com/gmod/generic-filehandle
const indexedFile = new IndexedCramFile({
cramPath: '/filesystem/yourfile.cram',
//or
//cramUrl: 'url/to/file.cram'
//cramFilehandle: a generic-filehandle or similar filehandle
index: new CraiIndex({
path: '/filesystem/yourfile.cram.crai',
// or
// url: 'url/to/file.cram.crai'
// filehandle: a generic-filehandle or similar filehandle
}),
seqFetch: async (seqId, start, end) => {
// note:
// * seqFetch should return a promise for a string, in this instance retrieved from IndexedFasta
// * we use start-1 because cram-js uses 1-based but IndexedFasta uses 0-based coordinates
// * the seqId is a numeric identifier, so we convert it back to a name with idToName
// * you can return an empty string from this function for testing if you want, but you may not get proper interpretation of record.readFeatures
return t.getSequence(idToName[seqId], start - 1, end)
},
checkSequenceMD5: false,
})
const samHeader = await indexedFile.cram.getSamHeader()
// use the @SQ lines in the header to figure out the
// mapping between reference ID numbers and names
const sqLines = samHeader.filter(l => l.tag === 'SQ')
sqLines.forEach((sqLine, refId) => {
sqLine.data.forEach(item => {
if (item.tag === 'SN') {
// this is the ref name
const refName = item.value
nameToId[refName] = refId
idToName[refId] = refName
}
})
})
const records = await indexedFile.getRecordsForRange(
nameToId['chr1'],
10000,
20000,
)
records.forEach(record => {
console.log(`got a record named ${record.readName}`)
if (record.readFeatures !== undefined) {
record.readFeatures.forEach(({ code, pos, refPos, ref, sub }) => {
// process the read features. this can be used similar to
// CIGAR/MD strings in SAM. see CRAM specs for more details.
if (code === 'X') {
console.log(
`${record.readName} shows a base substitution of ${ref}->${sub} at ${refPos}`,
)
}
})
}
})
}
run().catch(console.error)
You can also use cram-js without NPM via cram-bundle.js; see the example directory for usage with a script tag.
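The @SQ-to-ID mapping in the usage example can be factored into a small reusable function. The sketch below is a hypothetical helper (`buildRefMaps` is not part of @gmod/cram); it assumes the parsed header has the same `{ tag, data: [{ tag, value }] }` shape that the usage example relies on, and it uses a hand-written header with made-up values so that it runs on its own.

```javascript
// Hypothetical helper (not part of @gmod/cram): builds the numeric-ID <-> name
// mapping from a parsed SAM header, ASSUMING the shape used in the usage
// example: an array of { tag, data: [{ tag, value }] } objects.
function buildRefMaps(samHeader) {
  const idToName = []
  const nameToId = {}
  // Only @SQ lines count; their order defines the numeric reference ID
  const sqLines = samHeader.filter(line => line.tag === 'SQ')
  sqLines.forEach((sqLine, refId) => {
    const sn = sqLine.data.find(item => item.tag === 'SN')
    if (sn) {
      idToName[refId] = sn.value
      nameToId[sn.value] = refId
    }
  })
  return { idToName, nameToId }
}

// Example with a hand-written header (made-up values)
const exampleHeader = [
  { tag: 'HD', data: [{ tag: 'VN', value: '1.6' }] },
  { tag: 'SQ', data: [{ tag: 'SN', value: 'chr1' }, { tag: 'LN', value: '248956422' }] },
  { tag: 'SQ', data: [{ tag: 'SN', value: 'chr2' }, { tag: 'LN', value: '242193529' }] },
]
const { idToName, nameToId } = buildRefMaps(exampleHeader)
console.log(nameToId.chr2) // 1
console.log(idToName[0]) // chr1
```

The @HD line is skipped because only @SQ lines define reference IDs, so `chr1` gets ID 0 even though it is the second header line.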
API (auto-generated)
- CramRecord - format of CRAM records returned by this API
- IndexedCramFile - indexed access into a CRAM file
- CramFile - .cram API
- CraiIndex - .crai index API
- Error Classes - special error classes thrown by this API
CramRecord
Table of Contents
- isPaired
- isProperlyPaired
- isSegmentUnmapped
- isMateUnmapped
- isReverseComplemented
- isMateReverseComplemented
- isRead1
- isRead2
- isSecondary
- isFailedQc
- isDuplicate
- isSupplementary
- isDetached
- hasMateDownStream
- isPreservingQualityScores
- isUnknownBases
- getReadBases
- getPairOrientation
- addReferenceSequence
CramRecord
Class of each CRAM record returned by this API.
isPaired
Returns boolean true if the read is paired, regardless of whether both segments are mapped
isProperlyPaired
Returns boolean true if the read is paired, and both segments are mapped
isSegmentUnmapped
Returns boolean true if the read itself is unmapped; conflictive with isProperlyPaired
isMateUnmapped
Returns boolean true if the read's mate is unmapped; conflictive with isProperlyPaired
isReverseComplemented
Returns boolean true if the read is mapped to the reverse strand
isMateReverseComplemented
Returns boolean true if the mate is mapped to the reverse strand
isRead1
Returns boolean true if this is read number 1 in a pair
isRead2
Returns boolean true if this is read number 2 in a pair
isSecondary
Returns boolean true if this is a secondary alignment
isFailedQc
Returns boolean true if this read has failed QC checks
isDuplicate
Returns boolean true if the read is an optical or PCR duplicate
isSupplementary
Returns boolean true if this is a supplementary alignment
isDetached
Returns boolean true if the read is detached
hasMateDownStream
Returns boolean true if the read has a mate in this same CRAM segment
isPreservingQualityScores
Returns boolean true if the read contains quality scores
isUnknownBases
Returns boolean true if the read has no sequence bases
getReadBases
Get the original sequence of this read.
Returns String sequence basepairs
getPairOrientation
Get the pair orientation of a paired read. Adapted from igv.js.
Returns String of paired orientation
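Most of the boolean methods above correspond to bits of the SAM FLAG field. As an illustration only, the sketch below decodes a SAM FLAG integer using the bit values from the SAM specification; `decodeSamFlags` is a hypothetical helper, not how @gmod/cram computes these values. isDetached, hasMateDownStream, isPreservingQualityScores and isUnknownBases appear to correspond to CRAM-specific flags rather than SAM FLAG bits, so they are left out.

```javascript
// Illustrative only: decodes a SAM FLAG integer into the booleans that the
// CramRecord flag methods describe. Bit values are from the SAM specification;
// this is NOT @gmod/cram's own implementation.
const SAM_FLAG_BITS = {
  isPaired: 0x1,
  isProperlyPaired: 0x2,
  isSegmentUnmapped: 0x4,
  isMateUnmapped: 0x8,
  isReverseComplemented: 0x10,
  isMateReverseComplemented: 0x20,
  isRead1: 0x40,
  isRead2: 0x80,
  isSecondary: 0x100,
  isFailedQc: 0x200,
  isDuplicate: 0x400,
  isSupplementary: 0x800,
}

function decodeSamFlags(flags) {
  const result = {}
  for (const [name, bit] of Object.entries(SAM_FLAG_BITS)) {
    // A flag is set when its bit is present in the integer
    result[name] = (flags & bit) !== 0
  }
  return result
}

// 99 = 0x1 + 0x2 + 0x20 + 0x40: paired, properly paired, mate reverse, read 1
const decoded = decodeSamFlags(99)
console.log(decoded.isRead1, decoded.isReverseComplemented) // true false
```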
addReferenceSequence
Annotates this record with the given reference sequence basepair information. This adds a `sub` and a `ref` item to base substitution read features, giving the actual substituted and reference base pairs, and makes the getReadBases() method work.
Parameters
- refRegion
- refRegion.start **[number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)**
- refRegion.end **[number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)**
- refRegion.seq **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)**
- compressionScheme CramContainerCompressionScheme
Returns undefined (nothing)
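A minimal sketch of the reference lookup step, under the assumption that `refRegion.start` is the 1-based reference coordinate of `refRegion.seq[0]` (consistent with the 1-based `refPos` of read features). `annotateRefBases` is hypothetical. Decoding the substituted base itself (`sub`) presumably needs the container's substitution matrix, which is why a compression scheme is passed; that step is omitted here.

```javascript
// Sketch under stated assumptions (not the library's code): set `ref` on each
// base substitution feature ('X') from a refRegion.
// ASSUMPTION: refRegion.start is the 1-based reference coordinate of
// refRegion.seq[0], and feature.refPos is 1-based as documented.
function annotateRefBases(readFeatures, refRegion) {
  for (const feature of readFeatures) {
    if (feature.code !== 'X') continue
    const offset = feature.refPos - refRegion.start
    // Only annotate when the position falls inside the supplied region
    if (offset >= 0 && offset < refRegion.seq.length) {
      feature.ref = refRegion.seq.charAt(offset)
    }
  }
  return readFeatures
}

// Made-up example data
const refRegion = { start: 1001, end: 1010, seq: 'ACGTACGTAC' }
const features = [
  { code: 'X', pos: 3, refPos: 1003 },
  { code: 'I', pos: 5, refPos: 1005, data: 'GG' },
]
annotateRefBases(features, refRegion)
console.log(features[0].ref) // G
```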
ReadFeatures
The feature objects appearing in the readFeatures member of CramRecord objects that show insertions, deletions, substitutions, etc.
Static fields
- code (character): one of "bqBXIDiQNSPH". See page 15 of the CRAM v3 spec
- data (any): the data associated with the feature. The format of this
- pos (number): location relative to the read (1-based)
- refPos (number): location relative to the reference (1-based)
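The `code` field is enough for a quick per-read summary. The helper below is a hypothetical sketch (not part of the package) that tallies features by code, naming them according to the read feature codes of the CRAM v3 specification; it tolerates `readFeatures` being undefined, which the usage example also guards against.

```javascript
// Illustrative helper (not part of @gmod/cram): counts read features by code.
// Names follow the read feature codes in the CRAM v3 specification.
const FEATURE_NAMES = {
  b: 'bases', q: 'quality scores', B: 'base and quality score',
  X: 'substitution', I: 'insertion', D: 'deletion',
  i: 'single-base insertion', Q: 'single quality score',
  N: 'reference skip', S: 'soft clip', P: 'padding', H: 'hard clip',
}

function summarizeReadFeatures(readFeatures = []) {
  const counts = {}
  for (const { code } of readFeatures) {
    const name = FEATURE_NAMES[code] || `unknown (${code})`
    counts[name] = (counts[name] || 0) + 1
  }
  return counts
}

// Made-up features in the documented { code, pos, refPos } shape
const summary = summarizeReadFeatures([
  { code: 'S', pos: 1, refPos: 100 },
  { code: 'X', pos: 10, refPos: 109 },
  { code: 'X', pos: 20, refPos: 119 },
  { code: 'D', pos: 30, refPos: 129 },
])
console.log(summary) // { 'soft clip': 1, substitution: 2, deletion: 1 }
```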
IndexedCramFile
Table of Contents
- constructor
- getRecordsForRange
- hasDataForReferenceSequence
constructor
Parameters
- args
- args.cram CramFile
- args.index Index-like object that supports `getEntriesForRange(seqId, start, end) -> Promise[Array[index entries]]`
- args.cacheSize **[number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)?** optional maximum number of CRAM records to cache. Default 20,000
- args.fetchSizeLimit **[number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)?** optional maximum number of bytes to fetch in a single getRecordsForRange call. Default 3 MiB.
- args.checkSequenceMD5 **[boolean](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Boolean)?** default true. If false, disables verifying the MD5 checksum of the reference sequence underlying a slice. In some applications, this check can cause an inconvenient amount (many megabases) of sequence to be fetched.
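Because `args.index` only needs to be index-like, a custom index can be supplied. The class below is a hypothetical in-memory sketch that implements just the documented `getEntriesForRange(seqId, start, end)` contract, returning entries in the shape documented for CraiIndex. The overlap test assumes 1-based inclusive query coordinates and entries covering `start` through `start + span - 1`. The library may call other index methods as well (for example `hasDataForReferenceSequence`), so treat this as a starting point rather than a drop-in replacement.

```javascript
// Hypothetical in-memory index (not part of @gmod/cram) satisfying the
// documented contract: getEntriesForRange(seqId, start, end) returns a Promise
// for an array of index entries. All numbers below are made up.
class InMemoryIndex {
  constructor(entriesBySeqId) {
    this.entriesBySeqId = entriesBySeqId
  }

  async getEntriesForRange(seqId, start, end) {
    const entries = this.entriesBySeqId[seqId] || []
    // ASSUMPTION: entry covers start..start+span-1, query is inclusive
    return entries.filter(e => e.start <= end && e.start + e.span > start)
  }
}

const index = new InMemoryIndex({
  0: [
    { start: 1, span: 5000, containerStart: 100, sliceStart: 20, sliceBytes: 4000 },
    { start: 5001, span: 5000, containerStart: 4200, sliceStart: 20, sliceBytes: 3500 },
  ],
})
index.getEntriesForRange(0, 4500, 4800).then(hits => {
  console.log(hits.length) // 1
})
```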
getRecordsForRange
Parameters
- seq
- start
- end
- opts (optional, default {})
hasDataForReferenceSequence
Parameters
- seqId
Returns Promise true if the CRAM file contains data for the given reference sequence numerical ID
CramFile
Table of Contents
- constructor
- containerCount
constructor
Parameters
- args
- args.filehandle **[object](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Object)?** a filehandle that implements the stat() and read() methods of the Node filehandle API <https://nodejs.org/api/fs.html#fs_class_filehandle>
- args.path **[object](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Object)?** path to the cram file
- args.url **[object](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Object)?** url for the cram file. Also supports file:// urls for local files
- args.seqFetch **[function](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Statements/function)?** a function with signature `(seqId, startCoordinate, endCoordinate)` that returns a promise for a string of sequence bases
- args.cacheSize **[number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)?** optional maximum number of CRAM records to cache. Default 20,000
- args.checkSequenceMD5 **[boolean](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Boolean)?** default true. If false, disables verifying the MD5 checksum of the reference sequence underlying a slice. In some applications, this check can cause an inconvenient amount (many megabases) of sequence to be fetched.
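For tests or small references that are already in memory, `seqFetch` can be a simple closure. The sketch below assumes a hypothetical `Map` from numeric sequence ID to bases and treats `startCoordinate`/`endCoordinate` as 1-based and inclusive, matching the `start - 1` conversion in the usage example.

```javascript
// Minimal seqFetch sketch, ASSUMING reference sequences are already in memory
// in a (hypothetical) Map keyed by numeric sequence ID. Coordinates are
// treated as 1-based and inclusive.
function makeInMemorySeqFetch(sequencesById) {
  return async (seqId, start, end) => {
    const seq = sequencesById.get(seqId)
    if (seq === undefined) {
      // Unknown reference: an empty string works, but readFeatures
      // may then not be interpreted properly
      return ''
    }
    // 1-based inclusive [start, end] -> 0-based half-open [start - 1, end)
    return seq.slice(Math.max(0, start - 1), end)
  }
}

const seqFetch = makeInMemorySeqFetch(new Map([[0, 'ACGTACGTAC']]))
seqFetch(0, 3, 6).then(bases => console.log(bases)) // GTAC
```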
containerCount
CraiIndex
Table of Contents
- constructor
- hasDataForReferenceSequence
- getEntriesForRange
constructor
Parameters
- args
- args.path **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)?**
- args.url **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)?**
- args.filehandle FileHandle?
hasDataForReferenceSequence
Parameters
- seqId
Returns Promise true if the index contains entries for the given reference sequence ID, false otherwise
getEntriesForRange
Fetch index entries for the given range.
Parameters
- seqId
- queryStart
- queryEnd
Returns Promise for an array of objects of the form `{start, span, containerStart, sliceStart, sliceBytes}`
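The `sliceBytes` field gives a rough idea of how much data a query will touch. The sketch below sums it over a set of entries and compares the total with a byte budget such as IndexedCramFile's documented `fetchSizeLimit` default of 3 MiB. The helper names are hypothetical, the entries are made up, and whether the library applies its limit in exactly this way is an assumption.

```javascript
// Illustrative only: estimate the slice bytes a query touches by summing
// sliceBytes over index entries, then compare against a byte budget.
// ASSUMPTION: this mirrors how a fetch size limit might be checked; it is
// not taken from the library.
const THREE_MIB = 3 * 1024 * 1024

function estimateFetchBytes(entries) {
  return entries.reduce((total, entry) => total + entry.sliceBytes, 0)
}

function fitsFetchLimit(entries, limit = THREE_MIB) {
  return estimateFetchBytes(entries) <= limit
}

// Made-up entries in the documented { start, span, containerStart, sliceStart, sliceBytes } shape
const entries = [
  { start: 1, span: 10000, containerStart: 26, sliceStart: 350, sliceBytes: 1500000 },
  { start: 10001, span: 10000, containerStart: 1500400, sliceStart: 350, sliceBytes: 2000000 },
]
console.log(estimateFetchBytes(entries), fitsFetchLimit(entries)) // 3500000 false
```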
CramUnimplementedError
Extends Error
Error caused by encountering a part of the CRAM spec that has not yet been implemented
CramMalformedError
Extends CramError
An error caused by malformed data.
CramBufferOverrunError
Extends CramMalformedError
An error caused by attempting to read beyond the end of the defined data.
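The subclass relationships matter when handling errors: a CramBufferOverrunError is also a CramMalformedError, so the more specific check has to come first. The sketch below defines local stand-in classes mirroring the documented hierarchy (CramError is assumed to extend Error) so that it runs without the package; `describeCramFailure` is hypothetical, and in real code you would check against the library's own classes, assuming they are exported.

```javascript
// Local stand-ins mirroring the documented hierarchy so this sketch runs
// without the package. ASSUMPTION: CramError itself extends Error.
class CramError extends Error {}
class CramUnimplementedError extends Error {}
class CramMalformedError extends CramError {}
class CramBufferOverrunError extends CramMalformedError {}

// Hypothetical policy: classify a failure from reading a CRAM file
function describeCramFailure(err) {
  // Most specific first: a buffer overrun is also a CramMalformedError
  if (err instanceof CramBufferOverrunError) return 'truncated or corrupt data'
  if (err instanceof CramMalformedError) return 'malformed data'
  if (err instanceof CramUnimplementedError) return 'unsupported CRAM feature'
  return 'other error'
}

console.log(describeCramFailure(new CramBufferOverrunError('read past end'))) // truncated or corrupt data
console.log(new CramBufferOverrunError() instanceof CramError) // true
```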