@gmod/cram

Read CRAM files (indexed or unindexed) in pure JavaScript. Works in Node.js or in the browser.

- Reads CRAM 3.x and 2.x (3.1 added in v1.6.0)
- Does not read CRAM 1.x
- Can use .crai indexes out of the box, for efficient sequence fetching, but also has an index API that would allow use with other index types
- Has preliminary support for bzip2 and lzma codecs. lzma requires the latest @gmod/cram version, and uses WebAssembly. If you find you are unable to compile it, you can try downgrading

Install

$ npm install --save @gmod/cram
# or
$ yarn add @gmod/cram

Usage

const { IndexedCramFile, CramFile, CraiIndex } = require('@gmod/cram')

// Use indexedfasta library for seqFetch, if using local file (see below)
const { IndexedFasta, BgzipIndexedFasta } = require('@gmod/indexedfasta')

// this uses local file paths for node.js for IndexedFasta, for usages using
// remote URLs see indexedfasta docs for filehandles and
// https://github.com/gmod/generic-filehandle
const t = new IndexedFasta({
  path: '/filesystem/yourfile.fa',
  faiPath: '/filesystem/yourfile.fa.fai',
})

// example of fetching records from an indexed CRAM file.
// NOTE: only numeric IDs for the reference sequence are accepted.
// For indexedfasta the numeric ID is the order in which the sequence names
// appear in the header

// Wrap in an async function and then run it
const run = async () => {
  const idToName = []
  const nameToId = {}

  // example opening local files on node.js
  // can also pass `cramUrl` (for the IndexedCramFile class), and `url` (for
  // the CraiIndex) params to open remote URLs
  //
  // alternatively `cramFilehandle` (for the IndexedCramFile class) and
  // `filehandle` (for the CraiIndex) can be used,  see for examples
  // https://github.com/gmod/generic-filehandle

  const indexedFile = new IndexedCramFile({
    cramPath: '/filesystem/yourfile.cram',
    //or
    //cramUrl: 'url/to/file.cram'
    //cramFilehandle: a generic-filehandle or similar filehandle
    index: new CraiIndex({
      path: '/filesystem/yourfile.cram.crai',
      // or
      // url: 'url/to/file.cram.crai'
      // filehandle: a generic-filehandle or similar filehandle
    }),
    seqFetch: async (seqId, start, end) => {
      // note:
      // * seqFetch should return a promise for a string, in this instance retrieved from IndexedFasta
      // * we use start-1 because cram-js uses 1-based but IndexedFasta uses 0-based coordinates
      // * the seqId is a numeric identifier, so we convert it back to a name with idToName
      // * you can return an empty string from this function for testing if you want, but you may not get proper interpretation of record.readFeatures
      return t.getSequence(idToName[seqId], start - 1, end)
    },
    checkSequenceMD5: false,
  })
  const samHeader = await indexedFile.cram.getSamHeader()

  // use the @SQ lines in the header to figure out the
  // mapping between reference ID numbers and names

  const sqLines = samHeader.filter(l => l.tag === 'SQ')
  sqLines.forEach((sqLine, refId) => {
    sqLine.data.forEach(item => {
      if (item.tag === 'SN') {
        // this is the ref name
        const refName = item.value
        nameToId[refName] = refId
        idToName[refId] = refName
      }
    })
  })

  const records = await indexedFile.getRecordsForRange(
    nameToId['chr1'],
    10000,
    20000,
  )
  records.forEach(record => {
    console.log(`got a record named ${record.readName}`)
    if (record.readFeatures != undefined) {
      record.readFeatures.forEach(({ code, pos, refPos, ref, sub }) => {
        // process the read features. this can be used similar to
        // CIGAR/MD strings in SAM. see CRAM specs for more details.
        if (code === 'X') {
          console.log(
            `${record.readName} shows a base substitution of ${ref}->${sub} at ${refPos}`,
          )
        }
      })
    }
  })
}

// report failures instead of leaving the promise rejection unhandled
run().catch(console.error)


You can also use cram-js without npm by loading cram-bundle.js. See the example directory for usage with a script tag.
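
The reference name/ID bookkeeping in the usage example can be factored into a small helper. The sketch below assumes the parsed header has the shape that example relies on (an array of lines, each with a `tag` and a `data` array of `{ tag, value }` items). `buildRefNameMaps` is a hypothetical name, not part of @gmod/cram, and the sample header is made up for illustration.

```javascript
// Hypothetical helper (not part of @gmod/cram): derive numeric reference IDs
// from the order in which @SQ lines appear in a parsed SAM header.
function buildRefNameMaps(samHeader) {
  const idToName = []
  const nameToId = {}
  samHeader
    .filter(line => line.tag === 'SQ')
    .forEach((sqLine, refId) => {
      // the SN item of an @SQ line carries the reference name
      const sn = sqLine.data.find(item => item.tag === 'SN')
      if (sn) {
        idToName[refId] = sn.value
        nameToId[sn.value] = refId
      }
    })
  return { idToName, nameToId }
}

// Made-up header in the shape assumed above.
const exampleHeader = [
  { tag: 'HD', data: [{ tag: 'VN', value: '1.5' }] },
  { tag: 'SQ', data: [{ tag: 'SN', value: 'chr1' }, { tag: 'LN', value: '248956422' }] },
  { tag: 'SQ', data: [{ tag: 'SN', value: 'chr2' }, { tag: 'LN', value: '242193529' }] },
]

const maps = buildRefNameMaps(exampleHeader)
console.log(maps.nameToId, maps.idToName)
```

With the maps in hand, `maps.nameToId['chr1']` can be passed as the numeric sequence ID to `getRecordsForRange`, and `maps.idToName[seqId]` can be used inside `seqFetch`.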

API (auto-generated)

CramRecord - format of CRAM records returned by this API

- ReadFeatures - format of read features on records

IndexedCramFile - indexed access into a CRAM file
CramFile - .cram API
CraiIndex - .crai index API
Error Classes - special error classes thrown by this API

CramRecord

Table of Contents

- CramRecord
  - isPaired
  - isProperlyPaired
  - isSegmentUnmapped
  - isMateUnmapped
  - isReverseComplemented
  - isMateReverseComplemented
  - isRead1
  - isRead2
  - isSecondary
  - isFailedQc
  - isDuplicate
  - isSupplementary
  - isDetached
  - hasMateDownStream
  - isPreservingQualityScores
  - isUnknownBases
  - getReadBases
  - getPairOrientation
  - addReferenceSequence
    - Parameters

CramRecord

Class of each CRAM record returned by this API.

isPaired

Returns boolean true if the read is paired, regardless of whether both segments are mapped
isProperlyPaired
Returns boolean true if the read is paired, and both segments are mapped
isSegmentUnmapped
Returns boolean true if the read itself is unmapped; conflicts with isProperlyPaired
isMateUnmapped
Returns boolean true if the read's mate is unmapped; conflicts with isProperlyPaired
isReverseComplemented
Returns boolean true if the read is mapped to the reverse strand
isMateReverseComplemented
Returns boolean true if the mate is mapped to the reverse strand
isRead1
Returns boolean true if this is read number 1 in a pair
isRead2
Returns boolean true if this is read number 2 in a pair
isSecondary
Returns boolean true if this is a secondary alignment
isFailedQc
Returns boolean true if this read has failed QC checks
isDuplicate
Returns boolean true if the read is an optical or PCR duplicate
isSupplementary
Returns boolean true if this is a supplementary alignment
isDetached
Returns boolean true if the read is detached
hasMateDownStream
Returns boolean true if the read has a mate in this same CRAM segment
isPreservingQualityScores
Returns boolean true if the read contains qual scores
isUnknownBases
Returns boolean true if the read has no sequence bases
getReadBases
Get the original sequence of this read.
Returns String sequence basepairs
getPairOrientation
Get the pair orientation of a paired read. Adapted from igv.js
Returns String of paired orientation
addReferenceSequence
Annotates this feature with the given reference sequence basepair information. This will add a sub and a ref item to base substitution read features given the actual substituted and reference base pairs, and will make the getReadBases() method work.
Parameters
- refRegion object
  - refRegion.start **[number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)**
  - refRegion.end **[number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)**
  - refRegion.seq **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)**
- compressionScheme CramContainerCompressionScheme

Returns undefined nothing
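
As an illustration of how the flag accessors above combine, here is a minimal sketch that turns a record into a one-line description. It assumes the accessors are called as methods on each record. `describeRecord` is a hypothetical helper, and `fakeRecord` is a stand-in object so the snippet runs without a CRAM file.

```javascript
// Hypothetical helper: summarize a record using the documented flag accessors.
// Assumption: each accessor is invoked as a method and returns a boolean.
function describeRecord(record) {
  if (record.isSegmentUnmapped()) {
    return 'unmapped'
  }
  const strand = record.isReverseComplemented() ? 'reverse' : 'forward'
  if (!record.isPaired()) {
    return `single-end, ${strand} strand`
  }
  const mate = record.isMateUnmapped() ? 'mate unmapped' : 'mate mapped'
  let which = 'unnumbered read'
  if (record.isRead1()) {
    which = 'read 1'
  } else if (record.isRead2()) {
    which = 'read 2'
  }
  return `paired (${which}), ${strand} strand, ${mate}`
}

// Stand-in for illustration only; real records come from getRecordsForRange.
const fakeRecord = {
  isSegmentUnmapped: () => false,
  isReverseComplemented: () => true,
  isPaired: () => true,
  isMateUnmapped: () => false,
  isRead1: () => true,
  isRead2: () => false,
}

console.log(describeRecord(fakeRecord))
```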
ReadFeatures
The feature objects appearing in the readFeatures member of CramRecord objects that show insertions, deletions, substitutions, etc.
Static fields
- code (character): One of "bqBXIDiQNSPH". See page 15 of the CRAM v3 spec for their meanings.
- data (any): the data associated with the feature. The format of this varies depending on the feature code.
- pos (number): location relative to the read (1-based)
- refPos (number): location relative to the reference (1-based)
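
A short sketch of consuming these fields: the function below tallies read features by `code` across a set of records. `countReadFeatures` is a hypothetical helper, and the records are made-up plain objects carrying only the fields described above (the `data` value on the deletion is illustrative, since its format depends on the feature code).

```javascript
// Hypothetical helper: count read features per feature code.
function countReadFeatures(records) {
  const counts = {}
  for (const record of records) {
    // readFeatures may be absent on a record, so fall back to an empty list
    for (const feature of record.readFeatures || []) {
      counts[feature.code] = (counts[feature.code] || 0) + 1
    }
  }
  return counts
}

// Made-up records for illustration only.
const fakeRecords = [
  {
    readName: 'r1',
    readFeatures: [
      { code: 'X', pos: 5, refPos: 10004 },
      { code: 'D', pos: 9, refPos: 10008, data: 2 },
    ],
  },
  { readName: 'r2' },
  { readName: 'r3', readFeatures: [{ code: 'X', pos: 1, refPos: 10100 }] },
]

console.log(countReadFeatures(fakeRecords))
```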
IndexedCramFile
Table of Contents
constructor
- Parameters

getRecordsForRange

- Parameters

hasDataForReferenceSequence

- Parameters

constructor

Parameters

- args object
  - args.cram CramFile
  - args.index Index-like object that supports getEntriesForRange(seqId,start,end) -> Promise[Array[index entries]]
  - args.cacheSize **[number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)?** optional maximum number of CRAM records to cache. default 20,000
  - args.fetchSizeLimit **[number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)?** optional maximum number of bytes to fetch in a single getRecordsForRange call. Default 3 MiB.
  - args.checkSequenceMD5 **[boolean](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Boolean)?** default true. if false, disables verifying the MD5 checksum of the reference sequence underlying a slice. In some applications, this check can cause an inconvenient amount (many megabases) of sequences to be fetched.

getRecordsForRange
Parameters
- seq number numeric ID of the reference sequence
- start number start of the range of interest. 1-based closed coordinates.
- end number end of the range of interest. 1-based closed coordinates.
- opts (optional, default {})
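
Because `start` and `end` are 1-based closed coordinates, ranges taken from 0-based half-open sources (BED files, JavaScript `slice` offsets) need converting before the call. A minimal sketch, where `toOneBasedClosed` is a hypothetical helper and not part of this API:

```javascript
// Hypothetical helper: convert a 0-based half-open interval [start0, end0)
// to the equivalent 1-based closed interval [start, end].
function toOneBasedClosed(start0, end0) {
  if (end0 <= start0) {
    throw new RangeError('empty or inverted interval')
  }
  // only the start shifts: the half-open end already equals the closed 1-based end
  return { start: start0 + 1, end: end0 }
}

const { start, end } = toOneBasedClosed(9999, 20000)
console.log(start, end)
// The result would then be passed along, e.g.
// indexedFile.getRecordsForRange(refId, start, end)
```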
hasDataForReferenceSequence
Parameters
- seqId number

Returns Promise true if the CRAM file contains data for the given reference sequence numerical ID
CramFile
Table of Contents
constructor
- Parameters

containerCount

constructor

Parameters

- args object
  - args.filehandle **[object](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Object)?** a filehandle that implements the stat() and read() methods of the Node filehandle API <https://nodejs.org/api/fs.html#fs_class_filehandle>
  - args.path **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)?** path to the cram file
  - args.url **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)?** url for the cram file. also supports file:// urls for local files
  - args.seqFetch **[function](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Statements/function)?** a function with signature `(seqId, startCoordinate, endCoordinate)` that returns a promise for a string of sequence bases
  - args.cacheSize **[number](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Number)?** optional maximum number of CRAM records to cache. default 20,000
  - args.checkSequenceMD5 **[boolean](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Boolean)?** default true. if false, disables verifying the MD5 checksum of the reference sequence underlying a slice. In some applications, this check can cause an inconvenient amount (many megabases) of sequences to be fetched.
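
For small tests, `seqFetch` does not need a FASTA file at all. The sketch below serves bases from an in-memory table. It assumes the coordinates passed in are 1-based closed, which is what the `start - 1` adjustment in the usage example implies. `referenceById` and its sequences are made up for illustration.

```javascript
// Made-up reference sequences keyed by numeric sequence ID.
const referenceById = {
  0: 'ACGTACGTAC',
  1: 'TTTTGGGGCC',
}

// Matches the documented signature (seqId, startCoordinate, endCoordinate)
// and returns a promise for a string of bases.
// Assumption: start and end are 1-based closed coordinates.
async function seqFetch(seqId, start, end) {
  const seq = referenceById[seqId]
  if (seq === undefined) {
    throw new Error(`no reference sequence for ID ${seqId}`)
  }
  // String.prototype.slice is 0-based half-open, hence start - 1
  return seq.slice(start - 1, end)
}

seqFetch(0, 2, 5).then(bases => console.log(bases))
```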
containerCount
CraiIndex
Table of Contents
constructor
- Parameters

hasDataForReferenceSequence

- Parameters

getEntriesForRange

- Parameters

constructor

Parameters

- args object
  - args.path **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)?**
  - args.url **[string](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/String)?**
  - args.filehandle FileHandle?

hasDataForReferenceSequence
Parameters
- seqId number

Returns Promise true if the index contains entries for the given reference sequence ID, false otherwise
getEntriesForRange
fetch index entries for the given range
Parameters
- seqId number
- queryStart number
- queryEnd number

Returns Promise promise for an array of objects of the form {start, span, containerStart, sliceStart, sliceBytes }
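
To illustrate what an alternative "index-like" object might look like, here is a minimal in-memory sketch exposing the same two methods as CraiIndex. It assumes an entry covers positions `start` through `start + span - 1`. Whether IndexedCramFile needs anything beyond these methods is not stated here. `InMemoryIndex` and the entry values are hypothetical.

```javascript
// Hypothetical index-like class; not part of @gmod/cram.
class InMemoryIndex {
  // entriesBySeqId maps a numeric sequence ID to an array of
  // { start, span, containerStart, sliceStart, sliceBytes } objects
  constructor(entriesBySeqId) {
    this.entriesBySeqId = entriesBySeqId
  }

  async hasDataForReferenceSequence(seqId) {
    return (this.entriesBySeqId[seqId] || []).length > 0
  }

  async getEntriesForRange(seqId, queryStart, queryEnd) {
    const entries = this.entriesBySeqId[seqId] || []
    // Assumption: an entry spans start .. start + span - 1, so it overlaps the
    // query when it starts at or before queryEnd and ends at or after queryStart
    return entries.filter(
      e => e.start <= queryEnd && e.start + e.span > queryStart,
    )
  }
}

// Made-up entries for illustration only.
const index = new InMemoryIndex({
  0: [
    { start: 1, span: 1000, containerStart: 100, sliceStart: 50, sliceBytes: 2048 },
    { start: 1001, span: 1000, containerStart: 2300, sliceStart: 50, sliceBytes: 1900 },
  ],
})

index.getEntriesForRange(0, 900, 1000).then(entries => console.log(entries))
```

A linear filter is enough for a sketch. An index over a large file would more likely keep entries sorted and use binary search.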
CramUnimplementedError
Extends Error
Error caused by encountering a part of the CRAM spec that has not yet been implemented
CramMalformedError
Extends CramError
An error caused by malformed data.
CramBufferOverrunError
Extends CramMalformedError
An error caused by attempting to read beyond the end of the defined data.
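
Since CramBufferOverrunError extends CramMalformedError, `instanceof` checks should go from most to least specific. The sketch below uses local stand-in classes that mirror the inheritance chain described above so that it runs on its own. How the real classes are exported by the package is not shown here, and `classifyCramError` is a hypothetical helper.

```javascript
// Stand-ins mirroring the documented inheritance chain (illustration only).
class CramError extends Error {}
class CramUnimplementedError extends Error {}
class CramMalformedError extends CramError {}
class CramBufferOverrunError extends CramMalformedError {}

// Hypothetical helper: map an error to a short category label.
function classifyCramError(err) {
  // most specific first: a buffer overrun is also a malformed-data error
  if (err instanceof CramBufferOverrunError) {
    return 'read past end of data'
  }
  if (err instanceof CramMalformedError) {
    return 'malformed data'
  }
  if (err instanceof CramUnimplementedError) {
    return 'unimplemented CRAM feature'
  }
  return 'other error'
}

console.log(classifyCramError(new CramBufferOverrunError('ran off the end')))
```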
Academic Use
This package was written with funding from the NHGRI as part of the JBrowse project. If you use it in an academic project that you publish, please cite the most recent JBrowse paper, which will be linked from jbrowse.org.
