pdfanno

<p align="center"><img src="https://github.com/paperai/pdfanno/blob/master/pdfanno.gif" width="850"></p>


PDFAnno

PDFAnno is a browser-based linguistic annotation tool for PDF documents.
It lets you annotate PDFs with labels and relations.
For natural language processing and machine learning, it is suited to building gold-standard data such as named entity spans, dependency relations, and coreference chains.

To install PDFAnno locally, run:

git clone https://github.com/paperai/pdfanno.git
cd pdfanno
npm install

or

npm install pdfanno

See the developer's guide for more details.

Usage

  1. Visit the online demo with the latest version of Chrome.
  2. Load your PDF and annotation file (if any). Sample PDFs and annotations are downloadable from here.
  3. Annotate the PDF as you like.
  4. Save your annotations via the download button.
    To resume annotating later, reselect your directory via the Browse button to reload the PDF and .anno file.

For security reasons, PDFAnno does NOT automatically save your annotations.
Don't forget to download your current annotations!

Annotation Tools

  * Span: highlights a text span. A span cannot cross page boundaries.
  * One-way relation: annotates a directed relation between spans, such as a dependency relation.
  * Two-way relation: annotates a bidirectional relation between spans.
  * Link relation: annotates a non-directional relation between spans.
  * Rectangle: highlights a rectangular region. A rectangle cannot cross page boundaries.

Annotation File (.anno)

In PDFAnno, the annotation file (.anno) uses the TOML format.
Here is an example .anno file:

version = 0.3

[1]
type = "span"
page = 1
position = [["95.818", "252.977", "181.761", "10.909"], ["95.818", "264.806", "107.136", "10.909"]]
label = "label-1"

[2]
type = "span"
page = 1
position = [["323.863", "230.715", "213.988", "11.590"], ["313.125", "244.522", "224.829", "10.795"]]
label = "label-2"

[3]
type = "rect"
page = 1
position = ["323.863", "230.715", "213.988", "11.590"]
label = "label-3"

[4]
type = "relation"
dir = "two-way"
ids = ["1", "2"]
label = "label-4"

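where position gives the (x, y, width, height) of the annotation, stored as strings. A span's position is a list of such tuples (span [1] above has two), while a rect's position is a single tuple.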
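To make the position format concrete, here is a minimal JavaScript sketch that converts the string tuples into numbers and computes the bounding box of an annotation. The helpers `toRect` and `boundingBox` are hypothetical, not part of PDFAnno, and the sketch assumes all tuples of one annotation share the same page coordinate space.

```javascript
// Hypothetical helpers (not part of PDFAnno) for .anno position values.
// Each tuple is [x, y, width, height], stored as strings.

// Convert one string tuple into a numeric rectangle.
function toRect([x, y, width, height]) {
  return { x: Number(x), y: Number(y), width: Number(width), height: Number(height) };
}

// Smallest rectangle enclosing every tuple of an annotation.
// A "span" position is a list of tuples; a "rect" position is a single tuple.
function boundingBox(annotation) {
  const tuples = annotation.type === "rect" ? [annotation.position] : annotation.position;
  const rects = tuples.map(toRect);
  const minX = Math.min(...rects.map(r => r.x));
  const minY = Math.min(...rects.map(r => r.y));
  const maxX = Math.max(...rects.map(r => r.x + r.width));
  const maxY = Math.max(...rects.map(r => r.y + r.height));
  return { x: minX, y: minY, width: maxX - minX, height: maxY - minY };
}

// Span [1] from the example .anno file.
const span1 = {
  type: "span",
  page: 1,
  position: [["95.818", "252.977", "181.761", "10.909"], ["95.818", "264.806", "107.136", "10.909"]],
  label: "label-1",
};
console.log(boundingBox(span1));
```

Taking min/max over the edges works regardless of whether y grows up or down, as long as widths and heights are non-negative.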

Reference Anno File

To support multi-user annotation, PDFAnno can load reference .anno files.
For example, if you create a.anno and another annotator creates b.anno for the same PDF, load a.anno as usual and load b.anno as a reference file. PDFAnno then renders a.anno and b.anno in different colors. More than one reference file can be loaded at once.
This is useful for checking inter-annotator agreement and resolving annotation conflicts.
Note that reference files are rendered as read-only.
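PDFAnno shows the two annotators' work side by side; the README does not define a numeric agreement measure. As a rough illustration only, the sketch below assumes an exact-match criterion (two spans agree when page, position, and label are identical) and reports the Jaccard index of the two span sets. `spanKey` and `compareSpans` are hypothetical names, not PDFAnno functions.

```javascript
// Hypothetical sketch: exact-match agreement between two annotators' spans.
// Assumption: spans agree only if page, position, and label are identical.

// Build a comparable key for a span annotation.
function spanKey(span) {
  return JSON.stringify([span.page, span.position, span.label]);
}

// Count shared and unshared spans; agreement is the Jaccard index.
function compareSpans(spansA, spansB) {
  const keysA = new Set(spansA.map(spanKey));
  const keysB = new Set(spansB.map(spanKey));
  const matched = [...keysA].filter(k => keysB.has(k)).length;
  const unionSize = new Set([...keysA, ...keysB]).size;
  return {
    matched,
    onlyInA: keysA.size - matched,
    onlyInB: keysB.size - matched,
    // Two empty files are treated as full agreement.
    agreement: unionSize === 0 ? 1 : matched / unionSize,
  };
}

// a.anno and b.anno agree on the first span but label the second differently.
const a = [
  { page: 1, position: [["95.818", "252.977", "181.761", "10.909"]], label: "label-1" },
  { page: 1, position: [["323.863", "230.715", "213.988", "11.590"]], label: "label-2" },
];
const b = [
  { page: 1, position: [["95.818", "252.977", "181.761", "10.909"]], label: "label-1" },
  { page: 1, position: [["323.863", "230.715", "213.988", "11.590"]], label: "label-3" },
];
console.log(compareSpans(a, b));
```

Exact matching is strict: if two annotators select slightly different character ranges, their spans count as a conflict, which is the kind of case the visual comparison in PDFAnno helps resolve.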

Annotation API

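PDFAnno provides an annotation API. The examples below create annotation objects and add or delete them through functions on window.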

Span

var span = new SpanAnnotation({
  page: 1,
  position: [["139.03536681054345","60.237086766202694","155.97302418023767","14.366197183098592"]],
  label: 'orange',
  text: 'Ready?',
  id: 1
});
window.add(span);
window.delete(span);

Relation

var rel = new RelationAnnotation({
  dir: 'link',
  ids: ["1","2"],
  label: 'sample'
});
window.add(rel);
window.delete(rel);
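A relation refers to other annotations through ids, so it can help to check the options before passing them to RelationAnnotation. The helper below is a hypothetical sketch, not part of PDFAnno's API. It assumes the accepted dir values are "one-way", "two-way", and "link" (the relation tools named in this README) and that a relation connects exactly two annotations, as in every example here.

```javascript
// Hypothetical validator (not part of PDFAnno) for relation options.
// Assumption: dir is one of the three relation tools described in this README.
const RELATION_DIRS = new Set(["one-way", "two-way", "link"]);

// Return a list of problems; an empty list means the options look valid.
// existingIds holds the ids (as strings) of annotations already loaded.
function validateRelation(rel, existingIds) {
  const problems = [];
  if (!RELATION_DIRS.has(rel.dir)) {
    problems.push(`unknown dir: ${rel.dir}`);
  }
  if (!Array.isArray(rel.ids) || rel.ids.length !== 2) {
    problems.push("ids must list exactly two annotation ids");
  } else {
    for (const id of rel.ids) {
      if (!existingIds.has(String(id))) problems.push(`no annotation with id ${id}`);
    }
  }
  return problems;
}

// Same options as the RelationAnnotation example above.
const relOptions = { dir: "link", ids: ["1", "2"], label: "sample" };
console.log(validateRelation(relOptions, new Set(["1", "2"])));
console.log(validateRelation({ dir: "both", ids: ["1", "3"] }, new Set(["1", "2"])));
```

In the browser, one would call window.add(new RelationAnnotation(relOptions)) only when the returned list is empty.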

Rectangle

var rect = new RectAnnotation({
  page: 1,
  position: ["9.24324324324326","435.94054054054055","235.7027027027027","44.65945945945946"],
  label: 'rect-label',
  id: 2
});
window.add(rect);
window.delete(rect);

Read from TOML or JSON

var toml = `

version = 0.2

[1]
type = "span"
page = 1
position = [["139.03536681054345","60.237086766202694","155.97302418023767","14.366197183098592"]]
label = "orange"
text = "Ready?"
`;

var anno = readTOML(toml);
var annoObj = window.addAll(anno);
window.delete(annoObj["1"]);

// delete all annotations
window.clear();
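readTOML goes from .anno text to annotations. PDFAnno writes the .anno file itself when you save, so the reverse direction is not needed in normal use; the sketch below only illustrates the file layout. toAnnoTOML and tomlValue are hypothetical helpers, not PDFAnno functions, and they handle only numbers, strings, and (nested) arrays, not general TOML.

```javascript
// Hypothetical serializer (not part of PDFAnno) for the .anno layout:
// a version line followed by one [id] table per annotation.

// Format a value as TOML: numbers bare, strings quoted, arrays recursively.
// JSON string escapes are a subset of TOML basic-string escapes.
function tomlValue(value) {
  if (typeof value === "number") return String(value);
  if (typeof value === "string") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(tomlValue).join(", ")}]`;
  throw new TypeError(`unsupported value: ${value}`);
}

// annotations maps an id (e.g. "3") to a plain object of annotation fields.
function toAnnoTOML(version, annotations) {
  const lines = [`version = ${version}`];
  for (const [id, fields] of Object.entries(annotations)) {
    lines.push("", `[${id}]`);
    for (const [key, value] of Object.entries(fields)) {
      lines.push(`${key} = ${tomlValue(value)}`);
    }
  }
  return lines.join("\n") + "\n";
}

// Entries [3] and [4] from the example .anno file.
const out = toAnnoTOML(0.3, {
  3: { type: "rect", page: 1, position: ["323.863", "230.715", "213.988", "11.590"], label: "label-3" },
  4: { type: "relation", dir: "two-way", ids: ["1", "2"], label: "label-4" },
});
console.log(out);
```

With these inputs the output reproduces the version line and tables [3] and [4] of the example .anno file.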

Developer's Guide

PDFAnno uses pdf.js as its PDF viewer and implements custom layers on top of pdf.js for rendering annotations.

Install and Build

First, install Node.js and npm. Node.js 6 or later is required.
Then, run the following commands:

npm install
npm run publish:latest

The output is written to docs/latest; open docs/latest/index.html to use PDFAnno.

For development, run:

npm run dev

This command starts the Webpack Dev Server; then open http://localhost:8080/dist/index.html in your browser.

Authors

LICENSE

MIT

If you find any bugs or have a feature request, please open an issue on GitHub!
