HTML Labelizer Documentation

A client-side web application for semantic annotation of HTML documents with hierarchical label schemas, pattern-based labeling, and coreference resolution capabilities.

Quick Start

Get started with HTML Labelizer in five simple steps:

Open the application: Navigate to the labelizer tool
Load your document: Drag and drop an HTML file or click "Load HTML File"
Define your schema: Click "Add Label" to create your annotation labels
Annotate content: Select text in the rendered view and apply labels from the context menu
Save your work: Click "Download" to export manually, or enable automatic backups to save every 15 minutes

Privacy First

All processing happens locally in your browser. No data is transmitted to any server.

Key Features

HTML Labelizer is designed for information retrieval research, coreference resolution, entity disambiguation, and semantic annotation tasks in legal, academic, and research contexts.

Hierarchical Labels

Create nested label structures up to 2 levels deep with custom attributes

Pattern Search

Find and label all occurrences of text patterns automatically with bulk apply

Coreference Resolution

Link related annotations with synchronized attributes across the document

IAA Comparison Tool

Compare two annotated documents with Inter-Annotator Agreement analysis and visual statistics

No Installation Required

Web-based tool with no downloads, setup, or configuration needed. Just open and start annotating

Free, Open Source & Private

Completely free to use with open source code on GitHub. All data stays in your browser

Real-time Statistics

Track annotation progress with live label counts and distribution

Persistent Storage

Schemas and annotations embedded directly in HTML for portability

Hierarchical Labels

HTML Labelizer supports creating complex, nested label structures with up to 2 levels of hierarchy. Each label can have custom attributes and child sublabels.

Label Structure

Labels are organized in a tree structure where:

Parent labels represent high-level semantic categories
Sublabels provide fine-grained classification within parent categories
Attributes store metadata about the labeled content

Why Use Sublabels? A Comparison

When annotating text, you can store information either as attributes (metadata) or as sublabels (nested annotations). The key difference is that sublabels preserve the location of information within the text.

Example 1: Book Citation with Attributes Only

This approach stores all information as metadata but loses positional information:

book (parent label)
├── Attributes:
│   ├── book_id (string) - Unique identifier
│   ├── title (string) - Book title
│   ├── author (string) - Author name
│   ├── year (string) - Publication year
│   ├── publisher (string) - Publisher name
│   └── isbn (string) - ISBN number
└── No sublabels

Text example:

"The Great Gatsby by F. Scott Fitzgerald (1925)"

Annotation: The entire text is wrapped in one label with attributes, but you cannot identify where exactly "The Great Gatsby" or "F. Scott Fitzgerald" appears in the text.

Example 2: Book Citation with Sublabels

This approach preserves the exact location of each piece of information:

book (parent label)
├── Attributes:
│   ├── book_id (string) - Unique identifier for coreference
│   ├── genre (dropdown: fiction, non-fiction, academic, reference)
│   └── language (dropdown: English, French, Spanish, other)
└── Sublabels:
    ├── title
    │   ├── title_type (dropdown: main, subtitle, series)
    │   └── abbreviated (checkbox)
    ├── author
    │   ├── author_type (dropdown: primary, co-author, editor)
    │   └── full_name (checkbox)
    ├── publication
    │   ├── year (string)
    │   └── publisher (string)
    └── identifier
        ├── isbn (string)
        └── doi (string)

Text example:

"The Great Gatsby by F. Scott Fitzgerald (1925)"

Annotation with sublabels: You can mark exactly where each element appears:

"The Great Gatsby" is labeled with the title sublabel
"F. Scott Fitzgerald" is labeled with the author sublabel
"1925" is labeled with the publication sublabel

When to Use Sublabels

Use sublabels when you need to preserve the position of information within text. Use attributes when you only need to store metadata about the entire labeled span. Often, a combination of both is ideal: attributes for general metadata, sublabels for locating specific elements.
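To make the difference concrete, the sublabel positions from Example 2 can be expressed as character offsets inside the parent span. The following is a minimal Python sketch for illustration only: the dictionary layout and variable names are assumptions, and HTML Labelizer itself records positions by nesting elements in the HTML rather than by storing offsets.

```python
citation = "The Great Gatsby by F. Scott Fitzgerald (1925)"

# Sublabel name -> the exact text it marks (taken from Example 2).
marked = {
    "title": "The Great Gatsby",
    "author": "F. Scott Fitzgerald",
    "publication": "1925",
}

# Character offsets (start, end) of each sublabel inside the parent span.
offsets = {}
for name, text in marked.items():
    start = citation.index(text)
    offsets[name] = (start, start + len(text))

for name, (start, end) in offsets.items():
    print(name, start, end, repr(citation[start:end]))
```

With attributes only, the same values could be stored, but none of these positions could be recovered from the annotation.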

Label Attributes

Attributes provide metadata and additional context for labeled content. There are three data types and three functional types of attributes:

Attribute Data Types

String Attributes

Free-text fields for flexible data entry. Use for identifiers, names, or any textual metadata.

book_id: "gatsby-1925"
isbn: "978-0-7432-7356-5"
author: "F. Scott Fitzgerald"

Dropdown Attributes

Controlled vocabulary with predefined options. Ensures consistent annotation and enables filtering.

genre: ["fiction", "non-fiction", "academic", "reference"]
language: ["English", "French", "Spanish", "other"]
title_type: ["main", "subtitle", "series"]

Checkbox Attributes

Boolean flags for binary properties. Use for yes/no questions or feature toggles.

abbreviated: true
full_name: false
in_print: true
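The three data types can be summarized as a small validation routine. This is a hedged Python sketch, not the tool's own code: the function name validate_attribute is hypothetical, and the spec layout simply mirrors the "type" and "options" fields shown under Label Schema Structure.

```python
def validate_attribute(spec: dict, value) -> bool:
    """Return True if value is acceptable for the given attribute spec."""
    kind = spec["type"]
    if kind == "string":
        return isinstance(value, str)          # free text
    if kind == "dropdown":
        return value in spec.get("options", [])  # controlled vocabulary
    if kind == "checkbox":
        return isinstance(value, bool)         # boolean flag
    raise ValueError(f"unknown attribute type: {kind!r}")

genre = {"type": "dropdown",
         "options": ["fiction", "non-fiction", "academic", "reference"]}

print(validate_attribute(genre, "fiction"))              # True
print(validate_attribute(genre, "poetry"))               # False
print(validate_attribute({"type": "checkbox"}, True))    # True
```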

Attribute Functional Types

Attributes also have different roles in the coreference system:

Regular Attributes

Standard attributes whose values are set individually for each label instance. You enter and edit them in the attribute menu when applying a label, so each annotation can carry different values.

Example: A "mention" label with attribute "notetype" (dropdown: "citation", "reference", "quotation")
Each mention can have its own notetype value.

Group ID Attribute

A special attribute that links multiple label instances together as a coreference group. When multiple labels with the same name share the same Group ID value, they are automatically linked together and appear as a group in the Co-reference panel.

Example: A "mention" label with attribute "docid" configured as the Group ID
All mentions with docid="irpa" form one coreference group.

How Group ID Works

The Group ID attribute is created like any other attribute, and you set its value on each label instance. What makes it special is the designation: once you mark it as the "Group ID Attribute", the system uses its value to link labels together. Labels with the same Group ID value become members of the same coreference group.
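The linking rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's implementation: the annotation dictionaries, the function name build_groups, and the choice to leave labels with an empty Group ID ungrouped are all hypothetical.

```python
from collections import defaultdict

def build_groups(annotations, group_id_attr):
    """Map (label name, Group ID value) -> list of annotated texts."""
    groups = defaultdict(list)
    for ann in annotations:
        group_id = ann["attrs"].get(group_id_attr)
        if group_id:  # assumption: an empty Group ID means "not grouped"
            groups[(ann["label"], group_id)].append(ann["text"])
    return dict(groups)

annotations = [
    {"label": "mention", "text": "Immigration and Refugee Protection Act",
     "attrs": {"docid": "irpa"}},
    {"label": "mention", "text": "IRPA", "attrs": {"docid": "irpa"}},
    {"label": "mention", "text": "2015 SCC 18", "attrs": {"docid": "scc2015-18"}},
    {"label": "mention", "text": "the Court", "attrs": {"docid": ""}},
]

groups = build_groups(annotations, "docid")
for (label, group_id), members in groups.items():
    print(label, group_id, len(members))
```

Keying on both the label name and the Group ID value reflects the rule that only labels with the same name are linked.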

Group Attributes

Attributes designated as "Group Attributes" are synchronized across all members of a coreference group. These attributes cannot be edited in the attribute menu when applying labels. Instead, they must be set in the Co-reference panel, and any changes automatically propagate to all group members.

Example: A "mention" label with "uri" configured as a Group Attribute
You cannot set "uri" when creating individual mentions.
Instead, set it once in the Co-reference panel, and all group members receive the same value.

Important: Group Attributes vs Regular Attributes

Regular attributes: Set individually in the attribute menu for each label instance.
Group Attributes: Can only be set in the Co-reference panel and are shared across all group members.
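The propagation behavior can be pictured as one write that fans out to every group member. The Python sketch below is an assumption-laden illustration: the function set_group_attribute and the in-memory annotation dictionaries are hypothetical and stand in for whatever the Co-reference panel does internally.

```python
def set_group_attribute(annotations, label, group_id_attr, group_id, name, value):
    """Write one Group Attribute value to every member of a group.

    Returns the number of members that were updated.
    """
    members = [
        ann for ann in annotations
        if ann["label"] == label and ann["attrs"].get(group_id_attr) == group_id
    ]
    for ann in members:
        ann["attrs"][name] = value
    return len(members)

mentions = [
    {"label": "mention", "attrs": {"docid": "irpa", "notetype": "citation"}},
    {"label": "mention", "attrs": {"docid": "irpa", "notetype": "reference"}},
    {"label": "mention", "attrs": {"docid": "other-act", "notetype": "citation"}},
]

count = set_group_attribute(
    mentions, "mention", "docid", "irpa",
    "uri", "https://laws.justice.gc.ca/eng/acts/I-2.5/",
)
print(count)
```

The regular attribute notetype keeps its per-instance value, while uri ends up identical on both members of the "irpa" group.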

Coreference & Groups

Group management allows you to link related annotations that refer to the same entity, enabling coreference resolution across your document.

Configuration

Group ID Attribute: The attribute used to identify group membership (e.g., docid)
Group Attributes: Which attributes should synchronize across all group members

Functionality

Labels with the same Group ID value are automatically linked
Changes to Group Attributes propagate to all group members
The Co-reference panel displays all coreference groups in real-time
Click on groups to highlight all occurrences in the document

Use Case

When a document references the same legislation multiple times using different names ("IRPA", "the Act", "Immigration and Refugee Protection Act"), assign the same docid value to link them together.

Pattern-Based Labeling

The Advanced Labelization panel enables efficient annotation of repeated patterns throughout your document.

How It Works

Enter a text pattern in the search field
Navigate through matches using Previous/Next buttons
Select portions of the highlighted match
Apply labels to the selection
Use "Apply All" to replicate the annotation to all matching patterns

Best Practice

Create your complete nested label structure on the first occurrence, then use "Apply All" to annotate all similar patterns in one action.
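Conceptually, "Apply All" replays one annotation on every match of the pattern. The Python sketch below shows only the matching step and rests on assumptions: it treats the pattern as literal, case-sensitive text, and the function name find_matches is hypothetical. The tool's actual matching rules are not described here.

```python
import re

def find_matches(text, pattern):
    """Return (start, end) offsets of every literal occurrence of pattern."""
    return [(m.start(), m.end()) for m in re.finditer(re.escape(pattern), text)]

doc = "Under IRPA, a claim may be filed. IRPA also provides for appeals."
matches = find_matches(doc, "IRPA")

# Each match would receive the same label structure as the first occurrence.
for start, end in matches:
    print(start, end, doc[start:end])
```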

Multi-Selection with Ctrl

For patterns that are similar but not identical, use Ctrl+click to apply the same label to multiple selections at once. This suits variations such as "IRPA", "the Act", "s.117 of the IRPA", and "section 117", which refer to the same thing with different phrasings.

How to Use Multi-Selection

Select your first text span normally
Hold Ctrl (or Cmd on Mac) and select additional text spans
Release the selection to open the context menu
Choose a label—it will be applied to all selected spans at once
Fill in attributes that apply to all instances

When to Use Multi-Selection vs Pattern Search

Pattern Search: Best for identical text patterns scattered throughout the document.
Multi-Selection (Ctrl+click): Best for related but differently-worded references that need the same label.

Creating Labels

Build your annotation schema using the Label Tree panel:

Add a Parent Label

Click the "Add Label" button at the top of the Labels panel
Enter a name for your label

Add Attributes

Click the gold "⚙" icon next to the label name in the label tree
Enter the attribute name
Select the attribute type (string, dropdown, or checkbox)
For dropdowns, define the available options

Add Sublabels

Click the green "+" icon next to the label name in the label tree
Configure the sublabel with its own name and color
Add attributes to sublabels following the same process

Applying Labels

Annotate your document by applying labels to text selections:

Manual Application

Select text in the rendered HTML view
Release the selection to open the context menu
Choose a label from the hierarchical menu
Fill in attribute values in the attribute form
Press Enter or click elsewhere to create the annotation

Nested Labels

To create nested structures:

Apply a parent label to a text span
Select a portion within the parent label
Apply a sublabel from the context menu
Fill in sublabel-specific attributes

Editing Labels

Click on label attributes in the rendered view to edit values
Click the delete button (×) on a label to remove it
Deleting a parent label removes all nested children

Adjusting Label Boundaries

After applying a label, you can easily extend or shorten its boundaries without having to delete and reapply it.

How to Adjust Boundaries

Hold Ctrl (or Cmd on Mac)
Click on the label you want to adjust
Click elsewhere in the text to set the new boundary
The label will expand or contract to include the new selection

Quick Editing

This is especially useful when you initially apply a label but realize you need to include a bit more context or remove extra words from the selection. No need to delete and recreate—just Ctrl+click to adjust!

This feature is particularly valuable for correcting annotations generated automatically by LLMs, which may not always capture the exact boundaries correctly.

Managing Groups

The coreference system allows you to link related annotations that refer to the same entity (e.g., multiple references to the same document or person throughout your text). This is accomplished through Group ID attributes and Group Attributes.

Understanding the System

Group ID Attribute: An attribute you designate to identify which labels belong together. When multiple labels share the same Group ID value, they form a coreference group.
Group Attributes: Attributes that are shared across all members of a group. These can only be edited in the Co-reference panel and changes propagate to all group members automatically.
Regular Attributes: All other attributes remain individual and are set in the attribute menu when applying labels.

Step 1: Configure Group Settings

Create your label and add all attributes you need (including the Group ID attribute)
Expand the label in the Label Tree panel
Locate the "Group Configuration" section at the bottom
Select the Group ID Attribute: Choose which attribute will serve as the group identifier (e.g., docid)
Select Group Attributes: Check the boxes for attributes that should be synchronized across all group members (e.g., uri, doctype)

Configuration Example

For a "mention" label tracking legal documents:
• Group ID Attribute: docid (to identify which document is referenced)
• Group Attributes: uri, jurisdiction (shared information about the document)
• Regular Attributes: page_number, context_type (specific to each mention)

Step 2: Create Label Instances

Select text in the rendered view and apply your configured label
In the attribute menu, set a value for the Group ID attribute (e.g., docid="irpa")
Set values for any regular attributes (these are specific to this instance)
Note: You will not see Group Attributes in this menu—they're only editable in the Co-reference panel
Press Enter or click elsewhere to create the annotation

Step 3: Link Additional Instances

Find another text span referring to the same entity
Apply the same label type
Use the same Group ID value (e.g., docid="irpa" again)
The two instances are now linked as a coreference group

Step 4: Manage Groups in Co-reference Panel

Once groups are created, they appear in the Co-reference panel (right side of the interface):

Groups are organized by label name
Each group shows its Group ID value and member count
Click "Edit" on a group to set Group Attribute values
Changes to Group Attributes apply to all group members instantly
Click on a group to highlight all its members in the document

Use Case Example

Annotating references to the Immigration and Refugee Protection Act:

First mention: "Immigration and Refugee Protection Act"
  • Label: mention
  • docid: "irpa" (Group ID)
  • page_number: "5" (Regular attribute)

Second mention: "IRPA"
  • Label: mention  
  • docid: "irpa" (Same Group ID → linked!)
  • page_number: "12" (Different regular attribute)

Third mention: "the Act"
  • Label: mention
  • docid: "irpa" (Same Group ID → linked!)
  • page_number: "15" (Different regular attribute)

In Co-reference panel:
  • Group "irpa" (3 members)
  • Set Group Attributes:
    - uri: "https://laws.justice.gc.ca/eng/acts/I-2.5/"
    - jurisdiction: "Federal"
  • These values now apply to all three mentions

Remember

Group ID attribute: Set individually when creating each label (but use the same value to link them).
Group Attributes: Only editable in the Co-reference panel, shared across all group members.
Regular attributes: Set individually in the attribute menu, unique to each instance.

Import & Export

Save and reload your annotated documents with embedded schemas:

Export

Click the "Download" button or the "Save As" button to save your work. The exported HTML includes:

Original document content
Applied annotations using <manual_label> elements
Label schema stored in HTML comments
All attribute values preserved

Automatic Backups

Enable automatic saving every 15 minutes to protect your work:

Click the auto-save button in the header
Select a backup folder on your computer
Backups will be automatically saved every 15 minutes to a backups/ subfolder
Each backup is timestamped for easy recovery
Click the auto-save button again to disable automatic backups

⚠ Browser Requirement

Automatic backups require the File System Access API, which is currently supported in Chrome, Edge, and other Chromium-based browsers. The feature may not work in Firefox or Safari.

Import

Load a previously annotated document:

Click "Load HTML File" or drag-drop the file
The label schema is automatically extracted from the HTML comments
Applied labels are parsed and displayed
Continue annotating or make edits

Portability

All data is self-contained within the HTML file. Share annotated documents with collaborators who can open them directly in HTML Labelizer.

Example: Legal Citations

Annotating a legal citation with nested structure:

Original Text

See Theratechnologies inc. v. 121851 Canada inc., 2015 SCC 18

Annotation Process

Select the entire citation
Apply "mention" label with attributes:
- docid="scc2015-18"
- doctype="decision"
Select "Theratechnologies inc. v. 121851 Canada inc."
Apply "title" sublabel with titletype="case name"
Select "2015 SCC 18"
Apply "reference" sublabel

Resulting HTML

<manual_label labelname="mention" docid="scc2015-18" doctype="decision">
  See <manual_label labelname="title" titletype="case name">
    Theratechnologies inc. v. 121851 Canada inc.
  </manual_label>, 
  <manual_label labelname="reference">
    2015 SCC 18
  </manual_label>
</manual_label>
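Because the result is plain HTML, the annotations can be read back with any HTML parser. The sketch below uses Python's standard html.parser module; the class name LabelCollector and the output layout are assumptions made for this example and are not part of HTML Labelizer.

```python
from html.parser import HTMLParser

class LabelCollector(HTMLParser):
    """Collect every <manual_label> element with its attributes, text, and depth."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open labels, innermost last
        self.found = []   # finished labels, in closing order

    def handle_starttag(self, tag, attrs):
        if tag == "manual_label":
            self.stack.append(
                {"attrs": dict(attrs), "text": [], "depth": len(self.stack) + 1}
            )

    def handle_data(self, data):
        # Text belongs to every label that is open at this point.
        for open_label in self.stack:
            open_label["text"].append(data)

    def handle_endtag(self, tag):
        if tag == "manual_label" and self.stack:
            label = self.stack.pop()
            label["text"] = " ".join("".join(label["text"]).split())
            self.found.append(label)

html_doc = """
<manual_label labelname="mention" docid="scc2015-18" doctype="decision">
  See <manual_label labelname="title" titletype="case name">
    Theratechnologies inc. v. 121851 Canada inc.
  </manual_label>,
  <manual_label labelname="reference">
    2015 SCC 18
  </manual_label>
</manual_label>
"""

collector = LabelCollector()
collector.feed(html_doc)
collector.close()

for label in collector.found:
    print(label["depth"], label["attrs"]["labelname"], "->", label["text"])
```

The depth value is 1 for the parent label and 2 for its sublabels, which matches the two-level limit described under Hierarchical Labels.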

Example: Coreference

Linking multiple references to the same document:

Scenario

A document refers to the Immigration and Refugee Protection Act in three different ways:

"Immigration and Refugee Protection Act" (full name)
"IRPA" (acronym)
"the Act" (anaphoric reference)

Annotation

Apply the "mention" label to each occurrence with the same docid="irpa":

<manual_label labelname="mention" docid="irpa" doctype="legislation">
  Immigration and Refugee Protection Act
</manual_label>

...

<manual_label labelname="mention" docid="irpa">
  IRPA
</manual_label>

...

<manual_label labelname="mention" docid="irpa">
  the Act
</manual_label>

Result

All three annotations form a coreference group. If doctype is configured as a Group Attribute, setting it once in the Co-reference panel applies the value to all three members.

Example: Nested Structures

Complex annotation with multiple nesting levels:

Text

Section 12(1)(a) of the Competition Act, R.S.C. 1985, c. C-34

Structure

mention (entire citation)
├── Attributes:
│   ├── docid="competition-act"
│   └── doctype="legislation"
└── Sublabels:
    ├── fragment (Section 12(1)(a))
    │   ├── fragmentid="12(1)(a)"
    │   └── fragmenttype="section"
    ├── title (Competition Act)
    │   └── titletype="short title"
    └── reference (R.S.C. 1985, c. C-34)
        └── reftype="official"

Data Format

HTML Labelizer uses a custom data format embedded within HTML documents:

Schema Storage

The label schema is stored as a JSON object in an HTML comment before the <head> tag:

<!-- HTMLLabelizer {
  "labels": [...],
  "version": "1.0"
} -->
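A script outside the tool could recover the schema from this comment. The Python sketch below is a minimal illustration: the regular expression and the function name extract_schema are assumptions based only on the comment layout shown above, and it would not cope with a schema whose JSON itself contains the sequence "-->".

```python
import json
import re

# Matches: <!-- HTMLLabelizer { ...json... } -->
SCHEMA_RE = re.compile(r"<!--\s*HTMLLabelizer\s*(\{.*?\})\s*-->", re.DOTALL)

def extract_schema(html):
    """Return the embedded schema as a dict, or None if no schema comment exists."""
    match = SCHEMA_RE.search(html)
    return json.loads(match.group(1)) if match else None

sample = """<!-- HTMLLabelizer {
  "labels": [{"name": "mention"}],
  "version": "1.0"
} -->
<html><head></head><body>...</body></html>"""

schema = extract_schema(sample)
print(schema["version"], [label["name"] for label in schema["labels"]])
```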

Label Schema Structure

{
  "name": "mention",
  "color": "#6aa3ff",
  "type": "structured",
  "params": {
    "docid": { "type": "string", "default": "" },
    "doctype": { 
      "type": "dropdown", 
      "options": ["legislation", "decision", "other"],
      "default": "legislation"
    }
  },
  "sublabels": [...],
  "groupConfig": {
    "groupIdAttribute": "docid",
    "groupAttributes": ["doctype", "url"]
  }
}
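One use of the "default" fields is to compute starting values for a new label instance. The Python sketch below assumes that this is what "default" is for, which the schema itself does not state; the function name default_attributes is hypothetical.

```python
label = {
    "name": "mention",
    "params": {
        "docid": {"type": "string", "default": ""},
        "doctype": {
            "type": "dropdown",
            "options": ["legislation", "decision", "other"],
            "default": "legislation",
        },
    },
}

def default_attributes(label):
    """Initial attribute values for a new instance of this label."""
    return {
        name: spec.get("default")
        for name, spec in label.get("params", {}).items()
    }

defaults = default_attributes(label)
print(defaults)
```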

HTML Structure

Applied annotations use custom <manual_label> elements:

Element Format

<manual_label 
  labelname="mention" 
  docid="example-123" 
  doctype="legislation">
  annotated text content
</manual_label>

Attributes

labelname - Required. The name of the label being applied
Additional attributes - All label parameters stored as HTML attributes
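The element format is simple enough to generate from a script, for example when pre-annotating documents before opening them in the tool. The Python sketch below is an illustration under assumptions: render_label is a hypothetical helper, and values are written with str(), so how the tool itself serializes checkbox values may differ.

```python
from html import escape

def render_label(labelname, text, **attrs):
    """Build one <manual_label> element with escaped attribute values and text."""
    parts = [f'labelname="{escape(labelname, quote=True)}"']
    parts += [f'{key}="{escape(str(value), quote=True)}"'
              for key, value in attrs.items()]
    return f"<manual_label {' '.join(parts)}>{escape(text)}</manual_label>"

element = render_label("mention", "annotated text content",
                       docid="example-123", doctype="legislation")
print(element)
```

Escaping matters because attribute values and annotated text may contain quotes, ampersands, or angle brackets.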

Nesting

Sublabels are nested within parent <manual_label> elements:

<manual_label labelname="mention" docid="abc">
  Some text
  <manual_label labelname="title">
    nested content
  </manual_label>
  more text
</manual_label>

Browser Support

HTML Labelizer works in modern browsers with ES6 module support:

Minimum Versions

Chrome/Edge: Version 90 or later
Firefox: Version 88 or later
Safari: Version 14 or later

Required Features

ES6 Modules (import/export)
JavaScript Map and Set
CSS Grid and Flexbox
Local file system access via File API

No Installation Required

HTML Labelizer is a static web application that runs entirely in your browser. No server components, no installation process, and no external dependencies.

Note on File Access

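For security reasons, browsers do not let a web page write to the file system without explicit user action. Use the "Download" or "Save As" button to save your work, which triggers a browser download. Automatic backups are the exception: they rely on the File System Access API and start only after you select a backup folder.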

Need Help?

Found a bug or have a feature request? Visit the GitHub repository to open an issue or contribute to the project.
