HTML Labelizer Documentation

A client-side web application for semantic annotation of HTML documents with hierarchical label schemas, pattern-based labeling, and coreference resolution capabilities.

Quick Start

Get started with HTML Labelizer in five simple steps:

  1. Open the application: Navigate to the labelizer tool
  2. Load your document: Drag and drop an HTML file or click "Load HTML File"
  3. Define your schema: Click "Add Label" to create your annotation labels
  4. Annotate content: Select text in the rendered view and apply labels from the context menu
  5. Save your work: Click "Download" to export manually, or enable automatic backups with the auto-save button to save every 15 minutes

Privacy First

All processing happens locally in your browser. No data is transmitted to any server.

Key Features

Designed for information retrieval research, coreference resolution, entity disambiguation, and semantic annotation tasks in legal, academic, and research contexts.

Hierarchical Labels

Create nested label structures up to 2 levels deep with custom attributes

Pattern Search

Find and label all occurrences of text patterns automatically with bulk apply

Coreference Resolution

Link related annotations with synchronized attributes across the document

IAA Comparison Tool

Compare two annotated documents with Inter-Annotator Agreement analysis and visual statistics

No Installation Required

Web-based tool with no downloads, setup, or configuration needed. Just open and start annotating

Free, Open Source & Private

Completely free to use with open source code on GitHub. All data stays in your browser

Real-time Statistics

Track annotation progress with live label counts and distribution

Persistent Storage

Schemas and annotations embedded directly in HTML for portability

Hierarchical Labels

HTML Labelizer supports creating complex, nested label structures with up to 2 levels of hierarchy. Each label can have custom attributes and child sublabels.

Label Structure

Labels are organized in a tree structure where:

  • Parent labels represent high-level semantic categories
  • Sublabels provide fine-grained classification within parent categories
  • Attributes store metadata about the labeled content

Why Use Sublabels? A Comparison

When annotating text, you can store information either as attributes (metadata) or as sublabels (nested annotations). The key difference is that sublabels preserve the location of information within the text.

Example 1: Book Citation with Attributes Only

This approach stores all information as metadata but loses positional information:

book (parent label)
├── Attributes:
│   ├── book_id (string) - Unique identifier
│   ├── title (string) - Book title
│   ├── author (string) - Author name
│   ├── year (string) - Publication year
│   ├── publisher (string) - Publisher name
│   └── isbn (string) - ISBN number
└── No sublabels

Text example:

"The Great Gatsby by F. Scott Fitzgerald (1925)"

Annotation: The entire text is wrapped in one label with attributes, but you cannot identify where exactly "The Great Gatsby" or "F. Scott Fitzgerald" appears in the text.

Example 2: Book Citation with Sublabels

This approach preserves the exact location of each piece of information:

book (parent label)
├── Attributes:
│   ├── book_id (string) - Unique identifier for coreference
│   ├── genre (dropdown: fiction, non-fiction, academic, reference)
│   └── language (dropdown: English, French, Spanish, other)
└── Sublabels:
    ├── title
    │   ├── title_type (dropdown: main, subtitle, series)
    │   └── abbreviated (checkbox)
    ├── author
    │   ├── author_type (dropdown: primary, co-author, editor)
    │   └── full_name (checkbox)
    ├── publication
    │   ├── year (string)
    │   └── publisher (string)
    └── identifier
        ├── isbn (string)
        └── doi (string)

Text example:

"The Great Gatsby by F. Scott Fitzgerald (1925)"

Annotation with sublabels: You can mark exactly where each element appears:

  • "The Great Gatsby" is labeled with the title sublabel
  • "F. Scott Fitzgerald" is labeled with the author sublabel
  • "1925" is labeled with the publication sublabel

When to Use Sublabels

Use sublabels when you need to preserve the position of information within text. Use attributes when you only need to store metadata about the entire labeled span. Often, a combination of both is ideal: attributes for general metadata, sublabels for locating specific elements.
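
To make the difference concrete, the following Python sketch computes the character offsets that sublabels would let you recover for the citation above. It is only an illustration: the locate helper and the sublabel_spans dictionary are hypothetical names, not part of HTML Labelizer, and the tool itself records positions as nested elements rather than as offsets.

```python
text = "The Great Gatsby by F. Scott Fitzgerald (1925)"

def locate(text, fragment):
    """Return the (start, end) character offsets of the first occurrence of fragment."""
    start = text.index(fragment)  # raises ValueError if the fragment is absent
    return start, start + len(fragment)

# With sublabels, each element keeps its own position inside the parent span.
sublabel_spans = {
    "title": locate(text, "The Great Gatsby"),
    "author": locate(text, "F. Scott Fitzgerald"),
    "publication": locate(text, "1925"),
}

for name, (start, end) in sublabel_spans.items():
    print(f"{name}: {start}-{end} -> {text[start:end]!r}")
```

With attributes only, the same values would be stored as metadata on the whole span and these positions could not be recovered.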

Label Attributes

Attributes provide metadata and additional context for labeled content. There are three data types and three functional types of attributes:

Attribute Data Types

String Attributes

Free-text fields for flexible data entry. Use for identifiers, names, or any textual metadata.

book_id: "gatsby-1925"
isbn: "978-0-7432-7356-5"
author: "F. Scott Fitzgerald"

Dropdown Attributes

Controlled vocabulary with predefined options. Ensures consistent annotation and enables filtering.

genre: ["fiction", "non-fiction", "academic", "reference"]
language: ["English", "French", "Spanish", "other"]
title_type: ["main", "subtitle", "series"]

Checkbox Attributes

Boolean flags for binary properties. Use for yes/no questions or feature toggles.

abbreviated: true
full_name: false
in_print: true

Attribute Functional Types

Attributes also have different roles in the coreference system:

Regular Attributes

Standard attributes that are set individually for each label instance. You define and edit these in the attribute menu when applying labels. Each annotation can have different values for regular attributes.

Example: A "mention" label with attribute "notetype" (dropdown: "citation", "reference", "quotation")
Each mention can have its own notetype value.

Group ID Attribute

A special attribute that links multiple label instances together as a coreference group. When multiple labels with the same name share the same Group ID value, they are automatically linked together and appear as a group in the Co-reference panel.

Example: A "mention" label with attribute "docid" configured as the Group ID
All mentions with docid="irpa" form one coreference group.

How Group ID Works

The Group ID attribute behaves like a regular attribute: you set its value on each label instance. What makes it special is its designation as the "Group ID Attribute" when it is created; the system then uses its value to link labels together. Labels with the same name and the same Group ID value become members of the same coreference group.
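
The linking rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool's implementation: the build_groups function and the dictionary shape used for annotations are hypothetical, and empty Group ID values are assumed to form no group.

```python
from collections import defaultdict

def build_groups(annotations, group_id_attribute):
    """Group label instances by (label name, Group ID value)."""
    groups = defaultdict(list)
    for ann in annotations:
        group_id = ann["attributes"].get(group_id_attribute)
        if group_id:  # assumption: an empty or missing Group ID links nothing
            groups[(ann["label"], group_id)].append(ann)
    return dict(groups)

mentions = [
    {"label": "mention", "text": "Immigration and Refugee Protection Act",
     "attributes": {"docid": "irpa"}},
    {"label": "mention", "text": "IRPA", "attributes": {"docid": "irpa"}},
    {"label": "mention", "text": "the Act", "attributes": {"docid": "irpa"}},
    {"label": "mention", "text": "Competition Act",
     "attributes": {"docid": "competition-act"}},
]

groups = build_groups(mentions, "docid")
for (label, group_id), members in groups.items():
    print(label, group_id, len(members))
```

Keying on both the label name and the Group ID value reflects the rule that only labels with the same name are linked.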

Group Attributes

Attributes designated as "Group Attributes" are synchronized across all members of a coreference group. These attributes cannot be edited in the attribute menu when applying labels. Instead, they must be set in the Co-reference panel, and any changes automatically propagate to all group members.

Example: A "mention" label with "uri" configured as a Group Attribute
You cannot set "uri" when creating individual mentions.
Instead, set it once in the Co-reference panel, and all group members receive the same value.

Important: Group Attributes vs Regular Attributes

Regular attributes: Set individually in the attribute menu for each label instance.
Group Attributes: Can only be set in the Co-reference panel and are shared across all group members.
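
As a rough model of this distinction, the Python sketch below writes one Group Attribute value to every member of a group while leaving regular attributes untouched. The set_group_attribute function and the annotation dictionaries are hypothetical and only mirror the behavior described here; they are not the application's own code.

```python
def set_group_attribute(annotations, label, group_id_attribute, group_id, name, value):
    """Write one Group Attribute value to every member of a coreference group."""
    members = [
        ann for ann in annotations
        if ann["label"] == label
        and ann["attributes"].get(group_id_attribute) == group_id
    ]
    for ann in members:
        ann["attributes"][name] = value  # shared value, identical for all members
    return len(members)

annotations = [
    {"label": "mention", "attributes": {"docid": "irpa", "page_number": "5"}},
    {"label": "mention", "attributes": {"docid": "irpa", "page_number": "12"}},
    {"label": "mention", "attributes": {"docid": "competition-act", "page_number": "3"}},
]

updated = set_group_attribute(
    annotations, "mention", "docid", "irpa",
    "uri", "https://laws.justice.gc.ca/eng/acts/I-2.5/",
)
print(updated, "members updated")
```

The regular attribute page_number keeps its per-instance value, while uri ends up identical on both members of the "irpa" group.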

Coreference & Groups

Group management allows you to link related annotations that refer to the same entity, enabling coreference resolution across your document.

Configuration

  • Group ID Attribute: The attribute used to identify group membership (e.g., docid)
  • Group Attributes: Which attributes should synchronize across all group members

Functionality

  • Labels with the same Group ID value are automatically linked
  • Changes to Group Attributes propagate to all group members
  • The Co-reference panel displays all coreference groups in real-time
  • Click on groups to highlight all occurrences in the document

Use Case

When a document references the same legislation multiple times using different names ("IRPA", "the Act", "Immigration and Refugee Protection Act"), assign the same docid value to link them together.

Pattern-Based Labeling

The Advanced Labelization panel enables efficient annotation of repeated patterns throughout your document.

How It Works

  1. Enter a text pattern in the search field
  2. Navigate through matches using Previous/Next buttons
  3. Select portions of the highlighted match
  4. Apply labels to the selection
  5. Use "Apply All" to replicate the annotation to all matching patterns

Best Practice

Create your complete nested label structure on the first occurrence, then use "Apply All" to annotate all similar patterns in one action.
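
The search step can be pictured with the following Python sketch, which lists the character spans of every occurrence of a pattern so that one annotation could be replicated across them. How HTML Labelizer matches patterns internally is not documented here, so treating the pattern as literal text is an assumption, and find_occurrences is a hypothetical helper.

```python
import re

def find_occurrences(text, pattern, ignore_case=False):
    """Return (start, end) spans of every literal occurrence of pattern in text."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape makes the pattern match literally, even if it contains "." or "(".
    return [(m.start(), m.end()) for m in re.finditer(re.escape(pattern), text, flags)]

text = "Under s. 117 of the IRPA, the IRPA applies."
spans = find_occurrences(text, "IRPA")

for start, end in spans:
    print(start, end, text[start:end])
```

Matches are non-overlapping and returned in document order, which is the order in which Previous/Next navigation would visit them.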

Multi-Selection with Ctrl

For patterns that are similar but not identical, use Ctrl+click to apply the same label to multiple selections at once. This works well for variations such as "IRPA", "the Act", "s.117 of the IRPA", and "section 117", which refer to the same thing in different wording.

How to Use Multi-Selection

  1. Select your first text span normally
  2. Hold Ctrl (or Cmd on Mac) and select additional text spans
  3. Release the selection to open the context menu
  4. Choose a label—it will be applied to all selected spans at once
  5. Fill in attributes that apply to all instances

When to Use Multi-Selection vs Pattern Search

Pattern Search: Best for identical text patterns scattered throughout the document.
Multi-Selection (Ctrl+click): Best for related but differently-worded references that need the same label.

Creating Labels

Build your annotation schema using the Label Tree panel:

Add a Parent Label

  1. Click the "Add Label" button at the top of the Labels panel
  2. Enter a name for your label

Add Attributes

  1. Click the gold "⚙" icon next to the label name in the label tree
  2. Enter the attribute name
  3. Select the attribute type (string, dropdown, or checkbox)
  4. For dropdowns, define the available options

Add Sublabels

  1. Click the green "+" icon next to the label name in the label tree
  2. Configure the sublabel with its own name and color
  3. Add attributes to sublabels following the same process

Applying Labels

Annotate your document by applying labels to text selections:

Manual Application

  1. Select text in the rendered HTML view
  2. Release the selection to open the context menu
  3. Choose a label from the hierarchical menu
  4. Fill in attribute values in the attribute form
  5. Press enter or click elsewhere to create the annotation

Nested Labels

To create nested structures:

  1. Apply a parent label to a text span
  2. Select a portion within the parent label
  3. Apply a sublabel from the context menu
  4. Fill in sublabel-specific attributes

Editing Labels

  • Click on label attributes in the rendered view to edit values
  • Click the delete button (×) on a label to remove it
  • Deleting a parent label removes all nested children

Adjusting Label Boundaries

After applying a label, you can easily extend or shorten its boundaries without having to delete and reapply it.

How to Adjust Boundaries

  1. Hold Ctrl (or Cmd on Mac)
  2. Click on the label you want to adjust
  3. Click elsewhere in the text to set the new boundary
  4. The label will expand or contract to include the new selection

Quick Editing

This is especially useful when you initially apply a label but realize you need to include a bit more context or remove extra words from the selection. No need to delete and recreate—just Ctrl+click to adjust!

This feature is particularly valuable for correcting annotations generated automatically by LLMs, which may not always capture the exact boundaries correctly.

Managing Groups

The coreference system allows you to link related annotations that refer to the same entity (e.g., multiple references to the same document or person throughout your text). This is accomplished through Group ID attributes and Group Attributes.

Understanding the System

  • Group ID Attribute: An attribute you designate to identify which labels belong together. When multiple labels share the same Group ID value, they form a coreference group.
  • Group Attributes: Attributes that are shared across all members of a group. These can only be edited in the Co-reference panel and changes propagate to all group members automatically.
  • Regular Attributes: All other attributes remain individual and are set in the attribute menu when applying labels.

Step 1: Configure Group Settings

  1. Create your label and add all attributes you need (including the Group ID attribute)
  2. Expand the label in the Label Tree panel
  3. Locate the "Group Configuration" section at the bottom
  4. Select the Group ID Attribute: Choose which attribute will serve as the group identifier (e.g., docid)
  5. Select Group Attributes: Check the boxes for attributes that should be synchronized across all group members (e.g., uri, doctype)

Configuration Example

For a "mention" label tracking legal documents:
• Group ID Attribute: docid (to identify which document is referenced)
• Group Attributes: uri, jurisdiction (shared information about the document)
• Regular Attributes: page_number, context_type (specific to each mention)

Step 2: Create Label Instances

  1. Select text in the rendered view and apply your configured label
  2. In the attribute menu, set a value for the Group ID attribute (e.g., docid="irpa")
  3. Set values for any regular attributes (these are specific to this instance)
  4. Note: You will not see Group Attributes in this menu—they're only editable in the Co-reference panel
  5. Press Enter or click elsewhere to create the annotation

Step 3: Link Additional Instances

  1. Find another text span referring to the same entity
  2. Apply the same label type
  3. Use the same Group ID value (e.g., docid="irpa" again)
  4. The two instances are now linked as a coreference group

Step 4: Manage Groups in Co-reference Panel

Once groups are created, they appear in the Co-reference panel (right side of the interface):

  • Groups are organized by label name
  • Each group shows its Group ID value and member count
  • Click "Edit" on a group to set Group Attribute values
  • Changes to Group Attributes apply to all group members instantly
  • Click on a group to highlight all its members in the document

Use Case Example

Annotating references to the Immigration and Refugee Protection Act:

First mention: "Immigration and Refugee Protection Act"
  • Label: mention
  • docid: "irpa" (Group ID)
  • page_number: "5" (Regular attribute)

Second mention: "IRPA"
  • Label: mention  
  • docid: "irpa" (Same Group ID → linked!)
  • page_number: "12" (Different regular attribute)

Third mention: "the Act"
  • Label: mention
  • docid: "irpa" (Same Group ID → linked!)
  • page_number: "15" (Different regular attribute)

In Co-reference panel:
  • Group "irpa" (3 members)
  • Set Group Attributes:
    - uri: "https://laws.justice.gc.ca/eng/acts/I-2.5/"
    - jurisdiction: "Federal"
  • These values now apply to all three mentions

Remember

Group ID attribute: Set individually when creating each label (but use the same value to link them).
Group Attributes: Only editable in the Co-reference panel, shared across all group members.
Regular attributes: Set individually in the attribute menu, unique to each instance.

Import & Export

Save and reload your annotated documents with embedded schemas:

Export

Click the "Download" or "Save As" button to save your work. The exported HTML includes:

  • Original document content
  • Applied annotations using <manual_label> elements
  • Label schema stored in HTML comments
  • All attribute values preserved

Automatic Backups

Enable automatic saving every 15 minutes to protect your work:

  1. Click the auto-save button in the header
  2. Select a backup folder on your computer
  3. Backups will be automatically saved every 15 minutes to a backups/ subfolder
  4. Each backup is timestamped for easy recovery
  5. Click the auto-save button again to disable automatic backups

⚠ Browser Requirement

Automatic backups require the File System Access API, which is currently supported in Chrome, Edge, and other Chromium-based browsers. The feature may not work in Firefox or Safari.

Import

Load a previously annotated document:

  1. Click "Load HTML File" or drag-drop the file
  2. The label schema is automatically extracted from the HTML comments
  3. Applied labels are parsed and displayed
  4. Continue annotating or make edits

Portability

All data is self-contained within the HTML file. Share annotated documents with collaborators who can open them directly in HTML Labelizer.

Example: Coreference

Linking multiple references to the same document:

Scenario

A document refers to the Immigration and Refugee Protection Act in three different ways:

  • "Immigration and Refugee Protection Act" (full name)
  • "IRPA" (acronym)
  • "the Act" (anaphoric reference)

Annotation

Apply the "mention" label to each occurrence with the same docid="irpa":

<manual_label labelname="mention" docid="irpa" doctype="legislation">
  Immigration and Refugee Protection Act
</manual_label>

...

<manual_label labelname="mention" docid="irpa">
  IRPA
</manual_label>

...

<manual_label labelname="mention" docid="irpa">
  the Act
</manual_label>

Result

All three annotations form a coreference group. If doctype is configured as a Group Attribute, editing it in the Co-reference panel propagates the change to all members.

Example: Nested Structures

Complex annotation with multiple nesting levels:

Text

Section 12(1)(a) of the Competition Act, R.S.C. 1985, c. C-34

Structure

  • mention (entire citation)
    • docid="competition-act"
    • doctype="legislation"
    • fragment (Section 12(1)(a))
      • fragmentid="12(1)(a)"
      • fragmenttype="section"
    • title (Competition Act)
      • titletype="short title"
    • reference (R.S.C. 1985, c. C-34)
      • reftype="official"

Data Format

HTML Labelizer uses a custom data format embedded within HTML documents:

Schema Storage

The label schema is stored as a JSON object in an HTML comment before the <head> tag:

<!-- HTMLLabelizer {
  "labels": [...],
  "version": "1.0"
} -->
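
For downstream processing, the schema comment can be read back with a short script. The Python sketch below assumes the comment looks exactly like the example above (the HTMLLabelizer marker followed by one JSON object); extract_schema and SCHEMA_COMMENT are hypothetical names, and real exported files may differ in detail.

```python
import json
import re

# Matches "<!-- HTMLLabelizer { ... } -->" and captures the JSON object.
SCHEMA_COMMENT = re.compile(r"<!--\s*HTMLLabelizer\s*(\{.*?\})\s*-->", re.DOTALL)

def extract_schema(html):
    """Return the embedded schema as a dict, or None if no schema comment is found."""
    match = SCHEMA_COMMENT.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))

sample = """<!-- HTMLLabelizer {
  "labels": [{"name": "mention", "params": {}}],
  "version": "1.0"
} -->
<html><head></head><body></body></html>"""

schema = extract_schema(sample)
print(schema["version"], [label["name"] for label in schema["labels"]])
```

The non-greedy capture stops at the first closing brace that is directly followed by the end of the comment, so nested objects inside the schema are kept intact.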

Label Schema Structure

{
  "name": "mention",
  "color": "#6aa3ff",
  "type": "structured",
  "params": {
    "docid": { "type": "string", "default": "" },
    "doctype": { 
      "type": "dropdown", 
      "options": ["legislation", "decision", "other"],
      "default": "legislation"
    }
  },
  "sublabels": [...],
  "groupConfig": {
    "groupIdAttribute": "docid",
    "groupAttributes": ["doctype", "url"]
  }
}
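
One use of this structure is checking attribute values against the schema, for example before merging files from several annotators. The Python sketch below is an illustration only: validate_attributes is a hypothetical helper, and it checks just the "string" and "dropdown" types shown above.

```python
label_schema = {
    "name": "mention",
    "params": {
        "docid": {"type": "string", "default": ""},
        "doctype": {
            "type": "dropdown",
            "options": ["legislation", "decision", "other"],
            "default": "legislation",
        },
    },
}

def validate_attributes(label_schema, values):
    """Return a list of problems found in values; an empty list means valid."""
    errors = []
    params = label_schema.get("params", {})
    for name, value in values.items():
        spec = params.get(name)
        if spec is None:
            errors.append(f"unknown attribute: {name}")
        elif spec["type"] == "dropdown" and value not in spec.get("options", []):
            errors.append(f"{name}: {value!r} is not one of {spec.get('options', [])}")
    return errors

ok = validate_attributes(label_schema, {"docid": "irpa", "doctype": "legislation"})
bad = validate_attributes(label_schema, {"doctype": "statute", "uri": "x"})
print(ok)
print(bad)
```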

HTML Structure

Applied annotations use custom <manual_label> elements:

Element Format

<manual_label 
  labelname="mention" 
  docid="example-123" 
  doctype="legislation">
  annotated text content
</manual_label>

Attributes

  • labelname - Required. The name of the label being applied
  • Additional attributes - All label parameters stored as HTML attributes
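
The element format can be reproduced with a small helper, which is useful when generating pre-annotated files with a script. This Python sketch is an assumption-laden illustration: render_manual_label is a hypothetical function, values are serialized with str(), and how the tool itself writes checkbox values is not specified here.

```python
from html import escape

def render_manual_label(label_name, attributes, text):
    """Build a <manual_label> element with labelname first, then the other attributes."""
    parts = [f'labelname="{escape(label_name, quote=True)}"']
    for name, value in attributes.items():
        parts.append(f'{name}="{escape(str(value), quote=True)}"')
    # Escape the text content too, so "<" or "&" cannot break the markup.
    return f"<manual_label {' '.join(parts)}>{escape(text, quote=False)}</manual_label>"

rendered = render_manual_label(
    "mention",
    {"docid": "example-123", "doctype": "legislation"},
    "annotated text content",
)
print(rendered)
```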

Nesting

Sublabels are nested within parent <manual_label> elements:

<manual_label labelname="mention" docid="abc">
  Some text
  <manual_label labelname="title">
    nested content
  </manual_label>
  more text
</manual_label>
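
Because annotations are ordinary nested elements, they can be read with any HTML parser. The Python sketch below uses the standard library's html.parser to rebuild the label tree from the example above; LabelTreeParser and the node dictionary layout are hypothetical choices made for illustration.

```python
from html.parser import HTMLParser

class LabelTreeParser(HTMLParser):
    """Collect <manual_label> elements into a tree of dictionaries."""

    def __init__(self):
        super().__init__()
        self.roots = []   # top-level annotations
        self.stack = []   # currently open annotations, outermost first

    def handle_starttag(self, tag, attrs):
        if tag != "manual_label":
            return
        attributes = dict(attrs)
        node = {
            "label": attributes.pop("labelname", None),
            "attributes": attributes,
            "text": "",
            "children": [],
        }
        siblings = self.stack[-1]["children"] if self.stack else self.roots
        siblings.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if tag == "manual_label" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Text belongs to every open annotation, so a parent includes its children's text.
        for node in self.stack:
            node["text"] += data

sample = """<manual_label labelname="mention" docid="abc">
  Some text
  <manual_label labelname="title">
    nested content
  </manual_label>
  more text
</manual_label>"""

parser = LabelTreeParser()
parser.feed(sample)
parser.close()

root = parser.roots[0]
print(root["label"], root["attributes"], [c["label"] for c in root["children"]])
```

Whitespace from the source layout is kept in each node's text; collapsing it with " ".join(text.split()) gives the annotated string as a reader would see it.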

Browser Support

HTML Labelizer works in modern browsers with ES6 module support:

Minimum Versions

  • Chrome/Edge: Version 90 or later
  • Firefox: Version 88 or later
  • Safari: Version 14 or later

Required Features

  • ES6 Modules (import/export)
  • JavaScript Map and Set
  • CSS Grid and Flexbox
  • Local file system access via File API

No Installation Required

HTML Labelizer is a static web application that runs entirely in your browser. No server components, no installation process, and no external dependencies.

Note on File Access

For security reasons, browsers do not allow web pages to write to the file system without your permission. Use the "Download" or "Save As" button to save your work, which triggers a browser download.

Need Help?

Found a bug or have a feature request? Visit the GitHub repository to open an issue or contribute to the project.