HTML Labelizer Documentation
A client-side web application for semantic annotation of HTML documents with hierarchical label schemas, pattern-based labeling, and coreference resolution capabilities.
Quick Start
Get started with HTML Labelizer in five simple steps:
- Open the application: Navigate to the labelizer tool
- Load your document: Drag and drop an HTML file or click "Load HTML File"
- Define your schema: Click "Add Label" to create your annotation labels
- Annotate content: Select text in the rendered view and apply labels from the context menu
- Save your work: Click "Download" to export manually, or enable automatic backups to save every 15 minutes
All processing happens locally in your browser. No data is transmitted to any server.
Key Features
Designed for information retrieval research, coreference resolution, entity disambiguation, and semantic annotation tasks in legal, academic, and research contexts.
Hierarchical Labels
Create nested label structures up to 2 levels deep with custom attributes
Pattern Search
Find and label all occurrences of text patterns automatically with bulk apply
Coreference Resolution
Link related annotations with synchronized attributes across the document
IAA Comparison Tool
Compare two annotated documents with Inter-Annotator Agreement analysis and visual statistics
No Installation Required
Web-based tool with no downloads, setup, or configuration needed. Just open and start annotating
Free, Open Source & Private
Completely free to use with open source code on GitHub. All data stays in your browser
Real-time Statistics
Track annotation progress with live label counts and distribution
Persistent Storage
Schemas and annotations embedded directly in HTML for portability
Hierarchical Labels
HTML Labelizer supports creating complex, nested label structures with up to 2 levels of hierarchy. Each label can have custom attributes and child sublabels.
Label Structure
Labels are organized in a tree structure where:
- Parent labels represent high-level semantic categories
- Sublabels provide fine-grained classification within parent categories
- Attributes store metadata about the labeled content
Why Use Sublabels? A Comparison
When annotating text, you can store information either as attributes (metadata) or as sublabels (nested annotations). The key difference is that sublabels preserve the location of information within the text.
Example 1: Book Citation with Attributes Only
This approach stores all information as metadata but loses positional information:
book (parent label)
├── Attributes:
│ ├── book_id (string) - Unique identifier
│ ├── title (string) - Book title
│ ├── author (string) - Author name
│ ├── year (string) - Publication year
│ ├── publisher (string) - Publisher name
│ └── isbn (string) - ISBN number
└── No sublabels
Text example:
"The Great Gatsby by F. Scott Fitzgerald (1925)"
Annotation: The entire text is wrapped in one label with attributes, but you cannot identify where exactly "The Great Gatsby" or "F. Scott Fitzgerald" appears in the text.
Example 2: Book Citation with Sublabels
This approach preserves the exact location of each piece of information:
book (parent label)
├── Attributes:
│ ├── book_id (string) - Unique identifier for coreference
│ ├── genre (dropdown: fiction, non-fiction, academic, reference)
│ └── language (dropdown: English, French, Spanish, other)
└── Sublabels:
├── title
│ ├── title_type (dropdown: main, subtitle, series)
│ └── abbreviated (checkbox)
├── author
│ ├── author_type (dropdown: primary, co-author, editor)
│ └── full_name (checkbox)
├── publication
│ ├── year (string)
│ └── publisher (string)
└── identifier
├── isbn (string)
└── doi (string)
Text example:
"The Great Gatsby by F. Scott Fitzgerald (1925)"
Annotation with sublabels: You can mark exactly where each element appears:
- "The Great Gatsby" is labeled with the title sublabel
- "F. Scott Fitzgerald" is labeled with the author sublabel
- "1925" is labeled with the publication sublabel
Use sublabels when you need to preserve the position of information within text. Use attributes when you only need to store metadata about the entire labeled span. Often, a combination of both is ideal: attributes for general metadata, sublabels for locating specific elements.
Label Attributes
Attributes provide metadata and additional context for labeled content. There are three data types and three functional types of attributes:
Attribute Data Types
String Attributes
Free-text fields for flexible data entry. Use for identifiers, names, or any textual metadata.
book_id: "gatsby-1925"
isbn: "978-0-7432-7356-5"
author: "F. Scott Fitzgerald"
Dropdown Attributes
Controlled vocabulary with predefined options. Ensures consistent annotation and enables filtering.
genre: ["fiction", "non-fiction", "academic", "reference"]
language: ["English", "French", "Spanish", "other"]
title_type: ["main", "subtitle", "series"]
Checkbox Attributes
Boolean flags for binary properties. Use for yes/no questions or feature toggles.
abbreviated: true
full_name: false
in_print: true
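The three data types can be checked mechanically. The following is a minimal sketch, assuming attribute definitions shaped like the "params" object shown in the Data Format section; the type name "checkbox", the PARAMS sample, and the validate_attribute helper are hypothetical and not part of the tool.

```python
from typing import Any

# Hypothetical schema fragment, shaped like the "params" object in the
# Data Format section. The "checkbox" type name is an assumption.
PARAMS = {
    "book_id": {"type": "string", "default": ""},
    "genre": {
        "type": "dropdown",
        "options": ["fiction", "non-fiction", "academic", "reference"],
        "default": "fiction",
    },
    "abbreviated": {"type": "checkbox", "default": False},
}


def validate_attribute(params: dict, name: str, value: Any) -> bool:
    """Return True if value is acceptable for the named attribute."""
    spec = params.get(name)
    if spec is None:
        return False  # attribute is not defined in the schema
    kind = spec["type"]
    if kind == "string":
        return isinstance(value, str)  # free text
    if kind == "dropdown":
        return value in spec.get("options", [])  # controlled vocabulary
    if kind == "checkbox":
        return isinstance(value, bool)  # boolean flag
    return False  # unrecognized data type
```

For example, validate_attribute(PARAMS, "genre", "poetry") is rejected because "poetry" is not among the dropdown options.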
Attribute Functional Types
Attributes also have different roles in the coreference system:
Regular Attributes
Standard attributes that are set individually for each label instance. You define and edit these in the parameter menu when applying labels. Each annotation can have different values for regular attributes.
Example: A "mention" label with attribute "notetype" (dropdown: "citation", "reference", "quotation")
Each mention can have its own notetype value.
Group ID Attribute
A special attribute that links multiple label instances together as a coreference group. When multiple labels with the same name share the same Group ID value, they are automatically linked together and appear as a group in the Co-reference panel.
Example: A "mention" label with attribute "docid" configured as the Group ID
All mentions with docid="irpa" form one coreference group.
The Group ID attribute behaves like a regular attribute: you set its value individually for each label instance. What makes it special is its designation as the "Group ID Attribute", which tells the system to use its value to link labels together. Labels with the same Group ID value become members of the same coreference group.
Group Attributes
Attributes designated as "Group Attributes" are synchronized across all members of a coreference group. These attributes cannot be edited in the attribute menu when applying labels. Instead, they must be set in the Co-reference panel, and any changes automatically propagate to all group members.
Example: A "mention" label with "uri" configured as a Group Attribute
You cannot set "uri" when creating individual mentions.
Instead, set it once in the Co-reference panel, and all group members receive the same value.
Regular attributes: Set individually in the attribute menu for each label instance.
Group Attributes: Can only be set in the Co-reference panel and are shared across all group members.
Coreference & Groups
Group management allows you to link related annotations that refer to the same entity, enabling coreference resolution across your document.
Configuration
- Group ID Attribute: The attribute used to identify group membership (e.g., docid)
- Group Attributes: Which attributes should synchronize across all group members
Functionality
- Labels with the same Group ID value are automatically linked
- Changes to Group Attributes propagate to all group members
- The Co-reference panel displays all coreference groups in real-time
- Click on groups to highlight all occurrences in the document
Use Case
When a document references the same legislation multiple times using different names ("IRPA", "the Act", "Immigration and Refugee Protection Act"), assign the same docid value to link them together.
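The linking rule (same label name, same Group ID value) can be sketched as a simple grouping step. The annotation dictionaries and the build_groups helper below are assumptions made for illustration; they do not describe the tool's internal data model.

```python
from collections import defaultdict

# Hypothetical in-memory annotations; this layout is an assumption
# made for the sketch, not the tool's internal format.
MENTIONS = [
    {"labelname": "mention", "text": "Immigration and Refugee Protection Act",
     "attributes": {"docid": "irpa"}},
    {"labelname": "mention", "text": "IRPA", "attributes": {"docid": "irpa"}},
    {"labelname": "mention", "text": "the Act", "attributes": {"docid": "irpa"}},
    {"labelname": "mention", "text": "Competition Act",
     "attributes": {"docid": "competition-act"}},
]


def build_groups(annotations, group_id_attribute):
    """Group annotation texts by (label name, Group ID value)."""
    groups = defaultdict(list)
    for ann in annotations:
        group_id = ann["attributes"].get(group_id_attribute)
        if group_id:  # annotations without a Group ID stay ungrouped
            groups[(ann["labelname"], group_id)].append(ann["text"])
    return dict(groups)


GROUPS = build_groups(MENTIONS, "docid")
```

Here the three IRPA references end up under the key ("mention", "irpa"), while the Competition Act mention forms its own group.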
Pattern-Based Labeling
The Advanced Labelization panel enables efficient annotation of repeated patterns throughout your document.
How It Works
- Enter a text pattern in the search field
- Navigate through matches using Previous/Next buttons
- Select portions of the highlighted match
- Apply labels to the selection
- Use "Apply All" to replicate the annotation to all matching patterns
Create your complete nested label structure on the first occurrence, then use "Apply All" to annotate all similar patterns in one action.
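Conceptually, pattern search followed by "Apply All" amounts to locating every occurrence and wrapping it. The sketch below works on plain text and treats the pattern as a literal string; both are simplifying assumptions, and find_matches and apply_all are hypothetical names rather than functions of the tool.

```python
import re


def find_matches(text, pattern):
    """Return (start, end) character offsets of every literal occurrence."""
    escaped = re.escape(pattern)  # treat the pattern as plain text, not a regex
    return [(m.start(), m.end()) for m in re.finditer(escaped, text)]


def apply_all(text, pattern, labelname):
    """Wrap every occurrence of pattern in a <manual_label> element."""
    escaped = re.escape(pattern)

    def wrap(match):
        return f'<manual_label labelname="{labelname}">{match.group(0)}</manual_label>'

    return re.sub(escaped, wrap, text)


TEXT = "The IRPA applies. See IRPA s.117."
MATCHES = find_matches(TEXT, "IRPA")
LABELED = apply_all(TEXT, "IRPA", "mention")
```

A real implementation would operate on the rendered HTML and would also copy nested sublabels and attribute values to each match, which this sketch leaves out.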
Multi-Selection with Ctrl
For patterns that are similar but not identical, use Ctrl+click to apply the same label to multiple selections at once. This suits variations such as "IRPA", "the Act", "s.117 of the IRPA", and "section 117", which all refer to the same thing with different phrasings.
How to Use Multi-Selection
- Select your first text span normally
- Hold Ctrl (or Cmd on Mac) and select additional text spans
- Release the selection to open the context menu
- Choose a label; it is applied to all selected spans at once
- Fill in attributes that apply to all instances
Pattern Search: Best for identical text patterns scattered throughout the document.
Multi-Selection (Ctrl+click): Best for related but differently-worded references that need the same label.
Creating Labels
Build your annotation schema using the Label Tree panel:
Add a Parent Label
- Click the "Add Label" button at the top of the Labels panel
- Enter a name for your label
Add Attributes
- Click the gold "⚙" icon next to the label name in the label tree
- Enter the attribute name
- Select the attribute type (string, dropdown, or checkbox)
- For dropdowns, define the available options
Add Sublabels
- Click the green "+" icon next to the label name in the label tree
- Configure the sublabel with its own name and color
- Add attributes to sublabels following the same process
Applying Labels
Annotate your document by applying labels to text selections:
Manual Application
- Select text in the rendered HTML view
- Release the selection to open the context menu
- Choose a label from the hierarchical menu
- Fill in attribute values in the attribute form
- Press Enter or click elsewhere to create the annotation
Nested Labels
To create nested structures:
- Apply a parent label to a text span
- Select a portion within the parent label
- Apply a sublabel from the context menu
- Fill in sublabel-specific attributes
Editing Labels
- Click on label attributes in the rendered view to edit values
- Click the delete button (×) on a label to remove it
- Deleting a parent label removes all nested children
Adjusting Label Boundaries
After applying a label, you can easily extend or shorten its boundaries without having to delete and reapply it.
How to Adjust Boundaries
- Hold Ctrl (or Cmd on Mac)
- Click on the label you want to adjust
- Click elsewhere in the text to set the new boundary
- The label will expand or contract to include the new selection
This is especially useful when you apply a label and then realize you need to include a bit more context or remove extra words from the selection. There is no need to delete and recreate the label; just Ctrl+click to adjust it.
This feature is particularly valuable for correcting annotations generated automatically by LLMs, which may not always capture the exact boundaries correctly.
Managing Groups
The coreference system allows you to link related annotations that refer to the same entity (e.g., multiple references to the same document or person throughout your text). This is accomplished through Group ID attributes and Group Attributes.
Understanding the System
- Group ID Attribute: An attribute you designate to identify which labels belong together. When multiple labels share the same Group ID value, they form a coreference group.
- Group Attributes: Attributes that are shared across all members of a group. These can only be edited in the Co-reference panel and changes propagate to all group members automatically.
- Regular Attributes: All other attributes remain individual and are set in the attribute menu when applying labels.
Step 1: Configure Group Settings
- Create your label and add all attributes you need (including the Group ID attribute)
- Expand the label in the Label Tree panel
- Locate the "Group Configuration" section at the bottom
- Select the Group ID Attribute: Choose which attribute will serve as the group identifier (e.g., docid)
- Select Group Attributes: Check the boxes for attributes that should be synchronized across all group members (e.g., uri, doctype)
For a "mention" label tracking legal documents:
• Group ID Attribute: docid (to identify which document is referenced)
• Group Attributes: uri, jurisdiction (shared information about the document)
• Regular Attributes: page_number, context_type (specific to each mention)
Step 2: Create Label Instances
- Select text in the rendered view and apply your configured label
- In the parameter menu, set a value for the Group ID attribute (e.g., docid="irpa")
- Set values for any regular attributes (these are specific to this instance)
- Note: You will not see Group Attributes in this menu; they are only editable in the Co-reference panel
- Press Enter or click elsewhere to create the annotation
Step 3: Link Additional Instances
- Find another text span referring to the same entity
- Apply the same label type
- Use the same Group ID value (e.g., docid="irpa" again)
- The two instances are now linked as a coreference group
Step 4: Manage Groups in Co-reference Panel
Once groups are created, they appear in the Co-reference panel (right side of the interface):
- Groups are organized by label name
- Each group shows its Group ID value and member count
- Click "Edit" on a group to set Group Attribute values
- Changes to Group Attributes apply to all group members instantly
- Click on a group to highlight all its members in the document
Use Case Example
Annotating references to the Immigration and Refugee Protection Act:
First mention: "Immigration and Refugee Protection Act"
• Label: mention
• docid: "irpa" (Group ID)
• page_number: "5" (Regular attribute)
Second mention: "IRPA"
• Label: mention
• docid: "irpa" (Same Group ID → linked!)
• page_number: "12" (Different regular attribute)
Third mention: "the Act"
• Label: mention
• docid: "irpa" (Same Group ID → linked!)
• page_number: "15" (Different regular attribute)
In Co-reference panel:
• Group "irpa" (3 members)
• Set Group Attributes:
- uri: "https://laws.justice.gc.ca/eng/acts/I-2.5/"
- jurisdiction: "Federal"
• These values now apply to all three mentions
Group ID attribute: Set individually when creating each label (but use the same value to link them).
Group Attributes: Only editable in the Co-reference panel, shared across all group members.
Regular attributes: Set individually in the parameter menu, unique to each instance.
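The propagation behavior in the use case above can be illustrated with a short sketch. The annotation layout and the set_group_attribute helper are assumptions made for illustration; the tool performs this step itself when you edit a group in the Co-reference panel.

```python
# Hypothetical annotations mirroring the use case above; the layout is
# an assumption made for this sketch.
MENTIONS = [
    {"labelname": "mention", "attributes": {"docid": "irpa", "page_number": "5"}},
    {"labelname": "mention", "attributes": {"docid": "irpa", "page_number": "12"}},
    {"labelname": "mention", "attributes": {"docid": "irpa", "page_number": "15"}},
    {"labelname": "mention", "attributes": {"docid": "competition-act", "page_number": "20"}},
]


def set_group_attribute(annotations, labelname, group_id_attribute, group_id, name, value):
    """Set a shared attribute on every member of one group; return the member count."""
    updated = 0
    for ann in annotations:
        same_label = ann["labelname"] == labelname
        same_group = ann["attributes"].get(group_id_attribute) == group_id
        if same_label and same_group:
            ann["attributes"][name] = value  # shared value, identical for all members
            updated += 1
    return updated


UPDATED = set_group_attribute(
    MENTIONS, "mention", "docid", "irpa",
    "uri", "https://laws.justice.gc.ca/eng/acts/I-2.5/",
)
```

After the call, the three "irpa" mentions carry the same uri, their page_number values are untouched, and the "competition-act" mention is not modified.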
Import & Export
Save and reload your annotated documents with embedded schemas:
Export
Click the "Download" button or "Save As" button to save your work. The exported HTML includes:
- Original document content
- Applied annotations using <manual_label> elements
- Label schema stored in HTML comments
- All attribute values preserved
Automatic Backups
Enable automatic saving every 15 minutes to protect your work:
- Click the auto-save button in the header
- Select a backup folder on your computer
- Backups will be automatically saved every 15 minutes to a backups/ subfolder
- Each backup is timestamped for easy recovery
- Click the auto-save button again to disable automatic backups
Automatic backups require the File System Access API, which is currently supported in Chrome, Edge, and other Chromium-based browsers. The feature may not work in Firefox or Safari.
Import
Load a previously annotated document:
- Click "Load HTML File" or drag-drop the file
- The label schema is automatically extracted from the HTML comments
- Applied labels are parsed and displayed
- Continue annotating or make edits
Portability
All data is self-contained within the HTML file. Share annotated documents with collaborators who can open them directly in HTML Labelizer.
Example: Legal Citations
Annotating a legal citation with nested structure:
Original Text
See Theratechnologies inc. v. 121851 Canada inc., 2015 SCC 18
Annotation Process
- Select the entire citation
- Apply "mention" label with attributes: docid="scc2015-18", doctype="decision"
- Select "Theratechnologies inc. v. 121851 Canada inc."
- Apply "title" sublabel with titletype="case name"
- Select "2015 SCC 18"
- Apply "reference" sublabel
Resulting HTML
<manual_label labelname="mention" docid="scc2015-18" doctype="decision">
See <manual_label labelname="title" titletype="case name">
Theratechnologies inc. v. 121851 Canada inc.
</manual_label>,
<manual_label labelname="reference">
2015 SCC 18
</manual_label>
</manual_label>
Example: Coreference
Linking multiple references to the same document:
Scenario
A document refers to the Immigration and Refugee Protection Act in three different ways:
- "Immigration and Refugee Protection Act" (full name)
- "IRPA" (acronym)
- "the Act" (anaphoric reference)
Annotation
Apply the "mention" label to each occurrence with the same docid="irpa":
<manual_label labelname="mention" docid="irpa" doctype="legislation">
Immigration and Refugee Protection Act
</manual_label>
...
<manual_label labelname="mention" docid="irpa">
IRPA
</manual_label>
...
<manual_label labelname="mention" docid="irpa">
the Act
</manual_label>
Result
All three annotations form a coreference group. If you edit the doctype on any member (assuming it's configured as a Group Attribute), the change propagates to all members.
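When post-processing an exported file, it can be useful to check whether a Group Attribute is consistent within each group. The following is a minimal sketch using Python's standard html.parser; the LabelCollector class and group_attribute_values function are hypothetical helpers, and the sample markup mirrors the annotation above.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Sample markup mirroring the coreference example above.
EXPORTED = """
<manual_label labelname="mention" docid="irpa" doctype="legislation">
Immigration and Refugee Protection Act</manual_label> ...
<manual_label labelname="mention" docid="irpa">IRPA</manual_label> ...
<manual_label labelname="mention" docid="irpa">the Act</manual_label>
"""


class LabelCollector(HTMLParser):
    """Collect the attributes of every <manual_label> start tag."""

    def __init__(self):
        super().__init__()
        self.labels = []

    def handle_starttag(self, tag, attrs):
        if tag == "manual_label":  # html.parser lowercases tag names
            self.labels.append(dict(attrs))


def group_attribute_values(html, group_id_attribute, group_attribute):
    """Map each Group ID to the set of values seen for one attribute."""
    collector = LabelCollector()
    collector.feed(html)
    collector.close()
    values = defaultdict(set)
    for attrs in collector.labels:
        group_id = attrs.get(group_id_attribute)
        if group_id is not None:
            values[group_id].add(attrs.get(group_attribute))  # None if missing
    return dict(values)


VALUES = group_attribute_values(EXPORTED, "docid", "doctype")
```

A group whose set holds more than one entry, as "irpa" does here before doctype has been propagated, has members that disagree on that attribute.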
Example: Nested Structures
Complex annotation with multiple nesting levels:
Text
Section 12(1)(a) of the Competition Act, R.S.C. 1985, c. C-34
Structure
- mention (entire citation): docid="competition-act", doctype="legislation"
  - fragment (Section 12(1)(a)): fragmentid="12(1)(a)", fragmenttype="section"
  - title (Competition Act): titletype="short title"
  - reference (R.S.C. 1985, c. C-34): reftype="official"
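To make the structure concrete, the sketch below renders this citation as nested <manual_label> markup. The node layout and the label and render helpers are assumptions made for illustration, and the exact whitespace and attribute order of the tool's own export may differ.

```python
from html import escape


def label(labelname, content, **attributes):
    """Build a node; content is a list of strings and child nodes."""
    return {"labelname": labelname, "attributes": attributes, "content": content}


def render(node):
    """Serialize a node and its children as nested <manual_label> elements."""
    attrs = {"labelname": node["labelname"], **node["attributes"]}
    attr_text = " ".join(
        f'{name}="{escape(value, quote=True)}"' for name, value in attrs.items()
    )
    body = "".join(
        escape(part, quote=False) if isinstance(part, str) else render(part)
        for part in node["content"]
    )
    return f"<manual_label {attr_text}>{body}</manual_label>"


CITATION = label(
    "mention",
    [
        label("fragment", ["Section 12(1)(a)"],
              fragmentid="12(1)(a)", fragmenttype="section"),
        " of the ",
        label("title", ["Competition Act"], titletype="short title"),
        ", ",
        label("reference", ["R.S.C. 1985, c. C-34"], reftype="official"),
    ],
    docid="competition-act",
    doctype="legislation",
)
MARKUP = render(CITATION)
```

The parent mention wraps the whole citation, and each sublabel wraps only its own portion, so the position of every element is preserved in the output.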
Data Format
HTML Labelizer uses a custom data format embedded within HTML documents:
Schema Storage
The label schema is stored as a JSON object in an HTML comment before the <head> tag:
<!-- HTMLLabelizer {
"labels": [...],
"version": "1.0"
} -->
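For scripts that process exported files, the schema comment can be read back with the standard library. This is a minimal sketch, assuming the comment starts with the HTMLLabelizer marker exactly as shown above and that no nested object is immediately followed by the comment terminator; extract_schema and the sample document are hypothetical.

```python
import json
import re

# Matches the schema comment and captures its JSON payload.
SCHEMA_COMMENT = re.compile(r"<!--\s*HTMLLabelizer\s*(\{.*?\})\s*-->", re.DOTALL)


def extract_schema(html):
    """Return the embedded schema as a dict, or None if no comment is found."""
    match = SCHEMA_COMMENT.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


# Hypothetical exported document with a one-label schema.
SAMPLE = """<!-- HTMLLabelizer {
  "labels": [{"name": "mention", "params": {}}],
  "version": "1.0"
} -->
<html><head></head><body>text</body></html>"""

SCHEMA = extract_schema(SAMPLE)
```

The function returns None for documents that carry no schema comment, so callers can distinguish plain HTML from annotated exports.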
Label Schema Structure
{
"name": "mention",
"color": "#6aa3ff",
"type": "structured",
"params": {
"docid": { "type": "string", "default": "" },
"doctype": {
"type": "dropdown",
"options": ["legislation", "decision", "other"],
"default": "legislation"
}
},
"sublabels": [...],
"groupConfig": {
"groupIdAttribute": "docid",
"groupAttributes": ["doctype", "url"]
}
}
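Given a label definition of this shape, the functional role of each attribute follows from groupConfig. The sketch below is illustrative only: classify_attributes is a hypothetical helper, and the sample label adds a page_number attribute that does not appear in the schema above.

```python
def classify_attributes(label):
    """Map each attribute name to "group_id", "group", or "regular"."""
    config = label.get("groupConfig", {})
    group_id = config.get("groupIdAttribute")
    shared = set(config.get("groupAttributes", []))
    roles = {}
    for name in label.get("params", {}):
        if name == group_id:
            roles[name] = "group_id"  # links instances into a group
        elif name in shared:
            roles[name] = "group"  # synchronized across group members
        else:
            roles[name] = "regular"  # set per instance
    return roles


# Hypothetical label definition following the structure shown above.
MENTION = {
    "name": "mention",
    "params": {
        "docid": {"type": "string", "default": ""},
        "doctype": {"type": "dropdown",
                    "options": ["legislation", "decision", "other"],
                    "default": "legislation"},
        "page_number": {"type": "string", "default": ""},
    },
    "groupConfig": {"groupIdAttribute": "docid", "groupAttributes": ["doctype"]},
}

ROLES = classify_attributes(MENTION)
```

A label without a groupConfig entry simply reports every attribute as regular.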
HTML Structure
Applied annotations use custom <manual_label> elements:
Element Format
<manual_label
labelname="mention"
docid="example-123"
doctype="legislation">
annotated text content
</manual_label>
Attributes
- labelname: Required. The name of the label being applied
- Additional attributes: All label parameters stored as HTML attributes
Nesting
Sublabels are nested within parent <manual_label> elements:
<manual_label labelname="mention" docid="abc">
Some text
<manual_label labelname="title">
nested content
</manual_label>
more text
</manual_label>
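Nested annotations can be recovered from an exported file as a tree. The following is a minimal sketch based on Python's standard html.parser; the LabelTreeBuilder class, the parse_labels function, and the node layout are assumptions made for illustration.

```python
from html.parser import HTMLParser


class LabelTreeBuilder(HTMLParser):
    """Build a tree of <manual_label> elements with their text content."""

    def __init__(self):
        super().__init__()
        self.roots = []   # top-level labels
        self._stack = []  # currently open labels, outermost first

    def handle_starttag(self, tag, attrs):
        if tag != "manual_label":
            return
        attributes = dict(attrs)
        node = {
            "labelname": attributes.pop("labelname", None),
            "attributes": attributes,
            "text": "",
            "children": [],
        }
        siblings = self._stack[-1]["children"] if self._stack else self.roots
        siblings.append(node)
        self._stack.append(node)

    def handle_endtag(self, tag):
        if tag == "manual_label" and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Text belongs to every open label, so a parent's text
        # includes the text of its sublabels.
        for node in self._stack:
            node["text"] += data


def parse_labels(html):
    """Return the list of top-level label nodes found in html."""
    builder = LabelTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.roots


SAMPLE = ('<manual_label labelname="mention" docid="abc">Some text '
          '<manual_label labelname="title">nested content</manual_label>'
          ' more text</manual_label>')
TREE = parse_labels(SAMPLE)
```

Each node keeps its label name, its remaining attributes, the full text it covers, and its sublabels in document order.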
Browser Support
HTML Labelizer works in modern browsers with ES6 module support:
Minimum Versions
- Chrome/Edge: Version 90 or later
- Firefox: Version 88 or later
- Safari: Version 14 or later
Required Features
- ES6 Modules (import/export)
- JavaScript Map and Set
- CSS Grid and Flexbox
- Local file system access via File API
No Installation Required
HTML Labelizer is a static web application that runs entirely in your browser. No server components, no installation process, and no external dependencies.
For security reasons, browsers do not allow web pages to write to the file system without your explicit permission. Use the "Download" or "Save As" button to save your work, which triggers a browser download.
Found a bug or have a feature request? Visit the GitHub repository to open an issue or contribute to the project.