JSON Image Metadata

Editor’s Draft

This version:
https://Inclusio-Community.github.io/json-image-metadata/
Issue Tracking:
GitHub
Editors:
(Fizz Studio)
(Fizz Studio)
Semantic Version:
1.1.0

Abstract

A specification for expressing technical image metadata, with an emphasis on accessibility of data visualizations.

1. Introduction

This specification defines a structured metadata schema for image documents to express information, data, and behaviors related to the visual image, with a particular emphasis on accessibility. The metadata includes raw data, provenance information, and predefined behaviors for assistive technology techniques, such as haptics, sonification, tactiles, high contrast, voicing, braille, and more.

The metadata is aimed primarily at data visualizations such as charts, graphs, diagrams and other information graphics, but can also be used for related purposes in any image.

It is intended for use with a structured graphics markup such as Scalable Vector Graphics (SVG), but could be embedded in and applied to any image format, including raster images and 3D objects and scenes.

1.1. Motivation

This specification was written to help unify and standardize accessible graphics, and to enable systemic experimentation with assistive technology techniques. By defining the way techniques such as haptics, sonification, and tactiles are expressed in a declarative way, the parameters can be explored and tested with users to ensure ease of consistent authoring and interoperable behavior across different assistive technology readers and devices.

By standardizing the accessibility features of a document, we enable users (such as people with disabilities, or teachers or caregivers of people with disabilities) to easily find content that matches their needs, or to find content they want to adapt or have adapted for their needs. This provides a meaningful path towards the curation, creation, and adaptation of accessible content, which is one of the primary gaps in accessibility today, particularly in education.

Of particular note is support for hybrid physical-digital documents, such as printed or embossed images that are overlaid on a touch screen where the digital equivalent is mirrored, so that a tactile experience can be enhanced with haptics or voicing. Similarly, a 3D-printed object could be used with a camera scanner to associate sections of the object with specific metadata.

This specification uses familiar technologies such as JSON and CSS Selectors, for ease of implementation and authoring.

2. JIM Conformance

2.1. JIM MIME type

JIM documents are JSON texts encoded as UTF-8.

When a JSON Image Metadata (JIM) document is exchanged as a standalone resource (for example, as an HTTP response body or as a sidecar file), the intended media type is:

application/jim+json

JSON processing implementations that do not recognize the application/jim+json media type may treat JIM content as application/json.

This specification does not yet register application/jim+json with IANA. A future revision of this specification is expected to request formal registration of this media type.

2.2. Version conformance

This specification is versioned using Semantic Versioning [Semantic Versioning 2.0.0].

A document conforming to this specification should specify the version number of the specification to which it conforms, in the jim key of the version block.

An implementation conforming to this specification must store internally the version number of the specification to which it conforms, and should expose the version number to the end user. If the implementation opens a JIM document which specifies an incompatible version of JIM, the implementation must warn the end user.

Note that this specification is being developed in parallel with several experimental implementations, on a milestone basis, as part of the Inclusio Project. The minor version number (MAJOR.MINOR.PATCH) corresponds to the project milestone number across the Inclusio Project implementations.

Note that version 0.3.0 of this specification is the first version to use Semantic Versioning. Future versions of this specification will adhere to the practice of incrementing the major version number for breaking changes.
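
The version check described in this section can be sketched as follows. This is a minimal, non-normative illustration. It assumes that the version block is an object at the key version with a jim member, and that "incompatible" means a different MAJOR number; neither detail is fixed by this section, and the function names are hypothetical.

```python
def parse_semver(version: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into integers, dropping any pre-release or build suffix."""
    core = version.split("+", 1)[0].split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def is_compatible(document_version: str, implementation_version: str) -> bool:
    """Assumption: two versions are compatible when their MAJOR numbers match."""
    return parse_semver(document_version)[0] == parse_semver(implementation_version)[0]


def check_document_version(jim: dict, implementation_version: str = "1.1.0") -> list[str]:
    """Return warnings to surface to the end user (an empty list if there are none)."""
    # Assumption: the version block looks like {"version": {"jim": "1.1.0"}}.
    declared = jim.get("version", {}).get("jim")
    if declared is None:
        # Declaring a version is a "should", so its absence is not treated as an error here.
        return []
    if not is_compatible(declared, implementation_version):
        return [
            f"Document targets JIM {declared}; "
            f"this implementation conforms to JIM {implementation_version}."
        ]
    return []


warnings = check_document_version({"version": {"jim": "2.0.0"}})
```

Here warnings holds a single message, which the implementation would present to the end user.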

2.3. Conforming document

A document conforming to this specification must contain a metadata block that matches the schema defined in this specification, and must have at least one non-empty datasets block or one non-empty behaviors block.

This specification assumes that the metadata is formatted as a JSON object. Other conforming serializations are possible, such as XML or a binary format, as long as they adhere to the structure of the defined JSON schema, and can be serialized as JSON.

A conforming document must provide a means to discover the metadata block.

2.3.1. Conforming SVG document

In SVG, the metadata block must be the content of a metadata element whose data-type attribute has the value application/jim+json, the MIME type for a JIM document.
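
As a non-normative illustration, discovery of the metadata block in an SVG document might look like the sketch below. It uses only the Python standard library; the function name extract_jim_from_svg and the sample SVG are hypothetical, and the sketch assumes the metadata element is in the SVG namespace.

```python
import json
import xml.etree.ElementTree as ET

JIM_MEDIA_TYPE = "application/jim+json"
SVG_NS = "{http://www.w3.org/2000/svg}"


def extract_jim_from_svg(svg_text: str):
    """Return the parsed JIM metadata from an SVG string, or None if there is none.

    Raises ValueError if the matching element does not contain valid JSON.
    """
    root = ET.fromstring(svg_text)
    # iter() walks the whole tree, so the metadata element may sit at any depth.
    for element in root.iter(f"{SVG_NS}metadata"):
        if element.get("data-type") == JIM_MEDIA_TYPE:
            return json.loads(element.text or "")
    return None


sample_svg = """<svg xmlns="http://www.w3.org/2000/svg">
  <metadata data-type="application/jim+json">
    {"datasets": [{"title": "Chart Title"}]}
  </metadata>
</svg>"""

jim = extract_jim_from_svg(sample_svg)
```

Because the JSON is carried as XML character data, any "<" or "&" characters inside it need to be escaped or wrapped in a CDATA section; the XML parser undoes either before the text reaches the JSON parser.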

2.3.2. Conforming raster document

For raster image documents, the JIM content block must be included in the metadata section of the raster image file.

2.4. Conforming implementation

An implementation conforming to this specification must access the contents of the metadata block. For an SVG-capable implementation displaying an SVG document, a conforming implementation must read the JSON data in the indicated metadata element.

A conforming implementation must provide a means for the end user to extract the contents of the datasets block.

A conforming implementation must resolve all selectors and execute all supported features of the behaviors block.

2.4.1. Classes of Conforming implementation

This specification defines requirements for conforming implementations that generate, consume, or process JSON Image Metadata (JIM), grouped into multiple conformance classes. A conforming implementation must satisfy the requirements of at least one conformance class defined in this section. An implementation may claim conformance to multiple classes if it meets the requirements of each.

Conformance does not imply any guarantees of content quality, accessibility outcomes, pedagogical suitability, or fitness for a particular purpose. Conformance solely indicates adherence to the interoperability requirements defined in this specification.

TODO: define conformance classes

3. JIM metadata structure and forms

A JIM document may be represented either as a "root form" document (where JIM-defined blocks appear at the JSON root) or as an "enveloped form" document (where JIM-defined blocks appear under a nested jim block). Both forms represent the same conceptual information model.

The enveloped form separates normative JIM metadata (jim) from optional non-JIM metadata (extensions). This separation makes trust boundaries explicit, supports vendor innovation without affecting conformance, and enables audit mechanisms to target the JIM-defined information model without binding unrelated metadata.

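When using the enveloped form, the JSON document root is an envelope object that may contain the following blocks: the jim block (see 3.1), which holds the JIM-defined metadata, and the extensions block (see 3.2), which holds optional non-JIM metadata.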

Unless otherwise stated, all requirements and definitions in this specification that apply to "the JIM object" apply to the value of the jim block when the enveloped form is used, and apply to the JSON document root when the root form is used.

3.1. jim block

In the enveloped form, the jim block must be present and its value must be a JSON object. The jim object contains only blocks defined by this specification (or future revisions thereof). Generators must not place extension-only metadata inside the jim object.

3.2. extensions block

If present, the extensions block must be a JSON object. The extensions object is reserved for metadata not defined by this specification. The semantics and validation of extension blocks are outside the scope of JIM.

Keys within extensions should be collision-resistant namespace identifiers (for example, reverse-DNS identifiers or absolute URIs). The value associated with each namespaced key should be a JSON object.

JIM processors must ignore extensions for the purposes of JIM conformance and semantic interpretation. JIM processors must not treat extension metadata as overriding, redefining, or contradicting JIM-defined semantics.

3.3. Unknown blocks and forward compatibility

Processors must ignore any JSON object blocks they do not recognize, except where this specification explicitly defines error handling for a particular block. This requirement applies both to:

3.4. Root form and enveloped form compatibility

Conforming JIM implementations must accept both forms.

A document is in the enveloped form if and only if the JSON root contains a jim block whose value is a JSON object.

If a document contains a jim block at the root, the jim object is authoritative for JIM-defined blocks. Generators must not duplicate JIM-defined items both at the document root and inside jim. If a consumer encounters such duplication, it must treat the document as invalid under this specification.
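
The form detection and duplication rules above can be sketched as follows. This is a non-normative illustration: the set of JIM-defined block names is an assumption gathered from the blocks named elsewhere in this specification, and the names resolve_jim_object and InvalidJimDocument are hypothetical.

```python
# Assumption: the JIM-defined block names, collected from this specification's examples.
JIM_BLOCK_KEYS = {"datasets", "selectors", "behaviors", "provenance", "tags", "href", "version"}


class InvalidJimDocument(ValueError):
    """Raised when a document is invalid under the rules sketched here."""


def resolve_jim_object(root: dict) -> dict:
    """Return "the JIM object" for either the root form or the enveloped form."""
    jim = root.get("jim")
    if isinstance(jim, dict):
        # Enveloped form: the jim object is authoritative for JIM-defined blocks.
        duplicated = JIM_BLOCK_KEYS & set(root) & set(jim)
        if duplicated:
            raise InvalidJimDocument(
                f"blocks duplicated at the root and inside jim: {sorted(duplicated)}"
            )
        return jim
    # Root form: the JSON document root is itself the JIM object.
    return root


root_form = {"datasets": [], "selectors": {}, "behaviors": []}
enveloped = {"jim": {"datasets": []}, "extensions": {"org.example.charts": {}}}

jim_from_root = resolve_jim_object(root_form)
jim_from_envelope = resolve_jim_object(enveloped)
```

Note that the extensions block is never consulted, which matches the requirement that processors ignore extensions for the purposes of semantic interpretation.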

3.5. Root form and enveloped form examples

Root form:

{
  "datasets": [],
  "selectors": {},
  "behaviors": []
}

Enveloped form:

{
  "jim": {
    "datasets": [],
    "selectors": {},
    "behaviors": []
  },
  "extensions": {
    "org.example.charts": {
      "settings": {
        "sonification-enabled": true
      }
    }
  }
}

4. JIM block definitions

Note that the examples below are notional, and intended for illustrative purposes. They are not a formal definition. The final definitions will be expressed in JSON Schema [JSON-Schema], and will include data types for each block.

This vocabulary is defined in terms of "blocks", or objects identified by a key with a defined set of optional sub-keys. This includes the datasets, provenance, tags, and behaviors blocks.

{
  "datasets": [
  ],
  "selectors": {
  },
  "behaviors": [
  ],
  "provenance": {
  },
  "tags": {
  },
  "href": {
  }
}

4.1. Datasets

Datasets refer to the raw data which the image represents.

There are two ways to link the data to the image file: inline, and by external reference. These methods can be used together in some ways, defined below.

A datasets block is an array of objects with the key datasets. Each child object represents a discrete dataset represented in the graphic.

{
  "datasets": [
    {
    }
  ]
}  

Note: Multiple datasets serve the use cases of: representing more than one type of chart in the same rendered chart, such as a line chart superimposed over a bar chart; representing multiple scales, such as a line chart with 2 series, each of which is on a different scale (e.g. where there are two y-axes, one on each side of the chart); dashboards with multiple charts, especially when two of the charts are closely related or show different aspects of the same data; and data that comes from disparate sources or provenance.

Note: As a heuristic, if all of the facets (and sources) are identical between two or more datasets, they can be merged into a single dataset object; but if any of the facets (or sources) is different, they are better represented as different datasets.

4.1.1. Fidelity in dataset representation

A document conforming to this specification must maintain an accurate visual and/or structural representation of the dataset specified in the datasets block, including the selectors that bind a DOM representation to the JSONPath of the represented data.

A conforming graphical document which represents structured data must also contain the corresponding dataset in JSON or must link to the dataset in some other format.

Not all graphical documents, especially those that show illustrations rather than structured charts, have an obvious or useful JSON representation. If no schema defined in this specification matches the type of graphics depicted in the document, then a JIM block can still be useful in providing the title, description, provenance, and other information.

4.1.1.1. Superset datasets

A conforming document may include more data than is represented in the visualization, so long as the visual and structural representation is an accurate representation of a subset of the datasets.

Data supersets occur in a number of useful scenarios. Valid examples include (but are not limited to) the following: an author may wish for others to have their whole dataset for independent verification, even though the author focuses a visualization on only one aspect of that dataset; or a user may "zoom in" on a shorter range of the data in an interactive chart that enables that.

4.1.1.2. Synchronizing datasets and representations

Conforming authoring tools, including any software that enables editing of graphics or data, must ensure that any changes to the graphics that represent data are also changed in the corresponding data, that any changes to the data are also changed in the corresponding graphics, and that any relevant selectors still bind the corresponding data and representation.

4.1.1.3. Conforming misrepresentation of datasets

This specification does not place conformance restrictions on the accuracy, aptness, or honesty of the data representation.

A visualization which contains mistakes in the dataset itself or in the visualization, which uses the wrong type of chart for the data, or which is deliberately misleading does not necessarily fail in conformance to this specification, so long as the document uses accurate datasets and corresponding selectors.

Note: The authors of this specification strongly condemn misrepresenting data with a visualization. However, this is an ethical issue, not a technical or interoperability issue.

4.1.2. Dataset structures

This specification requires that an inline dataset be in JSON format, but does not mandate a particular structure for a dataset.

No single dataset structure optimally represents every different kind of data or chart type. Often, an inline dataset might reflect the internal model used by the software that generated the image, diagram, or data visualization.

Instead, this specification defines the selector attribute, which can be added to an object in an arbitrary JSON data structure to indicate a link between a data point and its graphical representation element.

This specification does include a set of canonical dataset structures which may be used. However, conforming implementations may express the dataset in a different JSON structure, so long as it includes the same set of keys as in the schema.

4.1.2.1. Structured data

Structured data includes faceted data, whether qualitative or quantitative, that can conform to a well-defined schema. Normally, structured data is represented by a chart or graph.

This data must be included in objects in the facets and series keys.

Note: A future version of this specification will define schemas for a canonical dataset structure or set of structures for different types of data visualization.

4.1.2.2. Semi-structured data

Semi-structured data may include structured data, but also includes unstructured data that is significant to the representation. Semi-structured data may be represented by some types of diagrams, illustrations, pictorial graphics, or infographics. There are usually groupings or hierarchies, but they do not conform to a schema.

Labels, descriptions, and other semi-structured metadata may optionally be included in objects in the items key.

Note: Certain types of diagram that denote steps, stages, or relationships between objects, such as the water evaporation cycle, can be represented as structured data with the graph schema.

4.1.2.3. Unstructured data

Unstructured data includes many types of illustrations, pictorial graphics, or infographics. There may be some informal structure to such graphics, including hierarchies such as groupings or section headings, but they are not formatted in a defined manner.

Labels, descriptions, and other unstructured metadata may optionally be included in objects in the items key.

4.1.3. Dataset keys

The notional dataset described in this specification includes exemplar entities which should be included in any inline dataset for optimal correlation to common graphical representation elements.

These entities include the title, subtitle, description, representation, facets, series, items, source, and href keys and their values.

| key | value type | description | requirement |
|---|---|---|---|
| title | non-empty string | The title of the graphic document | required |
| subtitle | non-empty string | The subtitle of the graphic document | optional |
| description | non-empty string | A meaningful summary of the graphic document | optional |
| representation | object | Various aspects of the representation, such as the type of chart | optional |
| facets | object | Details of each facet in the dataset, such as x and y axes or legend keys, for structured and semi-structured data | optional for unstructured data, required for structured and semi-structured data |
| series | array | Raw data values of each record in each series of the dataset for structured and semi-structured data | optional for unstructured data, required for structured and semi-structured data |
| items | object | A loosely organized collection of items for unstructured data, such as labels, values, and descriptions for diagrams | optional |
| source | object | The provenance of the dataset | optional |
| href | array | An array of objects containing links to relevant resources, such as the dataset in another format, or an extended dataset | optional |

An example of a datasets block (with elided values) for a structured data representation, such as a bar chart:
"datasets": [
  {
    "title": "Chart Title",
    "subtitle": "Chart Subtitle",
    "description": "A description of the chart.",
    "representation": {
    },
    "facets": {
    },
    "series": [
    ],
    "source": {
    },
    "href": [
    ]
  }
]

TODO: add example

The individual entries of the datasets block are defined in Inline data.

4.1.4. The title key

The title of the graphic document.

4.1.5. The subtitle key

The subtitle of the graphic document.

4.1.6. The description key

A meaningful summary of the graphic document.

4.1.7. The representation key

The representation block describes high-level attributes of the dataset’s representation. It is intended to support basic classification, discovery, and interoperable default rendering across consumers.

The representation block may be present on a dataset. Consumers may ignore the representation block.

The representation object contains the type, subtype, and structure keys.

| key | value type | description | requirement |
|---|---|---|---|
| type | string | The type of representation, such as chart, diagram, or illustration | required if the representation block is present |
| subtype | string | The subtype of representation, such as a specific kind of chart, diagram, or illustration | optional for unstructured data, required for structured and semi-structured data |
| structure | array | The structure steps of a subtype, each defined by a role and its facet-keys | optional |

4.1.7.1. The type key

The type key of the representation block identifies the broad category of what the graphic is intended to show. If the representation block is present, the representation.type key must be present.

The value of representation.type must be one of the following lowercase ASCII tokens:

| token | description | structure status |
|---|---|---|
| chart | A visual representation primarily intended to communicate quantitative or categorical data relationships. | structured |
| diagram | A structured representation intended to explain relationships, processes, systems, or concepts that are not primarily statistical in nature. | structured or semi-structured |
| map | A representation that conveys spatial relationships tied to geographic or coordinate-based locations. | structured or semi-structured |
| table | A representation that presents data in a grid of rows and columns. | structured |
| formula | A representation whose primary content is a mathematical, logical, or symbolic expression. | structured |
| illustration | A representational or explanatory image that does not fall into the other defined categories and does not primarily encode structured data relationships. | semi-structured or unstructured |
| other | A fallback value used when none of the defined vocabulary terms apply. | structured, semi-structured, or unstructured |

Generators must not invent new representation.type values. If none of the defined values apply, generators must use other and may provide additional classification using extensions.

Consumers must not infer required data structures, rendering behavior, interaction affordances, accessibility semantics, or authorization decisions solely from the value of representation.type.
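
A generator or validator could check the closed type vocabulary with a sketch like the following. It is non-normative; the function name check_representation and the wording of the messages are hypothetical.

```python
# The closed vocabulary of representation.type tokens listed in the table above.
REPRESENTATION_TYPES = frozenset(
    {"chart", "diagram", "map", "table", "formula", "illustration", "other"}
)


def check_representation(representation: dict) -> list[str]:
    """Return a list of problems found in a representation block (empty if none)."""
    problems = []
    rep_type = representation.get("type")
    if rep_type is None:
        problems.append("representation.type is required when representation is present")
    elif rep_type not in REPRESENTATION_TYPES:
        # Tokens are lowercase ASCII, so "Chart" is rejected as well as invented values.
        problems.append(f"unknown representation.type {rep_type!r}; use 'other' instead")
    return problems


problems = check_representation({"type": "Chart", "subtype": "bar"})
```

The check only reports whether the token is in the vocabulary. In line with the requirement above, it draws no conclusions about data structure or behavior from the value.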

4.1.7.2. The subtype key

The subtype key of the representation block optionally provides additional specialization within a given representation.type. It is intended to support interoperable default rendering and high-level accessibility workflows. It does not encode the full set of visual or semantic variations within a subtype.

If representation.subtype is present, representation.type must also be present.

Consumers may ignore representation.subtype. Consumers must not infer required data structure, interaction semantics, or accessibility semantics solely from representation.subtype.

4.1.7.2.1. Chart subtype registry (external)

When representation.type is chart, representation.subtype should be a token defined in the chart subtype registry.

The chart subtype registry is maintained by this project as a versioned document alongside this specification. Each published version of this specification corresponds to a specific revision of the registry.

Generators should use registry-defined tokens when applicable. Generators may emit tokens not present in the registry; when doing so, generators should use other if interoperability is required, or should use a prefixed token (for example vendorname:clustered-stacked) to avoid collisions.

Consumers may accept tokens not present in the registry. Consumers that validate representation.subtype against the registry should treat unrecognized tokens as non-fatal and may emit a warning.

Consumers must not infer required data structures, rendering behavior, interaction affordances, or accessibility semantics solely from representation.subtype.

4.1.7.2.2. Registry token format

Registry tokens must be lowercase ASCII and must match the pattern:

^[a-z0-9]+(-[a-z0-9]+)*$
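
For illustration, the pattern can be applied as in the sketch below. The function name is_registry_token is hypothetical. The sketch uses re.fullmatch rather than the literal ^...$ anchors, because in Python "$" also matches just before a trailing newline.

```python
import re

# Same pattern as above, without anchors: fullmatch() requires the whole string to match.
REGISTRY_TOKEN = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def is_registry_token(token: str) -> bool:
    """Return True if the token has the registry token format."""
    return REGISTRY_TOKEN.fullmatch(token) is not None


results = {
    token: is_registry_token(token)
    for token in ["bar", "clustered-stacked", "Bar", "bar-", "vendorname:clustered-stacked"]
}
```

Only the first two tokens pass. A prefixed token such as vendorname:clustered-stacked contains a colon, so it does not have the registry token format.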

4.1.7.2.3. Example subtype registry (informative)

This table shows a non-normative set of examples for the subtype registry. This table is included for reference only, and is not to be confused or conflated with the actual subtype registry.

| token | description | host type |
|---|---|---|
| bar | A chart family that encodes values using rectangular bars. | chart |
| line | A chart family that encodes values using line segments connecting ordered points. | chart |
| area | A chart family that encodes values using a filled region. | chart |
| point | A chart family that encodes values using discrete marks such as points or symbols. | chart |
| scatter | A chart family that encodes relationships using point marks positioned by two quantitative or ordered dimensions. | chart |
| histogram | A chart family that represents a distribution by binning a variable and encoding bin counts or densities. | chart |
| boxplot | A chart family that represents distributions using quartiles, median, and whiskers. | chart |
| heatmap | A chart family that encodes values using a color scale across a two-dimensional grid of categories or bins. | chart |
| pie | A chart family that encodes part-to-whole proportions as angular segments of a circle. | chart |
| donut | A pie-like chart family with a central hole. | chart |
| waterfall | A chart family that encodes additive or subtractive contributions to a running total. | chart |
| funnel | A chart family that encodes stage-based quantities decreasing through a process. | chart |
| radar | A chart family that encodes values along multiple radial axes. | chart |
| other | A fallback value used when none of the defined chart subtypes apply. | chart |
| flowchart | A diagram family that represents a process or workflow using nodes and directed connections. | diagram |
| venn | A diagram family that represents set relationships using overlapping regions. | diagram |

Generators should use the chart subtype tokens defined above when applicable. If none apply, generators should use other and may provide additional specialization using extensions.

Consumers that render charts should treat representation.subtype as a primary chart-family discriminator when present, and should use facets, series, and other dataset content to determine specifics.

4.1.7.3. The structure key

The structure key of the representation block is an optional ordered array that describes how a consumer may group, nest, or otherwise organize a representation using one or more facets. It is intended to provide declarative guidance for common organization patterns without defining a complete rendering grammar.

Each entry in representation.structure is a structure step.

Each structure step must be an object with the following members:

| key | value type | description | requirement |
|---|---|---|---|
| role | string | A lowercase ASCII token naming the intended organizational role of the step. This specification may define common roles for interoperability, but generators may use additional role values. | required if the structure block is present |
| facet-keys | array of strings | A non-empty array of facet key strings. Each string must correspond to a key present in facets. | required if the structure block is present |

The order of entries in representation.structure is significant and defines a nesting sequence from outer to inner organization steps.

The meaning of a structure step depends on representation.type and, when present, representation.subtype. Consumers may ignore unknown role values, may ignore representation.structure entirely, and should not treat it as an error if representation.structure is present but not understood.

Consumers must not treat representation.structure as an authorization signal or as a trigger for privileged or unsafe behavior.

An example of a representation block with two structure steps, with the roles "cluster" and "stack". The facet-keys values "brand" and "region" each match a [facet-string] key in the facets block. The result, in a supporting chart system, would be a multi-series bar chart with "brand" clusters, stacked by "region":
{
  "representation": {
    "type": "chart",
    "subtype": "bar",
    "structure": [
      { "role": "cluster", "facet-keys": ["brand"] },
      { "role": "stack", "facet-keys": ["region"] }
    ]
  }
}
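
The requirements on structure steps can be checked against a dataset's facets as in the non-normative sketch below. The function name check_structure is hypothetical, and the facets used in the usage lines (brand, region, and sales) are assumed for illustration, since the example above does not show a facets block.

```python
def check_structure(representation: dict, facets: dict) -> list[str]:
    """Return problems found in representation.structure (empty if none)."""
    problems = []
    for index, step in enumerate(representation.get("structure", [])):
        role = step.get("role")
        if not isinstance(role, str) or not role:
            problems.append(f"step {index}: role must be a non-empty string")
        facet_keys = step.get("facet-keys")
        if not isinstance(facet_keys, list) or not facet_keys:
            problems.append(f"step {index}: facet-keys must be a non-empty array")
            continue
        for key in facet_keys:
            # Each facet key string must correspond to a key present in facets.
            if key not in facets:
                problems.append(f"step {index}: facet key {key!r} is not defined in facets")
    return problems


representation = {
    "type": "chart",
    "subtype": "bar",
    "structure": [
        {"role": "cluster", "facet-keys": ["brand"]},
        {"role": "stack", "facet-keys": ["region"]},
    ],
}

ok = check_structure(representation, {"brand": {}, "region": {}, "sales": {}})
missing = check_structure(representation, {"brand": {}, "sales": {}})
```

With all three facets defined, ok is empty. When the region facet is absent, missing reports the second step. Unknown role values are deliberately not reported, since consumers may ignore them.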

4.1.8. The facets key

The facets block contains details of each facet in the dataset, such as x and y axes or legend keys, for structured and semi-structured data.

The facets object contains one or more [facet-string] keys. A [facet-string] key may be any value allowed by the schema for the representation type, and will normally represent an axis (e.g. "x" or "y"), a legend (e.g. "legend"), a magnitude, or any other type of recorded data being represented. Each [facet-string] is the key to an object which defines the parameters, value type, and descriptors for the corresponding key in the child objects of the records array of each object in the series array.

These [facet-string] keys must consist of a string of ASCII letters, digits, '_', and '-', with no whitespace. Each [facet-string] key must be a unique value.
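
Both constraints can be checked while parsing, as in the non-normative sketch below. Most JSON parsers silently keep the last of two duplicate keys, so uniqueness has to be checked during parsing; the sketch does so with the object_pairs_hook parameter of Python's json.loads. The function names are hypothetical.

```python
import json
import re

# ASCII letters, digits, '_' and '-', with no whitespace, at least one character.
FACET_KEY = re.compile(r"[A-Za-z0-9_-]+")


def reject_duplicate_keys(pairs):
    """Build a dict from key/value pairs, refusing repeated keys.

    json.loads calls this hook for every JSON object in the text, so repeated
    keys inside a nested facet object are rejected as well.
    """
    keys = [key for key, _ in pairs]
    duplicates = {key for key in keys if keys.count(key) > 1}
    if duplicates:
        raise ValueError(f"duplicate keys: {sorted(duplicates)}")
    return dict(pairs)


def load_facets(text: str) -> dict:
    """Parse the JSON text of a facets object and validate its [facet-string] keys."""
    facets = json.loads(text, object_pairs_hook=reject_duplicate_keys)
    bad = [key for key in facets if not FACET_KEY.fullmatch(key)]
    if bad:
        raise ValueError(f"invalid facet keys: {bad}")
    return facets


facets = load_facets('{"x": {"label": "Month"}, "y": {"label": "Temperature"}}')
```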

An example of a facets block with two [facet-string] object entries, "x" and "y". The values "x" and "y" are matched in the records block objects, which contain the actual values for those facets.
{
  "datasets": [
    {
      "facets": {
        "x": {
          "label": "Month",
          "variableType": "independent",
          "measure": "ordinal"
        },
        "y": {
          "label": "Temperature",
          "variableType": "dependent",
          "measure": "ratio",
          "units": "degrees Fahrenheit",
          "multiplier": 1
        }
      },
      "series": [
        {
          "name": "Average monthly temperatures",
          "type": "column",
          "records": [
            {
              "x": "January",
              "y": "23"
            },
            {
              "x": "February",
              "y": "42"
            },
            {
              "x": "March",
              "y": "55"
            }
          ]
        }
      ]
    }
  ]
}

4.1.8.1. The [facet-string] key

| key | value type | description | requirement |
|---|---|---|---|
| label | non-empty string | The title for the facet | optional for unstructured data, required for structured and semi-structured data |
| datatype | non-empty string | The data type of the facet. Allowed values are: string, number, boolean, and date | optional for unstructured data, required for structured and semi-structured data |
| variableType | string | The type of variable for that facet. Allowed values are: independent and dependent | optional for unstructured data, required for structured and semi-structured data |
| measure | string | The scale of measure for that facet. Allowed values are: nominal, ordinal, interval, or ratio | optional for unstructured data, required for structured and semi-structured data |
| units | non-empty string | The units in which the facet is measured | optional for unstructured data, required for structured and semi-structured data |
| multiplier | number | The multiplier for numeric values, e.g. 1000 for "(in thousands)", 1000000 for "(in millions)" | optional for unstructured data, required for structured and semi-structured data |

4.1.8.2. The datatype key

A facet object may include a datatype member. When present, datatype must be a string identifying the semantic type used to interpret facet values.

The value of datatype defines how facet values are interpreted for the purposes of comparison, ordering, filtering, and validation. Implementations must apply the semantics defined by the declared datatype when evaluating facet behavior. datatype is normative and is not merely a presentation hint.

If datatype is not present, implementations must treat facet values as having the "string" datatype.

4.1.8.2.1. Core datatype values

A conforming implementation must recognize and support the following core datatype values:

| value | description |
|---|---|
| string | Facet values are interpreted as Unicode strings and compared using code point-based lexical ordering. |
| number | Facet values are interpreted as numeric values. Implementations must attempt to parse values as finite real numbers. Comparison and ordering must be numeric rather than lexical. |
| boolean | Facet values are interpreted as boolean values. Implementations must recognize true and false. Other values must be treated as invalid. |
| date | Facet values are interpreted as temporal values. Implementations must parse values using a well-defined, implementation-documented date or date-time format and must compare values according to their temporal ordering. |

4.1.8.2.2. Value coercion and validity

When evaluating facet values, implementations may perform type coercion consistent with the declared datatype. If coercion fails, the value must be treated as invalid for the purposes of facet evaluation.

Invalid, missing, or unparsable values must not cause evaluation failure of the containing facet. Such values may be ignored, grouped separately, or surfaced as warnings, but implementations must behave consistently for a given datatype.

4.1.8.2.3. Extensibility

Implementations may support additional, non-core datatype values. Unknown datatype values must not cause a facet to be rejected. If an implementation does not recognize a declared datatype, it must treat facet values as having the "string" datatype.

Specifications or profiles may define additional datatypes and their associated semantics.
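
The following non-normative sketch in Python illustrates one way the coercion, validity, and fallback rules above might be applied. The names effective_datatype, coerce, sort_values, and INVALID are hypothetical and not defined by this specification. The sketch assumes ISO 8601 calendar dates as its documented date format, accepts both JSON booleans and the strings "true" and "false", and chooses to ignore invalid values when ordering.

```python
from datetime import date
import math

CORE_DATATYPES = {"string", "number", "boolean", "date"}
INVALID = object()  # sentinel for values that fail coercion


def effective_datatype(facet):
    """Return the datatype to apply; absent or unrecognized falls back to "string"."""
    declared = facet.get("datatype", "string")
    return declared if declared in CORE_DATATYPES else "string"


def coerce(value, datatype):
    """Coerce one facet value; return INVALID instead of raising on failure."""
    try:
        if datatype == "number":
            if isinstance(value, bool):
                return INVALID
            number = float(value)
            return number if math.isfinite(number) else INVALID
        if datatype == "boolean":
            if isinstance(value, bool):
                return value
            # Assumption: the strings "true" and "false" are also accepted.
            return {"true": True, "false": False}.get(value, INVALID)
        if datatype == "date":
            # Assumption: ISO 8601 calendar dates are this sketch's documented format.
            return date.fromisoformat(value)
        return str(value)
    except (TypeError, ValueError):
        return INVALID


def sort_values(values, facet):
    """Order facet values by their declared datatype, ignoring invalid values."""
    datatype = effective_datatype(facet)
    coerced = [coerce(value, datatype) for value in values]
    return sorted(value for value in coerced if value is not INVALID)
```

With a "number" datatype, the values "10" and "9" sort numerically (9 before 10). With no datatype, or with an unrecognized one, they sort as strings by code point ("10" before "9").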

4.1.8.2.4. conformance

A conforming implementation must apply the semantics defined by the declared datatype when the value is recognized. An implementation that ignores a recognized datatype when evaluating facet behavior does not conform to this specification.

4.1.9. The series key

The series block contains the information about each series of the dataset for structured and semi-structured data, including the raw data values of each record in each series.

The series array contains one or more series objects, each of which may contain the name and records keys described below. Each record in a series contains one or more [facet-string] keys, each with a corresponding value. Each [facet-string] key in a record must be defined in the facets block, and the value for that [facet-string] key must conform to the datatype and other properties defined for it in the facets block.

key value type description requirement
name non-empty string The name for this series, normally used as the series label optional for unstructured data, required for structured and semi-structured data
records array The raw data values for each facet of each record of this series optional for unstructured data, required for structured and semi-structured data
4.1.9.1. The name key

The name key has a value that is a string that is the name for this series, normally used as the series label.

4.1.9.2. The records key

The records block contains the raw data values for each facet of each record of this series.

The records array contains one or more objects, each of which contains one or more [facet-string] keys, each with a corresponding value. Each [facet-string] key in a records array object must be defined in the facets block, and the value for that [facet-string] key must conform to the datatype and other properties defined for it in the facets block.
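
The requirement that every [facet-string] key in a record be defined in the facets block could be checked along the following lines. This is a non-normative Python sketch: the function name undefined_facet_keys is hypothetical, and it assumes that the facets block is an object keyed by [facet-string]. It checks key names only, not datatype conformance.

```python
def undefined_facet_keys(dataset):
    """Return (series_index, record_index, key) for record keys missing from facets."""
    facets = dataset.get("facets", {})
    problems = []
    for series_index, series in enumerate(dataset.get("series", [])):
        for record_index, record in enumerate(series.get("records", [])):
            for key in record:
                if key not in facets:
                    problems.append((series_index, record_index, key))
    return problems
```

An empty result means that every record key has a matching facet definition.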

4.1.9.3. The items key

A loosely organized collection of items for unstructured data, such as labels, values, and descriptions for diagrams.

TODO: describe the items object

4.1.9.4. The source key

The provenance of the dataset.

TODO: describe the source object

4.1.10. The href key in a datasets block

The href key defines an object that refers to an external data file. The content model and processing of the href key are different in the datasets block than in other metadata blocks.

This external file can be used to link data to graphical elements, as defined in Inline data.

4.2. Selectors

In order to identify and link between the DOM of an SVG document and the JSON metadata, this specification uses CSS Selectors and JSONPath.

CSS Selectors [Selectors-4] enables the matching of DOM nodes, such as graphical or textual elements.

JSONPath [JSONPath] enables the matching of JSON keys or values in the JSON metadata. Conforming implementations must implement JSONPath (RFC 9535) semantics.

TODO: include more details about linking

4.2.1. The selectors block data

The selectors block is an object containing named sets of selectors.

The selectors object contains one or more [selector-string] keys. A [selector-string] key may be any value allowed by the schema for the representation type, and will normally represent a human-readable identifier for the selector set (e.g. "title", "bar-3", or "axis_labels"). Each [selector-string] is the key to an object which defines the DOM selectors, JSONPath selectors, and optional descriptor for the selector set.

These [selector-string] keys must consist of a string of ASCII letters, digits, '_', and '-', with no whitespace. Each [selector-string] key must be a unique value.
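
As a non-normative illustration, the naming and uniqueness rules for [selector-string] keys might be checked as in the Python sketch below. The names SELECTOR_KEY, invalid_selector_keys, reject_duplicates, and load_metadata are hypothetical. Because Python's json module silently keeps the last of any repeated keys, the sketch uses an object_pairs_hook to detect duplicates, and for simplicity it rejects duplicate keys in any object of the document, not only in the selectors block.

```python
import json
import re

# Allowed characters: ASCII letters, digits, '_' and '-'; at least one character.
SELECTOR_KEY = re.compile(r"[A-Za-z0-9_-]+")


def invalid_selector_keys(selectors):
    """Return the keys of a selectors object that break the naming rule."""
    return [key for key in selectors if not SELECTOR_KEY.fullmatch(key)]


def reject_duplicates(pairs):
    """object_pairs_hook that raises when any JSON object repeats a key."""
    keys = [key for key, _ in pairs]
    repeated = sorted({key for key in keys if keys.count(key) > 1})
    if repeated:
        raise ValueError(f"duplicate keys: {repeated}")
    return dict(pairs)


def load_metadata(text):
    """Parse JIM text, failing on duplicate keys instead of keeping the last one."""
    return json.loads(text, object_pairs_hook=reject_duplicates)
```

Keys such as "title", "bar-3", and "axis_labels" pass the check, while a key containing a space, such as "axis labels", is reported.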

4.2.1.1. The [selector-string] key
key value type description requirement
dom A string containing a valid CSS selector, or an array of such strings The DOM selector (i.e. CSS selector) or selectors indicating a DOM element or set of elements that is the representation of a component of a JIM dataset. required
json A string containing a valid JSONPath selector, or an array of such strings The JSONPath selector or selectors indicating a JIM key or set of JIM keys that is the basis for the DOM representation. required
note non-empty string A human-readable description of what the selector set represents. This is informational only. optional

TODO: Maybe we could also add an option for a class key (with an array of strings as the value), so all selectors sets with a certain class can be addressed by target selectors?

An example of a selectors block with three selector set object entries.

The first selector set object selects an element with the id="chart-title" as the target element, and associates that with the object key for the title of the first dataset in the JSON.

The second selector set object selects an element with a complex DOM selector for an element without an id attribute as the target element, and associates that with the object key for a label of a record in the dataset of the JSON.

The third selector set object selects an element with id="datapoint-Thursday_2000" as the target element, and associates that with the object key for all of the values of a record in the dataset of the JSON, using a wildcard.

{
  "selectors": {
    "title": {
      "dom": "#chart-title",
      "json": "$.datasets[0].title"
    },
    "xLabel3": {
      "dom": "#x-axis g:nth-child(3) text",
      "json": "$.datasets[0].series[0].records[2].x"
    },
    "dp5": {
      "dom": "#datapoint-Thursday_2000",
      "json": "$.datasets[0].series[0].records[4].*",
      "note": "x/y value for Thursday 2000"
    }
  }
}
An example of a selectors block with two selector set object entries with alternate syntaxes for the same kind of connection.

The first selector set object selects an element with the id="datapoint-Sunday_2000" as the target element, and associates that with the object key for the x and y values of a record in the dataset of the JSON, using an array of JSONPath selectors.

The second selector set object selects an element with id="datapoint-Monday_2000" as the target element, and associates that with the object key for all of the values of a record in the dataset of the JSON, using a wildcard.

Assuming a record structure with only x and y keys, these selectors would yield the same outcome, making the wildcard selector a shorthand for the array syntax.

{
  "selectors": {
    "dp0": {
      "dom": "#datapoint-Sunday_2000",
      "json": [
        "$.datasets[0].series[0].records[0].x",
        "$.datasets[0].series[0].records[0].y"
      ]
    },
    "dp1": {
      "dom": "#datapoint-Monday_2000",
      "json": "$.datasets[0].series[0].records[1].*"
    }
  }
}
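
Because the dom and json keys each accept either a single string or an array of strings, a consumer may find it convenient to normalize both to arrays before processing. The following Python sketch is non-normative, and the names as_list and normalize_selector_set are hypothetical.

```python
def as_list(value):
    """Wrap a single selector string in a list; copy an array of selectors."""
    return [value] if isinstance(value, str) else list(value)


def normalize_selector_set(selector_set):
    """Return a copy of a selector set whose dom and json values are both lists."""
    normalized = dict(selector_set)
    normalized["dom"] = as_list(selector_set["dom"])
    normalized["json"] = as_list(selector_set["json"])
    return normalized
```

After normalization, the single-string and array syntaxes can be handled by the same code path, and any note key is carried over unchanged.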

TODO: provide examples of one-to-many, many-to-one, and many-to-many selector sets.

4.2.1.2. [selector-string] as target selector

The JSONPath for any given [selector-string] key can be used as the value of a behavior target selector.

An example of a selectors block with one selector set object entry, and a behaviors block with an entry referencing that selector set by its JSONPath.
{
  "selectors": {
    "dp5": {
      "dom": "#datapoint-Thursday_2000",
      "json": "$.datasets[0].series[0].records[4].*"
    }
  },
  "behaviors": [
    {
      "target": {
        "selector": "$.selectors.dp5"
      },
      "enter": {
        "haptic": {
          "durations": [ 50, 100, 150 ]
        }
      }
    }
  ]
}

4.2.2. Inline data

If a dataset is fairly small, it is best to include it in the image file directly.

4.2.2.1. Linking data and representational elements

Each data point in the raw data may be represented visually in the image. There are multiple benefits in providing a mechanism to link a data point in the dataset to its representation, such as enabling users to drill down into the raw data value for a data element, or selecting a label in a different language than the graphical text label.

This specification defines a selector attribute to enable this relationship between a graphical element and the data point it represents.

4.2.2.1.1. Data model cardinality

The relationship between a graphical element and the data it represents might be complex.

Sometimes this is a one-to-one relationship, where one element or set of elements represents exactly one data point, such as a specific bar in a bar chart, where the height of the bar signifies a specific numerical value.

Sometimes this is a one-to-many relationship, where one image element represents multiple data points, such as a symbol in a scatterplot that represents two values (the independent or x-axis value and the dependent or y-axis value), or a single bar in a histogram that represents an aggregated set of values, or a segment in a line chart that represents the change in x/y value pairs at the beginning and end of each time period (possibly with a symbol representing one specific x/y data point pair).

The one-to-many relationship might go the other way, where multiple graphical elements represent the same data point, such as in a bar chart where the height of the bar and the numeric label of the bar both represent the same value, or a segment in a line chart that represents the change in x/y value pairs at the beginning and end of each time period (possibly with a symbol representing one specific x/y data point pair).

Rarely, you might have a many-to-many relationship.

TODO: provide a many-to-many example.

On occasion, there are extra data points in the raw dataset that are not currently depicted in the visualization, either because they have been actively filtered out, or because the author chose not to include that data point, such as when it presents an extraneous factor or is outside the range subset the visualization depicts. In this case, inclusion of this dataset in the metadata might be for context or completeness.

TODO: talk about label elements and their representation.

TODO: cover rounding and precision in visual labels vs raw data

4.2.3. External data reference

Note: It might seem unintuitive to refer to data as metadata, but in the context of a graphics image file, the "data" of the file is the graphical elements that compose the file, while the raw data (the dataset) that is being represented is the metadata.

4.3. External files

A metadata object can be inline in the file, or referenced as an external file, in whole or in part. There are several reasons to support external metadata files. Two major use cases include size restrictions and shared resources.

Size restrictions: While including the raw data in the image is a best practice, sometimes a dataset is simply too large to pragmatically include inline. In this case, referencing a raw data file is the best practical approach.

Shared resources: An author might wish to share a common metadata file or set of files for a set of image documents. This allows the metadata files to be updated independently, reduces duplication and file size, enables caching in the user agent, and allows for a modular approach with well-tested rules.

The href key defines an object that refers to an external file. This external file must be loaded by the user agent, and applied following the same processing rules as an inline metadata file, with the exception of the datasets block.

If an external metadata file contains rules that duplicate or conflict with rules defined in the inline metadata, the inline metadata takes precedence. This allows for customization of specific images while relying on generic common rules.

An href key may be included at any level of the metadata, with the relevant key string as the attribute key and a valid URL string as the value for that attribute.

{
  "href": {
    "datasets": "https://path.to.datasets",
    "selectors": "https://path.to.selectors",
    "behaviors": "https://path.to.behaviors",
    "provenance": "https://path.to.provenance",
    "tags": "https://path.to.tags"
  }
}

An additional default key is defined for referring to a single external metadata file that contains all the metadata blocks. This external file must be loaded by the user agent, and applied following the same processing rules as an inline metadata file.

{
  "href": {
    "default": "https://path.to.metadata"
  }
}
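
The precedence rule described above, where inline metadata overrides duplicated or conflicting external metadata, might be sketched in Python as follows. This is non-normative: the function name merge_metadata is hypothetical, the external file is assumed to be already loaded and parsed, and the sketch assumes that objects are merged key by key while arrays and scalar values from the inline metadata replace the external ones wholesale. The different processing of the datasets block is not covered.

```python
def merge_metadata(inline, external):
    """Combine two metadata objects; inline values win on conflict."""
    merged = dict(external)
    for key, value in inline.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are objects: merge their members recursively.
            merged[key] = merge_metadata(value, merged[key])
        else:
            # Arrays and scalars from the inline metadata replace the external value.
            merged[key] = value
    return merged
```

In this sketch, a selector set defined both inline and externally keeps its inline definition, while selector sets defined only in the external file remain available.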

4.4. Behaviors

A behaviors block is an array with the key behaviors, which contains one or more objects. Each object includes at least a target (defined by the target key) and an announcement, an event block, or both. An event block contains one or more behaviors, including haptic, audio, sonification, contrast, and tactile.

4.4.1. Behavior and conforming generators

A generator conforming to this specification is not required to add behaviors to graphics, only a dataset and selectors.

Note: While this specification does not define best practices for different behaviors for different classes of user agent and graphics, a companion best practices document is under development. A conforming generator might be much more useful if it also follows that guidance and generates graphic documents with behaviors optimized for the target user agent capabilities.

4.4.2. Behavior events

An event block is an object with one of the enter, details, activate, and exit keys, and a value which is a list of key-value pairs defining specific behaviors.

key description
enter Dispatched when the user-guided pointer moves onto the element
details Dispatched after the enter event, and before the exit event, in a manner determined by the user agent, typically when the user-guided pointer has persisted over the element for a predetermined length of time (usually less than 2 seconds)
activate Dispatched when the user activates the element, such as with a click, a press of the Enter or Space key, or a verbal command, as supported by the user agent
exit Dispatched when the user-guided pointer moves off of the element

Moving from one event state to another must cancel any currently active behavior, and initiate the new event behavior.
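
The transition rule above might be modeled as in the following non-normative Python sketch. The class BehaviorDispatcher and its start and cancel callbacks are hypothetical. The sketch assumes that event blocks are stored directly under their event name in a behavior object, as in the examples in this specification.

```python
EVENTS = ("enter", "details", "activate", "exit")


class BehaviorDispatcher:
    """Tracks the active event for one behavior object and switches between events."""

    def __init__(self, behavior, start, cancel):
        self.behavior = behavior
        self.start = start    # callback(event, block): begin the behaviors in a block
        self.cancel = cancel  # callback(event, block): stop the behaviors in a block
        self.active = None    # name of the event whose behaviors are running

    def dispatch(self, event):
        if event not in EVENTS:
            raise ValueError(f"unknown event: {event}")
        # Cancel whatever is currently active before starting anything new.
        if self.active is not None:
            self.cancel(self.active, self.behavior[self.active])
            self.active = None
        block = self.behavior.get(event)
        if block is not None:
            self.start(event, block)
            self.active = event
```

A viewer would call dispatch("enter") when the pointer moves onto the target and dispatch("exit") when it leaves. In this sketch, an event with no event block still cancels the previously active behavior.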

4.4.3. Behavior types

4.4.3.1. Announcement Behaviors

Announcements (or "voicings" or "utterances") are textual statements associated with a user interface element that may be presented to the end user, either as screen reader prompts, braille display, or visual text. An announcement is defined in an announcement block.

Details on the syntax of an announcement block are defined in the Announcements section.

4.4.3.2. Audio Behaviors

Audio playback is the use of sound to provide feedback to the user as they interact with an image or object.

Note: The audio block is distinct from the related sonification block in that it plays back a static prerecorded audio file or earcon, while the sonification block dynamically generates the audio given the dataset and selected element.

Details on the syntax of an audio block are defined in the Audio section.

4.4.3.3. Embossing Behaviors
4.4.3.4. Haptic Behaviors

Haptics is the use of vibration to provide feedback to the user as they interact with an image or object.

Haptic feedback typically uses vibration patterns on a touch surface to provide the user with information when the user moves their pointer over a particular region of the screen. In a web browser, this is enabled through the W3C Vibration API [vibration]. Other haptic feedback methods may be used on specialized devices, and unless otherwise stated, they will use the same haptic pattern syntax as for vibrations.

Haptic feedback may be accompanied or supplemented by auditory feedback.

Each instance of haptic feedback is associated with a target and event type, and is composed of a haptic block, with a durations key and optional intensities, repeatInterval, and repeatIndex keys.

The behavior must be triggered when the user moves a pointer over the target region, and must stop immediately when the user moves their pointer outside the target region.

An example of a behaviors block with two entries in an array, each with a haptic block. The first entry selects an element with id="bar_1" as the target region, and defines a repeating vibration pattern. The second entry defines a circle as the target region, defines a simple repeating vibration pattern, and links to an external audio file that will be played once when the target is first triggered.
{
  "behaviors": [
    {
      "target": {
        "selector": "#bar_1"
      },
      "enter": {
        "haptic": {
          "durations": [ 50, 100, 150 ],
          "repeatInterval": 250
        }
      }
    },
    {
      "target": {
        "shapes": [
          {
            "element": "circle",
            "cx": 100,
            "cy": 121,
            "r": 32
          }
        ]
      },
      "enter": {
        "haptic": {
          "durations": [ 0, 150 ],
          "repeatInterval": 250
        },
        "audio": {
          "href": "./assets/example.mp3",
          "repeat": "none"
        }
      }
    }
  ]
}
4.4.3.5. Refreshable Pin Display Behaviors

Note: Depending on the refresh rate of the refreshable pin display, the haptic pattern might be applicable and useful here.

TODO: Cover refreshable braille displays and tactile tablets like the Graphiti, Dot pad, Monarch, and single-line braille displays.

4.4.3.6. Sonification Behaviors

Sonification is the audio playback of sounds to represent data. A sonification is defined in a sonification block.

Details on the syntax of a sonification block are defined in the Sonification section.

4.4.3.7. Tactile Behaviors
4.4.3.8. Visual Contrast Behaviors
TODO: Reference and expand on the CSS prefers-color-scheme media query.

4.5. Provenance

Provenance describes where a document came from. This includes entries such as authors, organizations, date and time created or modified, where it was first published, the title of the paper or article it supplemented, or the original work on which it was based.

A provenance block is an object with the key provenance.

{
  "provenance": {
    "notes": [
      "item 1",
      "item 2"
    ]
  }
}

4.5.1. Notes

The notes key defines an array where any details not covered by specific keys can be identified. This might include the name of the organization or individuals who sponsored the work, a dedication to a meaningful person in the author’s life, and so on.

Each item should be a quoted non-empty string separated by a comma.

{
  "provenance": {
    "notes": [
      "item 1",
      "item 2"
    ]
  }
}  

4.6. Tags

Tags are a way to categorize the file, either by the content, by the capabilities, or by a rating system, or some combination thereof.

Tags are used to aid in filtering and searching for content.

4.6.1. Keywords

The keywords key denotes an array of user-defined non-empty strings. Examples of keywords include labels from a folksonomy, terms defined in a formal document, steps in a workflow process, short descriptions of items depicted in the image, or any other strings.

{
  "tags": {
    "keywords": [
      "barchart",
      "design_phase",
      "unreviewed",
      "needs_braille"
    ]
  }
}

4.6.2. Capabilities and ratings

Note: This is a rough notion of how we might define capabilities

The capabilities key denotes an object that includes which accommodations have been defined in the image document, and some system of rating that scores the effectiveness of that accommodation.

Each capability defines an array of capability instances, each of which consists of at least a condition of applicability, and a rating.

{
  "capabilities": {
    "haptic": [
      {
        "condition": "(insert haptic device capability here)",
        "ratings": [
          {
            "user": "Devin C.",
            "username": "dev",
            "rating": "7",
            "comments": "Works well on device X, but not device Y"
          }
        ]
      }
    ],
    "tactile": [
      {
        "condition": "@media print and (min-resolution: 300dpi)",
        "ratings": []
      },
      {
        "condition": "@media (min-resolution: 100dpi)",
        "ratings": []
      }    
    ]
  }
}  

Note: Consider splitting out capabilities and ratings into separate entries.

Note: To be defined further.

4.7. Version

A conforming document may include an optional version key.

A version block is an object with the key version, which contains one or more key-value pairs including one or more of the keys document and jim and their values.

A version block, if present, must be at the root level of the JIM document.

A version block, if present, must include a jim key with a valid value. The value of the jim key must be a string conforming to the Semantic Versioning 2.0.0 specification, consisting of three integers separated by periods (.). The default value of the jim key is 1.0.0. The value must refer to a published version of the JIM specification.

The optional document key defines the version of the document itself. The value of the optional document key, if present, may be any non-empty string that is meaningful to the author, but it should be a string conforming to the Semantic Versioning 2.0.0 specification.

key value type description requirement
document non-empty string The version of the document itself optional
jim semantic version string The version of the JIM specification the document conforms to required
{
  "version": {
    "document": "1.0.1",
    "jim": "0.4.2"
  }
}
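
A reader of the version block might validate the jim key as in the following non-normative Python sketch. The names JIM_VERSION and jim_version are hypothetical. The sketch checks only the three-integer form described above, and does not check whether the value refers to a published version of the JIM specification.

```python
import re

# Three non-negative integers without leading zeros, separated by periods.
JIM_VERSION = re.compile(r"(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)")
DEFAULT_JIM_VERSION = "1.0.0"


def jim_version(document):
    """Return the declared JIM version as a tuple of ints, or the default if absent."""
    version = document.get("version")
    if version is None:
        declared = DEFAULT_JIM_VERSION  # no version block: use the default value
    else:
        declared = version.get("jim")   # a version block must include the jim key
        if not isinstance(declared, str) or not JIM_VERSION.fullmatch(declared):
            raise ValueError("version.jim must be three integers separated by periods")
    return tuple(int(part) for part in declared.split("."))
```

Returning a tuple of integers lets versions be compared numerically, so that (1, 10, 0) sorts after (1, 9, 0).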

5. Targets

A target is an area of a diagram image that is associated with a particular behavior.

Targets may consist of one of three possible values:

  1. An element link: a selector indicating a specific element or set of elements in the DOM of an SVG file;

  2. A JSONPath link: a selector indicating a specific selector set in the JIM of an SVG file;

  3. A shapes array: an array of objects that defines a shape to be dynamically represented in the diagram file.

5.1. Target selector key

A target may consist of a selector key with a valid selector, either a DOM selector pointing to one or more elements in the DOM of an SVG file, or a JSONPath selector pointing to a selector set.

5.1.1. Target selector for elements

An element target may be a textual element, a shape element, or a container element. If the selector target is a container element, the shape target is all targetable elements within that container, but not the bounding box of the container element itself.

If the selector is not valid, or if the target element is not found, the target definition is ignored.

A target selector targeting a single element with an id.
{
  "target": {
    "selector": "#bar_1"
  }
}
A target selector targeting all elements with the class line in the container element graph_area.
{
  "target": {
    "selector": "#graph_area .line"
  }
}
A target selector targeting all elements in the container element graph_area.
{
  "target": {
    "selector": "#graph_area"
  }
}

5.1.2. Target selector for selector sets

A selector set target may be a JSONPath selector pointing to a selector set, or an array of such JSONPath selectors.

TODO: Expand on this.

An example of a target block with a selector key which has a JSONPath selector addressing a labeled selector set.
{
  "selectors": {
    "dp5": {
      "dom": "#datapoint-Thursday_2000",
      "json": "$.datasets[0].series[0].records[4].*"
    }
  },
  "behaviors": [
    {
      "target": {
        "selector": "$.selectors.dp5"
      }
    }
  ]
}
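
One way a viewer might resolve a target selector key to the DOM selectors it ultimately refers to is sketched below in Python. This is non-normative, and the names SELECTOR_SET_PREFIX and dom_selectors_for_target are hypothetical. The sketch assumes that a selector beginning with "$" is a JSONPath selector and that anything else is a CSS selector, and it handles only JSONPath selectors of the form $.selectors.[selector-string] rather than full JSONPath.

```python
SELECTOR_SET_PREFIX = "$.selectors."


def dom_selectors_for_target(target, metadata):
    """Return the CSS selectors that a target's selector key refers to."""
    selector = target.get("selector")
    if selector is None:
        return []  # e.g. a shapes target, which has no selector key
    entries = [selector] if isinstance(selector, str) else list(selector)
    resolved = []
    for entry in entries:
        if not entry.startswith("$"):
            resolved.append(entry)  # treated as a CSS selector
            continue
        if not entry.startswith(SELECTOR_SET_PREFIX):
            continue  # other JSONPath forms are outside this sketch
        name = entry[len(SELECTOR_SET_PREFIX):]
        selector_set = metadata.get("selectors", {}).get(name)
        if selector_set is None:
            continue  # selector set not found: the entry is ignored
        dom = selector_set["dom"]
        resolved.extend([dom] if isinstance(dom, str) else dom)
    return resolved
```

For the example above, the target selector "$.selectors.dp5" resolves to the DOM selector "#datapoint-Thursday_2000".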

5.1.3. Target shapes

The shapes block is an array that contains a set of shapes that must be automatically generated by a conforming implementation and overlaid on the top rendering layer of the document. These shapes must intercept pointer events, but must not be rendered by default.

A conforming implementation, particularly a generator, may render an overlay shape, particularly on user request.

Note: This feature is intended to enable drawing "invisible shapes" over areas of the image. Sample uses include creating larger hit-detection targets for small or fragmented graphical elements, grouping items, providing precision for overlaps, creating overlays for raster images that aren’t vectorized, or for any other purpose where you don’t want to rely on the native SVG hit detection for the graphical element. The intention is that these hidden shapes are isolated from the SVG itself, and are defined solely in the JSON metadata; the user agent dynamically generates these shapes and inserts them into the DOM, which keeps the source SVG file clean.

5.1.3.1. Target shape fill and stroke

The fill and stroke attributes control whether the shape’s fill area or stroke area, respectively, must intercept pointer events. If the fill attribute is defined, the fill area of the shape must intercept pointer events. If the stroke attribute is defined, the stroke area of the shape, with a thickness defined by the stroke-width attribute, must intercept pointer events.

If the values of the fill or stroke attributes are valid color definitions, and a conforming implementation is rendering the overlay shape, the implementation should render that overlay shape with the colors defined.

5.1.3.2. Target shape keys

A shapes block is an array of objects, each of which contains a set of keys representing the shape element (the SVG shape element tag name) and the required attributes. The definitive list of shapes is defined by the SVG 2.0 [SVG2] specification. All SVG shape elements are valid.

Any attributes not defined by the target shape definition default to the SVG lacuna values for that attribute. Any attributes with attribute values not conforming to the SVG 2.0 specification must be discarded.

key value type description requirement
element non-empty string The name of the element, per the SVG 2 definition required
fill CSS color string The fill color for the shape, if it is rendered. If this key is present and has a non-empty string value, the shape’s fill area must intercept pointer events. Non-color values should be ignored for purposes of rendering. optional
stroke CSS color string The stroke color for the shape, if it is rendered. If this key is present and has a non-empty string value, the shape’s stroke area, with the thickness defined by the stroke-width key, must intercept pointer events. Non-color values should be ignored for purposes of rendering. optional
stroke-width number The width of the stroke, as defined in the SVG 2.0 specification. optional
[attribute-name] non-empty string The name of the SVG element attribute. Allowed values conform to the definition of the attributes for the element in the SVG 2 specification optional, but should be defined to provide the position and shape of the element
A target describing a circular area with a centerpoint at 100, 121 and a radius of 32 pixels.
{
  "target": {
    "shapes": [
      {
        "element": "circle",
        "cx": 100,
        "cy": 121,
        "r": 32
      }
    ]
  }
}
A target describing a hexagonal area using the path definition and two rectangle definitions, which together make a single target. The first rectangle definition elides the "x" attribute, which defaults to 0, per the SVG 2 specification.
{
  "target": {
    "shapes": [
      {
        "element": "path",
        "d": "M200,50 L269,90 V170 L200,210 L131,170 V90 Z"
      },
      {
        "element": "rect",
        "y": "90",
        "width": "120",
        "height": "80"
      },
      {
        "element": "rect",
        "x": "279",
        "y": "90",
        "width": "60",
        "height": "80"
      }
    ]
  }
}

For each shape definition, a conforming implementation must instantiate a corresponding shape matching the element type and attributes.
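
The instantiation of overlay shapes might look like the following non-normative Python sketch, which builds SVG elements with the standard xml.etree.ElementTree module. The names pointer_events_value and build_overlay and the class name jim-overlay are hypothetical. The sketch hides the shapes with visibility="hidden" and relies on the SVG pointer-events values fill, stroke, and all, which apply regardless of visibility. It assumes that a shape with neither a fill nor a stroke key still intercepts events over its fill area, and that shape definitions with an unknown element name are skipped. It does not validate attribute values.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
SVG_SHAPES = {"circle", "ellipse", "line", "path", "polygon", "polyline", "rect"}


def pointer_events_value(shape):
    """Map the presence of fill and stroke keys to an SVG pointer-events value."""
    has_fill = bool(shape.get("fill"))
    has_stroke = bool(shape.get("stroke"))
    if has_fill and has_stroke:
        return "all"
    if has_stroke:
        return "stroke"
    return "fill"  # assumption: also used when neither key is present


def build_overlay(shapes):
    """Build a <g> of hidden shapes that still intercept pointer events."""
    group = ET.Element(f"{{{SVG_NS}}}g", {"class": "jim-overlay"})
    for shape in shapes:
        if shape.get("element") not in SVG_SHAPES:
            continue  # assumption: unknown element names are skipped
        attributes = {
            name: str(value) for name, value in shape.items() if name != "element"
        }
        attributes["pointer-events"] = pointer_events_value(shape)
        attributes["visibility"] = "hidden"  # not rendered by default
        ET.SubElement(group, f"{{{SVG_NS}}}{shape['element']}", attributes)
    return group
```

Appending the returned group as the last child of the SVG root places the shapes on top, since SVG paints later elements over earlier ones. A viewer could render an overlay shape on user request by changing its visibility attribute.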

6. Announcements

Announcements (or "voicings" or "utterances") are textual statements associated with a user interface element that may be presented to the end user, either as screen reader prompts, braille display, or visual text.

{
  "announcement": {
  }
}

Announcements may be implicit or explicit. Explicit announcement blocks must be defined in the top level of the behavior object, not in an event block.

Note: The separation of announcements from other behaviors is deliberate. Announcements are fundamental and universal to all graphical document types, and are intended to be triggered by user-agent-specific mechanisms, while other behaviors are specific to particular graphical documents or document types, and require authorial intent.

6.1. Implicit (default) announcements

By default, a conforming JIM viewer must generate an implicit announcement name for any element identified by a dom selector that has a valid json JSONPath selector that resolves to an endpoint (i.e., a location per the JSONPath specification), triggered as for the enter event.

For each valid JSONPath endpoint that resolves to a key-value pair, the default announcement must be the text of the value.

For each valid JSONPath endpoint that resolves to an object, the default announcement must be the text of each value that is a direct child of the object, as a comma-separated list.

For each valid JSONPath endpoint that resolves to a wildcard (*), the default announcement must be the text of each value that matches the wildcard, as a comma-separated list.

A conforming user agent may allow a user to change the default announcement content to include both the key and the value for any valid endpoint.

An example of a default implicit announcement based on a selectors block.

In this example, when the user moves the pointer over the SVG element with the id of datapoint-Sunday_2000, the conforming JIM viewer would announce, "2000, Sunday, 57".

{
  "datasets": [
    {
      "series": [
        {
          "name": "2000",
          "type": "column",
          "records": [
            {
              "x": "Sunday",
              "y": "57"
            }
          ]
        }
      ]
    }
  ],
  "selectors": {
    "Sunday_2000": {
      "dom": "#datapoint-Sunday_2000",
      "json": [
        "$.datasets[0].series[0].name",
        "$.datasets[0].series[0].records[0].*"
      ]
    }
  }
}
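
The rules for building an implicit announcement might be sketched in Python as follows. This is non-normative, and the names TOKEN, resolve, and implicit_announcement are hypothetical. The resolver handles only a small subset of JSONPath (the root $, .name, [index], and .*), which is enough for the paths used in this example; a conforming implementation would use a full RFC 9535 JSONPath implementation instead. Wildcard matches are listed in the order in which the keys appear in the parsed object.

```python
import re

# One path step: ".name", ".*", or "[index]".
TOKEN = re.compile(r"\.(\*|[A-Za-z_][A-Za-z0-9_]*)|\[([0-9]+)\]")


def resolve(path, document):
    """Return the values selected by a path in the supported JSONPath subset."""
    if not path.startswith("$"):
        raise ValueError("path must start with $")
    nodes = [document]
    position = 1
    while position < len(path):
        match = TOKEN.match(path, position)
        if match is None:
            raise ValueError(f"unsupported syntax at offset {position}")
        name, index = match.group(1), match.group(2)
        selected = []
        for node in nodes:
            if index is not None:
                if isinstance(node, list) and int(index) < len(node):
                    selected.append(node[int(index)])
            elif name == "*":
                if isinstance(node, dict):
                    selected.extend(node.values())
                elif isinstance(node, list):
                    selected.extend(node)
            elif isinstance(node, dict) and name in node:
                selected.append(node[name])
        nodes = selected
        position = match.end()
    return nodes


def implicit_announcement(selector_set, document):
    """Build the default announcement text for one selector set."""
    json_selector = selector_set["json"]
    paths = [json_selector] if isinstance(json_selector, str) else json_selector
    values = []
    for path in paths:
        for node in resolve(path, document):
            if isinstance(node, dict):
                values.extend(node.values())  # object: each direct child value
            else:
                values.append(node)
    return ", ".join(str(value) for value in values)
```

Applied to the Sunday_2000 selector set and the metadata in the example above, implicit_announcement returns "2000, Sunday, 57".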

6.2. Explicit announcements

For more control over the content, type, and event triggers of an announcement, the author can include an announcement block within a behavior block, using one or more announcement types.

If an announcement block does not include a name announcement type, the implicit name announcement must be used.

6.3. Announcement types

WCAG defines two primary types of announcement: the accessible name; and the accessible description. This specification defines analogs to that convention as announcement name and announcement description, and adds two more types, announcement details and announcement hints.

An announcement block is an object with the key announcement, which contains one or more key-value pairs including one or more of the name, description, details, and hint keys and their values.

key value type description requirement
name non-empty string The announcement name of the graphic element optional
description non-empty string The longer description of the graphic element optional
details non-empty string The supplemental announcement based on interaction with the graphic element optional
hint non-empty string The instructions for use of the graphic element optional

6.3.1. Announcement name

The announcement name is normally short, 1-3 words, and serves two primary purposes:

  1. To convey the purpose or intent of the element;

  2. To distinguish the element from other elements in the graphic.

A document conforming to this specification must define an announcement name for all visual elements that convey meaning, unless that element is deliberately excluded (such as to remove redundancy or simplify a graphic).

As noted in the implicit announcements section, the default announcement name for a meaningful element is defined by the selector structure, and can be overridden in an explicit announcement block.

{
  "announcement": {
    "name": "purpose of the element"
  }
}

6.3.2. Announcement description

The announcement description is a longer text passage that provides additional information for an element, such as visual or structural details, relationship to other elements, or annotations.

{
  "announcement": {
    "description": "additional information for an element"
  }
}

6.3.3. Announcement details

The announcement details is a longer text passage that provides supplemental information for an element based on a specified user interaction with the element.

{
  "announcement": {
    "details": "supplemental information for element interaction"
  }
}

6.3.4. Announcement hint

The announcement hint is a text passage (ideally short) that provides instructions for interaction with the graphical element.

{
  "announcement": {
    "hint": "instruction for element interaction"
  }
}

6.4. Complete announcement example

An example of a default explicit announcement based on a selectors block.

An example of a selector block with one selector set object entry, a behaviors block with an entry referencing that selector set by its JSONPath, an announcement block, and a haptic block triggered on an enter event.

In this example, when the user moves the pointer over the SVG element with the id of datapoint-Sunday_2000, a conforming JIM viewer might announce, "Sunday, 2000: $57", and play the haptic feedback. If the user requested more information (e.g., with a double-tap, long tap, or simply waiting, depending on the viewer UI), the viewer would announce "This is an increase of $8 over the previous value".

{
  "datasets": [
    {
      "series": [
        {
          "name": "2000",
          "type": "column",
          "records": [
            {
              "x": "Sunday",
              "y": "57"
            }
          ]
        }
      ]
    }
  ],
  "selectors": {
    "Sunday_2000": {
      "dom": "#datapoint-Sunday_2000",
      "json": [
        "$.datasets[0].series[0].name",
        "$.datasets[0].series[0].records[0].*"
      ]
    }
  },
  "behaviors": [
    {
      "target": {
        "selector": "$.selectors.Sunday_2000"
      },
      "announcement": {
        "name": "Sunday, 2000: $57",
        "description": "This is an increase of $8 over the previous value"
      },
      "enter": {
        "haptic": {
          "durations": [ 50, 100, 150 ]
        }
      }
    }
  ]
}

7. Audio

Audio feedback can take one of three forms:

  1. Playback of an audio file

  2. Playback of a pre-defined earcon

  3. Dynamic generation of a sound with an ADSR envelope definition

7.1. Audio files

An audio file is a prerecorded external sound file that is referenced through a file path in an href key.

An external link may be to a local file or a file on the Web. Local audio files are recommended, because they avoid network latency. This specification does not define how audio files are distributed or packaged with diagram files; a separate specification may define a packaging format.

Conforming user agents must support the following audio formats for audio playback:

| Name | File extension | MIME type |
|---|---|---|
| MP3 | .mp3 | audio/mpeg |
| Ogg Opus | .opus | audio/ogg; codecs=opus |
| Ogg Vorbis | .ogg | audio/ogg; codecs=vorbis |
| Wave | .wav | audio/wav |
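
A reference to an audio file might look like the following sketch. Only the href key is described in this section; the enclosing audio key and the file path are assumptions made for illustration, and are not defined by this specification.

```json
{
  "audio": {
    "href": "sounds/datapoint.mp3"
  }
}
```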

7.2. Earcons

An earcon is a brief, distinctive sound which represents a specific event or conveys other information. It is the audio equivalent of an icon. This specification defines a mechanism for playback of a pre-defined set of earcons that conforming user agents must support. An earcon consists of a unique name and an audio definition. A list of earcon definitions will be defined in a separate specification.

7.3. ADSR

7.3.1. ADSR definitions

TODO: provide more details here.

See ADSR envelopes for more details.

7.3.2. ADSR envelopes

Timed media, such as haptic or sonification effects, are defined in this specification with the common Attack-Decay-Sustain-Release envelope.

This is the most common kind of envelope generator, and it has four stages: attack, decay, sustain, and release (ADSR). Attack, decay, and release all refer to time, while sustain refers to effect level. Each of these stages is defined by an attribute.

All time values are expressed in seconds. No unit is necessary.

These attributes are contained in an effect block. The effect key defines the unit type for the sustain attribute. Possible values for the effect key are:

Note: Consider using A-weighted decibels (dBA) instead of dB. The dB scale is based only on sound intensity, while the dBA scale is based on intensity and on how the human ear responds, which better describes when sound can damage your hearing.

Note: Consider whether or not to include pitch as an effect type. It might be easier for authoring if the author is a musician, but it can be looked up and expressed as frequency. But it’s a nice-to-have.

Any given rule can have multiple effect types. For example, a sonification could simultaneously change amplitude and frequency.

8. Haptics

Haptics is the use of vibration to provide feedback to the user as they interact with an image or object.

Haptic feedback typically uses vibration patterns on a touch surface to provide the user with information when the user moves their pointer over a particular region of the screen. In a web browser, this is enabled through the W3C Vibration API [vibration]. Other haptic feedback methods may be used on specialized devices, and unless otherwise stated, they will use the same haptic pattern syntax as for vibrations.

Note: The W3C Vibration API defines only a simple non-repeating sequence of durations. This specification extends that with three optional additions: a repeat interval that repeats the pattern with a fixed interval between pattern instances, a repetition index, and a matching intensities array.

A haptic block is an object with the key haptic, whose value is an object containing the required durations key and, optionally, any of the keys intensities, repeatInterval, and repeatIndex, each with its value.

8.1. The haptic key

| key | value type | description | requirement |
|---|---|---|---|
| durations | integer array | An array of integers, each representing the duration of that segment in milliseconds. The even entries in the array (starting with index 0) must define the number of milliseconds without vibration, and the odd entries in the array must define the number of milliseconds with vibration. | required |
| intensities | number array | An array of numbers, each expressed as a fraction in the range 0 to 1 and representing the vibrational intensity of that segment, indexed to the same sequence as the duration array. The even entries in the array (starting with index 0) correspond to the pauses between vibrations and must be ignored. The odd entries in the array must define the amplitude of the vibration for the corresponding vibration duration in the duration array. | optional |
| repeatInterval | integer | The time in milliseconds before the duration array begins repeating. If the value is 0 or a positive integer, the haptic pattern must repeat until the activating behavior is terminated (such as by the user moving their finger off the target region). If the value is a negative integer, or if the value is omitted, the haptic pattern must not repeat. | optional |
| repeatIndex | integer | The index of the duration array that is the starting duration for repetition, if any. If the value is 0 or a positive integer, each repetition of the haptic pattern must start at the indicated index and continue to the end of the duration array. If the value is a negative integer, or if the value is omitted, the haptic pattern must start at the 0 index. Repetition must only occur according to the value of the repeatInterval. | optional |

8.2. Vibration patterns

A haptic vibration pattern, denoted by a durations key of the haptic object, is an array of integers defining a sequence of vibration durations. The sequence of durations is an alternating pattern of vibrations and pauses, starting with an initial pause before the first vibration starts.

8.3. Intensity patterns

The sequence of intensities is an array of numbers describing an alternating pattern of vibration intensities and pauses, indexed to the same sequence as the vibration duration array.

The intensities key value array defines how strong the vibration is for each value in the durations array. The intensity is a value from 0 to 1. Any values outside the range 0.0 (no intensity) to 1.0 (highest intensity) will be clamped to this range.

Each intensity value is a percentage of the maximum vibration amplitude of the device or the implementation. An implementation may define a range of amplitudes that is narrower than the device’s capabilities.

Note: Each device with vibration support has a different range of amplitudes. Thus, this specification does not define an amplitude with units, but rather a percentage of the device’s maximum possible amplitude.

Note: The W3C Vibration API does not currently define any capabilities for the amplitude of vibration, so implementations based on it will ignore the intensities. This specification includes the intensities key to support the Android Vibration API and the iOS Core Haptics API.

If the platform or operating system does not support variable intensities, the intensities key and value must be ignored.

Note: The intensities array is defined to mirror the vibration duration array, for ease of authoring. The values of "blank" (even-numbered) indexes have no effect, and can be defined as 0 to avoid confusion.
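
As an illustration of the clamping rule, consider the following sketch, in which the out-of-range values are deliberate. Under the rule above, the intensities would be treated as [ 0, 1.0, 0, 0.0 ]: the first vibration plays at the maximum intensity, and the second plays at zero intensity.

```json
{
  "haptic": {
    "durations": [ 0, 200, 100, 200 ],
    "intensities": [ 0, 1.4, 0, -0.2 ]
  }
}
```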

8.4. Haptic examples

A simple haptic pattern that will trigger a single vibration lasting 200 milliseconds.
{
  "haptic": {
    "durations": [ 0, 200 ]
  }
}
A haptic pattern that will trigger a single vibration lasting 200 milliseconds, once every 500 milliseconds.
{
  "haptic": {
    "durations": [ 0, 200 ],
    "repeatInterval": 500
  }
}
A complex haptic pattern that will trigger a sequence of vibrations 10 milliseconds after the event (e.g. a touch event), the first vibration lasting 50 milliseconds, followed by a pause of 100 milliseconds, followed by a second vibration that lasts 150 milliseconds, with the pattern repeating every 250 milliseconds starting at index 1 of the durations array (the 50 millisecond vibration).
{
  "haptic": {
    "durations": [ 10, 50, 100, 150 ],
    "repeatInterval": 250,
    "repeatIndex": 1
  }
}
A set of duration and amplitude patterns that will trigger a pair of vibrations lasting 200 milliseconds with a pause of 100 milliseconds between them, once every 500 milliseconds. On supporting devices, the first vibration will have an intensity of 30%, and the second vibration will have an intensity of 60%.
{
  "haptic": {
    "durations": [ 0, 200, 100, 200 ],
    "intensities": [ 0, 0.3, 0, 0.6 ],
    "repeatInterval": 500,
    "repeatIndex": 1
  }
}
A set of duration and amplitude patterns that simulate a set of rising and falling tones.

For each pair of vibration durations, there is a pause of 0 milliseconds between them, but a change of intensity between the first and second vibration, making it feel like a single vibration that changes intensities.

The first pair will collectively last for 400 milliseconds, doubling in intensity halfway through; after a pause of 350 milliseconds the second pair will collectively last for 250 milliseconds, with the intensity dropping from full to one third after 150 milliseconds.

{
  "haptic": {
    "durations": [ 0, 200, 0, 200, 350, 150, 0, 100 ],
    "intensities": [ 0, 0.5, 0, 1, 0, 1, 0, 0.3 ]
  }
}

TODO: provide examples with the selector, to illustrate authoring
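
Pending fuller examples, the following sketch shows how a haptic block might be attached to a graphic element through a selector. It reuses the Sunday_2000 selector set and the behaviors structure from the complete announcement example. It assumes that the enter event key holds the haptic block directly, which this section does not confirm.

```json
{
  "selectors": {
    "Sunday_2000": {
      "dom": "#datapoint-Sunday_2000",
      "json": [
        "$.datasets[0].series[0].name",
        "$.datasets[0].series[0].records[0].*"
      ]
    }
  },
  "behaviors": [
    {
      "target": {
        "selector": "$.selectors.Sunday_2000"
      },
      "enter": {
        "haptic": {
          "durations": [ 0, 200 ],
          "repeatInterval": 500
        }
      }
    }
  ]
}
```

With this sketch, a viewer would be expected to play a 200 millisecond vibration every 500 milliseconds for as long as the pointer remains over the element.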

9. Sonification

The sonification block dynamically generates audio feedback given the dataset, selected element, and optionally cursor position.

10. Views

The views block defines an object containing named sets of views of the graphical document. A view is composed of a viewport key and a set of selectors and style definition states, along with a name and description of the view.

If a JIM document includes views, a conforming JIM viewer should provide an affordance to enable an end user to select among those views, and adjust the digital document accordingly. The view selection affordance must include a way to return to the default initial implicit view.

The styles block contains style definition states, composed of a DOM selector key which has a value of an object, containing style property names and values. A conforming JIM viewer must apply these styles via CSS when the view is active, and must revert to the document defaults for those style properties when the view is not active.

Note: A JIM view is similar to, but more expansive than, an SVG <view> element. Both define a viewport (viewBox in SVG), but SVG does not define styling for views.

| key | value type | description | requirement |
|---|---|---|---|
| viewport | object | An object defining the x, y, width, and height of the viewport | optional |
| styles | object | An object defining the selector key, which has an object with the style properties | optional |
{
  "views": {
    "state-nc": {
      "name": "North Carolina",
      "description": "The US state of North Carolina, with counties and major cities",
      "viewport": {
        "x": "279",
        "y": "90",
        "width": "60",
        "height": "80"
      },
      "styles": {
        ".county": {
          "display": "inline"
        },
        ".city": {
          "display": "inline",
          "fill": "purple"
        },
        "text": {
          "display": "inline"
        } 
      }
    }
  }
}

11. Selective Device Capabilities

All devices, including assistive technology devices, have specific capabilities distinct from other similar devices. Some of those capabilities require specific changes to a graphical document’s appearance, such as hiding or showing certain objects or layers to simplify a document for embossing vs printing, or rendering a thicker stroke on shapes so there’s a larger "hit box" for haptics.

This specification defines a set of device capabilities for assistive technology devices such that the most appropriate stylesheet can be selected for displaying the document on that device.

Note: This is similar to and informed by the CSS Media Queries capabilities, but dedicated specifically to the various capabilities relevant for assistive technology devices, at a finer-grained level of detail than CSS currently defines. A future CSS specification might adopt these device capabilities, especially if the assistive technology device industry adopts this mechanism and enough content uses it.

Note: This specification will not define styles for best practices for these different capabilities, but only the mechanism to switch or select stylesheets. Another companion specification or specifications might define best styling practice guidelines for specific media and devices.

TODO: Add device capabilities for haptics

11.1. Tactile Capabilities

Embossers, swell paper, and other tactile printing technologies each have a different set of capabilities. Some tactile printing technologies may have multiple levels of dot height, while others may have only 1 or 2 possible dot heights. Some technologies rely on the width of a line to determine the height of the raised surface, while others use color to determine surface height and pattern. Some allow fine details, while others require a sparse layout.

To enable a single document to be applicable to multiple tactile output techniques, this specification defines a capabilities switch with different CSS stylesheets to hide or show different elements, to increase line width, to use particular color palettes, and so forth.

The tactile key defines an object that defines conditional device capabilities, each of which is associated with its appropriate external CSS stylesheet file for print media. The appropriate external CSS file must be loaded by the user agent based on the target printer, and applied following the same processing rules as an inline CSS rule set.

Device capabilities….
{
  "tactile": {
    "dpi > 200": {
      "href": "https://path.to.high_dpi_stylesheet.css"
    },
    "dpi < 200": {
      "href": "https://path.to.low_dpi_stylesheet.css"
    }
  }
}

{
  "tactile": {
    "dot_height > 5": {
      "href": "https://path.to.max_dotheight_stylesheet.css"
    },
    "dot_height < 2": {
      "href": "https://path.to.min_dotheight_stylesheet.css"
    }
  }
}

TODO: Define and describe mechanisms for setting layers on objects (e.g. print only, print and tactile, emboss only, braille)

11.2. Object Search Capabilities

For touch screen and similar devices, to aid the user’s orientation within a graphic document, a conforming JIM viewer should provide a means to guide an end user to any target object (that is, any object with a selector, accessible name, or text content) in the graphic document.

The means that the conforming JIM viewer uses to guide the user may vary. At a minimum, the viewer should allow the end user to choose any target object from a list of all available target objects, and provide verbal, auditory, or haptic cues that guide the end user from their current pointer position to the chosen target object.

12. Relationship to other technologies

12.1. Scalable Vector Graphics (SVG)

This specification is designed specifically with SVG in mind, though the metadata format can be used in other image types.

12.1.1. The SVG metadata element

The SVG [SVG11] specification defines a metadata element (or "tag"), which can contain this metadata format. In the SVG 1.1 specification, whose design dates from the late 1990s, it was expected that metadata would be expressed in XML; the SVG 2.0 [SVG2] specification lifts this restriction, and defines the metadata child content model as text content. This allows any structured or unstructured text, including structured content like JSON.

Note: The SVG 2.0 specification is not yet approved, and has not yet reached W3C Recommendation status.

Parts of it are implemented in browsers and other user agents, but some validation tools might still flag non-XML content in the metadata element as a violation. This might impact some organizational requirements that all HTML or SVG documents are required to pass validation in order to be accessible. These tools should be urged to update to the SVG 2.0 specification for processing the metadata element.

It is important to note that currently, SVG does not define a processing algorithm or behavior for metadata. A user agent may process the metadata in any manner it supports. Currently, no general purpose web browser processes the contents of the metadata element.

Because the SVG metadata element does not have any defined attributes (such as an href attribute), there is no defined way to link to external resources; this contrasts with the style element in SVG, which can contain CSS rules as the child content of the element, or can reference external CSS files through the href attribute. Thus, this specification defines a linking syntax to allow external metadata files to be applied in whole or in part to the referring image document.

12.2. Cascading Style Sheets (CSS)

Some features of this specification, such as contrast control, overlap with capabilities of CSS, such as the prefers-contrast media query. This specification is designed to complement and extend such capabilities.

12.2.1. CSS format

CSS defines its own syntax, processing model, and format, initially defined in the 1990s. It requires a custom processor.

This specification uses the common JSON format, which can be processed by any JSON tools, including web browser contexts, JavaScript, and many other tools.

This specification attempts to define its rules in a way that CSS might be expressed in JSON.

12.2.2. CSS selectors

This specification uses the CSS selector syntax to link specific metadata entries to specific markup elements, to classes on markup elements, or other linking mechanisms. Unlike CSS, however, it does not use selectors as an "object key", but rather as a value for a selector attribute, to fit the JSON schema and allow multiple rules to use the same selector.
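
As a sketch of this difference, the following selectors block places each CSS selector in the value of a dom key (the key used in the announcement examples), not in an object key. The selector set names all_bars and first_bar are hypothetical, and whether a selector set may omit the json key is not stated in this section.

```json
{
  "selectors": {
    "all_bars": {
      "dom": ".bar"
    },
    "first_bar": {
      "dom": "#bar-1"
    }
  }
}
```

Because the selector is a value, several behaviors can reference the same selector set by its JSONPath, such as $.selectors.all_bars, without repeating the CSS selector.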

12.2.3. Intended authoring usage

To enable a CSS feature that is extended in this specification, the author can include the rule in either the metadata definition or the CSS definition.

If the rule is defined in the metadata, a user agent must apply it in the same manner as if it were defined in the CSS. For a user agent that supports this specification, this can provide a greater degree of user control, ease authoring and maintenance, and decrease synchronization conflicts.

The author does not need to define it in both the metadata and the CSS, but might wish to do so for a user agent (such as a general-purpose web browser) that supports the feature in CSS but does not support this specification.

12.2.4. Conflicts

If a rule in a document’s metadata conflicts with a hard-coded CSS rule, the CSS rule should take precedence, unless the user has made a preference selection, through a default setting or a specific selection at the time of viewing.

12.2.5. Future adoption and integration into CSS

This specification defines some of its functionality in a way that could be integrated into future versions of CSS. Among other examples, the features that involve timing, such as haptics or sonification, are modeled after the animation syntax in CSS.

13. Specification Goals

The goals of this specification are varied. We also define non-goals, things this specification deliberately avoids doing.

13.1. Interoperability

This specification should be complete and definitive with the state of the art, to prevent the need for additional or competing specifications that would decrease interoperability.

13.2. Personalization

Assistive technology implementing this specification must enable the user to select which of the available accommodations are enabled at any time, and at which available effect level. It must be possible to set these choices as defaults in the user agent, and the user must be able to override them at any time to meet their specific needs.

Any given end user might have multiple disabilities, with varying degrees of effect at any given time; any given document might have multiple accommodations enabled and encoded in it; any given user agent might support multiple assistive technologies; any given environment might have different affordances and constraints.

A low vision user might wish to enable very high contrast in a bright environment or when they are tired, and lower contrast when they are in a more moderately lighted environment and well-rested.
A blind user might wish to enable haptics and braille output when they are in a quiet zone or noisy environment, and sonification and spoken output when they are in a private, quiet environment.

13.3. Security

The distribution of interactive content presents a conflict between innovative capabilities and user security. In the web platform, most advanced interactivity can only be achieved through JavaScript, but allowing JavaScript in a file opens the user’s system up to possible security holes.

This is often unexpected and overlooked in images, but SVG (unlike most image formats) allows the inclusion of arbitrary JavaScript, including remote references to external JavaScript files. For this reason, many software applications (such as content management systems or email servers) might strip out the script element, disabling the interactivity of the content.

By defining a set of declarative syntaxes for different assistive technologies, including parameters and conditional execution triggers, this specification enables the user agent to provide the interactivity defined by the author, without compromising security. This specification refers to such declarative capabilities as "behaviors".

13.4. Privacy

By enabling security through declarative behaviors, this specification ensures the privacy of the user from third parties. Note that data collection might still be performed through some primary user agents and content providers, but this should be done only with the user’s consent.

13.5. Portability

Data visualizations can often lose their association with their context, including the article they were published in or the data they represent. This specification must define a way to include the raw data and the provenance in the image file, so the context can be preserved as the document is shared through various means.

13.6. Reusability

This specification should include a versatile way to include the raw data, or a link to it, in the data visualization file that represents it. By doing this, other authors are empowered to extract the data, create different representations, mix the data with other datasets, subset the data, verify the accuracy of the representation, and otherwise practice good data science.

In addition, a visual image can be enhanced with different metadata to serve a different purpose, such as providing a different set of descriptions for an audience at different reading levels, or translating the document into another language.

13.7. Familiarity

Document technologies are most useful when the syntax and model is familiar and unambiguous. This allows for ease of implementation, increased interoperability, and ease of authoring. This specification should use common technologies like JSON and CSS as its underpinnings.

13.8. Flexibility

This specification should be able to define the full range of expression for any assistive technology technique. The current version should detail parameters for haptics, sonification, tactiles, high contrast, voicing, and braille, and should be extensible to accommodate other techniques in the future.

At the same time, this specification should establish defaults and baselines based on best practices, wherever possible, to encourage good authoring and normalize user experience.

13.9. Searching and categorization

This specification must define easy ways to enable the distribution and discovery of content based on user needs and accommodation provided in the document. One way to do this is to provide a standardized way to express keyword tags, ratings, and capabilities within the document itself.

13.10. Provenance expression

Where, when, and by whom an image was created is often very important information, not least because it can enable the user to find more content by any given author, or to verify the quality of the source. It can also provide a "provenance chain", where content that has been adapted to specific needs beyond its original publisher can be shared, while also providing credit to the original author or publisher.

13.11. Encapsulation

Through encapsulation, more than one metadata profile can be included for any given image document, much like a stylesheet can change the appearance of the same HTML file. This specification should not require any changes to the markup of SVG or other files, with two exceptions:

  1. The inclusion of a metadata element and its contents.

  2. The encouragement of the use of id attributes on key elements. This is not necessary, since selector syntax can target any point in a DOM tree, but the inclusion of ids makes creating and maintaining the markup easier.

13.12. Non-goals

13.12.1. Formal semantics and namespaces

While namespaces allow a flexible extensibility and modularity, they also hamper and complicate authoring and reading. This document will avoid the use of namespaces, and related technologies such as JSON-LD. Future supplements to this specification might define a JSON-LD schema, if demand exists for it.

13.12.2. Generic styling

While this specification does include some accessibility-specific extensions of CSS, such as a finer control of color contrast settings, it must not define or be used to supersede CSS.

13.12.3. ARIA Syntax for Data Visualization

This specification is complementary and orthogonal to the W3C specifications defining role attributes, including the WAI-ARIA Graphics Module and the SVG Accessibility API Mappings.

14. Definitions

14.1. inline

In the context of this specification, "inline" refers to content that is included textually in the document itself, in contrast to external content that is referenced, but not incorporated into the document. This is akin to the concept of inclusion by value versus inclusion by reference.

14.2. rule

A rule is a combination of a selector and a behavior or data entry, which applies to an element or set of elements.

14.3. media query

A conditional meta-rule that applies rules based on the capabilities and environment of a user agent or device. Defined by the W3C Media Queries Level 4 specification. Syntactically, they consist of an @ token operator followed by a string, followed by an optional condition set, where the string is one keyword from a pre-defined set of keywords denoting a capability or environmental variable, and the optional condition set is a value or range of values for that capability or environmental variable.

For example, the following media query expresses that a rule is used on printing devices only, with a resolution of at least 300 dots per CSS inch:

@media print and (min-resolution: 300dpi)

14.4. selector

A pattern that matches against elements in a tree. Defined by the W3C CSS Selectors specification. Syntactically, they normally consist of a token followed by a string, where the token is the operator, and the string is a user-defined alphanumeric "word" assigned to an element in the markup.

The most common examples are the id selector (#), which matches the id of a single element (e.g. #bar-1, which matches the element with the id "bar-1"), and the class selector (.), which matches all instances of a string declared as the value of a class attribute (e.g. .bar, which matches all elements with the class including "bar").

14.5. raw data

The dataset on which the visualization or image is based. Optimally, this dataset is included in the image file as metadata.

14.6. user

14.7. end user

The person experiencing or interacting with the image content, through whatever medium.

14.8. user agent

The software that processes and presents the document to the end user.

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119

Informative References

[JSON-Schema]
Austin Wright; et al. JSON Schema: A Media Type for Describing JSON Documents. 10 June 2022. Internet-Draft. URL: https://datatracker.ietf.org/doc/html/draft-bhutton-json-schema
[JSONPath]
S. Gössner; G. Normington; C. Bormann. JSONPath: Query Expressions for JSON. February 2024. Proposed Standard. URL: https://www.rfc-editor.org/rfc/rfc9535
[Selectors-4]
Elika Etemad; Tab Atkins Jr.. Selectors Level 4. URL: https://drafts.csswg.org/selectors/
[SVG11]
Erik Dahlström; et al. Scalable Vector Graphics (SVG) 1.1 (Second Edition). 16 August 2011. REC. URL: https://www.w3.org/TR/SVG11/
[SVG2]
Amelia Bellamy-Royds; et al. Scalable Vector Graphics (SVG) 2. URL: https://svgwg.org/svg2-draft/
[VIBRATION]
Anssi Kostiainen. Vibration API. URL: https://w3c.github.io/vibration/