Browser extension for presentation of accessible content

Deliverable D8 (Browser extension for presentation of accessible content)

Document Technical Details

Document Number: D8
Document title: Browser extension for presentation of accessible content
Version: 1.0
Document status: Final version
Work package/task: WP2/Task 2.2
Delivery type: Software prototype
Due date of deliverable: April 30, 2021
Actual date of submission: April 30, 2021
Confidentiality: Public

Document History

Version Date Status Author Description
0.1 25/04/2021 Draft Letícia Pereira First draft
0.2 28/04/2021 Draft Letícia Pereira Final draft
0.3 29/04/2021 Draft Carlos Duarte Review
1.0 30/04/2021 Final Carlos Duarte Final version

Contents

Introduction
Functionalities description
Updates to the backend service
Workflows
Setup instructions
Next steps

Introduction

SONAAR aims to facilitate the user-generation of accessible content on social network services by developing a solution that supports the authoring and consumption of media content on social platforms on both desktop and mobile devices. In addition to improving the accessibility of this content, the proposed solution also has the potential to raise awareness of the importance of authoring accessible content by engaging users in accessible authoring practices.

This deliverable concerns work package 2 of the SONAAR project. In WP2, the work focused on the deployment of user-generated content on mobile and web platforms, in particular on extending our prototypes to support the accessible content deployment feature.

This document is structured as follows: the next section describes the functionalities deployed in the current version of the prototype. It also presents the updates to the backend service that were required to support the updated functionalities. Additionally, this section introduces a set of workflows illustrating potential usage scenarios of the SONAAR prototype, either currently supported or proposed for future support. The section after that explains how the SONAAR Google Chrome web extension can be installed. The final section presents the next steps for the browser extension in the final period of the SONAAR project.

Functionalities description

With the availability of image descriptions presented in the previous deliverable D6.1 and the updates described in D6.2, we extended the functionalities of the prototypes to support screen reader users who need to access image descriptions in any web page. Images are not focusable by default on a web page. When a user finds an image for which a description is desired, either because it lacks one or because the user judges the existing description to be of poor quality, activating the prototype on that web page sends to the backend all images that have a non-empty alt attribute or no alt attribute at all. We do not request descriptions for images with an empty alt attribute, because an empty alt is the proper way for a developer to mark an image as decorative, and a decorative image should not have a description. On receiving an answer, the prototype modifies the page's DOM to make the images focusable and inserts the descriptions in the alt attribute of the corresponding images. The user can then browse the images on the page and listen to the descriptions of any non-decorative image. Different interfaces to present this content are further discussed in the Workflows section.
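As an illustration of this behaviour, the following minimal content-script sketch shows how images could be made focusable and their alt attributes filled with suggested descriptions. The requestDescriptions() helper is a hypothetical stand-in for the call to the SONAAR backend; the actual extension code may differ.

```javascript
// Content-script sketch: requestDescriptions() is a hypothetical stand-in for
// the call to the SONAAR backend and is not the actual extension API.
async function annotateImages() {
  // Skip images explicitly marked as decorative (empty alt attribute); images
  // with no alt attribute or a non-empty alt attribute are kept.
  const candidates = [...document.querySelectorAll('img')]
    .filter(img => img.getAttribute('alt') !== '');

  // Hypothetical backend call mapping image URLs to suggested descriptions.
  const suggestions = await requestDescriptions(candidates.map(img => img.src));

  for (const img of candidates) {
    const description = suggestions[img.src];
    if (!description) continue;
    // Make the image reachable by keyboard and screen reader focus.
    img.setAttribute('tabindex', '0');
    // Insert the suggested description as the image's alternative text.
    img.setAttribute('alt', description);
  }
}
```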

Updates to the backend service

The backend is composed of a database that stores image descriptions previously provided by SONAAR users. This database connects an image identifier with the currently known descriptions for that image and, since this update, with the language each description is written in. In order to provide these descriptions upon a client's request, the backend needs to be able to search for an image in the database.

Image searching is achieved through the image recognition service provided by Clarifai. We store images in Clarifai and use its image search feature to look up the image for which a description has been requested. Clarifai provides a similarity measure between the searched image and every image in the database. When an image has a similarity measure above the "same image threshold", we consider it to be the same image we have in the database. We have fine-tuned the "same image threshold" so that the same image is identified even if it has suffered small modifications, like a small crop or the addition of a watermark or signature. The process to define the value for this threshold consisted of modifying several images in different ways (e.g. different amounts of cropping, different degrees of rotation, or inserting differently sized and colored watermarks) and observing the changes in the similarity measure returned by Clarifai when the modified image was compared with the unmodified one. From these observations we empirically determined the value below which we classify the modified image as a different image.
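A minimal sketch of this check, assuming a hypothetical searchVisuallySimilar() wrapper around Clarifai's visual search and an illustrative threshold value (not the tuned value used in SONAAR):

```javascript
const SAME_IMAGE_THRESHOLD = 0.9; // illustrative value, not the tuned SONAAR threshold

async function findExistingEntry(imageUrl) {
  // searchVisuallySimilar() is a hypothetical wrapper around Clarifai's visual
  // search; assume it returns [{ imageId, score }, ...] sorted by descending score.
  const matches = await searchVisuallySimilar(imageUrl);
  const best = matches[0];
  // Treat the top match as "the same image" only if it clears the threshold.
  return best && best.score >= SAME_IMAGE_THRESHOLD ? best.imageId : null;
}
```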

To assist in the preparation of image descriptions we expanded our use of the features provided by Clarifai. The first is the ability to provide a list of concepts related to the searched image. Clarifai returns this list of concepts with, for each concept, a level of confidence in the accuracy of the result. We keep the concepts from this list that are above a "concept confidence threshold". These concepts offer us another way to create an image description. This threshold was also empirically defined through an analysis of the concepts generated by Clarifai for multiple images; it captures what we considered the most relevant concepts without limiting the number of concepts returned. A second feature we use from Clarifai is the ability to recognize text present in the image. This is particularly relevant for the social network domain, where many posted images contain text (e.g. memes). The image's text is, very often, another source for creating an image description.
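A sketch of this filtering step, again with a hypothetical getConceptsAndText() wrapper around the Clarifai features and an illustrative threshold:

```javascript
const CONCEPT_CONFIDENCE_THRESHOLD = 0.8; // illustrative value

async function describeFromImageContent(imageUrl) {
  // getConceptsAndText() is a hypothetical wrapper around Clarifai's concept
  // recognition and text recognition (OCR) features.
  const { concepts, recognizedText } = await getConceptsAndText(imageUrl);
  // Keep only the concepts whose confidence clears the threshold.
  const relevantConcepts = concepts
    .filter(concept => concept.confidence >= CONCEPT_CONFIDENCE_THRESHOLD)
    .map(concept => concept.name);
  return { concepts: relevantConcepts, text: recognizedText };
}
```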

A final source of image descriptions is the users of social networks themselves. When our front-end prototypes detect an image being posted with a description, that description is sent to the backend. The backend stores the description if it has not been stored before; if it had already been stored, we increment a counter of the number of times it has been used. The language the description is written in is also stored. We use the Franc natural language detection library to identify the language of the description. If it is not possible to detect the language from the description (e.g. because it does not have enough words), we apply the same procedure to the text of the tweet or post. If this also does not return a result, we fall back to the language of the user's browser. In this way we try to accommodate those instances where a user has the browser set in one language but writes tweets or posts in multiple languages.
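The fallback chain might look like the following sketch. franc() returns an ISO 639-3 code and 'und' when it cannot classify the text; mapping between that code and the browser's language tag (e.g. 'en-US') is assumed to happen elsewhere.

```javascript
// Language detection fallback chain sketched with the franc library.
import { franc } from 'franc';

function detectDescriptionLanguage(description, postText, browserLanguage) {
  // 1. Try the description itself.
  const fromDescription = franc(description);
  if (fromDescription !== 'und') return fromDescription;

  // 2. Fall back to the text of the tweet or post.
  const fromPost = franc(postText || '');
  if (fromPost !== 'und') return fromPost;

  // 3. Last resort: the language configured in the user's browser
  //    (assumed to be converted to the same code scheme elsewhere).
  return browserLanguage;
}
```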

In summary, our current sources for descriptions include: descriptions provided by users, image concepts identified by Clarifai, and any text in the image. Image descriptions are characterized by a language and by the number of times they have been used in tweets or posts.

In order to answer the client's request, the backend has to decide which description or descriptions to send. To make that decision, we currently rely on two features. The client's request includes the language of the user's browser; with that information we can limit our selection to descriptions in the same language. The second feature is the number of times a description has been used. The backend uses this information to order the list of descriptions, in the language of the user's browser, that is sent back to the client.
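Put together, the selection step could look like this sketch (the field names are illustrative, not the actual database schema):

```javascript
// Selecting and ordering descriptions for a client request.
// Each stored entry is assumed to have { text, language, timesUsed }.
function selectDescriptions(descriptions, browserLanguage) {
  return descriptions
    .filter(d => d.language === browserLanguage) // same language as the user's browser
    .sort((a, b) => b.timesUsed - a.timesUsed);  // most used first
}
```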

Workflows

In this section we present a set of workflows exploring different approaches to presenting users with accessible content using the Chrome extension. In order to assess the effectiveness of the workflows that have been implemented, we prepared different versions of the Chrome extension and will distribute them to selected groups of end users during the evaluation phase. We also present some suggested workflows that could be deployed in our prototypes as future improvements.

Implemented backend workflows

  • Answering a request - Default language (a sketch of this flow follows the list):
    • Search for a previous entry for this image using Clarifai image recognition
    • When no other instance of this image is identified in the database:
      • Store the image identifier
      • Store the image concepts identified by Clarifai
      • Store any text recognized by Clarifai in the image
      • Return a list composed of the concepts and recognized text
    • When an instance of this image is identified in the database:
      • Search for alternative descriptions previously provided by other SONAAR users for the same image in the same language
      • Search for the concept list provided by Clarifai for the same image
      • Use the OCR mechanism to recognize any text content in the image
      • Return an ordered list of descriptions, concepts and text
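The following sketch ties these steps together in an Express-style request handler. The helper functions (findExistingEntry, describeFromImageContent, storeNewEntry, getStoredDescriptions, selectDescriptions) are hypothetical names for the steps described above, not the actual SONAAR backend API.

```javascript
const express = require('express');
const app = express();
app.use(express.json());

// POST /describe with { imageUrl, browserLanguage } in the request body (hypothetical route).
app.post('/describe', async (req, res) => {
  const { imageUrl, browserLanguage } = req.body;

  const imageId = await findExistingEntry(imageUrl);                    // Clarifai visual search
  const { concepts, text } = await describeFromImageContent(imageUrl);  // concepts + recognized text

  if (!imageId) {
    // Unknown image: store it and return only the concepts and recognized text.
    await storeNewEntry(imageUrl, concepts, text);
    return res.json({ descriptions: [], concepts, text });
  }

  // Known image: also return user-provided descriptions in the requested
  // language, ordered by how often they have been used.
  const stored = await getStoredDescriptions(imageId);
  const descriptions = selectDescriptions(stored, browserLanguage);
  return res.json({ descriptions, concepts, text });
});

app.listen(3000);
```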

Suggested backend workflows

The current structure could be extended to comprise other features. One possible avenue for the SONAAR backend is supporting descriptions in other languages, in addition to the ones currently used. In this scenario, the user could define the languages in which descriptions are returned.

  • Answering a request - Multi-language support:
    • Search for a previous entry for this image using Clarifai image recognition
    • When no other instance of this image is identified in the database:
      • Store the image identifier
      • Store the image concepts identified by Clarifai
      • Store any text recognized by Clarifai in the image
      • Return a list composed of the concepts and recognized text
    • When an instance of this image is identified in the database:
      • Search for alternative descriptions previously provided by other SONAAR users for the same image in the previously defined languages
      • Search for the concept list provided by Clarifai for the same image
      • Use the OCR mechanism to recognize any text content in the image
      • Return an ordered list of descriptions, concepts and text

Another possibility is to use a natural language processing service to translate descriptions into the user's language.

  • Answering a request - Translated language:
    • Search for a previous entry for this image using Clarifai image recognition
    • When no other instance of this image is identified in the database:
      • Store the image identifier
      • Store the image concepts identified by Clarifai
      • Translate the concepts into the user's language if needed
      • Store any text recognized by Clarifai in the image
      • Return a list composed of the concepts and recognized text
    • When an instance of this image is identified in the database:
      • Search for alternative descriptions previously provided by other SONAAR users, independently of the language
      • If no description is found in the user's language, translate descriptions into the user's language
      • Translate the concept list provided by Clarifai into the user's language if needed
      • Use the OCR mechanism to recognize any text content in the image
      • Return an ordered list of translated descriptions, concepts and text

Implemented workflows for supporting accessible image browsing on the Google Chrome web extension

In order to provide users with descriptions in any web page, the first workflow supported by SONAAR consists of embedding, in every image identified on the web page requested by the user, the first description on the list provided by the backend.

  • One result
    • User asks for SONAAR to analyse current web page images
    • SONAAR identifies all images on the web page that do not have an empty alt text
    • SONAAR queries the backend service
    • SONAAR makes the identified images on the web page focusable
    • SONAAR embeds the corresponding suggested description in each identified image

At this point, all images are focusable and the previous workflow can be extended, giving the user the possibility of asking for more results for a specific image.

  • Ask for more results
    • User asks for SONAAR to analyse current web page images
    • SONAAR identifies all images on the web page that do not have an empty alt text
    • SONAAR queries the backend service
    • SONAAR makes the identified images on the web page focusable
    • SONAAR embeds the corresponding suggested description in each identified image
    • User focuses on an image and activates the shortcut that triggers the SONAAR service (Ctrl + Alt + I) (see the sketch after this list)
    • SONAAR opens a window containing a list of other descriptions identified for that image
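A minimal sketch of the shortcut handling in a content script, assuming a hypothetical showDescriptionsWindow() helper that renders the list of alternative suggestions:

```javascript
// Listen for Ctrl + Alt + I while an image is focused and open the suggestions window.
document.addEventListener('keydown', (event) => {
  if (event.ctrlKey && event.altKey && event.key.toLowerCase() === 'i') {
    const focused = document.activeElement;
    if (focused && focused.tagName === 'IMG') {
      event.preventDefault();
      showDescriptionsWindow(focused); // hypothetical: show other descriptions for this image
    }
  }
});
```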

Another scenario explored by SONAAR is providing users with the complete list of results from the start. For that, the following workflow is established:

  • List of results
    • User asks for SONAAR to analyse current web page images
    • SONAAR identifies all images on the web page that do not have an empty alt text
    • SONAAR queries the backend service
    • SONAAR makes the identified images on the web page focusable
    • SONAAR embeds the list of descriptions provided by the backend in the identified images

Suggested workflow to classify image descriptions

The previous workflows open several possibilities for future improvements to SONAAR. The first is giving users the opportunity to upvote or downvote the descriptions provided by SONAAR. With that, the backend would have another source of information to improve the current quality rating system. This system could be used by screen reader users to indicate whether a description provides enough information, but also by sighted users interested in collaborating with SONAAR.

  • Classify current descriptions
    • User asks for SONAAR to analyse current web page images
    • SONAAR identifies all images on the web page that do not have an empty alt text
    • SONAAR queries the backend service
    • SONAAR makes the identified images on the web page focusable
    • SONAAR embeds the corresponding suggested description in each identified image
    • User focuses on an image and activates the shortcut that triggers the SONAAR service (Ctrl + Alt + I)
    • SONAAR opens a window containing:
      • A list of all the descriptions identified for that image
      • Two buttons for each description, one to upvote and one to downvote (see the voting sketch after this list)
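How a vote might be sent to the backend is sketched below; the '/vote' endpoint, the URL and the payload are hypothetical and only illustrate the kind of information such a workflow would exchange.

```javascript
// Hypothetical vote submission from the extension to the SONAAR backend.
async function voteOnDescription(imageId, descriptionId, vote /* 'up' or 'down' */) {
  await fetch('https://sonaar-backend.example.org/vote', { // illustrative URL
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ imageId, descriptionId, vote }),
  });
}
```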

Suggested workflow to contribute with image descriptions

Another possible contribution workflow can be defined. In this case, users could also provide a new description for an image.

  • Contribute with a new description
    • User asks for SONAAR to analyse current web page images
    • SONAAR identifies all images on the web page that do not have an empty alt text
    • SONAAR queries the backend service
    • SONAAR makes the identified images on the web page focusable
    • SONAAR embeds the corresponding suggested description in each identified image
    • User focuses on an image and activates the shortcut that triggers the SONAAR service (Ctrl + Alt + I)
    • SONAAR opens a window containing:
      • A list of all the descriptions identified for that image
      • Two buttons for each description, one to upvote and one to downvote
      • An input box for a new description for that image
    • User enters a new description for that image
    • SONAAR logs the information provided by the user

Implemented workflow to report a problem

We also established a workflow allowing users to report a problem. In this scenario, the user can simply flag a problem with SONAAR or send a message providing more information about the issue identified.

  • Report a problem
    • User selects the option to report a problem
    • SONAAR opens a message window containing:
      • An optional input box for a description of the problem
      • A button to send the report
    • User provides the information about the problem found
    • SONAAR sends a message to the support team

Suggested workflow to report a problem

In order to cope with the frequent changes in the interfaces of major social platforms and the challenges they raise, we suggest an extension to the previous workflow allowing users to contribute to the identification of elements on the interface. For this, the values of the required interface elements (e.g. upload media button, enter alt text button, alt text input box) would be stored dynamically. In this scenario, when SONAAR detects that a specific set of values is not currently present on the interface, another attempt can be made with a different set. This could be useful not only to cope with new versions of the interface, but also with different interface themes and possible personalization settings made by the users. This suggested workflow allows users to identify the required elements on their own interfaces and send them back to SONAAR. The next time the user activates the SONAAR services, this information would already be available and the new set of values would be used to identify the required elements on the interface. A sketch of this fallback idea follows the list below.

  • Identify required interface elements
    • User selects the option to report a problem
    • SONAAR opens a message window containing:
      • An optional input box for a description of the problem
      • A button to send the report
      • A button to identify required elements on the interface
    • User selects the option to identify required elements on the interface
    • For each one of the required elements:
      • SONAAR shows a message asking the user to identify the element
      • User selects the corresponding element
    • SONAAR logs the information provided
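A minimal sketch of the selector-fallback idea, with purely illustrative element names and selectors that do not reflect the actual social network interfaces:

```javascript
// Each entry is a list of candidate selectors for one required interface element;
// values reported by users would be appended to these lists over time.
const REQUIRED_ELEMENTS = {
  uploadMediaButton: ['[data-testid="fileInput"]', 'input[type="file"]'], // illustrative only
  altTextInput: ['[aria-label="Image description"]', 'textarea[name="alt"]'], // illustrative only
};

// Try each known selector in turn and return the first element found on the page.
function findRequiredElement(name) {
  for (const selector of REQUIRED_ELEMENTS[name] || []) {
    const element = document.querySelector(selector);
    if (element) return element;
  }
  return null; // not found: ask the user to identify the element and report it back
}
```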

Setup instructions

The web extension was developed and tested on the Chrome browser, but it is also supported on Chromium-based browsers such as Edge, Brave, Opera or Vivaldi.

The current version of SONAAR is available for download on the Chrome web store at: https://chrome.google.com/webstore/detail/sonaar-add-alts/fclfledfnfpilnpdhflpbpnboiohbmdl

The web extension can also be manually installed:

  1. Download the code from https://github.com/SONAARProject/add-alt-extension
  2. Update the endpoints.js file to point to your own backend installation (an illustrative example follows this list)
  3. Open the extensions tab on the browser
  4. Enable developer mode
  5. Install the extension by clicking the “Load unpacked” button and selecting the folder where the code is.
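As a purely illustrative example of step 2 (the actual variable names and structure of endpoints.js in the repository may differ):

```javascript
// endpoints.js (hypothetical content): point this at your own backend installation.
const BACKEND_URL = 'https://your-sonaar-backend.example.org';
```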

The extension is constantly being updated with new features developed during the project.

Next steps

One of the next steps concerns conducting a new study with social network users, guided by two main objectives: validating with users the effectiveness of the documentation for authoring accessible content, and validating the new interaction flow for accessible content authoring. The study will focus on two different groups, allowing us to investigate the individual experiences of participants and the accessibility and usability of our prototypes, but also particular aspects of each of these groups. The first group is composed of clusters of one blind participant and at least 3 sighted social media contacts of that participant, in particular contacts who publish media content that the participant usually consumes. With this setting we will also be able to investigate the general impact that our prototypes may have on the media content consumed by blind users. The second group is composed of other interested participants, with no further criteria required. With the feedback of this group we will be able to further explore the engagement and motivational factors that SONAAR resources may raise in a context where people do not necessarily have a personal connection with a blind person as an intrinsic motivation.

We expect that SONAAR will raise awareness and reduce mainstream users' effort to create accessible content, therefore promoting accessible practices everywhere, not only on social networks, and lessening the burden on people with disabilities to promote these practices. Having SONAAR used on a larger scale will also allow us to explore other topics. In addition to the users' feedback, the number of times an alternative description has been used can also contribute to a better understanding of users' preferences on image descriptions. During this period we will also monitor the frequency of social network interface changes, in order to assess their real impact on our prototypes. The information gathered will be used to re-evaluate the established workflows, discuss future improvements to our prototypes and identify general contributions to the context of social media accessibility. The results of this study will be documented for future reference in the final deliverables, according to their topics.

Furthermore, we are also exploring further approaches to improve the suggestion of alternative descriptions. One of them is the possibility of identifying related images. For that we can define a "related image threshold", which will allow us to identify images that are related but not the same. This knowledge will be useful for those instances where we have not seen the image before, by allowing us to still offer the description of a related image. The other feature is fully integrating a quality measure of the description for the image. We currently store the number of times a description has been used. Heuristically, we can expect the most popular description to be the most adequate description for an image. However, if we apply this number without further consideration, we might disregard newer descriptions that might be better but, since they are new, have been used fewer times. Our quality measure applies the algorithm described in our previous work, which returns a metric of the similarity between the terms in the image description and several features of the image (including the concepts present in the image, concepts related to the image domain and any metadata in the image). The algorithm classifies the semantic similarity between the image and a description on a scale from 0 to 1. The backend will use this additional information, combined with the number of times a description has been used, to sort the list of descriptions, in the language of the user's device or browser, that is sent back to the client.
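One way to combine the two signals is sketched below; the equal weighting is an illustrative assumption, not a decision made in the project.

```javascript
// Combining the 0-to-1 semantic quality score with usage counts when ranking
// descriptions. The equal weighting below is an illustrative assumption.
function rankDescriptions(descriptions) {
  // Normalize usage counts so popularity and quality are on comparable scales.
  const maxUsed = Math.max(1, ...descriptions.map(d => d.timesUsed));
  return descriptions
    .map(d => ({ ...d, score: 0.5 * d.quality + 0.5 * (d.timesUsed / maxUsed) }))
    .sort((a, b) => b.score - a.score);
}
```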

We also anticipate that, after SONAAR has been in use for some time, one or a few descriptions of an image will become more popular than the others. By using a system that relies on the number of times a description has been used to sort the list of suggestions, there is a chance that a new description, possibly better than the existing ones, will not be presented to users because it sits at the bottom of the list. In addition to using the quality measure to assist in sorting the list, we will implement a mechanism to minimize this problem. When a user selects one description from a list, the selected description has its count increased and the others that were not selected have their counts decreased. Bad descriptions that never get selected will eventually have negative counts, and new descriptions, being initialised at a count of zero, will be above the bad descriptions in the sorting order and therefore will have a higher likelihood of entering the suggestions list.
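A minimal sketch of this update rule (field and function names are illustrative):

```javascript
// Selection-count update: the chosen description gains a point and the other
// candidates lose one, so descriptions that are never chosen drift below newly
// added ones, which start at a count of zero.
function updateSelectionCounts(descriptions, selectedId) {
  for (const description of descriptions) {
    description.timesUsed += description.id === selectedId ? 1 : -1;
  }
}
```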

Finally, future efforts will also focus on the dissemination of SONAAR. We have been conducting dissemination activities throughout the project and, even though further improvements to our resources will still be made during the next months, SONAAR can now be used by a larger community. At this moment, we have stable prototypes, supporting different sources of alternative descriptions, and solid documentation on social media accessibility. To that end, we will contact different associations of people with disabilities and communities offering relevant services, not only to participate in our user study, but also to freely use SONAAR. We will also contact publishers of current third-party social networking clients to assess the viability of collaborating towards a realistic sustainability path for the SONAAR prototypes.