Deliverable D3 - Towards a sustainable backend solution
 
SONAAR - Social Networks Accessible Authoring

Document Technical Details

Document Number: D3
Document title: Towards a sustainable backend solution
Version: 1.0
Document status: Final version
Work package/task: WP2/Task 2.3
Delivery type: Report
Due date of deliverable: July 30, 2021
Actual date of submission: July 30, 2021
Confidentiality: Public

Document History

Version Date Status Author Description
0.1 23/07/2021 Draft Carlos Duarte First draft
0.2 24/07/2021 Draft Letícia Pereira Review
0.3 27/07/2021 Draft André Rodrigues Review
0.4 28/07/2021 Draft José Coelho Review
1.0 30/07/2021 Final Carlos Duarte Final version

Contents

Introduction
Scaling of the backend service
    Managing and storing large amounts of data
    Service for recognition of similar images
    Services for recognising concepts in images and text in images
Service coverage
    Integration with social network services
    Integration with third party tools
    Exchange information between different social networks or tools
    Support for multiple languages
    Reacting to interface changes
    Increasing the number of descriptions
Desirability of the service
Outlook for SONAAR

Introduction

This report, in accordance with the description of work, discusses how a sustainable backend solution supporting the SONAAR mechanisms for authoring and consuming accessible media content could be built, based on the lessons learned during this project. Following the interim review meeting, we decided to extend the scope of this report to encompass a discussion of other relevant factors that could contribute to a future deployment of a service with the characteristics of SONAAR.

The report is structured in four main chapters. The following chapter covers aspects related to the scaling of the backend service. This is followed by a chapter addressing the different ways in which the coverage of a service similar to SONAAR can be increased. The next chapter focuses on aspects relevant to the promotion and adoption of a service similar to SONAAR. The final chapter discusses possible avenues to maintain and improve SONAAR after the end of the project.

Scaling of the backend service

This chapter discusses architectural and other technical aspects relevant to the deployment of a SONAAR-enabling backend. We cover aspects related to data storage and to the recognition of images and image contents.

Managing and storing large amounts of data

The backend supporting the SONAAR prototypes stores a small amount of data for each image: the image identifier, the concepts associated with the image as recognised by the image processing service, any text that was recognised in the image, all text alternatives for that image entered by users of SONAAR, the language of each text alternative, the number of times each was selected by users of SONAAR, and the quality of each text alternative as predicted by the quality assessing algorithm. The images themselves are stored in the service responsible for locating an image in the image database, identifying the concepts in the image and recognising any text present in the image.
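As an illustration, the record kept for each image could take the following shape. This is a minimal sketch; field names and values are our own, not the prototype's actual MySQL schema:

    // Minimal sketch of the per-image record stored in the SONAAR backend.
    // Field names and values are illustrative, not the actual schema.
    const imageRecord = {
      imageId: "img-4F7A",              // identifier assigned by the image service
      concepts: ["park", "people"],     // concepts recognised in the image
      recognisedText: "Open until 8pm", // text recognised inside the image
      alternatives: [
        {
          description: "Overhead shot of a park with people on the grass",
          language: "en",               // language of this text alternative
          timesSelected: 3,             // times users of SONAAR selected it
          quality: 0.82,                // predicted by the quality assessing algorithm
        },
      ],
    };

Note that the image content itself is absent: only the identifier is kept, with the image stored by the external recognition service.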

The backend (a MySQL database plus a Node.JS server managing the REST API) was deployed on a server with limited hardware specifications, but sufficient for the load experienced so far. Nevertheless, in a setting with large adoption of the SONAAR platform, storing and accessing the text alternatives might require a different approach. To analyse this subject, we will consider two scenarios: SONAAR being deployed as envisioned in this project, i.e., working across multiple social networks; and SONAAR being integrated into a specific social network, either by the social network provider or through a third party application or service.

In the first scenario, the backend would have to be independent of the social network infrastructure. We do not expect this backend to have specific requirements that distinguish it from web services that need to handle large numbers of concurrent requests. A standard architecture that balances the load across a number of web servers, plus a database server, should suffice for the processing needs of the backend.
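As a minimal sketch of such horizontal scaling, the Node.JS server could be replicated across CPU cores with the built-in cluster module (Node 16+); a full deployment would additionally place a load balancer in front of several such machines. The ./server module name is hypothetical:

    // Minimal sketch: run one backend worker per CPU core. The primary
    // process distributes incoming connections among the workers.
    const cluster = require("cluster");
    const os = require("os");

    if (cluster.isPrimary) {
      for (let i = 0; i < os.cpus().length; i++) cluster.fork();
      cluster.on("exit", () => cluster.fork()); // replace crashed workers
    } else {
      require("./server"); // hypothetical module that starts the REST API
    }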

In the second scenario, the backend would be integrated with the backend of the social network. Therefore, its implementation would depend on the social network's backend architecture. For instance, the alternative texts would already be available in the social network's database. Deciding whether to replicate the alternative texts in the SONAAR backend or to add extra information to the social network backend would therefore require weighing the performance benefits of each solution.

Service for recognition of similar images

In SONAAR we have resorted to an AI platform service, Clarifai, to recognise when two images are the same, so that we can suggest the alternative text of one of them to a user posting the other. For running our prototypes, we subscribed to a level of service supporting 30 thousand operations per month (approximately one thousand operations per day), which was enough for our usage level. If the adoption of SONAAR increased, this subscription would need to be reviewed to maintain the required quality of service.
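For illustration, this duplicate check amounts to a single search-by-image request. The sketch below targets Clarifai's v2 visual search endpoint as we understand it; the endpoint path, payload shape, and the 0.99 score cut-off are assumptions to be verified against the current Clarifai documentation (global fetch requires Node 18+):

    // Minimal sketch: ask Clarifai whether an image is already known.
    // Endpoint and payload are assumptions based on Clarifai's v2 API docs.
    async function findMatchingImage(imageUrl) {
      const response = await fetch("https://api.clarifai.com/v2/searches", {
        method: "POST",
        headers: {
          Authorization: `Key ${process.env.CLARIFAI_API_KEY}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          query: { ands: [{ output: { input: { data: { image: { url: imageUrl } } } } }] },
        }),
      });
      const result = await response.json();
      // Hits are ranked by visual similarity; a score near 1 suggests the
      // same image is already stored in the backend.
      const best = result.hits && result.hits[0];
      return best && best.score > 0.99 ? best.input.id : null;
    }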

Clarifai supports solutions to handle an increased number of requests. In fact, it is possible to have dedicated nodes to support increasing loads or, alternatively, to deploy their solution on cloud platforms such as AWS, Google Cloud or Azure. This offers a high level of flexibility for handling what is the most resource intensive component of the SONAAR workflow.

In the scenario where SONAAR features are integrated with a social network, we need to consider the possibility of the social network having its own services available that could replace the Clarifai service (as is the case with Facebook). This would require an adaptation of the backend architecture of SONAAR. It also shows that it is feasible to deploy a solution sharing the characteristics of what we propose in SONAAR without impacting performance from the perspective of the user. Facebook's algorithms for automatically inserting alternative descriptions in images (based on AI powered recognition of the concepts present in the image), or for detecting users in images, do not create any visible overhead in the user experience.

Services for recognising concepts in images and text in images

The SONAAR backend offers suggestions for text alternatives of an image based on the concepts found in the image or the text recognised in the image. Both concept identification and text recognition are services provided by Clarifai. Consequently, the considerations established in the previous section also apply to these services. We simply note that, in the scenario where SONAAR is integrated directly into the social network, the social network would need to provide the image concept identification and text recognition mechanisms for these features to be available.
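As a sketch of how the backend could turn the outputs of these two services into a suggestion (the composition rule below is illustrative; the prototypes may word suggestions differently):

    // Minimal sketch: compose a suggested text alternative from the concepts
    // and the in-image text returned by the recognition services.
    function buildSuggestion(concepts, recognisedText) {
      // Keep only concepts the recogniser is reasonably confident about;
      // the 0.9 confidence cut-off is an assumption.
      const confident = concepts.filter((c) => c.value >= 0.9).map((c) => c.name);
      let suggestion = `Image containing: ${confident.join(", ")}.`;
      if (recognisedText && recognisedText.trim().length > 0) {
        suggestion += ` Text in the image: "${recognisedText.trim()}".`;
      }
      return suggestion;
    }

    // buildSuggestion([{ name: "park", value: 0.98 }, { name: "people", value: 0.95 }],
    //                 "Open until 8pm")
    // -> 'Image containing: park, people. Text in the image: "Open until 8pm".'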

Service coverage

This chapter discusses how the coverage of a service similar to SONAAR could be increased, extending its reach. We discuss how SONAAR could be integrated into social networks, or into third-party tools that provide access to social networks. Additionally, we discuss the desirable scenario where information about text descriptions could be exchanged between multiple social networks and third party tools. Afterwards, we focus our discussion on technical aspects that, as we learned during the project, are paramount to ensure adoption of this type of service: the availability of the service in multiple languages, the ability to react to interface changes, and ways to increase the number of text alternatives available.

Integration with social network services

Due to aspects related to automatically detecting when the user is publishing content with images (discussed in the section Reacting to interface changes), and to the challenges faced in raising awareness of the SONAAR service (discussed in the chapter Desirability of the service), the most efficient solution for increasing the reach and adoption of SONAAR would be to have it integrated directly into the social networks themselves.

The social network service would easily be able to detect whenever a user is publishing media content and remind the user to add a description. Integrating this with SONAAR would additionally allow the social network to offer a suggested description for the image. In July 2021, Twitter employees who are members of @TwitterA11y and @TwitterAble put forward an initial design supporting such a concept, triggered when a user is detected retweeting a tweet with an image that does not have a description. This concept is illustrated in figure 1.

Figure 1 shows the Twitter interface. On the right half, an overhead shot of a park, with people walking on the grass, one person on a reclined chair, and a couple sitting at a table. On the left half, a warning message reads: "This Tweet contains undescribed media. You've asked us to let you know when you are about to share media without descriptions." Below the message there is a Change setting button, and below the button the retweet menu, hinting that the message was displayed after the user pressed retweet.

Figure 1 - Concept vision of Twitter warning a user about retweeting an image without a description.

We have reached out to Twitter (Andrew Hayward, lead accessibility engineer, who tweeted the concept presented above) to introduce SONAAR, and scheduled a meeting to present SONAAR at the beginning of August.

Any social network would be able to incorporate the detection of media publishing and warn its users of the possibility of writing a description, if the social network interface supports that feature. SONAAR would be useful in this scenario by suggesting existing descriptions, thereby lessening the burden on the users publishing the media. If the social network does not have native support for entering media descriptions, integrating with SONAAR would make it possible for users to enter a description and have it stored in the backend of the SONAAR deployment. In this way, by deploying its own copy of the SONAAR service, each social network would improve its accessibility by offering more descriptions.

Integration with third party tools

An alternative to the official social network service providers adopting SONAAR is to have third party applications integrate it into their interfaces. Not all social networks provide access to their contents, so this option limits the reach of the service.

We have reached out to several organisations and individuals responsible for third party applications providing access to social networks (mostly Twitter), introducing SONAAR and making ourselves available to discuss the possibility of having SONAAR adopted by the application. Table 1 presents the applications contacted, the social network supported, the party responsible for the application, and the feedback we received. Two of the organisations, responsible for a total of three applications, replied declining our invitation. To date, we have not received any further answers.

Application | URL | Social network | Organization | Feedback
Bacon Reader | https://baconreader.com/ | Reddit | One Louder | Missing
Twitterrific | https://twitterrific.com/ | Twitter | Icon Factory | Declined
Tweetbot | https://tapbots.com/tweetbot | Twitter | Tapbots | Missing
Tweetings | https://tweetings.net/ | Twitter | Dazzle | Missing
Plume | https://myplume.com | Twitter | Oak Barrel Media | Declined
Easy Chirp | http://www.easychirp.com/timeline | Twitter | Web Overhauls | Missing
UberSocial | https://ubersocial.com | Twitter | Oak Barrel Media | Declined
TweetCaster | http://tweetcaster.com | Twitter | One Louder | Missing
Chicken Nugget | https://getaccessibleapps.com/chicken_nugget/ | Twitter | Accessible Apps | Missing

Table 1 - Third party applications contacted, the social network they support, the responsible organisation, and the feedback received ("Missing" indicates that no reply was received).

Exchange information between different social networks or tools

The two previous sections introduced scenarios where SONAAR would be integrated directly into specific social networks or third party applications. While SONAAR would operate as designed within the scope of the social network or application, it would not benefit from exchanging text descriptions across different social networks and applications, something the current prototypes support.

If different social networks or applications were to integrate SONAAR, it would still be possible to exchange media descriptions between them if a shared backend were set up. This backend would need to replicate the current SONAAR backend, storing text descriptions for the different images and accessing Clarifai (or a similar service) to identify concepts and recognise text in images. The backend would also need to expose an API that social networks and applications could tap into, to query for suggested text descriptions for a given image or to send the text descriptions of the images their users publish.
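A minimal sketch of what such an API could look like, here using Express (the routes, payloads and helper functions loadDescriptions and storeDescription are hypothetical; a real deployment would also need authentication and rate limiting):

    // Minimal sketch of the exchange API. Routes and payloads are
    // hypothetical, not a specification.
    const express = require("express");
    const app = express();
    app.use(express.json());

    // Query suggestions for an image published on any connected network.
    app.post("/suggestions", async (req, res) => {
      const { imageUrl, language } = req.body;
      const imageId = await findMatchingImage(imageUrl); // see earlier sketch
      if (!imageId) return res.json({ suggestions: [] });
      res.json({ suggestions: await loadDescriptions(imageId, language) });
    });

    // Store a description entered by a user on one of the connected networks.
    app.post("/descriptions", async (req, res) => {
      const { imageUrl, description, language } = req.body;
      await storeDescription(imageUrl, description, language);
      res.status(201).end();
    });

    app.listen(3000);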

Compared with the current prototypes, this solution would have the advantage of not needing to react to interface changes, since the responsibility for detecting when a user is publishing media content would be handled by the social network service or third-party application. On the other hand, it would not be possible to guarantee that social networks and applications would contribute to the pool of text descriptions. Additionally, the current SONAAR prototypes support two use cases: first, a social network user receives suggestions for descriptions when publishing media content; second, a user can request a description of an image found on the web or in any Android application. While the first use case could continue to be supported by the social networks, the second would no longer be directly supported. The current SONAAR prototypes could continue to be operated to offer this support. However, we would argue that the preferable solution would be for the user agents (the web browser or the mobile operating system) to integrate the calls to the SONAAR API.

Support for multiple languages

Some of the limitations that we identified in the usage of the SONAAR prototypes were related to language issues. These impacted the prototypes at two levels: the usage level and the development level.

At the usage level, these issues became apparent because SONAAR was provided in both English and Portuguese. The prototypes had their interfaces available in both languages, with the user able to switch the interface language. This in itself proved an advantage, by allowing users who have the operating system in one language to switch to the other if desired, and we did not receive any negative feedback from trial participants on this topic. However, it did raise the expectation among users of the Portuguese version that all contents would be provided in that language. This is a perfectly reasonable expectation, but one that could not be met by the current technology stack employed in the SONAAR prototypes.

The one aspect where this expectation was not met relates to the suggestion of descriptions. SONAAR tries to identify the language of each description, and offers suggestions of descriptions that are in the language of the user's interface. However, the suggestion that is built from the concepts in the image is only available in English, given that Clarifai provides those concepts in English. This would not be a serious issue if descriptions in Portuguese were available in the descriptions database. However, with the limited number of users of SONAAR, and the limited amount of contributions, the availability of descriptions in Portuguese is reduced. It is therefore common for users of the Portuguese interface to be provided with a single suggestion, and for that suggestion to be in English.

In order to address this issue, the technology stack of the SONAAR backend would need to be augmented with a component that performs language translations. This component would be useful for: 1) translating the automatically generated suggestion based on image concepts from English to the language of the user's interface; and 2) translating descriptions stored in one language to the language of the user's interface. This second use case would be helpful in at least two situations. The first is the current situation, where the number of descriptions is limited and it is therefore likely that there are no descriptions in specific languages; here, descriptions in any language could be translated to the requested language. The second is more likely in the future and incorporates the description quality assessment: when SONAAR detects descriptions of substantially higher quality in a language other than the one in which the description is being requested, translating them could mean suggesting high quality descriptions instead of only presenting low quality ones.
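The sketch below shows where such a component could sit in the suggestion pipeline. translate() stands for whatever machine translation service would be adopted; it is a hypothetical wrapper, not a service the prototypes currently include:

    // Minimal sketch: fall back to machine translation when no stored
    // description matches the user's interface language. translate() is a
    // hypothetical wrapper around an external MT service.
    async function suggestInLanguage(descriptions, conceptSuggestion, userLang) {
      const inUserLang = descriptions.filter((d) => d.language === userLang);
      if (inUserLang.length > 0) return inUserLang;

      // No match: translate the highest quality stored description or,
      // failing that, the English suggestion built from the image concepts.
      const best = [...descriptions].sort((a, b) => b.quality - a.quality)[0];
      const source = best ? best.description : conceptSuggestion;
      return [{ description: await translate(source, userLang), language: userLang }];
    }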

At the technical development level, the language of the interface also plays an important role. The mechanism for identifying when a user is publishing content with an image relies, in the Android prototype, on the names and labels of multiple interface components. The names and labels used by the same Android applications (Facebook and Twitter) in different languages are often not direct translations. Additionally, they are updated periodically, especially in Facebook's application. This makes the task of maintaining the SONAAR prototypes resource intensive and complex, given the number of interface elements that need to be tracked. This is the main reason we have deployed the prototypes in just two languages.

Reacting to interface changes

Perhaps the biggest challenge to the sustainability of the SONAAR service is the lack of control over the interfaces of the supported social networks. Both Twitter and Facebook periodically update elements of their interfaces, with changes to Facebook's interface being more frequent. Both SONAAR prototypes rely on attributes of the interface components to identify when a user is publishing media content and to decide where to present the suggestions to the user. When these attributes change, the SONAAR prototypes are no longer able to detect when a user is publishing media content, impacting their ability to offer suggestions for alternative descriptions.

To address this issue in SONAAR, we envision the possibility of “recruiting” active users of the platform to “describe” the new interface. To this end, SONAAR would need to include a guided procedure for remediation purposes. This procedure could be triggered by SONAAR users who notice that suggestions are no longer being offered, or prompted by SONAAR itself if, after a given period of time, no requests for suggestions have been made to the backend. Once triggered, the procedure would ask the user to navigate to specific steps in the publishing process. For example, the user might be requested to enter the screen where a publication is prepared. In each step, SONAAR would ask the user to activate (by clicking, tapping, or through the keyboard) specific interface elements, for example the button that adds media to the publication, or the Post or Tweet buttons. SONAAR would intercept these activations and use them to identify the attributes of the interface elements that it relies on to recognise which step of the publishing process the user is in. This would allow SONAAR to repair itself without the need to prepare and deploy a new version of the extension or application.
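The information captured through this guided procedure could be stored as a compact descriptor per social network, platform and language. The shape below is illustrative; the selector values shown are examples of the kind of attributes that could be captured, not ones SONAAR currently records:

    // Minimal sketch of an interface descriptor produced by the remediation
    // procedure. Field names and selector values are illustrative.
    const descriptor = {
      network: "twitter",
      platform: "web",
      language: "en",
      capturedAt: "2021-07-30",
      elements: {
        composeBox: '[data-testid="tweetTextarea_0"]', // publication screen
        addMediaButton: '[data-testid="fileInput"]',   // add media to the publication
        postButton: '[data-testid="tweetButton"]',     // Post / Tweet button
      },
    };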

To make the whole process more efficient, we would need to add a service to the SONAAR backend. This service would store the information identifying the interface elements that are relevant to SONAAR, for the different social networks, across different devices, platforms, languages and, potentially, versions of the mobile applications. By having this information in a central location, it would be possible to propagate it to other users of SONAAR that require it. This means it would take only one user going through the remediation process to make the updated information available to everyone, allowing SONAAR to recover quickly for all users.

However, the procedure described above would be complex and, certainly, not without limitations. For example, until one user detects and reports the problem and executes the remediation procedure, other users would not be able to use SONAAR. The most effective way to sustain this service would be to have the social networks integrate the user interface part of SONAAR, responsible for detecting when a user is publishing media content and for presenting to the user the suggestions received from SONAAR. The backend, where descriptions are stored and suggestions are prepared, would remain independent of the social networks, in order to increase the service's reach.

Increasing the number of descriptions

A limitation that was felt in the user testing of the SONAAR prototypes was the lack of image descriptions, which reduced the prototypes' usefulness. We envision two ways to address this situation. While the number of descriptions is small, it might be necessary to recruit crowd workers to write descriptions of the images for which SONAAR users request descriptions. While this does not address the immediate request of the user (i.e., crowd workers would not be able to reply in real time), the description would eventually be stored in the SONAAR backend, allowing the user to come back to the social network feed and consult the description. Other users finding that image in their social network feeds would also benefit from this user's request, by having the description already available. To preemptively address the lack of descriptions, a mechanism could be deployed to crawl images in popular Twitter feeds, Instagram accounts, and other sources. These images would then be sent to crowd workers, and descriptions would be written in advance of users sharing the images on their social network accounts. This crowd input would also be useful for deploying the second part of the proposed solution.

For the second way to be applicable, SONAAR requires a sufficient number of images with descriptions to be stored in the backend. Once these images and descriptions are available, it would be possible to find images similar to the one the user is requesting a description for. By locating similar images, it would become possible to suggest descriptions from similar images, instead of only descriptions of exact matches. This would considerably increase the range of images for which SONAAR could offer suggestions.
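A sketch of how suggestions could be widened from exact matches to similar images, assuming a search service that returns hits ranked by a similarity score (searchSimilarImages and loadDescriptions are hypothetical helpers, and both thresholds are assumptions to be tuned empirically):

    // Minimal sketch: accept descriptions from visually similar images when
    // no exact match exists. The 0.99 and 0.85 thresholds are assumptions.
    async function suggestionsForImage(imageUrl, language) {
      const hits = await searchSimilarImages(imageUrl); // ranked by similarity
      const exact = hits.filter((h) => h.score > 0.99);
      const similar = hits.filter((h) => h.score > 0.85);
      const pool = exact.length > 0 ? exact : similar;

      const descriptions = [];
      for (const hit of pool) {
        descriptions.push(...(await loadDescriptions(hit.imageId, language)));
      }
      return descriptions;
    }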

Desirability of the service

This chapter discusses the issue of promoting the SONAAR service and, consequently, promoting the accessibility of media content on social networks.

We have promoted SONAAR mainly through social media, given that the users we want to reach are social media users. On Twitter, the number of impressions of the tweets announcing the first user study (filling in an online questionnaire) and the second user study (using the SONAAR prototypes) were of the same order, each reaching around 10,000 impressions. However, the engagement levels were significantly different. While for the first study we received more than 250 answers to the questionnaire, for the second study we recruited only eight participants willing to install and use the SONAAR prototypes for a period of two weeks before being interviewed. Even taking into account the higher commitment required by the second user study, the conversion rate is very low. If we account for the fact that some of the participants in the second study were contacted directly by the research team, this paints a grim picture of the appeal this type of service holds for users of social networks. It is important to recall that this is in stark contrast to what was expressed in the answers to the first survey, where 58% of the respondents without disabilities and 83% of the respondents with a disability declared an interest in a tool to support them in authoring accessible media content.

As reported in the sections Integration with social network services and Integration with third party tools, we have also reached out to organisations that could adopt, support or integrate the outcomes of this pilot project into their products or services. Unfortunately, the level of interest from these organisations is comparable to that displayed by social media users. Fewer than half of the organisations replied, and all the replies declined our invitation to schedule a meeting to present SONAAR in more detail with a view to its adoption.

The lack of interest displayed by both end users and application providers is representative of the current state of awareness of the benefits of accessibility. Bringing attention to this issue might not be enough, as the efforts of this pilot project demonstrate, even though we acknowledge its limited scope. Other, more impactful, measures might be required. The European Accessibility Act directive will certainly be a step in the right direction. Even if it does not apply to the social network services targeted in this project, by applying to e-commerce services it will certainly create awareness of the accessibility domain in a way that has not been possible before.

Outlook for SONAAR

This chapter discusses the future plans for SONAAR. As mentioned in the section Integration with social network services, we have scheduled a meeting with a Twitter representative to present SONAAR and discuss Twitter's interest in adopting some of the concepts explored in this project.

In the meantime, we have already had a meeting with the coordinator of the H2020 funded MediaVerse project. One outcome of the meeting was the possibility of replacing the Clarifai service with equivalent or improved services from MediaVerse, in particular a service to identify similar images and a service to identify concepts in images. Replacing Clarifai with these services would allow us to swap a commercial service for one we could use free of charge (until a source of revenue is found). Without the associated financial charges, we will be able to sustain the SONAAR service for longer. We plan to start working on the integration of the new services in the coming months.

Additionally, MediaVerse can provide services that we currently do not have access to. The services that are available for images are also available for videos. If we can address the issue of acquiring the video a user is publishing and sending it to our backend, SONAAR should become able to provide descriptions for videos. In the future, MediaVerse aims to offer captioning for videos, which could then become a feature made available through SONAAR. A further MediaVerse feature is a model capable of recognising image memes. Such a service could allow SONAAR to tailor its suggested descriptions specifically to this type of publication.

The collaboration between MediaVerse and SONAAR could work in both directions. In this regard, we envision the possibility of training models to identify concepts that are more relevant to visually impaired people. Current models (on MediaVerse, but also on Clarifai) are generic and provide some concepts of questionable utility for a visually impaired person. With SONAAR we have the ability to find out which descriptions are favoured by users. This information would be useful for MediaVerse to improve its models, which would in turn contribute to improving the quality of the suggestions provided by SONAAR. MediaVerse also has a model that provides captions for images. SONAAR could offer these image captions as suggestions. Once more, by tracking the preferences of users, SONAAR could provide information for MediaVerse to improve these models, benefiting both platforms.

This pilot project allowed us to better understand the technical and adoption challenges for a solution that promotes digital media accessibility through a service supporting social network users in creating more accessible content. The knowledge acquired will guide us in future research efforts and, hopefully, assist us in keeping the SONAAR service available to the community in the coming years.