Facebook 'labels' posts by hand, posing privacy questions
[May 06, 2019]
By Munsif Vengattil and Paresh Dave
HYDERABAD, India/SAN FRANCISCO (Reuters) -
Over the past year, a team of as many as 260 contract workers in
Hyderabad, India has ploughed through millions of Facebook Inc photos,
status updates and other content posted since 2014.
The workers categorize items according to five "dimensions," as Facebook
calls them.
These include the subject of the post - is it food, for example, or a
selfie or an animal? What is the occasion - an everyday activity or
major life event? And what is the author's intention - to plan an event,
to inspire, to make a joke?
The work is aimed at understanding how the types of things users post on
its services are changing, Facebook said. That can help the company
develop new features, potentially increasing usage and ad revenue.
Details of the effort were provided by multiple employees at outsourcing
firm Wipro Ltd over several months. The workers spoke on condition of
anonymity due to fear of retaliation by the Indian firm. Facebook later
confirmed many details of the project. Wipro declined to comment and
referred all questions to Facebook.
The Wipro work is among about 200 content labeling projects that
Facebook has at any time, employing thousands of people globally,
company officials told Reuters. Many projects are aimed at "training"
the software that determines what appears in users' news feeds and
powers the artificial intelligence underlying many other features.
The labeling efforts have not previously been reported.
"It's a core part of what you need," said Nipun Mathur, the director of
product management for AI at Facebook. "I don't see the need going
away."
The content labeling program could raise new privacy issues for Facebook,
according to legal experts consulted by Reuters. The company is facing
regulatory investigations worldwide over an unrelated set of alleged
privacy abuses involving the sharing of user data with business
partners.
The Wipro workers said they gain a window into lives as they view a
vacation photo or a post memorializing a deceased family member.
Facebook acknowledged that some posts, including screenshots and those
with comments, may include user names.
The company said its legal and privacy teams must sign off on all
labeling efforts, adding that it recently introduced an auditing system
"to ensure that privacy expectations are being followed and parameters
in place are working as expected."
But one former Facebook privacy manager, speaking on condition of
anonymity, expressed unease about users' posts being scrutinized without
their explicit permission. The European Union's year-old General Data
Protection Regulation (GDPR) has strict rules about how companies gather
and use personal data and in many cases requires specific consent.
"One of the key pieces of GDPR is purpose limitation," said John
Kennedy, a partner at law firm Wiggin and Dana who has worked on
outsourcing, privacy and AI.
If the purpose is looking at posts to improve the precision of services,
that should be stated explicitly, Kennedy said. Using an outside vendor
for the work could also require consent, he said.
It remains unclear exactly how GDPR will be interpreted and whether
regulators and consumers would see Facebook's internal labeling
practices as problematic. Europe's top data privacy official declined to
comment on possible concerns.
A Facebook spokeswoman said: "We make it clear in our data policy that
we use the information people provide to Facebook to improve their
experience and that we might work with service providers to help in this
process."
U.S. Senator Mark Warner, a Democrat and leading critic of social media,
told Reuters in a statement that large platforms increasingly are
"taking more and more data from users, for wider and more far-reaching
uses, without any corresponding compensation to the user."
Warner said he is drafting legislation that would require Facebook to
"disclose the value of users' data, and tell users exactly how their
data is being monetized."
THE PROJECT
Human-powered content labeling, also referred to as "data annotation,"
is a growth industry as companies seek to harness data for AI training
and other purposes.
Self-driving car companies such as Alphabet Inc's Waymo have labelers
identify traffic lights and pedestrians in videos to fortify their AI.
Voice assistant developers including Amazon.com Inc have people annotate
customer audio to improve AI's ability to decipher speech.
Silhouettes of mobile users are seen next to a screen projection of
Instagram logo in this picture illustration taken March 28, 2018.
REUTERS/Dado Ruvic/Illustration/File Photo
Facebook launched the Wipro project in April last year. The Indian firm received
a $4 million contract and formed a team of about 260 labelers, according to the
workers. Last year, the work consisted of analyzing posts from the prior five
years.
After completing that, the team in December was cut to about 30 and shifted to
labeling, each month, posts from the prior month. Work is expected to last through
at least the end of 2019, they said.
Facebook confirmed the staffing changes but declined to comment on financial
details.
The company said its analysis is ongoing so it could not provide any findings
from the labeling or resulting product decisions. It has not told labelers the
purpose or results of the project, and the workers said all they have inferred
from their limited view is that selfies are increasingly popular.
The Wipro labelers and Facebook said the posts are a random sampling of
text-based status updates, shared links, event posts, Stories feature uploads,
videos and photos, including user-posted screenshots of chats on Facebook's
various messaging apps. The posts come from Facebook and Instagram users
globally, in languages including English, Hindi and Arabic.
Each item goes to two labelers to check accuracy, and a third if they disagree,
Facebook said. Workers said they see on average 700 items per day. Facebook said
the target average is lower.
Facebook confirmed labelers in Timisoara, Romania and Manila, the Philippines
are involved in the same project.
Among Facebook's other labeling projects, one worker in Hyderabad for
outsourcing vendor Cognizant Technology Solutions Corp said he and at least 500
colleagues look for sensitive topics or profane language in Facebook videos.
The aim is to train an automated Facebook tool that enables advertisers to avoid
sponsoring videos that are, for example, adult or political, Facebook said.
Cognizant did not respond to a request for comment.
Another application of labeling involved the social network's Marketplace
shopping feature, where it automated category recommendations for new listings
by first having labelers and product experts categorize some existing listings,
Facebook's Mathur said.
PRIVATE POSTS
Facebook users are not offered the chance to opt out of their data being
labeled.
At Wipro, the posts being examined include not only public posts but also those
that are shared privately to a limited set of a user's friends. That ensures the
sample reflects the range of activity on Facebook and Instagram, said Karen
Courington, director of product support operations at Facebook.
Facebook's data policy does not explicitly mention manual analysis.
"We provide information and content to vendors and service providers who support
our business, such as by providing technical infrastructure services, analyzing
how our products are used, providing customer service, facilitating payments or
conducting surveys," the policy states.
Europe's GDPR also requires companies to delete user data upon request. Facebook
said it has technology to routinely sync labeled posts with both deletion
requests and changes to content privacy settings.
Facebook and other companies are testing techniques to curtail the need for
outsourced labeling, in part to analyze more data faster and cheaper. For
instance, AI training data for news feed rankings and photo descriptions for the
blind came from hashtags on Instagram posts, Facebook's Mathur said.
"We try to minimize the amount of things we send out," he said.
(Reporting by Munsif Vengattil in Hyderabad and Paresh Dave in San Francisco;
Additional reporting by Douglas Busvine in Frankfurt; Editing by Patrick Graham,
Jonathan Weber and Edwina Gibbs)
[© 2019 Thomson Reuters. All rights reserved.] This material may not be published,
broadcast, rewritten or redistributed.
Thomson Reuters is solely responsible for this content.