Faces for cookware: data collection industry flourishes as China pursues AI ambitions
June 28, 2019
By Cate Cadell
PINGDINGSHAN, China (Reuters) - In a
village in central China's Henan province, amid barking dogs and
wandering chickens, villagers gather along a dirt road to trade images
of their faces for kettles, pots and tea cups.
At the front of the line, a woman stands before a camera zip-tied
to a tripod. Holding a photograph of her own head, with the eyes and
nose cut out, in front of her face, she slowly rotates from side to side.
Villagers waiting their turn take a numbered ticket. Some of them say
it's the third or fourth time they've come to do this sort of work.
The project, run out of a sleepy courtyard village house adorned with
posters of former China leader Mao Zedong, is collecting material that
could train AI software to distinguish between real facial features and
still images.
"The largest projects have tens of thousands of people, all of whom live
in this area," said Liu Yangfeng, CEO at Qianji Data Co Ltd, which
collects and labels data for several of China's largest tech firms and
is based in the nearby city of Pingdingshan.
"We are creating more data sets to serve more AI algorithm companies, so
they can serve the development of artificial intelligence in China,"
said Liu, declining to disclose his clients.
The boom in demand for data to train AI algorithms is feeding a new
global industry that gathers information such as photos and videos,
which are then labeled to tell the machines what they are seeing.
Companies involved in data labeling, or data annotation as it is also
called, include crowdsourcing platforms such as Amazon.com's <AMZN.O>
Mechanical Turk, which offers users small amounts of money in return for
simple tasks; outsourcing firms such as India's Wipro Ltd <WIPR.NS>; and
professional labelers like Qianji.
Cognilytica, a U.S. research firm specializing in AI, estimates the
global market for machine-learning related data annotation grew 66% to
$500 million in 2018 and is set to more than double by 2023. Some
industry insiders say, however, that much of the work done is not
disclosed, making accurate estimates difficult.
WEAK PRIVACY LAWS, CHEAP LABOR
China has emerged as a key hub for data collection and labeling thanks
to insatiable demand from a burgeoning artificial intelligence sector
backed by the ruling Communist Party, which sees AI as an engine of
economic growth and a tool for social control.
A plethora of firms have invested heavily in an area of AI known as
machine learning, which is at the core of facial recognition technology
and other systems based on finding patterns in data.
These include tech giants Alibaba Group Holding Ltd <BABA.N>, Tencent
Holdings Ltd <0700.HK> and Baidu Inc <BIDU.O>, as well as younger companies
such as AI specialist SenseTime Group Ltd and speech recognition firm
Iflytek Co Ltd <002230.SZ>.
The result has been a proliferation of AI products and services in
China, from facial recognition-based payment systems to automated
surveillance and even AI-animated state media news anchors. Chinese
consumers mostly see these technologies as novel and futuristic, despite
concerns raised by some over more invasive applications.
Employees label items on computer screens for data collection, to be
used in developing artificial intelligence (AI) and machine learning
technology, at the Qian Ji Data Co in Jia county, Henan province, China
March 20, 2019. REUTERS/Irene Wang
Weak data privacy laws and cheap labor have also been a competitive advantage
for China as it races to become a global leader in AI. The Henan villagers were
happy to trade several sessions in front of a camera for a tea cup, or several
hours for a stove-top pot.
OVERSEAS CUSTOMERS
Beijing-based BasicFinder, a leading data labeling firm with locations across
Hebei, Shandong and Shanxi provinces, boasts a robust mix of domestic and
overseas clients.
During a recent visit to its Beijing offices, some staff were labeling images of
sleepy people that will be used by an autonomous driving project to identify
drivers who might be falling asleep at the wheel.
Others were labeling British documents from the 1800s for a Western online
ancestry service, marking fields for dates, names and genders on birth and death
certificates.
According to BasicFinder Chief Executive Du Lin, hiring trained labelers in
China is cheaper than using Western crowdsourcing marketplaces.
A Princeton University project related to autonomous driving initially put a
task on Amazon's Mechanical Turk, but as the task became more complicated, people
began making mistakes, and BasicFinder was brought in to help correct the
results, said Du.
In that project, one trained BasicFinder labeler was able to do the work of
three crowdsourced labelers, he added.
"Gradually they saw they were paying less for labeling from us, so they hired us
to label all the works from the very beginning," said Du.
Princeton declined to comment.
For labeling employees, the reasons for joining China's data industry are
straightforward. The work, though sometimes tedious, is an upgrade on other jobs
available to young workers who want to return home to small Chinese cities and
villages.
Labellers at Qianji make roughly 100 yuan ($14.50) a day marking data points on
photographs of people, surveillance footage and street images.
The work is usually simple, according to the employees, though some overseas
content poses a challenge.
"One time we thought we were classifying Europe-style cooker machines that have
a washer attached," said Jia Yahui, a labeler at Qianji. "Later we were told
it's actually two separate things, a stove and a dishwasher."
The labeling work brings some of the employment benefits of the tech sector to
rural areas, but those benefits may prove short-lived if AI improves enough to
perform many of the tasks labelers do.
"We think this industry will still exist in three to five years. It may not be a
long-term career - we can only think of the five-year plan for now," said Qianji
CEO Liu.
(Reporting by Cate Cadell; Editing by Jonathan Weber and Edwina Gibbs)
© 2019 Thomson Reuters. All rights reserved. This material may not be published,
broadcast, rewritten or redistributed. Thomson Reuters is solely responsible for
this content.