Picture yourself driving down a city street. You go around a curve, and suddenly see something in the middle of the road ahead. What should you do?
Of course, the answer depends on what that “something” is. A torn paper bag, a lost shoe, or a tumbleweed1? You can drive right over it without a second thought, but you’ll definitely swerve2 around a pile of broken glass. You’ll probably stop for a dog standing in the road but move straight into a flock of pigeons, knowing that the birds will fly out of the way. You might plough right through a pile of snow, but veer around a carefully constructed snowman.3 In short, you’ll quickly determine the actions that best fit the situation—what humans call having “common sense.”
Human drivers aren’t the only ones who need common sense; its lack in artificial intelligence (AI) systems will likely be the major obstacle to the wide deployment4 of fully autonomous cars. Even the best of today’s self-driving cars are challenged by the object-in-the-road problem. Perceiving “obstacles” that no human would ever stop for, these vehicles are liable to slam on the brakes5 unexpectedly, catching other motorists off guard. Rear-ending6 by human drivers is the most common accident involving self-driving cars.
The challenges for autonomous vehicles probably won’t be solved by giving cars more training data or explicit rules for what to do in unusual situations. To be trustworthy, these cars need common sense: broad knowledge about the world and an ability to adapt that knowledge in novel7 circumstances. While today’s AI systems have made impressive strides in domains ranging from image recognition to language processing, their lack of a robust foundation of common sense makes them susceptible to unpredictable and unhumanlike errors.8
Common sense is multifaceted, but one essential aspect is the mostly tacit “core knowledge” that humans share9: knowledge we are born with or learn by living in the world. That includes vast knowledge about the properties10 of objects, animals, other people and society in general, and the ability to flexibly apply this knowledge in new situations. You can predict, for example, that while a pile of glass on the road won’t fly away as you approach, a flock of birds likely will. Likewise, if you see a ball bounce in front of your car, you know that it might be followed by a child or a dog running to retrieve11 it. From this perspective, the term “common sense” seems to capture exactly what current AI cannot do: use general knowledge about the world to act outside prior training or pre-programmed rules.
Today’s most successful AI systems use deep neural networks. These are algorithms trained to spot patterns, based on statistics gleaned from extensive collections of human-labelled examples.12 This process is very different from how humans learn. We seem to come into the world equipped with innate knowledge of certain basic concepts that help to bootstrap our way to understanding, including the notions of discrete objects and events, the three-dimensional nature of space, and the very idea of causality itself.13 Humans also seem to be born with nascent concepts of sociality: babies can recognise simple facial expressions, and they have inklings about language and its role in communication, along with rudimentary strategies to entice adults into communication.14 Such knowledge is so elemental and immediate that we aren’t even conscious we have it, or that it forms the basis for all future learning. A big lesson from decades of AI research is how hard it is to teach such concepts to machines.
On top of their innate knowledge, children also exhibit innate drives to actively explore the world, figure out the causes and effects of events, make predictions, and enlist15 adults to teach them what they want to know. The formation of concepts is tightly linked to children developing motor skills and awareness of their own bodies. For example, it appears that babies start to reason about why other people reach for objects at the same time that they can do such reaching for themselves. Today’s state-of-the-art machine-learning systems start out as blank slates and function as passive, bodiless learners of statistical patterns; by contrast, common sense in babies grows via innate knowledge combined with learning that’s embodied, social, active and geared towards creating and testing theories of the world.16
The history of implanting common sense in AI systems has largely focused on cataloguing human knowledge: manually programming, crowdsourcing, or web-mining commonsense “assertions” or computational representations of stereotyped situations.17 But all such attempts face a major, possibly fatal obstacle: Much of our core intuitive knowledge is unwritten, unspoken, and not even in our conscious awareness.
The US Defense Advanced Research Projects Agency (DARPA)18, a major funder of AI research, recently launched a four-year programme on “Foundations of Human Common Sense” that takes a different approach. It challenges researchers to create an AI system that learns from “experience” in order to attain the cognitive abilities of an 18-month-old baby. It might seem strange that matching a baby is considered a grand challenge for AI, but this reflects the gulf19 between AI’s success in specific, narrow domains and more general, robust intelligence.
Core knowledge in infants develops along a predictable timescale, according to developmental psychologists. For example, around the age of two to five months, babies exhibit knowledge of “object permanence”20: if an object is blocked by another object, the first object still exists, even though the baby can’t see it. At this time babies also exhibit awareness that when objects collide, they don’t pass through one another, but their motion changes; they also know that “agents” (entities with intentions, such as humans or animals) can change objects’ motion. Between nine and 15 months, infants come to have a basic “theory of mind”: they understand what another person can or cannot see and, by 18 months, can recognise when another person displays the need for help.
Since babies under 18 months can’t tell us what they’re thinking, some cognitive milestones have to be inferred indirectly. This usually involves experiments that test “violation of expectation.” Here, a baby watches one of two staged scenarios, only one of which conforms to commonsense expectations. The theory is that a baby will look for a longer time at the scenario that violates her expectations, and indeed, babies tested in this way look longer when the scenario does not make sense.
In DARPA’s Foundations of Human Common Sense challenge, each team of researchers is charged with developing a computer program—a simulated “commonsense agent”—that learns from videos or virtual reality. DARPA’s plan is to evaluate these agents by performing experiments similar to those that have been carried out on infants and measuring the agents’ “violation of expectation signals.”
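One way to picture such an evaluation is sketched below in Python. It is only an illustration under stated assumptions, not DARPA’s protocol: the article does not say how the signals are computed, so the sketch assumes that an agent’s “violation of expectation signal” is a surprise score (here, surprisal in bits) and that the agent passes a pair when it is more surprised by the implausible scenario. The event labels, the probabilities in `TOY_EVENT_PROBS`, and the names `toy_surprise` and `voe_score` are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical encoding: a scenario is a sequence of event labels.
Scenario = Sequence[str]

# Made-up probabilities standing in for whatever an agent has learned.
TOY_EVENT_PROBS = {
    "ball_rolls_behind_screen": 0.9,
    "ball_reappears": 0.9,
    "ball_vanishes": 0.01,
    "ball_bounces_off_wall": 0.9,
    "ball_passes_through_wall": 0.01,
}


def toy_surprise(scenario: Scenario) -> float:
    """Total surprisal (in bits) of a scenario under the toy probabilities."""
    return sum(-math.log2(TOY_EVENT_PROBS[event]) for event in scenario)


def voe_score(
    surprise_fn: Callable[[Scenario], float],
    pairs: List[Tuple[Scenario, Scenario]],
) -> float:
    """Fraction of (plausible, implausible) pairs in which the agent is
    more surprised by the implausible scenario."""
    hits = sum(
        1
        for plausible, implausible in pairs
        if surprise_fn(implausible) > surprise_fn(plausible)
    )
    return hits / len(pairs)


# Each pair mirrors an infant experiment: one staged scenario conforms to
# commonsense expectations and the other violates them.
PAIRS = [
    (
        ["ball_rolls_behind_screen", "ball_reappears"],  # object permanence holds
        ["ball_rolls_behind_screen", "ball_vanishes"],   # object permanence violated
    ),
    (
        ["ball_bounces_off_wall"],      # solid objects collide
        ["ball_passes_through_wall"],   # solid objects pass through each other
    ),
]

if __name__ == "__main__":
    print(voe_score(toy_surprise, PAIRS))
```

Note that the toy agent scores perfectly only because its probabilities were written by hand to match the test pairs, so a high score on its own says nothing about general understanding.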
This won’t be the first time that AI systems are evaluated on tests designed to gauge21 human intelligence. In 2015, one group showed that an AI system could match a four-year-old’s performance on an IQ test, resulting in the BBC reporting that “AI had IQ of four-year-old child.” More recently, researchers at Stanford University created a “reading” test that became the basis for the New York Post reporting that “AI systems are beating humans in reading comprehension.” These claims are misleading, however. Unlike humans who do well on the same test, each of these AI systems was specifically trained in a narrow domain and didn’t possess any of the general abilities the test was designed to measure. As the computer scientist Ernest Davis at New York University warned: “The public can easily jump to the conclusion that, since an AI program can pass a test, it has the intelligence of a human that passes the same test.”
I think it’s possible, even likely, that something similar will happen with DARPA’s initiative. It could produce an AI program that is specifically trained to pass DARPA’s tests for cognitive milestones, yet possesses none of the general intelligence that gives rise to these milestones in humans. I suspect there’s no shortcut to actual common sense, whether one uses an encyclopaedia22, training videos or virtual environments. To develop an understanding of the world, an agent needs the right kind of innate knowledge, the right kind of learning architecture, and the opportunity to actively grow up in the world. It should experience not just physical reality, but also all of the social and emotional aspects of human intelligence that can’t really be separated from our “cognitive” capabilities.
While we’ve made remarkable progress, the machine intelligence of our current age remains narrow and unreliable. To create more general and trustworthy AI, we might need to take a radical step backward: to design our machines to learn more like babies, instead of training them specifically for success against particular benchmarks23. After all, parents don’t directly train their kids to exhibit “violation of expectation” signals; how infants behave in psychology experiments is simply a side effect of their general intelligence. If we can figure out how to get our machines to learn like children, perhaps after some years of curiosity-driven, physical and social learning, these young “commonsense agents” will finally become teenagers, ones who are sufficiently sensible to be entrusted with the car keys.
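1. tumbleweed: a plant, usually found in North America and Australia, that breaks off at ground level after withering and is blown about by the wind, rolling like a ball.
2. swerve: to change direction suddenly; to turn sharply.
3. plough through: to crash forcefully through; veer: to change direction, to turn.
4. deployment: use, application.
5. slam on the brakes: to brake hard and suddenly.
6. rear-ending: running into the back of the vehicle in front.
7. novel: new, unfamiliar.
8. Sentence gloss: although today’s AI systems have made great progress in areas such as image recognition and language processing, they lack a solid foundation of common sense and so tend to make unpredictable errors that humans would not make. robust: solid, strong; susceptible: easily affected or harmed.
9. multifaceted: having many aspects; tacit: unstated, unspoken.
10. property: attribute, characteristic.
11. retrieve: to get back, to fetch.
12. Sentence gloss: these algorithms gather statistics from large collections of human-labelled examples and then recognise patterns. algorithm: a computational procedure (in computing); glean: to gather (information) slowly and with effort.
13. innate: inborn, inherent; bootstrap: to get somewhere through one’s own efforts; discrete: separate, distinct; causality: the relationship of cause and effect.
14. Sentence gloss: likewise, humans seem to be born with notions of sociality; for instance, babies can recognise simple facial expressions, have a rough sense of language and its role in communication, and have some basic strategies for drawing adults into interaction. nascent: newly formed, budding; inkling: a vague idea, a faint impression; rudimentary: basic, elementary; entice: to lure, to attract.
15. enlist: to obtain (help or support).
16. state-of-the-art: the newest, most advanced; blank slate: a mind with no prior knowledge; gear towards: to make ready or suitable for.
17. Sentence gloss: historically, efforts to implant common sense in AI systems have focused mainly on cataloguing human knowledge, by manual programming, crowdsourcing, mining the web for commonsense “assertions”, or representing stereotyped situations computationally. implant: to insert, to embed; catalogue: to enter in a catalogue; crowdsourcing: obtaining ideas, services or content from a large group of people, especially an online community.
18. DARPA: the US Defense Advanced Research Projects Agency, an agency of the US Department of Defense responsible for developing advanced technologies for military use.
19. gulf: a chasm; a very large gap.
20. object permanence: a child’s understanding that objects exist as independent entities and continue to exist even when they cannot be perceived.
21. gauge: to measure.
22. encyclopaedia: a comprehensive reference work.
23. benchmark: a standard or point of reference.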
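In many specific test subjects, such as arithmetic and reading, computers can score above the human average. But does that mean artificial intelligence already has a human IQ? Perhaps harder than any exam is the common sense, large and small, of everyday life. This common sense plays a vital role in judgement, yet it is sprawling and fragmentary, and hard to acquire. The gap between artificial intelligence and the human brain may lie precisely here.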