
It's 2000, and the legendary Silicon Valley programmer Bill Joy writes an eloquent editorial for Wired: "Why the Future Doesn't Need Us". Joy's argument is very simple: if we keep improving our ability to build machines that can perform complex tasks, we are very close to inventing the last machine, one that will make us all superfluous and obsolete. Nearly 20 years later, it's 2019, and the press and all of Silicon Valley, where I come from, try every day to convince us that we are very close to a world where Joy's prophecy is fulfilled: the so-called fourth industrial revolution, the one powered by AI - Artificial Intelligence, thinking machines.

I am Jacopo, the founder of a start-up called Tooso, together with two friends - clearly much more serious than I am. I live in San Francisco, and I work in Artificial Intelligence. I accepted the invitation to this talk mostly to explain to my parents what I do across the ocean all day long. (Laughter) AI, San Francisco, Silicon Valley: on paper, I am the right person to come and convince you of the Californian "gospel"; but if you expect a typical Silicon Valley talk, you're going to be fairly disappointed today.

The truth is, I don't blame you for buying this rhetoric of thinking machines. In recent years, Artificial Intelligence has achieved extraordinary results, succeeding at things that were previously considered typically human, such as recognizing kittens on YouTube - an activity all of you are very fond of, I know - and, on a more serious level, beating the world champion of Go, a Chinese board game far more complex than chess. And the world champion, as you can see in the picture, was not particularly happy with the outcome of the game.
Today, most of Artificial Intelligence revolves around the concept of "neural networks," usually represented by these vague diagrams of neurons and connections - hence the name "neural network," vaguely inspired by the human brain. You give an input to this network - say, the image of a cat - it processes it, and it tells you what the image contains: for instance, a cat. The truth is that to understand why Silicon Valley is wrong, you don't really need to know what a neural network is, or all the formulas explaining how it works. In fact, I would like you to forget everything you know about neural networks: for today, think of a neural network as a magic box. The box is initially completely empty, and you start filling it with images of dogs and cats, one after the other. Every time you put an image into the magic box, you say: "This is a dog" or "This is a cat." The box is magical because after you do that a hundred thousand times, or ten million times, the box will be able to tell a dog from a cat by itself.

But what they don't tell you on the glossy covers of Wired or Time Magazine is how much it costs to learn what a cat is on YouTube. It took 16 thousand processors and the equivalent of 10 million kittens - which is really a lot of kittens. It took five million games of Go to beat a human being, the equivalent of 500 years spent playing. For Google, the cost of electricity alone to train that computer is estimated at 25 million US dollars. So next time you complain about your electric bill, think about Google's.

It's a pretty intuitive idea, actually, if we think about what happens in our everyday lives: if I go to Munich tomorrow, and after two days I come back speaking perfect German, you'll all be stunned by my capability to learn: you'll think I'm a genius.
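The "magic box" loop described above can be sketched in a few lines of code. This is a toy illustration only: the "images" are made-up pairs of numbers (hypothetical features), and the box is a single artificial neuron rather than a real network, but the supervised routine - show a labeled example, nudge the box, repeat thousands of times - is the same one the talk is describing.

```python
# Toy sketch of the "magic box": a single logistic neuron that learns to
# tell "cat" from "dog" only after seeing thousands of labeled examples.
# The features are synthetic stand-ins for real images.
import math
import random

random.seed(0)

def make_example(label):
    # Hypothetical features: cats cluster around (1, 0), dogs around (0, 1).
    cx, cy = (1.0, 0.0) if label == "cat" else (0.0, 1.0)
    return [cx + random.gauss(0, 0.3), cy + random.gauss(0, 0.3)], label

# "Fill the box": thousands of labeled examples, one after the other.
data = [make_example(random.choice(["cat", "dog"])) for _ in range(5000)]

# Train one artificial neuron by stochastic gradient descent.
w, b = [0.0, 0.0], 0.0
for features, label in data:
    target = 1.0 if label == "cat" else 0.0
    z = w[0] * features[0] + w[1] * features[1] + b
    pred = 1.0 / (1.0 + math.exp(-z))          # sigmoid output in (0, 1)
    err = pred - target
    w = [w[i] - 0.1 * err * features[i] for i in range(2)]
    b -= 0.1 * err

def classify(features):
    z = w[0] * features[0] + w[1] * features[1] + b
    return "cat" if z >= 0.0 else "dog"

# After enough examples, the box can label new data it has never seen.
test_set = [make_example(random.choice(["cat", "dog"])) for _ in range(500)]
accuracy = sum(classify(f) == y for f, y in test_set) / len(test_set)
```

Note the asymmetry the talk is about: the neuron needs thousands of examples to separate two clusters a child would tell apart instantly.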
If I lived in Germany for 40 years, got married to a German woman, worked for a German company, had children, and then came back speaking German - besides being very useful, nobody would think I am a genius. What happens when Silicon Valley tries to convince you that Bill Joy's world is near and upon us is a shift: the human-grade concepts of intelligence and learning are profoundly different from the ones we use for machines. And in this shift lies the biggest trick.

Let's do an experiment - unfortunately, I have nothing to make you sing along. I don't know how many of you have ever seen an aardvark in their life. Those who have seen an aardvark, raise your hand. None? Wow, someone! I wasn't prepared for anybody to actually say "yes." It's not really important what an aardvark is - it is a sub-Saharan mammal. The nice thing is, when I mentioned "aardvark," you all looked at this portion of the image without me suggesting anything. Why did you do that? Because you recognized some ears, some eyes; in short, you recognized all the typical characteristics that make it stand out from the background. And not only that: after a single example you are able to identify it in new photos, even if it's portrayed from a different angle or under different light. Moreover, you can distinguish an aardvark from similar animals, such as an anteater. With just one example!

When a neural network - a magic box - sees an aardvark for the first time, it doesn't know where to look. Each portion of the image could potentially contain an aardvark. That's the deep reason why you need to show it ten million of them: if you show it ten million different images of aardvarks, from different angles, in different light, the AI can finally distinguish the mammal from the background.

If we move from images to language, which is my field of study, things get even worse. This is a classic task of lexical analogy, very simple: the dog barks and the cat, obviously, meows.
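One common way machines attack this kind of analogy is with word vectors: represent each word as a point in space learned from huge amounts of text, then solve "dog : barks :: cat : ?" with vector arithmetic. The 3-D vectors below are hand-made for illustration (a real system learns hundreds of dimensions from corpora, which is exactly where the reading cost comes from), but the arithmetic is the standard trick.

```python
# Toy sketch of solving a lexical analogy with word vectors.
# These vectors are invented for illustration; real systems learn them
# from massive text corpora.
vectors = {
    "dog":   [1.0, 0.0, 0.2],
    "cat":   [0.0, 1.0, 0.2],
    "barks": [1.0, 0.0, 0.9],
    "meows": [0.0, 1.0, 0.9],
    "train": [0.5, 0.5, 0.0],
}

def analogy(a, b, c):
    # "b is to a as ? is to c": compute the point b - a + c, then find
    # the nearest word to it, excluding the three input words.
    target = [vectors[b][i] - vectors[a][i] + vectors[c][i] for i in range(3)]
    def dist(word):
        return sum((vectors[word][i] - target[i]) ** 2 for i in range(3))
    return min((w for w in vectors if w not in (a, b, c)), key=dist)

answer = analogy("dog", "barks", "cat")  # expected to land on "meows"
```

The geometry does the work: the direction from "dog" to "barks" ("the sound it makes") is reused starting from "cat". Learning vectors where that direction is consistent is what takes the machine hours of reading.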
I can take my laptop over there, run state-of-the-art Artificial Intelligence algorithms, and in about three hours - reading the equivalent of War and Peace 10,000 times - the system can answer this question. But this is not the only way to get it done. There is a physical system, very expensive - especially in America, where I live - and very complex, that can solve this problem much more easily. The amazing thing is that some of you might have this physical system at home; some of you might even have more than one. We commonly call this system a "child." For your information, it's a scaled version of a human being. A child can solve this task in about three seconds, once you point to a cat and say "meow." A child is so good at learning that between 12 and 18 months old she can learn 75 new words a month. A child is so good at learning that when you drop your car keys and curse, the child will learn it, whether you like it or not. (Laughter)

Three hours and 10,000 copies of War and Peace - and boy, I was told that War and Peace is really long - versus three seconds. This is the real gap between the Artificial Intelligence we have today and real intelligence. The distance between the world we live in and Bill Joy's world is the distance between three hours and three seconds.

The good news is that we can use children to build better machines, trying to steal some of the secrets of these incredible systems. This is a standard experiment, in which we test a six-month-old child with a little game: there is a rail with a little train, and as you see, the train runs along the rail; the yellow screen is lifted up and you see a cube behind the rail. So far, nothing weird: a little train that simply runs along. As good scientists, at this point we trick the six-month-old child and do exactly the same thing - but when the screen is lifted up, after the train has passed, we put the cube in the middle of the rail.
Obviously, this is just to trick the child, because it's physically impossible: in our world, solid bodies do not interpenetrate. But she knows nothing about physics: she is a six-month-old child. Even if you wanted to explain it to her, she could not understand it. So it's interesting to ask: what goes on inside the mind of a child? When a child sees the possible world, the ordinary world, that child is happy - like all children are, I guess, looking at a train running along rails. When a child sees an impossible world, one that physically cannot exist, here is what happens. (Laughter) The child stares for several seconds at the experimental setup, somehow wondering what is not working properly in reality. But there's something even more interesting: when you take the train that behaves in an impossible way and you give it to the child, she will do this. (Laughter) The child will test the sturdiness of the train. In fact, if you change the experiment and take a toy car instead - it's really easy to trick six-month-olds, apparently - and pretend that it goes up instead of falling, as gravity predicts, do you know what the child will do when you give the car back to her? She will drop it from her high chair, to test that it actually obeys the laws of gravity.

These are two fundamental differences between how we build Artificial Intelligence and how children learn. The first is that to learn quickly you already need to know something: a child already has an intuitive notion of physics in her brain, and uses it to learn. The second is that a child won't stand still, like a neural network, waiting for you to provide millions of images of cats. A child is like a little scientist: she uses the world around her to run experiments and understand how reality works. The good news is, we can already use these principles to build systems that work in the real world, and we have already begun to do so, in our own way.
Let's take a very simple case you are all familiar with: buying clothes online. You go to an e-commerce platform, type a word - say, "sheath dress" - and press return. At the other end, there is an AI that tries to understand the meaning of this word. How do we program an Artificial Intelligence to solve this problem? Well, we have to do it as children do. The first ingredient is to provide the Artificial Intelligence with preliminary knowledge that explains our reference domain. It's not physics anymore; it is a more mundane domain: clothes. But it is useful for our Artificial Intelligence to know, for example, that clothes are organized into dresses, skirts, and jeans, and that clothes come in a certain color, a certain fabric, and a certain style. The second ingredient is the experiment: how do we get a search engine - an AI that can only observe your browser - to run experiments? Well, by using all of you as guinea pigs. Every time you end up on a website running one of these new Artificial Intelligences that learn as children do, every time you click on a product among the results the search engine offers you, you are the metaphorical equivalent of the table the child was banging the train on. The Artificial Intelligence is using you, your feedback, and your intelligence, to learn from the environment very efficiently. Using an Artificial Intelligence that works on these principles - and not like most AI works nowadays - you can learn that "sheath dress" and "dress" are actually synonyms with only ten examples. Ten examples are not three seconds; however, on the continuum between three hours and three seconds, ten examples are the first step down the right path.

Why should this be interesting for everyone, and not just for people like me, who are clearly looking for an excuse to stay on the other side of the world, far away from their parents? For two reasons. The first is practical.
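The two ingredients above - prior domain knowledge plus click feedback as "experiments" - can be sketched very simply. Everything here is invented for illustration (the catalog, the query, the click log, the threshold of ten examples); the point is only the shape of the mechanism: a small amount of structure up front, then a handful of user clicks to attach a new word to a known category.

```python
# Toy sketch of learning a synonym from user clicks.
# The catalog, queries, and threshold are all hypothetical.
from collections import defaultdict

# Ingredient 1, prior knowledge: a tiny catalog organized by category.
catalog = {"dress": ["red dress", "blue dress"], "skirt": ["long skirt"]}

# Ingredient 2, the "experiment": a log of (query, category clicked).
# Nine users who typed "sheath dress" clicked a dress; one clicked a skirt.
clicks = [("sheath dress", "dress")] * 9 + [("sheath dress", "skirt")]

counts = defaultdict(lambda: defaultdict(int))
for query, category in clicks:
    counts[query][category] += 1

def learned_synonym(query, min_clicks=8):
    # Treat the query as a synonym of a category once enough users agree.
    best = max(counts[query], key=counts[query].get)
    return best if counts[query][best] >= min_clicks else None

synonym = learned_synonym("sheath dress")
```

Ten clicks instead of ten million labeled images: the prior structure of the catalog is what makes so few examples enough.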
There are many areas of your life - areas covered by privacy, for example, or areas involving very rare events - in which, even if you wanted to, you could never accumulate ten million examples. We would like to use machines, and all the things that machines can do better than us, in these areas too. So learning from ten examples is a great way to use them more widely. The second reason, perhaps more important, concerns society. A world where the benefits of Artificial Intelligence are reaped only by companies that can collect millions, billions, hundreds of billions of data points is a world in which a few companies determine what progress is. It is a world in which our vision of the future is largely determined by those who can afford it. That's why programming machines that work a little more like children somehow democratizes access to all the new technologies, and to all the good things we can do together with machines. Artificial Intelligence has made huge strides in the last ten years; we can't deny it. But the only way to produce "real" Artificial Intelligence is by taking inspiration from the real intelligence we have at hand: that of human beings, and in particular that of children. So at the end of the day, with all due respect to Bill Joy, the future has actually never needed us more. Thank you. (Applause)