What could 2024 hold for AI?

Is litigation about to catch up with large language models?
02 January 2024

Interview with Michael Wooldridge, University of Oxford

A stylised computer network.

Artificial intelligence was rarely off the front page in 2023; it seems to have touched every aspect of our lives, and many are worried about its impact. So what will the year ahead hold in this area? Professor Michael Wooldridge is an AI specialist in the Department of Computer Science at the University of Oxford, and he also delivered last year's Royal Institution Christmas Lectures on the topic...

Michael - If I look at my field, there are various watershed years in technology, but very often you don't realise they're watershed years until some time later. There's the transistor, invented in the late 1940s, and then microprocessor technology in the 1970s. Then fast forward to the World Wide Web in the 1990s, and the empires that are created there are Google and Amazon. In the smartphone era, the mobile era, you've got Facebook and Twitter: they're the empires created on the back of that. And what's happened this year is that one of those speculative bets, the bet on large language models, turns out to be the one that is surprisingly successful. So what we're seeing now is that OpenAI's ChatGPT is the tool that has taken off, and everybody else is scrambling to catch up.

Chris - How's that counteroffensive going to play out? Are Google and Microsoft so big that, rather than new frontrunners emerging as they did at the watershed moments you described, the new frontrunners here will be old runners with a new song to sing? Is that going to be the status quo, or are we going to see whole new ways of doing this burst onto the scene and gazump them?

Michael - With respect to the Googles and the Microsofts, there is a certain element of just not losing ground. I mean, they're desperate to maintain their market position and not face a newcomer, like Google in 1998 or so, which suddenly took over from AltaVista and the other search companies whose names I can't remember anymore.

Chris - Well, Yahoo of course is still a player. But yes, I was going to go down that line and say that web search was immediately the big driving factor, and quite quickly one frontrunner, Google, emerged and has dominated, at one point accounting for over 90% of all the search being done on the web around the world. Are we going to see AI technologies go the same way, where there is this sort of survival-of-the-fittest competition and one of them emerges? Or are we just going to see mass competition, with everyone continuing to use and deploy their own in-house forms of this?

Michael - Difficult to know, but what's remarkable is that ChatGPT has just become the generic name for these things, right? It's like the Hoover of the large language model world. And that, I think, is a huge advantage that OpenAI already has. They've got the brand out there; they managed to land that. So, at the moment, it's their game to lose, I think.

Chris - What about the issue of the closed black box, where we've no idea what goes on inside it, versus open source? Some people are actually making the workings, the mechanisms, of their platforms available for people to look at, tinker with and so on, which they say makes them more transparent. How's that likely to play out? Do you think that might be the panacea, where people say, 'to give consumers confidence, we will take an open source approach to this,' so that people can see how it works and be involved in its architecture and its evolution?

Michael - The open source large language model world has a very, very powerful advocate in the form of Yann LeCun, who is, I think, the head of AI research at Meta. And he very passionately believes that open models are the way forward. But 'open' is quite a complicated story with respect to large language models. Firstly, there's the training data: the stuff that goes into these models, the data you use to actually build them. Ideally we would like to be able to see that data. For example, I would quite like to know the data that was used to train ChatGPT that relates to me. And there is some training data about me in there, I know that, because it can answer questions about me. Then there's the code that was used with that training data to build the model, and ideally we would like to be able to see that too. And finally there's what's called the runtime version: once you've trained the model, the version that you actually see. But there are also all the processes. For example, exactly what process did these companies use to decide what data went into the model? How did they screen that data? We'd also like to know how these things are tested. Even the advocates of open source very often aren't particularly keen on opening up all of those different elements. So it's quite a complicated story, and 'open' doesn't necessarily mean as open as we might like.

Chris - When we're speaking about training data, one of the things that has surfaced, a bit predictably actually, has been people saying, maybe quite rightly, 'it's obvious that you've used my copyright, my intellectual property, to train your system.' So what's likely to emerge as the solution there? If we are feeding the entire World Wide Web into these models, then the fruits of everyone's labours are being used to inform how they work, and someone is ultimately making money out of it. I could say, 'well, I want a slice of that pie.'

Michael - With respect to books: books get pirated, and it was a source of great frustration to me that, for the textbook I wrote, the very first link you found when you searched for it was to a pirated version. The difficulty here is that works like that seem to have been ingested either knowingly or unknowingly, and the jury's a little bit out on the extent to which it was done knowingly. They are therefore sort of implicitly there, but current copyright law wasn't really designed to deal with issues like this. So, has my book actually been copied? Not in a conventional sense. Is it a derivative work? Not in a conventional sense. And if a model is trained on the other side of the planet, then how does copyright law apply there? These are difficult issues. The companies that develop these models claim that this is fair use, but this is one of those situations where I think we're going to see things played out in the courts, and we will have to wait for the courts to make their rulings. The difficulty is that courts don't move quickly and the technology is moving very, very quickly. So, in the meantime, there are going to be a lot of people who are very anxious about this.

Chris - You've just given the Christmas Lectures this year on this very topic of artificial intelligence and its inexorable march forward. They're for a young audience. What do they make of it?

Michael - We try to do as responsible a job as we can of getting them excited about the beneficial applications. For example, we saw some really wonderful examples of how AI is used in healthcare. But in the final lecture we address the issues: in the section on the arts, for example, the issue of copyright is touched on. We raise the concerns people have about misinformation, and we go to the really, really big ones, questions about existential risk and so on; we try to address those as well. The lectures are designed for an audience who are going to be the first generation in history to enter adulthood in the age of AI. We try to do as good a job as possible of educating them about what the technology is, so they're not under any illusion that there's a mind on the other side of the screen, what the technology means, and what the risks are, as well as the potential beneficial applications.
