Next-Level Computing Will Overcome Energy Limits


Alphabet’s Hennessy: Technology Powers AI, Connectivity Breakthroughs

In light of the artificial intelligence revolution (“and it’s exactly that, a revolution,” contends John L. Hennessy, Chairman of Alphabet, Inc.), the question now is how to ensure that the technology gets used for lots of good while minimizing potential harms. “Training AI on the entire internet is probably not the wisest thing, because there’s a lot of garbage,” he said.

Hennessy led off TDK Ventures’ Digital Transformation Week 2023 discoursing on energy, computing, and a range of other issues in a fireside chat with the venture company’s President, Nicolas Sauvage.

He said using Wikipedia for training works because the community-edited and managed online encyclopedia contains, by and large, accurate information. Better training data, more accurate human interaction in the form of prompts, and use of models for specific industries and applications will expand AI’s functionality and usefulness, he said.

“Can you make a smaller model and train it better?” he asked. “That way the results will be better than with a big model that’s trained with a lot of data which is not carefully curated.”

In response to Sauvage’s question about whether training AI on limited datasets might introduce a greater potential for bias, Hennessy distinguished the biases that could be inherent in AI:

· Societal bias, which is built into the input data. A query about race and income could pick up the bias that exists in U.S. data.

· Algorithmic bias, where the decisions AI makes exacerbate the bias present in the underlying data, leading it to overpredict or underpredict situations.

“If the AI system is replacing a person in making decisions, guess what? We have this image of people being perfect. People are biased, also,” Hennessy reminded the audience. “And you don’t even know what their biases are. At least with AI, I can test for bias. Can you imagine telling people when they come into a jury, ‘We’re going to give you a test to see whether you have certain biases that may lead to a higher probability of conviction’?”
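Hennessy’s point that an AI system can at least be tested for bias can be made concrete with a small sketch. Everything below is hypothetical and invented for illustration: the toy records, the group labels "A" and "B", and the helper `positive_rate_by_group`. Comparing the gap in outcomes between groups is one simple check among many, not a method described in the talk.

```python
from collections import defaultdict

def positive_rate_by_group(records, key):
    """Return the share of positive (1) values of `key` within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for record in records:
        totals[record["group"]] += 1
        positives[record["group"]] += record[key]
    return {group: positives[group] / totals[group] for group in totals}

# Hypothetical toy data: "actual" is the outcome recorded in the input data,
# "predicted" is what an imaginary model decided for the same person.
records = [
    {"group": "A", "actual": 1, "predicted": 1},
    {"group": "A", "actual": 1, "predicted": 1},
    {"group": "A", "actual": 0, "predicted": 1},
    {"group": "A", "actual": 0, "predicted": 0},
    {"group": "B", "actual": 1, "predicted": 0},
    {"group": "B", "actual": 0, "predicted": 0},
    {"group": "B", "actual": 0, "predicted": 0},
    {"group": "B", "actual": 0, "predicted": 0},
]

actual = positive_rate_by_group(records, "actual")
predicted = positive_rate_by_group(records, "predicted")

# Gap already present in the data (societal bias) versus the gap in the
# model's decisions. A larger second gap suggests the model amplifies it.
actual_gap = actual["A"] - actual["B"]
predicted_gap = predicted["A"] - predicted["B"]
print(f"gap in data: {actual_gap:.2f}, gap in predictions: {predicted_gap:.2f}")
```

In this made-up example the gap between groups is wider in the predictions than in the data, which is the pattern the algorithmic-bias bullet describes.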

He said future work will not only help ferret out the biases within AI, but also enable AI to uncover unconscious human biases even better than the psychological tests now being used.

He acknowledged that there are many areas where we want AI to be an advisor to the human, who in the end makes the decision. He would welcome AI examining all the health data available, which would be difficult for even a team of physicians to perform, before arriving at a diagnosis. But, he said, “if an AI system is doing diagnostic work and is about to tell me I have some terrible disease, and I’m not going to live very long, I really do want a person to tell me that news and share it in a personal setting.”

Sauvage observed that innovation often is the result of combining breakthroughs from two different disciplines and wondered if AI might be able to make that happen more quickly than humans.

“Think of AI as programming by learning from data,” Hennessy replied. “Take a gigantic database of observations and train a system to program lots of applications. Think about a large cloud server that Microsoft, Amazon, or Google is running and how hard it would be to optimize behavior on the cloud with lots of different applications with different usage patterns, running many different numbers of virtual machines using different amounts of memory.”

In the past, heuristics were the best, though inadequate, tool. AI, however, can observe the system’s behavior and learn how to optimize it.

Leveraging applications of AI and other advanced technologies will necessitate next-level computing, Sauvage said. He asked Hennessy what computing advances and connectivity breakthroughs he anticipates.

“We’ve kind of reached the point of diminishing returns in conventional general-purpose processors — machines that can run a hunk of C code that is just sitting there native and doesn’t have a lot of hints on how to do it efficiently. It’s got random pointers and all the things we’re used to programming and that good programmers can do to get efficiency up.”

Sauvage seized on the term “good programmers,” asking whether they will still be needed in an age of AI.

Hennessy assured the audience that will be the case for the foreseeable future.

“Don’t trust the code,” Hennessy advised. He mentioned Stanford Cryptography Professor Dan Boneh’s paper explaining that large language models don’t always get the code right, even though they think they do. And potentially more troubling, humans believe them to be infallible, because nobody wants to read somebody or something else’s code.

Programmers will move beyond this type of conventional computing, which is limited for the most part by energy efficiency (as well as silicon efficiency), Hennessy said.

“The reason is the techniques we use,” he explained. “All these machines speculate. A recent Intel processor has 100, maybe 150, instructions in flight at any given time. So, it’s speculated through 3, 4, 5, 6 branches. What are the odds it got each of those six branches right? If not, we’re wasting time and energy. So, speculation is kind of played out.”
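Hennessy’s question about the odds can be worked through with simple arithmetic. The sketch below rests on two assumptions that are not from the talk: each branch is predicted independently, and the per-branch accuracy is 95 percent, a figure chosen only for illustration. The function name `all_correct_probability` is likewise hypothetical.

```python
def all_correct_probability(per_branch_accuracy, branches):
    """Chance that every one of `branches` independent predictions is right."""
    return per_branch_accuracy ** branches

ASSUMED_ACCURACY = 0.95  # illustrative assumption, not a measured figure

for branches in (3, 4, 5, 6):
    p = all_correct_probability(ASSUMED_ACCURACY, branches)
    # Work done past a mispredicted branch is discarded, so (1 - p) is a
    # rough proxy for how often some speculative work is wasted.
    print(f"{branches} branches: {p:.1%} all correct, {1 - p:.1%} some waste")
```

Under these assumptions, speculating through six branches succeeds only about 74 percent of the time, so roughly one attempt in four throws away work and the energy spent on it.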

Caches, too, are reaching the end of their shelf life. Hennessy said it makes little sense to build beyond the 4-level caches we have today.

The AI revolution notwithstanding, “computers have gotten a lot faster, and it’s harder to find novel applications that justify spending on much faster ones,” he contended. “But then you put AI into the mix, and all of a sudden, for training and inference, we’ve got computationally demanding things that behave very differently than traditional applications: streaming data access, linear algebra, both sparse and dense. So, the machines have to be built differently.”

He said that necessitates a move into an era of domain-specific computers. This means designing computers to specialize in a class of problems, rather than in a single application as is common in smartphones. Computers focused on efficient training or inference, for example, can be developed without using 64-bit or even 32-bit arithmetic.

“We do it by using a variety of techniques,” Hennessy said. “IEEE (the floating-point standard) is overkill for most of these problems. But you’ve got a lot of parallelism, and you have to exploit that efficiently. By combining new systems like TensorFlow or CUDA that are designed to program in that environment with new architectures, the levels of performance (will skyrocket).”
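The reduced-precision idea can be illustrated with a small sketch that compares a dot product computed in Python’s native 64-bit floats with the same computation rounded to 16 bits at every step. The weights and inputs are made-up values, and the helpers `to_half` and `dot` are hypothetical names. Rounding through the standard library’s `struct` module only emulates half precision; it says nothing about how any real accelerator is built.

```python
import struct

def to_half(x):
    """Round a 64-bit Python float to the nearest 16-bit half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def dot(xs, ys, rnd=lambda v: v):
    """Dot product, applying `rnd` after every multiply and every add."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = rnd(acc + rnd(rnd(x) * rnd(y)))
    return acc

# Made-up values standing in for a row of model weights and an input vector.
weights = [0.1 * i for i in range(1, 9)]
inputs = [0.25, -0.5, 0.75, -1.0, 1.25, -1.5, 1.75, -2.0]

full = dot(weights, inputs)            # 64-bit arithmetic throughout
half = dot(weights, inputs, to_half)   # every intermediate rounded to 16 bits
rel_error = abs(full - half) / abs(full)
print(f"64-bit: {full:.6f}  16-bit: {half:.6f}  relative error: {rel_error:.2%}")
```

For this toy vector the 16-bit result stays within a couple of percent of the 64-bit one while using a quarter of the bits per number, which hints at why narrower formats are attractive for training and inference.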

Hennessy noted that the world’s fastest computers are being used not for massive mathematical and scientific calculations, but to train large language models. But there are not enough of them, as evidenced by discussions between Microsoft and Oracle about sharing servers when they need additional capacity to accommodate large-scale AI customers. He said Alphabet measures the time it takes to train a large language model in months, depending on how many exaflops of computational power can be applied. (An exaflop is one quintillion, a 1 followed by 18 zeros, floating-point operations per second.)
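The link between exaflops and months of training is a one-line division, sketched below. The total of 1e25 floating-point operations is a purely hypothetical training budget picked to make the arithmetic visible; it is not a figure from Hennessy or Alphabet, and `training_days` is an illustrative name.

```python
EXA = 10**18              # one quintillion operations per second
SECONDS_PER_DAY = 86_400

def training_days(total_flops, sustained_exaflops):
    """Days needed to perform `total_flops` operations at a sustained rate."""
    seconds = total_flops / (sustained_exaflops * EXA)
    return seconds / SECONDS_PER_DAY

HYPOTHETICAL_TOTAL_FLOPS = 1e25  # assumption for illustration only

for exaflops in (1, 2, 4):
    days = training_days(HYPOTHETICAL_TOTAL_FLOPS, exaflops)
    print(f"{exaflops} exaflop(s) sustained: {days:.0f} days, about {days / 30:.1f} months")
```

With these assumed numbers, a single sustained exaflop works out to nearly four months, and each doubling of applied compute halves the time, provided the work scales perfectly, which real systems rarely manage.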

Sauvage pressed further on how training creates a need for different types of computing.

“There’s a scale issue in the size of the training set,” Hennessy observed. “If you want to train these new models faster, you’re going to have to use multiple clusters, so you’re going to have to distribute it. That means changing the algorithm, because you can’t instantly modify the weights globally. There are only a few handfuls of people who have actually had to wrestle with the problem.”
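One common way to handle weights that cannot be updated globally in an instant is synchronous data-parallel training: each cluster computes a gradient on its own shard of the data, the gradients are averaged, and every copy of the weights takes the same step. The sketch below shows that pattern on a toy one-parameter model. It is an assumption about the kind of algorithm change Hennessy means, not something he spelled out, and the data, function names, and learning rate are all hypothetical.

```python
def local_gradient(w, shard):
    """Mean gradient of the squared error (w*x - y)**2 over one cluster's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(values):
    """Stand-in for the cross-cluster communication step: average the gradients."""
    return sum(values) / len(values)

def train(shards, steps=200, learning_rate=0.01):
    w = 0.0  # every cluster starts from the same copy of the weight
    for _ in range(steps):
        # Each cluster works only on its own shard; in a real system these
        # computations would run in parallel on separate machines.
        gradients = [local_gradient(w, shard) for shard in shards]
        # The weight changes only after all clusters have reported in.
        w -= learning_rate * all_reduce_mean(gradients)
    return w

# Toy data generated from y = 3x, split evenly across four simulated clusters.
data = [(x, 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]
w = train(shards)
print(f"learned weight: {w:.4f}")
```

The averaging step is where the cost of distribution shows up: every update waits on communication between clusters, which is why spreading training out means rethinking the algorithm and not just adding machines.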

Do companies that possess large, modified, and curated databases enjoy an unfair advantage over entrepreneurs? Hennessy noted that huge troves of public data are available, but this also raises issues. For instance, what is the role of copyright if models are trained on data (or books, art, code) that is someone’s intellectual property? Should people or companies who generate output based on these inputs have to pay for their use? How would they know whose original compositions contributed to their derivative work? Who should benefit? What’s a fair licensing fee?

“There’s no clear linkage between the data the model was trained on and when it gets used in an inference problem,” Hennessy noted. “We’ve got a bunch of societal issues we have to figure out. The availability of data will be crucial, but the key is going to be the quality of the data as much as the quantity.”

As the conversation shifted to connectivity, Hennessy expressed optimism that the age of optical conversion is drawing close. Short-haul applications can use optical links to increase bandwidth, despite the cost of electrical-to-optical conversion.

“That’s an area where, if we had a real technology breakthrough that did some kind of hybrid solution, so we didn’t have to cross a package boundary to go from electrical to optical, we could really get significant enhancements in performance per power,” he predicted. “We have become a wireless world and we’ll see continued progress in mobile connectivity despite the challenges in terms of distance and coverage.”

Sauvage’s final question asked when the singularity will be achieved. Hennessy said he believes large language models have already passed the Turing test for non-experts. These models today can convince basic computer users that they are human.

The singularity, on the other hand, implies that a model has reached a threshold level of general intelligence. Hennessy said he has cut his original 20- to 40-year estimate in half. While others say the singularity may be achieved in as little as five years, he said it will take only a decade until computer intelligence is indistinguishable from human intelligence, even among experts across multiple disciplines.

“That doesn’t mean that AI will start replicating itself, unplugging humans, and replacing them,” he said. “But Bard already can explain mathematics to the level of someone who is a graduate-level researcher — probably the 98th percentile of human cognition on the subject.”

The second annual TDK Ventures DX Week brought together some of the world’s brightest talent in the digital space. Held over three days in three global technology hubs — Silicon Valley, Tokyo, and Bengaluru — the panels, interviews, and lectures centered on the nexus between the analog and digital environments. The most prestigious thought-leadership event of the year, DX Week highlighted the insights, best practices, and visions that will guide digital technologies toward creating a more productive, inclusive, and sustainable planet.