NVIDIA CEO Jensen Huang's Vision for the Future

NVIDIA's key insight: about 10% of a program's code does 99% of the work, and that work can run in parallel. Its GPUs powered AlexNet's 2012 breakthrough that launched the modern AI era. Huang sees AI as human enhancement, not replacement. His success tip: stay curious and ask good questions!


Disclaimer:

This blog post was auto-generated by Accunote AI, which aims to make audio knowledge sharing more accessible. While we strive for accuracy, please note that AI-generated content may contain minor inaccuracies. The original ideas and insights belong to their creator: Cleo Abram.



Q1: What was the key insight that led to NVIDIA's fundamental shift in computing, and how did it evolve from video games to the current AI revolution?
- The key insight, in the early 1990s, was that in a typical software program about 10% of the code does 99% of the processing, and that 99% can be done in parallel; the other 90% of the code has to run sequentially. NVIDIA realized that the perfect computer is one that can do both sequential and parallel processing, not just one or the other. That observation led the founders to build a company around computing problems that normal computers couldn't handle, and it was the beginning of NVIDIA. The company focused initially on video games, both because 3D graphics required parallel processing and because the team loved the application of simulating virtual worlds. Video games also had the potential to become the largest entertainment market ever, which proved to be true. A large market was crucial: the technology was complex, and only a substantial R&D budget could keep creating it.
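
A back-of-the-envelope illustration of why that split matters (our arithmetic, following what is known as Amdahl's law, not a calculation from the interview): if a fraction p of the work can be spread across N parallel processors, the overall speedup is

    speedup(N) = 1 / ((1 - p) + p / N)

With p = 0.99, a hundred parallel processors already give roughly a 50x speedup, and the ceiling as N grows is 100x, which is why pairing a sequential processor with a massively parallel one wins so decisively.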


Q2: How did NVIDIA's approach to parallel processing in GPUs differ from traditional CPUs, and why was this significant?
- NVIDIA's GPUs (Graphics Processing Units) differ from traditional CPUs (Central Processing Units) in their ability to handle many tasks simultaneously. The difference is well illustrated by a 15-year-old video on the NVIDIA YouTube channel featuring the Mythbusters. In the video, a small robot shooting paintballs one by one demonstrates sequential processing on a CPU, which solves problems one at a time; a huge robot that fires all its paintballs at once represents parallel processing on a GPU, which handles many smaller problems simultaneously.


Q3: Why did NVIDIA choose to focus on video games initially, and how did this decision contribute to the company's success?
- NVIDIA chose to focus on video games initially for several reasons. Firstly, the team loved the application of video games, which involves simulating virtual worlds. Secondly, they had the insight that video games had the potential to become the largest market for entertainment ever, which turned out to be true. This large market was crucial because the technology NVIDIA was developing was complicated and expensive. Having a large market meant that NVIDIA could have a substantial R&D budget, allowing them to create new technology and continually innovate. The focus on video games provided the financial foundation and technological playground for NVIDIA to develop and refine their parallel processing capabilities, which would later prove instrumental in other fields, including AI and scientific computing.


Q4: Could you explain the concept of GPU as a "time machine" and its impact on various industries?
- A GPU is like a time machine because it lets you see the future sooner. A quantum chemistry scientist once told the interviewee that because of NVIDIA's work, he could complete his life's work within his lifetime. That is time travel in the sense that GPUs make applications run so much faster that researchers can accomplish work that would otherwise have outlasted them. In weather prediction, users are quite literally seeing the future; simulating a virtual city with virtual traffic and testing a self-driving car in that environment is another form of time travel. Parallel processing, which took off in gaming, made it possible to build complex worlds inside computers that were previously impossible, changing how we perceive what computers can do. In the early 2000s, researchers had to trick GPUs into thinking their problems were graphics problems in order to tap this power. That led to the creation of CUDA, a platform that lets programmers instruct GPUs using familiar programming languages like C, putting this computing power within reach of a much broader range of users.
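
To make that concrete, here is a minimal sketch of the sequential-versus-parallel contrast in CUDA C. The vector-addition task, the function names, and the launch configuration are our illustration, not anything shown in the interview:

```
#include <cstdio>

// GPU: every thread handles one element, all "paintballs" fired at once.
__global__ void add_parallel(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's element
    if (i < n) c[i] = a[i] + b[i];
}

// CPU: one element at a time, one "paintball" per shot.
void add_sequential(const float* a, const float* b, float* c, int n) {
    for (int i = 0; i < n; ++i) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;               // about one million elements
    const size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    // Unified memory keeps the sketch short; production code often
    // manages host and device buffers explicitly instead.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    const int threads = 256;             // threads per block
    const int blocks = (n + threads - 1) / threads;
    add_parallel<<<blocks, threads>>>(a, b, c, n);  // launch ~1M GPU threads
    cudaDeviceSynchronize();             // wait for the GPU to finish

    printf("c[0] = %.1f\n", c[0]);       // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The sequential loop is the Mythbusters' small robot; the kernel launch is the big one. Compiled with nvcc, the same C-like code that once had to be disguised as a graphics problem runs directly on the GPU.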


Q5: What was the vision behind creating CUDA, and what factors contributed to its development?
- The vision behind CUDA emerged from a combination of researchers' discoveries, internal inspiration, and problem-solving needs. Some of the first external ideas for using GPUs for parallel processing came from interesting work in medical imaging, where researchers at Massachusetts General Hospital were using NVIDIA's graphics processors for CT reconstruction. This external application inspired the company. Internally, NVIDIA was trying to solve the problem of creating dynamic and beautiful virtual worlds for video games, which required particle physics and fluid dynamics simulations. These simulations were challenging to implement with a pipeline designed only for computer graphics. Additionally, researchers were experimenting with using GPUs for general-purpose acceleration. When the time came to create something more structured, CUDA was developed as a result of these converging factors. The interviewee was confident in CUDA's potential success because NVIDIA's GPUs were set to become the highest volume parallel processors built in the world, driven by the large video game market. This architecture had a good chance of reaching many people, making it an optimistic and potentially transformative project.


Q6: How did the development of GPUs and CUDA contribute to the advancement of AI, particularly in image recognition?
- The development of GPUs and CUDA played a crucial role in advancing AI, particularly in image recognition. In 2012, a group of three researchers submitted an entry to the ImageNet challenge, a famous competition aimed at creating computer systems that could recognize and categorize images. Their entry, a neural network called AlexNet, significantly outperformed the competition, achieving a far lower error rate on image recognition tasks. A key reason for its exceptional performance was the vast amount of training data it used, processed on NVIDIA GPUs. The breakthrough showed that GPUs were not just tools for making computers faster and more efficient: they were becoming the engines of a new computing paradigm, one that shifted from instructing computers with step-by-step directions to training them on huge numbers of examples. The 2012 AlexNet moment marked the beginning of the seismic shift in AI we are witnessing today, with GPUs enabling ever more complex and powerful AI systems.


Q7: Could you describe what that moment was like from your perspective, and what did you see it would mean for all of our futures?
- When creating something new like CUDA, there is always the cynical view that "if you build it, they might not come." The optimist's view, which is usually how we look at the world, is that "if you don't build it, they can't come." You have to reason intuitively about why it would be very useful. In 2012, Ilya Sutskever, Alex Krizhevsky, and Geoffrey Hinton at the University of Toronto reached for a GeForce GTX 580 because they had learned about CUDA and its potential as a parallel processor for training AlexNet. Our bet that GeForce could be the vehicle to bring this parallel architecture into the world, and that researchers would find it someday, turned out to be a good strategy: a strategy of hope, but reasoned hope. At the same time, we were trying to solve the computer vision problem inside NVIDIA and were frustrated by our early internal attempts. So when we saw AlexNet, a new algorithm completely different from previous computer vision algorithms, take a giant leap in capability, we were intensely interested and inspired. The big breakthrough came when we asked ourselves: if AlexNet could do this for computer vision, how far could it go? We reasoned, rightly, that if the machine learning and deep learning architecture could scale, the vast majority of machine learning problems could be represented with deep neural networks, and the range of problems machine learning could then solve was so vast that it had the potential to reshape the computer industry altogether. That prompted us to reengineer the entire computing stack, which is where DGX came from. All of it flowed from the observation that we ought to reinvent the computing stack layer by layer, and now, 65 years after the IBM System/360 introduced modern general-purpose computing, we have reinvented computing as we know it.


Q8: How would you summarize those core beliefs about the way computers should work and what they can do for us that keeps you not only coming through that decade but also doing what you're doing now, making bets for the next few decades?
- The first core belief was in accelerated computing: parallel computing alongside, rather than instead of, general-purpose computing. We would put the two kinds of processors together and do accelerated computing, and NVIDIA continues to believe that today. The second was the recognition that deep neural networks, the DNNs that came to public attention in 2012, can learn patterns and relationships from many different types of data. They can learn more and more nuanced features as they grow, and it is easy to make them larger by making them deeper and wider. So the scalability of the architecture is empirically true, and so is the fact that larger models trained on more data learn more knowledge. Unless there is a physical limit, an architectural limit, or a mathematical limit, and none was ever found, we believed you could keep scaling. Then the question becomes: what can you learn from data, and what can you learn from experience? Data is basically a digital version of human experience.
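
As a gloss we are adding (not a claim from the interview): this empirical scalability has since been quantified in neural scaling-law studies such as the Chinchilla work, where held-out loss falls predictably as a power law in parameter count N and training tokens D, roughly

    L(N, D) ≈ E + A / N^α + B / D^β

with fitted constants. The practical reading matches the belief described here: absent a hard physical, architectural, or mathematical limit, more parameters and more data reliably buy more capability.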


Q9: We've now demonstrated that AI, or deep learning, has the ability to learn almost any modality of data, and it can translate to any modality of data. What does that mean?
- You can go from text to text: summarizing a paragraph, or translating from one language to another. You can go from text to images, which is image generation, and from images to text, which is captioning. You can even go from amino acid sequences to protein structures, and in the future from a protein to words, or from an example of a protein with certain properties to a drug target. All of these problems are around the corner from being solved. You can go from words to video. You can learn object recognition from images, learn speech just by listening to sound, and even learn languages, vocabulary, syntax, and grammar just by studying a huge number of letters and words.


Q10: Why can't you go from words to action tokens for a robot? From the computer's perspective, how is it any different?
- The transition from words to action tokens for robots presents unique challenges compared to text-based language models. While computers process both as data, the fundamental difference lies in the physical world interaction required for robots. Robots need to understand and interact with the real world, which involves complex factors like physics, spatial awareness, and real-time decision-making. This opens up a universe of opportunities and problems to solve, which is quite exciting. We are on the cusp of an enormous change, and it's becoming increasingly difficult to predict how we will be using the technology being developed in the next 10 years.


Q11: How do you see the application of AI evolving in the next decade compared to the last?
- The last 10 years were primarily focused on the fundamental science of AI. The next 10 years will be about the application science of AI, while still advancing the fundamental science. This shift toward applied research will explore how AI can be integrated into fields such as digital biology, climate technology, agriculture, fisheries, robotics, transportation, logistics optimization, and even podcasting. The focus will be on practical implementations and real-world applications of AI across diverse sectors.


Q12: Can you explain the concept of physical AI and how NVIDIA is contributing to its development?
- Physical AI, or robots, encompasses a wide range of applications from humanoid robots to self-driving cars, smart buildings, autonomous warehouses, and autonomous lawnmowers. NVIDIA is developing tools to accelerate the training of these robotic systems. Traditionally, robots were trained in the real world, which was time-consuming and potentially damaging, or through limited data sources like motion capture suits. NVIDIA's Omniverse provides 3D digital worlds for training robotic systems, allowing for more repetitions, diverse conditions, and faster learning without physical world limitations. The recently announced Cosmos enhances this capability by making the 3D universe more realistic, providing various lighting conditions, times of day, and experiences for more comprehensive robot training.


Q13: How does NVIDIA's approach to robotics compare to language models like ChatGPT?
- NVIDIA's approach to robotics draws parallels with language models like ChatGPT. Just as ChatGPT generates text from prompts, NVIDIA is developing a world foundation model for robotics. This model, called Cosmos, acts as a language model of the world, encoding physical common sense such as gravity, friction, inertia, geometric and spatial awareness, and object permanence. And just as language models can be grounded with context from PDFs or search results, Cosmos is grounded with physical simulations provided by Omniverse, whose Newtonian physics simulation supplies a ground truth for the AI. Together, Omniverse and Cosmos can generate an essentially infinite number of physically grounded scenarios for robot training, much as language models can generate a vast array of text-based responses.


Q14: How does this new approach to robot training compare to traditional methods?
- This new approach to robot training offers significant advantages over traditional methods. For instance, in a factory setting, instead of manually guiding a robot through various routes, which could take days and cause wear and tear, we can now simulate all possible routes digitally in a fraction of the time. Moreover, these simulations can include a wide range of situations the robot might face, such as different lighting conditions or obstacles. This allows for much faster and more comprehensive learning without the physical limitations and risks associated with real-world training. The ability to rapidly simulate diverse scenarios in the digital world is transforming how robots are trained, potentially leading to more capable and adaptable robotic systems in the future.


Q15: How do you see people interacting with AI and robotics technology in the near future?
- The interviewee believes that everything that moves will be robotic someday, and soon. Manually pushing a lawnmower is already becoming obsolete, except for those who do it for fun. Every car is going to be robotic. Human-like robots are just around the corner; they will learn how to be robots in Omniverse and Cosmos, which will generate physically plausible futures for them to learn from, and then come into the physical world to apply what they've learned. A future where we're surrounded by robots is certain. The interviewee is excited about having a personal R2-D2-like robot, though it might not take the exact form of the Star Wars character.


Q16: How do you envision the integration of AI assistants like your personal "R2" in daily life?
- The interviewee envisions that their personal AI assistant, referred to as "R2", will be omnipresent: accessible through smart glasses, phones, PCs, and cars, with perhaps a physical version at home that interacts with the digital one. The idea is that people will have their own AI assistant for their entire life, growing and evolving with them. This future is now considered a certainty.


Q17: What are the main areas of concern or challenges you foresee with the advancement of AI and robotics?
- The interviewee outlines several areas of concern: 1) bias, toxicity, or hallucination in AI-generated content, which could lead to reliance on inaccurate information; 2) generation of fake information, including fake news or images; 3) impersonation, where an AI convincingly pretends to be a specific human; 4) AI safety issues that require deep research and engineering, such as a system that intends to do the right thing but fails in execution (e.g., a self-driving car erring because of a sensor failure); and 5) system failures where the AI intends to prevent something but cannot because of a hardware malfunction. To address these challenges, the interviewee suggests redundancy systems similar to those in aviation, with multiple layers of safety checks and controls. AI safety systems need to be architected as a community to ensure proper functioning, prevent harm, and maintain overall safety and security.


Q18: How has the shift from CPU-based sequential processing to parallel processing changed the landscape of AI development?
- The interviewee notes that we have now overcome many of the technological limits associated with CPUs and sequential processing. The shift to parallel processing has not only provided a new way of computing but also opened new avenues for continuous improvement, because parallel processing scales under different physical principles than those that governed improvements in CPU technology, unlocking new possibilities for AI advancement.


Q19: What are the scientific or technological limitations that we face now in the current world that you're thinking a lot about?
- Everything ultimately comes down to how much work can be accomplished within the constraints of available energy, a limit set by the laws of physics, particularly for transporting and manipulating information and bits. The energy these processes require bounds what we can achieve, and the amount of energy available constrains it further. That said, we are far from any fundamental limit that would stop us from advancing, and in the meantime our focus is on building better, more energy-efficient computers. For instance, the first version of our DGX-1, delivered to OpenAI in 2016, cost $250,000 and required 10,000 times more power than the current prototype version: a 10,000-fold increase in the energy efficiency of computing in just 8 years. To put that into perspective, a similar gain elsewhere would let a 100-watt light bulb produce the same illumination on 10,000 times less energy. This kind of advance in energy efficiency, particularly for AI computing, is essential as we aim to create more intelligent systems and apply more computation to enhance our capabilities. Improving energy efficiency so we can accomplish more work remains our top priority.
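
To unpack that figure (our arithmetic, not from the interview): a 10,000x efficiency gain over 8 years compounds to

    10,000^(1/8) = 10^(1/2) ≈ 3.16x per year

which far outpaces the classic Moore's-law pace of doubling every two years; that pace would deliver only about 16x over the same 8 years.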


Q20: As applications of technology get more specific, such as transformers in AI, how do you make decisions between designing hardware optimized for specific tasks versus maintaining more general-purpose capabilities?
- This question touches on a core aspect of our decision-making process. Our fundamental belief is that transformers are not the final AI algorithm or architecture that researchers will ever discover. Instead, we view them as a stepping stone towards future evolutions that may be barely recognizable as transformers years from now. This perspective is grounded in historical precedent - throughout the history of computer algorithms, software, engineering, and innovation, no single idea has remained dominant for an extended period. The beauty of computers lies in their ability to perform tasks that were unimaginable just a decade ago. If we had converted computers from ten years ago into specialized, single-purpose devices, we would have limited the potential for new applications and innovations. We believe in the richness of innovation and invention, and we aim to create an architecture that allows inventors, innovators, software programmers, and AI researchers to explore and develop amazing ideas.


Q21: How do you approach the evolution of AI architectures, particularly in relation to transformers and attention mechanisms?
- The fundamental characteristic of a transformer is its attention mechanism, which aims to understand the meaning and relevance of every word in relation to every other word. However, as the context window expands to hundreds of thousands or even millions of tokens, processing all possible relationships becomes computationally infeasible. This challenge has sparked numerous innovations in attention mechanisms, such as flash attention, hierarchical attention, and wave attention. The number of different types of attention mechanisms invented since the introduction of transformers is quite extraordinary. We believe this trend of innovation will continue, and we maintain that computer science and AI research are far from reaching their endpoints. Our philosophy is to design computers that enable flexibility in research and innovation, allowing for the exploration and implementation of new ideas. This approach of maintaining adaptability and supporting ongoing innovation is fundamentally the most important aspect of our strategy in chip design and development.
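
For readers who want the formula behind that statement (the standard transformer formulation, not quoted from the interview): attention computes, for query, key, and value matrices Q, K, and V,

    Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V

and the Q K^T term holds one score for every pair of tokens, so the cost grows with the square of the context length: a million-token window implies on the order of 10^12 pairwise scores per layer. That quadratic blow-up is exactly what variants like flash attention attack, by reorganizing the computation so the full score matrix is never materialized at once.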


Q22: How do you approach designing tools in the context of current physical limitations? What factors do you consider when pushing technological boundaries?
- The approach at NVIDIA involves developing deep expertise in areas where we outsource manufacturing, such as semiconductor physics for chips made by TSMC. We have experts who understand the limits of current semiconductor physics, allowing us to work closely with manufacturers to push these boundaries. We apply the same principle to system engineering and cooling systems. For instance, plumbing is crucial for liquid cooling, and fans are essential for air cooling. We employ aerodynamics engineers to design fans that maximize airflow while minimizing noise. Although we don't manufacture these components, we design them with deep expertise in how they're made, which enables us to push technological limits.


Q23: NVIDIA has made several successful big bets on future technologies. What are some of the current and upcoming areas where you expect to see significant developments?
- We anticipate being correct about robotics and the Omniverse. Our latest bet, introduced at CES, is the fusion of Omniverse and Cosmos, creating a new type of generative world generation system, a multiverse generation system. This development is expected to be profoundly important for the future of robotics and physical systems. In humanoid robotics, we're developing tooling systems, training systems, and human demonstration systems, with significant advancements expected in the next 5 years. We're also working on digital biology to understand the language of molecules and cells, aiming to create a digital twin of the human body. This work has profound implications. Our goal is to create "time machines" in various areas that allow us to see and predict the future, enabling us to optimize for the best possible outcomes.


Q24: How would you advise people to prepare for the future given the rapid advancements in technology and computing?
- One way to reason about the future we're creating is to consider how it might impact the importance of your work versus the effort required to do it. Imagine if tasks that currently take a week could be completed almost instantaneously, with the drudgery reduced to zero. This scenario is comparable to the changes brought about by the introduction of highways during the last industrial revolution. That development led to the creation of suburbs, improved distribution of goods, and the emergence of new economies centered around travel infrastructure like gas stations, fast food restaurants, and motels.


Q25: How might widespread adoption of video conferencing technology impact work and living patterns?
- The widespread adoption of effective video conferencing technology could make it acceptable to work much further away from the office. This shift could lead to people choosing to live farther from their workplaces, potentially reshaping residential patterns and commuting norms. Such changes prompt us to consider broader implications for urban planning, real estate, and work-life balance in the future.


Q26: What would happen if a software programmer was available to you at all times, capable of writing any software you could dream up? How would that change your life and opportunities?
- If I had a seed of an idea and could quickly have a prototype or production version put in front of me, it would significantly change my life and opportunities. I believe that in the next decade, intelligence for certain tasks will become superhuman. I can relate to this feeling as I'm surrounded by superhuman people who are the best in the world at what they do, performing their tasks far better than I can. Despite being surrounded by thousands of such individuals, it has never made me feel unnecessary. Instead, it empowers me and gives me the confidence to tackle more ambitious challenges.


Q27: Suppose everyone is surrounded by these super AIs that are very good at specific things. How would that make people feel?
- It's going to empower you and make you feel confident. I'm pretty sure you probably use ChatGPT and AI, and I feel more empowered and confident to learn something new today. The barriers to understanding knowledge in almost any particular field have been reduced. It's like having a personal tutor with you all the time. I think that feeling should be universal. If there's one thing I would encourage everybody to do, it's to get yourself an AI tutor right away. This AI tutor could teach you anything you like, help you program, write, analyze, think, and reason. All of these capabilities are going to make you feel empowered, and I think that's going to be our future. We're going to become superhumans, not because we have superpowers, but because we have super AIs.


Q28: Could you tell us a little bit about each of these objects? Did you touch them?
- The GeForce card is essentially a supercomputer that you put into your PC. We built it for gaming, but people today also use it for design, creative arts, and amazing AI work. GeForce enabled AI; it enabled Geoffrey Hinton, Ilya Sutskever, and Alex Krizhevsky to train AlexNet. We discovered AI with it, and then we advanced AI with it. The real breakthrough now is this: of the 8 million or so pixels on a 4K display, we compute and process only about 500,000. The rest, the AI predicts. We inform it with the 500,000 pixels we actually computed and ray traced, which are beautiful and perfect, and we ask the AI: if these are the 500,000 perfect pixels on the screen, what are the other 8 million? It fills in the rest, and the image is perfect. Because you only have to compute a fraction of the pixels, you can invest far more in each one, so the quality is better and the AI's extrapolation works better. Whatever computing resources and attention you have, you can concentrate them into those 500,000 pixels.
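
The pixel figures are easy to sanity-check (our arithmetic, not from the interview): a 4K display has 3840 x 2160 ≈ 8.3 million pixels, so computing 500,000 of them means fully rendering roughly 1 pixel in 16 and letting the AI infer the other 15. That 16-to-1 ratio is why the budget per computed pixel, and therefore its quality, can be so much higher.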


Q29: What do you feel is important for this audience to know that I haven't asked?
- The most important piece of advice I can give: if I were a student today, the first thing I would do is learn AI.


Q30: How do I learn to interact with ChatGPT, Gemini Pro, and Grok?
- Learning how to interact with AI is much like learning to be someone who is really good at asking questions. You're incredibly good at asking questions, and prompting an AI is very, very similar.


Q31: How can AI be used to enhance professional performance across various fields?
- The interviewee suggests that regardless of field or profession, individuals should ask themselves how they can use AI to do their job better. This applies to lawyers, doctors, chemists, biologists, and professionals in any field, and it parallels how the interviewee's generation was the first to ask how computers could enhance job performance. The current generation doesn't need to ask how to use computers; it should focus on leveraging AI. AI has lowered the barriers to understanding, knowledge, and intelligence, making information easier to access than traditional research methods ever did. Everyone should try using AI: it is far more user-friendly than learning to use a computer was, and a system like ChatGPT can even guide you on how to use it effectively, making people "superhuman" in the process.


Q32: What impact does NVIDIA aim to achieve, and how does the company view its role in technological advancement?
- The interviewee expresses that they want NVIDIA to be remembered for making an extraordinary impact. They believe that due to core beliefs established long ago and consistently built upon, NVIDIA has become one of the most important and consequential technology companies in the world. The company takes this responsibility seriously, working to make their capabilities available to both large companies and individual researchers and developers across all fields of science, regardless of profitability, size, or fame. The interviewee hopes that in the future, people will recognize NVIDIA's transformative impact on various fields. They envision NVIDIA being at the epicenter of revolutions in digital biology, life sciences, material sciences, robotics, and autonomous driving. The interviewee wants the next generation to associate NVIDIA not only with gaming technology but also with these wide-ranging technological advancements that have improved various aspects of life and work.
