Nvidia started as a humble graphics card maker. Now it’s riding the tech industry’s AI obsession to absurd new heights. The company added $329 billion to its market cap on Wall Street today after a record-breaking day of stock trading, Bloomberg reports.Read more...
Nvidia started as a humble graphics card maker. Now it’s riding the tech industry’s AI obsession to absurd new heights. The company added $329 billion to its market cap on Wall Street today after a record-breaking day of stock trading, Bloomberg reports.
Each year, the AI Index lands on virtual desks with a louder virtual thud—this year, its 393 pages are a testament to the fact that AI is coming off a really big year in 2023. For the past three years, IEEE Spectrum has read the whole damn thing and pulled out a selection of charts that sum up the current state of AI (see our coverage from 2021, 2022, and 2023). This year’s report, published by the Stanford Institute for Human-Centered Artificial Intelligence (HAI), has an expanded chapter on re
Each year, the AI Index lands on virtual desks with a louder virtual thud—this year, its 393 pages are a testament to the fact that AI is coming off a really big year in 2023. For the past three years, IEEE Spectrum has read the whole damn thing and pulled out a selection of charts that sum up the current state of AI (see our coverage from 2021, 2022, and 2023).
This year’s report, published by the Stanford Institute for Human-Centered Artificial Intelligence (HAI), has an expanded chapter on responsible AI and new chapters on AI in science and medicine, as well as its usual roundups of R&D, technical performance, the economy, education, policy and governance, diversity, and public opinion. This year is also the first time that Spectrum has figured into the report, with a citation of an article published here about generative AI’s visual plagiarism problem.
1. Generative AI investment skyrockets
While corporate investment was down overall last year, investment in generative AI went through the roof. Nestor Maslej, editor-in-chief of this year’s report, tells Spectrum that the boom is indicative of a broader trend in 2023, as the world grappled with the new capabilities and risks of generative AI systems like ChatGPT and the image-generating DALL-E 2. “The story in the last year has been about people responding [to generative AI],” says Maslej, “whether it’s in policy, whether it’s in public opinion, or whether it’s in industry with a lot more investment.” Another chart in the report shows that most of that private investment in generative AI is happening in the United States.
2. Google is dominating the foundation model race
Foundation models are big multipurpose models—for example, OpenAI’s GPT-3 and GPT-4 are the foundation model that enable ChatGPT users to write code or Shakespearean sonnets. Since training these models typically requires vast resources, Industry now makes most of them, with academia only putting out a few. Companies release foundation models both to push the state-of-the-art forward and to give developers a foundation on which to build products and services. Google released the most in 2023.
3. Closed models outperform open ones
One of the hot debates in AI right now is whether foundation models should be open or closed, with some arguing passionately that open models are dangerous and others maintaining that open models drive innovation. The AI Index doesn’t wade into that debate, but instead looks at trends such as how many open and closed models have been released (another chart, not included here, shows that of the 149 foundation models released in 2023, 98 were open, 23 gave partial access through an API, and 28 were closed).
The chart above reveals another aspect: Closed models outperform open ones on a host of commonly used benchmarks. Maslej says the debate about open versus closed “usually centers around risk concerns, but there’s less discussion about whether there are meaningful performance trade-offs.”
4. Foundation models have gotten super expensive
Here’s why industry is dominating the foundation model scene: Training a big one takes very deep pockets. But exactly how deep? AI companies rarely reveal the expenses involved in training their models, but the AI Index went beyond the typical speculation by collaborating with the AI research organization Epoch AI. To come up with their cost estimates, the report explains, the Epoch team “analyzed training duration, as well as the type, quantity, and utilization rate of the training hardware” using information gleaned from publications, press releases, and technical reports.
It’s interesting to note that Google’s 2017 transformer model, which introduced the architecture that underpins almost all of today’s large language models, was trained for only US $930.
5. And they have a hefty carbon footprint
The AI Index team also estimated the carbon footprint of certain large language models. The report notes that the variance between models is due to factors including model size, data center energy efficiency, and the carbon intensity of energy grids. Another chart in the report (not included here) shows a first guess at emissions related to inference—when a model is doing the work it was trained for—and calls for more disclosures on this topic. As the report notes: “While the per-query emissions of inference may be relatively low, the total impact can surpass that of training when models are queried thousands, if not millions, of times daily.”
6. The United States leads in foundation models
While Maslej says the report isn’t trying to “declare a winner to this race,” he does note that the United States is leading in several categories, including number of foundation models released (above) and number of AI systems deemed significant technical advances. However, he notes that China leads in other categories including AI patents granted and installation of industrial robots.
7. Industry calls new PhDs
This one is hardly a surprise, given the previously discussed data about industry getting lots of investment for generative AI and releasing lots of exciting models. In 2022 (the most recent year for which the Index has data), 70 precent of new AI PhDs in North America took jobs in industry. It’s a continuation of a trend that’s been playing out over the last few years.
8. Some progress on diversity
For years, there’s been little progress on making AI less white and less male. But this year’s report offers a few hopeful signs. For example, the number of non-white and female students taking the AP computer science exam is on the rise. The graph above shows the trends for ethnicity, while another graph, not included here, shows that 30 percent of the students taking the exam are now girls.
Another graph in the report shows that at the undergraduate level, there’s also a positive trend in increasing ethnic diversity among North American students earning bachelor degrees in computer science, although the number of women earning CS bachelor degrees has barely budged over the last five years. Says Maslej, “it’s important to know that there’s still a lot of work to be done here.”
9. Chatter in earnings calls
Businesses are awake to the possibilities of AI. The Index got data about Fortune 500 companies’ earnings calls from Quid, a market intelligence firm that used natural language processing tools to scan for all mentions of “artificial intelligence,” “AI,” “machine learning,” “ML,” and “deep learning.” Nearly 80 percent of the companies included discussion of AI in their calls. “I think there’s a fear in business leaders that if they don’t use this technology, they’re going to miss out,” Maslej says.
And while some of that chatter is likely just CEOs bandying about buzzwords, another graph in the report shows that 55 percent of companies included in a McKinsey survey have implemented AI in at least one business unit.
10. Costs go down, revenues go up
And here’s why AI isn’t just a corporate buzzword: The same McKinsey survey showed that the integration of AI has caused companies’ costs to go down and their revenues go up. Overall, 42 percent of respondents said they’d seen reduced costs, and 59 percent claimed increased revenue.
Other charts in the report suggest that this impact on the bottom line reflects efficiency gains and better worker productivity. In 2023, a number of studies in different fields showed that AI enabled workers to complete tasks more quickly and produce better quality work. One study looked at coders using Copilot, while others looked at consultants, call center agents, and law students. “These studies also show that although every worker benefits, AI helps lower-skilled workers more than it does high-skilled workers,” says Maslej.
11. Corporations do perceive risks
This year, the AI Index team ran a global survey of 1,000 corporations with revenues of at least $500 million to understand how businesses are thinking about responsible AI. The results showed that privacy and data governance is perceived as the greatest risk across the globe, while fairness (often discussed in terms of algorithmic bias) still hasn’t registered with most companies. Another chart in the report shows that companies are taking action on their perceived risks: The majority of organizations across regions have implemented at least one responsible AI measure in response to relevant risks.
12. AI can’t beat humans at everything... yet
In recent years, AI systems have outperformed humans on a range of tasks, including reading comprehension and visual reasoning, and Maslej notes that the pace of AI performance improvement has also picked up. “A decade ago, with a benchmark like ImageNet, you could rely on that to challenge AI researchers for for five or six years,” he says. “Now, a new benchmark is introduced for competition-level mathematics and the AI starts at 30 percent, and then in a year it gets to 90 percent.” While there are still complex cognitive tasks where humans outperform AI systems, let’s check in next year to see how that’s going.
13. Developing norms of AI responsibility
When an AI company is preparing to release a big model, it’s standard practice to test it against popular benchmarks in the field, thus giving the AI community a sense of how models stack up against each other in terms of technical performance. However, it has been less common to test models against responsible AI benchmarks that assess such things as toxic language output (RealToxicityPrompts and ToxiGen), harmful bias in responses (BOLD and BBQ), and a model’s degree of truthfulness (TruthfulQA). That’s starting to change, as there’s a growing sense that checking one’s model against theses benchmarks is, well, the responsible thing to do. However, another chart in the report shows that consistency is lacking: Developers are testing their models against different benchmarks, making comparisons harder.
14. Laws both boost and constrain AI
Between 2016 and 2023, the AI Index found that 33 countries had passed at least one law related to AI, with most of the action occurring in the United States and Europe; in total, 148 AI-related bills have been passed in that timeframe. The Index researchers also classified bills as either expansive laws that aim to enhance a country’s AI capabilities or restrictive laws that place limits on AI applications and usage. While many bills continue to boost AI, the researchers found a global trend toward restrictive legislation.
15. AI makes people nervous
The Index’s public opinion data comes from a global survey on attitudes toward AI, with responses from 22,816 adults (ages 16 to 74) in 31 countries. More than half of respondents said that AI makes them nervous, up from 39 percent the year before. And two-thirds of people now expect AI to profoundly change their daily lives in the next few years.
Maslej notes that other charts in the index show significant differences in opinion among different demographics, with young people being more inclined toward an optimistic view of how AI will change their lives. Interestingly, “a lot of this kind of AI pessimism comes from Western, well-developed nations,” he says, while respondents in places like Indonesia and Thailand said they expect AI’s benefits to outweigh its harms.
Andrew Ng has serious street cred in artificial intelligence. He pioneered the use of graphics processing units (GPUs) to train deep learning models in the late 2000s with his students at Stanford University, cofounded Google Brain in 2011, and then served for three years as chief scientist for Baidu, where he helped build the Chinese tech giant’s AI group. So when he says he has identified the next big shift in artificial intelligence, people listen. And that’s what he told IEEE Spectrum in an
Andrew Ng has serious street cred in artificial intelligence. He pioneered the use of graphics processing units (GPUs) to train deep learning models in the late 2000s with his students at Stanford University, cofounded Google Brain in 2011, and then served for three years as chief scientist for Baidu, where he helped build the Chinese tech giant’s AI group. So when he says he has identified the next big shift in artificial intelligence, people listen. And that’s what he told IEEE Spectrum in an exclusive Q&A.
Ng’s current efforts are focused on his company
Landing AI, which built a platform called LandingLens to help manufacturers improve visual inspection with computer vision. He has also become something of an evangelist for what he calls the data-centric AI movement, which he says can yield “small data” solutions to big issues in AI, including model efficiency, accuracy, and bias.
The great advances in deep learning over the past decade or so have been powered by ever-bigger models crunching ever-bigger amounts of data. Some people argue that that’s an unsustainable trajectory. Do you agree that it can’t go on that way?
Andrew Ng: This is a big question. We’ve seen foundation models in NLP [natural language processing]. I’m excited about NLP models getting even bigger, and also about the potential of building foundation models in computer vision. I think there’s lots of signal to still be exploited in video: We have not been able to build foundation models yet for video because of compute bandwidth and the cost of processing video, as opposed to tokenized text. So I think that this engine of scaling up deep learning algorithms, which has been running for something like 15 years now, still has steam in it. Having said that, it only applies to certain problems, and there’s a set of other problems that need small data solutions.
When you say you want a foundation model for computer vision, what do you mean by that?
Ng: This is a term coined by Percy Liang and some of my friends at Stanford to refer to very large models, trained on very large data sets, that can be tuned for specific applications. For example, GPT-3 is an example of a foundation model [for NLP]. Foundation models offer a lot of promise as a new paradigm in developing machine learning applications, but also challenges in terms of making sure that they’re reasonably fair and free from bias, especially if many of us will be building on top of them.
What needs to happen for someone to build a foundation model for video?
Ng: I think there is a scalability problem. The compute power needed to process the large volume of images for video is significant, and I think that’s why foundation models have arisen first in NLP. Many researchers are working on this, and I think we’re seeing early signs of such models being developed in computer vision. But I’m confident that if a semiconductor maker gave us 10 times more processor power, we could easily find 10 times more video to build such models for vision.
Having said that, a lot of what’s happened over the past decade is that deep learning has happened in consumer-facing companies that have large user bases, sometimes billions of users, and therefore very large data sets. While that paradigm of machine learning has driven a lot of economic value in consumer software, I find that that recipe of scale doesn’t work for other industries.
It’s funny to hear you say that, because your early work was at a consumer-facing company with millions of users.
Ng: Over a decade ago, when I proposed starting the Google Brain project to use Google’s compute infrastructure to build very large neural networks, it was a controversial step. One very senior person pulled me aside and warned me that starting Google Brain would be bad for my career. I think he felt that the action couldn’t just be in scaling up, and that I should instead focus on architecture innovation.
“In many industries where giant data sets simply don’t exist, I think the focus has to shift from big data to good data. Having 50 thoughtfully engineered examples can be sufficient to explain to the neural network what you want it to learn.”
—Andrew Ng, CEO & Founder, Landing AI
I remember when my students and I published the first
NeurIPS workshop paper advocating using CUDA, a platform for processing on GPUs, for deep learning—a different senior person in AI sat me down and said, “CUDA is really complicated to program. As a programming paradigm, this seems like too much work.” I did manage to convince him; the other person I did not convince.
I expect they’re both convinced now.
Ng: I think so, yes.
Over the past year as I’ve been speaking to people about the data-centric AI movement, I’ve been getting flashbacks to when I was speaking to people about deep learning and scalability 10 or 15 years ago. In the past year, I’ve been getting the same mix of “there’s nothing new here” and “this seems like the wrong direction.”
How do you define data-centric AI, and why do you consider it a movement?
Ng: Data-centric AI is the discipline of systematically engineering the data needed to successfully build an AI system. For an AI system, you have to implement some algorithm, say a neural network, in code and then train it on your data set. The dominant paradigm over the last decade was to download the data set while you focus on improving the code. Thanks to that paradigm, over the last decade deep learning networks have improved significantly, to the point where for a lot of applications the code—the neural network architecture—is basically a solved problem. So for many practical applications, it’s now more productive to hold the neural network architecture fixed, and instead find ways to improve the data.
When I started speaking about this, there were many practitioners who, completely appropriately, raised their hands and said, “Yes, we’ve been doing this for 20 years.” This is the time to take the things that some individuals have been doing intuitively and make it a systematic engineering discipline.
The data-centric AI movement is much bigger than one company or group of researchers. My collaborators and I organized a
data-centric AI workshop at NeurIPS, and I was really delighted at the number of authors and presenters that showed up.
You often talk about companies or institutions that have only a small amount of data to work with. How can data-centric AI help them?
Ng: You hear a lot about vision systems built with millions of images—I once built a face recognition system using 350 million images. Architectures built for hundreds of millions of images don’t work with only 50 images. But it turns out, if you have 50 really good examples, you can build something valuable, like a defect-inspection system. In many industries where giant data sets simply don’t exist, I think the focus has to shift from big data to good data. Having 50 thoughtfully engineered examples can be sufficient to explain to the neural network what you want it to learn.
When you talk about training a model with just 50 images, does that really mean you’re taking an existing model that was trained on a very large data set and fine-tuning it? Or do you mean a brand new model that’s designed to learn only from that small data set?
Ng: Let me describe what Landing AI does. When doing visual inspection for manufacturers, we often use our own flavor of RetinaNet. It is a pretrained model. Having said that, the pretraining is a small piece of the puzzle. What’s a bigger piece of the puzzle is providing tools that enable the manufacturer to pick the right set of images [to use for fine-tuning] and label them in a consistent way. There’s a very practical problem we’ve seen spanning vision, NLP, and speech, where even human annotators don’t agree on the appropriate label. For big data applications, the common response has been: If the data is noisy, let’s just get a lot of data and the algorithm will average over it. But if you can develop tools that flag where the data’s inconsistent and give you a very targeted way to improve the consistency of the data, that turns out to be a more efficient way to get a high-performing system.
“Collecting more data often helps, but if you try to collect more data for everything, that can be a very expensive activity.”
—Andrew Ng
For example, if you have 10,000 images where 30 images are of one class, and those 30 images are labeled inconsistently, one of the things we do is build tools to draw your attention to the subset of data that’s inconsistent. So you can very quickly relabel those images to be more consistent, and this leads to improvement in performance.
Could this focus on high-quality data help with bias in data sets? If you’re able to curate the data more before training?
Ng: Very much so. Many researchers have pointed out that biased data is one factor among many leading to biased systems. There have been many thoughtful efforts to engineer the data. At the NeurIPS workshop, Olga Russakovsky gave a really nice talk on this. At the main NeurIPS conference, I also really enjoyed Mary Gray’s presentation, which touched on how data-centric AI is one piece of the solution, but not the entire solution. New tools like Datasheets for Datasets also seem like an important piece of the puzzle.
One of the powerful tools that data-centric AI gives us is the ability to engineer a subset of the data. Imagine training a machine-learning system and finding that its performance is okay for most of the data set, but its performance is biased for just a subset of the data. If you try to change the whole neural network architecture to improve the performance on just that subset, it’s quite difficult. But if you can engineer a subset of the data you can address the problem in a much more targeted way.
When you talk about engineering the data, what do you mean exactly?
Ng: In AI, data cleaning is important, but the way the data has been cleaned has often been in very manual ways. In computer vision, someone may visualize images through a Jupyter notebook and maybe spot the problem, and maybe fix it. But I’m excited about tools that allow you to have a very large data set, tools that draw your attention quickly and efficiently to the subset of data where, say, the labels are noisy. Or to quickly bring your attention to the one class among 100 classes where it would benefit you to collect more data. Collecting more data often helps, but if you try to collect more data for everything, that can be a very expensive activity.
For example, I once figured out that a speech-recognition system was performing poorly when there was car noise in the background. Knowing that allowed me to collect more data with car noise in the background, rather than trying to collect more data for everything, which would have been expensive and slow.
What about using synthetic data, is that often a good solution?
Ng: I think synthetic data is an important tool in the tool chest of data-centric AI. At the NeurIPS workshop, Anima Anandkumar gave a great talk that touched on synthetic data. I think there are important uses of synthetic data that go beyond just being a preprocessing step for increasing the data set for a learning algorithm. I’d love to see more tools to let developers use synthetic data generation as part of the closed loop of iterative machine learning development.
Do you mean that synthetic data would allow you to try the model on more data sets?
Ng: Not really. Here’s an example. Let’s say you’re trying to detect defects in a smartphone casing. There are many different types of defects on smartphones. It could be a scratch, a dent, pit marks, discoloration of the material, other types of blemishes. If you train the model and then find through error analysis that it’s doing well overall but it’s performing poorly on pit marks, then synthetic data generation allows you to address the problem in a more targeted way. You could generate more data just for the pit-mark category.
“In the consumer software Internet, we could train a handful of machine-learning models to serve a billion users. In manufacturing, you might have 10,000 manufacturers building 10,000 custom AI models.”
—Andrew Ng
Synthetic data generation is a very powerful tool, but there are many simpler tools that I will often try first. Such as data augmentation, improving labeling consistency, or just asking a factory to collect more data.
To make these issues more concrete, can you walk me through an example? When a company approaches Landing AI and says it has a problem with visual inspection, how do you onboard them and work toward deployment?
Ng: When a customer approaches us we usually have a conversation about their inspection problem and look at a few images to verify that the problem is feasible with computer vision. Assuming it is, we ask them to upload the data to the LandingLens platform. We often advise them on the methodology of data-centric AI and help them label the data.
One of the foci of Landing AI is to empower manufacturing companies to do the machine learning work themselves. A lot of our work is making sure the software is fast and easy to use. Through the iterative process of machine learning development, we advise customers on things like how to train models on the platform, when and how to improve the labeling of data so the performance of the model improves. Our training and software supports them all the way through deploying the trained model to an edge device in the factory.
How do you deal with changing needs? If products change or lighting conditions change in the factory, can the model keep up?
Ng: It varies by manufacturer. There is data drift in many contexts. But there are some manufacturers that have been running the same manufacturing line for 20 years now with few changes, so they don’t expect changes in the next five years. Those stable environments make things easier. For other manufacturers, we provide tools to flag when there’s a significant data-drift issue. I find it really important to empower manufacturing customers to correct data, retrain, and update the model. Because if something changes and it’s 3 a.m. in the United States, I want them to be able to adapt their learning algorithm right away to maintain operations.
In the consumer software Internet, we could train a handful of machine-learning models to serve a billion users. In manufacturing, you might have 10,000 manufacturers building 10,000 custom AI models. The challenge is, how do you do that without Landing AI having to hire 10,000 machine learning specialists?
So you’re saying that to make it scale, you have to empower customers to do a lot of the training and other work.
Ng: Yes, exactly! This is an industry-wide problem in AI, not just in manufacturing. Look at health care. Every hospital has its own slightly different format for electronic health records. How can every hospital train its own custom AI model? Expecting every hospital’s IT personnel to invent new neural-network architectures is unrealistic. The only way out of this dilemma is to build tools that empower the customers to build their own models by giving them tools to engineer the data and express their domain knowledge. That’s what Landing AI is executing in computer vision, and the field of AI needs other teams to execute this in other domains.
Is there anything else you think it’s important for people to understand about the work you’re doing or the data-centric AI movement?
Ng: In the last decade, the biggest shift in AI was a shift to deep learning. I think it’s quite possible that in this decade the biggest shift will be to data-centric AI. With the maturity of today’s neural network architectures, I think for a lot of the practical applications the bottleneck will be whether we can efficiently get the data we need to develop systems that work well. The data-centric AI movement has tremendous energy and momentum across the whole community. I hope more researchers and developers will jump in and work on it.