Storage

How will artificial intelligence change storage?

By Currie Munce - 2023-11-14

It is an exciting time to be working in storage. We are on the cusp of a disruptive change in the IT industry. It revolves around how artificial intelligence (AI) will change how we architect and build servers, and what we expect computers to do for us. Both the industry and the public are abuzz about generative AI. The arrival of ChatGPT earlier this year captured imaginations around how a computer could understand our natural-language questions, converse with us on any topic, and write poetry and rhymes like a person. Or the various image-generation AI models that can create stunning visual masterpieces based on simple text prompts given by the user.

The rapid emergence of AI is creating considerable demand for high-bandwidth memory (HBM). HBM solutions are now more sought after than gold. Large language models (LLMs) are driving demand for a larger memory footprint on the CPU to support even bigger, more complex models. While the importance of more memory bandwidth and capacity is well understood, the role storage plays in supporting the growth of AI is often forgotten.

What is the role or importance of storage in AI workloads?

Storage will play a critical role in two areas. One is local, high-speed storage that acts as a cache for feeding training data into the HBM on the GPU. The performance demands here call for high-performance SSDs. The other key role of storage is to hold all the training datasets in large data lakes.

Local cache drives

LLMs are trained on human-generated information found on the web, in books and related dictionaries. The I/O pattern to the training data on the local cache drive is structured and consists mainly of reading large data blocks to prefetch the next batch of data into memory. Hence, for traditional LLMs, the SSD's performance is not normally a bottleneck to GPU processing. Other AI/ML models, such as computer vision or mixed-mode LLM+CV, demand higher bandwidth and can challenge the local cache drive.

Graph Neural Networks (GNNs) are often used for product recommendation/deep learning recommendation models (DLRM), fraud detection and network intrusion detection. The DLRM is sometimes referred to as the largest revenue-generating algorithm on the internet. Training GNNs tends to access data more randomly and in smaller block sizes. They can truly challenge the performance of the local cache SSD and can lead to idling expensive GPUs. New SSD features are needed to relieve this performance bottleneck. Micron is actively working on solutions with industry leaders and is presenting some of this work at SC23 in Denver, where we will demonstrate ways for the GPU and SSD to interact to speed up some I/O-intensive processing times by up to 100x.
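The contrast between the two I/O patterns described above can be sketched with simple arithmetic: effective throughput is block size times I/O operations per second, so tiny random reads (such as GNN embedding lookups) deliver far less bandwidth than sequential large-block prefetch, even at very high IOPS. All drive figures below are illustrative assumptions, not specifications of any real SSD.

```python
# Back-of-the-envelope sketch: why small random reads can starve a GPU
# while large sequential prefetch does not. Numbers are assumptions.

def effective_bandwidth_mib_s(block_kib: float, iops: float) -> float:
    """Effective throughput = block size x I/O operations per second."""
    return block_kib * iops / 1024  # MiB/s

# Assumed cache SSD: ~6,800 MiB/s sequential reads in 1 MiB blocks,
# and ~1,000,000 IOPS for tiny (0.5 KiB) random embedding-row reads.
llm_prefetch = effective_bandwidth_mib_s(block_kib=1024, iops=6_800)
gnn_random = effective_bandwidth_mib_s(block_kib=0.5, iops=1_000_000)

print(f"Sequential 1 MiB prefetch: ~{llm_prefetch:,.0f} MiB/s")
print(f"Random 0.5 KiB reads:      ~{gnn_random:,.0f} MiB/s")
```

Under these assumptions the random pattern delivers well under a tenth of the sequential bandwidth, which is why GNN-style access can idle GPUs that sequential LLM prefetch keeps busy.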

AI data lakes

For large data lakes, large-capacity SSDs will become the storage media of preference. HDDs get cheaper ($/TB) as their capacity grows, but they also get slower (MB/s per TB). HDD capacities larger than 20TB will truly challenge the ability of large data lakes to power-efficiently source the type of bandwidth (TB/s) needed for large AI/ML GPU clusters. SSDs, on the other hand, have ample performance, and, in purpose-built forms, can deliver the required capacities at lower power (8x lower watts/TB) and even lower electrical energy (10x lower kWh/TB) than HDDs. These savings free up power budget in the data center to add more GPUs. Today, Micron is deploying its 32TB high-capacity data center SSD into numerous AI data lakes and object stores. Capacities for 15-watt SSDs that can individually deliver several GB/s of bandwidth will scale up to 250TB in the future.
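The bandwidth-density argument above can be made concrete with rough arithmetic: to hit a target aggregate bandwidth, you need many more HDDs than SSDs, and the HDD fleet burns far more power. All per-drive figures below are illustrative assumptions, not vendor specifications.

```python
# Rough arithmetic behind the bandwidth-density argument: how many
# drives, and how much power, to source a target aggregate bandwidth.
# All per-drive numbers are illustrative assumptions.
import math

def drives_needed(target_gb_s: float, per_drive_mb_s: float) -> int:
    """Drives required to sustain target_gb_s of aggregate read bandwidth."""
    return math.ceil(target_gb_s * 1000 / per_drive_mb_s)

TARGET_GB_S = 1000.0  # assume the GPU cluster needs ~1 TB/s of reads

# Assumed: a large-capacity HDD sustains ~280 MB/s; an SSD ~6,000 MB/s.
hdds = drives_needed(TARGET_GB_S, 280)
ssds = drives_needed(TARGET_GB_S, 6000)

# Assumed active power: ~10 W per HDD, ~15 W per SSD.
hdd_kw = hdds * 10 / 1000
ssd_kw = ssds * 15 / 1000
print(f"HDDs needed: {hdds}, drawing ~{hdd_kw:.1f} kW")
print(f"SSDs needed: {ssds}, drawing ~{ssd_kw:.1f} kW")
```

Under these assumptions the SSD fleet draws roughly an order of magnitude less power for the same bandwidth, which is the power budget the article says can be redirected to more GPUs.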

How will AI affect demand for NAND flash storage?

First, training any new AI/ML model requires data to "learn" from. IDC estimates that, starting in 2005, the amount of data generated every year exceeded the amount of storage purchased each year. This means some data must be ephemeral. Users must decide its value, and whether the value of keeping the data exceeds the cost of buying more storage to retain it.

Machines – cameras, sensors, IoT devices, jet-engine diagnostics, packet-routing information, swipes and clicks – now generate several orders of magnitude more data in a day than humans can. Machine-generated data that humans did not previously have the time or capacity to analyze can now be especially useful to AI/ML routines to extract useful and valuable information. The emergence of AI/ML should make this data more valuable to retain and hence grow the demand for storage.

This training data is stored in AI data lakes. These data lakes exhibit higher-than-normal access density to feed a growing number of GPUs per cluster while simultaneously supporting a high mixture of ingestion and preprocessing. There is also a lot of re-training on the data, such that there is often little "cold" data. This workload profile is better suited to high-capacity, power-efficient SSDs than to traditional HDD-based object stores. These data lakes can be quite large – hundreds of petabytes – for computer vision applications such as autonomous driving, or for DLRM. As these data lakes grow in capacity and number, they will present a significant growth opportunity for NAND flash SSDs.

As AI models evolve and scale, NAND flash storage will become increasingly critical to maintaining their exponential growth in performance.

Currie Munce

Currie Munce is Vice President, Storage Solutions Architecture for Micron's Storage Business Unit, where he is responsible for defining storage architectural directions for the company, including prototyping and joint collaborations with customers and partners.