Large language models (LLMs) represent a major advancement in AI, with the promise of transforming domains through learned knowledge. LLM sizes have been increasing 10X every year for the last few years, and as these models grow in complexity and size, so do their capabilities.
Yet LLMs are difficult to develop and maintain, which puts them out of reach for most enterprises.
LLMs are already powering a broad range of use cases:
- Content generation for marketing copy and storyline creation.
- Summarization for news and email.
- Chatbots for brand creation and gaming characters.
- Customer service for intelligent Q&A and real-time customer support.
- Coding for dynamic commenting and function generation.
- Translation for languages and Wikipedia.
The NeMo LLM service, running on the NVIDIA AI platform, provides enterprises with the fastest path to customizing and deploying LLMs on private and public clouds or accessing them through the API service.
The NeMo LLM service exposes the NVIDIA Megatron 530B model as a cloud API. Try the model's capabilities either through the Playground or through representational state transfer (REST) APIs.
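As a rough illustration, a completion request to a hosted LLM REST API typically looks like the sketch below. The endpoint URL, model path, and parameter names here are placeholder assumptions, not the service's documented interface; consult the NeMo LLM service documentation for the real schema.

```python
import os
import requests

# Hypothetical endpoint and payload shape -- check the NeMo LLM service
# docs for the actual URL, model identifiers, and request fields.
API_URL = "https://api.example.com/v1/models/megatron-530b/completions"
API_TOKEN = os.environ["NEMO_LLM_API_TOKEN"]  # credential issued by the service

payload = {
    "prompt": "Write a tagline for an electric mountain bike:",
    "tokens_to_generate": 32,   # assumed parameter name
    "temperature": 0.7,
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json=payload,
    timeout=60,
)
response.raise_for_status()
print(response.json())
```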
NeMo Megatron is an end-to-end framework for training and deploying LLMs with billions or trillions of parameters.
The containerized framework delivers high training efficiency across thousands of GPUs and makes it practical for enterprises to build and deploy large-scale models. It provides capabilities to curate training data, train large-scale models up to trillions of parameters, customize using prompt learning, and deploy using the NVIDIA Triton™ Inference Server to run large-scale models on multiple GPUs and multiple nodes.
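Prompt learning adapts a frozen pretrained model by training only a small set of "virtual token" embeddings, which is far cheaper than fine-tuning billions of weights. The sketch below shows the core idea in plain PyTorch, assuming a Hugging Face-style causal LM that exposes get_input_embeddings() and accepts inputs_embeds; it is an illustration of the technique, not the NeMo Megatron API.

```python
import torch
import torch.nn as nn

class PromptTunedModel(nn.Module):
    """Minimal prompt-tuning sketch: the base LM stays frozen and only a
    short sequence of soft-prompt embeddings is trained."""

    def __init__(self, base_lm, num_virtual_tokens=20):
        super().__init__()
        self.base_lm = base_lm
        for p in self.base_lm.parameters():
            p.requires_grad = False  # freeze all pretrained weights

        hidden = base_lm.get_input_embeddings().embedding_dim
        # The only trainable parameters: learned virtual-token embeddings.
        self.soft_prompt = nn.Parameter(
            torch.randn(num_virtual_tokens, hidden) * 0.02
        )

    def forward(self, input_ids):
        tok_embeds = self.base_lm.get_input_embeddings()(input_ids)
        batch = tok_embeds.size(0)
        prompt = self.soft_prompt.unsqueeze(0).expand(batch, -1, -1)
        # Prepend the learned virtual tokens to every input sequence.
        inputs_embeds = torch.cat([prompt, tok_embeds], dim=1)
        return self.base_lm(inputs_embeds=inputs_embeds)
```

Because only the soft prompt receives gradients, a single copy of the frozen base model can serve many tasks, each with its own small set of prompt embeddings.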
NeMo Megatron is optimized to run on NVIDIA DGX™ Foundry, NVIDIA DGX SuperPOD™, Amazon Web Services, Microsoft Azure, and Oracle Cloud Infrastructure.
Data scientists and engineers are starting to push the boundaries of what’s possible with large language models. NVIDIA Triton™ Inference Server is open-source inference serving software that can be used to deploy, run, and scale LLMs. It supports multi-GPU, multi-node inference for large language models through a FasterTransformer backend, using tensor and pipeline parallelism together with the Message Passing Interface (MPI) and the NVIDIA Collective Communication Library (NCCL) for distributed, high-performance inference, and it supports GPT, T5, and other LLMs. LLM inference functionality is in beta.
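A client request to such a deployment might look like the following sketch, using the tritonclient Python package. The tensor names and the "fastertransformer" model name follow FasterTransformer's example GPT configurations and are assumptions; check your deployed model's config.pbtxt for the actual names and data types.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a Triton server on its default HTTP port.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Tokenized prompt; token IDs here are arbitrary example values.
input_ids = np.array([[50256, 15496, 11, 616, 1438, 318]], dtype=np.uint32)
input_lengths = np.array([[input_ids.shape[1]]], dtype=np.uint32)
output_len = np.array([[32]], dtype=np.uint32)

inputs = [
    httpclient.InferInput("input_ids", list(input_ids.shape), "UINT32"),
    httpclient.InferInput("input_lengths", list(input_lengths.shape), "UINT32"),
    httpclient.InferInput("request_output_len", list(output_len.shape), "UINT32"),
]
inputs[0].set_data_from_numpy(input_ids)
inputs[1].set_data_from_numpy(input_lengths)
inputs[2].set_data_from_numpy(output_len)

result = client.infer(model_name="fastertransformer", inputs=inputs)
print(result.as_numpy("output_ids"))  # generated token IDs to detokenize
```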
BioNeMo is an AI-powered drug discovery cloud service and framework built on NVIDIA NeMo Megatron for training and deploying large biomolecular transformer AI models at supercomputing scale. The service includes pretrained LLMs and native support for common file formats for proteins, DNA, RNA, and chemistry, with data loaders for SMILES (molecular structures) and FASTA (amino acid and nucleotide sequences). The BioNeMo framework will also be available for download for running on your own infrastructure.
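For readers unfamiliar with those formats, the snippet below is a generic illustration of reading FASTA records and SMILES strings in Python. It is not BioNeMo's data-loader API; the file names are hypothetical.

```python
def read_fasta(path):
    """Minimal FASTA reader: yields (header, sequence) pairs.
    Records start with a '>' header line followed by sequence lines."""
    header, chunks = None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

# SMILES files are typically one molecule per line, e.g. "CCO" for ethanol.
with open("molecules.smi") as fh:
    smiles = [line.strip() for line in fh if line.strip()]

for name, seq in read_fasta("proteins.fasta"):
    print(name, len(seq))
```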
Stay current on the latest NVIDIA Triton Inference Server and NVIDIA® TensorRT™ product updates, content, news, and more.
Check out the latest on-demand sessions on LLMs from NVIDIA GTCs.
Read about the evolving inference-usage landscape, considerations for optimal inference, and the NVIDIA AI platform.
Try NVIDIA NeMo LLM Service today.