AI model weights have become the forgotten bottleneck in enterprise deployments. While companies have mastered shipping software code through containers and registries, they're still managing AI models through ad hoc scripts, manual file transfers, and unsecured shared folders. A new approach treats model weights as first-class artifacts in the same container infrastructure used for applications, bringing versioning, security scanning, and automated deployment to the AI lifecycle.

## Why Are Model Files So Hard to Manage?

Modern AI models are enormous. A quantized version of Meta's LLaMA-3 70B model weighs approximately 140 gigabytes, while cutting-edge multimodal models can exceed 1 terabyte. These aren't files you can version-control with standard Git or store in traditional databases. Teams need to handle multiple model versions, rapidly distribute them across GPU inference nodes in different regions, and guarantee that any production deployment traces back to an exact, immutable artifact.

The core challenges break down into three categories: storing models at massive scale, distributing them quickly to inference servers, and ensuring reproducibility across deployments. Most organizations today rely on one of three approaches, and none of them works well enough for enterprise needs.

## What Are the Current Solutions and Their Limitations?

Organizations typically choose among three existing strategies, each with significant drawbacks:

- Git LFS (Hugging Face Hub): Offers native version control through branches, tags, commits, and history, but inherits Git's transport inefficiencies and lacks optimizations for distributing huge files across cloud-native environments.
- Object Storage (S3, MinIO): Cloud providers offer this as a standard solution with native support in inference engines like vLLM and SGLang, but it lacks structured metadata and provides only weak version management capabilities.
- Distributed Filesystems (NFS, CephFS): POSIX-compatible systems have low integration costs, but they lack structured metadata, offer only weak version management, and create high operational complexity for distributed deployments.

The fundamental problem: none of these approaches was designed with Kubernetes-native delivery in mind. Software containers are pulled from OCI (Open Container Initiative) registries with full versioning, security scanning, and rollback support. Model weights, by contrast, are often downloaded via shell scripts, copied manually between storage buckets, or distributed through unsecured shared filesystems.

## How Can Teams Ship Models Like They Ship Code?

The solution applies proven software delivery practices to AI model lifecycle management. Instead of treating models as special cases, organizations can package them as OCI artifacts and manage them through the same container infrastructure already in place for applications.

The parallel workflow looks like this:

- Development Phase: Algorithm engineers push model weights and configurations to the Hugging Face Hub, treating it as the central Git repository for model code and artifacts.
- Build Phase: CI/CD pipelines package weights, runtime configurations, and metadata into an immutable model artifact, just as they compile and test application code.
- Management Phase: The model artifact is stored in an artifact registry, reusing existing container infrastructure and tooling for supply chain security, access control, and P2P distribution.
- Deployment Phase: Engineers use Kubernetes OCI Volumes or a Model CSI (Container Storage Interface) Driver to mount models into inference containers as volumes, decoupling the AI model from the inference engine.
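As a concrete illustration of the deployment phase, the sketch below mounts a model artifact into a vLLM serving Pod with the Kubernetes `image` volume type (OCI Volumes, a gated feature behind the `ImageVolume` feature flag in recent Kubernetes releases). The registry host, repository path, tag, and mount path are hypothetical placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qwen-inference
spec:
  containers:
    - name: vllm
      image: vllm/vllm-openai:latest
      args: ["--model", "/models/qwen2.5-0.5b"]
      volumeMounts:
        # The model artifact appears to the engine as a read-only
        # directory of weights and configuration files.
        - name: model-weights
          mountPath: /models/qwen2.5-0.5b
          readOnly: true
  volumes:
    # An image volume pulls an OCI artifact from the registry,
    # just like a container image, but mounts it as data.
    - name: model-weights
      image:
        reference: registry.example.com/models/qwen2.5-0.5b:v1.0.0
        pullPolicy: IfNotPresent
```

Because the model is an immutable, digest-addressed artifact, rolling back means changing only the `reference` tag in this YAML, which fits naturally into a GitOps workflow.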
This approach brings critical benefits: versioning ensures you can roll back to previous model versions instantly, immutability guarantees that a deployed model never changes unexpectedly, and GitOps-driven deployment lets teams manage model updates through declarative YAML files.

## Steps to Implement Cloud-Native Model Delivery

A practical implementation uses a CLI tool called modctl, which standardizes how models are packaged, versioned, stored, and deployed. The process involves five straightforward steps:

- Generate Definition: Run modctl to create a Modelfile in your model directory, which defines the model's name, architecture, family, format, configuration files, weights, and code components.
- Customize Metadata: Edit the Modelfile to specify details such as the model name (e.g., qwen2.5-0.5b), architecture type (transformer, CNN, RNN), model family, format (safetensors, ONNX, PyTorch), and paths to configuration and weight files.
- Authenticate Registry: Log in to your artifact registry (such as Harbor) using modctl with your credentials to enable secure artifact storage and distribution.
- Build OCI Artifact: Run modctl build to package the model into an OCI artifact, which generates a Model Manifest containing descriptive information stored as application/vnd.cncf.model.config.v1+json.
- Push to Registry: Upload the built artifact to your artifact registry, making it available for deployment across your Kubernetes infrastructure.

When you build a model artifact, modctl generates a manifest that includes layers for model weights, configuration files, and code. Each layer is identified by a cryptographic digest and size, ensuring immutability and traceability.

## What Does This Mean for Enterprise AI Operations?

The gap between software delivery infrastructure and AI model management has created deployment fragility, security risks, and operational overhead at scale.
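To make the packaging steps above concrete, a Modelfile for a small transformer model might look like the sketch below. The directive names and values are illustrative, reconstructed from the fields the steps describe (name, architecture, family, format, configs, weights); check the modctl documentation for the exact syntax in your version:

```
# Modelfile (illustrative) describing a small transformer model
NAME qwen2.5-0.5b
ARCH transformer
FAMILY qwen2
FORMAT safetensors
# Configuration files included as artifact layers
CONFIG config.json
CONFIG tokenizer_config.json
# Weight files, matched by glob
MODEL *.safetensors
```

A typical session would then authenticate, build, and publish, roughly: `modctl login registry.example.com`, `modctl build -t registry.example.com/models/qwen2.5-0.5b:v1.0.0 -f Modelfile .`, and `modctl push registry.example.com/models/qwen2.5-0.5b:v1.0.0`. The registry URL is a placeholder, and flag spellings are assumptions based on common CLI conventions.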
By treating model weights as first-class OCI artifacts, organizations can leverage the full ecosystem of container tooling they've already invested in: security scanning, signed provenance, GitOps-driven deployment, and Kubernetes-native pulling.

"The cloud native gap: Most existing ML model storage approaches were not designed with Kubernetes-native delivery in mind, leaving a critical gap between how software artifacts are managed and how model artifacts are managed," explained Wenbo Qi, Dragonfly and ModelPack maintainer, writing with Chenyu Zhang, Harbor and ModelPack maintainer, and Feynman Zhou, ORAS maintainer and CNCF Ambassador.

This shift represents a maturation of AI infrastructure. As organizations scale their AI deployments, they're discovering that the ad hoc approaches that worked for small teams break down under production load. The solution isn't new technology; it's applying established software engineering practices to the unique challenges of managing massive model files. Teams that adopt this approach gain reproducibility, security, and operational efficiency that manual processes simply cannot provide.