Organizations are deploying AI infrastructure everywhere, and most security teams have no idea what they're running. A new security scanning tool called Julius has just doubled its detection capabilities, revealing a sprawling landscape of cloud-managed AI services, self-hosted inference engines, and gateway systems that often sit on the open internet with minimal or no authentication.

The problem isn't that these tools are inherently insecure. It's that they're easy to deploy, they solve obvious business problems, and teams spin them up without involving security. By the time anyone notices, the system has been indexing sensitive documents or routing API traffic with no network restrictions and no monitoring. This is shadow IT for the artificial intelligence era.

## What Changed in Julius v0.2.0?

Julius, an open-source security scanner developed by Praetorian, nearly doubled its detection capabilities in its latest release. The tool went from identifying 33 AI services to 63 in a single update, adding 30 new detection probes that cover the full spectrum of how organizations actually deploy AI infrastructure.

The expansion reflects a critical gap in enterprise security: while teams focused on detecting self-hosted basics like Ollama and vLLM, they missed the bigger picture. Organizations are now running AI through cloud providers, deploying high-performance inference engines for production workloads, and using gateway systems to route traffic between applications and models. Julius now covers all three layers.

## Which AI Services Are Now Detectable?

The new probes span three major categories of AI infrastructure. Cloud-managed services represent the first wave, where organizations assume their endpoints are inherently private. They often aren't: misconfigured API gateways, exposed proxy layers, and overly permissive network policies can put them on the open internet.
- AWS Bedrock: detected via foundation models and model conversation endpoints
- Azure OpenAI: Azure-specific OpenAI endpoint detection
- Google Vertex AI: Vertex AI prediction and model endpoint detection
- Databricks Model Serving: model serving endpoint detection
- Managed inference APIs: Fireworks AI, Groq, Modal, Replicate, and Together AI

The second category covers high-performance inference engines that teams deploy for speed, latency, or cost optimization. These tend to run with default configurations and minimal authentication.

- SGLang: detected via unique server information endpoints exposing memory and disaggregation settings
- TensorRT-LLM: NVIDIA's optimized inference runtime
- Triton Inference Server: NVIDIA's multi-framework serving platform
- BentoML: ML model serving framework
- Additional engines: Baseten Truss, DeepSpeed-MII, MLC LLM, Petals, PowerInfer, and Ray Serve

The third category is where things get particularly sensitive. AI gateway systems route, observe, and control traffic between applications and language models. An exposed gateway often means access to every model and API key behind it.

## Why Do Self-Hosted RAG Platforms Present the Biggest Risk?

Retrieval-Augmented Generation (RAG) platforms are purpose-built to ingest and query internal documents. These systems are designed to handle contracts, HR policies, financial data, and source code. An exposed RAG endpoint is, by definition, an exposed document store.

PrivateGPT is a telling example. Its entire value proposition is "keep your documents private by running everything locally." The irony is that PrivateGPT's API defaults to no authentication. Its document list endpoint is a simple web request that returns every ingested document's metadata, including filenames and chunk counts. The model field is hardcoded to "private-gpt," which makes detection trivial and false positives near-zero.

RAGFlow follows a similar pattern.
Its health check endpoint is unauthenticated and returns a JSON response with a field unique to RAGFlow that tracks the status of the Elasticsearch or Infinity backend powering document retrieval. Even when RAGFlow is partially broken, the health endpoint still responds with the same structure, making detection reliable in any state.

- PrivateGPT: detected via an unauthenticated document ingestion list endpoint that returns metadata
- RAGFlow: detected via a health check endpoint reporting Elasticsearch backend status
- Quivr: "second brain" RAG platform for knowledge management
- h2oGPT: H2O.ai's document question-answering platform
- Langflow: visual language model orchestration framework

## How to Discover Hidden AI Infrastructure in Your Organization

- Run Julius as a network scanner: execute `julius probe` to scan your network for all 63 detectable AI services. The tool requires no external configuration, probe downloads, or API keys, since all 63 probes are embedded in the binary.
- Configure for enterprise environments: use the `--ca-cert` flag to specify a custom certificate authority file if your organization uses internal PKI, allowing Julius to work within your security policies.
- Limit response sizes for safety: use the `--max-response-size` flag (default 10 megabytes) to prevent memory exhaustion from large or malicious responses during scanning.
- Test with insecure mode first: use the `--insecure` flag to skip TLS certificate verification in testing environments before deploying to production scanning workflows.

## What Makes This Discovery Tool Different?

Julius is not a model fingerprinting tool, which identifies which language model generated a piece of text. Instead, Julius identifies the server infrastructure itself: what software is running on the endpoint. Think of it as service detection for AI, similar to what network mapping tools like Nmap do for traditional infrastructure.
The v0.2.0 release also hardened the scanner itself, adding response size limiting and TLS configuration options for enterprise environments. It fixed several detection issues as well: an Ollama probe that produced false positives on Ollama-compatible servers like SGLang and KoboldCpp now requires specific fields in API responses, and header detection rules that silently failed on HTTP/2 connections, affecting five cloud probes, were repaired.

Coverage now spans the full AI infrastructure stack, from cloud-managed inference through self-hosted serving to the RAG and orchestration layer. If an organization is running AI infrastructure, Julius should find it. The development team continues to expand probe coverage as new tools emerge, accepting community contributions in the form of simple YAML files that can be tested locally before submission.

For security teams accustomed to discovering shadow IT through traditional infrastructure scanning, this represents a new frontier. The AI infrastructure explosion has outpaced security discovery tooling, leaving organizations vulnerable to exposure of sensitive documents, API keys, and model access through endpoints they didn't know existed.
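The contribution format is described only as "simple YAML files." As a purely hypothetical illustration, a declarative probe might pair a request with a response match, along these lines (none of these field names are taken from the actual Julius schema):

```yaml
# Hypothetical probe declaration -- field names are illustrative only,
# not the real Julius probe schema.
name: privategpt
description: Detect PrivateGPT via its hardcoded model identifier
request:
  method: GET
  path: /v1/models          # assumed endpoint path
match:
  status: 200
  body_contains: "private-gpt"   # the hardcoded model name noted above
```

The appeal of a declarative format like this is that a new probe is a data change, not a code change, which is what makes local testing and community submission straightforward.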