What is the NeuReality Solution?

A holistic hardware and software platform that makes AI easy to develop, deploy, and manage.
NeuReality expands the possibilities of AI by offering a revolutionary solution that lowers the overall complexity, cost, and power consumption of inference.
While other companies also develop Deep Learning Accelerators (DLAs) for deployment, no other company connects the dots with a software platform purpose-built to manage that specific hardware infrastructure.
This system-level, AI-centric approach simplifies running AI inference at scale.
NeuReality’s AI-Centric Approach
NeuReality Software
Develop, deploy, and manage AI inference
NeuReality is the only company that bridges the gap between the infrastructure where AI inference runs and the MLOps ecosystem.
We’ve created a suite of software tools that make it easy to develop, deploy, and manage AI inference.
With our software stack, any data scientist, software engineer, or DevOps engineer can run any model faster and more easily, with less headache, overhead, and cost.
NeuReality APIs
Our SDK includes three APIs that cover the complete life cycle of an AI inference deployment:
| Toolchain API | Provisioning API | Inference API |
| --- | --- | --- |
| For developing inference solutions from any AI workflow (NLP, Computer Vision, Recommendation Engine) | For deploying and managing AI workflows | For running AI-as-a-service at scale |
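As a rough sketch of how these three APIs fit together, the hypothetical Python snippet below walks through one deployment workflow. The module, function, and parameter names (`neureality.toolchain`, `compile_model`, `deploy`, and so on) are illustrative placeholders and are not the actual SDK interface.

```python
# Hypothetical end-to-end flow across the three APIs.
# All module, class, and method names are illustrative placeholders,
# not the actual NeuReality SDK interface.

# Toolchain API: compile a trained model (e.g., ONNX) into a deployable artifact.
from neureality.toolchain import compile_model          # hypothetical import
artifact = compile_model("resnet50.onnx", target="nr1")

# Provisioning API: deploy the compiled artifact onto available NAPU hardware.
from neureality.provisioning import deploy              # hypothetical import
endpoint = deploy(artifact, replicas=2)

# Inference API: send requests to the deployed endpoint at scale.
from neureality.inference import InferenceClient        # hypothetical import
client = InferenceClient(endpoint)
result = client.infer({"image": open("cat.jpg", "rb").read()})
print(result)
```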
NeuReality Architecture
NeuReality has developed a new architecture that fully exploits the power of DLAs.
We accomplish this with the world’s first Network Addressable Processing Unit (NAPU).
This architecture enables inference in hardware through AI-over-Fabric, an AI-hypervisor, and AI-pipeline offload, all contained within the NAPU.
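To illustrate what "network addressable" means in practice, here is a minimal sketch of a client sending an inference request directly to a NAPU's network endpoint, with no host CPU software stack in the request path. The address, route, and JSON payload below are assumptions made for illustration; the actual transport (such as AI-over-Fabric) is not reproduced here.

```python
# Minimal sketch: a client talks directly to a network-addressable inference
# endpoint, rather than to a host CPU that proxies requests to an accelerator.
# The address, route, and payload format are assumptions for illustration.
import json
import urllib.request

NAPU_ENDPOINT = "http://10.0.0.42:8080/v1/infer"  # hypothetical NAPU address

request = urllib.request.Request(
    NAPU_ENDPOINT,
    data=json.dumps({"model": "resnet50", "inputs": [[0.0] * 224]}).encode(),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    print(json.loads(response.read()))
```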


AI-centric vs CPU-centric
Traditional, general-purpose CPUs perform all of these tasks in software, which increases latency and creates bottlenecks.
Our purpose-built, AI-centric NAPUs perform the same tasks in hardware designed specifically for AI inference.
The following table compares these two approaches:
| | AI-centric NAPU | Traditional CPU-centric |
| --- | --- | --- |
| Architecture Approach | Purpose-built for inference workflows | Generic, multi-purpose chip |
| AI Pipeline Processing | Linear | “Star” model |
| Instruction Processing | Hardware based | Software based |
| Management | AI process natively managed by cloud orchestration tools | AI process not managed; only the CPU is managed |
| Pre/Post Processing | Performed in hardware | Performed in software by the CPU |
| System View | Single-chip host | Partitioned (CPU, NIC, PCIe switch) |
| Scalability | Linear scalability | Diminishing returns |
| Density | High | Low |
| Total Cost of Ownership | Low | High |
| Latency | Low | High, due to over-partitioning and bottlenecking |
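The sketch below illustrates the pipeline and pre/post-processing rows of the table: in the CPU-centric path, the host performs pre- and post-processing in software around each accelerator call, while in the AI-centric path the entire pipeline is offloaded to the NAPU. All function and object names are placeholders used only to show where the work runs.

```python
# Illustrative contrast of the two approaches; all names are placeholders.

def cpu_centric_inference(raw_input, dla, cpu):
    # Host CPU does pre/post-processing in software around the accelerator
    # call, adding hops and potential bottlenecks ("star" model around the CPU).
    tensors = cpu.preprocess(raw_input)    # software, on the host CPU
    logits = dla.run(tensors)              # accelerator does the math
    return cpu.postprocess(logits)         # software, on the host CPU again

def ai_centric_inference(raw_input, napu):
    # The NAPU ingests the raw request from the network and runs the whole
    # pipeline (pre-processing, inference, post-processing) in hardware.
    return napu.run_pipeline(raw_input)    # linear, single-device path
```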
NeuReality Hardware


NR1 Network Addressable Processing Unit
The NeuReality NR1 is a network-attached inference Server-on-a-Chip with an embedded Neural Network Engine. The NR1 is the world’s first Network Addressable Processing Unit (NAPU). As workflow-optimized devices with specialized processing units, native networking, and virtualization capabilities, NAPUs are purpose-built for specific workloads and a key building block of the heterogeneous data center of the future.


NR1-M Inference Module
The NeuReality NR1-M module is a full-height, double-width PCIe card containing one NR1 Network Addressable Processing Unit (NAPU) system-on-chip. It serves as a network-attached inference server and can connect to an external Deep Learning Accelerator (DLA).


NR1-S Inference Server
The world’s first AI-centric server, the NeuReality NR1-S is an optimized inference server design built around NR1-M modules with the NR1 NAPU, enabling truly disaggregated AI service in a scalable and efficient architecture. The system not only lowers cost and power consumption by up to 50x, it also requires no IT integration effort from end users.