A team of researchers at the Indian Institute of Science, Bangalore, has just created a revolutionary prototype of a computing paradigm that could shake (and perhaps replace) the foundations of modern computing - starting with the transistor and the GPU!
The Von Neumann architecture on which nearly all our computing systems are built, along with its infamous bottleneck, could become obsolete if this model can scale.
This technology, called neuromorphic computing (computing that mimics the brain), uses memristors (a foundational device that can be set to many distinct conductance states rather than just two - to be precise, 16,500+ states in the novel prototype) and, if it scales, could outpace even the acceleration promised by quantum computing.
A team of researchers at the Indian Institute of Science (IISc) has achieved a revolutionary breakthrough in artificial intelligence, potentially changing the global technology landscape.
They have developed a brain-inspired computing platform with the potential to make AI significantly faster, more efficient, and accessible to a wider population.
It paves the way for bringing complex AI tasks, such as training Large Language Models (LLMs), currently confined to resource-intensive data centers, to personal devices like laptops and smartphones.
The world is on the brink of an AI revolution, and this time, India is leading the charge.
The Von Neumann bottleneck is a limitation arising from the way traditional computer systems based on the Von Neumann architecture handle data. That architecture has four main components:
Central Processing Unit (CPU) – The core unit that performs computations.
Memory (RAM) – Where instructions and data are stored.
Input/Output (I/O) System – Handles interaction with external devices.
Bus – Communication pathways that connect the CPU, memory, and I/O systems.
The Von Neumann bottleneck arises because the CPU and memory must communicate over the same bus to fetch both data and instructions. This creates a sequential processing bottleneck, as only one transaction can occur at a time (either fetching an instruction or fetching data). The limitation becomes more pronounced as CPU processing power increases while memory access speed fails to keep up, so the CPU spends more of its time idle, waiting for instructions or data to arrive.
Instruction Fetch and Data Fetch Conflict: Since both program instructions and data share the same memory, the CPU cannot access both simultaneously. If the program needs data while executing an instruction, the CPU must wait for the next bus cycle to fetch it.
Memory Latency: Even though modern CPUs are extremely fast, memory access speeds have not scaled at the same rate. As a result, CPUs often stall while waiting for data, leading to wasted clock cycles.
Bandwidth Limitation: The bus bandwidth, which defines how much data can be transmitted between the CPU and memory at once, limits the overall throughput. This becomes a critical bottleneck when dealing with large data sets or complex computations.
Underutilization of CPU: The CPU, though capable of executing instructions at very high speeds, is often underutilized because it spends time waiting for memory operations to complete. This mismatch between processing power and memory access speed creates inefficiencies.
Increased Latency: Program execution can be delayed due to the sequential nature of instruction and data fetching. This delay worsens with the growing complexity of applications and data-intensive processes.
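To make the idle-CPU effect concrete, here is a rough back-of-the-envelope sketch. It assumes a best case in which computation and memory traffic overlap perfectly, so the run time is simply whichever of the two is slower. The function name and the machine figures (1 TFLOP/s of compute, 100 GB/s of memory bandwidth) are hypothetical, chosen only for illustration.

```python
def mvm_time_estimate(n, flops_per_s, bytes_per_s, bytes_per_value=4):
    """Estimate an n x n matrix-vector multiply on a machine where compute
    and memory transfers overlap perfectly (an optimistic assumption)."""
    flops = 2 * n * n  # one multiply and one add per matrix entry
    # Data that must cross the bus: the matrix, the input vector, the output vector.
    bytes_moved = bytes_per_value * (n * n + 2 * n)
    compute_s = flops / flops_per_s
    memory_s = bytes_moved / bytes_per_s
    total_s = max(compute_s, memory_s)   # the slower side sets the pace
    utilization = compute_s / total_s    # fraction of time the processor is busy
    return compute_s, memory_s, utilization


# Hypothetical machine: 1 TFLOP/s compute, 100 GB/s memory bandwidth.
compute_s, memory_s, utilization = mvm_time_estimate(4096, 1e12, 100e9)
print(f"compute: {compute_s * 1e6:.1f} us, memory: {memory_s * 1e6:.1f} us, "
      f"processor busy: {utilization:.1%}")
```

With these made-up numbers, moving the data takes roughly twenty times longer than the arithmetic, so the processor is busy only about 5% of the time. That gap is the bottleneck.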
Cache Memory: Modern architectures introduce cache memory (L1, L2, L3) to store frequently accessed data and instructions closer to the CPU, reducing the need to access slower main memory frequently.
Pipelining: Breaking down instruction execution into multiple stages (fetch, decode, execute, etc.) allows overlapping of instruction processing to some extent, increasing CPU throughput.
Memory Interleaving and Out-of-Order Execution: Techniques like memory interleaving and allowing the CPU to execute instructions out-of-order based on data availability help alleviate the bottleneck by optimizing memory access patterns.
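The benefit of the first two mitigations can be estimated with two textbook formulas, sketched below. The latencies, miss rate and stage count are hypothetical, and the pipeline model assumes an ideal pipeline with no stalls.

```python
def average_access_time(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average memory access time: every access pays the cache hit time,
    and a fraction `miss_rate` of accesses also pays the miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns


def pipelined_cycles(n_instructions, n_stages):
    """Cycles for an ideal pipeline with no stalls: the first instruction
    needs n_stages cycles, then one instruction completes per cycle."""
    return n_stages + (n_instructions - 1)


# Hypothetical numbers: 1 ns cache hit, 5% miss rate, 100 ns main-memory penalty.
with_cache_ns = average_access_time(1.0, 0.05, 100.0)

# Hypothetical 5-stage pipeline running 1000 instructions.
unpipelined = 1000 * 5
pipelined = pipelined_cycles(1000, 5)

print(f"average access time with cache: {with_cache_ns:.1f} ns")
print(f"cycles without pipelining: {unpipelined}, with pipelining: {pipelined}")
```

Both techniques hide the bottleneck rather than remove it: a cache miss still pays the full trip to main memory.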
This is the reason that GPUs came into prominence.
GPUs are built around massive parallelism: they operate in parallel by default, handle AI data almost independently of the central processor, and can speed up AI data operations by orders of magnitude.
Neuromorphic Computing.
In simple terms, neuromorphic computing involves creating computer systems that work like our brains. The human brain processes information using networks of neurons and synapses, which communicate through electrical signals. Neuromorphic systems use similar principles but are built with specialized hardware that can adapt and learn from experiences, much like how we learn from our environment.
Researchers at the Indian Institute of Science (IISc) in Bengaluru have recently made a groundbreaking advancement in neuromorphic computing by developing a platform that can store and process data in an astonishing 16,500 different states within a molecular film. This is a significant leap from traditional computers, which are limited to just two states (high and low conductance) for data processing and storage.
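One way to put the state count in perspective is to convert it into bits per device. This is a small illustrative calculation, assuming every state can be told apart reliably.

```python
import math

states_per_device = 16_500                   # figure quoted above
bits_per_device = math.log2(states_per_device)

# A conventional binary cell distinguishes 2 states, i.e. exactly 1 bit.
print(f"{states_per_device} states = {bits_per_device:.2f} bits per device")
```

So a single device with this many states carries just over 14 bits of information, compared with 1 bit for a two-state cell.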
The IISc team created a memristor, a type of device that mimics the synapses in the brain's neural networks. This device uses a metal-organic film to track the movement of molecules and ions, allowing it to access many more memory states than conventional computers. They developed a custom circuit board capable of measuring extremely small voltages, enabling them to pinpoint these various states accurately.
This innovative approach allows the neuromorphic platform to perform complex calculations, such as matrix multiplication, much faster and with less energy than traditional digital computers. For example, they successfully recreated NASA's "Pillars of Creation" image using minimal energy and time compared to what would typically be required by supercomputers.
Democratizing AI: This technology could enable complex AI tasks to be performed on personal devices like laptops and smartphones, rather than relying on large data centers. This shift could make powerful AI tools more accessible to everyone.
Energy Savings: Businesses that rely on high-speed computing, such as finance and tech companies, could see reduced energy costs and faster data processing capabilities.
Advancements in AI: With faster and more efficient computing, advancements in artificial intelligence and machine learning could accelerate, leading to smarter applications and tools.
Global Leadership: This breakthrough positions India as a potential leader in AI hardware development, contributing to the global tech landscape.
This is huge. Matrix-vector multiplication is the heart of AI today - specifically, Generative AI. Instead of multiplying term by term, if the entire multiplication of even 64x64 matrices could be done in parallel in one step, that would be a step forward like nothing seen yet. And if the system could scale to 512x512 or even 1024x1024 (though challenges remain - we’ll address that), all of Nvidia’s chips could become obsolete. I realize that’s a massive claim - but it’s just the facts.
Where GPU clusters draw megawatts of power, this system ran an HPC application using signals measured in millionths of a volt. In fact, the biggest innovation in the research paper is the exceptionally fine-grained control over very small voltages, which is instrumental in producing the 16,520 conductance states, each of which can hold data. This could be the answer to the energy consumption problem and the future of green computing for AI hardware.
Neuromorphic systems as powerful as the Hopper architecture from Nvidia could be deployed on edge devices like IoT devices, mobiles, budget laptops, and phablets. This has massive implications for AI at the edge: immense calculations and sophisticated processing could be possible with devices in the palm of your hand. No more need for even 8 GB VRAM GPUs!
Who needs Elon Musk’s Colossus supercomputer cluster when SLMs could soon be performing at the level of LLMs (with a complete rewrite in memristor technology)? Scale could become a problem of the past. Quantum computing promised HPC but is still in the experimental stages. In this neuromorphic computing discovery, we have a potentially greater speedup than quantum computing offers, and one that already has a real-world prototype!
The very notion of a computer will take a quantum leap. But don’t hold your breath - the rough timeline is three years (very rough, admittedly).
Parallel Processing Capabilities: Neuromorphic systems can perform many operations simultaneously due to their massively parallel architecture. Each neuron in a neuromorphic chip can execute different functions at the same time, theoretically allowing neuromorphic devices to handle as many tasks as there are neurons. This parallelism can lead to substantial speed improvements for tasks that are inherently parallelizable, such as those found in machine learning and AI applications.
Event-Driven Computation: Unlike traditional processors that operate on a clock cycle, neuromorphic systems utilize event-driven computation. This means that neurons only activate when they receive input spikes, which can lead to faster processing times as only the relevant parts of the system consume power and process data at any given moment.
Reduced Latency: By co-locating memory and processing, neuromorphic computing avoids the von Neumann bottleneck, where data transfer between separate memory and processing units slows down computation. This integrated approach can result in lower latency and faster response times for complex computations.
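The event-driven idea can be sketched in a few lines. This is a toy software model, not how any particular neuromorphic chip works: the function name, network, weights and threshold are all hypothetical. The point it illustrates is that work is done only for neurons that actually receive a spike.

```python
from collections import deque


def run_event_driven(synapses, threshold, initial_spikes, max_events=1000):
    """Toy event-driven network.

    synapses[i] maps each target neuron of neuron i to a synaptic weight.
    Nothing is computed for a neuron unless a spike arrives at it.
    """
    potential = [0.0] * len(synapses)
    queue = deque(initial_spikes)  # neurons that have just fired
    fired = []
    updates = 0
    while queue and updates < max_events:
        source = queue.popleft()
        fired.append(source)
        for target, weight in synapses[source].items():
            potential[target] += weight  # only spike targets are touched
            updates += 1
            if potential[target] >= threshold:
                potential[target] = 0.0  # reset after firing
                queue.append(target)
    return fired, updates


# Hypothetical 5-neuron network; neuron 4 has no incoming spikes.
synapses = [{1: 0.6, 2: 1.0}, {3: 1.0}, {1: 0.5}, {}, {}]
fired, updates = run_event_driven(synapses, threshold=1.0, initial_spikes=[0])
print(fired, updates)
```

In this run neuron 4 is never touched, because no spike ever reaches it. A clock-driven simulation would have updated all five neurons on every tick.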
Energy Efficiency: Neuromorphic computing systems are designed to be highly energy-efficient. Research indicates that they can be four to sixteen times more energy-efficient than traditional AI systems running on conventional hardware. For instance, Intel's Loihi chip demonstrated this efficiency while processing large neural networks.
Low Power Consumption: The human brain operates on about 20 watts, which is significantly less than what current SOTA processors require for similar tasks. Neuromorphic systems can achieve comparable cognitive tasks with much lower energy budgets, making them suitable for edge computing applications where power is limited.
Idle Power Consumption: In neuromorphic systems, most of the neurons remain idle until activated by an event, leading to dramatically lower overall power consumption during non-active periods. This contrasts sharply with traditional processors, which often consume power continuously even when idle.
Memristors can be integrated into neural networks to create highly efficient and adaptable AI systems. Their ability to store and process information simultaneously allows for faster learning and inference, making them ideal for applications in deep learning and real-time decision-making.
Memristors enable AI processing directly on edge devices, reducing the need for cloud computing. This capability allows for faster response times and lower energy consumption, crucial for applications in autonomous vehicles, drones, and smart sensors.
Memristor technology could facilitate direct communication between the human brain and external devices. This could lead to advanced brain-machine interfaces (BMIs) that allow for seamless interaction with machines, potentially enhancing capabilities in rehabilitation, assistive technologies, and even cognitive enhancement.
Memristors can significantly improve the efficiency of IoT devices by enabling local data processing. This reduces bandwidth usage and enhances privacy by minimizing data transmission to central servers. Applications include smart homes, industrial automation, and environmental monitoring systems.
When combined with energy harvesting technologies, memristor-based systems can operate independently of traditional power sources. This self-powered capability is particularly useful for remote sensors and devices deployed in challenging environments, such as disaster zones or deep-sea explorations.
The integration of memristors into robotic systems could enhance their learning capabilities and adaptability. Robots could learn from their environments in real-time, improving their performance in tasks ranging from manufacturing to healthcare.
Memristors can process large volumes of data quickly and efficiently, making them suitable for applications in big data analytics. This could transform industries such as finance, healthcare, and telecommunications by enabling faster insights and decision-making.
Memristors are essential for developing neuromorphic computing systems that mimic the brain's architecture. This technology promises to overcome the limitations of traditional computing, enabling more efficient processing of complex tasks like pattern recognition and sensory processing.
In Flynn's taxonomy, a classic sequential processor is Single Instruction, Single Data (SISD), while vector units and GPUs use Single Instruction, Multiple Data, also known as SIMD.
Supercomputers using parallel computing have Multiple Instructions, Multiple Data, also known as MIMD.
Memristor-based Neuromorphic Computing requires a new term: Decentralized Instructions, Decentralized Data - DIDD - a term I coined just now.
Every memristor would hold a part of the data, and programming such a system would call for a decentralized parallel programming language.
Linear algebra operations can be performed in a single step using Kirchhoff’s current law and Ohm’s law.
Input Voltages: The system applies voltages to the rows (called wordlines). These voltages represent the numbers in a vector that we want to multiply.
Memristor Conductances: Each memristor has a property called conductance, which represents the values in a matrix (the weights). The conductance tells us how much current will flow through the memristor when a voltage is applied.
Ohm's Law: Ohm's Law states that the current flowing through a device (like a memristor) is equal to the voltage across it multiplied by its conductance. This means that if we know the voltage and conductance, we can find out how much current flows.
Kirchhoff's Law: Kirchhoff's Current Law says that the total current entering a point (like where several memristors connect) must equal the total current leaving that point. In this case, it helps us add up all the currents from the memristors in a column.
Output Currents: The currents that come out of each column represent the results of multiplying the vector by the matrix.
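The steps above can be simulated in software. The sketch below is only a model of the idea: in a real crossbar all the currents flow at once, whereas here a loop adds them up one by one. The function name and the conductance and voltage values are hypothetical.

```python
def crossbar_mvm(conductances, voltages):
    """Simulate an ideal memristor crossbar.

    conductances[i][j]: conductance (siemens) of the device at row i, column j.
    voltages[i]:        voltage (volts) applied to row i.
    Returns the current (amperes) collected on each column.
    """
    n_rows = len(conductances)
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for i in range(n_rows):
        for j in range(n_cols):
            # Ohm's law gives the current through one device: I = V * G.
            device_current = voltages[i] * conductances[i][j]
            # Kirchhoff's current law: currents on a shared column wire add up.
            currents[j] += device_current
    return currents


# Hypothetical 2x2 crossbar with microsiemens-scale conductances.
G = [[1e-6, 2e-6],
     [3e-6, 4e-6]]
V = [0.1, 0.2]
I = crossbar_mvm(G, V)
print(I)
```

Each output is I_j = V_0 * G_0j + V_1 * G_1j, which is exactly one entry of the vector-matrix product. The weights never leave the devices that store them.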
Matrix-Vector Multiplication: The system can multiply a matrix by a vector very efficiently without moving data around.
Matrix-Matrix Multiplication: By using multiple input vectors at once, it can also multiply two matrices together.
Dot Product: The system can calculate the dot product of two vectors by using one vector as input and treating the other as weights in the memristors.
Matrix Inversion: It can help solve systems of equations by performing matrix inversion operations.
Other Calculations: It can also perform other important calculations used in machine learning and data analysis.
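The first three operations in this list all reduce to the same crossbar read, as the sketch below suggests. It is an idealized software model with hypothetical function names, and it uses unitless numbers for readability. It also ignores a practical detail: physical conductances cannot be negative, so signed weights need some additional encoding.

```python
def crossbar_mvm(conductances, voltages):
    """Ideal crossbar read: column j collects sum_i voltages[i] * conductances[i][j]."""
    n_rows, n_cols = len(conductances), len(conductances[0])
    return [sum(voltages[i] * conductances[i][j] for i in range(n_rows))
            for j in range(n_cols)]


def crossbar_matmul(conductances, input_vectors):
    """Matrix-matrix product: apply each input vector to the same crossbar.
    The stored matrix is reused without being moved."""
    return [crossbar_mvm(conductances, v) for v in input_vectors]


def crossbar_dot(a, b):
    """Dot product: store b as a single column of 'weights', apply a as input."""
    single_column = [[b_i] for b_i in b]
    return crossbar_mvm(single_column, a)[0]


W = [[1, 2],
     [3, 4]]
identity_inputs = [[1, 0],
                   [0, 1]]
print(crossbar_matmul(W, identity_inputs))  # applying unit vectors reads W back row by row
print(crossbar_dot([1, 2, 3], [4, 5, 6]))
```

Matrix inversion is not shown here, since it needs a feedback circuit around the crossbar rather than a single read.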
Which means that memristors have the potential to disrupt traditional GPU architectures (and thus Nvidia).
Memristors offer major advantages in speed, efficiency, scalability and neuromorphic capabilities compared to GPUs.
This makes them a promising alternative for accelerating AI workloads, especially on resource-constrained devices.