Coding implementation for building neural memory agents with differentiable memory, meta-learning and experience replay to enable continuous adaptation in dynamic environments

by admin · November 10, 2025

In this tutorial, we explore how neural memory agents can continuously learn without forgetting past experiences. We design a memory-augmented neural network that integrates a differentiable neural computer (DNC) with experience replay and meta-learning to quickly adapt to new tasks while retaining prior knowledge. By implementing this approach in PyTorch, we demonstrate how content-based memory addressing and prioritized replay enable models to overcome catastrophic forgetting and maintain performance across multiple learning tasks. Check The complete code is here.

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from collections import deque
import matplotlib.pyplot as plt
from dataclasses import dataclass


@dataclass
class MemoryConfig:
   memory_size: int = 128
   memory_dim: int = 64
   num_read_heads: int = 4
   num_write_heads: int = 1

We first import all necessary libraries and define configuration classes for our neural memory system. Here we set parameters such as memory size, dimensionality, and number of read/write heads, which determine how the differentiable memory behaves throughout the training process. This setup is the basis on which our memory-enhanced architecture is built. Check The complete code is here.

class NeuralMemoryBank(nn.Module):
   def __init__(self, config: MemoryConfig):
       super().__init__()
       self.memory_size = config.memory_size
       self.memory_dim = config.memory_dim
       self.num_read_heads = config.num_read_heads
       self.register_buffer('memory', torch.zeros(config.memory_size, config.memory_dim))
       self.register_buffer('usage', torch.zeros(config.memory_size))
   def content_addressing(self, key, beta):
       key_norm = F.normalize(key, dim=-1)
       mem_norm = F.normalize(self.memory, dim=-1)
       similarity = torch.matmul(key_norm, mem_norm.t())
       return F.softmax(beta * similarity, dim=-1)
   def write(self, write_key, write_vector, erase_vector, write_strength):
       write_weights = self.content_addressing(write_key, write_strength)
       erase = torch.outer(write_weights.squeeze(), erase_vector.squeeze())
       self.memory = (self.memory * (1 - erase)).detach()
       add = torch.outer(write_weights.squeeze(), write_vector.squeeze())
       self.memory = (self.memory + add).detach()
       self.usage = (0.99 * self.usage + write_weights.squeeze()).detach()
   def read(self, read_keys, read_strengths):
       reads = []
       for i in range(self.num_read_heads):
           weights = self.content_addressing(read_keys[i], read_strengths[i])
           read_vector = torch.matmul(weights, self.memory)
           reads.append(read_vector)
       return torch.cat(reads, dim=-1)


class MemoryController(nn.Module):
   def __init__(self, input_dim, hidden_dim, memory_config: MemoryConfig):
       super().__init__()
       self.hidden_dim = hidden_dim
       self.memory_config = memory_config
       self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
       total_read_dim = memory_config.num_read_heads * memory_config.memory_dim
       self.read_keys = nn.Linear(hidden_dim, memory_config.num_read_heads * memory_config.memory_dim)
       self.read_strengths = nn.Linear(hidden_dim, memory_config.num_read_heads)
       self.write_key = nn.Linear(hidden_dim, memory_config.memory_dim)
       self.write_vector = nn.Linear(hidden_dim, memory_config.memory_dim)
       self.erase_vector = nn.Linear(hidden_dim, memory_config.memory_dim)
       self.write_strength = nn.Linear(hidden_dim, 1)
       self.output = nn.Linear(hidden_dim + total_read_dim, input_dim)
   def forward(self, x, memory_bank, hidden=None):
       lstm_out, hidden = self.lstm(x.unsqueeze(0), hidden)
       controller_state = lstm_out.squeeze(0)
       read_k = self.read_keys(controller_state).view(self.memory_config.num_read_heads, -1)
       read_s = F.softplus(self.read_strengths(controller_state))
       write_k = self.write_key(controller_state)
       write_v = torch.tanh(self.write_vector(controller_state))
       erase_v = torch.sigmoid(self.erase_vector(controller_state))
       write_s = F.softplus(self.write_strength(controller_state))
       read_vectors = memory_bank.read(read_k, read_s)
       memory_bank.write(write_k, write_v, erase_v, write_s)
       combined = torch.cat([controller_state, read_vectors], dim=-1)
       output = self.output(combined)
       return output, hidden

We implemented a neural memory bank and a memory controller, which together form the core of the agent’s differentiable memory mechanism. The neural memory bank stores and retrieves information through content-based addressing, while the controller network dynamically interacts with this memory using read and write operations. This setup enables the agent to recall relevant information and adapt to new inputs efficiently. Check The complete code is here.

class ExperienceReplay:
   def __init__(self, capacity=10000, alpha=0.6):
       self.capacity = capacity
       self.alpha = alpha
       self.buffer = deque(maxlen=capacity)
       self.priorities = deque(maxlen=capacity)
   def push(self, experience, priority=1.0):
       self.buffer.append(experience)
       self.priorities.append(priority ** self.alpha)
   def sample(self, batch_size, beta=0.4):
       if len(self.buffer) == 0:
           return [], []
       probs = np.array(self.priorities)
       probs = probs / probs.sum()
       indices = np.random.choice(len(self.buffer), min(batch_size, len(self.buffer)), p=probs, replace=False)
       samples = [self.buffer[i] for i in indices]
       weights = (len(self.buffer) * probs[indices]) ** (-beta)
       weights = weights / weights.max()
       return samples, torch.FloatTensor(weights)


class MetaLearner(nn.Module):
   def __init__(self, model):
       super().__init__()
       self.model = model
   def adapt(self, support_x, support_y, num_steps=5, lr=0.01):
       adapted_params = {name: param.clone() for name, param in self.model.named_parameters()}
       for _ in range(num_steps):
           pred, _ = self.model(support_x, self.model.memory_bank)
           loss = F.mse_loss(pred, support_y)
           grads = torch.autograd.grad(loss, self.model.parameters(), create_graph=True)
           adapted_params = {name: param - lr * grad for (name, param), grad in zip(adapted_params.items(), grads)}
       return adapted_params

We designed experience replay and meta-learner components to enhance the agent’s ability to continue learning. The replay buffer enables the model to relive past experiences through preferential sampling, thereby reducing forgetting, while the meta-learner leverages MAML-style adaptation to quickly learn new tasks. Together, these modules bring stability and flexibility to the agent training process. Check The complete code is here.

class ContinualLearningAgent:
   def __init__(self, input_dim=64, hidden_dim=128):
       self.config = MemoryConfig()
       self.memory_bank = NeuralMemoryBank(self.config)
       self.controller = MemoryController(input_dim, hidden_dim, self.config)
       self.replay_buffer = ExperienceReplay(capacity=5000)
       self.meta_learner = MetaLearner(self.controller)
       self.optimizer = torch.optim.Adam(self.controller.parameters(), lr=0.001)
       self.task_history = []
   def train_step(self, x, y, use_replay=True):
       self.optimizer.zero_grad()
       pred, _ = self.controller(x, self.memory_bank)
       current_loss = F.mse_loss(pred, y)
       self.replay_buffer.push((x.detach().clone(), y.detach().clone()), priority=current_loss.item() + 1e-6)
       total_loss = current_loss
       if use_replay and len(self.replay_buffer.buffer) > 16:
           samples, weights = self.replay_buffer.sample(8)
           for (replay_x, replay_y), weight in zip(samples, weights):
               with torch.enable_grad():
                   replay_pred, _ = self.controller(replay_x, self.memory_bank)
                   replay_loss = F.mse_loss(replay_pred, replay_y)
                   total_loss = total_loss + 0.3 * replay_loss * weight
       total_loss.backward()
       torch.nn.utils.clip_grad_norm_(self.controller.parameters(), 1.0)
       self.optimizer.step()
       return total_loss.item()
   def evaluate(self, test_data):
       self.controller.eval()
       total_error = 0
       with torch.no_grad():
           for x, y in test_data:
               pred, _ = self.controller(x, self.memory_bank)
               total_error += F.mse_loss(pred, y).item()
       self.controller.train()
       return total_error / len(test_data)

We build a continuous learning agent that integrates memory, controller, replay, and meta-learning into a single adaptive framework. In this step, we define how the agent is trained on each batch, replaying past data and evaluating its performance. This implementation ensures that the model can learn new information while retaining prior knowledge without catastrophic forgetting. Check The complete code is here.

def create_task_data(task_id, num_samples=100):
   torch.manual_seed(task_id)
   x = torch.randn(num_samples, 64)
   if task_id == 0:
       y = torch.sin(x.mean(dim=1, keepdim=True).expand(-1, 64))
   elif task_id == 1:
       y = torch.cos(x.mean(dim=1, keepdim=True).expand(-1, 64)) * 0.5
   else:
       y = torch.tanh(x * 0.5 + task_id)
   return [(x[i], y[i]) for i in range(num_samples)]


def run_continual_learning_demo():
   print("🧠 Neural Memory Agent - Continual Learning Demon")
   print("=" * 60)
   agent = ContinualLearningAgent()
   num_tasks = 4
   results = {'tasks': [], 'without_memory': [], 'with_memory': []}
   for task_id in range(num_tasks):
       print(f"n📚 Learning Task {task_id + 1}/{num_tasks}")
       train_data = create_task_data(task_id, num_samples=50)
       test_data = create_task_data(task_id, num_samples=20)
       for epoch in range(20):
           total_loss = 0
           for x, y in train_data:
               loss = agent.train_step(x, y, use_replay=(task_id > 0))
               total_loss += loss
           if epoch % 5 == 0:
               avg_loss = total_loss / len(train_data)
               print(f"  Epoch {epoch:2d}: Loss = {avg_loss:.4f}")
       print(f"n  📊 Evaluation on all tasks:")
       for eval_task_id in range(task_id + 1):
           eval_data = create_task_data(eval_task_id, num_samples=20)
           error = agent.evaluate(eval_data)
           print(f"    Task {eval_task_id + 1}: Error = {error:.4f}")
           if eval_task_id == task_id:
               results['tasks'].append(eval_task_id + 1)
               results['with_memory'].append(error)
   fig, axes = plt.subplots(1, 2, figsize=(14, 5))
   ax = axes[0]
   memory_matrix = agent.memory_bank.memory.detach().numpy()
   im = ax.imshow(memory_matrix, aspect="auto", cmap='viridis')
   ax.set_title('Neural Memory Bank State', fontsize=14, fontweight="bold")
   ax.set_xlabel('Memory Dimension')
   ax.set_ylabel('Memory Slots')
   plt.colorbar(im, ax=ax)
   ax = axes[1]
   ax.plot(results['tasks'], results['with_memory'], marker="o", linewidth=2, markersize=8, label="With Memory Replay")
   ax.set_title('Continual Learning Performance', fontsize=14, fontweight="bold")
   ax.set_xlabel('Task Number')
   ax.set_ylabel('Test Error')
   ax.legend()
   ax.grid(True, alpha=0.3)
   plt.tight_layout()
   plt.savefig('neural_memory_results.png', dpi=150, bbox_inches="tight")
   print("n✅ Results saved to 'neural_memory_results.png'")
   plt.show()
   print("n" + "=" * 60)
   print("🎯 Key Insights:")
   print("  • Memory bank stores compressed task representations")
   print("  • Experience replay mitigates catastrophic forgetting")
   print("  • Agent maintains performance on earlier tasks")
   print("  • Content-based addressing enables efficient retrieval")


if __name__ == "__main__":
   run_continual_learning_demo()

We provide a comprehensive demonstration of the continuous learning process, generating comprehensive tasks to assess an agent’s adaptability across multiple environments. As we train and visualize the results, we observe how memory replay improves stability and maintains accuracy across tasks. The experiment concludes with graphical insights that highlight how differentiable memory can enhance an agent’s long-term learning capabilities.

In summary, we built and trained a neural memory agent that can continuously adapt to evolving tasks. We observed how differentiable memory enables efficient storage and retrieval of learned representations, while replay mechanisms enhance stability and knowledge retention. By combining these components with meta-learning, we see how these agents can pave the way for more resilient, adaptive neural systems that can remember, reason, and evolve without losing the knowledge they already possess.

Check The complete code is here. Please feel free to check out our GitHub page for tutorials, code, and notebooks. In addition, welcome to follow us twitter And don’t forget to join our 100k+ ML SubReddit and subscribe our newsletter. wait! Are you using Telegram? Now you can also join us via telegram.

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for the benefit of society. His most recent endeavor is the launch of Marktechpost, an artificial intelligence media platform that stands out for its in-depth coverage of machine learning and deep learning news that is technically sound and easy to understand for a broad audience. The platform has more than 2 million monthly views, which shows that it is very popular among viewers.

🙌 FOLLOW MARKTECHPOST: Add us as your go-to source on Google.