
Build an end-to-end object tracking and analysis system using Roboflow Supervision

In this advanced Roboflow Supervision tutorial, we build a complete object detection and tracking pipeline using the Supervision library. First, we set up real-time object tracking with ByteTrack, add detection smoothing, and define polygon zones to monitor specific areas of the video stream. As we process frames, we annotate them with bounding boxes, object IDs, and velocity data, allowing us to track and analyze object behavior over time. Our goal is to demonstrate how detection, tracking, region-based analysis, and visual annotation can be combined into a seamless and intelligent video analytics workflow. Check out the full code here.

!pip install supervision ultralytics opencv-python
!pip install --upgrade supervision 


import cv2
import numpy as np
import supervision as sv
from ultralytics import YOLO
import matplotlib.pyplot as plt
from collections import defaultdict


model = YOLO('yolov8n.pt')

We first install the necessary packages, including supervision, ultralytics, and OpenCV. After upgrading to the latest version of Supervision, we import all the required libraries. We then initialize the YOLOv8n model, which serves as the core detector in our pipeline. Check out the full code here.
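
Before moving on, we can quickly verify that the detector and the Supervision conversion work together. The short check below is illustrative and not part of the original notebook; the dummy_frame variable is just a blank test image, and the snippet assumes the imports and model defined in the cells above have already been run.

# Optional sanity check (illustrative only): run the model on a blank frame
# and convert the output to a Supervision Detections object.
dummy_frame = np.zeros((480, 640, 3), dtype=np.uint8)      # blank 640x480 test image
result = model(dummy_frame, verbose=False)[0]               # single Ultralytics result
detections = sv.Detections.from_ultralytics(result)         # convert to Supervision format
print(f"Detections on a blank frame: {len(detections)}")    # expected to be 0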

try:
   tracker = sv.ByteTrack()
except AttributeError:
   try:
       tracker = sv.ByteTracker()
   except AttributeError:
       print("Using basic tracking - install latest supervision for advanced tracking")
       tracker = None


try:
   smoother = sv.DetectionsSmoother(length=5)
except AttributeError:
   smoother = None
   print("DetectionsSmoother not available in this version")


try:
   box_annotator = sv.BoundingBoxAnnotator(thickness=2)
   label_annotator = sv.LabelAnnotator()
   if hasattr(sv, 'TraceAnnotator'):
       trace_annotator = sv.TraceAnnotator(thickness=2, trace_length=30)
   else:
       trace_annotator = None
except AttributeError:
   try:
       box_annotator = sv.BoxAnnotator(thickness=2)
       label_annotator = sv.LabelAnnotator()
       trace_annotator = None
   except AttributeError:
       print("Using basic annotators - some features may be limited")
       box_annotator = None
       label_annotator = None 
       trace_annotator = None


def create_zones(frame_shape):
   h, w = frame_shape[:2]
  
   try:
       entry_zone = sv.PolygonZone(
           polygon=np.array([[0, h//3], [w//3, h//3], [w//3, 2*h//3], [0, 2*h//3]]),
           frame_resolution_wh=(w, h)
       )
      
       exit_zone = sv.PolygonZone(
           polygon=np.array([[2*w//3, h//3], [w, h//3], [w, 2*h//3], [2*w//3, 2*h//3]]),
           frame_resolution_wh=(w, h)
       )
   except TypeError:
       entry_zone = sv.PolygonZone(
           polygon=np.array([[0, h//3], [w//3, h//3], [w//3, 2*h//3], [0, 2*h//3]])
       )
       exit_zone = sv.PolygonZone(
           polygon=np.array([[2*w//3, h//3], [w, h//3], [w, 2*h//3], [2*w//3, 2*h//3]])
       )
  
   return entry_zone, exit_zone

We set up the required components from the Supervision library, including ByteTrack for tracking, an optional detection smoother, and flexible annotators for bounding boxes, labels, and traces. To ensure cross-version compatibility, we use try-except blocks that fall back to alternative classes or basic functionality when needed. We also define dynamic polygon zones within the frame to monitor specific regions, such as entry and exit areas, enabling advanced spatial analysis. Check out the full code here.
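
To see how zone triggering behaves in isolation, we can feed a hand-made detection into the zones returned by create_zones. The snippet below is a minimal sketch: the box coordinates and the 640x480 frame size are made up, and it relies on the zone's default triggering anchor, so the exact output may vary slightly across Supervision versions.

# Illustrative check of zone triggering with a single hand-made detection
# (coordinates and frame size are made up for this sketch).
entry_zone, exit_zone = create_zones((480, 640, 3))

sample = sv.Detections(
    xyxy=np.array([[50.0, 200.0, 150.0, 280.0]]),   # box whose anchor falls in the left zone
    confidence=np.array([0.9]),
    class_id=np.array([0]),
)
print("In entry zone:", entry_zone.trigger(sample))   # e.g. [ True]
print("In exit zone:", exit_zone.trigger(sample))     # e.g. [False]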

class AdvancedAnalytics:
   def __init__(self):
       self.track_history = defaultdict(list)
       self.zone_crossings = {"entry": 0, "exit": 0}
       self.speed_data = defaultdict(list)
      
   def update_tracking(self, detections):
       if hasattr(detections, 'tracker_id') and detections.tracker_id is not None:
           for i in range(len(detections)):
               track_id = detections.tracker_id[i]
               if track_id is not None:
                   bbox = detections.xyxy[i]
                   center = np.array([(bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2])
                   self.track_history[track_id].append(center)
                  
                   if len(self.track_history[track_id]) >= 2:
                       prev_pos = self.track_history[track_id][-2]
                       curr_pos = self.track_history[track_id][-1]
                       speed = np.linalg.norm(curr_pos - prev_pos)
                       self.speed_data[track_id].append(speed)
  
   def get_statistics(self):
       total_tracks = len(self.track_history)
       avg_speed = np.mean([np.mean(speeds) for speeds in self.speed_data.values() if speeds])
       return {
           "total_objects": total_tracks,
           "zone_entries": self.zone_crossings["entry"],
           "zone_exits": self.zone_crossings["exit"],
           "avg_speed": avg_speed if not np.isnan(avg_speed) else 0
       }


def process_video(source=0, max_frames=300):
   """
   Process video source with advanced supervision features
   source: video path or 0 for webcam
   max_frames: limit processing for demo
   """
   cap = cv2.VideoCapture(source)
   analytics = AdvancedAnalytics()
  
   ret, frame = cap.read()
   if not ret:
       print("Failed to read video source")
       return
  
   entry_zone, exit_zone = create_zones(frame.shape)
  
   try:
       entry_zone_annotator = sv.PolygonZoneAnnotator(
           zone=entry_zone,
           color=sv.Color.GREEN,
           thickness=2
       )
       exit_zone_annotator = sv.PolygonZoneAnnotator(
           zone=exit_zone,
           color=sv.Color.RED,
           thickness=2
       )
   except (AttributeError, TypeError):
       entry_zone_annotator = sv.PolygonZoneAnnotator(zone=entry_zone)
       exit_zone_annotator = sv.PolygonZoneAnnotator(zone=exit_zone)
  
   frame_count = 0
   results_frames = []
  
   cap.set(cv2.CAP_PROP_POS_FRAMES, 0) 
  
   while ret and frame_count < max_frames:
       # Detect objects and convert the result into Supervision format
       results = model(frame, verbose=False)[0]
       detections = sv.Detections.from_ultralytics(results)

       # Track and smooth detections when those components are available
       if tracker is not None:
           detections = tracker.update_with_detections(detections)
       if smoother is not None:
           detections = smoother.update_with_detections(detections)

       analytics.update_tracking(detections)

       # Count objects currently inside each polygon zone
       analytics.zone_crossings["entry"] += int(np.sum(entry_zone.trigger(detections)))
       analytics.zone_crossings["exit"] += int(np.sum(exit_zone.trigger(detections)))

       # Annotate boxes, labels, traces, and zone overlays on a copy of the frame
       annotated_frame = frame.copy()
       if box_annotator is not None:
           annotated_frame = box_annotator.annotate(scene=annotated_frame, detections=detections)
       if label_annotator is not None and detections.tracker_id is not None:
           labels = [f"ID: {tid}" for tid in detections.tracker_id]
           annotated_frame = label_annotator.annotate(scene=annotated_frame, detections=detections, labels=labels)
       if trace_annotator is not None:
           annotated_frame = trace_annotator.annotate(scene=annotated_frame, detections=detections)

       annotated_frame = entry_zone_annotator.annotate(scene=annotated_frame)
       annotated_frame = exit_zone_annotator.annotate(scene=annotated_frame)

       # Keep a sample of annotated frames for later inspection
       if frame_count % 30 == 0:
           results_frames.append(annotated_frame.copy())

       frame_count += 1
       ret, frame = cap.read()

   cap.release()
   print("Final statistics:", analytics.get_statistics())
   return analytics

We define an AdvancedAnalytics class to track object motion, calculate velocity, and count zone crossings, enabling rich real-time video insights. In the process_video function, we read each frame from the video source and pass it through our detection, tracking, and smoothing pipeline. We then annotate each frame with bounding boxes, labels, traces, and zone overlays, giving us a powerful and flexible system for object monitoring and spatial analysis. Throughout the loop, we also collect the data needed to print final statistics, demonstrating the end-to-end capabilities of Roboflow Supervision. Check out the full code here.
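
To make the speed logic easy to inspect on its own, we can drive AdvancedAnalytics with two hand-made detections that share a tracker ID. Everything in this snippet, including the analytics_demo object, the box coordinates, and the 10-pixel shift, is illustrative and not part of the original pipeline.

# Illustrative, standalone check of AdvancedAnalytics: one tracked box moves
# 10 pixels to the right between two "frames", so avg_speed should be ~10.
analytics_demo = AdvancedAnalytics()

frame_a = sv.Detections(
    xyxy=np.array([[100.0, 100.0, 150.0, 150.0]]),
    confidence=np.array([0.9]),
    class_id=np.array([0]),
    tracker_id=np.array([1]),
)
frame_b = sv.Detections(
    xyxy=np.array([[110.0, 100.0, 160.0, 150.0]]),   # same box shifted 10 px right
    confidence=np.array([0.9]),
    class_id=np.array([0]),
    tracker_id=np.array([1]),
)
analytics_demo.update_tracking(frame_a)
analytics_demo.update_tracking(frame_b)
print(analytics_demo.get_statistics())   # avg_speed should be about 10 pixels/frame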

def create_demo_video():
   """Create a simple demo video with moving objects"""
   fourcc = cv2.VideoWriter_fourcc(*'mp4v')
   out = cv2.VideoWriter('demo.mp4', fourcc, 20.0, (640, 480))
  
   for i in range(100):
       frame = np.zeros((480, 640, 3), dtype=np.uint8)
      
       x1 = int(50 + i * 2)
       y1 = 200
       x2 = int(100 + i * 1.5)
       y2 = 250
      
       cv2.rectangle(frame, (x1, y1), (x1+50, y1+50), (0, 255, 0), -1)
       cv2.rectangle(frame, (x2, y2), (x2+50, y2+50), (255, 0, 0), -1)
      
       out.write(frame)
  
   out.release()
   return 'demo.mp4'


demo_video = create_demo_video()
analytics = process_video(demo_video, max_frames=100)


print("nTutorial completed! Key features demonstrated:")
print("✓ YOLO integration with Supervision")
print("✓ Multi-object tracking with ByteTracker")
print("✓ Detection smoothing")
print("✓ Polygon zones for area monitoring")
print("✓ Advanced annotations (boxes, labels, traces)")
print("✓ Real-time analytics and statistics")
print("✓ Speed calculation and tracking history")

To test the full pipeline, we generate a synthetic demo video with two moving rectangles that simulate objects to track. This lets us verify detection, tracking, zone monitoring, and speed analysis without a real video source. We then run the process_video function on the generated clip. Finally, we print a summary of all the key features we implemented, demonstrating the power of Roboflow Supervision for real-time visual analysis.
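
As an optional follow-up, and assuming process_video returns the AdvancedAnalytics object as the assignment above suggests, we can use the matplotlib import from earlier to plot the per-frame speed recorded for each track. On the synthetic clip, speed_data may well be empty if YOLO detects nothing in the plain colored rectangles, so the sketch below guards for that case.

# Illustrative follow-up: plot per-track speed history from the returned
# analytics object (assumes process_video returned an AdvancedAnalytics instance).
if analytics is not None and analytics.speed_data:
    plt.figure(figsize=(8, 4))
    for track_id, speeds in analytics.speed_data.items():
        plt.plot(speeds, label=f"track {track_id}")
    plt.xlabel("frame index")
    plt.ylabel("speed (pixels/frame)")
    plt.title("Per-track speed over time")
    plt.legend()
    plt.show()
else:
    print("No speed data recorded on this clip")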

In short, we successfully implemented a complete pipeline that combines object detection, tracking, zone monitoring, and real-time analysis. We demonstrated how to surface key insights such as object speed, zone crossings, and track history using annotated video frames. This setup lets us go beyond basic detection and build intelligent monitoring or analysis systems with open-source tools. Whether for research or production use, we now have a strong foundation that can be extended with more advanced capabilities.


Check out the full code here. Also check out our tutorials, codes, and notebooks on GitHub. Follow us on Twitter, join our 100K+ ML SubReddit, and don't forget to subscribe to our newsletter.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of Marktechpost, an artificial intelligence media platform known for its in-depth coverage of machine learning and deep learning news that is both technically sound and accessible to a wide audience. The platform draws over 2 million views per month, reflecting its popularity among readers.