Code for monocular depth estimation with Intel's open-source MiDaS model on Google Colab using PyTorch and OpenCV

by admin · March 27, 2025

Monocular depth estimation predicts scene depth from a single RGB image. It is a fundamental task in computer vision with applications in augmented reality, robotics, and 3D scene understanding. In this tutorial, we implement Intel's MiDaS (Monocular Depth Estimation via a Multi-Scale Vision Transformer), a state-of-the-art model designed to produce high-quality depth predictions from a single image. The tutorial uses Google Colab as the compute platform together with PyTorch, OpenCV, and Matplotlib, so you can upload an image and visualize the corresponding depth map.

!pip install -q timm opencv-python matplotlib

First, we install the required Python libraries: timm for model support, opencv-python for image processing, and Matplotlib for visualizing depth maps.

!git clone 
%cd MiDaS

We then clone the official Intel MiDaS repository from GitHub and change into its directory to access the model code and transform utilities.

import torch
import cv2
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from torchvision.transforms import Compose
from google.colab import files


from midas.dpt_depth import DPTDepthModel
from midas.transforms import Resize, NormalizeImage, PrepareForNet
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

We import the libraries and MiDaS components needed to load the model, preprocess images, handle uploads, and visualize depth predictions. We then set the compute device to the GPU (CUDA) if one is available and fall back to the CPU otherwise.

# torch.hub.load returns the model object itself, not a file path
model = torch.hub.load("intel-isl/MiDaS", "DPT_Large", pretrained=True, force_reload=True)
model = model.to(device)
model.eval()

Here we download the pretrained MiDaS DPT_Large model through torch.hub, move it to the selected device (CPU or GPU), and set it to evaluation mode for inference.

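transform = Compose([
    Resize(384, 384, resize_target=None, keep_aspect_ratio=True, ensure_multiple_of=32, resize_method="upper_bound"),
    # The MiDaS repository normalizes DPT inputs with a mean and std of 0.5;
    # the ImageNet statistics apply to its older convolutional models.
    NormalizeImage(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
    PrepareForNet()
])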

We define an image preprocessing pipeline for MiDaS that resizes the input image, normalizes its pixel values, and converts it to the channel-first float array the network expects for inference.
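To make the normalize and prepare steps concrete, the sketch below reproduces them with plain NumPy on a synthetic image. It is an illustration only, not the MiDaS implementation: the function name `preprocess_sketch` is hypothetical, resizing is left out, the mean and standard deviation in the call are example values, and it assumes the 8-bit pixel values are first scaled to the range 0 to 1.

```python
import numpy as np

def preprocess_sketch(img_rgb, mean, std):
    """Hypothetical stand-in for the normalize and prepare steps (no resizing)."""
    # Scale 8-bit pixel values to the [0, 1] range.
    img = img_rgb.astype(np.float64) / 255.0
    # Normalize each channel: (value - mean) / std, broadcast over height and width.
    img = (img - np.asarray(mean)) / np.asarray(std)
    # Reorder from height x width x channels to channels x height x width.
    img = np.transpose(img, (2, 0, 1))
    # Return a contiguous float32 array, ready for torch.from_numpy.
    return np.ascontiguousarray(img, dtype=np.float32)

# Synthetic 4x6 RGB image standing in for an uploaded photo.
rng = np.random.default_rng(0)
fake_img = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)

# Example statistics only; use the values that match your model.
prepared = preprocess_sketch(fake_img, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
print(prepared.shape, prepared.dtype)
```

The channel-first layout matters because PyTorch convolution layers expect tensors shaped (batch, channels, height, width); the batch dimension is added later with `unsqueeze(0)`.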

uploaded = files.upload()
for filename in uploaded:
    img = cv2.imread(filename)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    break

We let the user upload an image in Colab, read it with OpenCV, and convert it from BGR to RGB so that colors are represented correctly.
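OpenCV loads images with channels in blue-green-red order, while Matplotlib displays arrays as red-green-blue. As a small illustration on a synthetic pixel (not part of the tutorial pipeline), the conversion for a three-channel image amounts to reversing the last axis:

```python
import numpy as np

# One pure-blue pixel in OpenCV's BGR order: (B, G, R) = (255, 0, 0).
bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)

# Reversing the channel axis gives the RGB ordering.
rgb = bgr[..., ::-1]

print(rgb[0, 0].tolist())  # [0, 0, 255]
```

Note also that `cv2.imread` returns `None` rather than raising an error when a file cannot be decoded, so checking for `None` before the color conversion is a cheap safeguard against uploading a non-image file.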

# The MiDaS transforms expect float pixel values in [0, 1], so scale the 8-bit image first
img_input = transform({"image": img / 255.0})["image"]
input_tensor = torch.from_numpy(img_input).unsqueeze(0).to(device)


with torch.no_grad():
    prediction = model(input_tensor)
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()


depth_map = prediction.cpu().numpy()

Now we apply the preprocessing transform to the uploaded image, convert the result to a tensor, run depth prediction with the MiDaS model, resize the output to the original image size, and extract the final depth map as a NumPy array.
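The shape handling in the resize step is easy to get wrong, so here is a minimal sketch with a random tensor standing in for the model output. It assumes the prediction has shape (batch, height, width); the helper name `resize_to_image` and the tensor sizes are made up for illustration.

```python
import torch
import torch.nn.functional as F

def resize_to_image(prediction, height, width):
    """Hypothetical helper: upsample a (N, H, W) prediction to (height, width)."""
    # interpolate needs a channel axis for bicubic mode: (N, H, W) -> (N, 1, H, W).
    resized = F.interpolate(
        prediction.unsqueeze(1),
        size=(height, width),
        mode="bicubic",
        align_corners=False,
    )
    # Drop the singleton batch and channel axes: (1, 1, height, width) -> (height, width).
    return resized.squeeze()

# Random tensor standing in for a single model prediction.
dummy_prediction = torch.rand(1, 384, 512)
resized = resize_to_image(dummy_prediction, 480, 640)
print(tuple(resized.shape))  # (480, 640)
```

Passing `img.shape[:2]` as the size works because OpenCV images are stored as (height, width, channels), which matches the (height, width) order that `interpolate` expects.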

plt.figure(figsize=(10, 5))


plt.subplot(1, 2, 1)
plt.imshow(img)
plt.title("Original Image")
plt.axis("off")


plt.subplot(1, 2, 2)
plt.imshow(depth_map, cmap='inferno')
plt.title("Depth Map")
plt.axis("off")


plt.tight_layout()
plt.show()

Finally, we use Matplotlib to show the original image and its depth map side by side. The depth map is displayed with the "inferno" colormap for better contrast. MiDaS predicts relative inverse depth, so brighter regions are closer to the camera and the values carry no metric unit.
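Matplotlib rescales the depth values automatically for display. To store the map as an ordinary 8-bit image instead, the values have to be rescaled by hand. The following is a minimal sketch, assuming simple min-max scaling is acceptable for your use case; `to_uint8` is a hypothetical helper, and the small array stands in for `depth_map`.

```python
import numpy as np

def to_uint8(depth):
    """Min-max scale a depth array to the range 0..255 (hypothetical helper)."""
    depth = depth.astype(np.float64)
    lo, hi = depth.min(), depth.max()
    if hi - lo < 1e-12:
        # A constant map has no contrast; return zeros instead of dividing by zero.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - lo) / (hi - lo)
    return np.round(scaled * 255.0).astype(np.uint8)

# Small array standing in for the real depth map.
demo = np.array([[2.0, 4.0], [6.0, 10.0]])
demo_8bit = to_uint8(demo)
print(demo_8bit.dtype, demo_8bit.min(), demo_8bit.max())
```

The result can then be written to disk with `cv2.imwrite`, for example under a file name of your choice such as `depth.png`. Because the scaling is relative to each image's own minimum and maximum, pixel values from two different images are not directly comparable.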

In summary, this tutorial deployed Intel's MiDaS model on Google Colab and performed monocular depth estimation from RGB images alone. Using PyTorch for model inference, OpenCV for image processing, and Matplotlib for visualization, we built a compact pipeline that generates high-quality depth maps with minimal setup. This implementation is a solid foundation for further exploration, including video depth estimation, real-time applications, and AR/VR system integration.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of Marktechpost, an artificial intelligence media platform whose in-depth coverage of machine learning and deep learning news is both technically sound and understandable by a wide audience. The platform has over 2 million views per month, demonstrating its popularity among its audience.