Exploring Ultralytics YOLO Models

Introduction

The evolution of computer vision has been marked by continuous advancements, and one of the latest breakthroughs is the family of Ultralytics YOLO models. These models have proven to be powerful tools for real-time object detection, classification, and segmentation. In this blog, we'll walk you through how to build a real-time object-tracking application using Streamlit, leveraging the power of YOLO, SAM, and other models from Ultralytics.

Overview

The goal of this project is to create a user-friendly interface where users can choose among tasks such as object detection, segmentation, and classification. The app can process input from a webcam, an image, or a video file, and it uses state-of-the-art models from Ultralytics.

Key Aspects   

  • Model Selection Based on Task - Choose models dynamically based on the task (detection, segmentation, etc.).    
  • Streamlit Integration - Use Streamlit to build an interactive user interface, making it easy to upload images/videos, configure models, and display results.
  • Real-Time Video/Image Processing - Process video and images in real-time, applying the chosen model for the selected task.    
  • Efficient Resource Management - Manage GPU resources and video capture devices effectively to ensure smooth performance.    

Code Breakdown

Model Selection with YOLO  

The application supports several tasks, including object detection, segmentation, classification, pose estimation, and oriented bounding box (OBB) detection. Based on the selected task, the app retrieves the appropriate model from the list of available Ultralytics models.

from ultralytics.utils.downloads import GITHUB_ASSETS_STEMS

def GetModels(task):
    if task == "Detect":
        available_models = [
            x.replace("yolo", "YOLO") for x in GITHUB_ASSETS_STEMS
            if not (x.endswith("-seg") or x.endswith("-cls") or x.endswith("-pose") or x.endswith("-obb"))
        ]
    elif task == "Segment":
        available_models = [
            x.replace("yolo", "YOLO") for x in GITHUB_ASSETS_STEMS
            if x.startswith("sam_") or x.endswith("-seg")  # stems carry no ".pt" extension
        ]
    ...

In this section of the code, the GetModels() function dynamically selects models based on the task. If the user selects "Detect", the function filters out the segmentation, classification, pose, and OBB variants and presents only detection models. Other tasks, such as segmentation or classification, filter for their appropriate models in the same way.
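As a rough sketch of how the filtered list might be consumed (the widget label, the spinner message, and the selected_model variable are illustrative rather than taken from the original code), the result can feed a sidebar dropdown, and the chosen stem can then be passed to the model loader:

from ultralytics import YOLO

selected_model = st.sidebar.selectbox("Model", GetModels("Detect"))
with st.spinner("Model is downloading..."):
    model = YOLO(f"{selected_model.lower()}.pt")  # SAM/FastSAM weights would be loaded with their own classes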

Interactive Streamlit Interface  

Streamlit offers an easy way to build web apps in Python, making it a natural choice for creating the interface for our object detection tool.

def Initialize(model=None):
    from ultralytics.utils.checks import check_requirements

    check_requirements("streamlit>=1.29.0")
    import streamlit as st
    from ultralytics import YOLO, SAM, FastSAM

    # Configure the Streamlit app
    st.set_page_config(page_title="Object Tracking", layout="wide")
    st.markdown("""<style>MainMenu {visibility: hidden;}</style>""", unsafe_allow_html=True)
    st.markdown("<h1 style='text-align:center;'>Object Tracking Application</h1>", unsafe_allow_html=True)

This function is the core of the application and starts by checking the necessary requirements. It uses Streamlit to create a minimalistic yet functional interface. The main title and layout are configured, and the sidebar allows users to select the input source (image, video, or webcam) and the task they want to perform (detect, segment, classify, etc.).
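The sidebar controls mentioned here don't appear in the snippet above, so the following is a minimal sketch of what they could look like (the widget labels and the source, task, vid_file, and file_name names are assumptions for illustration):

# Sidebar controls for input source and task (illustrative names)
st.sidebar.title("Configuration")
source = st.sidebar.selectbox("Source", ("webcam", "video", "image"))
task = st.sidebar.selectbox("Task", ("Detect", "Segment", "Classify", "Pose", "OBB"))

file_name = 0  # default: first webcam device
if source == "video":
    vid_file = st.sidebar.file_uploader("Upload Video File", type=["mp4", "mov", "avi", "mkv"])
    if vid_file is not None:
        # Persist the upload to disk so cv2.VideoCapture can open it later
        with open("uploaded_video.mp4", "wb") as out:
            out.write(vid_file.read())
        file_name = "uploaded_video.mp4"

This also shows one way the file_name variable used later for video capture could be populated.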

Real-Time Processing   

Once the user selects a task and model, the application processes the input (either an image or a video). This section of the code handles video capture and runs the selected model in real-time, applying the chosen task.

if model and st.sidebar.button("Start"):
    if source == "webcam" or source == "video":
        # For a webcam, file_name is typically the device index (e.g. 0); for a video it is the file path
        videocapture = cv2.VideoCapture(file_name)
        while videocapture.isOpened():
            success, frame = videocapture.read()
            if not success:
                break
            # Run inference with the user-selected thresholds and class filter
            results = model(frame, conf=conf, iou=iou, classes=selected_ind)
            annotated_frame = results[0].plot()
            # Display original and annotated frames
            org_frame.image(frame, channels="BGR")
            ann_frame.image(annotated_frame, channels="BGR")

Here, we capture video frames using OpenCV (cv2.VideoCapture), process them with the selected YOLO model, and display the original and annotated frames side-by-side in real-time.
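The conf, iou, selected_ind, org_frame, and ann_frame names in this loop come from setup code that isn't shown; a plausible sketch of that setup is below (the slider defaults, the class multiselect, and the column layout are assumptions):

# Detection thresholds and class filter (defaults are illustrative)
conf = float(st.sidebar.slider("Confidence Threshold", 0.0, 1.0, 0.25, 0.01))
iou = float(st.sidebar.slider("IoU Threshold", 0.0, 1.0, 0.45, 0.01))
class_names = list(model.names.values())  # model.names maps class index -> class name
selected_classes = st.sidebar.multiselect("Classes", class_names, default=class_names[:3])
selected_ind = [class_names.index(c) for c in selected_classes]

# Two side-by-side placeholders, updated on every frame
col1, col2 = st.columns(2)
org_frame = col1.empty()
ann_frame = col2.empty()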

Efficient Resource Management  

Since we're working with potentially large models and GPU resources, it's essential to release resources after each operation.

import cv2
import torch

def ClearAllResources(videocapture, st):
    if videocapture:
        videocapture.release()   # Release the webcam/video file handle
    torch.cuda.empty_cache()     # Clear GPU memory
    cv2.destroyAllWindows()      # Close any open OpenCV windows

The ClearAllResources() function ensures that video capture devices are released and GPU memory is cleared when they are no longer needed. This keeps the app from crashing due to excessive memory usage, especially when handling high-definition videos.
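How the cleanup gets triggered isn't shown above; one plausible pattern (the "Stop" button and its placement are assumptions, not part of the original code) is to create a stop control before the frame loop and release everything when it is pressed or when the video ends:

stop_button = st.button("Stop")  # create the control once, before the frame loop

while videocapture.isOpened():
    # ... per-frame inference and display, as shown earlier ...
    if stop_button:
        ClearAllResources(videocapture, st)
        st.stop()  # halt the Streamlit script immediately

# When the video ends naturally, release resources as well
ClearAllResources(videocapture, st)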

Key Features

  • Model Flexibility - The app supports multiple tasks (detection, segmentation, etc.) and automatically loads the appropriate models based on user selection.
  • Real-Time Performance - With GPU acceleration and efficient model inference, the app can process live video with minimal latency on capable hardware.
  • User-Friendly Interface - The intuitive Streamlit interface allows users to quickly select their input source, task, and model, making it accessible even for non-experts.
  • Customizable Thresholds - The app provides controls for confidence and IoU thresholds, allowing users to tweak detection sensitivity to fit their needs.

Conclusion

This blog provides a detailed look into how to build a Streamlit application for object detection, classification, and segmentation using Ultralytics YOLO models. The combination of Ultralytics' advanced models and Streamlit's easy-to-use interface makes this a powerful tool for anyone interested in computer vision. The code can be further expanded by integrating additional Ultralytics models or implementing advanced features like multi-class tracking and interactive feedback for model training. 

Arun Gopalakrishnan
Senior Module Lead
