Introduction
The evolution of computer vision has been marked by continuous advancements, and one of the latest breakthroughs is the family of Ultralytics YOLO models, which have proven to be powerful tools for real-time object detection, classification, and segmentation. In this blog, we'll walk you through how to build a real-time object-tracking application using Streamlit, leveraging the power of YOLO, SAM, and other models from Ultralytics.
Overview
The goal of this project is to create a user-friendly interface where users can choose between various tasks such as object detection, segmentation, classification, and more. The app is capable of processing input from a webcam, image, or video file, and utilizes state-of-the-art models from Ultralytics.
Key Aspects
- Model Selection Based on Task - Choose models dynamically based on the task (detection, segmentation, etc.).
- Streamlit Integration - Use Streamlit to build an interactive user interface, enabling easy uploading of images/videos, configuring models, and displaying results.
- Real-Time Video/Image Processing - Process video and images in real-time, applying the chosen model for the selected task.
- Efficient Resource Management - Manage GPU resources and video capture devices effectively to ensure smooth performance.
Code Breakdown
Model Selection with YOLO
The application supports several tasks, including object detection, segmentation, classification, pose estimation, and oriented bounding box (OBB) detection. Based on the selected task, the app retrieves the appropriate model from the list of available Ultralytics models.
def GetModels(task):
    if task == "Detect":
        available_models = [
            x.replace("yolo", "YOLO") for x in GITHUB_ASSETS_STEMS
            if not (x.endswith("-seg") or x.endswith("-cls") or x.endswith("-pose") or x.endswith("-obb"))
        ]
    elif task == "Segment":
        available_models = [
            x.replace("yolo", "YOLO") for x in GITHUB_ASSETS_STEMS
            if x.startswith("sam_") or x.endswith("-seg")
        ]
    ...

In this section of the code, the GetModels() function dynamically selects models based on the task. Note that GITHUB_ASSETS_STEMS holds file stems (no ".pt" extension), so the suffix checks match the bare stem. If the user selects "Detect", the function filters out models related to segmentation, classification, pose, and OBB, and presents only detection models. The other task options filter the appropriate models in the same way.
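As a quick illustration of how this is used, the returned list can feed a Streamlit selectbox. This is a minimal sketch that assumes GetModels() finishes by returning available_models (the return statement is elided above); the widget labels are illustrative.

import streamlit as st

# Hypothetical wiring: let the user pick a task, then a matching model
task = st.sidebar.selectbox("Task", ("Detect", "Segment", "Classify", "Pose", "OBB"))
available_models = GetModels(task)  # assumes the function returns the filtered list
selected_model = st.sidebar.selectbox("Model", available_models)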
Interactive Streamlit Interface
Streamlit offers an easy way to build web apps in Python, and it's a perfect choice for creating an interface for our object detection tool.
def Initialize(model=None):
    check_requirements("streamlit>=1.29.0")
    import streamlit as st
    from ultralytics import YOLO, SAM, FastSAM

    # Configure the Streamlit app
    st.set_page_config(page_title="Object Tracking", layout="wide")
    st.markdown("""<style>#MainMenu {visibility: hidden;}</style>""", unsafe_allow_html=True)
    st.markdown("<h1 style='text-align:center;'>Object Tracking Application</h1>", unsafe_allow_html=True)

This function is the core of the application. It starts by checking that a recent Streamlit version is installed, then builds a minimalistic yet functional interface: the page title and wide layout are set, Streamlit's default menu is hidden via CSS (the menu element has the id MainMenu, hence the "#MainMenu" selector), and a centered title is rendered. The sidebar then lets users select the input source (image, video, or webcam) and the task they want to perform (detect, segment, classify, etc.).
Real-Time Processing
Once the user selects a task and model, the application processes the input (an image, a video file, or a webcam stream). This section of the code handles video capture and runs the selected model on each frame in real time.
if model and st.sidebar.button("Start"):
    if source in ("webcam", "video"):
        videocapture = cv2.VideoCapture(file_name)  # 0 opens the default webcam; a path opens a video file
        while videocapture.isOpened():
            success, frame = videocapture.read()
            if not success:
                break
            results = model(frame, conf=conf, iou=iou, classes=selected_ind)
            annotated_frame = results[0].plot()
            # Display original and annotated frames
            org_frame.image(frame, channels="BGR")
            ann_frame.image(annotated_frame, channels="BGR")

Here, we capture video frames with OpenCV (cv2.VideoCapture), run the selected YOLO model on each frame with the configured confidence threshold, IoU threshold, and class filter, and display the original and annotated frames side by side in real time.
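Before this loop can run, the model must be loaded and the two display placeholders created. Below is a minimal sketch, assuming the selected model maps to a standard Ultralytics weights file (e.g. yolo11n.pt) and that org_frame and ann_frame are the placeholders updated in the loop above.

from ultralytics import YOLO

model = YOLO(f"{selected_model.lower()}.pt")  # e.g. "yolo11n.pt"

# Optional class filter; selected_ind is the list of class indices
# passed to the model in the loop above
class_names = list(model.names.values())
selected_classes = st.sidebar.multiselect("Classes", class_names, default=class_names[:3])
selected_ind = [class_names.index(c) for c in selected_classes]

# Two side-by-side placeholders, one updated with each new frame
col1, col2 = st.columns(2)
org_frame = col1.empty()
ann_frame = col2.empty()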
Efficient Resource Management
Since we're working with potentially large models and GPU resources, it's essential to release resources after each operation.
def ClearAllResources(videocapture, st):
    if videocapture:
        videocapture.release()  # Free the webcam or video file handle
    torch.cuda.empty_cache()  # Clear GPU memory
    cv2.destroyAllWindows()  # Close any open OpenCV windows

The ClearAllResources() function ensures that the video capture device is released, GPU memory is cleared, and any OpenCV windows are closed once they are no longer needed. This keeps the app from crashing due to excessive memory usage, especially when handling high-definition videos.
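As a usage sketch, one way to guarantee the cleanup actually runs is to wrap the capture loop in try/finally; that design choice is ours here, not necessarily the app's, but it ensures resources are released even if inference raises an exception.

import cv2
import torch

videocapture = cv2.VideoCapture(file_name)
try:
    while videocapture.isOpened():
        success, frame = videocapture.read()
        if not success:
            break
        # ... run inference and update the display as shown earlier ...
finally:
    ClearAllResources(videocapture, st)  # release the camera, clear GPU cache, close windows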
Key Features
- Model Flexibility - The app supports multiple tasks (detection, segmentation, etc.) and automatically loads the appropriate models based on user selection.
- Real-Time Performance - With GPU acceleration and efficient model inference, the app can handle real-time video input without lag.
- User-Friendly Interface - The intuitive Streamlit interface allows users to quickly select their input source, task, and model, making it accessible even for non-experts.
- Customizable Thresholds - The app provides controls for confidence and IoU thresholds, allowing users to tweak detection sensitivity to fit their needs, as sketched below.
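Those threshold controls can be implemented with standard Streamlit sliders, as in this minimal sketch; the defaults shown (0.25 confidence, 0.45 IoU) are common YOLO defaults rather than values prescribed by the app.

conf = float(st.sidebar.slider("Confidence Threshold", 0.0, 1.0, 0.25, 0.01))
iou = float(st.sidebar.slider("IoU Threshold", 0.0, 1.0, 0.45, 0.01))

# Both values are passed straight through to per-frame inference
results = model(frame, conf=conf, iou=iou, classes=selected_ind)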
Conclusion
This blog provides a detailed look into how to build a Streamlit application for object detection, classification, and segmentation using Ultralytics YOLO models. The combination of Ultralytics' advanced models and Streamlit's easy-to-use interface makes this a powerful tool for anyone interested in computer vision. The code can be further expanded by integrating additional Ultralytics models or implementing advanced features like multi-class tracking and interactive feedback for model training.