Create a Custom Object Detection Model with Photos
2025-12-03 | By Hector Eduardo Tovar Mendoza
Object detection enables computers to recognize and locate objects, such as cars, people, or animals, in images, videos, or live streams. But what if you don't need a model that detects everything, and instead want it to recognize just one specific object? That's where custom object detection comes in. By creating your own dataset and training a model with images you capture, you can teach it to focus exclusively on what matters to you, be it a product, a tool, or something entirely unique.
Tools You’ll Use
For this project, you will need the following:
- Roboflow: for organizing, labeling, and augmenting your images.
- YOLO (You Only Look Once): for fast and efficient object detection.
- VS Code (the code editor we'll use for this project)
- Python (version 3.8 or newer)
- Computer (Windows, Mac, Linux)
- Webcam (built into your computer or an external one)
Installation & Tools
After installing VS Code, you must create a folder for this project. Then, go to the top bar, look for Terminal, and click on New Terminal. After this, at the bottom, you will see a terminal. There, write the following command:


The command:
- python (or python3) -m venv env
After doing this, you will see a folder called env. This is a virtual environment that helps you manage dependencies. The next step is to activate this environment:
- On Windows (PowerShell): .\env\Scripts\Activate.ps1
- On Mac/Linux: source env/bin/activate

Now you'll see a green (env) next to your prompt in the terminal. This means the environment is activated.
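If you want to double-check which interpreter is active, an optional sanity check is to ask Python where it's running from; inside the activated environment, the path should point at your project's env folder:

import sys
# With the environment activated, this prints a path inside your project's env folder
print(sys.prefix)

With the environment active, we can install the packages for this project.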
The commands are:
- pip install opencv-python
- pip install ultralytics
Finally, create a file called main.py with the New File button in VS Code.
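Before moving on, you can use main.py for a quick sanity check that everything installed correctly. This is a minimal sketch: it loads the stock yolov8n.pt weights (downloaded automatically on first run) and grabs a single frame from your webcam:

import cv2
from ultralytics import YOLO

# Load a stock pre-trained model to confirm ultralytics works;
# the yolov8n.pt weights are downloaded automatically on first run.
model = YOLO("yolov8n.pt")

# Grab one frame from the default webcam to confirm OpenCV works.
cap = cv2.VideoCapture(0)
ret, frame = cap.read()
cap.release()

if ret:
    results = model(frame)  # run inference on the captured frame
    print("Detections:", len(results[0].boxes))
else:
    print("Could not read from the webcam")

If this prints a detection count without errors, OpenCV, Ultralytics, and your webcam are all ready.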

Preparing Your Dataset
The first step in building a custom object detection model is preparing your dataset. After signing in to Roboflow, start by creating a new workspace. You can name it however you like, and then select the free public plan to get started.

After that, create a new project, which you can do by clicking the purple button.

After clicking it, you'll see the following options. For this project, select Object Detection.

Now that we've created the project, the next step is to upload photos to build our dataset. For this project, I'll cover the basics of labeling and annotation, but the main focus will be on recognizing pets, specifically dogs and cats. To make it more personal, I'll also include my own pet, Fender, in the dataset.

You can obtain sample datasets from here. Click the black button to download the dataset as a ZIP, then extract it.

Now, going back to the upload step, I added 30 photos (10 for each pet) that include dogs, cats, and of course, my own dog, Fender.

After clicking on Label Myself on the right side, a new UI will appear. Here, we need to select the Smart Polygon tool from the sidebar. This tool uses AI to speed up the labeling process by automatically detecting the object. It’s especially useful for simple shapes, though for more complex objects, other tools might be better. The best part? Smart Polygon is fast, accurate, and completely free!

Here, I simply clicked on my dog, and the tool automatically recognized the shape. In this case, it detected everything correctly, but if any parts were missing, you could just click on those areas, and the tool would try to add them. Similarly, if it includes extra areas you don’t want, a click will remove them.

At this stage, you’ll see an option on the left with a slider that lets you choose between Complex and Simple. This setting mainly matters if you’re training the model on your own GPU, since more points mean longer training times. Personally, I usually keep it in the middle for a good balance.

After clicking the green Enter button, you’ll see a section for Classes. Here, you can type the name of the object you’re annotating. For example, since I’m labeling my own dog, I’ll create a class called Fender. For other pets, we can keep it more general by using classes like Cat or Dog.

Here, you can see the points I mentioned earlier. In this example, I didn’t use manual annotation, but the points are shown as a reference. For this object, I labeled the class as Cat.

Now, for this image, I labeled it as a Dog.

Finally, once all the images are annotated, go to the top and click the green Confirm button. This will give you the option to add images to the dataset. After that, just press the purple button to complete the step.

After that, it will redirect you to the dataset, where you'll see all the photos you annotated. Then, as the arrow indicates, click on it.

Next, you'll see the step where you can apply additional effects to each photo. For now, click Continue.

Next, you’ll reach the Augmentation section. This step is really helpful when your object appears in different environments, like bright or dim lighting. I usually apply adjustments to contrast, saturation, and brightness to make the model more robust.
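To give you a feel for what these augmentations do under the hood, here's a rough OpenCV sketch of a brightness and contrast tweak. Roboflow applies these for you during generation, and the filename below is just a placeholder:

import cv2

# Placeholder filename; any image from your dataset works here.
img = cv2.imread("fender.jpeg")

# alpha scales contrast, beta shifts brightness; these values are arbitrary.
brighter = cv2.convertScaleAbs(img, alpha=1.2, beta=40)

cv2.imshow("Original", img)
cv2.imshow("Augmented", brighter)
cv2.waitKey(0)
cv2.destroyAllWindows()

Training on these shifted copies teaches the model not to rely on a single lighting condition.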



Finally, the tool generates additional photos for your dataset. These are the same images but with different effects applied, such as increased brightness, higher saturation, and other adjustments I mentioned earlier.

Now you'll see the final dataset, ready for training your model. At this stage, you can choose to train it yourself or use a platform like Roboflow. From my experience, training on your own requires a capable GPU (an NVIDIA RTX 20 series or higher), around 16 GB of RAM, and a Linux environment such as Ubuntu 22.04. If you don't meet these requirements, skip ahead to Testing and Using Your Model.

Training Your Custom Model on Your PC
For this step, based on my experience, you'll need the following:
- Laptop with a GPU such as RTX 20 series or higher
- Minimum of 16 GB RAM
- Linux distribution
- CUDA installed (link for installation)
Now that you have the requirements covered, we are going to export our dataset from Roboflow. For this, we need the following.
Open Visual Studio Code and create or open a folder; for example, mine is named training-model-tuto. Inside this folder, create a new file called model.ipynb. When you open it, VS Code may prompt you to install Jupyter. Simply accept the prompts and confirm the installations.

Jupyter notebooks are organized into blocks (or cells), where you can run code step by step. If you're working in a fresh environment, first install the packages these cells use: pip install ultralytics roboflow python-dotenv. For our first block, let's add the following code:
import torch
import cv2
from roboflow import Roboflow
import dotenv
import os

# Check that PyTorch can see your GPU (this should print True)
print("GPU Available:", torch.cuda.is_available())
# Jupyter shell command: shows the GPU's name, driver, and memory
!nvidia-smi
Now, after clicking Run on the block, we should see the following:

In the next block, we’ll export our dataset. To do this, simply add the following code in a new cell:
dotenv.load_dotenv()  # read the variables from our .env file
rf = Roboflow(api_key=os.getenv("ROBOFLOW_API_KEY"))
project = rf.workspace(os.getenv("ROBOFLOW_WORKSPACE")).project(os.getenv("ROBOFLOW_PROJECT_ID"))
version = project.version(1)  # version 1 of the dataset
dataset = version.download("yolov8")  # download in YOLOv8 format
Next, we're going to use a .env file, which helps keep sensitive information, such as passwords or API keys, secure and separate from our main code. To set this up, create a new file named .env.

In this file, we only add these three variables:
ROBOFLOW_WORKSPACE=
ROBOFLOW_API_KEY=
ROBOFLOW_PROJECT_ID=
So, how do we get these variables? In Roboflow, you’ll need to navigate to the following section:

When you click on the three dots, as shown in the image, you’ll see the option to copy the project ID. Copy it and paste it into the corresponding variable.
For the workspace, simply use the same name you chose when creating it earlier.
Finally, for the API key, make sure to use the private key. To find it, go to the left sidebar in Roboflow, open Settings, and look for the API Keys dropdown. There you’ll see both a public and a private key—copy the private key and add it to your .env file.
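Once all three values are filled in, an optional check that the notebook can actually read them, without printing the secrets themselves:

import os
import dotenv

dotenv.load_dotenv()
for var in ("ROBOFLOW_WORKSPACE", "ROBOFLOW_API_KEY", "ROBOFLOW_PROJECT_ID"):
    # Report whether each variable is set without leaking its value
    print(var, "is set" if os.getenv(var) else "is MISSING")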


Then, after adding the variables to the .env file, we should go to the versions page in our project.

Next, go to Download Dataset and apply the same settings as shown in the example. Once you click Continue, a code snippet will be generated for you.

In our code block, we should add the following, which is the same as the snippet Roboflow generates (with the API key swapped for our environment variable):
dotenv.load_dotenv()
rf = Roboflow(api_key=os.getenv("ROBOFLOW_API_KEY"))
# These workspace and project IDs are from my project; yours will differ
project = rf.workspace("larc-hbgqy").project("digikey-svcln")
version = project.version(1)
export_info = version.export("yolov8")  # generate a YOLOv8 export for this version
dataset = version.download("yolov8")  # download it into a local folder
Click the Run button, and your dataset will be downloaded. Once it's ready, create a new code block and add the following:
from ultralytics import YOLO

# Load the pre-trained YOLOv8n model.
# "n" = nano (fastest, lightest) -> good for low CPU/GPU.
# You can switch to "yolov8s.pt" (small), "yolov8m.pt" (medium),
# "yolov8l.pt" (large), or "yolov8x.pt" (extra large).
# ⚠️ If you have limited resources, keep "yolov8n.pt".
model = YOLO("yolov8n.pt")

# Train the model with the downloaded dataset
results = model.train(
    data="digiKey-1/data.yaml",  # Make sure the path points to your dataset
    epochs=50,                   # ↓ Lower this (e.g. 10–20) if training is too slow
    imgsz=640,                   # ↓ Reduce (e.g. 320 or 416) to save memory/CPU
    batch=4,                     # ↓ Decrease (e.g. 2 or even 1) if you get OOM errors
    perspective=0.0005,          # Data augmentation (safe to keep; disable to speed up)
    scale=0.4,                   # Same as above
    translate=0.04,              # Same as above
    degrees=5,                   # Same as above
    shear=1,                     # Same as above
    hsv_s=0.5,                   # Same as above
    hsv_v=0.3,                   # Same as above
    flipud=0.2,                  # Same as above
    patience=15,                 # Early stopping patience (reduce for faster experiments)
    save_period=20,              # How often to save checkpoints (increase to save disk space)
    name="pet-model",            # Change to your preferred model name
    project="runs/detect",       # Training results folder
    exist_ok=True                # Overwrite previous runs if they exist
)
# NOTE: Training may take a while depending on your hardware.
Before running the block, open the newly created folder and edit the data.yaml file. Update the paths for train, test, and val as shown.
train: train/images
test: train/images
val: train/images
Since this is a quick demo model, we don’t need to use test or val, but for a more robust model, including them is highly recommended. After making these changes, click the Run button on the block to start training your model.
Training time can vary depending on your GPU; it may take anywhere from a few minutes to several hours. If you have a larger dataset, it will take longer. And if you’re using a laptop, make sure it’s plugged in to allow the GPU to run at full performance.

If it works, you should see the following output. To use your model with a webcam, or just to validate it on photos, you can use the following code:
from ultralytics import YOLO
import cv2

# Load your trained model
model = YOLO("runs/detect/pet-model/weights/best.pt")

# ----------------------------
# 1) Run inference on a photo
# ----------------------------
def test_on_photo(image_path):
    results = model(image_path)  # show=True omitted; we display with OpenCV instead
    # Get the annotated image (with the detections drawn)
    annotated_img = results[0].plot()
    # Show it in an OpenCV window
    cv2.imshow("YOLOv8 Photo Detection", annotated_img)
    # Wait until a key is pressed
    cv2.waitKey(0)
    # Close the windows
    cv2.destroyAllWindows()

# ----------------------------
# 2) Real-time detection (webcam)
# ----------------------------
def test_on_webcam():
    cap = cv2.VideoCapture(0)  # 0 = default camera
    if not cap.isOpened():
        print("❌ Error: Could not open webcam")
        return
    while True:
        ret, frame = cap.read()
        if not ret:
            print("❌ Failed to grab frame")
            break
        # Run YOLO inference on the frame
        results = model(frame)
        # Draw results on the frame
        annotated_frame = results[0].plot()
        # Show the frame
        cv2.imshow("YOLOv8 Detection (Press Q to quit)", annotated_frame)
        # Exit on pressing 'q'
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()

# Example usage:
test_on_photo("fender.jpeg")
I’m using the test_on_photo function, so all you need to do is add the filename of the photo as an argument. When you run it, the function will display the result. This photo is new, meaning it wasn’t part of the dataset used for training.
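If you kept separate validation or test splits in data.yaml, you can also get quantitative metrics rather than eyeballing single photos. Here's a minimal sketch using Ultralytics' built-in validation, assuming the same paths as above:

from ultralytics import YOLO

# Load the trained weights and evaluate on the val split from data.yaml
model = YOLO("runs/detect/pet-model/weights/best.pt")
metrics = model.val(data="digiKey-1/data.yaml")
print(metrics.box.map)  # mAP50-95 averaged over all classes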
Testing and Using Your Model
Now, to train on Roboflow instead of your own PC, we can just click the purple button and then select the Roboflow Instant Model.

After a few minutes, the model finishes training, and you can start uploading new photos. In my case, I added some new images, and the model instantly recognized Fender, cats, and dogs. At this point, we have a well-trained model ready to detect our objects accurately.


In this project, we went through the entire process of creating a custom object detection model: uploading and labeling images, applying augmentations, generating additional dataset variations, and finally training the model. We saw how it can instantly recognize our objects, like Fender, cats, and dogs.
To build a really robust model, it's important to take lots of photos and keep your capture process consistent, especially when applying augmentations. This helps the model handle different conditions, such as varying lighting, angles, or backgrounds.
I really encourage you to experiment with your own objects, from personal items to pets or anything unique you want your model to detect. The more diverse and well-annotated your dataset, the smarter and more reliable your model will become.

