ControlNet Segmentation: How to Use and Explanation

ControlNet has so many features that it can be hard to know which one to use. To help with that, we are walking through practical use cases for ControlNet, and in this article we focus on ControlNet Segmentation. By the end, you should have answers to the following questions:

  • What is ControlNet Segmentation?
  • What are the specific practical uses of ControlNet Segmentation?
  • How does it differ from similar methods such as ControlNet Depth and Normal Map?

What is Segmentation?

Segmentation is a technique used to separate and identify different objects or regions within an image or scene. It involves labeling each pixel or group of pixels to create a “map” of the different areas present in the image.
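
To make that concrete, here is a minimal sketch in Python of how a per-pixel label map becomes the kind of color-coded "map" described above. The tiny 4x6 "image", the labels, and the road color are made up for illustration (the sky and building colors loosely follow the ADE20K palette convention):

```python
import numpy as np

# A 4x6 "image" where every pixel carries a class label.
labels = np.array([
    [0, 0, 0, 0, 0, 0],   # 0 = sky
    [0, 0, 1, 1, 0, 0],   # 1 = building
    [2, 2, 1, 1, 2, 2],   # 2 = road
    [2, 2, 2, 2, 2, 2],
])
palette = {0: (6, 230, 230), 1: (180, 120, 120), 2: (140, 140, 140)}

# Render each label as a flat color region to form the segmentation map.
seg_map = np.zeros((*labels.shape, 3), dtype=np.uint8)
for class_id, color in palette.items():
    seg_map[labels == class_id] = color
print(seg_map.shape)  # (4, 6, 3)
```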

Difference between “Generating Images from Only Prompt” and “Using Segmentation”

When we refer to “generating images from only a prompt,” we mean creating an image from a text description alone, leaving composition and object placement entirely up to the model. Using segmentation, by contrast, supplies a labeled map of regions as an additional input, letting you control where specific areas or objects appear in the result.

Difference from ControlNet Depth and Normal Map

ControlNet Depth and Normal Map both capture spatial information about a scene: Depth estimates how far each object is from the camera, while Normal Map estimates the orientation (normal vectors) of each surface. Segmentation, on the other hand, identifies and separates regions or objects by what they are, based on their visual characteristics, rather than by their 3D shape.
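
If you want to see the difference for yourself, the controlnet_aux package can extract depth and normal maps from the same image. This is a sketch; the input file name is a placeholder:

```python
from PIL import Image
from controlnet_aux import MidasDetector, NormalBaeDetector

image = Image.open("room.png").convert("RGB")  # placeholder file name

# Depth: per-pixel distance from the camera (nearer = brighter).
depth = MidasDetector.from_pretrained("lllyasviel/Annotators")(image)
depth.save("depth.png")

# Normal: per-pixel surface orientation encoded as RGB.
normal = NormalBaeDetector.from_pretrained("lllyasviel/Annotators")(image)
normal.save("normal.png")
```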

What can be achieved with Segmentation?

Reconstruction from Silhouettes

Segmenting foreground and background regions allows for reconstructing missing information or inferring the shape and appearance of objects.
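
As a rough sketch, a silhouette can be promoted to a two-region segmentation map by assigning each side of the mask a class color. The file names are placeholders, and the two colors follow the ADE20K palette convention used by the seg preprocessors (verify them against the palette your annotator uses):

```python
import numpy as np
from PIL import Image

# White pixels in the silhouette count as foreground.
silhouette = np.array(Image.open("silhouette.png").convert("L"))
mask = silhouette > 127

# Paint the two regions with class colors so ControlNet can read them.
seg_map = np.zeros((*mask.shape, 3), dtype=np.uint8)
seg_map[~mask] = (6, 230, 230)  # background -> "sky" (ADE20K convention)
seg_map[mask] = (150, 5, 61)    # foreground -> "person" (ADE20K convention)
Image.fromarray(seg_map).save("seg_map.png")
```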

Generating Images with Consistent Composition

With segmented regions, it becomes possible to generate new images with consistent compositions, preserving the arrangement of objects while changing their visual attributes.
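
Here is what that looks like outside the WebUI, using the diffusers library with the same control_v11p_sd15_seg model and the map saved in the previous sketch. The base checkpoint name is the standard SD 1.5 example and may need to be swapped for whichever SD 1.5 model you have available:

```python
import torch
from PIL import Image
from diffusers import (
    ControlNetModel,
    StableDiffusionControlNetPipeline,
    UniPCMultistepScheduler,
)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_seg", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

seg_map = Image.open("seg_map.png")  # color-coded map from a preprocessor

# Same composition, different look: keep the map, vary the prompt.
prompts = ["photo of a modern house at noon", "watercolor painting of a house"]
for i, prompt in enumerate(prompts):
    image = pipe(prompt, image=seg_map, num_inference_steps=20).images[0]
    image.save(f"out_{i}.png")
```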

How to use ControlNet Segmentation?

Preparing for ControlNet Segmentation

ControlNet Segmentation is a feature of ControlNet, an extension of Stable Diffusion Web UI. To use ControlNet Segmentation, ControlNet must be installed. If you have not installed it yet, please refer to the following article on how to install ControlNet.

What is ControlNet? What Can It Do? A Comprehensive Guide to Installing ControlNet on Stable Diffusion Web UI (kindanai.com)

Installing ControlNet Segmentation

To use ControlNet Segmentation, you need the ControlNet Model. Download the following two files from the link below and place them in stable-diffusion-webui/models/ControlNet.

  • control_v11p_sd15_seg.pth
  • control_v11p_sd15_seg.yaml

lllyasviel/ControlNet-v1-1 at main (huggingface.co)
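
If you prefer scripting the download, the huggingface_hub package can fetch both files directly into the model folder:

```python
from huggingface_hub import hf_hub_download

# Adjust local_dir to wherever your WebUI is installed.
for name in ["control_v11p_sd15_seg.pth", "control_v11p_sd15_seg.yaml"]:
    hf_hub_download(
        repo_id="lllyasviel/ControlNet-v1-1",
        filename=name,
        local_dir="stable-diffusion-webui/models/ControlNet",
    )
```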

Using ControlNet Segmentation

Follow the steps below to configure the ControlNet menu.

  1. Enter the prompt for the image generation.
  2. Click the “▼” to open the ControlNet menu.
  3. Set the reference image in the ControlNet menu screen.
  4. Check the “Enable” box to activate ControlNet.
  5. Select “Segmentation” for the Control Type. This will set up the Preprocessor and ControlNet Model.
  6. Click the feature extraction button “💥” to run the preprocessor. The resulting segmentation map will be displayed.
  7. With ControlNet Segmentation applied, click “Generate” to create the image.
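
The same steps can be scripted against the WebUI API (launch the WebUI with the --api flag). The exact ControlNet argument names vary between extension versions, so treat this payload as a sketch to check against your installation's /docs page:

```python
import base64
import requests

# The reference image, sent as base64 (file name is a placeholder).
with open("input.png", "rb") as f:
    reference = base64.b64encode(f.read()).decode()

payload = {
    "prompt": "a cozy living room, sunlight",
    "steps": 20,
    "alwayson_scripts": {
        "controlnet": {
            "args": [{
                "enabled": True,            # step 4: Enable
                "image": reference,         # step 3: reference image
                "module": "seg_ofade20k",   # step 5: Preprocessor
                "model": "control_v11p_sd15_seg",  # use the exact name from the WebUI dropdown
            }]
        }
    },
}
r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
r.raise_for_status()  # generated images come back base64-encoded in r.json()["images"]
```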

ControlNet Segmentation Preprocessors

The following are the details of each preprocessing option for ControlNet Segmentation:

  • seg_ofade20k: Generates segmentation maps with OneFormer (“of”) trained on the ADE20K dataset, a comprehensive dataset for scene understanding and object categorization. This preprocessor accurately identifies and classifies the elements within an image.
  • seg_ofcoco: Generates segmentation maps with OneFormer trained on the COCO dataset, which is widely used for computer vision tasks such as object detection, segmentation, and captioning. Like seg_ofade20k, it detects and categorizes the objects present in the image.
  • seg_ufade20k: Generates segmentation maps with the older UniFormer (“uf”), also trained on ADE20K, but its output quality is generally lower than that of seg_ofade20k and seg_ofcoco. *1

Each preprocessor produces a segmentation map that the ControlNet Segmentation model then uses as a conditional input during image generation.

It is strongly recommended to choose either seg_ofade20k or seg_ofcoco as they offer significantly better performance compared to seg_ufade20k.
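
Under the hood, the “of” preprocessors run OneFormer on different training datasets. A rough equivalent using the transformers library looks like this; the checkpoint names are the public shi-labs releases, not necessarily the exact weights the WebUI extension bundles:

```python
import torch
from PIL import Image
from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation

ckpt = "shi-labs/oneformer_ade20k_swin_tiny"  # or "shi-labs/oneformer_coco_swin_large"
processor = OneFormerProcessor.from_pretrained(ckpt)
model = OneFormerForUniversalSegmentation.from_pretrained(ckpt)

image = Image.open("room.png").convert("RGB")  # placeholder file name
inputs = processor(images=image, task_inputs=["semantic"], return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# H x W tensor of per-pixel class ids, ready to be colored into a map.
labels = processor.post_process_semantic_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0]
```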

Related: GitHub – continue-revolution/sd-webui-segment-anything: Segment Anything for Stable Diffusion WebUI

*1: For enhanced accuracy, it is advisable to use seg_ofade20k and seg_ofcoco instead of seg_ufade20k.
