ControlNet reference only: freely modify images. A detailed usage guide.

Have you ever wished to generate multiple images of a person using just one image?

ControlNet reference only is the perfect tool to fulfill this desire. With this technique, you can easily modify the background, pose, clothing, and many other aspects.

In this article, we will guide you through the installation process, demonstrate how to use ControlNet reference only, present validation results, and provide detailed settings. By the end of this article, you’ll have a comprehensive understanding of ControlNet reference only.

What does “ControlNet reference only” mean?

“ControlNet reference only” is a technology that enables style transfer. To put it simply, it is a function that allows for the conversion or generation of images while preserving the features that are not specified in the given prompt. For instance, you can modify an image by specifying changes to the clothing or background while keeping the face unchanged. In other words, you can create a new image based on a single image, reflecting the conditions expressed in the prompt.

Distinctions from basic img2img

Similar tasks can be accomplished with img2img, but the difference lies in the level of accuracy in preserving the original image’s characteristics. Particularly, facial features are prone to alteration with img2img, whereas using ControlNet reference only ensures preservation.

Possible applications of ControlNet reference include:

  • Clothing transformation: ControlNet reference can be used to modify or transform the appearance of clothing in digital imagery.
  • Background transformation: ControlNet reference offers the ability to alter the background or environment in which the subject is depicted.
  • Pose transformation: ControlNet reference allows for the manipulation or adjustment of the pose or position of objects or individuals in an image.
  • Live-action to animation conversion: ControlNet reference enables the conversion of live-action footage into animation by applying digital enhancements or modifications.

The range of possibilities is vast and limited only by imagination. However, it is important to acknowledge that ControlNet reference is still a relatively new technology and may have some stability issues. Therefore, this study aims to explore the efficacy of using ControlNet reference in various scenarios.

How to Use ControlNet Reference

Installing ControlNet

To utilize ControlNet Reference, you must first install ControlNet, an extension of the Stable Diffusion Web UI. If you have not yet installed ControlNet, please refer to the following article for installation instructions:

What is ControlNet? What Can It Do? A Comprehensive Guide to Installing ControlNet on Stable Diffusion Web UI

No Model Download Required for ControlNet Reference

Normally, when using ControlNet, you would need to download a model. However, with ControlNet Reference, there is no need to download any models.
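To make the "no model" point concrete, here is a minimal sketch of what a ControlNet unit looks like when configured for reference mode. The field names (`module`, `model`, `control_mode`) follow the sd-webui-controlnet extension's JSON format and should be treated as assumptions, not guarantees:

```python
# Sketch of a ControlNet unit configured for reference_only.
# Field names are assumed from the sd-webui-controlnet extension's API.
unit = {
    "module": "reference_only",  # the preprocessor does all the work
    "model": "None",             # no .safetensors model file is needed
    "weight": 1.0,
    "control_mode": "Balanced",
}
print(unit["model"])  # reference mode runs with no downloaded model
```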

Steps for Usage in the Web UI

ControlNet reference only: converting clothing to a t-shirt.

We have successfully converted a uniform into a t-shirt using ControlNet reference only, while ensuring that the facial features remain unchanged.

Prompt: 1girl, a 20 years old pretty Japanese girl in classroom. blackboard, t-shirts
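As a rough sketch, the same clothing-swap generation could also be driven through the Web UI's API rather than the GUI. The payload shape below follows the sd-webui-controlnet extension's JSON format as an assumption, and `ref_b64` is a placeholder for your base64-encoded reference image:

```python
import base64
import json

# Placeholder bytes standing in for the real reference portrait.
ref_b64 = base64.b64encode(b"<image bytes>").decode()

# Payload for POST /sdapi/v1/txt2img, with the reference image attached
# as a ControlNet unit (field names assumed from the extension's API).
payload = {
    "prompt": "1girl, a 20 years old pretty Japanese girl in classroom. "
              "blackboard, t-shirts",
    "steps": 20,
    "alwayson_scripts": {
        "controlnet": {
            "args": [{
                "input_image": ref_b64,
                "module": "reference_only",
                "model": "None",  # reference mode needs no model file
            }]
        }
    },
}
print(json.dumps(payload)[:40])
```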

I transformed the background to a street scene.

Prompt: 1girl, a 20 years old pretty Japanese girl on the uniform

The image is provided below:

Subsequently, I altered the backdrop once again, this time to a beach environment, while also changing the outfit to a white bikini.

Prompt: 1girl, a 20 years old pretty Japanese girl on the beach.white bikini

Finally, I changed the background, clothing, and pose all at once.

Prompt: 1girl, a 20 years old pretty Japanese girl on the beach.white bikini, arms up

Anime adaptation with ControlNet reference only

We can create an anime adaptation of a live-action image by changing not only the prompt but also the model.

Model: AnythingV5Ink_ink

The prompt has been changed to: A pretty Japanese girl, 20 years old, in a classroom wearing a school uniform, standing in front of a blackboard.

Let’s generate the image again with these modifications.

Even after the anime adaptation, traces of the original image can still be seen.


Conversely, you can also convert an anime image into a realistic one. To do so, follow these steps:

  1. Use the AnythingV5Ink_ink model to generate the original image and save it in ControlNet with the reference_only setting.
  2. Switch the model to beautifulRealistic_brav5 and generate the image.
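The two steps above could be sketched as a sequence of Web UI API calls. The `/sdapi/v1/options` endpoint and the `sd_model_checkpoint` option name are assumptions based on the Web UI's API and may differ in your setup:

```python
# Hypothetical sketch of the anime-to-realistic workflow as API calls.
workflow = [
    # Step 1: load the anime model and generate the reference image.
    ("POST /sdapi/v1/options", {"sd_model_checkpoint": "AnythingV5Ink_ink"}),
    ("POST /sdapi/v1/txt2img", {"prompt": "a pretty Japanese girl in a classroom"}),
    # Step 2: switch to the realistic model and generate again, feeding
    # the step-1 output to ControlNet as the reference_only image.
    ("POST /sdapi/v1/options", {"sd_model_checkpoint": "beautifulRealistic_brav5"}),
    ("POST /sdapi/v1/txt2img", {"prompt": "a pretty Japanese girl in a classroom"}),
]
for endpoint, body in workflow:
    print(endpoint, sorted(body))
```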

The generated images are shown below.

Although the images appear realistic, they may seem slightly unnatural. To address this, change the Control Mode to “My prompt is more important.”

The resulting images are as follows:


There doesn’t appear to be a significant change between the images.

Explanation of the ControlNet reference only settings

Style Fidelity

Style Fidelity refers to the level of faithfulness to the style. A higher Style Fidelity means that the reference image has more influence and the prompt has less influence. Conversely, a lower Style Fidelity means that the reference image has less influence and the prompt has more influence. This setting is only effective when the Control Mode is set to “Balanced”.
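A small sketch of how this slider behaves, if driving the Web UI through its API: in the sd-webui-controlnet extension, the Style Fidelity value is reportedly passed as the unit's `threshold_a` field, so treat that field name as an assumption rather than documented behavior:

```python
# Style Fidelity: 0.0 lets the prompt dominate, 1.0 lets the reference
# image dominate. The "threshold_a" field name is an assumption about
# how the extension's API carries this slider.
def reference_unit(style_fidelity: float) -> dict:
    if not 0.0 <= style_fidelity <= 1.0:
        raise ValueError("Style Fidelity must be in [0, 1]")
    return {
        "module": "reference_only",
        "model": "None",
        "control_mode": "Balanced",   # Style Fidelity only applies here
        "threshold_a": style_fidelity,
    }

print(reference_unit(0.5)["threshold_a"])  # the recommended default
```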


The reference preprocessor has three options:

  • reference_only (attn)
  • reference_adain (adain)
  • reference_adain+attn (adain + attn)

The default option is reference_only. The reference_adain Preprocessor was added based on the latest research mentioned in the following article:

Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization

According to the author’s recommendation:

  1. “reference_adain+attn, Style Fidelity=1.0” is the current state-of-the-art method, but it can be overly strong, so it is not the recommended default.
  2. It is still recommended to use “reference_only + Style Fidelity=0.5” as the default option because it is more robust.

Therefore, it is suggested to use the default option for stability. I have also tested it and found that using the default option produces good results.

Preprocessor Comparison

Let’s compare the default option and the latest method using the previous examples.

Left: Reference Image, Middle: “reference_only + Style Fidelity=0.5”, Right: “reference_adain+attn, Style Fidelity=1.0”

The results are shown in the provided image.

The latest method retains some features of the original image, but it can produce noticeably different results.