Training a DreamBooth model with diffusers

0xSoul posted on 2023-7-26 20:05:32

# DreamBooth

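[DreamBooth](https://arxiv.org/abs/2208.12242) is a method for personalizing text-to-image models such as Stable Diffusion using only a few (3-5) images of a subject. It allows the model to generate contextualized images of the subject in different scenes, poses, and views.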

![DreamBooth examples](https://dreambooth.github.io/DreamBooth_files/teaser_static.jpg)

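DreamBooth examples from the [project's blog](https://dreambooth.github.io/). This guide shows you how to finetune DreamBooth with the [`CompVis/stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4) model for various GPU sizes, and with Flax. All the DreamBooth training scripts used in this guide can be found [here](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth) if you're interested in digging deeper and seeing how things work.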

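Before running the scripts, make sure you install the library's training dependencies. We also recommend installing 🧨 Diffusers from the `main` GitHub branch: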

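```
pip install git+https://github.com/huggingface/diffusers
# The requirements file (and the training scripts) live in the repository, so clone it first
git clone https://github.com/huggingface/diffusers
pip install -U -r diffusers/examples/dreambooth/requirements.txt
```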

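xFormers is not part of the training requirements, but we recommend [installing it](https://huggingface.co/docs/diffusers/v0.18.2/en/optimization/xformers) if you can, because it can make training faster and less memory-intensive.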

After all the dependencies have been set up, initialize a [🤗 Accelerate](https://github.com/huggingface/accelerate/) environment with:

```
accelerate config
```

To set up a default 🤗 Accelerate environment without choosing any configurations:

```
accelerate config default
```

Or, if your environment doesn't support an interactive shell (like a notebook), you can use:

```
from accelerate.utils import write_basic_config

write_basic_config()
```

Finally, download a [few images of a dog](https://huggingface.co/datasets/diffusers/dog-example) to use with DreamBooth:

```
from huggingface_hub import snapshot_download

local_dir = "./dog"
snapshot_download(
    "diffusers/dog-example",
    local_dir=local_dir,
    repo_type="dataset",
    ignore_patterns=".gitattributes",
)
```
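Before training, it can help to confirm what is actually in the instance directory. As we read the example script, it treats every entry in `--instance_data_dir` as an image, which is presumably why `.gitattributes` is excluded above. The following is a minimal sketch using only the Python standard library; the helper name `split_instance_dir` and the extension list are our own assumptions, not part of diffusers:

```python
from pathlib import Path
import tempfile

# Assumed set of image extensions; adjust for your own data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def split_instance_dir(instance_dir):
    """Return (image_names, other_names) for a DreamBooth instance directory."""
    images, others = [], []
    for entry in sorted(Path(instance_dir).iterdir()):
        if entry.is_file() and entry.suffix.lower() in IMAGE_SUFFIXES:
            images.append(entry.name)
        else:
            # Anything else (dotfiles, folders, text files) could trip up the loader
            others.append(entry.name)
    return images, others


# Demo on a throwaway directory standing in for ./dog
with tempfile.TemporaryDirectory() as tmp:
    for name in ("dog1.jpeg", "dog2.png", ".gitattributes"):
        (Path(tmp) / name).touch()
    images, others = split_instance_dir(tmp)

print("images:", images)
print("non-image entries:", others)
if not 3 <= len(images) <= 5:
    print(f"Note: found {len(images)} images; DreamBooth typically uses 3-5.")
```

Point the same function at `./dog` (or your own folder) and remove anything listed as a non-image entry before launching training.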

To use your own dataset, take a look at the [dataset creation](https://huggingface.co/docs/diffusers/v0.18.2/en/training/create_dataset) guide.

## Finetuning

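DreamBooth finetuning is very sensitive to hyperparameters and prone to overfitting. We recommend taking a look at our [in-depth analysis](https://huggingface.co/blog/dreambooth), which includes recommended settings for different subjects, to help you choose appropriate hyperparameters.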

Set the `INSTANCE_DIR` environment variable to the path of the directory containing the dog images.

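Specify the `MODEL_NAME` environment variable (either a Hub model repository id or a path to the directory containing the model weights) and pass it to the `pretrained_model_name_or_path` argument. The `instance_prompt` argument is a text prompt that contains a unique identifier, such as `sks`, and the class the image belongs to, which in this example is `a photo of sks dog`.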

```
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="./dog"
export OUTPUT_DIR="path_to_saved_model"
```

Then you can launch the training script with the following command (you can find the full training script [here](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py)):

```
accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--instance_prompt="a photo of sks dog" \
--resolution=512 \
--train_batch_size=1 \
--gradient_accumulation_steps=1 \
--learning_rate=5e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=400 \
--push_to_hub
```
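If you launch runs from Python instead of a shell, building the argument list programmatically avoids line-continuation mistakes such as a missing space before `\`. A minimal sketch, assuming a hypothetical helper `build_launch_command` (not part of diffusers or 🤗 Accelerate) that maps `True` to a bare flag such as `--push_to_hub` and drops `False`/`None` options:

```python
import shlex


def build_launch_command(script, options, launcher=("accelerate", "launch")):
    """Build an argv list for a DreamBooth training script.

    True -> bare flag (e.g. --push_to_hub); False/None -> omitted;
    any other value -> --name=value.
    """
    argv = [*launcher, script]
    for name, value in options.items():
        if value is True:
            argv.append(f"--{name}")
        elif value is False or value is None:
            continue
        else:
            argv.append(f"--{name}={value}")
    return argv


# Same settings as the shell command above
options = {
    "pretrained_model_name_or_path": "CompVis/stable-diffusion-v1-4",
    "instance_data_dir": "./dog",
    "output_dir": "path_to_saved_model",
    "instance_prompt": "a photo of sks dog",
    "resolution": 512,
    "train_batch_size": 1,
    "gradient_accumulation_steps": 1,
    "learning_rate": 5e-6,
    "lr_scheduler": "constant",
    "lr_warmup_steps": 0,
    "max_train_steps": 400,
    "push_to_hub": True,
}
cmd = build_launch_command("train_dreambooth.py", options)
print(shlex.join(cmd))  # shell-safe string, handy for logging or copy-pasting
# import subprocess; subprocess.run(cmd, check=True)  # uncomment to actually start training
```

Passing the list straight to `subprocess.run` sidesteps shell quoting entirely; `shlex.join` (Python 3.8+) is only used to print a readable command. Python formats `5e-6` as `5e-06`, which a float-typed command-line argument accepts.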

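If you have access to TPUs or want to train even faster, you can try out the [Flax training script](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_flax.py). The Flax training script doesn't support gradient checkpointing or gradient accumulation, so you'll need a GPU with at least 30GB of memory.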

Before running the script, make sure you have the requirements installed:

```
pip install -U -r requirements.txt
```

Now you can launch the training script with the following command:

```
export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
export INSTANCE_DIR="./dog"
export OUTPUT_DIR="path-to-save-model"

python train_dreambooth_flax.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--instance_prompt="a photo of sks dog" \
--resolution=512 \
--train_batch_size=1 \
--learning_rate=5e-6 \
--max_train_steps=400 \
--push_to_hub
```

## Finetuning with prior-preserving loss

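Prior preservation is used to avoid overfitting and language drift (check out the [paper](https://arxiv.org/abs/2208.12242) to learn more if you're interested). For prior preservation, you use other images of the same class as part of the training process. The nice thing is that you can generate those images using the Stable Diffusion model itself! The training script saves the generated images to a local path you specify.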

The authors recommend generating `num_epochs * num_samples` images for prior preservation. In most cases, 200-300 images work well.
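To make the role of `--prior_loss_weight` concrete, here is a toy sketch of how a prior-preservation loss combines its two terms. It assumes, based on our reading of the example PyTorch script, that each batch stacks instance examples first and generated class examples second, and that the loss is `instance MSE + prior_loss_weight * class MSE`. Plain floats stand in for the noise-prediction tensors, and the function names are our own:

```python
def mse(pred, target):
    """Mean squared error over two equal-length sequences of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def prior_preservation_loss(model_pred, target, prior_loss_weight=1.0):
    """Toy DreamBooth loss with prior preservation.

    Assumes the batch is laid out as [instance examples..., class examples...]
    with equal counts of each.
    """
    if len(model_pred) != len(target) or len(model_pred) % 2:
        raise ValueError("expected equal, even-length prediction and target batches")
    half = len(model_pred) // 2
    instance_loss = mse(model_pred[:half], target[:half])  # fits the new subject
    prior_loss = mse(model_pred[half:], target[half:])     # keeps the class prior
    return instance_loss + prior_loss_weight * prior_loss


# Two instance examples followed by two class examples
pred = [1.0, 2.0, 0.0, 0.0]
target = [1.0, 0.0, 1.0, 1.0]
print(prior_preservation_loss(pred, target, prior_loss_weight=1.0))  # 3.0
print(prior_preservation_loss(pred, target, prior_loss_weight=0.5))  # 2.5
```

Lowering `--prior_loss_weight` lets the model drift further from the class prior in exchange for fitting the instance images more closely.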

```
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="./dog"
export CLASS_DIR="path_to_class_images"
export OUTPUT_DIR="path_to_saved_model"

accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="a photo of sks dog" \
--class_prompt="a photo of dog" \
--resolution=512 \
--train_batch_size=1 \
--gradient_accumulation_steps=1 \
--learning_rate=5e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--num_class_images=200 \
--max_train_steps=800 \
--push_to_hub
```

**JAX**

```
export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
export INSTANCE_DIR="./dog"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"

python train_dreambooth_flax.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="a photo of sks dog" \
--class_prompt="a photo of dog" \
--resolution=512 \
--train_batch_size=1 \
--learning_rate=5e-6 \
--num_class_images=200 \
--max_train_steps=800 \
--push_to_hub
```

## Finetuning the text encoder and UNet

The script also allows you to finetune the `text_encoder` along with the `unet`. In our experiments (check out the [in-depth analysis](https://huggingface.co/blog/dreambooth) blog post for more details), this yields much better results, especially when generating images of faces.

Training the text encoder requires additional memory and it won’t fit on a 16GB GPU. You’ll need at least 24GB VRAM to use this option.

Pass the `--train_text_encoder` argument to the training script to enable finetuning the `text_encoder` and `unet`:

**PyTorch**

```
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="./dog"
export CLASS_DIR="path_to_class_images"
export OUTPUT_DIR="path_to_saved_model"

accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--train_text_encoder \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="a photo of sks dog" \
--class_prompt="a photo of dog" \
--resolution=512 \
--train_batch_size=1 \
--use_8bit_adam \
--gradient_checkpointing \
--learning_rate=2e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--num_class_images=200 \
--max_train_steps=800 \
--push_to_hub
```

**JAX**

```
export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
export INSTANCE_DIR="./dog"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"

python train_dreambooth_flax.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--train_text_encoder \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="a photo of sks dog" \
--class_prompt="a photo of dog" \
--resolution=512 \
--train_batch_size=1 \
--learning_rate=2e-6 \
--num_class_images=200 \
--max_train_steps=800 \
--push_to_hub
```

## Finetuning with LoRA

You can also use Low-Rank Adaptation of Large Language Models (LoRA), a fine-tuning technique for accelerating the training of large models, with DreamBooth. For more details, take a look at the [LoRA training](https://huggingface.co/docs/diffusers/v0.18.2/en/training/lora#dreambooth) guide.

## Saving checkpoints while training

It’s easy to overfit while training with Dreambooth, so sometimes it’s useful to save regular checkpoints during the training process. One of the intermediate checkpoints might actually work better than the final model! Pass the following argument to the training script to enable saving checkpoints:

```
--checkpointing_steps=500
```

This saves the full training state in subfolders of your `output_dir`. Subfolder names begin with the prefix `checkpoint-`, followed by the number of steps performed so far; for example, `checkpoint-1500` would be a checkpoint saved after 1500 training steps.
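A small sketch of which `checkpoint-` folders to expect, assuming (as in our reading of the example script) that a checkpoint is written whenever the global step count is a multiple of `--checkpointing_steps`. The helper name is hypothetical:

```python
def expected_checkpoints(max_train_steps, checkpointing_steps):
    """Names of the checkpoint-<step> folders a run should leave in output_dir."""
    return [
        f"checkpoint-{step}"
        # Every multiple of checkpointing_steps up to and including max_train_steps
        for step in range(checkpointing_steps, max_train_steps + 1, checkpointing_steps)
    ]


print(expected_checkpoints(2000, 500))
print(expected_checkpoints(400, 500))  # the 400-step example never reaches step 500
```

With the 400-step example from the Finetuning section and `--checkpointing_steps=500`, no intermediate checkpoint would be written, so choose an interval smaller than `--max_train_steps`.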

### Resume training from a saved checkpoint

If you want to resume training from any of the saved checkpoints, you can pass the argument `--resume_from_checkpoint` to the script and specify the name of the checkpoint you want to use. You can also use the special string `"latest"` to resume from the last saved checkpoint (the one with the largest number of steps). For example, the following would resume training from the checkpoint saved after 1500 steps:

```
--resume_from_checkpoint="checkpoint-1500"
```

This is a good opportunity to tweak some of your hyperparameters if you wish.
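As a sketch of how `"latest"` can be resolved, the helper below picks the `checkpoint-<step>` folder with the largest step number. It mirrors our reading of the example script but is a standalone illustration with a hypothetical function name. Sorting numerically matters: a plain string sort would put `checkpoint-500` after `checkpoint-1500`.

```python
import os
import tempfile


def resolve_checkpoint(output_dir, resume_from_checkpoint):
    """Return the checkpoint folder name to resume from, or None if none exist."""
    if resume_from_checkpoint != "latest":
        # An explicit name or path: keep only the folder name
        return os.path.basename(resume_from_checkpoint)
    dirs = [d for d in os.listdir(output_dir) if d.startswith("checkpoint")]
    # Sort by the step number after the dash, not alphabetically
    dirs.sort(key=lambda d: int(d.split("-")[1]))
    return dirs[-1] if dirs else None


# Demo: a fake output_dir with three checkpoints
with tempfile.TemporaryDirectory() as output_dir:
    for step in (500, 1000, 1500):
        os.makedirs(os.path.join(output_dir, f"checkpoint-{step}"))
    latest = resolve_checkpoint(output_dir, "latest")

print(latest)
```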

### Inference from a saved checkpoint

Saved checkpoints are stored in a format suitable for resuming training. They not only include the model weights, but also the state of the optimizer, data loaders, and learning rate.

If you have **`"accelerate>=0.16.0"`** installed, use the following code to run inference from an intermediate checkpoint.

```
from diffusers import DiffusionPipeline, UNet2DConditionModel
from transformers import CLIPTextModel
import torch

# Load the pipeline with the same arguments (model, revision) that were used for training
model_id = "CompVis/stable-diffusion-v1-4"

# Components saved in an intermediate checkpoint (replace with your own checkpoint path)
unet = UNet2DConditionModel.from_pretrained(
    "/sddata/dreambooth/daruma-v2-1/checkpoint-100/unet", torch_dtype=torch.float16
)

# If you trained with `--train_text_encoder`, make sure to also load the text encoder
text_encoder = CLIPTextModel.from_pretrained(
    "/sddata/dreambooth/daruma-v2-1/checkpoint-100/text_encoder", torch_dtype=torch.float16
)

# `torch_dtype` (not `dtype`) sets the precision of the components loaded from `model_id`
pipeline = DiffusionPipeline.from_pretrained(
    model_id, unet=unet, text_encoder=text_encoder, torch_dtype=torch.float16
)
pipeline.to("cuda")

# Perform inference, or save, or push to the hub
pipeline.save_pretrained("dreambooth-pipeline")
```
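Whether a checkpoint contains a `text_encoder` subfolder depends on whether you trained with `--train_text_encoder` (an assumption based on the comment in the code above). A minimal standard-library sketch, with a hypothetical helper name, for checking which components a checkpoint directory actually holds before loading them:

```python
import os
import tempfile


def checkpoint_components(checkpoint_dir):
    """Map component name -> subfolder path for components saved in a checkpoint."""
    found = {}
    for name in ("unet", "text_encoder"):
        path = os.path.join(checkpoint_dir, name)
        if os.path.isdir(path):
            found[name] = path
    return found


# Demo: a checkpoint from a run without --train_text_encoder
with tempfile.TemporaryDirectory() as ckpt:
    os.makedirs(os.path.join(ckpt, "unet"))
    components = checkpoint_components(ckpt)

print(sorted(components))
```

You would then only load `CLIPTextModel` from `components["text_encoder"]` when that key is present, and otherwise let the pipeline use the original text encoder from `model_id`.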

If you have **`"accelerate<0.16.0"`** installed, you need to convert it to an inference pipeline first:

```
from accelerate import Accelerator
from diffusers import DiffusionPipeline

# Load the pipeline with the same arguments (model, revision) that were used for training
model_id = "CompVis/stable-diffusion-v1-4"
pipeline = DiffusionPipeline.from_pretrained(model_id)

accelerator = Accelerator()

# Use text_encoder if `--train_text_encoder` was used for the initial training
unet, text_encoder = accelerator.prepare(pipeline.unet, pipeline.text_encoder)

# Restore state from a checkpoint path. You have to use the absolute path here.
accelerator.load_state("/sddata/dreambooth/daruma-v2-1/checkpoint-100")

# Rebuild the pipeline with the unwrapped models (assignment to .unet and .text_encoder should work too)
pipeline = DiffusionPipeline.from_pretrained(
    model_id,
    unet=accelerator.unwrap_model(unet),
    text_encoder=accelerator.unwrap_model(text_encoder),
)

# Perform inference, or save, or push to the hub
pipeline.save_pretrained("dreambooth-pipeline")
```

## Optimizations for different GPU sizes

Depending on your hardware, there are a few different ways to optimize DreamBooth on GPUs from 16GB to just 8GB!

### xFormers

[xFormers](https://github.com/facebookresearch/xformers) is a toolbox for optimizing Transformers, and it includes a [memory-efficient attention](https://facebookresearch.github.io/xformers/components/ops.html#module-xformers.ops) mechanism that is used in 🧨 Diffusers. You'll need to [install xFormers](https://huggingface.co/docs/diffusers/v0.18.2/en/training/optimization/xformers) and then add the following argument to your training script:

```
--enable_xformers_memory_efficient_attention
```

xFormers is not available in Flax.

### Set gradients to none

Another way to lower your memory footprint is to [set the gradients](https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html) to `None` instead of zero. However, this may change certain behaviors, so if you run into any issues, try removing this argument. Add the following argument to your training script to set the gradients to `None`:

```
--set_grads_to_none
```

### 16GB GPU

With the help of gradient checkpointing and the [bitsandbytes](https://github.com/TimDettmers/bitsandbytes) 8-bit optimizer, it's possible to train DreamBooth on a 16GB GPU. Make sure you have bitsandbytes installed:

```
pip install bitsandbytes
```

Then pass the `--use_8bit_adam` option to the training script:

```
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="./dog"
export CLASS_DIR="path_to_class_images"
export OUTPUT_DIR="path_to_saved_model"

accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="a photo of sks dog" \
--class_prompt="a photo of dog" \
--resolution=512 \
--train_batch_size=1 \
--gradient_accumulation_steps=2 --gradient_checkpointing \
--use_8bit_adam \
--learning_rate=5e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--num_class_images=200 \
--max_train_steps=800 \
--push_to_hub
```

### 12GB GPU

To run DreamBooth on a 12GB GPU, you’ll need to enable gradient checkpointing, the 8-bit optimizer, xFormers, and set the gradients to `None`:

```
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="./dog"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"

accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="a photo of sks dog" \
--class_prompt="a photo of dog" \
--resolution=512 \
--train_batch_size=1 \
--gradient_accumulation_steps=1 --gradient_checkpointing \
--use_8bit_adam \
--enable_xformers_memory_efficient_attention \
--set_grads_to_none \
--learning_rate=2e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--num_class_images=200 \
--max_train_steps=800 \
--push_to_hub
```

### 8GB GPU

For 8GB GPUs, you'll need the help of [DeepSpeed](https://www.deepspeed.ai/) to offload some tensors from VRAM to either the CPU or NVMe, enabling training with less GPU memory.

Run the following command to configure your 🤗 Accelerate environment:

```
accelerate config
```

During configuration, confirm that you want to use DeepSpeed. Now it's possible to train on under 8GB VRAM by combining DeepSpeed stage 2, fp16 mixed precision, and offloading the model parameters and the optimizer state to the CPU. The drawback is that this requires more system RAM, about 25GB. See the [🤗 Accelerate DeepSpeed documentation](https://huggingface.co/docs/accelerate/usage_guides/deepspeed) for more configuration options.

You should also change the default Adam optimizer to DeepSpeed’s optimized version of Adam [`deepspeed.ops.adam.DeepSpeedCPUAdam`](https://deepspeed.readthedocs.io/en/latest/optimizers.html#adam-cpu) for a substantial speedup. Enabling `DeepSpeedCPUAdam` requires your system’s CUDA toolchain version to be the same as the one installed with PyTorch.

8-bit optimizers don’t seem to be compatible with DeepSpeed at the moment.

Launch training with the following command:

```
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="./dog"
export CLASS_DIR="path_to_class_images"
export OUTPUT_DIR="path_to_saved_model"

accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--class_data_dir=$CLASS_DIR \
--output_dir=$OUTPUT_DIR \
--with_prior_preservation --prior_loss_weight=1.0 \
--instance_prompt="a photo of sks dog" \
--class_prompt="a photo of dog" \
--resolution=512 \
--train_batch_size=1 \
--sample_batch_size=1 \
--gradient_accumulation_steps=1 --gradient_checkpointing \
--learning_rate=5e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--num_class_images=200 \
--max_train_steps=800 \
--mixed_precision=fp16 \
--push_to_hub
```

## Inference

Once you have trained a model, specify the path to where the model is saved and use it for inference in the [`StableDiffusionPipeline`](https://huggingface.co/docs/diffusers/v0.18.2/en/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline). Make sure your prompts include the special `identifier` used during training (`sks` in the previous examples).

The following code loads the final trained model from your output directory and runs inference:

```
from diffusers import DiffusionPipeline
import torch

model_id = "path_to_saved_model"
pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

prompt = "A photo of sks dog in a bucket"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]  # `.images` is a list of PIL images

image.save("dog-bucket.png")
```

You can also run inference from any of the [saved training checkpoints](https://huggingface.co/docs/diffusers/v0.18.2/en/training/dreambooth#inference-from-a-saved-checkpoint).

## IF

You can use the LoRA and full DreamBooth scripts to train the text-to-image [IF model](https://huggingface.co/DeepFloyd/IF-I-XL-v1.0) and the [stage II upscaler](https://huggingface.co/DeepFloyd/IF-II-L-v1.0).

Note that IF has a predicted variance, and our finetuning scripts only train the model's predicted error, so for finetuned IF models we switch to a fixed variance schedule. The full finetuning scripts update the scheduler config for the full saved model. However, when loading saved LoRA weights, you must also update the pipeline's scheduler config yourself:

```
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0")

pipe.load_lora_weights("<lora weights path>")

# Update scheduler config to fixed variance schedule
pipe.scheduler = pipe.scheduler.__class__.from_config(pipe.scheduler.config, variance_type="fixed_small")
```

Additionally, IF needs a few alternative CLI flags:

`--resolution=64`: IF is a pixel-space diffusion model. Because it operates on uncompressed pixels, the input images are of a much smaller resolution.

`--pre_compute_text_embeddings`: IF uses [T5](https://huggingface.co/docs/transformers/model_doc/t5) for its text encoder. To save GPU memory, we precompute all text embeddings and then deallocate T5.

`--tokenizer_max_length=77`: T5 has a longer default text length, but the default IF encoding procedure uses a smaller number.

`--text_encoder_use_attention_mask`: Pass the attention mask to the T5 text encoder.

### Tips and Tricks

We find LoRA to be sufficient for finetuning the stage I model, because the model's low resolution makes it hard to represent fine-grained detail regardless. For common and/or visually simple object concepts, you can get away with not finetuning the upscaler. Just be sure to adjust the prompt passed to the upscaler to remove the new token from the instance prompt. For example, if your stage I prompt is “a sks dog”, use “a dog” for your stage II prompt.
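The prompt adjustment for stage II can be scripted. A minimal sketch, assuming the identifier is a standalone word such as `sks` (the helper name is hypothetical). Removing it from the `a photo of sks dog` instance prompt used earlier gives the `a photo of dog` class prompt from the prior-preservation examples:

```python
import re


def strip_identifier(prompt, identifier="sks"):
    """Remove a DreamBooth identifier token from a prompt and tidy the spacing."""
    # \b anchors keep words that merely contain the identifier (e.g. "asks") intact
    without = re.sub(rf"\b{re.escape(identifier)}\b", "", prompt)
    # Collapse the double space left behind
    return " ".join(without.split())


stage_i_prompt = "a sks dog"
stage_ii_prompt = strip_identifier(stage_i_prompt)
print(stage_ii_prompt)
print(strip_identifier("a photo of sks dog"))
```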

For fine-grained detail like faces that aren't present in the original training set, we find that full finetuning of the stage II upscaler works better than LoRA finetuning of stage II.

For fine-grained detail like faces, we find that lower learning rates combined with larger batch sizes work best.

For stage II, we find that lower learning rates are also needed.

We found experimentally that the DDPM scheduler with its default, larger number of denoising steps sometimes works better than the DPM Solver scheduler used in the training scripts.

### Stage II additional validation images

Stage II validation requires images to upscale, so we download a downsized version of the training set:

```
from huggingface_hub import snapshot_download

local_dir = "./dog_downsized"
snapshot_download(
    "diffusers/dog-example-downsized",
    local_dir=local_dir,
    repo_type="dataset",
    ignore_patterns=".gitattributes",
)
```

### IF stage I LoRA Dreambooth

This training configuration requires ~28 GB VRAM.

```
export MODEL_NAME="DeepFloyd/IF-I-XL-v1.0"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="dreambooth_dog_lora"

accelerate launch train_dreambooth_lora.py \
--report_to wandb \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--instance_prompt="a sks dog" \
--resolution=64 \
--train_batch_size=4 \
--gradient_accumulation_steps=1 \
--learning_rate=5e-6 \
--scale_lr \
--max_train_steps=1200 \
--validation_prompt="a sks dog" \
--validation_epochs=25 \
--checkpointing_steps=100 \
--pre_compute_text_embeddings \
--tokenizer_max_length=77 \
--text_encoder_use_attention_mask
```
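`--scale_lr` rescales the learning rate by the effective batch size. Assuming the scripts multiply `--learning_rate` by `gradient_accumulation_steps * train_batch_size * num_processes` (our reading of the example code), the configuration above on a single GPU actually trains at 4 times the value passed in. A small sketch with a hypothetical helper name:

```python
def scaled_learning_rate(learning_rate, train_batch_size, gradient_accumulation_steps, num_processes=1):
    """Learning rate after --scale_lr: linear scaling by the effective batch size."""
    return learning_rate * gradient_accumulation_steps * train_batch_size * num_processes


# IF stage I LoRA settings above: lr 5e-6, batch 4, no accumulation, one GPU
lr = scaled_learning_rate(5e-6, train_batch_size=4, gradient_accumulation_steps=1)
print(lr)
```

Keep this in mind when comparing learning rates across runs: the same `--learning_rate` with `--scale_lr` on more GPUs or a larger batch means a larger effective step size.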

### IF stage II LoRA Dreambooth

`--validation_images`: These images are upscaled during validation steps.

`--class_labels_conditioning=timesteps`: Pass additional conditioning to the UNet needed for stage II.

`--learning_rate=1e-6`: Lower learning rate than stage I.

`--resolution=256`: The upscaler expects higher resolution inputs.

```
export MODEL_NAME="DeepFloyd/IF-II-L-v1.0"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="dreambooth_dog_upscale"
export VALIDATION_IMAGES="dog_downsized/image_1.png dog_downsized/image_2.png dog_downsized/image_3.png dog_downsized/image_4.png"

python train_dreambooth_lora.py \
--report_to wandb \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--instance_prompt="a sks dog" \
--resolution=256 \
--train_batch_size=4 \
--gradient_accumulation_steps=1 \
--learning_rate=1e-6 \
--max_train_steps=2000 \
--validation_prompt="a sks dog" \
--validation_epochs=100 \
--checkpointing_steps=500 \
--pre_compute_text_embeddings \
--tokenizer_max_length=77 \
--text_encoder_use_attention_mask \
--validation_images $VALIDATION_IMAGES \
--class_labels_conditioning=timesteps
```

### IF Stage I Full Dreambooth

`--skip_save_text_encoder`: When training the full model, this skips saving the entire T5 with the finetuned model. You can still load the pipeline with a T5 loaded from the original model.

`--use_8bit_adam`: Due to the size of the optimizer states, we recommend training the full XL IF model with 8-bit Adam.

`--learning_rate=1e-7`: For full dreambooth, IF requires very low learning rates. With higher learning rates model quality will degrade. Note that it is likely the learning rate can be increased with larger batch sizes.

Using 8-bit Adam and a batch size of 4, the model can be trained in ~48GB of VRAM.

```
export MODEL_NAME="DeepFloyd/IF-I-XL-v1.0"

export INSTANCE_DIR="dog"
export OUTPUT_DIR="dreambooth_if"

accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--instance_prompt="a photo of sks dog" \
--resolution=64 \
--train_batch_size=4 \
--gradient_accumulation_steps=1 \
--learning_rate=1e-7 \
--max_train_steps=150 \
--validation_prompt "a photo of sks dog" \
--validation_steps 25 \
--text_encoder_use_attention_mask \
--tokenizer_max_length 77 \
--pre_compute_text_embeddings \
--use_8bit_adam \
--set_grads_to_none \
--skip_save_text_encoder \
--push_to_hub
```

### IF Stage II Full Dreambooth

`--learning_rate=5e-6`: With a smaller effective batch size of 4, we found that we required learning rates as low as 1e-8.

`--resolution=256`: The upscaler expects higher resolution inputs.

`--train_batch_size=2` and `--gradient_accumulation_steps=6`: We found that full training of stage II particularly with faces required large effective batch sizes.
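A quick check of the effective batch sizes mentioned above, assuming single-GPU training where the effective batch size is `train_batch_size * gradient_accumulation_steps` (multiplied by the number of processes for multi-GPU runs). The helper name is hypothetical:

```python
def effective_batch_size(train_batch_size, gradient_accumulation_steps, num_processes=1):
    """Number of examples contributing to each optimizer update."""
    return train_batch_size * gradient_accumulation_steps * num_processes


stage_ii = effective_batch_size(train_batch_size=2, gradient_accumulation_steps=6)
stage_i = effective_batch_size(train_batch_size=4, gradient_accumulation_steps=1)
print(stage_ii, stage_i)  # 12 4
print(stage_ii * 2000)    # examples seen over --max_train_steps=2000
```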

```
export MODEL_NAME="DeepFloyd/IF-II-L-v1.0"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="dreambooth_dog_upscale"
export VALIDATION_IMAGES="dog_downsized/image_1.png dog_downsized/image_2.png dog_downsized/image_3.png dog_downsized/image_4.png"

accelerate launch train_dreambooth.py \
--report_to wandb \
--pretrained_model_name_or_path=$MODEL_NAME \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--instance_prompt="a sks dog" \
--resolution=256 \
--train_batch_size=2 \
--gradient_accumulation_steps=6 \
--learning_rate=5e-6 \
--max_train_steps=2000 \
--validation_prompt="a sks dog" \
--validation_steps=150 \
--checkpointing_steps=500 \
--pre_compute_text_embeddings \
--tokenizer_max_length=77 \
--text_encoder_use_attention_mask \
--validation_images $VALIDATION_IMAGES \
--class_labels_conditioning timesteps \
--push_to_hub
```

0xSoul posted on 2023-7-27 11:58:02

# Installing diffusers and training a DreamBooth model from mainland China (behind the Great Firewall)

First, download diffusers: [https://github.com/huggingface/diffusers](https://github.com/huggingface/diffusers)

![Screenshot 2023-07-27 11.56.21.png](data/attachment/forum/202307/27/115653jl25jt29f5nqqjok.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/300 "Screenshot 2023-07-27 11.56.21.png")

This gives you a zip archive: diffusers-main.zip

Next, create a new Python project and install the package into it:

```
pip install diffusers-main.zip
```

Then unzip the archive and move the contents of the `examples` directory to the root of the Python project.

If you only want to train DreamBooth, copy everything in the `dreambooth` directory to the project root.

I use `train_dreambooth.py` directly; it is the script for training DreamBooth.

At this point, the directory structure looks like this:

![Screenshot 2023-07-27 12.05.12.png](data/attachment/forum/202307/27/120547rda0ga0asgas7dd0.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/300 "Screenshot 2023-07-27 12.05.12.png")

Install the dependencies:

```
pip install -U -r requirements.txt

```

Set up a default accelerate environment:

```
accelerate config default
```

Put your image set in the `pic` directory.

Start training.

0xSoul posted on 2023-8-2 16:36:50

Line 371 (in `train_dreambooth.py`): change `logging_dir=logging_dir` to `project_dir=logging_dir`.

hzx0814 posted on 2024-5-13 19:15:53

Thanks for sharing! :handshake

lcm99 posted on 2024-11-3 20:36:28

Thanks for sharing :)


Jinfangzi (金房子) | AI Enthusiasts Forum | AIGC Enthusiasts Forum | Stable Diffusion Forum's Archiver
