InstantX/InstantID · Hugging Face

InstantID Model Card

Introduction

InstantID is a new state-of-the-art, tuning-free method for ID-preserving generation from only a single image, supporting various downstream tasks.

Usage

You can download the model directly from this repository, or from a Python script:

from huggingface_hub import hf_hub_download
hf_hub_download(repo_id="InstantX/InstantID", filename="ControlNetModel/config.json", local_dir="./checkpoints")
hf_hub_download(repo_id="InstantX/InstantID", filename="ControlNetModel/diffusion_pytorch_model.safetensors", local_dir="./checkpoints")
hf_hub_download(repo_id="InstantX/InstantID", filename="ip-adapter.bin", local_dir="./checkpoints")
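The three downloads above can also be driven from a single list. The following is a minimal sketch and not part of the original card: the names `INSTANTID_FILES` and `download_instantid` are illustrative, and the optional `downloader` argument is an assumption added only so the helper can be exercised without network access.

```python
# Illustrative helper; INSTANTID_FILES and download_instantid are hypothetical
# names, not part of the InstantID repository.
INSTANTID_FILES = [
    "ControlNetModel/config.json",
    "ControlNetModel/diffusion_pytorch_model.safetensors",
    "ip-adapter.bin",
]


def download_instantid(local_dir="./checkpoints", downloader=None):
    """Fetch each checkpoint file and return the paths the downloader reports."""
    if downloader is None:
        # Imported lazily so this module loads even without huggingface_hub.
        from huggingface_hub import hf_hub_download as downloader
    return [
        downloader(repo_id="InstantX/InstantID", filename=name, local_dir=local_dir)
        for name in INSTANTID_FILES
    ]
```

Calling `download_instantid()` with no arguments should place the same three files under `./checkpoints` as the explicit calls do.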

For the face encoder, you need to manually download it via this URL to models/antelopev2.
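Since the face encoder is a manual download, it may help to check that the files are in place before loading anything heavy. This is a small sketch under two assumptions: that the usage code looks for the pack under `<root>/models/antelopev2` (as its `root='./'` argument and comment suggest), and that the pack consists of `.onnx` files. The function name `antelopev2_ready` is illustrative.

```python
from pathlib import Path


def antelopev2_ready(root="./"):
    """Return True if <root>/models/antelopev2 holds at least one .onnx file.

    Assumption: the face encoder pack is a directory of .onnx model files.
    """
    model_dir = Path(root) / "models" / "antelopev2"
    return model_dir.is_dir() and any(model_dir.glob("*.onnx"))
```

A caller might run `antelopev2_ready()` first and print a reminder about the manual download when it returns `False`.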

# !pip install opencv-python transformers accelerate insightface
import diffusers
from diffusers.utils import load_image
from diffusers.models import ControlNetModel

import cv2
import torch
import numpy as np
from PIL import Image

from insightface.app import FaceAnalysis
from pipeline_stable_diffusion_xl_instantid import StableDiffusionXLInstantIDPipeline, draw_kps

# prepare 'antelopev2' under ./models
app = FaceAnalysis(name='antelopev2', root='./', providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))

# prepare models under ./checkpoints
face_adapter = f'./checkpoints/ip-adapter.bin'
controlnet_path = f'./checkpoints/ControlNetModel'

# load IdentityNet
controlnet = ControlNetModel.from_pretrained(controlnet_path, torch_dtype=torch.float16)

pipe = StableDiffusionXLInstantIDPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.cuda()

# load adapter
pipe.load_ip_adapter_instantid(face_adapter)

Then you can customize the generation with your own face images:

# load an image
face_image = load_image("your-example.jpg")

# prepare face emb
face_info = app.get(cv2.cvtColor(np.array(face_image), cv2.COLOR_RGB2BGR))
# only use the largest face (bounding-box area = width * height)
face_info = sorted(face_info, key=lambda x: (x['bbox'][2] - x['bbox'][0]) * (x['bbox'][3] - x['bbox'][1]))[-1]
face_emb = face_info['embedding']
face_kps = draw_kps(face_image, face_info['kps'])

pipe.set_ip_adapter_scale(0.8)

prompt = "analog film photo of a man. faded film, desaturated, 35mm photo, grainy, vignette, vintage, Kodachrome, Lomography, stained, highly detailed, found footage, masterpiece, best quality"
negative_prompt = "(lowres, low quality, worst quality:1.2), (text:1.2), watermark, painting, drawing, illustration, glitch, deformed, mutated, cross-eyed, ugly, disfigured (lowres, low quality, worst quality:1.2), (text:1.2), watermark, painting, drawing, illustration, glitch,deformed, mutated, cross-eyed, ugly, disfigured"

# generate image
image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    image_embeds=face_emb,
    image=face_kps,
    controlnet_conditioning_scale=0.8,
).images[0]
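The face-selection step keeps only the largest detected face. As a standalone sketch, assuming each detection exposes `bbox` as `[x1, y1, x2, y2]` (the layout the indexing in the usage code implies), the area is width times height, so each difference needs its own parentheses. The names `bbox_area` and `largest_face` are illustrative and not part of InstantID or insightface.

```python
def bbox_area(bbox):
    """Area of a box given as [x1, y1, x2, y2]."""
    x1, y1, x2, y2 = bbox[:4]
    return (x2 - x1) * (y2 - y1)


def largest_face(faces):
    """Return the detection with the largest bounding box, or None if empty."""
    if not faces:
        return None
    return max(faces, key=lambda face: bbox_area(face["bbox"]))
```

Returning `None` for an empty list lets the caller handle images where no face was detected, whereas indexing `[-1]` on an empty sorted list raises an `IndexError`.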

For more details, please follow the instructions in our GitHub repository.

Usage Tips

  1. If you're not satisfied with the similarity, try increasing the weights of "IdentityNet Strength" and "Adapter Strength".
  2. If the saturation is too high, first decrease the Adapter strength. If it is still too high, then decrease the IdentityNet strength.
  3. If text control is not as expected, decrease the Adapter strength.
  4. If the realistic style is not good enough, go to our GitHub repo and use a more realistic base model.
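The two strengths in these tips appear to correspond to two values in the usage code. Judging by its comments, IdentityNet is the ControlNet, so IdentityNet strength would be `controlnet_conditioning_scale`, and Adapter strength would be the value passed to `pipe.set_ip_adapter_scale`. Under that assumption, a small wrapper (the name `generate_with_strengths` is illustrative) makes both knobs explicit:

```python
def generate_with_strengths(pipe, prompt, face_emb, face_kps,
                            identitynet_strength=0.8, adapter_strength=0.8):
    """Run the pipeline with explicit IdentityNet and Adapter strengths.

    Assumes `pipe` is the loaded pipeline from the usage code, and that the
    two strengths map onto the two arguments used below.
    """
    # Adapter strength: weight of the face-embedding adapter.
    pipe.set_ip_adapter_scale(adapter_strength)
    result = pipe(
        prompt,
        image_embeds=face_emb,
        image=face_kps,
        # IdentityNet strength: weight of the ControlNet conditioning.
        controlnet_conditioning_scale=identitynet_strength,
    )
    return result.images[0]
```

Following tip 2, for example, one might first retry with a lower `adapter_strength` and the default `identitynet_strength`, and lower the latter only if the result is still oversaturated.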

Demos

Disclaimer

This project is released under the Apache License and aims to positively impact the field of AI-driven image generation. Users are free to create images using this tool, but they are obligated to comply with local laws and use it responsibly. The developers will not assume any responsibility for potential misuse by users.

Citation

@article{wang2024instantid,
  title={InstantID: Zero-shot Identity-Preserving Generation in Seconds},
  author={Wang, Qixun and Bai, Xu and Wang, Haofan and Qin, Zekui and Chen, Anthony},
  journal={arXiv preprint arXiv:2401.07519},
  year={2024}
}