Experiments with stable diffusion - textual inversion
Textual inversion is a technique that allows one to train a new "word" (and get a new embedding for that word) in the space of embeddings the model already knows. As a user, you can load this new embedding and refer to it in prompts during the de-noising process.
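For example, with the Hugging Face diffusers library, loading a learned embedding and using its placeholder token in a prompt looks roughly like this (the file name and the `<my-style>` token are placeholders for whatever a training run produced):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# learned_embeds.bin is the file produced by a textual inversion training run;
# "<my-style>" is the placeholder token the embedding was trained under.
pipe.load_textual_inversion("learned_embeds.bin", token="<my-style>")

# The new "word" can now be used in prompts like any other word.
image = pipe("a house in the style of <my-style>").images[0]
image.save("house.png")
```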
A better and more formal description can be found here. A video explaining this topic is part of the Stable Diffusion lecture series by fast.ai.
Here is an example of learning a specific watercolor drawing style of houses.
The learned style can then be transferred to other objects:
One has to be careful: if the guidance (that is, how strongly the de-noising should stick to the prompt) is too high,
even a cat can suddenly become a watercolor drawing of a house:
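A quick way to see this effect is to sweep the `guidance_scale` argument (continuing from the snippet above; the values are only illustrative):

```python
# Higher guidance_scale makes the de-noising follow the prompt (and the
# learned style token) more aggressively, which is where the "everything
# becomes a watercolor house" failure mode shows up.
for gs in (3.0, 7.5, 12.0):
    img = pipe(
        "a cute cat, in the style of <my-style>",
        guidance_scale=gs,
        num_inference_steps=50,
    ).images[0]
    img.save(f"cat_guidance_{gs}.png")
```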
Image-to-image (style transfer) works quite well too, but you have to lower the guidance a bit:
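A rough sketch of that with diffusers' image-to-image pipeline, assuming an input photo at `input.jpg` (the strength and guidance values here are illustrative, not tuned settings):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

img2img = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")
img2img.load_textual_inversion("learned_embeds.bin", token="<my-style>")

init_image = Image.open("input.jpg").convert("RGB").resize((512, 512))
result = img2img(
    prompt="in the style of <my-style>",
    image=init_image,
    strength=0.6,        # higher values change the input image more
    guidance_scale=5.0,  # lowered guidance, as noted above
).images[0]
result.save("styled.png")
```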
Training an object (rather than a style) yielded nice results.
But it failed to place the learned object correctly in the scene described by the prompt, for example
Prompt: “photo of object driving a red car, yellow eyes, masterpiece, trending, beautiful, sharp focus, cute”
There is a huge collection of community-created concepts here.
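Those community concepts are small repositories on the Hugging Face Hub and can be loaded the same way (reusing the pipeline from the first snippet; `sd-concepts-library/cat-toy` is one such concept, with `<cat-toy>` as its token):

```python
# Load a community-trained concept directly from the Hub and use its token.
pipe.load_textual_inversion("sd-concepts-library/cat-toy")
image = pipe("a <cat-toy> riding a bicycle, masterpiece, sharp focus").images[0]
image.save("cat_toy.png")
```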
I found that the simplest way to train a new embedding is
using this Google Colab notebook.
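The core idea the notebook implements is small: add a new placeholder token, initialize its embedding from a related word, and then optimize only that embedding row while the rest of the model stays frozen. A very condensed sketch of that setup (names and hyperparameters are illustrative, not the notebook's exact code):

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="tokenizer"
)
text_encoder = CLIPTextModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="text_encoder"
)

# Register the new placeholder token and make room for it in the embedding table.
tokenizer.add_tokens("<my-style>")
text_encoder.resize_token_embeddings(len(tokenizer))

# Initialize the new embedding from a related word (here "painting").
placeholder_id = tokenizer.convert_tokens_to_ids("<my-style>")
initializer_id = tokenizer.encode("painting", add_special_tokens=False)[0]
embeddings = text_encoder.get_input_embeddings()
embeddings.weight.data[placeholder_id] = embeddings.weight.data[initializer_id].clone()

# Freeze the text encoder except for the embedding table; in the diffusion
# training loop, gradients for every row except the placeholder's are zeroed
# out, so only the new "word" is actually learned.
text_encoder.requires_grad_(False)
embeddings.weight.requires_grad_(True)
optimizer = torch.optim.AdamW([embeddings.weight], lr=5e-4)
```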
And here is an exploration of another technique for customizing generative diffusion models, called Dreambooth.