Textual inversion is a technique that lets you train a new “word” (and obtain a new embedding for that word) in the space of embeddings the model already knows. As a user you can load this new embedding and refer to it in your prompt during the de-noising process.
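As a minimal sketch of what “load the new embedding and refer to it” looks like in practice, here is how it can be done with the Hugging Face diffusers library (my assumption; the file name `learned_embeds.bin` and the placeholder token `<watercolor-houses>` are illustrative):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load the learned embedding and register its placeholder token.
pipe.load_textual_inversion("learned_embeds.bin", token="<watercolor-houses>")

# Refer to the new "word" in the prompt during de-noising.
image = pipe("a drawing of a cat in the style of <watercolor-houses>").images[0]
image.save("cat_watercolor.png")
```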

A better and more formal description can be found here. A video explaining this topic is part of the Stable Diffusion lecture series by fast.ai.

Here is an example of learning a specific watercolor-houses drawing style. The learned style can then be transferred to other objects: Textual inversion

One has to be careful: if the guidance is too high (that is, how strongly the de-noising should stick to the prompt), even a cat can suddenly become a watercolor drawing of a house: img.png
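Continuing the pipeline from the sketch above, the knob in question is the `guidance_scale` parameter (the values below are only illustrative):

```python
# A moderate guidance scale keeps the subject; a very high one can let the
# learned style take over the whole image.
moderate = pipe("a cat, <watercolor-houses> style", guidance_scale=7.0).images[0]
too_high = pipe("a cat, <watercolor-houses> style", guidance_scale=15.0).images[0]
```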

Image-to-image (style transfer) works quite well too, but you have to lower the guidance a bit: img.png img.png
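A sketch of the image-to-image variant, again assuming diffusers (file names, token, and the exact `strength`/`guidance_scale` values are illustrative):

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from diffusers.utils import load_image

img2img = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
img2img.load_textual_inversion("learned_embeds.bin", token="<watercolor-houses>")

init_image = load_image("photo.jpg").resize((512, 512))
result = img2img(
    prompt="a house in the style of <watercolor-houses>",
    image=init_image,
    strength=0.6,        # how much of the source image is noised away
    guidance_scale=5.0,  # lower than the usual text-to-image setting
).images[0]
result.save("house_watercolor.png")
```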

Training an object (rather than a style) also yielded nice results: img.png

But it failed to place the learned object correctly in the scene described by the prompt, for example:

Prompt: “photo of object driving a red car, yellow eyes, masterpiece, trending, beautiful, sharp focus, cute”

img.png

There is a huge collection of community-created concepts here.
I found that the simplest way to train a new embedding is to use this Google Colab notebook.
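For reference, here is a minimal sketch of what such training does under the hood, written against the diffusers/transformers building blocks (an assumption; model id, placeholder token, prompt template, and learning rate are illustrative). Everything is frozen except the embedding matrix of the text encoder, and only the row for the new token is meant to change:

```python
import torch
import torch.nn.functional as F
from diffusers import AutoencoderKL, UNet2DConditionModel, DDPMScheduler
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "runwayml/stable-diffusion-v1-5"
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
noise_scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")

# Add the new "word" and initialize it from an existing token.
placeholder, initializer = "<my-style>", "painting"
tokenizer.add_tokens(placeholder)
text_encoder.resize_token_embeddings(len(tokenizer))
embeds = text_encoder.get_input_embeddings().weight
placeholder_id = tokenizer.convert_tokens_to_ids(placeholder)
with torch.no_grad():
    embeds[placeholder_id] = embeds[tokenizer.convert_tokens_to_ids(initializer)].clone()

# Freeze everything; only the token embedding matrix receives gradients
# (a full implementation also zeroes the gradients of all rows except placeholder_id).
vae.requires_grad_(False)
unet.requires_grad_(False)
for p in text_encoder.parameters():
    p.requires_grad_(False)
embeds.requires_grad_(True)
optimizer = torch.optim.AdamW([embeds], lr=5e-4)

def training_step(pixel_values):
    # pixel_values: a batch of training images scaled to [-1, 1].
    latents = vae.encode(pixel_values).latent_dist.sample() * vae.config.scaling_factor
    noise = torch.randn_like(latents)
    timesteps = torch.randint(0, noise_scheduler.config.num_train_timesteps, (latents.shape[0],))
    noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

    ids = tokenizer([f"a photo in the style of {placeholder}"] * latents.shape[0],
                    padding="max_length", max_length=tokenizer.model_max_length,
                    truncation=True, return_tensors="pt").input_ids
    encoder_hidden_states = text_encoder(ids)[0]

    # Standard diffusion de-noising objective; the loss only updates the new embedding row.
    noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
    loss = F.mse_loss(noise_pred, noise)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss
```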

And here is an exploration of another technique to customize generative diffusion models, called Dreambooth.