Generating Coherent Video2Video and Text2Video Animations with SD-CN-Animation
SD-CN-Animation is a project for generating coherent video2video and text2video animations with Stable Diffusion and ControlNet. It previously existed as a not especially user-friendly script that worked through the web-ui API, but after multiple requests it has been turned into a proper web-ui extension. The project can be found on GitHub, where more information is available along with examples of it in action.
The extension generates coherent video2video and text2video animations. The animations can be generated through batch processing and ControlNet, and multi-ControlNet is also supported.
Frame-to-frame consistency comes from RAFT, another project on GitHub. RAFT (Recurrent All-Pairs Field Transforms) is a neural network for optical flow estimation, i.e. predicting the per-pixel motion between consecutive frames. SD-CN-Animation uses the flow estimated by RAFT to carry the content of each generated frame forward into the next one, which is what keeps the resulting video2video and text2video animations coherent instead of flickering from frame to frame.
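As a rough sketch of that flow-warping idea, the snippet below estimates optical flow between two frames with torchvision's RAFT implementation (the extension bundles its own RAFT weights, so this is illustrative rather than the project's exact code; the random frame tensors are placeholders for real video frames) and then warps the second frame onto the first frame's pixel grid:

```python
import torch
import torch.nn.functional as F
from torchvision.models.optical_flow import raft_large, Raft_Large_Weights

# Placeholder frames: swap in two consecutive video frames shaped (1, 3, H, W),
# with H and W divisible by 8 as RAFT requires.
frame1 = torch.rand(1, 3, 256, 256)
frame2 = torch.rand(1, 3, 256, 256)

weights = Raft_Large_Weights.DEFAULT
model = raft_large(weights=weights).eval()

# The bundled transform normalizes both frames to the [-1, 1] range RAFT expects.
transforms = weights.transforms()
img1, img2 = transforms(frame1, frame2)

with torch.no_grad():
    # RAFT returns one flow field per refinement iteration; the last is the final estimate.
    flow = model(img1, img2)[-1]  # (1, 2, H, W) pixel offsets (dx, dy)

# Backward-warp frame2 onto frame1's pixel grid using the estimated flow.
_, _, h, w = frame1.shape
ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
grid = torch.stack((xs, ys)).float().unsqueeze(0)   # (1, 2, H, W) absolute pixel coords
coords = grid + flow                                 # where each pixel has moved to
coords[:, 0] = 2.0 * coords[:, 0] / (w - 1) - 1.0    # normalize x to [-1, 1]
coords[:, 1] = 2.0 * coords[:, 1] / (h - 1) - 1.0    # normalize y to [-1, 1]
warped = F.grid_sample(frame2, coords.permute(0, 2, 3, 1), align_corners=True)
```

Roughly speaking, the extension applies this kind of warp to previously generated frames so that each new frame starts from a motion-consistent initialization before Stable Diffusion refines it.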
The SD-CN-Animation project looks promising, but some users have reported issues when running the vid2vid or text2vid feature. Specifically, they hit a "ValueError: Need to enable queue to use generators" error. The fix is to update the AUTOMATIC1111 web-ui to the latest version, as older versions did not enable the Gradio queue.
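For context, this error comes from Gradio rather than from SD-CN-Animation itself: callbacks that yield results incrementally (Python generators) only work when Gradio's request queue is enabled. A minimal standalone sketch of the same situation (a toy Gradio 3.x app, unrelated to the web-ui's actual code) looks like this:

```python
import gradio as gr

def stream_frames():
    # A generator callback, i.e. one that yields intermediate results,
    # requires the request queue to be enabled.
    for i in range(3):
        yield f"frame {i}"

with gr.Blocks() as demo:
    out = gr.Textbox()
    btn = gr.Button("Run")
    btn.click(stream_frames, outputs=out)

# Without this call, clicking the button raises
# "ValueError: Need to enable queue to use generators".
demo.queue()
demo.launch()
```

Newer AUTOMATIC1111 builds enable this queue by default, which is why updating the web-ui resolves the error.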
Additionally, some users have reported command-line errors such as "TypeError: Script.postprocess() missing 12 required positional arguments" or "TypeError: AntiBurnExtension.postprocess_batch() missing 8 required positional arguments". These errors could be due to using the vlad fork of the web-ui rather than upstream AUTOMATIC1111, so it is recommended to check whether that is the case and, if so, to update the fork to its latest version.
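These "missing N required positional arguments" messages are ordinary Python TypeErrors: an extension hook is declared with more positional parameters than the host actually passes, which can happen when an extension written against upstream web-ui's calling convention runs on a fork that invokes the same hook differently. A deliberately simplified, hypothetical illustration (the class, parameter names, and argument counts are made up):

```python
class ExampleScript:
    # An extension-style hook written for a host that forwards every UI control
    # value as its own positional argument.
    def postprocess(self, p, processed, enabled, strength, steps):
        print("postprocess:", enabled, strength, steps)

script = ExampleScript()

try:
    # A host that only passes (p, processed) - e.g. a fork with a different
    # hook signature - produces the same shape of error as reported above.
    script.postprocess(object(), object())
except TypeError as exc:
    print(exc)  # ... missing 3 required positional arguments: 'enabled', 'strength', and 'steps'
```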
Overall, the SD-CN-Animation project provides a promising way to generate coherent video2video and text2video animations by combining Stable Diffusion with RAFT-based optical flow to keep frames consistent over time. Users interested in training on their own video data can refer to the RAFT GitHub repository for more information on the optical flow model's implementation.