in today’s video we are going to look at how you can create this powerful AI pipelines in our example today it’s going to be an AI video generator pipeline you can see we are going to generate an ID using GPT40 we’re going to generate an image from that ID using Flux from that image we’re going to generate an video with Cling we’re going to add some music we’re going to add some voice over narration using the new OpenAI TTS model and we’re going to edit the video using some simple ffmpeg commands and all of this is kind of
fully automated pipeline so we can take a look at an example of an output from this pipeline we can check out this one so let me just play this for you in Irish folklore a banshei is a spirit associated with death known for her mournful whale that signals the impending death of a family member so yeah I just want to take like a deep dive into the code today look at how I set everything up like step by step and hopefully you can learn from this maybe create other pipeline it doesn’t have to be video but I think this give you kind
idea of how we can link everything together using these cool AI tools and hopefully produce something that is yeah very interesting or very helpful or very just visual pleasing like this one okay so uh I kind of have two different version of this but we’re going to focus on the one that is a bit more simple to understand so that is what we are going to go through today so as always when I start these uh let’s say these pipeline projects uh the first thing I always do and this is a very important point when working with LLMs writing some kind of code uh I always start with gathering documentation so I went ahead gathered some information about the flux documentation the cleaning video documentation open AAI and the son out that is something I always do and I really recommend doing this spending 5 minutes gathering documentation will save you a lot of hassle if you want to use cursor to write the code if you’re writing the code yourself doesn’t matter too much but if you want to h get assistance from cursor writing the code

I will always do this first but today we’re going to look at kind of how I set this up this particular workflow so we’re going to kind of go through step by step how I thought about building this out uh okay i don’t think we have to start with the logging of the IDs we can come back to that because I want to start with kind of the the first thing and that is going to be generating the idea right so uh I wanted to generate some idea that had to do with like folklore and I wanted to have like a specific look it’s going to be firsterson view and for that I wanted to create an just a prompt image genext that’s going to be my prompt right that we’re going to feed into GPT40 to generate Yeah the same prompt uh like with varants every time and to get it working pretty uh to get like uh some consistent stuff here uh I provided seven different examples right so you can see we have example one ID uh let’s say POV you are entering a secret vampire gathering in Vienna 1722 that is the ID and from the prompt
point of view of your pale gloved uh hands delicated yet cold cold clutching blood revel you get the idea right so we have like a an example uh of an ID and an example of a prompt and total seven examples and the final one is create one similar ID that includes witches druids and like uh similar folklore archetypes that connects to nature and supernatural plus the prompt in the same style and format uh of the examples so that is kind of our prompt for generating the idea right so this is we’re going to
feed as context into the GP40 model right uh yeah you can see we feed in the idea prompt but you can always see uh we might as well do it right because I don’t want it to generate the same ID over and over again so I created a variable that is called uh avoid context avoid context i don’t know what to say but basically we store our latest six generated ids in just a simple JSON file here so this gets also fed into the context here so the model kind of doesn’t generate the same ID every single time so it looks at uh what it’s
going to avoid you can see please avoid generating these IDs similar to these recently created ones right and that is just to get some more variance in our outputs and you can see further down here uh we are going to save the ID we just generated to our history right and we have some formatting here because we need to extract because the format we get the output in is going to be pretty much sim similar to these examples so we want to extract the ID because we’re going to use it later in the pipeline and we want
to extract the prompt right and we return the ID and we return the prompt so a pretty straightforward but here comes kind of the next step and that is going to be to generate the image that we will be using uh in the yeah the pipeline so of course we’re just going to take the prompt we generated from the ID and send that as input to um the flux model i have some setup here for the flux model so it’s going to be a 916 format safety to max here is our model right and we’re going to yeah print out this
so uh I thought we can just stop halfway here and run this up to this point okay so when we run this now you can see gen ID using the open API you are a root walker harnessing the life force of ancient ancient groove the prompt is point of view of your vine encircled hands grasping a staff okay so we sent that uh prompt to the flux image generator uh let’s take a look at the image we got okay that was pretty good so now we’re going to use this image to actually generate the video uh I found out that that was a much better pipeline
to generate the image first and use this image as a reference for generating a video uh and that’s fine for now so let’s go back to our code and let’s look at kind of the next step and we’re going to skip the generate video uh because we have a step in between that and that is generate the voice dialogue because I want some voice over as you heard so again we have some example data of how I want this to look so this is just uh uh voice examples so this is just because of the new TTS model from OpenAI you can

kind of describe the voices warm relaxed friendly we can have some punctuations light and natural smooth and easygoing simple and direct we can have some tone settings so these are just three examples of how we can do this right that is just going to be our examples we’re going to feed that into the prompt and the dialogue prompt is create a short uh engaging single line of a dialogue question max 15 words for the following ID so here we feed in the ID so in our case it was a POV of a grower was it or something like that uh the
short uh dialogue question should be something like a character expressing the scene uh like uh I wonder what happened here what could be over here i wonder where Queen Cleopatra is buried some just some more examples in the dialogue avoid cringe cliches like secrets moon breath please be creative inspired by the idea uh also you must provide detailed voice instructions and we feed in the examples right and we’re going to end up with kind of a voice so you can see we also need to if you look here we have two different voices so
also determine whether these dialogue questions are best spoken by a male voice or a or a female voice based on the archetype of the idea so we’re going to end up with like this format so we’re going to pick a voice ballad or shimmer the dialogue questions and the instructions so this is kind of the output we want from this voice dialogue so if we go back to our terminal now you can see here generating voice dialogue for POV you’re a root worker right and we pick the voice shimmer and the dialogue is can the grooves echoes
whisper the way up to peace here are the instructions we want to feed in so that’s pretty good and yeah that is what we get so we can listen to the voice generator can the Groves echoes whisper the way to peace can the Groves echo yeah that worked pretty good so this is going to be merged into the video right and we can go back here now because the next step is going to be to actually generate the video and that is of course pretty straightforward so that is step three again we have a prompt videogen.txt so basically this is the
prompt I want to use pov action uh moving that is the prompt we have some negative prompts blurry and natural movements and we have some settings here that are going to be fed in here and of course the prompt we’re going to feed in you can see here is the settings 916 duration i think it’s going to be 10 seconds not eight but that’s fine uh negative prompts we’re going to feed in that right and yeah we have some time stamps to generate a file name and here is we feed in the prompt from the yeah the ID
generation we feed in negative prompt aspect ratio scale CFG duration and the image file so the starting image is of course going to be the image we generated here from Flux that is going to be starting image uh reference image for the cling uh video API and here’s the model we use the standard model not the pro one and it’s going to return a video file name right so if we go back to our terminal now in step three here we generated the video using cling AI and we saved it to video uh yeah you can see here here’s the
video we saved so we can look at that yeah that looks pretty good it’s a bit strange though but uh I kind of like this so it was pretty cool so that is the video we generated from the image and from there now we can just move on to the next step that is going to be to generate the music using music using the son outu API and again we have some uh reference music generation.txt and we want to feed in the ID of the prompt so for example POV you’re entering uh the secret to gathering so here we’re going to feed in
our ID into the prompt uh we’re going to set instrumental so this is a boolean value we’re going to set always this to true and the prompt strength is going to be 2.3 but we also have some additional tags in the generate music inputs we have uh some parameters called tags eteral chants folklore ancient spiritual ambient ritualistic and instrumental true MP3 out and the prompt strength and the music prompt is the other inputs and we’re just going to go through this and out comes an MP3 file uh under the
folder music right so let’s listen to this pretty cool and that’s about it so the final one is actually just putting everything together so this is using ffmpeg we’re going to put everything into like uh we’re going to fetch the video the music the uh dialogue and we’re going to use some ffmpeg commands here right to put everything together into one final output video and here you can see all the steps in the main function generate the ID generate the image generate the voice the video and the music and then final video to merge
everything together and if we bring back up our terminal now you can see we generated the music right success and then we went to kind of the pipeline here ffmpeg and we have the final output so let’s blow up the final output here and listen to what we created can the groves echoes whisper the way to peace yeah not bad right so that is just one example of how you can set up these pipelines it doesn’t have to be very complicated and you can just run this all over how many times you want uh but I thought we can check out the price for
this so I’m going to just calculate the price uh I’m just going to run uh or maybe we can skip this one we can actually do our second example where we create two videos and it’s a bit more expanding and then we can check the price of that one so let’s clear up this one because the second one we have is a bit more uh advanced uh it’s not that more it’s not that different but basically we create a secondary prompt that is complementaryary to the ID and we generate two images we generate the same voice dialogue but a bit it’s going
to tell like a story about folklore and we create two videos and we add the music and then we merge everything together so let’s run this one time uh and I’m going to monitor the price so I can give you like an idea of how one how much one video costs okay so I’m just going to run this now and then we’re going to check out the end uh how much this cost and we’re going to take a look at the weird video okay so uh we have finished so I calculated this so this was the price uh per not per video but
per 20 seconds so it’s $1.33ish so you can see from OpenAI it was basically nothing from Son Auto it’s like it might be a bit cheaper uh from replicate it’s about a dollar for two videos and the images so let’s take a look at how this ended up here so let me just find the final output and let’s blow this up and let’s listen to it yeah we might as well do this so let’s play this in Japanese folklore Jurugamu is a spider spirit that can transform into a beautiful woman it is known for luring unsuspecting men to its lair where it
reveals its true form and traps them in its web okay yeah that was pretty cool like not everything is perfect but uh I kind of like the videos it’s pretty It looks pretty strange though but the realism is there i really like the reflection in the look in the water there it’s pretty good right you can see the reflection all the way down here in the water so yeah pretty cool and if you upload this to like a social media platform you can add these captions there’s also an option to use SF ffmpeg to add captions
if you really wanted to do that but uh I think this just shows uh kind of how how cool things you can do by using these AI pipelines that we looked at today and if you’re really interested in just the pipeline I built uh if you become a member of the channel uh you can see I have pushed this code to the community GitHub that you will get access to if you become a member uh I might push the auto video 3 the last one we created now also if people are interested in trying that out so yeah if you want to try this out today just
become a member of the channel uh there’s a link in the description and you can get access to this but uh yeah just wanted to show you kind of my thinking about generating these cool pipelines using different AI tools of course these are API dependent but you could of course use MCP servers and stuff if you needed to specialize this uh this yeah kind of setups but this is like very fun to play around with and you learn a lot right doing this too if you’re interested in AI engineering types of stuff so definitely go check it

out uh I created like a simple uh YouTube Tik Tok channel uh for these types of videos we can take a look at this one in Egyptian mythology the sphinx is a creature with the body of a lion and the head of a human known for posing riddles to travelers often yeah you get the point so people kind of they got some likes these videos so I just think it’s cool to post this if they turn out good uh but you don’t have to you can just make other video pipelines using your own images and stuff so yeah thank you for tuning in today and hope
you learned something and hope you want to go out and create these cool AI powerful pipelines.