
Long Video Generation

Most AI video tools only let you create 5-10 second clips. But what if you need a longer video? This workflow shows you how to chain multiple clips together seamlessly to create videos that are 20, 40, or even 60+ seconds long.

We’ll create a 20-second video by joining two 10-second clips. The trick? The last frame of Clip A and the first frame of Clip B are the same image, so the cut between them blends seamlessly.

What you’ll create:

  • 3 still images (the “anchor points” for our videos)
  • 2 video clips (that transition between the images)
  • Optional: Add voice and lipsync for a talking avatar effect
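
The relationship between target length, clips, and anchor images can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of the tool: the function name `plan_chain` is hypothetical, and the default of 10 seconds per clip is an assumption taken from the example in this guide.

```python
import math


def plan_chain(target_seconds: float, clip_seconds: float = 10.0) -> tuple[int, int]:
    """Return (clips, images) needed to reach at least target_seconds."""
    if target_seconds <= 0 or clip_seconds <= 0:
        raise ValueError("durations must be positive")
    # Round up: a 25-second target with 10-second clips needs 3 clips.
    clips = math.ceil(target_seconds / clip_seconds)
    # Neighbouring clips share one anchor image, so images = clips + 1.
    images = clips + 1
    return clips, images


for target in (20, 40, 60):
    clips, images = plan_chain(target)
    print(f"{target}s -> {clips} clips, {images} images")
```

For the 20-second example in this guide, that works out to 2 clips and 3 images.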

First, we’ll generate three images that show different poses or actions. Think of these as the “snapshots” that will mark the beginning, middle, and end of our video sequence.

We’ll create the first image showing Einstein standing in a bedroom.

  1. Open the Image Generation tool and select Seedream v4

  2. Use this prompt and set aspect ratio to 9:16:

    "Albert Einstein is standing in the middle of a bedroom in 2024. He is wearing casual men's clothes. [9:16]."
  3. Click Generate

Image 1: Einstein standing

Now we’ll edit the first image to show Einstein sitting instead of standing.

  1. Open the Image Editing tool and select Seedream v4 Edit

  2. Drag Image 1 into the image input field

  3. Use this prompt:

    "Edit Reference: Do not modify anything except the man. Make the man sit. Keep the angle exactly the same"
  4. Click Generate

Image 2: Einstein sitting down

Now we’ll edit Image 2 to add a waving gesture.

  1. Stay in the Image Editing tool with Seedream v4 Edit selected

  2. Drag Image 2 into the image input field

  3. Use this prompt:

    "Make the hand towards the wall wave like saying good bye"
  4. Click Generate

Image 3: Einstein waving goodbye

Now for the magic! We’ll create two video clips that smoothly transition between our three images.

This clip will show Einstein talking and sitting down on the bed (transitioning from Image 1 to Image 2).

  1. Open the Video Generation tool and select Kling 2.1 Pro

  2. Set First Frame = Image 1 and Last Frame = Image 2

  3. Use this prompt and set duration to 5-10 seconds:

    "The guy is continuously talking while keeping his look the same. While talking he gesticulates and sits on the bed."
  4. Click Generate


This clip will show Einstein continuing to talk and then waving goodbye (transitioning from Image 2 to Image 3).

  1. Stay in the Video Generation tool with Kling 2.1 Pro selected

  2. Set First Frame = Image 2 and Last Frame = Image 3

  3. Use this prompt and set duration to 5-10 seconds:

    "The man talks to the camera while gesticulating confidently on the side without blocking the view of the face. The hand never covers his face. In the end he waves with his hand, saying goodbye."
  4. Click Generate


Now let’s combine Clip A and Clip B into one long video.

  1. Open your video editor (CapCut, Premiere Pro, DaVinci Resolve, or any editor you prefer)
  2. Import Clip A and Clip B
  3. Place them back-to-back on your timeline
  4. Play through the cut. Because Clip A ends on Image 2 and Clip B starts on it, the transition should be seamless, with no awkward jump cut.
  5. Add captions, background music, or color grading if you’d like
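
If you would rather join the clips from the command line than in an editor, one possible alternative is ffmpeg's concat demuxer. ffmpeg is not mentioned in this guide, so treat the following as a sketch under assumptions: the file names `clip_a.mp4`, `clip_b.mp4`, `clips.txt`, and `long_video.mp4` are hypothetical, and `-c copy` (joining without re-encoding) only works when both clips share the same codec, resolution, and frame rate, which is likely but not guaranteed for clips generated with identical settings. The script only builds the list file text and the command. It does not run ffmpeg.

```python
import shlex


def build_concat_list(clip_paths: list[str]) -> str:
    """Return the text of an ffmpeg concat-demuxer list, one clip per line."""
    lines = []
    for path in clip_paths:
        # Inside a single-quoted path, a literal quote is written as '\''
        escaped = path.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def build_ffmpeg_command(list_file: str, output: str) -> list[str]:
    """Return the ffmpeg arguments that join the listed clips without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_file, "-c", "copy", output]


# Hypothetical file names; replace them with your exported clips, in playback order.
clips = ["clip_a.mp4", "clip_b.mp4"]
print(build_concat_list(clips), end="")
print(shlex.join(build_ffmpeg_command("clips.txt", "long_video.mp4")))
```

Save the printed list as `clips.txt` next to the clips, then run the printed command. If the clips differ in encoding settings, re-encoding in a video editor is the safer route.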

With two 10-second clips placed back-to-back, you now have a 20-second video! And the best part? You can keep going with Clip C, D, E… to make it even longer.


Want Einstein to actually speak with realistic lip movements? Here’s how to add voice to each clip.

  1. Open the Lipsync tool and drag Clip A into the video input
  2. Click Create Speech to open the Speech Generation tool
  3. Choose a voice (for example, one from ElevenLabs), enter your script, and click Generate
  4. Back in the Lipsync tool, select Sync Lipsync v2 Pro and click Generate

Repeat the same process for Clip B.


  • Keep clips short (5-10 seconds): Longer clips cost more credits and can drift away from your anchor images
  • Change one thing at a time: Between images, only change the pose—keep lighting, outfit, background, and camera angle identical
  • Use the same camera framing: Don’t zoom in or out between images, or the transition will look jarring
  • Test before adding voice: Stitch your clips together first to make sure the transitions work smoothly, then add lipsync

You can keep extending this pattern! Create Image 4, then Clip C (Image 3 → Image 4). Create Image 5, then Clip D (Image 4 → Image 5). Stack as many clips as you need to build 60+ second videos.
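
The extension pattern above can be written out as a small planning helper. This is a hypothetical sketch for keeping track of which image goes into which frame slot, not something the tool provides: the function name `chain_clips` and the letter-based clip names are assumptions that follow the naming used in this guide.

```python
from string import ascii_uppercase


def chain_clips(image_count: int) -> list[tuple[str, str, str]]:
    """Return (clip name, first frame, last frame) for each consecutive image pair."""
    if image_count < 2:
        raise ValueError("need at least two images to make one clip")
    if image_count - 1 > len(ascii_uppercase):
        raise ValueError("this sketch only names clips A to Z")
    # Clip i starts on Image i+1 and ends on Image i+2, so each clip
    # begins on the exact image the previous clip ended on.
    return [
        (f"Clip {ascii_uppercase[i]}", f"Image {i + 1}", f"Image {i + 2}")
        for i in range(image_count - 1)
    ]


for name, first, last in chain_clips(5):
    print(f"{name}: First Frame = {first}, Last Frame = {last}")
```

With five images this lists Clips A through D, where Clip C runs from Image 3 to Image 4 and Clip D from Image 4 to Image 5, matching the pattern described above.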

Related guides: Image Editing, Video Generation, Lipsync, Troubleshooting