Secrets AI Video Generator: How It Works, Quality, and Cost
Video generation from AI companion images is the feature that most clearly separates Secrets AI from the competition. In a market where most platforms offer only static images alongside text chat, the ability to animate your AI companion into a short video clip is a genuine differentiator — and the quality, rated 4.1/5 by reviewers, is good enough that it holds up to scrutiny rather than just serving as a marketing bullet point.
This guide covers exactly how the video generator works, what each clip costs in Moments, how quality varies across tiers, and who this feature actually makes sense for.
What Makes This Feature Unusual
Most AI companion platforms, including Character.AI, CrushOn AI, Janitor AI, and Replika, do not offer video generation. Candy AI has limited video capability. Generating coherent short-form video is substantially more compute-intensive than generating static images with Stable Diffusion-class models, which likely explains why most platforms have avoided it.
Secrets AI's video generator takes a different approach from text-to-video AI systems: it starts from an existing companion image rather than generating video from scratch. This image-to-video workflow constrains the output to your specific character's appearance, creating continuity between the visual you've built and the animated result. The output is a short animated clip showing the character moving naturally — expressions, body movement — rather than a generic AI-generated video.
For a complete picture of all platform capabilities, see the full review.
How the Video Generator Works — Step by Step
- Select or generate a source image. The video is generated from an existing companion image. Either use one of the 4 auto-generated character images or generate a new image (25–50 Moments) to use as the source.
- Add a text prompt. Describe the desired movement, action, or scenario. The more specific the prompt, the more accurately the output reflects your intention. Generic prompts ("move naturally") produce generic movement; specific prompts ("look over shoulder and smile slowly") produce more directed animation.
- Submit and wait. Processing takes approximately 2 minutes. The AI uses the source image and prompt to render the animation. This is not real-time — it is a queued generation process.
- Review and save. The completed clip appears in the chat interface. Save it locally or keep it in your conversation history.
Video generation is available on Lite tier and above. Free accounts cannot access this feature.
Video Quality — An Honest Assessment
Reviewer ratings place video quality at 4.1/5, the third-highest feature score on the platform, behind chat (4.4/5) and voice (4.3/5). What that score represents in practice: clips generally look realistic, character movement is smooth, and facial expressions read as natural in most cases. The output is better than many users expect from an AI companion platform.
Quality is less consistent at the low end: vague prompts or source images with visual inconsistencies produce more variable output. Premium and Advanced generation models (accessible on higher tiers) produce visibly better results than the base model. Character anatomy also holds up better with these models. The anatomical inconsistencies that occasionally appear in static images, particularly with hands, are less prominent in video clips generated with the higher-quality models.
Prompt complexity affects quality. Overly complex prompts with multiple simultaneous actions or very specific spatial requirements produce less reliable results than focused, single-action prompts. The recommendation from experienced users: keep prompts simple and specific for the best output-to-Moments ratio.
Moments Costs by Video Type
Video is the most Moments-intensive feature on the platform. Understanding the cost before you start generating prevents depleting your monthly allocation unexpectedly.
| Video Type | Approximate Moments Cost |
|---|---|
| Short clip (3 seconds) | ~50 Moments |
| Full clip (longer) | ~600 Moments |
That range is wide — 50 versus 600 is a 12x cost difference. The 3-second short clips available on Lite tier cost ~50 Moments each. Longer, full-quality clips on Plus, Premium, and Ultimate cost approximately 600 Moments. Most users wanting more than a brief animation will be generating at the higher cost point.
Monthly video budget by tier:
| Tier | Moments/Month | Short Clips (~50) | Full Clips (~600) |
|---|---|---|---|
| Lite | 1,000 | ~20 | Not available (short clips only) |
| Plus | 3,000 | ~60 | ~5 |
| Premium | 8,000 | ~160 | ~13 |
| Ultimate | 15,000 | ~300 | ~25 |
For a user who wants regular full-length video generation (more than 10 clips per month), Premium ($19.99) is the minimum viable tier — and Ultimate ($39.99) provides more comfortable headroom. For occasional video use (a few clips per month), Plus ($9.99) is sufficient.
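The budget figures above are plain integer division, and a minimal sketch makes the arithmetic easy to re-run if the costs change. The constant and function names here are illustrative, the costs are the approximate figures quoted in this guide, and the rule that Lite is limited to short clips follows the tier description given elsewhere in this guide.

```python
# Approximate Moments costs quoted in this guide (not official API values).
SHORT_CLIP_COST = 50
FULL_CLIP_COST = 600

# Monthly Moments allocation per paid tier, as listed in the table above.
MONTHLY_MOMENTS = {
    "Lite": 1_000,
    "Plus": 3_000,
    "Premium": 8_000,
    "Ultimate": 15_000,
}

# Assumption taken from this guide: full clips require Plus or above.
FULL_CLIP_TIERS = {"Plus", "Premium", "Ultimate"}


def max_clips(tier: str) -> dict[str, int]:
    """Upper bound on clips per month if every Moment goes to one clip type."""
    moments = MONTHLY_MOMENTS[tier]
    short = moments // SHORT_CLIP_COST
    full = moments // FULL_CLIP_COST if tier in FULL_CLIP_TIERS else 0
    return {"short": short, "full": full}


if __name__ == "__main__":
    for tier in MONTHLY_MOMENTS:
        print(tier, max_clips(tier))
```

These are ceilings, not forecasts: any Moments spent on chat, images, or voice come out of the same pool.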
For the complete Moments cost structure across all features, see the Moments costs breakdown on the pricing page.
How Video Compares to Images and Voice
When deciding how to allocate Moments across media types, the cost-per-output math helps:
| Feature | Moments Cost | What You Get |
|---|---|---|
| Text message | 1–2 | Single response |
| Image (standard) | 25–50 | Static image |
| Short video (3s) | ~50 | Brief animated clip |
| Full video | ~600 | Longer animated clip |
| Voice (1 min) | 100 | 1 minute audio call |
600 Moments for a full video equals: 12–24 images, or 6 minutes of voice calls, or approximately 300–600 text messages. The investment is significant — video represents a real tradeoff against other feature use.
The practical approach for Plus-tier users (3,000 Moments/month): allocate Moments to text and images first, then generate video selectively for content you specifically want animated. Treating every image as a potential video candidate quickly depletes a Plus allocation.
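As a rough illustration of that allocation order, the sketch below subtracts planned text, image, and voice usage first and reports how many full videos the remainder covers. The function name and the usage numbers in the example are hypothetical. The per-item costs are the upper ends of the ranges in the table above, so the estimate is deliberately conservative.

```python
# Per-item Moments costs from the comparison table (upper end of each range).
COSTS = {
    "text_message": 2,
    "image": 50,
    "voice_minute": 100,
    "full_video": 600,
}


def full_videos_after(monthly_moments: int,
                      text_messages: int = 0,
                      images: int = 0,
                      voice_minutes: int = 0) -> int:
    """Full videos affordable once other planned usage is paid for."""
    spent = (text_messages * COSTS["text_message"]
             + images * COSTS["image"]
             + voice_minutes * COSTS["voice_minute"])
    remaining = max(monthly_moments - spent, 0)
    return remaining // COSTS["full_video"]


if __name__ == "__main__":
    # Hypothetical Plus-tier month: 400 messages and 20 images, no voice.
    print(full_videos_after(3_000, text_messages=400, images=20))  # prints 2
```

In this hypothetical month, text and images consume 1,800 of the 3,000 Moments, leaving room for two full videos rather than the five the raw allocation suggests.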
Tips for Better Video Results
Several practices consistently improve output quality:
Use high-quality source images. The video quality ceiling is set by the source image. Generate a fresh, high-quality image specifically as a video source rather than using an older or lower-quality image from earlier in the conversation.
Start with short clips. Test the prompt and character combination with a 3-second clip (~50 Moments) before committing to a full 600-Moment generation. This 12x cost difference makes a test clip worth the extra step.
Keep prompts specific and single-focus. One clear action or movement described specifically produces better results than multiple concurrent actions. "Slowly turn toward camera with a smile" outperforms "look around, smile, and wave."
Use Premium or Advanced generation models. If you have access to these models (Premium and Ultimate tiers), the quality improvement for video is noticeable compared to base models.
Save Moments by converting existing images. Rather than generating a new image and immediately creating video, browse your existing character images first. Auto-generated character images (4 images at character creation) can serve as video source material.
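The "start with short clips" tip can be checked with a small expected-cost comparison. This is a sketch under stated assumptions, none of which come from the platform itself: each prompt attempt misses independently with probability `p_bad`, a 3-second test clip reliably shows whether a prompt works, and you retry until you get a usable result. The function names are illustrative.

```python
# Approximate Moments costs quoted in this guide.
SHORT_CLIP_COST = 50
FULL_CLIP_COST = 600


def expected_cost_direct(p_bad: float) -> float:
    """Generate full clips until one works: 600 Moments per attempt."""
    return FULL_CLIP_COST / (1 - p_bad)


def expected_cost_test_first(p_bad: float) -> float:
    """Iterate on 50-Moment test clips, then pay for one full clip.

    Assumes a good test clip guarantees a good full clip.
    """
    return SHORT_CLIP_COST / (1 - p_bad) + FULL_CLIP_COST


# Setting the two costs equal gives p_bad = 50 / 600: testing first pays
# off once more than about 1 in 12 prompts needs a retry.
BREAK_EVEN = SHORT_CLIP_COST / FULL_CLIP_COST

if __name__ == "__main__":
    for p_bad in (0.05, 0.25, 0.5):
        direct = expected_cost_direct(p_bad)
        tested = expected_cost_test_first(p_bad)
        print(f"p_bad={p_bad:.2f}  direct={direct:.0f}  test-first={tested:.0f}")
```

Under these assumptions, a prompt that misses half the time costs about 1,200 Moments on average when generated directly, versus 700 when tested first. If your prompts almost always work on the first try, the test clip is a small extra cost rather than a saving.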
Who Actually Benefits From This Feature
The video generator adds meaningful value in specific use cases:
Worth using if: Visual content is important to your AI companion experience, you want media beyond static images, or you want content that genuinely reflects your companion's appearance rather than generic AI video.
Less relevant if: You primarily use Secrets AI for text conversation and relationship simulation, your Moments budget is tight and text/images are your priority, or you are on the free tier (video is inaccessible anyway).
Tier recommendation for video use:
- Occasional video (under 5 full clips/month) → Plus ($9.99/month)
- Regular video (5–13 full clips/month) → Premium ($19.99/month)
- Heavy video (more than 13 full clips/month) → Ultimate ($39.99/month)
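An alternative to fixed bands is to compute the cheapest tier whose monthly allocation covers a planned workload. The sketch below does that using the prices and allocations quoted in this guide. The function name and the `other_moments` parameter are hypothetical, and top-up bundles are not modelled.

```python
from typing import Optional

FULL_CLIP_COST = 600  # approximate Moments per full clip, per this guide

# (tier, monthly price in USD, monthly Moments), cheapest first.
# Lite is omitted because this guide describes it as short-clip only.
TIERS = [
    ("Plus", 9.99, 3_000),
    ("Premium", 19.99, 8_000),
    ("Ultimate", 39.99, 15_000),
]


def cheapest_tier(full_clips: int, other_moments: int = 0) -> Optional[str]:
    """Cheapest tier whose allocation covers the clips plus other usage.

    Returns None when even Ultimate falls short (top-ups would be needed).
    """
    needed = full_clips * FULL_CLIP_COST + other_moments
    for name, _price, moments in TIERS:
        if moments >= needed:
            return name
    return None


if __name__ == "__main__":
    print(cheapest_tier(3))                       # prints Plus
    print(cheapest_tier(4, other_moments=1_000))  # prints Premium
    print(cheapest_tier(15))                      # prints Ultimate
```

The second example shows why the bands are only a starting point: four full clips fit within Plus on their own, but adding 1,000 Moments of chat and images pushes the total to 3,400 and into Premium.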
For the free-vs-premium breakdown that covers how video access changes across all tiers, see video access by tier.
Which Competitors Offer Video Generation
Most of Secrets AI's direct competitors do not offer video generation:
- Character.AI — No video generation
- CrushOn AI — No video generation
- Janitor AI — No video generation
- Candy AI — Limited video capability
- Replika — No video generation
Two platforms offer comparable video features: SweetDream AI and Xotic AI (which offers 4K 15-second clips). Both are smaller platforms than Secrets AI with less developed chat and memory systems. For users whose primary need is video generation from AI companions, Secrets AI remains the most complete overall offering in this feature space.
FAQ
How long are the generated videos?
Video length depends on tier. Lite tier supports 3-second short clips (~50 Moments each). Plus, Premium, and Ultimate tiers access longer clips costing approximately 600 Moments per video. The exact maximum length for full clips is not publicly specified in frames or seconds, but reviewers describe them as "longer motion clips" beyond the 3-second short format. Processing time for all clips is approximately 2 minutes regardless of length.
Is video generation available on the free plan?
No. Video generation is not available on the free tier. Access begins at Lite ($5.99/month), which supports 3-second short clips only. Full-length video generation requires Plus tier or above. The free plan's 200 starting Moments would in any case cover only about four short clips, since video is one of the most Moments-intensive features on the platform.
How many videos can I generate per month?
It depends on your tier and whether you generate short or full clips. On Lite (1,000 Moments): approximately 20 short clips per month. On Plus (3,000 Moments): approximately 60 short clips or 5 full clips. On Premium (8,000 Moments): approximately 160 short clips or 13 full clips. On Ultimate (15,000 Moments): approximately 300 short clips or 25 full clips. These numbers assume Moments are used exclusively on video; real-use scenarios with image generation and text mixed in will reduce video capacity. Top-up Moments bundles can extend capacity beyond the monthly allocation.
Is the video quality actually good?
Yes, within the constraints of current AI video generation technology. Reviewer ratings of 4.1/5 reflect output that "looks good and moves smoothly most of the time." Character movement is generally natural, and facial expressions hold up in most outputs. Quality varies based on source image quality, prompt specificity, and generation model: Premium and Advanced models produce better results than the base model. The occasional quality variation noted by reviewers is typically related to overly complex prompts or lower-quality source images rather than a systematic quality problem.