fal.ai Models

Total: 1641 • New: 0 • Active: 1418 • Deprecated: 223

Model Name	Description	Status	Published	Action
MiniMax H3 Text to Video text-to-video	MiniMax H3 is a frontier video model. This endpoint generates video from a text prompt alone, rendering at 2K in durations from 5 to 15 seconds across seven aspect ratios. stylized transform lipsync	OK	2d	→
MiniMax H3 Reference to Video image-to-video	MiniMax H3 is a frontier video model. This endpoint generates 2K video from multimodal references (up to 9 images for subject and style, 3 video clips for motion, and 3 audio clips), each cited in the prompt by order, keeping subjects consistent while following the referenced motion and audio. stylized transform lipsync	OK	2d	→
MiniMax H3 Image to Video image-to-video	MiniMax H3 is a frontier video model. This endpoint animates a supplied image into 2K video, using it as the opening frame, or pairs a first and last frame to control a transition between two images, with the aspect ratio following the input. stylized transform lipsync	OK	2d	→
Ideogram Object Removal image-to-image	Prompt-free object removal from an image and mask, erasing objects with their shadows and reflections and reconstructing the scene cleanly. utility editing	OK	3d	→
Grok Imagine Video 1.5 Text to Video text-to-video	Generate videos from prompts with audio using xAI's Grok Imagine 1.5 Video model. stylized transform lipsync	OK	3d	→
Grok Imagine Video 1.5 Reference to Video image-to-video	Generate videos from images and audio references using xAI's Grok Imagine 1.5 Video model. stylized transform lipsync	OK	3d	→
Hailuo 03 Reference to Video image-to-video	MiniMax's Hailuo-03 is a frontier video model. This endpoint generates 2K video from multimodal references (up to 9 images for subject and style, 3 video clips for motion, and 3 audio clips), each cited in the prompt by order, keeping subjects consistent while following the referenced motion and audio. stylized transform lipsync	OK	4d	→
Hailuo 03 Image to Video image-to-video	MiniMax's Hailuo-03 is a frontier video model. This endpoint animates a supplied image into 2K video, using it as the opening frame, or pairs a first and last frame to control a transition between two images, with the aspect ratio following the input. stylized transform lipsync	OK	4d	→
Hailuo 03 Text to Video text-to-video	MiniMax's Hailuo-03 is a frontier video model. This endpoint generates video from a text prompt alone, rendering at 2K in durations from 5 to 15 seconds across seven aspect ratios. stylized transform lipsync	OK	4d	→
MAI Image 2.5 Pro (Edit) image-to-image	Apply precise, controllable edits to a reference image while preserving composition, typography, identity, and fine visual detail. image-editing typography photorealism controllable-editing multi-image	OK	5d	→
MAI Image 2.5 Pro (Text to Image) text-to-image	Generate high-fidelity, design-ready images with precise typography, strong prompt alignment, and rich visual detail using Microsoft's flagship MAI Image 2.5 Pro. photorealism typography illustration commercial text-to-image	OK	5d	→
Feynobg Background Remover image-to-image	FeyNobg is a state-of-the-art AI model for background removal from feyninc. utility editing	OK	5d	→
Pixelcut Product Photo image-to-image	Pixelcut's Background Remover produces fast, high-quality cutouts built for e-commerce product imagery utility editing	OK	5d	→
Ltx 2.3 Quality video-to-video	Remove a character from your video using LTX-2.3. video clean	OK	8d	→
Happy Oyster video-to-video	Realtime interactive world model — generate a world from a prompt, then explore it or direct its story as live video. video happy-oyster	OK	10d	→
Qwen Audio 3.0 TTS (Flash) text-to-speech	Generate natural multilingual speech from text with fast voice and language control using Qwen Audio 3.0 TTS Flash. text-to-speech audio speech-synthesis multilingual	OK	11d	→
Qwen Image 3 Text to Image text-to-image	Generates images from a text prompt at resolutions up to 2048×2048, with automatic prompt rewriting and prompt-guided resolution selection, building on Qwen's strength in complex text rendering and precise prompt adherence. stylized transform typography	OK	12d	→
Qwen Image 3 Image Editing image-to-image	Edits images from one to three reference images and a natural-language instruction, preserving key details such as facial features and identity while applying the requested changes. stylized transform typography	OK	12d	→
V1.1 Video to Video Music video-to-video	Generates synced music for any video. Returns a licensed music soundtrack ready for commercial use, optionally preserving the original speech in the video. music editing restoration	OK	12d	→
V1.1 Video to Sound Effects video-to-audio	Analyzes a video and generates synchronized, royalty-free sound effects timed to visible actions. Returns the generated sound-effects audio track for commercial use. sfx audio effects	OK	12d	→
V1.1 Video to Video Sound Effects video-to-video	Adds synchronized, royalty-free, commercial-use-safe sound effects to a video. Returns the finished video with the generated audio mixed in. sfx audio effects	OK	12d	→
V1.1 Text to Sound Effects text-to-audio	Generates high-quality, commercial-use-safe sound effects from a text prompt, with full control over type, texture, intensity, and exact duration. sfx audio effects	OK	12d	→
Ltx 2.3 video-to-video	LTX-2.3 Reframe converts your videos to any aspect ratio without destructive cropping. It intelligently recenters the original footage and generatively fills the newly exposed areas with content that seamlessly matches the scene, so the result looks like it was shot natively in the target format. Turn landscape footage into vertical 9:16 for social, square 1:1 for feeds, or anything in between. Supports videos up to 60 seconds, with 720p and 1080p outputs across 1:1, 4:5, 5:4, 9:16 and 16:9. reframe size	OK	17d	→
V1.1 Video to Video Music video-to-video	Generates synced music for any video. Returns a licensed music soundtrack ready for commercial use, optionally preserving the original speech in the video. stylized transform lipsync	OK	17d	→
Lucy 2.5 video-to-video	Real-time, prompt-driven video editing over WebRTC. Restyle, swap backgrounds, and add or replace objects live on a webcam or streamed feed at interactive latency. realtime video-to-video webrtc	OK	18d	→
Bria Product Dimensions image-to-image	Bria Product Dimensions turns one product photo and its measurements into a marketplace-ready dimension image with callout lines, labels, and weight or capacity readouts stylized transform typography	OK	20d	→
Reve 2.1 image-to-image	Remix images from text prompts with strong prompt adherence, layout intelligence, and accurate text rendering using Reve 2.1 stylized transform typography	OK	20d	→
Reve 2.1 image-to-image	Edit images from text prompts with strong prompt adherence, layout intelligence, and accurate text rendering using Reve 2.1	OK	20d	→
Reve 2.1 text-to-image	Generate high-quality images from text prompts with strong prompt adherence, layout intelligence, and accurate text rendering using Reve 2.1. text-to-image	OK	20d	→
VEED Lipsync video-to-video	Generate production-quality lipsync from any audio using VEED's most advanced model yet. veed lipsync video-to-video avatar	OK	20d	→
Krea 2 Text to Image Turbo Style text-to-image	Generate high-fidelity images from text with Krea 2 using a style reference image. Apply a reference image to guide the visual style into new generations, with aspect ratio, creativity, and seed controls. stylized style transfer reference image realism	OK	23d	→
Seedream text-to-image	Text to Image endpoint for the fast Lite version of Seedream 5.0, supporting high quality intelligent text-to-image generation. text-to-image bytedance seedream-5.0-lite	OK	24d	→
Seedream image-to-image	Image editing endpoint for the fast Lite version of Seedream 5.0, supporting high quality intelligent image editing with multiple inputs. bytedance seedream-5.0-lite edit	OK	24d	→
Seedream 5.0 Pro Text to Image text-to-image	ByteDance's Seedream 5.0 Pro is a flagship text-to-image model, with deep-thinking prompt understanding, native text in 14 languages, and precise control over dense layouts and structured designs. realism typography stylized	OK	25d	→
Seedream 5.0 Pro Image Editing image-to-image	Seedream 5.0 Pro is a grounded, region-precise image editing model that changes one element while keeping the rest of the frame intact, with layer separation, sketch completion, and up to 10 reference images. realism typography stylized	OK	25d	→
Ltx 2.3 Quality video-to-video	Extend high-quality video with audio from input video using LTX-2.3 with LoRA. extend longer	OK	26d	→
Ltx 2.3 Quality video-to-video	Extend high-quality video with audio from input video using LTX-2.3 extend longer	OK	26d	→
V4.0q [instant] text-to-image	Generate high-quality images, posters, and logos with Ideogram's latest V4.0q, producing crisp visuals with accurate text rendering, fine detail, and full creative control for polished, ready-to-use designs in a fraction of a second. realism typography stylized	OK	27d	→
V4.0q [fast] text-to-image	Generate high-quality images, posters, and logos with Ideogram's latest V4.0q, producing crisp visuals with accurate text rendering, fine detail, and full creative control for polished, ready-to-use designs in a second. realism typography stylized	OK	7/3	→
Nano Banana 2 Lite text-to-image	Nano Banana Lite is the efficiency-focused model in the image generation family. Sub-2-second latency with cost-effective generation and editing, fast multi-turn local edits, and 14 supported aspect ratios.	OK	6/30	→
Nano Banana Lite Edit image-to-image	Nano Banana Lite is the efficiency-focused model in the image generation family. Sub-2-second latency with cost-effective generation and editing, fast multi-turn local edits, and 14 supported aspect ratios.	OK	6/30	→
Nano Banana Lite text-to-image	Nano Banana Lite is the efficiency-focused model in the image generation family. Sub-2-second latency with cost-effective generation and editing, fast multi-turn local edits, and 14 supported aspect ratios.	OK	6/30	→
Gemini Omni Flash image-to-video	Generates video with audio from combined multimodal references. Accepts text, images, audio, and video together as input to guide subject, motion, style, and sound in the output. stylized transform lipsync	OK	6/30	→
Gemini Omni Flash image-to-video	Animates a still image into video with audio. Extends a single frame into coherent motion, grounded in Gemini's physical understanding of how scenes and subjects behave. stylized transform lipsync	OK	6/30	→
Gemini Omni Flash video-to-video	Edits generated video across multiple conversational turns while preserving scene coherence. Applies iterative changes through natural-language instructions without regenerating the full sequence from scratch. stylized transform lipsync	OK	6/30	→
Gemini Omni Flash text-to-video	Creates video with synchronized audio from text input. Grounded in Gemini's real-world knowledge, with improved physics understanding for more coherent motion and interaction. stylized transform lipsync	OK	6/30	→
Extract Object image-to-image	Bria Extract Object uses text prompts to isolate a selected object from an image and return it as an RGBA PNG with a transparent background. Ideal for product, ecommerce, advertising, and creative editing workflows. Bria's Extract Object API leads in product shot extraction, outperforming SAM 3.1 where it counts most for commercial use.	OK	6/28	→
Ltx 2.3 Quality video-to-video	Transform your 3D video render into realistic video using the first frame with LTX-2.3. 3d video	OK	6/26	→
Seed Audio 1.0 text-to-audio	Seed Audio 1.0 is a new audio model from ByteDance that can generate high-quality, natural-sounding audio from text, reference audio, or an image.	OK	6/25	→
Ltx 2.3 Quality video-to-video	Cross-eyes for high-quality video using LTX-2.3 eyes	OK	6/25	→
Ltx 2.3 Quality video-to-video	Day to Night for high-quality video using LTX-2.3 day night	OK	6/25	→
Ltx 2.3 Quality video-to-video	Water Simulation transformation for high-quality video using LTX-2.3 water simulation	OK	6/25	→
Ltx 2.3 Quality video-to-video	Instant shave high-quality video using LTX-2.3 shave hair	OK	6/25	→
Ltx 2.3 Quality video-to-video	Decompression / Denoise high-quality video using LTX-2.3 decompression denoise	OK	6/25	→
Ltx 2.3 Quality video-to-video	Deblur high-quality video using LTX-2.3 deblur denoise	OK	6/24	→
Ltx 2.3 Quality video-to-video	Colorize high-quality video using LTX-2.3	OK	6/24	→
TRELLIS.2 LoRA Inference image-to-3d	Run inference on LoRA adapters for TRELLIS.2 model	OK	6/23	→
Seedance 2.0 Mini image-to-video	Seedance 2.0 Mini is a faster version of Seedance 2.0 that brings great performance and high generation speed at a lower cost. stylized transform lipsync	OK	6/23	→
TRELLIS.2 Trainer training	Train LoRA adapters for TRELLIS.2 model	OK	6/23	→
Seedance 2.0 Mini Image to Video image-to-video	Seedance 2.0 Mini is a faster version of Seedance 2.0 that brings great performance and high generation speed at a lower cost. stylized transform lipsync	OK	6/23	→
Seedance 2.0 Mini Text to Video text-to-video	Seedance 2.0 Mini is a faster version of Seedance 2.0 that brings great performance and high generation speed at a lower cost. stylized transform lipsync	OK	6/23	→
Telestyle V2 Style Transfer image-to-image	Restyle any image with TeleStyle v2 — provide an original image and a styling reference, and the model re-renders the original in the reference's visual style while preserving its content and composition. stylized transform editing	OK	6/22	→
sync-3 Avatar Image to Video image-to-video	sync-3 image to video turns a single still into a talking character, and works with any illustration or animated frame paired with a voice track animation lip sync text-to-speech	OK	6/22	→
Happy Horse 1.1 Reference to Video image-to-video	Happy Horse 1.1 is Alibaba's #1-ranked video model. This reference-to-video endpoint turns up to 9 reference images into 1080p video with synchronized native audio and multilingual lip-sync for consistent characters. happy-horse video reference	OK	6/21	→
Happy Horse 1.1 Image to Video image-to-video	Happy Horse 1.1 is Alibaba's #1-ranked video model. This image-to-video endpoint animates a still image into 1080p video with synchronized native audio and multilingual lip-sync happy-horse video image	OK	6/21	→
Happy Horse 1.1 Text to Video text-to-video	Happy Horse 1.1 is Alibaba's #1-ranked video model. This text-to-video endpoint generates 1080p video with synchronized native audio and multilingual lip-sync from a text prompt alone. happy-horse video text	OK	6/21	→
Krea 2 Text to Image Turbo LoRA text-to-image	Generate high-fidelity images from text with Krea 2 using a custom-trained LoRA. Apply your LoRA weights to carry a learned subject, character, or style into new generations, with aspect ratio, creativity, and seed controls. stylized transform typography realism	OK	6/19	→
Krea 2 Trainer training	Train a custom LoRA on your own images to teach Krea 2 a new subject, character, or style. Provide a set of training images (and an optional trigger word), and the trainer outputs LoRA weights you can use for inference with the Krea 2 LoRA endpoint. lora personalization	OK	6/19	→
Ltx 2.3 Quality image-to-video	Generate high-quality video with audio from reference, character sheet, storyboard using LTX-2.3 ingredient storyboard video	OK	6/18	→
Async Text to Speech Pro V1.0 text-to-speech	Generate professional-quality voiceovers in seconds with the Async TTS Pro model, with text-based control over pauses, emphasis, and timing. Voice IDs can be found at https://async.com/developer/voice-library text-to-speech voice-clone lipsync	OK	6/18	→
Krea 2 Turbo text-to-image	Generate high-fidelity images from text in seconds with Krea 2 Turbo, the speed-optimized open-source version of Krea 2, preserving its aesthetic range for rapid ideation. stylized transform typography	OK	6/18	→
Scail 2 video-to-video	SCAIL-2 is an end-to-end character animation model that drives a reference character from a source video without relying on intermediate pose representations like skeleton maps. stylized transform	OK	6/18	→
Ltx 2.3 Quality video-to-video	Inpaint high-quality video using LTX-2.3 with LoRA. inpaint	OK	6/18	→
Ltx 2.3 Quality video-to-video	Inpaint high-quality video using LTX-2.3	OK	6/18	→
Boogu Image image-to-image	Image To Image Model using Boogu-Image	OK	6/18	→
Boogu Image text-to-image	Text To Image Model using Boogu-Image	OK	6/18	→
Sensenova U1 Infographic text-to-image	Generate Infographic Image with Sensenova U1	OK	6/17	→
Ltx 2.3 Quality text-to-audio	Generate high-quality audio from text using LTX-2.3 with LoRA. text-to-audio	OK	6/17	→
Ltx 2.3 Quality text-to-audio	Text to Audio high-quality using LTX-2.3 text-to-audio	OK	6/17	→
Ltx 2.3 Quality video-to-video	Outpaint high-quality video using LTX-2.3 with LoRA. outpaint outpainting	OK	6/17	→
Ltx 2.3 Quality video-to-video	Outpaint high-quality video using LTX-2.3 outpaint outpainting	OK	6/17	→
LTX 2.3 Trainer (V2) - Masked Audio+Video IC-LoRA training	Train an IC-LoRA that regenerates a masked video region (guided by kept pixels and a video reference) while jointly generating audio from an audio reference.	OK	6/17	→
LTX 2.3 Trainer (V2) - Masked Video-to-Video IC-LoRA training	Train an IC-LoRA that regenerates only the masked region of a video, guided by the kept pixels and a separate reference/control video.	OK	6/17	→
LTX 2.3 Trainer (V2) - Audio+Video Reference IC-LoRA training	Train an IC-LoRA for a joint audio+video transformation, conditioned on a reference clip's video and audio to produce a matching target.	OK	6/17	→
LTX 2.3 Trainer (V2) - Audio-to-Audio IC-LoRA training	Train an IC-LoRA that transforms one audio clip into another, conditioned at inference on a reference audio clip.	OK	6/17	→
LTX 2.3 Trainer (V2) - Video-to-Video IC-LoRA training	Train an IC-LoRA that learns a video-to-video transformation from paired before/after clips, conditioned at inference on a reference (control) video.	OK	6/17	→
LTX 2.3 Trainer (V2) - Audio Inpainting training	Train a LoRA that regenerates masked time spans of an audio clip while keeping the rest unchanged.	OK	6/17	→
LTX 2.3 Trainer (V2) - Backward Audio Extension training	Train a LoRA that generates the lead-in to an audio clip, extending audio backward in time from its ending.	OK	6/17	→
LTX 2.3 Trainer (V2) - Forward Audio Extension training	Train a LoRA that continues an audio clip forward in time, generating the audio that follows a short clean prefix.	OK	6/17	→
LTX 2.3 Trainer (V2) - Text-to-Audio training	Train a LoRA that generates audio from a text prompt — the audio counterpart of text-to-video — learning a sound or style from your clips.	OK	6/17	→
LTX 2.3 Trainer (V2) - Keyframe Interpolation training	Train a LoRA that generates the video between keyframes — supply first/last (and optional middle) frames at inference and the model fills the in-between motion.	OK	6/17	→
LTX 2.3 Trainer (V2) - Masked Audio+Video Transformation training	Train a LoRA that regenerates a masked video region (guided by kept pixels and a video reference) while jointly generating audio from an audio reference.	OK	6/17	→
LTX 2.3 Trainer (V2) - Masked Video-to-Video training	Train a LoRA that regenerates only the masked region of a video, guided by both the kept pixels and a separate reference/control video.	OK	6/17	→
LTX 2.3 Trainer (V2) - Audio+Video Reference Transformation training	Train a LoRA for a joint audio+video transformation, conditioned on a reference clip (its video and audio) to produce a matching target clip.	OK	6/17	→
LTX 2.3 Trainer (V2) - Audio-to-Audio training	Train a LoRA that transforms one audio clip into another, learning a reference→target mapping from paired audio examples.	OK	6/17	→
LTX 2.3 Trainer (V2) - Video Inpainting training	Train a LoRA that regenerates a masked region of a video while keeping the rest unchanged, blending the new content with its surroundings.	OK	6/17	→
LTX 2.3 Trainer (V2) - Spatial Outpainting training	Train a LoRA that expands the video frame outward, keeping an inner rectangle fixed and generating the surrounding region.	OK	6/17	→
LTX 2.3 Trainer (V2) - Backward Video Extension training	Train a LoRA that generates the lead-in to a video, extending a clip backward in time from its ending.	OK	6/17	→
LTX 2.3 Trainer (V2) - Forward Video Extension training	Train a LoRA that continues a video forward in time — supply an opening clip at inference and the model generates what comes next.	OK	6/17	→
LTX 2.3 Trainer (V2) - Video-to-Audio training	Train a LoRA that generates audio (foley / sound design) for a silent video, learning a soundtrack that matches the on-screen action.	OK	6/17	→
LTX 2.3 Trainer (V2) - Audio-to-Video training	Train a LoRA that generates video from a start image plus a conditioning audio track, producing motion that matches the sound.	OK	6/17	→
LTX 2.3 Trainer (V2) - Video-to-Video training	Train a LoRA that learns a video-to-video transformation from paired before/after clips, steered at inference by a reference (control) video.	OK	6/17	→
LTX 2.3 Trainer (V2) - Image-to-Video training	Fine-tune LTX 2.3 to animate a starting image — supply a still plus a prompt at inference and the model generates a video that begins from that frame.	OK	6/17	→
LTX 2.3 Trainer (V2) - Text-to-Video training	Fine-tune LTX 2.3 on your own clips to teach it a new subject, character, object, or visual style, then generate full videos from a text prompt.	OK	6/17	→
Kling Video V3 Standard Turbo Text to Video text-to-video	Kling 3.0 Turbo Standard is a fast, cost-efficient video generation model that turns text prompts directly into 720P video with native audio, optimized for rapid iteration and high-volume production stylized transform lipsync	OK	6/16	→
Kling Video V3 Turbo Pro Text to Video text-to-video	Generate high quality 1080p videos using Kling's Turbo 3.0 model, with improved lipsync and multishot generation capabilities. kling v3 1080p turbo	OK	6/16	→
Kling Video V3 Standard Turbo Image to Video image-to-video	Kling 3.0 Turbo Standard animates a first and last frame reference image into 720P video with native audio, delivering quick, affordable image-driven motion for fast turnaround stylized transform lipsync	OK	6/16	→
Kling Video V3 Turbo Pro Image to Video image-to-video	Generate high quality 1080p videos from images using Kling's Turbo 3.0 model, with improved lipsync and multishot generation capabilities. kling v3 turbo 1080p	OK	6/16	→
Zonos2 Text to Speech text-to-speech	Zonos2 is a text-to-speech model that clones a voice from a short sample and speaks naturally across many languages. text-to-speech tts voice cloning	OK	6/16	→
Meshy Rigging Multi Animation 3d-to-3d	Meshy auto-rigs a humanoid 3D model, fitting a skeleton and binding the mesh, then applies several motion presets from its animation library. stylized transform 3D	OK	6/12	→
Pixelcut Video Background Removal video-to-video	Pixelcut's Video Background Remover is an AI segmentation model that erases backgrounds frame by frame, with seamless temporal consistency. transform utility rembg	OK	6/11	→
Stable Audio 3 Trainer training	Stable Audio 3 LoRA Trainer fine-tunes Stable Audio 3 base models on paired audio-caption datasets, producing compact LoRA weights that adapt generation toward a custom music style, sound palette, or domain. music audio sfx lora	OK	6/11	→
Luma Ray 3.2 Reframe video-to-video	Luma Ray 3.2 reframes an existing video into a new aspect ratio guided by a text prompt, preserving the original footage frame-for-frame while controlling resolution and outpainting the surrounding canvas. stylized transform lipsync	OK	6/11	→
Luma Ray 3.2 Video to Video video-to-video	Luma Ray 3.2 re-renders an existing video into new cinematic motion guided by a text prompt, preserving the source's look and movement while controlling resolution, duration, and HDR. stylized transform lipsync	OK	6/11	→
Ideogram V4.0q Tiling LoRA image-to-image	Ideogram V4.0q Tiling LoRA produces seamless repeatable patterns guided by a custom-trained LoRA, locking a specific aesthetic or motif into tileable textures for cohesive, large-scale surface design. stylized transform realism	OK	6/10	→
Ideogram V4.0q Tiling image-to-image	Ideogram V4.0q Tiling generates seamless, edge-matching textures and patterns that repeat infinitely in any direction, ideal for backgrounds, surfaces, and wallpapers. stylized transform realism	OK	6/10	→
Ideogram V4.0q Image to Image LoRA image-to-image	Ideogram V4.0q Image-to-Image LoRA applies a custom-trained LoRA on top of an input image, steering edits toward a specific style, subject, or brand identity while keeping the source composition intact. stylized transform realism	OK	6/10	→
Ideogram V4.0q Image to Image image-to-image	Ideogram V4.0q Image-to-Image transforms an input image with a text prompt, restyling and reworking the composition while preserving its core structure for prompt-faithful, high-fidelity edits. realism typography stylized	OK	6/10	→
Ideogram V4.0q LoRA Trainer training	Train custom LoRAs for personalization, styles or other use cases on top of Ideogram V4.	OK	6/9	→
Luma Uni-1 Text to Image text-to-image	Luma Uni-1 turns a text prompt into a single high-fidelity image, with control over aspect ratio and visual style, plus optional web-sourced and reference-image guidance for sharper grounding. realism typography stylized	OK	6/9	→
Luma Uni-1 Edit Max image-to-image	Luma Uni-1 Max Edit applies text-guided edits to a source image at maximum fidelity, holding the original structure while honoring reference images for precise, high-detail revisions. realism typography stylized	OK	6/9	→
Luma Uni-1 Text to Image Max text-to-image	Luma Uni-1 Max generates a single image at the model's highest fidelity, delivering richer detail and stronger prompt adherence than the base tier for hero-quality stills. realism typography stylized	OK	6/9	→
Luma Uni-1 Edit image-to-image	Luma Uni-1 Edit reworks a source image from a text instruction, preserving the original composition while applying style changes and following optional reference images to steer the result. stylized transform	OK	6/9	→
Luma Ray 3.2 Image to Video image-to-video	Luma Ray 3.2 animates a source image into cinematic motion guided by a text prompt, preserving the starting frame's look while controlling resolution, duration, and seamless looping. stylized transform lipsync	OK	6/9	→
Luma Ray 3.2 Text to Video text-to-video	Luma Ray 3.2 generates cinematic video from a text prompt, with control over resolution, duration, and seamless looping, plus reference images to lock in subject and style. stylized transform lipsync	OK	6/9	→
Bria's VRMBG 3.0 Realtime video-to-video	Remove video backgrounds in real time with Bria’s VRMBG 3.0 model. Built for live streaming, real-time video apps, content creation, and low-latency workflows that need fast, accurate background removal. bria video background-removal realtime	OK	6/9	→
Ideogram V4.0q Text to Image (LoRA) text-to-image	Generate high-quality images, posters, and logos with Ideogram's latest V4.0q using LoRA — producing crisp visuals with accurate text rendering, fine detail, and full creative control for polished, ready-to-use designs. realism typography stylized	OK	6/8	→
Bernini-R Edit Image image-to-image	Edit any image with a natural-language instruction using Bernini-R, changing the weather, materials, objects, or style while preserving the original composition. edit image transform	OK	6/8	→
Bernini-R Reference to Video image-to-video	Turn up to five reference images into one continuous, consistent video with Bernini-R, with smooth, stable camera motion and no scene cuts. reference video stylized	OK	6/8	→
Bernini-R Reference Edit Video video-to-video	Edit a video guided by reference images with Bernini-R, bringing an object, material, background, style, or weather from a reference image into your video. edit reference transform	OK	6/8	→
Bernini-R Text to Video text-to-video	Generate high-quality video from a text prompt with Bernini-R, ByteDance's unified video generation and editing model. text-to-video cinematic	OK	6/8	→
Bernini-R Edit Video video-to-video	Edit any video with a natural-language instruction using Bernini-R, changing objects, weather, background, or camera angle while keeping the rest of the scene intact. edit transform stylized	OK	6/8	→
Genfill image-to-image	The GenFill Route enables the generation of objects by prompt in a specific region of an image. You can define the area for object generation by using a mask that outlines the region where the object will be created. Our model is optimized to work seamlessly with blob-shaped masks.	OK	6/8	→
Fibo Edit image-to-image	The GenFill Route enables the generation of objects by prompt in a specific region of an image. You can define the area for object generation by using a mask that outlines the region where the object will be created. Our model is optimized to work seamlessly with blob-shaped masks.	Deprecated	6/8	→
Bria's VRMBG 3.0 video-to-video	Remove backgrounds from any video with Bria's VRMBG 3.0. Fast, accurate background removal across talking heads, podcasts, product videos, commercials, and cinematic footage. video-to-video	OK	6/8	→
Hyper3D - Rodin V2.5 - Text to 3D - Fast text-to-3d	Rodin V2.5 by Hyper3D generates realistic, production-ready 3D models from text or images. Use the fast model for rapid prototyping. text-to-3d	OK	6/5	→
Hyper3D - Rodin V2.5 - Image to 3D - Fast image-to-3d	Rodin V2.5 by Hyper3D generates realistic, production-ready 3D models from text or images. Use the fast model for rapid prototyping. image-to-3d	OK	6/5	→
Bytedance Seed Speech Text to Speech text-to-speech	Seed Speech, developed by ByteDance, is a family of large-scale text-to-speech models capable of synthesizing speech that is virtually indistinguishable from human speech. stylized transform lipsync	OK	6/5	→
Scene Finder vision	Search any video with a text prompt - Scene Finder locates the matching moments and returns their time segments and extracted frames. video scene detection video search moment retrieval vision video understanding	OK	6/3	→
Krea 2 Medium Text to Image Turbo text-to-image	Generate high-fidelity images extremely fast from text with Krea 2 Medium Turbo, supporting aspect ratio, creativity, seed controls, and optional style references. stylized transform typography	OK	6/3	→
Mai Image 2.5 image-to-image	MAI-Image-2.5 is Microsoft's photorealistic image generation and editing model that turns text prompts or uploaded images into high-quality, design-ready visuals with fine-grained, pixel-level control. realism typography stylized	OK	6/3	→
Triposplat image-to-3d	TripoSplat is an open-source model from TripoAI / VAST AI Research that converts a single 2D image into high-quality 3D Gaussians using a novel learned density-control approach 3D gaussian-splat	OK	6/3	→
V1.1 video-to-audio	Analyzes your video’s pacing, mood, and timing to generate a frame-synced, licensed, commercial-use-safe soundtrack in seconds. stylized transform lipsync	OK	6/3	→
Sonilo V1.1 Text to Music text-to-audio	Generates licensed, commercial-use-safe music from a single text prompt, with full control over style, mood, instrumentation, and exact duration. stylized transform lipsync	OK	6/3	→
Mai Image 2.5 Text to Image text-to-image	MAI-Image-2.5 is Microsoft's photorealistic image generation and editing model that turns text prompts or uploaded images into high-quality, design-ready visuals with fine-grained, pixel-level control. realism typography stylized	OK	6/2	→
Ideogram V4.0 Text to Image text-to-image	Generate high-quality images, posters, and logos with Ideogram's latest V4.0q — producing crisp visuals with accurate text rendering, fine detail, and full creative control for polished, ready-to-use designs. realism typography stylized	OK	6/2	→
Cosmos 3 Super Image to Video image-to-video	Cosmos3 is a collection of Omnimodal world models capable of generating dynamic, high-quality video, image, audio, and action commands from combinations of text, image, video, and action trajectory inputs. stylized transform lipsync	OK	6/1	→
Ltx 2.3 Quality video-to-video	Generate HDR from reference video using LTX-2.3 with LoRA video-to-video lora	OK	6/1	→
Ltx 2.3 Quality video-to-video	Generate HDR from reference video using LTX-2.3	OK	6/1	→
Ltx 2.3 Quality video-to-video	Generate high-quality video with audio from reference video, text and images using LTX-2.3 and custom LoRA	OK	6/1	→
Ltx 2.3 Quality video-to-video	Generate high-quality video with audio from reference video, text and images using LTX-2.3	OK	6/1	→
Ltx 2.3 Quality audio-to-video	Generate high-quality video with audio from audio, text and images using LTX-2.3 and custom LoRA audio-to-video lora	OK	6/1	→
Ltx 2.3 Quality audio-to-video	Generate high-quality video with audio from audio, text and images using LTX-2.3 audio-to-video	OK	6/1	→
Cosmos 3 Super text-to-image	Cosmos3 is a collection of Omnimodal world models capable of generating dynamic, high-quality video, image, audio, and action commands from combinations of text, image, video, and action trajectory inputs. stylized transform realism	OK	6/1	→
Ltx 2.3 Quality image-to-video	Generate high-quality video with audio from images using LTX-2.3 and custom LoRA image-to-video	OK	6/1	→
Ltx 2.3 Quality image-to-video	Generate high-quality video with audio from images using LTX-2.3 image-to-video	OK	6/1	→
Ltx 2.3 Quality text-to-video	Generate high-quality video with audio from text using LTX-2.3 and custom LoRA lora text-to-video video	OK	6/1	→
Ltx 2.3 Quality text-to-video	Generate high-quality video with audio from text using LTX-2.3 text-to-video video	OK	6/1	→
Nemotron Asr Multilingual speech-to-text	Nemotron-ASR-Streaming is a multilingual, streaming automatic speech recognition (ASR) model engineered to deliver high-quality multilingual transcription across both low-latency streaming and high-throughput batch workloads. utility transcribe	OK	6/1	→
Grok Imagine Video 1.5 image-to-video	Generate videos from images with audio using xAI's Grok Imagine 1.5 Video model. stylized transform lipsync	OK	5/31	→
Hyper3D - Rodin V2.5 - Image to 3D image-to-3d	Rodin V2.5 by Hyper3D generates realistic, production-ready 3D models from text or images. image-to-3d	OK	5/28	→
Hyper3D - Rodin V2.5 - Text to 3D text-to-3d	Rodin V2.5 by Hyper3D generates realistic, production-ready 3D models from text or images. text-to-3d	OK	5/28	→
FLUX Virtual Try-On image-to-image	Generate virtual try-on results from a person image plus one or more garment references. image-to-image vton	OK	5/27	→
ControlLight image-to-image	ControlLight is a LoRA fine-tune of FLUX.2 [klein] 9B that enhances low-light images while preserving scene structure and fine details, with a single alpha parameter that gives continuous control over enhancement strength from subtle to full brightening. stylized transform	OK	5/27	→
Krea 2 Medium text-to-image	Generate high-quality images from text with Krea 2 Medium, supporting aspect ratio, creativity controls, seeds, and optional style references. text-to-image image-generation style-reference krea krea-2	OK	5/27	→
Krea 2 Large text-to-image	Generate high-fidelity images from text with Krea 2 Large, supporting aspect ratio, creativity, seed controls, and optional style references. text-to-image image-generation style-reference krea krea-2	OK	5/27	→
Stable Audio 3 Small SFX Base Audio Outpainting audio-to-audio	Stable Audio 3 Small SFX Base audio outpainting is the foundational 459 million parameter checkpoint that extends sound-effect tracks via causal continuation guided by text prompts. sfx extension continuation	OK	5/25	→
Stable Audio 3 Small SFX Audio Outpainting audio-to-audio	Stable Audio 3 Small SFX audio outpainting is a 459 million parameter latent diffusion model that extends sound-effect tracks beyond their original endpoint via causal continuation. sfx extension continuation	OK	5/25	→
Stable Audio 3 Small SFX Base Audio Inpainting audio-to-audio	Stable Audio 3 Small SFX Base audio inpainting is the foundational 459 million parameter checkpoint for editing or filling selected sound-effect segments guided by text prompts. sfx editing restoration	OK	5/25	→
Stable Audio 3 Small SFX Audio Inpainting audio-to-audio	Stable Audio 3 Small SFX audio inpainting is a 459 million parameter latent diffusion model that fills in or reworks selected segments of a sound-effect track guided by text prompts. sfx editing restoration	OK	5/25	→
Stable Audio 3 Small SFX Base Audio to Audio audio-to-audio	Stable Audio 3 Small SFX Base audio-to-audio is the foundational 459 million parameter checkpoint that transforms input audio into new sound-effect variations guided by text prompts. sfx sound-effects style-transfer	OK	5/25	→
Stable Audio 3 audio-to-audio	Stable Audio 3 Small SFX audio-to-audio is a 459 million parameter latent diffusion model that transforms input audio into new sound-effect variations guided by text prompts. sfx sound-effects style-transfer	OK	5/25	→
Stable Audio 3 Small SFX Base Text to Audio text-to-audio	Stable Audio 3 Small SFX Base is the foundational 459 million parameter checkpoint generating sound effects from text prompts, intended as the unmodified base for fine-tuning. sfx sound-effects on-device	OK	5/25	→
Stable Audio 3 Small SFX Text to Audio text-to-audio	Stable Audio 3 Small SFX is a 459 million parameter latent diffusion model that generates high-quality sound effects from text prompts, designed for on-device deployment on mobile phones and consumer laptops. sfx sound-effects on-device	OK	5/25	→
Stable Audio 3 Small Music Base Audio Outpainting audio-to-audio	Stable Audio 3 Small Music Base audio outpainting is the foundational 459 million parameter checkpoint that extends music tracks via causal continuation guided by text prompts. music extension continuation	OK	5/25	→
Stable Audio 3 Small Music Audio Outpainting audio-to-audio	Stable Audio 3 Small Music audio outpainting is a 459 million parameter latent diffusion model that extends music compositions beyond their original endpoint via causal continuation. music extension continuation	OK	5/25	→
Stable Audio 3 Small Music Base Audio Inpainting audio-to-audio	Stable Audio 3 Small Music Base audio inpainting is the foundational 459 million parameter checkpoint for editing or filling selected music segments guided by text prompts. music editing restoration	OK	5/25	→
Stable Audio 3 Small Music Audio Inpainting audio-to-audio	Stable Audio 3 Small Music audio inpainting is a 459 million parameter latent diffusion model that fills in or reworks selected segments of a music track guided by text prompts. music editing restoration	OK	5/25	→
Stable Audio 3 Small Music Base Audio to Audio audio-to-audio	Stable Audio 3 Small Music Base audio-to-audio is the foundational 459 million parameter checkpoint that transforms input music into new variations up to 2 minutes guided by text prompts. music style-transfer remix	OK	5/25	→
Stable Audio 3 audio-to-audio	Stable Audio 3 Small Music audio-to-audio is a 459 million parameter latent diffusion model that transforms input music into new variations up to 2 minutes guided by text prompts. music style-transfer remix	OK	5/22	→
Stable Audio 3 text-to-audio	Stable Audio 3 Small Music Base is the foundational 459 million parameter checkpoint generating full music compositions up to 2 minutes from text prompts, intended as the unmodified base for fine-tuning. music on-device lightweight	OK	5/22	→
Stable Audio 3 Small Music Text to Audio text-to-audio	Stable Audio 3 Small Music is a 459 million parameter latent diffusion model that generates full stereo music compositions up to 2 minutes from text prompts, lightweight enough for on-device deployment. music on-device lightweight	OK	5/22	→
Stable Audio 3 Medium Base Audio Outpainting audio-to-audio	Stable Audio 3 Medium Base audio outpainting is the foundational 1.4 billion parameter checkpoint that extends existing stereo audio with causal continuation guided by text prompts. music extension continuation	OK	5/22	→
Stable Audio 3 Medium Audio Outpainting audio-to-audio	Stable Audio 3 Medium audio outpainting is a 1.4 billion parameter latent diffusion model that extends existing stereo audio beyond its original endpoint via causal continuation guided by text prompts. music extension continuation	OK	5/22	→
Stable Audio 3 Medium Base Audio Inpainting audio-to-audio	Stable Audio 3 Medium Base audio inpainting is the foundational 1.4 billion parameter checkpoint for editing or filling selected stereo audio segments guided by text prompts. music editing restoration	OK	5/22	→
Stable Audio 3 Medium Audio Inpainting audio-to-audio	Stable Audio 3 Medium audio inpainting is a 1.4 billion parameter latent diffusion model that fills in or reworks selected segments of a stereo track guided by text prompts, supporting single- and multi-segment editing. music editing restoration	OK	5/22	→
Stable Audio 3 Medium Base Audio to Audio audio-to-audio	Stable Audio 3 Medium Base audio-to-audio is the foundational 1.4 billion parameter checkpoint that transforms input audio into new stereo variations up to 6 minutes guided by text prompts. music style-transfer remix	OK	5/22	→
Stable Audio 3 Medium Audio to Audio audio-to-audio	Stable Audio 3 Medium audio-to-audio is a 1.4 billion parameter latent diffusion model that transforms an input audio clip into new stereo variations up to 6 minutes guided by a text prompt. music style-transfer remix	OK	5/22	→
Stable Audio 3 Medium Base Text to Audio text-to-audio	Stable Audio 3 Medium Base is the foundational 1.4 billion parameter text-to-audio checkpoint generating stereo music up to 6 minutes, intended as the unmodified base for custom fine-tuning workflows. music audio stereo	OK	5/22	→
Stable Audio 3 text-to-audio	Stable Audio 3 Medium is a 1.4 billion parameter latent diffusion model that generates high-quality stereo music up to 6 minutes from text prompts, trained on fully licensed data for safe commercial use. music audio stereo	OK	5/22	→
Flux Pro Erase image-to-image	Latest object erasing model from Black Forest Labs. Remove undesired objects and text from images. utility editing	OK	5/21	→
Marlin vision	Marlin is a 2B video VLM tuned for the two questions developers actually want to ask of their videos: what is happening, and when? utility editing	OK	5/21	→
Marlin Find vision	Marlin is a 2B video VLM tuned for the two questions developers actually want to ask of their videos: what is happening, and when? utility editing	OK	5/21	→
Nemotron Diffusion Vlm vision	Nemotron-Labs-Diffusion-VLM-8B is the vision-language extension of the Nemotron-Labs-Diffusion family. utility editing	OK	5/21	→
Lyria 3 Pro text-to-audio	Lyria 3 Pro is the latest music model from Google audio sfx	OK	5/20	→
Meshy Rigging 3d-to-3d	Rig humanoid 3D models from GLB URLs with Meshy, returning rigged GLB/FBX files plus basic animations. 3d-to-3d rigging	OK	5/19	→
Imagineart 2.0 Edit Preview image-to-image	ImagineArt 2.0 Edit delivers precise prompt-guided image editing at 2K resolution, preserving fine detail and realism while accurately applying targeted changes across one or more reference images. stylized transform typography	OK	5/19	→
Mirelo SFX1.6 audio-to-audio	Erase and replace any moment in your audio with AI-driven precision. audio-to-audio sfx	OK	5/18	→
Mirelo SFX1.6 audio-to-audio	Extend any sound effect with seamless, natural tails. audio-to-audio sfx	OK	5/18	→
Mirelo SFX1.6 video-to-video	Generate synced sounds for any video, and return it with its new sound track (like MMAudio). Now up to 60 seconds! video-to-video sfx	OK	5/18	→
Heygen v5 Digital Twin text-to-video	Create natural HeyGen Avatar V digital twin videos from text or audio, with lip-sync, optional backgrounds, captions, and MP4/WebM output. avatar digital-twin talking-avatar text-to-video lip-sync heygen video-generation	OK	5/17	→
Mirelo SFX1.6 text-to-audio	Generate ambient sounds for any text prompt. Now you can turn any SFX into a natural loop for ambient soundscapes. text-to-audio sfx	OK	5/15	→
FLUX 2 Pro Outpaint image-to-image	Outpainting generation with FLUX.2 [pro] from Black Forest Labs. Optimized for maximum quality, exceptional photorealism and artistic images. image-to-image outpaint outpainting	OK	5/14	→
Recraft V4.1 Text to Image Utility text-to-image	Recraft V4.1 Utility is a faster, lighter variant of V4.1 made for high-volume creative workflows. Ideal for ideation, A/B exploration, and content pipelines, it keeps Recraft's design sensibility while optimizing for throughput and cost. stylized transform typography	OK	5/14	→
Recraft V4.1 Utility Text to Image text-to-image	Recraft V4.1 Utility Pro pairs the high-resolution output of V4.1 Pro with a faster, cost-efficient runtime. Designed for studios shipping large-format work at scale, it makes premium-quality raster generation viable across full creative pipelines. stylized transform typography	OK	5/14	→
Recraft V4.1 Text to Vector text-to-image	Recraft V4.1 Vector turns prompts into fully editable SVGs with structured layers and clean geometry. Built for logos, icons, and illustration systems, it produces artwork that goes straight from generation into Figma or Illustrator. stylized transform typography	OK	5/14	→
Recraft V4.1 Text to Image text-to-image	Recraft V4.1 builds on the design-first foundation of V4 with sharper prompt control and cleaner composition. Tuned for brand systems and editorial work, it delivers production-ready raster images that hold up next to a designer's hand. stylized transform typography	OK	5/14	→
Recraft V4.1 Text to Vector Pro text-to-image	Recraft V4.1 Pro Vector generates large-format, fully editable SVGs with the structural clarity professional illustrators expect. Built for poster art, complex brand assets, and detailed scene illustration, it scales without losing geometric integrity. stylized transform typography	OK	5/14	→
Recraft V4.1 Text to Image Pro text-to-image	Recraft V4.1 Pro pushes the V4.1 model into high-resolution territory — up to 2048×2048 and ultra-wide formats. Made for hero imagery, campaign work, and print, it preserves the same design taste at sizes ready for the final deliverable. stylized transform typography	OK	5/14	→
Pixal3d image-to-3d	Pixal3D turns a single image into a high-fidelity 3D model with detailed geometry and realistic textures. stylized transform	OK	5/13	→
Subtitles video-to-video	VEED’s Subtitles API transforms raw footage into polished, publish-ready content with professional burned-in subtitles starting at a base rate of $0.10 per minute.	OK	5/11	→
Hidream O1 Image image-to-image	Unified image generation with HiDream-O1-Image. Create, edit, and personalize high-resolution images up to 2K—single native model handles text-to-image, editing, and custom subjects without external components.	OK	5/9	→
Hidream O1 Image image-to-image	Unified image generation with HiDream-O1-Image. Create, edit, and personalize high-resolution images up to 2K—single native model handles text-to-image, editing, and custom subjects without external components.	OK	5/9	→
Hidream O1 Image text-to-image	Unified image generation with HiDream-O1-Image. Create, edit, and personalize high-resolution images up to 2K—single native model handles text-to-image, editing, and custom subjects without external components.	OK	5/9	→
Hidream O1 Image text-to-image	Unified image generation with HiDream-O1-Image. Create, edit, and personalize high-resolution images up to 2K—single native model handles text-to-image, editing, and custom subjects without external components.	OK	5/9	→
Ideogram Remove Background image-to-image	Remove backgrounds from existing images with Ideogram's remove background feature. Isolate subjects cleanly for compositing and creative reuse.	OK	5/7	→
Controlfoley video-to-video	Foley Control is a video-to-audio model that automatically generates synchronized sound effects for videos, using text prompts to shape the type of sound while matching the timing and action on screen. stylized transform lipsync	OK	5/5	→
Ffmpeg Api Images to Video image-to-video	A fal.ai endpoint that stitches an ordered list of images into an MP4 video by holding each image for a specified number of frames at a configurable frame rate utility editing	OK	5/5	→
Workflow Utilities Pick Image By Index workflow	Choose the Nth image from an image URL list for workflows.	OK	4/29	→
Lucy 2.1 VTON Realtime video-to-video	Realtime Try On experience with Decart Lucy 2.1 VTON	OK	4/29	→
Smart Resize image-to-image	Smart image resize to arbitrary dimensions, powered by Nano Banana Pro with vision-LLM-guided prompting for composition-aware recomposition. Crop, cropping, resize ads. realism typography visual ads	OK	4/28	→
Nemotron 3 Nano Omni image-to-text	Vision reasoning variant of NVIDIA's Nemotron 3 Nano Omni. 30B A3B hybrid Transformer-Mamba MoE - accepts an image plus a prompt and returns text. nemotron nvidia image-to-text vision-language vision-reasoning reasoning agentic agents open-weights hybrid-moe mamba 30b-a3b	OK	4/27	→
Nemotron 3 Nano Omni video-to-text	Video reasoning variant of NVIDIA's Nemotron 3 Nano Omni. 30B A3B hybrid Transformer-Mamba MoE - accepts video plus a prompt and returns text. nemotron nvidia video-to-text video-understanding video-reasoning reasoning agentic agents open-weights hybrid-moe mamba 30b-a3b	OK	4/27	→
Nemotron 3 Nano Omni audio-to-text	Audio reasoning variant of NVIDIA's Nemotron 3 Nano Omni. 30B A3B hybrid Transformer-Mamba MoE - accepts audio plus a prompt and returns text. nemotron nvidia audio-to-text audio-understanding audio-reasoning reasoning agentic agents open-weights hybrid-moe mamba 30b-a3b	OK	4/27	→
Nemotron 3 Nano Omni llm	Open, efficient reasoning model from NVIDIA. 30B A3B hybrid Transformer-Mamba MoE, built for enterprise agentic workflows. nemotron nvidia reasoning llm agentic agents instruct open-weights hybrid-moe mamba 30b-a3b long-context	OK	4/27	→
Happy Horse image-to-video	Generate 1080p video with synchronized native audio from a text prompt and references. Aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4. Duration: 3–15s. stylized transform lipsync	OK	4/27	→
Happy Horse Video Edit video-to-video	HappyHorse video editing supports advanced video editing through natural language instructions. It allows for local or global editing of video elements using up to 5 reference images. happy-horse video-editing video-to-video	OK	4/27	→
Happy Horse image-to-video	Alibaba's #1-ranked Happy Horse 1.0 — generate 1080p video with synchronized native audio and multilingual lip-sync from text prompts or images. video happy-horse	OK	4/24	→
Happy Horse text-to-video	Generate 1080p video with synchronized native audio from a text prompt. Aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4. Duration: 3–15s. happy-horse	OK	4/24	→
Meshy 6 - Multi Image To 3D image-to-3d	Meshy-6 is the latest model from Meshy. It generates realistic, production-ready 3D models. image-to-3d	OK	4/23	→
Meshy 6 Preview image-to-3d	Meshy-6-Preview is the latest model from Meshy. It generates realistic, production-ready 3D models. image-to-3d	Deprecated	4/22	→
Ideogram text-to-image	Train Ideogram on your photos, style, subject, or look, going from a small set of reference images to images that feel consistently yours. stylized transform	OK	4/22	→
Lyra 2 video-to-video	Lyra 2.0 is an image-to-video model that turns a single image into an explorable 3D-style video with camera-controlled motion.	Deprecated	4/22	→
Ideogram training	Train Ideogram on your photos, style, subject, or look, going from a small set of reference images to images that feel consistently yours. stylized transform	OK	4/22	→
Cohere Transcribe speech-to-text	Cohere Transcribe turns your business audio into accurate text, ready for search, analytics, and automation speech transcribe stt	OK	4/22	→
Kling Video image-to-video	Kling's Native 4K is a video generation model that directly outputs professional-grade 4K video in one step, eliminating the need for post-production upscaling stylized transform lipsync	OK	4/22	→
Kling Video image-to-video	Kling's Native 4K is a video generation model that directly outputs professional-grade 4K video in one step, eliminating the need for post-production upscaling stylized transform lipsync	OK	4/22	→
Kling Video text-to-video	Kling's Native 4K is a video generation model that directly outputs professional-grade 4K video in one step, eliminating the need for post-production upscaling stylized transform lipsync	OK	4/22	→
Kling Video image-to-video	Kling's Native 4K is a video generation model that directly outputs professional-grade 4K video in one step, eliminating the need for post-production upscaling stylized transform lipsync	OK	4/22	→
Kling Video V3 Text to Video 4K text-to-video	Kling's Native 4K is a video generation model that directly outputs professional-grade 4K video in one step, eliminating the need for post-production upscaling stylized transform lipsync	OK	4/22	→
Lyra 2 image-to-video	Lyra 2.0 is an image-to-video model that turns a single image into an explorable 3D-style video with camera-controlled motion.	Deprecated	4/21	→
GPT Image 2 API text-to-image	GPT Image 2, OpenAI's latest image model, is capable of creating extremely detailed images with fine typography. gpt-image-2 openai typography chatgpt-images-2	OK	4/20	→
GPT Image 2 API image-to-image	GPT Image 2, OpenAI's latest image model, is capable of making fine-grained, detailed edits to images. gpt-image-2 openai chatgpt-images-2	OK	4/20	→
Ernie Image Lora Turbo text-to-image	High-quality text-to-image model by Baidu. Supports English, Chinese, and Japanese prompts with built-in prompt expansion. stylized transform typography	OK	4/16	→
Ernie Image Lora text-to-image	High-quality text-to-image model by Baidu. Supports English, Chinese, and Japanese prompts with built-in prompt expansion.	OK	4/16	→
Heygen Video Agent text-to-video	Generate videos with a single prompt. Describe what you want in plain text, and the agent handles avatar selection, scripting, scene composition - all in one.	OK	4/16	→
Gemini 3.1 Flash Tts text-to-speech	Newest audio model from Google introduces granular audio tags that give you precise control to direct AI speech for expressive audio generation. lipsync avatar	OK	4/16	→
Nucleus Image text-to-image	Nucleus-Image is a text-to-image generation model built on a sparse mixture-of-experts (MoE) diffusion transformer architecture. stylized transform typography	OK	4/16	→
ERNIE-Image Trainer training	LoRA trainer for ERNIE-Image, Baidu's powerful 8B-parameter text-to-image model. lora personalization trainer	OK	4/15	→
LTX-2.3 22B Distilled video-to-video	Generate video with audio from reference videos using LTX-2.3 Distilled and custom LoRA	OK	4/14	→
LTX-2.3 22B Distilled video-to-video	Generate video with audio from reference videos using LTX-2.3 Distilled	OK	4/14	→
LTX 2.3 22B video-to-video	Generate video with audio from reference video, text and images using LTX-2.3 and custom LoRA	OK	4/14	→
LTX-2.3 22B video-to-video	Generate video with audio from reference video, text and images using LTX-2.3	OK	4/14	→
Heygen Lipsync - Speed video-to-video	Replace or dub audio on an existing video with fast audio-only lip-sync. stylized transform lipsync	OK	4/14	→
Heygen Lipsync - Precision video-to-video	Replace or dub audio on an existing video with high-accuracy avatar-inference lip-sync. lipsync stylized transform	OK	4/14	→
Ernie Image Turbo text-to-image	High-quality text-to-image model by Baidu. Supports English, Chinese, and Japanese prompts with built-in prompt expansion.	OK	4/13	→
Imagineart 2.0 Preview text-to-image	ImagineArt 2.0 is ImagineArt's latest state-of-the-art visual reasoning text-to-image model, generating high-fidelity, professional-grade visuals with lifelike realism, cinematic effects, and strong aesthetic quality. stylized transform typography	OK	4/13	→
Minimax Music 2.5 text-to-audio	MiniMax Music 2.5 creates complete tracks with singing, backing music, and detailed arrangements from lyrics and a style description. stylized transform lipsync	OK	4/11	→
Minimax Music 2.6 text-to-audio	MiniMax Music 2.6 creates complete tracks with singing, backing music, and detailed arrangements from lyrics and a style description. stylized transform lipsync	OK	4/11	→
Void Video Inpainting video-to-video	VOID removes objects from videos along with all interactions they induce on the scene utility editing	OK	4/10	→
Ernie Image text-to-image	High-quality text-to-image model by Baidu. Supports English, Chinese, and Japanese prompts with built-in prompt expansion. realism chinese multilingual portrait photorealistic	OK	4/10	→
Grok Imagine Image text-to-image	Grok Imagine Pro is an advanced AI model from xAI that creates high-quality visuals from text prompts and allows you to edit or analyze existing images. stylized transform typography	OK	4/9	→
Grok Imagine Image Editing Quality image-to-image	Grok Imagine Pro is an advanced AI model from xAI that creates high-quality visuals from text prompts and allows you to edit or analyze existing images. stylized transform typography	OK	4/9	→
PixVerse C1 Reference To Video image-to-video	Generate character-consistent videos from reference images using PixVerse C1, with subject and background references. video-generation reference-to-video pixverse character-consistency 1080p audio	OK	4/8	→
PixVerse C1 Transition image-to-video	Create seamless cinematic transitions between two images with PixVerse C1, with native audio and up to 1080p. video-generation transition pixverse morphing 1080p audio	OK	4/8	→
PixVerse C1 Image To Video image-to-video	Animate images into cinematic videos with PixVerse C1, supporting 1080p resolution and native audio generation. video-generation image-to-video pixverse animation 1080p audio	OK	4/8	→
PixVerse C1 Text To Video text-to-video	Generate film-grade videos from text prompts with native audio, up to 1080p and 15 seconds, using PixVerse C1. video-generation text-to-video pixverse cinematic film 1080p audio	OK	4/8	→
Vidu image-to-video	Vidu's latest Q3 Reference to Video Mix model	OK	4/8	→
PATINA image-to-image	Extract seamless tiling textures with PBR attribute maps from images material pbr extraction	OK	4/8	→
PATINA text-to-image	Generate complete seamlessly tiling PBR materials including normal, roughness, basecolor, height and metalness maps up to 8K material pbr displacement metalness normal roughness basecolor albedo height	OK	4/8	→
PATINA image-to-image	PATINA creates seamless high-resolution normal, roughness, basecolor (albedo), height (displacement) and metalness maps from images pbr displacement metalness normal roughness basecolor albedo height	OK	4/8	→
Tripo P1 Text to 3D text-to-3d	Generate 3D models from text descriptions using Tripo P1. 3d text-to-3d 3d-generation tripo	OK	4/7	→
Tripo P1 Image to 3D image-to-3d	Generate 3D models from a single image using Tripo P1. 3d image-to-3d 3d-generation tripo	OK	4/7	→
Tripo H3.1 Text to 3D text-to-3d	Generate 3D models from text descriptions using Tripo H3.1. 3d text-to-3d 3d-generation tripo	OK	4/7	→
Tripo H3.1 Multiview to 3D image-to-3d	Generate 3D models from multiple view images using Tripo H3.1. 3d multiview-to-3d 3d-generation tripo	OK	4/7	→
Tripo H3.1 Image to 3D image-to-3d	Generate high-quality 3D models from a single image using Tripo H3.1. 3d image-to-3d 3d-generation tripo	OK	4/7	→
Ideogram image-to-image	Ideogram Layerize takes an existing flat graphic, removes text, and returns structured text containers you can edit or recompose in HTML or JSON format. stylized transform typography	OK	4/7	→
Ideogram Transparent text-to-image	Generate images with transparent backgrounds using Ideogram Transparent model stylized transform typography	OK	4/7	→
ReconViaGen 0.5 image-to-3d	Generate 3D models from one or more images using ReconViaGen 0.5 multi-view 3d-reconstruction	OK	4/7	→
Joyai Image Edit image-to-image	All-in-one image AI with JoyAI-Image. Understand, create, and edit images through natural language—the model's deep visual understanding powers more accurate generation and precise editing in a unified system. image-to-image image-editing	OK	4/6	→
sync-3 Lipsync video-to-video	sync-3 is the most powerful lipsync model yet, featuring native visual intelligence for professional-quality video. stylized transform lipsync	OK	4/6	→
Seedance 2.0 Text to Video API text-to-video	ByteDance's most advanced text-to-video model. Cinematic output with native audio, multi-shot editing, real-world physics, and director-level camera control. stylized transform lipsync	OK	4/1	→
Seedance 2 Reference to Video image-to-video	ByteDance's most advanced reference-to-video model. Generate video from up to 9 images, 3 videos, and 3 audio clips with native audio and cinematic camera control. stylized transform lipsync	OK	4/1	→
Seedance 2.0 Fast Text to Video text-to-video	ByteDance's most advanced text-to-video model, fast tier. Lower latency and cost with cinematic output, native audio, multi-shot editing, and director-level camera control. stylized transform lipsync	OK	4/1	→
Seedance 2 Image to Video image-to-video	ByteDance's most advanced image-to-video model. Animate still images into cinematic video with synchronized audio, start and end frame control, and motion prompts. stylized transform lipsync	OK	4/1	→
Seedance 2.0 Fast Reference to Video image-to-video	ByteDance's most advanced reference-to-video model, fast tier. Lower latency and cost with up to 9 images, 3 videos, and 3 audio clips as inputs. stylized transform lipsync	OK	4/1	→
Seedance 2.0 Fast Image to Video image-to-video	ByteDance's most advanced image-to-video model, fast tier. Lower latency and cost with synchronized audio, start and end frame control, and motion prompts. stylized transform lipsync	OK	4/1	→
Wan image-to-image	Edit and transform images using text instructions with the WAN 2.7 Pro model for precise, professional-grade image modifications. wan image-editing pro	OK	4/1	→
Wan image-to-image	Transform and edit existing images with text-guided instructions using the WAN 2.7 model for creative image manipulation. wan image-to-image image-editing	OK	4/1	→
Wan text-to-image	Generate high-quality images from text prompts using the WAN 2.7 model with advanced prompt understanding and detailed output. wan text-to-image image-generation	OK	4/1	→
Wan text-to-image	Generate premium-quality images from text prompts using the enhanced WAN 2.7 Pro model with superior detail and composition. wan text-to-image pro	OK	4/1	→
Veo3.1 Lite FLF image-to-video	Veo 3.1 Lite balances practical utility with professional capabilities, supporting Text-to-Video and Image-to-Video stylized transform lipsync	OK	3/31	→
Veo3.1 Lite Image to Video image-to-video	Veo 3.1 Lite balances practical utility with professional capabilities, supporting Text-to-Video and Image-to-Video stylized transform lipsync	OK	3/31	→
Veo3.1 Lite Text to Video text-to-video	Veo 3.1 Lite balances practical utility with professional capabilities, supporting Text-to-Video and Image-to-Video stylized transform lipsync	OK	3/31	→
Sam 3 1 video-to-video	SAM 3.1 comes with Object Multiplex, a shared-memory approach for joint multi-object tracking that delivers faster speeds with a larger number of objects tracked. segmentation mask real-time	OK	3/30	→
Sam 3 1 video-to-video	SAM 3.1 comes with Object Multiplex, a shared-memory approach for joint multi-object tracking that delivers faster speeds with a larger number of objects tracked. segmentation mask real-time	OK	3/30	→
Sam 3 1 image-to-image	SAM 3.1 comes with Object Multiplex, a shared-memory approach for joint multi-object tracking that delivers faster speeds with a larger number of objects tracked. segmentation mask real-time	OK	3/30	→
Sam 3 1 image-to-image	SAM 3.1 comes with Object Multiplex, a shared-memory approach for joint multi-object tracking that delivers faster speeds with a larger number of objects tracked. segmentation mask real-time	OK	3/30	→
Lyria3 text-to-audio	Lyria 3 is the most recent music model from Google. audio music sfx	OK	3/30	→
PixVerse V6 Transition image-to-video	Pixverse's latest v6 Model. image-to-video first-frame-last-frame transition	OK	3/29	→
PixVerse V6 Extend video-to-video	Pixverse's latest v6 Model. video-to-video extend	OK	3/29	→
PixVerse V6 Image To Video image-to-video	Pixverse's latest v6 Model. image-to-video	OK	3/29	→
PixVerse V6 Text To Video text-to-video	Pixverse's latest v6 Model. text-to-video	OK	3/29	→
Wan Text to Video text-to-video	Wan 2.7 is the latest generation AI video model, delivering enhanced motion smoothness, superior scene fidelity, and greater visual coherence. stylized transform lipsync	OK	3/28	→
Wan 2.7 Reference to Video image-to-video	Wan 2.7 is the latest generation AI video model, delivering enhanced motion smoothness, superior scene fidelity, and greater visual coherence. stylized transform lipsync	OK	3/28	→
Wan video-to-video	Wan 2.7 is the latest generation AI video model, delivering enhanced motion smoothness, superior scene fidelity, and greater visual coherence. stylized transform lipsync	OK	3/28	→
Wan image-to-video	Wan 2.7 is the latest generation AI video model, delivering enhanced motion smoothness, superior scene fidelity, and greater visual coherence. stylized transform lipsync	OK	3/28	→
Phota image-to-image	Phota's model enables personalized photo editing, preserving identity while erasing distractions seamlessly. edit personalization typography phota	OK	3/26	→
Phota Enhance image-to-image	Enhance images while preserving identities with Phota stylized transform typography phota	OK	3/26	→
Phota Text to Image text-to-image	Phota's model empowers developers, photographers, and creators with personalized photograph generation and editing. stylized transform typography phota	OK	3/26	→
Phota Create Profile training	Generate profiles using 30-50 images of a subject with Phota. stylized transform typography phota	OK	3/26	→
Hy Wu Edit image-to-image	Image editing with HY-WU. Transfer outfits, swap faces, and blend textures instantly—no finetuning needed, just describe what you want and provide reference images. image-to-image	OK	3/25	→
Davinci Magihuman image-to-video	Expressive facial performance, natural speech-expression coordination, realistic body motion, and accurate audio-video synchronization with DaVinci-MagiHuman model animation lip sync	OK	3/25	→
Grok Imagine Reference to Video image-to-video	Generate videos using multiple reference images with xAI's Grok Imagine video model video-edit v2v grok xai	OK	3/24	→
Grok Imagine Extend Video video-to-video	Extend videos with xAI's Grok Imagine video model video-edit v2v grok xai	OK	3/24	→
SeedVR2 image-to-image	Use SeedVR2 to upscale images, retaining seamless tiling upscale image-to-image seamless tiling	OK	3/23	→
Gemini TTS text-to-audio	Use Gemini TTS Models to convert your prompts to real audio. text-to-speech audio gemini	OK	3/20	→
LTX-2.3 22B Video to Video Trainer training	Train LTX-2.3 22B for video transformation or video-conditioned generation. ltx2-video fine-tuning video-to-video	OK	3/17	→
LTX-2.3 22B Video Trainer training	Train LTX-2.3 22B for custom styles and effects. ltx2.3-video fine-tuning	OK	3/17	→
FLUX.2 [klein] 9B LoRA image-to-image	Image-to-image editing with FLUX.2 [klein] 9B from Black Forest Labs and custom LoRA. Precise modifications using natural language descriptions and hex color control.	OK	3/17	→
FLUX.2 [klein] 9B LoRA text-to-image	Text-to-image generation with FLUX.2 [klein] 9B from Black Forest Labs and custom LoRA.	OK	3/17	→
FLUX.2 [klein] 4B LoRA image-to-image	Image-to-image editing with FLUX.2 [klein] 4B from Black Forest Labs and custom LoRA. Precise modifications using natural language descriptions and hex color control.	OK	3/17	→
FLUX.2 [klein] 4B LoRA text-to-image	Text-to-image generation with FLUX.2 [klein] 4B from Black Forest Labs and custom LoRA. Enhanced realism, crisper text generation, and native editing capabilities.	OK	3/17	→
Goal Force image-to-video	Physics-based video generation with Goal Force. Point where you want objects to move, set force direction and strength, get physically plausible results. image-to-video controlnet	Deprecated	3/17	→
Bytedance Seed V2 Mini llm	Seed 2.0 Mini is a high-performance multimodal model optimized for low latency and high concurrency. It supports text, image, and video input with 256K context and configurable thinking/reasoning modes.	OK	3/17	→
Z-Image Turbo Seamless Tiling Lora text-to-image	Generate seamlessly tiling photorealistic images from text using Z-Image Turbo and custom LoRA z-image turbo seamless tiling	OK	3/17	→
Z-Image Turbo Seamless Tiling text-to-image	Generate seamlessly tiling photorealistic images from text using Z-Image Turbo z-image turbo seamless tiling	OK	3/17	→
xAI Text to Speech text-to-speech	Generate speech with expressive and realistic voices from xAI	OK	3/17	→
LTX-2.3 22B Distilled video-to-video	Generate video with audio from videos using LTX-2.3 Distilled and custom LoRA	OK	3/14	→
LTX-2.3 22B Distilled audio-to-video	Generate video with audio from audio, text and images using LTX-2.3 Distilled and custom LoRA	OK	3/14	→
LTX-2.3 22B Distilled image-to-video	Generate video with audio from images using LTX-2.3 Distilled and custom LoRA	OK	3/14	→
LTX-2.3 22B Distilled text-to-video	Generate video with audio from text using LTX-2.3 Distilled and custom LoRA	OK	3/14	→
LTX-2.3 22B video-to-video	Extend video with audio using LTX-2.3 and custom LoRA	OK	3/14	→
LTX-2.3 22B video-to-video	Generate video with audio from videos using LTX-2.3 and custom LoRA	OK	3/14	→
LTX-2.3 22B audio-to-video	Generate video with audio from audio, text and images using LTX-2.3 and custom LoRA	OK	3/14	→
LTX-2.3 22B image-to-video	Generate video with audio from images using LTX-2.3 and custom LoRA	OK	3/14	→
LTX-2.3 22B text-to-video	Generate video with audio from text using LTX-2.3 and custom LoRA	OK	3/14	→
LTX-2.3 22B Distilled video-to-video	Generate video with audio from videos using LTX-2.3 Distilled	OK	3/14	→
LTX-2.3 22B Distilled audio-to-video	Generate video with audio from audio, text and images using LTX-2.3 Distilled	OK	3/14	→
LTX-2.3 22B Distilled image-to-video	Generate video with audio from images using LTX-2.3 Distilled	OK	3/14	→
LTX-2.3 22B Distilled text-to-video	Generate video with audio from text using LTX-2.3 Distilled	OK	3/14	→
LTX-2.3 22B video-to-video	Extend video with audio using LTX-2.3	OK	3/14	→
LTX-2.3 22B video-to-video	Generate video with audio from videos using LTX-2.3	OK	3/14	→
LTX-2.3 22B audio-to-video	Generate video with audio from audio, text and images using LTX-2.3	OK	3/14	→
LTX-2.3 22B image-to-video	Generate video with audio from images using LTX-2.3	OK	3/14	→
LTX-2.3 22B text-to-video	Generate video with audio from text using LTX-2.3	OK	3/14	→
Inworld TTS-1.5 Max text-to-speech	Text to Speech Endpoint for Inworld's TTS-1.5 Max. text-to-speech inworld tts	OK	3/13	→
Sora 2 image-to-video	Generate character IDs to use with Sora 2 generations	Deprecated	3/12	→
Flashhead image-to-video	SoulX-FlashHead is a unified 1.3B-parameter framework designed for high-fidelity, infinite-length, and real-time streaming portrait video generation. portrait video streaming real-time face-animation talking-head	OK	3/12	→
Flashtalk audio-to-video	Audio-driven talking avatar generation powered by the SoulX-FlashTalk 14B model. avatar talking-head audio-driven lip-sync portrait video	OK	3/12	→
Lux TTS text-to-speech	High-quality voice cloning TTS model that generates 48kHz speech from text and a reference audio. Distilled to 4 steps for fast inference. tts voice-cloning speech-synthesis	Deprecated	3/12	→
Tada audio-to-audio	A unified speech-language model that synchronizes speech and text into a single, cohesive stream via 1:1 alignment.	OK	3/12	→
Tada TTS 1B audio-to-audio	A unified speech-language model that synchronizes speech and text into a single, cohesive stream via 1:1 alignment. Lighter 1B variant	OK	3/12	→
Physic Edit image-to-image	Physics-aware image editing with PhysicEdit. Make realistic edits that follow real-world physics—handles complex effects like refraction, material changes, and deformation with physically plausible results. image-editing	Deprecated	3/12	→
Omnilottie json	Convert your assets into lottie using Omnilottie. lottie	OK	3/11	→
Omnilottie json	Convert your assets into lottie using Omnilottie. lottie	OK	3/11	→
Omnilottie json	Convert your assets into lottie using Omnilottie. lotties	OK	3/11	→
Vecglypher image-to-image	Vector font generation with VecGlypher. Create custom glyphs from text descriptions or reference images—outputs clean SVG paths directly without raster-to-vector conversion.	OK	3/10	→
Vecglypher text-to-image	Vector font generation with VecGlypher. Create custom glyphs from text descriptions or reference images—outputs clean SVG paths directly without raster-to-vector conversion.	OK	3/10	→
Onereward image-to-image	OneReward is a finetuned version of Flux 1.0 Fill with intelligent editing capabilities. onereward	Deprecated	3/5	→
LTX 2.3 Video Fast image-to-video	LTX-2.3 is a high-quality, fast AI video model available in Pro and Fast variants for text-to-video, image-to-video, and audio-to-video. stylized transform lipsync	OK	3/5	→
LTX 2.3 Video Fast text-to-video	LTX-2.3 is a high-quality, fast AI video model available in Pro and Fast variants for text-to-video, image-to-video, and audio-to-video. stylized transform lipsync	OK	3/5	→
LTX 2.3 Video Pro image-to-video	LTX-2.3 is a high-quality, fast AI video model available in Pro and Fast variants for text-to-video, image-to-video, and audio-to-video. stylized transform lipsync	OK	3/5	→
LTX Video 2.3 Pro text-to-video	LTX-2.3 is a high-quality, fast AI video model available in Pro and Fast variants for text-to-video, image-to-video, and audio-to-video. stylized transform lipsync	OK	3/5	→
Firered Image Edit V1.1 image-to-image	FireRed Image Edit v1.1 is an updated version of FireRed Image Edit, with improved image editing capabilities. firered-image-edit	OK	3/5	→
LTX Video 2.3 Pro video-to-video	LTX-2.3 is a high-quality, fast AI video model available in Pro and Fast variants for text-to-video, image-to-video, and audio-to-video. stylized transform lipsync	OK	3/5	→
LTX 2.3 Video Pro audio-to-video	LTX-2.3 is a high-quality, fast AI video model available in Pro and Fast variants for text-to-video, image-to-video, and audio-to-video. stylized transform lipsync	OK	3/5	→
LTX Video 2.3 Pro video-to-video	LTX-2.3 is a high-quality, fast AI video model available in Pro and Fast variants for text-to-video, image-to-video, and audio-to-video. stylized transform lipsync	OK	3/5	→
Kling Video video-to-video	Transfer movements from a reference video to any character image. Cost-effective mode for motion transfer, perfect for portraits and simple animations. stylized transform editing	OK	3/5	→
Kling Video video-to-video	Transfer movements from a reference video to any character image. Cost-effective mode for motion transfer, perfect for portraits and simple animations. stylized transform editing	OK	3/5	→
LTX Video 2.0 Pro video-to-video	Extends videos with audio using LTX-2	OK	3/4	→
LTX 2.0 Video Pro audio-to-video	Generate video from audio using LTX-2 stylized transform lipsync	OK	3/4	→
Pixelcut Background Remover image-to-image	Pixelcut’s Background Remover enables fast, ultra high-quality removal of backgrounds from images. Perfect for e-commerce and image editing workflows. Powered by advanced AI for clean, perfect cutouts every time. background removal utility remove background	OK	3/4	→
Qwen Image 2 text-to-image	Qwen-Image-2.0 is a next-generation foundational unified generation-and-editing model realism typography	OK	3/3	→
Qwen Image 2 text-to-image	Qwen-Image-2.0 is a next-generation foundational unified generation-and-editing model realism typography	OK	3/3	→
Qwen Image 2 image-to-image	Qwen-Image-2.0 is a next-generation foundational unified generation-and-editing model stylized transform	OK	3/3	→
Qwen Image 2 image-to-image	Qwen-Image-2.0 is a next-generation foundational unified generation-and-editing model stylized transform	OK	3/3	→
Depth Anything Video video-to-video	Generates depth maps from video using Video Depth Anything (CVPR 2025). Produces per-frame depth estimation with temporal consistency across frames. Supports 3 model sizes (Small, Base, Large), 5 colormaps including grayscale, side-by-side comparison with the original video, and raw depth export as .npz. Useful for 3D reconstruction, video effects, compositing, and scene understanding. video to video motion edit	OK	3/2	→
Fibo Bbq Preview text-to-image	A preview of the next level of control for text-to-image models.	OK	3/2	→
Trellis 2 image-to-3d	Generate 3D models from your images using Trellis 2. A native 3D generative model enabling versatile and high-quality 3D asset creation. image-to-3D	OK	3/2	→
Nano Banana 2 image-to-image	Nano Banana 2 is Google's new state-of-the-art image generation and editing model	OK	2/26	→
Nano Banana 2 text-to-image	Nano Banana 2 is Google's new state-of-the-art fast image generation and editing model	OK	2/26	→
Gemini 3.1 Flash Image Preview image-to-image	Gemini 3.1 Flash Image (a.k.a. Nano Banana 2) is Google's new state-of-the-art fast image generation and editing model	OK	2/26	→
Gemini 3.1 Flash Image Preview text-to-image	Gemini 3.1 Flash Image (a.k.a. Nano Banana 2) is Google's new state-of-the-art fast image generation and editing model	OK	2/26	→
Embed Product image-to-image	Seamlessly embed products into any scene with pixel-perfect control, automatic perspective, and natural lighting. Trained on licensed data - risk-free for advertising and eCommerce production. product-shot advertising	OK	2/25	→
Multishot Master text-to-video	MultiShotMaster is a controllable multi-shot narrative video generation framework that supports text-driven inter-shot consistency, variable shot counts and shot durations, customized subject with motion control, and background-driven customized scene. text-to-video multi-shot	Deprecated	2/24	→
Cosmos Predict 2.5 2B Distilled text-to-video	Generate video from text and videos using NVIDIA's 2B Cosmos Distilled Model	OK	2/24	→
Cosmos Predict 2.5 2B video-to-video	Generate video from text and videos using NVIDIA's 2B Cosmos Post-Trained Model	OK	2/24	→
Cosmos Predict 2.5 2B image-to-video	Generate video from text and images using NVIDIA's 2B Cosmos Post-Trained Model	OK	2/24	→
Cosmos Predict 2.5 2B text-to-video	Generate video from text using NVIDIA's 2B Cosmos Post-Trained Model	OK	2/24	→
Heygen video-to-video	Heygen Translate Model with Extreme Speed video-to-video	OK	2/23	→
Heygen video-to-video	Heygen Translate Model with Extreme Precision video-to-video	OK	2/23	→
Heygen image-to-video	Heygen Photo Avatar 4 Model image-to-video	OK	2/23	→
Heygen text-to-video	Heygen Avatar 4 Digital Twin Model text-to-video	OK	2/23	→
Heygen text-to-video	Heygen Avatar V3 Model for Digital Twin text-to-video	OK	2/23	→
Wan-2.2 LoRA Trainer training	Train custom LoRAs for Wan-2.2 T2V/I2V 480P lora training video	OK	2/23	→
Wan-2.2 LoRA Trainer training	Train custom LoRAs for Wan-2.2 T2V/I2V 480P lora training video	OK	2/23	→
Lucy Image to Video image-to-video	Lucy delivers lightning fast performance that redefines what's possible with image to video AI	Deprecated	2/23	→
Lava SR audio-to-audio	Enhance muffled 16 kHz speech audio into crystal-clear 48 kHz, with denoising for particularly bad inputs. lava-sr audio-upscaler	Deprecated	2/23	→
Upscale image-to-image	Professional-grade creative upscaler that doubles resolution up to 10MP, regenerating sharper textures, refined details, and cleaner faces. Trained exclusively on licensed data for risk-free commercial use. bria aesthetics upscaler	OK	2/23	→
Bytedance Seedream V5 Lite Edit image-to-image	Image editing endpoint for the fast Lite version of Seedream 5.0, supporting high quality intelligent image editing with multiple inputs. bytedance seedream-5.0-lite edit	Deprecated	2/23	→
Bytedance Seedream V5 Lite Text To Image text-to-image	Text to Image endpoint for the fast Lite version of Seedream 5.0, supporting high quality intelligent text-to-image generation. text-to-image bytedance seedream-5.0-lite	Deprecated	2/23	→
Aesthetics Upscaler image-to-image	Image aesthetics upscaler that enhances resolution while improving lighting, color balance, sharpness, and overall visual appeal. At a fixed resolution of 4MP, preserving natural detail and refined style. aesthetics upscaler	Deprecated	2/22	→
Bitdance text-to-image	Image generation with BitDance. Fast, high-resolution photorealistic images using an autoregressive LLM for efficient, high-quality results. text-to-image	OK	2/21	→
Personaplex audio-to-audio	PersonaPlex is a real-time, full-duplex speech-to-speech conversational model that enables persona control through text-based role prompts and audio-based voice conditioning. audio-to-audio realtime conversational	OK	2/20	→
Workflow Utilities Reverse Video video-to-video	FFMPEG Utility to Reverse Videos video-to-video	OK	2/19	→
Firered Image Edit image-to-image	FireRed Image Edit is FireRed's state of the art open source editing model, re-trained from Qwen Image Edit 2509. image-editing firered	OK	2/19	→
Wan Motion video-to-video	Wan Motion is a streamlined character animation model that transfers motion from a driving video onto a reference character image. Based on Wan-Animate which preserves the original character's proportions, Simple uses pose retargeting to adapt the driving video's skeleton to match the reference character's body shape, producing more natural results when the two have different builds. It outputs at 720p with optimized defaults for fast, high-quality generation — just provide a video, an image, and an optional prompt.	OK	2/19	→
Recraft V4 Pro (Vector) text-to-image	Recraft V4 was developed with designers to bring true visual taste to AI image generation. Built for brand systems and production-ready workflows, it goes beyond prompt accuracy — delivering stronger composition, refined lighting, realistic materials, and a cohesive aesthetic. The result is imagery shaped by professional design judgment, ready for immediate real-world use without additional post-processing. text-to-image text-to-vector	OK	2/16	→
Recraft V4 (Vector) text-to-image	Recraft V4 was developed with designers to bring true visual taste to AI image generation. Built for brand systems and production-ready workflows, it goes beyond prompt accuracy — delivering stronger composition, refined lighting, realistic materials, and a cohesive aesthetic. The result is imagery shaped by professional design judgment, ready for immediate real-world use without additional post-processing. text-to-image text-to-vector	OK	2/16	→
Workflow Utilities Scale Video video-to-video	FFMPEG Utilities to Scale Videos	OK	2/16	→
Heygen text-to-video	Heygen Text to Video Generation Model text-to-video	OK	2/15	→
Genfocus image-to-image	GenFocus Model to Refocus Images image-to-image	Deprecated	2/14	→
Genfocus image-to-image	GenFocus Model to Refocus Images image-to-image	Deprecated	2/14	→
Personaplex audio-to-audio	PersonaPlex is a real-time, full-duplex speech-to-speech conversational model that enables persona control through text-based role prompts and audio-based voice conditioning. audio	OK	2/12	→
Recraft V4 Pro text-to-image	Recraft V4 was developed with designers to bring true visual taste to AI image generation. Built for brand systems and production-ready workflows, it goes beyond prompt accuracy — delivering stronger composition, refined lighting, realistic materials, and a cohesive aesthetic. The result is imagery shaped by professional design judgment, ready for immediate real-world use without additional post-processing. text-to-image	OK	2/12	→
Recraft V4 text-to-image	Recraft V4 was developed with designers to bring true visual taste to AI image generation. Built for brand systems and production-ready workflows, it goes beyond prompt accuracy, delivering stronger composition, refined lighting, realistic materials, and a cohesive aesthetic. The result is imagery shaped by professional design judgment, ready for immediate real-world use without additional post-processing. text-to-image	OK	2/12	→
Workflow Utilities Trim Video video-to-video	FFMPEG Utility to Trim Videos video-to-video	OK	2/11	→
Meshy 6 image-to-3d	Meshy-6 is the latest model from Meshy. It generates realistic, production-ready 3D models. image-to-3d	OK	2/9	→
Meshy 6 text-to-3d	Meshy-6 is the latest model from Meshy. It generates realistic, production-ready 3D models. text-to-3d	OK	2/9	→
Qwen Image Trainer V2 training	Qwen Image LoRA training lora personalization	Deprecated	2/6	→
Vidu image-to-video	Vidu's Q3 Turbo Model image-to-video	OK	2/6	→
Vidu text-to-video	Vidu's Q3 Turbo Model. text-to-video	OK	2/6	→
V2.6 video-to-video	Wan 2.6 reference-to-video flash model. reference-to-video	OK	2/6	→
Bytedance Dreamactor V2 video-to-video	Transfer motion from a video to characters in an image using Dreamactor v2. Performs well with non-human and multiple characters. motion-control dreamactor	OK	2/6	→
Flux 2 [klein] Realtime image-to-image	Realtime generation with FLUX.2 [klein] from Black Forest Labs. realtime image-to-image	OK	2/5	→
Workflow Utilities Impulse Response audio-to-audio	FFMPEG Utility for Impulse Response	OK	2/5	→
Workflow Utilities Extract Nth Frame image-to-image	FFMPEG Utility for Extracting Nth Frame	OK	2/5	→
Workflow Utilities Blend Video video-to-video	FFMPEG Utility for Blending Videos	OK	2/5	→
Workflow Utilities Audio Compressor audio-to-audio	FFMPEG Utility for Audio Compression	OK	2/5	→
Kling Video v3 Text to Video [Pro] text-to-video	Kling 3.0 Pro: Top-tier text-to-video with cinematic visuals, fluid motion, and native audio generation, with multi-shot support. text-to-video	OK	2/4	→
Kling O3 Image to Video [Pro] image-to-video	Generate a video by taking a start frame and an end frame, animating the transition between them while following text-driven style and scene guidance. image-to-video	OK	2/4	→
Kling O3 Reference to Video [Pro] image-to-video	Transform images, elements, and text into consistent, high-quality video scenes, ensuring stable character identity, object details, and environments. reference-to-video	OK	2/4	→
Kling O3 Text to Video [Pro] text-to-video	Generate realistic videos using Kling O3 from Kling Team! text-to-video	OK	2/4	→
Kling O3 Text to Video [Standard] text-to-video	Generate realistic videos using Kling O3 from Kling Team! text-to-video	OK	2/4	→
Kling O3 Reference to Video [Standard] image-to-video	Transform images, elements, and text into consistent, high-quality video scenes, ensuring stable character identity, object details, and environments. reference-to-video	OK	2/4	→
Kling Video v3 Text to Video [Standard] text-to-video	Kling 3.0 Standard: Top-tier text-to-video with cinematic visuals, fluid motion, and native audio generation, with multi-shot support. text-to-video	OK	2/4	→
Kling O3 Image to Video [Pro] image-to-video	Generate a video by taking a start frame and an end frame, animating the transition between them while following text-driven style and scene guidance. image-to-video	OK	2/4	→
Kling O3 Edit Video [Standard] video-to-video	Edit videos using Kling O3 from Kling Team! video-to-video	OK	2/4	→
Kling O3 Edit Video [Pro] video-to-video	Edit videos using Kling O3 from Kling Team! video-to-video	OK	2/4	→
Kling O3 Reference Video to Video [Standard] video-to-video	Kling O3 Omni generates new shots guided by an input reference video, preserving cinematic language such as motion and camera style to produce seamless scene continuity. video-to-video	OK	2/4	→
Kling Video v3 Image to Video [Standard] image-to-video	Kling 3.0 Standard: Top-tier image-to-video with cinematic visuals, fluid motion, and native audio generation, with custom element support. image-to-video	OK	2/4	→
Kling Video v3 Image to Video [Pro] image-to-video	Kling 3.0 Pro: Top-tier image-to-video with cinematic visuals, fluid motion, and native audio generation, with custom element support. image-to-video	OK	2/4	→
Kling O3 Reference Video to Video [Pro] video-to-video	Kling O3 Omni generates new shots guided by an input reference video, preserving cinematic language such as motion and camera style to produce seamless scene continuity. video-to-video	OK	2/4	→
MiniMax Speech 2.8 [HD] text-to-speech	Generate speech from text prompts and different voices using the MiniMax Speech-2.8 HD model, which leverages advanced AI techniques to create high-quality text-to-speech.	OK	2/4	→
MiniMax Speech 2.8 [Turbo] text-to-speech	Generate speech from text prompts and different voices using the MiniMax Speech-2.8 Turbo model, which leverages advanced AI techniques to create high-quality text-to-speech.	OK	2/4	→
Kling Image text-to-image	Kling V3: Latest Kling Image model text-to-image	OK	2/3	→
Kling Image image-to-image	Kling Image V3: Latest Kling Image model image-to-image	OK	2/3	→
Kling Image image-to-image	Kling Omni 3: Top-tier image-to-image with flawless consistency. image-to-image	OK	2/3	→
Kling Image text-to-image	Kling Omni 3: Top-tier text-to-image with flawless consistency. text-to-image	OK	2/3	→
Vidu image-to-video	Vidu's latest Q3 pro models. image-to-video	OK	1/31	→
Vidu text-to-video	Vidu's latest Q3 pro models text-to-video	OK	1/31	→
Hunyuan 3d text-to-3d	Create detailed, fully-textured 3D models with text 3d	OK	1/29	→
Grok Imagine Image image-to-image	Edit images precisely with xAI's Grok Imagine model grok xai image-editing	OK	1/29	→
Grok Imagine Image text-to-image	Generate highly aesthetic images with xAI's Grok Imagine Image generation model. xai grok text-to-image	OK	1/29	→
Grok Imagine Video video-to-video	Edit videos using xAI's Grok Imagine video-edit v2v grok xai	OK	1/29	→
Grok Imagine Video image-to-video	Generate videos from images with audio using xAI's Grok Imagine Video model. grok xai image-to-video i2v	OK	1/29	→
Grok Imagine Video text-to-video	Generate videos with audio from text using Grok Imagine Video. xai grok t2v text-to-video	OK	1/29	→
Hunyuan Image image-to-image	Image editing endpoint for Hunyuan Image 3.0 Instruct. tencent hunyuan-image instruct edit	OK	1/28	→
Hunyuan Image 3.0 Instruct text-to-image	Instruct version of Hunyuan-Image 3.0, with internal reasoning capabilities. hunyuan-image v3 instruct	OK	1/28	→
Hunyuan 3D Smart Topology 3d-to-3d	Optimize 3D mesh topology with Hunyuan 3D Smart Topology. 3d hunyuan topology	OK	1/27	→
Hunyuan 3D Rapid Image to 3D image-to-3d	Rapidly generate 3D models from images using Hunyuan 3D. 3d hunyuan image-to-3d	OK	1/27	→
Hunyuan 3D Pro Text to 3D text-to-3d	Generate 3D models from text prompts with Hunyuan 3D Pro 3d hunyuan text-to-3d	OK	1/27	→
Hunyuan 3D Pro Image to 3D image-to-3d	Generate 3D models from images with Hunyuan 3D Pro 3d hunyuan image-to-3d	OK	1/27	→
Z Image Base Trainer training	Fast LoRA trainer for Z-Image, a super fast text-to-image model of 6B parameters developed by Tongyi-MAI. lora personalization trainer	Deprecated	1/27	→
Hunyuan 3D Part Splitter 3d-to-3d	Split 3D models into parts with Hunyuan 3D 3d hunyuan mesh	OK	1/27	→
Qwen Image Max text-to-image	Text-to-Image endpoint for Qwen-Image-Max. Qwen Image Max improves upon the Qwen Image Plus series by enhancing the realism and naturalness of images. qwen-image max	OK	1/27	→
Qwen Image Max image-to-image	Image editing endpoint for Qwen-Image-Max. Qwen Image Max improves upon the Qwen Image Plus series by enhancing the realism and naturalness of images. qwen-image max	OK	1/27	→
Workflow Utilities Interleave Video unknown	FFMPEG Utility to Interleave Videos	OK	1/27	→
Z Image Base Lora text-to-image	LoRA endpoint for Z-Image, the foundation model of the Z-Image family. z-image base lora	OK	1/27	→
Z Image Base text-to-image	Z-Image is the foundation model of the Z-Image family, engineered for good quality, robust generative diversity, broad stylistic coverage, and precise prompt adherence. z-image base	OK	1/27	→
LTX-2 19B Distilled audio-to-video	Generate video with audio from audio, text and images using LTX-2 Distilled and custom LoRA	OK	1/27	→
LTX-2 19B audio-to-video	Generate video with audio from audio, text and images using LTX-2 and custom LoRA	OK	1/27	→
LTX-2 19B Distilled audio-to-video	Generate video with audio from audio, text and images using LTX-2 Distilled	OK	1/27	→
LTX-2 19B audio-to-video	Generate video with audio from audio, text and images using LTX-2	OK	1/27	→
Replace Background image-to-image	Generate professional, eCommerce-ready product shots by replacing backgrounds with realistic lighting and accurate perspective from a simple text prompt. Trained exclusively on licensed data for safe commercial use. bria replace-background	OK	1/27	→
PixVerse V5.6 Transition image-to-video	Use the latest PixVerse V5.6 model to turn your texts and images into amazing videos. image-to-video	OK	1/26	→
PixVerse V5.6 Image To Video image-to-video	Use the latest PixVerse V5.6 model to turn your texts and images into amazing videos. image-to-video	OK	1/26	→
PixVerse V5.6 Text To Video text-to-video	Use the latest PixVerse V5.6 model to turn your texts into amazing videos. text-to-video	OK	1/26	→
Qwen 3 TTS - Voice Design [1.7B] text-to-speech	Create custom voices using Qwen3-TTS Voice Design model and later use Clone Voice model to create your own voices! text-to-speech voice-design	OK	1/26	→
Qwen 3 TTS - Text to Speech [1.7B] text-to-speech	Bring speech to your texts using Qwen3-TTS Custom-Voice model with pre-trained voices or use your custom voice with Qwen3-TTS Clone Voice model text-to-speech	OK	1/26	→
Qwen 3 TTS - Text to Speech [0.6B] text-to-speech	Bring speech to your texts using Qwen3-TTS Custom-Voice model with pre-trained voices or use your custom voice with Qwen3-TTS Clone Voice model text-to-speech	OK	1/26	→
Qwen 3 TTS - Clone Voice [1.7B] audio-to-audio	Clone your voice using the Qwen3-TTS Clone-Voice model with zero-shot cloning capabilities, then use it with the text-to-speech models to generate speech in your own voice. clone-voice voice-clone	OK	1/26	→
Qwen 3 TTS - Clone Voice [0.6B] audio-to-audio	Clone your voice using the Qwen3-TTS Clone-Voice model with zero-shot cloning capabilities, then use it with the text-to-speech models to generate speech in your own voice. clone-voice voice-clone	OK	1/26	→
Z Image Turbo Trainer V2 training	Fast LoRA trainer for Z-Image-Turbo, a super fast text-to-image model of 6B parameters developed by Tongyi-MAI. lora personalization trainer	OK	1/24	→
Ai Face Swap video-to-video	AI-FaceSwap-Video is a service that can replace a person's face throughout a video clip while keeping their movements natural. faceswap utility transformation	Deprecated	1/23	→
Ai Face Swap image-to-image	AI-FaceSwap-Image is a service that can take one person's face and realistically blend it onto another's in a photo. faceswap utility transformation	Deprecated	1/23	→
Fibo Edit [Structured Instruction] text-to-json	Structured Instructions Generation endpoint for Fibo Edit, Bria's newest editing model. structured-prompt-generation fibo-edit json	OK	1/20	→
Fibo Edit [Replace Object by Text] image-to-image	Replace any object in an image using plain language with fine-grained, precise edits and strong prompt adherence. Trained on licensed data for risk-free commercial and brand-safe use. object-replacement bria fibo-edit json	OK	1/20	→
Fibo Edit [Sketch to Image] image-to-image	Convert line drawings and sketches into photorealistic, fully colored images with preserved structure. Trained exclusively on licensed data for safe commercial and design use. sketch-to-image bria fibo-edit json	OK	1/20	→
Fibo Edit [Restore] image-to-image	Photo restoration model that automatically denoises, deblurs, and enhances old or damaged photos - removes imperfections while preserving original character. image-restoration fibo-edit bria json	OK	1/20	→
Fibo Edit [Reseason] image-to-image	Transform the season or weather of an image - summer to winter, sunny to rainy - with realistic atmosphere and lighting. Trained exclusively on licensed data for risk-free commercial use. bria fibo-edit reseason	OK	1/20	→
Fibo Edit [Relight] image-to-image	Precise, controllable photo re-lighting with structured text inputs. Apply natural lighting styles, soften harsh shadows, and transform scene illumination - production-ready and trained exclusively on licensed data. bria fibo-edit relighting json	OK	1/20	→
Fibo Edit [Restyle] image-to-image	Production-grade style transfer that maps photos to distinct artistic styles using curated, brand-safe presets. Trained exclusively on licensed data for risk-free commercial use. bria fibo-edit restyle json	OK	1/20	→
Fibo Edit [Rewrite Text] image-to-image	Precisely rewrite text inside images while preserving typography, fonts, and layout. High-quality, brand-safe edits trained exclusively on licensed data for safe commercial use. bria fibo-edit text-rewriting image-editing	OK	1/20	→
Fibo Edit [Erase by Text] image-to-image	Remove unwanted objects from images with a text prompt - fast, precise editing that seamlessly blends results. Built for production scale and trained on licensed data for safe commercial use. bria fibo-edit prompt-eraser	OK	1/20	→
Fibo Edit image-to-image	High-fidelity image editing model with state-of-the-art controllability. Combines JSON + Mask + Image for precise, fine-grained edits ideal for production and enterprise workflows. Trained on licensed data - safe for commercial use. bria fibo-edit image-editing json	OK	1/20	→
Fibo Edit [Add Object by Text] image-to-image	Precisely insert new objects into images with structured spatial commands. Context-aware, high-quality editing with seamless blending. Trained on licensed data for risk-free commercial and brand-safe use. bria fibo-edit object-addition json	OK	1/20	→
Fibo Edit [Blend] image-to-image	Image composition model. Combine and blend multiple image parts into complex compositions through natural language and sequential editing. bria fibo-edit blend json	OK	1/20	→
Fibo Edit [Colorize] image-to-image	Image colorization and color-grading model. Bring color to black-and-white photos or apply curated color treatments using simple style-based commands. bria fibo-edit color	OK	1/20	→
Vidu image-to-video	Use the latest Vidu Q2 Pro models, which offer much better quality and control over your videos.	OK	1/19	→
FLUX.2 [klein] 9B Base LoRA image-to-image	Image-to-image editing with LoRA support for FLUX.2 [klein] 9B Base from Black Forest Labs. Specialized style transfer and domain-specific modifications.	OK	1/19	→
FLUX.2 [klein] 9B Base LoRA text-to-image	Text-to-image generation with LoRA support for FLUX.2 [klein] 9B Base from Black Forest Labs. Custom style adaptation and fine-tuned model variations.	OK	1/19	→
FLUX.2 [klein] 4B Base LoRA image-to-image	Image-to-image editing with LoRA support for FLUX.2 [klein] 4B Base from Black Forest Labs. Specialized style transfer and domain-specific modifications.	OK	1/19	→
FLUX.2 [klein] 4B Base LoRA text-to-image	Text-to-image generation with LoRA support for FLUX.2 [klein] 4B Base from Black Forest Labs. Custom style adaptation and fine-tuned model variations.	OK	1/19	→
Nemotron audio-to-text	Use the speed and pinpoint accuracy of Nemotron to transcribe your audio.	Deprecated	1/19	→
Nemotron audio-to-text	Use the speed and pinpoint accuracy of Nemotron to transcribe your audio.	Deprecated	1/19	→
Fibo Lite text-to-json	Convert plain text into Fibo-Lite's transparent JSON-structured prompts - Bria's unique controllability layer that no closed model offers. Built for agentic and enterprise workflows. bria fibo structured-prompt	OK	1/19	→
Fibo Lite text-to-json	Structured Prompt Generation endpoint for Fibo-Lite, Bria's SOTA Open source model bria structured-prompting	Deprecated	1/19	→
V2.6 image-to-video	Wan 2.6 image-to-video flash model.	OK	1/18	→
Flux 2 Klein 9B Base Trainer training	Fine-tune FLUX.2 [klein] 9B from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific editing tasks.	OK	1/17	→
FLUX 2 [klein] 9b Base Trainer training	Fine-tune FLUX.2 [klein] 9B from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific editing tasks.	OK	1/17	→
FLUX 2 [klein] 4b Base Trainer training	Fine-tune FLUX.2 [klein] 4B from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific styles and domains.	Deprecated	1/17	→
Flux 2 Klein image-to-image	Image-to-image editing with LoRA support for FLUX.2 [klein] 9B from Black Forest Labs. Specialized style transfer and domain-specific modifications.	Deprecated	1/17	→
Flux 2 Klein text-to-image	Text-to-image generation with LoRA support for FLUX.2 [klein] from Black Forest Labs. Custom style adaptation and fine-tuned model variations.	Deprecated	1/17	→
Flux 2 Klein 4B Base Trainer training	Fine-tune FLUX.2 [klein] 4B from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific editing tasks.	Deprecated	1/16	→
FLUX.2 [klein] 4B Base image-to-image	Image-to-image editing with FLUX.2 [klein] 4B Base from Black Forest Labs. Precise modifications using natural language descriptions and hex color control.	OK	1/15	→
FLUX.2 [klein] 9B Base text-to-image	Text-to-image generation with FLUX.2 [klein] 9B Base from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities.	OK	1/15	→
FLUX.2 [klein] 9B Base image-to-image	Image-to-image editing with Flux 2 [klein] 9B Base from Black Forest Labs. Precise modifications using natural language descriptions and hex color control.	OK	1/15	→
FLUX.2 [klein] 4B Base text-to-image	Text-to-image generation with FLUX.2 [klein] 4B Base from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities.	OK	1/15	→
FLUX.2 [klein] 4B image-to-image	Image-to-image editing with FLUX.2 [klein] 4B from Black Forest Labs. Precise modifications using natural language descriptions and hex color control.	OK	1/15	→
FLUX.2 [klein] 9B image-to-image	Image-to-image editing with FLUX.2 [klein] 9B from Black Forest Labs. Precise modifications using natural language descriptions and hex color control.	OK	1/15	→
FLUX.2 [klein] 9B text-to-image	Text-to-image generation with FLUX.2 [klein] 9B from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities.	OK	1/15	→
FLUX.2 [klein] 4B text-to-image	Text-to-image generation with FLUX.2 [klein] 4B from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities.	OK	1/15	→
ImagineArt 1.5 Pro Preview text-to-image	ImagineArt 1.5 Pro is an advanced text-to-image model that creates ultra-high-fidelity 4K visuals with lifelike realism, refined aesthetics, and powerful creative output suited for professional use. visuals imagineart realism text	OK	1/15	→
Qwen Image 2512 Trainer V2 training	Fast LoRA trainer for Qwen-Image-2512 lora personalization	Deprecated	1/15	→
Flux 2 [klein] 4B image-to-image	Image-to-image editing with Flux 2 [klein] 4B from Black Forest Labs. Precise modifications using natural language descriptions and hex color control.	Deprecated	1/15	→
ElevenLabs Voice Changer audio-to-audio	Change the voices in your audios with voices in ElevenLabs! voice-change audio-to-audio	OK	1/14	→
ElevenLabs Dubbing audio-to-video	Generate dubbed videos or audios using ElevenLabs Dubbing feature! dubbing audio-to-audio	OK	1/14	→
ElevenLabs Speech to Text - Scribe V2 speech-to-text	Use Scribe-V2 from ElevenLabs to do blazingly fast speech to text inferences! speech-to-text	OK	1/14	→
Glm Image image-to-image	Create high-quality images with accurate text rendering and rich knowledge details—supports editing, style transfer, and maintaining consistent characters across multiple images. image-to-image	OK	1/14	→
GLM Image text-to-image	Create high-quality images with accurate text rendering and rich knowledge details—supports editing, style transfer, and maintaining consistent characters across multiple images. text-to-image	OK	1/14	→
OpenRouter [Video][Enterprise] video-to-text	Run any VLM (Video Language Model) with fal, powered by OpenRouter.	OK	1/13	→
OpenRouter [Video] video-to-text	Run any video-capable LLM with fal. Analyze, summarize, and understand video files using Gemini (Google) models. Supports mp4, mpeg, mov, webm, and YouTube links. Powered by OpenRouter.	OK	1/13	→
Nova SR audio-to-audio	Enhance muffled 16 kHz speech audio into crystal-clear 48 kHz speech-enhancements audio-super-resolution audio-sr	Deprecated	1/13	→
FLUX 2 Trainer V2 Edit training	Fine-tune FLUX.2 [dev] from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific editing tasks.	OK	1/10	→
FLUX 2 Trainer V2 training	Fine-tune FLUX.2 [dev] from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific styles and domains.	OK	1/10	→
Longcat Multi Avatar audio-to-video	LongCat-Video-Avatar is an audio-driven video generation model that generates super-realistic, lip-synchronized long videos with natural dynamics and consistent identity. audio-to-video image-to-video	Deprecated	1/8	→
Silero VAD audio-to-text	Detect speech presence and timestamps with accuracy and speed using the ultra-lightweight Silero VAD model vad silero voice-activity-detection	OK	1/8	→
DeepFilterNet 3 audio-to-audio	Enhance speech audio by removing background noise and upsampling to 48 kHz speech-enhancement	OK	1/7	→
LTX-2 Video to Video Trainer training	Train LTX-2 for video transformation or video-conditioned generation. ltx2-video fine-tuning video-to-video	Deprecated	1/7	→
Qwen Image Edit 2511 Multiple Angles image-to-image	Generates the same scene from different angles (azimuth/elevation) with Qwen Image Edit 2511 and the Multiple Angles LoRA stylized transform lora multi-angles multiples angles	OK	1/7	→
LTX-2 19B Distilled video-to-video	Generate video with audio from videos using LTX-2 Distilled and custom LoRA	OK	1/7	→
LTX-2 19B Distilled video-to-video	Generate video with audio from videos using LTX-2 Distilled	OK	1/7	→
LTX-2 19B video-to-video	Generate video with audio from videos using LTX-2 and custom LoRA	OK	1/7	→
LTX-2 19B video-to-video	Generate video with audio from videos using LTX-2	OK	1/7	→
Ultrashape 3d-to-3d	UltraShape-1.0 is a 3D diffusion framework that generates high-fidelity 3D geometry through coarse-to-fine geometric refinement. 3d-to-3d	Deprecated	1/6	→
LTX-2 19B Distilled video-to-video	Extend videos with audio using LTX-2 Distilled and custom LoRA	OK	1/5	→
LTX-2 19B Distilled video-to-video	Extend videos with audio using LTX-2 Distilled	OK	1/5	→
LTX-2 19B Distilled image-to-video	Generate video with audio from images using LTX-2 Distilled and custom LoRA	OK	1/5	→
LTX-2 19B Distilled image-to-video	Generate video with audio from images using LTX-2 Distilled	OK	1/5	→
LTX-2 19B Distilled text-to-video	Generate video with audio from text using LTX-2 Distilled and custom LoRA	OK	1/5	→
LTX-2 19B Distilled text-to-video	Generate video with audio from text using LTX-2 Distilled	OK	1/5	→
LTX-2 19B video-to-video	Extend video with audio using LTX-2 and custom LoRA	OK	1/5	→
LTX-2 19B text-to-video	Generate video with audio from text using LTX-2 and custom LoRA	OK	1/5	→
LTX-2 19B image-to-video	Generate video with audio from images using LTX-2 and custom LoRA	OK	1/5	→
LTX-2 19B video-to-video	Extend video with audio using LTX-2	OK	1/5	→
LTX-2 19B text-to-video	Generate video with audio from text using LTX-2	OK	1/5	→
LTX-2 19B image-to-video	Generate video with audio from images using LTX-2	OK	1/5	→
LTX-2 Video Trainer training	Train LTX-2 for custom styles and effects. ltx2-video fine-tuning	Deprecated	1/3	→
Qwen Image 2512 text-to-image	LoRA inference endpoint for Qwen Image 2512, an improved version of Qwen Image with better text rendering, finer natural textures, and more realistic human generation. qwen 2512 lora	OK	1/2	→
Qwen Image 2512 Trainer training	Qwen Image 2512 LoRA training lora personalization	OK	1/1	→
Qwen Image Edit 2511 image-to-image	Endpoint for Qwen's Image Editing 2511 model with LoRA support. stylized transform lora	OK	2025/12/30	→
Qwen Image 2512 text-to-image	Qwen Image 2512 is an improved version of Qwen Image with better text rendering, finer natural textures, and more realistic human generation. qwen 2512	OK	2025/12/30	→
Longcat Multi Avatar audio-to-video	LongCat-Video-Avatar is an audio-driven video generation model that generates super-realistic, lip-synchronized long videos with natural dynamics and consistent identity. audio-to-video image-to-video	Deprecated	2025/12/30	→
Longcat Single Avatar audio-to-video	LongCat-Video-Avatar is an audio-driven video generation model that generates super-realistic, lip-synchronized long videos with natural dynamics and consistent identity. audio-to-video image-to-video	OK	2025/12/30	→
Longcat Single Avatar audio-to-video	LongCat-Video-Avatar is an audio-driven video generation model that generates super-realistic, lip-synchronized long videos with natural dynamics and consistent identity. audio-to-video	OK	2025/12/30	→
Sam Audio audio-to-audio	Audio separation with SAM Audio. Isolate any sound using natural language—professional-grade audio editing made simple for creators, researchers, and accessibility applications. audio-to-audio sam-audio	OK	2025/12/30	→
Sam Audio audio-to-audio	Audio separation with SAM Audio. Isolate any sound using natural language—professional-grade audio editing made simple for creators, researchers, and accessibility applications. audio-to-audio sam-audio	OK	2025/12/30	→
Sam Audio video-to-audio	Audio separation with SAM Audio. Isolate any sound using natural language—professional-grade audio editing made simple for creators, researchers, and accessibility applications. video-to-audio sam-audio	OK	2025/12/30	→
Ai Home image-to-image	AI Home Style reimagines your home interior and exterior design with bold, prompt-driven concepts stylized transform	Deprecated	2025/12/30	→
Ai Home image-to-image	AI Home Edit transforms your home interior and exterior photos with realistic, prompt-based edits stylized transform	Deprecated	2025/12/30	→
Hunyuan Motion [0.46B] text-to-3d	Generate 3D human motions via text-to-generation interface of Hunyuan Motion! text-to-3d motion	OK	2025/12/30	→
Hunyuan Motion [1B] text-to-3d	Generate 3D human motions via text-to-generation interface of Hunyuan Motion! text-to-3d motion	OK	2025/12/30	→
Arbiter Image Text vision	Semantic image alignment measurements clip-score	Deprecated	2025/12/26	→
Arbiter Image Image vision	Image reference comparison measurements dists sdi mse ssim lpips	Deprecated	2025/12/26	→
Arbiter Image vision	Reference-free image measurements arniqa nima iqa musiq	Deprecated	2025/12/26	→
Wan Move [480p] image-to-video	Use Wan-Move to generate videos with motion controlled by trajectories image-to-video motion-control motion	Deprecated	2025/12/24	→
Qwen Image Layered image-to-image	Qwen-Image-Layered is a model capable of decomposing an image into multiple RGBA layers. Use LoRAs to get custom outputs. qwen lora	OK	2025/12/24	→
FFmpeg API [Merge Audios] audio-to-audio	Merge audios into a single audio using FFmpeg API! ffmpeg	OK	2025/12/23	→
Wan v2.6 Text to Image text-to-image	Wan 2.6 text-to-image model. text-to-image	OK	2025/12/23	→
Wan v2.6 Image to Image image-to-image	Wan 2.6 image-to-image model. image-to-image	OK	2025/12/23	→
Video video-to-video	High-fidelity keypoint-driven video object removal - minimal input, strong temporal consistency. Trained on licensed data for risk-free commercial video editing. bria video erase keypoints	OK	2025/12/23	→
Video video-to-video	Erase unwanted objects, people, or elements from video with a text prompt. High-fidelity output with strong temporal consistency, trained on licensed data for safe commercial use. bria video erase	OK	2025/12/23	→
Video video-to-video	High-fidelity mask-based video object removal with strong temporal consistency. Erase unwanted objects, people, or elements while preserving aesthetic quality. Trained on licensed data for risk-free commercial use. bria video erase	OK	2025/12/23	→
Qwen Image Edit 2511 Trainer training	LoRA trainer for Qwen Image Edit 2511	Deprecated	2025/12/23	→
Kandinsky5 Pro image-to-video	Kandinsky 5.0 Pro is a diffusion model for fast, high-quality image-to-video generation.	OK	2025/12/23	→
Kandinsky5 Pro text-to-video	Kandinsky 5.0 Pro is a diffusion model for fast, high-quality text-to-video generation.	OK	2025/12/23	→
Bytedance Seedance V1.5 Pro Text To Video text-to-video	Generate videos with audio with Seedance 1.5 bytedance seedance audio	OK	2025/12/23	→
Bytedance Seedance V1.5 Pro Image To Video image-to-video	Generate videos with audio with Seedance 1.5 (supports start & end frame) bytedance seedance audio	OK	2025/12/23	→
Qwen Image Layered Trainer training	Train LoRAs for the Qwen-Image-Layered model, customize how images are split into layers. qwen layer trainer	Deprecated	2025/12/23	→
OpenRouter [Enterprise] llm	Run any LLM (Large Language Model) with fal, powered by OpenRouter.	OK	2025/12/22	→
Live Avatar image-to-video	Real-time avatar generation with Live Avatar. Have natural face-to-face conversations with AI avatars that respond instantly—streaming infinite-length video with immediate visual feedback. realtime image-to-video audio-to-video	Deprecated	2025/12/22	→
OpenRouter [Audio] unknown	Run any audio-capable LLM with fal. Process audio files (transcription, analysis, understanding) using Gemini (Google) models. Supports wav, mp3, aiff, aac, ogg, flac, m4a. Powered by OpenRouter.	OK	2025/12/22	→
Elevenlabs Music text-to-audio	Generate high quality, realistic music with fine controls using Elevenlabs Music! music text-to-music	OK	2025/12/22	→
Lightx video-to-video	Use the capabilities of lightx to relight and recamera your videos. video-to-video	OK	2025/12/22	→
Lightx video-to-video	Use the capabilities of lightx to relight and recamera your videos. video-to-video recamera relight	OK	2025/12/22	→
Kling Video v2.6 Motion Control [Standard] video-to-video	Transfer movements from a reference video to any character image. Cost-effective mode for motion transfer, perfect for portraits and simple animations.	OK	2025/12/21	→
Kling Video v2.6 Motion Control [Pro] video-to-video	Transfer movements from a reference video to any character image. Pro mode delivers higher quality output, ideal for complex dance moves and gestures.	OK	2025/12/21	→
Qwen Image Edit 2511 image-to-image	Endpoint for Qwen's Image Editing 2511 model. stylized transform	OK	2025/12/19	→
Qwen Image Layered image-to-image	Qwen-Image-Layered is a model capable of decomposing an image into multiple RGBA layers. qwen layer	OK	2025/12/19	→
Lucy Restyle video-to-video	Restyle videos up to 30 min long - maintaining maximum detail quality. video-edit	OK	2025/12/18	→
Z Image Turbo Inpaint Lora image-to-image	Generate images from text, an image, a mask and custom LoRA using Z-Image Turbo, Tongyi-MAI's super-fast 6B model. inpainting	OK	2025/12/18	→
Z Image Turbo Inpaint image-to-image	Generate images from text, an image and a mask using Z-Image Turbo, Tongyi-MAI's super-fast 6B model. inpainting	OK	2025/12/18	→
Trellis 2 image-to-3d	Generate 3D models from your images using Trellis 2. A native 3D generative model enabling versatile and high-quality 3D asset creation. image-to-3D	OK	2025/12/17	→
Scail video-to-video	SCAIL is a character animation model that uses 3D consistent pose representations to animate reference images with coherent motion, supporting complex movements.	Deprecated	2025/12/17	→
Crystal Upscaler [Video] video-to-video	Do high precision video upscaling that respects the original video perfectly using Crystal Upscaler's new video upscaling method! upscale video-to-video	OK	2025/12/17	→
Vibevoice text-to-speech	Generate long speech snippets fast using Microsoft's powerful TTS. vibevoice fast	OK	2025/12/17	→
Bria Video Eraser Erase Mask video-to-video	A high-fidelity capability for erasing unwanted objects, people, or visual elements from videos while maintaining aesthetic quality and temporal consistency. bria erase	OK	2025/12/17	→
Bria Video Eraser video-to-video	A high-fidelity capability for erasing unwanted objects, people, or visual elements from videos while maintaining aesthetic quality and temporal consistency. bria erase	OK	2025/12/17	→
Bria Video Eraser video-to-video	A high-fidelity capability for erasing unwanted objects, people, or visual elements from videos while maintaining aesthetic quality and temporal consistency bria erase	OK	2025/12/17	→
Hunyuan Video V1.5 image-to-video	Hunyuan Video 1.5 is Tencent's latest and best video model image-to-video	OK	2025/12/17	→
Hunyuan3d V3 text-to-3d	Turn simple sketches into detailed, fully-textured 3D models. Instantly convert your concept designs into formats ready for Unity, Unreal, and Blender.	OK	2025/12/16	→
Hunyuan3d V3 image-to-3d	Create your imagined 3D models with just text. Production-ready, export-ready professional assets with realistic lighting and materials in minutes.	OK	2025/12/16	→
Hunyuan3d V3 image-to-3d	Transform your photos into ultra-high-resolution 3D models in seconds. Film-quality geometry with PBR textures, ready for games, e-commerce, and 3D printing.	OK	2025/12/16	→
FLUX 2 Flash Edit image-to-image	Image-to-image editing with FLUX.2 [dev] from Black Forest Labs. Precise modifications using natural language descriptions and hex color control—in a flash.	OK	2025/12/16	→
FLUX 2 Flash text-to-image	Text-to-image generation with FLUX.2 [dev] from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities— in a flash.	OK	2025/12/16	→
GPT-Image 1.5 image-to-image	GPT Image 1.5 generates high-fidelity images with strong prompt adherence, preserving composition, lighting, and fine-grained detail. openai gpt-image	OK	2025/12/16	→
GPT-Image 1.5 text-to-image	GPT Image 1.5 generates high-fidelity images with strong prompt adherence, preserving composition, lighting, and fine-grained detail. openai gpt-image	OK	2025/12/16	→
Kling Video Create Voice audio-to-audio	Create Voices to be used with Kling Models Voice Control	OK	2025/12/16	→
Fibo Lite text-to-image	Fast, low-latency text-to-image model with high-quality output and full JSON-structured controllability. Open-source, trained on licensed data, and optimized for production-scale generation. bria fibo lite	OK	2025/12/16	→
FLUX 2 Turbo Edit image-to-image	Image-to-image editing with FLUX.2 [dev] from Black Forest Labs. Precise modifications using natural language descriptions and hex color control—all at turbo speed.	OK	2025/12/16	→
FLUX 2 Turbo text-to-image	Text-to-image generation with FLUX.2 [dev] from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities—all at turbo speed.	OK	2025/12/16	→
Flux 2 Max text-to-image	FLUX.2 [max] delivers state-of-the-art image generation and advanced image editing with exceptional realism, precision, and consistency. flux2 max	OK	2025/12/16	→
Flux 2 Max image-to-image	FLUX.2 [max] delivers state-of-the-art image generation and advanced image editing with exceptional realism, precision, and consistency. flux2 image-editing high-quality	OK	2025/12/16	→
Ai Baby And Aging Generator image-to-image	AI Baby Generator is a service that instantly creates realistic predictions of a future child from parent photos. stylized transform	Deprecated	2025/12/16	→
Ai Baby And Aging Generator image-to-image	AI Aging Generator performs controllable age progression or regression from a single face photo, generating lifelike portraits across eight age groups from baby to senior. utility editing	Deprecated	2025/12/16	→
Ai Detector vision	AI Detector (Image) is an advanced service that analyzes a single picture and returns a verdict on whether it was likely created by AI. utility	Deprecated	2025/12/16	→
Ai Detector text-to-text	AI Detector (Text) is an advanced AI service that analyzes a passage and returns a verdict on whether it was likely written by AI. utility	Deprecated	2025/12/16	→
Wan v2.6 Text to Video text-to-video	Wan 2.6 text-to-video model. text-to-video	OK	2025/12/16	→
Wan v2.6 Reference to Video video-to-video	Wan 2.6 reference-to-video model. reference-to-video	OK	2025/12/16	→
Wan 2.6 video-to-video	Wan 2.6 reference-to-video model. reference-to-video	Deprecated	2025/12/16	→
Wan 2.6 text-to-video	Wan 2.6 text-to-video model. text-to-video	Deprecated	2025/12/16	→
Qwen Image Edit 2509 Lora Gallery image-to-image	Apply designs/graphics onto people's shirts stylized transform	OK	2025/12/15	→
Qwen Image Edit 2509 Lora Gallery image-to-image	Remove existing lighting and apply soft, even illumination stylized transform	OK	2025/12/15	→
Qwen Image Edit 2509 Lora Gallery image-to-image	Remove unwanted elements (objects, people, text) while maintaining image consistency stylized transform	OK	2025/12/15	→
Qwen Image Edit 2509 Lora Gallery image-to-image	Removes harsh shadows and light spots from images, replacing them with soft, even, natural-looking illumination. stylized transform	OK	2025/12/15	→
Qwen Image Edit 2509 Lora Gallery image-to-image	Blend products into backgrounds with automatic perspective and lighting correction stylized transform	OK	2025/12/15	→
Qwen Image Edit 2509 Lora Gallery image-to-image	Create group photos stylized transform	OK	2025/12/15	→
Qwen Image Edit 2509 Lora Gallery image-to-image	Generate full portrait from a cropped face photo stylized transform	OK	2025/12/15	→
Qwen Image Edit 2509 Lora Gallery image-to-image	Add a realistic scene behind an object on a white background stylized transform	OK	2025/12/15	→
Qwen Image Edit 2509 Lora Gallery image-to-image	Create cinematic transitions and scene progressions (camera movements, framing changes) stylized transform	OK	2025/12/15	→
Qwen Image Edit 2509 Lora Gallery image-to-image	Precise camera position and angle control (rotation, zoom, vertical movement) stylized transform	OK	2025/12/15	→
Veo 3.1 Fast video-to-video	Extend Veo-Created Videos up to 30 seconds extend-video	OK	2025/12/15	→
Veo 3.1 video-to-video	Extend Veo-Created Videos up to 30 seconds extend-video	OK	2025/12/15	→
Qwen Image Edit 2509 Lora image-to-image	LoRA endpoint for the Qwen Image Edit 2509 model. image-to-image image-editing	OK	2025/12/15	→
Qwen Image Edit 2509 Trainer training	LoRA trainer for Qwen Image Edit 2509	OK	2025/12/15	→
Qwen Image Edit 2509 image-to-image	Endpoint for Qwen's Image Editing Plus model also known as Qwen-Image-Edit-2509. Has superior text editing capabilities and multi-image support. image-editing image-to-image high-quality-text	OK	2025/12/15	→
Wan v2.6 Image to Video image-to-video	Wan 2.6 image-to-video model. image-to-video	OK	2025/12/15	→
Kling O1 Reference Image to Video [Standard] image-to-video	Transform images, elements, and text into consistent, high-quality video scenes, ensuring stable character identity, object details, and environments.	OK	2025/12/15	→
Kling O1 First Frame Last Frame to Video [Standard] image-to-video	Generate a video by taking a start frame and an end frame, animating the transition between them while following text-driven style and scene guidance.	OK	2025/12/15	→
Kling O1 Reference Video to Video [Standard] video-to-video	Kling O1 Omni generates new shots guided by an input reference video, preserving cinematic language such as motion and camera style to produce seamless scene continuity.	OK	2025/12/15	→
Kling O1 Edit Video [Standard] video-to-video	Edit an existing video using natural-language instructions, transforming subjects, settings, and style while retaining the original motion structure.	OK	2025/12/15	→
Chatterbox Turbo text-to-speech	Turbo-charged voice generation. Control every breath, laugh, and sigh with inline tags - now at turbo speed. text-to-speech	Deprecated	2025/12/15	→
Qwen Image Edit Plus Lora Gallery image-to-image	Removes harsh shadows and light spots from images, replacing them with soft, even, natural-looking illumination. stylized transform	OK	2025/12/12	→
Maya text-to-speech	Maya1 is a state-of-the-art speech model by Maya Research for expressive voice generation, built to capture real human emotion and precise voice design. text-to-speech tts	OK	2025/12/12	→
Maya text-to-speech	Maya1 is a state-of-the-art speech model by Maya Research for expressive voice generation, built to capture real human emotion and precise voice design. text-to-speech tts	OK	2025/12/12	→
Moondream3 Preview [Segment] image-to-image	Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale. mask segmentation	OK	2025/12/12	→
Fabric 1.0 text-to-video	VEED Fabric 1.0 text-to-video API lipsync avatar text-to-video	OK	2025/12/12	→
Steady Dancer video-to-video	Create smooth, realistic videos from a single photo while keeping the original appearance intact—precise motion control without losing identity or visual quality.	Deprecated	2025/12/11	→
One To All Animation video-to-video	One-to-All Animation is a pose driven video model that animates characters from a single reference image, enabling flexible, alignment-free motion transfer across diverse styles and scenes video to video motion	OK	2025/12/11	→
One To All Animation video-to-video	One-to-All Animation is a pose driven video model that animates characters from a single reference image, enabling flexible, alignment-free motion transfer across diverse styles and scenes video to video motion	OK	2025/12/11	→
Creatify Aurora image-to-video	Generate high-fidelity, studio-quality videos of your avatar speaking or singing using Aurora from the Creatify team! lipsync image-to-video	OK	2025/12/11	→
Wan Vision Enhancer video-to-video	Wan Vision Enhancer magnifies and enhances video with high fidelity and creativity. stylized transform	Deprecated	2025/12/10	→
Sync React-1 video-to-video	Use React-1 from SyncLabs to refine human emotions and do realistic lip-sync without losing details! lipsync video-to-video	OK	2025/12/10	→
Stepx Edit2 image-to-image	Image-to-image editing with Step1X-Edit v2 from StepFun. Reasoning-enhanced modifications through a thinking–editing–reflection loop with MLLM world knowledge for abstract instruction comprehension.	OK	2025/12/9	→
Z Image Turbo Controlnet Lora image-to-image	Generate images from text and edge, depth or pose images using custom LoRA and Z-Image Turbo, Tongyi-MAI's super-fast 6B model. turbo z-image fast lora	OK	2025/12/7	→
Z Image Turbo Controlnet image-to-image	Generate images from text and edge, depth or pose images using Z-Image Turbo, Tongyi-MAI's super-fast 6B model.	OK	2025/12/7	→
Z Image Turbo Image To Image Lora image-to-image	Generate images from text and images using custom LoRA and Z-Image Turbo, Tongyi-MAI's super-fast 6B model. turbo z-image fast lora	OK	2025/12/7	→
Z Image Turbo Image To Image image-to-image	Generate images from text and images using Z-Image Turbo, Tongyi-MAI's super-fast 6B model. turbo z-image fast	OK	2025/12/7	→
Longcat Image image-to-image	LongCat image Edit is a 6B parameter image editing model excelling at multilingual text rendering, photorealism and deployment efficiency.	OK	2025/12/5	→
Longcat Image text-to-image	LongCat image is a 6B parameter model excelling at multilingual text rendering, photorealism and deployment efficiency.	OK	2025/12/5	→
Kling AI Avatar v2 Pro image-to-video	Kling AI Avatar v2 Pro: The premium endpoint for creating avatar videos with realistic humans, animals, cartoons, or stylized characters	OK	2025/12/4	→
Kling AI Avatar v2 Standard image-to-video	Kling AI Avatar v2 Standard: Endpoint for creating avatar videos with realistic humans, animals, cartoons, or stylized characters	OK	2025/12/4	→
Z Image Trainer training	Train LoRAs on Z-Image Turbo, a super fast text-to-image model of 6B parameters developed by Tongyi-MAI. turbo z-image fast trainer	OK	2025/12/3	→
Bytedance Seedream V4.5 Text To Image text-to-image	A new-generation image creation model from ByteDance, Seedream 4.5 integrates image generation and image editing capabilities into a single, unified architecture. stylized transform	OK	2025/12/3	→
Bytedance Seedream V4.5 Edit image-to-image	A new-generation image creation model from ByteDance, Seedream 4.5 integrates image generation and image editing capabilities into a single, unified architecture. stylized transform	OK	2025/12/3	→
Sam 3 3d-to-3d	SAM 3D enables full scene reconstructions, placing objects and humans in a shared context together. align 3D	OK	2025/12/2	→
Sam 3 image-to-3d	SAM 3D allows for accurate 3D reconstruction of human body shape and position from a single image. 3d human pose	OK	2025/12/2	→
Sam 3 image-to-3d	SAM 3D enables precise 3D reconstruction of objects from real images, while accurately reconstructing their geometry and texture. 3d object	OK	2025/12/2	→
Vidu image-to-image	Vidu Reference-to-Image creates images by combining reference images with a prompt. images-to-imag reference-to-image	OK	2025/12/2	→
Vidu text-to-image	Use Vidu Text-to-Image to turn your prompts into reality.	OK	2025/12/2	→
Kling Video v2.6 Text to Video text-to-video	Kling 2.6 Pro: Top-tier text-to-video with cinematic visuals, fluid motion, and native audio generation.	OK	2025/12/2	→
Kling Video v2.6 Image to Video image-to-video	Kling 2.6 Pro: Top-tier image-to-video with cinematic visuals, fluid motion, and native audio generation.	OK	2025/12/2	→
PixVerse V5.5 Effects image-to-video	Pixverse Effects	OK	2025/12/2	→
PixVerse V5.5 Transition image-to-video	Pixverse Transition	OK	2025/12/2	→
Z Image Turbo Lora text-to-image	Text-to-Image endpoint with LoRA support for Z-Image Turbo, a super fast text-to-image model of 6B parameters developed by Tongyi-MAI. z-image lora fast	OK	2025/12/1	→
PixVerse V5.5 Image To Video image-to-video	Generate high quality video clips from text and image prompts using PixVerse v5.5 image-to-video	OK	2025/12/1	→
PixVerse V5.5 Text To Video text-to-video	Generate high quality video clips from text and image prompts using PixVerse v5.5 text-to-video	OK	2025/12/1	→
Video Background Removal video-to-video	Remove background from any video with people and objects. No green screen needed.	OK	2025/12/1	→
Kling O1 First Frame Last Frame to Video [Pro] image-to-video	Generate a video by taking a start frame and an end frame, animating the transition between them while following text-driven style and scene guidance.	OK	2025/12/1	→
Kling O1 Reference Image to Video [Pro] image-to-video	Transform images, elements, and text into consistent, high-quality video scenes, ensuring stable character identity, object details, and environments.	OK	2025/12/1	→
Kling O1 Edit Video [Pro] video-to-video	Edit an existing video using natural-language instructions, transforming subjects, settings, and style while retaining the original motion structure.	OK	2025/12/1	→
Kling O1 Reference Video to Video [Pro] video-to-video	Kling O1 Omni generates new shots guided by an input reference video, preserving cinematic language such as motion and camera style to produce seamless scene continuity.	OK	2025/12/1	→
Kling O1 Image image-to-image	Perform precise image edits using strong reference control, transforming subjects, styles, and local details while preserving visual consistency. edit realism typography	OK	2025/12/1	→
Ovis Image text-to-image	Ovis-Image is a 7B text-to-image model specifically optimized for quick, high quality text rendering. ovis-image artistic	OK	2025/11/29	→
Video Background Removal video-to-video	Remove background from any video with people and objects. No green screen needed.	OK	2025/11/28	→
Video Background Removal video-to-video	Remove background from videos filmed using chromakey, with automatic green spill suppression for clean, professional edges.	OK	2025/11/28	→
LTX Video 2.0 Fast text-to-video	Create high-fidelity video with audio from text with LTX-2 Fast	OK	2025/11/26	→
LTX Video 2.0 Pro text-to-video	Create high-fidelity video with audio from text with LTX-2 Pro.	OK	2025/11/26	→
LTX Video 2.0 Fast image-to-video	Create high-fidelity video with audio from images with LTX-2 Fast	OK	2025/11/26	→
LTX Video 2.0 Pro image-to-video	Create high-fidelity video with audio from images with LTX-2 Pro	OK	2025/11/26	→
LTX Video 2.0 Retake video-to-video	Change sections of a video using LTX-2	OK	2025/11/26	→
Z Image Turbo text-to-image	Z-Image Turbo is a super fast text-to-image model of 6B parameters developed by Tongyi-MAI. turbo z-image fast	OK	2025/11/26	→
LTX Video 2.0 Retake video-to-video	Change sections of a video using LTX-2	Deprecated	2025/11/26	→
Lucy Edit [Fast] video-to-video	Lucy Edit Fast is a rapid, localized video editing model that lets you modify specific elements, such as objects or backgrounds, in just 10 seconds. edit video-edit	Deprecated	2025/11/25	→
Flux 2 Lora Gallery text-to-image	Applies sepia vintage effect to images stylized transform	OK	2025/11/25	→
Flux 2 Lora Gallery image-to-image	Virtual clothing try-on (2 images: person + garment) stylized transform	OK	2025/11/25	→
Flux 2 Lora Gallery text-to-image	Generates satellite/aerial view style images stylized transform	OK	2025/11/25	→
FLUX 2 Lora Gallery Realism text-to-image	Makes images more photorealistic and natural stylized transform	OK	2025/11/25	→
Flux 2 Lora Gallery image-to-image	Generates same object from different angles (azimuth/elevation) stylized transform	OK	2025/11/25	→
FLUX 2 Lora Gallery Hdr Style text-to-image	HDR surrealistic effect with intense colors stylized transform	OK	2025/11/25	→
Flux 2 Lora Gallery image-to-image	Extends a face into a full body portrait stylized transform	OK	2025/11/25	→
Flux 2 Lora Gallery text-to-image	Transforms images into comic book style stylized transform	OK	2025/11/25	→
Flux 2 Lora Gallery text-to-image	Ballpoint pen sketch drawing style stylized transform	OK	2025/11/25	→
Flux 2 Lora Gallery image-to-image	Virtually furnishes an empty apartment stylized transform	OK	2025/11/25	→
Flux 2 Lora Gallery image-to-image	Add a background to images with white/clean background stylized transform	OK	2025/11/25	→
Crystal Upscaler image-to-image	An advanced image enhancement tool designed specifically for facial details and portrait photography, utilizing Clarity AI's upscaling technology. image-to-image	OK	2025/11/25	→
FLUX 2 Trainer Edit training	Fine-tune FLUX.2 [dev] from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific editing tasks.	OK	2025/11/25	→
FLUX 2 Trainer training	Fine-tune FLUX.2 [dev] from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific styles and domains.	OK	2025/11/25	→
Flux 2 Flex image-to-image	Image editing with FLUX.2 [flex] from Black Forest Labs. Supports multi-reference editing with customizable inference steps and enhanced text rendering.	OK	2025/11/25	→
Flux 2 Flex text-to-image	Text-to-image generation with FLUX.2 [flex] from Black Forest Labs. Features adjustable inference steps and guidance scale for fine-tuned control. Enhanced typography and text rendering capabilities. stylized transform	OK	2025/11/25	→
FLUX 2 Lora Edit image-to-image	Image-to-image editing with LoRA support for FLUX.2 [dev] from Black Forest Labs. Specialized style transfer and domain-specific modifications.	OK	2025/11/23	→
FLUX 2 Lora text-to-image	Text-to-image generation with LoRA support for FLUX.2 [dev] from Black Forest Labs. Custom style adaptation and fine-tuned model variations.	OK	2025/11/23	→
FLUX 2 Edit image-to-image	Image-to-image editing with FLUX.2 [dev] from Black Forest Labs. Precise modifications using natural language descriptions and hex color control.	OK	2025/11/23	→
FLUX 2 text-to-image	Text-to-image generation with FLUX.2 [dev] from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities.	OK	2025/11/23	→
Flux 2 Pro text-to-image	Text-to-image generation with FLUX.2 [pro] from Black Forest Labs. Optimized for maximum quality, exceptional photorealism and artistic images.	OK	2025/11/23	→
FLUX 2 Pro Edit image-to-image	Image editing with FLUX.2 [pro] from Black Forest Labs. Ideal for high-quality image manipulation, style transfer, and sequential editing workflows.	OK	2025/11/23	→
Chrono Edit Lora image-to-image	LoRA endpoint for the Chrono Edit model. image-to-image image-editing	OK	2025/11/21	→
Chrono Edit Lora Gallery image-to-image	You can make edits simply by drawing a quick sketch on the input image. paint edit sketch	OK	2025/11/21	→
Chrono Edit Lora Gallery image-to-image	Upscales and cleans up the image. upscale details	OK	2025/11/21	→
Hunyuan Video V1.5 text-to-video	Hunyuan Video 1.5 is Tencent's latest and best video model hunyuan-video text-to-video	OK	2025/11/21	→
Sam 3 image-to-image	SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks. segmentation rle real-time	OK	2025/11/20	→
Sam 3 vision	SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks. embeddings mask real-time	OK	2025/11/20	→
Sam 3 video-to-video	SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks. segmentation mask real-time rle	OK	2025/11/20	→
Sam 3 video-to-video	SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks. segmentation mask real-time	OK	2025/11/20	→
Segment Anything Model 3 image-to-image	SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks. segmentation mask real-time	OK	2025/11/20	→
Gemini 3 Pro Image Preview image-to-image	Gemini 3 Pro Image (a.k.a. Nano Banana Pro) is Google's state-of-the-art high-fidelity image generation and editing model. realism typography	OK	2025/11/20	→
Gemini 3 Pro Image Preview text-to-image	Gemini 3 Pro Image (a.k.a. Nano Banana Pro) is Google's state-of-the-art high-fidelity image generation and editing model. realism typography	OK	2025/11/20	→
Nano Banana Pro image-to-image	Nano Banana Pro is Google's new state-of-the-art image generation and editing model realism typography	OK	2025/11/20	→
Nano Banana Pro text-to-image	Nano Banana Pro is Google's new state-of-the-art image generation and editing model realism typography	OK	2025/11/20	→
Imagineart 1.5 Preview text-to-image	ImagineArt 1.5 text-to-image model generates high-fidelity professional-grade visuals with lifelike realism, strong aesthetics, and text that actually reads correctly. visuals imagineart realism text	OK	2025/11/20	→
Lynx image-to-video	Generate subject consistent videos using Lynx from ByteDance! image-to-video subject	OK	2025/11/18	→
Maya1 text-to-speech	Maya1 is a state-of-the-art speech model by Maya Research for expressive voice generation, built to capture real human emotion and precise voice design. text-to-speech tts	OK	2025/11/15	→
OpenRouter Responses [OpenAI Compatible] llm	The OpenRouter Responses API on fal, powered by OpenRouter, provides unified access to a wide range of large language models, including GPT, Claude, Gemini, and many others, through a single API.	OK	2025/11/13	→
Fibo Mashup image-to-image	Combine three images to create an amazing mashup image with Bria's FIBO model. bria fibo image-to-image	Deprecated	2025/11/13	→
OpenRouter Embeddings [OpenAI Compatible] llm	Generate text embeddings using OpenAI-compatible API. Access embedding models like text-embedding-3-small, text-embedding-3-large (OpenAI), and other embedding models available through OpenRouter. Drop-in replacement for the OpenAI embeddings API. Powered by OpenRouter.	OK	2025/11/12	→
OpenRouter [Vision] vision	Run any Vision Language Model with fal. Analyze and understand images using Claude (Anthropic), GPT-5 / GPT-4o (OpenAI), Gemini (Google), Grok (xAI), Llama (Meta), Qwen, Pixtral (Mistral), and more. Send one or multiple images for captioning, analysis, OCR, or visual Q&A. Powered by OpenRouter.	OK	2025/11/12	→
OpenRouter llm	Run any LLM with fal. Access Claude (Anthropic), ChatGPT / GPT-5 / GPT-4o (OpenAI), Gemini (Google), Grok (xAI), DeepSeek, Llama (Meta), Qwen (Alibaba), Mistral, and 200+ more models through a single API. Supports reasoning, structured output, and streaming. Powered by OpenRouter.	OK	2025/11/12	→
OpenRouter Chat Completions [OpenAI Compatible] llm	OpenAI-compatible chat completions API. Drop-in replacement for the OpenAI API — use any OpenAI SDK or client to access Claude, Gemini, Grok, DeepSeek, Llama, Qwen, Mistral, and all OpenAI models (GPT-5, GPT-4o, o3) through fal. Powered by OpenRouter.	OK	2025/11/12	→
Editto video-to-video	Edit videos with instruction-based prompting using the Editto model! video-edit wan-vace	OK	2025/11/12	→
Qwen Image Edit Plus Lora Gallery image-to-image	Precise camera position and angle control (rotation, zoom, vertical movement) stylized transform	OK	2025/11/11	→
Qwen Image Edit Plus Lora Gallery image-to-image	Apply designs/graphics onto people's shirts stylized transform	OK	2025/11/11	→
Qwen Image Edit Plus Lora Gallery image-to-image	Remove existing lighting and apply soft, even illumination stylized transform	OK	2025/11/11	→
Qwen Image Edit Plus Lora Gallery image-to-image	Remove unwanted elements (objects, people, text) while maintaining image consistency stylized transform	OK	2025/11/11	→
Qwen Image Edit Plus Lora Gallery image-to-image	Create cinematic transitions and scene progressions (camera movements, framing changes) stylized transform	OK	2025/11/11	→
Qwen Image Edit Plus Lora Gallery image-to-image	Blend products into backgrounds with automatic perspective and lighting correction stylized transform	OK	2025/11/11	→
Qwen Image Edit Plus Lora Gallery image-to-image	Create group photos stylized transform	OK	2025/11/11	→
Qwen Image Edit Plus Lora Gallery image-to-image	Generate full portrait from a cropped face photo stylized transform	OK	2025/11/11	→
Qwen Image Edit Plus Lora Gallery image-to-image	Add a realistic scene behind the object with white background stylized transform	OK	2025/11/11	→
Flashvsr video-to-video	Upscale your videos using FlashVSR with the fastest speeds! upscale video-to-video	OK	2025/11/11	→
PixVerse Swap image-to-video	Generate high quality video clips by swapping person, objects and background using Pixverse Swap.	OK	2025/11/10	→
Pika image-to-video	Discover ultimate control with Pikaframes key frame interpolation, a stunning image-to-video feature that allows you to upload up to 5 keyframes, customize their transition length and prompt, and see their images come to life as seamless videos.	OK	2025/11/7	→
Infinity Star text-to-video	InfinityStar’s unified 8B spacetime autoregressive engine turns any text prompt into crisp 720p video, 10× faster than diffusion models. text-to-video	OK	2025/11/7	→
Sana Video text-to-video	Leverage Sana's ultra-fast processing speed to generate high-quality assets that transform your text prompts into production-ready videos text-to-video	Deprecated	2025/11/7	→
Crystal Upscaler image-to-image	An advanced image enhancement tool designed specifically for facial details and portrait photography, utilizing Clarity AI's upscaling technology. image-to-image	Deprecated	2025/11/5	→
Workflow Utilities Auto Subtitle video-to-video	Add automatic subtitles to videos auto-subtitle captioning	OK	2025/11/4	→
Reve image-to-image	Reve’s fast remix model lets you upload reference images and then combine or transform them via a text prompt at lightning speed! image-to-image	Deprecated	2025/11/4	→
Reve image-to-image	Reve’s fast edit model lets you upload an existing image and then transform it via a text prompt at lightning speed! image-to-image	Deprecated	2025/11/4	→
Image Outpaint image-to-image	Directional outpainting. Choose which edges to expand: left, right, top, or center (uniform on all sides). Only the expanded areas are generated; an optional zoom-out pulls the frame back by the chosen amount. outpainting	OK	2025/11/3	→
Fashion Size Estimator vision	Fashion Size Estimator model analyzes human body images to predict clothing size recommendations and estimate key body measurements including height, bust, waist, and hip dimensions. utility editing	Deprecated	2025/11/3	→
Flux Vision Upscaler image-to-image	Flux Vision Upscaler magnifies and upscales images with high fidelity and creativity.	OK	2025/11/2	→
Emu 3.5 Image image-to-image	Edit images with a text prompt using Emu 3.5 Image	OK	2025/11/1	→
Emu 3.5 Image text-to-image	Generate images from text using Emu 3.5 Image	OK	2025/11/1	→
Sima Video Upscaler Lite video-to-video	Upscale your videos at real-time speeds with Sima Labs! upscale video-to-video	Deprecated	2025/10/31	→
Bytedance Upscaler Upscale Video video-to-video	Upscale videos with Bytedance's video upscaler. upscaler video bytedance	OK	2025/10/31	→
Sima Upscaler image-to-image	Upscale your images at blazingly fast speeds with Sima Labs! upscale image-to-image	Deprecated	2025/10/31	→
Chrono Edit image-to-image	NVIDIA's Logically Consistent and Physics-Aware Image Editing Model image-editing	OK	2025/10/30	→
Minimax Music text-to-audio	Generate music from text prompts using the MiniMax Music 2.0 model, which leverages advanced AI techniques to create high-quality, diverse musical compositions. music audio	OK	2025/10/30	→
Qwen Image Edit Plus Trainer training	LoRA trainer for Qwen Image Edit Plus	Deprecated	2025/10/30	→
Qwen Image Edit Trainer training	LoRA trainer for Qwen Image Edit	Deprecated	2025/10/30	→
LongCat Video text-to-video	Generate long videos in 720p/30fps from text using LongCat Video	OK	2025/10/30	→
LongCat Video image-to-video	Generate long videos in 720p/30fps from images using LongCat Video	OK	2025/10/30	→
LongCat Video image-to-video	Generate long videos from images using LongCat Video	OK	2025/10/30	→
LongCat Video text-to-video	Generate long videos from text using LongCat Video	OK	2025/10/30	→
LongCat Video Distilled image-to-video	Generate long videos in 720p/30fps from images using LongCat Video Distilled	OK	2025/10/30	→
LongCat Video Distilled text-to-video	Generate long videos in 720p/30fps from text using LongCat Video Distilled	OK	2025/10/30	→
Fibo text-to-json	Structured Prompt Generation endpoint for Fibo, Bria's SOTA Open source model. bria fibo structured-prompting	OK	2025/10/29	→
Omnipart image-to-3d	Image-to-3D endpoint for OmniPart, a part-aware 3D generator with semantic decoupling and structural cohesion.	Deprecated	2025/10/29	→
MiniMax Speech 2.6 [Turbo] text-to-speech	Generate speech from text prompts and different voices using the MiniMax Speech-2.6 Turbo model, which leverages advanced AI techniques to create high-quality text-to-speech. text-to-speech	OK	2025/10/29	→
MiniMax Speech 2.6 [HD] text-to-speech	Generate speech from text prompts and different voices using the MiniMax Speech-2.6 HD model, which leverages advanced AI techniques to create high-quality text-to-speech. text-to-speech	OK	2025/10/29	→
Video As Prompt video-to-video	A model for unified semantic control in video generation. It animates a static reference image using the motion and semantics from a reference video as a prompt. video-as-prompt semantic control	Deprecated	2025/10/29	→
Bytedance image-to-3d	Image to 3D endpoint for Bytedance's high-quality Seed3D 3d model generator. seed3d.quality bytedance 3d	Deprecated	2025/10/29	→
Fibo text-to-image	SOTA open-source text-to-image model delivering high-fidelity outputs with accurate typography. JSON-structured prompts provide production-ready controllability for enterprise and agentic workflows. Trained exclusively on licensed data. bria fibo prompt-adherence	OK	2025/10/29	→
LongCat Video Distilled image-to-video	Generate long videos from images using LongCat Video Distilled	OK	2025/10/29	→
LongCat Video Distilled text-to-video	Generate long videos from text using LongCat Video Distilled	OK	2025/10/28	→
Demucs audio-to-audio	SOTA stemming model for voice, drums, bass, guitar and more. audio	OK	2025/10/27	→
Piflow text-to-image	Use piflow's speed to generate images with the same quality as slower models. text-to-image	Deprecated	2025/10/27	→
MiniMax Hailuo 2.3 [Pro] (Image to Video) image-to-video	MiniMax Hailuo-2.3 Image To Video API (Pro, 1080p): Advanced image-to-video generation model with 1080p resolution image-to-video	OK	2025/10/27	→
MiniMax Hailuo 2.3 Fast [Standard] (Image to Video) image-to-video	MiniMax Hailuo-2.3-Fast Image To Video API (Standard, 768p): Advanced fast image-to-video generation model with 768p resolution image-to-video	OK	2025/10/27	→
MiniMax Hailuo 2.3 [Standard] (Image to Video) image-to-video	MiniMax Hailuo-2.3 Image To Video API (Standard, 768p): Advanced image-to-video generation model with 768p resolution image-to-video	OK	2025/10/27	→
MiniMax Hailuo 2.3 Fast [Pro] (Image to Video) image-to-video	MiniMax Hailuo-2.3-Fast Image To Video API (Pro, 1080p): Advanced fast image-to-video generation model with 1080p resolution image-to-video	OK	2025/10/27	→
MiniMax Hailuo 2.3 [Standard] (Text to Video) text-to-video	MiniMax Hailuo-2.3 Text To Video API (Standard, 768p): Advanced text-to-video generation model with 768p resolution text-to-video	OK	2025/10/27	→
MiniMax Hailuo 2.3 [Pro] (Text to Video) text-to-video	MiniMax Hailuo-2.3 Text To Video API (Pro, 1080p): Advanced text-to-video generation model with 1080p resolution text-to-video	OK	2025/10/27	→
Birefnet video-to-video	Video background removal version of bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS) utility editing	OK	2025/10/26	→
Audio Understanding audio-to-audio	An audio understanding model that analyzes audio content and answers questions about what's happening in the audio based on user prompts. utility audio	OK	2025/10/24	→
Bytedance Seedance V1 Pro Fast Image To Video image-to-video	Image to Video endpoint for Seedance 1.0 Pro Fast, a next-generation video model designed to deliver maximum performance at minimal cost bytedance seedance pro fast	OK	2025/10/24	→
Bytedance Seedance V1 Pro Fast Text To Video text-to-video	Text to Video endpoint for Seedance 1.0 Pro Fast, a next-generation video model designed to deliver maximum performance at minimal cost bytedance fast motion	OK	2025/10/24	→
Vidu video-to-video	Use the latest Vidu Q2 models, which offer much better quality and control over your videos.	OK	2025/10/24	→
Vidu image-to-video	Use the latest Vidu Q2 models, which offer much better quality and control over your videos. image-to-video	OK	2025/10/24	→
Vidu image-to-video	Use the latest Vidu Q2 models, which offer much better quality and control over your videos. image-to-video	OK	2025/10/24	→
LTX Video 2.0 Pro text-to-video	Create high-fidelity video with audio from text with LTX-2 Pro.	Deprecated	2025/10/23	→
LTX Video 2.0 Fast text-to-video	Create high-fidelity video with audio from text with LTX-2 Fast	Deprecated	2025/10/23	→
LTX Video 2.0 Pro image-to-video	Create high-fidelity video with audio from images with LTX-2 Pro	Deprecated	2025/10/23	→
LTX Video 2.0 Fast image-to-video	Create high-fidelity video with audio from images with LTX-2 Fast	Deprecated	2025/10/23	→
Vidu text-to-video	Use the latest Vidu Q2 models, which offer much better quality and control over your videos.	OK	2025/10/22	→
Kling Video image-to-video	Kling 2.5 Turbo Standard: Top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision. stylized transform	OK	2025/10/22	→
GPT Image 1 Mini image-to-image	GPT Image 1 mini combines OpenAI's advanced language capabilities, powered by GPT-5, with GPT Image 1 Mini for efficient image generation. image-to-image	OK	2025/10/21	→
GPT Image 1 Mini text-to-image	GPT Image 1 mini combines OpenAI's advanced language capabilities, powered by GPT-5, with GPT Image 1 Mini for efficient image generation. text-to-image	OK	2025/10/21	→
Qwen 3 Guard [8B] llm	Use Qwen 3 Guard [8B] to detect and classify text as safe or harmful, delivering precise and reliable safety categorization. filter safety utility	Deprecated	2025/10/20	→
Krea Wan 14b- Text to Video text-to-video	Fast Text-to-Video endpoint for Krea's Wan 14b model. text to video fast	OK	2025/10/20	→
Sound Effect Generation text-to-audio	Create professional-grade sound effects from animal and vehicle to nature, sci-fi, and otherworldly sounds. Perfect for films, games, and digital content. sfx audio effects speech	Deprecated	2025/10/18	→
Music Generation text-to-audio	Generate royalty-free instrumental music from electronic, hip hop, and indie rock to cinematic and classical genres. Perfect for games, films, social content, podcasts, and more. speech audio music	Deprecated	2025/10/18	→
Meshy 5 Retexture 3d-to-3d	Meshy-5 retexture applies new, high-quality textures to existing 3D models using either text prompts or reference images. It supports PBR material generation for realistic, production-ready results. 3d-to-3d	OK	2025/10/18	→
Meshy 5 Remesh 3d-to-3d	Meshy-5 remesh allows you to remesh and export existing 3D models into various formats 3d-to-3d	OK	2025/10/18	→
Reve image-to-image	Reve’s remix model lets you upload reference images and then combine or transform them via a text prompt. image-to-image	Deprecated	2025/10/17	→
Reve text-to-image	Reve’s text-to-image model generates detailed visual output that closely follows your instructions, with strong aesthetic quality and accurate text rendering. text-to-image	Deprecated	2025/10/17	→
Reve image-to-image	Reve’s edit model lets you upload an existing image and then transform it via a text prompt image-to-image	Deprecated	2025/10/17	→
Wan Alpha text-to-video	Generate videos with transparent backgrounds transparent alpha	Deprecated	2025/10/16	→
Mirelo SFX V1.5 video-to-audio	Generate synced sounds for any video, and return the new sound track (like MMAudio) video-to-audio sfx	OK	2025/10/15	→
Mirelo SFX V1.5 video-to-video	Generate synced sounds for any video, and return it with its new sound track (like MMAudio) video-to-video sfx	OK	2025/10/15	→
Krea Wan 14B video-to-video	Superfast video model based on Wan 2.1 14b by Krea, excelling at real-time video-editing.	OK	2025/10/14	→
Image2Pixel image-to-image	Turn images into pixel-perfect retro art post-processing pixel-art	OK	2025/10/14	→
Kandinsky5 text-to-video	Kandinsky 5.0 Distilled is a lightweight diffusion model for fast, high-quality text-to-video generation.	OK	2025/10/13	→
Kandinsky5 text-to-video	Kandinsky 5.0 is a diffusion model for fast, high-quality text-to-video generation.	OK	2025/10/13	→
DreamOmni2 image-to-image	DreamOmni2 is a unified multimodal model for text and image guided image editing.	OK	2025/10/10	→
Moondream3 Preview [Detect] vision	Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale. Vision	OK	2025/10/9	→
Moondream3 Preview [Point] vision	Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale. Vision	OK	2025/10/9	→
Moondream 3 Preview [Query] vision	Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale. Vision	OK	2025/10/9	→
Moondream3 Preview [Caption] vision	Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale. Vision	OK	2025/10/9	→
Kling Video video-to-audio	Generate audio from input videos using Kling	OK	2025/10/9	→
Sora 2 video-to-video	Video-to-video remix endpoint for Sora 2, OpenAI’s advanced model that transforms existing videos based on new text or image prompts, allowing rich edits, style changes, and creative reinterpretations while preserving motion and structure. video to video audio sora	Deprecated	2025/10/8	→
Veo 3.1 Fast image-to-video	Generate videos from a first/last frame using Google's Veo 3.1 Fast	OK	2025/10/8	→
Veo 3.1 image-to-video	Generate videos from a first and last frame using Google's Veo 3.1	OK	2025/10/8	→
Veo 3.1 Fast image-to-video	Generate videos from reference images using Google's Veo 3.1 Fast	OK	2025/10/8	→
Veo 3.1 image-to-video	Generate videos from images using Google's Veo 3.1	OK	2025/10/8	→
Veo 3.1 Fast text-to-video	Faster and more cost effective version of Google's Veo 3.1!	OK	2025/10/8	→
Veo 3.1 Fast image-to-video	Generate videos from your image prompts using Veo 3.1 fast.	OK	2025/10/8	→
Veo 3.1 image-to-video	Veo 3.1 is the latest state-of-the-art video generation model from Google DeepMind	OK	2025/10/8	→
Veo 3.1 text-to-video	Veo 3.1 by Google, the most advanced AI video generation model in the world. With sound on!	OK	2025/10/8	→
Hunyuan Part 3d-to-3d	Use Hunyuan Part to generate point clouds from your 3D files. 3D-to-3D point-cloud	Deprecated	2025/10/8	→
Wan 2.1 VACE Long Reframe video-to-video	Reframe entire videos scene-by-scene using Wan VACE 2.1	OK	2025/10/7	→
Index TTS 2.0 text-to-speech	Generate natural, clear speech using Index TTS 2.0 from IndexTeam text-to-speech	OK	2025/10/7	→
Meshy 6 Preview text-to-3d	Meshy-6-Preview is the latest model from Meshy. It generates realistic and production ready 3D models. text-to-3d	OK	2025/10/6	→
Meshy 5 Multi image-to-3d	Meshy-5 multi image generates realistic and production ready 3D models from multiple images. multi-image-to-3d	OK	2025/10/6	→
Meshy 6 Preview image-to-3d	Meshy-6-Preview is the latest model from Meshy. It generates realistic and production ready 3D models. image-to-3d	OK	2025/10/6	→
Sora 2 image-to-video	Image-to-video endpoint for Sora 2 Pro, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images. image-to-video audio sora-2-pro	Deprecated	2025/10/6	→
Sora 2 text-to-video	Text-to-video endpoint for Sora 2 Pro, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images. text-to-video audio sora-2-pro	Deprecated	2025/10/6	→
Sora 2 text-to-video	Text-to-video endpoint for Sora 2, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images. text to video audio sora	Deprecated	2025/10/6	→
Sora 2 image-to-video	Image-to-video endpoint for Sora 2, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images. image-to-video audio sora	Deprecated	2025/10/6	→
Qwen Image Edit Plus Lora image-to-image	LoRA endpoint for the Qwen Image Edit Plus model. image-to-image image-editing	OK	2025/10/3	→
Lucidflux image-to-image	LucidFlux for upscaling images with very high fidelity image-to-image	Deprecated	2025/10/3	→
Ovi image-to-video	Ovi can generate videos with audio from image and text inputs. image-to-audio-video image-to-video	OK	2025/10/3	→
Ovi Text to Video text-to-video	A unified paradigm for audio-video generation	OK	2025/10/3	→
Fabric 1.0 Fast image-to-video	VEED Fabric 1.0 is an image-to-video API that turns any image into a talking video lipsync avatar	OK	2025/10/1	→
Qwen Image Edit image-to-image	Image to Image Endpoint for Qwen's Image Editing model. Has superior text editing capabilities. stylized transform	OK	2025/9/30	→
Hunyuan Image text-to-image	Leverage the state-of-the-art capabilities of Hunyuan Image 3.0 to generate visual content that effectively conveys the messaging of your written material. text-to-image	OK	2025/9/28	→
Hyper3d image-to-3d	Rodin by Hyper3D generates realistic, production-ready 3D models from text or images. image-to-3d text-to-3d	OK	2025/9/26	→
Lynx image-to-video	Generate subject consistent videos using Lynx from ByteDance! image-to-video subject	Deprecated	2025/9/26	→
Wan 2.5 Image to Image image-to-image	Wan 2.5 image-to-image model.	OK	2025/9/25	→
Wan 2.5 Text to Image text-to-image	Wan 2.5 text-to-image model.	OK	2025/9/25	→
Wan 2.5 Text to Video text-to-video	Wan 2.5 text-to-video model.	OK	2025/9/24	→
Wan 2.5 Image to Video image-to-video	Wan 2.5 image-to-video model.	OK	2025/9/24	→
Bytedance Omnihuman V1.5 image-to-video	Omnihuman v1.5 is a new and improved version of Omnihuman. It generates video using an image of a human figure paired with an audio file. It produces vivid, high-quality videos where the character’s emotions and movements maintain a strong correlation with the audio. image-to-video lipsync	OK	2025/9/23	→
Product Photoshoot image-to-image	Create product advertisements with an example image of the product	Deprecated	2025/9/23	→
Qwen Image Edit Plus image-to-image	Endpoint for Qwen's Image Editing Plus model also known as Qwen-Image-Edit-2509. Has superior text editing capabilities and multi-image support. image-editing image-to-image high-quality-text	OK	2025/9/22	→
Kling Video image-to-video	Kling 2.5 Turbo Pro: Top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision. stylized transform	OK	2025/9/22	→
Kling v2.5 Text to Video text-to-video	Kling 2.5 Turbo Pro: Top-tier text-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision. animation stylized	OK	2025/9/22	→
Infinitalk video-to-video	Infinitalk model generates a talking avatar video from an image and audio file. The avatar lip-syncs to the provided audio with natural facial expressions. video-to-video	OK	2025/9/22	→
SeedVR2 video-to-video	Upscale your videos using SeedVR2 with temporal consistency! upscale video-to-video	OK	2025/9/22	→
SeedVR2 image-to-image	Use SeedVR2 to upscale your images upscale image-to-image	OK	2025/9/22	→
Wan VACE Video Edit video-to-video	Edit videos using plain language and Wan VACE video-edit wan-vace	OK	2025/9/22	→
Wan-2.2 Animate Replace video-to-video	Wan-Animate Replace is a model that can integrate animated characters into reference videos, replacing the original character while preserving the scene’s lighting and color tone for seamless environmental integration. video to video motion	OK	2025/9/21	→
Wan-2.2 Animate Move video-to-video	Wan-Animate is a video model that generates high-fidelity character videos by replicating the expressions and movements of characters from reference videos. video to video motion	OK	2025/9/21	→
Fabric 1.0 image-to-video	VEED Fabric 1.0 is an image-to-video API that turns any image into a talking video lipsync avatar	OK	2025/9/19	→
Product Holding image-to-image	Place products naturally in a person’s hands for realistic marketing visuals. product marketing	OK	2025/9/19	→
Product Photography image-to-image	Generate professional product photography with realistic lighting and backgrounds. product marketing	OK	2025/9/19	→
Lucy Edit [Pro] video-to-video	Edit outfits, objects, faces, or restyle your video - all with maximum detail retention. video-edit	OK	2025/9/18	→
Lucy Edit [Dev] video-to-video	Edit outfits, objects, faces, or restyle your video - all with maximum detail retention. video-edit	Deprecated	2025/9/18	→
Virtual Try-on image-to-image	Try on clothes virtually by combining person and clothing images. fashion try-on virtual-try-on	OK	2025/9/18	→
Texture Transform image-to-image	Transform objects with different surface textures like marble, wood, or fabric. texture-transform	OK	2025/9/18	→
Relighting image-to-image	Adjust and enhance images with different lighting styles. relighting	OK	2025/9/18	→
Style Transfer image-to-image	Apply artistic styles like impressionism, cubism, or surrealism to your images. style-transfer	OK	2025/9/18	→
Photo Restoration image-to-image	Restore old or damaged photos by fixing colors, scratches, and resolution. photo-restoration image-enhance	OK	2025/9/18	→
Portrait Enhance image-to-image	Enhance and refine portrait photos with improved clarity and detail. image-edit enhancement	OK	2025/9/18	→
Photography Effects image-to-image	Apply diverse photography styles and effects to transform your images. style-transfer photography	OK	2025/9/18	→
Perspective Change image-to-image	Easily adjust the perspective of any image to different angles. change-angle perspective	OK	2025/9/18	→
Object Removal image-to-image	Remove unwanted objects seamlessly from any image. remove object-removal	OK	2025/9/18	→
Headshot Generator image-to-image	Generate professional headshot photos with customizable backgrounds. headshot profile-photo	OK	2025/9/18	→
Hair Change image-to-image	Change hairstyles and hair colors in photos realistically. hair-edit style-change	OK	2025/9/18	→
Expression Change image-to-image	Change facial expressions in photos with realistic results. face-edit expression-change	OK	2025/9/18	→
City Teleport image-to-image	Place a person’s photo into iconic cities worldwide. city-teleport backgroundswap	OK	2025/9/18	→
Age Modify image-to-image	Modify a face to look younger or older while keeping identity realistic. age-transformation face-editing	OK	2025/9/18	→
Makeup Changer image-to-image	Apply realistic makeup styles with adjustable intensity. makeup transform	OK	2025/9/18	→
Qwen Image Edit image-to-image	Inpainting Endpoint for the Qwen Edit Image editing model. image-to-image inpainting qwen-image	OK	2025/9/17	→
Wan 2.2 VACE Fun A14B video-to-video	VACE Fun for Wan 2.2 A14B from Alibaba-PAI	OK	2025/9/17	→
Wan 2.2 VACE Fun A14B video-to-video	VACE Fun for Wan 2.2 A14B from Alibaba-PAI	OK	2025/9/17	→
Wan 2.2 VACE Fun A14B video-to-video	VACE Fun for Wan 2.2 A14B from Alibaba-PAI	OK	2025/9/17	→
Wan 2.2 VACE Fun A14B video-to-video	VACE Fun for Wan 2.2 A14B from Alibaba-PAI	OK	2025/9/17	→
Wan 2.2 VACE Fun A14B video-to-video	VACE Fun for Wan 2.2 A14B from Alibaba-PAI	Deprecated	2025/9/17	→
Isaac 0.1 [OpenAI Compatible Endpoint] vision	OpenAI spec compatible endpoint of Isaac-01 which is a multimodal vision-language model from Perceptron for various vision language tasks. multimodal vision	OK	2025/9/17	→
Isaac 0.1 vision	Isaac-01 is a multimodal vision-language model from Perceptron for various vision language tasks. multimodal vision	OK	2025/9/17	→
FLUX.1 SRPO [dev] image-to-image	FLUX.1 SRPO [dev] is a 12 billion parameter flow transformer that generates high-quality images from text with incredible aesthetics. It is suitable for personal and commercial use.	OK	2025/9/15	→
FLUX.1 SRPO [dev] text-to-image	FLUX.1 SRPO [dev] is a 12 billion parameter flow transformer that generates high-quality images from text with incredible aesthetics. It is suitable for personal and commercial use.	OK	2025/9/15	→
FLUX.1 SRPO [dev] image-to-image	FLUX.1 SRPO [dev] is a 12 billion parameter flow transformer that generates high-quality images from text with incredible aesthetics. It is suitable for personal and commercial use.	OK	2025/9/15	→
FLUX.1 SRPO [dev] text-to-image	FLUX.1 SRPO [dev] is a 12 billion parameter flow transformer that generates high-quality images from text with incredible aesthetics. It is suitable for personal and commercial use.	OK	2025/9/15	→
Pshuman image-to-3d	Use the 6D pose estimation capabilities of PSHuman to generate 3D files from a single image. image-to-3D	Deprecated	2025/9/13	→
Kling TTS text-to-speech	Generate speech from text prompts and different voices using the Kling TTS model, which leverages advanced AI techniques to create high-quality text-to-speech. audio	OK	2025/9/13	→
Kling AI Avatar image-to-video	Kling AI Avatar Standard: Endpoint for creating avatar videos with realistic humans, animals, cartoons, or stylized characters stylized transform	OK	2025/9/13	→
Kling AI Avatar Pro image-to-video	Kling AI Avatar Pro: The premium endpoint for creating avatar videos with realistic humans, animals, cartoons, or stylized characters stylized transform	OK	2025/9/13	→
MiniMax (Hailuo AI) Music v1.5 text-to-audio	Generate music from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality, diverse musical compositions. music	OK	2025/9/11	→
Decart Lucy 14b image-to-video	Lucy-14B delivers lightning fast performance that redefines what's possible with image-to-video AI	Deprecated	2025/9/10	→
Qwen Image Edit Lora image-to-image	LoRA inference endpoint for the Qwen Image Editing model. image-to-image image-editing lora	OK	2025/9/10	→
Stable Audio 2.5 audio-to-audio	Generate high quality music and sound effects using Stable Audio 2.5 from StabilityAI audio	OK	2025/9/10	→
Stable Audio 2.5 text-to-audio	Generate high quality music and sound effects using Stable Audio 2.5 from StabilityAI audio	OK	2025/9/10	→
Stable Audio 25 audio-to-audio	Generate high quality music and sound effects using Stable Audio 2.5 from StabilityAI audio	OK	2025/9/10	→
Hunyuan Image text-to-image	Use the capabilities of Hunyuan Image 2.1 to generate images that express the feelings of your text. text-to-image	OK	2025/9/9	→
Elevenlabs text-to-audio	Generate realistic audio dialogues using Eleven-v3 from ElevenLabs. audio	OK	2025/9/9	→
Vidu image-to-image	Vidu Reference-to-Image creates images by combining reference images with a prompt. images-to-image	OK	2025/9/9	→
Bytedance Seedream V4 Edit image-to-image	A new-generation image creation model from ByteDance, Seedream 4.0 integrates image generation and image editing capabilities into a single, unified architecture. stylized transform editing	OK	2025/9/9	→
Bytedance Seedream V4 Text To Image text-to-image	A new-generation image creation model from ByteDance, Seedream 4.0 integrates image generation and image editing capabilities into a single, unified architecture. stylized transform	OK	2025/9/9	→
Hunyuan Video Foley video-to-video	Use the capabilities of the Hunyuan Foley model to bring your videos to life by adding sound effects to them. video-to-video add-sound	OK	2025/9/8	→
Chatterbox text-to-speech	Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. Use the first TTS from Resemble AI. text-to-speech multilingual	OK	2025/9/4	→
Wan image-to-image	Wan 2.2's 14B model edits high-resolution, photorealistic images with powerful prompt understanding and fine-grained visual detail image-to-image	OK	2025/9/3	→
Elevenlabs Sound Effects V2 text-to-audio	Generate sound effects using ElevenLabs advanced sound effects model. sound	OK	2025/9/2	→
Sync Lipsync video-to-video	Generate high-quality realistic lipsync animations from audio while preserving unique details like natural teeth and unique facial features using the state-of-the-art Sync Lipsync 2 Pro model. animation lip sync high-quality	OK	2025/9/2	→
Bytedance image-to-video	Seedance lite reference-to-video allows the use of 1 to 4 images as reference to create a high-quality video. reference-to-video image-to-video	Deprecated	2025/9/1	→
Avatars Text to Video text-to-video	High-quality avatar videos that feel real, generated from your text	OK	2025/9/1	→
Avatars Audio to Video audio-to-video	High-quality avatar videos that feel real, generated from your audio	OK	2025/9/1	→
Uso image-to-image	Use USO to perform subject-driven generation using a reference image. image-to-image	OK	2025/8/30	→
Wan Ati image-to-video	WAN-ATI is a controllable video generation model that uses trajectory instructions to guide object, local, and camera motion, enabling precise and flexible image-to-video creation.	Deprecated	2025/8/29	→
Decart image-to-video	Lucy-5B is a model that can create 5-second I2V videos in under 5 seconds, achieving >1x RTF end-to-end	OK	2025/8/28	→
Wan 2.2 Fun Control video-to-video	Generate pose or depth controlled video using Alibaba-PAI's Wan 2.2 Fun wan pose depth	Deprecated	2025/8/28	→
VibeVoice 7B text-to-speech	Generate long, expressive multi-voice speech using Microsoft's powerful TTS text-to-speech multi-speaker podcast	OK	2025/8/27	→
VibeVoice 1.5B text-to-speech	Generate long, expressive multi-voice speech using Microsoft's powerful TTS text-to-speech multi-speaker podcast	OK	2025/8/27	→
Wan-2.2 Speech-to-Video 14B audio-to-video	Wan-S2V is a video model that generates high-quality videos from static images and audio, with realistic facial expressions, body movements, and professional camera work for film and television applications audio-to-video talking-head	OK	2025/8/27	→
Video video-to-video	Professional-grade video upscaler with strong temporal consistency, enhancing videos up to 8K resolution. Trained on fully licensed and commercially safe data - risk-free for production and enterprise use. video-upscaling upscale	OK	2025/8/26	→
Gemini 2.5 Flash Image image-to-image	Google's famous original image generation and editing model, a.k.a. Nano Banana image-editing	OK	2025/8/26	→
Gemini 2.5 Flash Image text-to-image	Google's famous original image generation and editing model, a.k.a. Nano Banana text-to-image	OK	2025/8/26	→
Qwen Image image-to-image	Qwen-Image (Image-to-Image) transforms and edits input images with high fidelity, enabling precise style transfer, enhancement, and creative modification. image-to-image	OK	2025/8/25	→
Sonauto V2 audio-to-audio	Extend an existing song music text-to-music text-to-audio	Deprecated	2025/8/23	→
Sonauto V2 text-to-audio	Replace sections of an existing audio with newly generated content music text-to-music text-to-audio	Deprecated	2025/8/23	→
Sonauto V2 text-to-audio	Create full songs in any style music text-to-music text-to-audio	Deprecated	2025/8/23	→
PixVerse V5 Transition image-to-video	Create seamless transitions between images using PixVerse v5 stylized transform	OK	2025/8/23	→
PixVerse V5 Effects image-to-video	Generate high quality video clips with different effects using PixVerse v5 image-to-video	OK	2025/8/23	→
PixVerse V5 Image To Video image-to-video	Generate high quality video clips from text and image prompts using PixVerse v5 stylized transform	OK	2025/8/23	→
PixVerse V5 Text To Video text-to-video	Generate high quality video clips from text and image prompts using PixVerse v5	OK	2025/8/23	→
Infinitalk text-to-video	Infinitalk model generates a talking avatar video from a text and audio file. The avatar lip-syncs to the provided audio with natural facial expressions.	OK	2025/8/22	→
Infinitalk video-to-video	Infinitalk model generates a talking avatar video from an image and audio file. The avatar lip-syncs to the provided audio with natural facial expressions. stylized transform	OK	2025/8/21	→
Elevenlabs Tts Eleven V3 text-to-audio	Generate text-to-speech audio using Eleven-v3 from ElevenLabs. audio	OK	2025/8/20	→
Reimagine image-to-image	Reimagine uses a structure reference for generating new images while preserving the structure of an input image, guided by text prompts. Perfect for transforming sketches, illustrations, or photos into new illustrations. Trained exclusively on licensed data bria	Deprecated	2025/8/20	→
Nano Banana image-to-image	Google's famous original image generation and editing model image-editing	OK	2025/8/19	→
Nano Banana text-to-image	Google's famous original image generation and editing model image-generation	OK	2025/8/19	→
Nextstep 1 image-to-image	Endpoint for NextStep-1 Autoregressive Image Editing model.	Deprecated	2025/8/19	→
Qwen Image Edit image-to-image	Endpoint for Qwen's Image Editing model. Has superior text editing capabilities. image-editing image-to-image high-quality-text	OK	2025/8/18	→
Mirelo SFX video-to-audio	Generate synced sounds for any video, and return the new sound track (like MMAudio) sfx	OK	2025/8/15	→
Mirelo SFX video-to-video	Generate synced sounds for any video, and return it with its new sound track (like MMAudio) video-to-video sfx	OK	2025/8/14	→
Stable Avatar audio-to-video	Stable Avatar generates audio-driven video avatars up to five minutes long stable-avatar talking-head audio-to-video	Deprecated	2025/8/14	→
Marey Realism V1.5 video-to-video	Ideal for matching human movement. Your input video determines human poses, gestures, and body movements that will appear in the generated video.	OK	2025/8/14	→
Marey Realism V1.5 video-to-video	Pull motion from a reference video and apply it to new subjects or scenes.	OK	2025/8/14	→
Marey Realism V1.5 image-to-video	Generate a video starting from an image as the first frame with Marey, a generative video model trained exclusively on fully licensed data.	OK	2025/8/14	→
Qwen Image Trainer training	Qwen Image LoRA training lora personalization	Deprecated	2025/8/14	→
Marey Realism V1.5 text-to-video	Generate a video from a text prompt with Marey, a generative video model trained exclusively on fully licensed data.	OK	2025/8/14	→
EchoMimic V3 audio-to-video	EchoMimic V3 generates a talking avatar model from a picture, audio and text prompt. echomimic talking-head audio-to-video	OK	2025/8/13	→
Any LLM llm	Use any large language model from our selected catalogue (powered by OpenRouter) chat claude gpt streaming	Deprecated	2025/8/13	→
Luma Dream Machine image-to-video	Generate video clips from your images using Luma Dream Machine v1.5 motion transformation	Deprecated	2025/8/13	→
Any VLM vision	Use any vision language model from our selected catalogue (powered by OpenRouter) multimodal vision streaming	Deprecated	2025/8/13	→
Luma Dream Machine text-to-video	Generate video clips from your prompts using Luma Dream Machine v1.5 motion transformation	Deprecated	2025/8/13	→
PlayAI Text-to-Speech Dialog text-to-audio	Generate natural-sounding multi-speaker dialogues and audio. Perfect for expressive outputs, storytelling, games, animations, and interactive media. audio	Deprecated	2025/8/13	→
PlayAI Text-to-Speech v3 text-to-speech	Blazing-fast text-to-speech. Generate audio with improved emotional tones and extensive multilingual support. Ideal for high-volume processing and efficient workflows.	Deprecated	2025/8/13	→
FLUX.1 [pro] Canny Fine-tuned image-to-image	Utilize Flux.1 [pro] Controlnet with a fine-tuned LoRA to generate high-quality images with precise control over composition, style, and structure through advanced edge detection and guidance mechanisms. controlnet detection editing composition	Deprecated	2025/8/13	→
FLUX.1 [pro] Depth image-to-image	Generate high-quality images from depth maps using Flux.1 [pro] depth estimation model. The model produces accurate depth representations for scene understanding and 3D visualization. depth utility composition	Deprecated	2025/8/13	→
Train Flux LoRAs For Pro Models training	FLUX LoRA for Pro endpoints. lora personalization	Deprecated	2025/8/13	→
FLUX.1 [pro] Depth Fine-tuned image-to-image	Generate high-quality images from depth maps using Flux.1 [pro] depth estimation model with a fine-tuned LoRA. The model produces accurate depth representations for scene understanding and 3D visualization. depth utility composition	Deprecated	2025/8/13	→
FLUX.1 [pro] Canny image-to-image	Utilize Flux.1 [pro] Controlnet to generate high-quality images with precise control over composition, style, and structure through advanced edge detection and guidance mechanisms. controlnet detection editing composition	Deprecated	2025/8/13	→
ElevenLabs Sound Effects text-to-audio	Generate sound effects using ElevenLabs advanced sound effects model. sound	Deprecated	2025/8/13	→
Easel AI Advanced Face Swap image-to-image	Swap faces of one or two people at once, while preserving user and scene details! face swap utility editing	Deprecated	2025/8/13	→
Tavus LipSync v2 video-to-video	Generate lip sync using Tavus' state-of-the-art model for high-quality synchronization.	Deprecated	2025/8/13	→
gpt-image-1 image-to-image	OpenAI's latest image generation and editing model: gpt-image-1. Currently powered with bring-your-own-key.	Deprecated	2025/8/13	→
gpt-image-1 text-to-image	OpenAI's latest image generation and editing model: gpt-image-1. Currently powered with bring-your-own-key.	Deprecated	2025/8/13	→
Easel Avatar text-to-image	Create scenes with one or two people using just selfies and text prompt (without LoRAs) avatars loras image-generation	Deprecated	2025/8/13	→
Easel Gifswap image-to-image	Swap faces on GIFs utility editing	Deprecated	2025/8/13	→
PlayAI Inpaint audio-to-audio	A novel way to perform audio editing, ensuring smooth transitions and consistent speaker characteristics for edits. audio inpaint	Deprecated	2025/8/13	→
Lipsync video-to-video	Realistic lipsync video - optimized for speed, quality, and consistency.	Deprecated	2025/8/13	→
any-llm Enterprise llm	Run any large language model with fal, powered by OpenRouter. This endpoint only supports models that do not train on private data. Read more in OpenRouter's Privacy and Logging documentation. chat claude gpt	Deprecated	2025/8/13	→
Fashion Photoshoot image-to-image	Instant fashion photoshoot with a selfie and an outfit image-to-image	Deprecated	2025/8/13	→
Fashion Try On image-to-image	Instant fashion try on with a full-body pic and an outfit	Deprecated	2025/8/13	→
Bytedance Video Stylize image-to-video	Transform your images into stylized videos using this workflow. image-to-video effects	Deprecated	2025/8/12	→
Ffmpeg Api video-to-video	Use ffmpeg capabilities to merge 2 or more videos.	OK	2025/8/12	→
Minimax text-to-speech	Generate speech from text prompts and different voices using the MiniMax Speech-02 HD model, which leverages advanced AI techniques to create high-quality text-to-speech. speech	OK	2025/8/11	→
Minimax text-to-speech	Generate fast speech from text prompts and different voices using the MiniMax Speech-02 Turbo model, which leverages advanced AI techniques to create high-quality text-to-speech. text-to-speech	OK	2025/8/11	→
Wan 2.2 14B Image Trainer training	Wan 2.2 text to image LoRA trainer. Fine-tune Wan 2.2 for subjects and styles with unprecedented detail. lora personalization	OK	2025/8/11	→
Ideogram V3 Character Edit image-to-image	Modify consistent characters while preserving their core identity. Edit poses, expressions, or clothing without losing recognizable character features character-consistency	OK	2025/8/7	→
Ideogram V3 Character image-to-image	Generate consistent character appearances across multiple images. Maintain facial features, proportions, and distinctive traits for cohesive storytelling and branding character-consistency	OK	2025/8/7	→
Ideogram V3 Character Remix image-to-image	Transform your consistent character into different art styles, settings, or scenarios while maintaining their distinctive appearance and identity character-consistency	OK	2025/8/7	→
Wan-2.2 Text-to-Video A14B with LoRAs text-to-video	Wan-2.2 text-to-video is a video model that generates high-quality videos with high visual quality and motion diversity from text prompts. This endpoint supports LoRAs made for Wan 2.2.	OK	2025/8/7	→
Wan v2.2 A14B Image-to-Video A14B with LoRAs image-to-video	Wan-2.2 image-to-video is a video model that generates high-quality videos with high visual quality and motion diversity from text prompts and images. This endpoint supports LoRAs made for Wan 2.2. image-to-video motion lora	OK	2025/8/7	→
Wan text-to-video	Wan 2.2's 5B distill model produces up to 5 seconds of 720p video at 24 FPS with fluid motion and powerful prompt understanding	OK	2025/8/6	→
Minimax image-to-video	Create blazing fast and economical videos with MiniMax Hailuo-02 Image To Video API at 512p resolution stylized transform	OK	2025/8/6	→
Bytedance Dreamina V3.1 Text To Image text-to-image	Dreamina showcases superior picture effects, with significant improvements in picture aesthetics, precise and diverse styles, and rich details. text-to-image	OK	2025/8/6	→
Wan text-to-video	Wan 2.2's 5B FastVideo model produces up to 5 seconds of 720p video at 24 FPS with fluid motion and powerful prompt understanding text to video motion	OK	2025/8/5	→
Wan v2.2 A14B Text-to-Image A14B with LoRAs text-to-image	Wan 2.2's 14B model with LoRA support generates high-fidelity images with enhanced prompt alignment and style adaptability.	OK	2025/8/5	→
Wan text-to-image	Wan 2.2's 5B model generates high-resolution, photorealistic images with powerful prompt understanding and fine-grained visual detail	OK	2025/8/5	→
Wan text-to-image	Wan 2.2's 14B model generates high-resolution, photorealistic images with powerful prompt understanding and fine-grained visual detail	OK	2025/8/5	→
Qwen Image text-to-image	Qwen-Image is an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering and precise image editing. text-to-image	OK	2025/8/4	→
Wan video-to-video	Wan-2.2 video-to-video is a video model that generates high-quality videos with high visual quality and motion diversity from text prompts and source videos.	OK	2025/8/2	→
Train Flux Krea LoRA training	Train styles, people and other subjects at blazing speeds using the FLUX.1 Krea [dev] base model. lora personalization	Deprecated	2025/8/1	→
Flux Krea Lora text-to-image	Super fast endpoint for the FLUX.1 [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs. lora personalization	OK	2025/8/1	→
FLUX.1 Krea [dev] Inpainting with LoRAs image-to-image	Super fast endpoint for the FLUX.1 [dev] inpainting model with LoRA support, enabling rapid and high-quality image inpainting using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs. lora personalization	OK	2025/8/1	→
FLUX.1 Krea [dev] with LoRAs text-to-image	Super fast endpoint for the FLUX.1 [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs. lora personalization	OK	2025/8/1	→
FLUX.1 Krea [dev] with LoRAs image-to-image	FLUX LoRA Image-to-Image is a high-performance endpoint that transforms existing images using FLUX models, leveraging LoRA adaptations to enable rapid and precise image style transfer, modifications, and artistic variations. lora style transfer	OK	2025/8/1	→
Veo3 image-to-video	Veo 3 is the latest state-of-the-art video generation model from Google DeepMind	Deprecated	2025/8/1	→
Wan image-to-video	Wan-2.2 Turbo image-to-video is a video model that generates high-quality videos with high visual quality and motion diversity from text prompts.	OK	2025/7/31	→
Wan text-to-video	Wan-2.2 turbo text-to-video is a video model that generates high-quality videos with high visual quality and motion diversity from text prompts. text to video motion	OK	2025/7/31	→
FLUX.1 Krea [dev] image-to-image	FLUX.1 Krea [dev] is a 12 billion parameter flow transformer that generates high-quality images from text with incredible aesthetics. It is suitable for personal and commercial use.	OK	2025/7/30	→
FLUX.1 Krea [dev] Redux image-to-image	FLUX.1 Krea [dev] Redux is a high-performance endpoint for the FLUX.1 Krea [dev] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.	OK	2025/7/30	→
FLUX.1 Krea [dev] text-to-image	FLUX.1 Krea [dev] is a 12 billion parameter flow transformer that generates high-quality images from text with incredible aesthetics. It is suitable for personal and commercial use.	OK	2025/7/30	→
FLUX.1 Krea [dev] image-to-image	FLUX.1 Krea [dev] is a 12 billion parameter flow transformer that generates high-quality images from text with incredible aesthetics. It is suitable for personal and commercial use.	OK	2025/7/30	→
FLUX.1 Krea [dev] Redux image-to-image	FLUX.1 Krea [dev] Redux is a high-performance endpoint for the FLUX.1 Krea [dev] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.	OK	2025/7/30	→
FLUX.1 Krea [dev] text-to-image	FLUX.1 Krea [dev] is a 12 billion parameter flow transformer that generates high-quality images from text with incredible aesthetics. It is suitable for personal and commercial use.	OK	2025/7/30	→
Wan v2.2 5B image-to-video	Wan 2.2's 5B model produces up to 5 seconds of 720p video at 24 FPS with fluid motion and powerful prompt understanding	OK	2025/7/30	→
Flux Kontext Lora image-to-image	Fast inpainting endpoint for the FLUX.1 Kontext [dev] model with LoRA support, enabling rapid and high-quality image inpainting with reference images, while using pre-trained LoRA adaptations for specific styles, brand identities, and product-specific outputs. image-editing image-inpainting image-to-image	OK	2025/7/29	→
Wan v2.2 5B text-to-video	Wan 2.2's 5B model produces up to 5 seconds of 720p video at 24 FPS with fluid motion and powerful prompt understanding	OK	2025/7/28	→
Wan-2.2 Text-to-Video A14B text-to-video	Wan-2.2 text-to-video is a video model that generates high-quality videos with high visual quality and motion diversity from text prompts. text to video motion	OK	2025/7/28	→
Wan v2.2 A14B image-to-video	fal-ai/wan/v2.2-A14B/image-to-video	OK	2025/7/28	→
Hunyuan World image-to-image	Hunyuan World 1.0 turns a single image into a panorama or a 3D world. It creates realistic scenes from the image, allowing you to explore and view it from different angles.	OK	2025/7/28	→
Hunyuan World image-to-3d	Hunyuan World 1.0 turns a single image into a panorama or a 3D world. It creates realistic scenes from the image, allowing you to explore and view it from different angles.	OK	2025/7/28	→
NSFW Checker vision	Predict whether an image is NSFW or SFW. filter safety utility	OK	2025/7/28	→
OmniHuman image-to-video	OmniHuman generates video using an image of a human figure paired with an audio file. It produces vivid, high-quality videos where the character’s emotions and movements maintain a strong correlation with the audio. image-to-video lipsync	OK	2025/7/27	→
Sky Raccoon text-to-image	Generate images from a text prompt. text-to-image	Deprecated	2025/7/26	→
Image Editing Retouch image-to-image	Retouch photos of faces. Remove blemishes and improve the skin.	OK	2025/7/24	→
Hidream E1 1 image-to-image	Edit images with natural language	Deprecated	2025/7/23	→
LTX-Video 13B 0.9.8 Distilled video-to-video	Extend videos using LTX Video-0.9.8 13B Distilled and custom LoRA ltx-video extend	OK	2025/7/23	→
RIFE video-to-video	Interpolate videos with RIFE - Real-Time Intermediate Flow Estimation interpolation	OK	2025/7/22	→
RIFE image-to-image	Interpolate images with RIFE - Real-Time Intermediate Flow Estimation interpolation	OK	2025/7/22	→
FILM video-to-video	Interpolate videos with FILM - Frame Interpolation for Large Motion interpolation	OK	2025/7/22	→
FILM image-to-image	Interpolate images with FILM - Frame Interpolation for Large Motion interpolation	OK	2025/7/22	→
MiniMax Voice Design text-to-speech	Design a personalized voice from a text description, and generate speech from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality text-to-speech. speech	OK	2025/7/18	→
Luma Ray 2 Flash Modify video-to-video	Ray2 Flash Modify is a video generative model capable of restyling or retexturing the entire shot, from turning live-action into CG or stylized animation, to changing wardrobe, props, or the overall aesthetic, to swapping environments or time periods, giving you control over background, location, or even weather. modify restyle	OK	2025/7/17	→
LTX-Video 13B 0.9.8 Distilled image-to-video	Generate long videos from prompts and images using LTX Video-0.9.8 13B Distilled and custom LoRA video ltx-video image-to-video	OK	2025/7/17	→
LTX-Video 13B 0.9.8 Distilled text-to-video	Generate long videos from prompts using LTX Video-0.9.8 13B Distilled and custom LoRA video ltx-video text-to-video	OK	2025/7/17	→
LTX-Video 13B 0.9.8 Distilled video-to-video	Generate long videos from prompts, images, and videos using LTX Video-0.9.8 13B Distilled and custom LoRA video ltx-video video-to-video multicondition-to-video image-to-video	OK	2025/7/17	→
Calligrapher image-to-image	Use Calligrapher's text- and font-retaining capabilities to modify text on your books, clothes, and more. image-to-image	Deprecated	2025/7/12	→
Veo 3 Fast [Image to Video] image-to-video	Now with a 50% price drop. Generate videos from your image prompts using Veo 3 fast.	Deprecated	2025/7/9	→
Veo 3 Fast text-to-video	Faster and more cost effective version of Google's Veo 3!	Deprecated	2025/7/9	→
Ffmpeg Api json	Get EBU R128 loudness normalization from audio files using FFmpeg API. ffmpeg	OK	2025/7/8	→
Vidu image-to-video	Generate video clips from your multiple image references using Vidu Q1 stylized transform	OK	2025/7/8	→
Bria image-to-image	Structure Reference allows generating new images while preserving the structure of an input image, guided by text prompts. Perfect for transforming sketches, illustrations, or photos into new illustrations. Trained exclusively on licensed data for safe and risk-free commercial use.	OK	2025/7/8	→
PixVerse Sound Effects video-to-video	Add immersive sound effects and background music to your videos using PixVerse sound effects generation audio utility	OK	2025/7/7	→
Image Editing Realism image-to-image	Add details to faces, enhance face features, remove blur. stylized transform realism	OK	2025/7/7	→
ThinkSound video-to-video	Generate realistic audio from a video with an optional text prompt audio-generation video-to-audio	OK	2025/7/2	→
ThinkSound video-to-video	Generate realistic audio for a video with an optional text prompt and combine it with the video audio-generation video-to-audio	OK	2025/7/1	→
Post Processing Vignette image-to-image	Add a darkening vignette effect around the edges of the image with adjustable strength stylized transform	OK	2025/7/1	→
Post Processing Solarize image-to-image	Apply solarization effect by inverting pixel values above a threshold stylized transform	OK	2025/7/1	→
Post Processing Sharpen image-to-image	Apply sharpening effects with three modes: basic unsharp mask, smart sharpening with edge preservation, and Contrast Adaptive Sharpening (CAS). stylized transform	OK	2025/7/1	→
Post Processing Parabolize image-to-image	Apply a parabolic distortion effect with configurable coefficient and vertex position. stylized transform	OK	2025/7/1	→
Post Processing Grain image-to-image	Apply film grain effect with different styles (modern, analog, kodak, fuji, cinematic, newspaper) and customizable intensity and scale stylized transform	OK	2025/7/1	→
Post Processing Dodge Burn image-to-image	Apply dodge and burn effects with multiple modes and adjustable intensity. stylized transform	OK	2025/7/1	→
Post Processing Dissolve image-to-image	Blend two images together using smooth linear interpolation with a configurable blend factor. stylized transform	OK	2025/7/1	→
Post Processing Desaturate image-to-image	Reduce color saturation using different methods (luminance Rec.709, luminance Rec.601, average, lightness) with adjustable factor. stylized transform	OK	2025/7/1	→
Post Processing Color Tint image-to-image	Apply various color tints (sepia, red, green, blue, cyan, magenta, yellow, purple, orange, warm, cool, lime, navy, vintage, rose, teal, maroon, peach, lavender, olive) with adjustable strength. stylized transform	OK	2025/7/1	→
Post Processing Color Correction image-to-image	Adjust color temperature, brightness, contrast, saturation, and gamma values for color correction. stylized transform	OK	2025/7/1	→
Post Processing Chromatic Aberration image-to-image	Create chromatic aberration by shifting red, green, and blue channels horizontally or vertically with customizable shift amounts. stylized transform	OK	2025/7/1	→
Post Processing Blur image-to-image	Apply Gaussian or Kuwahara blur effects with adjustable radius and sigma parameters stylized transform	OK	2025/7/1	→
PixVerse Extend Fast video-to-video	The PixVerse Extend model extends your videos using high-quality video extension techniques utility editing	OK	2025/6/30	→
PixVerse Extend video-to-video	The PixVerse Extend model extends your videos using high-quality video extension techniques utility editing	OK	2025/6/30	→
PixVerse Lipsync video-to-video	Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization with PixVerse Lipsync model animation lip sync	OK	2025/6/30	→
Image Editing Youtube Thumbnails image-to-image	Generate YouTube thumbnails with custom text stylized transform	OK	2025/6/30	→
Video video-to-video	Automatically remove backgrounds from videos, perfect for creating clean, professional content without a green screen. background-removal	OK	2025/6/30	→
Luma Ray 2 Modify video-to-video	Ray2 Modify is a video generative model capable of restyling or retexturing the entire shot, from turning live-action into CG or stylized animation, to changing wardrobe, props, or the overall aesthetic, to swapping environments or time periods, giving you control over background, location, or even weather. modify restyle	OK	2025/6/28	→
Topaz image-to-image	Use the powerful and accurate topaz image enhancer to enhance your images. image-to-image	OK	2025/6/27	→
Bytedance image-to-image	SeedEdit 3.0 is an image editing model independently developed by ByteDance. It excels at accurately following editing instructions and effectively preserving image content, especially when handling real images. image-editing image-to-image	Deprecated	2025/6/27	→
Flux Kontext Trainer training	LoRA trainer for FLUX.1 Kontext [dev]	OK	2025/6/26	→
Image Editing Broccoli Haircut image-to-image	Transform your character's hair into broccoli style while keeping the original character's likeness stylized transform	OK	2025/6/26	→
Image Editing Wojak Style image-to-image	Transform your photos into wojak style while keeping the original character's likeness stylized transform	OK	2025/6/26	→
Image Editing Plushie Style image-to-image	Transform your photos into cool plushies while keeping the original character's likeness stylized transform	OK	2025/6/26	→
Flux Kontext Lora text-to-image	Super fast text-to-image endpoint for the FLUX.1 Kontext [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs. text-to-image	OK	2025/6/25	→
Flux Kontext Lora image-to-image	Fast endpoint for the FLUX.1 Kontext [dev] model with LoRA support, enabling rapid and high-quality image editing using pre-trained LoRA adaptations for specific styles, brand identities, and product-specific outputs. image-editing image-to-image	OK	2025/6/25	→
Omnigen V2 text-to-image	OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It can be used for various tasks such as Image Editing, Personalized Image Generation, Virtual Try-On, Multi Person Generation and more! multimodal editing try-on	OK	2025/6/25	→
FASHN Virtual Try-On V1.6 image-to-image	FASHN v1.6 delivers precise virtual try-on capabilities, accurately rendering garment details like text and patterns at 864x1296 resolution from both on-model and flat-lay photo references. try-on fashion clothing	OK	2025/6/24	→
AI Avatar Single Text image-to-video	MultiTalk model generates a talking avatar video from an image and text. Converts text to speech automatically, then generates the avatar speaking with lip-sync. stylized transform	OK	2025/6/23	→
Ai Avatar image-to-video	MultiTalk model generates a talking avatar video from an image and audio file. The avatar lip-syncs to the provided audio with natural facial expressions. stylized transform	Deprecated	2025/6/23	→
AI Avatar Multi Text image-to-video	MultiTalk model generates a multi-person conversation video from an image and text inputs. Converts text to speech for each person, generating a realistic conversation scene. stylized transform	OK	2025/6/23	→
AI Avatar Multi image-to-video	MultiTalk model generates a multi-person conversation video from an image and audio files. Creates a realistic scene where multiple people speak in sequence. stylized transform	OK	2025/6/23	→
Video Understanding vision	A video understanding model to analyze video content and answer questions about what's happening in the video based on user prompts. utility vision	OK	2025/6/20	→
Wan VACE 14B video-to-video	VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources. reframe	OK	2025/6/18	→
Wan VACE 14B video-to-video	VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources. image-to-video video-to-video text-to-video	OK	2025/6/18	→
Wan VACE 14B video-to-video	VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources. image-to-video video-to-video text-to-video	OK	2025/6/18	→
Wan VACE 14B video-to-video	VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources. image-to-video video-to-video text-to-video	OK	2025/6/18	→
Wan VACE 14B video-to-video	VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources. image-to-video video-to-video text-to-video	OK	2025/6/18	→
Chain Of Zoom image-to-image	Extreme Super-Resolution via Scale Autoregression and Preference Alignment	Deprecated	2025/6/18	→
Tripo3D image-to-3d	State of the art Multiview to 3D Object generation. Generate 3D models from multiple images! stylized multiview	OK	2025/6/18	→
MiniMax Hailuo 02 [Standard] (Image to Video) image-to-video	MiniMax Hailuo-02 Image To Video API (Standard, 768p, 512p): Advanced image-to-video generation model with 768p and 512p resolutions	OK	2025/6/18	→
MiniMax Hailuo 02 [Pro] (Image to Video) image-to-video	MiniMax Hailuo-02 Image To Video API (Pro, 1080p): Advanced image-to-video generation model with 1080p resolution	OK	2025/6/18	→
MiniMax Hailuo 02 [Pro] (Text to Video) text-to-video	MiniMax Hailuo-02 Text To Video API (Pro, 1080p): Advanced video generation model with 1080p resolution	OK	2025/6/18	→
MiniMax Hailuo 02 [Standard] (Text to Video) text-to-video	MiniMax Hailuo-02 Text To Video API (Standard, 768p): Advanced video generation model with 768p resolution	OK	2025/6/18	→
PASD image-to-image	Pixel-Aware Diffusion Model for Realistic Image Super-Resolution and Personalized Stylization utility editing	OK	2025/6/17	→
Bria 3.2 Text-to-Image text-to-image	Bria’s Text-to-Image model, trained exclusively on licensed data for safe and risk-free commercial use. Excels in Text-Rendering and Aesthetics. image generation	Deprecated	2025/6/17	→
Object Removal image-to-image	Removes box-selected objects and their visual effects, seamlessly reconstructing the scene with contextually appropriate content. utility editing	OK	2025/6/16	→
Object Removal image-to-image	Removes mask-selected objects and their visual effects, seamlessly reconstructing the scene with contextually appropriate content. utility editing	OK	2025/6/16	→
Object Removal image-to-image	Removes objects and their visual effects using natural language, replacing them with contextually appropriate content utility editing	OK	2025/6/16	→
Seedance 1.0 Pro text-to-video	Seedance 1.0 Pro, a high quality video generation model developed by Bytedance.	OK	2025/6/16	→
Seedance 1.0 Pro image-to-video	Seedance 1.0 Pro, a high quality video generation model developed by Bytedance.	OK	2025/6/16	→
DWPose Pose Prediction video-to-video	Predict poses from videos. pose utility	OK	2025/6/15	→
Hunyuan 3D 2.1 image-to-3d	Hunyuan3D-2.1 is a scalable 3D asset creation system that advances state-of-the-art 3D generation through Physically-Based Rendering (PBR). image-to-3d	Deprecated	2025/6/14	→
Seedance 1.0 Lite image-to-video	Seedance 1.0 Lite	Deprecated	2025/6/13	→
Seedance 1.0 Lite text-to-video	Seedance 1.0 Lite	Deprecated	2025/6/13	→
Recraft image-to-image	Converts a given raster image to SVG format using Recraft model. stylized transform	OK	2025/6/12	→
Imagen 4 text-to-image	Google’s highest quality image generation model	Deprecated	2025/6/12	→
Wan-2.1 LoRA Trainer training	Train custom LoRAs for Wan-2.1 T2V 1.3B lora training	OK	2025/6/11	→
Wan-2.1 LoRA Trainer training	Train custom LoRAs for Wan-2.1 T2V 14B lora training	OK	2025/6/11	→
Wan-2.1 LoRA Trainer training	Train custom LoRAs for Wan-2.1 I2V 720P lora training	OK	2025/6/11	→
Wan-2.1 LoRA Trainer training	Train custom LoRAs for Wan-2.1 FLF2V 720P lora training	Deprecated	2025/6/11	→
Bytedance text-to-image	Seedream 3.0 is a bilingual (Chinese and English) text-to-image model that excels at text-to-image generation.	Deprecated	2025/6/10	→
Ffmpeg Api image-to-image	ffmpeg endpoint for first, middle and last frame extraction from videos utility editing	OK	2025/6/9	→
Ffmpeg Api Merge Audio-Video video-to-video	Merge videos with standalone audio files or audio from video files. ffmpeg	OK	2025/6/9	→
Luma Photon image-to-image	Edit images from your prompts using Luma Photon. Photon is the most creative, personalizable, and intelligent visual model for creatives, bringing a step-function change in the cost of high-quality image generation. image-to-image	OK	2025/6/8	→
Luma Photon image-to-image	Edit images from your prompts using Luma Photon. Photon is the most creative, personalizable, and intelligent visual model for creatives, bringing a step-function change in the cost of high-quality image generation. image-to-image	OK	2025/6/8	→
Veo 3 text-to-video	Veo 3 by Google, the most advanced AI video generation model in the world. With sound on!	Deprecated	2025/6/5	→
Image Editing Reframe image-to-image	The reframe endpoint intelligently adjusts an image's aspect ratio while preserving the main subject's position, composition, pose, and perspective stylized transform	OK	2025/6/5	→
Wan Vace 1 3b video-to-video	VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources. video-to-video	Deprecated	2025/6/4	→
Image Editing image-to-image	Transform any person into their baby version, while preserving the original pose and expression with childlike features. stylized transform	OK	2025/6/3	→
Luma Ray 2 Flash Reframe video-to-video	Adjust and enhance videos with Ray-2 Reframe. This advanced tool seamlessly reframes videos to your desired aspect ratio, intelligently inpainting missing regions to ensure realistic visuals and coherent motion, delivering exceptional quality and creative flexibility. reframe outpaint flash	OK	2025/6/3	→
Luma Ray 2 Reframe video-to-video	Adjust and enhance videos with Ray-2 Reframe. This advanced tool seamlessly reframes videos to your desired aspect ratio, intelligently inpainting missing regions to ensure realistic visuals and coherent motion, delivering exceptional quality and creative flexibility. reframe outpaint	OK	2025/6/3	→
Luma Photon Flash Reframe image-to-image	This advanced tool intelligently expands your visuals, seamlessly blending new content to enhance creativity and adaptability, offering unmatched speed and quality for creators at a fraction of the cost. flash reframe outpainting	OK	2025/6/3	→
Luma Photon Reframe image-to-image	Extend and reframe images with Luma Photon Reframe. This advanced tool intelligently expands your visuals, seamlessly blending new content to enhance creativity and adaptability, offering unmatched personalization and quality for creators at a fraction of the cost. outpainting reframe	OK	2025/6/3	→
Chatterboxhd speech-to-speech	Transform voices using Resemble AI's Chatterbox. Convert audio to new voices or your own samples, with expressive results and built-in perceptual watermarking.	OK	2025/6/2	→
Chatterboxhd text-to-speech	Generate expressive, natural speech with Resemble AI's Chatterbox. Features unique emotion control, instant voice cloning from short audio, and built-in watermarking.	OK	2025/6/2	→
FLUX.1 [schnell] Redux image-to-image	FLUX.1 [schnell] Redux is a high-performance endpoint for the FLUX.1 [schnell] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.	OK	2025/6/2	→
FLUX.1 [dev] Redux image-to-image	FLUX.1 [dev] Redux is a high-performance endpoint for the FLUX.1 [dev] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.	OK	2025/6/2	→
FLUX.1 [dev] image-to-image	FLUX.1 [dev] is a 12 billion parameter flow transformer that generates high-quality images from text. It is suitable for personal and commercial use.	OK	2025/6/2	→
FLUX.1 [schnell] text-to-image	Fastest inference in the world for the 12 billion parameter FLUX.1 [schnell] text-to-image model.	OK	2025/6/2	→
FLUX.1 [dev] text-to-image	FLUX.1 [dev] is a 12 billion parameter flow transformer that generates high-quality images from text. It is suitable for personal and commercial use.	OK	2025/6/2	→
Image Editing Text Removal image-to-image	Remove all text and writing from images while preserving the background and natural appearance. stylized transform	OK	2025/6/2	→
Image Editing Photo Restoration image-to-image	Restore and enhance old or damaged photos by removing imperfections, adding color while preserving the original character and details of the image. stylized transform	OK	2025/6/2	→
Chatterbox speech-to-speech	Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. Use the first TTS from Resemble AI. speech-to-speech	OK	2025/6/1	→
Chatterbox text-to-speech	Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. Use the first TTS from Resemble AI. text-to-speech	OK	2025/6/1	→
Image Editing Weather Effect image-to-image	Add realistic weather effects like snowfall, rain, or fog to your photos while maintaining the scene's mood. stylized transform	OK	2025/5/29	→
Image Editing Time Of Day image-to-image	Transform your photos to any time of day, from golden hour to midnight, with appropriate lighting and atmosphere. stylized transform	OK	2025/5/29	→
Image Editing Style Transfer image-to-image	Transform your photos into artistic masterpieces inspired by famous styles like Van Gogh's Starry Night or any artistic style you choose. stylized transform	OK	2025/5/29	→
Image Editing Scene Composition image-to-image	Place your subject in any scene you imagine, from enchanted forests to urban settings, with professional composition and lighting stylized transform	OK	2025/5/29	→
Image Editing Professional Photo image-to-image	Turn your casual photos into stunning professional studio portraits with perfect lighting and high-end photography style. stylized transform	OK	2025/5/29	→
Image Editing Object Removal image-to-image	Remove unwanted objects or people from your photos while seamlessly blending the background. stylized transform	OK	2025/5/29	→
Image Editing Hair Change image-to-image	Experiment with different hairstyles, from bald to any style you can imagine, while maintaining natural lighting and realistic results. stylized transform	OK	2025/5/29	→
Image Editing Face Enhancement image-to-image	Enhance facial features with professional retouching while maintaining a natural, realistic look stylized transform	OK	2025/5/29	→
Image Editing Expression Change image-to-image	Change facial expressions in photos to any emotion you desire, from smiles to serious looks. stylized transform	OK	2025/5/29	→
Image Editing Color Correction image-to-image	Perfect your photos with professional color grading, balanced tones, and vibrant yet natural colors stylized transform	OK	2025/5/29	→
Image Editing Cartoonify image-to-image	Transform your photos into vibrant cool cartoons with bold outlines and rich colors. stylized transform	OK	2025/5/29	→
Image Editing Background Change image-to-image	Replace your photo's background with any scene you desire, from beach sunsets to urban landscapes, with perfect lighting and shadows stylized transform	OK	2025/5/29	→
Image Editing Age Progression image-to-image	See how you or others might look at different ages, from younger to older, while preserving core facial features. stylized transform	OK	2025/5/29	→
FLUX.1 Kontext [max] image-to-image	Experimental version of FLUX.1 Kontext [max] with multi image handling capabilities	OK	2025/5/29	→
FLUX.1 Kontext [pro] image-to-image	Experimental version of FLUX.1 Kontext [pro] with multi image handling capabilities	OK	2025/5/29	→
Hunyuan Avatar image-to-video	HunyuanAvatar is a High-Fidelity Audio-Driven Human Animation model for Multiple Characters. stylized transform	Deprecated	2025/5/29	→
FLUX.1 Kontext [max] image-to-image	FLUX.1 Kontext [max] is a model with greatly improved prompt adherence and typography generation, with premium consistency for editing without compromising on speed.	OK	2025/5/29	→
FLUX.1 Kontext [max] text-to-image	FLUX.1 Kontext [max] text-to-image is a new premium model that brings maximum performance across all aspects, with greatly improved prompt adherence.	OK	2025/5/29	→
Kling 2.1 Master text-to-video	Kling 2.1 Master: The premium endpoint for Kling 2.1, designed for top-tier text-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.	OK	2025/5/29	→
Kling 2.1 Master image-to-video	Kling 2.1 Master: The premium endpoint for Kling 2.1, designed for top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision. _marquee-video-model	OK	2025/5/29	→
Kling 2.1 (pro) image-to-video	Kling 2.1 Pro is an advanced endpoint for the Kling 2.1 model, offering professional-grade videos with enhanced visual fidelity, precise camera movements, and dynamic motion control, perfect for cinematic storytelling.	OK	2025/5/28	→
Kling 2.1 (standard) image-to-video	Kling 2.1 Standard is a cost-efficient endpoint for the Kling 2.1 model, delivering high-quality image-to-video generation	OK	2025/5/28	→
FLUX.1 Kontext [pro] text-to-image	The FLUX.1 Kontext [pro] text-to-image delivers state-of-the-art image generation results with unprecedented prompt following, photorealistic rendering, and flawless typography.	OK	2025/5/28	→
FLUX.1 Kontext [dev] image-to-image	Frontier image editing model.	OK	2025/5/28	→
FLUX.1 Kontext [pro] image-to-image	FLUX.1 Kontext [pro] handles both text and reference images as inputs, seamlessly enabling targeted, local edits and complex transformations of entire scenes.	OK	2025/5/28	→
Lipsync video-to-video	Generate realistic lipsync from any audio using VEED's model. lipsync video-to-video avatar	OK	2025/5/28	→
Avatars text-to-video	Generate high-quality videos with UGC-like avatars from text lipsync text-to-video	OK	2025/5/28	→
Avatars audio-to-video	Generate high-quality videos with UGC-like avatars from audio lipsync audio-to-video	OK	2025/5/28	→
Hunyuan Portrait image-to-video	HunyuanPortrait is a diffusion-based framework for generating lifelike, temporally consistent portrait animations. animation lip sync	Deprecated	2025/5/27	→
Wan VACE 14B video-to-video	VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources. image-to-video video-to-video text-to-video	OK	2025/5/27	→
Bagel image-to-json	Bagel is a 7B parameter multimodal model from Bytedance-Seed that can generate both text and images. image-to-text vlm	OK	2025/5/21	→
Bagel image-to-image	Bagel is a 7B parameter multimodal model from Bytedance-Seed that can generate both images and text. image-to-image image-editing	OK	2025/5/21	→
Bagel text-to-image	Bagel is a 7B parameter multimodal model from Bytedance-Seed that can generate both text and images. text-to-image multimodal	OK	2025/5/21	→
Lyria2 text-to-audio	Lyria 2 is Google's latest music generation model; you can generate any type of music with it. music stylized	OK	2025/5/20	→
Imagen 4 Ultra text-to-image	Google’s highest quality image generation model	Deprecated	2025/5/20	→
Imagen 4 text-to-image	Google’s highest quality image generation model	Deprecated	2025/5/20	→
Kling 1.6 Elements image-to-video	Generate video clips from your multiple image references using Kling 1.6 (standard)	OK	2025/5/20	→
Kling 1.6 Elements image-to-video	Generate video clips from your multiple image references using Kling 1.6 (pro)	OK	2025/5/20	→
DreamO text-to-image	DreamO is an image customization framework designed to support a wide range of tasks while facilitating seamless integration of multiple conditions. stylized realism	Deprecated	2025/5/19	→
LTX Video-0.9.7 13B Distilled video-to-video	Extend videos using LTX Video-0.9.7 13B Distilled and custom LoRA video ltx-video video-to-video extend-video	OK	2025/5/17	→
LTX Video-0.9.7 13B Distilled video-to-video	Generate videos from prompts, images, and videos using LTX Video-0.9.7 13B Distilled and custom LoRA video ltx-video video-to-video multicondition-to-video image-to-video	OK	2025/5/17	→
LTX Video-0.9.7 13B Distilled image-to-video	Generate videos from prompts and images using LTX Video-0.9.7 13B Distilled and custom LoRA video ltx-video image-to-video	OK	2025/5/17	→
LTX Video-0.9.7 13B video-to-video	Generate videos from prompts, images, and videos using LTX Video-0.9.7 13B and custom LoRA video ltx-video video-to-video multicondition-to-video image-to-video	Deprecated	2025/5/17	→
LTX Video-0.9.7 13B video-to-video	Extend videos using LTX Video-0.9.7 13B and custom LoRA video ltx-video video-to-video extend-video	Deprecated	2025/5/17	→
LTX Video-0.9.7 13B image-to-video	Generate videos from prompts and images using LTX Video-0.9.7 13B and custom LoRA video ltx-video image-to-video	Deprecated	2025/5/17	→
LTX Video-0.9.7 13B text-to-video	Generate videos from prompts using LTX Video-0.9.7 13B and custom LoRA video ltx-video text-to-video	Deprecated	2025/5/17	→
LTX Video-0.9.7 13B Distilled text-to-video	Generate videos from prompts using LTX Video-0.9.7 13B Distilled and custom LoRA video ltx-video text-to-video	OK	2025/5/17	→
Flux Lora text-to-image	Super fast endpoint for the FLUX.1 [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs. lora personalization	OK	2025/5/15	→
LTX Video-0.9.7 LoRA video-to-video	Generate videos from prompts, images, and videos using LTX Video-0.9.7 and custom LoRA video ltx-video video-to-video multicondition-to-video image-to-video	Deprecated	2025/5/15	→
LTX Video-0.9.7 LoRA image-to-video	Generate videos from prompts and images using LTX Video-0.9.7 and custom LoRA video ltx-video image-to-video	Deprecated	2025/5/15	→
LTX Video-0.9.7 LoRA text-to-video	Deprecated. Use fal-ai/ltx-video-13b-dev or fal-ai/ltx-video-13b-distilled instead. video ltx-video text-to-video	Deprecated	2025/5/15	→
PixVerse V4.5 Transition image-to-video	Create seamless transitions between images using PixVerse v4.5 stylized transform	OK	2025/5/15	→
PixVerse V4.5 Image To Video Fast image-to-video	Generate fast high quality video clips from text and image prompts using PixVerse v4.5 stylized transform	OK	2025/5/15	→
PixVerse V4.5 Image To Video image-to-video	Generate high quality video clips from text and image prompts using PixVerse v4.5 stylized transform	OK	2025/5/15	→
PixVerse V4.5 Text To Video Fast text-to-video	Generate high quality and fast video clips from text and image prompts using PixVerse v4.5 fast stylized transform	OK	2025/5/15	→
PixVerse V4.5 Text To Video text-to-video	Generate high quality video clips from text and image prompts using PixVerse v4.5 stylized transform	OK	2025/5/15	→
PixVerse V4.5 Effects image-to-video	Generate high quality video clips with different effects using PixVerse v4.5 image-to-video	OK	2025/5/15	→
Hunyuan Custom image-to-video	HunyuanCustom revolutionizes video generation with unmatched identity consistency across multiple input types. Its innovative fusion modules and alignment networks outperform competitors, maintaining subject integrity while responding flexibly to text, image, audio, and video conditions. image-to-video	Deprecated	2025/5/14	→
Framepack F1 image-to-video	Framepack is an efficient Image-to-video model that autoregressively generates videos. image to video motion	OK	2025/5/13	→
ACE Step Audio Outpaint audio-to-audio	Extend the beginning or end of provided audio with lyrics and/or style using ACE-Step audio-to-audio audio-outpaint audio-extend	OK	2025/5/11	→
ACE Step Audio Inpaint audio-to-audio	Modify a portion of provided audio with lyrics and/or style using ACE-Step audio-to-audio audio-inpaint audio-repaint	OK	2025/5/11	→
ACE Step Audio To Audio audio-to-audio	Generate music from lyrics and example audio using ACE-Step audio-to-audio audio-edit	OK	2025/5/11	→
ACE Step Prompt To Audio text-to-audio	Generate music from a simple prompt using ACE-Step text-to-audio text-to-music	OK	2025/5/11	→
Rembg Enhance (Remove Background Enhance) image-to-image	Rembg-enhance is optimized for 2D vector images, 3D graphics, and photos by leveraging matting technology. background removal image editing utility segmentation high resolution rembg	OK	2025/5/9	→
Vidu Start End to Video image-to-video	Vidu Q1 Start-End to Video generates smooth transition 1080p videos between specified start and end images. stylized transform	OK	2025/5/9	→
Vidu Text to Video text-to-video	Vidu Q1 Text to Video generates high-quality 1080p videos with exceptional visual quality and motion diversity stylized transform	OK	2025/5/9	→
Vidu Image to Video image-to-video	Vidu Q1 Image to Video generates high-quality 1080p videos with exceptional visual quality and motion diversity from a single image stylized transform	OK	2025/5/9	→
ACE Step text-to-audio	Generate music with lyrics from text using ACE-Step text-to-audio text-to-music	OK	2025/5/8	→
LTX Video Trainer training	Train LTX Video 0.9.7 for custom styles and effects. ltx-video fine-tuning	Deprecated	2025/5/8	→
Recraft Creative Upscale image-to-image	Enhances a given raster image using the 'creative upscale' tool, increasing image resolution, making the image sharper and cleaner. upscaling	OK	2025/5/7	→
Recraft Crisp Upscale image-to-image	Enhances a given raster image using the 'crisp upscale' tool, boosting resolution with a focus on refining small details and faces. upscaling	OK	2025/5/7	→
Recraft V3 Create Style training	Recraft V3 Create Style is capable of creating unique styles for Recraft V3 based on your images. style vector personalization	OK	2025/5/7	→
Recraft V3 image-to-image	Recraft V3 is a text-to-image model with the ability to generate long texts, vector art, images in brand style, and much more. As of today, it is SOTA in image generation, proven by Hugging Face's industry-leading Text-to-Image Benchmark by Artificial Analysis. vector typography style	OK	2025/5/7	→
Recraft V3 text-to-image	Recraft V3 is a text-to-image model with the ability to generate long texts, vector art, images in brand style, and much more. As of today, it is SOTA in image generation, proven by Hugging Face's industry-leading Text-to-Image Benchmark by Artificial Analysis. vector typography style	OK	2025/5/7	→
Ltx Video V097 video-to-video	Deprecated. Use fal-ai/ltx-video-13b-dev or fal-ai/ltx-video-13b-distilled instead.	Deprecated	2025/5/6	→
MiniMax Voice Cloning text-to-speech	Clone a voice from a sample audio and generate speech from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality text-to-speech. speech	OK	2025/5/6	→
MiniMax Speech-02 Turbo text-to-speech	Generate fast speech from text prompts and different voices using the MiniMax Speech-02 Turbo model, which leverages advanced AI techniques to create high-quality text-to-speech. speech	OK	2025/5/6	→
LTX Video-0.9.7 video-to-video	Deprecated. Use fal-ai/ltx-video-13b-dev or fal-ai/ltx-video-13b-distilled instead. video image-to-video text-to-video	Deprecated	2025/5/6	→
MiniMax Speech-02 HD text-to-speech	Generate speech from text prompts and different voices using the MiniMax Speech-02 HD model, which leverages advanced AI techniques to create high-quality text-to-speech. speech	OK	2025/5/6	→
LTX Video-0.9.7 text-to-video	Deprecated. Use fal-ai/ltx-video-13b-dev or fal-ai/ltx-video-13b-distilled instead. video text-video	Deprecated	2025/5/6	→
LTX Video-0.9.7 image-to-video	Deprecated. Use fal-ai/ltx-video-13b-dev or fal-ai/ltx-video-13b-distilled instead. video image-to-video	Deprecated	2025/5/6	→
Minimax Image Subject Reference image-to-image	Generate images from text and a reference image using MiniMax Image-01 for consistent character appearance. stylized transform	OK	2025/5/6	→
MiniMax (Hailuo AI) Text to Image text-to-image	Generate high quality images from text prompts using MiniMax Image-01. Longer text prompts will result in better quality images. stylized realism	OK	2025/5/6	→
Hidream I1 Full image-to-image	HiDream-I1 full is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within seconds. image-to-image hidream	OK	2025/5/5	→
Pony V7 text-to-image	Pony V7 is a finetuned text-to-image model for superior aesthetics and prompt following. diffusion style	OK	2025/5/5	→
Trellis image-to-3d	Generate 3D models from multiple images using Trellis. A native 3D generative model enabling versatile and high-quality 3D asset creation. stylized	OK	2025/5/2	→
Ideogram image-to-image	Extend existing images with Ideogram V3's reframe feature. Create expanded versions and adaptations while preserving the main image and adding new creative directions through prompt guidance. realism typography	OK	2025/5/1	→
Ideogram Text to Image text-to-image	Generate high-quality images, posters, and logos with Ideogram V3. Features exceptional typography handling and realistic outputs optimized for commercial and creative use. realism typography	OK	2025/5/1	→
Ideogram Replace Background image-to-image	Replace the backgrounds of existing images with Ideogram V3's replace background feature. Create variations and adaptations while preserving core elements and adding new creative directions through prompt guidance.	OK	2025/5/1	→
Ideogram image-to-image	Reimagine existing images with Ideogram V3's remix feature. Create variations and adaptations while preserving core elements and adding new creative directions through prompt guidance. realism typography	OK	2025/5/1	→
Ideogram V3 Edit image-to-image	Transform existing images with Ideogram V3's editing capabilities. Modify, adjust, and refine images while maintaining high fidelity and realistic outputs with precise prompt control. realism typography	OK	2025/5/1	→
Hidream E1 Full image-to-image	Edit images with natural language	Deprecated	2025/4/29	→
F Lite text-to-image	F Lite is a 10B parameter diffusion model created by Fal and Freepik, trained exclusively on copyright-safe and SFW content.	Deprecated	2025/4/28	→
F Lite (texture mode) text-to-image	F Lite is a 10B parameter diffusion model created by Fal and Freepik, trained exclusively on copyright-safe and SFW content. This is a high texture density variant of the model.	Deprecated	2025/4/28	→
Moondream2 vision	Moondream2 is a highly efficient open-source vision language model that combines powerful image understanding capabilities with a remarkably small footprint. Vision	OK	2025/4/26	→
Moondream2 vision	Moondream2 is a highly efficient open-source vision language model that combines powerful image understanding capabilities with a remarkably small footprint. Vision	OK	2025/4/26	→
Moondream2 vision	Moondream2 is a highly efficient open-source vision language model that combines powerful image understanding capabilities with a remarkably small footprint. image-to-image	OK	2025/4/26	→
Moondream2 vision	Moondream2 is a highly efficient open-source vision language model that combines powerful image understanding capabilities with a remarkably small footprint. image-to-image	OK	2025/4/26	→
Step1X Edit image-to-image	Step1X-Edit transforms your photos with simple instructions into stunning, professional-quality edits—rivaling top proprietary tools. editing	Deprecated	2025/4/25	→
Tripo3D image-to-3d	State of the art Image to 3D Object generation. Generate 3D model from a single image! image-to-3d stylized	OK	2025/4/25	→
Image2svg image-to-image	Image2SVG transforms raster images into clean vector graphics, preserving visual quality while enabling scalable, customizable SVG outputs with precise control over detail levels. utility editing	OK	2025/4/25	→
Uno image-to-image	An AI model that transforms input images into new ones based on text prompts, blending reference visuals with your creative directions. image-to-image	OK	2025/4/24	→
MAGI-1 video-to-video	MAGI-1 extends videos with an exceptional understanding of physical interactions and prompts video-to-video	Deprecated	2025/4/23	→
MAGI-1 text-to-video	MAGI-1 is a video generation model with exceptional understanding of physical interactions and cinematic prompts text-to-video	Deprecated	2025/4/23	→
MAGI-1 image-to-video	MAGI-1 generates videos from images with exceptional understanding of physical interactions and prompting image-to-video	Deprecated	2025/4/23	→
gpt-image-1 text-to-image	OpenAI's latest image generation and editing model: gpt-image-1.	OK	2025/4/23	→
gpt-image-1 image-to-image	OpenAI's latest image generation and editing model: gpt-image-1.	OK	2025/4/23	→
PixVerse V4 Effects image-to-video	Generate high quality video clips with different effects using PixVerse v4 image-to-video	OK	2025/4/23	→
MAGI-1 (Distilled) video-to-video	MAGI-1 distilled extends videos faster with an exceptional understanding of physical interactions and prompts video-to-video video-extend	OK	2025/4/23	→
MAGI-1 (Distilled) image-to-video	MAGI-1 distilled generates videos faster from images with exceptional understanding of physical interactions and prompting image-to-video	OK	2025/4/23	→
Dia Tts audio-to-audio	Clone dialog voices from a sample audio and generate dialogs from text prompts using Dia TTS, which leverages advanced AI techniques to create high-quality text-to-speech. speech	OK	2025/4/22	→
Framepack image-to-video	Framepack is an efficient Image-to-video model that autoregressively generates videos. image to video motion	OK	2025/4/22	→
Dia text-to-speech	Dia directly generates realistic dialogue from transcripts. Audio conditioning enables emotion control. Produces natural nonverbals like laughter and throat clearing. text-to-speech	OK	2025/4/22	→
MAGI-1 (Distilled) text-to-video	MAGI-1 distilled is a faster video generation model with exceptional understanding of physical interactions and cinematic prompts text-to-video	OK	2025/4/22	→
Pipecat's Smart Turn model speech-to-text	An open source, community-driven and native audio turn detection model by Pipecat AI.	OK	2025/4/21	→
Juggernaut Flux Lora image-to-image	Juggernaut Base Flux LoRA Inpainting by RunDiffusion is a drop-in replacement for Flux [Dev] inpainting that delivers sharper details, richer colors, and enhanced realism to all your LoRAs and LyCORIS with full compatibility.	OK	2025/4/21	→
FASHN Virtual Try-On V1.5 image-to-image	FASHN v1.5 delivers precise virtual try-on capabilities, accurately rendering garment details like text and patterns at 576x864 resolution from both on-model and flat-lay photo references. try-on fashion clothing	OK	2025/4/21	→
Plushify image-to-image	Turn any image into a cute plushie!	Deprecated	2025/4/20	→
Instant Character image-to-image	InstantCharacter creates high-quality, consistent characters from text prompts, supporting diverse poses, styles, and appearances with strong identity control. personalization customization	OK	2025/4/18	→
Wan-2.1 First-Last-Frame-to-Video image-to-video	Wan-2.1 flf2v generates dynamic videos by intelligently bridging a given first frame to a desired end frame through smooth, coherent motion sequences. image to video motion	OK	2025/4/17	→
Turbo Flux Trainer training	A blazing fast FLUX dev LoRA trainer for subjects and styles.	OK	2025/4/17	→
Framepack image-to-video	Framepack is an efficient Image-to-video model that autoregressively generates videos. image to video motion	OK	2025/4/17	→
Kling 2.0 Master image-to-video	Generate video clips from your images using Kling 2.0 Master	OK	2025/4/14	→
Kling 2.0 Master text-to-video	Generate video clips from your prompts using Kling 2.0 Master	OK	2025/4/14	→
Cartoonify image-to-image	Transform images into 3D cartoon artwork using an AI model that applies cartoon stylization while preserving the original image's composition and details. stylized transform	OK	2025/4/14	→
Vace video-to-video	Vace is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources. video-to-video image-to-video text-to-video	Deprecated	2025/4/11	→
Hidream I1 Full text-to-image	HiDream-I1 full is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within seconds.	OK	2025/4/11	→
Hidream I1 Dev text-to-image	HiDream-I1 dev is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within seconds.	OK	2025/4/11	→
Hidream I1 Fast text-to-image	HiDream-I1 fast is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within 16 steps.	OK	2025/4/11	→
Finegrain Eraser Mask image-to-image	Finegrain Eraser removes any object selected with a mask—along with its shadows, reflections, and lighting artifacts—seamlessly reconstructing the scene with contextually accurate content. utility editing	OK	2025/4/10	→
Finegrain Eraser Bbox image-to-image	Finegrain Eraser removes any object selected with a bounding box—along with its shadows, reflections, and lighting artifacts—seamlessly reconstructing the scene with contextually accurate content. utility editing	OK	2025/4/9	→
Finegrain Eraser image-to-image	Finegrain Eraser removes objects—along with their shadows, reflections, and lighting artifacts—using only natural language, seamlessly filling the scene with contextually accurate content. utility editing	OK	2025/4/9	→
Video Sound Effects Generator video-to-video	Add sound effects to your videos sound-effects sfx cassetteai	OK	2025/4/7	→
Speech-to-Text speech-to-text	Leverage the rapid processing capabilities of AI models to enable accurate and efficient real-time speech-to-text transcription.	OK	2025/4/4	→
Speech-to-Text speech-to-text	Leverage the rapid processing capabilities of AI models to enable accurate and efficient real-time speech-to-text transcription. streaming	OK	2025/4/4	→
Speech-to-Text speech-to-text	Leverage the rapid processing capabilities of AI models to enable accurate and efficient real-time speech-to-text transcription. streaming	OK	2025/4/4	→
Speech-to-Text speech-to-text	Leverage the rapid processing capabilities of AI models to enable accurate and efficient real-time speech-to-text transcription.	OK	2025/4/4	→
Sound Effects Generator text-to-audio	Create stunningly realistic sound effects in seconds - CassetteAI's Sound Effects Model generates high-quality SFX up to 30 seconds long in just 1 second of processing time sound sfx sound-effects cassetteai	OK	2025/4/3	→
Sync Lipsync 2.0 video-to-video	Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization with Sync Lipsync 2.0 model animation lip sync	OK	2025/4/1	→
FLUX.1 [dev] text-to-image	FLUX.1 [dev] is a 12 billion parameter flow transformer that generates high-quality images from text. It is suitable for personal and commercial use.	OK	2025/4/1	→
StarVector image-to-image	AI vectorization model that transforms raster images into scalable SVG graphics, preserving visual details while enabling infinite scaling and easy editing capabilities. image-to-image	Deprecated	2025/4/1	→
PixVerse V4 Image To Video Fast image-to-video	Generate fast high quality video clips from text and image prompts using PixVerse v4	OK	2025/4/1	→
PixVerse V4 Image To Video image-to-video	Generate high quality video clips from text and image prompts using PixVerse v4	OK	2025/4/1	→
PixVerse V3.5 Effects image-to-video	Generate high quality video clips with different effects using PixVerse v3.5	OK	2025/4/1	→
PixVerse V4 Text To Video text-to-video	Generate high quality video clips from text and image prompts using PixVerse v4	OK	2025/4/1	→
PixVerse V3.5 Transition image-to-video	Create seamless transition between images using PixVerse v3.5	OK	2025/4/1	→
PixVerse V4 Text To Video Fast text-to-video	Generate high quality and fast video clips from text and image prompts using PixVerse v4 fast	OK	2025/4/1	→
Ghiblify Images image-to-image	Reimagine and transform your ordinary photos into enchanting Studio Ghibli style artwork stylized transform	OK	2025/3/31	→
Orpheus TTS text-to-speech	Orpheus TTS is a state-of-the-art, Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been finetuned to deliver human-level speech synthesis, achieving exceptional clarity, expressiveness, and real-time performance. text to speech voice synthesis high-fidelity	OK	2025/3/31	→
Sana v1.5 1.6B text-to-image	Sana v1.5 1.6B is a lightweight text-to-image model that delivers 4K image generation with impressive efficiency. text to image 4k lightweight	OK	2025/3/31	→
Sana v1.5 4.8B text-to-image	Sana v1.5 4.8B is a powerful text-to-image model that generates ultra-high quality 4K images with remarkable detail. text to image 4k high-quality	OK	2025/3/31	→
Sana Sprint text-to-image	Sana Sprint is a text-to-image model capable of generating 4K images with exceptional speed. text to image 4k high-speed	OK	2025/3/31	→
music generator text-to-audio	CassetteAI’s model generates a 30-second sample in under 2 seconds and a full 3-minute track in under 10 seconds. At 44.1 kHz stereo audio, expect a level of professional consistency with no breaks, no squeaks, and no random interruptions in your creations. music cassetteai	OK	2025/3/27	→
Kling LipSync Text-to-Video text-to-video	Kling LipSync is a text-to-video model that generates realistic lip movements from text input. text to video lipsync	OK	2025/3/27	→
Kling LipSync Audio-to-Video text-to-video	Kling LipSync is an audio-to-video model that generates realistic lip movements from audio input. audio to video lipsync	OK	2025/3/27	→
LatentSync video-to-video	LatentSync is a video-to-video model that generates lip sync animations from audio using advanced algorithms for high-quality synchronization. animation lip sync	OK	2025/3/25	→
Wan-2.1 Text-to-Video with LoRAs text-to-video	Add custom LoRAs to Wan-2.1, a text-to-video model that generates high-quality videos with high visual quality and motion diversity from text prompts. text to video motion lora	OK	2025/3/25	→
Wan-2.1 LoRA Trainer training	Train custom LoRAs for Wan-2.1 I2V 480P lora training	Deprecated	2025/3/24	→
Thera image-to-image	Fix low-resolution images with the speed and quality of Thera.	Deprecated	2025/3/24	→
MixDehazer image-to-image	An advanced dehaze model to remove atmospheric haze, restoring clarity and detail in images through intelligent neural network processing.	Deprecated	2025/3/24	→
Hunyuan3D image-to-3d	Generate 3D models from your images using Hunyuan 3D. A native 3D generative model enabling versatile and high-quality 3D asset creation. stylized	OK	2025/3/20	→
Gemini Flash Edit Multi Image image-to-image	Gemini Flash Edit is a model that can edit a single image using a text prompt and a reference image. editing	Deprecated	2025/3/20	→
Hunyuan3D image-to-3d	Generate 3D models from your images using Hunyuan 3D. A native 3D generative model enabling versatile and high-quality 3D asset creation. stylized	OK	2025/3/20	→
Hunyuan3D image-to-3d	Generate 3D models from your images using Hunyuan 3D. A native 3D generative model enabling versatile and high-quality 3D asset creation. stylized	OK	2025/3/20	→
Hunyuan3D image-to-3d	Generate 3D models from your images using Hunyuan 3D. A native 3D generative model enabling versatile and high-quality 3D asset creation. stylized	OK	2025/3/20	→
Hunyuan3D image-to-3d	Generate 3D models from your images using Hunyuan 3D. A native 3D generative model enabling versatile and high-quality 3D asset creation. stylized	OK	2025/3/20	→
Hunyuan3D image-to-3d	Generate 3D models from your images using Hunyuan 3D. A native 3D generative model enabling versatile and high-quality 3D asset creation. stylized	OK	2025/3/20	→
Gemini Flash Edit Multi Image image-to-image	Gemini Flash Edit Multi Image is a model that can edit multiple images using a text prompt and a reference image. editing	Deprecated	2025/3/20	→
Luma Ray 2 Flash (Image to Video) image-to-video	Ray2 Flash is a fast video generative model capable of creating realistic visuals with natural, coherent motion. motion transformation	OK	2025/3/17	→
Luma Ray 2 Flash text-to-video	Ray2 Flash is a fast video generative model capable of creating realistic visuals with natural, coherent motion. motion transformation	OK	2025/3/17	→
Pika Effects (v1.5) image-to-video	Pika Effects are AI-powered video effects designed to modify objects, characters, and environments in a fun, engaging, and visually compelling manner. editing effects animation	Deprecated	2025/3/14	→
Pika Image to Video Turbo (v2) image-to-video	Turbo is the model to use when you feel the need for speed. Turn your image to stunning video up to 3x faster – all with high quality outputs. editing effects animation	OK	2025/3/14	→
Pika Text to Video (v2.2) text-to-video	Start with a simple text input to create dynamic generations that defy expectations in up to 1080p. Experience better image clarity and crisper, sharper visuals. editing effects animation	OK	2025/3/14	→
Invisible Watermark image-to-image	Invisible Watermark is a model that can add an invisible watermark to an image. utility editing	OK	2025/3/14	→
Pika Text to Video (v2.1) text-to-video	Start with a simple text input to create dynamic generations that defy expectations. Anything you dream can come to life with sharp details, impressive character control and cinematic camera moves. editing effects animation	OK	2025/3/14	→
Pika Text to Video Turbo (v2) text-to-video	Pika v2 Turbo creates videos from a text prompt with high quality output. editing effects animation	OK	2025/3/14	→
Pika Image to Video (v2.2) image-to-video	Turn photos into mind-blowing, dynamic videos in up to 1080p. Experience better image clarity and crisper, sharper visuals. editing effects animation	OK	2025/3/14	→
Pika Scenes (v2.2) image-to-video	Pika Scenes v2.2 creates videos from images with high quality output. editing effects animation	OK	2025/3/14	→
Pika Image to Video (v2.1) image-to-video	Turn photos into mind-blowing, dynamic videos. Your images can come to life with sharp details, impressive character control and cinematic camera moves. editing effects animation	OK	2025/3/14	→
Pikadditions (v2) video-to-video	Pikadditions is a powerful video-to-video AI model that allows you to add anyone or anything to any video with seamless integration. editing effects animation	Deprecated	2025/3/14	→
CSM-1B text-to-audio	CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs. conversational text to speech	OK	2025/3/13	→
Wan Effects image-to-video	Wan Effects generates high-quality videos with popular effects from images motion effects	OK	2025/3/13	→
Vidu Image to Video image-to-video	Vidu Image to Video generates high-quality videos with exceptional visual quality and motion diversity from a single image motion image to video	OK	2025/3/12	→
Vidu Start-End to Video image-to-video	Vidu Start-End to Video generates smooth transition videos between specified start and end images. motion transition	OK	2025/3/12	→
Vidu Reference to Video image-to-video	Vidu Reference to Video creates videos by combining reference images with a prompt. motion reference	OK	2025/3/12	→
Vidu Template to Video image-to-video	Vidu Template to Video lets you create different effects by applying motion templates to your images. motion template	OK	2025/3/12	→
Wan-2.1 Pro Image-to-Video image-to-video	Wan-2.1 Pro is a premium image-to-video model that generates high-quality 1080p videos at 30fps with up to 6 seconds duration, delivering exceptional visual quality and motion diversity from images image to video motion	OK	2025/3/11	→
Wan-2.1 Pro Text-to-Video text-to-video	Wan-2.1 Pro is a premium text-to-video model that generates high-quality 1080p videos at 30fps with up to 6 seconds duration, delivering exceptional visual quality and motion diversity from text prompts text to video motion	OK	2025/3/11	→
Veo 2 (Image to Video) image-to-video	Veo 2 creates videos from images with realistic motion and very high quality output. motion transformation	Deprecated	2025/3/11	→
Wan-2.1 Image-to-Video with LoRAs image-to-video	Add custom LoRAs to Wan-2.1, an image-to-video model that generates high-quality videos with high visual quality and motion diversity from images. image to video motion lora	OK	2025/3/8	→
Kling 1.5 text-to-video	Generate video clips from your prompts using Kling 1.5 (pro)	OK	2025/3/6	→
Kling 1.0 text-to-video	Generate video clips from your prompts using Kling 1.0 motion	OK	2025/3/6	→
Kling 1.6 text-to-video	Generate video clips from your prompts using Kling 1.6 (pro)	OK	2025/3/6	→
Kling 1.6 text-to-video	Generate video clips from your prompts using Kling 1.6 (std)	OK	2025/3/6	→
Hunyuan Video Image-to-Video Inference image-to-video	Image to Video for the high-quality Hunyuan Video I2V model. motion	OK	2025/3/6	→
Juggernaut Flux Lightning text-to-image	Juggernaut Lightning Flux by RunDiffusion provides blazing-fast, high-quality images rendered at five times the speed of Flux. Perfect for mood boards and mass ideation, this model excels in both realism and prompt adherence. image generation	OK	2025/3/5	→
Juggernaut Flux Pro image-to-image	Juggernaut Pro Flux by RunDiffusion is the flagship Juggernaut model rivaling some of the most advanced image models available, often surpassing them in realism. It combines Juggernaut Base with RunDiffusion Photo and features enhancements like reduced background blurriness. image generation	OK	2025/3/5	→
Rundiffusion Photo Flux text-to-image	RunDiffusion Photo Flux provides insane realism. With this enhancer, textures and skin details burst to life, turning your favorite prompts into vivid, lifelike creations. Recommended to keep it at 0.65 to 0.80 weight. Supports resolutions up to 1536x1536. image generation lora	OK	2025/3/5	→
Juggernaut Flux Base LoRA text-to-image	Juggernaut Base Flux LoRA by RunDiffusion is a drop-in replacement for Flux [Dev] that delivers sharper details, richer colors, and enhanced realism to all your LoRAs and LyCORIS with full compatibility. image generation	OK	2025/3/5	→
Juggernaut Flux Base image-to-image	Juggernaut Base Flux by RunDiffusion is a drop-in replacement for Flux [Dev] that delivers sharper details, richer colors, and enhanced realism, while instantly boosting LoRAs and LyCORIS with full compatibility. image generation	OK	2025/3/5	→
LTX Video-0.9.5 video-to-video	Generate videos from prompts and videos using LTX Video-0.9.5 video video-to-video	OK	2025/3/5	→
LTX Video-0.9.5 text-to-video	Generate videos from prompts using LTX Video-0.9.5 video text-video	OK	2025/3/5	→
LTX Video-0.9.5 video-to-video	Generate videos from prompts, images, and videos using LTX Video-0.9.5 video image-to-video text-to-video	OK	2025/3/5	→
Juggernaut Flux Pro text-to-image	Juggernaut Pro Flux by RunDiffusion is the flagship Juggernaut model rivaling some of the most advanced image models available, often surpassing them in realism. It combines Juggernaut Base with RunDiffusion Photo and features enhancements like reduced background blurriness. image generation	OK	2025/3/5	→
Juggernaut Flux Base text-to-image	Juggernaut Base Flux by RunDiffusion is a drop-in replacement for Flux [Dev] that delivers sharper details, richer colors, and enhanced realism, while instantly boosting LoRAs and LyCORIS with full compatibility. image generation	OK	2025/3/5	→
LTX Video-0.9.5 image-to-video	Generate videos from prompts and images using LTX Video-0.9.5 video image-to-video	Deprecated	2025/3/5	→
CogView text-to-image	Generate high quality images from text prompts using CogView4. Longer text prompts will result in better quality images. stylized	OK	2025/3/4	→
Topaz Video Upscale video-to-video	Professional-grade video upscaling using Topaz technology. Enhance your videos with high-quality upscaling. upscaling high-res	OK	2025/3/4	→
DiffRhythm: Lyrics to Song text-to-audio	DiffRhythm is a blazing fast model for transforming lyrics into full songs. It boasts the capability to generate full songs in less than 30 seconds. music	OK	2025/3/4	→
DocRes-dewarp image-to-image	Enhance warped, folded documents with the superior quality of DocRes for sharper, clearer results. image-enhancement	OK	2025/3/3	→
DocRes image-to-image	Enhance low-resolution, blurred, shadowed documents with the superior quality of DocRes for sharper, clearer results. image-enhancement	OK	2025/3/3	→
SWIN2SR image-to-image	Enhance low-resolution images with the superior quality of Swin2SR for sharper, clearer results. image-enhancement	Deprecated	2025/2/28	→
Ideogram V2A Remix image-to-image	Create variations of existing images with Ideogram V2A Remix while maintaining creative control through prompt guidance. realism typography	OK	2025/2/27	→
Kling 1.6 text-to-video	Generate video clips from your prompts using Kling 1.6 (pro)	OK	2025/2/27	→
Ideogram V2A Turbo Remix image-to-image	Rapidly create image variations with Ideogram V2A Turbo Remix. Fast and efficient reimagining of existing images while maintaining creative control through prompt guidance. realism typography	OK	2025/2/27	→
ElevenLabs TTS Multilingual v2 text-to-audio	Generate multilingual text-to-speech audio using ElevenLabs TTS Multilingual v2. audio	OK	2025/2/27	→
Wan-2.1 1.3B Text-to-Video text-to-video	Wan-2.1 1.3B is a text-to-video model that generates high-quality videos with high visual quality and motion diversity from text prompts at faster speeds. text to video motion	Deprecated	2025/2/27	→
Ideogram V2A Turbo text-to-image	Accelerated image generation with Ideogram V2A Turbo. Create high-quality visuals, posters, and logos with enhanced speed while maintaining Ideogram's signature quality. realism typography	OK	2025/2/27	→
ElevenLabs Speech to Text speech-to-text	Generate text from speech using ElevenLabs advanced speech-to-text model. speech	OK	2025/2/27	→
ElevenLabs Audio Isolation audio-to-audio	Isolate audio tracks using ElevenLabs advanced audio isolation technology. audio	OK	2025/2/27	→
ElevenLabs TTS Turbo v2.5 text-to-speech	Generate high-speed text-to-speech audio using ElevenLabs TTS Turbo v2.5. audio	OK	2025/2/27	→
Ideogram V2A text-to-image	Generate high-quality images, posters, and logos with Ideogram V2A. Features exceptional typography handling and realistic outputs optimized for commercial and creative use. realism typography	OK	2025/2/27	→
EVF-SAM2 Segmentation image-to-image	EVF-SAM2 combines natural language understanding with advanced segmentation capabilities, allowing you to precisely mask image regions using intuitive positive and negative text prompts. segmentation mask	OK	2025/2/26	→
DDColor image-to-image	Bring colors into old or new black and white photos with DDColor. image-recolorization faces utility	OK	2025/2/26	→
Wan-2.1 Text-to-Video text-to-video	Wan-2.1 is a text-to-video model that generates high-quality videos with high visual quality and motion diversity from text prompts text to video motion	OK	2025/2/25	→
Wan-2.1 Image-to-Video image-to-video	Wan-2.1 is an image-to-video model that generates high-quality videos with high visual quality and motion diversity from images image to video motion	OK	2025/2/25	→
Video Prompt Generator llm	Generate video prompts using a variety of techniques including camera direction, style, pacing, special effects and more. motion transformation chat claude gpt	OK	2025/2/25	→
Segment Anything Model 2 image-to-image	SAM 2 is a model for segmenting images automatically. It can return individual masks or a single mask for the entire image. segmentation mask	OK	2025/2/25	→
MiniMax (Hailuo AI) Video 01 Director - Image to Video image-to-video	Generate video clips more accurately with respect to initial image, natural language descriptions, and using camera movement instructions for shot control. motion transformation camera-controls	OK	2025/2/24	→
DRCT-Super-Resolution image-to-image	Upscale your images with DRCT-Super-Resolution. upscaling high-res	OK	2025/2/24	→
Veo 2 text-to-video	Veo 2 creates videos with realistic motion and high quality output. Explore different styles and find your own with extensive camera controls. motion transformation	Deprecated	2025/2/21	→
NAFNet-deblur image-to-image	Use NAFNet to fix issues like blurriness and noise in your images. This model specializes in image restoration and can help enhance the overall quality of your photography. image-restoration deblur denoise	OK	2025/2/21	→
NAFNet-denoise image-to-image	Use NAFNet to fix issues like blurriness and noise in your images. This model specializes in image restoration and can help enhance the overall quality of your photography. image-restoration deblur denoise	OK	2025/2/21	→
Post Processing image-to-image	Post Processing is an endpoint that can enhance images using a variety of techniques including grain, blur, sharpen, and more. stylized utility	OK	2025/2/18	→
Skyreels V1 (Image-to-Video) image-to-video	SkyReels V1 is the first and most advanced open-source human-centric video foundation model. By fine-tuning HunyuanVideo on O(10M) high-quality film and television clips motion	Deprecated	2025/2/18	→
Flow-Edit image-to-image	The model provides high-quality image editing capabilities. editing	OK	2025/2/14	→
Kokoro TTS (Mandarin Chinese) text-to-audio	A highly efficient Mandarin Chinese text-to-speech model that captures natural tones and prosody. speech	OK	2025/2/14	→
Kokoro TTS (Hindi) text-to-audio	A fast and expressive Hindi text-to-speech model with clear pronunciation and accurate intonation. speech	OK	2025/2/14	→
Kokoro TTS (Brazilian Portuguese) text-to-audio	A natural and expressive Brazilian Portuguese text-to-speech model optimized for clarity and fluency. speech	OK	2025/2/14	→
Kokoro TTS (Spanish) text-to-audio	A natural-sounding Spanish text-to-speech model optimized for Latin American and European Spanish. speech	OK	2025/2/14	→
Kokoro TTS (French) text-to-audio	An expressive and natural French text-to-speech model for both European and Canadian French. speech	OK	2025/2/14	→
Kokoro TTS (British English) text-to-audio	A high-quality British English text-to-speech model offering natural and expressive voice synthesis. speech	OK	2025/2/14	→
Kokoro TTS text-to-audio	Kokoro is a lightweight text-to-speech model that delivers comparable quality to larger models while being significantly faster and more cost-efficient. speech	OK	2025/2/14	→
Kokoro TTS (Japanese) text-to-audio	A fast and natural-sounding Japanese text-to-speech model optimized for smooth pronunciation. speech	OK	2025/2/14	→
Zonos-Audio-Clone text-to-audio	Clone any person's voice and speak anything in that voice using Zonos' voice cloning. voice cloning	OK	2025/2/14	→
Kokoro TTS (Italian) text-to-audio	A high-quality Italian text-to-speech model delivering smooth and expressive speech synthesis. speech	OK	2025/2/14	→
Luma Ray 2 (Image to Video) image-to-video	Ray2 is a large-scale video generative model capable of creating realistic visuals with natural, coherent motion. motion transformation	OK	2025/2/14	→
GOT OCR 2.0 vision	GOT-OCR2 works on a wide range of tasks, including plain document OCR, scene text OCR, formatted document OCR, and even OCR for tables, charts, mathematical formulas, geometric shapes, molecular formulas and sheet music. optical character recognition high-res utility	OK	2025/2/12	→
FLUX.1 [dev] Control LoRA Canny image-to-image	FLUX Control LoRA Canny is a high-performance endpoint that uses a control image using a Canny edge map to transfer structure to the generated image and another initial image to guide color. lora style transfer	OK	2025/2/11	→
FLUX.1 [dev] Control LoRA Depth image-to-image	FLUX Control LoRA Depth is a high-performance endpoint that uses a control image using a depth map to transfer structure to the generated image and another initial image to guide color. lora style transfer	OK	2025/2/11	→
ben-v2-image image-to-image	A fast and high quality model for image background removal. background removal	OK	2025/2/11	→
FLUX.1 [dev] Control LoRA Canny text-to-image	FLUX Control LoRA Canny is a high-performance endpoint that uses a control image to transfer structure to the generated image, using a Canny edge map. lora style transfer	OK	2025/2/11	→
FLUX.1 [dev] Control LoRA Depth text-to-image	FLUX Control LoRA Depth is a high-performance endpoint that uses a control image to transfer structure to the generated image, using a depth map. lora style transfer	OK	2025/2/11	→
MiniMax (Hailuo AI) Video 01 Director text-to-video	Generate video clips more accurately with respect to natural language descriptions and using camera movement instructions for shot control. motion transformation camera-controls	OK	2025/2/11	→
Ben-Video-Bg-Rm video-to-video	A model for high quality and smooth background removal for videos. segmentation background removal	OK	2025/2/11	→
Imagen3 text-to-image	Imagen3 is a high-quality text-to-image model that generates realistic images from text prompts.	Deprecated	2025/2/10	→
Imagen3 Fast text-to-image	Imagen3 Fast is a high-quality text-to-image model that generates realistic images from text prompts.	Deprecated	2025/2/10	→
Ideogram Upscale image-to-image	Ideogram Upscale enhances the resolution of the reference image by up to 2X and might enhance the reference image too. Optionally refine outputs with a prompt for guided improvements. upscaling high-res	OK	2025/2/10	→
Hunyuan Video Image-to-Video LoRA Inference image-to-video	Image to Video for the Hunyuan Video model using a custom trained LoRA. motion	OK	2025/2/3	→
CodeFormer image-to-image	Fix distorted or blurred photos of people with CodeFormer. image-restoration faces utility	OK	2025/1/31	→
Lumina Image 2 text-to-image	Lumina-Image-2.0 is a 2 billion parameter flow-based diffusion transformer which features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency. diffusion typography style	OK	2025/1/31	→
Hunyuan Video (Video-to-Video) video-to-video	Hunyuan Video is an Open video generation model with high visual quality, motion diversity, text-video alignment, and generation stability. Use this endpoint to generate videos from videos. video to video motion	OK	2025/1/30	→
Hunyuan Video LoRA Inference (Video-to-Video) video-to-video	Hunyuan Video is an Open video generation model with high visual quality, motion diversity, text-video alignment, and generation stability. Use this endpoint to generate videos from videos. video to video motion lora	Deprecated	2025/1/30	→
PixVerse V3.5 Text To Video text-to-video	Generate high quality video clips from text prompts using PixVerse v3.5	OK	2025/1/29	→
PixVerse V3.5 Image To Video Fast image-to-video	Generate high quality video clips from text and image prompts quickly using PixVerse v3.5 Fast	OK	2025/1/29	→
PixVerse V3.5 Text To Video Fast text-to-video	Generate high quality video clips quickly from text prompts using PixVerse v3.5 Fast	OK	2025/1/29	→
PixVerse V3.5 Image To Video image-to-video	Generate high quality video clips from text and image prompts using PixVerse v3.5	OK	2025/1/29	→
DeepSeek Janus-Pro text-to-image	DeepSeek Janus-Pro is a novel text-to-image model that unifies multimodal understanding and generation through an autoregressive framework stylized	OK	2025/1/28	→
YuE: Lyrics to Song text-to-audio	YuE is a groundbreaking series of open-source foundation models designed for music generation, specifically for transforming lyrics into full songs. music	Deprecated	2025/1/28	→
Luma Ray 2 text-to-video	Ray2 is a large-scale video generative model capable of creating realistic visuals with natural, coherent motion. motion transformation	OK	2025/1/27	→
Kling Kolors Virtual TryOn v1.5 image-to-image	Kling Kolors Virtual TryOn v1.5 is a high quality image based Try-On endpoint which can be used for commercial try on. try-on fashion clothing	OK	2025/1/23	→
FFmpeg API Metadata json	Get encoding metadata from video and audio files using FFmpeg API. ffmpeg	OK	2025/1/22	→
FFmpeg API Waveform json	Get waveform data from audio files using FFmpeg API. ffmpeg	OK	2025/1/22	→
FFmpeg API Compose video-to-video	Compose videos from multiple media sources using FFmpeg API. ffmpeg	OK	2025/1/22	→
MiniMax (Hailuo AI) Video 01 Subject Reference image-to-video	Generate video clips maintaining consistent, realistic facial features and identity across dynamic video content subject transformation	OK	2025/1/20	→
MoonDreamNext Batch vision	MoonDreamNext Batch is a multimodal vision-language model for batch captioning. multimodal	OK	2025/1/17	→
FLUX.1 [dev] Canny with LoRAs image-to-image	Utilize Flux.1 [dev] Controlnet to generate high-quality images with precise control over composition, style, and structure through advanced edge detection and guidance mechanisms. controlnet detection lora editing composition	OK	2025/1/16	→
FLUX1.1 [pro] text-to-image	FLUX1.1 [pro] is an enhanced version of FLUX.1 [pro] with improved image generation capabilities, delivering superior composition, detail, and artistic fidelity compared to its predecessor.	OK	2025/1/16	→
FLUX1.1 [pro] ultra Fine-tuned text-to-image	FLUX1.1 [pro] ultra fine-tuned is the newest version of FLUX1.1 [pro] with a fine-tuned LoRA, maintaining professional-grade image quality while delivering up to 2K resolution with improved photo realism. high-res realism	OK	2025/1/16	→
FLUX.1 [pro] Fill Fine-tuned image-to-image	FLUX.1 [pro] Fill Fine-tuned is a high-performance endpoint for the FLUX.1 [pro] model with a fine-tuned LoRA that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities. editing	OK	2025/1/16	→
Hunyuan Video LoRA Inference text-to-video	Hunyuan Video is an Open video generation model with high visual quality, motion diversity, text-video alignment, and generation stability	Deprecated	2025/1/16	→
Train Hunyuan LoRA training	Train Hunyuan Video lora on people, objects, characters and more! lora personalization	Deprecated	2025/1/14	→
CogVideoX-5B text-to-video	Generate videos from prompts using CogVideoX-5B	OK	2025/1/14	→
TransPixar V1 text-to-video	Transform text into stunning videos with TransPixar - an AI model that generates both RGB footage and alpha channels, enabling seamless compositing and creative video effects.	Deprecated	2025/1/14	→
sync.so -- lipsync 1.9.0-beta video-to-video	Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization. animation lip sync	OK	2025/1/13	→
Sa2VA 8B Video vision	Sa2VA is an MLLM capable of question answering, visual prompt understanding, and dense object segmentation at both image and video levels multimodal vision	OK	2025/1/13	→
Sa2VA 4B Video vision	Sa2VA is an MLLM capable of question answering, visual prompt understanding, and dense object segmentation at both image and video levels multimodal vision	OK	2025/1/13	→
Sa2VA 4B Image vision	Sa2VA is an MLLM capable of question answering, visual prompt understanding, and dense object segmentation at both image and video levels multimodal vision	OK	2025/1/13	→
Sa2VA 8B Image vision	Sa2VA is an MLLM capable of question answering, visual prompt understanding, and dense object segmentation at both image and video levels multimodal vision	OK	2025/1/13	→
MoonDreamNext vision	MoonDreamNext is a multimodal vision-language model for captioning, gaze detection, bbox detection, point detection, and more. multimodal vision	OK	2025/1/9	→
MoonDreamNext Detection image-to-image	MoonDreamNext Detection is a multimodal vision-language model for gaze detection, bbox detection, point detection, and more. multimodal	OK	2025/1/9	→
Kling 1.6 image-to-video	Generate video clips from your images using Kling 1.6 (pro)	OK	2025/1/7	→
Kling 1.6 text-to-video	Generate video clips from your prompts using Kling 1.6 (std)	OK	2025/1/7	→
Kling 1.6 image-to-video	Generate video clips from your images using Kling 1.6 (std)	OK	2025/1/7	→
Auto-Captioner video-to-video	Automatically generates text captions for your videos from the audio as per text colour/font specifications captioning video	OK	2025/1/3	→
Train Flux LoRA training	Train styles, people and other subjects at blazing speeds. lora personalization	OK	2025/1/1	→
Switti 1024 text-to-image	Switti is a scale-wise transformer for fast text-to-image generation that outperforms existing T2I AR models and competes with state-of-the-art T2I diffusion models while being faster than distilled diffusion models.	Deprecated	2024/12/31	→
Switti 512 text-to-image	Switti is a scale-wise transformer for fast text-to-image generation that outperforms existing T2I AR models and competes with state-of-the-art T2I diffusion models while being faster than distilled diffusion models.	Deprecated	2024/12/31	→
MMAudio V2 Text to Audio text-to-audio	MMAudio generates synchronized audio given text inputs. It can generate sounds described by a prompt. audio fast	OK	2024/12/20	→
Sad Talker image-to-video	Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation animation	OK	2024/12/20	→
Dubbing video-to-video	This endpoint delivers seamlessly localized videos by generating lip-synced dubs in multiple languages, ensuring natural and immersive multilingual experiences animation lip sync dubbing	Deprecated	2024/12/20	→
Bria Expand Image image-to-image	Bria Expand expands images beyond their borders in high quality. Trained exclusively on licensed data for safe and risk-free commercial use. Access the model's source code and weights: https://bria.ai/contact-us outpainting	OK	2024/12/19	→
Bria Text-to-Image Fast text-to-image	Bria's Text-to-Image model with perfect harmony of latency and quality. Trained exclusively on licensed data for safe and risk-free commercial use. Available also as source code and weights. For access to weights: https://bria.ai/contact-us image generation	OK	2024/12/19	→
Bria Text-to-Image Base text-to-image	Bria's Text-to-Image model, trained exclusively on licensed data for safe and risk-free commercial use. Available also as source code and weights. For access to weights: https://bria.ai/contact-us image generation	OK	2024/12/19	→
Bria GenFill image-to-image	Bria GenFill enables high-quality object addition or visual transformation. Trained exclusively on licensed data for safe and risk-free commercial use. Access the model's source code and weights: https://bria.ai/contact-us image editing	OK	2024/12/19	→
Bria Background Replace image-to-image	Bria Background Replace allows for efficient swapping of backgrounds in images via text prompts or reference image, delivering realistic and polished results. Trained exclusively on licensed data for safe and risk-free commercial use image editing	OK	2024/12/19	→
Bria Eraser image-to-image	Bria Eraser enables precise removal of unwanted objects from images while maintaining high-quality outputs. Trained exclusively on licensed data for safe and risk-free commercial use. Access the model's source code and weights: https://bria.ai/contact-us image editing object removal	OK	2024/12/19	→
FLUX.1 [dev] Fill with LoRAs image-to-image	FLUX.1 [dev] Fill is a high-performance endpoint for the FLUX.1 [pro] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities. editing lora	OK	2024/12/19	→
Bria Text-to-Image HD text-to-image	Bria's Text-to-Image model for HD images. Trained exclusively on licensed data for safe and risk-free commercial use. Available also as source code and weights. For access to weights: https://bria.ai/contact-us image generation	OK	2024/12/19	→
Bria Product Shot image-to-image	Place any product in any scenery with just a prompt or reference image while maintaining high integrity of the product. Trained exclusively on licensed data for safe and risk-free commercial use and optimized for eCommerce. product photography	OK	2024/12/19	→
Bria RMBG 2.0 image-to-image	Bria RMBG 2.0 enables seamless removal of backgrounds from images, ideal for professional editing tasks. Trained exclusively on licensed data for safe and risk-free commercial use. Model weights for commercial use are available here: https://share-eu1.hsforms.com/2GLpEVQqJTI2Lj7AMYwgfIwf4e04 background removal image segmentation high resolution utility rembg	OK	2024/12/19	→
try-on image-to-image	Image based high quality Virtual Try-On try-on fashion clothing	OK	2024/12/17	→
Leffa Pose Transfer image-to-image	Leffa Pose Transfer is an endpoint for changing the pose of an image with a reference image. pose utility	OK	2024/12/17	→
FLUX1.1 [pro] ultra text-to-image	FLUX1.1 [pro] ultra is the newest version of FLUX1.1 [pro], maintaining professional-grade image quality while delivering up to 2K resolution with improved photo realism. high-res realism	OK	2024/12/17	→
Leffa Virtual TryOn image-to-image	Leffa Virtual TryOn is a high quality image based Try-On endpoint which can be used for commercial try on. try-on fashion clothing	OK	2024/12/17	→
MiniMax (Hailuo AI) Music text-to-audio	Generate music from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality, diverse musical compositions. music	OK	2024/12/17	→
MiniMax (Hailuo AI) Video 01 image-to-video	Generate video clips from your images using MiniMax Video model motion transformation	OK	2024/12/16	→
Recraft 20b text-to-image	Recraft 20b is a new and affordable text-to-image model. image generation vector art typography style	OK	2024/12/16	→
Hyper3D Rodin image-to-3d	Rodin by Hyper3D generates realistic and production ready 3D models from text or images. stylized	OK	2024/12/16	→
MiniMax (Hailuo AI) Video 01 Live text-to-video	Generate video clips from your prompts using MiniMax model motion transformation	OK	2024/12/16	→
Ideogram V2 Edit image-to-image	Transform existing images with Ideogram V2's editing capabilities. Modify, adjust, and refine images while maintaining high fidelity and realistic outputs with precise prompt control. realism typography	OK	2024/12/14	→
Trellis image-to-3d	Generate 3D models from your images using Trellis. A native 3D generative model enabling versatile and high-quality 3D asset creation. stylized	OK	2024/12/13	→
MMAudio V2 video-to-video	MMAudio generates synchronized audio given video and/or text inputs. It can be combined with video models to get videos with audio. ai video fast	OK	2024/12/12	→
Ideogram V2 text-to-image	Generate high-quality images, posters, and logos with Ideogram V2. Features exceptional typography handling and realistic outputs optimized for commercial and creative use. realism typography	OK	2024/12/4	→
Ideogram V2 Turbo text-to-image	Accelerated image generation with Ideogram V2 Turbo. Create high-quality visuals, posters, and logos with enhanced speed while maintaining Ideogram's signature quality. realism typography	OK	2024/12/4	→
Video Upscaler video-to-video	The video upscaler endpoint uses RealESRGAN on each frame of the input video to upscale the video to a higher resolution. video generation video to video ai video high fidelity motion	OK	2024/12/4	→
Ideogram V2 Turbo Edit image-to-image	Edit images faster with Ideogram V2 Turbo. Quick modifications and adjustments while preserving the high-quality standards and realistic outputs of Ideogram. realism typography	OK	2024/12/4	→
Ideogram V2 Turbo Remix image-to-image	Rapidly create image variations with Ideogram V2 Turbo Remix. Fast and efficient reimagining of existing images while maintaining creative control through prompt guidance. realism typography	OK	2024/12/4	→
Ideogram V2 Remix image-to-image	Reimagine existing images with Ideogram V2's remix feature. Create variations and adaptations while preserving core elements and adding new creative directions through prompt guidance. realism typography	OK	2024/12/4	→
Kling 1.0 text-to-video	Generate video clips from your prompts using Kling 1.0 motion	OK	2024/12/3	→
Luma Photon Flash text-to-image	Generate images from your prompts using Luma Photon Flash. Photon Flash is the most creative, personalizable, and intelligent visual model for creatives, bringing a step-function change in the cost of high-quality image generation.	OK	2024/12/3	→
AuraFlow text-to-image	AuraFlow v0.3 is an open-source flow-based text-to-image generation model that achieves state-of-the-art results on GenEval. The model is currently in beta. typography style	OK	2024/12/2	→
OmniGen v1 text-to-image	OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It can be used for various tasks such as Image Editing, Personalized Image Generation, Virtual Try-On, Multi Person Generation and more! multimodal editing try-on	OK	2024/11/29	→
FLUX.1 [schnell] Redux image-to-image	FLUX.1 [schnell] Redux is a high-performance endpoint for the FLUX.1 [schnell] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities. style transfer	OK	2024/11/27	→
Kling 1.5 text-to-video	Generate video clips from your prompts using Kling 1.5 (pro)	OK	2024/11/25	→
FLUX.1 [schnell] text-to-image	FLUX.1 [schnell] is a 12 billion parameter flow transformer that generates high-quality images from text in 1 to 4 steps, suitable for personal and commercial use.	OK	2024/11/25	→
FLUX1.1 [pro] Redux image-to-image	FLUX1.1 [pro] Redux is a high-performance endpoint for the FLUX1.1 [pro] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities. style transfer	OK	2024/11/21	→
FLUX.1 [dev] Redux image-to-image	FLUX.1 [dev] Redux is a high-performance endpoint for the FLUX.1 [dev] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.	OK	2024/11/21	→
FLUX.1 [dev] Depth with LoRAs image-to-image	Generate high-quality images from depth maps using Flux.1 [dev] depth estimation model. The model produces accurate depth representations for scene understanding and 3D visualization. depth lora utility composition	OK	2024/11/21	→
LTX Video (preview) image-to-video	Generate videos from images using LTX Video	OK	2024/11/21	→
FLUX1.1 [pro] ultra Redux image-to-image	FLUX1.1 [pro] ultra Redux is a high-performance endpoint for the FLUX1.1 [pro] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities. style transfer high-res	OK	2024/11/21	→
FLUX.1 [pro] Fill image-to-image	FLUX.1 [pro] Fill is a high-performance endpoint for the FLUX.1 [pro] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities. editing	OK	2024/11/21	→
FLUX.1 [pro] Redux image-to-image	FLUX.1 [pro] Redux is a high-performance endpoint for the FLUX.1 [pro] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities. style transfer	Deprecated	2024/11/21	→
Kolors Image to Image image-to-image	Photorealistic Image-to-Image realism editing diffusion	OK	2024/11/19	→
IC-Light-v2 for Image Relighting image-to-image	An endpoint for re-lighting photos and changing their backgrounds per a given description relighting editing	OK	2024/11/14	→
Mochi 1 text-to-video	Mochi 1 preview is an open state-of-the-art video generation model with high-fidelity motion and strong prompt adherence in preliminary evaluation.	Deprecated	2024/11/7	→
Train Flux LoRAs For Portraits training	FLUX LoRA training optimized for portrait generation, with bright highlights, excellent prompt following and highly detailed results. lora personalization	OK	2024/11/7	→
FLUX.1 [dev] Differential Diffusion image-to-image	FLUX.1 Differential Diffusion is a rapid endpoint that enables swift, granular control over image transformations through change maps, delivering fast and precise region-specific modifications while maintaining FLUX.1 [dev]'s high-quality output. transformation	Deprecated	2024/11/6	→
MiniMax (Hailuo AI) Video 01 image-to-video	Generate video clips from your images using MiniMax Video model motion transformation	OK	2024/10/30	→
PuLID Flux image-to-image	An endpoint for personalized image generation using Flux as per given description. personalization style transfer	OK	2024/10/29	→
Birefnet Background Removal V2 image-to-image	Bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS) background removal segmentation high-res utility	OK	2024/10/28	→
Stable Diffusion 3.5 Large text-to-image	Stable Diffusion 3.5 Large is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency. diffusion typography style	OK	2024/10/27	→
Stable Diffusion 3.5 Medium text-to-image	Stable Diffusion 3.5 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency. diffusion typography style	OK	2024/10/27	→
Hunyuan Video text-to-video	Hunyuan Video is an Open video generation model with high visual quality, motion diversity, text-video alignment, and generation stability. This endpoint generates videos from text descriptions. motion	OK	2024/10/22	→
CogVideoX-5B image-to-video	Generate videos from images and prompts using CogVideoX-5B	OK	2024/10/17	→
F5 TTS text-to-audio	F5 TTS speech	OK	2024/10/17	→
CogVideoX-5B video-to-video	Generate videos from videos and prompts using CogVideoX-5B editing	OK	2024/10/17	→
LLaVA v1.5 13B vision	Vision multimodal vision	Deprecated	2024/10/5	→
Kling 1.0 image-to-video	Generate video clips from your images using Kling 1.0 (pro) motion	Deprecated	2024/10/4	→
Kling 1.5 image-to-video	Generate video clips from your images using Kling 1.5 (pro)	OK	2024/10/4	→
Kling 1.0 image-to-video	Generate video clips from your images using Kling 1.0 motion	OK	2024/10/4	→
Kling 1.0 text-to-video	Generate video clips from your prompts using Kling 1.0 (pro) motion	Deprecated	2024/10/4	→
LTX Video (preview) text-to-video	Generate videos from prompts using LTX Video	OK	2024/10/4	→
FLUX.1 [pro] text-to-image	FLUX.1 [pro] new is an accelerated version of FLUX.1 [pro], maintaining professional-grade image quality while delivering significantly faster generation speeds.	Deprecated	2024/10/3	→
Live Portrait image-to-image	Transfer expression from a video to a portrait. expression animation	OK	2024/10/1	→
FLUX.1 [dev] Inpainting with LoRAs text-to-image	Super fast endpoint for the FLUX.1 [dev] inpainting model with LoRA support, enabling rapid and high-quality image inpainting using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs. lora personalization	OK	2024/9/18	→
FLUX.1 [dev] with Controlnets and Loras image-to-image	A general purpose endpoint for the FLUX.1 [dev] model, implementing the RF-Inversion pipeline. This can be used to edit a reference image based on a prompt. rf-inversion editing lora	OK	2024/9/17	→
High Quality Stable Video Diffusion image-to-video	Generate short video clips from your images using SVD v1.1	OK	2024/9/16	→
Image Preprocessors image-to-image	Holistically-Nested Edge Detection (HED) preprocessor. preprocess detection utility controlnet	OK	2024/9/16	→
Image Preprocessors image-to-image	Scribble preprocessor. preprocess utility editing controlnet sketch	OK	2024/9/16	→
Image Preprocessors image-to-image	Depth Anything v2 preprocessor. depth preprocess utility controlnet	OK	2024/9/16	→
Image Preprocessors image-to-image	MiDaS depth estimation preprocessor. depth preprocess utility controlnet	OK	2024/9/16	→
Image Preprocessors image-to-image	Line art preprocessor. preprocess utility sketch controlnet	OK	2024/9/16	→
Image Preprocessors image-to-image	Segment Anything Model (SAM) preprocessor. segmentation preprocess utility mask controlnet	OK	2024/9/16	→
Image Preprocessors image-to-image	ZoeDepth preprocessor. depth preprocess utility controlnet	OK	2024/9/16	→
Image Preprocessors image-to-image	TEED (Temporal Edge Enhancement Detection) preprocessor. preprocess detection utility controlnet	OK	2024/9/16	→
Image Preprocessors image-to-image	M-LSD line segment detection preprocessor. preprocess utility controlnet	OK	2024/9/16	→
Image Preprocessors image-to-image	PIDI (Pidinet) preprocessor. detection preprocess utility controlnet	OK	2024/9/16	→
Stable Video Diffusion text-to-video	Generate short video clips from your prompts using SVD v1.1	OK	2024/9/16	→
ControlNeXt SVD video-to-video	Animate a reference image with a driving video using ControlNeXt. animation stylized	Deprecated	2024/9/5	→
FLUX.1 [dev] with Controlnets and Loras text-to-image	A versatile endpoint for the FLUX.1 [dev] model that supports multiple AI extensions including LoRA, ControlNet conditioning, and IP-Adapter integration, enabling comprehensive control over image generation through various guidance methods. lora controlnet ip-adapter	OK	2024/8/21	→
Stable Diffusion V3 text-to-image	Stable Diffusion 3 Medium (Text to Image) is a Multimodal Diffusion Transformer (MMDiT) model that improves image quality, typography, prompt understanding, and efficiency. diffusion style	OK	2024/8/20	→
Segment Anything Model image-to-image	SAM. segmentation mask	Deprecated	2024/8/20	→
Segment Anything Model 2 image-to-image	SAM 2 is a model for segmenting images and videos in real-time. segmentation mask real-time	OK	2024/8/15	→
Segment Anything Model 2 video-to-video	SAM 2 is a model for segmenting images and videos in real-time. segmentation mask real-time	OK	2024/8/15	→
FLUX.1 [dev] with Controlnets and Loras image-to-image	FLUX General Image-to-Image is a versatile endpoint that transforms existing images with support for LoRA, ControlNet, and IP-Adapter extensions, enabling precise control over style transfer, modifications, and artistic variations through multiple guidance methods. lora controlnet ip-adapter	OK	2024/8/14	→
FLUX.1 [dev] with Controlnets and Loras image-to-image	FLUX General Inpainting is a versatile endpoint that enables precise image editing and completion, supporting multiple AI extensions including LoRA, ControlNet, and IP-Adapter for enhanced control over inpainting results and sophisticated image modifications. lora controlnet ip-adapter	OK	2024/8/14	→
FLUX.1 [dev] with Controlnets and Loras image-to-image	A specialized FLUX endpoint combining differential diffusion control with LoRA, ControlNet, and IP-Adapter support, enabling precise, region-specific image transformations through customizable change maps. lora controlnet ip-adapter	OK	2024/8/13	→
FLUX.1 [dev] with LoRAs image-to-image	FLUX LoRA Image-to-Image is a high-performance endpoint that transforms existing images using FLUX models, leveraging LoRA adaptations to enable rapid and precise image style transfer, modifications, and artistic variations. lora style transfer	OK	2024/8/13	→
Fooocus Upscale or Vary text-to-image	Default parameters with automated optimizations and quality improvements. upscaling vary stylized	OK	2024/8/12	→
FLUX.1 Subject text-to-image	Super fast endpoint for the FLUX.1 [schnell] model with subject input capabilities, enabling rapid and high-quality image generation for personalization, specific styles, brand identities, and product-specific outputs. personalization customization	OK	2024/8/1	→
Sana text-to-image	Sana can synthesize high-resolution, high-quality images with strong text-image alignment at a remarkably fast speed, with the ability to generate 4K images in less than a second.	OK	2024/8/1	→
PixArt-Σ text-to-image	Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation diffusion	OK	2024/8/1	→
FLUX.1 [dev] with LoRAs text-to-image	Super fast endpoint for the FLUX.1 [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs. lora personalization	OK	2024/8/1	→
SDXL ControlNet Union text-to-image	An efficient SDXL multi-controlnet text-to-image model. diffusion controlnet composition	OK	2024/7/31	→
SDXL ControlNet Union image-to-image	An efficient SDXL multi-controlnet image-to-image model. diffusion controlnet composition	OK	2024/7/31	→
SDXL ControlNet Union image-to-image	An efficient SDXL multi-controlnet inpainting model. diffusion controlnet composition	OK	2024/7/31	→
Kolors text-to-image	Photorealistic Text-to-Image realism diffusion	OK	2024/7/24	→
AMT Frame Interpolation image-to-video	Interpolate between image frames interpolation editing	OK	2024/7/18	→
MusePose video-to-video	Animate a reference image with a driving video using MusePose.	Deprecated	2024/7/18	→
FLUX.1 [dev] image-to-image	FLUX.1 Image-to-Image is a high-performance endpoint for the FLUX.1 [dev] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities. style transfer	OK	2024/7/11	→
Live Portrait image-to-video	Transfer expression from a video to a portrait. expression animation	OK	2024/7/9	→
Era 3D image-to-image	A powerful image to novel multiview model with normals.	Deprecated	2024/7/1	→
Stable Cascade text-to-image	Stable Cascade: Image generation on a smaller & cheaper latent space. diffusion lcm	OK	2024/6/25	→
Florence-2 Large image-to-image	Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks detection multimodal vision	OK	2024/6/22	→
Florence-2 Large image-to-image	Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks multimodal vision detection	OK	2024/6/22	→
Florence-2 Large vision	Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks multimodal vision	OK	2024/6/22	→
Florence-2 Large image-to-image	Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks multimodal vision	OK	2024/6/22	→
Florence-2 Large image-to-image	Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks multimodal vision	OK	2024/6/22	→
Florence-2 Large image-to-image	Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks ocr multimodal vision	OK	2024/6/22	→
Florence-2 Large vision	Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks captioning multimodal vision	OK	2024/6/22	→
Florence 2 Large OCR vision	Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks ocr multimodal vision	OK	2024/6/22	→
Florence-2 Large image-to-image	Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks multimodal vision segmentation	OK	2024/6/22	→
Florence-2 Large image-to-image	Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks multimodal vision	OK	2024/6/22	→
Florence 2 Large Region To Category vision	Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks multimodal vision	OK	2024/6/22	→
Florence 2 Large Caption vision	Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks captioning multimodal vision	OK	2024/6/22	→
Florence-2 Large image-to-image	Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks multimodal vision segmentation	OK	2024/6/22	→
Florence 2 Large Detailed Caption vision	Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks captioning multimodal vision	OK	2024/6/22	→
Stable Diffusion XL text-to-image	Run SDXL at the speed of light diffusion lora embeddings high-res style	OK	2024/6/12	→
Stable Diffusion V3 image-to-image	Stable Diffusion 3 Medium (Image to Image) is a Multimodal Diffusion Transformer (MMDiT) model that improves image quality, typography, prompt understanding, and efficiency. diffusion editing style	OK	2024/6/12	→
SoteDiffusion text-to-image	Anime finetune of Würstchen V3. lcm stylized	OK	2024/6/10	→
Luma Photon text-to-image	Generate images from your prompts using Luma Photon. Photon is the most creative, personalizable, and intelligent visual model for creatives, bringing a step-function change in the cost of high-quality image generation.	OK	2024/6/3	→
Stable Video Diffusion Turbo text-to-video	Generate short video clips from your images using SVD v1.1 at Lightning Speed lcm diffusion turbo	OK	2024/6/3	→
DWPose Pose Prediction image-to-image	Predict poses from images. pose utility	OK	2024/6/1	→
SD 1.5 Depth ControlNet image-to-image	SD 1.5 ControlNet diffusion editing manipulation controlnet	Deprecated	2024/5/31	→
CCSR Upscaler image-to-image	SOTA Image Upscaler upscaling	OK	2024/5/5	→
Omni Zero image-to-image	Any pose, any style, any identity style transfer	OK	2024/4/25	→
Lightning Models text-to-image	Collection of SDXL Lightning models. diffusion lightning	Deprecated	2024/4/25	→
Playground v2.5 text-to-image	State-of-the-art open-source model in aesthetic quality artistic style	OK	2024/4/25	→
Hyper SDXL image-to-image	Hyper-charge SDXL's performance and creativity. diffusion editing	Deprecated	2024/4/25	→
Realistic Vision text-to-image	Generate realistic images. realism diffusion	OK	2024/4/25	→
Dreamshaper text-to-image	Dreamshaper model. stylized diffusion	OK	2024/4/25	→
Hyper SDXL image-to-image	Hyper-charge SDXL's performance and creativity. diffusion	Deprecated	2024/4/25	→
IP Adapter Face ID image-to-image	High quality zero-shot personalization ip-adapter personalization customization editing	OK	2024/4/22	→
Stable Diffusion with LoRAs image-to-image	Run Any Stable Diffusion model with customizable LoRA weights. diffusion lora customization fine-tuning	OK	2024/4/18	→
Stable Diffusion with LoRAs image-to-image	Run Any Stable Diffusion model with customizable LoRA weights. diffusion lora customization fine-tuning	OK	2024/4/17	→
Stable Diffusion XL image-to-image	Run SDXL at the speed of light diffusion high-res lora ip-adapter controlnet	OK	2024/4/16	→
Stable Diffusion XL image-to-image	Run SDXL at the speed of light diffusion high-res lora ip-adapter controlnet	OK	2024/4/16	→
Stable Diffusion v1.5 text-to-image	Stable Diffusion v1.5 diffusion	OK	2024/4/16	→
Layer Diffusion XL text-to-image	SDXL with an alpha channel.	Deprecated	2024/4/13	→
MuseTalk image-to-video	MuseTalk is a real-time high quality audio-driven lip-syncing model. Use MuseTalk to animate a face with your own audio. animation lip sync real-time	OK	2024/4/11	→
Stable Diffusion XL Lightning text-to-image	Run SDXL at the speed of light diffusion lightning real-time	OK	2024/4/11	→
AuraSR image-to-image	Upscale your images with AuraSR. upscaling high-res	OK	2024/4/11	→
Sad Talker image-to-video	Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation animation	OK	2024/4/11	→
Wizper (Whisper v3 -- fal.ai edition) speech-to-text	[Experimental] Whisper v3 Large -- but optimized by our inference wizards. Same WER, double the performance! transcription speech	OK	2024/4/8	→
NSFW Filter vision	Predict the probability of an image being NSFW. filter safety utility	OK	2024/3/22	→
Moondream vision	Answer questions about images. multimodal vision	OK	2024/3/20	→
Fooocus text-to-image	Fooocus extreme speed mode as a standalone app. stylized	OK	2024/3/13	→
Face to Sticker image-to-image	Create stickers from faces. sticker editing	Deprecated	2024/3/11	→
PhotoMaker image-to-image	Customizing Realistic Human Photos via Stacked ID Embedding editing customization realism personalization	OK	2024/3/8	→
T2V Turbo - Video Crafter text-to-video	Generate short video clips from your prompts turbo	OK	2024/3/8	→
ControlNet SDXL text-to-image	Generate Images with ControlNet. diffusion controlnet manipulation	OK	2024/2/28	→
Creative Upscaler image-to-image	Create creative upscaled images. upscaling	OK	2024/2/27	→
Birefnet Background Removal image-to-image	Bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS). background removal segmentation high-res utility	OK	2024/2/27	→
Stable Diffusion XL Lightning image-to-image	Run SDXL at the speed of light diffusion lightning editing	OK	2024/2/21	→
Playground v2.5 image-to-image	State-of-the-art open-source model in aesthetic quality inpaint artistic style	OK	2024/2/21	→
Stable Diffusion XL Lightning image-to-image	Run SDXL at the speed of light diffusion lightning	OK	2024/2/21	→
Hyper SDXL text-to-image	Hyper-charge SDXL's performance and creativity. diffusion real-time	Deprecated	2024/2/21	→
Playground v2.5 image-to-image	State-of-the-art open-source model in aesthetic quality artistic style	OK	2024/2/21	→
AMT Interpolation video-to-video	Interpolate between video frames interpolation editing	OK	2024/2/21	→
AnimateDiff text-to-video	Animate your ideas! animation stylized	OK	2024/2/21	→
Whisper speech-to-text	Whisper is a model for speech transcription and translation. transcription translation speech	Deprecated	2024/2/19	→
Latent Consistency Models (v1.5/XL) image-to-image	Run SDXL at the speed of light lcm diffusion turbo real-time editing	OK	2024/2/19	→
Latent Consistency Models (v1.5/XL) text-to-image	Run SDXL at the speed of light lcm diffusion turbo real-time	OK	2024/2/19	→
Latent Consistency Models (v1.5/XL) image-to-image	Run SDXL at the speed of light lcm diffusion turbo real-time editing	OK	2024/2/19	→
Fooocus text-to-image	Fooocus extreme speed mode as a standalone app.	Deprecated	2024/2/16	→
LLaVA v1.6 34B vision	Vision multimodal vision	OK	2024/2/14	→
AnimateDiff Turbo text-to-video	Animate your ideas at lightning speed! animation stylized turbo	OK	2024/2/13	→
Illusion Diffusion text-to-image	Create illusions conditioned on image. composition stylized	OK	2024/2/13	→
Fooocus Image Prompt text-to-image	Default parameters with automated optimizations and quality improvements. stylized	OK	2024/2/13	→
Face Retoucher image-to-image	Automatically retouches faces to smooth skin and remove blemishes. editing	OK	2024/2/13	→
Stable Video Diffusion Turbo image-to-video	Generate short video clips from your images using SVD v1.1 at Lightning Speed turbo	OK	2024/2/13	→
Midas Depth Estimation image-to-image	Create depth maps using Midas depth estimation. depth utility	OK	2024/2/13	→
AnimateDiff Turbo video-to-video	Re-animate your videos at lightning speed! animation stylized turbo	OK	2024/2/13	→
Fooocus Inpainting text-to-image	Default parameters with automated optimizations and quality improvements. stylized editing	OK	2024/2/13	→
MiniMax (Hailuo AI) Video 01 text-to-video	Generate video clips from your prompts using the MiniMax model. motion transformation	OK	2024/2/13	→
AnimateDiff video-to-video	Re-animate your videos! animation stylized	OK	2024/2/13	→
Clarity Upscaler image-to-image	Clarity upscaler for upscaling images with very high fidelity. upscaling	OK	2024/2/4	→
Latent Consistency (SDXL & SDv1.5) text-to-image	Produce high-quality images with minimal inference steps. diffusion lcm real-time	Deprecated	2024/2/4	→
TripoSR image-to-3d	State of the art Image to 3D Object generation	OK	2024/1/30	→
DiffusionEdge text-to-image	Diffusion based high quality edge detection detection	Deprecated	2024/1/8	→
Stable Audio Open text-to-audio	Open source text-to-audio model. music	OK	2024/1/4	→
Marigold Depth Estimation image-to-image	Create depth maps using Marigold depth estimation. depth utility	OK	2023/12/28	→
PuLID image-to-image	Tuning-free ID customization. editing customization personalization	OK	2023/12/14	→
ControlNet SDXL image-to-image	Generate Images with ControlNet. diffusion controlnet editing manipulation	OK	2023/12/1	→
ControlNet SDXL image-to-image	Generate Images with ControlNet. diffusion controlnet editing manipulation	OK	2023/12/1	→
Fooocus text-to-image	Default parameters with automated optimizations and quality improvements. stylized	OK	2023/11/16	→
Optimized Latent Consistency (SDv1.5) image-to-image	Produce high-quality images with minimal inference steps. Optimized for 512x512 input image size. diffusion lcm real-time	OK	2023/11/9	→
Animatediff SparseCtrl LCM text-to-video	Animate Your Drawings with Latent Consistency Models! lcm animation stylized	Deprecated	2023/11/9	→
Inpainting sdxl and sd image-to-image	Inpaint images with SD and SDXL editing diffusion	OK	2023/11/4	→
ControlNet SDXL image-to-image	Generate Images with ControlNet. diffusion controlnet manipulation	Deprecated	2023/11/1	→
Upscale Images image-to-image	Upscale images by a given factor. upscaling high-res	OK	2023/10/30	→
Remove Background image-to-image	Remove the background from an image. background removal utility editing	OK	2023/10/5	→
Stable Diffusion with LoRAs text-to-image	Run Any Stable Diffusion model with customizable LoRA weights. diffusion lora customization	OK	2023/9/26	→