fal.ai Models

Browse all available fal.ai models
Total: 1225 New: 0 Active: 1170 Deprecated: 55
Model Name Category Description Tags Status Published
Genfocus image-to-image GenFocus Model to Refocus Images
image-to-image
OK 1d
Genfocus image-to-image GenFocus Model to Refocus Images
image-to-image
OK 1d
Personaplex audio-to-audio PersonaPlex is a real-time, full-duplex speech-to-speech conversational model that enables persona control through text-based role prompts and audio-based voice conditioning.
audio
OK 2d
Workflow Utilities video-to-video FFMPEG Utility for Trimming Video
video-to-video
OK 4d
Meshy 6 image-to-3d Meshy-6 is the latest model from Meshy. It generates realistic and production ready 3D models.
image-to-3d
OK 6d
Meshy 6 text-to-3d Meshy-6 is the latest model from Meshy. It generates realistic and production ready 3D models.
text-to-3d
OK 6d
Qwen Image Trainer V2 training Qwen Image LoRA training
lora personalization
OK 8d
Vidu image-to-video Vidu's Q3 Turbo Model
image-to-video
OK 9d
Vidu text-to-video Vidu's Q3 Turbo Model.
text-to-video
OK 9d
V2.6 video-to-video Wan 2.6 reference-to-video flash model.
reference-to-video
OK 9d
Bytedance video-to-video Transfer motion from a video to characters in an image using Dreamactor v2. Great performance for non-human and multiple characters
motion-control dreamactor
OK 9d
Flux 2 [klein] Realtime image-to-image Realtime generation with FLUX.2 [klein] from Black Forest Labs.
realtime image-to-image
OK 9d
Workflow Utilities audio-to-audio FFMPEG Utility for Impulse Response OK 10d
Workflow Utilities image-to-image FFMPEG Utility for Extracting nth Frame OK 10d
Workflow Utilities video-to-video FFMPEG Utility for Blending Videos OK 10d
Workflow Utilities audio-to-audio FFMPEG Utility for Audio Compression OK 10d
Kling Video v3 Text to Video [Pro] text-to-video Kling 3.0 Pro: Top-tier text-to-video with cinematic visuals, fluid motion, and native audio generation, with multi-shot support.
text-to-video
OK 11d
Kling O3 Image to Video [Pro] image-to-video Generate a video by taking a start frame and an end frame, animating the transition between them while following text-driven style and scene guidance.
image-to-video
OK 11d
Kling O3 Reference to Video [Pro] image-to-video Transform images, elements, and text into consistent, high-quality video scenes, ensuring stable character identity, object details, and environments.
reference-to-video
OK 11d
Kling O3 Text to Video [Pro] text-to-video Generate realistic videos using Kling O3 from Kling Team!
text-to-video
OK 11d
Kling O3 Text to Video [Standard] text-to-video Generate realistic videos using Kling O3 from Kling Team!
text-to-video
OK 11d
Kling O3 Reference to Video [Standard] image-to-video Transform images, elements, and text into consistent, high-quality video scenes, ensuring stable character identity, object details, and environments.
reference-to-video
OK 11d
Kling Video v3 Text to Video [Standard] text-to-video Kling 3.0 Standard: Top-tier text-to-video with cinematic visuals, fluid motion, and native audio generation, with multi-shot support.
text-to-video
OK 11d
Kling O3 Image to Video [Pro] image-to-video Generate a video by taking a start frame and an end frame, animating the transition between them while following text-driven style and scene guidance.
image-to-video
OK 11d
Kling O3 Edit Video [Standard] video-to-video Edit videos using Kling O3 from Kling Team!
video-to-video
OK 11d
Kling O3 Edit Video [Pro] video-to-video Edit videos using Kling O3 from Kling Team!
video-to-video
OK 11d
Kling O3 Reference Video to Video [Standard] video-to-video Kling O3 Omni generates new shots guided by an input reference video, preserving cinematic language such as motion, and camera style to produce seamless scene continuity.
video-to-video
OK 11d
Kling Video v3 Image to Video [Standard] image-to-video Kling 3.0 Standard: Top-tier image-to-video with cinematic visuals, fluid motion, and native audio generation, with custom element support.
image-to-video
OK 11d
Kling Video v3 Image to Video [Pro] image-to-video Kling 3.0 Pro: Top-tier image-to-video with cinematic visuals, fluid motion, and native audio generation, with custom element support.
image-to-video
OK 11d
Kling O3 Reference Video to Video [Pro] video-to-video Kling O3 Omni generates new shots guided by an input reference video, preserving cinematic language such as motion, and camera style to produce seamless scene continuity.
video-to-video
OK 11d
MiniMax Speech 2.8 [HD] text-to-speech Generate speech from text prompts and different voices using the MiniMax Speech-2.8 HD model, which leverages advanced AI techniques to create high-quality text-to-speech. OK 11d
MiniMax Speech 2.8 [Turbo] text-to-speech Generate speech from text prompts and different voices using the MiniMax Speech-2.8 Turbo model, which leverages advanced AI techniques to create high-quality text-to-speech. OK 11d
Kling Image text-to-image Kling V3: Latest Kling Image model
text-to-image
OK 12d
Kling Image image-to-image Kling Image V3: Latest Kling Image model
image-to-image
OK 12d
Kling Image image-to-image Kling Omni 3: Top-tier image-to-image with flawless consistency.
image-to-image
OK 12d
Kling Image text-to-image Kling Omni 3: Top-tier text-to-image with flawless consistency.
text-to-image
OK 12d
Vidu image-to-video Vidu's latest Q3 pro models.
image-to-video
OK 15d
Vidu text-to-video Vidu's latest Q3 pro models
text-to-video
OK 15d
Hunyuan 3d text-to-3d Create detailed, fully-textured 3D models with text
3d
OK 17d
Grok Imagine Image image-to-image Edit images precisely with xAI's Grok Imagine model
grok xai image-editing
OK 17d
Grok Imagine Image text-to-image Generate highly aesthetic images with xAI's Grok Imagine Image generation model.
xai grok text-to-image
OK 17d
Grok Imagine Video video-to-video Edit videos using xAI's Grok Imagine
video-edit v2v grok xai
OK 17d
Grok Imagine Video image-to-video Generate videos from images with audio using xAI's Grok Imagine Video model.
grok xai image-to-video i2v
OK 17d
Grok Imagine Video text-to-video Generate videos with audio from text using Grok Imagine Video.
xai grok t2v text-to-video
OK 17d
Hunyuan Image image-to-image Image editing endpoint for Hunyuan Image 3.0 Instruct.
tencent hunyuan-image instruct edit
OK 18d
Hunyuan Image 3.0 Instruct text-to-image Instruct version of Hunyuan-Image 3.0, with internal reasoning capabilities.
hunyuan-image v3 instruct
OK 18d
Hunyuan 3D Smart Topology 3d-to-3d Optimize 3D mesh topology with Hunyuan 3D Smart Topology.
3d hunyuan topology
OK 18d
Hunyuan 3D Rapid Image to 3D image-to-3d Rapidly generate 3D models from images using Hunyuan 3D.
3d hunyuan image-to-3d
OK 18d
Hunyuan 3D Pro Text to 3D text-to-3d Generate 3D models from text prompts with Hunyuan 3D Pro
3d hunyuan text-to-3d
OK 18d
Hunyuan 3D Pro Image to 3D image-to-3d Generate 3D models from images with Hunyuan 3D Pro
3d hunyuan image-to-3d
OK 18d
Z-Image Trainer training Fast LoRA trainer for Z-Image, a super fast text-to-image model of 6B parameters developed by Tongyi-MAI.
lora personalization trainer
OK 18d
Hunyuan 3D Part Splitter 3d-to-3d Split 3D models into parts with Hunyuan 3D
3d hunyuan mesh
OK 18d
Qwen Image Max text-to-image Text-to-Image endpoint for Qwen-Image-Max. Qwen Image Max improves upon the Qwen Image Plus series by enhancing the realism and naturalness of images.
qwen-image max
OK 19d
Qwen Image Max image-to-image Image editing endpoint for Qwen-Image-Max. Qwen Image Max improves upon the Qwen Image Plus series by enhancing the realism and naturalness of images.
qwen-image max
OK 19d
Workflow Utilities unknown FFMPEG Utility for Interleaving Videos OK 19d
Z Image Base (LoRA) text-to-image LoRA endpoint for Z-Image, the foundation model of the Z-Image family.
z-image base lora
OK 19d
Z Image Base text-to-image Z-Image is the foundation model of the Z-Image family, engineered for good quality, robust generative diversity, broad stylistic coverage, and precise prompt adherence.
z-image base
OK 19d
LTX-2 19B Distilled audio-to-video Generate video with audio from audio, text and images using LTX-2 Distilled and custom LoRA OK 19d
LTX-2 19B audio-to-video Generate video with audio from audio, text and images using LTX-2 and custom LoRA OK 19d
LTX-2 19B Distilled audio-to-video Generate video with audio from audio, text and images using LTX-2 Distilled OK 19d
LTX-2 19B audio-to-video Generate video with audio from audio, text and images using LTX-2 OK 19d
Replace Background image-to-image Creates enriched product shots by placing them in various environments using textual descriptions.
bria replace-background
OK 19d
Pixverse image-to-video Use the latest pixverse v5.6 model to turn your texts and images into amazing videos.
image-to-video
OK 20d
Pixverse image-to-video Use the latest pixverse v5.6 model to turn your texts and images into amazing videos.
image-to-video
OK 20d
Pixverse text-to-video Use the latest pixverse v5.6 model to turn your texts into amazing videos.
text-to-video
OK 20d
Qwen 3 TTS - Voice Design [1.7B] text-to-speech Create custom voices using Qwen3-TTS Voice Design model and later use Clone Voice model to create your own voices!
text-to-speech voice-design
OK 20d
Qwen 3 TTS - Text to Speech [1.7B] text-to-speech Bring speech to your texts using Qwen3-TTS Custom-Voice model with pre-trained voices or use your custom voice with Qwen3-TTS Clone Voice model
text-to-speech
OK 20d
Qwen 3 TTS - Text to Speech [0.6B] text-to-speech Bring speech to your texts using Qwen3-TTS Custom-Voice model with pre-trained voices or use your custom voice with Qwen3-TTS Clone Voice model
text-to-speech
OK 20d
Qwen 3 TTS - Clone Voice [1.7B] unknown Clone your voice using the Qwen3-TTS Clone-Voice model with zero-shot cloning capabilities, then use it with the text-to-speech models to generate speech in your own voice!
clone-voice voice-clone
OK 20d
Qwen 3 TTS - Clone Voice [0.6B] unknown Clone your voice using the Qwen3-TTS Clone-Voice model with zero-shot cloning capabilities, then use it with the text-to-speech models to generate speech in your own voice!
clone-voice voice-clone
OK 20d
Z Image Turbo Trainer V2 training Fast LoRA trainer for Z-Image-Turbo, a super fast text-to-image model of 6B parameters developed by Tongyi-MAI.
lora personalization trainer
OK 22d
Ai Face Swap video-to-video AI-FaceSwap-Video is a service that can replace a person's face throughout a video clip while keeping their movements natural.
faceswap utility transformation
OK 23d
Ai Face Swap image-to-image AI-FaceSwap-Image is a service that can take one person's face and realistically blend it onto another's in a photo.
faceswap utility transformation
OK 23d
Fibo Edit [Structured Instruction] text-to-json Structured Instructions Generation endpoint for Fibo Edit, Bria's newest editing model.
structured-prompt-generation fibo-edit json
OK 26d
Fibo Edit [Replace Object by Text] image-to-image Natural, expressive object swapping within images using plain language
object-replacement bria fibo-edit json
OK 26d
Fibo Edit [Sketch to Image] image-to-image Converts line drawings and sketches into photorealistic, fully colored images
sketch-to-image bria fibo-edit json
OK 26d
Fibo Edit [Restore] image-to-image Automatically renews and cleans noisy or degraded images.
image-restoration fibo-edit bria json
OK 26d
Fibo Edit [Reseason] image-to-image Transforms the seasonal or weather atmosphere of an image.
bria fibo-edit reseason
OK 26d
Fibo Edit [Relight] image-to-image Precise, controllable lighting changes using simple, structured text inputs.
bria fibo-edit relighting json
OK 26d
Fibo Edit [Restyle] image-to-image Transforms images into distinct artistic styles using curated, production-grade style mappings
bria fibo-edit restyle json
OK 26d
Fibo Edit [Rewrite Text] image-to-image Precise, reliable modification of existing text inside images.
bria fibo-edit text-rewriting image-editing
OK 26d
Fibo Edit [Erase by Text] image-to-image Fast, reliable removal of unwanted elements from images. Designed for predictability, scale, and production use.
bria fibo-edit prompt-eraser
OK 26d
Fibo Edit image-to-image A high-quality editing model that achieves maximum controllability and transparency by combining JSON + Mask + Image.
bria fibo-edit image-editing json
OK 26d
Fibo Edit [Add Object by Text] image-to-image Precise, context-aware insertion of new objects into an existing image using simple, structured spatial commands.
bria fibo-edit object-addition json
OK 26d
Fibo Edit [Blend] image-to-image Complex, multi-step visual composition through natural language.
bria fibo-edit blend json
OK 26d
Fibo Edit [Colorize] image-to-image Transforms the color treatment of images using predefined, style-based commands
bria fibo-edit color
OK 26d
Vidu image-to-video Use the latest Vidu Q2 Pro models, which offer much better quality and control over your videos. OK 27d
Flux 2 [klein] 9B Base Lora image-to-image Image-to-image editing with LoRA support for FLUX.2 [klein] 9B Base from Black Forest Labs. Specialized style transfer and domain-specific modifications. OK 27d
Flux 2 [klein] 9B Base Lora text-to-image Text-to-image generation with LoRA support for FLUX.2 [klein] 9B Base from Black Forest Labs. Custom style adaptation and fine-tuned model variations. OK 27d
Flux 2 [klein] 4B Base Lora image-to-image Image-to-image editing with LoRA support for FLUX.2 [klein] 4B Base from Black Forest Labs. Specialized style transfer and domain-specific modifications. OK 27d
Flux 2 [klein] 4B Base Lora text-to-image Text-to-image generation with LoRA support for FLUX.2 [klein] 4B Base from Black Forest Labs. Custom style adaptation and fine-tuned model variations. OK 27d
Nemotron audio-to-text Use the speed and pinpoint accuracy of Nemotron to transcribe your audio. OK 27d
Nemotron audio-to-text Use the speed and pinpoint accuracy of Nemotron to transcribe your audio. OK 27d
Fibo Lite text-to-json Structured Prompt Generation endpoint for Fibo-Lite, Bria's SOTA open-source model
bria fibo structured-prompt
OK 27d
Fibo Lite text-to-json Structured Prompt Generation endpoint for Fibo-Lite, Bria's SOTA open-source model
bria structured-prompting
OK 27d
V2.6 image-to-video Wan 2.6 image-to-video flash model. OK 28d
Flux 2 Klein 9B Base Trainer training Fine-tune FLUX.2 [klein] 9B from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific editing tasks. OK 29d
Flux 2 Klein 9B Base Trainer training Fine-tune FLUX.2 [klein] 9B from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific editing tasks. OK 29d
Flux 2 Klein 4B Base Trainer training Fine-tune FLUX.2 [klein] 4B from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific styles and domains. OK 29d
Flux 2 Klein image-to-image Image-to-image editing with LoRA support for FLUX.2 [klein] 9B from Black Forest Labs. Specialized style transfer and domain-specific modifications. Deprecated 29d
Flux 2 Klein text-to-image Text-to-image generation with LoRA support for FLUX.2 [klein] from Black Forest Labs. Custom style adaptation and fine-tuned model variations. Deprecated 29d
Flux 2 Klein 4B Base Trainer training Fine-tune FLUX.2 [klein] 4B from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific editing tasks. OK 29d
Flux 2 [klein] 4B Base image-to-image Image-to-image editing with Flux 2 [klein] 4B Base from Black Forest Labs. Precise modifications using natural language descriptions and hex color control. OK 1/15
FLUX.2 [klein] 9B Base text-to-image Text-to-image generation with FLUX.2 [klein] 9B Base from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities. OK 1/15
Flux 2 [klein] 9B Base image-to-image Image-to-image editing with Flux 2 [klein] 9B Base from Black Forest Labs. Precise modifications using natural language descriptions and hex color control. OK 1/15
Flux 2 [klein] 4B Base text-to-image Text-to-image generation with Flux 2 [klein] 4B Base from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities. OK 1/15
Flux 2 [klein] 4B image-to-image Image-to-image editing with Flux 2 [klein] 4B from Black Forest Labs. Precise modifications using natural language descriptions and hex color control. OK 1/15
Flux 2 [klein] 9B image-to-image Image-to-image editing with Flux 2 [klein] 9B from Black Forest Labs. Precise modifications using natural language descriptions and hex color control. OK 1/15
FLUX.2 [klein] 9B text-to-image Text-to-image generation with FLUX.2 [klein] 9B from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities. OK 1/15
Flux 2 [klein] 4B text-to-image Text-to-image generation with Flux 2 [klein] 4B from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities. OK 1/15
ImagineArt 1.5 Pro Preview text-to-image ImagineArt 1.5 Pro is an advanced text-to-image model that creates ultra-high-fidelity 4K visuals with lifelike realism, refined aesthetics, and powerful creative output suited for professional use.
visuals imagineart realism text
OK 1/15
Qwen Image 2512 Trainer V2 training Fast LoRA trainer for Qwen-Image-2512
lora personalization
OK 1/15
Flux 2 [klein] 4B image-to-image Image-to-image editing with Flux 2 [klein] 4B from Black Forest Labs. Precise modifications using natural language descriptions and hex color control. Deprecated 1/15
ElevenLabs Voice Changer audio-to-audio Change the voices in your audio to voices from ElevenLabs!
voice-change audio-to-audio
OK 1/14
ElevenLabs Dubbing audio-to-video Generate dubbed videos or audios using ElevenLabs Dubbing feature!
dubbing audio-to-audio
OK 1/14
ElevenLabs Speech to Text - Scribe V2 speech-to-text Use Scribe-V2 from ElevenLabs to do blazingly fast speech to text inferences!
speech-to-text
OK 1/14
Glm Image image-to-image Create high-quality images with accurate text rendering and rich knowledge details—supports editing, style transfer, and maintaining consistent characters across multiple images.
image-to-image
OK 1/14
Glm Image text-to-image Create high-quality images with accurate text rendering and rich knowledge details—supports editing, style transfer, and maintaining consistent characters across multiple images.
text-to-image
OK 1/14
OpenRouter [Video][Enterprise] video-to-text Run any VLM (Video Language Model) with fal, powered by OpenRouter. OK 1/13
OpenRouter [Video] video-to-text Run any VLM (Video Language Model) with fal, powered by OpenRouter. OK 1/13
Nova SR audio-to-audio Enhance muffled 16 kHz speech audio into crystal-clear 48 kHz
speech-enhancements audio-super-resolution audio-sr
OK 1/13
Flux 2 Trainer V2 training Fine-tune FLUX.2 [dev] from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific editing tasks. OK 1/10
Flux 2 Trainer V2 training Fine-tune FLUX.2 [dev] from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific styles and domains. OK 1/10
Longcat Multi Avatar audio-to-video LongCat-Video-Avatar is an audio-driven video generation model that generates super-realistic, lip-synchronized long videos with natural dynamics and consistent identity.
audio-to-video image-to-video
OK 1/8
Silero VAD audio-to-text Detect speech presence and timestamps with accuracy and speed using the ultra-lightweight Silero VAD model
vad silero voice-activity-detection
OK 1/8
DeepFilterNet 3 audio-to-audio Enhance speech audio by removing background noise and upsampling to 48 kHz
speech-enhancement
OK 1/7
LTX-2 Video to Video Trainer training Train LTX-2 for video transformation or video-conditioned generation.
ltx2-video fine-tuning video-to-video
OK 1/7
Qwen Image Edit 2511 Multiple Angles image-to-image Generates the same scene from different angles (azimuth/elevation) with Qwen Image Edit 2511 and the Multiple Angles LoRA
stylized transform lora multi-angles multiples angles
OK 1/7
LTX-2 19B Distilled video-to-video Generate video with audio from videos using LTX-2 Distilled and custom LoRA OK 1/7
LTX-2 19B Distilled video-to-video Generate video with audio from videos using LTX-2 Distilled OK 1/7
LTX-2 19B video-to-video Generate video with audio from videos using LTX-2 and custom LoRA OK 1/7
LTX-2 19B video-to-video Generate video with audio from videos using LTX-2 OK 1/7
Ultrashape 3d-to-3d UltraShape-1.0 is a 3D diffusion framework that generates high-fidelity 3D geometry through coarse-to-fine geometric refinement.
3d-to-3d
OK 1/6
LTX-2 19B Distilled video-to-video Extend videos with audio using LTX-2 Distilled and custom LoRA OK 1/5
LTX-2 19B Distilled video-to-video Extend videos with audio using LTX-2 Distilled OK 1/5
LTX-2 19B Distilled image-to-video Generate video with audio from images using LTX-2 Distilled and custom LoRA OK 1/5
LTX-2 19B Distilled image-to-video Generate video with audio from images using LTX-2 Distilled OK 1/5
LTX-2 19B Distilled text-to-video Generate video with audio from text using LTX-2 Distilled and custom LoRA OK 1/5
LTX-2 19B Distilled text-to-video Generate video with audio from text using LTX-2 Distilled OK 1/5
LTX-2 19B video-to-video Extend video with audio using LTX-2 and custom LoRA OK 1/5
LTX-2 19B text-to-video Generate video with audio from text using LTX-2 and custom LoRA OK 1/5
LTX-2 19B image-to-video Generate video with audio from images using LTX-2 and custom LoRA OK 1/5
LTX-2 19B video-to-video Extend video with audio using LTX-2 OK 1/5
LTX-2 19B text-to-video Generate video with audio from text using LTX-2 OK 1/5
LTX-2 19B image-to-video Generate video with audio from images using LTX-2 OK 1/5
LTX-2 Video Trainer training Train LTX-2 for custom styles and effects.
ltx2-video fine-tuning
OK 1/3
Qwen Image 2512 text-to-image LoRA inference endpoint for Qwen Image 2512, an improved version of Qwen Image with better text rendering, finer natural textures, and more realistic human generation.
qwen 2512 lora
OK 1/2
Qwen Image 2512 Trainer training Qwen Image 2512 LoRA training
lora personalization
OK 1/1
Qwen Image Edit 2511 image-to-image Endpoint for Qwen's Image Editing 2511 model with LoRA support.
stylized transform lora
OK 2025/12/30
Qwen Image 2512 text-to-image Qwen Image 2512 is an improved version of Qwen Image with better text rendering, finer natural textures, and more realistic human generation.
qwen 2512
OK 2025/12/30
Longcat Multi Avatar audio-to-video LongCat-Video-Avatar is an audio-driven video generation model that generates super-realistic, lip-synchronized long videos with natural dynamics and consistent identity.
audio-to-video image-to-video
OK 2025/12/30
Longcat Single Avatar audio-to-video LongCat-Video-Avatar is an audio-driven video generation model that generates super-realistic, lip-synchronized long videos with natural dynamics and consistent identity.
audio-to-video image-to-video
OK 2025/12/30
Longcat Single Avatar audio-to-video LongCat-Video-Avatar is an audio-driven video generation model that generates super-realistic, lip-synchronized long videos with natural dynamics and consistent identity.
audio-to-video
OK 2025/12/30
Sam Audio audio-to-audio Audio separation with SAM Audio. Isolate any sound using natural language—professional-grade audio editing made simple for creators, researchers, and accessibility applications.
audio-to-audio sam-audio
OK 2025/12/30
Sam Audio audio-to-audio Audio separation with SAM Audio. Isolate any sound using natural language—professional-grade audio editing made simple for creators, researchers, and accessibility applications.
audio-to-audio sam-audio
OK 2025/12/30
Sam Audio video-to-audio Audio separation with SAM Audio. Isolate any sound using natural language—professional-grade audio editing made simple for creators, researchers, and accessibility applications.
video-to-audio sam-audio
OK 2025/12/30
Ai Home image-to-image AI Home Style reimagines your home interior and exterior design with bold, prompt-driven concepts
stylized transform
OK 2025/12/30
Ai Home image-to-image AI Home Edit transforms your home interior and exterior photos with realistic, prompt-based edits
stylized transform
OK 2025/12/30
Hunyuan Motion [0.46B] text-to-3d Generate 3D human motions via text-to-generation interface of Hunyuan Motion!
text-to-3d motion
OK 2025/12/30
Hunyuan Motion [1B] text-to-3d Generate 3D human motions via text-to-generation interface of Hunyuan Motion!
text-to-3d motion
OK 2025/12/30
Arbiter vision Semantic image alignment measurements
clip-score
OK 2025/12/26
Arbiter vision Image reference comparison measurements
dists sdi mse ssim lpips
OK 2025/12/26
Arbiter vision Reference-free image measurements
arniqa nima iqa musiq
OK 2025/12/26
Wan Move [480p] image-to-video Use Wan-Move to generate videos with motion controlled by trajectories
image-to-video motion-control motion
OK 2025/12/24
Qwen Image Layered image-to-image Qwen-Image-Layered is a model capable of decomposing an image into multiple RGBA layers. Use LoRAs to get custom outputs.
qwen lora
OK 2025/12/24
FFmpeg API [Merge Audios] audio-to-audio Merge multiple audio files into a single file using the FFmpeg API!
ffmpeg
OK 2025/12/23
Wan v2.6 Text to Image text-to-image Wan 2.6 text-to-image model.
text-to-image
OK 2025/12/23
Wan v2.6 Image to Image image-to-image Wan 2.6 image-to-image model.
image-to-image
OK 2025/12/23
Video video-to-video A high-fidelity capability for erasing unwanted objects, people, or visual elements from videos while maintaining aesthetic quality and temporal consistency.
bria video erase keypoints
OK 2025/12/23
Video video-to-video A high-fidelity capability for erasing unwanted objects, people, or visual elements from videos while maintaining aesthetic quality and temporal consistency
bria video erase
OK 2025/12/23
Video video-to-video A high-fidelity capability for erasing unwanted objects, people, or visual elements from videos while maintaining aesthetic quality and temporal consistency.
bria video erase
OK 2025/12/23
Qwen Image Edit 2511 Trainer training LoRA trainer for Qwen Image Edit 2511 OK 2025/12/23
Kandinsky5 Pro image-to-video Kandinsky 5.0 Pro is a diffusion model for fast, high-quality image-to-video generation. OK 2025/12/23
Kandinsky5 Pro text-to-video Kandinsky 5.0 Pro is a diffusion model for fast, high-quality text-to-video generation. OK 2025/12/23
Bytedance text-to-video Generate videos with audio with Seedance 1.5
bytedance seedance audio
OK 2025/12/23
Bytedance image-to-video Generate videos with audio with Seedance 1.5 (supports start & end frame)
bytedance seedance audio
OK 2025/12/23
Qwen Image Layered Trainer training Train LoRAs for the Qwen-Image-Layered model to customize how images are split into layers.
qwen layer trainer
OK 2025/12/23
Live Avatar image-to-video Real-time avatar generation with Live Avatar. Have natural face-to-face conversations with AI avatars that respond instantly—streaming infinite-length video with immediate visual feedback.
realtime image-to-video audio-to-video
OK 2025/12/22
OpenRouter [Audio] unknown Run any ALM (Audio Language Model) with fal, powered by OpenRouter. OK 2025/12/22
Elevenlabs Music text-to-audio Generate high quality, realistic music with fine controls using Elevenlabs Music!
music text-to-music
OK 2025/12/22
Lightx video-to-video Use lightx capabilities to relight and recamera your videos.
video-to-video
OK 2025/12/22
Lightx video-to-video Use the capabilities of lightx to relight and recamera your videos.
video-to-video recamera relight
OK 2025/12/22
Kling Video v2.6 Motion Control [Standard] video-to-video Transfer movements from a reference video to any character image. Cost-effective mode for motion transfer, perfect for portraits and simple animations. OK 2025/12/21
Kling Video v2.6 Motion Control [Pro] video-to-video Transfer movements from a reference video to any character image. Pro mode delivers higher quality output, ideal for complex dance moves and gestures. OK 2025/12/21
Qwen Image Edit 2511 image-to-image Endpoint for Qwen's Image Editing 2511 model.
stylized transform
OK 2025/12/19
Qwen Image Layered image-to-image Qwen-Image-Layered is a model capable of decomposing an image into multiple RGBA layers.
qwen layer
OK 2025/12/19
Lucy Restyle video-to-video Restyle videos up to 30 min long - maintaining maximum detail quality.
video-edit
OK 2025/12/18
Z-Image Turbo image-to-image Generate images from text, an image, a mask and custom LoRA using Z-Image Turbo, Tongyi-MAI's super-fast 6B model.
inpainting
OK 2025/12/18
Z-Image Turbo image-to-image Generate images from text, an image and a mask using Z-Image Turbo, Tongyi-MAI's super-fast 6B model.
inpainting
OK 2025/12/18
Trellis 2 image-to-3d Generate 3D models from your images using Trellis 2. A native 3D generative model enabling versatile and high-quality 3D asset creation.
image-to-3D
OK 2025/12/17
Scail video-to-video SCAIL is a character animation model that uses 3D consistent pose representations to animate reference images with coherent motion, supporting complex movements. OK 2025/12/17
Crystal Upscaler [Video] video-to-video Do high precision video upscaling that respects the original video perfectly using Crystal Upscaler's new video upscaling method!
upscale video-to-video
OK 2025/12/17
Vibevoice text-to-speech Generate long speech snippets fast using Microsoft's powerful TTS.
vibevoice fast
OK 2025/12/17
Bria Video Eraser video-to-video A high-fidelity capability for erasing unwanted objects, people, or visual elements from videos while maintaining aesthetic quality and temporal consistency.
bria erase
OK 2025/12/17
Bria Video Eraser video-to-video A high-fidelity capability for erasing unwanted objects, people, or visual elements from videos while maintaining aesthetic quality and temporal consistency.
bria erase
OK 2025/12/17
Bria Video Eraser video-to-video A high-fidelity capability for erasing unwanted objects, people, or visual elements from videos while maintaining aesthetic quality and temporal consistency
bria erase
OK 2025/12/17
Hunyuan Video V1.5 image-to-video Hunyuan Video 1.5 is Tencent's latest and best video model
image-to-video
OK 2025/12/17
Hunyuan3d V3 text-to-3d Turn simple sketches into detailed, fully-textured 3D models. Instantly convert your concept designs into formats ready for Unity, Unreal, and Blender. OK 2025/12/16
Hunyuan3d V3 image-to-3d Create your imagined 3D models with just text. Production-ready, export-ready professional assets with realistic lighting and materials in minutes. OK 2025/12/16
Hunyuan3d V3 image-to-3d Transform your photos into ultra-high-resolution 3D models in seconds. Film-quality geometry with PBR textures, ready for games, e-commerce, and 3D printing. OK 2025/12/16
Flux 2 image-to-image Image-to-image editing with FLUX.2 [dev] from Black Forest Labs. Precise modifications using natural language descriptions and hex color control—in a flash. OK 2025/12/16
Flux 2 text-to-image Text-to-image generation with FLUX.2 [dev] from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities— in a flash. OK 2025/12/16
GPT-Image 1.5 image-to-image GPT Image 1.5 generates high-fidelity images with strong prompt adherence, preserving composition, lighting, and fine-grained detail.
openai gpt-image
OK 2025/12/16
GPT-Image 1.5 text-to-image GPT Image 1.5 generates high-fidelity images with strong prompt adherence, preserving composition, lighting, and fine-grained detail.
openai gpt-image
OK 2025/12/16
Kling Video Create Voice audio-to-audio Create Voices to be used with Kling Models Voice Control OK 2025/12/16
Fibo Lite text-to-image Fibo Lite, the new addition to the Fibo model family, allows generating high-quality images with the same controllability of the JSON structured prompt with significantly improved latency.
bria fibo lite
OK 2025/12/16
Flux 2 image-to-image Image-to-image editing with FLUX.2 [dev] from Black Forest Labs. Precise modifications using natural language descriptions and hex color control—all at turbo speed. OK 2025/12/16
Flux 2 text-to-image Text-to-image generation with FLUX.2 [dev] from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities—all at turbo speed. OK 2025/12/16
Flux 2 Max text-to-image FLUX.2 [max] delivers state-of-the-art image generation and advanced image editing with exceptional realism, precision, and consistency.
flux2 max
OK 2025/12/16
Flux 2 Max image-to-image FLUX.2 [max] delivers state-of-the-art image generation and advanced image editing with exceptional realism, precision, and consistency.
flux2 image-editing high-quality
OK 2025/12/16
Ai Baby And Aging Generator image-to-image AI Baby Generator is a service that instantly creates realistic predictions of a future child from parent photos.
stylized transform
OK 2025/12/16
Ai Baby And Aging Generator image-to-image AI Aging Generator performs controllable age progression or regression from a single face photo, generating lifelike portraits across eight age groups from baby to senior.
utility editing
OK 2025/12/16
Ai Detector vision AI Detector (Image) is an advanced service that analyzes a single picture and returns a verdict on whether it was likely created by AI.
utility
OK 2025/12/16
Ai Detector text-to-text AI Detector (Text) is an advanced AI service that analyzes a passage and returns a verdict on whether it was likely written by AI.
utility
OK 2025/12/16
Wan v2.6 Text to Video text-to-video Wan 2.6 text-to-video model.
text-to-video
OK 2025/12/16
Wan v2.6 Reference to Video video-to-video Wan 2.6 reference-to-video model.
reference-to-video
OK 2025/12/16
Wan 2.6 video-to-video Wan 2.6 reference-to-video model.
reference-to-video
Deprecated 2025/12/16
Wan 2.6 text-to-video Wan 2.6 text-to-video model.
text-to-video
Deprecated 2025/12/16
Qwen Image Edit 2509 Lora Gallery image-to-image Apply designs/graphics onto people's shirts
stylized transform
OK 2025/12/15
Qwen Image Edit 2509 Lora Gallery image-to-image Remove existing lighting and apply soft, even illumination
stylized transform
OK 2025/12/15
Qwen Image Edit 2509 Lora Gallery image-to-image Remove unwanted elements (objects, people, text) while maintaining image consistency
stylized transform
OK 2025/12/15
Qwen Image Edit 2509 Lora Gallery image-to-image Removes harsh shadows and light spots from images, replacing them with soft, even, natural-looking illumination.
stylized transform
OK 2025/12/15
Qwen Image Edit 2509 Lora Gallery image-to-image Blend products into backgrounds with automatic perspective and lighting correction
stylized transform
OK 2025/12/15
Qwen Image Edit 2509 Lora Gallery image-to-image Create group photos
stylized transform
OK 2025/12/15
Qwen Image Edit 2509 Lora Gallery image-to-image Generate full portrait from a cropped face photo
stylized transform
OK 2025/12/15
Qwen Image Edit 2509 Lora Gallery image-to-image Add a realistic scene behind the object with white background
stylized transform
OK 2025/12/15
Qwen Image Edit 2509 Lora Gallery image-to-image Create cinematic transitions and scene progressions (camera movements, framing changes)
stylized transform
OK 2025/12/15
Qwen Image Edit 2509 Lora Gallery image-to-image Precise camera position and angle control (rotation, zoom, vertical movement)
stylized transform
OK 2025/12/15
Veo 3.1 Fast video-to-video Extend Veo-Created Videos up to 30 seconds
extend-video
OK 2025/12/15
Veo 3.1 video-to-video Extend Veo-Created Videos up to 30 seconds
extend-video
OK 2025/12/15
Qwen Image Edit 2509 Lora image-to-image LoRA endpoint for the Qwen Image Edit 2509 model.
image-to-image image-editing
OK 2025/12/15
Qwen Image Edit 2509 Trainer training LoRA trainer for Qwen Image Edit 2509 OK 2025/12/15
Qwen Image Edit 2509 image-to-image Endpoint for Qwen's Image Editing Plus model, also known as Qwen-Image-Edit-2509. Has superior text editing capabilities and multi-image support.
image-editing image-to-image high-quality-text
OK 2025/12/15
Wan v2.6 Image to Video image-to-video Wan 2.6 image-to-video model.
image-to-video
OK 2025/12/15
Kling O1 Reference Image to Video [Standard] image-to-video Transform images, elements, and text into consistent, high-quality video scenes, ensuring stable character identity, object details, and environments. OK 2025/12/15
Kling O1 First Frame Last Frame to Video [Standard] image-to-video Generate a video by taking a start frame and an end frame, animating the transition between them while following text-driven style and scene guidance. OK 2025/12/15
Kling O1 Reference Video to Video [Standard] video-to-video Kling O1 Omni generates new shots guided by an input reference video, preserving cinematic language such as motion and camera style to produce seamless scene continuity. OK 2025/12/15
Kling O1 Edit Video [Standard] video-to-video Edit an existing video using natural-language instructions, transforming subjects, settings, and style while retaining the original motion structure. OK 2025/12/15
Chatterbox Turbo text-to-speech Turbo-charged voice generation. Control every breath, laugh, and sigh with inline tags - now at turbo speed.
text-to-speech
Deprecated 2025/12/15
Qwen Image Edit Plus Lora Gallery image-to-image Removes harsh shadows and light spots from images, replacing them with soft, even, natural-looking illumination.
stylized transform
OK 2025/12/12
Maya text-to-speech Maya1 is a state-of-the-art speech model by Maya Research for expressive voice generation, built to capture real human emotion and precise voice design.
text-to-speech tts
OK 2025/12/12
Maya text-to-speech Maya1 is a state-of-the-art speech model by Maya Research for expressive voice generation, built to capture real human emotion and precise voice design.
text-to-speech tts
OK 2025/12/12
Moondream3 Preview [Segment] image-to-image Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale.
mask segmentation
OK 2025/12/12
Fabric 1.0 text-to-video VEED Fabric 1.0 text-to-video API
lipsync avatar text-to-video
OK 2025/12/12
Steady Dancer video-to-video Create smooth, realistic videos from a single photo while keeping the original appearance intact—precise motion control without losing identity or visual quality. OK 2025/12/11
One To All Animation video-to-video One-to-All Animation is a pose-driven video model that animates characters from a single reference image, enabling flexible, alignment-free motion transfer across diverse styles and scenes.
video to video motion
OK 2025/12/11
One To All Animation video-to-video One-to-All Animation is a pose-driven video model that animates characters from a single reference image, enabling flexible, alignment-free motion transfer across diverse styles and scenes.
video to video motion
OK 2025/12/11
Creatify Aurora image-to-video Generate high-fidelity, studio-quality videos of your avatar speaking or singing using Aurora from the Creatify team!
lipsync image-to-video
OK 2025/12/11
Wan Vision Enhancer video-to-video Wan Vision Enhancer for magnifying and enhancing video with high fidelity and creativity.
stylized transform
OK 2025/12/10
Sync React-1 video-to-video Use React-1 from SyncLabs to refine human emotions and do realistic lip-sync without losing details!
lipsync video-to-video
OK 2025/12/10
Stepx Edit2 image-to-image Image-to-image editing with Step1X-Edit v2 from StepFun. Reasoning-enhanced modifications through a thinking–editing–reflection loop with MLLM world knowledge for abstract instruction comprehension. OK 2025/12/9
Z-Image Turbo image-to-image Generate images from text and edge, depth or pose images using custom LoRA and Z-Image Turbo, Tongyi-MAI's super-fast 6B model.
turbo z-image fast lora
OK 2025/12/7
Z-Image Turbo image-to-image Generate images from text and edge, depth or pose images using Z-Image Turbo, Tongyi-MAI's super-fast 6B model. OK 2025/12/7
Z-Image Turbo image-to-image Generate images from text and images using custom LoRA and Z-Image Turbo, Tongyi-MAI's super-fast 6B model.
turbo z-image fast lora
OK 2025/12/7
Z-Image Turbo image-to-image Generate images from text and images using Z-Image Turbo, Tongyi-MAI's super-fast 6B model.
turbo z-image fast
OK 2025/12/7
Longcat Image image-to-image LongCat Image Edit is a 6B parameter image editing model excelling at multilingual text rendering, photorealism and deployment efficiency. OK 2025/12/5
Longcat Image text-to-image LongCat Image is a 6B parameter model excelling at multilingual text rendering, photorealism and deployment efficiency. OK 2025/12/5
Kling AI Avatar v2 Pro image-to-video Kling AI Avatar v2 Pro: The premium endpoint for creating avatar videos with realistic humans, animals, cartoons, or stylized characters OK 2025/12/4
Kling AI Avatar v2 Standard image-to-video Kling AI Avatar v2 Standard: Endpoint for creating avatar videos with realistic humans, animals, cartoons, or stylized characters OK 2025/12/4
Z Image Trainer training Train LoRAs on Z-Image Turbo, a super fast text-to-image model of 6B parameters developed by Tongyi-MAI.
turbo z-image fast trainer
OK 2025/12/3
Bytedance text-to-image A new-generation image creation model from ByteDance, Seedream 4.5 integrates image generation and image editing capabilities into a single, unified architecture.
stylized transform
OK 2025/12/3
Bytedance image-to-image A new-generation image creation model from ByteDance, Seedream 4.5 integrates image generation and image editing capabilities into a single, unified architecture.
stylized transform
OK 2025/12/3
Sam 3 3d-to-3d SAM 3D enables full scene reconstructions, placing objects and humans in a shared context together.
align 3D
OK 2025/12/2
Sam 3 image-to-3d SAM 3D allows for accurate 3D reconstruction of human body shape and position from a single image.
3d human pose
OK 2025/12/2
Sam 3 image-to-3d SAM 3D enables precise 3D reconstruction of objects from real images, while accurately reconstructing their geometry and texture.
3d object
OK 2025/12/2
Vidu image-to-image Vidu Reference-to-Image creates images by combining reference images with a prompt.
images-to-imag reference-to-image
OK 2025/12/2
Vidu text-to-image Use Vidu Text-to-Image to turn your prompts into reality. OK 2025/12/2
Kling Video v2.6 Text to Video text-to-video Kling 2.6 Pro: Top-tier text-to-video with cinematic visuals, fluid motion, and native audio generation. OK 2025/12/2
Kling Video v2.6 Image to Video image-to-video Kling 2.6 Pro: Top-tier image-to-video with cinematic visuals, fluid motion, and native audio generation. OK 2025/12/2
Pixverse image-to-video Pixverse Effects OK 2025/12/2
Pixverse image-to-video Pixverse Transition OK 2025/12/2
Z-Image Turbo text-to-image Text-to-Image endpoint with LoRA support for Z-Image Turbo, a super fast text-to-image model of 6B parameters developed by Tongyi-MAI.
z-image lora fast
OK 2025/12/1
Pixverse image-to-video Generate high quality video clips from text and image prompts using PixVerse v5.5
image-to-video
OK 2025/12/1
Pixverse text-to-video Generate high quality video clips from text and image prompts using PixVerse v5.5
text-to-video
OK 2025/12/1
Video Background Removal video-to-video Remove background from any video with people and objects. No green screen needed. OK 2025/12/1
Kling O1 First Frame Last Frame to Video [Pro] image-to-video Generate a video by taking a start frame and an end frame, animating the transition between them while following text-driven style and scene guidance. OK 2025/12/1
Kling O1 Reference Image to Video [Pro] image-to-video Transform images, elements, and text into consistent, high-quality video scenes, ensuring stable character identity, object details, and environments. OK 2025/12/1
Kling O1 Edit Video [Pro] video-to-video Edit an existing video using natural-language instructions, transforming subjects, settings, and style while retaining the original motion structure. OK 2025/12/1
Kling O1 Reference Video to Video [Pro] video-to-video Kling O1 Omni generates new shots guided by an input reference video, preserving cinematic language such as motion and camera style to produce seamless scene continuity. OK 2025/12/1
Kling O1 Image image-to-image Perform precise image edits using strong reference control, transforming subjects, styles, and local details while preserving visual consistency.
edit realism typography
OK 2025/12/1
Ovis Image text-to-image Ovis-Image is a 7B text-to-image model specifically optimized for quick, high quality text rendering.
ovis-image artistic
OK 2025/11/29
Video Background Removal video-to-video Remove background from any video with people and objects. No green screen needed. OK 2025/11/28
Video Background Removal video-to-video Remove background from videos filmed using chromakey, with automatic green spill suppression for clean, professional edges. OK 2025/11/28
LTX Video 2.0 Fast text-to-video Create high-fidelity video with audio from text with LTX-2 Fast OK 2025/11/26
LTX Video 2.0 Pro text-to-video Create high-fidelity video with audio from text with LTX-2 Pro. OK 2025/11/26
LTX Video 2.0 Fast image-to-video Create high-fidelity video with audio from images with LTX-2 Fast OK 2025/11/26
LTX Video 2.0 Pro image-to-video Create high-fidelity video with audio from images with LTX-2 Pro OK 2025/11/26
LTX Video 2.0 Retake video-to-video Change sections of a video using LTX-2 OK 2025/11/26
Z-Image Turbo text-to-image Z-Image Turbo is a super fast text-to-image model of 6B parameters developed by Tongyi-MAI.
turbo z-image fast
OK 2025/11/26
LTX Video 2.0 Retake video-to-video Change sections of a video using LTX-2 Deprecated 2025/11/26
Lucy Edit [Fast] video-to-video Lucy Edit Fast is a rapid, localized video editing model that lets you modify specific elements, such as objects or backgrounds, in just 10 seconds.
edit video-edit
OK 2025/11/25
Flux 2 Lora Gallery text-to-image Applies sepia vintage effect to images
stylized transform
OK 2025/11/25
Flux 2 Lora Gallery image-to-image Virtual clothing try-on (2 images: person + garment)
stylized transform
OK 2025/11/25
Flux 2 Lora Gallery text-to-image Generates satellite/aerial view style images
stylized transform
OK 2025/11/25
Flux 2 Lora Gallery text-to-image Makes images more photorealistic and natural
stylized transform
OK 2025/11/25
Flux 2 Lora Gallery image-to-image Generates same object from different angles (azimuth/elevation)
stylized transform
OK 2025/11/25
Flux 2 Lora Gallery text-to-image HDR surrealistic effect with intense colors
stylized transform
OK 2025/11/25
Flux 2 Lora Gallery image-to-image Extends a face into a full body portrait
stylized transform
OK 2025/11/25
Flux 2 Lora Gallery text-to-image Transforms images into comic book style
stylized transform
OK 2025/11/25
Flux 2 Lora Gallery text-to-image Ballpoint pen sketch drawing style
stylized transform
OK 2025/11/25
Flux 2 Lora Gallery image-to-image Virtually furnishes an empty apartment
stylized transform
OK 2025/11/25
Flux 2 Lora Gallery image-to-image Add a background to images with white/clean background
stylized transform
OK 2025/11/25
Crystal Upscaler image-to-image An advanced image enhancement tool designed specifically for facial details and portrait photography, utilizing Clarity AI's upscaling technology.
image-to-image
OK 2025/11/25
Flux 2 Trainer training Fine-tune FLUX.2 [dev] from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific editing tasks. OK 2025/11/25
Flux 2 Trainer training Fine-tune FLUX.2 [dev] from Black Forest Labs with custom datasets. Create specialized LoRA adaptations for specific styles and domains. OK 2025/11/25
Flux 2 Flex image-to-image Image editing with FLUX.2 [flex] from Black Forest Labs. Supports multi-reference editing with customizable inference steps and enhanced text rendering. OK 2025/11/25
Flux 2 Flex text-to-image Text-to-image generation with FLUX.2 [flex] from Black Forest Labs. Features adjustable inference steps and guidance scale for fine-tuned control. Enhanced typography and text rendering capabilities.
stylized transform
OK 2025/11/25
Flux 2 image-to-image Image-to-image editing with LoRA support for FLUX.2 [dev] from Black Forest Labs. Specialized style transfer and domain-specific modifications. OK 2025/11/23
Flux 2 text-to-image Text-to-image generation with LoRA support for FLUX.2 [dev] from Black Forest Labs. Custom style adaptation and fine-tuned model variations. OK 2025/11/23
Flux 2 image-to-image Image-to-image editing with FLUX.2 [dev] from Black Forest Labs. Precise modifications using natural language descriptions and hex color control. OK 2025/11/23
Flux 2 text-to-image Text-to-image generation with FLUX.2 [dev] from Black Forest Labs. Enhanced realism, crisper text generation, and native editing capabilities. OK 2025/11/23
Flux 2 Pro text-to-image Text-to-image generation with FLUX.2 [pro] from Black Forest Labs. Optimized for maximum quality, exceptional photorealism and artistic images. OK 2025/11/23
Flux 2 Pro image-to-image Image editing with FLUX.2 [pro] from Black Forest Labs. Ideal for high-quality image manipulation, style transfer, and sequential editing workflows. OK 2025/11/23
Chrono Edit Lora image-to-image LoRA endpoint for the Chrono Edit model.
image-to-image image-editing
OK 2025/11/21
Chrono Edit Lora Gallery image-to-image You can make edits simply by drawing a quick sketch on the input image.
paint edit sketch
OK 2025/11/21
Chrono Edit Lora Gallery image-to-image Upscales and cleans up the image.
upscale details
OK 2025/11/21
Hunyuan Video V1.5 text-to-video Hunyuan Video 1.5 is Tencent's latest and best video model
hunyuan-video text-to-video
OK 2025/11/21
Sam 3 image-to-image SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks.
segmentation rle real-time
OK 2025/11/20
Sam 3 vision SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks.
embeddings mask real-time
OK 2025/11/20
Sam 3 video-to-video SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks.
segmentation mask real-time rle
OK 2025/11/20
Sam 3 video-to-video SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks.
segmentation mask real-time
OK 2025/11/20
Segment Anything Model 3 image-to-image SAM 3 is a unified foundation model for promptable segmentation in images and videos. It can detect, segment, and track objects using text or visual prompts such as points, boxes, and masks.
segmentation mask real-time
OK 2025/11/20
Gemini 3 Pro Image Preview image-to-image Nano Banana Pro (a.k.a. Nano Banana 2) is Google's new state-of-the-art image generation and editing model.
realism typography
OK 2025/11/20
Gemini 3 Pro Image Preview text-to-image Nano Banana Pro (a.k.a. Nano Banana 2) is Google's new state-of-the-art image generation and editing model.
realism typography
OK 2025/11/20
Nano Banana Pro image-to-image Nano Banana Pro (a.k.a. Nano Banana 2) is Google's new state-of-the-art image generation and editing model.
realism typography
OK 2025/11/20
Nano Banana Pro text-to-image Nano Banana Pro (a.k.a. Nano Banana 2) is Google's new state-of-the-art image generation and editing model.
realism typography
OK 2025/11/20
Imagineart 1.5 Preview text-to-image ImagineArt 1.5 text-to-image model generates high-fidelity professional-grade visuals with lifelike realism, strong aesthetics, and text that actually reads correctly.
visuals imagineart realism text
OK 2025/11/20
Lynx image-to-video Generate subject consistent videos using Lynx from ByteDance!
image-to-video subject
OK 2025/11/18
Maya1 text-to-speech Maya1 is a state-of-the-art speech model by Maya Research for expressive voice generation, built to capture real human emotion and precise voice design.
text-to-speech tts
OK 2025/11/15
OpenRouter Responses [OpenAI Compatible] llm The OpenRouter Responses API with fal, powered by OpenRouter, provides unified access to a wide range of large language models, including GPT, Claude, Gemini, and many others, through a single API interface. OK 2025/11/13
Fibo Mashup image-to-image Combine three images to create an amazing mashup image with Bria's FIBO model.
bria fibo image-to-image
Deprecated 2025/11/13
OpenRouter Embeddings [OpenAI Compatible] llm The OpenRouter Embeddings API with fal, powered by OpenRouter, provides unified access to a wide range of large language models, including GPT, Claude, Gemini, and many others, through a single API interface. OK 2025/11/12
OpenRouter [Vision] vision Run any VLM (Vision Language Model) with fal, powered by OpenRouter. OK 2025/11/12
OpenRouter llm Run any LLM (Large Language Model) with fal, powered by OpenRouter. OK 2025/11/12
OpenRouter Chat Completions [OpenAI Compatible] llm Run any LLM (Large Language Model) with fal, powered by OpenRouter. This endpoint is compatible with the OpenAI API. OK 2025/11/12
Editto video-to-video Edit videos with instruction-based prompting using the Editto model!
video-edit wan-vace
OK 2025/11/12
Qwen Image Edit Plus Lora Gallery image-to-image Precise camera position and angle control (rotation, zoom, vertical movement)
stylized transform
OK 2025/11/11
Qwen Image Edit Plus Lora Gallery image-to-image Apply designs/graphics onto people's shirts
stylized transform
OK 2025/11/11
Qwen Image Edit Plus Lora Gallery image-to-image Remove existing lighting and apply soft, even illumination
stylized transform
OK 2025/11/11
Qwen Image Edit Plus Lora Gallery image-to-image Remove unwanted elements (objects, people, text) while maintaining image consistency
stylized transform
OK 2025/11/11
Qwen Image Edit Plus Lora Gallery image-to-image Create cinematic transitions and scene progressions (camera movements, framing changes)
stylized transform
OK 2025/11/11
Qwen Image Edit Plus Lora Gallery image-to-image Blend products into backgrounds with automatic perspective and lighting correction
stylized transform
OK 2025/11/11
Qwen Image Edit Plus Lora Gallery image-to-image Create group photos
stylized transform
OK 2025/11/11
Qwen Image Edit Plus Lora Gallery image-to-image Generate full portrait from a cropped face photo
stylized transform
OK 2025/11/11
Qwen Image Edit Plus Lora Gallery image-to-image Add a realistic scene behind the object with white background
stylized transform
OK 2025/11/11
Flashvsr video-to-video Upscale your videos using FlashVSR with the fastest speeds!
upscale video-to-video
OK 2025/11/11
Pixverse image-to-video Generate high quality video clips by swapping person, objects and background using Pixverse Swap. OK 2025/11/10
Pika image-to-video Discover ultimate control with Pikaframes key frame interpolation, a stunning image-to-video feature that allows you to upload up to 5 keyframes, customize their transition length and prompt, and see their images come to life as seamless videos. OK 2025/11/7
Infinity Star text-to-video InfinityStar’s unified 8B spacetime autoregressive engine turns any text prompt into crisp 720p videos, 10× faster than diffusion models.
text-to-video
OK 2025/11/7
Sana Video text-to-video Leverage Sana's ultra-fast processing speed to generate high-quality assets that transform your text prompts into production-ready videos
text-to-video
OK 2025/11/7
Crystal Upscaler image-to-image An advanced image enhancement tool designed specifically for facial details and portrait photography, utilizing Clarity AI's upscaling technology.
image-to-image
Deprecated 2025/11/5
Workflow Utilities video-to-video Add automatic subtitles to videos
auto-subtitle captioning
OK 2025/11/4
Reve image-to-image Reve’s fast remix model lets you upload reference images and then combine/transform them via a text prompt at lightning speed!
image-to-image
OK 2025/11/4
Reve image-to-image Reve’s fast edit model lets you upload an existing image and then transform it via a text prompt at lightning speed!
image-to-image
OK 2025/11/4
Image Outpaint image-to-image Directional outpainting. Choose edges to expand: left, right, top, or center (uniform on all sides). Only expanded areas are generated; an optional zoom-out pulls the frame back by the chosen amount.
outpainting
OK 2025/11/3
Fashion Size Estimator vision Fashion Size Estimator model analyzes human body images to predict clothing size recommendations and estimate key body measurements including height, bust, waist, and hip dimensions.
utility editing
Deprecated 2025/11/3
Flux Vision Upscaler image-to-image Flux Vision Upscaler for magnifying and upscaling images with high fidelity and creativity. OK 2025/11/2
Emu 3.5 Image image-to-image Edit images with a text prompt using Emu 3.5 Image OK 2025/11/1
Emu 3.5 Image text-to-image Generate images from text using Emu 3.5 Image OK 2025/11/1
Sima Video Upscaler Lite video-to-video Upscale your videos at real-time speeds with Sima Labs!
upscale video-to-video
Deprecated 2025/10/31
Bytedance Upscaler video-to-video Upscale videos with Bytedance's video upscaler.
upscaler video bytedance
OK 2025/10/31
Sima Upscaler image-to-image Upscale your images at blazingly fast speeds with Sima Labs!
upscale image-to-image
Deprecated 2025/10/31
Chrono Edit image-to-image NVIDIA's Logically Consistent and Physics-Aware Image Editing Model
image-editing
OK 2025/10/30
Minimax Music text-to-audio Generate music from text prompts using the MiniMax Music 2.0 model, which leverages advanced AI techniques to create high-quality, diverse musical compositions.
music audio
OK 2025/10/30
Qwen Image Edit Plus Trainer training LoRA trainer for Qwen Image Edit Plus OK 2025/10/30
Qwen Image Edit Trainer training LoRA trainer for Qwen Image Edit OK 2025/10/30
LongCat Video text-to-video Generate long videos in 720p/30fps from text using LongCat Video OK 2025/10/30
LongCat Video image-to-video Generate long videos in 720p/30fps from images using LongCat Video OK 2025/10/30
LongCat Video image-to-video Generate long videos from images using LongCat Video OK 2025/10/30
LongCat Video text-to-video Generate long videos from text using LongCat Video OK 2025/10/30
LongCat Video Distilled image-to-video Generate long videos in 720p/30fps from images using LongCat Video Distilled OK 2025/10/30
LongCat Video Distilled text-to-video Generate long videos in 720p/30fps from text using LongCat Video Distilled OK 2025/10/30
Fibo text-to-json Structured Prompt Generation endpoint for Fibo, Bria's SOTA Open source model
bria fibo structured-prompting
OK 2025/10/29
Omnipart image-to-3d Image-to-3D endpoint for OmniPart, a part-aware 3D generator with semantic decoupling and structural cohesion. OK 2025/10/29
MiniMax Speech 2.6 [Turbo] text-to-speech Generate speech from text prompts and different voices using the MiniMax Speech-2.6 Turbo model, which leverages advanced AI techniques to create high-quality text-to-speech.
text-to-speech
OK 2025/10/29
MiniMax Speech 2.6 [HD] text-to-speech Generate speech from text prompts and different voices using the MiniMax Speech-2.6 HD model, which leverages advanced AI techniques to create high-quality text-to-speech.
text-to-speech
OK 2025/10/29
Video As Prompt video-to-video A model for unified semantic control in video generation. It animates a static reference image using the motion and semantics from a reference video as a prompt.
video-as-prompt semantic control
OK 2025/10/29
Bytedance image-to-3d Image to 3D endpoint for Bytedance's high-quality Seed3D 3D model generator.
seed3d.quality bytedance 3d
OK 2025/10/29
Fibo text-to-image SOTA Open source model trained on licensed data, transforming intent into structured control for precise, high-quality AI image generation in enterprise and agentic workflows.
bria fibo prompt-adherence
OK 2025/10/29
LongCat Video Distilled image-to-video Generate long videos from images using LongCat Video Distilled OK 2025/10/29
LongCat Video Distilled text-to-video Generate long videos from text using LongCat Video Distilled OK 2025/10/28
Demucs audio-to-audio SOTA stemming model for voice, drums, bass, guitar and more.
audio
OK 2025/10/27
Piflow text-to-image Use piflow's faster speed to generate images with the same quality as slower models.
text-to-image
OK 2025/10/27
MiniMax Hailuo 2.3 [Pro] (Image to Video) image-to-video MiniMax Hailuo-2.3 Image To Video API (Pro, 1080p): Advanced image-to-video generation model with 1080p resolution
image-to-video
OK 2025/10/27
MiniMax Hailuo 2.3 Fast [Standard] (Image to Video) image-to-video MiniMax Hailuo-2.3-Fast Image To Video API (Standard, 768p): Advanced fast image-to-video generation model with 768p resolution
image-to-video
OK 2025/10/27
MiniMax Hailuo 2.3 [Standard] (Image to Video) image-to-video MiniMax Hailuo-2.3 Image To Video API (Standard, 768p): Advanced image-to-video generation model with 768p resolution
image-to-video
OK 2025/10/27
MiniMax Hailuo 2.3 Fast [Pro] (Image to Video) image-to-video MiniMax Hailuo-2.3-Fast Image To Video API (Pro, 1080p): Advanced fast image-to-video generation model with 1080p resolution
image-to-video
OK 2025/10/27
MiniMax Hailuo 2.3 [Standard] (Text to Video) text-to-video MiniMax Hailuo-2.3 Text To Video API (Standard, 768p): Advanced text-to-video generation model with 768p resolution
text-to-video
OK 2025/10/27
MiniMax Hailuo 2.3 [Pro] (Text to Video) text-to-video MiniMax Hailuo-2.3 Text To Video API (Pro, 1080p): Advanced text-to-video generation model with 1080p resolution
text-to-video
OK 2025/10/27
Birefnet video-to-video Video background removal version of bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS)
utility editing
OK 2025/10/26
Audio Understanding audio-to-audio An audio understanding model that analyzes audio content and answers questions about what's happening in the audio based on user prompts.
utility audio
OK 2025/10/24
Bytedance image-to-video Image to Video endpoint for Seedance 1.0 Pro Fast, a next-generation video model designed to deliver maximum performance at minimal cost
bytedance seedance pro fast
OK 2025/10/24
Bytedance text-to-video Text to Video endpoint for Seedance 1.0 Pro Fast, a next-generation video model designed to deliver maximum performance at minimal cost
bytedance fast motion
OK 2025/10/24
Vidu video-to-video Use the latest Vidu Q2 models, which offer much better quality and control over your videos. OK 2025/10/24
Vidu image-to-video Use the latest Vidu Q2 models, which offer much better quality and control over your videos.
image-to-video
OK 2025/10/24
Vidu image-to-video Use the latest Vidu Q2 models, which offer much better quality and control over your videos.
image-to-video
OK 2025/10/24
LTX Video 2.0 Pro text-to-video Create high-fidelity video with audio from text with LTX-2 Pro. Deprecated 2025/10/23
LTX Video 2.0 Fast text-to-video Create high-fidelity video with audio from text with LTX-2 Fast Deprecated 2025/10/23
LTX Video 2.0 Pro image-to-video Create high-fidelity video with audio from images with LTX-2 Pro Deprecated 2025/10/23
LTX Video 2.0 Fast image-to-video Create high-fidelity video with audio from images with LTX-2 Fast Deprecated 2025/10/23
Vidu text-to-video Use the latest Vidu Q2 models, which offer much better quality and control over your videos. OK 2025/10/22
Kling Video image-to-video Kling 2.5 Turbo Standard: Top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.
stylized transform
OK 2025/10/22
GPT Image 1 Mini image-to-image GPT Image 1 mini combines OpenAI's advanced language capabilities, powered by GPT-5, with GPT Image 1 Mini for efficient image generation.
image-to-image
OK 2025/10/21
GPT Image 1 Mini text-to-image GPT Image 1 mini combines OpenAI's advanced language capabilities, powered by GPT-5, with GPT Image 1 Mini for efficient image generation.
text-to-image
OK 2025/10/21
Qwen 3 Guard [8B] llm Use Qwen 3 Guard [8B] to detect and classify text as safe or harmful, delivering precise and reliable safety categorization.
filter safety utility
OK 2025/10/20
Krea Wan 14b- Text to Video text-to-video Fast Text-to-Video endpoint for Krea's Wan 14b model.
text to video fast
OK 2025/10/20
Sound Effect Generation text-to-audio Create professional-grade sound effects from animal and vehicle to nature, sci-fi, and otherworldly sounds. Perfect for films, games, and digital content.
sfx audio effects speech
OK 2025/10/18
Music Generation text-to-audio Generate royalty-free instrumental music from electronic, hip hop, and indie rock to cinematic and classical genres. Perfect for games, films, social content, podcasts, and more.
speech audio music
OK 2025/10/18
Meshy 5 Retexture 3d-to-3d Meshy-5 retexture applies new, high-quality textures to existing 3D models using either text prompts or reference images. It supports PBR material generation for realistic, production-ready results.
3d-to-3d
OK 2025/10/18
Meshy 5 Remesh 3d-to-3d Meshy-5 remesh allows you to remesh and export existing 3D models into various formats
3d-to-3d
OK 2025/10/18
Reve image-to-image Reve’s remix model lets you upload reference images and then combine or transform them via a text prompt
image-to-image
OK 2025/10/17
Reve text-to-image Reve’s text-to-image model generates detailed visual output that closely follows your instructions, with strong aesthetic quality and accurate text rendering.
text-to-image
OK 2025/10/17
Reve image-to-image Reve’s edit model lets you upload an existing image and then transform it via a text prompt
image-to-image
OK 2025/10/17
Wan Alpha text-to-video Generate videos with transparent backgrounds
transparent alpha
OK 2025/10/16
Mirelo SFX V1.5 video-to-audio Generate synced sounds for any video, and return the new sound track (like MMAudio)
video-to-audio sfx
OK 2025/10/15
Mirelo SFX V1.5 video-to-video Generate synced sounds for any video, and return it with its new sound track (like MMAudio)
video-to-video sfx
OK 2025/10/15
Krea Wan 14B video-to-video Superfast video model based on Wan 2.1 14b by Krea, excelling at real-time video-editing. OK 2025/10/14
Image2Pixel image-to-image Turn images into pixel-perfect retro art
post-processing pixel-art
OK 2025/10/14
Kandinsky5 text-to-video Kandinsky 5.0 Distilled is a lightweight diffusion model for fast, high-quality text-to-video generation. OK 2025/10/13
Kandinsky5 text-to-video Kandinsky 5.0 is a diffusion model for fast, high-quality text-to-video generation. OK 2025/10/13
DreamOmni2 image-to-image DreamOmni2 is a unified multimodal model for text and image guided image editing. OK 2025/10/10
Moondream3 Preview [Detect] vision Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale.
Vision
OK 2025/10/9
Moondream3 Preview [Point] vision Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale.
Vision
OK 2025/10/9
Moondream 3 Preview [Query] vision Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale.
Vision
OK 2025/10/9
Moondream3 Preview [Caption] vision Moondream 3 is a vision language model that brings frontier-level visual reasoning with native object detection, pointing, and OCR capabilities to real-world applications requiring fast, inexpensive inference at scale.
Vision
OK 2025/10/9
Kling Video video-to-audio Generate audio from input videos using Kling OK 2025/10/9
Sora 2 video-to-video Video-to-video remix endpoint for Sora 2, OpenAI’s advanced model that transforms existing videos based on new text or image prompts, allowing rich edits, style changes, and creative reinterpretations while preserving motion and structure
video to video audio sora
OK 2025/10/8
Veo 3.1 Fast image-to-video Generate videos from a first/last frame using Google's Veo 3.1 Fast OK 2025/10/8
Veo 3.1 image-to-video Generate videos from a first and last frame using Google's Veo 3.1 OK 2025/10/8
Veo 3.1 image-to-video Generate Videos from images using Google's Veo 3.1 OK 2025/10/8
Veo 3.1 Fast text-to-video Faster and more cost effective version of Google's Veo 3.1! OK 2025/10/8
Veo 3.1 Fast image-to-video Generate videos from your image prompts using Veo 3.1 fast. OK 2025/10/8
Veo 3.1 image-to-video Veo 3.1 is the latest state-of-the-art video generation model from Google DeepMind OK 2025/10/8
Veo 3.1 text-to-video Veo 3.1 by Google, the most advanced AI video generation model in the world. With sound on! OK 2025/10/8
Hunyuan Part 3d-to-3d Use the capabilities of Hunyuan Part to generate point clouds from your 3D files.
3D-to-3D point-cloud
OK 2025/10/8
Wan 2.1 VACE Long Reframe video-to-video Reframe entire videos scene-by-scene using Wan VACE 2.1 OK 2025/10/7
Index TTS 2.0 text-to-speech Generate natural, clear speeches using Index TTS 2.0 from IndexTeam
text-to-speech
OK 2025/10/7
Meshy 6 Preview text-to-3d Meshy-6-Preview is the latest model from Meshy. It generates realistic and production ready 3D models.
text-to-3d
OK 2025/10/6
Meshy 5 Multi image-to-3d Meshy-5 multi image generates realistic and production ready 3D models from multiple images.
multi-image-to-3d
OK 2025/10/6
Meshy 6 Preview image-to-3d Meshy-6-Preview is the latest model from Meshy. It generates realistic and production ready 3D models.
image-to-3d
OK 2025/10/6
Sora 2 image-to-video Image-to-video endpoint for Sora 2 Pro, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.
image-to-video audio sora-2-pro
OK 2025/10/6
Sora 2 text-to-video Text-to-video endpoint for Sora 2 Pro, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.
text-to-video audio sora-2-pro
OK 2025/10/6
Sora 2 text-to-video Text-to-video endpoint for Sora 2, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.
text to video audio sora
OK 2025/10/6
Sora 2 image-to-video Image-to-video endpoint for Sora 2, OpenAI's state-of-the-art video model capable of creating richly detailed, dynamic clips with audio from natural language or images.
image-to-video audio sora
OK 2025/10/6
Qwen Image Edit Plus Lora image-to-image LoRA endpoint for the Qwen Image Edit Plus model.
image-to-image image-editing
OK 2025/10/3
Lucidflux image-to-image LucidFlux for upscaling images with very high fidelity
image-to-image
OK 2025/10/3
Ovi image-to-video Ovi can generate videos with audio from image and text inputs.
image-to-audio-video image-to-video
OK 2025/10/3
Ovi Text to Video text-to-video A unified paradigm for audio-video generation OK 2025/10/3
Fabric 1.0 Fast image-to-video VEED Fabric 1.0 is an image-to-video API that turns any image into a talking video
lipsync avatar
OK 2025/10/1
Qwen Image Edit image-to-image Image to Image Endpoint for Qwen's Image Editing model. Has superior text editing capabilities.
stylized transform
OK 2025/9/30
Hunyuan Image text-to-image Leverage the state-of-the-art capabilities of Hunyuan Image 3.0 to generate visual content that effectively conveys the messaging of your written material.
text-to-image
OK 2025/9/28
Hyper3d image-to-3d Rodin by Hyper3D generates realistic and production ready 3D models from text or images.
image-to-3d text-to-3d
OK 2025/9/26
Lynx image-to-video Generate subject consistent videos using Lynx from ByteDance!
image-to-video subject
Deprecated 2025/9/26
Wan 2.5 Image to Image image-to-image Wan 2.5 image-to-image model. OK 2025/9/25
Wan 2.5 Text to Image text-to-image Wan 2.5 text-to-image model. OK 2025/9/25
Wan 2.5 Text to Video text-to-video Wan 2.5 text-to-video model. OK 2025/9/24
Wan 2.5 Image to Video image-to-video Wan 2.5 image-to-video model. OK 2025/9/24
Bytedance OmniHuman v1.5 image-to-video Omnihuman v1.5 is a new and improved version of Omnihuman. It generates video using an image of a human figure paired with an audio file. It produces vivid, high-quality videos where the character’s emotions and movements maintain a strong correlation with the audio.
image-to-video lipsync
OK 2025/9/23
Product Photoshoot image-to-image Create product advertisements with an example image of the product Deprecated 2025/9/23
Qwen Image Edit Plus image-to-image Endpoint for Qwen's Image Editing Plus model also known as Qwen-Image-Edit-2509. Has superior text editing capabilities and multi-image support.
image-editing image-to-image high-quality-text
OK 2025/9/22
Kling Video image-to-video Kling 2.5 Turbo Pro: Top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.
stylized transform
OK 2025/9/22
Kling v2.5 Text to Video text-to-video Kling 2.5 Turbo Pro: Top-tier text-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.
animation stylized
OK 2025/9/22
Infinitalk video-to-video Infinitalk model generates a talking avatar video from an image and audio file. The avatar lip-syncs to the provided audio with natural facial expressions.
video-to-video
OK 2025/9/22
SeedVR2 video-to-video Upscale your videos using SeedVR2 with temporal consistency!
upscale video-to-video
OK 2025/9/22
SeedVR2 image-to-image Use SeedVR2 to upscale your images
upscale image-to-image
OK 2025/9/22
Wan VACE Video Edit video-to-video Edit videos using plain language and Wan VACE
video-edit wan-vace
OK 2025/9/22
Wan-2.2 Animate Replace video-to-video Wan-Animate Replace is a model that can integrate animated characters into reference videos, replacing the original character while preserving the scene’s lighting and color tone for seamless environmental integration.
video to video motion
OK 2025/9/21
Wan-2.2 Animate Move video-to-video Wan-Animate is a video model that generates high-fidelity character videos by replicating the expressions and movements of characters from reference videos.
video to video motion
OK 2025/9/21
Fabric 1.0 image-to-video VEED Fabric 1.0 is an image-to-video API that turns any image into a talking video
lipsync avatar
OK 2025/9/19
Product Holding image-to-image Place products naturally in a person’s hands for realistic marketing visuals.
product marketing
OK 2025/9/19
Product Photography image-to-image Generate professional product photography with realistic lighting and backgrounds.
product marketing
OK 2025/9/19
Lucy Edit [Pro] video-to-video Edit outfits, objects, faces, or restyle your video - all with maximum detail retention.
video-edit
OK 2025/9/18
Lucy Edit [Dev] video-to-video Edit outfits, objects, faces, or restyle your video - all with maximum detail retention.
video-edit
OK 2025/9/18
Virtual Try-on image-to-image Try on clothes virtually by combining person and clothing images.
fashion try-on virtual-try-on
OK 2025/9/18
Texture Transform image-to-image Transform objects with different surface textures like marble, wood, or fabric.
texture-transform
OK 2025/9/18
Relighting image-to-image Adjust and enhance images with different lighting styles.
relighting
OK 2025/9/18
Style Transfer image-to-image Apply artistic styles like impressionism, cubism, or surrealism to your images.
style-transfer
OK 2025/9/18
Photo Restoration image-to-image Restore old or damaged photos by fixing colors, scratches, and resolution.
photo-restoration image-enhance
OK 2025/9/18
Portrait Enhance image-to-image Enhance and refine portrait photos with improved clarity and detail.
image-edit enhancement
OK 2025/9/18
Photography Effects image-to-image Apply diverse photography styles and effects to transform your images.
style-transfer photography
OK 2025/9/18
Perspective Change image-to-image Easily adjust the perspective of any image to different angles.
change-angle perspective
OK 2025/9/18
Object Removal image-to-image Remove unwanted objects seamlessly from any image.
remove object-removal
OK 2025/9/18
Headshot Generator image-to-image Generate professional headshot photos with customizable backgrounds.
headshot profile-photo
OK 2025/9/18
Hair Change image-to-image Change hairstyles and hair colors in photos realistically.
hair-edit style-change
OK 2025/9/18
Expression Change image-to-image Change facial expressions in photos with realistic results.
face-edit expression-change
OK 2025/9/18
City Teleport image-to-image Place a person’s photo into iconic cities worldwide.
city-teleport backgroundswap
OK 2025/9/18
Age Modify image-to-image Modify a face to look younger or older while keeping identity realistic.
age-transformation face-editing
OK 2025/9/18
Makeup Changer image-to-image Apply realistic makeup styles with adjustable intensity.
makeup transform
OK 2025/9/18
Qwen Image Edit image-to-image Inpainting Endpoint for the Qwen Edit Image editing model.
image-to-image inpainting qwen-image
OK 2025/9/17
Wan 2.2 VACE Fun A14B video-to-video VACE Fun for Wan 2.2 A14B from Alibaba-PAI OK 2025/9/17
Wan 2.2 VACE Fun A14B video-to-video VACE Fun for Wan 2.2 A14B from Alibaba-PAI OK 2025/9/17
Wan 2.2 VACE Fun A14B video-to-video VACE Fun for Wan 2.2 A14B from Alibaba-PAI OK 2025/9/17
Wan 2.2 VACE Fun A14B video-to-video VACE Fun for Wan 2.2 A14B from Alibaba-PAI OK 2025/9/17
Wan 2.2 VACE Fun A14B video-to-video VACE Fun for Wan 2.2 A14B from Alibaba-PAI OK 2025/9/17
Isaac 0.1 [OpenAI Compatible Endpoint] vision OpenAI spec compatible endpoint of Isaac-01 which is a multimodal vision-language model from Perceptron for various vision language tasks.
multimodal vision
OK 2025/9/17
Isaac 0.1 vision Isaac-01 is a multimodal vision-language model from Perceptron for various vision language tasks.
multimodal vision
OK 2025/9/17
FLUX.1 SRPO [dev] image-to-image FLUX.1 SRPO [dev] is a 12 billion parameter flow transformer that generates high-quality images from text with incredible aesthetics. It is suitable for personal and commercial use. OK 2025/9/15
FLUX.1 SRPO [dev] text-to-image FLUX.1 SRPO [dev] is a 12 billion parameter flow transformer that generates high-quality images from text with incredible aesthetics. It is suitable for personal and commercial use. OK 2025/9/15
FLUX.1 SRPO [dev] image-to-image FLUX.1 SRPO [dev] is a 12 billion parameter flow transformer that generates high-quality images from text with incredible aesthetics. It is suitable for personal and commercial use. OK 2025/9/15
FLUX.1 SRPO [dev] text-to-image FLUX.1 SRPO [dev] is a 12 billion parameter flow transformer that generates high-quality images from text with incredible aesthetics. It is suitable for personal and commercial use. OK 2025/9/15
Pshuman image-to-3d Use the 6D pose estimation capabilities of PSHuman to generate 3D files from a single image.
image-to-3D
OK 2025/9/13
Kling TTS text-to-speech Generate speech from text prompts and different voices using the Kling TTS model, which leverages advanced AI techniques to create high-quality text-to-speech.
audio
OK 2025/9/13
Kling AI Avatar image-to-video Kling AI Avatar Standard: Endpoint for creating avatar videos with realistic humans, animals, cartoons, or stylized characters
stylized transform
OK 2025/9/13
Kling AI Avatar Pro image-to-video Kling AI Avatar Pro: The premium endpoint for creating avatar videos with realistic humans, animals, cartoons, or stylized characters
stylized transform
OK 2025/9/13
MiniMax (Hailuo AI) Music v1.5 text-to-audio Generate music from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality, diverse musical compositions.
music
OK 2025/9/11
Decart Lucy 14b image-to-video Lucy-14B delivers lightning fast performance that redefines what's possible with image-to-video AI OK 2025/9/10
Qwen Image Edit Lora image-to-image LoRA inference endpoint for the Qwen Image Editing model.
image-to-image image-editing lora
OK 2025/9/10
Stable Audio 2.5 audio-to-audio Generate high quality music and sound effects using Stable Audio 2.5 from StabilityAI
audio
OK 2025/9/10
Stable Audio 2.5 text-to-audio Generate high quality music and sound effects using Stable Audio 2.5 from StabilityAI
audio
OK 2025/9/10
Stable Audio 25 audio-to-audio Generate high quality music and sound effects using Stable Audio 2.5 from StabilityAI
audio
OK 2025/9/10
Hunyuan Image text-to-image Use the amazing capabilities of Hunyuan Image 2.1 to generate images that express the feelings of your text.
text-to-image
OK 2025/9/9
Elevenlabs text-to-audio Generate realistic audio dialogues using Eleven-v3 from ElevenLabs.
audio
OK 2025/9/9
Vidu image-to-image Vidu Reference-to-Image creates images by combining reference images with a prompt.
images-to-image
OK 2025/9/9
Bytedance Seedream v4 Edit image-to-image A new-generation image creation model from ByteDance, Seedream 4.0 integrates image generation and image editing capabilities into a single, unified architecture.
stylized transform editing
OK 2025/9/9
Bytedance Seedream v4 text-to-image A new-generation image creation model from ByteDance, Seedream 4.0 integrates image generation and image editing capabilities into a single, unified architecture.
stylized transform
OK 2025/9/9
Hunyuan Video Foley video-to-video Use the capabilities of the Hunyuan Foley model to bring life to your videos by adding sound effects to them.
video-to-video add-sound
OK 2025/9/8
Chatterbox text-to-speech Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. Use the first TTS model from Resemble AI.
text-to-speech multilingual
OK 2025/9/4
Wan image-to-image Wan 2.2's 14B model edits high-resolution, photorealistic images with powerful prompt understanding and fine-grained visual detail
image-to-image
OK 2025/9/3
Elevenlabs text-to-audio Generate sound effects using ElevenLabs' advanced sound effects model.
sound
OK 2025/9/2
Sync Lipsync video-to-video Generate high-quality realistic lipsync animations from audio while preserving unique details like natural teeth and unique facial features using the state-of-the-art Sync Lipsync 2 Pro model.
animation lip sync high-quality
OK 2025/9/2
Bytedance image-to-video Seedance lite reference-to-video allows the use of 1 to 4 images as reference to create a high-quality video.
reference-to-video image-to-video
OK 2025/9/1
Avatars Text to Video text-to-video High-quality avatar videos that feel real, generated from your text OK 2025/9/1
Avatars Audio to Video audio-to-video High-quality avatar videos that feel real, generated from your audio OK 2025/9/1
Uso image-to-image Use USO to perform subject-driven generation using a reference image.
image-to-image
OK 2025/8/30
Wan Ati image-to-video WAN-ATI is a controllable video generation model that uses trajectory instructions to guide object, local, and camera motion, enabling precise and flexible image-to-video creation. OK 2025/8/29
Decart image-to-video Lucy-5B is a model that can create 5-second I2V videos in under 5 seconds, achieving >1x RTF end-to-end OK 2025/8/28
Wan 2.2 Fun Control video-to-video Generate pose or depth controlled video using Alibaba-PAI's Wan 2.2 Fun
wan pose depth
OK 2025/8/28
VibeVoice 7B text-to-speech Generate long, expressive multi-voice speech using Microsoft's powerful TTS
text-to-speech multi-speaker podcast
OK 2025/8/27
VibeVoice 1.5B text-to-speech Generate long, expressive multi-voice speech using Microsoft's powerful TTS
text-to-speech multi-speaker podcast
OK 2025/8/27
Wan-2.2 Speech-to-Video 14B audio-to-video Wan-S2V is a video model that generates high-quality videos from static images and audio, with realistic facial expressions, body movements, and professional camera work for film and television applications
audio-to-video talking-head
OK 2025/8/27
Video video-to-video Upscale videos up to 8K output resolution. Trained on fully licensed and commercially safe data.
video-upscaling upscale
OK 2025/8/26
Gemini 2.5 Flash Image image-to-image Gemini 2.5 Flash Image is Google's state-of-the-art image generation and editing model
image-editing
OK 2025/8/26
Gemini 2.5 Flash Image text-to-image Nano Banana is Google's state-of-the-art image generation and editing model
text-to-image
OK 2025/8/26
Qwen Image image-to-image Qwen-Image (Image-to-Image) transforms and edits input images with high fidelity, enabling precise style transfer, enhancement, and creative modification.
image-to-image
OK 2025/8/25
Sonauto V2 audio-to-audio Extend an existing song
music text-to-music text-to-audio
OK 2025/8/23
Sonauto V2 text-to-audio Replace sections of an existing audio with newly generated content
music text-to-music text-to-audio
OK 2025/8/23
Sonauto V2 text-to-audio Create full songs in any style
music text-to-music text-to-audio
OK 2025/8/23
Pixverse image-to-video Create seamless transitions between images using PixVerse v5
stylized transform
OK 2025/8/23
Pixverse image-to-video Generate high quality video clips with different effects using PixVerse v5
image-to-video
OK 2025/8/23
Pixverse v5 Image to Video image-to-video Generate high quality video clips from text and image prompts using PixVerse v5
stylized transform
OK 2025/8/23
Pixverse text-to-video Generate high quality video clips from text and image prompts using PixVerse v5 OK 2025/8/23
Infinitalk text-to-video Infinitalk model generates a talking avatar video from text and an audio file. The avatar lip-syncs to the provided audio with natural facial expressions. OK 2025/8/22
Infinitalk video-to-video Infinitalk model generates a talking avatar video from an image and audio file. The avatar lip-syncs to the provided audio with natural facial expressions.
stylized transform
OK 2025/8/21
Elevenlabs text-to-audio Generate text-to-speech audio using Eleven-v3 from ElevenLabs.
audio
OK 2025/8/20
Reimagine image-to-image Reimagine uses a structure reference for generating new images while preserving the structure of an input image, guided by text prompts. Perfect for transforming sketches, illustrations, or photos into new illustrations. Trained exclusively on licensed data
bria
OK 2025/8/20
Nano Banana image-to-image Google's state-of-the-art image generation and editing model
image-editing
OK 2025/8/19
Nano Banana text-to-image Google's state-of-the-art image generation and editing model
image-generation
OK 2025/8/19
Nextstep 1 image-to-image Endpoint for NextStep-1 Autoregressive Image Editing model. OK 2025/8/19
Qwen Image Edit image-to-image Endpoint for Qwen's Image Editing model. Has superior text editing capabilities.
image-editing image-to-image high-quality-text
OK 2025/8/18
Mirelo SFX video-to-audio Generate synced sounds for any video, and return the new sound track (like MMAudio)
sfx
OK 2025/8/15
Mirelo SFX video-to-video Generate synced sounds for any video, and return it with its new sound track (like MMAudio)
video-to-video sfx
OK 2025/8/14
Stable Avatar audio-to-video Stable Avatar generates audio-driven video avatars up to five minutes long
stable-avatar talking-head audio-to-video
OK 2025/8/14
Marey Realism V1.5 video-to-video Ideal for matching human movement. Your input video determines human poses, gestures, and body movements that will appear in the generated video. OK 2025/8/14
Marey Realism V1.5 video-to-video Pull motion from a reference video and apply it to new subjects or scenes. OK 2025/8/14
Marey Realism V1.5 image-to-video Generate a video starting from an image as the first frame with Marey, a generative video model trained exclusively on fully licensed data. OK 2025/8/14
Qwen Image Trainer training Qwen Image LoRA training
lora personalization
OK 2025/8/14
Marey Realism V1.5 text-to-video Generate a video from a text prompt with Marey, a generative video model trained exclusively on fully licensed data. OK 2025/8/14
EchoMimic V3 audio-to-video EchoMimic V3 generates a talking avatar model from a picture, audio and text prompt.
echomimic talking-head audio-to-video
OK 2025/8/13
Any LLM llm Use any large language model from our selected catalogue (powered by OpenRouter)
chat claude gpt streaming
Deprecated 2025/8/13
Luma Dream Machine image-to-video Generate video clips from your images using Luma Dream Machine v1.5
motion transformation
Deprecated 2025/8/13
Any VLM vision Use any vision language model from our selected catalogue (powered by OpenRouter)
multimodal vision streaming
Deprecated 2025/8/13
Luma Dream Machine text-to-video Generate video clips from your prompts using Luma Dream Machine v1.5
motion transformation
Deprecated 2025/8/13
PlayAI Text-to-Speech Dialog text-to-audio Generate natural-sounding multi-speaker dialogues, and audio. Perfect for expressive outputs, storytelling, games, animations, and interactive media.
audio
Deprecated 2025/8/13
PlayAI Text-to-Speech v3 text-to-speech Blazing-fast text-to-speech. Generate audio with improved emotional tones and extensive multilingual support. Ideal for high-volume processing and efficient workflows. Deprecated 2025/8/13
FLUX.1 [pro] Canny Fine-tuned image-to-image Utilize Flux.1 [pro] Controlnet with a fine-tuned LoRA to generate high-quality images with precise control over composition, style, and structure through advanced edge detection and guidance mechanisms.
controlnet detection editing composition
Deprecated 2025/8/13
FLUX.1 [pro] Depth image-to-image Generate high-quality images from depth maps using Flux.1 [pro] depth estimation model. The model produces accurate depth representations for scene understanding and 3D visualization.
depth utility composition
Deprecated 2025/8/13
Train Flux LoRAs For Pro Models training FLUX LoRA for Pro endpoints.
lora personalization
Deprecated 2025/8/13
FLUX.1 [pro] Depth Fine-tuned image-to-image Generate high-quality images from depth maps using Flux.1 [pro] depth estimation model with a fine-tuned LoRA. The model produces accurate depth representations for scene understanding and 3D visualization.
depth utility composition
Deprecated 2025/8/13
FLUX.1 [pro] Canny image-to-image Utilize Flux.1 [pro] Controlnet to generate high-quality images with precise control over composition, style, and structure through advanced edge detection and guidance mechanisms.
controlnet detection editing composition
Deprecated 2025/8/13
ElevenLabs Sound Effects text-to-audio Generate sound effects using ElevenLabs' advanced sound effects model.
sound
Deprecated 2025/8/13
Easel AI Advanced Face Swap image-to-image Swap faces of one or two people at once, while preserving user and scene details!
face swap utility editing
Deprecated 2025/8/13
Tavus LipSync v2 video-to-video Generate lip sync using Tavus' state-of-the-art model for high-quality synchronization. Deprecated 2025/8/13
gpt-image-1 image-to-image OpenAI's latest image generation and editing model: gpt-image-1. Currently powered with bring-your-own-key. Deprecated 2025/8/13
gpt-image-1 text-to-image OpenAI's latest image generation and editing model: gpt-image-1. Currently powered with bring-your-own-key. Deprecated 2025/8/13
Easel Avatar text-to-image Create scenes with one or two people using just selfies and text prompt (without LoRAs)
avatars loras image-generation
Deprecated 2025/8/13
Easel Gifswap image-to-image Swap faces on GIFs
utility editing
Deprecated 2025/8/13
PlayAI Inpaint audio-to-audio A novel way to perform audio editing, ensuring smooth transitions and consistent speaker characteristics for edits.
audio inpaint
Deprecated 2025/8/13
Lipsync video-to-video Realistic lipsync video - optimized for speed, quality, and consistency. Deprecated 2025/8/13
any-llm Enterprise llm Run any large language model with fal, powered by OpenRouter. This endpoint only supports models that do not train on private data. Read more in OpenRouter's Privacy and Logging documentation.
chat claude gpt
Deprecated 2025/8/13
Fashion Photoshoot image-to-image Instant fashion photoshoot with a selfie and an outfit
image-to-image
Deprecated 2025/8/13
Fashion Try On image-to-image Instant fashion try on with a full-body pic and an outfit Deprecated 2025/8/13
Bytedance image-to-video Transform your images into stylized videos using this workflow.
image-to-video effects
OK 2025/8/12
Ffmpeg Api video-to-video Use ffmpeg capabilities to merge 2 or more videos. OK 2025/8/12
Minimax text-to-speech Generate speech from text prompts and different voices using the MiniMax Speech-02 HD model, which leverages advanced AI techniques to create high-quality text-to-speech.
speech
OK 2025/8/11
Minimax text-to-speech Generate fast speech from text prompts and different voices using the MiniMax Speech-02 Turbo model, which leverages advanced AI techniques to create high-quality text-to-speech.
text-to-speech
OK 2025/8/11
Wan 2.2 14B Image Trainer training Wan 2.2 text to image LoRA trainer. Fine-tune Wan 2.2 for subjects and styles with unprecedented detail.
lora personalization
OK 2025/8/11
Ideogram V3 Character Edit image-to-image Modify consistent characters while preserving their core identity. Edit poses, expressions, or clothing without losing recognizable character features
character-consistency
OK 2025/8/7
Ideogram V3 Character image-to-image Generate consistent character appearances across multiple images. Maintain facial features, proportions, and distinctive traits for cohesive storytelling and branding
character-consistency
OK 2025/8/7
Ideogram V3 Character Remix image-to-image Transform your consistent character into different art styles, settings, or scenarios while maintaining their distinctive appearance and identity
character-consistency
OK 2025/8/7
Wan-2.2 Text-to-Video A14B with LoRAs text-to-video Wan-2.2 text-to-video is a video model that generates high-quality videos with high visual quality and motion diversity from text prompts. This endpoint supports LoRAs made for Wan 2.2. OK 2025/8/7
Wan v2.2 A14B Image-to-Video A14B with LoRAs image-to-video Wan-2.2 image-to-video is a video model that generates high-quality videos with high visual quality and motion diversity from text prompts and images. This endpoint supports LoRAs made for Wan 2.2
image-to-video motion lora
OK 2025/8/7
Wan text-to-video Wan 2.2's 5B distill model produces up to 5 seconds of 720p video at 24 FPS with fluid motion and powerful prompt understanding OK 2025/8/6
Minimax image-to-video Create blazing fast and economical videos with MiniMax Hailuo-02 Image To Video API at 512p resolution
stylized transform
OK 2025/8/6
Bytedance text-to-image Dreamina showcases superior picture effects, with significant improvements in picture aesthetics, precise and diverse styles, and rich details.
text-to-image
OK 2025/8/6
Wan text-to-video Wan 2.2's 5B FastVideo model produces up to 5 seconds of 720p video at 24 FPS with fluid motion and powerful prompt understanding
text to video motion
OK 2025/8/5
Wan v2.2 A14B Text-to-Image A14B with LoRAs text-to-image Wan 2.2's 14B model with LoRA support generates high-fidelity images with enhanced prompt alignment and style adaptability. OK 2025/8/5
Wan text-to-image Wan 2.2's 5B model generates high-resolution, photorealistic images with powerful prompt understanding and fine-grained visual detail OK 2025/8/5
Wan text-to-image Wan 2.2's 14B model generates high-resolution, photorealistic images with powerful prompt understanding and fine-grained visual detail OK 2025/8/5
Qwen Image text-to-image Qwen-Image is an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering and precise image editing.
text-to-image
OK 2025/8/4
Wan video-to-video Wan-2.2 video-to-video is a video model that generates high-quality videos with high visual quality and motion diversity from text prompts and source videos. OK 2025/8/2
Train Flux Krea LoRA training Train styles, people and other subjects at blazing speeds using the FLUX.1 Krea [dev] base model.
lora personalization
OK 2025/8/1
Flux Krea Lora text-to-image Super fast endpoint for the FLUX.1 [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.
lora personalization
OK 2025/8/1
FLUX.1 Krea [dev] Inpainting with LoRAs image-to-image Super fast endpoint for the FLUX.1 [dev] inpainting model with LoRA support, enabling rapid and high-quality image inpainting using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.
lora personalization
OK 2025/8/1
FLUX.1 Krea [dev] with LoRAs text-to-image Super fast endpoint for the FLUX.1 [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.
lora personalization
OK 2025/8/1
FLUX.1 Krea [dev] with LoRAs image-to-image FLUX LoRA Image-to-Image is a high-performance endpoint that transforms existing images using FLUX models, leveraging LoRA adaptations to enable rapid and precise image style transfer, modifications, and artistic variations.
lora style transfer
OK 2025/8/1
Veo3 image-to-video Veo 3 is the latest state-of-the-art video generation model from Google DeepMind OK 2025/8/1
Wan image-to-video Wan-2.2 Turbo image-to-video is a video model that generates high-quality videos with high visual quality and motion diversity from text prompts. OK 2025/7/31
Wan text-to-video Wan-2.2 turbo text-to-video is a video model that generates high-quality videos with high visual quality and motion diversity from text prompts.
text to video motion
OK 2025/7/31
FLUX.1 Krea [dev] image-to-image FLUX.1 Krea [dev] is a 12 billion parameter flow transformer that generates high-quality images from text with incredible aesthetics. It is suitable for personal and commercial use. OK 2025/7/30
FLUX.1 Krea [dev] Redux image-to-image FLUX.1 Krea [dev] Redux is a high-performance endpoint for the FLUX.1 Krea [dev] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities. OK 2025/7/30
FLUX.1 Krea [dev] text-to-image FLUX.1 Krea [dev] is a 12 billion parameter flow transformer that generates high-quality images from text with incredible aesthetics. It is suitable for personal and commercial use. OK 2025/7/30
FLUX.1 Krea [dev] image-to-image FLUX.1 Krea [dev] is a 12 billion parameter flow transformer that generates high-quality images from text with incredible aesthetics. It is suitable for personal and commercial use. OK 2025/7/30
FLUX.1 Krea [dev] Redux image-to-image FLUX.1 Krea [dev] Redux is a high-performance endpoint for the FLUX.1 Krea [dev] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities. OK 2025/7/30
FLUX.1 Krea [dev] text-to-image FLUX.1 Krea [dev] is a 12 billion parameter flow transformer that generates high-quality images from text with incredible aesthetics. It is suitable for personal and commercial use. OK 2025/7/30
Wan v2.2 5B image-to-video Wan 2.2's 5B model produces up to 5 seconds of 720p video at 24 FPS with fluid motion and powerful prompt understanding OK 2025/7/30
Flux Kontext Lora image-to-image Fast inpainting endpoint for the FLUX.1 Kontext [dev] model with LoRA support, enabling rapid and high-quality image inpainting with reference images, while using pre-trained LoRA adaptations for specific styles, brand identities, and product-specific outputs.
image-editing image-inpainting image-to-image
OK 2025/7/29
Wan v2.2 5B text-to-video Wan 2.2's 5B model produces up to 5 seconds of 720p video at 24 FPS with fluid motion and powerful prompt understanding OK 2025/7/28
Wan-2.2 Text-to-Video A14B text-to-video Wan-2.2 text-to-video is a video model that generates high-quality videos with high visual quality and motion diversity from text prompts.
text to video motion
OK 2025/7/28
Wan v2.2 A14B image-to-video fal-ai/wan/v2.2-A14B/image-to-video OK 2025/7/28
Hunyuan World image-to-image Hunyuan World 1.0 turns a single image into a panorama or a 3D world. It creates realistic scenes from the image, allowing you to explore and view it from different angles. OK 2025/7/28
Hunyuan World image-to-3d Hunyuan World 1.0 turns a single image into a panorama or a 3D world. It creates realistic scenes from the image, allowing you to explore and view it from different angles. OK 2025/7/28
NSFW Checker vision Predict whether an image is NSFW or SFW.
filter safety utility
OK 2025/7/28
OmniHuman image-to-video OmniHuman generates video using an image of a human figure paired with an audio file. It produces vivid, high-quality videos where the character’s emotions and movements maintain a strong correlation with the audio.
image-to-video lipsync
OK 2025/7/27
Sky Raccoon text-to-image Generate images from a text prompt.
text-to-image
OK 2025/7/26
Image Editing image-to-image Retouch photos of faces. Remove blemishes and improve the skin. OK 2025/7/24
Hidream E1 1 image-to-image Edit images with natural language OK 2025/7/23
LTX-Video 13B 0.9.8 Distilled video-to-video Extend videos using LTX Video-0.9.8 13B Distilled and custom LoRA
ltx-video extend
OK 2025/7/23
RIFE video-to-video Interpolate videos with RIFE - Real-Time Intermediate Flow Estimation
interpolation
OK 2025/7/22
RIFE image-to-image Interpolate images with RIFE - Real-Time Intermediate Flow Estimation
interpolation
OK 2025/7/22
FILM video-to-video Interpolate videos with FILM - Frame Interpolation for Large Motion
interpolation
OK 2025/7/22
FILM image-to-image Interpolate images with FILM - Frame Interpolation for Large Motion
interpolation
OK 2025/7/22
MiniMax Voice Design text-to-speech Design a personalized voice from a text description, and generate speech from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality text-to-speech.
speech
OK 2025/7/18
Luma Ray 2 Flash Modify video-to-video Ray2 Flash Modify is a video generative model capable of restyling or retexturing the entire shot, from turning live-action into CG or stylized animation, to changing wardrobe, props, or the overall aesthetic, to swapping environments or time periods, giving you control over background, location, or even weather.
modify restyle
OK 2025/7/17
LTX-Video 13B 0.9.8 Distilled image-to-video Generate long videos from prompts and images using LTX Video-0.9.8 13B Distilled and custom LoRA
video ltx-video image-to-video
OK 2025/7/17
LTX-Video 13B 0.9.8 Distilled text-to-video Generate long videos from prompts using LTX Video-0.9.8 13B Distilled and custom LoRA
video ltx-video text-to-video
OK 2025/7/17
LTX-Video 13B 0.9.8 Distilled video-to-video Generate long videos from prompts, images, and videos using LTX Video-0.9.8 13B Distilled and custom LoRA
video ltx-video video-to-video multicondition-to-video image-to-video
OK 2025/7/17
Calligrapher image-to-image Use Calligrapher's text- and font-retaining capabilities to modify text on books, clothes, and more.
image-to-image
OK 2025/7/12
Veo 3 Fast [Image to Video] image-to-video Now with a 50% price drop. Generate videos from your image prompts using Veo 3 fast. OK 2025/7/9
Veo 3 Fast text-to-video Faster and more cost effective version of Google's Veo 3! OK 2025/7/9
Ffmpeg Api json Get EBU R128 loudness normalization from audio files using FFmpeg API.
ffmpeg
OK 2025/7/8
Vidu image-to-video Generate video clips from your multiple image references using Vidu Q1
stylized transform
OK 2025/7/8
Bria image-to-image Structure Reference allows generating new images while preserving the structure of an input image, guided by text prompts. Perfect for transforming sketches, illustrations, or photos into new illustrations. Trained exclusively on licensed data for safe and risk-free commercial use. OK 2025/7/8
Pixverse video-to-video Add immersive sound effects and background music to your videos using PixVerse sound effects generation
audio utility
OK 2025/7/7
Image Editing image-to-image Add details to faces, enhance face features, remove blur.
stylized transform realism
OK 2025/7/7
ThinkSound video-to-video Generate realistic audio from a video with an optional text prompt
audio-generation video-to-audio
OK 2025/7/2
ThinkSound video-to-video Generate realistic audio for a video with an optional text prompt and combine
audio-generation video-to-audio
OK 2025/7/1
Post Processing image-to-image Add a darkening vignette effect around the edges of the image with adjustable strength
stylized transform
OK 2025/7/1
Post Processing image-to-image Apply solarization effect by inverting pixel values above a threshold
stylized transform
OK 2025/7/1
Post Processing image-to-image Apply sharpening effects with three modes: basic unsharp mask, smart sharpening with edge preservation, and Contrast Adaptive Sharpening (CAS).
stylized transform
OK 2025/7/1
Post Processing image-to-image Apply a parabolic distortion effect with configurable coefficient and vertex position.
stylized transform
OK 2025/7/1
Post Processing image-to-image Apply film grain effect with different styles (modern, analog, kodak, fuji, cinematic, newspaper) and customizable intensity and scale
stylized transform
OK 2025/7/1
Post Processing image-to-image Apply dodge and burn effects with multiple modes and adjustable intensity.
stylized transform
OK 2025/7/1
Post Processing image-to-image Blend two images together using smooth linear interpolation with a configurable blend factor.
stylized transform
OK 2025/7/1
Post Processing image-to-image Reduce color saturation using different methods (luminance Rec.709, luminance Rec.601, average, lightness) with adjustable factor.
stylized transform
OK 2025/7/1
Post Processing image-to-image Apply various color tints (sepia, red, green, blue, cyan, magenta, yellow, purple, orange, warm, cool, lime, navy, vintage, rose, teal, maroon, peach, lavender, olive) with adjustable strength.
stylized transform
OK 2025/7/1
Post Processing image-to-image Adjust color temperature, brightness, contrast, saturation, and gamma values for color correction.
stylized transform
OK 2025/7/1
Post Processing image-to-image Create chromatic aberration by shifting red, green, and blue channels horizontally or vertically with customizable shift amounts.
stylized transform
OK 2025/7/1
Post Processing image-to-image Apply Gaussian or Kuwahara blur effects with adjustable radius and sigma parameters
stylized transform
OK 2025/7/1
Pixverse video-to-video PixVerse Extend model is a video extending tool for your videos using high-quality video extending techniques
utility editing
OK 2025/6/30
Pixverse video-to-video PixVerse Extend model is a video extending tool for your videos using high-quality video extending techniques
utility editing
OK 2025/6/30
Pixverse video-to-video Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization with PixVerse Lipsync model
animation lip sync
OK 2025/6/30
Image Editing image-to-image Generate YouTube thumbnails with custom text
stylized transform
OK 2025/6/30
Video video-to-video Automatically remove backgrounds from videos, perfect for creating clean, professional content without a green screen.
background-removal
OK 2025/6/30
Luma Ray 2 Modify video-to-video Ray2 Modify is a video generative model capable of restyling or retexturing the entire shot, from turning live-action into CG or stylized animation, to changing wardrobe, props, or the overall aesthetic, to swapping environments or time periods, giving you control over background, location, or even weather.
modify restyle
OK 2025/6/28
Topaz image-to-image Use the powerful and accurate Topaz image enhancer to enhance your images.
image-to-image
OK 2025/6/27
Bytedance image-to-image SeedEdit 3.0 is an image editing model independently developed by ByteDance. It excels at accurately following editing instructions and effectively preserving image content, especially when handling real images.
image-editing image-to-image
Deprecated 2025/6/27
Flux Kontext Trainer training LoRA trainer for FLUX.1 Kontext [dev] OK 2025/6/26
Image Editing image-to-image Transform your character's hair into broccoli style while keeping the original character's likeness
stylized transform
OK 2025/6/26
Image Editing image-to-image Transform your photos into wojak style while keeping the original character's likeness
stylized transform
OK 2025/6/26
Image Editing image-to-image Transform your photos into cool plushies while keeping the original character's likeness
stylized transform
OK 2025/6/26
Flux Kontext Lora text-to-image Super fast text-to-image endpoint for the FLUX.1 Kontext [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.
text-to-image
OK 2025/6/25
Flux Kontext Lora image-to-image Fast endpoint for the FLUX.1 Kontext [dev] model with LoRA support, enabling rapid and high-quality image editing using pre-trained LoRA adaptations for specific styles, brand identities, and product-specific outputs.
image-editing image-to-image
OK 2025/6/25
Omnigen V2 text-to-image OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It can be used for various tasks such as Image Editing, Personalized Image Generation, Virtual Try-On, Multi Person Generation and more!
multimodal editing try-on
OK 2025/6/25
FASHN Virtual Try-On V1.6 image-to-image FASHN v1.6 delivers precise virtual try-on capabilities, accurately rendering garment details like text and patterns at 864x1296 resolution from both on-model and flat-lay photo references.
try-on fashion clothing
OK 2025/6/24
Ai Avatar image-to-video MultiTalk model generates a talking avatar video from an image and text. Converts text to speech automatically, then generates the avatar speaking with lip-sync.
stylized transform
OK 2025/6/23
Ai Avatar image-to-video MultiTalk model generates a talking avatar video from an image and audio file. The avatar lip-syncs to the provided audio with natural facial expressions.
stylized transform
OK 2025/6/23
Ai Avatar image-to-video MultiTalk model generates a multi-person conversation video from an image and text inputs. Converts text to speech for each person, generating a realistic conversation scene.
stylized transform
OK 2025/6/23
Ai Avatar image-to-video MultiTalk model generates a multi-person conversation video from an image and audio files. Creates a realistic scene where multiple people speak in sequence.
stylized transform
OK 2025/6/23
Video Understanding vision A video understanding model to analyze video content and answer questions about what's happening in the video based on user prompts.
utility vision
OK 2025/6/20
Wan VACE 14B video-to-video VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.
reframe
OK 2025/6/18
Wan VACE 14B video-to-video VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.
image-to-video video-to-video text-to-video
OK 2025/6/18
Wan VACE 14B video-to-video VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.
image-to-video video-to-video text-to-video
OK 2025/6/18
Wan VACE 14B video-to-video VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.
image-to-video video-to-video text-to-video
OK 2025/6/18
Wan VACE 14B video-to-video VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.
image-to-video video-to-video text-to-video
OK 2025/6/18
Chain Of Zoom image-to-image Extreme Super-Resolution via Scale Autoregression and Preference Alignment OK 2025/6/18
Tripo3D image-to-3d State of the art Multiview to 3D Object generation. Generate 3D models from multiple images!
stylized multiview
OK 2025/6/18
MiniMax Hailuo 02 [Standard] (Image to Video) image-to-video MiniMax Hailuo-02 Image To Video API (Standard, 768p, 512p): Advanced image-to-video generation model with 768p and 512p resolutions OK 2025/6/18
MiniMax Hailuo 02 [Pro] (Image to Video) image-to-video MiniMax Hailuo-02 Image To Video API (Pro, 1080p): Advanced image-to-video generation model with 1080p resolution OK 2025/6/18
MiniMax Hailuo 02 [Pro] (Text to Video) text-to-video MiniMax Hailuo-02 Text To Video API (Pro, 1080p): Advanced video generation model with 1080p resolution OK 2025/6/18
MiniMax Hailuo 02 [Standard] (Text to Video) text-to-video MiniMax Hailuo-02 Text To Video API (Standard, 768p): Advanced video generation model with 768p resolution OK 2025/6/18
PASD image-to-image Pixel-Aware Diffusion Model for Realistic Image Super-Resolution and Personalized Stylization
utility editing
OK 2025/6/17
Bria 3.2 Text-to-Image text-to-image Bria’s Text-to-Image model, trained exclusively on licensed data for safe and risk-free commercial use. Excels in Text-Rendering and Aesthetics.
image generation
OK 2025/6/17
Object Removal image-to-image Removes box-selected objects and their visual effects, seamlessly reconstructing the scene with contextually appropriate content.
utility editing
OK 2025/6/16
Object Removal image-to-image Removes mask-selected objects and their visual effects, seamlessly reconstructing the scene with contextually appropriate content.
utility editing
OK 2025/6/16
Object Removal image-to-image Removes objects and their visual effects using natural language, replacing them with contextually appropriate content
utility editing
OK 2025/6/16
Seedance 1.0 Pro text-to-video Seedance 1.0 Pro, a high quality video generation model developed by Bytedance. OK 2025/6/16
Seedance 1.0 Pro image-to-video Seedance 1.0 Pro, a high quality video generation model developed by Bytedance. OK 2025/6/16
DWPose Pose Prediction video-to-video Predict poses from videos.
pose utility
OK 2025/6/15
Hunyuan 3D 2.1 image-to-3d Hunyuan3D-2.1 is a scalable 3D asset creation system that advances state-of-the-art 3D generation through Physically-Based Rendering (PBR).
image-to-3d
OK 2025/6/14
Seedance 1.0 Lite image-to-video Seedance 1.0 Lite OK 2025/6/13
Seedance 1.0 Lite text-to-video Seedance 1.0 Lite OK 2025/6/13
Recraft image-to-image Converts a given raster image to SVG format using Recraft model.
stylized transform
OK 2025/6/12
Imagen 4 text-to-image Google’s highest quality image generation model OK 2025/6/12
Wan-2.1 LoRA Trainer training Train custom LoRAs for Wan-2.1 T2V 1.3B
lora training
OK 2025/6/11
Wan-2.1 LoRA Trainer training Train custom LoRAs for Wan-2.1 T2V 14B
lora training
OK 2025/6/11
Wan-2.1 LoRA Trainer training Train custom LoRAs for Wan-2.1 I2V 720P
lora training
OK 2025/6/11
Wan-2.1 LoRA Trainer training Train custom LoRAs for Wan-2.1 FLF2V 720P
lora training
OK 2025/6/11
Bytedance text-to-image Seedream 3.0 is a bilingual (Chinese and English) text-to-image model that excels at text-to-image generation. OK 2025/6/10
Ffmpeg Api image-to-image ffmpeg endpoint for first, middle and last frame extraction from videos
utility editing
OK 2025/6/9
Ffmpeg Api Merge Audio-Video video-to-video Merge videos with standalone audio files or audio from video files.
ffmpeg
OK 2025/6/9
Luma Photon image-to-image Edit images from your prompts using Luma Photon. Photon is the most creative, personalizable, and intelligent visual model for creatives, bringing a step-function change in the cost of high-quality image generation.
image-to-image
OK 2025/6/8
Luma Photon image-to-image Edit images from your prompts using Luma Photon. Photon is the most creative, personalizable, and intelligent visual model for creatives, bringing a step-function change in the cost of high-quality image generation.
image-to-image
OK 2025/6/8
Veo 3 text-to-video Veo 3 by Google, the most advanced AI video generation model in the world. With sound on! OK 2025/6/5
Image Editing image-to-image The reframe endpoint intelligently adjusts an image's aspect ratio while preserving the main subject's position, composition, pose, and perspective
stylized transform
OK 2025/6/5
Wan Vace 1 3b video-to-video VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.
video-to-video
OK 2025/6/4
Image Editing image-to-image Transform any person into their baby version, while preserving the original pose and expression with childlike features.
stylized transform
OK 2025/6/3
Luma Ray 2 Flash Reframe video-to-video Adjust and enhance videos with Ray-2 Reframe. This advanced tool seamlessly reframes videos to your desired aspect ratio, intelligently inpainting missing regions to ensure realistic visuals and coherent motion, delivering exceptional quality and creative flexibility.
reframe outpaint flash
OK 2025/6/3
Luma Ray 2 Reframe video-to-video Adjust and enhance videos with Ray-2 Reframe. This advanced tool seamlessly reframes videos to your desired aspect ratio, intelligently inpainting missing regions to ensure realistic visuals and coherent motion, delivering exceptional quality and creative flexibility.
reframe outpaint
OK 2025/6/3
Luma Photon Flash Reframe image-to-image This advanced tool intelligently expands your visuals, seamlessly blending new content to enhance creativity and adaptability, offering unmatched speed and quality for creators at a fraction of the cost.
flash reframe outpainting
OK 2025/6/3
Luma Photon Reframe image-to-image Extend and reframe images with Luma Photon Reframe. This advanced tool intelligently expands your visuals, seamlessly blending new content to enhance creativity and adaptability, offering unmatched personalization and quality for creators at a fraction of the cost.
outpainting reframe
OK 2025/6/3
Chatterboxhd speech-to-speech Transform voices using Resemble AI's Chatterbox. Convert audio to new voices or your own samples, with expressive results and built-in perceptual watermarking. OK 2025/6/2
Chatterboxhd text-to-speech Generate expressive, natural speech with Resemble AI's Chatterbox. Features unique emotion control, instant voice cloning from short audio, and built-in watermarking. OK 2025/6/2
FLUX.1 [schnell] Redux image-to-image FLUX.1 [schnell] Redux is a high-performance endpoint for the FLUX.1 [schnell] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities. OK 2025/6/2
FLUX.1 [dev] Redux image-to-image FLUX.1 [dev] Redux is a high-performance endpoint for the FLUX.1 [dev] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities. OK 2025/6/2
FLUX.1 [dev] image-to-image FLUX.1 [dev] is a 12 billion parameter flow transformer that generates high-quality images from text. It is suitable for personal and commercial use. OK 2025/6/2
FLUX.1 [schnell] text-to-image Fastest inference in the world for the 12 billion parameter FLUX.1 [schnell] text-to-image model. OK 2025/6/2
FLUX.1 [dev] text-to-image FLUX.1 [dev] is a 12 billion parameter flow transformer that generates high-quality images from text. It is suitable for personal and commercial use. OK 2025/6/2
Image Editing image-to-image Remove all text and writing from images while preserving the background and natural appearance.
stylized transform
OK 2025/6/2
Image Editing image-to-image Restore and enhance old or damaged photos by removing imperfections, adding color while preserving the original character and details of the image.
stylized transform
OK 2025/6/2
Chatterbox speech-to-speech Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. Use the first TTS model from Resemble AI.
speech-to-speech
OK 2025/6/1
Chatterbox text-to-speech Whether you're working on memes, videos, games, or AI agents, Chatterbox brings your content to life. Use the first TTS model from Resemble AI.
text-to-speech
OK 2025/6/1
Image Editing image-to-image Add realistic weather effects like snowfall, rain, or fog to your photos while maintaining the scene's mood.
stylized transform
OK 2025/5/29
Image Editing image-to-image Transform your photos to any time of day, from golden hour to midnight, with appropriate lighting and atmosphere.
stylized transform
OK 2025/5/29
Image Editing image-to-image Transform your photos into artistic masterpieces inspired by famous styles like Van Gogh's Starry Night or any artistic style you choose.
stylized transform
OK 2025/5/29
Image Editing image-to-image Place your subject in any scene you imagine, from enchanted forests to urban settings, with professional composition and lighting
stylized transform
OK 2025/5/29
Image Editing image-to-image Turn your casual photos into stunning professional studio portraits with perfect lighting and high-end photography style.
stylized transform
OK 2025/5/29
Image Editing image-to-image Remove unwanted objects or people from your photos while seamlessly blending the background.
stylized transform
OK 2025/5/29
Image Editing image-to-image Experiment with different hairstyles, from bald to any style you can imagine, while maintaining natural lighting and realistic results.
stylized transform
OK 2025/5/29
Image Editing image-to-image Enhance facial features with professional retouching while maintaining a natural, realistic look
stylized transform
OK 2025/5/29
Image Editing image-to-image Change facial expressions in photos to any emotion you desire, from smiles to serious looks.
stylized transform
OK 2025/5/29
Image Editing image-to-image Perfect your photos with professional color grading, balanced tones, and vibrant yet natural colors
stylized transform
OK 2025/5/29
Image Editing image-to-image Transform your photos into vibrant cool cartoons with bold outlines and rich colors.
stylized transform
OK 2025/5/29
Image Editing image-to-image Replace your photo's background with any scene you desire, from beach sunsets to urban landscapes, with perfect lighting and shadows
stylized transform
OK 2025/5/29
Image Editing image-to-image See how you or others might look at different ages, from younger to older, while preserving core facial features.
stylized transform
OK 2025/5/29
FLUX.1 Kontext [max] image-to-image Experimental version of FLUX.1 Kontext [max] with multi image handling capabilities OK 2025/5/29
FLUX.1 Kontext [pro] image-to-image Experimental version of FLUX.1 Kontext [pro] with multi image handling capabilities OK 2025/5/29
Hunyuan Avatar image-to-video HunyuanAvatar is a High-Fidelity Audio-Driven Human Animation model for Multiple Characters.
stylized transform
OK 2025/5/29
FLUX.1 Kontext [max] image-to-image FLUX.1 Kontext [max] is a model in which greatly improved prompt adherence and typography generation meet premium consistency for editing, without compromising on speed. OK 2025/5/29
FLUX.1 Kontext [max] text-to-image FLUX.1 Kontext [max] text-to-image is a new premium model that brings maximum performance across all aspects, including greatly improved prompt adherence. OK 2025/5/29
Kling 2.1 Master text-to-video Kling 2.1 Master: The premium endpoint for Kling 2.1, designed for top-tier text-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision. OK 2025/5/29
Kling 2.1 Master image-to-video Kling 2.1 Master: The premium endpoint for Kling 2.1, designed for top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.
_marquee-video-model
OK 2025/5/29
Kling 2.1 (pro) image-to-video Kling 2.1 Pro is an advanced endpoint for the Kling 2.1 model, offering professional-grade videos with enhanced visual fidelity, precise camera movements, and dynamic motion control, perfect for cinematic storytelling. OK 2025/5/28
Kling 2.1 (standard) image-to-video Kling 2.1 Standard is a cost-efficient endpoint for the Kling 2.1 model, delivering high-quality image-to-video generation OK 2025/5/28
FLUX.1 Kontext [pro] text-to-image The FLUX.1 Kontext [pro] text-to-image delivers state-of-the-art image generation results with unprecedented prompt following, photorealistic rendering, and flawless typography. OK 2025/5/28
FLUX.1 Kontext [dev] image-to-image Frontier image editing model. OK 2025/5/28
FLUX.1 Kontext [pro] image-to-image FLUX.1 Kontext [pro] handles both text and reference images as inputs, seamlessly enabling targeted, local edits and complex transformations of entire scenes. OK 2025/5/28
Lipsync video-to-video Generate realistic lipsync from any audio using VEED's latest model
lipsync video-to-video avatar
OK 2025/5/28
Avatars text-to-video Generate high-quality videos with UGC-like avatars from text
lipsync text-to-video
OK 2025/5/28
Avatars audio-to-video Generate high-quality videos with UGC-like avatars from audio
lipsync audio-to-video
OK 2025/5/28
Hunyuan Portrait image-to-video HunyuanPortrait is a diffusion-based framework for generating lifelike, temporally consistent portrait animations.
animation lip sync
OK 2025/5/27
Wan VACE 14B video-to-video VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.
image-to-video video-to-video text-to-video
OK 2025/5/27
Bagel image-to-json Bagel is a 7B parameter multimodal model from Bytedance-Seed that can generate both text and images.
image-to-text vlm
OK 2025/5/21
Bagel image-to-image Bagel is a 7B parameter multimodal model from Bytedance-Seed that can generate both images and text.
image-to-image image-editing
OK 2025/5/21
Bagel text-to-image Bagel is a 7B parameter multimodal model from Bytedance-Seed that can generate both text and images.
text-to-image multimodal
OK 2025/5/21
Lyria2 text-to-audio Lyria 2 is Google's latest music generation model; you can generate any type of music with it.
music stylized
OK 2025/5/20
Imagen 4 Ultra text-to-image Google’s highest quality image generation model OK 2025/5/20
Imagen 4 text-to-image Google’s highest quality image generation model OK 2025/5/20
Kling 1.6 Elements image-to-video Generate video clips from your multiple image references using Kling 1.6 (standard) OK 2025/5/20
Kling 1.6 Elements image-to-video Generate video clips from your multiple image references using Kling 1.6 (pro) OK 2025/5/20
DreamO text-to-image DreamO is an image customization framework designed to support a wide range of tasks while facilitating seamless integration of multiple conditions.
stylized realism
OK 2025/5/19
LTX Video-0.9.7 13B Distilled video-to-video Extend videos using LTX Video-0.9.7 13B Distilled and custom LoRA
video ltx-video video-to-video extend-video
OK 2025/5/17
LTX Video-0.9.7 13B Distilled video-to-video Generate videos from prompts, images, and videos using LTX Video-0.9.7 13B Distilled and custom LoRA
video ltx-video video-to-video multicondition-to-video image-to-video
OK 2025/5/17
LTX Video-0.9.7 13B Distilled image-to-video Generate videos from prompts and images using LTX Video-0.9.7 13B Distilled and custom LoRA
video ltx-video image-to-video
OK 2025/5/17
LTX Video-0.9.7 13B video-to-video Generate videos from prompts, images, and videos using LTX Video-0.9.7 13B and custom LoRA
video ltx-video video-to-video multicondition-to-video image-to-video
OK 2025/5/17
LTX Video-0.9.7 13B video-to-video Extend videos using LTX Video-0.9.7 13B and custom LoRA
video ltx-video video-to-video extend-video
OK 2025/5/17
LTX Video-0.9.7 13B image-to-video Generate videos from prompts and images using LTX Video-0.9.7 13B and custom LoRA
video ltx-video image-to-video
OK 2025/5/17
LTX Video-0.9.7 13B text-to-video Generate videos from prompts using LTX Video-0.9.7 13B and custom LoRA
video ltx-video text-to-video
OK 2025/5/17
LTX Video-0.9.7 13B Distilled text-to-video Generate videos from prompts using LTX Video-0.9.7 13B Distilled and custom LoRA
video ltx-video text-to-video
OK 2025/5/17
Flux Lora text-to-image Super fast endpoint for the FLUX.1 [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.
lora personalization
OK 2025/5/15
LTX Video-0.9.7 LoRA video-to-video Generate videos from prompts, images, and videos using LTX Video-0.9.7 and custom LoRA
video ltx-video video-to-video multicondition-to-video image-to-video
OK 2025/5/15
LTX Video-0.9.7 LoRA image-to-video Generate videos from prompts and images using LTX Video-0.9.7 and custom LoRA
video ltx-video image-to-video
OK 2025/5/15
LTX Video-0.9.7 LoRA text-to-video Deprecated. Use fal-ai/ltx-video-13b-dev or fal-ai/ltx-video-13b-distilled instead.
video ltx-video text-to-video
Deprecated 2025/5/15
Pixverse image-to-video Create seamless transitions between images using PixVerse v4.5
stylized transform
OK 2025/5/15
Pixverse image-to-video Generate fast high quality video clips from text and image prompts using PixVerse v4.5
stylized transform
OK 2025/5/15
Pixverse image-to-video Generate high quality video clips from text and image prompts using PixVerse v4.5
stylized transform
OK 2025/5/15
Pixverse text-to-video Generate high quality and fast video clips from text and image prompts using PixVerse v4.5 fast
stylized transform
OK 2025/5/15
Pixverse text-to-video Generate high quality video clips from text and image prompts using PixVerse v4.5
stylized transform
OK 2025/5/15
Pixverse image-to-video Generate high quality video clips with different effects using PixVerse v4.5
image-to-video
OK 2025/5/15
Hunyuan Custom image-to-video HunyuanCustom revolutionizes video generation with unmatched identity consistency across multiple input types. Its innovative fusion modules and alignment networks outperform competitors, maintaining subject integrity while responding flexibly to text, image, audio, and video conditions.
image-to-video
OK 2025/5/14
Framepack F1 image-to-video Framepack is an efficient Image-to-video model that autoregressively generates videos.
image to video motion
OK 2025/5/13
ACE-Step audio-to-audio Extend the beginning or end of provided audio with lyrics and/or style using ACE-Step
audio-to-audio audio-outpaint audio-extend
OK 2025/5/11
ACE-Step audio-to-audio Modify a portion of provided audio with lyrics and/or style using ACE-Step
audio-to-audio audio-inpaint audio-repaint
OK 2025/5/11
ACE-Step audio-to-audio Generate music from lyrics and example audio using ACE-Step
audio-to-audio audio-edit
OK 2025/5/11
ACE-Step text-to-audio Generate music from a simple prompt using ACE-Step
text-to-audio text-to-music
OK 2025/5/11
Rembg Enhance (Remove Background Enhance) image-to-image Rembg-enhance is optimized for 2D vector images, 3D graphics, and photos by leveraging matting technology.
background removal image editing utility segmentation high resolution rembg
OK 2025/5/9
Vidu Start End to Video image-to-video Vidu Q1 Start-End to Video generates smooth transition 1080p videos between specified start and end images.
stylized transform
OK 2025/5/9
Vidu Text to Video text-to-video Vidu Q1 Text to Video generates high-quality 1080p videos with exceptional visual quality and motion diversity
stylized transform
OK 2025/5/9
Vidu Image to Video image-to-video Vidu Q1 Image to Video generates high-quality 1080p videos with exceptional visual quality and motion diversity from a single image
stylized transform
OK 2025/5/9
ACE-Step text-to-audio Generate music with lyrics from text using ACE-Step
text-to-audio text-to-music
OK 2025/5/8
LTX Video Trainer training Train LTX Video 0.9.7 for custom styles and effects.
ltx-video fine-tuning
OK 2025/5/8
Recraft Creative Upscale image-to-image Enhances a given raster image using the 'creative upscale' tool, increasing image resolution, making the image sharper and cleaner.
upscaling
OK 2025/5/7
Recraft Crisp Upscale image-to-image Enhances a given raster image using 'crisp upscale' tool, boosting resolution with a focus on refining small details and faces.
upscaling
OK 2025/5/7
Recraft V3 Create Style training Recraft V3 Create Style is capable of creating unique styles for Recraft V3 based on your images.
style vector personalization
OK 2025/5/7
Recraft V3 image-to-image Recraft V3 is a text-to-image model with the ability to generate long texts, vector art, images in brand style, and much more. As of today, it is SOTA in image generation, proven by Hugging Face's industry-leading Text-to-Image Benchmark by Artificial Analysis.
vector typography style
OK 2025/5/7
Recraft V3 text-to-image Recraft V3 is a text-to-image model with the ability to generate long texts, vector art, images in brand style, and much more. As of today, it is SOTA in image generation, proven by Hugging Face's industry-leading Text-to-Image Benchmark by Artificial Analysis.
vector typography style
OK 2025/5/7
Ltx Video V097 video-to-video Deprecated. Use fal-ai/ltx-video-13b-dev or fal-ai/ltx-video-13b-distilled instead. Deprecated 2025/5/6
MiniMax Voice Cloning text-to-speech Clone a voice from a sample audio and generate speech from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality text-to-speech.
speech
OK 2025/5/6
MiniMax Speech-02 Turbo text-to-speech Generate fast speech from text prompts and different voices using the MiniMax Speech-02 Turbo model, which leverages advanced AI techniques to create high-quality text-to-speech.
speech
OK 2025/5/6
LTX Video-0.9.7 video-to-video Deprecated. Use fal-ai/ltx-video-13b-dev or fal-ai/ltx-video-13b-distilled instead.
video image-to-video text-to-video
Deprecated 2025/5/6
MiniMax Speech-02 HD text-to-speech Generate speech from text prompts and different voices using the MiniMax Speech-02 HD model, which leverages advanced AI techniques to create high-quality text-to-speech.
speech
OK 2025/5/6
LTX Video-0.9.7 text-to-video Deprecated. Use fal-ai/ltx-video-13b-dev or fal-ai/ltx-video-13b-distilled instead.
video text-video
Deprecated 2025/5/6
LTX Video-0.9.7 image-to-video Deprecated. Use fal-ai/ltx-video-13b-dev or fal-ai/ltx-video-13b-distilled instead.
video image-to-video
Deprecated 2025/5/6
Minimax Image Subject Reference image-to-image Generate images from text and a reference image using MiniMax Image-01 for consistent character appearance.
stylized transform
OK 2025/5/6
MiniMax (Hailuo AI) Text to Image text-to-image Generate high quality images from text prompts using MiniMax Image-01. Longer text prompts will result in better quality images.
stylized realism
OK 2025/5/6
Hidream I1 Full image-to-image HiDream-I1 full is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within seconds.
image-to-image hidream
OK 2025/5/5
Pony V7 text-to-image Pony V7 is a fine-tuned text-to-image model for superior aesthetics and prompt following.
diffusion style
OK 2025/5/5
Trellis image-to-3d Generate 3D models from multiple images using Trellis. A native 3D generative model enabling versatile and high-quality 3D asset creation.
stylized
OK 2025/5/2
Ideogram image-to-image Extend existing images with Ideogram V3's reframe feature. Create expanded versions and adaptations while preserving main image and adding new creative directions through prompt guidance.
realism typography
OK 2025/5/1
Ideogram Text to Image text-to-image Generate high-quality images, posters, and logos with Ideogram V3. Features exceptional typography handling and realistic outputs optimized for commercial and creative use.
realism typography
OK 2025/5/1
Ideogram Replace Background image-to-image Replace backgrounds in existing images with Ideogram V3's replace background feature. Create variations and adaptations while preserving core elements and adding new creative directions through prompt guidance. OK 2025/5/1
Ideogram image-to-image Reimagine existing images with Ideogram V3's remix feature. Create variations and adaptations while preserving core elements and adding new creative directions through prompt guidance.
realism typography
OK 2025/5/1
Ideogram V3 Edit image-to-image Transform existing images with Ideogram V3's editing capabilities. Modify, adjust, and refine images while maintaining high fidelity and realistic outputs with precise prompt control.
realism typography
OK 2025/5/1
Hidream E1 Full image-to-image Edit images with natural language Deprecated 2025/4/29
F Lite text-to-image F Lite is a 10B parameter diffusion model created by Fal and Freepik, trained exclusively on copyright-safe and SFW content. OK 2025/4/28
F Lite (texture mode) text-to-image F Lite is a 10B parameter diffusion model created by Fal and Freepik, trained exclusively on copyright-safe and SFW content. This is a high texture density variant of the model. OK 2025/4/28
Moondream2 vision Moondream2 is a highly efficient open-source vision language model that combines powerful image understanding capabilities with a remarkably small footprint.
Vision
OK 2025/4/26
Moondream2 vision Moondream2 is a highly efficient open-source vision language model that combines powerful image understanding capabilities with a remarkably small footprint.
Vision
OK 2025/4/26
Moondream2 vision Moondream2 is a highly efficient open-source vision language model that combines powerful image understanding capabilities with a remarkably small footprint.
image-to-image
OK 2025/4/26
Moondream2 vision Moondream2 is a highly efficient open-source vision language model that combines powerful image understanding capabilities with a remarkably small footprint.
image-to-image
OK 2025/4/26
Step1X Edit image-to-image Step1X-Edit transforms your photos with simple instructions into stunning, professional-quality edits—rivaling top proprietary tools.
editing
OK 2025/4/25
Tripo3D image-to-3d State-of-the-art image-to-3D object generation. Generate a 3D model from a single image!
image-to-3d stylized
OK 2025/4/25
Image2svg image-to-image Image2SVG transforms raster images into clean vector graphics, preserving visual quality while enabling scalable, customizable SVG outputs with precise control over detail levels.
utility editing
OK 2025/4/25
Uno image-to-image An AI model that transforms input images into new ones based on text prompts, blending reference visuals with your creative directions.
image-to-image
OK 2025/4/24
MAGI-1 video-to-video MAGI-1 extends videos with an exceptional understanding of physical interactions and prompts
video-to-video
OK 2025/4/23
MAGI-1 text-to-video MAGI-1 is a video generation model with exceptional understanding of physical interactions and cinematic prompts
text-to-video
OK 2025/4/23
MAGI-1 image-to-video MAGI-1 generates videos from images with exceptional understanding of physical interactions and prompting
image-to-video
OK 2025/4/23
gpt-image-1 text-to-image OpenAI's latest image generation and editing model: gpt-image-1. OK 2025/4/23
gpt-image-1 image-to-image OpenAI's latest image generation and editing model: gpt-image-1. OK 2025/4/23
Pixverse image-to-video Generate high quality video clips with different effects using PixVerse v4
image-to-video
OK 2025/4/23
MAGI-1 (Distilled) video-to-video MAGI-1 distilled extends videos faster with an exceptional understanding of physical interactions and prompts
video-to-video video-extend
OK 2025/4/23
MAGI-1 (Distilled) image-to-video MAGI-1 distilled generates videos faster from images with exceptional understanding of physical interactions and prompting
image-to-video
OK 2025/4/23
Dia Tts audio-to-audio Clone dialog voices from sample audio and generate dialogs from text prompts using Dia TTS, which leverages advanced AI techniques to create high-quality text-to-speech.
speech
OK 2025/4/22
Framepack image-to-video Framepack is an efficient Image-to-video model that autoregressively generates videos.
image to video motion
OK 2025/4/22
Dia text-to-speech Dia directly generates realistic dialogue from transcripts. Audio conditioning enables emotion control. Produces natural nonverbals like laughter and throat clearing.
text-to-speech
OK 2025/4/22
MAGI-1 (Distilled) text-to-video MAGI-1 distilled is a faster video generation model with exceptional understanding of physical interactions and cinematic prompts
text-to-video
OK 2025/4/22
Pipecat's Smart Turn model speech-to-text An open source, community-driven and native audio turn detection model by Pipecat AI. OK 2025/4/21
Juggernaut Flux Lora image-to-image Juggernaut Base Flux LoRA Inpainting by RunDiffusion is a drop-in replacement for Flux [Dev] inpainting that delivers sharper details, richer colors, and enhanced realism to all your LoRAs and LyCORIS with full compatibility. OK 2025/4/21
FASHN Virtual Try-On V1.5 image-to-image FASHN v1.5 delivers precise virtual try-on capabilities, accurately rendering garment details like text and patterns at 576x864 resolution from both on-model and flat-lay photo references.
try-on fashion clothing
OK 2025/4/21
Plushify image-to-image Turn any image into a cute plushie! OK 2025/4/20
Instant Character image-to-image InstantCharacter creates high-quality, consistent characters from text prompts, supporting diverse poses, styles, and appearances with strong identity control.
personalization customization
OK 2025/4/18
Wan-2.1 First-Last-Frame-to-Video image-to-video Wan-2.1 flf2v generates dynamic videos by intelligently bridging a given first frame to a desired end frame through smooth, coherent motion sequences.
image to video motion
OK 2025/4/17
Turbo Flux Trainer training A blazing fast FLUX dev LoRA trainer for subjects and styles. OK 2025/4/17
Framepack image-to-video Framepack is an efficient Image-to-video model that autoregressively generates videos.
image to video motion
OK 2025/4/17
Kling 2.0 Master image-to-video Generate video clips from your images using Kling 2.0 Master OK 2025/4/14
Kling 2.0 Master text-to-video Generate video clips from your prompts using Kling 2.0 Master OK 2025/4/14
Cartoonify image-to-image Transform images into 3D cartoon artwork using an AI model that applies cartoon stylization while preserving the original image's composition and details.
stylized transform
OK 2025/4/14
Vace video-to-video VACE is a video generation model that uses a source image, mask, and video to create prompted videos with controllable sources.
video-to-video image-to-video text-to-video
OK 2025/4/11
Hidream I1 Full text-to-image HiDream-I1 full is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within seconds. OK 2025/4/11
Hidream I1 Dev text-to-image HiDream-I1 dev is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within seconds. OK 2025/4/11
Hidream I1 Fast text-to-image HiDream-I1 fast is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within 16 steps.
OK 2025/4/11
finegrain eraser image-to-image Finegrain Eraser removes any object selected with a mask—along with its shadows, reflections, and lighting artifacts—seamlessly reconstructing the scene with contextually accurate content.
utility editing
OK 2025/4/10
finegrain eraser image-to-image Finegrain Eraser removes any object selected with a bounding box—along with its shadows, reflections, and lighting artifacts—seamlessly reconstructing the scene with contextually accurate content.
utility editing
OK 2025/4/9
finegrain eraser image-to-image Finegrain Eraser removes objects—along with their shadows, reflections, and lighting artifacts—using only natural language, seamlessly filling the scene with contextually accurate content.
utility editing
OK 2025/4/9
Video Sound Effects Generator video-to-video Add sound effects to your videos
sound-effects sfx cassetteai
OK 2025/4/7
Speech-to-Text speech-to-text Leverage the rapid processing capabilities of AI models to enable accurate and efficient real-time speech-to-text transcription. OK 2025/4/4
Speech-to-Text speech-to-text Leverage the rapid processing capabilities of AI models to enable accurate and efficient real-time speech-to-text transcription.
streaming
OK 2025/4/4
Speech-to-Text speech-to-text Leverage the rapid processing capabilities of AI models to enable accurate and efficient real-time speech-to-text transcription.
streaming
OK 2025/4/4
Speech-to-Text speech-to-text Leverage the rapid processing capabilities of AI models to enable accurate and efficient real-time speech-to-text transcription.
OK 2025/4/4
Sound Effects Generator text-to-audio Create stunningly realistic sound effects in seconds - CassetteAI's Sound Effects Model generates high-quality SFX up to 30 seconds long in just 1 second of processing time
sound sfx sound-effects cassetteai
OK 2025/4/3
Sync Lipsync 2.0 video-to-video Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization with Sync Lipsync 2.0 model
animation lip sync
OK 2025/4/1
FLUX.1 [dev] text-to-image FLUX.1 [dev] is a 12 billion parameter flow transformer that generates high-quality images from text. It is suitable for personal and commercial use. OK 2025/4/1
StarVector image-to-image AI vectorization model that transforms raster images into scalable SVG graphics, preserving visual details while enabling infinite scaling and easy editing capabilities.
image-to-image
OK 2025/4/1
PixVerse v4: Image to Video Fast image-to-video Generate fast high quality video clips from text and image prompts using PixVerse v4 OK 2025/4/1
PixVerse v4: Image to Video image-to-video Generate high quality video clips from text and image prompts using PixVerse v4 OK 2025/4/1
PixVerse v3.5: Effects image-to-video Generate high quality video clips with different effects using PixVerse v3.5 OK 2025/4/1
PixVerse v4: Text to Video text-to-video Generate high quality video clips from text and image prompts using PixVerse v4 OK 2025/4/1
PixVerse v3.5: Transition image-to-video Create seamless transitions between images using PixVerse v3.5 OK 2025/4/1
PixVerse v4: Text to Video Fast text-to-video Generate high quality and fast video clips from text and image prompts using PixVerse v4 fast OK 2025/4/1
Ghiblify Images image-to-image Reimagine and transform your ordinary photos into enchanting Studio Ghibli style artwork
stylized transform
OK 2025/3/31
Orpheus TTS text-to-speech Orpheus TTS is a state-of-the-art, Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been finetuned to deliver human-level speech synthesis, achieving exceptional clarity, expressiveness, and real-time performances.
text to speech voice synthesis high-fidelity
OK 2025/3/31
Sana v1.5 1.6B text-to-image Sana v1.5 1.6B is a lightweight text-to-image model that delivers 4K image generation with impressive efficiency.
text to image 4k lightweight
OK 2025/3/31
Sana v1.5 4.8B text-to-image Sana v1.5 4.8B is a powerful text-to-image model that generates ultra-high quality 4K images with remarkable detail.
text to image 4k high-quality
OK 2025/3/31
Sana Sprint text-to-image Sana Sprint is a text-to-image model capable of generating 4K images with exceptional speed.
text to image 4k high-speed
OK 2025/3/31
music generator text-to-audio CassetteAI’s model generates a 30-second sample in under 2 seconds and a full 3-minute track in under 10 seconds. At 44.1 kHz stereo audio, expect a level of professional consistency with no breaks, no squeaks, and no random interruptions in your creations.
music cassetteai
OK 2025/3/27
Kling LipSync Text-to-Video text-to-video Kling LipSync is a text-to-video model that generates realistic lip movements from text input.
text to video lipsync
OK 2025/3/27
Kling LipSync Audio-to-Video text-to-video Kling LipSync is an audio-to-video model that generates realistic lip movements from audio input.
audio to video lipsync
OK 2025/3/27
LatentSync video-to-video LatentSync is a video-to-video model that generates lip sync animations from audio using advanced algorithms for high-quality synchronization.
animation lip sync
OK 2025/3/25
Wan-2.1 Text-to-Video with LoRAs text-to-video Add custom LoRAs to Wan-2.1, a text-to-video model that generates high-quality videos with high visual quality and motion diversity from text prompts
text to video motion lora
OK 2025/3/25
Wan-2.1 LoRA Trainer training Train custom LoRAs for Wan-2.1 I2V 480P
lora training
OK 2025/3/24
Thera image-to-image Fix low-resolution images with the speed and quality of Thera. OK 2025/3/24
MixDehazer image-to-image An advanced dehaze model to remove atmospheric haze, restoring clarity and detail in images through intelligent neural network processing. OK 2025/3/24
Hunyuan3D image-to-3d Generate 3D models from your images using Hunyuan 3D. A native 3D generative model enabling versatile and high-quality 3D asset creation.
stylized
OK 2025/3/20
Gemini Flash Edit Multi Image image-to-image Gemini Flash Edit is a model that can edit a single image using a text prompt and a reference image.
editing
OK 2025/3/20
Hunyuan3D image-to-3d Generate 3D models from your images using Hunyuan 3D. A native 3D generative model enabling versatile and high-quality 3D asset creation.
stylized
OK 2025/3/20
Hunyuan3D image-to-3d Generate 3D models from your images using Hunyuan 3D. A native 3D generative model enabling versatile and high-quality 3D asset creation.
stylized
OK 2025/3/20
Hunyuan3D image-to-3d Generate 3D models from your images using Hunyuan 3D. A native 3D generative model enabling versatile and high-quality 3D asset creation.
stylized
OK 2025/3/20
Hunyuan3D image-to-3d Generate 3D models from your images using Hunyuan 3D. A native 3D generative model enabling versatile and high-quality 3D asset creation.
stylized
OK 2025/3/20
Hunyuan3D image-to-3d Generate 3D models from your images using Hunyuan 3D. A native 3D generative model enabling versatile and high-quality 3D asset creation.
stylized
OK 2025/3/20
Gemini Flash Edit Multi Image image-to-image Gemini Flash Edit Multi Image is a model that can edit multiple images using a text prompt and a reference image.
editing
OK 2025/3/20
Luma Ray 2 Flash (Image to Video) image-to-video Ray2 Flash is a fast video generative model capable of creating realistic visuals with natural, coherent motion.
motion transformation
OK 2025/3/17
Luma Ray 2 Flash text-to-video Ray2 Flash is a fast video generative model capable of creating realistic visuals with natural, coherent motion.
motion transformation
OK 2025/3/17
Pika Effects (v1.5) image-to-video Pika Effects are AI-powered video effects designed to modify objects, characters, and environments in a fun, engaging, and visually compelling manner.
editing effects animation
OK 2025/3/14
Pika Image to Video Turbo (v2) image-to-video Turbo is the model to use when you feel the need for speed. Turn your image to stunning video up to 3x faster – all with high quality outputs.
editing effects animation
OK 2025/3/14
Pika Text to Video (v2.2) text-to-video Start with a simple text input to create dynamic generations that defy expectations in up to 1080p. Experience better image clarity and crisper, sharper visuals.
editing effects animation
OK 2025/3/14
Invisible Watermark image-to-image Invisible Watermark is a model that can add an invisible watermark to an image.
utility editing
OK 2025/3/14
Pika Text to Video (v2.1) text-to-video Start with a simple text input to create dynamic generations that defy expectations. Anything you dream can come to life with sharp details, impressive character control and cinematic camera moves.
editing effects animation
OK 2025/3/14
Pika Text to Video Turbo (v2) text-to-video Pika v2 Turbo creates videos from a text prompt with high quality output.
editing effects animation
OK 2025/3/14
Pika Image to Video (v2.2) image-to-video Turn photos into mind-blowing, dynamic videos in up to 1080p. Experience better image clarity and crisper, sharper visuals.
editing effects animation
OK 2025/3/14
Pika Scenes (v2.2) image-to-video Pika Scenes v2.2 creates videos from images with high-quality output.
editing effects animation
OK 2025/3/14
Pika Image to Video (v2.1) image-to-video Turn photos into mind-blowing, dynamic videos. Your images can come to life with sharp details, impressive character control and cinematic camera moves.
editing effects animation
OK 2025/3/14
Pikadditions (v2) video-to-video Pikadditions is a powerful video-to-video AI model that allows you to add anyone or anything to any video with seamless integration.
editing effects animation
OK 2025/3/14
CSM-1B text-to-audio CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs.
conversational text to speech
OK 2025/3/13
Wan Effects image-to-video Wan Effects generates high-quality videos with popular effects from images
motion effects
OK 2025/3/13
Vidu Image to Video image-to-video Vidu Image to Video generates high-quality videos with exceptional visual quality and motion diversity from a single image
motion image to video
OK 2025/3/12
Vidu Start-End to Video image-to-video Vidu Start-End to Video generates smooth transition videos between specified start and end images.
motion transition
OK 2025/3/12
Vidu Reference to Video image-to-video Vidu Reference to Video creates videos by combining reference images with a prompt.
motion reference
OK 2025/3/12
Vidu Template to Video image-to-video Vidu Template to Video lets you create different effects by applying motion templates to your images.
motion template
OK 2025/3/12
Wan-2.1 Pro Image-to-Video image-to-video Wan-2.1 Pro is a premium image-to-video model that generates high-quality 1080p videos at 30fps with up to 6 seconds duration, delivering exceptional visual quality and motion diversity from images
image to video motion
OK 2025/3/11
Wan-2.1 Pro Text-to-Video text-to-video Wan-2.1 Pro is a premium text-to-video model that generates high-quality 1080p videos at 30fps with up to 6 seconds duration, delivering exceptional visual quality and motion diversity from text prompts
text to video motion
OK 2025/3/11
Veo 2 (Image to Video) image-to-video Veo 2 creates videos from images with realistic motion and very high quality output.
motion transformation
OK 2025/3/11
Wan-2.1 Image-to-Video with LoRAs image-to-video Add custom LoRAs to Wan-2.1, an image-to-video model that generates high-quality videos with high visual quality and motion diversity from images
image to video motion lora
OK 2025/3/8
Kling 1.5 text-to-video Generate video clips from your prompts using Kling 1.5 (pro) OK 2025/3/6
Kling 1.0 text-to-video Generate video clips from your prompts using Kling 1.0
motion
OK 2025/3/6
Kling 1.6 text-to-video Generate video clips from your prompts using Kling 1.6 (pro) OK 2025/3/6
Kling 1.6 text-to-video Generate video clips from your prompts using Kling 1.6 (std) OK 2025/3/6
Hunyuan Video Image-to-Video Inference image-to-video Image to Video for the high-quality Hunyuan Video I2V model.
motion
OK 2025/3/6
Juggernaut Flux Lightning text-to-image Juggernaut Lightning Flux by RunDiffusion provides blazing-fast, high-quality images rendered at five times the speed of Flux. Perfect for mood boards and mass ideation, this model excels in both realism and prompt adherence.
image generation
OK 2025/3/5
Juggernaut Flux Pro image-to-image Juggernaut Pro Flux by RunDiffusion is the flagship Juggernaut model rivaling some of the most advanced image models available, often surpassing them in realism. It combines Juggernaut Base with RunDiffusion Photo and features enhancements like reduced background blurriness.
image generation
OK 2025/3/5
Rundiffusion Photo Flux text-to-image RunDiffusion Photo Flux provides insane realism. With this enhancer, textures and skin details burst to life, turning your favorite prompts into vivid, lifelike creations. Recommended to keep it at 0.65 to 0.80 weight. Supports resolutions up to 1536x1536.
image generation lora
OK 2025/3/5
Juggernaut Flux Base LoRA text-to-image Juggernaut Base Flux LoRA by RunDiffusion is a drop-in replacement for Flux [Dev] that delivers sharper details, richer colors, and enhanced realism to all your LoRAs and LyCORIS with full compatibility.
image generation
OK 2025/3/5
Juggernaut Flux Base image-to-image Juggernaut Base Flux by RunDiffusion is a drop-in replacement for Flux [Dev] that delivers sharper details, richer colors, and enhanced realism, while instantly boosting LoRAs and LyCORIS with full compatibility.
image generation
OK 2025/3/5
LTX Video-0.9.5 video-to-video Generate videos from prompts and videos using LTX Video-0.9.5
video video-to-video
OK 2025/3/5
LTX Video-0.9.5 text-to-video Generate videos from prompts using LTX Video-0.9.5
video text-video
OK 2025/3/5
LTX Video-0.9.5 video-to-video Generate videos from prompts, images, and videos using LTX Video-0.9.5
video image-to-video text-to-video
OK 2025/3/5
Juggernaut Flux Pro text-to-image Juggernaut Pro Flux by RunDiffusion is the flagship Juggernaut model rivaling some of the most advanced image models available, often surpassing them in realism. It combines Juggernaut Base with RunDiffusion Photo and features enhancements like reduced background blurriness.
image generation
OK 2025/3/5
Juggernaut Flux Base text-to-image Juggernaut Base Flux by RunDiffusion is a drop-in replacement for Flux [Dev] that delivers sharper details, richer colors, and enhanced realism, while instantly boosting LoRAs and LyCORIS with full compatibility.
image generation
OK 2025/3/5
LTX Video-0.9.5 image-to-video Generate videos from prompts and images using LTX Video-0.9.5
video image-to-video
Deprecated 2025/3/5
CogView text-to-image Generate high quality images from text prompts using CogView4. Longer text prompts will result in better quality images.
stylized
OK 2025/3/4
Topaz Video Upscale video-to-video Professional-grade video upscaling using Topaz technology. Enhance your videos with high-quality upscaling.
upscaling high-res
OK 2025/3/4
DiffRhythm: Lyrics to Song text-to-audio DiffRhythm is a blazing fast model for transforming lyrics into full songs. It boasts the capability to generate full songs in less than 30 seconds.
music
OK 2025/3/4
DocRes-dewarp image-to-image Enhance warped, folded documents with the superior quality of DocRes for sharper, clearer results.
image-enhancement
OK 2025/3/3
DocRes image-to-image Enhance low-resolution, blurred, shadowed documents with the superior quality of DocRes for sharper, clearer results.
image-enhancement
OK 2025/3/3
SWIN2SR image-to-image Enhance low-resolution images with the superior quality of Swin2SR for sharper, clearer results.
image-enhancement
OK 2025/2/28
Ideogram V2A Remix image-to-image Create variations of existing images with Ideogram V2A Remix while maintaining creative control through prompt guidance.
realism typography
OK 2025/2/27
Kling 1.6 text-to-video Generate video clips from your prompts using Kling 1.6 (pro) OK 2025/2/27
Ideogram V2A Turbo Remix image-to-image Rapidly create image variations with Ideogram V2A Turbo Remix. Fast and efficient reimagining of existing images while maintaining creative control through prompt guidance.
realism typography
OK 2025/2/27
ElevenLabs TTS Multilingual v2 text-to-audio Generate multilingual text-to-speech audio using ElevenLabs TTS Multilingual v2.
audio
OK 2025/2/27
Wan-2.1 1.3B Text-to-Video text-to-video Wan-2.1 1.3B is a text-to-video model that generates high-quality videos with high visual quality and motion diversity from text prompts at faster speeds.
text to video motion
OK 2025/2/27
Ideogram V2A Turbo text-to-image Accelerated image generation with Ideogram V2A Turbo. Create high-quality visuals, posters, and logos with enhanced speed while maintaining Ideogram's signature quality.
realism typography
OK 2025/2/27
ElevenLabs Speech to Text speech-to-text Generate text from speech using ElevenLabs advanced speech-to-text model.
speech
OK 2025/2/27
ElevenLabs Audio Isolation audio-to-audio Isolate audio tracks using ElevenLabs advanced audio isolation technology.
audio
OK 2025/2/27
ElevenLabs TTS Turbo v2.5 text-to-speech Generate high-speed text-to-speech audio using ElevenLabs TTS Turbo v2.5.
audio
OK 2025/2/27
Ideogram V2A text-to-image Generate high-quality images, posters, and logos with Ideogram V2A. Features exceptional typography handling and realistic outputs optimized for commercial and creative use.
realism typography
OK 2025/2/27
EVF-SAM2 Segmentation image-to-image EVF-SAM2 combines natural language understanding with advanced segmentation capabilities, allowing you to precisely mask image regions using intuitive positive and negative text prompts.
segmentation mask
OK 2025/2/26
DDColor image-to-image Bring colors into old or new black and white photos with DDColor.
image-recolorization faces utility
OK 2025/2/26
Wan-2.1 Text-to-Video text-to-video Wan-2.1 is a text-to-video model that generates high-quality videos with high visual quality and motion diversity from text prompts
text to video motion
OK 2025/2/25
Wan-2.1 Image-to-Video image-to-video Wan-2.1 is an image-to-video model that generates high-quality videos with high visual quality and motion diversity from images
image to video motion
OK 2025/2/25
Video Prompt Generator llm Generate video prompts using a variety of techniques including camera direction, style, pacing, special effects and more.
motion transformation chat claude gpt
OK 2025/2/25
Segment Anything Model 2 image-to-image SAM 2 is a model for segmenting images automatically. It can return individual masks or a single mask for the entire image.
segmentation mask
OK 2025/2/25
MiniMax (Hailuo AI) Video 01 Director - Image to Video image-to-video Generate video clips more accurately with respect to initial image, natural language descriptions, and using camera movement instructions for shot control.
motion transformation camera-controls
OK 2025/2/24
DRCT-Super-Resolution image-to-image Upscale your images with DRCT-Super-Resolution.
upscaling high-res
OK 2025/2/24
Veo 2 text-to-video Veo 2 creates videos with realistic motion and high quality output. Explore different styles and find your own with extensive camera controls.
motion transformation
OK 2025/2/21
NAFNet-deblur image-to-image Use NAFNet to fix issues like blurriness and noise in your images. This model specializes in image restoration and can help enhance the overall quality of your photography.
image-restoration deblur denoise
OK 2025/2/21
NAFNet-denoise image-to-image Use NAFNet to fix issues like blurriness and noise in your images. This model specializes in image restoration and can help enhance the overall quality of your photography.
image-restoration deblur denoise
OK 2025/2/21
Post Processing image-to-image Post Processing is an endpoint that can enhance images using a variety of techniques including grain, blur, sharpen, and more.
stylized utility
OK 2025/2/18
Skyreels V1 (Image-to-Video) image-to-video SkyReels V1 is the first and most advanced open-source human-centric video foundation model. By fine-tuning HunyuanVideo on O(10M) high-quality film and television clips
motion
OK 2025/2/18
Flow-Edit image-to-image The model provides you high quality image editing capabilities.
editing
OK 2025/2/14
Kokoro TTS (Mandarin Chinese) text-to-audio A highly efficient Mandarin Chinese text-to-speech model that captures natural tones and prosody.
speech
OK 2025/2/14
Kokoro TTS (Hindi) text-to-audio A fast and expressive Hindi text-to-speech model with clear pronunciation and accurate intonation.
speech
OK 2025/2/14
Kokoro TTS (Brazilian Portuguese) text-to-audio A natural and expressive Brazilian Portuguese text-to-speech model optimized for clarity and fluency.
speech
OK 2025/2/14
Kokoro TTS (Spanish) text-to-audio A natural-sounding Spanish text-to-speech model optimized for Latin American and European Spanish.
speech
OK 2025/2/14
Kokoro TTS (French) text-to-audio An expressive and natural French text-to-speech model for both European and Canadian French.
speech
OK 2025/2/14
Kokoro TTS (British English) text-to-audio A high-quality British English text-to-speech model offering natural and expressive voice synthesis.
speech
OK 2025/2/14
Kokoro TTS text-to-audio Kokoro is a lightweight text-to-speech model that delivers comparable quality to larger models while being significantly faster and more cost-efficient.
speech
OK 2025/2/14
Kokoro TTS (Japanese) text-to-audio A fast and natural-sounding Japanese text-to-speech model optimized for smooth pronunciation.
speech
OK 2025/2/14
Zonos-Audio-Clone text-to-audio Clone any person's voice and have it speak anything using Zonos' voice cloning.
voice cloning
OK 2025/2/14
Kokoro TTS (Italian) text-to-audio A high-quality Italian text-to-speech model delivering smooth and expressive speech synthesis.
speech
OK 2025/2/14
Luma Ray 2 (Image to Video) image-to-video Ray2 is a large-scale video generative model capable of creating realistic visuals with natural, coherent motion.
motion transformation
OK 2025/2/14
GOT OCR 2.0 vision GOT-OCR2 works on a wide range of tasks, including plain document OCR, scene text OCR, formatted document OCR, and even OCR for tables, charts, mathematical formulas, geometric shapes, molecular formulas and sheet music.
optical character recognition high-res utility
OK 2025/2/12
FLUX.1 [dev] Control LoRA Canny image-to-image FLUX Control LoRA Canny is a high-performance endpoint that uses a control image using a Canny edge map to transfer structure to the generated image and another initial image to guide color.
lora style transfer
OK 2025/2/11
FLUX.1 [dev] Control LoRA Depth image-to-image FLUX Control LoRA Depth is a high-performance endpoint that uses a control image using a depth map to transfer structure to the generated image and another initial image to guide color.
lora style transfer
OK 2025/2/11
ben-v2-image image-to-image A fast and high quality model for image background removal.
background removal
OK 2025/2/11
FLUX.1 [dev] Control LoRA Canny text-to-image FLUX Control LoRA Canny is a high-performance endpoint that uses a control image to transfer structure to the generated image, using a Canny edge map.
lora style transfer
OK 2025/2/11
FLUX.1 [dev] Control LoRA Depth text-to-image FLUX Control LoRA Depth is a high-performance endpoint that uses a control image to transfer structure to the generated image, using a depth map.
lora style transfer
OK 2025/2/11
MiniMax (Hailuo AI) Video 01 Director text-to-video Generate video clips more accurately with respect to natural language descriptions and using camera movement instructions for shot control.
motion transformation camera-controls
OK 2025/2/11
Ben-Video-Bg-Rm video-to-video A model for high quality and smooth background removal for videos.
segmentation background removal
OK 2025/2/11
Imagen3 text-to-image Imagen3 is a high-quality text-to-image model that generates realistic images from text prompts. OK 2025/2/10
Imagen3 Fast text-to-image Imagen3 Fast is a high-quality text-to-image model that generates realistic images from text prompts. OK 2025/2/10
Ideogram Upscale image-to-image Ideogram Upscale enhances the resolution of the reference image by up to 2X and might enhance the reference image too. Optionally refine outputs with a prompt for guided improvements.
upscaling high-res
OK 2025/2/10
Hunyuan Video Image-to-Video LoRA Inference image-to-video Image to Video for the Hunyuan Video model using a custom trained LoRA.
motion
OK 2025/2/3
CodeFormer image-to-image Fix distorted or blurred photos of people with CodeFormer.
image-restoration faces utility
OK 2025/1/31
Lumina Image 2 text-to-image Lumina-Image-2.0 is a 2 billion parameter flow-based diffusion transformer which features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.
diffusion typography style
OK 2025/1/31
Hunyuan Video (Video-to-Video) video-to-video Hunyuan Video is an open video generation model with high visual quality, motion diversity, text-video alignment, and generation stability. Use this endpoint to generate videos from videos.
video to video motion
OK 2025/1/30
Hunyuan Video LoRA Inference (Video-to-Video) video-to-video Hunyuan Video is an open video generation model with high visual quality, motion diversity, text-video alignment, and generation stability. Use this endpoint to generate videos from videos.
video to video motion lora
OK 2025/1/30
PixVerse v3.5 text-to-video Generate high quality video clips from text prompts using PixVerse v3.5 OK 2025/1/29
PixVerse v3.5: Image to Video Fast image-to-video Generate high quality video clips from text and image prompts quickly using PixVerse v3.5 Fast OK 2025/1/29
PixVerse v3.5 Fast text-to-video Generate high quality video clips quickly from text prompts using PixVerse v3.5 Fast OK 2025/1/29
PixVerse v3.5: Image to Video image-to-video Generate high quality video clips from text and image prompts using PixVerse v3.5 OK 2025/1/29
DeepSeek Janus-Pro text-to-image DeepSeek Janus-Pro is a novel text-to-image model that unifies multimodal understanding and generation through an autoregressive framework
stylized
OK 2025/1/28
YuE: Lyrics to Song text-to-audio YuE is a groundbreaking series of open-source foundation models designed for music generation, specifically for transforming lyrics into full songs.
music
OK 2025/1/28
Luma Ray 2 text-to-video Ray2 is a large-scale video generative model capable of creating realistic visuals with natural, coherent motion.
motion transformation
OK 2025/1/27
Kling Kolors Virtual TryOn v1.5 image-to-image Kling Kolors Virtual TryOn v1.5 is a high-quality image-based try-on endpoint which can be used for commercial try-on.
try-on fashion clothing
OK 2025/1/23
FFmpeg API Metadata json Get encoding metadata from video and audio files using FFmpeg API.
ffmpeg
OK 2025/1/22
FFmpeg API Waveform json Get waveform data from audio files using FFmpeg API.
ffmpeg
OK 2025/1/22
FFmpeg API Compose video-to-video Compose videos from multiple media sources using FFmpeg API.
ffmpeg
OK 2025/1/22
MiniMax (Hailuo AI) Video 01 Subject Reference image-to-video Generate video clips maintaining consistent, realistic facial features and identity across dynamic video content
subject transformation
OK 2025/1/20
MoonDreamNext Batch vision MoonDreamNext Batch is a multimodal vision-language model for batch captioning.
multimodal
OK 2025/1/17
FLUX.1 [dev] Canny with LoRAs image-to-image Utilize Flux.1 [dev] Controlnet to generate high-quality images with precise control over composition, style, and structure through advanced edge detection and guidance mechanisms.
controlnet detection lora editing composition
OK 2025/1/16
FLUX1.1 [pro] text-to-image FLUX1.1 [pro] is an enhanced version of FLUX.1 [pro] with improved image generation capabilities, delivering superior composition, detail, and artistic fidelity compared to its predecessor. OK 2025/1/16
FLUX1.1 [pro] ultra Fine-tuned text-to-image FLUX1.1 [pro] ultra fine-tuned is the newest version of FLUX1.1 [pro] with a fine-tuned LoRA, maintaining professional-grade image quality while delivering up to 2K resolution with improved photo realism.
high-res realism
OK 2025/1/16
FLUX.1 [pro] Fill Fine-tuned image-to-image FLUX.1 [pro] Fill Fine-tuned is a high-performance endpoint for the FLUX.1 [pro] model with a fine-tuned LoRA that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
editing
OK 2025/1/16
Hunyuan Video LoRA Inference text-to-video Hunyuan Video is an open video generation model with high visual quality, motion diversity, text-video alignment, and generation stability OK 2025/1/16
Train Hunyuan LoRA training Train a Hunyuan Video LoRA on people, objects, characters and more!
lora personalization
OK 2025/1/14
CogVideoX-5B text-to-video Generate videos from prompts using CogVideoX-5B OK 2025/1/14
TransPixar V1 text-to-video Transform text into stunning videos with TransPixar - an AI model that generates both RGB footage and alpha channels, enabling seamless compositing and creative video effects. OK 2025/1/14
sync.so -- lipsync 1.9.0-beta video-to-video Generate realistic lipsync animations from audio using advanced algorithms for high-quality synchronization.
animation lip sync
OK 2025/1/13
Sa2VA 8B Video vision Sa2VA is an MLLM capable of question answering, visual prompt understanding, and dense object segmentation at both image and video levels
multimodal vision
OK 2025/1/13
Sa2VA 4B Video vision Sa2VA is an MLLM capable of question answering, visual prompt understanding, and dense object segmentation at both image and video levels
multimodal vision
OK 2025/1/13
Sa2VA 4B Image vision Sa2VA is an MLLM capable of question answering, visual prompt understanding, and dense object segmentation at both image and video levels
multimodal vision
OK 2025/1/13
Sa2VA 8B Image vision Sa2VA is an MLLM capable of question answering, visual prompt understanding, and dense object segmentation at both image and video levels
multimodal vision
OK 2025/1/13
MoonDreamNext vision MoonDreamNext is a multimodal vision-language model for captioning, gaze detection, bbox detection, point detection, and more.
multimodal vision
OK 2025/1/9
MoonDreamNext Detection image-to-image MoonDreamNext Detection is a multimodal vision-language model for gaze detection, bbox detection, point detection, and more.
multimodal
OK 2025/1/9
Kling 1.6 image-to-video Generate video clips from your images using Kling 1.6 (pro) OK 2025/1/7
Kling 1.6 text-to-video Generate video clips from your prompts using Kling 1.6 (std) OK 2025/1/7
Kling 1.6 image-to-video Generate video clips from your images using Kling 1.6 (std) OK 2025/1/7
Auto-Captioner video-to-video Automatically generates text captions for your videos from the audio, styled according to the text colour and font you specify
captioning video
OK 2025/1/3
Train Flux LoRA training Train styles, people and other subjects at blazing speeds.
lora personalization
OK 2025/1/1
Switti 1024 text-to-image Switti is a scale-wise transformer for fast text-to-image generation that outperforms existing T2I AR models and competes with state-of-the-art T2I diffusion models while being faster than distilled diffusion models. OK 2024/12/31
Switti 512 text-to-image Switti is a scale-wise transformer for fast text-to-image generation that outperforms existing T2I AR models and competes with state-of-the-art T2I diffusion models while being faster than distilled diffusion models. OK 2024/12/31
MMAudio V2 Text to Audio text-to-audio MMAudio generates synchronized audio given text inputs. It can generate sounds described by a prompt.
audio fast
OK 2024/12/20
Sad Talker image-to-video Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
animation
OK 2024/12/20
Dubbing video-to-video This endpoint delivers seamlessly localized videos by generating lip-synced dubs in multiple languages, ensuring natural and immersive multilingual experiences
animation lip sync dubbing
OK 2024/12/20
Bria Expand Image image-to-image Bria Expand expands images beyond their borders in high quality. Trained exclusively on licensed data for safe and risk-free commercial use. Access the model's source code and weights: https://bria.ai/contact-us
outpainting
OK 2024/12/19
Bria Text-to-Image Fast text-to-image Bria's Text-to-Image model with perfect harmony of latency and quality. Trained exclusively on licensed data for safe and risk-free commercial use. Available also as source code and weights. For access to weights: https://bria.ai/contact-us
image generation
OK 2024/12/19
Bria Text-to-Image Base text-to-image Bria's Text-to-Image model, trained exclusively on licensed data for safe and risk-free commercial use. Available also as source code and weights. For access to weights: https://bria.ai/contact-us
image generation
OK 2024/12/19
Bria GenFill image-to-image Bria GenFill enables high-quality object addition or visual transformation. Trained exclusively on licensed data for safe and risk-free commercial use. Access the model's source code and weights: https://bria.ai/contact-us
image editing
OK 2024/12/19
Bria Background Replace image-to-image Bria Background Replace allows for efficient swapping of backgrounds in images via text prompts or reference image, delivering realistic and polished results. Trained exclusively on licensed data for safe and risk-free commercial use
image editing
OK 2024/12/19
Bria Eraser image-to-image Bria Eraser enables precise removal of unwanted objects from images while maintaining high-quality outputs. Trained exclusively on licensed data for safe and risk-free commercial use. Access the model's source code and weights: https://bria.ai/contact-us
image editing object removal
OK 2024/12/19
FLUX.1 [dev] Fill with LoRAs image-to-image FLUX.1 [dev] Fill is a high-performance endpoint for the FLUX.1 [pro] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
editing lora
OK 2024/12/19
Bria Text-to-Image HD text-to-image Bria's Text-to-Image model for HD images. Trained exclusively on licensed data for safe and risk-free commercial use. Available also as source code and weights. For access to weights: https://bria.ai/contact-us
image generation
OK 2024/12/19
Bria Product Shot image-to-image Place any product in any scenery with just a prompt or reference image while maintaining high integrity of the product. Trained exclusively on licensed data for safe and risk-free commercial use and optimized for eCommerce.
product photography
OK 2024/12/19
Bria RMBG 2.0 image-to-image Bria RMBG 2.0 enables seamless removal of backgrounds from images, ideal for professional editing tasks. Trained exclusively on licensed data for safe and risk-free commercial use. Model weights for commercial use are available here: https://share-eu1.hsforms.com/2GLpEVQqJTI2Lj7AMYwgfIwf4e04
background removal image segmentation high resolution utility rembg
OK 2024/12/19
try-on image-to-image Image based high quality Virtual Try-On
try-on fashion clothing
OK 2024/12/17
Leffa Pose Transfer image-to-image Leffa Pose Transfer is an endpoint for changing pose of an image with a reference image.
pose utility
OK 2024/12/17
FLUX1.1 [pro] ultra text-to-image FLUX1.1 [pro] ultra is the newest version of FLUX1.1 [pro], maintaining professional-grade image quality while delivering up to 2K resolution with improved photo realism.
high-res realism
OK 2024/12/17
Leffa Virtual TryOn image-to-image Leffa Virtual TryOn is a high-quality image-based try-on endpoint which can be used for commercial try-on.
try-on fashion clothing
OK 2024/12/17
MiniMax (Hailuo AI) Music text-to-audio Generate music from text prompts using the MiniMax model, which leverages advanced AI techniques to create high-quality, diverse musical compositions.
music
OK 2024/12/17
MiniMax (Hailuo AI) Video 01 Live image-to-video Generate video clips from your images using MiniMax Video model
motion transformation
OK 2024/12/16
Recraft 20b text-to-image Recraft 20b is a new and affordable text-to-image model.
image generation vector art typography style
OK 2024/12/16
Hyper3D Rodin image-to-3d Rodin by Hyper3D generates realistic and production ready 3D models from text or images.
stylized
OK 2024/12/16
MiniMax (Hailuo AI) Video 01 Live text-to-video Generate video clips from your prompts using MiniMax model
motion transformation
OK 2024/12/16
Ideogram V2 Edit image-to-image Transform existing images with Ideogram V2's editing capabilities. Modify, adjust, and refine images while maintaining high fidelity and realistic outputs with precise prompt control.
realism typography
OK 2024/12/14
Trellis image-to-3d Generate 3D models from your images using Trellis. A native 3D generative model enabling versatile and high-quality 3D asset creation.
stylized
OK 2024/12/13
MMAudio V2 video-to-video MMAudio generates synchronized audio given video and/or text inputs. It can be combined with video models to get videos with audio.
ai video fast
OK 2024/12/12
Ideogram V2 text-to-image Generate high-quality images, posters, and logos with Ideogram V2. Features exceptional typography handling and realistic outputs optimized for commercial and creative use.
realism typography
OK 2024/12/4
Ideogram V2 Turbo text-to-image Accelerated image generation with Ideogram V2 Turbo. Create high-quality visuals, posters, and logos with enhanced speed while maintaining Ideogram's signature quality.
realism typography
OK 2024/12/4
Video Upscaler video-to-video The video upscaler endpoint uses RealESRGAN on each frame of the input video to upscale the video to a higher resolution.
video generation video to video ai video high fidelity motion
OK 2024/12/4
Ideogram V2 Turbo Edit image-to-image Edit images faster with Ideogram V2 Turbo. Quick modifications and adjustments while preserving the high-quality standards and realistic outputs of Ideogram.
realism typography
OK 2024/12/4
Ideogram V2 Turbo Remix image-to-image Rapidly create image variations with Ideogram V2 Turbo Remix. Fast and efficient reimagining of existing images while maintaining creative control through prompt guidance.
realism typography
OK 2024/12/4
Ideogram V2 Remix image-to-image Reimagine existing images with Ideogram V2's remix feature. Create variations and adaptations while preserving core elements and adding new creative directions through prompt guidance.
realism typography
OK 2024/12/4
Kling 1.0 text-to-video Generate video clips from your prompts using Kling 1.0
motion
OK 2024/12/3
Luma Photon Flash text-to-image Generate images from your prompts using Luma Photon Flash. Photon Flash is the most creative, personalizable, and intelligent visual model for creatives, bringing a step-function change in the cost of high-quality image generation. OK 2024/12/3
AuraFlow text-to-image AuraFlow v0.3 is an open-source flow-based text-to-image generation model that achieves state-of-the-art results on GenEval. The model is currently in beta.
typography style
OK 2024/12/2
OmniGen v1 text-to-image OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It can be used for various tasks such as Image Editing, Personalized Image Generation, Virtual Try-On, Multi Person Generation and more!
multimodal editing try-on
OK 2024/11/29
FLUX.1 [schnell] Redux image-to-image FLUX.1 [schnell] Redux is a high-performance endpoint for the FLUX.1 [schnell] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
style transfer
OK 2024/11/27
Kling 1.5 text-to-video Generate video clips from your prompts using Kling 1.5 (pro) OK 2024/11/25
FLUX.1 [schnell] text-to-image FLUX.1 [schnell] is a 12 billion parameter flow transformer that generates high-quality images from text in 1 to 4 steps, suitable for personal and commercial use. OK 2024/11/25
FLUX1.1 [pro] Redux image-to-image FLUX1.1 [pro] Redux is a high-performance endpoint for the FLUX1.1 [pro] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
style transfer
OK 2024/11/21
FLUX.1 [dev] Redux image-to-image FLUX.1 [dev] Redux is a high-performance endpoint for the FLUX.1 [dev] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
OK 2024/11/21
FLUX.1 [dev] Depth with LoRAs image-to-image Generate high-quality images from depth maps using Flux.1 [dev] depth estimation model. The model produces accurate depth representations for scene understanding and 3D visualization.
depth lora utility composition
OK 2024/11/21
LTX Video (preview) image-to-video Generate videos from images using LTX Video OK 2024/11/21
FLUX1.1 [pro] ultra Redux image-to-image FLUX1.1 [pro] ultra Redux is a high-performance endpoint for the FLUX1.1 [pro] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
style transfer high-res
OK 2024/11/21
FLUX.1 [pro] Fill image-to-image FLUX.1 [pro] Fill is a high-performance endpoint for the FLUX.1 [pro] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
editing
OK 2024/11/21
FLUX.1 [pro] Redux image-to-image FLUX.1 [pro] Redux is a high-performance endpoint for the FLUX.1 [pro] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
style transfer
Deprecated 2024/11/21
Kolors Image to Image image-to-image Photorealistic Image-to-Image
realism editing diffusion
OK 2024/11/19
IC-Light-v2 for Image Relighting image-to-image An endpoint for re-lighting photos and changing their backgrounds per a given description
relighting editing
OK 2024/11/14
Mochi 1 text-to-video Mochi 1 preview is an open state-of-the-art video generation model with high-fidelity motion and strong prompt adherence in preliminary evaluation. OK 2024/11/7
Train Flux LoRAs For Portraits training FLUX LoRA training optimized for portrait generation, with bright highlights, excellent prompt following and highly detailed results.
lora personalization
OK 2024/11/7
FLUX.1 [dev] Differential Diffusion image-to-image FLUX.1 Differential Diffusion is a rapid endpoint that enables swift, granular control over image transformations through change maps, delivering fast and precise region-specific modifications while maintaining FLUX.1 [dev]'s high-quality output.
transformation
OK 2024/11/6
MiniMax (Hailuo AI) Video 01 image-to-video Generate video clips from your images using MiniMax Video model
motion transformation
OK 2024/10/30
PuLID Flux image-to-image An endpoint for personalized image generation using Flux, guided by a given description.
personalization style transfer
OK 2024/10/29
Birefnet Background Removal image-to-image Bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS)
background removal segmentation high-res utility
OK 2024/10/28
Stable Diffusion 3.5 Large text-to-image Stable Diffusion 3.5 Large is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.
diffusion typography style
OK 2024/10/27
Stable Diffusion 3.5 Medium text-to-image Stable Diffusion 3.5 Medium is a Multimodal Diffusion Transformer (MMDiT) text-to-image model that features improved performance in image quality, typography, complex prompt understanding, and resource-efficiency.
diffusion typography style
OK 2024/10/27
Hunyuan Video text-to-video Hunyuan Video is an open video generation model with high visual quality, motion diversity, text-video alignment, and generation stability. This endpoint generates videos from text descriptions.
motion
OK 2024/10/22
CogVideoX-5B image-to-video Generate videos from images and prompts using CogVideoX-5B OK 2024/10/17
F5 TTS text-to-audio F5 TTS
speech
OK 2024/10/17
CogVideoX-5B video-to-video Generate videos from videos and prompts using CogVideoX-5B
editing
OK 2024/10/17
LLaVA v1.5 13B vision Vision
multimodal vision
Deprecated 2024/10/5
Kling 1.0 image-to-video Generate video clips from your images using Kling 1.0 (pro)
motion
OK 2024/10/4
Kling 1.5 image-to-video Generate video clips from your images using Kling 1.5 (pro) OK 2024/10/4
Kling 1.0 image-to-video Generate video clips from your images using Kling 1.0
motion
OK 2024/10/4
Kling 1.0 text-to-video Generate video clips from your prompts using Kling 1.0 (pro)
motion
OK 2024/10/4
LTX Video (preview) text-to-video Generate videos from prompts using LTX Video OK 2024/10/4
FLUX.1 [pro] text-to-image FLUX.1 [pro] new is an accelerated version of FLUX.1 [pro], maintaining professional-grade image quality while delivering significantly faster generation speeds. Deprecated 2024/10/3
Live Portrait image-to-image Transfer expression from a video to a portrait.
expression animation
OK 2024/10/1
FLUX.1 [dev] Inpainting with LoRAs text-to-image Super fast endpoint for the FLUX.1 [dev] inpainting model with LoRA support, enabling rapid and high-quality image inpainting using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.
lora personalization
OK 2024/9/18
FLUX.1 [dev] with Controlnets and Loras image-to-image A general purpose endpoint for the FLUX.1 [dev] model, implementing the RF-Inversion pipeline. This can be used to edit a reference image based on a prompt.
rf-inversion editing lora
OK 2024/9/17
High Quality Stable Video Diffusion image-to-video Generate short video clips from your images using SVD v1.1 OK 2024/9/16
Image Preprocessors image-to-image Holistically-Nested Edge Detection (HED) preprocessor.
preprocess detection utility controlnet
OK 2024/9/16
Image Preprocessors image-to-image Scribble preprocessor.
preprocess utility editing controlnet sketch
OK 2024/9/16
Image Preprocessors image-to-image Depth Anything v2 preprocessor.
depth preprocess utility controlnet
OK 2024/9/16
Image Preprocessors image-to-image MiDaS depth estimation preprocessor.
depth preprocess utility controlnet
OK 2024/9/16
Image Preprocessors image-to-image Line art preprocessor.
preprocess utility sketch controlnet
OK 2024/9/16
Image Preprocessors image-to-image Segment Anything Model (SAM) preprocessor.
segmentation preprocess utility mask controlnet
OK 2024/9/16
Image Preprocessors image-to-image ZoeDepth preprocessor.
depth preprocess utility controlnet
OK 2024/9/16
Image Preprocessors image-to-image TEED (Tiny and Efficient Edge Detector) preprocessor.
preprocess detection utility controlnet
OK 2024/9/16
Image Preprocessors image-to-image M-LSD line segment detection preprocessor.
preprocess utility controlnet
OK 2024/9/16
Image Preprocessors image-to-image PIDI (PiDiNet) preprocessor.
detection preprocess utility controlnet
OK 2024/9/16
Stable Video Diffusion text-to-video Generate short video clips from your prompts using SVD v1.1 OK 2024/9/16
ControlNeXt SVD video-to-video Animate a reference image with a driving video using ControlNeXt.
animation stylized
OK 2024/9/5
FLUX.1 [dev] with Controlnets and Loras text-to-image A versatile endpoint for the FLUX.1 [dev] model that supports multiple AI extensions including LoRA, ControlNet conditioning, and IP-Adapter integration, enabling comprehensive control over image generation through various guidance methods.
lora controlnet ip-adapter
OK 2024/8/21
Stable Diffusion V3 text-to-image Stable Diffusion 3 Medium (Text to Image) is a Multimodal Diffusion Transformer (MMDiT) model that improves image quality, typography, prompt understanding, and efficiency.
diffusion style
OK 2024/8/20
Segment Anything Model image-to-image SAM.
segmentation mask
Deprecated 2024/8/20
Segment Anything Model 2 image-to-image SAM 2 is a model for segmenting images and videos in real-time.
segmentation mask real-time
OK 2024/8/15
Segment Anything Model 2 video-to-video SAM 2 is a model for segmenting images and videos in real-time.
segmentation mask real-time
OK 2024/8/15
FLUX.1 [dev] with Controlnets and Loras image-to-image FLUX General Image-to-Image is a versatile endpoint that transforms existing images with support for LoRA, ControlNet, and IP-Adapter extensions, enabling precise control over style transfer, modifications, and artistic variations through multiple guidance methods.
lora controlnet ip-adapter
OK 2024/8/14
FLUX.1 [dev] with Controlnets and Loras image-to-image FLUX General Inpainting is a versatile endpoint that enables precise image editing and completion, supporting multiple AI extensions including LoRA, ControlNet, and IP-Adapter for enhanced control over inpainting results and sophisticated image modifications.
lora controlnet ip-adapter
OK 2024/8/14
FLUX.1 [dev] with Controlnets and Loras image-to-image A specialized FLUX endpoint combining differential diffusion control with LoRA, ControlNet, and IP-Adapter support, enabling precise, region-specific image transformations through customizable change maps.
lora controlnet ip-adapter
OK 2024/8/13
FLUX.1 [dev] with LoRAs image-to-image FLUX LoRA Image-to-Image is a high-performance endpoint that transforms existing images using FLUX models, leveraging LoRA adaptations to enable rapid and precise image style transfer, modifications, and artistic variations.
lora style transfer
OK 2024/8/13
Fooocus Upscale or Vary text-to-image Default parameters with automated optimizations and quality improvements.
upscaling vary stylized
OK 2024/8/12
FLUX.1 Subject text-to-image Super fast endpoint for the FLUX.1 [schnell] model with subject input capabilities, enabling rapid and high-quality image generation for personalization, specific styles, brand identities, and product-specific outputs.
personalization customization
OK 2024/8/1
Sana text-to-image Sana can synthesize high-resolution, high-quality images with strong text-image alignment at a remarkably fast speed, with the ability to generate 4K images in less than a second. OK 2024/8/1
PixArt-Σ text-to-image Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
diffusion
OK 2024/8/1
FLUX.1 [dev] with LoRAs text-to-image Super fast endpoint for the FLUX.1 [dev] model with LoRA support, enabling rapid and high-quality image generation using pre-trained LoRA adaptations for personalization, specific styles, brand identities, and product-specific outputs.
lora personalization
OK 2024/8/1
SDXL ControlNet Union text-to-image An efficient SDXL multi-controlnet text-to-image model.
diffusion controlnet composition
OK 2024/7/31
SDXL ControlNet Union image-to-image An efficient SDXL multi-controlnet image-to-image model.
diffusion controlnet composition
OK 2024/7/31
SDXL ControlNet Union image-to-image An efficient SDXL multi-controlnet inpainting model.
diffusion controlnet composition
OK 2024/7/31
Kolors text-to-image Photorealistic Text-to-Image
realism diffusion
OK 2024/7/24
AMT Frame Interpolation image-to-video Interpolate between image frames
interpolation editing
OK 2024/7/18
MusePose video-to-video Animate a reference image with a driving video using MusePose. Deprecated 2024/7/18
FLUX.1 [dev] image-to-image FLUX.1 Image-to-Image is a high-performance endpoint for the FLUX.1 [dev] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.
style transfer
OK 2024/7/11
Live Portrait image-to-video Transfer expression from a video to a portrait.
expression animation
OK 2024/7/9
Era 3D image-to-image A powerful image to novel multiview model with normals. OK 2024/7/1
Stable Cascade text-to-image Stable Cascade: Image generation on a smaller & cheaper latent space.
diffusion lcm
OK 2024/6/25
Florence-2 Large image-to-image Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks
detection multimodal vision
OK 2024/6/22
Florence-2 Large image-to-image Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks
multimodal vision detection
OK 2024/6/22
Florence-2 Large vision Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks
multimodal vision
OK 2024/6/22
Florence-2 Large image-to-image Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks
multimodal vision
OK 2024/6/22
Florence-2 Large image-to-image Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks
multimodal vision
OK 2024/6/22
Florence-2 Large image-to-image Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks
ocr multimodal vision
OK 2024/6/22
Florence-2 Large vision Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks
captioning multimodal vision
OK 2024/6/22
Florence-2 Large vision Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks
ocr multimodal vision
OK 2024/6/22
Florence-2 Large image-to-image Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks
multimodal vision segmentation
OK 2024/6/22
Florence-2 Large image-to-image Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks
multimodal vision
OK 2024/6/22
Florence-2 Large vision Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks
multimodal vision
OK 2024/6/22
Florence-2 Large vision Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks
captioning multimodal vision
OK 2024/6/22
Florence-2 Large image-to-image Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks
multimodal vision segmentation
OK 2024/6/22
Florence-2 Large vision Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks
captioning multimodal vision
OK 2024/6/22
Stable Diffusion XL text-to-image Run SDXL at the speed of light
diffusion lora embeddings high-res style
OK 2024/6/12
Stable Diffusion V3 image-to-image Stable Diffusion 3 Medium (Image to Image) is a Multimodal Diffusion Transformer (MMDiT) model that improves image quality, typography, prompt understanding, and efficiency.
diffusion editing style
OK 2024/6/12
SoteDiffusion text-to-image Anime finetune of Würstchen V3.
lcm stylized
OK 2024/6/10
Luma Photon text-to-image Generate images from your prompts using Luma Photon. Photon is the most creative, personalizable, and intelligent visual model for creatives, bringing a step-function change in the cost of high-quality image generation. OK 2024/6/3
Stable Video Diffusion Turbo text-to-video Generate short video clips from your images using SVD v1.1 at Lightning Speed
lcm diffusion turbo
OK 2024/6/3
DWPose Pose Prediction image-to-image Predict poses from images.
pose utility
OK 2024/6/1
SD 1.5 Depth ControlNet image-to-image SD 1.5 ControlNet
diffusion editing manipulation controlnet
OK 2024/5/31
CCSR Upscaler image-to-image SOTA Image Upscaler
upscaling
OK 2024/5/5
Omni Zero image-to-image Any pose, any style, any identity
style transfer
OK 2024/4/25
Lightning Models text-to-image Collection of SDXL Lightning models.
diffusion lightning
OK 2024/4/25
Playground v2.5 text-to-image State-of-the-art open-source model in aesthetic quality
artistic style
OK 2024/4/25
Hyper SDXL image-to-image Hyper-charge SDXL's performance and creativity.
diffusion editing
OK 2024/4/25
Realistic Vision text-to-image Generate realistic images.
realism diffusion
OK 2024/4/25
Dreamshaper text-to-image Dreamshaper model.
stylized diffusion
OK 2024/4/25
Hyper SDXL image-to-image Hyper-charge SDXL's performance and creativity.
diffusion
OK 2024/4/25
IP Adapter Face ID image-to-image High quality zero-shot personalization
ip-adapter personalization customization editing
OK 2024/4/22
Stable Diffusion with LoRAs image-to-image Run any Stable Diffusion model with customizable LoRA weights.
diffusion lora customization fine-tuning
OK 2024/4/18
Stable Diffusion with LoRAs image-to-image Run any Stable Diffusion model with customizable LoRA weights.
diffusion lora customization fine-tuning
OK 2024/4/17
Stable Diffusion XL image-to-image Run SDXL at the speed of light
diffusion high-res lora ip-adapter controlnet
OK 2024/4/16
Stable Diffusion XL image-to-image Run SDXL at the speed of light
diffusion high-res lora ip-adapter controlnet
OK 2024/4/16
Stable Diffusion v1.5 text-to-image Stable Diffusion v1.5
diffusion
OK 2024/4/16
Layer Diffusion XL text-to-image SDXL with an alpha channel. OK 2024/4/13
MuseTalk image-to-video MuseTalk is a real-time high quality audio-driven lip-syncing model. Use MuseTalk to animate a face with your own audio.
animation lip sync real-time
OK 2024/4/11
Stable Diffusion XL Lightning text-to-image Run SDXL at the speed of light
diffusion lightning real-time
OK 2024/4/11
AuraSR image-to-image Upscale your images with AuraSR.
upscaling high-res
OK 2024/4/11
Sad Talker image-to-video Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
animation
OK 2024/4/11
Wizper (Whisper v3 -- fal.ai edition) speech-to-text [Experimental] Whisper v3 Large -- but optimized by our inference wizards. Same WER, double the performance!
transcription speech
OK 2024/4/8
NSFW Filter vision Predict the probability of an image being NSFW.
filter safety utility
OK 2024/3/22
Moondream vision Answer questions about images.
multimodal vision
OK 2024/3/20
Fooocus text-to-image Fooocus extreme speed mode as a standalone app.
stylized
OK 2024/3/13
Face to Sticker image-to-image Create stickers from faces.
sticker editing
OK 2024/3/11
PhotoMaker image-to-image Customizing Realistic Human Photos via Stacked ID Embedding
editing customization realism personalization
OK 2024/3/8
T2V Turbo - Video Crafter text-to-video Generate short video clips from your prompts
turbo
OK 2024/3/8
ControlNet SDXL text-to-image Generate Images with ControlNet.
diffusion controlnet manipulation
OK 2024/2/28
Creative Upscaler image-to-image Create creatively upscaled images.
upscaling
OK 2024/2/27
Birefnet Background Removal image-to-image Bilateral reference framework (BiRefNet) for high-resolution dichotomous image segmentation (DIS).
background removal segmentation high-res utility
OK 2024/2/27
Stable Diffusion XL Lightning image-to-image Run SDXL at the speed of light
diffusion lightning editing
OK 2024/2/21
Playground v2.5 image-to-image State-of-the-art open-source model in aesthetic quality
inpaint artistic style
OK 2024/2/21
Stable Diffusion XL Lightning image-to-image Run SDXL at the speed of light
diffusion lightning
OK 2024/2/21
Hyper SDXL text-to-image Hyper-charge SDXL's performance and creativity.
diffusion real-time
OK 2024/2/21
Playground v2.5 image-to-image State-of-the-art open-source model in aesthetic quality
artistic style
OK 2024/2/21
AMT Interpolation video-to-video Interpolate between video frames
interpolation editing
OK 2024/2/21
AnimateDiff text-to-video Animate your ideas!
animation stylized
OK 2024/2/21
Whisper speech-to-text Whisper is a model for speech transcription and translation.
transcription translation speech
OK 2024/2/19
Latent Consistency Models (v1.5/XL) image-to-image Run SDXL at the speed of light
lcm diffusion turbo real-time editing
OK 2024/2/19
Latent Consistency Models (v1.5/XL) text-to-image Run SDXL at the speed of light
lcm diffusion turbo real-time
OK 2024/2/19
Latent Consistency Models (v1.5/XL) image-to-image Run SDXL at the speed of light
lcm diffusion turbo real-time editing
OK 2024/2/19
Fooocus text-to-image Fooocus extreme speed mode as a standalone app. OK 2024/2/16
LLaVA v1.6 34B vision Vision
multimodal vision
OK 2024/2/14
AnimateDiff Turbo text-to-video Animate your ideas at lightning speed!
animation stylized turbo
OK 2024/2/13
Illusion Diffusion text-to-image Create illusions conditioned on image.
composition stylized
OK 2024/2/13
Fooocus Image Prompt text-to-image Default parameters with automated optimizations and quality improvements.
stylized
OK 2024/2/13
Face Retoucher image-to-image Automatically retouches faces to smooth skin and remove blemishes.
editing
OK 2024/2/13
Stable Video Diffusion Turbo image-to-video Generate short video clips from your images using SVD v1.1 at Lightning Speed
turbo
OK 2024/2/13
Midas Depth Estimation image-to-image Create depth maps using Midas depth estimation.
depth utility
OK 2024/2/13
AnimateDiff Turbo video-to-video Re-animate your videos at lightning speed!
animation stylized turbo
OK 2024/2/13
Fooocus Inpainting text-to-image Default parameters with automated optimizations and quality improvements.
stylized editing
OK 2024/2/13
MiniMax (Hailuo AI) Video 01 text-to-video Generate video clips from your prompts using MiniMax model
motion transformation
OK 2024/2/13
AnimateDiff video-to-video Re-animate your videos!
animation stylized
OK 2024/2/13
Clarity Upscaler image-to-image Clarity upscaler for upscaling images with very high fidelity.
upscaling
OK 2024/2/4
Latent Consistency (SDXL & SDv1.5) text-to-image Produce high-quality images with minimal inference steps.
diffusion lcm real-time
OK 2024/2/4
TripoSR image-to-3d State of the art Image to 3D Object generation OK 2024/1/30
DiffusionEdge text-to-image Diffusion based high quality edge detection
detection
OK 2024/1/8
Stable Audio Open text-to-audio Open source text-to-audio model.
music
OK 2024/1/4
Marigold Depth Estimation image-to-image Create depth maps using Marigold depth estimation.
depth utility
OK 2023/12/28
PuLID image-to-image Tuning-free ID customization.
editing customization personalization
OK 2023/12/14
ControlNet SDXL image-to-image Generate Images with ControlNet.
diffusion controlnet editing manipulation
OK 2023/12/1
ControlNet SDXL image-to-image Generate Images with ControlNet.
diffusion controlnet editing manipulation
OK 2023/12/1
Fooocus text-to-image Default parameters with automated optimizations and quality improvements.
stylized
OK 2023/11/16
Optimized Latent Consistency (SDv1.5) image-to-image Produce high-quality images with minimal inference steps. Optimized for 512x512 input image size.
diffusion lcm real-time
OK 2023/11/9
Animatediff SparseCtrl LCM text-to-video Animate Your Drawings with Latent Consistency Models!
lcm animation stylized
OK 2023/11/9
Inpainting sdxl and sd image-to-image Inpaint images with SD and SDXL
editing diffusion
OK 2023/11/4
ControlNet SDXL image-to-image Generate Images with ControlNet.
diffusion controlnet manipulation
Deprecated 2023/11/1
Upscale Images image-to-image Upscale images by a given factor.
upscaling high-res
OK 2023/10/30
Remove Background image-to-image Remove the background from an image.
background removal utility editing
OK 2023/10/5
Stable Diffusion with LoRAs text-to-image Run any Stable Diffusion model with customizable LoRA weights.
diffusion lora customization
OK 2023/9/26