An open-source implementation for fine-tuning the Qwen-VL series by Alibaba Cloud.
Run generative AI models on SOPHGO BM1684X/BM1688.
🍑 relsim: Relational Visual Similarity | pip install relsim 🌍 (CVPR 2026)
Automated image & video captioning using Qwen-VL, Gemma4 and SAM3.
Two lines of code to add absolute time awareness to Qwen2.5-VL's MRoPE.
A batched implementation for efficient Qwen2.5-VL inference.
Dedicated Colab notebooks for experimenting with OCR models (Nanonets OCR, Monkey OCR, OCRFlux 3B, Typhoon OCR 3B & more) on a free-tier T4 GPU.
Qwen-Image-Edit-2509-LoRAs-Fast is a high-performance, user-friendly web application built with Gradio that leverages the advanced Qwen/Qwen-Image-Edit-2509 model from Hugging Face for seamless image editing tasks.
Multimodal-OCR is an experimental, high-performance visual reasoning and optical character recognition suite designed to accurately extract text, analyze visual content, and parse complex document structures, built on a diverse ecosystem of cutting-edge vision-language models.
[AAAI'26] Official implementation of CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation
ArtSeek: Deep artwork understanding via multimodal in-context reasoning and late interaction retrieval
A Gradio-based demo application for comparing state-of-the-art OCR models: DeepSeek-OCR, Dots.OCR, HunyuanOCR, and Nanonets-OCR2-3B.
Qwen3-VL-Outpost is an experimental, high-performance visual reasoning and multimodal inference suite designed for advanced image analysis, optical character recognition, and complex scene understanding, built around the state-of-the-art Qwen3-VL and Qwen2.5-VL model families.
Multimodal-OCR3 is a highly capable, experimental optical character recognition and visual processing suite designed for precise text extraction, document parsing, and markdown generation, leveraging a powerful selection of vision-language models.
A Gradio-based demonstration for the AllenAI SAGE-MM-Qwen3-VL-4B-SFT_RL multimodal model, specialized in video reasoning tasks. Users upload MP4 videos, provide natural language prompts (e.g., "Describe this video in detail" or custom questions), and receive detailed textual analyses.
QIE-Bbox-Studio (Qwen Image Edit Bounding Box Studio) is an advanced AI-powered image editing interface built on top of the Qwen2.5-VL and Qwen-Image-Edit models. This application allows users to manipulate images with extreme precision by defining bounding boxes and providing natural language prompts.
Qwen-Image-Edit-2509-LoRAs-Fast-Fusion is a fast, interactive web application built with Gradio that enables advanced image editing using the Qwen/Qwen-Image-Edit-2509 model from Alibaba's Qwen team. It leverages specialized LoRA adapters for efficient, low-step inference (as few as 4 steps).
RxLM-Med: A multimodal clinical AI agent featuring System 2 reasoning, cross-lingual hierarchical RAG (BM25 + FAISS + RRF), deterministic medical calculation engine, and Traffic Light Protocol (TLP) safety alignment — built on Qwen-VL with LoRA fine-tuning, SFT/DPO alignment, and INT4 quantization for real-world lab report interpretation.