qwen2-5-vl

Here are 56 public repositories matching this topic...

Qwen-Image-Edit-2509-LoRAs-Fast is a high-performance, user-friendly web application built with Gradio that leverages the advanced Qwen/Qwen-Image-Edit-2509 model from Hugging Face for seamless image editing tasks.

  • Updated Dec 23, 2025
  • Python

Multimodal-OCR is an experimental, high-performance visual reasoning and optical character recognition suite designed to accurately extract text, analyze visual content, and parse complex document structures. It is built on a diverse set of vision-language models.

  • Updated Mar 23, 2026
  • Python

Qwen3-VL-Outpost is an experimental, high-performance visual reasoning and multimodal inference suite designed for advanced image analysis, optical character recognition, and complex scene understanding. It is built around the Qwen3-VL and Qwen2.5-VL model families.

  • Updated Mar 23, 2026
  • Python

Multimodal-OCR3 is an experimental optical character recognition and visual processing suite designed for precise text extraction, document parsing, and markdown generation, drawing on a selection of vision-language models.

  • Updated Mar 23, 2026
  • Python

A Gradio-based demonstration for the AllenAI SAGE-MM-Qwen3-VL-4B-SFT_RL multimodal model, specialized in video reasoning tasks. Users upload MP4 videos, provide natural language prompts (e.g., "Describe this video in detail" or custom questions), and receive detailed textual analyses.

  • Updated Dec 21, 2025
  • Python

QIE-Bbox-Studio (Qwen Image Edit Bounding Box Studio) is an advanced AI-powered image editing interface built on top of the Qwen2.5-VL and Qwen-Image-Edit models. This application allows users to manipulate images with extreme precision by defining bounding boxes and providing natural language prompts.

  • Updated Mar 17, 2026
  • Python

Qwen-Image-Edit-2509-LoRAs-Fast-Fusion is a fast, interactive web application built with Gradio that enables advanced image editing using the Qwen/Qwen-Image-Edit-2509 model from Alibaba's Qwen team. It leverages specialized LoRA adapters for efficient, low-step inference (as few as 4 steps).

  • Updated Dec 12, 2025
  • Python

RxLM-Med: A multimodal clinical AI agent featuring System 2 reasoning, cross-lingual hierarchical RAG (BM25 + FAISS + RRF), deterministic medical calculation engine, and Traffic Light Protocol (TLP) safety alignment — built on Qwen-VL with LoRA fine-tuning, SFT/DPO alignment, and INT4 quantization for real-world lab report interpretation.

  • Updated Apr 1, 2026
  • Python
