This app lets you talk to your engineering drawings. Upload an image of your P&ID or single-line diagram and a state-of-the-art language model will help you interpret the content. You will receive clear answers directly from your drawings.
For a P&ID, you can ask multiple questions to get a complete understanding of the process. Here are some examples:
"What is the main purpose of the process shown, and how do the steam lines interact with the vessel to heat the incoming fluid?" "What are the main components of the P&ID?" "How does the temperature-control loop (TT 100, TIC 100, TV 100) work together to keep the vessel at the target temperature?"
Similar to the P&ID diagram, you can "talk" to your electrical drawings, such as single-line diagrams. You can ask questions like the following:
- "What is the configuration of the substation? Only base your answer on the drawings. If any assumption is made in your answer or if there is uncertainty, state it."
- "How do transformers T1 and T2 support reliability if one source is lost?"
This application integrates Large Language Models (LLMs) from OpenAI for conversational understanding and Google's Gemini model for visual object detection to provide an interactive experience with your engineering drawings. The Instructor library is utilized to manage structured outputs from both LLMs, ensuring predictable and manageable data formats.
When you upload a drawing and ask a question:
- Conversational AI (OpenAI): Your questions are processed by an OpenAI model (e.g., GPT-4o mini). This model understands the context of your drawing and your query, providing textual answers.
- Object Detection (Google Gemini): If your query involves identifying or locating components in the drawing, the OpenAI model can trigger an action. This action invokes Google's Gemini model, which specializes in image understanding and object detection, to identify and provide bounding boxes for the requested components.
- Structured Outputs (Instructor): Instructor facilitates communication with both OpenAI and Gemini, ensuring that their responses (textual answers or detected object coordinates) are returned in a structured format that the application can easily use.
The chat and visualization workflow is orchestrated as follows:
- User Interaction: You interact with the app through `vkt.Chat` and by uploading an image using `vkt.FileField`.
- Controller Logic: When a message is sent or an image is uploaded, the `call_llm` method in the controller is triggered.
- LLM Processing (OpenAI): The conversation history and the uploaded image (converted to base64) are sent to the OpenAI model via Instructor. The model is prompted to provide a textual response and, if relevant, an `Action` to detect objects.
- Response Handling:
  - Textual Response: The textual part of the LLM's response is streamed directly to the chat interface.
  - Action for Object Detection: If the LLM returns an `Action` (e.g., a query like "detect pumps"), this `Action` is serialized to JSON and stored using `vkt.Storage()`:

    ```python
    # In controller.py, call_llm method
    vkt.Storage().set(
        "View",  # Key for the stored data
        data=vkt.File.from_data(partial.action.model_dump_json()),
        scope="entity",
    )
    ```

- Visualization Update: The `plot_view` method, decorated with `@vkt.PlotlyView`, listens for changes in the data stored under the "View" key.
- Object Detection (Gemini):
  - The `plot_view` method retrieves the `Action` from `vkt.Storage()`.
  - It uses the `query` from the `Action` and the uploaded image to call the `detect_objects` function.
  - `detect_objects` uses the Gemini model (via Instructor) to identify components and return their bounding boxes (`BBoxList`).
- Displaying Annotations:
  - The `plot_bounding_boxes_go` function takes the image and the bounding box data to create a Plotly figure with annotations.
  - This figure is then displayed in the `PlotlyView`:

    ```python
    # In controller.py, plot_view method
    # ... (retrieve action, call detect_objects) ...
    fig = plot_bounding_boxes_go(base64_image_string, bounding_box_data)
    return vkt.PlotlyResult(fig.to_json())
    ```
- Dynamic Updates: The app manages storage, deleting or updating stored data when inputs change, ensuring the views (chat and annotated drawing) remain current.
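One detail worth noting when drawing the annotations: Gemini commonly reports bounding boxes in normalized coordinates (e.g., `[ymin, xmin, ymax, xmax]` on a 0-1000 scale), while Plotly shapes expect pixel coordinates on the actual image. A minimal, hypothetical conversion helper (not the app's actual code) could look like:

```python
# Hypothetical helper: convert a Gemini-style normalized box
# [ymin, xmin, ymax, xmax] on a 0-1000 scale to pixel coordinates
# (x0, y0, x1, y1) for a Plotly rectangle shape.
def to_pixel_box(box_1000, img_width, img_height):
    ymin, xmin, ymax, xmax = box_1000
    return (
        xmin / 1000 * img_width,   # x0
        ymin / 1000 * img_height,  # y0
        xmax / 1000 * img_width,   # x1
        ymax / 1000 * img_height,  # y1
    )

# e.g. a box covering the top-left quarter of a 2000x1000 px drawing:
print(to_pixel_box([0, 0, 500, 500], 2000, 1000))  # (0.0, 0.0, 1000.0, 500.0)
```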
To use this application, you need API keys for both OpenAI and Google Gemini.
Local Development:
- Copy the `.env.example` file to a new file named `.env` in the project's root directory.
- Add your API keys to the `.env` file:

  ```
  GEMINI_API_KEY="your-google-gemini-api-key"
  OPENAI_API_KEY="your-openai-api-key"
  ```

  You can obtain a Google API key from Google AI Studio. The `python-dotenv` module loads these keys as environment variables. Never commit your `.env` file or expose your API keys publicly.
Published Apps (VIKTOR Platform): For apps deployed on the VIKTOR platform, manage your API keys using VIKTOR’s environment variables. Administrators can set these via the 'Apps' menu in your VIKTOR environment. These variables are encrypted. For detailed instructions, refer to the VIKTOR environment variables documentation.
- Instructor Framework:
- OpenAI:
- Google Gemini:
- VIKTOR Platform:

