A Vision-Language Model (VLM) powered system for waypoint generation and intelligent navigation using ROS2, Nav2, and TurtleBot3.
Base Results (YouTube Demo): https://youtu.be/9UGKGdawtN0?si=5kDADisyX7LLFk6V
This project integrates Vision-Language Models (VLMs) with ROS2 Nav2 to enhance TurtleBot3’s navigation in complex environments.
- Analyzes cost maps and occupancy grids.
- Generates intelligent waypoints using GPT-4V (Vision).
- Converts pixel coordinates to real-world waypoints for TurtleBot3.
- Executes autonomous navigation with dynamic obstacle avoidance.
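The pixel-to-world conversion step above can be sketched as follows. The function name and example values are illustrative assumptions, not the repository's actual code; the math follows the standard ROS `OccupancyGrid` convention, where `info.resolution` gives meters per cell and `info.origin` gives the world pose of cell (0, 0):

```python
def pixel_to_world(px, py, origin_x, origin_y, resolution):
    """Convert a costmap cell (col, row) to world coordinates in meters.

    Assumes the ROS OccupancyGrid convention: the origin is the world
    position of cell (0, 0), and row/column indices increase along +y/+x.
    """
    wx = origin_x + (px + 0.5) * resolution  # +0.5 targets the cell center
    wy = origin_y + (py + 0.5) * resolution
    return wx, wy

# Example: a 0.05 m/cell map whose origin is at (-10.0, -10.0);
# cell (200, 200) maps to roughly (0.025, 0.025) in world coordinates.
print(pixel_to_world(200, 200, -10.0, -10.0, 0.05))
```

The resulting (x, y) pair can then be packed into a `PoseStamped` in the map frame and handed to Nav2 as a waypoint.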
✅ AI-Powered Path Planning – Generates waypoints intelligently.
✅ Cost Map & Occupancy Grid Analysis – Extracts spatial insights.
✅ Pixel-to-World Coordinate Conversion – Ensures real-world accuracy.
✅ ROS2 Nav2 Integration – Executes AI-generated waypoints.
✅ Gazebo Simulation & RViz Visualization – Test in simulation before real-world deployment.
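To illustrate the AI-powered planning step, here is a hedged sketch of building an OpenAI chat-completions request that attaches a rendered costmap image and asks for waypoints. The helper function, prompt, and model name are assumptions for illustration, not the repository's actual code:

```python
import base64

def build_waypoint_request(image_bytes, prompt, model="gpt-4o"):
    """Build an OpenAI chat-completions payload that attaches a costmap
    image (as a base64 data URL) alongside a text prompt."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
        "max_tokens": 300,
    }

payload = build_waypoint_request(
    b"\x89PNG...",  # placeholder for the rendered costmap PNG bytes
    "Propose 3 safe waypoints as (x, y) pixel coordinates in JSON.",
)
# Send with: requests.post("https://api.openai.com/v1/chat/completions",
#                          headers={"Authorization": f"Bearer {api_key}"},
#                          json=payload)
```

The model's JSON reply would then be parsed into pixel coordinates and passed through the pixel-to-world conversion before navigation.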
- ROS2 Humble
- Nav2 (Navigation Stack)
- Gazebo (for simulation)
- OpenAI API Key (for GPT-4 Vision)
- Python libraries: `numpy`, `opencv-python`, `Pillow` (imported as `PIL`), `requests` – install with `pip install numpy opencv-python Pillow requests`
git clone https://github.com/atharvahude/nav2-vlm.git
cd nav2-vlm
- Use the provided world files or supply your own.