Category: Multimodal-LLM
-
Multimodal LLM vs. Multimodal mom said we have at home.
In a world where AI is evolving into increasingly versatile multimodal systems, I decided to build a tool that brings state-of-the-art object detection, OCR, and language generation together in one cohesive workflow. Inspired by my mom’s extreme positivity when we couldn’t afford to eat out as kids, I combined YOLOv8 with pytesseract and a Llama-based…
-
Visual-based Web Scraping: Using the Power of Multimodal LLMs for Dynamic Web Content Extraction
With LLMs and vision models becoming more accessible than ever, I started exploring the intersection of web scraping and AI. As a test case, I recently wrote some code that uses Llama vision to scrape the content of a webpage by relying on the LLM's multimodality; we can call it visual-based web scraping. Instead of traditional HTML…