Multimodal-LLM – Neurogenou

Category: Multimodal-LLM

GenAI, Multimodal-LLM

Multimodal LLM vs. Multimodal mom said we have at home.

Sina

February 24, 2025

In a world where AI is evolving into increasingly versatile multimodal systems, I decided to build a tool that brings together state-of-the-art object detection, OCR, and language generation in one cohesive workflow. Inspired by my mom’s extreme positivity when we couldn’t afford to dine out as kids, I combined YOLOv8 with pytesseract and a Llama-based…
Multimodal-LLM

Visual-based Web Scraping: Using the Power of Multimodal LLMs for Dynamic Web Content Extraction

Sina

February 18, 2025

With LLMs and vision models becoming more accessible than ever, I started exploring the intersection of web scraping and AI. As a test case, I recently wrote some code that uses Llama vision to scrape the content of a webpage by relying on the LLM's multimodality; we can call it visual-based web scraping. Instead of traditional HTML…