A Practical Example Using Gemma 4 and Qwen2.5-Coder with Aider
Open-source coding models are improving quickly, and recent models such as Gemma 4 already show promising behavior in real development workflows.
In this article, I walk through a practical example comparing two open-source models, Gemma 4 and Qwen2.5-Coder (14B), within an agent-based coding setup using Aider. The goal is not just to see whether they can generate code, but to understand how they behave as coding agents.
More specifically, the focus is on a few simple but important questions:
- Can the model produce code that actually runs?
- Does it anticipate runtime issues?
- Can it behave like a developer rather than a code generator?
The results show that open-source models are starting to demonstrate meaningful agent-like behavior. They are not yet consistently on par with leading proprietary systems, but they are already useful in many scenarios and point to a broader shift toward local, agent-driven development.
Tooling: Aider in a Local Agent Workflow
To make the comparison realistic, both models were used through Aider.
Aider is a terminal-based coding assistant that works directly inside a repository. It reads files, edits them, and applies changes iteratively. If you have used Claude Code, the idea is very similar. The model is not just responding to prompts. It is operating inside your project and modifying real code.
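For reference, pointing Aider at a locally running Ollama server looks roughly like this (the model names are examples of what was used here; check `ollama list` for what you actually have pulled, and see Aider's documentation for the exact flags supported by your version):

```shell
# Tell Aider where the local Ollama server lives (default port shown)
export OLLAMA_API_BASE=http://127.0.0.1:11434

# Start Aider inside the repository, backed by a local model
aider --model ollama_chat/qwen2.5-coder:14b
```

From that point on, Aider reads and edits files in the repository directly, so the model's output is applied to real code rather than pasted from a chat window.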
This setup matters because it changes what we are evaluating. The goal here is not text generation quality. It is agent behavior in a real workflow.
Models
Gemma 4 (gemma4:e4b)
Gemma 4 is developed by Google as part of a family of efficient open-weight models designed for local deployment. It is relatively compact, follows instructions well, and offers strong reasoning given its size.
Although it is not designed specifically for coding, it performs surprisingly well in practical development tasks, especially when execution awareness is required.
Qwen2.5-Coder (14B)
Qwen2.5-Coder, developed by Alibaba, is a code-specialized model trained extensively on programming data. It is the larger of the two models and is explicitly optimized for code generation and debugging.
From a design perspective, this model should have an advantage in coding tasks, particularly in producing structured and syntactically correct implementations.
Environment
Both models were run locally under the same conditions:
- Intel i7-13650HX
- 32 GB RAM
- RTX 5070 Laptop GPU
This setup reflects a realistic developer environment and avoids any dependency on external APIs or cloud infrastructure.
Task
Both models were given the same task:
Build a Python CLI tool using OpenCV that loads an image, applies multiple transformations, detects contours, computes statistics, and generates a visualization grid. The tool should also save outputs and handle edge cases.
This is a relatively simple task on the surface, but it requires a combination of correct API usage, data handling, and awareness of how image data behaves in practice.
Input

(input image not shown)

Output (Gemma 4)

(visualization grid produced by Gemma 4 not shown)
Results
Gemma 4
The code generated by Gemma 4 executed successfully on the first run. It produced the expected outputs, handled differences in image formats correctly, and required no manual fixes.
Qwen2.5-Coder (14B)
The code generated by Qwen2.5-Coder initially appeared correct, but failed during execution with the following error:
could not broadcast input array from shape (H, W) into shape (H, W, 3)
What Went Wrong
The failure was caused by improper handling of image channel formats.
Grayscale images have shape (H, W), while color images have shape (H, W, 3). When combining them into a single visualization grid, this difference must be handled explicitly.
Gemma 4 accounted for this case by converting grayscale images into three-channel format before combining them. Qwen2.5-Coder did not, which led to the runtime error.
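The failure and the fix can both be reproduced in a few lines of NumPy. This is a minimal sketch with a synthetic 4x6 image; in the actual tool the channel conversion would typically be done with cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR), which is equivalent to the np.stack call below:

```python
import numpy as np

color = np.zeros((4, 6, 3), dtype=np.uint8)  # color image, shape (H, W, 3)
gray = np.full((4, 6), 128, dtype=np.uint8)  # grayscale image, shape (H, W)

grid = np.zeros((4, 12, 3), dtype=np.uint8)  # side-by-side visualization grid
grid[:, :6] = color                          # fine: shapes match exactly

try:
    grid[:, 6:] = gray                       # the bug: (H, W) cannot be
except ValueError as e:                      # broadcast into (H, W, 3)
    print(e)  # could not broadcast input array from shape (4,6) into shape (4,6,3)

# The fix: expand the grayscale image to three channels before placing it.
gray3 = np.stack([gray, gray, gray], axis=-1)  # shape (4, 6, 3)
grid[:, 6:] = gray3
print(grid.shape)  # (4, 12, 3)
```

Gemma 4's output included the equivalent of the conversion step; Qwen2.5-Coder's output assigned the raw two-dimensional array directly, which is exactly the broadcast error shown above.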
This is a small detail, but it reflects a deeper difference in behavior.
Generation vs Agentic Coding
This example highlights an important distinction between two types of model behavior.
Code generation focuses on producing syntactically correct and structurally plausible code. It relies heavily on learned patterns and often assumes ideal conditions.
Agentic coding, on the other hand, requires reasoning about execution. It involves anticipating edge cases, handling imperfect inputs, and producing code that works without manual intervention.
Both models were able to generate reasonable code. Only one produced code that actually survived execution.
Broader Perspective
Open-source agentic coding is evolving quickly, and recent models are beginning to show meaningful capability in real development workflows. Models that run locally are no longer limited to simple code generation. They can now participate in practical tasks, integrate into existing environments, and operate without relying on external APIs.
This becomes particularly relevant in organizational settings where data cannot leave internal systems and where privacy and compliance are strict requirements. In such environments, local models are not just an alternative to cloud-based solutions. They can become the preferred approach, enabling AI-assisted development without exposing sensitive code or data.
At the same time, it is important to recognize current limitations. Open-source models are improving rapidly, but they are not yet consistently at the level of leading proprietary systems such as those developed by Anthropic.
If you have worked with high-end proprietary systems such as Claude Opus 4.6, especially in scenarios involving large context windows and complex multi-step planning, the gap is still very clear. These systems are significantly more capable when it comes to long-horizon reasoning, structured planning, and maintaining consistency across complex tasks.
However, the direction is clear. The gap is narrowing, and for many practical use cases, open-source models are already sufficient. When combined with the advantages of local deployment, this creates a compelling and increasingly practical alternative.
If this trajectory continues, open-source agentic systems will play an increasingly important role in software development, especially in environments where control, privacy, and integration into existing infrastructure are essential.
