2025 marks the first year in which LLMs became basically unavoidable as a dev. Whether at the PR merge, in the RCA war room, or just at the watercooler, we were all talking about this and what it means for our industry. I've had both positive and negative experiences with coding assistants like Claude, Gemini, and GitHub Copilot. Most of my hands-on experience has been with Claude Pro and GitHub Copilot, with limited use of Gemini. The perspectives shared here are admittedly largely subjective, but some observations are substantiated with available literature, which is rapidly evolving.
Productivity Gains: Real but Uneven?
Research from MIT, Princeton, and the University of Pennsylvania analyzing 4,800+ developers found a 26% productivity increase using GitHub Copilot. A key concern is how productivity was measured in this context. In this experiment, coding challenges were provided with a template repository as a starting point, including tests to judge correctness.
The results suggested that junior developers see the largest gains, while senior developers were slightly less likely to accept code suggestions. How applicable is the 26% productivity increase in the real world? This likely depends on how closely your working environment resembles the experimental setup. Do developers get highly accurate specifications complete with correctness checks, or do they often receive unclear or ambiguous direction that needs to be clarified with stakeholders?
There is also research that suggests little to no actual productivity benefit; a 2025 METR study of experienced open-source developers found no significant productivity improvement when using AI tools like Cursor Pro with Claude, suggesting benefits vary significantly by context and developer experience. The study highlights that different stakeholders have different expectations of agentic tools:
- Economics experts predicted ~40% speedup. (biased towards profits?)
- Machine learning experts also predicted a ~40% speedup. (biased towards their domain’s impact?)
- Developers before the study predicted ~27% speedup.
- Developers after the study predicted ~25% speedup.
The actual result observed:
- A 19% slowdown: developers actually took 19% longer with the AI tools than without them.
Key Insight: Productivity benefits are not universal and depend heavily on developer experience level and familiarity with the codebase. AI assistance might even be counterproductive in practice, where different stakeholders are biased towards their preferred outcomes.
Impact on Learning and Skill Development
One of the first observations made was that of undermined learning. Even if the code assistant produced the result that was requested, it did so without elevating the knowledge or understanding of the developer involved. This is a more extreme version of how driving with GPS dulls your natural navigation ability. Our minds are optimization machines; if they are fooled into thinking they can get the same result without spending the same energy, we're seemingly wired to choose the "optimal" path. In this case, short-term survival circuits can undermine career potential.
The concern about undermining learning is strongly supported by recent research:
- Research published in Cognitive Research found that AI assistants may promote illusions of understanding in learners, leading them to believe they have greater understanding than they actually do.
- A study on information seeking in software development found significant trade-offs between immediate productivity gains and long-term expertise building in AI-assisted task completion.
Key Insight: AI assistants can create false confidence in learners while preventing the deep learning that comes from struggling with problems independently.
Experience Level Matters Significantly
The METR study also found that AI tools may be useful in contexts different from its setting, for example for less experienced developers or developers working in an unfamiliar codebase. Senior developers, especially those already familiar with the codebase and stack, saw little or no measurable speed-up; the boost was strongest in situations where developers lacked prior context.
Experienced developers in familiar codebases see minimal benefit, while novices in unfamiliar territory see the most acceleration, but at the cost of learning opportunities. Note the worrying allure for less experienced developers: gaining illusions of understanding while being incentivised by productivity gains in domains where they lack context but can still create a positive impression with stakeholders.
Key Insight: Slow and steady might win this race?
The “Vibe Coding” Phenomenon
Vibe coding is as much a meme as it is an actual process. It describes a chatbot-based approach where developers describe projects to LLMs, which generate code, with developers avoiding examination of the code and accepting AI suggestions without human review.
Critics point out lack of accountability, maintainability, and increased risk of introducing security vulnerabilities, with experts noting it’s clearly risky for production codebases.
Characteristics of Vibe Coding:
- Developer describes desired functionality in natural language
- AI generates code without developer review
- Evaluation based solely on execution results
- Iterative refinement through additional prompts
- Little to no code inspection or understanding required
Key Insight: Vibe coding accelerates prototyping but introduces significant risks in production environments due to lack of code understanding and review.
Code Quality and Security Concerns
The “vibe-coding” approach raises serious quality concerns:
- GitClear's 2024 research analyzing 211 million changed lines found that code cloning rose from 8.3% to 12.3% while refactoring dropped from 25% to less than 10%
- Studies found 29.5% of Python and 24.2% of JavaScript snippets generated by GitHub Copilot contained security weaknesses
- Apiiro's 2024 research showed AI-generated code introduced 322% more privilege escalation paths and 153% more design flaws compared to human-written code
- AI-assisted commits were merged into production 4x faster than regular commits, meaning insecure code bypassed normal review cycles
Key Insight: AI-generated code frequently contains security vulnerabilities and quality issues, and the speed of development often means these issues bypass normal review processes.
Benefits in Specific Contexts
Not all uses of AI assistants are problematic:
- A 2024 survey of 481 experienced developers found that developers want to delegate writing tests and natural-language artifacts to AI assistants, highlighting these as areas where AI is already popular
- Research found 62% of vibe coders cited speed and efficiency as their primary motivation, with practitioners describing how AI tools enabled them to produce working software in hours instead of weeks
Key Insight: AI assistants excel at automating repetitive tasks like test generation and documentation, freeing developers for more creative work. They’re particularly valuable for rapid prototyping and documentation.
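To make that delegation concrete, here is a minimal sketch of the kind of repetitive test scaffolding that is cheap to hand to an assistant and easy to review afterwards. The `slugify` function and its cases are hypothetical illustrations, not drawn from any of the studies above:

```python
# A sketch of assistant-draftable test scaffolding: each case is trivial on
# its own; the value of delegation is in enumerating many cases quickly,
# then reviewing the table by hand. `slugify` is a hypothetical example.
import pytest


def slugify(title: str) -> str:
    """Turn a title into a URL-friendly slug (placeholder implementation)."""
    return "-".join(title.lower().split())


@pytest.mark.parametrize(
    ("title", "expected"),
    [
        ("Hello World", "hello-world"),
        ("  leading and trailing  ", "leading-and-trailing"),
        ("ALREADY-LOWER", "already-lower"),
        ("", ""),
    ],
)
def test_slugify(title: str, expected: str) -> None:
    assert slugify(title) == expected
```

The human's job shifts from typing the boilerplate to auditing the case table, which is exactly the kind of review step vibe coding skips.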
The Skill Atrophy Question
Besides blocking deep understanding during the learning phase, AI reliance could also penalise the hard yards you put in ages ago. Developers risk turning into button-pushers who can ask AI the right questions but won't truly grasp the answers, and when the AI is wrong, these developers might not catch it. Google's 2024 DORA report found that increased AI use improves documentation speed but causes a 7.2% drop in delivery stability.
Long-term Concerns:
- Loss of fundamental programming skills through disuse
- Inability to debug or fix AI-generated code when it fails
- Reduced problem-solving capabilities
- Diminished code review skills among junior developers
- Generation gap in core competencies
Key Insight: Developers must consciously choose which skills to maintain through manual practice versus which to delegate to AI, acknowledging that delegated skills will atrophy.
Recommended Practices from Research
For Individual Developers
Practice "AI Hygiene": Always verify and understand AI-generated code. Don't accept output as correct just because it looks plausible. This requires steadfast discipline in environments that often reward short-term agility above all.
Use AI as a Learning Tool: You can ask the AI to explain human-written code line by line, or to offer alternative approaches and contrast their trade-offs. Turn passive answers into active lessons. However, note that there is no substitute for keyboard time.
Keep a Learning Journal: Try to track what you frequently ask AI for help with; recurring requests may signal knowledge gaps. Then address those gaps without involving AI.
Maintain Core Skills: Deliberately practice fundamental skills manually to prevent atrophy. Do coding challenges like Advent of Code or similar in an unfamiliar stack without the use of artificial intelligence. The goal is not producing the output as fast as possible; it's to activate the learning and reasoning circuits in your mind.
Strategic Delegation: Consciously decide which tasks to delegate to AI and which skills to maintain. Documentation is a common target where AI agents do well and don’t undermine crucial skills.
For Teams and Organizations
Mandatory Code Review: Especially for AI-generated code. Treat AI as a junior developer whose work requires review. Invest in processes that enforce this in a pragmatic way. Promote a culture of AI-code transparency rather than bans on AI-generated code.
Security Scanning: Implement automated security analysis tools to catch vulnerabilities in AI-generated code.
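As one possible shape for this, here is a minimal sketch of a CI gate built around Bandit, a real Python security linter; the `src/` path and the severity threshold are assumptions for illustration, not prescriptions from the research above:

```python
# A minimal CI gate sketch: fail the build when the scanner reports findings.
# Assumes Bandit is installed and the code under scan lives in src/.
import subprocess
import sys


def security_gate(path: str = "src/") -> int:
    # `bandit -r` scans a directory recursively; `-ll` limits the report to
    # medium severity and above. Bandit exits nonzero when it finds issues,
    # which is enough to fail a CI job.
    result = subprocess.run(["bandit", "-r", path, "-ll"])
    return result.returncode


if __name__ == "__main__":
    sys.exit(security_gate())
```

Wiring a check like this into the pipeline means AI-assisted commits can't skip the scan even when they're merged quickly.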
Clear Guidelines: Establish when AI assistance is appropriate (prototyping, test generation, documentation) versus when traditional approaches are necessary (production code, security-critical features).
Extensive Testing: AI-generated code requires more thorough testing, especially edge cases.
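Property-based testing is one way to probe the edge cases that assistant-drafted code tends to miss. Below is a minimal sketch using the Hypothesis library; `normalize_whitespace` is a hypothetical stand-in for AI-generated code under test:

```python
# Hypothesis generates inputs a reviewer is unlikely to enumerate by hand:
# empty strings, unicode whitespace, very long runs, and so on.
from hypothesis import given, strategies as st


def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces (placeholder)."""
    return " ".join(text.split())


@given(st.text())
def test_normalize_is_idempotent(text: str) -> None:
    # Normalizing twice must give the same result as normalizing once:
    # a cheap invariant that holds regardless of the generated input.
    once = normalize_whitespace(text)
    assert normalize_whitespace(once) == once
```

Invariant-style tests like this complement the example-based tests an AI assistant typically drafts.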
Education and Training: Developers might need training on secure AI usage and on recognizing the pitfalls of AI's current limitations.
Mentorship Programs: Pair experienced developers with juniors to ensure knowledge transfer continues despite AI assistance. Try to dissuade juniors from submitting AI-generated code, and offer alternatives like mentoring and pair programming.
Recommended Academic Citations
Weisz, J. D., et al. (2024). “Examining the Use and Impact of an AI Code Assistant on Developer Productivity and Experience in the Enterprise.” CHI Conference on Human Factors in Computing Systems. IBM study with 669 developers.
Peng, S., et al. (2024). “The Impact of AI on Developer Productivity: Evidence from GitHub Copilot.” MIT, Princeton, and University of Pennsylvania study with 4,867 developers across Microsoft, Accenture, and Fortune 100 companies.
METR (2025). “Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity.” Study of 16 experienced developers from large open-source repositories (averaging 22k+ stars).
Fu, M., et al. (2024). “Security Weaknesses of Copilot-Generated Code in GitHub Projects: An Empirical Study.” ACM Transactions on Software Engineering and Methodology. Analysis of 733 code snippets from GitHub projects.
GitClear (2024). “AI Copilot Code Quality: 2025 Data Suggests 4x Growth in Code Clones.” Analysis of 211 million changed lines of code from Google, Microsoft, Meta, and enterprise repositories (2020-2024).
Sergeyuk, A., et al. (2024). “Using AI-based coding assistants in practice: State of affairs, perceptions, and ways forward.” Information and Software Technology. Survey of 481 experienced developers.
Osmani, A. (2024). “Avoiding Skill Atrophy in the Age of AI.” Discussion of cognitive offloading and maintaining developer expertise.
Ashktorab, Z., et al. (2024). “The Evolution of Information Seeking in Software Development: Understanding the Role and Impact of AI Assistants.” Research on trade-offs between productivity and expertise building.
Pearce, H., et al. (2022). “Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions.” IEEE Symposium on Security and Privacy. Foundational security analysis of AI-generated code.
Apiiro (2024). Security research showing AI-generated code introduced 322% more privilege escalation paths and 153% more design flaws.
So what?
The research reviewed here seems to support a nuanced perspective on coding assistants: they are neither universally good nor bad, but powerful tools that require thoughtful deployment. Key takeaways include:
Productivity gains have been observed, challenged, and appear context-dependent: most beneficial for junior developers and unfamiliar domains, less so for experienced developers in known codebases.
Learning and skill development are absolutely at risk: AI assistants can create illusions of understanding and lead to skill atrophy if not used mindfully.
Code quality requires vigilance: AI-generated code frequently contains security vulnerabilities and quality issues that demand robust review processes. “Move fast and break stuff” leveled up with agents.
Strategic use is essential: Developers and organizations must establish clear guidelines about when and how to use AI assistance. Guidelines must have a nuanced stance that acknowledges where it adds value and where it produces a false sense of productivity.
The future requires balance: Success might lie in using AI to augment rather than undermine human capabilities, while maintaining core competencies through deliberate practice.
The cat is indeed out of the bag—these tools are here, are currently being used extensively and adoption is accelerating. The question for most is not whether to use them, but how to use them responsibly while preserving the essential skills and rigor that software development requires.