Google Jules review: the async agent that patches its own CI failures
Independent and tested. Some links are affiliate links — they never change our verdict.
How we evaluated
We tested Google Jules in June 2026 on the free tier (5 tasks/month) with three tasks: (1) fixing a bug with a clear GitHub issue and reproduction steps, (2) adding a new API endpoint from a written spec comment, and (3) writing a test suite for an existing Python module. We also deliberately exercised the CI auto-fix feature by letting Jules open a PR on a repo whose initial code had a known type error that fails CI.
Key takeaways
- → Async agent: assign → VM clones repo → writes + tests → opens PR. No babysitting.
- → CI auto-fix: watches the pipeline after opening a PR; if tests fail, it patches and re-pushes automatically.
- → Free tier: 5 tasks/month — very limited for daily use.
- → Paid tier: 100 tasks/day, 15 concurrent agents.
- → Best on specific tasks: bugs with reproduction steps, endpoints from specs, test suites.
- → Struggles with large codebases, complex integration tests, ambiguous open-ended tasks.
5 free tasks/month · 100 tasks/day on paid · Auto CI failure fixes
Google Jules is the most complete async coding agent available for free in 2026, and its CI auto-fix feature is genuinely useful and sets it apart from competitors. Whether the 5-task free tier makes it usable for daily work is a harder question, and the answer, for most developers, is no.
Jules occupies the same category as OpenAI Codex: delegate a task, get back a pull request. The key differentiators are the CI auto-fix loop, which Codex did not offer as of June 2026, and the availability of a free tier (even if constrained at 5 tasks/month). For teams with comprehensive test suites, Jules' ability to keep iterating until CI passes is a genuine time-saver.
The CI auto-fix loop: Jules' best feature
Jules opens a PR and then watches your CI pipeline. If a test fails, Jules reads the error, applies a fix, and pushes a new commit to the same PR — automatically, without you intervening. In our June 2026 test with a deliberate type error, Jules caught the TypeScript CI failure, corrected the type annotation, and pushed the fix in 4 minutes. The PR was green and review-ready without us touching it after the initial task assignment.
This is the feature that separates Jules from simpler AI tools that only generate a code change. Because of the CI loop, Jules does not just produce code that looks right; it verifies that the code actually passes your automated quality gates. For a team with thorough CI, Jules' output is more trustworthy than a first-pass AI suggestion that has never been run.
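The loop described above can be sketched in a few lines. This is a minimal illustration of the general "run CI, read the failure, patch, re-run" pattern, not Jules' actual implementation: `CiResult`, `ci_fix_loop`, `run_ci`, `propose_fix`, and the `max_iterations` default of 3 are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CiResult:
    passed: bool
    log: str = ""


def ci_fix_loop(
    run_ci: Callable[[], CiResult],
    propose_fix: Callable[[str], None],
    max_iterations: int = 3,  # assumed cap; the real limit is not documented here
) -> tuple[bool, int]:
    """Run CI; on failure, hand the log to a fixer and re-run.

    Returns (passed, fixes_applied).
    """
    fixes_applied = 0
    result = run_ci()
    while not result.passed and fixes_applied < max_iterations:
        propose_fix(result.log)  # hypothetical: agent edits code and pushes a commit
        fixes_applied += 1
        result = run_ci()        # CI runs again on the new commit
    return result.passed, fixes_applied


# Simulated pipeline with a single type error, similar in spirit to our test repo.
state = {"type_error": True}


def fake_ci() -> CiResult:
    if state["type_error"]:
        return CiResult(passed=False, log="type error in handler")
    return CiResult(passed=True)


def fake_fix(log: str) -> None:
    state["type_error"] = False  # pretend the agent corrected the annotation


print(ci_fix_loop(fake_ci, fake_fix))  # (True, 1)
```

The point of the sketch is the stopping condition: the loop ends either when CI passes or when the iteration budget is spent, which is why the quality of the result depends so heavily on how thorough the CI checks are.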
What Jules handles — and where it fails
In our June 2026 testing: bug fix with clear reproduction steps — completed correctly, PR open in 7 minutes. New API endpoint from a detailed spec — working code, tests passing, 11 minutes. Test suite for a Python module — comprehensive, correct, 16 minutes. All three without intervention after task assignment.
Where it failed: an open-ended "improve the performance of the data pipeline" request produced a PR with questionable changes and no clear performance measurement. A task requiring a running PostgreSQL container in tests hit sandbox limitations Jules could not recover from. The pattern is consistent: Jules is strong on well-defined tasks with objective success criteria (tests pass / don't pass), and weak on tasks requiring judgment about what "better" means.
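The pattern above can be turned into a rough pre-delegation checklist. The sketch below is a hypothetical triage helper that encodes only the failure modes we observed; `TaskProfile` and `delegation_risks` are illustrative names, not part of Jules or any Google API.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    has_objective_check: bool     # e.g. reproduction steps, a written spec, or tests
    needs_external_service: bool  # e.g. a running PostgreSQL container in tests
    spans_many_modules: bool      # large change across interdependent modules


def delegation_risks(task: TaskProfile) -> list[str]:
    """Return the reasons a task is likely to go badly when delegated."""
    risks = []
    if not task.has_objective_check:
        risks.append("no objective success criterion")
    if task.needs_external_service:
        risks.append("tests need an external service")
    if task.spans_many_modules:
        risks.append("change spans many interdependent modules")
    return risks


bug_fix = TaskProfile(has_objective_check=True,
                      needs_external_service=False,
                      spans_many_modules=False)
perf_request = TaskProfile(has_objective_check=False,
                           needs_external_service=False,
                           spans_many_modules=False)

print(delegation_risks(bug_fix))       # []
print(delegation_risks(perf_request))  # ['no objective success criterion']
```

An empty list does not guarantee success; it only means the task avoids the specific problems we ran into.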
The free tier reality: 5 tasks is not enough
Jules' free Starter tier gives 5 tasks per month. A typical developer working with Jules on routine tasks — bug fixes, test coverage, small features — could use 5 tasks in a single morning. The free tier is genuinely useful for evaluation, not for production integration. Paid plans offer 100 tasks/day with 15 concurrent agents — check jules.google for current pricing.
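To put numbers on that, here is a small back-of-the-envelope calculation. The quota figures come from this review; the usage rate of 5 tasks per day and the 21 working days per month are assumptions for illustration only.

```python
FREE_TASKS_PER_MONTH = 5  # free Starter tier, from this review
PAID_TASKS_PER_DAY = 100  # paid tier, from this review


def days_until_quota_exhausted(tasks_per_day: float,
                               monthly_quota: int = FREE_TASKS_PER_MONTH) -> float:
    """How many days a monthly quota lasts at a steady daily usage rate."""
    if tasks_per_day <= 0:
        raise ValueError("tasks_per_day must be positive")
    return monthly_quota / tasks_per_day


def monthly_demand(tasks_per_day: float, working_days: int = 21) -> float:
    """Total tasks wanted in a month (21 working days is an assumption)."""
    return tasks_per_day * working_days


# Assumed usage: 5 delegated tasks per working day.
print(days_until_quota_exhausted(5))             # 1.0
print(monthly_demand(5))                         # 105
print(monthly_demand(5) / FREE_TASKS_PER_MONTH)  # 21.0
```

At that assumed rate the free quota lasts one day and covers about 1/21 of a month's demand, while the same usage stays far below the paid tier's 100 tasks/day ceiling.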
Jules vs OpenAI Codex: the async agent comparison
| | Jules | Codex |
|---|---|---|
| Model | Gemini 2.5 Flash | GPT-4o / o3 / o4-mini |
| Free tier | 5 tasks/month | Limited (ChatGPT Pro needed) |
| CI auto-fix | Yes — watches + patches | Not yet (as of June 2026) |
| Best tasks | Bugs, tests, clear specs | Broader task range, complex refactors |
| Entry price | Free (5 tasks) | $20/mo ChatGPT Pro |
| Concurrent | 15 (paid) | Not disclosed |
Verdict
Jules is the best async coding agent available for free in 2026, and the CI auto-fix loop is a genuine innovation that saves real time on teams with solid test coverage. The 5 tasks/month free limit keeps it in "evaluation" territory for most developers: you cannot build a real workflow on 5 tasks, but you can determine whether Jules fits your team's use case.
For teams choosing between Jules and Codex: Jules wins on CI integration and the free tier; Codex wins on task breadth and more predictable performance on complex changes. They are close enough that testing both free tiers on your actual codebase and task types is the right call before committing to a paid plan on either.
Try Jules free
5 free tasks/month at jules.google — no credit card. Connect your GitHub repo and delegate a real bug fix to test CI auto-fix.
Try Jules at jules.google →
Prefer interactive?
Jules is async. For interactive agent work in the terminal with a 1M context window, Claude Code is the alternative.
Read the Claude Code review →
FAQ
What is Google Jules?
Google Jules is an asynchronous AI coding agent at jules.google. You assign it a task — a GitHub Issue, a bug fix, a feature spec — and it works independently in an isolated VM: clones your repo, writes code, runs your test suite, and opens a pull request. If CI fails on the PR, Jules automatically analyzes the error and pushes a fix. Free tier: 5 tasks/month.
How is Jules different from Google Antigravity?
Jules is purely async: you delegate a task and get a PR. You are not involved between assignment and review. Antigravity is interactive: you work alongside agents in an IDE, desktop app, or terminal. Jules is for delegation of specific well-scoped tasks. Antigravity is for active, interactive development sessions. Both are Google products targeting different workflows.
How does Jules compare to OpenAI Codex?
Both are async cloud agents that work from GitHub and return pull requests. Jules uses Gemini 2.5 Flash; Codex uses GPT-4o/o3. Jules has a free tier (5 tasks/month) vs Codex which requires ChatGPT Pro ($20/mo) for meaningful access. Jules auto-fixes CI failures; Codex has deeper GitHub issue parsing and broader task success on complex multi-file changes based on current reviews.
Is Google Jules free?
Jules has a free Starter tier with 5 tasks per month — no credit card required. Paid plans give 100 tasks/day with 15 concurrent tasks. Pricing for the paid tier was not publicly confirmed at the time of this review — check jules.google for current plans. 5 tasks/month is extremely limited for daily professional use.
What tasks is Jules best at?
Jules performs best on specific, well-scoped tasks: a concrete bug with a clear reproduction case, a new API endpoint from a detailed spec, writing tests for an existing module. It underperforms on ambiguous open-ended tasks, complex integration tests requiring real external services (databases, message queues), and large codebase changes spanning many interdependent modules.
Does Jules fix its own CI failures?
Yes — this is Jules' standout feature. After opening a PR, Jules watches the CI pipeline. If tests or builds fail, it automatically reads the error output, applies a fix, and pushes a new commit to the same PR — without you needing to intervene. The loop continues until tests pass or Jules runs out of iterations. Particularly useful for projects with comprehensive automated test suites.
Compare: OpenAI Codex (the other async cloud agent). More Google tools: Antigravity (interactive) · Gemini Code Assist (enterprise). See all in best AI code editors.