I was one of the first beta testers of Devin (I know everyone loved him :D), so I thought: why not compare it with the shiny new Manus? It sounded like a good show to watch. So I ran a little side quest this weekend and put two of the most hyped AI agents head-to-head: Devin AI vs. Manus AI.
I compared them on four different tasks: building an educational webpage, summarizing research projects, analyzing public sentiment, and making an interactive political simulation (I got these example tasks from Manus's website).
But the best part: after the comparisons, I asked Devin to analyze and rate how overhyped Manus is... and let's just say Devin didn't hold back. I've included the "evaluation" chart it generated :D
Here is the link to Devin's analysis: https://manus-ai-evaluation-website-v5gb3spi.devinapps.com/
My quick takeaways:
- Manus: Honestly much better at creative, general-purpose stuff. Definitely the show-off in the room (mostly thanks to some Claude magic).
- Devin: More structured and solid when it comes to technical, detailed tasks, but struggles when things get a bit... general.
Both have their strengths, both feel a little overhyped in different ways.
If anyone wants the full breakdown of what I tested and how each agent did, here is the link to the post; I just wanted to share it.
For context, as a researcher my usual work is on LLM SWE agents and their evaluation. I've analyzed SWE-bench and similar benchmarks to explore how we can better measure and improve agent performance on real-world tasks. So if you're working in a similar field, happy to chat!
I recently got full access to Manus itself, so I'm now planning a v2 of this comparison. If you have any good ideas, let me know! :D Maybe there's a way to pit them against each other head-to-head, in real time.