Andrew Ng, a prominent figure in the field of Artificial Intelligence, recently gave a public presentation in which he likens AI to “the new electricity,” a general-purpose technology with the power to unlock countless new applications. While much of the recent buzz in AI has centered on foundation models and semiconductors, Ng emphasizes that the true value and biggest opportunities lie in the application layer. Among the myriad advancements in AI, Ng is most excited about a specific technical trend: agentic AI workflows, which he considers the single most important AI technology to pay attention to.
Beyond Single-Shot Prompts: The Agentic Revolution
The typical way most users interact with large language models (LLMs) today is through “zero-shot prompting”, asking the AI to generate a complete output in one go, much like writing an essay from start to finish without backspacing. While LLMs perform surprisingly well this way, Ng highlights that humans don’t do their best work like this, and neither do AIs.
This is where agentic workflows come in. According to Ng, an agentic workflow is an iterative process where an AI system breaks down a complex task into smaller steps, performs research, drafts, critiques its own output, and revises, often looping through these stages multiple times. This approach, akin to a human thinking, researching, and revising, takes longer but results in significantly better and more robust outputs. For example, a legal team could use an agentic workflow to process complex documents, or a healthcare system could leverage it for diagnostic assistance.
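The draft, critique, and revise loop described above can be sketched in a few lines. This is a minimal illustration, not Ng's implementation: `agentic_loop`, `fake_llm`, the prompt wording, and the "OK" approval convention are all assumptions made for the example, and `fake_llm` is a stand-in so the code runs without a model.

```python
from typing import Callable


def agentic_loop(task: str, llm: Callable[[str], str], max_rounds: int = 3) -> str:
    """Draft once, then critique and revise until approved or rounds run out."""
    draft = llm(f"Draft a response to: {task}")
    for _ in range(max_rounds):
        critique = llm(f"Critique this draft for the task '{task}':\n{draft}")
        # Assumed convention: the critic replies starting with "OK" when satisfied.
        if critique.strip().upper().startswith("OK"):
            break
        draft = llm(
            f"Revise the draft using this critique:\n{critique}\n\nDraft:\n{draft}"
        )
    return draft


def fake_llm(prompt: str) -> str:
    """Hypothetical offline stand-in; swap in a real model client."""
    if prompt.startswith("Draft"):
        return "v1"
    if prompt.startswith("Critique"):
        # Approve only after the draft has been revised at least once.
        return "OK" if "v2" in prompt else "Needs more detail."
    return "v2"


result = agentic_loop("summarize the contract", fake_llm)
print(result)
```

The loop trades extra model calls for quality, which is the trade-off the paragraph describes: more time per task in exchange for a better final output.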
The performance gains are stark. Ng cites the HumanEval benchmark, which measures an LLM’s ability to solve coding puzzles. While GPT-3.5 achieved 48% accuracy and GPT-4 a much improved 67% when prompted zero-shot, GPT-3.5 wrapped in an agentic workflow could achieve up to 95% accuracy. In other words, the improvement from using an agentic workflow can “dwarf” the improvement from moving to a larger, more advanced foundation model alone.
The Four Pillars of Agentic Design
Ng identifies four major design patterns that builders are using to construct agentic workflows in their applications:
- Reflection: This pattern involves an LLM critiquing its own output. For instance, a “coder agent” might generate code, then be prompted to review and critique that code, even suggesting improvements. This iterative self-correction, often combined with unit tests, significantly boosts performance from a baseline level.
- Tool Use: LLMs are prompted to decide when to make API calls, such as searching the web, executing code, or performing specific actions like issuing refunds or sending emails. This expands the capabilities of agentic workflows by allowing LLMs to interact with external systems and data. Ng notes that LLMs are now often explicitly tuned to support tool use, which creates a higher ceiling for what agentic workflows can achieve.
- Planning: For complex requests, an LLM can be prompted to devise a sequence of actions or steps to achieve the desired outcome. This allows the AI to break down a large task into manageable sub-tasks and execute them in a logical order, often involving different models or tools for each step.
- Multi-Agent Collaboration: Instead of a single LLM performing all tasks, this pattern involves prompting an LLM to play different roles at different points in time, simulating multiple agents interacting with each other to solve a task. Ng draws an analogy to multi-threading on a CPU – while it’s still one processor, abstracting tasks into different “agents” can help developers break down and solve complex problems more effectively, often leading to significantly improved performance for various tasks.
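As one illustration of the patterns above, here is a rough sketch of tool use: the model decides whether to request a tool, and the application executes the call. The JSON call format, the tool names (`issue_refund`, `search_web`), and `fake_llm` are all hypothetical; real model APIs define their own tool-calling formats.

```python
import json
from typing import Callable, Dict


def issue_refund(order_id: str, amount: float) -> str:
    """Hypothetical action tool; a real one would call a payments system."""
    return f"refunded {amount:.2f} for order {order_id}"


def search_web(query: str) -> str:
    """Hypothetical search tool returning a placeholder string."""
    return f"results for '{query}'"


# Registry of tools the application is willing to execute.
TOOLS: Dict[str, Callable[..., str]] = {
    "issue_refund": issue_refund,
    "search_web": search_web,
}


def fake_llm(user_message: str) -> str:
    """Stand-in for a tool-tuned model: emits a JSON tool call or plain text."""
    if "refund" in user_message.lower():
        return json.dumps(
            {"tool": "issue_refund", "args": {"order_id": "A17", "amount": 19.99}}
        )
    return "No tool needed."


def run_turn(user_message: str) -> str:
    reply = fake_llm(user_message)
    try:
        call = json.loads(reply)
    except json.JSONDecodeError:
        return reply  # plain-text answer, no tool requested
    if not isinstance(call, dict) or call.get("tool") not in TOOLS:
        return f"unknown tool request: {reply}"
    # Execute only tools that appear in the registry.
    return TOOLS[call["tool"]](**call.get("args", {}))


refund_result = run_turn("Please refund my order")
chat_result = run_turn("Hello there")
print(refund_result)
print(chat_result)
```

Keeping the registry on the application side means the model can only request actions and never runs them directly, which matters for tools with side effects such as refunds or emails.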
The Dawn of Visual AI Agents
Ng is particularly excited about the rise of large multimodal model (LMM) based agents, extending the power of agentic workflows beyond text to include image and video data. Just as with LLMs, LMMs can perform better with an iterative, step-by-step agentic approach compared to zero-shot prompting.
He demonstrated this with a “vision agent” capable of complex visual AI tasks. For example, by uploading an image of a soccer game and asking it to count players, the agent generated Python code to perform object detection and counting, then provided the accurate result. This capability allows businesses with vast amounts of visual data (images, videos) to extract significant value that was previously difficult to obtain. Other demonstrations included splitting video clips to find specific events (like a goal being scored) and generating metadata for video content, enabling searchable video databases. This significantly lowers the barrier to building complex visual AI applications.
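The code the agent actually generated is not shown in the talk summary. As a hedged sketch of the counting step only, assume an object detector has already returned labeled detections with confidence scores. The `Detection` type, the labels, the scores, and the 0.5 threshold are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    label: str    # class name reported by the (assumed) detector
    score: float  # confidence in [0, 1]


def count_objects(detections: List[Detection], label: str, min_score: float = 0.5) -> int:
    """Count detections of one label at or above a confidence threshold."""
    return sum(1 for d in detections if d.label == label and d.score >= min_score)


# Made-up detector output for a single frame.
detections = [
    Detection("player", 0.93),
    Detection("player", 0.88),
    Detection("ball", 0.97),
    Detection("player", 0.31),  # low confidence, filtered out
]

player_count = count_objects(detections, "player")
print(player_count)
```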
Impact on the AI Stack and Development Practices
The emergence of agentic AI is not just changing how applications are built but also evolving the AI stack itself. Ng points to a new, emerging agentic orchestration layer (like LangChain or LangGraph) that makes it easier for developers to build these complex applications.
Generative AI has made prototyping machine learning models much faster (days instead of months), but other parts of the software development process, such as product design, software integration, DevOps, and MLOps, still take time. The speed of ML prototyping is therefore putting pressure on organizations to accelerate these other pieces as well. Ng advocates for a new mantra: “move fast and be responsible,” emphasizing the ability of smart teams to prototype quickly and evaluate robustly without shipping harmful products.
A critical bottleneck in this accelerated process is evaluations (evals). In the past, collecting test data was a small additional cost on top of collecting training data. Now that LLM-based apps often require no training data, collecting thousands of test examples becomes a significant bottleneck in its own right. Ng notes that building the application and collecting data are often done in parallel rather than sequentially, and that much innovation is still needed in how evals are built. He observes that many teams delay systematic evals, but even quickly thrown-together, imperfect evals can be immensely helpful in complementing human judgment and incrementally improving systems.
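A quick, imperfect eval of the kind described above can be very small. The sketch below assumes a substring check is good enough as a first pass; `run_eval`, `toy_system`, and the three test cases are hypothetical placeholders for a real application and a real test set.

```python
from typing import Callable, List, Tuple


def run_eval(
    system: Callable[[str], str], cases: List[Tuple[str, str]]
) -> Tuple[float, List[str]]:
    """Return accuracy and the prompts whose output lacks the expected text."""
    failures = []
    for prompt, must_contain in cases:
        output = system(prompt)
        # Crude check: case-insensitive substring match.
        if must_contain.lower() not in output.lower():
            failures.append(prompt)
    passed = len(cases) - len(failures)
    accuracy = passed / len(cases) if cases else 0.0
    return accuracy, failures


def toy_system(prompt: str) -> str:
    """Hypothetical stand-in for the LLM application under test."""
    answers = {"capital of France?": "Paris", "2 + 2?": "4"}
    return answers.get(prompt, "I don't know")


cases = [
    ("capital of France?", "paris"),
    ("2 + 2?", "4"),
    ("capital of Japan?", "tokyo"),
]

accuracy, failures = run_eval(toy_system, cases)
print(f"accuracy={accuracy:.2f}, failures={failures}")
```

The list of failing prompts is often more useful than the score itself, since it points a human reviewer at the cases worth inspecting.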
Key Trends and Future Challenges
Ng identifies several crucial trends supporting the agentic AI revolution:
- Faster Token Generation: Agentic workflows require reading and generating a lot of text (tokens). Efforts to speed up token generation through semiconductor and software advancements will make agents much more efficient.
- LLMs Tuned for Tool Use: Modern LLMs are increasingly optimized not just for answering human queries but explicitly for supporting tool use and fitting into iterative agentic workflows.
- Rising Importance of Unstructured Data Engineering: With generative AI’s prowess in processing text, images, and video, managing and deploying unstructured data to create value is becoming a major effort for businesses.
- Image Processing Revolution: While the text processing revolution has already arrived, the image processing revolution is at an earlier stage but advancing rapidly, and it will significantly increase the range of possible applications.
He also touches on challenges and underrated areas:
- Bridging Business Needs to Agentic Workflows: It’s still difficult for businesses to break down existing processes into the right granularity of micro-tasks for agentic workflows. This requires a rare skill set to define steps, branches, and effective evals.
- Tactile Knowledge for Debugging: Building agents often requires “tactile knowledge” – the ability to quickly diagnose issues by looking at output traces and making informed decisions on what to do next. This skill is built through practice with various AI tools and understanding their limitations.
- Underrated Opportunities:
  - Voice Stack Applications: Ng sees massive, often underestimated, opportunities for voice-based agentic applications in enterprises. While real-time voice can be challenging due to latency, agentic voice workflows offer more control and significantly reduce user friction, as people feel more comfortable speaking than typing for many applications.
  - AI-Assisted Coding: Ng firmly believes that AI-assisted coding makes developers significantly faster and that everyone should learn to code, as it enables better instruction to computers across all job functions. He dismisses the idea that AI will automate away coding jobs, comparing it to past fears when programming languages became easier.
- MCP (Model Context Protocol): Ng sees MCP as a “fantastic first step” towards standardizing the interface for agents to plug into various tools and data sources, greatly simplifying the “plumbing” of data integrations. While early and still evolving, it promises to reduce the integration effort for N agents and M tools or data sources from N*M custom connections to N+M.
- Agent-to-Agent Communication: Though very early, the concept of agents from different teams successfully interacting is a future frontier. Currently, multi-agent systems work best within a single team, whose members share an understanding of the protocols their agents use.
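The N*M versus N+M claim in the MCP item above is simple arithmetic. In this sketch the counts of 5 agents and 8 data sources are arbitrary example values, and the function names are made up for illustration.

```python
def point_to_point(n_agents: int, m_sources: int) -> int:
    """Every agent needs its own custom connector to every source."""
    return n_agents * m_sources


def shared_protocol(n_agents: int, m_sources: int) -> int:
    """Each agent and each source implements the shared protocol once."""
    return n_agents + m_sources


n_agents, m_sources = 5, 8  # arbitrary example sizes
custom = point_to_point(n_agents, m_sources)
standard = shared_protocol(n_agents, m_sources)
print(f"custom connectors: {custom}, protocol adapters: {standard}")
```

With these values the integration count drops from 40 to 13, and the gap widens as either side grows.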
In this presentation, Andrew Ng is immensely optimistic about the future of AI agents, believing they are expanding the realm of what’s possible and opening up countless new applications. The ability to experiment and build faster than ever before makes this an exhilarating time for builders in the AI space.
