WP-Bench: how much does AI know about WordPress

WP-Bench is a new benchmark tool to evaluate how artificial intelligence models understand and work with the WordPress ecosystem.

Remember that you can listen to this program from Pocket Casts, Spotify, and Apple Podcasts or subscribe to the feed directly.

Program transcript

Hello, I’m Alicia Ireland, and you’re listening to WPpodcast, bringing the weekly news from the WordPress Community.

In this episode, you’ll find the news from January 12 to 18, 2026.

WP-Bench is a new benchmark tool to evaluate how artificial intelligence models understand and work with the WordPress ecosystem.

It is designed to measure WordPress-specific capabilities, not just generic programming tasks, helping developers and the community understand which models perform best with key WordPress concepts, APIs, coding standards, and security practices. The goal is to create an open and visible standard that enables model comparisons and helps guide integration decisions in tools and plugins.

The WP-Bench evaluation system covers two dimensions: knowledge, tested with multiple-choice questions about WordPress, and execution, tested with code generation tasks that are evaluated through static analysis and tests in a real WordPress environment, using WordPress core itself as the scoring mechanism.

Although it is an early version, with a still limited test suite and some bias toward newer APIs such as Abilities and Interactivity, the project aims to grow with community contributions and include a public leaderboard for transparency and ongoing comparison across models.

The weekly Core AI team meeting focused on identifying work that can move forward immediately and clarifying priorities for WordPress 7.0.

A substantial part of the meeting was dedicated to the AI Experiments plugin, including planning for version 0.2 and prioritizing features such as Abilities Explorer and excerpt generation.

In relation to bringing AI features into core and preparations for WordPress 7.0, they discussed the need to plan ahead for merge proposals for the WordPress AI client and the deprecation of backward compatibility in the Abilities API.

Regarding the PHP AI client, provider rewrites have been included and improvements are planned, such as a caching layer, and there was interest in exploring support for local AI providers.

On the Developer Blog, there is a “What’s new” recap, including the release of Gutenberg 22.3, which now includes a new Fonts screen, support for PHP-only blocks, and early additions of features that will land in WordPress 7.0.

The Playground team has announced that the tool has dropped support for PHP 7.2 and 7.3 and now requires PHP 7.4 as a minimum.

Blueprints that still request older versions are automatically migrated to 7.4, with a warning, and will continue working without the user needing to take immediate action.

And finally, this podcast is distributed under a Creative Commons license as a derivative version of the podcast in Spanish; you can find all the links for more information, and the podcast in other languages, at WPpodcast.org.

Thanks for listening, and until the next episode!

122. WP-Bench: how much does AI know about WordPress
