Like the rest of the internet, I’ve been awestruck by the quality of Google’s Veo 3 AI video generator (even with audio). As you’ve seen from my posts, the American bison 🦬 is my favorite animal (aside from my cat of course). Also, perhaps I’ve watched too many videos from the Mustache Farmer, but I realized I could use Veo 3 to realize a fantasy of being able to pal around with what in reality is a wild animal. Generating bison videos brought me a lot of joy so I had to share them. However, I faced a dilemma because I didn’t want to do so via WordPress’s Video block in its current state: it suffers from a bad case of layout shiftitus. Since web performance is an even greater passion of mine than the bison, before I could share the videos I did a deep dive on the problem and came up with a fix. (Skip to the videos below if you don’t care.)
There’s a 2-year-old Gutenberg issue (#52185) which reported the underlying problem here: the Video block should have `width` and `height` attributes added to prevent layout shifts. Such jank causes a poor user experience and negatively impacts the Cumulative Layout Shift (CLS) metric of Core Web Vitals (CWV).
A dimensionless `video` element gets a 2:1 aspect ratio placeholder (a 300×150 default object size) until the video’s metadata is loaded, at which time a layout shift happens due to the new dimensions being applied. When there’s a `poster` attribute, the placeholder dimensions get replaced with the dimensions of the poster image once it loads, also resulting in a layout shift; lastly, if the poster image doesn’t have the exact same dimensions as the video, then a second layout shift occurs once the video starts playing.
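To put rough numbers on that first shift, here is a small sketch of the arithmetic. It assumes the video fills a 600px-wide container and scales proportionally; the function name, the container width, and the 720×1280 portrait video are illustrative values of mine, not measurements from the test page.

```typescript
// Height of a box of the given width that keeps the aspect ratio
// implied by the intrinsic width and height.
function renderedHeight(
  boxWidth: number,
  intrinsicWidth: number,
  intrinsicHeight: number
): number {
  return boxWidth * (intrinsicHeight / intrinsicWidth);
}

const containerWidth = 600; // assumed container width in CSS pixels

// Before the metadata loads: the 300x150 default object size (2:1).
const placeholderHeight = renderedHeight(containerWidth, 300, 150);

// After the metadata loads: a hypothetical 720x1280 portrait video.
const loadedVideoHeight = renderedHeight(containerWidth, 720, 1280);

// Everything below the video is pushed down by the difference.
const shiftInPixels = Math.round(loadedVideoHeight - placeholderHeight);

console.log(placeholderHeight, Math.round(loadedVideoHeight), shiftInPixels);
```

With these assumed values the placeholder is 300px tall and the loaded video is roughly 1067px tall, so the content below jumps by about 767px.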
Take a look at these screen recordings of a Video block without a poster and then with a poster provided (and there’s a script on this test page that starts video playback after 4 seconds):
The layout shifts are somewhat exaggerated here because:
- A vertical/portrait video is used.
- A network delay is added to slow down the loading of the poster and video (which is not unexpected on a mobile connection).
- The poster image doesn’t have the same dimensions as the video.
In any case, such layout shifts seem to occur to some degree anywhere the Video block is used.
CLS Passing Rates
Layout shifts from the Video block contribute to WordPress overall having a relatively poor passing rate for CLS. On desktop, 71% of WordPress origins have a good CLS passing rate, while on mobile the passing rate is 82%. (CLS is worse on desktop presumably because more content is on the screen at a time, meaning there are more opportunities for layout shifts to appear in the viewport.) When evaluating these CLS passing rates in terms of academic letter grades, WordPress is getting a B− on mobile and a C− on desktop. When comparing WordPress to other popular CMS platforms, it ranks near the bottom with only Joomla performing worse, as seen in this table sorted by desktop:
Technology | Desktop | Mobile |
---|---|---|
Wix | 91% (A−) | 94% (A) |
Shopify | 82% (B−) | 90% (A−) |
Squarespace | 77% (C+) | 88% (B+) |
Drupal | 72% (C−) | 85% (B) |
(Any) | 72% (C−) | 79% (C+) |
WordPress | 71% (C−) | 82% (B−) |
Joomla | 69% (D+) | 79% (C+) |
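For reference, the letter grades can be reproduced with a small lookup. The cutoffs below are an assumption on my part (the common US plus/minus scale); they are not stated anywhere above, but they do match every grade in the table.

```typescript
// Assumed cutoffs: the common US plus/minus grading scale.
// Each entry is [minimum percentage, grade], ordered from highest to lowest.
const gradeCutoffs: Array<[number, string]> = [
  [97, "A+"], [93, "A"], [90, "A-"],
  [87, "B+"], [83, "B"], [80, "B-"],
  [77, "C+"], [73, "C"], [70, "C-"],
  [67, "D+"], [63, "D"], [60, "D-"],
];

// Returns the first grade whose minimum the passing rate reaches, else "F".
function letterGrade(passingRatePercent: number): string {
  for (const [minimum, grade] of gradeCutoffs) {
    if (passingRatePercent >= minimum) {
      return grade;
    }
  }
  return "F";
}

console.log(letterGrade(82), letterGrade(71)); // WordPress: mobile, desktop
```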
[Graphs: origins with good CLS over time, for desktop and mobile]
The passing rates for WordPress are unsurprisingly very close to the passing rates for the web overall (any technology) since WordPress has the largest market share by far. Whenever WordPress performs badly, the web as a whole suffers. Whenever WordPress performs well, the web as a whole improves. This was the drive behind my “scaled activation” Chrome team at Google when I was sponsored there to work on WordPress performance full time.
Now, CLS in WordPress is not nearly as problematic as Largest Contentful Paint (LCP), which is getting an F grade for its passing rates of 54% on mobile and 65% on desktop. Because of this, improving LCP has been the primary focus for us on the WordPress Core Performance Team, and the metric has improved thanks in part to adding `fetchpriority=high` to LCP-probable `img` tags, adjusting image lazy-loading heuristics, optimizing the emoji loader, and most recently landing Speculative Loading. And work continues on improving LCP, for example, by deprioritizing non-critical scripts and by leveraging client-side metrics to more accurately prioritize images via the Optimization Detective project (see also my talk).
The other CWV metric, Interaction to Next Paint (INP), is in relatively great shape with a passing rate of 85% on mobile (B) and 98% on desktop (A+).
So, in parallel with the continued work to improve LCP, it’s important to not neglect WordPress’s sub-optimal CLS passing rate. Prior work to improve CLS included adding `width` and `height` attributes to `img` tags for the sake of lazy-loading. There’s also a ticket (#59119) to measure CLS in performance tests. Additionally, a key feature of the Embed Optimizer extension to the aforementioned Optimization Detective plugin is the reduction of layout shifts caused by embeds that resize when they load. This is commonly seen in embeds for Twitter, Bluesky, and WordPress itself. Embed Optimizer keeps track of these embeds’ resized heights. Then, with these resized heights stored, Embed Optimizer sets the appropriate height on the container `figure` element as the viewport-specific `min-height` so that when the embed loads any layout shift is minimized.
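The general idea of a viewport-specific `min-height` can be sketched as follows. This is not Embed Optimizer’s actual code: the data shape, the selector, the function name, and the example heights are all hypothetical and only illustrate how stored heights could turn into media-query rules.

```typescript
// Hypothetical shape for an embed height recorded within a viewport width range.
interface RecordedHeight {
  minViewportWidth: number;        // inclusive lower bound, in px
  maxViewportWidth: number | null; // inclusive upper bound, or null for no upper bound
  height: number;                  // resized embed height, in px
}

// Emits one media-query rule per recorded height, reserving that much
// vertical space on the container before the embed has loaded.
function buildMinHeightRules(selector: string, heights: RecordedHeight[]): string {
  return heights
    .map((entry) => {
      const conditions = [`(min-width: ${entry.minViewportWidth}px)`];
      if (entry.maxViewportWidth !== null) {
        conditions.push(`(max-width: ${entry.maxViewportWidth}px)`);
      }
      return `@media ${conditions.join(" and ")} { ${selector} { min-height: ${entry.height}px; } }`;
    })
    .join("\n");
}

const embedCss = buildMinHeightRules("#embed-1", [
  { minViewportWidth: 0, maxViewportWidth: 480, height: 620 },
  { minViewportWidth: 481, maxViewportWidth: null, height: 540 },
]);
console.log(embedCss);
```

Using `min-height` rather than `height` means the container can still grow if the embed turns out taller than the stored value.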
Lastly, coming back to the impetus of this post, there’s the issue of layout shifts in the Video block.
Fixing the Video Block
Preventing layout shifts in the Video block is straightforward. As described in the Gutenberg issue, the `width` and `height` attributes need to be supplied on the `video` tag, although a bit more is needed than just that. When a video is uploaded to the Media Library, the metadata is obtained via the `wp_read_video_metadata()` function, including its width and height. Assuming that reading the metadata was successful, these dimensions can then be injected into the `video` tag in the same way as dimensions are being added to the `img` tag. (For external videos added by URL not uploaded into the Media Library, the dimensions could be read client-side in the block editor or they could be gathered via Optimization Detective on the frontend.)
This goes full circle for me because we did something similar to add dimensions to videos in the AMP plugin when converting from the `video` tag to the `amp-video` component. The AMP HTML spec mandates (generally) that all elements must have dimensions supplied to prevent layout shifts, as the first of AMP’s design principles is to prioritize the user experience. As I mentioned in my recent contribution retrospective, AMP predates CWV; as part of contributing back lessons learned from AMP to the whole web, it “invested in defining additional metrics that would paint a more holistic image of user perceived performance.” This included a “Layout Stability” metric which came to be known as CLS.
In addition to providing the `width` and `height` attributes, for the `video` to scale to fit its container and to have the correct aspect ratio, the element needs to be styled with `height:auto` since it has `width:100%`. Finally, because of an issue in the CSS spec (as highlighted by Jake Archibald), the width and height need to be replicated in an `aspect-ratio` style. This currently has to be added as an inline style since the following desired use of the `attr()` function in a style rule is currently only supported in Chromium:
```css
.wp-block-video video[width][height] {
	aspect-ratio:
		attr(width type(<number>)) /
		attr(height type(<number>));
}
```
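Until `attr()` works everywhere, the inline style has to be computed from the known dimensions. Here is a minimal sketch of that step; the function name and the exact style string are my own illustration and not necessarily what the pull request emits.

```typescript
// Builds the inline aspect-ratio declaration from the video's dimensions.
// Returns an empty string when the dimensions are missing or invalid
// (for example, when reading the video metadata failed).
function aspectRatioInlineStyle(width: number, height: number): string {
  if (!(width > 0) || !(height > 0)) {
    return "";
  }
  return `aspect-ratio: ${width} / ${height};`;
}

// Example: a 720x1280 portrait video.
const portraitStyle = aspectRatioInlineStyle(720, 1280);
const portraitTag =
  `<video width="720" height="1280" style="${portraitStyle}" controls></video>`;
console.log(portraitTag);
```

The `height:auto` rule itself can stay in the stylesheet, since it does not depend on the individual video.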
I’ve submitted the fix in Gutenberg pull request #70293: Fix layout shift caused by video tag in Video block lacking width and height.
And since I didn’t want to wait for that fix to be merged and available in a new Gutenberg release, I also adapted it into a standalone Layout-stabilized Video Block plugin which is active here on my blog (since I wanted to share those AI-generated bison videos!). Please install the plugin on your site and test how it works with your Video blocks.
Compare the above screen recordings of layout-shifting Video blocks with the following screen recordings where the fix is applied:
Aside: Command used for transcoding screen recordings
I was really impressed with how well FFmpeg compressed the original QuickTime screen recordings from ~32MB down to just ~400KB, and since the args to `ffmpeg` are always something I have to re-discover, here’s the command for (my) future reference:
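```bash
# Transcode every .mov screen recording in the current directory to MP4:
# scale to 1080px tall (width rounded to an even number), encode as H.264,
# drop the audio track (-an, so no audio bitrate is needed), and move the
# metadata to the front of the file so playback can start sooner.
for video in *.mov; do
  ffmpeg \
    -i "$video" \
    -vf "scale=-2:1080" \
    -c:v libx264 \
    -preset medium \
    -crf 30 \
    -an \
    -movflags +faststart \
    "${video%.mov}.mp4"
done
```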
You can try loading this post without the fix by adding a special query var to the URL (and then keep hard reloading to see more layout instability).
And now, with the layout shift fixed, let’s get to the bison videos.
AI Bison Videos
The following Veo-generated videos were downloaded from Gemini and uploaded without any transcoding, although they are encoded quite well for the web at ~3MB each for 8 seconds of high quality 720p video with audio. I manually selected a representative frame of each video to create the poster images.
This first video leans a little too hard into the “mustache” of the Mustache Farmer:
I like how this guy seemingly pretends he didn’t know where the bison was in the open field, “Oh, there you are buddy!”:
Awkward running:
A somewhat less awkward run:
There’s an invisible stirrup in this one:
Apparently Tom Cruise with maniacal laughter and a magically-appearing saddle:
Just heartwarming:
The guy’s smile is a little intense in this one, but it’s also heartwarming:
This guy seems a little fake (almost like he’s AI):
Nice purring sound effect, followed by flapping bird wings?
Cozy, but the cat glitches a bit at the end:
Now I’m going to go pet my real cat. 😸