<p>Artist and professor Greg Borenstein takes us through a sophisticated new technique for intelligently removing pixels from an image.</p>
When resizing images, you’re normally faced with a choice: stretch, scale, or crop. But what if you could alter an image’s proportions in a way that respected its content, altering the least noticeable parts of the image rather than cropping from the edges or stretching the whole thing? This is exactly the capacity provided by a technique called Seam Carving.
Seam Carving was originally published in a 2007 paper by Shai Avidan and Ariel Shamir. Here’s Avidan and Shamir’s video explaining the technique and demonstrating some of its applications:
In this tutorial, I’ll explain how Seam Carving works and introduce you to a Processing library that will let you use Seam Carving in your own projects.
How Does Seam Carving Work?
As you’ve seen in Avidan and Shamir’s video, Seam Carving can be used for multiple applications: shrinking images, expanding them, eliminating content, etc. In this tutorial we’re going to focus on the first and most straightforward of these: shrinking images.
Using Seam Carving to shrink an image is a three step process:
1. Define the important parts of the image that we want to preserve.
2. Use Dynamic Programming to calculate the costs of every potential path through the image.
3. Follow the cheapest path to remove one pixel from each row or column to resize the image.
Following these steps will shrink the image by one pixel in our chosen dimension (depending on whether we’re making the image shorter or skinnier). We can repeat the process as many times as we want to resize the image as much as necessary. And, at the end of this post, I’ll point you towards examples of how we can pre-calculate these seams in order to enable interactive content-aware resizing or even to apply it to video.
1. Defining the Important Parts of an Image
What are the most important parts of an image? Which parts of a given image should we eliminate first when resizing and which should we hold onto the longest? There are many ways to answer this question. In the original paper, Avidan and Shamir explore a variety of them including eye tracking to detect which parts of the image people actually look at and an interactive mode that lets creators paint in the areas they want to preserve (about which more later).
However, the simplest and frequently most-effective answer is the gradient of the image. An image gradient highlights the parts of the original image that are changing the most. It’s calculated by looking at how similar each pixel is to its neighbors. Large uniform areas of the image will have a low gradient value and the more interesting areas—edges of objects, places with a lot of detail, etc.—will have a high value.
For example, given this original image:
the gradient image will look like this (where brighter pixels represent higher value areas):
The solid green areas of the curtain and the purple floor under the chairs are clearly low value. Whereas the edges of each figure’s head are the most valuable.
This is just right for content-aware resizing.
(Additionally, as mentioned above, you could alter this cost image manually to achieve various effects. For example in his Short People video, Scott Garner altered the weight image to protect the head and shoulders of figures so that Seam Carving would distort their height to hilarious results.)
2. Calculating the Cost Grid: Dynamic Programming
Now that we’ve calculated the value of each pixel, we’re ready to start trying to remove pixels from the image to shrink it down. This time, we’ll be resizing the image horizontally so we need to remove one pixel from each row of the image. And we like the pixels that we remove to be arranged into a single connected row. This will prevent any disjunctions in the image: areas where a whole piece of the image seems to have been moved over by multiple pixels, something that definitely catches the eye.
Put another way, our goal is to find a path of connected pixels from the bottom of the image to the top. By “connected," I mean that we’ll never jump more than one pixel left or right as we move up the image from row to row.
And we’re looking for a very specific path: the one who’s pixels have the lowest total value. In other words, we want to find the path of connected pixels from the bottom to the top of the image that touches the darkest pixels in our gradient image.
One way to do this would be to simply calculate the costs of each possible path through the image one-by-one. Start with a single pixel in the bottom row, navigate every path from there to the top of the image, keep track of the cost of each path as you go.
However, there are a lot of paths from each pixel to the top of the image. Take a look at this diagram:
The arrows represent each possible move from one row to the next. From just that one pixel there are hundreds of paths through just those three rows. If you played this out for every pixel on the bottom row all the way through the image you’d end up with an enormous number of paths, an amount that would take us far too long to actually compute. Calculating the cost of each possible path one at a time is totally impractical.
To avoid this conundrum, we’ll use a technique called Dynamic Programming. Dynamic Programming lets us touch each pixel in the image only once, aggregating the total cost as we go, in order to calculate the final cost of an individual path.
Instead of starting at the bottom of the image and working up recursively through each possible path, we start at the top and work our way down, adding the cost of the cheapest above neighbor to each pixel. This way, we accumulate cost as we go: setting the value of each pixel not just to its own cost, but to the full cost of the cheapest path from there to the top of the image.
This effect is easy to see if you visualize these costs as an image:
Notice how the spots where the gradient image was brightest are now the roots of inverted triangles of brightness as the cost of those pixels propagate into all of the pixels within the range of the widest possible paths upwards.
When we arrive at the bottom row, the lowest-valued pixel will necessarily be the root of the cheapest path through the image. Now we’re ready to start removing seams.
3. Removing Seams
Due to the power of Dynamic Programming, the process of actually removing seams is quite easy. All we have to do to calculate the cheapest seam is to start with the darkest pixel in the bottom row and work our way up from there, selecting the cheapest of the three adjacent pixels in the above row. Dynamic Programming guarantees that the pixel with the lowest value will be the root of the cheapest connected path up from there.
Once we’ve selected which pixels we want to remove, all that we have to do is go through and copy the remaining pixels on each side into a new image that’s one pixel narrower than our original. And voila, we’ve carved a seam.
Seam Carving in Processing
As part of Makematics, a class I teach at NYU ITP in converting computer science research into creative tools, I built a Dynamic Programming library for Processing. This library includes an implementation of Seam Carving that makes the technique really easy to work with in Processing.
You can download the library: HERE.
One of the examples included with the library demonstrates Seam Carving (I’ve also included the source for that example, which includes comments explaining the details at the bottom of this post).
I’ve also posted the sketch I created during the New Cinema hackathon that adds a cache to the Seam Carving results so that you can resize an image interactively in real time.
Greg Borenstein is an artist and technologist in New York. He’s currently a professor at NYU’s Interactive Telecommunications Program and recently the author of Making Things See: 3D Vision with Kinect, Processing, Arduino and Makerbot from O’Reilly.