But what does it all mean? Understanding eye-tracking results (Part 5)
Part V: Time and the Heatmap
In my previous post, I mentioned that heatmaps do not have a time component. Several people have asked me to discuss this topic in a little more detail, so here we go.
Important point #1:
A heatmap represents which content was seen by a group of participants. A heatmap helps to answer the questions “Where did users look?” and “Where didn’t users look?”
Important point #2:
If you are analyzing a heatmap, you are momentarily saying “Ok, in this instant we don’t care how long people looked at things, we just want to know what they saw.”
A heatmap does not answer the questions:
• How long did someone look at my page or page element?
• Did users look away and look back?
To further clarify these points, here’s a general overview of how a heatmap is created:
Step 1: Collect data
A single participant views a web page. We record her eye movements as she browses. This gives us data which can be represented in a gaze replay animation. Now imagine that the animation is composed of a stack of stills, just like a flip book. Each page of our eye-tracking “flipbook” contains the X, Y, and time coordinates of a single fixation.
Step 2: Collapse individual data sets along the t-dimension
Keep imagining our eye-tracking data as a flipbook (a 3D data structure). Now we will collapse the stack along the time axis. In its most basic form, the calculation takes the area of a web page and notes every location where the participant fixated. By collapsing the time dimension, we remove it from all further calculations (poof!). This means that someone who fixated the center of a page for 100ms will end up with the same fixation summary as someone who stared at the center of the page for an extended period of time.
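To make the collapse concrete, here is a minimal Python sketch of the simplified version described above. The `Fixation` record, its field names, and the `collapse_time` helper are assumptions made up for illustration; they are not the data format or algorithm of any particular eye-tracking product.

```python
from typing import NamedTuple


class Fixation(NamedTuple):
    """One 'page' of the flipbook (hypothetical record layout)."""
    x: int            # horizontal pixel position
    y: int            # vertical pixel position
    start_ms: int     # when the fixation began
    duration_ms: int  # how long it lasted


def collapse_time(fixations):
    """Drop the time fields and keep only the set of fixated locations."""
    return {(f.x, f.y) for f in fixations}


# Two hypothetical participants who both fixate the page center (400, 300).
quick_glance = [Fixation(400, 300, 0, 100)]
long_stare = [Fixation(400, 300, 0, 60_000)]

# Once time is collapsed, the two summaries are indistinguishable.
print(collapse_time(quick_glance) == collapse_time(long_stare))  # True
```

Because the summary is a set of locations, the duration of each fixation simply has nowhere to live once the collapse is done.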
Of course, the algorithm for doing this computation is more complex than the example I just presented. Any heatmapping algorithm should take into account peripheral vision, microsaccades, blinks, fixation duration, “bad data”, ocular drift, dynamic page behavior, etc. For now let’s continue with the simplified example. Steps #1 and #2 are completed for, oh, say 15 people total. This gives us 15 individual fixation summary plots.
Step #3: Compute an average viewing value for each pixel of the webpage
Again, the algorithm which handles this step is more complex than what is presented here, but the basic idea is this: for every pixel of a web page, the system asks “how many people saw this pixel?” If 10 people saw the pixel out of a group of 15, then the algorithm says “about 67% of people saw this pixel… color it yellow on the plot.”
This averaging algorithm outputs a heatmap showing what percentage of participants saw each page element. Handling the data in this way keeps any single individual from biasing heatmap percentages. For example, imagine that 14 people only looked at the center of a page, and 1 person looked at the entire area of the page. Even though 1 participant viewed a much larger area than anyone else (and probably spent a goodly amount of time doing it), she still only represents 6.7% of the 15-person group. The resulting heatmap would then show that 100% of people fixated the center of the screen, and less than 10% looked elsewhere.
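The 14-versus-1 example can be worked through in a few lines of Python. This is only a sketch of the simplified idea: the `percent_seen` function and the 3x3 toy "page" are invented for illustration, and "saw" is reduced to an exact pixel match, ignoring the peripheral vision and data-cleaning issues mentioned earlier.

```python
def percent_seen(summaries, width, height):
    """For each pixel, the percentage of participants whose summary includes it."""
    n = len(summaries)
    return [
        [100.0 * sum((x, y) in s for s in summaries) / n for x in range(width)]
        for y in range(height)
    ]


# Toy 3x3 "page": 14 participants look only at the center pixel,
# and 1 participant looks at every pixel.
WIDTH, HEIGHT = 3, 3
center_only = {(1, 1)}
whole_page = {(x, y) for x in range(WIDTH) for y in range(HEIGHT)}
summaries = [center_only] * 14 + [whole_page]

heat = percent_seen(summaries, WIDTH, HEIGHT)
print(heat[1][1])            # center pixel: 100.0
print(round(heat[0][0], 1))  # corner pixel: 6.7
```

Each participant contributes at most one "vote" per pixel, which is why the one thorough viewer cannot push any pixel above 1/15 on her own.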
Other questions that have come up:
If you make a heatmap from only the first 10 seconds of viewing, doesn’t that add a time component to the heatmap?
The answer is no. The method for computing a heatmap is the same no matter how large the time sample used. When you slice for specific time intervals, you are just selecting a specific group of fixations to include in the calculation. The resulting heatmap still has no time component.
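In code terms, a time slice is just a filter applied before the collapse. The sketch below assumes fixations are stored as hypothetical `(x, y, start_ms)` tuples; the function names are made up for illustration.

```python
def fixations_in_window(fixations, start_ms, end_ms):
    """Keep only fixations that begin inside [start_ms, end_ms)."""
    return [f for f in fixations if start_ms <= f[2] < end_ms]


def collapse_time(fixations):
    """Same collapse as for a full session: locations only, no time."""
    return {(x, y) for x, y, _start in fixations}


# Hypothetical session for one participant: (x, y, start_ms)
session = [(100, 50, 500), (400, 300, 4_000), (700, 120, 15_000)]

first_10s = fixations_in_window(session, 0, 10_000)
print(sorted(collapse_time(first_10s)))  # [(100, 50), (400, 300)]
```

The output of the sliced version has exactly the same shape as the full-session version: a set of locations with no timestamps attached.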
Are heatmaps created from time slices more valid or informative than those created from full experiment sessions?
I suppose that depends on the kind of information you want to get from the plot. If you want to see where people looked in the first 10 seconds, a time slice heatmap is appropriate. If you are trying to understand the order in which page elements were viewed, a heatmap is probably the wrong analysis tool. If you want to see where people are looking over their entire experience with the page, a full session heatmap is the way to go.
This is the last planned installment of our “But what does it all mean?” series. However, the discussion is still open, so if anyone is interested in other topics, just let me know.
Article by Teresa Hernandez - Eyetools, Inc.
Illustrations by Boyd Richard - Eyetools, Inc.





Very good initiative to educate people on what information can be obtained through eye tracking - and what cannot be obtained.
I only have one objection: A heat map actually can tell you how long people have been looking at a part of the page - if it is generated using the length of the fixations instead of the number. A heat map should always be accompanied by a legend describing how it was generated, otherwise you can't really draw any conclusions based on it.
Posted by: Joakim Isaksson | February 01, 2008 at 04:50 AM
Thanks so much for the feedback.
To clarify, a temperature map can be created based on any criteria, and it's imperative that the analyst have a concrete understanding of what the exact criteria are. However, in a "classic" heatmap (for lack of a better word), the aggregation is performed using fixation location and not time. So when most people refer to a "heatmap" within the niche field of eye-tracking in usability and market research, they are usually referring to a result which has no time dimension.
In the past, we created heatmaps based on fixation duration, but didn't find this to be the best method for analyzing time-based data (I'd be very interested in understanding how you've used this kind of graph). We commonly look at average fixation durations and the number of return saccades to page elements with mean and standard deviation calculations. A large number of studies have investigated fixation duration as a measure of interest and ease of comprehension. However, because a heatmap doesn't have an associated error bar, we usually visualize such data in a different way (in a bar or line graph for example). This analysis allows us to test for statistical significance, and avoids confounds which might be present depending on exactly how the time-based heatmap is produced.
Posted by: Teresa Hernandez | February 01, 2008 at 11:00 AM
I don't really see why fixation duration and count should be analyzed in different ways. You say that fixation duration is best analyzed using for example graphs, but this is also true for fixation count. In my opinion, heat maps should be used as a starting point for analysis and as a way to visualize and motivate conclusions to clients rather than being the foundation of the entire analysis.
Why would the time element require error estimations that the count does not? Doesn't the number of fixations need to be parsed through statistical analysis just like their duration? It is still the same fixations we are talking about, just adding an attribute. Isn't it more relevant to also look at the length of the fixations rather than just giving all of them the same weight in the heat map?
Posted by: Joakim Isaksson | February 05, 2008 at 03:36 AM
I forgot, I have actually never done any sharp analysis of eye tracking data but I work at Tobii and feel that I have a good understanding of the metrics but want to learn more. All opinions I express are my own, not Tobii's.
Posted by: Joakim Isaksson | February 05, 2008 at 03:39 AM
Thanks for the ongoing dialog. (I hoped there would be good discussions on analysis when writing these posts.) There are a few different points you've made that I'd like to address, so here we go:
> I don't really see why fixation duration and count should be analyzed in different ways.
I may be misunderstanding (and I apologize if I am), but I think you may be confusing a heatmap with a direct representation of "fixation count". In a "classic" heatmap, there isn't really a direct relationship between how many times an item is fixated by each individual and the temperature read. In this kind of heatmap, an individual user's fixations on the same location are counted only once. In other words, if a user fixates the exact location 1, 5, or even 20 times, a standard heatmap calculation will only register that the user fixated the spot once. Then, when averaging for the group is done, your result is a plot of where users looked, and not how long or how often. This is done to avoid the possibility that 1 user could bias the entire heatmap by continually fixating the same page location.
> You say that fixation duration is best analyzed using [...] graphs, but this is also true for fixation count.
Yes, it's true for "number of fixations". (I only used duration as an example in my previous response.) For both of these measurements, the most important properties are mean and standard deviation. This analysis allows comparison of different page elements and texts, and makes it possible to run statistical tests. Without some measure of variability, it's difficult to draw useful conclusions from this data -- qualitative or quantitative. If you have mean and variability measurements, the plot you choose is really only important to the extent that it helps you clearly interpret and share the results. I suppose that there's a way to add error bars to a heatmap... but I'm sure you can imagine that a surface plot with +/- error bars would be exceptionally difficult to interpret accurately, and even harder to explain.
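As a rough illustration of the mean-and-variability summary, here is a short Python sketch using the standard library. The element names and the numbers are invented for the example and are not results from any study.

```python
from statistics import mean, stdev

# Hypothetical average fixation durations (ms), one value per participant,
# for two page elements.
durations_ms = {
    "headline": [210, 250, 190, 230, 220],
    "sidebar": [180, 420, 150, 390, 160],
}


def summarize(values):
    """Return (mean, sample standard deviation) for one page element."""
    return mean(values), stdev(values)


for element, values in durations_ms.items():
    m, sd = summarize(values)
    print(f"{element}: mean = {m:.0f} ms, sd = {sd:.0f} ms")
```

In this made-up data the two means are not far apart, but the spread for the second element is several times larger, which is exactly the kind of information a single heatmap color cannot carry.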
> In my opinion, heat maps should be used as a starting point for analysis and as a way to visualize and motivate conclusions to clients rather than being the foundation of the entire analysis.
I completely agree, and that's been one of the main points of this series. Analysts should always look beyond the heatmap.
> [....] It is still the same fixations we are talking about, just adding an attribute.
I'm not entirely sure I understand this part of your question. Please elaborate if my comments above haven't addressed this.
> Isn't it more relevant to also look at the length of the fixations rather than just giving all of them the same weight in the heat map?
This really depends on the research question you're asking. Examining fixation length and location are both valid and useful ways of analyzing eye-tracking data. As for heatmap representations... heatmaps which include a time component risk heavy biases if one person decides to stare at the center of the page for a minute or two (as an extreme example), or simply has a harder time understanding or reading something (as a less extreme example). The effect of this kind of outlier is limited in a calculation which gives all users equal weight. There is also a big question about how one would represent peripheral vision in a time-weighted heatmap. Using other analysis tools avoids these problems and will probably get you a much faster answer.
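To put numbers on the "one person stares for a minute" case, here is a small Python sketch comparing the two weightings. Pooling dwell time across all participants is only one assumed way a time-weighted map might be built, not the method of any specific tool, and the region names and durations are hypothetical.

```python
def share_equal_weight(participants, region):
    """Fraction of participants who fixated the region at least once."""
    hits = sum(any(r == region for r, _ in p) for p in participants)
    return hits / len(participants)


def share_of_dwell_time(participants, region):
    """Fraction of all pooled fixation time that was spent on the region."""
    total = sum(d for p in participants for _, d in p)
    on_region = sum(d for p in participants for r, d in p if r == region)
    return on_region / total


# Hypothetical data: each participant is a list of (region, duration_ms).
typical = [("nav", 300), ("body", 300)]
starer = [("center", 60_000)]
participants = [typical] * 14 + [starer]

print(round(share_equal_weight(participants, "center"), 3))   # 0.067
print(round(share_of_dwell_time(participants, "center"), 3))  # 0.877
```

With equal weighting the lone starer accounts for 1 of 15 participants; with pooled time she accounts for most of the map.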
Posted by: Teresa Hernandez | February 05, 2008 at 11:27 AM