Scatterplots! Tackling Week 1 of "Back 2 Viz Basics"
Simple data sets
Suggested areas of focus (a weekly chart type)
The opportunity to see other participants' work on Twitter
A wrap-up by the organizer with favorite vizzes
Don't be fooled by the title "viz basics": this can be huge for intermediate and advanced users. I've used Tableau for years, but my technical and design skills have soared since I started participating in community projects. In my day-to-day job, I don't get a chance to work with novel chart types or data sets, so these have been a very effective way to learn.
This was my finished viz... but I didn't build it until after working through (and ultimately working around) Eric's core question.
Tucked into Eric's prompt is a clear question:
I'd like to understand if there is a relationship between Seasons Coaching and overall Win Percentage.
The word "relationship" is a hint to try a trendline. So I quickly built a viz with:
Columns: SUM(Seasons Coached)
Rows: SUM(Win Percentage)
Analysis > Trend Lines > Show Trend Line
Boom! Upward-sloping line: people that have been coaching longer win more games! In technical terms, "You can predict win rate based on seasons coaching".
Not so fast. If you hover over the trendline, you get this description. Pay attention to the p-value:
A p-value answers (in lay terms): "If there were really no relationship, what's the probability of seeing an association at least this strong just by chance?" The most common threshold is p < 0.05 (less than a 5% probability). In this case, the p-value doesn't meet that threshold. So I chose not to include this line at all, because I don't want to mislead users with the upward slope.
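To make the "just by chance" idea concrete, here is a minimal Python sketch of a permutation test. Note that this is a different technique from the one Tableau uses (Tableau reports a p-value from its linear model), and the data points below are invented purely for illustration; they are not the coaches from Eric's data set. The idea: shuffle the win percentages so that any real link to seasons coached is broken, and count how often the shuffled data looks at least as correlated as the real data.

```python
import random

# Hypothetical (seasons coached, win percentage) pairs, made up for illustration.
coaches = [
    (5, 0.61), (8, 0.55), (12, 0.70), (15, 0.58), (20, 0.66),
    (22, 0.74), (27, 0.63), (30, 0.69), (35, 0.72), (41, 0.77),
]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(xs, ys, trials=10_000, seed=0):
    """Share of random shuffles whose |r| is at least as large as the observed |r|."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break any real link between x and y
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return hits / trials


seasons = [s for s, _ in coaches]
win_pct = [w for _, w in coaches]
r = pearson_r(seasons, win_pct)
p = permutation_p_value(seasons, win_pct)
print(f"r = {r:.2f}, permutation p-value = {p:.3f}")
```

If the printed p-value is above 0.05, shuffled (meaningless) data produces an association this strong fairly often, which is the same reason I left the trendline off.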
Note that if p < 0.05, that still wouldn't mean that longer coaching tenures cause a higher win percentage. Remember the adage "Correlation does not equal causation". There would be at least four possibilities:
A causes B: Longer coaching careers result in more wins
B causes A: More wins cause a longer coaching career (e.g., successful coaches don't get fired)
A and B are both caused by a third factor (e.g., more prestigious schools attract better players, and coaches who enjoy that prestige are less likely to retire)
The relationship is due to chance (always a possibility, even if a small one)
Like many people, I went in a different direction: a dynamic quadrant chart. This is an opportunity to create a scatterplot geared to a specific user: a basketball fan who wants to compare their favorite coach to other top coaches.
It's also a great opportunity to practice intermediate/advanced Tableau functionality. If you've never built this type of chart before, the basic steps are:
Build your scatterplot.
Create a parameter for the "reference coach".
Use LODs (level of detail expressions) to figure out the reference coach's number of seasons and win percentage.
Add reference lines to the chart.
Create a calculation to split the data into 4 quadrants based on the reference coach.
Add the quadrant to the color shelf.
Add a parameter action to change the reference coach when a user clicks on a different point.
Sounds like a lot, but Andy Kriebel (@VizWhizBI) has an excellent video that leads you through all of this in eight short minutes.
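If it helps to see the logic outside of Tableau, here is a rough Python analogy of the quadrant steps. This is only a sketch: the coach names and numbers are hypothetical, and the function names are my own, not anything from Tableau. The `reference_name` argument plays the role of the parameter, the lookup of the reference coach stands in for the LOD expressions, and `quadrant` mirrors the calculation that goes on the color shelf.

```python
from dataclasses import dataclass


@dataclass
class Coach:
    name: str
    seasons: int
    win_pct: float


# Hypothetical coaches, invented for illustration only.
coaches = [
    Coach("Coach A", 10, 0.60),
    Coach("Coach B", 20, 0.70),
    Coach("Coach C", 30, 0.80),
    Coach("Coach D", 25, 0.65),
    Coach("Coach E", 12, 0.75),
]


def quadrant(coach, reference):
    """Label a coach relative to the reference coach (ties count as 'more'/'higher')."""
    more_seasons = coach.seasons >= reference.seasons
    higher_win = coach.win_pct >= reference.win_pct
    if more_seasons and higher_win:
        return "More seasons, higher win %"
    if more_seasons:
        return "More seasons, lower win %"
    if higher_win:
        return "Fewer seasons, higher win %"
    return "Fewer seasons, lower win %"


def label_quadrants(all_coaches, reference_name):
    """Find the reference coach by name, then label every coach against them."""
    reference = next(c for c in all_coaches if c.name == reference_name)
    return {c.name: quadrant(c, reference) for c in all_coaches}


# "Clicking" a different coach is just calling the function with a new name.
for name, label in label_quadrants(coaches, "Coach B").items():
    print(f"{name}: {label}")
```

Changing the reference from "Coach B" to another name re-labels every point, which is what the parameter action does when a user clicks a different mark.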
Other Quick Notes
X- and Y-axis
The convention in a scatterplot is:
Explanatory variable goes on the x-axis
Predicted variable goes on the y-axis
So if we think that seasons coaching will predict win rate, then the former goes on X and the latter on Y.
Confession: I didn't think about this at all. Besides, we weren't able to confirm an association between the two.
In my mind,
Time usually goes on the X-axis. So "number of years coaching" translated to a left-to-right orientation.
"Better" things are "higher" (on a y-axis). So the very best coaches will be near the top.
This arrangement seemed natural to me, and most people picked the same one.
Inspiration this week came from Marcin Pielużek (@mpieluzek), whose viz has a beautiful color scheme. I really liked the way he included the filled bar charts as a summary on the right, so I took a similar approach.
Tableau makes it easy to add simple statistics, but it doesn't interpret them for you. Unfortunately, stats are complicated and non-intuitive, and most business users are flummoxed by them. I'm trying to explain things in lay terms. But it's also been ten years since I took statistics in grad school, and I don't use them day-to-day. So please, if I misstated or oversimplified anything, let me know so that I can correct it!