TL;DR – I used D3 to make a pretty picture for my About page, which you can see here.
It has been a crazy few months, but sometime in the middle of June 2023 I started to really get the hang of some of this Django stuff (yeah, the 50th iteration of this site is built with Django, sue me). One of the things I finally figured out was how to properly serve static files. Static files (CSS and JavaScript) were misconfigured in an earlier iteration of my site, which left me unable to add interactive, customizable features. This was a big problem for me and DIMMiN, because as my About page puts it:
People don't want a picture, they want a playground. Something they can interact with and learn from.
And man, is this site currently lacking in the interactive department. Some of that will be addressed soon via new Django models (maybe allowing users to interact with / filter / like / comment on posts…?), but soon is not now. Soon is a long time from now. For now, fixing my site happened to coincide very nicely with a new tool I was learning about for work.
I’ve run up against a limit with my current data viz tool, ol’ reliable: matplotlib. I can make nice visuals that tell me about the data I’m looking at, but creating animations frame by frame is a massive pain. Don’t even get me started on trying to use matplotlib for interactive plots. I know matplotlib has some functionality for it, but that was never its intent. As far as I’m aware, matplotlib was developed so that researchers and academics could efficiently produce plots for their papers and effectively communicate the results of their work. All this is to say that my data viz toolkit was well overdue for an upgrade.
At work I started learning about D3, a JavaScript library for making dynamic, compelling, and interactive data visualizations for the web. It has a long list of cool applications that you can read about on the D3 site. The main drawback is its fairly steep learning curve, but as soon as I played with some of its tools I knew this was exactly the type of tech I was looking for. I started by creating some visuals, static at first, then with interactive components. Eventually, after building a few prototypes at work, I knew I was ready to start hacking with it on DIMMiN!
Currently I can see two key concepts behind D3.
The first is tying SVG objects directly to a dataset so that each object is referenced like an item in a database (hence the name: D3 stands for “Data-Driven Documents”). For instance, take the following simple dataset:
myData = [
{"id": 0},
{"id": 1},
{"id": 2}
]
Each element of that dataset can then be bound to its own SVG element. The code below creates our three circles and adds them to the page inside a div with the class threeCircleChart:
// Function that makes a chart displaying 3 circles with D3
function makeThreeCircleChart() {
    // Select the div that will hold the chart
    const container = d3.select(".threeCircleChart");
    // Append an SVG element to the container and center it horizontally
    const svg = container.append("svg")
        .attr("width", 100)
        .attr("height", 100)
        .style("display", "block")
        .style("margin-left", "auto")
        .style("margin-right", "auto");
    // Define dataset
    const myData = [
        {"id": 0},
        {"id": 1},
        {"id": 2},
    ];
    // Add one circle to the SVG for each item in the dataset
    svg.selectAll("circle")
        .data(myData)
        .enter()
        .append("circle")
        .attr("cx", (d, i) => 30 + i * 30) // space the circles 30px apart
        .attr("cy", 30)
        .attr("r", 10)
        .attr("fill", "red");
}
// Call our chart-making function
makeThreeCircleChart();
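To make the data-binding idea concrete without a browser, here is a simplified plain-JavaScript model of the bookkeeping D3 does when it pairs data with elements. It is a sketch, not D3's implementation, and the dataJoin helper is hypothetical. The circle code above joins by index because it passes no key function. With a key, as in .data(myData, d => d.id), D3 matches elements to data by id and sorts them into enter (new data), update (data that already has an element), and exit (elements whose data is gone).

```javascript
// A simplified model of D3's keyed data join (hypothetical helper, not D3 code).
// Given the keys of elements already on the page and a new dataset,
// split everything into the three groups D3 calls enter, update, and exit.
function dataJoin(existingKeys, data, key) {
  const existing = new Set(existingKeys);
  const incoming = new Set(data.map(key));
  return {
    // New data with no matching element: D3 would append() these.
    enter: data.filter(d => !existing.has(key(d))),
    // Data that already has an element: D3 would update its attributes.
    update: data.filter(d => existing.has(key(d))),
    // Elements whose datum disappeared: D3 would remove() these.
    exit: existingKeys.filter(k => !incoming.has(k)),
  };
}

const myData = [{ id: 0 }, { id: 1 }, { id: 2 }];

// First render: no circles exist yet, so every datum lands in enter.
const first = dataJoin([], myData, d => d.id);
console.log(first.enter.length, first.update.length, first.exit.length); // prints 3 0 0

// A later render where id 0 is dropped and id 3 is added.
const second = dataJoin([0, 1, 2], [{ id: 1 }, { id: 2 }, { id: 3 }], d => d.id);
console.log(second.enter.map(d => d.id).join(","));  // prints 3
console.log(second.update.map(d => d.id).join(",")); // prints 1,2
console.log(second.exit.join(","));                  // prints 0
```

On the first render, all three items fall into enter, which is exactly what .enter().append("circle") turns into three circles.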
Now that’s quite a bit of code for such a simple image, but that brings us to the second key idea behind D3: each visual is designed from the ground up. Where matplotlib automatically positions and scales our data, every element in a D3 chart is placed explicitly, so it’s completely up to us to design our charts from scratch. If we wanted to turn these circles into a scatterplot, we would append one SVG circle for each point in the dataset and compute each circle’s position ourselves by scaling the data values to pixel coordinates. This gives us a lot of customizability at the expense of a steeper learning curve. However, what D3 lacks in brevity it more than makes up for in interactivity. For instance, we can add an event handler that changes the color of the three circles when we click on their SVG container:
Or observe the movement of a circle by animating its transition between two positions:
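Under the hood, a transition like this one interpolates an attribute (here a circle’s cx) from a start value to an end value over time, with the pace shaped by an easing curve. The plain-JavaScript sketch below shows that idea. The helper names easeCubicInOut and positionAt are hypothetical, the move from cx = 30 to cx = 70 is made up, and the curve is written to match D3’s default transition easing (d3.easeCubic) and default 250 ms duration.

```javascript
// Cubic in-out easing, the same curve as D3's default d3.easeCubic,
// written out by hand: slow start, fast middle, slow finish.
function easeCubicInOut(t) {
  t *= 2;
  return (t <= 1 ? t * t * t : (t -= 2) * t * t + 2) / 2;
}

// Position of a circle moving from `from` to `to` after `ms` milliseconds
// of a transition lasting `duration` ms (250 ms is D3's default duration).
function positionAt(from, to, ms, duration = 250) {
  const t = Math.min(Math.max(ms / duration, 0), 1); // normalized time in [0, 1]
  return from + (to - from) * easeCubicInOut(t);
}

// Sample a hypothetical move from cx = 30 to cx = 70 every 50 ms.
for (let ms = 0; ms <= 250; ms += 50) {
  console.log(ms, positionAt(30, 70, ms).toFixed(2));
}
```

Halfway through the duration, the circle sits exactly halfway between its two positions. It moves slowly near the ends and quickly in the middle, and that easing is what makes D3 transitions feel smooth instead of mechanical.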
I wanted to start building these types of interactive charts into my website, so I decided to turn the UltraLearning bar chart from a previous blog post into an interactive element headlining my About page. To do this, I used the same processing method to extract my ultralearning data from my calendar, then applied a few extra steps to clean the data into a simple CSV format:
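import pandas as pd
# Load the calendar export, keep the relevant columns, and drop incomplete rows
df = pd.read_excel("20231229_calendar_data.xlsx").iloc[:, 2:-9].dropna().drop(["Day accumulated", "Weekday", "Location"], axis=1).copy()
# Keep only the events tagged as data science learning
ultralearning_df = df[df["Text"].str.lower().str.contains("data science learning")].copy()
# .str.replace removes the substring; Series.replace would only match cells exactly equal to "\n "
ultralearning_df["Description"] = ultralearning_df["Description"].str.replace("\n ", "", regex=False)
ultralearning_df["Date"] = pd.to_datetime(ultralearning_df["Date"])
ultralearning_df.set_index("Date", inplace=True)
# Aggregate to one row per day (days without study sessions get 0 hours)
ultralearning_df = ultralearning_df[["Hours", "Description"]].resample("d").sum()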
This gives us the following dataset to work with:
Note that many of the descriptions are written in HTML. That will be a neat feature we come back to later. For now, we want to turn each row into its own bar. We start by loading the CSV directly with D3 and converting each column to the right data type as it loads:
function makeBasicBarChart() {
    // Define the color scheme for our chart
    const backgroundColor = "#04344b";
    const rectColor = "#535453";
    const lightColor = "#f6f705";
    // Load in the cleaned ultralearning CSV from S3
    d3.csv("https://dimmin.s3.us-west-1.amazonaws.com/data/ultralearning_data.csv")
        .then(function(data) {
            // Measure the width of the container the chart is drawn in
            var containerWidth = d3.select(".basicBarChart").node().getBoundingClientRect().width;
            const referenceWidth = 1110, referenceHeight = 500;
            var aspect = referenceHeight / referenceWidth;
            // Keep the reference aspect ratio at whatever width the page is displayed
            var height = containerWidth * aspect;
            var width = containerWidth;
            // d3.csv reads every value as a string, so convert each column to its real type
            var parseDate = d3.timeParse("%m/%d/%Y");
            data.forEach(function(d) {
                d["Date"] = parseDate(d["Date"]);
                d["Hours"] = parseFloat(d["Hours"]);
            });
            console.log(data);
        });
}
The console output confirms that d3.csv has turned each row of the CSV into a JavaScript object and that each column now has the correct data type. (The dates are printed in UTC, which is why local midnight in a UTC-7 time zone shows up as 07:00.)
[
{
"Date": "2020-07-29T07:00:00.000Z",
"Hours": 1.333333333,
"Description": ""
},
{
"Date": "2020-07-30T07:00:00.000Z",
"Hours": 1.166666667,
"Description": "<ul><li>Completed working with single table systems</li><li>Analyzed how to work with multi-table systems</li><li>Learned that Foreign keys are primary keys of other tables</li><li>Learned that construction of certain primary tables is not trivial with relation to other tables</li></ul><br>NEXT TIME<br><ul><li>Learning how to Query multi-table systems</li></ul>"
},
{
"Date": "2020-07-31T07:00:00.000Z",
"Hours": 1.166666667,
"Description": "Learned about joining data together using JOIN ON\n\nLearned about using wildcards to find substrings within certain columns (LIKE '%LLC')\n\nLearned about nested queries"
},
...
]
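A continuous scale is just linear interpolation from a domain (data values) to a range (pixels). The plain-JavaScript sketch below shows that arithmetic. The helpers linearScale and timeScale, the one-month date range, and the 4-hour maximum are illustrative assumptions. The real d3.scaleLinear and d3.scaleTime add much more on top, such as ticks, clamping, and inversion. The sketch also shows why a bar chart’s y-domain should start at 0: only then is height - scaleY(h) a bar height proportional to h.

```javascript
// Hypothetical helper: the arithmetic behind a continuous linear scale.
// Maps domain [d0, d1] onto range [r0, r1] by linear interpolation.
function linearScale([d0, d1], [r0, r1]) {
  return x => r0 + ((x - d0) / (d1 - d0)) * (r1 - r0);
}

// A time scale does the same math on timestamps (milliseconds since the epoch).
function timeScale([start, end], range) {
  const scale = linearScale([start.getTime(), end.getTime()], range);
  return date => scale(date.getTime());
}

const width = 1110, height = 500; // the post's reference dimensions
const maxHours = 4;               // assumed maximum, for illustration only

// Dates map left to right across the chart.
const scaleX = timeScale([new Date(2020, 6, 29), new Date(2020, 7, 28)], [0, width]);
// SVG y grows downward, so the range is [height, 0]. Starting the domain at 0
// puts 0 hours at the bottom edge, with 10% headroom above the tallest bar.
const scaleY = linearScale([0, maxHours * 1.1], [height, 0]);

console.log(scaleX(new Date(2020, 6, 29))); // prints 0 (left edge)
console.log(scaleY(0));                      // prints 500 (bottom edge)
console.log(height - scaleY(2));             // bar height for a 2-hour day
```

If the domain started at the smallest value instead of 0, the smallest bar would have zero height even though it represents real study time.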
There are a few extra steps here, but it’s not too different from our circle chart. First, we need to define scales for the x- and y-axes so that D3 knows where to place each rectangle and how tall to make it. D3 makes this easy with scale functions for different data types:
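// Map each date to a horizontal position across the chart
var scaleX = d3.scaleTime()
    .domain(d3.extent(data, d => d["Date"]))
    .range([0, width]);
// Map hours to a vertical position: the domain starts at 0 so all bars share a baseline,
// with 10% headroom above the tallest bar (SVG y grows downward, hence [height, 0])
var scaleY = d3.scaleLinear()
    .domain([0, d3.max(data, d => d["Hours"]) * 1.1])
    .range([height, 0]);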
Here d3.scaleTime maps each date to a horizontal position, and d3.scaleLinear maps each numeric Hours value to a vertical position. Then we can add in our SVG:
var svg = d3.select(".basicBarChart")
    .append("svg")
    .attr("width", width)
    .attr("height", height)
    .style("background-color", backgroundColor);
And populate that SVG with rectangles corresponding to each datapoint:
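// Scale the bar width with the chart so it stays proportional on any screen
var barWidth = (4 / referenceWidth) * width;
// Create groups for each bar and its lights
var barGroups = svg.selectAll(".bar-group")
    .data(data)
    .enter()
    .append("g")
    .attr("class", "bar-group");
// One rectangle per day: its top sits at scaleY(Hours) and it extends to the bottom of the SVG
barGroups.append("rect")
    .attr("class", "bar")
    .attr("x", d => scaleX(d["Date"]))
    .attr("y", d => scaleY(d["Hours"]))
    .attr("width", barWidth)
    .attr("height", d => height - scaleY(d["Hours"]))
    .style("fill", rectColor);
}) // closes the .then() callback from the loading code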
To get the bar chart below:
Now we have an outline of what the chart will look like. Nice! But it feels pretty flat. I’d like to add an animation so that when the user loads the page, the bars file in from left to right. We can do this by adding a .transition() with a staggered delay to each bar on load. Each bar starts at its final x-position but sits at the bottom of the chart with zero height, then grows up to its y-position. Because each bar waits a little longer than the one before it, the bars rise in sequence. The code to do this is surprisingly simple:
barGroups.append("rect")
    .attr("class", "bar")
    .attr("x", d => scaleX(d["Date"]))
    .attr("width", barWidth)
    .style("fill", rectColor)
    // Start each bar at the bottom of the chart with no height
    .attr("y", height)
    .attr("height", 0)
    .transition()
    // Bar i waits i milliseconds, so the bars rise from left to right
    .delay((_, i) => i)
    .attr("y", d => scaleY(d["Hours"]))
    .attr("height", d => height - scaleY(d["Hours"]));
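The delay callback returns i, which D3 interprets as milliseconds, so bar i starts i ms after the first bar. The sketch below works out the resulting schedule to show how long the whole left-to-right wave takes. The staggerSchedule helper and the 1,250-bar count (roughly three and a half years of daily bars) are hypothetical. The 250 ms duration is D3’s default for a transition.

```javascript
// Hypothetical helper: start and end time of each bar's transition when
// bar i waits i * stepMs and every transition lasts durationMs.
function staggerSchedule(barCount, stepMs = 1, durationMs = 250) {
  return Array.from({ length: barCount }, (_, i) => ({
    start: i * stepMs,
    end: i * stepMs + durationMs,
  }));
}

// An assumed 1,250 daily bars, one millisecond apart.
const schedule = staggerSchedule(1250);
const last = schedule[schedule.length - 1];
const total = last.end;
console.log(`last bar starts at ${last.start} ms, animation ends at ${total} ms`);
// prints: last bar starts at 1249 ms, animation ends at 1499 ms
```

With one millisecond per bar, the whole wave finishes in about a second and a half. Increasing the per-bar delay stretches the sweep: with 1,250 bars, each extra millisecond per bar adds about 1.25 seconds.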
Then I’ll add an event handler so that clicking the SVG toggles the animation, dropping the bars out of view or raising them back up:
// Toggle the animation whenever the SVG is clicked
svg.on("click", function(event) {
    // Read the current y-position of the first bar in this chart
    // (attr() returns a string, so + converts it to a number)
    const currentY = +svg.select(".bar-group rect").attr("y");
    if (currentY !== height) {
        // The bars are up: slide them back down below the visible area
        svg.selectAll("rect")
            .transition()
            .delay((_, i) => i)
            .attr("y", height);
    } else {
        // The bars are down: raise them back to the heights from our data
        svg.selectAll("rect")
            .transition()
            .delay((_, i) => i)
            .attr("y", d => scaleY(d["Hours"]));
    }
});
Now, whenever you click the chart below, the animation plays (click again to reverse it):
That said, I only wanted this animation to play once, when the page loads. I’d rather save clicks for the user to interact with the different components of the chart. The third column I kept in my CSV is the description, where I left occasional notes about what I studied or worked on that day. I’d like a user to be able to hover over each individual bar and see some information about what happened on that day. This information will be displayed in a tooltip box below the chart. When the user clicks on a bar, that information should stay put until they click the same bar again (or click a different bar). The friend who suggested the click functionality calls it “click and stick”, which I think is an incredibly intuitive feature. The code gets a little more involved, so I’ll spare you the details, but feel free to try it out below:
It’s a little jumpy because the tooltip makes room below the chart to fit the entire description. This is less of an issue on the About page, where the tooltip instead rests on top of the content. Also, did you notice the formatting of the text within the tooltip? That’s because we can render the description as raw HTML, which preserves the formatting I used when writing my notes in the Google Calendar description. I think that’s pretty cool!
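The tooltip code itself isn’t shown here, so the sketch below is only a guess at the state logic that “click and stick” implies. It is kept free of the DOM so it is easy to test. The names createTooltipState, mouseover, mouseout, and click are hypothetical. In D3 these methods would be called from handlers attached with selection.on("mouseover", ...), selection.on("mouseout", ...), and selection.on("click", ...), and the visible day’s Description could be rendered as HTML with selection.html().

```javascript
// A hypothetical, DOM-free model of "click and stick" tooltip state.
// `pinned` is the bar the user clicked (or null); `hovered` is the bar under the mouse.
function createTooltipState() {
  let pinned = null;
  let hovered = null;

  // The tooltip shows the pinned bar if there is one, otherwise the hovered bar.
  const visible = () => (pinned !== null ? pinned : hovered);

  return {
    mouseover(id) { hovered = id; return visible(); },
    mouseout() { hovered = null; return visible(); },
    click(id) {
      // Clicking the pinned bar again unpins it; clicking any other bar pins that one.
      pinned = pinned === id ? null : id;
      return visible();
    },
  };
}

const tip = createTooltipState();
console.log(tip.mouseover("2020-07-30")); // prints 2020-07-30: hovering shows the day
console.log(tip.click("2020-07-30"));     // prints 2020-07-30: now pinned
console.log(tip.mouseover("2020-07-31")); // prints 2020-07-30: the pinned day sticks
console.log(tip.click("2020-07-31"));     // prints 2020-07-31: the pin moves to the new bar
console.log(tip.click("2020-07-31"));     // prints 2020-07-31: unpinned, but still hovered
console.log(tip.mouseout());              // prints null: nothing pinned or hovered
```

Keeping the state in one small object means every event handler just updates it and then redraws the tooltip from visible(), so the three interactions can’t drift out of sync.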
Finally, I added a few extra touches, such as normally distributed stars that rotate in the background and lights for my little buildings. You know, the finer things in life. Of course, the finished product can be found on the About page here.
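Here is one way a rotating starfield like that could work. This is a hedged sketch rather than the code behind the About page. It draws normally distributed positions with the Box-Muller transform (D3 also provides d3.randomNormal for this) and rotates each star about the chart’s center, so an animation loop could advance the angle a little every frame. The star count, spreads, and center point are made-up values.

```javascript
// Box-Muller transform: turns two uniform random numbers into one
// normally distributed sample with the given mean and standard deviation.
function randomNormal(mean = 0, sd = 1) {
  const u = 1 - Math.random(); // in (0, 1], which avoids log(0)
  const v = Math.random();
  return mean + sd * Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// Rotate point (x, y) by `angle` radians around the center (cx, cy).
// In SVG's y-down coordinates, a positive angle turns clockwise on screen.
function rotate([x, y], [cx, cy], angle) {
  const cos = Math.cos(angle), sin = Math.sin(angle);
  const dx = x - cx, dy = y - cy;
  return [cx + dx * cos - dy * sin, cy + dx * sin + dy * cos];
}

// Hypothetical star field: 200 stars clustered around the middle of a 1110 x 500 chart.
const center = [555, 250];
const stars = Array.from({ length: 200 }, () => [
  randomNormal(center[0], 200),
  randomNormal(center[1], 100),
]);

// Advancing the angle a little on every animation frame makes the sky turn.
const rotated = stars.map(p => rotate(p, center, Math.PI / 180)); // 1 degree
```

Because a rotation preserves each star’s distance from the center, the cluster keeps its normal spread no matter how many frames it turns.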
I’m certainly looking forward to using more of D3. This tool coupled with my better understanding of Django should allow me to build additional features into this site. For instance, what if a user got to play through the Monty Hall problem and simulate its results live instead of just seeing a bunch of pictures? What if a reader could watch an algorithm work its magic live and play around with the outputs? There are plenty of ways I’m thinking about integrating D3 into my app. While I'll continue to use matplotlib for exploratory data analysis (EDA) and for quick figures, I'm starting to think that D3 is just the right tool for the job when it comes to more advanced interactive data viz.