Code Archives - Nightingale: The Journal of the Data Visualization Society

The Back of the Painting: On Structure, Integrity, and Data Visualisation
https://nightingaledvs.com/the-back-of-the-painting/
Tue, 17 Feb 2026 16:41:46 +0000

In the early 1420s, Fra Angelico, a Dominican friar and painter, completed his first large-scale work for the newly built monastery at San Domenico. The San Domenico Altarpiece is one of the Early Renaissance’s defining works and adorns the high altar where the friars once sang their hymns during the Divine Office. Last year, the altarpiece was removed for restoration and featured in a major exhibition across Florence. On the front, the polyptych depicts four haloed saints in a single unified space, each attentive to the Virgin and Child. The Virgin and Child are themselves surrounded by angels with vibrant multi-coloured wings, their feathers shifting through a prismatic palette that is particularly iconic of Fra Angelico’s work.

San Domenico Altarpiece by Fra Angelico. (Source: Web Gallery of Art)

To look at the back of the high altarpiece, however, is to see an intricate collage of wood from various centuries. It serves as a physical record of how the work has been altered as tastes have changed over time. In the seventeenth century, carpenters recut the original panels and added new wood to force the piece into a rectangle. Beechwood inserts, shaped like butterflies, and crossbeams cover the surface, running against the natural grain. Poplar meets beechwood, intersecting in different directions, each species moving discordantly with humidity and the passage of years.

Roberto Buda, a conservator who specialises in wooden panel paintings, spent close to nine months stabilising the altarpiece’s structure. Working with his team, he removed the existing crossbeams and butterfly-shaped inserts, replacing them with carefully matched old poplar wood infills aligned parallel to the wood’s grain. A new frame was added with conical springs that allow the wood to move naturally. “It’s a house,” Buda told the Financial Times during the restoration. “If you don’t have a good foundation, it doesn’t hold up. The painting will never look good if the support is not right.” 

Months later, as I sat at my laptop placing an axis in the centre of the page, I thought again about this quote.

2025 marked a deliberate transition in my career. During my PhD in experimental neuroscience, I learned to do many things at once. I built hardware and software. I designed experiments. I ran those experiments, analysed the data, visualised the results, wrote papers, and taught students. Academia rewards this kind of breadth, and a range of technical skills accumulates quickly. Yet I found myself most engaged at the very end of the workflow, sitting with a dataset that had not yet been interpreted. I wanted to slow down and to look for the narrative in the data. To focus not only on results but on how those results are communicated—clearly, honestly, beautifully. In academic research, figures are often produced in haste, appended at the end of the pipeline. There is a script, a deadline, a familiar plotting function. In Python, with the visualisation library Matplotlib, you can call plt.bar(), and a chart appears. Microsoft Excel goes further still, delivering a fully formed graphic with colours and proportions chosen on your behalf. I wanted to build visualisations with greater intention and technical freedom, and this is what led me to the open-source JavaScript library, D3.js.

D3 stands for Data-Driven Documents and is a low-level library which uses the full capabilities of web standards such as CSS, HTML, and SVG to build sophisticated and interactive data visualisations. While other visualisation tools hand you a bar chart or a scatterplot, to represent data in D3 you must manually calculate the scales, define the coordinate system, and bind the data to a graphical element. You must decide exactly where an axis sits and how a margin breathes, what a data point is—a circle, a path, a mark—and how it behaves when the data changes. Nothing appears unless you build it. D3 is a workshop full of raw timber and hand saws.
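To make "manually calculate the scales" concrete, here is a minimal sketch, in plain JavaScript rather than the author's own code, of the arithmetic that D3's d3.scaleLinear() performs when it maps a data domain onto a pixel range (the domain and range values here are illustrative):

```javascript
// A hand-rolled linear scale: the mapping D3 computes for you when you
// call d3.scaleLinear().domain([0, 100]).range([40, 460]).
function scaleLinear([d0, d1], [r0, r1]) {
  // Map a data value v proportionally from [d0, d1] onto [r0, r1].
  return (v) => r0 + ((v - d0) / (d1 - d0)) * (r1 - r0);
}

// Illustrative values: data runs 0-100, drawable area from x = 40px to x = 460px
// (the 40px offset leaves room for a y-axis and its tick labels).
const x = scaleLinear([0, 100], [40, 460]);
console.log(x(0));   // 40  (left edge)
console.log(x(50));  // 250 (midpoint)
console.log(x(100)); // 460 (right edge)
```

Every positioned element in a D3 chart, from a circle's centre to an axis tick, passes through a mapping like this one, which is why nothing appears until you have decided what the mapping should be.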

With this in mind, I applied to the Data Visualisation Society’s mentorship program, intent on learning D3. Under the guidance of my brilliant mentor, Sam Bloom, I spent ten weeks at the end of 2025 working through the library’s fundamentals and building an interactive visualisation. We focused on first principles before developing an interactive scatterplot to explore Ancient Greek colour perception. Progress was slow at first because the learning curve was steep, but as I learned to build in D3 and perform this kind of digital carpentry, visualisation began to resemble construction. Every line of code was doing structural work.

Figures included in this essay show examples of the interactive scatterplot, which examines the sensory dimensions of Ancient Greek colour by focusing on the major colour adjectives used by Homer in the Iliad. The Ancient Greek experience of colour was inseparable from motion and shimmer. Colour was a basic unit of information which reflected the natural world—encoding brightness and darkness as fundamental dimensions. Greek colour terms not only prioritised luminosity but the play of light across surfaces, the texture of materials, even the social standing implied by a sheen or shade. It was a colour vocabulary rooted in their lived perception, rather than the modern hue-based categories we use today.

Selected excerpts from my D3 code illustrate how each visual element is constructed, for example, the multiple lines of code required to precisely position and size tick marks along each axis. The full project can be viewed here.

When a reader encounters a clean scatterplot, they see only the front of the painting. They don’t see the scaffolding: the decisions about scale domains or the choices about what not to encode. While the interactive scatterplot I built at the end of the ten weeks was modest, I could explain why each element existed and how it related to the data. Each decision—scale, colour, interaction—could be justified. Good data visualisations often look deceptively simple. But this clarity is the result of many intentional decisions about the data and the visual design. 

For example, roughly 90 percent of the charts the Financial Times publishes are bar charts or line graphs, yet because these charts adhere to a defined set of design principles, down to the placement of the title and the subtitle, the FT’s graphics are among the most recognisable in newsroom data visualisation. This coherence is maintained through meticulous style guides, which dictate everything from the weight of an axis line to the specific hex code of a categorical blue. These guides function as a visual vocabulary or grammar. Alan Smith, the FT’s Head of Visual and Data Journalism, who led the design of the FT’s visual vocabulary, has championed the idea that a chart should be as readable as a sentence. Alberto Cairo, a professor of visual journalism at the University of Miami, has often argued that the most important part of a visualisation is the “reasoning” that happens before the first pixel is placed. In his book, The Art of Insight, he argues that there are really no rules of data visualisation, there’s just reason. Every design choice must be a defensible, rational response to the data and the intended audience.

These ideas are not confined to style guides or theory; they extend naturally to animation and interactivity in visualisation. When such principles are applied with narrative intent, even complex data can be immediately comprehensible to an audience. A widely cited example of the power of simple but intentional data visualisation is Hans Rosling’s 2006 TED talk. Rosling revealed patterns in a complex dataset through an animated scatterplot in which countries appeared as circles, positioned by measures such as life expectancy and GDP, with population encoded as the size of each circle. As the animation unfolds, these circles shift across the axes, allowing long-term trends to emerge gradually rather than all at once. Rosling paired this animation with carefully selected narration and emphatic gestures to guide his audience to the most meaningful changes as they occurred. The result was a complex global health story made easy to understand through intentional narrative decisions and clear visual structure.

The painted surface of Fra Angelico’s altarpiece is inseparable from its support. The relationship between the painted surface, the underlying preparation, and the wooden support beneath, makes the altarpiece a three-dimensional object rather than a flat image viewed only from the front. The butterfly-shaped beechwood inserts, which were set against the direction of the grain, introduced stresses that increased the risk of cracking, jeopardising the paint layer.

A data visualisation is a three-dimensional object of logic. If the underlying structure is weak, if scales are arbitrary or axes misleading, the surface won’t stand up to scrutiny. The narrative ‘paint’ (the colour palette, the interactivity, etc) will eventually crack. For example, decisions about axis scales depend on what counts as meaningful in a given context. Although it is often suggested that a y-axis should begin at zero to preserve proportional accuracy, this convention can obscure important variation when the relevant changes are small, as is often the case with climate data. A review by Steven Franconeri, professor of psychology at Northwestern University, illustrates this clearly: a temperature chart anchored at zero degrees Fahrenheit flattens visible change, while a version scaled to the relevant temperature range makes trends legible without distorting the data. A widely criticised, since-removed National Review article employed a temperature chart with a lower bound of –10 degrees Fahrenheit, a choice that made recent increases in global temperature appear negligible.

Wood is a living thing and it needs to move. Buda and his team’s addition of a new, more encompassing frame made of chestnut wood and conical springs allowed the altarpiece painting to breathe through the natural movement of the wood in different directions. I developed my D3 visualisation in tandem with the JavaScript library React. In modern web development, React acts as the frame of chestnut wood. It is often described as a library for building user interfaces, but at its core it is a way of thinking about state and change. You describe what the interface should be given certain conditions, and React takes responsibility for updating it when those conditions shift. React holds the structure and lifecycle of my visualisation and D3 handles the math: scales, layouts, transitions that respond to data.

This article is not about JavaScript, or frameworks, or even data. It is about integrity in design. It is the realisation that the most important work we do as data visualisation developers is often the work that the reader will never see. When the San Domenico Altarpiece returns to the walls of the monastery, the public will only see the Virgin and Child, resplendent and serene. They do not see the new poplar inserts running parallel to the grain or the conical springs hidden within the frame. When we design good visualisations, we are doing something similar: we are building the foundations so that the story can stand on its own. We are building houses for data. Every axis, every scale, every line of code is a poplar insert aligned to the grain.


Scrollytelling with Closeread: The Super Low-Code Way to Bring Your Data Project to the Web!
https://nightingaledvs.com/scrollytelling-with-closeread/
Thu, 22 May 2025 14:39:00 +0000

Introduction

What is Scrollytelling?

Scrollytelling is a dynamic, interactive storytelling technique, often used in web-based formats, that reveals insights, visuals, and narrative elements as the user scrolls down the page. It allows data stories to unfold gradually, guiding the reader through a structured narrative in a way that feels both natural and engaging.

Why Scrollytelling Is Effective for Data Communication

Scrollytelling is a powerful way to communicate data because it helps reduce information overload, boosts user engagement, and makes insights easier to digest. Rather than overwhelming users with dense dashboards or complex visuals all at once, it guides them through your story step by step—just by scrolling.

Scrollytelling is not a replacement for other presentation methods such as dashboards and static PDF reports. Instead, it works best when there’s a need to communicate stories or data insights to a broad audience with varying levels of data literacy. It allows you to wrap each insight in meaningful context and empowers you to control the pacing and structure of your narrative while keeping readers engaged through suspense and sequential reveals. This results in a smoother, more intuitive experience, especially for readers who need guidance or are less data-savvy. This level of engagement is often difficult to achieve with traditional methods of presentation. As a dynamic and versatile technique, scrollytelling supports various content formats such as text, charts, maps, GIFs, images, and more.

The Challenge

Despite its many advantages, scrollytelling has traditionally required web development skills—something many dataviz professionals don’t typically have. In the past, even large media houses with dedicated teams would spend significant time and effort building a scrollytelling project. The tradeoffs were high, making it a less viable option for time-sensitive or resource-constrained projects.

For smaller teams or solo practitioners, this barrier has often made web-based storytelling feel out of reach. But that changes today. Thanks to open-source developer communities, the barrier has been lowered so significantly that you can put a scrollytelling project on the web in a few hours, often without needing to write much code at all!

What You’ll Learn in This Tutorial

By the end of this tutorial, you’ll be able to build and deploy a fully functional scrollytelling project that takes your insights beyond dashboards and onto the web! Specifically, you’ll be able to:

  • Set up your environment and craft your data story using the scrollytelling technique
  • Build your project locally and deploy it to the web for free using GitHub and Vercel (or any other deployment platform that supports dynamic webpages)

Don’t worry—we’ll walk through everything step by step, from scratch. Whether you’re an absolute beginner or just looking to sharpen your skills, this tutorial will help you build your first scrollytelling project from the ground up!

One more thing: This tutorial is designed to be hands-on, so as you follow along, feel free to copy each line of code and paste it into your Closeread document to see it in action.

Tools we’ll be using

For this project, we’ll use the Closeread extension to create our data scrollytelling experience. Closeread is a Quarto extension designed specifically for building interactive, scroll-based narratives. To use Closeread, you’ll need two key tools:

  1. Quarto: an open-source publishing system that supports Python, R, Julia, and ObservableJS. It allows you to create dynamic, multi-format documents using Markdown, Jupyter Notebooks, or your preferred editor. Since Closeread is built on top of Quarto, installing Quarto is a necessary first step.
  2. A Code Editor: This is where you’ll write and manage your project files. We’ll be using Visual Studio Code (VS Code) in this tutorial, but feel free to use alternatives like RStudio, Atom, or any editor that supports Quarto projects.

To get started, install the Quarto command line tool from the official Quarto website. Follow the standard installation process for your operating system. Since I’m using Windows, I downloaded it as shown below.

Downloading the Quarto installer from the official website.

We’ll also be using GitHub for version control and Vercel to deploy the final project to the web.

Once you’ve installed Quarto, you’re ready to install the Closeread extension. We’ll cover that in the next section.

Set up your project environment

Step 1: Set Up Your Project Directory

Start by creating a folder named closeread_tutorial. You can place this folder anywhere you’d like your project to live. Personally, I prefer to keep it on my Desktop, so my directory structure looks like this:

C:\Users\USER\Desktop\closeread_tutorial

Next, open a terminal and navigate to the folder you just created. An easy way to do this is by copying the full path to the folder.

If you’re on Windows, press Windows Key + R, type cmd, and hit Enter to open the Command Prompt.

Then, run the following command (update the path to match your own folder location if different):

cd "C:\Users\USER\Desktop\closeread_tutorial"

This sets your working directory to the project folder. You can confirm it’s successful by checking that the command prompt now matches the folder path you copied earlier.

Command prompt confirming that the working directory is now set to the Closeread project folder.

Install the Closeread Extension

To install the Closeread extension, run the following command in your command prompt:

quarto add qmd-lab/closeread

Make sure you’re connected to the internet, as this command will fetch the extension from an online repository. You may receive a few prompts asking whether Quarto extensions should be allowed to execute code during document rendering. Simply type Yes for each prompt to proceed with the installation.

Your command prompt should now look similar to this:

Command prompt showing that the Closeread extension was successfully installed in the project folder.

The message highlighted in red confirms that Closeread has been successfully installed. You can also verify this by refreshing your project folder. You’ll notice that a new folder named _extensions has been added to it.

Congratulations! 🎉 You’re now all set to create your first Closeread project.

Let’s dive in!

Create a basic Closeread project

Now, inside your project folder, create a new file named index.qmd. Open the file in your code editor and paste the following lines of code:

---
title: My First Closeread
format: closeread-html

---

Hello World! Please read my Closeread story below.

:::{.cr-section}

Closeread enables scrollytelling.

Draw your reader's attention with focus effects. @cr-features

:::{#cr-features}
1. Highlighting  
2. Zooming  
3. Panning  
:::

:::

You’ve just created your first Quarto document! 🎉

Now, let’s render and preview it to see your Closeread project in action. Go to your terminal and run this quarto command:

quarto render index.qmd

This should render your project.

After rendering, you’ll notice that a new file and folder have been added to your project directory:

  • A folder containing the necessary libraries and assets used by your Closeread project.
  • An HTML file generated from your base Quarto document, which serves as the interactive output.

These confirm that your project has successfully compiled and is ready for further development.

To preview the project you just created, open the index.html file in your browser—and voila! Your first Closeread project is live!

We will dedicate the next section to understanding the building blocks of a Closeread project. Let’s ride on 🔥

Understand the building blocks of Closeread

A Closeread project is built as a section within a Quarto document, defined using fenced divs. At its core, a Closeread section consists of three main components: Section, Sticky, and Trigger.

1. Section

A Closeread section is created using opening and closing fenced divs with the .cr-section class. This defines the scrollytelling block.

Here’s what the simplest Closeread section looks like:

:::{.cr-section}
This is a Closeread section
:::

This section can be enhanced with stickies (content that remains fixed while the user scrolls) and triggers (content that activates the sticky as the user scrolls). We’ll explore those next.

💡 Pro Tip: If you wrap your entire Quarto document in a fenced div with the .cr-section class, the whole thing becomes a Closeread document. 😉 This means everything in your document becomes part of the scrollytelling experience—great for fully immersive data stories!
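For instance, wrapping everything after the YAML header in a single fenced div would look like this (the title and body text here are placeholders, not part of the tutorial project):

```markdown
---
title: My First Closeread
format: closeread-html
---

:::{.cr-section}

Everything in the document now lives inside one Closeread
section, so every paragraph takes part in the scrollytelling
experience.

:::
```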

2. Stickies

A sticky is an element within a Closeread section. It could be a block of text, an image, a video, or any element that can be rendered in the browser. It’s the element you want to perform closeread on. This means you can set it to stick to the screen as the reader scrolls through the page.

Stickies can also be made invisible by default, and only appear when the viewer scrolls to the point where the trigger is activated. To declare an element as a sticky, wrap it within a fenced div and assign it an identifier prefixed with cr-, as shown below:

:::{#cr-identifier}
This block of text is a sticky!
:::

Since the sticky must be enclosed within a section, the full code would look like this:

:::{.cr-section}
This is a Closeread section

:::{#cr-identifier}
This block of text is a sticky within the Closeread section!
:::

:::

3. Triggers

As you might already know, a trigger is the element that activates a sticky in a Closeread document.

Remember the cr-identifier we assigned to the sticky above? The one prefixed with cr-? That’s the element we’ll use to trigger the sticky.

Here’s how triggering works:

  • Identify the point in your document where you want the sticky to be activated.
  • At that point, reference the sticky’s identifier, prefixed with @.

So, cr-identifier becomes @cr-identifier.

Let’s update our full code to include a trigger:

:::{.cr-section}
This is a Closeread section

I want my sticky to appear here ➡ @cr-identifier

:::{#cr-identifier}
This block of text is a sticky within the Closeread section!
:::

:::

Simple, right? When a reader scrolls to the trigger (@cr-identifier), the sticky pops into view!

Updated code:

Now, copy the updated code and paste it into your index.qmd file, replacing everything after the line that says:

Hello World! Please read my Closeread story below.

Your document should look like this:

---
title: My First Closeread
format: closeread-html

---

Hello World! Please read my Closeread story below.

:::{.cr-section}
This is a Closeread section

I want my sticky to appear here ➡ @cr-identifier

:::{#cr-identifier}
This block of text is a sticky within the Closeread section!
:::

:::

Re-render the project:

Once you’ve updated your index.qmd file with the new code, open your terminal and run the quarto render command just like before:

quarto render index.qmd

Refresh your browser tab to see the updated Closeread project. As you scroll, your sticky will appear at the specified trigger point!

The updated Closeread project displayed live in the browser.

Take a moment to celebrate this milestone. If your page does not match the screenshot above, review your code and make sure it is the same as the example.

Adding styling and interactivity to your Closeread document

Closeread offers several options for styling your project—ranging from prebuilt effects to full-fledged themes. What’s more, you can even extend your project’s styling using an external CSS stylesheet. The Closeread styling documentation provides a detailed guide on how to style your document. You can declare the styling template in the YAML configuration section of your document. For this project, let’s apply some of these techniques to further customize our document, starting with the basics: focus effects.

Focus Effects

Focus effects are prebuilt functions within Closeread that add interactivity and dynamism to your Closeread projects. As described in the Closeread documentation, these features “guide your readers’ attention to aspects of your stickies.” A summary of these effects is provided in the table below:

| Effect | Description | Syntax Example |
|---|---|---|
| Scaling | Magnifies or reduces the size of an element by a given factor. | `scale-by="3"`: triples the size of a sticky. |
| Panning | Moves the view to a specified section of the sticky (e.g., top-left corner). | `pan-to="-30px,30px"`: pans 30 pixels left and 30 pixels down. |
| Zooming | Enlarges a specific portion of the sticky to focus the reader’s attention. | `zoom-to="3"`: zooms to line 3. |
| Highlighting | Visually emphasizes a span of text or a line by changing its style or color. | `highlight="2-3"`: highlights lines 2 to 3. |

Focus Effects in Action

The purpose of this section is to demonstrate some of these focus effects. The next few lines contain short narratives along with their corresponding Closeread commands. We’ll use some images and text blocks as stickies and apply these effects to them.

NOTE: I’ve taken a conversational approach to explain the purpose of each feature. This is to keep things engaging. But don’t forget, the narratives also form part of the text you’ll copy into your Closeread document!

Now, back to our updated code. Quickly read through the following lines to get a sense of the flow. Afterwards, download these two images: grid.jpg and grid-highlighted.jpg. Create a folder named images directly inside your main project folder (where your index.qmd file is), and paste the two images you just downloaded into this folder. Then, copy the code block below into your Closeread document to see the effects in action:

Below is another block of text we'll be working with: @cr-highlighted
First, let's scale this block of text by two:
Scale this block of text by two [@cr-highlighted]{scale-by="2"}

Next, we’ll highlight lines 2 and 3 while keeping the same scale:
Lines 2 and 3 are scaled and highlighted [@cr-highlighted]{scale-by="2" highlight="2-3"}

Now, let’s bring in an image:
Loads an image @cr-image

It’s a bit large at first as it takes up the full screen. Let’s scale it down:
Image has been scaled down [@cr-image]{scale-by="0.5"}

Finally, we’ll pan to the portion highlighted in red:
Pan the image to the section highlighted in red [@cr-image2]{pan-to="-75%,75%" scale-by="1.5"}

:::{#cr-highlighted}
| 1⃣ This is the first line.
| 2⃣ This is the second line.
| 3⃣ This is the third line.
| 4⃣ And this is the fourth line.
:::

:::{#cr-image}
![](images/grid.jpg)
:::

:::{#cr-image2}
![](images/grid-highlighted.jpg)
:::

💡Pro tip: When you pan and scale at the same time, you end up zooming! (pun intended 😉)

Note: Panning can be a bit unintuitive at first. You might need to experiment with the position values to get the result you want. A bit of trial and error helps here.

Applying Additional Styling

Up to this point, the YAML configuration section of our project looks like this:

---
title: My First Closeread
format: closeread-html

---

Update it to apply the following styling:

---
title: "Understanding Tree Diagrams"
theme: "superhero"
fontsize: 16px
format: 
    closeread-html:
        cr-section:
            layout: "sidebar-left"
        cr-style:
            section-background-color: "#08508a"
            narrative-background-color-overlay: "#08508a"
            narrative-text-color-overlay: "#08508a"
            narrative-border-radius: 5px
            narrative-overlay-max-width: 60%

---

What we just did: Modified the YAML configuration to include some additional styling, such as:

  • Setting the layout to sidebar-left
  • Defining background colors under cr-style
  • Setting the theme to superhero
  • Adjusting the font size, and more

Each of these would have required more complex CSS code, but Closeread simplifies the process—you can simply call a named section and apply the style directly.
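For a sense of what that shorthand saves you, a hand-written CSS equivalent of just the background color and border-radius options might look something like the sketch below. Note that the selector names here are illustrative placeholders, not Closeread’s actual internal class names:

```css
/* Illustrative only: approximates what section-background-color and
   narrative-border-radius configure, using made-up selector names. */
.cr-section {
  background-color: #08508a;
}

.cr-section .narrative-block {
  border-radius: 5px;
}
```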

Applying custom CSS

If you’d like to further customize your Closeread project using an external .css stylesheet, you can follow the standard approach used in regular web development: by assigning styles directly to elements. All you need to do is link your Closeread document to the external stylesheet—and you’ll do this in the YAML section of the document (the part enclosed by triple dashes).

In this example, let’s change the color of the text in the narrative section of our Closeread project. The narrative section is the part of your story that delivers the main content. By default, the text appears black on desktop. We want to change it to white.

Steps:

  • Within the root of your project directory, create a new empty file and name it style.css. Paste the following lines of code into the file and save it:

.narrative {
  color: white;
}

  • Next, reference the external CSS file in your Closeread document: navigate to the YAML configuration section of your document and add the following line:

css: style.css

Your YAML section should now look like this:

---
title: "Understanding Tree Diagrams"
theme: "superhero"
fontsize: 16px
format: 
    closeread-html:
        cr-section:
            layout: "sidebar-left"
        cr-style:
            section-background-color: "#08508a"
            narrative-background-color-overlay: "#08508a"
            narrative-text-color-overlay: "#08508a"
            narrative-border-radius: 5px
            narrative-overlay-max-width: 60%
css: style.css

---

Take note of the indentation!

Publish and deploy

You’ve made it this far—well done! You’ve built your first Closeread project. But a project this good shouldn’t live only on your computer. It’s time to publish it to the web and share it with the world! You’ll use GitHub to store your project online, and Vercel to host and publish it for free.

Step 1: Create Your GitHub & Vercel Accounts

  • Go to github.com → Click Sign Up and follow the steps to create your account.
  • Then, visit vercel.com → Click Start for Free and sign up using your GitHub account. This allows Vercel to access your repositories for deployment.

Step 2: Upload Your Project to GitHub (No Code Required)

  1. On GitHub, click the + icon at the top-right → Select “New repository”.
  2. Give your repository a name like closeread-project, and click Create repository.
  3. On the next page, click “Uploading an existing file”.
  4. Locate your Closeread project folder on your computer.
  5. Drag and drop everything inside the project folder into the GitHub upload area.
  6. Scroll down, add a commit message like Initial upload, and click Commit changes.

Great! Your web story is now on GitHub.

Step 3: Deploy with Vercel

  1. On the Vercel dashboard, click “Add New” > “Project”.
  2. You’ll be prompted to choose your preferred Git provider. Select Continue with GitHub.
  3. You’ll see a list of your GitHub repositories. Select the one you just uploaded.
  4. Configure your settings:
    • Framework Preset: Choose Other or Static Site
    • Output Directory: leave the default option (root)
  5. Click Deploy.

Vercel will build and deploy your project in seconds.

Step 4: View & Share Your Live Story

Once deployment is complete, you’ll get a live URL where you can access your project on the web, e.g. https://closeread-tutorial.vercel.app/

Click the link to view your published Closeread story—fully interactive and hosted online!

Closeread project – live on the web!

Conclusion

If you’ve followed this tutorial up to this point, you should now be familiar with how to build a scrollytelling project from scratch using Closeread. You’ve learned the core building blocks of a Closeread project—such as sections, stickies, and triggers. You’ve also explored how to style your project using both built-in options and external CSS files. Finally, you now know how to host your project on GitHub and deploy it with Vercel, so your story can go live and be shared with the world.

This gives you a solid foundation for taking your data storytelling skills to the next level.

Up next is a second project I’ve included to give you a more hands-on experience. You’ll find a script and an image folder linked here. Your task is simple: follow the script, insert the appropriate images, and apply the relevant Closeread effects to bring the story to life—just like in the completed version here.

This practical exercise is designed to help you reinforce everything you’ve just learned and give you the space to experiment further with Closeread’s effects and features. Once you’re done, feel free to share your completed project with your network on social media—and don’t forget to tag me. I’d love to see what you come up with!


This project is also available on GitHub.


The post Scrollytelling with Closeread: The Super Low-Code Way to Bring Your Data Project to the Web! appeared first on Nightingale.

The “Dashboard” is Broken
https://nightingaledvs.com/the-dashboard-is-broken/ (Wed, 16 Apr 2025)
The value of dashboards has eroded. When executives hear the word “dashboard” today, they envision standard charts in BI platforms—obligatory elements for meetings rather than catalysts for insight.

Business leaders once championed dashboards as windows into organizational performance, but they became too familiar, too technical, and the value diminished. As evidence, compare those in “business intelligence” roles with the business leaders they serve: the gap in seniority, influence, and wages is massive.

How did this happen? Let’s discuss three ideas:

  • Dashboard rot devalued BI
  • Data people were never trained in design or communication
  • D3.js is complicated

Dashboard rot devalued BI

Business leaders scrambled to use data to inform the C-suite, and in the process, multiple layers of the organization ended up with their own dashboards. Once BI software became a premium license, it was only a matter of time before enterprises began counting which dashboards were used and which never were. The overwhelming under-utilization of dashboards across organizations led to the term “dashboard rot,” which reflects a fundamental misunderstanding of where the value was in the first place. It’s like counting all the Word documents in an organization versus only those that are published. The value has always been in the insight, not in the number of documents.

The way BI software was monetized ended up devaluing its own importance. Dashboards became an IT cost center in many regards instead of a strategic advantage. They became a burden, and in many organizations, “reporting” was seen as boring and a potential waste of time.

Thinking of the value of BI differently: if a dashboard makes a $1M decision easier, is it worth $1M? If, over its lifetime, it supports a $5B company in running its business daily, is it worth $1M or more? Yet organizations don’t think of dashboards the way they think of software investments: software is a strategic advantage, but dashboards are just the cost of doing business.

Data people were never trained in design or communications

Maybe part of the reason why dashboards instill a certain amount of hesitation is because most are not well designed. Many people working in analytics come from data science, data engineering, or data analysis backgrounds, and those fields lack significant design or communications training. While it is impossible to say all dashboards are badly designed, I’m certain that most people who create dashboards do not consider themselves to be “good designers.”

There’s a big difference between the kind of high-level graphic design we see in advertising or in consumer apps and the kind of important tweaks that could easily elevate most dashboards. In fact, most dashboards can probably get a significant lift by adjusting the language used in titles and labels alone.

The success of data literacy programs proves the importance of training people in more than just foundational data visualization practices. This shift—if we can make it one—from data towards communication might see the value returned to business intelligence, ushering in a new generation of thought partnership between analytics professionals and organizational leadership.

D3 is complicated

BI software exists because hand-coding charts was difficult. When D3.js was invented, an entirely new way to draw shapes in the browser opened up opportunities to visualize data, from simple charts to multidimensional interactive tools. But developing charts with D3.js was far from straightforward, and the work was pushed into the domain of software development.

While it is not D3’s fault that dashboards have lost their zest, the complexity of this work opened the door for faster (and therefore cheaper) tools to take its place. Many frameworks for creating interactive business charts sprang up, each with its own tradeoffs and its own flavor of front-end, and in the process, the software design was assigned to the UX designer. I’m a former UX designer, and I can tell you definitively that data visualization and data communication simply do not exist in user experience design—despite the fact that almost all software design is a visualization of data.


Maybe it’s time we drop the idea of dashboards and focus instead on data communication? By adopting this shift we might just recontextualize the power of data.

There’s a lot here to discuss, so please let me know what you think!

This article originally appeared at: https://www.linkedin.com/pulse/word-dashboard-broken-jason-forrest-agency-aco1e

The post The “Dashboard” is Broken appeared first on Nightingale.

Introducing Girls To Code, One Flower at a Time
https://nightingaledvs.com/introducing-girls-to-code-one-flower-at-a-time/ (Tue, 23 Jan 2024)
The Data Garden Project started as a small group of creative learners. Now, it’s growing into a global community.
How do we introduce data visualisation in an engaging and approachable way for communities of young women? What would it look like to learn code as a medium for art and creative storytelling? In this interview, Arran Ridley sat down with Joanne Amarisa, founder of the Data Garden Project, a growing resource and learning community for young people to share data-driven stories about their lives using creative coding.


A group photo of a Data Garden Project workshop.

Can you introduce yourself, Jo?

Of course! My name’s Jo, and I’m a designer and writer based in Melbourne, Australia, originally from Indonesia. I completed my studies in design at RMIT University here in Melbourne, and since then my passion has been towards furthering storytelling, technology, and design for education and community. Ultimately, it has led me to start and build the Data Garden Project!

What is the Data Garden Project? Can you tell me more about it?

The Data Garden Project (DGP) is a free resource and learning community that introduces young people to data visualisation and creative coding. It explores using data and code as a storytelling medium—to share data-inspired stories about your own life with your peers or loved ones.

The idea for the DGP came as a Capstone graduation project when I was finishing my design degree. I took a creative coding unit during my final year where I created a data viz artwork of WhatsApp conversations between my mother and me while we were separated during lockdown—visualised in a garden metaphor. I titled it A Garden of my Mother’s Concerns.

Screenshot of A Garden of My Mother’s Concerns, Jo’s first data art creative coding project.

I was intrigued by how the creative meets the generative in the process of coding, but also how data and code can be used to convey meaningful, personal stories that we can share with others. So I decided I wanted to pass on this same experience and learning opportunity to others—especially fellow girls or young women from non-computing backgrounds, like me!—to learn creative coding with me as a new way of visualising data stories.

That’s an interesting concept. How exactly do you do that?

By embarking on a “Data Garden Project,” you’re invited to find and gather data from your life or surroundings—this can be the food in your pantry, music in your playlists, interactions with family or friends—and then we visualise that data using the basics of drawing with p5.js, a beginner-friendly JavaScript creative coding software.

The p5.js Web Editor.
Students’ final data visualisation projects made using p5.js.

A big precedent and inspiration for the project is Dear Data, created by Giorgia Lupi and Stefanie Posavec: a year-long analogue “data journaling” project in which the two shared data stories from their own lives through sketches on postcards. Replacing pencil drawings with p5.js canvases, our project uses coding as the storytelling medium. I guess you can say it’s part coding class, part data “treasure hunt”, and part collective journaling activity.

What has the journey been like for the project?

We were awarded our first creative grant from the Blackbird Foundation, a VC based in Sydney, in August 2021. When I got accepted into the grant program, I was part of a design student club at my university, and I did a call-out on Facebook asking if anyone would be interested in joining me on this “mission”—which at the time had not taken any form!

Luckily, I met four wonderful peers and recruited them as my first team, and we quickly became dear friends. We took to Discord and “beta-tested” the Data Garden learning material by running two to three “Team Tutorial” sessions every week, where I would tutor them and we would go through each module and exercise via a group video call.

A Discord Team Tutorial session on how to parse data using Excel. Some of the team members are smiling and clapping.
A Discord Team Tutorial session on drawing basic shapes with code. The screen shows a series of rings and circles forming a solar system, and the team members discuss on the side of the screen.
Another example of our Discord Team Tutorials, the team laughing while working on debugging coding projects together.
The DGP team smiling next to the results of our first coding exercise: Drawing animals using code.

Our six modules combine the basics of creative coding with the basic principles of data storytelling. By the end of the modules, each student creates a final data-driven art project made with p5.js, representing a story or theme of their choosing. At the end, it felt like a breakthrough. One of our team members, Kelly, created a flower grid visualising the songs she listened to during lockdown. I thought, Whoa. The Data Garden Project works!

In 2022, I began to record each module as bite-sized tutorials, which now live on our YouTube channel. This helped us gather a wider audience around our mission: to make creative coding and data visualisation accessible and enjoyable.

The community grew across our social media and Discord—we recruited new team members and hosted ‘Study Spaces’ on the weekends, where we would open a room on Discord for an hour, and people could come, relax, chat, or use the space to do some work. We also began hosting a few online workshops on Miro. In March 2023, we hosted our first in-person creative coding workshop in partnership with RMIT University, and it was a blast.

Jo presenting in our first offline workshop with CTRL+ School of Design RMIT University.
Students in the workshop using basic shapes and colours to draw using p5.js.

In June 2023, we got accepted into the Processing Foundation Fellowship Program, which felt surreal. With the help of the fellowship, we were able to work on creating an educational resource and guidebook not just for students, but also for educators to take and adapt the Data Garden material to their own classrooms and communities. The free learning resource comprises six modules, combining the basics of creative coding, data visualisation, and exercises on building narratives and stories woven with data. Its purpose is to guide young people—with a focus on young girls and women—to create their first-ever data-driven interactive artwork using code.

What was the need? How did the Data Garden mission come about?

I initially learned creative coding in 2020, and, so, due to the state of the world, I had to adopt this brand-new tool in isolation. I’m grateful for my lecturer at the time, Karen Ann Donnachie (now a mentor to the DGP), who provided us with a warm, supportive learning environment even while remote—especially since learning to code is such a big, daunting task. That played such a big role, so that was the first thing I wanted to pass forward: a space that was warm, collaborative, and encouraging. The need, firstly, was to create an environment that takes the loneliness out of learning. 

Often, I felt that I didn’t find this in online coding classes or data bootcamps. Where there’s a lack of community, play, or peer-to-peer support, you embark on an individual upskilling pathway with the mindset that the world is your competition. As best we can, we want the DGP to offer more of a playground or collective sandbox for everyone to learn, fail, test, win, and try again together, with a peer-to-peer learning approach rather than instructor-to-class. What one learns, everyone learns.

A Data Garden coding project made with the Processing Desktop Editor.

The second need was to demystify gender biases around STEM. We know statistically that women comprise only a little over 30% of the STEM workforce. Moreover, girls and young women are also outnumbered in STEM-related majors in school or college and are less likely to pursue them than their male peers. These gaps are further aggravated in communities where the infrastructure for technology is lacking.

The overly militarised and commodified end products of technology and software can also add to this bias—when we think of science, subconsciously, we may think of weaponry, vehicles, video games, the rise of AI, which are traditionally coded as masculine and can feel disconnected from practices that feel arts-based, grassroots, or close to a community.

Through the Data Garden, we wanted to explore coding as if it were a fun scrapbooking or art activity that you can do after school, engaging communities of women to give that sense of belonging, and to show that this is a space for them, too. We turn software into something craft- and knowledge-based, lowering the barrier while providing an avenue for connection and vulnerability through story-sharing, not just upskilling.

Finally, the project-based nature also reduces the feeling of being overwhelmed when learning data and coding. A survey we ran back in 2021 within our community found that most members felt overwhelmed by “too many coding languages.” So our modules are very introductory and straightforward—students learn JavaScript through the p5.js library, plus a bit of HTML and CSS to create their webpage, and that’s it. As for the data, we learn how to gather it, place it in a spreadsheet, read visualisations, and create one of our own. The goal remains one output—a data-driven artwork that lives on the web and tells a story about your life—and that simplifies the learning objectives.

And the Data Garden Project is now community-focused, is that right?

Yes! We host our community on Discord as a central hub. We host online workshops using Miro, the team does our brainstorming sessions here, or sometimes we like to open a one-hour Study Space on a weekend where we put on lo-fi music, chill, and just work on our own things in the company of others. Most of us are based in the Asia-Pacific, but as the Discord community grew, there was also a time when we opened multiple Study Spaces to accommodate those in Europe or the US.

A group photo after a weekend ‘Study Space’ on Discord.
Investigating code together.

We also publish much of our content (such as modules, project updates, or inspiration) on Instagram and YouTube. One of the things we did on YouTube earlier this year was host Sharing Sessions, where we sit down and do Q&As with data viz or creative coding practitioners in the field, who share about their work and career journeys. 

The Data Garden Project challenged my notion that dataviz is always clean-cut and clinical as seen in most scientific publications, but turns out it doesn’t have to be that way!
—Septia, a member of the DGP

What are some things you noticed or learned while you were growing the Data Garden?

It was a big surprise to me to see community members trickling in from different parts of the world. At first it started with close circles of peers in Australia and Indonesia who were interested in what we were doing. All of a sudden we saw introductions from the UK, Philippines, USA, India, Mexico, Peru… I thought, We’ve never even been there!

At our core, we’re more of an international “study club” that learns creative coding and dataviz together. We partner with educational organisations or institutions, but we aren’t representative of any specific one. And I think that has kept the project approachable, malleable, and accessible for people to enter into and rally behind.

I was also nervous at first about starting the initiative with no computer science credentials (unless we count being self-taught through p5.js YouTube tutorials). However, it was a delight to learn that our offering of community, storytelling, and creativity is what draws people into the DGP. This took a lot of the pressure off having to be like a coding course or bootcamp, and gave us more confidence to play to our strengths.

Speaking of confidence, that’s probably the biggest, most heart-moving thing I’ve seen happen since starting the DGP. We have team members scattered across a few different pockets of the globe. (It’s quite something to have to arrange meeting times between four or five different time zones!) When our Melbourne team finished our first offline workshop this year in March, a couple of members who were based in Jakarta started shooting their hands up: We can maybe do a similar workshop like this here! A team member based in the US also said: I found a space that can be great for a Data Garden workshop. Maybe I can run one here?

Moments like these, to me, are small seeds of potential for what the project could be. Not just for us to be planting seeds and growing our own little garden, but for it to take root and grow someplace else, adapt to new communities and be implemented in different ways. The baton gets passed on, and that’s what I’m hoping our future consists of. That it lasts far beyond us.

Speaking of the future, what are your future plans for the Data Garden Project?

Right now, it’s about making sure that this project’s legacy lives on. We created our guidebook resource on Notion earlier this year, and I’m very excited for it to be an evergreen resource for computer science educators and creative educators in all parts of the world.

Snapshots of the Data Garden Guidebook resource.

The resource houses our six learning modules, combining written guides and our YouTube video tutorials. It includes a mix of coding challenges, storytelling and writing assignments, simple data visualisation concepts and examples, and workshop or activity ideas for the classroom. For educators, we also include slides and materials for class settings, as well as tips on how to facilitate a workshop drawn from the Data Garden.

A snapshot of Module 1 in the Data Garden Guidebook, side-by-side with notes for facilitation.

I like to say that the resource is complete, but it will always be iterative. There are so many ways educators can enrich it—either from a creative coding or a data visualisation perspective. For instance, we could discuss more deeply the level of data literacy that’s needed to engage in the Data Garden course. In Module 4, we explore how to gather and parse data, and there’s a section there about “Treating Data with Care”, where we are invited to discuss biases or incompleteness in data, and to educate about the power dynamics in data analysis and visualization.

In 2024, I would be excited to see how the Data Garden Project can be implemented in a classroom cohort. There’s a lot more space to build community within the timespan of a semester, a summer camp, or a curriculum.

Last, but not least, we also have an exciting summer workshop in collaboration with MPavilion Melbourne this coming February—a beautiful architectural space that sits just south of Melbourne’s CBD. We’re calling it the Data Stitching & Storytelling workshop. As the name suggests, we will explore how to visualise nature and our surroundings using cross-stitching and embroidery, bringing back tactility and craft as a way of communicating data and introducing computational thinking, so we’re very excited about that.

Example image of the Data Stitching & Storytelling workshop.

What do you think an initiative like the Data Garden Project enables?

In our first Team Tutorial sessions, some of my favourite moments were when my peers would show me and the team the coding projects they made over the weekend, going into the process of how they made it and teaching the team how to create the same.

With every cohort, there’s always the expectant hope to see our learners become teachers in their own right. When we talk about this garden taking root and growing elsewhere, it’s really about young people being able to lead, and share their knowledge through community-led learning. 

Whether that’s sharing about their finished projects on socials, speaking about their work, explaining their processes, and—once they’re ready and willing—passing that knowledge forward, either through creating new learning content with us or running workshops or after-school DGP clubs on their own. I imagine it as this “mitosis” of learning, and I hope to see more of it in the near future.

Arshi, a member of our community from Kolkata, for example, is in the process of building new YouTube tutorials for us, this time a crash course on Tableau, drawing from her professional experience in analytics. We’ll also be looking at ways to equip our team more to facilitate workshops and share Data Garden material. Diversifying our learning content and giving our community the open space to experiment in those ways will be exciting.

That’s splendid. Lastly, where can we find you?

If you would like to partner with us and chat about our resource or community, we’d love to be in touch. You can reach us at datagardenproject@gmail.com and we’re always open to collaborate. 

Want to get involved? We are currently seeking teachers or education partners who would like to collaborate with the Data Garden Project to pilot this resource inside their classroom or community, or adapt it to their existing lesson plan for the new school term. Reach out to us at datagardenproject@gmail.com to get in touch.

The post Introducing Girls To Code, One Flower at a Time appeared first on Nightingale.

The Best Day… To Buy a Taylor Swift Ticket
https://nightingaledvs.com/the-best-day-to-buy-a-taylor-swift-ticket/ (Tue, 08 Aug 2023)
A Taylor Swift fan with little hope of buying a concert ticket used her coding and data viz skills to make her Wildest Dreams come true.
When presale tickets for Taylor Swift’s Eras tour were released in November 2022, Ticketmaster’s website was woefully overwhelmed. The site crashed, bots snatched up tickets, and millions of Taylor Swift fans, after waiting hours in a queue, were left empty-handed. As the digital dust settled, the bad news only continued. Not only did Ticketmaster cancel the general sale (due to dwindling ticket inventory), but the only available tickets were listed for up to 20 times their original price on resale markets.

For context, I have been listening to Taylor Swift since I was 12. Over the years, her music has been the soundtrack to my heartbreak, my happiness, and my growth into womanhood. I have cried, laughed, and belted out to all her songs. And if I don’t sound like a truly mad Swiftie by now, I can say with confidence that seeing her Reputation tour alongside two of my best friends was the best night of my life.

Suffice to say, I could not fathom not seeing her Eras tour.

Looking back on it now, I realize (as ridiculous as it sounds) that I passed through something like the five phases of grief in my search for Taylor Swift tickets. Denial and anger after the initial Ticketmaster fiasco; bargaining as I scoured Facebook and Twitter for resale tickets; depression when I realized there were millions of people just like me, many of whom were being scammed; and, finally, acceptance when I resigned myself to buying marked-up tickets on a reputable site like SeatGeek or StubHub.

At this point it was March, and I was looking to buy tickets for the show nearest to me (MetLife stadium in late May). My only remaining question: was there an optimal time to buy tickets? Was it now? Would tickets only become more expensive? Or was there an intelligible pattern to decode? A reasonable way to buy unreasonably priced tickets?

Ticketmaster, Look what you made me do

To answer these questions, I consulted my inner dataviz engineer. After realizing that manually checking prices on StubHub and SeatGeek was unsustainable, I began doing research on their APIs. SeatGeek, compared to StubHub, had more documentation and sample code available online to access their API, which provided aggregated pricing metrics for each show.

So, for example, on March 22nd I started pulling the average, median, and lowest prices of all SeatGeek listings for the April 13th show in Tampa. Repeating this until the day of the concert, I would be able to see trends in day-to-day ticket pricing, not only for Tampa but for every city and every date on the Eras tour. Initially, I manually added each day’s data to this ongoing dataset, but, for obvious reasons, I then wrote Python code that grabbed the day’s data from SeatGeek and wrote it to a GitHub repository (thanks ChatGPT!).

A screenshot of Python code that grabs information from the API on Taylor Swift concerts, including date, state, city, average price, lowest price, highest price, visible listing count, median price, and other metrics.
Sample data pulled on March 22 from SeatGeek’s API showing pricing for the April 13 show in Tampa.
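The daily pull described above can be sketched roughly as follows. This is a hypothetical reconstruction, not the author’s actual script: the endpoint and the stats fields mirror SeatGeek’s public Events API, while the function names, query parameters, and CSV layout are illustrative.

```python
# Hypothetical sketch of the daily price pull. The endpoint and the stats
# fields mirror SeatGeek's public Events API; names like pull_tour_prices
# and the CSV layout are illustrative, not the author's actual script.
import csv
import json
import urllib.request
from datetime import date

API_URL = "https://api.seatgeek.com/2/events"

def extract_metrics(event):
    """Flatten one event's aggregated listing stats into a CSV-ready row."""
    stats = event.get("stats", {})
    venue = event.get("venue", {})
    return {
        "pulled_on": date.today().isoformat(),
        "show_date": event.get("datetime_local"),
        "city": venue.get("city"),
        "state": venue.get("state"),
        "average_price": stats.get("average_price"),
        "median_price": stats.get("median_price"),
        "lowest_price": stats.get("lowest_price"),
        "highest_price": stats.get("highest_price"),
        "listing_count": stats.get("listing_count"),
    }

def pull_tour_prices(client_id, out_path="eras_prices.csv"):
    """Fetch every listed Eras Tour show and append today's metrics."""
    query = (f"{API_URL}?performers.slug=taylor-swift"
             f"&per_page=100&client_id={client_id}")
    with urllib.request.urlopen(query) as resp:
        events = json.load(resp)["events"]
    rows = [extract_metrics(e) for e in events]
    with open(out_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        if f.tell() == 0:  # brand-new file: write the header row first
            writer.writeheader()
        writer.writerows(rows)
    return rows
```

Run once a day (for instance from a scheduled job), this appends one row per show per day, which is exactly the day-by-day shape the visualization is built on.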

Now here it was: the moment when I’d truly use my dataviz skills for a noble cause. I would visualize this data to see, at a moment’s glance, trends in pricing, and to determine the exact moment when I should buy my tickets. 

What happened next is what I would consider a developer’s dream: the first version of the viz became (more or less) the final version. Creating a data visualization typically involves multiple cycles of design and development, spurred by user testing, to ensure that the final product meets the audience’s needs. Throughout a project I usually have a running list of to-dos and bug fixes. And this, of course, is as it should be.

But this pet project, arguably the least intuitive and worst visualization I’ve made so far, had a very particular audience with very particular needs (It’s me, hi, I’m the user, it’s me!). With limited time, I did not fuss over clear axes or helpful explanations. I did not fret over the ugly UI, the lack of a mobile-friendly design, or (relatively) non-breaking bugs. All the extra care I’d usually apply to making my viz universal (no doubt the crux of our profession) was put into servicing its basic functionality. This was the first time I’d created such a simple and intimate project, and with that came a liberating joy.

A very crude bar chart with no labels or informative markers. There are many pop-ups on the bars showing the date, median price, listings, and city and state, but they all overlap each other.
Bug when hovering on bars while showing a date with missing future data. 

The result … Ready for it?

So how did I use this viz? Now that I’m writing for a larger audience, a longer explanation seems due.

Annotated visualization, showing pricing and listing data for Taylor Swift tickets on Wed. June 7. The data show the number of listings and the ticket price as bars. The x-axis is time and the bar colors distinguish past concerts from future concerts.
Annotated visualization, showing pricing and listing data on Wed. June 7.

In the image above, each set of positive and negative bars represents a show on a particular date. The horizontal axis represents time, the positive vertical axis represents the selected ticket pricing metric (average, median, lowest, etc.), and the negative vertical axis represents the number of SeatGeek listings for that show’s date. Shows typically take place Friday to Sunday. Each bar’s height represents the pricing metric (for positive bars) or number of listings (negative bars) as of the date represented by the red line. This date can be adjusted using the range slider to see the general pricing trend of all shows over time. Hovering on a bar shows the historic pricing for that specific show. 

A line chart showing the median price for the concerts at East Rutherford, NJ. The median price drops from $5,269 on Thursday May 25 to $2,978 on Friday, May 26.
Historic median price for the Sun. May 28 show at MetLife Stadium in New Jersey, showing a 43% decrease in median price the week before the concert.

Play with the live viz here

So what did I divine from all this work? Generally, I noticed that prices increased in the weeks before a show, with a spike occurring in the middle of the week leading up to the show. Then, however, something interesting happened: On the Friday before a show, prices tended to drop dramatically. Combing through Swiftie Facebook groups and Twitter accounts, I realized that this was caused by tickets that Ticketmaster was releasing the very weekend of the concerts. Unsurprisingly, many of these tickets were then immediately posted for resale on SeatGeek, thus increasing supply and decreasing the price of tickets. Since buying the face value tickets released from Ticketmaster would be near impossible (though of course I’d try), this would have to be my repurchase window–at the eleventh hour, the very weekend of the shows. Though waiting till the last minute seemed risky (and oh, how that wait gave me a few extra grays), I decided to trust my visualization.

When Ticketmaster released additional tickets two days before the concert, I bought a resale ticket on SeatGeek for the May 28th MetLife show. The ticket—eye-watering transaction fee included—was not cheap. And by ‘not cheap,’ I mean it was expensive—as in, ‘a month’s rent in New York’ expensive, or, for the more mathematically inclined, ‘add an extra zero to the original price’ expensive. As my adrenaline waned, a sobering reality set in. What had I just done? Had I really spent all that money in one fell swoop? And what if the concert turned out to be just like any other? What if it failed to meet my impossible expectations?

All weekend long, I questioned my decision, sick with both buyer’s remorse and that hopeful malady known as excitement.

Buyer’s remorse? Shake it off

But to say that the show didn’t disappoint is a vast understatement. It was the best night of my life (yes, humbly dethroning my previous best night, at her Reputation concert). Her singing was flawless, her performance intimate. The stage sets were immersive and grand, the lighting mesmerizing and psychedelic. From nosebleed seats, a normally disappointing bird’s eye view was transformed into a unique perspective of coordinated visual effects. In ‘Mastermind,’ she sings, ‘Checkmate, I couldn’t lose,’ and at one point a shifting chessboard was projected onto the stage floor, with dancers standing in for chess pieces—a sight unavailable to the fortunate few with floor seats. Everything—the lights, dancers, and sets—was coordinated in a manner that transcended a normal concert and approached something closer to a Broadway show or, as a devout Swiftie might say, a religious experience.

Image of Taylor Swift on stage with a giant screen behind her showing her singing into a microphone.
Author’s image, May 28 at MetLife Stadium.

As she traversed the eras of her career, so I traversed the eras of my life. When she sang about losing her grandmother in ‘marjorie,’ I looked up at the sky and fought back tears thinking of my Grandpa. When she sang ‘Shake It Off,’ I recalled belting out the very same lyrics with my college roommate as we commiserated over stupid boys. When she crooned about the first fall of snow in ‘All Too Well,’ I suddenly remembered leaving a college party late one night and being struck by the sight of snow falling fresh in New York City. I remembered how the streetlights had glowed with an aura of snowflakes; how I had listened with uncanny amazement to the unusual silence; and how, upon seeing the magical sight, I shared a moment of truce with a guy I was on the rocks with.

And here’s the thing: I wasn’t the only one having this experience. It was as if she touched every person in that stadium of 80,000. When she sang ‘betty,’ a recent song from folklore, I was shocked by the teenage girls around me who shouted along. The song, about a high school love triangle, was one where, despite loving the music, I’d found the lyrics a bit immature. But now I realized that Taylor, while maturing in her musical themes, still made an effort to connect with a younger audience, much in the same way that ‘The Story of Us’ had connected with me a decade earlier. And hearing it again, ‘betty’ became clever in a way that her earlier songs weren’t, incorporating intentional storytelling that deviates from her usual autobiographical style. (My turn to scream came a bit later, during the most recent era of her life, when the lyrical themes shifted from young love and heartbreak to the competing obligations of a career, relationships, and societal expectations.)

But the most touching moment of the concert occurred when, in the surprise acoustic section, Taylor sang ‘Welcome to New York,’ a synth-pop anthem from her 1989 album. At home, I’d normally skip this song, finding its beat a little too relentless. But hearing it intimately stripped down to her voice and her strummed guitar chords, I realized that my journey to standing in that stadium began much more than a few months ago.

I moved, not just to New York City, but to America 10 years ago. It was as far removed from a small Caribbean island as it was possible to be, and I distinctly remember the initial feeling of panic. Through the ups and downs, I made this my second home. And those ups and downs proudly mark the NYC era of my life. I made best friends and met the love of my life while surviving the stress of my undergraduate engineering degree. I struggled through multiple job hunts and a career pivot, but now get to do what I love every day (moving through appropriate design-development iterations of course!). It was a new soundtrack and I did dance to this beat – still do. 

So after this experience, I can see why Eras tour prices have only kept increasing over time… I may or may not be updating my visualization to keep an eye on future ticket prices…

Editing support: Rob Aldana

The post The Best Day… To Buy a Taylor Swift Ticket appeared first on Nightingale.

Visualizing World Governance Data as a Flower Garden https://nightingaledvs.com/world-governance-dataviz-flower-garden/ Thu, 13 Apr 2023 17:49:11 +0000 https://dvsnightingstg.wpenginepowered.com/?p=16810 How to create a floral-themed visualization with Power BI tools that captures how we govern our world.

The post Visualizing World Governance Data as a Flower Garden appeared first on Nightingale.

Preparing the soil

Recently, I had the chance to participate in the World Data Visualization Prize, which was part of the 2023 World Government Summit. I would like to share my experience with you.

As a Power BI developer, I used to face limitations with Power BI’s visualization capabilities. To create better visuals, I had to resort to workarounds, which took time and didn’t always deliver the desired results. Then I discovered Deneb, a custom visual for Power BI that uses the Vega visualization grammar. It gave me full control over chart elements and their attributes, and a lot of inspiration.

Over the past few months, I’ve spent a lot of time creating Vega visualizations and publishing them on my website. I replicated some great visualizations created by others and also created a bunch of my own designs. But I was ready for a new challenge.

Planting the seeds

When I found out about the World Data Visualization Prize in my Twitter feed, I knew I had to participate. I usually work with confidential client data, and this seemed like a great opportunity to create a visualization based on open data and share it with a broader audience. With only two weeks until the final submission deadline, I quickly reviewed the data provided by the organizers.

The data included multiple metrics, but I was particularly drawn to the World Governance Indicators (WGI). The WGI project reports governance indicators for over 200 countries and territories, for six dimensions of governance: voice and accountability, political stability and absence of violence/terrorism, government effectiveness, regulatory quality, rule of law, and control of corruption. I decided to use a data-driven floral diagram to show the six different indicators using a single visual element—a flower with six petals. I also encoded multiple years of data by using multiple lines to draw each petal.

Image of Singapore's flowers and Afghanistan's flowers. Each petal represents a different metric and is color coded from red (worse) to blue (better). Singapore's petals are mostly blue hues, with the exception of the "voice and accountability" metric, which is yellow (a hue in between the blues and reds). Afghanistan's are mostly orange, with the "political stability and absence of violence and terrorism" metric in deep red.
A zoomed-in image of the flower design. Each petal is a governance metric from the dataset. Each petal has a line representing a year from 2012 (inner) to 2021 (outer). Each color represents a scale of better (more blue) to worse (more red) in comparison to other countries.

The bloom

I created my first flowers, which looked great, but the layout was too simple: almost 200 flowers (each representing a country) in a simple grid design. I wanted to create something more visually appealing and I wanted to add a GDP-per-capita dimension. So I resized the flowers depending on GDP per capita and used a “force” transformation in the Vega grammar. This brought all the flowers as close together as possible without overlapping. This was an improvement, but now it looked like a flower cloud and the positions of the flowers didn’t hold any particular meaning.

To solve this, I used each country’s latitude and longitude to place it on its geographical location (using a simple equirectangular projection). Then, I reapplied the “force” transformation. This added spatial dimension and created a “floral cartogram.” I also changed the background to black as I knew from previous experience that bright colors and many lines look better on a black background.
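The projection step is simple arithmetic: an equirectangular projection maps longitude linearly to x and latitude (inverted, since screen y grows downward) linearly to y. Here is a minimal Python sketch of that mapping (the canvas size is a made-up assumption; the article's actual layout was built inside Vega with its projection and force transforms):

```python
def equirectangular(lon, lat, width=800, height=400):
    """Map (lon, lat) in degrees onto a width x height canvas.

    Equirectangular projection: x is linear in longitude,
    y is linear in latitude (flipped so north is at the top).
    """
    x = (lon + 180.0) / 360.0 * width
    y = (90.0 - lat) / 180.0 * height
    return x, y

# A flower placed at roughly New York's coordinates (hypothetical input)
x, y = equirectangular(-74.0, 40.7)
```

Each flower gets an initial (x, y) from a mapping like this, and the force transform then nudges overlapping flowers apart while keeping each one near its geographic anchor.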

The rest of the process was mainly technical: how could I make my floral cartogram perfect? I had to add a legend and adjust spacing, font sizes, label positioning, and other parameters to create the best-looking visualization. I also printed the cartogram to check its appearance on paper, but printing a black background quickly used up a printer cartridge.

These small adjustments took a considerable amount of time and prompted a lot of emotionally charged questions. Is it good enough already? Should I seek a second opinion? (I asked my wife’s opinion and she was very helpful.) Is it a work of art or science? Did I make the chart more visually appealing, but less functional? Well, there is no single answer to these questions. Good data visualization is always a compromise between multiple choices, and eventually I had to stop asking and call it done.

A visual of the information graphic showing 200 countries as flowers of different sizes, with size representing GDP per capita. Each flower is placed roughly in the geographic location of the country on a flat world map. Each flower has six petals, each representing a metric of governance. Each petal contains 10 concentric lines, representing years from 2012 to 2021. Each petal is color coded on a shaded scale of red (worse) to white (middle) to blue (better).
The final layout for the data garden.

From flowers to fruits

After I was satisfied with the result, I submitted my visualization to the organizers of the World Data Visualization Prize. I was not lucky enough to win the prize, but my visualization made it onto the long list of best visualizations. Regardless, I am pleased with the outcome, and the creative process was inspiring in and of itself. 

While working on this project I was also reading Nightingale magazine, and I realized it would be great to make my visualization and story available to others who might find inspiration in the process and the final result. And here I am, with my article in Nightingale. I am thrilled, and it’s time for me to start looking for new data visualization challenges.

I hope my experience will inspire Nightingale readers, especially those who feel limited by the technical possibilities of their primary data visualization tool, to experiment with alternative ways to visualize data, be more creative, and cultivate their own data visualization gardens to make the world of data visualization more beautiful. 

How I Created a Data Visualization With Zero Coding Skills, Thanks to ChatGPT https://nightingaledvs.com/data-visualization-using-chatgpt-to-code/ Tue, 04 Apr 2023 17:12:26 +0000 https://dvsnightingstg.wpenginepowered.com/?p=16685 An exercise in building a data visualization with ChatGPT writing—and debugging!—the code.

The post How I Created a Data Visualization With Zero Coding Skills, Thanks to ChatGPT appeared first on Nightingale.

This isn’t part of my product journal series (if you’re interested, feel free to check it out), but I wanted to share my journey of leveraging ChatGPT to create data visualizations. I teach data visualization at The New School, and common feedback I receive from my students and colleagues is:

Data visualization is cool but at the same time it’s bit daunting that I need to know lots of tech stacks to actually implement it.

I totally agree. When I was studying data visualization, I spent a pretty substantial amount of time learning to code, handle web hosting, and work with Python, SQL, and more, all while absorbing knowledge on information visualization.

Thankfully, we no longer need to wrestle with this field’s technical gatekeepers. This doesn’t mean that technical knowledge is not valuable, but rather that we no longer need to be intimidated by technology because AI can spoon-feed us knowledge and do the heavy lifting for us. Are you excited? Let’s get started!

I’m going to build the data visualization that one of my students posted on weekly write-up homework.

1. Find data source

Even finding data can be pleasant with ChatGPT.

Posting from a student work website at Pratt Institute about art galleries and museums in New York City. The image is a map of New York City, with the five boroughs highlighted in different colors and a dot for each gallery or museum location.
Posting from a student work website at Pratt Institute about art galleries and museums in New York City.

It’s always crucial to honor the original reference. And I’m glad that my student did it for the original publisher of the data visualization.

If you hit the link, you will see the documentation on how the visualization was built, as well as a link to the original source of the data (NYC Open Data). GREAT!

You can download the data by exporting it in CSV format from NYC Open Data, as shown below.

An image of the original data set available for download from NYCOpenData. The writer has circled the "Export" and "CSV" buttons to indicate how to download the data.
The original source on the NYC OpenData site, which allows the data to be downloaded to a CSV file.

Once you download and open the data, you will be able to see how the table is structured.

The CSV file is a table with headings for geographic coordinates, name, telephone number, and URL. The geographic information is in latitude-comma-longitude format.
A look at the data structure in table format

The lat, lon coordinates sit together in a single column, which seems a little hard to use. So we need to massage the table. How? You guessed it: ChatGPT.

2. Data processing

Fear not: ChatGPT will guide us. Let me walk you through the process, assuming you know nothing about it.

Ask the right question to ChatGPT

An image of the question posed to ChatGPT: I would like to visualize the art gallery locations in NYC, and I got the data about it but the data I got seems to have lat, lon coordinates in one column in the table, which doesn't seem helping. How do separate teh column and having two separate columns? One for lat and the other for lon coordinates. And also I would like to delete the original column. Please write Python code to do that. For your information, the file name I have is "ART_GALLERY.csv" and the name of hte column which has lat lon coordinates together in a column is labeled "the_geom"."
Asking the question.

What I’ve noticed is that you have to be extremely specific about what you’re looking for. ChatGPT isn’t a mind reader, so you shouldn’t expect it to understand vague or unclear questions. The less specific you are with your query, the more follow-up work you’ll need to do, and that’s not ideal. I want ChatGPT to do the work for me and provide me with the final answer right away.

An image of the lines of Python code produced by ChatGPT
Python code by ChatGPT
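The generated script only appears in the screenshot, but the transformation it performs is straightforward: split the combined coordinate column into separate numeric `lat` and `lon` fields and drop the original. Here is a standard-library sketch of that same operation (ChatGPT's version used pandas; the sample rows and the "lat, lon" value format below are illustrative assumptions, and the real `the_geom` column in the NYC dataset may be formatted differently):

```python
import csv
import io

# Stand-in for ART_GALLERY.csv (hypothetical sample rows; the real
# file's "the_geom" values may use a different format, such as WKT).
raw = io.StringIO(
    "name,the_geom\n"
    'Gallery A,"40.7128, -74.0060"\n'
    'Gallery B,"40.6782, -73.9442"\n'
)

rows = list(csv.DictReader(raw))
for row in rows:
    # Split the combined column, convert to numbers, drop the original.
    lat, lon = (part.strip() for part in row.pop("the_geom").split(","))
    row["lat"] = float(lat)
    row["lon"] = float(lon)
```

After the loop, each row carries separate numeric `lat` and `lon` fields and no `the_geom` column, which is the shape the rest of the walkthrough assumes.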

Boom, you’ve got the code! But that’s not the end of the story. Do you know how to run it? If you’re not sure, don’t worry, ChatGPT can help you figure it out!

Don’t be shy. You can just drop a question as below.

An image of a question posed to chatGPT: "I don't know how to run a python code. Is there an application I could run Python code? The response is a list of programs including Python IDEs, Jupyter Notebook, and Python REPL, with a brief description of each.
Run code like a dummy.

Great, now you know how to run Python code! Personally, I prefer Jupyter Notebook, which has a pretty name. Let’s keep asking some more dummy questions!

An image of a question posed to ChatGPT: "So... What is the easy wat to get Jupyter notebook ok my mac?" The answer is a step-by-step set of instructions on how to install the program.
Installing Jupyter.

Cool! Let’s get Anaconda by clicking on the link generated by ChatGPT, or simply searching for ‘Anaconda’ on Google. Once you’ve installed and opened Anaconda, you won’t have any trouble finding the Jupyter Notebook icon — you just need to have decent eyesight for this task!

A screenshot of the website where you can download the Jupyter notebook. The icon for it is in orange, with stars around it.
Jupyter Notebook yelling at you.

CheatGPT

Now we are SO READY to write code. Let me correct myself: now we are SO READY to copy and paste the code written by ChatGPT.

An image of the open Jupyter notebook, with a focus on the dropdown that says "New"
Creating a new Python file.

Click on the ‘New’ button and select ‘Python 3’. Python 3 is simply the latest version of the Python programming language, and it’s the only choice available in Jupyter Notebook. Paste your code into the notebook and hit the play button to run it.

An image of the Jupyter notebook with an error message.
Error

Ouch! It doesn’t work. How do I fix it? You guessed it, ChatGPT. Just paste the error message you got.

An image of the ChatGPT tool, where the user has asked the program what to do about the error message, which they have pasted into the chat.
Solution

It looks like the Python code is in a separate file, and your CSV file isn’t in the same location. Let’s move the CSV file to the same folder as the Python file. First, let’s save the Python code. After saving the code with the name ‘MyCode’, you should see the file saved in the following screen. It appears that the file was saved in the outermost folder on your Mac. (By the way, ‘ipynb’ is just the file extension for a Jupyter Notebook file — in this case, it’s your code.)

A screenshot of the window where the file gets saved
My code and directory

Based on ChatGPT’s advice, let’s move the CSV file to the same location as the Python file. That way, we’ll be able to work with both files more easily.

Let’s run the code again since I’m now confident that the code would work.

A screenshot of the code, again with an error message.
Error, again.

Again? I was overconfident. Let’s just copy and paste the error message into ChatGPT.

A screen shot of ChatGPT's response. It reads, in part, "The 'ValueError' you encountered occurs when the number of columns in the new DataFrame you're trying to create using the 'str.split()' method does not match the number of column names provided in 'df[["lon", "lat]]'."
What are you talking about?

ChatGPT said something, but I didn’t want to dig into what it meant since it was a bit too much. So I just typed in: “Please write a new code to solve the problems you mentioned”.

Chat GPT's reply, "Here is a revised code that should solve the issue of splitting the "the_geom" column into "lat" and "lon" columns. It follows with a new script.
Revised code.
An image of the new code in Jupyter notebook
Voilà!

Voilà! Done. Let’s download the result. How? You guessed it: ChatGPT.

An image of the user asking Chat GPT: "Let's write a code so that I could save the result as 'modified_geom.csv." The revised code is produced below the question.
Save it as CSV.

Life can’t be this easy, but that’s exactly what it is with ChatGPT. Now, how do I get borough information for each set of coordinates? Once again, the answer is ChatGPT!

User asking the Chatbot how to find the file and asking for a column of data that indicates which borough the geocoordinates are in. The chatbot provides a link to a dataset that contains that information and notes that it is in geoJSON format.
How do I get the additional information?

ChatGPT provided me with a link to download the necessary data, and I downloaded the GeoJSON file it recommended.

The chatbot provides a sample of code to help get the user started on integrating that new data.
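Under the hood, "which borough contains this point?" is a point-in-polygon test against each borough's GeoJSON boundary. Here is a self-contained ray-casting sketch of that core operation (the rectangular "borough" below is a made-up toy polygon; real borough boundaries have thousands of vertices, and the code ChatGPT generated presumably leaned on a geospatial library for this):

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside `polygon`,
    given as a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray cast eastward from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

# Toy rectangular "borough" (hypothetical coordinates)
borough = [(-74.05, 40.68), (-73.95, 40.68), (-73.95, 40.75), (-74.05, 40.75)]
```

Running this test for each gallery point against each borough polygon is what produces the extra borough column.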

I got an error and felt too lazy to figure it out myself.

Image of the user asking ChatGPT to update the code, with the error code pasted into the question.
Pasting error code.

Ta-da!

Voilà! Again!

I honestly didn’t expect to get this far, but with ChatGPT’s help, I finally got the result I was looking for! It’s worth noting that I encountered several different errors along the way, but all I had to do was copy and paste the error messages and ask ChatGPT to regenerate the code to fix the problem.

3. Decide visualization method

Now only the visualization part is left. Shall we write it ourselves? NO.
Let’s ask, then copy and paste. Let’s start with the fundamental question.

Seeking a consulting solution from ChatGPT

User asks ChatGPT "I want to create a data visualization on the NYC map using Javascript, HTML, CSS. What would I need?" The answer is a 6-step instruction list on the various data and libraries that are required.

How did I already know that HTML, CSS, and JavaScript would be necessary? It doesn’t matter, since now you know it too by reading my post 🙂

Now I can start to ask several questions based on the answers.

  1. A map API and D3.js
  2. A CSV file, which I already have
  3. HTML, JavaScript, and CSS
  4. A code editor
  5. A browser, which I already have

4. Implementation of design

Getting HTML

Ask ChatGPT to generate the backbone HTML file for the data visualization.

Asking ChatGPT: "Okay, let's start with HTML. As you said, I could to use map API(Mapbox) and D3.js. Please write HTML code to use them first, and let's create javascript and css later." ChatGPT offers sample code to get started.
An image of ChatGPT telling the user that the code includes CSS and Javascript libraries for Mapbox and D3.js. It notes that you will need a Mapbox access token to use the Mapbox API.

The program indicates that I need a ‘mapboxgl’ accessToken to proceed. Since this is private information, I cannot share it here. However, you can generate one by signing up for Mapbox.

Since we have the HTML code above, let’s move on to the JavaScript and CSS.

The user asking ChatGPT to write javascript code for the HTML written above. The response is a string of code.

Data visualization made solely by ChatGPT

Take a look at the final result below. Isn’t it amazing to have a custom data visualization piece without having to write a single line of code?

The image is a map of New York City, with the five boroughs highlighted in different colors and a dot for each gallery or museum location.
Final result – Cool map data visualization

If you’ve made it this far with me, you should be proud of yourself! Before I wrap up this article, let’s summarize the workflow and compare what it would have been like without ChatGPT.

A table comparing the process with and without ChatGPT. Without ChatGPT, the sources must be googled or searched for in a database, the data processing requires Python and SQL, the visualization method requires Tableau, D3, Mapbox, and a notebook, and the design requires HTML, JS, CSS, and debugging. With ChatGPT, you can ask it to find the source, tell it what you want to do with the data, ask it to decide on a visualization method, and have it write code to implement the design you want.

Find data source

Without ChatGPT, finding the data you need can be a painful process. You have to try different search words and sift through the search results that Google presents to you. With ChatGPT, it’s a different game altogether. You simply type in what you want and ask for the link.

Data processing

This is often the most intimidating part for those of us without a background in Python or other data processing platforms. In most cases, the data you find online won’t be in the exact format you need. That’s where tools like Python and SQL come in handy — you can use them to process the data and extract the information you need.

Visualization method

After preparing the data, the next step is to decide how you want to display it and what tools you want to use to accomplish that. This can require a decent amount of technical knowledge and familiarity. However, with ChatGPT, you can get clear guidelines and an implementation plan to help you navigate this process.

Implementation of design

This can be the biggest hurdle to overcome, and I would call it the final gatekeeper of the world of data visualization. However, you can tackle this final boss by leveraging the power of AI and move forward with ease.

I mentioned this casually at the very beginning of this post:

This doesn’t mean that technical knowledge is not valuable, but rather that we no longer need to be intimidated by technology because AI can spoon-feed us knowledge and do the heavy lifting for us.

Ironically, this was the most critical lesson I took from the entire process of creating a visualization without writing the code myself.

While you can create stunning visualizations with ChatGPT, knowing how to code opens up even more possibilities. Especially when you encounter an error, the debugging process can be exhausting if ChatGPT is unable to identify the cause immediately. Additionally, ChatGPT does not retain memory of previous conversations, so pairing some coding knowledge with ChatGPT helps streamline the process.

I want to encourage my audience to use this as a starting point to become more interested in data visualization and coding.

NOTE: A version of this article was originally published on Medium.

Does Twitter’s Algorithm Hate Your Friends? https://nightingaledvs.com/does-twitters-algorithm-hate-your-friends/ Tue, 29 Mar 2022 13:00:00 +0000 https://dvsnightingstg.wpenginepowered.com/?p=10723 I love Twitter. Well, I love my community on Twitter. I love learning about new art, new events, scientific discoveries, and social movements new and..

The post Does Twitter’s Algorithm Hate Your Friends? appeared first on Nightingale.

I love Twitter. Well, I love my community on Twitter. I love learning about new art, new events, scientific discoveries, and social movements new and old. But when I’m scrolling through my feed, all I see is a certain style of post: engagement-bait. The content I want to see never makes it to my feed.

I’ve curated a pretty good list of those interesting people that I follow. When I go to their individual profiles, I see a wealth of enriching information and creativity around art, AI, political philosophies, new technology, etc. But I rarely see any of that content on my main feed. It all gets drowned out by the sure-fire, dopamine-rush tweets that get thousands of likes and retweets, or the “Main Character” being ratioed to oblivion by pilers on. There’s no way my friends’ deep discussions and beautiful creative posts–with their paltry dozens of likes and maybe a retweet or two–could compete.

Which is exactly what Twitter’s algorithm (“The Algorithm”) is optimized to do. Those kinds of engagement-bait tweets keep us scrolling, keep us engaging, keep our brains trapped in an attention cage.

Rekt· Replying to @TwitterSupport DEFAULT TO LATEST PLEASE
thomas violence  Replying to @TwitterSupport why does this app hate its users so much. did they wrong you somehow jeez
Twitter Support Appreciate the feedback. We're always looking for ways to give you more flexibility and control over what you see in your timeline. 
pat (sochie heim respecter) Replying to @TwitterSupport i respect your steadfast committment to constantly making this website worse
The PS1 startup sound as a Lesbian @janusrose Replying to @TwitterSupport JUST LET PEOPLE SEE LATEST TWEETS BY DEFAULT, MY GOD
Spencer Roberts The consistency is at least refreshing.
The PS1 startup sound as a Lesbian ♡ @janusrose · 2d every time you open the app it defaults back to Home. this is anti-user, coercive design and you know it. nobody wants this.
Katie Mack ♡ Replying to @TwitterSupport I would like to delete Home from my view and also from existence please allow this thank you 
Taylor Aynes Replying to @TwitterSupport I don't want “home” now or ever. This should not be the first thing I see every time I open the app. Let me get rid of this. I want to see tweets chronologically. Full stop. 
Joshua Jarvis Yes, latest only and please stop aggregating tweets. I want chronology and I want every update, in order, NOT algorithmic prioritization which only forefronts and intensifies into populist pedestals that the web shouldn't even have.
Twitter users replying to Twitter’s latest announcement about defaulting to the “Home” feed algorithm instead of a chronological feed.

I was really getting fed up with The Algorithm hiding the content I wanted to see. I felt like I never saw tweets from people I actually follow, but rather from randos that The Algorithm thought would keep me hooked. Many others, it seems, feel the same way. But to be sure, I needed to collect data to test my hypothesis, and so I began this experiment.

A system of self-surveillance 

The first step was to record and measure what The Algorithm was shoving into my eyeballs. So I wrote an app to surveil my screen every time I opened the Twitter app.

Recording of me scrolling through Twitter with my Twittysnitch app saving all tweets: the green outlines are roughly what Twittysnitch identifies as an individual tweet to parse.

My app appropriated Android’s Accessibility Service system, which is normally used to give people with disabilities alternate ways to experience an app–like screen readers that read on-screen text aloud for blind users. (This technique has been used in the past to automate tasks, fix broken or unsupported abandonware, even help exploited workers fight back against gig economy corporations trying to surveil and control them. It’s a testament to Android’s open and extensible nature.) 

My app scans all the text on my Twitter main feed, parses it into individual Tweets, and then saves it to a database. It does this constantly, every time I scroll. I collected thousands of tweets over dozens of hours, scrolling through Twitter over a month-long period. This way, I gathered the raw data to test my hypothesis. 

(Why didn’t I use the Twitter API, you ask? Well, Twitter rejected my request to use their API after I told them my intent to write this article. I guess they don’t want people digging in and revealing how their system is rigged. 🤷‍♂️)

Chewing the feed data

Once I got the raw data from my app, I parsed it into relevant portions, and then imported it into Observable to start visualizing the data. All my data and code is in this Observable notebook if you want to explore more deeply.

First I came up with a primitive metric I call “total engagement,” which is the sum of the number of likes, retweets, and replies that a tweet gets. The following graphs use this metric as a general way of measuring how many people have interacted with the content (which I’m guessing is close to how The Algorithm decides to show things to you anyway.)
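As a rough sketch of that metric (in Python, with hypothetical field names; the author’s actual code lives in the Observable notebook), “total engagement” is simply a sum:

```python
# Minimal sketch of the "total engagement" metric described above.
# The tweet fields (likes, retweets, replies) are illustrative names,
# not the actual schema used by the article's parser.

def total_engagement(tweet: dict) -> int:
    """Sum of likes, retweets, and replies for one tweet."""
    return tweet["likes"] + tweet["retweets"] + tweet["replies"]

tweets = [
    {"likes": 120, "retweets": 30, "replies": 5},
    {"likes": 2,   "retweets": 0,  "replies": 11},
]

scores = [total_engagement(t) for t in tweets]
print(scores)  # [155, 13]
```

The same per-tweet score is what the graphs below use as their engagement axis.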

Stranger danger

Let’s start by just comparing how many tweets in my feed are from people I follow (hereby labeled as “friend”) versus strangers.

Out of 3,200 tweets, only 31 percent of them are from people I follow.

Looks like around one third of my feed is tweets from friends, and two thirds are from strangers and ads. But let’s go a bit deeper into why.

The distribution of friends versus strangers based on engagement, showing how many times a tweet was retweeted or liked. (Hover over a dot to read the tweet content).

We start to see that the less popular tweets are from people I follow, whereas the really popular tweets with loads of retweets and likes are mainly strangers.

Maybe I just don’t follow enough popular people who tweet out bangers with hundreds of thousands of likes? I’m curious to see how these numbers compare to someone who primarily follows large accounts, and if it’s still an easy one-third split between friend/stranger or if the division reflects popularity.

How many tweets appear in my feed from friends versus strangers, plotted against the tweets’ popularity.

Even at this log scale, it’s clear that not many of my friends post popular tweets, and thus don’t show up in my feed. Without the log scale, it’s even more dire:

Same graph as above, but in linear scale.

Who is in my feed?

Let’s see how many of the people I follow actually show up in my feed.

All the people I follow, broken down by if they are featured in the news feed or not.

I am following over 2,000 people, so to only see tweets from 10 percent of them is disconcerting; 90 percent of the people I intentionally follow, and want to hear from, are being ignored/hidden from me. When we dig deeper, it gets even worse.

Top Tweeters

Here’s a breakdown of repeated posters, who appear in my feed multiple times.

Number of tweets in feed, by person.

You can see a good portion of them aren’t even people I follow. And even of the people I do follow, a small percentage of them take up a large amount of my feed.

Here is the same graph, color coded by how popular each tweet is:

The number of tweets in my feed, by person, color coded by total engagement. (Hover over a tweet to see the breakdown.)

It’s interesting to see that some of the top repeat tweeters have content with only a moderate amount of engagement. I’m guessing this means these people are shown to me more often because I interact with them more often. And because I see them a disproportionate amount of the time, I interact with them more, which only compounds the problem, shrinking my “bubble” through recursive self-selection. To borrow a phrase from machine learning, this could be called “overfitting”: The Algorithm focuses too much on a narrow band of people because I’m only able to interact with the narrow band of people it shows me.

Why are they in my feed?

Tweets shown by friends’ recommendations:

Distribution of tweets that appeared in my timeline because a person I follow liked, retweeted, replied to it, or followed the stranger’s account.

It looks like the majority of tweets are from strangers recommended to me because someone I follow liked or retweeted them, while another 13 percent are ads:

Proportions of recommended tweets versus ads versus tweets from people I follow.

Then there’s a whole 11 percent of tweets that I cannot classify with my parser (labeled “Stranger”). These are probably “viral” tweets or tweets on topics that Twitter thinks I would like, even though they have no connection to me or people I follow.

Top Tastemakers

We also see that a large portion of strangers’ tweets appear in my feed because they were recommended by the same small subset of people I follow. I call these “The Tastemakers” since apparently these few people get to dictate what I see in my feed. 

Number of recommended tweets that show up in my feed, grouped by who recommended them.

How current is this content?

Timeline of tweets: the x-axis is how long ago the tweets were posted relative to the time I saw them.

Analyzing the relative time posted, I found that most of the tweets from people I follow are recent (within a day), whereas tweets older than a day are mostly from strangers. I guess Twitter wants me to catch up on the drama happening in other areas I’m not actively following. (Or maybe it takes longer for strangers’ tweets to get enough engagement to propagate to my feed.)

Unfortunately, this means that the majority of tweets I see are relatively old and outdated. Conversely, I’ll miss posts from friends who tweet infrequently if their tweets are more than a day old by the time I log in (with my level of Twitter addiction, though, that’s not likely, haha).
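That recency split can be sketched in a few lines (timestamps and the one-day cutoff mirror the analysis above; the field names are hypothetical):

```python
from datetime import datetime, timedelta

# Sketch of the recency breakdown: how old was each tweet when it was seen?
# The one-day threshold matches the analysis above; all values are illustrative.

def age_when_seen(posted: datetime, seen: datetime) -> timedelta:
    """How long ago a tweet was posted, relative to when it appeared on screen."""
    return seen - posted

def is_recent(posted: datetime, seen: datetime) -> bool:
    """True if the tweet was less than a day old when it appeared in the feed."""
    return age_when_seen(posted, seen) < timedelta(days=1)

seen = datetime(2022, 3, 1, 12, 0)
fresh = datetime(2022, 3, 1, 9, 0)    # 3 hours old when seen
stale = datetime(2022, 2, 26, 12, 0)  # 3 days old when seen

print(is_recent(fresh, seen), is_recent(stale, seen))  # True False
```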

What about tweet quality (sentiment analysis)?

I then tried some sentiment analysis using VADER, to see if The Algorithm was feeding me overly negative or positive content.

Tweet sentiment by how popular they are. (Hover over any dot to read the tweet text.)

Interestingly, it seems to be pretty evenly distributed, maybe a little heavier on the positive side. I wonder if the Twitter algorithm takes sentiment into account when deciding what to show you, or if it’s just a natural distribution of popular tweets.
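VADER is a lexicon-based scorer; as a drastically simplified illustration of how that style of scoring works (a toy, not VADER itself, which also handles negation, intensifiers, punctuation, and emoji):

```python
# Toy lexicon-based sentiment scorer in the spirit of VADER. The lexicon and
# weights below are invented for illustration only.

LEXICON = {"love": 2.0, "great": 1.5, "meh": -0.5, "hate": -2.0, "awful": -1.8}

def sentiment(text: str) -> float:
    """Mean lexicon score of known words; >0 reads positive, <0 negative."""
    scores = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(sentiment("I love this great chart"))          # 1.75
print(round(sentiment("awful take, I hate it"), 2))  # -1.9
```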

Who is today’s Main Character?

Then, I visualized the “Ratio”, i.e., the proportion of likes to replies, in order to spot controversial statements. Anything below the zero line has more replies than likes, which could be an indicator that it has fired people up enough to respond. (Hover over any dot below to read the tweet text.)

The “Ratio” of likes to replies; anything below the zero line has more replies than likes. (Hover over any dot to read the tweet text.)

It seems like The Algorithm doesn’t optimize for showing me the deeply controversial tweets, but a few do sneak in there. Judging by the amount of orange below the line, apparently I follow some real rabble-rousers.
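One plausible way to compute such a “Ratio” with a symmetric zero line (an assumption on my part; the article doesn’t give its exact formula) is a log ratio of likes to replies:

```python
import math

# Sketch of a "Ratio" for spotting controversial tweets: log10(likes/replies).
# A value below zero means more replies than likes ("getting ratioed").
# Counts are illustrative; the article's exact formula is not specified.

def ratio_score(likes: int, replies: int) -> float:
    # Add 1 to each count so tweets with zero likes or replies still score.
    return math.log10((likes + 1) / (replies + 1))

print(round(ratio_score(999, 9), 2))  # 2.0  -> far more likes than replies
print(round(ratio_score(9, 999), 2))  # -2.0 -> a likely Main Character
```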

How do we fight The Algorithm?

So how do I interact with those 90 percent of people I chose to follow but never see? How do we win back our attention from the clutches of The Algorithm? 

The short-term workaround I found was to create Lists of people I follow, divided up by subject/category: AI/ML, dataviz, UX designers, philosophy, artists, etc. Then I add those Lists as Tabs at the top of the screen. Those tabs seem to be purely chronological, bypassing The Algorithm and culling a lot of the cruft from people I don’t follow. I end up seeing content from people who never show up in my main feed. Of course, this only works in the Twitter app, and only for now.

To solve this issue longer term, and for other social networks governed by similar attention-capturing Algorithms, we need to rethink how these platforms are designed and built. Stephen Wolfram proposed letting people choose their own Algorithm: the social platforms would still host your data and provide the interface, but the way the data is aggregated, sorted, and displayed to you would be customized by pluggable algorithms. If any entity could create and share an algorithm, an entire marketplace of algorithms would emerge, allowing for competition and choice in how you consume your content.
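Wolfram’s pluggable-algorithm idea can be sketched as an interface where the platform renders the feed but the user picks the ranking function (everything here is hypothetical; no platform exposes such an API today):

```python
# Hypothetical sketch of a pluggable feed algorithm: the platform hosts the
# tweets and renders the feed, but the user chooses the ranking function.
# Field names and both ranking functions are illustrative only.

def chronological(feed):
    """Newest first: the feed many users asked Twitter to restore."""
    return sorted(feed, key=lambda t: t["posted"], reverse=True)

def engagement_first(feed):
    """Most-liked first: a crude stand-in for 'Home'-style ranking."""
    return sorted(feed, key=lambda t: t["likes"], reverse=True)

def render_feed(feed, algorithm):
    """The platform applies whichever pluggable algorithm the user picked."""
    return [t["text"] for t in algorithm(feed)]

feed = [
    {"text": "old banger", "posted": 1, "likes": 9000},
    {"text": "fresh news", "posted": 2, "likes": 3},
]
print(render_feed(feed, chronological))     # ['fresh news', 'old banger']
print(render_feed(feed, engagement_first))  # ['old banger', 'fresh news']
```

Swapping one function for another changes the whole feed, which is the point: the ranking becomes a user choice rather than a platform mandate.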

What’s ironic is that I think Twitter’s CEO, Jack Dorsey, realized this, and split off part of Twitter’s brainpower to create Bluesky, which encompasses some of these ideas in a novel kind of decentralized social network.

The other interesting thing emerging from web3 and decentralization is a move away from these types of algorithms that incentivize inflammatory race-to-the-bottom-style content creation, towards a model that supports creators and communities through more socialized funding and discovery. Some emerging coalitions like Channel are exploring new modes of content creation, distribution, and ownership, through NFT subscriptions, patron-models, RSS, etc.

Another approach, proposed by The Center for Humane Technology, is from the centralized, regulatory side: leverage 12 pressure points to change social network incentives, from internal design changes and oversight boards to lawsuits and regulations that shift power towards the users of the networks instead of the shareholders of the company.

The way I see it, the centralized path via government regulation is a short-term fix, one that may be necessary given the amount of power our current societal structures allot to social media corporations. But the long-term fix is to put the power into the hands of each user instead, especially considering that centralized power structures are how we got into this mess in the first place. I’m eager to see what this new world of decentralization will bring us, and how it could afford us more agency in how we donate our attention and how we manage our privacy.

At the very least, maybe I’ll finally be able to see what my friends are posting.

The post Does Twitter’s Algorithm Hate Your Friends? appeared first on Nightingale.

Colorless Green Graphs Sleep Furiously: A Conversation with Leland Wilkinson https://nightingaledvs.com/colorless-green-graphs-sleep-furiously-a-conversation-with-leland-wilkinson/ Tue, 15 Mar 2022 13:00:00 +0000

This article traces some history of ideas behind Leland Wilkinson’s development of the Grammar of Graphics (and later, of ggplot2) in the form of a discussion / debate between Lee and the author. At issue were meta-questions of data visualization: (a) the essential nature of software for data graphics; (b) the idea of a comprehensive, mathematical theory for graphics expressed in computer syntax; (c) the idea that there is something beyond syntax in the code of data graphics. 

Introduction

In 2017, for the Joint Statistical Meetings in Baltimore, I organized an invited session on The Development of Dynamic and Interactive Graphics, with Luke Tierney and Dan Carr as featured speakers. Afterwards, I met several other friends for dinner at Azumi on the Inner Harbor. Present also were Howard Wainer, Lee Wilkinson, Paul Velleman, and others. The inventive (if overpriced) Japanese cuisine was itself visually appealing and provided a backdrop for a wide-ranging discussion of data visualization.

In the course of the evening I got into a discussion (or debate) with Lee about the wider understanding of data graphs and implementations of graphics in software systems. Lee’s main point was that the Grammar of Graphics (GoG) was a complete, self-contained mathematical theory of statistical and scientific graphics. He meant this in the sense of a formal grammar, such as Chomsky’s (1957) Syntactic Structures, that could produce any well-formed, syntactically correct sentence in a language and could not produce any syntactically incorrect ones. 

I countered that there was more: the semantics or meaning of graphs and poetics—the beauty of the language of the code that was used to create a given graphic, the connection between the idea of a graph and the language used to create it on a computer. As one example, I used “Colorless green graphs sleep furiously,” a paraphrase of Chomsky’s famous example of a sentence whose grammar is syntactically correct and whose meaning is nonsensical semantically, but could have an appealing poetic interpretation. 

After that dinner, I proposed that we write a joint article summarizing some of these issues, but neither of us had the time to pursue this. What follows is a lightly edited transcript of our follow-up discussion, with the goal of exposing what Lee and I were thinking at that time more explicitly than has previously been expressed in print. It concludes with a coda intended as a tribute to Lee. It is not an overstatement to say that Lee was among the most profound thinkers in modern data graphics. His Grammar of Graphics (Wilkinson, 1999) revolutionized theory and practice and is now the basis of most modern graphic software systems.

Letter from Lee, August 8, 2017

Thanks for the invitation to co-author an article on some of the topics we discussed. While I need to decline, I nevertheless think I ought to clarify some of the points that were possibly the source of misunderstandings. Many of those misunderstandings didn’t surprise me because I’ve seen them pop up occasionally in comments on Grammar of Graphics (GoG). It’s been difficult for some readers to relinquish a popular nostalgia for what one reviewer called “the golden age of statistical graphics.” That “golden age” was described as spanning books by Bertin, Cleveland, Tufte, Wainer, and others. As I mentioned in my reply to that reviewer, however, GoG doesn’t belong on that bookshelf. GoG has nothing to do with those books or, for that matter, with any writing on the efficacy of various visualizations, good usage, taxonomies, history of graphics, new types of graphics, semiotics of graphics, storytelling with graphics, human perception of graphics, or the mind’s eye.

So, what is GoG concerned with? As I explained in the book, it involves the mathematics underlying statistical and scientific graphics. I am interested in any graphic that can be expressed in an explicit mathematical model. That model induces a huge corpus of graphics that were, prior to GoG, considered to be disparate.

Why do I think a mathematical model of graphics is important? Because GoG invokes a new world of visualization. It is not a world of printed graphs or even a world of interactive exploratory systems like XGobi, JMP, or DataDesk. Instead, it is a world where a computer understands the content of a graphic.

Consider the following illustrative collection of use-cases in that world.

  1. “Here are some data. Please analyze these data and show me the kind of interesting things an expert in visualization would find.”
  2. “Here is an image of a published chart. Please parse this image, extract the data from it, and generate an equivalent chart in the style of Cleveland, Holmes, or Tableau.”
  3. “Here is a table of results from a factorial experiment. Please fit a plausible subset model to these data and show me a graph of the residuals in each cell of a similarly formatted table.”
  4. “Here are the dates, temperatures, divisions, and coordinates of Napoleon’s march to and retreat from Moscow. Draw Minard’s map, highlight the date of the minimum temperature, and explain its effect on the number of surviving troops. Also, tell me the average speed (in kilometers per hour) of the troops as they marched.”

I assert that there is no system not based on GoG that can implement these tasks. No amount of hand-waving can substitute for mathematics when you program a computer.

Now, the points I’ve made so far in no way denigrate or exclude the important investigations of the history of graphs, visual processing of graphs, memory for graphs, design of graphs, and so forth. I am only saying that these ideas cannot be used to train a computer to understand a graph. They are germane to standards of graphics usage, to the design of effective UIs, etc.

When I told you at dinner that I don’t care whether a particular graph is popular, that’s because the question is irrelevant to understanding the structure of a graph. In fact, one of the most popular graphs is the Pareto Chart. As I showed, this popular chart is ungrammatical or, equivalently, ill-formed. It rests on a mathematical mistake.

When a graph has a clear GoG structure, then there is little use in trying to express that structure in some other way. Conversely, GoG structure has nothing to say about aesthetics, effectiveness, or other non-mathematical aspects of a graph.

There are two dimensions to GoG. The first is temporal. As this figure from the book shows, the construction of a graph is a total order — one cannot do these tasks in a different order and get a correct graph. That’s a strong statement; one I haven’t seen refuted. In fact, I exposed serious bugs in Tableau because they failed to observe this ordering.

Figure 1: Conceptual diagram for the Grammar of Graphics. It identifies the key classes of objects starting from data and the classes of methods used to transform one to the next, giving a final graphic object that still needs to be rendered.

Unfortunately, this figure has been taken by some to be an ordinary data flow. But data flows have little to contribute to the understanding of graph construction, despite their widespread use in describing such things. By contrast, this figure represents a function chain. Each class contains functions (methods) that are composable. The expressiveness of GoG is due to this composition; each class has many methods, and the repertoire of graphs produced is a product set of all these methods. I have seen no other graphics platform, including D3, SAS, R, or even SYSTAT, that can produce as wide a range of graphics as nViZn.
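Lee’s distinction between an ordinary data flow and a chain of composable functions can be loosely illustrated with a toy sketch (the stage names below are invented for illustration; they are not nViZn’s actual classes):

```python
import math
from functools import reduce

# Toy sketch of a GoG-style function chain: each stage is a composable
# transformation, and a graphic specification is the composition of stages
# in a fixed order. Stage names are invented, not nViZn's real API.

def variables(rows):
    """Data -> varset: name the columns."""
    return [{"x": r[0], "y": r[1]} for r in rows]

def statistic_identity(varset):
    """A statistic stage that passes the varset through unchanged."""
    return varset

def scale_log10(varset):
    """A scale stage: log-transform y."""
    return [{"x": v["x"], "y": math.log10(v["y"])} for v in varset]

def chain(*stages):
    """Compose stages left to right; as in GoG, the order matters."""
    return lambda data: reduce(lambda acc, stage: stage(acc), stages, data)

pipeline = chain(variables, statistic_identity, scale_log10)
print(pipeline([(1, 10), (2, 100)]))  # [{'x': 1, 'y': 1.0}, {'x': 2, 'y': 2.0}]
```

Because every stage has many interchangeable methods, the repertoire of producible graphs is the product set of choices at each stage, which is the source of expressiveness Lee describes.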

Lamentably, one reviewer failed to notice that this figure is an outline and sequencing of the chapters of the book. It is a description of the actual objects (classes) used in the GoG program called nViZn (now called IBM RAVE). This reviewer thought the book’s organization to be rather haphazard and ad hoc. This signaled to me that the reviewer completely misunderstood the logic of GoG and thought the topics covered were simply independent aspects of visualizations. In the almost two decades of GoG, the only group I’ve seen that understands GoG in detail is the engineers responsible for producing major graphics systems: the engineers at R (ggplot2), Python, Microsoft, Google, Tableau, Facebook, and Netflix. That’s where the book is still selling — in numbers that increase each year.

The second dimension to GoG is structural. Here is a graph diagram of the whole system. It was produced by dragging the Java package directory into AutoVis — using GoG to analyze GoG. Notice that the classes in the figure above are actually represented in the Java code.

[MF: I omit this network diagram here. It shows that the structure of the nViZn code for GoG closely matches the diagram in Figure 1, but shows the functions involved in each class.]

Minard graphic

Now let’s take a look at the Minard graphic. Here’s the structure of that graphic. It was produced by dragging the XML used to produce Minard in nViZn. I hope you see that there’s a fundamental difference between demonstrating the structure of Minard using an explicit executable specification vs. explanations in ordinary language or miscellaneous collections of graphics primitives (point, line, area, text, etc.).

[MF: I omit this diagram here.]

When I described the fourth use-case above (interpreting the Minard graph), I had in mind this graph of the elements in the XML. Each class in GoG can have methods that answer questions about that class. Thus, temperatureGraph has information about its scale and other attributes. By the use of introspection and reflection and extension, languages like Java can add other capabilities to classes without recompilation. But these additions need not involve specific customizations for every visualization, because GoG is object-oriented rather than functional.

Now let’s look at the code. The following is the formal Graphics Production Language that draws Minard when submitted to the nViZn interpreter in SPSS.

[MF: I omit the GPL listing here.] In that listing, the blue section is simply a data specification; the actual graph specifications are in black. Now, you should compare this not only to the other programs’ sizes (orders of magnitude more prolix), but also to their ill-formed and ad-hoc organization. Programs that simply draw primitives (lines, areas, text, etc.) are not to be compared to a language like nViZn that is grounded in graphics objects. This is why I urged you not to waste your time reviving the Minard contest. Plenty of things can be programmed on a computer, but the resulting code doesn’t necessarily tell us anything about the meaning of what is produced. Every engineer recognizes spaghetti code. The other Minard programs are simply spaghetti.

Here is the result:

Figure 2: Minard graphic created in nViZn

I hope this answers some of the questions you had and clarifies the source of some of the disagreements. I’m especially concerned that nobody thinks GoG is a “theory” or a “taxonomy” or a “platform.” GoG is a mathematical system, and arguments against it (or relativizing it) need to be made on a mathematical basis. I’m not at all interested in whether someone draws the Minard map (one can do that best with Adobe Illustrator) or makes beautiful visualizations like D3 or Mathematica. GoG is about formulating the meaning of a graphic in a way that a computer can understand enough to draw it and answer questions about it.

Reply to Lee, August 25, 2017

I don’t really have time either to pursue an article such as I proposed (although it would be fun), but let me make clear that I agree with most of what you say. In fact, I strongly believe that the power of “GoG” theory is that it provides a coherent mathematical model for statistical graphs, and as you say, the key thing is that, in some sense, the language allows the computer to understand (and, as actual proof, produce) the content of a graphic.

That is the true power of the GoG approach for me — the clear arrangements of the steps from data to a finished graphic, each with its attributes and components, and also the ideas that steps have a clearly defined temporal order, and one can see violations of the grammar as a consequence of ignoring some of its features. This is brilliant, but my main point is that it only goes so far.  

(I’ll leave aside as unnecessary and unproductive here the question of uniqueness — are there other distinguishable, but functionally equivalent graphics postulates? Like a different set of axioms for some geometry.)

Implementation matters

I think we differ on the question of implementation — GoG can be implemented in a variety of different computer languages that are not all equivalent from my perspective. You say that only the mathematical description of a graph in GoG is important. I say that there is more.

Even if different implementations are functionally equivalent to a computer and produce identical results, some software language implementations can be considered better than others, in terms of:

  • Expressive power: ease of translating what you want to do into the graphic output you want
  • Elegance: the code can be “read” by a human in a more (or less) comprehensible way, one that offers more (or less) insight into the relation between the graphics specification language and the code, or 
  • Extensibility: code can be more (or less) easily extended to encompass additional aspects of the process of composing a graph in a common language.

 I see this as a separate issue from that of the mathematics behind the theory, but an important one that should not be dismissed. The mathematical aspects of GoG provide the theory; implementations define the practice.

A prime example comes from my ancient work with Logo, where the language features of recursion, list processing, etc. (inherited from Lisp, with sufficient syntactic sugar) made it a delight to understand the structure of complex Moorish tiling patterns, trees, both real and abstract, etc. in their very simple language description (Friendly, 1988).  A square of size X wasn’t just a collection of four (x,y) points with constant shifts, but rather the simple path of a turtle doing 

REPEAT 4 [FORWARD :X RIGHT 90]. 

That simple change in frame of reference for a computer language from Cartesian coordinates to Turtle-centric coordinates made it possible to introduce young children to what Seymour Papert called “Powerful Ideas” in Mindstorms (Papert, 1980).

One simple illustration: A spiral is the path of a turtle going [FORWARD :size RIGHT :angle] and then doing the same with a slightly larger value each time.

Children are delighted to see that a small change from a square spiral makes an artistic image.
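The turtle’s frame of reference can be sketched in plain Python by modeling (position, heading) state; this is a translation of the Logo idea above, not Friendly’s original code:

```python
import math

# A minimal turtle-state sketch of REPEAT 4 [FORWARD :X RIGHT 90]: the path
# is generated from (position, heading) state rather than Cartesian corners.

def turtle_path(commands, start=(0.0, 0.0), heading=0.0):
    """Run (op, arg) turtle commands; return the list of visited points."""
    x, y = start
    points = [(x, y)]
    for op, arg in commands:
        if op == "FORWARD":
            x += arg * math.cos(math.radians(heading))
            y += arg * math.sin(math.radians(heading))
            points.append((round(x, 9), round(y, 9)))
        elif op == "RIGHT":
            heading -= arg  # turning right decreases the heading angle

    return points

square = [("FORWARD", 10), ("RIGHT", 90)] * 4
path = turtle_path(square)
print(path[-1])  # the path closes: the turtle ends back at the start
```

Growing the FORWARD distance a little on each step of the same loop turns the closed square into the spiral described above.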

I also argue that the implementation of ggplot2 and its extensions is both a testament to the power of GoG and a way to make a wider array of graphs much easier to actually specify in a largely coherent syntax. I am not knocking the nViZn implementation at all, but I much prefer the idea of plots composed of multiple layers, connected by “+” signs in ggplot2. To a human, this is much easier to read and write than the nested function calls of nViZn.
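That layering style is easy to mimic with a toy class; this illustrates only the readability argument, not how ggplot2 is actually implemented:

```python
# Toy illustration of why "+"-composed layers read more easily than nested
# function calls. This mimics the ggplot2 surface syntax only.

class Plot:
    def __init__(self, data, layers=()):
        self.data, self.layers = data, tuple(layers)

    def __add__(self, layer):
        # "+" returns a new plot with one more layer appended.
        return Plot(self.data, self.layers + (layer,))

    def describe(self):
        return " + ".join(["plot(data)"] + list(self.layers))

p = Plot("mtcars") + "geom_point()" + "scale_x_log10()"
print(p.describe())  # plot(data) + geom_point() + scale_x_log10()
```

The specification reads left to right in the order a person thinks about the plot, instead of inside out as nested calls do.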

Another part of this is a codification of ideas about data manipulation in terms of manipulating cases (filter, sample, sort, …) or variables (select, combine, transform, mutate), grouping/aggregating, and multi-table, SQL-like constructs. I know that a lot of this is implicit in your Chapter 5 on the Algebra of data, but I don’t know of any implementation of it.

The R implementation probably differs from what you may have been thinking. The features for data manipulation (dplyr), data import (readr), handling dates (lubridate), strings (stringr), databases (DBI), and foreign files from SAS, SPSS, etc. (haven) are becoming increasingly coherent. But one key feature is the powerful idea of a data pipeline, in which a graphic call can be an element of the chain. This is not at all new in CS: it goes back to Unix pipes (“|”) of the Kernighan & Ritchie era, now implemented in R with the “%>%” syntax. But the combination is greater than the sum of its parts.
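The pipeline idea itself can be sketched with a small operator class in Python (illustrative only; R’s magrittr “%>%” and native “|>” provide this directly):

```python
# Minimal sketch of a unix-pipe / %>%-style operator in Python, so that a
# step (even a graphing step) can sit in the middle of a left-to-right chain.

class Pipe:
    """Wrap a function so data flows into it from the left with `|`."""
    def __init__(self, fn):
        self.fn = fn

    def __ror__(self, value):
        # `value | pipe` calls the wrapped function on value.
        return self.fn(value)

select_positive = Pipe(lambda xs: [x for x in xs if x > 0])
double = Pipe(lambda xs: [2 * x for x in xs])

print([-1, 2, 3] | select_positive | double)  # [4, 6]
```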

For a wider appreciation of the idea that “implementation matters,” the website 99 Bottles of Beer collects some 1,500 different programming-language implementations of the song “99 Bottles of Beer.” It is not a great test case, but at least it illustrates the range and expressivity of different language paradigms. All do the same thing, but some languages do it more elegantly. The Tower of Hanoi problem is another that has attracted implementations in a wide variety of programming languages, but I don’t know of any comprehensive collection. I still like my Logo version in Advanced Logo (Friendly, 1988):

[MF: I omit the Logo listing here.] Reading the code is all you need if you understand recursion.

A key feature of this way of thinking was that the MOVE function could be anything: it was the only part that determined the “output”— it could:

  • simply print “Move disk 1 from Tower A to B”,  
  • draw the effect on a screen in various ways, 
  • instruct a robot arm to actually pick up a disk and move it, or 
  • draw a tree diagram reflecting the history of all moves from the start to a solution.
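The pluggable MOVE idea translates directly; here is a minimal Python sketch (a translation of the Logo structure described above, not Friendly’s original code) in which only the injected `move` function determines the output:

```python
# Tower of Hanoi with a pluggable MOVE function: the recursion is fixed,
# and the injected `move` alone decides what "output" means.

def hanoi(n, source, target, spare, move):
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, move)
    move(n, source, target)  # the injected MOVE produces all the output
    hanoi(n - 1, source, target, spare, move)

# One possible MOVE: just print each step.
hanoi(2, "A", "B", "C", lambda d, s, t: print(f"Move disk {d} from {s} to {t}"))

# Another MOVE: record steps for a drawing, a robot arm, or a tree diagram.
steps = []
hanoi(3, "A", "B", "C", lambda d, s, t: steps.append((d, s, t)))
print(len(steps))  # 7 moves: 2**3 - 1
```

Swapping the lambda for a plotting routine or a robot-arm command changes the output medium without touching the algorithm, which is exactly the separation the bullet list above describes.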

On my Ubuntu development server, I have XScreenSaver in my startup, and one of my favorites is a 3D OpenGL version that renders the HANOI process with an arbitrary number of disks, animated to rise from a peg and rotate in 3D several times before landing on the destination peg. All of this while the entire scene is also rotated in the XY plane, with lighting and shading applied to all the disks and pegs. Expressive power, elegance, and extensibility are all rightful criteria for comparing graphics programming languages.

Minard

You took issue with my idea that Minard’s graph could still be used as a test case for comparing different graphic programming languages, or, my larger thought, that it could inspire people to think about what Minard wanted to say and to create other graphics that tell other aspects of the story. The fact that Minard’s graphic can be reproduced in spaghetti code (in graphic systems based on point, line, area, and text primitives) doesn’t mean that there are no other, principled and informative ways to draw this graphic. As a test case, I tried to do a better job than I had done before with ggplot2 in R. The code for this figure is in a GitHub gist.

Figure 3: Minard graphic reproduced with ggplot2

What I learned from this little exercise is that, as you say: the devil is in the details. Or when I invoke the 80-20 rule:  I could get the basic 80 percent form right with 20 percent of the code and effort, but the graphic details took the remaining 80 percent of effort, and I don’t regard this as a finished product—certainly not to Minard’s standards.  

But another point here is that the idea that a graph such as this can simply be reduced to the mathematical specification of GoG, or to ggplot2 notation, misses the fact that a lot of human judgment goes into the use of aesthetics, plot annotations, etc.

The main area where we disagreed, at dinner and now, had to do with my point that there is much more to a statistical graphic than can be captured in the GoG, and that has to do with human understanding and the purpose for which graphs are drawn.  Minard designed his wonderful graph to tell a particular and poignant story about the folly of leaders who would sacrifice so many for their own glory.  That part, which is poetry (Friendly & Wainer, 2021), cannot be captured in a mathematical theory of graphs, even if one can reproduce it in an elegant syntax.

Coda

Looking back at this discussion four years later, it doesn’t seem we differed as much as I made out. Our beliefs and values about data graphics largely overlapped. It was more a matter of emphasis. Lee was correct that GoG was as close to a complete mathematical theory of data graphics as has ever been conceived. I was talking about a different aspect, more to do with the beauty of code as a mental pipeline to take the idea of a graph and produce what you want with as direct a connection between the idea and the result as possible. 

In the history of data visualization, some events stand out as more general and influential, beyond the exquisite and beautiful, but specific, examples. Playfair’s construction of nearly all the modern forms of data graphics around 1800 (line chart, bar graph, pie chart) is one such event, which we celebrate as the “Big Bang” of data visualization. The next signal event came with Jacques Bertin’s (1967; English translation: 1983) Semiology of Graphics, which laid the framework for a comprehensive system of visual signs and their meanings to synthesize general principles of graphic communication.

But Bertin’s schema was conceptual, not computational. As with many verbally stated theories (famously, Freudian theory), there was no direct way to test, prove, or refute the assumptions or conclusions of Bertin’s system. The Grammar of Graphics changed all that. Wilkinson asserted that GoG could produce any well-formed, syntactically correct graph, even those that had not yet been invented; perhaps more contentious was his claim that the strict framework of GoG could not produce any syntactically incorrect ones.

The beauty of a computational specification of data graphics is, first, that the software can provide an automatic equivalent of spell checking and grammar checking for text by throwing an error for any syntactically incorrect graph. Second, semantic checks are more varied and not easily automated: you can easily see whether a kind of graph “works” by running test cases, just as mathematicians try to challenge a theorem by proposing a counterexample. The proof is in running the code!

GoG has become the de facto standard conceptual structure for software implementations of data graphics, perhaps not in the strict pipeline shown in Figure 1, but certainly in terms of the ideas of layers, aesthetics, scales, stats, geoms, and facets. GoG ideas first appeared in Wilkinson's initial SYSTAT package, in which he tried to reproduce nearly any graphic he had seen before. Along the way, some higher-level ideas became apparent, e.g., that histograms and Nightingale rose diagrams differ only in using Cartesian vs. polar coordinates.

Around the time SYSTAT was acquired by SPSS (1995), Lee and others developed the initial specifications for the Graphic Production Language (GPL), which embodied what would become the GoG and served as the underlying machinery for SPSS graphics. In 1996, Lee and Dan Rope developed nViZn as a set of Java classes. All the essential ideas of GoG are present in the data flow diagram they used to explain the system (Figure 4).

Figure 4: Dataflow diagram explaining the steps of creating a graphic from a data source in nViZn. The sequence of boxes depicts the algebraic transformations from one step to the next, with alternatives shown explicitly.

Shortly thereafter, Lee teamed up with Graham Wills to turn the nViZn Java classes into a whiz-bang demonstration project: a Java (.jar) application called AutoVis (Wills & Wilkinson, 2010). When launched, it presented a box saying, "Visualize Me." As a proof of concept, you could drag and drop nearly any kind of object onto the box. Standard things like tables or spreadsheets were easy, but you could also drop the text of Moby Dick and see a network diagram of all the main actors. AutoVis is now an active Python project.

In a nearly coincident step, Hadley Wickham initially wrote the ggplot package for R in 2008, translating Wilkinson's framework into an object-oriented structure for R. But, to my main point, the implementation syntax did not work cognitively: plots used nested function composition, like h(g(f(x))), as in ggpoint(ggplot(mtcars, list(x = mpg, y = wt))). This syntax was perfectly understandable to a computer but, like LISP, challenging for a human. (To counteract this, code editors like Emacs developed brace matching and indentation schemes. LOGO was actually based on LISP, but with the change in syntax it was billed as "LISP without tears.")

The result was a completely new implementation, ggplot2, using a formal, algebraic representation of layers as ggplot objects joined by ‘+’ (Wickham, 2010). This redesign also hewed closer to the formal GoG framework of stats, geoms, coordinates, scales, and aesthetics, all with sensible functions and inheritance of defaults.
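The cognitive difference is easy to demonstrate outside R. In this toy JavaScript sketch (all names invented for illustration, not ggplot's API), both styles build the same plot object, but the nested form reads inside-out while the layered form reads in the order you think:

```javascript
// Sample data standing in for R's mtcars (two illustrative rows).
const cars = [{ mpg: 21, wt: 2.62 }, { mpg: 22.8, wt: 2.32 }];

// Style 1: nested function composition, read inside-out like h(g(f(x))).
const basePlot = (data, aes) => ({ data, aes, layers: [] });
const addPoint = (plot) => ({ ...plot, layers: [...plot.layers, 'point'] });
const addSmooth = (plot) => ({ ...plot, layers: [...plot.layers, 'smooth'] });
const nested = addSmooth(addPoint(basePlot(cars, { x: 'mpg', y: 'wt' })));

// Style 2: additive layering, read left-to-right like ggplot2's "+".
class Plot {
  constructor(data, aes) { this.data = data; this.aes = aes; this.layers = []; }
  add(layer) { this.layers.push(layer); return this; } // chainable
}
const layered = new Plot(cars, { x: 'mpg', y: 'wt' }).add('point').add('smooth');
// Both produce the same layers, but only one reads in thinking order.
```

The additive style keeps the mental pipeline and the code pipeline in the same order, which is exactly the directness between idea and result discussed above.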

Since that time, the essential ideas of Wilkinson's GoG (and its ggplot2 version) have formed the structure of, and provided inspiration for, implementations of data graphics in a wide variety of computer languages and software systems. Some examples: Python (Altair, plotnine); Javascript (Vega-Lite, D3.js, Observable Plot); Julia (Gadfly). What is now Tableau Software began life as Polaris (Stolte & Hanrahan, 2000), structured around the GoG framework. It is no accident that Wickham's (2009) ggplot2 book was subtitled Elegant Graphics for Data Analysis. But several aspects of GoG were not made explicit in the ggplot2 structure. One was the idea of an algebra for variables in the VarSet to be plotted, comprising operators (cross, nest, …) to compose plotting variables from those in the input data set. Another important idea, not shown explicitly in Figure 4, was the pre-processing that takes place between the DataSource and the VarMap. This is now the subject of great development as Tidy Data (Wickham, 2014) and the tidyverse in R. Lee would have been well pleased with this. Data graphics today progresses so broadly because it stands on the shoulders of a giant.


References

Bertin, J. (1967). Sémiologie Graphique: Les diagrammes, les réseaux, les cartes. Paris: Gauthier-Villars.

Bertin, J. (1983). Semiology of Graphics. Madison, WI: University of Wisconsin Press.

Chomsky, N. (1957). Syntactic structures. The Hague: Mouton.

Friendly, M. (1988). Advanced Logo: A Language for Learning. Hillsdale, NJ: L. Erlbaum Associates.

Friendly, M. and Wainer, H. (2021). A History of Data Visualization and Graphic Communication. Cambridge, MA: Harvard University Press.

Papert, S. (1980). Mindstorms: Children, Computers, and Powerful Ideas. New York: Basic Books.

Stolte, C. and Hanrahan, P. (2000). “Polaris: a system for query, analysis and visualization of multi-dimensional relational databases,” in IEEE Symposium on Information Visualization 2000. Proceedings,  pp. 5–14, doi: 10.1109/INFVIS.2000.885086 

Wickham, H. (2009).  ggplot2: Elegant Graphics for Data Analysis. Springer New York.

Wickham, H. (2010). A Layered Grammar of Graphics, Journal of Computational and Graphical Statistics, vol. 19, no. 1, doi: 10.1198/jcgs.2009.07098.

Wickham, H.  (2014). Tidy Data. Journal of Statistical Software, 59(10), 1–23. https://doi.org/10.18637/jss.v059.i10 

Wilkinson, L. (1999). The Grammar of Graphics. New York: Springer.

Wilkinson, L. (2008). The Future of Statistical Computing. Technometrics, 50(4), 418–435. http://www.jstor.org/stable/25471520

Wills, G. and Wilkinson, L. (2010).  “AutoVis: Automatic visualization,” Information Visualization, 9(1), 47-69.

The post Colorless Green Graphs Sleep Furiously: A Conversation with Leland Wilkinson appeared first on Nightingale.

Navigating the Wide World of Data Visualization Libraries https://nightingaledvs.com/navigating-the-wide-world-of-data-visualization-libraries/ Tue, 22 Sep 2020 18:53:06 +0000 https://dvsnightingstg.wpenginepowered.com/?p=5013 Graphics and visualization developers often get presented with a simple yet difficult question: “Which visualization library should I use?” Typically, making this decision is not about..

The post Navigating the Wide World of Data Visualization Libraries appeared first on Nightingale.

Graphics and visualization developers often get presented with a simple yet difficult question: "Which visualization library should I use?" Typically, making this decision is not about whether one library is "better" than another, but whether a specific library is more suitable for what the developer is trying to achieve. To answer this question thoroughly, we need to start by better understanding the design space of visualization libraries. Based on a survey of web-based libraries (i.e., JavaScript packages), we can map out a landscape based on two attributes:

  • Level of Abstraction — maps roughly to two aspects. The first is the effort required from developers to create a visualization: higher-level libraries usually require fewer lines of code and/or fewer concepts to learn than lower-level ones. The second is expressivity, or how much you can customize, which runs in the opposite direction: higher-level libraries will not let you customize much, while lower-level libraries give you far more freedom and flexibility.
  • API Design — the choices made by the library authors that control how the code should be written.

API Design

Let's start with API design. As we go through each level of abstraction later on, you will get a better sense of the varieties available. Libraries at the same level of abstraction may offer different forms of API, so it is important not to confuse API design with level of abstraction.

  • A large number of libraries offer a plain Javascript API. They do not depend on specific frameworks such as React, Vue, Angular, etc. For example, D3 does not depend on any one framework. The advantage of being framework-agnostic is flexibility; these libraries can be used anywhere. The code, however, tends to be more imperative (closer to machine instructions) than declarative (closer to the output that humans want to see).
const chart = new Chart();
chart.addAxis(new XAxis(...));
chart.addSeries(new LineSeries(...));
  • Some libraries, such as Vega, declare their entire APIs in a single JSON configuration. With the JSON constraint, they cannot accept any function or custom object as part of the arguments, avoiding imperative instructions. This constraint enforces a more declarative API. It also means the configuration can be easily serialized and stored as text files, or used with command-line tools. In return, it is more difficult to integrate with other libraries.
{ "x": "time", "y": "price", "series": [ ... ] }
  • Some libraries, such as ECharts, are in between and offer a hybrid JSON with callbacks approach. Instead of plain JSON, they declare the entire API as a single configuration Javascript object which can also include functions and sometimes non-primitive values. At a surface level, their simple configurations may look just like plain JSON. The added function support allows advanced customization, more flexibility, and easier integration with other libraries. This flexibility is traded for the serializable text output and strict enforcement of a fully declarative API.
{ "x": d => d.time, "y": d => d.price, "series": [ ... ] }
  • Still other libraries fully embrace the syntax of specific frameworks (e.g. React) and provide better integration. For example, using a React-based library in a React web application will feel more natural and provide better overall code consistency and optimization opportunities than adding alien blocks of D3 code. The drawback is that they require prerequisite knowledge of the framework and are only appropriate for projects where that framework is also used.
<Chart>
<XAxis />
<LineSeries />
</Chart>

Some libraries also offer multiple forms of APIs. For instance, deck.gl has @deck.gl/core, @deck.gl/react, and @deck.gl/json modules that offer a plain JS API, a React-based API, and a declarative JSON API, respectively.

Level of Abstraction

These levels map roughly to the effort required to create a visualization and to expressivity. In other words, higher-level libraries usually require fewer lines of code to produce a usable visualization, but there are fewer things you can customize. The lower the level you choose, the more you can customize, but the more work you have to put in yourself.

A metaphorical representation of Levels of Abstraction

Composable Building Blocks (levels 2–4) are fragments that can be composed to produce a visualization. If using the graphics libraries is like trying to build a house from freeform clay, using composable building blocks is like trying to do the same thing using a box of LEGOs. You can assemble these LEGOs any way you want. The limitations are based on the kinds of LEGOs you have. You can also use multiple building block libraries together, as long as they are compatible.

The building blocks are the sweet spot when you need more flexibility than the chart templates (level 5) but still want to stand on the shoulders of giants instead of starting almost from scratch (level 1).

1. Graphics Libraries

This group of libraries lets a developer draw visual elements directly or perform traditional computer graphics operations (scene graph, shading, etc.). They are the closest to native APIs such as Canvas or WebGL. They have the maximum level of expressivity and in return require the most effort to produce the same visualizations. If you are trying to produce a quick bar chart immediately out-of-the-box, these are probably not for you. However, these libraries let you tune for deep performance optimization or produce wild graphics that the higher-level libraries may not offer.

Example from react-three-fiber

Examples: Processing, p5.js, Raphael, Rough.js, three.js, PhiloGL, luma.gl, two.js, PixiJS, react-rough, react-three-fiber

For instance, this is the amount of code required to set up an empty canvas and draw a single rectangle with p5.

import p5 from 'p5';

const p = new p5(function(sketch) {
  sketch.setup = () => {
    sketch.createCanvas(200, 200);
  };
  sketch.draw = () => {
    sketch.background(0);
    sketch.fill(255);
    sketch.rect(100, 100, 50, 50);
  };
});

Draw a rectangle with p5.js, which has a plain JS API

Similarly, this is how to draw a single rectangle with react-rough. Some higher-level libraries may let you create an entire bar chart with the same amount of code.

<ReactRough>
  <Rectangle x={15} y={15} width={90} height={80} fill="red" />
</ReactRough>

Draw a sketch rectangle with react-rough, which has a React-based API.

2. Low-level Building Blocks

Basic LEGO blocks (source)

The low-level building blocks are quite independent and flexible. Each component or utility in these libraries serves a particular purpose and can be used in combination with components from the same library or other libraries to create a visualization. How they should be combined is only roughly defined and leaves a lot up to the discretion of the developer.

The most notable of these is D3, which evolved from earlier frameworks in other languages (Prefuse, Flare). D3 completely changed the landscape of visualization authoring in the past decade. It introduced a suite of low-level components and utilities, such as selections, scales, and formatting, while leveraging common standards such as the DOM and SVG instead of defining all constructs by itself.

In the example below, multiple building blocks offered by D3 (scales and selection) are used in combination to create a simple bar chart.

const x = d3.scaleBand().rangeRound([0, width]);
const y = d3.scaleLinear().range([height, 0]);
const svg = d3.select("svg").attr("width", width).attr("height", height);

x.domain(data.map(d => d.date));
y.domain([0, d3.max(data, d => d.value)]);
svg.selectAll("bar")
    .data(data)
  .enter().append("rect")
    .style("fill", "steelblue")
    .attr("x", d => x(d.date))
    .attr("width", x.bandwidth())
    .attr("y", d => y(d.value))
    .attr("height", d => (height - y(d.value)));

Create a bar chart with D3

In addition to D3, many libraries offer specialized components and utilities with unique functionalities. Even though many of them have the d3- prefix in their names, not all of them actually depend on D3. To name a few:

  • cola and Cytoscape provide various graph layout algorithms.
  • d3-annotation takes annotations to the next level.
  • d3-cloud provides a word cloud layout algorithm.
  • d3-legend creates nice legends for your scales.
  • flubber smoothly interpolates between 2-D shapes.
  • labella helps you place labels on a timeline.
  • visx provides native React building blocks that wrap D3 and SVG.

3. Visualization Grammars

From the blueprint, a LEGO mini-figure consists of 8 body parts. Any LEGO mini-figure can be described using these 8 parts and accessories.

In the middle of the "building blocks" range are the Visualization Grammars. They have their roots in The Grammar of Graphics, which was introduced in the late 1990s and offered a new perspective on designing statistical graphics. Instead of referring to charts by their traditional "types" — bar, pie, scatter plot, bubble, etc. — the book calls out their shared structures and introduces the idea of using these common concepts to describe any chart.

Just as the grammar of a language such as English defines parts of speech (noun, verb, etc.) and gives you a structure for combining these parts into a meaningful sentence, the grammar of graphics defines its own parts and provides a structure for combining them to describe an output graphic. This rigid structure is what differentiates the grammars from the low-level building blocks.
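To make the analogy concrete, here is a toy sketch. The part names (DATA, SCALE, COORD, ELEMENT) follow Wilkinson's book, but the JavaScript object shape is invented for illustration, not any real library's API:

```javascript
// A toy grammar-of-graphics specification. Part names follow Wilkinson;
// the JavaScript shape is invented for illustration.
const spec = {
  DATA: [
    { country: 'China', population: 131744 },
    { country: 'India', population: 104970 },
  ],
  SCALE: { x: 'linear', y: 'categorical' },
  COORD: 'rect', // swap to 'polar' and a bar chart becomes a rose diagram
  ELEMENT: { type: 'interval', aes: { x: 'population', y: 'country' } },
};

// The parts, not a chart-type name, describe what the chart is.
function describe(s) {
  return `${s.ELEMENT.type} in ${s.COORD} coordinates of ` +
    `${s.ELEMENT.aes.x} by ${s.ELEMENT.aes.y}`;
}
console.log(describe(spec)); // "interval in rect coordinates of population by country"
```

There is no "bar chart" object anywhere; the chart emerges from how the parts are combined, which is the whole point of a grammar.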

Object-oriented graphic specification — Leland Wilkinson. "The Grammar of Graphics" p. 7

See the chart below and its specification in the grammar of graphics. The chart is broken down into parts and described as a composition of DATA, SCALE, COORD, and ELEMENT.

An example application — Leland Wilkinson. “The Grammar of Graphics” p. 191

The most famous implementation of the Grammar of Graphics is ggplot2, which dominates the R and data science communities. On the web, Vega lets users describe visualizations in JSON and generates interactive views using either HTML5 Canvas or SVG. Vega-Lite provides a higher-level grammar, comparable to ggplot2, with interactions; it is compiled into Vega and rendered using the same engine.

The code block below is a specification of a bar chart in Vega-Lite. The dataset is described separately in the data field, while the mark and encoding fields are equivalent to the ELEMENT part of the Grammar of Graphics and its aesthetics.

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "description": "A simple bar chart with embedded data.",
  "data": {
    "values": [
      {"country": "China", "population": 131744}, 
      {"country": "India", "population": 104970},
      {"country": "US", "population": 29034}
    ]
  },
  "mark": "bar",
  "encoding": {
    "x": {"field": "population", "type": "quantitative"},
    "y": {"field": "country", "type": "nominal"}
  }
}

Create a bar chart with Vega-Lite, which describes everything as pure JSON.

In contrast to Vega-Lite's JSON API, G2 and Muze provide visualization grammars with plain JS APIs, while Chart Parts was built for React. See the G2 code to create a bar chart below, and notice the different API design compared to Vega-Lite.

import { Chart } from '@antv/g2';

const data = [
  {country: "China", population: 131744}, 
  {country: "India", population: 104970},
  {country: "US", population: 29034}, 
];

const chart = new Chart({ container: 'container', autoFit: true, height: 500 });
chart.data(data);
chart.coordinate().transpose();
chart.scale('population', { nice: true });
chart.interval().position('country*population');
chart.render();

Create a bar chart with G2, which has a plain JS API

4. High-level Building Blocks

Pre-assembled LEGOS: You still need to put them together to create a bathroom. (source)

If the low-level building blocks are equivalent to individual LEGO bricks, which are very flexible and can be combined in many different ways, these high-level building blocks are pre-assembled larger pieces.

Similar to the visualization grammars, each of the high-level building block libraries also comes with its own set of components and a predetermined way to assemble those components into a chart. However, a few common differences place this group of libraries between the visualization grammars and the chart templates:

  • Some libraries combine axes and scales. In the grammars, SCALE is considered one part while axes are part of GUIDE.
  • High-level building block libraries sometimes embed data in multiple places, commonly within each series along with its aesthetics configuration. The grammars treat data (DATA) and transformation (TRANS) as separate parts and only reference field names or derived variables in the ELEMENTs.
  • More commonly, they relax the "no-chart-type" philosophy and may include convenient templates as a series or layer to encapsulate special logic for more complex chart types, such as stream graphs. This makes them closer to the chart templates. However, they still do not refer to the entire chart by just a chart type, which is more typical of the chart templates.

As an example, a candlestick chart can be described in the grammars as combined layers of bar "marks" and line "marks." For convenience, a high-level library may provide a CandlestickSeries that combines the two layers into one and encapsulates the logic for encoding the aesthetics. This CandlestickSeries is then composed with axes and gridlines to create a chart. At the other end, a chart template library may provide a CandlestickChart component that already includes axes and gridlines and only asks for data.
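That relationship can be sketched directly. In this hypothetical helper (names invented, not any real library's API), a high-level candlestick series is just sugar that expands into the two grammar-level layers:

```javascript
// A hypothetical CandlestickSeries that expands into two low-level layers.
// Field names follow the usual OHLC (open/high/low/close) convention.
function candlestickSeries(fields) {
  return [
    { mark: 'rule', encoding: { y: fields.low, y2: fields.high } },  // the wick
    { mark: 'bar', encoding: { y: fields.open, y2: fields.close } }, // the body
  ];
}

const layers = candlestickSeries({
  open: 'open', high: 'high', low: 'low', close: 'close',
});
// One convenient call, two underlying layers of marks.
```

This is the same two-layer structure that the Vega-Lite specification later in this section spells out by hand.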

Candlestick Chart (reference)

Several libraries, such as ECharts, Highcharts, and Plotly, utilize the JSON with callbacks design. The example below is a simple one, so the configuration object looks just like plain JSON.

option = {
  xAxis: {
    data: ['2017-10-24', '2017-10-25', '2017-10-26', '2017-10-27']
  },
  yAxis: {},
  series: [{
    type: 'candlestick',
    data: [
      [20, 30, 10, 35],
      [40, 35, 30, 55],
      [33, 38, 33, 40],
      [40, 40, 32, 42]
    ]
  }]
};

Create a Candlestick chart with ECharts

Many libraries were later created for React, such as Victory, React-Vis, or Semiotic. They provide components, such as <XYPlot/>, <LineSeries/>, or <XAxis/>, that can be composed into the desired visualizations.

<VictoryChart
  theme={VictoryTheme.material}
  domainPadding={{ x: 25 }}
  scale={{ x: "time" }}
>
  <VictoryAxis tickFormat={(t) => `${t.getDate()}/${t.getMonth()}`}/>
  <VictoryAxis dependentAxis/>
  <VictoryCandlestick
    candleColors={{ positive: "#5f5c5b", negative: "#c43a31" }}
    data={sampleDataDates}
  />
</VictoryChart>

Create a Candlestick chart with Victory, which has a React-based API

Compare the ECharts and Victory code above to the Vega-Lite code below. Notice how the candlestick shape is described as a single series vs. two layers of marks.

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "width": 400,
  "data": {"url": "data/ohlc.json"},
  "encoding": {
    "x": {
      "field": "date",
      "type": "temporal",
      "title": "Date"
    },
    "y": {
      "type": "quantitative",
      "scale": {"zero": false},
      "title": "Price"
    },
    "color": {
      "condition": {
        "test": "datum.open < datum.close",
        "value": "#06982d"
      },
      "value": "#ae1325"
    }
  },
  "layer": [
    {
      "mark": "rule",
      "encoding": {
        "y": {"field": "low"},
        "y2": {"field": "high"}
      }
    },
    {
      "mark": "bar",
      "encoding": {
        "y": {"field": "open"},
        "y2": {"field": "close"}
      }
    }
  ]
}

Create a Candlestick chart with Vega-Lite

Another notable example of the high-level building block abstraction is deck.gl and its comprehensive collection of layers that can be combined to produce map-based visualizations.

import {Deck} from '@deck.gl/core';
import {ScatterplotLayer} from '@deck.gl/layers';

const INITIAL_VIEW_STATE = {
  latitude: 37.8,
  longitude: -122.45,
  zoom: 15
};

const deckgl = new Deck({
  initialViewState: INITIAL_VIEW_STATE,
  controller: true,
  layers: [
    new ScatterplotLayer({
      data: [
        {position: [-122.45, 37.8], color: [255, 0, 0], radius: 100}
      ],
      getColor: d => d.color,
      getRadius: d => d.radius
    })
  ]
});

Create a scatter plot on the map with deck.gl using its plain JS API. A ScatterplotLayer is one of the many layers offered by deck.gl that can be composed and placed on top of a map.

5. Chart Templates

Chart Templates are like completed LEGOs. Just look at the catalog and choose the one you want. (source)

A library of this type can range from a single component to hundreds of components. Each component is referred to by its chart type, e.g., Bar, Pie, Area, Stacked Bar, Stacked Area, Waterfall, Bump, Calendar, Treemap, Marimekko, Sunburst, ColumnWithLine, Dual Line, etc.

The best thing about the chart templates is that they are often ready to use straight out of the box and require the least effort to produce a usable output. Developers can choose a chart type from the catalog, prepare data in the documented format, then plug the data and component together.

Instead of trying to describe a pie chart with a grammar or learning how to implement it in D3, just check whether the library provides a pie chart component. If there is such a component, use it. If not, find another library or an alternative.

Also, novel visualization types (such as a new technique that just came out of research) are often offered as a single-component library like this.

Examples:

const myRadarChart = new Chart(ctx, {
  type: 'radar',
  data: data,
  options: options
});

Create a Radar chart with Chart.js

import { Calendar } from '@nivo/calendar';

<Calendar
  data={[
    { "day": "2016-02-05", "value": 397 },
    { "day": "2015-09-17", "value": 283 }, 
  ]}
  from="2015-04-01"
  to="2016-12-12"
  emptyColor="#eeeeee"
  colors={[ '#61cdbb', '#97e3d5', '#e8c1a0', '#f47560' ]}
/>

Create a calendar chart with nivo, which has a React-based API

The level of abstraction in a data visualization library is a continuous spectrum, not a discrete set of layers, so you may run into libraries that are somewhat borderline. What is critical is not the semantic distinction between levels but the developer's ability to understand the offered abstraction and select a library appropriate for their own use case and comfort. It is in fact not uncommon for libraries to offer features from multiple levels of abstraction. To give a few examples:

  • dc.js has both chart templates and high-level building blocks.
  • G2Plot is a catalog of chart templates on top of G2, which is a grammar.
  • react-vis has both high-level building blocks (<XYPlot />) and chart templates (<Sankey />).
  • In fact, D3 itself also spans multiple levels. For example, d3-scale uses the scale concept from the grammar level, while d3-shape is closer to the graphics libraries.

Parting Thoughts

To recap, the goal of this article was to map the vast universe of data visualization libraries and derive an underlying framework for better understanding them, regardless of whether you are simply picking one to use or trying to develop a new one. We’ve organized a variety of libraries across a spectrum from low-level graphics manipulators to chart components that are ready to use, straight out of the box.

Although many libraries were mentioned, I would be remiss if I didn't state that this listing is by no means exhaustive. I simply endeavored to describe the characteristics of each group and selected some prominent libraries that best exemplify each category. This post also focused only on web-based libraries; I would be interested to see this approach extended to other languages and platforms.

A magnificent Beauty and the Beast Library built with LEGO by Sarah von Innerebner (source)

When deciding which library to use, look for the abstraction level appropriate for the time you have, your own coding comfort, the tasks you are trying to accomplish, and the target developers and users. Then look at API design and other factors worth considering, such as:

  • Rendering technology: SVG, Canvas, WebGL
  • Performance: Bundle size, Speed, Server-side Rendering
  • Others: Type-safety, License, Theming, Animation, etc.

I hope you enjoyed the tour and learned a few things along the way! Perhaps the next time you come across a new package, you can use this framework as a lens to analyze its offerings and how it differs from or resembles the libraries you already know.

Acknowledgment

Thanks to Kanit Ham Wong and Senthil Natarajan for their feedback.

Drawing Neurons From Sound And Music In Real-Time https://nightingaledvs.com/drawing-neurons-from-sound-and-music-in-real-time/ Mon, 04 Nov 2019 20:18:51 +0000 https://dvsnightingstg.wpenginepowered.com/?p=5076 Inspired by Neuroscience, we can start to answer the question: What does music look like? I know that I am not the only one that,..

The post Drawing Neurons From Sound And Music In Real-Time appeared first on Nightingale.

Inspired by Neuroscience, we can start to answer the question: What does music look like?

I know that I am not the only one who, when listening to music, likes to imagine shapes and colors as well as growing, ever-evolving visual patterns. Up until recently, I did not know that music visualization¹ ² was a common thing in creative coding projects. As an attempt to reconcile my neuroscientific and academic background with my current data processing environment, I created an audio and music visualizer that uses the information within audio streams to create a neural forest, all in real time. Here is how it looks after a fragment of "Clair de Lune" by Debussy:

Pay attention to growing speed and patterns and how these follow the music.

NOTE: I am not going to describe the detailed mathematics here. If you are interested in knowing more about the algorithm and how it works, feel free to contact me! Source code can be found at the end of the text.

The sound Neuron forest in detail

Looking at an original drawing by 1906 Nobel Prize winner Santiago Ramón y Cajal, we can see that a pyramidal neuron looks like this:

Taken from here and part of the collection in Instituto Cajal del Consejo Superior de Investigaciones Científicas, Madrid, Spain.

The soma in the center (the body of the neuron) has an axon (the wider branch) and a dendritic tree (the rest of the smaller branches)³. Through a process called neurogenesis⁴, these specialized cells grow from simple immature somas into intricate mature neurons⁵ like the one we see here, with branches randomly spreading out from the middle soma. If we want to simulate this growth, a good way to do so is by means of random walkers⁶, where, in very general terms and without going into mathematical detail, the position p of every growing branch b at step t+1 is randomly chosen. True random walks would generate very intricate dendritic webs (there is no directionality gradient), so it is wiser to take a probabilistic approach, changing a branch's direction only when a given probability threshold is passed.

Very importantly, we can see from Cajal's original drawing that the farther a dendrite or axon gets from the central soma, the thinner it becomes and the more intricate the branching patterns that emerge. We can replicate this, again probabilistically: once the axon/dendrite width falls below a certain value, we generate a new branch (or walker, in mathematical terms). If you are interested in the mathematics of random walks, which are applied in the code, there are many sources that go into more detail⁶ ⁷.
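The core of the growth rule can be sketched in a few lines of JavaScript. The article's actual sketch is written in Processing, and all parameter values below (jitter, branching probability, shrink rate, width threshold) are invented for illustration:

```javascript
// One step of a directed random walk with probabilistic branching,
// mimicking dendritic growth. Parameter values are illustrative only.
function stepBranch(branch, { jitter = 0.3, branchProb = 0.05, shrink = 0.995 } = {}) {
  // mostly keep the current heading, with a small random turn
  // (this is the directionality gradient that true random walks lack)
  const angle = branch.angle + (Math.random() * 2 - 1) * jitter;
  const next = {
    x: branch.x + Math.cos(angle),
    y: branch.y + Math.sin(angle),
    angle,
    width: branch.width * shrink, // thinner the farther from the soma
  };
  // once the branch is thin enough, occasionally spawn a new walker
  // (a new dendrite forking off at a sharper angle)
  const children = [];
  if (next.width < 2 && Math.random() < branchProb) {
    children.push({ ...next, angle: angle + (Math.random() - 0.5) * Math.PI / 2 });
  }
  return { next, children };
}

// grow a single branch for 100 steps, starting thick at the soma
let b = { x: 0, y: 0, angle: -Math.PI / 2, width: 4 };
for (let i = 0; i < 100; i += 1) b = stepBranch(b).next;
```

Running the returned children through the same step function, recursively, is what turns one walker into a whole dendritic tree.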

To achieve all this, I used the Java-based programming language Processing. Processing has some very powerful tools and functions that allow it to integrate graphical and audio streams into one single pipeline, making the job easier.

Luckily for me, the Processing community is very active, and through openprocessing.org I found a sketch simulating a growing tree that suited my needs. The original random branching algorithm, which I adapted and extended with sound input (and more) to create this project, is taken from here.

After some coding (a link to it is at the end of the text), I managed to create simulations that imitate neuronal branching. Compare a drawing of a neuron made with the code to what Cajal drew:

A simulated neuron drawing.

Similar, don't you think? Interestingly, given that regularity comes only from the initial branching parameters (like size, reach, and transparency) while growth follows a probabilistic random process, each run of the drawing produces a completely different neuron:

Six neuron drawing simulations, all created with the same initial growth parameters. Same size, different patterns.

With this, we have the base of the drawing figured out; now let's bring these neurons alive with sound and color!

Remember when I said that the drawing parameters are fixed? What about changing those parameters based on an input? Even further, what about making this input audio, so that, depending on the input, the growing pattern and velocity change? This is exactly what I did next.

The easiest approach with sound would be to work with a real-time amplitude (or intensity) stream of the signal as a growing parameter. In very simple pseudo-code this would look like this:

sound_neurons(neuron_width, neuron_reach, amplitud_stream) {
    drawing_commands;
}

Here, amplitud_stream is the only parameter that is updated via real-time audio streaming (see the source code for more information). First, let’s change the diameter and transparency of the soma and dendritic branches based on this stream at the moment the simulation runs: the louder the input, the larger the neuron will be. Let’s also trace the drawing so we can watch the neuron branch in real time.
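
As a hedged sketch of that idea in plain Java: a helper that maps a normalized amplitude sample to a soma diameter. The function name and pixel range are illustrative choices, not names or values from the project's source.

```java
// Illustrative mapping from a streamed amplitude sample (assumed
// normalized to 0.0-1.0) to a drawing parameter, in the spirit of the
// amplitud_stream argument in the pseudo-code above.
public class AmplitudeMap {
    // Linearly map amplitude to a soma diameter in pixels.
    static double somaDiameter(double amplitude, double minPx, double maxPx) {
        double a = Math.max(0.0, Math.min(1.0, amplitude)); // clamp to [0, 1]
        return minPx + a * (maxPx - minPx);
    }

    public static void main(String[] args) {
        System.out.println(somaDiameter(0.1, 10, 200)); // whisper -> small soma
        System.out.println(somaDiameter(0.9, 10, 200)); // shout -> large soma
    }
}
```

The same mapping can drive transparency: feed the amplitude through `somaDiameter`-style helpers with an alpha range instead of a pixel range.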

A small sound_Neuron from a quiet input followed by a large one generated from a loud input.

Now things start to look very interesting! With this tweak, we are beginning to have sound-reactive neurons. With a quiet input (which was me literally whispering into my microphone) I got a small neuron. After a very loud input, the next drawing is much bigger! This is why I decided to call these drawings sound_Neurons.

Another thing we can try is manipulating the drawing speed, again with the streamed amplitude. Look again at the cover image of this story and you might see that the growth speed of each sound_Neuron follows a bum bum….. bum bum…. bum bum… rhythm.
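
Where does that streamed amplitude come from? A common choice (an assumption here, not necessarily the project's exact method) is the RMS of a short window of PCM samples. In this self-contained Java sketch a generated sine wave stands in for the microphone or music stream:

```java
// Sketch: the amplitude "driver" is just a running loudness estimate.
// This computes the RMS amplitude of one window of PCM samples; a
// 440 Hz sine wave stands in for real audio input.
public class RmsDriver {
    static double rms(double[] window) {
        double sum = 0;
        for (double s : window) sum += s * s;
        return Math.sqrt(sum / window.length);
    }

    public static void main(String[] args) {
        double[] window = new double[1024];
        for (int i = 0; i < window.length; i++) {
            // full-scale 440 Hz sine at a 44,100 Hz sample rate
            window[i] = Math.sin(2 * Math.PI * 440 * i / 44100.0);
        }
        System.out.println(rms(window)); // close to 1/sqrt(2) for a sine
    }
}
```

Recomputing this per window gives the pulsing value that can throttle how many walker steps are drawn per frame, producing the bum bum… rhythm.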

I then decided to randomly change the colors (in HSV space⁸ in the video example) of the neurons each time they appear so the final picture is more colorful:

sound_Neurons in Color
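
A minimal Java illustration of picking a random vivid color per neuron: `java.awt.Color.getHSBColor` stands in for Processing's HSB color mode, and the saturation and brightness ranges are my own choices, not taken from the project.

```java
import java.awt.Color;
import java.util.Random;

// Pick one random color per neuron in HSB (hue-saturation-brightness)
// space. Constraining saturation and brightness keeps every hue vivid
// against a dark background.
public class NeuronColor {
    static Color randomNeuronColor(Random rng) {
        float hue = rng.nextFloat();               // anywhere on the hue wheel
        float sat = 0.6f + 0.4f * rng.nextFloat(); // 0.6-1.0: avoid washed-out colors
        float bri = 0.8f + 0.2f * rng.nextFloat(); // 0.8-1.0: stay bright
        return Color.getHSBColor(hue, sat, bri);
    }

    public static void main(String[] args) {
        Random rng = new Random(7);
        System.out.println(randomNeuronColor(rng));
    }
}
```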

Finally, Processing is able to stream and read audio files, so the final and logical step for me was to integrate these colored sound_Neurons with music! For this, I used “Clair de Lune,” the beautiful piano piece by Claude Debussy, as the driver for a neural forest. By driver I mean that the intensity and changes in the music’s rhythm dictate how the forest grows. The final result is this image:

Generated in real time in the video at the beginning of the text, using a dark background so the neurons really fill the space with sound. Even after many runs, I still find it mesmerizing to watch; I can run the simulation over and over again.

This side project turned out to be much more than what I anticipated at the beginning. Listening to music through these neural forests helped me bring to life the shapes and colors I talked about imagining, in a rich and rather personal way. I like to think of this as my take on mixing a little bit of science with art and signal processing. As such, this project is a reflection of who I am: I did my graduate studies in neural networks, and as someone who loves music and visual arts, this was a means of expressing my passion for both sides.

I hope you enjoyed reading and watching these animations too. I certainly hope to continue exploring this beautiful field of music visualization in the future and with luck, again with sound_Neurons.

NOTE: What song would you like to see colored by neurons? Let us know in the comments and we will randomly select one or two and generate a new sound_Neuron forest.

Thank you for reading!


Source code here

References:

[1] https://en.wikipedia.org/wiki/Music_visualization

[2] https://medium.com/nightingale/data-visualization-in-music-11fcd702c893

[3] https://qbi.uq.edu.au/brain/brain-anatomy/what-neuron

[4] https://qbi.uq.edu.au/brain-basics/brain-physiology/what-neurogenesis

[5] Kempermann, G., & Overall, R. W. (2018). The small world of adult hippocampal neurogenesis. Frontiers in Neuroscience, 12, 641.

[6] https://www.mit.edu/~kardar/teaching/projects/chemotaxis(AndreaSchmidt)/random.htm

[7] https://medium.com/@ensembledme/random-walks-with-python-8420981bc4bc

[8] https://en.wikipedia.org/wiki/HSL_and_HSV


The post Drawing Neurons From Sound And Music In Real-Time appeared first on Nightingale.
