Erica Gunn, Author at Nightingale
https://nightingaledvs.com/author/erica-gunn/
The Journal of the Data Visualization Society

Info+
https://nightingaledvs.com/info-plus/
Wed, 14 Jan 2026

The post Info+ appeared first on Nightingale.

A quiet moment, before the conference begins. Image credit: Pedro Cruz

Info+ is a long-standing data vis conference, held biannually in rotating locations. This year, it was hosted at Northeastern University in Boston (my alma mater), chaired by Pedro Cruz of Northeastern and Sarah Williams from MIT. The event was an action-packed three days of workshops, keynotes, seminars and social activities, and even included an art exhibition at the MIT media lab.

Opening night exhibition at the MIT Media Lab. Photo credit: Pedro Cruz

The conference was a dose of concentrated inspiration, with a head-spinning lineup of back-to-back 10-minute seminars by leading designers in the visualization field. By the second day, unifying themes were definitely emerging from the blur of inspiration and ideas.

You can find recordings and abstracts for all of the talks on the conference homepage. A few selected presentations are also linked below.

From communication-to towards communication-with

As someone who’s been in the data vis community for a long time, the biggest change I noticed was a shift in the general framing of data vis problems. Instead of Tufte-esque critiques of “proper” visualization techniques or discussion of misinformation and misleading graphics in politics, the conversation (at least in this conference) has shifted strongly toward more participatory practices in data vis.

Talking about inflation. Photo credit: Jose Duarte

Rather than talking about how to present data so that people will understand it, the focus was on how to have conversations—with people, using data—and how to include appropriate context and resolution to help them see how it fits into and reflects their lives. This was reflected in games talking about inflation at the grocery store and local biodiversity challenges in college classrooms, mapping inclusive and discriminatory spaces for marginalized communities to inform urban planning, and using info vis techniques to map informal transportation networks in developing nations.

Mapping exclusionary spaces. Photo credit: Sofia Burgos-Thorsen

When communicating with disenfranchised groups (like middle-schoolers impacted by extreme climate events, or migrants wary of the motivations behind an intervention), it can also be a challenge to overcome obstacles like self-censorship and diminished agency.

Visualizing marginalized perspectives

Across many talks, there was a focus on using data as a form of community expression, and on using locally generated data to capture experiences that are often left out of the dominant narrative. The conference exhibition included a project recording the annual calendar of the Quechua people of the Amazon, organizing their year around important agricultural and cultural events.

Map of cultural practices created by the Quechua people. Photo credit: Catherine D’Ignazio and Claudia Tomateo

Another team used conversations with migrants to improve shelters, focusing on designing features that will support them best in their transition. Data can also help to articulate deep-rooted structural inequalities, or something as “simple” as pronouncing someone’s name. It may also help us to question what we memorialize, how, and why. 

Designing for impact

Some talks showed how to use data in a political context, as a tool for advocacy and creating change. One project focused on providing legal evidence to demonstrate systematic displacement in the West Bank, another created an archive of communities erased by urban redevelopment in Seoul.

Mapping the land of dispossessed farmers in the West Bank. Photo credit: Gauri Bauhuguna

A blanket woven from currencies served as an entry point into deeper discussions about economic impacts and the many reasons for migration, informing and humanizing policy decisions at the UN. One team collaborated with corporate sustainability offices to use biodiversity data to create better-informed sustainability policy and achieve more meaningful targets. Data can also help to illustrate what is lost when policies change, such as local shore changes for communities in the Mediterranean, and the pain caused by lost reproductive rights.

A blanket highlighting the economic impacts and reasons for migration. Photo credit: Sarah Williams

Advocacy is one form of impact; others take a more neutral approach. Some speakers discussed using data journalism to represent geopolitical conflicts in an unbiased but informative way. Others illustrated the importance of thoughtful visualizations focused on place and the need to keep things simple when dealing with the practical realities of fast-paced projects in a newsroom. Conversely, including details in your charts can sometimes make them better, more interesting, and more understandable.

Visualizing ship motions related to undersea cable damage. Photo credit: Irene de la Torre Arenas

New modes for visualizing data

Of course, the medium we choose also influences what we observe. The representation of time in social media platforms can shape and even distort our perceptions. Using different modes of visualization (including touch and sound) can help people engage with and better understand different habitats on the ocean floor.

Visualizing sea floor habitats with visuals and texture. Photo credit: Jessica Roberts

Textiles have deep traditional roots and can evoke a softer expression of meaning, especially in a cultural context. Acoustic data can have profound emotional impact as well as quantitative meaning, and mixing auditory and visual elements can encourage different modes of exploration, as well as create more accessible tools.

Perhaps my favorite application of unexpected media was using folded paper as the basis for the conference identity, creating rich and nuanced visuals by simple physical means.

Behind the scenes view of creating a conference identity. Photo credit: Todd Linkner

Seeing the big picture

Stepping back from day-to-day practices, we also considered how visualization can be a reflection of worldview. Framing is a critical step for any designer grappling with how to create a visualization, and our underlying theories of change influence both how we approach and how we talk about data visualization.

Books that capture an entire worldview through visualization. Photo credit: Paul Kahn

What I didn’t hear

Across the entire conference, there was almost no mention of AI. Presenters were definitely using AI technologies for certain kinds of data, but their talks were focused on the output rather than the tools. The one talk focused explicitly on AI considered whether it is helpful to use visualization as an input for AI learning, and what properties of a visualization might make it more interpretable and more useful for training an AI. I’m not sure if that was incidental or intentional, but it was a notable absence when so much of our current discourse is dominated by AI froth.

Reflections to take forward

Coming out of these many conversations, I found myself wondering which of the “theory of change” approaches are most effective, for which audiences, and when. Some speakers mentioned negative receptions: from the CDC when talking about data rhetoric and emotional visualizations, and from institutions of higher education when talking about faculty pay inequity. Many others discussed the tangible impacts of their work in shifting stubborn social and policy problems.

As always, the key lies in consciously framing your data and your analysis: in terms of the context, your purpose, the audience, and the people impacted and involved. Across many projects, we heard designers talk about how to define and redefine the problem as a critical step in getting to insight and achieving a successful design. 

As a designer working in industry to create large platform software, I find that all design often gets simplified to UX. It was nice to step outside of that bubble for a moment and remember the many things that design does, and the different places that designers contribute. I do think there is an interesting conversation to be had between the perspective of creating large-scale tools to structure data exploration for decision making at scale, and the one focused on using bespoke and personalized data visualization for communication—either to or with—an audience once the analysis is complete. 

Many of the unique, nuanced and contextual factors in a dataset can get blurred out when analyzing data at scale, and much of the big picture gets lost when focusing only on the particularities of a specific dataset. And yet, both the large and the contextualized cases come down to helping humans create big-picture conclusions by understanding nuances in the data. Building systems to accommodate large, unwieldy, and heterogeneous datasets to connect across these different scales requires insights from both sides. Perhaps that’s a topic for the next conference.

Categories: Community

Mapping Change: How One Design Studio Navigated 20 Years at the Forefront of a Changing Industry
https://nightingaledvs.com/mapping-change-applied-works/
Mon, 17 Nov 2025

The post Mapping Change: How One Design Studio Navigated 20 Years at the Forefront of a Changing Industry appeared first on Nightingale.

Applied Works is a London-based design studio celebrating their 20th anniversary. I sat down recently with founders Joe Sharpe and Paul Kettle to discuss their work, changes they’ve seen in the industry over time, and to talk about the core principles that guide and focus their work.

Founders Paul Kettle (left) and Joe Sharpe (right)

Early influences

Joe and Paul met at university. With a background in motion graphics, Joe has always kept an eye out for how a story evolves, frame by frame. Paul has a more classic graphic design background: his emphasis is on information design and creating clarity for the user through an in-depth understanding of his audiences.

The two worked independently for a few years before joining forces to create Applied Works in 2005. Over the years, the studio has remained relatively small and has shifted focus multiple times to stay relevant in a changing landscape. Now, with 15 people, they’re on a growth path.

Through their projects and clients, Applied Works has had a rare vantage point on the growth and transformation of an industry. From their early days working in moving image, branding, and websites, they had front-row seats through the dot-com bubble and learned to code on the job. They ran tests on prototype devices, from early satellite communications hardware to the first iPad, and collaborated on many high-profile data vis projects with the BBC, The Times in London, the London 2012 Olympic and Paralympic Games, and others.

BBC Class Calculator (2013). Source: Applied Works

As web technologies advanced, they upgraded their methods to support live data feeds, establishing style and component systems for code reuse. They also experimented with 3D maps for the 2014 Tour de France and advanced image filters for a Black Mirror project in 2018. Their Class Calculator project for the BBC became the broadcaster’s most-shared data tool in 2013. Lately, they have pivoted toward climate and environmental work with nonprofit partners — as well as projects tackling societal issues and inequality — collaborating with philanthropists, intergovernmental agencies, and think tanks to help them communicate complex data and nuanced narratives. They are also expanding their skills into data science and machine learning. 

Themes

The team relies on several “north star” behaviors to guide their exploration, helping to chart a course over complicated and changing terrain. Throughout our conversation, a few strong themes stood out.

Push the boundaries

In design school, Paul observed that the coursework was very structured and quite strict, but the most successful students were often the ones who did their own thing. To develop your own perspective, he realized early on that you need to push the edges to test who you are and find out what you think. Your initial instincts might be wrong, but that’s how you learn. This principle continues to shape how the studio approaches its work. An exploratory mindset and keen appetite for learning helps to feed their creativity and ideas.

Experiment to find out

In our influencer age, it’s worth emphasizing that success is not just about broadcasting your ideas and opinions and hoping that someone else follows along. You also need to test and refine those ideas based on feedback from the world. 

Experimentation and prototyping are a key part of the process at Applied Works. In order to find the limits, you need to push an idea as far as it will go, and then just a little bit further. When it starts to fail, you can pull back and find the place where it works. This process of tuning their approach through experiment, feedback, and course correction has been a consistent theme for Joe and Paul throughout their design practice.

Follow the creative tension

In addition to doing your own thing, you need to find something to push against and someone to negotiate with. Joe and Paul bring different contributions and viewpoints to their collaboration, producing a natural creative tension that drives their approach. 

Joe has a more technical bent. He often starts by analysing complex datasets to propose a narrative, and then they iterate together until it makes sense from both a user and a technical perspective. This collaboration allows the pair to use each other to get to a better solution than either would have achieved alone.

Creative tension also forms the foundation of client engagements. Clients bring new and interesting problems and constraints, and together the group negotiates a new set of solutions to meet those needs. They start by asking challenging questions to get the team thinking, and then they get deeply involved with a client problem and the data, understanding as much as they can about the science of what the client is doing. This process helps them identify the underlying need, and the solutions emerge from that. 

When designing a call center dashboard for Genesys, the team identified a fundamental relationship in the way the key performance metrics are presented. They transformed the data into a user-friendly dashboard built around just three key insights, streamlining the display so users could monitor and address issues in real time. This approach later became a foundation for how Genesys designs its products.

Genesys supervisor dashboard (2014). Source: Applied Works.

Have a perspective

Over time, the team’s projects and creative experiments added up to experience, creating a sense of identity that is both unique to the studio and informed by the external world. This gives them the confidence to stand their ground when needed, which sometimes means forging an alternate path. 

One of their biggest breaks as a studio came in 2010, when the iPad first came out. At the time, most of the industry was using Adobe Flash for infographics. For accessibility reasons, Applied Works had resisted Flash in favour of HTML5 and CSS. When The Times got a pre-release version of the first iPad, Applied Works’ existing projects ran natively where many others did not.

The Times iPad data journalism (2010). Source: Applied Works.

By following their own inner guidance rather than an industry fad, Applied Works was positioned to take advantage of a major opportunity when the technology changed. The team was quick to point out that it doesn’t always work out this well, but independent thinking sometimes pays off in unexpected ways.

New technologies

Over and over again, Paul and Joe’s experimental approach positioned them to embrace new technologies as they emerged. They are often approached by people who want something done and aren’t quite sure yet what it is. Starting from an unformed idea, they work collaboratively to shape and co-define the work, and that often leads to new and innovative projects that they might not otherwise have created. 

Although the team has often been among the first to embrace a new technology, they work hard not to be defined (or confined) by it. Technologies are a medium or a tool that they use to achieve better results for their clients, but the process often starts on paper, outside of the constraints and limitations of a screen. 

Instead, the team comes back to core design principles to guide their work. Usability has always been central to what they do. The term has changed over time, from accessibility and user-centered design to usability, human-centered design, and now inclusive design. It’s similar for data visualization: the team sees it as both a practice and a tool best applied to a problem, not necessarily a defining skill in its own right.

Chatham House resourcetrade.earth (2017). Source: Applied Works.

Regardless of terms or technology, the quality standards remain the same: is the design easy to use? Interesting? Intuitive? Coming back to Joe’s background in motion graphics, does the sequence and hierarchy of information over time make sense? Everybody learns differently, and the team focuses on using a mix of technologies and skills to facilitate core use cases and needs. A recent article on a project about trade flow for Chatham House shows how all of these different pieces work together. 

Embracing change

Across all of the team’s experiences, there is a strong pattern of learning and embracing change. Where there are no precedents, Joe and Paul see opportunities. Learning alongside their clients turns experimenting with new approaches into a collaborative way of introducing fresh perspectives.

Applied Works’ content focus has shifted over time with their interests and the industry. Starting out with websites, corporate work, and data journalism, they transitioned into data products and design systems as those opportunities emerged. They are now refining their focus again, enlarging their scope and working toward a better future for the planet.

The team’s process has also changed over the years. In the beginning, they worked mostly from creative briefs. As their experience and expertise grew, they moved into more open-ended engagements based on client trust. Paul likened it to going on a journey together: the ideal situation is when a client has an open-ended idea, and they can sit down and work out how to approach it together. 

They’ve also been working to make their work more scalable, developing a process and a system to support a larger, more distributed team. They’re deliberately creating more opportunities for R&D and making space to explore their personal interests and curiosities to keep the team engaged. Joe in particular is interested to see what happens if they let technology lead the way a bit more, to help them invent what could be. In 2017, the team got the chance to work on chapter artwork for a book about the Netflix series Black Mirror. Taking inspiration from the anthology’s dystopian themes of losing control of technology, the team used creative coding to generate imagery of each episode, relinquishing a certain level of control over the visual aesthetic. 

Inside Black Mirror book (2017). Source: Applied Works.

Looking back

Applied Works’ 20th anniversary has been an opportunity to pause and make sense of the journey the team has taken over the years. This kind of progress usually doesn’t follow a linear path. You can’t draw these connections with a ruler: you can only look back and connect the dots after the fact. The guiding principles above helped the team navigate the shifting terrain and find their way.

Joe and Paul created a successful studio built around care for their people and their team, their clients and affected audience, and the legacy that they leave behind in the world. They negotiated an ever-changing landscape by optimizing at each point in the process, following their principles and intuition to find the best path.  

Imagining the future

Looking forward, the Applied Works team is excited to help their clients navigate a world that is subject to ever-increasing change. They are interested in partnering with climate and environmentally-minded non-profits, data scientists and academic partners to understand and share their impact, communicate their mission, and design their approach to funding and future research. They hope to go deeper with their clients to articulate the core identity of their organization, to help them see further and ensure the continued success of their work.  

Applied Works 2025

Especially in the area of climate awareness, some of the team’s major clients are already thinking far into the future, asking questions like: “if we do our job properly, in 10 years we won’t need to exist in our current form. What should we do next?” Paul and Joe would like to help them to answer that question. They are also positioned to help facilitate new connections between their clients, creating an exchange of ideas that could lead to more collaborative and impactful work. 

Of course, Applied Works will continue leveraging technology to solve problems and experimenting to push beyond the current limits. They’re excited to shape our technical evolution beyond the screen into a more immersive and experiential virtual environment. Joe recently completed an MSc in geographic data science to expand his skillset for an AI-enabled world. The team is also ready to engage with the many new creative tensions AI introduces: questions of bias and ethics, where and how we should use AI methods, and the many conversations about profitability and exploitation that this new technology raises.

Overall, Joe and Paul are looking to help lead the push toward ethical, sustainable progress, both globally and for design. With two decades of experience navigating complex landscapes, they are well-positioned to “work together with clients to take each other into the future.” It will be interesting to see where they go next. 


Get in touch if you are interested in working with Applied Works, or subscribe to Rows and Columns for updates on what the team is up to. Until Dec 17, 2025, they are also accepting applications to their Springboard program, which tackles big, global problems.

For more information about the team’s projects and history, see their recent anniversary post on LinkedIn.

Categories: Community

Review of Stakeholder Whispering by Bill Shander
https://nightingaledvs.com/review-of-stakeholder-whispering/
Wed, 17 Sep 2025

The post Review of Stakeholder Whispering by Bill Shander appeared first on Nightingale.

Full disclosure: Bill and I met through the DVS, and have known one another for years. I received an advance copy of his book. I don’t think that has influenced my opinion, except that knowing Bill makes me even more willing to encourage you to trust his advice. I have always appreciated his warmth, patience, and common sense. He’s a very positive guy who’s focused on making good things the right way. That ethos shows through when working with him, and in the book.

Illustration by Bill Shander

Stakeholder Whispering by Bill Shander is an approachable book about why it’s important to solve the right problem, and how you can make sure you’re doing it. Having worked with many designers over the years, I can say that stakeholder whispering is the hardest part of the job to get right, and often the most important. The book offers simple, clear advice on how to make sure you’re getting to the bottom of a situation before diving in with solutions.

It can be very hard to whisper well. The consequences of failure can be severe, yet few books focus on just this one aspect of working with a team. This book offers guidance from an expert whisperer on the small things that might trip a new designer up. Reading it is like shadowing a senior designer at work.

Bill brings the reader along at a level that’s gentle enough for a beginner but also valuable for an expert. Written with empathy and a sense of humor, the book feels like a comfortable conversation over tea with a friend, commiserating and sharing tips with someone who has had all of the same struggles and knows what it’s like. At different times, I found myself laughing out loud, grimacing in recognition, and nodding along. I appreciated how Bill used simple, practical examples to demonstrate his points (usually accompanied by a verbal wink, just to make sure we saw what he did there).

Illustration by Bill Shander

What does this have to do with data vis? Everything, really. Helping people push past “I want this chart” and get to a good outcome is a struggle we all face. This book is for anyone who needs to work with multiple stakeholders to help their projects succeed. (It might also be useful for stakeholders who need to work with designers, so that they can understand why we’re asking all these questions.)

Here are some of the topics addressed in the book.

Common pain points:

  • Pushing back without saying no
  • Stakeholders who dictate solutions or don’t care about their own stakeholders (especially the hidden ones)
  • Knowing how & when to lose the battle
  • Breaking a problem down into manageable chunks
  • Switching roles as you move from problem identification into the design process, and remaining flexible in your approach

What you will learn:

  • Using neuroscience and cognitive behavioral therapy to understand stakeholder dynamics
  • Keeping the focus on the problem, and not making it about you
  • Empathy as a tool to enter the client’s frame of mind, without losing your own
  • Creating a space for not-knowing: encouraging curiosity, even when people think they know what they need
  • How to prepare for a conversation, and how to use what you hear
  • The four components of productive listening: focus, attention, freedom from interruption, and picking up on nonverbal cues
  • Switching between the surface ask and deeper structure when solving a problem
  • Listening for holistic understanding, and simplifying without oversimplifying
  • Why finding the right problem might not be enough (and what to try next)
  • What success looks like
  • How to tell whether your stakeholders are open to whispering, and what to do when they’re not

These topics apply everywhere. I think these techniques might matter more for data vis for a few reasons:

  • Stakeholders are less likely to understand the details (of the user task, or the solution)
  • Other designers may not have the technical experience to follow along 
  • Experts may be so frustrated by trying to explain the problem that they won’t even try. When you can use these techniques to demonstrate understanding, you get to the real conversation faster.

As with all experience, the magic happens in knowing how to dance, not in just following the steps. You need to develop a sense of rhythm and an instinct for where these principles apply. That said, experiment. Apply these techniques. They will help.

Illustration by Bill Shander

Question time with Bill!

I had a few questions after reading the book, so I reached out to Bill. He kindly answered them here, to share as part of the review:

This book was focused mainly on what I would call framing the problem: the needs identification step before you get into the design work. Can you talk about why you chose to focus on that part of the process?

The short answer is that I haven’t seen enough people write about or talk about this. It’s the part of the process that is mentioned but rarely explored in detail. In other words, designers (and others) are told they need to do “needs assessment” or “requirements gathering” and “ask questions”, etc. But to me, that’s like saying “make some beef stew” without providing a recipe. Because it’s not so simple. The recipe is the “how”. You need to ask the right questions, in the right way, of the right people, with the right tone, to really figure out what is called for. And that takes either years of hit and miss experience to figure out on your own or you can learn a process and a way of thinking about this that will get you up and running much more quickly. I wanted to provide that to people based on my experience. Oh, and by the way, a key part of all of this is to first just acknowledge the idea that our stakeholders often don’t know what they need. They need our help figuring it out. Once we acknowledge this, we can move on to the “how”.

What do you do when you’re stuck with a stakeholder who can’t be whispered?

As I say in the book, the short answer is that you should find new stakeholders. If your boss, or client, or whoever, won’t engage, then you should find a new boss/client/whoever. Honestly. Life is much more fulfilling when you’re working with people who respect you and engage with you as a thought partner. That being said, there are some techniques to help soften an intransigent stakeholder. For instance, start small. Just ask ONE key question, like “how will we measure success”, which is a very informative question to help you understand true needs pretty quickly. For instance, if your boss says “make a dashboard of our HR data”, but the measure of success is “employee retention goes up”, then you know retention is a key part of that HR data that needs to be the focus, and maybe it will lead to follow-up questions about how that data might help with retention, what other data might affect it, etc. Part of starting small is realizing you have to gain trust to engage with reticent stakeholders, so a short focused meeting with incisive questions will earn you longer and more complete conversations over time.

Designers are often very good at listening, but struggle when it’s time to transition from a position of understanding to become the expert presenting solutions. It can be hard to be seen as an expert when you’re in the role of listener and learner (especially working with an experienced team). Can you talk about ways to avoid this trap?

Expertise is an incredibly valuable thing. If you are new in your career, you may not be perceived as the expert, which makes things harder. But the great news is that you can lean on others’ expertise. Rather than saying to your stakeholders something like “pie charts suck!”, you can say, “we know from research on human visual perception that humans aren’t very good at distinct value comparisons when looking at circular shapes, so a pie chart won’t be as effective for this visual because you really want your audience to compare those two numbers – research also shows that a bar chart will be much more effective here, so I’d recommend that.” When you cite research, that glow of expertise will shine on you and you will gain trust. As you gain more and more trust, you will eventually be perceived as the expert and you will walk in the room with the gravitas and respect you need to engage effectively with any stakeholder!

Interruption-free listening can sometimes be a problem for time management when talking to an expert. Can you share some techniques for using active listening to guide the conversation, as opposed to giving up control?

There is a fine line between active listening (really listening and hearing everything, without jumping constantly to your own thoughts, reactions, and perceptions) and simply being someone's audience while they drive the entire conversation. The difference between the two is a true dialog in which you ask good follow-up questions based on what they're saying. BUT, the key to doing this well is to NOT be perceived as just listening so you can jump in and respond, which is what most people do, right? (Listen, react…listen, react…) No, you need to truly listen and really hear what they're saying. What they're saying will trigger thoughts and reactions in you. Capture those if you need to, and respond with questions. But probably not every thought and question you have needs airing. Which ones do you really need to address in the context of helping your stakeholder figure out what they really need? This is a gray area and something you can only learn over time and in your context, so it's not something I can exactly teach, except to suggest that you try to find that balance. Simply being reminded that there is a balance to be found will hopefully help you get there in time.

You discuss the importance of building a holistic understanding of the problem, and switching between superficial and deeper concerns. Can you talk about how to interpret what you hear, and how to process that interpretation with stakeholders?

One of the most important initial ideas in Stakeholder Whispering is to acknowledge that we live our lives driven largely by our subconscious. So in the context of work, that plays out in the automated response to all of our work. For instance, in today’s world, what do we do when we want to make “data-driven” decisions? We measure stuff, and then we make a dashboard out of it! This automated response isn’t bad, but it’s just so rote that we don’t always think it through. We need to measure stuff, but which stuff, and how much, for how long? And we need to understand that data, but is a dashboard the answer or might it just be a 5-minute call to review one key metric? It depends. So we have to probe deeper than the automated response. This applies to everything. So to the question, the “superficial” is the initial obvious concern/request/plan. And “deeper” review is literally the entire point of Stakeholder Whispering. Sometimes the superficial initial idea may be all that’s needed. But sometimes it isn’t. Whispering to figure that out is what it’s all about! The way to do it is to ask incisive questions, open your ears with your domain and data expertise, trust your gut about things that you know might be concerns or worth further exploration, and probe those. The book is full of specific techniques to do it, and it’s hard to explain without diving deep. But the short answer is simply to engage what I call “useful paranoia”. Something is always missing or not quite right, so probe it! But that doesn’t mean everything requires a deep rabbit hole. Explore thoughtfully, and know when you’ve done enough to move on to the next concern. This is also something you will develop over time, but hopefully the ideas I share in the book will speed up that process.

For a new researcher, it’s often hard to balance best practices from the quantitative social science research they might have learned in school and design research in a business setting. Concerns about deviating from script, “biasing” responses, etc. are common. To me, it’s always been a matter of incorporating those best practices into a more fluid dance of the conversation. Can you talk more about how you think about that balance?

I think that balance is actually inherent to the Whispering process. Because the way I recommend doing it (and I talk about this in the book) is like therapy. When you go into therapy, and you share your childhood trauma or relationship troubles (or whatever), your therapist doesn’t give you solutions or ask leading questions. They ask intentionally open-ended questions like “how does that make you feel?” The point of therapy is to help you understand what you’re feeling. That’s what Whispering (and research) is about. You ask unbiased questions to be sure your data is pure. Now, in Whispering (as in therapy), sometimes the questions will eventually start to lead the witness a bit. The therapist may eventually say “it seems like you’re getting angry…is that what you’re feeling?” because they are there to guide their patients to some degree, based on their expertise. And in a Whispering session, you may start to ask less open-ended questions as you get a sense of where things are going. You might start with something like “why do you think a dashboard is best for this project?” But later in the conversation, you might ask something like “do you think a report might be more effective since you mentioned that people will be reviewing this on a plane and only 2X per year…maybe a dashboard isn’t the best tool for the job?” It’s OK to get to this point because, as the therapist, using your expertise and experience and active listening, you can help guide your stakeholders to the best decision based on the conversation. You’re not conducting primary research, so the standard does shift a bit from those types of conversations, and that’s the “dance of the conversation”, as you describe it, that you need to get comfortable with.


The post Review of Stakeholder Whispering by Bill Shander appeared first on Nightingale.

Step 9 in the Data Exploration Journey: Chart Choices https://nightingaledvs.com/step-9-in-the-data-exploration-journey-chart-choices/ Wed, 24 Jul 2024 16:21:00 +0000 https://dvsnightingstg.wpenginepowered.com/?p=21480 Both the purpose and the audience for our charts shifted during that transition, and it was important to think carefully about what we needed the charts to do.

This article is part 10 in a series on data exploration. I began this series while serving as the Director of Education for the Data Visualization Society in 2022, because so many people were asking to hear more about data exploration and the process of learning data vis. A list of previous entries can be found at the end of the article. What began as an exploratory project on the “State of the Industry Survey” data grew into a 1.5-year project that produced a 30-page 2023 “Career Portraits” publication (DVS member login required). This series gives an inside view of the project, illustrates my process for approaching a big project, and demonstrates that no “expert” is immune from the challenges and setbacks of learning. Let’s see where this journey takes us!

The last article found us transitioning out of the discovery diamond for the Career Portraits report and into the slow, upward climb of the Build process. In Step 8, we were reworking our data and thinking through the layout and constraints for our final deliverable. As part of that, we needed to think through the data visualization more carefully to decide on a final form for each chart that we included in the report.

Diamond-shaped flowchart illustrating project stages from "Expand/Ideate" to "Deliver/Deploy," highlighting points of maximum risk of overwhelm and exhaustion.

For this article, I’m going to focus on the refinement choices that we made for the visualizations as we moved from early exploration into the build phase for the Career Portraits project. Both the purpose and the audience for our charts shifted during that transition, and it was important to think carefully about what we needed the charts to do.

In the Expand phase, charts support data exploration by surfacing patterns and insights to validate and pursue. The primary audience is usually the data visualizer themselves or an audience that is close enough to the project to understand the limitations and nuances of the rough draft stage. As you move into Build, your audience often broadens and your purpose starts to shift from exploration toward communication. This often requires a change of form, and usually involves a lot more annotation and clarification of the chart insights. A Build audience is often not as close to the data, and may not even be all that familiar with data vis, so you’re also looking to refine the visualization to work for them. 

Considerations for chart design in Expand vs Build:

Expand:

  • Support the data visualizer in thinking through the problem and understanding the dataset
  • Surface patterns and identify interesting insights for further analysis
  • Compare multiple narratives and viewpoints, often in the same chart
  • Work out the main variables of interest and experiment with visual form to see what works (often as quickly as possible)

Build:

  • Optimize your visualizations so that they communicate your findings to a broader audience
  • Highlight the important points to improve legibility and reduce noise
  • Make relevant comparisons so that the viewer will understand your conclusions
  • Focus in on the specific points or comparisons that you want to make

Things to consider when transitioning from Expand into Build:

  • Remove unnecessary data points. Your audience is further from the dataset than you are, and they’re unlikely to understand the nuances of your dataset. Where you see useful context, your audience will usually just see noise. Unless the full dataset is the point of your chart, it’s generally best to remove it. 
  • Use more common charts. Sometimes you really do need a fancy visualization to explain a dataset, but most people struggle to read even basic charts. Get too adventurous, and there is a risk that people won’t even understand the point of your chart after all of your hard work. A complicated visualization increases your “wow” factor, but it may also reduce your audience size and your impact. Make your editorial choices accordingly.
  • Add guideposts to help your viewer understand. Good use of color, line weight, and other visual variables adds hierarchy and context to your chart. Supplementary annotations, captions and text explanations call out important points, clarify your purpose, and allow your viewer to confirm that they understood your point. At this stage, it’s hard to overstate the clarity and emphasis that you can achieve with sophisticated visual design. Take the time to do it well.  
  • Clear visual hierarchy is critical. You can fit many layers of data into the same chart if you establish a clear visual hierarchy. Ideally, this should follow the importance and relevance of information types as someone reads your chart.
  • Remove alternate interpretations or ambiguous visuals. A good exploratory visualization often allows you to make multiple comparisons at once. It’s intended for someone who knows how to move between different kinds of comparison, and who can block out the noise to focus on a specific task at hand. This is helpful when you are trying to work out what’s in the dataset, but it’s less helpful when you are trying to make a point. Review your visualization for alternate interpretations, distractions and visual noise, and revise it to clarify and sharpen your point.

Our chart choices reflected a combination of editorial, aesthetic, and practical considerations. We needed charts that supported the comparisons we wanted to make, were interesting to look at for a group interested in visualization, and were simple enough to reduce the time-consuming manual work required to create charts for publication. We also wanted to support specific questions that our audience might have. Fortunately, this project allowed us the opportunity to experiment with several different forms for the data during the early exploration, so we had a good sense of our options going into Build. 

We ended up splitting the report into two sections: the first looked at comparisons across careers, for those trying to decide what kind of vis they wanted to do. The second focused on comparing different variables within a career, for those who cared more about career advancement or understanding themselves in relation to their chosen field. In the first section, we kept the complexity of comparative vis across career types, and supplemented it with text labels and other visual details for readability. In the career section, we chose simpler visualizations (mostly bar charts) that broke out individual variables within that career for more focused exploration. Most of the charts discussed below are from the first section, and include comparison between careers and against the general population.

In my earliest data explorations, I almost always just use default charts in whatever tool I’m using (Excel, in this case). A grouped bar chart helped me compare the number of people in each career area experiencing different frustrations in the State of the Industry Survey data.

Bar chart titled "Top Frustrations" showing the count of frustrations faced by Analysts, Designers, Developers, and Engineers in various categories like accessing data, data volume, and lack of time.

This was useful in terms of understanding raw counts, and the data itself calls out the difference in sample size within the survey data: there are many fewer engineers than analysts! Within that, I can see the peak values for each group, and I can compare values across groups pretty easily. When I tried to identify more interesting patterns, though, I found that I was losing the sense of each career as a whole in this chart. 
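The tally behind a chart like this is a simple pivot over the survey responses. Here is a minimal sketch in Python, using invented rows rather than the actual survey data:

```python
from collections import Counter

# Invented survey rows: one (career, frustration) pair per response.
responses = [
    ("Analyst", "Accessing data"),
    ("Analyst", "Lack of time"),
    ("Designer", "Lack of time"),
    ("Engineer", "Accessing data"),
    ("Analyst", "Accessing data"),
]

# Respondent count per (career, frustration) pair: the value behind
# each bar in the grouped bar chart.
counts = Counter(responses)

print(counts[("Analyst", "Accessing data")])  # 2
```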

A different grouped bar chart helped me to see the careers as individual entities, but the lookup task to read the legend makes it really hard to get much insight about the specific frustrations or to make comparisons between career groups.

Bar chart titled "Top Frustrations" with separate bars for each role (Analyst, Designer, Developer, Engineer) showing counts for categories like accessing data, data volume, lack of collaboration, and technical limitations of tools.

A radar plot overlapped the dataset values on the same axes, and made it easier to compare values for my four series along each one.

Radar chart titled "Top Frustrations" comparing the counts of frustrations faced by different roles (Analyst, Designer, Developer, Engineer) in categories such as accessing data, low data literacy, and lack of design expertise.

The default chart settings were pretty terrible for readability; it was hard to follow the axis lines, some of the data was occluded by other series, and in general there wasn’t much hierarchy within the chart. A few simple changes improved legibility significantly.

Radar chart titled "Common Frustrations" highlighting the percentage of respondents from different roles (Analyst, Designer, Developer, Engineer) experiencing various frustrations like accessing data, information overload, and lack of mentorship.

Reducing the opacity of the area fill prevented occlusion. A strong solid line kept the outlines strong and helped to make the chart colors more readable. Adding axis lines clarified where people should look to make their value comparisons, and also directed the eye out of the center of the chart toward the axis labels for easier lookup. Text hierarchy and annotations made the content of the chart clearer. I also changed the radial axis metric from raw counts to percentages for better comparison between groups. This choice erased the difference in sample size from the chart, but it allowed for better comparison between reported experience for the different groups.
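The count-to-percentage switch is a per-group normalization. A sketch with made-up counts and group sizes (not the real survey numbers):

```python
# Invented raw counts for one frustration, and invented group sizes.
counts = {"Analyst": 120, "Designer": 60, "Developer": 45, "Engineer": 15}
group_sizes = {"Analyst": 400, "Designer": 200, "Developer": 150, "Engineer": 50}

# Share of each career group reporting the frustration. Percentages are
# comparable across groups of very different sizes; raw counts are not.
percentages = {career: 100 * n / group_sizes[career] for career, n in counts.items()}

print(percentages["Analyst"], percentages["Engineer"])  # 30.0 30.0
```

Note how the Analyst and Engineer groups report the same share here despite an eightfold difference in raw counts, which is exactly the sample-size difference the percentage view erases.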

This may be the first time that I’ve ever chosen to use a radar plot for a published graphic. I don’t usually find them to be particularly readable, especially for categorical comparisons. To me, the data areas tend to read as connected shapes rather than data points, but in this case that connection was precisely what I was struggling to see in the grouped bar chart.  Unfortunately, once my brain summarizes the data series into a shape, that shape becomes the strongest identifier in the chart, and I start comparing the details of one shape vs another. It’s very hard to counteract that tendency, even when I know that the shape I see is nothing more than an artifact caused by the order of my axes. 

A quick sorting experiment shows how strong those effects are. In the first two charts, the axis order is relatively arbitrary. In the third, I sorted based on value for the Analyst group. This creates a clear pattern for the Analysts, but it also means that everyone else is implicitly compared against that standard.

Three radar charts titled "Top Frustrations" for different roles (Analyst, Designer, Developer, Engineer), each illustrating the count of respondents facing various frustrations such as accessing data, lack of design expertise, and lack of time.

I find these strong axis-ordering effects quite distracting, and I often feel that they obscure real signals in the data. In general, I tend not to use radar charts for this kind of data for that reason. The grouped bar chart is more flexible and gives a less biasing view in most situations, but that advantage becomes a weakness when there are so many comparison points in the chart. As it becomes more and more difficult to integrate across the bar groups, a radar plot can remove the cognitive burden of grouping the series, as long as you can ignore the ordering artifacts. 

With the visual cleanup and proper context, this chart worked well to support early discussions with the community about their frustrations working in this field. For those conversations, we wanted people to react to and think about alternate interpretations for the data, so the more ambiguous visualization worked well. In a guided discussion, we could emphasize the weaknesses and support clear interpretation, and the idiosyncrasies of the radar chart made an interesting discussion point. For a report meant to be read independently, I felt that these ambiguities and artifacts reduced the value of the chart. 

The heatmap is another popular alternative. Instead of a spatial axis, this chart uses value to compare the different counts, simplifying down to a 2-dimensional display of career vs frustration and using color to encode the metric values. 

This works well for identifying outliers: it’s pretty effortless to find the darkest and the lightest squares. It is pretty terrible for comparing relative values, though, especially with so many colors in the chart. I could have binned the values down to high, medium, and low colors to emphasize patterns, but that would suppress a lot of the variation that we were looking to expose. We did use the heatmap in our initial report as a way of identifying extremes within the survey population. In that case, the lack of resolution in the data points was fitting, because we had not yet determined the full statistical relevance of our data and could not provide appropriate context for evaluating small differences. For the final report, we wanted to capture more of the richness in the dataset when comparing frustrations between groups.
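The binning option mentioned above collapses each cell into one of a few levels. A sketch, with arbitrary placeholder thresholds rather than cut points derived from the real data:

```python
def bin_value(pct, low=20.0, high=40.0):
    """Collapse a percentage into three levels for a simplified heatmap.

    The cut points here are arbitrary placeholders; real thresholds
    would come from the distribution of the data.
    """
    if pct < low:
        return "low"
    if pct < high:
        return "medium"
    return "high"

# Invented cell values for one frustration category.
cells = {"Analyst": 55.0, "Designer": 35.0, "Engineer": 12.0}
binned = {career: bin_value(v) for career, v in cells.items()}

print(binned)  # {'Analyst': 'high', 'Designer': 'medium', 'Engineer': 'low'}
```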

We did keep a heatmap for the Barriers to Entry data in the final report, because that was a situation where large variations and differences in pattern between career area were more important than the smaller details. We included an Overall column in the graphic to allow comparison of each career area against the field as a whole, and we supplemented the chart with value annotations so that people could read rather than guess at the chart values. Adding in the n values as part of the series definition also helped to clarify the context in situations where a small sample size might be skewing results.

Heatmap displaying challenges faced by different roles (Analysts, Designers, Developers, Engineers) including time/balance, support, skills/training, and finding a job/pay.

For the frustrations analysis, we essentially “unrolled” the radar plot back onto Cartesian axes rather than using a radial plot. The radar area became a line, which still has continuity to facilitate comparison between values but creates less confusion with an identity encoding (“the spiky shape” vs “the highest point”).

Line chart comparing frustrations and issues facing data visualization among different roles (Analyst, Designer, Developer, Engineer) for categories like lack of time, accessing data, and low data visualization literacy.

We introduced the overall population distribution as a secondary layer in this chart also, and used that value as a sorting index for the frustration categories. This way, there was a clear rule for sorting the categories that provided information about the dataset, but it didn’t require one career to become the default comparison for all of the others.

We encoded this context layer as a more subtle background bar chart, and superimposed the career series lines on top of it. Their bold weight and more interesting color keeps the focus on the career areas and pushes the contextual information to the back. The reference values for the overall population are still readily available if someone wants to make that comparison, but they don’t interfere with reading the primary data. The different visual forms (bar vs line) helped to separate the different levels of aggregation (population vs individual career), and the bars avoided clutter by not adding another line to an already-crowded chart. Again, the legend reports counts for each series so that the reader can identify population size differences that are suppressed when we define the value axis as a percentage.
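Using the overall population as the sorting index means every career series is plotted along one shared category order, so no single career becomes the implicit baseline. A sketch with invented values:

```python
# Invented percent of the overall population reporting each frustration.
overall = {
    "Lack of time": 42.0,
    "Accessing data": 35.0,
    "Low data literacy": 28.0,
    "Lack of mentorship": 15.0,
}

# One career's values, keyed the same way (also invented).
analyst = {
    "Accessing data": 40.0,
    "Lack of time": 38.0,
    "Lack of mentorship": 10.0,
    "Low data literacy": 30.0,
}

# Sort the categories once, by the overall value, descending; every
# career series then follows this same shared order.
order = sorted(overall, key=overall.get, reverse=True)
analyst_series = [analyst[cat] for cat in order]

print(order[0])        # 'Lack of time'
print(analyst_series)  # [38.0, 40.0, 30.0, 10.0]
```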

For other comparisons, we mixed the 2D grid of the heatmap with a line chart. Here, we wanted to emphasize how different careers used specific techniques. We chose to use size instead of color for our metric encoding, partly because it emphasized the sort order and relative differences within each column. These differences will always be difficult to read in an area encoding, so we supplied data values and let the visual form act as a reference to reinforce the organization of the chart.

Line chart comparing the preferred types of data visualizations among different roles (Analysts, Designers, Developers, Engineers) such as bar charts, line charts, and scatterplots.

The resulting “bump” charts have a column for each career, sorted by the relative percentage for that career. It’s easy to see that bar charts are the top chart for every career except designers and engineers, who tend to use line charts more instead. The number annotations show that the gap for the Designer group is negligible (real numerically, but within rounding error for the annotations and definitely within the margin of error for the analysis), while it is more pronounced for the Engineers.
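Each bump-chart column is just a per-career ranking of the same set of items. A sketch with invented usage percentages (chosen here so that line charts lead for designers and engineers, as in the report):

```python
# Invented chart-usage percentages for two careers.
usage = {
    "Designer": {"Bar chart": 70, "Line chart": 72, "Infographic": 55, "Scatterplot": 40},
    "Engineer": {"Bar chart": 80, "Line chart": 85, "Infographic": 10, "Scatterplot": 60},
}

# Rank chart types within each career, highest usage first; each
# ranked list becomes one column of the bump chart.
ranks = {
    career: sorted(charts, key=charts.get, reverse=True)
    for career, charts in usage.items()
}

print(ranks["Designer"])  # ['Line chart', 'Bar chart', 'Infographic', 'Scatterplot']
```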

If the exact size of the usage was more important than the ranking (or to de-emphasize differences smaller than our margin of error), we could have collapsed multiple bubbles into one for values that were so close, so that they looked more similar in the chart. That would have been a bit more accurate but a lot more work. In the end, we decided that this version would be ok since the ranking values were clear. 

There is an axis-ordering artifact in this chart, also: if you follow a particular line in the chart, you can see that infographics are pretty far down the list for most career areas, but they jump up to third place for designers and down to 12th for engineers. The fact that Designers are next to Developers makes the jump less drastic than it would be if they were right next to engineers. Here, I felt that this artifact was less distracting than in the radar plots. 

The bump charts are probably the least familiar visualization of the set, but we thought that they did a good job of highlighting where specific techniques, methods, or audiences were different for one career vs the others. I would have loved to play with these in an interactive context, because I think there are lots of things you could do to improve readability and reduce the flaws of the chart if the user could temporarily select a method (or set of methods) of interest. This visualization could also easily accept a highlight series, if there were one particular series or method that we wanted to spotlight. Our report was intended for static or print distribution, so we ended up keeping the style flat to allow the viewer to make their own comparisons based on their interests.

For the individual career reports, we used simpler visualizations that illustrated differences within a single variable for that specific career. These visualizations were much simpler to make and to read, and supported a more focused, narrated experience for the individual careers.

Collection of bar charts showing salary distribution, size of organization, and sector of respondents (Engineers), alongside a bar chart of commonly used tools such as Python, React, and Tableau.

We also needed to be thoughtful about balancing off-the-shelf and custom graphics to keep our deliverable scope reasonable and our project on time. The report ended up using a mix of simple and custom charts, based on where we saw added value for the narrative in going a little bit outside of the box. The heatmaps and radar plots were fast and fairly simple to make with off-the-shelf software. They only needed minor visual cleanup to improve readability and style. The superimposed bar and line charts were created separately and then manually overlaid; those took an extra step in the processing, but it was quick to do. Everything about the bump chart was manual, and those were some of the most time-expensive charts that we put into the report.

In terms of chart selection, I probably wouldn’t push the edges this far for a general audience. Even within a specialist group, there was some confusion about what the bump charts meant and how to read them. In our case, we wanted to use a variety of charts to emphasize specific points within the dataset, and we felt that we could afford to challenge this audience a bit. We also felt that some chart variety and novelty was important to keep things more dynamic, so that the report would be fun to read for a group who already knows a lot about vis. 

All of these choices were part of a dynamic decision-making process throughout the Build phase, informed both by the chart’s purpose, audience, and task, and by the practical considerations of what was possible and easy to build with the technologies we had. We considered many alternate forms and more advanced comparisons that didn’t make the cut, and we identified many charts that would make interesting standalone projects (perhaps in a more interactive medium) for another day. Hopefully, the end product provided a little interest with sufficient clarity for the group we were intending to serve.

Previous articles in this series:
Embrace the Challenge to Beat Imposter Syndrome
Step 1 in the Data Exploration Journey: Getting to Know Your Data
Step 2 in the Data Exploration Journey: Going Deeper into the Analysis
Step 3 in the Data Exploration Journey: Productive Tangents
Step 4 in the Data Exploration Journey: Knowing When to Stop
Step 5 in the Data Exploration Journey: Collaborate to Accelerate
Step 6 in the Data Exploration Journey: Cut to Realistic Scope
Step 7 in the Data Exploration Journey: Spin Off Projects
Step 8 in the Data Exploration Journey: Build

Related links:
Early Sketches for Career Portraits in Data Visualization, by Jenn Schilling
DVS Careers in Data Visualization, YouTube Playlist for interview series by Amanda Makulec and Elijah Meeks
Career Portraits project (DVS Member space login required)

Step 8 in the Data Exploration Journey: Build https://nightingaledvs.com/step-8-in-the-data-exploration-journey-build/ Thu, 29 Feb 2024 17:07:33 +0000 https://dvsnightingstg.wpenginepowered.com/?p=20103 This article is part 9 in a series on data exploration, and the common struggles that we all face when trying to learn something new...

This article is part 9 in a series on data exploration, and the common struggles that we all face when trying to learn something new. A list of previous entries can be found at the end of the article. I began this series while serving as the Director of Education for the Data Visualization Society in 2022, because so many people were asking to hear more about data exploration and the process of learning data vis. What began as an exploratory project on the “State of the Industry Survey” data grew into a 1.5-year project that produced a 30-page 2023 “Career Portraits” publication (DVS member login required). This series gives an inside view of the project, illustrates my process for approaching a big project, and demonstrates that no “expert” is immune from the challenges and setbacks of learning. Let’s see where this journey takes us!

Where we last left off, my early discovery project on the DVS State of the Industry Survey data had morphed into what would become the Career Portraits initiative for the DVS. In Step 6, we got serious about reducing scope in the focus phase for the project, and in Step 7 we talked about how the right cuts can actually inspire new growth and expose new opportunities for collaboration. Now, it’s time to come back to the core project and start the long, uphill climb of the Build stage. 

Double-diamond project diagram showing four phases (Expand/Ideate, Focus/Consolidate, Build/Produce, Deliver/Deploy) along an axis of project size. A “You are here” marker sits at the transition from Focus/Consolidate into Build/Produce, labeled as the point of maximum risk of exhaustion; the Expand/Ideate phase is labeled as the point of maximum risk of overwhelm.

The framework for this series is my version of the double-diamond model for design, first popularized by the British Design Council in 2005. The first diamond begins with the Expand stage, where all ideas are on the table. It’s all about discovery, innovation, and sketching ideas quickly. It’s fast, sometimes sloppy, and it’s usually skin deep. 

The Focus phase is the narrowing half of the first diamond. You step back and look at all the options, choose your core project direction, and focus on the specific tasks that need to be done. The Build phase at the start of the second diamond is where we shift into low gear and do the hard work of building the real thing. This is where craftsmanship and slow, deliberate effort really start to shine. It’s where discipline, hard work, and the pursuit of perfection come into play. At the end of the Focus phase you should have a clear plan; Build is where you execute.

Of course, in reality a project oscillates back and forth between the Expand and Build phases throughout its life cycle, but the prevailing goals of the Build phase are ensuring accuracy, optimizing quality, and getting the project to the finish line. 

Figure: A humorous line graph of the emotional rollercoaster of a project, its peaks and valleys labeled from "Start. So excited!" and "Too many ideas!" through "Crazy new idea!", "It works! Let’s release!", "Hmm. Well, that’s not going to work," and "Oops. Forgot this other thing." to "Yay! We made it!" and "See? That wasn’t so hard..."

For some people, the Build phase is pure joy. It’s a time to work your technical muscles and make clear progress toward defined objectives. This is where your achiever side shines. You get to check things off of your to-do list from the Focus phase, and you should always be making visible progress toward your goals. 

For other people, Build is a painful and boring slog. Coming back to our mountain climbing metaphor from a previous article, at the start of the second diamond you are standing at the foot of the mountain and you can’t even see the top. In that moment, you might panic and decide that it was more fun to plan the trip than it is to climb the actual mountain. You might be tempted to just turn around and go home. Or maybe you’re really excited when you first start the climb, but then your muscles start to hurt in the first 5 minutes and you wonder if maybe you’re just not cut out for this. If this is you, that’s ok! You might decide to spend your time in Expand instead, or you might experiment a bit to see how you can make Build work better for you. 

If you talk to people who actually climb mountains, they will probably tell you that planning is the least rewarding part of the experience: there is no replacement for actually being there. Yes, it’s hard work, but they’ve hooked into other rewards that make the effort part of the joy. They will tell you that you need a good plan that’s matched to your strength and endurance, along with the courage to start and a willingness to embrace the discomfort that is part of every climb. After that, you just take one step after another, lean into the effort, and keep going until it is done. It will be hard, and you will be tired. You will fight your own resistance along the way. You will need to push through all of that to get to the other side. There’s no shame in turning back – sometimes it’s the wisest choice, especially if you’ve misjudged your skill level – but this is your mountain and you will need to climb it if you want to get to the top. 

It’s important to realize that the work often gets easier as you go, because the view and sense of achievement start to pull you along. You hurt less as your muscles warm up and the endorphins kick in, and you start to find a rhythm in the work. The people who excel at Build are the ones who have enough confidence in the journey to get through that initial discomfort, knowing that there is something worthwhile on the other side. They might even enjoy the hard work and find pleasure in the challenge of pushing their own limits. Skating over lots of ideas and imagining what we could do might be really fun, but it is the effort and reward of creation that really lights Builders up. 

Neither the Expand nor the Build phase is better or worse than the other. They require different skills, and appeal to different people. Sometimes a single person is able to master and enjoy both phases, but most people lean more toward one or the other. If you’re a natural at one and struggle with the other, that’s pretty normal. 

Switching metaphors, it doesn’t really matter which hand you prefer to write with, as long as you end up with a similar result. Still, most people will choose one hand over the other for certain tasks. There are few truly ambidextrous people out there, and most of them put in significant, conscious effort to train their second hand before it can be useful for precise tasks. Expand and Build are just different strengths.

If Build isn’t your thing, have patience with yourself, and try to enjoy the challenge of learning something new. You may never get to a point where you are equally strong in both phases, but time and practice will help you to become more comfortable working in the one that comes less naturally to you. 

Things you need in the Build stage

Focus, and a plan.
Coming out of the first diamond, you should have a clear plan for what you need to accomplish. Know what needs to be done, and put all of your energy into doing it.

A realistic sense of your own capabilities and strengths.
You would be crazy to summit Kilimanjaro without training, and you shouldn’t expect to create masterpieces out of the gate in data vis, either. Know your skillset, and choose a project that pushes you at the level where you truly are, not where you wish to be. If you’re in over your head, the safest and smartest thing is to turn back, or to find someone who can help.

Discipline.
This is the time to do things right. No shortcuts, no “I’ll come back to this later.” You don’t want to leave a bunch of holes in your final deliverable. This is where you clean up and resolve all of the things you left dangling in the Expand phase. For many, this is the hardest part of Build; you can’t defer the things you’d prefer not to do any longer. If you haven’t been cleaning up loose ends all along, now is the time to deal with them. Think how much better you’ll feel when they are resolved!

A commitment to excellence and craftsmanship.
Build is where you take the time to do your very best work. If you are a natural Builder, your inner perfectionist may be feeling traumatized by the fast and loose approach we took in the Expand phase: this is where that part of you can retake control and really shine. Just make sure to keep things positive, and not self-defeating – it was your job to explore during Expand. A bunch of loose sketches and semi-realistic ideas is actually what perfection means in the ideation stage! This is where you take those half-finished sketches and rework them into something real. If your Builder is feeling completely exasperated with your Expander, that might be a sign that you’ve overcommitted or that you’re not setting realistic goals. This is a good time to check in and head back into Focus if necessary.

The ability to say no.
There will be times when you’ll be tempted to go “just a little bit further” or add one more thing. This is your judgment call, but it’s important to keep focused on your goals for the project and only add what you need. If you tire yourself out on the side trails, you may not make it to the peak. If you get to the end early and still have energy, you can always go back and explore on the way back down. Compulsively adding too many things during Build is a common reason that people burn out and don’t make it to the end of the project. It may also be a sign that you’re jumping back into Expand too readily, and not sticking it out with Build. 

Enough time.
The Build work should start as soon as possible; don’t leave it all to the end or push yourself up against a deadline. Nobody does their best work when they’re under the gun: that creates prime conditions for your inner perfectionist to freak out and melt down rather than help. If you forgot to account for the time it actually takes to do the work, then you should go back to Focus and revise your scope accordingly. If you over-committed, now is the time to admit it and beg forgiveness. This is where you earn your own trust. You can insist on just powering through no matter what, but know that it will be harder to let your vigilance down enough to succeed in the Expand phase next time if you do.

Self-knowledge, and compassion.
You should be pushing to your limits in Build: this is where you achieve creative growth. Doing that without injury requires that you know your strengths and what you need to get through a difficult challenge, and that you know when to stop. If you find yourself always spinning out or pathologically avoiding the Build phase, a habit of pushing yourself to injury is probably why: some part of you knows that it’s not safe, and it doesn’t trust you to go there. 

Energy breaks.
You’re at maximum risk of exhaustion in the Build stage. Remember to stop and do something else from time to time. I like to have a different project in Expand while I’m working on Build so that I can switch gears and do some ideation or sketching when the Build work gets hard (without adding to my current scope!). This is a great place to ideate on Part II of your project, so that you’ll have some ideas ready when this one is done…just be sure that doesn’t turn into increasing your scope for Part I. Switching into Expand mode on a subtask also helps to break things up. Working in different phases is often more effective for me as a “rest” period than taking a complete break: I can rest the mental muscles that are tired, without having to leave flow. 


Remember that the Build phase should be a constant conversation between the final deliverable and the process. It’s really important to take your time here. Work through the problem again all the way from the beginning, taking advantage of all the things you learned in Expand. You may need to step back into Expand or Focus for a little while as you refine the detailed picture of where you’re trying to go. 

Try to get through Build without making major changes to your plan. “Easy come, easy go” is an Expand mentality. It shouldn’t be necessary to throw everything out and start over at this point. As you get deeper into Build, you’re putting in work that’s going to be painful to toss out. Instead of scrapping everything when the going gets tough, lean into it, focus on the goal, and push through. That said, you should expect your picture to shift a little as you get more information about what is (and isn’t) possible, and as you tie up all those loose ends. If you find yourself tempted to run for the exits, that’s probably a signal that you’re overdoing it. Take a rest and re-evaluate, or escape into Expand for a while and then come back.

Now that we’re acquainted with the Build phase in general, let’s take a look at what was happening in the Career portraits project in this stage. 

Content Creation

Clean up the data.
In Focus, we decided on the final variables to use and comparisons that we wanted to make. For Build, we needed to recalculate our values, adjust units, and re-aggregate several of our analyses to allow slightly different comparisons in the final document. We spent a lot of time going back and forth over whether to show counts or percentages, and when to show both. We also checked (and re-checked) our code and our results to make sure that our numbers made sense. 
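This kind of re-aggregation check is easy to prototype before committing to final charts. Here is a minimal Python sketch of the counts-versus-percentages question (the survey rows and category names are invented for illustration; the project itself did this work in R):

```python
from collections import Counter

# Hypothetical survey rows: (career_category, region); invented for illustration.
responses = [
    ("analyst", "NA"), ("designer", "EU"),
    ("analyst", "EU"), ("developer", "NA"),
    ("analyst", "NA"),
]

counts = Counter(category for category, _ in responses)
total = sum(counts.values())

# Keep both the raw count and the percentage: percentages alone can make
# tiny groups look more significant than they are.
summary = {
    category: {"count": n, "pct": round(100 * n / total, 1)}
    for category, n in counts.items()
}
# summary["analyst"] → {"count": 3, "pct": 60.0}
```

Showing both values side by side makes it obvious when a large-looking percentage rests on only a handful of responses.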

Get clearer on the details.
We knew what we wanted to show in the data vis, but we still needed to choose the specific visualizations and define our tools and approach for creating them. We did a brief Expand phase to review sketches for different types of visualizations to evaluate for strengths and weaknesses and feasibility, and then focused back down quickly to the ones we knew we could build in the time that we had.

Production

Think through the format of your final deliverable.
We knew we were creating a digital pdf, but we still had to think through the page size, layout specifics and implications for font sizes, labels, chart formatting, etc. All of these affected the details of our chart and page designs, and set some hard limits on how detailed the visualizations could be.

Pick your technology(ies).
Jenn (my collaborator) and I had a mix of different skills between us, and we didn’t want wrangling with code to slow us down. In the end, we built most of the visualizations in R and then annotated them in Illustrator. Some charts were exported directly from R, others were calculated in R and re-built from scratch in Illustrator, and still others were made in Figma. In some cases, D3 or another solution would have been a better end-to-end tool for production, but for a one-off print publication it was much faster for us to build from where we were and with what we knew well. This hybrid approach required additional manual work and cleanup, but it gave us more control over the final formatting than we could easily get from code. 

Build out the charts.
Once the chart types and general layout were selected, we still needed to build the visualizations and calculate the actual values for our analyses. This required another round of code edits in R, and some additional exploration of how to export and edit the charts once we were done. Jenn was able to do a lot with base chart theming in R, but some visualizations still required manual work. It turned out that our R export workflow couldn’t produce editable text labels, so anything we changed had to be re-typed in Illustrator by hand. The labels alone took more than 30 hours of work to clean up. We understood that cost up front and we accepted it, because for this project it was the simplest way to reach our goal.

Generalize and clean up the code.
As we worked through the final details of the analysis, there were several opportunities to go back and restructure the code to make it more consistent. This helped to make it clearer and more robust, and it also cleaned up a couple of minor calculation errors that we might not otherwise have caught. We chose to do this step even though we weren’t planning to productionalize the analysis, because we wanted our code to be readable for use in future projects and we wanted to make sure that we caught any mistakes or bugs in the data.  

Finalize your narrative.
We had a pretty good sense of the overall discussion we were interested in and the metrics we wanted to show, but you can’t actually finalize your narrative until you’re sure that the data is solid. We re-wrote most of the supporting narrative and revised our document structure more than once as the data calculations completed, to be sure that our comments and the data details were aligned.  

Identify new analyses or content, and prune as needed.
As our picture of the report became clearer, we realized that there were some additional metrics that we wanted to include. We considered the full set of options and did another round of Focus to finish things up. Some of these were larger than we could afford, so we put those off for another day. We also removed things that no longer fit. We’d planned to include an analysis comparing answers from independent visualizers and those employed in organizations, but when we got into the details, the branching structure of the survey and distribution of responses made it hard to compare those populations in a meaningful way. It would have been possible to rework the analysis to include that comparison, but it would have meant going back to square one. Reluctantly, we added this to the list of follow up projects that we could return to later. 

Assemble the final document.
Once the analysis, text, figures and other content were complete, they needed to be assembled into a single document for publication. This process alone took a couple of months and went through several rounds of revision. We used InDesign for the layout and linked the images from files, which made it easier to manage the many, many edits and refinements required as we worked through micro-edits on the final document.


There are a lot of pieces involved in Build, and it can be difficult to navigate all of the loose ends and get to something that you’re proud of. The flip side is that you get to see the work develop and grow into its final shape, and there is a lot of pleasure in creating a solution that feels well-resolved. In the next article, we’ll take a closer look at some of the individual choices going on in this stage of the project, and how practical and editorial choices came together to shape the final document.

Previous articles in this series:

Embrace the Challenge to Beat Imposter Syndrome
Step 1 in the Data Exploration Journey: Getting to Know Your Data
Step 2 in the Data Exploration Journey: Going Deeper into the Analysis
Step 3 in the Data Exploration Journey: Productive Tangents
Step 4 in the Data Exploration Journey: Knowing When to Stop
Step 5 in the Data Exploration Journey: Collaborate to Accelerate
Step 6 in the Data Exploration Journey: Cut to Realistic Scope
Step 7 in the Data Exploration Journey: Spin Off Projects

Related links:

Early Sketches for Career Portraits in Data Visualization, by Jenn Schilling
DVS Careers in Data Visualization, YouTube Playlist for interview series by Amanda Makulec and Elijah Meeks
Career Portraits project (DVS Member space login required)

Categories: How To

The post Step 8 in the Data Exploration Journey: Build appeared first on Nightingale.

Step 7 in the Data Exploration Journey: Spin-Off Projects
https://nightingaledvs.com/data-exploration-spin-off-projects/
Tue, 24 Oct 2023

With large projects, it's common to pursue spin-off ideas for the material that doesn't fit into the core project. Here are two examples.
This article is part 8 in a series on data exploration, and the common struggles that we all face when trying to learn something new. A list of previous entries can be found at the end of the article. I began this series while serving as the Director of Education for the Data Visualization Society, because so many people were asking to hear more about the process of data exploration and analysis. What began as an exploratory project on the “State of the Industry Survey” data grew into a 1.5-year Career Portraits project that produced the 2023 “Career Paths in Data Visualization” report (DVS member login required). This series illustrates how I approach a new project, and the fact that no “expert” is immune from the challenges and setbacks of learning. Let’s see where this journey takes us!

In the last article, Jenn Schilling and I refocused my initial data exploration to frame a broader Career Portraits project based on the DVS “State of the Industry Survey” data. We trimmed our scope aggressively to reflect the time and resources that we had on hand, and re-envisioned some core parts of the project. At the end of that focus-and-consolidate phase, we had a clear, tight focus for the project. 

Successfully navigating the focus phase has many advantages. First, it allows you to put your energy into the most important things. It also creates the seeds for lots of new ideas and projects that can spin off of the core work; very often, almost everything you cut can be considered a future upgrade or a new project in its own right. 

If your time and resources are fixed, focusing a project can mean postponing parts that you’re excited about. (I suspect that this is why most people struggle to make the cuts.) In many cases, this also creates an opportunity to share the work or to structure it in new ways. The end of the focus phase is the perfect time to start looking for collaborations that can help to move your project ahead. The guidelines from our previous article on collaboration still apply! In the case of spin-off projects, it’s particularly important to remember that a collaboration adds back to the scope that you just reduced. You need to account for that effort, and should never use spin-offs simply to avoid making cuts.

When working on initiative-level collaboration for something as large as the Career Portraits, it’s important to: 

  • Set clear boundaries between projects. Everyone needs to know what they’re working on, who’s doing what, how it’s different from what others are doing, and what’s needed for it all to come together. Mixed messages lead to missed goals, duplicated work and frustration.
  • Look for common goals and mutual wins. The best collaborators have interests slightly outside of your scope, but needs that align with yours. We’d worked on aspects of both projects below during our initial Career Portraits work, but pursuing them as collaborations generated significant contributions that supported and extended the core work beyond what we could have done alone. 
  • Work to align timelines in advance. In some cases, collaboration like this creates dependencies. It’s important to be clear about when you need things done, and to be willing to flex if the schedule doesn’t work out as you hoped. 
  • Be realistic about what you can take on. Collaborations take a lot of work and require support to succeed. Collaboration is not delegation, and you need to be available to fully participate in any project that you spin off. If you can’t realistically support it from start to finish, don’t start.  

As Jenn and I started the heads-down build phase for the core Career Portraits work, I was able to identify and spin off two projects in collaboration with other teams: 

Collaboration #1: Career Interviews Series 

The first project that we spun off was a series of career interviews with people working in data viz. I knew from the beginning that I wanted to include qualitative stories alongside the quantitative data for the Career Portraits, to give the data a more human face and to illustrate how much variation there is within even a few of the individual “data points” (a.k.a. responses) from the survey. It’s always important to check your insights against reality whenever you are building a data story, and connecting with people from the community was one way to help us do that. While Jenn began re-working the core data analysis in December 2021, I started a series of research interviews with people working in the field. I put out a call for participants in the newsletter, on Slack and in a couple of articles, and we got a core set of interviews scheduled in January to start the research. 

Figure: YouTube cover image for the Careers in Data Visualization interview series, hosted by Elijah Meeks.

Around March of 2022, Amanda Makulec started conversations with Elijah Meeks about hosting a series of career conversations to spotlight paths into data viz. This aligned well with the work that I was already doing for the Career Portraits project, so we joined efforts and I worked with Elijah to brainstorm some questions and visualizations to inform his series. Amanda organized the calls to be released over the summer, and Josephine Dru and I compiled transcripts for each one as they came out. My early research calls had given us a good sense of where we wanted the project to go, so I was also able to compile a pre-survey for all participants to take. The questions were similar to the ones we were working on in the Career Portraits project, but they went into more detail and depth on a few points that we wanted to learn more about. Because we were working in a short format with a patient and supportive audience, we were able to ask much more focused (and sometimes repetitive) questions than we would publish in the general survey. As the results came in, I visualized the data and wrote a series of summary profiles for each interview, creating the “Career Profiles” section of the final report.

Adding profile interviews to the newly-reduced scope of the “Career Paths in Data Visualization” report more than doubled the work and required pushing our initial deadline back by a couple of months, but it gave the final project a much richer view into career paths in data viz. If we hadn’t cut deeply during the focus phase for the core project, we wouldn’t have had the time budget to take advantage of this opportunity when it arose. Collaborating on the profile interviews made both projects stronger, and made the lift much smaller than if the Career Portraits team had tried to do it all alone. 

Collaboration #2: Automated Tagging of Free-Text Job Titles

The second project we spun off was much larger in scope. Jenn had completed a quick clustering analysis when she first joined the Career Portraits project to look for trends in the kinds of tools needed across different careers. By looking at where specific titles intersected or crossed over between career areas, we thought we’d be able to pull out a lot more detail about specific roles. Our early results were very intriguing, but we quickly realized that this analysis was a project all on its own, and it wasn’t realistic to pursue it as part of the core Career Portraits work. Quite reluctantly, we put the clustering work aside so that Jenn and I could focus on re-running her initial core careers analysis on the updated 2021 dataset, ensuring that we were using data from the most recent survey collection year. 

Figure: Graph diagram showing seven color-coded clusters of connected nodes, from the clustering analysis in Lukas’ thesis project.

In April of 2022, a graduate student named Lukas Geisseler posted a query in the DVS Slack looking for organizational partners for his master’s thesis in Applied Information and Data Science at Lucerne University in Switzerland. His program required a project with a well-defined topic that would contribute to an organization. I reached out in response, and we started discussing whether he might be able to use the survey dataset to support his thesis project. The clustering analysis was where our discussions began, but we soon settled on a much larger task that would be a fantastic addition to our dataset and a better candidate for the depth and scope required of a master’s thesis. To understand the importance of his contribution, it’s helpful to know a little bit more about the limitations of the survey dataset.

There are two columns related to careers in the survey dataset. The first one is a fixed career category that users select from a dropdown (analyst, designer, etc.), and the second is a free-text entry field for job title, where people input their actual job title. When we first started the Career Portraits project, I manually (and inconsistently) tagged a couple thousand survey responses to compare the fixed bins to the job titles, and found many interesting threads to pursue. My exploration was fast and dirty, but it gave me a better sense of the dataset, and it pointed out some important aspects of the data that helped us to contextualize the results for our core Career Portraits work. 

In the fixed career category question, people often categorize themselves differently than you’d expect based on their job title: someone categorizes themselves as an analyst in the fixed careers list but their title is data visualization developer, or an engineer has the title of data visualization UI designer. This in itself is a fascinating statement about job searching and careers in general, but it’s particularly relevant in a career where roles tend to be highly variable between companies, and are often conglomerates of multiple roles and responsibilities. 

The definition of the fixed categories themselves also posed some challenges. First, it wasn’t clear when someone should switch from calling themselves an “analyst” to “leadership”: if your title is director of analysis, where do you put yourself? Some people listed themselves as “leadership” when they got to a team lead or director level position, and some were still listing themselves as “analysts” when their title stated VP or CEO. Second, it could be hard to understand the definition of some of the buckets: Do data scientists belong in the analyst, developer, or scientist bucket? Representatives show up in all three! We knew that these response variations would muddy our results (for instance, including a VP’s salary will almost certainly skew the median salary for a career), but there wasn’t a good way for us to consistently and efficiently tag the titles with the resources and the project team that we had. Using the free-text question could have helped to clarify some of these more complex cases, but we reluctantly chose to rely only on the fixed buckets for our project because they were the simplest to use and the most direct reflection of the user’s input.
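A crude way to surface these category/title disagreements is a keyword rule of thumb. The sketch below is hypothetical Python, not a method we actually used, and the keyword lists are invented; real survey titles are far messier than this:

```python
# Hypothetical keyword rules; the real survey categories and free-text
# titles were messier, and the project did not rely on rules like these.
KEYWORDS = {
    "analyst": ["analyst", "analytics"],
    "developer": ["developer", "engineer"],
    "designer": ["designer", "design"],
    "leadership": ["director", "vp", "chief", "head of"],
}

def flag_mismatch(category: str, title: str) -> bool:
    """True when the title's keywords point away from the chosen category."""
    title = title.lower()
    hits = {cat for cat, words in KEYWORDS.items()
            if any(word in title for word in words)}
    return bool(hits) and category not in hits

flag_mismatch("analyst", "Data Visualization Developer")  # → True
flag_mismatch("designer", "Senior Product Designer")      # → False
```

Even a rule this simple flags the analyst-with-a-developer-title case from the paragraph above, while leaving consistent responses alone.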

The free-text job title question also contains a lot of implicit information about job seniority, role progression, and experience. It would be very interesting to compare job seniority level by title (junior analyst, senior analyst, director of analysis, etc.) with years of experience in the base survey dataset. For example, it would be interesting to compare the range of years of professional experience typical for a junior vs a senior role. Unfortunately, we couldn’t easily extract the job level information from the free-text entries without more advanced methods. In order to focus on our core deliverable, we made several painful cuts and put these more nuanced analyses aside, hoping to come back to them another day. 
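For a sense of why seniority extraction looks deceptively easy, here is a naive, rule-based Python sketch (the level names and regex patterns are my own invention). Rules like these break down quickly on real titles, which is exactly why more advanced methods were needed:

```python
import re

# Naive seniority buckets inferred from title keywords; invented for
# illustration, and checked in priority order (most senior first).
LEVELS = [
    ("executive", r"\b(chief|ceo|cto|vp|vice president)\b"),
    ("director", r"\b(director|head)\b"),
    ("lead", r"\b(lead|principal|staff)\b"),
    ("senior", r"\b(senior|sr)\b"),
    ("junior", r"\b(junior|jr|intern)\b"),
]

def seniority(title: str) -> str:
    lowered = title.lower()
    for level, pattern in LEVELS:
        if re.search(pattern, lowered):
            return level
    return "unspecified"  # most titles carry no explicit level at all

seniority("Senior Data Analyst")          # → "senior"
seniority("Director of Analysis")         # → "director"
seniority("Data Visualization Engineer")  # → "unspecified"
```

The "unspecified" fallback hints at the real problem: most job titles carry no level marker, so keyword rules alone can't recover the seniority information the survey analysis would need.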

We didn’t know it at the time, but we didn’t have long to wait. Auto-tagging free-text responses was exactly the kind of problem that Lukas was interested in studying, and he developed a pipeline to analyze the job titles as part of his thesis project. The full pipeline includes a neural network model, machine learning to train it, and a graph representation to help interpret and quality-check the dataset.

Once built, this pipeline automated the analysis of the job title data, removing a tedious and manual task that is hard to do consistently over large datasets. However, the initial process of training the algorithm required that Lukas manually validate his machine learning tags. He published an early version of his results in the DVS Slack with a call for participation, and some dedicated community members helped him to error-check and validate the initial coding results, lightening the load on one of the more time-consuming parts of his thesis work. This validation process also helped Lukas to fine-tune his approach, and made the final outputs more robust. Once the initial text tagging was validated, he carried out a clustering analysis to look at how jobs were distributed within the results, and used the resulting graph to detect patterns and intriguing details in the dataset.
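To illustrate the basic idea behind clustering job titles, here is a toy Python sketch using bag-of-words cosine similarity and a greedy grouping pass. It is not Lukas’ pipeline, which used trained models and graph analysis; the titles and the similarity threshold are invented:

```python
from collections import Counter
from math import sqrt

# Invented titles; a toy stand-in for thousands of free-text survey entries.
titles = [
    "data analyst", "senior data analyst",
    "data visualization engineer", "front end engineer",
    "information designer", "senior information designer",
]

def vector(title):
    """Bag-of-words term counts for one title."""
    return Counter(title.split())

def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm

# Greedy pass: join the first cluster whose seed title is similar enough,
# otherwise start a new cluster.
clusters = []
for title in titles:
    for cluster in clusters:
        if cosine(vector(title), vector(cluster[0])) >= 0.5:
            cluster.append(title)
            break
    else:
        clusters.append([title])
# clusters → [['data analyst', 'senior data analyst'],
#             ['data visualization engineer'], ['front end engineer'],
#             ['information designer', 'senior information designer']]
```

Even this toy version shows the appeal: similar titles fall together without any manual tagging, though the threshold choice and the messiness of real titles are exactly where the serious modeling work begins.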

Lukas’ thesis was submitted in June of 2023. In all, he contributed more than 500 hours of advanced data science and programming time to the DVS. This is far more than we’d expect from a typical volunteer, but he was able to benefit from our dataset, experience and use case as a core part of his thesis, and we certainly benefited from the results of this in-depth academic project. Lukas has already tagged the 2020 and 2021 datasets, and the outputs of his model can be used to tag future datasets as they are collected. Leveraging the algorithm’s consistency and speed will also allow us to enrich datasets from previous years. This enrichment might support more advanced longitudinal analyses in future, if we can overcome the complexities of changing definitions and survey context from year to year. Lukas’ analysis also created a more robust version of the initial clustering that Jenn had worked on in the core project. Producing similar results with a completely different method added confidence to some trends, and highlighted some potential differences or artifacts in others. Comparing the initial results from the two analyses helped us to identify many interesting stories within the dataset, and to outline potential next steps to pursue. 

What makes a spin-off project work?

There were several ingredients working together that made these spin-off projects successful. Here are some things to look for when evaluating a side project:

Opportunities to provide value to both sides

For Elijah’s interview series, we were able to help brainstorm questions and prep materials, and we folded the results of the interviews back into our work to give them additional impact. When the portraits were published, the interviewees got a document that they can share to highlight their work. We got support from Amanda in scheduling and running the calls, the benefit of Elijah’s standing in the community and his perspective on what’s interesting to talk about, additional material to support the Career Portraits effort, and a collection of generous insights from the people that he interviewed. 

In Lukas’ thesis project, we were able to provide an interesting dataset and a tangible problem, as well as some basic explorations to accelerate the start of his work. It’s nice to work on a thesis project that will continue to be valuable after the work itself is complete, and a tangible outcome can help to make an academic thesis more approachable to potential employers. We got almost a year of focused, highly specialized work that enriched our dataset and will help us to continue creating value for the DVS community. 

Clear separation between projects

While both projects contributed directly to the Career Portraits work, neither was required for it to succeed. We wanted the projects to feed and encourage one another, but not to introduce unnecessary pressure or risks. The projects were structured in a way that allowed others to take ownership and carry an initiative forward without a lot of input from the core Career Portraits team, but we also set up regular communication between the projects to learn from one another, seek additional opportunities for alignment between projects, and to highlight the contribution and impact of each team. 

A critical aspect for planning was to remove or reduce timeline dependencies between projects, so that if one project fell behind or changed direction it wouldn’t break the others. We did need to complete the interview series before the report could be published, but the profiles were written and visualized independently from the analysis work that I was doing with Jenn. I handed off a completed document at the end of my board tenure in January of 2023, ensuring that the core project reached completion before it changed hands. We didn’t include Lukas’ results in the core document because we knew that his project wouldn’t finish until at least six months later, but his results will help to extend and inform the clustering analyses that we’d started before he joined. Having seen his preliminary results, his work could even become the seed for Career Portraits V2, if the DVS decides to pursue that project in future.

Honor each contribution 

Each collaboration is a significant commitment of effort and time. It’s important to honor each person’s contribution to the bigger effort, and to take the time to make sure that work is rewarded. There is a difference between collaboration (where both sides contribute) and delegation (where one side assigns work to someone else and expects an outcome). I find that people often confuse the two. To be a good collaborator is to commit to doing work to raise your collaborators up, even if it’s outside of the focus of your core project. For this reason, you need to be very careful about assessing your own ability to support the work involved in a side collaboration. The focus stage of a project is necessary to evaluate whether you have the time budget to do that successfully. 

In the profile interview series, our commitment took the shape of additional preparation for the interviews and nearly doubling the scope of the Career Portraits deliverable. For the thesis work, I chose to continue working with Lukas past my tenure as DVS Education Director, to make sure that he had the support he needed to see his project through to the end of his thesis work. Collaboration is a giving economy, and it’s important that you commit to your collaborators as deeply as you ask them to contribute to you. 

In the end, both of these collaborations were highly successful, and I believe that they created significant value for the organization. Both leveraged the early groundwork that Jenn and I had done, but each project took things to a completely different level and contributed far beyond what we were able to do on our own. Because we were disciplined about the focus phase of our project, we were able to identify and act on these opportunities when they came up, allowing us to collaborate in ways that we couldn’t have anticipated at the time we made the cuts. 

You won’t always be able to spin projects off immediately with these kinds of results, but a disciplined, focused approach to project management helps to ensure that you’ll be ready to jump on opportunities when they come. Returning later to the things you cut means that you’ll always have another project ready if your initial inspiration runs dry, or your project hits a wall. 

The core Career Portraits project was published this summer in the DVS member space (member login required). We’ll continue discussing the actual project build in the next article!

Previous articles in this series:

Embrace the Challenge to Beat Imposter Syndrome
Step 1 in the Data Exploration Journey: Getting to Know Your Data
Step 2 in the Data Exploration Journey: Going Deeper into the Analysis
Step 3 in the Data Exploration Journey: Productive Tangents
Step 4 in the Data Exploration Journey: Knowing When to Stop
Step 5 in the Data Exploration Journey: Collaborate to Accelerate 
Step 6 in the Data Exploration Journey: Cut to Realistic Scope

Related links:

Early Sketches for Career Portraits in Data Visualization, by Jenn Schilling
DVS Careers in Data Visualization, YouTube Playlist for interview series by Amanda Makulec and Elijah Meeks
Career Portraits project (DVS Member space login required)


The post Step 7 in the Data Exploration Journey: Spin-Off Projects appeared first on Nightingale.

Step 6 in the Data Exploration Journey: Cut to Realistic Scope https://nightingaledvs.com/data-exploration-cut-to-realistic-scope/ Mon, 21 Aug 2023 12:43:12 +0000 A project to find insights from DVS's State of the Industry Survey data moves from the analysis stages to preparation for build out.

The post Step 6 in the Data Exploration Journey: Cut to Realistic Scope appeared first on Nightingale.

This article is part 7 in a series on data exploration, and the common struggles that we all face when trying to learn something new. A list of previous entries can be found at the end of the article. I began this series while serving as the Director of Education for the Data Visualization Society, because so many people were asking to hear more about the process of data exploration and analysis. What began as an exploratory project on the “State of the Industry Survey” data grew into a 1.5-year Career Portraits project that produced the 2023 “Career Paths in Data Visualization” report (DVS member login required). This series illustrates how I approach a new project, and the fact that no “expert” is immune from the challenges and setbacks of learning. Let’s see where this journey takes us!

In the previous installment in the series, Jenn Schilling and I had formed a new collaboration to rework my exploratory analysis of the DVS State of the Industry Survey data. After our initial introductory conversations, we needed to get down into the narrowest part of the focus phase: redefining our scope and deliverables in the context of the new project. Once you’ve opened up all of the possibilities and peeked down the many avenues for exploration, there comes a time where you need to pause and decide what you are actually going to build. At this stage in the project lifecycle, your focus should be on getting to the absolute essentials of what needs to happen so that you have a clear path to getting things done.

A lot of people struggle with this part of the process because it requires you to be very disciplined about giving things up and letting go. That can feel harsh or even scary at times. It’s natural to get attached to your ideas and your project, and it can be very hard to leave behind ideas that you’ve invested in. This is one of the best reasons that you should try to keep the initial exploration light and fast. You need to remain flexible enough to pivot before entering the build stage, and it’s a lot easier to make that call if you remember that you’re “just sketching” in the early stages, and avoid becoming too deeply invested in a particular outcome. 

I like to compare the focus stage to pruning plants in the garden. It feels cruel at the time, but a good pruning inspires vigorous new growth, and it’s often necessary to keep a plant healthy and happy. If you prefer the traveling analogy that I used in my previous articles, this is the stage where you lighten your pack before beginning that last, hardest climb. Either way, it’s a good opportunity to trim away the things that aren’t working and to make deliberate choices about how to move forward into the build stage. 

“I like to compare the focus stage to pruning plants in the garden. It feels cruel at the time, but a good pruning inspires vigorous new growth, and it’s often necessary to keep a plant healthy and happy.”

Of course, everyone is different. Some people are very uncomfortable when ideating and prefer to have a clear plan at all times. For them, getting to the focus stage can feel like a real relief. If you find yourself saying that you “don’t have any good ideas,” it may be that you are too good at the focus stage and need to spend some time ideating and playing with your ideas before jumping straight into focus mode. If you’re someone who tends to overcommit and feel overwhelmed by the size of a project, then taking a moment to put things down can be a lifesaver, because it gives you the space to keep your project from spiraling out of control. It’s worth remembering that there are lots of ways to react to the same situation. Try experimenting with unfamiliar approaches to get around your blocks. 

Personally, I don’t like cutting things back, but I do like to focus, and I realize that pruning is crucial to success. It’s really exciting to feel the scope click into place, and it can be such a relief to cut through the noise and end up with a clear plan. Focusing on that desired outcome helps me to push through the cuts that I find harder to make. 

I often find it helps to distance myself from the project a little before beginning the focus stage. At this point, I need to let things take on a life of their own, rather than pushing for that one thing I thought I would make. Every project has an identity and a best expression in the world that is informed by its circumstances, and those may be very different from where you thought you’d end up. If you’ve learned things during the journey, it is natural for your plans to adapt and change accordingly. You can always come back to your initial ideas in another project. Your role at this stage is to give this project its best chance to reach its full potential. Assess where things are, cut anything that no longer helps, and focus on how you can best get to done.

Here’s a checklist for getting through this crucial stage:  

  • Take an honest look at what’s possible—and what’s not. Be realistic about what you can do with the time and the resources that you have. 
  • Re-focus your scope. You started out with one vision in your head, but you’ve learned some things since then. Trim anything that no longer fits.
  • Discard what you don’t need. If you can’t carry it to the finish line, don’t put it in your pack. Be ruthless about what’s really needed, and what can be left behind. 
  • Be prepared to start over. More often than not, I find that the best way to focus your project is to scrap your initial sketches and start over. You have a different perspective now than you did when starting out, and that can help you to put the pieces together in a completely different way. This is especially true if you’re doing any kind of data analysis; that early work harbors hidden mistakes that will trip you up later, so it’s best to start from the beginning and work it through again. 
  • Explore new possibilities. When you’re clear on where you’re going, you’ll find new things that you need to do to get you there. This opens up new opportunities and new things to learn. (For those who are not comfortable with the focus stage, identifying new horizons to focus on can be a huge help.)
  • Prepare to commit. This next stage is all about doing things thoroughly and right. Sketching and ideating can be lots of fun, but the build phase requires you to get busy and roll up your sleeves. The first diamond is focused on exploration and speed, but the second depends on quality, craftsmanship, and discipline to make your project the best that it can be.  

What this looked like in the survey project 

As a reminder, this series began with an exploratory analysis of the DVS State of the Industry Survey to understand the tools that people use in different data visualization careers. As I talked to others at the DVS, my initial project was slowly morphing into something different. I was eager to dig in and follow up on a bunch of interesting leads from the tools data, but the survey contains many questions across all areas of data visualization practice, and we thought it might be more useful to map out a general picture of career paths as a first step, and then come back to the details of tool sets used in different careers later. The new project had a much larger scope that wouldn’t have been possible with my limited R skills and manual Excel manipulation. It was really the addition of Jenn’s skillset to the project that allowed us to make that choice. 

“We needed to set some realistic goals … switching from the ideation phase to the build phase of the project.”

One of the first things Jenn and I needed to do in our new collaboration was to set some realistic goals for our joint project outcomes, switching from the ideation phase to the build phase of the project. We knew that the new project would require a very different analysis of the dataset, so we began by doing a mini-exploration to identify the variables that we had available in the survey, the key questions that we thought we could answer, and to look at some initial values in the dataset. We mapped out the different options, focusing on the variables and analyses that we felt would best support the careers overview we wanted.

Topic board for planning the structure of the Career Portraits report: digital post-its in a Mural board, grouped into categories to show how individual survey questions map onto the questions that early-career professionals might have when entering the field.

With a clearer idea of the goals and variable space in hand, Jenn and I walked through my odd mish-mash of R code and Excel files to be sure that we understood what we were trying to do, and then we scrapped it all and started fresh. Keeping the old code would only have slowed us down, and it would almost certainly have introduced mistakes. It was a huge learning opportunity for me to see how someone who knew what they were doing in R would approach the same problem. I’d struggled for weeks trying to wrangle the data into the form that I needed (or even to understand exactly what it was that I needed). It turns out that partial pivots and sequential group-by operations are hard to invent from scratch, even though they are very obvious once you know what you’re doing. From the perspective of my learning, the decision to throw out all of my initial work and start over was hands down the most useful part of the entire project. I got to see how an expert approached the problem, and it helped me figure out which fundamentals I’d been missing when trying to learn the software on my own. 
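For readers curious what those operations actually look like, here is a dependency-free Python sketch of the same reshaping idea (our analysis used R; all data and names below are invented for illustration): collapsing “long” rows, one per respondent-and-tool pair, into a “wide” tool-by-role count table through a group-by followed by a partial pivot.

```python
# Hypothetical data: "long" format, one row per respondent/tool pair.
from collections import defaultdict

long_rows = [
    (1, "analyst", "Excel"), (1, "analyst", "R"),
    (2, "designer", "Figma"), (2, "designer", "Excel"),
    (3, "analyst", "R"), (3, "analyst", "Tableau"),
]

# Step 1, the group-by: count respondents for each (tool, role) pair.
counts = defaultdict(int)
for _respondent, role, tool in long_rows:
    counts[(tool, role)] += 1

# Step 2, the partial pivot: spread roles into columns, keep tools as rows.
roles = sorted({row[1] for row in long_rows})
tools = sorted({row[2] for row in long_rows})
table = {tool: {role: counts[(tool, role)] for role in roles} for tool in tools}

for tool, row in table.items():
    print(tool, row)
```

In tidyverse R the same two steps would be a `group_by()`/`count()` followed by a `pivot_wider()`; obvious once you have seen it, and very hard to invent from scratch.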

“From the perspective of my learning, the decision to throw out all of my initial work and start over was hands down the most useful part of the entire project.”

As we began re-working the code, we also started to get into deeper detail with the dataset. One of the key things I’d glossed over in the initial exploration was an analysis of statistical significance in the survey results. If there is one mistake that I see most often from inexperienced folks, it is getting excited about a “story” in the data before checking to be sure that the trend you see is reasonably likely to be real. I wanted a sense of the size and types of variation in the data before worrying too much about the interpretation, and we needed to get our feet back on solid ground before we could go any further. Getting ahead of yourself here risks having your whole project fall apart when the analysis doesn’t hold up. 

The basic test of statistical significance is this: “Is the difference I see in my data big enough to be meaningful beyond the measurement noise?” If you would see the same difference when randomly sampling different subsets of the dataset or when measuring a different population, then you’ve got a winner. If the difference vanishes when you sample differently, then it’s likely just an artifact of your method. Without a real statistical analysis, I had no way of knowing whether a 10% variation between two groups was likely to mean something, or whether it was just a blip in the dataset.
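That resampling intuition can be made concrete with a small permutation check (the numbers below are invented, not our survey data, and a real analysis would choose an appropriate formal test): shuffle the group labels many times and see how often chance alone reproduces a difference as large as the one observed.

```python
# Hypothetical permutation check: shuffle group labels and see how often
# chance alone reproduces a difference as large as the observed one.
import random

random.seed(0)
group_a = [1] * 30 + [0] * 20   # e.g. 60% of group A uses some tool
group_b = [1] * 20 + [0] * 30   # vs. 40% of group B

def mean(xs):
    return sum(xs) / len(xs)

observed = abs(mean(group_a) - mean(group_b))

pooled = group_a + group_b
trials, extreme = 2000, 0
for _ in range(trials):
    random.shuffle(pooled)                  # re-label everyone at random
    resampled_a = pooled[:len(group_a)]
    resampled_b = pooled[len(group_a):]
    if abs(mean(resampled_a) - mean(resampled_b)) >= observed:
        extreme += 1

p_value = extreme / trials
print(f"observed difference: {observed:.2f}, permutation p-value: {p_value:.3f}")
```

A small p-value means random re-labeling rarely produces a gap that big, so the difference is unlikely to be a sampling artifact.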

In our particular example, this check was complicated by the fact that the survey has multiple branches, so that different people see slightly different versions of the survey with more focused questions depending on their career area. For example, if a respondent answers that they are an employee, they will see questions about their organization size and other items that don’t make sense if they say they are an independent contractor. For us, that meant that there were multiple respondent populations that needed to be evaluated separately. 

Before choosing a statistical method, we looked at the simple response counts across the different career areas and branches in the dataset for each survey question that we wanted to use in our analysis. This gave us a sense of how many respondents we had for each sub-group in the dataset. We flagged anything in red that had counts too low to be valuable, and this assessment reinforced our decision to focus only on the employee branch and a subset of careers. That meant that our Career Portraits would be based mainly on people working as employees and less on those who identified as freelancers, and it would eliminate some careers from our first publication.

This was a difficult decision to make (we wanted everybody to be represented!), but we didn’t think that it made sense to try to compare across the populations and question structures given how varied the response counts were for the different populations. Measuring against fewer than 20 responses for one career group and 700 for another would create large differences in the quality of information reported, and we felt that some of the smaller populations would be better served with separate analyses later on.

Excel table of response-counts analysis for the different questions and branches involved in the Career Portraits work, with 13+ questions broken out over 7+ careers. Each cell represents the number of respondents for a particular cut of the data. Blue questions are asked of everyone, purple are only asked of specific branches, and cells highlighted in red are ones where we felt that the counts were too low to support robust analysis or insightful comparisons.
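In code, this kind of red-flagging is just a threshold check over the counts. A minimal hypothetical version might look like the following (the careers, questions, counts, and cutoff are all made up; we judged our actual cutoffs question by question):

```python
# Hypothetical response-count check: flag any career/question cell whose
# respondent count falls below a cutoff. All values here are made up.
MIN_COUNT = 20

counts = {  # (career, question) -> number of respondents
    ("Analyst", "org size"): 712,
    ("Designer", "org size"): 148,
    ("Journalist", "org size"): 14,
    ("Scientist", "freelance rate"): 9,
}

flagged = {key for key, n in counts.items() if n < MIN_COUNT}

for (career, question), n in counts.items():
    marker = "LOW" if (career, question) in flagged else "ok"
    print(f"{career:10s} | {question:14s} | n = {n:3d} | {marker}")
```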

Our survey is a broad-spectrum instrument, designed to give a holistic overview of practices in the field. It isn’t a formal research project, and it’s not intended or structured to answer deep, research-level questions. We also weren’t trying to publish a final statement on universal practices in data vis. For that context, it didn’t really make sense to try to pin down a specific statistical error bar for each comparison that we wanted to make, and we didn’t think that doing so would really help people to interpret the final results. Instead, we focused on a general margin-of-error calculation across all questions. This indicated that a 10% difference was the rough threshold for significant variation, and that gave us a rule-of-thumb guideline for interpreting the comparisons that we wanted to make. 
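For context, the textbook worst-case margin of error for a survey proportion is z * sqrt(p * (1 - p) / n) with p = 0.5. The snippet below is illustrative only (it is not our exact calculation), but it shows why roughly 100 respondents per question lines up with a plus-or-minus 10% rule of thumb:

```python
# Worst-case (p = 0.5) margin of error for a survey proportion at ~95%
# confidence (z = 1.96). Illustrative only; not our exact calculation.
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from n responses."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (50, 100, 400):
    print(f"n = {n:3d}: +/- {margin_of_error(n):.1%}")
```

Note the square-root scaling: quadrupling the respondent count only halves the margin of error, which is why small sub-groups are so hard to report on reliably.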

Next, we created a list of all the new analyses that we would need to support our reduced focus. I did some manual analyses in Excel to get through the initial explorations for those items and started creating an outline for our final report. Jenn took a stab at moving the tools analysis one step further ahead. With her data science background, she was able to add a clustering analysis to the dataset, resolving one of the items that I knew I needed but didn’t have enough technical knowledge to complete. About a month in, we swapped projects: I dug deeper into her clustering work, and she switched focus to work on the main project, whipping through a huge set of analyses in a few weeks that probably would have taken me a year to figure out. 

“There were all kinds of interesting things to tease out of the tools analysis, but we ended up leaving out most of it when we realized how much time it would take to fill in the rest of the report.”

Based on the new analysis and the data coverage considerations, we made much more specific decisions about what to include. We looked at the number of people who answered a question, the amount of variation that we saw in the results, and the relevance of those results to our new project focus. There were all kinds of interesting things to tease out of the tools analysis, but we ended up leaving out most of it when we realized how much time it would take to fill in the rest of the report. 

In the end, we discarded almost everything in my initial tools exploration in favor of the new focus we’d chosen. I don’t consider that to be a problem or a loss, and I don’t think that the initial project was wasted effort. We did what we needed to do to understand the dataset, and then we made tough choices based on where we wanted to go. Rather than being discouraged or disappointed by the outcome, I was excited to start fresh on a new project, and grateful for the early explorations that led us there.

In my experience, this is the most common outcome of an early exploration and refocusing stage: If your exploration is effective, then you almost always end up reframing the question that you set out to ask. (That’s why you do the exploration…it’s the whole point!) Collaborating with Jenn opened up possibilities that I didn’t have when working alone, and connecting with people in the #topics-in-dataviz channel on the DVS Slack and other forums helped us to understand what would be most useful to folks in our community. In the end, we decided on a broader overview rather than a deep dive on tools because we thought it would have the most relevance for the people we wanted to serve. I hope that the tools analysis work will come back as part of another project someday, but even if it doesn’t, my meandering journey gave us what we needed to frame the Career Portraits work, and I count that as a win. 


Previous articles in this series:

Embrace the Challenge to Beat Imposter Syndrome
Step 1 in the Data Exploration Journey: Getting to Know Your Data
Step 2 in the Data Exploration Journey: Going Deeper into the Analysis
Step 3 in the Data Exploration Journey: Productive Tangents
Step 4 in the Data Exploration Journey: Knowing When to Stop
Step 5 in the Data Exploration Journey: Collaborate to Accelerate 

Step 5 in the Data Exploration Journey: The Magic of Collaboration https://nightingaledvs.com/data-exploration-collaboration/ Mon, 17 Apr 2023 13:55:38 +0000 A good collaborator offers companionship, a fresh perspective, and can help balance skillsets. Here's how to make a collaboration successful.

The post Step 5 in the Data Exploration Journey: The Magic of Collaboration appeared first on Nightingale.

This article is part 6 in a series on data exploration, and the common struggles that we all face when trying to learn something new. A list of previous entries can be found at the end of the article. I began this series while serving as the Director of Education for the Data Visualization Society, because so many people were asking to hear more about data exploration and the process of learning data vis. The series follows along with an exploratory project on the State of the Industry Survey data. It illustrates how I approach a new project, and the fact that no “expert” is immune from the challenges and setbacks of learning. In addition to working with a new dataset, I also used this project to take my first steps toward learning R. Let’s see where this journey takes us!

After several articles about data exploration and the “expand” stage of the survey data tools analysis project, the last installment identified the first steps of transitioning into the focus phase. After a big, unwieldy exploration process, focus is a time to cut things down, to work on the immediate task at hand, and to move forward on a single, manageable piece with a much tighter scope. 

The double diamond model for design projects. From the starting point (a question, interest, or idea), the designer travels through an “expand/ideate” phase up to the point of “max risk of overwhelm,” then back down through the “focus/consolidate” phase, where this column on collaboration is most relevant. The later phases in the second diamond (“build/produce” and “deliver/deploy”) are still to come.

This is also a great time to look for a collaborator, especially if there are core skills that you don’t have. Some people are game for an open-ended collaboration, but I find that most people respond best when you have a clear idea of what you are trying to do. The end of the focus phase is a great time to make the ask. It’s a lot easier for someone to understand whether they’re interested and to set clear expectations for your collaboration when a project is already well-defined.

Why collaborate

A good collaborator offers companionship and a fresh perspective, and can help balance your skills. They might like something you don’t, or think in ways that feel foreign to you. Or, maybe you’re 100% in sync and it’s just nice to work with someone who likes to work the same way you do. There are many reasons to collaborate: 

  • Opportunity to learn, and to share skills. For me, the best collaborations happen when each person brings a different toolset to the project, and when everyone has something to learn. Most projects move a lot faster with a mix of skillsets; look for teammates whose technical, professional, or project management strengths balance your weaknesses, and vice versa. 
  • Spend time with people you like, doing things that interest you. Collaboration can be a great excuse to get together and work toward a common goal. Sometimes it’s worth cooking up a project with a friend for just this purpose. Other times, the project comes first and the people follow; either way, it’s a great way to get to know people or deepen an existing relationship (as long as your expectations are aligned!). 
  • Meet people with similar interests. Sometimes, an interesting project can be a way to start a new relationship with a stranger or someone that you don’t know well. Especially if you’re an introvert, working together on a project can make it a little less intimidating to get to know someone new. 
  • Support a common vision or cause. Some collaborations form around an idea, a principle, or an organization. There’s magic in people coming together for the sake of getting things done. 

Like any relationship, there are as many ways to collaborate as there are people, and every collaboration will be different. To me, an ideal partnership is one where both parties feel like the other one is doing all the hard work. That’s a sign that there is a good match in expectations and contribution effort, and a strong balance of skills.

Finding a collaborator

Lots of people struggle with finding the right collaborator. Fortunately, organizations like the Data Visualization Society (DVS) provide a great place to connect with other people, and many even provide structured activities and events to encourage collaboration and pairing up. The following tips might help to turn those opportunities into something more concrete:

  • Be visible. People can’t want to work with you if they don’t know that you exist. Lurking might help you feel close to the people who are speaking, but they have no way of even knowing that you’re there. Make sure that you’re actively participating in shared spaces to make it easier for others to reach out. 
  • Build relationships first. Sometimes you find the perfect person just as you start up a new project, but most collaborations happen between people who are already at least acquainted, if not friends. Collaboration is a lot of commitment, and it’s easier to work with someone you know well and can trust to keep up their side of the bargain. Keep this in mind, and try to connect and establish relationships with people before you ask to collaborate with them.
  • Focus on similar interests and approaches, or the unique aspects of your work. There are lots of personalities and styles out there; you want someone who matches your mindset and commitment level, or whose work complements yours (either in style or skillset). Never underestimate how important basic compatibility will be for ensuring that your collaboration makes it to completion. 
  • Take advantage of formal structures. There are tons of mentoring, collaboration and networking opportunities, and many are organized specifically around the idea of finding collaborators. Conferences, in-person meetups and online communities are all good ways of finding people to work with. Joining a committee or volunteering for a project are also great ways to meet folks with a common goal. Challenges, competitions and awards events are a great way to get your name out there and show people what you can do.
  • Think about how you present yourself. Whatever your method of searching, it’s important to pay attention to how you show up. Other people may not know you or your work; what can you do or say to help them see that you’re a good match? Is there a quick and memorable way to help them connect with your work?
  • Expect it to take time. Good collaborations very rarely spring up overnight. You should work on building (and maintaining) relationships constantly, because you never know when something will come up. If you wait until you want to start a project to think about meeting people, you might find that it takes quite a while to find just the right partner. 
  • You’ll probably have to ask first. Multiple times. The world is a busy place, and there are always a million opportunities competing for our attention. Even when you find someone who’s a great match, it may not be a good time or the right project for them. If you treat “no” as a failure or a reason to get discouraged, you probably won’t keep at it long enough to find the right person. Don’t play it cool and detached on this one; put yourself out there and show that you’re open to collaborating. Making it easier for other people to find you will increase your chances of success. 

Asking for a collaboration

Let’s say you’ve identified a collaborator and you’re super excited, but you’re not sure how to approach them about it. What should you do? 

  • Be honest about your expectations, and your limits. A clear ask is often a critical ingredient in a good collaboration. Be up front about what you do and do not want from this person, and be thoughtful about outlining your own contributions and time. Honesty at the beginning helps you both to decide if this is likely to work out. 
  • Identify a reason that you want to work together. Remember that most people have a lot of opportunities for projects and collaboration. What’s interesting about this one, specifically? Help them to understand what they would bring that you need, and why you are interested in working with them.
  • Agree to specific goals or outcomes for your project. It’s always a good idea to set some specific objectives at the beginning, to make your expectations more concrete. Approaching someone in the focus stage of the project means that you have a lot more clarity around what you’re trying to do, and makes it easier for someone to tell if they can (and want to) help. 

Pitfalls to avoid

There are also many ways that a collaboration can go wrong. Most of these come down to mismatched expectations, whether that’s around work ethic, project goals and timelines, degree of involvement, or the specific tasks that people take on. Before you embark on a collaborative venture, it is a good idea to pause and talk with your collaborator(s) explicitly about what your collaboration is not. It’s easy to focus on what you will do in a new relationship, but it’s equally important to set expectations for what you’re looking to avoid.

You don’t need to overdo it with rigid rules, but even a brief conversation can help to set the stage and provide a starting point for clear communication later on. Here are a few things that might trip you up:

  • Don’t make others responsible for your project. A lot of people look for collaboration buddies to help them “stay accountable” or to “keep them motivated.” This is usually a huge warning sign that you are not taking ownership for the success of your own project and management of your own needs. Shared projects are great for motivation, but finding a collaborator does not mean outsourcing the hard stuff to someone else, or counting on them to carry you through. 
  • Collaboration is not a way to get free help or work from a professional. Lots of people know how to do what you want to get done, and they probably get paid to do it. Unless there is a clear professional benefit to working with you, don’t expect someone to sign on for something where they’d usually get paid.
  • This is not a fun-only zone. Most collaboration involves real work, and there are going to be times when you don’t feel like doing it. In some cases, a project might even involve disagreement and conflict, especially when you’re getting things set up. It won’t always be easy and it won’t always be fun, but a good collaborator shows up—even when it’s hard. If you or your collaborator are only conditionally committed (“I’ll do it if I feel like it or if it works for me”), make sure that you are clear about that up front and that you’re both ok with it before you proceed. 

Focusing on the survey project

After going through the focusing exercises for my survey data project, it became really clear that I could benefit from a collaborator who knew something about R. (Pro tip: any “focus” list that begins with “learn a new piece of software from scratch” is more than a bit suspect.) Fortunately, working on a project for the Data Visualization Society gave me an opportunity to find a collaborator who had exactly the skills that I lacked, and who was interested in working on a project that would help to move the organization along. When I started publishing articles and talking about the work that we were doing, Jenn Schilling followed up about joining my committee to help out. Jenn came into the project with extensive R knowledge, and was interested in exploring the survey data and getting more involved with DVS.

What we each brought to the project:

  • Erica:
    • Early data exploration, experience working with data and familiarity with the DVS survey, connection to other DVS initiatives.
    • Strategic guidance on which analyses to pursue, curation of final project outcomes and goals. Focus, brainstorming, and general process support.
    • Advanced design software skills for cleaning up and presenting final work.
  • Jenn:
    • Advanced R knowledge and quick data shaping skills.
    • Deep knowledge of data analysis and handling methods, which opened up new opportunities for how we think about the analysis and goals.
    • Advanced visualization skills in R for creating complex visualizations quickly.

Benefits of collaboration:

  • To the project:
    • Faster analysis, deeper insights. Because we had more advanced R skills on the team, we were able to go a lot farther in extracting insights from the data than I ever would have on my own. 
    • Second opinion on tough calls. There were several times where we had to make tough decisions about what to cut and what to keep, or how to visualize the data. Having a second set of eyes and another expert to discuss with made it easier to choose the right approach. 
    • Tighter focus. It’s easy to convince yourself to go off on a crazy tangent, but harder to spend someone else’s time on something that might not pan out. That extra accountability helped us to keep things tightly focused and on track, even when there were lots of interesting side paths that we could have explored. 
  • To Erica: 
    • Learning R by example. It is always helpful to learn code by example. Watching the analysis develop live (and spending hours taking it apart, trying to replicate and understand it myself) taught me a lot about how to approach problems in this language. I learned far more, and much faster, than if I’d relied on tutorials alone. It also meant that the project’s progress wasn’t held back by my coding speed or my grasp of the language.
    • Fun to work with someone. It’s always nice when you are in a collaboration that just works. It’s really fun to balance the workload with someone else, and to see things develop faster as a result. We developed a really good cadence for handing work back and forth, and it kept us focused on delivery and getting things done. 
  • To Jenn:
    • Learning design by example. I really enjoy collaboration because everyone thinks and approaches data in slightly different ways, and I learned a lot from the way Erica created prototypes of visualizations and iterated through the design of the report. I have more experience generating dashboards and one-off visualizations than with design and comprehensive reports. So, I benefited from experiencing Erica’s design skills as we worked together.
    • Fun to work with and learn from someone. Our complementary skillsets taught me a lot—I learned more about working with design software and the design process overall from Erica. As Erica mentions above, we had a great rhythm for our work as we passed ideas and work back and forth. It was also a great opportunity to get to know each other and develop a friendship through working together. 
    • Increased involvement in DVS. I wrote my first Nightingale article as a result of our collaboration, which will also produce the first Career Portraits report! Along the way, I got to meet and interact with other members of the DVS leadership and community. It’s been great to get more involved with the organization!

Hopefully we’ve convinced you to give collaboration a try. Check out upcoming DVS events for opportunities to connect…you might make a new friend! 


Previous articles in this series:

Embrace the Challenge to Beat Imposter Syndrome
Step 1 in the Data Exploration Journey: Getting to Know Your Data
Step 2 in the Data Exploration Journey: Going Deeper into the Analysis
Step 3 in the Data Exploration Journey: Productive Tangents
Step 4 in the Data Exploration Journey: Knowing When to Stop

The post Step 5 in the Data Exploration Journey: The Magic of Collaboration appeared first on Nightingale.

]]>
Step 4 in the Data Exploration Journey: Knowing When to Stop https://nightingaledvs.com/step-4-in-the-data-exploration-journey-knowing-when-to-stop/ Thu, 08 Dec 2022 14:00:00 +0000 https://dvsnightingstg.wpenginepowered.com/?p=14254 This article is part V in a series on data exploration, and the common struggles that we all face when trying to learn something new...

The post Step 4 in the Data Exploration Journey: Knowing When to Stop appeared first on Nightingale.

]]>
This article is part V in a series on data exploration, and the common struggles that we all face when trying to learn something new. A list of previous entries can be found at the end of the article. I’m exploring the tools data from the State of the Industry Survey, to illustrate both how I approach a new project, and the fact that no “expert” is immune from the challenges and setbacks of learning. In addition to working with a new dataset, I am also using this project to take my first steps toward learning R. Let’s see where this journey takes us!


The previous article left us wandering around in productive tangents, exploring all of the remote corners of a problem before settling in to get to work. This is a good, productive space, but if it goes on too long, it will hurt your focus, your motivation, and your project. 

Someone asked me recently how I know when to stop. I had to laugh, because I’m not at all sure that I do. (In fact, I would say that there are several signals in my life indicating that I don’t! Whether you consider this a good or a bad thing is a matter of taste.) Ultimately, setting the limits for your exploration is a combination of experience, trusting your instincts, and knowing how to manage your timelines and energy levels. The amount of exploration that I do is always a balance between my interest, my motivation, the time available for the project, and the core destination that I’m trying to reach.

After several articles in the expand phase of this project, it’s time to start moving into the focus phase. This phase is all about pruning back, cutting down, and simplifying to get a clear picture of where you want to go.

Knowing when to stop

Precisely when and how to stop is possibly the hardest question in any exploratory adventure. You never know what will turn out to be wasted effort, or what amazing discoveries lie just around the next bend in the road. Here are some heuristics that I use to decide when it’s time to pull back or narrow down: 

  • When it looks like a dead end. If you’re pretty sure that a tangent is leading nowhere, there is little reason to pursue it. Pick one that looks promising instead.
  • When you’ve understood what you needed to see. At this stage of a project, I am focused on identifying the structure of the problem and the core elements of the data rather than finding a particular answer or solution. Clinging to a need for completeness will only bog you down in the early discovery period. This is about sketching, not finalizing a masterpiece. If you’ve gotten what you need, move on. 
  • When it starts to feel overwhelming. This is usually a sign that your energy budget is running low. You need to refocus, change modes of exploration to exercise a different “muscle,” or take a break.
  • When you start to lose focus. If I’m losing track of why I got into this thing in the first place, then it’s probably time to pull out. This can be a hard one to call, because there’s a fine line between losing focus and stepping out of your rut to see things in a different way. My personality tends to lean more toward discipline, focus, and clear goals, so I often make a conscious choice to indulge here and encourage myself to stretch into a less familiar space (for a short time, with a clear stopping point). If I’m losing my connection to the core purpose of the problem, then it’s a good time to re-evaluate.
  • When the threads start to dissipate, rather than converge. Judging this one is tricky, and it’s ultimately a matter of intuition and experience. I will follow a lot of tangents for a step or two, but if it’s clear that they’re heading off into the wilderness and not toward my core goals, I’ll step back and redefine. 

Letting go

Sometimes, the hardest part of stopping is that you don’t really want to let go. If fear of stopping is your challenge, it can be helpful to remember a few things:

  • You can always come back. You’ve taken good notes and left a trail, so there’s no reason that you can’t pick this up again later. Let go of the false urgency that demands that things must be done right now. 
  • If you’ve learned something, no effort is wasted. Sometimes we get so attached to the time we’ve put into a project that we feel like we can’t walk away. This is so common that there’s a name for it: the sunk cost fallacy. There are shelves full of books on decision making about how fear of cutting losses leads people to make bad choices. You don’t need to fall into that trap. Take what you’ve learned, accept what you’ve already paid, and choose not to spend your life throwing good time after bad. 
  • Thoroughness isn’t always commendable. Many of us have succeeded in life because we work hard, we don’t give up, and we always do a complete and thorough job. Those are all good traits to have, but if you invest too much of your identity in those metrics, sometimes they work against you. There are times when it pays to be thorough, and there are times when it’s downright silly. Not finishing is actually the smarter choice when it increases your chances of getting where you need to go.

Switching into focus

The key feature of the focus phase is that we want to narrow things down, not open them up. That can sometimes be painful, but it can also be freeing. For some people, the expand phase is the hard one, and getting to focus feels like a relief. I really enjoy the expand phase, but I also appreciate the clarity of focus. I think of it as an opportunity to put down all of the options I’m carrying, so that I can invest all of my time and attention into a single path. Instead of asking “what do I need to do?” the focus phase is all about asking “what do I need to do first?”

Here are a few of the steps I take to switch from the exploratory stage into focus mode. 

  • Stop, and re-evaluate. You’ve learned some things and uncovered some intriguing potential. The world is full of possibilities. Now it’s time to ask: what am I really trying to do? Be ruthless when assessing what is truly necessary to achieve your goals. 
  • Compile key information. I will often create a list of things to include, things to leave behind, and things to come back to some other time. This can also help you to identify whether there is more information that you need. 
  • Identify the steps needed. The focus phase can be a long haul, and it’s important to be able to see your progress along the way. What needs to happen for this to succeed? How will you know that you’re making progress? How will you know when you’ve arrived?
  • Build a plan. Are there things that have to happen first? Which ones are the most interesting? Which are the most fun? Which tasks are likely to present a challenge? Don’t forget to budget your energy, too: that’s where you’ll find the stamina for the long haul. Make sure to plan consistent “energy snacks” along the way. 
  • Restrict your scope. Pick something small to start with, and do the work. Let small successes build on one another, rather than tackling the whole thing at once. 

Focusing on the survey project

As a first step, I went back through all of my notes and files and did a quick review. I made a mental list of the dead ends, good ideas, and new things that I wanted to explore. Then I looked carefully at what would make a good candidate to do first. Jumping straight into the complex analysis would be a mistake (I’d have no way to know if it was even close to right, and I’d just get frustrated and lost before I’d even begun). Instead, I focused on making a simple chart first, just trying to figure out the basic steps needed to get things going. 

List of objectives

  • Learning R: Build out a simple chart, understand the mechanics of basic data manipulations.
  • Tools analysis: Regroup on all of the different analyses, and identify one that’s a good candidate to start. 
  • Project strategy: Learn enough to create a plan for pursuing the more complicated analyses.

First step: Build out a simple chart

  • Load the dataset from .csv.
  • Clean the data, removing NAs and other items that I don’t want to count in the final piece.
  • Work on information from a single column first; the tools analysis requires cross-column manipulation, but that’s a lot harder to figure out. Start with the basics.
  • Repeat an analysis that I’ve already done in Excel. Replicating a working example allows me to check the numbers and retrace my steps. 
  • Figure out how to aggregate data in R, and learn how to manipulate the data object to do what I want. 
  • Format the data for use with a charting library.
  • Create the chart! 
  • Apply styles and understand how to tweak the display.

Notice that most of these items are very small, concrete steps, and it will be easy to tell if I’ve done them. But then there are other items (anything that starts with “figure out”) that are much less defined. Those items are outside of my experience and off the edges of my current map. I will need to push harder to understand what those tasks mean, and each one will probably require its own expand phase to learn what I need to know. I’ll want to keep those explorations tight and focused to avoid getting pulled off track, but it’s good to realize up front that I’m going to need them, because that helps me set realistic expectations for how long this task will take. It’s easy to look at a focus list and assume that it will all be easy, but it’s important to take stock of the unknowns and the risks, too. Setting reasonable goals at this stage is a big part of getting successfully to the end of the project without giving in to frustration or burning out. 
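
To make the first-chart plan concrete, here is a minimal sketch in R using the dplyr and ggplot2 packages. Everything in it is hypothetical: the `tool_primary` column name and the inline stand-in data are made up for illustration, not taken from the actual survey.

```r
library(dplyr)
library(ggplot2)

# In the real project this step would be read.csv("<survey file>.csv");
# a tiny made-up data frame keeps the sketch self-contained.
survey <- data.frame(
  tool_primary = c("R", "Excel", "Tableau", "R", NA, "Excel", "R")
)

# Clean (drop NAs), then aggregate: count responses per tool
counts <- survey %>%
  filter(!is.na(tool_primary)) %>%
  count(tool_primary, sort = TRUE)

# Create the chart, then tweak the display
ggplot(counts, aes(x = reorder(tool_primary, n), y = n)) +
  geom_col() +
  coord_flip() +                  # horizontal bars read better for tool names
  labs(x = NULL, y = "Responses") +
  theme_minimal()
```

Replicating an analysis already done in Excel against a small known input like this makes it easy to check the numbers at each step.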

Once your plan is identified, the next step is simply to get to work. I usually find the clarity of this stage quite exciting, and really enjoy the simplicity and directed activity that the focus phase creates, especially after doing so much wandering during the early stages of the exploration. Sometimes it’s hard to make the important cuts, but it helps to know that they’re necessary if you want to keep your project on track. 


Previous articles in this series:

Embrace the Challenge to Beat Imposter Syndrome

Step 1 in the Data Exploration Journey: Getting to Know Your Data

Step 2 in the Data Exploration Journey: Going Deeper into the Analysis

Step 3 in the Data Exploration Journey: Productive Tangents

The post Step 4 in the Data Exploration Journey: Knowing When to Stop appeared first on Nightingale.

]]>
Step 3 in the Data Exploration Journey: Productive Tangents https://nightingaledvs.com/step-3-in-the-data-exploration-journey-productive-tangents/ Tue, 16 Aug 2022 13:00:00 +0000 https://dvsnightingstg.wpenginepowered.com/?p=12264 This article is part IV in a series on data exploration, and the common struggles that we all face when trying to learn something new...

The post Step 3 in the Data Exploration Journey: Productive Tangents appeared first on Nightingale.

]]>
This article is part IV in a series on data exploration, and the common struggles that we all face when trying to learn something new. The previous articles can be found here, here, and here. I’m exploring the tools data from the State of the Industry Survey, to illustrate both how I approach a new project, and the fact that no “expert” is immune from the challenges and setbacks of learning. In addition to working with a new dataset, I am also using this project to take my first steps toward learning R. Let’s see where this journey takes us!

Last time, we got deeper into the details of the survey data to better define the kinds of questions it might be able to answer. At this point, I’ve exhausted most of the basic analyses in Excel, and I’m pretty sure that a more complex analysis (and possibly a more powerful tool) is necessary. I’m not sure yet how I want to structure that approach, but spending more time on the same methods is not likely to create additional progress at this point. I am getting ready to switch back into focus mode and engage with the harder problem of doing the analysis in an unfamiliar tool, but my instinct is telling me that there is more to see here before I move on.

The risk in this stage of a project is that I’ll stay in expand mode for too long, and my focus will collapse like over-proofed dough.

Credit: Reddit

At this point, I need to make a judgment call about whether to keep expanding or switch into focus mode. For this project, I am indulging my interest in exploration a lot more than I would for something at work or on a strict deadline. This is an early-stage, conceptual exploration of a much larger project, and it’s worth investing some time in the big picture to help inform where we’re going. The tools analysis is actually outside of the scope of my core project deliverables, but I’m using this side exploration as a way to spend more time on a volunteer project, while also building some professional skills (understanding analysis in R). 

When to follow a tangent

  • When it feels exciting. There is a certain kind of excitement that comes from figuring things out. If the tangent is drawing you in, it might be worth a look. It’s important that this is not just about “shiny.” There should be at least a reasonably good chance that this tangent holds the key to understanding something larger. If your challenge is having too many distractions (mine isn’t), then you might want to be stricter about how you define this. 
  • When I’m blocked and want an interrupt or new perspective. Tangents can be a great way to keep moving without burning out. Sometimes an hour exploring a new idea is exactly the break that you need. 
  • When there is something that I will learn from doing it. Sometimes it’s worth pushing a little longer or a little harder if it will teach you an important skill, or if you will learn something about the project from following your nose. 
  • When I can leverage the tangent to keep me engaged. My most productive progress often happens when I’m deeply engaged with the project, but not actively thinking about it. Sometimes I take a walk and think about the problem at a higher level. Sometimes I read a book that I’m interested in that is related to my topic in some indirect way. Sometimes I try a different method or tool to see what perspective that brings.
  • When my subconscious creates an unexpected or improbable connection. Every once in a while, my brain tosses out a connection or an analogy that feels intriguing and important, but that also makes no sense. It’s worth observing how you describe the problem to yourself, and the references you use when deep in the throes of a problem: they are often windows into connections that you haven’t seen yet. 

In this phase of the project, I trust what my intuition tells me, unless I have reason not to. I have spent years honing my mind to solve problems. When my experience suggests that there’s something here to look at, I’ll trust its instructions, even if the logical part of my brain doesn’t necessarily agree. It’s important to validate your other ways of knowing, and to observe what you learn. If you are uncomfortable with this, the following tips might help.

Creating a safe space for exploration

Part of developing experience is creating a safe space for exploration. It would be a mistake to do your first hike as a solo backcountry trip in Denali National Park in Alaska. That’s not only foolish, but dangerous. If you want to learn how to orient yourself alone in the woods, pick a small state park with roads on all sides, and preferably one without dangerous animals in it. Similarly, it can be helpful to create some general parameters to define your tangent before going in:

  • Work for five minutes, or an hour
  • Pick a small problem, and start with that. (The tools analysis has plenty in it, but it’s actually just a small piece of the bigger project.)
  • Give yourself time
  • Don’t commit to deliverables
  • Have a backup plan, in case you come up empty
  • Don’t stake your professional reputation, your self esteem, your ability to deliver, or your social standing on getting it right
  • If it helps you, don’t tell anyone until it’s ready (but don’t keep it to yourself so long that you smother it, either)
  • Set low expectations
  • Embrace the process, rather than craving the achievement. Let the process become the achievement.
  • Pay attention to how you frame things. Sometimes we freak out about being “hopelessly lost” in the woods, when someone else would simply say that we took a wrong turn or that we’re on an adventure. The difference is perspective…and having smart protections in place.  

Coming back to the survey data

While working through the analysis, I kept finding myself reaching for metaphors that create temporary states out of an underlying structure: gathers in fabric, notes on a piano, weaving cloth. I did several freewriting sessions where I tried to describe the problem, just talking myself through what I was trying to solve. I wasn’t looking for answers here, just insight into what connections I needed to make, and what they might mean.

I spent a couple of hours toying with an embroidery project, using patterned gathers (selections) to create structures.

I thought about networks, and connecting individual data points using beads. Networks also create the intriguing possibility of abandoning the grid entirely, though I kept it in this sketch.

Adding beads to the gathers also raises questions about intersecting layers of data. Connecting data points from different directions could highlight additional features of the structure. Each combination of strings will gather the fabric in entirely different ways.

I also played around with constructing the fabric itself, planning out a weaving project as a way to think through how different variables interact. I had planned to do a weaving project anyway, but selected this particular pattern because it would help me to stay engaged with the survey work, while also creating a gift for my aunt. This is an example of converting a project in a different part of my life into a tangent connected to this project, so that they worked together to move me ahead.

Weaving is a great way of thinking through sequences of structured patterns, and working through this project in a familiar medium helped me to get a better grasp on how to approach code in R. R favors a functional style, which means that you take an input (a data table) and apply a series of operations and transformations to it in sequence to get to a result. This is very different from the object-oriented programming that I’m used to in JavaScript/D3, but it’s actually quite similar to what you do at the loom: one thread passes through multiple patterned structures to create the specific combination required to form a “shed,” and then a new thread is added to the cloth to lock the pattern in place. Drawing up a weaving draft is itself an exercise in visualization, and it also helped me to think through the project as a sequence of steps (functions) applied to an individual thread (data row).
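
A tiny illustration of that style, using R’s built-in `airquality` dataset and the dplyr pipe (nothing to do with the survey data; it just shows one value flowing through a sequence of functions, like a thread passing through successive pattern structures):

```r
library(dplyr)

# Each function takes the result of the previous one and returns a new
# value; the original data frame is never modified in place.
airquality %>%
  filter(!is.na(Ozone)) %>%             # drop missing readings
  group_by(Month) %>%                   # set up a grouping structure
  summarise(avg_ozone = mean(Ozone))    # reduce each group to one row
```

Reordering or swapping the functions in the pipeline produces a different result from the same input, much as applying the loom’s structures in a different sequence weaves a different pattern.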

The most interesting insight (to me) from this exercise is that the fabric itself is actually a record of the sequence of analyses applied, rather than the data points themselves. You can weave many patterns from a single set of structures, just by applying your functions in a different sequence. Working through this project with physical materials gave me a concrete way to think about the features that I was struggling to understand in the more abstract world of code.

I also started to extend the weaving metaphor to think about what it might look like if I tried to actually convert this dataset to a woven piece. It’s highly unlikely that I will actually do this, but working from a language that I understood helped me to see what relationships I was trying to create in the dataset, and helped me to find some contradictions that I hadn’t resolved. 

It’s also worth noting that I am leveraging materials and techniques that have long been dismissed as “hobbies” or “women’s work.” Respecting these other ways of knowing often creates insights that you might not otherwise see, and can be an important advantage when you’re looking for a creative solution. It’s easy to dismiss other experiences or approaches as “not really data vis,” but if you have a tool in your toolbox, there’s no reason that you can’t apply it to solve the problem at hand. These different perspectives often contain the insights that will help you get to a better solution. There’s no need to restrict yourself to methods that others understand or value when you are exploring to meet your own needs. 

For all of these reasons, I consider tangents like this to be a highly productive diversion from the core work of my project. They did take me a little bit outside of the bounds of what I needed to do, but they only took a few hours, and they helped me to enjoy the process and stay with the expansion a little longer while I worked things out. I also accomplished some useful things:

  • Trained my intuition. This project is less about the fabric or the folds themselves, and more about the structured threads that create the patterns that I want. I suspect that my final analysis of the data will be more about patterns created by capturing temporary states, and less about drawing out the data itself. I’m not sure precisely what this means yet, but I know that it’s important. This exploration has made it more likely that I’ll recognize what I need when I find it. 
  • Built a dictionary of metaphors. Different ways of structuring the analysis give me different hooks to use when looking for patterns. Threads create one kind of gather, beads create another, and weaving uses the “gathers” in threads on the loom as a method to build the cloth itself. These different metaphors help me to look at the problem in different ways, and I can use them to think through the problem as I get deeper into the weeds. 
  • Indulged my curiosity. Curiosity is one of the most powerful motivators. It’s easy to forget, but nurturing your sense of discovery is one of the best ways to stay excited about a project and avoid burnout. It never hurts to put a little wind in your sails before settling in for the long haul. 
  • Took a rest and built stamina. My number one reason for using tangents is to stay engaged with a project while doing something else. A full break is rarely productive for me. It’s too easy to forget where I was and what I was doing, and it makes a huge barrier to entry when I want to come back into the project. Switching gears and using a different part of my brain helps me to keep active, excited, and in touch with the project, while also creating a space where I can rest a bit before digging into the real work that lies ahead. 

The post Step 3 in the Data Exploration Journey: Productive Tangents appeared first on Nightingale.

Step 2 in the Data Exploration Journey: Going Deeper into the Analysis https://nightingaledvs.com/step-2-in-the-data-exploration-journey-going-deeper-into-the-analysis/ Tue, 03 May 2022 13:00:00 +0000 https://dvsnightingstg.wpenginepowered.com/?p=11109 This article is part III in a series on data exploration, and the common struggles that we all face when trying to learn something new...

The post Step 2 in the Data Exploration Journey: Going Deeper into the Analysis appeared first on Nightingale.

This article is part III in a series on data exploration, and the common struggles that we all face when trying to learn something new. The previous articles can be found here and here. I’m exploring the tools data from the State of the Industry Survey, to illustrate both how I approach a new project, and the fact that no “expert” is immune from the challenges and setbacks of learning. In addition to working with a new dataset, I am also using this project to take my first steps toward learning R. Let’s see where this journey takes us!


The last article focused on the process of getting oriented; that first, heady discovery stage where everything is new and it’s easy to skim through the simple questions to find something interesting. It’s a high-potential space where everything seems possible, in part because you have not really begun to engage with the reality of the data. But good visualization leads to deeper questions, and deeper analysis takes more consistent effort. Let’s take a closer look at where we are on the double diamond. 

At this point, we’re starting to enter the higher-effort engagement stage. This often requires you to push edges and explore new space, and it’s where you’ll need to stretch your skills and abilities. If the first phase of the journey was a gentle stroll across an open field, this is where you start to climb your first major hill. You’ll need to pay a bit more attention to your energy levels here, and you can expect to start feeling the workout. 

The engagement stage is also where the potential that felt so exciting in the discovery period starts to feel heavy. If you’re not careful, the weight of your own expectations can really start to slow you down, and unmanaged fatigue can create an opening for imposter syndrome to slip in. Staying deep in the engagement phase for too long or not pacing your work properly can lead to burnout or failure to complete your project.

Fortunately, there are several ways to prevent that from happening. There’s always a window of time where you have an opportunity to intervene before things go off the rails. Experts have more experience recognizing and managing their energy to prevent injury; they respond sooner, and may not even notice that they’re skirting the edge of fatigue, because they’re already doing what they need to do to care for themselves along the way. Developing experience is about learning to recognize when you’re getting tired, and treating that as an opportunity to adapt rather than pushing until you just can’t go on. Remember that energy management is a sign of intelligence, not weakness: experts do it constantly, because it reduces the likelihood of injury and increases your chances of success.

We’ve looked at this image before. As a beginner, it can be tempting to shoot straight for your destination and to take the most direct path to get to your goal. It seems like that must be the fastest way, and that the real process is inefficient and undesirable. Like measuring the crow’s flight distance on a map, this approach assumes there are no obstacles and fails to account for the real terrain of the problem. The two-dimensional picture of the design process is really just a simplistic projection of a much bigger, multidimensional space. 

If you think about your task in the context of climbing a mountain, you can immediately see why someone might turn back or loop around to get where they’re going. The squiggly-line drawing isn’t nearly as senseless and inefficient as it might seem. People outside of the process may not be able to see the challenges of the terrain, but an expert knows enough to account for them if they want to reach their destination in one piece. 

From this perspective, success becomes a matter of energy management: your job is to balance your effort and reward curves in ways that allow you to stay deep in the engagement phase for the longest period of time, and to make sure that you’re continuing to make strategic progress toward your goal. Ignoring fatigue is usually not helpful: if your brain can’t count on you to rest when you need to, it will simply make the choice for you, and without your consent. This often comes in the form of distraction, loss of motivation, or imposter syndrome. In my experience, those are usually signs of poor energy management rather than weakness or lack of commitment. 

One way to rest is to simply stop and do something else, but there are nearly infinite ways to stay engaged with a project while taking a break. Getting overwhelmed? Step into the focus and consolidation phase to simplify and regroup, before you burn out (timing is key here). Feeling like you’re not making any progress? Step back into discovery for a while and figure out some quick wins. Are you getting tired but not quite ready to stop? Level off and work at something straightforward for a while to maintain your pace, rather than trying to sprint straight uphill. 

Tips for energy management 

Here are a few guidelines that help me to stay productive during the harder work of the engagement phase. 

  • Stop before you’re tired. The best way to avoid getting slowed down by an injury is to prevent it. Don’t wait until you’re exhausted to act. 
  • Pay attention to how you’re feeling about a project. If your motivation is slipping or you’re starting to feel overwhelmed, that’s a sign that you need to back off and do things differently for a while.
  • Switch tasks often, even within the same project. Every sub-task requires different strengths. Switching them up is like cross-training for your brain. You may need to manage this to make sure you’re still making progress, but switching gears can be a very healthy thing. If distraction is a challenge for you, make a game of seeing how long you can stay happily excited about working on the same thing by focusing on different pieces or tasks. The key word there is “game.” If it stops being fun, see the first bullet. 
  • Don’t beat yourself up. I’ve never seen someone become less tired because someone yelled at them. If you’ve missed the signals and things have devolved into exhaustion, the best you can do is to treat the problem at its source. Self-flagellation is counterproductive and a waste of time: don’t do it. Try something to make your exhausted brain feel safe and cared for, instead. You are responsible for setting realistic expectations and managing frustration in healthy ways. There’s no need to take it out on your creative self if you’re disappointed or frustrated with how much you can do. 
  • If you get stuck, step back and try a different direction. You’ll likely end up circling back around to the same underlying questions again and again: that’s ok. In fact, that’s useful information. Running into the same roadblocks is often good confirmation that there’s a core piece of the puzzle here that you need to solve. It may be that there’s a limitation in your data or that you need to change your method, but banging up against the same dead ends in different ways is often an indicator that your most interesting insight lies behind those stubborn walls. I often imagine myself pacing around the perimeter of the problem, trying to find a way in from different angles: the more times I end up back in the same place, the more likely it is that it’s the key to figuring out where I want to go (or to changing my mindset, so that I can see a better way). Pay attention to those centers of gravity that pull you back in, and then get very clear on what’s blocking you from reaching them: those hints are usually the ones that point to what you need to see.

Back to the survey data: 

So, what does this look like in practice? Let’s get back to our exploration of the survey tools data and take a look. 

Last time, I made a list of all the interesting questions that were relatively easy to look at, and took notes about all of the more complicated things that I’d return to later. In reality, I do a mix of simple and complex analyses in almost all stages of a project (task switching!). I don’t want to spend too much time on side tangents, though, so I’m always monitoring to see if I’m getting in too deep, and I will pull out if an analysis is starting to bog me down. In one sense, the discovery phase is unlike just about any other area of my life: I will almost always err on the side of impatience to keep things moving along.

As I worked through the simple questions list from last time, there were several side analyses that diverted my attention. These were slightly more complex pivot tables or required more steps to massage the data into shape. This is the time to dig deeper into a few of those problems. I worked on this project over multiple weeks, so I might have spent a few hours on a harder task, switched gears and played with an easier one, and then come back to focus later in the day, or whenever the next session happened to be. 

Ultimately, though, what I’m looking to do in this stage is to get deeper into the details and start looking at more complex relationships. Instead of individual values or individual columns, I’m starting to look for correlations between columns, or derivative data that might help to give me some insight into the problem at hand. 

A deeper look at the data

For this stage of the project, I wanted to understand more about the tools that people use together. This is moving toward my initial instinct that I’d need a clustering analysis of some kind, but I wanted to see what I could manage with simple methods first. I used this set of explorations to experiment with simple group-by functions in R, and then shifted back into Excel to get a better understanding of the results. As we’ll discover, it turned out that the R analysis was actually counterproductive when working with pivot tables, so I switched back to the flat dataset halfway through and tried again. I understood from the beginning that there was an awkwardness with how the original table was formatted, but hadn’t yet figured out how I wanted to structure the data. Seeing precisely why the group-by approach couldn’t work was part of figuring out what I needed to do next. Sometimes the missteps are a necessary part of seeing how to move on. 

In terms of the more focused questions that I was asking during this phase, I picked up where we left off in the previous article. Rather than analyzing for one tool at a time, I started with simple pairwise analysis (comparing counts of X tool vs. Y), and then moved toward a more complicated row-level analysis to look at groups of tools for individual people in the dataset. 

Before we look at the charts, I want to emphasize again that I’m in the “expand” stage of this project. I am putting up with a lot of manual work right now to avoid getting tangled up in learning R and exploring at the same time. If you know how to use more advanced analysis tools, my repetitive bumbling here will probably put your teeth on edge. I know there’s a more efficient way: I’m simply not there yet. And that’s okay with me! I’m simply laying the groundwork so that I can move faster when I do start working with the new tool. Also, please remember that the data in this analysis is intentionally incomplete. None of these charts contain reliable values and there is no useful information to interpret from them; these are all just sketches to help me think through the analysis that I want to do. As such, I rely heavily on screenshots and notes, and am not concerned with axis labels, specific data values, or other details, because I know them to be meaningless. With that, let’s explore some questions! 

What tools are paired with X, and how often?

Because the responses for each tool are stored in a separate column, it is relatively easy to count up the number of people who use X, and the number of people who also use Y. Looking at the results in a stacked bar gave me a solid bar for the reference tool (each of the ~125 users who use ArcGIS uses ArcGIS), and then a pair of bars for each other tool in the group (10 percent of people who use Excel also use ArcGIS). 
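The author did this counting with pivot tables in Excel; as a rough illustration of the same logic (hypothetical respondents, and Python rather than the Excel/R used in the project), a pairwise co-occurrence count over one-column-per-tool data might look like:

```python
from itertools import combinations

# Hypothetical rows: one column per tool, truthy if the respondent uses it,
# mirroring the survey's one-column-per-tool layout.
rows = [
    {"Excel": 1, "Tableau": 1, "ArcGIS": 0},
    {"Excel": 1, "Tableau": 0, "ArcGIS": 1},
    {"Excel": 1, "Tableau": 1, "ArcGIS": 1},
]

def pair_counts(rows):
    """Count, for every tool pair (X, Y), how many respondents use both."""
    counts = {}
    for row in rows:
        used = sorted(tool for tool, v in row.items() if v)
        for pair in combinations(used, 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts

print(pair_counts(rows))
# ('Excel', 'Tableau') -> 2, ('ArcGIS', 'Excel') -> 2, ('ArcGIS', 'Tableau') -> 1
```

The same table of pair counts can feed either the stacked bars or the correlation grid described next.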

Because I’m using the overall response counts as the total for each stack, it’s easy to see which tools are more popular. It does look like there might be some interesting patterns to tease out of the tool pairs, but it doesn’t make sense to get too detailed with those before firming up the data. (Remember from the last article that we don’t even have all of the tools in here yet!)

Looking at this as a correlation grid helps me to see hotspots that might indicate interesting pairs, though this will also be confounded by the relative population of users for each tool. For instance, (I’m making this example up) you might not see that 90 percent of QGIS users also use ArcGIS, because the population for both of those tools is completely swamped by Excel. 

I could change the picture by using percentages, but that adds other complications. It does seem like there are some groupings within the data, and some pairings are particularly strong. I also see strong variation in popularity (counts) for the different tools reflected in the correlation pattern. For now, it’s enough to sketch this out. I’ll come back to it later to refine and develop different views, if this analysis makes the final list. No sense in over-optimizing yet, since I’m not even sure that this is where I want to go. 

How often are tools X and Y paired together?

Next, I wanted to get a sense of frequency for the different tool pairs. This analysis was more complicated in Excel, and required a lot of manual rearrangement that I knew would be easier in R. I was still struggling with basic R manipulations and didn’t want to get bogged down with that just yet, and I didn’t want to spend a lot of time developing custom charts to reflect incomplete data values. For now, I decided to stick with just sketching out the concepts, instead. 

  • The first sketch looks at tool frequency within a particular overall tool count: of the people who use only one tool, how many use Excel? Tableau? Of the people who use two tools, how many use Excel? And so on. 
  • The next sketch selects a single reference tool (Excel), and looks at what other tools people use. How many people use Excel and Tableau, Excel and PowerBI, etc.?
  • The third sketch looks at the overall popularity of the different tools, and uses lines to show how often they are paired. It’s drawn with Excel as a reference point, as if it is the center of an interaction. The only difference between this and the previous version is that I’m explicitly encoding the connections between the reference tool and others in the group (in the previous version, the reference tool implicitly sets the context, but is not drawn).
  • If we assume there will always be a reference tool, we can move that into the center, and show other tools around the edge. We could scale their size by frequency of pairing, if we want to emphasize that aspect here.
  • I’m also interested in how this intersects with the number of tools people use overall. Adding radial groups to represent the total number of tools could help to tease apart how closely related two tools are. Here, the inner ring is Excel and one other tool, second is Excel and two other tools, etc. 
  • As soon as we get to more than two tools, we might need to start including loops to show where those categories overlap. If a person uses Excel and Tableau and Power BI, we might want to connect them with someone who uses that same group plus D3. Or, we might want to connect the Excel and Tableau group to anyone who also uses Tableau, regardless of what other tools are in the mix. There are lots of ways to define the connections, but the key thing to keep in mind is that we might want to see them.  
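The radial-rings idea in particular reduces to a simple aggregation: for everyone who uses a reference tool, count the other tools they use, bucketed by how many tools they use in total. A minimal sketch (hypothetical data, Python in place of the project’s Excel/R work):

```python
from collections import Counter

# Hypothetical respondent tool sets (unordered, as in the multiselect question).
people = [
    {"Excel"},
    {"Excel", "Tableau"},
    {"Excel", "Tableau", "Power BI"},
    {"Tableau", "D3"},
]

def co_use_by_group_size(people, reference="Excel"):
    """For respondents who use the reference tool, count the other tools
    they use, bucketed by total number of tools (the 'ring' index)."""
    buckets = {}
    for tools in people:
        if reference not in tools:
            continue
        ring = buckets.setdefault(len(tools), Counter())
        ring.update(tools - {reference})  # everything except the reference
    return buckets

print(co_use_by_group_size(people))
```

Here the inner ring (total of two tools) holds Excel-plus-one pairings, the next ring Excel-plus-two, and so on, matching the sketch described above.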

I’m pretty sure that none of these will become the final form for my project, but they were quick to draw, a fun break from the spreadsheets, and these sketches are enough to capture the basic ideas. This is an example of pacing in action; I’d run into a roadblock on the pivot table analysis, I didn’t want to put the energy and time into the harder R work, and so I did something quick and fun within my existing skill set instead. I can always come back to this later, and it gave me a short break before getting back to wrestling with the core problem that I was trying to solve.

Which tools are used alone, versus in sets?

By adding in a calculated column that counts the total number of tools per person, I can start to compare how much a tool is used overall, and how often it is the only tool or one of a small group. This doesn’t tell me about the importance of the tool within that group, but I would expect that core tools would be represented more often in small groups (and more broadly across all groups), where peripheral or more specialized tools would tend to show up only in larger groups. 
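The calculated column described above amounts to splitting each tool’s usage into “alone” versus “in a set.” A rough equivalent of that spreadsheet logic, sketched in Python over hypothetical respondents:

```python
from collections import Counter

# Hypothetical respondents, each represented by the set of tools they use.
people = [
    {"Excel"},
    {"Excel", "Tableau"},
    {"D3"},
    {"Excel", "Tableau", "D3"},
]

def solo_vs_group(people):
    """For each tool, count how often it is used alone vs. alongside others."""
    solo, grouped = Counter(), Counter()
    for tools in people:
        target = solo if len(tools) == 1 else grouped
        target.update(tools)
    return solo, grouped

solo, grouped = solo_vs_group(people)
print(solo)     # Excel and D3 each appear alone once
print(grouped)  # Excel and Tableau appear in sets twice, D3 once
```

Comparing the two counters per tool gives exactly the core-versus-peripheral signal the paragraph describes.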

You can see that I’m also keeping notes on the limitations of the analysis and details to follow up on, if I come back to this piece of the problem again later. I’ve had to rely on these notes multiple times when writing this article, which I’m composing more than three months after the initial exploration was done. You’ll never see things as clearly from the screenshot as you do when you’re in the thick of things, so make sure you leave a trail! 

Once I have a good thing going in the sketching phase, I like to push it until it breaks, just to see what happens. For this version, I broke the pivot out again, to look at the list of second tools and their groupings as well. Essentially, this pivot attempts to answer this question: for groups of two tools containing D3, what other tools are most often represented in that pair? 

That’s an interesting question, but unfortunately, I had another set of assumptions in this data that I didn’t catch right away. When I found it later, it completely invalidated these results. It was only a few minutes of experimentation with a pivot table, though, so no harm done. 

How many people use each lineage of tools? 

The problem came from an intermediate step that I took hoping to start doing row-wise comparisons in the data. The initial data table had one column per tool, as I mentioned before. That’s great if you just want to count totals per column, but I really wanted to understand groups of tools per person. It’s a lot easier to read that information in the spreadsheet if you collapse the empty cells. By removing the blanks, I had a much more readable list of first tool, second tool, etc. 

This approach has one major drawback, which is that it imposes a ranked order when there is none in the dataset. This is a multiselect question that accepts unordered responses, which means that my “tool 1,” “tool 2” designation is based on whatever order the original columns happen to be in. I really want the first column to be the most important or most frequently-used tool, but that information simply isn’t available. I understood that limitation when I made the table, but didn’t think through all of the implications until a few steps further on.

This is an instance where a different visual form can be helpful. I knew that the sequencing could cause me problems, but I was interested in creating some kind of attenuating tree diagram focused on the different tool groups, where longer branches would represent larger groups of tools, and the size and number of branching points would indicate popularity and diversity within groups. After aggregating up the common “branches” (shared tool sets), I started sketching them out in a multiple y plot. I wanted to look at the first tool, second tool, etc., in the list, so those are the individual y axes, and the tool list is on the left. My first two branches share Excel as the first tool and PowerBI as the second, but then they diverge for the third tool in the list. 

The problem with this data aggregation method became clear pretty quickly: if the third tool in the list happens to be mismatched, I end up splitting a branch that will rejoin in the fourth tool. Since I know the sequence is meaningless, I would prefer to consolidate these branching points so that there are no loops in my network. These two branches should share tools 1, 2, and 4 and diverge at tool 3, but I don’t currently have any good way of sorting to make that happen. Again, it’s the artificial sequence that causes this problem, and there’s no simple way around it in my reduced-branches version. I am pretty sure that this analysis will be trivial to do with a more sophisticated method, so it was time to put this one aside.  

It took me a couple of weeks of project time to play around with this to my satisfaction and discover that this particular method leads to a pretty firm dead end. It’s not impossible to get this data to work, but it’s going to take more effort and thought to do it well. The question is still interesting and I will probably return to this analysis in a different form later, but it’s not going to get me where I want to go from here. The same restriction applies to my pivot tables above. The designation of first, second, and third tool doesn’t make a lot of sense in an unordered dataset, and so a simple clustering model would probably be a better match. An unordered, network approach would not impose the same hierarchy that’s implicit in the sequenced, hierarchical approach required for branches. This little diversion also raised a question that I would consider adding to next year’s survey: if we asked users to rank tools on preference or frequency of use, then we’d have the data attribute we need to make the branches work.
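One way around the artificial ordering, sketched here in Python with made-up responses (the author had not yet implemented this): represent each respondent’s tools as an unordered set, so that identical tool groups collapse into one “branch” no matter what order the columns arrived in.

```python
from collections import Counter

# Hypothetical multiselect responses. The column order varies, but the
# question is unordered, so order should not matter.
responses = [
    ["Excel", "Power BI", "Tableau"],
    ["Tableau", "Excel", "Power BI"],  # same set, different column order
    ["Excel", "Tableau"],
]

# frozenset is hashable, so identical tool groups become one Counter key,
# eliminating the spurious tool-1/tool-2 sequencing entirely.
branch_counts = Counter(frozenset(r) for r in responses)
for branch, n in branch_counts.items():
    print(sorted(branch), n)
# The first two responses land in the same branch, as they should.
```

This is essentially the unordered, network-style aggregation the paragraph argues for, in place of the sequenced, hierarchical one.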

By now, certain inner voices might be starting to shout that this entire effort was wasted, this was all completely obvious from the beginning, and that I’d spent more than two weeks learning absolutely nothing. I disagree with those voices: discovering a dead end is still information, and if you understand why it’s a dead end it’s taught you an important property of the dataset. Whether it’s something you saw and didn’t recognize or something that you simply failed to notice, making a mistake like this will reinforce that connection in ways that you’re unlikely to forget. If you find yourself in need of additional consolation, remember that it’s a lot less painful to cut your losses while you’re still in the sketching phase than to get caught wrong-footed later. The problem isn’t being wrong; it’s staying wrong. 

Whether or not this detour was successful, it helped me to clarify what I was going for, simplified my approach to the analysis by removing the row-level aggregation, and gave me an opportunity to think through alternative displays to show this same kind of information in a non-sequenced way. I contented myself with adding some notes (in red, to be sure I didn’t forget), sketching out a few ideas for extensions or other approaches that could reduce the impression of ordering, and moved on. 

How many people use this specific tool group? 

To regroup, I did what I always do when I hit a dead end: I circled back to something simpler, and started moving forward again. The branching detour did validate my interest in being able to drill down deeper into more complex combinations of tools. I seldom use treemaps, but that was one of the first visualizations that I sketched out for this dataset, and I had that sense of subsectioning in the back of my mind for several of these iterations. There is no reasonable way to build what I wanted manually in Excel, but I could at least do enough to peer inside and get a sense of what’s in there. To do that, I created an individual branch browser that let me specify relationships between different tools. 

This pivot allowed me to specify my branches one node at a time, and to see the count of tools that fall within a particular branch. By putting one reference tool in the row labels category, I was able to see the count of users who do (and don’t) use the reference tool, broken out across each of the other tools. I started out with ArcGIS as my reference, and then looked at the count of people who use Cytoscape who also use ArcGIS, who use D3 and also use ArcGIS, etc. In a sense, this is coming back to my stacked bar charts example from the beginning, and just doing the whole comparison at once. 

Unfortunately, as you can see from my next red note, this pivot is (sadly) still not a treemap. It’s always a good idea to keep an eye on your totals row when doing this kind of analysis: if your counts add up to something much bigger than your dataset, you know you’re in trouble!

This is easiest to think about in terms of an example. Let’s say that I am looking at the group of all ArcGIS users who also use Cytoscape. In a group of two tools, this browser works just fine. The problem comes in with groups of three or more tools. The third tool could be anything! Let’s look at what happens if the third tool is D3. Those people get counted in the group of three for ArcGIS and Cytoscape and D3, but they’re also counted in other groups of three containing Cytoscape, and in the groups containing ArcGIS and D3. All of a sudden, my counts blow up, because I’m cross-counting everything.

That’s okay for some questions and visual forms, but overlapping categories are simply not compatible with a treemap. If I redefine my task as simply mapping out branches, I could sort of make this work by manually toggling filter values for each and every combination, but that’s way too much work to make sense at this stage. Much better to wait and automate this one, if this ends up being an interesting path.   
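The double-counting problem is easy to demonstrate in a few lines. A minimal sketch (hypothetical respondents, Python rather than the pivot table itself): a three-tool respondent falls into three different pairs, so the column totals overshoot the number of people in the dataset.

```python
from itertools import combinations

# Three hypothetical respondents; the third uses three tools.
people = [
    {"ArcGIS", "Cytoscape"},
    {"ArcGIS", "D3"},
    {"ArcGIS", "Cytoscape", "D3"},
]

# Count respondents per tool *pair*, the way the pivot browser does.
pair_totals = {}
for tools in people:
    for pair in combinations(sorted(tools), 2):
        pair_totals[pair] = pair_totals.get(pair, 0) + 1

# The three-tool respondent is counted once per pair they belong to, so the
# totals exceed the respondent count -- the red flag described above.
print(sum(pair_totals.values()), "pair memberships vs", len(people), "people")
# 5 pair memberships vs 3 people
```

A treemap needs mutually exclusive categories, so any scheme whose totals exceed the dataset size fails this sanity check.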

Ok, so what have we learned?

This part of the project was more involved and took a lot more manual, tedious work than the previous stage. In some ways, you could say that I wasted several weeks bumbling around in the dark and heading off down rabbit holes and dead ends. Welcome to data exploration! I am very aware that someone who knows more about analysis or is fluent in R could have run rings around me and my manual pivot tables in Excel. It would be nice if I could just magically know everything I need and be good at that already, but I’m not, and that’s ok. This is what it means to learn, and no magic piece of software is going to get me out of the real work involved in understanding the data. True, I didn’t come out of this with everything figured out, but I did get a lot out of this foray. Here are some of the things that I learned:

  • Ordered vs. non-ordered data is a key consideration for this dataset, and it will affect both the aggregation and the visual form in important ways. We don’t want to imply a ranking that doesn’t exist. Implied ordering can also cause mistakes in the analysis. 
  • I need to be sure that my method prevents double-counting across columns, so that I can be sure that my branching categories are unique. A good basic check is to be sure that my pivot table column totals are no larger than the original dataset. 
  • Pairwise analysis is interesting, but clustering or branch analysis across the full toolset for an individual user creates much richer and more interesting information. 
  • The number of unique combinations makes the branching analysis difficult to manage, and may cause problems with scaling for some visual displays. It might be necessary to remove branches/tool combinations with only a single user, or to consolidate them in some way to avoid creating hairballs or value displays that need to show tens of thousands of individual pixels. 
  • The branch navigator was interesting because it allowed me to interactively query the data. If possible, it seems likely that interaction in support of exploration will be a key piece of presenting the information. This has implications for both the data structure and aggregation, as well as the medium in which the final result will be displayed. 
  • Drilldown is useful, but I really want an overview of the full dataset first. That piece is going to take some work, but I think it will be worthwhile. Whether it’s a treemap, a dendrogram, or some other visualization, I want to be able to see the relative size and fragmentation of the different tool sets. 
  • I am much better prepared to approach the hard work of learning R and doing these more complicated analyses now that I know more about where I’m going. This gives me at least some (clunky and painful, but functional) approaches to sense-check my results along the way. 

Lots of effort for a bit more progress is pretty typical in the steeper uphill climb of the engagement stage. We’ve learned some interesting things, gotten more focused on what we need to do, and identified some important mistakes to avoid. The next question is how long you stay in this stage, and how far you wander before pulling back in and switching into focus/consolidate mode. I had one more set of things to explore before making that transition, but we’ll cover those in the next article. 

Coming up at the DVS:

  • One thing we haven’t talked about in these articles is how to collect the data that supports an analysis like this. If you’re interested in better understanding and participating in the data collection process, the 2022 State of the Industry Survey Committee will be launching toward the end of May! Fill out an application here.
  • Join us for an informal roundtable on May 28 at 10 am to talk about how we build and analyze the survey data, and to meet some of the people who have been working on this project. It’s listed on the DVS Events calendar, or you can register here!

Step 1 in the Data Exploration Journey: Getting Oriented https://nightingaledvs.com/data-exploration-step-1-getting-to-know-your-data/ Wed, 09 Mar 2022 14:00:00 +0000 https://dvsnightingstg.wpenginepowered.com/?p=10579 This article is part II in a series on data exploration, and the common struggles that we all face when trying to learn something new...

The post Step 1 in the Data Exploration Journey: Getting Oriented appeared first on Nightingale.

This article is part II in a series on data exploration, and the common struggles that we all face when trying to learn something new. The previous article can be found here. I’ll be using the tools data from the State of the Industry Survey as a basis for this exploration, to illustrate both how I approach a new project, and the fact that no “expert” is immune from the challenges and setbacks of learning. In addition to working with a new dataset, I am also using this project to take my first steps toward learning R. Let’s see where this journey takes us!

Before diving into a project where I’m likely to get diverted or distracted, it’s helpful to take a moment to get a clear idea of what I’m working toward. A design brief helps to clarify my thoughts and it gives me a reference point to check against as I evaluate tangents and new opportunities that come up during the exploration phase. It can also be helpful as a framework for evaluating success, and as a way to structure feedback and evaluate input from other people. I don’t always write this down, but I find it helpful to have clear objectives from the start. 

Design brief: 

Here’s a rough outline of what I’m trying to achieve in this project. 

Project Scope:

  • Design a chart (or group of charts) to showcase a single 2020 survey question about tool usage, and to provide insights into the skills needed in different careers. 
  • For now, I will focus on the tools question exclusively, though there are many interesting questions correlating this information to other survey questions that might be worth considering in the future.
  • This is an exploratory project, so the final output and mechanism of delivery are to be determined.

Context:

  • This visualization is part of a larger project to leverage the DVS survey data to inform people about different career paths.

Audience

  • DVS members and Nightingale readers.
  • People working in dataviz who are curious to see what tools others use.
  • People new to dataviz who want to understand what they should learn next (especially if they are interested in a particular career, or comparing different careers to find a match for their interests).

Purpose:

  • To understand which tools are most popular, and which sets of tools tend to be used together.
  • To find out how much tool usage varies between professional communities.

Data

  • Data from the 2020 DVS census, hand-tagged to different career groups by job title.
  • The first four career categories account for roughly ⅔ of the data. Business analysts and related roles are the largest group, at almost 36 percent.
  • 1,766 individual data points, with 33 tools (plus an “other” category) listed in the dataset.
  • This survey question is not exclusive; respondents can choose more than one answer.
  • I am not doing anything to remove or track incomplete responses at this point. 
  • The “other” category in the original question is excluded from this analysis, since it is a free-entry field and harder to process (a future version should include this data as well).
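To make the multiselect structure concrete, here is a toy sketch in Python/pandas (the actual analysis in this article is done in R and Excel, and these column names are invented, not the real survey schema). One row per respondent, one 0/1 indicator column per tool; melting to long format gives one row per selection, which makes the pivots that follow straightforward:

```python
import pandas as pd

# Toy stand-in for the survey extract: one row per respondent, one 0/1
# indicator column per tool. The real data has 1,766 rows and 33 tools;
# these names and values are illustrative only.
wide = pd.DataFrame({
    "respondent": [1, 2, 3],
    "career": ["Analyst", "Designer", "Developer"],
    "Excel": [1, 0, 1],
    "d3": [0, 1, 1],
})

# Long format: one row per (respondent, tool) selection, handy for pivots.
long = wide.melt(id_vars=["respondent", "career"],
                 var_name="tool", value_name="used")
selections = long[long["used"] == 1]
print(selections)
```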

Once I know what I’m trying to do, the next step is the fun part: getting in there and understanding what this problem really looks like. This is the “expand” stage, where I take some time to get oriented, understand what the data is all about, and play around with some initial ideas. This stage is especially critical when you are working with someone else’s data. You want to find all of the gotchas and limitations lurking in the data before spending too much time on a formal analysis. I’m also looking to assemble a strategic view of where I’m going, so that I know where to spend my time in the much slower, higher-effort “focus/consolidate” step that comes next. 

Practices for exploration

For me, the most important principles of the exploration piece of the expand phase are the following: 

Do what’s easy

I’m trying to learn as much as possible about the problem without getting too bogged down in the details or impeded by my tools. Right now, I just want to rummage through as many aspects of the data as I can, to identify which ones might be promising enough to come back to later, and to anticipate the dead ends.

What that looks like for me right now is a tiny bit of analysis in R to export a spreadsheet, and then a whole lot of manual playing around in Excel to figure out how I want the bits to work together. I’d do that in R if I had the skills, but I don’t (yet…that’s what I’m learning!). For now, I’ll use R when I can, and when I can’t, it’s good old elbow grease in Excel to fill in the gaps. I’m leaving the hard work of learning new software for later, when I have a better idea of what I need, and a better sense of the data to help me sense-check my results. The price for that will be hours spent doing things inefficiently in Excel, but the tradeoff is worth it to me right now, especially because it’s easier for me to check my work and debug data issues in software that I know well.

I’m using Illustrator for the charts for several reasons. I’m already familiar with it, I want the flexibility to sketch and ideate on top of basic data points and visual forms, and the actual data values aren’t all that important to me right now. The data points will all need to be carefully recalculated and analyzed for the final version anyway, so everything in this file is subject to change, and should be thrown out. Knowing that gives me the freedom to ignore “little” things like axis labels, and just dump in screenshots and notes to help myself re-connect the dots later, instead of spending days building out charts that I know I will throw away. Is this best practice? No. Is it ok when you’re sketching? In my opinion, yes – if you know how to structure your notes and your process so that you can pick up the pieces later.

One important caveat: this approach works well in a case like this one, where I am doing multiple straightforward analyses off of the same dataset (lots of pivots, but using only two source data tables, and few calculations). I am not building a big, complicated analysis where each step depends on the result of the last. If you have a sequence of analyses that affect one another, it is almost always better to build out and thoroughly test each one before moving on to the next. You might still do some sketching and ideation to make sure you understand the paths and the steps that you need to take, but you should always be more careful when working with code or analyses that have strong dependencies.

Sketch first

You may be thinking that this is a sloppy, imprecise way to work…and you would be right! To me, that’s actually sort of the point here. I don’t want to get in too deep and start taking myself seriously before I know what I’m after and where I’m going. In my experience, an analysis that looks like it might be finished is a lot more dangerous than one that is clearly a mess, because it’s easy to forget that “one little thing” you still needed to do when you come back. I used to tell my students that the best way to avoid plagiarizing was to never copy and paste (or even paraphrase) someone else’s sentence into your document. Once it’s in there, it’s really easy to forget that you need to go back and make a change, but a big block of [add something interesting here later] with a link to your references file is something you’re not likely to miss in the editing phase.

I find that the same thing applies to charts. If I make a chart that looks “real” in Excel and I skip a step in the data analysis for the sake of time, I’m much more likely to end up with an error in my final dataset. I consciously prevent that by increasing the separation between the ideation and editing stage (different tools, different files, etc.). This helps me to avoid getting bogged down in the details too early, short-circuits perfectionism, and gives me the room to move freely while I work through the big-picture strategy for a project. Sometimes, it also means that I make mistakes, but I can usually live with that. This approach only works if you have the discipline to really, actually start over from scratch and to resist the urge to copy and paste later. Otherwise, you risk transferring errors and missing gaps that could jeopardize your entire analysis.

Personally, I have usually learned enough by the end of the sketching to more than make up for the time that I’ve lost making messy charts that need to be thrown out. You may find that it’s different for you; in that case, it might be better to stick to pen and paper for sketches, because that’s almost always the fastest way. I prefer the additional detail of realistic data values, but depending on the project, it’s not always necessary, or worth the time. In some cases, the values need to be so exact that I have to do the full analysis, at high quality, right from the beginning. That’s okay, too. It’s just a matter of deciding what makes sense for you, and for the project at hand.

Leave a trail

This kind of rapid ideation also means that I need to leave myself a trail to make sure that I can come back and re-create each individual step. This practice helps me to make first-draft documentation for the analysis and the project. If I know that I’m going to have to go back and figure out those cryptic notes later, it creates a good incentive not to cut corners on writing things down. I usually just keep a running Word document with a bullet list of changes for each day, notes about file and tab names and locations, and a bunch of screenshots showing different iterations. Use whatever works for you. Writing blog posts is also a really good way to document what you’re doing at a high level, to help make sense of the details in your implementation notes. 

Start simple

Sometimes it’s hard to resist the urge to dive right in on the newest, most interesting thing, but I find that it’s much better to slow down and look around me first. Always start simple and work up from there, especially when you’re working with someone else’s data. I am 99 percent sure that I am going to want some kind of advanced cluster analysis by the time I’m done with this project, but it would be a mistake to get impatient and go for that destination right away. 

First, I don’t know what I’m doing yet with the R software. I’m much more likely to end up bogged down in questions I can’t answer, frustrated by my technology, and missing out on the actual insight if I jump in right away. Even if I could find a package to run the analysis automatically, I’d have to blindly trust the output at this stage, and that can be dangerous. I never trust an automated routine without first taking the time to understand the method and its limitations, becoming familiar with the data I’m putting in, and getting at least some sense of what I should expect to get out. Otherwise, it’s just a black box and I have no way to evaluate the results. 

Second, by jumping in directly I’d miss the opportunity to develop a deeper understanding of the dataset that will help to inform my interpretation of the results. Third, there are tons of other insights sitting right in front of me, just waiting to pop out. If I short-circuit the exploration stage in favor of a fancy analysis, I may end up missing the most important thing that this data has to tell me. Like any good warmup routine, a robust exploration phase helps to make sure you’re ready, improves your performance, and prevents injury (mistakes/frustration) when you get into the actual analysis.

Follow the data

My whole job in this part of an analysis is to understand what’s going on with the data. I want to get a big picture sense of counts and distributions, and to see where there is variation between careers. I usually start with the simplest possible question that I can ask of the data, and then work forward from there. I’m not trying to force a particular path or get to a specific outcome yet. Right now, I just want to see what questions come up as I look at the dataset. 

Ask more questions

The wonderful thing about questions is that answering one of them almost always creates another. Follow your curiosity and see where it leads. Looking at one chart will usually suggest another idea. As I get into more depth, I keep a running list of more complex questions that I want to come back to and explore. I also structure my output document to reflect the series of questions I asked, so that I can come back later and follow the trail. 

A first look at the data:

Here are some of the questions I asked in the first couple of weeks of exploring this data: 

How common are the different tools? Which tool is most popular?

My first chart was just a simple frequency calculation for the different tools, plugged into the most basic, default chart possible, to help me see the data values. Adding a simple sort function lets me see which tools top the list. 

It’s important to note that I’m not looking at specific values right now, because I know my analysis is not robust enough to support that kind of weight. Doing this analysis in Excel required dragging each and every tool into my pivot table by hand; there were 35 of them, and they needed to be added in order, using the same order every time. I decided that I didn’t need that level of detail in this stage of the project. That means that my ranked values are only based on half of the dataset, so I really know nothing about which is the most common tool from this chart. (In fact, this chart is missing Tableau, which is actually the second-most common tool!) 
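The frequency-plus-sort step described above is a one-liner once the data is in indicator columns. A hedged pandas sketch with toy data (the article does this with an Excel pivot table, not code, and these counts are invented):

```python
import pandas as pd

# Toy 0/1 indicator table; the real chart runs over the full survey extract.
df = pd.DataFrame({
    "Excel":   [1, 1, 0, 1, 1],
    "Tableau": [1, 0, 1, 1, 0],
    "R":       [0, 1, 0, 0, 1],
})

# Count how many respondents selected each tool, most popular first.
freq = df.sum().sort_values(ascending=False)
print(freq)
```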

It’s also important to note that the total number of answers in the bars adds up to significantly more data points than people who took the survey, because this survey question allows one person to select multiple answers. I’m not yet doing anything about that multiselect or even tracking its meaning, but that’s worth adding to the list of items that I need to keep in mind as I go. 

It’s fine to work with incomplete data right now, as long as I resist the urge to try to draw conclusions or make inferences off of the differences that I see in these charts. I know that I am missing important information in the points that I’m not showing, and that those values could change everything about the conclusions that I’d draw based on the numbers that I see here. Again, I’m just trying to get a sense of the dataset, and I care more about the structure and the kinds of analyses right now than I do about the individual values. Getting too attached to individual values and conclusions can actually be counterproductive at this stage. 

This is another reason that I chose not to bother creating and formatting axis labels in my Illustrator document: it is an extra manual step in the software, but it’s also a salient reminder that I can’t count on any of this information to be real. That level of detail comes later, when I’m ready to do this thing right.

How much does the distribution of tools vary across careers?

Next, I built a copy of my basic chart for each of the different career paths. The first column shows the total for all career paths, and the subsequent charts show distributions for each of the subgroups.
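A per-career version of the same counts is one groupby away. Sketched in pandas with toy data (career labels and values are placeholders, not survey results):

```python
import pandas as pd

# Toy one-hot data with a career tag (stand-in for the hand-tagged survey).
df = pd.DataFrame({
    "career": ["Analyst", "Analyst", "Designer", "Designer", "Developer"],
    "Excel":       [1, 1, 1, 0, 1],
    "Illustrator": [0, 0, 1, 1, 0],
    "d3":          [0, 1, 0, 0, 1],
})

# One row of tool counts per career group, plus a totals row; transposed,
# each column feeds one small-multiple chart (totals column first).
by_career = df.groupby("career").sum()
by_career.loc["All"] = by_career.sum()
print(by_career.T)
```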

Which tool is the most popular, within each career group? 

Again, I can sort my bars by height, to get a sense for the popularity of individual tools. The previous charts all used the same sort as the totals chart, so that I could compare positions across career groups (if Excel was at the top of the totals list, it is the top bar within each career chart). If I want to look at popularity within groups, I can re-sort my y axis as I create each individual chart. I have to be careful here, because this means that my y axis is now different for each chart, which makes my omission of data from these visualizations riskier. Again, I’m deliberately not fixating on data values or patterns yet, and knowing that I’m on thin ice with the analysis helps to keep me out of inference-making mode and in the exploration space. 

How many counts do we have for the different career-tool groups?

Another reason to be cautious is that I know from a previous analysis that the size of my career groups is not evenly distributed within the dataset. The charts above are scaled automatically to the max for each dataset, which hides that variation. The second row of charts below shows the first row scaled to a common x axis, set based on the sorted and unsorted totals charts at the far left of the row. Right away, I have a different sense of which differences are meaningful to follow up on; this completely shifts my interpretation of the previous charts. I want to find that out now, before I get attached to a story that doesn’t exist.

I often use this sort of small multiples approach to help keep me honest when looking for interesting differences, patterns, and trends in aggregated data. If you forget about the underlying counts, you’ll often end up chasing differences that vanish in the final analysis, or drawing conclusions that the data can’t support. 

Again, I’m not looking at values here, but what I can see is that there is a fairly long tail for most career groups, and that the shape of the distribution is similar across groups. There are a couple of careers with a shoulder, or with a more abrupt increase in counts for the top few bars, but the careers with the largest differences in distribution widths seem to be the ones with the lowest counts. That makes me wonder if the width of the distribution reflects the number of tools chosen by individuals, or whether it’s actually driven by variation between individuals within each group (everybody picks five tools, but no two people pick the same tool). I can’t answer that question from here just yet, but I’ll put it aside to dig into later, when I’m ready for more sophisticated comparisons. 

How much does the top tool vary across career groups?

The sorted bar charts can give me some sense of distribution, but they do a terrible job of helping me to track tool position from one career group to the next. Putting the same data into a different visual form makes that task a lot easier.

The parallel axis plot gets crowded fast, and I didn’t want to draw out all those different connections individually, so I contented myself with drawing lines for the top 10, and will come back and put in the effort to build this out in code later, if it makes the final cut. I added color to the top few lines, just to help me follow them across a busy chart. If I were to build this specific chart out for actual use, I’d want to include animation to support a focus task or to select one or two lines to follow as a single narrative, rather than trying to look at everything at once.

I don’t want to get overenthusiastic with my conclusions here, but there is some interesting variability in the line shape for different tools. Excel stays pretty consistent in the top three slots, while other tools are first for some career groups and not even in the top 10 for others. This might suggest specialized tool sets for particular careers (e.g., designers use Illustrator, developers use d3), and that gets right to the heart of the comparisons that I’m trying to make. This is something that seems worth coming back to, when I have all of the data in place. 

I also think that it might be interesting to pull information about the relative size of the different tools into this diagram, in addition to their order in the ranking. I’d like to see how much bigger Excel is than Java, for instance, and if most Java users fall within a particular career group. For now, a quick-and-dirty way to do that is to add a stacked bar chart at the beginning of the diagram, showing the relative proportion of total users for each tool in the chart (in the order that they’re shown in the parallel axis plot). I’d want to do the same for each career, to look at variability in the distribution across groups, but this is enough to remind me to think through that piece when I come back to refine this later. 

How many different tools do people use?

So far, I’ve been looking at counts per tool, but it’s also interesting to explore how many tools are listed per user. This is an interesting question on its own, but it will also help me to get a handle on how much duplicate counting I’m doing in the previous charts. One respondent can identify multiple tools in this survey question: if I sum up all of my bars in the counts charts above, I get just north of 4,000 data points, but there are only 1,766 individual responses to the survey, and I know that some of those responses are incomplete.

Fortunately, because of the way the data is structured, getting a count of tools per user is as simple as adding a COUNTA column to the dataset and doing a different pivot off of the same table. 
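The pandas analog of that Excel helper column is a row-wise sum over the tool columns, sketched here with toy 0/1 data (invented values, not the survey’s):

```python
import pandas as pd

# Toy indicator table; with 0/1 indicators, a row-wise sum plays the role
# of the COUNTA helper column described in the text.
df = pd.DataFrame({
    "career":  ["Analyst", "Designer", "Developer"],
    "Excel":   [1, 1, 1],
    "Tableau": [1, 0, 1],
    "d3":      [0, 1, 1],
})

tool_cols = [c for c in df.columns if c != "career"]
df["n_tools"] = df[tool_cols].sum(axis=1)  # tools selected per respondent
print(df[["career", "n_tools"]])
```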

As expected, these curves are pretty asymmetrical: lots of people use just a few tools, and then some professions have a long tail of people who use just about everything. Some professions are much more variable than others, and the length of the tail varies a bit as well. In general, most people identified 10 or fewer tools, but there were also a couple of overachievers who ticked off all of the tools that I measured here. 

Another interesting thing to keep in mind is that there’s a strong behavioral component to this data. I’m sure there were a few people who picked only one or two tools, even if they have used many more in their career, and possibly others who dutifully checked off every single tool they’ve ever used. There’s probably some aspect of prioritization, frequency of use, and expertise/familiarity with the different tools that’s not captured here, and we have no way to tell precisely how much those variations affect our data. 

This is the difference between descriptive surveys and authoritative research. In a formal research setting, you’d put in structures and practices to minimize variation due to personal behavior or preference so that you could draw firm conclusions about a specific question. That’s not the intent of this survey, which attempts only a broader-stroke picture of tools that people use. It would be fascinating to do a follow-up study to dig into the specifics, but here we can only look at the responses that people provide, and interpret those as best we can. For our purposes, it’s important to remember the limitations of this dataset, to consider the potential impacts and implications of those limitations on our analysis, and to be careful not to overstate our results. 

There may also be a gap between the tools that people use professionally and what they use in their personal projects. If someone is working on a team, they may not personally use d3.js, but it might be the final form for all of their work – just implemented by someone else. To really get at those details, we’d need to add several more questions (and a lot more complexity) to the survey. It’s always good to keep in mind what questions you can and can’t answer from the data, and where your questions and interpretation start to run up against the limits of the information that you have, and to ask whether that changes the level of effort that you’re willing to put into exploring a particular point.

What’s the most common number of tools per profession?

Another way to get at a comparison between groups is to do a median calculation. The median bin is shown as a teal dot in each bar chart above, to give me a point of reference for making sense of the distributions. I can also go for a more aggregated view, and count up the median bins per profession to make a derivative histogram showing the median number of tools for each career. For most professions, the median falls between four and five tools, but there are a couple with medians as low as two or as high as seven tools as well. I would want to look closely at those edge cases in the final analysis, just to make sure that I don’t have a hidden n-value problem giving me unrealistic medians or otherwise skewing the results.
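The median-per-career step and the derivative histogram can be sketched in a few lines of pandas. A toy example (careers and counts invented, so the numbers mean nothing):

```python
import pandas as pd

# Toy tools-per-respondent counts tagged by career (values are made up).
df = pd.DataFrame({
    "career":  ["Analyst"] * 3 + ["Designer"] * 3 + ["Developer"] * 2,
    "n_tools": [2, 4, 6, 3, 5, 5, 7, 7],
})

# Median number of tools per career, then a derivative histogram counting
# how many careers land in each median bin.
medians = df.groupby("career")["n_tools"].median()
hist = medians.value_counts().sort_index()
print(medians)
print(hist)
```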

Do you feel how much easier it is to trust this chart, with its confident axes and labeled values? Don’t let the representation fool you: this chart is still missing at least half of my data, and that makes count comparisons meaningless at this stage. The more aggregated your representation becomes, the easier it is to miss those important caveats lurking in the details. 

What is the median number of tools per profession?

Of course, as soon as I make this aggregated chart, I want to see which professions are in the seven tools bin, so I’d probably want to include a breakout of some kind or a supplementary view of the median value per career group if I wanted this chart to become the basis for my final analysis. 

This is another chart whose interpretation is highly sensitive to n values. I’m comparing across career groups, but not paying attention to how many responses were collected for each one. Some of these “results” are based on 15 people and others are based on 700. Always keep your eye on the n values: I haven’t validated my categories here, and I’m pretty certain that at least some of my career categories will need to be merged and redefined before I include them here.

Tempting as it is to start comparing values, I have no business making any conclusions at this stage about why people in dataviz might use more tools than people in business, for example. I shouldn’t even start to speculate. People who are inexperienced with data analysis will often try to extrapolate from partial results and start imagining stories based on “the data” or “the trends I’m seeing,” but it’s important to resist that urge completely in the discovery phase. What you put into a chart determines the quality of insight that you get out of it, and “the data” is only as good as the analysis you’ve done. Getting over-attached to a blip in the numbers will only blind you to the more interesting information that’s really there, and it may lead you to make major mistakes. 

What have we learned?

So, what exactly have I learned from doing this exploration, if I can’t trust my counts or make any conclusions based on what I’ve seen? I have:

  • A set of questions to choose from, based on the final story that I decide to tell. I can pull from these later, when my core narrative begins to take shape.
  • An idea of what the analysis will look like for each question.
  • Notes about important things to look into, and warnings about things to avoid.
  • A preliminary view into some interesting aspects of the data, and some initial observations to verify as I work through a more complete analysis:
    • Some tools are quite popular, and are identified as important by almost half of survey respondents. Others have only a handful of users. 
    • Some tools are popular across all career groups, while others are more specialized, and common to just a few careers.
    • The distribution of tools across career groups varies somewhat, but usually in frequency/count rather than presence, suggesting that there might be interesting variability within career groups that could be worth teasing out.
    • Small n values complicate the analysis for several career groups, and reflect too many fine distinctions in my first attempt at manually tagging the data. I should consider excluding or merging certain categories, look for another way to improve the counts (merging in data from previous years, etc.), or consider whether these smaller groups can inform a lower-certainty, more qualitative picture. 
  • A list of things to consider next.

That’s a lot of information to get out of basic frequency analysis on individual data columns, but the more interesting questions for this data are going to require a bit more work. I knew that from the beginning, but starting here has helped me to get acquainted with the dataset and gives me options to consider when building my project. If the more complex comparisons don’t work out, it won’t be hard to find a new place to start. This basic sense of the data will also help me to evaluate the results that I get and to catch errors, as I wander deeper into more complicated territory. 

Comparing back to my design brief, though, I can see that none of these questions has really gotten to the heart of what I’m trying to accomplish (yet). I want to understand how individual people use different tools, and how that maps onto specific skill sets within the career groups. For that, I’ll need to look deeper into the relationships between columns, and across rows. Stay tuned! 

Coming up in Education:

  • Questions, comments, suggestions? Feel free to reach out to education@datavisualizationsociety.org anytime to share your thoughts. 
  • Keep your eye out for an Education/Early career event to talk about data discovery on Saturday April 2. We’re still working out the details, but will announce more via Slack and the DVS newsletter as we get closer to the event.
  • Are you a data analyst, a dataviz designer/artist or a dataviz developer/engineer? This year, the education committee is building out career portraits to help people understand what it’s like to work in these different roles. Please sign up here if you’re interested in supporting our research effort, or otherwise contributing to the project. (Note: we will be asking for additional careers in the coming months, but we’re starting with these three first. If this isn’t you, hold tight!)
  • Do you have experience in determining statistical significance for survey datasets collected without a control series? The tools visualization is one step in a larger project to map out career portraits using our survey data, and we need to get a sense of how big the variation between groups should be to count as real. We have our initial n values summarized and the basic analysis is done, but we could use some help getting the stats right and assessing feasibility for the more complicated comparisons. Please reach out to education@datavisualizationsociety.org if you know how to help.
  • Interested in joining the education committee? Applications are now open…let us know how you’d like to get involved!

The post Step 1 in the Data Exploration Journey: Getting Oriented appeared first on Nightingale.
