Zoë Wilkinson Saldaña Social Science and Geospatial Data Librarian Data, cats, pinball, Python, NLP

Creating a custom network visualization using the Scalar API Explorer, Part 1


Scalar is a unique and powerful open source publishing platform. Its strength lies in its ability to combine linear and nonlinear methods of exploring media, narratives, annotations, and scholarship within the single organizing structure of a book. Scalar also provides several out-of-the-box options to visualize your data, including an interactive network visualization.

But what happens when you want to customize your visualizations beyond what the Scalar presets allow? I recently ran up against this issue and decided to find a way to create a network visualization “from scratch” (in reality, leveraging a number of excellent open source tools, demos, and APIs!). I used the Scalar API Explorer to export data about our pages and the tags between them, prepared the network data with Python and various packages (NetworkX, BeautifulSoup, etc.), and wrote a custom network visualization using D3.js and Canvas.
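To make the data-preparation step a little more concrete, here is a minimal sketch of the general idea: turning a list of tag relationships into the nodes-and-links structure that D3.js network demos commonly consume. It is not the code from the tutorial. The page names are hypothetical sample data, the `build_node_link` helper is made up for illustration, and it uses only the Python standard library rather than NetworkX or BeautifulSoup.

```python
import json

# Hypothetical sample data: (tag page, tagged page) pairs standing in for
# whatever a real Scalar API Explorer export would contain.
tag_pairs = [
    ("themes", "chapter-1"),
    ("themes", "chapter-2"),
    ("methods", "chapter-2"),
]


def build_node_link(pairs):
    """Turn (source, target) pairs into a node-link dictionary."""
    node_ids = []
    for source, target in pairs:
        for page in (source, target):
            if page not in node_ids:
                node_ids.append(page)  # keep pages in first-seen order
    return {
        "nodes": [{"id": page} for page in node_ids],
        "links": [{"source": s, "target": t} for s, t in pairs],
    }


network = build_node_link(tag_pairs)
print(json.dumps(network, indent=2))
```

The printed JSON could be saved to a file and loaded by a visualization script. A library such as NetworkX would likely be the better choice once you want network measures or filtering on top of the raw structure.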

I waded through a fair bit of code and experimentation along the way, and I’d like to share with you some notes, lessons, code, and tools that resulted from that process. I also tried to identify several places where you may wish to deviate from my process, or to experiment further, depending on your Scalar book and your own vision of what such a visualization might look like.

This tutorial is written as two parts:

  • Part 1: Represent your Scalar book as a network using Python and the Scalar API Explorer
  • Part 2: Create an interactive visualization of the network data using D3.js and Canvas

In Part 1, I will introduce the goals of this process and walk through the Python code needed to prepare your data.


What does critical data science add to our understanding of sexual harassment in academia?


A cautious introduction to NLP and machine learning methods for analyzing thousands of anonymous sexual harassment and assault reports.

Introduction

“The data are too messy.”

“There’s no way we could work through it in time.”

“I’d like to figure something out. But I don’t know.”

I was sitting in the grad student lounge in Ann Arbor with three classmates from Information Visualization. Each of us had the same Google Sheet pulled up on our browsers: “Sexual Harassment In the Academy: A Crowdsource Survey. By Dr. Karen Kelsky, of The Professor Is In”.

In just a few months, a call for anonymous survey submissions on the popular blog The Professor Is In had resulted in over 2,300 reports of sexual harassment and assault.

Our group quickly realized a few things: this data was immense. It was messy in the sense that data folks often describe messy data: non-standard, full of missing values and strange capitalizations. It was also data that spoke to an immensity of pain, the loss of futures diverted and destroyed.

One peer described the issues in the data, column by column, but she stopped when it came to the “Event” column. We reached a point where we stopped knowing what to say, and just made eye contact with each other. Eventually, our group passed on the Sexual Harassment dataset in favor of a climate change-related project (which generated its own share of complex data issues).

However, I couldn’t stop thinking about the survey. Thousands of individuals had revealed their experiences with sexual harassment and assault in academia, many apparently for the first time. These reports detailed the devastating effect these events had on their lives. This is vital, powerful data that deserves to have its story told.
