This is an assignment I recently completed for my Digital Humanities Methods class in the Information Studies department, taught by Prof. Cindy Nguyen. The assignment was to watch a technical workshop from the Office of Advanced Research Computing archive and write a critical autoethnography reflection (500-700 words) guided by the following questions:
- Describe the workshop/technical training you attended.
- How might you critically implement some of the things you learned into your work?
**Workshop:** Numerical Computing with Python: Intro to Numpy, Scipy, and Pandas
**Instructor:** Ben Winjum
In my work, I investigate the mathematical methods that underlie algorithms, computation, and AI. Currently, I am taking a class on Critical Approaches to Digital Humanities and TA'ing for the Data Justice and Society Cluster. Both classes seek to apply a critical lens to the relationship between technology and the social. Much of digital humanities work focuses on adapting computational and quantitative methods to humanistic questions and on digitizing the cultural record. Alongside this work, questions arise about labor, ethics, and the politics of the digital. In Data Justice and critical data studies, we seek to critically engage with the emergence of big data and consider the ways data can be used to fight for justice and a better society. For me, all these roads lead back to mathematics. The methods of computation, data analysis, and artificial intelligence all emerge from the operationalization of mathematical techniques. As Cathy O'Neil has argued, these mathematical tools are often implemented uncritically and assumed to be unassailable. Mathematics tends to be treated as having a unique claim to knowledge, removed from social context.
My current goal is to investigate the mathematics of digital humanities and data justice, with a particular focus on algorithms and machine learning, and to learn the mathematics of these commonly used techniques from the ground up, beginning with the fundamental mathematical techniques behind model training, classification, and analysis. One of the foundational mathematical subjects in machine learning is linear algebra. Alongside my core coursework, I've been self-teaching linear algebra using two books, Mathematics for Machine Learning and Linear Algebra Done Right.

In light of this increasing focus on mathematics, I chose the workshop Numerical Computing with Python from the Institute for Digital Research and Education. My goal was to put some of my linear algebra knowledge into practice and learn how the concepts of linear algebra work in a computational context. In this workshop, the instructor focused on four Python libraries that are foundational to scientific computing with Python: Numpy, Scipy, Pandas, and Matplotlib. I have some familiarity with these libraries but wanted to build a firmer foundation, especially given my current research focus. The tutorial was conducted using a provided Jupyter Notebook. Jupyter Notebooks provide a programming environment with a mix of text cells and code cells: users can run individual code blocks and read the output alongside text that explains the purpose of the functions used. One thing I found interesting was that the instructor referred to Jupyter as a tool for creating computational narratives because of this mix of text and code. In the Data Justice and Society Cluster last quarter, we taught students about redlining via the Jupyter Notebook created by the Mapping Inequality team.
That project explicitly fits the instructor's definition of a computational narrative: data analysis combined with research and information on the history and legacy of redlining in Los Angeles. I think the idea of computational narratives has potential as a point of departure for critical digital humanities work because it emphasizes the storytelling and narrative work being done in DH projects, which could be a site for intervention and critical frameworks. I would argue that all analysis and computation work creates a computational narrative, whether implicitly or explicitly.
The tutorial covered each of these libraries within the broader context of the scientific computing ecosystem for Python. Numpy is the fundamental package for numerical computation, especially with arrays and matrices. Within machine learning, scalars, vectors, and matrices are essential for representing data and performing the calculations that models require. Numpy provides an array datatype that represents these mathematical objects and, thanks to optimizations in the library, computes with them far more efficiently than Python does out of the box. Scipy builds on Numpy and provides additional numerical algorithms and methods; many popular machine learning tools and libraries rely on Numpy and Scipy under the hood. Matplotlib is a library for visualizing data, and Pandas is a library for reading, writing, and manipulating tabular data. I previously used Pandas extensively while working for the UCLA Digital Library program, preparing records to be ingested into the system. This workshop was very helpful in connecting the mathematical topics I've been learning to specific Python tools and techniques. Next, I hope to continue learning from the workshops to understand the bigger picture of how these libraries and methods for matrix and data manipulation fit into AI and ML applications, and to develop my own understanding and framework for critical technical practice.
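To make the connection between linear algebra and Numpy concrete, here is a minimal sketch of the kind of operation the workshop covers (the specific numbers are my own illustration, not from the workshop notebook): representing a matrix and a vector as arrays and computing a matrix-vector product, the core operation behind many machine learning models.

```python
import numpy as np

# A 3x3 matrix and a 3-vector, represented as Numpy arrays
A = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 0.0],
              [0.0, 1.0, 1.0]])
x = np.array([1.0, 2.0, 3.0])

# Matrix-vector product via the @ operator (equivalent to np.dot)
y = A @ x
print(y)  # [5. 7. 5.]

# Vectorized elementwise arithmetic replaces an explicit Python loop
squared = x ** 2
print(squared.sum())  # 14.0
```

Because operations like `A @ x` run in optimized compiled code rather than a Python-level loop, this style is what gives Numpy its efficiency advantage over plain Python lists.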