The Data

So how do you take 30,000 letters and make music? Well, all the letters had to be organized first. The best way to handle a seemingly random, patternless string of letters was to divide it up by protein, and then look at the frequency of different bond groupings within each string. There are 26 different proteins that make up SARS-CoV-2, ranging from 117 letters to 5,835 letters each. That is a LOT of letters to filter through before finding some pattern. To start counting the frequency of each individual letter (A, G, C, U), the three main double bonds (GU, AU, UG), and all the remaining double bonds (AA, AG, AC, GG, GA, GC, UU, UA, UC, CC, CA, CG, and CU), I needed a little code and some charts to help me out.

initialize string from txt
test_str length
test_str to list
for loop to count individual chars
individual char count via hard coding
count of 3 main bond pairs
count of all other bond pairs
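The steps listed above could be sketched in Python roughly as follows. This is a minimal illustration, not the original project code: the function names, the made-up 12-letter `test_str`, and the choice to count pairs with a sliding window (so that overlapping pairs are all counted) are assumptions.

```python
from collections import Counter
from pathlib import Path

BASES = "AGCU"
MAIN_PAIRS = ["GU", "AU", "UG"]
# Every two-letter combination of A, G, C, U (16 in total).
ALL_PAIRS = [a + b for a in BASES for b in BASES]
# The 13 pairs that are not one of the three main ones.
OTHER_PAIRS = [p for p in ALL_PAIRS if p not in MAIN_PAIRS]


def load_sequence(path):
    """Read one protein's RNA letters from a text file, dropping whitespace."""
    text = Path(path).read_text()
    return "".join(text.split()).upper()


def count_bases(seq):
    """Count each individual letter with a plain for loop."""
    counts = {base: 0 for base in BASES}
    for char in list(seq):  # string converted to a list of characters
        if char in counts:
            counts[char] += 1
    return counts


def count_pairs(seq):
    """Count every adjacent two-letter pair, sliding one letter at a time."""
    found = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    return {pair: found.get(pair, 0) for pair in ALL_PAIRS}


# Made-up string for illustration, NOT a real SARS-CoV-2 sequence.
test_str = "AUGGUUAUCGAA"
print(len(test_str))
print(count_bases(test_str))
pairs = count_pairs(test_str)
print({p: pairs[p] for p in MAIN_PAIRS})
print({p: pairs[p] for p in OTHER_PAIRS})
```

One design note: Python's built-in `str.count` skips overlapping matches (`"AAA".count("AA")` is 1, while the sliding window above finds 2), so the two approaches can give different totals wherever a letter repeats.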

The Charts

Taking the Python code outputs and converting them into a plain data table made the strings of letters more tangible. I was getting close to a pattern. From 30,000 seemingly random letters, I now had 546 numbers, and I knew I could find some association in them between protein type and base combinations. From this large data set I drew out 9 variations of data visualizations. The code was starting to crack, visually. That meant step three, developing a graphical score from the data visualizations, was starting to materialize.
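As a rough sketch of how count outputs might be gathered into a plain data table, the snippet below writes one row per protein as CSV text. The column layout, the function names, and the placeholder protein names and strings are all assumptions, not the project's actual table. If each of the 26 proteins contributed 21 values (its length, 4 letter counts, and 16 pair counts), that would give 26 × 21 = 546 numbers, which matches the figure quoted above, though the real layout may differ.

```python
import csv
import io

BASES = "AGCU"
PAIRS = [a + b for a in BASES for b in BASES]
COLUMNS = ["length"] + list(BASES) + PAIRS  # 1 + 4 + 16 = 21 values


def count_row(seq):
    """Build one table row: length, letter counts, adjacent-pair counts."""
    row = {"length": len(seq)}
    for base in BASES:
        row[base] = seq.count(base)
    for pair in PAIRS:
        # Sliding window, so overlapping pairs are all counted.
        row[pair] = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == pair)
    return row


def build_table(sequences):
    """sequences maps a protein name to its RNA string; returns CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["protein"] + COLUMNS)
    writer.writeheader()
    for name, seq in sequences.items():
        writer.writerow({"protein": name, **count_row(seq)})
    return out.getvalue()


# Placeholder names and made-up strings, not real sequences.
example = {"protein_01": "AUGGUUAUCGAA", "protein_02": "GGCUAAUG"}
print(build_table(example))
```

The resulting text can be saved to a file and opened in any spreadsheet program for charting.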

SARS-CoV-2 Genome Map

With a little bit of Python coding, I extracted the frequency of each letter and bond from the individual text files of each protein's RNA sequence, using string, character, and list conversions and for loops. Below is an example of the first RNA sequence along with the code. With this first step complete, I now had the data somewhat organized for step two: finding a pattern by converting the numerical data into charted visualizations.
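For working through one text file per protein, a loop over a folder might look something like the sketch below. The folder layout, file names, and helper names are hypothetical, and the demo writes two tiny made-up files to a temporary folder so that it runs on its own.

```python
import tempfile
from pathlib import Path


def load_sequences(folder):
    """Return {file stem: cleaned RNA string} for every .txt file in folder."""
    sequences = {}
    for path in sorted(Path(folder).glob("*.txt")):
        raw = path.read_text()
        # Drop line breaks and spaces, and normalise to upper case.
        sequences[path.stem] = "".join(raw.split()).upper()
    return sequences


def letter_frequencies(seq):
    """Share of each letter as a fraction of the sequence length."""
    if not seq:
        return {}
    return {base: seq.count(base) / len(seq) for base in "AGCU"}


# Demo with a temporary folder; file names and letters are made up.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "protein_01.txt").write_text("AUGG\nUUAU\n")
    Path(folder, "protein_02.txt").write_text("ggcu aaug")
    for name, seq in load_sequences(folder).items():
        print(name, len(seq), letter_frequencies(seq))
```

Dividing by the sequence length makes proteins of very different sizes easier to compare, since the shortest and longest differ by thousands of letters.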

The Python Code

So we took the genetic code, we organized the RNA sequences, we converted the letters to numerical data, and we charted the data as graphical visualizations of frequency by bond type. Where do we go from here? All we have are some pretty graphs, some simple Python code, and a neat table of numbers. But what does this all mean? What can we do with this information aside from stare in awe at its chaotic beauty?

We have DATA! So... now what?

About  
 

The artist behind this project, Shambhavi Mishra, is a current undergraduate student at Carnegie Mellon University studying Humanities Analytics, Music Composition, and Sonic Arts. To see or hear more of her projects please visit her website below. 

Contact
 

To contact the artist please fill out the form below or send an email to: 

shambhav@andrew.cmu.edu

 

To contact the Frank-Ratchye STUDIO for Creative Inquiry please visit their website: 

https://studioforcreativeinquiry.org

Thanks to The Frank-Ratchye STUDIO for Creative Inquiry
 

The STUDIO provided a grant through the Residency-In-Your-Room Fellowship to fund the creation of this project during the summer of 2020. To learn about other projects, check out the link below:

https://studioforcreativeinquiry.org/riyrf 

© 2020 Shambhavi Mishra. All Rights Reserved.
