The Data
So how do you take 30,000 letters and make music? Well, all the letters had to be organized first. The best way to handle a seemingly random, patternless string of letters was to divide it up by protein, and then look at the frequency of different bond groupings within each piece. There are 26 different proteins that make up SARS-CoV-2, each ranging from 117 to 5,835 letters, which is a LOT of letters to filter through before finding some pattern. To start counting the frequency of each individual A, G, C, and U, the main double bonds GU, AU, and UG, and all the remaining double bonds (AA, AG, AC, GG, GA, GC, UU, UA, UC, CC, CA, CG, and CU), I needed a little code and some charts to help me out.
The Charts
Taking the Python code outputs and converting them into a plain data table made the strings of letters more tangible. I was getting close to a pattern. Instead of 30,000 seemingly random letters, I now had 546 numbers, and I knew they could show an association between protein type and base combinations. From this large data set I produced nine variations of data visualizations. The code was starting to crack visually, which meant that step three, developing a graphical score from the data visualizations, was starting to materialize.
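As a rough illustration of that conversion step, here is a minimal sketch that turns per-protein counts into a plain table with one row per protein and one column per letter or bond. The function names `counts_to_table` and `table_to_csv`, the protein labels, and the numbers in the example are all hypothetical placeholders, not values from the actual data set.

```python
import csv
import io


def counts_to_table(counts_by_protein):
    """Turn {protein: {category: count}} into a header row plus one row per protein."""
    # Assumption: the first protein's categories define the column order.
    categories = list(next(iter(counts_by_protein.values())))
    header = ["protein"] + categories
    rows = [
        [name] + [counts.get(category, 0) for category in categories]
        for name, counts in counts_by_protein.items()
    ]
    return [header] + rows


def table_to_csv(table):
    """Render the table as CSV text that a spreadsheet or charting tool can open."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(table)
    return buffer.getvalue()


# Toy input with made-up numbers, only to show the shape of the data.
toy_counts = {
    "protein_1": {"A": 3, "GU": 1},
    "protein_2": {"A": 5, "GU": 0},
}
table = counts_to_table(toy_counts)
csv_text = table_to_csv(table)
```

A table in this shape is easy to hand to a spreadsheet or plotting tool, which is one plausible route from raw counts to charts.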
With a little bit of Python coding, I extracted the frequency of each letter and bond from individual text files, one per protein RNA sequence, using string, character, and list conversions and for loops. Below is an example of the first RNA sequence along with the code. With this first step complete, I now had the data somewhat organized for step two: finding a pattern by converting numerical data into charted visualizations.
The Python Code
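The counting described above can be sketched in a few lines. This is a minimal illustration and not the original script. It assumes that a "double bond" means two adjacent letters in the sequence, that pairs are counted with overlap (positions 1-2, 2-3, and so on), and that any T in the input file should be read as U. The function name `count_bases_and_pairs` and the toy sequence are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "AGCU"


def count_bases_and_pairs(sequence):
    """Count single bases and adjacent two-letter pairs in an RNA string."""
    # Drop whitespace and newlines, uppercase, and read T as U (assumption).
    seq = "".join(sequence.split()).upper().replace("T", "U")

    singles = Counter(seq)
    # Overlapping pairs: letters (0,1), (1,2), (2,3), ... (assumption).
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))

    # Report every category, including those that never occur.
    counts = {base: singles.get(base, 0) for base in BASES}
    for first, second in product(BASES, repeat=2):
        counts[first + second] = pairs.get(first + second, 0)
    return counts


# Toy sequence for illustration only, not a real protein sequence.
example = count_bases_and_pairs("AUGGU")
```

To run it on a real sequence, read one protein's text file into a string first, for example with `open(path).read()`, where `path` stands for whichever file holds that protein's letters.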
So we took the genetic code, organized the RNA sequences, converted the letters to numerical data, and charted that data as graphical visualizations of frequency by bond type. Where do we go from here? All we have are some pretty graphs, some simple Python code, and a neat table of numbers. But what does this all mean? What can we do with this information aside from stare in awe at its chaotic beauty?