2019 Week 40: Charts and Code

Short version: Intuitions about how we look at data. Taking small steps with code. Art at the Tate Modern.

Long version:

Sneaky Spreadsheets

This week I have been processing some experimental data and realised how the same data can be plotted in seemingly different ways. Often looking at the numbers themselves can be challenging, particularly if there are hundreds of points with multiple values at each point. For this example we take some simple data about income distribution in the UK. The numbers alone are accurate but making inferences takes consideration and a little mental arithmetic.

A linear column chart is the simplest and fastest way to visualise the data in a spreadsheet program (Excel, Google Sheets, OpenOffice Calc). It shows the 99th percentile as much larger than the other values, and the increases seem bigger between the higher percentiles.

Taking a logarithmic plot makes each step along the distribution seem more even, but the 99th percentile still seems noticeably larger.

But by changing the minimum axis value on a logarithmic plot the distribution seems much flatter overall.
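To see why the same numbers can look so different, here is a minimal sketch of how tall each column is drawn as a fraction of the axis. The income figures are made-up placeholders rather than the real UK data, and `bar_fraction` is a hypothetical helper of my own, not something a spreadsheet exposes.

```python
import math

# Placeholder incomes by percentile (illustrative only, NOT the real UK figures).
incomes = {10: 10_000, 50: 25_000, 90: 55_000, 99: 170_000}


def bar_fraction(value, axis_min, axis_max, log=False):
    """Return the drawn height of a bar as a fraction of the axis length."""
    if log:
        # On a log axis, positions are proportional to log10 of the value.
        value, axis_min, axis_max = (math.log10(v) for v in (value, axis_min, axis_max))
    return (value - axis_min) / (axis_max - axis_min)


axis_max = 200_000
for label, axis_min, log in [
    ("linear, axis from 0", 0, False),
    ("log, axis from 1,000", 1_000, True),
    ("log, axis from 1", 1, True),
]:
    heights = {p: round(bar_fraction(v, axis_min, axis_max, log), 2) for p, v in incomes.items()}
    print(label, heights)
```

With these placeholder values the tallest bar is 17 times the shortest on the linear axis, but the lower the minimum of the log axis, the closer the bars come to the same height.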

And we can go to the opposite extreme: in a pie chart, the high income earners (top 1%) seem to take a massive share of the pie.
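A pie chart draws each value as a share of the total of everything plotted, which is why one large value dominates. A rough sketch, using placeholder numbers rather than the real data (`pie_shares` is a hypothetical helper):

```python
# Placeholder incomes by percentile (illustrative only, NOT the real UK figures).
incomes = {10: 10_000, 50: 25_000, 90: 55_000, 99: 170_000}


def pie_shares(values):
    """Return each entry's share of the total, as a pie chart would draw it."""
    total = sum(values.values())
    return {label: value / total for label, value in values.items()}


shares = pie_shares(incomes)
for percentile, share in shares.items():
    print(f"{percentile}th percentile: {share:.0%} of the pie")
```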

Code

I am not proficient in code, and find that embarrassing. That is an odd statement: I’m not proficient at many things (skateboarding, speaking Portuguese, playing the cello), but few of those cause me to feel embarrassment. Technological proficiency, unlike skating or playing a string instrument, is a part of my identity, and I feel that being able to code (or at least write basic scripts) is a part of being good with tech. When I go to write code, I am confronted by the gap between what I feel my skill should be and where my skills actually are, which is uncomfortable. Moreover, it causes me to hide and refrain from asking for help, so I end up doing things in a slow and repetitive way. To push back against that urge, here is an attempt to write a simple Python script to extract folder information from a directory tree.

import os

print("This removes files from saved directory trees")
path = "./"
print("I will run in the local folder")

for filename in os.listdir(path):
    if filename.endswith(".txt"):
        # Output is written to the working directory, e.g. tree.txt -> tree_folders.txt
        output_name = filename[:-4] + "_folders.txt"
        linecount = 0
        # Opening both files in one "with" ensures they are closed afterwards
        with open(os.path.join(path, filename), "r") as source, \
                open(output_name, "w") as output:
            for line in source:
                # Keep a line only if it is long enough, does not end in a
                # 3- or 4-character file extension, and does not end in "Cloud"
                if (len(line) > 6
                        and line[-6] != "."
                        and line[-5] != "."
                        and line[-6:] != "Cloud\n"):
                    output.write(line)
                    linecount += 1
        print("I found " + str(linecount) + " lines and put them into " + output_name)
    else:
        print("I did not process " + filename)
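The checks in the script can also be pulled out into a single function, which makes the rule easier to try on its own. This is a sketch of the same logic, not a more robust parser; `looks_like_folder` and the sample lines are hypothetical names and data of my own.

```python
def looks_like_folder(line):
    """Apply the script's rule to one line of a saved directory tree.

    A line is kept when it is longer than 6 characters, does not end in a
    3- or 4-character file extension, and does not end in "Cloud". Each line
    is expected to end with a newline, as it does when read from a file.
    """
    return (
        len(line) > 6
        and line[-6] != "."         # e.g. ".docx\n"
        and line[-5] != "."         # e.g. ".txt\n"
        and line[-6:] != "Cloud\n"
    )


# Hypothetical lines from a saved directory tree.
sample = [
    "|-- Projects\n",
    "|   |-- report.docx\n",
    "|   |-- notes.txt\n",
    "|-- iCloud\n",
]
folders = [line for line in sample if looks_like_folder(line)]
print(folders)
```

Only the "Projects" line survives here. A folder whose name happens to contain a dot four or five characters from the end would be dropped too, which is a limitation of the rule itself.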

Photos from the week

From the Tate Modern exhibition by Olafur Eliasson

2019 Week 10: Fitness in Science

Short version: I grew up thinking science-y people weren’t fit, but there is plenty of fitness in science, and scientific reasons to keep fit. I share some thoughts on anatomy, metrics, protein powder, and astronauts.

Long version:

Personal Observations

I remember thinking of exercise as inherently a waste of time; why would you ever want to run in circles and just end up at the same place? I’m sure this was in part informed by the media I consumed growing up, which portrayed the stereotypical nerd as interested in mathematics, science and technology but lacking physical fitness (and, in retrospect, portrayed very fit people as not particularly bright). I identified with those archetypes and spurned exercise through much of school, as did many of my peers. It was later in life that I realised improving cardiovascular endurance was important to health. Once I started running, I discovered the joy of Runner’s High. A competitive mindset and an internship in an anti-doping laboratory led me to build regular exercise into my routine, something I’ve enjoyed maintaining for the past few years.

Athletes’ Anatomy

Athletes setting world records are obviously different from the norm. Skill, dedication, talent, training, and genetics all contribute. I find conversations about athletes’ success tend to drift towards the genetic element. Perhaps the intrigue is due to the allure of quantifying potential, or perhaps it provides a comforting fatalism for the undertrained. Most likely it is interesting simply because it is poorly understood compared with the simplicity of regular training or perceptible skill.

David Epstein gave a TED talk in 2014 where he shared a number of facts about the nature of athletes’ physicality. It particularly stood out to me that a transition in sport occurred (in parallel with the rise of broadcast media) from favouring a generalist body type of average proportions, to a plurality of extremes. One of the most memorable statistics is that Hicham El Guerrouj and Michael Phelps, who differ in height by 17 cm, have legs of the same length (running favours longer legs relative to height, whereas swimming favours the opposite). These characteristics are difficult to change: no amount of training will allow these two to exchange their body types. Training can, however, alter other aspects of the body to similar extremes.

Physiological adaptations from training can be as radical as the size difference between NBA basketball players and Olympic gymnasts. Specifically, athletes’ hearts really are significantly bigger than those of the untrained population (particularly endurance athletes). The body responds to stress, and all training rests on using repeated exertion to drive adaptations that improve performance in a given activity. When I worked in anti-doping, an office legend described a cycling team that, in the days before blood doping was banned or the ban effectively enforced, would need to sleep with heart rate monitors that would wake them if their heart rate got too low, for fear of their hearts stopping altogether.

Marathon Times and Personal Metrics

I’m pretty motivated by quantifiable goals: either arbitrary times (usually round numbers) or achieving a certain relative performance (e.g. placing in the top 1%). This paper examining marathon finishing times suggests I’m not alone. Times tend to bunch just below “whole numbers” such as 3 hours and 4 hours, with smaller bunching at 5 minute increments, as people dig a little deeper to get below their goal times.

Links:
More statistics on half marathons and marathons. BAA Marathon and Half-Marathon results with the code shown. (I would like to be able to code informative charts like this.)

Protein Powder

The literature suggests that, when combined with training, protein supplementation increases gains in strength. I find that protein powder is a convenient way to add protein to my diet, particularly as a vegetarian. The NHS points out that the same benefits can be achieved from other protein-rich foods, and that protein powder’s lack of vitamins and nutrients compared to a balanced meal makes it a poor replacement for meals. It also recommends not exceeding intakes of 111 g per day for men or 90 g for women, which more or less concurs with the BMJ’s study suggesting the benefits of protein supplementation cease after 1.62 g/kg/day, i.e. roughly 120 g for a 75 kg person.
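The arithmetic behind that comparison is simple enough to check. A small sketch, assuming the 1.62 g/kg/day figure above and treating the body weights as arbitrary examples (`plateau_intake` is a hypothetical name):

```python
PLATEAU_G_PER_KG = 1.62  # g of protein per kg of body weight per day, from the study mentioned above


def plateau_intake(body_weight_kg):
    """Daily protein intake (g) beyond which the study saw no further benefit."""
    return PLATEAU_G_PER_KG * body_weight_kg


# Example body weights only; swap in your own.
for weight in (60, 75, 90):
    print(f"{weight} kg: about {plateau_intake(weight):.0f} g per day")
```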

Importantly from an environmental perspective, looking at the World Resources Institute protein scorecard I wrote about in Week 4, dairy (from which whey protein is sourced) has the third highest impact, more than chicken and pork. Fortunately vegetable sources (e.g. pea protein) have a much lower footprint than conventional animal sources, and pea protein is just as effective as whey protein in producing additional muscle growth.

That all said, there are good reasons to be skeptical of any benefit of supplementation at all beyond a healthy balanced diet. Trying to define a healthy balanced diet though could easily be several papers (or blog posts) by itself.

NASA Twin Study

I am eagerly awaiting the release of the integrated paper covering the NASA Twin Study. I suspect this will be the most intensive series of measurements made of any individual for some time. A brief summary by the Scientific American.

Photos from the Week: Solid water.

In the first photo, unusually clean ice traps dissolved gases as they are forced out of solution. The second and third photos show Oxford’s spring weather variation.

Focus and LaTeX

Job Hunting Update
My primary goal is still finding work. So far that looks like:

   Positions Considered   200
   Applications            50
   Rejections               7
   Recruiter Calls       Many
   Interviews               6
   Offers                   0

Unfortunately still no offers. Progress is accelerating, though, now that I have found clarity in my own goals.

CV and Resume
In this job application period, I’ve used three CVs/resumes:
CV Version 1: Simple bullet point list.
CV Version 2: Google Docs template, better design but less content.
CV Version 3: LaTeX template, content dense whilst still fairly clean.
I am happy with the result of Version 3, and hopefully it impresses some employers. The process of formatting content was itself an act of introspection, a useful reminder that presentation is intertwined with content in transferring meaning. Also it is hard to send a message if you are not sure of the content yourself.

Personal Aims
In preparing my latest CV, I felt the need to include my personal aim in this job search, which I’ve distilled to the following:

Find a career solving complex and rewarding problems, with opportunities to develop skills and knowledge. Work with a diverse and experienced team, aspiring to one day lead investigative research and development.

I want to find work where I can grow, and while education and finance are interesting fields, it is the application of scientific and logical tools, rather than the content itself, that interests me. Career progression in scientific research seems to require a PhD. The fastest path would be to do honours at the University of Sydney, but having spent nearly 7 years in and around USyd I feel it is time to diversify my experience. Additionally, maintaining a relationship over the longest possible distance can be quite tedious.

LaTeX
Last weekend I finally familiarised myself with the document preparation system LaTeX. I had played with it a little before, but hadn’t taken the time to go through a tutorial in its entirety. I’m using the editor Texmaker and have been happy with it.

Writing from the Oxford Hacker-space.