array reduce method is useful :)

In the D3 documentation, Mike Bostock writes: 

When using D3—and doing data visualization in general—you tend to do a lot of array manipulation. That’s because D3’s canonical representation of data is an array. Some common forms of array manipulation include taking a contiguous slice (subset) of an array, filtering an array using a predicate function, and mapping an array to a parallel set of values using a transform function. Before looking at the set of utilities that D3 provides for arrays, you should familiarize yourself with the powerful array methods built-in to JavaScript.

I’ve been impressed with how much I can do if I use these built-in methods.  And impressed by how anything I learn about these built-in methods immediately makes my life easier (and my programming more effective).  But there are still some that I’m underusing or not using at all.  When reading someone else’s code today, I noticed them using “reduce” quite effectively.

MDN explains reduce here.  

I can’t wait to use this for doing things like finding the max of an array, but for more complicated comparisons than max.  In the past, when facing operations like this, I’ve been using “forEach” instead with a variable that I initialized outside of the forEach as the previousValue.  This will be much cleaner.
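Here is a small sketch of that idea. The `rides` data and its field names are made up for illustration. The point is that reduce carries the "best so far" value from one element to the next, so no variable has to live outside the loop:

```javascript
// Hypothetical sample data; the field names are illustrative only.
const rides = [
  { day: "Mon", count: 120, minutes: 1500 },
  { day: "Tue", count: 95, minutes: 1900 },
  { day: "Wed", count: 130, minutes: 1300 },
];

// Plain max: the accumulator holds the largest count seen so far.
const maxCount = rides.reduce(function (previousValue, ride) {
  return Math.max(previousValue, ride.count);
}, -Infinity);

// A more complicated comparison: the day with the longest average trip.
// With no initial value, reduce starts from the first element.
const longestAverage = rides.reduce(function (best, ride) {
  return ride.minutes / ride.count > best.minutes / best.count ? ride : best;
});

console.log(maxCount);           // 130
console.log(longestAverage.day); // "Tue"
```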

sometimes things are easier than they seem

It’s been a little while, but here goes again.  Think I’ll go for shorter posts now 🙂

So, the truth is that I think of some aspects of html, especially html5, as sort of magical. Today I wanted to create a slider.  And… it turns out it’s not that hard :).  

Example at jsfiddle: link

Create an <input> in html with type ‘range’ and min/max values, and listen for the value.
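A minimal sketch of the same thing done from JavaScript, in case it helps. The function name `createSlider` and the callback name `onValue` are made up for this example, and the document object is passed in as `doc` so the function has no hidden dependencies:

```javascript
// Build an <input type="range"> and report its value whenever it changes.
// `doc` is the page's `document`; `onValue` is a hypothetical callback.
function createSlider(doc, min, max, onValue) {
  const input = doc.createElement("input");
  input.type = "range";
  input.min = String(min);
  input.max = String(max);
  // The "input" event fires while the handle is being dragged.
  input.addEventListener("input", function () {
    onValue(Number(input.value)); // input.value is always a string
  });
  return input;
}

// In a browser:
// document.body.appendChild(
//   createSlider(document, 0, 100, function (v) { console.log(v); })
// );
```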

transitions & update in d3

Today’s goal was to respond to events in d3.

My bible for this is Scott Murray’s fantastic book, “Interactive Data Visualization for the Web”, particularly chapter 9 here.  The goal was to make a box and click on something to make that box change color.

Nothing spectacular, but here it is!  link  It is pretty remarkable how easy d3 makes this.

Bonus points included making it happen more than once (bigger & smaller, loop through a color list) and figuring out the div IDs to respond differently to clicking on different parts of the text.

[edit] – and wow, responding to mouse events is so easy & awesome! link

color interpolation

It’s been a little while as I started playing with some real, but confidential, data.  But, I’m back to talk about colors!  This is also my first time playing with the scales functions from D3.  Live code here.

numbers –> colors

My goal was to turn numbers into colors that were quantitatively meaningful.  To do this, I used the scales functions from D3 – documentation & tutorial.  These functions create a mapping from an input domain to an output range.  Then, pass in any value in the domain and the function will return the corresponding value in the range.  This is massively useful for mapping data to a number of pixels when plotting, without changing the underlying dataset (so you can still do mathematical operations on it, display the raw values, etc).  As a bonus, it can also be used to find a mapping between numbers and colors, and to interpolate between two colors.
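To make the domain-to-range idea concrete, here is a minimal plain-JavaScript sketch of a linear mapping. It is not D3's implementation, and the names `linearScale` and `mixRgb` are made up for this illustration. It only shows the arithmetic behind a linear scale and a channel-by-channel RGB blend:

```javascript
// Map a value from the input domain to the output range by linear interpolation.
function linearScale(domain, range) {
  return function (value) {
    const t = (value - domain[0]) / (domain[1] - domain[0]);
    return range[0] + t * (range[1] - range[0]);
  };
}

// Blend two [r, g, b] colors channel by channel; t runs from 0 to 1.
function mixRgb(from, to, t) {
  return from.map(function (channel, i) {
    return Math.round(channel + t * (to[i] - channel));
  });
}

// Data values 0..200 mapped to pixel positions 0..500.
const toPixels = linearScale([0, 200], [0, 500]);
console.log(toPixels(50)); // 125

// Data values 0..200 mapped to a color between blue and orange.
const toFraction = linearScale([0, 200], [0, 1]);
const blue = [0, 0, 255];
const orange = [255, 165, 0];
console.log(mixRgb(blue, orange, toFraction(100))); // [ 128, 83, 128 ]
```

The underlying data never changes: the scale is just a function you call at drawing time.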

For colors, Mike Bostock’s D3 offers interpolation in 3 different color spaces: RGB, HSL, and HCL.  Curious to understand the difference between these color spaces, I created a little script to visualize them.  link  Change the colors in the javascript on line 8 to a different pair & press “run” to compare different end points for the colors.

Which color scale do you think looks the most natural?  Which one shows a quantitative gradient from one side to the other?

blue to orange

Screen Shot 2013-08-16 at 12.02.52 PM

Screen Shot 2013-08-16 at 12.05.15 PM

So, what are these color spaces anyway?  

From Wikipedia:

RGB uses additive color mixing, because it describes what kind of light needs to be emitted to produce a given color. Light is added together to create form from out of the darkness. RGB stores individual values for red, green and blue. RGBA is RGB with an additional channel, alpha, to indicate transparency.

HSL (hue, saturation, lightness/luminance), also known as HLS or HSI (hue, saturation, intensity) is quite similar to HSV, with “lightness” replacing “brightness”. The difference is that the brightness of a pure color is equal to the brightness of white, while the lightness of a pure color is equal to the lightness of a medium gray.

HCL is the least written about, at least as far as my Google searches go, but it seems quite interesting.  From hclcolor.org: “HCL is an acronym for Hue, Chroma, Luminance. It is a color-space which lends itself to easy control of the visual impact that colors have on our perception” (link).

Ross Ihaka: “There is a problem associated with choosing colours in the RGB or HSV spaces; the way that colours are spaced in these spaces is not in accordance with our perception of how similar or different the colours appear to be.”

And…  “it is crucial to rely on the distance criterion which states that the distance D(c1, c2) between two colors c1 and c2 is correct if and only if the distance value is close to the difference perceived by the human eye.”

In other words, HCL attempts to align distance in the color space with the perceived difference between two colors.  The question is, does it succeed?  🙂

Edit: this appears to be another color space.

callback functions

I’m in the stage where I’m starting to be able to do more, but nothing is easy yet :).  Fortunately, I’m heading out to Western CT for a very hilly & rocky 7 mile trail race early tomorrow morning, so by tomorrow afternoon my body will be as tired as my brain is now.

This week was effectively a 3-day week as I spent Monday prepping & giving an unrelated presentation & yesterday was an offsite with the office.  Before I head out for the weekend, just wanted to celebrate two things: reading in real data & creating a day-of-week profile.

London makes data from their cycle program easily available here.  This is just the number of rentals per day since summer 2010.  Reading in data isn’t so hard now.  But I had to wrap my head around the nature of callbacks & asynchronous loading.  More on that later.

Creating the day of week profile required a lot more array manipulation than I’d done before, in particular leveraging native array methods like “forEach” and “filter.”  This gave me more practice with callbacks as well.  And, as a bonus, I got to spend time with date handling… which was an adventure.  Next week I might try translating dates into numbers, instead of treating them as dates, with a function for going to & from the date strings/numbers.
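Roughly, the day-of-week profile boils down to something like the sketch below. The `rows` array and its field names are invented for the example (the real file has many more rows), and the weekday is read in UTC so the result does not depend on the machine's time zone:

```javascript
// Hypothetical rows shaped like a parsed CSV: a date string and a rental count.
const rows = [
  { date: "2013-07-28", rentals: 100 }, // Sunday
  { date: "2013-07-29", rentals: 200 }, // Monday
  { date: "2013-08-04", rentals: 140 }, // Sunday
  { date: "2013-08-05", rentals: 260 }, // Monday
];

// Day of week (0 = Sunday ... 6 = Saturday), read in UTC.
function dayOfWeek(dateString) {
  return new Date(dateString + "T00:00:00Z").getUTCDay();
}

function mean(values) {
  return values.reduce(function (sum, v) { return sum + v; }, 0) / values.length;
}

// For each weekday, filter the matching rows and average their counts.
const profile = [];
[0, 1, 2, 3, 4, 5, 6].forEach(function (dow) {
  const matches = rows.filter(function (row) {
    return dayOfWeek(row.date) === dow;
  });
  if (matches.length > 0) {
    profile.push({
      dow: dow,
      average: mean(matches.map(function (row) { return row.rentals; })),
    });
  }
});

console.log(profile);
// [ { dow: 0, average: 120 }, { dow: 1, average: 230 } ]
```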

In the chart below, blue is weekends & red is Monday.  The bottom chart shows the typical day compared to average, with Sunday on the left.  Obviously I need to add in some labels to make this more meaningful.  I think that Monday is being dragged down to average a bit by holidays, which I haven’t accounted for.

Screenshot from 2013-08-02 17:53:57

 

Will put it on jsfiddle on Monday.

Next steps next week: scales & mouseovers & (maybe) reading in data using native javascript!

reading in real data!

So, the coolest thing about learning how to write code for the web is that every website is an example.  Today’s goal was to figure out how to read in real data (in more than one way).  I found this super mysterious on Friday when I started looking into it, but knew that I was missing some core concept.  Today I dove back in, found some good references, and figured out what I was missing conceptually.

At first I was looking at this documentation, but realized I was missing something about where the data got stored.  In the past, in R, reading in data looked something like “data.frame = read.csv(‘wheredatais.csv’, parameters… )”.  But in d3, it’s not about setting a variable equal to whatever comes out of d3.tsv/d3.csv/d3.json.  Rather, it’s about passing a function inside the d3.csv parentheses, and that function either uses the data or stores it in some variable.  I’m just figuring this out, so am likely butchering the explanation :).  But launch & iterate, right?

The key was reading this note on Stack Overflow:

The principle behind d3.json() is to do everything in this function, which will be executed when the json is loaded:

var data; // a global
d3.json("path/to/file.json", function(error, json) {
  if (error) return console.warn(error);
  data = json;
  visualizeit();
});

and this from Mike Bostock’s original documentation on the more general topic of data requests:

Also, you may find it convenient to save loaded data to the global namespace, so that you can access it after the initial render, such as during a transition. You can do this using closures, or simply assign the loaded data to a global:

var data; // a global

d3.json("path/to/file.json", function(error, json) {
  if (error) return console.warn(error);
  data = json;
  visualizeit();
});

Once I understood that the function is the key, I was able to load in data.  I wanted to find data that was universally accessible.  So I found the d3.tsv call in the page source for a New York Times interactive visualization on picking players for the NFL draft.  In my browser’s console, I wrote:

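var data;

// The callback stores the parsed rows once the file has loaded.
d3.tsv("http://graphics8.nytimes.com/newsgraphics/2013/04/18/nfl-draft/d9c0f4a098da672625afadfee7be4135fb0f724b/picks.tsv", function(error, picks) { data = picks; });

// Evaluate this once the request has finished.
data;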

And there it was – a beautiful array!  Easy peasy :).
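To convince myself of the timing, here is a minimal sketch that needs no network at all. `fakeLoad` is a made-up stand-in, not a D3 function; it just hands some invented rows to the callback a moment later, the way a real request would:

```javascript
// Stand-in for d3.json / d3.tsv: `fakeLoad` is a hypothetical helper.
// It calls back later with (error, rows), like a network request would.
function fakeLoad(path, callback) {
  setTimeout(function () {
    callback(null, [{ pick: 1 }, { pick: 2 }]);
  }, 0);
}

var data; // a global, as in the quoted snippets

fakeLoad("path/to/file.tsv", function (error, rows) {
  if (error) return console.warn(error);
  data = rows;
  console.log("inside callback:", data.length); // runs later: 2
});

// This line runs first, before the callback has fired.
console.log("right after the call:", data); // undefined
```

That is why everything that needs the data has to happen inside the function, or be called from it.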

histogram success!

[live code here – each time you click run, you’ll get a new random walk as the dataset]

One of the cool side-effects of creating graphs from scratch is having to really think about what defines a particular graph or function.  Today was about the histogram!

Ignore the terrible color choices and lack of labels.  Better things will come in time.  The left chart shows the values of the random walk.  Red highlights areas of significant change.  On the right is a histogram of the values.  Pink marks negative bins and grey marks positive ones.  I’m currently using a bin width of 7.  There are 1000 values, with a start point of zero and a maximum step between points of 5.
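For reference, a random walk with those parameters can be sketched as below. This is one possible version, not necessarily what the live code does: the function name `randomWalk` is made up, and it assumes each step is drawn uniformly between -maxStep and +maxStep:

```javascript
// Generate a random walk: begin at `start`, then move by a random amount
// between -maxStep and +maxStep at each step (uniform steps are an assumption).
function randomWalk(length, start, maxStep) {
  const values = [start];
  for (let i = 1; i < length; i++) {
    const step = (Math.random() * 2 - 1) * maxStep;
    values.push(values[i - 1] + step);
  }
  return values;
}

const walk = randomWalk(1000, 0, 5);
console.log(walk.length); // 1000
```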

Screenshot from 2013-07-25 17:18:11

I actually wrote three slightly different functions for binning, all based on the suggestion here.  The main challenge was what I wanted to do with the bin & value information once I had it.  Put it in an array?  In an object?  In an array of objects?  If you click on the super tiny picture below, you can see the three ways I was considering storing the data.  While it made the histogram function a little less elegant, I eventually opted for the array of objects in order to make it easy to create the chart (test2 in the picture).  Deciding how to store the data was a great exercise in getting to know arrays vs objects more intimately.  Sending out a thank you to my colleague Cesar for helping me think through the options!
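A minimal sketch of the array-of-objects approach, for illustration only. It is not one of the three functions in the screenshots; the name `histogram` and the `bin`/`start`/`count` fields are made up here. Counts are collected in a plain object first, so negative bins cause no trouble, and then converted to a sorted array that is easy to hand to a chart:

```javascript
// Count values into fixed-width bins and return an array of objects,
// one per non-empty bin, sorted by bin index.
function histogram(values, binWidth) {
  const counts = {}; // bin index -> count
  values.forEach(function (value) {
    const bin = Math.floor(value / binWidth);
    counts[bin] = (counts[bin] || 0) + 1;
  });
  return Object.keys(counts)
    .map(Number)
    .sort(function (a, b) { return a - b; })
    .map(function (bin) {
      return { bin: bin, start: bin * binWidth, count: counts[bin] };
    });
}

console.log(histogram([-8, -3, 0, 2, 6, 7, 13], 7));
// [ { bin: -2, start: -14, count: 1 },
//   { bin: -1, start: -7, count: 1 },
//   { bin: 0, start: 0, count: 3 },
//   { bin: 1, start: 7, count: 2 } ]
```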

Screenshot from 2013-07-25 17:18:26

If you look closely, you’ll also notice an interesting javascript oddity.  The “test” array seems to have fewer elements than the others.  This is because each entry was created as array[bin] = value.  When the bin is non-negative, the array treats it as an index, and indices start at 0.  The methods on this array also assume that it starts at zero, so array.length only reflects the non-negative bins.  Interestingly, you can still access the negative ones with array[-1], etc.  They’re still there, sort of: javascript stores them as ordinary properties on the array object rather than as elements.  Mostly, javascript didn’t mind that I created an array with negative indices, so it didn’t stop me; it’s my problem to figure out if that’s what I actually wanted to do or not :).
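The oddity is easy to reproduce in the console. A tiny sketch (the `bins` array and its values are made up):

```javascript
const bins = [];
bins[0] = 3;
bins[1] = 2;
bins[-1] = 5; // not an array index: stored as an ordinary property named "-1"

console.log(bins.length);       // 2
console.log(bins[-1]);          // 5
console.log(Object.keys(bins)); // [ '0', '1', '-1' ]

// Array methods only visit the real indices 0 and 1.
let visited = 0;
bins.forEach(function () { visited += 1; });
console.log(visited);           // 2
```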

Three versions below (click to enlarge):

Screenshot from 2013-07-25 17:30:41

next steps

some possible challenges for the day:

  • create histogram
  • highlight all bars larger than the one I’ve selected
  • create fake data that mimics timeseries (w/ seasonality, dow, noise, trend, holidays)

Also, starting to find myself thinking about where to define things in the code.  Goal for the moment is to think about how I can define meaningful variables, and key everything off of those.

bars: step 2 – learning from the web

Need to learn a little more about data handling in javascript.  Good reference from Bostock here.  Scales tutorial by Scott Murray here.

Also, some new SVG shapes like lines.  (reference) 🙂

This stat library could be useful?  http://www.jstat.org/

Success!  Now I can make charts of random walks of data, and highlight times of change 🙂

Screenshot from 2013-07-24 18:47:14

 

and my notes for making the graph 🙂

 

IMG_20130724_185039