Tuesday, 30 April 2013

Bar Charts

I wanted to make a bar chart for a recent post on my real blog (comparing the citation counts that different citation databases report for my recent papers).

I needed to look up enough of the details of how to do it in gnuplot that I thought I'd document it here, both for my own use and in the hope that others may get some help from it.  My thanks to the gnuplotting blog for some useful tips that helped me get started, and for the nice idea that less prominent grids and tick labels make things easier to read.

The figure is above, and the necessary code follows, with commentary below:
set term 'pngcairo' font 'Helvetica,12'
set out 'cites.png'

set style line 1 lc rgb '#440000' lt 1
set style line 2 lc rgb '#882200' lt 1
set style line 3 lc rgb '#bb6622' lt 1
set style line 4 lc rgb '#ffcc66' lt 1

set style line 11 lc rgb '#808080' lt 1
set border 3 back ls 11
set tics nomirror
set xtics 1,1.0,25.0 offset -1.5,0.0
set style line 12 lc rgb '#808080' lt 0 lw 1
set grid back ls 12

set style fill solid 1.0 border rgb 'grey30'

set ylabel 'citations'
set xlabel 'Paper Index'
set xrange [0:25]

bw=0.15
plot 'cite-data' u ($0+0.5-1.5*bw):1:(bw) w boxes ls 1 t 'Google Scholar', \
     'cite-data' u ($0+0.5-0.5*bw):2:(bw) w boxes ls 2 t 'Scopus', \
     'cite-data' u ($0+0.5+0.5*bw):3:(bw) w boxes ls 3 t 'ISI', \
     'cite-data' u ($0+0.5+1.5*bw):4:(bw) w boxes ls 4 t 'Journal site'
I used the pngcairo terminal, in the belief that it is better than the png terminal, but I should really investigate that.
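
One quick way to investigate would be to redraw the same plot with the png terminal and compare the two images side by side.  This is only a sketch: it assumes the script above has just been run in the same gnuplot session (so that replot has something to redraw) and that your gnuplot build includes the png terminal.  The output file name is made up for the example.

```gnuplot
set out                    # close cites.png so it is fully written
set term png               # the older png terminal, for comparison
set out 'cites-png.png'    # hypothetical file name for the second image
replot                     # redraw the last plot on the new terminal
set out                    # close the second file as well
```

The font option is left off the png terminal here, so any difference in the text may be partly down to the default font rather than the terminal itself.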

The block of "set style line" commands is used just to set the colour of the filling of the bars in the bar chart.  If you follow the link to the blog post from my other blog that I mentioned above, you'll see that I didn't have those colour-changing lines, and used the default colours instead.  I'm not sure which I prefer, but this at least shows how you can change them.

The next block sets up the border, tick marks, labels and grid.  Line styles 11 and 12 are used to make the border and grid grey rather than black, so that the actual data stands out.  The set tics nomirror command removes the tick marks from the top and right edges, and I used the set xtics command to place the tick marks myself (from 1 to 25 in steps of 1).  I set up the data file to have no abscissa column, so each row is plotted by its ordinal number, and a little playing with the offset enabled me to get the tick labels in the right place.

The set style fill line is needed to fill in the bars.  If you don't set this, you'll just have box outlines, so if that's what you want, don't have this line.  You can specify the border to be a different colour from the fill, as indicated.
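
As a rough illustration of the options, here are a few variations on that line.  They are alternatives, so you would use just one of them in place of the set style fill line in the script; the density of 0.5 is an arbitrary value chosen for the example.

```gnuplot
# Outlines only, drawn in grey (much the same as having no fill line at all)
set style fill empty border rgb 'grey30'

# Lighter fill (density 0.5) with the same grey border
set style fill solid 0.5 border rgb 'grey30'

# Full fill with no border drawn around the bars
set style fill solid 1.0 noborder
```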

The last section sets up the bars.  I have four bars to show per abscissa value (that is, per paper).  I decided to make each one take up 0.15 of the width available for each data point.  The bw=0.15 sets up a variable to store this.  In the plot command, the $0 is the row number of each data point (counting from zero), which serves as the x-value since I don't provide x-values in the cite-data file.  The calculations such as $0+0.5-1.5*bw specify the x-position of the centre of each bar, and the (bw) its width.
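
To check the arithmetic, the bar centres can be printed from within gnuplot.  This is a small sketch rather than part of the original script: the names n and centre are my own, and the do for loop needs a reasonably recent gnuplot (4.6 or later).

```gnuplot
bw = 0.15    # width of one bar, as in the script
n  = 4       # number of bars per paper

# Centre of bar k (1 to n) for row number r, where r plays the role of $0
centre(r, k) = r + 0.5 + (k - (n + 1) / 2.0) * bw

# Print the four centres for the first paper (row 0)
do for [k=1:n] {
    print sprintf('bar %d: x = %.3f', k, centre(0, k))
}
```

For the first paper the centres work out as 0.275, 0.425, 0.575 and 0.725, so the group of four bars runs from 0.2 to 0.8, centred on 0.5, with a gap of 0.4 before the next group starts.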

If you want to play with this, here is the file cite-data referred to in the gnuplot script.