Fluffykittens
Rev Dan Catt’s Blog

Last week on Flickr
Guardian Datablog Competition & Cleaned up JSON data for “1,000 songs to hear before you die”
A short while ago I was playing with the data from “1,000 songs to hear before you die” from the Guardian’s Datablog. The idea was to find the artists they’d missed by running all the bands/artists through last.fm, looking for the similar artists that appeared most often but weren’t already on the list.
I got as far as cleaning up the data (more on that later), adding the MusicBrainz IDs (mbids) for each artist and converting it to JSON, a format I tend to work with a lot. But it turns out that moving country takes up a surprising amount of time, so I put it to one side for later.
Now I see that they’re running a competition to visualize some of their data. On the grounds that someone may find this useful, and because I wanted to give GitHub a try, I’m throwing out the data I cleaned up. Hopefully, if I followed the instructions correctly, it’s over here …
Guardians-1-000-songs-to-hear-before-you-die
The file you’re specifically after is js/guardian_1000_songs.js, which will give you these handy JS object thingys:
- guardian_1000_songs.data: the original data, cleaned up from the Guardian’s spreadsheet
- guardian_1000_songs.artists: all the artists, along with the number of times they’re referenced, the MusicBrainz ID (mbid), whether last.fm knows the mbid, and the tracks
- guardian_1000_songs.artists_a: an array holding the keys of the artists object
Obviously this is geared towards JavaScript and AJAX API-type calls, but it shouldn’t take much effort to convert it back into spreadsheety formats.
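If you do want it back in a spreadsheet, here’s a rough sketch of turning the artists into CSV: walk `artists_a` for the keys and look each one up in `artists`. The sample records, the field names `count` and `tracks`, and the `csvField`/`artistsToCsv` helpers are all hypothetical stand-ins; `mbid` and `mbid_known_by_lastfm` are the names from the data, but check the actual file for its real shape before relying on this.

```javascript
// Hypothetical sample shaped like the description above; the real file's
// field names for the reference count and the tracks may differ.
const sampleSongs = {
  artists: {
    "Artist One": { count: 2, mbid: "placeholder-mbid-1", mbid_known_by_lastfm: true, tracks: ["Song A", "Song B"] },
    "Artist Two": { count: 1, mbid: "0", mbid_known_by_lastfm: false, tracks: ['Song "C"'] }
  },
  artists_a: ["Artist One", "Artist Two"]
};

// Quote a field if it contains a comma, double quote or line break,
// doubling any embedded quotes (the usual RFC 4180 convention).
function csvField(value) {
  const s = String(value);
  return /[",\r\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

// One CSV row per artist, in the order given by artists_a.
function artistsToCsv(songs) {
  const rows = [["artist", "count", "mbid", "mbid_known_by_lastfm", "tracks"]];
  for (const key of songs.artists_a) {
    const a = songs.artists[key];
    rows.push([key, a.count, a.mbid, a.mbid_known_by_lastfm, a.tracks.join("; ")]);
  }
  return rows.map(row => row.map(csvField).join(",")).join("\n");
}

console.log(artistsToCsv(sampleSongs));
```

Save the output as a `.csv` file and any spreadsheet program should open it.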
I’ve also included an example file, datagrab.html, that should work in Firefox; it’s untested on anything else.
A note on the data and what to do with it
Each artist has a MusicBrainz ID, so you can call the last.fm API with the mbid to get more information about that artist, including similar artists, like this …
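A minimal sketch of that kind of request, assuming last.fm’s API 2.0 `artist.getSimilar` method, which accepts an `mbid` parameter and can return JSON. `YOUR_API_KEY` is a placeholder for your own last.fm key, and `similarArtistsUrl`/`fetchSimilarArtists` are made-up helper names.

```javascript
// Build a last.fm API 2.0 URL asking for artists similar to the one with this mbid.
function similarArtistsUrl(mbid, apiKey, limit = 20) {
  const params = new URLSearchParams({
    method: "artist.getsimilar",
    mbid: mbid,
    api_key: apiKey,
    format: "json",
    limit: String(limit)
  });
  return "https://ws.audioscrobbler.com/2.0/?" + params.toString();
}

// Fetch and parse the JSON (browser, or Node 18+ where fetch is global).
// Defined here for illustration only; it needs a real API key to work.
async function fetchSimilarArtists(mbid, apiKey) {
  const res = await fetch(similarArtistsUrl(mbid, apiKey));
  if (!res.ok) throw new Error("last.fm request failed: " + res.status);
  return res.json();
}

console.log(similarArtistsUrl("some-artist-mbid", "YOUR_API_KEY"));
```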
However, a few records don’t have mbids. These three have no records at MusicBrainz:
- Crystal Mansion
- Grange Hill Cast
- Sheffield Socialist Choir
You can still find them via a normal artist search at last.fm; in the data they have an mbid of “0”.
These six have IDs at MusicBrainz, but last.fm doesn’t recognise them, so I’ve flagged this in the data with “mbid_known_by_lastfm”:
- Bon Iver
- Glasvegas
- Lou Reed & John Cale
- Sam Mayo
- The Ting Tings
- William Blake
For the life of me I can’t work out why last.fm doesn’t have an ID for The Ting Tings, but there you go.
So out of 994, you now have 985 music tracks with an artist mbid that last.fm knows about.
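Putting those two cases together, here’s a rough sketch of choosing how to ask last.fm about an artist: use the mbid when there is one and last.fm knows it, otherwise fall back to the artist’s name. It assumes `mbid_known_by_lastfm` is a boolean that is `true` when last.fm knows the mbid; `lastfmLookupParams` and `lookupUrl` are made-up helpers, while `artist`, `mbid` and `autocorrect` are parameters of last.fm’s artist methods such as `artist.getSimilar`.

```javascript
// Pick last.fm lookup parameters for one artist record.
// Assumes mbid is "0" when MusicBrainz has no record and that
// mbid_known_by_lastfm is true only when last.fm recognises the mbid.
function lastfmLookupParams(name, artist) {
  if (artist.mbid && artist.mbid !== "0" && artist.mbid_known_by_lastfm) {
    return { mbid: artist.mbid };
  }
  // Fall back to a name lookup; autocorrect=1 asks last.fm to fix misspellings.
  return { artist: name, autocorrect: "1" };
}

// Build an artist.getSimilar URL from whichever parameters were chosen.
function lookupUrl(name, artist, apiKey) {
  const params = new URLSearchParams({
    method: "artist.getsimilar",
    api_key: apiKey,
    format: "json",
    ...lastfmLookupParams(name, artist)
  });
  return "https://ws.audioscrobbler.com/2.0/?" + params.toString();
}

console.log(lookupUrl("Crystal Mansion", { mbid: "0", mbid_known_by_lastfm: false }, "YOUR_API_KEY"));
```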
Cleaning the data
A small aside about getting to this point.
The original Guardian data is over here and, frankly, a bit funky. I grabbed it as a CSV and did a rough conversion to JSON. I then ran through each row in turn, searching for the artist/band via the last.fm API.
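For anyone wanting to repeat the exercise, a rough CSV-to-JSON conversion can be as simple as this sketch: a small parser that copes with quoted fields (commas and doubled quotes inside them), then one object per row keyed by the header. The column names in the example are hypothetical, not the Guardian spreadsheet’s actual headers, and a proper CSV library would be a safer bet for anything serious.

```javascript
// Split CSV text into rows of fields, honouring double-quoted fields that may
// contain commas, line breaks or doubled ("") quotes.
function parseCsv(text) {
  const rows = [];
  let row = [], field = "", inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const c = text[i];
    if (inQuotes) {
      if (c === '"' && text[i + 1] === '"') { field += '"'; i++; } // escaped quote
      else if (c === '"') inQuotes = false;                        // closing quote
      else field += c;
    } else if (c === '"') {
      inQuotes = true;
    } else if (c === ",") {
      row.push(field); field = "";
    } else if (c === "\n" || c === "\r") {
      if (c === "\r" && text[i + 1] === "\n") i++; // treat CRLF as one break
      row.push(field); rows.push(row); row = []; field = "";
    } else {
      field += c;
    }
  }
  if (field !== "" || row.length > 0) { row.push(field); rows.push(row); }
  return rows;
}

// Turn the rows into objects keyed by the (trimmed) header row, skipping blank lines.
function csvToObjects(text) {
  const [header, ...body] = parseCsv(text);
  return body
    .filter(r => r.some(f => f.trim() !== ""))
    .map(r => Object.fromEntries(header.map((h, i) => [h.trim(), (r[i] ?? "").trim()])));
}

console.log(JSON.stringify(csvToObjects('Artist,Title\n"Band, The","Song ""One"""\nSolo,Two\n')));
```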
This turned up a number of records with either more than one match (in which case the first match was normally the correct one) or no matches at all. A closer look at the no-matches showed a number of places where the artist and track name were swapped over, the odd typo, and so on.
Often it turned out that while the Guardian had “Kid Creole and the Coconuts”, last.fm prefers “Kid Creole & The Coconuts”, so I fixed all those up and went over it again.
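The matching pass above can be sketched roughly like this: sort each search result into none, single or multiple matches (taking the first of several, flagged for checking), and compare names after a little normalisation so that “and” versus “&” and capitalisation stop causing misses. This is a hypothetical illustration, not the code actually used; `triage`, `normaliseArtist` and `sameArtist` are made-up names, and `results` is assumed to be a plain array of artist names from a search.

```javascript
// Bucket one row's search results: no match, exactly one, or several
// (in which case take the first, but flag it for a manual check).
function triage(results) {
  if (results.length === 0) return { status: "none", pick: null };
  if (results.length === 1) return { status: "single", pick: results[0] };
  return { status: "multiple", pick: results[0] };
}

// Normalise a name so near-identical spellings compare equal:
// lower-case, " and " becomes " & ", and runs of whitespace collapse.
function normaliseArtist(name) {
  return name
    .toLowerCase()
    .replace(/\s+and\s+/g, " & ")
    .replace(/\s+/g, " ")
    .trim();
}

function sameArtist(a, b) {
  return normaliseArtist(a) === normaliseArtist(b);
}

console.log(sameArtist("Kid Creole and the Coconuts", "Kid Creole & The Coconuts"));
```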
That left me with 60 or so to go through by hand, which took a couple of evenings. I would convert it back to a Google Docs spreadsheet, but I don’t actually have the time atm.
But if you find any mistakes or omissions, or use the data in any interesting ways, let me know in the comments or fix it up on GitHub :)
Rainbow Vomiting Panda Delux – Alpha 1.0
I’ve been playing with the Panda, because … well, just because.
In the meantime, I’ve pushed out an Alpha version (i.e. it’s a bit sketchy in IE), because I’m trying to be at #openhacklondon in spirit :) and it’s Flickr’s 5.25th birthday tonight, which seems like a good alignment of omens to push a Version 1.0 out.
The next version (unless someone else does it) should let you specify a tag/group/user to display instead of the Panda’s selection, and point to a different MP3 to play in the background.
Anyway, it’s here: http://www.fluffykittens.com/projects/panda2/#playmusic=1|autozoom=1.
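The options in that link live in the URL fragment as pipe-separated `key=value` pairs. As a hypothetical sketch (not the page’s own code), reading them back might look like this; `parseHashOptions` is a made-up name, and in a browser you would pass it `window.location.hash`.

```javascript
// Parse a fragment like "#playmusic=1|autozoom=1" into { playmusic: "1", autozoom: "1" }.
// Assumes pipe-separated key=value pairs; a key with no "=" gets an empty value.
function parseHashOptions(hash) {
  const opts = {};
  for (const pair of hash.replace(/^#/, "").split("|")) {
    if (!pair) continue;
    const eq = pair.indexOf("=");
    const key = eq === -1 ? pair : pair.slice(0, eq);
    const value = eq === -1 ? "" : pair.slice(eq + 1);
    opts[decodeURIComponent(key)] = decodeURIComponent(value);
  }
  return opts;
}

console.log(parseHashOptions("#playmusic=1|autozoom=1"));
```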
Back to that “just because” bit … it’s mainly because the Flickr Panda displays well on a portrait-oriented monitor, which just happened to be how I had my monitor set up at work. But it was pointed out that it’d be fun to plug it into a TV or projector, and for that it needs to be landscape. So I grabbed the other Panda and made this instead.
The photos come from the flickr.panda.getPhotos API method. Music is played with SoundManager2, the Panda comes from The Searcher, and the zooming is done with a hacked-up version of Fancy Zoom.
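A minimal sketch of the Flickr side, assuming the REST endpoint with `format=json` and `nojsoncallback=1`, and that each returned photo carries `id`, `secret`, `server` and `farm`. `YOUR_API_KEY` is a placeholder, the panda name should be one returned by `flickr.panda.getList`, and both helper names are made up. The image URL follows the farm/server/id_secret pattern that flickr.com image links used at the time.

```javascript
// Build a Flickr REST request for flickr.panda.getPhotos.
function pandaPhotosUrl(apiKey, pandaName) {
  const params = new URLSearchParams({
    method: "flickr.panda.getPhotos",
    api_key: apiKey,
    panda_name: pandaName,
    format: "json",
    nojsoncallback: "1" // plain JSON rather than a JSONP callback
  });
  return "https://api.flickr.com/services/rest/?" + params.toString();
}

// Turn a photo record into an image URL: farm, server, then id_secret.jpg.
function photoSourceUrl(photo) {
  return `http://farm${photo.farm}.static.flickr.com/${photo.server}/${photo.id}_${photo.secret}.jpg`;
}

console.log(pandaPhotosUrl("YOUR_API_KEY", "a panda name"));
console.log(photoSourceUrl({ id: "3515381251", secret: "ae14886804", server: "3626", farm: 4 }));
```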
Actually managed to play some World of Warcraft
Got all the Noblegarden Achievements in one night, with Charlie’s help pretty much all the way. Oh, and a natty new outfit to boot.
Which is nice, because the WoW stats on my blog haven’t been updated for ages, making me look like a slack gamer, which will never do!
Phew, new geobloggers post!
Whoo, I’ve finally gotten round to posting on geobloggers! It’s frankly been killing me not having the time to post, and when I do get the time, I generally knock something out for the Flickr dev blog instead.
Anyway, for a long(ish) time I’ve been thinking about what happens when large websites just shut down, mostly prompted by links to and from here …
Eviction, or the Coming Datapocalypse, Datapocalypso! and, rather subtly, FUCK THE CLOUD.
Those generally kept leading me to the thought of, obviously, Flickr being five years’ worth of an awful lot of everyday life. If we/it stopped accepting uploads tomorrow, it’d still be an incredibly important collection of photos.
Which then set me off on the idea of a blog post trilogy …
- A World Without Flickr
- Every Building with a Shoebox in it’s Basement
- Peer-to-Peer Photos, a culture tax on your Hard-drive
The idea is that the first post plays around with some “what if” thought games: how does all that important information get preserved, and so on. The second is a stab at the places where you take photos accepting some responsibility for photo preservation, rather like an image sponge. And the third is about some way to spread photos around users’ own desktop machines.
The problem is that I’m kinda stuck on the first one. I keep flip-flopping over exactly what I think, while at the same time really wanting to push out my thoughts on the second part. And I know how this plays out, as I’ve done it before: I never get to any of them, because I’m blocked on the very first bit.
So I’ve just gone ahead and posted: Every Building with a Shoebox in it’s Basement.
It’s not the best bit of thinking ever, but I’m happy just to get stuff down onto paper, well, kinda paper.
I also know it’s a little meta, linking from there to here: a blog post about how I feel about that blog post.
Anyway, while I’m here, there’s another not-really-related bit. I go through cycles of reading everything I can get my hands (and eyes) on about the subject I’m interested in … and then other phases where I don’t want to read anything, because I get to the point where I’m not sure whether my current point of view is just borrowed from what I’ve recently read, or whether I’ve kinda got there myself.
I’ve been in one of those “keep myself pure” phases for the last four months, which means it’s entirely possible that someone else, probably someone I know, has already written about what I’ve just written about, only far, far better, and I just haven’t realised.
Once this is all out of my system, I’ll start reading again to catch up :)
Killing the Twitter Posts
Boo! I’ve just turned off the automatic Twitter posts. When I first set them up I wasn’t twittering very much, less than a twitter a day, so it seemed reasonable to write some code that’d make a blog post every 6 twitters or so.
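The “every 6 twitters” rule is basically batching. The original code isn’t shown, so this is only a hypothetical sketch of the idea: `batchTweets` is a made-up name, each full batch of six would become one blog post, and any leftovers wait for the next run.

```javascript
// Group tweets into full batches of `size`; each batch would become one post.
// Tweets that don't fill a batch are returned as leftovers for next time.
function batchTweets(tweets, size = 6) {
  const posts = [];
  for (let i = 0; i + size <= tweets.length; i += size) {
    posts.push(tweets.slice(i, i + size));
  }
  const leftover = tweets.slice(posts.length * size);
  return { posts, leftover };
}

const tweets = Array.from({ length: 14 }, (_, i) => `tweet ${i + 1}`);
const { posts, leftover } = batchTweets(tweets);
console.log(posts.length, leftover.length);
```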
This, in combination with the weekly automatic Flickr post, was supposed to produce a nice two or three posts a week that’d drop in between more regular blog posts.
Turns out I’m not very good at ‘more regular blog posts’, and I twitter a lot more now, to the extent that the twitter posts are frankly annoying.
And so, they are dead.
I may look at rounding up tweets that have been faved recently though and see how that goes.
![Rainbow Vomiting Panda Delux - Alpha 1.0 [firefox - safari version]](http://farm4.static.flickr.com/3626/3515381251_ae14886804.jpg)