./backlog: Charlie's blog

meanderings through tidbits of mathsy computery stuff

An introduction to gradient boosting

Well, it’s pretty much all in the title. Gradient boosting is one of those unavoidable algorithms if you’re a data science professional - it’s very common, fast, lightweight, and capable of exceptional accuracy when tuned correctly. There’s been renewed enthusiasm for gradient boosting algorithms with the advent of popular frameworks like XGBoost and LightGBM, and they get used for almost everything - so much so that it’s difficult to find a Kaggle competition winner that doesn’t use some form of gradient boosting.

Read On →

Diving into stochastic gradient descent

Stochastic Gradient Descent (let’s call it SGD from now on, because that feels a bit less repetitive) is an online optimisation algorithm that’s received a lot of attention in recent years for several reasons (and is notable for its applications in optimising neural networks). One of the most attractive things about it is its scalability - because of how it differs from other gradient descent algorithms, SGD lends itself neatly to massively parallel architectures and streaming data.

Read On →

Gauss' shoelace formula

There’s a useful trick I learned the other day while working with some geospatial data. I needed to compute the areas of millions of polygons very quickly - which initially looked like a fairly daunting task. It turns out that there’s a very fast and efficient method developed by Gauss that was designed for computing the area of any simple polygon (a simple polygon is a closed-path flat shape consisting of non-intersecting line segments - i.
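
To give a flavour of the trick, here is a minimal sketch of the shoelace formula in plain Python. It isn't necessarily the implementation from the full post; the function name `shoelace_area` and the input format (a list of `(x, y)` tuples, listed in order around the polygon) are assumptions made for illustration.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def shoelace_area(vertices: Sequence[Point]) -> float:
    """Area of a simple polygon whose vertices are given in order.

    Hypothetical helper: the vertices may run clockwise or
    anticlockwise, and the first vertex need not be repeated at the end.
    """
    n = len(vertices)
    if n < 3:
        # Fewer than three vertices cannot enclose any area.
        return 0.0

    twice_signed_area = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        # Wrap around so the last vertex pairs with the first.
        x1, y1 = vertices[(i + 1) % n]
        # Cross product of consecutive vertices (the "shoelace" term).
        twice_signed_area += x0 * y1 - x1 * y0

    # The sign only encodes orientation, so take the absolute value.
    return abs(twice_signed_area) / 2.0


unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(shoelace_area(unit_square))
```

The loop is a single pass over the vertices, which is what makes the method attractive for very large numbers of polygons. This sketch assumes planar coordinates, so latitude/longitude data would need projecting first.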

Read On →

Monopoly; a data scientist's perspective

Monopoly is a wonderful game. It’s one that gets wheeled out at every family gathering (usually Christmas, in my case), and gets half-heartedly played for a couple of hours - until everyone either gets bored and wanders off, or gets so apoplectic with rage that they flip the board upside-down, scattering the pieces everywhere - and we’re forced to pack it away for another year. The reason it’s so frustrating is that it’s pretty clear who will win after about half an hour of gameplay; the remaining hours of playtime are essentially just a slow, painful game of attrition during which the other players are forced into progressively more desperate financial situations - until they go bust, that is.

Read On →

Just how bad is London's homicide rate?

There’s been a lot in the news lately about the supposedly "soaring" murder rate in London, and how it has supposedly eclipsed that of New York City in recent months. I was initially sceptical, as this fact seemed fairly mundane - 50 homicides in 3-and-a-bit months didn’t seem all that unlikely. However, I wanted to put my scepticism to the test - so I acquired a couple of datasets from the London Datastore to see if the hyper-sensationalised news articles were justified or not.

Read On →