./backlog: Charlie's blog

meanderings through tidbits of mathsy computery stuff

Tips for a high-performance data science team

Over the last couple of years, I’ve been working as a data scientist at a small aviation technology company based in the UK. We’re a fairly new team, and we’ve run into some difficulties along the way. This blog post is a collection of anecdotes and tips about the problems that I’ve come across while leading my team. I guess it’s a distillation of my (limited) experience, but it’s only one person’s account, so take it with a pinch of salt.

Read On →

An introduction to gradient boosting

Well, it’s pretty much all in the title. Gradient boosting is one of those unavoidable algorithms if you’re a data science professional - it’s very common, fast, lightweight, and capable of exceptional accuracy when tuned correctly. There’s been renewed enthusiasm for gradient boosting algorithms with the advent of popular frameworks like XGBoost and LightGBM, and they get used for almost everything - so much so that it’s difficult to find a Kaggle competition winner that doesn’t use some form of gradient boosting.

Read On →

Diving into stochastic gradient descent

Stochastic Gradient Descent (let’s call it SGD from now on, because that feels a bit less repetitive) is an online optimisation algorithm that’s received a lot of attention in recent years for several reasons (and is notable for its applications in optimising neural networks). One of the most attractive things about it is its scalability - because of how it differs from other gradient descent algorithms, SGD lends itself neatly to massively parallel architectures and streaming data.
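To make the "online" part concrete, here is a minimal sketch of SGD fitting a straight line, assuming a squared-error loss and one parameter update per sample. The function name `sgd_linear_fit` and its parameters are made up for illustration; this is not necessarily the formulation used in the full post.

```python
import random


def sgd_linear_fit(samples, learning_rate=0.1, epochs=200, seed=0):
    """Fit y ~ w * x + b with plain SGD (hypothetical helper, for illustration).

    samples: iterable of (x, y) pairs.
    Each update uses a single sample, which is what distinguishes SGD from
    batch gradient descent (which averages the gradient over all samples).
    """
    rng = random.Random(seed)  # fixed seed so the shuffling is reproducible
    data = list(samples)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # visit the samples in a random order each epoch
        for x, y in data:
            error = (w * x + b) - y  # residual for this one sample
            # Gradient of 0.5 * error**2 with respect to w and b
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b


# Noise-free points on the line y = 2x + 1, with x in [-1, 1]
points = [(i / 10, 2 * (i / 10) + 1) for i in range(-10, 11)]
w, b = sgd_linear_fit(points)
```

Because each step only needs one sample, the same loop works just as well when the samples arrive one at a time from a stream rather than from a list held in memory.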

Read On →

Gauss' shoelace formula

There’s a useful trick I learned the other day while working with some geospatial data. I needed to compute the areas of millions of polygons very quickly - which initially looked like a fairly daunting task. It turns out that there’s a very fast and efficient method, developed by Gauss, for computing the area of any simple polygon (a simple polygon is a closed-path flat shape consisting of non-intersecting line segments - i.
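As a rough illustration of the shoelace formula, here is a minimal pure-Python sketch. It assumes the polygon is simple and that its vertices are given in order around the boundary; the name `polygon_area` is made up for this example and is not taken from the full post.

```python
def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula.

    vertices: sequence of (x, y) pairs, listed in order around the
    boundary (clockwise or anticlockwise). Hypothetical helper name.
    """
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1  # cross product of adjacent vertices
    # The sign depends on the winding direction, so take the absolute value
    return abs(twice_area) / 2.0


rectangle = [(0, 0), (4, 0), (4, 3), (0, 3)]
print(polygon_area(rectangle))  # 12.0
```

The loop is a single pass over the vertices, which is why the method is cheap; for millions of polygons, a vectorised version of the same sum would probably be the more practical choice.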

Read On →

Monopoly; a data scientist's perspective

Monopoly is a wonderful game. It’s one that gets wheeled out at every family gathering (usually Christmas, in my case), and gets half-heartedly played for a couple of hours - until everyone either gets bored and wanders off, or gets so apoplectic with rage that they flip the board upside-down, scattering the pieces everywhere - and we’re forced to pack it away for another year. The reason it’s so frustrating is that it’s pretty clear who will win after about half an hour of gameplay; the remaining hours of playtime are essentially just a slow, painful game of attrition during which the other players are forced into progressively more desperate financial situations - until they go bust, that is.

Read On →