Yesterday was the second qualifier round for ModelOff, the World Financial Modeling Championships. I’m sharing my models again, with the same caveat as before: these are what I made during the test, with lots of time pressure, etc – I don’t know if they’re complete and correct (I hope so!), and I’m sure there are things I would have designed differently with a bit more time (some of which I’ve commented on below). I think I was also a bit more sloppy about my formatting this round…
If you haven’t taken the test, you can access the questions here (and I’d definitely recommend giving them a try before reading this). I’ve posted them in the order I attempted them.
You can find my models from Round 1 (which turned out to be right – or at least to give the right answers to the questions!) here.
Section 2: Late again
This question had two parts: first, turning a ‘human-readable’ train schedule into a more ‘machine-readable’ database, and second cleaning up some messy data on actual train times to compare to the schedule.
I struggled to come up with a ‘good’ way to model the first part quickly, and ended up giving up and doing it in a half-manual / half-automated way, which required some copy / value-pasting at every schedule change (AM to midday, etc), changing a VLOOKUP range for every line or direction change, and changing other formulas for every line change. I would never build a real model like that, because it’s way too error-prone (I actually made a mistake building it and had to spend some time figuring out where I had gone wrong), but I’m still not sure that I could have made a robust alternative any faster for the actual size of this task. One of the questions last year (fund the future – you can find it here) had some messy input data as well, and I spent way too much time trying to clean it into a standard form instead of just putting in a couple of patches and getting it done.
The data clean-up was more straightforward (although most of my working columns weren’t labeled – sorry!). The idea was to (i) look for AM / PM to figure out where the time occurred and pull that out (and remove that from the starting string, (ii) look for R / Y / B (the line codes) to figure out where that part was and pull it out (with an additional step to remove all the extra delimiters), and then (iii) extract the day from what was left.
To join the two together and do the questions on scheduled vs actual time, I coded each train with its line, direction, station, and trip number in the day (e.g. BIH7 was the seventh inbound train on line B to reach station H in the day). Because of the clumsy way I built the timetable, when I wanted to sort it, I had to copy and value paste – another piece f very bad modeling practice!
I wonder if I could have structured the whole thing better if I knew that I’d have to answer the last question about the total journey time for a train along a line. As it was, I had to calculate the departure and arrival times for each of the 30 trains on the line in each direction in each day (W7:AD70 on the ‘5 days data’ tab). It works, but it’s not pretty.
Section 3: System allocation
I liked this question a lot, because it was not at all obvious at a glance how to model it. I started off thinking of looking at the depots one by one, finding one which was over-capacity, finding where to reallocate it to, and then repeating with the next over-filled one until I was done – but that won’t match up the method described in the question.
The way I built it in the end (the ‘right’ way, I think) was to make a table of all the pairs of depots, sort it in order from shortest to longest pairwise distance, and then work through the possible reallocations from the top to the bottom. For each pair, I checked (i) if the distance was less than the maximum profitable distance, (ii) if the two depots were in the same hub, and (iii) if one depot was under capacity before this step and the other was over. If all those conditions were met, I moved as many cars as possible between that pair, then calculated the number of cars at each depot after that move, and used that as the input for the next pair. Whenever I made a move I calculated the transport cost in that row, and then I calculated the revenues and penalties based on the final distribution after all allocations.
There are two messy tables down at the bottom with a lot of the workings for the questions. The first, in O59:S63, summarizes the fee income and transport and penalty charges for each hub, and the second, in V60:AC64, runs different numbers across all the 4 scenarios (using a data table, and changing the value of the switch in cell G15) to calculate the changes in certain costs between scenarios.
Section 1: Collect the cash
This one was very like many real-life models, in that the assumptions were messy and totally not standardized. There wasn’t anything that I thought was very difficult to model in principle, but there was a lot of detail, and a lot of places to go wrong – and I did. Twice!
I want to share an honest reflection of what I made in the time (since that’s the whole point of this post – there will probably be official solutions that are much cleaner than mine), but I also don’t want to share bad info, so I’ve corrected the two small errors I made, but also highlighted them an included a comment with what I originally did. They’re in cells U20 and X27.
There are also a couple of assumption switches (since that was one of the things I called myself out for not doing in the last round!) in cells U15 and Y15 which you can use to activate the alternative assumptions needed to answer some of the questions.
Aside from that, I don’t have any great insight to offer on this question – it was all about getting in the details.
I enjoyed this round a lot. The questions played much more to my strength than the first round (even though that wasn’t reflected in my score!), and definitely filled in the ‘non-traditional’ dimension that I thought was missing from Round 1. I also thought the questions were hard – even harder than Round 1, which I already thought was a step up from previous years (offset by allowing more time, of course).
Based on my save times, it looks like I spent 59* mins on Section 2 (vs 50-75 min target), 31 mins on Section 3 (vs 30-45 min target), and 46 mins on Section 1 (vs 40-60 min target). Although since I found my bigger mistake on Section 1 less than 10 minutes after submitting, I regretted not spending a little longer on that one.
The results will be out in a few days, so I’m crossing my fingers that I make the final this year – wish me luck!
* This includes a few minutes spent changing a poopy diaper for my almost-2-year-old within 10 minutes of the test starting (my wife is pregnant with our second child, and gets very nauseous around baby poop). Talk about a sense of timing!