median blog

Stack Overflow shouldn't be your documentation

I work with a number of libraries with terrible documentation. The highest on that list is probably the Python library pandas. That isn't to say the documentation is incomplete or that features go undocumented. The problem is that you need to know where to look before you can find anything.

Take, for example, a task I had last week. I had some datetime values in a column and I wanted to map them to the start of the month so I could get a mean and range per month. That's a feature that is well supported in pandas, but it is not easy to find in the documentation.

There is a resample function, which does something different. There are numerous datetime functions accessed through the very odd .dt accessor. But ultimately I found my answer through a post on Stack Overflow.
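For the curious, here is a minimal sketch of one way to do the task described above. It is not necessarily the answer from that post. The DataFrame and its column names (`timestamp`, `value`) are made up for illustration. It assumes the datetime column is already a proper datetime dtype, converts each value to a monthly period, converts that back to a timestamp at the start of the month, and then groups on the result.

```python
import pandas as pd

# Hypothetical data: the column names "timestamp" and "value" are assumptions.
df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2023-01-03 09:15",
                "2023-01-20 17:40",
                "2023-02-11 08:05",
                "2023-02-28 23:59",
            ]
        ),
        "value": [10.0, 14.0, 3.0, 9.0],
    }
)

# Map each timestamp to the first day of its month.
# to_period("M") keeps only the year and month; to_timestamp() turns the
# period back into a datetime at the start of that period.
df["month"] = df["timestamp"].dt.to_period("M").dt.to_timestamp()

# Mean and range (max minus min) of the values within each month.
monthly = df.groupby("month")["value"].agg(["mean", "min", "max"])
monthly["range"] = monthly["max"] - monthly["min"]

print(monthly)
```

Grouping on an explicit month column keeps the month start around as ordinary data, which is handy if you want to join or plot on it later.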

This isn't the first time this has happened with pandas. There is an odd air around the library that seems to separate those who "know pandas" from those who do not. This separation seems to be based entirely on how many little tricks you can memorize. I have worked with many tools of similar size and complexity that do not suffer from the same issue, but sadly many more that do.

Given how widely pandas is used and recommended, it is a wonder to me that it remains so difficult to find what you need other than through an apprenticeship from someone on Stack Overflow.

Not wanting to be someone who complains and does nothing about it, maybe I'll start a little series on learning pandas without going insane.