I run a website that generates degree days, a specialist form of weather data used for calculations relating to building energy consumption. Without going into unnecessary detail, temperature is a function of time, and degree days are essentially the integral of that function.
At the moment our site calculates the data using an approximation method based on daily average, maximum, and minimum temperatures. But I'm working on improving that method by using finer-grained temperature measurements.
These finer-grained temperature measurements are taken throughout each day, and the recording interval can be anything from 1 minute to several hours - it depends on the weather station making the recordings. For most weather stations, the readings are fairly regular, but they're not completely regular. A weather station might typically record the temperature every half hour, but there will often be extra readings or missing readings, or readings taken at less regular intervals for certain periods.
So far I've been using the trapezoidal method to numerically integrate temperature against time. It's working pretty well, but I'm wondering if I might be able to improve it.
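To make the starting point concrete, here is a minimal sketch of the trapezoidal rule applied to irregularly spaced readings. It is an illustration only, not the site's actual code: the function name `trapezoid_integral`, the choice of hours as the time unit, and the sample readings are all made up for the example, and it only computes the area under the temperature curve rather than a full degree-day figure.

```python
def trapezoid_integral(times, temps):
    """Integrate temperature over time with the trapezoidal rule.

    times: increasing timestamps (assumed here to be hours since midnight)
    temps: temperature readings taken at those timestamps
    The intervals between readings do not need to be equal.
    """
    if len(times) != len(temps):
        raise ValueError("times and temps must have the same length")
    total = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        # Area of the trapezoid between two consecutive readings
        total += 0.5 * (temps[i - 1] + temps[i]) * dt
    return total


# Hypothetical readings: mostly half-hourly, with one extra reading at 1.25 h
times = [0.0, 0.5, 1.0, 1.25, 1.5, 2.0]
temps = [10.0, 10.4, 11.0, 11.1, 11.5, 12.0]
area = trapezoid_integral(times, temps)  # units: degree-hours
```

Because each pair of consecutive readings is handled on its own, extra or missing readings need no special treatment.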
I'm not a mathematician, and my understanding of numerical integration is only very basic. I understand that Simpson's 1/3 rule and Simpson's 3/8 rule typically work better than the trapezoidal rule when numerically integrating smooth mathematical functions. But real-world temperature readings don't follow an exact mathematical function. Also, as I understand it, Simpson's rules in their standard form require equal intervals, which my temperature readings don't consistently have.
I wonder if it might make sense to use Simpson's rules to integrate stretches of temperature readings that have 2 or more consecutive time intervals of equal length, and use the trapezoidal rule for stretches of irregular readings. But then I see here (a paper that I don't pretend to understand properly) that the trapezoidal rule can often work better than Simpson's rule for various classes of "rougher" functions. I would guess that outside air-temperature variation would be classed as pretty rough - the temperature jumps up and down throughout the day for all sorts of reasons.
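The hybrid idea could be sketched roughly as follows. This is only one possible reading of it, under assumptions I've made for the example: it uses Simpson's 1/3 rule alone (not the 3/8 rule), pairs intervals greedily from left to right, and treats two intervals as "equal" if they match within a small tolerance. The name `hybrid_integral` and the `rel_tol` parameter are hypothetical.

```python
import math


def hybrid_integral(times, temps, rel_tol=1e-9):
    """Simpson's 1/3 rule on pairs of equal intervals, trapezoid elsewhere.

    times: increasing timestamps; temps: readings at those timestamps.
    rel_tol: relative tolerance for deciding two intervals are equal.
    """
    n = len(times)
    if n != len(temps):
        raise ValueError("times and temps must have the same length")
    total = 0.0
    i = 0
    while i < n - 1:
        h1 = times[i + 1] - times[i]
        if i + 2 < n:
            h2 = times[i + 2] - times[i + 1]
            if math.isclose(h1, h2, rel_tol=rel_tol):
                # Two consecutive equal intervals: Simpson's 1/3 rule
                total += h1 / 3.0 * (temps[i] + 4.0 * temps[i + 1] + temps[i + 2])
                i += 2
                continue
        # Otherwise fall back to a single trapezoid
        total += 0.5 * h1 * (temps[i] + temps[i + 1])
        i += 1
    return total


# Hypothetical readings: two equal 1 h intervals, then an irregular 0.5 h one
result = hybrid_integral([0.0, 1.0, 2.0, 2.5], [0.0, 1.0, 4.0, 6.25])
```

Note that the greedy pairing is an arbitrary choice: a different pairing of the same readings can give a slightly different total, which is part of what makes the hybrid approach hard to reason about.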
I could probably come up with some way to estimate the effectiveness of various methods, but it's tricky because there's no "right answer" to compare figures against. So I'm trying to figure out what method would make most sense from a theoretical standpoint.
Do you think the trapezoidal rule is likely to be the best approach for me? Or are there other approaches that might make more sense?