I have been thinking about this for years and, since it came up again in something I am coding, I decided to go for a geek post on it.

I really like Matlab. It is easy, simple, decently fast, and you can do literally anything with it. Yes, it is expensive, but usually somebody else pays for it so you can use it. Also, months ago, I blogged about Columbia University students having access to a free Matlab license plus the software itself. I like Matlab mostly because sometimes I am working on a project, I have an idea, and I want to test it quickly: a very simplified simulation of a large problem. I can have that in a day or two with Matlab, and if I want the full implementation, I can do that with Matlab too, given enough time.

For those of you who, like me, code fairly often but are not really SW developers, you might face this problem all the time. Oh, by the way, I am aware this might sound like an aberration to a real SW developer, who takes care of the security of the code, hunts bugs, and pays special attention to optimization, but in my case I just want the simulation to run and give me the results.

So, my code, and I assume everyone's too, has quite a few loops. There is usually a main loop over the number of iterations or repetitions of the experiment, so you can average the results at the end. For the kind of work I do (wireless communications), the second main loop is time. And then there might be other loops inside: users, cells in my system, sub-carriers in OFDMA, and so on. Within these loops, certain variables (arrays) are filled and emptied. A typical skeleton looks like the sketch below. And here is where the problem comes.
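
Just to make that structure concrete, here is a minimal sketch of the nesting I mean; the loop bounds and names (M, N, numUsers) are made up for illustration:

    M = 100;           % experiment repetitions
    N = 1000;          % simulated time slots
    numUsers = 10;     % hypothetical inner loop

    for rep = 1:M                  % repetitions loop (to average at the end)
        for t = 1:N                % time loop
            for u = 1:numUsers     % inner loops: users, cells, sub-carriers...
                % per-user processing: the working arrays get
                % filled and emptied in here
            end
        end
    end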

Some arrays are known from the beginning. Results, for example. I know before starting the run that I want a sample of, for example, bit error rate for each time slot (time loop) and each iteration (experiment repetitions loop). So, if I am doing M repetitions and I will simulate N seconds of time, I need an MxN matrix pre-allocated for the results. And we all know (or should know) that pre-allocating is good and speeds up your code A LOT, because growing an array forces Matlab to re-allocate and copy it on every assignment. What to do when my pre-allocated matrix needs to be 10^9 x 10^9 is a different problem…
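
A minimal sketch of the pre-allocated case (the sizes are arbitrary, and rand stands in for the actual BER computation):

    M = 50;    % repetitions
    N = 500;   % time slots

    ber = zeros(M, N);              % pre-allocate the results matrix once
    for rep = 1:M
        for t = 1:N
            ber(rep, t) = rand;     % placeholder for the real measurement
        end
    end
    meanBer = mean(ber(:));         % average everything at the end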

Some other arrays, though, are not known initially. For example, suppose I am simulating data communications, with sessions initiated following a Poisson process (inter-arrival times exponentially distributed) and with a random amount of data transmitted per session. If I need to store the data packets transmitted, I cannot know in advance the size of the matrix I need. So, what should I do? Pre-allocate a huge chunk of memory from the beginning, and possibly run out of memory, or just initialize the matrix empty ( matrix=[]; ) and grow it each time I create a new data packet?
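
One middle ground I know of is to grow in chunks: pre-allocate a reasonable block, keep a counter of used rows, and double the allocation only when it fills up. A minimal sketch, with made-up sizes and -log(rand) used to draw the exponential variables in base Matlab:

    capacity = 1000;                    % initial guess, doubled when full
    packets = zeros(capacity, 2);       % columns: arrival time, packet size
    count = 0;
    t = 0;
    while t < 1e4                       % stand-in for the simulated horizon
        t = t + (-10 * log(rand));      % exponential inter-arrival (mean 10)
        if count == capacity            % out of room: double the allocation
            packets = [packets; zeros(capacity, 2)];
            capacity = 2 * capacity;
        end
        count = count + 1;
        packets(count, :) = [t, -100 * log(rand)];  % random size (mean 100)
    end
    packets = packets(1:count, :);      % trim the unused rows at the end

This way the expensive re-allocation happens only a logarithmic number of times instead of once per packet, while the memory footprint stays close to what is actually needed.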

Another example, even more complex. Throughout my simulation I generate packets that need to be processed. Once a packet is created, it is stored in an array. Once it is processed, it is removed from that array. Here not only does the array grow, but it also shrinks during the simulation. What to do here?
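
For the grow-and-shrink case, the naive approach is to append and delete rows in place; a minimal sketch, with made-up arrival and service probabilities:

    queue = zeros(0, 2);                     % columns: creation time, size

    for t = 1:1000
        if rand < 0.3                        % a new packet arrives
            queue(end+1, :) = [t, -100*log(rand)];  %#ok<AGROW> grow by a row
        end
        if ~isempty(queue) && rand < 0.25    % the oldest packet is processed
            queue(1, :) = [];                % delete its row: the array shrinks
        end
    end

Both the append and the delete force Matlab to copy the whole array, which is exactly why this pattern is slow. A less wasteful alternative would be a circular buffer with head and tail indices over a pre-allocated block, but the sketch above is closer to what I actually run.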

It is a tricky question, and I am open to any kind of suggestion. So far, unless I am in a hurry to see the results, I would rather have the simulation run for a couple of days on a server. I keep growing arrays until I run out of memory. This way, once I have no more memory, at least I have the results so far stored in a big matrix. If I start by pre-allocating a huge chunk of memory, Matlab might complain and not even let me start running the simulation.
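
One small safety net that goes well with this strategy is to checkpoint the results to disk every so often, so that an out-of-memory crash does not throw everything away. A minimal sketch (the file name, interval, and result row are arbitrary):

    results = [];                               % grown as the run advances

    for rep = 1:1e6
        results(end+1, :) = [rep, rand];        %#ok<AGROW> hypothetical row
        if mod(rep, 1000) == 0
            save('checkpoint.mat', 'results');  % overwrite the checkpoint
        end
    end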

PS. Yes, you guessed right. There is a poor server somewhere having a hard time running a very inefficient simulation launched by me, one that keeps growing and shrinking arrays at each iteration… I am working on optimizing it in parallel, but if it runs fast enough I'll just leave it the way it is.
