PROPER PERFORMANCE SCRIPTING IN 2K3

How to make your code execute as quick as possible!

  • Vanit
  • 03/18/2011 06:23 PM
This article's purpose is mainly to propagate some correct information, as the tutorial "Performance Scripting in 2k3" by Fhizban is more or less entirely incorrect (unempirical nonsense is probably more accurate). I don't suggest reading that article, as you'd need to unlearn it all anyway. In the explanations to follow I will be citing my findings. If you would like to see how I worked it out, please check out the appendices!

Disclaimer

These calculations are true of a modern day computer. I expect that on something a bit older (probably < 2000MHz single core computers) the upper limits of rm2k3 are a bit lower. I'm running a 6 core 3.7GHz processor, so I'm pretty sure I'm somewhere above where the performance of rm2k3 would've plateaued.

Glossary
Parallel Process - the trigger condition of an event
Loop - the Event Command that performs loops
Label Loop - using 2 sets of labels you can make a loop and can jump out of the loop when a condition is true
loop - anything that performs a looping function, including Parallel Processes, Loops and Label Loops
L - the maximum number of lines of code rm2k3 can execute a second

Introduction

Let's start at the beginning. I'm sure all of you have experienced the godforsaken lag that ensues when you start using Loops and Parallel Processes, but very few people understand exactly what is causing this lag. The popular myth is that having too many Parallel Processes running at one time causes lag. For the most part, this is incorrect. It is not the existence of the Parallel Processes that causes this lag, but the total number of event instructions rm2k3 is executing at one time. It sounds so obvious! It isn't. The reality is you can have thousands of Parallel Processes going at once with no lag, more on that below!

Overhead

Every time rm2k3 executes 1 line of code it makes the game lag. Not very much, but lag nonetheless. How many lines per second can rm2k3 take? So far as my data shows... somewhere between 600166 and 800222. I'm not sure what the exact number is, but it's somewhere in between those (see Appendix A).

Yeah, that's right. Sounds fricking insane, doesn't it? You'll never be doing that kind of calculation, it's madness, right?! You, my friend, have fallen into the trap of underestimating rm2k3 and computers in general (if rm2k3 is making that many executions a second, how many do you think your computer is!?). It's in fact extremely easy to reach that cap - if you've made your game lag, you exceeded that magical number, which I will now refer to as L (for lag, you see!).

There are a few ways to exceed L, which, yes, occurs when you have a lot of Loops and Parallel Processes happening at once. BUT it's not because of the number of those you have, it's the combination of how fast they're doing their calculations! I ran a small test and worked out how fast each kind of loop executes.

-Parallel Processes perform about 60 iterations per second
-Loops inside an event... perform about 200055!
-Label Loops inside an event........ 300083!!

This is what essentially causes the lag. It's not the fact they exist, it's that they're doing so much at once! You can have a huge ass event with no loops and it'll run almost instantly, no matter how long it is! But the second you introduce a loop, especially an infinite one, code is being executed hundreds of thousands of times a second.

Also you're probably wondering why the hell Loops and Parallel Processes are different speeds. It's because, contrary to popular belief, Parallel Processes ARE NOT the same as Loops! When a Parallel Process reaches the end of its code, it doesn't loop back to the start like a Loop does. It calls itself! It basically goes "Call Event: This Event" indefinitely at the end of each iteration. This may or may not be a problem depending on what you're trying to code; I'll go over the implications in a later section. Basically though, when you use the "Call Event" command you introduce an overhead that takes rm2k3 0.0165 seconds to compute, whereas if you were using a proper Loop you don't have that overhead and your calculations go about 333333% faster!

You never really hear people talking about Label Loops, as using them is believed to be "bad programming". When it comes to rm2k3 the truth is that they're by far the fastest way you can calculate, as they have even less overhead than a Loop! Because people don't really deal with Label Loops I won't be mentioning them again, but they're definitely a nifty tool when it comes to one-off functions that are supposed to execute asap!

Overcoming L

Now we're getting into something you probably knew works, but not WHY it works! You've probably had someone tell you that if a Loop is lagging you should put a wait command of 0.0s at the end of it. Unbeknownst to you, this works because you in fact make the Loop slow down to about the same pace that a Parallel Process runs at (because a 0.0s wait command is actually a wait of 0.0165s), which means fewer calculations per second, which is therefore less laggy!

OMFG ITS ALL COMING TOGETHER. Are you getting excited? I'm getting excited.

So how many Parallel Processes CAN you run at once? The answer is about 10000-13000. Yep. That's LINEAR Parallel Processes though, ones without a Loop inside them. Once you start adding Loops, of course, since they execute SO much faster than Parallel Processes, you start catching up to little ol' L mighty fast. There is an easy way around this though: if you add a 0.0s wait command inside that Loop, you bring it back down to the speed of a regular Parallel Process, so you're back to having as many loops as you want. You could even put an additional 0.0s wait command in each loop, which will double the amount you can have running at once!
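As a rough sketch of where the 10000-13000 figure comes from, the arithmetic can be written out in Python. The function name and parameters are made up for illustration, and it assumes (as the article does) that a linear Parallel Process executes one line per iteration at 60 iterations per second, and that each extra 0.0s wait costs one more 1/60s tick.

```python
PP_ITERATIONS_PER_SECOND = 60    # Parallel Process rate measured in the article
L_LOW, L_HIGH = 600166, 800222   # the article's lower and upper estimates for L

def max_linear_processes(limit, lines_per_iteration=1, waits_per_iteration=1):
    """How many identical processes fit under `limit` lines per second.

    Assumption: every wait (or Parallel Process restart) costs one 1/60s tick,
    so two waits per iteration halve the iteration rate.
    """
    # lines per second for one process = 60 / waits * lines_per_iteration
    return (limit * waits_per_iteration) // (
        PP_ITERATIONS_PER_SECOND * lines_per_iteration
    )

print(max_linear_processes(L_LOW), "to", max_linear_processes(L_HIGH))
print("with one extra wait:", max_linear_processes(L_LOW, waits_per_iteration=2))
```

With the article's numbers this lands on roughly 10000 to 13000 processes, and roughly double that once a second wait is added to each one.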

You ARE allowed to have Loops running rampant in the background though, but you can only have 2 of them running at once: with each one executing 200055 lines per second, 2 won't pass L, but 3 will. That means you can run 2 Loops and still have room for only about 3333 Parallel Processes! Of course you can probably have a little more than that. 3 Loops WILL make you exceed L, though.
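The budget above can be sketched the same way. This is only an illustration: the helper name is hypothetical, it uses the lower estimate for L, and it assumes the rates of separate events simply add up.

```python
LOOP_LINES_PER_SECOND = 200055  # one unthrottled Loop, as measured in the article
PP_LINES_PER_SECOND = 60        # one single-line Parallel Process
L_LOW = 600166                  # the article's lower estimate for L

def remaining_parallel_processes(free_running_loops, limit=L_LOW):
    """Single-line Parallel Processes that still fit beside the given Loops.

    Assumption: rates add linearly; returns 0 when no headroom is left.
    """
    used = free_running_loops * LOOP_LINES_PER_SECOND
    return max(0, limit - used) // PP_LINES_PER_SECOND

for loops in (0, 1, 2, 3):
    print(loops, "Loops leave room for", remaining_parallel_processes(loops))
```

Two Loops leave room for a little over 3300 Parallel Processes under this estimate, and a third Loop uses up whatever headroom was left.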

Practical Applications

The above may be helpful for everyone if they're simply experiencing lag, but making use of these principles is for advanced users only. An implication of Parallel Processes having a 0.0165s overhead is that time-related events that use the Parallel Process itself as a loop will run slower than you intended.

For example, say you had a Parallel Process that tracked time, and every iteration of the Parallel Process it would wait 1.0s and then add 1 to the Seconds variable of your clock. Instead you'd find that you're incrementing that clock every 1.0165s and that after 10 hours of gameplay your clock would be about 10 minutes slow! In the world of time, losing 10 minutes in 10 hours means a broken clock. Instead, inside the Parallel Process you should have a Loop containing your 1.0s wait and Seconds incrementer, so that it is in fact keeping accurate time. Because the Loop is only executing every second it won't slow anything down; it just means every 1.0 seconds it'll do one really fast calculation with no overhead, unlike the Parallel Process.
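To make the clock example concrete, here is a minimal sketch of the drift calculation. It takes the 0.0165s per-iteration figure at face value; the function and constant names are invented for the example.

```python
WAIT_SECONDS = 1.0         # the Wait command in the clock event
RESTART_OVERHEAD = 0.0165  # per-iteration overhead assumed for a Parallel Process

def clock_drift_seconds(real_seconds, overhead=RESTART_OVERHEAD):
    """Seconds the in-game clock falls behind after `real_seconds` of play."""
    # Each increment really takes WAIT_SECONDS + overhead, so fewer ticks happen.
    ticks = real_seconds / (WAIT_SECONDS + overhead)
    return real_seconds - ticks

drift = clock_drift_seconds(10 * 60 * 60)
print(f"Behind by {drift:.0f} seconds ({drift / 60:.1f} minutes) after 10 hours")
```

Over 10 hours this comes out at a little under 10 minutes, while an overhead of zero (the Loop version) gives no drift at all.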

Bonus Material! - Linear code optimization

Code doesn't need loops for it to be slow. Linear code will also cause lag if you're trying to do too much at once, not because of how many instructions you're trying to execute, but because you're trying to be clever by putting the code you reuse often into its own common event. This in itself is good practice, but not if you're trying to write efficient code in a loop. Remember how I mentioned that each Call Event command has a 0.0165s overhead? If you put that Call Event in a loop you slow it down by that much... EACH TIME YOU CALL IT! Whereas if you cut and paste the code multiple times, you wouldn't get that slowdown!

On the whole it is better to compartmentalize your functions into different events so you can reuse them or separate your code, but you need to break that rule when writing events that warrant quick calculations - like a damage algorithm in a battle system, or a sorting algorithm, these need to be as quick as possible and Calling Events willy nilly makes them run a hell of a lot slower!

Another common misconception is that if you nest your if statements a certain way, so that rm2k3 has to check fewer of them, the code will run faster. This is not true at all for linear code. Of course if inside one of those if statements there is a Loop and that if statement isn't true then that Loop won't execute and the code will run faster, but line for line rm2k3 takes the same time to execute linear code whether that code is being used or not.

But there is still hope for reducing your code. You achieve this by actually reducing the length of your linear code, but without using Call Event. You take the code that you do want to reuse and put it down the bottom of your event. Then instead of using Call Event to get to it, you use labels! It can be a little confusing, but if you're trying to get as much raw speed as you can it's a must. Then you just put a label (I use label 100) at the very end of the event so you can skip to the end of the event to quit when you're done without executing the code you were hiding down there.

Another way to reduce the size of code is to invent the OR operator! This is something that rm2k3 lacks, and it's a shame because it's one of the simplest things most programming languages have. By default in rm2k3, if you want a piece of code to execute when either one thing is true, or another thing is true, but not necessarily both, you have to paste the code in twice. That's twice as many lines of linear code rm2k3 has to read through, and it also means you're making more work for yourself if the code is long or you decide to change it later (you have to change it for each possible if statement)! The way you get around it is this simple trick:

OR = 0

IF thing 1 is equal to 5
OR = 1
End If

IF thing 2 is equal to 6
OR = 1
End If

IF OR is equal to 1
//code for both cases here
End If

It's very simple, your code will run quicker, and it's easier to fix later! You can also do more complicated things like making the code only run if at least 2 out of maybe 10 things are true, which would be very annoying to replicate without doing it this way.
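The "at least 2 out of 10" variation is the same trick with a counter instead of a flag. Below is a small Python model of the idea rather than rm2k3 event code; the names and the sample checks are hypothetical. In an event you would set a variable to 0, add 1 inside each If, then test the variable once at the end.

```python
def count_true(conditions):
    """Mirror of the event pattern: start a counter at 0, add 1 per true check."""
    counter = 0
    for condition in conditions:
        if condition:
            counter += 1
    return counter

def should_run(conditions, minimum=2):
    """True when at least `minimum` of the checks passed."""
    return count_true(conditions) >= minimum

# Hypothetical game state: ten independent checks, two of which are true.
checks = [False, True, False, False, False, False, True, False, False, False]
if should_run(checks):
    print("shared code runs once")
```

The shared code still appears only once, and changing the threshold means changing one number.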

And that's it for this session. Read on to the Appendix if you want to see how I got my data, and please leave a comment if this was helpful to you! :)

Appendix A

This was actually really easy to work out once I got my head around it. For the sake of simplicity I'm making the assumption that it takes ~0 seconds to execute a single instruction X.

Aim

To calculate how many times each type of loop iterates per second, I ran an experiment over 60 seconds using multiple loops running concurrently alongside a timer loop. Once the timer reached 60 seconds, another event that was watching it would pop up with the number of iterations each loop had performed.

Method

1) Create the Time event as a Parallel Process with the following code:

Loop
Wait 0.1s
Timer + 1
End Loop

2) Create the Loop event as a Parallel Process with the following code:

Loop
Loop + 1
End Loop

3) Create the Parallel event as a Parallel Process with the following code:

Parallel + 1

4) Create the Label event as a Parallel Process with the following code:

Label 1
Labelcount + 1
Jump to Label 1

5) Create the Popup event as a Parallel Process with the following code:

Loop
If Timer is equal to 600
Message(Time: \vTimer Loop: \vLoop Parallel: \vParallel Label: \vLabelcount)
End If
End Loop

6) Execute the map and wait a minute for the results to appear and record them

Results

Time = 600
Loop = 12003333
Parallel = 3601
Label = 18005000

The per second metric was obtained by dividing these values by 60. Potentially, measurement could've been compromised if the combined execution rate exceeded L; however, it did not, and as such the results are assumed accurate.
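For anyone repeating the experiment, the division can be sketched like this. The dictionary keys are just labels for the counters in the Results, and the truncation to whole numbers is an assumption made to match the figures quoted in the article.

```python
RUN_SECONDS = 60  # length of the measurement run

# Raw counts reported in the Results above.
counts = {
    "Loop": 12003333,
    "Parallel": 3601,
    "Label": 18005000,
}

def per_second(total, seconds=RUN_SECONDS):
    """Whole iterations per second (fractional part dropped)."""
    return total // seconds

rates = {name: per_second(total) for name, total in counts.items()}

for name, rate in rates.items():
    print(f"{name}: {rate} iterations per second")
```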

The value L was obtained by copying and pasting each event until lag (regular jerkiness in character movement) was observed. The following combinations induced an L state:

3*Loop events
2*Loop + 1*Label
2*Label + 1*Loop
3*Label

No amount of linear Parallel Processes could induce an L state as hypothesized.

As it was impossible to measure the execution rate of each loop with less than 1 instruction per loop, L can only be approximated as being somewhere between the highest non-L state (2*Label, 600166 lines per second) and the lowest observed L state (3*Loop, 800222 lines per second).
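As a rough cross-check of these bounds, the combined rate of each combination can be computed from the raw counts in the Results. This sketch assumes the rates of separate events simply add, which the experiment itself does not verify; the function name is made up.

```python
RUN_SECONDS = 60
LOOP_TOTAL = 12003333    # Loop count from the Results
LABEL_TOTAL = 18005000   # Label count from the Results

def combined_rate(loops, labels):
    """Iterations per second for a mix of Loop and Label events, truncated."""
    return (loops * LOOP_TOTAL + labels * LABEL_TOTAL) // RUN_SECONDS

combos = {
    "2*Label": combined_rate(0, 2),
    "3*Loop": combined_rate(3, 0),
    "2*Loop + 1*Label": combined_rate(2, 1),
    "2*Label + 1*Loop": combined_rate(1, 2),
    "3*Label": combined_rate(0, 3),
}

for name, rate in combos.items():
    print(f"{name}: {rate}")
```

Under that additive assumption, 2*Label gives 600166 and 2*Label + 1*Loop gives 800222, while 3*Loop comes out at about 600166 as well, so the two bounds are best read as approximate.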

Posts

How did you reach the conclusion that "call event" uses 0.0167sec for overhead?

I set up 3 events: 1 event to be called, with a counter which adds 1 for each call; a pp event which triggers if a switch is true and repeatedly calls the first event in a label loop; and a third event, activated upon action, which turns the switch on, waits for 1.0 sec and then turns the switch off.

If what you said about a 0.0167sec overhead for the call event command is true, then it should have been impossible for that counter to reach 150000 within that 1 second, yet it does.
Good article. Spoken like a true boss.
Excellent information. This should help when I try to reduce some of the lag in my game.
@Kazesui: It seems I made a slight error in that regard. By the time I finished writing the article it was 4am, and I think I made it as an assumption based on my guess of how a Parallel Process loops and why it is so slow compared to everything else. In any case, I plan to rectify that now as I looked further into it!

The Overhead of a Common Event

It seems my previous claim for the overhead of a common event was inaccurate. Upon further study I found that it is both worse and better than I thought at the same time, depending on the circumstances!

Before I start showing you numbers, keep in mind that this is relative. Any code I put into the loops will slow them down significantly as before I was only comparing the delay between the end of an iteration and the beginning of the next iteration. If you conduct versions of my experiments and feel like posting the results go ahead, but also remember to post the control values of the loops running empty (that is, with only a counter in them) or the data is useless.

To save you scrolling up and down, here are the results of my loops with only counters in them (ips is iterations per second):

-Parallel Processes 60ips
-Loops inside an event, 200055ips
-Label Loops inside an event, 300083ips

For the sake of this having some real world application I used a moderate but linear event (no loops in it) as the Common Event I'd be testing with. It's one I have in my game that resets all the characters' health to full, with a few variations in if statements depending on the mode the event is running in. It's a very simple but useful event, and appropriate for this test I think. I got some really strange results.

When I pasted the code for the event into the Loop itself, it executed at 13337ips and didn't lag the game. You'll notice this number is significantly slower than when the event is empty - remember I said everything affects the latency of a loop, the point is it doesn't lag the game. But when I had that same code in a Common Event and the Loop was calling it every iteration I didn't need to worry about what the ips was because the game became jittery as shit, the lag made it unplayable. The moral of the story: calling Common Events over and over in a Loop is a no go!

I didn't bother testing the Parallel Process with the code itself inside as I knew it'd run properly, but what I didn't expect was how stable the Parallel Process would be with Common Event calls in it. The Parallel Process still executed at a steady 60ips with FORTY Common Event calls in it. Slow, but steady eh? However, when I pushed it further it'd lag the actual game, making it unplayable. You can safely use a lot of Common Events in this fella with no additional latency at all!

Label Loops weren't much better off than Loops. With the code of the Common Event inside the Label Loop it happily executed at 13636ips with no ingame lag. Similar to the Loop, replacing that code with a call to a Common Event made the game so laggy it was unplayable. Code only inside Label Loops please!

So there you have it. I can't give you the exact overhead of a Common Event as I previously tried to. In the best of cases, Parallel Processes, there is no overhead. In the worst of cases, Loops and Label Loops, the game becomes so laggy it's unplayable, but this can be avoided by using the other techniques I suggested in the original article.
I thought it was weird for call event to have that adverse effect, considering I was pretty sure I had tested them in some similar manner, and also because of how much I use them in my events, which need to be updated around once every 0.0167 sec. Any extra 0.0167 sec per event call would have been pretty drastic there.
I don't think I've called too many big events in label loops though, as the only time it would seem practical is if the screen is static at the time, so I guess I don't know too much about its potential ill effects. As long as one follows your advice of having at least a 0.0 wait in the loop somewhere, it doesn't seem to cause any noticeable problems, for me at least.

Using the method I mentioned earlier I get
60 ips for pp loops
200,000 ips for loops, and
300,000 ips for label loops.

I've done quite a bit of testing on these things as well, which has revealed to me interesting and sad things, like comments taking an instruction cycle, and even empty lines as well, which is why "loops" are slower than labels.
Doing variable operation on any variable range seems to be just as fast as any other variable operation though, which is good for optimizing code with lot of similar variable operations. In other words, being thoughtful about the location of your various variables might help as well.

Another thing I found peculiar was your claim about nesting if statements not being good unless there's a loop in it which you could prevent.
When I add a branch to the label loop with just a counter, it tells me that if the condition is permanently false, the if statement will only execute 2 instructions for each iteration (the check, and a jump to the command after "end" I suppose), regardless of the number of instructions inside of it.
The problem with nesting them comes from all the empty lines and the "end" lines, when you get further into the nests, which still get read.

an "if nest" with 4 if's and only additional code in the final one, will execute 11 lines if all if's were true except the last one. This is regardless of the amount of commands within the top if (which btw. is almost all the lines if all if's are empty). If only the first if was true, it would only execute 5 lines.
You could use labels at the end of each nest to optimize it though. This could cut the 3 of 4 ifs true scenario down to 6 lines. You'll just need a lot of labels in case you have many nested statements.

The way I determine this is to take 600,000 / lines to read, and it will return the same value as the counter at the end of a session using my label loop method. It has held true for all my observations so far.
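That rule of thumb is easy to sketch. The 600,000 constant and the line counts for the if-nest cases come from this post; reading the plain label loop as 2 lines and the plain Loop as 3 lines is an inference from the measured 300,000 and 200,000 ips, not something stated outright.

```python
LINES_PER_SECOND = 600_000  # the constant used in the rule of thumb above

def expected_ips(lines_per_iteration):
    """Predicted iterations per second for a loop that reads this many lines."""
    return LINES_PER_SECOND // lines_per_iteration

# Line counts: 2 and 3 are inferred; 5, 6 and 11 are the if-nest cases above.
for lines in (2, 3, 5, 6, 11):
    print(lines, "lines ->", expected_ips(lines), "ips")
```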

So the bottom line is something you've already mentioned, I suppose. Labels are very good for optimization, at the potential cost of making spaghetti code of your events.

Hope I'm not sounding too negative or anything though, as there is very much good in this tutorial.
Not at all. I cherish any constructive feedback I get and thank you for continuing to take the time to read and consider my article. :D

author=Kazesui
I thought it was weird for call event to have that adverse effect, considering I was pretty sure I had tested them in some similar manner, and also because of how much I use them in my events, which need to be updated around once every 0.0167 sec. Any extra 0.0167 sec per event call would have been pretty drastic there.
I don't think I've called too many big events in label loops though, as the only time it would seem practical is if the screen is static at the time, so I guess I don't know too much about its potential ill effects. As long as one follows your advice of having at least a 0.0 wait in the loop somewhere, it doesn't seem to cause any noticeable problems, for me at least.
There's not a lot I can think of that would require upfront computational speed in rm2k3. The only one I've come up with so far is the sorting algorithms I've implemented for the Item and Materia submenus, which can take up to a second to execute in their current implementation. Before I start the computation I just pop a message for the player that just says "Calculating..." or something as the wait time is sadly unavoidable until I get around to putting in an O(nlogn) sorting algorithm.

author=Kazesui
Using the method I mentioned earlier I get
60 ips for pp loops
200,000 ips for loops, and
300,000 ips for label loops.

I've done quite a bit of testing on these things as well, which has revealed to me interesting and sad things, like comments taking an instruction cycle, and even empty lines as well, which is why "loops" are slower than labels.
Doing variable operation on any variable range seems to be just as fast as any other variable operation though, which is good for optimizing code with lot of similar variable operations. In other words, being thoughtful about the location of your various variables might help as well.
I was aware of variable operations all seeming to take the same time. I expected modulo and divide to take a bit longer, but apparently they're all as bad as each other. I wasn't aware of the comments thing though, that is sad. :( I guess a part of me blindly hoped there was a precompiler that removed them or something.

author=Kazesui
Another thing I found peculiar was your claim about nesting if statements not being good unless there's a loop in it which you could prevent.
When I add a branch to the label loop with just a counter, it tells me that if the condition is permanently false, the if statement will only execute 2 instructions for each iteration (the check, and a jump to the command after "end" I suppose), regardless of the number of instructions inside of it.
The problem with nesting them comes from all the empty lines and the "end" lines, when you get further into the nests, which still get read.

an "if nest" with 4 if's and only additional code in the final one, will execute 11 lines if all if's were true except the last one. This is regardless of the amount of commands within the top if (which btw. is almost all the lines if all if's are empty). If only the first if was true, it would only execute 5 lines.
You could use labels at the end of each nest to optimize it though. This could cut the 3 of 4 ifs true scenario down to 6 lines. You'll just need a lot of labels in case you have many nested statements.

The way I determine this is to take 600,000 / lines to read, and it will return the same value as the counter at the end of a session using my label loop method. It has held true for all my observations so far.
It seems you've helped me discover another lapse in my judgement. Up until now I had thought that all code was "read over" even if it wasn't executed because I had observed this behaviour in my ATB algorithm - huge chunks would not be touched when the enemies that used them weren't in play for that battle, yet the ips of the event would not change regardless of how little or much of the code was active. It turns out that this was because I had put a 0.0 wait at the end and that had regulated the execution time. When I take out the 0.0s wait there's a huge disparity in the ips depending on the enemies that are active. This does give rise to the existence of a behaviour that I still don't fully understand - why execution time of instructions seems to give way once you start putting a wait in. I've tried to think of ways that could work, but as of writing it completely escapes me why this behaviour exists.
I haven't read much of this yet ('cause I get distracted a lot), but I know I'm gonna love it, and it's gonna make RPG Maker life a lot easier! ^O^
Vanit, you really should update the tutorial with all the changes discussed in the comments. I just read a whole bunch of crap that is wrong and now I have to unlearn it, which is ironically what you say at the start of this article about someone else's article.

This is all extremely useful information, just remove the wrong information.

If you even come around here any more...