mirror of
https://github.com/godotengine/godot-docs.git
synced 2026-01-07 02:12:07 +03:00
259 lines
11 KiB
ReStructuredText
259 lines
11 KiB
ReStructuredText
.. _doc_cpu_optimization:
|
|
|
|
CPU Optimizations
|
|
=================
|
|
|
|
Measuring performance
|
|
=====================
|
|
|
|
To know how to speed up our program, we have to know where the "bottlenecks"
|
|
are. Bottlenecks are the slowest parts of the program that limit the rate that
|
|
everything can progress. This allows us to concentrate our efforts on optimizing
|
|
the areas which will give us the greatest speed improvement, instead of spending
|
|
a lot of time optimizing functions that will lead to small performance
|
|
improvements.
|
|
|
|
For the CPU, the easiest way to identify bottlenecks is to use a profiler.
|
|
|
|
CPU profilers
|
|
=============
|
|
|
|
Profilers run alongside your program and take timing measurements to work out
|
|
what proportion of time is spent in each function.
|
|
|
|
The Godot IDE conveniently has a built in profiler. It does not run every time
|
|
you start your project, and must be manually started and stopped. This is
|
|
because, in common with most profilers, recording these timing measurements can
|
|
slow down your project significantly.
|
|
|
|
After profiling, you can look back at the results for a frame.
|
|
|
|
.. image:: img/godot_profiler.png
|
|
|
|
`These are the results of a profile of one of the demo projects.`
|
|
|
|
.. note:: We can see the cost of built-in processes such as physics and audio,
|
|
as well as seeing the cost of our own scripting functions at the
|
|
bottom.
|
|
|
|
When a project is running slowly, you will often see an obvious function or
|
|
process taking a lot more time than others. This is your primary bottleneck, and
|
|
you can usually increase speed by optimizing this area.
|
|
|
|
For more info about using the profiler within Godot see
|
|
:ref:`doc_debugger_panel`.
|
|
|
|
External profilers
|
|
~~~~~~~~~~~~~~~~~~
|
|
|
|
Although the Godot IDE profiler is very convenient and useful, sometimes you
|
|
need more power, and the ability to profile the Godot engine source code itself.
|
|
|
|
You can use a number of third party profilers to do this including Valgrind,
|
|
VerySleepy, Visual Studio and Intel VTune.
|
|
|
|
.. note:: You may need to compile Godot from source in order to use a third
|
|
party profiler so that you have program database information
|
|
available. You can also use a debug build, however, note that the
|
|
results of profiling a debug build will be different to a release
|
|
build, because debug builds are less optimized. Bottlenecks are often
|
|
in a different place in debug builds, so you should profile release
|
|
builds wherever possible.
|
|
|
|
.. image:: img/valgrind.png
|
|
|
|
`These are example results from Callgrind, part of Valgrind, on Linux.`
|
|
|
|
From the left, Callgrind is listing the percentage of time within a function and
|
|
its children (Inclusive), the percentage of time spent within the function
|
|
itself, excluding child functions (Self), the number of times the function is
|
|
called, the function name, and the file or module.
|
|
|
|
In this example we can see nearly all time is spent under the
|
|
`Main::iteration()` function, this is the master function in the Godot source
|
|
code that is called repeatedly, and causes frames to be drawn, physics ticks to
|
|
be simulated, and nodes and scripts to be updated. A large proportion of the
|
|
time is spent in the functions to render a canvas (66%), because this example
|
|
uses a 2d benchmark. Below this we see that almost 50% of the time is spent
|
|
outside Godot code in `libglapi`, and `i965_dri` (the graphics driver). This
|
|
tells us the a large proportion of CPU time is being spent in the graphics
|
|
driver.
|
|
|
|
This is actually an excellent example because in an ideal world, only a very
|
|
small proportion of time would be spent in the graphics driver, and this is an
|
|
indication that there is a problem with too much communication and work being
|
|
done in the graphics API. This profiling lead to the development of 2d batching,
|
|
which greatly speeds up 2d by reducing bottlenecks in this area.
|
|
|
|
Manually timing functions
|
|
=========================
|
|
|
|
Another handy technique, especially once you have identified the bottleneck
|
|
using a profiler, is to manually time the function or area under test. The
|
|
specifics vary according to language, but in GDScript, you would do the
|
|
following:
|
|
|
|
::
|
|
|
|
var time_start = OS.get_system_time_msecs()
|
|
|
|
# Your function you want to time
|
|
update_enemies()
|
|
|
|
var time_end = OS.get_system_time_msecs()
|
|
print("Function took: " + str(time_end - time_start))
|
|
|
|
|
|
You may want to consider using other functions for time if another time unit is
|
|
more suitable, for example :ref:`OS.get_system_time_secs
|
|
<class_OS_method_get_system_time_secs>` if the function will take many seconds.
|
|
|
|
When manually timing functions, it is usually a good idea to run the function
|
|
many times (say ``1000`` or more times), instead of just once (unless it is a
|
|
very slow function). A large part of the reason for this is that timers often
|
|
have limited accuracy, and CPUs will schedule processes in a haphazard manner,
|
|
so an average over a series of runs is more accurate than a single measurement.
|
|
|
|
As you attempt to optimize functions, be sure to either repeatedly profile or
|
|
time them as you go. This will give you crucial feedback as to whether the
|
|
optimization is working (or not).
|
|
|
|
Caches
|
|
======
|
|
|
|
Something else to be particularly aware of, especially when comparing timing
|
|
results of two different versions of a function, is that the results can be
|
|
highly dependent on whether the data is in the CPU cache or not. CPUs don't load
|
|
data directly from main memory, because although main memory can be huge (many
|
|
GBs), it is very slow to access. Instead CPUs load data from a smaller, higher
|
|
speed bank of memory, called cache. Loading data from cache is super fast, but
|
|
every time you try and load a memory address that is not stored in cache, the
|
|
cache must make a trip to main memory and slowly load in some data. This delay
|
|
can result in the CPU sitting around idle for a long time, and is referred to as
|
|
a "cache miss".
|
|
|
|
This means that the first time you run a function, it may run slowly, because
|
|
the data is not in cache. The second and later times, it may run much faster
|
|
because the data is in cache. So always use averages when timing, and be aware
|
|
of the effects of cache.
|
|
|
|
Understanding caching is also crucial to CPU optimization. If you have an
|
|
algorithm (routine) that loads small bits of data from randomly spread out areas
|
|
of main memory, this can result in a lot of cache misses, a lot of the time, the
|
|
CPU will be waiting around for data instead of doing any work. Instead, if you
|
|
can make your data accesses localised, or even better, access memory in a linear
|
|
fashion (like a continuous list), then the cache will work optimally and the CPU
|
|
will be able to work as fast as possible.
|
|
|
|
Godot usually takes care of such low-level details for you. For example, the
|
|
Server APIs make sure data is optimized for caching already for things like
|
|
rendering and physics. But you should be especially aware of caching when using
|
|
GDNative.
|
|
|
|
Languages
|
|
=========
|
|
|
|
Godot supports a number of different languages, and it is worth bearing in mind
|
|
that there are trade-offs involved - some languages are designed for ease of
|
|
use, at the cost of speed, and others are faster but more difficult to work
|
|
with.
|
|
|
|
Built-in engine functions run at the same speed regardless of the scripting
|
|
language you choose. If your project is making a lot of calculations in its own
|
|
code, consider moving those calculations to a faster language.
|
|
|
|
GDScript
|
|
~~~~~~~~
|
|
|
|
GDScript is designed to be easy to use and iterate, and is ideal for making many
|
|
types of games. However, ease of use is considered more important than
|
|
performance, so if you need to make heavy calculations, consider moving some of
|
|
your project to one of the other languages.
|
|
|
|
C#
|
|
~~
|
|
|
|
C# is popular and has first class support in Godot. It offers a good compromise
|
|
between speed and ease of use.
|
|
|
|
Other languages
|
|
~~~~~~~~~~~~~~~
|
|
|
|
Third parties provide support for several other languages, including `Rust
|
|
<https://github.com/godot-rust/godot-rust>`_ and `Javascript
|
|
<https://github.com/GodotExplorer/ECMAScript>`_.
|
|
|
|
C++
|
|
~~~
|
|
|
|
Godot is written in C++. Using C++ will usually result in the fastest code,
|
|
however, on a practical level, it is the most difficult to deploy to end users'
|
|
machines on different platforms. Options for using C++ include GDNative, and
|
|
custom modules.
|
|
|
|
Threads
|
|
=======
|
|
|
|
Consider using threads when making a lot of calculations that can run parallel
|
|
to one another. Modern CPUs have multiple cores, each one capable of doing a
|
|
limited amount of work. By spreading work over multiple threads you can move
|
|
further towards peak CPU efficiency.
|
|
|
|
The disadvantage of threads is that you have to be incredibly careful. As each
|
|
CPU core operates independently, they can end up trying to access the same
|
|
memory at the same time. One thread can be reading to a variable while another
|
|
is writing. Before you use threads make sure you understand the dangers and how
|
|
to try and prevent these race conditions.
|
|
|
|
For more information on threads see :ref:`doc_using_multiple_threads`.
|
|
|
|
SceneTree
|
|
=========
|
|
|
|
Although Nodes are an incredibly powerful and versatile concept, be aware that
|
|
every node has a cost. Built in functions such as `_process()` and
|
|
`_physics_process()` propagate through the tree. This housekeeping can reduce
|
|
performance when you have very large numbers of nodes.
|
|
|
|
Each node is handled individually in the Godot renderer so sometimes a smaller
|
|
number of nodes with more in each can lead to better performance.
|
|
|
|
One quirk of the :ref:`SceneTree <class_SceneTree>` is that you can sometimes
|
|
get much better performance by removing nodes from the SceneTree, rather than
|
|
by pausing or hiding them. You don't have to delete a detached node. You
|
|
can for example, keep a reference to a node, detach it from the scene tree, then
|
|
reattach it later. This can be very useful for adding and removing areas from a
|
|
game for example.
|
|
|
|
You can avoid the SceneTree altogether by using Server APIs. For more
|
|
information, see :ref:`doc_using_servers`.
|
|
|
|
Physics
|
|
=======
|
|
|
|
In some situations physics can end up becoming a bottleneck, particularly with
|
|
complex worlds, and large numbers of physics objects.
|
|
|
|
Some techniques to speed up physics:
|
|
|
|
* Try using simplified versions of your rendered geometry for physics. Often
|
|
this won't be noticeable for end users, but can greatly increase performance.
|
|
* Try removing objects from physics when they are out of view / outside the
|
|
current area, or reusing physics objects (maybe you allow 8 monsters per area,
|
|
for example, and reuse these).
|
|
|
|
Another crucial aspect to physics is the physics tick rate. In some games you
|
|
can greatly reduce the tick rate, and instead of for example, updating physics
|
|
60 times per second, you may update it at 20, or even 10 ticks per second. This
|
|
can greatly reduce the CPU load.
|
|
|
|
The downside of changing physics tick rate is you can get jerky movement or
|
|
jitter when the physics update rate does not match the frames rendered.
|
|
|
|
The solution to this problem is 'fixed timestep interpolation', which involves
|
|
smoothing the rendered positions and rotations over multiple frames to match the
|
|
physics. You can either implement this yourself or use a third-party addon.
|
|
Interpolation is a very cheap operation, performance wise, compared to running a
|
|
physics tick, orders of magnitude faster, so this can be a significant win, as
|
|
well as reducing jitter.
|