I had the privilege to contribute key ideas to the new MAVLink Gimbal Protocol v2. The new protocol not only overcomes the many issues with the old gimbal protocol, but also introduces the new concept of a „manager“ and a „device“ for managing the communication with a MAVLink peripheral. Albeit simple in concept, it is not as simple as the old gimbal protocol. So I dedicate this page to some discussion of the new protocol.
The central part of the concept is the gimbal manager. On the one hand, essentially all communication related to the control of the gimbal is going through – and has to go through – the gimbal manager (with few exceptions). This enables the gimbal manager to control all the communication with the gimbal, and thus prevent and resolve conflicts in cases where several parties want to control the gimbal. This, on the other hand, is the gimbal manager’s main job and responsibility, to manage several parties and to deconflict in case of conflicting requests.
On the protocol level this is achieved by a strict separation of the messages (with few exceptions) into those which describe the communication between gimbal manager and gimbal and those which describe the communication between gimbal manager and the other parties. The messages between gimbal manager and gimbal are named GIMBAL_DEVICE_XXX, and those between gimbal manager and the other parties are named GIMBAL_MANAGER_XXX.
Thus, the principle is simply this:
Only the gimbal manager is allowed to control the gimbal by sending GIMBAL_DEVICE_XXX messages, while all other parties must use GIMBAL_MANAGER_XXX messages.
Each gimbal is associated to one gimbal manager, and – in order to not have to rely on the goodwill of the other parties to follow the rules but to enforce them – we require:
A gimbal must listen only to the GIMBAL_DEVICE_XXX messages from its gimbal manager, but not to those from other gimbal managers, and it especially must not listen to any GIMBAL_MANAGER_XXX messages.
Makes sense, right :).
For notation, the gimbal is called „gimbal device“ in that concept. The reason is that a physical gimbal can be both gimbal manager and gimbal device; gimbal manager and gimbal device are logical entities, and not physical devices (they are actually pieces of code running on some device). This should become clearer below. Also, what I’ve called „other parties“ so far I will call „clients“ henceforth.
A pictorial representation of the concept can thus looks as this:
In addition, the gimbal manager can support higher level operations, such as tracking or whatever comes to mind. And indeed, while the GIMBAL_DEVICE_XXX messages carry grass-root data fields such as a attitude quaternion, the GIMBAL_MANAGER_XXX messages generally are more „user-friendly“. We thus find:
The gimbal manager also provides a higher-level API or abstraction to the clients.
From the perspective of the user or integrator, this is probably the most relevant aspect of the concept. But it shall not be the topic of this article.
A primary challenge for any concept is to manage the various possible states in which the gimbal device, the gimbal manager, and the clients can be. On the protocol level this is achieved by a set of flags, and a set of messages which are regularly reporting the flags. On the code level it is the job of the gimbal manager to sort things out. The flags can be broadly distinguished into capability flags and status flags. The capability flags describe the static properties and features of the gimbal device and the gimbal manager, which cannot change during run-time. The status flags collect all those flags which are dynamic, i.e., can change during run-time (dynamic doesn’t mean here that they actually change, only that they could change; similarly, static doesn’t mean that they do not change, it means that they cannot change). The handling is as such:
The clients can request changes in the flags, the gimbal manager will evaluate them and make a decision, and (regularly) broadcast its flags. That way the clients can determine if their request was accepted or not.
It is with the flags where the new gimbal protocol becomes a bit more difficult than the old. In fact, it took me some significant effort to work out an initialization protocol which is relatively simple to implement, yet safe and flexible. But before I go into this, I first want to discuss in the next chapter the relation to the physical communication links.
Physical Communication Links
Two Clients, one Gimbal
The simplest system to consider consists of three MAVLink components, a GCS, an autopilot, and a gimbal of course. However, as regards the above concept, this actually gives rise to two configurations: The gimbal device is clearly located on the gimbal, however the gimbal manager can be located on the autopilot but also on the gimbal (which means that the piece of code which is doing the gimbal manager function is either part of the autopilot code or part of the gimbal code):
Technically, the gimbal manager could also be located on the GCS, but let’s not consider this. From the perspective of the clients, i.e. the GCS and the autopilot, both arrangements are equally good, since they both would communicate only with the gimbal manager.
In the arrangement with the gimbal manager on the gimbal, GIMBAL_DEVICE_XXX messages would never be physically exchanged, and the gimbal would expose only the GIMBAL_MANAGER_XXX message API to the clients. The gimbal however is then responsible to e.g. deconflict if conflicting request are coming from the GCS and the autopilot, i.e., it has to do exactly what the gimbal manager’s job is to do.
A gimbal which integrates the gimbal manager (and the gimbal device), is called a „smart gimbal“.
It is added in passing, that a gimbal device does not always have to be present. Such a situation can occur if the gimbal is in fact „dump“ and does not support the new MAVLink protocol. This could be either a gimbal which only speaks the old MAVLink gimbal protocol, or a gimbal which is controlled by other means such as PWM signals. In these cases, the gimbal manager would also do the traditional communication with the gimbal in order to mimic a protocol-v2 gimbal to the clients. This could also be done by the gimbal device, if that better suits one’s needs. The take-home message is that also „dump“ gimbals can well be handled in the concept.
Three Clients, one Gimbal
Let’s now consider a system which is made up of a GCS, an autopilot, a companion computer, and a gimbal.
These four components can be connected to a MAVLink network in different ways with different topologies. Three typical examples are given below. In addition, for each topology, the gimbal manager can be located differently, on the autopilot, on the companion computer, or on the gimbal. This thus gives a rich set of different configurations, which are all possible within the concept without need of any special precautions (except of course that the gimbal manager must be implemented on the respective device it is located on).
Of course, the MAVLink components which are at the interconnection of two or more links need to have routing capabilities, i.e., forward incoming MAVLink messages to the other links according to the MAVLink routing rules.
Locating the gimbal manager on the companion computer can often be of advantage. It could be realized as e.g. a Python script and would thus be most easily modified or extended as needed.
In this topology, the GCS, companion computer and gimbal are connected to the autopilot, yielding a star-like network with the autopilot at the center. This is probably the most widely considered arrangement when companion computers are used. It has the advantage that a router only needs to be implemented on the autopilot, which is the case anyways, so that it’s relatively simple to get it going. The disadvantage might be that it uses relatively many links (UART ports) on the autopilot, which it may not provide.
As discussed, the gimbal manager can be located on the autopilot, companion computer, or gimbal, which gives rise to these three scenarios:
In this topology, the four components are connected to a chain, with the companion computer in between the GCS and the autopilot. This topology is akin to that used in the 3DR Solo, and that’s why I called it „3DR Solo like“, but the 3DR Solo certainly did not invent this topology.
The disadvantage of this topology, at least for a DIY hobbyists like me, is that the companion computer (and thus potentially also the RC control) sits in between ground control and autopilot, which means that any code mistake on the companion can result in a loss of control of the vehicle. It is thus not my preferred arrangement for developing, testing, experimenting (= fooling around :)).
Again, the gimbal manager can be located on the autopilot, companion computer, or gimbal, giving rise to these scenarios:
This is the last example I’d like to show. Here, the companion computer is in between autopilot and gimbal. For brevity, I have not plotted the cases with the gimbal manager on the autopilot or the gimbal, you certainly got the pattern now.
This is my preferred arrangement for my own experiments.
This picture sketches the initialization protocol I ended up with:
Looks complicated, but it is less complicated than it looks like. The main task to achieve at initialization is a controlled flow of IDs, such as the gimbal device ID, and of the capability and status flags. After initialization all involved parties, gimbal device, gimbal manager and clients, should have synchronized flags. And this is what this protocol achieves.
It also avoids the need for excessive configuration to get things going, i.e., uses auto-discovery wherever possible. This minimizes the need of parameters, and the need to configure them.
The sequence in words:
1. Waits for autopilot HEARTBEAT to auto determine system ID (can be skipped if system ID is pre configured).
2. Starts sending HEARTBEAT and GIMBAL_DEVICE_ATTITUDE_STATUS
3. Waits for GIMBAL_MANAGER_STATUS from its gimbal manager. Determines component ID of gimbal manager.
4. Starts accepting GIMBAL_DEVICE_SET_ATTITUDE messages.
5. Responds to GIMBAL_DEVICE_INFORMATION requests at any time after having determined system ID.
1. Waits for gimbal HEARTBEAT. Determines component ID of gimbal, and sets gimbal_device_id to it.
2. Waits for GIMBAL_DEVICE_ATTITUDE_STATUS from its gimbal device. Determines gimbal device flags if wanted.
3. Starts sending GIMBAL_MANAGER_STATUS with desired gimbal manager flags.
4. Starts sending GIMBAL_DEVICE_SET_ATTITUDE with desired gimbal device flags.
5. Requests GIMBAL_DEVICE_INFORMATION and waits for response.
6. Starts to responds to GIMBAL_MANAGER_INFORMATION requests.
1. Waits for gimbal HEARTBEAT. Determines component ID of gimbal.
2. Waits for GIMBAL_MANAGER_STATUS from the associated gimbal manager. Determines component ID of gimbal manager. Ignores GIMBAL_DEVICE_ATTITUDE_STATUS messages up to this point.
3. Requests GIMBAL_DEVICE_INFORMATION (if not already received) and GIMBAL_MANAGER_INFORMATION and waits for responses.
4. Ready to go.
Depending on the desired situation, the initialization protocol could be chosen to be different. For instance, if pre-configured gimbal device flags should be used then the gimbal manager wouldn’t have to wait for a GIMBAL_DEVICE_ATTITUDE_STATUS message before it starts sending GIMBAL_MANAGER_STATUS messages. However, it is not very harmful to make the gimbal manager to wait for this message, since it is transmitted relatively frequently by the gimbal device. Based on such considerations I arrived at the shown scheme, which is relatively simple to implement and yet allows us to accommodate the various situations.
It also accounts for these two cases: The gimbal device flags are (1) pre-configured in the gimbal device or (2) pre-configured in the gimbal manger. The first case is trivial to handle; one just has to require that the gimbal manager has received a GIMBAL_DEVICE_ATTITUDE_STATUS message before it starts sending messages or responding to messages on its own. The second case is more tricky since a regression exists in that the gimbal device must have received a GIMBAL_MANAGER_STATUS message in order to discover its gimbal manager but the new pre-configured gimbal device flags are sent later to it, which introduces a short moment of ambiguity. However, this is of little practical relevance so I didn’t bother with this; the proposed scheme works well for both cases.
The Simplest Gimbal Manager
The concept may sound complicated. And it can be so, if applied to complicated situations. It is however maybe not so surprising that a complicated situation is, well, complicated. This chapter shows that if one just wants to do what one has done so far with protocol v1, it is really very simple.
All one needs to do is to send at least one GIMBAL_MANAGER_STATUS message to the gimbal:
This will announce the gimbal manager to the gimbal: The gimbal device will determine the component ID of the gimbal manager from this message, and enable being controlled by GIMBAL_DEVICE_SET_ATTITUDE messages. The operator now can fully control the gimbal by using the GIMBAL_DEVICE_SET_ATTITUDE message, and retrieve its status by listening to the GIMBAL_DEVICE_ATTITUDE_STATUS messages emitted by the gimbal.
This is all it needs to control the gimbal 🙂
Therefore, it should be very simple to convert any existing code to using the gimbal protocol v2 for controlling the gimbal.
However, the catch, this obviously does not also provide the front-end of the gimbal manager, i.e., its API, as this in addition would require implementing the GIMBAL_MANAGER_XXX messages. So, I was actually lying to you then I was declaring that this would be the simplest „gimbal manager“. It is not a gimbal manager in the true sense, as it misses the front-end. So, the converted code may still use the old protocol v1 methods for doing the high-level stuff like POI and so on. The point of this chapter was to emphasize, that the control of the gimbal itself has not become much more difficult than before, and can be easily accommodated.
A not so trivial aspect of controlling a gimbal is related to its attitude. This topic is actually unrelated to protocol v2, but is a fundamental point true for any gimbal and had been around since ever – it is just that it may not have been so apparent so far and hence may not have become so much attention.
Attitude can be represented in many ways, and probably the best known in the drone area are (aeronautic) Euler angles, DCM, and quaternion, and I assume that the respective advantages and disadvantages are also known to the reader. A strong feature of the new protocol v2 is that it consistently uses quaternions for the communication with the gimbal device, i.e., in the GIMBAL_DEVICE_XXX messages. However, in many cases one nevertheless wants to, or maybe even has to, convert to and from Euler angles. And this poses some issue since the aeronautic Euler angles, which are generally used in that area, do note bode well with gimbals.
Comment: For the context, I’d like to remind the reader that there are in total 12 different sets of angles allowed by Euler’s theorem, sometimes classified as Euler and Tait-Bryan or Cardan angles, proper or classic and improper angles, and so on. I choose the position to call them all Euler angles.
This is not difficult to see: The aeronautic Euler angles are characterized by having the gimbal lock at pitch = +-90°. However, tilting the camera down by 90° is really a very common and desired situation with gimbals! If one thus would use the aeronautic Euler angles for describing the gimbal orientation one would run in all sorts of mathematical issues. If one would, e.g., display them, then the roll and yaw angles would just go nuts when one approaches pitch = +-90°, even though the gimbal does behave perfectly fine! Not very user-friendly.
For the large majority of gimbals, Euler angles with the gimbal lock at roll = +-90° are the appropriate choice. And usually it is the Yaw-Roll-Pitch Euler angle set (the aeronautic Euler angles correspond to Yaw-Pitch-Roll in this notation). I will refer to them as „gimbal Euler angles“ in the following.
Unfortunately, the angles calculated with one Euler set are not consistent with those calculated with another Euler set. That is, one obtains numerically different values even for exactly the same camera orientation.
Gladly, when roll is zero, both the aeronautic and the gimbal Euler angles coincide. That is, pitch and yaw angles calculated with either of them are then exactly identical. This is great, since for nearly all control purposes roll is in fact chosen to be zero and only pitch and yaw is controlled (in fact, the gimbal manager API largely exposes only pitch and yaw, for that reason). Thus, here the task is simple, as we now can use either Euler angle set for converting the desired pitch and yaw angles into a desired quaternion.
Unless we want to nudge. Nudging corresponds to adding a small deflection to the primary target orientation. Three ways to do this are possible:
Choice 3 makes probably most sense, so it is what I would recommend to do (and I am doing myself), but in the code it may require recalculations of angles and quaternions.
The situation is very different for the attitude reported by the gimbal via the GIMBAL_DEVICE_ATTITUDE_STATUS message. Here, roll is not zero. It may be very small for most of the time, but it nearly never will be exactly zero, and the numerical issues mentioned earlier fully take place.
My recommendation here is to not use the aeronautical Euler angles, but the gimbal Euler angles. This removes the instability, and in addition the pitch and yaw angles are consistent with the control angles, for the reason just discussed.
My Gimbal Manager Design
A great feature of the new protocol v2 is that the gimbal manager behavior can be determined by the implementer to a very large degree, which opens lots of flexibility to accommodate one owns needs and wishes. All responsibility is with the gimbal manager. This also includes the responsibility for bad behavior! That is, if things don’t work out properly, you know whom to blame: the gimbal manager implementation :).
I designed „my“ gimbal manager following these ideas:
● Four clients are supported, MISSION, COMPANION, GCS and RC. While for the COMPANION and GCS clients corresponding MAVLink components are existing, this is not so for the MISSION and RC clients. Both the MISSION and RC clients are implicitly tied to the autopilot. In the case of the MISSION client it actually depends on whether the gimbal manager is located on the autopilot or not. For the RC client it is obvious, since the RC receiver which provides the RC signals is connected to the autopilot.
Comment: The current version of the protocol v2 does not support more than 2 clients, but this can be overcome with a simple extension of the flags, as discussed here.
● The priority for overriding is as follows: COMPANION > GCS > MISSION > RC.
● Only the client which currently holds the overriding status can execute the higher level CMDs.
● The COMPANION and GCS clients can request override status for both itself and the RC client. That is, only the COMPANION and the GCS can set RC_OVERRIDE or RC_NUDGE.
● All clients can request override status for the MISSION client, as well as terminate it (this is somewhat not so nice, and I will probably replace by something more appropriate).
● The overriding client cannot also nudge (this is an obvious one).
● Every client can nudge, if at least one client has overriding status. That is, in the extreme case up to three clients would nudge. The second condition requires that at least one client is in overriding state, whish is to prevent that every client is just nudging.
● If a client isn’t sending for a certain timeout period (e.g. 10 seconds), then its override and nudge flags are reset. In normal conditions this never should occur, but it’s a convenient safety net.
● MAVLink official documentation: MAVLink Gimbal Protocol v2
● Protocol design document: Proposal for new gimbal messages
● Collection of ideas: gimbals.txt
● Main issue in the MAVLink github repo: #1174