Platform/GFX/Gralloc

From MozillaWiki

Everything that we know, and everything that we'd like to know, about Gralloc.

What is Gralloc?

Gralloc is a type of shared memory that is also shared with the GPU. A Gralloc buffer can be written to directly by regular CPU code, but can also be used as an OpenGL texture.

Gralloc is part of Android, and is also part of B2G.

This is similar to the functionality provided by the EGL_lock_surface extension, but EGL_lock_surface is not widely supported on Android/B2G.

Gralloc buffers are represented by objects of the class android::GraphicBuffer. See ui/GraphicBuffer.h.

We only use Gralloc buffers on B2G at the moment, because the locking semantics of Gralloc buffers tend to vary a lot between GPU vendors, and on B2G we can currently at least assume that we only have to deal with Qualcomm drivers. However, this got standardized in Android 4.2. See below.

Allocation and lifetime of Gralloc buffers

How Gralloc buffers are created and refcounted (non Mozilla-specific)

The android::GraphicBuffer class is refcounted and the underlying gralloc buffer is refcounted, too. It is meant to be used with Android Strong Pointers (android::sp). That's why you'll see a lot of

 android::sp<android::GraphicBuffer>.

That's the right way to hold on to a gralloc buffer in a given process. But since gralloc buffers are shared across multiple processes, and GraphicBuffer objects only exist in one process, a different type of object has to be actually shared and reference-counted across processes. That is the notion of a gralloc buffer, which is referenced by a file descriptor.

Think of the gralloc buffer as a file, and multiple file descriptors exist that refer to the same file. Just like what happens with normal files, the kernel keeps track of open file descriptors to it. To transfer a gralloc buffer across processes, you send a file descriptor over a socket using standard kernel functionality to do so.

So when a gralloc buffer is shared between two processes, each process has its own GraphicBuffer object with its own refcount; these share the same underlying gralloc buffer (but have different file descriptors opened for it). The sharing happens by calling GraphicBuffer::flatten to serialize and GraphicBuffer::unflatten to deserialize it. GraphicBuffer::unflatten will call mBufferMapper.registerBuffer to ensure that the underlying buffer handle is refcounted correctly.

When a GraphicBuffer's refcount goes to zero, the destructor will call free_handle, which calls mBufferMapper.unregisterBuffer, which will close the file descriptor, thus decrementing the refcount of the gralloc buffer.
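The two levels of reference counting described above can be illustrated with a small off-device model. This is only a sketch: SharedBuffer, LocalGraphicBuffer and ReceiveInOtherProcess are hypothetical names, std::shared_ptr stands in for android::sp, and a plain counter stands in for the file descriptors that the kernel tracks.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the underlying gralloc buffer: it only tracks
// how many per-process handles (file descriptors) are currently open on it.
struct SharedBuffer {
    int openHandles = 0;
    bool freed = false;
};

// Hypothetical stand-in for a per-process android::GraphicBuffer.
// Construction plays the role of registerBuffer, destruction the role of
// unregisterBuffer (closing this process's file descriptor).
class LocalGraphicBuffer {
public:
    explicit LocalGraphicBuffer(SharedBuffer& shared) : mShared(shared) {
        ++mShared.openHandles;
    }
    ~LocalGraphicBuffer() {
        if (--mShared.openHandles == 0) {
            mShared.freed = true;  // last descriptor closed: buffer can be reclaimed
        }
    }
    LocalGraphicBuffer(const LocalGraphicBuffer&) = delete;
    LocalGraphicBuffer& operator=(const LocalGraphicBuffer&) = delete;

private:
    SharedBuffer& mShared;
};

// Plays the role of flatten + unflatten: the receiving process gets its own
// object, with its own refcount, referring to the same underlying buffer.
inline std::shared_ptr<LocalGraphicBuffer> ReceiveInOtherProcess(SharedBuffer& shared) {
    return std::make_shared<LocalGraphicBuffer>(shared);
}
```

In this model, copying a std::shared_ptr within one process changes only that process's refcount; the underlying buffer is released only once every process has dropped its last local reference.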

The GraphicBuffer constructors take a "usage" bitfield. We should always pass HW_TEXTURE there, as we always want to use gralloc buffers as the backing surface of OpenGL textures. We also want to pass the right SW_READ_ and SW_WRITE_ flags.

The usage flags are a hint for performance optimization. Using the SW flags may disable all such optimizations. Since the CPU usually caches data, locking the buffer for software read/write requires cache maintenance to keep the data correct. However, other hardware that can use GraphicBuffer on Android (e.g. codec, camera, GPU) does not cache data in this way, so it can lock and unlock the buffer faster.

It definitely helps performance if we use the usage flags correctly to describe what the buffer is for. In particular, if the SW_READ/SW_WRITE usage flags are set, the GL driver and others will make sure to flush the cache after any rendering operation so that the memory is ready for software reading or writing. Only specify the flags that you need.

How we allocate Gralloc buffers

Most of our GraphicBuffer's are constructed by GrallocBufferActor::Create. This unconditionally uses SW_READ_OFTEN and SW_WRITE_OFTEN, which is probably bad at least for some use cases.

Our protocol to create GraphicBuffers is as follows. It's generally the content side that wants to create a new GraphicBuffer to draw to. It sends a message to the compositor side, which creates the Gralloc buffer and returns a serialized handle to it; then, back on the content side, we receive the serialized handle and construct our own Gralloc buffer instance from it.

In more detail (this is from Vlad's wiki page):

Content side:

  • Entry point: PLayerTransactionChild::SendPGrallocBufferConstructor (generally called by ISurfaceAllocator::AllocGrallocBuffer).
  • This sends a synchronous IPC message to the compositor side.


Over to the compositor side:

  • The message is received and this comes in as a call to PLayerTransactionParent::AllocPGrallocBuffer, implemented in LayerTransactionParent.cpp.
  • This calls GrallocBufferActor::Create(...), which actually creates the GraphicBuffer* and a GrallocBufferActor* (The GrallocBufferActor contains a sp<GraphicBuffer> that references the newly-created GraphicBuffer*).
  • GrallocBufferActor::Create returns the GrallocBufferActor as a PGrallocBufferParent*, and the GraphicBuffer* as a MaybeMagicGrallocBufferHandle.
  • The GrallocBufferActor/PGrallocBufferParent* is added to the LayerTransactionParent's managed list.
  • The MaybeMagicGrallocBufferHandle is serialized for reply (sending back the fd that represents the GraphicBuffer) -- using code in ShadowLayerUtilsGralloc ParamTraits<MGBH>::Write.


Back to the content side:

  • After the sync IPC call, the child receives the MaybeMagicGrallocBufferHandle, using ShadowLayerUtilsGralloc.cpp's ParamTraits<MGBH>::Read.
  • Allocates an empty GrallocBufferActor() to use as a PGrallocBufferChild.
  • Sets the previously created GrallocBufferActor/PGrallocBufferChild's mGraphicBuffer to the newly-received sp<GraphicBuffer>.
  • The GrallocBufferActor/PGrallocBufferChild is added to the LayerTransactionChild's managed list.
  • A SurfaceDescriptorGralloc is created using the PGrallocBufferChild, and returned to the caller.
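The round trip above can be sketched as a single-process model. Every type here (FakeGraphicBuffer, SerializedHandle, BufferActor, CompositorSide, ContentSide) is a hypothetical stand-in rather than the IPDL-generated class it imitates, and the synchronous IPC message is reduced to a direct function call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for android::GraphicBuffer.
struct FakeGraphicBuffer { uint32_t width; uint32_t height; };

// Stands in for MaybeMagicGrallocBufferHandle: what travels in the sync reply.
struct SerializedHandle { int bufferId; uint32_t width; uint32_t height; };

// Stands in for GrallocBufferActor on either side.
struct BufferActor {
    int bufferId = -1;
    std::shared_ptr<FakeGraphicBuffer> graphicBuffer;
};

class CompositorSide {
public:
    // Plays the role of AllocPGrallocBuffer + GrallocBufferActor::Create.
    SerializedHandle AllocBuffer(uint32_t w, uint32_t h) {
        std::unique_ptr<BufferActor> actor(new BufferActor());
        actor->bufferId = mNextId++;
        actor->graphicBuffer = std::make_shared<FakeGraphicBuffer>(FakeGraphicBuffer{w, h});
        SerializedHandle reply{actor->bufferId, w, h};
        mManaged.push_back(std::move(actor));  // parent-side managed list
        return reply;
    }
    std::size_t ManagedCount() const { return mManaged.size(); }

private:
    int mNextId = 0;
    std::vector<std::unique_ptr<BufferActor>> mManaged;
};

class ContentSide {
public:
    explicit ContentSide(CompositorSide& compositor) : mCompositor(compositor) {}

    // Plays the role of AllocGrallocBuffer: does not return before the reply.
    const BufferActor& AllocGrallocBuffer(uint32_t w, uint32_t h) {
        SerializedHandle reply = mCompositor.AllocBuffer(w, h);  // "sync IPC"
        std::unique_ptr<BufferActor> actor(new BufferActor());   // empty child actor
        actor->bufferId = reply.bufferId;
        // The child builds its own object from the received handle.
        actor->graphicBuffer = std::make_shared<FakeGraphicBuffer>(
            FakeGraphicBuffer{reply.width, reply.height});
        mManaged.push_back(std::move(actor));  // child-side managed list
        return *mManaged.back();
    }
    std::size_t ManagedCount() const { return mManaged.size(); }

private:
    CompositorSide& mCompositor;
    std::vector<std::unique_ptr<BufferActor>> mManaged;
};
```

The point of the model is that each allocation leaves one actor in each side's managed list, and that the content side cannot proceed until the compositor side has finished its work.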

How we manage the lifetime of Gralloc buffers

As said above, what effectively controls the lifetime of gralloc buffers is reference counting, by means of android::sp pointers.

Most of our gralloc buffers are owned in this way by GrallocBufferActor's. The question then becomes, what controls the lifetime of GrallocBufferActors?

GrallocBufferActors are "managed" by IPDL-generated code. When they are created by the above-described protocol, as said above, they are added to the "managee lists" of the LayerTransactionParent on the compositor side, and of the LayerTransactionChild on the content side.

GrallocBufferActors are destroyed when either a "delete" IPC message is sent (see: Send__delete__) or the top-level IPDL manager goes away.

Unresolved problems

We don't have a good way of passing the appropriate USAGE flags when creating gralloc buffers. In most cases, we shouldn't pass SW_READ_OFTEN. If the SyncFrontBufferToBackBuffer mechanism requires that, that's sad and we should try to fix it (by doing this copy on the GPU). In many cases, it also doesn't make sense to pass SW_WRITE_OFTEN: that basically only makes sense for Thebes layers, and for Canvas2D if not using SkiaGL, and it doesn't make any sense for WebGL, SkiaGL canvas, or video. This is getting resolved by the patch in bug 843599 (vlad).
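As an illustration of what per-use-case flags could look like, here is a minimal sketch. It is not the patch from bug 843599: ContentKind, UsageFor and NeedsCpuAccess are hypothetical names, and the numeric values are assumed to mirror the GRALLOC_USAGE_* constants in Android's hardware/gralloc.h, which should be checked against the target tree.

```cpp
#include <cassert>
#include <cstdint>

// Assumed to mirror GRALLOC_USAGE_* in hardware/gralloc.h; verify before use.
constexpr uint32_t USAGE_SW_READ_MASK   = 0x0000000F;
constexpr uint32_t USAGE_SW_WRITE_OFTEN = 0x00000030;
constexpr uint32_t USAGE_SW_WRITE_MASK  = 0x000000F0;
constexpr uint32_t USAGE_HW_TEXTURE     = 0x00000100;

// Hypothetical classification of who draws into the buffer.
enum class ContentKind { ThebesLayer, Canvas2DSoftware, Canvas2DSkiaGL, WebGL, Video };

// Every buffer is used as a GL texture; only software-drawn content
// additionally asks for CPU write access. SW_READ is never requested here.
inline uint32_t UsageFor(ContentKind kind) {
    switch (kind) {
        case ContentKind::ThebesLayer:
        case ContentKind::Canvas2DSoftware:
            return USAGE_HW_TEXTURE | USAGE_SW_WRITE_OFTEN;
        case ContentKind::Canvas2DSkiaGL:
        case ContentKind::WebGL:
        case ContentKind::Video:
            return USAGE_HW_TEXTURE;
    }
    return USAGE_HW_TEXTURE;  // unreachable for valid enum values
}

// True if the flags ask for any CPU access, i.e. cache maintenance is needed.
inline bool NeedsCpuAccess(uint32_t usage) {
    return (usage & (USAGE_SW_READ_MASK | USAGE_SW_WRITE_MASK)) != 0;
}
```

With such a mapping, WebGL and video buffers would carry no SW flags at all, so the driver would be free to skip the cache flushes described earlier.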

It sucks that when content wants a new gralloc buffer to draw to, it has to wait for all the synchronous IPC work described above. Could we get async gralloc buffer creation?

Gralloc buffers locking

Gralloc buffers need to be locked before they can be accessed for either read or write. This applies both to software accesses (where we directly address gralloc buffers) and to hardware accesses made from the GL.

The lock mechanisms used by Gralloc buffers (non Mozilla-specific)

How gralloc buffer locking works varies greatly between drivers. While we only directly deal with the gralloc API, which is the same on all Android devices (android::GraphicBuffer::lock and unlock), the precise lock semantics vary between different vendor-specific lock mechanisms, so we need to pay specific attention to them.

  • On Android >= 4.2, a standardized fence mechanism is used, which should work uniformly across all drivers. We do not yet support it, and B2G does not yet use Android 4.2. These fences are called sync points and are discussed here [1] and [2]. They are currently in the staging tree [3], and a similar non-Android Linux concept called dma-buf fences is being worked on.
  • On Qualcomm hardware pre-Android-4.2, a Qualcomm-specific mechanism, named Genlock, is used. We explicitly support it. More on this below.
  • On non-Qualcomm, pre-Android-4.2 hardware, other vendor-specific mechanisms are used, which we do not support (see e.g. bug 871624).

Genlock

Official genlock documentation can be found in Qualcomm kernel sources: genlock.txt.

In a nutshell, with genlock,

  • Read locks are non-exclusive, reference-counted, and recursive. This means that a single caller may issue N read locks on a single gralloc buffer handle, and then issue N unlocks to release the lock.
  • Write locks are completely exclusive, both with any other write lock and also with any read lock.
  • The following is somewhat speculative, not firmly established (QUESTION: so is it correct?). If a buffer is already locked (for read or write) and an attempt is made to get a write lock on it, then:
    • If the new write lock attempt is using the same handle to the gralloc buffer that is already locked, this will fail. This typically gives a message like "trying to upgrade a read lock to a write lock".
    • If the new write lock attempt is using a different handle than the one already locked, then this locking operation will wait until the existing lock is released.
  • A write lock can be converted into a read lock.


Genlock is implemented in the kernel. The kernel GL driver is able to lock and unlock directly. Typically, it will place a read lock on any gralloc buffer that's bound to a texture it's sampling from, and unlock when it's done with that texture.

A logging patch for genlock that lets you see locking across process is here: [4]

How we lock/unlock Gralloc buffers

Drawing to Gralloc buffers

When (on the content side) we want to draw in software to a gralloc buffer, we call ShadowLayerForwarder::OpenDescriptor() in ShadowLayerUtilsGralloc.cpp. This calls android::GraphicBuffer::lock(). When we're done, we call ShadowLayerForwarder::CloseDescriptor() in the same file, which calls android::GraphicBuffer::unlock().
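The strict pairing of lock and unlock lends itself to a scoped guard. The following is a sketch of that idea, not how OpenDescriptor and CloseDescriptor are implemented: ScopedBufferLock and FakeBuffer are hypothetical, and the buffer type is only assumed to expose lock(usage, vaddr) and unlock() returning 0 on success, in the style of android::GraphicBuffer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical RAII helper. BufferT is assumed to provide
//   int lock(uint32_t usage, void** vaddr);
//   int unlock();
// both returning 0 on success.
template <typename BufferT>
class ScopedBufferLock {
public:
    ScopedBufferLock(BufferT& buffer, uint32_t usage) : mBuffer(buffer) {
        mLocked = (mBuffer.lock(usage, &mAddr) == 0);
        if (!mLocked) mAddr = nullptr;
    }
    ~ScopedBufferLock() {
        if (mLocked) mBuffer.unlock();  // always released, even on early return
    }
    ScopedBufferLock(const ScopedBufferLock&) = delete;
    ScopedBufferLock& operator=(const ScopedBufferLock&) = delete;

    bool ok() const { return mLocked; }
    void* data() const { return mAddr; }

private:
    BufferT& mBuffer;
    void* mAddr = nullptr;
    bool mLocked = false;
};

// Fake buffer for trying the guard off-device.
struct FakeBuffer {
    unsigned char pixels[16] = {};
    int lockDepth = 0;
    int lock(uint32_t /*usage*/, void** vaddr) {
        ++lockDepth;
        *vaddr = pixels;
        return 0;
    }
    int unlock() {
        --lockDepth;
        return 0;
    }
};
```

A guard like this keeps the write lock held for exactly the duration of the software drawing, which matters because the compositor cannot bind the buffer for reading until the lock is released.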

This is generally done by TextureClientShmem. Indeed, there is no need for a gralloc-specific TextureClient, as on B2G, TextureClientShmem will implicitly try to get gralloc memory, and will silently fall back to generic non-gralloc shared memory if gralloc fails. The knowledge of whether the TextureClientShmem is actually gralloc or not is stored in the underlying SurfaceDescriptor. The compositor, upon receiving it, will examine the type of the SurfaceDescriptor, and if it is a SurfaceDescriptorGralloc, it will create a GrallocTextureHostOGL (see below).

Thus, there is no Gralloc-specific TextureClient class --- but there is a Gralloc-specific TextureHost class.

Drawing from Gralloc buffers (binding to GL textures)

When (on the compositor side) we want to draw the contents of a gralloc buffer, we have to create an EGLImage with it (see GLContextEGL::CreateEGLImageForNativeBuffer), and create a GL texture object wrapping that EGLImage (see GLContextEGL::fEGLImageTargetTexture2D).

This is generally done by GrallocTextureHostOGL.

It is worth noting that there are two levels of locking involved here.

As GrallocTextureHostOGL::Lock is called, it calls fEGLImageTargetTexture2D (as explained above) which immediately results in placing a read lock on the gralloc buffer. When a subsequent GL drawing operation occurs, sampling from that texture, that will then also place a read lock on the gralloc buffer, this time from the GL kernel driver.

It is vital that these two read locks get released as soon as possible, as we won't be able to draw again into the gralloc buffer (which requires a write lock) until then.

The read lock placed directly by fEGLImageTargetTexture2D is unlocked in GrallocTextureHostOGL::Unlock. However, we don't have a very good way to do that; see below ("Unresolved problems").

The read lock placed internally by the GL kernel driver gets released at some point after it's finished drawing; we don't know very precisely when. The following is somewhat speculative, not firmly established (QUESTION: so is it correct?): as the GL kernel driver uses a different handle than we do, its read lock doesn't cause failure of our subsequent attempts to lock the gralloc buffer for write; instead, it just causes it to wait until it's released.

Unresolved problems

We don't have a great way of un-attaching a gralloc buffer from a GL texture. What we currently do (see GrallocTextureHostOGL::Unlock) is issue another fEGLImageTargetTexture2D call to overwrite the attachment with a dummy "null" EGLImage. That is however known to cause performance issues at least on Peak (see bug 869696). Another approach (that happens to perform better on Peak) is to just destroy the GL texture object (and recreate it every time). But really, we should have a reliable and fast way of releasing the read locks that we are placing on gralloc buffers when we attach them to GL texture objects. QUESTION: We should understand why attaching the dummy "null" EGLImage is slow on the Peak device.

How Android is using Gralloc

Some terminology: EGLSurface, ANativeWindow, etc.

EGLSurface is a portable EGL abstraction for a possibly multi-buffered render target.

ANativeWindow is the Android-specific abstraction for a possibly multi-buffered render target. The eglCreateWindowSurface function creates an EGLSurface from an ANativeWindow. There are two concrete implementations of ANativeWindow in Android: FramebufferNativeWindow and SurfaceTextureClient.

                        EGLSurface

                            ^                      EGL world
                            |                    opaque handles
                            |
----------------------------+---------------------------------------
                            |
                            |                      Android world
                                                    C++ classes
                        ANativeWindow
                        =============
                     abstract base class

                     /                \
                    /                  \
                   /                    \

 FramebufferNativeWindow               SurfaceTextureClient
 =======================               ====================
Directly linked to fbdev                What everybody uses
Only 1 instance system-wide

While ANativeWindow abstracts a possibly multi-buffered render target, the individual buffers managed by ANativeWindow are instances of ANativeWindowBuffer.

The concrete implementation of ANativeWindowBuffer is GraphicBuffer, the class discussed above in this document.

SurfaceTexture

SurfaceTexture is the server side of a client-server system, whose client side is SurfaceTextureClient. As explained above, SurfaceTextureClient is a concrete implementation of ANativeWindow.

The reason to use a client-server system like this is to allow producing and compositing a surface in two different processes.

Let us introduce two important functions that a client needs to call on its ANativeWindow: dequeueBuffer and queueBuffer

  • dequeueBuffer acquires a new buffer for the client to draw to;
  • queueBuffer lets the client indicate that it has finished drawing to a buffer, and queues it (e.g. for compositing).


Since eglSwapBuffers internally calls dequeueBuffer and queueBuffer, this system removes the need for manual management of GraphicBuffer's as we are currently doing in our B2G code.

The other benefit of this system is that most BSP vendors provide graphics profilers (e.g. Adreno Profiler from QCOM, PerfHUD ES from nVidia) which recognize the eglSwapBuffers calls as frame boundaries to collect frame-based GL information from the driver to help development and performance tuning.

In Android 2, there were many buffer management systems. In Android 4, all of this is unified under SurfaceTexture. This is made possible by the great flexibility of SurfaceTexture:

  • SurfaceTexture supports both synchronous and asynchronous modes.
  • SurfaceTexture supports generic multi-buffering: it can have any number of buffers between 1 and 32.


Examples:

  • The Codec/Camera code configures it to have several buffers (depending on hardware, for instance 9 buffers for camera preview on Unagi) and run asynchronously
  • SurfaceFlinger (the Android compositor) configures it to have 2--3 buffers (depending on BSP) and run synchronously.
  • Google Miracast uses it to encode OpenGL-rendered surfaces on-the-fly.


Let us now describe how the client-server SurfaceTexture system allocates GraphicBuffer's, and how both the client and server sides keep track of these shared buffer handles. Again, SurfaceTexture is the server-side class, while SurfaceTextureClient is the client-side class. Each of them stores an array of GraphicBuffers, which is called mSlots in both classes. The GraphicBuffer objects are separate instances in SurfaceTexture and in SurfaceTextureClient, but the underlying buffer handles are the same.

The mechanism here is as follows. The client side issues a SurfaceTextureClient::dequeueBuffer call to get a new buffer to paint to. If there is not already a free buffer in mSlots, and the number of buffers is under the limit (e.g. 3 for triple buffering), it sends an IPC message that results in a call to SurfaceTexture::dequeueBuffer which allocates a GraphicBuffer. After this transaction, still inside of SurfaceTextureClient::dequeueBuffer, another IPC message is sent, that results in a call to SurfaceTexture::requestBuffer to get the GraphicBuffer serialized over IPC back to it, using GraphicBuffer::flatten and GraphicBuffer::unflatten, and cache it into its own mSlots.

The mSlots arrays on both sides mirror each other, so that the two sides can refer to GraphicBuffer's by index. This allows the client and server side to communicate with each other by passing only indices, without flattening/unflattening GraphicBuffers again and again.
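The mirrored mSlots arrays can be sketched with a small in-process model. It is a simplification under stated assumptions: Server and Client are hypothetical stand-ins for SurfaceTexture and SurfaceTextureClient, IPC is replaced by direct calls, queueBuffer immediately frees the slot, and a full pool returns -1 where the real call would wait.

```cpp
#include <array>
#include <cassert>
#include <memory>

struct FakeBuffer { int id; };  // hypothetical stand-in for GraphicBuffer

constexpr int kMaxSlots = 32;

class Server {  // plays the role of SurfaceTexture
public:
    explicit Server(int limit) : mLimit(limit) { assert(limit >= 1 && limit <= kMaxSlots); }

    // Returns a slot index, or -1 if every buffer is in use.
    int dequeueBuffer() {
        for (int i = 0; i < mLimit; ++i) {  // reuse a free buffer first
            if (mSlots[i] && !mInUse[i]) { mInUse[i] = true; return i; }
        }
        for (int i = 0; i < mLimit; ++i) {  // otherwise allocate below the limit
            if (!mSlots[i]) {
                mSlots[i] = std::make_shared<FakeBuffer>(FakeBuffer{i});
                mInUse[i] = true;
                return i;
            }
        }
        return -1;
    }
    // Plays the role of requestBuffer (flatten on this side).
    std::shared_ptr<FakeBuffer> requestBuffer(int slot) { ++mRequests; return mSlots[slot]; }
    void release(int slot) { mInUse[slot] = false; }
    int requests() const { return mRequests; }

private:
    int mLimit;
    int mRequests = 0;
    std::array<std::shared_ptr<FakeBuffer>, kMaxSlots> mSlots{};
    std::array<bool, kMaxSlots> mInUse{};
};

class Client {  // plays the role of SurfaceTextureClient
public:
    explicit Client(Server& server) : mServer(server) {}

    int dequeueBuffer() {
        int slot = mServer.dequeueBuffer();
        if (slot >= 0 && !mSlots[slot]) {
            // First use of this slot: fetch the buffer once and cache a
            // separate local object for it (unflatten on this side).
            mSlots[slot] = std::make_shared<FakeBuffer>(*mServer.requestBuffer(slot));
        }
        return slot;
    }
    void queueBuffer(int slot) { mServer.release(slot); }  // simplified

private:
    Server& mServer;
    std::array<std::shared_ptr<FakeBuffer>, kMaxSlots> mSlots{};
};
```

After the first pass through the pool, requestBuffer is no longer called: both sides only exchange slot indices, which is the saving the mirrored arrays are meant to provide.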

Let us now describe what happens in SurfaceTextureClient::dequeueBuffer when there is no free buffer and the number of buffers has already met the limit (e.g. 3 for triple buffering). In this case, the server side (SurfaceTexture::dequeueBuffer) will wait for a buffer to be queued, and the client side waits for that, so that SurfaceTextureClient::dequeueBuffer will not return until a buffer has actually been queued on the server side. This is what allows SurfaceTexture to support both synchronous and asynchronous modes with the same API.

Let us now explain how synchronous mode works. In synchronous mode, on the client side, inside of eglSwapBuffers, when ANativeWindow::queueBuffer is called to present the frame, it sends the index of the rendered buffer to the server side. This causes the index to be queued into SurfaceTexture::mQueue for rendering. SurfaceTexture::mQueue is a wait queue for frames that want to be rendered. In synchronous mode, all the frames are shown one after another, whereas in asynchronous mode frames may be dropped.
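The difference between the two modes can be sketched with a queue of slot indices. FrameQueue is a hypothetical model of SurfaceTexture::mQueue, and the asynchronous policy shown here (a newly queued frame replaces any frame still waiting) is an assumption chosen to illustrate frame dropping, not a description of the Android implementation.

```cpp
#include <cassert>
#include <deque>

// Hypothetical model of the wait queue of frames to be rendered.
class FrameQueue {
public:
    explicit FrameQueue(bool synchronous) : mSynchronous(synchronous) {}

    // Producer side: called with the slot index of a finished frame.
    void queueBuffer(int slot) {
        if (!mSynchronous && !mQueue.empty()) {
            // Assumed async policy: pending frames are dropped in favour of the newest.
            mDropped += static_cast<int>(mQueue.size());
            mQueue.clear();
        }
        mQueue.push_back(slot);
    }

    // Consumer side: next frame to show, or -1 if none is pending.
    int acquire() {
        if (mQueue.empty()) return -1;
        int slot = mQueue.front();
        mQueue.pop_front();
        return slot;
    }

    int dropped() const { return mDropped; }

private:
    bool mSynchronous;
    int mDropped = 0;
    std::deque<int> mQueue;
};
```

If the producer queues slots 0, 1 and 2 before the consumer runs, the synchronous queue hands them out in that order, whereas the asynchronous one under this policy hands out only slot 2.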

Let us now explain how SurfaceFlinger (the Android compositor) uses this system to get images onto the screen. Each time SurfaceTexture::queueBuffer is called, it causes SurfaceFlinger to start the next iteration of the render loop. In each iteration, SurfaceFlinger calls SurfaceTexture::updateTexImage to dequeue a frame from SurfaceTexture::mQueue and bind that GraphicBuffer into a texture, just like we do in GrallocTextureHostOGL::Lock.

The magic part is that SurfaceFlinger does not need to do the equivalent of GrallocTextureHostOGL::Unlock. In our case, we have a separate OpenGL texture object for each TextureHost, which typically (at least in the case of a ContentHost) represents one buffer each (so a double-buffered ContentHost has two TextureHost's). So we have to unbind the GraphicBuffer from the OpenGL texture before we can hand it back to the content side; otherwise it would remain locked for read and couldn't be locked for write for content drawing. By contrast, SurfaceFlinger does not need to worry about this because it uses only one OpenGL texture, so that when it binds a new GraphicBuffer to it for compositing, that automatically unbinds the previous one!
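The contrast between one shared texture and one texture per buffer can be made concrete with a toy model. FakeBuffer and FakeTexture are hypothetical; the model assumes that binding a buffer to a texture holds exactly one read lock on it, and that binding nullptr plays the role of attaching the dummy "null" EGLImage.

```cpp
#include <cassert>

// Hypothetical buffer that only counts the read locks held on it.
struct FakeBuffer { int readLocks = 0; };

// Hypothetical texture: holds a read lock on whatever buffer is bound to it.
class FakeTexture {
public:
    // Binding a buffer releases the read lock on the previously bound one.
    void Bind(FakeBuffer* buffer) {
        if (mBound) --mBound->readLocks;
        mBound = buffer;
        if (mBound) ++mBound->readLocks;
    }

private:
    FakeBuffer* mBound = nullptr;
};
```

With a single FakeTexture, binding the next buffer leaves the previous one with zero read locks, so it can go straight back to the producer. With one FakeTexture per buffer, each buffer stays read-locked until an explicit Bind(nullptr), which is the extra step GrallocTextureHostOGL::Unlock has to perform.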