Discussion:
[Bf-python] Fastest way for RenderEngine to return combined rgba + z
Asbjørn
2013-01-31 22:36:16 UTC
Permalink
Hi,

in LuxBlend25 we're returning combined rgba + z (depth) data as follows:

result = self.begin_result(0, 0, xres, yres)
lay = result.layers[0]
lay.rect, lay.passes[0].rect = combinedResults()
self.end_result(result)

where combinedResults() returns a tuple of two lists: the first a list
of rgba values (each rgba value itself stored as a list), and the second
a list of floats. This function is implemented in a binary module.
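
For illustration, a hypothetical pure-Python stand-in with the same
return shape (the real combinedResults() lives in our binary module):

def combined_results_stub(xres, yres):
    # same shape as combinedResults():
    # (list of [r, g, b, a] lists, flat list of z floats)
    rgba = [[0.0, 0.0, 0.0, 1.0] for _ in range(xres * yres)]
    depth = [0.0] * (xres * yres)
    return rgba, depth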

This method is very slow. It seems the overhead of converting the
internal data to lists is quite high. Returning an iterator for the
outer list of rgba pixels did not improve the situation noticeably.

Are there better (API) ways of doing this directly, that is, without
loading files from disk?

Cheers
- Asbjørn
Asbjørn
2013-02-01 11:36:57 UTC
Permalink
Post by Asbjørn
Are there better (API) ways of doing this directly, that is, without
loading files from disk?
In case it wasn't clear, I'm curious if the Blender Python API has more
efficient methods to update the result layers.

Cheers
- Asbjørn
Brecht Van Lommel
2013-02-01 18:25:08 UTC
Permalink
Hi,
Post by Asbjørn
This method is very slow. It seems the overhead of converting the
internal data to lists is quite high. Returning an iterator for the
outer list of rgba pixels did not improve the situation noticeably.
Are there better (API) ways of doing this directly, that is, without
loading files from disk?
I think currently loading from disk is still the quickest method,
especially as on a modern OS the contents of that file are cached in
memory and usually don't actually have to be read from the physical
disk.

We could implement a more efficient way to assign the pixels, though,
by letting it accept a buffer object, which you could then use to wrap
the pixels without actually having to copy them into a Python list.
http://docs.python.org/3.3/c-api/buffer.html
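
To illustrate the general idea (plain Python, not Blender API code): a
buffer object such as array.array can be viewed through memoryview
without copying the floats into a Python list:

import array

pixels = array.array('f', [0.0] * (4 * 16))  # e.g. rgba for 16 pixels
view = memoryview(pixels)  # wraps the same memory, no copy is made
print(view.format, view.nbytes)  # f 256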

There is already a trick that you can use, it didn't work entirely
before but I just committed a fix for that. Basically you use the
foreach_set method which accepts a buffer object:
http://www.blender.org/documentation/blender_python_api_2_65_9/bpy.types.bpy_prop_collection.html?highlight=foreach_get#bpy.types.bpy_prop_collection.foreach_set

Example here. The inconvenient thing about this is you have to set the
pixels for all layers/passes at once, but it should be quick if the
lux C code can create the python buffer object.
http://www.pasteall.org/39345/python
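
A minimal sketch of that kind of usage (not necessarily identical to
the paste), assuming a single render layer inside a RenderEngine's
render() method, with xres/yres as placeholders:

import array

result = self.begin_result(0, 0, xres, yres)

# flat float buffers; the layer/pass layout must match the render result
rgba = array.array('f', [0.0] * (xres * yres * 4))
zbuf = array.array('f', [0.0] * (xres * yres))

result.layers.foreach_set("rect", rgba)            # combined rgba
result.layers[0].passes.foreach_set("rect", zbuf)  # z pass, if present

self.end_result(result)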

Brecht.
Asbjørn
2013-02-01 21:29:07 UTC
Permalink
Post by Brecht Van Lommel
I think currently loading from disk is still the quickest method,
especially as on a modern OS the contents of that file are cached in
memory and usually don't actually have to be read from the physical
disk.
True, however you risk polluting the temp directory with orphaned
files; it would be neater to go directly :)
Post by Brecht Van Lommel
There is already a trick that you can use, it didn't work entirely
before but I just committed a fix for that. Basically you use the
http://www.blender.org/documentation/blender_python_api_2_65_9/bpy.types.bpy_prop_collection.html?highlight=foreach_get#bpy.types.bpy_prop_collection.foreach_set
This looks very interesting indeed. Which revision will I need for this
to work?

Thanks for the help and example!

PS: there's a typo in the API example you linked to above: the seq and
attr parameters seem to be switched in the collection.foreach_set call.

Cheers again
- Asbjørn
Brecht Van Lommel
2013-02-02 13:14:39 UTC
Permalink
Post by Asbjørn
Post by Brecht Van Lommel
There is already a trick that you can use, it didn't work entirely
before but I just committed a fix for that. Basically you use the
http://www.blender.org/documentation/blender_python_api_2_65_9/bpy.types.bpy_prop_collection.html?highlight=foreach_get#bpy.types.bpy_prop_collection.foreach_set
This looks very interesting indeed. Which revision will I need for this
to work?
Revision 54253 or newer.
Post by Asbjørn
Thanks for the help and example!
PS: there's a typo in the API example you linked to above: the seq and
attr parameters seem to be switched in the collection.foreach_set call.
Thanks for pointing that out, I see Campbell just fixed it.

Brecht.
Asbjørn
2013-08-19 14:59:04 UTC
Permalink
Post by Brecht Van Lommel
There is already a trick that you can use, it didn't work entirely
before but I just committed a fix for that. Basically you use the
http://www.blender.org/documentation/blender_python_api_2_65_9/bpy.types.bpy_prop_collection.html?highlight=foreach_get#bpy.types.bpy_prop_collection.foreach_set
I've finally gotten around to implementing this, and it does indeed work
nicely.

On a single 1920x1080 image with alpha and depth (zbuffer), it takes
about 1.2 seconds total to update and transfer the image to the render
result. About 400 ms is spent on our side on tonemapping etc. and
converting to bottom-up format (internally we store top-down). The
remainder, about 800 ms, is spent getting the result into Blender using
foreach_set. This was measured in a few different ways on my i7 2700K.

The old method, which returned a list of lists, takes about 6-8 seconds
on the same image and uses considerably more memory. So indeed a nice
boost there.

Two catches:

- foreach_set requires that the object not only follow the buffer
protocol but also a subset of the sequence protocol. Specifically, it
has to pass PySequence_Check(), which requires sq_item to be assigned,
and foreach_set itself requires a valid sq_length implementation since
it uses PySequence_Size(). Those are the only two PySequenceMethods
fields that need to be filled, though (see the sketch after this list).

- As mentioned you have to assign all layers and passes in one go. We
don't support multiple layers yet but we're talking about it, so it
would be nice to be able to assign each layer individually using a buffer.
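
As a rough Python-level sanity check (approximate only -- the
authoritative test is PySequence_Check() on the C side), the object
handed to foreach_set should expose both the buffer protocol and basic
sequence behaviour, as for example array.array does:

import array

buf = array.array('f', [0.0] * 8)

memoryview(buf)  # buffer protocol; raises TypeError if unsupported
len(buf)         # length (sq_length, used via PySequence_Size)
buf[0]           # item access (sq_item, needed for PySequence_Check)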

Thanks again for the assistance.

- Asbjørn
Dan Eicher
2013-08-19 22:24:49 UTC
Permalink
Hello,

In py-3.2 or 3.3 they added full support for buffer objects, so unless
there's some conversion that needs to be done it should be able to take
a pure buffer object and do a simple memcpy -- which I'd imagine would
be a lot faster than 800 ms. The Py_buffer struct should be able to
supply all the information that the sequence methods currently provide.

As an added plus, you could define it with negative-stride indexing and
do away with the bottom-to-top conversion as well.
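
To illustrate the idea (a numpy sketch, purely hypothetical here --
whether the Blender side accepts negatively strided buffers is a
separate question):

import numpy as np

img = np.zeros((1080, 1920, 4), dtype=np.float32)  # top-down rgba
flipped = img[::-1]  # bottom-up view: negative row stride, no copy
print(flipped.strides[0])  # negative, e.g. -30720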

Dan
Asbjørn
2013-08-19 23:03:01 UTC
Permalink
Post by Dan Eicher
In py-3.2 or 3.3 they added full support for buffer objects, so unless
there's some conversion that needs to be done it should be able to take
a pure buffer object and do a simple memcpy -- which I'd imagine would
be a lot faster than 800 ms. The Py_buffer struct should be able to
supply all the information that the sequence methods currently provide.
I was under the impression nothing had changed in the Blender API; I
could be mistaken of course :)
Post by Dan Eicher
As an added plus, you could define it with negative-stride indexing and
do away with the bottom-to-top conversion as well.
Yes, ideally. However, internally on our side (LuxRender core) the
alpha buffer is stored separately from the framebuffer, and Blender
always requires premultiplied alpha, while our internal data differs
depending on what the user chose, so we need to do a bit of data
juggling and conversion anyway.
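
The premultiply itself is just a per-pixel multiply; roughly (a numpy
sketch of the general operation, not our actual code):

import numpy as np

rgb = np.zeros((1080, 1920, 3), dtype=np.float32)   # straight colour
alpha = np.ones((1080, 1920, 1), dtype=np.float32)  # separate alpha

# premultiplied rgba, ignoring the bottom-up/tonemapping handling
rgba = np.concatenate((rgb * alpha, alpha), axis=2)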

Still, the data juggling and such only takes about 20 ms of the roughly
400 ms spent on our side for a 1080p image, which I think is acceptable.
Better to figure out a way to lower the remaining 380 ms :)

Cheers
- Asbjørn
Bram de Greve
2013-08-23 13:05:53 UTC
Permalink
Post by Asbjørn
I've finally gotten around to implementing this, and it does indeed work
nicely.
On a single 1920x1080 image with alpha and depth (zbuffer), it takes
about 1.2 seconds total to update and transfer the image to the render
result. About 400 ms is spent on our side on tonemapping etc. and
converting to bottom-up format (internally we store top-down). The
remainder, about 800 ms, is spent getting the result into Blender using
foreach_set. This was measured in a few different ways on my i7 2700K.
This certainly looks interesting. I've been looking for something like
this for my own little toy raytracer (liar.bramz.net).
For this, did you implement a C extension module that's loaded into
Blender as part of your add-on?

Bramz.
Asbjørn
2013-08-23 22:22:00 UTC
Permalink
Post by Bram de Greve
This certainly looks interesting. I've been looking for something like
this for my own little toy raytracer (liar.bramz.net).
For this, did you implement a C extension module that's loaded into
Blender as part of your add-on?
Yes, pylux is a binary Python module. It's implemented almost
exclusively using Boost.Python; the only direct Python C API code is
the buffer support I added now.

The source code is available under the GPL license here:
https://bitbucket.org/luxrender/lux/src/tip/python?at=default

The buffer specific code is here:
https://bitbucket.org/luxrender/lux/src/0c11e6bf3765c26d16039cb8c4dd59a6886d6013/python/pycontext.h?at=default#cl-234

This object wraps the buffer stuff and is used in
PyContext::blenderCombinedDepthBuffers().

The Blender python plugin code is here:
https://bitbucket.org/luxrender/luxblend25/src/9beea7f187d76b432c15505e322ccc556cb37a6b/src/luxrender/outputs/__init__.py?at=default#cl-115

In essence it's just

result = render_engine.begin_result(0, 0, xres, yres)

pb, zb = pylux.blenderCombinedDepthBuffers() # get buffer objects
result.layers.foreach_set("rect", pb) # rgba pixels
result.layers[0].passes.foreach_set("rect", zb) # zbuffer

render_engine.end_result(result)

So far we only support this for a single layer with a single pass.

Please note that this was the very first time I've done anything with
the Python C API, and also the very first time I've done anything
non-trivial with Boost.Python (I did not write the initial pylux code),
so there's probably some silly stuff in there :)

Just ask if you have any questions.

Cheers
- Asbjørn
Asbjørn
2013-08-25 20:09:01 UTC
Permalink
Post by Asbjørn
On a single 1920x1080 image with alpha and depth (zbuffer), it takes
about 1.2 seconds total to update and transfer the image to the render
result. About 400 ms is spent on our side on tonemapping etc. and
converting to bottom-up format (internally we store top-down). The
remainder, about 800 ms, is spent getting the result into Blender using
foreach_set. This was measured in a few different ways on my i7 2700K.
Here are some more detailed timings:

begin_result: 2 ms
blenderCombinedDepthBuffers: 366 ms
layers.foreach_set: 21 ms (color)
passes.foreach_set: 2 ms (zbuffer)
end_result: 780 ms
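
(For reference, a hypothetical sketch of how per-call timings like
these can be gathered -- not the actual measurement code; the names
follow the example from my previous mail:)

import time

def timed(label, fn, *args):
    t0 = time.perf_counter()
    out = fn(*args)
    print("%s: %.0f ms" % (label, (time.perf_counter() - t0) * 1000.0))
    return out

result = timed("begin_result", render_engine.begin_result, 0, 0, xres, yres)
pb, zb = timed("blenderCombinedDepthBuffers",
               pylux.blenderCombinedDepthBuffers)
timed("layers.foreach_set", result.layers.foreach_set, "rect", pb)
timed("passes.foreach_set", result.layers[0].passes.foreach_set, "rect", zb)
timed("end_result", render_engine.end_result, result)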

The "blenderCombinedDepthBuffers" is our function, which performs
tonemapping and merges the color and alpha data into a single rgba float
buffer, which it returns along with the zbuffer.

Does end_result involve uploading the data to OpenGL or something
similar? I'm a bit surprised it takes so long.

Cheers
- Asbjørn
