Towards Java2D limits

After the profiling process I wasn't completely satisfied with the speed of sky rendering in JPARSEC. Some minor bugs remained, and while fixing them I realized there were many tweaks that could improve performance but that are not evident when using only a profiling tool. These optimizations are required for a better experience on Android, since the previous version was difficult to run on versions 1 or 2 of the Android OS. Here I summarize the progress made for JPARSEC 1.92.

Memory optimizations

I started the optimizations using the JVisualVM profiling tool to monitor memory use. The SkyChart component required 232 MB, which is too much, and I finally found the source of the memory waste in the set of TextLabel instances saved in the AWTGraphics class. They were saved to avoid constructing TextLabel instances all the time, but the performance penalty of constructing them is very small compared to the increase in memory consumption. With this cache removed, memory consumption went down to 89 MB. Android was obviously not affected by this.

After this, the next memory optimizations came without using JVisualVM. I eliminated the static Milky Way texture to avoid having it in memory all the time. The memory requirement went down to 76 MB (this was measured with JVisualVM after playing a little with the SkyChart component until memory consumption was stable). The Milky Way computations were saved in a list of LocationElement objects in an inefficient way, so I implemented a class called MilkyWayData to hold the computations, and memory went down to 70 MB.

Next, I noticed a major bottleneck right at the beginning of the rendering process. In an old version of JPARSEC I needed to create a BufferedImage to save the rendering before drawing the labels, in order to apply the algorithm that distributes the labels efficiently without overlapping them. This first rendering is no longer needed, but it had never been removed. Eliminating the BufferedImage object reduced memory consumption to 64 MB, and also gave an important performance increase. Later I removed the MilkyWayData objects, since everything was so optimized (thanks, in part, to using the faster fillRect instead of fillOval to render the Milky Way texture) that I preferred to keep memory at 39 MB rather than get the small performance increase of holding all that data.

The last step was to reorganize the star data. The previous version computed star ephemerides at different accuracy levels depending on the zoom level, the region, and the time elapsed from the J2000 epoch. It was very inefficient, since the data for double/variable stars was spread over different arrays. I implemented a StarData class to hold everything in a clean way, computing ephemerides at startup for all stars using a dedicated algorithm for maximum performance, instead of calling the StarEphem class. Startup is now 3x faster, memory is around 32 MB, and the component is more responsive since ephemerides are only computed at startup. This also allowed another important performance boost. Memory consumption can be reduced even more by eliminating the star textures (3 MB) or the legend, but these are options already present in the SkyRenderElement object, along with several new options for better quality. Among them, it is now possible to render the Milky Way textures in different wavelengths.

On Android, initial memory consumption is now 9 MB (before it was 16), or 12 MB using the Milky Way texture. When showing planets it peaks below 30 MB (before it was close to 70).

Performance optimizations

Regarding performance, I measured it using the SkyChart component and dragging the sky around the constellation Cygnus. I zoom in until stars of magnitude 7 are shown, with a resolution of 1200×800. In the previous version I managed to reach 25 fps on my Core i5 Linux desktop, starting from 18 and using JVisualVM only. The first improvement came when I realized I was trying to draw stars outside the visible screen. It seems Java 7 was able to partially 'predict' this, and that was the reason for the different performance between Java 6 and 7. This fix made it possible to reach 32 fps in both Java 6 and 7. After eliminating the duplicated rendering process for the BufferedImage, performance reached 37 fps, and 41 fps when I reorganized the star data and the sky/constellations rendering process. That seemed to be all, but after some googling I found a tip at http://www.pushing-pixels.org/2008/06/06/effective-java2d.html and tried disabling the alpha channel in the BufferedImage objects created in the constructors of AWTGraphics. Performance reached 47 fps, while maintaining image quality, since Java automatically flattens the textures to draw them correctly. The tip about avoiding filling shapes with antialiasing enabled didn't help to improve performance.
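Two of the fixes above (skipping stars that fall outside the visible area, and using a buffer without alpha channel) can be illustrated with a minimal sketch. This is not the JPARSEC code: the class OpaqueBufferSketch, its methods, and the {x, y, radius} star layout are hypothetical names chosen for the example, and only standard java.awt calls are used.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical sketch, not the actual AWTGraphics implementation.
final class OpaqueBufferSketch {

    /** Creates an off-screen buffer without alpha channel (TYPE_INT_RGB). */
    static BufferedImage createOpaqueBuffer(int width, int height) {
        return new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
    }

    /** True if a disc of the given radius centred at (x, y) touches the visible area. */
    static boolean isVisible(float x, float y, float radius, int width, int height) {
        return x + radius >= 0 && x - radius < width
            && y + radius >= 0 && y - radius < height;
    }

    /** Draws only the stars that touch the buffer; returns how many were drawn. */
    static int drawStars(BufferedImage buffer, float[][] stars) {
        Graphics2D g = buffer.createGraphics();
        int drawn = 0;
        try {
            g.setColor(Color.WHITE);
            for (float[] s : stars) { // assumed layout: {x, y, radius} in pixels
                if (!isVisible(s[0], s[1], s[2], buffer.getWidth(), buffer.getHeight())) {
                    continue; // off-screen: skip the Java2D call entirely
                }
                int d = Math.max(1, Math.round(2 * s[2]));
                g.fillOval(Math.round(s[0] - s[2]), Math.round(s[1] - s[2]), d, d);
                drawn++;
            }
        } finally {
            g.dispose();
        }
        return drawn;
    }
}
```

The visibility test costs a few comparisons per star, which is cheap compared to a call into Java2D. How much each change gains will depend on the field of view and on the Java version.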

At this point I ran JVisualVM again and found that most of the time was spent in the AWT methods that draw everything. Some Java2D methods look like bottlenecks in themselves. However, the draw(Shape) method of Graphics2D (and not fill(Shape)) was using too much CPU, even with the rendering of the Milky Way outline disabled, and after checking the code I found that draw(Shape) was called unnecessarily when drawing nebulae, since fill(Shape) is enough for emission nebulae. Performance increased to 52 fps, with the possibility of reaching 54 if the outline color for dark nebulae is equal to the background color. Running JVisualVM one last time showed that for wide fields draw(Shape) consumes most of the CPU when the Milky Way outline is enabled, while at greater zoom levels drawLine() takes most of the time. An implementation of the Milky Way rendering using drawPolyLine didn't help to increase performance.

JVisualVM shows that the AWT library of Java2D is a bottleneck in certain operations like drawLine and draw/fillOval, but the current performance of sky rendering is really fantastic anyway. On my PC I would say it is clearly faster than KStars, although on a slower PC with a 32-bit OS KStars is faster. In any case, the typical and easy-to-say expression 'Java is slow' sounds false to me. Java is extremely efficient if it is used properly, and there is a huge margin to improve performance or memory consumption in most applications. But since Java takes care of memory handling and provides a lot of math operations, it is unlikely that a given developer will take the care and time to implement a FastMath class for approximate and very fast maths, or to apply a profiling tool for one or two months, including a detailed code inspection, since some of the problems will not be caught using only a profiler. The improvement is remarkable: a factor of 3 in memory (even counting the use of TextLabel as a bug) and a factor >2 in CPU, and this after a first phase of optimization with JVisualVM, described in a previous post.

More performance beyond Java2D

At this point I investigated the possibility of increasing performance by using other libraries instead of AWT. I found two options: GLG2D and PerfGraphics.

GLG2D

This library is an implementation of Graphics using JOGL (OpenGL). Using OpenGL seems the best and surely the fastest option, but most of the time I end up disappointed with these libraries, since I almost always find errors, and this was the case again. On the PC I use at OAN (Core 2 with an ATI card) the examples always crash, while on my home PC (Core i5 with an NVidia card) they work for a few seconds only. It seems the examples create a lot of threads and finally the app (and sometimes Eclipse) crashes.

One of the examples measures the fps using AWT and GLG2D as renderers, and the result is 1.7 fps for AWT and 8.4 for GLG2D, so GLG2D seems to be 5 times faster than AWT while maintaining image quality. I can hardly imagine the sky rendering process running at more than 200 fps; maybe in the future GLG2D or another alternative (loon-simple also looks promising) could become stable enough on Linux to be used instead of AWT.

Anyway, I managed to run SkyChart using a G2DGLPanel with GLGraphics2D, but obtained very poor performance: 23 fps after all the performance improvements detailed below (which is 3 times slower than Java2D in the same situation, using Java 6 and 32-bit mode on a 64-bit machine), and much lower visual quality. In particular, strokes didn't work in the same way as in Java2D, and star textures appeared with excessive contrast. There were important issues with drawLine, which showed wrong lines everywhere in addition to the correct ones. The methods with the greatest CPU time were jogamp.opengl.gl4.GL4bcImp.dispatch_glBegin1, com.jogamp.opengl.util.texture.awt.AWTTextureData.createFromCustom, and jogamp.opengl.gl4.GL4bcImp.dispatch_glReadPixels1, although they seem to account for only 20% of CPU time. Probably I'm not using this library correctly, but it is clear that there are some other problems as well.

Another (but old) library also designed to implement Java2D with OpenGL is Agile2D. It is easier to integrate in AWTGraphics, but 64-bit is not supported in the system-dependent libraries… Another is swing-gl, but its Graphics2D support is very incomplete.

PerfGraphics

I was surprised to see that PerfGraphics performs considerably faster than AWT in most of the methods, so I tested it first. Setting up PerfGraphics in AWTGraphics was extremely easy; in fact I did it in just a minute! The fill/draw oval/rect methods are 2 times faster, and draw image/string/line slightly faster. However, it seems this library has problems rendering images with transparency, and the results are not good. The quality of the rendering compared to AWT is quite poor. It is true that without image textures performance is 40-50% better with this library, but quality is more important.

What is interesting is the drawLine methods available in this library. One uses the algorithm by Wu (1991), which handles antialiasing, and the other the one by Bresenham (1965). I assume AWT uses the one by Wu (although the PerfGraphics implementation is slightly faster…), but the point is that even without antialiasing AWT's drawLine method is still extremely slow. Tests showed that Bresenham's algorithm is 10-20 times faster than AWT's, so even though it does not take into account antialiasing, alpha blending, or the stroke, it would improve performance when used for certain drawLine operations, provided those limitations are acceptable. This can be the case when rendering the coordinate grid, the Milky Way outline, and even the constellation figures. So I added an option in SkyRenderElement to select different fast line rendering options and implemented them. I also updated the Milky Way filling process to use the fast line drawing method, and I applied it to the lines that define the square region where the rendering is done, as well as to some lines in the symbols of deep sky objects.
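As an illustration of why such a line routine can be so fast, here is a minimal sketch of Bresenham's integer algorithm writing directly into the pixel array of a BufferedImage. It is neither the PerfGraphics code nor the JPARSEC one: FastLineSketch is a hypothetical name, and the sketch assumes an image created with new BufferedImage(w, h, TYPE_INT_RGB), so that the data buffer is a DataBufferInt with one int per pixel and a scanline stride equal to the width.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferInt;

// Hypothetical sketch: 1-pixel wide line, no antialiasing, no alpha, no stroke.
final class FastLineSketch {

    /** Draws an opaque line from (x0, y0) to (x1, y1) using only integer arithmetic. */
    static void drawLine(BufferedImage img, int x0, int y0, int x1, int y1, int rgb) {
        int w = img.getWidth(), h = img.getHeight();
        // Assumption: TYPE_INT_RGB (or TYPE_INT_ARGB) image, so the cast is valid
        int[] pixels = ((DataBufferInt) img.getRaster().getDataBuffer()).getData();

        int dx = Math.abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
        int dy = -Math.abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
        int err = dx + dy; // accumulated error term

        while (true) {
            // Per-pixel clipping keeps the sketch short; a real implementation
            // would rather clip the segment once before the loop
            if (x0 >= 0 && x0 < w && y0 >= 0 && y0 < h) {
                pixels[y0 * w + x0] = rgb;
            }
            if (x0 == x1 && y0 == y1) break;
            int e2 = 2 * err;
            if (e2 >= dy) { err += dy; x0 += sx; } // step in x
            if (e2 <= dx) { err += dx; y0 += sy; } // step in y
        }
    }
}
```

A call such as FastLineSketch.drawLine(image, 0, 0, 199, 99, 0x808080) draws a gray line. Note that obtaining the raw int[] of a BufferedImage may prevent Java2D from keeping an accelerated copy of that image, so whether this pays off should be measured for each use case.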

The latest fps number above is 52, but with these changes it rises to 98 when applying the fast drawLine to everything, and still 88 when applying it only to the Milky Way outline and the grid. In the latter case visual quality is not affected too much, so it's worth the effort. In fact, 70 fps is now the minimum speed I get in all possible fields of view! These numbers include an additional 5 fps obtained by rendering directly to the Graphics2D object of the JLabel I use to put the rendering on the panel (without losing double buffering, since SkyRendering draws to an image first), and also a 3 fps gain from calling AWTGraphics without creating a new BufferedImage every time, but they do not include an additional 4 fps that can be obtained by using the background color as the outline color for dark nebulae. Since Cygnus is one of the most crowded fields, 100 fps is a barrier that can now be easily beaten. After this, the fillOval method proved to be too slow, and I decided to use drawImage instead, computing ovals of different sizes in advance, in a similar way as with the star textures. Performance in the 'cute' rendering mode increased from 30 to 47 fps.
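The idea of replacing fillOval with drawImage can be sketched as a small cache of discs rendered once per diameter. Again this is only an illustration under assumptions: DiscSpriteCache and its methods are hypothetical names, a single color is used, and the JPARSEC implementation may differ.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical sketch of a cache of pre-rendered discs, one per diameter.
final class DiscSpriteCache {
    private final BufferedImage[] sprites; // index = diameter in pixels

    DiscSpriteCache(int maxDiameter, Color color) {
        sprites = new BufferedImage[maxDiameter + 1];
        for (int d = 1; d <= maxDiameter; d++) {
            // The sprites need an alpha channel so that their corners stay transparent
            BufferedImage img = new BufferedImage(d, d, BufferedImage.TYPE_INT_ARGB);
            Graphics2D g = img.createGraphics();
            g.setRenderingHint(RenderingHints.KEY_ANTIALIASING,
                               RenderingHints.VALUE_ANTIALIAS_ON);
            g.setColor(color);
            g.fillOval(0, 0, d, d); // the expensive call happens once per size
            g.dispose();
            sprites[d] = img;
        }
    }

    /** Draws a cached disc centred near (centerX, centerY); the diameter is clamped to the cache range. */
    void draw(Graphics2D g, int centerX, int centerY, int diameter) {
        int d = Math.max(1, Math.min(diameter, sprites.length - 1));
        g.drawImage(sprites[d], centerX - d / 2, centerY - d / 2, null);
    }
}
```

The antialiasing cost is paid only when the cache is built, and each frame then only copies small images. The trade-off is a little extra memory and sizes restricted to whole pixels.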

The method that now consumes most of the CPU is drawImage, but this is fine since it seems to be one of the fastest Java2D methods. drawOval also consumes a lot when star textures are disabled. Surely this can be improved even more, but I think the current optimization level is more than enough, and in fact impressive considering that it is not hardware accelerated. Using -Dsun.java2d.opengl=True I obtain 64 fps instead of 85, and 61 without antialiasing, which means that my Java program is faster without hardware acceleration. This confirms that Java internally does little to improve performance using OpenGL, as most OpenGL developers say. It also suggests that OpenGL drivers are not very well optimized for Linux, although thanks to Valve's Steam and NVIDIA it seems this is changing now.

As for Android, it is clearly faster now. On my Android 1.6 phone Saturn can be rendered without issues, but in the emulator with a bigger screen resolution it fails unless I reduce the quality of the rendering, and hence the size of the image that holds it. Dragging the sky on my old phone runs at about 2 fps, so after lifting the finger the rendering is completed almost without waiting. Star ephemerides are now 3 times faster during startup, and accuracy compared to full precision mode is at the arcsecond level.

 
blog/towards_java2d_limits.txt · created: 2012/11/18 11:43 (Last modified 2016/05/17 12:52) by Tomás Alonso Albi