Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1px gaps between tiles on Android #39096

Closed
mattislorentzon opened this issue May 27, 2020 · 38 comments
Closed

1px gaps between tiles on Android #39096

mattislorentzon opened this issue May 27, 2020 · 38 comments

Comments

@mattislorentzon
Copy link

Note: This issue is created after a discussion in #38004, where the initial assumption was that this problem was due to the new batching feature in 3.2.2. Links to the posts are included at the end of this issue as reference.

Godot version:
3.2.2 beta3 official

OS/device including version:
Galaxy Tab A (SM-T510), 1920x1200px resolution, Android 9,
GPU: Mali-G71 MP2
Backend: GLES2

Issue description:
While trying out v3.2.2 beta3 I noticed on one of my Android devices that in some instances there are visible 1px gaps between tiles in my tilemaps, see the following screenshot. When disabling batching the gaps are gone.
batching_bug_minimal

  • The gaps disappear when switching of batching.
  • The gaps only appears for tiles that are created with an offset inside a texture, i.e. a 32x32px tile created from a 32x32px image will not have this problem, but a 32x32px tile created at row 256 in a 32x320px image will have the problem.
  • The gaps are only visible on my android tablet, not my PC, even if I set the window size to the same resolution as my tablet.
  • Filter is disabled for my tile texture, pixel snap is enabled in project settings.
  • Changing the project window size has an effect on the artifacts, e.g. 320x224 shows the problem while 320x400 does not.

Steps to reproduce:

  • Create a new project.
  • Set project resolution to 320x224, stretch mode to 2d, stretch aspect to keep, enable pixel snap
  • Add a new tilemap.
  • Add a new tileset to the tilemap.
  • Using the texture from the attached minimal example project, create a new 32x32 tile at position x:256 y:0
  • Draw a few tiles
  • Run the project on Galaxy Tab A

Minimal reproduction project:
batching_bug_minimal.zip

Previous posts on this:

#38004 (comment)
#38004 (comment)
#38004 (comment)
#38004 (comment)
#38004 (comment)
#38004 (comment)
#38004 (comment)

@dalexeev
Copy link
Member

This happens when scaling with a non-integer factor. Pixels do not match exactly, so this effect is obtained. This can be avoided if you outline each tile/sprite as follows:

1

In addition, setting "Use Pixel Snap" and disabling the filter when importing images is very important.

Unfortunately, Godot is not refined for pixel graphics and does not perform the corresponding conversions automatically and implicitly.

@Calinou
Copy link
Member

Calinou commented May 28, 2020

@dalexeev See #6506.

@lawnjelly
Copy link
Member

I think I've worked out at least one potential problem. The bias in the shader to the UV is always positive. But if the tile is flipped (on horizontal or vertical axis) this bias may also need to be flipped, depending on the draw method. The uniform method does this slightly differently if I remember correctly, it literally draws the thing back to front, which may affect this.

I'm not sure this is the problem in your scenario but it seems worth testing.

#ifdef USE_PIXEL_SNAP
	outvec.xy = floor(outvec + 0.5).xy;
	// precision issue on some hardware creates artifacts within texture
	// offset uv by a small amount to avoid
	uv += 1e-5;
#endif

I'll do some experiments, I'm confident we will come to as best solution as possible (given the precision constraints at the GPU).

I'm also still not absolutely sure using a constant bias is a good idea compared to half a texel. Either way the problem with such small bias is they are liable to precision issues e.g. from fixed point.

@mattislorentzon
Copy link
Author

I ran a few more tests using this 32x32 tile:
tile

The tile is only placed at one position once in the tilemap, and there are no other sprites in the project. Batching diagnostics are:

canvas_begin FRAME 547
items
	joined_item 1 refs
			batch D 0-0 
			batch R 0-1 [0 - 143] {255 255 255 255 }
canvas_end

When running on my android tablet the tiles are displayed as follows.

Batching enabled Batching disabled
batching_on batching_off

Modifying the pixel snap uv-offset in gles2 canvas.glsl had no effect regardless of batching (except messing up the godot editor). Do I need to rebuild the export templates for changes to show on my android device?

But if the tile is flipped (on horizontal or vertical axis) this bias may also need to be flipped

I couldn't figure out how to flip the tile, let me know how if you'd like me to try it out :)

@lawnjelly
Copy link
Member

lawnjelly commented May 29, 2020

Ah with the UV effects, I suspect it is having no effect in android because you have to recompile the android export templates when you make changes (and ensure you are using the recompiled template in the export). This makes tweaking android a bit of a pain as it is very easy to accidentally be using the wrong template, I've done it many times! 😁

The way I do the templates is, I compile them (follow instructions in the docs) .. I think they now get placed in the bin/ folder. Then in the android export section for the project you have the choice to use custom templates, and I point it to these output files. You could presumably alternatively copy them to the default template folder, but you'd need an automated solution as you would probably forget.

If you do get the templates compiling and working, one key thing to test would be in rasterizer_canvas_base_gles2.cpp, line 1009:

RasterizerCanvasBaseGLES2::RasterizerCanvasBaseGLES2() {
#ifdef GLES_OVER_GL
	use_nvidia_rect_workaround = GLOBAL_GET("rendering/quality/2d/use_nvidia_rect_flicker_workaround");
#else
	// Not needed (a priori) on GLES devices
	use_nvidia_rect_workaround = false;
#endif
}

This disables nvidia workaround on non-desktop platforms. If you could override this, e.g.

	// Not needed (a priori) on GLES devices
	use_nvidia_rect_workaround = true;

If you now set use_batching to false, this will test the indexed primitive draw method on android, and find out whether it suffers from the same bug. I suspect it would .. this will be helpful in determining the cause.

1:1 assumption

I realised some problems with the UV offset last night. Basically there are a lot of problems because UV offsetting is based on the idea that you will be using a 1:1 scale of pixels to texels.

Classically you might apply a 0.5 texel offset, which would make it most likely the centre of the texel would get hit for the filtering.

So consider a single pixel, starting at texel 0.5, and ending at 1.5. It samples the 0.5 and is correct.

Multiply by 10 (for 2d scaling) and the last line will sample at texel 1.4, and sample outside the intended UV area. This is kind of hard to explain but I will try and make a diagram later.

So it could be that the correct offset to use will depend on the pixel / texel ratio (or likely as a quick test, if you remove the offset bodge, it's more likely to work).

I'm still interested to work out why the uniform buffer method seems to be different in this respect.

Precision

The differences between desktop and android are intriguing, it suggests a lower precision being used either in the vertex or fragment shader which are affecting it. It could be that adding the UV offset on the CPU rather than in the shader helps in this respect (or at least gives more repeatable behaviour). The calculation in the vertex shader could be happening at a lower bitdepth with less resolution than expected (this seems likely). On the other hand the input data may be quantized to that bitdepth anyway so doing the calculation on CPU may not make a difference. The fragment shader is even more likely to be using a lower bitdepth than desktop.

@lawnjelly
Copy link
Member

I have now replicated this on desktop. All I needed to do was to increase the uv bodge offset. It seems to exhibit the same behaviour as you are reporting (it works fine at 1:1, and then as you increase the scale, you get lines appearing on odd rows etc).

#ifdef USE_PIXEL_SNAP
	outvec.xy = floor(outvec + 0.5).xy;
	// precision issue on some hardware creates artifacts within texture
	// offset uv by a small amount to avoid
	//uv += 1e-5;
	uv += 0.001;
#endif

Screenshot from 2020-05-29 08-36-55
This would seem to backup the theories that we are looking in the right place.

@lawnjelly
Copy link
Member

lawnjelly commented May 29, 2020

Some useful stuff here:
https://community.khronos.org/t/pixel-perfect-drawing/38454/6
One suggestion is instead of modifying the UVs, to modify the positions by 0.5 pixel. This sounds intriguing (I may well have done this myself in the past on non Godot OpenGL, I should check some old code). Although I suspect this is going to have to be done in a convoluted way to handle the viewport 2d scaling (I'm not sure how the scaling is handled in Godot, whether it is concatenated, will investigate).

They also mention the difference between drawing e.g. a line from A to B is not the same as from B to A, something I mentioned in my earlier post about the contrast in drawing methods between the uniform buffer method and the index primitive method. Using the same flipped method for indexed primitive may help with the uv offset polarity problem.

@lawnjelly
Copy link
Member

The more I read the more it seems like this is not an easy problem. Expecting borders of atlas tiles to match up with non-integer scaling might not be possible. A lot of people (outside of Godot) do seem to end up having to use padding solutions like @dalexeev .

Part of the problem I suspect is that pixels on screen (and texels) are mapped from the bottom left of the bottom left pixel (or texel) to the top right of the top right pixel (or texel). The maths therefore seems to get a bit floaty and precision sensitive when you are using atlases (i.e. not using the whole 0.0 - 1.0, 0.0 - 1.0 range of the texture), as you have to use this to calculate the correct adjusted UVs to use.

This is all compounded by the UV bodge, which may not be the best solution for the problem it was addressing, I need to take a look at that.

@lawnjelly
Copy link
Member

lawnjelly commented May 29, 2020

Ok this was a mistake on my part, but I ended up taking some screenshots on android with filtering active, without pixel snap on the left and with pixel snap on the right. Not hugely significant but shows the values are ballpark right for reading from the texels.

linear_point_scaled

@lawnjelly
Copy link
Member

Argg. Finally got the texture filtering flag working, and I can't replicate it, on desktop or tablet. 😞

@mattislorentzon
Copy link
Author

I feel your pain 😁

I'm starting to feel that I'm in a bit over my head, but I got the export templates up and running at last!
I ran the following combinations on my android tablet:

Forced rect workaround UV offset 0 UV offset 1e-5 UV offset -1e-5
Batching: Problem Problem Problem Problem
No Batching: Problem No problem No problem No Problem

"Forced rect workaround" refers to forcing the use_nvidia_rect_workaround flag to true as suggested in rasterizer_canvas_base_gles2.cpp. "UV offset" refers to the offset used when pixel snapping in gles2 canvas.glsl

Problem means the blue border is visible on the bottom and right edges of the rendered tile.

Worth noting is that forcing the rect workaround indeed makes the problem appear when batching is disabled. Also the small UV offset doesn't seem to have any effect on the issue for me (triple checked 😄)

@lawnjelly
Copy link
Member

lawnjelly commented May 29, 2020

I'm starting to feel that I'm in a bit over my head

Oh you and me both lol! 😁

But those results are great, thank you for testing. 👍

They do show that it's the indexed primitive drawing method that is associated with the problem, but it is for some reason working with the uniform drawing method (rather than the batching per se, the batching just uses the more standard method of drawing, same as the nvidia workaround).

Also it would be helpful to know whether this same situation occurs when you aren't using pixel snap (my guess is yes, but if it doesn't occur that would be interesting).

Hardware

This is probably solvable (given that the uniform method works well enough) but the main snag is it is usually a case of trying a matrix of different possibilities and finding what triggers it, and this is hard to do remotely for the rendering guys without the hardware.

I did bring this up with @akien today, in the long run it might be worth me and @clayjohn buying a couple more second hand old android devices so it's easier to fix these issues. In a lot of cases the emulator doesn't help because the issues are down to driver bugs or precision. I do have a phone with really bad precision that was handy for this but the minimum Godot version is too high for it.

It is incredibly annoying that precision emulation isn't available on any of the software renderers to my knowledge. In fact I've been wondering whether it might be possible to make some modifications to the mesa software GL implementation in order to allow precision emulation. That would be excellent for testing purposes (not just to us, but to everyone developing on mobiles). I might have to try and mention this on their site and see if this is practical.

Fixes

Incidentally, I have a couple of ideas that will likely solve it, but it would be nice to fully understand it instead of adding another bodge, because otherwise fixing this can end up creating a visual anomaly in another situation.

For instance, in both the nvidia_workaround and the batching draw calls it is possible to simply reduce the source rect size by a small sub pixel amount. By increasing it a small amount I can get the same behaviour as the bug.

Another idea is to have a conversion function that adjusts source rect UVs according to their position through the texture (based on the idea that bottom left of bottom left pixel is 0, 0, and top right of top right pixel is 1, 1).

Another precision idea is to pass the UVs as source rect coords and multiply by pixel_size in the shader (or even try doubles on the CPU).

However I'm not keen on the idea of introducing fixes without fulling being able to investigate the problem. Anyway I'll keep an eye on this for now and see if there are any more reports of the problem to see how common it is.

The other thing is that am I right in saying, changing to a power of 2 texture size fixed it for you? Perhaps that is the best solution for now until we have this pinned down better.

@mattislorentzon
Copy link
Author

The other thing is that am I right in saying, changing to a power of 2 texture size fixed it for you?

Yup, I have not been able to trigger the problem with power of two sized texture atlases!

The ideas sound great to me, and I share your concerns about introducing fixes to problems that are difficult to reproduce :) If you'd like me to try out some patches in the future I'll be happy to help!

@robertarnborg

This comment has been minimized.

@Calinou
Copy link
Member

Calinou commented May 30, 2020

@robertarnborg Please don't bump issues without contributing significant new information. Use the 👍 reaction button on the first post instead.

As a workaround, you can change the Default Clear Color in the project settings to match the average tile color more closely (or add a large ColorRect as a background).

@lawnjelly
Copy link
Member

lawnjelly commented May 31, 2020

Actually this is useful info, @robertarnborg : What hardware / OS are you running on?

Also can you give me any more specifics about your situation that triggers it (what size texture, what size tiles etc), or even a minimum reproduction project. It may give more clues or better still enable me to replicate it which would make it far easier to fix.

Also please let me know if using a power of 2 size texture helps or not. The precision of the UV in the fragment shader is mediump (as highp may not even be available) and we may be exceeding the texel accuracy available there with a non POT texture. Generally using POT and tiles of a regular size like 16, 32, 64 is more likely to work. Or maybe a multiple of 16. Tile sizes without a divisor like 4 or 8 are more likely to cause problems (e.g. 3, 5, 7, 9, 11 etc).

I will bump up the research into this, if it was a once-off that is different from several people reporting it. 👍

As I say I have a few fix ideas but they will require regression testing, especially as I don't have hardware that exhibits the problem. I could implement 2 or 3 fixes in some experimental android templates as project settings, and get you guys to test them.

@lawnjelly
Copy link
Member

lawnjelly commented May 31, 2020

Can either of you guys confirm this:

Does the problem occur with pixel_snap off? Or does it only occur with pixel_snap switched on?

This is important because offsetting the pixels so it jumps to a neighbouring pixel can have an indirect effect on the UVs that are read. I've so far been concentrating on the UVs but it could be that the UVs are fine and pixel snap is pushing the UV read over a boundary.

i.e. There may need to be a corresponding adjustment to UVs when snapping vertex positions.

@lawnjelly
Copy link
Member

lawnjelly commented May 31, 2020

Ok I hope this download works. I've compiled some debug android templates with a little bodge on the UVs which may help:
http://www.mediafire.com/file/0blnlo3nq7bg66u/android_debug.apk/file

Only the 2 arm versions are relevant, it kept including an old x86 in the APK, and I have no idea where it was cached to get rid of it.

You will have to select this as a custom debug template in your android export, and make a debug export. You will also have to use the batching path to test it, I haven't done the change on the nvidia workaround (non batching renderer).

@lawnjelly
Copy link
Member

I've been experimenting with various UV offsets. In clayjohn's original method to deal with precision #26467 a small constant was added to all vertex uvs (1e-5).

The problem is throughout a tile (or sprite) sampling UVs can sometimes take place along integer borders of the texture, with some passing the border and some not, giving the staggering effect.

The problem I think with adding to every vertex UV is although that might make sense for the minimum verts (of a rect), pushing the maximum UV makes it more likely to sample out of bounds, so I'm working on the assumption we need to do the opposite for maximum UVs.

This is where the uniform method and the indexed primitive method may need to diverge. Actually I have no idea why the uniform method doesn't suffer the problem, it is probably due to a subtlety of precision doing the calculation on the GPU compared to CPU. It's very difficult to guess without being able to test.

I've tried increasing the minimum UV by 0.5 texel and decreasing the max by 0.5. This will probably work at 1:1 scale, but with 2d scaling of the viewport it results in 'expanding' the tile and cutting off some of the edge. So it would seem that small offsets is the way to go with this approach.

It may be necessary to remove the offset bodge from the shader when not using TEXTURE_RECT, and apply it CPU side instead, adding a small offset to the mins and subtracting a small offset from the maxes.

Of course what this 'small offset' should be is difficult to guess. @mattislorentzon reports that there was still a problem with a -1e-5 offset. Although whether it was the 'same' problem I don't know, I'm assuming so (could it have been reverse polarity edge showing?). The test android APK I made above just subtracts a 0.25 texel offset from the max UVs, so we'll have to see if this works for a start.

@mattislorentzon
Copy link
Author

mattislorentzon commented May 31, 2020

You're on a roll 👍 Must be super frustrating to put so much thought into this but not be able to see the problem directly 😄

Does the problem occur with pixel_snap off? Or does it only occur with pixel_snap switched on?

I ran the attached project using the combinations of Batching on/off, Pixel snap on/off. The images included below are exactly how they are rendered on my device (lossless screenshot, no scaling or other post processing).

issue_39096.zip

Pixel Snap On Pixel Snap Off
Batching On batchOn_pixelSnapOn batchOn_pixelSnapOff
Batching Off batchOff_pixelSnapOn batchOff_pixelSnapOff

Edit: I probably should have chosen a background color which differs more from the tile color 😄 But just to be clear, the outmost dark border is the project background and not part of the tile.

And using your custom android template:

Pixel Snap On Pixel Snap Off
Batching On batchOn_pixelSnapOn_lawnjellyTemplates batchOn_pixelSnapOff_lawnjellyTemplates

The bottom-right edges are a bit squished, so 0.25 texels might be a bit too much, but it seems to have the effect we wished for!

mattislorentzon reports that there was still a problem with a -1e-5 offset. Although whether it was the 'same' problem I don't know,

The problem was visibly the same, i.e. the artifact appeared on the bottom-right edge of the tile.

@lawnjelly
Copy link
Member

Ah fantastic with the tests! 👍

Yes with the custom template above I erred on the side of caution and offset by 0.25 of a texel, which will show up when zoomed.

I'll have a little think about whether the max value should have a fixed offset (like in the shader) or a value in fractions of a texel. I'll probably be able to make another template once I've finalized a method, may be tomorrow though. I might for the best be able to bandit some other float value from the project settings so you can adjust it (this will avoid having to recompile the engine).

I think we are getting there, despite the difficulty! 😁

@lawnjelly
Copy link
Member

lawnjelly commented Jun 1, 2020

Ok I have now made another version of the android debug templates:
http://www.mediafire.com/file/9dz0gbj902rkixq/android_debug.apk/file

On this version I have bandited the rendering/gles2/batching/max_join_item_commands project setting, so that it is used to determine the UV offset subtracted from the rect max UVs.

The formula used is:

offset = max_join_item_commands / 1000000

This is in absolute UV units (scaled from 0 to 1) rather than in texels. Ideally the measurement should be in texels, but because precision is fixed on the GPU on the 0 to 1 scale, a correct texel offset may end up being below the minimum precision of the GPU at some texture sizes.

A value of max_join_item_commands of 0 will give no change, and around 300 will probably be too much. So the sweet spot should be somewhere in between those values, so if you can give me some kind of idea of the minimum value that solves the problem in your test case, and then let me know the texture dimensions and the tile dimensions.

I can then use that to estimate a good value to use. This of course depends on the precision in your GPU but it should give me a ballpark figure.

The GLES spec Page 33:
https://www.khronos.org/files/opengles_shading_language.pdf
Gives minimum ranges and precisions for low, medium and highp, this is also helpful.
Lowp Absolute 1/256
Mediump Relative 1/1024
Highp Relative 1/65536

The Mediump and Highp are a bit more difficult because they are relative precisions. In practice I think these will be at least medium - highp in the vertex shader. The range is also quite a bit higher than 1, so we may get better results by passing in the UVs as texels and converting to 0 to 1 in the shader, that is always an option.

It could alternatively be that the precision in the vertex shader is fine and the precision issue occurs in the fragment shader, but we still need to effectively know what offset bodge to use to correct for this.

@mattislorentzon
Copy link
Author

mattislorentzon commented Jun 1, 2020

Awesome! These are the results I get:

0 44 45 300
batchOn_000_cropped batchOn_044_cropped batchOn_045_cropped batchOn_300_cropped

The blue border disappears when switching max_join_item_commands from 44 to 45. There is no difference between 45 and 300.
There is however still a slight dissimilarity at the right and bottom edge between batching with offset 45 and non-batching.
batchOff_batchOn044_batchOn045_comparison

But maybe this is expected?

Device/OS: Galaxy Tab A (SM-T510), Android 9, GPU: Mali-G71 MP2
Pixel snap: On
Stretch mode: 2D (with 'keep aspect')
Window size: 224x224 px
Tablet resolution: 1920x1200 px
Tile size: 32x32 px
Tile texture atlas size: 448x448 px
Tile top-left position in texture atlas: x: 256 px, y: 256 px
Tile top-left position relative scene/window origin: x: 32 px, y: 32 px
Tile top-left position ẃhen displayed on the tablet: x: 531 px, y: 171 px
Tile width when displayed on the tablet: 172x172 px

@robertarnborg
Copy link

robertarnborg commented Jun 1, 2020 via email

@lawnjelly
Copy link
Member

lawnjelly commented Jun 1, 2020

Thanks to both of you guys! 👍

The slight dissimilarity is actually really interesting. I've been lining them up in photo editor to examine. The shapes of the corner blobs look identical in 0 and 44, but the exact moment we lose the blue line the blobs also shift. This is making me wonder whether we either also need a shift in the minimum UV, or the problem is with the positions rather than the UVs.

I'll try some experiments with the positions and see if I can emulate the bad behaviour.

Very interesting that it can occur on desktop too. BTW @robertarnborg, you can temporarily get around it for playtesters by disabling batching until it is fixed, but performance will probably suffer.

Edit: Actually I've noticed something fundamentally different about the uniform method and the index method:

The UV in the uniform method is derived from the vertex (which is specifically marked highp), however in the indexed method (and all other drawing) the UV is passed by the uv_attrib which is NOT marked as highp. This is definitely worth trying. The only thing is usually precision has no effect on desktop, which doesn't match up with the report of it occurring on desktop.

@lawnjelly
Copy link
Member

Ok one last build of the templates for tonight:
http://www.mediafire.com/file/vehgdejtf5rm78l/android_debug.apk/file

This time I've modified the shader to use highp. There's just a chance this will help even with max_join_item_commands set to 0. Worth a shot.

Other than that this version does a corresponding change to the rect minimum UVs as well as the maximum UVs, with the aim of keeping the blobs equal.

Failing that I'll start looking at the positions tomorrow.

@mattislorentzon
Copy link
Author

Thanks again!
I downloaded the new template, double and triple checked that I actually used it in the export (:smile:), and ran the same evaluation with max_join_item_commands set to 0, 44 and 45. The results are exactly the same as using the previous template. I used ImageMagick's compare function to diff the images and they are identical for every pixel. So the latest change had no visible effect on the example project.

The UV in the uniform method is derived from the vertex

My brain is still slowly trying to catch up, but wouldn't this mean that the uv-positions in this method are affected when rounding the positions in the pixel snap?

@lawnjelly
Copy link
Member

lawnjelly commented Jun 2, 2020

Thanks again!
I downloaded the new template, double and triple checked that I actually used it in the export (), and ran the same evaluation with max_join_item_commands set to 0, 44 and 45. The results are exactly the same as using the previous template. I used ImageMagick's compare function to diff the images and they are identical for every pixel. So the latest change had no visible effect on the example project.

Hmm, this isn't entirely surprising. You might only notice the difference once you get to higher values like >300. This may depend on the size of your texture, the size of the tile, and which tile you are addressing in the atlas.

Precision

I'm assuming the problems are coming from precision, have a read of:
https://docs.python.org/3/tutorial/floatingpoint.html

For some explanation of the principles, although consider that on a GPU, rather than 53 bits of precision, we may only have 8 or 10 bits! Floating point tries to give the impression of a continuous range, but it also works in discrete steps, like integer, but unlike integer these steps can be irregular, and hard to predict (especially when the mapping is not 1:1).

So you can end up in a situation where you apply the exact same subtraction from a min value and a max value, and one of them responds and one doesn't (or they respond as if you had subtracted different values).

Floating point

Floating point is therefore really a rubbish way of addressing texels, especially in a texture atlas that is not power of 2. But they do have the property that they work fine irrespective of the size of the texture (i.e. 0 to 1 is the same if you change texture size from 63 to 412 or whatever).

Floats and fixed point get considerably easier when you are dealing with power of 2 texture sizes - certain operations become much easier, more accurate and faster. Consider that to find e.g. a fixed point mip map texel address you can simply shift the bits (divide by 2). And for a 256x256 texture, even with 8 bits of precision, you can still exactly address each texel.

So for texture atlases, to prevent this kind of problem you end up having to do hacks to workaround the lack of precision, hence things like this offsetting.

The UV in the uniform method is derived from the vertex

My brain is still slowly trying to catch up, but wouldn't this mean that the uv-positions in this method are affected when rounding the positions in the pixel snap?

Error

All the calculations are potentially subject to precision error:

  • Calculation of the UV on the CPU
  • Transfer of the UV into the shader
  • Any calculations on the UV in the vertex shader
  • Transfer of the UV from the vertex shader to the fragment shader
  • Fragment shader calculations

So you can see how difficult it can be to guess where the precision error (or other bug) is occurring without hardware!

Outside 1:1 Ratio

Also consider that the problem is hard enough with a 1:1 ratio of pixels to texels. But once you use the 2d viewport stretching, things get especially tricky. It is very easy for the first and last line to shift over the texel boundary, due to these inaccuracies.

Solutions

The traditional ways to deal with this problem is to:

  • Not use a texture atlas, use a whole texture, and apply CLAMP to the edges (so if you go off the edge it repeats)
  • Apply a single pixel border, replicating the edge pixels of the tile in the texture (i.e. instead of making the tiles 32x32, you make them 30x30 then replicate the edges).
    A lot of games do this, but we have to work within the constrains of current functionality - Godot doesn't do this for tilemaps (and maybe not for other similar cases)
  • Apply clamping in the fragment shader.
    We probably shouldn't do this because it is potentially slow, and on many of the target platforms touching UVs in the fragment shader can switch to a lower precision path. Although it is available as a last ditch option.
  • Apply either a position or UV offset to try and coax the reading of texels from within the range
    For 1:1 mapping, you can often apply a 0.5 pixel offset. The problem is that if we do that for 2d viewport expansion, we end up with 'expanded' tiles, with a thinner edge pixel than those within the tile (kind of like a reverse fish eye effect).

I'm wondering whether we might be better off sending the UVs as texels to the shader, and doing the conversion to 0-1 range in the vertex shader. It would at least remove 2 areas of possible bug surface (the CPU calculation, and the transfer). This has the advantage that the texel UVs can be passed exactly.

Overall though this is really going to need a developer who can replicate the problem to properly solve it. As you can see, I have a lot of ideas, but it is too difficult to test the hypotheses rapidly. We can apply the bodge on the UV maximums, however as you can see this is not perfect because it does result in some inconsistencies. Maybe we'll have to do that as a working solution for now.

The other simpler option is to just recommend using POT texture atlas sizes, and POT tile sizes, if it cures the problem.

Some useful info here, bit repeating what I have said but maybe better explained:
http://download.nvidia.com/developer/NVTextureSuite/Atlas_Tools/Texture_Atlas_Whitepaper.pdf
https://stackoverflow.com/questions/18818555/opengl-2d-game-texture-artifacts

@lawnjelly
Copy link
Member

lawnjelly commented Jun 2, 2020

Good news, I've finally worked out how to get it to happen on desktop. I went with the idea of @robertarnborg using small tile size, so I went to an extreme with a 2x2 texel tile, inside a large non regular atlas (39x1797), and zooming in quite a bit.

Texel_SmallTile.zip

This should make it much easier for me to try things out and come up with a solution. 👍
Screenshot from 2020-06-02 18-33-56

Note the above is with pixel snap turned off. However I think it is likely to be the same bug, pixel snap just makes it less likely to appear, down to precision in the read UV coordinates. For example is you have a 2x2 texel texture, and you attempt to read from 0.5 to 1.0, then which texel should it read if the UV coordinate is band on 0.5, the left or the right? Any floating point varation around this will lead to sometimes reading the left and sometimes the right.

@lawnjelly
Copy link
Member

Ok I now have made a PR that should (hopefully fix this). It isn't ideal though because it will result in slight squashing of the edge texels.

Looking at alternative solutions, I don't think it would necessarily be that difficult to introduce a system into the tilemaps to auto-generate a 1 pixel padded version of the texture, and modify the UVs automatically. It might be a bit more verbose code wise but would solve the problem and not be hardware sensitive.

Being a perfectionist I think this might be the best final solution, it will also deal with artifacts if you choose to use linear texture filtering.

I'll try and do a proposal and have a look into this when I get some time. 👍

@Zireael07
Copy link
Contributor

Looking at alternative solutions, I don't think it would necessarily be that difficult to introduce a system into the tilemaps to auto-generate a 1 pixel padded version of the texture, and modify the UVs automatically. It might be a bit more verbose code wise but would solve the problem and not be hardware sensitive.

I think a 1px-padded version could be very easily done via image.set_pixel() :)

@robertarnborg
Copy link

Here a screenshot how these things pop up for me. A minor annoyance, but a huge issue in regards to polish for your game. Waiting for Godot here since over two years now, working with this beautiful engine but... nope. Unacceptable.

Screenshot (88)

@lawnjelly
Copy link
Member

Here a screenshot how these things pop up for me. A minor annoyance, but a huge issue in regards to polish for your game.

Sure, and I agree, but this is a beta, exactly to catch these issues. If you don't want to deal with them, use the last stable version, or turn off batching and it should go back to how it was before. We can only fix these issues as they are reported, and this was only reported a few days ago, and some are more difficult to fix than others.

Also, I understand the frustration, but saying this is unacceptable isn't really appropriate. Most contributors are not being paid, and are volunteering their spare time to try and improve the engine and fix bugs. This is open source software. If you find something 'unacceptable', then get stuck in and fix it. You have 1 year more experience with the engine than me. I only set eyes on the 2d source code in February. 😁

@lawnjelly
Copy link
Member

I think a 1px-padded version could be very easily done via image.set_pixel() :)

Yes I think actually making the padded texture and adjusting the UVs is not too difficult. The tricky bit is formulating a workflow with the Godot export process.

For ease of use, ideally we would have the source texture with no padding, then at startup create a padded version. The only snag here is texture compression. If it takes some time to compress a large atlas that would be undesirable. Uncompressed would work, but would would take more RAM (and is slower at runtime in some cases than compressed texture).
Texture compression also has impacts on the padding. We may need to use 2 pixel padding in order to get 4 pixel alignment to avoid compression artifacts.

The alternative is to export the texture as padded, and use as is. This is the most efficient for the end product. The only difficulty is fitting it into the existing resource paradigm. At the moment in most cases all the textures are imported, so we could end up with both the unpadded and the padded texture being in an export.

I really know very little about the resource handling / exporting side of Godot, so can only make wild suggestions. We could have a special folder name (e.g. tile_source) that won't be exported but could be used as the source textures for generating padded textures. That might work, selecting them in the tilemap editor, and simply ignoring them as far as resources are concerned. I'll have to seek advice on this.

@robertarnborg
Copy link

robertarnborg commented Jun 3, 2020 via email

@robertarnborg
Copy link

robertarnborg commented Jun 3, 2020 via email

@lawnjelly
Copy link
Member

I was just researching how to add some auto-padding, when I found there appears to be some kind of support already for padding, there is a separation parameter in the TilesetEditorContext (it took me a while to find even with the webpage):

https://godotengine.org/qa/48528/tileset-spacing-and-bad-snapping

This could also be utilized to solve the problem. I don't know whether this was the intention with this feature, or whether it was added to ease importing 3rd party tilesets. The padding areas need to be copies of the edge texels. I think the padding in the corners may be incorrect (it needs a pixel barrier at the edge for the filtering to work) so this might need fixing, alternatively you could just miss out using the tiles at the sides.

I don't know if there is a feature already for automatically adding this padding, if not it could be added. I will ask around.

@akien-mga akien-mga added this to the 3.2 milestone Jun 10, 2020
@akien-mga
Copy link
Member

Fixed by #39256.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants