Segmentation Fault in AllocAppender #470

yazd · 2014-01-17T23:08:02Z

Got this segmentation fault when rendering a diet template.

__memcpy_ssse3_back () at ../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S:2816
2816    movdqu  %xmm0, 33(%rdi)
(gdb) 
(gdb) 
(gdb) bt
#0  __memcpy_ssse3_back () at ../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S:2816
#1  0x00000000007ac41f in _d_arraycopy ()
#2  0x000000000074b6af in vibe.utils.array.__T13AllocAppenderTAhZ.AllocAppender.put() (
    this=0xb49f98, arr=...) at ../../.dub/packages/vibe-d-master/source/vibe/utils/array.d:93
#3  0x000000000074b759 in vibe.utils.array.__T13AllocAppenderTAhZ.AllocAppender.put() (
    this=0xb49f98, arr=...) at ../../.dub/packages/vibe-d-master/source/vibe/utils/array.d:99
#4  0x000000000067234f in vibe.http.common.ChunkedOutputStream.write() (this=0xb49f80, 
    bytes_=...) at ../../.dub/packages/vibe-d-master/source/vibe/http/common.d:422
#5  0x00000000007518e4 in vibe.core.stream.OutputStream.write() (this=0xb49fd0, bytes=...)
    at ../../.dub/packages/vibe-d-master/source/vibe/core/stream.d:77
#6  0x0000000000638184 in vibe.templ.diet.__T15compileDietFileVAyaa9_7365726965732e6474S178_D4vibe3web3web108__T6renderVAyaa9_7365726965732e6474S69_D3app15VideoDownloader9getSeriesMFmkZv8ttSeriesHmC2dv7library6SeriesZ6renderMFZv3reqC4vibe4http6server17HTTPServerRequestS69_D3app15VideoDownloader9getSeriesMFmkZv8ttSeriesHmC2dv7library6SeriesZ.compileDietFile() (
    this=0x7ffff4e03170, __applyArg0=<error reading variable>) at series.dt:12
#7  0x00000000007abb6e in _aaApply ()
#8  0x0000000000638042 in vibe.templ.diet.__T15compileDietFileVAyaa9_7365726965732e6474S178_D4vibe3web3web108__T6renderVAyaa9_7365726965732e6474S69_D3app15VideoDownloader9getSeriesMFmkZv8ttSeriesHmC2dv7library6SeriesZ6renderMFZv3reqC4vibe4http6server17HTTPServerRequestS69_D3app15VideoDownloader9getSeriesMFmkZv8ttSeriesHmC2dv7library6SeriesZ.compileDietFile() (
    this=0x7ffff4e031d8, stream__=0xb49fd0) at series.dt:8
#9  0x0000000000637f3c in vibe.http.server.__T6renderVAyaa9_7365726965732e6474S178_D4vibe3web3web108__T6renderVAyaa9_7365726965732e6474S69_D3app15VideoDownloader9getSeriesMFmkZv8ttSeriesHmC2dv7library6SeriesZ6renderMFZv3reqC4vibe4http6server17HTTPServerRequestS69_D3app15VideoDownloader9getSeriesMFmkZv8ttSeriesHmC2dv7library6SeriesZ.render() (this=0x7ffff4e031d8, res=0xb5ad00)
    at ../../.dub/packages/vibe-d-master/source/vibe/http/server.d:234
#10 0x0000000000637eda in vibe.web.web.__T6renderVAyaa9_7365726965732e6474S69_D3app15VideoDownloader9getSeriesMFmkZv8ttSeriesHmC2dv7library6SeriesZ.render() (this=0x7ffff4ffc0c0)
    at ../../.dub/packages/vibe-d-master/source/vibe/web/web.d:62

The error is reproducible every time.
While trying to debug it, I logged the variables in AllocAppender put(). And although m_remaining reported enough space, accessing it produced the segfault.

The text was updated successfully, but these errors were encountered:

yazd · 2014-01-17T23:19:40Z

Further testing shows that m_data is itself inaccessible.

yazd · 2014-01-18T09:04:21Z

Using VibeManualMemoryManagement, the segfault doesn't occur.

etcimon · 2014-01-18T21:58:54Z

I had problems with GC mode as well, I think vibe's memory should be manually managed by default.

yazd · 2014-01-29T21:19:59Z

I'll close this issue as apparently it is a bad report without a reproducible case.
If I come by a small reproducible case, I'll reopen this.

yebblies · 2014-05-01T06:43:58Z

I think I just hit this bug - I got a segfault in memcpy and a broken stack trace.

Unfortunately the application is closed-source and the crash only happened after ~3 hours.

The VibeManualMemoryManagement workaround seems to fix it for me too.

s-ludwig · 2014-05-12T11:50:53Z

Possible additional instance: http://forum.rejectedsoftware.com/groups/rejectedsoftware.vibed/thread/17107/

luismarques · 2014-05-12T13:36:49Z

@yazd I don't understand, in the first post you said "The error is reproducible every time"; why was this closed later as a "a bad report without a reproducible case"? In my case, which @s-ludwig mentions, the bug is not reproducible deterministically, so if you have a deterministic example sharing it would help.

yazd · 2014-05-12T13:48:23Z

Originally, it was reproducible for me, but it was not a minimal testcase. I changed some stuff and it got working and I lost the case. I'll try going back to that project and to reproduce the issue again. I think it's worth a few hours of my time.

etcimon · 2014-05-12T13:49:08Z

@luismarques did you have manual memory management when this bug occured?

yazd · 2014-05-12T13:53:41Z

Nop, with the manual memory management version everything worked correctly.
Edit: Ops, just noticed the question was for luismarques.

etcimon · 2014-05-12T13:56:02Z

I had tons of problems without manual memory management, that's why I still hold that it should be on by default.

s-ludwig · 2014-05-12T14:04:16Z

I had tons of problems without manual memory management, that's why I still hold that it should be on by default.

It's dangerous (i.e. not @safe)! Using any memory of the HTTPServerRequest outside of the request handler will cause memory corruption (e.g. using a query parameter as an AA key is enough for this)! Adding scope to the req/res parameters would be the least that is necessary before this can be considered as the default setting. But this is a little difficult in terms of providing a deprecation path.

In other languages this wouldn't be really noteworthy (especially C and partially C++), but for D it's kind of the norm that you can't do much wrong in terms of memory corruption because of the GC, so this is kind of unexpected.

etcimon · 2014-05-12T14:08:27Z

you can't do much wrong in terms of memory corruption because of the GC, so this is kind of unexpected.

That's funny because I thought the vibe.utils.memory was setup to manually manage most of the stuff even though the USE_GC is set to true..

luismarques · 2014-05-12T14:35:50Z

@etcimon: I'm using the default automatic memory management.

s-ludwig · 2014-05-12T14:42:33Z

Yes, but the HTTP server uses defaultAllocator(), which is a pure GC based allocator if manual memory management isn't enabled. manualAllocator() is mostly used for internal stuff that is not part of the API.

etcimon · 2014-05-12T15:00:40Z

So.. just a quick guess here to practice my vibe skills, but doesn't these use cases mostly refer to ubytes that were received by a TCP Connection? That would be in the circular buffer, and the access violation means the circular buffer got collected by the GC, yes?

edit: Nvm that wouldn't be possible because TCPConnection.read(dst) copies the data from the buffer

…ess tests to GCAllocator. See #470.

s-ludwig · 2014-05-16T16:27:41Z

I've just had a crash (invalid write access) in that location that was reproducible 100%. After changing the use of GC.realloc to GC.extend, the crash went away. So either I was doing something wrong with GC.realloc, or there is a but in there (or there is some obscure heap corruption going one, but hopefully not...).

yazd · 2014-05-16T17:15:22Z

I wasn't able to reproduce my case, but fortunately, you were. Hopefully this gets it fixed.

s-ludwig · 2014-05-16T23:57:17Z

Another piece of empirical evidence: I've just discovered that the vibed.org website frequently crashed due to a general protection fault, but stopped doing so ever since I've rebuilt with the "fixed" vibe.d version. So it's not guaranteed that the GPF was caused by this issue here, but at least it seems likely.

s-ludwig · 2014-05-19T15:58:36Z

Closing until new evidence comes up. The vibed.org process didn't crash anymore since the fix has been incorporated (runs for 3 days now).

etcimon · 2014-05-20T00:57:07Z

So this means GC.realloc could produce memory corruption in other instances of use, such as associative arrays? I think this may also be why you resort to "magic numbers" wouldn't it?

etcimon · 2014-05-20T06:06:23Z

Ok I found a few huge bugs in the GC.realloc routine

https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gc.d#L674
https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gc.d#L674

Apparently the updateCaches is used in extend and realloc to cache the size of the new allocation, because it speeds up the findSize method. However, it's nowhere to be found in one specific branch of the realloc method (when the previous size is smaller than a page size ie. 4096 bytes)
https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gc.d#L679

So, if you do the following you should get a write access violation:
1- Allocate 2048 bytes
2- Reallocate to anything lower than 1024 bytes
3- Reallocate to 2048 bytes again

The 2048 byte allocation will be ignored because realloc thinks psize is 2048 bytes. Anything that followed this logic will end up with a write access violation. (Assuming no other realloc took place in between...)

...AND if the old and new allocation sizes are larger than a page (4096 bytes) and consecutive blocks of pages aren't found in the pagetable, it also skips the cache update by going the same malloc route as stated above

https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gc.d#L657

This will ALSO prevent pages from being freed, and will lead to the GC memory leaks which I've experienced! (finally found it!)

auto psz = psize / PAGESIZE; ( psize is the previous old allocation size b/c of cache bug )
[..]
pool.freepages -= (newsz - psz);

s-ludwig · 2014-05-20T13:08:43Z

The magic numbers are just there as a safety measure to detect when the postblit constructor or the destructor is called on uninitialized memory. This mainly happens in conjunction with AAs, but also in some other cases (ternary operator, I think), but is not really relevant for GC allocations. But the situation is far better now than a year ago, where it still frequently triggered.

s-ludwig · 2014-05-20T13:19:30Z

Regarding the GC realloc issue, it sounds like that could also be the normal behavior, where an additional and completely separate block of memory is allocated. Because there might still be references to the old one, it would be kept instead of being freed. But it would still be collected during the next collection run, when there are in fact no references left.

Disclaimer: I didn't look closer at the source code, yet, so this is pure speculation.

etcimon · 2014-05-20T13:30:49Z

Regarding the GC realloc issue, it sounds like that could also be the normal behavior, where an additional and completely separate block of memory is allocated.

For this it would have to keep the old pointer around, but this branch of the code essentially does what extend does but skipping the size cache. I'm pretty sure it's a bug, you could add

gcx.updateCaches(p, size);

here:

https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gc.d#L706

And I'm pretty sure it would fix this issue #470 without resorting to extend, and it would also fix tons of other issues

s-ludwig · 2014-05-20T13:33:57Z

But that would update the size of the old block, which doesn't change in size. Only a new block is allocated there, AFAICS.

etcimon · 2014-05-20T13:36:50Z

Yes you're right, the key changes so it doesn't need to be updated, so it's something else..

etcimon · 2014-05-20T13:47:02Z

Though this does seem like the right reason the problem would be caused, just not in this specific execution path. It could be when the mallocNoSync call in this branch returns a page that was recently freed, the next time realloc is called on this pointer the previous size is still cached and will be returned there in findSize.

s-ludwig · 2014-05-20T14:09:54Z

Seems like the findSize call here should reset the cache to something valid, so I think that case should be okay, too. But independent of that, the code there seems to be quite fragile - explicit cache updates are usually not a good idea, better would be write a caching wrapper around the non-caching GC implementation.

etcimon · 2014-05-20T14:10:03Z

This execution path seems to return the wrong cache size:

p1 = malloc(size1)
realloc(p1) // this caches size1 for findSize
free(p1) // this doesn't delete the cached size
p1= malloc(size2) // ditto
realloc(p1) // this realloc uses size1

s-ludwig · 2014-05-20T14:16:57Z

That looks like it may indeed be a bug. Unfortunately it doesn't explain the vibe.d issue, because no explicit free calls are used there and the normal collection run resets the cache.

etcimon · 2014-05-20T14:21:25Z

That looks like it may indeed be a bug. Unfortunately it doesn't explain the vibe.d issue, because no explicit free calls are used there and the normal collection run resets the cache.

Free was probably a bad example, the internal use of free is the implementation in freePages which makes a previously cached pointer available.

s-ludwig · 2014-05-20T14:28:00Z

Hm, but freePages is only called otherwise when realloc directly returns the in-place shrunk memory (here), so updateCaches is called, or am I missing something?

etcimon · 2014-05-20T14:28:56Z

Here's how it could have affected vibe.d :

p1 = malloc(size1)
p2 = realloc(p1, size2) // this will call freePages and release p1 without changing the cache to p2
p1 = malloc(size3) // p1 is recycled here, notice p1's size is still in cache
realloc(p1, size4) // this will use psize = size1 although it's a new memory block

etcimon · 2014-05-20T14:40:02Z

Hm, but freePages is only called otherwise when realloc directly returns the in-place shrunk memory (here), so updateCaches is called, or am I missing something?

Yes that's true. I'm not sure anymore if that's the buggy path I'll keep on researching it more..

etcimon · 2014-05-20T14:43:56Z

https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gc.d#L588

This could be a good candidate

etcimon · 2014-05-21T03:05:44Z

After giving it long thoughts, I think an assert(new_sz != 0 && mem !is null) would be necessary in memory.d's realloc()- and of course explicitly deleting the size cache in gc.d's free.

yazd closed this as completed Jan 29, 2014

s-ludwig reopened this May 12, 2014

s-ludwig added a commit that referenced this issue May 16, 2014

Use GC.extend instead of GC.realloc and add assertions and memory acc…

151420b

…ess tests to GCAllocator. See #470.

s-ludwig added bug and removed bug labels May 16, 2014

s-ludwig closed this as completed May 19, 2014

etcimon mentioned this issue May 21, 2014

Fix cache inconsistency when free'ing from GC dlang/druntime#797

Merged

Segmentation Fault in AllocAppender #470

Segmentation Fault in AllocAppender #470

Comments

yazd commented Jan 17, 2014

yazd commented Jan 17, 2014

yazd commented Jan 18, 2014

etcimon commented Jan 18, 2014

yazd commented Jan 29, 2014

yebblies commented May 1, 2014

s-ludwig commented May 12, 2014

luismarques commented May 12, 2014

yazd commented May 12, 2014

etcimon commented May 12, 2014

yazd commented May 12, 2014

etcimon commented May 12, 2014

s-ludwig commented May 12, 2014

etcimon commented May 12, 2014

luismarques commented May 12, 2014

s-ludwig commented May 12, 2014

etcimon commented May 12, 2014

s-ludwig commented May 16, 2014

yazd commented May 16, 2014

s-ludwig commented May 16, 2014

s-ludwig commented May 19, 2014

etcimon commented May 20, 2014

etcimon commented May 20, 2014

s-ludwig commented May 20, 2014

s-ludwig commented May 20, 2014

etcimon commented May 20, 2014

s-ludwig commented May 20, 2014

etcimon commented May 20, 2014

etcimon commented May 20, 2014

s-ludwig commented May 20, 2014

etcimon commented May 20, 2014

s-ludwig commented May 20, 2014

etcimon commented May 20, 2014

s-ludwig commented May 20, 2014

etcimon commented May 20, 2014

etcimon commented May 20, 2014

etcimon commented May 20, 2014

etcimon commented May 21, 2014