Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segmentation Fault in AllocAppender #470

Closed
yazd opened this issue Jan 17, 2014 · 37 comments
Closed

Segmentation Fault in AllocAppender #470

yazd opened this issue Jan 17, 2014 · 37 comments

Comments

@yazd
Copy link
Contributor

yazd commented Jan 17, 2014

Got this segmentation fault when rendering a diet template.

__memcpy_ssse3_back () at ../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S:2816
2816    movdqu  %xmm0, 33(%rdi)
(gdb) 
(gdb) 
(gdb) bt
#0  __memcpy_ssse3_back () at ../sysdeps/x86_64/multiarch/memcpy-ssse3-back.S:2816
#1  0x00000000007ac41f in _d_arraycopy ()
#2  0x000000000074b6af in vibe.utils.array.__T13AllocAppenderTAhZ.AllocAppender.put() (
    this=0xb49f98, arr=...) at ../../.dub/packages/vibe-d-master/source/vibe/utils/array.d:93
#3  0x000000000074b759 in vibe.utils.array.__T13AllocAppenderTAhZ.AllocAppender.put() (
    this=0xb49f98, arr=...) at ../../.dub/packages/vibe-d-master/source/vibe/utils/array.d:99
#4  0x000000000067234f in vibe.http.common.ChunkedOutputStream.write() (this=0xb49f80, 
    bytes_=...) at ../../.dub/packages/vibe-d-master/source/vibe/http/common.d:422
#5  0x00000000007518e4 in vibe.core.stream.OutputStream.write() (this=0xb49fd0, bytes=...)
    at ../../.dub/packages/vibe-d-master/source/vibe/core/stream.d:77
#6  0x0000000000638184 in vibe.templ.diet.__T15compileDietFileVAyaa9_7365726965732e6474S178_D4vibe3web3web108__T6renderVAyaa9_7365726965732e6474S69_D3app15VideoDownloader9getSeriesMFmkZv8ttSeriesHmC2dv7library6SeriesZ6renderMFZv3reqC4vibe4http6server17HTTPServerRequestS69_D3app15VideoDownloader9getSeriesMFmkZv8ttSeriesHmC2dv7library6SeriesZ.compileDietFile() (
    this=0x7ffff4e03170, __applyArg0=<error reading variable>) at series.dt:12
#7  0x00000000007abb6e in _aaApply ()
#8  0x0000000000638042 in vibe.templ.diet.__T15compileDietFileVAyaa9_7365726965732e6474S178_D4vibe3web3web108__T6renderVAyaa9_7365726965732e6474S69_D3app15VideoDownloader9getSeriesMFmkZv8ttSeriesHmC2dv7library6SeriesZ6renderMFZv3reqC4vibe4http6server17HTTPServerRequestS69_D3app15VideoDownloader9getSeriesMFmkZv8ttSeriesHmC2dv7library6SeriesZ.compileDietFile() (
    this=0x7ffff4e031d8, stream__=0xb49fd0) at series.dt:8
#9  0x0000000000637f3c in vibe.http.server.__T6renderVAyaa9_7365726965732e6474S178_D4vibe3web3web108__T6renderVAyaa9_7365726965732e6474S69_D3app15VideoDownloader9getSeriesMFmkZv8ttSeriesHmC2dv7library6SeriesZ6renderMFZv3reqC4vibe4http6server17HTTPServerRequestS69_D3app15VideoDownloader9getSeriesMFmkZv8ttSeriesHmC2dv7library6SeriesZ.render() (this=0x7ffff4e031d8, res=0xb5ad00)
    at ../../.dub/packages/vibe-d-master/source/vibe/http/server.d:234
#10 0x0000000000637eda in vibe.web.web.__T6renderVAyaa9_7365726965732e6474S69_D3app15VideoDownloader9getSeriesMFmkZv8ttSeriesHmC2dv7library6SeriesZ.render() (this=0x7ffff4ffc0c0)
    at ../../.dub/packages/vibe-d-master/source/vibe/web/web.d:62

The error is reproducible every time.
While trying to debug it, I logged the variables in AllocAppender put(). And although m_remaining reported enough space, accessing it produced the segfault.

@yazd
Copy link
Contributor Author

yazd commented Jan 17, 2014

Further testing shows that m_data is itself inaccessible.

@yazd
Copy link
Contributor Author

yazd commented Jan 18, 2014

Using VibeManualMemoryManagement, the segfault doesn't occur.

@etcimon
Copy link
Contributor

etcimon commented Jan 18, 2014

I had problems with GC mode as well, I think vibe's memory should be manually managed by default.

@yazd
Copy link
Contributor Author

yazd commented Jan 29, 2014

I'll close this issue as apparently it is a bad report without a reproducible case.
If I come by a small reproducible case, I'll reopen this.

@yazd yazd closed this as completed Jan 29, 2014
@yebblies
Copy link

yebblies commented May 1, 2014

I think I just hit this bug - I got a segfault in memcpy and a broken stack trace.

Unfortunately the application is closed-source and the crash only happened after ~3 hours.

The VibeManualMemoryManagement workaround seems to fix it for me too.

@s-ludwig
Copy link
Member

@s-ludwig s-ludwig reopened this May 12, 2014
@luismarques
Copy link

@yazd I don't understand, in the first post you said "The error is reproducible every time"; why was this closed later as a "a bad report without a reproducible case"? In my case, which @s-ludwig mentions, the bug is not reproducible deterministically, so if you have a deterministic example sharing it would help.

@yazd
Copy link
Contributor Author

yazd commented May 12, 2014

Originally, it was reproducible for me, but it was not a minimal testcase. I changed some stuff and it got working and I lost the case. I'll try going back to that project and to reproduce the issue again. I think it's worth a few hours of my time.

@etcimon
Copy link
Contributor

etcimon commented May 12, 2014

@luismarques did you have manual memory management when this bug occured?

@yazd
Copy link
Contributor Author

yazd commented May 12, 2014

Nop, with the manual memory management version everything worked correctly.
Edit: Ops, just noticed the question was for luismarques.

@etcimon
Copy link
Contributor

etcimon commented May 12, 2014

I had tons of problems without manual memory management, that's why I still hold that it should be on by default.

@s-ludwig
Copy link
Member

I had tons of problems without manual memory management, that's why I still hold that it should be on by default.

It's dangerous (i.e. not @safe)! Using any memory of the HTTPServerRequest outside of the request handler will cause memory corruption (e.g. using a query parameter as an AA key is enough for this)! Adding scope to the req/res parameters would be the least that is necessary before this can be considered as the default setting. But this is a little difficult in terms of providing a deprecation path.

In other languages this wouldn't be really noteworthy (especially C and partially C++), but for D it's kind of the norm that you can't do much wrong in terms of memory corruption because of the GC, so this is kind of unexpected.

@etcimon
Copy link
Contributor

etcimon commented May 12, 2014

you can't do much wrong in terms of memory corruption because of the GC, so this is kind of unexpected.

That's funny because I thought the vibe.utils.memory was setup to manually manage most of the stuff even though the USE_GC is set to true..

@luismarques
Copy link

@etcimon: I'm using the default automatic memory management.

@s-ludwig
Copy link
Member

Yes, but the HTTP server uses defaultAllocator(), which is a pure GC based allocator if manual memory management isn't enabled. manualAllocator() is mostly used for internal stuff that is not part of the API.

@etcimon
Copy link
Contributor

etcimon commented May 12, 2014

So.. just a quick guess here to practice my vibe skills, but doesn't these use cases mostly refer to ubytes that were received by a TCP Connection? That would be in the circular buffer, and the access violation means the circular buffer got collected by the GC, yes?

edit: Nvm that wouldn't be possible because TCPConnection.read(dst) copies the data from the buffer

s-ludwig added a commit that referenced this issue May 16, 2014
@s-ludwig
Copy link
Member

I've just had a crash (invalid write access) in that location that was reproducible 100%. After changing the use of GC.realloc to GC.extend, the crash went away. So either I was doing something wrong with GC.realloc, or there is a but in there (or there is some obscure heap corruption going one, but hopefully not...).

@yazd
Copy link
Contributor Author

yazd commented May 16, 2014

I wasn't able to reproduce my case, but fortunately, you were. Hopefully this gets it fixed.

@s-ludwig s-ludwig added bug and removed bug labels May 16, 2014
@s-ludwig
Copy link
Member

Another piece of empirical evidence: I've just discovered that the vibed.org website frequently crashed due to a general protection fault, but stopped doing so ever since I've rebuilt with the "fixed" vibe.d version. So it's not guaranteed that the GPF was caused by this issue here, but at least it seems likely.

@s-ludwig
Copy link
Member

Closing until new evidence comes up. The vibed.org process didn't crash anymore since the fix has been incorporated (runs for 3 days now).

@etcimon
Copy link
Contributor

etcimon commented May 20, 2014

So this means GC.realloc could produce memory corruption in other instances of use, such as associative arrays? I think this may also be why you resort to "magic numbers" wouldn't it?

@etcimon
Copy link
Contributor

etcimon commented May 20, 2014

Ok I found a few huge bugs in the GC.realloc routine

https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gc.d#L674
https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gc.d#L674

Apparently the updateCaches is used in extend and realloc to cache the size of the new allocation, because it speeds up the findSize method. However, it's nowhere to be found in one specific branch of the realloc method (when the previous size is smaller than a page size ie. 4096 bytes)
https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gc.d#L679

So, if you do the following you should get a write access violation:
1- Allocate 2048 bytes
2- Reallocate to anything lower than 1024 bytes
3- Reallocate to 2048 bytes again

The 2048 byte allocation will be ignored because realloc thinks psize is 2048 bytes. Anything that followed this logic will end up with a write access violation. (Assuming no other realloc took place in between...)

...AND if the old and new allocation sizes are larger than a page (4096 bytes) and consecutive blocks of pages aren't found in the pagetable, it also skips the cache update by going the same malloc route as stated above

https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gc.d#L657

This will ALSO prevent pages from being freed, and will lead to the GC memory leaks which I've experienced! (finally found it!)

auto psz = psize / PAGESIZE; ( psize is the previous old allocation size b/c of cache bug )
[..]
pool.freepages -= (newsz - psz);

@s-ludwig
Copy link
Member

The magic numbers are just there as a safety measure to detect when the postblit constructor or the destructor is called on uninitialized memory. This mainly happens in conjunction with AAs, but also in some other cases (ternary operator, I think), but is not really relevant for GC allocations. But the situation is far better now than a year ago, where it still frequently triggered.

@s-ludwig
Copy link
Member

Regarding the GC realloc issue, it sounds like that could also be the normal behavior, where an additional and completely separate block of memory is allocated. Because there might still be references to the old one, it would be kept instead of being freed. But it would still be collected during the next collection run, when there are in fact no references left.

Disclaimer: I didn't look closer at the source code, yet, so this is pure speculation.

@etcimon
Copy link
Contributor

etcimon commented May 20, 2014

Regarding the GC realloc issue, it sounds like that could also be the normal behavior, where an additional and completely separate block of memory is allocated.

For this it would have to keep the old pointer around, but this branch of the code essentially does what extend does but skipping the size cache. I'm pretty sure it's a bug, you could add

gcx.updateCaches(p, size);

here:

https://github.com/D-Programming-Language/druntime/blob/master/src/gc/gc.d#L706

And I'm pretty sure it would fix this issue #470 without resorting to extend, and it would also fix tons of other issues

@s-ludwig
Copy link
Member

But that would update the size of the old block, which doesn't change in size. Only a new block is allocated there, AFAICS.

@etcimon
Copy link
Contributor

etcimon commented May 20, 2014

Yes you're right, the key changes so it doesn't need to be updated, so it's something else..

@etcimon
Copy link
Contributor

etcimon commented May 20, 2014

Though this does seem like the right reason the problem would be caused, just not in this specific execution path. It could be when the mallocNoSync call in this branch returns a page that was recently freed, the next time realloc is called on this pointer the previous size is still cached and will be returned there in findSize.

@s-ludwig
Copy link
Member

Seems like the findSize call here should reset the cache to something valid, so I think that case should be okay, too. But independent of that, the code there seems to be quite fragile - explicit cache updates are usually not a good idea, better would be write a caching wrapper around the non-caching GC implementation.

@etcimon
Copy link
Contributor

etcimon commented May 20, 2014

This execution path seems to return the wrong cache size:

  • p1 = malloc(size1)
  • realloc(p1) // this caches size1 for findSize
  • free(p1) // this doesn't delete the cached size
  • p1= malloc(size2) // ditto
  • realloc(p1) // this realloc uses size1

@s-ludwig
Copy link
Member

That looks like it may indeed be a bug. Unfortunately it doesn't explain the vibe.d issue, because no explicit free calls are used there and the normal collection run resets the cache.

@etcimon
Copy link
Contributor

etcimon commented May 20, 2014

That looks like it may indeed be a bug. Unfortunately it doesn't explain the vibe.d issue, because no explicit free calls are used there and the normal collection run resets the cache.

Free was probably a bad example, the internal use of free is the implementation in freePages which makes a previously cached pointer available.

@s-ludwig
Copy link
Member

Hm, but freePages is only called otherwise when realloc directly returns the in-place shrunk memory (here), so updateCaches is called, or am I missing something?

@etcimon
Copy link
Contributor

etcimon commented May 20, 2014

Here's how it could have affected vibe.d :

  • p1 = malloc(size1)
  • p2 = realloc(p1, size2) // this will call freePages and release p1 without changing the cache to p2
  • p1 = malloc(size3) // p1 is recycled here, notice p1's size is still in cache
  • realloc(p1, size4) // this will use psize = size1 although it's a new memory block

@etcimon
Copy link
Contributor

etcimon commented May 20, 2014

Hm, but freePages is only called otherwise when realloc directly returns the in-place shrunk memory (here), so updateCaches is called, or am I missing something?

Yes that's true. I'm not sure anymore if that's the buggy path I'll keep on researching it more..

@etcimon
Copy link
Contributor

etcimon commented May 20, 2014

@etcimon
Copy link
Contributor

etcimon commented May 21, 2014

After giving it long thoughts, I think an assert(new_sz != 0 && mem !is null) would be necessary in memory.d's realloc()- and of course explicitly deleting the size cache in gc.d's free.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants