Adding APIs for in-memory operation without fs access #5736

Open
brson opened this issue May 19, 2023 · 8 comments

Comments

@brson
Contributor

brson commented May 19, 2023

I've received a request to make it possible for wasm-opt-rs to perform optimization entirely in memory without accessing the filesystem. This seems like a reasonable feature, and relatively easy to add. Is there any appetite for this in tree?

@kripken
Member

kripken commented May 19, 2023

The C API does support running the optimizer,

https://github.com/WebAssembly/binaryen/blob/e42a58696059fd1cadcf25e10223b979214984b3/src/binaryen-c.h#LL2974C19-L2974C41

And also arbitrary passes can be run,

binaryen/src/binaryen-c.h

Lines 3075 to 3077 in e42a586

BINARYEN_API void BinaryenModuleRunPasses(BinaryenModuleRef module,
                                          const char** passes,
                                          BinaryenIndex numPasses);

The only other thing wasm-opt does is provide a commandline interface, which it translates into calls to the same C++ APIs that the C API calls, basically. Is that not good enough? It might not be, if we're missing something. For example, I'm not sure all the commandline flags have C APIs: recent ones like --closed-world may not, and I see source maps mentioned in the issue you linked, which I'm not sure of either.

Perhaps it would be nice to have a C/C++ API that gets commandline flags and handles them, and wasm-opt would use that - that would keep things in sync. Is that what you're thinking of?

@kripken
Member

kripken commented May 19, 2023

(Actually, are you using the C API or C++ API?)

@brson
Contributor Author

brson commented May 25, 2023

@kripken I am using the C++ API.

My original issue I think was not worded correctly. It's not that I want to run the optimizations in memory, it's that I want to read and write the modules without touching the file system, so that I can run the optimizations.

@brson
Contributor Author

brson commented May 25, 2023

This issue may be moot for me for now since the requester of this feature also wants to run on the wasm32-unknown-unknown target and I suspect binaryen cannot be compiled to that target, but instead needs to compile to wasm32-unknown-emscripten.

@kripken
Member

kripken commented May 25, 2023

Hmm, what's missing in the C++ API then? You can convert bytes into a Module and then optimize the Module, and convert it back into bytes. Sorry, not sure I understand yet.

I don't know if anyone's tried to compile binaryen with wasm32-unknown-unknown, but it might just work, or it might need an ifdef or two I guess to avoid things like threads for now.

@brson
Contributor Author

brson commented May 26, 2023

> Hmm, what's missing in the C++ API then? You can convert bytes into a Module and then optimize the Module, and convert it back into bytes. Sorry, not sure I understand yet.

It's definitely possible there are APIs I'm not finding. So far I have been using ModuleReader and ModuleWriter, and those deal in files. I don't see how to do what they are doing with in-memory input and output in an obvious way without copying the logic in these two types.

So to handle loading the modules I would need to do something like readTextData and readBinaryData:

static void readTextData(std::string& input, Module& wasm, IRProfile profile) {
  if (useNewWATParser) {
    std::string_view in(input.c_str());
    if (auto parsed = WATParser::parseModule(wasm, in);
        auto err = parsed.getErr()) {
      Fatal() << err->msg;
    }
  } else {
    SExpressionParser parser(const_cast<char*>(input.c_str()));
    Element& root = *parser.root;
    SExpressionWasmBuilder builder(wasm, *root[0], profile);
  }
}

void ModuleReader::readBinaryData(std::vector<char>& input,
                                  Module& wasm,
                                  std::string sourceMapFilename) {
  std::unique_ptr<std::ifstream> sourceMapStream;
  // Assume that the wasm has had its initial features applied, and use those
  // while parsing.
  WasmBinaryBuilder parser(wasm, wasm.features, input);
  parser.setDebugInfo(debugInfo);
  parser.setDWARF(DWARF);
  parser.setSkipFunctionBodies(skipFunctionBodies);
  if (sourceMapFilename.size()) {
    sourceMapStream = make_unique<std::ifstream>();
    sourceMapStream->open(sourceMapFilename);
    parser.setDebugLocations(sourceMapStream.get());
  }
  parser.read();
  if (sourceMapStream) {
    sourceMapStream->close();
  }
}

where readBinaryData would instead need to use an in-memory source map.

And to serialize the modules again, I would need to do the same as in writeText and writeBinary:

void ModuleWriter::writeText(Module& wasm, Output& output) {
  output.getStream() << wasm;
}

void ModuleWriter::writeBinary(Module& wasm, Output& output) {
  BufferWithRandomAccess buffer;
  WasmBinaryWriter writer(&wasm, buffer);
  // if debug info is used, then we want to emit the names section
  writer.setNamesSection(debugInfo);
  if (emitModuleName) {
    writer.setEmitModuleName(true);
  }
  std::unique_ptr<std::ofstream> sourceMapStream;
  if (sourceMapFilename.size()) {
    sourceMapStream = make_unique<std::ofstream>();
    sourceMapStream->open(sourceMapFilename);
    writer.setSourceMap(sourceMapStream.get(), sourceMapUrl);
  }
  if (symbolMap.size() > 0) {
    writer.setSymbolMap(symbolMap);
  }
  writer.write();
  buffer.writeTo(output);
  if (sourceMapStream) {
    sourceMapStream->close();
  }
}

The text cases look obvious, but for the binary cases there is some important logic here that needs to be repeated if I want to do what ModuleReader and ModuleWriter are doing, particularly wrt debug info and source maps.
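One possible way to avoid files here, sketched with the standard library only: in the code quoted above the source map is handed over as a stream pointer (`sourceMapStream.get()`). If those setters are declared against the base classes `std::istream*` and `std::ostream*` (an assumption to check against the binaryen headers, not something confirmed in this thread), then `std::istringstream` and `std::ostringstream` could stand in for the file streams. The functions `countSourceMapLines`, `emitSourceMap`, and `roundTripInMemory` below are hypothetical stand-ins, not binaryen APIs.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a parser that consumes a source map from a stream.
// It only counts the lines it reads; a real parser would decode the mappings.
inline int countSourceMapLines(std::istream* sourceMap) {
  int lines = 0;
  std::string line;
  while (std::getline(*sourceMap, line)) {
    ++lines;
  }
  return lines;
}

// Hypothetical stand-in for a writer that emits a source map to a stream.
inline void emitSourceMap(std::ostream* sourceMap, const std::string& url) {
  *sourceMap << "{\"version\":3,\"file\":\"" << url << "\"}";
}

// Writes a source map into memory, then reads it back from memory.
// No file is opened at any point; linesRead receives the reader's line count.
inline std::string roundTripInMemory(const std::string& url, int& linesRead) {
  std::ostringstream out;            // replaces std::ofstream
  emitSourceMap(&out, url);
  std::istringstream in(out.str());  // replaces std::ifstream
  linesRead = countSourceMapLines(&in);
  return out.str();
}
```

The point of the sketch is only that code written against the base stream types does not care whether the bytes come from disk or from a string, so the file-specific part is limited to whoever constructs the stream.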

@brson
Contributor Author

brson commented May 26, 2023

It's not a lot of code obviously, so if that's the way to do it, I can definitely do it, but I'm happy to have any tips.

@kripken
Member

kripken commented May 26, 2023

Ah, yes, that looks right. So readBinary/writeBinary is almost what you want, but it assumes source maps are actual files. I think that would make sense to generalize, and doing it in-tree makes sense to me. That is, the low-level functions should work entirely on bytes in memory, and higher-level ones would handle loading source map data from disk etc. as needed.
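The layering described above could look roughly like the following sketch. It does not use binaryen at all: `ParsedModule`, `readFromMemory`, and `readFromFiles` are hypothetical names invented for illustration, and the "parsing" is a placeholder that just records what it was given. What it tries to show is the split: the low-level function sees only bytes and an optional stream, and the high-level one is the only place that touches the filesystem.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <istream>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed module; records only what was supplied.
struct ParsedModule {
  std::size_t binarySize = 0;
  bool hasSourceMap = false;
  std::string sourceMapText;
};

// Low-level layer: works entirely on in-memory data.
// sourceMap may be nullptr when there is no source map.
inline ParsedModule readFromMemory(const std::vector<char>& binary,
                                   std::istream* sourceMap) {
  ParsedModule result;
  result.binarySize = binary.size();
  if (sourceMap) {
    result.hasSourceMap = true;
    result.sourceMapText.assign(std::istreambuf_iterator<char>(*sourceMap),
                                std::istreambuf_iterator<char>());
  }
  return result;
}

// High-level layer: loads data from disk, then delegates to the low-level one.
// An empty sourceMapPath means "no source map".
inline ParsedModule readFromFiles(const std::string& binaryPath,
                                  const std::string& sourceMapPath) {
  std::ifstream binaryFile(binaryPath, std::ios::binary);
  if (!binaryFile) {
    throw std::runtime_error("cannot open " + binaryPath);
  }
  std::vector<char> binary((std::istreambuf_iterator<char>(binaryFile)),
                           std::istreambuf_iterator<char>());
  if (sourceMapPath.empty()) {
    return readFromMemory(binary, nullptr);
  }
  std::ifstream sourceMapFile(sourceMapPath);
  if (!sourceMapFile) {
    throw std::runtime_error("cannot open " + sourceMapPath);
  }
  return readFromMemory(binary, &sourceMapFile);
}
```

A caller that must not touch the filesystem would use only `readFromMemory`, passing a `std::istringstream` for the source map or `nullptr` if there is none; a command-line tool would keep calling the file-based wrapper.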
