Add Lookup Op and unit tests. #3
Conversation
Not sure what's going on here. The OSX build succeeds and local builds work, but the Linux CI build breaks down with a segfault in the graph-mode test.
Apparently there's a mismatch between the compiler used to build TensorFlow and the one used to build the op.
std::vector<float> embedding = lookup->embedding(query(i));
// optionally mask failed lookups and/or empty string. Generally, empty string will lead to a failed lookup.
if ((query(i).empty() && mask_empty_string_) || (mask_failed_lookup_ && embedding.empty())) {
  std::memset(&output_flat(i * dims), 0., dims * 4);
I guess 4 is supposed to be sizeof(float)? If so, put that here ;).
Also, is the data guaranteed to be row-major? (I guess so)
> I guess 4 is supposed to be sizeof(float)? If so, put that here ;).
Fixed
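For the record, the fixed line presumably looks something like this (a sketch only; output_flat, i, and dims as in the snippet above):

std::memset(&output_flat(i * dims), 0, dims * sizeof(float));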
> Also, is the data guaranteed to be row-major? (I guess so)
The Eigen documentation is not too nice; I spent some time looking for suggestions on how to move data from vectors into tensors, and the recommended way was using copy_n or memcpy. I found an SO thread from 3 years ago that claims TF data is always row-major.
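As an illustration, a minimal sketch of that approach, assuming output is a pointer to a float tensor of shape [n, dims] and embedding holds exactly dims values (names borrowed from the snippet above):

#include <algorithm>  // std::copy_n

// TF tensors are row-major, so row i starts at flat offset i * dims.
auto output_flat = output->flat<float>();
std::copy_n(embedding.data(), embedding.size(),
            output_flat.data() + i * dims);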
Whoops, pressed a button ;).
If I comment the shape function out, the graph-mode test passes.
Following this description, it's possible to get shapes through SetShapeFn:

.SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
  ShapeHandle strings_shape = c->input(1);
  ShapeHandle output_shape;
  int embedding_len;
  TF_RETURN_IF_ERROR(c->GetAttr("embedding_len", &embedding_len));
  TF_RETURN_IF_ERROR(
      c->Concatenate(strings_shape, c->Vector(embedding_len), &output_shape));
  ShapeHandle embeds = c->output(0);
  TF_RETURN_IF_ERROR(c->WithRank(c->input(0), 0, &embeds));
  c->set_output(0, output_shape);
  return Status::OK();
});
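For context, a shape function like this gets chained onto the op registration, roughly as in the following sketch (op, attr, and input/output names here are assumptions, not the actual registration; the shape logic is condensed from the snippet above):

#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/shape_inference.h"
#include "tensorflow/core/lib/core/errors.h"

using ::tensorflow::shape_inference::InferenceContext;
using ::tensorflow::shape_inference::ShapeHandle;

REGISTER_OP("Lookup")
    .Attr("embedding_len: int")
    .Input("handle: resource")    // input 0: the embedding resource (scalar)
    .Input("strings: string")     // input 1: the query strings
    .Output("embeddings: float")  // shape: strings_shape + [embedding_len]
    .SetShapeFn([](InferenceContext* c) {
      ShapeHandle output_shape;
      int embedding_len;
      TF_RETURN_IF_ERROR(c->GetAttr("embedding_len", &embedding_len));
      TF_RETURN_IF_ERROR(c->Concatenate(c->input(1), c->Vector(embedding_len),
                                        &output_shape));
      c->set_output(0, output_shape);
      return ::tensorflow::Status::OK();
    });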
According to tensorflow/tensorflow#29951 (comment), downgrading the compiler to g++ 4.8 supposedly fixes it.
A maintainer in the thread suggested following the instructions at https://github.com/tensorflow/custom-op, which means using a docker image with the correct environment. That docker image comes with gcc-4.8, which doesn't support some of the newer language features we use.

It might make sense to restructure the project and follow the custom-op repo's layout.

It would be great if you could also take a quick look at this and give your opinion!

edit: Seems to work both in the custom-op docker container and on Ubuntu with gcc 4.8.4.
That is old. But they probably want to build with an old glibc/gcc for compatibility with older distributions; they'd probably get breakage on older systems otherwise.

The ABI between compiler versions is not guaranteed to be stable. So, if TensorFlow is compiled with a different version than an op, there may be subtle ABI incompatibilities. You generally don't notice: if you are lucky, you get segmentation faults; if you are unlucky, there is silent data corruption. This is why Rust prefers static linking of units compiled with the same compiler version. At any rate, when you compile with a newer g++, you could try to force an older ABI with the options described at https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Dialect-Options.html; you want -fabi-version.
There already is a -D_GLIBCXX_USE_CXX11_ABI=0 define among the flags TensorFlow exports. I tried setting various -fabi-version values, but it didn't help.

The custom op repo also states that newer (manylinux2010-compatible) wheels are built with Ubuntu 16.04 docker images, which come with g++-5.4.

Fwiw, not setting the ABI in cmake should allow people to compile against self-compiled tf packages with a compiler of their choice.
This AWS project also pins gcc to 4.8 to build their custom ops: https://github.com/aws/sagemaker-tensorflow-extensions
That's another knob. This sets _GLIBCXX_USE_CXX11_ABI, which selects newer C++11-compatible declarations of some classes. This changes the C++ standard library API and thus the ABI. So, the library can change the ABI and the compiler can change the ABI (e.g. by changing alignment preferences).
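A small runnable illustration of what that knob controls (compile the same file with -D_GLIBCXX_USE_CXX11_ABI=0 and =1 to see the two modes):

#include <iostream>
#include <string>

int main() {
  // _GLIBCXX_USE_CXX11_ABI selects between libstdc++'s old copy-on-write
  // std::string/std::list and the C++11-conforming implementations; the two
  // have different object layouts and mangled names, so mixing them across
  // a library boundary breaks the ABI.
#if _GLIBCXX_USE_CXX11_ABI
  std::cout << "built with the new C++11 ABI\n";
#else
  std::cout << "built with the old pre-C++11 ABI\n";
#endif
}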
I think it is standard for Google to compile C++ without exceptions, so besides setting the ABI you might also want to add -fno-exceptions.
It could just be a CMake option.
More info on the upstream library:
So, they are actually using gcc 5.4.0. (Which confirms the statement.) Did you try to compile on CI with 5.4 and no additional flags?
That's different for me:
5.4 is what is supposedly used for packages built after August 1st, 2019. I haven't tried 5.4.0 yet.
Maybe they used different build environments for different Python versions? You are on 3.6, while I am on 3.7.
We're throwing an exception in the constructor if constructing the embeddings in Rust fails. So with that flag we can't build unless we introduce some other error-handling mechanism (see the sketch after the Makefile excerpt below).

From what I can tell, there is also no such flag set in the custom op example repo: https://github.com/tensorflow/custom-op/blob/master/Makefile

TF_CFLAGS := $(shell $(PYTHON_BIN_PATH) -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_compile_flags()))')
TF_LFLAGS := $(shell $(PYTHON_BIN_PATH) -c 'import tensorflow as tf; print(" ".join(tf.sysconfig.get_link_flags()))')

CFLAGS = ${TF_CFLAGS} -fPIC -O2 -std=c++11
LDFLAGS = -shared ${TF_LFLAGS}

ZERO_OUT_TARGET_LIB = tensorflow_zero_out/python/ops/_zero_out_ops.so

# zero_out op for CPU
zero_out_op: $(ZERO_OUT_TARGET_LIB)

$(ZERO_OUT_TARGET_LIB): $(ZERO_OUT_SRCS)
	$(CXX) $(CFLAGS) -o $@ $^ ${LDFLAGS}
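For reference, a minimal sketch of what exception-free error reporting in the kernel could look like, using TensorFlow's Status plus OP_REQUIRES_OK instead of throwing in the constructor (the LookupOp internals and LoadEmbeddings here are hypothetical stand-ins, not the actual code):

#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/lib/core/errors.h"

using tensorflow::OpKernel;
using tensorflow::OpKernelConstruction;
using tensorflow::OpKernelContext;
using tensorflow::Status;

class LookupOp : public OpKernel {
 public:
  explicit LookupOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
    // Report failure through Status; OP_REQUIRES_OK marks construction as
    // failed without throwing, so building with -fno-exceptions would work.
    OP_REQUIRES_OK(ctx, LoadEmbeddings());
  }

  void Compute(OpKernelContext* ctx) override {
    // ... lookup logic ...
  }

 private:
  // Hypothetical: wraps the Rust loader, returning Status instead of throwing.
  Status LoadEmbeddings() {
    bool ok = true;  // placeholder for the actual FFI call
    if (!ok) return tensorflow::errors::Internal("could not construct embeddings");
    return Status::OK();
  }
};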
I verified that; 3.7.3 is built on 16.04 with g++-5.4.
So the strictly correct way would probably be to build for Python 3.6 using gcc 4.8 and for Python 3.7 using gcc 5.4. (Since there are two compiler ABI changes in between; not sure whether they matter in this case.)
I found some more info on this at tensorflow/tensorflow#27067 and tensorflow/community#77. For now, matching the compiler version and flags seems like the only way to guarantee working builds.

We can get the compiler version from the python package, too:

import tensorflow as tf
tf.version.COMPILER_VERSION

tf.version.COMPILER_VERSION returns the compiler the wheel was built with, so I guess we can use that to select the correct compiler, too. With tensorflow/community#77 it would be possible to build packages independent of what was used to compile pip tensorflow.
I put together a setup.py by following what Uber is doing with horovod (https://github.com/horovod/horovod/blob/master/setup.py); they enforce a minimum compiler version in their build script. That'll be part of a future PR, so I think the only question for this PR is how to set up the CI and whether what I did is correct for our purposes.
Sounds good, especially with the possibility to override the compiler.
I think for this PR, then, it would be good enough to use a fixed compiler version (corresponding to what the Python module needs).
This PR now has two Linux builds, one for Python 3.6 and one for 3.7. For the 3.6 build I export g++-4.8 as CXX. Does that sound good to you?
Excellent!
.travis.yml
addons:
  apt:
    sources:
      - ubuntu-toolchain-r-test
This is to get an old toolchain?
Copy-paste from the travis-ci docs. Seems like it doesn't actually work ;)
Disallowing sources: ubuntu-toolchain-r-test
To add unlisted APT sources, follow instructions in https://docs.travis-ci.com/user/installing-dependencies#Installing-Packages-with-the-APT-Addon
I'll push a version without this; it shouldn't break the build.
This PR adds functionality to do embedding lookups in TensorFlow.