Add image for hive3 and hadoop3 #13

Closed
wants to merge 3 commits into from

Conversation

shubhamtagra
Member

Images based on hdp3.

I have used the image created via this change for Hive ACID tests in trinodb/trino#1034

Member

@findepi findepi left a comment

My understanding is that most of the files were created as a copy from hdp2.6 and then applying certain changes. Now, these changes are not visible.

Could you please make 2 separate commits? One that is plain copy and second which changes the file contents as necessary?


## Oracle license

By using this image, you accept the Oracle Binary Code License Agreement for Java SE available here:
Member

Doesn't apply, the image uses openjdk

@@ -0,0 +1,13 @@
# centos7-oj8 [![][layers-badge]][layers-link] [![][version-badge]][dockerhub-link]

[layers-badge]: https://images.microbadger.com/badges/image/prestodev/centos7-oj8.svg
Member

@kokosing i'd like to abandon these READMEs. I don't find them useful.
WDYT?

Member Author

I noticed the links are dead as of now, but I wasn't sure whether there is a plan to get them working, so I extended the READMEs from the existing ones.

Member

@stagraqubole i understand. You did the reasonable thing. Let's wait for @kokosing 's opinion before dropping this.

Member

I agree. Especially, if they have outdated information then they do more harm than good.

Member Author

I will remove these then.

prestodev/centos7-oj8/Dockerfile (outdated, resolved)
prestodev/centos7-oj8/Dockerfile (resolved)
\
# install supervisor
yum --enablerepo=extras install -y setuptools epel-release && \
sed -i 's/https/http/g' /etc/yum.repos.d/epel.repo && \
Member

Should not be needed. Why so?

Member Author

will double check.

Member Author

> Thanks @stagraqubole for working on this!
>
> By any chance, did you also prepare kerberized images?
> They would be required when we want to run all the tests on hdp3.

I didn't prepare a kerberized image; my requirement was limited to running tests using hive3.

@findepi
Member

findepi commented Jun 26, 2019

Thanks @stagraqubole for working on this!

By any chance, did you also prepare kerberized images?
They would be required when we want to run all the tests on hdp3.

@shubhamtagra
Member Author

> My understanding is that most of the files were created as a copy from hdp2.6 and then applying certain changes. Now, these changes are not visible.
>
> Could you please make 2 separate commits? One that is plain copy and second which changes the file contents as necessary?

Makes sense, I will separate it out.

@shubhamtagra shubhamtagra force-pushed the hive3 branch 2 times, most recently from 5c90577 to 2a46baa Compare June 27, 2019 05:02
@shubhamtagra
Member Author

Updates as per the comments @findepi

Member

@kokosing kokosing left a comment

Nice piece of work!

prestodev/hdp3-hive3/Dockerfile (outdated, resolved)
@@ -59,7 +66,7 @@ RUN chmod 755 /root && chmod 700 /root/.ssh
RUN passwd --unlock root

# HDFS ports
-EXPOSE 1004 1006 8020 50010 50020 50070 50075 50470
+EXPOSE 1004 1006 8020 9866 9867 9870 9864 50470
Member

What happened to these ports? Is 50070 no longer exposed? Was it migrated somehow? I used to use it in other versions.

Member Author

The ports have changed. I used https://issues.apache.org/jira/browse/HDFS-9427 and updated the ports that were in use.
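
For readers asking the same question about 50070, the default-port changes from HDFS-9427 can be written down as a small lookup. This is only an illustrative sketch: the function name `hadoop3_port` is hypothetical and not part of the PR, and ports that are not listed are passed through unchanged.

```shell
#!/bin/bash
# Sketch only: hadoop3_port is a hypothetical helper, not part of this PR.
# Maps a Hadoop 2.x default HDFS port to its Hadoop 3.x default (HDFS-9427).
# Ports that are not listed (for example 8020, which the diff keeps) are printed unchanged.
hadoop3_port() {
    case "$1" in
        50070) echo 9870 ;;  # NameNode HTTP UI
        50470) echo 9871 ;;  # NameNode HTTPS UI
        50090) echo 9868 ;;  # Secondary NameNode HTTP
        50091) echo 9869 ;;  # Secondary NameNode HTTPS
        50010) echo 9866 ;;  # DataNode data transfer
        50020) echo 9867 ;;  # DataNode IPC
        50075) echo 9864 ;;  # DataNode HTTP UI
        50475) echo 9865 ;;  # DataNode HTTPS
        *)     echo "$1" ;;
    esac
}
```

For example, `hadoop3_port 50070` prints `9870`, the NameNode web UI port that replaces 50070. Under the same JIRA the NameNode HTTPS default moved from 50470 to 9871, so the 50470 entry kept in the EXPOSE line may deserve a second look.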

prestodev/hdp3-hive3/files/etc/hadoop/conf/yarn-site.xml (outdated, resolved)
prestodev/hdp3-hive3/files/etc/hadoop/conf/yarn-site.xml (outdated, resolved)
prestodev/hdp3-hive3/files/etc/tez/conf/tez-site.xml (outdated, resolved)
prestodev/hdp3-hive3/files/etc/tez/conf/tez-site.xml (outdated, resolved)
prestodev/hdp3-hive3/files/etc/tez/conf/tez-site.xml (outdated, resolved)
prestodev/hdp3-hive3/files/etc/tez/conf/tez-site.xml (outdated, resolved)
@shubhamtagra shubhamtagra force-pushed the hive3 branch 2 times, most recently from 0dec365 to 8584ce4 Compare June 27, 2019 09:25
@shubhamtagra
Member Author

Handled your comments @kokosing

@kokosing
Member

Ah... I forgot about tests. Please see https://github.com/prestosql/docker-images/blob/master/bin/test.sh. It would be great if you could add a test for this image. Where you could do some simple queries against hive (CREATE TABLE, INSERT, SELECT, TRUNCATE, DROP).
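
A smoke test of the kind requested here could look roughly like the sketch below. It is not the content of bin/test.sh: the function name `run_smoke_test`, the table name, and the idea of passing the runner command as arguments are all assumptions made for illustration.

```shell
#!/bin/bash
# Sketch only: names below are hypothetical and not taken from bin/test.sh.

# Run the basic Hive statements one by one through a caller-supplied command.
# Stops and returns non-zero as soon as one statement fails.
run_smoke_test() {
    local table="smoke_test"   # hypothetical table name
    local stmt
    for stmt in \
        "CREATE TABLE ${table} (id INT, name STRING);" \
        "INSERT INTO ${table} VALUES (1, 'a'), (2, 'b');" \
        "SELECT COUNT(*) FROM ${table};" \
        "TRUNCATE TABLE ${table};" \
        "DROP TABLE ${table};"
    do
        # </dev/null keeps the runner from consuming this script's stdin.
        "$@" "$stmt" </dev/null || return 1
    done
}
```

Assuming a running container named `hadoop-master`, it might be invoked as `run_smoke_test docker exec hadoop-master hive -e`; passing `echo` instead prints the statements without running them.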

@shubhamtagra
Member Author

shubhamtagra commented Jun 27, 2019

> Ah... I forgot about tests. Please see https://github.com/prestosql/docker-images/blob/master/bin/test.sh. It would be great if you could add a test for this image. Where you could do some simple queries against hive (CREATE TABLE, INSERT, SELECT, TRUNCATE, DROP).

@kokosing I see that the test.sh file already has the test queries and can run for any profile that has a compose file defined. Do you want me to make changes to ensure that it runs on hdp3-hive3 (such as adding a compose file for hdp3-hive3), or something more?

@shubhamtagra
Member Author

  • Added docker-compose.yml for hdp3-hive3 for tests
  • Added beeline-site.xml to let hive -e work without specifying connection url

Ran bin/test.sh hdp3-hive3 successfully. Let me know if something more is needed @kokosing

@shubhamtagra
Member Author

I also noticed that in test.sh, set +e is set around the function that runs the hive queries. Wouldn't we want the test run to fail if hive queries fail for some reason?

@shubhamtagra
Member Author

> I also noticed that in test.sh, set +e is set around the function that runs the hive queries. Wouldn't we want the test run to fail if hive queries fail for some reason?

The expectation is to return the error code and do cleanup, hence the set +e. But we are currently returning the last hive query's error code, which is not correct. I have added a commit to fix that as well.
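
The pattern described here, remembering a failure from any query instead of only the last one while still running cleanup, can be sketched as follows. This is not the actual commit: `run_all`, `cleanup`, and `run_with_cleanup` are hypothetical names, and the sketch captures statuses with `|| rc=$?` rather than toggling `set +e`.

```shell
#!/bin/bash
# Sketch only: function names are hypothetical, not taken from bin/test.sh.

# Run each command string and remember the first non-zero status,
# so a later successful command cannot hide an earlier failure.
run_all() {
    local status=0 rc cmd
    for cmd in "$@"; do
        rc=0
        bash -c "$cmd" || rc=$?
        if [ "$rc" -ne 0 ] && [ "$status" -eq 0 ]; then
            status=$rc
        fi
    done
    return "$status"
}

cleanup() {
    echo "cleaning up"   # placeholder for e.g. stopping the containers
}

# Always run cleanup, then report the status of the queries.
run_with_cleanup() {
    local rc=0
    run_all "$@" || rc=$?
    cleanup
    return "$rc"
}
```

Capturing the status with `|| rc=$?` keeps the function usable from a script that runs under `set -e`, which is one alternative to wrapping the call in `set +e` and `set -e`.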

Member

@findepi findepi left a comment

Partial review

(until prestodev/hdp3-hive3/files/etc/supervisord.d/mysql-metastore.conf)

prestodev/hdp3-hive3/Dockerfile (outdated, resolved)
prestodev/centos7-oj8/Dockerfile (outdated, resolved)
prestodev/centos7-oj8/Dockerfile (outdated, resolved)
prestodev/hdp3-hive3/Dockerfile (outdated, resolved)
prestodev/hdp3-hive3/files/etc/hadoop/conf/yarn-site.xml (outdated, resolved)
prestodev/hdp3-hive3/files/etc/hive/conf/hive-site.xml (outdated, resolved)
prestodev/hdp3-hive3/files/etc/hive/conf/hive-site.xml (outdated, resolved)
prestodev/hdp3-hive3/files/etc/hive/conf/hive-site.xml (outdated, resolved)

<property>
<name>hive.support.concurrency</name>
<value>true</value>
Member

What is this for?
(I think we'll never run any concurrent workload in hive directly.)

Member Author

This is a requirement for using DbTxnManager and DbTxnManager is a requirement for ACID.

@@ -1,5 +1,5 @@
[program:mysql-metastore]
-command=/usr/bin/pidproxy /var/run/mysqld/mysqld.pid /usr/bin/mysqld_safe
+command=/bin/bash -c "(chown -R mysql:mysql /var/lib/mysql) && (/usr/bin/pidproxy /var/run/mysqld/mysqld.pid /usr/bin/mysqld_safe)"
Member

I guess it is not sufficient to chown once in the Dockerfile. Did you try? i wouldn't be surprised -- i think i encountered such a problem recently somewhere else.

no need for ( ... ), please remove them

Member Author

I wasn't able to get it running with chown in the Dockerfile. I tried again by placing another instance of it after the full setup, but there are still failures when the container starts.
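
The start-up ownership fix could also live in a small wrapper script, which avoids both the inline `bash -c` string and the parentheses. This is a minimal sketch under stated assumptions: the function name `start_mysql` and its dry-run argument are hypothetical, and only the two commands themselves come from the diff above.

```shell
#!/bin/bash
# Sketch only: start_mysql and its optional argument are hypothetical.

start_mysql() {
    # Pass "echo" as the first argument for a dry run that only prints the commands.
    local run="${1:-}"
    # Fix ownership every time the container starts.
    $run chown -R mysql:mysql /var/lib/mysql || return 1
    # Start MySQL under pidproxy, as in the original command.
    $run /usr/bin/pidproxy /var/run/mysqld/mysqld.pid /usr/bin/mysqld_safe
}
```

A wrapper script would end with a plain `start_mysql` call, and the supervisord `command=` would then point at that script. `start_mysql echo` prints the two commands without running them.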

@shubhamtagra
Member Author

@findepi handled your latest comments. Only the part about mysql vs mariadb is left.

@shubhamtagra
Member Author

@findepi @kokosing ping!! I have a PR in presto blocked on this, can we move forward with this please.

@kokosing
Member

kokosing commented Jul 8, 2019

> @findepi @kokosing ping!! I have a PR in presto blocked on this, can we move forward with this please.

Can you please post a docker image under a different repository name to Docker Hub for now? I would like to merge this PR and the one in Presto together. Notice that merging this PR is not enough, we still need to do a release.

@shubhamtagra
Member Author

> > @findepi @kokosing ping!! I have a PR in presto blocked on this, can we move forward with this please.
>
> Can you please post a docker image under a different repository name to Docker Hub for now? I would like to merge this PR and the one in Presto together. Notice that merging this PR is not enough, we still need to do a release.

I have done that; the PR is open in presto (with a slightly outdated docker image): trinodb/trino#1034

I will update it with the latest image.

@kokosing
Member

kokosing commented Jul 8, 2019

> I will update it with the latest image.

Super. Thanks!

@shubhamtagra
Member Author

shubhamtagra commented Jul 9, 2019

> > I will update it with the latest image.
>
> Super. Thanks!

This is done. Image is at shubhamtagra/hdp3.1-hive:1

Member

@findepi findepi left a comment

Looks good overall, a bunch of comments.
I'm testing this locally now.

etc/compose/hdp3.1-hive/docker-compose.yml (resolved)
etc/compose/hdp3-hive3/docker-compose.yml (outdated, resolved)
etc/compose/hdp3-hive3/docker-compose.yml (outdated, resolved)
prestodev/hdp3.1-hive/Dockerfile (resolved)
prestodev/hdp3.1-hive/files/root/setup.sh (resolved)
prestodev/hdp3.1-hive/files/root/setup.sh (resolved)
prestodev/hdp3.1-hive/files/root/setup.sh (resolved)
prestodev/hdp3.1-hive/files/root/setup.sh (resolved)
prestodev/hdp3.1-hive/files/root/setup.sh (resolved)
@findepi
Member

findepi commented Jul 24, 2019

I manually tested the metastore part of the image locally:

docker run --rm -it --name hadoop-master --hostname hadoop-master -p 9083:9083 -p 1180:1180 prestodev/hdp3.1-hive:latest
echo "127.0.0.1 hadoop-master" > /etc/hosts
connector.name=hive-hadoop2
hive.metastore.uri=thrift://hadoop-master:9083
hive.metastore.thrift.client.socks-proxy=hadoop-master:1180

It worked like a charm.

Good job!

@findepi
Member

findepi commented Jul 25, 2019

i will take over this PR & merge it, if you don't mind.

@findepi findepi mentioned this pull request Jul 25, 2019
@findepi
Member

findepi commented Jul 25, 2019

Thanks @stagraqubole, this is merged as 51c48e9

@findepi findepi closed this Jul 25, 2019
3 participants