Replace homegrown Dockerfile parser with dockerfile-parse #522

Closed
nishakm opened this issue Jan 15, 2020 · 0 comments · Fixed by #611
nishakm commented Jan 15, 2020

Description
dockerfile-parse is a library for parsing Dockerfiles. It has a few useful features that the homegrown parser lacks, including listing ENV instructions (and ARG instructions, once a PR adding that has been submitted to the project).

To Do
Rewrite dockerfile.py to use the parser.

Background
This is related to #508

Super Issues
#454

nishakm added this to the Release 1.1.0 milestone Jan 15, 2020
nishakm self-assigned this Jan 16, 2020
nishakm pushed a commit to nishakm/tern that referenced this issue Jan 22, 2020
This is work towards tern-tools#522

We add initial functionality for parsing Dockerfiles using
dockerfile_parse. We also add some tests for the functions therein.
A key feature of using dockerfile_parse is that we can now do
variable expansion, i.e. for ENV instructions, replace the keys
with their values in the Dockerfile content. This allows for
more accurate analysis of packages that may be installed by scripts
that don't use a system package manager.

In order to test the functions, we also added some example
Dockerfiles to test against. They vary in complexity.

We added the new test to the CI test suite and the dockerfile-parse
module to requirements.txt.

Signed-off-by: Nisha K <[email protected]>
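
The variable expansion described in this commit message can be illustrated without the library. The following is a minimal, standard-library-only sketch, not Tern's code: `collect_envs` and `expand` are hypothetical names, and only the `ENV KEY=value` and `ENV KEY value` forms and `$KEY` / `${KEY}` references are handled.

```python
import re


def collect_envs(dockerfile_text):
    """Collect ENV key/value pairs (hypothetical helper, simplified)."""
    envs = {}
    for line in dockerfile_text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or parts[0].upper() != "ENV":
            continue
        rest = parts[1]
        if "=" in rest.split()[0]:
            # Form: ENV KEY=value [KEY2=value2 ...]
            for pair in rest.split():
                if "=" in pair:
                    key, _, value = pair.partition("=")
                    envs[key] = value.strip('"')
        else:
            # Form: ENV KEY value
            key, _, value = rest.partition(" ")
            envs[key] = value.strip()
    return envs


# Matches ${KEY} (group 1) or $KEY (group 2).
_VAR = re.compile(r"\$(?:\{(\w+)\}|(\w+))")


def expand(text, envs):
    """Replace $KEY and ${KEY} with values from envs; unknown keys stay as-is."""
    def repl(match):
        key = match.group(1) or match.group(2)
        return envs.get(key, match.group(0))
    return _VAR.sub(repl, text)


dockerfile = """FROM debian:buster
ENV VERSION=1.2.3
ENV PREFIX /opt/tool
RUN curl -LO https://example.com/tool-${VERSION}.tar.gz && tar -C $PREFIX -xzf tool-${VERSION}.tar.gz
"""
envs = collect_envs(dockerfile)
print(expand(dockerfile.splitlines()[3], envs))
```

After expansion, the RUN line names the concrete archive and install prefix, which is the kind of information an analysis of script-installed packages would need.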
nishakm pushed a commit to nishakm/tern that referenced this issue Jan 22, 2020
This is work towards tern-tools#522

We add initial functionality for parsing Dockerfiles using
dockerfile_parse. We also add some tests for the functions therein.
A key feature of using dockerfile_parse is that we can now do
variable expansion, i.e. for ENV instructions, replace the keys
with their values in the Dockerfile content. This allows for
more accurate analysis of packages that may be installed by scripts
that don't use a system package manager.

In order to test the functions, we also added some example
Dockerfiles to test against. They vary in complexity.

We added the new test to the CI test suite and the dockerfile-parse
module to requirements.txt. We also made sure prospector doesn't
complain about functions written for unittest being in CamelCase.

Signed-off-by: Nisha K <[email protected]>
nishakm pushed a commit to nishakm/tern that referenced this issue Feb 12, 2020
This is work towards tern-tools#522

We add initial functionality for parsing Dockerfiles using
dockerfile_parse. We also add some tests for the functions therein
and some extra functions to parse the pieces of the Dockerfile
we will need.

Most of the work is in tern/analyze/docker/dockerfile.py.
We add a class called Dockerfile which will contain the information
parsed using the function get_dockerfile_obj. The typical workflow
is to create a Dockerfile object from an existing Dockerfile using
get_dockerfile_obj. Then we can use the other functions to
return the information we want:

- replace_env does a key-value replacement on any piece of the
  Dockerfile object's structure property, using any key-value dict.
  The typical use is to replace ENV keys with their values in a
  Dockerfile line.
- expand_vars does the replacement wholesale for the Dockerfile
  content.
- parse_from_image returns a dictionary containing the tokens in the
  image string for each FROM line in the Dockerfile. For this to
  work, we also add a function called parse_image_string to
  tern/utils/general.py which parses the image string. This lets us
  reuse the same parsing for image names passed on the command line
  with the -i flag.
- Tests for these functions are added in
  test_analyze_docker_dockerfile.py, along with some Dockerfiles
  to test against.

Other changes include:
- Add dockerfile-parse to the list of requirements.
- Add tests for dockerfile and general to the CI tests.

Signed-off-by: Nisha K <[email protected]>
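
The workflow in this commit message (build a Dockerfile object, then replace keys with values in its structure) might look roughly like the sketch below. It is not Tern's implementation: the `Dockerfile` class is a simplified stand-in, the function bodies are guesses at the behaviour described, and the assumption is that each structure entry is a dict with `instruction`, `content` and `value` keys.

```python
from dataclasses import dataclass, field


@dataclass
class Dockerfile:
    """Simplified stand-in for the Dockerfile object described above."""
    filepath: str = ""
    structure: list = field(default_factory=list)
    envs: dict = field(default_factory=dict)


def replace_env(key_value_dict, structure_entry):
    """Replace $KEY and ${KEY} in one structure entry, in place.

    Simplistic: longer keys are tried first, but a key that is a prefix of
    an undefined variable name would still be replaced.
    """
    for key in sorted(key_value_dict, key=len, reverse=True):
        val = key_value_dict[key]
        for name in ("content", "value"):
            text = structure_entry[name]
            text = text.replace("${" + key + "}", val).replace("$" + key, val)
            structure_entry[name] = text


def expand_vars(dfobj):
    """Apply replace_env to every entry of the structure."""
    for entry in dfobj.structure:
        replace_env(dfobj.envs, entry)


dfobj = Dockerfile(
    filepath="Dockerfile",
    structure=[
        {"instruction": "ENV",
         "content": "ENV GOLANG_VERSION 1.13\n",
         "value": "GOLANG_VERSION 1.13"},
        {"instruction": "RUN",
         "content": "RUN wget go$GOLANG_VERSION.tar.gz\n",
         "value": "wget go$GOLANG_VERSION.tar.gz"},
    ],
    envs={"GOLANG_VERSION": "1.13"},
)
expand_vars(dfobj)
print(dfobj.structure[1]["value"])  # prints "wget go1.13.tar.gz"
```

Passing the dict separately from the entry keeps `replace_env` usable with any key-value mapping, not only ENV values.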
nishakm pushed a commit that referenced this issue Feb 12, 2020
This is work towards #522

We add initial functionality for parsing Dockerfiles using
dockerfile_parse. We also add some tests for the functions therein
and some extra functions to parse the pieces of the Dockerfile
we will need.

Most of the work is in tern/analyze/docker/dockerfile.py.
We add a class called Dockerfile which will contain the information
parsed using the function get_dockerfile_obj. The typical workflow
is to create a Dockerfile object from an existing Dockerfile using
get_dockerfile_obj. Then we can use the other functions to
return the information we want:

- replace_env does a key-value replacement on any piece of the
  Dockerfile object's structure property, using any key-value dict.
  The typical use is to replace ENV keys with their values in a
  Dockerfile line.
- expand_vars does the replacement wholesale for the Dockerfile
  content.
- parse_from_image returns a dictionary containing the tokens in the
  image string for each FROM line in the Dockerfile. For this to
  work, we also add a function called parse_image_string to
  tern/utils/general.py which parses the image string. This lets us
  reuse the same parsing for image names passed on the command line
  with the -i flag.
- Tests for these functions are added in
  test_analyze_docker_dockerfile.py, along with some Dockerfiles
  to test against.

Other changes include:
- Add dockerfile-parse to the list of requirements.
- Add tests for dockerfile and general to the CI tests.

Signed-off-by: Nisha K <[email protected]>
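
As an illustration of the image string parsing mentioned in this commit, here is a minimal sketch that splits an image reference into tokens. The token names (`name`, `tag`, `digest_type`, `digest`) and the helper `parse_from_images` are assumptions made for the example, and the function body is not taken from tern/utils/general.py.

```python
def parse_image_string(image_string):
    """Split an image reference into name, tag and digest tokens (a sketch)."""
    tokens = {"name": "", "tag": "", "digest_type": "", "digest": ""}
    name_part = image_string
    if "@" in image_string:
        # Form: name@digest_type:digest
        name_part, _, digest = image_string.partition("@")
        tokens["digest_type"], _, tokens["digest"] = digest.partition(":")
    # A ':' after the last '/' separates the tag; an earlier ':' is a
    # registry port such as localhost:5000.
    last_slash = name_part.rfind("/")
    colon = name_part.rfind(":")
    if colon > last_slash:
        tokens["name"] = name_part[:colon]
        tokens["tag"] = name_part[colon + 1:]
    else:
        tokens["name"] = name_part
    return tokens


def parse_from_images(parent_images):
    """Hypothetical helper: tokens for each image named on a FROM line."""
    return [parse_image_string(image) for image in parent_images]


for image in ("debian:buster", "localhost:5000/tern/app", "golang@sha256:abc123"):
    print(parse_image_string(image))
```

Keeping the parser independent of Dockerfile handling is what allows the same function to serve both FROM lines and image names given on the command line.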
nishakm assigned rnjudge and unassigned nishakm Feb 21, 2020
rnjudge added a commit to rnjudge/tern that referenced this issue Mar 27, 2020
This commit makes changes to the execution path when analyzing a
dockerfile or utilizing the dockerfile lock functionality. Instead of
using a True/False 'dockerfile' flag to indicate there is a Dockerfile
to analyze, we pass a 'dfobj' dockerfile object. When using a
dockerfile object, the file path is still accessible via
dfobj.filepath. This commit also adds a dfile_lock True/False flag as
an argument to a handful of functions to differentiate between a
"dockerfile lock" dockerfile and regular dockerfile analysis. Even
though the execution path is similar for both options, this distinction
is important for determining the output file that should be generated.

Works towards tern-tools#522

Signed-off-by: Rose Judge <[email protected]>
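
The change from a boolean flag to an object plus a `dfile_lock` flag can be pictured with the small sketch below. Only `dfobj`, `dfobj.filepath` and `dfile_lock` come from the commit message; the `DockerfileObj` class, the `output_kind` function and the returned strings are hypothetical and exist only to show how the two arguments together select the output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DockerfileObj:
    """Hypothetical stand-in for the parsed dockerfile object."""
    filepath: str


def output_kind(dfobj: Optional[DockerfileObj] = None,
                dfile_lock: bool = False) -> str:
    """Pick the kind of output from the presence of dfobj and dfile_lock."""
    if dfobj is None:
        # No Dockerfile was given, so only an image is being analyzed.
        return "image report"
    if dfile_lock:
        return "locked Dockerfile for " + dfobj.filepath
    return "report for " + dfobj.filepath


print(output_kind())
print(output_kind(DockerfileObj("Dockerfile")))
print(output_kind(DockerfileObj("Dockerfile"), dfile_lock=True))
```

Because the object carries the file path, callers no longer need a separate flag to say that a Dockerfile exists; `None` already conveys that.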
rnjudge added a commit to rnjudge/tern that referenced this issue Mar 27, 2020
This commit makes changes to tern/analyze/docker/run.py and
tern/analyze/docker/helpers.py to utilize dockerfile objects for
parsing purposes when applicable. The dockerfile object information
is accessible via the global 'docker_commands' variable assigned as
part of the setup process in helpers.py.

This commit also makes changes to the appropriate functions to
utilize the new dfile_lock and dfobj arguments. In order to pass
the dfobj as an argument to analyze-related functions, a dockerfile
object is created in execute_dockerfile() based on the dockerfile
path that was provided on the command line when Tern was run.

The most significant change in helpers.py is to the
get_dockerfile_base() function. The function still serves the same
purpose it did before this commit, but now gets information about
the base tag directly from the dockerfile object and adds more error
checking around various dockerfile keywords related to the FROM
command.

Works towards tern-tools#522

Signed-off-by: Rose Judge <[email protected]>
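
A rough sketch of reading the base image from already-parsed FROM information, with some checks around it, is shown below. The commit message does not say which checks were added, so the ones here (no FROM line, `FROM scratch`, an unexpanded variable, a missing or `latest` tag) are assumptions, as are the function signature and return value.

```python
def get_dockerfile_base(parent_images):
    """Return (base_image, from_line, warnings) for a single-stage build.

    parent_images is the list of image strings taken from FROM lines.
    All checks below are illustrative assumptions.
    """
    if not parent_images:
        raise ValueError("Dockerfile has no FROM line")
    image = parent_images[0]
    if image == "scratch":
        raise ValueError("FROM scratch has no base image to analyze")
    if "$" in image:
        raise ValueError("FROM line has an unexpanded variable: " + image)

    # Ignore any digest, then look for a tag after the last '/'.
    name_part = image.partition("@")[0]
    has_digest = "@" in image
    colon = name_part.rfind(":")
    tag = name_part[colon + 1:] if colon > name_part.rfind("/") else ""

    warnings = []
    if not tag and not has_digest:
        warnings.append("no tag given, 'latest' will be assumed")
    elif tag == "latest":
        warnings.append("'latest' tag is not reproducible")
    return image, "FROM " + image, warnings


print(get_dockerfile_base(["debian:buster"]))
print(get_dockerfile_base(["ubuntu"]))
```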
rnjudge added a commit to rnjudge/tern that referenced this issue Mar 27, 2020
This is a large commit that removes unnecessary manual parsing from
tern/analyze/docker/dockerfile.py so that other parts of the code can
utilize the built-in parsing abilities from the DockerfileParse module.
While many of the changes in this commit remove functions that are
no longer necessary, a few helper functions were also added.

Functions Removed:
- get_command_list()
- get_directive()
- get_directive_list()
- get_base_instructions()
- get_base_image_tag()

Functions Added:
- update_parent_images()
  If the FROM line in a Dockerfile contains a variable previously
  defined by the ARG command, the function expand_arg will take
  care of the replacement and update the dockerfile object structure.
  When this happens, we also need to update the parent_images list
  property of the Dockerfile object in case one of the ARG variables
  is part of the FROM line.

- get_command_list()
  Returns a list of commands from the dockerfile object structure
  provided. Useful when it's unrealistic to loop through the entire
  dictionary looking for a certain command.

Resolves tern-tools#522

Signed-off-by: Rose Judge <[email protected]>
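
The interplay between ARG expansion and the parent images list can be sketched as follows. The function names come from the commit message, but the bodies, signatures and the shape of the structure entries (dicts with `instruction`, `content` and `value` keys) are assumptions for illustration. Reading "commands" as instruction names in `get_command_list` is also an assumption.

```python
def expand_arg(structure):
    """Replace ARG variables in FROM entries using earlier ARG defaults."""
    args = {}
    for entry in structure:
        if entry["instruction"] == "ARG" and "=" in entry["value"]:
            key, _, val = entry["value"].partition("=")
            args[key.strip()] = val.strip()
        elif entry["instruction"] == "FROM":
            for key, val in args.items():
                for name in ("content", "value"):
                    entry[name] = (entry[name]
                                   .replace("${" + key + "}", val)
                                   .replace("$" + key, val))


def update_parent_images(structure):
    """Recompute the parent images from the (already expanded) FROM entries."""
    images = []
    for entry in structure:
        if entry["instruction"] != "FROM":
            continue
        # Skip options such as --platform=...; the next token is the image.
        for token in entry["value"].split():
            if not token.startswith("--"):
                images.append(token)
                break
    return images


def get_command_list(structure):
    """Return the instruction names in order of appearance."""
    return [entry["instruction"] for entry in structure]


structure = [
    {"instruction": "ARG", "content": "ARG BASE_TAG=buster\n",
     "value": "BASE_TAG=buster"},
    {"instruction": "FROM", "content": "FROM debian:${BASE_TAG} AS build\n",
     "value": "debian:${BASE_TAG} AS build"},
    {"instruction": "RUN", "content": "RUN apt-get update\n",
     "value": "apt-get update"},
]
expand_arg(structure)
print(update_parent_images(structure))  # prints ['debian:buster']
print(get_command_list(structure))      # prints ['ARG', 'FROM', 'RUN']
```

The parent images have to be recomputed after expansion because a list built before it would still contain `debian:${BASE_TAG}`.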
nishakm pushed a commit that referenced this issue Mar 27, 2020
This merge brings in changes to replace the existing Dockerfile
parser with the functionality of the dockerfile_parse module. The
module allows us to take in a Dockerfile and return an object which
can be read and manipulated to either produce an analysis report
or create a locked Dockerfile.

Resolves #522 

Signed-off-by: Nisha K <[email protected]>
rnjudge pushed a commit to rnjudge/tern that referenced this issue Jun 5, 2020
rnjudge added a commit to rnjudge/tern that referenced this issue Jun 5, 2020
rnjudge added a commit to rnjudge/tern that referenced this issue Jun 5, 2020
rnjudge added a commit to rnjudge/tern that referenced this issue Jun 5, 2020
rnjudge pushed a commit to rnjudge/tern that referenced this issue Jun 5, 2020