Map scancode's data into tern's data model #480

Closed
nishakm opened this issue Oct 22, 2019 · 0 comments · Fixed by #582

nishakm commented Oct 22, 2019

Description
#474 creates the plugin, but it only prints scancode's output to the console, so the data cannot be used in any of Tern's report formats. Fill out Tern's data properties based on scancode's data.

To Do
This will involve just populating the image object's properties with scancode's data.

Background
This issue depends on #476

Super Issues
#284

@nishakm nishakm added this to the Near Future milestone Oct 22, 2019
@nishakm nishakm modified the milestones: Near Future, Release 1.1.0 Nov 7, 2019
nishakm pushed a commit to nishakm/tern that referenced this issue Mar 5, 2020
We need the ability to set the file_type after a FileData object
is instantiated as we may be able to get that data later. Added
a property setter for file_type and a test for the setter.

Work towards tern-tools#480

Signed-off-by: Nisha K <[email protected]>
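
As a rough illustration of the setter described in the commit message above, here is a minimal sketch. The class below is a hypothetical, pared-down stand-in and not Tern's actual FileData implementation; only the `file_type` attribute name comes from the commit message, and the constructor arguments are assumptions.

```python
class FileData:
    """Hypothetical stand-in for a file-level data class (not Tern's code)."""

    def __init__(self, name, path, file_type=''):
        # Assumed constructor: name and path are known at load time,
        # file_type may only become available after a later scan.
        self._name = name
        self._path = path
        self._file_type = file_type

    @property
    def name(self):
        return self._name

    @property
    def path(self):
        return self._path

    @property
    def file_type(self):
        return self._file_type

    @file_type.setter
    def file_type(self, file_type):
        # Allow file_type to be filled in after instantiation.
        self._file_type = file_type


fd = FileData('ls', 'bin/ls')
fd.file_type = 'ELF executable'  # set once a scanner reports it
```

Keeping `name` and `path` read-only while exposing a setter only for `file_type` is one possible design; whether other properties should also be settable is not stated in the commit message.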
nishakm pushed a commit to nishakm/tern that referenced this issue Mar 5, 2020
This change uses the FileData class to store all the data scancode
collects. There are two scenarios this change tackles: one when
tern cannot collect file level data and hence a files list does not
exist within the ImageLayer object created during loading the image,
and one where tern has processed files in the image but doesn't have
any of the other data. In the case where we know the files in the
image layer, we introduce a function get_file_command() which will
get the correct command to invoke against a file rather than a
directory where the file exists.

In the analyze_layer() function, if the data retrieved after a scan
is a file, a FileData object is created and added to the ImageLayer
object. In the case of the analyze_file() function, for each FileData
object in the ImageLayer object, we run scancode and fill in the data
from the results.

This is work towards tern-tools#480

Signed-off-by: Nisha K <[email protected]>
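
The layer-level case described above (create a FileData object only when a scan entry is a file) might look roughly like the sketch below. Everything here is an assumption for illustration: the `files_from_scan` helper is hypothetical, the FileData class is a simplified stand-in, and the dictionary keys (`files`, `type`, `name`, `path`, `file_type`, `licenses`, `short_name`) are modelled on scancode's JSON output but differ between scancode versions, so they should be checked against the installed release.

```python
from dataclasses import dataclass, field


@dataclass
class FileData:
    """Simplified, hypothetical stand-in for Tern's FileData class."""
    name: str
    path: str
    file_type: str = ''
    licenses: list = field(default_factory=list)


def files_from_scan(scan_result):
    """Build FileData objects from a scancode-like result dictionary.

    Directory entries are skipped; only entries whose type is 'file'
    produce a FileData object.
    """
    files = []
    for entry in scan_result.get('files', []):
        if entry.get('type') != 'file':
            continue
        file_data = FileData(entry['name'], entry['path'])
        file_data.file_type = entry.get('file_type', '')
        file_data.licenses = [
            lic['short_name'] for lic in entry.get('licenses', [])
        ]
        files.append(file_data)
    return files


# Hand-written sample shaped like scan output (assumed keys, not real output).
sample = {
    'files': [
        {'path': 'bin', 'type': 'directory', 'name': 'bin'},
        {'path': 'bin/ls', 'type': 'file', 'name': 'ls',
         'file_type': 'ELF 64-bit LSB executable',
         'licenses': [{'short_name': 'GPL 3.0'}]},
    ]
}
layer_files = files_from_scan(sample)
```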
rnjudge added a commit that referenced this issue Mar 6, 2020
These changes are work towards #480

This commit adds a setter for file_type in the
FileData class and also integrates
the scancode data into Tern's data model.
rnjudge added a commit that referenced this issue Mar 13, 2020
This is work towards #480

The first commit fixes execution of the scancode extension.
The second commit adds a new method to the FileData
class along with a test.
nishakm pushed a commit to nishakm/tern that referenced this issue Mar 16, 2020
load_from_cache is used in the default execution path to analyze
Docker container images and Dockerfiles. It is used to check if
a layer has already been looked at. If it has, rather than run the
analysis, we will load the data from the cache. Similarly,
save_to_cache checks to see if there is any data collected, and
if there is, it will commit the data to the cache. This functionality
was previously restricted to package level data. This change adds file
level data caching.

This change does the following:
1. Split up package level and file level cache checks and loading.
Combine these two functions into load_from_cache.
2. Move the file level data loading from the DockerImage class to
load_from_cache. Modify the class tests accordingly.
3. Check if there is package level data or if the files were analyzed
before saving to the cache.

Work towards tern-tools#480

Signed-off-by: Nisha K <[email protected]>
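
The split described in the three steps above can be sketched as follows. This is an assumption-laden illustration, not Tern's code: the cache is a plain dictionary keyed by a layer hash, the `Layer` class and the two helper functions are hypothetical, and only the names load_from_cache and save_to_cache come from the commit message.

```python
class Layer:
    """Hypothetical stand-in for an image layer object."""

    def __init__(self, fs_hash):
        self.fs_hash = fs_hash
        self.packages = []
        self.files = []
        self.files_analyzed = False


def load_packages_from_cache(layer, cache):
    """Load package level data, returning True if any was found."""
    packages = cache.get(layer.fs_hash, {}).get('packages', [])
    layer.packages.extend(packages)
    return bool(packages)


def load_files_from_cache(layer, cache):
    """Load file level data, returning True if any was found."""
    files = cache.get(layer.fs_hash, {}).get('files', [])
    layer.files.extend(files)
    if files:
        layer.files_analyzed = True
    return bool(files)


def load_from_cache(layer, cache):
    """Run both checks; report whether anything was loaded."""
    loaded_packages = load_packages_from_cache(layer, cache)
    loaded_files = load_files_from_cache(layer, cache)
    return loaded_packages or loaded_files


def save_to_cache(layer, cache):
    """Save only if there is package data or the files were analyzed."""
    if layer.packages or layer.files_analyzed:
        cache[layer.fs_hash] = {
            'packages': list(layer.packages),
            'files': list(layer.files),
        }
        return True
    return False
```

Both loaders are called unconditionally in this sketch so that a layer with cached file data but no cached package data still gets its files loaded.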
nishakm pushed a commit to nishakm/tern that referenced this issue Mar 16, 2020
Scancode takes a ridiculous amount of time to scan every file in
a full OS. It is much faster at making decisions on what to scan
when operating at the directory level, especially since it does
its own file level inventorying. To allow for the best possible
user experience, we will only resort to scanning at the directory
level. For this to work, we will also need to check if the files
were analyzed before.

This change does the following:
1. Remove the file level data collection function.
2. Introduce a collect_layer_data function which will collect the
file level data and return a list of FileData objects.
3. Introduce an add_file_data function which will make use of the
'merge' method in FileData to add the collected file level information
to the ImageLayer object.
4. Load and save to the cache any information that is collected.

Work towards tern-tools#480

Signed-off-by: Nisha K <[email protected]>
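
A minimal sketch of how a merge-based add_file_data could work is shown below. The FileData class, the matching rule (same name and path), and the choice to append unmatched files are all assumptions made for illustration; the commit message only states that add_file_data uses the 'merge' method in FileData.

```python
class FileData:
    """Hypothetical stand-in with a merge method (not Tern's code)."""

    def __init__(self, name, path, licenses=None):
        self.name = name
        self.path = path
        self.licenses = list(licenses or [])

    def merge(self, other):
        """Copy data from other if it describes the same file.

        Returns True when the two objects matched, False otherwise.
        """
        if self.name != other.name or self.path != other.path:
            return False
        for lic in other.licenses:
            if lic not in self.licenses:
                self.licenses.append(lic)
        return True


def add_file_data(layer_files, collected_files):
    """Merge collected file data into the files already known for a layer."""
    for collected in collected_files:
        for existing in layer_files:
            if existing.merge(collected):
                break
        else:
            # Assumption: files the layer did not know about are appended.
            layer_files.append(collected)


known = [FileData('ls', 'bin/ls')]
scanned = [FileData('ls', 'bin/ls', ['GPL-3.0']),
           FileData('cat', 'bin/cat', ['MIT'])]
add_file_data(known, scanned)
```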
nishakm pushed a commit that referenced this issue Mar 18, 2020
This resolves #480

The first commit modifies loading and saving to the cache
for file level data.
The second commit modifies the executor for scancode to
leverage file level data caching and collection for an image.

Signed-off-by: Nisha K [email protected]
rnjudge pushed a commit to rnjudge/tern that referenced this issue Jun 5, 2020
We need the ability to set the file_type after a FileData object
is instantiated as we may be able to get that data later. Added
a property setter for file_type and a test for the setter.

Work towards tern-tools#480

Signed-off-by: Nisha K <[email protected]>
rnjudge pushed a commit to rnjudge/tern that referenced this issue Jun 5, 2020
This change uses the FileData class to store all the data scancode
collects. There are two scenarios this change tackles: one when
tern cannot collect file level data and hence a files list does not
exist within the ImageLayer object created during loading the image,
and one where tern has processed files in the image but doesn't have
any of the other data. In the case where we know the files in the
image layer, we introduce a function get_file_command() which will
get the correct command to invoke against a file rather than a
directory where the file exists.

In the analyze_layer() function, if the data retrieved after a scan
is a file, a FileData object is created and added to the ImageLayer
object. In the case of the analyze_file() function, for each FileData
object in the ImageLayer object, we run scancode and fill in the data
from the results.

This is work towards tern-tools#480

Signed-off-by: Nisha K <[email protected]>
rnjudge added a commit to rnjudge/tern that referenced this issue Jun 5, 2020
These changes are work towards tern-tools#480

This commit adds a setter for file_type in the
FileData class and also integrates
the scancode data into Tern's data model.
rnjudge added a commit to rnjudge/tern that referenced this issue Jun 5, 2020
This is work towards tern-tools#480

The first commit fixes execution of the scancode extension.
The second commit adds a new method to the FileData
class along with a test.
rnjudge pushed a commit to rnjudge/tern that referenced this issue Jun 5, 2020
load_from_cache is used in the default execution path to analyze
Docker container images and Dockerfiles. It is used to check if
a layer has already been looked at. If it has, rather than run the
analysis, we will load the data from the cache. Similarly,
save_to_cache checks to see if there is any data collected, and
if there is, it will commit the data to the cache. This functionality
was previously restricted to package level data. This change adds file
level data caching.

This change does the following:
1. Split up package level and file level cache checks and loading.
Combine these two functions into load_from_cache.
2. Move the file level data loading from the DockerImage class to
load_from_cache. Modify the class tests accordingly.
3. Check if there is package level data or if the files were analyzed
before saving to the cache.

Work towards tern-tools#480

Signed-off-by: Nisha K <[email protected]>
rnjudge pushed a commit to rnjudge/tern that referenced this issue Jun 5, 2020
Scancode takes a ridiculous amount of time to scan every file in
a full OS. It is much faster at making decisions on what to scan
when operating at the directory level, especially since it does
its own file level inventorying. To allow for the best possible
user experience, we will only resort to scanning at the directory
level. For this to work, we will also need to check if the files
were analyzed before.

This change does the following:
1. Remove the file level data collection function.
2. Introduce a collect_layer_data function which will collect the
file level data and return a list of FileData objects.
3. Introduce an add_file_data function which will make use of the
'merge' method in FileData to add the collected file level information
to the ImageLayer object.
4. Load and save to the cache any information that is collected.

Work towards tern-tools#480

Signed-off-by: Nisha K <[email protected]>
rnjudge pushed a commit to rnjudge/tern that referenced this issue Jun 5, 2020
This resolves tern-tools#480

The first commit modifies loading and saving to the cache
for file level data.
The second commit modifies the executor for scancode to
leverage file level data caching and collection for an image.

Signed-off-by: Nisha K [email protected]