Map scancode's data into tern's data model #480

nishakm · 2019-10-22T19:14:53Z

Description
#474 creates the plugin, but it only prints scancode's output to the console, so the data cannot be used in any of the report formats. Fill out Tern's data properties based on scancode's data.

To Do
This will involve just populating the image object's properties with scancode's data.

Background
This issue depends on #476

Super Issues
#284

We need the ability to set the file_type after a FileData object is instantiated, as we may only be able to get that data later. Added a property setter for file_type and a test for the setter.

Work towards tern-tools#480

Signed-off-by: Nisha K <[email protected]>
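A minimal sketch of what such a setter could look like. The class below is a stand-in written for illustration, not Tern's actual FileData class; the constructor arguments and the private attribute names are assumptions.

```python
class FileData:
    """Illustrative stand-in for a file-level data object (hypothetical)."""

    def __init__(self, name, path):
        self._name = name
        self._path = path
        self._file_type = ''  # not known at instantiation time

    @property
    def name(self):
        return self._name

    @property
    def path(self):
        return self._path

    @property
    def file_type(self):
        return self._file_type

    @file_type.setter
    def file_type(self, file_type):
        # Allow the type to be filled in once a later scan reports it.
        self._file_type = file_type


fd = FileData('README.md', '/usr/share/doc/README.md')
fd.file_type = 'text'
```

Keeping name and path read-only while making file_type writable reflects the point of the commit: identity is fixed when the object is created, but the type may only become known after a scan.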

This change uses the FileData class to store all the data scancode collects. It tackles two scenarios: one where Tern cannot collect file level data, so no files list exists in the ImageLayer object created while loading the image, and one where Tern has processed the files in the image but has none of the other data. For the case where we know the files in the image layer, we introduce a function get_file_command(), which gets the correct command to invoke against a file rather than against the directory containing it. In analyze_layer(), if an item retrieved after a scan is a file, a FileData object is created and added to the ImageLayer object. In analyze_file(), we run scancode for each FileData object in the ImageLayer object and fill in the data from the results.

This is work towards tern-tools#480

Signed-off-by: Nisha K <[email protected]>
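The two scenarios can be illustrated with a condensed sketch. It collapses the analyze_layer() and analyze_file() paths into one hypothetical helper, apply_scan_results(), and does not invoke scancode at all. The FileEntry and Layer classes and the shape of the result dictionaries ('path', 'type', 'licenses') are assumptions made for illustration, not Tern's classes or scancode's output schema.

```python
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    """Hypothetical stand-in for a FileData object."""
    path: str
    licenses: list = field(default_factory=list)


@dataclass
class Layer:
    """Hypothetical stand-in for an ImageLayer object."""
    files: list = field(default_factory=list)


def apply_scan_results(layer, results):
    """Store scan results in the layer, covering both scenarios."""
    known = {f.path: f for f in layer.files}
    for item in results:
        if item['type'] != 'file':
            continue  # directories carry no file level data
        entry = known.get(item['path'])
        if entry is None:
            # Scenario 1: the layer has no record of this file yet.
            entry = FileEntry(item['path'])
            layer.files.append(entry)
            known[entry.path] = entry
        # Scenario 2 (and the end of scenario 1): fill in the data.
        entry.licenses = list(item.get('licenses', []))


results = [
    {'path': 'usr', 'type': 'directory'},
    {'path': 'usr/bin/tool', 'type': 'file', 'licenses': ['MIT']},
]
empty_layer = Layer()
apply_scan_results(empty_layer, results)
```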

These changes are work towards #480. This commit adds a setter for file_type in the FileData class and also integrates the scancode data into Tern's data model.

This is work towards #480. The first commit fixes execution of the scancode extension. The second commit adds a new method to the FileData class along with a test.

load_from_cache is used in the default execution path to analyze Docker container images and Dockerfiles. It checks whether a layer has already been analyzed. If it has, we load the data from the cache rather than run the analysis again. Similarly, save_to_cache checks whether any data was collected and, if so, commits it to the cache. Until now, this functionality was restricted to package level data. This change adds file level data caching by doing the following:

1. Split up package level and file level cache checks and loading. Combine these two functions into load_from_cache.
2. Move the file level data loading from the DockerImage class to load_from_cache. Modify the class tests accordingly.
3. Check if there is package level data or if the files were analyzed before saving to the cache.

Work towards tern-tools#480

Signed-off-by: Nisha K <[email protected]>
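The cache flow described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not Tern's implementation: the in-memory dict cache, the dictionary-based layer, the 'files_analyzed' flag, and the two helper functions are all hypothetical, and Tern's real load_from_cache and save_to_cache have their own signatures.

```python
# Hypothetical in-memory cache: layer id -> {'packages': [...], 'files': [...]}
cache = {}


def load_packages_from_cache(layer_id):
    """Package level lookup, kept separate from the file level one."""
    return list(cache.get(layer_id, {}).get('packages', []))


def load_files_from_cache(layer_id):
    """File level lookup."""
    return list(cache.get(layer_id, {}).get('files', []))


def load_from_cache(layer):
    """Fill the layer from the cache; return True if anything was loaded."""
    packages = load_packages_from_cache(layer['id'])
    files = load_files_from_cache(layer['id'])
    layer['packages'].extend(packages)
    layer['files'].extend(files)
    if files:
        layer['files_analyzed'] = True
    return bool(packages or files)


def save_to_cache(layer):
    """Commit the layer only if some data was actually collected."""
    if layer['packages'] or layer['files_analyzed']:
        cache[layer['id']] = {
            'packages': list(layer['packages']),
            'files': list(layer['files']),
        }
```

A caller would try load_from_cache first and run the analysis, followed by save_to_cache, only when it returns False.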

Scancode takes a ridiculous amount of time to scan every file in a full OS. It is much faster at deciding what to scan when operating at the directory level, especially since it does its own file level inventorying. To allow for the best possible user experience, we will scan only at the directory level. For this to work, we also need to check whether the files were analyzed before. This change does the following:

1. Remove the file level data collection function.
2. Introduce a collect_layer_data function which will collect the file level data and return a list of FileData objects.
3. Introduce an add_file_data function which will make use of the 'merge' method in FileData to add the collected file level information to the ImageLayer object.
4. Load from and save to the cache any information that is collected.

Work towards tern-tools#480

Signed-off-by: Nisha K <[email protected]>
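A rough sketch of how collect_layer_data and add_file_data might fit together. The function names and the 'merge' method come from the commit message, but everything else is assumed for illustration: the FileData stand-in, the function signatures, the shape of the scan result dictionaries, and the choice to append records that the layer does not already know about.

```python
class FileData:
    """Illustrative stand-in; not Tern's actual FileData class."""

    def __init__(self, path, licenses=None):
        self.path = path
        self.licenses = list(licenses or [])

    def merge(self, other):
        """Copy data from another record describing the same path."""
        if other.path != self.path:
            return False
        for lic in other.licenses:
            if lic not in self.licenses:
                self.licenses.append(lic)
        return True


def collect_layer_data(scan_results):
    """Turn directory level scan results into a list of FileData objects."""
    return [FileData(item['path'], item.get('licenses'))
            for item in scan_results if item['type'] == 'file']


def add_file_data(layer_files, collected):
    """Merge collected records into the layer's list, appending new ones."""
    by_path = {f.path: f for f in layer_files}
    for record in collected:
        existing = by_path.get(record.path)
        if existing is None:
            layer_files.append(record)
            by_path[record.path] = record
        else:
            existing.merge(record)
```

Merging by path means a single directory level scan can enrich files Tern already inventoried without creating duplicate entries.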

This resolves #480. The first commit modifies loading and saving to the cache for file level data. The second commit modifies the executor for scancode to leverage file level data caching and collection for an image.

Signed-off-by: Nisha K <[email protected]>

nishakm added this to the Near Future milestone Oct 22, 2019

nishakm modified the milestones: Near Future, Release 1.1.0 Nov 7, 2019

This was referenced Feb 27, 2020

Add a list of FileData objects to the ImageLayer class #554

Closed

Add a list of FileData objects to the Package class #559

Closed

Extract file data attributes #557

Merged

nishakm mentioned this issue Mar 5, 2020

Integrate scancode data #572

Merged

rnjudge added a commit that referenced this issue Mar 6, 2020

Integrate scancode data

c9ff72f

These changes are work towards #480. This commit adds a setter for file_type in the FileData class and also integrates the scancode data into Tern's data model.

This was referenced Mar 9, 2020

Allow Tern to save to and load from FileData information from the cache #574

Closed

Update FileData object and scancode executor #577

Closed

Update FileData object and scancode executor #578

Merged

rnjudge added a commit that referenced this issue Mar 13, 2020

Update FileData object and scancode executor

f0e5cae

This is work towards #480. The first commit fixes execution of the scancode extension. The second commit adds a new method to the FileData class along with a test.

nishakm mentioned this issue Mar 16, 2020

Integrate file level data from scancode #582

Merged

nishakm closed this as completed in #582 Mar 18, 2020

rnjudge added a commit to rnjudge/tern that referenced this issue Jun 5, 2020

Integrate scancode data

b3bfa1d

These changes are work towards tern-tools#480. This commit adds a setter for file_type in the FileData class and also integrates the scancode data into Tern's data model.

rnjudge mentioned this issue Mar 11, 2021

Map cve_bin_tool data into Tern's data model #909

Open
