Block-cache read Pipeline #1462

Merged
merged 69 commits into from
Aug 9, 2024
Changes from 66 commits
c30ba14
spell check
May 7, 2024
08707ce
Merge branch 'main' of https://github.com/Azure/azure-storage-fuse in…
May 8, 2024
0852969
merge
May 10, 2024
4afd00b
Merge branch 'main' of https://github.com/Azure/azure-storage-fuse in…
ashruti-msft May 15, 2024
5006528
Merge branch 'main' of https://github.com/Azure/azure-storage-fuse in…
ashruti-msft May 16, 2024
0628e9e
Merge branch 'main' of https://github.com/Azure/azure-storage-fuse in…
ashruti-msft May 19, 2024
ed3729e
Merge branch 'main' of https://github.com/Azure/azure-storage-fuse in…
ashruti-msft Jun 19, 2024
fa04d94
Merge branch 'main' of https://github.com/Azure/azure-storage-fuse in…
ashruti-msft Jun 21, 2024
95447ff
Merge branch 'main' of https://github.com/Azure/azure-storage-fuse in…
ashruti-msft Jun 24, 2024
6052556
Merge branch 'main' of https://github.com/Azure/azure-storage-fuse in…
ashruti-msft Jul 10, 2024
c24b669
fix
ashruti-msft Jul 10, 2024
7e5001a
Merge branch 'main' of https://github.com/Azure/azure-storage-fuse in…
ashruti-msft Jul 11, 2024
551b9b1
Merge branch 'main' of https://github.com/Azure/azure-storage-fuse in…
ashruti-msft Jul 15, 2024
e033855
Merge branch 'main' of https://github.com/Azure/azure-storage-fuse in…
ashruti-msft Jul 16, 2024
54ed05b
first
ashruti-msft Jul 17, 2024
e91f289
returns eof if block is not filled
ashruti-msft Jul 17, 2024
009047b
Merge branch 'main' of https://github.com/Azure/azure-storage-fuse in…
ashruti-msft Jul 17, 2024
6e53a10
Merge branch 'main' of https://github.com/Azure/azure-storage-fuse in…
ashruti-msft Jul 23, 2024
962e1c2
Merge branch 'main' of https://github.com/Azure/azure-storage-fuse in…
ashruti-msft Jul 26, 2024
37527f6
corrected check condition for EOF
ashruti-msft Jul 26, 2024
a3d3e65
Test
ashruti-msft Jul 27, 2024
566bf9c
fixe return statement
ashruti-msft Jul 27, 2024
7b13164
fixed name of test
ashruti-msft Jul 27, 2024
be7393f
generaye parquet files
ashruti-msft Jul 27, 2024
b78bfbf
del
ashruti-msft Jul 27, 2024
42d44b1
fix
ashruti-msft Jul 27, 2024
f11ba00
fix
ashruti-msft Jul 27, 2024
8c122d0
install pip3
ashruti-msft Jul 29, 2024
6d5b9ca
Added UT for checking total bytes read
ashruti-msft Jul 29, 2024
9f2e31e
checking disk size
ashruti-msft Jul 29, 2024
e52942e
Added file size check
ashruti-msft Jul 29, 2024
023dd57
fix yaml
ashruti-msft Jul 29, 2024
0b2fe15
checking err in file size check
ashruti-msft Jul 29, 2024
c8f5bba
printing bc details
ashruti-msft Jul 29, 2024
5395f94
change
ashruti-msft Jul 29, 2024
e332be9
added run with diff blocksize
ashruti-msft Jul 29, 2024
4d6da89
fix yaml
ashruti-msft Jul 29, 2024
26831d5
fix yaml
ashruti-msft Jul 29, 2024
b83be5b
changed myfile to datafile
ashruti-msft Jul 29, 2024
8bc4d88
changed myfile to datafile
ashruti-msft Jul 29, 2024
1e08558
changed directory structure and removed unwanted commands
ashruti-msft Jul 29, 2024
08908bd
changed test for file size check
ashruti-msft Jul 29, 2024
43626ea
Clear files on work dir
ashruti-msft Jul 29, 2024
279c256
added check for EOF err and unmount and mount before checking md5sum
ashruti-msft Jul 30, 2024
e69f66f
Merge remote-tracking branch 'origin/main' into ashruti/bcReadCorr
ashruti-msft Jul 30, 2024
d62d7c3
spell check
ashruti-msft Jul 30, 2024
3cb4d42
changed lh to l exiting on fail size check
ashruti-msft Jul 30, 2024
2e51140
test
ashruti-msft Jul 30, 2024
e04d052
test
ashruti-msft Jul 30, 2024
fce85f4
filecache
ashruti-msft Jul 30, 2024
ed14c35
fixyaml
ashruti-msft Jul 30, 2024
52d2289
fixyaml
ashruti-msft Jul 30, 2024
f52bec3
fixyaml
ashruti-msft Jul 30, 2024
51648df
Merge branch 'main' of https://github.com/Azure/azure-storage-fuse in…
ashruti-msft Jul 30, 2024
3040e89
filesizecheck
ashruti-msft Jul 30, 2024
5f22f37
Merge remote-tracking branch 'origin/main' into ashruti/bcReadCorr
ashruti-msft Jul 31, 2024
6333d32
Merge branch 'main' of https://github.com/Azure/azure-storage-fuse in…
ashruti-msft Jul 31, 2024
7b0fec8
added byte validation
ashruti-msft Jul 31, 2024
cfc27fa
Merge branch 'main' of https://github.com/Azure/azure-storage-fuse in…
ashruti-msft Aug 1, 2024
7f09150
Merge branch 'main' of https://github.com/Azure/azure-storage-fuse in…
ashruti-msft Aug 2, 2024
fc9a481
Merge branch 'main' of https://github.com/Azure/azure-storage-fuse in…
ashruti-msft Aug 5, 2024
23d6002
Merge branch 'main' of https://github.com/Azure/azure-storage-fuse in…
ashruti-msft Aug 5, 2024
7e87244
Merge branch 'main' into ashruti/bcReadCorr
ashruti-msft Aug 5, 2024
ccd6711
removed file size and bytes check as not required if md5sum matches
ashruti-msft Aug 5, 2024
db6e051
add tests with block-cache direct io
ashruti-msft Aug 5, 2024
e1870a8
fix
ashruti-msft Aug 6, 2024
2c027f1
redirecting cat to temp file
ashruti-msft Aug 7, 2024
e25c87a
added disk cache test
ashruti-msft Aug 8, 2024
cb6058c
minor changes
ashruti-msft Aug 9, 2024
206 changes: 206 additions & 0 deletions azure-pipeline-templates/e2e-tests-block-cache-data-integrity.yml
@@ -0,0 +1,206 @@
parameters:
  - name: conf_template
    type: string
  - name: config_file
    type: string
  - name: container
    type: string
  - name: temp_dir
    type: string
  - name: mount_dir
    type: string
  - name: idstring
    type: string
  - name: adls
    type: boolean
  - name: account_name
    type: string
  - name: account_key
    type: string
  - name: account_type
    type: string
  - name: account_endpoint
  - name: distro_name
    type: string
  - name: quick_test
    type: boolean
    default: true
  - name: verbose_log
    type: boolean
    default: false
  - name: clone
    type: boolean
    default: false
  - name: block_size_mb
    type: string
    default: "8"

steps:
  - script: |
      $(WORK_DIR)/blobfuse2 gen-test-config --config-file=$(WORK_DIR)/testdata/config/azure_key.yaml --container-name=${{ parameters.container }} --temp-path=${{ parameters.temp_dir }} --output-file=${{ parameters.config_file }}
    displayName: 'Create Config File for RW mount'
    env:
      NIGHTLY_STO_ACC_NAME: ${{ parameters.account_name }}
      NIGHTLY_STO_ACC_KEY: ${{ parameters.account_key }}
      ACCOUNT_TYPE: ${{ parameters.account_type }}
      ACCOUNT_ENDPOINT: ${{ parameters.account_endpoint }}
      VERBOSE_LOG: ${{ parameters.verbose_log }}
    continueOnError: false

  - script: |
      cat ${{ parameters.config_file }}
    displayName: 'Print config file'

  - template: 'mount.yml'
    parameters:
      working_dir: $(WORK_DIR)
      mount_dir: ${{ parameters.mount_dir }}
      temp_dir: ${{ parameters.temp_dir }}
      prefix: ${{ parameters.idstring }}
      mountStep:
        script: |
          $(WORK_DIR)/blobfuse2 mount ${{ parameters.mount_dir }} --config-file=${{ parameters.config_file }} --default-working-dir=$(WORK_DIR) --file-cache-timeout=3200

  - script: |
      for i in $(seq 1 10); do echo $(shuf -i 0-4294967296 -n 1); done | parallel --will-cite -j 5 'head -c {} < /dev/urandom > ${{ parameters.mount_dir }}/datafiles_{}'
      for i in {1,2,3,4,5,6,7,8,9,10,20,30,50,100,200}; do echo $i; done | parallel --will-cite -j 5 'head -c {}M < /dev/urandom > ${{ parameters.mount_dir }}/mixedfiles_{}.txt'
      for i in {1,2,3,4,5,6,7,8,9,10,20,30,50,100,200}; do echo $i; done | parallel --will-cite -j 5 'head -c {}M < /dev/urandom > ${{ parameters.mount_dir }}/mixedfiles_{}.png'
      cd ${{ parameters.mount_dir }}
      python3 $(WORK_DIR)/testdata/scripts/generate-parquet-files.py
      ls -l ${{ parameters.mount_dir }}/mixedfiles_*
      ls -l ${{ parameters.mount_dir }}/datafiles_*
    displayName: 'Generate data with File-Cache'

  - script: |
      md5sum ${{ parameters.mount_dir }}/datafiles_* > $(WORK_DIR)/md5sum_original_files.txt
      md5sum ${{ parameters.mount_dir }}/mixedfiles_* >> $(WORK_DIR)/md5sum_original_files.txt
    displayName: 'Generate md5Sum with File-Cache'

  - script: |
      echo "----------------------------------------------"
      ls -l ${{ parameters.mount_dir }}
    displayName: 'Print contents of File-Cache'

  - script: |
      $(WORK_DIR)/blobfuse2 unmount all
    displayName: 'Unmount RW mount'

  - script: |
      cd $(WORK_DIR)
      $(WORK_DIR)/blobfuse2 gen-test-config --config-file=$(WORK_DIR)/testdata/config/azure_key_bc.yaml --container-name=${{ parameters.container }} --temp-path=${{ parameters.temp_dir }} --output-file=${{ parameters.config_file }}
      sed -i 's/block-size-mb: [0-9]*/block-size-mb: ${{ parameters.block_size_mb }}/' ${{ parameters.config_file }}
    displayName: 'Create Config File for RO mount'
    env:
      NIGHTLY_STO_ACC_NAME: ${{ parameters.account_name }}
      NIGHTLY_STO_ACC_KEY: ${{ parameters.account_key }}
      ACCOUNT_TYPE: ${{ parameters.account_type }}
      ACCOUNT_ENDPOINT: ${{ parameters.account_endpoint }}
      VERBOSE_LOG: ${{ parameters.verbose_log }}
    continueOnError: false

  - script: |
      cat ${{ parameters.config_file }}
    displayName: 'Print config file'

  - template: 'mount.yml'
    parameters:
      working_dir: $(WORK_DIR)
      mount_dir: ${{ parameters.mount_dir }}
      temp_dir: ${{ parameters.temp_dir }}
      prefix: ${{ parameters.idstring }}
      ro_mount: true
      mountStep:
        script: |
          $(WORK_DIR)/blobfuse2 mount ${{ parameters.mount_dir }} --config-file=${{ parameters.config_file }} --default-working-dir=$(WORK_DIR) -o ro

  - script: |
      echo "----------------------------------------------"
      ls -l ${{ parameters.mount_dir }}/mixedfiles*
      ls -l ${{ parameters.mount_dir }}/datafiles*
    displayName: 'Print contents of Block-Cache'

  - script: |
      md5sum ${{ parameters.mount_dir }}/datafiles_* > $(WORK_DIR)/md5sum_block_cache.txt
      md5sum ${{ parameters.mount_dir }}/mixedfiles_* >> $(WORK_DIR)/md5sum_block_cache.txt
    displayName: 'Generate md5Sum with Block-Cache'

  - script: |
      $(WORK_DIR)/blobfuse2 unmount all
    displayName: 'Unmount RO mount'

  - script: |
      echo "----------------------------------------------"
      cat $(WORK_DIR)/md5sum_original_files.txt
      # Keep only the hash column. Write to a temporary file first: redirecting
      # into the file that is being read can truncate it before it is read.
      cut -d " " -f1 $(WORK_DIR)/md5sum_original_files.txt > $(WORK_DIR)/md5sum_original_files.tmp
      mv $(WORK_DIR)/md5sum_original_files.tmp $(WORK_DIR)/md5sum_original_files.txt
      echo "----------------------------------------------"
      cat $(WORK_DIR)/md5sum_block_cache.txt
      cut -d " " -f1 $(WORK_DIR)/md5sum_block_cache.txt > $(WORK_DIR)/md5sum_block_cache.tmp
      mv $(WORK_DIR)/md5sum_block_cache.tmp $(WORK_DIR)/md5sum_block_cache.txt
      echo "----------------------------------------------"
      diff $(WORK_DIR)/md5sum_original_files.txt $(WORK_DIR)/md5sum_block_cache.txt
      if [ $? -ne 0 ]; then
        exit 1
      fi
    displayName: 'Compare md5Sum'

  - script: |
      cd $(WORK_DIR)
      $(WORK_DIR)/blobfuse2 gen-test-config --config-file=$(WORK_DIR)/testdata/config/azure_key_bc.yaml --container-name=${{ parameters.container }} --temp-path=${{ parameters.temp_dir }} --output-file=${{ parameters.config_file }}
      sed -i 's/block-size-mb: [0-9]*/block-size-mb: ${{ parameters.block_size_mb }}/' ${{ parameters.config_file }}
      sed -i '/^libfuse:/a \ direct-io: true' ${{ parameters.config_file }}
    displayName: 'Create Config File for RO mount with direct-io'
    env:
      NIGHTLY_STO_ACC_NAME: ${{ parameters.account_name }}
      NIGHTLY_STO_ACC_KEY: ${{ parameters.account_key }}
      ACCOUNT_TYPE: ${{ parameters.account_type }}
      ACCOUNT_ENDPOINT: ${{ parameters.account_endpoint }}
      VERBOSE_LOG: ${{ parameters.verbose_log }}
    continueOnError: false

  - script: |
      cat ${{ parameters.config_file }}
    displayName: 'Print config file'

  - template: 'mount.yml'
    parameters:
      working_dir: $(WORK_DIR)
      mount_dir: ${{ parameters.mount_dir }}
      temp_dir: ${{ parameters.temp_dir }}
      prefix: ${{ parameters.idstring }}
      ro_mount: true
      mountStep:
        script: |
          $(WORK_DIR)/blobfuse2 mount ${{ parameters.mount_dir }} --config-file=${{ parameters.config_file }} --default-working-dir=$(WORK_DIR) -o ro

  - script: |
      echo "----------------------------------------------"
      ls -l ${{ parameters.mount_dir }}
    displayName: 'Print contents of Block-Cache'

  - script: |
      md5sum ${{ parameters.mount_dir }}/datafiles_* > $(WORK_DIR)/md5sum_block_cache_direct_io.txt
      md5sum ${{ parameters.mount_dir }}/mixedfiles_* >> $(WORK_DIR)/md5sum_block_cache_direct_io.txt
    displayName: 'Generate md5Sum with Block-Cache Direct-IO'

  - script: |
      $(WORK_DIR)/blobfuse2 unmount all
    displayName: 'Unmount RO mount'

  - script: |
      echo "----------------------------------------------"
      cat $(WORK_DIR)/md5sum_original_files.txt
      # Keep only the hash column. Write to a temporary file first: redirecting
      # into the file that is being read can truncate it before it is read.
      cut -d " " -f1 $(WORK_DIR)/md5sum_original_files.txt > $(WORK_DIR)/md5sum_original_files.tmp
      mv $(WORK_DIR)/md5sum_original_files.tmp $(WORK_DIR)/md5sum_original_files.txt
      echo "----------------------------------------------"
      cat $(WORK_DIR)/md5sum_block_cache_direct_io.txt
      cut -d " " -f1 $(WORK_DIR)/md5sum_block_cache_direct_io.txt > $(WORK_DIR)/md5sum_block_cache_direct_io.tmp
      mv $(WORK_DIR)/md5sum_block_cache_direct_io.tmp $(WORK_DIR)/md5sum_block_cache_direct_io.txt
      echo "----------------------------------------------"
      diff $(WORK_DIR)/md5sum_original_files.txt $(WORK_DIR)/md5sum_block_cache_direct_io.txt
      if [ $? -ne 0 ]; then
        exit 1
      fi
    displayName: 'Compare md5Sum with Block-Cache Direct-IO'

  - template: 'cleanup.yml'
    parameters:
      working_dir: $(WORK_DIR)
      mount_dir: ${{ parameters.mount_dir }}
      temp_dir: ${{ parameters.temp_dir }}
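
Note on the comparison steps in this template: each md5sum listing is reduced to its hash column and the two listings are diffed, so a failure shows differing hashes but not which file they belong to. A minimal Python sketch of the same check that reports file names is given here. The function names are hypothetical and not part of this PR; it assumes the usual md5sum line layout of hash, separator, path.

```python
from typing import Dict, List


def parse_md5sum(listing: str) -> Dict[str, str]:
    """Map base file name -> hash for md5sum output ("<hash>  <path>" per line)."""
    result = {}
    for line in listing.splitlines():
        if not line.strip():
            continue
        digest, _, path = line.partition(" ")
        # md5sum puts a space or '*' before the path; drop it, and keep only
        # the base name so listings from different mount points compare equal.
        name = path.lstrip(" *").rsplit("/", 1)[-1]
        result[name] = digest
    return result


def mismatches(original: str, candidate: str) -> List[str]:
    """Return sorted names whose hashes differ or that appear in only one listing."""
    a, b = parse_md5sum(original), parse_md5sum(candidate)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


# Illustrative listings only; the paths and hash below are made up.
file_cache = "d41d8cd98f00b204e9800998ecf8427e  /mnt/a/datafiles_0\n"
block_cache = "d41d8cd98f00b204e9800998ecf8427e  /mnt/b/datafiles_0\n"
print(mismatches(file_cache, block_cache))  # prints []
```

Comparing by file name rather than by line position also tolerates the two mounts listing files in a different order.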
98 changes: 98 additions & 0 deletions blobfuse2-nightly.yaml
@@ -1407,6 +1407,104 @@ stages:
              temp_dir: $(TEMP_DIR)
              mount_dir: $(MOUNT_DIR)

  - stage: BlockCacheDataIntegrityValidation
    jobs:
      # Ubuntu Tests
      - job: Set_1
        timeoutInMinutes: 300
        strategy:
          matrix:
            Ubuntu-22:
              AgentName: 'blobfuse-ubuntu22'
              containerName: 'test-cnt-ubn-22'
              adlsSas: $(AZTEST_ADLS_CONT_SAS_UBN_22)
              fuselib: 'libfuse3-dev'
              tags: 'fuse3'

        pool:
          name: "blobfuse-ubuntu-pool"
          demands:
            - ImageOverride -equals $(AgentName)

        variables:
          - group: NightlyBlobFuse
          - name: ROOT_DIR
            value: "/usr/pipeline/workv2"
          - name: WORK_DIR
            value: "/usr/pipeline/workv2/go/src/azure-storage-fuse"
          - name: skipComponentGovernanceDetection
            value: true
          - name: MOUNT_DIR
            value: "/usr/pipeline/workv2/blob_mnt"
          - name: TEMP_DIR
            value: "/usr/pipeline/workv2/temp"
          - name: BLOBFUSE2_CFG
            value: "/usr/pipeline/workv2/blobfuse2.yaml"
          - name: GOPATH
            value: "/usr/pipeline/workv2/go"

        steps:
          - template: 'azure-pipeline-templates/setup.yml'
            parameters:
              tags: $(tags)
              installStep:
                script: |
                  sudo apt-get update --fix-missing
                  sudo apt update
                  sudo apt-get install cmake gcc $(fuselib) git parallel -y
                  if [ $(tags) == "fuse2" ]; then
                    sudo apt-get install fuse -y
                  else
                    sudo apt-get install fuse3 -y
                  fi
                displayName: 'Install fuse'

          - script: |
              sudo apt-get install python3-setuptools -y
              sudo apt install python3-pip -y
              sudo pip3 install pandas numpy pyarrow fastparquet
            displayName: 'Install Python Packages'

          - template: 'azure-pipeline-templates/e2e-tests-block-cache-data-integrity.yml'
            parameters:
              conf_template: azure_key.yaml
              config_file: $(BLOBFUSE2_CFG)
              container: $(containerName)
              idstring: Block_Blob
              adls: false
              account_name: $(NIGHTLY_STO_BLOB_ACC_NAME)
              account_key: $(NIGHTLY_STO_BLOB_ACC_KEY)
              account_type: block
              account_endpoint: https://$(NIGHTLY_STO_BLOB_ACC_NAME).blob.core.windows.net
              distro_name: $(AgentName)
              quick_test: false
              verbose_log: ${{ parameters.verbose_log }}
              clone: true
              # TODO: These can be removed one day and replace all instances of ${{ parameters.temp_dir }} with $(TEMP_DIR) since it is a global variable
              temp_dir: $(TEMP_DIR)
              mount_dir: $(MOUNT_DIR)
              block_size_mb: "1"

          - template: 'azure-pipeline-templates/e2e-tests-block-cache-data-integrity.yml'
            parameters:
              conf_template: azure_key.yaml
              config_file: $(BLOBFUSE2_CFG)
              container: $(containerName)
              idstring: Block_Blob
              adls: false
              account_name: $(NIGHTLY_STO_BLOB_ACC_NAME)
              account_key: $(NIGHTLY_STO_BLOB_ACC_KEY)
              account_type: block
              account_endpoint: https://$(NIGHTLY_STO_BLOB_ACC_NAME).blob.core.windows.net
              distro_name: $(AgentName)
              quick_test: false
              verbose_log: ${{ parameters.verbose_log }}
              clone: true
              # TODO: These can be removed one day and replace all instances of ${{ parameters.temp_dir }} with $(TEMP_DIR) since it is a global variable
              temp_dir: $(TEMP_DIR)
              mount_dir: $(MOUNT_DIR)
              block_size_mb: "8"

  - stage: FNSDataValidation
    jobs:
      # Ubuntu Tests
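
Note on block_size_mb: the two template invocations in this stage differ only in block_size_mb ("1" and "8"), which the template applies by rewriting the block-size-mb line of the generated config with sed. A rough Python equivalent of that substitution is sketched here to show what the override does. The function name and the sample config fragment are assumptions for illustration; the real config is produced by gen-test-config.

```python
import re


def set_block_size_mb(config_text: str, block_size_mb: str) -> str:
    """Rewrite 'block-size-mb: <digits>' entries with the requested value.

    sed without the g flag replaces the first match on each line and re.sub
    replaces every match; for a config with one entry per line the result
    is the same.
    """
    return re.sub(r"block-size-mb: [0-9]*", f"block-size-mb: {block_size_mb}", config_text)


# Hypothetical config fragment, not the actual generated file.
sample = "block_cache:\n  block-size-mb: 16\n  prefetch: 12\n"
print(set_block_size_mb(sample, "1"))
```

If the generated config has no block-size-mb line at all, the substitution is a no-op and the mount runs with whatever default applies, so printing the config file afterwards (as the template does) is a useful check.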
15 changes: 15 additions & 0 deletions testdata/scripts/generate-parquet-files.py
@@ -0,0 +1,15 @@
import pandas as pd
import numpy as np
import random

# Function to generate a random number of rows for the DataFrame
def random_row_count(min_rows=10, max_rows=1000):
    return random.randint(min_rows, max_rows)

# Generate 10 Parquet files with varying sizes
for i in range(10):
    row_count = random_row_count()
    df = pd.DataFrame(np.random.randn(row_count, 4), columns=list('ABCD'))
    file_name = f'mixedfiles_{i}.parquet'
    df.to_parquet(file_name, index=False)
    print(f'Created {file_name} with {row_count} rows')
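
Note on verifying the generated files: the row count is random, so the Parquet contents change on every run and only hashes taken within the same run can be compared. For checking a single file locally, a minimal sketch of computing the digest that md5sum reports is given here. It reads in fixed-size chunks so that the multi-gigabyte datafiles need not fit in memory. The function name and the default chunk size are assumptions, not part of this PR.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling md5_of_file on the same blob through the file-cache mount and through the block-cache mount should return the same string if the read path is correct.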