
Starting for graphic interface #9

Merged
merged 1 commit into from
Jul 21, 2021

Conversation

Chetabahana
Collaborator

@Chetabahana Chetabahana commented Jul 20, 2021

Summary

Main Code: split.py

# extract via the notebook if needed
# !unzip flowers-recognition.zip

import os

mypath = 'flowers/'

file_name = []
tag = []
full_path = []
for path, subdirs, files in os.walk(mypath):
    for name in files:
        full_path.append(os.path.join(path, name))
        # the class label is the name of the containing folder
        tag.append(os.path.basename(path))
        file_name.append(name)

import pandas as pd

# put the variables collected in the loop above into a dataframe to keep things tidy
df = pd.DataFrame({"path":full_path,'file_name':file_name,"tag":tag})
df.groupby(['tag']).size()

#tag
#daisy        1538
#dandelion    2110
#rose         1568
#sunflower    1468
#tulip        1968
#dtype: int64

# check a sample of the data
print(df.head())

# load the library for the train/test split
from sklearn.model_selection import train_test_split

# variables used for splitting the data
X = df['path']
y = df['tag']

# split the initial dataset into train and test data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=300)

# then split the test data in two, giving the test and validation data
X_test, X_val, y_test, y_val = train_test_split(
    X_test, y_test, test_size=0.5, random_state=100)

# combine each split into its own dataframe

df_tr = pd.DataFrame({'path':X_train
              ,'tag':y_train
             ,'set':'train'})

df_te = pd.DataFrame({'path':X_test
              ,'tag':y_test
             ,'set':'test'})

df_val = pd.DataFrame({'path':X_val
              ,'tag':y_val
             ,'set':'validation'})

print('train size', len(df_tr))
print('test size', len(df_te))
print('val size', len(df_val))

# check the proportions in each set to see whether they are fine or still need changing
# (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
df_all = pd.concat([df_tr, df_te, df_val]).reset_index(drop=True)

print('===================================================== \n')
print(df_all.groupby(['set','tag']).size(),'\n')

print('===================================================== \n')

# check a sample of the data
print(df_all.sample(3))
print('===================================================== \n')

# delete the dataset folder if needed
#!rm -rf dataset/

import shutil
from tqdm import tqdm as tq

datasource_path = "flowers/"
dataset_path = "dataset/"

for index, row in tq(df_all.iterrows(), total=df_all.shape[0]):

    # detect the source file path
    file_path = row['path']
    if not os.path.exists(file_path):
        # fall back to <datasource>/<tag>/<file name>; df_all has no 'image' column
        file_path = os.path.join(datasource_path, row['tag'], os.path.basename(row['path']))

    # create the destination folder if needed
    dest_dir = os.path.join(dataset_path, row['set'], row['tag'])
    os.makedirs(dest_dir, exist_ok=True)

    # define the destination file
    file_dest = os.path.join(dest_dir, os.path.basename(file_path))

    # copy the file from source to destination
    if not os.path.exists(file_dest):
        #print(file_path,'►',file_dest)
        shutil.copy2(file_path, file_dest)

# Output progress bar (notebook):
# HBox(children=(FloatProgress(value=1.0, bar_style='info', max=1.0), HTML(value='')))
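
A possible refinement of the split in split.py, offered as a minimal sketch and not as what this PR does: the script splits at random, so the per-class proportions can drift between sets. Passing `stratify` to `train_test_split` keeps them roughly equal. The helper name `stratified_split` and the toy dataframe are hypothetical and only stand in for the flowers data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def stratified_split(df, test_size=0.20, seed=300):
    """Split df into train/test/validation sets, stratified on the 'tag' column.

    Hypothetical helper: same 80/10/10 shape as split.py, plus stratify.
    """
    # First cut: hold out `test_size` of the rows, keeping class proportions.
    train, rest = train_test_split(
        df, test_size=test_size, random_state=seed, stratify=df["tag"])
    # Second cut: halve the held-out rows into test and validation.
    test, val = train_test_split(
        rest, test_size=0.5, random_state=seed, stratify=rest["tag"])
    # Label each part and stack them into one dataframe.
    return pd.concat([
        train.assign(set="train"),
        test.assign(set="test"),
        val.assign(set="validation"),
    ]).reset_index(drop=True)


# Toy data standing in for the flowers dataframe (paths are made up).
toy = pd.DataFrame({
    "path": [f"flowers/{t}/{i}.jpg" for t in ("daisy", "rose") for i in range(50)],
    "tag": ["daisy"] * 50 + ["rose"] * 50,
})
split = stratified_split(toy)
print(split.groupby(["set", "tag"]).size())
```

The returned dataframe has the same `path`, `tag`, and `set` columns as `df_all`, so it could feed the copy loop unchanged.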

Checklist

  1. Privileged views and APIs are guarded by proper permission checks.
  2. All visible strings are translated with proper context.
  3. All data-formatting is locale-aware (dates, numbers, and so on).
  4. Database queries are optimized and the number of queries is constant.
  5. Database migration files are up to date.
  6. The changes are tested.
  7. The code is documented (docstrings, project documentation).
  8. GraphQL schema and type definitions are up to date.
  9. Changes are mentioned in the changelog.

Reference

Screenshots

(Screenshot attachments dated 2021-07-20 through 2021-07-25; images not reproduced here.)


@Chetabahana Chetabahana merged commit c86be58 into Chetabahana Jul 21, 2021
@Chetabahana
Collaborator Author

Chetabahana commented Jul 27, 2021

Add some more:

(Four additional screenshot attachments dated 2021-07-27 and 2021-07-28; images not reproduced here.)
