Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reorganize be v1.0 #54

Open
wants to merge 18 commits into
base: 34-improvement-backend-improvements-for-typing-tests-modularity-and-pathlib
Choose a base branch
from

Conversation

prakash-2002-ramanathan

Hello @omeryusufyagci

This PR addresses #34.
Please see if the current work is satisfactory

Possible issues:

  • I have changed the config.json
  • Changed some names
  • Changed Logs
  • JSON message structure
  • Added new variable config_file_path in MediaHandler main to pass the config.json file directory.

I have not removed the original app.py. Not updated the README. Have not updated the requiremets.pip, some minor work is pending.

*Optimization add type hints to all the functions

*Optimization replace OS path function with pathlib functions

<Signed off by Prakash([email protected])>
*Optimization: Create new folder and create app.py, mediahandler.py
 response_handler.py and utils.py

*Modify: Modify main.cpp to take the working directory into
 consideration, and avoid missing config.json

 *Modify: modify config.json to point towards the correct directory

<Signed off by Praksh([email protected])>
*New : Create different function in response_handler to ensure
consistent JSON communicatino

*Change: In C++ use the std:out to return JSON object to backend

*Change: Move all the logs to the std:err, to ensure std:out is only
used for communication

*Future: Keep the original app.py redundantly for future reference

*Modify: Modify the frontend to be ensure compatibility with the New
JSON communication scheme
*Add: Add unit tests for all static functions in utils.py and
media_handler.py

*improve: Improve the logs, remove irrelevant logs

*refactor: remove unused import statements, change structure

<Signed off by Prakash([email protected])>
*Task: Merge all the changes
Copy link
Owner

@omeryusufyagci omeryusufyagci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @prakash-2002-ramanathan, thanks for this PR!

I skimmed through the changes, and I'd like to touch on a few points:

  1. CLI nature of core: For the moment, we need to keep this capability. There is an open issue assigned to build a proper CLI for the core, but until that’s addressed, it needs to remain independently usable.
  2. Keeping main clean: main should serve strictly as a simple entry point to the application, without containing any processing logic. Please encapsulate these into a class like CommunicationHandler and simply use its methods on main. Here's an example of what I had on my mind:

This:

    if (argc != 2) {
        std::cerr << "Usage: " << argv[0] << " <video_file_path>" << std::endl;
        return 1;
    }

would become:

    if (argc != 2) {
        CommunicationHandler::handleError("Usage: ...");
    }

and then within handleError() you'd call a member function to generate a json response and exit with a sensible code. This would allow us to abstract communication handling and improve over time while keeping the public API stable.

I understand this is not true IPC, but as I explained in #34, the immediate goal is to eliminate the existing error-prone implementation. Improving how we handle responses within core will already be a big step in the right direction.

  1. Config file path changes: I didn't quite understand why adding the ../ before MediaProcessor paths were needed there. Could you clarify why this was necessary?

@omeryusufyagci omeryusufyagci added feature New feature or request backend iteration Requires further iteration for approval core Media processing component, the core C++ binary. labels Oct 23, 2024
@prakash-2002-ramanathan
Copy link
Author

@omeryusufyagci, The file path in python is relative to the python file. But since the MediaHandler is outside the backend directory. We are going to the previous directory where the Mediahandler is present. This seemed like a simple fix

@omeryusufyagci
Copy link
Owner

omeryusufyagci commented Oct 28, 2024

Hi @prakash-2002-ramanathan, I wanted to check with you how you're progressing.

Although I appreciate the effort directly on the core, #34 mainly addresses the backend, where additions like the response handler will be ready in place, pending an update of the core for full integration.

You can still tackle the update of the core, but I'd propose to do it on a separate PR, and focus here fully on the backend improvements. The backend should function the same as it does right now, while the necessary handlers implemented and ready for the integration.

Tests are a big part of this PR, and would be quite impactful as we get ready to make larger changes.

Please let me know if this sounds good to you. Thanks!

@omeryusufyagci omeryusufyagci removed the core Media processing component, the core C++ binary. label Oct 28, 2024
@prakash-2002-ramanathan
Copy link
Author

Hi @omeryusufyagci
Yeah, that sounds good. I have already implemented changes in the core.
Is that fine, just over 50 lines.
I have written communication handlers for the core, but not used them
I have written communication handlers for the BE, and have used them. Except for the communication with the core

@omeryusufyagci
Copy link
Owner

Hi @prakash-2002-ramanathan, that's great to hear!

I would still like you to submit your work on the core as a separate PR though. Let's evaluate the work you've done on the backend, finalize, merge and then take a look at your work on the core separately. This is also to expedite the addition of backend tests in preparation of our first release.

Finally, could you please target this PR to the dedicated branch opened under #34? Thanks!

Looking forward to see the changes!

*(deletion) Original app.py

*(Updation) requirements

*(modification) make changes to preserve the CLI nature and only make
the backend changes
*(Fix) Add init files in python directory, so that the directories are
considered as modules

*(Chek) Make sure that the backend is functioning just like how it used
to
@prakash-2002-ramanathan
Copy link
Author

Hi @omeryusufyagci , have stashed all the changes made in the core, only the changes in backend have been pushed.
The CLI nature of the core and backend communication is currently preserved.
Fixed the imports and removed the old app.py

You can run the new app by going to the backend directory and then running app.py.

@prakash-2002-ramanathan prakash-2002-ramanathan changed the base branch from main to 34-improvement-backend-improvements-for-typing-tests-modularity-and-pathlib October 31, 2024 19:01
Copy link
Owner

@omeryusufyagci omeryusufyagci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @prakash-2002-ramanathan, thanks for the changes!

We're on the right track, but there are still a few things to address. Also your changes on the core are still visible on the PR.

I only reviewed the backend changes, and will review the tests once the backend part is finished.

In addition to the review comments, I'd also like you to keep the app.py at project root, but this will be a minimal entry point for the backend. So 2-3 lines just to instantiate thje backend service and run it. This is to ensure we keep our current ease of use. Please feel free to make the necessary changes to faciliatate this as well.

Many thanks!

backend/media_handler.py Outdated Show resolved Hide resolved
Comment on lines 67 to 82
# Parse the output to get the processed video path (TODO: encapsulate)
for line in result.stdout.splitlines():
if "Video processed successfully" in line:
processed_video_path = line.split(": ", 1)[1].strip()

# Remove any surrounding quotes (TODO: encapsulate)
if processed_video_path.startswith('"') and processed_video_path.endswith('"'):
processed_video_path = processed_video_path[1:-1]
processed_video_path = str(Path(processed_video_path).resolve())
logging.info(f"Processed video path returned: {processed_video_path}")
return processed_video_path

return None
except Exception as e:
logging.error(f"Error running C++ binary: {e}")
return None
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As part of the backend refactor, please address the 2 todo messages in the code. Mainly we want to organize common utils under a utils class/file.

return jsonify(response), 200

@staticmethod
def error(message: str, status_code: int = 400) ->tuple[Response, int]:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe here the return type should be simply a Response type object. Flask understands it carries an error code, but it's not necessary to hint int on the return type.

Comment on lines 8 to 15
def success(message: str, data: dict = None) ->Response:
"""Return a consistent JSON response for success."""
response = {
"status": "success",
"message": message,
"data": data or {}
}
return jsonify(response), 200
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

method names should be descriptive. Please use generate_success_response(), and similar for others. Thinking from the users perspective can help; ResponseHandler.succes() vs ResponseHandler.generate_success_response()

Comment on lines 27 to 36
@staticmethod
def core_data_passer(message: str, data: dict = None) -> str:
"""Return a JSON-formatted response for core components with provided data."""
core_response = {
"status": "success",
"message": message,
"data": data or {}
}
return dumps(core_response)
#You can only use jasonify for http communication to send the data as a string use json.dumps
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make this a bit more clearer? Even for me it's abit difficult to understand what our intention is with this method. Is this a bidirectional serializer, or?.. Both in the method name and the docstring we should be able to immediately understand the purpose of this method.

config.json Outdated
Comment on lines 2 to 5
"deep_filter_path": "MediaProcessor/res/deep-filter-0.5.6-x86_64-unknown-linux-musl",
"deep_filter_tarball_path": "MediaProcessor/res/DeepFilterNet3_ll_onnx.tar.gz",
"deep_filter_encoder_path": "MediaProcessor/res/DeepFilterNet3_ll_onnx/tmp/export/enc.onnx",
"deep_filter_decoder_path": "MediaProcessor/res/DeepFilterNet3_ll_onnx/tmp/export/df_dec.onnx",
"deep_filter_path": "../MediaProcessor/res/deep-filter-0.5.6-x86_64-unknown-linux-musl",
"deep_filter_tarball_path": "../MediaProcessor/res/DeepFilterNet3_ll_onnx.tar.gz",
"deep_filter_enoder_path": "../MediaProcessor/res/DeepFilterNet3_ll_onnx/tmp/export/enc.onnx",
"deep_filter_decoder_path": "../MediaProcessor/res/DeepFilterNet3_ll_onnx/tmp/export/df_dec.onnx",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be handled on the backend side. The config represents the path relation of these files wrt the project root.

@prakash-2002-ramanathan
Copy link
Author

Hi @omeryusufyagci , fixed all your requested changes. Please have a look

Copy link
Owner

@omeryusufyagci omeryusufyagci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @prakash-2002-ramanathan, thanks for the changes. I took a look and there are a few things I'd like you to address. Could you please take a look? Thanks a lot!

Comment on lines 7 to 25
@staticmethod
def generate_success_response(message: str, data: dict = None) ->Response:
"""Return a consistent JSON response for success."""
response = {
"status": "success",
"message": message,
"data": data or {}
}
return jsonify(response), 200

@staticmethod
def generate_error_response(message: str, status_code: int = 400) ->Response:
"""Return a consistent JSON response for errors."""
response = {
"status": "error",
"message": message,
"data": {}
}
return jsonify(response), status_code
Copy link
Owner

@omeryusufyagci omeryusufyagci Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here please add a generate_response() function to hold shared behavior, that is, generating a response.
That new method should take in any data available with the status code, etc. So you would call the generate_response() from generate_success_response(), etc.

Comment on lines 1 to 5
`cd /home/prakash/fast-music-remover`

`PYTHONPATH=$(pwd) python -m unittest discover -s backend/tests -p 'test_*.py`

Run the above two commands to test the code
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should use pytest here, also as a side note, usually you would do:

cd /home/`whoami`/...

which would work for anyone using a Unix-based OS.

Let's go with pytest regardless

if (data.status === "completed") {
document.getElementById("videoPlayer").src = data.video_url;
if (data.status === "success") {
document.getElementById("videoPlayer").src = data.data.video_url;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data.data typo?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No @omeryusufyagci , in the response object that is being returned, the video_url is wrapped inside another data object. Do you have any suggestions on how to improve this ?

I have also asked questions on your other comments. Please answer them too

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also because data is used both in the js and the python side. You could rename the data field of the json object to payload which is quite std to use.

config.json Outdated
@@ -1,7 +1,7 @@
{
"deep_filter_path": "MediaProcessor/res/deep-filter-0.5.6-x86_64-unknown-linux-musl",
"deep_filter_tarball_path": "MediaProcessor/res/DeepFilterNet3_ll_onnx.tar.gz",
"deep_filter_encoder_path": "MediaProcessor/res/DeepFilterNet3_ll_onnx/tmp/export/enc.onnx",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

app.py Show resolved Hide resolved
@@ -58,4 +58,4 @@ int main(int argc, char* argv[]) {

std::cout << "Video processed successfully: " << processedMediaPath << std::endl;
return 0;
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please run the clang (with the file available on project root, i.e. .clang-format) for MediaProcessor/src or setup the pre-commit hooks which would do this for you

Comment on lines 20 to 39
@patch("backend.media_handler.yt_dlp.YoutubeDL")
def test_download_media(self, mock_yt_dlp):
# Mock YoutubeDL instance
mock_ydl_instance = MagicMock()

# Mock extract_info for download=False and download=True
mock_ydl_instance.extract_info.side_effect = [
{"title": "Test Video", "ext": "mp4"}, # For download=False
{"ext": "mp4"} # For download=True
]

# Set the return value when instantiating YoutubeDL
mock_yt_dl_context_manager = MagicMock()
mock_yt_dl_context_manager.__enter__.return_value = mock_ydl_instance
mock_yt_dlp.return_value = mock_yt_dl_context_manager

# Test download_media method
result = MediaHandler.download_media(self.video_url, self.base_directory)
expected_file = Path(self.base_directory) / "Test_Video.mp4"
self.assertEqual(result, str(expected_file.resolve()))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes due to rate limiting yt-dlp takes minutes to finish. Please add a timeout, e.g. 10 seconds, here for getting the info which should be rather fast irrespective of media size.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey do you want to add the time out on the main code too ?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes would be good

Comment on lines 41 to 55
@patch("backend.media_handler.subprocess.run")
def test_process_with_media_processor_success(self, mock_subprocess_run):
# Mock subprocess.run to simulate a successful response
mock_subprocess_run.return_value = MagicMock(
returncode=0,
stdout="Video processed successfully: /path/to/processed_video.mp4",
stderr=""

)
# Test process_with_media_processor
result = MediaHandler.process_with_media_processor(
self.video_path, self.base_directory, self.config_path
)

self.assertEqual(result, "/path/to/processed_video.mp4")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's happening here with the assertion on l.55?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can see here for an example on how a similar test was made for the core. You could reuse the same utility, just need to convert it to python code.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand, would appreciate if you could elaborate. In the files that you have attached, you did not mock anything, you created a processed file and compared byte by byte is that what you expect ?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, /path/to/... has to be a real path first. Regarding what we did in the core: we have a known truth, i.e. a processed video that we manually verified by watching it as a reference of what proper filtering is doing to a known source. We then do a byte by byte comparison with that known truth to see if functionality is in tact.

Byte-by-byte is definitely not the best way to do this, but it was a fast solution that actually works with a good tolerance. To be improved, but better than nothing

def setUp(self):
"""Set up base directory and mock paths for testing."""
self.base_directory = tempfile.mkdtemp()
self.video_url = "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't place data inside of the code. That later would require code updates for simple test data changes. These should be placed in appropriate test data files. You can see what has been done for the MediaProcessor for examples.

Comment on lines 13 to 14
self.config_path = "/path/to/config.json"
self.video_path = "/path/to/video.mp4"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what these mean?

@prakash-2002-ramanathan
Copy link
Author

prakash-2002-ramanathan commented Nov 14, 2024

Hey @omeryusufyagci , could you explain how do you want this
download_media function to be tested. Is it ok for you if I mock the yt_dlp.YoutubeDL() function? I tried to download the youtube video and check it with a previously downloaded video. But sometimes the byte by byte comparison causes problem., since the content on youtube is dynamic and changes everytime when it's downloaded

@omeryusufyagci
Copy link
Owner

Hey @omeryusufyagci , could you explain how do you want this download_media function to be tested. Is it ok for you if I mock the yt_dlp.YoutubeDL() function? I tried to download the youtube video and check it with a previously downloaded video. But sometimes the byte by byte comparison causes problem., since the content on youtube is dynamic and changes everytime when it's downloaded

Hey @prakash-2002-ramanathan, you could try to test via the extracted metadata instead of the video itself to test if yt-dlp works fine. If we can get the metadata, we should likely be OK. You could, however, mock it as well. Byte-by-byte is not the best, I know... I'll open an issue about that. Thanks!

@prakash-2002-ramanathan
Copy link
Author

Hello @omeryusufyagci, I have addressed all your requested changes. Please have a look

Copy link
Owner

@omeryusufyagci omeryusufyagci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @prakash-2002-ramanathan, thank you for the updates.

I've also updated branch 34-... to reduce the amount of clutter as it was getting hard to review.

I noticed the changes for core are still present in this PR. Could you please rebase your commit history to remove them from here? We should focus on those on a separate PR after this is finished.

Regarding the updates, in addition to the comments I left, here are some more global remarks:

  • Please try to ensure the test names reflect exactly what we're testing for and what do we expect in return.
  • Try to utilize more the features of the testing framework such as fixtures, and parameters. This is to make it easier to extend these tests later.

I appreciate your effort on this and looking forward to merge it soon! In the meantime, have you noticed we have a discord now? If you have any questions, or just want to say hello, pass by!

from typing import Optional


class Utils:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all member functions are static, there doesn't need to be a class. Please revise it as a module and update the implementations accordingly

Comment on lines +17 to +20
DOWNLOAD_URL = "https://www.youtube.com/watch?v=TK4N5W22Gts"
DEEP_FILTER_PATH = "MediaProcessor/res/deep-filter-0.5.6-x86_64-unknown-linux-musl"
TEST_VIDEO_PATH = "Media/test.webm"
PROCESSED_VIDEO_PATH = "Media/test_processed.mp4"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move these to the top of the file

Comment on lines +8 to +17
@pytest.fixture
def test_directory():
"""Fixture to set up and tear down a temporary directory."""
test_dir = tempfile.mkdtemp()
yield test_dir
for root, dirs, files in os.walk(test_dir, topdown=False):
for name in files:
os.remove(os.path.join(root, name))
for name in dirs:
os.rmdir(os.path.join(root, name))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please strictly use pathlib.Path instead of os everywhere for uniformity.
Also let's try to get rid of those loops. shutil.rmtree() should work

Comment on lines +54 to +57
def test_sanitize_filename():
# Test filename sanitization
assert Utils.sanitize_filename("test@file!.mp4") == "test_file_.mp4"
assert Utils.sanitize_filename("valid_file-name.mp4") == "valid_file-name.mp4"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For here and possibly other relevant tests, please use @pytest.mark.parametrize() to define the test cases which would allow using a single assertion per test function. In general, we should try to avoid having multiple assertions per test.

Comment on lines +74 to +85
def test_get_processed_video_path():
# Test with a valid string that includes the correct success message
valid_processed_string = Utils.get_processed_video_path(
["Video processed successfully: valid_file-name"]
)
assert valid_processed_string == "valid_file-name"

# Test with an invalid string that does not contain the success message
invalid_processed_string = Utils.get_processed_video_path(
["Video processed successful"]
)
assert invalid_processed_string is None
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is an important test, could you please add better coverage against edge cases. Empty list, strangely formatted messages, etc.

from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeoutError
from pathlib import Path
from backend.media_handler import MediaHandler
import utils as utils
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intentional?

Comment on lines +34 to +36
BASE_DIR = Path(__file__).resolve().parents[2]
TEST_DIR = Path(__file__).parent.resolve()
DEEPFILTERNET_PATH = str((BASE_DIR / utils.DEEP_FILTER_PATH).resolve())
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to have a pytest.fixture to handle these.

@omeryusufyagci
Copy link
Owner

@prakash-2002-ramanathan please note the update on the branch you're targeting that was required to add urgent functionalities. Please ensure the added functionality is kept in this PR. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend feature New feature or request iteration Requires further iteration for approval
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants