Video inference using NodeJS #108
Replies: 6 comments 19 replies
-
That's exactly why I don't explicitly enable video optimizations just for these input types:

```javascript
if (input && this.config.videoOptimized && (
  (typeof HTMLImageElement !== 'undefined' && input instanceof HTMLImageElement)
  || (typeof Image !== 'undefined' && input instanceof Image)
  || (typeof ImageData !== 'undefined' && input instanceof ImageData)
  || (typeof ImageBitmap !== 'undefined' && input instanceof ImageBitmap))
) {
  log('disabling video optimization');
  previousVideoOptimized = this.config.videoOptimized;
  this.config.videoOptimized = false;
}
```

so if you parse the video stream and read frames in
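Note that the snippet above saves the flag in `previousVideoOptimized` so it can be restored after the static input is processed. A minimal sketch of that save/disable/restore pattern (my illustration with a made-up `detectStillImage` helper, not code from the library):

```javascript
// Hypothetical sketch: the optimization flag is saved before detection and
// restored afterwards, so an occasional still image does not permanently
// turn video optimization off for subsequent video frames.
const config = { videoOptimized: true };

function detectStillImage(input, detect) {
  const previousVideoOptimized = config.videoOptimized; // save current flag
  config.videoOptimized = false; // disable for this static input
  const result = detect(input); // run detection with optimization off
  config.videoOptimized = previousVideoOptimized; // restore for video frames
  return result;
}

const res = detectStillImage('frame', (input) => ({ input, videoOptimized: config.videoOptimized }));
console.log(res.videoOptimized, config.videoOptimized); // false true
```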
-
go for it! the only thing i'm worried about is extra dependencies - you need a library to parse video in
-
accessing webcam from
so let's look at
having said all that, i actually don't have an environment with nodejs and a webcam, so this is written off the top of my head without ANY testing - take it as a hint more than as a solution!

```javascript
const util = require('util');
const nodeWebCam = require('node-webcam');
const tf = require('@tensorflow/tfjs-node');
const Human = require('@vladmandic/human/dist/human.node.js').default;

// options for node-webcam
const optionsCamera = {
  callbackReturn: 'buffer', // return whatever `fswebcam` writes to disk with no additional processing, so it's fastest
  saveShots: false, // don't save the processed frame to disk; note that a temp file is still created by fswebcam, hence the recommendation for tmpfs
};

// options for human
const optionsHuman = {
  backend: 'tensorflow',
  modelBasePath: 'file://node_modules/@vladmandic/human/models/',
};

const camera = nodeWebCam.create(optionsCamera);
const capture = util.promisify(camera.capture);
const human = new Human(optionsHuman);
const results = [];

const buffer2tensor = (buffer) => human.tf.tidy(() => {
  const decode = human.tf.node.decodeImage(buffer); // decode with the image's native channel count
  let expand;
  if (decode.shape[2] === 4) { // input is in rgba format, need to convert to rgb
    const channels = human.tf.split(decode, 4, 2); // split rgba into separate channels
    const rgb = human.tf.stack([channels[0], channels[1], channels[2]], 2); // stack channels back to rgb and drop alpha
    expand = human.tf.reshape(rgb, [1, decode.shape[0], decode.shape[1], 3]); // move the extra dim from the end of the tensor and use it as batch number instead
  } else {
    expand = human.tf.expandDims(decode, 0); // input is rgb so use as-is
  }
  const cast = human.tf.cast(expand, 'float32');
  return cast;
});

async function process() {
  // trigger next frame every 5 sec
  // triggered here before the actual capture and detection since we assume it will complete in less than 5 sec,
  // so it's as close as possible to a real 5 sec and not 5 sec + detection time
  // if there is a chance of a race where detection takes longer than the loop interval, the trigger should be at the end of the function instead
  setTimeout(() => process(), 5000);
  const buffer = await capture(); // gets the (default) jpeg data from the webcam
  const tensor = buffer2tensor(buffer); // create tensor from image buffer
  const res = await human.detect(tensor); // run detection
  tensor.dispose(); // release the input tensor once detection is done
  // do whatever here with res
  // or just append it to a results array that accumulates all processed results over time
  results.push(res);
  // alternatively to triggering every 5 sec, simply trigger the next frame as fast as possible:
  // setImmediate(() => process());
}

console.log('Human:', human.version);
console.log('TFJS:', tf.version_core);
process();
```

note#1: if it were up to me, i'd look into using software that has broad format and device support and then re-broadcasts the stream in a well-known format (not writing to disk), for example

note#2: for a typical IP camera (not webcam) that uses the RTSP protocol (such as most security cameras), I actually wrote a utility that reads the stream from the IP camera and creates a WebRTC stream that Human can connect to (I've added WebRTC support recently)
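As an aside, the rgba-to-rgb handling inside `buffer2tensor` can be illustrated on a plain flat pixel array, without tfjs installed (my illustration, not part of the original code):

```javascript
// Illustrative only: same logic as the tensor split/stack branch above,
// but on a flat [r,g,b,a, r,g,b,a, ...] array - keep r,g,b and drop alpha.
function rgbaToRgb(pixels) {
  const rgb = [];
  for (let i = 0; i < pixels.length; i += 4) {
    rgb.push(pixels[i], pixels[i + 1], pixels[i + 2]); // copy r,g,b, skip a
  }
  return rgb;
}

console.log(rgbaToRgb([10, 20, 30, 255, 40, 50, 60, 128])); // [ 10, 20, 30, 40, 50, 60 ]
```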
-
Hey, so it turns out fswebcam's --loop only takes intervals in seconds. So using
-
i just did a quick prototype using https://github.com/vladmandic/human/blob/main/demo/node-video.js - the main trick was to use motion jpeg as the output format, which is then easily parsed for frame start/end markers, giving one jpeg per frame.
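For context, the frame start/end markers in a motion-jpeg stream are the standard JPEG SOI (`0xFF 0xD8`) and EOI (`0xFF 0xD9`) bytes. A minimal sketch of splitting a buffer on those markers (my own illustration, not the actual demo code):

```javascript
// Minimal sketch: scan a motion-jpeg byte stream for JPEG SOI (ff d8) and
// EOI (ff d9) markers and return one Buffer per complete frame.
function splitMjpeg(buf) {
  const frames = [];
  let start = -1;
  for (let i = 0; i < buf.length - 1; i++) {
    if (buf[i] === 0xff && buf[i + 1] === 0xd8) start = i; // frame start marker
    if (start >= 0 && buf[i] === 0xff && buf[i + 1] === 0xd9) { // frame end marker
      frames.push(buf.slice(start, i + 2)); // include the eoi bytes
      start = -1; // wait for the next soi
    }
  }
  return frames;
}

// two fake "frames" back to back
const stream = Buffer.from([0xff, 0xd8, 1, 2, 0xff, 0xd9, 0xff, 0xd8, 3, 0xff, 0xd9]);
console.log(splitMjpeg(stream).length); // 2
```

This works because within JPEG entropy-coded data, `0xFF` bytes are escaped (byte-stuffed), so the EOI sequence reliably marks the end of a frame.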
-
Rotation is disabled in NodeJS anyhow due to a missing function
Correct. And none,
for multiple faces, the client demo also does this; behind the scenes, the match function is just a loop around
-
I was thinking of writing a video demo using nodeJS as the client side. Do you know of a tool that can capture a video stream from nodeJS and feed it into human for detection? I could only find tools that have a callback on each frame capture, and from what I understood about the API, to leverage the aggressive caching I need to pass in an HTMLVideoElement. I might be wrong, so it'd be great if you could point out how this could be done!