Video inference using NodeJS #108
Replies: 6 comments 19 replies
-
That's exactly why I don't explicitly enable video optimizations just for these input types:

```javascript
if (input && this.config.videoOptimized && (
  (typeof HTMLImageElement !== 'undefined' && input instanceof HTMLImageElement)
  || (typeof Image !== 'undefined' && input instanceof Image)
  || (typeof ImageData !== 'undefined' && input instanceof ImageData)
  || (typeof ImageBitmap !== 'undefined' && input instanceof ImageBitmap))
) {
  log('disabling video optimization');
  previousVideoOptimized = this.config.videoOptimized;
  this.config.videoOptimized = false;
}
```

so if you parse the video stream and read frames in
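Note that the snippet above saves the flag in `previousVideoOptimized` so it can be restored after the static input is processed. A minimal sketch of that save/disable/restore pattern (my illustration with a made-up `detectStillImage` helper, not code from the library):

```javascript
// Hypothetical sketch: the optimization flag is saved before detection and
// restored afterwards, so an occasional still image does not permanently
// turn video optimization off for subsequent video frames.
const config = { videoOptimized: true };

function detectStillImage(input, detect) {
  const previousVideoOptimized = config.videoOptimized; // save current flag
  config.videoOptimized = false; // disable for this static input
  const result = detect(input); // run detection with optimization off
  config.videoOptimized = previousVideoOptimized; // restore for video frames
  return result;
}

const res = detectStillImage('frame', (input) => ({ input, videoOptimized: config.videoOptimized }));
console.log(res.videoOptimized, config.videoOptimized); // false true
```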
-
go for it! the only thing i'm worried about is extra dependencies - you need a library to parse video in
-
accessing webcam from
so let's look at
having said all that, i actually don't have an environment with nodejs and a webcam, so this is written off the top of my head without ANY testing - take it as a hint more than as a solution!

```javascript
const util = require('util');
const nodeWebCam = require('node-webcam');
const tf = require('@tensorflow/tfjs-node');
const Human = require('@vladmandic/human/dist/human.node.js').default;

// options for node-webcam
const optionsCamera = {
  callbackReturn: 'buffer', // return whatever `fswebcam` writes to disk with no additional processing, so it's fastest
  saveShots: false, // don't save the processed frame to disk; note that a temp file is still created by fswebcam, hence the recommendation for tmpfs
};

// options for human
const optionsHuman = {
  backend: 'tensorflow',
  modelBasePath: 'file://node_modules/@vladmandic/human/models/',
};

const camera = nodeWebCam.create(optionsCamera);
const capture = util.promisify(camera.capture);
const human = new Human(optionsHuman);
const results = [];

const buffer2tensor = (buffer) => human.tf.tidy(() => {
  const decode = human.tf.node.decodeImage(buffer); // decode with the image's native channel count
  let expand;
  if (decode.shape[2] === 4) { // input is in rgba format, need to convert to rgb
    const channels = human.tf.split(decode, 4, 2); // split rgba into separate channels
    const rgb = human.tf.stack([channels[0], channels[1], channels[2]], 2); // stack channels back to rgb and drop alpha
    expand = human.tf.reshape(rgb, [1, decode.shape[0], decode.shape[1], 3]); // move the extra dim from the end of the tensor and use it as batch number instead
  } else {
    expand = human.tf.expandDims(decode, 0); // input is rgb so use as-is
  }
  const cast = human.tf.cast(expand, 'float32');
  return cast;
});

async function process() {
  // trigger next frame every 5 sec
  // triggered here before the actual capture and detection since we assume it will complete in less than 5 sec,
  // so it's as close as possible to a real 5 sec and not 5 sec + detection time
  // if there is a chance of a race where detection takes longer than the loop interval, the trigger should be at the end of the function instead
  setTimeout(() => process(), 5000);
  const buffer = await capture(); // gets the (default) jpeg data from the webcam
  const tensor = buffer2tensor(buffer); // create tensor from image buffer
  const res = await human.detect(tensor); // run detection
  tensor.dispose(); // release the input tensor once detection is done
  // do whatever here with res
  // or just append it to a results array that accumulates all processed results over time
  results.push(res);
  // alternatively to triggering every 5 sec, simply trigger the next frame as fast as possible:
  // setImmediate(() => process());
}

console.log('Human:', human.version);
console.log('TFJS:', tf.version_core);
process();
```

note#1: if it were up to me, i'd look into using software that has broad format and device support and then re-broadcasts the stream in a well-known format (not writing to disk), for example

note#2: for a typical IP camera (not webcam) that uses the RTSP protocol (such as most security cameras), I actually wrote a utility that reads the stream from the IP camera and creates a WebRTC stream that Human can connect to (I've added WebRTC support recently)
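As an aside, the rgba-to-rgb handling inside `buffer2tensor` can be illustrated on a plain flat pixel array, without tfjs installed (my illustration, not part of the original code):

```javascript
// Illustrative only: same logic as the tensor split/stack branch above,
// but on a flat [r,g,b,a, r,g,b,a, ...] array - keep r,g,b and drop alpha.
function rgbaToRgb(pixels) {
  const rgb = [];
  for (let i = 0; i < pixels.length; i += 4) {
    rgb.push(pixels[i], pixels[i + 1], pixels[i + 2]); // copy r,g,b, skip a
  }
  return rgb;
}

console.log(rgbaToRgb([10, 20, 30, 255, 40, 50, 60, 128])); // [ 10, 20, 30, 40, 50, 60 ]
```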
-
Hey, so it turns out fswebcam's --loop only takes intervals in seconds. So using
-
i just did a quick prototype using https://github.com/vladmandic/human/blob/main/demo/node-video.js - the main trick was to use motion jpeg as the output format, which is then easily parsed for frame start/end markers, giving one jpeg per frame.
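For context, the frame start/end markers in a motion-jpeg stream are the standard JPEG SOI (`0xFF 0xD8`) and EOI (`0xFF 0xD9`) bytes. A minimal sketch of splitting a buffer on those markers (my own illustration, not the actual demo code):

```javascript
// Minimal sketch: scan a motion-jpeg byte stream for JPEG SOI (ff d8) and
// EOI (ff d9) markers and return one Buffer per complete frame.
function splitMjpeg(buf) {
  const frames = [];
  let start = -1;
  for (let i = 0; i < buf.length - 1; i++) {
    if (buf[i] === 0xff && buf[i + 1] === 0xd8) start = i; // frame start marker
    if (start >= 0 && buf[i] === 0xff && buf[i + 1] === 0xd9) { // frame end marker
      frames.push(buf.slice(start, i + 2)); // include the eoi bytes
      start = -1; // wait for the next soi
    }
  }
  return frames;
}

// two fake "frames" back to back
const stream = Buffer.from([0xff, 0xd8, 1, 2, 0xff, 0xd9, 0xff, 0xd8, 3, 0xff, 0xd9]);
console.log(splitMjpeg(stream).length); // 2
```

This works because within JPEG entropy-coded data, `0xFF` bytes are escaped (byte-stuffed), so the EOI sequence reliably marks the end of a frame.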
-
Rotation is disabled in NodeJS anyhow due to a missing function
Correct. And none,
for multiple faces, the client demo also does this; behind the scenes, the match function is just a loop around
-
I was thinking of writing a video demo using nodeJS as the client side. Do you know of a tool that can capture a video stream from nodeJS and feed it into human for detection? I could only find tools that have a callback on each frame capture, and from what I understood about the API, to leverage the aggressive caching I need to pass in an HTMLVideoElement. I might be wrong, so it'd be great if you could point out how this could be done!