Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More operators please. #8042

Closed
taffmak opened this issue Feb 22, 2017 · 13 comments
Closed

More operators please. #8042

taffmak opened this issue Feb 22, 2017 · 13 comments
Labels
area/functions area/influxql Issues related to InfluxQL query language kind/feature-request

Comments

@taffmak
Copy link

taffmak commented Feb 22, 2017

Feature Request

Proposal:
Implement the full complement of arithmetic operators including logical bitwise operators for and, or, not (invert) and xor.

Current behavior:
Not currently supported.

Desired behavior:
Ability to perform bitwise logical operations on integer field values.

Use case:
Enable the use of field values as bit fields that can be easily queried with bitwise logical operators.
We are instrumenting our applications and streaming measurements through a processing application in real-time. The application outputs measurements to influx in addition to identifying issues in real-time. E.g. stalling queues, counter resets - it can also detect issues over several measurements that are difficult to query for, such as a queue where latency is increasing over some period of time.

We are raising events in our system when these things occur as they are important to us in real-time.

It would also useful to pass through to influx the live detected status as bits within a field so we can query back these occurrences after the fact.

e.g. We can query the number of times per month that a queue stalls, is reset or it's length exceeds a threshold (where the thresholds may be different for each series and are not known to influx).

@e-dard e-dard added area/functions area/influxql Issues related to InfluxQL query language kind/feature-request labels Feb 27, 2017
@Tomcat-Engineering
Copy link
Contributor

Like you, we use InfluxDB to store lots of status words so this could be useful for us too.

There is an issue with implementing bitwise-not as an operator: what word size do we assume?

E.g. what should "NOT 0" return? If the word size is 8 bits, we should return 0b11111111 (i.e. 0xFF, decimal 255) but if the word size is bigger obviously we would need to return something different.

If implementing a generic operator we would have to assume that the word size is the biggest that InfluxDB supports, 64 bits, but unfortunately because integers in InfluxDB are defined as the signed int64 type, "NOT 0" would return binary of 64 ones which is interpreted and displayed as minus one! Neither 64 binary ones nor a negative number is going to be very helpful to most users.

I guess you could always AND the result with a bitmask representing the word width to get it back to a reasonable size and get rid of the negatives (it is only the very highest bit out of the 64 which makes things be interpreted as negative), but it would be a bit annoying to have to do something like (NOT x) AND 0xFF every time.

One option is to define bitwise-not as a function rather than an operator, e.g. bitwise_not(value, 8) where the second argument is the word width in bits. Would that be acceptable and better than having to do (NOT x) AND 0xFF every time?

Happily this only applies to NOT - the other bitwise operators that you mention (AND, OR, XOR) can all just operate on standard integers and you will get sensibly-sized non-negative results because all the unused higher bits of the result will be zero.

@Tomcat-Engineering
Copy link
Contributor

I might have a go at implementing these.

Everything below would only operate on integer data and integer constants.

I propose using the standard Go symbols of &, | and ^ for bitwise AND, bitwise OR and bitwise XOR respectively.

I propose to make bitwise_not(value, width) a function as discussed above.

I guess I might as well implement left and right bitshift (<< and >>) while I'm at it, though they will involve some care because we will need to re-interpret the bits as a uint64 before we can use the built-in Go operators (they do odd things to signed numbers which we don't want).

@jsternberg (and other InfluxData people) do you have any comments on my proposed use of these symbols/names before I get stuck in?

@jsternberg
Copy link
Contributor

I've got no problem with the standard Go symbols. CC @pauldix to make sure he's OK with this change to the query language. I personally think the operators are a good addition.

I don't know if the functions would be a good idea. While I have a desire to introduce math functions, we don't currently have any raw math functions. Math operators have much greater flexibility here. If you can figure out a way to do the math functions without running into a roadblock on some of the assumptions the query engine has about the Call node in the AST, that would be even better since it would allow the creation of abs() and pow(). You might have to dive pretty deep into the validator and query engine though to remove the assumption that a Call node indicates an aggregate exists.

@Tomcat-Engineering
Copy link
Contributor

@jsternberg (sorry, this is slightly off topic to this thread but relevant to your last comment!)

I started implementing loads of math functions like pow and abs a while ago.

Implementing the functions turned out to be surprisingly easy, but this was before subqueries existed so I thought that these new functions should also support nesting both inside and outside other functions (e.g. mean(abs(value)) and pow(mean(value), 2). As you anticipated, this required fairly major surgery to the query parser which I thought might take a lot of effort to get approved...

It then occurred to me that these nested queries were not running as efficiently as they could, because mean(abs(value)) could in theory be passed out to the shards for individual computation in the same way that mean(value) is. This turned out to be quite hard, and at this point I ran out of time.

Now that we have got subqueries, is it worth me just implementing the math functions without the complicated nesting stuff? e.g. you would implement the above examples as:

SELECT mean(abs) FROM (SELECT abs(value) FROM ...)

and

SELECT pow(mean, 2) FROM (SELECT mean(value) FROM ...)

This is a bit clunky, and would not give the super-optimised efficiency which I was trying to achieve, but it would at least give us a load of new functions without rewriting half of InfluxDB! The functions I wrote before were: "abs", "exp", "log", "log10", "sqrt", "asin", "acos", "atan", "sin", "cos", "tan", "pow". We can now add "bitwise_not" to this list.

Is it worth implementing this simple approach, or do we need nesting to work for these functions?

@Tomcat-Engineering
Copy link
Contributor

I have created a pull request for the basic operators.

I came up with a cunning solution to bitwise-not which doesn't involve creating an extra function and neatly allows the user to tell us the word-width they are using. I think any user who is playing with bitwise operators will be able to understand it. Avoiding a function call has the added advantage of being able to nest this with other operations, which will commonly be required, without having to use subqueries.

I haven't implemented the bitshift operators (<< and >>) because nobody has actually requested them and we'd need to debate the correct behaviour in the face of unknown user word-width.

My operators don't support negative inputs, as discussed on the pull request. It seems unlikely that anyone would have pushed a negative number into the database to indicate a bitfield with the highest possible (64th) bit set, so I don't think this restriction is going to affect any real-world cases.

Finally, when writing these bitfield queries it would be much easier if we could use hexadecimal integer constants like 0xFF. Is there any appetite for adding them to InfluxQL?

@taffmak
Copy link
Author

taffmak commented Mar 19, 2017 via email

@taffmak
Copy link
Author

taffmak commented Mar 20, 2017

Why do the operators not support -ve integers?
If you are treating values as 64 binary bits why do you care what value the top most bit takes? And why is 0 ok but 1 for some reason is not valid?
Personally I hope not to have a need for 63 nor 64 bits so it's a moot point for me - I just don't see why any of the 64 bits should be treated differently to the others - presumably that involves additional (pointless & wasteful) code paths.

@Tomcat-Engineering
Copy link
Contributor

@taffmak

Allowing the operators to support -ve integers would make the results directly dependent on InfluxDB's internal representation of integers, which at the moment happens to be twos-complement int64. If in the future InfluxDB decided to upgrade its integers to a hypothetical int128 datatype then the results would change.

We therefore get a much cleaner interface if we avoid negative numbers. The user doesn't have to know anything about the internal representation.

I guess it is slightly annoying to only be able to support 63-bit bitfields rather than 64, but like you I don't need anything like this width in my applications. In practice I think it would be very rare for users to store negative integers representing 64-bit bitfields in InfluxDB. They are much more likely to try pushing unsigned values in and discover that those with the top bit set trigger an out-of-range error.

@taffmak
Copy link
Author

taffmak commented Mar 21, 2017

@Tomcat-Engineering
I have transitioned from 8 to 16 to 32 and to 64 bit software and I cannot recall encountering any such issues with bitwise operations.
Is this really a problem and if so how has it been overcome by all other languages without limiting bitwise operators?

Allowing the operators to support -ve integers would make the results directly dependent on InfluxDB's internal representation of integers, which at the moment happens to be twos-complement int64.

I disagree - it actually makes them consistent with every language and CPU in mainstream use.

If in the future InfluxDB decided to upgrade its integers to a hypothetical int128 datatype then the results would change.

Of course such fundamental changes will introduce changes - summation that now overflows would also change. If bitwise operators do just that, i.e. simply operate on bits, then at least every single bit in the current integer will behave exactly the same in both cases.

We therefore get a much cleaner interface if we avoid negative numbers. The user doesn't have to know anything about the internal representation.

Not true - users will always make assumptions, run tests or look things up - especially using bit fields it is imperative to know how many bits one is dealing with 8, 16, 32, 64 etc. (note that wasn't 7, 15, 31 or 63!)
Rejecting -ve integers actually forces users to know how many bits there are and that Influx uses 2's complement 64 bits in order to avoid using the top bit.

I urge you take the path of least surprise - in this case just operate on all bits like every other language and CPU. I believe that most devs encountering bitwise operators would expect the same as experience gives them no reason to suspect anything unusual.

The real problem is the lack of a hex representation - perhaps it would be better to invest the time/effort in developing hex formatting for output and literal inputs?

@Tomcat-Engineering
Copy link
Contributor

Pull request has been merged, can we close this now?

@taffmak
Copy link
Author

taffmak commented Apr 11, 2017

Hi, thanks for implementing my request - it is really very much appreciated.
How can I find which version this change has gone into and when a download may become available?

Best regards
Richard.

@taffmak taffmak closed this as completed Apr 11, 2017
@Tomcat-Engineering
Copy link
Contributor

Looks like it will be in version 1.3, but I'm not sure when they are planning to release that.

@taffmak
Copy link
Author

taffmak commented Apr 12, 2017

Many thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/functions area/influxql Issues related to InfluxQL query language kind/feature-request
Projects
None yet
Development

No branches or pull requests

4 participants