Rectangle 27 0

Faster method of evaluating a boolean expression as a string in Python?


import ast

class ExpressionEvaluator:
    def __init__ (self, rawExpression):
        self.raw = rawExpression
        self.ast = ast.parse(rawExpression)

    def run (self):
        return self.evaluate(self.ast.body[0])

    def evaluate (self, expr):
        if isinstance(expr, ast.Expr):
            return self.evaluate(expr.value)
        elif isinstance(expr, ast.Name):
            return self.evaluateName(expr)
        elif isinstance(expr, ast.UnaryOp):
            if isinstance(expr.op, ast.Invert):
                return self.evaluateInvert(expr)
            else:
                raise Exception('Unknown unary operation {}'.format(expr.op))
        elif isinstance(expr, ast.BinOp):
            if isinstance(expr.op, ast.BitOr):
                return self.evaluateBitOr(expr.left, expr.right)
            elif isinstance(expr.op, ast.BitAnd):
                return self.evaluateBitAnd(expr.left, expr.right)
            elif isinstance(expr.op, ast.BitXor):
                return self.evaluateBitXor(expr.left, expr.right)
            else:
                raise Exception('Unknown binary operation {}'.format(expr.op))
        else:
            raise Exception('Unknown expression {}'.format(expr))

    def evaluateName (self, expr):
        return { expr.id: 1 }

    def evaluateInvert (self, expr):
        return self.evaluate(expr.operand)

    def evaluateBitOr (self, left, right):
        return self.join(self.evaluate(left), .5, self.evaluate(right), .5)

    def evaluateBitAnd (self, left, right):
        return self.join(self.evaluate(left), .5, self.evaluate(right), .5)

    def evaluateBitXor (self, left, right):
        return self.join(self.evaluate(left), .5, self.evaluate(right), .5)

    def join (self, a, ratioA, b, ratioB):
        d = { k: v * ratioA for k, v in a.items() }
        for k, v in b.items():
            if k in d:
                d[k] += v * ratioB
            else:
                d[k] = v * ratioB
        return d

expr = '((A&B)|(C&D)^~E)'
ee = ExpressionEvaluator(expr)
print(ee.run())
# > {'A': 0.25, 'C': 0.125, 'B': 0.25, 'E': 0.25, 'D': 0.125}

Code analysis usually works on the abstract syntax tree (or an augmented version of that). Python offers code analysis and abstract syntax tree generation with its ast module. We can use this to parse the expression and get the AST. Then based on the tree, we can analyze how relevant each part of an expression is for the whole.

I had thought of other ways of solving this problem but it always a lead to a dead end (non-functional code); hence the reason I stuck with the expression analysis. Your response is very insightful though, I will definitely try to modify and implement it to give 'correct' values - EX: expressions such as A | (A & B) and A & (A | B) etc. like you mention. From those expressions it isn't hard to see {'A': 1}, but an expression such as (A & B) | (A & D) would be a little more difficult to arrive at {'A': 0.75, 'B': 0.25, 'D': 0.25}. I'm not sure how I could program that in though.

In any way, lets first look at how you are trying to solve this. Im not exactly sure what gen_rand_bits is supposed to do, so I cant really take that into account. But still, you are essentially trying out every possible combination of variable assignments and see if flipping the value for a single variable changes the outcome of the formula result. Luckily, these are just boolean variables, so you are only looking at 2^N possible combinations. This means you have exponential run time. Now, O(2^N) algorithms are in theory very very bad, while in practice its often somewhat okay to use them (because most have an acceptable average case and execute fast enough). However, being an exhaustive algorithm, you actually have to look at every single combination and cant shortcut. Plus the compilation and value evaluation using Pythons eval is apparently not so fast to make the inefficient algorithm acceptable.

In this case, the evaluation is just based on ocurrence in the expression. I dont explicitely check for semantic effects. So for an expression A | (A & B), I get {'A': 0.75, 'B': 0.25}, although one could argue that semantically B has no relevance at all to the result (making it {'A': 1} instead). This is however something Ill leave for you. As of now, every binary operation is handled identically (each operand getting a relevance of 50%), but that can be of course adjusted to introduce some semantic rules.

Just to make sure that I understand this correctly: Your actual problem is to determine the relative influence of each variable within a boolean expression on the outcome of said expression?

Now, evaluating the relevance of each variable can get quite complicated, but you can do it all by analyzing the syntax tree. I will show you a simple evaluation that supports all boolean operators but will not further check the semantic influence of expressions:

One solution I had thought of previously was actually similar to your answer. I would start at a base expression and build up; to find the new value of A, for example, I would take A from the previous expression and multiply it by the probability the other input side is non-affecting. Take (A|B)&(C|D) I would multiply A * prob( (C|D) == 1) since (C|D) has to be 1 for A to be able to change the output. To handle A appearing on the other side (A|B) & (A|C) I came up with left(A) * prob( (A|D) == 1) + right(A) * prob( (A|B) == 1). This does not work for all cases, ie (A|B|C) & (A|C)

So, this seems to be a classic XY problem. You have an actual problem which is to determine the relative influence of each variable within the a boolean expression. You have attempted to solve this in a rather ineffective way, and now that you actually feel the inefficiency (in both memory usage and run time), you look for ways to improve your solution instead of looking for better ways to solve your original problem.

So, we should look for a different solution. When looking at your solution, one might say that more efficient is not really possible, but when looking at the original problem, we can argue otherwise.

That is what I am calculating but my problem is not with how I calculate it logically but with my use of the python eval built-in to perform evaluating.

This implementation will essentially generate a plain AST for the given expression and the recursively walk through the tree and evaluate the different operators. The big evaluate method just delegates the work to the type specific methods below; its similar to what ast.NodeVisitor does except that we return the analyzation results from each node here. One could augment the nodes instead of returning it instead though.

You essentially want to do things similar to what compilers do as static analysis. You want to look at the source code and analyze it just from there without having to actually evaluate that. As the language you are analyzing is highly restricted (being only a boolean expression with very few operators), this isnt really that hard.

Note
Rectangle 27 0

Faster method of evaluating a boolean expression as a string in Python?


exporession = 1 or 1 and (1 or 1) and 0 or (0 or 0 or 1 and (1 and 0 or 0 or (0 and 0 or 0 and 0 or 1 or 0 and 0 or 0) and 1) or 0 and 1 or 0 and 1 and 0)
expression output= 1
computing time: 0.000158071517944 seconds
import random, re, time

#Step 1: Input your expression as a string
logic_exp = "A|B&(C|D)&E|(F|G|H&(I&J|K|(L&M|N&O|P|Q&R|S)&T)|U&V|W&X&Y)"

#Step 2: Retrieve all the variable names.
#        You can design a rule for naming, and use regex to retrieve them.
#        Here for example, I consider all the single-cap-lettler are variables.
name_regex = re.compile(r"[A-Z]")

#Step 3: Replace each variable with its value. 
#        You could get the value with reading files or keyboard input.
#        Here for example I just use random 0 or 1.
for name in name_regex.findall(logic_exp):
    logic_exp = logic_exp.replace(name, str(random.randrange(2)))

#Step 4: Replace the operators. Python use 'and', 'or' instead of '&', '|' 
logic_exp = logic_exp.replace("&", " and ")
logic_exp = logic_exp.replace("|", " or " )    


#Step 5: interpret the expression with eval(exp) and output its value.
print "exporession =", logic_exp  
print "expression output =",eval(logic_exp)

According to your comment, I see that you are computing all the possible combinations instead of the output at a given input values. If so, it would become a typical NP-complete Boolean satisfiability problem. I don't think there's any algorithm that could make it by a complexity lower than O(2^N). I suggest you to search with the keywords fast algorithm to solve SAT problem, you would find a lot of interesting things.

I see. If you want to calculate all the possible combinations, then it would become a NP-complete problem. I don't think there's any algorithm that could make the complexity lower than O(2^N):(

I wish you solution was faster than mine. You are evaluating 1 expression for only 1 possible combination of input values. For my purposes I would evaluate that expression for all possible combinations which is 2^25 (although my algorithm limits to 2^15). Then for each combination evaluated I check all inputs by inverting their value individually and re-evaluating. a resulting code with similar runs would be for i in range(2**15): for j in range(25): for name in name_regex.findall(exp): exp = exp.replace(name, str(random.randrange(2))) eval(exp) exp = value

In you case, I would suggest a soluation that:

So far the only things I can think of to improve speed is make this massively parallel. It would not improve the runtime of 1 expression but when dealing with 20,000 expressions having 20,000 cores (each core assigned to an expression) would make things very fast (obviously lol) but very expensive. I have it implemented with multiprocessing but 4 cores is not much better than 1 core with that many expressions.

This would be very fast and take very little memory. For a test, I run the example above with 25 input variables:

Yes, it is NP-Complete for the most part, until it hits 15 inputs in my algorithm; then it becomes ((2**15) * N) I know there is no faster way to calculate what I am calculating in terms of the complexity, but I was hoping there was a faster alternative to using 'eval'

You don't have to prepare a static table for computing this. Python is a dynamic language, thus it's able to interpret and run a code by itself during runtime.

eval(exp) and exp = value happen every iteration of the second for loop: for j in range(25). as it stands my program can run through your expression and determine influence of the inputs in approx. 6.7 seconds not including the time it takes to generate a sample of size 2^15 containing random non repeating sequences 25 bits in length. (which would bump runtime up to 7.7 seconds - not very efficient but that is not something I'm currently working on to improve atm)

Note
Rectangle 27 0

Faster method of evaluating a boolean expression as a string in Python?


import ast

class ExpressionEvaluator:
    def __init__ (self, rawExpression):
        self.raw = rawExpression
        self.ast = ast.parse(rawExpression)

    def run (self):
        return self.evaluate(self.ast.body[0])

    def evaluate (self, expr):
        if isinstance(expr, ast.Expr):
            return self.evaluate(expr.value)
        elif isinstance(expr, ast.Name):
            return self.evaluateName(expr)
        elif isinstance(expr, ast.UnaryOp):
            if isinstance(expr.op, ast.Invert):
                return self.evaluateInvert(expr)
            else:
                raise Exception('Unknown unary operation {}'.format(expr.op))
        elif isinstance(expr, ast.BinOp):
            if isinstance(expr.op, ast.BitOr):
                return self.evaluateBitOr(expr.left, expr.right)
            elif isinstance(expr.op, ast.BitAnd):
                return self.evaluateBitAnd(expr.left, expr.right)
            elif isinstance(expr.op, ast.BitXor):
                return self.evaluateBitXor(expr.left, expr.right)
            else:
                raise Exception('Unknown binary operation {}'.format(expr.op))
        else:
            raise Exception('Unknown expression {}'.format(expr))

    def evaluateName (self, expr):
        return { expr.id: 1 }

    def evaluateInvert (self, expr):
        return self.evaluate(expr.operand)

    def evaluateBitOr (self, left, right):
        return self.join(self.evaluate(left), .5, self.evaluate(right), .5)

    def evaluateBitAnd (self, left, right):
        return self.join(self.evaluate(left), .5, self.evaluate(right), .5)

    def evaluateBitXor (self, left, right):
        return self.join(self.evaluate(left), .5, self.evaluate(right), .5)

    def join (self, a, ratioA, b, ratioB):
        d = { k: v * ratioA for k, v in a.items() }
        for k, v in b.items():
            if k in d:
                d[k] += v * ratioB
            else:
                d[k] = v * ratioB
        return d

expr = '((A&B)|(C&D)^~E)'
ee = ExpressionEvaluator(expr)
print(ee.run())
# > {'A': 0.25, 'C': 0.125, 'B': 0.25, 'E': 0.25, 'D': 0.125}

Code analysis usually works on the abstract syntax tree (or an augmented version of that). Python offers code analysis and abstract syntax tree generation with its ast module. We can use this to parse the expression and get the AST. Then based on the tree, we can analyze how relevant each part of an expression is for the whole.

I had thought of other ways of solving this problem but it always a lead to a dead end (non-functional code); hence the reason I stuck with the expression analysis. Your response is very insightful though, I will definitely try to modify and implement it to give 'correct' values - EX: expressions such as A | (A & B) and A & (A | B) etc. like you mention. From those expressions it isn't hard to see {'A': 1}, but an expression such as (A & B) | (A & D) would be a little more difficult to arrive at {'A': 0.75, 'B': 0.25, 'D': 0.25}. I'm not sure how I could program that in though.

In any way, lets first look at how you are trying to solve this. Im not exactly sure what gen_rand_bits is supposed to do, so I cant really take that into account. But still, you are essentially trying out every possible combination of variable assignments and see if flipping the value for a single variable changes the outcome of the formula result. Luckily, these are just boolean variables, so you are only looking at 2^N possible combinations. This means you have exponential run time. Now, O(2^N) algorithms are in theory very very bad, while in practice its often somewhat okay to use them (because most have an acceptable average case and execute fast enough). However, being an exhaustive algorithm, you actually have to look at every single combination and cant shortcut. Plus the compilation and value evaluation using Pythons eval is apparently not so fast to make the inefficient algorithm acceptable.

In this case, the evaluation is just based on ocurrence in the expression. I dont explicitely check for semantic effects. So for an expression A | (A & B), I get {'A': 0.75, 'B': 0.25}, although one could argue that semantically B has no relevance at all to the result (making it {'A': 1} instead). This is however something Ill leave for you. As of now, every binary operation is handled identically (each operand getting a relevance of 50%), but that can be of course adjusted to introduce some semantic rules.

Just to make sure that I understand this correctly: Your actual problem is to determine the relative influence of each variable within a boolean expression on the outcome of said expression?

Now, evaluating the relevance of each variable can get quite complicated, but you can do it all by analyzing the syntax tree. I will show you a simple evaluation that supports all boolean operators but will not further check the semantic influence of expressions:

One solution I had thought of previously was actually similar to your answer. I would start at a base expression and build up; to find the new value of A, for example, I would take A from the previous expression and multiply it by the probability the other input side is non-affecting. Take (A|B)&(C|D) I would multiply A * prob( (C|D) == 1) since (C|D) has to be 1 for A to be able to change the output. To handle A appearing on the other side (A|B) & (A|C) I came up with left(A) * prob( (A|D) == 1) + right(A) * prob( (A|B) == 1). This does not work for all cases, ie (A|B|C) & (A|C)

So, this seems to be a classic XY problem. You have an actual problem which is to determine the relative influence of each variable within the a boolean expression. You have attempted to solve this in a rather ineffective way, and now that you actually feel the inefficiency (in both memory usage and run time), you look for ways to improve your solution instead of looking for better ways to solve your original problem.

So, we should look for a different solution. When looking at your solution, one might say that more efficient is not really possible, but when looking at the original problem, we can argue otherwise.

That is what I am calculating but my problem is not with how I calculate it logically but with my use of the python eval built-in to perform evaluating.

This implementation will essentially generate a plain AST for the given expression and the recursively walk through the tree and evaluate the different operators. The big evaluate method just delegates the work to the type specific methods below; its similar to what ast.NodeVisitor does except that we return the analyzation results from each node here. One could augment the nodes instead of returning it instead though.

You essentially want to do things similar to what compilers do as static analysis. You want to look at the source code and analyze it just from there without having to actually evaluate that. As the language you are analyzing is highly restricted (being only a boolean expression with very few operators), this isnt really that hard.

Note