Rectangle 27 2

Your tuple contains only one element, a dictionary, with one key-value pair. If you wanted to extract that pair, you'd need to address it:

(x1, x2), = q[0].items()

The above expression extracts the first element from the tuple (the dictionary), and calls the dict.items() method on that. The resulting sequence of (key, value) pairs is then assigned to the (key, value), left-hand target, which can only take one such pair.

>>> q = ({'sum(total)': Decimal('89')},)
>>> (x1, x2), = q[0].items()
>>> x1
'sum(total)'
>>> x2
Decimal('89')

You could also just iterate over all key-value pairs in the tuple, or you could use the key name. The latter, for example, would look like this:

decimal_value = q[0]['sum(total)']

You can still use unpacking in the assignment of course:

contained_dictionary, = q
decimal_value = contained_dictionary['sum(total)']

It all depends on what you are trying to achieve, and if the dictionary has different keys or should only ever contain one key-value pair.

Extract specific number from tuple value in Python - Stack Overflow

python tuples
Rectangle 27 234

dict((k, bigdict[k]) for k in ('l', 'm', 'n'))
{k: bigdict[k] for k in ('l', 'm', 'n')}
bigdict
None
{k: bigdict.get(k, None) for k in ('l', 'm', 'n')}

If you're using Python 3, and you only want want keys in the new dict that actually exist in the original one, you can use the fact the view objects implement some set operations:

{k: bigdict[k] for k in bigdict.keys() & {'l', 'm', 'n'}}

Will fail if bigdict does not contain k

A bit harsh to downvote that - it seemed pretty clear from the context to me that it's known that these keys are in the dictionary...

{k: bigdict.get(k,None) for k in ('l', 'm', 'n')} will deal with the situation where a specified key is missing in the source dictionary by setting key in the new dict to None

@MarkLongair Depending on the use case {k: bigdict[k] for k in ('l','m','n') if k in bigdict} might be better, as it only stores the keys that actually have values.

bigdict.keys() & {'l', 'm', 'n'}
bigdict.viewkeys() & {'l', 'm', 'n'}

Extract subset of key-value pairs from Python dictionary object? - Sta...

python dictionary
Rectangle 27 89

What does ** (double star) and * (star) do for parameters

**

*args allows for any number of optional positional arguments (parameters), which will be assigned to a tuple named args.

**kwargs allows for any number of optional keyword arguments (parameters), which will be in a dict named kwargs.

You can (and should) choose any appropriate name, but if the intention is for the arguments to be of non-specific semantics, args and kwargs are standard names.

You can also use *args and **kwargs to pass in parameters from lists (or any iterable) and dicts (or any mapping), respectively.

The function recieving the parameters does not have to know that they are being expanded.

For example, Python 2's xrange does not explicitly expect *args, but since it takes 3 integers as arguments:

>>> x = xrange(3) # create our *args - an iterable of 3 integers
>>> xrange(*x)    # expand here
xrange(0, 2, 2)

As another example, we can use dict expansion in str.format:

>>> foo = 'FOO'
>>> bar = 'BAR'
>>> 'this is foo, {foo} and bar, {bar}'.format(**locals())
'this is foo, FOO and bar, BAR'

You can have keyword only arguments after the *args - for example, here, kwarg2 must be given as a keyword argument - not positionally:

def foo(arg, kwarg=None, *args, kwarg2=None, **kwargs): 
    return arg, kwarg, args, kwarg2, kwargs
>>> foo(1,2,3,4,5,kwarg2='kwarg2', bar='bar', baz='baz')
(1, 2, (3, 4, 5), 'kwarg2', {'bar': 'bar', 'baz': 'baz'})

Also, * can be used by itself to indicate that keyword only arguments follow, without allowing for unlimited positional arguments.

def foo(arg, kwarg=None, *, kwarg2=None, **kwargs): 
    return arg, kwarg, kwarg2, kwargs
kwarg2
>>> foo(1,2,kwarg2='kwarg2', foo='foo', bar='bar')
(1, 2, 'kwarg2', {'foo': 'foo', 'bar': 'bar'})
*args*
>>> foo(1,2,3,4,5, kwarg2='kwarg2', foo='foo', bar='bar')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: foo() takes from 1 to 2 positional arguments 
    but 5 positional arguments (and 1 keyword-only argument) were given
kwarg
def bar(*, kwarg=None): 
    return kwarg

In this example, we see that if we try to pass kwarg positionally, we get an error:

>>> bar('kwarg')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: bar() takes 0 positional arguments but 1 was given

We must explicitly pass the kwarg parameter as a keyword argument.

>>> bar(kwarg='kwarg')
'kwarg'

*args (typically said "star-args") and **kwargs (stars can be implied by saying "kwargs", but be explicit with "double-star kwargs") are common idioms of Python for using the * and ** notation. These specific variable names aren't required (e.g. you could use *foos and **bars), but a departure from convention is likely to enrage your fellow Python coders.

We typically use these when we don't know what our function is going to receive or how many arguments we may be passing, and sometimes even when naming every variable separately would get very messy and redundant (but this is a case where usually explicit is better than implicit).

The following function describes how they can be used, and demonstrates behavior. Note the named b argument will be consumed by the second positional argument before :

def foo(a, b=10, *args, **kwargs):
    '''
    this function takes required argument a, not required keyword argument b
    and any number of unknown positional arguments and keyword arguments after
    '''
    print('a is a required argument, and its value is {0}'.format(a))
    print('b not required, its default value is 10, actual value: {0}'.format(b))
    # we can inspect the unknown arguments we were passed:
    #  - args:
    print('args is of type {0} and length {1}'.format(type(args), len(args)))
    for arg in args:
        print('unknown arg: {0}'.format(arg))
    #  - kwargs:
    print('kwargs is of type {0} and length {1}'.format(type(kwargs),
                                                        len(kwargs)))
    for kw, arg in kwargs.items():
        print('unknown kwarg - kw: {0}, arg: {1}'.format(kw, arg))
    # But we don't have to know anything about them 
    # to pass them to other functions.
    print('Args or kwargs can be passed without knowing what they are.')
    # max can take two or more positional args: max(a, b, c...)
    print('e.g. max(a, b, *args) \n{0}'.format(
      max(a, b, *args))) 
    kweg = 'dict({0})'.format( # named args same as unknown kwargs
      ', '.join('{k}={v}'.format(k=k, v=v) 
                             for k, v in sorted(kwargs.items())))
    print('e.g. dict(**kwargs) (same as {kweg}) returns: \n{0}'.format(
      dict(**kwargs), kweg=kweg))

We can check the online help for the function's signature, with help(foo), which tells us

foo(a, b=10, *args, **kwargs)
foo(1, 2, 3, 4, e=5, f=6, g=7)
a is a required argument, and its value is 1
b not required, its default value is 10, actual value: 2
args is of type <type 'tuple'> and length 2
unknown arg: 3
unknown arg: 4
kwargs is of type <type 'dict'> and length 3
unknown kwarg - kw: e, arg: 5
unknown kwarg - kw: g, arg: 7
unknown kwarg - kw: f, arg: 6
Args or kwargs can be passed without knowing what they are.
e.g. max(a, b, *args) 
4
e.g. dict(**kwargs) (same as dict(e=5, f=6, g=7)) returns: 
{'e': 5, 'g': 7, 'f': 6}

We can also call it using another function, into which we just provide a:

def bar(a):
    b, c, d, e, f = 2, 3, 4, 5, 6
    # dumping every local variable into foo as a keyword argument 
    # by expanding the locals dict:
    foo(**locals())
bar(100)
a is a required argument, and its value is 100
b not required, its default value is 10, actual value: 2
args is of type <type 'tuple'> and length 0
kwargs is of type <type 'dict'> and length 4
unknown kwarg - kw: c, arg: 3
unknown kwarg - kw: e, arg: 5
unknown kwarg - kw: d, arg: 4
unknown kwarg - kw: f, arg: 6
Args or kwargs can be passed without knowing what they are.
e.g. max(a, b, *args) 
100
e.g. dict(**kwargs) (same as dict(c=3, d=4, e=5, f=6)) returns: 
{'c': 3, 'e': 5, 'd': 4, 'f': 6}

Example 3: practical usage in decorators

OK, so maybe we're not seeing the utility yet. So imagine you have several functions with redundant code before and/or after the differentiating code. The following named functions are just pseudo-code for illustrative purposes.

def foo(a, b, c, d=0, e=100):
    # imagine this is much more code than a simple function call
    preprocess() 
    differentiating_process_foo(a,b,c,d,e)
    # imagine this is much more code than a simple function call
    postprocess()

def bar(a, b, c=None, d=0, e=100, f=None):
    preprocess()
    differentiating_process_bar(a,b,c,d,e,f)
    postprocess()

def baz(a, b, c, d, e, f):
    ... and so on

We might be able to handle this differently, but we can certainly extract the redundancy with a decorator, and so our below example demonstrates how *args and **kwargs can be very useful:

def decorator(function):
    '''function to wrap other functions with a pre- and postprocess'''
    @functools.wraps(function) # applies module, name, and docstring to wrapper
    def wrapper(*args, **kwargs):
        # again, imagine this is complicated, but we only write it once!
        preprocess()
        function(*args, **kwargs)
        postprocess()
    return wrapper

And now every wrapped function can be written much more succinctly, as we've factored out the redundancy:

@decorator
def foo(a, b, c, d=0, e=100):
    differentiating_process_foo(a,b,c,d,e)

@decorator
def bar(a, b, c=None, d=0, e=100, f=None):
    differentiating_process_bar(a,b,c,d,e,f)

@decorator
def baz(a, b, c=None, d=0, e=100, f=None, g=None):
    differentiating_process_baz(a,b,c,d,e,f, g)

@decorator
def quux(a, b, c=None, d=0, e=100, f=None, g=None, h=None):
    differentiating_process_quux(a,b,c,d,e,f,g,h)

And by factoring out our code, which *args and **kwargs allows us to do, we reduce lines of code, improve readability and maintainability, and have sole canonical locations for the logic in our program. If we need to change any part of this structure, we have one place in which to make each change.

python - What does ** (double star/asterisk) and * (star/asterisk) do ...

python syntax parameter-passing identifier kwargs
Rectangle 27 84

What does ** (double star) and * (star) do for parameters

**

*args allows for any number of optional positional arguments (parameters), which will be assigned to a tuple named args.

**kwargs allows for any number of optional keyword arguments (parameters), which will be in a dict named kwargs.

You can (and should) choose any appropriate name, but if the intention is for the arguments to be of non-specific semantics, args and kwargs are standard names.

You can also use *args and **kwargs to pass in parameters from lists (or any iterable) and dicts (or any mapping), respectively.

The function recieving the parameters does not have to know that they are being expanded.

For example, Python 2's xrange does not explicitly expect *args, but since it takes 3 integers as arguments:

>>> x = xrange(3) # create our *args - an iterable of 3 integers
>>> xrange(*x)    # expand here
xrange(0, 2, 2)

As another example, we can use dict expansion in str.format:

>>> foo = 'FOO'
>>> bar = 'BAR'
>>> 'this is foo, {foo} and bar, {bar}'.format(**locals())
'this is foo, FOO and bar, BAR'

You can have keyword only arguments after the *args - for example, here, kwarg2 must be given as a keyword argument - not positionally:

def foo(arg, kwarg=None, *args, kwarg2=None, **kwargs): 
    return arg, kwarg, args, kwarg2, kwargs
>>> foo(1,2,3,4,5,kwarg2='kwarg2', bar='bar', baz='baz')
(1, 2, (3, 4, 5), 'kwarg2', {'bar': 'bar', 'baz': 'baz'})

Also, * can be used by itself to indicate that keyword only arguments follow, without allowing for unlimited positional arguments.

def foo(arg, kwarg=None, *, kwarg2=None, **kwargs): 
    return arg, kwarg, kwarg2, kwargs
kwarg2
>>> foo(1,2,kwarg2='kwarg2', foo='foo', bar='bar')
(1, 2, 'kwarg2', {'foo': 'foo', 'bar': 'bar'})
*args*
>>> foo(1,2,3,4,5, kwarg2='kwarg2', foo='foo', bar='bar')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: foo() takes from 1 to 2 positional arguments 
    but 5 positional arguments (and 1 keyword-only argument) were given
kwarg
def bar(*, kwarg=None): 
    return kwarg

In this example, we see that if we try to pass kwarg positionally, we get an error:

>>> bar('kwarg')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: bar() takes 0 positional arguments but 1 was given

We must explicitly pass the kwarg parameter as a keyword argument.

>>> bar(kwarg='kwarg')
'kwarg'

*args (typically said "star-args") and **kwargs (stars can be implied by saying "kwargs", but be explicit with "double-star kwargs") are common idioms of Python for using the * and ** notation. These specific variable names aren't required (e.g. you could use *foos and **bars), but a departure from convention is likely to enrage your fellow Python coders.

We typically use these when we don't know what our function is going to receive or how many arguments we may be passing, and sometimes even when naming every variable separately would get very messy and redundant (but this is a case where usually explicit is better than implicit).

The following function describes how they can be used, and demonstrates behavior. Note the named b argument will be consumed by the second positional argument before :

def foo(a, b=10, *args, **kwargs):
    '''
    this function takes required argument a, not required keyword argument b
    and any number of unknown positional arguments and keyword arguments after
    '''
    print('a is a required argument, and its value is {0}'.format(a))
    print('b not required, its default value is 10, actual value: {0}'.format(b))
    # we can inspect the unknown arguments we were passed:
    #  - args:
    print('args is of type {0} and length {1}'.format(type(args), len(args)))
    for arg in args:
        print('unknown arg: {0}'.format(arg))
    #  - kwargs:
    print('kwargs is of type {0} and length {1}'.format(type(kwargs),
                                                        len(kwargs)))
    for kw, arg in kwargs.items():
        print('unknown kwarg - kw: {0}, arg: {1}'.format(kw, arg))
    # But we don't have to know anything about them 
    # to pass them to other functions.
    print('Args or kwargs can be passed without knowing what they are.')
    # max can take two or more positional args: max(a, b, c...)
    print('e.g. max(a, b, *args) \n{0}'.format(
      max(a, b, *args))) 
    kweg = 'dict({0})'.format( # named args same as unknown kwargs
      ', '.join('{k}={v}'.format(k=k, v=v) 
                             for k, v in sorted(kwargs.items())))
    print('e.g. dict(**kwargs) (same as {kweg}) returns: \n{0}'.format(
      dict(**kwargs), kweg=kweg))

We can check the online help for the function's signature, with help(foo), which tells us

foo(a, b=10, *args, **kwargs)
foo(1, 2, 3, 4, e=5, f=6, g=7)
a is a required argument, and its value is 1
b not required, its default value is 10, actual value: 2
args is of type <type 'tuple'> and length 2
unknown arg: 3
unknown arg: 4
kwargs is of type <type 'dict'> and length 3
unknown kwarg - kw: e, arg: 5
unknown kwarg - kw: g, arg: 7
unknown kwarg - kw: f, arg: 6
Args or kwargs can be passed without knowing what they are.
e.g. max(a, b, *args) 
4
e.g. dict(**kwargs) (same as dict(e=5, f=6, g=7)) returns: 
{'e': 5, 'g': 7, 'f': 6}

We can also call it using another function, into which we just provide a:

def bar(a):
    b, c, d, e, f = 2, 3, 4, 5, 6
    # dumping every local variable into foo as a keyword argument 
    # by expanding the locals dict:
    foo(**locals())
bar(100)
a is a required argument, and its value is 100
b not required, its default value is 10, actual value: 2
args is of type <type 'tuple'> and length 0
kwargs is of type <type 'dict'> and length 4
unknown kwarg - kw: c, arg: 3
unknown kwarg - kw: e, arg: 5
unknown kwarg - kw: d, arg: 4
unknown kwarg - kw: f, arg: 6
Args or kwargs can be passed without knowing what they are.
e.g. max(a, b, *args) 
100
e.g. dict(**kwargs) (same as dict(c=3, d=4, e=5, f=6)) returns: 
{'c': 3, 'e': 5, 'd': 4, 'f': 6}

Example 3: practical usage in decorators

OK, so maybe we're not seeing the utility yet. So imagine you have several functions with redundant code before and/or after the differentiating code. The following named functions are just pseudo-code for illustrative purposes.

def foo(a, b, c, d=0, e=100):
    # imagine this is much more code than a simple function call
    preprocess() 
    differentiating_process_foo(a,b,c,d,e)
    # imagine this is much more code than a simple function call
    postprocess()

def bar(a, b, c=None, d=0, e=100, f=None):
    preprocess()
    differentiating_process_bar(a,b,c,d,e,f)
    postprocess()

def baz(a, b, c, d, e, f):
    ... and so on

We might be able to handle this differently, but we can certainly extract the redundancy with a decorator, and so our below example demonstrates how *args and **kwargs can be very useful:

def decorator(function):
    '''function to wrap other functions with a pre- and postprocess'''
    @functools.wraps(function) # applies module, name, and docstring to wrapper
    def wrapper(*args, **kwargs):
        # again, imagine this is complicated, but we only write it once!
        preprocess()
        function(*args, **kwargs)
        postprocess()
    return wrapper

And now every wrapped function can be written much more succinctly, as we've factored out the redundancy:

@decorator
def foo(a, b, c, d=0, e=100):
    differentiating_process_foo(a,b,c,d,e)

@decorator
def bar(a, b, c=None, d=0, e=100, f=None):
    differentiating_process_bar(a,b,c,d,e,f)

@decorator
def baz(a, b, c=None, d=0, e=100, f=None, g=None):
    differentiating_process_baz(a,b,c,d,e,f, g)

@decorator
def quux(a, b, c=None, d=0, e=100, f=None, g=None, h=None):
    differentiating_process_quux(a,b,c,d,e,f,g,h)

And by factoring out our code, which *args and **kwargs allows us to do, we reduce lines of code, improve readability and maintainability, and have sole canonical locations for the logic in our program. If we need to change any part of this structure, we have one place in which to make each change.

python - What does ** (double star/asterisk) and * (star/asterisk) do ...

python syntax parameter-passing identifier kwargs
Rectangle 27 83

What does ** (double star) and * (star) do for parameters

**

*args allows for any number of optional positional arguments (parameters), which will be assigned to a tuple named args.

**kwargs allows for any number of optional keyword arguments (parameters), which will be in a dict named kwargs.

You can (and should) choose any appropriate name, but if the intention is for the arguments to be of non-specific semantics, args and kwargs are standard names.

You can also use *args and **kwargs to pass in parameters from lists (or any iterable) and dicts (or any mapping), respectively.

The function recieving the parameters does not have to know that they are being expanded.

For example, Python 2's xrange does not explicitly expect *args, but since it takes 3 integers as arguments:

>>> x = xrange(3) # create our *args - an iterable of 3 integers
>>> xrange(*x)    # expand here
xrange(0, 2, 2)

As another example, we can use dict expansion in str.format:

>>> foo = 'FOO'
>>> bar = 'BAR'
>>> 'this is foo, {foo} and bar, {bar}'.format(**locals())
'this is foo, FOO and bar, BAR'

You can have keyword only arguments after the *args - for example, here, kwarg2 must be given as a keyword argument - not positionally:

def foo(arg, kwarg=None, *args, kwarg2=None, **kwargs): 
    return arg, kwarg, args, kwarg2, kwargs
>>> foo(1,2,3,4,5,kwarg2='kwarg2', bar='bar', baz='baz')
(1, 2, (3, 4, 5), 'kwarg2', {'bar': 'bar', 'baz': 'baz'})

Also, * can be used by itself to indicate that keyword only arguments follow, without allowing for unlimited positional arguments.

def foo(arg, kwarg=None, *, kwarg2=None, **kwargs): 
    return arg, kwarg, kwarg2, kwargs
kwarg2
>>> foo(1,2,kwarg2='kwarg2', foo='foo', bar='bar')
(1, 2, 'kwarg2', {'foo': 'foo', 'bar': 'bar'})
*args*
>>> foo(1,2,3,4,5, kwarg2='kwarg2', foo='foo', bar='bar')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: foo() takes from 1 to 2 positional arguments 
    but 5 positional arguments (and 1 keyword-only argument) were given
kwarg
def bar(*, kwarg=None): 
    return kwarg

In this example, we see that if we try to pass kwarg positionally, we get an error:

>>> bar('kwarg')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: bar() takes 0 positional arguments but 1 was given

We must explicitly pass the kwarg parameter as a keyword argument.

>>> bar(kwarg='kwarg')
'kwarg'

*args (typically said "star-args") and **kwargs (stars can be implied by saying "kwargs", but be explicit with "double-star kwargs") are common idioms of Python for using the * and ** notation. These specific variable names aren't required (e.g. you could use *foos and **bars), but a departure from convention is likely to enrage your fellow Python coders.

We typically use these when we don't know what our function is going to receive or how many arguments we may be passing, and sometimes even when naming every variable separately would get very messy and redundant (but this is a case where usually explicit is better than implicit).

The following function describes how they can be used, and demonstrates behavior. Note the named b argument will be consumed by the second positional argument before :

def foo(a, b=10, *args, **kwargs):
    '''
    this function takes required argument a, not required keyword argument b
    and any number of unknown positional arguments and keyword arguments after
    '''
    print('a is a required argument, and its value is {0}'.format(a))
    print('b not required, its default value is 10, actual value: {0}'.format(b))
    # we can inspect the unknown arguments we were passed:
    #  - args:
    print('args is of type {0} and length {1}'.format(type(args), len(args)))
    for arg in args:
        print('unknown arg: {0}'.format(arg))
    #  - kwargs:
    print('kwargs is of type {0} and length {1}'.format(type(kwargs),
                                                        len(kwargs)))
    for kw, arg in kwargs.items():
        print('unknown kwarg - kw: {0}, arg: {1}'.format(kw, arg))
    # But we don't have to know anything about them 
    # to pass them to other functions.
    print('Args or kwargs can be passed without knowing what they are.')
    # max can take two or more positional args: max(a, b, c...)
    print('e.g. max(a, b, *args) \n{0}'.format(
      max(a, b, *args))) 
    kweg = 'dict({0})'.format( # named args same as unknown kwargs
      ', '.join('{k}={v}'.format(k=k, v=v) 
                             for k, v in sorted(kwargs.items())))
    print('e.g. dict(**kwargs) (same as {kweg}) returns: \n{0}'.format(
      dict(**kwargs), kweg=kweg))

We can check the online help for the function's signature, with help(foo), which tells us

foo(a, b=10, *args, **kwargs)
foo(1, 2, 3, 4, e=5, f=6, g=7)
a is a required argument, and its value is 1
b not required, its default value is 10, actual value: 2
args is of type <type 'tuple'> and length 2
unknown arg: 3
unknown arg: 4
kwargs is of type <type 'dict'> and length 3
unknown kwarg - kw: e, arg: 5
unknown kwarg - kw: g, arg: 7
unknown kwarg - kw: f, arg: 6
Args or kwargs can be passed without knowing what they are.
e.g. max(a, b, *args) 
4
e.g. dict(**kwargs) (same as dict(e=5, f=6, g=7)) returns: 
{'e': 5, 'g': 7, 'f': 6}

We can also call it using another function, into which we just provide a:

def bar(a):
    b, c, d, e, f = 2, 3, 4, 5, 6
    # dumping every local variable into foo as a keyword argument 
    # by expanding the locals dict:
    foo(**locals())
bar(100)
a is a required argument, and its value is 100
b not required, its default value is 10, actual value: 2
args is of type <type 'tuple'> and length 0
kwargs is of type <type 'dict'> and length 4
unknown kwarg - kw: c, arg: 3
unknown kwarg - kw: e, arg: 5
unknown kwarg - kw: d, arg: 4
unknown kwarg - kw: f, arg: 6
Args or kwargs can be passed without knowing what they are.
e.g. max(a, b, *args) 
100
e.g. dict(**kwargs) (same as dict(c=3, d=4, e=5, f=6)) returns: 
{'c': 3, 'e': 5, 'd': 4, 'f': 6}

Example 3: practical usage in decorators

OK, so maybe we're not seeing the utility yet. So imagine you have several functions with redundant code before and/or after the differentiating code. The following named functions are just pseudo-code for illustrative purposes.

def foo(a, b, c, d=0, e=100):
    # imagine this is much more code than a simple function call
    preprocess() 
    differentiating_process_foo(a,b,c,d,e)
    # imagine this is much more code than a simple function call
    postprocess()

def bar(a, b, c=None, d=0, e=100, f=None):
    preprocess()
    differentiating_process_bar(a,b,c,d,e,f)
    postprocess()

def baz(a, b, c, d, e, f):
    ... and so on

We might be able to handle this differently, but we can certainly extract the redundancy with a decorator, and so our below example demonstrates how *args and **kwargs can be very useful:

def decorator(function):
    '''function to wrap other functions with a pre- and postprocess'''
    @functools.wraps(function) # applies module, name, and docstring to wrapper
    def wrapper(*args, **kwargs):
        # again, imagine this is complicated, but we only write it once!
        preprocess()
        function(*args, **kwargs)
        postprocess()
    return wrapper

And now every wrapped function can be written much more succinctly, as we've factored out the redundancy:

@decorator
def foo(a, b, c, d=0, e=100):
    differentiating_process_foo(a,b,c,d,e)

@decorator
def bar(a, b, c=None, d=0, e=100, f=None):
    differentiating_process_bar(a,b,c,d,e,f)

@decorator
def baz(a, b, c=None, d=0, e=100, f=None, g=None):
    differentiating_process_baz(a,b,c,d,e,f, g)

@decorator
def quux(a, b, c=None, d=0, e=100, f=None, g=None, h=None):
    differentiating_process_quux(a,b,c,d,e,f,g,h)

And by factoring out our code, which *args and **kwargs allows us to do, we reduce lines of code, improve readability and maintainability, and have sole canonical locations for the logic in our program. If we need to change any part of this structure, we have one place in which to make each change.

python - What does ** (double star/asterisk) and * (star/asterisk) do ...

python syntax parameter-passing identifier kwargs
Rectangle 27 2

ast.literal_eval
import ast
temp = "'mycode':['1','2','firstname','Lastname']"
key,value = map(ast.literal_eval, temp.split(':'))
status = {key: value}
{'mycode': ['1', '2', 'firstname', 'Lastname']}

python - extract a dictionary key value from a string - Stack Overflow

python string list
Rectangle 27 1

Procmail can extract those values, or you can just pass the whole message to Python on stdin.

Assuming you want the final digits and you require there to be 4 or 5, something like this:

R=`formail -zxReply-to: | sed 's/.*<//;s/>.*//'`
:0
* ^From:.*@(helpicantfindgoogle\.com|searchengineshateme\.net|disabled\.org)\>
* ^Subject:(.*[^0-9])?\/[0-9][0-9][0-9][0-9][0-9]?$
| scriptname.py --reply-to "$R" --number "$MATCH"

This illustrates two different techniques for extracting a header value; the Reply-To header is extracted by invoking formail (this will extract just the email terminus, as per your comment; if you mean something else by "alias" then please define it properly) while the trailing 4- or 5-number integer from the Subject is grabbed my matching it in the condition with the special operator \/.

Update: Added an additional condition to only process email where the From: header indicates a sender in one of the domains helpicantfindgoogle.com, searchengineshateme.net, or disabled.org.

As implied by the pipe action, your script will be able to read the triggering message on its standard input, but if you don't need it, just don't read standard input.

If delivery is successful, Procmail will stop processing when this recipe finishes. Thus you should not need to explicitly discard a matching message. (If you want to keep going, use :0c instead of just :0.)

As an efficiency tweak (if you receive a lot of email, and only a small fraction of it needs to be passed to this script, for example) you might want to refactor to only extract the Reply-To: when the conditions match.

:0
* ^From:.*@(helpicantfindgoogle.com|searchengineshateme\.net|disabled\.org)\>
* ^Subject:(.*[^0-9])?\/[0-9][0-9][0-9][0-9][0-9]?$
{
  R=`formail -zxReply-To: | sed 's/.*<//;s/>.*//'`
  :0
  | scriptname.py --reply-to "$R" --number "$MATCH"
}

The block (the stuff between { and }) will only be entered when both the conditions are met. The extraction of the number from the Subject: header into $MATCH works as before; if the From: condition matched and the Subject: condition matched, the extracted number will be in $MATCH.

Thx much. I am close. I now see that I need to better parse the REPLYTO line so Thanks much. I am getting close. I now see I need to better parse the REPLY-TO. I would like to get the contents of what is between the "<" and ">" symbols, and ignore all else for that email alias. A sample of what I am getting now is: procmail: Assigning "R=Jason SJOBECK <cases.76947_461131.6b69fec08d@cases.example.com>"

Thx, good sir. (I'm off to the next piece of my puzzle, which is nearly complete, which is writing the 2 values into mySQL, then I'm done.) Namaste.

procmail - procmal recipe to pass values to my Python script - Stack O...

python procmail
Rectangle 27 1

Assuming, this is located inside the script tag, you can use the BeautifulSoup module to parse the HTML and to locate the script by the same regular expression that you would use to extract the modelData value. Then, after fixing the modelData value to be "loadable" with json.loads(), you would have a Python data structure you can easily work with:

import json
from bs4 import BeautifulSoup

import re

data = """
<script>
var modelData = [{"Id":958,"Date":"20160428","Title":"Design","Description":"London Auction 28 April 2016","Department":"Design","Location":"LONDON","Permalink":"/auctions/auction/UK050116","Year":"2016","Image":"/Xigen/image.ashx?path=\\\\diskstation\\website\\Certificates\\UK050116\\UK050116.jpg\u0026width=308\u0026height=222","addThis":" addthis:url=\"https://www.example.com/auctions/auction/UK050116\" ","results_html":"\u003cli class=\"expandable past-auction-exp closed\"\u003e\u003ca href=\"#\"\u003eVIEW RESULTS\u003c/a\u003e\u003cdiv class=\"panel\" style=\"display:none\"\u003e\u003ca href=\"/auctions/auction/UK050116\"\u003eOnline\u003c/a\u003e\u003ca target=\"_blank\" href=\"/Xigen/file.ashx?path=\\\\diskstation\\website\\Media\\Auction\\auctionResultsFile_UK050116.pdf\"\u003ePDF\u003c/a\u003e\u003c/div\u003e\u003c/li\u003e","Download_catalog_html":"\u003cli class=\"expandable past-auction-exp closed\"\u003e\u003ca href=\"#\"\u003eCATALOGUES\u003c/a\u003e\u003cdiv class=\"panel\" style=\"display:none\"\u003e\u003ca target=\"_blank\" id=\"linkDownloadCatalog\" href=\"http://www.example.com/Xigen/file.ashx?path=\\\\diskstation\\website\\Certificates/UK050116/UK050116_catalog.pdf\"\u003eDownload Catalogue\u003c/a\u003e\u003ca href=\"/catalogues/buy\"\u003ePurchase Catalogue\u003c/a\u003e\u003c/div\u003e\u003c/li\u003e"}]
</script>
"""

soup = BeautifulSoup(data, 'lxml')

pattern = re.compile(r"var modelData = (\[.*?\])", re.MULTILINE | re.DOTALL)
script = soup.find("script", text=pattern)

s = pattern.search(script.text).group(1).encode('unicode_escape')
while True:
    try:
        result = json.loads(s)   # try to parse...
        break                    # parsing worked -> exit loop
    except Exception as e:
        # "Expecting , delimiter: line 34 column 54 (char 1158)"
        # position of unexpected character after '"'
        unexp = int(re.findall(r'\(char (\d+)\)', str(e))[0])
        # position of unescaped '"' before that
        unesc = s.rfind(r'"', 0, unexp)
        s = s[:unesc] + r'\"' + s[unesc+1:]
        # position of correspondig closing '"' (+2 for inserted '\')
        closg = s.find(r'"', unesc + 2)
        s = s[:closg] + r'\"' + s[closg+1:]

item = result[0]
print(item["Id"])
print(item["Title"])

Prints (works on Python 2 only in this state):

958
Design

python - How to Parse JavaScript Type Html - Stack Overflow

javascript python html regex web-scraping
Rectangle 27 348

A comment in the Python source code for float objects acknowledges that:

This is especially true when comparing a float to an integer, because, unlike floats, integers in Python can be arbitrarily large and are always exact. Trying to cast the integer to a float might lose precision and make the comparison inaccurate. Trying to cast the float to an integer is not going to work either because any fractional part will be lost.

To get around this problem, Python performs a series of checks, returning the result if one of the checks succeeds. It compares the signs of the two values, then whether the integer is "too big" to be a float, then compares the exponent of the float to the length of the integer. If all of these checks fail, it is necessary to construct two new Python objects to compare in order to obtain the result.

When comparing a float v to an integer/long w, the worst case is that:

  • v and w have the same sign (both positive or both negative),
  • the integer w has few enough bits that it can be held in the size_t type (typically 32 or 64 bits),
  • the exponent of the float v is the same as the number of bits in w.

And this is exactly what we have for the values in the question:

>>> import math
>>> math.frexp(562949953420000.7) # gives the float's (significand, exponent) pair
(0.9999999999976706, 49)
>>> (562949953421000).bit_length()
49

We see that 49 is both the exponent of the float and the number of bits in the integer. Both numbers are positive and so the four criteria above are met.

Choosing one of the values to be larger (or smaller) can change the number of bits of the integer, or the value of the exponent, and so Python is able to determine the result of the comparison without performing the expensive final check.

This is specific to the CPython implementation of the language.

The float_richcompare function handles the comparison between two values v and w.

Below is a step-by-step description of the checks that the function performs. The comments in the Python source are actually very helpful when trying to understand what the function does, so I've left them in where relevant. I've also summarised these checks in a list at the foot of the answer.

The main idea is to map the Python objects v and w to two appropriate C doubles, i and j, which can then be easily compared to give the correct result. Both Python 2 and Python 3 use the same ideas to do this (the former just handles int and long types separately).

The first thing to do is check that v is definitely a Python float and map it to a C double i. Next the function looks at whether w is also a float and maps it to a C double j. This is the best case scenario for the function as all the other checks can be skipped. The function also checks to see whether v is inf or nan:

static PyObject*
float_richcompare(PyObject *v, PyObject *w, int op)
{
    double i, j;
    int r = 0;
    assert(PyFloat_Check(v));       
    i = PyFloat_AS_DOUBLE(v);       

    if (PyFloat_Check(w))           
        j = PyFloat_AS_DOUBLE(w);   

    else if (!Py_IS_FINITE(i)) {
        if (PyLong_Check(w))
            j = 0.0;
        else
            goto Unimplemented;
    }

Now we know that if w failed these checks, it is not a Python float. Now the function checks if it's a Python integer. If this is the case, the easiest test is to extract the sign of v and the sign of w (return 0 if zero, -1 if negative, 1 if positive). If the signs are different, this is all the information needed to return the result of the comparison:

else if (PyLong_Check(w)) {
        int vsign = i == 0.0 ? 0 : i < 0.0 ? -1 : 1;
        int wsign = _PyLong_Sign(w);
        size_t nbits;
        int exponent;

        if (vsign != wsign) {
            /* Magnitudes are irrelevant -- the signs alone
             * determine the outcome.
             */
            i = (double)vsign;
            j = (double)wsign;
            goto Compare;
        }
    }

If this check failed, then v and w have the same sign.

The next check counts the number of bits in the integer w. If it has too many bits then it can't possibly be held as a float and so must be larger in magnitude than the float v:

nbits = _PyLong_NumBits(w);
    if (nbits == (size_t)-1 && PyErr_Occurred()) {
        /* This long is so large that size_t isn't big enough
         * to hold the # of bits.  Replace with little doubles
         * that give the same outcome -- w is so large that
         * its magnitude must exceed the magnitude of any
         * finite float.
         */
        PyErr_Clear();
        i = (double)vsign;
        assert(wsign != 0);
        j = wsign * 2.0;
        goto Compare;
    }

On the other hand, if the integer w has 48 or fewer bits, it can safely turned in a C double j and compared:

if (nbits <= 48) {
        j = PyLong_AsDouble(w);
        /* It's impossible that <= 48 bits overflowed. */
        assert(j != -1.0 || ! PyErr_Occurred());
        goto Compare;
    }

From this point onwards, we know that w has 49 or more bits. It will be convenient to treat w as a positive integer, so change the sign and the comparison operator as necessary:

if (nbits <= 48) {
        /* "Multiply both sides" by -1; this also swaps the
         * comparator.
         */
        i = -i;
        op = _Py_SwappedOp[op];
    }

Now the function looks at the exponent of the float. Recall that a float can be written (ignoring sign) as significand * 2exponent and that the significand represents a number between 0.5 and 1:

(void) frexp(i, &exponent);
    if (exponent < 0 || (size_t)exponent < nbits) {
        i = 1.0;
        j = 2.0;
        goto Compare;
    }

This checks two things. If the exponent is less than 0 then the float is smaller than 1 (and so smaller in magnitude than any integer). Or, if the exponent is less than the number of bits in w then we have that v < |w| since significand * 2exponent is less than 2nbits.

Failing these two checks, the function looks to see whether the exponent is greater than the number of bit in w. This shows that significand * 2exponent is greater than 2nbits and so v > |w|:

if ((size_t)exponent > nbits) {
        i = 2.0;
        j = 1.0;
        goto Compare;
    }

If this check did not succeed we know that the exponent of the float v is the same as the number of bits in the integer w.

The only way that the two values can be compared now is to construct two new Python integers from v and w. The idea is to discard the fractional part of v, double the integer part, and then add one. w is also doubled and these two new Python objects can be compared to give the correct return value. Using an example with small values, 4.65 < 4 would be determined by the comparison (2*4)+1 == 9 < 8 == (2*4) (returning false).

{
        double fracpart;
        double intpart;
        PyObject *result = NULL;
        PyObject *one = NULL;
        PyObject *vv = NULL;
        PyObject *ww = w;

        // snip

        fracpart = modf(i, &intpart); // split i (the double that v mapped to)
        vv = PyLong_FromDouble(intpart);

        // snip

        if (fracpart != 0.0) {
            /* Shift left, and or a 1 bit into vv
             * to represent the lost fraction.
             */
            PyObject *temp;

            one = PyLong_FromLong(1);

            temp = PyNumber_Lshift(ww, one); // left-shift doubles an integer
            ww = temp;

            temp = PyNumber_Lshift(vv, one);
            vv = temp;

            temp = PyNumber_Or(vv, one); // a doubled integer is even, so this adds 1
            vv = temp;
        }
        // snip
    }
}

For brevity I've left out the additional error-checking and garbage-tracking Python has to do when it creates these new objects. Needless to say, this adds additional overhead and explains why the values highlighted in the question are significantly slower to compare than others.

Here is a summary of the checks that are performed by the comparison function.

Let v be a float and cast it as a C double. Now, if w is also a float:

Well done Python developers - most language implementations would have just handwaved the issue by saying float/integer comparisons are not exact.

python - Why are some float < integer comparisons four times slower th...

python performance floating-point cpython python-internals
Rectangle 27 345

A comment in the Python source code for float objects acknowledges that:

This is especially true when comparing a float to an integer, because, unlike floats, integers in Python can be arbitrarily large and are always exact. Trying to cast the integer to a float might lose precision and make the comparison inaccurate. Trying to cast the float to an integer is not going to work either because any fractional part will be lost.

To get around this problem, Python performs a series of checks, returning the result if one of the checks succeeds. It compares the signs of the two values, then whether the integer is "too big" to be a float, then compares the exponent of the float to the length of the integer. If all of these checks fail, it is necessary to construct two new Python objects to compare in order to obtain the result.

When comparing a float v to an integer/long w, the worst case is that:

  • v and w have the same sign (both positive or both negative),
  • the integer w has few enough bits that it can be held in the size_t type (typically 32 or 64 bits),
  • the exponent of the float v is the same as the number of bits in w.

And this is exactly what we have for the values in the question:

>>> import math
>>> math.frexp(562949953420000.7) # gives the float's (significand, exponent) pair
(0.9999999999976706, 49)
>>> (562949953421000).bit_length()
49

We see that 49 is both the exponent of the float and the number of bits in the integer. Both numbers are positive and so the four criteria above are met.

Choosing one of the values to be larger (or smaller) can change the number of bits of the integer, or the value of the exponent, and so Python is able to determine the result of the comparison without performing the expensive final check.

This is specific to the CPython implementation of the language.

The float_richcompare function handles the comparison between two values v and w.

Below is a step-by-step description of the checks that the function performs. The comments in the Python source are actually very helpful when trying to understand what the function does, so I've left them in where relevant. I've also summarised these checks in a list at the foot of the answer.

The main idea is to map the Python objects v and w to two appropriate C doubles, i and j, which can then be easily compared to give the correct result. Both Python 2 and Python 3 use the same ideas to do this (the former just handles int and long types separately).

The first thing to do is check that v is definitely a Python float and map it to a C double i. Next the function looks at whether w is also a float and maps it to a C double j. This is the best case scenario for the function as all the other checks can be skipped. The function also checks to see whether v is inf or nan:

static PyObject*
float_richcompare(PyObject *v, PyObject *w, int op)
{
    double i, j;
    int r = 0;
    assert(PyFloat_Check(v));       
    i = PyFloat_AS_DOUBLE(v);       

    if (PyFloat_Check(w))           
        j = PyFloat_AS_DOUBLE(w);   

    else if (!Py_IS_FINITE(i)) {
        if (PyLong_Check(w))
            j = 0.0;
        else
            goto Unimplemented;
    }

Now we know that if w failed these checks, it is not a Python float. Now the function checks if it's a Python integer. If this is the case, the easiest test is to extract the sign of v and the sign of w (return 0 if zero, -1 if negative, 1 if positive). If the signs are different, this is all the information needed to return the result of the comparison:

else if (PyLong_Check(w)) {
        int vsign = i == 0.0 ? 0 : i < 0.0 ? -1 : 1;
        int wsign = _PyLong_Sign(w);
        size_t nbits;
        int exponent;

        if (vsign != wsign) {
            /* Magnitudes are irrelevant -- the signs alone
             * determine the outcome.
             */
            i = (double)vsign;
            j = (double)wsign;
            goto Compare;
        }
    }

If this check failed, then v and w have the same sign.

The next check counts the number of bits in the integer w. If it has too many bits then it can't possibly be held as a float and so must be larger in magnitude than the float v:

nbits = _PyLong_NumBits(w);
    if (nbits == (size_t)-1 && PyErr_Occurred()) {
        /* This long is so large that size_t isn't big enough
         * to hold the # of bits.  Replace with little doubles
         * that give the same outcome -- w is so large that
         * its magnitude must exceed the magnitude of any
         * finite float.
         */
        PyErr_Clear();
        i = (double)vsign;
        assert(wsign != 0);
        j = wsign * 2.0;
        goto Compare;
    }

On the other hand, if the integer w has 48 or fewer bits, it can safely turned in a C double j and compared:

if (nbits <= 48) {
        j = PyLong_AsDouble(w);
        /* It's impossible that <= 48 bits overflowed. */
        assert(j != -1.0 || ! PyErr_Occurred());
        goto Compare;
    }

From this point onwards, we know that w has 49 or more bits. It will be convenient to treat w as a positive integer, so change the sign and the comparison operator as necessary:

if (nbits <= 48) {
        /* "Multiply both sides" by -1; this also swaps the
         * comparator.
         */
        i = -i;
        op = _Py_SwappedOp[op];
    }

Now the function looks at the exponent of the float. Recall that a float can be written (ignoring sign) as significand * 2exponent and that the significand represents a number between 0.5 and 1:

(void) frexp(i, &exponent);
    if (exponent < 0 || (size_t)exponent < nbits) {
        i = 1.0;
        j = 2.0;
        goto Compare;
    }

This checks two things. If the exponent is less than 0 then the float is smaller than 1 (and so smaller in magnitude than any integer). Or, if the exponent is less than the number of bits in w then we have that v < |w| since significand * 2exponent is less than 2nbits.

Failing these two checks, the function looks to see whether the exponent is greater than the number of bit in w. This shows that significand * 2exponent is greater than 2nbits and so v > |w|:

if ((size_t)exponent > nbits) {
        i = 2.0;
        j = 1.0;
        goto Compare;
    }

If this check did not succeed we know that the exponent of the float v is the same as the number of bits in the integer w.

The only way that the two values can be compared now is to construct two new Python integers from v and w. The idea is to discard the fractional part of v, double the integer part, and then add one. w is also doubled and these two new Python objects can be compared to give the correct return value. Using an example with small values, 4.65 < 4 would be determined by the comparison (2*4)+1 == 9 < 8 == (2*4) (returning false).

{
        double fracpart;
        double intpart;
        PyObject *result = NULL;
        PyObject *one = NULL;
        PyObject *vv = NULL;
        PyObject *ww = w;

        // snip

        fracpart = modf(i, &intpart); // split i (the double that v mapped to)
        vv = PyLong_FromDouble(intpart);

        // snip

        if (fracpart != 0.0) {
            /* Shift left, and or a 1 bit into vv
             * to represent the lost fraction.
             */
            PyObject *temp;

            one = PyLong_FromLong(1);

            temp = PyNumber_Lshift(ww, one); // left-shift doubles an integer
            ww = temp;

            temp = PyNumber_Lshift(vv, one);
            vv = temp;

            temp = PyNumber_Or(vv, one); // a doubled integer is even, so this adds 1
            vv = temp;
        }
        // snip
    }
}

For brevity I've left out the additional error-checking and garbage-tracking Python has to do when it creates these new objects. Needless to say, this adds additional overhead and explains why the values highlighted in the question are significantly slower to compare than others.

Here is a summary of the checks that are performed by the comparison function.

Let v be a float and cast it as a C double. Now, if w is also a float:

This did indeed turn out to be one heck of a post.

Well done Python developers - most language implementations would have just handwaved the issue by saying float/integer comparisons are not exact.

python - Why are some float < integer comparisons four times slower th...

python performance floating-point cpython python-internals
Rectangle 27 1

The following code should work, the list comprehensions below iterate through every item in the second frame for each row in the first. The value and index are stored in a tuple. The minimum of these is found using a lambda that selects the first element. The indices are then extracted by mapping a different lambda which selects the second element only. This is a good explanation of lambdas. http://www.secnetix.de/olli/Python/lambda_functions.hawk.

ldf1 = len(list(df1.iterrows()))
ldf2 = len(list(df2.iterrows()))
funk = lambda df1, df2, j, i:f(df1.loc[j, 'lat'], df1.loc[j, 'lon'],df2.loc[i,'lat'], df2.loc[i, 'lon'])
pairs = [min([(funk(DF1, DF2, j, i), i) for i in xrange(ldf2)], key=lambda x:x[0]) for j in xrange(ldf1)]
mins = map(lambda x:x[1], pairs)

It's also worth noting that this is going to run in polynomial time, which is going to take a while with the number of rows you have. I chose to use map and list comprehensions because they will be faster than a standard for each

given the amount of data i dont think this is feasible

I don't see a way for him to accomplish this below polynomial time. Python can certainly handle this many calculations and since there seems to be no limit on time I think it should be ok. However I took this into account and used map, and nested comprehensions which are going to be faster than a for loop. wiki.python.org/moin/PythonSpeed/PerformanceTips#Loops

Find MinArg in Python -- Pandas DFs Distance - Stack Overflow

python pandas
Rectangle 27 3

data = {"widget": { "debug": "on", "window": { "title": "SampleWidget", "name": "main_window", "width": 500, "height": 500 }, "image": { "src": "Images/Sun.png", "name": "sun1", "hOffset": 250, "vOffset": 250, "alignment": "center" }, "text": { "data": "Click Here", "size": 36, "style": "bold", "name": "text1", "hOffset": 250, "vOffset": 100, "alignment": "center", "onMouseUp": "sun1.opacity = (sun1.opacity / 100) * 90;" } }} 

def pairs(d):
    for k, v in d.items():
        if isinstance(v, dict):
            yield from pairs(v)
        else:
            yield '{}={}'.format(k, v)

print(list(pairs(data)))
$ python3.5 extract.py 
['size=36', 'alignment=center', 'data=Click Here', 'onMouseUp=sun1.opacity = (sun1.opacity / 100) * 90;', 'vOffset=100', 'name=text1', 'hOffset=250', 'style=bold', 'name=sun1', 'hOffset=250', 'vOffset=250', 'alignment=center', 'src=Images/Sun.png', 'debug=on', 'name=main_window', 'title=SampleWidget', 'width=500', 'height=500']

Python extract all key values in in nested JSON in a list - Stack Over...

python json
Rectangle 27 1

col_names = y.corr().columns.values

for col, row in (y.corr().abs() > 0.7).iteritems():
    print(col, col_names[row.values])

Note that this works but it might be slow because the iteritems method converts each row into a series.

I tried to edit your code to exclude r=1, but I got an error message. How would you edit this code? for col, row in (y.corr().abs() > 0.7).iteritems(): This didn't work for me for col, row in (y.corr().abs() > 0.7 and y.corr().abs()<1).iteritems():

and
or
(y.corr().abs() > 0.7) & (y.corr().abs() < 1)

Thanks again! You solved my problem and I also learned from what you did. I'll be sure to pay it forward.

python - Correlation Matrix: Extract Variables with High R Values - St...

python matrix correlation
Rectangle 27 1

corr = y.corr().unstack().reset_index() #group together pairwise
corr.columns = ['var1','var2','corr'] #rename columns to something readable
print( corr[ corr['corr'].abs() > 0.7 ] ) #keep correlation results above 0.7

You could further exclude variables with the same name (corr = 1) by changing the last line to

print( corr[ (corr['corr'].abs() > 0.7) & (corr['var1'] != corr['var2']) ] )

python - Correlation Matrix: Extract Variables with High R Values - St...

python matrix correlation
Rectangle 27 2

You can use duplicated with param keep=False so it returns True for all duplicated rows and mask the df:

In [16]:
df[df['ID Number'].duplicated(keep=False)]

Out[16]:
  ID Number  col2  col3      DATE
0       111   0.5  -0.6  20160104
3       111  -0.7  -0.9  20150102

For the second part you can do:

gp = df[df['ID Number'].duplicated(keep=False)].groupby('ID Number')
gp.apply(lambda x: x.to_csv(str(x.name) + '.csv')

Actually if you're just wanting to write all rows with the same ID number to a named csv then:

Should do what you want

For the second part after EDIT, where can I specify the output path for saving these files?

df.groupby('ID Number').apply(lambda x: x.to_csv(r'c:\output\' + str(x.name) + '.csv'))
path = 'c:/my_output_folder/' and then do

How to filter values by Column Name and then extract the rows that hav...

python python-2.7 csv pandas
Rectangle 27 1

Disclaimer: My approach uses the pandas library.

id,description,amount,date
1,Some description.,$150.54,12/12/2012
2,Some other description.,$200,10/10/2015
3,Other description.,$25,11/11/2014
4,My description,$11.35,01/01/2015
5,Your description.,$20,03/03/2013
id,description,date
1,Some description.,12/12/2012
2,Some other description.,10/10/2015
3,Other description.,11/11/2014
4,122333222233332221,11/11/2014
5,Your description.,03/03/2013
id,description,amount,date
1,Some description.,$150.54,12/12/2012
2,Some other description.,$200,10/10/2015
3,Other description.,$25,11/11/2014
5,Your description.,$20,03/03/2013
import pandas as pd

def compare_extract(extract_name, reference='gold_std.csv'):

    gold = pd.read_csv(reference)
    ext = pd.read_csv(extract_name)

    gc = set(gold.columns)
    header = ext.columns
    extc = set(header)

    if gc != extc:
        missing = ", ".join(list(gc - extc))
        print "Extract has the following missing columns: {}".format(missing)
    else:
        print "Extract has the same column as standard. Checking for abberant rows..."
        gold_list = gold.values.tolist()
        ext_list = ext.values.tolist()
        # Somewhat non-pandaic approach because possible no same IDs so we're relying
        # on set operations instead. A bit hackish, actually.
        diff = list(set(map(tuple, gold_list)) - set(map(tuple, ext_list)))
        df = pd.DataFrame(diff, columns=header)
        print "The following rows are not in the extract: "
        print df
e1 = 'extract1.csv'
compare_extract(e1)
# Extract has the following missing columns: amount

e2 = 'extract2.csv'
compare_extract(e2)
# Extract has the same column as standard. Checking for abberant rows...
# The following rows are not in the extract: 
#    id     description  amount        date
# 0   4  My description  $11.35  01/01/2015

Finally, the last extract is a bit arbitrary. I think for that one you're better off writing a non-pandas algorithm.

regex - Comparing gold standard csv file and extracted values csv file...

python regex csv
Rectangle 27 7

What is happening is that the variable i is captured, and the functions are returning the value it is bound to at the time it is called. In functional languages this kind of situation never arises, as i wouldn't be rebound. However with python, and also as you've seen with lisp, this is no longer true.

The difference with your scheme example is to do with the semantics of the do loop. Scheme is effectively creating a new i variable each time through the loop, rather than reusing an existing i binding as with the other languages. If you use a different variable created external to the loop and mutate it, you'll see the same behaviour in scheme. Try replacing your loop with:

(let ((ii 1)) (
  (do ((i 1 (+ 1 i)))
      ((>= i 4))
    (set! flist 
      (cons (lambda (x) (* ii x)) flist))
    (set! ii i))
))

Take a look here for some further discussion of this.

[Edit] Possibly a better way to describe it is to think of the do loop as a macro which performs the following steps:

  • Define a lambda taking a single parameter (i), with a body defined by the body of the loop,
  • An immediate call of that lambda with appropriate values of i as its parameter.

ie. the equivalent to the below python:

flist = []

def loop_body(i):      # extract body of the for loop to function
    def func(x): return x*i
    flist.append(func)

map(loop_body, xrange(3))  # for i in xrange(3): body

The i is no longer the one from the parent scope but a brand new variable in its own scope (ie. the parameter to the lambda) and so you get the behaviour you observe. Python doesn't have this implicit new scope, so the body of the for loop just shares the i variable.

Interesting. I wasn't aware of the difference in semantics of the do loop. Thanks

python - How do lexical closures work? - Stack Overflow

python closures lazy-evaluation late-binding
Rectangle 27 198

To quickly extract the values for a particular key, I personally like to use "grep -o", which only returns the regex's match. For example, to get the "text" field from tweets, something like:

grep -Po '"text":.*?[^\\]",' tweets.json

This regex is more robust than you might think; for example, it deals fine with strings having embedded commas and escaped quotes inside them. I think with a little more work you could make one that is actually guaranteed to extract the value, if it's atomic. (If it has nesting, then a regex can't do it of course.)

And to further clean (albeit keeping the string's original escaping) you can use something like: | perl -pe 's/"text"://; s/^"//; s/",$//'. (I did this for this analysis.)

To all the haters who insist you should use a real JSON parser -- yes, that is essential for correctness, but

  • To do a really quick analysis, like counting values to check on data cleaning bugs or get a general feel for the data, banging out something on the command line is faster. Opening an editor to write a script is distracting.
  • grep -o is orders of magnitude faster than the Python standard json library, at least when doing this for tweets (which are ~2 KB each). I'm not sure if this is just because json is slow (I should compare to yajl sometime); but in principle, a regex should be faster since it's finite state and much more optimizable, instead of a parser that has to support recursion, and in this case, spends lots of CPU building trees for structures you don't care about. (If someone wrote a finite state transducer that did proper (depth-limited) JSON parsing, that would be fantastic! In the meantime we have "grep -o".)

To write maintainable code, I always use a real parsing library. I haven't tried jsawk, but if it works well, that would address point #1.

One last, wackier, solution: I wrote a script that uses Python json and extracts the keys you want, into tab-separated columns; then I pipe through a wrapper around awk that allows named access to columns. In here: the json2tsv and tsvawk scripts. So for this example it would be:

json2tsv id text < tweets.json | tsvawk '{print "tweet " $id " is: " $text}'

This approach doesn't address #2, is more inefficient than a single Python script, and it's a little brittle: it forces normalization of newlines and tabs in string values, to play nice with awk's field/record-delimited view of the world. But it does let you stay on the command line, with more correctness than grep -o.

grep -Po '"text":(\d*?,|.*?[^\\]",)'

Robert: Right, my regex was written only for string values for that field. Integers could be added as you say. If you want all types, you have to do more and more: booleans, null. And arrays and objects require more work; only depth-limited is possible, under standard regexes.

1. jq .name works on the command-line and it doesn't require "opening an editor to write a script". 2. It doesn't matter how fast your regex can produce wrong results

It seems that on OSX the -P option is missing. I tested on OSX 10.11.5 and grep --version was grep (BSD grep) 2.5.1-FreeBSD. I got it working with the "extended regex" option on OSX. The command from above would be grep -Eo '"text":.*?[^\\]",' tweets.json.

bash - Parsing JSON with Unix tools - Stack Overflow

json bash curl command-line
Rectangle 27 198

To quickly extract the values for a particular key, I personally like to use "grep -o", which only returns the regex's match. For example, to get the "text" field from tweets, something like:

grep -Po '"text":.*?[^\\]",' tweets.json

This regex is more robust than you might think; for example, it deals fine with strings having embedded commas and escaped quotes inside them. I think with a little more work you could make one that is actually guaranteed to extract the value, if it's atomic. (If it has nesting, then a regex can't do it of course.)

And to further clean (albeit keeping the string's original escaping) you can use something like: | perl -pe 's/"text"://; s/^"//; s/",$//'. (I did this for this analysis.)

To all the haters who insist you should use a real JSON parser -- yes, that is essential for correctness, but

  • To do a really quick analysis, like counting values to check on data cleaning bugs or get a general feel for the data, banging out something on the command line is faster. Opening an editor to write a script is distracting.
  • grep -o is orders of magnitude faster than the Python standard json library, at least when doing this for tweets (which are ~2 KB each). I'm not sure if this is just because json is slow (I should compare to yajl sometime); but in principle, a regex should be faster since it's finite state and much more optimizable, instead of a parser that has to support recursion, and in this case, spends lots of CPU building trees for structures you don't care about. (If someone wrote a finite state transducer that did proper (depth-limited) JSON parsing, that would be fantastic! In the meantime we have "grep -o".)

To write maintainable code, I always use a real parsing library. I haven't tried jsawk, but if it works well, that would address point #1.

One last, wackier, solution: I wrote a script that uses Python json and extracts the keys you want, into tab-separated columns; then I pipe through a wrapper around awk that allows named access to columns. In here: the json2tsv and tsvawk scripts. So for this example it would be:

json2tsv id text < tweets.json | tsvawk '{print "tweet " $id " is: " $text}'

This approach doesn't address #2, is more inefficient than a single Python script, and it's a little brittle: it forces normalization of newlines and tabs in string values, to play nice with awk's field/record-delimited view of the world. But it does let you stay on the command line, with more correctness than grep -o.

grep -Po '"text":(\d*?,|.*?[^\\]",)'

Robert: Right, my regex was written only for string values for that field. Integers could be added as you say. If you want all types, you have to do more and more: booleans, null. And arrays and objects require more work; only depth-limited is possible, under standard regexes.

1. jq .name works on the command-line and it doesn't require "opening an editor to write a script". 2. It doesn't matter how fast your regex can produce wrong results

It seems that on OSX the -P option is missing. I tested on OSX 10.11.5 and grep --version was grep (BSD grep) 2.5.1-FreeBSD. I got it working with the "extended regex" option on OSX. The command from above would be grep -Eo '"text":.*?[^\\]",' tweets.json.

bash - Parsing JSON with Unix tools - Stack Overflow

json bash curl command-line
Rectangle 27 1

The problem with overriding JSONEncoder().default is that you can do it only once. If you stumble upon anything a special data type that does not work with that pattern (like if you use a strange encoding). With the pattern below, you can always make your class JSON serializable, provided that the class field you want to serialize is serializable itself (and can be added to a python list, barely anything). Otherwise, you have to apply recursively the same pattern to your json field (or extract the serializable data from it):

# base class that will make all derivatives JSON serializable:
class JSONSerializable(list): # need to derive from a serializable class.

  def __init__(self, value = None):
    self = [ value ]

  def setJSONSerializableValue(self, value):
    self = [ value ]

  def getJSONSerializableValue(self):
    return self[1] if len(self) else None


# derive  your classes from JSONSerializable:
class MyJSONSerializableObject(JSONSerializable):

  def __init__(self): # or any other function
    # .... 
    # suppose your__json__field is the class member to be serialized. 
    # it has to be serializable itself. 
    # Every time you want to set it, call this function:
    self.setJSONSerializableValue(your__json__field)
    # ... 
    # ... and when you need access to it,  get this way:
    do_something_with_your__json__field(self.getJSONSerializableValue())


# now you have a JSON default-serializable class:
a = MyJSONSerializableObject()
print json.dumps(a)

python - Making object JSON serializable with regular encoder - Stack ...

python json serialization