In Python, when are two objects the same?

Question

It seems that 2 is 2 and 3 is 3 will always be true in Python, and in general, any reference to an integer is the same as any other reference to the same integer. The same happens with None (i.e., None is None). I know that this does not happen with user-defined types or mutable types. But it sometimes fails on immutable types too:

>>> () is ()
True
>>> (2,) is (2,)
False

That is: two independent constructions of the empty tuple yield references to the same object in memory, but two independent constructions of identical one-(immutable-)element tuples end up creating two identical objects. I tested, and frozensets work in a manner similar to tuples.

What determines if an object will be duplicated in memory or will have a single instance with lots of references? Does it depend on whether the object is "atomic" in some sense? Does it vary according to implementation?

Recommended answer

Python has some types that it guarantees will only have one instance. Examples of these instances are None, NotImplemented, and Ellipsis. These are (by definition) singletons and so things like None is None are guaranteed to return True because there is no way to create a new instance of NoneType.
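One way to see the singleton guarantee in action is to call the types directly. This is a small sketch, not part of the original answer; it relies on the Python 3 behavior that calling the type of None, NotImplemented, or Ellipsis hands back the existing instance instead of creating a new one.

```python
# Calling the type of each singleton returns the one existing instance.
singletons = [None, NotImplemented, Ellipsis]

for obj in singletons:
    same = type(obj)() is obj
    print(type(obj).__name__, same)  # each line ends with True
```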

It also supplies a few doubletons[1], True and False[2]. All references to True point to the same object. Again, this is because there is no way to create a new instance of bool.
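The same idea can be checked for bool. The sketch below is only an illustration (the sample values and the class name MyBool are made up for the example): whatever you pass to bool(), you get back one of the two existing objects, and bool cannot be subclassed to sneak in a third.

```python
# bool() always hands back one of the two existing objects.
values = [0, 1, "", "text", [], [0]]
results = [bool(v) for v in values]
only_two_objects = all(r is True or r is False for r in results)
print(only_two_objects)  # True

# bool cannot be subclassed, so no extra instance can be introduced that way.
try:
    class MyBool(bool):  # hypothetical class, only here to trigger the error
        pass
    subclass_allowed = True
except TypeError:
    subclass_allowed = False
print(subclass_allowed)  # False
```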

The above things are all guaranteed by the Python language. However, as you have noticed, there are some types (all immutable) that store some instances for reuse. This is allowed by the language, but different implementations may choose to use this allowance or not, depending on their optimization strategies. Some examples that fall into this category in CPython are small integers (-5 through 256), the empty tuple and, in some versions, the empty frozenset.
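To poke at the small-integer cache without the compiler getting in the way, the integers can be parsed from text at run time. This is a sketch under the assumption that you are curious what your own interpreter does; the helper name built_twice_is_same is made up for the example, and none of the results are promised by the language. On CPython, values inside the cached range typically report True and larger ones False.

```python
import platform

def built_twice_is_same(text):
    """Parse the same digits twice and report whether one object came back."""
    first = int(text)
    second = int(text)
    return first is second

print(platform.python_implementation())
for text in ("-5", "256", "257", "100000"):
    # Implementation detail: the answers may differ between interpreters.
    print(text, built_twice_is_same(text))
```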

Finally, CPython interns certain immutable objects during parsing...

e.g. if you run the following script with CPython, you'll see that it prints True:

def foo():
    return (2,)

if __name__ == '__main__':
    print(foo() is foo())

This seems really odd. The trick that CPython is playing is that when it compiles the function foo, it sees a tuple literal that contains only other simple (immutable) literals. Rather than create this tuple (or its equivalents) over and over, Python creates it once. There's no danger of that object being changed since the whole deal is immutable. This can be a big win for performance where the same tight loop is called over and over. Small strings are interned as well. The real win there is in dictionary lookups. Python can do a (blazingly fast) pointer compare and only fall back on slower string comparisons when the pointers differ but the hashes match. Since so much of Python is built on dictionary lookups, this can be a big optimization for the language as a whole.
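You can look at where the tuple ends up by inspecting the function's code object. This is a CPython-specific sketch: the contents of co_consts are an implementation detail, so treat the output as something to explore rather than rely on.

```python
def foo():
    return (2,)

# CPython-specific: constants used by a function live on its code object.
consts = foo.__code__.co_consts
print(consts)

# The tuple was built once at compile time and stored among those constants,
# so every call returns that same stored object.
stored = [c for c in consts if c == (2,)]
print(len(stored) > 0 and foo() is stored[0])
```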

[1] I might have just made up that word... But hopefully you get the idea.
[2] Under normal circumstances, you don't need to check whether the object is a reference to True. Usually you just care whether the object is "truthy", e.g. whether `if some_instance: ...` will execute the branch. But I put that in here just for completeness.

Note that `is` can be used to compare things that aren't singletons. One common use is to create a sentinel value:

sentinel = object()
item = next(iterator, sentinel)  # `iterator` is any iterator, e.g. iter(some_list)
if item is sentinel:
    # iterator exhausted.
    pass

Or:

_sentinel = object()
def function(a, b, none_is_ok_value_here=_sentinel):
    if none_is_ok_value_here is _sentinel:
        # Treat the function as if `none_is_ok_value_here` was not provided.
        pass
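Putting both sentinel patterns together, here is a small runnable sketch. The names _MISSING and first_item are hypothetical, chosen only for this illustration; the point is that None stays usable as a real value because "nothing was supplied" is detected by identity with a private object.

```python
_MISSING = object()  # hypothetical name; any unique object works

def first_item(iterable, default=_MISSING):
    """Return the first item, or `default` if one was given; None is allowed."""
    item = next(iter(iterable), _MISSING)
    if item is not _MISSING:
        return item
    if default is _MISSING:
        raise ValueError("iterable is empty and no default was given")
    return default

print(first_item([7, 8]))            # 7
print(first_item([None, 1]))         # None (a real item, not "missing")
print(first_item([], default=None))  # None (the explicit default)
```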

The moral of this story is to always say what you mean. If you want to check if a value is another value, then use the is operator. If you want to check if a value is equal to another value (but possibly distinct), then use ==. For more details on the difference between is and == (and when to use which), consult one of the following posts:

  • Is there a difference between `==` and `is` in Python?
  • Python None comparison: should I use "is" or ==?
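As a quick illustration of the difference between identity and equality (a minimal sketch with made-up variable names), two separately built lists are equal but not the same object, while a second name bound to one list is the same object:

```python
a = [1, 2, 3]
b = [1, 2, 3]   # equal contents, but a separate list
c = a           # another name for the very same list

print(a == b)   # True
print(a is b)   # False
print(a is c)   # True

c.append(4)     # mutating through one name is visible through the other
print(a)        # [1, 2, 3, 4]
print(a == b)   # False
```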

We've talked about these CPython implementation details and we've claimed that they're optimizations. It'd be nice to try to measure just what we get from all this optimizing (other than a little added confusion when working with the is operator).

Here's a small script that you can run to see how much faster dictionary lookups are if you use the same string to look up the value instead of a different string. Note, I use the term "interned" in the variable names -- These values aren't necessarily interned (though they could be). I'm just using that to indicate that the "interned" string is the string in the dictionary.

import timeit

interned = 'foo'
not_interned = (interned + ' ').strip()

assert interned is not not_interned


d = {interned: 'bar'}

print('Timings for short strings')
number = 100000000
print(timeit.timeit(
    'd[interned]',
    setup='from __main__ import interned, d',
    number=number))
print(timeit.timeit(
    'd[not_interned]',
    setup='from __main__ import not_interned, d',
    number=number))


####################################################

interned_long = interned * 100
not_interned_long = (interned_long + ' ').strip()

d[interned_long] = 'baz'

assert interned_long is not not_interned_long
print('Timings for long strings')
print(timeit.timeit(
    'd[interned_long]',
    setup='from __main__ import interned_long, d',
    number=number))
print(timeit.timeit(
    'd[not_interned_long]',
    setup='from __main__ import not_interned_long, d',
    number=number))

The exact values here shouldn't matter too much, but on my computer, lookups with the short strings are faster by about 1 part in 7 when the key is the same object. With the long strings they are almost 2x faster (because the string comparison takes longer if the string has more characters to compare). The differences aren't quite as striking on Python 3.x, but they're still definitely there.
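The identity check that makes the same-object lookup faster can also be observed directly, without timing anything. This sketch uses NaN as a key because NaN is never equal to itself: a lookup with the very same object still succeeds, which only works if identity is checked before equality.

```python
nan = float("nan")
d = {nan: "found"}

print(nan == nan)          # False: NaN never equals itself
print(d[nan])              # found: same object, so the identity check matches
print(float("nan") in d)   # False: a different object, and == fails too
```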

Here's a small script you can play around with:

import timeit

def foo_tuple():
    return (2, 3, 4)

def foo_list():
    return [2, 3, 4]

assert foo_tuple() is foo_tuple()

number = 10000000
t_interned_tuple = timeit.timeit('foo_tuple()', setup='from __main__ import foo_tuple', number=number)
t_list = (timeit.timeit('foo_list()', setup='from __main__ import foo_list', number=number))

print(t_interned_tuple)
print(t_list)
print(t_interned_tuple / t_list)
print('*' * 80)


def tuple_creation(x):
    return (x,)

def list_creation(x):
    return [x]

t_create_tuple = timeit.timeit('tuple_creation(2)', setup='from __main__ import tuple_creation', number=number)
t_create_list = timeit.timeit('list_creation(2)', setup='from __main__ import list_creation', number=number)
print(t_create_tuple)
print(t_create_list)
print(t_create_tuple / t_create_list)

This one is a bit trickier to time (and I'm happy to take any better ideas for how to time it in the comments). The gist of this is that on average (and on my computer), a tuple takes about 60% as long to create as a list does. However, foo_tuple() takes on average about 40% of the time that foo_list() takes. That shows that we really do gain a little bit of a speedup from these interns. The time savings seem to increase as the tuple gets larger (creating a longer list takes longer, while the tuple "creation" takes constant time since the tuple was already created).
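Another way to see why foo_tuple() is cheaper is to look at the bytecode instead of the clock. The sketch below uses the standard dis module; the helper opnames is made up for the example, and the exact instruction names vary between CPython versions. The general picture is that the list function contains an instruction that builds a list on every call, while the tuple function only loads a ready-made constant.

```python
import dis

def foo_tuple():
    return (2, 3, 4)

def foo_list():
    return [2, 3, 4]

def opnames(func):
    """Names of the bytecode instructions CPython compiled for func."""
    return [ins.opname for ins in dis.get_instructions(func)]

print(opnames(foo_tuple))
print(opnames(foo_list))

# A fresh list is built per call, so two results are never the same object.
print(foo_list() is foo_list())  # False
```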

Also note that I've called this "interning". It actually isn't (at least not in the same sense that strings are interned). We can see the difference in this simple script:

def foo_tuple():
    return (2,)

def bar_tuple():
    return (2,)

def foo_string():
    return 'foo'

def bar_string():
    return 'foo'

print(foo_tuple() is foo_tuple())  # True
print(foo_tuple() is bar_tuple())  # False

print(foo_string() is bar_string())  # True

We see that the strings are really "interned": different invocations using the same literal notation return the same object. The tuple "interning" seems to be specific to a single line (more precisely, to the code compiled for a single function). Both results are CPython implementation details and may differ between versions.
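If you need string interning you can rely on, rather than whatever the compiler happens to do with literals, the standard library offers sys.intern. A minimal sketch, with the strings built at run time so that no compile-time merging is involved:

```python
import sys

# Build two equal strings at run time so the compiler cannot merge them.
parts = ["hello", " ", "world"]
a = "".join(parts)
b = "".join(parts)

print(a == b)                          # True
print(sys.intern(a) is sys.intern(b))  # True
```

Whether `a is b` holds before interning is an implementation detail, which is why the example does not depend on it.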
