A list (in CPython) is backed by an array that is at least as long as the list and may be up to roughly twice as long. If the array isn't full, appending is just an assignment into the next free slot (O(1)). When the array is full, it is automatically grown by a constant factor (roughly doubled). This means an O(n) operation is occasionally required, but only about once every n appends, and it becomes increasingly rare as the list gets big. O(n) / n ==> O(1), i.e. the cost per append is amortized constant. (In other implementations the names and details could differ, but the same time properties are bound to be maintained.)
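You can watch the over-allocation happen with `sys.getsizeof`: the reported size only jumps occasionally, because most appends just fill spare capacity. (A rough sketch; the exact growth pattern is a CPython implementation detail and varies between versions.)

```python
import sys

# Record the allocated size of the list after each append.
lst = []
sizes = []
for i in range(64):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))

# Count how many appends actually changed the allocation,
# i.e. triggered a reallocation of the backing array.
reallocations = sum(1 for a, b in zip(sizes, sizes[1:]) if b != a)
print(f"{len(lst)} appends, only {reallocations} reallocations")
```

Most of the 64 appends reuse existing capacity; only a handful force a resize.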
Appending to a list already scales.
Is it possible that when the file gets big you can no longer hold everything in memory, and you're running into the OS paging to disk? Or that it's a different part of your algorithm that doesn't scale well?
Thanks for clarifying ~ so once in a while, I have an O(n) operation, but the next time the loop happens, it is back to the usual? I will do a detailed timing analysis of my code in the next few days and post again.
Yes, the O(n) operations only occur occasionally. The number of appends between them grows at the same rate as n, so it averages out to having only a constant effect.
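To make that concrete, here is a toy dynamic array that doubles when full and counts the total copy work done by resizes. (This is a simplified model for the argument above, not CPython's actual growth policy, which over-allocates by a smaller factor.)

```python
# Toy dynamic array: doubles capacity when full, and tallies how many
# element copies all the resizes cost in total.
class DynArray:
    def __init__(self):
        self.capacity = 1
        self.length = 0
        self.copies = 0  # total elements moved during resizes

    def append(self, x):
        if self.length == self.capacity:
            # Resize: copy every existing element into a bigger array.
            self.copies += self.length
            self.capacity *= 2
        self.length += 1

arr = DynArray()
n = 1_000_000
for i in range(n):
    arr.append(i)

# Total copy work is bounded by 1 + 2 + 4 + ... < 2n,
# so the average cost per append is a small constant.
print(arr.copies, arr.copies / n)
```

The total resize cost stays below 2n for n appends, which is exactly the amortized O(1) claim.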
256 gigabytes of RAM, or 128, or 64, but nothing lower than 64 gigabytes. For example:

    top - 02:36:31 up 36 days, 11:21,  7 users,  load average: 0.84, 0.31, 0.11
    Tasks: 274 total,   2 running, 272 sleeping,   0 stopped,   0 zombie
    Cpu(s):  6.2%us,  0.1%sy,  0.0%ni, 93.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Mem:  132370600k total,   7819100k used, 124551500k free,    481084k buffers
    Swap:  2031608k total,      3780k used,   2027828k free,   5256144k cached
Thanks Mike ~ below you mention that turning gc off helped you ~ are there any side effects to that? When should I turn it back on?