I found the alternative easier to read. I read it once. I had to scan yours twice. I'd say yours is a borderline simple case.
Also, I think part of the difference shows up when you're dealing with large amounts of code in maintenance. I'd much rather read something dead simple (if somewhat verbose) than something that makes me think at all (another example, from the Ruby world, would be the use of `unless` with a negated condition).
Like anything that has to do with taste, to each their own (I prefer a vinegary bbq while you might like a smokier bbq)
I didn't like list comprehensions when I first encountered them, but after getting accustomed to them, I now strongly prefer to write and read a comprehension over a for-in loop.
    result = [(x, y) for x in range(10)
                     for y in range(5)
                     if x * y > 10]
Anyway, I'd call it a simple case.
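For reference, the loop version being compared (it isn't quoted in this thread) is presumably something along these lines, with the nesting spelled out explicitly:

```python
# Hypothetical reconstruction of the for-loop form of the same computation:
# produces the same list as the comprehension above.
result = []
for x in range(10):
    for y in range(5):
        if x * y > 10:
            result.append((x, y))
```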
Also, it has nice semantics - the second example has a clear execution order. This one doesn't - from the program's perspective it can all happen at once, and unless you make assumptions about the order in which the items are calculated (as opposed to returned), they don't all need to be ready before you start iterating over them. If you assume it can all happen at once, list comprehensions map neatly onto parallel computations.
So, it shouldn't be that hard to optimize something like:
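(The concrete example seems to have been lost from this comment, but given the mention of 'sin' below, the kind of comprehension meant is presumably one with an expensive, side-effect-free expression per item, e.g.:)

```python
import math

# Hypothetical example: every element is independent of the others,
# so in principle the per-item work could be farmed out in parallel.
values = [math.sin(x / 1000.0) for x in range(10000)]
```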
"the second example has a clear execution order. This one doesn't - it can all happen at once, from the program's perspective"
I'm not sure what you're getting at here. With Python list comprehension semantics, those are both the same, including execution order. I don't see any ambiguity or need for "assumptions about the order of results". Am I missing something?
Optimization of Python list comprehensions to run on a GPU would take some serious mojo to ensure independence of each clause. Not an impossible amount, but certainly not trivial, especially as you move beyond calling 'sin'.
Sorry. I made some small clarifications regarding order (the original post got a little mangled during editing, before my coffee kicked in).
When you use a list comprehension, you may (or may not) care about the order of the resulting items, but your program is completely shielded from the order in which the resulting list is calculated - unless your function affects global state while the LC is being evaluated and each evaluation depends on the state changed by the previous one. You can't insert a print in the outer loop, for instance, unless you explicitly nest the LCs.
This isn't Scala. In Python, list comprehensions are single-threaded and have a very specific meaning (i.e., their computation order is deterministic). Parallel maps and the like are handled with separate map functions (see concurrent.futures.Executor.map).
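A minimal sketch of that stdlib API (the function name is made up; the pattern is the point): when you want parallel evaluation, you ask for it explicitly, and `Executor.map` still hands results back in input order even if the calls finish out of order:

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

# Explicit parallel map: results come back in the order of the inputs,
# regardless of the order in which the worker threads finish.
with ThreadPoolExecutor(max_workers=4) as pool:
    squares = list(pool.map(square, range(10)))
```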
Even if the result is in a certain sequence, why is the calculation required to be done sequentially?
With the map builtin deprecated because of LCs, wouldn't it make sense to exploit concurrent-like behavior with the nicer LC syntax? Executing strictly in order makes a lot of sense for generators, but not so much for LCs, and I've never seen code whose correctness depends on strictly ordered side effects.
Obviously, being that much multiprocessor-friendly makes little sense under a GIL, but CPython is not the only implementation of Python.
In general, Python is too flexible to make that easy. While you may never have seen a LC that depends on side-effect order (and I believe that; I don't think I have either), the compiler can't assume that, and it doesn't. Almost anything in Python could have a side effect, even though the culture is that it probably shouldn't. It turns out, if you dig into it, that this problem is ground deeply into the language.
Python is and probably ever shall be my favorite language of the OO-imperative 20th century/first decade of the 21st century style, but it will not be making the leap to the next generation of languages. And I sort of hope it doesn't even try; better to be the best of breed imperative-OO than a half-assed hybrid that does nothing well.
All that's needed to fix this is to place a "there is no guarantee the items of the list comprehension will be calculated sequentially one at a time" warning in the documentation. We don't need to change CPython to clear the way for other implementations.
This is also convenient for when you want to pay back technical debt and fix some of the old TODOs: you can search the codebase for those with your name on them.
Seems to be a Google-ism. I haven't worked there, but all the ex-Googlers I've worked with do this. It also makes it easy to grep for all your TODOs in a tree, something you can't do with cvs/svn/git annotate.
I hate those. Unless you're familiar with the whole team, you don't know who qznc is. Git/hg/svn blame will tell you that more precisely. Inserting your name in the comment is only slightly less annoying than people insisting on putting "author" comments on top of a file (or even a function).
If his username matches his email, or if he uses it consistently - maybe. But he probably wrote that note ages ago, someone else has rewritten the function since then and left the comment because the content is still valid - i.e., you're contacting the wrong person.
In short: metadata which is not updated automatically is most likely not up to date, and may not be correct at all.
Interestingly, the Python code I've seen from Google uses 2-space indents rather than 4 as the style guide recommends. And that includes code written by Guido himself (AppStats and NDB, tools used in App Engine). I prefer 2 spaces as well, and I was hoping that the official style guide would match what's being used most commonly.
I don't mean to start yet another tabs vs spaces debate, but I've always felt spaces dictate how others see the code while tabs allowed others to see code however they like (2/4/8 spaces).
One exception is some projects are strict about lining up code properly in multi-line statements, and spaces are more consistent in that respect.
I prefer tabs, but most Python code I've seen has been 4 space indents.
I used to prefer tabs (fewer bytes, better abstraction, etc.), but I've long given up on it. Most tools are not smart enough to switch mode when required, and reformatting entire files every time you want to make a small change to somebody else's code is a real pain (and dangerous).
Sublime Text 2 can do it no problem, back and forth, and has visual indicators of indent levels regardless of whether it's spaces or tabs, at 2, 4 or 8 columns.
Why oh why the 80-character limit? It's the 21st century, screens are huge! I'm not saying let's put the limit at 300, but 100 or 120 is good enough to fit side-by-side diffs on one screen.
1 - yes, screens are huge, but it doesn't mean people can/will use small fonts or will scroll the screen
With today's big/wide screens it's more useful to have code side by side.
2 - Abuse. The 80 character limit is a pretty good indicator that you should be doing something else instead of having your code go over 80 characters.
Long lines are confusing, and you most likely can split the logic in several lines, facilitating maintenance.
I see both points but regarding (2), there are common cases where you are not necessarily doing anything wrong but the 80 character limit will make your code less readable, especially when using four spaces or more for indentation. For example you might end up in the fourth level of indentation wanting to write a list comprehension that would be perfectly readable in one line but have to break it down because of this rather short hard limit.
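A contrived sketch of that situation (all names are made up): inside a class, a method, a loop, and a condition, a comprehension that would fit on one line at top level has to be wrapped to stay under 80 columns:

```python
from collections import namedtuple

# Hypothetical data model, just to make the indentation depth realistic.
Cell = namedtuple("Cell", "value weight")
Section = namedtuple("Section", "visible cells")

class Report:
    def totals(self, sections):
        out = []
        for section in sections:
            if section.visible:
                # Fits on one line at top level, but not at this depth.
                out.append(sum(cell.value * cell.weight
                               for cell in section.cells
                               if cell.value is not None))
        return out
```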
I think having a soft and a hard limit makes more sense, if anything, I would make 80 characters the soft limit and perhaps 100 a hard limit, although I'd prefer them to be 100 and 120.
"Might" being the key word. You might, if you are writing a simple script, but if you are writing some more complex code, class and method already take two levels, and that is your base line, I don't see how having two more levels is a readability problem.
I find that limiting line length to 80 characters causes some programmers to use overly terse variable names. This hurts code readability and can hide defects.
I've recently started developing in Python (after developing in many other languages) and I thought I would find the 80-character limit from PEP 8 a problem.
However, I've actually found it to be a good thing - I'm sure it aids readability.
My point is that having 80 characters as a hard limit sometimes imposes line breaks that negatively affect readability. My mention of screen sizes comes from the fact that the 80-character limit was originally set to fit code in an 80x24 character terminal without non-explicit line wraps.
Much as I like the general consistency of Python code with or without formal style guides, I prefer Go's "style guide" even more, which is to just run "go fmt": http://golang.org/cmd/go/#Run_gofmt_on_package_sources.
"gofmt -w" actually rewrites the files. I'd love a Python formatter which applies the robotic parts of PEP-8 automatically so you could focus on the parts which require more thought.
env is nothing more than running it in a new shell, so slightly slower for each exec (since it starts a shell and then searches through PATH), it may pick the wrong interpreter (because PATH depends on the user and a lot of other things outside of your control), and since this is all running on Google infrastructure, they know what /usr/bin/python is.
their goal isn't to write portable code, it is to write fast code that runs on google servers
One significant difference: when running python from a shell there's a fork() and an exec(). env(1) doesn't fork: there are never two processes. (In other words: the shell sticks around when you run a command, whereas env replaces itself with it.)
There is obviously still some overhead to using env (and I just learnt from the source that env has argument processing). I tried to replicate AncientPC's test but on my machine both invocations take around 0.015s. (Perhaps their username is an indication as to why they see a (2%!) difference...).
But okay. What I really came here to say is: security. You can make all the efforts in the world to ensure that all programs get called with full pathnames but then one env shebang and you're suddenly open to running whatever's first in the user's $PATH and happens to call itself "python".
That last point, which I forgot to mention, is important, especially on operating systems that have '.' first in PATH.
I notice the difference anecdotally, without measuring it, but I don't know if that's confirmation bias because I know env should be slower.
The best way would be an autoconf script in your package and an install run that finds and verifies the local framework.
I have to admit that I have never done this though. I have a few Python scripts with decent distribution and just rely on the direct path (and a batch file for win32)
I've been using this handle since '93 out of inertia. My tests were done on an idle i7-620M, sequentially.
If your invocation takes about 0.015s, that means you're not looping enough for differences to appear. A 2% increase over 0.015s is 0.0153s, which is invisible once the trailing significant digits are cut off.
So, if your PATH gets changed, say, by something that just leapt out of your web browser's sandbox, that's the python that'll run your scripts, right?
I wasn't referring to implementation really, but the principle.
The main python could be in /bin or /usr/bin or possibly /usr/local/bin. The poor user or developer shouldn't be required to anticipate where and should not need to resort to hacks such as env.
The Google style guide seems to match up with PEP 8 pretty closely, from my brief review of it. Better yet, it actually includes explicit guidance on things like 'how to name local variables and class properties' which PEP 8 is mysteriously silent on.
I commented about this on another post. There's a nice explanation here as to why using mutable objects as default values in function/method definitions is bad:
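The short version of that explanation: a default value is evaluated once, at function definition time, so a mutable default is shared across every call. A minimal illustration (hypothetical names):

```python
def append_bad(item, bucket=[]):
    # The [] above is created once, when the def runs,
    # and the same list object is reused on every call.
    bucket.append(item)
    return bucket

def append_good(item, bucket=None):
    # The idiomatic fix: use None as a sentinel and
    # build a fresh list inside each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```

Calling `append_bad(1)` returns `[1]`, but a second call `append_bad(2)` returns `[1, 2]`: the "empty" default still holds the previous call's item.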
> Never use catch-all except: statements, or catch Exception or StandardError, unless you are re-raising the exception or in the outermost block in your thread (and printing an error message). Python is very tolerant in this regard and except: will really catch everything including Python syntax errors. It is easy to hide real bugs using except:.
What kind of SyntaxErrors are caught by the except: handler? Not all, I presume:
    try:
        a b
    except:
        pass
This fails with SyntaxError on Python 2.7.2 on my machine.
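Right: the literal `a b` fails at compile time, before the `try` ever runs. The guide's warning applies when the SyntaxError is raised at runtime, e.g. from `eval` or `exec` (a small sketch, with a made-up function name):

```python
def run_snippet(src):
    try:
        eval(src)        # a SyntaxError from bad src is raised here, at runtime
    except:              # so the bare except really does swallow it
        return "caught"
    return "ok"
```

Here `run_snippet("a b")` returns "caught" while `run_snippet("1 + 1")` returns "ok".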
I don't like Google's import style, and I have noticed it in a lot of their code. They nicely namespace all of their packages and modules only to dump methods and classes into a single namespace when being imported and used.
For example:

    from sound.effects import echo
    echo.EchoFilter(input, output)
What happens is that you then end up importing all of these methods, and very quickly you start getting name conflicts. Let's say you want to support a third-party echo function:
    from sound.effects import echo
    from vendor.soundutil.effects import echo as soundutil_echo
You see this all the time in SDK and web API packages. Dozens of modules called 'auth' (which auth? twitter? facebook?) or 'oauth' or 'request'.
Let's say you have a user page that integrates with social networks. Would you rather:

    from facebook.api.auth import auth
    from twitter.api.auth import auth as twitter_auth
You end up doing 'import as' hacks and a lot of renaming. Code is a lot clearer to read when you see full paths such as facebook.api.auth rather than just 'auth' and 'echo' everywhere. You also don't lose documentation paths.
My general rule of thumb is to use 'from' infrequently, never do import *, retain the part of the path that keeps the namespacing sane and clear to the developer, and, as the doc says, never do relative imports.
It means you can scan any part of the code and understand what is going on without going back up to the top of the file. It also makes search/replace easier (s/sound.effects.echo/mynewpackage.echo/ is much safer than s/echo/echo_new/).
The other one I didn't see mentioned is nesting levels and method lengths. Python isn't well suited to deep-nested and long methods. Especially if your coding style is to comment out blocks of code during development as you test things, you always end up commenting out parts and then having to re-indent the rest of it.
The same usually applies if you have long 'and' 'or' clauses in ifs that span multiple lines and make it harder to understand the code. I usually wrap those tests into separate methods (if you are using them once, you will probably use them again)
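A sketch of what that wrapping looks like (the names are made up): the multi-line condition gets a descriptive name, and the call site reads like a sentence:

```python
def _is_eligible(user):
    # One readable name instead of a three-clause if in the caller.
    return (user["active"]
            and user["age"] >= 18
            and not user["banned"])

def grant_access(user):
    if _is_eligible(user):
        return "granted"
    return "denied"
```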
For nesting, I try to stick to 2 levels max. Going beyond that is usually a hint that you can refactor the code path, perhaps even separating it out into another method.
I happened to be doing this a few hours ago while writing an option and argument parser for a command-line utility that has sub-commands. A quick refactor made the code, all the different options, and which options apply to which sub-commands a lot easier to understand.
Edit: further on breaking up code and bounds checking into methods: it makes life easier for other developers and for your future self. There is nothing more exhausting than trying to debug a module and finding a three-page-long method called 'run', which you end up having to break down yourself anyway. Separate all the bounds checking into one- or two-line methods, break everything else up, document it, write some tests for it, and then forget about it: that part is done and it works. Get on with the important things.
Checking nesting levels and method lengths is almost something I would want to put in a linter.
I find it's easier to have short names, defined at the beginning, rather than clogging up my code with full.path.to.a.module. I imagine preferences vary.
What I don't understand is why they prefer:
from thepackage.subpackage import amodule
over
import thepackage.subpackage.amodule as amodule
I always found the second to be more clear, since it doesn't mix the idea of package-resolution with the idea of picking-things-out-of-a-module. On the down side, I end up typing "amodule" twice.
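For a standard-library module, the two spellings bind the same module object, so it really is a question of style rather than behavior:

```python
from os import path as p1    # the from-import form
import os.path as p2         # the dotted import-as form

# Both names refer to the very same module object.
assert p1 is p2
```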
It can get confusing when you import a lot of things, but for quick prototyping I prefer to just do `from package.this import that` and then be able to `that.execute(element)` instead of having to type `package.this.that.execute(element)` every single time (which already takes up 34 characters of the 80-char limit).
Personally, in my own code I also like to put pass at the end of all blocks. It's more consistent, and it also makes auto indent in the editors work properly. Anyone else?
Really, Google? I find the following far more convenient to read:
Than the alternative: