Google's Python style guide (googlecode.com)
130 points by cr4zy on April 19, 2012 | hide | past | favorite | 86 comments


"List Comprehensions: Ok to use for simple cases (only)": http://google-styleguide.googlecode.com/svn/trunk/pyguide.ht...

Really, Google? I find the following far more convenient to read:

    result = [(x, y) for x in range(10) for y in range(5) if x * y > 10]
Than the alternative:

    result = []
    for x in range(10):
        for y in range(5):
            if x * y > 10:
                result.append((x, y))


I found the alternative easier to read. I read it once; I had to scan yours twice. I'd say yours is a borderline simple case.

Also, I think part of the difference can be found when you're dealing with large amounts of code in maintenance. I'd much rather read something dead simple (if somewhat verbose) than something that makes me think at all (another example here, in the Ruby world, would be the use of !unless).

Like anything that has to do with taste, to each their own (I prefer a vinegary BBQ while you might like a smokier BBQ).


I didn't like list comprehensions when I first encountered them, but after getting accustomed to them, I now strongly prefer to write and read a comprehension over a for-in loop.


I agree, but it depends on how complex the operation within the comprehension is.


It would be more readable to write it as:

    result = [(x, y) for x in range(10) 
                     for y in range(5) 
                     if x * y > 10]
Anyway, I'd call it a simple case.

Also, it has nice semantics - the second example has a clear execution order. This one doesn't - it can all happen at once, from the program's perspective and unless you make assumptions about order in which the items are calculated (as opposed to returned), they don't even need to be all ready before you start iterating on them. If you assume it can happen at once, the list comprehensions can be neatly mapped to parallel computations.

So, it shouldn't be that hard to optimize something like:

    data = [sin(x) for x in arange(0, pi, pi/20)]
to run on a GPU.

edit: small clarifications


"the second example has a clear execution order. This one doesn't - it can all happen at once, from the program's perspective"

I'm not sure what you're getting at here. With Python list comprehension semantics, those are both the same, including execution order. I don't see any ambiguity or need for "assumptions about the order of results". Am I missing something?

Optimization of Python list comprehensions to run on a GPU would take some serious mojo to ensure independence of each clause. Not an impossible amount, but certainly not trivial, especially as you move beyond calling 'sin'.


Sorry. I made some small clarifications regarding order (the original post got a little mangled during editing, before my coffee kicked in).

When you use a list comprehension, you may (or may not) care about the order of the resulting items, but your program is completely shielded from the order in which the resulting list is calculated - unless your function affects global state while the LC is being evaluated and each evaluation depends on the state changed by the last one. You can't insert a print in the outer loop, for instance, unless you explicitly nest the LCs.


This isn't Scala. In python list comprehensions are single threaded and have a very specific meaning (i.e., their computation order is deterministic). The pattern for working with parallel maps etc are done with separate map functions (see concurrent.futures.Executor.map).
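As a sketch of that pattern (not from the thread; the ThreadPoolExecutor choice and the inputs are illustrative), the sequential comprehension and its explicitly parallel counterpart might look like:

```python
from concurrent.futures import ThreadPoolExecutor
from math import sin

xs = [i * 0.1 for i in range(20)]

# Deterministic, single-threaded: a plain list comprehension.
sequential = [sin(x) for x in xs]

# Explicitly parallel: Executor.map keeps results in input order,
# even though the underlying calls may run concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(sin, xs))

assert sequential == parallel
```

The point stands: the parallelism lives in a separate, explicit API rather than being inferred from comprehension syntax.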


Even if the result is in a certain sequence, why is the calculation required to be done sequentially?

With the map builtin discouraged in favor of LCs, wouldn't it make sense to exploit concurrent-like behavior with the nicer LC syntax? It makes a lot of sense to execute strictly in order for generators, but it doesn't make that much sense for LCs, and I've never seen code whose correctness depends on strictly ordered causation of side effects.

Obviously, being that much multiprocessor-friendly makes little sense under a GIL, but CPython is not the only implementation of Python.


In general, Python is too flexible to make that easy. While you may never have seen a LC that depends on side-effect order (and I believe that; I don't think I have either), the compiler can't assume that, and doesn't. Almost anything in Python could have a side effect, even though the culture is that it probably shouldn't. It turns out, if you dig into it, that this problem is ingrained deeply in the language.

Python is and probably ever shall be my favorite language of the OO-imperative 20th century/first decade of the 21st century style, but it will not be making the leap to the next generation of languages. And I sort of hope it doesn't even try; better to be the best of breed imperative-OO than a half-assed hybrid that does nothing well.


All that's needed to fix this is to place a "there is no guarantee the items of the list comprehension will be calculated sequentially one at a time" warning in the documentation. We don't need to change CPython to clear the way for other implementations.


Your first example is easier for me to read, but only in the case that I don't care about the order of (x, y) points.

If I care that (3, 6) comes before (4, 4), I find it much easier to ascertain that from the form that uses indentation for nesting loops.


List comprehensions iterate in the same order as the for loops. If it helps, stick a mental newline in there :P

    [(x, y) for x in range(10)
    for y in range(5)...]


Not only that, try running dis() on each of those and seeing what bytecode gets executed...


Right. List comprehensions even perform better.
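For the curious, a sketch of that comparison (the function names are made up for illustration):

```python
import dis

def with_loop():
    result = []
    for x in range(10):
        for y in range(5):
            if x * y > 10:
                result.append((x, y))
    return result

def with_comprehension():
    return [(x, y) for x in range(10) for y in range(5) if x * y > 10]

# The loop version repeatedly looks up and calls the bound method
# result.append; the comprehension compiles to a dedicated LIST_APPEND
# instruction, which is part of why it tends to be faster.
dis.dis(with_loop)
dis.dis(with_comprehension)

assert with_loop() == with_comprehension()
```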


I like the idea to include a name with TODOs, like

  # TODO(qznc) check Unicode handling
This never occurred to me, but it provides a good pointer to whom to ask for details. On the other hand, git-blame should be able to provide the same info.


This is also convenient for when you want to pay back technical debt and fix some of the old TODOs: you can search the codebase for those with your name on them.


Seems to be a Google-ism. I haven't worked there, but all the ex-Googlers I've worked with do this. It also makes it easy to grep for all your TODOs in a tree, something you can't do with cvs/svn/git annotate.
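That grep could be sketched in Python, too (the .py filter and path handling here are assumptions, not anything from the guide):

```python
import os
import re

def find_todos(root, username):
    """Yield (path, line_number, text) for TODO(username) comments under root."""
    pattern = re.compile(r"TODO\(%s\)" % re.escape(username))
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path) as f:
                for lineno, line in enumerate(f, 1):
                    if pattern.search(line):
                        yield path, lineno, line.strip()
```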


> something you can't do with cvs/svn/git annotate.

Challenge accepted:

  for i in $(git grep -l TODO); do git blame -f $i |grep TODO |grep "$NAME"; done
Granted, it can be long on a big codebase, but does the trick.


If you're working with a Git repository, there's always `git blame` (http://schacon.github.com/git/git-blame.html).


I prefer something like

    #TODO #4001 needs more bananas
Where 4001 references something in our bug tracker (task, feature, issue, bug, etc.)


I hate those. Unless you're familiar with the whole team, you don't know who qznc is. Git/hg/svn blame will tell you that more precisely. Inserting your name in the comment is only slightly less annoying than people insisting on putting "author" comments at the top of the file (or even a function).


I guess you could use "git blame" to find the name and email of the person who added that line?


And you don't have an easy way to find out who he is? Or mail him based on his username?


If his username matches his email, or if he uses it consistently - maybe. But he probably wrote that note ages ago; someone else has rewritten the function since then and left the comment because the content is still valid - i.e., you're contacting the wrong person.

In short: metadata which is not updated automatically is most likely not up to date, and may not be correct in general.


You can configure the linter to check for valid username.


Interestingly, the Python code I've seen from Google uses 2-space indents rather than 4 as the style guide recommends. And that includes code written by Guido himself (AppStats and NDB, tools used in App Engine). I prefer 2 spaces as well, and I was hoping that the official style guide would match what's being used most commonly.


I don't mean to start yet another tabs vs spaces debate, but I've always felt spaces dictate how others see the code, while tabs allow others to see the code however they like (2/4/8 spaces).

One exception is some projects are strict about lining up code properly in multi-line statements, and spaces are more consistent in that respect.

I prefer tabs, but most Python code I've seen has been 4 space indents.


I used to prefer tabs (fewer bytes, better abstraction, etc.), but I've long since given up on it. Most tools are not smart enough to switch mode when required, and reformatting entire files every time you want to make a small change to somebody else's code is a real pain (and dangerous).

Community consensus is 4-spaces and that's it.


Yeah

I used to prefer tabs as well, and I still stand by it.

But you can always configure your editor to use 4 spaces. And 4 spaces looks like 'less wasted space'.

I'll keep using tabs for some things, and 4 spaces for most professional projects.

About 2 spaces I'll just say one thing: NO


Sublime Text 2 can do it no problem, back and forth, and has visual indicators of tab levels regardless of whether it's spaces or tabs, 2 columns, 4 columns or 8 columns.


One easy way to break your brain is to work at Google... on an open source Python project. Two, four... two, four...

It could be far worse.


This is very weird, because the internal style guidelines say 2 spaces, not 4 spaces. This bugs me quite a lot.


They used to recommend 2 spaces (you can probably check the page with archive.org) but have switched to 4.


Why oh why the 80-character limit? It's the 21st century, screens are huge! I'm not saying let's put the limit at 300, but 100 or 120 is good enough to fit side-by-side diffs on one screen.


There are two issues here:

1 - Yes, screens are huge, but that doesn't mean people can or will use small fonts or scroll the screen horizontally.

With today's big/wide screens it's more useful to have code side by side.

2 - Abuse. The 80-character limit is a pretty good indicator that you should be doing something else instead of letting your code run past 80 characters.

Long lines are confusing, and you can most likely split the logic across several lines, facilitating maintenance.


I see both points but regarding (2), there are common cases where you are not necessarily doing anything wrong but the 80 character limit will make your code less readable, especially when using four spaces or more for indentation. For example you might end up in the fourth level of indentation wanting to write a list comprehension that would be perfectly readable in one line but have to break it down because of this rather short hard limit.

I think having a soft and a hard limit makes more sense, if anything, I would make 80 characters the soft limit and perhaps 100 a hard limit, although I'd prefer them to be 100 and 120.


If you're in the fourth level of indentation, you might already have a readability problem.


"Might" being the key word. You might, if you are writing a simple script, but if you are writing more complex code, a class and a method already take two levels, and that is your baseline. I don't see how having two more levels is a readability problem.


I find that limiting line length to 80 characters causes some programmers to use overly terse variable names. This hurts code readability and can hide defects.


I've recently started developing in Python (after developing in many other languages) and I thought I would find the 80-character limit from PEP 8 a problem.

However, I've actually found it to be a good thing - I'm sure it aids readability.


Because it's easier for a human to read. It has nothing to do with prevailing screen sizes.


My point is that having 80 characters as a hard limit sometimes imposes line breaks that negatively affect readability. My mention of screen sizes comes from the fact that the 80-character limit was originally set to fit code in an 80x24 character terminal without non-explicit line wraps.


Depending slightly on your font size, 3 80-column lines fit side-by-side almost perfectly on a 30" monitor.


Much as I like the general consistency of Python code with or without formal style guides, I prefer Go's "style guide" even more, which is to just run "go fmt": http://golang.org/cmd/go/#Run_gofmt_on_package_sources.


You could just run pep8, you know.


I'll continue putting spaces around the "="s in keyword arguments and default parameter values, regardless of what PEP-8 says.
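For context (the function names here are made up), PEP 8's rule is the opposite of that preference:

```python
# PEP 8 style: no spaces around "=" for keyword arguments and defaults.
def make_point(x, y=0.0):
    return (x, y)

pep8_style = make_point(1.0, y=2.0)

# The style the commenter prefers, which PEP 8 rejects:
def make_point2(x, y = 0.0):
    return (x, y)

spaced_style = make_point2(1.0, y = 2.0)

assert pep8_style == spaced_style == (1.0, 2.0)
```

Both parse identically, of course; the difference is purely stylistic.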


"gofmt -w" actually rewrites the files. I'd love a Python formatter which applies the robotic parts of PEP-8 automatically so you could focus on the parts which require more thought.


Weird. The recommendation for a shebang line is to use #!/usr/bin/python

That will definitely break stuff. Why not #!/usr/bin/env python?


env is nothing more than running it in a new shell, so it's slightly slower for each exec (since it starts a shell and then searches through PATH), and it may pick the wrong interpreter (because PATH depends on the user and a lot of other things outside of your control). And since this is all running on Google infrastructure, they know what /usr/bin/python is.

Their goal isn't to write portable code; it is to write fast code that runs on Google servers.


I wrote a few small scripts to test this out. Each contained a simple print statement; I looped each 5000 times and timed it.

There is ~2.1% overhead when using env.

This doesn't matter in most cases, but in a Google-sized company with standardized environments it's worth it.


Sorry, but I fail to see where the "env" command spawns a shell:

http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/env....

Could you please explain what I am missing?


What I mean is that it is just like running 'python' in a shell, not that each instance spawns a new one.


> like

One significant difference: when running python from a shell there's a fork() and an exec(). env(1) doesn't fork: there are not two processes. (In other words: the shell does not exit when you run a command, env vanishes).

There is obviously still some overhead to using env (and I just learnt from the source that env has argument processing). I tried to replicate AncientPC's test but on my machine both invocations take around 0.015s. (Perhaps their username is an indication as to why they see a (2%!) difference...).

But okay. What I really came here to say is: security. You can make all the efforts in the world to ensure that all programs get called with full pathnames but then one env shebang and you're suddenly open to running whatever's first in the user's $PATH and happens to call itself "python".

EDIT: eg http://portaudit.freebsd.org/d42e5b66-6ea0-11df-9c8d-00e0815...


That last point is important, and I forgot to mention it - especially in operating systems that have '.' first in PATH.

I notice the difference anecdotally, without measuring it, but I don't know if that is confirmation bias because I know env should be slower.

The best way would be an autoconf script in your package and an install run that finds and verifies the local framework.

I have to admit that I have never done this though. I have a few Python scripts with decent distribution and just rely on the direct path (and a batch file for win32)


I've posted my full test scripts and results here: http://williamting.com/2012/04/19/performance-decrease-using...

I've been using this handle since '93 out of inertia. My tests were done on an idle i7-620M, sequentially.

If your invocation takes about 0.015s, you're not looping enough for differences to appear. A 2% increase over 0.015s is 0.0153s, which disappears once significant digits are cut off.


Maybe because it's a hack to get around the following not working?

#!python


Bad idea. Which python (or program named "python") will run your script?


The first python in your path would run the script.


So, if your PATH gets changed, say, by something that just leapt out of your web browser's sandbox, that's the python that'll run your scripts, right?


Yes, that's how it would work, same as with env.


Yes; #!python should be made to work.

Not to use the user's PATH, but some system ordained python (via some kernel work).

env is a hack.


> via some kernel work

Kernels shouldn't be aware of the search path shells use.


I agree, as evidenced by the bit you didn't quote. (?!)


He may be referring to binfmt.


I wasn't really referring to the implementation, but to the principle. The main python could be in /bin, /usr/bin, or possibly /usr/local/bin. The poor user or developer shouldn't be required to anticipate where, and should not need to resort to hacks such as env.


Interesting they make no reference to PEP 8.


Yeah, especially since this old one does. http://code.google.com/p/soc/wiki/PythonStyleGuide#Naming


"Interesting" isn't the word I'd use - PEP 8 is pretty clearly ratified by the Python community (more than any other style) and yet Google Knows Best


The Google style guide seems to match up with PEP 8 pretty closely, from my brief review of it. Better yet, it actually includes explicit guidance on things like 'how to name local variables and class properties' which PEP 8 is mysteriously silent on.


Regarding this one:

http://google-styleguide.googlecode.com/svn/trunk/pyguide.ht...

I commented about this on another post. There's a nice explanation here as to why using mutable objects as default values in function/method definitions is bad:

http://effbot.org/zone/default-values.htm

In short, it can be bad to set default arguments to mutable objects because the function keeps using the same object in each call.
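The standard illustration of the gotcha, with the usual None-sentinel fix:

```python
def append_bad(item, items=[]):
    # The [] is evaluated once, at function definition time: every call
    # that omits the argument shares the same list object.
    items.append(item)
    return items

def append_good(item, items=None):
    # The recommended idiom: use None as a sentinel and build a fresh
    # list inside the function body on each call.
    if items is None:
        items = []
    items.append(item)
    return items

assert append_bad(1) == [1]
assert append_bad(2) == [1, 2]   # surprise: the default list persists
assert append_good(1) == [1]
assert append_good(2) == [2]
```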


That's a classic Python gotcha, mentioned in all books and tutorials nowadays.


> Never use catch-all except: statements, or catch Exception or StandardError, unless you are re-raising the exception or in the outermost block in your thread (and printing an error message). Python is very tolerant in this regard and except: will really catch everything including Python syntax errors. It is easy to hide real bugs using except:.

What kind of SyntaxErrors are caught by the except: handler? Not all, I presume:

  try:
    a b
  except:
    pass
This fails with SyntaxError on Python 2.7.2 on my machine.


That code is never executed because it fails to parse. I think you can catch SyntaxError if you import a file with broken syntax.
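As a sketch: when the parsing happens at runtime, e.g. via compile(), the resulting SyntaxError is an ordinary exception that a handler (bare or otherwise) can catch:

```python
# "a b" fails to parse, but because compile() is called at runtime the
# SyntaxError is raised as a normal exception and can be caught.
caught = False
try:
    compile("a b", "<string>", "exec")
except SyntaxError:
    caught = True

assert caught
```

The snippet in the parent comment fails differently because the broken code is part of the file being parsed, so the error occurs before anything runs.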


This would catch a SyntaxError raised by eval().

For example,

    try:
        eval(".")
    except:
        pass


I wonder why pyChecker and not pyLint or pep8. Anyone got an insight?



Fun fact: Google recommend against using map/reduce :)

http://google-styleguide.googlecode.com/svn/trunk/pyguide.ht...

Oh, not different map/reduce? ;)


One of the nice things about using map() is that multiprocessing.Pool.map() can be a drop-in replacement.

OTOH, map() does otherwise have a little overhead compared to a list comprehension.


I don't like Google's import style, and I have noticed it in a lot of their code. They nicely namespace all of their packages and modules only to dump methods and classes into a single namespace when being imported and used.

for eg.

    from sound.effects import echo
    echo.EchoFilter(input, output)
What happens is that you end up importing all of these methods, and very quickly you start getting name conflicts. Let's say you want to support a third-party echo function:

    from sound.effects import echo
    from vendor.soundutil.effects import echo as soundutil_echo
You see this all the time in SDK and web API packages. Dozens of modules called 'auth' (which auth? twitter? facebook?) or 'oauth' or 'request'.

Let's say you have a user page that integrates with social networks. Would you rather:

    from facebook.api.auth import auth
    from twitter.api.auth import auth as twitter_auth
etc. etc. or

    import facebook.api
    import twitter.api
    ..
    facebook.api.auth()
    twitter.api.auth()
You end up doing 'import as' hacks and a lot of renaming. Code is a lot clearer to read when you see full method names such as facebook.api.auth rather than just 'auth' and 'echo' everywhere. You also don't lose documentation paths.

My general rule of thumb is to use 'from' infrequently, to never do import *, to retain the part of the path that keeps the namespacing sane and clear to the developer, and, as the doc says, to never do relative imports.

It means you can scan any part of the code and understand what is going on without going back up to the top of the file. It also makes search/replace easier (s/sound.effects.echo/mynewpackage.echo/ rather than the riskier s/echo/echo_new/).

The other one I didn't see mentioned is nesting levels and method lengths. Python isn't well suited to deeply nested, long methods. Especially if your coding style is to comment out blocks of code during development as you test things, you always end up commenting out parts and then having to re-indent the rest.

The same usually applies if you have long 'and'/'or' clauses in ifs that span multiple lines and make the code harder to understand. I usually wrap those tests in separate methods (if you use them once, you will probably use them again).

But for nesting, I try to stick to 2 levels max. Going beyond that is usually a hint that you can refactor the code path and perhaps even separate it out into another method.

I happened to be doing this a few hours ago while writing an option and argument parser for a command-line utility that has sub-commands. A quick refactor made the code, all the different options, and which options apply to which sub-commands a lot easier to understand.

Edit: just further on breaking up code and moving bounds checking into methods: it makes life easier for other developers and for your future self. There is nothing more exhausting than trying to debug a module and finding a 3-page-long method called 'run', which you end up having to break down yourself anyway. Separate all the bounds checking into one- or two-line methods, break everything else up, document it, write some tests for it, and then forget about it - that part is done and it works. Get on with the important things.

Checking nesting levels and method length is almost something I would want to put in a linter.


I find it's easier to have short names, defined at the beginning, rather than clogging up my code with full.path.to.a.module. I imagine preferences vary.

What I don't understand is why they prefer:

    from thepackage.subpackage import amodule
over

    import thepackage.subpackage.amodule as amodule
I always found the second to be more clear, since it doesn't mix the idea of package-resolution with the idea of picking-things-out-of-a-module. On the down side, I end up typing "amodule" twice.


It can get confusing when you import a lot of things, but for quick prototyping I prefer to just do `from package.this import that` and then be able to call `that.execute(element)` instead of having to type `package.this.that.execute(element)` every single time (which already takes up 34 characters of the 80-char limit).


Wait, I thought Python WAS a style guide.


The last point is definitely the most important, and it applies to any language.


Personally, in my own code I also like to put a pass at the end of all blocks. It's more consistent, and it also makes auto-indent in editors work properly. Anyone else?


For an offhand comparison: Google's unofficial Ruby style guide:

http://www.caliban.org/ruby/rubyguide.shtml



