
Writing Clean, Testable, High-Quality Code in Python

Source: www.ibm.com   Published: 2010-12-22
Abstract: Extremely bad code can be written in any language, including the elegant and powerful Python. In this article we look at how different ways of thinking about testing lead to very different Python code, and then discuss how to measure those differences scientifically.

  Introduction

  Writing software is one of the most complex tasks a human being can undertake. Brian Kernighan, co-author of the AWK programming language and "K and R C", summed up the true nature of software development in the book Software Tools when he said, "Controlling complexity is the essence of computer programming." The harsh reality of real-world software development is that software often carries complexity, created intentionally or unintentionally, and developers often disregard maintainability, testability, and quality. The end result of this unfortunate situation is software that becomes increasingly difficult and expensive to maintain, and that fails occasionally, sometimes spectacularly.

  The first step toward writing high-quality code is to rethink the entire process by which an individual or a team develops software. In failed or troubled software projects, software is frequently developed in an undisciplined fashion, with developers focused on solving the problem by whatever means necessary. In successful projects, developers think not only about how to solve the problem at hand, but also about the process that solving it involves.

  Successful software developers run tests in a way that lends itself to automation, so they can continually prove that their software works. They understand the danger of unneeded complexity. They are disciplined in their approach, reviewing carefully at every stage and looking for opportunities to refactor. They regularly think about how to ensure their software is testable, readable, and maintainable. Although the designers of the Python language and the Python community care deeply about writing clean, maintainable code, it is still easy to end up with the opposite. In this article we explore that problem and discuss how to write clean, testable, high-quality code in Python.

  The best way to demonstrate this style of development is to work through a hypothetical problem. Suppose you are a back-end web developer at a company that lets users post comments, and you need a way to display and highlight small snippets of those comments. One approach is to write one big function that takes a snippet of text and a query parameter and returns a character-limited snippet with the query terms highlighted. All of the logic needed to solve the problem goes into one giant function, and you simply run the script over and over until you get the result you want. The code usually takes the shape of the example below, typically sprinkled with print or logging statements and driven from an interactive shell.

def my_mega_function(snippet, query):
    """This takes a snippet of text, and a query parameter and returns ..."""

    # Logic goes here, and often runs on for several hundred lines.
    # There are often deeply nested conditional statements and loops.
    # The function could reach several hundred, if not thousands, of lines.

    return result

  With a dynamic language such as Python, Perl, or Ruby, it is very easy for a software developer to focus purely on the problem itself, often exploring interactively until something appears correct, and then declare the task done. Unfortunately, convenient and seductive as this approach is, it often creates an illusion of completion, and that illusion is dangerous. The danger lies mainly in the fact that no testable solution has been designed and the complexity of the software has never been brought under control.

  How do you know this function works? It worked the last time you ran it during development, so you believe it works, but can you be sure there is no subtle error in its logic or syntax? What happens if the code needs to change? Does it still work? How do you verify that it still works? What if another developer has to maintain and modify the code? How can he be sure his changes break nothing? How hard will it be for him to understand what the code does?

  Simply put, without tests you do not know whether your software works. If you assume correctness throughout development instead of proving it, you can end up with code that appears to work but that nobody can be certain runs correctly. That is a terrible position to be in; I have written software like this, and I have helped debug software written this way. Fortunately, it is easy to avoid. Tests should be written first (as in test-driven development); otherwise the code drifts off target while the logic is being written. Writing the tests first yields modular, extensible code that is easy to test, understand, and maintain. To an experienced developer it is obvious whether software was written with testing constantly in mind; to a trained eye, the software itself looks dramatically different.
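
  To make the idea concrete, here is a minimal sketch of what writing the test first looks like. The summarize module and function below are hypothetical names invented for illustration; they are not part of the highlight module developed later in this article. The test is written before any implementation exists, fails at first, and drives the shape of the code:

import unittest

class TestSummarize(unittest.TestCase):

    def test_result_respects_character_limit(self):
        # Hypothetical module and function: this import fails until the
        # implementation is written, which is exactly the point of
        # writing the test first.
        from summarize import summarize
        result = summarize("One. Two. Three.", max_characters=10)
        self.assertTrue(len(result) <= 10)

if __name__ == '__main__':
    unittest.main()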

  You do not have to take my word for it, and you do not have to study the code directly; there are other ways to see the difference between the two styles clearly. The first is to actually measure how many lines of code are exercised by tests. Nose is a popular extension of Python's unit-testing framework that makes it easy to run batches of tests automatically, along with plugins such as code-coverage measurement. When you measure coverage during development, it quickly becomes apparent that test coverage anywhere near 100% is virtually impossible for code built from huge functions with deeply nested logic and constructed in a non-generalized way.

  The second way to measure the difference is to use static analysis tools. Several popular Python tools give developers a range of metrics, from general code-quality scores to specific measures such as duplicate code or complexity. You can measure the cyclomatic complexity of your code with pygenie or pymetrics (see Resources).

  Here is an example of running pygenie against fairly simple, "clean" code:
  Cyclomatic complexity output from pygenie

% python pygenie.py complexity --verbose highlight.py
File: /Users/ngift/Documents/src/highlight.py
Type Name                                                      Complexity
--------------------------------------------------------------------------
M    HighlightDocumentOperations._create_snippit               3
M    HighlightDocumentOperations._reconstruct_document_string  3
M    HighlightDocumentOperations._doc_to_sentences             2
M    HighlightDocumentOperations._querystring_to_dict          2
M    HighlightDocumentOperations._word_frequency_sort          2
M    HighlightDocumentOperations.highlight_doc                 2
X    /Users/ngift/Documents/src/highlight.py                   1
C    HighlightDocumentOperations                               1
M    HighlightDocumentOperations.__init__                      1
M    HighlightDocumentOperations._custom_highlight_tag         1
M    HighlightDocumentOperations._score_sentences              1
M    HighlightDocumentOperations._multiple_string_replace      1

  What is cyclomatic complexity?

  Cyclomatic complexity is a software metric, pioneered by Thomas J. McCabe in 1976, that determines a program's complexity by measuring the number of linearly independent paths, or branches, through its source code. According to McCabe, it is best to keep the complexity of a method at 10 or below, because research into human memory has shown that people can hold only about seven things (plus or minus two) in short-term memory.

  If a developer writes code with 50 linearly independent paths, then roughly five times the capacity of short-term memory is needed to picture everything that happens in the method. Simple methods that do not exceed the limits of human short-term memory are easier to work with, and they have been shown to contain fewer defects. A 2008 study by Enerjy found a strong correlation between cyclomatic complexity and fault rate: classes with a complexity of 11 had a fault probability of 0.28, while classes with a complexity of 74 had a fault probability of 0.98.
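
  As a concrete illustration (this toy function is not part of the article's example code), the method below has a cyclomatic complexity of 3: one base path through the function, plus one for each of its two decision points. This is the kind of count that tools such as pygenie report:

def classify(score):
    """Toy example with a cyclomatic complexity of 3."""
    if score > 10:       # decision point 1
        return "high"
    elif score > 5:      # decision point 2
        return "medium"
    return "low"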

  As you can see in the pygenie output above, every method is extremely simple, with a complexity below 10, in keeping with McCabe's guideline. Over my career I have seen giant functions, written without tests, with a complexity above 140 and a length of more than 1,200 lines. Code like that is, quite simply, impossible to test. In fact there is no way to confirm that it even works, and no way to refactor it. If its author had kept testing in mind and written the same logic while maintaining 100% test coverage, such extreme complexity could never have arisen.

  Now let's look at a complete source-code example, with matching unit and functional tests, to see how it works in practice and why code like this deserves to be called clean. By strict metrics, a reasonable definition of "clean" is code that meets the following requirements: close to 100% test coverage; a cyclomatic complexity below 10 for every class and method; and a pylint score approaching 10.0. The example below uses nose to run unit tests and doctest coverage checks against the highlight module:

% nosetests -v --with-coverage --cover-package=highlight --with-doctest\
--cover-erase --exe

Doctest: highlight.HighlightDocumentOperations._custom_highlight_tag ... ok
test_functional.test_snippit_algorithm ... ok
test_custom_highlight_tag (test_highlight.TestHighlight) ... ok
Consumes the generator, and then verifies the result[0] ... ok
Verifies highlighted text is what we expect ... ok
test_multi_string_replace (test_highlight.TestHighlight) ... ok
Verifies the yielded results are what is expected ... ok

Name        Stmts   Exec   Cover   Missing
-------------------------------------------
highlight      71     71    100%
----------------------------------------------------------------------
Ran 7 tests in 4.223s

OK

  As shown above, the nosetests command was run with several options, and the highlight.py script has 100% test coverage. The only option that needs explaining is --cover-package=highlight, which tells nose to restrict the coverage report to the named module. It is a very effective way to limit coverage output to the module or package you want to observe. You can download the source code for this article, comment out a few of the tests, and watch the coverage-reporting mechanism in action.

#!/usr/bin/python
# -*- coding: utf-8 -*-

"""
:mod:`highlight` -- Highlight Methods
=====================================

.. module:: highlight
   :platform: Unix, Windows
   :synopsis: highlight document snippets that match a query.
.. moduleauthor:: Noah Gift


Requirements::

    1. You will need to install the nltk library to run this code.
       http://www.nltk.org/download
    2. You will need to download the data for the nltk:
       See http://www.nltk.org/data::

           import nltk
           nltk.download()

"""

import re
import logging

import nltk

#Globals
logging.basicConfig()
LOG = logging.getLogger("highlight")
LOG.setLevel(logging.INFO)

class HighlightDocumentOperations(object):

    """Highlight Operations for a Document"""

    def __init__(self, document=None, query=None):
        """
        Kwargs:
            document (str):
            query (str):

        """
        self._document = document
        self._query = query

    @staticmethod
    def _custom_highlight_tag(phrase, start="<strong>", end="</strong>"):
        """Injects an open and close highlight tag after a word

        Args:
            phrase (str) - A word or phrase.
        Kwargs:
            start (str) - An opening tag. Defaults to <strong>
            end (str) - A closing tag. Defaults to </strong>
        Returns:
            (str) word or phrase with custom opening and closing tags

        >>> h = HighlightDocumentOperations()
        >>> h._custom_highlight_tag("foo")
        '<strong>foo</strong>'
        >>>

        """
        tagged_phrase = "{0}{1}{2}".format(start, phrase, end)
        return tagged_phrase

    def _doc_to_sentences(self):
        """Takes a string document and converts it into a list of sentences

        Unfortunately, this approach might be a tad naive for production
        because some segments that are split on a period are really an
        abbreviation, and to make things even more complicated, an
        abbreviation can also be the end of a sentence::
            http://nltk.googlecode.com/svn/trunk/doc/book/ch03.html

        Returns:
            (generator) A generator object of a tokenized sentence tuple,
            with the list position of the sentence as the first portion of
            the tuple, such as: (0, "This was the first sentence")

        """

        tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
        sentences = tokenizer.tokenize(self._document)
        for sentence in enumerate(sentences):
            yield sentence

    @staticmethod
    def _score_sentences(sentence, querydict):
        """Creates a scoring system for each sentence by substitution analysis

        Tokenizes each sentence, counts the characters
        in the sentence, and passes it back as a nested tuple

        Returns:
            (tuple) - (score (int), (count (int), position (int),
            raw sentence (str)))

        """

        position, sentence = sentence
        count = len(sentence)
        regex = re.compile('|'.join(map(re.escape, querydict)))
        score = len(re.findall(regex, sentence))
        processed_score = (score, (count, position, sentence))
        return processed_score

    def _querystring_to_dict(self, split_token="+"):
        """Converts query parameters into a dictionary

        Returns:
            (dict) - dparams, a dictionary of query parameters

        """

        params = self._query.split(split_token)
        dparams = dict([(key, self._custom_highlight_tag(key))
                        for key in params])
        return dparams

    @staticmethod
    def _word_frequency_sort(sentences):
        """Sorts sentences by score frequency, yields sorted result

        This will yield the highest score count items first.

        Args:
            sentences (list) - a nested tuple inside of list
            [(0, (90, 3, "The crust/dough was just way too effin' dry for me.
            Yes, I know what 'cornmeal' is, thanks."))]

        """

        sentences.sort()
        while sentences:
            yield sentences.pop()

    def _create_snippit(self, sentences, max_characters=175):
        """Creates a snippet from a sentence while keeping it under max_chars

        Returns a sorted list with max characters. The sort is an attempt
        to rebuild the original document structure as close as possible,
        with the new sorting by scoring and the limitation of max_chars.

        Args:
            sentences (generator) - sorted object to turn into a snippit
            max_characters (int) - optional max characters of snippit

        Returns:
            snippit (list) - returns a sorted list with a nested tuple that
            has the first index holding the original position of the list::

            [(0, (90, 3, "The crust/dough was just way too effin' dry for me.
            Yes, I know what 'cornmeal' is, thanks."))]

        """

        snippit = []
        total = 0
        for sentence in self._word_frequency_sort(sentences):
            LOG.debug("Creating snippit: %s", sentence)
            score, (count, position, raw_sentence) = sentence
            total += count
            if total < max_characters:
                #position now gets converted to index 0 for sorting later
                snippit.append((position, score, count, raw_sentence))

        #try to reassemble document by original order by doing a simple sort
        snippit.sort()
        return snippit

    @staticmethod
    def _multiple_string_replace(string_to_replace, dict_patterns):
        """Performs a multiple replace in a string with dict pattern.

        Borrowed from Python Cookbook.

        Args:
            string_to_replace (str) - String to be multi-replaced
            dict_patterns (dict) - A dict full of patterns

        Returns:
            (str) - Multiple replaced string.

        """

        regex = re.compile('|'.join(map(re.escape, dict_patterns)))
        def one_xlat(match):
            """Closure that is called repeatedly during multi-substitution.

            Args:
                match (SRE_Match object)
            Returns:
                partial string substitution (str)

            """

            return dict_patterns[match.group(0)]

        return regex.sub(one_xlat, string_to_replace)

    def _reconstruct_document_string(self, snippit, querydict):
        """Reconstructs a string snippit, builds tags, and returns the string

        A helper function for highlight_doc.

        Args:
            snippit (list) - A list of nested tuples, containing
            this pattern::

            [(0, (90, 3, "The crust/dough was just way too effin' dry for me.
            Yes, I know what 'cornmeal' is, thanks."))]

            querydict (dict) - A dict full of patterns

        Returns:
            (str) The most relevant snippet with the query terms highlighted.

        """

        snip = []
        for entry in snippit:
            score = entry[1]
            sent = entry[3]
            #if we have matches, now do the multi-replace
            if score:
                sent = self._multiple_string_replace(sent, querydict)
            snip.append(sent)
        highlighted_snip = " ".join(snip)

        return highlighted_snip

    def highlight_doc(self):
        """Finds the most relevant snippit with the query terms highlighted

        Returns:
            (str) The most relevant snippet with the query terms highlighted.

        """

        #tokenize to sentences, and convert query to a dict
        sentences = self._doc_to_sentences()
        querydict = self._querystring_to_dict()

        #process and score sentences
        scored_sentences = []
        for sentence in sentences:
            scored = self._score_sentences(sentence, querydict)
            scored_sentences.append(scored)

        #fit into max characters, and sort by original position
        snippit = self._create_snippit(scored_sentences)
        #assemble back into string
        highlighted_snip = self._reconstruct_document_string(snippit,
                                                             querydict)

        return highlighted_snip

#!/usr/bin/python
# -*- coding: utf-8 -*-
"""
Tests this query searches a document, highlights a snippit and returns it
http://www.example.com/search?find_desc=deep+dish+pizza&ns=1&rpp=10&find_loc=\
San+Francisco%2C+CA

Contains both unit and functional tests.

"""


import unittest
from highlight import HighlightDocumentOperations

class TestHighlight(unittest.TestCase):

    def setUp(self):

        self.document = """
Review for their take-out only.
Tried their large Classic (sausage, mushroom, peppers and onions) deep dish;\
and their large Pesto Chicken thin crust pizzas.
Pizza = I've had better. The crust/dough was just way too effin' dry for me.\
Yes, I know what 'cornmeal' is, thanks. But it's way too dry.\
I'm not talking about the bottom of the pizza...I'm talking about the dough \
that's in between the sauce and bottom of the pie...it was like cardboard, sorry!
Wings = spicy and good. Bleu cheese dressing only...hmmm, but no alternative\
of ranch dressing, at all. Service = friendly enough at the counters.
Decor = freakin' dark. I'm not sure how people can see their food.
Parking = a real pain. Good luck.

"""
        self.query = "deep+dish+pizza"
        self.hdo = HighlightDocumentOperations(self.document, self.query)

    def test_custom_highlight_tag(self):

        actual = self.hdo._custom_highlight_tag("foo",
                                                start="[BAR]",
                                                end="[ENDBAR]")
        expected = "[BAR]foo[ENDBAR]"
        self.assertEqual(actual, expected)

    def test_query_string_to_dict(self):
        """Verifies the yielded results are what is expected"""

        result = self.hdo._querystring_to_dict()
        expected = {"deep": "<strong>deep</strong>",
                    "dish": "<strong>dish</strong>",
                    "pizza": "<strong>pizza</strong>"}

        self.assertEqual(result, expected)

    def test_multi_string_replace(self):

        query = """pizza = I've had better"""
        expected = """<strong>pizza</strong> = I've had better"""
        query_dict = self.hdo._querystring_to_dict()
        result = self.hdo._multiple_string_replace(query, query_dict)
        self.assertEqual(expected, result)

    def test_doc_to_sentences(self):
        """Consumes the generator, and then verifies the result[0]"""

        results = []
        expected = (0, '\nReview for their take-out only.')

        for sentence in self.hdo._doc_to_sentences():
            results.append(sentence)
        self.assertEqual(results[0], expected)

    def test_highlight(self):
        """Verifies highlighted text is what we expect"""

        expected = """Tried their large Classic (sausage, mushroom, peppers and onions) \
<strong>deep</strong> <strong>dish</strong>;and their large Pesto Chicken thin crust \
<strong>pizza</strong>s."""

        actual = self.hdo.highlight_doc()
        self.assertEqual(expected, actual)

    def tearDown(self):

        del self.query
        del self.hdo
        del self.document

if __name__ == '__main__':
    unittest.main()

  If you want to run the code examples above, you will need to download the Natural Language Toolkit source and follow its instructions for downloading the nltk data. Because this article is about the way the example was created and tested rather than about the code itself, I won't explain in detail what the code actually does. Finally, we run the static analysis tool pylint against the source:

% pylint highlight.py
No config file found, using default configuration
************* Module highlight
E: 89:HighlightDocumentOperations._doc_to_sentences: Instance of 'unicode' has no
'tokenize' member (but some types could not be inferred)
E: 89:HighlightDocumentOperations._doc_to_sentences: Instance of 'ContextFreeGrammar'
has no 'tokenize' member (but some types could not be inferred)
W:108:HighlightDocumentOperations._score_sentences: Used builtin function 'map'
W:192:HighlightDocumentOperations._multiple_string_replace: Used builtin function 'map'
R: 34:HighlightDocumentOperations: Too few public methods (1/2)

Report
======
69 statements analysed.

Global evaluation
-----------------
Your code has been rated at 8.12/10 (previous run: 8.12/10)

  The code scores 8.12 out of 10, and the tool points out several flaws. pylint is configurable, and you will most likely want to tune it to the needs of your project; see the official pylint documentation (see Resources). In this example, the two errors reported on line 89 come from the external nltk library, and the two warnings could be eliminated by changing pylint's configuration. In general you do not want to tolerate errors flagged by pylint in your source code, but at times, as in the example above, a pragmatic decision is called for. pylint is not a perfect tool, but I have found it very useful in real work.
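
  As a sketch of what such a configuration change might look like, the two 'map' warnings could be silenced from the command line. The message ID below (W0141, "Used builtin function") matches pylint releases of this era, but message IDs and option names vary between versions, so check the documentation for the version you are running:

% pylint --disable=W0141 highlight.py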

  Conclusion

  In this article we explored how the way we view testing shapes the structure of our software, and why the absence of test-oriented thinking can be fatal to a project. We walked through a complete code example with functional and unit tests, ran code-coverage analysis on it with nose, and ran two static analysis tools, pylint and pygenie. One topic we did not have room for is automating this process through some form of continuous-integration testing. Fortunately, that goal is easy to achieve with Hudson, an open source Java™ continuous integration system. I encourage you to consult the Hudson documentation (see Resources) and set up automated testing for your project; it should run all of your tests, including static code analysis.
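
  As a starting point, here is a minimal, illustrative sketch of a build script that a continuous integration server such as Hudson could invoke on every commit. The commands mirror the ones used earlier in this article, but the file names, flags, and failure policy are assumptions that would need adjusting for a real project:

#!/usr/bin/python
"""An illustrative CI build step: run the test suite, then static analysis."""

import subprocess
import sys

COMMANDS = [
    ["nosetests", "--with-coverage", "--cover-package=highlight",
     "--with-doctest", "--cover-erase"],
    ["pylint", "highlight.py"],
]

def main():
    failures = 0
    for command in COMMANDS:
        print "Running: %s" % " ".join(command)
        # Note: pylint's exit status encodes the categories of messages it
        # emitted, so it is nonzero even for mere warnings; a real build
        # script may want a more forgiving policy for that step.
        if subprocess.call(command) != 0:
            failures += 1
    sys.exit(1 if failures else 0)

if __name__ == "__main__":
    main()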

  Finally, tests are not a silver bullet, and neither are static analysis tools. Software development is hard work. To give ourselves the best chance of success, we must keep the real goal in mind at all times: not merely to solve the problem, but to build something that can be proven to work. If you agree, then you will recognize that overly complex code, arrogance in design, and a lack of respect for the power of Python all stand directly in the way of that goal.
