Block (programming)
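In computer programming, a block (also called a code block) is a lexical structure that groups source code together. A block consists of one or more declarations and statements. A programming language that permits the creation of blocks, including blocks nested within other blocks, is called a block-structured programming language. Blocks and subroutines are fundamental to structured programming, whose control structures are formed from blocks.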
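A block serves two purposes in a program. It lets a group of statements be treated as if it were a single statement, and it restricts the lexical scope of objects declared within it, such as variables, procedures and functions, so that they do not conflict with objects of the same name used elsewhere. In a block-structured programming language, the names of objects outside a block are visible inside it, unless they are shadowed by an object declared with the same name.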
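The visibility rule above can be sketched in JavaScript, whose braces delimit blocks and whose const declarations are block-scoped. This is only an illustration: the function describeScopes and the names label and count are hypothetical and do not come from the article.

```javascript
// Illustrative sketch; describeScopes, label and count are hypothetical names.
function describeScopes() {
  const seen = [];
  const label = "outer";
  const count = 1;
  {
    // This inner block declares its own label, which shadows the outer one.
    const label = "inner";
    seen.push(label); // the inner declaration is the one that is visible here
    seen.push(count); // count is not redeclared, so the outer one stays visible
  }
  // The inner label ended with its block; the outer one was never modified.
  seen.push(label);
  return seen;
}

const result = describeScopes();
console.log(result);
```

The inner declaration hides the outer name only for the extent of the block, so code before and after the block is unaffected by it.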
History
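The idea of block structure emerged in the 1950s during the development of the first autocodes, and was formalized in the ALGOL 60 report. ALGOL 58 introduced the notion of the "compound" statement, which concerned control flow only[1]. The ALGOL 60 report introduced the notions of block and scope[2]. Finally, the Revised Report defined a compound statement as a sequence of statements enclosed between the statement brackets begin and end. It defined a block as a sequence of declarations followed by a sequence of statements, enclosed between begin and end; every declaration appears in a block in this way and is valid only for that block[3]. The main difference between a block and a compound statement is that a go to statement cannot lead from outside a block to a label inside it[4].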
Syntax
Blocks use different syntax in different language families:
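- The ALGOL family: ALGOL 60 and its successors, such as Simula, use the statement brackets begin and end to delimit compound statements and blocks. ALGOL 68 became an expression-oriented language and prefers the parentheses ( and ), which are equivalent to begin and end[5].
- The Lisp family: Lisp 1.5 represents a block as an S-expression with the syntactic keyword prog[9], while Maclisp and Scheme represent blocks with S-expressions of the let form[10]. An S-expression is prefix notation enclosed in the parentheses ( and ).
- The Smalltalk family: Smalltalk-80 and Self use the square brackets [ and ] to delimit blocks.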
In addition, compound statements can also be delimited by:
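To build a control structure, the controlled statement sequence can be enclosed in a compound statement or an anonymous block, but other syntactic mechanisms are also available:
- In ALGOL 68, conditional and iteration statements are terminated by a reserved word that is an opening reserved word spelled backwards, for example IF ~ THEN ~ ELIF ~ THEN ~ ELSE ~ FI and FOR ~ FROM ~ TO ~ BY ~ WHILE ~ DO ~ OD. Languages that inherited this style include Dijkstra's Guarded Command Language and Bourne's Bourne shell.
- Some structured programming languages, such as FORTRAN 77, Modula-2, Ada and Visual Basic, close control structures with an ending keyword, for example in Modula-2: IF ~ THEN ~ ELSIF ~ THEN ~ ELSE ~ END and FOR ~ TO ~ BY ~ DO ~ END.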
Limitations
Some languages influenced by ALGOL support blocks, but each with its own limitations:
Basic semantics
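The semantic meaning of a block is twofold. First, it gives the programmer a way to build arbitrarily large and complex structures and treat them as a single unit. Second, it lets the programmer limit the scope of variables, and sometimes of other declared objects.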
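Early languages such as FORTRAN and BASIC had no statement blocks and little in the way of structured control constructs. Until FORTRAN 77 was standardized in 1978 there was no "block IF" statement, so conditional selection had to resort to GOTO statements. For example, the following FORTRAN fragment deducts from an employee's wages the tax on the portion above the standard tax threshold and the supertax on the portion above the supertax threshold: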
C     Language: ANSI Standard FORTRAN 66
C     Initialize the values to be computed
      PAYSTX = .FALSE.
      PAYSST = .FALSE.
      TAX = 0.0
      SUPTAX = 0.0
C     Skip the tax deduction if the employee earns no more than
C     the tax threshold
      IF (WAGES .LE. TAXTHR) GOTO 10
      PAYSTX = .TRUE.
      TAX = (WAGES - TAXTHR) * BASCRT
   10 CONTINUE
C     Skip the supertax deduction if the employee earns no more than
C     the supertax threshold
      IF (WAGES .LE. SUPTHR) GOTO 20
      PAYSST = .TRUE.
      SUPTAX = (WAGES - SUPTHR) * SUPRAT
   20 CONTINUE
      TAXED = WAGES - TAX - SUPTAX
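The logical structure of the program is not reflected in the code: the values set during initialization are the ones that should apply when the corresponding conditions later turn out to be false.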
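Blocks allow the programmer to treat a group of statements as a unit. For example, the following Pascal fragment corresponds to the FORTRAN code above: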
{ Language: Standard Pascal as defined by Jensen and Wirth }
if Wages > TaxThreshold then
  begin
    PaysTax := true;
    Tax := (Wages - TaxThreshold) * TaxRate
  end
else begin
  PaysTax := false;
  Tax := 0
end;
if Wages > SupertaxThreshold then
  begin
    PaysSupertax := true;
    Supertax := (Wages - SupertaxThreshold) * SupertaxRate
  end
else begin
  PaysSupertax := false;
  Supertax := 0
end;
Taxed := Wages - Tax - Supertax;
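Compared with the FORTRAN code above, the default values that appeared there in the initialization are here placed, by means of compound statements (block structures without declarations), where the corresponding logical decisions are made. Block structure clarifies the programmer's intent and makes the structure of the code reflect the programmer's thinking more closely. Combined with a consistent indentation style and camel case for readability, it makes the code easier to understand and modify.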
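As a further illustration, the same computation can be sketched in JavaScript, where braces play the role of begin and end. The threshold and rate constants below are hypothetical values chosen only so the arithmetic is easy to follow; they are not taken from the FORTRAN or Pascal fragments, and the function name computeTaxed is likewise an assumption.

```javascript
// Hypothetical thresholds and rates, assumed for illustration only.
const TAX_THRESHOLD = 10000;
const TAX_RATE = 0.25;
const SUPERTAX_THRESHOLD = 50000;
const SUPERTAX_RATE = 0.125;

// Each branch is a brace-delimited compound statement that sets both the flag
// and the amount, so the default values sit next to the decision they belong to.
function computeTaxed(wages) {
  let paysTax, tax, paysSupertax, supertax;
  if (wages > TAX_THRESHOLD) {
    paysTax = true;
    tax = (wages - TAX_THRESHOLD) * TAX_RATE;
  } else {
    paysTax = false;
    tax = 0;
  }
  if (wages > SUPERTAX_THRESHOLD) {
    paysSupertax = true;
    supertax = (wages - SUPERTAX_THRESHOLD) * SUPERTAX_RATE;
  } else {
    paysSupertax = false;
    supertax = 0;
  }
  return { paysTax, paysSupertax, taxed: wages - tax - supertax };
}

console.log(computeTaxed(60000));
```

With these assumed figures, wages of 60000 incur tax on 50000 and supertax on 10000, while wages at or below 10000 are returned untouched.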
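In early languages, the scope of a variable within a subroutine extended over the whole subroutine. Suppose a Fortran subroutine carries out a task related to a manager and uses an integer variable called IEMPNO to hold the Social Security number (SSN) of the employee who is the manager. Later, during maintenance of the subroutine, a task related to the manager's subordinates is added, and the programmer may inadvertently reuse the variable IEMPNO to hold the SSN of an employee who reports to this manager. The result is a bug that is hard to trace.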
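Block structure lets the programmer control scope easily down to a fine level. For example, the following Scheme fragment carries out the employee-related tasks: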
;; Language: R5RS standard Scheme
(let ((empno (ssn-of employee-name)))
  (when (is-manager? empno) ;; when is included in the R7RS-small standard
    (let ((employee-list (underlings-of empno)))
      (display
        ;; format is the string-formatting procedure specified by SRFI-28 and SRFI-48
        (format "~a has ~a employees working under him:~%"
                employee-name (length employee-list)))
      (for-each
        (lambda (empno)
          (display
            (format "Name: ~a, role: ~a~%"
                    (name-of empno) (role-of empno))))
        employee-list))))
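Here the outer let binding construct binds the manager's SSN to the local variable empno, and within the scope of the block it forms, the code prints the manager's name and the number of subordinates. Then, through the higher-order function for-each, the SSN of each subordinate is bound in turn to the formal parameter empno of the anonymous lambda function, which is executed to print that subordinate's name and role. The scope of this formal parameter is the body of the anonymous function; it has the same identifier as the local variable in the enclosing block, but the two do not affect each other. In practice, for the sake of clarity, a programmer would more likely choose distinctly different variable names, but even when names are duplicated it is hard to introduce a bug inadvertently. Languages based on S-expressions often show deeply nested parentheses, so their code needs good indentation.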
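The same scoping pattern can be sketched in JavaScript, with a callback parameter that reuses the name of an outer binding. Everything in this sketch is an assumption made for illustration: the employees map stands in for the lookup procedures used in the Scheme fragment, and numeric ids stand in for SSNs.

```javascript
// Hypothetical in-memory data standing in for the lookups in the Scheme fragment.
const employees = new Map([
  [101, { name: "Ada", role: "manager", reports: [102, 103] }],
  [102, { name: "Ben", role: "engineer", reports: [] }],
  [103, { name: "Cy", role: "analyst", reports: [] }],
]);

function describeTeam(managerId) {
  const lines = [];
  const empno = managerId; // outer binding: identifies the manager
  const manager = employees.get(empno);
  if (manager.reports.length > 0) {
    const reports = manager.reports; // visible only inside this block
    lines.push(`${manager.name} has ${reports.length} employees working under them:`);
    reports.forEach((empno) => {
      // This parameter shadows the outer empno within the callback body only.
      const employee = employees.get(empno);
      lines.push(`Name: ${employee.name}, role: ${employee.role}`);
    });
  }
  // The outer empno was never touched by the callback parameter.
  lines.push(`manager id is still ${empno}`);
  return lines;
}

const teamLines = describeTeam(101);
console.log(teamLines.join("\n"));
```

The last line of the output reports the manager's own id, which shows that reusing the name inside the callback did not disturb the outer binding.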
Hoisting
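In some languages, a variable can be declared with function scope even when the declaration sits in a block nested inside the function. In JavaScript, for example, variables should always be declared before use. The language historically allowed assignment to an undeclared variable, which implicitly created a global variable; in strict mode this is an error. Variables declared with var have function scope, whereas variables declared with let or const have block scope. Variables declared with var are hoisted, meaning the variable can be referred to anywhere within the function's scope, even before its declaration has been reached, so a var declaration can be viewed as lifted to the top of its enclosing function or of the global scope. However, if such a variable is accessed before its declaration, its value is always undefined.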
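A minimal sketch of this behaviour follows. The function hoistingDemo and the variable names are hypothetical and chosen only for illustration.

```javascript
// Illustrative sketch; hoistingDemo, counter and blockOnly are hypothetical names.
function hoistingDemo() {
  const observed = [];
  // The var declaration further down is hoisted to the top of the function,
  // so counter already exists here, but its assignment has not run yet.
  observed.push(typeof counter);
  observed.push(counter);
  if (true) {
    var counter = 1;   // function-scoped despite sitting in a nested block
    let blockOnly = 2; // block-scoped: exists only inside these braces
    observed.push(blockOnly);
  }
  observed.push(counter);          // still visible after the block
  observed.push(typeof blockOnly); // no such binding outside the block
  return observed;
}

const observed = hoistingDemo();
console.log(observed);
```

Only the declaration is hoisted, not the initialization, which is why the first two observations report undefined while the later read of counter yields the assigned value.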
References
- [1] Perlis, A. J.; Samelson, K. Preliminary report: international algebraic language (PDF). Communications of the ACM (New York, NY, USA: ACM). 1958, 1 (12): 8–22. Retrieved 2023-02-20. doi:10.1145/377924.594925. (Archived (PDF) from the original on 2023-02-20).
  Strings of one or more statements may be combined into a single (compound) statement by enclosing them within the "statement parentheses" begin and end. Single statements are separated by the statement separator ";".
- [2] John Backus; Friedrich L. Bauer; J. Green; C. Katz; John McCarthy; Alan Jay Perlis; Heinz Rutishauser; K. Samelson; B. Vauquois; J. H. Wegstein; A. van Wijngaarden; M. Woodger. Peter Naur, ed. Report on the Algorithmic Language ALGOL 60 (PDF). Communications of the ACM 3 (5). New York, NY, USA: ACM: 299–314. May 1960. Retrieved 2009-10-27. ISSN 0001-0782. doi:10.1145/367236.367262. (Archived (PDF) from the original on 2022-12-13).
  Sequences of statements may be combined into compound statements by insertion of statement brackets. …
  Each declaration is attached to and valid for one compound statement. A compound statement which includes declarations is called a block.
- [3] J. W. Backus, F. L. Bauer, J. Green, C. Katz, J. McCarthy, P. Naur, A. J. Perlis, H. Rutishauser, K. Samelson, B. Vauquois, J. H. Wegstein, A. van Wijngaarden, M. Woodger. Peter Naur, ed. Revised Report on the Algorithmic Language ALGOL 60. Communications of the ACM, Volume 6, Number 1, pages 1-17. January 1963. Retrieved 2023-02-20. (Archived from the original on 2023-02-20).
  A sequence of statements may be enclosed between the statement brackets begin and end to form a compound statement. …
  A sequence of declarations followed by a sequence of statements and enclosed between begin and end constitutes a block. Every declaration appears in a block in this way and is valid only for that block.
- [4] J. W. Backus, F. L. Bauer, J. Green, C. Katz, J. McCarthy, P. Naur, A. J. Perlis, H. Rutishauser, K. Samelson, B. Vauquois, J. H. Wegstein, A. van Wijngaarden, M. Woodger. Peter Naur, ed. Revised Report on the Algorithmic Language ALGOL 60. Communications of the ACM, Volume 6, Number 1, pages 1-17. January 1963. Retrieved 2023-02-20. (Archived from the original on 2023-02-20).
  Since labels are inherently local, no go to statement can lead from outside into a block. A go to statement may, however, lead from outside into a compound statement.
- [5] A. van Wijngaarden, B. J. Mailloux, J. E. L. Peck, C. H. A. Koster, M. Sintzoff, C. H. Lindsey, L. G. L. T. Meertens and R. G. Fisker. Revised Report on the Algorithmic Language Algol 68. IFIP W.G. 2.1. Retrieved 2023-02-20. (Archived from the original on 2020-07-11).
  The ALGOL 60 concepts of block, compound statement and parenthesized expression are unified in ALGOL 68 into the serial-clause. A serial-clause may be an expression and yield a value. …
  A serial-clause consists of a possibly empty sequence of unlabelled phrases, the last of which, if any, is a declaration, followed by a sequence of possibly labelled units. The phrases and the units are separated by go-on-tokens, viz., semicolons. Some of the units may instead be separated by completers, viz., EXITs; after a completer, the next unit must be labelled so that it can be reached. The value of the final unit, or of a unit preceding an EXIT, determines the value of the serial-clause.
- [6] Kathleen Jensen, Niklaus Wirth. PASCAL User Manual and Report - ISO Pascal Standard (PDF). 1991. Retrieved 2023-02-20. (Archived (PDF) from the original on 2023-02-20).
  The program is divided into a heading and a body, called a block. The heading gives the program a name and lists its parameters. … The block consists of six sections, where any except the last may be empty. They must appear in the order given in the definition for a block:
    Block =
      LabelDeclarationPart
      ConstantDefinitionPart
      TypeDefinitionPart
      VariableDeclarationPart
      ProcedureAndFunctionDeclarationPart
      StatementPart.
  …
  Each procedure and function declaration has a structure similar to a program; i.e., each consists of a heading and a block. …
  The compound statement is that of Algol, and corresponds to the DO group in PL/I. …
  The "block structure" differs from that of Algol and PL/I insofar as there are no anonymous blocks; i.e., each block is given a name and thereby is made into a procedure or function.
- [7] Kathleen Jensen, Niklaus Wirth. PASCAL User Manual and Report - ISO Pascal Standard (PDF). 1991. Retrieved 2023-02-20. (Archived (PDF) from the original on 2023-02-20).
  The compound statement specifies that its component statements be executed in the same sequence as they are written. The symbols begin and end act as statement brackets. …
  Pascal uses the semicolon to separate statements, not to terminate statements; i.e., the semicolon is not part of the statement.
- [8] Brian Kernighan, Dennis Ritchie. The C Programming Language, Second Edition (PDF). Prentice Hall. 1988.
  In C, the semicolon is a statement terminator, rather than a separator as it is in languages like Pascal.
  Braces { and } are used to group declarations and statements together into a compound statement, or block, so that they are syntactically equivalent to a single statement. The braces that surround the statements of a function are one obvious example; braces around multiple statements after an if, else, while, or for are another. (Variables can be declared inside any block; …) There is no semicolon after the right brace that ends a block. …
  A label has the same form as a variable name, and is followed by a colon. It can be attached to any statement in the same function as the goto. The scope of a label is the entire function. …
  C is not a block-structured language in the sense of Pascal or similar languages, because functions may not be defined within other functions. On the other hand, variables can be defined in a block-structured fashion within a function. Declarations of variables (including initializations) may follow the left brace that introduces any compound statement, not just the one that begins a function. Variables declared in this way hide any identically named variables in outer blocks, and remain in existence until the matching right brace. …
  An automatic variable declared and initialized in a block is initialized each time the block is entered.
  Automatic variables, including formal parameters, also hide external variables and functions of the same name.
- [9] John McCarthy, Paul W. Abrahams, Daniel J. Edwards, Timothy P. Hart, Michael I. Levin. LISP 1.5 Programmer's Manual (PDF), 2nd ed. MIT Press. 1985 [1962]. Retrieved 2021-09-23. ISBN 0-262-13011-4. (Archived (PDF) from the original on 2021-03-02).
  The LISP 1.5 program feature allows the user to write an Algol-like program containing LISP statements to be executed. …
  The program form has the structure - (PROG, list of program variables, sequence of statements and atomic symbols...) An atomic symbol in the list is the location marker for the statement that follows.
- [10] Kent M. Pitman. The Revised Maclisp Manual. 1983, 2007. Retrieved 2021-10-14. (Archived from the original on 2021-12-21).
  LET is used to bind some variables to some objects, and then to evaluate some forms (those which make up the body) in the context of those bindings. … LET* Same as LET but does bindings in sequence instead of in parallel.