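In computer programming, a block (also called a code block) is a lexical structure that groups source code together. A block consists of one or more declarations and statements. A programming language that permits the creation of blocks, including blocks nested within other blocks, is called a block-structured programming language. Blocks and subroutines are fundamental to structured programming, whose control structures can be formed from blocks.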

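Blocks serve two functions in programming. They let a group of statements be treated as if it were a single statement, and they limit the lexical scope of objects declared within a block, such as variables, procedures, and functions, so that these do not conflict with objects of the same name used elsewhere. In a block-structured programming language, the names of objects outside a block are visible inside it, unless they are shadowed by an object declared with the same name.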
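The two functions described above can be sketched in JavaScript, a language whose `let` and `const` declarations are block-scoped. This is only an illustration; every name in it (`label`, `total`, `step`) is hypothetical.

```javascript
// Illustrative sketch; all names here are hypothetical.
const label = "outer";          // declared outside any block
let total = 0;

{
  // The braces group several statements into one unit.
  const step = 5;               // visible only inside this block
  total += step;                // the outer `total` is visible in here

  {
    const label = "inner";      // shadows the outer `label` in this block only
    total += label.length;      // uses the inner binding
  }
}

// `step` is not in scope out here; `typeof` on an undeclared name
// evaluates to "undefined" instead of throwing.
const stepVisibleOutside = typeof step !== "undefined";

console.log(label, total, stepVisibleOutside);
```

The outer `label` is unchanged after the blocks finish, because the inner declaration created a separate binding rather than assigning to the outer one.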

History


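The idea of block structure was developed in the 1950s during the development of the first autocodes, and was formalized in the ALGOL 60 report. ALGOL 58 introduced the notion of the "compound" statement, which was related solely to control flow[1]. The ALGOL 60 report introduced the notions of block and scope[2]. Finally, the Revised Report defined a compound statement as a sequence of statements enclosed between the statement brackets begin and end. It defined a block as a sequence of declarations followed by a sequence of statements, enclosed between begin and end; every declaration appears in a block in this way and is valid only for that block[3]. The main difference between a block and a compound statement is that a go to statement cannot jump from outside a block to a label inside it[4].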

Syntax


Blocks use different syntax in different language families:

In addition, compound statements may also be delimited by:

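To build control structures, besides enclosing the controlled statement sequence in a compound statement or an anonymous block, other syntactic mechanisms can also be used: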

Limitations


Some languages influenced by ALGOL support blocks, each with its own restrictions:

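  • C-family languages: blocks and compound statements may contain not only nested compound statements but also anonymous blocks with declarations; nested function declarations, however, are not allowed[8].
  • Pascal-family languages: anonymous blocks with declarations may not appear within the compound statements of the statement part[6]; only compound statements are supported, and they are used to group statement sequences inside control statements such as if, while, and repeat.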

Basic semantics


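The semantics of a block are twofold. First, it gives the programmer a way to create arbitrarily large and complex structures and treat them as a unit. Second, it lets the programmer limit the scope of variables, and sometimes of other declared objects.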

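Early languages such as FORTRAN and BASIC had no statement blocks and only a few built-in control statements. Until FORTRAN 77 was standardized in 1978, there was no "block IF" statement, and conditional selection had to resort to GOTO statements. For example, the following FORTRAN code fragment deducts from an employee's wages the tax on the portion above the standard tax threshold and the supertax on the portion above the supertax threshold: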

C     Language: ANSI standard FORTRAN 66
C     Initialize the values to be computed
      PAYSTX = .FALSE.
      PAYSST = .FALSE.
      TAX = 0.0
      SUPTAX = 0.0
C     Skip tax deduction if wages do not exceed the tax threshold
      IF (WAGES .LE. TAXTHR) GOTO 10
        PAYSTX = .TRUE.
        TAX = (WAGES - TAXTHR) * BASCRT
   10 CONTINUE
C     Skip supertax deduction if wages do not exceed its threshold
      IF (WAGES .LE. SUPTHR) GOTO 20
        PAYSST = .TRUE.
        SUPTAX = (WAGES - SUPTHR) * SUPRAT
   20 CONTINUE
      TAXED = WAGES - TAX - SUPTAX

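The logical structure of the program is not reflected in the code: the values set during initialization are the ones that should hold when the corresponding conditions later turn out to be false.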

Blocks allow the programmer to treat a group of statements as a unit. For example, the Pascal code fragment corresponding to the FORTRAN code above:

{ Language: Standard Pascal per Jensen and Wirth }
if Wages > TaxThreshold then
begin
    PaysTax := true;
    Tax := (Wages - TaxThreshold) * TaxRate
end
else begin
    PaysTax := false;
    Tax := 0
end;
if Wages > SupertaxThreshold then
begin
    PaysSupertax := true;
    Supertax := (Wages - SupertaxThreshold) * SupertaxRate
end
else begin
    PaysSupertax := false;
    Supertax := 0
end;
Taxed := Wages - Tax - Supertax;

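Compared with the FORTRAN code above, the default values that appeared in the initialization are here placed, by means of compound statements (that is, block structures without declarations), at the points where the corresponding decisions are made. Using block structure clarifies the programmer's intent and makes the structure of the code reflect the programmer's thinking more closely. Combined with a consistent indentation style and camel case to improve readability, it makes the code easier to understand and modify.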
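For readers who want to run the block-structured version, the same logic can be sketched in JavaScript. The function name `computePay`, its parameter names, and the sample thresholds and rates are assumptions made for illustration; they do not come from the FORTRAN or Pascal fragments.

```javascript
// A sketch of the Pascal fragment's logic; names and sample figures are hypothetical.
function computePay(wages, taxThreshold, taxRate, supertaxThreshold, supertaxRate) {
  let paysTax, tax, paysSupertax, supertax;

  // Each branch is a block: the "true" case and its default sit side by side.
  if (wages > taxThreshold) {
    paysTax = true;
    tax = (wages - taxThreshold) * taxRate;
  } else {
    paysTax = false;
    tax = 0;
  }

  if (wages > supertaxThreshold) {
    paysSupertax = true;
    supertax = (wages - supertaxThreshold) * supertaxRate;
  } else {
    paysSupertax = false;
    supertax = 0;
  }

  return { paysTax, paysSupertax, taxed: wages - tax - supertax };
}

// Sample figures chosen only so the arithmetic is easy to follow.
const highEarner = computePay(50000, 10000, 0.25, 40000, 0.5);
const lowEarner = computePay(8000, 10000, 0.25, 40000, 0.5);
console.log(highEarner.taxed, lowEarner.taxed);
```

With these sample figures the first call deducts 10000 in tax and 5000 in supertax, while the second call falls into both else blocks and deducts nothing.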

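In early languages, the scope of a variable in a subroutine extended over the whole subroutine. Suppose a Fortran subroutine performs a task concerning a manager and uses an integer variable named IEMPNO to hold the Social Security number (SSN) of the employee who is the manager. Later, during maintenance of the subroutine, a task concerning the manager's subordinates is added, and the programmer may inadvertently reuse the name IEMPNO to hold the SSN of a subordinate. This leads to a bug that is hard to trace.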

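Block structure makes it easy for the programmer to control scope down to a fine level. For example, a Scheme code fragment that performs the employee tasks: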

;; Language: R5RS standard Scheme
(let ((empno (ssn-of employee-name)))
  (when (is-manager? empno) ;; when is included in the R7RS-small standard
    (let ((employee-list (underlings-of empno)))
      (display
        ;; format is the string-formatting procedure specified by SRFI-28 and SRFI-48
        (format "~a has ~a employees working under him:~%"
          employee-name (length employee-list)))
      (for-each
        (lambda (empno)
          (display
            (format "Name: ~a, role: ~a~%"
              (name-of empno) (role-of empno))))
        employee-list))))

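Here the outer let binds the manager's SSN to the local variable empno, and within the scope of the block it forms, the manager's name and number of subordinates are listed. Then the higher-order procedure for-each binds the SSN of each subordinate in turn to the formal parameter empno of the anonymous lambda function, and runs that function to list the subordinate's name and role. The scope of this formal parameter is the body of the anonymous function; it shares an identifier with the local variable in the enclosing scope, but the two do not affect each other. In practice, for the sake of clarity, a programmer would more likely choose distinctly different variable names, but even when names are duplicated it is hard to introduce a bug inadvertently. Code in languages based on S-expressions often contains heavily nested parentheses, so it needs good indentation.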
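The same shadowing pattern can be sketched in JavaScript. The `staff` table, the `describeManager` function, and the employee numbers below are hypothetical stand-ins for the procedures the Scheme fragment assumes (`ssn-of`, `underlings-of`, `name-of`, `role-of`); treating "has subordinates" as the manager test is likewise an assumption.

```javascript
// Hypothetical in-memory records standing in for the Scheme helper procedures.
const staff = new Map([
  ["E1", { name: "Ada", role: "manager", underlings: ["E2", "E3"] }],
  ["E2", { name: "Ben", role: "engineer", underlings: [] }],
  ["E3", { name: "Cy", role: "analyst", underlings: [] }],
]);

function describeManager(empno) {
  const lines = [];
  const manager = staff.get(empno);
  if (manager.underlings.length > 0) {          // assumed test for "is a manager"
    const employeeList = manager.underlings;    // scoped to this if-block
    lines.push(`${manager.name} has ${employeeList.length} employees working under them:`);
    employeeList.forEach((empno) => {
      // This `empno` is the callback's own parameter; it shadows the outer one.
      const employee = staff.get(empno);
      lines.push(`Name: ${employee.name}, role: ${employee.role}`);
    });
    // Back outside the callback, `empno` still refers to the manager.
    lines.push(`Manager number is still ${empno}`);
  }
  return lines;
}

const report = describeManager("E1");
report.forEach((line) => console.log(line));
```

The last line of the report confirms that rebinding `empno` inside the callback left the outer binding untouched.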

Hoisting


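In some languages, a variable can be declared with function scope even when the declaration sits inside a block nested in the function. In JavaScript, for example, variables should always be declared before use. The language historically allowed assignment to an undeclared variable, which implicitly created a global variable; in strict mode this is an error. Variables declared with var have function scope, unlike variables declared with let or const, which have block scope. Variables declared with var are hoisted: the variable can be referred to anywhere within the function's scope, even before its declaration is reached, so a var declaration can be regarded as lifted to the top of its enclosing function or of the global scope. However, if such a variable is accessed before its declaration, its value is always undefined.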
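A minimal JavaScript sketch of the behaviour described above follows. The function `hoistingDemo` and its variable names are hypothetical; the sketch only records what each access yields.

```javascript
// Illustrative only: records what is visible at each point in the function.
function hoistingDemo() {
  const seen = [];

  // The var declaration below is hoisted, but its assignment has not run yet.
  seen.push(typeof counter);        // "undefined"

  if (true) {
    var counter = 1;                // function-scoped, despite sitting in a block
    let blockOnly = 2;              // block-scoped
    seen.push(counter + blockOnly); // 3
  }

  seen.push(counter);               // 1: still visible after the block ends
  seen.push(typeof blockOnly);      // "undefined": no such name in this scope
  return seen;
}

const observations = hoistingDemo();
console.log(observations);
```

The first and last entries differ in kind: `counter` exists throughout the function and merely has no value yet, whereas `blockOnly` does not exist at all outside its block.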


References

  1. Perlis, A. J.; Samelson, K. Preliminary report: international algebraic language (PDF). Communications of the ACM (New York, NY, USA: ACM). 1958, 1 (12): 8–22 [Retrieved 2023-02-20]. doi:10.1145/377924.594925. (Archived (PDF) from the original on 2023-02-20). Strings of one or more statements may be combined into a single (compound) statement by enclosing them within the "statement parentheses" begin and end. Single statements are separated by the statement separator ";".
  2. John Backus; Friedrich L. Bauer; J. Green; C. Katz; John McCarthy; Alan Jay Perlis; Heinz Rutishauser; K. Samelson; B. Vauquois; J. H. Wegstein; A. van Wijngaarden; M. Woodger. Peter Naur, ed. Report on the Algorithmic Language ALGOL 60 (PDF). Communications of the ACM 3 (5). New York, NY, USA: ACM: 299–314. May 1960 [Retrieved 2009-10-27]. ISSN 0001-0782. doi:10.1145/367236.367262. (Archived (PDF) from the original on 2022-12-13). Sequences of statements may be combined into compound statements by insertion of statement brackets. ……
    Each declaration is attached to and valid for one compound statement. A compound statement which includes declarations is called a block.
     
  3. J. W. Backus, F. L. Bauer, J. Green, C. Katz, J. McCarthy, P. Naur, A. J. Perlis, H. Rutishauser, K. Samelson, B. Vauquois, J. H. Wegstein, A. van Wijngaarden, M. Woodger. Peter Naur, ed. Revised Report on the Algorithmic Language ALGOL 60. Communications of the ACM, Volume 6, Number 1, pages 1-17. January 1963 [Retrieved 2023-02-20]. (Archived from the original on 2023-02-20). A sequence of statements may be enclosed between the statement brackets begin and end to form a compound statement. ……
    A sequence of declarations followed by a sequence of statements and enclosed between begin and end constitutes a block. Every declaration appears in a block in this way and is valid only for that block.
     
  4. J. W. Backus, F. L. Bauer, J. Green, C. Katz, J. McCarthy, P. Naur, A. J. Perlis, H. Rutishauser, K. Samelson, B. Vauquois, J. H. Wegstein, A. van Wijngaarden, M. Woodger. Peter Naur, ed. Revised Report on the Algorithmic Language ALGOL 60. Communications of the ACM, Volume 6, Number 1, pages 1-17. January 1963 [Retrieved 2023-02-20]. (Archived from the original on 2023-02-20). Since labels are inherently local, no go to statement can lead from outside into a block. A go to statement may, however, lead from outside into a compound statement.
  5. A. van Wijngaarden, B. J. Mailloux, J. E. L. Peck, C. H. A. Koster, M. Sintzoff, C. H. Lindsey, L. G. L. T. Meertens and R. G. Fisker. Revised Report on the Algorithmic Language Algol 68. IFIP W.G. 2.1. [Retrieved 2023-02-20]. (Archived from the original on 2020-07-11). The ALGOL 60 concepts of block, compound statement and parenthesized expression are unified in ALGOL 68 into the serial-clause. A serial-clause may be an expression and yield a value. ……
    A serial-clause consists of a possibly empty sequence of unlabelled phrases, the last of which, if any, is a declaration, followed by a sequence of possibly labelled units. The phrases and the units are separated by go-on-tokens, viz., semicolons. Some of the units may instead be separated by completers, viz., EXITs; after a completer, the next unit must be labelled so that it can be reached. The value of the final unit, or of a unit preceding an EXIT, determines the value of the serial-clause.
     
  6. Kathleen Jensen, Niklaus Wirth. PASCAL User Manual and Report - ISO Pascal Standard (PDF). 1991 [Retrieved 2023-02-20]. (Archived (PDF) from the original on 2023-02-20). The program is divided into a heading and a body, called a block. The heading gives the program a name and lists its parameters. …… The block consists of six sections, where any except the last may be empty. They must appear in the order given in the definition for a block:
    Block =
        LabelDeclarationPart
        ConstantDefinitionPart
        TypeDefinitionPart
        VariableDeclarationPart
        ProcedureAndFunctionDeclarationPart
        StatementPart.
    ……
    Each procedure and function declaration has a structure similar to a program; i.e., each consists of a heading and a block. ……
    The compound statement is that of Algol, and corresponds to the DO group in PL/I. ……
    The "block structure" differs from that of Algol and PL/I insofar as there are no anonymous blocks; i.e., each block is given a name and thereby is made into a procedure or function.
     
  7. Kathleen Jensen, Niklaus Wirth. PASCAL User Manual and Report - ISO Pascal Standard (PDF). 1991 [Retrieved 2023-02-20]. (Archived (PDF) from the original on 2023-02-20). The compound statement specifies that its component statements be executed in the same sequence as they are written. The symbols begin and end act as statement brackets. ……
    Pascal uses the semicolon to separate statements, not to terminate statements; i.e., the semicolon is not part of the statement.
     
  8. Brian Kernighan, Dennis Ritchie. The C Programming Language, Second Edition (PDF). Prentice Hall. 1988. In C, the semicolon is a statement terminator, rather than a separator as it is in languages like Pascal.
    Braces { and } are used to group declarations and statements together into a compound statement, or block, so that they are syntactically equivalent to a single statement. The braces that surround the statements of a function are one obvious example; braces around multiple statements after an if, else, while, or for are another. (Variables can be declared inside any block; ……) There is no semicolon after the right brace that ends a block. ……
    A label has the same form as a variable name, and is followed by a colon. It can be attached to any statement in the same function as the goto. The scope of a label is the entire function. ……
    C is not a block-structured language in the sense of Pascal or similar languages, because functions may not be defined within other functions. On the other hand, variables can be defined in a block-structured fashion within a function. Declarations of variables (including initializations) may follow the left brace that introduces any compound statement, not just the one that begins a function. Variables declared in this way hide any identically named variables in outer blocks, and remain in existence until the matching right brace. ……
    An automatic variable declared and initialized in a block is initialized each time the block is entered.
    Automatic variables, including formal parameters, also hide external variables and functions of the same name.
     
  9. John McCarthy, Paul W. Abrahams, Daniel J. Edwards, Timothy P. Hart, Michael I. Levin. LISP 1.5 Programmer's Manual (PDF) 2nd. MIT Press. 1985 [1962] [Retrieved 2021-09-23]. ISBN 0-262-13011-4. (Archived (PDF) from the original on 2021-03-02). The LISP 1.5 program feature allows the user to write an Algol-like program containing LISP statements to be executed. ……
    The program form has the structure - (PROG, list of program variables, sequence of statements and atomic symbols...) An atomic symbol in the list is the location marker for the statement that follows.
     
  10. Kent M. Pitman. The Revised Maclisp Manual. 1983, 2007 [Retrieved 2021-10-14]. (Archived from the original on 2021-12-21). LET is used to bind some variables to some objects, and then to evaluate some forms (those which make up the body) in the context of those bindings. ……
    LET* Same as LET but does bindings in sequence instead of in parallel.