Design Notes

This is a personal memo.

Note: Some not-so-popular Unicode characters are used. Recommended fonts to see this document are:

Symbols Used in This Document

Throughout this document, I use these symbols:

| Symbol | Meaning |
|--------|---------|
| ▮ | EOF |
| A | Alphabets and numbers |
| . | One of the "wordSeparator" characters |
| ␣ | Whitespace |
| ⏎ | End of line |

Additionally, the location of the cursor after executing a command is expressed by a vertical bar (|) in a sequence of symbols. For example, A|▮ means that, assuming there is a sequence consisting of alphabets and numbers at the cursor location and nothing follows the sequence (EOF), the command under discussion moves the cursor to just after the sequence.

Japanese-Word-Handler

  • cursorWordRight

    • Procedure:
      1. If the cursor is at the end of the document, return the original position.
      2. If the cursor is at the end of a line, return the start position of the next line.
      3. If the cursor is at WSP character(s), skip the run of WSPs starting there.
      4. If no characters exist after the WSPs, return that position.
      5. If there is a non-WSP character after the WSPs, return the end position of the non-WSP character sequence which starts with it.
    • Illustration:
      |▮      A|▮     .|▮     ␣|▮     ⏎|▮
                      .|A     ␣A|▮    ⏎|A
                              ␣A|.
                              ␣A|␣
                              ␣A|⏎
              A|.             ␣|.     ⏎|.
              A|␣     .|␣             ⏎|␣
              A|⏎     .|⏎     ␣|⏎     ⏎|⏎
      
  • cursorWordStartRight

    • Procedure:
      1. If the cursor is at the end of the document, return the original position.
      2. If the cursor is at the end of a line, return the start position of the next line.
      3. Find the end position of the sequence starting with the character at the cursor. Then return the position where the WSPs following that sequence end.
    • Illustration:
      |▮      A|▮     .|▮     ␣|▮     ⏎|▮
                      .|A     ␣|A     ⏎|A
              A|.             ␣|.     ⏎|.
              A␣|▮    .␣|▮            ⏎|␣
              A␣|A    .␣|A
              A␣|.    .␣|.
              A␣|⏎    .␣|⏎
              A|⏎     .|⏎     ␣|⏎     ⏎|⏎
      
  • cursorWordEndLeft

    • Procedure:
      1. If the cursor is at the start of the document, return the original position.
      2. If the cursor is at the start of a line, return the end position of the previous line.
      3. Find the start position of the sequence which ends at the cursor position. Then return the position where the WSPs preceding it start.
    • Illustration:
      ▮|   ▮|A     ▮|.    ▮|␣     ▮|⏎
                   A|.    A|␣     A|⏎
           .|A            .|␣     .|⏎
          ▮|␣A    ▮|␣.            ␣|⏎
          A|␣A    A|␣.
          .|␣A    .|␣.
          ⏎|␣A    ⏎|␣.
          ⏎|A     ⏎|.    ⏎|␣     ⏎|⏎
      

This logic can be implemented as a finite state automaton, but I feel doing so would be overkill. So I implemented these commands as imperative procedures.
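As a rough illustration of the imperative style, here is a minimal Python sketch of the cursorWordStartRight and cursorWordEndLeft procedures described above. It is not the actual implementation: the document model (a list of line strings with (line, column) positions), the `char_class` helper, its three character classes, and the `WORD_SEPARATORS` value are all assumptions made for this example.

```python
from typing import List, Tuple

# Assumed separator set for this sketch; stands in for "wordSeparator".
WORD_SEPARATORS = ".,;:!?()[]{}"


def char_class(ch: str) -> str:
    """Classify one character as whitespace, separator, or word character."""
    if ch in " \t":
        return "wsp"
    if ch in WORD_SEPARATORS:
        return "sep"
    return "word"


def cursor_word_start_right(lines: List[str], line: int, col: int) -> Tuple[int, int]:
    text = lines[line]
    if col >= len(text):
        if line + 1 >= len(lines):
            return (line, col)      # step 1: end of document
        return (line + 1, 0)        # step 2: end of line
    # Step 3: skip the run starting at the cursor, then the WSPs after it.
    klass = char_class(text[col])
    while col < len(text) and char_class(text[col]) == klass:
        col += 1
    while col < len(text) and char_class(text[col]) == "wsp":
        col += 1
    return (line, col)


def cursor_word_end_left(lines: List[str], line: int, col: int) -> Tuple[int, int]:
    if col == 0:
        if line == 0:
            return (line, col)                      # step 1: start of document
        return (line - 1, len(lines[line - 1]))     # step 2: start of line
    # Step 3: skip the run ending at the cursor, then the WSPs before it.
    text = lines[line]
    klass = char_class(text[col - 1])
    while col > 0 and char_class(text[col - 1]) == klass:
        col -= 1
    while col > 0 and char_class(text[col - 1]) == "wsp":
        col -= 1
    return (line, col)
```

For example, with `lines = ["foo bar"]`, `cursor_word_start_right(lines, 0, 0)` gives `(0, 4)`, which corresponds to the A␣|A case, and `cursor_word_end_left(lines, 0, 7)` gives `(0, 3)`, which corresponds to A|␣A.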

Anatomy of Cursor Movement in Other Text Editors

Visual Studio Code (v1.37.0)

VSCode has two sets of word-by-word cursor movement logic. The first is used in most cases; the second is used only for "word part" related actions.

Commands of the second kind have "Part" in their name (e.g. cursorWordPartRight) and can recognize words inside camelCasedWords or snake_case_words. It seems that these commands are not affected by the "wordSeparator" configuration.

Non "word part" version

  • cursorWordEndRight

    |▮          ⏎|▮
    A|▮         ⏎A|▮
    A|.         ⏎A|.
    A|␣         ⏎A|␣
    A|⏎         ⏎A|⏎
    .|▮         ⏎.|▮
    .|A         ⏎.|A
    .|␣         ⏎.|␣
    .|⏎         ⏎.|⏎
    ␣|▮         ⏎␣|▮
    ␣A|▮        ⏎␣A|▮
    ␣A|.        ⏎␣A|.
    ␣A|␣        ⏎␣A|␣
    ␣A|⏎        ⏎␣A|⏎
    ␣.|▮        ⏎␣.|▮
    ␣.|A        ⏎␣.|A
    ␣.|␣        ⏎␣.|␣
    ␣.|⏎        ⏎␣.|⏎
    ␣|⏎         ⏎␣|⏎
                ⏎|⏎
    
  • cursorWordStartRight

    |▮      A|▮     .|▮             ⏎|▮
                    .|A             ⏎|A
            A|.                     ⏎|.
            A␣|▮    .␣|▮    ␣|▮     ⏎␣|▮
            A␣|A    .␣|A    ␣|A     ⏎␣|A
            A␣|.    .␣|.    ␣|.     ⏎␣|.
            A␣|⏎    .␣|⏎    ␣|⏎     ⏎␣|⏎
            A|⏎     .|⏎             ⏎|⏎
    

Word part version

Essentially, the difference between these commands and the default ones is that these can stop inside a sequence of alphabets if one of the following conditions is met:

  1. The previous character is an underscore and the next is not an underscore (for snake_cased_words)
  2. The previous character is a lowercase letter and the next is an uppercase letter (for camelCasedWords or PascalCasedWords)
  3. The previous character is an uppercase letter, the next is an uppercase letter, and the character after that is a lowercase letter (for all-capital words inside camelCASEDWords or PascalCASEDWords)
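The three conditions can be written as a small predicate. The following is a minimal Python sketch that mirrors the list above. It is not taken from the VSCode source, and the function names `is_word_part_boundary` and `word_part_stops` are hypothetical.

```python
from typing import List


def is_word_part_boundary(text: str, i: int) -> bool:
    """Return True if a stop is allowed between text[i - 1] and text[i]."""
    if i <= 0 or i >= len(text):
        return False
    prev, nxt = text[i - 1], text[i]
    # Condition 1: underscore followed by a non-underscore (snake_case).
    if prev == "_" and nxt != "_":
        return True
    # Condition 2: lowercase followed by uppercase (camelCase, PascalCase).
    if prev.islower() and nxt.isupper():
        return True
    # Condition 3: uppercase, uppercase, then lowercase (end of an all-caps run).
    if prev.isupper() and nxt.isupper() and i + 1 < len(text) and text[i + 1].islower():
        return True
    return False


def word_part_stops(text: str) -> List[int]:
    """List every index inside text where the conditions allow a stop."""
    return [i for i in range(1, len(text)) if is_word_part_boundary(text, i)]
```

For instance, `word_part_stops("camelCASEDWords")` yields `[5, 10]`: a stop before "CASED" by condition 2 and a stop before "Words" by condition 3.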

Vim (v8.1.1843)

Vim separates words by character classification.

When classifying a character, Vim first checks whether its code point is 0xFF or below. If so, the character is classified as white space, punctuation, or a "word character" as specified by the iskeyword option (the counterpart of wordSeparator in VSCode). If the code point is greater than 0xFF, Vim classifies it by these basic rules: white space is 0, punctuation is 1, emoji is 3, and everything else is equal to or greater than 2 (but not 3). Punctuation characters in various languages and known character sets are defined in a table and resolve to either 1 or the code point of the first character in the set.

For example, distinct class values are assigned to Hiragana and Katakana, so they are always separated from each other and from other character types.
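To make the idea concrete, here is a minimal Python sketch of class-based word splitting in the spirit of the description above. It is not Vim's code: the `WIDE_CLASSES` table is a small illustrative subset whose ranges are assumptions based on the Unicode blocks, the default `iskeyword` predicate is an assumption, and the emoji class is omitted.

```python
from itertools import groupby
from typing import Callable, List

# Illustrative subset only (assumed, not copied from Vim's source).
# Each entry is (first code point, last code point, class value).
WIDE_CLASSES = [
    (0x3000, 0x3000, 0),        # ideographic space: white space
    (0x3001, 0x3020, 1),        # CJK punctuation: punctuation
    (0x3040, 0x309F, 0x3040),   # Hiragana: class is the first code point
    (0x30A0, 0x30FF, 0x30A0),   # Katakana
    (0x4E00, 0x9FFF, 0x4E00),   # CJK ideographs
]


def default_iskeyword(ch: str) -> bool:
    """Assumed stand-in for the iskeyword option."""
    return ch.isalnum() or ch == "_"


def char_class(ch: str, iskeyword: Callable[[str], bool] = default_iskeyword) -> int:
    cp = ord(ch)
    if cp < 0x100:
        # Low range: white space, word character, or punctuation.
        if ch in " \t\u00a0":
            return 0
        return 2 if iskeyword(ch) else 1
    for first, last, klass in WIDE_CLASSES:
        if first <= cp <= last:
            return klass
    return 2  # fallback for anything outside the table


def split_by_class(text: str) -> List[str]:
    """Split text wherever the character class changes."""
    return ["".join(group) for _, group in groupby(text, key=char_class)]
```

With this table, `split_by_class("ひらがなカタカナ漢字 abc,")` separates the Hiragana, Katakana, and Kanji runs from each other as well as from the ASCII text.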

Reference