Python code correcting text on laptop screen.

Captain Walker

Solving another annoying problem with AI collaboration

ms word, spelling, workflow, luddite, author, authorship, markdown, python, script, efficiency

Estimated reading time at 200 wpm: 9 minutes

Despite explicit instructions, generative AI systems routinely drift into US English spelling and conventions across longer writing projects. For British-English authors who use AI assistance, this creates a persistent and time-consuming editorial burden. The key insight that resolved this was accepting that generation and linguistic compliance should be treated as separate stages. Rather than attempting to force AI systems to behave perfectly, the solution is to normalise language deterministically at the final output stage. See also: Fear Dressed in Ethics: Why the AI Backlash is Really About Status, Not Souls – The Captain’s Watch

Whether or not you agree, our Fat Disclaimer applies

This post documents how a fully offline, auditable, British-English normalisation workflow was designed and implemented in Python, for Markdown and Microsoft Word documents.

If you’re a luddite, leave now to avoid a headache. Save your time for Netflix!

Design principles

Several constraints guided the approach from the outset. The solution had to run locally, without paid services or APIs. It needed to work across Markdown and Word files. It had to protect quotations, references, and other material that should not be altered. Finally, every change needed to be traceable so that errors could be identified and corrected.

The resulting system behaves more like a build step than an editor. It is run deliberately at the end of drafting, not continuously during writing. The author can therefore draft without being annoyed and distracted by American spellings, safe in the knowledge that they will all be dealt with in one fell swoop at the end.

Architectural overview

The solution consists of three components:

  1. A Python script that performs deterministic US‑to‑UK normalisation.
  2. A lexicon folder containing editable reference files.
  3. Automatically generated logs and backups that record every change.

The script itself contains no hard‑coded linguistic knowledge. Instead, it reads all spelling knowledge from external reference files, which makes the system transparent and extensible.
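
To make this concrete, the loading step might look something like the following minimal sketch (the function name, folder layout, and CSV columns here are illustrative assumptions, not the script's actual API):

    import csv
    from pathlib import Path

    LEXICON_DIR = Path(__file__).parent / "lexicon"

    def load_pairs(csv_path: Path) -> dict[str, str]:
        """Read US -> UK spelling pairs from a two-column CSV into a dict."""
        pairs: dict[str, str] = {}
        with csv_path.open(newline="", encoding="utf-8") as f:
            for row in csv.reader(f):
                if len(row) >= 2 and not row[0].startswith("#"):
                    pairs[row[0].strip()] = row[1].strip()
        return pairs

    base = load_pairs(LEXICON_DIR / "base_us_uk.csv")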

Lexicon structure

The lexicon is organised as a small folder that sits alongside the script.

base_us_uk.csv contains the main US→UK spelling pairs. This file is alphabetised and intended to be relatively stable once mature. It is editable by the author, so that newly encountered spelling problems can be added manually for the script to act on in future.
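
A few hypothetical rows, purely for illustration:

    color,colour
    center,centre
    organize,organise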

overrides.yaml contains project‑specific preferences, exclusions, and rule toggles. This is where most fine‑tuning occurs over time.
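
Its exact contents are project-specific; the keys below are invented purely for illustration:

    # Hypothetical overrides.yaml entries - real keys are project-specific
    exclusions:
      - program          # keep the US form when referring to software
    rules:
      rule_based_conversion: false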

exclude_phrases.txt contains exact phrases that must never be altered, such as titles of referenced works, quoted source material, product names, or institution names.
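
Hypothetical entries might look like:

    The Color Purple
    Centers for Disease Control and Prevention
    World Health Organization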

This file proved to be a particularly important safety mechanism. It allows the author to assert absolute precedence over automation in narrowly defined cases, without weakening the broader lexicon.

Typical uses include:

  1. Protecting the titles of books, papers, or reports that legitimately use US spelling.
  2. Preserving quoted passages where fidelity to the source matters more than house style.
  3. Avoiding unintended modification of proper nouns or branded terms.

Because matching is exact and case‑sensitive, the behaviour is predictable and transparent. Phrases listed here are temporarily protected before any normalisation occurs and are restored verbatim afterwards.
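
A minimal sketch of that protect-and-restore mechanism, assuming unique placeholder tokens (an illustration of the idea, not the script's exact code):

    def protect(text: str, phrases: list[str]) -> tuple[str, dict[str, str]]:
        """Swap each excluded phrase for a unique placeholder token."""
        placeholders: dict[str, str] = {}
        for i, phrase in enumerate(phrases):
            if phrase in text:                  # exact, case-sensitive match
                token = f"\uE000EXCL{i}\uE001"  # private-use chars, unlikely in prose
                placeholders[token] = phrase
                text = text.replace(phrase, token)
        return text, placeholders

    def restore(text: str, placeholders: dict[str, str]) -> str:
        """Reinstate protected phrases verbatim after normalisation."""
        for token, phrase in placeholders.items():
            text = text.replace(token, phrase)
        return text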

This separation allows large‑scale coverage without losing editorial control.

Protected regions

A critical requirement was that quotations and references must not be silently altered.

For Markdown files, the script automatically protects fenced code blocks, inline code, blockquotes (by default), and any content under a heading titled “References” or “Bibliography”.

For Word documents, paragraphs styled as Quote or Bibliography are skipped, and everything after a heading titled “References” or “Bibliography” is left untouched.

Additional protection can be applied explicitly using the exclude phrases file.
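
In sketch form, the Markdown protections come down to a few regular expressions and a split at the references heading (illustrative only; the real script's patterns may differ):

    import re

    # Regions protected in Markdown files (blockquote handling omitted for
    # brevity); matched spans can be shielded with the same placeholder
    # technique shown above for excluded phrases.
    FENCED = re.compile(r"```.*?```", re.DOTALL)   # fenced code blocks
    INLINE = re.compile(r"`[^`\n]+`")              # inline code spans
    REFS = re.compile(r"^#{1,6}\s*(References|Bibliography)\s*$", re.MULTILINE)

    def split_at_references(text: str) -> tuple[str, str]:
        """Split so everything from the references heading onward is untouched."""
        m = REFS.search(text)
        return (text[:m.start()], text[m.start():]) if m else (text, "")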

Logging and auditability

Every run of the script generates two timestamped CSV logs whose filenames include the processed file or folder name.

The summary log records how many word‑level changes were made per file.

The detailed log itemises each US spelling replaced, its UK equivalent, and the number of occurrences, per file.
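
In outline, writing the detailed log needs nothing beyond the standard csv module (the column names and filename pattern here are assumptions):

    import csv
    from datetime import datetime

    def write_detailed_log(target_name: str, changes) -> None:
        """changes: iterable of (file, us_word, uk_word, occurrences) tuples."""
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        log_name = f"{target_name}_{stamp}_detailed.csv"
        with open(log_name, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["file", "us_spelling", "uk_spelling", "occurrences"])
            writer.writerows(changes)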

These logs are generated even during dry runs, providing full visibility before any changes are committed.

An important refinement during development was recognising that logs are not merely diagnostic artefacts, but part of the authorial assurance process. They function as an explicit audit trail, allowing retrospective review of exactly what was altered, when, and why.

Installation instructions (Windows)

Python 3.10 or later must be installed. During installation, ensure that “Add Python to PATH” is selected.

It is common for multiple Python versions to coexist on Windows. Dependencies must be installed into the same Python interpreter that runs the normaliser.

Open PowerShell and install the required libraries using the explicit interpreter invocation:

python -m pip install python-docx pyyaml

If the graphical interface uses a different Python version (for example Python 3.13), repeat the installation using that interpreter explicitly:

C:\Path\To\Python313\python.exe -m pip install python-docx pyyaml

Verify installation by running:

python normalise.py --help

Running a safe test

A dry run should be treated as the default mode of operation.

To perform a non‑destructive analysis of a single file, use a dry run. Paths containing spaces must be wrapped in quotes.

python normalise.py --dry-run "D:\Path\To\Your\File.md"

The script reports how many changes would be made and generates full audit logs, but does not modify the file.

This step is essential because some word pairs, such as while/whilst or forward/forwards, are stylistic rather than purely orthographic. Their appropriateness depends on context and authorial intent.

Applying changes

Once satisfied with the dry‑run output, apply changes by running the same command without the --dry-run flag:

python normalise.py "D:\Path\To\Your\File.md"

Before writing any changes, the script automatically creates timestamped backups of each file.
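
A backup step of this kind is only a few lines (a sketch; the real script's naming scheme may differ):

    import shutil
    from datetime import datetime
    from pathlib import Path

    def make_backup(path: Path) -> Path:
        """Copy file.md to file_YYYYMMDD_HHMMSS.md before any edits."""
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        dest = path.with_name(f"{path.stem}_{stamp}{path.suffix}")
        shutil.copy2(path, dest)  # preserves timestamps and metadata
        return dest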

As a matter of practice, changes should only be applied once the detailed log has been reviewed and any undesirable substitutions have been excluded or removed from the lexicon.

Expanding the lexicon

Initial coverage was seeded from a manually curated UK/US spelling table and then expanded into an alphabetised base lexicon.

During testing, rule‑based spelling transformations were found to produce invalid results in natural language contexts (for example door to doour). As a result, rule‑based conversion was explicitly disabled.
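
The failure mode is easy to reproduce. A naive word-final or→our rule handles color correctly but mangles ordinary words (a toy illustration, not the rules actually trialled):

    import re

    def naive_or_to_our(text: str) -> str:
        """Apply a naive suffix rule: word-final 'or' becomes 'our'."""
        return re.sub(r"or\b", "our", text)

    print(naive_or_to_our("color"))  # colour - intended
    print(naive_or_to_our("door"))   # doour - invalid, hence rules were disabled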

The final system relies exclusively on explicit mappings in base_us_uk.csv and carefully chosen entries in overrides.yaml.

This conservative approach favours correctness and authorial control over maximal automation.

Future additions are made incrementally as new cases are encountered, without modifying the script itself.

Final workflow

In practice, the workflow is intentionally simple and repeatable.

Draft content freely, without worrying about dialect drift.

When a section or chapter is complete, run the normaliser in dry‑run mode, either from the command line or via the graphical interface.

Review the detailed CSV log and adjust the lexicon if needed.

Apply changes only when the proposed substitutions align with authorial intent.

This staged approach removes an entire category of low‑value editorial work while preserving full linguistic control.

Graphical interface development

To reduce reliance on the command line, a minimal Tkinter‑based graphical interface was added.

UK Normaliser software dry run interface screenshot.

The interface allows selection of a single Markdown or Word file, toggling of dry‑run and safety options, and execution of the normaliser with full output and logging displayed.

The default Dry run option makes no changes. It produces a CSV file of what would be changed; the author can review this and decide whether to proceed with a full run, and/or modify base_us_uk.csv first.

The GUI functions strictly as a launcher. All lexicon files, backups, and logs remain external and editable.
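
A stripped-down sketch of that launcher pattern (the widget names and options here are invented; the real GUI differs):

    import subprocess
    import sys
    import tkinter as tk
    from tkinter import filedialog, scrolledtext

    def run_normaliser() -> None:
        cmd = [sys.executable, "normalise.py"]  # same interpreter as the GUI
        if dry_run.get():
            cmd.append("--dry-run")
        cmd.append(file_var.get())
        result = subprocess.run(cmd, capture_output=True, text=True)
        output.insert(tk.END, " ".join(cmd) + "\n" + result.stdout + result.stderr)

    root = tk.Tk()
    root.title("UK Normaliser")
    file_var = tk.StringVar()
    dry_run = tk.BooleanVar(value=True)         # dry run is the default
    tk.Button(root, text="Choose file",
              command=lambda: file_var.set(filedialog.askopenfilename())).pack()
    tk.Checkbutton(root, text="Dry run", variable=dry_run).pack()
    tk.Button(root, text="Run normaliser", command=run_normaliser).pack()
    output = scrolledtext.ScrolledText(root, height=12)
    output.pack()
    root.mainloop()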

Placement and launch

The GUI file (uk_normaliser_gui.py) must sit in the same folder as normalise.py and the lexicon folder. This is deliberate: the GUI simply invokes the normaliser and relies on its adjacent lexicon and logging paths.

If double‑clicking opens the file in an editor rather than launching it, start it once from PowerShell:

python uk_normaliser_gui.py

Thereafter, Windows usually associates .py files with Python correctly.

Dependency alignment (the most common stumbling block)

A frequent Windows pitfall is having multiple Python versions installed. The GUI runs under whichever Python interpreter launches it. If the normaliser is then executed under a different interpreter, errors such as the following may be seen:

ModuleNotFoundError: No module named yaml

This is not a script error. It means the required packages were installed into one Python environment (for example Python 3.12) but the GUI/normaliser is running under another (for example Python 3.13).

To diagnose which interpreter is being used:

where python
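
Alternatively, from within Python itself, the exact interpreter path can be printed directly:

python -c "import sys; print(sys.executable)"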

To install dependencies into the exact interpreter that is running the GUI/normaliser, use the explicit interpreter path shown in the GUI output (or returned by where python):

C:\Path\To\Python313\python.exe -m pip install python-docx pyyaml

This approach is robust and avoids PATH ambiguity.

About pip warnings

Warnings may appear that pip.exe is not on PATH. These warnings are usually harmless when installing packages via:

python.exe -m pip …

This invocation targets the correct interpreter regardless of PATH configuration.

Operating practice with the GUI

Dry run should remain the default. The detailed CSV log is the primary guardrail against context‑sensitive substitutions (for example while/whilst or forward/forwards). The GUI makes this routine by keeping dry‑run enabled by default and displaying the exact command executed.

Once satisfied with the detailed log, untick dry run and rerun.

Troubleshooting checklist

If the GUI reports failure:

  1. Confirm normalise.py is in the same folder as the GUI.
  2. Confirm the lexicon folder exists next to normalise.py.
  3. Check the GUI output for the Python interpreter path being used.
  4. Install dependencies into that interpreter using python.exe -m pip …
  5. Rerun in dry-run mode and confirm logs are generated.

UK normaliser: quick start on a new machine

When setting up the tool on a new machine (for example a laptop), the following short checklist prevents almost all problems.

  1. Open PowerShell in the normaliser folder (Shift + right-click in the folder → Open PowerShell here)
  2. Confirm Python works:
     python --version
  3. Install required packages into this Python:
     python -m pip install python-docx pyyaml
  4. Run the graphical interface (recommended):
     python uk_normaliser_gui.py
  5. First run (safe). In the GUI:
     • Select one .md or .docx file
     • Leave Dry run ticked
     • Click Run normaliser
  6. Apply changes (only after review): untick Dry run and run again.

Closing reflections

The core lesson is that reliability comes from determinism, not instruction. By treating language normalisation as a mechanical post‑processing step, British‑English authors can work with modern AI tools without surrendering consistency or spending hours correcting avoidable errors.

Equally important is the recognition that some linguistic choices are contextual and stylistic. A dry‑run‑first workflow ensures that automation assists the author without overriding judgement.