Spell checker: run LanguageTool server in Docker

 

While looking for a better spell checker for the browser, I came across the open source software LanguageTool. LanguageTool checks text in English, Spanish, French, German, Portuguese, Polish, Dutch and more than 20 other languages, and it also finds errors that a simple spell checker cannot detect. Those who do not want to send their texts to a cloud service can run a LanguageTool server themselves. Since the server is also available as a Docker image, it can be started on any computer or server with little effort and used throughout your own network.

Browser plugin

LanguageTool is available as a browser plugin for the well-known web browsers, such as Google Chrome, Firefox or Edge. By default, the plugin sends all text entered in the browser to https://languagetool.org.

Functionality

LanguageTool examines all input fields and can be used universally for all web pages or web applications.

Advanced settings: using your own server

Those who run their own LanguageTool server can enter its address in the advanced settings of the browser plugin.
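Before entering an address in the plugin, it can help to query the server directly. The following is a minimal sketch, assuming the server runs on localhost with port 8010 as in the compose file further down, and that the plugin expects the address of the /v2 endpoint. The helper names lt_base_url and lt_check are made up for this example; /v2/check with the parameters text and language belongs to the LanguageTool HTTP API.

```shell
# lt_base_url [HOST] [PORT]: print the API base address (assumed defaults: localhost, 8010)
lt_base_url() {
  printf 'http://%s:%s/v2\n' "${1:-localhost}" "${2:-8010}"
}

# lt_check TEXT [LANGUAGE] [BASE_URL]: send TEXT to the /v2/check endpoint
# and print the JSON answer of the server (requires curl)
lt_check() {
  curl -s \
    --data-urlencode "text=$1" \
    --data-urlencode "language=${2:-en-US}" \
    "${3:-$(lt_base_url)}/check"
}
```

With the container running, a call such as lt_check "This are a test." should return a JSON document listing the detected problems. The value printed by lt_base_url is the address that would go into the plugin settings.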

Launch Docker container

Docker Basics

Docker allows applications to be launched with a single command in a so-called container.
A container is an isolated environment that is largely independent of the host operating system (OS).
When a container is first launched, Docker automatically downloads the required image
from the internet.
Docker can be installed on Windows, macOS or a Linux distribution.
To start LanguageTool, I created a docker-compose.yml file with the following content:

version: "3"

services:
  languagetool:
    image: erikvl87/languagetool
    container_name: languagetool
    ports:
        - 8010:8010  # Using default port from the image
    environment:
        - langtool_languageModel=/ngrams  # OPTIONAL: Using ngrams data
        - Java_Xms=2g  # OPTIONAL: Setting a minimum Java heap size of 2 GiB
        - Java_Xmx=4g  # OPTIONAL: Setting a maximum Java heap size of 4 GiB
        - timeoutRequestLimit=120
    volumes:
        - ./ngrams:/ngrams        
    restart: always

To make LanguageTool work with longer texts, I set the Java heap size in the example: "Java_Xms" (initial heap) to 2g and "Java_Xmx" (maximum heap) to 4g. The folder ./ngrams should be filled with NGRAM data for a more accurate check.
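Starting the container and waiting until the server answers could look roughly like this. It is a sketch under assumptions: it uses the docker compose subcommand (older installations use docker-compose instead), it is run from the folder that contains docker-compose.yml, and it polls /v2/languages on port 8010. The helper names retry and start_languagetool are made up for this example.

```shell
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds or ATTEMPTS are used up
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0      # success: stop retrying
    i=$((i + 1))
    sleep "$delay"        # wait before the next attempt
  done
  return 1                # all attempts failed
}

# start_languagetool: start the service in the background and wait
# (up to about 60 seconds) until the HTTP API responds
start_languagetool() {
  docker compose up -d || return 1
  retry 30 2 curl -sf -o /dev/null http://localhost:8010/v2/languages
}
```

The first start can take a while because the image has to be downloaded and the Java process needs some time to load, which is why the sketch polls instead of checking only once.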

NGRAM data

To increase the accuracy of the server, so-called NGRAM data can be used. NGRAM data are short word sequences extracted from large text collections together with their frequencies; they let LanguageTool use statistical probabilities, for example to tell apart easily confused words. The NGRAM data can be downloaded from the following URL: languagetool.org/download/ngram-data/. The zip files should be unzipped into the ngrams subfolder.
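To make the idea of decomposed text fragments more concrete, the following sketch splits a sentence into word n-grams. It only illustrates the concept and is not how LanguageTool builds or reads its data; the function name ngrams is made up for this example.

```shell
# ngrams N: read text from stdin and print every sequence of N consecutive words
ngrams() {
  tr -s '[:space:]' '\n' | awk -v n="$1" '
    NF { w[++c] = $0 }                     # collect the words, skip empty lines
    END {
      for (i = 1; i + n - 1 <= c; i++) {   # slide a window of n words
        s = w[i]
        for (j = 1; j < n; j++) s = s " " w[i + j]
        print s
      }
    }'
}
```

For example, echo "the quick brown fox" | ngrams 2 prints the three word pairs "the quick", "quick brown" and "brown fox", one per line. The downloadable data records how often such sequences occur in real text.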

On a Linux machine, the data can be loaded and unzipped via the terminal as follows:

mkdir -p ngrams
wget https://languagetool.org/download/ngram-data/ngrams-de-20150819.zip
unzip ngrams-de-20150819.zip -d ngrams
wget https://languagetool.org/download/ngram-data/ngrams-en-20150817.zip
unzip ngrams-en-20150817.zip -d ngrams
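After unzipping, it is worth checking that the folder has the layout the server expects. The sketch below assumes one subfolder per language code (for example en and de), each containing the folders 1grams, 2grams and 3grams; this layout is an assumption taken from the LanguageTool documentation and should be compared with the actual contents of the archives. The function name check_ngrams is made up for this example.

```shell
# check_ngrams DIR LANG...: verify that DIR/LANG/1grams, 2grams and 3grams exist
check_ngrams() {
  dir=$1
  shift
  status=0
  for lang in "$@"; do
    for n in 1 2 3; do
      if [ ! -d "$dir/$lang/${n}grams" ]; then
        echo "missing: $dir/$lang/${n}grams" >&2   # report each missing folder
        status=1
      fi
    done
  done
  return "$status"
}
```

A call such as check_ngrams ./ngrams de en returns 0 if all folders are present and prints the missing ones otherwise.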

Conclusion

LanguageTool detects errors very reliably and looks not only at individual words, but also at entire sentences. In addition to spelling and punctuation, it also marks sentences where a possible style improvement is detected, for example repeated words or sentences that are too long. Since LanguageTool can be run on the user's own server, it can also be used for sensitive texts without sending them to a third party.

 
