https://www.proz.com/forum/ai_for_translators_and_interpreters/366301-remove_all_lines_in_language_x.html

Remove all lines in language X
Thread poster: Samuel Murray
Samuel Murray  Identity Verified
Netherlands
Member (2006)
English to Afrikaans
Feb 27

Hello everyone

I have a text file with lines of text in English, but unfortunately some of the lines are in Afrikaans. I want to either remove the Afrikaans lines or create a list of all the Afrikaans lines (either option is good for me). ChatGPT claims to be able to do this, but as usual, it simply creates a list of lines that looks plausible until you double-check it and discover that the bot has just made up lines that closely resemble the topic of the input lines.

Is there an AI (or technological) solution to do this?

Thanks
Samuel


 
Stepan Konev  Identity Verified
Russian Federation
English to Russian
Remove by language Feb 27

You could try removing the lines by language: select all the text, enable 'Detect language automatically' in the Proofing Language settings, and then use Find and Replace to replace all text marked as Afrikaans, leaving the 'Replace with' field blank.

[Edited at 2024-02-27 18:22 GMT]


 
Neirda  Identity Verified
China
Chinese to French
An alternative to AI May 23

If you can use Python, there are a few libraries you can use to detect the language of a text and optionally do anything you want with the result. What you can use ChatGPT for is to walk you through the steps of doing that; it's simpler than you think.

The catch is:
- Most of these libraries will probably not be very accurate at detecting Afrikaans and might mistake it for German.
- You need a sample of at least a few dozen characters to eliminate false positives.

These libraries are not related to AI but mostly work with "n-grams": so-called "trained data" made up of lots of samples of 3- to 4-letter sequences. When you compare a text against such data, you can actually detect most languages pretty well.
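
To make the n-gram idea concrete, here is a minimal sketch in plain Python. It is not how any particular library is implemented; the function names (`trigram_profile`, `cosine`, `guess_language`) are made up for illustration, and the "training data" is just three sentences borrowed from elsewhere in this thread, whereas real libraries ship profiles built from much larger corpora.

```python
from collections import Counter
from math import sqrt


def trigram_profile(text):
    """Count overlapping 3-character sequences in the lower-cased text."""
    # Normalise whitespace and pad with spaces so word boundaries count too.
    padded = f" {' '.join(text.lower().split())} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count profiles (0.0 to 1.0)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Toy training data (an assumption for this sketch): one sentence per language.
SAMPLES = {
    "en": "I assume that it is more likely that the language will be identified as Dutch.",
    "nl": "Ik neem aan dat het waarschijnlijker is dat de taal als Nederlands geïdentificeerd zal worden.",
    "af": "Ek neem aan dat dit meer waarskynlik is dat die taal as Nederlands geïdentifiseer sal word.",
}
PROFILES = {lang: trigram_profile(text) for lang, text in SAMPLES.items()}


def guess_language(text):
    """Return the language code whose profile is most similar to the text."""
    profile = trigram_profile(text)
    return max(PROFILES, key=lambda lang: cosine(profile, PROFILES[lang]))
```

With profiles this small the guesses are only a demonstration, but comparing `cosine(PROFILES["af"], PROFILES["nl"])` with `cosine(PROFILES["af"], PROFILES["en"])` is a quick way to see why closely related languages are easy to confuse and why short lines are hard to classify.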


 
Hans Lenting
Netherlands
Member (2006)
German to Dutch
Python May 23

Neirda wrote:

- most of these libraries will probably not be too accurate with detecting Afrikaans and might mistake it with German.


I assume that it is more likely that the language will be identified as Dutch.

Ik neem aan dat het waarschijnlijker is dat de taal als Nederlands geïdentificeerd zal worden.

Ek neem aan dat dit meer waarskynlik is dat die taal as Nederlands geïdentifiseer sal word.

Since I’ve recently installed Python 3 on macOS Sonoma, I’d be grateful for a link to the Python scripts.


 
Neirda  Identity Verified
China
Chinese to French
There's no link May 23

You have to do this yourself, or ask ChatGPT to. I don't know the Python libraries, as I mostly use C#, but since Python is the most popular programming language, I'm sure they exist.

This is what ChatGPT told me:

In Python, there are several libraries available for language detection. Some of the most popular ones include:

langdetect: This library is a port of Google's language-detection library. It's simple to use and supports many languages.

python

from langdetect import detect

text = "Bonjour tout le monde"
language = detect(text)
print(language) # Output: 'fr'

langid: This library is another option for language identification. It also supports many languages and is quite straightforward to use.

python

import langid

text = "Hello world"
language, _ = langid.classify(text)
print(language) # Output: 'en'

polyglot: This library offers language detection as part of a larger suite of NLP tools. It requires installing some additional dependencies.

python

from polyglot.detect import Detector

text = "Hola mundo"
detector = Detector(text)
print(detector.language.code) # Output: 'es'


I'd start with that.
Then you will also need to write your own routine for whatever you are trying to achieve.
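
For the original question, such a routine could look roughly like the sketch below. It is one possible approach under stated assumptions, not a tested solution: `split_lines`, `clean_file` and `langdetect_detector` are names invented here, the file paths are whatever you pass in, and the 20-character minimum is an arbitrary stand-in for "a few dozen characters". It also treats lines detected as Dutch ("nl") as Afrikaans, following the remark above that the two are likely to be confused; drop "nl" if your file contains genuine Dutch. The detector is passed in as a function, so langid or polyglot could be swapped in instead of langdetect.

```python
def langdetect_detector():
    """Build a detector function from the langdetect package (pip install langdetect)."""
    from langdetect import DetectorFactory, detect
    from langdetect.lang_detect_exception import LangDetectException

    # langdetect is randomised internally; a fixed seed makes runs repeatable.
    DetectorFactory.seed = 0

    def detect_or_none(text):
        try:
            return detect(text)  # a language code such as 'en' or 'af'
        except LangDetectException:
            return None  # raised when the text has no usable features, e.g. only digits

    return detect_or_none


def split_lines(lines, detect, unwanted=("af", "nl"), min_chars=20):
    """Separate lines detected as an unwanted language from the rest.

    Lines shorter than min_chars are kept, because short samples
    are too unreliable to classify.
    """
    kept, removed = [], []
    for line in lines:
        text = line.strip()
        if len(text) >= min_chars and detect(text) in unwanted:
            removed.append(line)
        else:
            kept.append(line)
    return kept, removed


def clean_file(src, kept_path, removed_path, detect=None):
    """Write the kept and the removed lines of src to two separate files."""
    detect = detect or langdetect_detector()
    with open(src, encoding="utf-8") as handle:
        kept, removed = split_lines(handle, detect)
    with open(kept_path, "w", encoding="utf-8") as handle:
        handle.writelines(kept)
    with open(removed_path, "w", encoding="utf-8") as handle:
        handle.writelines(removed)
    return len(kept), len(removed)
```

Calling something like `clean_file("input.txt", "english.txt", "afrikaans.txt")` (hypothetical file names) would then give you both the cleaned file and the list of removed lines, so you can check the removed lines by eye before trusting the result.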



