Remove Duplicate Lines
Strip duplicate lines from text — preserve order or sort, case-sensitive or insensitive, trim whitespace before comparing. Counts kept and removed. Browser-only, handles large lists.
Deduplicate Lists Without Losing Order
The most common deduplication mistake is using sort -u when you actually wanted to preserve insertion order. Sorting destroys the meaning of "first occurrence wins": your CSV row order, your log timeline, your hand-curated list all get shuffled into alphabetical noise. This tool defaults to order-preserving deduplication: the first time a line appears, it's kept; every subsequent identical line is silently dropped. The output looks like the input minus the repeats, with no reordering.
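To see concretely what sorting destroys, here is a small Python sketch (sample list invented for illustration) comparing order-preserving deduplication against the sort -u approach:

```python
lines = ["cherry", "apple", "cherry", "banana", "apple"]

# Order-preserving: dict keys keep insertion order (Python 3.7+),
# so the first occurrence of each line wins.
ordered = list(dict.fromkeys(lines))
print(ordered)        # ['cherry', 'apple', 'banana']

# sort -u equivalent: duplicates gone, but the original order is lost.
alphabetical = sorted(set(lines))
print(alphabetical)   # ['apple', 'banana', 'cherry']
```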
The Comparison Options Matter More Than You Think
- Case-sensitive (default) — 'Apple', 'APPLE', and 'apple' are three distinct values. Use this for code identifiers, file paths on case-sensitive filesystems, and any data where capitalization is semantic.
- Case-insensitive — those three become one. Use this for email addresses (RFC 5321 says the local part must be treated as case-sensitive, but in practice nearly all mail systems treat it case-insensitively), tag names, and free-form user input.
- Trim whitespace — '  hello' (padded) and 'hello' (no surrounding space) are merged. Useful when input came from copy-paste with stray indentation, or when joining data from sources that handle whitespace differently.
- Remove blank lines — strips empty lines entirely from the output, even if they were unique. Default behavior dedupes blanks like any other line.
- Only show duplicates — inverts the operation: outputs only the lines that appeared more than once (each shown once). Useful for finding the dupes in a list rather than removing them.
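The "only show duplicates" mode is easy to replicate in a script. A minimal Python sketch (sample data invented): count occurrences first, then emit each repeated line once, in first-appearance order.

```python
from collections import Counter

lines = ["a", "b", "a", "c", "b", "a"]
counts = Counter(lines)

# Keep lines that occurred more than once, each emitted once,
# in the order of their first appearance.
dupes = list(dict.fromkeys(l for l in lines if counts[l] > 1))
print(dupes)   # ['a', 'b']
```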
Why Order Preservation Matters
A common scenario: you have a CSV exported from a CRM, and the export tool has a known bug that occasionally double-includes rows. You want to dedupe, but the export order encodes information — chronological, by sales region, by priority. Sorting alphabetically would corrupt that.
This tool's algorithm is the same as awk '!seen[$0]++': walk through the input once, maintain a Set of lines already seen, emit each line the first time it appears. O(n) time, O(unique-lines) space. The Set lookup is constant-time, so a list of 100,000 lines processes in well under a second.
Doing the Same in Code
# Bash / Unix — preserve order
awk '!seen[$0]++' file.txt

# Bash — alphabetize and dedupe
sort -u file.txt

# Bash — find duplicates only
sort file.txt | uniq -d

# PowerShell
Get-Content file.txt | Select-Object -Unique

# Python (3.7+ preserves dict insertion order)
unique = list(dict.fromkeys(open('file.txt').read().splitlines()))

// JavaScript
const unique = [...new Set(text.split('\n'))];

# Case-insensitive in Python
seen = set()
result = []
for line in lines:
    key = line.casefold().strip()
    if key not in seen:
        seen.add(key)
        result.append(line)

Edge Cases Worth Knowing
- Different line endings — \r\n (Windows), \n (Unix), and \r (legacy Mac) all parse correctly. Output uses \n.
- Trailing newline — preserved in output if present in input. Most code expects this; some text editors strip it silently.
- Unicode normalization — 'é' (U+00E9) and 'é' (U+0065 + U+0301, combining acute) are distinct lines. Run them through a Unicode normalizer (NFC) first if you need them to merge.
- BOM — files saved by Excel may start with a UTF-8 BOM (EF BB BF). The tool does not strip it automatically, so if you see a phantom invisible first line, that's why.
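Both the line-ending and normalization edge cases are easy to reproduce in Python (a sketch, not the tool's actual code): str.splitlines() treats all three ending styles as line breaks, and unicodedata.normalize merges the two accented forms.

```python
import unicodedata

# splitlines() recognizes \r\n, \n, and \r alike.
text = "one\r\ntwo\ntwo\rone"
print(text.splitlines())   # ['one', 'two', 'two', 'one']

# Precomposed U+00E9 vs U+0065 + U+0301: distinct as raw strings...
a, b = "caf\u00e9", "cafe\u0301"
print(a == b)              # False
# ...identical after NFC normalization, so they dedupe together.
print(unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b))  # True
```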
For comparing two lists side-by-side instead of deduplicating one, see our Text Diff tool. To sort lines, use Text Sorter.
How to Use
- Paste your list — one item per line.
- Pick options — case-insensitive, trim whitespace, sort, remove blanks, or invert to show only duplicates.
- Click Process — the deduped output appears below with stats (kept / removed / duplicates).
- Copy or download — use Download for very large outputs.
Frequently Asked Questions
Does it preserve the order of the first occurrence?
Yes. The default 'Preserve order' mode keeps the first occurrence of each unique line and discards subsequent duplicates. Output order matches input order minus the duplicates. Use the 'Sort A→Z' option if you want alphabetical output instead.
Is the comparison case-sensitive?
Yes by default. 'Apple' and 'apple' are kept as two distinct lines. Toggle 'Case-insensitive' to treat them as duplicates — the first occurrence wins, so if 'Apple' appears before 'apple', the output will have 'Apple'.
What about lines with extra whitespace?
By default, leading/trailing whitespace is significant — ' hello' and 'hello' are different lines. Toggle 'Trim whitespace' to ignore leading/trailing spaces and tabs when comparing. The kept lines retain their original whitespace unless you also enable 'Normalize whitespace' to apply trimming to the output.
How are blank lines handled?
Blank lines are deduplicated like any other line by default: the first blank line in the input is kept, and every later blank line is removed, whether or not they were consecutive. Toggle 'Keep blank lines' to preserve all of them, or 'Remove blank lines' to strip them entirely from the output.
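In Python terms (a sketch with invented sample data), the default and 'Remove blank lines' behaviors look like this:

```python
lines = ["a", "", "b", "", "", "c"]

# Default: blanks dedupe like any other line, so the first "" survives.
deduped = list(dict.fromkeys(lines))
print(deduped)      # ['a', '', 'b', 'c']

# 'Remove blank lines': drop every blank (whitespace-only lines too).
no_blanks = [l for l in lines if l.strip()]
print(no_blanks)    # ['a', 'b', 'c']
```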
Is there a size limit?
The browser handles tens of thousands of lines comfortably. For lists with hundreds of thousands of lines, the Set lookup remains O(1) so processing stays fast — the bottleneck is rendering the output. Use the Download button to skip rendering for very large outputs.
How do I do this on the command line?
Unix: `awk '!seen[$0]++' file` (preserve order) or `sort -u file` (alphabetical). PowerShell: `Get-Content file | Select-Object -Unique`. Python: `list(dict.fromkeys(open('file').read().splitlines()))` (Python 3.7+ preserves insertion order). For very large files (gigabytes), `awk` is the fastest single-pass option.