
I'm making a presentation and I need to know if I should use "duplicated data" or "duplicate data". Is there any difference? I'm talking about removing observations of a database that are duplicates.

RegDwigнt
  • You mean "observed instances of" perhaps? – Kris May 11 '12 at 15:05
    I would go with duplicate data as is the case in duplicate records. – Noah May 11 '12 at 17:57
    Thanks everybody for the answers. I learned a lot. English is not my first language and in Portuguese we don't have this distinction between duplicate and duplicated. All answers are great and I voted for all (I really like approval voting!). However, I can only choose one as accepted answer. – Manoel Galdino May 11 '12 at 20:13

4 Answers


The difference is subtle, but in this case, Duplicate Data would be preferred. I would interpret the two phrases as follows:

Duplicate Data: entries that a system user has added multiple times, for example by re-registering after forgetting their details.

Duplicated Data: someone has deliberately made an exact copy of the data (or part of it), perhaps for backup or reporting purposes, and that copy may then have been accidentally added back to the original.

In the context of what you are talking about, the difference is important, because the second implies exact duplicates, whereas the first can include partial matches and is a much more complex issue.

And yes, "exact duplicate" and "partial duplicate" are misnomers (a record either is a duplicate or it isn't), but these are the terms used.
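To make the exact-versus-partial distinction concrete, here is a minimal Python sketch over a made-up user table (the rows and both function names are hypothetical, not from the question). Exact duplicates can be found by comparing fields directly; partial duplicates need some matching rule, and the trim-and-lowercase normalization used here is only one simple assumed rule, far short of real record matching.

```python
from collections import defaultdict

# Hypothetical user table (id, name, email). Rows 1 and 2 are exact duplicates
# (e.g. a copied batch added back by mistake); row 3 is the same person
# re-registering with slightly different details, i.e. a "partial duplicate".
rows = [
    (1, "Ana Silva", "ana@example.com"),
    (2, "Ana Silva", "ana@example.com"),
    (3, "ana silva ", "Ana@Example.com"),
    (4, "Bo Chen", "bo@example.com"),
]


def exact_duplicate_groups(rows):
    """Group row ids whose (name, email) fields are identical, ignoring the id."""
    groups = defaultdict(list)
    for row_id, name, email in rows:
        groups[(name, email)].append(row_id)
    return [ids for ids in groups.values() if len(ids) > 1]


def normalized_duplicate_groups(rows):
    """Group row ids after an assumed normalization: trim spaces and lowercase."""
    groups = defaultdict(list)
    for row_id, name, email in rows:
        key = (name.strip().lower(), email.strip().lower())
        groups[key].append(row_id)
    return [ids for ids in groups.values() if len(ids) > 1]


print(exact_duplicate_groups(rows))       # [[1, 2]]
print(normalized_duplicate_groups(rows))  # [[1, 2, 3]]
```

Row 3 is missed by the exact check and only caught once names and emails are normalized, which is why duplicate data of the re-registration kind is the harder cleanup problem.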

    Yup, I would interpret the difference between the two forms exactly as you describe. To put it concisely, duplicate is accidental, duplicated is deliberate. – Marthaª May 11 '12 at 16:46
  • @Marthaª - yes simplistically, but Duplicated can also be accidental. But as a rule, you are right. – Schroedingers Cat May 11 '12 at 17:20
    I think this is a spurious distinction - just because we use both forms doesn't mean they have to have different meanings. – FumbleFingers May 11 '12 at 18:29
    @Marthaª and duplicate can also be deliberate. I think it more matters what the source of the data is. See my answer :) – user606723 May 11 '12 at 18:56
  • Accidental and deliberate are literary senses as construed by the reader and thus subjective. This is not relevant in a strictly technical context. – Kris May 12 '12 at 05:28
  • @FumbleFingers But in the context, they DO have different meanings, and the meanings are important. – Schroedingers Cat May 12 '12 at 10:13
  • @Schroedingers Cat: I understand that a system may contain multiple copies of the same data because it was entered more than once, and/or because existing data was replicated. What I don't accept is that any sensible writer would expect his readers to recognise which kind he was referring to on the basis of it being called duplicate or duplicated. – FumbleFingers May 12 '12 at 17:18

Other answers are good, but in my words:

Duplicated data uses the existing data as its source.

process(X) -> A
A -> B
A is duplicated as B

Duplicate data came from the same input and therefore produced matching records.

process(X) -> A
process(X) -> B
B is a duplicate of A
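The two pipelines above can be sketched in Python as follows. `Record`, `process` and `remove_duplicates` are hypothetical names chosen for illustration, not part of the answer. The point the sketch shows is that both routes end with two equal records, so the data alone does not reveal which word applies; only its history does.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)  # frozen=True makes records hashable, so they can be dict keys
class Record:
    name: str
    email: str


def process(form_input: dict) -> Record:
    """Hypothetical stand-in for whatever turns raw input X into a stored record."""
    return Record(name=form_input["name"].strip(), email=form_input["email"].lower())


def remove_duplicates(records: list) -> list:
    """Keep the first occurrence of each distinct record, preserving order."""
    return list(dict.fromkeys(records))


X = {"name": "Ana ", "email": "Ana@Example.com"}

# Duplicate data: the same input X was processed twice (e.g. a double registration).
a1 = process(X)
b1 = process(X)  # B is a duplicate of A

# Duplicated data: an existing record A was copied to make B (e.g. a backup).
a2 = process(X)
b2 = replace(a2)  # no field changes, so replace() returns a new, equal copy: A is duplicated as B

print(a1 == b1, a2 == b2)  # True True
print(len(remove_duplicates([a1, b1, a2, b2])))  # 1
```

Whichever way the extra copies arose, removing them is the same operation here: `dict.fromkeys` keeps the first occurrence of each distinct record in its original order.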

Duplicate data.

What you discover and remove are instances of duplicate data.

What you or your processes create, usually for a purpose, is duplicated data.

Kris

When I hear (or read) duplicate data, I presume duplicate is an adjective that modifies the word data.

Most of the dictionaries I consulted listed duplicate as a noun, verb, and adjective, but duplicated only as a form of the verb. For that reason, I'd use duplicate data.

J.R.