3

I want to describe data that is easy for a human to understand but very difficult for a machine to understand, such as raw, unfiltered user input.

For example, suppose I have a column in a table that indicates a location (sf_locations). One of the rows has the text "from 2nd to 5th" for that column.

This may be easily understandable to a human who knows the context is San Francisco, but a machine would not have a clue how to process that without hacky special-case logic or NLP.

Are there any good words to describe this kind of data (with regard to machine readability)?

Ryan
  • 1
    I just listened to the Planet Money podcast about the Mechanical Turk at Amazon and how it is designed specifically to handle data like this. They didn't call it by a special name in the story, though, so I don't know if it has such a name. – Mark Thompson Feb 22 '15 at 04:02
  • 1
    So long as it's written in a legible hand, there is a computer that can "read" it. And computers are getting better all the time at interpreting the data, even though it's not intended for computer consumption. You probably need to think about what criteria/qualifications you're placing on the data and the "machine" reading it. – Hot Licks Feb 22 '15 at 04:09

3 Answers

4

In tech circles, this is usually known as "unstructured data".

From Wikipedia, for example:

Unstructured data (or unstructured information) refers to information that either does not have a pre-defined data model or is not organized in a pre-defined manner.

Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well. Examples of "unstructured data" may include books, journals, documents, metadata, health records, audio, video, analog data, images, files, and unstructured text such as the body of an e-mail message, Web page, or word-processor document.

Techniques such as data mining, Natural Language Processing (NLP), text analytics, and noisy-text analytics provide different methods to find patterns in, or otherwise interpret, this information.

Software that creates machine-processable structure exploits the linguistic, auditory, and visual structure inherent in all forms of human communication. Algorithms can infer this inherent structure from text, for instance, by examining word morphology, sentence syntax, and other small- and large-scale patterns.

While the main content being conveyed does not have a defined structure, it generally comes packaged in objects (e.g. in files or documents, ...) that themselves have structure and are thus a mix of structured and unstructured data, but collectively this is still referred to as "unstructured data". For example, an HTML web page is tagged, but HTML mark-up typically serves solely for rendering.
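To make the distinction concrete with the question's example, here is a minimal sketch (not part of the Wikipedia excerpt). It assumes the column value is a plain string, and the names `StreetRange` and `parse_street_range` are hypothetical, invented only for illustration. It shows the kind of special-case logic the question mentions: one anticipated phrasing can be turned into structured fields, while other phrasings a human would read just as easily fall through.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreetRange:
    """Structured form: explicit, typed fields a program can query directly."""
    start_street: int
    end_street: int


# Hypothetical special-case pattern: it matches only text shaped like
# "from 2nd to 5th" and nothing else.
_RANGE_PATTERN = re.compile(
    r"^\s*from\s+(\d+)(?:st|nd|rd|th)\s+to\s+(\d+)(?:st|nd|rd|th)\s*$",
    re.IGNORECASE,
)


def parse_street_range(text: str) -> Optional[StreetRange]:
    """Return a StreetRange if the text fits the one anticipated pattern, else None."""
    match = _RANGE_PATTERN.match(text)
    if match is None:
        return None
    return StreetRange(int(match.group(1)), int(match.group(2)))


if __name__ == "__main__":
    # The first value fits the pattern; the other two mean much the same
    # to a human reader but are not recognized by this parser.
    for raw in ["from 2nd to 5th", "between Second and Fifth", "2nd-5th, near the park"]:
        print(repr(raw), "->", parse_street_range(raw))
```

Every new phrasing needs another hand-written rule, which is why free-text columns like this tend to be called unstructured rather than merely badly formatted.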

Dan Bron
  • I had thought of "unstructured", but I feel it is a broader term and not exactly what I was looking for. In my example, saying that the column contains "unstructured data" has neither the same intent nor the same meaning as "<'not machine readable'> data". – Ryan Feb 22 '15 at 04:34
  • @Ryan The way I understand UD, it fits your example to a tee, but of course only you can speak to its applicability to your own circumstances. – Dan Bron Feb 22 '15 at 04:36
  • I basically want an adjective describing unstructured data that conveys that it is not easily readable by a machine -- not "unstructured data" itself. No matter, this is probably as good as it will get. – Ryan Feb 22 '15 at 04:52
  • 1
    @Ryan The adjective you're seeking is simply unstructured :) – Dan Bron Feb 22 '15 at 04:53
  • +1 because it answers the more general underlying question. I doubt there is a single word answer to the exact example in the question. – Chris H Feb 22 '15 at 10:53
  • +1. This is the answer to the question, IMO. One might also say 'untyped' to hammer it in, but unfortunately this word's got a lot to bear already. – anemone Feb 22 '15 at 19:49
0

I would go with non-machinable (also written nonmachinable), as in "nonmachinable data".

sojourner
  • Your link seems to have nothing to do with IT and the word "nonmachinable" doesn't seem to be used in the context of data parsing. The only usage of the term I can find is that the USPS uses it to mean "a letter that can't be sorted by machine". The question doesn't mention that it's looking for a neologism. – David Richerby Feb 22 '15 at 08:14
  • @David Richerby Thanks brother! I deleted the link. – sojourner Feb 22 '15 at 09:55
0

If it can be read by a machine, it's machine-readable; if it can't, it's non-machine-readable or just not machine-readable.