tulika goyal

Unstructured vs. Structured data

Data for Humans

A plain sentence such as “we have 5 white used golf balls with a diameter of 43mm at 50 cents each” is easy for a human to understand, but hard for a computer. This sentence is what we call unstructured data. Unstructured data has no fixed underlying structure: the sentence could easily be reworded, and it is not clear which word refers to what. Likewise, PDFs and scanned images may contain information that is pleasing to the human eye because it is laid out nicely, but they are generally not machine-readable.
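
To make the point about rewording concrete, here is a minimal sketch in Python. The pattern and the reworded sentence are made up for illustration and are not part of the original example. The pattern is written for one exact phrasing, so it pulls out the values from the original sentence but finds nothing once the same facts are phrased differently.

```python
import re

# A hypothetical pattern written for one specific phrasing of the sentence.
PATTERN = re.compile(r"we have (\d+) (\w+) (\w+) golf balls")

original = "we have 5 white used golf balls with a diameter of 43mm at 50 cents each"
# An invented rewording that carries the same facts in a different order.
reworded = "there are five used golf balls, all white, 43mm across, 50 cents apiece"

match = PATTERN.search(original)
print(match.groups())            # ('5', 'white', 'used')
print(PATTERN.search(reworded))  # None: the pattern no longer applies
```

A human reads both sentences the same way, while the program only copes with the one it was written for.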

Data for Computers

Computers are inherently different from humans. It can be exceptionally hard to make a computer extract information from certain sources, and some tasks that humans find easy are still difficult to automate. For example, interpreting text that is presented as an image is still a challenge for a computer. If you want your computer to process and analyse your data, it has to be able to read that data first. This means the data needs to be structured and in a machine-readable form.

One of the most commonly used formats for exchanging data is CSV, which stands for comma-separated values. The golf ball sentence from earlier, expressed as CSV, can look something like this:

"quantity","color","condition","item","category","diameter (mm)","price per unit (AUD)"
5,"white","used","ball","golf",43,0.5

This is much simpler for your computer to understand and can be read directly by spreadsheet software. Note that the words have quotes around them: this marks them as text (string values, in computer speak), whereas numbers have no quotes. CSV is only one option; there are many more structured, machine-readable formats, such as JSON and XML.
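
As a rough sketch of what "machine-readable" means in practice, the Python snippet below reads the golf ball CSV with the standard `csv` module. The function name `read_rows` and the variable names are made up for this example. The `QUOTE_NONNUMERIC` setting is an option of Python's reader, not a rule of CSV itself: it treats quoted fields as text and converts unquoted fields to numbers, which matches the quoting convention used in the example.

```python
import csv
import io

# The CSV example, with straight double quotes around text values.
CSV_TEXT = '''"quantity","color","condition","item","category","diameter (mm)","price per unit (AUD)"
5,"white","used","ball","golf",43,0.5
'''

def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    # QUOTE_NONNUMERIC converts every unquoted field to a float
    # and leaves quoted fields as strings.
    reader = csv.reader(io.StringIO(text), quoting=csv.QUOTE_NONNUMERIC)
    header = next(reader)
    return [dict(zip(header, row)) for row in reader]

rows = read_rows(CSV_TEXT)
ball = rows[0]

# Because the values are typed, the computer can calculate with them directly.
total = ball["quantity"] * ball["price per unit (AUD)"]
print(ball["color"], ball["diameter (mm)"], total)  # white 43.0 2.5
```

Each value can now be looked up by its column name, so there is no guessing about which word refers to what.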

Task: Think of the last book you read. What data is connected to it and how would you make it structured data?

tulika goyal Creator

B-tech 2nd year student of polymer science.
