Structure data on demand

How should data be structured? That is a frequently asked question in many companies. The question arises because a company needs to use the data it accumulates later on: documents, spreadsheets, customer information, sales data, financial data and much more. But how should it be structured so the company gets maximum use of this data? The traditional way is to create a plan which details where things are to be stored, and then "force" users to follow the plan. But there are other, cleverer ways to do this.


The traditional need for structured data

Why do we need to store customer documents in one specific folder on the file server, or file them on the "correct" customer in the CRM system? The main reason for such rules is the need to use this information later. Plus it looks pretty. At least some think so. The idea is that a strict structure for where data lives helps in finding the information again later when it is needed. And this is a noble idea; it is hard to say that structure in general is a bad idea.

But I think doing it like this is a little too much of a "one-way-for-everyone" approach. More on that later on.


Traditional ways of structuring data

There have been many ways to store documents and other information in a structured way. The most well-known one is to have a defined folder structure on your file server, where certain types of documents go to certain folders. Project documents go to the project folder (or even better, to the sub-folder of the project folder which is named after the project number), financial spreadsheets go to the Finance folder, and so on. Someone has spent a lot of time figuring out the best way to store things, and a folder structure is the result.

In CRM systems, documents about a project need to be attached to the correct customer. Otherwise there'll be chaos. A misplaced document is disastrous and is lost forever. There is in most cases a search function, but can it search the content of a Word file? Not every search function can do that, unfortunately.

Overall, for every system there's a set of rules for how to do things. That has been the traditional way of doing things.


Goodbye to old-fashioned structure, and welcome to the on-demand structure

I have often met the "structure police" in companies where I have worked. They are normally self-appointed guardians of the structure rules. They would confront me with "you put this document in the wrong place!". Since I didn't know this other place existed, I said just that. But then I was told to "pay attention" and similar things. Such red tape is counterproductive and fosters bureaucracy. I also tend to get annoyed with such rules. Some people like bureaucracy, but I am certainly not one of them. So over the years I have stored documents and files where I thought they belonged, and that has earned me some nasty feedback from the "structure police". But I believe in search technology over manually looking through dozens of folders, so I have been able to locate the files I created later on. I've saved time as well :-)

What is the solution here? As the amount of information keeps growing, the task of manually looking for information becomes daunting. You really just scratch the surface of what's available. Imagine that Google and the other web search engines did not exist. How would you find the article you were looking for on the web? By looking manually? No way. It is practically impossible to do it this way. But still, this is the way things are done in many companies: people look for things manually, based on a structure which details where information is to be stored. But what if someone (like me) stored things in a different place? Then it is lost forever, never to be found again. The pre-planned structure breaks down in the face of massive amounts of information, and also when someone doesn't follow the structure rules.

What is an on-demand structure of data? Imagine you're looking for all documents written by one specific person about the construction quality of Samsung air conditioning units. Search for "Samsung air conditioning construction quality" and filter by the author you're looking for. You have instantly created a view into your own data which no one could have planned for in advance using an old-fashioned structured data approach. To continue the scenario, filter by document type (Word, .docx) and a date range (from 3 to 6 months ago), and you have a very narrow and targeted view of the available information. You have actually created an on-demand structure of data based on the criteria you were looking for. Now try that with a traditional pre-planned structure and you'll see that it's not possible. So on-demand structure has some benefits :-)
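To make the scenario a bit more concrete, here is a minimal sketch in Python of what "search, then filter" amounts to. It is an illustration only: the Document fields, the on_demand_view function and the sample data are all hypothetical, and a real search engine would use a prebuilt index rather than scanning every document.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    # Hypothetical metadata a search engine might keep per document.
    title: str
    author: str
    doc_type: str   # file extension, e.g. "docx"
    modified: date
    text: str


def on_demand_view(docs, query, author=None, doc_type=None, start=None, end=None):
    """Return documents that contain every query term and pass all given filters."""
    terms = query.lower().split()
    hits = []
    for doc in docs:
        haystack = f"{doc.title} {doc.text}".lower()
        if not all(term in haystack for term in terms):
            continue
        if author is not None and doc.author != author:
            continue
        if doc_type is not None and doc.doc_type != doc_type:
            continue
        if start is not None and doc.modified < start:
            continue
        if end is not None and doc.modified > end:
            continue
        hits.append(doc)
    return hits


# Made-up sample data, stored "anywhere"; folder location plays no role.
DOCS = [
    Document("AC unit review", "Anna", "docx", date(2024, 2, 10),
             "Construction quality of Samsung air conditioning units"),
    Document("AC unit notes", "Bob", "docx", date(2024, 2, 12),
             "Samsung air conditioning construction quality, second opinion"),
    Document("Old AC review", "Anna", "docx", date(2023, 1, 5),
             "Samsung air conditioning construction quality in older models"),
    Document("Budget", "Anna", "xlsx", date(2024, 2, 11),
             "Quarterly finance numbers"),
]

view = on_demand_view(
    DOCS,
    "samsung construction quality",
    author="Anna",
    doc_type="docx",
    start=date(2024, 1, 1),
    end=date(2024, 3, 31),
)
print([doc.title for doc in view])  # ['AC unit review']
```

Each filter narrows the view a little more, and none of them depends on where the file was saved.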


The benefits of on-demand structure

The most immediate benefits of an on-demand structure of data are speed, completeness and accuracy. Plus the ability to generate reports and statistics.

Since an on-demand structure is based on search technology sitting on top of your information, the time it takes to query large amounts of data can be measured in seconds. For a manual approach we would measure it in minutes or hours, so an on-demand structure is much more suitable for re-finding information. Which is natural for anything built on search technology I guess, since search technology was created for this purpose in the first place.

If you have a search engine connected to all of your data silos, you can be sure that all information is available through this search engine. So there's a notion of information completeness which is hard to get when looking for information manually. When you look on the file server, you only look on the file server. But in the scenario described here, with a search engine connected to all of your data silos, you can look in all data silos simultaneously, not one by one. So information completeness is clearly a benefit of the on-demand structure of data.
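The idea of one query hitting every silo can be sketched roughly like this. The silo names, document names and the search_all_silos function are made up for illustration, and a real search engine would index the silos in advance instead of reading them at query time.

```python
def search_all_silos(silos, query):
    """Run one query against every silo and merge the hits, tagged with their source."""
    terms = query.lower().split()
    results = []
    for silo_name, documents in silos.items():
        for doc_name, text in documents.items():
            if all(term in text.lower() for term in terms):
                results.append((silo_name, doc_name))
    return results


# Hypothetical silos: each maps a document name to its text content.
SILOS = {
    "file_server": {
        "offer.docx": "Offer for the Acme project",
        "notes.txt": "Lunch menu",
    },
    "crm": {"acme-contact": "Acme project kickoff meeting notes"},
    "mail": {"msg-17": "Re: invoice for Acme project"},
}

hits = search_all_silos(SILOS, "acme project")
for silo_name, doc_name in hits:
    print(f"{silo_name}: {doc_name}")
```

The caller never has to know which silo a document lives in; the source is simply returned as part of each hit.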

Accuracy is another benefit. Getting a machine to do the job removes a major source of human error. Plus the search engine looks inside each file at the content, which takes forever and is notoriously inaccurate when done manually by a human.

Another benefit is a side-effect of having all your information in a search engine. Search engines are great for filtering, sorting and getting aggregated information out. So generating reports and statistics from data gathered across multiple data silos is now fully possible. In addition, this information is up to date and almost real-time, which is a great time-saver compared with similar reports created manually.
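As a rough sketch of what "getting aggregated information out" can look like, the snippet below counts documents per field value, much like the facet counts many search engines offer. The records and the facet_counts helper are invented for the example.

```python
from collections import Counter

# Hypothetical index records gathered from several silos.
RECORDS = [
    {"silo": "file_server", "author": "Anna", "doc_type": "docx"},
    {"silo": "file_server", "author": "Bob", "doc_type": "xlsx"},
    {"silo": "crm", "author": "Anna", "doc_type": "docx"},
    {"silo": "mail", "author": "Anna", "doc_type": "msg"},
]


def facet_counts(records, field):
    """Count how many records share each value of the given field."""
    return Counter(record[field] for record in records)


by_author = facet_counts(RECORDS, "author")
by_silo = facet_counts(RECORDS, "silo")

print(by_author.most_common())
print(by_silo.most_common())
```

Because the counts are computed from whatever is in the index right now, the report is as fresh as the index itself.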



On-demand structure beats pre-planned structure hands down. It is way more flexible, it uses readily available technology, and it can make our lives much easier and our information more up to date. Let's spend our time on things that matter, and let search applications do the hard work when it comes to making information accessible (wherever it is stored :-)