A pretty cool display of topic models, clustered and identified within a big batch of well reports, so we can analyze... your analysis!
The input data consisted of 4,542 pages (6.5 GB) of well reports. We treat these geoscientific documents as a bag of words and try to understand their contents, relationships, and similarities. Interestingly, the geology pages cluster together in one section below, and the summary pages do the same. One can also see which reports are highly standardized, most probably because they were written by the same scientist or at least the same service contractor.
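To give a feel for why pages of the same kind end up next to each other, here is a minimal sketch of bag-of-words similarity in plain Python. It is not our actual pipeline: the four page texts and their names are made up for illustration, and a real topic model (LDA, for example) would replace the raw word counts used here.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase alphabetic tokens, ignoring word order."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy pages, invented for this example only.
pages = {
    "geology_1": "sandstone and shale interbedded with limestone, porosity fair",
    "geology_2": "shale with minor sandstone and limestone stringers, porosity poor",
    "summary_1": "well drilled to total depth, plugged and abandoned as dry hole",
    "summary_2": "well reached total depth and was plugged and abandoned",
}

bags = {name: bag_of_words(text) for name, text in pages.items()}


def most_similar(name):
    """Return the other page whose bag of words is closest to `name`."""
    others = (other for other in bags if other != name)
    return max(others, key=lambda other: cosine_similarity(bags[name], bags[other]))


# Map every page to its nearest neighbour.
nearest = {name: most_similar(name) for name in bags}

for name, neighbour in nearest.items():
    print(f"{name} is closest to {neighbour}")
```

On this toy data, each geology page pairs with the other geology page and each summary page with the other summary page, which is the same effect, in miniature, as the clusters in the display.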
Stay tuned to see how we will integrate this technique into Iraya's Elastic Docs!
And you, how do you want to use it?