Thanks to Mary Guthmiller taking the lead on the topic, this month’s ThinkTank will be looking at Big Data and Data Visualization.
Quickly defined, Big Data is “…the massive amounts of data that collect over time that are difficult to analyze and handle using common database management tools. Big Data includes business transactions, photos, surveillance videos and activity logs (machine-generated data). Scientific data from sensors can reach mammoth proportions over time, and Big Data also includes unstructured text posted on the Web, such as blogs and social media.” (source: PC Magazine Encyclopedia, http://www.pcmag.com/encyclopedia_term/0,2542,t=Big+Data&i=62849,00.asp, accessed 09/19/2012.)
With the U.S. Government’s announcement in March 2012 of the “Big Data Research and Development Initiative,” complete with $200 million in R&D investments, research monies and grants for analyzing and preserving big data will become even more available in the coming years (http://www.whitehouse.gov/blog/2012/03/29/big-data-big-deal). One of the roles of research universities will be to build simple tools that help make sense of these growing datasets. And it’s not just about building tools: coming up with information design and visualizations that allow others to see new relationships hidden inside big data will become even more important.
Please have a look at the following for tomorrow. (Note: we’ll be watching the McCandless video on data visualization of large data sets during ThinkTank due to short notice on this message…)
See you all soon,
Introduction to Big Data and the AMPLab
Michael Franklin, Director of AMPLab, UC Berkeley EECS Faculty
David McCandless: The beauty of data visualization
Filmed July 2010, posted Aug 2010, TEDGlobal 2010
Big grant for Big Data: NSF awards $10 million to harness vast quantities of data
By Sarah Yang, Media Relations | March 29, 2012
UC Berkeley News Center
Berkeley Group Digs In to Challenge of Making Sense of All That Data
By Jeanne Carstensen
Published: April 7, 2012
New York Times
Questions for Discussion:
- What can big data tell us about where we are in the present and where we might go in the future?
- What are the big data sources in the university setting? What about the library setting?
- Do universities and libraries have a role in archiving and preserving big data? Why or why not?
Active Initiatives at Universities:
- AMPLab at UC Berkeley
- Research Computing Group at Montana State University
Notes from the discussion:
Records management opportunity – Terry Sutherland
- paper analogy to the big data digital problem
“Data is the new soil” – McCandless
“Data is the new oil”
Library Big Data Points
- Library web site analytics
- Books ordered/processed
- Usage stats of equipment
- Electronic records management
- Physical spaces – Gate counts
- Circulation and ILL data
- Records management
- Student records
- Data warehouse of all foundation and donor data
- email records
- storage of data sets – are we just digital hoarders?
- preservation of digital bits
- analysis of the data for business analytics
- ability to intelligently query and find those digital bits