Showing posts with label Big Data. Show all posts

Tuesday, April 9, 2013

Getting libraries out of the horse-and-buggy days

In 1967, Dr. Vannevar Bush, who envisioned a computer capable of massive data retrieval in device he called a memex published Science Is Not Enough, which included a chapter entitled "Memex Revisited," which considers the question of data compression and retrieval. You can read the entire chapter here: http://www.bekkahwalker.net/comt111a/reading_pdf/memex-revisited.pdf

On p. 88, he brings a critical observation about priorities:

The great digital machines of today have their exciting proliferation because they could vitally aid business, because they could increase profits. The libraries still operate by horse-and-buggy methods, for there is no profit in libraries. Governments spend billions on space since it has glamour and hence public appeal. There is no glamour about libraries, and the pubic do not understand that the welfare of their children depends far more upon effective libraries than it does on the collection of a bucket of talcum powder from the moon. So it will not be done soon. But eventually it will.

Now, 46 years, later, the public is understanding the importance of libraries, and the power of the internet to gather all the world's digitized information at put them at one's fingertips.On April 18 and 19, the Digital Public Library of America (DPLA) will celebrate its launch at the Boston Public Library. In keeping with the ideals underlying the project, there is no charge to attend, though the registration forms indicate the event has filled up.

The dream of the DPLA was to harness the power of the Internet to break through the silos that isolate vast quantities of data collection at various universities, museums, and libraries. It began to take shape in late 2010 when representatives of various institutions met at the Radcliffe Institute in Cambridge, Mass., and resolved to take the necessary steps to bring together that data through cooperative content sharing. In bridging the public-private divide, DPLA has had to overcome the challenge of managing metadata variations and staying on the right side of copyright law.
Read more in Metadata Key for Digital Public Library of America

Tuesday, April 2, 2013

Humans, big data, and cheesecake

"For all their brilliance, computers can be thick as a brick," observed Tom M. Mitchell, a computer scientist. Read more about this in There’s Still Room for Humans

Another post on big data looks at how a restaurant chain uses the technology in

Thursday, March 7, 2013

EHR in NYC

Though the Big Apple has had a bad time with storms and power outages, last month, it got to report some good news. At the “NYC Celebrates Improved Health Through Technology” event on February 7th, New York City’s Mayor Michael Bloomberg announced positive results for the adoption of electronic health records. Read more about it in

Tuesday, February 26, 2013

Trailing the Tiger Trade

You can learn how stealth cameras work in the tiger exhibit at the Bronx Zoo. In at least one case, the picture recorded by the stealth cam sealed the conviction for tiger poachers in Thailand last year. Just like fingerprints, the patterns of tiger stripes are individually distinctive. Consequently, the poacher’s picture proved that they did break the law. Thanks to the laws and enforcement that EIA works for, one was sentenced to four years and another to five, the longest such sentence yet for this sort of crime. Read more about the efforts to save big cats with big data in

The tiger from the cell phone images was identified as the same tiger captured by a camera trap image by WCS the year before, adding to the evidence against the poachers.

A cell p

Wednesday, February 13, 2013

The Big Bow-Wow & a Bit of Ivory

Sir Walter Scott contrasted his style of writing with that of Jane Austen: "The big Bow-Wow strain I can do myself like any now going; but the exquisite touch which renders ordinary commonplace things and characters interesting from the truth of the description and the sentiment is denied to me". While he characterized his work as large, Jane Austen called her own small, a "little bit (two inches wide) of ivory on which I work with so fine a brush."

The two are married together, so to speak, by Mathew Jockers, who declares them the literary equivalent of Homo erectus, or, if you prefer, Adam and Eve,"

Read more about the humanities going Google, as one article put it in my Big Data Republic blog, The Big Bow-Wow & a Bit of Ivory

DNA data storage

The latest developments in data storage turn to biology. DNA sequencing, not only allows for big data to be stored for thousands of years, but allows for a more compact encoding system based on the 4 letters in the DNA bases rather than than binary zeros and ones that have been used until now. Still, the cost makes it prohibitive for now, and the fact that the sequencing make the data impossible to update within DNA make it not quite the same as a flash drive. Read about it in Big Data: Tiny Storage

Friday, February 1, 2013

Post-Sandy Street Views

One thing about big data: it is not static. As there is always changes in a situation, what reflected reality one day can be out of date the next. That is especially true when a hurricane of the likes of Sandy sweeps through and alters the landscape and the structures built on it.

But not everyone is pleased about updates that include images of their hurricane-ravaged homes. Read more at New Maps After Sandy

Thursday, January 24, 2013

That's big data entertainment!

Snoopy was prophetic. 3-D movies have made a comeback. Consequently, movies today pack a lot more data per frame. But big data is also involved in the trend toward data streaming that is displacing discs and in the data on what people want to watch.

Big data drives today’s movie industry, both in terms of the amount of data packed into each frame you see at theaters and in terms of video streaming online. It’s what delivers 3-D effects in the theater and personalized recommendations to Netflix viewers. And very big numbers ride on both.

In the past few years, 3-D movies have staged a comeback on a scale much greater than their brief heyday in the 1950s. Adding in the 3-D effect adds "anywhere from 100% to 200% more data per frame," according to Jeff Denworth, vice president of marketing for DataDirect Networks (DDN). Denworth attributes the proliferation of present-day 3-D films to the huge success James Cameron had with the 3-D film "Avatar" in 2009, which packed a petabyte of data.

"Avatar" cost about $237 million to produce, but it brought in more than ten times that amount. It earned the distinction of IMDB identifies it as "the highest-grossing film of all time." By the beginning of 2010, it had taken in $2,779,404,183. A rash of 3-D films followed this success, and many did very well. According to iSuppli Market Intelligence (owned by IHS) in 2011 3-D films brought in $7 billion at the box-office, 16 percent more than the previous year.

The full figures for 2012 are not yet in, though they will likely be higher as the number of 3-D screens have gone up from about 9,000 in "2009 to 43,000 in by the third quarter" of 2012. One of the biggest draws of the year, Marvel’s 3D superhero flick, "The Avengers," grossed $1,511,757,910 in 2012. As 3-D has grown so common at the theater, movie-makers have to point to something else to distinguish their offering.

"The Hobbit: An Unexpected Journey" had to do 3-D one better with its "brand new format High Frame Rate 3D (HFR 3D)." Instead of the 24 frames per second, which is the movie standard, it packs in 48. The advantage to the viewer, it claims, is that the greater number offers an experience "closer to what the human eye actually sees." Perhaps so, but quite a number of viewers were less than thrilled by the effect. Nevertheless, by December 29, 2012, "The Hobbit" had already taken in $600,508,000, according to IMDB figures.

Big data is also the key to watching movies on the small screen. Instead of picking up a disc when they buy or rent a movie, people now can just have it come right to them. As Dan Cryan, senior principal analyst at HIS observed, in 2012 Americans made "a historic switch to Internet-based consumption, setting the stage for a worldwide migration from physical to online."

Estimates of online movie payments for the US in 2012 are “3.4 billion views or transactions, up from 1.4 billion in 2011. This form of video streaming is dominated by Netflix in the US, where it makes up "33% of peak period downstream traffic" Amazon, Hulu, and HBO Go follow far behind at 1.8%, 1.5%, and .5% respectively. It intends to keep its lead with the help of big data.

Netflix was the subject of a WSJ blog on using big data to improve streaming video. Though Netflix still offers to mail out the DVDs people select for rental, more customers now opt for streaming. In the interest of improving efficiency on that end, Netflix transferred its holdings to Amazon’s cloud. It also started using Hadoop, which enables it "to run massive data analyses, such as graphing traffic patterns for every type of device across multiple markets." That helps plan for improved data transmission and better understanding of the customer.

In addition to using big data solutions for delivery of content, Netflix applies algorithms to predict what their customers would likely want to watch next. This type of data mining technology makes Netflix confident that it can handle hosting original content. In fact, it bet more than $100 million on it; that’s the reported sum paid for the rights to two seasons of House of Cards, one of several original content series it plans on streaming.

As Netflix’s Chief Communications Officer, Jonathan Friedland, says, "We know what people watch on Netflix and we’re able with a high degree of confidence to understand how big a likely audience is for a given show based on people’s viewing habits."

So what do you think? Is it possible to guarantee a hit with big data?

Thursday, January 17, 2013

Seeing stones for military, rescue, and security operations

What do JRR Tolkien, JP Morgan Chase, the military, and rescue workers have in common? Palantir.

"The Palantír" is the title of the 11th chapter of Tolkien’s The Two Towers. The name refers to the "seeing stones" that allow one to view what is happening elsewhere. In 2004, the name was also taken on by a company that develops software organization to extract meaning from various streams of data to combat terrorism, fraud, and disaster damage.

Palantir distinguishes its approach from data mining by calling it "data surfacing." Read more at

For more on big data used by the army, see

"You can't have a data Tower of Babel" in which each system keeps its data isolated from other systems, Patrick Dreher, a senior technical director at DRC, told Military Information Technology.His company worked with the US Army on the Rainmaker cloud-based intelligence system, which integrates different data models used by the intelligence community. "For example, when Afghan drug lords finance Taliban insurgents, data from one database can be combined with Taliban financing data from an Army database inside the cloud, allowing analysts to make timely, critical connections and stay one step ahead of insurgents."

Thursday, January 10, 2013

Big Data on the Final Frontier

Missions in space may come and go, but the National Aeronautics and Space Administration has always stuck to a mission of bringing in data.

(

One of its early achievements in this field was sending a spacecraft close enough to Venus to get accurate readings of its surface and atmosphere. On Dec. 14, 1962, the Mariner 2 spacecraft got within 34,762km (21,600 miles) of the planet. Over a 42-minute period, it was able to pick up many points of data that proved Venus, which had been thought of as Earth's twin, would be uninhabitable, with a surface temperature of 425°C (797°F) and a toxic atmosphere.

This picture (from NASA's site) of the data gathered in that mission is cropped. The paper showing the data that was gathered is actually much longer, as this uncropped version shows.

Back then, the data covered a roll of paper, but the data NASA handles today takes supercomputing power to process. As Nick Skytland wrote in NASA blog post in October:

In the time it took you to read this sentence, NASA gathered approximately 1.73 gigabytes of data from our nearly 100 currently active missions! We do this every hour, every day, every year -- and the collection rate is growing exponentially...
In our current missions, data is transferred with radio frequency, which is relatively slow. In the future, NASA will employ technology such as optical (laser) communication to increase the download and mean a 1000x increase in the volume of data. This is much more then we can handle today and this is what we are starting to prepare for now. We are planning missions today that will easily stream more [than] 24TB's a day. That's roughly 2.4 times the entire Library of Congress -- EVERY DAY. For one mission.

Thursday, December 20, 2012

Inhalers that do more than dispense medication

Louisville, one of IBM's 100 selected selected cities is putting big data to work to track asthma triggers with Asthmapolis. Read about it in

Friday, December 14, 2012

Big Data Health Hazards

If anything can go wrong, it will." Murphy's Law (or Sod's Law, as it is known in the UK) applies to big data projects, as well. When those projects concern someone's health, something going wrong in the data can lead to something going very wrong with the patient.

The more one relies on the accuracy of the system, the higher the potential for error. Electronic health records (EHRs) are considered a boon to data aggregation, but they hold a potential downside. Read more here

Wednesday, December 5, 2012

Big Data and Cancer Research

That's the subject of my post Big Data Battles Cancer. For a completely different take on cancer, see Beyond Pink.

Tuesday, November 27, 2012

Big Data Applied to Health

I've written several pieces on the topic from various angles:

On how cell phone data is used to map the spread of Malaria in order to come up with effective prevention in Africa Analyzing Cellphone Data for the Greater Good

On Retrofit's approach: Data Gets Personal to Fight Obesity

On UPMC $10 million dollar big data plan: Creating Custom-Fit Healthcare

Monday, November 12, 2012

Dial a data scientist

Saturday, November 10, 2012

Big data for voters

Thursday, November 1, 2012

Thoughts on the Human Face of Big Data

See my blog post on the brand new board, Big Data Republic, Predicting People's Participation in a Big-Data Project

Tuesday, September 18, 2012

IT and business don't always agree on big data

Not all decision makers within an organization are on the same page with respect to big data plans. The disparity is due to the different perspectives of the business and IT end of the organizations.

Read more: IT, business have different views on data - FierceBigData

Thursday, September 13, 2012

Analysis in light of the Pareto Principle

Many businesses who are not getting as much utility out of big data as they would like identify the source of the problem as their inadequate hardware, and inadequate finances. However, in a Smart Data Collective post, Paige Roberts argues that it's not the hardware, but the software that's to blame.

"Investing in better utilization of existing hardware is a far better, more sustainable, and cost-effective solution" for businesses who find their current setups inadequate. Roberts points to the inefficiency built into current "utilization rates of hardware [that] are around 15 percent worldwide." Even the most efficient data centers max out at only 20 percent, meaning that 80 percent is untapped.

Do those numbers ring a bell?

Tuesday, September 11, 2012

What do Long Island and Arlington, Texas have in common?

The answer is science. This topic was of particular interest to me because I've visited Brookhaven, one of the institutions involved in the partnership, multiple times. It's on Long Island, which, surprising, as that may be, actually has quite a history in connection with science as engineering, including the space program.

Their goal is to extend the PanDA system for more general applications. Brookhaven and UT Arlington originally developed the workload management system to process the massive quantities of data involved in a component of the research of the Large Hadron Collider, or LHC.

Search This Blog