Over the past month I’ve been attending a series of Big Data related presentations in the Los Angeles area. I thought I’d provide a quick aerial view of observations from these presentations.

Big Data definition:

A term for data sets that are so large that traditional data processing applications are inadequate.

This definition traces back to around 2004 to 2006, the era when Google published a paper describing its MapReduce concept. People recognized the value of the work, came together in the open source community, and built Hadoop and HDFS.
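
For readers who never saw that paper, the programming model reduces to a map step that emits key/value pairs and a reduce step that aggregates the values for each key. Here is a deliberately tiny Python sketch of the idea, counting words; it only illustrates the concept and uses none of Hadoop's actual APIs (the function names are mine):

```python
from collections import defaultdict

def map_phase(documents):
    """Map step: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Reduce step: sum the counts emitted for each distinct word."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

if __name__ == "__main__":
    docs = ["big data is big", "data about data"]
    print(reduce_phase(map_phase(docs)))
    # {'big': 2, 'data': 3, 'is': 1, 'about': 1}
```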

The Big Data field is now 10 years old. When it started, the “traditional data processing applications” it proposed to supplant were SQL databases. Since the birth of Hadoop, the amount of data being generated has been doubling every 2 years.

But as we all know, as things get older and circumstances change, questions start to arise. The questions surrounding Big Data are:

  • Isn’t the Hadoop/HDFS solution now the “traditional data processing application” itself?
  • The size of data has grown by far more than an order of magnitude since the MapReduce, Hadoop, and HDFS concepts were architected. Would you do it the same way now?

In short, is it once again time for something new?

The job of an engineer is to build a cost effective solution for the problem at hand. When a requirement increases by an order of magnitude, it’s rare that a new “greenfield” solution can’t do much better than an incremental enhancement to the original. When a requirement grows by two orders of magnitude, the original approach might be ridiculous.

Let’s say you were asked to design a vehicle that could move a single person at speeds of up to 5 miles/hour. A skateboard would be a fine, cost-effective solution.

What happens when people love the skateboard and ask for a top speed of 50 miles/hour? You decide to put a motor on the skateboard. Yes, it would be far from optimal, but you could build one, and it would work. You’d never do it this way given a clean start.

What if the requests don’t stop there? What if people ask to reach 500 miles/hour? Yes, you could strap a jet engine on a skateboard; it’s theoretically possible. I wouldn’t ride it, would you? You could then truthfully say you’ve “gone to plaid.”

Well, unless you want to say that what constitutes the label “BIG” got carved in stone 10 years ago and will never change again, you have to consider that the suite of Big Data tools is ripe for change.

Over the past 10 years:

  • Hardware changes have been dramatic
    • SSD vs HDD
    • Direct memory storage interfaces vs SATA, Fibre Channel, InfiniBand
    • CPU architecture focus moved from higher GHz to higher core count
  • Data ingest rates and size of retained data sets are up orders of magnitude and growth is not slowing

Thankfully I’m not the only one to notice an opportunity for improvement here.

Below are just some of the interesting developments underway in the Big Data arena:

Many organizations have noticed that incoming data streams are still needed for traditional transactional workloads, alongside a Big Data analytics workload. In practice this can result in keeping multiple copies of data with a lot of overhead associated with data movement. The “holy grail” is to come up with a means to efficiently support random queries and analytics simultaneously, from a single copy of the data.

At the beginning, practitioners of Big Data analytics criticized the popular shared storage architectures of the era, such as SAN and NAS. They favored DAS (direct-attached storage) underneath a distributed file system (HDFS). HBase was used to supplement performance for applications needing random access.

New developments on the SSD and storage interface fronts may present opportunities to benefit from a new wave of re-architecture.

Amr Awadallah, CTO and co-founder of Cloudera, gave a convincing presentation on Kudu, a project that takes advantage of modern hardware to supplement HDFS and HBase by offering something that bridges the performance chasm between them. Based on measures of latency and throughput, Kudu achieves this goal.

John Leach, CTO and co-founder of Splice Machine, described a solution that addresses transactional and analytic workloads using a single copy of data. It supports transactional workloads by placing a SQL query engine as a layer on top of an HBase/HDFS stack, with Spark alongside for analytics. The query engine is based on Apache Derby.

Arun Murthy, co-founder of Hortonworks, described achieving high-performance SQL using Apache Hive running on YARN (a scheduler for big data processing). Other advances include moving HDFS from a single storage class into a tiered (memory + SSD + HDD) service, and YARN enhancements to support dynamic execution with data locality.

Google may be causing another shake-up, reminiscent of the original MapReduce paper, with the release of Dataflow as Apache Beam.
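
For a flavor of what the Dataflow/Beam model looks like, here is a minimal word-count pipeline using the Apache Beam Python SDK. It runs locally on the default DirectRunner, and the same pipeline definition can be handed to other runners; treat it as an illustrative sketch rather than production code:

```python
import apache_beam as beam

# A tiny batch pipeline; the same transforms apply to streaming sources.
with beam.Pipeline() as pipeline:  # DirectRunner by default
    (
        pipeline
        | "Create" >> beam.Create(["big data is big", "data about data"])
        | "SplitWords" >> beam.FlatMap(lambda line: line.split())
        | "CountPerWord" >> beam.combiners.Count.PerElement()
        | "Print" >> beam.Map(print)  # emits tuples like ('data', 3)
    )
```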

It is interesting to observe that the Big Data field that disrupted traditional data processing and storage seems to be taking the wise step of disrupting itself, in a refresh 10 years later.

I recently returned from AT&T’s Developer Summit & Hackathon. This was a developer training and social event directed toward new applications enabled by device connectivity, including home automation, wearable devices, wireless telemetry, and “Smart Cities”.

This area, sometimes called the Internet of Things, is fostering collection of data related to many aspects of environment, commerce, health, and transportation. I set out to write a brief trip report, and I have to apologize in advance because this subject triggered some passions. This one “broke off the leash”.

Examples of data sources:

  • Crowd sourced temperature and humidity readings from privately operated devices with weather sensors (cars, consumer electronics).
  • Parking slot occupied/empty status.
  • Retail store refrigerator case status.
  • Bus/Train stop occupancy count.
  • Vehicle position and accelerometer readings
  • Cameras

Examples of applications:

  • Cars observing a rapid transition through the freezing point under wet conditions could trigger icy-road warnings specific to individual curves in the road (see the sketch after this list).
  • Public transportation could actively adjust capacity and routing based on live demand.
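
To make the first example concrete, the detection rule itself is simple once the crowd-sourced readings exist. The sketch below is purely hypothetical: the reading format, thresholds, and use of wipers as a wetness proxy are my own assumptions, not anything presented at these events:

```python
from dataclasses import dataclass

@dataclass
class Reading:
    """One crowd-sourced sample from a vehicle (hypothetical format)."""
    road_segment: str      # identifier for a curve or stretch of road
    temperature_c: float   # outside air temperature reported by the car
    wipers_on: bool        # crude proxy for wet conditions

FREEZING_C = 0.0

def icy_road_warning(previous: dict, reading: Reading) -> bool:
    """Return True when a segment transitions through freezing while wet."""
    last_temp = previous.get(reading.road_segment)
    previous[reading.road_segment] = reading.temperature_c
    if last_temp is None:
        return False
    crossed_freezing = last_temp > FREEZING_C >= reading.temperature_c
    return crossed_freezing and reading.wipers_on

# Example: the second reading for "curve-17" crosses 0 °C while the wipers run.
history: dict = {}
for r in [Reading("curve-17", 1.5, True), Reading("curve-17", -0.5, True)]:
    if icy_road_warning(history, r):
        print(f"Icy road warning for {r.road_segment}")
```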

AT&T cited an ABI Research forecast of 40.9 billion connected devices by 2020. Being a skeptic of claims made during marketing events, I looked for a second source and found this forecast from the Gartner analyst organization:

Internet of Things Units, Installed Base by Category (millions)

Category                        2014    2015    2016    2020
Grand Total                    3,807   4,903   6,392  20,797
Consumer                       2,277   3,023   4,024  13,509
Business: Cross-Industry         632     815   1,092   4,408
Business: Vertical-Specific      898   1,065   1,276   2,880

Source: Gartner November 2015

Gartner’s forecast is smaller than ABI’s, but still, the predicted device count is a multiple of the world’s population. If the forecasts are even remotely accurate, this will have huge impacts, not just on the IT industry, but on non-IT industries.

Think back on the impact of the internet and cell phone connectivity over the past two decades. Huge companies have been forged from nothing more than people and ideas. Others have been swept from market leader to extinction. It is obvious that connectivity (IOT) will fundamentally change the way “things” are built.

Observations:

  1. IOT is going to generate way more data than we have today. Even a tiny device can generate a huge amount of data. And we will have lots of devices.
  2. Tiny devices are not going to hold data for long. The devices will forward data to storage in internet-connected remote data centers. This will boost demand for data center storage.
  3. Data in the cloud will beget an environment conducive to data analysis in the cloud. This will boost demand for data center compute capacity.
  4. User interaction requirements will drive demand for applications that provide user interface for configuration and display. These applications will run on connected devices, or on general purpose devices like smartphones. It’s likely that the same cloud hosting the data and analysis, will be suitable for hosting the mobile device applications that use the data.
  5. Some control loops, such as those demanding low latency and extremely high security and availability, should run at the local site. However, there are plenty of applications suited to remote cloud execution. Examples: furnace/AC filter life monitoring; vehicle brake pad wear prediction based on accelerometer measurement of driving patterns. There will be both local and cloud hosted apps.
  6. There will be a Metcalfe’s Law effect with cloud hosted IOT data. cantbewong’s adjunct to Metcalfe’s Law: cloud value is proportional to the square of the number of IOT data connections. When you live in the most popular cloud, you will have lower-latency access to the universe of other data you interact with, without adding an additional point of failure. Hosting for IOT data has aspects of a “winner takes all” poker game (a toy illustration follows this list).

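To make item 6 concrete, here is a toy back-of-the-envelope calculation of my own showing why, under a square-law value model, consolidating IOT data connections in one cloud beats splitting them:

```python
def cloud_value(connections: int) -> int:
    """cantbewong's adjunct: value grows with the square of IOT data connections."""
    return connections ** 2

# 1,000 data connections hosted in one cloud vs. split evenly across two clouds.
print(cloud_value(1000))                      # 1,000,000
print(cloud_value(500) + cloud_value(500))    # 500,000 -- half the aggregate value
```
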
Ultimately, IOT will drive demand for carrier bandwidth and for the cloud hosting business. Cloud hosting will include data storage, data analysis, and application hosting and delivery.

Existing companies that ignore an IOT strategy do so at peril of their lives. The basis of this argument: look at what internet version 1 did to the retail, music, video, and IT industries.

I came away from this conference with the impression that AT&T is in full “land grab” mode for this opportunity. What actually surprises me is how few other technology companies seem to be making strategic commitments in this sector.

Technology choices are likely to be “sticky”

The technology choices include connectivity (WiFi vs cellular) and cloud hosts (APIs, analytic services, application delivery services). Embedded devices are not updated frequently, nor in an easy, risk-free fashion.

WiFi and cellular data networks are currently the leading candidates for device connectivity. Resiliency, power consumption, and widespread geographic availability tend to make cellular attractive for battery-operated and roaming mobile devices.

Demonstrations

During the event keynote, AT&T brought these partners to the stage:

  • Red Bull demonstrated the use of live vehicle telemetry feeds by the Formula 1 team they sponsor. Red Bull also demonstrated connected refrigerators that they provide to retailers of their energy drinks. There will be 200,000 of these deployed in the US, reporting GPS location, temperature, and door open/close cycles (which correlate predictably with the sale of a beverage). Through these refrigerators, they expect to be able to monitor and optimize sales and inventory.
  • Ford announced an exclusive multi-year agreement that will result in connectivity of all new Ford vehicles sold in the US and Canada by 2020.

On the expo floor, I saw an intriguing demo of an augmented reality enhancement to a mountain bike. Force, strain, and angle sensors were installed on the bike. Sensor readings could be superimposed on live video of the bike while in operation. This is useful as a design tool for bike engineers, showing actual dynamic loads under real use conditions.

Augmented reality bike

augmented reality bike prototype

I’m not sure how big the market is, but this is an example of IOT inspiring human minds to consider things that couldn’t be done before. Eventually, some of these ideas are going to change the world. link: Santa Cruz bike video

Hackathon

The first 2 days of the conference were devoted to a hands-on Hackathon competition. I expected something like the Hackathon events that have been held at other conferences I have attended, such as LinuxCon. These generally involve encouraging participants to develop code for an open source project and awarding a Raspberry Pi or Arduino to the winner.

The AT&T event’s Hackathon was in a different league. Over $100,000 in cash was awarded, along with hundreds of thousands of dollars’ worth of additional hardware. Many vendors of embedded system hardware and home automation were present, not just AT&T. Sponsors included Samsung SmartThings and Amazon Alexa.

I quickly realized I would be at a disadvantage, arriving as a solo contestant – many arrived in organized teams, with advance preparation. The Hackathon event attendance was bigger than some whole tech conferences. I’m guessing over 700 people. When I registered, I wasn’t paying attention to the Hackathon details, and enrolled for the educational value. I did not come away disappointed and I had a great time.

I decided to build an attic fan controller that operates based on interior temperature/humidity and exterior conditions, including weather forecasts obtained via REST API over wireless connection. My device logged all operational data to the AT&T cloud using their M2X API. This data was viewable from a smartphone.
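
The control loop itself was simple. The Python sketch below captures its shape; the forecast endpoint, device ID, thresholds, and the M2X URL and header shown here are placeholders for illustration, not the exact code or credentials from my entry:

```python
import time
import requests

WEATHER_URL = "https://api.example.com/forecast?zip=90245"  # placeholder forecast API
M2X_URL = "https://api-m2x.att.com/v2/devices/DEVICE_ID/streams/fan_state/values"  # illustrative path
M2X_KEY = "YOUR-M2X-API-KEY"

def read_attic_sensor() -> tuple[float, float]:
    """Return (temperature_c, humidity_pct) from the local sensor (stubbed here)."""
    return 38.0, 20.0

def fan_should_run(attic_temp: float, attic_humidity: float, outside_temp: float) -> bool:
    """Run the fan only when the attic is hot and outside air would actually cool it."""
    return attic_temp > 32.0 and outside_temp < attic_temp - 3.0 and attic_humidity < 80.0

while True:
    outside_temp = requests.get(WEATHER_URL, timeout=10).json()["temperature_c"]
    attic_temp, attic_humidity = read_attic_sensor()
    fan_on = fan_should_run(attic_temp, attic_humidity, outside_temp)
    # Log the decision to the cloud so a smartphone app can chart it later.
    requests.post(
        M2X_URL,
        headers={"X-M2X-KEY": M2X_KEY},
        json={"values": [{"value": int(fan_on)}]},
        timeout=10,
    )
    time.sleep(300)  # re-evaluate every 5 minutes
```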

my hackathon creation

my Hackathon entry – attic fan controller using local weather forecast via API, with performance logging to cloud

In assembling my prototype I felt like a kid at the free candy store. Vendors contributed hardware, sensors, and live support for the asking. The Mbed (ARM) booth was particularly helpful. In the end, I got to keep all the hardware. I think the DIRECTV developer boxes, and Digital Life security systems were the only exceptions to the take-it-home offer.

I ended up learning so much in the Hackathon experience, that the API sessions during the main conference seemed repetitive. My summary review for the Hackathon is: If you are a hands-on hardware + software developer, and they hold another one, you should go.

Disruptive hardware

I got to try out the Oculus-powered Gear VR at the conference. 3D headset technology gets a lot of press, but I think that some of the home automation hardware might be even more disruptive in terms of enabling applications and displacing incumbent vendors.

Early-generation home automation hardware was optimized for cost. As home automation moves into applications such as door lock control, fire detection, and intrusion detection, the devices are getting better.

UL compliance in this arena requires battery backup, security (including anti-jamming mitigation), and supervision. This will start out “good enough” for the consumer market, with very low prices compared to what is currently used in industrial markets such as SCADA. This has the earmarks of a classic Clayton Christensen-style market disruption.

I used to work in the industrial control business, and I can tell you the industry has been clinging to a combination of ancient legacy data communication technologies and dubiously implemented security. You often hear users claim that “my network is safe because it is isolated from the Internet”.

But if your user interface stations are based on Windows and other software, keeping them security-patched across an air gap is problematic. We live in a real world where your own employees are subject to foibles, and even terrorist recruitment. An internet isolation breach is just a routing accident or a tethered cell phone away.

How well is this working now? Not very well, according to the US government CERT team. It might be time to end the failing attempt to air gap SCADA systems, and put in place technology that keeps them safe even while a path to the Internet exists. I think it’s likely that the R&D investments and economies of scale of IOT hardware and firmware components will displace moribund industrial sensor and control technologies.

Disruptive software

Big Data is sometimes described as a data set so large that traditional data processing tools are inadequate.

If you think your current data is big, wait till you see what IOT is going to unleash – I am going to call it Big Data².

If Big Data is a fire hose, Big Data² is a dam overflowing the spillways – this is going to drive the creation of new technologies.

Corps tests Hartwell Dam spillway gates

Source: Photo by Doug Young, Lake Hartwell Association

Disruptions to Society

Growing up with internet entertainment and social networks has shaped the defining characteristics of the millennial generation. The internet usurped the role of video, music, and news delivery. The internet and cell phones usurped the role of written letters and landline calls. Internet hosting of written documents fostered demand for search, resulting in the formation of Google.

When you look at the Internet of Things, it is poised to collect and host data related to the environment, health, transportation, and public and private infrastructure. But it will go beyond simple data collection. It will also host analysis of, and reactions to, this data. In short, it is poised to become the nervous system for society.

As an IOT grid drives control loops for commerce, transportation, and public safety, it will become a target for mischief, criminals, and terrorists. It will be important that the system is highly available and protected from tampering.

There is great opportunity to improve the quality of life for citizenry, but it should not be squandered by careless house-of-cards solutions. High availability, security, resistance to denial of service attacks, and protection against data forgery and tampering are all important.

Miscreants are not the only potential threat to open flow of information. Look at this “Smart City” website for the city of Los Angeles. You are greeted with this introduction:

A Message from Mayor Eric Garcetti

We are sharing city data with the public to increase transparency, accountability and customer service and to empower companies, individuals and non-profit organizations with the ability to harness a vast array of useful information to improve life in our city. I hope that this data will help drive innovation and problem solving within the public and private sectors and that Angelenos will use it to more deeply understand and engage with their city. I encourage you to explore data.lacity.org to conduct research, develop apps or simply to poke around.


This has the potential for real good, but it also has the potential to become politically charged. Much of the data related to government services could be interpreted as a “report card” for elected officials. There will be temptations to manipulate or restrict data.

There are already examples of government attacks on independently gathered data that predate IOT. The Fukushima nuclear disaster and the Flint, Michigan water contamination incident show that independent data sources can have great value.

When a Flint doctor submitted data showing high blood lead levels in toddlers, “the state publicly denounced her work, saying she was causing near hysteria”.

When Sean Bonner, entrepreneur and co-founder of the Los Angeles hackerspace Crash Space, found that no meaningful radiation measurements were being made available by the Japanese government, he started a revolutionary movement to commission an army of volunteers to construct instrumentation and crowd-source data collection. The SAFECAST organization was born.

The internet may have deprecated newspapers and even the printing press, but the open flow of data needs to be viewed as the modern form of Freedom of the Press.

There is much professionally organized PR about the wondrous benefits of IOT: enabling self-driving cars, lowering traffic congestion, and providing transportation for the disabled. Lurking skepticism leads me to suspect that taxation and revenue generation will be what really drives the IOT discussions in the smoke-filled rooms of government planners. You can already observe that some of the same cities that criticize Uber for surge pricing are eager to deploy parking meters that adjust for time-of-day supply/demand imbalances.

Just as an example of an unforeseen outcome, I can envision high parking prices inducing owners of self-driving cars to command their vehicles to circle the neighborhood, empty, rather than pay to park. Outcome: self-driving cars lead to more, not less, traffic congestion. Likely second move on the chessboard: track vehicle usage and charge by the mile, with surcharges in congested areas (“surge” pricing?). Whenever “Smart Cities” involve revenue generation, cat-and-mouse arms races will be inspired. Some of these battles will become political “hot potatoes”.

The open data manifesto

IOT is going to disrupt the world. It’s not a question of if, but how much, and how soon.

Government agencies, and entrenched interests will occasionally feel threatened by IOT.

Open source software has benefited the world at large, but at times pressured commercial interests. Open data will do the same.

There is a great opportunity at stake. It will be important to establish a broad-based perception that data should be unrestricted.

From Wikipedia:
Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control.[1] The goals of the open data movement are similar to those of other “open” movements such as open source, open hardware, open content, and open access…

I submit that there is a need for an Open Data Bill of Rights:

  1. There should be no restriction on publication of legally obtained data.
  2. Data collected by government entities, or at government expense, should be published with open and free access, in raw form, without subjective post processing or manipulation.
  3. Those who corrupt data, or interfere with its availability, including government, will be subject to criminal prosecution.
  4. The people have a right to be secure in their homes, personal effects, and information stores. As such, government shall be prohibited from compelling publication of, or seizing, private data without a court order based on probable cause. You cannot be compelled to publish a data source originating within your home.

Call for your contribution

I am hoping that watchdogs, like the EFF, will be there to protect the public interest. But the technology here is complex. It is really up to technologists, such as those who I hope read this blog, to educate and inform the public, and those in public service. Everyone in the technology field stands to benefit personally, and perhaps even career-wise.

Community contribution on a technical, educational, and political basis will be needed to allow the IOT opportunity to deliver its full potential. If it does this, it can improve transportation, safety, healthcare, and the environment. It can also boost productivity and the economy. In short, it can raise the quality of life.

The IOT revolution is coming. It will not be televised. But it will be blogged and tweeted, so spread the word, amigos…

@cantbewong

Wrapping up this series, here are a few more non-price-based reasons why users might choose open source:

Can customize

Commercial software economics works best when a common source base is used to address the needs of many customers. This has two effects:

  1. Features used by many, or most, customers are well addressed, but niche features are left out or less completely implemented.
  2. Features are included that you, as an individual customer, will never use.

If you are a service provider, Amazon, or Twitter, for example, you likely have some specialized needs that are not common to the typical customer. Open source gives you the ability to add missing features, or change the implementation of features that are already there.

Surplus features have a cost in resource usage, and in bug exposure. More lines of code = higher probability of bugs. More code also brings a higher security risk exposure. Open source gives you a starting point, from which you can potentially remove code associated with unwanted features.

Suppose doing this removes 5% of memory usage and eliminates a library with chronic patch requirements. This small difference wouldn’t concern a small business running a single instance. But if you are a service provider running thousands of instances 24×7, getting rid of the fat is attractive.

Anybody can fix bugs

Access to the source can help with troubleshooting and document proper usage

Commercial software is commonly bundled with a time limited support agreement.

There will always be some subset of users that don’t even make an attempt to read documentation. This frequently results in a tiered support arrangement where a picket line of lesser-skilled individuals filters out RTFM inquiries.

If your organization is staffed with skilled and trained users, you could well have higher-caliber people using the product than those staffing the vendor’s support organization. Self-help, with access to source code, might resolve problems faster and more effectively than paid support.

Source code access is somewhat analogous to having a car maker publish technical service manuals. Just because the service manual is available doesn’t mean that you have to fix the car yourself. Most people still elect to get car service from the manufacturer. Odds are, you also have the option of using a third-party repair shop. And if you are technically inclined, and have the time, you retain the option to try to fix it yourself.

There are vendors that publish source and still offer paid support. Even if you elect to take advantage of the paid support, the availability of the source can be useful to those on your staff that are technically advanced. Depending on the open source licensing terms, it might also cultivate a vigorous community of third party tools, extensions, and collateral documentation.

You do not risk getting stranded if the vendor loses interest in the product or encounters solvency issues

Companies don’t usually live forever. And even if they are still around, product lines can be dropped or sold off.

Take Windows XP as an example. In spite of a large user base, Microsoft elected to drop conventional support, and halt new sales of the software. Let’s assume that this decision was warranted, because the internal architecture was so technologically obsolete that attempts to secure the OS would amount to a complete rewrite anyway.

There could still be unfortunate users who have applications that depend on XP – and who are comfortable that they can constrain it to a partitioned environment that mitigates the risks.

Open source at least offers the option of supporting abandoned software by yourself, through a third party, or with a community of users with a similar interest.

Continuing the theme from the previous post, here are some non-price based reasons why a user might prefer open source:

Ability to audit quality and security

Earlier in my career I was a developer of a popular industrial automation software product. A customer, Royal Dutch Shell, used the product to control refineries, chemical plants, and pipelines. As a custom negotiated term of purchase, they had the right to audit source code and development processes. They conducted periodic unannounced visits by auditors. Audits would involve examination of source code, QA, and release engineering records, along with interviews of personnel.

In certain high-risk businesses, audits are viewed as critical. Why? Look at something like the BP Deepwater Horizon oil spill of 2010. As of 2013, settlements and trust fund payments had cost BP $42.2B.

In some industries, liability risks can be enormous. Even if a supplier, such as a software vendor, is guilty of irresponsible behavior, a higher entity with “deep pockets” will end up as the backstop for liability.

Sure, vendors routinely profess “commitments to quality”, but history is littered with examples of companies that cut corners where they think it won’t be seen. Open source turns on the lights, reducing the places where cockroaches can breed undetected. Royal Dutch Shell’s behavior is an example of a consumer that elects to “trust but verify”.1

Don’t assume that the billion-dollar liability club is confined to oil companies. It is easy to imagine multi-billion dollar costs associated with software failures in financial or even media companies. Witness the yet-to-be-determined costs associated with the Sony data breach.

Open source has an inherent transparency that makes verification and audit easier. For some users, this attribute can be a stronger factor than price in choosing open source.

The Snowden classified document releases focused media attention on efforts by nation states to inject “back doors” into commercial software.2 Whether these back doors are common, or not, the fear alone has led to distrust of software supplied across political borders.

If EMC is ever to be successful at selling software-based products in China, or Huawei in the United States, open source might be the only way it will ever happen.3


  1. Yes, as in this example, if you are a big customer, you might be able to negotiate audit rights in proprietary software. Open source avoids this added friction in the acquisition process. Also, in theory, open source draws review from many eyes, not just your own. The OpenSSL track record points out that the number of users that actually invest in auditing open source projects is likely small. Open source gives you the right, and ability, to audit, but you should never assume that others are performing an audit on your behalf. 
  2. link: NSA back doors in routers
  3. For hardware based products this probably requires open source to go all the way down into firmware, held on verifiable SD memory cards – along with locally based assembly and component sourcing. 

Some of the reasons consumers choose open source software over commercial closed source alternatives:

  • Price
  • Ability to audit quality and security
  • Can customize
  • Anybody can fix bugs
  • Access to the source can help with troubleshooting and document proper usage
  • You do not risk getting stranded if the vendor loses interest in the product or encounters solvency issues

Price as the basis of choosing open source

Vendors of commercial software, facing an open source competitor, have been known to offer the adage: “It’s only free if your time is worth nothing”1 – implying that use of open source is tantamount to declaring yourself worthless.

Is open source really free? Usually no, but …

All software has some cost of operation:

  • Unless the software is a virus, you have to install it.
  • Often you have to configure it.
  • Unless the version you install never has a bug fix, and never has an enhancement, you will engage in some form of maintenance.

From the user perspective, the question to ask is not “Is it free?” but “How does the non-zero cost compare to the non-zero cost of alternatives?”

There are examples of open source software that are cheaper, or the same cost as commercial substitutes. But there are also open source specimens that have huge learning curves and operational costs.

As the 24×7 support person for my family at large, I can attest that the Firefox browser is as “free” as a commercial alternative such as Internet Explorer. My mom or my dad can install it, and use it, without help. But that doesn’t mean I’d extrapolate this result to something like a non-commercial distribution of OpenStack.

Open source software with low cost of ownership tends to have these characteristics:

  • It’s popular (large base of users)
  • It’s been out for a long time, and iteratively improved during its lifetime

Popularity usually results in vigorous community driven expenditure on features, including ease of use and documentation, leading to lower cost of ownership.

Even if an open source software offering is relatively expensive to deploy and operate, sometimes the proper response from a user should be “So what”.

Users are not alike – and I submit that there is a distribution continuum between these 2 extremes:

  1. Users willing to spend any amount of money to save time and labor cost.
  2. Users willing to spend any amount of labor and time to save licensing cost.

In other words, for some number of users, the response to “Is it free?” is “So what, I don’t care”.

  • If you have a high-margin business with no desire to manage a larger IT staff, acquisition price might be a minor consideration.
  • If you are an educational institution with a near-zero budget and lots of cheap labor (i.e., students), free licensing might be your only viable solution.

The next post in this series will discuss some of the other reasons a consumer might prefer open source software…


  1. Many vendors engage in the “it’s only free if your time is worth nothing” counter to open source, but rather than going negative on a competitor, I’ll point to an example from my own employer: link. For some, this could be a valid argument, but for others it is not.

Most people would trace the open source software movement back to either Richard Stallman’s GNU Manifesto in 1983, or perhaps Linus Torvalds’s Linux kernel, released in 1991.

Eric Raymond’s “The Cathedral and the Bazaar” essay proposed that an open community development model offered advantages over the processes typically used at the time. Author Neal Stephenson forecast the demise of proprietary software in his “In the Beginning…” essay in 1999.

Like many forecasters, these observers may have underestimated the time required, but they also underestimated the degree of change.

Black Duck Software offers a product for tracking open source software usage for purposes of compliance and security. Together with the VC firm North Bridge, they conduct an annual open source usage survey.

Some eye opening results:

  • 78% of companies run open source, <3% don’t use open source in any way
  • >66% of corporate users now consider OSS options before proprietary software
  • Use of open source has doubled in the past 5 years

There is no longer any doubt that the industry has changed.

I contend that this should not be a surprise. On a “geological” scale the IT industry has always experienced shifts. Because these shifts occur over the course of decades, you can have long periods where people can deny that a trend is in play. You can also have people too new to remember prior comparable disruptions in the industry.

In the ’50s and ’60s, standard industry practice was to bundle “free” software with hardware. It was commonplace to see academic and corporate researchers sharing collaborative work with vendors. Usually this software had hardware portability issues, and an antitrust suit against the dominant vendor of the era, IBM, resulted in a settlement under which software came to be sold as a proprietary product.

IBM still lives today, but most of the others did not. Business model disruptions are tough.

But there might be room for doubt as to whether the business models of legacy vendors have changed.

In the period of 2000-2005, there was a wave of statements by CEOs of proprietary software companies likening open source to communism.

You don’t hear this lately – instead, companies like IBM, Microsoft, EMC, and HP are suddenly crowing about their commitment to open source.

It is clear that the dominant players in proprietary software are now claiming to embrace the open source movement.

When a legacy vendor says they have changed, as a prospective customer you should ask yourself, “Is this real?” Or could this be a “bait and switch” tactic to label legacy products as open source while milking the old business model for as long as it lasts?

I work as an open source developer for EMC. “EMC and open source – what does this mean?”

Is EMC just another lumbering dinosaur, arriving late to the open source party?

Stay tuned for my next post..

Who are you?

I’m Steve Wong. I’m a developer. I like to build things. I’ve worked on hardware, software, and products that integrate both of these. I’ve worked on teams that have delivered computers, medical devices, factories, pipelines, software products, and clouds. I’ve had the good fortune to have participated in a few successful startups. When the last of these was acquired, I joined my current employer, EMC.

I’ve held all but one of these jobs – I’ll let you guess the outlier: bartender, dishwasher, restaurant cook, baker, truck dispatcher, security alarm installer, manager of an engineering consulting firm, attorney, guy in fur suit portraying the Meadow Gold Dairy tiger, rodeo clown.

I mentor a high school robotics team. I am fond of books on college basketball coaching. I think there are human motivation and teamwork lessons from John Wooden, Al McGuire, and even Bobby Knight that are more timeless than the content in the latest “flavor of the month” business best seller. When your feedstock is rookie teenagers and your delivery dates start in a few months, that’s a real challenge.

Why start blogging now? Did you discover a lost package from 2003 and find a WordPress 1.0 CD?

I recently moved to a new role within EMC.

My former job had me working on classic EMC and VMware “closed source” products. For the most part, the stuff wasn’t even announced until it was for sale. By that time, the technology choices you made are old and less interesting, and the marketing channel wants an exclusive on messaging about the product.

My new role has me working exclusively on open source. It’s time to start blogging!

Is this blog one of those corporate mouthpieces, disguised as a personal blog?

Nope, this is my personal blog. What you find here is my personal opinion.

The content here is not reviewed or approved by EMC.

What you find here is well reasoned and correct. Needless to say, this doesn’t necessarily match the opinions and views of every pointy-haired boss and PR hack within EMC. All kidding aside, EMC is fairly tolerant of conflicting views – and this is a good thing.

If you read something here, and then quote it as “EMC said this..”, you are not being ethical. In this blog I speak for myself, and only myself. This is not the EMC “party line”. In some cases what I write here might be an attempt to influence those within EMC, as much as those outside.

What are you going to blog about?

This blog is about using modern development tools and platforms to deliver scalable software solutions. I expect that most of this will be about open source software.

Why would a for-profit software company choose to engage in open source?

Stay tuned for my next post…