Several months ago we ran a survey on developers' usage of Linux native containers utilizing the Docker format (LNCDF) in their development process. We solicited respondents through Twitter (my account, the OpenShift account, Kubernetes accounts, and encouraged retweets), the OpenShift newsletter, and calls to action on various blogs. We promised to release the report and the data to the public.

Today, you get to read the report as well as download the data (CSV format). I have removed the IP addresses for privacy reasons, but I have kept the country and continent data derived from them. The data contains all the free text responses as well, but I did not do any sentiment analysis or other qualitative analysis. Where it seems possible to summarize the main points from free text responses, I will try to show them.

You can also download the R script I used to process the data. I don't write the cleanest R syntax, but I believe all the steps are statistically correct.

Oh and in case you don't remember the original survey, I have made that available for download as well. Where possible, such as "programming languages used", I randomized the presentation order of choices to prevent bias.

TL;DR

(copied from the conclusion)

One of the overwhelming conclusions of the survey is that the use of LNCDF has made developers' lives better in their day in and day out development. This was true across all age groups and experience categories. Our survey respondents work in a wide array of programming languages and use very different tool sets (IDEs vs. text editors and CI/CD vs. manual builds), yet they all found Docker containers of benefit.

Contrary to common stereotypes, using LNCDF does not appear to appeal only to the young "hipster" developers. It is well accepted across all the age categories. Even those who have not used it yet expressed that they knew this was a technology they would need to learn and use.

On the other hand, the survey results show that the way in which developers use the technology is far from standardized. This is evidenced by the wide variation in responses for how developers are using LNCDF in development and how they build and run the resulting images. Moving past the local machine we see a wide array of uses of the technology. This appears to be common in the early days of widespread adoption of a "new" technology. What should be interesting to see is the rate of maturation and standardization given the broad array of players in this space.

To summarize, LNCDF are here to stay for developers, but there is still a lot of growth to happen in maturing the development, building, and running of these containers. We can see this in the robust competition among groups working on container specifications, development tooling, automation solutions, CI/CD tools, and orchestration/scheduling/hosting technology.

As always I encourage feedback and comments. And with that introduction, let's dive into the analysis.

Basic Demographic Data on Developers using Docker

We asked some basic demographic questions about our participants. The purpose in asking these questions was to see whether there were particular age, experience, or geographic biases or patterns in our users. For those of you who are doing these kinds of tracking surveys in your own company, this data can help you compare against your "typical" user.

Age

There were very few teenage respondents, but other than that, the data shows a nice bell curve. Far and away, the dominant age group is those in their 30s, with a slight rightward bias toward older age categories. We will use this data later to look at how other responses are affected by the age of the respondent.

[Figure: age of respondents]

Experience

Despite the age curve following a roughly normal distribution, users' experience has a right-biased distribution. I found this result interesting because much of the perception about developers is that there is a huge influx of new developers into the career. Our distribution could either represent a more realistic experience range of developers, or perhaps we have a biased sample. Again, we may use these demographics for further analysis of our later questions.

[Figure: years of programming experience of respondents]

Geography

The majority of our respondents came from Europe and North America. Given the low counts from other continents, it will not be possible to look at geographic variation other than between Europe and North America.

[Figure: respondents by continent]

In terms of counts for individual countries, we can see that the US had the largest number of responses, with counts declining quickly for other countries. Given the low counts per country, it will not be possible to give a country-level analysis.

Country               Count
United States         41
Germany               14
United Kingdom        13
France                11
Canada                10
Brazil                9
Italy                 7
Russian Federation    7
Australia             6
(Other)               78

What and How they Program

What Programming Languages are used

We first asked the respondents what programming languages they used more than once a month. This was to avoid languages that might be a hobby or not really their "working" languages. They were allowed to pick multiple choices, including "Other", which means that the sum of the bars will be higher than the total number of survey participants. The order of languages presented was randomized for each user to prevent bias in language location in the list.

[Figure: programming languages used more than once a month]

Client side JavaScript being the most popular is both unexpected and interesting. Since the respondents are primarily developers using LNCDF, this suggests that most of them are full stack developers, having to do both client side and server side coding. While Java and Python are the second most popular, I think this result may be a selection bias. OpenShift and Red Hat followers on Twitter tend to lean more towards Java, and Graham Dumpleton, our main blogging referrer, is quite well known in the Python community. The top 3 languages in the "other" category are C/C++, Scala, and Bash.

The main takeaway from this data is that there is a fairly large range of programming languages used by the developers in this survey.

Code Editors

We also asked respondents about what tool they use to write code, and again they were allowed to pick multiple choices (along with "Other").

[Figure: code editors and IDEs used]

It may seem odd, given the languages represented, that the count of developers using text editors is high. Given that we allowed for multiple selections, it turns out that about 50% of the IDE users also use text editors to get their work done.

Percentage of IDE users who also use Text Editors

IntelliJ  Eclipse  Netbeans  Other  VS    Web
54        59.1     50        31.6   41.7  66.7
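The percentages in this table come from multi-select answers, so each one is a share within the users of a single IDE. A minimal sketch of that calculation follows, written in Python rather than the R script mentioned earlier. The five responses and the `pct_also_using` helper are hypothetical and only illustrate the arithmetic; they are not survey data.

```python
# Hypothetical multi-select answers: one set of tools per respondent.
# These five rows are made up for illustration; they are not survey data.
responses = [
    {"IntelliJ", "Text Editor"},
    {"IntelliJ"},
    {"Eclipse", "Text Editor"},
    {"Eclipse", "IntelliJ"},
    {"Text Editor"},
]


def pct_also_using(responses, tool, other="Text Editor"):
    """Percentage of respondents who picked `tool` and also picked `other`."""
    users = [r for r in responses if tool in r]
    if not users:
        return None  # nobody picked this tool, so the share is undefined
    both = sum(1 for r in users if other in r)
    return round(100 * both / len(users), 1)


for tool in ("IntelliJ", "Eclipse"):
    print(tool, pct_also_using(responses, tool))
```

Because a respondent can appear under several tools, the per-tool percentages do not need to add up to 100.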

Another interesting result is the heavy usage of IntelliJ products by developers despite the availability of free options. The "other" category included a few products made by JetBrains (the company behind IntelliJ) other than IDEA. The most common in "other" was Atom, which is interesting because I consider it a text editor similar to Sublime.

Production Applications in the Cloud

Respondents were asked if they would be willing to run their application in "The Cloud".

Respondents willing to run their application in the cloud

No Yes NA's
40 135 21

Of the 175 respondents who answered, 77 percent are willing to run their application in the cloud.

How they Use and Feel about Docker Containers

Used Docker

The first question we asked was whether people had used Docker containers or not. Unfortunately, I did not set up the survey to branch after this first question. This may have led to a lot more skipped responses for the other questions about LNCDF.

Have you used Docker
No Yes NA's
35 159 2

Approximately 82% of the respondents have used LNCDF.

I thought that perhaps some of the difference in running production applications in the cloud was due to lack of faith in containers. Therefore, I ran a crosstab between Docker container usage and willingness to run in the cloud.

Cross tabulation of Having Used Docker Containers and Willingness to Run in Cloud (cell values are percentages)
               Run in Cloud
Used Docker    No      Yes     Sum
No             4.6     12.0    16.6
Yes            18.3    65.1    83.4
Sum            22.9    77.1    100.0

The results do not support my hypothesis and seem to indicate there are other factors involved in people's reluctance to run in the public cloud.
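One way to check this reading is to compare the share willing to run in the cloud within each Docker-usage group, then run a chi-square test of independence. The sketch below is in Python, not the R script mentioned earlier, and it assumes cell counts of 8, 21, 32 and 114. Those counts are reconstructed from the published percentages and the 175 respondents who answered both questions, so treat them as an assumption. No continuity correction is applied.

```python
import math

# Counts reconstructed from the percentage table (assumed, n = 175).
# Keys are (used_docker, run_in_cloud).
counts = {
    ("No", "No"): 8, ("No", "Yes"): 21,
    ("Yes", "No"): 32, ("Yes", "Yes"): 114,
}


def cloud_yes_rate(used):
    """Share (in %) willing to run in the cloud within one Docker-usage group."""
    yes = counts[(used, "Yes")]
    return 100 * yes / (yes + counts[(used, "No")])


def chi_square_2x2(counts):
    """Pearson chi-square statistic and p-value for a 2x2 table (1 d.f.)."""
    levels = ("No", "Yes")
    n = sum(counts.values())
    row = {r: sum(counts[(r, c)] for c in levels) for r in levels}
    col = {c: sum(counts[(r, c)] for r in levels) for c in levels}
    stat = 0.0
    for r in levels:
        for c in levels:
            expected = row[r] * col[c] / n  # count expected under independence
            stat += (counts[(r, c)] - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square tail equals erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


print(f"non-users willing: {cloud_yes_rate('No'):.1f}%")
print(f"users willing:     {cloud_yes_rate('Yes'):.1f}%")
stat, p = chi_square_2x2(counts)
print(f"chi-square = {stat:.2f}, p = {p:.2f}")
```

On these reconstructed counts the two groups come out close to each other (roughly 72% and 78% willing), which fits the conclusion that Docker usage is not what drives reluctance.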

What Non-Users Think of Docker

For those who had not used Docker we tried to assess their feeling about the technology.

Sentiment                                       Count  Percent
Gotta’ try it soon                              19     52.8
Ground breaking new tech                        2      5.6
It’s yet another hipster tech – count me out    2      5.6
Not sure                                        6      16.7
VMs are fine for my development work            2      5.6
Waiting for it to be more mature                2      5.6
NA                                              3      8.3

Over 50% of the non-users indicated that they needed to use the technology soon, with remaining respondents generally being unsure or not foreseeing a need for LNCDF. Again, this is only 18 percent of the total respondents to the survey.

Docker in CI/CD Process

Slightly more than 50% of the respondents who answered use Docker in their CI/CD process.

No  Yes   NA's
85 94 17

Docker in Rapid Local Development

We see almost exactly the same numbers for the use of LNCDF in local rapid development.

No  Yes  NA's
86 92 18

If we look at the cross-tabulation between the last two questions, there is a fairly good correlation between use in CI/CD and use in local development, but there are also cases where it is used for one and not the other.

Cross tabulation of using Docker containers for local dev. and in CI/CD (cell values are percentages)
                    CI/CD Docker
Local Dev Docker    No     Yes    Sum
No                  30     16     47
Yes                 18     35     53
Sum                 48     52     100
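To put a rough number on the correlation between the two questions, one option is the phi coefficient for a 2x2 table. The sketch below is in Python, not the R script mentioned earlier, and it works directly from the rounded percentages in the table. The result is therefore approximate, and the `phi_coefficient` helper is illustrative.

```python
import math

# Cell percentages from the table: rows are local dev (No, Yes),
# columns are CI/CD (No, Yes). They are rounded, so they sum to 99.
table = [[30, 16],
         [18, 35]]


def phi_coefficient(t):
    """Phi coefficient for a 2x2 table of counts or percentages."""
    (a, b), (c, d) = t
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom


phi = phi_coefficient(table)
# Share of respondents giving the same answer (No/No or Yes/Yes) to both.
agree = (table[0][0] + table[1][1]) / sum(map(sum, table))
print(f"phi = {phi:.2f}, same answer to both questions = {agree:.0%}")
```

Phi ranges from -1 to 1, with 0 meaning no association, so a value around 0.3 points to a positive but far from perfect link between the two uses.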

 

We also asked developers who used LNCDF for development how it had affected their working lives.

[Figure: how LNCDF affected quick iteration development (quickIterStatus)]

It's obvious that for most developers, Docker containers have made their development lives better. This leads to questions about how demographics might influence these responses. Our sample size is big enough that I can actually test statistically whether certain demographic groups have different opinions than other groups.

There is a graphical technique, association plots, that can be used to visualize the associations and show the statistical test as well. These methods are available in the vcd package in R. I am going to take a little extra time explaining the diagram here, but a nice quick introduction to its use is also available.

Here is the association plot between feelings about LNCDF for quick development flow and age:

[Figure: association plot of quick iteration sentiment vs. age (quickIterStatusVage)]

On the X axis, each column of blocks is an age group, with the width of the bar being the number of respondents in that category. For example, there were a lot of 30-39 year olds who responded with "Better" and very few in the 19 or younger age group who responded "Worse".

The Y axis is the respondent's answer for how Docker containers affected their life for quick iteration development. The dotted line for each sentiment represents the expected response if there were no difference between age categories. If each age group answered the questions in exact proportion to their overall numbers, then that would say there was no effect of age on their sentiment. It would also show up as absolutely no height to the gray bars.

The direction of the bar indicates whether there are more or fewer people in this category compared to what is expected under no association. For example, among 20-29 year olds, there are fewer than expected in the "Better" category and more than expected in the "Mixed" category. Finally, the color of the bar indicates the statistical probability that this group is different from the expected number. Since all of these are grey, there is no statistical evidence that any of these are different from the expectation of no association.
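The "expected under no association" idea can be made concrete with a small calculation. The sketch below is in Python, not the R script mentioned earlier, and it computes expected counts and Pearson residuals, which to the best of my understanding are what an association plot draws as bar heights. The age-by-sentiment counts are not listed in this post, so it uses the Docker-usage by cloud-willingness table instead, with counts reconstructed from the published percentages (an assumption).

```python
import math

# Reconstructed counts from the Docker-usage by cloud-willingness table
# (assumed, n = 175). Keys are (used_docker, run_in_cloud).
observed = {
    ("No", "No"): 8, ("No", "Yes"): 21,
    ("Yes", "No"): 32, ("Yes", "Yes"): 114,
}


def pearson_residuals(observed):
    """Map each cell to (expected count, Pearson residual)."""
    n = sum(observed.values())
    rows = sorted({r for r, _ in observed})
    cols = sorted({c for _, c in observed})
    row_tot = {r: sum(observed[(r, c)] for c in cols) for r in rows}
    col_tot = {c: sum(observed[(r, c)] for r in rows) for c in cols}
    out = {}
    for (r, c), obs in observed.items():
        expected = row_tot[r] * col_tot[c] / n  # count under no association
        out[(r, c)] = (expected, (obs - expected) / math.sqrt(expected))
    return out


for cell, (expected, resid) in pearson_residuals(observed).items():
    direction = "above" if resid > 0 else "below"
    print(cell, f"expected {expected:.1f}, residual {resid:+.2f} ({direction})")
```

A positive residual corresponds to a bar above the dotted line and a negative one to a bar below it. Residuals that stay small in absolute value are the numeric counterpart of the all-grey bars.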

The bar on the right and the p-value below it give even more credence to the idea that the data shows no association. For those of you with statistical backgrounds, please shut your eyes for the next bit as I give a "simple enough" explanation of the p-value. For the rest of you: the p-value ranges between 0 and 1 and represents the probability of seeing an association at least this strong purely by chance if there were really no association. A small p-value, e.g. 0.002, indicates that the chance we would see this arrangement of bar heights by chance alone is 0.2%. On the other hand, a high value, such as we have here, indicates there is no reason to believe the associations we see are any different than if we just put people in the categories based on their representation among the respondents. In general, values below 0.1 or 0.05 (10% or 5% chance) are where statisticians think things get exciting. OK statistical people, you can open your eyes now, nothing to see here, just move along.
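The "put people in the categories at random" idea can be simulated directly with a permutation test. The sketch below is in Python, not the R script mentioned earlier, and it is not how the p-value in the plot was computed. It shuffles one set of answers many times and counts how often the shuffled table looks at least as lopsided as the real one. The age-by-sentiment counts are not listed in this post, so it uses the Docker-usage by cloud-willingness answers, reconstructed from the published percentages (an assumption). The number of rounds and the seed are arbitrary choices.

```python
import random


def chi_square(pairs):
    """Pearson chi-square statistic for a list of (row, column) labels."""
    n = len(pairs)
    cell, row, col = {}, {}, {}
    for r, c in pairs:
        cell[(r, c)] = cell.get((r, c), 0) + 1
        row[r] = row.get(r, 0) + 1
        col[c] = col.get(c, 0) + 1
    stat = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            stat += (cell.get((r, c), 0) - expected) ** 2 / expected
    return stat


def permutation_p_value(pairs, rounds=2000, seed=1):
    """Share of random re-pairings at least as extreme as the real data."""
    rng = random.Random(seed)
    observed = chi_square(pairs)
    rows = [r for r, _ in pairs]
    cols = [c for _, c in pairs]
    hits = 0
    for _ in range(rounds):
        rng.shuffle(cols)  # break any real link between the two answers
        # Small tolerance so float rounding cannot drop exact ties.
        if chi_square(list(zip(rows, cols))) >= observed - 1e-9:
            hits += 1
    return hits / rounds


# Reconstructed (used Docker, run in cloud) answers, assumed n = 175.
pairs = ([("No", "No")] * 8 + [("No", "Yes")] * 21
         + [("Yes", "No")] * 32 + [("Yes", "Yes")] * 114)
print(permutation_p_value(pairs))
```

A result far above 0.05 says that random shuffles routinely produce tables as uneven as the observed one, which is the "nothing to see here" case described above.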

Now with all that explained, let's do an analysis with years programming:

[Figure: association plot of quick iteration sentiment vs. years programming (quickIterStatusVexp)]

With this cross tabulation there is very weak evidence that developers with 10-14 years of experience find the experience "Worse" more often than expected (NOTE: there are still more developers in this experience group who find it "Better" overall).

There was no pattern with either "Continents" or "Willingness to Run in the Public Cloud", so they will not be displayed here.

Free text suggestions for making quick iteration better

There was a diverse set of suggestions and opinions, and I didn't really see any overarching themes. Of course you may see some, and I would love to hear them as comments.

Created and Shared Images on Docker Hub

A little less than 50% of the respondents have created and shared images on Docker Hub.

 No  Yes
97 80

If you take a look at the responses of those who did create Docker images, some are using CI/CD or automated builds, but it seems most are doing hand builds and manually uploading. There also seems to be a good number who are taking base images and then modifying them for their own purposes.

Where Developers Run Their Containers

The final question we asked developers was where they run all these nice containers they are using (again, respondents were allowed to choose multiple options).

[Figure: where developers run their containers (runLocation2)]

The high number for "Local Machine" is to be expected, since developers usually run things on their local machines. The other 3 chosen options received almost equal numbers of responses, with a slight lean towards DIY systems. The free text responses in the "Other" category had no overall pattern and were mostly just variations on the options provided.

Conclusion

One of the overwhelming conclusions of the survey is that the use of LNCDF has made developers' lives better in their day in and day out development. This was true across all age groups and experience categories. Our survey respondents work in a wide array of programming languages and use very different tool sets (IDEs vs. text editors and CI/CD vs. manual builds), yet they all found Docker containers of benefit.

Contrary to common stereotypes, using LNCDF does not appear to appeal only to the young "hipster" developers. It is well accepted across all the age categories. Even those who have not used it yet expressed that they knew this was a technology they would need to learn and use.

On the other hand, the survey results show that the way in which developers use the technology is far from standardized. This is evidenced by the wide variation in responses for how developers are using LNCDF in development and how they build and run the resulting images. Moving past the local machine we see a wide array of uses of the technology. This appears to be common in the early days of widespread adoption of a "new" technology. What should be interesting to see is the rate of maturation and standardization given the broad array of players in this space.

To summarize, LNCDF are here to stay for developers, but there is still a lot of growth to happen in maturing the development, building, and running of these containers. We can see this in the robust competition among groups working on container specifications, development tooling, automation solutions, CI/CD tools, and orchestration/scheduling/hosting technology.