Ben James – Hackaday (https://hackaday.com)

Ask Hackaday: What’s Your Favourite Build Tool? Can Make Ever Be Usurped?
https://hackaday.com/2021/03/11/ask-hackaday-whats-your-favourite-build-tool-can-make-ever-be-usurped/
Thu, 11 Mar 2021

What do you do whilst your code’s compiling? Pull up Hackaday? Check Elon Musk’s net worth? Research the price of a faster PC? Or do you wonder what’s taking so long, and decide to switch out your build system?

Clamber aboard for some musings on Makefiles, monopolies, and the magic of Ninja. I want to hear what you use to build your software. Should we still be using make in 2021? Jump into the fray in the comments.

What is a Build Tool Anyway?

Let’s say you’ve written your C++ program, compiled it with g++ or clang++ or your compiler flavor of the week, and reveled in the magic of software. Life is good. Once you’ve built out your program a bit more, adding code from other files and libraries, these g++ commands are starting to get a bit long, as you’re having to link together a lot of different stuff. Also, you’re having to recompile every file every time, even though you might only have made a small change to one of them.

People realised fairly early on that this sucked, and that we can do better. They started to make automated software that could track compilation dependencies, track which bits of code were tweaked since the last build, and combine this knowledge to automatically optimise what gets compiled – ensuring your computer does the minimum amount of work possible.

Enter: GNU Make

Yet another product of the famous Bell Labs, make was written by [Stuart Feldman] in response to the frustration of a co-worker who wasted a morning debugging an executable that was accidentally not being updated with changes. Make solves the problems I mentioned above – it tracks dependencies between sources and outputs, and runs complex compilation commands for you. For numerous decades, make has remained utterly ubiquitous, and for good reason: Makefiles are incredibly versatile and can be used for everything from web development to low level embedded systems. In fact, we’ve already written in detail about how to use make for development on AVR or ARM micros. Make isn’t limited to code either, you can use it to track dependencies and changes for any files – automated image/audio processing pipelines anyone?
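
To make that concrete, here’s a minimal, purely illustrative Makefile for a hypothetical two-file C++ project (the file names and flags are placeholders, not from any real project). Each rule says “this output depends on these inputs, and here’s the command to rebuild it”, and make only re-runs the commands whose inputs have changed:

CXX = g++
CXXFLAGS = -Wall -O2

# Link the final binary from its object files
app: main.o util.o
	$(CXX) $(CXXFLAGS) -o app main.o util.o

# Each object file depends on its own source plus a shared header,
# so editing util.cpp only rebuilds util.o and re-links app
main.o: main.cpp util.h
	$(CXX) $(CXXFLAGS) -c main.cpp

util.o: util.cpp util.h
	$(CXX) $(CXXFLAGS) -c util.cpp

.PHONY: clean
clean:
	rm -f app *.o

Run make once and everything builds; touch util.cpp and run it again, and only util.o and the final link are redone.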

But, well, it turns out writing Makefiles isn’t actually fun. You’re not adding features to your project, you’re just coercing your computer into running code that’s already written. Many people (the author included) try to spend as little of their life on this planet as possible with a Makefile open in their editor, often preferring to “borrow” others’ working templates and be done with it.

The problem is, once projects get bigger, Makefiles grow too. For a while we got along with this – after all, writing a super complex Makefile that no-one else understands does make you feel powerful and smart. But eventually, people came up with an idea: what if we could have some software generate Makefiles for us?

CMake, Meson, Autotools et al

Yes, there are a sizeable number of projects concerned only with generating config files purely to be fed into other software. Sounds dumb right? But when you remember that people have different computers, it actually makes a lot of sense. Tools like CMake allow you to write one high-level project description, then will automatically generate config files for whatever build platforms you want to use down the line – such as Makefiles or Visual Studio solutions. For this reason, a very large number of open source projects use CMake or similar tools, since you can slot in a build system of your choice for the last step – everyone’s happy.
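
As a sketch of what that looks like in practice — a hypothetical CMakeLists.txt for the same toy project, with placeholder names — the whole build is described once, and the generator flag picks the backend:

cmake_minimum_required(VERSION 3.13)
project(app CXX)

# One high-level description of the build
add_executable(app main.cpp util.cpp)

# Then generate for whichever backend you like:
#   cmake -S . -B build            # default generator (e.g. Makefiles)
#   cmake -S . -B build -G Ninja   # emit build.ninja instead
#   cmake --build build            # drive the chosen backend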

Except, it’s really quite hard to tell if everyone is happy or not. As we know, when people selflessly spend time writing and maintaining good quality open source software, others are very kind to them online, do not complain, and do not write passive-aggressive blog posts about 27 reasons why they’re never using it again. Just kidding!

What I’m getting at here is that it’s hard to judge popular opinion on software that’s ubiquitous, because regardless of quality, beyond a critical mass there will always be pitchfork mobs and alternatives. Make first appeared in 1976, and still captures the lion’s share of many projects today. The ultimate question: is it still around because it’s good software, or just because of inertia?

Either way, today its biggest competitor – a drop-in replacement – is Ninja.

Ninja

Examples of popular build tools at different abstraction levels

Ninja was created by [Evan Martin] at Google, when he was working on Chrome. It’s now also used to build Android, and by most developers working on LLVM. Ninja aims to be faster than make at incremental builds: re-compiling after changing only a small part of the codebase. As Evan wrote, reducing iteration time by only a few seconds can make a huge difference to not only the efficiency of the programmer, but also their mood. The initial motivation for the project was that re-building Chrome when all targets were already up to date (a no-op build) took around ten seconds. Using Ninja, it takes under a second.

Ninja makes extensive use of parallelization, and aims to be light and fast. But so does every other build tool that’s ever cropped up – why is Ninja any different? According to Evan, it’s because it didn’t succumb to the temptation of writing a new build tool that did everything — for example replacing both CMake and Make — but instead replaces only Make.

Source: The Meson Build System – A Simple Comparison. Apache 2.0

This means that it’s designed to have its input files generated by a higher-level build system rather than written by hand, so it slots in easily as a backend for CMake and other high-level tools.

In fact, whilst it’s possible to handwrite your own .ninja files, it’s advised against. Ninja’s own documentation states that “In contrast [to Make], Ninja has almost no features; just those necessary to get builds correct. […] Ninja by itself is unlikely to be useful for most projects.”
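
To give a flavour of why hand-writing it is discouraged, here’s roughly what a build.ninja fragment looks like (hand-written and purely illustrative — in a real project a generator emits thousands of lines of this). Rules are declared once, and every edge of the dependency graph is spelled out explicitly, with no wildcards or pattern rules to lean on:

rule cxx
  command = g++ -Wall -O2 -c $in -o $out

rule link
  command = g++ $in -o $out

# Every build edge is listed explicitly
build main.o: cxx main.cpp
build util.o: cxx util.cpp
build app: link main.o util.o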

Above you can see the differences in incremental build times in a medium-sized project. The two Ninja-based systems are clear winners. Note that for building the entire codebase from scratch, Ninja will not be any faster than other tools – there are no shortcuts to crunching 1s and 0s.

In his reflections on the success and failure of Ninja, Evan writes that:

“The irony of this aspect of Ninja’s design is that there is nothing preventing anyone else from doing this. Xcode or Visual Studio’s build systems (for example) could just as well do the same thing: do a bunch of work up front, then snapshot the result for quick reexecution. I think the reason so few succeed at this is that it’s just too tempting to mix the layers.”

It’s undeniable that this approach has been successful, with more and more projects using Ninja over time. Put simply, if you’re already using CMake, I can’t see many reasons why you wouldn’t use Ninja instead of make in 2021. But I want to know what you think.

Over to you

It’s impossible to write about all the build tools around today. So I want to hear from you. Have you switched from make to Ninja? Do you swear by Autotools, Buck or something else? Will make ever go away? Will there ever be a tool that can eclipse them all? Let me know below.

The Black Magic of a Disappearing Linear Actuator
https://hackaday.com/2020/12/29/the-black-magic-of-a-disappearing-linear-actuator/
Tue, 29 Dec 2020

Many of the projects we serve up on Hackaday are freshly minted, hot off the press endeavors. But sometimes, just sometimes, we stumble across ideas from the past that are simply too neat to be passed over. This is one of those times — and the contraption in question is the “Kataka”, invented by [Jens Sorensen] and publicised on the cover of the Eureka magazine around 2003.

The device, trademarked as the Kataka but generically referred to as a Segmented Spindle, is a compact form of linear actuator that uses a novel belt arrangement to collapse to a very small thickness when retracted, while growing to seemingly impossible lengths when fully extended. This is the key advantage over conventional actuators, which usually retract into a housing at least the length of the piston.

It’s somewhat magical to watch the device in action, seeing the piston appear “out of nowhere”. Kataka’s YouTube channel is now sadly inactive, but contains many videos of the device used in various scenarios, such as lifting chairs and cupboards. We’re impressed with the amount of load the device can support. When used in scissor lifts, it also offers the unique advantage of a flat force/torque curve.

Most records of the device online are roughly a decade old. Though numerous prototypes were made, and a patent was issued, it seems the mechanism never took off or saw mainstream use. We wonder if, with more recognition and the advent of 3D printing, we might see the design crop up in the odd maker project.

That’s right, 3D printed linear actuators aren’t as bad as you might imagine. They’re easy to make, with numerous designs available, and can carry more load than you might think. That said, if you’re building, say, your own flight simulator, you might have to cook up something more hefty.

Many thanks to [Keith] for the tip, we loved reading about this one!

 

Frances Allen Optimised Your Code Without You Even Knowing
https://hackaday.com/2020/08/25/frances-allen-optimised-your-code-without-you-even-knowing/
Tue, 25 Aug 2020

In 2020, our digital world and the software we use to create it are a towering structure, built upon countless layers of abstraction and building blocks — just think about all the translations and interactions that occur from loading a webpage. Whilst abstraction is undoubtedly a great thing, it only works if we’re building on solid ground; if the lower levels are stable and fast. What does that mean in practice? It means low-level, compiled languages, which can be heavily optimised and leveraged to make the most of computer hardware. One of the giants in this area was Frances Allen, who recently passed away in early August. Described by IBM as “a pioneer in compiler organization and optimization algorithms,” she made numerous significant contributions to the field.

Early Days

Via Wikimedia

Trained as a maths teacher, Allen worked at a high school in New York for two years after graduating. She went back to complete a Master’s in mathematics and was recruited by IBM Research on campus. Though planning to only stay long enough to pay off her debt and quickly return to teaching, she found herself staying with IBM for the rest of her career, even becoming the first female IBM Fellow in 1989.

Allen’s first role at IBM was teaching internal engineers and scientists how to use FORTRAN — apparently a difficult sell to people who at the time were used to programming in assembly, which did just fine thank you very much. In an interview, Allen talks about the resistance from scientists who thought it wasn’t possible for a compiled language to produce code that was good enough.

The Stretch supercomputer (via IBM)

After teaching, Allen began working on the compiler for a 100 kW “supercomputer” called Stretch. With 2048 kB of memory, the goal of Stretch was to be 100 times faster than any other system available at the time. Though this ultimately failed (to the dismay of a few clients, one finding Stretch took 18 hours to produce their 24 hour weather forecast), it caught the attention of the NSA.

Because of this, IBM designed a coprocessor addon, Harvest, specifically for codebreaking at the NSA. Harvest ended up being larger than Stretch itself, and Allen spent a year leading a team inside the NSA, working on highly classified projects. The team didn’t find out many things about what they were working on until they were leaked to the press (it was spying on the Soviet Union — no prizes for guessing).

Engineers with Tractor tapes for Harvest

An engineering feat, Harvest used a unique streaming architecture for code-breaking: information loaded onto IBM Tractor tape was continuously fed into memory, processed and output in real time, with perfectly synchronised IO and operations. Harvest could process 3 million characters a second and was estimated by the NSA to be 50-200 times faster than anything else commercially available. The project was extremely successful and was used for 14 years after installation, an impressive feat given the pace of technological advancement at the time.

Speed is of the Essence

The success of the project was in large part due to Allen’s work on the optimisations performed by its compiler. Compiler optimisations are magic. Some of us think of compilers as simple “source code in, machine code out” boxes, but much of their complexity lies in the entirely automatic suite of optimisations and intermediate steps they use to ensure your code runs as swiftly as possible. Of course, this was important for the limited hardware at the time, but the techniques that Allen helped develop are present in modern compilers everywhere. The New York Times quotes Graydon Hoare (the creator of Rust and one of today’s most famed compiler experts) as saying that Allen’s work is in “every app, every website, every video game or communication system, every government or bank computer, every onboard computer in a car or aircraft”.

So what do compiler optimisations actually look like? Allen wrote many influential papers on the subject, but “A catalogue of optimizing transformations” which she co-authored with John Cocke in 1972 was particularly seminal. It aims to “systematize the potpourri of optimizing transformations that a compiler can make to a program”. It has been said that compilers that implement just the eight main techniques from this paper can achieve 80% of best-case performance. Here are some of the most basic ideas:

  • Procedure integration: replacing calls to subprocedures with inline code where possible, avoiding saving/restoring registers
  • Loop unrolling: flattening loops by writing out statements explicitly, avoiding unnecessary comparison conditions
  • CSE (common subexpression elimination): eliminating redundant computations which calculate values already available
  • Code Motion: moving subexpressions out of loops where it is safe to do so
  • Peephole optimisation: replacing known instruction combinations with more efficient variants

Some of these might seem obvious to us now, but formalising and standardising these ideas at the time had great impact.
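
To make a couple of these concrete, here’s a hand-written sketch of common subexpression elimination and code motion. Real compilers apply these transformations to an intermediate representation of compiled code rather than to source; Python is used here purely for readability:

# Before: (a + b) * 2 never changes inside the loop, yet it is
# recomputed twice per iteration
def scale_points(points, a, b):
    out = []
    for (x, y) in points:
        out.append(((a + b) * 2 * x, (a + b) * 2 * y))
    return out

# After CSE + code motion: the repeated subexpression is computed once
# and hoisted out of the loop
def scale_points_optimised(points, a, b):
    factor = (a + b) * 2
    out = []
    for (x, y) in points:
        out.append((factor * x, factor * y))
    return out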

Parallelism

Allen’s last major project for IBM was PTRAN, the Parallel Translator. This was a system for automatic parallelism, a special type of compiler optimisation. The aim was to take programs that weren’t written with parallelism in mind and translate them for execution on parallel architectures. This concept of taking sequentially written code and automatically extracting features from it to be run in parallel led to the extensive use of dependency graphs, now a standard representation in compilers. One of the recurring themes throughout Allen’s career was her ability to take highly technical problems and abstract them into maths — often graphs and sets — and solve them precisely. On this project Allen led a team of young engineers, churning out industry leading papers and compilers for parallelism for 15 years.

IBM Academy and Beyond

In 1995 Allen became president of the IBM Academy, an internal steering group of IBM’s very top technical minds. She was able to use the position to advocate in two areas: mentoring and women in technology. In interviews, she frequently talked about how she didn’t have a mentor, and how important it is for people starting out in tech. Her visibility as an expert in the field inspired others — at its peak in the 70s/80s, half of the IBM experimental compiler group were women. Her advocacy for women in tech never ceased, even as she described a drop in participation after the early days of computing:

Later, as computing emerged as a specialized field, employers began to require engineering credentials, which traditionally attracted few women. But the pendulum is swinging back as women enter the field from other areas such as medical informatics, user interfaces and computers in education.

In 2006 Allen received the Turing Award (considered the Nobel Prize for computing) — the first woman to do so.

So the next time you fire up gcc, write anything in a language above assembly, or even use any software at all, remember that Frances Allen’s ideas are invisibly at play.

Ask Hackaday: Why Did GitHub Ship All Our Software Off To The Arctic?
https://hackaday.com/2020/07/29/ask-hackaday-why-did-github-ship-all-our-software-off-to-the-arctic/
Wed, 29 Jul 2020

If you’ve logged onto GitHub recently and you’re an active user, you might have noticed a new badge on your profile: “Arctic Code Vault Contributor”. Sounds pretty awesome right? But whose code got archived in this vault, how is it being stored, and what’s the point?

They Froze My Computer!

On February 2nd, GitHub took a snapshot of every public repository that met any one of the following criteria:

  • Activity between Nov 13th, 2019 and February 2nd, 2020
  • At least one star and new commits between February 2nd, 2019 and February 2nd, 2020
  • 250 or more stars

Then they traveled to Svalbard, found a decommissioned coal mine, and archived the code in deep storage underground – but not before they made a very cinematic video about it.

How It Works

Source: GitHub

For the combination of longevity, price and density, GitHub chose film storage, provided by piql.

There’s nothing too remarkable about the storage medium: the tarball of each repository is encoded on standard silver halide film as a 2D barcode, which is distributed across frames of 8.8 million pixels each (roughly 4K). Whilst officially rated for 500 years, the film should last at least 1,000.

You might imagine that all of GitHub’s public repositories would take up a lot of space when stored on film, but the data turns out to only be 21TB when compressed – this means the whole archive fits comfortably in a shipping container.

Each reel starts with slides containing an un-encoded human readable text guide in multiple languages, explaining to future humanity how the archive works. If you have five minutes, reading the guide and how GitHub explains the archive to whoever discovers it is good fun. It’s interesting to see the range of future knowledge the guide caters to — it starts by explaining in very basic terms what computers and software are, despite the fact that de-compression software would be required to use any of the archive. To bridge this gap, they are also providing a “Tech Tree”, a comprehensive guide to modern software, compilation, encoding, compression etc. Interestingly, whilst the introductory guide is open source, the Tech Tree does not appear to be.

But a bigger question than how GitHub did it is why they did it at all.

Why?

The mission of the GitHub Archive Program is to preserve open source software for future generations.

GitHub talks about two reasons for preserving software like this: historical curiosity and disaster. Let’s talk about historical curiosity first.

There is an argument that preserving software is essential to preserving our cultural heritage. This is an easily bought argument, as even if you’re in the camp that believes there’s nothing artistic about a bunch of ones and zeros, it can’t be denied that software is a platform and medium for an incredibly diverse amount of modern culture.

GitHub also cites past examples of important technical information being lost to history, such as the search for the blueprints of the Saturn V, or the discovery of the Roman mortar which built the Pantheon. But data storage, backup, and networks have evolved significantly since Saturn V’s blueprints were produced. Today people frequently quip, “once it’s on the internet, it’s there forever”. What do you reckon? Do you think the argument that software (or rather, the subset of software which lives in public GitHub repos) could be easily lost in 2020+ is valid?

Whatever your opinion, simply preserving open source software on long timescales is already being done by many other organisations. And it doesn’t require an arctic bunker. For that we have to consider GitHub’s second motive: a large scale disaster.

If Something Goes Boom

We can’t predict what apocalyptic disasters the future may bring – that’s sort of the point. But if humanity gets into a fix, would a code vault be useful?

Firstly, let’s get something straight: in order for us to need to use a code archive buried deep in Svalbard, something needs to have gone really, really, wrong. Wrong enough that things like softwareheritage.org, Wayback Machine, and countless other “conventional” backups aren’t working. So this would be a disaster that has wiped out the majority of our digital infrastructure, including worldwide redundancy backups and networks, requiring us to rebuild things from the ground up.

This begs the question: if we were to rebuild our digital world, would we make a carbon copy of what already exists, or would we rebuild from scratch? There are two sides to this coin: could we rebuild our existing systems, and would we want to rebuild our existing systems.

Tackling the former first: modern software is built upon many, many layers of abstraction. In a post-apocalyptic world, would we even be able to use much of the software with our infrastructure/lower-level services wiped out? To take a random, perhaps tenuous example, say we had to rebuild our networks, DNS, ISPs, etc. from scratch. Inevitably behavior would be different, nodes and information missing, and so software built on layers above this might be unstable or insecure. To take more concrete examples, this problem is greatest where open-source software relies on closed-source infrastructure — AWS, 3rd party APIs, and even low-level chip designs that might not have survived the disaster. Could we reimplement existing software stably on top of re-hashed solutions?

The latter point — would we want to rebuild our software as it is now — is more subjective. I have no doubt every Hackaday reader has one or two things they might change about, well, almost everything but can’t due to existing infrastructure and legacy systems. Would the opportunity to rebuild modern systems be able to win out over the time cost of doing so?

Finally, you may have noticed that software is evolving rather quickly. Being a web developer today who is familiar with all the major technologies in use looks pretty different from the same role 5 years ago. So does archiving a static snapshot of code make sense given how quickly it would be out of date? Some would argue that throwing around numbers like 500 to 1000 years is pretty meaningless for reuse if the software landscape has completely changed within 50. If an apocalypse were to occur today, would we want to rebuild our world using code from the 80s?

Even if we weren’t to directly reuse the archived code to rebuild our world, there are still plenty of reasons it might be handy when doing so, such as referring to the logic implemented within it, or the architecture, data structures and so on. But these are just my thoughts, and I want to hear yours.

Was This a Useful Thing to Do?

The thought that there is a vault in the Arctic directly containing code you wrote is undeniably fun to think about. What’s more, your code will now almost certainly outlive you! But do you, dear Hackaday reader, think this project is a fun exercise in sci-fi, or does it hold real value to humanity?

Does PHP Have A Future, Or Are Twenty Five Years Enough?
https://hackaday.com/2020/06/29/does-php-have-a-future-or-are-twenty-five-years-enough/
Mon, 29 Jun 2020

In June, 1995, Rasmus Lerdorf made an announcement on a Usenet group. You can still read it.

Today, twenty five years on, PHP is about as ubiquitous as it could possibly have become. I’d be willing to bet that for the majority of readers of this article, their first forays into web programming involved PHP.

Announcing the Personal Home Page Tools (PHP Tools) version 1.0.

These tools are a set of small tight cgi binaries written in C.

But no matter what rich history and wide userbase PHP holds, that’s no justification for its use in a landscape that is rapidly evolving. Whilst PHP will inevitably be around for years to come in existing applications, does it have a future in new sites?

Before we look to the future, we must first investigate how PHP has evolved in the past.

The Beginnings

Rasmus Lerdorf initially created PHP as a way to track users who visited his online CV. Once the source code had been released and the codebase had been re-written from scratch a sizeable number of times, PHP was enjoying some popularity, reportedly being installed on 1% of all domains by 1998. At this point, the language looked nothing like we know today. It was entirely written within <!-- html comments -->, with syntax noticeably different to modern versions.

Enter Zeev Suraski and Andi Gutmans, who were using PHP to try to build a business but found it lacking in features. Collaborating with Rasmus, PHP was once again re-written and released as PHP 3.0. Now we’re getting somewhere, with PHP 3 installed on an estimated 10% of domains at the time. This is also the point where the meaning of PHP changed from Personal Home Page to everyone’s favorite recursive acronym, “PHP: Hypertext Preprocessor”. This version and period is generally seen as the time in which PHP cemented its future status. Between PHP 3 and 4, phpMyAdmin was created, Zeev and Andi mashed their names together and founded the PHP services company Zend, and the venerable elephant logo appeared.

The rest is history: shortly after PHP 4 came Drupal; in 2003 we got WordPress; then in 2004 along came a student at Harvard named Mark.

Facebook and PHP

Facebook famously started as a PHP site. But when thousands of users became millions, and millions were beginning to look like billions, there were growing pains.

In particular, PHP was (and still is) a scripting language. Great for developer productivity, not so great for resource efficiency. So in 2008, Facebook began work on HipHop for PHP, a transpiler. Very simply, it parsed PHP, transpiled it to C++, then compiled the resulting C++ into x64. This was no mean feat given that PHP is weakly-typed and dynamic. But the results speak for themselves: a 50% reduction in CPU load.

I’m sure you’re imagining the horror of working as a developer at Facebook using this process. Making a change to the PHP code, running the transpiler, then compiler, drumming your fingers, running the executable and finding the problem you need to go back and fix. That’s a pretty long iteration cycle, which is why Facebook also developed HPHPi, an interpreter that does the same job as the transpiler/compiler (HPHPc), but just to be used for development. As you can imagine, keeping the two projects in sync was an almighty headache, so in 2011 they developed HHVM, the HipHop Virtual Machine.

HHVM is a PHP runtime. It uses JIT (just-in-time) compilation to provide the best of both worlds. It’s pretty cool, and you can read more in Facebook’s own blog post if you’re interested. The next big step came in 2014, with the invention of Hack, a language specifically built for HHVM. It’s both a superset and subset of PHP, adding optional type annotations and extra features such as asynchronous architecture. It also helps make HHVM’s JIT more efficient by enabling it to optimise with confidence using the specified type hints. Soon, new code at Facebook was written in Hack, with existing code being converted over time. Both Hack and HHVM are open source, and actively maintained today.

Does the fact that Facebook found PHP in its native form unusable at scale mean that it’s a badly engineered language? No, I don’t think so. I don’t believe any of the options which were around at the time had been created for the scale or specifics which Facebook required. However, that doesn’t stop people using it against PHP.

The Hate

Within the wider software community, as PHP became larger it inevitably drew fire from a growing group of cynics. Except, PHP objectively speaking gets more hate than most other languages. According to the recent 2020 Stack Overflow Developer Survey, PHP is the sixth most dreaded language. Why?

I don’t want to get into technical minutiae here, but if you do, the 2012 blog post “PHP: a fractal of bad design” is the bible for PHP haters. Some of the problems it mentions have since been fixed, but many haven’t (e.g. why is there no native async support in 2020?).

Stack Overflow 2020 survey of most dreaded languages

I think more general problems lie in the philosophy of the language. It’s a tool for a fairly narrow domain, implemented in a complex way. In an ideal world, if an application must be complex, the complexity should be visible to the developer in user code, not the language itself. You don’t need a complex tool to create complex projects. When I say PHP is complex, I’m not saying it’s hard for beginners to use (quite the opposite in fact), I’m saying it has inconsistent naming conventions and a lot of very specific functions, both of which make it easy to create errors which aren’t caught until runtime. But are these simply properties of PHP’s age, to be expected? Whilst perhaps a factor, it certainly isn’t the reason for the hate. After all, Python was created in 1989, six years before PHP, and comes in as the 3rd most loved language in the Stack Overflow survey, as well as being one of the fastest-growing languages today.

When it comes to security, there’s some debate as to whether the above-average number of vulnerabilities on PHP sites is the fault of the language or the site developers. On the one hand, a language designed to appeal to a broad range of people, including non-programmers who produce sites with code hacked together from decades-old tutorials, will always have issues, no matter the merit of the language itself. On the other hand, PHP has attempted to fix basic security issues in questionably convoluted ways, for example tackling SQL injection with an ever-growing family of escaping functions — addslashes(), mysql_escape_string(), mysql_real_escape_string(), pg_escape_string() and so on — rather than a single, consistent mechanism. Add this to its labyrinthine error/exception handling (yes, errors and exceptions are different), and it’s easy to make mistakes if you’re not well versed in the nuances of the language. The number of sites running old, unsupported versions of PHP past their End Of Life continues to be staggeringly large, so PHP sites will continue to be low-hanging fruit for hackers for years to come.

Be this as it may, I’m not convinced that the problems the language has are as large as many would make out. Despite there being reasonable grounds for complaint about PHP, it seems to me that much of the stigma is absorbed because it’s fashionable, rather than reasoned by individuals.

The Future

This author is well aware of the irony of typing critiques of the language into a page with post.php in the address bar. But this isn’t about existing sites. I don’t think even the most ardent pitchfork mobs would suggest that we re-write all existing sites made with PHP. The question is, in June 2020, if I want to create a new website, should PHP be an option I consider?

There is no doubt that the current web development trends are setting a course for Single Page Applications – where your browser never reloads, but navigations occur from Javascript re-rendering the page using data from lightning-fast API calls (eg: browsing GitHub or Google Drive). There is an ever-growing ecosystem of Javascript libraries, frameworks, and tools for building reactive and performant applications in the browser — React and Vue being the most popular.

Ultimately, PHP is for server-side rendering. That’s fine for most sites and the best option for many. But if you’re building something new in 2020, you have to accept that this brings limitations. And whilst PHP-style server-side rendering isn’t dead (did everyone forget about SEO?), modern sites are likely to be Isomorphic, that is, able to render the same Javascript on server and client, using frameworks such as Next.js (for React) or Nuxt.js (for Vue), putting PHP out of business on the server.

But we can’t ignore the fact that PHP is evolving too. Laravel, self-publicised as “The PHP Framework for web artisans”, provides an MVC architecture for creating PHP applications safely and quickly. It’s held in high esteem by the community, and enjoys active and rapid development. Additionally, PHP 8 is coming out later this year, with a bunch of new features (many of which will look familiar from the Facebook section), such as a JIT, Union types, and improved errors.

So, happy twenty-fifth birthday PHP, you are endlessly quirky and will undoubtedly endure for many more years. You’ve empowered many people and played a key role in the rise of the web. But don’t be too upset if people are looking elsewhere for the future, it’s 2020 after all.

What Does GitHub’s npm Acquisition Mean For Developers?
https://hackaday.com/2020/03/26/what-does-githubs-npm-acquisition-mean-for-developers/
Thu, 26 Mar 2020

Microsoft’s open-source shopping spree has claimed another victim: npm. [Nat Friedman], CEO of GitHub (owned by Microsoft), announced the move recently on the GitHub blog.

So what motivated the acquisition, and what changes are we likely to see as a result of it? There are some obvious upsides and integrations, but these will be accompanied by the usual dose of skepticism from the open-source community. The company history and working culture of npm has also had its moments in the news, which may well have contributed to the current situation. This post aims to explore some of the rationale behind the acquisition, and what it’s likely to mean for developers in the future.

What is npm?

Many Hackaday readers will be familiar with npm (Node Package Manager), one of the backbones of the open-source JavaScript community. If you’ve played around with any kind of web or JavaScript project recently, you’ve probably used npm to install and manage dependencies, with it currently servicing 75 billion downloads a month. It is the most popular package manager for JavaScript, and enables re-use and sharing of modules throughout the JavaScript community; it’s what’s responsible for the node_modules folder in your project munching all your disk space.

At its most basic level, npm allows you to download and install JavaScript modules from the online registry, either individually — by running, for example, npm install express — or from a package.json file, which contains details of all a project’s dependencies. If you want to read more about how npm manages dependencies and how its parallels with the Node Module Loader allow some neat simultaneous version loading, npm have written a nice explainer here.
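
For anyone who hasn’t looked inside one recently, a minimal package.json might look something like this (the name and version ranges below are placeholders, not taken from any real project):

{
  "name": "my-hypothetical-app",
  "version": "1.0.0",
  "scripts": {
    "start": "node index.js"
  },
  "dependencies": {
    "express": "^4.17.1"
  },
  "devDependencies": {
    "jest": "^25.1.0"
  }
}

Running npm install reads this file, resolves the version ranges against the registry, and fills node_modules with the results.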

npm is certainly not without criticism or competitors, but most developers are familiar with basic use, and I think would agree that it’s played a vital role in the growth of the JavaScript ecosystem, whether that’s new frameworks, niche modules, Typescript, polyfilling or testing.

What is its history?

npm was started in 2009, by [Isaac Schlueter], who details in a blog post his thoughts on the recent acquisition.

npm Inc is a company, not an entirely open source project. They provide the open-source registry as a free service, and charge a fee for private, commercial packages. It has previously been rumored that there was trouble making ends meet from low quantity, low fee license sales.

As a business, it has previously received venture capital funding, and also brought in new executive management to attempt to dramatically increase revenues. Under new management, numerous employees were dismissed, with many claiming they were dismissed unfairly. Further employees resigned voluntarily, raising questions about company culture and the stability/longevity of npm. We hope that the acquisition by GitHub will relieve the financial pressure on the company and allow it to resolve these issues whilst serving the open-source community more effectively, under stable conditions.

Enter GitHub

In npm’s blog post, [Isaac Schlueter] talks about how an acquisition by GitHub has been on the cards for a while, even going so far as recounting asking the GitHub product lead [Shanku Niyogi] why on earth they hadn’t already bought npm.

Why did it seem so obvious? With the source for so many npm packages hosted on GitHub, and GitHub launching the moderately popular GitHub Packages, it seemed only natural that both could benefit from tighter integration. So what might we see in the future?

Many users of GitHub will be familiar with its automated security alerts for vulnerabilities. When your project contains a dependency that has had a security vulnerability disclosed, GitHub will send you an automated email/notification containing the level of risk, the affected code, and an automatically generated pull request which fixes the issue. This is a pretty neat feature, and this author has been glad of it on numerous occasions. While this works well in theory, in complex projects with many interdependent packages, I’ve found that the automated security fixes can sometimes awkwardly bump package versions without fully propagating through the dependency tree, requiring a lot of manual hassle to fix.

I’m very hopeful that this acquisition can bring about a security update experience with much tighter integration with npm, whether that’s making the automated updates more intelligent and frictionless for the developer, or making it easier for maintainers to disclose vulnerabilities and release automated GitHub patches faster. In GitHub’s blog post announcing the acquisition, they state their commitment to using the opportunity to improve open source security, and their aim to “trace a change from a GitHub pull request to the npm package version that fixed it”.

As far as GitHub Packages is concerned, the aim is to move all private packages from npm’s paid service to GitHub Packages, with the view of making npm an entirely public package repository.

Even with these obvious benefits in mind, there is still some uncertainty as to whether the move was driven and initiated by GitHub for these reasons, or whether it’s because of the value it provides to Microsoft as a whole instead.

What npm means to Microsoft

Microsoft’s appetite for open source is growing. It seems like yesterday that we wrote about Microsoft acquiring GitHub, and despite all the speculation on its future at the time, it only seems to have grown stronger with the extra resources available. Since the acquisition, we’ve notably seen the release of free unlimited private repos, GitHub Security Lab and GitHub Actions, all welcome and overdue features that have been well-received in the open-source community. GitHub mobile apps for iOS and Android have also been released in the past few days, attracting a few raised eyebrows for not being open source.

A cynic might say that acquiring npm is a cheap way of Microsoft trying to win some sentiment from the open-source community, and of course, that may be a factor, but the move will have technical benefits for them too. Microsoft are increasingly big users of JavaScript, and are invested in the ecosystem. Notably, they’ve created Typescript, and they need a stable and solid package repository as much as any group of developers.

It’s yet to be determined whether npm will have any integration with any of Microsoft’s offerings, or if it’s purely of use to GitHub. At this stage, it’s hard to say, though it’s telling that GitHub announced the move along with their strategy, whilst Microsoft has stayed quiet on the topic.

Conclusion

I don’t think anyone can deny that the open-source JavaScript development experience has the potential to become significantly smoother when the largest source repository becomes more integrated with the largest package repository. It remains to be seen how these improvements are implemented, whether they’re made available for public/private users, and how kind they’ll be to open-source competitors, but only time will tell.

Continuous Integration: What It Is And Why You Need It
https://hackaday.com/2020/01/06/continuous-integration-what-it-is-and-why-you-need-it/
Mon, 06 Jan 2020

If you write software, chances are you’ve come across Continuous Integration, or CI. You might never have heard of it – but you wonder what all the ticks, badges and mysterious status icons are on open-source repositories you find online. You might hear friends waxing lyrical about the merits of CI, or grumbling about how their pipeline has broken again.

Want to know what all the fuss is about? This article will explain the basic concepts of CI, but will focus on an example, since that’s the best way to understand it. Let’s dive in.

What is CI anyway?

The precise definition of Continuous Integration refers to the practice of software developers frequently checking in their code, usually multiple times a day in a commercial setting, to a central repository. When the code is checked in, automated tests and builds are run, to verify the small changes which have been made. This is in preference to working on a ginormous slab of code for a week, checking it in, and finding out it fails a large number of tests, and breaks other people’s code.

Whilst this is a valid definition, colloquially CI has become synonymous with the automation part of this process; when people refer to CI, they are often referring to the tests, builds, and code coverage reports which run automatically on check-in.

Additionally, CI is often lumped together with its sister, Continuous Deployment (CD). CD is the practice of deploying your application automatically: as soon as your code has been pushed to the correct branch and tests have passed. We’ll talk more about this soon.

Case study – a simple API

I’m going to save any more explanation or discussion of the merits of CI until after we’ve seen an example, because this will make it easier to picture what’s going on.

The aim of this example is to make a very simple Python application, then use CI to automatically test it, and CD to automatically deploy it. We’re going to use GitLab CI, because it’s a neat, integrated solution that is easy to setup. You can view the finished repository containing all the files here.

Let’s start by creating a Python file containing our main application logic. In this case, it’s some string processing functions.

""" web/logic.py. Contains main application code. """

def capitalise(input_str):
    """Return upper case version of string."""
    return input_str.upper()

def reverse(input_str):
    """Return reversed string."""
    return input_str[::-1]
    

Let’s also add some extremely basic tests for this code:

""" test_logic.py. Tests for main application code. """

from web import logic

def test_capitalise():
    """Test the `capitalise` function logic."""
    assert logic.capitalise("hackaday") == "HACKADAY"

def test_reverse():
    """Test the `reverse` function logic."""
    assert logic.reverse("fresh hacks") == "skcah hserf"
    assert logic.reverse("racecar") == "racecar"

Ok, now that we’ve made our main application code, let’s expose it over an API. We’ll use Flask for this. Don’t worry about meticulously reading this, it’s just here to serve as an example, and is shown here for context.

""" web/api.py. Expose logic functions as API using Flask. """
from flask import Flask, jsonify

import web.logic as logic

app = Flask(__name__)


@app.route('/api/capitalise/<string:input_str>', methods=['GET'])
def capitalise(input_str):
    """ Return capitalised version of string. """
    return jsonify({'result': logic.capitalise(input_str)})


@app.route('/api/reverse/<string:input_str>', methods=['GET'])
def reverse(input_str):
    """ Return reversed string. """
    return jsonify({'result': logic.reverse(input_str)})


if __name__ == '__main__':
    app.run()

Note that we should test the API as well (and Flask has some nice ways to do this), but for conciseness, we won’t do this here.
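
For the curious, here’s a sketch of what such a test could look like using Flask’s built-in test client; this file isn’t part of the example repository, it’s just an illustration:

""" test_api.py. Illustrative API tests using Flask's test client (not in the example repo). """

from web.api import app


def test_capitalise_endpoint():
    """The capitalise endpoint should return the upper-cased string as JSON."""
    client = app.test_client()
    response = client.get('/api/capitalise/hackaday')
    assert response.status_code == 200
    assert response.get_json() == {'result': 'HACKADAY'}


def test_reverse_endpoint():
    """The reverse endpoint should return the reversed string as JSON."""
    client = app.test_client()
    response = client.get('/api/reverse/racecar')
    assert response.status_code == 200
    assert response.get_json() == {'result': 'racecar'}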

Now that we have an example application setup, let’s do the part we’re all here for and add a CI/CD pipeline to GitLab. We do this by simply adding a .gitlab-ci.yml file to the repository.

In this explanation we’re going to walk through the file section by section, but you can view the full file here. Here’s the first few lines:

image: python:3

stages:
    - analyse
    - test
    - deploy

This sets the default Docker image to run jobs in (Python 3 in this case), and defines the three stages of our pipeline. By default, each stage will only run once the previous stage has passed.

pylint:
    stage: analyse
    script:
        - pip install -r requirements.txt
        - pylint web/ test_logic.py

This is the job for our first stage. We run pylint as an initial static analyser on the code to ensure correct code formatting and style. This is a useful way to enforce a style guide and statically check for errors.

pytest:
    stage: test
    script:
        - pip install -r requirements.txt
        - pytest

This is our second stage, where we run the tests we wrote earlier, using pytest. If they pass, we continue to our final stage: deployment.

staging:
    stage: deploy
    script:
        - apt-get update -qy && apt-get install -y ruby-dev
        - gem install dpl
        - dpl --provider=heroku --app=hackaday-ci-staging --api-key=$HEROKU_API_KEY  

production:
    stage: deploy
    only:
    - master
    script:
        - apt-get update -qy && apt-get install -y ruby-dev
        - gem install dpl
        - dpl --provider=heroku --app=hackaday-ci-prod --api-key=$HEROKU_API_KEY

Our aim here is to deploy the API onto some kind of server, so I’ve used Heroku as the platform, authorised with an API key.

This last stage is slightly different from the others because it contains two jobs that deploy to two places: staging and production. Note that we deploy to staging on any commit, but we only deploy to production when we push to or merge into master. This means that we can check, test and use our live app in staging after any code change, but the production app isn’t affected until our code is merged into master. (In a larger project, it often makes more sense to deploy to staging on master and only deploy to production when a commit is tagged.)
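
That tag-based variant is only a small change to the production job — a hypothetical sketch, not part of the repository above:

production:
    stage: deploy
    only:
        - tags
    script:
        - apt-get update -qy && apt-get install -y ruby-dev
        - gem install dpl
        - dpl --provider=heroku --app=hackaday-ci-prod --api-key=$HEROKU_API_KEY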

And that’s it! In less than 40 lines we’ve defined a completely automated system to check and deploy our code. We are rewarded by our pipeline showing up in GitLab as below:

Additionally, the .gitlab-ci.yml configuration file which specifies what to automate is usually also version-controlled, so that if the CI pipeline evolves, it evolves alongside the relevant version of your code.

Why it’s useful

All manner of tasks can be automated using CI, and can allow you to catch errors early and fix them before they propagate technical debt in the codebase.

Common tasks for larger Python projects might be to test our code for compatibility with different Python versions, build a Python module as a wheel, and/or push it to PyPi. For projects using compiled languages, you could automatically build your binaries for all your target platforms. For web development, it’s easy to see the benefit of automatically deploying new code on a server once certain conditions have been met.
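
Testing against multiple Python versions, for instance, can be as simple as duplicating the test job with a different Docker image — a hypothetical sketch in the same GitLab CI style as above:

# Run the same tests against two interpreter versions
pytest-3.7:
    stage: test
    image: python:3.7
    script:
        - pip install -r requirements.txt
        - pytest

pytest-3.8:
    stage: test
    image: python:3.8
    script:
        - pip install -r requirements.txt
        - pytest

Both jobs run in parallel within the test stage, and the pipeline only moves on if every version passes.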

Furthermore, part of the reason that CI is so powerful is its close relation to version control. Every time that code is pushed to any branch in a repository, tests and analysis can run, which means that people who control master or protected branches can easily see if code is safe to merge in.

Indeed, whilst CI is most satisfying when the pipeline is full of ticks, it is most useful when it looks like this:

This means that the tests failed, and as a result, the broken code was not deployed. People can clearly see not to merge this code into important branches.

Conclusions: do you need CI?

CI/CD is definitely more useful in some cases than others. But if you’re writing any code at all, you can save yourself time by writing tests for it. And if you have tests, why not run them automatically on every commit?

I can personally say that every time I’ve set up a CI pipeline, not only has it saved me time, but at some point or another it got me out of a scrape by catching broken code. I’d wager it will do the same for you.
