Running Concrete in a serverless environment

Hi everyone,

First thing, take a quick glance at this site:
https://d39v4abs5ngx4v.cloudfront.net

There’s your pretty default V9 install, nothing unusual, the search works, etc.

There might be a few odd pages that don’t work, but overall it should be fairly fast and seem like a normal install.

I can log into this and edit it fine, add pages, edit blocks, visit the dashboard, etc.

But the interesting thing is that this test copy is not running on a server as such; it’s running in a ‘serverless’ configuration.

If you’re not familiar with what ‘serverless’ means: instead of spinning up a server, running something like Apache/Nginx, and having that server constantly running waiting for requests, you deploy your code to a service that effectively runs it only when a request comes in.

In this particular case this site is running on Amazon Web Services, using their ‘Lambda’ service to actually run the PHP (there’s more to it than that, but that’s the main thing in play here).

The benefits then are:

  • you don’t actually have a server to monitor and maintain. It won’t go down due to lack of disk space, crashed processes, filesystem corruptions, etc
  • it can handle a large number of requests simultaneously, it scales automatically (and you can also have scaling databases too)
  • you have effectively unlimited storage space
  • security is arguably higher, as there’s no server to exploit
  • costs can be quite low, as you only pay for the time your code is running

Now all these points have caveats, and there are quite a number of potential drawbacks, but that’s the overview at least.

A PHP framework like Laravel works great serverless, and we’ve had a lot of success running some mission-critical apps this way. I’ve actually moved some from normal servers over to serverless and haven’t looked back.

But I’ve often wondered if Concrete could be run this way, especially since Concrete and Laravel share a lot of the same DNA, design- and code-wise. Hence this little experiment.

So how does this work?

There’s a framework simply called ‘Serverless’, in which you describe what you’d like to run in a YAML file (resources like S3 buckets, and even databases), and it then automates the process of setting this up somewhere like AWS.

Then there’s a project called Bref, which provides a PHP runtime for Lambda. It’s effectively a plugin for Serverless, and the two together allow you to run PHP scripts serverless. (If you are interested in serverless PHP, start here, and read the documentation thoroughly.)

And finally I’ve used a related project called Lift, which makes automating some of the common patterns of setting up a website even easier. It was after seeing what Lift did that I realised it might be possible to actually run Concrete serverless.

The final script is actually only about 50 lines, and most of that is copy and paste config that is quite generic and easy to read.
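For anyone curious, here’s a minimal sketch of what such a serverless.yml can look like. This is an illustrative reconstruction rather than my exact file: the region, PHP version and asset paths are placeholders, but the overall shape (a Bref-powered function plus a Lift ‘server-side-website’ construct) is the pattern in play.

```yaml
service: concrete-serverless

provider:
    name: aws
    region: us-east-1
    runtime: provided.al2

plugins:
    - ./vendor/bref/bref     # Bref: PHP runtime for Lambda
    - serverless-lift        # Lift: higher-level website constructs

functions:
    web:
        handler: index.php   # Concrete's front controller
        layers:
            - ${bref:layer.php-81-fpm}   # Bref's PHP-FPM runtime layer
        events:
            - httpApi: '*'   # send all dynamic requests to PHP

constructs:
    website:
        type: server-side-website   # CloudFront in front of the Lambda
        assets:                     # static paths served from S3 instead
            '/concrete/js/*': concrete/js
            '/concrete/css/*': concrete/css
            '/concrete/images/*': concrete/images
```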

What it does when it deploys:

  • when the deployment is triggered, it packages up all of Concrete’s code into a zip, pushes it up to Amazon’s Lambda and API Gateway, and configures Bref to actually run the PHP code
  • separately collects static files, such as those in /concrete/js or /concrete/images, and sends them up to an S3 bucket
  • sets up Amazon’s CloudFront (which is like a CDN) to serve the static files from S3, but sends other web requests to Lambda to be run (so it acts much like a normal webserver)

There’s also a database set up for this, but that’s not serverless, just created separately.
I’ve not added a custom domain here, but that’s not hard to do with Cloudfront.

Emailing should be quite easy to set up, as it’s just a case of using Amazon’s emailing system.

However, there are quite a lot of issues to resolve:

  • Think of Lambdas like little instant virtual machines that spin up, run your code, then disappear. They don’t have a persistent filesystem, meaning you can’t write anything to them at all. The only place you can write is /tmp, just for things like processing files, but that disappears at the end of the request. This means I’ve had to configure Concrete’s cache to write to /tmp instead of the /application/files directory (not too tricky)
  • Because sessions write to the filesystem by default, I’ve also had to change sessions to be database driven (very easy, fortunately)
  • Concrete’s File Manager also expects to write files and thumbnails to /application/files, so you have to install an S3 driver to read and write to an S3 bucket instead. Ordinarily this shouldn’t be a problem, but my S3 add-on (which admittedly was written a while ago) seems to still want to write to the filesystem for some reason, so I haven’t got this working yet. In theory it should work, but may take some fiddling.
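For reference, the cache and session changes above can be expressed as a small config override, something along these lines. This is a sketch only: the exact keys can vary between Concrete versions, so check against your install before relying on it.

```php
<?php
// application/config/concrete.php (sketch; keys may differ by version)
return [
    // Lambda's filesystem is read-only apart from /tmp, so point the
    // cache there instead of the /application/files directory
    'cache' => [
        'directory' => '/tmp/cache',
    ],
    // file-based sessions won't survive between invocations,
    // so switch to the database-backed session handler
    'session' => [
        'handler' => 'database',
    ],
];
```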

So then the hard stuff (that might just not be solvable)

  • Any configuration that writes to /application/config simply doesn’t work, as those files can’t be written (and even if they could, the changes would disappear on the next request). Put another way, code is effectively frozen when you deploy an application. That’s fine for something like a Laravel application, where you don’t expect the code to change, but when config is stored in code it becomes impossible to change settings without doing so somewhere else and triggering the serverless deployment again. The solution would be to store all overriding config in the database, but I don’t think that’s currently an option.
  • Concrete has a way to have environment specific config files, which is something I might take advantage of here (but won’t solve the problem above)
  • Not being able to write to the filesystem also means you can’t install new add-ons easily - you have to push them up via the serverless deployment, and then they might install ok, but many will do things like write their own config or trigger the creation of Doctrine files in /config/doctrine, so you quickly hit issues. Turning off Doctrine caching might make this easier, but it’s going to slow things down.
  • New packages you install are going to have their own static resources, js, css and image files, and they need to be specified in the serverless config and pushed up.
  • The sitemap.xml file can’t be written to the server either.

What I’ve actually got is a local Concrete install, one that I first set up, turned all the caching off for, set up the specific configuration and pushed up.

So to make further changes, I have to make these changes locally, push those up and hope it still works.

Now, with the challenges remaining to get everything working as expected, there might be no real value in continuing with this experiment. It might be that Concrete just isn’t quite suited to running serverless.

Perhaps I could imagine a set up where you have a staging copy on a traditional server, where you set things up, add new add-ons, etc, and you then deploy from that to a production serverless environment, where everything is very locked down and highly scalable.

But even if I’ve sort of hit a dead end, I thought this would still be worth sharing, as it might give others some ideas, or they might simply find it interesting.

Cheers
-Ryan


Congratulations on setting up the serverless demo. It is an interesting experiment.

I tried submitting the contact form. Got the success screen. I wonder if that worked?

The remaining issues appear daunting. They suggest that code has to be designed for serverless from the start. Many of my addons use /application/config, so I expect they would fail.

Yes, contact form worked - I can see your submission in the dashboard (emailing is off though, but I’m sure that would work if I hooked it up)

Different ways to save configs do appear to have been discussed here: Generated Overrides on Scaling Environments · Issue #7695 · concretecms/concretecms · GitHub
@andrew suggests it’s possible.
I could imagine a custom Config driver being used to swap out the file reading and writing.

I do see Redis drivers throughout the codebase - that’s an approach that fits serverless, just not something I’d jump to due to the additional cost.

The sitemap I think could be addressed with an add-on that dynamically provides it.

There’s still the broader issue of being able to install stuff, but that’s actually a lesser issue in my mind.

The big picture here, though, is that it’s all well and good to get the PHP running serverless, but it’s not really that special, especially if all we’re doing is then flogging a database harder. For truly large traffic requirements, a standard server with good caching is still likely to be better value for money compared to serverless equivalents.

I just potentially like the idea of being able to deploy Concrete this way for test sites, ones that might get bursts of traffic. Serverless doesn’t solve everything, it’s just another approach.

This is awesome, I personally love serverless stuff and I’m a big fan of Bref. I hadn’t heard of Lift but I do a lot with CDK so that’s pretty neat. I kinda wish it was building easier lambda deploy into CDK rather than integrating CDK with serverless but I understand why he’d do it that way.

I actually did this a couple years ago using Bref and encapsulating absolutely everything into lambda: https://lambdac5.p.kor.vin/ In the two years this has been up I haven’t had to pay a penny for it (Though it doesn’t really get traffic so it’s not a great measure if you have a high traffic site)

To make this work with only lambda I:

  • Used Bref’s runtime same as you
  • Serve js/css/image assets using PHP
  • Embed a sqlite database directly into the lambda zip
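Serving assets from PHP can be as simple as a small passthrough in front of the files bundled into the Lambda zip. Here’s a hedged sketch of that idea (not the actual code from that site; the paths and extension map are illustrative):

```php
<?php
// asset passthrough (sketch): serve a bundled static file from inside
// the Lambda zip with a sensible Content-Type and a long cache lifetime.
$base = __DIR__ . '/concrete';   // bundled asset root inside the zip
$uri  = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
$path = realpath(__DIR__ . $uri);   // e.g. /concrete/js/app.js

// refuse anything that resolves outside the bundled asset directory
if ($path === false || strpos($path, $base . '/') !== 0 || !is_file($path)) {
    http_response_code(404);
    exit;
}

$types = [
    'js'  => 'application/javascript',
    'css' => 'text/css',
    'png' => 'image/png',
    'jpg' => 'image/jpeg',
    'svg' => 'image/svg+xml',
];
$ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
header('Content-Type: ' . ($types[$ext] ?? 'application/octet-stream'));
// the bundle is immutable between deployments, so cache aggressively
header('Cache-Control: public, max-age=31536000');
readfile($path);
```

Put a CloudFront distribution in front and these responses get cached at the edge anyway, which is why skipping S3 entirely stays snappy.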

If you have a site that doesn’t actually need to store changes to state, but does need dynamic server-side functionality, embedding everything like I did is an extremely cheap way to do it. If I were actually wanting this to be a production site, I’d probably do two things differently:

  1. Uploaded files would go into an EFS. I’d potentially also put config php files in there too, I’d just have to sync them out to the ephemeral filesystem before parsing them with PHP
  2. Database would be a serverless RDS instance. I didn’t do this at the time because serverless v1 had a very slow ramp up

So then the hard stuff (that might just not be solvable)

I think there are some easy answers to these items

  • Config is ephemeral
    • There’s a redis config driver in the core that makes it relatively easy to store config in redis
    • EFS is also supported on lambda, but it’s slow for files that need to be accessed on every request. One could just store config in the EFS and eat the performance hit, or they could have the lambda sync config out from EFS into ephemeral storage when the lambda service starts.
    • Environment variables are easy to set for lambdas so you could also just have config files like 'foo' => $_ENV['SOMETHING'] and just set SOMETHING = 'some value' in the lambda environment configuration
  • Adding new packages is hard
    • If you’re doing this much you could consider using composer based concrete and avoiding using the marketplace’s autoinstall anyway
  • Packages have public assets that would need to be synced to s3
    • This is only true if you require these assets be served by s3. Using PHP to serve them and putting the lambda behind cloudfront keeps things snappy and avoids the need to split public assets out
  • The sitemap can’t be written
    • Sitemap files can just be written straight to - and served from - EFS along with any uploaded files.
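To illustrate the environment-variable approach: database credentials normally live in application/config/database.php, and they can be pulled from the Lambda environment instead of being hard-coded. A sketch (the DB_HOST etc. variable names are placeholders you’d set in the Lambda configuration, not anything Concrete defines):

```php
<?php
// application/config/database.php (sketch) — read settings from the
// Lambda environment so nothing ever needs writing to disk.
return [
    'default-connection' => 'concrete',
    'connections' => [
        'concrete' => [
            'driver'   => 'c5_pdo_mysql',
            'server'   => $_ENV['DB_HOST'],
            'database' => $_ENV['DB_NAME'],
            'username' => $_ENV['DB_USER'],
            'password' => $_ENV['DB_PASSWORD'],
            'charset'  => 'utf8mb4',
        ],
    ],
];
```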

For truly large traffic requirements, a standard server with good caching is still likely to be better value for money compared to serverless equivalents.

I’m not sure I agree with this. You certainly pay a premium for the right to use serverless but the savings you get from the following outweigh the cost for high traffic use cases I’ve seen.

  • Not paying for unused resources. Lambda and Aurora Serverless scale up and down based on traffic and you only pay for the scale you’re actively at. With a traditional server you are always paying the cost of peak traffic (or you’re incurring downtime when that peak traffic hits)
  • Not paying time or money for someone to manage a server like running updates and maintaining compliance.
  • Not incurring downtime due to excessive traffic. There’s obviously a limit to everything and if you’re using lambda you should be smart about your limits but for the most part lambdas scale much faster and much more safely than spinning up ec2 replicas and managing load balancer pools.

Sounds really interesting!

I know we do support Redis/alternative config options, we should try and dig up some code samples to share here as well as in the documentation site. I think people would find those interesting.

I think we ought to explore integrating our S3 storage location type into the core. We have a custom storage location based off of the one Mnkras wrote and put in the marketplace - much like Recaptcha these types of things probably ought to be easier to get started with.

I definitely think most of what you’re doing should err on the side of fewer updates in the live environment. That might mean checking in your config files and simply disallowing config from being changed from the Dashboard – not ideal in a site where the editors are less tech savvy but perhaps an option. Certainly for things like packages, etc… you’re not going to get the Concrete marketplace integration you’d get elsewhere - you should download those packages ahead of time and check them into your app bundle or deliver them via composer like Korvin mentions.

Really interesting work, and exciting!


I’m exploring creating an S3 storage adapter using a lighter-weight S3 library, AsyncAws S3 Adapter - Flysystem, instead of the standard Amazon SDK. The Amazon SDK is hugely bloated - it’s something like 23 megs.

I’ve haphazardly put together a new file adapter using this library, haven’t tested it yet, but will do soon.

Where I’m now thinking this deployment approach could be valuable is for test and demo sites, rather than production sites. I have a few sites around the place I use just for live testing or demoing to clients. Not having to have those on an actual server could be valuable.

I could imagine several of these sites being deployed, potentially sharing a single database instance. They’d still have the ability to log in and make edits, use functionality, upload files, etc, but they’d effectively be pre-configured for one purpose/demo. Where it then also gets interesting is that the database could be reset quite easily, really just by re-importing a pre-made database dump and clearing an S3 bucket. If packaged up, that might even be achievable via a button in the dashboard.
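As a sketch, that reset could be little more than a two-line script. Everything here is a placeholder (the env variable names, the dump filename, the bucket layout), and it assumes the mysql and AWS CLIs are available wherever it runs:

```shell
#!/bin/sh
# reset-demo.sh (sketch): restore the pre-made database dump, then
# clear the uploaded files from the demo's S3 bucket.
# DB_HOST, DB_USER, DB_PASSWORD, DB_NAME and DEMO_BUCKET are
# placeholder environment variables, not real names from this setup.
mysql -h "$DB_HOST" -u "$DB_USER" -p"$DB_PASSWORD" "$DB_NAME" < demo-baseline.sql
aws s3 rm "s3://$DEMO_BUCKET/files/" --recursive
```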

A few code samples would be great, even if it’s just for suggestions on how to tackle things.

If you are creating a Concrete storage adapter that uses a Flysystem adapter, could it potentially be a generic Concrete storage adapter that could front for any other Flysystem adapter?

Is there further mileage in the other components used to create a serverless Concrete instance, beyond going serverless?

@JohntheFish I think all the adapters are already effectively Flysystem adapters, with, I think, a generic Concrete-specific interface as part of that setup (but happy to be corrected)

There’s a lot in common between load-balanced systems and serverless, at least in the way you consider persistent storage. Being able to make a tiny config change and store session data in the database is a great option to have for example, even if it’s not used that often.

So similarly, potentially being able to flick a switch that says something along the lines of ‘don’t read and write your config files to the filesystem, write to the DB instead’ would also be really handy. That’s where something like Redis comes in, but I could see it being good to have all writable data in a database. But it’s easy for me to say that; it would be a lot of work to implement for just some edge cases.

Something like an S3 adapter, if fully/officially supported, is quite valuable I believe, for sites where you might have clients upload very large resources like videos. The adapters themselves aren’t too difficult to create (I pushed one up to github here for example), but I think it’s actually the flexibility of using those within Concrete’s file manager where the most benefit could come (for example, being able to upload directly to a particular filesystem, rather than having to move files or change the default).