I've been planning to write this post ever since Google first announced a billing system for App Engine. In the meantime, I thought it would be more useful to point out some problems with the system, hoping that the team will reconsider some of the quotas and the way the billing system is supposed to work.
The Google App Engine platform has been offered from the beginning under specific CPU, storage, and API quotas. Leaving aside the tons of applications deployed on App Engine for playing or testing purposes, or just for the coolness factor, I strongly believe that others evaluated the alternatives and picked the platform to develop real applications, and while doing so they took the original limitations/quotas into account.
Later on, Google previewed what would become, during February, the billing system. I don't think this was a surprise to anyone (in fact, I would have expected to see the tons of open bugs fixed, or at least explained, before seeing a billing system, but this is probably just how I see things).
CPU time, defined as
The total processing time for handling requests, including time spent running the app and performing datastore operations. This does not include time spent waiting for other services, such as waiting for a URL fetch to return or the image service to transform an image.
is part of the billing system. As you can see right from the definition, it includes internal API time. Basically, this means that you'll have to pay for something you have no control over (a simple parallel to other pay-per-use services would be Amazon asking you to pay for its sporadic hardware replacements).
See note [1]
Meanwhile, the Google App Engine forums were (and still are) full of reports of internal infrastructure misbehavior, with a clear impact on the applications' reported performance. I should also mention that there were cases when Google's own monitoring tools did not even catch the issues, not to mention that the team fails to provide any real feedback about these problems.
Last but not least, Google App Engine is also reducing the original CPU time quota based on so-called resource usage statistics for a recent 7-day period. But they fail to mention whether the average was computed based on active apps only or using all the test applications that cannot be deleted, or how recent these statistics are, considering I've seen the same number mentioned a couple of months ago.
To summarize, I think billing for CPU time is wrong, and I suggest that the Google App Engine team reconsider it because:
- the terms are not well defined [2]
- it is not clear how they are measured
- there have been repeated problems on the platform, and these impact the reported CPU numbers
- it includes the CPU usage of internal framework API calls, which is not under developers' control
- framework API calls are already billable separately
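To make the objection in the list above concrete, here is a toy Python sketch. The function and the numbers in it are made up for illustration, not App Engine's actual accounting: it only shows that when billed CPU includes time spent inside framework APIs, identical requests cost different amounts whenever the backend slows down.

```python
# Hypothetical model of a billed-CPU metric that, like App Engine's,
# includes time spent inside framework API calls. All numbers are made up.

def billed_cpu_ms(app_cpu_ms, api_internal_ms_per_call, api_calls):
    """Billed CPU = app code time + time spent inside framework APIs."""
    return app_cpu_ms + api_internal_ms_per_call * api_calls

# Same request, same code path: 10ms of app logic plus 15 datastore calls.
healthy = billed_cpu_ms(10, 6, 15)    # backend responsive: ~6ms per call
degraded = billed_cpu_ms(10, 60, 15)  # backend degraded: ~60ms per call

print(healthy, degraded)  # 100 910 -- identical requests, ~9x the bill
```

The developer's code is the same in both runs; only the infrastructure changed, yet the billed figure is nine times higher.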
[1] The picture shows a datastore timeout error for an extremely basic 15-record fetch operation (log record from Jun 7th)
[2] While the cpu_time metric is defined on the Quotas page, the logs include other CPU-related metrics which are not clearly defined.
12 comments:
The biggest issue is charging for calls that time out, imho, which exceed normal CPU usage by 10-1000x for no obvious reason.
Google should make AppEngine time free, and charge users to view ads.
Actually, I don't think I would like to see an ad-supported App Engine version. But I do think there are better ways to bill for the platform, and I'll probably follow up with a proposal.
Utilization of CPU time is more under the control of the programmer than it is under anyone else's control, though; and CPU time is both a scarce and a costly resource. And by the way, Amazon does charge for equipment replacement -- not directly, but actuarially.
On the other hand, it would make sense for Google to charge a fee per use of each system service (i.e. code provided by Google), and exclude the actual CPU time taken by those calls from the CPU time charged for. This would provide incentive for programmers to prefer fewer system calls, and would provide both payment and incentive for Google to fix performance bugs.
wtanksley, very good points. CPU is indeed scarce and costly, and it is considered to be under the programmer's control. However, as pointed out in the post, for Google App Engine that is *not* the case. I am planning a follow-up post commenting on the answer I got from Google and describing my proposal (simply put: bill for API calls, because over those I have clear visibility and control).
You do have control over the API calls: you control when you call them. Beyond that, you're paying for the infrastructure you run on, which is what you do anywhere, including other cloud resources.
Google's approach, from what I've seen, is to give us more control over the API calls. Have you seen the task queues that are coming? You'll be able to defer writes while keeping your data available, something I've personally worked on with the gaeutilities project for sessions. I'm hoping I'll never need to release the version currently in trunk, and will instead be able to rewrite it to use the task queues.
The datastore timeouts are annoying. It's frustrating that Google says they will always be there. But overall I still believe their CPU quota system is fair.
Joe, I never said I don't have control over API calls (I bet there are quite a few apps whose developers don't know that some things trigger additional calls, but that's a different story). What I said is that you don't have control over the CPU time spent within those API calls, and that is wrong. Basically, you can end up paying completely different amounts for exactly the same calls just because the framework/infrastructure is buggy, misbehaving, or simply underperforming.
There is a major difference between what you pay for on App Engine and what you pay for on AWS: in App Engine's case you pay for infrastructure + framework behavior, while on AWS you pay only for infrastructure.
Can you explain on what basis you consider their CPU quota fair? I have log samples showing exactly the same calls with CPU ranging from 100ms up to 7000ms. That is completely unfair and wrong (simply put, it means that for the same request I may pay 70 times more).
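The 70x figure is straightforward arithmetic; a tiny sketch makes it explicit. The per-CPU-hour rate below is hypothetical, chosen only to show the shape of the calculation.

```python
# Hypothetical rate, for illustration only -- not App Engine's actual pricing.
RATE_PER_CPU_HOUR = 0.10  # dollars per billed CPU-hour

def request_cost(cpu_ms, rate_per_hour=RATE_PER_CPU_HOUR):
    """Cost of one request when billing is proportional to billed CPU ms."""
    return cpu_ms / 1000.0 / 3600.0 * rate_per_hour

best = request_cost(100)    # the fast samples from the logs
worst = request_cost(7000)  # the slow samples: same call, same code

print(worst / best)  # ~70: the same request can cost 70 times more
```

Whatever the actual rate is, it cancels out of the ratio: the cost spread between identical requests is exactly the spread in their billed CPU.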
I currently have my own task queue implementation that is behaving exactly as I want and I will probably post it pretty soon.
This is interesting.
Consider this situation: you load some data into or out of the datastore API. Since Google uses 7200rpm disks in massive numbers, the hard drive response time is typically 10ms per IO. If they time the DataStore call against the hard drive latency, then it is bad, bad business. That means the CPU time spent waiting for the slow hard drives is billed as used CPU, whereas on their servers it just context-switches to another thread doing more work.
If their datastore backend gets busy, and latency spikes to 100ms or so, every Google App Engine user using the DataStore API at that time is going to be charged 10x more CPU hours?
That's exactly what is happening.
My suggestion was to replace all wall clock-like metrics with something that doesn't depend on their infrastructure behavior. The simplest model is one based on API usage. But this is Google we are talking about, so I frankly don't expect anything to change.
Alex: Thanks for bringing this to my attention. Google's way of counting App Engine CPU time is a potential conflict of interest with its users.
The best way to handle this is probably to split up CPU time vs. DataStore API and price them separately. Pure CPU time can be wall clock based. DataStore API charges should be transactional, based on size, so that a slowdown of the hard drives in the backend doesn't create a conflict of interest.
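A minimal sketch of that split model, with all rates made up purely to show the shape of the idea: pure CPU is billed on its own, while datastore work is billed per operation and per byte, so backend latency never enters the bill.

```python
# Hypothetical split-billing model (all rates are invented for illustration):
# app CPU and datastore work are priced separately, and datastore charges
# depend on what was asked for, not on how long the backend took.

CPU_RATE_PER_MS = 0.000001       # hypothetical $ per ms of app CPU
DATASTORE_OP_RATE = 0.00001      # hypothetical $ per datastore operation
DATASTORE_BYTE_RATE = 0.0000001  # hypothetical $ per byte transferred

def request_bill(app_cpu_ms, datastore_ops, bytes_transferred):
    cpu_charge = app_cpu_ms * CPU_RATE_PER_MS
    ds_charge = (datastore_ops * DATASTORE_OP_RATE
                 + bytes_transferred * DATASTORE_BYTE_RATE)
    return cpu_charge + ds_charge

# The same 15-entity fetch is billed identically whether the backend
# answers in 10ms or 7000ms: latency is simply not an input to the bill.
print(request_bill(app_cpu_ms=10, datastore_ops=15, bytes_transferred=4096))
```

The design choice is that every input to `request_bill` is something the developer can see and control, which is exactly the property the wall-clock metric lacks.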
Hey Alex:
On second thought, this open letter to Google isn't going to get you the leverage to get what Google App Engine users want, because not many people read your blog. Why don't you send a nice email to Michael Arrington of TechCrunch to see what he can do? (He fights for the little guys.)
You are probably right, so please feel free to reach out to TechCrunch or any major blog. Unfortunately I don't have any contacts there to do it myself.
Thanks