No more head-banging over setting up infrastructure

Since 2006, AWS has been changing how the computing industry provisions infrastructure, helping many firms, large and small. Its cloud services, such as S3, DynamoDB, and Elasticsearch, help the industry scale easily.

One such service, which arrived in 2011, is AWS CloudFormation. It lets you model infrastructure in a template (YAML or JSON) and also offers an interactive UI for creating infrastructure.

At an American travel and leisure company, the audit director noticed something odd in the claims made by the health care department.
When the first two digits of the healthcare payments were checked for conformity to Benford's law, they showed an unusual spike in numbers starting with 65.
A careful audit then revealed 13 fraudulent checks for amounts between $6,500 and $6,599, confirming the fraud.
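
A first-two-digits check like the one described above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the auditors' actual procedure: the function names, the `ratio` threshold, and the `min_count` cutoff are all hypothetical choices. The expected share of amounts whose leading two digits are d (10 to 99) under Benford's law is log10(1 + 1/d).

```python
import math
from collections import Counter


def first_two_digits(amount: float) -> int:
    """Return the leading two significant digits (10..99) of a non-zero amount."""
    digits = f"{abs(amount):.10e}"  # e.g. 6550 -> '6.5500000000e+03'
    return int(digits[0] + digits[2])


def benford_expected(d: int) -> float:
    """Expected proportion of values whose first two digits are d (10..99)."""
    return math.log10(1 + 1 / d)


def flag_spikes(amounts, ratio=3.0, min_count=5):
    """Return {digits: (observed, expected)} for suspiciously frequent prefixes.

    `ratio` and `min_count` are illustrative thresholds (assumptions),
    not values taken from any audit standard.
    """
    counts = Counter(first_two_digits(a) for a in amounts if a > 0)
    total = sum(counts.values())
    flagged = {}
    for d, observed in counts.items():
        expected = benford_expected(d) * total
        if observed >= min_count and observed > ratio * expected:
            flagged[d] = (observed, expected)
    return flagged
```

Calling `flag_spikes(payments)` on a list of payment amounts returns the two-digit prefixes that occur far more often than Benford's law predicts; a real audit would typically follow this with a proper statistical test rather than a fixed ratio.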

In his book The Golden Ratio: The Story of Phi, Mario Livio writes:
“Mathematics more often tends to delight when it exhibits the unanticipated results rather than expected ones.” …

I don't remember the exact problem a senior engineer described in a tech talk, one their team had faced while working on a product. But it boiled down to this:
Given a cluster of nodes or servers, how do you efficiently find out the current status of a particular node, i.e. whether it is up and running, dead, active, inactive, etc.?
As far as I remember, the bigger picture was data retrieval or update across a distributed cluster of thousands of nodes, which came down to one specific problem: failure detection in a really large cluster of nodes.

Failure detection in Distributed systems

Given that you have…

A few months back, our on-calls were having a hard time with random Spark job failures in our workflow. The only resolution was to retry the job, which eventually succeeded.

According to the Spark error logs, the cause was:
s3://bucketName/prefix/obj.json file not found exception.

It was an unnecessary bother, and the number of retries needed was unpredictable. Things got worse when it started happening for multiple jobs, and the same sort of issue began showing up in more systems.

Digging deeper, we found that the culprit was S3's eventual consistency model.
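
The manual "retry until it works" approach described above can be made bounded and explicit. The sketch below is a generic illustration, not the fix the team actually shipped: `wait_until_visible` and its parameters are hypothetical names, and the `exists` callable is assumed to wrap whatever check you use to see whether the object is readable yet.

```python
import time
from typing import Callable


def wait_until_visible(
    exists: Callable[[], bool],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll `exists` with exponential backoff; return the attempts used.

    Raises TimeoutError if the object never becomes visible, so the
    number of retries is capped instead of open-ended.
    """
    for attempt in range(1, max_attempts + 1):
        if exists():
            return attempt
        if attempt < max_attempts:
            # Delays grow as base_delay * 1, 2, 4, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    raise TimeoutError(f"object not visible after {max_attempts} attempts")
```

A job would call this before reading the file, passing a function that checks for the object's presence. The `sleep` parameter is injectable only so the backoff can be exercised in tests without actually waiting.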

S3 data consistency model

Simply put, S3 provides read-after-write consistency for the…

Dhruv Sharma

SDE at Amazon, experienced with Scala, Spark, AWS, Ruby on Rails, and Django. Morning runner. Wanna be everything at once... :D
