Follow along with the Jupyter notebook here
I recently documented how I created a simple data engineering ETL project using Twitter, Python and AWS. Briefly, the project extracts data from Yahoo Finance and Twitter, cleans it, and loads it into an Amazon RDS instance running Postgres. The script is deployed onto an EC2 instance and runs every 30 minutes using cron.
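For readers who want a feel for the shape of such a pipeline, here is a minimal sketch of the extract, clean, load flow described above. It is not the project's actual code: `extract` returns hard-coded placeholder rows instead of calling Yahoo Finance or Twitter, the table name `mentions` and its columns are made up for illustration, and an in-memory SQLite database stands in for the Postgres instance on RDS so the example runs without any setup.

```python
import sqlite3
from datetime import datetime, timezone


def extract():
    """Stand-in for the Yahoo Finance / Twitter calls (placeholder rows, not real data)."""
    return [
        {"symbol": " aapl ", "text": "Example tweet one", "price": "101.5"},
        {"symbol": "MSFT", "text": None, "price": "n/a"},  # deliberately dirty row
    ]


def clean(rows):
    """Drop rows with missing text or a non-numeric price; normalise symbols."""
    cleaned = []
    for row in rows:
        if not row["text"]:
            continue
        try:
            price = float(row["price"])
        except ValueError:
            continue
        cleaned.append((row["symbol"].strip().upper(), row["text"], price))
    return cleaned


def load(conn, rows):
    """Insert cleaned rows with a load timestamp and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mentions "
        "(symbol TEXT, text TEXT, price REAL, loaded_at TEXT)"
    )
    now = datetime.now(timezone.utc).isoformat()
    conn.executemany(
        "INSERT INTO mentions VALUES (?, ?, ?, ?)",
        [(symbol, text, price, now) for symbol, text, price in rows],
    )
    conn.commit()
    return len(rows)


def run(conn):
    """One full pass of the pipeline: extract, clean, load."""
    return load(conn, clean(extract()))


if __name__ == "__main__":
    # SQLite in memory is an assumption made for the sketch; the project uses Postgres on RDS.
    connection = sqlite3.connect(":memory:")
    print(run(connection), "row(s) loaded")
```

Saved as a script, a pass like this could be scheduled every 30 minutes with a crontab entry along the lines of `*/30 * * * * /usr/bin/python3 /path/to/etl.py`, where the interpreter and script paths are hypothetical.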
Click here to read more about the simple beginner's project (repo included).
After about 2 weeks of running, the project was successful and I managed to build up a nice amount of data. …
In this article, I am going to give a review of the ‘Learn Apache Kafka for Beginners v2’ course by Stephane Maarek on Udemy. I will break this review down into the following sections:
5. Recommendations for students
Before that, I want to give some background about myself and why I selected this course. Briefly, I am a full stack developer seeking to transition into a big data role. The requisite skills for a data engineer are far broader than those of a web developer, and while there is some overlap, there…
First of all, why bother with this project? Secondly, why write an article about it?
Addressing point one, I worked on this project in the first place to take a step toward developing my data engineering skillset. I have no background in data engineering and the field is very new to me; you can read more about my story here. Learning theory is great, but it's better when we can also accumulate first-hand experience and ingrain the lessons of the theory through action. …
I want to learn data engineering, and per the advice of Andreas Kretz, I’m going to document it.
Let me begin by introducing myself. First of all, who am I? At the time of writing this, I’m a full stack developer in my early twenties. …