Hacker News Comments on
University of California, Berkeley
Big Data Analysis with Apache Spark
Hacker News Stories and Comments
All the comments and stories posted to Hacker News that reference this URL.
One of the Spark-focused edX courses has a very good module on Alternating Least Squares, which will help you understand how to build recommender systems in a scalable way with Spark.
⬐ xky
This looks good but it's starting late 2016. Are there any old courses you could recommend?
⬐ dserban
http://bugra.github.io/work/notes/2014-04-19/alternating-lea...
This looks like a good introduction to ALS, albeit Python/Pandas centric.
If this is interesting then I recommend (ha!) the EdX Spark course. One assignment shows how to build a recommender on the MovieLens dataset mentioned in this article.
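For readers who have not seen ALS before, here is a minimal sketch of the idea in plain Python. It is not the course's code and it does not use Spark: it assumes a single latent factor per user and per item (rank 1), so each alternating update is a scalar closed-form ridge solution. The function names, the toy ratings, and the `lam` and `iters` defaults are all made up for illustration.

```python
# Rank-1 ALS sketch in plain Python (illustrative only, not Spark and not the course's code).
# Model assumption: rating(user i, item j) is approximated by u[i] * v[j].

def regularized_loss(ratings, u, v, lam):
    """Squared error on the observed ratings plus an L2 penalty on all factors."""
    err = sum((r - u[i] * v[j]) ** 2 for i, j, r in ratings)
    reg = lam * (sum(x * x for x in u) + sum(x * x for x in v))
    return err + reg


def als_rank1(ratings, n_users, n_items, lam=0.1, iters=20):
    """Alternate between solving for user factors and item factors.

    ratings is a list of (user_index, item_index, rating) triples.
    lam must be > 0 so that users or items without ratings do not divide by zero.
    """
    u = [1.0] * n_users
    v = [1.0] * n_items
    for _ in range(iters):
        # User step: hold v fixed; each u[i] has the closed form
        # sum(r * v[j]) / (lam + sum(v[j]^2)) over the items that user rated.
        num = [0.0] * n_users
        den = [lam] * n_users
        for i, j, r in ratings:
            num[i] += r * v[j]
            den[i] += v[j] ** 2
        u = [n / d for n, d in zip(num, den)]

        # Item step: hold u fixed and apply the same formula with roles swapped.
        num = [0.0] * n_items
        den = [lam] * n_items
        for i, j, r in ratings:
            num[j] += r * u[i]
            den[j] += u[i] ** 2
        v = [n / d for n, d in zip(num, den)]
    return u, v


# Hypothetical toy data: 3 users, 3 items, 6 observed ratings.
TOY_RATINGS = [
    (0, 0, 5.0), (0, 1, 3.0),
    (1, 0, 4.0), (1, 2, 1.0),
    (2, 1, 2.0), (2, 2, 1.0),
]

u, v = als_rank1(TOY_RATINGS, n_users=3, n_items=3)
# Predicted rating for a pair that was never observed (user 0, item 2).
predicted = u[0] * v[2]
```

Each half-step exactly minimizes the regularized loss with the other side held fixed, so the loss never goes up from one iteration to the next. A real recommender would use several latent factors per user and item, which turns each scalar update into a small linear solve.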
For everyone who wants to start working with Spark and Big Data, I recommend enrolling in this MOOC published by UC Berkeley on edX: https://www.edx.org/course/introduction-big-data-apache-spar...
For anyone who wants to pick up Spark basics: Berkeley (Spark was developed at Berkeley's AMPLab), in collaboration with Databricks (the commercial company started by Spark's creators), just started a free MOOC on edX: https://www.edx.org/course/introduction-big-data-apache-spar...
(If you wonder what Spark is, in a very unofficial nutshell: it is a computation / big data / analytics / machine learning / graph processing engine on top of Hadoop that usually performs much better and has an arguably much easier API in Python, Scala, Java and now R)
It has more than 5000 students so far, and the Professor seems to answer every single Piazza question (a popular student / teacher message board).
So far it looks really good (It started a week ago, so you can still catch up, 2nd lab is due only Friday 6/12 EOD, but you have 3 days "grace" period... and there is not too much to catch up)
I use Spark for work (Scala API) and still learned one or two new things.
It uses the PySpark API so no need to learn Scala. All homework labs are done in an IPython notebook. Very high quality so far IMHO.
It is followed by a more advanced spark course (Scalable Machine Learning) also by Berkeley & Databricks.
(not affiliated with edx, Berkeley or databricks, just thought it's a good place for a PSA to those interested)
The academic work that originated Spark, by Matei Zaharia (creator of Spark), earned him a PhD dissertation award from the ACM in 2014 (http://www.acm.org/press-room/news-releases/2015/dissertatio...)
Spark also set a new record in large scale sorting (Beating Hadoop by far): https://databricks.com/blog/2014/11/05/spark-officially-sets...
* EDIT: typo in "Berkeley", thanks gboss for noticing :)
⬐ spacko
> It is followed by a more advanced spark course (Scalable Machine Learning)
Is it really more advanced regarding Spark? The requirements state explicitly that no prior Spark knowledge is required.
⬐ eranation
Cool, I stand corrected. Thanks
⬐ tomnipotent
"... on top of Hadoop".
Can safely remove this part. Hadoop not required.
⬐ digitalzombie
Hadoop isn't required, and it only runs better if you can fit the data in memory.
Spark does micro-batch processing, whereas Hadoop traditionally does batch processing. Hadoop YARN is different now, and even with old Hadoop, if you can fit the data into memory it can supposedly be as fast, according to a meetup I attended.
There's also Apache Flink by data Artisans.
⬐ gtt
I've been struggling to set it up correctly on my Debian machine. Are there Debian packages or a concise tutorial? I've found some things on the web, but certain things do not match my setup and I'm lost...
⬐ annapurna
Thanks for the detailed info and context. Just signed up for my first edX course.
⬐ yzh
Thanks! I've been following the course and so far it's been awesome!
⬐ julnepht
Thanks for the plug, I have signed up for the class as well and it's great!
⬐ 0xFFC
I would love to learn about Spark, but as someone who lives in a third-world country I hate edX; instead I am in love with Udacity and Coursera. Where I live, we don't have much monthly traffic, but we can download everything we want between 1am and 6am, so there is no way to simply download a course from edX and use it later. I wish it was on Udacity or Coursera. Is there any torrent for the course material?
⬐ sidmitra
I'm doing the Spark course. edX has a download button on the videos, and you can download PDF files for the lectures. The rest, like the embedded quizzes, I just screenshot or save as PDF for posterity.
Are you sure you can't download? Or maybe they've changed recently.
⬐ 0xFFC
Yes, I am aware of the download button, but consider that every course is ~50 distinct videos, and given our download window you will agree with me that downloading is extremely painful. Why don't they just put up the whole material (at least the videos) the way Udacity does?
⬐ jm0
You can download the lectures using the edx-downloader: https://github.com/shk3/edx-downloader