Quote: (03-10-2016 12:42 PM)InsertNameHere Wrote:
Quote: (03-10-2016 04:36 AM)cibo Wrote:
For most data science positions they require at least a master's. I've hired data scientists, and the work lends itself to a lot of academic thinking, which needs some formal training. After the master's, I expect self-study since there's always some new tech coming down the pipeline.
Post or PM the link for the MSc and I can tell you if the program is decent.
Thanks for the help. These are the two programs I'm looking at:
http://www.ensae.fr/formations-navhorizo...rs-3a.html
http://datascience-x-master-paris-saclay...ignements/
I couldn't find a description for either program written entirely in English, but most of the class titles themselves are either in English or their meaning is pretty apparent.
Both the programs involve schools that are well-respected in France, although the second is a bit more prestigious (one of the partners is the French equivalent to MIT). As far as I can tell, the former is a little more flexible for personalisation, while the latter is a bit more technical.
Thoughts?
So data science pretty much has 2 major branches trying to own the term.
1) The statistical branch, which is into the theory of probability and how to use it to analyze data. It mainly uses different shades of regression, curve fitting, forecasting methods, and experiment design. This branch created most of the modern statistical methods, from p-values to Bayes, cluster analysis, etc. They use statistical programs/programming languages, but they tend to work with cleaner data that has a more consistent structure, and they are in general more conservative in their approaches.
2) The other branch comes from computer science. They are more into how to work with the data and how to analyze data at scale. Most of the newer methods in data science have been coming out of this area: text analytics, decision trees, deep learning neural networks, etc. People coming from this branch use more traditional programming languages and may or may not use a statistical programming language to solve their data problems. They work with poorer data sources that may not be cleaned and can be unstructured (free text, images, sound). They tend to be less concerned with theoretical correctness and more into compute time and algorithm design. In general, they are a bit more willing to try new approaches.
The first branch is better positioned for the stats/theory-heavy parts of data science: forecasting, risk modeling, and research studies. This usage of data science is fairly intertwined with economics at this point, and I would say this program lends itself to academia, financial institutions, think tanks, and government policy work.
The second branch leans more towards the data processing side of data science. It lends itself to the tech industry and some hedge funds. Google epitomizes that branch. Most of their machine learning models are theoretically simple, but how they apply those models at scale is very impressive. When people talk about “Big Data”, they usually mean this branch of data science.
The two master's programs you posted fit nicely into those two paradigms, which I see come up all the time. And I think both will give you a decent foundation to begin your career in data science for the most part.
I did notice neither program mentions anything about databases. This has been a traditional gap I’ve seen in most data science master's programs. In industry, most of the data you will analyze will be in databases. When you start working with millions and billions of records, you will not be able to crunch the data on your laptop, and in most cases you will need to use a database querying engine of some sort. When you get to what data scientists (not business people) consider “Big Data”, you’re working with Hadoop and other distributed data systems to process your data.
Most data scientists, unless they're coming from a strong computer science background, are quite weak on database fundamentals. I don’t think anyone expects you to design all aspects of a database schema, but understanding SQL joins in a basic relational database and the trade-offs between normalized and denormalized databases would go a long way. Most people will figure this stuff out pretty quickly since it’s not terribly hard, but it’s another thing you need to learn when you might already be struggling to adjust from academia to the real world.
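To make the join and normalized/denormalized point a bit more concrete, here is a rough sketch using Python's built-in sqlite3 module. Everything in it is made up for illustration: the customers, orders, and orders_flat tables, the sample rows, and the helper function names are my own assumptions, not anything from the programs above. A real warehouse would use a different engine, but the SQL ideas carry over.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Normalized: customer details live in one place only.
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT,
            country     TEXT
        );
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(customer_id),
            amount      REAL
        );
        INSERT INTO customers VALUES (1, 'Alice', 'FR'), (2, 'Bob', 'US'), (3, 'Chloe', 'FR');
        INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 35.0), (12, 2, 50.0);

        -- Denormalized: customer details are copied onto every order row.
        CREATE TABLE orders_flat AS
        SELECT o.order_id, c.name, c.country, o.amount
        FROM orders o
        JOIN customers c ON c.customer_id = o.customer_id;
        """
    )
    return conn


def totals_inner_join(conn):
    """INNER JOIN: only customers with at least one order appear."""
    return conn.execute(
        """
        SELECT c.name, SUM(o.amount)
        FROM customers c
        JOIN orders o ON o.customer_id = c.customer_id
        GROUP BY c.customer_id, c.name
        ORDER BY c.name
        """
    ).fetchall()


def totals_left_join(conn):
    """LEFT JOIN: every customer appears; those without orders get 0."""
    return conn.execute(
        """
        SELECT c.name, COALESCE(SUM(o.amount), 0)
        FROM customers c
        LEFT JOIN orders o ON o.customer_id = c.customer_id
        GROUP BY c.customer_id, c.name
        ORDER BY c.name
        """
    ).fetchall()


def totals_flat(conn):
    """Same totals from the denormalized table, with no join needed."""
    return conn.execute(
        "SELECT name, SUM(amount) FROM orders_flat GROUP BY name ORDER BY name"
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(totals_inner_join(conn))
    print(totals_left_join(conn))
    print(totals_flat(conn))
```

The inner join silently drops Chloe because she has no orders, while the left join keeps her with a total of 0. That kind of disappearing row is a common surprise for people new to SQL. The flat table is simpler to query since no join is needed, but customer details are duplicated on every order row, so a change to a customer means updating many rows and the copies can drift out of sync.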