Category: data science

Why It’s So Freaking Hard To Make A Good COVID-19 Model

From FiveThirtyEight

Great article, please read it. Be wary of data without context. Always.

Numbers aren’t facts. They’re the result of a lot of subjective choices that have to be documented transparently and in detail before you can even begin to consider treating the output as fact.

8kb graphical visualization of the RNA sequence of SARS-CoV-2. [Source of the complete genome]

Via the Data is Beautiful subreddit

Music Loudness by Genre

Via r/dataisbeautiful

A Topographic Map of the Moon

By Eleanor Lutz

Open-Source Code

💙

In short.

ESA’s star mapping mission, Gaia, has shown our Milky Way galaxy is still enduring the effects of a near collision that set millions of stars moving like ripples on a pond.

Via ESA: Gaia hints at our Galaxy’s turbulent life

Curve-Fitting by xkcd

alt-text:

Cauchy-Lorentz: “Something alarmingly mathematical is happening, and you should probably pause to Google my name and check what field I originally worked in.”

Explanation

Using machine learning for cross-lingual and cross-platform rumor verification

Via Tech Xplore

Preprint: Cross-Lingual Cross-Platform Rumor Verification Pivoting on Multimedia Content. PDF – Code (GitHub)

Interesting. I guess that over time it will become fairly trusted, depending precisely on how much trust the underlying platforms can offer, in this case Twitter, Google and its Chinese counterpart Baidu. However, common sense and education (natural intelligence, as opposed to artificial intelligence) will remain essential as long as there are plenty of people who only believe, and share regardless of reliability, what fits their pre-established and static worldview, and little lambs who treat as revealed truth every stupid thing they read in the MSM or see on YouTube.

Getting started in Exploratory Data Analysis with Pandas and Jupyter Notebooks:

Hi! I am new to Medium and made this tutorial on the Pandas Framework for Python. Please tell me what you think!


A cool short introduction to Python + Pandas. Thanks!

Actually I’m an R enthusiast, but I’d say the tool (Python, R, Julia, C++, Java, etc.) is a secondary factor: the real core of data science is mathematics, especially statistics, then computer science, and only after that the tool. In my opinion both Python and R (with or without Jupyter) are powerful enough to get you happily started in this wonderful field.
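As a rough illustration of what a first exploratory pass with pandas can look like, here is a minimal sketch. It is not taken from the tutorial; the column names and values are made up for the example.

```python
import pandas as pd

# Made-up toy data, for illustration only (not from the tutorial).
df = pd.DataFrame({
    "genre": ["rock", "jazz", "rock", "pop", "jazz", "pop"],
    "loudness_db": [-6.0, -14.0, -7.0, -5.0, -12.0, None],
})

# Size of the table and number of missing values per column.
n_rows, n_cols = df.shape
missing = df.isna().sum()

# Summary statistics for the numeric column (missing values are skipped).
summary = df["loudness_db"].describe()

# Mean of the numeric column within each group.
by_genre = df.groupby("genre")["loudness_db"].mean()

print(n_rows, n_cols)
print(missing)
print(summary)
print(by_genre)
```

The same steps (inspect the shape, count missing values, summarize, group) have direct equivalents in R, which is rather the point: the questions matter more than the tool.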

Image from Wikipedia article “Exploratory data analysis”:

Good synthesis!

Via @drewconway (Author: @sandserif)