In my last post, we talked about Plotly and I mentioned Dash as something I’d revisit later. Well, here I am!

What is Dash exactly? In its own words:

“With Dash Enterprise, full-stack AI applications that used to require a team of front-end, back-end, and DevOps engineers can now be built, deployed, and hyperscaled by a single data scientist within hours.”

Quite a bold claim, right? It’s totally true, though! For my bootcamp capstone I was able to put up a complete web-app-style, auto-updating dashboard, deployed to a server and accessible by browser. It gathered crude oil price, supply and…


Of all the data visualization tools I have used so far, Plotly is my favorite!

We all start with matplotlib. I still use it in a pinch to get some quick charts. Then we usually move on to Seaborn, which is nice and all…but still lacking. What if I told you that you could make charts that let the viewer zoom in by selecting regions and toggle spike lines to see axis values and scale, without having to code all that functionality yourself? You can, with Plotly!

To get started, first install Plotly.

To use in Jupyter, be sure to install the…


I said in my last post that I would revisit Flask. Here I am, true to my word!

So the app I wanted to make was a task/project management app. I know, I know; there’s no shortage of such apps out there. But for some reason they never seemed to quite fit my needs. Besides, I wanted something much more…basically one giant app to organize my life, where I could keep contacts, documents, thoughts (“It’s called G Suite, Umar!”). I got started working on this app and quickly realized it was a much larger project than I thought. …


I had been wanting to learn Flask for a while, ever since I built my Dash-based crude oil dashboard. While Dash was a nice and easy way to spin up interactive data visualization apps, I ran into its limitations fairly quickly. Flask is much more flexible and extensive. It just made sense to have a web app development framework in one’s toolkit as a general-purpose utility with a variety of potential use cases.

Over the weekend, burned out from intense job searching, networking and take-home assessments, I decided to…


Let’s take a step back and revisit how the internet works.

When you type “www.facebook.com” into your browser, it reaches out over the internet to a server. A server is a program running on a dedicated computer that can respond to browser requests and deliver content to the user. These requests and responses are transmitted using the HyperText Transfer Protocol (HTTP), which is really just a standardized way for computers to exchange data over the internet.
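To make the request/response idea concrete, here is a minimal sketch using only Python's standard library. It starts a tiny server on the local machine and sends it a single GET request. The names HelloHandler and fetch_once are made up for this illustration, and a real site sits behind far more machinery than this.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a short text body."""

    def do_GET(self):
        body = f"You asked for {self.path}".encode("utf-8")
        self.send_response(200)  # status line: 200 OK
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)  # the content delivered to the client

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_once(path="/hello"):
    """Start a local server, send one GET request, return (status, text)."""
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        # This part plays the role of the browser.
        conn = HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", path)
        response = conn.getresponse()
        result = (response.status, response.read().decode("utf-8"))
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_once())
```

The client half stands in for the browser and the handler stands in for the server program; the status code, headers and body are the pieces of the HTTP response.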

HTTP defines methods (also called HTTP verbs) that browsers and programs use to make requests, which servers receive and act on. These…


If you’ve been doing any kind of data analysis or engineering for any length of time, you have probably worked with APIs to get data. Today I want to explore how APIs are actually built and deployed, and some of the principles behind their design, particularly under the REST paradigm.

As you probably know, API stands for Application Programming Interface. The entire purpose of an API is to simplify the use of a software system by creating an easy-to-use interface. In this way, we can rely on the functionality of the software without having to get into…


Harness the power of clusters!

In my last post we looked at how to begin using the Apache Beam library to build data processing pipelines. Let’s now look at how to have these pipelines run on clusters of computers that can work on our data in parallel and reduce processing time!

We will be deploying the pipeline on Google’s Dataflow service. There are two ends to the deployment process: our local machine, where we might currently have our data and code, and Google Cloud, where we want to have that data processed or analyzed by cloud compute resources.


Big Data frameworks are all the rage these days. For good reason: the amount of data out there that needs to be processed and analyzed is tremendous. So much so that oftentimes, it is impossible to work with some datasets on our personal computers.

This is where Big Data comes in. The idea is to use multiple computers to process a given dataset. This generally entails spinning up a number of virtual machines in a cloud service like AWS or Azure and then using specific frameworks like Spark or Dask to tackle your data problem. …


Technology, and for that matter life, is rife with duality.

It is in fact built on top of it; at the end of the day all your code and cat videos are read and represented as 1’s and 0’s, on and off, being and nothingness.

Nerd culture has no shortage of dualities. Star Wars or Star Trek? Vim or Emacs? Having a strong position on these subjects is seen as a mark of high rank and recognition in such circles.

Being a VS Code newb, I don’t usually have the requisite familiarity to take a side on some of these eternal…


I want to talk about a scenario that occurs fairly often when doing any significant amount of data engineering and analysis in Pandas.

We all know that one of the big advantages of Pandas is the ability to broadcast functions across entire data arrays and easily transform large amounts of data. In addition to built-in functions like mean(), sum(), etc., Pandas allows you to apply a custom function to an entire array in just a few lines of code. This is done by means of the apply() method of DataFrames.
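As a quick illustration, here is a minimal sketch of both flavors of apply(). The DataFrame, its column names and the price_band helper are all invented for this example and are not taken from any real dataset.

```python
import pandas as pd

# Made-up sample data, purely for illustration.
df = pd.DataFrame(
    {
        "price": [40.0, 55.5, 61.25],
        "barrels": [100, 80, 120],
    }
)


def price_band(price):
    """Hypothetical custom function: label a single price value."""
    return "high" if price >= 50 else "low"


# Built-in reduction over a whole column.
average_price = df["price"].mean()

# Series.apply: call the custom function once per value in the column.
df["band"] = df["price"].apply(price_band)

# DataFrame.apply with axis=1: call the function once per row,
# which gives it access to several columns at a time.
df["revenue"] = df.apply(lambda row: row["price"] * row["barrels"], axis=1)

if __name__ == "__main__":
    print(average_price)
    print(df)
```

Row-wise apply() is convenient, but it calls a Python function for every row. For simple arithmetic like the revenue column, writing df["price"] * df["barrels"] directly is usually faster.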

We have all generated new pandas columns by applying a function…

Umar Khan

Just an attorney who wandered into data science and never wanted to leave.
