Study Intro

I used to go to the gym consistently a few years ago, about 3x per week. I had a decent full-body workout plan.

It was definitely fun, I would leave almost every workout soaked in sweat with my body screaming.

My favorite was weighted front lunges. I would pick up two dumbbells and alternately lunge my legs forward.

It was all good until I was lying down one day and one of my knees suddenly felt like someone had stuck a needle smack in the middle of the joint. I’ve had frostbite before, but I have to say that this was the most excruciating pain I’ve ever felt in my life. The slightest movement sent a piercing pain through my knee; it might as well have been paralyzed.

After going to the doctor, I was told I had patellar tendonitis, a.k.a. ‘jumper’s knee’. Turns out my description of the pain wasn’t far off from the medical definition: it’s an inflammation of the tendon that connects your kneecap to your shinbone.

My grave mistake was assuming that just because I was moving, I was properly working the muscle.

I couldn’t believe it. How was I not doing it right even though I felt sore!?

This is exactly why I’m no longer relying on public education. They just put you through the motions, but you’re never actually getting stronger.

I only had tendonitis in one knee, but both of them still ache to this day. I can tell it’s going to get worse as I get older.

Most gym people want to be left alone, but I wish somebody would’ve pointed out to me that I was doing my lunges wrong. Now I have to live with this for the rest of my life. At least this only happened as a result of my individual ignorance towards weightlifting, but it is absolutely ridiculous that an established, reputed institution is putting me through the same thing with academic subjects.

Luckily only that knee had an issue, but I noticed that I wasn’t making any gains on my body overall.

What I did wrong was go to the gym with the intention of ‘just getting through it’. I would rush through all of my workouts and listen to music as a way of forcing myself to finish a set in an allotted time.

Public school does the exact same thing. They don’t have the slightest concern for ensuring that you are gaining muscle, they just want to see you move. This is a dangerous attitude that kills our work ethic and can lead to major issues in a professional setting. Speed does matter, but there’s no point if the work wasn’t properly done. Real life isn’t about how you look doing the rep, you have to actually lift the weight and target the muscle. I’m aware that optics matter, but for the most part, you have to actually get the job done too.

What is Data Science?

At its core, it’s statistics adjusted for technology.

The ‘science’ part comes from our mission to identify patterns. Science in general is all about observing patterns in our reality and trying to make sense of them; we then exploit them with engineering.

Types of Questions

  1. Comparison/Exploratory – the most common type of science question; observe patterns and trends between groups; primarily done in the gathering stage yet without regard for cause (how does x affect y?)
  2. Descriptive – summarize the characteristics of a piece of data, identifying the data on hand before worrying about what hasn’t been collected yet; answer is usually a discrete number (How many clicks has this ad gotten in the past 48 hours? What’s the average time a user spends on this site?)
  3. Classification – automatically classify variables; usually a yes/no or one-off answer; typically used for spam filters
  4. Inferential – usually asked following the results of descriptive or exploratory questions; attempting to make a statement about a feature outside of the data (how many people will drink & drive this New Year’s Eve given the data that shows the # of DUIs on this day for the past 5 years?) I’m seeing an overlap between this and predictive questions. Some lists have both, but still don’t specifically explain the difference between the two. I’ll have to look into this more.
  5. Causal – changing a characteristic of a variable to observe the cause of an outcome; similar to exploratory, but more goal-oriented towards finding an answer instead of initial gathering
  6. Clustering – identifying the relevant groups present in the problem; will revisit as it pertains to machine learning
  7. Regression – don’t know yet
  8. Mechanistic – don’t know yet

All of these will be revised as I progress. I at least have the general types down.
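To make the ‘descriptive’ type concrete, here is a minimal sketch of the two example questions from that item (clicks in the past 48 hours, average time on the site). The click log, the timestamps, and the variable names are all made up for illustration; a real ad platform would hand you this data in its own format.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical click log: (time of click, seconds the user then spent on the site).
now = datetime(2024, 1, 10, 12, 0)  # assumed "current" time for the example
clicks = [
    (datetime(2024, 1, 10, 9, 30), 42),
    (datetime(2024, 1, 9, 18, 15), 310),
    (datetime(2024, 1, 8, 20, 0), 95),
    (datetime(2024, 1, 7, 11, 45), 12),   # older than 48 hours
    (datetime(2024, 1, 5, 8, 0), 240),    # older than 48 hours
]

# "How many clicks has this ad gotten in the past 48 hours?"
cutoff = now - timedelta(hours=48)
clicks_last_48h = len([c for c in clicks if c[0] >= cutoff])

# "What's the average time a user spends on this site?" (over the whole log)
avg_seconds_on_site = mean(seconds for _, seconds in clicks)

print(clicks_last_48h)        # 3
print(avg_seconds_on_site)    # 139.8
```

Notice that both answers only summarize what is already in the log. Nothing here says why people clicked or what they will do next, which is what separates descriptive questions from the other types.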

I’m going to attempt to ask 3 questions each from 9 different industries to demonstrate the versatility of this vehicle.

  1. Insurance
    1. How does the number of visits to fast food restaurants correlate to how many visits to medical facilities a person makes? We could also include the reasons they make the visits.
    2. Should we raise the life insurance premiums for newlywed couples in this region given that there have been numerous mariticides (spouse murders)?
    3. Given our data that shows this particular demographic frequents bars, should we start offering a higher auto insurance premium for any potential policyholder who fits it, even if the individual has no history of DUI?
  2. Fashion
    1. Why is it that this particular ad got more clicks from one demographic, yet this other demographic actually makes more purchases? Do the models in the ad target the wrong audience?
    2. Even though this one pair of cleats is a bestseller with proven quality, this prolific athlete has also released his own signature pair, with no data on the quality compared to the first pair. Which pair should we promote for this upcoming sports season?
    3. Why does this shirt sell more online compared to in-store? Is the presentation better online? Are in-store prospects typically dissatisfied with the fit or the real-life look/feel of the shirt?
  3. Automotive
    1. Why is it the higher trim of this particular model sells more in a lower-income area? Is it the income of the buyers, or just the fact that the dealerships are located in this area?
    2. While we consider launching a recall for this vehicle, let’s observe the conditions that this part typically defects in (driver demographic, region, terrain, time of day, trim, mileage, etc.). This would be done to determine if the part itself is faulty, or if it’s a certain condition that makes it that way.
    3. Should we keep this luxurious amenity as an option, or just make it a part of the trim? Not only will this increase sales, but how would it affect the cost if the amenity is a default feature?
  4. Industrial Fishing
    1. While trawling is the standard fishing method because of its large volume, are the catches necessarily higher quality?
    2. How much more harmful to the environment are certain boats and nets? Should we prioritize maximum volume or minimal bycatch (catching unwanted sea animals)? We must also define what ‘harmful’ means.
    3. Is there an endangered species we should minimize catching? What if this species is in high demand?
  5. Self-Storage
    1. What is the cost-difference between having an overnight associate present for assistance vs. giving renters a 24/7 chat-bot for after-hours visits to their unit?
    2. Should we run a deal on a lower price per-square foot for larger units? What if the renter doesn’t need every available amenity (ex. after-hours accessibility, climate-controlled, premium theft insurance)?
    3. How many more rentals could we sell if we implement an augmented reality storage unit that prospects can test before they commit to it? This could mainly solve the issue of people not knowing how much space they need.
  6. Public Transportation (buses & local subways/light rails)
    1. What’s the distribution of distance-to-public transportation and its correlation to the income-level of neighborhoods? Shouldn’t it be easier to access PT from the lower-income neighborhoods as they’re less likely to have cars?
    2. Should there be a special priority pass for daily commuters? Many have trouble getting a ride despite being on time. If this was implemented, it would have to be done in a way it can’t be easily abused (similar to handicap parking placards).
    3. Should there be more frequent pickups for the routes that intersect with the highest employers? For example, Amazon is one of the major employers in my city, and extra bus routes were added specifically so they stop at the lot, letting off loads of employees at a time.
  7. Music Streaming
    1. What exactly counts as a ‘stream’? Is it simply clicking on the song, or playing it all the way through? What if I did listen to most of the song but cut it off at the last second, does that not count as a full stream? How does that translate to how much artists are paid per stream?
    2. If a certain artist’s song is popular, yet most streams show that listeners skip to a featured artist’s verse, should the featured artist get a higher percentage of the stream over the main artist?
    3. What constitutes a best-selling album? Is it based on the # of streams per song or just total streams altogether? If a 20-song album only has 2 songs that are highly streamed, does that make it better than a 12-song album where all songs are streamed at an average rate? Should an album be counted as ‘top’ if it’s only good for the singles?
  8. Furniture
    1. Should we run a ‘new home’ deal if the buyer can prove they’re moving into a new house? This could be lower financing, combo deals (ex. couch with 2 recliners and an ottoman), or even a warranty for in-house repairs.
    2. What channel should we primarily promote our summer bunkbed sale on? Just like toys, many are desired by the kids yet bought by the parents. Direct mail? Commercial on a kid’s cartoon channel? Ad on a parenting app?
    3. Are people with mentally demanding white-collar jobs more likely to buy a standup desk? Whether for their home office or the one at their employer’s building. What about middle vs upper-level managers, who is more likely to buy a standup vs a traditional sit-down?
  9. HVAC
    1. What’s the cause for the experience gap in the industry (many <5 years and 20+ year professionals, but few in between)? Is it due to the tediousness of obtaining the credentials, or do the companies have high turnover rates?
    2. Are smaller HVAC companies doomed to work residential and/or smaller businesses as larger companies have long-term contracts with established brands?
    3. Are there any fairly recently (<10 years) implemented policies that have drastically changed the workflow of HVAC companies? Whether it’s a certain chemical solution that was approved/discontinued or an extra precaution workers must take?

Keep in mind, data science is NOT the skeleton key for all of our problems. All of these questions are assuming there is data present in the first place.

Most of these questions are pretty weak, given I’ve only generated them off of surface-level research of these industries. If you work in these industries, these may seem naïve, which they mostly are. This is just me starting out and exercising on how I frame my questions.

One big issue with this field is the concept of collecting data through surveys. Surveys are notoriously unreliable, especially if the facilitators aren’t meticulous about the conditions they have people take them in (ex. stopping them right before they leave, too many questions, asking for too much personal/contact info, etc.). Again, data science questions are reliant on the fact that there is data to be analyzed in the first place, so I will have to pay close attention to the software these companies use along with their gathering methods. I may even have to devise my own methods and create my own software for the companies who don’t have any at all.

Worse, even if a company does collect data, how do we know it’s properly gathered?

I can say I took 40,000 steps today, but if I show you that most of it was just me pacing in place, you don’t hold the number in as high regard as you initially did. Numbers mean jack if they weren’t properly gathered.

This will widely vary based on the domain, but this is the basic process of a data science project.

  1. Identify the problem
    • This will probably be my favorite part. I get to fire off all my questions and start researching. It’s important to understand what I’m looking for before I start gathering data. In my experience, a lot of problems persist not because we have trouble finding the answer, but because we’re asking the wrong questions. Though I need to keep in mind that I can’t keep asking questions all day; I have to know how to prioritize the important ones, especially when it comes to projects with strict deadlines.
  2. Collect the raw data
    • I will have to dig into the brand’s digital warehouses and repositories. For most retail brands, I will have to look through their CRM (customer relationship management) software for their sales metrics. This will also be a slippery slope for me as a newbie since these brands will have to entrust me with this data, which raises the dilemma of intellectual property. As I progress, I will learn more about data confidentiality and how to protect the brand’s data, as well as myself as a data scientist.
  3. Clean the data
    • A lot of this will be me staring at spreadsheets. Some school assignments and online courses will supply you with the dataset already cleaned. I can’t get spoiled by this. Real world data is messy. So since I’ll be weeding out the irrelevant data, this means I’ll be jumping back and forth between this and step #2 to ensure I have what I need.
  4. Explore the data & Plan the model
    • Once the data is sifted, I can finally start playing with it. I’ll admit this is the part where I fall short. I’ll have to educate myself on machine learning and other software to better understand this.
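To show what the ‘clean the data’ step can look like in practice, here is a rough sketch using a tiny, made-up sales export. The field names (`order_id`, `region`, `amount`) and the `clean` helper are hypothetical, and real projects would usually lean on a dedicated library, but the idea is the same: drop duplicates, throw out rows that can’t be used, and standardize what’s left.

```python
# Hypothetical raw export from a sales system: messy on purpose.
raw_rows = [
    {"order_id": "1001", "region": " east ", "amount": "19.99"},
    {"order_id": "1002", "region": "West", "amount": ""},         # missing amount
    {"order_id": "1001", "region": " east ", "amount": "19.99"},  # duplicate order
    {"order_id": "1003", "region": "WEST", "amount": "n/a"},      # unreadable amount
    {"order_id": "1004", "region": "east", "amount": "5.00"},
]

def clean(rows):
    """Drop duplicates and unusable rows, and normalize the rest."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        if row["order_id"] in seen_ids:
            continue  # skip an order we already kept
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount can't be read as a number
        seen_ids.add(row["order_id"])
        cleaned.append({
            "order_id": int(row["order_id"]),
            "region": row["region"].strip().lower(),  # " east " and "east" become the same
            "amount": amount,
        })
    return cleaned

cleaned_rows = clean(raw_rows)
for row in cleaned_rows:
    print(row)
```

Only two of the five rows survive. The rows that get thrown out are the cue to jump back to step #2 and ask whether the missing amounts can be recovered from the source.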

This will be immensely difficult starting out. My goal is to fail as soon as possible so I can quickly learn how to improve.

My main areas of study will be:

  • Programming languages (Functional, Object-oriented, and Database)
  • Math (Statistics & Probability, Linear Algebra, Calculus, Regression Analysis)
  • Visualization Software
  • Data mining/cleaning Software

Those will be the ones I focus on as a beginner. As I go on, I’ll progress onto:

  • Software Engineering
  • Cloud Platforms
  • Neural Networks
  • Machine Learning

Why Data Science?

On my homepage, I state it’s for filtering information.

Ever since I graduated high school in 2016, I always got ads for data analytics courses. I have never searched for anything pertaining to this, yet I always got ads for it.

Literally every platform I was on I would see these ads. From an unrelated Google search, to Twitter, Instagram, Quora, and even my emails.

I always dismissed it as seeming too hard or too much math. I was like any other late-teens kid, looking for a way to get rich with minimal work.

After realizing I have to do the work, I decided to suck it up and just learn the trade of data science.

Being a fairly new field, I was hesitant to commit, but I’m attracted to its dynamism and how it takes a wide array of subjects and puts them into one product.

At first I idealized its capability, “I can solve the world’s problems!”. Then like any other fantasy I started to notice its shortcomings and accepted that it can’t be used for everything.

Regardless, it is still arguably the greatest tool we have for solving those seemingly unreachable problems. I’m aware there will never be pure bliss, but that doesn’t mean I’ll cope with the unnecessary misery.

Ultimately, these are the 5 main reasons why I choose to master this field:

  1. It’s one of the few fields where I’m encouraged to ask questions and not shamed for it.
  2. It applies to all fields; it’s a vehicle of information, I can go wherever I want once I know how to drive it.
  3. It’s dynamic, I’m never stuck doing the same thing over and over again. Even if a typical project has a base process, it will always be adjusted for its respective domain.
  4. I’m terrified of how much information is out here and that there is nobody trustworthy to properly gather it. So I thought I might as well do my part.
  5. It’s arguably the greatest tool that can bring us closer to solving humanity’s problems.

My learning style

There is an old video of Bill Nye. He takes a giant wheel of cheese, so big he needs a power saw to cut it in half.

Then he cuts that half in half.

Then that quarter in half…

Another half…half…half…

To an uncuttable piece so small it has to be picked up with tweezers.

This is what I seek to do with everything.

I am a firm believer that if you can’t break something down into its tiniest bits and build it back up to its complete state, you do not understand it.

Stealing from economics, we view everything through two types of lenses, micro and macro. Micro is our everyday experiences. Macro is the ‘bird’s-eye-view’ or the ‘bigger picture’. When you learn that they will be blocking a major street for road work, and you think about how this will affect your own commute, that is you thinking about the micro. The macro is understanding that this street needs to be repaired to prevent any colossal damages in the future; bad for you, great for the community.

I have a prominent ‘top-down’ learning style. This means I need to understand the macro before getting into the micro. Most people not only start with the micro, they ignore the macro. That’s setting off on a journey with no idea what your destination is.

School strictly focuses on the micro. The environment discourages you from adopting any attitude outside of “is this going to be on the test?” It’s funny because the administrators will act so confused as to why nobody is enjoying the subject and why us students are so grade-motivated. Why would we care about how the forest looks as a whole when you force us to stare at only a few trees?

I can’t count how many times I’ve asked a question in class only to be shot down with “you’re jumping too far ahead”. Most of the time, those questions were about the macro of the subject.

I’m sure it is impractical to completely rely on this learning style, but I have yet to see bottom-up work out for the better. It will be a sure sign of growth if I can experience the usefulness of bottom-up learning. The absolute only drawback I can see for top-down learning is that it takes longer. Welp, the biggest problems won’t be solved in one day, nor should we force them to be just because we’re too lazy to think about it.

Also, one of the biggest qualities of data science is searching for information on your own. College arguably goes against that, as they tell you what to know. You would think because it’s formal that the information would be better presented, but you can still pass the class without fully retaining the knowledge. Besides, even when we’re doing research for an assignment, we tend to only look for the bare minimum required; as long as we get a C, the job is done, and there is nothing more to worry about. I will readily admit that I did this for school, as grades are moot, but it is dangerous if I bring this attitude into the real world. The consequences are much more severe and I can’t just ‘redo’ the assignment. I need to take my research very seriously and never settle for ‘just enough’.

This is how stupid I am: The driving test does not go over all of the necessary functions that come with driving. So I could pass the test but still barely know how to drive. I refuse to go through the test, get my license and STILL not know how to do what’s necessary. That’s not fair to the people who could get hurt by my poor driving.

The biggest issue I’ve noticed in formal education is they don’t teach you why the subject is necessary. I choose to offset this by starting concepts with the macro. You can’t TEACH somebody to CARE about something. You can provide them with information that may spark their interest, but you can’t force somebody to feel for something. There have been multiple times I’ve asked someone a question about their area of study and they would give me a surface-level answer or just dismiss the question altogether. I don’t do it with malicious intent, but I’ve also noticed some people get uncomfortable or even hostile when I ask these questions. This showed me that many people only have a limited understanding in their field. They know of their subject, but they don’t understand it; they know how to drive a car, but they don’t know how a car works, yet some feel qualified to give mechanical advice. So for all my subjects, I will start with a book on the macro, then I will document my findings as I explore through the micro.

In most cases, the macro will be the history and philosophy of the subject.

My selected disciplines are vehicles. They are subjects in themselves, yet they allow me to explore other subjects.

All that said, my studies will be broken down into micro and macro.

Most of the micro will be what I learned for the subject. My walk through the town.

Most of the macro will be me summarizing the books I’ve read. My flight over the town.

Subjects 

Math

What’s annoying about most math is it’s only correct in the abstract realm of math. Learning this math as it pertains to data science will allow me to actually APPLY it to something instead of just passing a test.

Statistics & Probability

This is the part of data science I have studied the most so far.

Basic difference: STAT is looking backward, PROB is looking forward.

If you want to know the ‘chance’ of something, that’s PROB.

STAT is looking at what already happened, then trying to infer how and why it happened.
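Here is a small sketch of that forward/backward split using coin flips. The flip results are made up, and estimating the coin’s bias from the share of heads is just the simplest possible approach, not the only one.

```python
from math import comb

# PROB (looking forward): assume a fair coin, then ask what could happen.
p_heads = 0.5
n_flips = 10
# Chance of exactly 7 heads in 10 flips, from the binomial formula.
chance_of_7_heads = comb(n_flips, 7) * p_heads**7 * (1 - p_heads)**3

# STAT (looking backward): start from flips that already happened
# (made-up results) and try to infer what the coin is like.
observed = "HHTHHHTHHT"
estimated_p_heads = observed.count("H") / len(observed)

print(chance_of_7_heads)   # 0.1171875
print(estimated_p_heads)   # 0.7
```

The first half starts with a known coin and produces a chance. The second half starts with results and works back toward the coin. Whether 7 heads out of 10 is enough to call the coin unfair is exactly the kind of question the number alone can’t answer.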

It’s weird, I used to think PROB was simple and STAT was complicated.

Turns out they’re both complicated, yet STAT is actually easier because there are more formulas at your disposal. PROB on the other hand is much more intuitive.

The most eye-opening thing about STAT&PROB is it isn’t math that is only right in its own realm. The job is not finished when you calculate the number. You have to ask your own questions. You have a duty to find something wrong with a study. It’s exciting because this makes it immune to automation. A computer can tell you what a number is, it can’t tell you why it is.

The scariest thing about these is you can be mathematically correct, but still contextually wrong.

The most common fallacy is assuming that correlation equals causation.

One recurring opinion I’ve heard in passing was that the consumption of marijuana is the cause for the rise in violent crime (I live in Baltimore City, which decriminalized weed in early 2019).

There are 2 problems with this opinion:

  1. It’s intellectually lazy. I have yet to hear any further justification for that claim besides the statement itself.
  2. It doesn’t bring us any closer to solving the problem. I don’t doubt that a lot of the people who commit violent crimes do smoke weed… that does not mean that the weed itself is what is causing them to commit the crime. Therefore, the solution wouldn’t just be: ban weed = less violent crime

This is just one of the many popular claims we make to assign blame to a problem instead of attacking it at the root; WHY do people commit violent crimes? What circumstances drive them to do it?
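To see how two things can move together without one causing the other, here is a toy sketch with completely made-up numbers (this is not crime data, and the names `hidden_driver`, `a`, and `b` are placeholders). A hidden third factor pushes both `a` and `b` up, while neither one touches the other.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient, written out by hand."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)

# A hidden factor that drives BOTH measurements (made-up values).
hidden_driver = [1, 2, 3, 4, 5, 6, 7, 8]
noise_a = [1, -1, 1, -1, 1, -1, 1, -1]
noise_b = [1, 1, -1, -1, 1, 1, -1, -1]

# Neither a nor b appears in the other's formula.
a = [2 * h + n for h, n in zip(hidden_driver, noise_a)]
b = [3 * h + n for h, n in zip(hidden_driver, noise_b)]

r_raw = pearson(a, b)

# Take the hidden driver's contribution back out of each series.
a_without_driver = [x - 2 * h for x, h in zip(a, hidden_driver)]
b_without_driver = [y - 3 * h for y, h in zip(b, hidden_driver)]
r_after = pearson(a_without_driver, b_without_driver)

print(round(r_raw, 2))    # 0.96
print(round(r_after, 2))  # 0.0
```

On the surface, `a` and `b` look tightly linked. Once the hidden driver is accounted for, the link disappears. In real life you rarely get to subtract the hidden factor this cleanly, which is why the ‘why’ questions matter more than the number.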

So far, I’ve breezed through Naked Statistics (with plenty of notes of course) and half of Lady Tasting Tea, which follows the history of not only STAT, but the scientific method and our current process of evaluating studies. So I got the basic mean/median/mode out the way.

My biggest takeaway so far was distinguishing mean vs median. Not so much the process, but the purpose of them.

  • Mean is the average. The problem with mean is it can easily be skewed by an outlier (a number significantly different from the others).
    • If there are 10 men sitting at a bar, and each of them makes between $10,000-$20,000, the average salary of that bar will be around $15,000. But if Bill Gates walks in and sits at the bar, his presence will jack that average up to the millions. We can now freely say that the average salary of that bar is x million… that does not mean that all the men there are taking home 7-figures. This is an instance where the median would be more appropriate. Median is the middle number of a sequence once it’s sorted from lowest to highest.
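Here is the bar scenario as a quick sketch. The ten salaries are made up to fit the $10,000-$20,000 range, and the figure I plug in for Bill Gates is a placeholder, not his actual income.

```python
from statistics import mean, median

# Ten hypothetical salaries between $10,000 and $20,000.
bar_salaries = [10_000, 11_000, 12_000, 14_000, 15_000,
                15_000, 16_000, 18_000, 19_000, 20_000]

print(mean(bar_salaries))    # 15000
print(median(bar_salaries))  # 15000.0

# Bill Gates walks in. The income figure is an assumption for illustration only.
assumed_gates_income = 1_000_000_000
bar_with_gates = bar_salaries + [assumed_gates_income]

mean_with_gates = mean(bar_with_gates)
median_with_gates = median(bar_with_gates)

print(round(mean_with_gates))  # 90922727
print(median_with_gates)       # 15000
```

One new person moves the mean from $15,000 to roughly $90 million, while the median doesn’t budge. The median still describes what a typical guy at that bar takes home.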

You can mainly apply this when you’re looking for a job. Always look for a median salary, as the mean can be misleading.

Another example is the notion that life spans were “shorter” back then. Intuitively, you would assume this means that people died ‘sooner’. In reality, people still lived fairly long, it’s that the child mortality rates were much higher, which brought the average down by consequence. Think of it as the reverse version of the ‘Bill Gates at the bar’ scenario.
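The same effect in reverse, as a rough sketch. The ages at death are invented, and treating age 5 as the cutoff for ‘survived childhood’ is my own assumption for the example.

```python
from statistics import mean

# Hypothetical ages at death for 10 people in an era with high child mortality.
ages_at_death = [0, 1, 2, 60, 62, 65, 68, 70, 72, 75]

# The headline "life expectancy" style number: average over everyone.
average_lifespan = mean(ages_at_death)

# Average over only those who made it past early childhood (assumed cutoff: age 5).
survived_childhood = [age for age in ages_at_death if age >= 5]
average_if_survived = mean(survived_childhood)

print(average_lifespan)               # 47.5
print(round(average_if_survived, 1))  # 67.4
```

An average of 47.5 makes it sound like people dropped dead in middle age, yet nobody in this list died anywhere near 47. Three early deaths drag the number down, just like one billionaire dragged the bar’s average up.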

I have a surface understanding of Bayes’ Theorem; I know it involves updating your beliefs based on new information.
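As far as I understand it so far, the ‘updating’ works like this sketch, which borrows the spam filter idea. Every number here is made up, and `bayes_update` is a helper name I chose for the example.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return the updated belief after seeing one piece of evidence.

    Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E),
    where P(E) = P(E | H) * P(H) + P(E | not H) * P(not H).
    """
    p_evidence = p_evidence_if_true * prior + p_evidence_if_false * (1 - prior)
    return p_evidence_if_true * prior / p_evidence

# Made-up starting belief: 20% of incoming emails are spam.
prior_spam = 0.20
# Made-up rates: the word "free" shows up in 60% of spam and 5% of normal email.
p_free_if_spam = 0.60
p_free_if_not_spam = 0.05

# New information: this email contains the word "free".
belief_after_one = bayes_update(prior_spam, p_free_if_spam, p_free_if_not_spam)

# A second, similar clue arrives. Reusing the same rates assumes the two clues
# are independent of each other, which is a simplification.
belief_after_two = bayes_update(belief_after_one, p_free_if_spam, p_free_if_not_spam)

print(round(belief_after_one, 2))  # 0.75
print(round(belief_after_two, 2))  # 0.97
```

The belief starts at 20%, jumps to 75% after one clue, and then to about 97% after a second. Yesterday’s conclusion becomes today’s starting point, which is the ‘updating your beliefs’ part.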

Throughout my studies, I will document the rivalry between Ronald Fisher & Karl Pearson, the 2 fathers of statistics whose polarizing views changed the world of science. I also may dabble in William Playfair, as he allegedly invented the statistical graph.

Overall, this subject will comprise the math, visualization, and epistemology of the scientific method.

P.S. I passed statistics in college, yet I forgot it a week after the semester was over. But I passed, so that magically makes me an expert on this 😀

Calculus

For algebra & calculus, I’m currently reading Infinite Powers by Steven Strogatz. This follows the origins and philosophy of calculus and how it was used in many of our modern inventions like the microwave, plane engines, and even our iPhones.

So Infinite Powers will be my macro for calc, and for the micro I will be going through the formulas I encounter.

Feedback Loops

The Brilliant app is great for introducing me to the concepts, while Khan Academy has a more traditional ‘get it right or be stupid’ interface. So I’ll let Khan Academy tell me that I’m wrong, while Brilliant and other resources can tell me how I’m wrong.

I have over a dozen data science learning paths, so I’ll be using the overlapping subjects in order.

There are more than plenty of educational math platforms, so I won’t be starving for feedback on these subjects.

For you college folks, keep in mind I’m using most of the same resources we use in our self-studies. I know Pearson is the standard module for math, but they don’t tell you anything these free sites can’t; they’re a TESTING platform, not learning. To be so ‘formal’, you would think their explanations would be more elaborate. Those who have taken a college math class know EXACTLY what I’m talking about with Pearson’s explanations.

While I’m on the subject, Pearson is the most atrocious platform that represents everything I hate about American Public Education. Most of my business and STEM classes relied solely on this for our assignments. It’s a lazy copout for teachers to outsource the work so these ‘qualified experts’ don’t have to create their own assignments. A few teachers who were required to use this even expressed this themselves. Like I said, it prioritizes you just shutting up and doing the work and not you UNDERSTANDING the material. Most of our textbooks were partnered with Pearson. For the most part, colleges don’t have in-house books with exclusive knowledge you won’t find anywhere else. The information is centralized, so there is no such thing as ‘doing your own research’ with these assignments. There are numerous forums and subreddits of people posting their bad experiences with Pearson, and some of them are teachers! And to top it all off, you literally have to pay EXTRA money just to do your own assignments. Yup, after spending thousands of dollars for the class itself, you also have to kick out an extra $150+ just to do your own damn assignments. You ever seen a screenshot of the math problems where the student got the ‘wrong’ answer when the correct one was the exact same? THAT’S PEARSON.

And if it ever comes to that, I’m willing to hire a tutor for the concepts I’m struggling with. At least with a tutor I’m sure I’ll actually come out on the other end with new knowledge.

Motion Design

Motion Design (or graphics) is basically animation, just not as elaborate and large scale as the movies and TV shows. You see it every day, shoot, probably every hour. When you open an app and see the bouncing logo with all the pretty colors dancing around it? That’s MD. Come to think about it, I can’t remember the last time I’ve seen an ad that didn’t use it in some way.

It doesn’t matter how accurate my findings are. Nobody is going to care if I can’t communicate it. This media-centric era has people straying farther away from reading long-form texts. Oh well, I can’t do anything about that. All I can do is do my best to take these complicated ideas and condense them into an animation.

Drawing was never my strong suit, even my stick figures look like they have rickets. So this will be one of my big learning curves for this skill. As of now, perspective and light are the main elements I will focus on as a beginner.

This can also open new doors for storytelling. Plus I’d rather give y’all something to look at instead of a block of text.

I’m excited for this because this is my creative supplement. I REFUSE to strictly be a ‘numbers guy’.

Given that this is another vehicle, I’ll be using MD to recreate the physical characteristics of literally anything. This will also allow me to incorporate what I learn in physics, since it’s literally the study of motion.

To start off, I’ll most likely be doing human physicality; basic movement and sports.

I will be using Adobe Illustrator, Animate and After Effects.

Feedback Loops

Since this is technically an artistic medium, there is no right way to do MD. There is a lot of geometry and other numerical properties involved, but for the most part, MD is all about what ‘looks best’.

So this will be primarily dependent on organic feedback from the platforms I post my works on. Besides, I’m too self-critical to ever think that I’ve done something 100% right.

Programming

This is obviously the most intense aspect of data science. After looking through a variety of learning paths, I’ve concluded that Python, R & SQL are the top 3 languages used in data science.

FreeCodeCamp is my primary resource. I enjoy it because it holds your hand the least out of all the learn-to-code resources. Too many of them are drag & drop or too gamified. Real coding is not like that. I need to experience the struggle of my code not running because I missed a comma.

As of now, I have a basic understanding of HTML/HTML5 and CSS. I’m going through the visual design chapter now, so at some point I should be able to build my own website.

My goal is to complete the bootcamp within 3 months of me posting this site.

I also have numerous programming books and apps to reference, including Donald Knuth’s The Art of Computer Programming, Volumes 1-4A.

Documentation

I have a crude Word doc that I’m using to create my own coding dictionary. At some point this won’t be feasible, so I’ll eventually have to move onto a platform intended for this, most likely GitHub since I already have an account with them.

Like any other language, there is grammar and syntax in coding. I must be in a perpetual state of revising my code and finding a way to simplify it. One big drawback of me self-studying this subject is I will be mostly unaware of industry practices when it comes to presenting my code. I will have to consult my peers who work in tech and have them critique my work.

Feedback Loops

Like I said with FreeCodeCamp, I love that it doesn’t hold your hand on the challenges they give you. Though it can be frustrating because sometimes there is no Google answer for your problem. This means I’ll have to resort to consulting with others (on my level and above) and share our notes on what to do. Is this not the exact same thing I’d be doing in college?

Computer Science

Though ‘data science’ is a more attractive title, if I want to set myself apart, I have to be proficient in data engineering too.

Science is the sexy part; analyzing trends and asking questions to build these elaborate models that answer those giant questions. Engineering is the unsexy ‘back-end’ part. Data science is driving the car, data engineering is building and maintaining it.

We are still in the early stages of data science. Many non-Fortune 500 companies are skeptical of what A.I. and machine learning are, so I will still continue to learn. But I need to keep in mind that most companies do not have the tools to collect the data that I would analyze.

I’m hoping that learning computer science will give me the ability to fix the car or build one on my own if needed. Nothing terrifies me more than breaking down in the middle of nowhere and depending on someone else to help me.

Feedback Loops

This is still a brand new subject to me. I have documents of a few computer science college curriculums as well as self-learning ones. I also have a few connections with people who work in tech, so it’ll be great to get their criticisms.

I’m definitely lucky that there are people who took the time to lay out these curriculums along with their practical experiences. This is the beauty of writing on the internet, we can learn from each other’s mistakes.

To start, I have two introductory computer science books: CODE by Charles Petzold and Don’t Teach Coding Until You Read This Book by Handley & Foster. The former was published in 1999 and the latter in 2020, an almost perfect 20 years apart. I’m hoping these will give me a good overview of computer science from two different eras so I can observe its evolution.

Oh right, Film

This was the last vehicle I listed on my homepage.

This will be more of a side interest compared to my other subjects.

I have a deep admiration for the art of saying a lot by saying little. Film uses subtlety to communicate things we don’t observe in our naked reality.

As someone who seeks to understand reality, I’ve recognized that film is the medium that allows me to recreate it.

It’s a beautiful paradox: to truly understand reality, you must recreate it, yet to recreate it, you must understand it.

Making a film is the ultimate form of deconstruction, as I explained with the Bill Nye video. You must take a subject, break it down to its tiniest bits, and build it back up to its complete state. This requires me to look at obscure factors that most people gloss over in everyday life.

Also as someone who has an interest in a broad set of subjects, film allows me to take all of this disparate knowledge and put it into one product.

I get to be equal parts artist & engineer. I get to use my vivid imagination while grounding it with my cold logical reasoning. It’s one of the few professions where I have to oscillate between the passionate idealist and the rigid pessimist; the concrete and the abstract. Once again, I loathe the idea of being good at only one thing.

I like to analogize film with cooking. There are 2 parts, filmmaking and film theory: Filmmaking is pots, pans, measurements, utensils, temperatures, basic tools you need to get a finished dish. Film theory is taste. Should this be sweet or sour? Crunchy or chewy? Tough or tender? Subjective qualities that vary by the audience (or eaters in this case).

I’m super proud that I’m one of the few aspiring directors who started with theory. Many student filmmakers can tell you all about what camera they’re using or how many fancy lights they have, yet if you ask them a simple question about their story, they freeze up.

I’ve acted in a few short films, and I’ve spent hours in industry forums reading about people’s experiences with bad directors. Every last flaw of a bad director, whether they’re too passive or strict, is a result of not understanding their story.

So for now, most of my film posts will be analyses and any tidbits I find pertaining to tools or the industry itself.

I’m currently writing my own script, aiming to shoot it by late 2022. I’m already seeing how different it is from others, but of course you have no reason to take me seriously on that from the outside looking in. I accept that. I know what I see.

Why Can’t I Do a Data Science Project Now?

This is the #1 question I will ask myself on a regular basis.

  1. My programming skills are too weak. I REALLY wish I could just do the learn-as-you-go projects, but not having a basic understanding of coding would be too much of an uphill battle.
    • As I said, I’m currently going through the FreeCodeCamp bootcamp. This will give me a basic overview of what coding is. While Python is the most popular language in general, I will instead start with R on a separate platform, given that it’s one of the more difficult functional languages. SQL is also on my list, but I’ll do that a little later since it’s technically a database query language rather than a general-purpose programming language.
  2. I have to learn how to ‘clean’ datasets. Like I said, real-world data is messy, so there’s no use in getting spoiled by the mock ones already made up. As of now, I’m thinking I’ll have to start with the mock sets, but I will definitely have to go collect my own data and clean it. Google says a project takes 2 weeks to 6 months, with 60% of the time spent cleaning.
  3. Have to get a little further in my math. I know of the formulas, but I don’t understand them enough to apply them to projects.
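
To make ‘cleaning’ from point 2 a bit more concrete, here is a minimal Python sketch of what it can look like on a tiny scale. The rows, field names, and the list of ‘missing’ markers are all made up for illustration; real datasets will be messier and would likely call for a dedicated library rather than plain Python.

```python
# Made-up survey rows; every name and value here is hypothetical.
raw_rows = [
    {"name": "  Alice ", "age": "34", "city": "Chicago"},
    {"name": "Bob", "age": "N/A", "city": "chicago "},
    {"name": "  Alice ", "age": "34", "city": "Chicago"},  # exact duplicate
    {"name": "Cara", "age": "", "city": "Detroit"},
]

# Assumed spellings of "no value"; a real dataset may use others.
MISSING_MARKERS = {"", "n/a", "na", "null", "none"}


def clean_row(row):
    """Trim whitespace, standardize city casing, and turn missing ages into None."""
    name = row["name"].strip()
    city = row["city"].strip().title()
    age_text = row["age"].strip()
    age = None if age_text.lower() in MISSING_MARKERS else int(age_text)
    return {"name": name, "age": age, "city": city}


def clean_dataset(rows):
    """Clean every row, then drop duplicates while keeping the original order."""
    seen = set()
    cleaned = []
    for row in rows:
        tidy = clean_row(row)
        key = (tidy["name"], tidy["age"], tidy["city"])
        if key not in seen:
            seen.add(key)
            cleaned.append(tidy)
    return cleaned


cleaned_rows = clean_dataset(raw_rows)
for row in cleaned_rows:
    print(row)
```

Even in this toy case, most of the code deals with whitespace, inconsistent casing, missing values, and duplicates rather than any analysis, which fits the claim that cleaning eats up a large share of a project.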

Starting Takeaways

I’ve been informally self-educating for the past 5 years. I’m mad at myself that it took me this long to make it, but this site is my way of making it official and showing that I’m not just another impractical know-it-all who does a lot of basic Googling. I will actually progress through subjects and not just tickle the surface for the sake of passing a test.

The top result for ‘types of data science projects’ shows 4:

  1. Exploratory Analysis
  2. Data cleaning
  3. Visualization
  4. Machine Learning

As a beginner, I’m going for quantity over quality. To stop myself from overcomplicating, my short-term goal is to do these 4 types in various domains and fail them as soon as possible.
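
As a rough idea of how small a first attempt at the exploratory analysis type could be, here is a sketch using only Python’s standard library. The step counts are invented numbers, and the ‘two standard deviations from the median’ rule is just one assumed way to flag unusual values, not an established standard.

```python
import statistics

# Hypothetical daily step counts; invented numbers for illustration only.
daily_steps = [4200, 5100, 3900, 12000, 4800, 5300, 4600]

# Basic summary statistics: the first questions to ask of any new dataset.
summary = {
    "count": len(daily_steps),
    "mean": statistics.mean(daily_steps),
    "median": statistics.median(daily_steps),
    "min": min(daily_steps),
    "max": max(daily_steps),
}

# Flag values far from the median as candidates for a closer look.
# The 2x multiplier is an arbitrary threshold chosen for this sketch.
spread = statistics.stdev(daily_steps)
outliers = [x for x in daily_steps if abs(x - summary["median"]) > 2 * spread]

print(summary)
print("Worth a closer look:", outliers)
```

The gap between the mean and the median here is itself a finding: one unusual day pulls the average up, which is the kind of question exploratory analysis is meant to surface.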

Overall, these are my current goals to bring me closer to doing a project:

  1. Start learning R
  2. Gather my math concepts
  3. Of course continue the FreeCodeCamp bootcamp
  4. Learn about cleaning data
  5. Go through the Adobe tutorials for motion design

I was trying to decide whether I should do one post per subject or put them all in one post. Since I’m a beginner, I’ll start with one post for each subject since I’m in information-gathering mode. Otherwise, each post would be pages long.

I stopped studying a few months ago to focus on this site, so I’m rusty on most of it. My first few posts will be me regaining traction.

Idiotic Conclusion

The absolute most dangerous attitude I must stay away from is assuming that I have all the right answers, no matter how valid my findings are.

Many of the formally educated assume that they know everything there is to know just because they did the assignments and got their piece of paper. The piece of paper doesn’t automatically correct all those questions you got wrong, you still have blind spots. There is always more to be done and there is always something that was missed.

It’s a curse and a blessing that my problem is the exact opposite of what others face. Most people struggle to get the ball rolling, while I struggle with knowing when to stop the ball. It’s an infinite state of revising and self-critiquing on how I can make my work better. This can lead to analysis paralysis, but at least I won’t have any trouble getting started.

Best of all, even if I don’t precisely become a ‘data scientist’, I will still have been practicing practical math, computer science, and visualization full-time. I guarantee my understanding will be more fluid from doing this on my own than it would be had I depended on formal education. It’s not like I won’t have ANY use for these skills in the future.

And for the record, this is not ‘easier’. I know us young people like to look for shortcuts to success. This is not a ‘make $100k in the first month’ type of gig. I am very aware that this is MUCH harder and I’ll be spending a lot of time wandering in the dark. I just know that this will be better for me in the long run.

Learning to understand > Learning to pass

Many of you will find this to be idealistic, and I admit a lot of it is. To demonstrate my self-awareness, I will list the questions that I don’t have the answers to yet.

  • How do I plan to apply to companies without a degree when most companies have a screening system that automatically rejects resumes without degrees?
    • Not only that, most hiring managers aren’t well-versed in their fields, so even if I am competent, how could I prove myself to them?
  • Even if I go for smaller companies, how will I know they will possess the resources to collect data?
    • I can’t analyze data if they don’t have any data to begin with.
  • Which part of data science will I go into? I have to choose a branch.
  • Where will I get the datasets to do mock projects?
    • I have a couple resources, but I’ll still need to search for more. It’s not like college would supply me with any exclusive sets, they would have me going through mock ones too.
  • Beyond choosing which domain to go into, how will I learn about that domain? I can’t just learn data science and go into any field; I have to understand the domain itself.
  • How will I obtain the certificates that require a college degree?

For the last time, I’m aware that this is stupid. I’m aware that there are a lot of people who will turn their nose up at me for thinking I can teach myself something better than a trillion-dollar busi… I mean ‘non-profit institution’. Call me foolish all you want, I’m not making it up when I say that (most) formal education does not prioritize retaining knowledge, so excuse me for having some agency and wanting to ensure that I actually know what I’m doing. If I’m going to face the consequences of a decision, I’d rather it be my own than someone else’s. No one will be there to save me either way.

My biggest epiphany in life was when I stopped trusting the prestige of things, or ‘clinging to titles’. I used to believe that just because something was established and everyone went along with it, it must be ‘good’. I’ve had too many slaps in the face not to realize that the people who made these things aren’t any better than me or my peers. I’m no longer impressed by how long something has been around or how many pieces of paper claim that it’s SUPPOSED to be good; that does not make the thing good in itself.

For those who don’t know me, I’m not trying to be some rebellious kid who’s going against the grain just to look edgy. I truly used to believe that college was a place for the ‘highly intelligent’ and those who couldn’t get in were just too stupid. I feel like an absolute moron for ever subscribing to that belief. It’d be one thing if I was some reject who was bitter over not being able to get in, but I have BEEN THROUGH THE SYSTEM ITSELF, AND IT IS NOWHERE NEAR WHAT OUR SUPERIORS HAVE MADE IT OUT TO BE. Please do your own research before you go five figures into debt just to prove you’re comfortable with not asking questions.

I’m operating off of 3 things that I know for a fact.

  1. Information is now the #1 commodity of the world.
  2. College does not prepare you for discerning information (in most cases).
  3. Technology isn’t going anywhere.

So life can only go 1 of 2 ways for me now…

#1. Worst case scenario, I have a piece of paper that shows I know how to shut up and do what I’m told. It may get me a job, but my competence will strictly be dependent on what I do in that job (which will likely not be that comprehensive). Therefore, education doesn’t truly start until you’re already in the fire with real-world consequences. Older folks, tell me it’s not frustratingly often that you have to deal with stuck-up college grads who have all the book smarts in the world just to be a complete dunce when it comes to practical tasks.

Or

#2. Worst case scenario, I’ve built a multitude of skill sets and can create tangible value. I’m not against being employed, but at least I won’t be a robot who’s only had one programmer. Even if I’m not as competent as I want to be, I’d have demonstrated initiative in starting and maintaining this site. Isn’t initiative the #1 quality employers look for?

You know how some employers will make you wait hours for an interview just to see how much BS you’ll put up with? The waiting in itself isn’t doing anything for you; you’re just brownnosing. I don’t see college as any different than that.

I don’t think that I’m above having to prove my discipline. I DO believe that I’m above doing meaningless work just to show how good of a sheep I can be.

We’ll see how this goes.
