How to Deal With Units as a Data Scientist

Being able to manage units is critical, but most data scientists and developers have a loose understanding of them.

A unit of measurement denotes how a value is measured. For instance, something might be 4 pounds, 4 seconds, 4 inches, etc. All of these measurements contain the same value, but the units make the measurements fundamentally different.

When working as a data scientist or developer, it’s easy to ignore units. Unlike in engineering and hard sciences like physics and chemistry, learning about units in data science is often an “exercise left to the reader” (I’m guilty of the same thing in my Medium post on the frequency domain). On top of that, dealing with units within a programming language is often more difficult than with pencil and paper. It can be tempting as a developer to avoid discussing units, but units really are a vital and fundamental topic when working in many domains.

Who is this useful for? Just about any data scientist, or anyone working with data.

How advanced is this post? The topics discussed in this post should be accessible to anyone at any skill level with a firm grasp of basic algebra.

What will you get from this post? A conceptual and mathematical understanding of units, and a few ways of dealing with units as a developer or data scientist.

If you learn something, consider following and checking out my other work! Ok, let’s get started.

Unit Systems

There are different unit systems used throughout the world. The most common and widely used unit system is the International System of Units (SI), which features familiar words like meters, kilograms, and seconds.

There are other unit systems, which vary in popularity by region and discipline. Pounds and inches fall within the imperial and US customary systems, which are used in the USA and a handful of other countries. A furlong, which is equal to one eighth of a mile, was originally derived from the distance a team of oxen could plow without resting. Furlongs are still used in horse racing and city infrastructure.

As you might infer, units can get messy, and fast. Often multiple units of measurement, from different unit systems, are used within a single problem. Keeping them straight can be daunting, and failing to do so can lead to wrong answers. In this post I’ll explain the intuition behind units, and teach you how to manage them.

Base Units

A base unit is the most fundamental unit in a unit system. These are the units that can’t be further broken down into some other units. In SI:

  1. m (meter): The fundamental unit of length

  2. s (second): The fundamental unit of time

  3. kg (kilogram): The fundamental unit of mass

  4. mol (mole): The fundamental unit of amount of substance (particle count)

  5. K (kelvin): The fundamental unit of thermodynamic temperature

  6. cd (candela): The fundamental unit of luminous (light) intensity

  7. A (ampere): The fundamental unit of electrical current

You might be wondering, if they’re so fundamental, what’s up with the relationships between them? For instance, how are meters (m) related to seconds (s)?

There is no way to squash or stretch a second to equal a meter, but a meter is defined as the distance light travels in a vacuum during 1/299,792,458 of a second (source). In other words, you cannot make two different base units equivalent, but they can be related through certain universal constants, like the speed of light.
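As a small sketch of that relationship, the snippet below multiplies the speed of light by a duration to get a distance. The function and constant names are made up for this illustration, and exact fractions are used only to avoid floating-point rounding.

```python
from fractions import Fraction

# The speed of light in a vacuum, in meters per second (exact by definition in SI).
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def light_travel_distance_m(duration_s):
    """Distance in meters that light covers in a vacuum during duration_s seconds."""
    return SPEED_OF_LIGHT_M_PER_S * duration_s


# Light travelling for 1/299,792,458 of a second covers exactly one meter.
one_meter = light_travel_distance_m(Fraction(1, 299_792_458))
print(one_meter)  # 1
```

The constant does not turn seconds into meters. It carries the unit m/s, so the seconds cancel and meters remain.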

Unit Conversions

As previously mentioned, you can’t squash or stretch a second to equal a meter. However, you can squash and stretch a second to equal a minute. 1 minute = 60 seconds, easy.

If two units measure the same fundamental thing, you can easily convert between the two with division, multiplication, and occasionally addition and subtraction (like Celsius to Fahrenheit, for instance).
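Temperature is the classic case where multiplication alone is not enough. Here is a minimal sketch of the Celsius and Fahrenheit conversion, with function names chosen just for this example:

```python
def celsius_to_fahrenheit(temp_c: float) -> float:
    """Scale by 9/5, then shift by 32. The shift is why this is not pure multiplication."""
    return temp_c * 9 / 5 + 32


def fahrenheit_to_celsius(temp_f: float) -> float:
    """Undo the shift first, then undo the scaling."""
    return (temp_f - 32) * 5 / 9


print(celsius_to_fahrenheit(100))  # 212.0
print(fahrenheit_to_celsius(32))   # 0.0
```

Note the order of operations: going one way you scale and then shift, and going back you have to undo the shift before undoing the scale.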

Because the vast majority of unit conversion is done with multiplication and division, you can use a standard process to help wrangle your unit conversions. Dimensional analysis is a complicated name for a simple system of converting one unit to another. The idea is to think of unit conversions as multiplying by a well-chosen factor which equals 1.

Let’s use the conversion of seconds to hours as an example for dimensional analysis. Because 1 minute = 60 seconds, the ratio 1min/60sec is equal to 1: the value of the numerator is equivalent to the value of the denominator. Likewise, 1 hour is equal to 60 minutes, so 1hour/60min equals 1. Because you can multiply any value by 1 and maintain the same value, we can multiply by our conversion ratios to convert 1 second to hours:

1sec = 1sec * 1min/60sec * 1hour/60min = 1/3600 hours ≈ 0.00028 hours.

It’s OK if you’re still confused; this isn’t really the best way to represent this type of operation. It’s much better to use dimensional analysis lines. We can draw a grid-like structure where all multiplication is done on the top, and division is done on the bottom.

The beauty of this representation is when you start crossing out units. For every instance of a unit that appears both at the top and the bottom, you can cross that unit out. If you’ve done your unit conversion correctly, only the unit you are converting to should remain.

For this example, the unit conversion is trivial. However, this method of organization becomes vital with more complex unit conversions.
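The seconds-to-hours chain can also be sketched in Python. This is only an illustration, the constant and function names are invented for this example, and the comment tracks which units cancel at each step.

```python
SECONDS_PER_MINUTE = 60
MINUTES_PER_HOUR = 60


def seconds_to_hours(seconds: float) -> float:
    """Convert seconds to hours by chaining two conversion ratios that each equal 1."""
    # sec * (1 min / 60 sec) * (1 hour / 60 min)
    # seconds cancel, then minutes cancel, and only hours remain
    return seconds / SECONDS_PER_MINUTE / MINUTES_PER_HOUR


print(round(seconds_to_hours(1), 5))  # 0.00028
print(seconds_to_hours(3600))         # 1.0
```

Keeping each ratio as a named constant makes it easier to see which unit each step removes.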

Compound Units

A compound unit is a combination of multiple base units. The quintessential example for this is speed: miles per hour, kilometers per hour, feet per second, etc.

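We can convert miles per hour to feet per second using dimensional analysis. A quick Google search will reveal that 1 mile = 5280 feet, and we already know that 1 hour = 3600 seconds. With that, we can plug everything into dimensional analysis lines and cancel miles and hours, leaving feet per second: 1 mile/hour * 5280ft/1mile * 1hour/3600sec ≈ 1.47 ft/sec.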
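Here is a rough Python sketch of the miles-per-hour conversion. The names are chosen for this example only, and the unit suffix on each name is there to make the cancellation visible.

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600


def mph_to_feet_per_second(speed_mph: float) -> float:
    """Convert a speed from miles per hour to feet per second."""
    # (mile / hour) * (5280 ft / 1 mile) * (1 hour / 3600 sec)
    # miles cancel and hours cancel, leaving ft / sec
    return speed_mph * FEET_PER_MILE / SECONDS_PER_HOUR


print(mph_to_feet_per_second(60))  # 88.0
```

Because speed has a unit on the top and a unit on the bottom, it needs one conversion ratio for each.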

Not every unit announces itself as a compound unit. The volt, for instance, is actually a compound unit: in SI base units, one volt is one kg·m²/(s³·A).

The same goes for units of force like newtons and pounds, units of acceleration, units of power like watts, and many other measurements. The relationship between some base and compound units can appear arbitrary and vague, but when you learn the relevant domain, the relationships are usually very elegant. Fortunately, you don’t have to understand the domain; you just have to be able to Google base units and conversions.
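One way to see how compound units break down is to track the exponent of each base unit. The sketch below is a toy representation made up for this post, not a real units library. It builds the volt from kilograms, meters, seconds, and amperes using the standard definitions of the newton, joule, and watt.

```python
# Each unit is a mapping from an SI base-unit symbol to its exponent.
KILOGRAM = {"kg": 1}
METER = {"m": 1}
SECOND = {"s": 1}
AMPERE = {"A": 1}


def multiply(a: dict, b: dict) -> dict:
    """Multiplying units adds the exponents of matching base units."""
    result = dict(a)
    for base, exponent in b.items():
        result[base] = result.get(base, 0) + exponent
    # Drop base units that cancelled out entirely.
    return {base: exp for base, exp in result.items() if exp != 0}


def divide(a: dict, b: dict) -> dict:
    """Dividing by a unit is multiplying by it with every exponent negated."""
    return multiply(a, {base: -exp for base, exp in b.items()})


ACCELERATION = divide(METER, multiply(SECOND, SECOND))  # m / s^2
NEWTON = multiply(KILOGRAM, ACCELERATION)               # force = mass * acceleration
JOULE = multiply(NEWTON, METER)                         # energy = force * distance
WATT = divide(JOULE, SECOND)                            # power = energy / time
VOLT = divide(WATT, AMPERE)                             # voltage = power / current

print(VOLT)  # {'kg': 1, 'm': 2, 's': -3, 'A': -1}
```

Crossing out units on dimensional analysis lines is the same operation: a unit on top and the same unit on the bottom have exponents that sum to zero.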

Practical Unit Management in Code

Using a programming language can be great for many things, but unfortunately most languages are not able to conveniently handle units. It’s really up to the developer to keep units in mind. This can be very difficult, especially within large and complex codebases.

These are a few tricks I’ve learned:

  1. Use units in variable names: instead of a variable name like length, you can use length_in. When converting between units, you can assign the same values in different units as fundamentally different variables. This is inefficient from a memory perspective, but can be useful when dealing with complex unit transformations within a complex codebase.

  2. Use Data Frames: Data frames, like Pandas Data Frames, encapsulate data into a tabular structure, and include certain labels like column names and hierarchical indexes. You can use objects like this to help separate certain types of data with different units.

  3. Bake units into your code flow: For unit-heavy applications, it might be useful to build special classes to encapsulate units, and think critically about how those units flow through the functions, objects, and methods within your codebase.

  4. Convert to a single unit system: You can choose a unit system and stick with it. If you have several inputs to a system, you can do unit conversions at the point of ingestion to ensure compatibility.
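Here is a minimal sketch that combines tricks 1, 3, and 4: a unit suffix in a variable name, a small class that encapsulates a length, and conversion to meters at the point of ingestion. The class, its methods, and the lookup table are all hypothetical names invented for this example.

```python
from dataclasses import dataclass

# Conversion factors to one internal unit (meters). The table name is illustrative.
METERS_PER_UNIT = {"m": 1.0, "in": 0.0254, "ft": 0.3048, "mi": 1609.344}


@dataclass(frozen=True)
class Length:
    """A length stored internally in meters, regardless of how it arrived."""

    meters: float

    @classmethod
    def from_unit(cls, value: float, unit: str) -> "Length":
        # Convert at the point of ingestion; unknown units fail loudly.
        if unit not in METERS_PER_UNIT:
            raise ValueError(f"unknown length unit: {unit!r}")
        return cls(value * METERS_PER_UNIT[unit])

    def to(self, unit: str) -> float:
        """Return the length as a plain number expressed in the requested unit."""
        return self.meters / METERS_PER_UNIT[unit]

    def __add__(self, other: "Length") -> "Length":
        # Only lengths can be added to lengths.
        if not isinstance(other, Length):
            return NotImplemented
        return Length(self.meters + other.meters)


# Trick 1: a raw number carries its unit in the variable name.
board_length_in = 12.0

# Tricks 3 and 4: wrap the value and normalize it to meters on the way in.
board = Length.from_unit(board_length_in, "in")
total = board + Length.from_unit(1.0, "ft")
print(round(total.to("ft"), 6))  # 2.0
```

Storing everything in one unit internally means arithmetic never mixes units. Conversions only happen at the edges, when values enter or leave the system.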

Summary

And that’s it. You now understand how to use dimensional analysis to convert between units, and have an intuitive understanding of unit systems so that you can easily look up unit definitions and conversions in the future.

Follow For More!

In a future post I’ll be describing several landmark papers in the ML space, with an emphasis on practical and intuitive explanations. I also have posts on not-so-commonly discussed ML concepts.

Please like, share, and follow. As an independent author, your support really makes a huge difference!