I'm going to do something a bit strange, for someone who has sort of made "data collection" his brand. I'm going to tell you that you probably don't need the data you think you do.
In information security, the industry I've called home for the last half a decade, there is a bit of an obsession with data. Everyone wants the most data, the freshest data and the most niche data. Hell, a fair bit of the industry is built upon billion-dollar valuations for companies that essentially just repackage data for other companies.
Here's the problem, though. Most data is basically useless.
I know, 🔥 take there. But hear me out. If I told you that the temperature is going to drop 30 degrees overnight, I'm giving you a piece of data. That data alone isn't super useful to you. It's a fun fact to wow your particularly weather-obsessed friends with, but not much more than that.
However, being the forecast-savvy scientist you are, you know that when the temperature drops like that, rain usually follows, so you decide to pack an umbrella in your bag before heading out for work in the morning. You also check the local weather channel to confirm the forecast, just to be sure.
What happened there? Well, you processed, enriched and exploited the data!
You combined your real life experience with the data I gave you (processing) to come up with an initial analysis. You then went to the weather channel to add more supporting data (enrichment) to confirm your analysis. Finally, you exploited the data, meaning you took the data and did something with it.
That last bit is vital, and it's the reason I say "you probably don't need the data you think you do."
I've met and been a part of so many information security teams and functions that have the data obsession I mentioned earlier. Though I'm fairly "new" to the entrepreneur/SaaS/startup landscape, I can see it here too. I actually listened to a great podcast with Chris Do and Neil Hoyne, the Chief Measurement Strategist at Google, who talked about this very subject:
Data is absolutely useless unless you can, are willing to and know how to exploit it.
If you're selling a course and want feedback from the students, but the feedback is overwhelmingly negative and you're too prideful to make any changes to your content, then the data you're looking for, the student feedback, is utterly useless.
If you're a solopreneur trying to do an analysis of millions of social media posts to find out how people feel about a certain problem you're trying to solve with a startup, that data is useless unless you know how to create scalable solutions to analyze the data you collect and act on it. It might even be more than useless, because you might waste so much time trying to collect and analyze the data that you don't even bother launching your startup.
If you are working in a marketing function of an organization and want to gather information about the viability of marketing your product in Russia, even though sanctions would currently bar your company from selling in Russia, then the data that you're gathering is completely useless because you can't action that data.
Web scraping and data science, as they exist in public-facing business functions, have become a ridiculous measuring contest. Founders, data science teams, VCs, information security outfits and intelligence professionals spend the lion's share of their time comparing their data collection efforts and capacities, without any real effort to show that they can actually action that data.
As for me, as someone spinning up a consultancy focused specifically on data collection and analysis for small to medium-sized businesses and startups, I would much rather spend my time collecting data for a client that can actually exploit it. There is no purpose in me firehosing a bunch of data into a database that my client can't access or use. Subjectively, it would be incredibly unfulfilling work. Objectively, it would be a waste of money for my client and would not likely lead to much good in terms of word-of-mouth recommendations for my consultancy either.
So, before you decide to start siphoning off data to some server to brag to your friends and coworkers about how much data you have collected, ask yourself...
"Do I need this data? Can I exploit it? Can I analyze it? Is this the right kind of data for my business?"
If the answer to any of those questions is no, it's best for you, and for me as your subject matter expert, to leave that data alone and build your data collection plan first.
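If you like your checklists executable, here is a minimal sketch in Python of those four questions as a go/no-go gate. The class name, field names and the example values are all made up for illustration; treat it as one way to write the habit down, not as a finished tool.

```python
from dataclasses import dataclass


@dataclass
class CollectionCheck:
    """Hypothetical pre-collection checklist; every name here is illustrative."""
    needed: bool        # Do I need this data?
    can_exploit: bool   # Can I exploit it?
    can_analyze: bool   # Can I analyze it?
    right_kind: bool    # Is this the right kind of data for my business?

    def failed_questions(self) -> list[str]:
        # Pair each question with its answer and keep the ones answered "no".
        answers = {
            "Do I need this data?": self.needed,
            "Can I exploit it?": self.can_exploit,
            "Can I analyze it?": self.can_analyze,
            "Is this the right kind of data for my business?": self.right_kind,
        }
        return [question for question, ok in answers.items() if not ok]

    def should_collect(self) -> bool:
        # A single "no" is enough to stop and go back to the collection plan.
        return not self.failed_questions()


# Example: market research for a region you are barred from selling in.
# You might be able to analyze the data, but you cannot act on it.
sanctioned_market = CollectionCheck(
    needed=True, can_exploit=False, can_analyze=True, right_kind=True
)
if not sanctioned_market.should_collect():
    print("Hold off. Unanswered:", sanctioned_market.failed_questions())
```

The point of returning the failed questions, rather than a bare yes or no, is that they tell you what your data collection plan still has to solve before anything gets scraped.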
Interested in building data collection and analysis capacities for your small business or startup?
This is my bread and butter. I have built data collection functions at several billion-dollar companies over the last several years. Let me empower your business with the data and analytics you need to succeed.
Reach out and let's talk:
mitch@secresearch.io