With data more critical than ever to companies’ success, Python is spreading beyond the realm of data professionals and being adopted by business analysts and other less technical users. But what are the opportunities if you’re relatively new to Python and what best practices should you be aware of to ensure your success?
Data professionals are a precious commodity, and in many organizations the demands of the business have outgrown the capacity of data teams. At the same time, business analysts are running into the limits of what BI tools can do for them and are looking for ways to do more advanced analytics. Python can help close that gap.
In recent years, the Python community has created new frameworks and packages that make the language more accessible to non-professional developers for advanced analytics, machine learning, and app development. Examples include NumPy, an open source Python library for numerical computing; Prophet, for running forecasts; and H3, a project begun at Uber for manipulating geospatial data.
Python’s spread to non-professional developers isn’t without precedent. A similar pattern played out with the rise of self-service BI tools, and with business people learning to script their own Excel macros. The expanded use of Python will be even more impactful because the language itself is so capable.
Getting started with Python analytics
Business users often understand better than professional developers what specific insights will be most helpful to their business units, and there are several entry-level use cases where they can start putting Python to work. Here are three examples:
Correlation matrices
A correlation matrix is a table that shows the correlation coefficients between pairs of variables. It lets you analyze different dimensions of a data set to determine if a person who exhibits behavior A, for example, is also likely to exhibit behavior B. Correlation matrices are useful for determining which items to place near each other in a grocery store, or which additional items to offer when an ecommerce user is checking out.
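As a minimal sketch of the idea, the following uses NumPy's `corrcoef` on a small, made-up basket data set. The product names and purchase counts are hypothetical and chosen only for illustration, and `strongest_pair` is a helper invented for this example.

```python
import numpy as np

# Hypothetical data: units bought per store visit (rows) for three products (columns).
products = ["chips", "salsa", "toothpaste"]
purchases = np.array([
    [2, 1, 0],
    [4, 2, 1],
    [0, 0, 1],
    [3, 2, 0],
    [1, 0, 1],
    [5, 3, 0],
])

# rowvar=False tells NumPy that each column is a variable and each row an observation.
corr = np.corrcoef(purchases, rowvar=False)


def strongest_pair(matrix, labels):
    """Return the two distinct labels with the highest correlation, plus the coefficient."""
    masked = matrix.copy()
    np.fill_diagonal(masked, -np.inf)  # ignore each variable's correlation with itself
    i, j = np.unravel_index(np.argmax(masked), masked.shape)
    return labels[i], labels[j], matrix[i, j]


first, second, coefficient = strongest_pair(corr, products)
print(np.round(corr, 2))
print(f"Most strongly correlated pair: {first} and {second} ({coefficient:.2f})")
```

In this toy data, the pair that comes out on top is the kind of candidate you might shelve together or suggest at checkout. A high coefficient shows that two behaviors move together, not that one causes the other.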
Principal component analysis
Another possible starting point is principal component analysis, which can reduce the size of a noisy data set by finding the combinations of attributes that account for most of the variation in the data. If a company sells mortgages, for example, a principal component analysis can condense many demographic factors (income, ZIP code, marital status, etc.) into a few components, which can then be examined to see which are most predictive of a sale, helping to target campaigns and offers.
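To make this concrete, here is a minimal sketch of principal component analysis built on NumPy's singular value decomposition, rather than on a dedicated library class. The data is synthetic, and the attribute names (`income`, `home_price`, `age`) and the `pca` helper are assumptions made up for the example.

```python
import numpy as np


def pca(data, n_components):
    """Project data onto its top principal components.

    Returns (scores, components, explained_variance_ratio).
    """
    centered = data - data.mean(axis=0)  # PCA works on zero-mean columns
    # SVD of the centered data: the rows of vt are the principal directions,
    # ordered from most to least variance explained.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variance = singular_values ** 2 / (len(data) - 1)
    ratio = variance / variance.sum()
    components = vt[:n_components]
    scores = centered @ components.T
    return scores, components, ratio[:n_components]


# Synthetic, made-up applicant data: home_price is built to track income closely,
# while age is generated independently of both.
rng = np.random.default_rng(seed=0)
income = rng.normal(70, 15, size=200)
home_price = 4 * income + rng.normal(0, 5, size=200)
age = rng.normal(40, 10, size=200)
data = np.column_stack([income, home_price, age])

# Standardize so that no attribute dominates simply because of its units.
standardized = (data - data.mean(axis=0)) / data.std(axis=0)

scores, components, ratio = pca(standardized, n_components=2)
print("Share of variance explained by each component:", np.round(ratio, 3))
```

Because two of the three synthetic attributes carry nearly the same information, two components are enough to capture almost all of the variation. Note that the method never looks at an outcome such as a sale; relating the components to an outcome is a separate modeling step.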
Forecasting
Another common problem for businesses is forecasting. Think of predicting customer demand, sales, or revenue, which all mature businesses need to do. Building forecasts is a way to explore predictive analytics, using open source libraries such as Prophet or scikit-learn in Python.
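The simplest possible forecast is a straight-line trend, sketched below with NumPy's `polyfit`. This is a deliberately basic stand-in and not Prophet's or scikit-learn's API; dedicated libraries also handle effects such as seasonality that a straight line ignores. The sales figures and the `linear_trend_forecast` helper are hypothetical.

```python
import numpy as np


def linear_trend_forecast(history, periods):
    """Fit a straight line to past values and extend it `periods` steps ahead."""
    t = np.arange(len(history))
    slope, intercept = np.polyfit(t, history, deg=1)
    future_t = np.arange(len(history), len(history) + periods)
    return slope * future_t + intercept


# Hypothetical monthly sales figures, oldest first.
monthly_sales = np.array([100.0, 104.0, 109.0, 115.0, 118.0, 124.0, 130.0, 133.0])

forecast = linear_trend_forecast(monthly_sales, periods=3)
print("Next three months (trend only):", np.round(forecast, 1))
```

A baseline like this is useful mainly as a yardstick: a more sophisticated model should be able to beat it on held-out data before anyone relies on its forecasts.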
Great power, as they say, brings great responsibility, and there are best practices that new Python users should employ to ensure that the applications they build are robust and secure.
Python care and feeding
One issue is maintaining Python packages to ensure that dependencies are properly managed. Anaconda is helpful here, because it greatly simplifies package management and deployment. With Snowflake’s Snowpark for Python, we pre-install the most popular Python packages from the Anaconda defaults channel into our Python runtime so they don’t have to be installed manually. We’ve also integrated the Conda package manager into Snowpark to manage Python packages and their dependencies.
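If you do manage an environment yourself, a useful first step is simply knowing what is installed. The sketch below uses only the Python standard library (`importlib.metadata`); it is not part of Conda or Snowpark, and the package list and the `installed_versions` helper are hypothetical.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Hypothetical list; replace it with the packages your own project relies on.
required = ["numpy", "pandas", "scikit-learn", "prophet"]

report = installed_versions(required)
for name, version in report.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Running a check like this at the start of a notebook or script makes a missing package obvious up front, rather than surfacing as an import error halfway through an analysis.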
As with any data project, there are security and governance issues to be aware of, but modern cloud data platforms provide a runtime that is already set up and configured, and users can take advantage of the security and governance capabilities built into those platforms. For example, the Python runtime in Snowpark disallows external network access by default to protect against common security concerns such as data exfiltration. For novice Python users, using a pre-configured secure Python runtime like Snowpark is much easier than creating and maintaining their own environments or containers.
It’s early days still, and over time I expect additional Python tools and resources aimed specifically at non-professional developers to emerge. One area that needs to evolve is the methods by which Python users can share the outputs of their work with colleagues who don’t want to learn the language themselves. Snowflake’s purchase of Streamlit was intended in part to address this. The open source tool allows data teams to build applications that bring data to life visually for non-technical users. Python itself is a powerful language for building applications, so its use in building data applications for end users will make the language even more widely adopted.
To get started, RealPython offers a comprehensive beginner’s guide to Python, and Full Stack Python links to many resources. The Python Software Foundation has an active community where experienced users provide advice and answer questions for all ability levels.
If you’re a Snowflake user, read about our Snowpark developer environment, which natively supports Python development. You can also join one of the many Snowflake community user groups worldwide, which arrange meetups to discuss technical developments and opportunities.
Torsten Grabs is director of product management at Snowflake.
New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content.
Copyright © 2022 IDG Communications, Inc.