Social media analysis with Flask, Part I
A Flask application that presents social media account analysis in the form of a dashboard. It transforms various data sources into a clear and concise report. This part describes the backbone of the application: basic Flask configuration, Travis continuous integration (with tests) and Heroku continuous deployment to multiple environments with a single button. The application is under version control, tested and CI/CD ready.
The series consists of:
- Social media analysis with Flask, Part I (Setting up the environment, Flask, Travis CI/Heroku CD)
- Social media analysis with Flask, Part II (Templates, login/register mechanism, data storage)
Part I
- Application description
- Setting up a development environment
- Basic file structure
- Documentation
- Setting Flask application
- Heroku deployment
- Continuous integration
- Summary
Application description
A web application displaying a summary of the social media account analysis. Analyzed properties include:
- time-line statistics
- posts semantic analysis
- account followers list (names, number of connections)
- account friends list (names, number of connections)
- followers graph
- friends graph
- friends/followers cross-check between different services
Roles specification
Story to implement: As a user, I want to see the results of my social media account analysis in the form of a dashboard page. Steps:
- I go to “/” page
- I see “Landing page” with “sign in” and “register” buttons
- I press “sign in” button
- I am redirected to “/signin” page
- I enter credentials
- I am redirected to “/dashboard/user”
- I press “Register” button
- I am redirected to “/register” page
- I enter credentials
- I am redirected to “/dashboard/user” page
- I see “Connect [SocialMediaName] account” button, where SocialMediaName is e.g. Twitter, Facebook, GitHub or LinkedIn
- I follow the OAuth authorization flow
- I get redirected to “/dashboard/user/[SocialMediaName]” page
- I see:
  - timeline statistics (number of my posts, % of posts with a response etc.)
  - general assessment of all my posts (positive, neutral, negative)
  - social network analysis, i.e. followers and/or friends statistics (counts of both groups)
  - a drop-down menu to display list of my:
    - if applicable - followers. Each contains:
      - A name, i.e. Twitter handle, like “@foo” or Facebook screen name
      - A number of followers following this person
      - A number of my followers that this person follows (for example, Bob and Rick follow me, Andrew follows both of them, so the number is 2)
    - if applicable - my friends. Each contains:
      - A Twitter handle, like “@foo” or Facebook screen name
      - A number of friends of this person
      - Common friends number
  - followers graph (users as nodes, connections as edges)
  - friends graph (users as nodes, connections as edges)
- I can export chosen list to a JSON or CSV file
- I can export chosen graph to JSON or CSV
- I can log out from the session
- When I log out I am directed to “/” page
- When I am logged in, I can see a link to my user settings and open it at the “/user/name” address
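The follower statistic from the story above (the number of my followers that a given person follows) reduces to a set intersection. A minimal sketch, assuming the follow relations are already available as plain Python sets; the function name and the handles are hypothetical and only mirror the Bob/Rick/Andrew example:

```python
def followed_followers_count(person, my_followers, follows):
    """Count how many of my followers the given person follows.

    follows maps a handle to the set of handles that account follows.
    """
    return len(my_followers & follows.get(person, set()))


# Hypothetical data mirroring the example: Bob and Rick follow me,
# Andrew follows both of them.
my_followers = {"@bob", "@rick"}
follows = {
    "@andrew": {"@bob", "@rick"},
    "@bob": {"@me"},
    "@rick": {"@me"},
}

print(followed_followers_count("@andrew", my_followers, follows))  # prints 2
```

The same intersection, applied to friends instead of followers, would give the "common friends" number.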
Other requirements
- Application should be Heroku-deployable (please provide a link to the working deployment in the README).
- Application should be documented.
- Quick-start guide to deploy own version of the application
- User manual how to operate deployed application
- Application should be tested (min 90% test coverage)
- Application should be version controlled (git)
Setting up a development environment
Make sure that Python is installed. Next, we have to make sure we have all the necessary packages. A simple pip list will show all installed packages. New Python distributions come with the basic package manager pip and tools that help with package distribution: setuptools and wheel.
Another very important tool is virtualenv, which will help us deal with dependency issues. Virtualenv makes it easy to control which versions of Python and of third-party packages are used by the project. Another consideration is that installing packages system-wide generally requires elevated privileges (sudo pip install bazz). By using virtualenvs, you can create Python environments and install packages as a regular user. But first virtualenv must be installed in your system Python:
Virtual environments management
Now that we have the proper tools installed, we are ready to create our first Flask application. First of all, we need a virtual environment specific to our application. It is good practice to separate the environment’s working directory (with all installed packages) from the application folder, simply to avoid putting heavy folders under version control. Some virtual environments can take hundreds of MB, and there is no use in keeping track of them. Alternatively, we can create the virtual environment in the project’s folder and add this folder to the .gitignore file. What I usually do is create a separate .envs folder in the user’s home directory and keep all virtual envs there.
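The layout described above (all environments kept in a .envs folder in the home directory) can be sketched in Python. Note that this sketch swaps in the standard-library venv module for virtualenv, so it approximates the setup rather than reproducing the same tool; the helper names are hypothetical, and md_analytics is the environment name used in this article:

```python
import venv
from pathlib import Path


def env_dir(name, base=None):
    """Return the directory of a named environment, by default under ~/.envs."""
    base = Path(base) if base is not None else Path.home() / ".envs"
    return base / name


def create_env(name, base=None, with_pip=True):
    """Create the environment with the stdlib venv module and return its path."""
    target = env_dir(name, base)
    target.parent.mkdir(parents=True, exist_ok=True)
    venv.create(target, with_pip=with_pip)
    return target


if __name__ == "__main__":
    # Only prints the location; call create_env("md_analytics") to build it.
    print(env_dir("md_analytics"))
```

Keeping the environment outside the project folder means nothing has to be added to .gitignore for it.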
Now we can activate the md_analytics virtual environment:
From now on, whenever we want to work on our app and the environment is not activated, we have to activate it first. The changed prompt will inform us that we now use the “local” Python. All packages installed while the environment is active will be installed into it. To deactivate the environment, simply type deactivate.
Basic file structure
Flask does not require an application to follow a particular folder structure, the way Django does. The only requirement is that files keep names understood by Flask. Although it is not required, it is advised to structure the app a bit to ensure maximum modularity and to speed up development, testing etc. Our app structure consists of the following parts:
- docs - package documentation linked to ReadTheDocs
- app module - the app itself with login, register, dashboard and user components
- requirements - dependencies for each environment
- tests - suite of tests for the entire app
- boilerplate files for version control, coverage metrics and deployment
General structure is listed below:
Most of the files in this layout are self-explanatory. Some files, like app.json or Procfile, are elements of the Heroku machinery and will be described in detail later. The rest belong to the VCS or CI tooling and ensure smooth development. A table with file descriptions is given below:
File | Description
---|---
app.json | Orchestrates steps involved in automatic deployment to Heroku.
Procfile | Declares commands run by application on the Heroku platform.
manage.py | Entry-point for executing our application
.travis.yml | Instructions for TravisCI service
docs/conf.py | Sphinx configuration
app/app.py | Flask app factory
app/settings.py | Flask application configuration variables
GitHub
We use git for version control. The basics of git are beyond the scope of this article; please refer to an excellent git tutorial. Here I would like to focus on the GitHub features related to project management. GitHub offers issues and projects support. Issues are about specific bugs or features reported by users/developers, while a project shows the general status of work progress. Issues can be grouped into milestones, which usually denote a specific software release. This is very useful for time management.
Here are some issues, which are “enhancements” of the app from current version v0.0.2 (working skeleton integrated with Travis and Heroku) to the pre-released alpha with landing page and dashboard functionalities v0.0.3.
Documentation
The project’s documentation is placed in the docs folder and is based on the excellent Python Sphinx module. The folder is linked to ReadTheDocs, so that every change in the GitHub repository initiates a documentation rebuild. The heart of the documentation is the conf.py file, where all extensions and the properties of the static pages generator are configured. To manually start a rebuild type:
Creating basic documentation stub out of the box.
Following the quick-start help, let’s build basic HTML documentation:
Content is written in reStructuredText markup language. For more check here.
Setting Flask application
Setting up a Flask app is a multi-step process, which can be automated with some awesome Python tools like cookiecutter, but here we will go step by step to learn it in detail. We have already planned the basic application structure. Now let’s install Flask and test the installation, then create an automatic configuration mechanism and semi-automatic dependency installation, and finally set up continuous integration and deployment of the app with TravisCI and Heroku. All tested and under version control (on GitHub).
Install Flask
Activate the virtual environment and install Flask locally (the command below will install this package and its dependencies):
Flask and its dependencies were installed. It is good to keep requirements in separate file, to ensure all necessary packages can be installed automatically with a single command and to not pollute application directory.
Currently we have enough to run a basic Flask application. To test our environment, let’s write a “Hello Flask” tryout. It will use the basic routing mechanism to show “Hello Flask!” on the page. First create the app package (with __init__.py) and the app.py module in it. In the app.py file type:
app/app.py
This “toy” code creates an instance of the Flask class, which is the central object in any Flask project. It has all the utilities necessary to start a dynamic WSGI app. Running it will start a local server with a single route, “/”. The view is named index by convention, after the default document that web servers traditionally return for a bare address; Flask itself does not depend on that name. When we enter the server address (127.0.0.1:5000 in this case), Flask makes sure that the index view is presented in response. The response is whatever the index() function returns. The if statement at the end of the file is a Python convention that makes the server start only when the file is executed as a script (for example from bash), and not when it is imported as a module.
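A minimal sketch of such a tryout file, assuming a current Flask release is installed in the active environment. It is an illustration rather than the article's exact listing: the guard at the bottom exercises the route through Flask's test client instead of calling app.run(), so the sketch finishes immediately instead of blocking on a running server:

```python
from flask import Flask

# The central application object; __name__ tells Flask where to look
# for resources that belong to this module.
app = Flask(__name__)


@app.route("/")
def index():
    # Whatever this function returns becomes the response body for "/".
    return "Hello, Flask!"


if __name__ == "__main__":
    # A real tryout would call app.run() here to start the development
    # server on http://127.0.0.1:5000/. The test client below sends the
    # same request without opening a socket.
    with app.test_client() as client:
        response = client.get("/")
        print(response.status_code, response.get_data(as_text=True))
```

Replacing the body of the guard with app.run() gives the behaviour discussed in the text.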
Running the file as a Python script should start the server on http://127.0.0.1:5000/ saying Hello, Flask! The main page was served properly (code 200), and because we do not have a favicon.ico file (yet!), the browser could not find it, hence the 404 error code. Although it is possible to start a Flask application this way, we will use a different mechanism in the “real app”. This part was just checking whether there are any issues with our environment and Flask installation. Luckily it worked! We are ready to create the “real” application and real unit tests.
Requirements
Requirements help speed up application deployment. When the source code is cloned from the public repository, all application dependencies are listed in the requirements.txt file. Moreover, CI services and Heroku need a requirements file in the application root folder to run the app properly. The requirements.txt in our application contains a link to the requirements folder, with several files specifying separate sets of dependencies for the different environments in which the application can run. The link leads specifically to requirements/prod.txt. However, during development we will install the packages listed in requirements/dev.txt, which of course includes the production dependencies:
Now we will link requirements/prod.txt to requirements/dev.txt, so that installing the development dependencies will also install everything required in production (that is, when the application is accessible to users).
requirements.txt
requirements/prod.txt
We will add more dependencies as the app grows.
requirements/dev.txt
Installing requirements is easy:
Calling the main requirements file is enough to get the application up and running; however, for development other packages may be needed (e.g. a debug toolbar). In the development environment it is better to call:
Configuration management
Configurations are kept in settings.py in the main app folder. It stores different collections of application settings. The base config class sets only the values common to all environments. Environment-specific settings are set in child classes.
app/settings.py
We can always add other values like API keys later.
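The configuration classes described above might be sketched as follows. ProdConfig and DevConfig are the class names used in this article; TestConfig, the ENV values and the SECRET_KEY fallback are assumptions made for illustration only:

```python
import os


class Config:
    """Values shared by every environment."""

    # Assumed setting: read the secret from the environment, with a
    # placeholder fallback that must never be used in production.
    SECRET_KEY = os.environ.get("SECRET_KEY", "change-me")
    DEBUG = False
    TESTING = False


class ProdConfig(Config):
    """Production: everything stays at the safe defaults."""

    ENV = "prod"


class DevConfig(Config):
    """Development: enable debugging helpers."""

    ENV = "dev"
    DEBUG = True


class TestConfig(Config):
    """Testing (assumed class name): used by the unit tests."""

    ENV = "test"
    DEBUG = True
    TESTING = True
```

Because the child classes inherit from Config, a value added to the base class later is picked up by every environment.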
For now I will input the configuration manually, but with configuration classes in place we can tell the Flask app to manage it dynamically. For example, we can set an environment variable APP_SETTINGS whose value depends on the environment our app is running in. On the Heroku server it will be ProdConfig, in the development environment it will be DevConfig, and so on. To set an environment variable on Linux type: export APP_SETTINGS=<NameOfClass> (e.g. export APP_SETTINGS=DevConfig). To make it permanent we should place the export statement in the ~/.bashrc file. On Windows, instead of export we use set APP_SETTINGS=<NameOfClass>. If we are going to employ an environment variable governing the type of configuration, our app.py should change.
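One way to read that variable is a small lookup helper. This is a sketch under assumptions: get_config and the CONFIGS mapping are hypothetical names, and the two classes are bare stand-ins for the ones kept in app/settings.py:

```python
import os


class ProdConfig:
    DEBUG = False


class DevConfig:
    DEBUG = True


# Explicit whitelist, so an arbitrary string in the environment can
# never select something that is not a configuration class.
CONFIGS = {"ProdConfig": ProdConfig, "DevConfig": DevConfig}


def get_config(default="ProdConfig"):
    """Return the config class named by APP_SETTINGS, or the default."""
    name = os.environ.get("APP_SETTINGS", default)
    try:
        return CONFIGS[name]
    except KeyError:
        raise ValueError(f"Unknown APP_SETTINGS value: {name!r}") from None
```

The returned class could then be handed to Flask with app.config.from_object(get_config()). Falling back to the production class when the variable is missing keeps debugging off unless it is requested explicitly.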
Flask app factory
We will use the app factory pattern and the Flask-Script extension to build our app foundations. The app/app.py file contains the Flask factory, which is just a simple function wrapped around Flask object creation. The manage.py file in the root directory contains the application manager with all command-line directives. This way we can create multiple instances of the app using different parameters (e.g. app settings). It is time to expand our toy “Hello Flask!” example to employ this pattern.
app/app.py
This application still doesn’t do much, except display ‘Welcome to Social media analytic tool’ from the index view. It takes the production configuration as default; however, here we specify the development environment instead. In the next part we will swap the <h1> header for landing page and dashboard blueprints. For now simple text is enough to test page routing logic, settings management and deployment.
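A hedged sketch of the factory described above, assuming Flask is installed. create_app is a conventional but here assumed function name, and the two config classes are minimal stand-ins for those in app/settings.py:

```python
from flask import Flask


class ProdConfig:
    DEBUG = False


class DevConfig:
    DEBUG = True


def create_app(config_object=ProdConfig):
    """Build and return a new Flask app configured from config_object."""
    app = Flask(__name__)
    # from_object copies every UPPERCASE attribute into app.config.
    app.config.from_object(config_object)

    @app.route("/")
    def index():
        return "<h1>Welcome to Social media analytic tool</h1>"

    return app


# Production is the default; here the development settings are chosen.
app = create_app(DevConfig)
```

Because every call returns a fresh object, tests can build their own instance with different settings without touching the one that serves requests.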
manage.py
The manage.py file uses Flask-Script to register command-line tasks outside the web application (at bash level). There are several built-in commands, like Server(), which runs the Flask development server. Still, manage.py is not the only way to run our app locally. We can also use HerokuCLI to do that (see Heroku deployment). Two additional commands are show-urls and clean. The first shows all endpoints of our app; the second removes all __pycache__ folders from the app folders.
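The clean task boils down to a directory walk. A minimal standard-library sketch of that logic, independent of Flask-Script; the function signature and the returned list are assumptions:

```python
import shutil
from pathlib import Path


def clean(root="."):
    """Delete every __pycache__ directory below root; return removed paths."""
    removed = []
    # sorted() materialises the matches first, so directories are not
    # deleted while the tree is still being traversed.
    for cache_dir in sorted(Path(root).rglob("__pycache__")):
        if cache_dir.is_dir():
            shutil.rmtree(cache_dir)
            removed.append(cache_dir)
    return removed
```

Returning the removed paths makes the function easy to check in a unit test and lets a command-line wrapper print what was deleted.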
Test Flask app
It is time to write some unit tests. This should also clear up the situation with the continuous integration service, which requires tests to run smoothly in order to finish all green. Let’s update manage.py to run some tests from the command line. First locate the tests folder in relation to manage.py and write a simple function running the tests using the excellent Python library pytest. Add the following snippet to manage.py:
manage.py
This allows us to call python manage.py test from the app root directory. The result depends on the battery of tests we currently have. Pytest convention places unit tests in the tests directory, which mirrors the structure of the tested application. Also, file names should start with test_ (or end with _test) so that pytest can collect them without any problems. For now we will write three simple tests for our configurations:
tests/app/test_settings.py
Now we can call the test suite via the app manager and see the results:
We can always launch the tests manually:
Additionally, we can add the --cov=app tests/ flag, which indicates the source code location and the tests location for the test coverage package. The flag instructs pytest to create a test coverage report, which can be further used by third-party software (e.g. Jenkins or special services like coveralls.io).
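A sketch of how a test command could assemble these options before handing them to pytest. pytest.main is pytest's programmatic entry point and --cov is provided by the pytest-cov plugin; the helper names and the location of the tests folder are assumptions:

```python
import os


def build_pytest_args(project_root, with_coverage=False):
    """Return the argument list for pytest, relative to project_root."""
    test_path = os.path.join(project_root, "tests")
    args = [test_path, "--verbose"]
    if with_coverage:
        # Measure coverage of the "app" package (needs pytest-cov).
        args.insert(0, "--cov=app")
    return args


def run_tests(project_root=".", with_coverage=False):
    """Run the suite and return pytest's exit code (0 means all green)."""
    import pytest  # imported lazily so the helper above has no dependency

    return pytest.main(build_pytest_args(project_root, with_coverage))
```

A manage.py command could wrap run_tests and pass its return value to sys.exit, so that a failing suite also fails the CI build.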
Having all command-line features integrated into one common interface is always a good thing and helps other developers understand the code and join the project faster. However, at this point our application will be tested on the TravisCI server using the command from .travis.yml (see Continuous integration).
Heroku deployment
For deployment we will use the Heroku service. It is good practice to stage changes first, check consistency and, after acceptance tests pass, promote the staged application to production. Production is the “live” application accessible to users. To achieve that we have to create two separate applications. Heroku implements a “quick deploy” button, which is configured in the root directory: the app.json file contains all the information necessary to copy the files and deploy the application.
app.json
Another file we will need is Procfile. It instructs the Heroku platform how to run our app. Because Heroku allows deploying PHP, JS, Java, Ruby and Python applications, we have to be specific about what software should be used by the platform. We also have to be specific about the location of our running function. We do not need the development server on Heroku, therefore we will point it to the place where WSGI will pick up code execution.
… and because gunicorn works only on “nix” systems, we need a simple app call for Windows (in case we want to develop on a Windows machine):
Both should produce following output:
App creation can be done via Heroku web interface or command line tool, but first we need to make sure HerokuCLI is installed. Type:
A piece of advice for Fedora users: from my experience it is better to install the official, standalone Heroku version instead of the Heroku Ruby gem. It causes fewer problems.
After successful HerokuCLI installation we can proceed and use it to create two separate applications. I called mine md-analytics-stage for the staging environment, where the app undergoes “acceptance tests”, and md-analytics for the production environment (app released to users). We will do it in three steps:
- Create md-analytics and md-analytics-stage apps
- Create md-analytics pipeline (group of applications sharing the same codebase) and add apps to it
- Push files manually to Heroku (later the CI system will deploy the app to the staging environment upon success; then, after inspection, the app can be manually promoted to md-analytics).
I assume we have an application running on the local machine. Let’s create applications on the Heroku service and bind them with a pipeline. First log in to the Heroku service:
Pull GitHub repository. Later on files will be pushed to the created Heroku application manually (once we configure Travis, it will be done automatically):
Create apps:
The order in which the apps are created is relevant, because HerokuCLI adds an extra remote to our app’s git repository. The remote points to the repository of the app that was created last on Heroku, and we want it to be the staging app (remember that promotion to production is done manually).
As we can see, in addition to the original GitHub origin repo, we have a link to the heroku remote.
We can also check present apps:
When we open our apps in the browser, e.g. http://md-analytics-stage.herokuapp.com, we see empty apps created by the Heroku service. All we have is the default Heroku message that our WSGI Python app was launched successfully:
Now we have to set environment variables and upload the application files to the remote location:
To upload files directly from local master branch to remote heroku repo type:
And voila! App is running in our stage environment:
After creating the apps we can bind them together in a single pipeline. To do it from HerokuCLI we need an additional plug-in. Let’s install it:
Now let’s create a new pipeline and point specific apps to their environments (HerokuCLI provides a nice text interface):
And add production app, choosing production stage this time:
Because app was launched without errors, I will promote staging environment to production.
We can also connect the GitHub repository to activate automatic deployment options (with “wait for CI” and “automatic deployment” ticked). If we check the Heroku dashboard now we should see:
Unfortunately, promoting an app does not copy environment variables, therefore we have to set app-specific variables for production as well:
Now both the staging and production environments run the Flask app flawlessly.
Continuous integration
We will use TravisCI, but it is also possible to build automatic system with GitLab, Jenkins or BuildBot. There are (or will be) separate articles on this blog describing specific CI/CD options.
To set up TravisCI we have to place a .travis.yml file in the application root directory. It contains instructions for the TravisCI service: what language we use (Python), which versions (2.7, 3.6 and nightly build), how to test the application, how to create basic metrics (like test coverage, which will be further processed by the Coveralls.io service) and finally how to deploy to Heroku upon successfully finishing all steps (for a detailed description, see Heroku deployment). The .travis.yml file will look similar to this:
.travis.yml
YAML files are designed to be simple for humans to read. Options are defined using key: value pairs with a nested structure. I suppose the only mysterious part is deploy -> api_key. Let’s examine this entry. Once registered with Heroku, I was given an API key (Heroku -> account settings). It is unwise to paste your key in plain text into .travis.yml. Therefore we will use the Travis Ruby gem to encrypt it (Ruby must be installed):
This appended the encrypted API token, read from HerokuCLI, to the .travis.yml file. The only thing left to be done is to connect the md_analytics GitHub repository with the TravisCI service:
Now when we push a new commit, Travis will detect the changes and automatically run all tasks described in the YAML file. In this case it should perform tests on three Python versions (2.7, 3.6 and 3.7-DEV). In all Python environments it should install the requirements and run the tests (including the test coverage report). Upon success it should deploy our app to the staging environment on the Heroku service.
And it failed… We can inspect the build console log and track down the reason. It looks like the tests ran fine, but there is a problem with Heroku deployment: some error in the API key encryption. We used the Travis Ruby gem to extract the authorization token and encrypt it with travis encrypt. Obviously it did not go well.
All right. For some reason $(heroku auth:token) returned strange data (I need to read more about the Heroku authentication mechanism). I have some “initialization” scripts connected to the console and they might have interfered with travisCLI. I had to work around it and provide the Heroku API token directly:
… which fixed the deployment problem. According to this GitHub issue, another approach could be creating an environment variable HEROKU_API_TOKEN on Travis and pasting the secret token there.
Summary
What we have is a Flask application with a complete continuous integration and continuous deployment workflow up and running. The basic elements of the system are:
- Code repository placed on GitHub
- TravisCI continuous integration running pytest and passing app further
- Two deployment environments on Heroku: stage and production grouped in one pipeline
To continue: