I'm trying to run a Flask RESTful server that uses a global object. I'm currently using the Flask-RESTful extension, with the structure shown below.

myapp
    |_ endpoints
        |_ myendpoint1.py
        |_ myendpoint2.py
    |_ __init__.py
    |_ config.py
    |_ db_data.py 

Each time someone accesses the endpoint defined in myendpoint1.py, I have to instantiate a KeyedVectors object [ref]. That would not be a problem if instantiation were fast, but loading the 2.5 GB file takes almost 3 minutes. The object is quite simple: it loads the file as a kind of lookup matrix, with a few extra features. It is static, i.e., it is never modified after loading.

Initially I thought about instantiating the KeyedVectors object as a global in config.py (see my config.py file below), and that does work on Windows + PyCharm (running in 156 ms).
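To make the "load once, reuse on every request" idea concrete, here is a minimal sketch of a lazily loaded, per-process cached object. `load_embeddings` and `get_embeddings` are hypothetical names, and the loader is a stand-in that returns a plain dict so the snippet runs without gensim; in the real app it would wrap the `KeyedVectors.load_word2vec_format` call from config.py.

```python
import time
from functools import lru_cache


def load_embeddings(path):
    """Hypothetical stand-in for the slow loader.

    In the real app this would call KeyedVectors.load_word2vec_format(path, ...);
    a dict is returned here so the sketch runs without gensim.
    """
    time.sleep(0.05)  # simulate an expensive load
    return {"path": path, "loaded_at": time.time()}


@lru_cache(maxsize=None)
def get_embeddings(path):
    # The first call in a process pays the loading cost; later calls with
    # the same path return the very same cached object.
    return load_embeddings(path)
```

An endpoint would then call `get_embeddings(...)` inside its handler instead of importing a module-level global. Note that the cache is per process, so each server worker process would still load its own copy on first use.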

But, for some reason, when I run it on Ubuntu under Gunicorn, literally nothing happens: I get a timeout message and I don't even know how to debug it. This happens not only in myendpoint1.py but in every endpoint I have.
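One possible cause worth ruling out (an assumption, not something confirmed here): Gunicorn's worker `timeout` defaults to 30 seconds, so a worker that spends about 3 minutes importing config.py could be killed and restarted before it ever serves a request, which would affect every endpoint. A minimal sketch of a Gunicorn config file for testing that hypothesis follows; the file name, bind address, worker count, and timeout value are illustrative choices.

```python
# gunicorn.conf.py (hypothetical file; all values are illustrative assumptions)

bind = "127.0.0.1:8000"  # address the server listens on
workers = 2              # each worker is a separate process
timeout = 300            # seconds before a silent worker is killed (default: 30)
preload_app = True       # import the app once in the master, before forking workers
loglevel = "debug"       # more verbose logs, to see worker boots and timeouts
```

It could be started with something like `gunicorn -c gunicorn.conf.py myapp:app`, assuming `app` is importable from the `myapp` package. With `preload_app`, the slow load happens once in the master process, and forked workers can share that memory rather than each loading the file again.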

config.py

from flask import Flask, g
from flask_marshmallow import Marshmallow
from flask_sqlalchemy import SQLAlchemy
from gensim.models import KeyedVectors
import myapp.db_data as db_data

app = Flask(__name__)

# Configure the SqlAlchemy part of the app instance.
app.config["SQLALCHEMY_ECHO"] = True
app.config["SQLALCHEMY_DATABASE_URI"] = db_data.postgres_url
app.config["SQLALCHEMY_TRACK_MODIFICATIONS"] = False

# Create the SqlAlchemy db instance.
db = SQLAlchemy(app)

# Initialize Marshmallow
ma = Marshmallow(app)

# My global KeyedVectors object.
global_word_embeddings_model = KeyedVectors.load_word2vec_format(path_to_embedding_matrix,
                                                                 binary=False,
                                                                 limit=None)

__init__.py

from flask_restful import Api

from myapp.config import app
from myapp.endpoints.myendpoint1 import MyEndpoint1
from myapp.endpoints.myendpoint2 import MyEndpoint2

api = Api(app)
api.add_resource(MyEndpoint1, '/api2/myendpoint1')
api.add_resource(MyEndpoint2, '/api2/myendpoint2')


if __name__ == '__main__':
    app.run()

Any ideas on how to debug or to solve this?

revy