Django Upload Excel File to Database

Simple Excel Parsing to Database in Django with Pyexcel-xls and Pyexcel-xlsx

9cv9 official

…unless you finish this read :)

What is Pyexcel?

Pyexcel is a library which provides functionalities to read, manipulate, and write data in various Excel formats. In this article, we'll talk more about pyexcel-xls and pyexcel-xlsx. The two of them are tiny wrappers of pyexcel which provide special functions for .xls and .xlsx format, which obviously represented by their names.

That's pretty awesome, tell me more about it!

Using these libraries, we can get data from xls a n d xlsx formatted files. Their functions accept file objects as their input parameters, and not file paths. This feature is pretty good, especially when you get your files from users using the POST method in Django. With the POST (or specifically FILES) method, files uploaded by users will be kept in memory, without any physical address. In other words, if we want to manipulate or read users' uploaded files, it'll be harder if we've functions that need file paths as its input. In addition to reading data, we can also make and save our own data to .xls or .xlsx files. Nevertheless, in this article, we'll focus more on getting data from files and "parse" them to our Django models database.

Understood. Is there any specific excel template?

Actually, there isn't. By default, pyexcel-xls and pyexcel-xlsx will read data from the first column it found any data until the last column that contains any data, row by row, until the last row where it can find any data. But, to make our job easier, I'll recommend this kind of template:

The first row is the header, which gives users information about data in each column. The "No" column is unexpectedly important, especially if we have foreign key fields in our models. For example, we have this kind of models:

            class Company(models.Model):            KIND = (            (0, "Corporate"),            (1, "Start Up"),            )            name = models.CharField(max_length=50)            address = models.CharField(max_length=100)            kind = models.IntegerField(default=0, choices=TYPE)            class Job(models.Model):            name = models.CharField(max_length=50)            salary = models.DecimalField(max_digits=8,null=True)            company = models.ForeignKey(Company, on_delete=models.CASCADE,            blank=True, null=True)          

The template which we recommend for that kind of model is like this:

Sheet Name: Company

Sheet Name: Job

Pay attention to the last column in the second sheet (Job sheet). We'll use that column to link between Company model and Job model. For every job that belongs to company 1, fill the "Company No" column with 1, and so for other jobs.

Wait, I have choice fields in my models. How to make sure the data will be valid?

Microsoft Excel provides a solution for this question. Firstly, place your cursor in cells or columns which you need the data to be validated. Then, in Data Tools section, in Data menu tab, you could find Data Validation menu. Click on that menu, choose List on Allow dropdown, and fill the Source field with the choices for the column.

You can save the choices in cells and then select the cells, or you could also fill the choices (comma separated) directly in the field. If the choices are only integers, you can also choose Whole number and fill in the minimum and maximum fields.

Another tip: to let your users know what data should be filled in your form, you can also use Add Comment in your header cells.

My template is ready! Where should I start in Django?

Okay, before starting on anything else, you need to first know the basics of Django web programming. we won't talk about models, serializers, views, in detail here. First, you need to make a new Django project with an app, let's say we call it: sample_app. In that app, you have to make models stated on the previous section in models.py.

To use pyexcel-xls and pyexcel-xlsx, we need to install them using these commands:

            pip install pyexcel-xls            pip install pyexcel-xlsx          

After that, make a simple view to upload your excel file. You can refer to this link to make a simple html template view, or you can also refer to our previous article about modifying Django Admin to add upload file section to your admin page. Make sure you name the file input 'files'.

All set up! Should we start making the parser?

Change your form action in your view to redirect to your specific url, for example: /parse-excel/. We'll link that url to our view after this. Now, in our app directory, create a views.py file which imports Company and Job models from models.py. On top of that file, add these lines of code to import pyexcel-xls and pyexcel-xlsx:

            from django.shortcuts import redirect            from pyexcel_xls import get_data as xls_get            from pyexcel_xlsx import get_data as xlsx_get            from django.utils.datastructures import MultiValueDictKeyError          

After that, make a class called ParseExcel which extends APIView and has post function:

            class ParseExcel(APIView):            def post(self, request, format=None):          

In the post function, we'll get the files uploaded by the user using request.FILES['files']. We also need to catch the MultiValueDictKeyError exception, in case the user uploaded nothing:

            try:            excel_file = request.FILES['files']            except MultiValueDictKeyError:            return redirect(<your_upload_file_failed_url>)          

We've got our file! Now, we need to know whether it is .xls or .xlsx file. We can do it using Python's string split function. If it is .xls file, we parse it using xls_get function, and xlsx_get if it is .xlsx. We limit the column to 4 using column_limit to avoid junk data outside our tables. Let's say we save the parsed data in a variable data. In case the file extension is not .xls or .xlsx, just go redirect to your upload-file-failed view. Your code should look like below:

            if (str(excel_file).split('.')[-1] == "xls"):            data = xls_get(excel_file, column_limit=4)            elif (str(excel_file).split('.')[-1] == "xlsx"):            data = xlsx_get(excel_file, column_limit=4)            else:            return redirect(<your_upload_file_fail_url>)          

The data is JSON variable. We can get Company and Job sheet using this codes:

            companies = data["Company"]            jobs = data["Job"]          

Now, let's make objects from the data we've already got. First, we'll try to make our companies objects. Because pyexcel only read a row until the last column with data, we need to check every single row whether they contain all the columns or not. In this article, we check the length of every row, and if the length of a row is less than it needs to be, we append empty strings to that row. The code should look like:

            if (len(companies) > 1): # We have company data            for company in companies:            if (len(company) > 0): # The row is not blank            if (company[0] != "No"): # This is not header            # Fill ending columns with blank            if (len(company) < 4):            i = len(company)            while (i < 4):            company.append("")            i+=1            # Check if company exist            # Assume that company name is unique            c = Company.objects.filter(name=company[1])            if ( c.count() == 0):            Company.objects.create(            name= company[1],            address= company[2],            kind= company[3]            )          

Our Company parser is done. Now, how about the Job one? Inside if (len(companies) > 1) section, you can add this code to parse jobs data.

            for job in jobs:            if (len(job) > 0): # The row is not blank            if (job[0] != "No"): # This is not header            # Get company that own this job            comp_id = int(job[-1])]            comp_name = companies[comp_id][1]            comp = Company.objects.filter(name=comp_name)            if (comp.count() > 0): # Company exist            Job.objects.create(            company=comp[0],            name= job[1],            salary= int(job[2])            )          

On the bottom of your ParseExcel function, don't forget to return to your upload-file-done url.

            return redirect(<your_upload_file_done_url>)          

And, our parser is done! Your full code for views.py should look like this:

            from sample_app.models import Company, Job            from django.shortcuts import redirect            from pyexcel_xls import get_data as xls_get            from pyexcel_xlsx import get_data as xlsx_get            from django.utils.datastructures import MultiValueDictKeyError            class ParseExcel(APIView):            def post(self, request, format=None):            try:            excel_file = request.FILES['files']            except MultiValueDictKeyError:            return redirect(<your_upload_file_failed_url>)            if (str(excel_file).split('.')[-1] == "xls"):            data = xls_get(excel_file, column_limit=4)            elif (str(excel_file).split('.')[-1] == "xlsx"):            data = xlsx_get(excel_file, column_limit=4)            else:            return redirect(<your_upload_file_fail_url>)            companies = data["Company"]            jobs = data["Job"]            if (len(companies) > 1): # We have company data            for company in companies:            if (len(company) > 0): # The row is not blank            # This is not header            if (company[0] != "No"):            # Fill ending columns with blank            if (len(company) < 4):            i = len(company)            while (i < 4):            company.append("")            i+=1            # Check if company exist            # Assume that company name is unique            name = company[1]            c = Company.objects.filter(name=name)            if ( c.count() == 0):            Company.objects.create(            name= company[1],            address= company[2],            kind= company[3]            )            for job in jobs:            if (len(job) > 0): # The row is not blank            if (job[0] != "No"): # This is not header            # Get company that own this job            comp_id = int(job[-1])]            name = companies[comp_id][1]            c = Company.objects.filter(name=name)            if (c.count() > 0): # Company exist            Job.objects.create(            company=c[0],            name= job[1],            salary= int(job[2])            )          

Now, what we need to do is just link this view to our url. To do that, we need to write in our project's urls.py this code:

            from django.urls import path            from sample_app.views import ParseExcel            urlpatterns = [            path('parse-excel/', ParseExcel.as_view()),            ]          

That's it! Now, test your app by uploading some excel files and then check your database to make sure the data you've uploaded are nicely populated there.

Conclusion

To sum up, pyexcel is a Python library that sets up great conditions for developers who want to read, manipulate, and write excel formatted data. Using this library, especially its wrapper for xls and xlsx files, we can make a simple Excel parser to populate data from Excel to our Django database. This Excel parser easily enables admin users who don't have any programming knowledge to rapidly add big amounts of data to a Django app easily.

Django Upload Excel File to Database

Source: https://medium.com/@9cv9official/simple-excel-parsing-to-database-in-django-with-pyexcel-xls-and-pyexcel-xlsx-765aed49aaa9

0 Response to "Django Upload Excel File to Database"

Post a Comment

Iklan Atas Artikel

Iklan Tengah Artikel 1

Iklan Tengah Artikel 2

Iklan Bawah Artikel